
An Overview of Turbo Codes and Their Applications

(Invited Paper)
Claude Berrou, Ramesh Pyndiah, Patrick Adde, Catherine Douillard and Raphaël Le Bidan
GET/ENST Bretagne, Laboratoire TAMCIC (UMR CNRS 2872), PRACom
Technopôle Brest Iroise, CS 83818, 29238 Brest Cedex 3, FRANCE
E-mail: {firstname.lastname}@enst-bretagne.fr
Abstract: More than ten years after their introduction,
Turbo Codes are now a mature technology that has been
rapidly adopted for application in many commercial
transmission systems. This paper provides an overview of
the basic concepts employed in Convolutional and Block
Turbo Codes, and reviews the major evolutions in the field
with an emphasis on practical issues such as implementation
complexity and high-rate circuit architectures. We address
the use of Turbo Codes in existing standards and discuss
potential future applications of this error-control coding
technology.

I. INTRODUCTION
Error-control codes, also called error-correcting codes
or channel codes, are a fundamental component of
virtually every digital transmission system in use today.
Channel coding is accomplished by inserting controlled
redundancy into the transmitted digital sequence, thus
allowing the receiver to perform a more accurate decision
on the received symbols and even correct some of the
errors made during the transmission. In his landmark
1948 paper that pioneered the field of Information
Theory, Claude E. Shannon proved the theoretical
existence of good error-correcting codes that allow data
to be transmitted virtually error-free at rates up to the
absolute maximum capacity (usually measured in bits per
second) of a communication channel, and with
surprisingly low transmitted power (in contrast to
common belief at that time). However, Shannon's work
left unanswered the problem of constructing such
capacity-approaching channel codes. This problem has
motivated intensive research efforts during the following
four decades, and has led to the discovery of fairly good
codes, usually (but not always: convolutional codes are a
notable exception) obtained from sophisticated algebraic
constructions. However, 3 dB or more still stood between
what the theory promised and the practical performance
offered by error-correcting codes in the early 90s.
The introduction of Convolutional Turbo Codes (CTC)
in 1993 [1,2], quickly followed by the invention of Block
Turbo Codes (BTC) in 1994 [3,4], closed much of the
remaining gap to capacity. Today, advanced Forward
Error Correction (FEC) systems employing Turbo Codes
commonly approach Shannon's theoretical limit within a
few tenths of a decibel. Practical implications are
numerous. Using Turbo Codes, a system designer can, for
example, achieve a higher throughput (by a factor of 2 or
more) for a given transmitted power, or, alternatively,
achieve a given data rate with reduced transmitted
energy. Historically, Turbo Codes were first deployed for
satellite links and deep-space missions, where they
offered impressive Bit-Error Rate (BER) performance
beyond existing levels with no additional power
requirement (a premium resource for satellites). Since
then, they have made their way into 3G wireless phones,
Digital Video Broadcast (DVB) systems, and Wireless
Metropolitan Area Networks (WMAN). They are also
considered for adoption in several emerging standards
including enhanced versions of Wi-Fi networks.
A decade after the discovery of Turbo Codes, this
paper provides an overview of this advanced FEC
technology. The next two sections review the basic
concepts and the major evolutions in the field for both
Convolutional and Block Turbo Codes. Practical issues
relevant to the system designer such as implementation
complexity and high-rate circuit architectures are also
addressed, and the use of Turbo Codes in existing
standards is discussed. Some personal views about the
next evolutions expected in the field of channel coding
are finally proposed in conclusion.
II. CONVOLUTIONAL TURBO CODES
Classical Convolutional Turbo Codes, also called
Parallel Concatenated Convolutional Codes (PCCC),
result from a pragmatic construction conducted by C.
Berrou and A. Glavieux, based on the intuitions of G.
Battail [5], J. Hagenauer and P. Hoeher [6], who, in the
late 80s, highlighted the benefit of introducing
probabilistic processing in digital communication
receivers. Previously, other researchers including P. Elias
[7], R. G. Gallager [8] and M. Tanner [9] had already
imagined coding and decoding systems closely related to
the principles of Turbo Codes.
A. Principles of Turbo Codes
The classical Turbo Code is shown in Fig. 1 and
consists of the parallel concatenation of two binary
Recursive Systematic Convolutional (RSC) codes C1 and
C2 separated by a permutation (interleaver) Π. Serial
concatenation is also possible [10] (with its own pros and
cons) but will not be discussed here. RSC codes are a key
component of Turbo Codes. They are based on Linear
Feedback Shift-Registers (LFSR) and act as pseudo-random
scramblers. RSC codes offer several advantages in
comparison with classical non-recursive non-systematic
convolutional codes. First, they resemble random codes,
and it is known from Shannon's pioneering work that
random-like codes are the key to approaching capacity. In
addition, they perform better than classical convolutional
codes at low signal-to-noise ratios [2]. Finally, RSC
codes have the interesting property that
only a small fraction of finite weight information
sequences yields finite weight (low redundancy) coded
sequences at the encoder's output. These particular
sequences are called Return To Zero (RTZ) sequences in
the literature and play a fundamental role in the
asymptotic performance of the Turbo Code [11,12].

Fig. 1. The classical Turbo Code.

Optimal decoding of the overall Turbo Code is not
possible in practice due to a prohibitive number of states
to consider. Instead, a clever divide-and-conquer strategy
with manageable complexity and near-optimum
performance is applied at the receiver side. Turbo
decoding relies on the exchange of probabilistic
messages between two Soft-Input Soft-Output (SISO)
decoders. Usually (but not necessarily), probabilistic
information is expressed in Log Likelihood Ratio (LLR)
form. Denoting by Pr{d=1} the probability that, at a
given step of the decoding process, a binary datum d has
logical value 1, the LLR L(d) about d is given by:

L(d) = \ln \frac{\Pr\{d = 1\}}{1 - \Pr\{d = 1\}}    (1)

The sign of L(d) gives the hard decision about d, while
the magnitude |L(d)| measures the reliability of this
decision. The role of a SISO decoder consists in taking
input LLR estimates about the transmitted bits and trying
to improve these estimates, using the local redundancy of
the considered component code. The output LLR
delivered by the SISO decoder may be written as:

L_{output}(d) = L_{input}(d) + z(d)    (2)

The probabilistic quantity z(d) is called the extrinsic
information about bit d. It is the result of the decoder's
estimation of d, without taking its own input into account.
It is precisely this extrinsic information that is exchanged
iteratively between the two SISO decoders during the
decoding process. Subtracting the decoder's input from
its output prevents the decoder from acting as a positive
feedback amplifier and introduces stability (a crucial
issue!) into the feedback process. Usually, after a given
number of iterations, one observes that the two decoders
converge towards a stable final decision for d. In practice,
and depending on the nature of the SISO decoder, fine-tuning
operations (scaling, clipping) may be applied to
the extrinsic information in order to ensure convergence
within a small number of iterations.
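To make the exchange concrete, here is a minimal Python sketch of the decoding loop built around Eqs. (1) and (2). The routine siso_decode is a stand-in for any SISO algorithm (BCJR-MAP, Max-Log-MAP, discussed below) and is assumed rather than implemented; the scaling/clipping fine-tuning mentioned above is omitted.

import numpy as np

def turbo_decode(llr_sys, llr_par1, llr_par2, perm, siso_decode, n_iter=8):
    """Skeleton of iterative turbo decoding with extrinsic feedback.
    llr_sys     : channel LLRs of the systematic bits
    llr_par1/2  : channel LLRs of the parity bits of C1 / C2
    perm        : permutation (interleaver) indices
    siso_decode : assumed SISO routine, (input LLRs, parity LLRs) -> output LLRs
    """
    z12 = np.zeros_like(llr_sys)   # extrinsic information, decoder 1 -> 2
    z21 = np.zeros_like(llr_sys)   # extrinsic information, decoder 2 -> 1
    for _ in range(n_iter):
        # Decoder 1: channel LLRs plus the extrinsic info of decoder 2
        l_in1 = llr_sys + z21
        z12 = siso_decode(l_in1, llr_par1) - l_in1      # Eq. (2): keep z(d) only
        # Decoder 2 works in the permuted (interleaved) domain
        l_in2 = (llr_sys + z12)[perm]
        z2 = siso_decode(l_in2, llr_par2) - l_in2
        z21 = np.empty_like(z2)
        z21[perm] = z2                                  # de-interleave
    # Sign of the final LLR gives the hard decision (Eq. (1))
    return ((llr_sys + z12 + z21) > 0).astype(int)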
B. Example of performance results
Table I shows some examples of performance results
for the DVB-RCS Turbo Code over an AWGN channel
using 8 iterations and 4-bit input quantization. We report
the Eb/N0 level (dB) required to achieve a target
Frame Error Rate (FER) of 10^-4 for different code rates
and block lengths. The corresponding gap Δ with respect
to the Sphere-Packing Bound (SPB) is also given. We
recall that the SPB provides a theoretical lower bound on
the minimum Eb/N0 required to achieve a given FER with
the best codes of a given finite block size [13]. Although
these results do not fully reflect the current state of
the art in CTC, we observe that the DVB-RCS code performs
very close (from 1.0 to 1.5 dB) to the theoretical limits
under real implementation constraints.
Rate | ATM (53 bytes)      | MPEG (188 bytes)
1/2  | 2.3 dB (Δ = 1.0 dB) | 1.8 dB (Δ = 1.0 dB)
2/3  | 3.3 dB (Δ = 1.1 dB) | 2.6 dB (Δ = 0.9 dB)
3/4  | 3.9 dB (Δ = 1.3 dB) | 3.2 dB (Δ = 1.1 dB)
4/5  | 4.6 dB (Δ = 1.5 dB) | 3.8 dB (Δ = 1.2 dB)
6/7  | 5.2 dB (Δ = 1.4 dB) | 4.4 dB (Δ = 1.1 dB)

Table I. Minimum Eb/N0 (dB) required to achieve a target
FER = 10^-4 with the DVB-RCS CTC over an AWGN channel.

C. Advances in the field

Much progress in the understanding and design of
CTC has been made during the last decade. Some of
these advances are surveyed in this section.
1) Low-complexity decoding algorithms
CTC were originally decoded using the BCJR-MAP
algorithm [2]. However this algorithm does not lend itself
easily to a digital hardware implementation since it
involves many real number multiplications. Efficient
implementations have been proposed that operate directly
in the logarithm domain, thereby translating
multiplications into additions (see for example [14]).
Among them, the Max-Log-MAP decoding algorithm
realizes a good trade-off between performance and
complexity, with the added advantage of not requiring
any knowledge of the noise level. In addition, the
introduction of sliding-window decoding algorithms has
helped reduce the internal memory requirements and
the decoding latency of the SISO decoders [15].
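The whole difference between Log-MAP and Max-Log-MAP decoding can be condensed into one operator used inside the forward-backward recursions. The following sketch (illustrative, not taken from [14]) contrasts the exact Jacobian logarithm with its max approximation:

import math

def max_star_exact(a, b):
    # Exact Jacobian logarithm ln(e^a + e^b), as used by Log-MAP.
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_star_approx(a, b):
    # Max-Log-MAP drops the correction term; the recursions then become
    # pure add-compare-select and need no knowledge of the noise level.
    return max(a, b)

# The neglected correction term is at most ln 2, reached for a == b:
print(max_star_exact(1.0, 1.0))   # 1.693...
print(max_star_approx(1.0, 1.0))  # 1.0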
2) Stop criterion
A stop criterion facilitates the convergence of the
iterative decoding process and helps reduce the average
power consumption of the decoder by reducing the
average number of iterations required to decode a block,
without compromising performance. Various stop criteria
have been proposed for CTC over the years. As an
illustration, a detailed investigation of several stopping
rules can be found in ref. [16].

3) Circular termination of the component codes


In contrast to classical block codes, convolutional
codes are not a priori well suited for the transmission of
finite-length information sequences. Several solutions are
available to circumvent this problem. The first idea
consists in not terminating the two trellises at the end of
the encoding process. This, however, introduces some
performance loss in the decoding process (possible error
floor). Another solution, adopted in several standards
(CCSDS, UMTS), forces the termination of at least one
of the two trellises in the all-zero state. This is
accomplished by inserting additional dummy tail bits at
the end of the information sequence, thereby reducing the
overall spectral efficiency of the transmission. The final
solution consists in properly initializing the encoder
memory so that the final state of the encoder register
becomes equal to the initial state. This technique does not
require additional termination bits. The resulting code
trellis has a circular representation, hence the name
"circular" (or tail-biting) termination. Long known for
classical non-recursive convolutional codes, this
technique has been adapted to RSC codes in [12].
Circular RSC (CRSC) codes have several advantages
over terminated RSC codes. In particular, circular
termination guarantees a uniform protection level of all
the bits in the coded sequence since they benefit from the
whole set of redundancy. It also facilitates the design of
parallel decoding architectures. Note finally that the usual
SISO decoding algorithms can be easily accommodated
to handle the specificity of circular termination.

Fig. 2. Performance comparison between 8-state binary (m=1)
and duo-binary (m=2) CTC of various code rates with QPSK
modulation over an AWGN channel.

4) Non-binary CTC
Classical binary CTC usually employ rate-1/2 RSC
codes. In contrast, non-binary CTC are based on parallel
concatenation of rate-m/(m+1) (usually m=2) CRSC
component codes. 8-state CTC from this family have
already been adopted in several standards, including
DVB-RCS, DVB-RCT and the IEEE 802.16a standard
(WiMAX) for Wireless Metropolitan Area Networks
(WMAN). Non-binary CTC indeed exhibit remarkable
properties, as shown in [17,18]. First, we observe a better
convergence of the iterative process (less correlation
effects between the decoders). Non-binary CTC are also
less sensitive (less degradation) with respect to
puncturing patterns. These two points are illustrated in
Fig. 2, where we compare the performance of 8-state
binary (m=1) and double-binary (m=2) CTC with octal
generator polynomials (15,13) and different code rates.
The fact that the encoder and decoder no longer operate
on individual bits but rather on non-binary symbols also
introduces additional degrees of freedom in the
permutation design (use of a two-stage intra- and
inter-symbol permutation), yielding better minimum
distances and lower error floors. Architectures for
non-binary SISO decoders provide a higher throughput and
reduced latency (although the decoding operations are
slightly more complex since there are more edges to
consider in the individual trellises). Finally, non-binary
SISO decoders have increased robustness with respect to the
use of sub-optimum decoding algorithms. As shown in
Fig. 3, there is almost no performance degradation
between double-binary CTC employing the exact MAP
algorithm or its sub-optimal Max-Log-MAP approximation,
whereas a degradation of 0.5 dB or less is usually
observed with binary CTC in the latter case. Hence
double-binary CTC provide an efficient and versatile
FEC solution with reasonable decoding complexity.

Fig. 3. Performance comparison between MAP and Max-Log-MAP
decoding of 8-state binary (m=1) and duo-binary (m=2)
CTC with QPSK modulation over an AWGN channel.

5) Better understanding of the code performance


Performance curves of Turbo Codes exhibit a very
characteristic behavior comprising a (usually steep) turbo
cliff (waterfall region) at low SNR, followed by a
progressive flattening of the performance curve (the
so-called error-floor region) at moderate to high SNR.
The introduction of the notion of a probabilistic
uniform interleaver has facilitated the asymptotic
performance analysis of Turbo Codes and has provided
useful guidelines with respect to the choice of the
component codes [19]. More recently, several methods
have been proposed that allow one to compute, or at least
closely estimate, the true minimum Hamming distance of
the Turbo Code [20,21]. These tools are of great practical
importance in order to design good permutations yielding
high minimum distance (low error floors).
Application                                | Turbo Code          | Termination | Polynomials    | Rates
CCSDS (deep-space missions)                | Binary, 16-state    | Tail bits   | 23, 33, 25, 37 | 1/6, 1/4, 1/3, 1/2
UMTS, Cdma2000 (3G mobile)                 | Binary, 8-state     | Tail bits   | 13, 15, 17     | 1/4, 1/3, 1/2
DVB-RCS (Return Channel over Satellite)    | Duo-binary, 8-state | Circular    | 15, 13         | 1/3 up to 6/7
DVB-RCT (Return Channel over Terrestrial)  | Duo-binary, 16-state| Circular    | 15, 13         | 1/2, 3/4
M4 (Inmarsat)                              | Binary, 16-state    | None        | 23, 35         | 1/2
Skyplex (Eutelsat)                         | Duo-binary, 8-state | Circular    | 15, 13         | 4/5, 6/7
WiMAX (IEEE 802.16)                        | Duo-binary, 8-state | Circular    | 15, 13         | 1/2 up to 7/8

Table II. Current known applications of Convolutional Turbo Codes.

In parallel, the introduction of EXtrinsic Information
Transfer (EXIT) charts [22] and other related convergence
analysis methods (density evolution, etc.) has led to a
better understanding of the behavior of Turbo Codes in the
turbo cliff region (convergence threshold, convergence
speed). The combination of these various tools allows the
system designer to carefully optimize the performance of
his Turbo Codes with respect to the now classical
convergence versus minimum distance dilemma
encountered with capacity-approaching codes.
6) The art of permutation design
The pseudo-random permutation is another key
component of Turbo Codes. Originally introduced to
break correlation effects during the iterative decoding
process, the permutation function has been quickly
recognized as a fundamental parameter of the code itself.
When considering very large block sizes (say 30000 bits
or more), a permutation drawn at random will yield a
good Turbo Code with high probability. To quote David
Forney (MIT): "It sometimes seems that almost any
simple codes interconnected by a large pseudo-random
interleaver and decoded with sum-product decoding will
yield near-Shannon-limit performance." This is no longer
true when one aims at designing CTC operating on small
blocks with good performance (low error floor) at low
Bit-Error Rates (BER). The way the permutation is
devised (together with the choice of the component
codes) indeed fixes the minimum Hamming distance dmin
of the Turbo Code, and therefore the corresponding
achievable asymptotic coding gain G_a ≈ 10 log10(R·d_min).
Regularity of the permutation is another important factor
that should not be overlooked in practice. Indeed, the
more regular the permutation, the easier it is to conceive
high-throughput parallel decoding architectures.
Designing permutations having both good structural and
spreading properties actually remains an ongoing area of
research that regularly inspires new contributions.
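To make the formula concrete, consider a rate R = 1/2 CTC whose permutation achieves d_min = 20 (an illustrative value, not taken from this paper):

G_a ≈ 10 log10(R · d_min) = 10 log10(0.5 × 20) = 10 log10(10) = 10 dB.

Raising d_min by 25% to 25 would bring roughly 1 dB of additional asymptotic gain, which is precisely the kind of improvement called for at very low error rates in the applications discussion below.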
Recently however, two permutation models have been
proposed that satisfy most of the requirements for a good
permutation. Called Dithered Relatively Prime (DRP)
permutation [23] and Almost Regular Permutation (ARP)
[24] respectively, these two simple models are actually
based on similar ideas, and combine a high-level regular
permutation with local controlled disorder. These two
solutions have good asymptotic performance at low BER
and lead to very efficient hardware implementations. Note
that the ARP permutation model has been used in the
CTC adopted in the DVB-RCS and DVB-RCT standards.
D. Applications of CTC

A decade after their introduction, CTC are already in
use in several industry standards. Some of them are
described in Table II (see also the EchoStar system for
satellite TV developed by Broadcom Corp. for another
example). The four CTC commonly used in practice are
shown in Fig. 4. Let us examine the
relative merits and limitations of each of these codes.
Since the choice of a FEC system is usually dictated by
practical system constraints such as latency, residual
error rate or silicon area, we will consider here three
different FER regions corresponding to different Quality
of Service (QoS) requirements:

Medium error rates (FER > 10^-4)

This is typically the domain of Automatic Repeat
reQuest (ARQ) systems and is also the most favorable
range of error rates for CTC. 8-state component codes are
sufficient to reach near-optimum performance. The binary
CTC in Fig. 4a is suitable for rates < 1/2. The duo-binary
code of Fig. 4b is preferable for higher rates (less
sensitivity to puncturing patterns). In both cases,
performance close to the theoretical limit is achieved
with existing silicon decoders for most coding rates and
block sizes, even the shortest.

Low error rates (10^-9 < FER < 10^-4)

16-state CTC are usually preferable to 8-state CTC in
this context, since they offer better performance (by about
1 dB at a FER of 10^-7) in this region. The choice between
the two solutions mainly depends on the desired trade-off
between performance and decoding complexity. The
corresponding Turbo Codes are shown in Fig. 4c and 4d.
Again, binary CTC are suitable for coding rates < 1/2 and
non-binary CTC should be used for higher rates. Note
also that the permutation must be very carefully designed
in order to maintain good performance at low error rates.

[Figure: encoder schematics. The binary encoders take k binary data, the duo-binary encoders take k/2 binary couples (A, B); each encoder feeds a permutation producing the redundancy streams Y1 and Y2. Polynomials: 15, 13 (or 13, 15) for the 8-state codes, 23, 35 (or 31, 27) for the 16-state codes.]

Fig. 4. The four CTC used in practice: a) 8-state binary; b) 8-state duo-binary; c) 16-state binary; d) 16-state duo-binary.

Very low error rates (FER < 10^-9)

For the time being, the minimum Hamming distances
that are currently obtained with CTC cannot prevent a
change of slope in the performance curves at very low
error rates. An increase of about 25% in the minimum
distance of the code would be necessary to make CTC
attractive for those applications that operate in this
error-rate region (such as optical transmission or mass
storage systems for example).
To summarize the previous discussion, 8-state CTC
are particularly appropriate for ARQ systems and short to
medium block sizes. On the other hand, 16-state CTC are
necessary for broadcast systems, long blocks, or high
coding rates. Several remaining challenges are currently
under investigation. In particular, it would be desirable to
reduce by half the number of iterations required to
achieve convergence (from 8 to 4), and to decrease the
complexity of the Max-Log-MAP decoder for 16-state
Turbo Codes.
III. BLOCK TURBO CODES
Block Turbo Codes (BTC), also called Turbo Product
Codes (TPC), offer an interesting alternative to CTC for
applications requiring either high code rates (R > 0.8),
very low error floors, or low-complexity decoders able to
operate at several hundreds of megabits per second (and
even higher).

A. Construction and iterative decoding


The general concept of Block Turbo Codes is based
on iterative SISO decoding of product codes, which were
introduced by P. Elias in 1954 [7]. Product codes are
constructed by serial concatenation of two (or more)
systematic linear block codes C1 and C2 with parameters
(n1, k1, δ1) and (n2, k2, δ2), where ni, ki, and δi stand for the
code length, code dimension and minimum Hamming
distance of each component code Ci. As shown in Fig. 5,
data bits are placed in a k1 × k2 information matrix [M] and
the rows and columns are encoded by the codes C1 and
C2 respectively, yielding an n1 × n2 coded matrix [C]. The
product code has length n = n1·n2, dimension k = k1·k2, and
code rate R = R1·R2, where Ri is the code rate of code Ci.
All the rows of the coded matrix are code words of C1
and all the columns are code words of C2. It follows from
this important property that the minimum Hamming
distance of the product code is the product δ = δ1·δ2 of the
minimum Hamming distances δi of the component codes
[4]. Hence it is easy to construct product codes with large
minimum distance that do not suffer from error-floor
problems, as CTC may in the absence of a careful
permutation design.
In the iterative decoding process, all the rows and
columns of the received matrix are decoded sequentially
at each iteration. Thus not only the data bits but also the
parity bits can exploit the extrinsic information, which is
another advantage of serial over parallel concatenation.
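A minimal sketch of this construction, with a toy single-parity-check code standing in for the systematic component codes (any systematic encoder, e.g. extended Hamming or BCH, fits the same slot):

import numpy as np

def spc_encode(bits):
    # Toy systematic (k+1, k, 2) single-parity-check component encoder.
    return np.concatenate([bits, [bits.sum() % 2]])

def product_encode(M, enc_row=spc_encode, enc_col=spc_encode):
    # Encode the k1 x k2 information matrix [M] into the n1 x n2 matrix [C]:
    # first the k1 rows with C1, then all n2 columns with C2.
    rows = np.array([enc_row(r) for r in M])
    return np.array([enc_col(c) for c in rows.T]).T

M = np.random.randint(0, 2, (4, 4))
C = product_encode(M)
# Key property used by the iterative decoder: every row and every column
# of [C] is a code word of the corresponding component code.
assert not (C.sum(axis=0) % 2).any() and not (C.sum(axis=1) % 2).any()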

Fig. 5. Construction of the product code.

[Figure: decoder loop in which the block "Decoding of rows or columns of the matrix" takes [R] and [R(m)] and updates the extrinsic matrix [W(m)] into [W(m+1)], with a delay line on [R].]

Fig. 6. Block diagram of the Block Turbo Decoder.

The basic component of the Block Turbo Decoder is
the SISO decoder used for decoding the rows and
columns of the product code. The SISO decoder consists
of a modified Chase-II soft-input hard-output decoder
[25] augmented by a soft-output computation unit. Given
a soft-input sequence R in LLR form corresponding to a
row or column of the observation matrix [R], the Chase-II
decoder first forms the binary hard-decision sequence
Y from R. The reliability of the decision on the j-th coded
bit is given by the magnitude |r_j| of the corresponding soft
input. 2^s error patterns are generated by considering all
possible combinations of 0 and 1 in the s least reliable
bit positions. These error patterns are added to the
hard-decision sequence Y to form candidate sequences that are
decoded by a bounded-distance algebraic decoder. This
procedure returns a list Λ containing at most 2^s distinct
candidate code words. Among them, the code word D
at minimum Euclidean distance from the observation R
is selected as the Maximum Likelihood (ML) estimate.
On the basis of this decision, soft-output computation is
performed as follows. For a given bit in position j, the list
of candidate code words is searched for a concurrent
code word C at minimum Euclidean distance from R
and such that c_j ≠ d_j. If such a code word exists, the soft
output r'_j on this bit is given by:
r'_j = \frac{\|R - C\|^2 - \|R - D\|^2}{4} \, d_j    (3)

Otherwise, the soft output is computed empirically using:

r'_j = \beta \, d_j    (4)

where β is a positive constant which increases with the
iterations. As for CTC, the extrinsic information w_j about
bit j is finally obtained by subtracting the soft-input
contribution from the soft output computed by the decoder:

w_j = r'_j - r_j    (5)

The block diagram of a block turbo decoder is illustrated
in Fig. 6. [R] is the channel output LLR matrix, [W] is
the extrinsic information matrix and α(m) is another scaling
factor which increases with the iteration number m.
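The following Python sketch summarizes one row (or column) SISO decoding step as described above, implementing Eqs. (3)-(5). The bounded-distance algebraic decoder decode_hard is assumed to be available (returning a code word, or None on failure); antipodal transmission (0 -> -1, 1 -> +1) is assumed for the Euclidean metrics.

import numpy as np
from itertools import product as patterns

def chase_siso(r, decode_hard, s=4, beta=0.5):
    # r: soft input (LLRs) for one row or column of the observation matrix
    y = (r > 0).astype(int)                 # hard-decision sequence Y
    lr = np.argsort(np.abs(r))[:s]          # s least reliable positions
    cands = []                              # candidate list (Lambda)
    for pat in patterns([0, 1], repeat=s):  # the 2^s error patterns
        t = y.copy()
        t[lr] ^= pat
        c = decode_hard(t)
        if c is not None and not any((c == x).all() for x in cands):
            cands.append(c)
    dist = [np.sum((r - (2.0 * c - 1.0)) ** 2) for c in cands]
    best = int(np.argmin(dist))             # D: the ML decision
    D, dD = cands[best], dist[best]
    r_out = np.empty_like(r)
    for j in range(len(r)):
        d_j = 2.0 * D[j] - 1.0
        rivals = [m for c, m in zip(cands, dist) if c[j] != D[j]]
        if rivals:                          # concurrent code word found: Eq. (3)
            r_out[j] = (min(rivals) - dD) / 4.0 * d_j
        else:                               # no concurrent code word: Eq. (4)
            r_out[j] = beta * d_j
    return r_out - r                        # extrinsic information, Eq. (5)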

B. Examples of performance results

Performance of BCH-BTC using single- and double-error
correcting component codes at iteration 4 are given
in Table III for QPSK modulation over an AWGN
channel. Eb/N0 is the energy per bit to single-sideband
noise spectral density ratio for a target BER of 10^-4 and
Δ(Sh) is the gap to the Sphere-Packing Bound for the
same BER. Extended versions of BCH component codes
are used in order to maximize the product R·δ, which
guarantees maximum asymptotic gain. Sixteen test
patterns are used to generate the subset of candidate
code words based on the four (s = 4) least reliable bits.
We observe that product codes using BCH component
codes with minimum Hamming distance (MHD) 4
exhibit a gap Δ(Sh) of less than 1 dB while those with a
MHD of 6 exhibit a gap Δ(Sh) slightly higher than 1 dB
(< 1.2 dB). Hence, BCH-BTC perform close to the
theoretical limits in both cases.
C1, C2      | n     | k     | R     | δ  | Eb/N0 (dB) | Δ(Sh) (dB)
(16,11,4)   | 256   | 121   | 0.473 | 16 | 3.35       | 0.72
(16,7,6)    | 256   | 49    | 0.191 | 36 | 3.70       | 1.04
(32,26,4)   | 1024  | 676   | 0.660 | 16 | 3.05       | 0.81
(32,21,6)   | 1024  | 441   | 0.431 | 36 | 2.50       | 1.18
(64,57,4)   | 4096  | 3249  | 0.793 | 16 | 3.45       | 0.84
(64,51,6)   | 4096  | 2601  | 0.635 | 36 | 2.70       | 1.18
(128,120,4) | 16384 | 14400 | 0.879 | 16 | 4.10       | 0.88
(128,113,6) | 16384 | 12769 | 0.779 | 36 | 3.35       | 1.17
(256,247,4) | 65536 | 61009 | 0.931 | 16 | 4.80       | 0.95

Table III. Performance of different BCH-BTC at iteration 4
for a BER of 10^-4 on the AWGN channel using QPSK.

C. Significant advances in the field


This section reviews some of the most significant
improvements proposed for BTC over the last few years.
1) Reduced search for a concurrent codeword
A detailed analysis of the SISO decoding algorithm
shows that most of the decoding complexity lies in the
soft-output computation. Further investigations have
shown that this complexity is mainly due to the search
and test (parity and metric) for a concurrent code word C
in Λ for each component bit d_j of the decision D. In order
to reduce the complexity of the decoder, a slightly
different strategy has been proposed in [26]. Instead of
searching for a concurrent code word for every
component d_j of D, the simplified algorithm selects once
and for all the L code words in the candidate list Λ at
minimum Euclidean distance from the observation R.
Computation of the soft output is then

performed by restricting the search for concurrent code
words to these L selected candidate code words. For L = 1,
a single concurrent code word is considered and
complexity is divided by a factor of ~10. Metrics in Eq.
(3) are computed, and for each component d_j of D the
simplified algorithm applies Eq. (3) or (4) depending on
the outcome of the parity test (d_j = c_j). This concept can
be extended to higher values of L. Increasing the value of L
improves the performance of the SISO decoder at the
expense of complexity. A good trade-off between
performance and complexity is obtained for L = 3, as
shown in Table IV. The results reported in this table have
been obtained with BCH(128,120,4) (i.e. extended
Hamming) component codes. Using the simplified
algorithm described above, the SISO decoder complexity
for an extended Hamming code is less than 6000 gates
and is nearly independent of the code length.
L  | Gain (at BER 10^-6) | Complexity
1  | Ref.                | Ref.
3  | 0.06 dB             | +13.5%
16 | 0.13 dB             | +90%

Table IV. Performance and complexity of the SISO decoder
for a BCH(128,120,4) code as a function of the number L of
considered concurrent code words.

2) Adaptive computation of the scaling factor β

The main drawback of the SISO algorithm presented
previously comes from the use of an empirical constant
scaling factor β in the absence of concurrent code words
during the soft-output computation. This rough
approximation of the soft output results in a flattening of
the BER curve at high Eb/N0. This effect is amplified
when the considered number L of concurrent code words
in Λ is reduced. To mitigate this effect, we also proposed
in [26] to dynamically compute β for each decoded row
(or column) using the following equation:

\beta = \sum_{l \in \Omega} |r_l|    (6)

where Ω denotes the subset of least reliable bits at the
decoder input. To illustrate the resulting improvement, a
comparison of the two methods is given in Fig. 7. A
BCH(32,21,6)² product code is considered using QPSK
over an AWGN channel. The number of test patterns was
limited to 16, the number of concurrent code words was
limited to L = 3, 5-bit quantization was used at the decoder
input and 4 iterations were performed. We observe a
significant improvement in coding gain (> 0.25 dB) at
low BER (10^-8). Note also that the decoder performance
is very close (< 0.1 dB) to the theoretical ML asymptotic
performance bound [9]. This clearly shows that the
iterative decoder is asymptotically optimal and realizes a
good trade-off between complexity and performance.
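A sketch of this adaptive computation, under the reconstruction of Eq. (6) given above (β accumulated over the magnitudes of the least reliable samples; the exact definition of the subset Ω follows [26]):

import numpy as np

def adaptive_beta(r, least_reliable=4):
    # Omega: positions of the least reliable samples of the soft input r
    omega = np.argsort(np.abs(r))[:least_reliable]
    return np.sum(np.abs(r[omega]))         # Eq. (6)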
3) Stop criterion
An efficient stop criterion is easily derived based on the
particular structure of the product code. If all the rows
(resp. columns) after column (resp. row) decoding at a
given iteration are code words of C1 (resp. C2), then the
decoding algorithm has converged and the decoding
process is stopped. This stop criterion is very efficient
and the additional complexity is negligible. A detailed
study of this method can be found in [27].
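A sketch of this criterion; check_row and check_col stand for assumed syndrome (parity-check) tests of the two component codes:

def converged(C_hat, check_row, check_col):
    # Stop when every row of the current hard-decision matrix is a code
    # word of C1 and every column is a code word of C2.
    return (all(check_row(row) for row in C_hat) and
            all(check_col(col) for col in C_hat.T))

# Example with single-parity-check component codes:
# is_codeword = lambda w: w.sum() % 2 == 0
# if converged(C_hat, is_codeword, is_codeword): stop iterating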
[Figure: BER versus Eb/N0 (dB) curves, from 10^-2 down to 10^-8, for fixed β, variable β and the ML lower bound.]

Fig. 7. Performance comparison for the BTC(32,21,6)²
product code using fixed and variable β (QPSK over AWGN).

4) Adaptation of the product code parameters


For practical applications, it is very often necessary to
adapt the parameters (code length and dimension) of the
product code to those of the application. In [27], a method
relying on shortening and puncturing techniques was
introduced to overcome this problem. The idea is to
maximize the number of dummy bits used for shortening
and to minimize the number of punctured bits for a given
set of code parameters (length, dimension and rate). This
strategy is motivated by the fact that dummy bits are
known to the decoder and thus carry a high reliability,
whereas punctured bits carry no information (zero
reliability). Simulation results given in [27] show that,
within a certain limit, the modified BTC operate within
1 dB of the theoretical limit.
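As a worked example of the shortening arithmetic (using figures that reappear in Table VI below): removing x = 25 rows and 25 columns of dummy bits from the (64,57,4) extended Hamming code gives the (39,32) component code, hence a (39,32)×(39,32) product code with

payload = 32 × 32 = 1024 bits,  R = 32²/39² = 1024/1521 ≈ 0.673.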
5) Reed-Solomon BTC
BTC constructed from binary BCH component codes
have two important limitations. First, very large coded
blocks (> 60,000 bits) are required to achieve high code
rates (R > 0.9). This is not always compatible with
practical system constraints, especially when we consider
wireless transmissions over fast time-varying fading
channels. In addition, BCH-BTC combined with high-order
Quadrature Amplitude Modulation (QAM) in order
to achieve spectrally-efficient communications exhibit a
gap between actual code performance and the Shannon
limit that increases with the size of the modulation
alphabet [28]. Recent research results have shown that
non-binary BTC constructed from single-error correcting
Reed-Solomon codes overcome the aforementioned
limitations of binary BCH-BTC [29,30]. In particular,
RS-BTC can achieve reliable performance (within 1 dB
of the theoretical bound) with both binary and QAM
modulation over an AWGN channel with a low-complexity
decoder. In addition, non-binary RS-BTC
outperform binary BCH-BTC of similar code rate in terms
of memory size, implementation complexity and
decoding delay, since they exhibit an overall smaller code
length (by a factor of ~2.8).
6) High-rate decoding architectures for BTC
Several companies including TurboConcept (France)
or ComTech AHA (USA) already provide IP cores for
BTC that can operate at several (typically 20-200)
megabits per second. However, there are specific
applications where very high speed (gigabits per second)
decoders are required. Typical examples are data storage
systems or optical transmission. In order to meet such
throughput constraints, the trivial solution consists in
duplicating decoders operating in parallel with an
appropriate scheduling scheme (Mux and Demux). This
solution can be extremely expensive in the case of turbo
codes, as turbo decoders are generally more complex than
their classical counterparts. Furthermore, for turbo codes
with large block sizes, very large RAMs (Random Access
Memories) are required to store channel data and extrinsic
information. Thus, duplicating turbo decoders implies
duplicating RAMs, which is not very cost-efficient.
An innovative solution was proposed in [31] for
increasing the turbo decoding speed of BTC. In the
following, we distinguish between the memory and the
Processing Unit (PU) used to perform the SISO decoding.
In the case of product codes, all the rows (or columns) at
a given decoding step m can be decoded independently.
The idea here is to use several PUs in parallel, each
decoding a different row (or column) of the matrix. By
using p parallel PUs, the processing speed can be
increased by a factor of p without increasing the memory
size. A classical solution would be to read and store the
data at a speed p times faster than the decoding speed of
one PU. A more elegant solution is to read the p data
elements simultaneously, but this approach is not possible
with classical RAM technology. In the innovative
solution, we propose to break the memory down into p²
blocks and organize the data appropriately so that, at each
clock cycle, the data elements for the p decoders are read
simultaneously (note that the concept is identical for the
store operation).
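A sketch of a p² memory organization with this property. The bank mapping below is a plausible choice, not necessarily the one of [31]: it guarantees that p decoders working on p consecutive rows (or columns), each reading one element per clock cycle, always hit p distinct banks.

def bank_index(i, j, p):
    # Element (i, j) of the data/extrinsic matrix goes to bank
    # (i mod p, j mod p), i.e. one of p*p banks.
    return (i % p) * p + (j % p)

p, n = 4, 8
# Row phase: p decoders handle rows 0..p-1 and all read column t at cycle t.
for t in range(n):
    assert len({bank_index(i, t, p) for i in range(p)}) == p   # conflict-free
# Column phase: p decoders handle columns 0..p-1 and all read row t at cycle t.
for t in range(n):
    assert len({bank_index(t, j, p) for j in range(p)}) == p   # conflict-free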
This new architecture has been studied in [31] in terms
of trade-off between complexity and decoding speed.
Table V below compares different candidate
architectures that yield a factor p² on the decoding speed
of the BTC. It is clear from this table that the innovative
architecture provides an extremely attractive solution for
high data rate applications from all points of view (RAM,
PU complexity and decoding delay). The main limitation
here comes from the maximum number of samples
processed simultaneously in the PU. The current objective
is to design processing units capable of processing
8 samples per clock cycle. Thus, with a 100 MHz clock
and p = 8, a decoding speed of 6.4 gigabits per second
becomes feasible.

               | Trivial (PU)¹ | Classical (PU)¹ | Innovative (PU)^p
Data rate      | p²            | p²              | p²
RAM            | p²            | p               | 1
PU complexity  | p²            | p²              | p²/2
Decoding delay | 1             | ≈ 1/p           | ≈ 1/p²

Table V. Comparison of different architectures for high
data-rate decoding of BTC.

D. BTC in the standards

BTC are currently in use in several proprietary satellite
(VSAT) transmission systems. In addition, they have
(VSAT) transmission systems. In addition, they have
been adopted in 2001 as an optional FEC system for both
the uplink and downlink of the IEEE 802.16 standard
(WiMAX). The product code standardized in WiMAX is
obtained by serial concatenation of two identical
extended Hamming component codes. A straightforward
shortening strategy (deleting the first rows and columns
of the product code matrix) is applied in order to match
the required block size. The mother extended Hamming
code is either the (64,57,4) or the (32,26,4) code. The
resulting BTC configurations are given in Table VI.
More recently, BTC have been selected by the HomePlug
Powerline Alliance as part of the FEC system for
broadband home networking over the power line [32]. In
this context, BTC are used to protect sensitive frame
control data from the errors caused by severe impulsive
noise events.
Product Code               | Rate  | Payload size
(39,32)×(39,32) [Downlink] | 0.673 | 1024 bits
(53,46)×(51,44) [Downlink] | 0.749 | 3136 bits
(30,24)×(25,19) [Uplink]   | 0.608 | 456 bits

Table VI. BTC configurations for IEEE 802.16.

It has recently been discovered that both binary and
non-binary (RS) BTC also offer near-capacity
performance over the Binary Symmetric Channel (BSC)
and Binary Erasure Channel (BEC) models [28,30].
Combined with the fact that BTC exhibit large minimum
distances (low error floors) by construction, this result
suggests that high code-rate BTC are a promising
low-complexity FEC solution for applications where soft
outputs may not be available at the channel output for
economical (data storage systems) and/or technological
(optical transmissions) reasons.
IV. CONCLUSION
The introduction of Turbo Codes in 1993 really took
the channel coding community by surprise and initially
raised a lot of skepticism. Ten years later, Turbo Codes
are now a mature technology that has found its way into
practical industry standards. When the aim is to approach
closely the performance limit promised by Information
Theory, the iterative decoding concept offers
considerable savings (by several orders of magnitude)
compared to a single code. Furthermore, the additional
complexity required by turbo decoding, compared to the
simple 16- or 64-state Viterbi decoders abundantly used
over the last two decades, seems to be quite compatible
with the continuing progress in microelectronics. Higher
circuit frequencies and larger possibilities of parallelism
may even reduce the latency problem (a real weak point
of Turbo Codes) to a negligible level for most
applications. And beyond the simple introduction of a
new error-correcting solution, the Turbo Principle (i.e.
the way to process data iteratively in receivers so that no
information is wasted) has also opened up a new way of
thinking in the construction of communication
algorithms.
ACKNOWLEDGEMENT
The authors wish to acknowledge the assistance of all
their colleagues at ENST Bretagne who contributed to the
development and promotion of Turbo Codes, as well as
France Telecom R&D for its continuous financial support
over the years.
REFERENCES
[1] C. Berrou, A. Glavieux and P. Thitimajshima, "Near
Shannon limit error-correcting coding and decoding:
Turbo Codes," in Proc. IEEE Int. Conf. Commun. ICC93,
Geneva, Switzerland, May 1993, pp. 1064-1070.
[2] C. Berrou and A. Glavieux, "Near optimum error
correcting coding and decoding: Turbo Codes," IEEE Trans.
Commun., vol. 44, no. 10, Oct. 1996, pp. 1261-1271.
[3] R. Pyndiah, A. Glavieux, A. Picart and S. Jacq, "Near
optimum decoding of product codes," in Proc. IEEE
Global Telecommun. Conf. GLOBECOM'94, San
Francisco, CA, Dec. 1994, pp. 339-343.
[4] R. Pyndiah, "Near optimum decoding of product codes:
Block Turbo Codes," IEEE Trans. Commun., vol. 46, no.
8, Aug. 1998, pp. 1003-1010.
[5] G. Battail, "Coding for the Gaussian channel: the promise
of weighted-output decoding," Intl J. Sat. Commun., vol.
7, 1989, pp. 183-192.
[6] J. Hagenauer and P. Hoeher, "A Viterbi algorithm with
soft-decision outputs and its applications," in Proc. IEEE
Global Telecommun. Conf. GLOBECOM'89, Dallas, TX,
Nov. 1989, pp. 47.11-47.17.
[7] P. Elias, "Error-free coding," IRE Trans. Inform. Theory,
vol. 4, no. 4, Sept. 1954, pp. 29-39.
[8] R. G. Gallager, "Low-density parity-check codes," IRE
Trans. Inform. Theory, vol. IT-8, Jan. 1962, pp. 21-28.
[9] R. M. Tanner, "A recursive approach to low-complexity
codes," IEEE Trans. Inform. Theory, vol. 27, Sept. 1981,
pp. 543-547.
[10] S. Benedetto, D. Divsalar, G. Montorsi and F. Pollara,
"Serial concatenation of interleaved codes: performance
analysis, design and iterative decoding," IEEE Trans.
Inform. Theory, vol. 44, no. 3, May 1998, pp. 909-926.
[11] S. Dolinar and D. Divsalar, "Weight distributions for
Turbo Codes using random and nonrandom permutations,"
JPL TDA Progress Report, vol. 42-122, 15 Aug. 1995.
[12] C. Berrou, C. Douillard and M. Jézéquel, "Multiple
parallel concatenation of circular recursive systematic
convolutional (CRSC) codes," Ann. Télécommun., vol. 54,
no 3-4, Mar.-Apr. 1999, pp. 166-172.
[13] S. Dolinar, D. Divsalar and F. Pollara, "Code performance
as a function of block size," JPL TMO Progress Report,
vol. 42-133, May 1998.
[14] P. Robertson, P. Hoeher and E. Villebrun, "Optimal and
suboptimal maximum a posteriori algorithms suitable for
turbo decoding," European Trans. Telecommun., vol. 8,
Mar.-Apr. 1997, pp. 119-125.

[15] S. Benedetto, D. Divsalar, G. Montorsi and F. Pollara, "A


soft-input soft-output maximum a posteriori (MAP)
module to decode parallel and serial concatenated codes,"
JPL TDA Progress Report, vol. 42-127, Nov. 1996.
[16] A. Matache, S. Dolinar and F. Pollara, "Stopping rules for
turbo decoders," JPL TMO Progress Report, vol. 42-142,
Aug. 2000.
[17] C. Berrou, M. Jézéquel, C. Douillard and S. Kérouédan,
"The advantages of non binary turbo codes," in Proc.
IEEE Inform. Theory Workshop ITW'01, Cairns, Australia,
Sept. 2001, pp. 61-63.
[18] C. Douillard and C. Berrou, "Turbo Codes with rate-m/(m+1) constituent convolutional codes," to appear in
IEEE Trans. Commun., 2005-2006.
[19] S. Benedetto and G. Montorsi, "Unveiling Turbo Codes:
Some results on parallel concatenated coding schemes,"
IEEE Trans. Inform. Theory, vol. 42, no. 2, Mar. 1996, pp.
409-428.
[20] R. Garello, R. Pierleoni and S. Benedetto, "Computing the
free distance of Turbo Codes and serially concatenated
codes with interleavers," IEEE J. Select Areas Commun.,
vol. 19, no. 5, May 2001, pp. 800-812.
[21] C. Berrou, S. Vaton, M. Jézéquel and C. Douillard,
"Computing the minimum distance of linear codes by the
error impulse method," Proc. IEEE Global Telecommun.
Conf. GLOBECOM'02, Taipei, China, vol. 2, Nov. 2002,
pp. 1017-1020.
[22] S. ten Brink, "Convergence behavior of iteratively decoded
parallel concatenated codes," IEEE Trans. Commun., vol.
49, no. 10, Oct. 2001, pp. 1727-1737.
[23] S. Crozier and P. Guinand, "Distance upper bounds and
true minimum distance results for Turbo Codes with DRP
interleavers," Proc. 3rd Int. Symp. on Turbo Codes &
Related Topics ISTC'03, Brest, France, Sept. 2003, pp.
169-172.
[24] C. Berrou, Y. Saouter, C. Douillard, S. Kérouédan and M.
Jézéquel, "Designing good permutations for Turbo Codes:
Towards a single model," Proc. IEEE Int. Conf. Commun.
ICC'04, Paris, France, vol. 1, Jun. 2004, pp. 341-345.
[25] D. Chase, "A class of algorithms for decoding block codes
with channel measurement information," IEEE Trans.
Inform. Theory, vol. 18, no. 1, Jan. 1972, pp. 170-182.
[26] P. Adde and R. Pyndiah, "Recent simplifications and
improvements of Block Turbo Codes," Proc. 2nd Int.
Symp. on Turbo Codes & Related Topics ISTC'00, Brest,
France, Sept. 2000, pp. 133-136.
[27] R. Pyndiah, "Iterative decoding of product codes: Block
Turbo Codes," Proc. 1st Int. Symp. on Turbo Codes &
Related Topics ISTC'97, Brest, France, Sept. 1997, pp. 71-79.
[28] R. Pyndiah and P. Adde, "Performance of high code rate
BTC for non-traditional applications," Proc. 3rd Int. Symp.
on Turbo Codes & Related Topics ISTC'03, Brest, France,
Sept. 2003, pp. 157-160.
[29] R. Zhou, A. Picart, R. Pyndiah and A. Goalic, "Reliable
transmission with low-complexity Reed-Solomon Block
Turbo Codes," Proc. IEEE 1st Int. Symp. on Wireless
Commun. Systems ISWCS'04, Mauritius, Sept. 2004, pp.
193-197.
[30] R. Zhou, A. Picart, R. Pyndiah and A. Goalic, "Potential
applications of low-complexity non-binary high-rate Block
Turbo Codes," Proc. IEEE Military Commun. Conf.
MILCOM'04, Monterey, CA, Oct. 2004.
[31] J. Cuevas, P. Adde, S. Kérouédan and R. Pyndiah, "New
architecture for high rate turbo decoding of product
codes," Proc. IEEE Global Telecommun. Conf.
GLOBECOM'02, Taipei, China, vol. 2, Nov. 2002, pp.
1363-1367.
[32] HomePlug 1.0 Technical White Paper. [Online] Available:
http://www.homeplug.com .
