Professional Documents
Culture Documents
TELECOMMUNICATIONS
VOLUME 2
WILEY ENCYCLOPEDIA OF TELECOMMUNICATIONS
TELECOMMUNICATIONS
VOLUME 2
John G. Proakis
Editor
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic,
mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States
Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate
per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-750-4470, or on
the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John
Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, e-mail: permreq@wiley.com.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make
no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any
implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales
representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should
consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other
commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services please contact our Customer Care Department within the U.S. at
877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in
electronic format.
10 9 8 7 6 5 4 3 2 1
D
DATA COMPRESSION (or approximately restored) by the source decoder. The
channel code achieves the goal of reliable transmission:
JOHN KIEFFER The channel encoder inserts a small amount of redundancy
University of Minnesota in the channel input stream that will allow the channel
Minneapolis, Minnesota decoder to correct the transmission errors in that stream
that are caused by the channel.
2. Combined Code Design. One code (as in Fig. 1)
1. INTRODUCTION
is designed to accomplish the twin goals of bandwidth
efficiency and reliable transmission. Clearly, combined
A modern-day data communication system must be
code design is more general than separated code design.
capable of transmitting data of all types, such as text,
However, previously separated codes were preferred to
speech, audio, image or video data. The block diagram in
combined codes in data communication system design.
Fig. 1 depicts a data communication system, consisting of
There were two good reason for this: (a) Claude Shannon
source, encoder, channel, and decoder:
showed that if the data sequence is sufficiently long,
The source generates the data sequence that is to
and if the probabilistic models for the source and the
be transmitted through the data communication system.
channel are sufficiently simple, then there is no loss in the
The encoder converts the data sequence into a binary
bandwidth versus reliability tradeoff that is achievable
codeword for transmission through the channel. The
using separated codes of arbitrary complexity instead of
decoder generates a reconstructed data sequence that
combined codes of arbitrary complexity; and (b) the code
may or not be equal to the original data sequence.
design problem is made easier by separating it into the two
The encoder/decoder pair in Fig. 1 is the code of the
decoupled problems of source code design and channel code
data communication system. In Fig. 1, the source and
design. For the communication of short data sequences, or
channel are fixed; the choice of code is flexible, in order
for the scenario in which the complexity of the code is to be
to accomplish the twin goals of bandwidth efficiency and
constrained, there can be an advantage to using combined
reliable transmission, described as follows:
codes as opposed to separated codes; consequently, there
much attention has focused on the combined code design
1. Bandwidth efficiency — the portion of the available
problem since the mid-1980s. At the time of the writing of
channel bandwidth that is allocated in order
this article, however, results on combined code design are
to communicate the given data sequence should
somewhat isolated and have not yet been combined into a
be economized.
nice theory. On the other hand, the two separate theories
2. Reliable transmission — the reconstructed data of source code design and channel code design are well
sequence should be equal or sufficiently close to developed. The purpose of the present article is to provide
the original data sequence. an introduction to source code design.
Unfortunately, these are conflicting goals; less use of In source code design, one can assume that the
bandwidth makes for less reliable transmission, and communication channel introduces no errors, because
conversely, more reliable transmission requires the use the purpose of the channel code is to correct whatever
of more bandwidth. It is the job of the data communication channel errors occur. Thus, we may use Fig. 3 below, which
system designer to select a code that will yield a good contains no channel, as the conceptual model guiding
tradeoff between these two goals. Code design is typically source code design.
done in one of the following two ways. The system in Fig. 3 is called a data compression
system — it consists of the source and the source code
1. Separated Code Design. Two codes are designed, consisting of the (source encoder, source decoder) pair.
a source code and a channel code, and then the source The data sequence generated by the source is random
code and the channel code are cascaded together. Figure 2 and is denoted X n ; the notation X n is a shorthand for the
illustrates the procedure. The source code is the pair following random sequence of length n:
consisting of the source encoder and source decoder; the
channel code is the (channel encoder, channel decoder) X n = (X1 , X2 , . . . , Xn ) (1)
pair. The source code achieves the goal of bandwidth
efficiency: The source encoder removes a large amount of The Xi values (i = 1, 2, . . . , n) are the individual data
redundancy from the data sequence that can be restored samples generated by the source. In Fig. 3, BK is a
X̂ n . The sequence X̂ n is a deterministic function of BK , and 1.4. Lossless and Lossy Compression Systems
therefore also a deterministic function of X n ; X̂ n is random
and its distribution can be computed from the distribution Two types of data compression systems are treated in this
of X n . Notationally article: lossless compression systems and lossy compression
systems. In a lossless compression system, the set An of
X̂ n = (X̂1 , X̂2 , . . . , X̂n ) (3) possible data sequence inputs to the system is finite, the
encoder is a one-to-one mapping (this means that there is
where each X̂i (i = 1, 2, . . . , n) is an approximation to Xi . a one-to-one correspondence between data sequences and
their binary codewords), and the decoder is the inverse of
1.1. Mathematical Description of Source the encoder; thus, in a lossless compression system, the
random data sequence X n generated by the source and its
We need to give a formal mathematical model describing
reconstruction X̂ n at the decoder output are the same:
the probabilistic nature of the source in Fig. 3. Accordingly,
a source is defined to be a triple [n, An , Pn ], where Pr[X n = X̂ n ] = 1
• n is a positive integer that is the length of the random In a lossy compression system, two or more data sequences
data sequence X n generated by the source [n, An , Pn ]. in An are assigned the same binary codeword, so that
• An is the set of all sequences of length n which
are realizations of X n . (The set An models the set Pr[X n = X̂ n ] > 0
of all possible deterministic sequences that could be
Whether one designs a lossless or lossy compression
processed by the data compression system driven by
system depends on the type of data that are to
the source [n, An , Pn ].)
be transmitted in a data communication system. For
• Pn denotes the probability distribution of X n ; it is a example, for textual data, lossless compression is used
probability distribution on An . We have because one typically wants perfect reconstruction of the
transmitted text; on the other hand, for image data, lossy
Pr[X n ∈ Sn ] = Pn (Sn ), Sn ⊂ An
compression would be appropriate if the reconstructed
Pr[X n ∈ An ] = Pn (An ) = 1 image is required only to be perceptually equivalent to the
original image.
The alphabet of the source [n, An , Pn ] is the smallest set This article is divided into two halves. In the first half,
A such that An ⊂ An , where An denotes the set of all we deal with the design of lossless codes, namely, source
sequences of length n whose entries come from A. For codes for lossless compression systems. In the second half,
DATA COMPRESSION 633
design of lossy codes (source codes for lossy compression binary codeword for each sequence in xn ∈ An and how to
systems) is considered. compute xn from its codeword. We will specify each lossless
code discussed in this section using either method 1 or 2,
2. LOSSLESS COMPRESSION METHODOLOGIES depending on which method is more convenient. Method
1 is particularly convenient if the codeword lengths are
In this section, we shall be concerned with the problem known in advance, since, as pointed out earlier, a tree
of designing lossless codes. Figure 4 depicts a general can be constructed that yields the codewords. Method 2 is
lossless compression system. In Fig. 4, the pair consisting more convenient if the data length n is large (which makes
of source encoder and source decoder is called a lossless the storing of an encoding table impractical).
code. A lossless code is completely determined by its
source encoder part, since the source decoder is the inverse 2.1. Entropy Bounds
mapping of the source encoder. It is helpful to understand the entropy upper and lower
Let [n, An , Pn ] be a given source with finite alphabet. bounds on the performance of lossless codes. With these
When a lossless code is used to compress the data bounds, one can determine before designing a lossless code
generated by the source [n, An , Pn ], a lossless compression what kind of performance it is possible for such a code to
system Sn results as in Fig. 4. The effectiveness of the achieve, as well as what kind of performance it is not
lossless code is then evaluated by means of the figure possible to achieve.
of merit The entropy of the source [n, An , Pn ] is defined by
R(Sn ) = n−1
Pn (xn )K(xn )
xn ∈An Hn = (− log2 Pn (xn ))Pn (xn )
xn ∈An
where K(xn ) is the length of the codeword assigned by the
lossless code to the data sequence xn ∈ An . The figure of Suppose that the random data sequence generated by the
merit R(Sn ) is called compression rate and its units are source [n, An , Pn ] is compressed by an arbitrary lossless
‘‘code bits per data sample.’’ An efficient lossless code for code and let Sn be the resulting lossless compression
compressing data generated by the source [n, An , Pn ] is a system (as in Fig. 4). Then, the compression rate is known
lossless code giving rise to a compression system Sn for to satisfy the relationship
which the compression rate R(Sn ) is minimized or nearly
minimized. In this section, we put forth various types of Hn
R(Sn ) ≥ (5)
lossless codes that are efficient in this sense. n
In lossless code design, we make the customary
assumption that the codewords assigned by a lossless code Conversely, it is known that there exists at least one
must satisfy the prefix condition, which means that no lossless code for which
codeword is a prefix of any other codeword. If K1 , K2 , . . . , Kj Hn + 1
are the lengths of the codewords assigned by a lossless R(Sn ) ≤ (6)
n
code, then Kraft’s inequality
Assume that the data length n is large. We can combine the
2−K1 + 2−K2 + · · · + 2−Kj ≤ 1 (4) bounds (5) and (6) to assert that a lossless code is efficient
if and only if the resulting compression rate satisfies
must hold. Conversely, if positive integers K1 , K2 , . . . , Kj
obey Kraft’s inequality, then there exists a lossless code Hn
R(Sn ) ≈
whose codewords have these lengths. In this case, one n
can build a rooted tree T with j leaves and at most two
The quantity Hn /n is called the entropy rate of the source
outgoing edges per internal vertex, such that K1 , K2 , . . . , Kj
are the lengths of the root-to-leaf paths; the codewords are [n, An , Pn ]. Our conclusion is that a lossless code is efficient
obtained by labeling the edges along these paths with 0s for compressing source data if and only if the resulting
and 1s. The tree T can be found by applying the Huffman compression rate is approximately equal to the entropy
algorithm (covered in Section 2.2) to the set of probabilities rate of the source. For the memoryless source and the
2−K1 , 2−K2 , . . . , 2−Kj . Markov source, this result can be sharpened, as the
There are two methods for specifying a lossless code for following discussion shows.
the source [n, An , Pn ]: (1) an encoding table can be given
2.1.1. Efficient Lossless Codes for Memoryless Sources.
which lists the binary codeword to be assigned to each
Assume that the given source [n, An , Pn ] is a memoryless
sequence in An (decoding is then accomplished by using
source; let A be the finite source alphabet. There is a
the encoding table in reverse), or (2) encoding and decoding
probability mass function [p(a): a ∈ A] on A such that
algorithms can be given that indicate how to compute the
n
Pn (x1 , x2 , . . . , xn ) = p(xi ), (x1 , x2 , . . . , xn ) ∈ An (7)
i=1
Source Source
Source Xn BK Xn
Let H0 be the number
encoder decoder
H0 = (− log2 p(a))p(a) (8)
Figure 4. Lossless data compression system. a∈A
634 DATA COMPRESSION
It is easy to show from (7) that the source entropy satisfies [42], a code called the Shannon–Fano code was put
forth that obeys the monotonicity principle; it assigns
Hn = nH0 a codeword to data sequence (x1 , x2 , . . . , xn ) ∈ An of length
Let H0 be the number (8) and H1 be the number P(a1 )K1 + P(a2 )K2 + · · · + P(aj )Kj
H1 = (− log2 π(a1 , a2 ))p(a1 )π(a1 , a2 ) is minimized. The Huffman algorithm constructs the
a1 ∈A a2 ∈A encoding table of the Huffman code. The Huffman
algorithm operates recursively in the following way.
It can be shown from these last two equations that First, the letters ai , aj with the two smallest probabilities
P(ai ), P(aj ) are removed from the alphabet A and
Hn = H0 + (n − 1)H1 replaced with a single ‘‘superletter’’ ai aj of probability
P(ai ) + P(aj ). Then, the Huffman code for the reduced
Thus, for large n, the entropy rate Hn /n of the source alphabet is extended to a Huffman code for the original
is approximately equal to H1 . Assuming that the data alphabet by assigning codeword w0 to ai and codeword
length n is large, we conclude that a lossless code for w1 to aj , where w is the codeword assigned to the
compressing the data generated by the stationary Markov superletter ai aj .
source [n, An , Pn ] is efficient if and only if the resulting Table 1 gives an example of the encoding table for
compression rate is approximately equal to H1 . the Huffman code for a first-order source with alphabet
In the rest of this section, we survey each of the {a1 , a2 , a3 , a4 }. It is easy to deduce that the code given by
following efficient classes of lossless codes: Table 1 yields minimum compression rate without using
the Huffman algorithm. First, notice that the codeword
• Huffman codes lengths K1 , K2 , . . . , Kj assigned by a minimum compression
• Enumerative codes rate code must satisfy the equation
• Arithmetic codes
2−K1 + 2−K2 + · · · + 2−Kj = 1 (12)
• Lempel–Ziv codes
Fix a source [n, An , Pn ] with An finite; let Sn denote a Source Letter Probability Codeword
lossless compression system driven by this source (see
Fig. 4). In 1948, Claude Shannon put forth the following 1
a1 0
monotonicity principle for code design for the system 2
Sn : The length of the binary codeword assigned to each 1
data sequence in An should be inversely related to the a2 10
5
probability with which that sequence occurs. According 3
to this principle, data sequences with low probability of a3 110
20
occurrence are assigned long binary codewords, whereas 3
data sequences with high probability of occurrence are a4 111
20
assigned short binary codewords. In Shannon’s 1948 paper
DATA COMPRESSION 635
[Kraft’s inequality (2.4) is satisfied; if the inequality were Step 3. Assign an integer weight to each vertex v of
strict, at least one of the codewords could be shortened T ∗ as follows. If v has no siblings or is further to the
and the code would not yield minimum compression rate.] left than its siblings, assign v a weight of zero. If
Since the source has only a four-letter alphabet, there v has siblings further to the left, assign v a weight
are only two possible solutions to (12), and therefore only equal to the number of leaves of T ∗ that are equal
two possible sets of codeword lengths for a first-order to or subordinate to the siblings of v that are to the
code, namely, 1,2,3,3 and 2,2,2,2. The first choice yields left of v.
compression rate (or, equivalently, expected codeword Step 4. To encode xn ∈ An , follow the root-to-leaf path
length) of in T ∗ terminating in the leaf of T ∗ labeled by xn . Let
1 12 + 2 15 + 3 20
3
+ 3 20
3
= 1.8 I be the sum of the weights of the vertices along this
root-to-leaf path. The integer I is called the index
code bits per data sample, whereas the second choice yields of xn , and satisfies 0 ≤ I ≤ N − 1. Encode xn into
the worse compression rate of 2 code bits per data sample. the binary codeword of length log2 N obtained by
Hence, the code given by Table 1 must be the Huffman finding the binary expansion of I and then padding
code. One could use the Huffman algorithm to find this that expansion (if necessary) to log2 N bits by
code. Combining the two least-probable letters a3 and a4 appending a prefix of zeros.
into the superletter a3 a4 , the following table gives the
Huffman encoder for the reduced alphabet: For example, suppose that the source [n, An , Pn ]
satisfies
Source Letter Probability Codeword An = {aaa, aba, abb, abc, baa, bba, caa, cba, cbb, cca}
(13)
1 Then steps 1–3 yield the tree T ∗ in Fig. 5, in which every
a1 0
2 vertex is labeled with its weight from step 3. [The 10
6 leaves of this tree, from left to right, correspond to the 10
a3 a4 11
20 sequences in (13), from left to right.] The I values along
1 the 10 paths are just the cumulative sums of the weights,
a2 10
5 which are seen to give I = 0,1,2,3,4,5,6,7,8,9. The codeword
length is log2 10 = 4, and the respective codewords are
(This is immediate because for a three-letter alphabet,
there is only one possible choice for the set of codeword 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001
lengths, namely, 1,2,2.) Expanding the codeword 11 for
a3 a4 into the codewords 110 and 111, we obtain the Sometimes, the tree T ∗ has such a regular structure
Huffman code in Table 1. that one can obtain an explicit formula relating a data
sequence and its index I, thereby dispensing with the
2.3. Enumerative Codes need for the tree T ∗ altogether. A good example of this
occurs with the source [n, An , P] in which An is the set of
Enumerative coding, in its present state of development, all binary sequences of length n and having a number of
is due to Thomas Cover [11]. Enumerative coding is used ones equal to m (m is a fixed positive integer satisfying
for a source [n, An , Pn ] in which the data sequences in An
are equally likely; the best lossless code for such a source
is one that assigns codewords of equal length. Here is 0
Cover’s approach to enumerative code design:
0 ≤ m ≤ n). For a data sequence (x1 , x2 , . . . , xn ) ∈ An , Cover a memoryless source [n, An , Pn ] in which An consists of all
showed that its index I is computable as sequences of length n whose entries come from the set
{0, 1, 2, . . . , j − 1}, where j is a fixed positive integer. Then,
n
j−1 for each data sequence (x1 , x2 , . . . , xn ) ∈ An , the probability
I= xj (14)
x1 + x2 + · · · + xj assigned by the source is
j=1
n
There are mn sequences in An . Therefore, Eq. (14) provides Pn (x1 , x2 , . . . , xn ) = pxi
a one-to-one correspondence between these sequences and i=1
the integers
n where p0 , p1 , . . . , pj−1 are given nonnegative numbers
0, 1, 2, 3, . . . , −1
m that sum to one. We specify the arithmetic code for
this source by describing algorithms for encoding and
Having obtained I from (x1 , x2 , . . . , xn ) via this equation, decoding. Let a0 = 0, a1 = 1. Arithmetic encoding of
(x1 , x2 , . . . , xn )is encoded by expanding integer I in binary (x1 , x2 , . . . , xn ) ∈ An takes place according to the following
out to log2 mn bits. Conversely, (x1 , x2 , . . . , xn ) is decoded three-step algorithm:
from its bit representation by first finding I, and then
finding the unique expansion of I in (14) given by the Encoding Step 1. For each i = 1, 2, . . . , n, a subin-
right-hand side. To illustrate, suppose that n = 8 and terval Ii = [ai , bi ] of [0, 1] is recursively determined
m = 4. Then, An contains 84 = 70 sequences and I can be according to the formula
any integer between 0 and 69, inclusively. Suppose I = 52.
To decode back into the data sequence in An that gave rise Ii = [ai−1 , ai−1 + (bi−1 − ai−1 )p0 ], xi = 0
to the index I = 52, the decoder finds the unique integers
0 ≤ j1 < j2 < j3 < j4 < 8 satisfying = [ai−1 + (p0 + · · · + pxi −1 )(bi−1 − ai−1 ), ai−1
+ (p0 + · · · + pxi )(bi−1 − ai−1 )], xi > 0
j j j j
52 = 1 + 2 + 3 + 4
1 2 3 4 By construction, the last interval In will have length
equal to Pn (x1 , x2 , . . . , xn ).
The solution is j1 = 1, j2 = 4, j3 = 5, and j4 = 7. The
data sequence we are looking for must have ones in Encoding Step 2. The integer
positions j1 + 1 = 2, j2 + 1 = 5, j3 + 1 = 6, j4 + 1 = 8; this is
the sequence (0,1,0,0,1,1,0,1). k = − log2 Pn (x1 , x2 , . . . , xn ) + 1
2.4.1. Precise Description. Arithmetic codes can be Example 1. Take the source alphabet to be {0, 1}, take
constructed for any source. For simplicity, we assume the probabilities defining the memoryless source to be
DATA COMPRESSION 637
p0 = 25 , p1 = 35 , and take the source to be [5, {0, 1}5 , P5 ] x3 = 1. Similarly, the decoder can determine x4 and x5 by
(known to both encoder and decoder). Suppose that the two more rounds of this procedure.
data sequence to be arithmetically encoded is (1, 0, 1,
1, 0). We need to recursively determine the intervals The reader sees from the preceding example that
I1 , I2 , I3 , I4 , I5 . We have the arithmetic code as we have prescribed it requires
ever greater precision as more and more data samples
3 2 are processed. This creates a problem if there are
I1 = right ths of [0, 1] = ,1
5 5 a large number of data samples to be arithmetically
encoded. One can give a more complicated (but less
2 2 16
I2 = left ths of I1 = , intuitive) description of the arithmetic encoder/decoder
5 5 25
that uses only finite precision (integer arithmetic is
3 62 16 used). A textbook [41] gives a wealth of detail on
I3 = right ths of I2 = ,
5 125 25 this approach.
3 346 16 2.4.2. Performance. In a sense that will be described
I4 = right ths of I3 = ,
5 625 25 here, arithmetic codes give the best possible performance,
2 346 1838 for large data length. First, we point out that for any
I5 = left ths of I4 = , source, an arithmetic code can be constructed. Let the
5 625 3125
source be [n, An , Pn ]. For (x1 , x2 , . . . , xn ) ∈ An , one can use
108 conditional probabilities to factor the probability assigned
The length of the interval I5 is . Therefore, the length to this sequence:
3125
of the binary codeword must be
n
3125 Pn (x1 , x2 , . . . , xn ) = p(x1 ) p(xi | x1 , x2 , . . . , xi−1 ).
k = log2 +1=6 i=2
108
1784 The factors on the right side are explained as follows. Let
The midpoint of the interval I5 is M = . Expanding X1 , X2 , . . . , Xn be the random data samples generated by
3125
this number in binary, we obtain the source. Then
of bits in the binary expansion of kj − 1 is log2 (kj) . Let the source. Thus, one can use the same Lempel–Ziv code
Wj be the string of bits of length log2 (kj) assigned to Ij for all sources — such a code is called a universal code [13].
as described in the preceding text. Then, the Lempel–Ziv In practical compression scenarios, the Lempel–Ziv
encoder output is obtained by concatenating together the code has been superseded by more efficient modern
strings W1 , W2 , . . . , Wt . dictionary codes, such as the YK algorithm [51].
To illustrate, suppose that the data sequence
(x1 , x2 , . . . , xn ) is binary (i.e., k = 2), and has seven
3. LOSSY COMPRESSION METHODOLOGIES
blocks B1 , B2 , . . . , B7 in its Lempel–Ziv parsing. These
blocks are assigned, respectively, strings of code
In this section, we shall be concerned with the problem
bits W1 , W2 , W3 , W4 , W5 , W6 , W7 of lengths log2 (2) =
of designing lossy codes. Recall from Fig. 3 that a lossy
1, log2 (4) = 2, log2 (6) = 3, log2 (8) = 3, log2 (10) =
compression system consists of source, noninvertible
4, log2 (12) = 4, and log2 (14) = 4. Therefore, any binary
source encoder, and source decoder. Figure 6 gives
data sequence with seven blocks in its Lempel–Ziv
separate depictions of the source encoder and source
parsing would result in an encoder output of length
decoder in a lossy compression system.
1 + 2 + 3 + 3 + 4 + 4 + 4 = 21 code bits. In particular, for
The same notational conventions introduced earlier are
the data sequence (2.17), the seven strings W1 , . . . , W7 are
in effect here: X n [Eq. (1)] is the random data sequence
[referring to (21)]
of length n generated by the source, X̂ n (1.3) is the
reconstructed data sequence, and BK (2) is the variable-
W1 = (1)
length codeword assigned to X n . The source decoder
W2 = (1, 0) component of the lossy compression system in Fig. 3 is the
cascade of the ‘‘quantizer’’ and ‘‘lossless encoder’’ blocks
W3 = (0, 1, 1) in Fig. 6a; the source decoder component in Fig. 3 is the
W4 = (0, 0, 0) lossless decoder of Fig. 6b. The quantizer is a many-to-
one mapping that converts the data sequence X n into its
W5 = (1, 0, 0, 0) reconstruction X̂ n . The lossless encoder is a one-to-one
W6 = (0, 1, 1, 0) mapping that converts the reconstructed data sequence
X̂ n into the binary codeword BK from which X̂ n can be
W7 = (0, 0, 0, 1) recovered via application of the lossless decoder to BK .
It is the presence of the quantizer that distinguishes a
Concatenating, we see that the codeword assigned to data lossy compression system from a lossless one. Generally
sequence (17) by the Lempel–Ziv encoder is speaking, the purpose of the quantizer is alphabet
reduction — by dealing with a data sequence from a
(1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1) (22) reduced alphabet instead of the original data sequence
over the original alphabet, one can hope to perform
We omit a detailed description of the Lempel–Ziv data compression using fewer code bits. For example, the
decoder. However, it is easy to see what the decoder would ‘‘rounding-off quantizer’’ is a very simple quantizer. Let
do. For example, it would be able to break up the codeword the data sequence
(22) into the separate codewords for the phrases, because,
from the size k of the data alphabet, it is known how many X n = (1.1, 2.6, 4.4, 2.3, 1.7) (23)
code bits are allocated to the encoding of each Lempel–Ziv
phrase. From the separate codewords, the decoder recovers be observed. Then, the rounding-off quantizer generates
the integer representing each phrase; dividing each of the reconstructed data sequence
these integers by k to obtain the quotient and remainder,
the pairs representing the phrases are obtained. Finally, X̂ n = (1, 3, 4, 2, 2) (24)
these pairs yield the phrases, which are concatenated
together to obtain the original data sequence. It takes only 10 code bits to compress (24), because its
entries come from the reduced alphabet {1, 2, 3, 4}; it
2.5.4. Performance. Let [n, An , Pn ] be any data source
with alphabet of size k. Let Sn be the lossless data
compression system driven by this source that employs (a)
the Lempel–Ziv code. It is known that there is a positive Lossless
Source Xn Quantizer X̂ n BK
constant Ck (depending on k but not on n) such that encoder
Hn Hn Ck
≤ R(Sn ) ≤ +
n n log2 n
(b)
where Hn is the source entropy. Thus, the Lempel–Ziv code Lossless
BK X̂ n
is not quite as good as the Huffman code or the arithmetic decoder
code, but there is an important difference. The Huffman
code and arithmetic code require knowledge of the source. Figure 6. Source encoder (a) and decoder (b) in lossy compres-
The preceding performance bound is valid regardless of sion system.
640 DATA COMPRESSION
would take many more code bits to compress the original the rate–distortion tradeoffs that are possible in lossy code
sequence (23). design is called rate–distortion theory. Section 3.1 gives an
As indicated in Fig. 6, a lossy code consists of quantizer, introduction to this subject.
lossless encoder, and lossless decoder. When a lossy code is The quantizer employed in a lossy code can be one of
used to compress the data generated by a source [n, An , Pn ], two types, either a scalar quantizer or vector quantizer.
a lossy compression system Sn results. The effectiveness A scalar quantizer quantizes one data sample at a time,
of the lossy code is then evaluated by means of two figures whereas for some m > 1, a vector quantizer quantizes
of merit, the (compression) rate R(Sn ) and the distortion m data samples at a time. Lossy codes that employ
D(Sn ), defined by scalar quantizers are called ‘‘SQ-based codes’’ and are
covered in Section 3.2; lossy codes that employ vector
E[K]
quantizers are called ‘‘VQ-based codes’’ and are covered
R(Sn ) =
n in Section 3.3. Subsequent sections deal with two other
n important lossy coding techniques, trellis-based coding,
D(Sn ) = n−1
E[d(Xi , X̂i )] and transform coding.
i=1
3.1. Distortion Bounds
In the preceding, E denotes the expected value operator, In designing a lossy code for a source [n, An , Pn ] to produce
K is the random codeword length of the codeword BK a lossy compression system Sn , the usual approach is
assigned to X n in Fig. 5a, and d is a fixed distortion the fixed-rate approach, in which one attempts to find a
function mapping pairs of source letters into nonnegative lossy code that minimizes or approximately minimizes the
real numbers. The distortion function d is typically one of distortion D(Sn ) among all lossy codes satisfying the rate
the following two types: constraint R(Sn ) ≤ R, where R > 0 is a fixed constant. We
adopt the fixed-rate approach for the rest of this article.
1. Squared-error distortion: In this section, we will give upper and lower bounds on
the distortion performance of lossy codes for a fixed source,
d(Xi , X̂i ) = (Xi − X̂i )2 subject to the constraint that the compression rate be no
more than R code bits per data sample, on average. With
2. Hamming distortion: these bounds, one can determine before designing a lossy
code what types of performance are and are not possible
d(Xi , X̂i ) = 0, Xi = X̂i for such a code to achieve. The upper and lower bounds
1, otherwise on distortion performance that shall be developed in this
section are expressible using Claude Shannon’s notion of
Squared-error distortion is typically used for an infinite distortion–rate function [43], discussed next.
source alphabet and Hamming distortion is typically used
for a finite source alphabet. When squared-error distortion 3.1.1. Distortion–Rate Function. The concept of mutual
is used, distortion is sometimes measured in decibels as information will be needed in order to define the distor-
the figure tion–rate function. Let X, Y be two random variables.
Let f (x) be the probability density function of X, and let
n
∞
−1
g(y | x) be the conditional probability density function of Y
n x2 fXi (x) dx
−∞ given X = x. Then, the mutual information I(X;Y) of X, Y
= 10 log10
i=1
[D(Sn )]dec is the number
D(Sn ) ∞ ∞
I(X; Y) = f (x)g(y | x) log2
−∞ −∞
where Xi (1 ≤ i ≤ n) represents the ith data sample
generated by the source [n, An , Pn ] and fXi denotes the g(y | x)
probability density function of Xi ; note that small D(Sn ) ×
∞
dx dy
would correspond to a large decibel measure of distortion. f (u)g(y | u)du
−∞
Suppose that two different lossy codes have been
designed to compress the random data generated by a We are now ready to define the concept of distortion–rate
given source [n, An , Pn ], resulting in lossy compression function. For simplicity, we restrict ourselves to the
systems S1n and S2n , respectively. Then, one can declare memoryless source [n, An , Pn ]. We suppose that the source
that the lossy code giving rise to system S1n is better than alphabet is a subset of the real line; let X be a random
the lossy code giving rise to system S2n if R(S1n ) < R(S2n ) variable such that the random data samples Xi (1 ≤ i ≤ n)
and D(S1n ) < D(S2n ). However, it may be that neither generated according to the memoryless source [n, An , Pn ]
lossy code is better than the other one in this sense, are independent copies of X. The distortion–rate function
since the inverse relation between rate and distortion D(R) of the memoryless source [n, An , Pn ] (for a given
precludes the design of a lossy code for which rate and nonnegative distortion function d such as Hamming
distortion are simultaneously small. Instead, for a given distortion or squared-error distortion) is then defined for
source, the design goal should be to find a lossy code each R > 0 by
that yields the smallest rate for a fixed distortion, or the
smallest distortion for a fixed rate. The theory detailing D(R) = min{E[d(X, Y)] : I(X; Y) ≤ R} (25)
DATA COMPRESSION 641
where Y denotes any random variable jointly distributed that are this efficient. This represents a clear difference
with X, and we assume that the minimum in (25) exists between lossless code and lossy code design; it is known
and is finite. how to construct efficient lossless codes, but it is a
computationally difficult problem to find efficient lossy
Example 4. One important type of memoryless source codes [18]. For example, for large n, the nth-order
for which the distortion–rate function has a closed-form Gaussian memoryless source can be encoded to yield
expression is the Gaussian memoryless source [n, An , Pn ], squared-error distortion of roughly ‘‘6 decibels per bit’’
which is the memoryless source whose independent (the distortion–rate function performance), but only since
random data samples are generated according to a 1990 has it been discovered how to find such codes for this
Gaussian distribution with mean 0 and variance σ 2 . simple source model [29].
For the Gaussian memoryless source with squared-
error distortion 3.2. SQ-Based Codes
D(R) = σ 2 2−2R (26) Let R be a fixed positive integer. A 2R -bit scalar quantizer
for quantizing real numbers in the interval [a, b] is a
Sources such as this one in which D(R) can be mapping Q from [a, b] into a finite subset of [a, b] of size
computed explicitly are rare. However, in general, one N = 2R . Let I1 , I2 , . . . , IN be subintervals of [a, b] that form
can use the Blahut algorithm [6] to approximate D(R) a partition of [a, b] (the Ij values are called the quantization
arbitrarily closely. intervals of Q). Let L1 , L2 , . . . , LN be points in [a, b] chosen
so that Lj ∈ Ij for each j = 1, 2, . . . , N (Lj is called the
3.1.2. Distortion Lower Bound. Fix a memoryless quantization level for interval Ij ). The quantizer Q accepts
source [n, An , Pn ] with distortion–rate function D(R). Fix as input any real number x in the interval [a, b]. The output
R > 0, and fix any lossy code for [n, An , Pn ] for which Q(x) generated by the quantizer Q in response to the input
the resulting compression system Sn satisfies the rate x is the quantization level Lj assigned to the subinterval Ij
constraint R(Sn ) ≤ R. Then, the following well-known [4] of [a, b] containing x. In other words, the 2R -bit quantizer
distortion lower bound is valid: Q is a nondecreasing step function taking 2R values.
Let [n, An , Pn ] be a source (such as a memoryless source)
D(Sn ) ≥ D(R) (27) in which each randomly generated data sample has the
same probability density function f . Let the alphabet for
Example 5. Assume the nth-order Gaussian memory- [n, An , Pn ] be the interval of real numbers [a, b]. Let
less source model and squared-error distortion. Substitut- Q be a 2R -bit scalar quantizer defined on [a, b]. We
ing the expression for D(R) in (26) into (27), one concludes describe a lossy code for the source [n, An , Pn ] induced
that any lossy code with compression rate ≤ R yields by Q. Referring to Fig. 6a, we must explain how the
distortion in decibels no greater than (20 log10 2)R ≈ 6R. lossy code quantizes source sequences of length n and
Thus, a Gaussian memoryless source cannot be encoded losslessly encodes the quantized sequences. The lossy code
to yield a better distortion performance than ‘‘6 decibels quantizes each source sequence (x1 , x2 , . . . , xn ) ∈ An into
per bit.’’ the sequence (Q(x1 ), Q(x2 ), . . . , Q(xn )); this makes sense
because each entry of each source sequence belongs to the
3.1.3. Distortion Upper Bound. Let f be a probability interval [a, b] on which Q is defined. Assign each of the 2R
density function. For each n = 1, 2, . . ., let [n, An , Pn ] be the quantization levels of Q an R-bit binary address so that
memoryless source in which n independent random data there is a one-to-one correspondence between quantization
samples are generated according to the density f . Assume levels and their addresses; then, the lossy code losslessly
finiteness of the distortion–rate function D(R) for these encodes (Q(x1 ), Q(x2 ), . . . , Q(xn )) by replacing each of its
sources [since the distortion–rate function depends only entries with its R-bit address, yielding an overall binary
on f , all of these sources will have the same distortion- codeword of length nR. Let Sn be the lossy compression
rate function D(R)]. Fix R > 0. It is well known [35, 53] system in Fig. 6 arising from the lossy code just described,
that there is a positive constant C such that for every and let d be the distortion function that is to be used. It is
n = 2, 3, . . ., there is a lossy code for [n, An , Pn ] such not hard to see that
that the rate and distortion for the resulting compression
system Sn satisfy R(Sn ) = R
b
R(Sn ) ≤ R D(Sn ) = d(x, Q(x))f (x) dx
a
C log2 n
D(Sn ) ≤ D(R) + (28)
n If in the preceding construction we let Q vary over all
2R -bit scalar quantizers on [a, b], then we obtain all
Combining the distortion upper bound (28) with the possible SQ based lossy codes for the source [n, An , Pn ]
distortion lower bound (27), if n is large, there must with compression rate R.
exist a code for an nth order memoryless source that Let R be a positive integer. Consider the following prob-
yields rate ≤ R and distortion ≈ D(R); that is, the lem: Find the 2R -bit scalar quantizer Q on [a, b] for which
distortion–rate function D(R) does indeed describe the
b
distortion performance of the most efficient lossy codes.
But, in general, it is not known how to find codes d(x, Q(x))f (x) dx
a
642 DATA COMPRESSION
is minimized. This quantizer Q yields the SQ based lossy cannot be achievable using SQ-based lossy codes — more
code for the source [n, An , Pn ], which has compression rate sophisticated lossy codes would be required.
R and minimal distortion. We call this scalar quantizer
the minimum distortion 2R -bit scalar quantizer. In general, it may not be possible to solve the Eqs. (29)
and (30) to find the Lloyd–Max quantizer explicitly. But,
3.2.1. Lloyd–Max Quantizers. We present the solution a numerical approximation of it can be found using the
given in other papers [17,28,30] to the problem of LBG algorithm covered in the next section.
finding the minimum distortion 2R -bit scalar quantizer
for squared-error distortion. Suppose that the probability 3.3. VQ-Based Codes
density function f satisfies Fix a positive integer m. Let R denote the set of real
b
numbers, and let Rm denote m-dimensional Euclidean
x2 f (x) dx < ∞ space, that is, the set of all sequences (x1 , x2 , . . . , xm ) in
a which each entry xi belongs to R. An m-dimensional vector
quantizer Q is a mapping from Rm onto a finite subset C of
and that − log2 f is a concave function on [a, b]. Let Q be Rm . The set C is called the codebook of the m-dimensional
a 2R -bit scalar quantizer on [a, b] with quantization levels vector quantizer Q; the elements of C are called codevectors,
{Lj }j=1
N
and quantization intervals {Ij }j=1
N
, where N = 2R . Let and if x ∈ Rm , then Q(x) is called the codevector for
x. We define a mapping to be a vector quantizer if it
y0 < y1 < · · · < yN−1 < yN is an m-dimensional vector quantizer for some m. The
scalar quantizers can be regarded as special cases of the
be the points such that interval Ij has left endpoint yj−1 vector quantizers (they are the one-dimensional vector
and right endpoint yj (j = 1, . . . , N). We call Q a Lloyd–Max quantizers); the codebook of a scalar quantizer is just the
quantizer if set of its quantization levels, and a codevector for a scalar
quantizer is just one of its quantization levels.
yj = ( 12 )[Lj + Lj+1 ], j = 1, . . . , N − 1 (29) Let Q be an m-dimensional vector quantizer with
codebook C. We call Q a nearest-neighbor quantizer if
and yj Q quantizes each x ∈ Rm into a codevector from codebook
xf (x) dx C that is at least as close to x in Euclidean distance as any
Lj = yj
j−1 y
, j = 1, 2, . . . , N. (30) other codevector from C. (We use squared-error distortion
f (x) dx in this section; therefore, only nearest-neighbor quantizers
yj−1 will be of interest to us.)
Fix a memoryless source [n, An , Pn ] whose alphabet
There is only one 2R -bit Lloyd–Max scalar quan- is a subset of the real line, such that n is an integer
tizer, and it is the unique minimum distortion 2R -bit multiple of m; let fm be the common probability density
scalar quantizer. function possessed by all random vectors of m consecutive
samples generated by this source. Let Q be a fixed
Example 6. One case in which it is easy to solve the m-dimensional nearest-neighbor vector quantizer whose
Lloyd–Max equations (29) and (30) is the case in which codebook is of size 2j . We describe how Q induces a lossy
R = 1 and f is an even function on the whole real line. The code for the source [n, An , Pn ] that has compression rate
unique Lloyd–Max quantizer Q is then given by R = j/m. Referring to Fig. 6, we have to explain how the
induced lossy code quantizes source sequences generated
∞
by [n, An , Pn ], and how it losslessly encodes the quantized
2 xf (x) dx, x<0
sequences. Let (x1 , x2 , . . . , xn ) ∈ An be any source sequence.
Q(x) = 0
∞
Partitioning, one obtains the following n/m blocks of
−2 xf (x) dx, x ≥ 0
0
length m lying in m-dimensional Euclidean space Rm :
Each codevector in C can be uniquely represented using gives rise to a distortion figure that can be obtained using
a j-bit binary address. The induced lossy code losslessly symmetry considerations. The 2D plane can be partitioned
encodes the quantized sequence (x̂1 , x̂2 , . . . , x̂n ) into the into eight congruent regions over which the integrals of
binary codeword obtained by concatenating together the the integrand in (31) are identical. One of these regions is
binary addresses of the codevectors above that were used
to form the quantized sequence; this binary codeword π π
S = (x1 , x2 ) : x1 ≥ 0, − tan x1 ≤ x2 ≤ tan x1
is of fixed length (n/m)j = nR, and so, dividing by n, 8 8
the compression rate R code bits per data sample has
The region S is the set of points in the plane R2 that
been achieved. Let Sn be the lossy compression system
are closest in Euclidean distance to the codevector (1, 0).
that arises when the lossy code just described is used
Therefore, the distortion is
to compress the data sequences generated by the source
[n, An , Pn ]. The resulting rate and distortion performance
1 −(x21 + x22 )
are given by 4 [(x1 − 1)2 + x22 ] exp dx1 dx2
2π 2
S
j
R(Sn ) = R =
m This integral is easily evaluated via conversion to
−1 polar coordinates. Doing this, one obtains distortion of
D(Sn ) = m min ||x − y|| fm (x) dx
2
(31)
Rm y∈C 5.55 dB. One can improve the distortion to 5.95 dB by
increasing the radius of the circle around which the eight
where ||x − y||2 denotes the square of the Euclidean codevectors in the codebook are distributed. Examining
distance between vectors x = (xi ) and y = (yi ) in Rm : the distortion–rate function of the Gaussian memoryless
source, we see that a distortion of about 9 dB is best
m
possible at the compression rate of 1.5 code bits per data
||x − y||2 = (xi − yi )2
i=1
sample. Hence, we can obtain a >3-dB improvement in
distortion by designing a VQ-based code that uses a vector
If we let Q vary over all nearest-neightbor m-dimensional quantizer of dimension >2.
vector quantizers whose codebooks are of size 2j , then
the lossy codes for the source [n, An , Pn ] induced by 3.3.1. LBG Algorithm. The LBG algorithm [27] is an
these are the VQ-based codes of rate R = j/m. Obviously, iterative algorithm for vector quantizer design. It works
of this large number of lossy codes, one would want for any dimension m, and employs a large set T of ‘‘training
to choose one of minimal distortion; however, it is vectors’’ from Rm . The training vectors, for example, could
typically an intractable problem to find such a minimum represent previously observed data vectors of length m.
distortion code. An initial codebook C0 contained in Rm of some desired
size is selected (the size of the codebook is a reflection of
Example 8. Suppose that the source [n, An , Pn ] for the desired compression rate). The LBG algorithm then
which a VQ-based code is to be designed is a Gaussian recursively generates new codebooks
memoryless source with mean 0 and variance 1. Suppose
that the desired compression rate is 1.5 code bits per C1 , C2 , C3 , . . .
data sample. Since 1.5 = 32 , we can use a two-dimensional
vector quantizer with codebook of size 23 = 8. The table as follows. Each codebook Ci (i ≥ 1) is generated from the
below gives one possibility for such a vector quantizer. previous codebook Ci−1 in two steps:
The left column gives the codevectors in the codebook;
the right column gives the binary address assigned to Step 1. For each v ∈ T, a closest vector to v in Ci−1
each codevector. The codevectors in this codebook can be is found (with respect to Euclidean distance), and
visualized as eight equally spaced points along the unit recorded. Let xv denote the vector that is recorded
circle in the plane, which is of radius 1 and centered at the for v ∈ T.
origin (0, 0). Step 2. For each vector y ∈ {xv : v ∈ T} recorded in step
1, the arithmetic mean of all vectors v in T for which
Codevector Address xv = y is computed. The set of arithmetic means
forms the new codebook Ci .
(1, 0) 000
(cos(π/4), sin(π/4)) 001 The LBG codebooks {Ci } either eventually coincide, or else
(0, 1) 010 eventually keep cycling periodically through a finite set
(cos(3π/4), sin(3π/4)) 011 of codebooks; in either case, one would stop iterations
(−1, 0) 100 of the LBG algorithm at a final codebook Ci as soon
(cos(5π/4), sin(5π/4)) 101 as one of these two scenarios occurs. The final LBG
(0, −1) 110 codebook, if used to quantize the training set, will
(cos(7π/4), sin(7π/4)) 111 yield smaller distortion on the training set than any of
the previously generated codebooks (including the initial
The lossy code induced by this 2D vector quantizer, when codebook). The LBG algorithm has been used extensively
used to compress data generated by the source [n, An , Pn ], since its discovery for VQ-based code design in both
644 DATA COMPRESSION
theoretical and practical scenarios. The one-dimensional that is an integer multiple of m. Let P be the set of all
LBG algorithm can be used to find a good approximation paths in G that begin at v∗ and consist of n/m edges. Then
to the unique Lloyd–Max quantizer, if the training set
is big enough. However, in higher dimensions, the LBG • Each path in P gives rise to a sequence of length n
algorithm has some drawbacks: (1) the final LBG codebook over the alphabet A if one writes down the labels on
may depend on the initial choice of codebook or (2) the its edges in the order in which these edges are visited.
final LBG codebook may not yield minimum distortion.
• Let R = j/m. There are 2nR paths in P, and it
Various stochastic relaxation and neural network–based
consequently takes nR bits to uniquely identify any
optimization techniques have been devised to overcome
one of these paths.
these deficiencies [38, Chap. 14;52].
3.3.2. Tree-Structured Vector Quantizers. In tree- A trellis is a pictorial representation of all of the
structured vector quantization [19, Chap. 12;32], the 2j paths in P. The trellis-based lossy code for quantiz-
codevectors in the VQ codebook are placed as labels on ing/encoding/decoding all the data sequences in An works
the 2j leaf vertices of a rooted binary tree of depth j. Each in the following way:
of the 2j − 1 internal vertices of the tree is also labeled
with some vector. To quantize a vector x, one starts at the Quantization Step. Given the data sequence (x1 , x2 , . . . ,
root of the tree and then follows a unique root-to-leaf path xn ) from An , the encoder finds a ‘‘minimal path’’ in
through the tree by seeing at each intermediate vertex of P, which is a path in P giving rise to a sequence
the path which of its two children has its label closer to (y1 , y2 , . . . , yn ) in An for which
x, whereupon the next vertex in the path is that child; x
is quantized into the codevector at the terminus of this
n
G of length n/m = 3 that start at v0 are then given the The minimal path is the path whose sum of weights
trellis representation: is the smallest; this is the path marked in bold in the
following figure:
(1,1)/0
u0 (1,1)
(1,0)/1
(−1,−1)/1
a
b
(1,1) (1,1) (1,1)
(1,0) (1,0) (1,0)
b
a
(0,−1) (0,−1) c
(−1,−1) (−1,−1) d
samples at the decoder, on average, while transmitting use squared-error distortion. Then, the compression rate
only half of the bits that would be necessary for perfect R and distortion D for this lossy code are
reconstruction (which would require a compression rate of
2 bits per sample). R = R1 + R2
n
n
3.4.1. Marcellin–Fischer Codes. Marcellin and Fischer D = n−1 E[(Xi(1) − X̂i(1) )2 ] + n−1 E[(Xi(2) − X̂i(2) )2 ]
[29] developed an important class of trellis-based codes. i=1 i=1
In certain applications, the data samples to be compressed is a minimum, where in the first minimization Q1 ranges
are not one-dimensional quantities; that is, for some over all 2R1 -bit quantizers, and in the second minimization
m > 1, each data sample is a point in m-dimensional Q2 ranges over all 2R2 -bit quantizers. Finding the best
Euclidean space. way to split up R into a sum R = R1 + R2 is called the bit
allocation problem in lossy source coding.
We may be able to obtain a smaller distortion by
Example 11. One may wish to compress a large square transforming the original data pairs from R2 into another
image by compressing the sequence of 8 × 8 blocks that sequence of pairs from R2 , and then doing the quantization
are obtained from partitioning the image. Each data and encoding of the transformed pairs in a manner similar
‘‘sample’’ in this case could be thought of as a point in to what we did above for the original pairs. This is the
m-dimensional Euclidean space with m = 64. (The JPEG philosophy behind transform coding. In the following, we
image compression algorithm takes this point of view.) make this idea more precise.
For simplicity, let us assume that the dimension of the Let m be a fixed positive integer. The data sequence to
data samples is m = 2. Let us write the data samples as be compressed is
random pairs X1 , X2 , . . . , Xn (34)
! "
Xi(1) where each Xi is an Rm -valued random column vector of
, i = 1, 2, . . . , n (32)
Xi(2) the form
(1)
Xi
Suppose that one is committed to lossy codes that employ X (2)
only scalar quantization. The sequence Xi = i..
.
Xi(m)
X1(1) , X2(1) , . . . , Xn(1)
We write the data sequence in more compact form as the
might have one type of model that would dictate that some m × n matrix M(X) whose columns are the Xi values:
scalar quantizer Q1 be used to quantize these samples. On
the other hand, the sequence
M(X) = [X1 X2 · · · Xn ]
X1(2) , X2(2) , . . . , Xn(2)
Let A be an m × m invertible matrix of real numbers. The
matrix A transforms the matrix M(X) into an m × n matrix
might have another type of model that would dictate the
M(Y) as follows:
use of a different scalar quantizer Q2 . Let us assume that
Q1 is a 2R1 -bit quantizer and that Q2 is a 2R2 -bit quantizer. M(Y) = AM(X)
The sequence resulting from quantization of (3.32) would
then be ! " Equivalently, A can be used to transform each column of
X̂i(1) M(X) as follows:
, i = 1, 2, . . . , n (33)
X̂i(2)
Yi = AXi , i = 1, 2, . . . , n
where =
X̂i(1) Q1 (Xi(1) )
and = X̂i(2) Q2 (Xi(2) ).
The sequence
(33) is transmitted by the encoder to the decoder using Then, M(Y) is the matrix whose columns are the Yi values:
nR1 + nR2 bits, and is reconstructed by the decoder as the
decoder’s estimate of the original sequence (32). Let us M(Y) = [Y1 Y2 · · · Yn ]
DATA COMPRESSION 647
Let Y(j) denote the jth row of M(Y)(j = 1, 2, . . . , m). Then, The decoder must decode the separate bit streams and
the matrix M(Y) can be partitioned in two different ways: then combine the decoded streams by means of an inverse
transform to obtain the reconstructions of the original
Y(1) data samples; this ‘‘back end’’ of the system is called the
Y(2) synthesis stage of the transform code, and is depicted in
M(Y) = [Y1 Y2 · · · Yn ] = .. Fig. 7b.
.
In Fig. 7b, Ŷ(j) is the row vector of length n obtained by
Y(m) quantizing the row vector Y(j) (j = 1, 2, . . . , m). The matrix
M(Ŷ) is the m × n matrix whose rows are the Ŷ(j) terms.
The row vectors Y(j) (j = 1, 2, . . . , m) are the coefficient The matrix M(X̂) is also m × n, and is formed by
streams generated by the transform code; these streams
are separately quantized and encoded for transmission M(X̂) = A−1 M(Ŷ)
to the decoder, thereby completing the ‘‘front end’’ of the
transform code, which is called the analysis stage of the Let us write the columns of M(X̂) as
transform code and is depicted in Fig. 7a.
Note in Fig. 7a that the rates at which the separate
X̂1 , X̂2 , . . . , X̂n .
encoders operate are R1 , R2 , . . . , Rm , respectively. The
separate quantizers in Fig. 7a are most typically taken
These are the reconstructions of the original data
to be scalar quantizers (in which case the Ri terms
sequence (34).
would be integers), but vector quantizers could be used
The distortion (per sample) D resulting from using the
instead (in which case all Ri would be rational numbers).
transform code to encode the source [n, An , Pn ] is
If we fix the overall compression rate to be the target
value R, then the bit allocation must be such that R =
n
R1 + R2 + · · · + Rm . The m separate bit streams generated D = n−1 E[||Xi − X̂i ||2 ] (35)
in the analysis stage are multiplexed into one rate R i=1
bit stream for transmission to the decoder, who then
obtains the separate bit streams by demultiplexing — this With the compression rate fixed at R, to optimize
multiplexing/demultiplexing part of the transform code, the transform code, one must select R1 , R2 , . . . , Rm and
which presents no problems, has been omitted in Fig. 7a. quantizers at these rates so that R = R1 + R2 + · · · + Rm
(a)
Multiply
(b)
Form
Multiply
A−1
…
…
…
Matrix
Figure 7. (a) Analysis stage and (b) synthesis stage of transform code.
648 DATA COMPRESSION
and D is minimized. One can also attempt to optimize the Burrows–Wheeler transform [7]; and (2) grammar-based
choice of the transformation matrix A as well, although, codes [25,31], which are based on grammar transforms.
more typically, the matrix A is fixed in advance. If A is
an orthogonal matrix (i.e., the transformation is unitary),
4. NOTES ON THE LITERATURE
then the right side of (35) is equal to
n This article has provided an introduction to basic data
n−1 E[||Yi − Ŷi ||2 ] compression techniques. Further material may be found
i=1 in one textbook [41] that provides excellent coverage of
both lossless and lossy data compression, and in another
which makes the problem of optimally designing the textbook [40] that provides excellent coverage of lossless
transform code much easier. However, A need not data compression. There have been several excellent
be orthogonal. survey articles on data compression [5,15,20].
Various data compression standards employ the basic
Various transforms have been used in transform data compression techniques that were covered in this
coding. Some commonly used transforms are the discrete article. Useful material on data compression standards
cosine transform, the discrete sine transform, and the may be found in a book by Gibson et al. [21], which covers
Karhunen–Loeve transform — a good account of these the JPEG and JBIG still-image compression standards,
transforms may be found in Ref. 41, Chap. 12. Special the MPEG audio and MPEG video compression standards,
mention should be made of the wavelet transform, which and multimedia conferencing standards.
is becoming ubiquitous in state of the art compression
methods. Various image compression methods employ the 4.1. Speech Compression
wavelet transform, including the EZW method [44], the
Speech is notoriously hard to compress — consequently, a
SPIHT method [39], and the EBCOT method [46]; the
specialized body of techniques have had to be developed
wavelet transform is the backbone of the JPEG 2000
for speech compression. Since these specialized techniques
image compression standard [47]. The wavelet transform
are not applicable to the compression of other types of
has also been applied to fingerprint compression, speech,
data, and since this article has focused rather on general
audio, electrocardiogram, and video compression — a good
compression techniques, speech compression has been
account of these applications may be found in Ref. 45,
omitted; coverage of speech compression may be found
Chap. 11.
in the textbook by Deller et al. [14].
Subband coding may be regarded as a special case
of transform coding. It originally arose in the middle
4.2. Fractal Image Compression
1970s as an effective technique for speech compression
[12]; subsequently, its applications have grown to include Fractal image compression has received much attention
the compression of recorded music, image compression, since 1987 [2,3,16,23]. A fractal image code compresses
and video compression. In subband coding, one filters an image by encoding parameters for finitely many
a data sequence of length n by both a highpass filter contractive mappings, which, when iterated, yield an
and a lowpass filter and then throws every other filtered approximation of the original image. The limitations of
sample away, yielding a stream of n/2 highpass-filtered the fractal image compression method arise from the fact
samples and a stream of n/2 lowpass-filtered samples; that it is not clear to what extent natural images can be
these two streams are separately quantized and encoded, approximated in this way.
completing the analysis stage of a subband coding system.
The highpass and lowpass filters are not arbitrary; 4.3. Differential Coding
these must be complementary in the sense that if no
Differential codes (such as delta modulation and differen-
quantization and encoding takes place, then the synthesis
tial pulsecode modulation), although not covered in this
stage must perfectly reconstruct the original sequence of
article, have received coverage elsewhere in this encyclo-
data samples. Further passes of highpass and lowpass
pedia. Differential codes have become part of the standard
filterings can be done after the first pass in order to yield
body of material covered in a first course in communication
several substreams with frequency content in different
systems [26, Chap. 6].
subbands; in this way, the subband coding system can
become as sophisticated as one desires — one can even
obtain the wavelet transform via a certain type of subband BIOGRAPHY
decomposition. Subband coding has received extensive
coverage in several recent textbooks on multirate and John C. Kieffer received the B.S. degree in applied math-
multiresolution signal processing [1,49,50]. ematics in 1967 from the University of Missouri, Rolla,
We have concentrated on lossy transform codes, but Missouri, and the M.S. and Ph.D. degrees in mathematics
there are lossless transform codes as well. A lossless from the University of Illinois Champaign — Urbana in
transform code is similar in concept to a lossy transform 1968 and 1970, respectively. He joined the University of
code; the main difference is that the transformed data Missouri Rolla (UMR) in 1970 as an Assistant Professor.
sequence is not quantized in a lossless transform code. At UMR he worked on ergodic theory and information
Good examples of lossless transform codes are (1) theory. Since 1986, he has been a Professor at the Univer-
the transform code for text compression based on the sity of Minneapolis Twin Cities Department of Electrical
DATA COMPRESSION 649
and Computer Engineering, where he has been working 22. D. Huffman, A method for the construction of minimum
on data compression. Dr. Kieffer has 70 MathSciNet pub- redundancy codes, Proc. IRE 40: 1098–1101 (1952).
lications. He was named a Fellow of the IEEE in 1993. 23. A. Jacquin, Image coding based on a fractal theory of iterated
His areas of interest are grammar-based codes, trellis- contractive image transformations, IEEE Trans. Image Proc.
coded quantization, and the interface between information 1: 18–30 (1992).
theory and computer science. 24. N. Jayant and P. Noll, Digital Coding of Waveforms, Prentice-
Hall, Englewood Cliffs, NJ, 1984.
BIBLIOGRAPHY 25. J. Kieffer and E. Yang, Grammar-based codes: A new class of
universal lossless source codes, IEEE Trans. Inform. Theory
1. A. Akansu and R. Haddad, Multiresolution Signal Decompo- 46: 737–754 (2000).
sition: Transforms, Subbands, Wavelets, Academic Press, San
26. B. Lathi, Modern Digital and Analog Communications
Diego, 1992.
Systems, 3rd ed., Oxford Univ. Press, New York, 1998.
2. M. Barnsley, Fractals Everywhere, Academic Press, Boston,
27. Y. Linde, A. Buzo and R. Gray, An algorithm for vector
1988.
quantizer design, IEEE Trans. Commun. 28: 84–95 (1980).
3. M. Barnsley and L. Hurd, Fractal Image Compression,
28. S. Lloyd, Least squares quantization in PCM, IEEE Trans.
Wellesley, AK Peters, Ltd., MA, 1993.
Inform. Theory 28: 129–137 (1982).
4. T. Berger, Rate Distortion Theory: A Mathematical Basis for
29. M. Marcellin and T. Fischer, Trellis coded quantization
Data Compression, Prentice-Hall, Englewood Cliffs, NJ, 1971.
of memoryless and Gauss-Markov sources, IEEE Trans.
5. T. Berger and J. Gibson, Lossy source coding, IEEE Trans. Commun. 38: 82–93 (1990).
Inform. Theory 44: 2693–2723 (1998).
30. J. Max, Quantizing for minimum distortion, IRE Trans.
6. R. Blahut, Computation of channel capacity and rate- Inform. Theory 6: 7–12 (1960).
distortion functions, IEEE Trans. Inform. Theory 18: 460–473
(1972). 31. C. Nevill-Manning and I. Witten, Compression and explana-
tion using hierarchical grammars, Comput. J. 40: 103–116
7. M. Burrows and D. Wheeler, A block-sorting lossless data
(1997).
compression algorithm, unpublished manuscript, 1994.
32. A. Nobel and R. Olshen, Termination and continuity of greedy
8. J. Cleary and I. Witten, Data compression using adaptive
growing for tree-structured vector quantizers, IEEE Trans.
coding and partial string matching, IEEE Trans. Commun.
Inform. Theory 42: 191–205 (1996).
32: 396–402 (1984).
33. R. Pasco, Source Coding Algorithms for Fast Data Compres-
9. J. Conway and N. Sloane, Fast quantizing and decoding
sion, Ph.D. thesis, Stanford Univ., 1976.
algorithms for lattice quantizers and codes, IEEE Trans.
Inform. Theory 28: 227–232 (1982). 34. W. Pennebaker, J. Mitchell, G. Langdon, and R. Arps, An
10. J. Conway and N. Sloane, Sphere Packings, Lattices, and overview of the basic principles of the Q-coder adaptive binary
Groups, 2nd ed., Springer-Verlag, New York, 1993. arithmetic coder, IBM J. Res. Dev. 32: 717–726 (1988).
11. T. Cover, Enumerative source encoding, IEEE Trans. Inform. 35. R. Pilc, The transmission distortion of a source as a function
Theory 19: 73–77 (1973). of the encoding block length, Bell Syst. Tech. J. 47: 827–885
(1968).
12. R. Crochiere, S. Webber and J. Flanagan, Digital coding of
speech in subbands, Bell Syst. Tech. J. 56: 1056–1085 (1976). 36. J. Rissanen, Generalized Kraft inequality and arithmetic
coding, IBM J. Res. Dev. 20: 198–203 (1976).
13. L. Davisson, Universal noiseless coding, IEEE Trans. Inform.
Theory 19: 783–795 (1973). 37. J. Rissanen and G. Langdon, Arithmetic coding, IBM J. Res.
Dev. 23: 149–162 (1979).
14. J. Deller, J. Proakis and J. Hansen, Discrete-Time Processing
of Speech Signals, Macmillan, Englewood Cliffs, NJ, 1993. 38. H. Ritter, T. Martinez and K. Schulten, Neural Computation
and Self-Organizing Maps, Addison-Wesley, Reading, MA,
15. D. Donoho, M. Vetterli, R. DeVore and I. Daubechies, Data
1992.
compression and harmonic analysis, IEEE Trans. Inform.
Theory 44: 2435–2476 (1998). 39. A. Said and W. Pearlman, A new fast and efficient coder based
16. Y. Fisher, ed., Fractal Image Compression: Theory and on set partitioning in hierarchical trees, IEEE Trans. Circuits
Application, Springer-Verlag, New York, 1995. Syst. Video Tech. 6: 243–250 (1996).
17. P. Fleischer, Sufficient conditions for achieving minimum 40. D. Salomon, Data Compression: The Complete Reference,
distortion in a quantizer, IEEE Int. Conv. Record, 1964, Part Springer-Verlag, New York, 1998.
I, Vol. 12, pp. 104–111. 41. K. Sayood, Introduction to Data Compression, 2nd ed.,
18. M. Garey, D. Johnson and H. Witsenhausen, The complexity Morgan Kaufmann, San Francisco, 2000.
of the generalized Lloyd-Max problem, IEEE Trans. Inform. 42. C. Shannon, A mathematical theory of communication, Bell
Theory 28: 255–256 (1982). System Tech. J. 27: 379–423, 623–656 (1948).
19. A. Gersho and R. Gray, Vector Quantization and Signal 43. C. Shannon, Coding theorems for a discrete source with
Compression, Kluwer, Boston, 1992. a fidelity criterion, IRE Nat. Conv. Record, 1959, Part 4,
20. R. Gray and D. Neuhoff, Quantization, IEEE Trans. Inform. pp. 142–163.
Theory 44: 2325–2383 (1998). 44. J. Shapiro, Embedded image coding using zerotrees of wavelet
21. J. Gibson et al., Digital Compression for Multimedia: Princi- coefficients, IEEE Trans. Signal Process. 41: 3445–3462
ples and Standards, Morgan-Kaufmann, San Francisco, 1998. (1993).
650 DESIGN AND ANALYSIS OF A WDM CLIENT/SERVER NETWORK ARCHITECTURE
45. G. Strang and T. Nguyen, Wavelets and Filter Banks, the system must provide more bandwidth. For example, a
Wellesley-Cambridge Univ. Press, Wellesley, MA, 1996. video-on-demand system that provides 1000 videostreams
46. D. Taubman, High performance scalable image compression of MPEG2 quality needs about 5–6 Gbps of bandwidth at
with EBCOT, IEEE Trans. Image Process. 9: 1158–1170 the server side. Such high-bandwidth requirement can-
(2000). not be satisfied in conventional client/server systems.
47. D. Taubman and M. Marcellin, JPEG 2000: Image Com- The most recent technological advances in optical com-
pression Fundamentals, Standards and Practice, Kluwer, munication, especially wavelength-division multiplexing
Hingham, MA, 2000. (WDM) technology, as well as computer hardware, make
48. G. Ungerboeck, Channel coding with multilevel/phase sig- the design of such a high-bandwidth server possible. How-
nals, IEEE Trans. Inform. Theory 28: 55–67 (1982). ever, what kind of network architecture can we use to
construct such high-bandwidth network? How well is the
49. P. Vaidyanathan, Multirate Systems and Filter Banks,
Prentice-Hall, Englewood Cliffs, NJ, 1993.
performance of the proposed network architecture? We
describe and study a client/server WDM network archi-
50. M. Vetterli and J. Kovačević, Wavelets and Subband Coding, tecture and its performance. This architecture is suitable
Prentice-Hall, Englewood Cliffs, NJ, 1995.
for multimedia services, and is unicast-, multicast-, and
51. E. Yang and J. Kieffer, Efficient universal lossless data broadcast-capable.
compression algorithms based on a greedy sequential The remainder of this article is organized as follows.
grammar transform. I. Without context models, IEEE Trans. We outline the enabling technologies in Section 2. The
Inform. Theory 46: 755–777 (2000).
WDM networking technology is summarized in Section 3.
52. K. Zeger, J. Vaisey and A. Gersho, Globally optimal vector In Section 4, we describe the system architecture and
quantizer design by stochastic relaxation, IEEE Trans. Signal connection setup protocol. Section 5 describes the system
Process. 40: 310–322 (1992).
performance and analysis for unicast and multicast
53. Z. Zhang, E. Yang and V. Wei, The redundancy of source services. We discuss the scheduling policy for user requests
coding with a fidelity criterion. I. Known statistics, IEEE in Section 6. Finally, Section 7 provides concluding
Trans. Inform. Theory 43: 71–91 (1997). remarks.
54. J. Ziv and A. Lempel, A universal algorithm for data
compression, IEEE Trans. Inform. Theory 23: 337–343 2. WDM CLIENT–SERVER NETWORK-ARCHITECTURE-
(1977). ENABLING TECHNOLOGIES
55. J. Ziv and A. Lempel, Compression of individual sequences
via variable-rate coding, IEEE Trans. Inform. Theory 24: Optical fiber communication plays a key role to enable
530–536 (1978). high-bandwidth data connection. Now, a single fiber
strand has a potential bandwidth of 50 THz corresponding
to the low-loss region shown in Fig. 1. To make good use
DESIGN AND ANALYSIS OF A WDM of the potential bandwidth of the fiber, WDM technology
CLIENT/SERVER NETWORK ARCHITECTURE is widely used. In WDM, the tremendous bandwidth of
a fiber (up to 50 THz) is divided into many nonoverlap-
WUSHAO WEN ping wavelengths, called WDM channels [1]. Each WDM
BISWANATH MUKHERJEE channel may operate at whatever possible speed indepen-
University of California, Davis dently, ranging from hundreds of Mbps to tens of Gbps.
Davis, California WDM technology is now deployed in the backbone net-
works [1,2]. OC-48 WDM backbone networks have already
1. INTRODUCTION
Output fiber 2
Input fiber 2
l1, ..., l4
l2
l4 l4
l4
Input fiber 4 Output fiber 4
l4
l4 l1, ..., l4
Passive star
been widely implemented, and the OC-192 WDM backbone • Passive Star. A passive star (see Fig. 2) is a
network is being tested and implemented. The enabling ‘‘broadcast’’ device. A signal that is inserted on a
technologies of the WDM client/server network architec- given wavelength from an input fiber port will have
ture include tunable/nontunable wavelength transmitter its power equally divided among (and appear on the
and receiver, optical multiplexer, optical tap, and high- same wavelength on) all output ports. As an example,
performance computer in additional to WDM technology. in Fig. 2, a signal on wavelength λ1 from input fiber 1
We outline the basic functions for each optical component. and another on wavelength λ4 from input fiber 4 are
broadcast to all output ports. A ‘‘collision’’ will occur
• Transmitter and Receiver. An optical transmitter [1] when two or more signals from the input fibers are
is used to convert an electrical signal into an opti- simultaneously launched into the star on the same
cal signal and inject it to the optical transmission wavelength. Assuming as many wavelengths as there
media, namely, the fiber. An optical receiver actually are fiber ports, an N × N passive star can route N
consists of an optical filter [1] and an opticoelectri- simultaneous connections through itself.
cal signal transformer. The transmitter and receiver • Optical Tap. An optical tap is similar to a one-input,
can be classified as tunable and nontunable. A non- two-output passive star. However, the output power
tunable transmitter/receiver transmits or receives in one port is much higher than that in the other. The
signal only from a fixed frequency range, implying output port with higher power is used to propagate a
that only fixed wavelengths can be used. For a tun- signal to the next network element. The other output
able transmitter/receiver, the passband can be tuned port with lower power is used to connect to a local
to different frequencies, which makes it possible to end system. The input fiber can carry more than one
transmit or receive data for different wavelengths wavelength in an optical tap.
at different times. In a WDM system, the band-
width of an optical fiber is divided into multiple
3. WDM NETWORKING ARCHITECTURE
channels (or wavelengths). Communication between
two nodes is possible only when the transmitter of 3.1. Point-to-Point WDM Systems
the source node and receiver of the destination node
WDM technology is being deployed by several telecom-
are tuned to the same channel during the period of
munication companies for point-to-point communications.
information transfer. There are four types of con-
This deployment is being driven by the increasing
figuration between two communication end nodes.
demands on communication bandwidth. When the demand
They are fixed transmitter/fixed receiver, fixed trans-
exceeds the capacity in existing fibers, WDM is a more
mitter/tunable receiver, tunable transmitter/fixed
cost-effective alternative compared to laying more fibers.
receiver, and tunable transmitter/tunable receiver. A
WDM mux/demux in point-to-point links is now
tunable transmitter/receiver is more expensive than
available in product form. Among these products, the
a fixed transmitter/receiver.
maximum number of channels is 160 today, but this
• Optical Multiplexer. An optical multiplexer combines number is expected to increase.
signals from different wavelengths on its input ports
(fiber) onto a common output port (fiber) so that a 3.2. Broadcast-and-Select (Local) Optical WDM Network
single outgoing fiber can carry multiple wavelengths A local WDM optical network may be constructed by
at the same time. connecting network nodes via two-way fibers to a passive
652 DESIGN AND ANALYSIS OF A WDM CLIENT/SERVER NETWORK ARCHITECTURE
(1) single-hop [3] or (2) multihop [4]. Also, note that, when
Unicast a source transmits on a particular wavelength λ1 , more
than one receiver can be tuned to wavelength λ1 , and all
such receivers may pick up the information stream. Thus,
Multicast the passive star can support ‘‘multicast’’ services.
Detailed, well-established discussions on these network
architectures can be found elsewhere, [e.g., 1,3,4].
Workstation
4.1. Architecture
Passive star coupler In our client/server architecture, the server is equipped
with M fixed WDM transmitters that operate on M
different wavelengths, and a conventional bidirectional
Ethernet network interface that operates on a separate
WDM channel is used as the control signaling channel.
Every workstation (client) is equipped with a tunable
WDM receiver (TR) and an Ethernet network interface.
The Ethernet client and server interfaces are connected
Figure 3. A passive-star-based local optical WDM network.
together and form the signaling channel. All control
signals from the clients to the server or vice versa will be
broadcast on the control channel by using the IEEE 802.3
star, as shown in Fig. 3. A node sends its transmission to protocol [5]. The control channel can be used as a data
the star on one available wavelength, using a laser that channel between the clients, or between a client and the
produces an optical information stream. The information server. In the normal data transmission from the server to
streams from multiple sources are optically combined the clients, the server uses the M data channels to provide
by the star and the signal power of each stream is data service to the clients. A client can tune its receiver
equally split and forwarded to all the nodes on their to any of the M server channels and receive data from
receive fibers. A node’s receiver, using an optical filter, the server. We show in Fig. 4 a typical client/server WDM
is tuned to only one of the wavelengths; hence it can system described above. This system can be implemented
receive the information stream. Communication between using a passive star coupler, which connects the server
sources and receivers may follow one of two methods: and clients using two-way fibers [1].
WDM
server 1 control channel
Ethernet
receiver
NIC
Fixed
Transmitter 1
Disk
Transmitter 2
Processor
M transmitter Processor
Transmitter M
I/O Disk I/O
Ethernet NIC Control channel
Figure 4. A typical client/server WDM net- NUI
work.
DESIGN AND ANALYSIS OF A WDM CLIENT/SERVER NETWORK ARCHITECTURE 653
Since a WDM data channel can have a data rate as Client Server
high as several gigabits per second, a WDM client/server
network can be used for those applications that require
Begin Propagation time
a high volume of data transfer or those multimedia reques
t
applications that require high bandwidth. However, a
fiber can carry only several hundred wavelengths today,
Request
and this may not be enough if we assign a whole
queuing time
alive
wavelength for a single user request. To make good use of Keep−
the channel capacity, time-division multiplexing (TDM)
technology is used to carve a WDM channel into
multiple subchannels in time domain. The server can Setup
Setup
then assign a TDM subchannel to serve the user acknow
ledgem
request. Clients must use some filtering mechanism ent
r
to get the data from the subchannel. However, the
data transfe
client and the server must communicate with each Begin
other to negotiate on which subchannel they plan
to use. Channel
This network is also broadcast- and multicast-capable Possib
le resp holding
onse (N
because every client can tune its receiver to any of the M Data AK) time
data channels, and more than one client may tune their transfer
receivers to the same channel at any given time. time
r
transfe
4.2. Operations and Signaling End of
Channe
In a client/server network, the server is able to provide l releas
e
Time
service to clients on request. In such a network, the
upstream data traffic volume (from clients to server) is
generally much smaller than the downstream data traffic Figure 5. The request setup procedure in a WDM client/server
(from server to clients). Therefore, in a WDM client/server system.
network, it is possible to use a single Ethernet channel
to fulfill the signaling tasks between the clients and the
server. All the clients’ upstream traffic and the server’s
control traffic share the common signaling channel via
information such as the target client’s identification
statistical multiplexing. Figure 5 shows a connection setup
information, request ID, the data channel that will
procedure of a client’s request in the WDM client/server
be used, the information of the file to be sent out,
architecture.
and the service profile granted for the request.
Figure 5 indicates that, to setup a connection for a
client’s request, the network must perform the following Step 3. On receiving the setup message, the target
operations: client’s tunable receiver tunes to the assigned
channel and sends back a setup acknowledgment
message. Now, the client’s receiver is waiting for the
Step 1. A client sends a request message to the server requested data.
via the common control channel. The message should
Step 4. When the server receives the setup acknowl-
include information such as the identification of the
client, the identification of the requested-file, and edgment message, the connection is set up. Then,
the service profile. On receiving the user request, the the server begins to send out the data as soon as the
server will respond with a request–acknowledgment channel is available. If no setup acknowledgment
message to the client. This message includes message is received from the client within certain
information on the request’s status, the server timeout duration, the reserved channel is released
status, and so on. and the client’s request is discarded.
Step 2. The server processes the client’s request. If Step 5. Now the connection is set up and data transfer
no data channel or server resource is available, the begins. As we know, the optical channel’s bit error
request is put in the server’s request queue and rate (BER) is very low, on an order of 10−9 –10−12 ,
it waits for the server’s channel scheduler (SCS) to so it is safe to assume that the data channels have
allocate a data channel or server resource for it. If very low packet error rates. Therefore, the system
the request waits for a long time, the server should uses NAK signals to indicate any error packets. No
send keep-alive messages periodically to the client. acknowledgment signal will be sent out if a packet
As soon as the SCS knows that a channel is available, is correctly received. However, it is possible and
the server’s scheduler will select the request (if any) helpful for a client to send out some control message
with highest priority and reserve the channel for to synchronize the data transmission.
it. Then, a setup message will be sent to the client Step 6. After the server finishes sending the client’s
via the control channel. The setup message includes requested data, it sends an end-of-connection
654 DESIGN AND ANALYSIS OF A WDM CLIENT/SERVER NETWORK ARCHITECTURE
message. The client then responds with a channel- now has two major tasks: handling user requests and
release message to the server. Then, the server tears caching data. An end user who wants to access some
down the connection and frees the channel. data service, connects to the nearest WDM client. This
can be done by a standard client/server service, which
In this client/server WDM network, the server may receive implies that no modification is necessary for the end user.
three kinds of messages: connection request messages, On receiving a service request from the end user, the
acknowledgment messages, and data transmission control WDM client then analyzes and processes the request.
messages. The server should give higher priority to Batching2 may be used to accommodate requests for the
acknowledgment and control messages so that the system same service from different users. If the user’s requested
can process clients’ responses promptly and minimize the file or program is stored at the WDM client, the request can
channel idle-holding time.1 It is possible that the requests be served directly without referring to the WDM server;
from the clients come in a bursty manner. Therefore, otherwise, the WDM client will contact the server for the
sometimes, the server may not have enough channels to data. However, no matter whether the service is provided
serve the requests immediately. In that case, the request is by the WDM client or the WDM server, it is transparent
buffered. When a channel is available again, there may be to the end users.
more than one client waiting for a free channel. Therefore, This internetworking architecture is suitable for
the server should have some mechanism to allocate a multimedia service providers. A service provider can
free channel when it is available. The mechanism used deploy the WDM clients in different cities to provide
to decide which request should be assigned with the free service to end users. Because the system is broadcast-
channel when there are multiple requests waiting for it is and multicast-capable, it has very good scalability.
called the (SCS) mechanism [6].
The WDM client/server network architecture is easy to To analyze the system’s performance, we consider an
internetwork. In this network architecture, the server has example system that uses fixed-length cells (k bytes/cell)
very high bandwidth; therefore, it is not cost-effective to send data from the server to the clients. Every cell
to connect the server with the outside network directly. occupies exactly one time slot in a WDM channel. The
Instead, the clients are connected to outside networks. control messages between a client and the server also use
Figure 6 shows the internetworking scheme where the
clients act as proxy servers for outside networks. A client
2 Batching makes use of multicast technology. It batches multiple
requests from different users who ask for the same service and
1A channel’s idle holding time is the time that the channel is use a single stream from the server to serve the users at the same
reserved for a client but not used for data transfer. time by using multicasting [6].
WDM
server
Internet
Figure 6. WDM client/server internetworking. End user End user End user
DESIGN AND ANALYSIS OF A WDM CLIENT/SERVER NETWORK ARCHITECTURE 655
102
5.1. Bottleneck Analysis
data. Therefore, bottleneck analysis does not apply in this • Clients’ requests form a Poisson process. The request
service mode. rate is λr requests/s. So the interarrival times are
negative exponentially distributed with mean 1/λr
5.2. System Performance Illustrative Study minutes.
• There are M data channels altogether. Each channel’s
This subsection studies the system performance in terms capacity is B bps.
of delay, throughput, and average queuing length. Let us
• The file sizes that users request from the server are
first define several keywords:
exponentially distributed with average file size f . We
also assume the average file size to be relatively large.
• A Request’s System Time. A request’s system time is Therefore, the data transfer time is much larger than
measured from the time that a user sends out the the round-trip time and the client’s receiver tuning
request message to the time that the user receives all time, and we can omit the round-trip time and client’s
data. Let us denote the time as St . Then, from Fig. 5, receiver-tuning time in our analysis.
we can express St as • The scheduler uses first-come, first-served (FCFS)
policy.
St = 2R + tq + td (3)
Using on the above assumptions, we analyze the system
where R is the round-trip time between the client and as follows. For FCFS scheduling policy, we can model the
the server; tq is the request’s queuing time, defined system as an M/M/m queuing system. In this system,
as the time from the instant a request reaches the a data channel is regarded as a server in the model.
server to the time the request starts to be served by The service time is equal to the channel holding time.
the server; and td is the data transfer time. For every We have assumed that the file sizes that users requested
client’s request, the round-trip time and the data from the server are exponentially distributed with average
transfer time are fixed. The only optimizable factor f . Therefore, the channel holding time at each channel
is the queuing time for a given system. Therefore, in is negative exponential distributed with mean µ = B/f .
order to reduce the response time, we should have a Let us define ρ = λr /Mµ. Clearly the system should
good scheduling algorithm in the server so that we require 0 ≤ ρ < 1. If ρ ≥ 1, then the system’s service
can minimize the average request delay. rate will be smaller than the arrival rate, which means
that the waiting queue will be built up to infinity, and
• A Request’s Channel Holding Time. The channel
the system will become unstable. On the basis of these
holding time for a request, denoted as H, can be
parameters and the M/M/m model, we have the following
expressed by
results [7]:
H = 2R + td = D − tq (4)
• The average number of busy channels Ub in the
system is given by
Therefore, for a given request, the channel holding
time is almost nonoptimizable, because the round- λr
trip time and data transfer time are all nonoptimiz- E[Ub ] = Mρ = (6)
µ
able for a given request in a specific system.
• System Throughput. When the system operates in
the stable state, it should be able to serve all user • The throughput of the system is given by λr f . This is
requests. So the system throughput should equal because the average number of the served requests
the average request rate multiplied by the average and the arrival requests should be in balance.
file size. The maximum reachable throughput in the • The average queue length Qn of the system is given by
system, denoted as θmax , can be express as
(Mρ)M ρ
Qn = Mρ + π0 (7)
f /B M! (1 − ρ)2
θmax = MB (5)
f /B + 2R
where π0 is the stationary zero queue-length
where M is total number of data channels, B is the channel probability and is given by
capacity, R is the average round-trip time from the clients
to the server, and f is the average file size. Here, 2R is ! "−1
M−1
(Mρ)i (Mρ)M 1
the average overhead conceived by the server to setup a π0 = 1 + + (8)
connection. i! (M)! 1 − ρ
i=1
We analyze a system’s performance based on the following • The average queuing time is given by the average
assumptions: system time minus the service time (we omit the
DESIGN AND ANALYSIS OF A WDM CLIENT/SERVER NETWORK ARCHITECTURE 657
DESIGN AND ANALYSIS OF LOW-DENSITY as the message-passing algorithm. The iterative decoding
PARITY-CHECK CODES FOR APPLICATIONS algorithms employed in the current research literature
TO PERPENDICULAR RECORDING CHANNELS are suboptimal, although simulations have demonstrated
their performance to be near optimal (e.g., near maximum
EROZAN M. KURTAS likelihood). Although suboptimal, these decoders still have
ALEXANDER V. KUZNETSOV very high complexity and are incapable of operating in
Seagate Technology the >1-Gbps (gigabit-per-second) regime.
Pittsburgh, Pennsylvania The prevalent practice in designing LDPC codes is
BANE VASIC to use very large randomlike codes. Such an approach
completely ignores the quadratic encoding complexity in
University of Arizona
Tucson, Arizona length N of the code and the associated very large memory
requirements. MacKay [35] proposed a construction of
irregular LDPC codes resulting in a deficient rank
1. INTRODUCTION parity-check matrix that enables fast encoding since
the dimension of the matrix to be used in order to
Low-density parity-check (LDPC) codes, first introduced calculate parity bits turns out to be much smaller
by Gallager [19], are error-correcting codes described by than the number of parity check equations. Richardson
sparse parity-check matrices. The recent interest in LDPC et al. [44] used the same encoding scheme in the context
codes is motivated by the impressive error performance of their highly optimized irregular codes. Other authors
of the Turbo decoding algorithm demonstrated by Berrou have proposed equally simple codes with similar or even
et al. [5]. Like Turbo codes, LDPC codes have been shown better performance. For example, Ping et al. [42] described
to achieve near-optimum performance [35] when decoded concatenated tree codes, consisting of several two-state
by an iterative probabilistic decoding algorithm. Hence trellis codes interconnected by interleavers, that exhibit
LDPC codes fulfill the promise of Shannon’s noisy channel performance almost identical to turbo codes of equal block
coding theorem by performing very close to the theoretical length, but with an order of magnitude less complexity.
channel capacity limit. An alternative approach to the random constructions
One of the first applications of the Gallager LDPC is the algebraic construction of LDPC codes using finite
codes was related to attempts to prove an analog of the geometries [24,25]. Finite geometry LDPC codes are
Shannon coding theorem [47] for memories constructed quasicyclic, and their encoders can be implemented with
from unreliable components. An interesting scheme of linear shift registers with feedback connections based on
memory constructed from unreliable components was their generator polynomials. The resulting codes have
proposed by M. G. Taylor, who used an iterative threshold been demonstrated to have excellent performance in
decoder for periodic correction of errors in memory AWGN, although their decoder complexities are still
cells that degrade steadily in time [52]. The number somewhat high. Vasic [54–56] uses balanced incomplete
of elements in the decoder per one memory cell was block designs (BIBDs) [6,11] to construct regular LDPC
a priori limited by a capacity-like constant C as the codes. The constructions based on BIBD’s are purely
number of memory cells approached infinity. At the time, combinatorial and lend themselves to low complexity
the Gallager LDPC codes were the only class of codes implementations.
with this property, and therefore were naturally used There have been numerous papers on the application
in the Taylor’s memory scheme. For sufficiently small of turbo codes to recording channels [14–16,22] showing
probabilities of the component faults, using Gallager that high-rate Turbo codes can improve performance dra-
LDPC codes of length N, Taylor constructed a reliable matically. More recent work shows that similar gains can
memory from unreliable components and showed that the be obtained by the use of random LDPC and Turbo prod-
probability of failure after T time units, P(T, N), is upper- uct codes [17,28–30]. Because of their high complexity,
bounded by P(T, N) < A1 TN −α , where 0 < α < 1 and A the LDPC codes based on random constructions cannot be
is a constant. A similar bound with a better constant used for high speed applications (>1 Gbps) such as the
α was obtained [26]. In fact, Ref. 26 report even the next generation data storage channels.
tighter bound P(T, N) < A2 T exp{−γ N β }, with a constant In this article we describe how to construct LDPC codes
0 < β < 1. Using a slightly different ensemble of LDPC based on combinatorial techniques and compare their
and some other results [60], Kuznetsov [27] proved the performance with random constructions for perpendicular
existence of LDPC codes that lead to the ‘‘pure’’ exponential recording channels. In Section 2 we introduce various
bound with a constant β = 1 and decoding complexity deterministic construction techniques for LDPC codes. In
growing linearly with code length. A detailed analysis of Section 3 we describe the perpendicular channel under
the error correction capability and decoding complexity of investigation which is followed by the simulation results.
the Gallager-type LDPC codes was done later by Pinsker Finally, we conclude in Section 4.
and Zyablov [61].
Frey and Kschischang [18] showed that all compound 2. DESIGN AND ANALYSIS OF LDPC CODES
codes including Turbo codes [5], classical serially con-
catenated codes [3,4], Gallager’s low-density parity-check In this section we introduce several deterministic
codes [19], and product codes [20] can be decoded by constructions of low-density parity-check codes. The
Pearl’s belief propagation algorithm [41] also referred to constructions are based on combinatorial designs. The
DESIGN AND ANALYSIS OF LOW-DENSITY PARITY-CHECK CODES 659
codes constructed in this fashion are ‘‘well-structured’’ incomplete block design (BIBD) is a pair (V,B), where V
(cyclic and quasicyclic) and, unlike random codes, can is a v-element set and B is a collection of b · k-subsets
lend themselves to low-complexity implementations. of V, called blocks, such that each element of V is
Furthermore, the important code parameters, such as contained in exactly r blocks and any 2-subset of V is
minimum distance, code rate and the graph girth, are fully contained in exactly λ blocks. A design for which every
determined by the underlying combinatorial object used block contains the same number k of points, and every
for the construction. Several classes of codes with a wide point is contained in the same number r of blocks is
range of code rates and minimum distances are considered called a tactical configuration. Therefore, BIBD can be
First, we introduce a construction using general Steiner considered as a special case of a tactical configuration.
system (to be defined later), and then discuss projective The notation BIBD(v, k, λ) is used for a BIBD on v points,
and affine geometry codes as special cases of Steiner block size k, and index λ. The BIBD with a block size
systems. Our main focus are regular Gallager codes [19] k = 3 is called a Steiner triple system. A BIBD is called
(LDPC codes with constant column and row weight), but resolvable if there exists a nontrivial partition of its set
as it will become clear later, the combinatorial method of blocks B into parallel classes, each of which in turn
can be readily extended to irregular codes as well. The partitions the point set V. A resolvable Steiner triple
first construction is given by Vasic [54–56] and exploits system with index λ = 1 is called a Kirkman system. These
resolvable Steiner systems to construct a class of high-rate combinatorial objects originate from Kirkman’s famous
codes. The second construction is by Kou et al. [24,25], and schoolgirl problem posted in 1850 [23] (see also Refs. 49
is based on projective and affine geometries. and 43). For example, collection B = {B1 , B2 , . . . , B7 } of
The concept of combinatorial designs is well known, blocks B1 = {0, 1, 3}, B2 = {1, 2, 4}, B3 = {2, 3, 5}, B4 =
and their relation to coding theory is profound. The codes {3, 4, 6}, B5 = {0, 4, 5}, B6 = {1, 5, 6} and B7 = {0, 2, 6} is
as well as their designs are combinatorial objects, and a BIBD(7,3,1) system or a Kirkman system with v = 7,
their relation does not come as a surprise. Many classes and b = 7.
of codes including extended Golay codes and quadratic We define the block-point incidence matrix of a (V, B) as
residue (QR) codes can be interpreted as designs (see a b × v matrix A = (aij ), in which aij = 1 if the jth element
Refs. 8 and 48 and the classical book of MacWilliams and of V occurs in the ith block of B, and aij = 0 otherwise. The
Sloane [37]). The combinatorial designs are nice tools for point-block incidence matrix is AT .
code design in a similar way as bipartite graphs are helpful The block-point matrix for the BIBD(7,3,1) described
in visualizing the message passing decoding algorithm. above is
Consider an (n, k) linear code C with parity-check matrix 1 1 0 1 0 0 0
0 1 1 0 1 0 0
H [31]. For any vector x = (xv )1≤v≤n in C and any row of H
0 0 1 1 0 1 0
hc,v · xv = 0, 1 ≤ c ≤ n − k (1) A= 0 0 0 1 1 0 1
1 0 0 0 1 1 0
v
0 1 0 0 0 1 1
This is called the parity-check equation. To visualize 1 0 1 0 0 0 1
the decoding algorithm, the matrix of parity checks
is represented as a bipartite graph with two kinds of and the corresponding bipartite graph is given in Fig. 2.
vertices [38,58,50]. The first subset (B), consists of bits, Each block is incident with the same number of points
and the second is a set of parity-check equations (V). An k, and every point is incident with the same number r
edge between a bit and an equation exists if the bit is of blocks. If b = v, and hence r = k, the BIBD is called
involved in the check. For example, the bipartite graph symmetric. The concepts of a symmetric BIBD(v, k, λ) with
corresponding to k ≥ 3 and of a finite projective plane are equivalent (see
Ref. 34, Chap. 19). If one thinks of points as parity-check
1 0 0 1 0 1 1 equations and of blocks as bits in a linear block code, then
H = 0 1 0 1 1 1 0 the AT defines a matrix of parity checks H of a Gallager
0 0 1 0 1 1 1 code [19,36]. The row weight of A is k, column weight is r,
is shown in Fig. 1. and the code rate is R = [b − rank(H)]/b.
Checks
0 1 2 3 4 5 6 (points)
Figure 2. The bipartite graph representation of the Kirkman
Figure 1. An example of a bipartite graph. (7,3,1) system.
660 DESIGN AND ANALYSIS OF LOW-DENSITY PARITY-CHECK CODES
It is desirable to have each bit ‘‘checked’’ in as many base block of a (7,3,1) CDF. To illustrate this, we create
equations as possible, but because of the iterative nature an array (1) = (i,j ), of differences (1) i,j = b1,i − b1,j
of the decoding algorithm, the bipartite graph must not
contain short cycles. In other words, the graph girth (the 0 6 4
length of the shortest cycle) must be large [7,51]. These (1)
= 1 0 5
two requirements are contradictory, and the tradeoff is 3 2 0
especially difficult when we want to construct a code that
is both short and has a high rate. The girth constraint is Given base blocks Bj , 1 ≤ j ≤ t, the orbits Bj + g can be
related to the constraint that every t-element subset of V calculated as a set {bj,1 + g, . . . , bj,k + g}, where g ∈ V. A
is contained in as few blocks as possible. If each t-element construction of a BIBD is completed by creating orbits
subset is contained in exactly λ blocks, the underlying for all base blocks. If the number of base blocks in the
design is known as t design. An example of a 5-design is difference families is t, then the number of blocks in a
the extended ternary Golay code [8]. However, the codes BIBD is b = tv. For example, it can be easily verified (by
based on t designs (t > 2) have short cycles, and therefore creating the array ) that the blocks B1 = {0, 1, 4} and
we will restrict ourselves to the 2-designs (i.e., BIBD) B2 = {0, 2, 7} are the base block of a (13,3,1) CDF of a
or more specifically to the designs with the index λ = 1. group V = Z13 . The two parallel classes (or orbits) are as
The λ = 1 constraint means that no more than one block given below:
contains the same pair of points or, equivalently, that
there are no cycles of length four in a bipartite graph. The 0 1 2 3 4 5 6 7 8 9 10 11 12
lower bound on a rate of a 2-(v, k, λ)-design-based code is B1 + g = 1 2 3 4 5 6 7 8 9 10 11 12 0
given by 4 5 6 7 8 9 10 11 12 0 1 2 3
v(v − 1)
λ −v 0 1 2 3 4 5 6 7 8 9 10 11 12
k(k − 1)
R≥ (2) B2 + g = 2 3 4 5 6 7 8 9 10 11 12 0 1
v(v − 1)
λ 7 8 9 10 11 12 0 1 2 3 4 5 6
k(k − 1)
The corresponding matrix of parity checks is
This bound follows from the basic properties of designs
(Ref. 9, Lemma 10.2.1 and Corollary 10.2.2, pp. 344–345). 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0
By counting ones in H across the rows and across the 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1
0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 1 0 0 0 0
columns, we conclude that
0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 1 0 0 0
1 0 0
0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0
b·k=v·r (3) 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0
H = 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1
0 0 0
0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0
If we fix the point u, and find the number of pairs (u, w), 0 0 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0
u = w, we arrive to the relation
0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0
0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0
0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0
r · (k − 1) = λ · (v − 1) (4) 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 1
Since the point-block incidence matrix (matrix of parity This matrix contains only columns of weight 3. Of course,
checks) H has v rows and b columns (v ≤ b), the rank since the order of the group in our example is low,
of A cannot be larger than v, and the code rate, the code rate is low as well (R ≥ 12 ). Generally, given a
which is R = [b − rank(AT )]/b cannot be smaller than (v, k, λ) CDF, a t · k-element subset of Zv , with base blocks
(b − v)/b. Dividing (3) and (4) yields (2). A more precise Bi = {bi,1 , . . . , bi,k }, 1 ≤ i ≤ t, the matrix of parity checks
characterization of code rate can be obtained by using the can be written in the form
rank (and p rank) of the incidence matrix of 2-designs as
explained by Hamada [21]. In the
case of t-designs, t > 1 H = [H1 H2 · · · Ht ] (5)
the number of blocks is b = λ vt / kt (see Ref. 34, p. 191).
We have shown that the concept of BIBD offers a useful where each submatrix is a circulant matrix of dimen-
tool for designing codes. Let us now show a construction of sion v × v. Formula (5) indicates that the CDF codes
Steiner triple systems using difference families of Abelian have a quasicyclic structure similar to the Townsend and
groups. Let V be an additive Abelian group of order v. Weldon [53] self-orthogonal quasicyclic codes and Wel-
Then t · k-element subsets of V, Bi = {bi,1 , . . . , bi,k }1 ≤ i ≤ t don’s [57] difference set codes. Each orbit in the design
form a (v, k, λ) difference family (DF) if every nonzero corresponds to one circulant submatrix in a matrix of
element of V can be represented exactly λ ways as a parity checks of quasicyclic codes.
difference of two elements lying in a same member of a The codes based on Zv are particularly interesting
family, that is, it occurs λ times among the differences because they are conceptually extremely simple and have
bi,m − bi,n , 1 ≤ i ≤ t, 1 ≤ m, n ≤ k. The sets Bi are called a structure that can be easily implemented in hardware.
base blocks. If V is isomorphic with Zv , a group of integers Notice also that for a given constraint (v, k, λ) the CDF-
modulo v, then a (v, k, λ) DF is called a cyclic difference based construction maximizes the code rate because for
family (CDF). For example, the block B1 = {0, 1, 3} is a a given v the number of blocks is maximized. The code
DESIGN AND ANALYSIS OF LOW-DENSITY PARITY-CHECK CODES 661
rate is independent of the underlying group. Other groups excellent reference. The Abel and Greg Table 2.3 in Ref. 12
different than Zv may lead to similar or better codes. (pp. 41–42) summarizes the known results in existence of
short designs. However, very often the construction of
2.2. Constructions for Cyclic Difference Families these design is somewhat heuristic or works only for a
As we have shown, it is straightforward to construct a given block size. In many cases such constructions give
BIBD once the CDF is known. However, finding the CDF a very small set of design with parameters of practical
is a much more complicated problem and solved only interests. An important subclass of BIBDs are so-called
for some values of v, k, and λ. Now we describe how to infinite families (Ref. 12, p. 67). The examples of these
construct a CDF. BIBDs include projective geometries, affine geometries,
unitals, Denniston designs and some geometric equiva-
2.2.1. The Netto’s First Construction. This scheme (see lents of 2-designs (see Refs. 2 and 12, VI.7.12). The known
Refs. 10 and 12, p. 28) is applicable if k = 3, and v is a infinite families of BIBD are listed in Table 1 [12,13]. Since
power of a prime, such that v ≡ 1 (mod 6). Since v is a q is a power of a prime, they can be referred as to finite
power of a prime, then Zv is a Galois field [GF(v)]. Let Euclidean and finite projective geometries.
be a multiplicative group of the field, and let ω be its As we explained above, resolvable BIBDs are those
generator [a primitive element in GF(v)] [39]. Write v as BIBDs that possess parallel classes of blocks. If the
v = 6t + 1, t ≥ 1, and for d, a divisor of v − 1, denote by
d blocks are viewed as lines, the analogy of BIBD and finite
the group of dth powers of ω in GF(6t + 1), and by ωi
d the geometry becomes apparent. It is important to notice that
coset of dth powers of ωi . Then the set {ωi
2t | 1 ≤ i ≤ t} not all BIBDs can be derived from finite geometries, but
defines a (6t + 1, 3, 1) difference family [40,59]. In the this discussion is beyond the scope of the discussion here.
literature the base blocks are typically given in the form The most interesting projective and affine geometries are
{0, ωi (ω2t − 1), ωi (ω4t − 1)} rather than as {ωi , ωi+2t , ωi+4t }. those constructed using Gallois fields [GF(q)]. In a m-
An alternative combinatorial way of constructing a CDF dimensional projective geometry PG(m, q) a point a is
is proposed by Rosa [33]. Rosa’s method also generates a specified by a vector a = (aj )0≤j≤m , where aj ∈ GF(q). The
(6t + 3, 3, 1) CDF. In Ref. 53 a list of constructions for vectors a = (aj )0≤j≤m , and λa = (λaj )0≤j≤m , [λ ∈ GF(q), λ =
rate- 12 codes is given. The details are not given here, but 0] are considered to be representatives of the same point.
the parameters are the same as of Netto construction. In There are q(m+1) − 1 nonzero tuples, and λ can take
Ref. 12 [p. 28] Colbourn noticed that this construction is q − 1 nonzero values, so that there are (q − 1) tuples
perhaps wrongly associated with Netto. representing the same point. Therefore, the total number
of points is (qm+1 − 1)/(q − 1). The line through two
2.2.2. The Netto’s Second Construction. This construc- distinct points a = (aj )0≤j≤m and b = (bj )0≤j≤m is incident
tion [40] can be used to create a cyclic difference family with the points from the set {µa + vb}, where µ and v are
when the number of points v is a power of a prime, and not both zero. There are q2 − 1 choices for µ and v, and
v ≡ 7 (mod 12). Again let ω be a generator of the multiplica- each point appears q − 1 times in line {µa + vb}, so that
tive group and let v = 6t + 1, t ≥ 1, and
d the group of the number of points on a line is (q2 − 1)/(q − 1) = q + 1.
dth powers in GF(v), then the set {ω2i
2t | 1 ≤ i ≤ t} defines For example, consider the nonzero elements of GF(22 ) =
base blocks of the so called Netto triple system. The Netto {0, 1, ω, ω2 } that represent the points of PG(2,4). The
systems are very interesting because of the following prop- triples a = (1, 0, 0), ωa = (ω, 0, 0), and ω2 a = (ω2 , 0, 0), as
erty. Netto triple systems on Zv , v power of a prime and we ruled, all represent the same point. The line incident
v ≡ 19 (mod 24) are Pasch-free [32,45,46] (see also Ref. 12, with a and b = (0, 1, 0) is incident with all of the
p. 214, Lemma 13.7), and as shown in Ref. 56 achieve the following distinct points: a, b, a + b = {1, 1, 0}, a + ωb =
upper bound on minimum distance. For example, consider {1, ω, 0}, a + ω2 b = (1, ω2 , 0). In the expression {µa + vb},
the base blocks of the Netto triple system difference family there are 42 − 1 = 15 combinations of µ and v that are not
on Zv (ω = 3) where B1 = {0, 14, 2}, B2 = {0, 40, 18}, B3 = both zero, but each point has 4 − 1 = 3 representations,
{0, 16, 33}, B4 = {0, 15, 39}, B5 = {0, 6, 7}, B6 = {0, 11, 20}, resulting in 15 3
points on each line.
and B7 = {0, 13, 8}. The resulting code is quasi-cyclic, has A hyperplane is a subspace of dimension m − 1 in
dmin = 6, length b = 301 and R ≥ 0.857. PG(m, q); that is, it consists of the points satisfying a
homogenous linear equation λj · aj = 0, λj ∈ GF(q). A
2.2.3. Burratti Construction for k = 4 and k = 5. For 0≤j≤m
k = 4, Burratti’s method gives CDFs with v points,
provided that v is a prime and v = 1 mod 12. The CDF
Table 1. Known Infinite Families of 2-(v, k, 1) Designs
is a set {ω6i B: 1 ≤ i ≤ t}, where base blocks have the
form B = {0, 1, b, b2 }, where ω is a primitive element in k v Parameter Name
GF(v). The numbers b ∈ GF(12t + 1) for some values of
q qn n ≥ 2, -power of Affine geometries
v are given in Ref. 10. Similarly, for k = 5, the CDF is
a prime
given as {ω10i B: 1 ≤ i ≤ t}, where B = {0, 1, b, b2 , b3 }, b ∈
q+1 (qn − 1)/(q − 1) n ≥ 2, -power of Projective geometries
GF(20t + 1). Some Buratti designs are given in Ref. 10.
a prime
2.2.4. Finite Euclidean and Finite Projective Geome- q+1 q3 + 1 q-power of a Unitals
tries. The existence and construction of short designs prime
(b < 105 ) is an active area of research in combinatorics. 2m 2m (2s + 1) − 2s 2≤m<s Denniston designs
The handbook edited by Colbourn and Dinitz [12] is an
662 DESIGN AND ANALYSIS OF LOW-DENSITY PARITY-CHECK CODES
projective plane of order q is a hyperplane of dimension example, the lines of slope 1 are the points {1, 9, 17, 25, 33},
m = 2 in PG(m, q). It is not difficult to see that the {2, 10, 18, 26, 34}, {3, 11, 19, 27, 35}, and so on. We assume
projective plane of order q is a BIBD[(q3 − 1)/(q − 1), q + that the lattice labels are periodic in vertical (y) dimension.
1, 1]. The Euclidean (or affine) geometry EG(q, m) is The slopes 2, 3, . . . , m can be defined analogously.
obtained by deleting the points of a one hyperplane A set B of all k-element sets of V obtained by taking
in PG(m, q). For example, if we delete a hyperplane labels of points along the lines with different slopes s,
λ0 a0 = 0, that is, the hyperplane with a0 = 0, then the 0 ≤ s ≤ m − 1 is a BIBD. Since m is a prime, for each
remaining qm points can be labeled by the m-tuples lattice point (x, y) there is exactly one line with slope
a = (aj )1≤j≤m . Euclidean or affine geometry EG(q, m) is s that go through (x, y). For each pair of lattice points,
a BIBD(q2 , q, 1) [37]. For more details, see Kou et al. [25]. there is exactly one line that passes through both points.
Therefore, the set B of lines is a 2-design. The block
2.2.5. Lattice Construction of 2-(v , k ,1) Designs. In this size is k, number of blocks is b = m2 and each point
section we address the problem of construction of BIBDs in the design occurs in exactly m blocks. We can also
of large block sizes. As shown, the Buratti-type CDFs and show [56] that the (m, k = 3) lattice BIBD is Pasch-
the projective geometry approach offer a quite limited set free. Consider a periodically extended lattice. The proof
of parameters and therefore a small class of codes. In this is based on the observation that it is not possible to
section we give a novel construction of 2-(v, k,1) designs draw the quadrilateral (no sides with infinite slope are
with arbitrary block size. The designs are lines connecting allowed) in which each point occurs twice. Figure 4 shows
points of a rectangular integer lattice. The idea is to trade one such quadrilateral. Without loss of generality, we
a code rate and number of blocks for the simplicity of can assume that the ting point of two lines is (0,0).
construction and flexibility of choosing design parameters. The slopes of four lines in Fig. 4 are s, p, p + a/2, and
The construction is based on integer lattices as explained s − a/2. The points (0,0), (0,a), (2,2s), and (2, a + 2p) are
below. all different, and each occupies two lines. As long as
Consider a rectangular integer lattice L = {(x, y): 0 ≤ the remaining four points are concerned, they will be
x ≤ k − 1, 0 ≤ y ≤ m − 1}, where m is a prime. Let l: L → V on two lines in one of these three cases: (1) s = p + a/2
be an one-to-one mapping of the lattice L to the point set and s + a/2 = a + p; (2) s = s + a/2 and p + a/2 = p + a;
V. An example of such mapping is a simple linear mapping and (3) s = p + a and p + a/2 = s + a/2 (all operations
l(x, y) = m · x + y + 1. The numbers l(x, y) are referred to are modulo m). Case 1 implies s − p = a/2, which means
as lattice point labels. For example, Fig. 3 depicts the that points (2,2s) and (2, a + 2p) are identical, which
rectangular integer lattice with m = 7 and k = 5. is a contradiction. If both equalities in cases 2 and 3
A line with slope s, 0 ≤ s ≤ m − 1, starting at the point were satisfied, then a would have to be 0, which would
(x, a), contains the points {(x, a + sx mod m): 0 ≤ x ≤ k − 1}, mean that two leftmost points are identical, again a
0 ≤ a ≤ m − 1. For each slope, there are m classes of contradiction.
parallel lines corresponding to different values. In our By generalizing this result to k > 3, we can conjecture
that the (m, k) lattice BIBDs contain no generalized Pasch
configurations. The lattice construction can be extended
7 to nonprime vertical dimension m, but the slopes s and m
must be coprime.
7 14 21 28 35 Figure 5 shows the growth of the required code length
6
with a lower bound on a minimum distance as a parameter.
6 13 20 27 34
5
(0,a)
5 12 19 26 33
4
(1,a + p)
4 11 18 25 32
3
y
(2,a +2p)
3 10 17 24 31 (1,s + a/2)
2
2 9 16 23 30
1 (1,p + a/2)
1 8 15 22 29
0 (2,2s)
(1,s)
−1 (0,0)
−1 0 1 2 3 4 5
x
Figure 3. An example of the rectangular grid for m = 7 and
k = 5. Figure 4. Quadrilateral in a lattice finite geometry.
DESIGN AND ANALYSIS OF LOW-DENSITY PARITY-CHECK CODES 663
104
103
Code length
d = 16
d = 14
102
d = 12 d = 10
d=8 d=6
3. APPLICATIONS OF LDPC CODES TO PERPENDICULAR where g(t) is the read-back voltage corresponding to an
RECORDING SYSTEMS isolated positive going transition of the write current
(transition response). For perpendicular recording systems
In this section we consider the performance of LDPC under consideration g(t) is modeled as
codes based on random and deterministic constructions
St
in perpendicular recording systems. First we give a brief 2 2
g(t) = √ e−x dx = erf (St)
description of the perpendicular recording channel model. π 0
&
3.1. Perpendicular Recording System where S = 2 1n2/D and D is the normalized density of
The write head, the magnetic medium, and the read head the recording system. There are various ways one can
constitute the basic parts of a magnetic recording system. define D. In this work we define D as D = T50 /Tc , where
The binary data are written onto the magnetic medium by T50 represents the width of impulse response at a half of
applying a positive or negative current to the inductive its peak value.
write head, creating a magnetic field that causes the In Figs. 6 and 7, the transition and dibit response
magnetic particles on the media to align themselves in [g(t + Tc /2) − g(t − Tc /2)] of a perpendicular recording
either of two directions. During playback, the alignment system are presented for various normalized densities.
of the particles produces a magnetic field rising that In our system, the received noisy read-back signal is
produces a voltage in the read head when it passes over the filtered by a lowpass filter, sampled at intervals of Tc
media. This process can be approximated in the following seconds, and equalized to a partial response (PR) target.
simplified manner. Therefore, the signal at the input to the detector at the
If the sequence of symbols ak ∈ C ⊆ {±1} is written on kth time instant can be expressed as
the medium, then the corresponding write current can be
expressed as zk = aj fk−j + wk
j
i(t) = (ak − ak−1 )u(t − KTc )
where wk is the colored noise sequence at the output of the
k
equalizer and {fn } are the coefficients of the target channel
where u(t) is a unit pulse of duration Tc seconds. Assuming response. In this work we investigated PR targets of the
the read-back process is linear, the induced read voltage, form (1 + D)n with special emphasis on n = 2, named PR2
V(t), can be approximated as case. The input signal-to-noise-ratio (SNR) is defined as
SNR = 10 × log10 (Vp2 /σ 2 ), where Vp is the peak amplitude
V(t) = ak g(t − kTc ) of the isolated transition response which is assumed to
k be unity.
664 DESIGN AND ANALYSIS OF LOW-DENSITY PARITY-CHECK CODES
0.8
ND = 1.5
ND = 2
0.6
ND = 2.5
ND = 3
0.4
0.2
Amplitude
0
−0.2
−0.4
−0.6
−0.8
−1
−5 −4 −3 −2 −1 0 1 2 3 4 5
Figure 6. Transition response of a perpendicu-
lar recording channel. t /T
0.5
Amplitude
0.4
0.3
0.2
0.1
0
Figure 7. Dibit response of a perpendicular record- −5 −4 −3 −2 −1 0 1 2 3 4 5
ing channel. t /T
10−3
BER
10−4
10−5
10−6
16 16.5 17 17.5 18 18.5 19 19.5 20
SNR
10−3
BER
10−4
10−5
10−6
16 16.5 17 17.5 18 18.5 19 19.5 20
SNR
Figure 10 makes a similar comparison between random codes for a perpendicular recording channel. We have
LDPC codes of weights 3 and 4, and lattice designs with shown the tradeoff between BER performance, error floor,
weights 4 and 5. As in Kirkman constructions, lattice and complexity via extensive simulations.
designs have a higher error floor than do their random
counterparts.
Acknowledgment
The authors wish to thank our editor, John G. Proakis.
4. CONCLUSION
10−3
BER
10−4
10−5
10−6
16 16.5 17 17.5 18 18.5 19 19.5 20
SNR
electrical and computer engineering from the Northeast- Japan, Institute of Experimental Mathematics of Essen
ern University, Boston, Massachusetts, in 1994 and 1997, University in Germany, CRL of Hitachi, Ltd., in Tokyo,
respectively. He joined Quantum Corporation (now Max- Japan. In 1998–1999, he was a member of technical staff
tor) in 1996 as a senior design engineer and worked on of the Data Storage Institute (DSI) in Singapore, and
coding and signal processing for read-channel architec- worked on coding and signal processing for hard disk
tures employed in data storage devices. Since 1999, he drives. Currently, he is a research staff member of the
has been with Seagate Technology where he is the direc- Seagate Technology in Pittsburgh, Philadelphia.
tor of channels research working on future data coding
and recovery systems for storage applications. Dr. Kurtas Dr. Vasic received his B.S., M.S., and Ph.D., all in
has over 50 papers in various journals and conference electrical engineering from University of Nis, Serbia.
proceedings in the general field of digital communica- From 1996 to 1997 he worked as a visiting scientist
tions technology. His research interests span information at the Rochester Institute of Technology, and Kodak
theory, coding, detection and estimation, synchronization, Research, Rochester, New York, where he was involved
and signal processing techniques. Dr. Kurtas is the recipi- in research in optical storage channels. From 1998 to 2000
ent of the 2001 Seagate Technology Outstanding Technical he was with Lucent Technologies, Bell Laboratories. He
Contribution and Achievement Award. Dr. Kurtas has five was involved in research in read channel architectures
pending patent applications. and iterative decoding and low-density parity check
codes. He was involved in development of codes and
Alexander V. Kuznetsov received his Diploma with detectors for five generations of Lucent (now Agere)
excellence in radio engineering and a Ph.D. degree read channel chips. Presently, Dr. Vasic is a faculty
in information theory from the Moscow Institute of member of the University of Arizona, Electrical and
Physics and Technology, Moscow, Russia, and the degree Computer Engineering Department. Dr. Vasic is an author
of doctor of engineering sciences from the Institute of a more than fifteen journal articles, more than fifty
for Information Transmission Problems IPPI, Russian conference papers, and one book chapter ‘‘Read channels
Academy of Sciences, in 1970, 1973, and 1988, respectively. for magnetic recording,’’ in CRC Handbook of Computer
In 1973, he joined the Russian Academy of Sciences as a Engineering. He is a member of the editorial board of
scientific fellow, where for 25 years he worked on coding the IEEE Transactions on Magnetics. He will serve as a
theory and its applications to data transmission and technical program chair for IEEE Communication Theory
storage. On leave from IPPI, he held different research Workshop in 2003, and as a coorganizer of the Center for
positions at the Royal Institute of Technology and Lund Discrete Mathematics and Theoretical Computer Science
University in Sweden, Concordia University in Canada, Workshop on Optical/Magnetic Recording and Optical
Eindhoven Technical University in the Netherlands, Transmission in 2003. His research interests include:
Rand African University in RSA, Osaka University and coding theory, information theory, communication theory,
Nara Advanced Institute of Sciences and Technology in digital communications, and recording.
DESIGN AND ANALYSIS OF LOW-DENSITY PARITY-CHECK CODES 667
40. E. Netto, Zur theorie der triplesysteme, Math. Ann. 42: DIFFERENTIATED SERVICES
143–152 (1893).
41. J. Pearl, Probabilistic Reasoning in Intelligent Systems: IKJUN YEOM
Networks of Plausible Inference, Morgan Kaufmann, San Korea Advanced Institute of
Mateo, CA, 1988. Science and Technology
Seoul, South Korea
42. L. Ping and K. Y. Wu, Concatenated tree codes: A low-
complexity, high performance approach, IEEE Trans. Inform. A. L. NARASIMHA REDDY
Theory (Dec. 1999). Texas A&M University
43. D. K. Ray-Chaudhuri and R. M. Wilson, Solution of Kirk- College Station, Texas
man’s school-girl problem, Proc. Symp. Pure Math., Amer.
Mathematical Society, Providence, RI, 1971, pp. 187–203. 1. INTRODUCTION
44. T. Richardson, A. Shokrollahi, and R. Urbanke, Design of
provably good low-density parity check codes, IEEE Trans. Providing quality of service (QoS) has been an important
Inform. Theory 47: 619–637 (Feb. 2001). issue in Internet engineering, as multimedia/real-time
applications requiring certain levels of QoS such as
45. R. M. Robinson, Triple systems with prescribed subsystems,
delay/jitter bounds, certain amount of bandwidth and/or
Notices Am. Math. Soc. 18: 637 (1971).
loss rates are being developed. The current Internet
46. R. M. Robinson, The structure of certain triple systems, Math. provides only best-effort service, which does not provide
Comput. 29: 223–241 (1975). any QoS. The Internet Engineering Task Force (IETF) has
47. C. E. Shannon, A mathematical theory of communication, proposed several service architectures to satisfy different
Bell Syst. Tech. J. 372–423, 623–656 (1948). QoS requirements to different applications.
48. E. Spence and V. D. Tonchev, Extremal self-dual codes from The Differentiated Services (Diffserv) architecture has
symmetric designs, Discrete Math. 110: 165–268 (1992). been proposed by IETF to provide service differentiation
according to customers’ service profile called service-level
49. D. R. Stinson, Frames for Kirkman triple systems, Discrete
agreement (SLA) in a scalable manner [1–4]. To provide
Math. 65: 289–300 (1988).
service differentiation to different customers (or flows),
50. R. M. Tanner, A recursive approach to low complexity codes, the network needs to identify the flows, maintain the
IEEE Trans. Inform. Theory IT-27: 533–547 (Sept. 1981). requested service profile and conform the incoming traffic
51. R. M. Tanner, Minimum-distance bounds by graph analysis, based on the SLAs.
IEEE Trans. Inform. Theory 47(2): 808–821 (Feb. 2001). A simple Diffserv domain is illustrated in Fig. 1.
52. M. G. Taylor, Reliable information storage in memories In a Diffserv network, most of the work for service
designed from unreliable components, Bell Syst. Tech. J. differentiation is performed at the edge of the network
47(10): 2299–2337 (1968). while minimizing the amount of work inside the network
core, in order to provide a scalable solution. The routers
53. R. Townsend and E. J. Weldon, Self-orthogonal quasi-cyclic
at the edge of the network, called boundary routers or
codes, IEEE Trans. Inform. Theory IT-13(2): 183–195 (1967).
edge routers, may monitor, shape, classify, and mark
54. B. Vasic, Low density parity check codes: Theory and practice, differentiated services code point (DSCP) value assigned
National Storage Industry Consortium (NSIC) quarterly to specific packet treatment to packets of flows (individual
meeting, Monterey, CA, June 25–28, 2000. or aggregated) according to the subscribed service. The
55. B. Vasic, Structured iteratively decodable codes based on routers in the network core forward packets differently to
Steiner systems and their application in magnetic recording, provide the subscribed service. The core routers need to
Proc. GLOBECOM 2001, San Antonio, TX, Nov. 2001. provide only several forwarding schemes, called per-hop
56. B. Vasic, Combinatorial constructions of structured low- behaviors (PHBs) to provide service differentiation. It is
density parity check codes for iterative decoding, IEEE Trans. expected that with appropriate network engineering, per-
Inform. Theory (in press). domain behaviors (PDBs), and end-to-end services can be
57. E. J. Weldon, Jr., Difference-set cyclic codes, Bell Syst. Tech. constructed based on simple PHBs.
J. 45: 1045–1055 (Sept. 1966).
1.1. Differentiated Services Architecture
58. N. Wiberg, H.-A. Loeliger, and R. Kötter, Codes and iterative
decoding on general graphs, Eur. Trans. Telecommun. 6: In the Diffserv network, we can classify routers into two
513–525 (Sept./Oct. 1995). types, edge or boundary routers and core routers according
to their location. A boundary router may act both as
59. R. M. Wilson, Cyclotomy and difference families in elemen-
an ingress router and as an egress router depending
tary Abelian groups, J. Number Theory 4: 17–47 (1972).
on the direction of the traffic. Traffic enters a Diffserv
60. V. V. Zyablov and M. S. Pinsker, Error correction properties network at an ingress router and exits at an egress router.
and decoding complexity of the low-density parity check codes, Each type of router performs different functions to realize
2nd Int. Symp. Information Theory, Tsahkadsor, Armenia, service differentiation. In this section, we describe each of
1971. these routers with its functions and describe how service
61. V. V. Zyablov and M. S. Pinsker, Estimation of the error- differentiation is provided.
correction complexity for Gallager low-density codes, Prob-
lems Inform. Trans. 11: 18–28 (1975) [transl. from Problemy 1.1.1. Boundary Router. A basic function of ingress
Peredachi Informatsii 11(1): 23–26]. routers is to mark DSCPs to packets based on SLAs so that
DIFFERENTIATED SERVICES 669
H
E
H
R
E
H
core routers can distinguish the packets and forward them 1.1.3. Assured Forwarding PHB. Assured forwarding
differently. When a packet arrives at an ingress router, (AF) PHB has been proposed in [4,6]. In the AFPHB,
the router first identifies the flow to which the packet the edge devices of the network monitor and mark
belongs and monitors the flow to conform to its contract. incoming packets of either individual or aggregated flows.
If it conforms to its contract, the packet is marked by A packet of a flow is marked IN (in profile) if the
the DSCP of the contracted service. Otherwise, the packet temporal sending rate at the arrival time of the packet
may be delayed, discarded, or unmarked to make injected is within the contract profile of the flow. Otherwise,
traffic conform to its contract. the packet is marked OUT (out of profile). Packets of
An egress router may monitor traffic forwarded to other a flow can be hence marked both IN and OUT. The
domains and perform shaping or conditioning to follow temporal sending rate of a flow is measured using time
a traffic conditioning agreement (TCA) with the other sliding window (TSM) or a token bucket controller. IN
domains. packets are given preferential treatment at the time of
congestion; thus, OUT packets are dropped first at the time
1.1.2. Core Router. In the Diffserv model, functions of
of congestion.
core routers are minimized to achieve scalability. A core
Assured forwarding can be realized by employing RIO
router provides requested PHB specified in the DS field
(RED with IN/OUT) drop policy [6] in the core routers. RIO
of the arriving packets. Currently, expedited forwarding
drop policy is illustrated in Fig. 2. Each router maintains a
(EF) [3] and assured forwarding (AF) [4] PHBs have been
virtual queue for IN packets and a physical queue for both
standardized by IETF.
IN and OUT packets. When the network is congested and
To provide end-to-end QoS, we may consider flows that
the queue length exceeds minTh− OUT, the routers begin
transit over multiple network domains. While a hop-by-
dropping OUT packets first. If the congestion persists even
hop forwarding treatment in a Diffserv domain is defined
after dropping all incoming OUT packets and the queue
by a PHB, Per-domain behavior (PDB) is used to define
length exceeds minTh− IN, IN packets are discarded. With
edge-to-edge behavior over a Diffserv domain [5]. A PDB
this dropping policy, the RIO network gives preference
defines metrics that will be observed by a set of packets
to IN packets and provides different levels of service
with a particular DSCP while crossing a Diffserv domain.
to users per their service contracts. Different marking
The set of packets subscribing to a particular PDB is
policies [7–9] and correspondingly appropriate droppers
classified and monitored at an ingress router. Conformant
have been proposed to improve the flexibility in providing
packets are marked with a DSCP for the PHB associated
service differentiation.
with the PDB. While crossing the Diffserv domain, core
routers treat the packets based only on the DSCP. An 1.1.4. Expedited Forwarding PHB. Expedited forward-
egress router may measure and condition the set of packets ing (EF) PHB was proposed [3] as a premium service for
belonging to a PDB to ensure that exiting packets follow the Diffserv network. The EFPHB can be used to guar-
the PDB. antee low loss rate, low latency, low jitter, and assured
1
Drop probability
Que. length
minTh_OUT maxTh_OUT minTh_IN maxTh_IN Figure 2. RED IN/OUT (RIO) drop policy.
670 DIFFERENTIATED SERVICES
excess bandwidth Be as the difference between realized different reservations when there is plenty of excess
throughput and its contract rate. It is given by bandwidth in the network.
' ( ) • The realized bandwidth is observed to be inversely
k 2 W related to the RTT of the flow.
3 −R if R ≥ &
4RTT pout 2 • For best-effort flows, R = 0. Hence, Be (= k 6/pout /
Be = '( ) (2) 2RTT; see line D in Fig. 4) gives the bandwidth likely
k 6 to be realized by flows with no reservation.
R2 + − R otherwise
2RTT pout • Comparing the above mentioned best-effort band-
&
width when R ≥ 2/pout , we realize that the reser-
If Be of a flow is positive, this means that the flow vation rates larger than 3.5 times the best-effort
obtains more than its contract rate. Otherwise, it does bandwidth cannot be met.
not reach its contract rate. From (2) and Fig. 4, we can
• Equation (2) clearly shows that excess bandwidth
observe that
cannot be equally shared by flows with different
reservations without any enhancements to basic RIO
• When&a flow reserves relatively higher bandwidth scheme or to TCP’s congestion avoidance mechanism.
(R ≥ 2/pout ), Be is decreased as the reservation
rate
& is increased. Moreover, if R is greater than When several TCP flows are aggregated, the impact
3 2/pout (see line C in Fig. 4), the flow cannot reach of an individual TCP sawtooth behavior is reduced,
its reservation rate. and the aggregated sending rate is stabilized. If the
• When& a flow reserves relatively lower bandwidth marker neither maintains per-flow state nor employs
(R < 2/pout ; see line B in Fig. 4), it always realizes other specific methods for distinguishing individual
at least its reservation rate. As it reserves less flows, an arriving packet is marked IN with the
bandwidth, it obtains more excess bandwidth. probability, pm (= contract rate/aggregated sending rate).
TCP’s multiplicative decrease of sending rate after In the steady state, pm is approximately equal for all the
observing a packet drop results in a higher loss of individual flows. A flow sending more packets then gets
bandwidth for flows with higher reservations. This more IN packets, and consequently, the contract rates
explains the observed behavior. consumed by individual flows is roughly proportional
• Equation (2) also shows that as the probability of to their sending rates. We call this marking behavior
OUT packet drop decreases, the flows with smaller proportional marking.
reservation benefit more than do the flows with With the assumptions that all packets of aggregated
larger reservations. This points to the difficulty in flows are of the same size, k, a receiver does not employ
providing service differentiation between flows of delayed ACK, and the network is not oversubscribed, the
throughput of the ith flow and the aggregated throughput
BA of n flows are given in [19] by
× 105 mi 3rA 3k
10 Bi = · + mi (3)
Simulation
n
4 4
9 mj
Estimation
j=1
8
n
3k
n
Achieved rate (bps)
7
BA = Bi = mi (4)
6 i=1
4 i=1
• A: B = Contract rate
n
where Bs = 3k
4
mi , and it is approximately the through-
2
• B: R = i=1
√ pout put which the aggregated flows can achieve with zero
• C: R = 3 2 contract rate (rA = 0). According to the analysis above, the
√ pout following observations can be made:
• D: B = k 6
2RTT √ pout
• The total throughput realized by an aggregation is
Figure 4. Observations from the model. impacted by the contract rate. The larger the contract
672 DIFFERENTIATED SERVICES
rate, the smaller the excess bandwidth claimed by the Cross traf. 1 Cross traf. n − 1
aggregation.
• When the contract rate is larger than 4 times Bs , the Tagged Tagged
realized throughput is smaller than the contract rate. flow 1 2 n −1 n flow
• The realized throughput of a flow is impacted by
the other flows in the aggregation (as a result of Cross traf. 1 Cross traf. n −1
the impact on pm ) when proportional marking is
employed. Figure 5. Multihop topology.
(a) (b)
Target rate
Instantaneous rate
0.5 Avg. achieved rate 0.5
Achieved rate (Mbps)
0.3 0.3
0.2 0.2
Target rate
0.1 0.1
Instantaneous rate
Avg. achieved rate
0 0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
Marking rate (Mbps) Marking rate (Mbps)
(c)
0.5
Achieved rate (Mbps)
0.4
0.3
0.2
Target rate
0.1
Instantaneous rate
Avg. achieved rate
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
Marking rate (Mbps)
Figure 6. Achieved rates with the adaptive marking, with targets of (a) 0.1 Mbps, (b) 0.3 Mbps,
and (c) 0.5 Mbps .
4.5
3. DELAY ANALYSIS OF THE EFPHB
4
3.5 The EFPHB is proposed to provide low delay, jitter, and
3
loss rate. Low loss rate is easily achievable with strict
admission control and traffic conditioning. Regarding the
2.5 EFPHB, providing low delay and jitter is an important
50 60 70 80 90 100 110 120 130 140 150
issue and has been widely studied. In this section, we
Round–trip time (msec.)
introduce studies on delay analysis of the EFPHB and
Figure 7. Throughput of flows with different round-trip present some experimental work for practical performance
times . evaluation of the EFPHB.
674 DIFFERENTIATED SERVICES
3.1. Virtual Wire Service at a node that is serving a non-EF packet. The worst-case
delay occurs when the EF packet arrives immediately after
Virtual wire (VW) service aims to provide a very low end-
the node begins to send a non-EF packet, and it is S/B,
to-end delay jitter for flows subscribing to the EFPHB [22].
where S is the packet size. From (7), n should be at least 2
More precisely, as the term, virtual wire suggests, VW
in order to satisfy the jitter window, and which means that
service is intended to ‘‘mimic, from the point of view of
the bandwidth assigned to EF traffic should be configured
the originating and terminating nodes, the behavior of a
to be less than half of the link bandwidth.
hard-wired circuit of some fixed capacity [22].’’
To provide VW service, the following two conditions
3.2. Packet-Scale Rate Guarantee
are required: (1) each node in the domain must implement
the EFPHB, and (2) conditioning the aggregate so that it’s Packet-scale rate guarantee service has been proposed to
arrival rate at any node is always less than that node’s analyze and characterize the delay of the EFPHB more
departure rate. If the arrival rate is less than the virtual precisely [10–12]. A node is said to provide packet-scale
wire’s configured rate, packets are delivered with almost rate guarantee R with latency E if, all j ≥ 0, the jth
no distortion in the interpacket timing. Otherwise, packets departure time, d(j), of the EF traffic is less than or equal
are unconditionally discarded not to disturb other traffic. to F(j) + E, where F(j) is defined iteratively by
In [22], a jitter window is defined as the maximum time
interval between two consecutive packets belonging to a F(0) = 0, d(0) = 0 (8a)
flow so that the destination cannot detect delay jitter.
Consider a constant-bit rate (CBR) flow with rate R. The For all j > 0:
packets of the flow arrive at an egress router through a
link of bandwidth B(= nR) and leave to their destination L(j)
F(j) = max[a(j), min{d(j − 1), F(j − 1)}] + (8b)
through a link of bandwidth R as shown in Fig. 8. R
Let’s consider two consecutive packets: P0 and P1 . To
where a(j) is the arrival time of the jth packet of
transmit P1 immediately after transmitting P0 and to hide
length L(j).
jitter, the last bit of P1 should arrive at the node before T1 .
If a node provides packet-scale rate guarantee, it has
T1 is the time when the last bit of P0 leaves the node and
been shown [10] that any EF packet arriving at a node
is calculated by
S at time t will leave that node no later than at time
T1 = T0 + (6) t + Q/R + E, where Q is the total backlogged EF traffic at
R
time t. This property infers that a single-hop worst delay is
where S is the packet size. Then, the jitter window, , is bounded by B/R + Ep when all input traffic is constrained
given by by a leaky-bucket regulator with parameters (R, B), where
S S S 1 R is the configured rate, and B is the bucket size. Note
= − = × 1− (7)
R B R n that Ep is the error term for processing individual EF
packets.
Generally, a packet experiences propagation, transmis- If every node in a Diffserv domain regulates its EF
sion, and queueing delay while traveling a network path. input traffic with the maximum burst size B, then
As long as the path is identical, propagation and transmis- the worst case delay of each packet crossing h hops
sion delay are the same. Therefore, as long as the sum of is just h times the single hop delay. However, the
queueing delays that a packet observes in an EF domain Diffserv architecture performs traffic conditioning only
is less than , the destination does not observe jitter. at the ingress and not in the interior. As a result, it
There are three possible sources of queueing delay of a is possible for bursty traffic larger than B to arrive
VW packet: (1) non-EF packets ahead of the packet in a at a node in the network even if the EF traffic has
queue, (2) the previous VW packet of the same flow, and been regulated initially at the ingress. This problem
(3) VW packets of other flows. In a properly configured EF may be solved by appropriately distributing traffic and
domain, the arrival rate of the EF traffic should be less limiting the number of hops traversed by each flow.
than the departure rate at any node. Then, the queueing Therefore, we need to consider topology constraints as
delay caused by the other EF traffic (cases 2 and 3) is zero. well as ingress traffic conditioning when designing an
In an EF domain, EF and non-EF packets are separately EF service. Topology-independent utilization bounds have
queued, and EF packets are serviced at a higher priority. been obtained for providing bounded delay service that
Therefore, case 1 occurs only when an EF packet arrives take network diameter into account [23].
is required to configure EF nodes [3]. The parameter 4. NOVEL AND OTHER QoS
route specifies a DS-domain-to-DS-domain route. The
parameters startTime and endTime denote the time The Diffserv framework is still in development, and new
duration of the traffic, peakRate is the peak rate of services are constantly being proposed. In this section, we
the traffic, MTU is the maximum transmission unit, and introduce some of the proposals receiving attention.
jitter is the worst-case jitter required. QPS traffic is
regulated at an ingress node by a token bucket profiler 4.1. A Bulk Handling Per-Domain Behavior
with token rate of peakRate and bucket depth of MTU.
Some applications need to transfer bulk data such as movie
The QPS provides low loss rate, low queueing delay,
files or backup data over the network without any timing
and low jitter as long as the traffic arriving at the ingress
constraints. These transfers do not need any assurance as
node conforms to its token bucket profile. It is necessary
long as they are eventually completed. A bulk-handling
to discard the nonconformant packets or to reshape them
(BH) per-domain behavior (PDB) has been proposed [28]
in order to achieve the desired QoS.
for supporting such traffic. It is reasonable to exclude
QBone enables us to evaluate and analyze the
such traffic from best-effort traffic in order to prevent
Diffserv performance more practically through large scale,
competition with other valuable traffic.
interdomain experiments. An interesting experiment
BHPDB traffic has the lowest priority in a Diffserv
with video streaming applications has been conducted
network. A BH packet is transferred only when there is
for observing end-to-end performance of the Diffserv
no other packet. In the presence of other traffic, it may
network [26]. Here we present a summary of the
be discarded or delayed. To implement the BHPDB, only
experiment and results.
marking for PDB is enough. Policing or conditioning is
3.3.1. Video Streaming Applications in Diffserv Net- not required since there is no configured rate, reserved
works. With faster computer networks and improved resources or guarantees. BHPDB traffic is different from
compression schemes, video streaming applications are best-effort (BE) traffic in that there are at least some
becoming popular. To improve quality of video playback resources assigned to BE traffic.
on a given network, the applications deploy new technolo-
gies such as FEC (forward error correction) and layered 4.2. An Assured Rate Per-Domain Behavior
coding to adjust to network dynamics. However, there is An assured rate (AR) PDB has been proposed [29]. The
a limit to improving the quality without any QoS support ARPDB provides assured rate for one-to-one, one-to-few,
from the network. To play back video clips seamlessly, a or one-to-any traffic. Here one-to-one traffic means traffic
certain amount of bandwidth, bounded jitter, and low loss entering a network at one ingress router and exiting at one
rate are required. Therefore, video streaming applications egress router, and one-to-few traffic means traffic having
can be good target applications of a Diffserv network. more than one fixed egress routers. One-to-any traffic
Experiments with video streaming applications in a refers to traffic entering at one ingress router and exiting
Diffserv network with EFPHB have been performed [26]. at multiple (any) egress routers. Assured rate can be
A local network with several routers configured for implemented using the AFPHB.
EFPHB connected to the QBone network was used The ARPDB is suitable for traffic requiring certain
as a testbed. Quality index and frame loss rate were amount of bandwidth but no delay bounds. A possible
measured with various token rates and bucket sizes example service with the ARPDB is VPN (virtual private
at ingress router policers. Quality index was measured network) services with one-to-few traffic. One-to-any
by a variant of ITS (Institute of Telecommunication service of the ARPDB can be used to deliver multicast
Sciences) video quality measurement (VQM) tool [27]. traffic with a single source, but appropriate traffic
The VQM captures features of individual and sequence
measurement should be supported since packets in a
of frames from both the received and original streams,
multicast are duplicated in the interior of the network
and measures the difference between them. The VQM
and the total amount of traffic at the egress routers may
was originally developed to measure quality of television
be larger than the amount of traffic at the ingress router.
and videoconferencing systems. The variant was modified
The ARPDB stipulates how packets should be marked and
from the VQM for measuring quality of video stream
treated across all the routers within a Diffserv domain.
transmitted over IP networks.
From the experiments and measurements, interesting
4.3. Relative Differentiated Services
relationship between video quality and token bucket
parameters of EF ingress policer have been observed: Relative differentiated service has been proposed [30] to
(1) the quality of video stream can be controlled by the provide service differentiation without admission control.
ingress policer, (2) frame loss rate itself does not reflect The fundamental idea of the relative differentiated
the quality of the video stream, (3) the token rate should services is based on the fact that absolute service
be configured to be larger than the video encoding rate guarantees are not achievable without admission control.
for achieving acceptable quality, and (4) a small increase In such cases, only relative service differentiation can be
of bucket depth may result in substantial improvement of provided since network resources are limited.
the video quality. A proportional delay differentiation model, which pro-
The first observation confirms that the Diffserv vides delays to each class on the basis of a proportionality
network can provide service differentiation to real-world constraint, has been proposed [30]. Backlog-proportional
applications. rate schedulers and waiting-time priority schedulers [31]
676 DIFFERENTIATED SERVICES
24. B. Teitelbaum, QBone Architecture (v1.0), Internet2 QoS was the invention of frequency modulation (FM) by
Working Group Draft, Aug. 1999; see: http://www.internet Armstrong in the 1930s and the improvement in audio
2.edu/qos/wg/qbArch/1.0/draft-i2-qbonearch-1.0.html. quality that came along with the new technology. Because
25. K. Nichols, V. Jacobson, and L. Zhang, A Two-Bit Differen- FM needed higher transmission bandwidths, it was
tiated Services Architecture for the Internet, Nov. 1997; see: necessary to move to higher frequencies. While the first FM
ftp://ftp.ee.lbl.gov/papers/dsarch.pdf. radio stations in the United States operated at frequencies
26. W. Ashmawi, R. Guerin, S. Wolf, and M. Pinson, On the around 40 MHz, today’s FM radio spectrum is located in
impact of pricing and rate guarantees in Diff-Serv networks: the internationally recognized VHF FM band from 88
A video streaming application perspective, Proc. ACM to 108 MHz. Although the reach of the FM signals was
Sigcomm’01, Aug. 2001. below that of the former AM programs, due to the shorter
27. ITU-T Recommendation J. 143, User Requirements for Objec- wavelength of the FM band frequencies, broadcasters
tive Perceptual Video Quality Measurements in Digital Cable adopted the new technology after initial skepticism and
Television, Recommendations of the ITU, Telecommunication the popularity of FM broadcasting grew rapidly after its
Standardization Sector. introduction.
28. B. Carpenter and K. Nichols, A Bulk Handling Per- The next step toward increased perceived quality was
Domain Behavior for Differentiated Services, Internet Draft, the introduction of stereobroadcasting in the 1960s. Con-
Jan. 2001; see: http://www.ietf.org/internet-drafts/draft-ietf- stant advances in audiobroadcasting technology resulted
diffserv-pdb-bh-02.txt. in today’s high-quality audio reception with analog trans-
29. N. Seddigh, B. Nandy, and J. Heinanen, An Assured Rate mission technology. So, what is the reason for putting in a
Per-Domain Behavior for Differentiated Services, Work in lot of time and effort into developing a new digital system?
Progress, July 2001. In 1982 the compact disk (CD) was introduced and
30. C. Dovrolis, D. Stiliadis, and P. Ramanathan, Proportional replaced the existing analog record technology within just
differentiated services: delay differentiation and packet a few years. The advantages of this new digital medium
scheduling, Proc. SIGCOMM’99, Aug. 1999, pp. 109–120. are constant high audio quality combined with robustness
31. L. Kleinrock, Queueing Systems, Vol. II, Wiley, 1976. against mechanical impacts. The new technology also
resulted in a new awareness of audio quality and the
32. C. Dovrolis and P. Ramanathan, Proportional differentiated
services, Part 2: Loss Rate Differentiation and Packet CD signal became a quality standard also for other audio
Dropping, Proc. IWQoS’00, Pittsburgh PA, June 2000. services like broadcasting.
Using good analog FM equipment, the audio quality
is fairly comparable to CD sound when the receiver
is stationary. However, along with the development of
DIGITAL AUDIOBROADCASTING broadcasting technology, the customs of radio listeners
also changed. While at the beginning, almost all receivers
RAINER BAUER were stationary, nowadays many people are listening to
Munich University of audiobroadcasting services in their cars, which creates
Technology (TUM)
a demand for high-quality radio reception in a mobile
Munich, Germany
environment. This trend also reflects the demand for
mobile communication services, which has experienced
1. INTRODUCTION a dramatic boom.
Due to the character of the transmission channel, which
The history of broadcasting goes back to James Clerk leads to reflections from mountains, buildings, and cars,
Maxwell, who theoretically predicted electromagnetic for example, in combination with a permanently changing
radiation in 1861; and Heinrich Hertz, who experimentally environment caused by the moving receiver, a number
verified their existence in 1887. The young Italian of problems arise that cannot be handled in a satisfying
Guglielmo Marconi picked up the ideas of Hertz and way with existing analog systems. The problem here is
began to rebuild and refine the Hertzian experiments and termed multipath propagation and is illustrated in Fig. 1.
soon was able to carry out signaling over short distances The transmitted signal arrives at the receiver not only
in the garden of the family estate near Bologna. In the via the direct (line-of-sight) path but also as reflected and
following years the equipment was continuously improved scattered components that correspond to different path
and transmission over long distances became possible. delays and phase angles. This often results in severe
The first commercial application of radio transmission interference and therefore signal distortion.
was point-to-point communication to ships, which was Another drawback is interference caused by transmis-
exclusively provided by Marconi’s own company. Pushed sion in neighboring frequency bands (adjacent-channel
by the promising possibilities of wireless communication interference) or by transmission of different programs
the idea of broadcasting soon also came up. The first in the same frequency with insufficient spacial distance
scheduled radio program was transmitted by a 100-W between the transmitters (cochannel interference).
transmitter at a frequency around 900 kHz in Pittsburgh The tremendous progresses in digital signal processing
late in 1920. and microelectronics in the 1980s initiated a trend to
The installation of more powerful transmitters and supplement and even substitute existing analog systems
progress in receiver technology led to a rapidly increasing with digital systems. Introduction of the compact disk
number of listeners. A milestone in broadcasting history was already mentioned above, but a change from analog
678 DIGITAL AUDIOBROADCASTING
to digital systems also took place in the field of and mass communications. Broadcasting will have to face
personal communications. This was seen, for example, competition with other communication networks that have
in the introduction of ISDN (Integrated Services Digital access to the consumer. To meet future demands, it is
Network) in wireline communications and also in GSM necessary to introduce digital broadcasting systems.
(Global System for Mobile Communications) for personal
mobile communications. The latter system caused a
2. SYSTEM ASPECTS AND DIFFERENT APPROACHES
rapidly growing market for mobile communications
systems and drove the development of new high-
Besides technical aspects, the introduction of a digital
performance wireless communication techniques.
audiobroadcasting system must also meet market issues.
Several advantages of digital systems suggest the appli-
The transition from an analog to a digital system has
cation of digital techniques also in broadcasting, which
to take place gradually. Users have to be convinced
is still dominated by analog systems [1]. For instance,
that digital audiobroadcasting has significant advantages
advanced digital receiver structures and transmission and
compared to analog systems in order to bring themselves
detection methods are capable of eliminating the distor-
to buy new digital receivers instead of conventional
tions caused by multipath propagation, which enables
analog ones. This can be obtained only by added value
high-quality reception also in mobile environments.
and new services. On the other hand, the situation
To use the available frequency spectrum in an efficient
of the broadcasters and network operators must be
way, the digitized audio signals can be compressed
considered. Simultaneous transmission of digital and
with powerful data reduction techniques to achieve
analog programs results in increasing costs that have
a compression factor of up to 12 without perceptible
to be covered. Standards have to be adopted to guarantee
degradation of audio quality compared to a CD signal.
planning reliability to broadcasters, operators, and also
Techniques such as channel coding to correct trans-
manufacturers [2].
mission errors are applicable only to digital signals. These
methods allow an error free transmission even when errors
2.1. An Early Approach Starts Services Worldwide
occur on the transmission channel.
With advanced signal detection and channel coding An early approach to standardize a digital audiobroad-
techniques, a significantly lower signal to noise ratio is casting system was started in the 1980s by a European
sufficient in producing a particular sound quality when telecommunication consortium named Eureka. In 1992
compared to equivalent analog transmission systems. This the so-called Eureka-147 DAB system was recommended
results in lower transmission power and therefore reduced by the International Telecommunication Union (ITU) and
costs and also less ‘‘electromagnetic pollution,’’ which has became an ITU standard in 1994. The standard is aimed at
recently become a topic of increasing importance and terrestrial and satellite sound broadcasting to vehicular,
relevance. portable and fixed receivers and is intended to replace the
Another advantage of a digital broadcasting system analog FM networks in the future. The European Telecom-
is its flexibility. While conventional analog systems are munications Standard Institute (ETSI) also adopted this
restricted to audio transmission, a digital system can standard in 1995 [3].
also provide other services besides audio transmission The terrestrial service uses a multicarrier modulation
and is therefore open to trends and demands of technique called coded orthogonal frequency-devision
the future. In combination with other digital services multiplex (COFDM), which is described later in this article
such as Internet or personal communications, a digital along with other technical aspects of this system. With
broadcasting technology further supports the convergence this multiplexing approach up to six high-quality audio
of information and communication techniques. Also programs (called an ensemble) are transmitted via closely
there is a change of paradigm in the way we obtain spaced orthogonal carriers that allocate a total bandwidth
information in everyday life. The growing importance of of 1.536 MHz. The overall net data rate that can be
the Internet obliterates the boundaries between individual transmitted in this ensemble is approximately 1.5 Mbps
DIGITAL AUDIOBROADCASTING 679
(megabits per second) [4]. Although originally a sound several advantages with respect to the received signal
broadcasting system was to be developed, the multiplex quality. However, the ensemble format also shows
signal of Eureka-147 DAB is very flexible, allowing the drawbacks that aggravate the implementation of Eureka-
transmission of various data services also. Because of the 147 DAB in certain countries. The broadcasting market
spreading in frequency by OFDM combined with channel in the United States for example is highly commercial
coding and time interleaving, a reliable transmission can with many independent private broadcasters. In this
be guaranteed with this concept. A further advantage environment it would be almost impossible to settle on
of the Eureka-147 system is the efficient use of radio a joint transition scenario from analog to digital. Also, the
spectrum by broadcasting in a single-frequency network reserved frequency spectrum in the L band is not available
(SFN). This means that one program is available at in the United States [6]. Therefore, a different system was
the same frequency across the whole coverage area of considered by the U.S. broadcasting industry. The idea was
the network. In a conventional analog audiobroadcasting that no additional frequency spectrum (which would cause
environment, each program is transmitted on a separate additional costs, because spectrum is not assigned but
carrier (in FM broadcasting with a bandwidth of auctioned off in the United States) should be necessary,
300–400 kHz). If the network offers the same program and the broadcasters should remain independent from
outside the coverage area of a particular transmitter each other and be able to decide on their own when
another frequency is usually used to avoid cochannel to switch from analog to digital. A concept that fulfills
interference. On the other hand, adjacent channels within all these demands is in-band/on-channel (IBOC) digital
the coverage area of one transmitter are not used in order audiobroadcasting. In this approach the digital signal is
to reduce adjacent-channel interference. This leads to a transmitted in the frequency portion to the left and the
complex network and requires a sophisticated frequency right of the analog spectrum, which is left free to avoid
management, both of which can be avoided with a single- interference from nearby signals. The basic concept is
frequency network. The basic structures of a single- shown in Fig. 3. The spectral position and power of the
frequency network and a multiple frequency network digital signal is designed to meet the requirements of the
are compared in Fig. 2. A further advantage of a single- spectrum mask of the analog signal. The IBOC systems
frequency network is the simple installation of additional will be launched in an ‘‘hybrid IBOC’’ mode, where the
transmitters to fill gaps in the coverage. digital signal is transmitted simultaneously to the analog
To operate a system like Eureka-147 DAB dedicated one. When there is sufficient market penetration of the
frequency bands have to be reserved because it cannot digital services, the analog programs can be switched off
coexist with an analog system in the same frequency and the spectrum can be filled with a digital signal to
band. In 1992 the World Administrative Radio Conference obtain an all-digital IBOC system.
(WARC) reserved almost worldwide a 40-MHz wide After the merger of USA Digital Radio (USADR) and
portion (1452–1492 MHz) of the L band for this purpose. Lucent Digital Radio (LDR) to iBiquity Digital Corp. there
In several countries a part of the spectrum in the VHF is currently a single developer of IBOC technology in the
band III (in Germany, 223–240 MHz) is also reserved for United States.
Eureka-147 DAB.
At present, the Eureka-147 DAB system is in operation 2.3. Digital Sound Broadcasting in Frequency Bands Below
or in pilot service in many countries worldwide [5]. 30 MHz
2.2. A System for More Flexibility The first radio programs in the 1920s were transmitted
at frequencies around 900 kHz using analog amplitude
The strategy to merge several programs into one modulation (AM). Although frequency modulation (FM),
broadband multiplex signal prior to transmission has which was introduced in the late 1930s and operated at
higher frequencies, allows a better audio quality, there
(a) are still many radio stations, particularly in the United
States, that use the AM bands for audiobroadcasting. One
f6 f7 advantage of these bands is the large coverage area that
f1
f3
f4
f5
f1 Power-
f2 spectrum
Analog host
(b) Digital sidebands
f1 f1
f1
f1 f1
f1 f1
f1 f1
f1
f
Figure 2. Conventional FM networks (a) and single-frequency
network (b). Figure 3. In-band/on-channel signaling.
680 DIGITAL AUDIOBROADCASTING
can be obtained because of the propagation conditions of diversity when a direct signal to one of the satellites is
these frequencies. A major drawback, however, is the poor blocked.
sound quality, due to the limited bandwidth available in To guarantee reception even in situations when the
these low-frequency bands. satellite signal is totally blocked, for example, in tunnels or
On the other hand, the coverage properties still make in urban canyons, Sirius and XM use terrestrial repeaters
this frequency band attractive for audiobroadcasting. With that rebroadcast the signal.
state-of-the-art audio compression technology and digital One major advantage of a satellite system is the
transmission methods, the sound quality can be increased large coverage area. Regions with no or poor terrestrial
to near FM quality. broadcasting infrastructure can be easily supplied with
In 1994 the European project (Narrow Band Digital high-quality audio services. On the other hand, it is
Broadcasting (NADIB)) started to develop concepts for a difficult to provide locally oriented services with this
digital system operating in the AM band. The international approach.
consortium Digital Radio Mondiale (DRM) [7], which
was officially founded in 1998, began to develop an 2.5. Integrated Broadcasting Systems
appropriate system. Together with a second approach
Besides systems that are designed primarily to broadcast
proposed by iBiquity Digital Corp., the ITU finally gave
audio services, approaches emerge that provide a technical
a recommendation that encompasses both systems for
platform for general broadcasting services. One of these
digital sound broadcasting in the frequency bands below
systems is the Japanese Integrated Services Digital Broad-
30 MHz [8].
casting (ISDB) approach, which covers satellite, cable, and
One major difference between both approaches is the
terrestrial broadcasting as distribution channels. ISDB is
strategy to deal with existing analog AM signals. While
intended to be a very flexible multimedia broadcasting
the DRM scheme is an all-digital approach that occupies
concept that incorporates sound and television and data
all frequencies within the channels with a bandwidth of 9
broadcasting in one system. The terrestrial component
or 10 kHz or multiples of these bandwidths, the iBiquity
ISDB-T, which is based on OFDM transmission technol-
system applies the IBOC strategy described above for FM
ogy, will be available by 2003–2005. A second scheme that
systems.
should be mentioned here is DVB-T, the terrestrial branch
of the European digital video broadcasting (DVB) system,
2.4. Mobile Satellite Digital Audio Radio Services (SDARS)
which is also capable of transmitting transparent services
While the intention of terrestrial digital audio broadcast- but is not optimized for mobile transmission.
ing technology is to replace existing analog radio in the
AM and FM bands, new satellite platforms are emerging 2.6. Which Is the Best System?
that are intended for mobile users. Previous satellite ser-
To summarize the different approaches in sound broad-
vices such as ASTRA Digital (a proprietary standard of
casting to mobile receivers, we have to consider the basic
the satellite operator SES/ASTRA) or DVB-S, the satel-
demands of the respective market (both customer and
lite distribution channel of the European DVB (Digital
broadcaster aspects).
Video Broadcasting) system, were designed to serve sta-
If the situation among the broadcasters is sufficiently
tionary users with directional satellite dishes. In 1999 the
homogeneous, which allows the combination of several
WorldSpace Corp. started its satellite service to provide
individual programs to ensembles that are jointly
digital high quality audio to emerging market regions such
transmitted in a multiplex, then the Eureka-147 system
as Africa, Asia, and South and Central America [9]. Orig-
provides a framework for spectrally efficient nationwide
inally developed to serve portable receivers, the system is
digital audiobroadcasting. By adopting this system, a
about to be expanded to mobile receivers as well.
strategy for the transition from analog to digital that
Two systems that are targeting mobile users from the
is supported by all parties involved must be mapped out.
beginning are the two U.S. satellite systems operated
A more flexible transition from analog to digital with
by Sirius Satellite Radio Inc. and XM Satellite Radio Inc.,
no need for additional frequency bands is possible with
which intended to launch their commercial service in 2001.
the IBOC approach. Also, small local radio stations can be
Both systems are based on proprietary technology [10].
more easily considered using this approach.
The three systems differ in the orbital configurations
However, if the strategy is to cover large areas with the
of their satellites. The complete WorldSpace network will
same service, then satellite systems offer advantages over
consist of three geostationary satellites (each using three
the terrestrial distribution channel.
spot beams and serving one of the intended coverage areas
Africa, parts of Asia, and South and Central America). The
XM system uses only two geostationary satellites located 3. THE EUREKA-147 DAB SYSTEM
to guarantee optimum coverage of the United States. A
completely different approach was chosen for the Sirius Several existing and future systems have been summa-
system, where three satellites in a highly elliptical orbit rized in the previous section. In this section we focus on
rise and set over the coverage area every 16 h. The orbit Eureka-147 DAB as it was the first digital audiobroad-
enables the satellites to move across the coverage area at casting system in operational service.
a high altitude (even higher than a geostationary orbit) Eureka-147 DAB is designed to be the successor of
and therefore also provide a high elevation angle. Two of FM stereobroadcasting. It started as a proposal of a
the three satellites are always visible to provide sufficient European consortium and is now in operational or pilot
DIGITAL AUDIOBROADCASTING 681
service in countries around the world [5]. Although several services together with information about the multiplex
alternative approaches are under consideration, right now structure, service information, and so on. Audio and data
Eureka-147 DAB is the first all-digital sound broadcasting services form the main service channel (MSC), while
system that has been in operation for years. When we refer service and multiplex information are combined in the fast
to DAB below, we mean Eureka-147 DAB. Some features information channel (FIC). Every input branch undergoes
of the system are a specific channel coding matched to the particular
protection level required by the channel. Data compression
• Data compression with MPEG-Audio layer 2 is specified only for audio signals in the DAB standard.
(MPEG — Moving Picture Experts Group) according Data services are transparent and are transmitted in
to the standards ISO-MPEG 11172-3 and ISO-MPEG either a packet mode or a stream mode.
1318-3 (for comparison with the well-known audio After additional time interleaving, which is skipped
compression algorithm MP3, which is MPEG-Audio in Fig. 4, audio and data services form the MSC, which
layer 3). consists of a sequence of so-called common interleaved
• Unequal error protection (UEP) is provided to the frames (CIFs) assembled by the main service multiplexer.
compressed audio data where different modes of pro- The final transmission signal is generated in the
tection are provided to meet the requirements of transmission frame multiplexer, which combines the MSC
different transmission channels, namely, radiofre- and the FIC. Together with the preceding synchronization
quency transmission for a variety of scenarios or information, OFDM symbols are formed and passed to the
cable transmission. transmitter.
In the following the main components of the block
• The concept of (coded orthogonal frequency-devision
diagram (flowchart) in Fig. 4 are explained in more detail.
multiplex (COFDM)) copes very well with the
problem of multipath propagation, which is one of 3.2. Audio Coding
the major problems in mobile systems. Furthermore,
this transmission scheme allows the operation of a Since the bit rate of a high-quality audio signal (e.g., a CD
single-frequency network (SFN). signal with 2 × 706 kbps) is too high for a spectral efficient
transmission, audio coding according to the MPEG Audio
• The DAB transmission signal carries a multiplex
layer 2 standard is applied. For a sampling frequency of
of several sound and data services. The overall
48 kHz the resulting bitstream complies with the ISO/IEC
bandwidth of one ensemble is 1.536 MHz and
11172-3 layer 2 format, while a sampling frequency of
provides a net data rate of approximately 1.5 Mbps.
24 kHz corresponds to ISO/IEC 13818-3 layer 2 LSF
The services can be combined very flexibly within one
(low sampling frequency). The main idea behind the
ensemble. Up to six high-quality audio programs or a
compression algorithms is utilization of the properties
mixture of audio and data services can be transmitted
of human audio perception, which are based on spectral
in one ensemble.
and temporal masking phenomena. The concept is shown
in Fig. 5, where the sound pressure level is plotted as
3.1. General Concept
a function of frequency. The solid curve indicates the
The basic concept of signal generation in a DAB system threshold in quiet, which is the sound pressure level of a
is shown in Fig. 4. The input to the system may either pure tone that is barely audible in a quiet environment.
be one or several audio programs or one or several data The curve shows that a tone has to show a higher level
90
Masker
80
50
40
30
20
10
0
at very low and very high frequencies to be perceivable Each segment is then transformed from the time domain
than at medium frequencies around 3 kHz, where the into the frequency domain by a filter bank with 32
human auditory system is very sensitive. Besides the subbands. In parallel, the masking threshold is calculated
threshold in quiet, each additional sound also creates for each segment in a psychoacoustic model. The subband
a masking pattern that is also depicted in the figure. samples undergo a quantization process in which the
The shape of the pattern depends on the level and the number of quantization levels are controlled by the
frequency of the underlying sound. A general observation requirements given by the psychoacoustic model. Finally,
is that the slope of the masking pattern is steeper toward the quantized samples together with the corresponding
lower frequencies than in the opposite direction. All sound side information that is necessary to reconstruct the signal
events that lie below this masking pattern (also indicated in the decoder are multiplexed into an audio frame. The
in Fig. 5) are not perceivable by the human ear and frame also contains program-associated data (PAD).
therefore do not have to be transmitted. Since a general The bit rates available for DAB are between 8 and
audio signal consists of a more complex spectrum, the 192 kbps for a monophonic channel. To achieve an audio
first step in audio coding is a transformation from the quality that is comparable to CD quality, approximately
time domain into the frequency domain. Each frequency 100 kbps are necessary per monochannel, which means
component creates a masking pattern by itself, and the a data reduction of a factor of ∼7. Since the Eureka-147
masking pattern of the overall signal can be calculated DAB standard was fixed in the early 1990s, MPEG Audio
by a superposition of the individual patterns. The signal layer 2 was chosen for audio coding because it combines
is divided into frequency bands in the spectral domain, good compression results with reasonable complexity.
and each band is coded (quantized) separately in such a With today’s leading-edge audio compression algo-
way that the quantization noise lies below the masking rithms such as MPEG2-AAC (advanced audiocoding) or
threshold in this band. By this technique the quantization the Lucent PAC (perceptual audiocoder) codec, compres-
noise can be shaped along the frequency axis, and the sion gains of a factor of 12 can be realized without
overall bit rate that is necessary to represent the signal noticeable differences in a CD signal. The AAC codec,
can be reduced. for example, is proposed for audiocoding in the Digital
A simplified block diagram of the audio encoder is Radio Mondiale approach as well as in the Japanese ISDB
shown in Fig. 6. For processing, the sampled signal is system, while the PAC technology is applied in the Sirius
divided into segments of length 24 or 48 ms, respectively. and XM satellite services, for example.
PCM 32 DAB
audio Filter bank Quantization Frame
samples 32 sub-bands and coding packing
Audio frame
Psychoacoustic Bit
model allocation PAD
3.3. Channel Coding the corresponding bit at the output of the convolutional
encoder is transmitted, while a binary 0 indicates that
Between the components for audioencoding and channel the bit is not transmitted (punctured). In the DAB
encoding an energy dispersal is performed by a pseudo specification, 24 of these puncturing patterns are defined,
random scrambling that reduces the probability of which allows a selection of code rates between 89 and
systematic regular bit patterns in the data stream. 8
. A code rate of 89 means that for every 8 bits that
32
The basic idea of channel coding is to add redundancy enter the encoder, 9 bits are finally transmitted over the
to a digital signal in such a way, that the redundancy channel (depicted in the upper pattern in Fig. 7). On the
can be exploited in the decoder to correct transmission other hand, with a code rate of 32 8
, all output bits are
errors. In DAB, a convolutional code with memory six transmitted as indicated by the lower pattern in Fig. 7.
and rate 14 is applied. The encoder is depicted in Fig. 7. To apply unequal error protection, the puncturing pattern
The code rate 14 means that for every information bit can be changed within an audio frame. This is shown
that enters the encoder, 4 coded bits are produced. In in Fig. 8. While header and side information is protected
the case of an audio signal not every part of the source with the largest amount of redundancy, the scale factors
coded audio frame has the same sensitivity to transmission are protected by a weaker code, and the subband samples
errors. Very sensitive segments have to be protected by finally have the least protection. The program-associated
a strong code, while a weaker code can be applied to data (PAD) at the end of an audio frame are protected
other parts. This concept, called unequal error protection roughly by the same code rate as the scale factors. Besides
(UEP), can easily be realized by a technique termed the different protection classes within an audio frame, five
puncturing, which means that not every output bit of protection levels are specified to meet the requirements of
the convolutional encoder is transmitted. According to a the intended transmission scenario. A cable transmission
defined rule, some of the bits are eliminated (punctured) for example needs far less error protection than a very
prior to transmission. This technique is also shown in critical mobile multipath environment. Therefore, the
Fig. 7. A binary 1 in the puncturing pattern means that DAB specifications contain 64 different protection profiles
Selection of protection
profile
+ + + + 11111111
..
.
11111111
+ + + + 10000000
..
.
11111111
+ + + 00000000
..
.
11111111
00000000
..
.
11111111
for different combinations of audio bit rate (32–384 kbps) transform (FFT). Therefore, the modulation symbols of
and protection level. a transmission frame are mapped to the corresponding
carriers before an IFFT generates the corresponding time-
3.4. Modulation and Transmission Format domain transmission signal. The individual carriers of the
DAB signal are DQPSK-modulated, where the ‘‘D’’ stands
As shown in Fig. 4, the channel coded data are multiplexed
for differential, meaning that the information is carried
in the main service multiplexer. The structure of the
in the phase difference between two successive symbols
frames that are finally transmitted across the air
rather than in the absolute phase value. This allows an
interface depends on the chosen transmission mode. DAB
information recovery at the receiver by just comparing
provides four different modes depending on the intended
the phases of two successive symbols. To initialize this
radiofrequency range. The duration of a transmission
process, the phase reference symbol has to be evaluated,
frame is either 24, 48, or 96 ms, and each frame consists
which is located at the beginning of each transmission
of data from the fast information channel (FIC) and the
frame.
main service channel (MSC) preceded by synchronization
One essential feature of OFDM we did not mention yet
information.
is the guard interval. Since each overlapping of received
The transmission scheme for DAB is orthogonal
symbols disturbs the orthogonality of the subcarriers and
frequency-devision multiplex (OFDM), the motivation for
leads to a rapidly decreasing system performance, this
which will be illustrated in a brief example.
effect has to be avoided. By introduction of a guard interval
A main problem of the mobile radio channel is
at the beginning of each OFDM symbol, this interference
multipath propagation. Let us consider a system that
can be avoided. This guard interval is generated by
transmits at a symbol rate of r = 1 Msps (million symbols
periodically repeating the tail fraction of the OFDM symbol
per second) (with QPSK modulation, this means a bit
at the beginning of the same symbol. As long as the path
rate of 2 Mbps) on a single-carrier system. Hence, one
delay is not longer than the guard interval, only data
symbol has a duration of Tsc = 1 µs (where subscript
that belong to the actual symbol fall into the receiver
‘‘sc’’ indicates single carrier), and the bandwidth of the
window. By this technique all information of delayed
system is 1/Tsc = 1 MHz. We further assume a maximum
path components contribute constructively to the received
channel delay τmax of 80 µs. This means a difference in
signal. This concept is sketched in Fig. 9.
length between the direct path and the path with the
The length of the guard interval specifies the maximal
largest delay of 24 km, which is a reasonable value for
allowed path delay. In order to fix the parameters of
a terrestrial broadcasting channel. If we consider the
an OFDM system, a tradeoff between two elementary
relation between the symbol duration and the maximal
properties of the mobile radio channel has to be made.
path delay τmax /Tsc = 80, we see that this approach leads
On one hand, the guard interval has to be sufficiently
to heavy intersymbol interference since a received signal
large to avoid interference due to multipath effects. This
is still influenced by the 80 previously sent symbols.
can be obtained by a large symbol duration. Because the
We can cope with this problem more easily if we do
carrier spacing is the inverse of the symbol duration, this
not transmit the datastream on a single carrier but
results in a large number of closely spaced carriers within
multiplex the original datastream into N parallel steams
the intended bandwidth. On the other hand, a small carrier
(e.g., N = 1000) and modulate a separate carrier frequency
spacing means a high sensitivity to frequency shifts caused
with each individual stream. With the overall symbol rate
by the Doppler effect when the receiver is moving. This, in
given above, the symbol rate on each carrier is reduced
turn, depends on the used radiofrequency and the speed
to r/N = 1000 sps, which means a symbol duration of
of the vehicle.
only Tmc = 1 ms (where subscript ‘‘mc’’ denotes multiple
In Table 1 four sets of transmission parameters are
carriers). If we compare again the symbol duration with
given for the different scenarios (transmission modes).
the maximal channel delay, a received symbol overlaps
Mode I is suitable for terrestrial single frequency networks
with only an 8% fraction of the previous symbol.
in the VHF frequency range. Mode II is suitable for
But what does this mean for the bandwidth that
smaller single-frequency networks or locally oriented
is necessary to transmit the large number of carriers?
conventional networks because of the rather small
A very elegant way to minimize the carrier spacing
guard interval. Mode III has an even smaller guard
without any interference between adjacent carriers is
interval and is designed for satellite transmission on
to use rectangular pulses on each subcarrier and
frequencies up to 3 GHz where path delay is not the
space the resulting sin(x)/x-type spectra by the inverse
dominating problem. Mode IV is designed for single-
of the pulse (symbol) duration Tmc . This results in
frequency networks operating at frequencies higher than
orthogonal carriers, and the overall bandwidth of the
those of Mode I, and the parameters are in between those
system with the parameters given above is N × 1/Tmc =
of Mode I and Mode II.
1000 × 1 kHz = 1 MHz, which is the same as for the
single-carrier approach. This multicarrier approach with
3.5. DAB Network
orthogonal subcarriers, called orthogonal frequency-
devision multiplex (OFDM), is used as the transmission As mentioned before, DAB allows the operation of single-
scheme in Eureka-147 DAB. Another advantage of OFDM frequency networks. Several transmitters synchronously
is the simple generation of the transmission signal by an broadcast the same information on the same frequency.
inverse discrete Fourier transform (IDFT), which can be This becomes possible because of the OFDM transmission
implemented with low complexity using the fast fourier technique. For the receiver, it makes no difference whether
DIGITAL AUDIOBROADCASTING 685
Channel
impulse
response
t
Transmission
m−1 m m+1 m+2 m+3
frame
Direct signal
Delayed version 2
the received signal components all come from the same of Technology (TUM) in 1995 and 2002, respectively.
transmitter or stem from different transmitters. Each Since 1995, he has been working at the Institute for
component contributes constructively to the received Communications Engineering of TUM. His areas of
signal as long as the path delays stay within the guard interest are source-channel coding and decoding, as well
interval. Therefore, the maximal distance between the as iterative decoding techniques and their application to
transmitters is determined by the length of the guard wireless audio communications.
interval specified by the used transmission mode. To
ensure synchronous transmission within the network,
BIBLIOGRAPHY
a time reference is necessary, which can be provided,
for example, by the satellite-based Global Positioning
System (GPS). 1. W. Hoeg and T. Lauterbach, eds., Digital Audio Broadcast-
ing — Principles and Applications, Wiley, New York, 2001.
2. WorldDAB Forum (no date), public documents (online),
BIOGRAPHY WorldDAB Documentation: Thought on a transition scenario
from FM to DAB, Bayerische Medien Technik GmbH
Rainer Bauer received his Dipl.-Ing. degree in electrical (BMT), http://www.worlddab.org/dab/aboutdab frame.htm
engineering and Dr.-Ing. degree from Munich University (Aug. 2001).
686 DIGITAL FILTERS
3. European Telecommunications Standards Institute, Radio The synergy between these two factors results in an ever-
Broadcasting Systems; Digital Audio Broadcasting (DAB) to widening range of applications for digital filters, both fixed
Mobile, Portable and Fixed Receivers, European Standard and adaptive.
ETSI EN 300 401 V1.3.2 (2000-09), 2000.
4. WorldDAB Forum (no date), public documents (online), 1.1. Signals, Systems, and Filters
EUREKA-147 — Digital Audio Broadcasting, http://www.
worlddab.org/dab/aboutdab frame.htm (Aug. 2001). Signals are the key concept in telecommunications — they
represent patterns of variation of physical quantities such
5. WorldDAB (2001), homepage (online), http://www.World
DAB.org. as acceleration, velocity, pressure, and brightness. Com-
munication systems transmit the information contained in
6. D. Lavers, IBOC — Made in America, Broadcast Dialogue
a signal by converting the physical pattern of variation into
(Feb. 2000).
its electronic analogue — hence the term ‘‘analog signal.’’
7. Digital Radio Mondiale (2001), Homepage (online), http:// Systems operate on signals to modify their shape. Filters
www.DRM.org. are specialized systems, designed to achieve a particular
8. International Telecommunication Union, Systems for Digital type of signal shape modification. The relation between the
Sound Broadcasting in the Broadcasting Bands below signal that is applied to a filter and the resulting output
30 MHz, draft new recommendation ITU-R BS.[DOC.6/63], signal is known as the response of the filter.
Oct. 2000. Most signals encountered in telecommunication sys-
9. WorldSpace (2001), homepage (online), http://www.world tems are one-dimensional (1D), representing the variation
space.com. of some physical quantity as a function of a single indepen-
10. D. H. Layer, Digital radio takes the road, IEEE Spectrum dent variable (usually time). Multidimensional signals are
38(7): 40–46 (2001). functions of several independent variables. For instance, a
video signal is 3D, depending on both time and (x, y) loca-
tion in the image plane. Multidimensional signals must be
DIGITAL FILTERS converted by scanning to a 1D format before transmission
through a communication channel. For this reason, we
HANOCH LEV-ARI shall focus here only on 1D signals that vary as a function
of time.
Northeastern University
Boston, Massachusetts Filters can be classified in terms of the operation
they perform on the signal. Linear filters perform linear
operations. Offline filters process signals after they have
1. INTRODUCTION been acquired and stored in memory; online filters process
signals instantaneously, as they evolve. The majority of
The omnipresence of noise and interference in communi- filters used in telecommunication systems are online and
cation systems makes it necessary to employ filtering to linear. Since the input and output of an online filter
suppress the effects of such unwanted signal components. are continuously varying functions of time, memory is
As the cost, size, and power consumption of digital hard- required to store information about past values of the
ware continue to drop, the superiority of digital filters signal. Analog filters use capacitors and inductors as their
over their analog counterparts in meeting the increas- memory elements, while digital filters rely on electronic
ing demands of modern telecommunication equipment registers.
becomes evident in a wide range of applications. Moreover,
the added flexibility of digital implementations makes 1.2. Digital Signal Processing
adaptive digital filtering a preferred solution in situations
where time-invariant filtering is found to be inadequate. Digital signal processing is concerned with the use of
The design of linear time-invariant digital filters is by digital hardware to process digital representations of
now a mature discipline, with methodological roots reach- signals. In order to apply digital filters to real life (i.e.,
ing back to the first half of the twentieth century. Digital analog) signals, there is a need for an input interface,
filter design combines the power of modern computing with known as analog-to-digital converter, and an output
the fundamental contributions to the theory of optimized interface, known as digital-to-analog converter. These
(analog) filter design, made by Chebyshev, Butterworth, involve sampling and quantization, both of which result
Darlington, and Cauer. In contrast, the construction of in some loss of information. This loss can be reduced by
adaptive digital filters is still an evolving area, although increasing the speed and wordlength of a digital filter.
its roots can be traced back to the middle of the twentieth Digital filters have several advantages over their analog
century, to the work of Kolmogorov, Wiener, Levinson, and counterparts:
Kalman on statistically optimal filtering. Adaptive filter-
ing implementations became practical only after the early • Programmability — a time-invariant digital filter
1980s, with the introduction of dedicated digital signal can be reprogrammed to change its configuration
processing hardware. and response without modifying any hardware. An
The practice of digital filter design and implementation adaptive digital filter can be reprogrammed to change
relies on two fundamental factors: a well-developed its response adaptation algorithm.
mathematical theory of signals and systems and the • Flexibility — digital filters can perform tasks that are
availability of powerful digital signal processing hardware. difficult to implement with analog hardware, such as
DIGITAL FILTERS 687
• A signal x(·) is called bounded if there exists a positive 1 The Krönecker delta should not be confused with the continuous-
constant B such that |x(n)| ≤ B for all n. time impulse, namely, the Dirac delta function.
688 DIGITAL FILTERS
signal is bounded, and every bounded signal must have memory is required to store each input sample as
finite power: it becomes available until its processing has been
completed, and it is no more needed by the system.
Finite energy ⇒ boundedness ⇒ finite power The number of input samples that need to be kept in
memory at any given instant is known as the order
Also, since finite-energy signals have the property
of the system, and it is frequently used as a measure
limn→±∞ x(n) = 0, they represent transient phenomena.
of system complexity.
Persistent phenomena, which do not decay with time, are
represented by finite-power signals. • A relaxed system is called causal if its output
y(n) at any given instant n does not depend on
2.2. Discrete-Time Systems future samples of the input x(·). In particular, every
memoryless system is causal. Causality is a physical
A discrete-time system is a mapping namely, an operation constraint, applying to systems in which n indicates
performed on a discrete-time signal x(·) that produces (discretized) physical time.
another discrete-time signal y(·) (Fig. 2). The signal x(·) is • A relaxed system is called time-invariant if its
called the input or excitation of the system, while y(·) is response to the time-shifted input x(n − no ) is a
called the output or response. similarly shifted output y(n − no ), for every input
Most synthetic (human-made) systems are relaxed, signal x(·) and for every integer no , positive or
in the sense that a zero input [i.e., x(n) = 0 for all n] negative. In the context of (digital) filters, time-
produces a zero output. Nonrelaxed systems, such as a invariance is synonymous with fixed hardware,
wristwatch, rely on internally stored energy to produce while time variation is an essential characteristic
an output without an input. In general every system can of adaptive filters.
be decomposed into a sum of two components: a relaxed
• A relaxed system is called stable if its response to
subsystem, which responds to the system input, and an
any bounded input signal x(·) is a bounded output
autonomous subsystem, which is responsible for the zero-
signal y(·). This is known as bounded-input/bounded-
input response (Fig. 3). Since every analog and every
output (BIBO) stability, to distinguish it from other
digital filter is a relaxed system, we restrict the remainder
definitions of system stability, such as internal (or
of our discussion to relaxed systems only.
Lyapunov) stability.
Relaxed systems can be classified according to certain
input–output properties: • A relaxed system is called linear if its response to the
input x(·) = a1 x1 (·) + a2 x2 (·) is y(·) = a1 y1 (·) + a2 y2 (·)
for any a1 , a2 and any x1 (·), x2 (·). Here y1 (·) (resp.
• A relaxed system is called memoryless or static if its
y2 (·)) is the system’s response to the input x1 (·) [resp.
output y(n) at any given instant n depends only on
x2 (·)].
the input sample x(n) at the same time instant, and
does not depend on any other input sample. Such a
As explained in the introduction, digital filters can be
system can be implemented without using memory;
either fixed or adaptive. Fixed digital filters are time
it maps every input sample, as it becomes available,
invariant and most often linear, while adaptive digital
into a corresponding output sample.
filters usually consist of a linear time-variant module and
• A relaxed system is called dynamic if its output y(n) a nonlinear time-invariant module.
at any given instant n depends on past and/or future Our definitions of fundamental system properties make
samples of the input signal x(·). This means that use of several basic building blocks of discrete-time
systems in general, and digital filters in particular:
Forced response These three basic building blocks — unit delay, addition,
x(n) Relaxed
and signal scaling — are relaxed, linear, time-invariant,
+ y(n)
Figure 3. The decomposition of a nonrelaxed system. Figure 4. Block diagram representation of a unit-delay element.
DIGITAL FILTERS 689
are indistinguishable
when
f1 − f2 is an integer. For
instance, cos 3π 2
n = cos − π2 n , so that the frequency
x(n) a ax(n) f1 = 34 cannot be distinguished from its alias f2 = − 41 . From
a mathematical standpoint, we observe that the sinusoidal
signal of (1) is periodic in f0 with a period of 1. In order
Figure 6. Block diagram representation of a signal scaler. to avoid ambiguity, we must therefore restrict f0 to its
fundamental (or principal) range − 12 < f0 ≤ 12 . In addition,
by using the symmetry property cos(−x) = cos x, we can
causal, and stable systems. In fact, every linear, time-
avoid using negative frequencies altogether, which leads
invariant, and causal system can be implemented as
to the frequency range specified in Eq. (1b).
a network of interconnected delays, scalers and adders
The constraint on f0 is closely related to the well-known
(see Section 3.2). Notice that the adder and scaler are
Nyquist condition: the range of frequencies allowed at
memoryless, while the unit delay is dynamic.
the input of a sampler cannot exceed half the sampling
In reality, scalers are implemented using digital
rate. In particular, when the continuous-time sinusoid
multipliers (Fig. 7). Multipliers can also be used to form
cos 2π f0 t is sampled at a rate of Fs samples per second,
the pointwise product of two signals (Fig. 8), and they
the resulting sequence of samples forms a discrete-time
serve as a fundamental building block in adaptive digital
sinusoid, x(n) = cos 2π f0 n, where f0 = f0 /Fs . According to
filters and other nonlinear systems.
the Nyquist condition we must have f0 < Fs /2 in order to
avoid aliasing: the equivalent restriction on f0 is f0 < 12 .
2.3. Discrete-Time Sinusoidal Signals
Anti-aliasing filters must be used to avoid the presence of
A discrete-time signal of the form aliased components in sampled signals.
Figure 8. Block diagram representation of a signal multiplier. Figure 9. The impulse response of a relaxed LTI system.
690 DIGITAL FILTERS
As a result the convolution sum (2) for a causal system where g(·) is the impulse response of the interpolating filter
ranges only over 0 ≤ k < ∞. Also, an LTI system is BIBO and T is the sampling interval. The Laplace transform of
stable if, and only if this signal is
! "
∞
∞
|h(k)| < ∞ Xr (s) = x(n)e −snT
G(s)
k=−∞ n=−∞
The response of an LTI system to a sinusoidal input is namely, a product of the transfer function G(s) of the
of particular interest. The input signal x(n) = exp{jω0 n} interpolating filter with a transform-domain characteri-
produces the output zation of the discrete-time sequence of samples x(·). This
observation motivates the introduction of the Z transform
y(n) = H(ejω0 )ejω0 n ≡ H(ejω0 )x(n) (3a)
∞
A real-life LTI system has a real-valued impulse X+ (z) = x(n)z−n , X− (z) = x(n)z−n
response. The response of such a system to the real- n=0 n=−1
Thus the transfer function of a stable system is always well filter is stable if all its poles have magnitudes strictly less
defined on the unit circle T = {z; |z| = 1}, and we recognize than unity. Thus, FIR filters are unconditionally stable,
H(z)|z=ejω as the frequency response of the system [as because all of their poles are at z = 0.
defined in (3)]. More generally, if x(·) is a discrete-time In view of (6), the input–output relation of a digital
sequence, whether a signal or an impulse response, such filter can be written as a(z)Y(z) = b(z)X(z), which
∞
corresponds to a difference equation in the time domain
that |x(n)| < ∞, then the unit circle T is included in
n=−∞
y(n) + a1 y(n − 1) + · · · + aN y(n − N)
the RoC of the Z transform X(z), and so X(z)|z=ejω is well
defined. The resulting function of ω, that is = bo x(n) + b1 x(n − 1) + · · · + bM x(n − M) (7)
∞
The same input–output relation can also be represented
X(ejω ) = x(n)e−jωn (5a) by a block diagram, using multipliers (actually signal
n=−∞
scalers), adders, and delay elements (Fig. 10). Such a
block diagram representation is called a realization of the
is called the discrete-time Fourier transform (DTFT) of x(·). b(z)
From a mathematical standpoint, relation (5) is a transfer function H(z) = . Digital filter realizations
a(z)
complex Fourier series representation of the periodic are not unique: the one shown in Fig. 10 is known as
function X(ejω ), and we recognize the samples x(n) as the the direct form type 2 realization. Other realizations are
Fourier coefficients in this representation. The standard described in Section 5.
expression for Fourier coefficients A hardware implementation (i.e., a digital filter) is
π obtained by mapping the realization into a specific
1
x(n) = X(ejω )ejωn dω (5b) platform, such as a DSP chip or an ASIC (see Section 5).
2π −π A software implementation is obtained by mapping the
same realization into a computer program.
is known as the inverse DTFT. The DTFT (5) converges
absolutely (and uniformly) for
1 sequences, and the 3.3. The Power Spectrum of a Discrete-Time Signal
resulting limit is a continuous function of ω. The DTFT The power spectrum of a finite-energy signal x(·) is defined
can be extended to other types of sequences by relaxing as the square of the magnitude of its DTFT X(ejω ).
the notion of convergence of the infinite sum in (5a). For Alternatively, it can be obtained by applying a DTFT
instance, the DTFT of
2 sequences is defined by requiring to the autocorrelation sequence
convergence in the L2 (T) norm, and the resulting limit
is a square-integrable function on the unit circle T. A
∞
further extension to
∞ sequences (= bounded signals) Cxx (m) = x(n + m)x∗ (n) = x(m) x∗ (−m)
results in limits that are L1 (T) functions, and thus may n=−∞
contain impulsive components. For instance, the DTFT where the asterisk (∗) denotes complex conjugation.
of the bounded signal x(n) = ejω0 n is X(ejω ) = 2π δ(ω − ω0 ), For finite-power signals the DTFT X(ejω ) may contain
where δ(·) is the Dirac delta function. The convergence impulsive components, so that the square of its magnitude
of (5) in this case is defined only in the distribution sense. cannot be defined. Instead, the autocorrelation is defined
Also, notice that this signal has no Z transform. in this case as
and the power spectrum is defined as the DTFT of the 4. DESIGN OF FREQUENCY-SELECTIVE DIGITAL FILTERS
autocorrelation Cxx (·). Similarly, the autocorrelation of a
random stationary signal is defined as The complete process of designing a digital filter consists
of seven stages:
Cxx (m) = E {x(n + m)x∗ (n)}
1. Problem analysis — determine what the filter is
where E{ } denotes expectation (i.e., probabilistic mean). supposed to accomplish.
Thus, in all three cases the power spectrum is 2. Filter specification — select a desired (ideal) fre-
quency response for the filter, and decide how
∞ accurately it should be approximated.
Sxx (ejω ) = Cxx (m)e−jωm (8a) 3. Filter design — obtain the coefficients of a realizable
m=−∞
transfer function H(z) that approximates the desired
frequency response within the specified tolerance.
and can be viewed as a restriction to the unit circle of the
so-called complex power spectrum 4. Filter realization — determine how to construct the
filter by interconnecting basic building blocks,
∞ such as delays, adders, and signal scalers (i.e.,
Sxx (z) = Cxx (m)z−m (8b) multipliers).
m=−∞ 5. Implementation choices — select the specific hard-
ware/software platform in which the building blocks
The complex power spectrum is used in the design of will be implemented.
optimal (Wiener) filters (see Section 6.1). 6. Performance analysis — use the physical parameters
Similarly, the cross-correlation of two signals is defined of the selected platform (accuracy, cost, speed,
as etc.) to evaluate the compliance of the selected
∞ implementation with the specification of the filter.
y(n + m)x∗ (n) 7. Construction — implement the specific choices made
in the design and realization stages into the selected
n=−∞
finite-energy signals hardware/software platform.
1 N
Cyx (m) = lim y(n + m)x∗ (n)
N→∞ 2N + 1
In the problem analysis stage the designer uses specific
n=−N
information about the application to determine a desired
finite-power signals
frequency response, say, D(ω), for the filter. The
E {y(n + m)x∗ (n)}
desired magnitude response |D(ω)| is often a classical
jointly stationary random signals (ideal) frequency-selective prototype — lowpass, highpass,
(9a)
bandpass, or bandstop — except in specialized filter
and the (complex) cross spectrum is the transform of the
designs such as Hilbert transformers, differentiators,
cross-correlation:
notch filters, or allpass filters. This means that |D(ω)| is
∞ piecewise constant, so that the frequency scale decomposes
Syx (z) = Cyx (m)z−m (9b) into a collection of bands. The desired phase response
m=−∞ arg D(ω) could be linear (exactly or approximately),
minimum-phase, or unconstrained.
When y(·) is the response of a stable LTI system to the The designer must also define the set of parameters
input x(·), then that determine the transfer function of the filter to be
designed. First a choice has to be made between FIR and
Syx (z) = H(z)Sxx (z) (10) IIR, considering the following facts:
where H(z) is the transfer function of the system, and • Exact linear phase is possible only with FIR filters.
• FIR filters typically require more coefficients and
∗
1 delay elements than do comparable IIR filters, and
Syy (z) = H(z)Sxx H ∗ (11) therefore involve higher input–output delay and
z
higher cost.
In particular, by using a unit-power white noise [i.e., one • Currently available design procedures for FIR filters
with Sxx (z) = 1] input, we find that can handle arbitrary desired responses (as opposed to
ideal prototypes) better than IIR design procedures.
∗
1 • Finite precision effects are sometimes more pro-
Syy (z) = H(z) H ∗ (12)
z nounced in IIR filters. However, such effects can be
ameliorated by choosing an appropriate realization.
This expression is called a spectral factorization of the • Stability of IIR filters is harder to guarantee in
complex power spectrum Syy (z) and the transfer function adaptive filtering applications (but not in fixed
H(z) is called a spectral factor. filtering scenarios).
DIGITAL FILTERS 693
Additional constraints, such as an upper bound on the a prescribed tolerance level is equivalent to providing a
overall delay (e.g., for decision feedback equalization), separate tolerance level for each frequency band, say
or a requirement for maximum flatness at particular
frequencies, serve to further reduce the number of ||H(ejω )| − Di | ≤ δi for all ω ∈ Bi (13)
independent parameters that determine the transfer
function of the filter. where Bi is the range of frequencies for the ith band,
On completion of the problem analysis stage, the Di is the desired magnitude response in that band, and
designer can proceed to formulate a specification and δi is the prescribed level of tolerance. For instance, the
determine a transfer function that meets this specification, specification of a lowpass filter is
as described in the remainder of Section 4. The remaining
stages of the design process (realization, implementation, Passband: 1 − δp ≤ |H(ejω )| ≤ 1 + δp for |ω| ≤ ωp
performance analysis, and construction) are discussed in Stopband: |H(ejω )| ≤ δs for ωp ≤ |ω| ≤ π
Section 5.
as illustrated by the template in Fig. 11. The magni-
4.1. Filter Specification tude response of the designed filter must fit within the
unshaded area of the template. The tolerance δp charac-
Once the desired frequency response D(ω) has been terizes passband ripple, while δs characterizes stopband
determined along with a parametric characterization of attenuation.
the transfer function H(z) of the designed filter, the next
step is to select a measure of approximation quality and a 4.2. Design of Linear-Phase FIR Filters
tolerance level (i.e., the highest acceptable deviation from
The transfer function of an FIR digital filter is given by (6)
the desired response). The most commonly used measure
M
is the weighted Chebyshev (or L∞ ) norm with N = 0, so that a(z) = 1 and H(z) ≡ b(z) = bi z−i .
i=0
max W(ω)|H(ejω ) − D(ω)| The design problem is to determine the values of the
ω
coefficients {bi } so that the phase response is linear, and
where W(ω) is a nonnegative weighting function. Alter- the magnitude response |H(ejω )| approximates a specified
native measures include the weighted mean-square (or |D(ω)|. In order to ensure phase linearity, the coefficients
L2 ) norm {bi } must satisfy the symmetry constraint
π
W 2 (ω)|H(ejω ) − D(ω)|2 dω
−π
bM−i = sbi for 0 ≤ i ≤ M, where s = ±1 (14)
and the truncated time-domain mean-square norm As a result, the frequency response satisfies the constraint
H ∗ (ejω ) = sH(ejω )ejMω , so that the phase response is indeed
L linear: arg H(ejω ) = (Mω + sπ )/2. Another consequence of
|h(n) − d(n)|2 the symmetry constraint (14) is that the zero pattern
n=0 of H(z) is symmetric with respect to the unit circle:
H(zo ) = 0 ↔ H(1/z∗o ) = 0.
where h(n) is the impulse response of the designed filter
The design of a linear-phase FIR filter is successfully
and d(n) is the inverse DTFT of the desired frequency
completed when we have selected values for the filter coef-
response D(ω).
ficients {bi } such that (1) the symmetry constraint (14) is
The mean-square measures are useful mainly when
satisfied and (2) the magnitude tolerance constraints (13)
the desired response D(ω) is arbitrary, since in this
are satisfied. A design is considered optimal when the
case optimization in the Chebyshev norm can be
order M of the filter H(z) is as small as possible under
quite demanding. However, the approximation of ideal
the specified constraints. It is customary to specify the
prototypes under the Chebyshev norm produces excellent
magnitude tolerances in decibels — for instance, a lowpass
results at a reasonable computational effort, which makes
filter is characterized by
it the method of choice in this case.
The approximation of ideal prototypes is usually carried 1 + δp 1
out under the modified Chebyshev norm Rp = 20 log , As = 20 log
1 − δp δs
max W(ω)||H(ejω )| − |D(ω)|| The most popular techniques for linear-phase FIR
ω
design are
The elimination of phase information from the norm
expression reflects the fact that in this case phase is either • Equiripple — optimal in the weighted Chebyshev
completely predetermined (for linear-phase FIR filters) or norm, uses the Remez exchange algorithm to optimize
completely unconstrained (for IIR filters). Furthermore, filter coefficients. The resulting magnitude response
the desired magnitude response is constant in each is equiripple in all frequency bands (as in Fig. 11).
frequency band (e.g., unity in passbands and zero in The weight for each band is set to be inversely
stopbands), so it makes sense to select a weighting proportional to the band tolerance. For instance,
function W(ω) that is also constant in each frequency to design a lowpass filter with passband ripple
band. As a result, constraining the Chebyshev norm to Rp = 1 dB and stopband attenuation As = 40 dB, we
694 DIGITAL FILTERS
1 + dp
1 − dp
ds
0
0 wp ws p
w →
calculate first δp = 0.0575 and δs = 0.01 and then the passband attenuation of the filter, and the width
set Wp = δp−1 = 17.39 and Ws = δs−1 = 100. While, in of the transition band. The Kaiser window has a
principle, the Remez exchange algorithm can be control parameter that allows continuous adjustment
applied to any desired magnitude response |D(ω)|, of this tradeoff. Other popular windows (e.g., Bartlett,
most filter design packages (such a the signal Hamming, Von Hann, and Blackman) are not
processing toolbox in Matlab) accept only piecewise adjustable and offer a fixed tradeoff. The simplicity
constant gain specifications. of the truncation and windowing method makes it
• Least squares — optimal in the weighted mean- particularly attractive in applications that require an
square norm, uses closed-form expressions for the arbitrarily shaped magnitude response (and possibly
optimal filter coefficients. As in the equiripple also an arbitrarily shaped phase response).
method, the weight for each band is set to be inversely
proportional to the band tolerance. Because of the Specialized design methods are used to design nonstan-
availability of a simple closed-form solution, the least- dard filters such as maximally flat or minimum-phase FIR
squares method is most frequently used to design filters, Nyquist filters, differentiators, Hilbert transform-
filters with arbitrarily shaped magnitude responses. ers, and notch filters [5,6].
• Truncation and windowing — optimal in the time-
domain mean-square norm (when L = M) but not 4.3. IIR Filter Design
in any frequency-domain sense. The resulting filter
usually has many more coefficients than the one The transfer function of an IIR digital filter is given by (6)
designed by the equiripple method. The filter with N > 0, so that H(z) = b(z)/a(z). The design problem
coefficients are obtained by (1) applying the inverse is to determine the values of the coefficients {ai } and
DTFT (5b) to the desired response D(ω) (which {bi } so that the magnitude response |H(ejω )| approximates
contains the appropriate linear phase term) to obtain a specified |D(ω)|, with no additional constraints on
the impulse response d(n), and (2) multiplying this the phase response. The design of an IIR filter is
impulse response by a window function w(n), which successfully completed when we have selected values for
vanishes outside the range 0 ≤ n ≤ M. The window the filter coefficients such that the magnitude tolerance
function is selected to control the tradeoff between constraints (13) are satisfied. A design is considered
DIGITAL FILTERS 695
1 − dp
ds
0
0 wp ws p
w →
optimal when the filter order N is as small as possible The analog filter HLP (s) is one of four standard prototypes:
under the specified constraints. Butterworth, Chebyshev type 1 or 2, or elliptic (also known
There exist a number of techniques for the design of IIR as Chebyshev–Cauer). Since the magnitude response
filters with an arbitrarily shaped D(ω), some based on the of such prototypes is always bounded by unity, the
weighted (frequency-domain) L2 and L∞ norms [5], and specification template has to be modified accordingly (see
others on the truncated time-domain mean-square norm Fig. 12). As a result, the decibel scale characterization of
(with L = M + N + 1) [6]. Here we shall discuss only the tolerances for IIR design is also modified:
design of classical (ideal) frequency-selective prototypes,
1 1
which is the most common application for IIR digital Rp = 20 log , As = 20 log
filters. The most popular IIR design technique relies on a 1 − δp δs
well-developed theory of analog filter design from closed-
The variable transformation s → g(s) that converts the
form (analytical) formulas. This means that the design of
analog lowpass filter HLP (s) into a highpass, bandpass, or
an IIR digital filter decomposes into three steps:
bandstop filter Ha (s) is described in Table 1. It involves
p,LP , the passband edge frequency of the prototype HLP (s),
1. The specification of the desired classical frequency
selective digital filter (lowpass, highpass, bandpass, Table 1. Frequency Transformations for Analog Filters
or bandstop) is translated into a specification of an
Band-Edge
equivalent analog filter. Frequency of
2. An analog filter Ha (s) that meets the translated Type of Transformation g(s) Target Filter
specification is obtained by (a) designing a lowpass
p,LP p
analog filter HLP (s) from closed-form formulas and Lowpass to highpass p
s
(b) transforming HLP (s) into a highpass, bandpass,
s2 + p1 p2
or bandstop filter, as needed, by a complex variable Lowpass to bandpass p,LP p1 , p2
s(p2 − p1 )
mapping s → g(s), namely, Ha (s) = HLP [g(s)].
s(p2 − p1 )
3. The analog filter Ha (s) is transformed into a digital Lowpass to bandstop p,LP 2 p1 , p2
s + p1 p2
filter H(z) by (another) complex variable mapping.
696 DIGITAL FILTERS
as well as the passband edge frequencies of the target All four prototypes have a maximum gain of unity,
analog filter Ha (s). which is achieved either at the single frequency =
Several methods — including the impulse invariance, 0 (for Butterworth and Chebyshev 2), or at several
matched Z, and bilinear transforms — can be used to map frequencies throughout the passband (for Chebyshev 1 and
the analog transfer function Ha (s), obtained in the second elliptic). When the Butterworth or Chebyshev 1 prototype
step of IIR design, into a digital transfer function H(z). is translated into its digital form, its numerator is
The most popular of these is the bilinear transform: the proportional to (1 + z−1 )N . Thus only N + 1 multiplications
transfer function H(z) is obtained from Ha (s) by a variable are required to implement these two prototypes, in
substitution contrast to Chebyshev 2 and elliptic prototypes, which
1 − z−1 have nontrivial symmetric numerator polynomials, and
H(z) = Ha α (15a)
1 + z−1 thus require N + ceil(N + 1/2) multiplications each.
As an example we present in Fig. 13 the magnitude
where α is an arbitrary constant. The resulting digital response of the four distinct digital lowpass filters,
filter has the same degree N as the analog prototype Ha (s) designed from each of the standard analog prototypes to
from which it is obtained. It also has the same frequency meet the specification fp = 0.23, fs = 0.27, Rp = 1 dB, and
response, but with a warped frequency scale; as a result of As = 20 dB. The order needed to meet this specification is
the variable mapping (15a), we have 12 for the Butterworth, 5 for Chebyshev (both types), and
4 for the elliptic. Thus the corresponding implementation
H(ejω ) = Ha (j)|=α tan(ω/2) (15b)
cost (=number of multipliers) is in this case 13 for
Butterworth, 6 for Chebyshev 1, 11 for Chebyshev 2, and
Here we use to denote the frequency scale of the
9 for elliptic.
analog filter Ha (s) in order to distinguish it from the
frequency scale ω of the digital filter H(z). The frequency
mapping = α tan(ω/2) governs the first step of the IIR 5. REALIZATION AND IMPLEMENTATION OF DIGITAL
design process — the translation of a given digital filter FILTERS
specification into an equivalent analog filter specification.
While the tolerances Rp and As are left unaltered, all The preceding discussion of digital filters has concentrated
critical digital frequencies (such as ωp and ωs for a solely on their input–output response. However, in order
lowpass filter) are prewarped into the corresponding to build a digital filter, we must now direct our attention
analog frequencies, using the relation to the internal structure of the filter and its basic building
ω blocks. The construction of a digital filter from a given
= α tan ≡ α tan π f (15c) (realizable) transfer function H(z) usually consists of
2
two stages:
For instance, in order to design a lowpass digital filter with
passband |f | ≤ fp = 0.2 with ripple Rp = 1 dB and stopband • Realization — in this stage we construct a block
0.3 = fs ≤ |f | ≤ 0.5 with attenuation As = 40 dB, we need diagram of the filter as a network of interconnected
to design an analog filter with the same tolerances, but basic building blocks, such as unit delays, adders,
with passband || ≤ p and stopband || ≥ s , where and signal scalers (i.e., multipliers).
• Implementation–in this stage we map the filter real-
p = α tan(0.2π ), s = α tan(0.3π )
ization into a specific hardware/software architec-
ture, such as a general-purpose computer, a digital
Since the value of α is immaterial, the most common choice
signal processing (DSP) chip, a field-programmable
is α = 1 (sometimes α = 2 is used). The value of α used
gate array (FPGA), or an application-specific inte-
in the prewarping step must also be used in the last step
grated circuit (ASIC).
when transforming the designed analog transfer function
Ha (s) into its digital equivalent H(z) according to (15a).
Once the complete specification of the desired analog A realization provides an idealized characterization of
filter is available, we must select one of the four standard the internal structure of a digital filter, in the sense
lowpass prototypes (Fig. 13): that it ignores details such as number representation
and timing. A given transfer function has infinitely
• Butterworth — has a monotone decreasing passband many realizations, which differ in their performance,
and stopband magnitude response, which is maxi- as explained in Section 5.3. The main distinguishing
mally flat at = 0 and = ∞ performance attributes are numerical accuracy and
processing delay. These and other details must be
• Chebyshev 1 — has an equiripple passband magni-
addressed as part of the implementation stage, before
tude response, and a monotone decreasing stopband
we can actually put together a digital filter.
response, which is maximally flat at = ∞
• Chebyshev 2 — has a monotone decreasing passband
5.1. Realization of FIR Filters
magnitude response, which is maximally flat at
= 0, and an equiripple stopband response FIR filters are always realized in the so-called transversal
• Elliptic — has an equiripple magnitude response in (also tapped-delay-line) form, which is a special case of
both the passband and the stopband the direct-form realization described in Section 3.2. Since
DIGITAL FILTERS 697
Butterworth Chebyshev 1
1 1
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0 0
0 0.1 0.2 0.3 0.4 0.5 0 0.1 0.2 0.3 0.4 0.5
f f
Chebyshev 2 Elliptic
1 1
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0 0
0 0.1 0.2 0.3 0.4 0.5 0 0.1 0.2 0.3 0.4 0.5
f f
Figure 13. The four standard lowpass prototypes for IIR filter design.
N
Aj
a(z) = 1 for FIR filters, the direct form realization reduces H(z) = A0 + (16)
j=1
1 − pj z−1
to the configuration shown in Fig. 14.
The realization requires M delay elements, M + 1,
where pj are the poles of H(z) and where we assumed that
multipliers and M adders. However, since FIR filters
(1) M ≤ N and (2) all poles are simple. Both assumptions
usually satisfy the symmetry constraint bi = ±bM−i in
hold for most practical filters, and in particular for those
order to achieve linear phase, we can save about half of
obtained from the standard analog prototypes.
the number of multipliers.
Since the denominator polynomial a(z) has real
coefficients, its roots (i.e., the poles pj ) are either real
5.2. Realization of IIR Filters
or conjugate pairs. In order to avoid complex filter
The transfer function of most practical IIR filters, coefficients, the two terms representing a conjugate pair
such as those obtained from analog prototypes, have of poles are combined into a single second-order term:
the same numerator and denominator degrees: N ≡
deg a(z) = deg b(z) ≡ M. Consequently, their direct-form A A∗ (ReA) − (Rep0 )z−1
+ =2
realization requires N delay elements, 2N + 1 multipliers, 1 − p0 z−1 ∗ −1
1 − p0 z 1 − (2Rep0 )z−1 + |p0 |2 z−2
698 DIGITAL FILTERS
K The FSVD refines this characterization by specifying a
H(z) = A0 + Gk (z) multiplicative factorization
k=1
A B
where each Gk (z) is a strictly proper IIR filter or order 2 = FK FK−1 · · · F2 F1 (19b)
C D
(for conjugate pairs of poles) or order 1 (for real poles).
This results in a parallel connection of the subsystems that captures the decomposition of the realization into
Gk (z) (Fig. 15). Finally, each individual Gk (z) is realized in interconnected modules, as in the cascade or parallel
direct form. realizations. Each Fk describes a single module, and each
The cascade realization is obtained by using the module is associated with a subset of the state vector
pole–zero factorization of the transfer function s(n). The totality of all possible minimal realizations
of a given H(z) is captured by the notion of similarity
M
transformations, combined with the added flexibility of
1 − zi z−1 factoring the (A, B, C, D) matrix as in (19b).
b(z) i=1
H(z) ≡ = b0 (17)
a(z) N
1 − pj z−1 5.3. Implementation and Performance Attributes
j=1
The next stage in the design of a digital filter is imple-
mentation, that is, the mapping of a selected realization
where we recall again that the zeros zi are the roots of
into a specific hardware/software architecture. Since a
the numerator polynomial b(z) and the poles pj are the
given transfer function has multiple implementations,
roots of the denominator polynomial a(z). Since these
which differ in their performance, the choice of a specific
polynomials have real coefficients, their roots are either
implementation should be based on objective performance
real or conjugate pairs. For each conjugate pair, the
criteria. Commonly used performance attributes for eval-
corresponding first-order factors are combined into a single
uating implementations include
second-order term:
(1 − z0 z−1 )(1 − z∗0 z−1 ) = 1 − (2Rez0 )z−1 + |z0 |2 z−2 • Processing speed — can be quantified in terms of
two distinct attributes: (1) throughput, which is the
Thus we obtain a multiplicative factorization number of input samples that can be processed by the
digital filter in a unit of time, and (2) input–output
K delay, which is the duration between the instant a
H(z) = Hk (z) (18) given input sample x(no ) is applied to the digital filter,
k=1 and the instant the corresponding output sample
y(no ) becomes available.
where each Hk (z) is a proper IIR filter or order 2 (for
• Numerical accuracy — the level of error due to finite
conjugate pairs of poles) or order 1 (for real poles).
precision number representation.
This results in a cascade connection of the subsystems
Hk (z) (Fig. 16). Again, each individual Hk (z) is realized in • Hardware cost — the total number of each kind of
direct form. basic building blocks, such as delays, multipliers,
and adders.
• Implementation effort–the amount of work required
to map a given realization into a specific hard-
x (.) H1(z) H2(z) HK (z) y (.) ware/software architecture. Structural attributes of
the realization, such as modularity and/or regularity,
Figure 16. Cascade realization of an IIR digital filter. contribute to the reduction of this effort.
DIGITAL FILTERS 699
The same attributes can be also used to evaluate the of another random signal x(·). The corresponding
performance of realizations [6,7]. For instance, the direct estimate ŷ(·) is obtained by applying the observed
realization of an IIR digital filter has much poorer signal x(·) to the input of the Wiener filter, so that
numerical accuracy than does either the parallel or ŷ(·) = hopt (·) x(·).
the cascade realizations. Similarly, the input–output • System identification — the impulse response of an
delay of the parallel realization is somewhat shorter unknown relaxed LTI system can be identified from
than that of the cascade or direct realizations. Such observations of its input signal x(·) and output
observations make it possible to optimize the selection signal y(·). In fact, the optimal solution of the
of a realization, and to quantify the relative merits of Wiener problem (20) coincides with the unknown
alternative implementations. system response, provided (1) the input signal x(·) is
stationary, and is observed with no error and (2) the
6. OPTIMAL AND ADAPTIVE DIGITAL FILTERS (additive) error in measuring the output signal y(·) is
uncorrelated with x(·).
The desired response of classical frequency-selective filters
is completely specified in terms of passbands, stopbands, The unconstrained optimal solution of the Wiener
and transition bands. In contrast, optimal filters use problem (20) is a noncausal IIR filter, which is not
detailed information about the frequency content of the realizable (see Section 3.2 for a discussion of realizability).
desired signal and the interference in achieving maximal In the case of jointly stationary x(·) and y(·) signals,
suppression of the latter, along with minimal distortion of realizable solutions are obtained by imposing additional
the former. Since the information needed to construct an constraints on the impulse response h(·) and/or the joint
optimal filter is typically not available a priori, it is usually statistics of x(·), y(·):
estimated online from measurements of the available
signals. When this process of frequency content estimation • When both the autospectrum Sxx (z) and the cross-
is carried out simultaneously with signal filtering, and spectrum Syx (z) are rational functions, and the
the coefficients of the optimal filter are continuously Wiener filter is required to be causal, the resulting
updated to reflect the effect of new information, the Wiener filter is realizable; specifically, it is causal
resulting linear time-variant configuration is known as and rational.
an adaptive filter. • When x(·) and y(·) are characterized jointly by a
Fixed frequency-selective filters are appropriate in state-space model, the resulting optimal filter can be
applications where (1) the desired signal is restricted described by a state-space model of the same order.
to known frequency bands (e.g., AM and FM radio, This is the celebrated Kalman filter.
television, frequency-division multiplexing) or (2) the • Realizability can be enforced regardless of structural
interfering signal is restricted to known frequency bands assumptions on the joint statistics of x(·) and y(·). In
(e.g., stationary background in Doppler radar, fixed particular, we may constrain the impulse response
harmonic interference). On the other hand, in numerous in (20) to have finite length. This approach is common
applications the interference and the desired signal share in adaptive filtering.
the same frequency range, and thus cannot be separated
by frequency selective filtering. Adaptive filters can The classical Wiener filter is time-invariant, requiring
successfully suppress interference in many such situations an infinitely long prior record of x(·) samples, and
(see Section 7.2). thus is optimal only in steady state. In contrast, the
Kalman filter is optimal at every time instant n ≥ 0,
6.1. Optimal Digital Filters requiring only the finite (but growing) signal record
Optimal filtering is traditionally formulated in a prob- {x(k); 0 ≤ k ≤ n}. Consequently, the impulse response of
abilistic setting, assuming that all signals of interest the Kalman filter is time-variant and its length grows with
are random, and that their joint statistics are avail- n, asymptotically converging to the classical Wiener filter
able. In particular, given the second-order moments of two response. Finally, adaptive filters continuously adjust
discrete-time (zero-mean) random signals x(·) and y(·), the their estimates of the joint signal statistics, so the
corresponding mean-square optimal filter, also known as resulting filter response remains time-variant even in
a Wiener filter, is defined as the (unique) solution hopt (·) of steady state, randomly fluctuating around the Wiener
the quadratic minimization problem filter response.
Subsequently, the estimated moments are used to that were impossible to implement with analog hardware.
construct a realizable optimal filter. A detailed discussion of numerous digital filtering
• Filtering Stage. The fixed optimal filter that was applications can be found in the literature [1,4–6].
designed in the training stage is now applied to new
samples of x(·) to produce the estimate ŷ(·). 7.1. Time-Invariant Digital Filters
Ongoing improvements in the cost, size, speed, and
Alternatively, we may opt to continue the training stage power consumption of digital filters have made them
indefinitely; as new samples of x(·) and y(·) become an attractive alternative to analog filters in a variety
available, the estimated statistics and the corresponding of telecommunication applications that require a time-
optimal filter coefficients are continuously updated. This invariant filter response. The main types of such
approach gives rise to an adaptive filter configuration — a applications are:
linear time-variant digital filter whose coefficients are
adjusted according to continuously updated moment
• Frequency-selective filters — pass (or stop) prespeci-
estimates (Fig. 17).
fied frequency bands. Some examples include
Since adaptive filtering requires ongoing measurement
RF-to-IF-to-baseband conversion in digital wire-
of both x(·) and y(·), it can be applied only in applications
less systems
that involve the system identification interpretation of the
FDM multiplexing and demultiplexing
Wiener filter. Thus, instead of applying hopt (·) to x(·) to
Subband decomposition for speech and image
estimate an unknown signal y(·), adaptive filters rely on
coding (e.g., MPEG)
the explicit knowledge of both x(·) and y(·) to determine
Doppler radar moving-target indicator (MTI) to
hopt (·). Once this optimal filter response is available, it is
remove the nonmoving background
used to suppress interference and extract desired signal
Digital spectrum analysis
components from x(·), y(·) (see Section 7.2).
The most common adaptive filters use a time-variant • Matched filters/correlators — have an impulse re-
FIR configuration with continuously adjusting filter sponse that matches a given waveform. Some
coefficients. Such filters take full advantage of the power examples include:
of digital signal processing hardware because (1) the Matched filter for symbol detection in digital
ongoing computation of signal statistics and optimal filter communication systems
coefficients usually involves nonlinear operations that Waveform-based multiplexing/demultiplexing
would be difficult to implement in analog hardware, and (e.g., CDMA)
(2) modifications to the updating algorithms can be easily Front-end Doppler radar processing
implemented by reprogramming. The most commonly used • Analog-to-digital and digital-to-analog conversion
algorithms for updating the filter coefficients belong to Oversampling converters (e.g., sigma–delta mod-
the least-mean-squares (LMS) family and the recursive ulation)
least-squares (RLS) family [3]. Compact-disk recording and playing
• Specialized filters (such as)
Hilbert transformers
7. APPLICATIONS IN TELECOMMUNICATION SYSTEMS Timing recovery in digital communications (e.g.,
early–late gate synchronizer)
Since the early 1980s, digital filtering has become a key
component in a wide range in applications. We provide 7.2. Adaptive Digital Filters
here a brief summary of the main telecommunication
applications of digital filters, separated into two categories: As explained in Section 6.2, adaptive filters implement the
(1) time-invariant digital filters, which are used to replace system identification scenario of the Wiener filter (Fig. 17),
analog filters in previously known applications, and requiring two observed signals. Many applications use a
(2) adaptive digital filters, which enable new applications variation of this configuration, known as the adaptive
interference canceler, in which the received signal y(·) is a
noisy version of an unknown desired signal s(·), and the
so-called ‘‘reference signal’’ x(·) is a filtered version of the
∧ interference component of y(·) (Fig. 18).
x (n) Linear time-variant filter y (n) Two assumptions must be satisfied in order for the
adaptive filter to perform well as an interference canceler:
∧
H (z) H (z)
Adaptive filter
interference is completely cancelled. In reality, both is pointed, then the reference signal received by
assumptions are met only approximately, so that only the omnidirectional antenna is dominated by the
partial cancellation of the interference is achieved. interference, and the lack-of-correlation assumption
The interference cancelling configuration is useful is (approximately) met.
in a variety of specific telecommunication applications,
including The adaptive beamformer configuration, a variation of
the sidelobe-canceling approach, is used to maintain the
mainlobe of an antenna array pointed in a predetermined
• Echo cancellation — used in duplex communication direction, while adaptively reducing the effect of undesired
systems to suppress interference caused by signal signals received through the sidelobes [3]. It can be used,
leakage from the incoming far-end signal (the for instance, to split the radiation pattern of a cellular
‘‘reference signal’’ in Fig. 18) to the outgoing local base-station antenna into several narrow beams, each one
signal (the ‘‘desired signal’’). Such leakage is tracking a different user.
common in telephone lines (2W/4W converters), Another common adaptive filtering configuration is the
modems, teleconferencing systems and hands-off adaptive linear predictor, in which only one observed
telephone units. signal is available (Fig. 19). In this case the ‘‘reference
• Decision feedback equalization — used to reduce the signal’’ is simply a delayed version of the observed signal
effect of intersymbol interference (ISI) in a digital y(·), so that the Wiener problem (20) now becomes
communication channel. Here the ‘‘reference signal’’
is the sequence of previously received symbols, min E|y(n) − h(n) y(n − )|2 (21)
h(·)
which are assumed to be uncorrelated with the
current symbol (the ‘‘desired signal’’). In order for the which is, a -step-ahead linear prediction problem. In
equalizer to work correctly, the previously received some applications the object of interest is the adaptively
symbols must be known without error. To meet estimated linear predictor response ĥ(·), while in other
this requirement, the equalizer alternates between applications it is the predicted signal ŷ(n) = ĥ(n) y(n −
a training phase and a tracking phase. In the ), or the residual signal y(n) − ŷ(n). Telecommunication
training phase a prespecified ‘‘training sequence’’ of applications of the linear predictor configuration include
symbols is transmitted through the channel, and this
information is used by the equalizer to determine • Linear predictive coding (LPC) — used for low-rate
the channel estimate Ĥ(z). In the tracking phase the analog-to-digital conversion, and has become the
equalizer uses detected previous symbols to track standard speech coding method in GSM cellular
slow variations in the channel response: as long systems. While the original LPC approach used
as the estimated response Ĥ(z) remains close to only the linear predictor response ĥ(·) to represent
the true response H(z), the cancellation of ISI is the observed speech signal y(·), current LPC-based
almost perfect, symbols are detected without error, speech coders use, in addition, a compressed form of
and the correct ‘‘reference signal’’ is available to the the residual signal y(n) − ŷ(n) [6].
equalizer. Since decision feedback equalization relies • Adaptive notch filter — used to suppress sinusoidal
on previous symbols, it cannot reduce the ISI caused interference, such as narrowband jamming in spread-
by the precursors of future symbols — this task is left spectrum systems. Since spread-spectrum signals
to the feedforward equalizer. resemble broadband noise, which is almost com-
• Sidelobe cancellation — used to modify the radiation pletely unpredictable, while sinusoidal signals are
pattern of a narrowbeam directional antenna (or
antenna array), in order to reduce the effect of strong
nearby interferers received through the sidelobes of Adaptive
Observed signal Predicted signal
the antenna. The ‘‘reference signal’’ in this case is linear predictor ∧
y (.) ∧ y (.)
obtained by adding an inexpensive omnidirectional h (.)
antenna; if the interfering RF source is much
closer than the source at which the main antenna Figure 19. Adaptive linear predictor configuration.
702 DIGITAL OPTICAL CORRELATION FOR FIBEROPTIC COMMUNICATION SYSTEMS
perfectly predictable, the adaptive linear predictor DIGITAL OPTICAL CORRELATION FOR
response is almost entirely dominated by the sinu- FIBEROPTIC COMMUNICATION SYSTEMS
soidal component of the observed signal y(·). As
a result, one can use the adaptive linear predic- JOHN E. MCGEEHAN
tor configuration to suppress sinusoidal components MICHELLE C. HAUER
or, with a minor modification, to enhance sinu- ALAN E. WILLNER
soidal components. University of Southern
• Feedforward equalizer — used to reduce the effect of California
ISI from precursors of future symbols. This is made Optical Communications
possible by the fact that the adaptive linear predictor Laboratory
tends to render the equalized channel response Los Angeles, California
minimum phase, thereby reducing the length of
precursors. This minimum-phase property is an 1. INTRODUCTION
intrinsic characteristic of the MSE linear predictor
As bit rates continue to rise in the optical core of
defined by (21).
telecommunication networks [toward ≥40 Gbps (gigabits
per second)], the potential inability of electronic signal
BIOGRAPHY processors to handle this increase is a force driving
research in optical signal processing. Correlation, or
Hanoch Lev-Ari received the B.S. (summa cum laude) matched filtering, is an important signal processing
and the M.S. degrees in electrical engineering from the function. The purpose of a correlator is to compare
Technion, Israel Institute of Technology, Haifa, Israel an incoming signal with one that is ‘‘stored’’ in the
in 1971 and 1976, respectively, and the Ph.D. degree correlator. At the appropriate sample time, a maximum
in electrical engineering from Stanford University, Stan- autocorrelation peak will be produced if the input signal is
ford, California, in 1984. During 1985 he held a joint an exact match to the stored one. This function is typically
appointment as an Adjunct Research Professor of Elec- used to pick a desired signal out of noise, an essential
trical Engineering with the Naval Postgraduate School, requirement for radar and wireless CDMA systems. As
Monterey, California and as a Research Associate with the telecommunication systems tend toward the use of optical
Information Systems Laboratory at Stanford. He stayed fiber as the preferred transmission medium, optical CDMA
at Stanford as a Senior Research Associate until 1990. networks are receiving greater attention and will require
Since 1990 he has been an Associate Professor with the the use optical correlator implementations.
Department of Electrical and Computer Engineering at In present-day fiberoptic networks, data packets are
Northeastern University, Boston, Massachusetts. During converted to electrical form at each node to process their
1994–1996 he was also the Director of the Communi- headers and make routing decisions. As routing tables
cations and Digital Signal Processing (CDSP) Center at grow in size, this is becoming a predominant source of
Northeastern University. Dr. Lev-Ari has over 100 journal network latency. The advent of optical correlation could
and conference publications. He served as an Associate enable packet headers to be read at the speed of light by
Editor of Circuits, Systems and Signal Processing, and simply passing the packets through a bank of correlators,
of Integration, the VLSI Journal. His areas of interest each configured to match a different entry in the routing
include model-based spectrum analysis and estimation for table. Simple decision logic could then configure a switch to
nonstationary signals, scale-recursive (multirate) detec- route each packet according to which correlator produced
tion and estimation of random signals, Markov renewal a match. Thus, with the growing demand for all-optical
models for nonstationary signals, and adaptive linear and networking functions, there is a strong push to develop
nonlinear filtering techniques. practical and inexpensive optical correlators that can
identify high-speed digital bit patterns on the fly and
BIBLIOGRAPHY with minimal electronic control.
power of the signal as opposed to a voltage as in electri- the power in each 1 bit). This threshold detection may be
cal systems. Consequently, the digital waveforms consist implemented either electronically or optically, although
of only positive values (a ‘‘unipolar’’ system), causing the optical threshold detection is still a nascent research area
correlation function of any two optical signals to be a requiring further development to become practical. Elec-
completely nonnegative function. This creates some limi- tronically, the output is detected with a photoreceiver and
tations for systems that transmit specially designed sets of a simple decision circuit is used to compare the correlation
codewords intended to have good autocorrelation proper- output to the threshold value. The high-speed advantage
ties; thus, a large autocorrelation peak when the incoming of optics still prevails in this case since the correlation
signal is synchronized with the receiver and matches the function is produced in the time it takes the signal to tra-
desired codeword and a low, ideally zero, level for all other verse the optical correlator, and the decision circuit only
codewords, for all possible shifts in time [2]. Sets of code- needs to be triggered at the sample-rate, which is often in
words with these properties are termed orthogonal codes. the kilohertz–megahertz range, depending on the number
It is possible with optical phase modulation to achieve of bits in each data packet or codeword. The mathematical
a bipolar system, but this requires a coherent receiver, function describing the tapped-delay-line correlator is
which is far more complex to implement in optics than the
standard direct intensity detection receivers.
N−1
y(t) = x(t − kTbit )h(kTbit ) (1)
Just as with electronics, prior to the development of
k=0
high-speed digital signal processors, a common implemen-
tation of an optical correlator is the tapped-delay line. A where N is the number of bits in the correlation sequence,
basic implementation of an optical tapped-delay line is Tbit is one bit period, x(t − kTbit ) is the input signal delayed
shown in Fig. 1a. The delay line is configured to match the by k bit times, and h(kTbit ) represents the k weights that
correlation sequence 1101. Thus, the delay line requires multiply each of the k-bit delayed input signals. For a
four taps (one for each bit in the desired sequence), phase-modulated system, the same operation can be per-
weighted by the factors 1, 1, 0 and 1, respectively. The formed by replacing the switches with the appropriate
weights are implemented by placing a switch in each path optical phase shifters to match the desired codeword (e.g.,
that is closed for weight = 1, and opened for weight = 0. π π 0π instead of 1101). Figure 1b illustrates the delay-and-
The incoming optical bit stream is equally split among the add operation of the correlator for the case when the three
four fiberoptic delay lines. Each successive delay line adds 4-bit words 1010, 1011, and 1101 are input to the correla-
one additional bit of delay to the incoming signal before tor, where the second word is an exact match to the desired
the lines are recombined, where the powers of the four sequence. Since the correlation function for two 4-bit words
signals are added together to yield the correlation output is 7 bits long and the peak occurs during the fourth time
function. This function is sampled at the optimum time, slot, the correlation output is sampled every 4 bits and
Ts , and passed through a threshold detector that is set to compared to a threshold as shown in Fig. 1c. As expected,
detect a power level above 2 since the autocorrelation peak the correlation output for the second word exceeds the
of 1101 with itself equals 3 (or more specifically, 3 times threshold while the first and third samples produce no
(a) Correlation
Input three output, y (t )
1 × 4 combiner
1 × 4 splitter
Threshold
input words when correlated with the bit pattern 1101
(the three optimum sample times are labeled); (c) an
1 1 1 2 2 2 2 3 2 3 2 2 2 0 1 intensity profile of the correlation output from this
Time tapped-delay-line correlator. Only when the intensity
Sample Times, Ts : is above the threshold at the sample time is a match
No match Match No match signal produced.
704 DIGITAL OPTICAL CORRELATION FOR FIBEROPTIC COMMUNICATION SYSTEMS
matches. Note that for an input signal L bits long, the Cross-
length of the correlation output will be L + N − 1 bits long. correlation
Holographic outputs
It should also be noted that the correlator as shown correlation
will also produce a level 3 peak that is above the threshold sequences
t
at time Ts for a 1111 input, which is not the desired 1
01
codeword. This is because the open switch in the third 00 Pattern match:
delay line, corresponding to the third correlation bit, does 01 autocorrelation
101
not care if the third bit is a 1 or a 0 since it does not
“01011” data beam 01011
pass any light. Thus, the correlator as shown is really
configured to produce a match for the sequence 11x1, 100
01 t
where the x indicates a ‘‘don’t care’’ bit that can either be Ts
00
0 or 1. In many cases, such as in optical CDMA systems,
10
0
the set of all possible codewords used in the system is Holographic plate
specifically designed to always have a constant number
of 1 bits so that this is not an issue. But, for cases in
Ts
which the correlator must identify a completely unique
sequence, the present correlator must be augmented with, Figure 2. A collimated 01011 input beam in free space impinges
for example, a second, parallel tapped-delay line that is on an optical angular multiplexed spectral holographic (AMSH)
configured in complement to the first one (the switches are plate that is programmed to recognize five different correlation
closed for desired 0 bits and open otherwise), and produces sequences. Each correlation sequence corresponds to a different
a ‘‘match’’ signal when zero power is present at the sample scattering angle from the plate. Only the direction corresponding
time. Then, the incoming signal is uniquely identified only to a pattern match (01011) produces an intensity autocorrelation
when both correlators produce a match. This configuration peak that will be above threshold at the sample time, while the
rest produce cross-correlation outputs.
will be discussed later in further detail.
FBG reflectivity
No delay
Tbit /2 delay
~38% ~0% ~100%
Input bit 1 2 Tbit delay
pattern “10100” 3 3Tbit /2 delay Input sequence 1 2
0 1 1
2Tbit delay
3
“1” “x ” “1”
Reflection Correlation sequence
00101 of input bit 110
00101 000
00000 pattern
00101 + 110
+ 00000 11110 Correlation output No match
001112010 No
Correlation output
match Threshold ≈ 2
Threshold ≈ 2
Ts Ts
Figure 3. A fiberoptic delay line correlator in which each branch Figure 4. A fiber Bragg grating (FBG)–based optical correlator
is terminated with a fiber mirror. The correlator is configured for in which the correlation sequence is programmed by tuning the
the correlation sequence 11x1x, where the switches are closed to reflectivities of the gratings such that desired 1 bits reflect and
represent 1 bits and opened to indicate ‘‘don’t care’’ bits, meaning ‘‘don’t care’’ bits do not reflect. The cross-correlation output 11110
that the peak of the output correlation function at time Ts will is the result of the correlation between the input sequence 011
be the same regardless of whether the ‘‘don’t care’’ bits are ones and the programmed sequence 1x1. The output is sampled at time
or zeros. The correlation output for an input pattern of 10100 is Ts and a threshold detector produces a ‘‘no match’’ decision.
001112010. The level 1 output at the sample time does not exceed
the threshold and produces no match.
tuned not to reflect the incoming wavelength to represent
the case of an open switch. However, the fact remains that
fiber-mirror-based correlator shown in Fig. 3 is configured this tapped-delay-line structure requires many separate
to recognize the correlation sequence 11010 (or more branches of optical fiber, each precisely cut to provide
accurately, 11x1x, since the open switches represent ‘‘don’t delay differences as small as 12 of a bit time, which
care’’ bits) by closing the first, second and fourth switches is only 1 cm in fiber length at 10 Gbps. Using FBGs, a
and setting the threshold just below a level 3. The figure more compact, producible and manageable correlator can
shows the output cross-correlation function for the input be produced [6]. The procedure for writing gratings into
sequence 10100, which will not produce a correlation peak fibers makes it relatively simple to write several FBGs
above the threshold at the sample time. into a single fiber with precise control down to centimeter
A set of fiber Bragg gratings (FBGs) can also be used spacings (see Fig. 4). Using separate piezoelectric crystals
to construct an effective fiber-based optical correlator. An or small heaters, each FBG can be independently stretched
FBG is fabricated by creating a periodic variation in the or heated to tune its reflection spectrum. Now the tapped-
fiber’s index of refraction for a few millimeters to a cen- delay line may be implemented within a single piece of
timeter of length along the fiber core [5]. Since optical fiber fiber, again with a circulator at the input to route the
is photosensitive at ultraviolet frequencies, the grating can correlation output to the threshold detector. The FBG-
be written by illuminating the fiber from the side with the based correlator shown in Fig. 4 is configured to recognize
interference pattern of two ultraviolet laser beams. A con- a bit pattern of 101 (or, more accurately, 1x1). This is
ventional FBG acts as a reflective, wavelength-selective accomplished by writing three gratings in a row, with
filter; that is, it reflects light at a particular wavelength center-to-center spacings equal to 12 of a bit time in fiber
that is determined by the spatial period of the index so that the round-trip time between gratings corresponds
grating, and passes light at all other wavelengths. The to a 1-bit delay. The reflectivity of the third grating is
bandwidth of the FBG filter depends largely on the mag- ideally 100% since it is the last ‘‘mirror’’ in the series. The
nitude of the index variation and is typically designed to reflection spectrum of the second grating is tuned away
be <100 GHz (0.8 nm) for communications applications. from the incoming signal wavelength so that it will simply
A nice feature of FBG filters is that the reflection spec- transmit the light and there will be no reflection for the
trum can be adjusted by a few nanometers via heating or 2Tbit delay. This is the equivalent of the ‘‘open switch’’ in
stretching of the grating. This allows one to alter the wave- the previous configuration. The first grating must then be
length that the FBG reflects. The reflectivity of the grating tuned to only partially reflect the incoming light so that
is nearly 100% at the center of the reflected spectrum and the remaining light can pass through to reflect off the third
falls off outside the grating bandwidth. By tuning the FBG grating. The reflectivity of the first grating must be chosen
so that the signal wavelength intersects with the rising or carefully to guarantee that the pulses reflecting off the
falling edge of the passband, the reflected energy at that first grating have power equal to that of those that reflect
wavelength will be reduced from 100% toward 0% as the off the third grating. In practice this can be determined
grating’s passband is tuned farther away. by repeatedly sending a single pulse into the FBG array
Although it is possible to produce an optical correlator and observing the powers of the two, time-delayed output
using FBGs to simply replace the fiber mirrors described pulses on an oscilloscope (the two gratings will reflect the
above, the only advantage to this would be that the optical input pulse at different times, creating two pulses at the
switches could be removed and the FBGs could simply be output). The first FBG can then be tuned until the two
706 DIGITAL OPTICAL CORRELATION FOR FIBEROPTIC COMMUNICATION SYSTEMS
pulses have equal power. The required reflectivities of the This is merely a sampling of the available optical cor-
gratings in the array can also be calculated. Assume that relator designs, specifically concentrating on those that
there are n gratings in the array that represent 1 bits are easily compatible with fiberoptic telecommunication
and therefore must provide some partial reflection of the systems and provide easy-to-understand illustrations of
incoming signal. Let the reflectivities of each grating be optical correlators. However, many other novel correlator
Rn (e.g., if Rn = 0.4, then the grating will reflect 40% configurations have been developed in research labs. These
of the incoming light and transmit the remaining 60% include correlators based on semiconductor optical ampli-
since 1 − Rn = 0.6). Then the equation to determine the fiers (SOA) [10], erbium-doped fibers (EDFs) [11], optical
reflectivities needed to achieve equal output powers of all loop mirrors [12], and nonlinear optical crystals [13].
the time-delayed output pulses is
Rn 4. IMPLEMENTATION CHALLENGES
= Rn+1 (2)
(1 − Rn )2
While the previous section detailed some of the more
Thus, the reflectivity of the last grating in the array should
be set equal to 1, and then the reflectivities of all the common optical correlator structures, there are a number
preceding gratings can be calculated using this recursive of roadblocks facing optical correlators that keep them
equation. For the case shown in Fig. 4, with R3 = 1, we from seeing wide use beyond research environments.
get R1 = (1 − R1 ), which results in R1 = 38%. The cross- However, a number of advances not only begin tackling
correlation output for the case of an input sequence of 011 these roadblocks but also aim to decrease the cost and
is depicted in Fig. 4. Limitations of this method due to the increase the commercial viability of optical correlators.
increasing attenuation of the signal as it makes a double A driving motivation for research in optical signal
pass through the series of gratings will be discussed in the processing is the fact that electronics may at some
following section. point present a bottleneck in telecommunication networks.
By replacing the switches or reflectivity-tuned FBGs Optical signals in commercial networks currently carry
in the previous configurations with phase shifters or data at 2.5 and 10 Gbps and in research environments, at
phase-encoded FBGs, the fiber-based optical intensity 10, 40, and even 160 Gbps. At these higher speeds, either
correlators described above can be used as phase electronic circuits will be unable to efficiently process the
correlators instead [7]. The holographic correlator may data traffic, or it may actually become more economical to
also be adapted to correlate with phase-encoded signals. use optical alternatives. However, optical techniques also
For these cases, the correlation sequences are a series of face significant hurdles in moving to 40 Gbps and beyond.
phase shifts such as 0π π 00π 0π , for an 8-bit codeword. In particular, optical correlators that rely on fiberoptic
As long as the incoming codeword contains the same delays require greater fabrication precision as the bit rate
phase shifts, an autocorrelation peak will result. There rises. At 40 Gbps, a single bit time is 25 ps, corresponding
are several methods of generating optical codewords to 0.5 cm of propagation in standard optical fiber. For the
containing such a series of phase shifts. One all-fiber fiber-mirror-based correlator, the differences in length of
method utilizes a single FBG, 4 cm long, with seven very the fiber branches must equal a 12 of a bit time delay, or
thin wolfram wires (diameter = 25 µm) wrapped around 2.5 mm — an impractically small length. The FBG-based
the grating every 5 mm [8]. The 5 mm spacing provides correlator has an additional problem in that not only must
a 50-ps round-trip delay, corresponding to 1 bit time at the gratings have a center-to-center spacing of 2.5 mm at
20 Gbps, and the 7 wires mean that this correlator can 40 Gbps, but the gratings themselves are often longer than
recognize an 8-bit phase-encoded sequence. By passing a 2.5 mm. While it is possible to make FBGs that are only
current through one of the wires, the FBG will be heated 100s of micrometers long, the shorter length will make it
at only that point in the grating, causing the signal being difficult, but not impossible, to achieve a high-reflectivity
reflected by the FBG to experience a phase shift at the time grating with the appropriate bandwidth to reflect the
delay corresponding to the location of the wire. The heat- higher data-rate signal. Furthermore, there must be
induced phase shift occurs because the index of refraction some method to provide precise tuning of the closely
of the glass fiber is temperature-sensitive. Note that only spaced individual FBGs. One report [14] demonstrated
point heating is desired here to induce the phase shifts — it a grating-based tapped-delay line correlator created from
is not the aim to shift the grating’s reflection spectrum, a single FBG. A periodically spaced series of thin-film
which would occur if the entire grating were heated. This metallic heaters were deposited on the surface of the
device may be used to both generate phase-encoded signals FBG to effectively create several tunable FBGs from one
at the transmitter and to correlate with them at the long one. By passing individual currents through the
receiver. The most common application of optical phase heaters, the reflection spectrums for the portions of the
correlators is to construct optical CDMA encoders and FBG beneath the heaters are tuned away. Since the
decoders. One possible advantage of phase modulation heaters can be deposited with essentially lithographic
over intensity modulation is the potential for multilevel precision, this resolves the issue of how to precisely
coding. While intensity modulation coding primarily uses fabricate and tune very closely spaced FBGs. Of course,
on/off keying (OOK), where a 1 bit is represented by this technique will eventually be limited by how closely
the presence of light and a 0 bit by the absence of it, spaced the heaters can be before thermal crosstalk
phase coding need not utilize phase shifts of only 0 and between neighboring elements becomes a significant
π . For example, another article demonstrated four-level problem. Spectral holography is perhaps one of the more
encoding, using phase shifts of 0, π/2, π , and 3π/2 [9]. promising techniques for higher bit rates, as it is currently
DIGITAL OPTICAL CORRELATION FOR FIBEROPTIC COMMUNICATION SYSTEMS 707
feasible (albeit expensive) to create holographic plates with two gratings, preceded by a splitter, can be used
that can correlate with high-speed signals. However, the instead of a single row of gratings, thereby increasing the
high absorption loss of current free-space holographic efficiency (assuming that all gratings are 1 bits) to ∼75%
correlators at communication wavelengths remains a of the incident light. This solution also alleviates the dif-
roadblock to any practical implementation. ficulty of spacing the gratings very closely together since
Coherence effects can also present problems in optical only every other grating is on each branch, at the cost of
correlators that utilize tapped-delay-line structures in introducing coherence effects that must be mitigated.
which the light is split into several time-delayed replicas
that are recombined at the correlator output. The 5. ADVANCED TOPICS
coherence time of standard telecommunication lasers
is typically tens of nanoseconds, corresponding to a As wavelength-division-multiplexed (WDM) systems are
coherence length in fiber of ∼2 ms. When the differential becoming the standard in optical networks, it is becoming
time delay between two branches of the correlator is increasingly desirable to build modules that can act on
less than the coherence time of the laser (which is multiple wavelength channels simultaneously. To this
clearly the typical case), then the recombined signals will end, there is a growing interest in multiwavelength
coherently interfere with each other, causing large power optical correlators. One study of a multiwavelength
fluctuations in the correlation output function. This effect optical correlator uses sampled FBGs in a grating-based
destroys the correlation output and must be mitigated or correlator structure. The reflection spectrum of a sampled
prevented in order to effectively operate the correlator. FBG possesses multiple passbands so that it can filter
There are a number of ways that this problem can be several wavelength channels simultaneously [16]. When
solved. A polarization controller followed by a polarization this type of FBG is stretched or heated, the entire
beamsplitter (PBS) can be used at the input of each 1 × 2 reflection spectrum shifts, causing the reflectivity at each
optical splitter/combiner to ensure that the electric field wavelength to experience the same variation. Thus, by
polarizations between the two branches are orthogonal. replacing the standard FBGs with sampled FBGs in the
This will prevent coherent interference of the recombined correlator structure described previously, incoming signals
signals because orthogonally polarized lightbeams will on multiple wavelengths can simultaneously be correlated
not interfere with each other. Thus, for more than with a single correlation sequence. While it may still be
two branches, a tree structure of 1 × 2 splitters with necessary to demultiplex these channels prior to detection
polarization controllers and polarization beamsplitters in order to check the correlation output for each channel,
can be used. Another, more manageable, solution is to this technique still significantly reduces the number of
somehow convert the coherent light into an incoherent components that would otherwise be required in a system
signal before entering the correlator. One method of doing that provided a separate correlator for each wavelength.
this uses cross-gain modulation in a semiconductor optical In Section 2, we mentioned that the conventional N-bit
amplifier (SOA) to transfer the coherent data pattern optical tapped-delay-line correlator cannot be configured
onto incoherent light, which in this case is the amplified to uniquely identify all 2N possible bit patterns. This is
spontaneous emission light generated by the SOA [15]. because a 0 bit is represented by an open switch or a
An additional problem facing the grating-based corre- grating that is tuned not to reflect any light. This really
lator is the severe optical losses associated with multiple means that these bit positions are considered ‘‘don’t care’’
passes through the FBGs. This can so significantly limit bits and the desired autocorrelation peak will result at the
the length of the correlation sequence that it can realis- sample time whether the ‘‘don’t care’’ bits are 1s or 0s.
tically correlate with before the power in the correlation This is not an issue in optical CDMA systems, where the
output pulses are so low that they drop below the sys- set of codewords can be specifically designed to maintain a
tem noise floor. As explained before, the reflectivities of constant number of 1 bits in each codeword. However, for
the gratings are determined by the need to equalize the applications that must be able to uniquely recognize any of
pulses reflecting off of each grating representing a 1 bit, the 2N possible N-bit sequences, this situation will result
while recognizing that the pulses are attenuated with in false-positive matches whenever a 1 is present where
each pass through each grating. With four 1 bits in a a 0 bit is desired. This is important for applications such
correlation sequence, only ∼65% of the incident light is as packet header or label recognition. A solution to this
present in the total correlation output. The rest is lost as a problem is to add a second correlator that is configured in
result of multiple reflections within the correlator, as the complement to the first one and produces a ‘‘match’’ signal
reflection off an internal grating can reflect again on the when zero power is present at the sample time. This is
return trip and will no longer fall within the correlation accomplished by placing a NOT gate at the output of the
time window. These multiple internal reflections do cause threshold detector which is set just above level zero. If
undesired replicas of the signal to add to the correlation the power goes above the threshold, this indicates that at
output function, but they are so attenuated by the mul- least one 1 bit is present where a 0 is desired, and the NOT
tiple reflections that they are not typically considered a gate will convert the high output to a low one to indicate
significant impairment. As the number of 1 bits increases, ‘‘no match’’ for this correlator. This correlator therefore
the correlation sensitivity will decrease as more light is correlates with the desired 0 bits in the sequence and is
lost to these reflections. One method to mitigate this prob- thus called a ‘‘zeros’’ correlator. Likewise, the originally
lem is to interleave the gratings between multiple fiber described correlator is called a ‘‘ones’’ correlator. In the
branches [6]. For the 4-bit sequence, two branches, each zeros correlator, the switches are closed for desired 0 bits
708 DIGITAL OPTICAL CORRELATION FOR FIBEROPTIC COMMUNICATION SYSTEMS
and open otherwise (or the FBGs reflect for desired 0 bits header or label recognition, technologies that can imple-
and are tuned away otherwise). Thus, the 1 bits are ‘‘don’t ment huge arrays or banks of correlators to efficiently
care’’ bits in a zeros correlator. By combining the outputs of test the incoming signal against all possible bit sequences
the ones and zeros correlators with an AND gate, the input will be needed. Reaching these goals presents a signif-
sequence will only produce a final ‘‘match’’ signal when the icant engineering challenge, but research is continuing
input pattern uniquely matches the desired correlation to make progress, and optical correlators, combined with
sequence. An illustration of how the combination of a ones the appropriate data coding techniques, offer great poten-
and a zeros correlator can avoid false positive matches tial for bringing the ever-expanding field of optical signal
is depicted in Fig. 5. The desired correlation sequence is processing to light.
1001, meaning that the ones correlator is configured to
match a 1xx1 pattern and the zeros correlator will produce BIOGRAPHIES
a match for an x00x pattern. In Fig. 5a, the incoming
sequence is 1001, and so the ones and zeros correlators John E. McGeehan received his B.S. and M.S. degrees
both produce ‘‘match’’ signals, resulting in a ‘‘match’’ signal in electrical engineering at the University of Southern
at the output of the AND gate. In Fig. 5b, the input sequence California in Los Angeles in 1998 and 2001, respectively.
is a 1101, causing the ones correlator to still produce a He joined the Optical Communications Laboratory at the
‘‘match’’ signal (this would be a false-positive match if only University of Southern California as a research assistant
this correlator were used). But the undesired 1 bit in the in 2001 and currently is working toward his doctorate.
second time slot of the input sequence causes the power His research interests include the implementation of all-
at the sample time in the zeros correlator to go above optical networking functions and optical signal processing
threshold, resulting in a ‘‘no match.’’ The combination of as well as Raman amplification and signal monitoring. He
the ‘‘match’’ and ‘‘no match’’ signals in the AND gate produce is an author or co-author of nine technical papers.
the correct ‘‘no match’’ result.
Michelle C. Hauer received the B.S. degree in engi-
neering physics from Loyola Marymount University, Los
6. CONCLUSION Angeles, California, in 1997. She currently is a research
assistant in the Optical Fiber Communications Laboratory
While optical correlators currently see limited commer- at the University of Southern California, Los Angeles, Cal-
cial application, the frontier of all-optical networking is ifornia, where she received the M.S.E.E. degree in 2000.
rapidly approaching, aided by the ever-increasing demand Her doctoral research includes optical signal processing
to transmit more bandwidth over the network core. This, techniques for implementing all-optical networking func-
in addition to the growing interest in optical CDMA net- tions. She is the author or co-author of more than 13
works, will push designers to develop producible optical research papers. She is a member of the Tau Beta Pi, Eta
correlators that can be dynamically adjusted to recognize Kappa Nu, and Sigma Pi Sigma academic honor societies
very high bit-rate sequences. For applications such as in engineering, electrical engineering, and physics, respec-
tively. She has also held a position as a systems engineer at
Raytheon Company in El Segundo, California since 1997.
(a) Pattern matching - “ones” and “zeros”
correlators both show a match Alan Willner received his B.S. from Yeshiva University
Threshold ≈ 2 and his Ph.D. from Columbia University. He has worked at
“1xx 1” Match AT&T Bell Labs and Bellcore, and is Professor of Electri-
correlator
1001 t Match cal Engineering at the University of Southern California.
““x 00x” ” Match Professor Willner has received the following awards: the
correlator
Threshold ≈ 0 National Science Foundation (NSF) Presidential Faculty
Ts
Fellows Award from the White House, the Packard Foun-
(b) Pattern mismatch - false match avoided by “zeros” correlator dation Fellowship, the NSF National Young Investigator
Threshold ≈ 2 Award, the Optical Society of America (OSA) Fellow
”
“1xx 1” False match Award, the Fulbright Foundation Senior Scholars Award,
correlator No
1 1 0 1 t the Institute of Electronic and Electrical Engineers (IEEE)
match
“x 00x ” Lasers and Electro-Optics Society (LEOS) Distinguished
correlator No match
Traveling Lecturer Award, the USC/TRW Best Engi-
Ts Threshold ≈ 0
neering Teacher Award, and the Armstrong Foundation
Figure 5. The concept of combining 1s and 0s correlators Memorial Prize. His professional activities have included:
to uniquely recognize a bit sequence and avoid false-positive editor-in-chief of the IEEE/OSA Journal of Lightwave
matches. The ‘‘ones’’ correlator tests for 1 bits in the correlation Technology, editor-in-chief of the IEEE Journal of Selected
sequence and the ‘‘zeros’’ correlator tests for 0 bits. The correlators Topics in Quantum Electronics, V.P. for Technical Affairs
shown are configured to recognize a 1001 pattern when their
of the IEEE LEOS, Co-Chair of the OSA Science and
outputs are combined in an AND gate. (a) The input pattern 1001
results in a match for both correlators, producing a final ‘‘match’’
Engineering Council, Elected Member of the IEEE LEOS
decision at the output. (b) The input pattern 1101 results in a Board of Governors, Program Co-Chair of the Conference
match for the ‘‘ones’’ correlators but a ‘‘no match’’ for the ‘‘zeros’’ on Lasers and Electro-Optics (CLEO), General Chair of
correlator. The combination of these two outputs produces the the LEOS Annual Meeting Program, Program Co-Chair of
correct ‘‘no match’’ decision at the output of the AND gate. the OSA Annual Meeting, OSA Photonics Division Chair,
DIGITAL PHASE MODULATION AND DEMODULATION 709
General Co-Chair of the OSA Optical Amplifier Meeting, 16. J. McGeehan, M. C. Hauer, A. B. Sahin, and A. E. Willner,
and Steering and Program Committee Member of the Con- Reconfigurable multi-wavelength optical correlator for
ference on Optical Fiber Communications (OFC). Professor header-based switching and routing, Proc. Conf. Optical
Fiber Communications (OFC) 2002, paper WM4, 2002,
Willner has 325 publications, including one book.
pp. 264–266.
BIBLIOGRAPHY
DIGITAL PHASE MODULATION AND
1. F. G. Stremler, Introduction to Communications Systems, 3rd DEMODULATION
ed., Addison-Wesley, Reading MA, 1990.
2. A. Stok and E. H. Sargent, Lighting the local area: Optical FUQIN XIONG
code-division multiple access and quality of service provision- Cleveland State University
ing, IEEE Network 42–46 (Nov./Dec. 2000). Cleveland, Ohio
3. J. Widjaja, N. Wada, Y. Ishii, and W. Chijo, Photonic packet
address processor using holographic correlator, Electron. Lett.
1. INTRODUCTION
37(11): 703–704 (2001).
4. J.-D. Shin, M.-Y. Jeon, and C.-S. Kang, Fiber-optic matched Digital signals or messages, such as binary 1 and 0, must
filters with metal films deposited on fiber delay-line ends for be modulated onto a high-frequency carrier before they
optical packet address detection, IEEE Photon. Tech. Lett.
can be transmitted through some communication channels
8(7): 941–943 (1996).
such as a coaxial cable or free space. The carrier is usually
5. R. Kashyap, Fiber Bragg Gratings, Academic Press, San an electrical voltage signal such as a sine or cosine function
Diego, 1999. of time t
6. D. B. Hunter and R. A. Minasian, Programmable high-speed s(t) = A cos(2π fc t + θ )
optical code recognition using fibre Bragg grating arrays,
Electron. Lett. 35(5): 412–414 (1999). where A is the amplitude, fc is the carrier frequency, and
7. A. Grunnet-Jepsen et al., Spectral phase encoding and θ is the initial phase. Each of them or a combination of
decoding using fiber Bragg gratings, Proc. Conf. Optical Fiber them can be used to carry digital messages. The total phase
Communications (OFC) 1999, paper PD33, 1999, pp. PD33- (t) = 2π fc t + θ can also be used to carry digital messages.
1–PD33-3. The process that impresses a message onto a carrier by
8. M. R. Mokhtar, M. Ibsen, P. C. Teh, and D. J. Richardson, associating one or more parameters of the carrier with the
Simple dynamically reconfigurable OCDMA encoder/decoder message is called modulation; the process that extracts a
based on a uniform fiber Bragg grating, Proc. Conf. Optical message from a modulated carrier is called demodulation.
Fiber Communications (OFC) 2002, paper ThGG54, 2002, There are three basic digital modulation meth-
pp. 688–690. ods: amplitude shift keying (ASK), frequency shift key-
9. P. C. Teh et al., Demonstration of a four-channel WDM/ ing (FSK), and phase shift keying (PSK). In ASK, data 1
OCDMA system using 255-chip 320-Gchip/s quaternary phase and 0 are represented by the presence and absence of a
coding gratings, IEEE Photon. Tech. Lett. 14(2): 227–229 burst of the carrier sine wave, respectively. The burst of
(2002). the carrier that lasts a duration of a data period is called
10. P. Petruzzi et al., All optical pattern recognition using a a symbol. The frequency and phase of the carrier are kept
segmented semiconductor optical amplifier, Proc. Eur. Conf. unchanged from symbol to symbol. In FSK, data 1 and 0
Optical Communication (ECOC) 2001, paper We.B.2.1, 2001, are represented by two different frequencies of the carrier,
pp. 304–305. respectively. The amplitude and phase of the carrier are
11. J. S. Wey, J. Goldhar, D. L. Butler, and G. L. Burdge, Inves- kept unchanged from symbol to symbol. In PSK, data 1
tigation of dynamic gratings in erbium-doped fiber for optical and 0 are represented by two different phases (e.g., 0 or
bit pattern recognition, Proc. Conf. Lasers and Electro-optics π radians) of the carrier, respectively. The amplitude and
(CLEO) 1997, paper CThW1, 1997, pp. 443–444. frequency of the carrier are kept unchanged from symbol
12. N. Kishi, K. Kawachi, and E. Yamashita, Auto-correlation to symbol.
method for weak optical short pulses using a nonlinear ampli- Modulation schemes are usually evaluated and com-
fying loop mirror, Proc. Eur. Conf. Optical Communication pared using three criteria: power efficiency, bandwidth
(ECOC) 1997, paper 448, 1997, pp. 215–218. efficiency, and system complexity.
13. Z. Zheng and A. M. Weiner, Spectral phase correlation The bit error probability (Pb ), or bit error rate (BER), as
of coded femtosecond pulses by second-harmonic gen- it is commonly called, of a modulation scheme is related to
eration in thick nonlinear crystals, Opt. Lett. 25(13): Eb /N0 , where Eb is the energy of the modulated carrier in
(2000). a bit duration, and N0 is the noise power spectral density.
14. M. C. Hauer et al., Dynamically reconfigurable all-optical The power efficiency of a modulation scheme is defined as
correlators to support ultra-fast internet routing, Proc. Conf. the required Eb /N0 for a certain bit error probability (Pb ),
Optical Fiber Communications (OFC) 2002, paper WM7, over an additive white Gaussian noise (AWGN) channel.
2002, pp. 268–270. The bandwidth efficiency is defined as the number of
15. P. Parolari et al., Coherent-to-incoherence light conver- bits per second that can be transmitted in one hertz (1 Hz)
sion for optical correlators, J. Lightwave Technol. 18(9): of system bandwidth. System bandwidth requirement
1284–1288 (2000). depends on different criteria. Assuming the system uses
710 DIGITAL PHASE MODULATION AND DEMODULATION
Nyquist (ideal rectangular) filtering at baseband, which Thus the bandwidth efficiencies of BPSK, QPSK, 8-PSK,
has the minimum bandwidth required for intersymbol- and 16-PSK are 0.5, 1, 1.5, and 2 bps/Hz.
interference (ISI)-free transmission of digital signals, then Using the bandwidth efficiency, the bit rate that can be
the bandwidth at baseband is 0.5Rs , where Rs is the symbol supported by a system bandwidth can be easily calculated.
rate, and the bandwidth at carrier frequency is W = Rs . From above we have
Since Rs = Rb / log2 M, where Rb is the bit rate, for M-ary
modulation, the bandwidth efficiency is Rb = ηB W
Rb For example, for a satellite transponder that has a
ηB = = log2 M (bps/Hz) (Nyquist)
W bandwidth of 36 MHz, assuming that Nyquist filtering
This is called the Nyquist bandwidth efficiency. For is achieved in the system, the bit rate will be 36 Mbps
modulation schemes that have power density spectral for BPSK, 72 Mbps for QPSK, and so on. If null-to-null
nulls such as the ones of PSK in Fig. 2, the bandwidth bandwidth is required in the system, the bit rate will be
may be defined as the width of the main spectral lobe. 18 Mbps for BPSK, 36 Mbps for QPSK, and so on. Practical
Bandwidth efficiency based on this definition of bandwidth systems may achieve bit rates somewhere between these
is called null-to-null bandwidth efficiency. If the spectrum two sets of values.
of the modulated signal does not have nulls, null-to-null In this article, we first describe the commonly used class
bandwidth no longer exists, as in the case of continuous- of PSK schemes, that is, MPSK. Then BPSK and QPSK
phase modulation (CPM). In this case, energy percentage are treated as special cases of MPSK. Next, differential
bandwidth may be used as a definition. Usually 99% is PSK schemes are introduced, which are particularly useful
used, even though other percentages (e.g., 90%, 95%) are in channels where coherent demodulation is difficult or
also used. Bandwidth efficiency based on this definition of impossible, such as fading channels. At the end of this
bandwidth is called percentage bandwidth efficiency. article, advanced phase modulation schemes, such as
System complexity refers to the circuit implementation offset QPSK (OQPSK), π/4-DQPSK, and continuous-phase
of the modulator and demodulator. Associated with the modulation (CPM), including multi-h CPM, are briefly
system complexity is the cost of manufacturing, which introduced. A future trend in phase modulation is in the
is, of course, a major concern in choosing a modulation direction of CPM and multi-h CPM.
technique. Usually the demodulator is more complex than
the modulator. A coherent demodulator is much more 2. PSK SCHEMES AND THEIR PERFORMANCE
complex than a noncoherent demodulator since carrier
recovery is required. For some demodulation methods, 2.1. Signal Waveforms
sophisticated algorithms, such as the Viterbi algorithm, In PSK, binary data are grouped into n bits in a group,
are required. All these are basis for complexity comparison. called n-tuples or symbols. There are M = 2n possible
In comparison with ASK and FSK, PSK achieves better n-tuples. When each n-tuple is used to control the
power efficiency and bandwidth efficiency. For example, in phase of the carrier for a period of T, M-ary phase
terms of power efficiency, binary PSK is 3 dB better than shift keying (MPSK) is formed. The MPSK signal set is
binary ASK and FSK at high signal-to-noise ratios, while defined as
BPSK is the same as BASK and better than BFSK in terms
of bandwidth efficiency. Because of the advantages of PSK si (t) = A cos(2π fc t + θi ), 0 ≤ t ≤ T,
i = 1, 2, . . . , M
schemes, they are widely used in satellite communications, (1)
fixed terrestrial wireless communications, and wireless where A is the amplitude, fc is the frequency of the
mobile communications. carrier, and
A more complex, but more efficient PSK scheme is (2i − 1)π
θi =
quadrature phase shift keying (QPSK), where the initial M
phase of the modulated carrier at the start of a symbol is are the (initial) phases of the signals. Note that θi are
any one of four evenly spaced values, say, (0, π/2, π, 3π/2) equally spaced. Each signal si (t) is a burst of carrier
or (π/4, 3π/4, 5π/4, 7π/4). In QPSK, since there are four
of period T, called a symbol waveform. Each symbol
different phases, each symbol can represent two data bits.
waveform has a unique phase that corresponds to a n-
For example, 00, 01, 10, and 11 can be represented by
tuple; or, in other words, each n-tuple is represented
0, π/2, π , and 3π/2, respectively. In general, assuming
by a symbol waveform with a unique phase. When the
that the PSK scheme has M initial phases, known as M-
PSK signal is transmitted and received, the demodulator
ary PSK or MPSK, the number of bits represented by a
detects the phase of each symbol. On the basis of
symbol is n = log2 M. Thus each 8-PSK symbol represents
phase information, the corresponding n-tuple is recovered.
3 bits, each 16-PSK symbol represents 4 bits, and so on.
Except for the phase difference, the M symbol waveforms
As the order (M) of the PSK scheme is increased, the
in MPSK have the same amplitude (A), the same
bandwidth efficiency is increased. In terms of null-to-null
frequency (fc ), and the same energy:
bandwidth, which is often used for PSK schemes, the
efficiency is T
A2 T
E= s2i (t) dt = , i = 1, 2, . . . , M
Rb Rb Rb 1 0 2
ηB = = = = log2 M
W 2Rs 2Rb / log2 M 2 The simplest PSK scheme is binary phase shift key-
(bps/Hz) (null-to-null) ing (BPSK), where binary data (1 and 0) are represented
DIGITAL PHASE MODULATION AND DEMODULATION 711
by two symbols with different phases. This is the case Thus PSK signals can be graphically represented by
when M = 2 in MPSK. Typically these two phases are 0 a signal constellation in a two-dimensional coordinate
and π . Then the two symbols are ±A cos 2π fc t, which are system with the two orthonormal basis functions in
said to be antipodal. Eqs. (5) and (6), φ1 (t) and φ2 (t), as its horizontal and
If binary data are grouped into 2 bits per group, called vertical axes, respectively (Fig. 1). Each signal si (t) is
dibits, there are four possible dibits: 00, 01, 10, and 11. represented by a point (si1 , si2 ), or a vector from the
If each dibit is used to control the phase of the carrier, origin to this point, in the two √ coordinates. The polar
then we have quadrature phase shift keying (QPSK). This coordinates of the signal√ are ( E, θi ); that is, the signal
is the case when M = 4 in MPSK. Typically, the initial vector’s magnitude is E and its angle with respect to the
signal phases are π/4, 3π/4, 5π/4, and 7π/4. Each phase horizontal axis is θi . The
√ signal points are equally spaced
corresponds to a dibit. QPSK is the most often used on a circle of radius E and centered at the origin. The
scheme since its BER performance is the same while the bits-signal mapping could be arbitrary provided that the
bandwidth efficiency is doubled in comparison with BPSK, mapping is one-to-one. However, a method called Gray
as will be seen shortly. coding is usually used in signal assignment in MPSK.
Gray coding assigns n-tuples with only one-bit difference
2.2. Signal Constellations
in two adjacent signals in the constellation. When an M-
It is well known that sine-wave signals can be represented ary symbol error occurs because of the presence of noise,
by phasors. The phasor’s magnitude is the amplitude it is most likely that the signal is detected as the adjacent
of the sine-wave signal; its angle with respect to the signal on the constellation; thus only one of the n input
horizontal axis is the phase of the sine-wave signal. In a bits is in error. Figure 1 shows the constellations of BPSK,
similar way, signals of modulation schemes, including PSK QPSK, and 8-PSK, where Gray coding is used for bit
schemes, can be represented by phasors. The graph that assignment.
shows the phasors of all symbols in a modulation scheme
is called a signal constellation. It is a very convenient 2.3. Power Spectral Density
tool for visualizing and describing all symbols and their
relationship in the modulation scheme. It is also a powerful The power spectral density (PSD) of a signal shows the
tool for analyzing the performance of the modulation signal power distribution as a function of frequency. It
scheme. provides very important information about the relative
The PSK waveform can be written as strength of the frequency components in the signal, which,
in turn, determines the bandwidth requirement of the
si (t) = A cos θi cos 2π fc t − A sin θi sin 2π fc t (2) communication system.
= si1 φ1 (t) + si2 φ2 (t), 0 ≤ t ≤ T, i = 1, 2, . . . , M The power spectral density of a bandpass signal, such
as a PSK signal, is centered about its carrier frequency.
where Moving the center frequency down to zero, we obtain the
* baseband signal PSD, which completely characterizes the
2
φ1 (t) = cos 2π fc t, 0≤t≤T (3) signal [1–3]. The baseband PSD expression of MPSK can
T be expressed as [3]
*
2 2 2
φ2 (t) = − sin 2π fc t, 0≤t≤T (4) sin π fT sin π fnTb
T
s̃ (f ) = A2 T = A2 nTb (7)
π fT π fnTb
are two orthonormal basis functions and
T √ where Tb is the√bit duration, n = log2 M. Figure 2 shows
si1 = si (t)φ1 (t) dt = E cos θi (5) the PSDs (A = 2 and Tb = 1 for unit bit energy: Eb = 1)
0
for different values of M where the frequency axis is
T √ normalized to the bit rate (fTb ). From the figure we can
si2 = si (t)φ2 (t) dt = E sin θi (6)
0 see that the bandwidth decreases with M, or in other
words, the bandwidth efficiency increases with M. The
are the projections of the signal onto the basis functions.
Nyquist bandwidth is
The phase is related with si1 and si2 as
si2 1 Rb
θi = tan−1 BNyquist = =
si1 nTb log2 M
11 010 110
01 S2 2p
S2 √E S 1 011 S3 111 M Z1
0 S2 1 S1 q1 Z1 S4 S1
(a) 8 (b) 10
M = 16
M =8
6
M =4 M =2
M = 16
Ψ(f ) 4 Ψ(f ) −20
M =8
M =4
2
M =2
0 −50
0 0.5 1 1.5 2 0 0.5 1 1.5 2
f ⋅Tb f ⋅Tb
s1(t ), T = nTb
2
cos 2pfc t
T
Level +
n bits of {ak } MPSK signal
generator Osc. Σ
(si 1,si 2 )
+
p/2
2
− sin 2pfc t
Figure 3. MPSK modulator (Osc. = oscil- T
lator). (From Ref. 3, copyright 2000
s2(t ), T = nTb
Artech House.)
This translates to the Nyquist bandwidth efficiency of amplitude defined on [0, T]. Expression (8) requires that
log2 M. The null-to-null bandwidth is the carrier frequency be an integer multiple of the symbol
timing so that the initial phase of the signal in any symbol
2 2Rb
Bnull−to−null = = period is θk .
nTb log2 M Since MPSK signals are two-dimensional, for M ≥ 4,
This translates to a bandwidth efficiency of 0.5 log2 M, the modulator can be implemented by a quadrature
which is half of that of the Nyquist bandwidth efficiency. modulator (Fig. 3), where the upper branch is called the
Practical systems can achieve bandwidth efficiencies in-phase channel or I channel, and the lower branch is
between these two values. called the quadrature channel or the Q channel. The only
difference for different values of M is the level generator.
2.4. Modulator and Demodulator Each n-tuple of the input bits is used to control the
Over the entire time axis, we can write MPSK signal as level generator. It provides the I and Q channels the
* * particular sign and level for a signal’s horizontal and
π π vertical coordinates, respectively.
s(t) = s1 (t) cos 2π fc t − s2 (t) sin 2π fc t,
2 2 Modern technology intends to use completely digital
−∞< t<∞ (8) devices. In such an environment, MPSK signals are
digitally synthesized and fed to a D/A (digital-to-analog)
where converter whose output is the desired phase-modulated
signal.
√
∞
The coherent demodulation of MPSK is shown in
s1 (t) = E cos(θk )p(t − kT) (9)
k=−∞
Fig. 4. The input to the demodulator is the received
signal r(t) = si (t) + n(t), where n(t) is the additive white
√
∞
Gaussian noise (AWGN). There are two correlators, each
s2 (t) = E sin(θk )p(t − kT) (10)
k=−∞
consisting of a multiplier and an integrator. The carrier
recovery (CR) block synchronizes the reference signals
where θk is one of the M phases determined by the input for the multipliers with the transmitted carrier in
binary n-tuple and p(t) is the rectangular pulse with unit frequency and phase. This makes the demodulation
DIGITAL PHASE MODULATION AND DEMODULATION 713
(k +1)T r 1k
∫kT dt
2
cos 2pfc t
T
^ Choose ^
r (t ) r 2k qk ^ qi
CR tan−1 r qi − qk the
1k
smallest
2
− sin 2pfc t
T
(k +1)T Figure 4. Coherent MPSK demodulator using
∫kT dt
r 2k two correlators. (From Ref. 3, copyright 2000
Artech House.)
coherent and ensures the lowest bit error probability. The (a) Polar NRZ source a (t ) Aa (t )cos 2pfc t
correlators correlate r(t) with the two reference signals.
Because of the orthogonality of the two components of (−1, +1)
the PSK signal, each correlator produces an output as Acos 2pfc t
follows:
T T
r1 = r(t)φ1 (t) dt = [s(t) + n(t)]φ1 (t) dt = si1 + n1 Osc.
0 0
T T
r2 = r(t)φ2 (t) dt = [s(t) + n(t)]φ2 (t) dt = si2 + n2 (b)
0 0 (k +1)T l 1 1 or 0
r (t ) ∫kT dt 0
0
where si1 and si2 are given as in Eqs. (5) and (6), cos 2pfc t
respectively, and n1 and n2 are noise output. In Fig. 4
the subscript k indicates the kth symbol period.
CR
Define
+ r2
θ = tan−1
r1 Figure 5. BPSK modulator (a); coherent BPSK demodulator (b).
(From Ref. 3, copyright 2000 Artech House.)
Polar NRZ, Q (t )
(b) 1
(k +1)T r 1k
∫kT dt 0
0
2
cos 2pfc t
T
Output
r (t ) binary data
CR P/S
2
− sin 2pfc t
T
(k +1)T 1
∫kT dt
r 2k
0
0
Figure 6. (a) QPSK modulator; (b) QPSK demodulator. (From Ref. 3, copyright 2000 Artech House.)
When M = 2, (11) results in the formula for the symbol For M > 4, expression (11) cannot be evaluated in a
(and bit) error probability of BPSK: closed form. However, the symbol error probability can be
( obtained by numerically integrating (11).
2E Figure 7 shows the graphs of Ps for M = 2, 4, 8, 16,
Ps = Pb = Q
b
(BPSK) (14) and 32 given by the exact expression (11). Beyond
No
M = 4, doubling the number of phases, or increasing
the n-tuples represented by the phases by one bit,
where the Eb is the bit energy, which is also equal to the requires a substantial increase in Eb /N0 [or signal-to-
symbol energy E since a bit is represented by a symbol in noise ratio (SNR)]. For example, at Ps = 10−5 , the SNR
BPSK, and ∞ difference between M = 4 and M = 8 is approximately
1 2
Q(x) = √ e(−u /2) du (15) 4 dB, the difference between M = 8 and M = 16 is
x 2π approximately 5 dB. For large values of M, doubling the
is called the Q function. number of phases requires an SNR increase of 6 dB to
When M = 4 [Eq. (11)] results in the formula for QPSK: maintain the same performance.
For E/N0 1, we can obtain an approximation of the
( ( 2 Ps expression as
2Eb 2Eb
Ps = 2Q − Q (16) (
N0 N0
2E π
Ps ≈ 2Q sin (18)
Since the demodulator of QPSK is simply a pair of two N0 M
parallel BPSK demodulators, the bit error probability is
the same as that of BPSK:
Note that only the high signal-to-noise ratio assumption
(
is needed for the approximation. Therefore (18) is good for
2E
Pb = Q any values of M, even though it is not needed for M = 2
b
(QPSK) (17)
N0 and 4 since precise formulas are available.
The bit error rate can be related to the symbol error
Recall that QPSK’s bandwidth efficiency is double that rate by
of BPSK. This makes QPSK a preferred choice over BPSK Ps
Pb ≈ (19)
in many systems. log2 M
DIGITAL PHASE MODULATION AND DEMODULATION 715
dk
dk−1 Delay Acos 2pfc t
T
(b) 1
r (t ) (k +1)T 0 or 1
BPF ∫kT dt 0
0
Delay ref.
T
Figure 8. DBPSK modulator (a) and demodulator (b). (From Ref. 3, copyright 2000 Artech House.)
(k +1)T Wk
∫kT dt
cos 2pfc t
Delay Wk −1
T
r (t ) +
LO Xk 1 0 or 1
p Σ 0
− 0
2 Delay +
T Z k −1
sin 2pfc t
(
E for W > 1
Pb ≈ Q 1.10−5
b
N0 T
1 Coherent BPSK
≈ √ & e−Eb /2N0 1.10−6
2 π Eb /2N0
1
for W > (suboptimum DBPSK) (23) 1.10−7
T
A differentially encoded BPSK signal can also be MPSK. Compared with coherent MPSK, asymptotically
demodulated coherently (denoted as DEBPSK). It is the DMPSK requires 3 dB more SNR to achieve the same
used when the purpose of differential encoding is to error performance.
eliminate phase ambiguity in the carrier recovery circuit
for coherent PSK (see Section 4.10 of Ref. 3). This is rarely 4. OTHER PSK SCHEMES
denoted by the acronym DBPSK. DBPSK refers to the
scheme of differential encoding and differentially coherent 4.1. Offset QPSK
demodulation as we have discussed above. Offset QPSK (OQPSK) is devised to avoid the 180◦ phase
In the case of DEBPSK, the bit error probability (Pb ) of shifts in QPSK [3,5]. OQPSK is essentially the same as
the final decoded sequence {+ ak } is related to the bit error QPSK except that the I- and Q-channel pulsetrains are
probability (Pb,d ) of the demodulated encoded sequence staggered. The OQPSK signal can be written as
{+
dk } by
Pb = 2Pb,d (1 − Pb,d ) (24) A
s(t) = √ I(t) cos 2π fc t
2
(see Section 2.4.1 of Ref. 3). Substituting Pb,d as in (14)
A T
into Eq. (24), we have − √ Q t− sin 2π fc t, −∞ < t < ∞ (27)
2 2
( (
2E 2E The modulator and the demodulator of OQPSK are
Pb = 2Q 1 − Q (DEBPSK) (25)
b b
N0 N0 basically identical to those of QPSK, except that an extra
delay of T/2 seconds is inserted in the Q channel in the
modulator and in the I channel in the demodulator.
for coherently detected differentially encoded PSK. For
Since OQPSK differs from QPSK only by a delay in the
large SNR, this is just about 2 times that of coherent
Q-channel signal, its power spectral density is the same as
BPSK without differential encoding.
that of QPSK, and its error performance is also the same
Finally we need to say a few words about the
as that of QPSK.
power spectral density of differentially encoded BPSK.
In comparison to QPSK, OQPSK signals are less
Since the difference of differentially encoded BPSK from
susceptible to spectral sidelobe restoration in satellite
BPSK is differential encoding, which always produces
transmitters. In satellite transmitters, modulated signals
an asymptotically equally likely data sequence (see
must be band-limited by a bandpass filter in order to
Section 2.1 of Ref. 3), the PSD of the differentially encoded
conform to out-of-band emission standards. The filtering
BPSK is the same as BPSK, in which we assume that its
degrades the constant-envelope property of QPSK, and
data sequence is equally likely. However, it is worthwhile
the 180◦ phase shifts in QPSK cause the envelope to go
to point out that if the data sequence is not equally likely,
to zero momentarily. When this signal is amplified by the
the PSD of the BPSK is not the one in Fig. 2, but the PSD
final stage, usually a highly nonlinear power amplifier, the
of the differentially encoded PSK is still the one in Fig. 2.
constant envelope will be restored. But at the same time
3.2. Differential QPSK and MPSK the sidelobes will be restored. Note that arranging the
bandpass filter after the power amplifier is not feasible
The principles of differential BPSK can be extended to since the bandwidth is very narrow compared with the
MPSK, including QPSK. In differential MPSK, informa- carrier frequency. Hence the Q value of the filter must
tion bits are first differentially encoded. Then the encoded be extremely high such that it cannot be implemented
bits are used to modulate the carrier to produce the dif- by the current technology. In OQPSK, since the 180◦
ferentially encoded MPSK (DEMPSK) signal stream. In phase shifts no longer exist, the sidelobe restoration is
a DEMPSK signal stream, information is carried by the less severe.
phase difference θi between two consecutive symbols.
There are M different values of θi ; each represents an 4.2. π/4-DQPSK
n-tuple (n = log2 M) of information bits. Although OQPSK can reduce spectral restoration caused
In light of the modern digital technology, DEMPSK by a nonlinearity in the power amplifier, it cannot be
signals can be generated by digital frequency synthesis differentially encoded and decoded. π/4-DQPSK is a
technique. A phase change from one symbol to the next is scheme that not only has no 180◦ phase shifts like
simply controlled by the n-tuple that is represented by the OQPSK but also can be differentially demodulated.
phase change. This technique is particularly suitable for These properties make it particularly suitable for mobile
large values of M. communications where differential demodulation can
Demodulation of DEMPSK signal is similar to that of reduce the adversary effects of the fading channel. π/4-
differential BPSK [3]. The symbol error probability of the DQPSK has been adopted as the standard for the digital
differentially coherent demodulator is approximated by cellular telephone system in the United States and Japan.
( π/4-DQPSK is a form of differential QPSK. The
2E π correspondence between data bits and symbol phase
Ps ≈ 2Q sin √ (optimum DMPSK) (26)
difference are shown as follows.
N0 2M
sin 2pfc t the continuous, time-varying phase (t, a), which usually
is influenced by more than one symbol. The transmitted
M-ary symbol sequence a = {ak } is embedded in the excess
phase
∞
(t, a) = 2π h ak q(t − kT) (29)
k=−∞
with
t
q(t) = g(τ ) dτ (30)
−∞
cos 2pfc t
where g(t) is a selected pulseshape. The M-ary data ak may
take any of the M values: ±1, ±3, . . . , ±(M − 1), where M
usually is a power of 2. The phase is proportional to
the parameter h which is called the modulation index.
Phase function q(t), together with modulation index h and
input symbols ak , determine how the phase changes with
time. The derivative of q(t) is function g(t), which is the
frequency shape pulse. The function g(t) usually has a
smooth pulseshape over a finite time interval 0 ≤ t ≤ LT,
Figure 11. π/4-DQPSK signal constellation. (From Ref. 3, copy- and is zero outside. When L = 1, we have a full-response
right 2000 Artech House.) pulseshape since the entire pulse is in a symbol time T.
When L > 1, we have a partial-response pulseshape since
only part of the pulse is in a symbol time T.
We can see that the phase changes are confined to odd- The modulation index h can be any real number
number multiples of π/4(45◦ ). There are no phase changes in principle. However, for development of practical
of 90◦ or 180◦ . In addition, information is carried by the maximum-likelihood CPM detectors, h should be chosen
phase changes θk , not the absolute phase k . The signal as a rational number.
constellation is shown in Fig. 11. The angle of a vector Popular frequency shape pulses are the rectangular
(or symbol) with respect to the positive direction of the pulse, the raised-cosine pulse, and the Gaussian pulse.
horizontal axis is the symbol phase k . The symbols The well-known Gaussian minimum shift keying (GMSK)
represented by • can become symbols represented only is a CPM scheme that uses the Gaussian frequency pulse:
by ×, and vice versa. Transitions among themselves are
! ' ) ' )"
not possible. The phase change from one symbol to the 1 t− T t+ T
other is θk . g(t) = Q 2π Bb √ 2 − Q 2π Bb √ 2 ,
2T ln 2 ln 2
Since information is carried by the phase changes
θk , differentially coherent demodulation can be used. 0 ≤ Bb T ≤ 1 (31)
However, coherent demodulation is desirable when higher
power efficiency is required. There are four ways to where Bb is the 3-dB bandwidth of the premodulation
demodulate a π/4-DQPSK signal: baseband differential Gaussian filter, which implements the Gaussian pulse
detection, IF band differential detection, FM-discriminator effect [9]. GMSK is currently used in the U.S. cellular
detection, and coherent detection. The error probability digital packet data system and the European GSM system.
of the π/4-DQPSK in the AWGN channel is about 2 dB If the modulation index h is made to change cyclically,
inferior to that of coherent QPSK at high SNRs [3,8]. then we obtain multi-h CPM (or MHPM); the phase is
digital signal processing power of the electronic devices, 10. C-E. Sundberg, Continuous phase modulation: A class of
the practical implementation and use of CPM and MHPM jointly power and bandwidth efficient digital modulation
is quickly emerging. schemes with constant amplitude, IEEE Commun. Mag.
Significant contributions to CPM schemes, including 24(4): 25–38 (April 1986).
signal design, spectral and error performance analysis 11. T. Aulin and C-E. Sundberg, Continuous phase modula-
were made by Sundberg, Aulin, Svensson, and Anderson, tion — part I: Full response signaling, IEEE Trans. Commun.
among other authors [10–13]. Excellent treatment of CPM 29(3): 196–206 (March 1981).
up to 1986 can be found in the book by Anderson, et al. [13] 12. T. Aulin, N. Rydbeck, and C-E. Sundberg, Continuous phase
or the article by Sundberg [10]. A relatively concise, but modulation — Part II: Partial response signaling, IEEE
up-to-date, description of CPM and MHPM can be found Trans. Commun. 29(3): 210–225 (March 1981).
in Ref. 3. 13. J. B. Anderson, T. Aulin, and C-E. Sundberg, Digital Phase
Modulation, Plenum, New York, 1986.
BIOGRAPHY
Fuqin Xiong received his B.S. and M.S. degrees in elec- DISTRIBUTED INTELLIGENT NETWORKS
tronics engineering from Tsinghua University, Beijing,
China, and a Ph.D. degree in electrical engineering from IAKOVOS S. VENIERIS
University of Manitoba, Winnipeg, Manitoba, Canada, MENELAOS K. PERDIKEAS
in 1970, 1982, and 1989, respectively. He was a fac- National Technical University
ulty member at the Department of Radio and Electronics of Athens
in Tsinghua University from 1970 to 1984. He joined Athens, Greece
the Department of Electrical and Computer Engineering,
Cleveland State University, Ohio, as an assistant profes- 1. DEFINITION AND CHARACTERISTICS OF THE
sor in 1990, where he is currently a full professor and DISTRIBUTED INTELLIGENT NETWORK
director of the Digital Communications. Research Labora-
tory. He was a visiting scholar at University of Manitoba The distributed intelligent network represents the next
in 1984–1985, visiting fellow at City University of Hong stage of the evolution of the intelligent network (IN)
Kong, Kowloon, in spring 1997, and visiting professor concept. The term corresponds to no specific technology
at Tsinghua University in summer 1997. His areas of or implementation but rather is used to refer, collectively,
research are modulation and error control coding. He is the to a family of architectural approaches for the provisioning
author of the best selling book Digital Modulation Tech- of telecommunication services that are characterized by:
niques, published by Artech House, 2000. He is an active use of distributed, object oriented and, potentially, mobile
contributor of technical papers to various IEEE and IEE code technologies, and a more open platform for service
technical journals and IEEE conferences. He has been a provisioning.
reviewer for IEEE and IEE technical journals for years. He The intelligent network [1] first emerged around 1980
has been the principal investigator for several NASA spon- when value-added services that had previously been
sored research projects. He is a senior member of IEEE. offered only on a private branch exchange basis involving
mainly corporate users, first begun to be made available
on the public network as well. The problem was that
BIBLIOGRAPHY the only way that new services could be introduced
using the infrastructure of the time required upgrading
1. J. G. Proakis, Digital Communications, 2nd ed., McGraw- a large number of deployed telephone switches. In the
Hill, New York, 1989.
first available programmable switches, services resided
2. S. Haykin, Digital Communications, Wiley, New York, 1988. in the memory space of each switch and service view
3. F. Xiong, Digital Modulation Techniques, Artech House, was therefore local. Thus, each installed service provided
Boston, 2000. a number of additional features on the switch it was
4. J. H. Park, Jr., On binary DPSK detection, IEEE Trans. installed and so could be provided only for those telephone
Commun. 26(4): 484–486 (April 1978). calls that were routed through that particular switch.
A certain switch could have a number of services
5. B. Sklar, Digital Communications, Fundamentals and Appli-
implemented in it, while others could have different
cations, Prentice-Hall, Englewood Cliffs, NJ, 1988.
sets or none. Also, there was no guarantee that the
6. S. Benedetto, E. Biglieri, and V. Castellani, Digital Trans- implementation of a service would behave in the same way,
mission Theory, Prentice-Hall, Englewood Cliffs, NJ, 1987.
uniformly across switches coming from different vendors.
7. K. M. Simon, S. M. Hinedi, and W. C. Lindsey, Digital Com- This was the era of the ‘‘switch-based services.’’
munication Techniques, Signal Design and Detection, New services therefore could not be provided nation-
Prentice-Hall, Englewood Cliffs, NJ, 1995. wide simultaneously and often even when all involved
8. C. L. Liu and K. Feher, π/4-QPSK Modems for satellite switches had been upgraded, different service dialects
sound/data broadcast systems, IEEE Trans. Broadcast. existed as a result of the heterogeneity of the equipment
(March 1991). and the unavoidable discrepancies between the implemen-
9. G. Stüber, Principle of Mobile Communication, Kluwer, tation of the same service in switches provided by different
Boston, 1996. vendors. These problems were further aggravated by the
720 DISTRIBUTED INTELLIGENT NETWORKS
need to guard against undesirable feature interaction first of these three elements (the state machine abstracting
problems: a process that even now cannot be automated call and connection processing operations in the switch)
or tackled with, in a systematic, algorithmic manner and was not very formalized as switch-based services were
usually involves manually checking through hundreds or using whatever nonstandardized programming handles a
even thousands of possible service feature combinations. vendor’s switch was exposing. This was, after all, why
Since this process had to also account for the different ser- for a new service to be introduced, direct tampering
vice dialects, each service that was added to the network with all affected switches was necessary, and in fact
increased the complexity in a combinatorial manner. the implementation of the same service needed to be
The result was that service introduction soon became different in different switches. Also this accounted for
so costly that new services could be provided only the fact that, considering the heterogeneous nature of the
infrequently. The intelligent network idea sought to switches comprising a certain operator’s network, different
alleviate these problems. The key concept was to separate service dialects were present, resulting in a nonuniform
the resources (public telephony switches) from the logic provision of a given service depending on the switch to
that managed them. Once this was achieved, the logic which a subscriber was connected. Intelligent networks
could be installed in a central location called ‘‘service changed all that by defining a common abstraction for
control point’’ and from there, operate on the switches all switches and by centralizing service logic to a few
under its control. Service introduction would consist (often one) easily administrated and managed servers.
simply in the installation of a new software component The abstraction of the switches was that of a state
in the service control point and would therefore take machine offering hooks for interest on certain call events
effect immediately once this was completed and once to be registered, and supporting a protocol that allowed
involved switches were configured to recognize the new remote communication between the switches and the now
service prefixes. Once a switch identified an ‘‘intelligent centrally located service logic programs. It will be shown
network call,’’ that is, a call prefixed with a number in the following paragraphs that this powerful and well-
corresponding to an IN service (e.g., 0800 numbers), engineered abstraction is also vital in the distributed
it would suspend call processing and request further intelligent network.
instructions on how to proceed from the remotely located
service or service logic program. No other logic therefore 2. THE NEED FOR DISTRIBUTED INTELLIGENT
needed to be installed in the switches apart from a generic NETWORKS
facility for recognizing such IN calls and participating in
the interaction with the remotely located service logic The IN concept represents the most important evolu-
programs. This interaction consisted of notifying the tion in telecommunications since the introduction of pro-
service logic programs of intelligent network calls or grammable switches that replaced the old electromechanic
other important call events, receiving instructions from equipment. It allows for the introduction of new services,
them, and putting these instructions into effect by issuing quickly, instantly, across large geographic areas, and at
the appropriate signaling messages toward the terminals the cost of what is essentially a software development pro-
that requested the IN service or towards other resources cess as contrasted to the cost of directly integrating a new
involved in the provisioning of the service. Typical service in the switching matrix of each involved telephony
resources of the latter kind were intelligent peripherals center.
and specialized resource functions where recorded service However, after 1998, a number of technical and socioe-
messages were and still are typically kept. conomic developments have opened up new prospects and
In order to implement a mechanism such as the business opportunities and also have posed new demands,
one called for by the IN conception, three key artifacts which traditional IN architectures seem able to accom-
are needed: (1) a finite state machine implementing an modate only poorly: (1) use of mobile telephony became
abstraction of the call resources of a public switch, (2) a widespread, particularly in Europe and Japan; (2) the
remote centralized server where programs providing the Internet came of age both in terms of the penetration it
algorithmic logic of a service are executed, and (3) a achieved and is projected to achieve, and also in terms of
protocol for remote interaction between the switch and the the business uses it is put to; and (3) deregulation seems
service logic programs. Through this protocol (1) a limited to be the inevitable process globally setting network oper-
aperture of visibility of the switch functionality — in terms ators and carriers in fierce competition against each other.
of the said finite state machine — is presented to the The import of the first two of these developments is that
service logic programs; and (2) hooks are provided allowing the public switched telephony network is no longer the
the latter to influence call processing in order to effect the only network used for voice communications. Cellular tele-
desired behavior of any given service. phony networks and also telephony over the Internet are
As noted above, prior to introduction of the intelligent offering essentially the same services. Therefore, an archi-
network concept the last two of these three elements were tecture for service creation such as the IN that is entirely
not existent as service logic was embedded into every focused on the public telephony network falls short of pro-
switch in order for a service to be provided. There was viding the universal platform for service provisioning that
therefore no remote centralized server responsible for one would ideally have wished: a single platform offering
service execution and, hence, due to the locality of the the same services over the wired, wireless and Internet
interaction, no protocol needed to support the dialogue components of a global integrated network for voice and
between the switch and the service logic program. Even the data services. Deregulation, on the other hand, has two
DISTRIBUTED INTELLIGENT NETWORKS 721
primary effects; first, as noted above, it promotes com- technologies such as the Common Object Request Broker
petition between operators for market share making the Architecture (CORBA), Java’s Remote Method Invocation
introduction of new and appealing services a necessity (RMI) or Microsoft’s Distributed Component Object Model
for a carrier that wishes to stay in business. Since the (DCOM) to support the switch — services interaction.
basic service offered by any carrier is very much the same, In a traditional IN implementation, the Intelligent
a large amount of differentiation can be provided in the Network Application Protocol (INAP) information flows
form of value-added or intelligent services that are appeal- are conveyed by means of static, message-based, peer-
ing and useful to end users. The new telecommunications to-peer protocols executed at each functional entity. The
landscape is no longer homogenous but instead encom- static nature of the functional entities and of the protocols
passes a variety of networks with different technologies they employ means that in turn the associations between
and characteristics. The prospect for innovative services them are topologically fixed. An IN architecture as defined
that focus not only on the telephone network but also Inter- in Ref. 2 is inherently centralized with a small set of
net and mobile networks is immense. Particularly useful service control points and a larger set of service switching
would be hybrid services that span network boundaries points engaged with it in INAP dialogs. The service
and involve heterogeneous media and access paradigms. control points are usually the bottleneck of the entire
Typical services belonging to this genre are ‘‘click to’’ ser- architecture and their processing capacity and uptime
vices whereby a user can point to a hyperlink in his or hers in large extent determine the number of IN calls the
browsers and as a result have a phone or fax (Facsimile) entire architecture can handle effectively. Distributed
call being set up in the network. Also, media conversion object technologies can help alleviate that problem by
services whereby a user who is not able to receive a certain, making associations between functional entities less rigid.
urgent email can have a phone call in his mobile terminal This is a by-product of the location transparencies that
and listen to the contents of the mail using text to speech use of these technologies introduces in any context. More
conversion. All these services cannot be offered by relying importantly, the fact that under these technologies the
solely on an intelligent network. Other technologies and physical location of an entity’s communicating peer is not
platforms would have to be combined and this would not manifested makes service provisioning much more open.
result in a structured approach to service creation. Essen- This will be explained in more detail in later paragraphs.
tially, traditional intelligent networks cannot fulfill the The remainder of this encyclopedia article is structured
new demands on rapid service creation and deployment in as follows: the distributed IN’s conceptual model is
a converged Internet — telecommunications environment introduced and juxtaposed with that of the traditional
because the whole concept of this technology had at its IN, typical distributed IN implementation issues are
focus the public telephony network and revolved around presented, and finally some emerging architectures that
its protocols, mechanisms, and business models. Inter- adhere to the same distributed IN principles are identified.
net and cellular telephony, on the other hand, have their
own protocols, infrastructure, and mechanisms, and these 3. THE DISTRIBUTED INTELLIGENT FUNCTIONAL
cannot be seamlessly incorporated into an architecture MODEL
designed and optimized with a different network in mind.
Furthermore, the business model envisaged by the tradi- According to the International Telecommunication Unions
tional intelligent network is a closed one. It has to be said (ITU) standardization of IN, an intelligent network
that the focus of the IN standardization was not to pro- conceptual model is defined, layered in four planes
pose and enable new business models for telephony: only depicting different views of the intelligent network
to expedite the cumbersome and costly process of manual architecture from the physical plane up to the service
switch upgrading that made introduction of new services plane. Of these planes we will use as a means of
uneconomical to the point of impeding the further growth comparing the IN architecture with that of distributed
of the industry. Intelligent network succeeded in solving IN, the distributed functional plane. The distributed
this problem by removing the service intelligence from the functional plane identifies the main functional entities
switches and locating that intelligence in a few centralized that participate in the provision of any given IN service
points inside the network. However, the same organization without regard to their location or mapping to physical
continued to assume the role of the network operator or elements of the network (the aggregation of functional
carrier and that of the service provider. Deregulation and entities to physical components is reflected in the physical
the growth of Internet that adheres to a completely dif- plane). The functional models of IN and distributed IN
ferent business model necessitate the separation of these are quite different in certain respects. First, a number of
two roles and so require a technological basis that would emerging distributed IN architectures like Parlay and
support this new model. This is the problem that the next JAIN (discussed later on) incorporate new functional
generation of the intelligent network technologies face. entities to account for the new types of resources that can
The distributed intelligent network represents the next be found in a converged Internet — Telecommunications
stage of the intelligent network evolution. The three main environment and which traditional IN models could not
elements of the traditional intelligent network concept anticipate. Apart from that, a further difference emerges
survive this evolution as could be inferred by the central in the form of the more open environment for service
role they have been shown to play in delivering services provisioning, the distributed or remote-object interactions
to the end users in the traditional IN context. Distributed that characterize distributed IN and the potential use
IN is characterized by the use of distributed object of mobile code technologies that can change the locality
722 DISTRIBUTED INTELLIGENT NETWORKS
of important components at runtime (and with this, IN concept is based. This functional entity is closely linked
the end points of control flows). Finally, the business with the service control function, which is the place where
model of distributed IN with its use of distributed object service logic programs are executed. These are not depicted
technologies explicitly enables and accommodates the since they are thought of as incorporated with the latter.
separation between the roles of the network operator and Finally, the service creation environment function and the
the service provider. service management function are the entities responsible
Some of the abovementioned differences cannot be for the creation of new service logic programs, and their
reflected in the distributed functional plane as defined in subsequent injection and monitoring into the architecture
the IN’s conceptual model as this is at a level of abstraction and eventual withdrawal/superseding by other services or
where such differences are hidden. Therefore, Fig. 1 more up to date versions.
compares the two technologies (traditional and distributed The INAP protocol is executed between the service
IN) by reflecting features of both their distributed and switching and the service control functions that can be
physical planes. regarded as implementing a resource-listener-controller
Figure 1a presents a limited view of the traditional pattern. The service switching function is the resource
IN conceptual model at the distributed and physical that is monitored whereas the service control function
planes. A number of functional entities are there is responsible for setting triggers corresponding to call
identified, of which the more central to this discussion events for which it registers an interest and also the
will be the service creation environment function/service controller that acts on these events when informed of
management function, the service control function, and their occurrence. A typical event for instance would be
the service switching function. Figure 1 also depicts a the detection of a digit pattern that should invoke an
specialized resource function and an intelligent peripheral, IN service (e.g., 0800). The service switching function
which is were announcements are kept and digits undertakes the job of watching for these events and
corresponding to user’s input are collected. The second of suspending call processing when one is detected and
part of Fig. 1 also depicts an abstract entity entitled delivering an event notification to the service control
‘‘Internet/mobile resources,’’ which represents functional function for further instructions. This pattern is the same
entities present in distributed IN that correspond to also in the distributed IN and can in fact also be identified
resource types unique in the Internet or mobile networks in recent telecommunication architectures such as those
components such as mail servers, media conversion articulated by the JAIN and Parlay groups.
servers or location servers. The distributed IN conceptual model (Fig. 1b) differs
Starting from the bottom up, the service switching by (1) explicitly enabling mobility of the service logic
function corresponds to the finite state machine reflecting programs through the use of mobile code technologies
the switch’s call resources that the article drew attention and (2) replacing the message-based INAP protocol with
to as being one of the main artifacts on which the whole distributed processing technologies like CORBA, RMI
(a) (b)
Network operator / Service provider
service provider
Mobile
code
Service control point
Service control point Execution
Service control function
environment
Service creation
Service control function
environment function
API
Service
AP
P
management
INA
IN
I
AP
function
Internet or mobile resources
Network operator
Service switching Mobile
function Service switching code
Specialized function Service
Proprietary Proprietary interface management
resource
interface function Specialized
function
Call control Call control resource
function Intelligent function function
Service switching Signalling peripheral Service switching Intelligent
point point Signalling peripheral
or DCOM. It should be noted of course that both the model was materialized only to a limited extent and mainly
traditional and the distributed IN conceptual models are had to do with different configuration and aggregations
populated by an additional number of functional entities of functional entities to the hardware nodes of the
which are not depicted in Fig. 1 nor enter this discussion system. The interested reader should refer to Ref. 2
for reasons of economy. Code mobility (often implemented for a presentation of the various alternatives that are
using Java and/or some additional mobile code libraries) possible in an IN architecture. This leeway afforded to the
is used to allow service logic programs to reside not only network designer is something completely different from
in the service control function but also in the service the ability of service logic programs to roam through the
switching function where an execution environment various execution environments of the system at runtime
identical with the one that exists within the service without requiring the system to suspend its operation
control function can be found. Service logic programs in the and, under certain configuration options discussed in the
form of mobile code components populate both execution following paragraph, in a manner completely automated
environments and constitute the control part of IN service and transparent even to the network management system.
provisioning. From these execution environments, using
distributed processing technologies, the switch resources, 4. ISSUES IN DISTRIBUTED IN IMPLEMENTATIONS
as exposed by the service switching function objects, are
monitored and manipulated. This means that service logic Given the considerations outlined above and the differ-
programs in the distributed IN conceptual model are ences at the functional plane between traditional and
prime level entities and cannot be suitably depicted as distributed IN architectures, a number of approaches
pinned down inside the implementation of the service exist. Each of these approaches essentially answers a
control function. In contrast to the functional plane defining question in a different manner, and since for the
of the IN conceptual model where the service control most part they are orthogonal to each other, a large num-
function and the service logic programs were one and ber of widely differing distributed IN implementations can
the same, in the functional plane of the distributed be envisaged; all, however, share the same fundamental
IN conceptual model, the service control and also the characteristics of increased flexibility and openness when
service switching functions are containers (or execution compared with traditional IN.
environments) for service components capable of migrating The first point to examine is the set of considerations
to whichever environment is best suited to accommodate that govern the location of service logic programs inside the
their execution. It is important to consider that this network. Attendant to it are the dynamics of their mobility.
amount of flexibility would not be attainable were it This is possible once employment of mobile code tech-
not for the use of distributed processing technologies at nologies is assumed. Service logic programs implemented
the control plane. The defining characteristic of these as mobile components can migrate between execution
technologies is that they abstract process, machine and environments dynamically. Considerations valid for incor-
network boundaries and provide to the programmer and poration in any given distribution algorithm are processing
the runtime instances of service code a view of the network load, signaling load, and functionality. Given a number
as a generalized address space spanning conventional of execution environments present in the network’s ser-
boundaries. In the same manner in which a program vice switching and service control functions, each with
may hold a pointer to an object or function residing its own characteristics, simple or elaborate load balanc-
in the same local process, so it can, when distributed ing mechanisms can be devised and implemented. These
technologies are used, hold a pointer to an object located would allow service logic programs to locate themselves in
in a different process in the same machine, in a machine such a way so that no individual execution environment’s
in the same local network, or in a machine located in a resources are strained beyond a certain point. Apparent
different network. Therefore the abstraction of a single tradeoffs exist with respect to complexity, time, and com-
memory space is supported and compiled code components puting power spent in process or code migrations, which
can be mapped in an arbitrary way in processes and are in themselves costly procedures. This cost is reflected
physical nodes without needing recompilation or rebooting in terms of both time and processor load they temporar-
of the system. This is a potent facility and it opens new ily create as they involve suspension or creation of new
capabilities which the distributed IN employs. It also has threads, allocation of objects in memory, and so on. These
to be stated that the articulation of the traditional IN overheads, when also viewed in the light of the stringent
conceptual model, to an extent, anticipated a certain performance and responsiveness demands that exist in a
flexibility in the allocation of functional entities to telecommunication system, should tilt the balance in favor
physical ones according to the correspondences between of a simple and not very reactive load balancing algorithm
the functional and the physical planes. This was however that operates on ample time scales.
not materialized since the communication infrastructure Processing load is not however the only kind of
used for the conveyance of the INAP information flows load to consider. Control load is another. Control load
between the various entities of the system (and most should not be confused with signaling load, which
importantly between the service switching and the service concerns protocol message exchanges between terminals
control entities) was the signaling system 7 network, and switches at various levels of the classic Five-layered
which does not demonstrate the properties of a distributed telephony network architecture. Signaling load is for the
processing environment. Therefore the full potential most part transparent to the IN with the exception of
inherent in the abstract definition of the IN conceptual those call events that invoke an IN service and for
724 DISTRIBUTED INTELLIGENT NETWORKS
which a service switching function has been asked to data, runs an optimization algorithm and instructs a
suspend call processing and wait for instructions from the number of service logic programs to change their location
service control function. Control load in this discussion in the network according to the results thus produced.
is about the exchange of INAP information flows using The merit of this approach is that the optimization is
the selected distributed processing technology between networkwide and the distribution algorithm can take
the service logic programs and the switching resources into account a full snapshot of the network’s condition
of the system. The location transparency provided by at any given instance of time when it is invoked.
distributed technologies cannot clearly be interpreted to The disadvantage is the single point of failure and
suggest that local or remote interactions take the same the cost of polling for the necessary data and issuing
time. Therefore, when service logic programs are located in the necessary instructions. These communications could
an execution environment closer to the switch they control, negatively affect traffic on the control network, depending,
better performance can be expected. This means that, in of course, on the frequency with which they are executed.
general, the code distribution mechanism, should locate A point to consider in this respect that can lead to
service logic programs at the proximity of the resources more efficient implementations is whether networkwide
with which they are engaged if not at an execution optimization is necessary or whether locally executed
environment collocated with those resources. There is, optimizations could serve the same purpose with similar
however, a tradeoff here between processing and control results and while projecting a much lower burden on the
load. Because of the higher level of abstraction (method communication network. Indeed, it can be shown that it
calls instead of messages) offered to service logic programs is highly unlikely for the purposes of any optimization
when distributed processing technologies are used, it is to be necessary to instruct any given service logic
necessary that, under the hood, the respective middleware program to migrate to an execution environment that
that is used to support this abstraction enters into some is very remote to the one in which it was until that
heavy processing. For every method call that is issued, time executed. To see that this is the case, one can
it is necessary that each argument’s transitive closure consider that a migration operation moving a service
is calculated and then the whole structure is serialized component to a very distant location will most likely
into an array of bytes for subsequent transmission in the result in an unacceptably higher propagation delay that
form of packets using the more primitive mechanisms would degrade the responsiveness of the corresponding
offered by the network layer. On the recipient side, the resource–service logic link. Therefore locally carried
reverse procedure should take place. This set of processes optimizations could result in comparable performance
is known as marshaling/demarshaling and is known benefits at a greatly reduced communication cost when
to be one of the most costly operations of distributed compared to a networkwide optimization. Therefore,
processing. Because of the processor intensive character in each subnetwork an entity (migration manager)
of marshaling/demarshaling operations, it is possible can be responsible for performing local optimizations
that, under certain configurations, better performance is and instructing service components to assume different
attained when the service logic program is located to a configurations accordingly. Taking this notion to its logical
remote execution environment than to a local, strained, conclusion, a further implementation option would be
one. This in spite of the fact that the time necessary for to have service logic programs as autonomous entities
the propagation of the serialized byte arrays in the form (mobile agents) that are responsible for managing
of packets from the invoking to the invoked party will their own lifecycle and proactively migrating to where
be higher in the remote case. This interrelation between they evaluate their optimal location to be. As before,
processing and control load means that no clear set of considerations for deriving such an optimal location can
rules can be used to produce an algorithm that is efficient be signaling or control load experienced at the execution
in all cases. The complex nature of the operations that environment where they were hosted but also, more
take place in the higher software layers means that this appropriately in this case, the location of mobile users.
is not an optimization problem amenable to be expressed Indeed, in this last scenario one can have a large
and solved analytically or even by means of simulation, population of service code components, each instantiated
and therefore an implementor should opt for simple, to serve a specific mobile (roaming) user. See Ref. 3 for
heuristic designs, perhaps also using feedback or historic a discussion of this approach. Service logic programs
data. Another approach would be to rely on clusters of could then evaluate their optimal location in the network,
application servers into which service logic programs can also taking into consideration the physical location of the
be executed. Commercially available application servers user they serve. This can, for instance, lead to migration
enable process migration and can implement a fair amount operations triggered by the roaming of a mobile user.
of load balancing operations themselves, transparently to In this manner concepts like that of the virtual home
the programmer or the runtime instances of the service environment for both terminal and personal mobility can
components. be readily supported enabling one user to have access
The second point that can lead to differentiations in to the same portfolio of intelligent services irregardless
architecture design in distributed intelligent network of the network into which he/she is roaming and subject
is the question of who makes the abovementioned only to limitations posed by the presentation capabilities
optimization and distribution decisions. There could be a of the terminal devices he/she is using. Of course, the
central entity that periodically polls the various execution usefulness of autonomous service logic programs is not
environments receiving historic processing and signal load limited to the case of roaming or personal mobility
DISTRIBUTED INTELLIGENT NETWORKS 725
users, but this example provides an illustration of the interested reader is referred to Ref. 5 for an in-depth look
amount of flexibility that becomes feasible when advanced of the implementation of a distributed IN architecture
software technologies such as distributed processing using CORBA.
environments and mobile code are used in the context of an
intelligent network architecture. Moreover, as more logic
5. A DISTRIBUTED IN ARCHITECTURE IN DETAIL
is implemented in the form of mobile code components,
the supporting infrastructure can become much more Figure 2 depicts a typical distributed IN implementation
generic, requiring fewer modifications and configuration in more detail. The main components are (1) the network
changes and able to support different service paradigms. resources that provide the actual call control and
The generic character of the architecture means that media processing capabilities, (2) the remotely enabled
stationary code components that require human individual distributed objects that wrap the functionality provided by
or management intervention for their installation or the physical resources and expose it as a set of interfaces
modification are less likely to require tampering with, to the service layer, (3) the code components residing
and that a greater part of the service logic can be in the service layer that incorporate the business logic,
deployed from a remote management station dynamically, and (4) the communication infrastructure that makes
at runtime, contributing to the robustness of the system communication between the service logic programs and the
and to an increased uptime. Naturally, the tradeoff is that network resources, via their wrappers, possible. Of course,
making service logic programs more intelligent necessarily in an actual architecture a lot more components would
means increasing their size, straining both the memory need to be identified, such as service management stations,
resources of the execution environments into which they and service creation systems. However, the purpose of
are executed and also requiring more time for their the discussion presented here is to allow more insight,
marshaling and demarshaling when they migrate from one where appropriate, into the more technical aspects of
functional entity to another. If their size exceeds a certain a DIN implementation and not to provide a full and
threshold, migration operations may become too costly comprehensive description of a real system.
and thus rarely triggered by the load balancing algorithm Notice, first, that terminals are not depicted in Fig. 2.
they implement, negating the advantages brought by This is because interaction with terminals at the signaling
their increased autonomy and making services more level is exactly the same as in the case of traditional IN.
stationary and less responsive in changing network or Signaling emitted by the terminals is used to trigger an IN
usage conditions. Again, simpler solutions may be more session and signaling toward terminals or media resources
appropriate and a designer should exercise caution in of an IN system (like Intelligent peripherals or specialized
determining which logic will be implemented in the resource functions) is used to carry out the execution of
form of stationary code components, engraved in the a service. However, service components do not directly
infrastructure, and which will be deployed at runtime perceive signaling as they interact with their peer entities
using mobile code. on the network side by means of message- or method-
Another major issue to consider in the implementation based protocols. INAP is a typical method based protocol
of a distributed intelligent network architecture is which used at the control plane of IN and Parlay, JAIN, or a
distributed processing technology to use and how to version of INAP based on remote application programming
support its interworking with the signaling system 7 interfaces (APIs) are prime candidates for the control
network that interconnects the physical components plane of distributed IN. In any case it is the responsibility
hosting the functional entities in any intelligent network of the network resources to examine signaling messages
architecture. There are three main competing distributed and initiate an IN session where appropriate, or, in the
technologies: CORBA, DCOM, and RMI. It is not the opposite direction, to receive the instructions sent to them
purpose of this article to examine or compare these three by the service logic programs and issue the appropriate
technologies, nor to identify their major strengths and signaling messages to bring these instructions into effect.
weaknesses. However, for the purposes of an intelligent Therefore, since signaling is conceptually located ‘below’
network implementation, CORBA is best suited because the architecture presented in Fig. 2, it can safely be
it is the only one of these three technologies to have omitted in the discussion that follows. The reader who
interworking between it and the signaling system 7 wishes to gain a fuller appreciation of the temporal
network prescribed [4]. The alternative would be to bypass relationships between the signaling and INAP messages
the native IN control network and install a private internet that are issued in the course of the provisioning of an IN
over which the traffic from DCOM or RMI could be session can refer to Ref. 5.
conveyed. Since this is not a development that operators Assuming a bottom–up approach, the first step would
can be expected to implement quickly and for reasons be to explore the programmability characteristics of the
of providing a solution that has the benefit of backward deployed equipment that forms the basis of a distributed
compatibility and also allows an evolutionary roadmap IN architecture. We refer to deployed equipment as
to be defined, CORBA should be the technology of choice distributed IN, like IN before it, which has to be able to
in distributed IN implementations. A further reason is encompass and rely on equipment that is already deployed
that CORBA, which is targeted more than the other in the field if it is to succeed. Telecom operators have
two technologies for the telecommunications domain, has made huge investments in building their networks, and
more advanced real time characteristics and is therefore it would be uneconomical to replace all this equipment.
better equipped to meet the demands of an operator. The At the very least, an evolutionary approach should
726 DISTRIBUTED INTELLIGENT NETWORKS
Service layer
Service Application
Service
logic servers
logic
program Migration program
manager
Execution
Execution Execution environment
Execution
environment environment environment
Execution
environment
Mobile objects
over CORBA
CORBA CORBA
CORBA / IN interworking
Private IP network SS7
Service
logic Mail
program Switch Media
server
interface interface
interface
CORBA CORBA CORBA
Execution server server server
environment
be possible to implement where equipment would be nevertheless, that Parlay, which is one of a group of
replaced gradually over a period of years and the two emerging technologies that can fall under the ‘‘distributed’’
technologies, IN and distributed IN, would coexist. It IN heading, also allows for an approach of using existing
would be further advantageous if existing equipment IN interfaces as an interim solution before it can be
could be seamlessly integrated into the new distributed provided natively in switches as IN does). Below this
intelligent network infrastructure. To address this issue level one, can find only proprietary APIs at different levels
we note that in general, network switches are not meant of abstraction.
to be overly programmable. Certain proprietary APIs are It is therefore possible that a certain operation that a
always offered that provide an amount of control over how developer would wish to expose to the service components
the switch responds to an incoming call, but there are few cannot be implemented because the programmability
specifications defining standard interfaces that network characteristics of the switch would not allow the
switches should offer, and in any case there is much programming of this behavior. Assuming however that
discrepancy between the equipment provided by different the semantic gap between the operations that a network
vendors. Of course, the IN itself could be regarded as a resource needs to make available for remote invocation
standard interface for network switches, but the purpose and the programming facilities that are available for
here is not to build the new distributed IN infrastructure implementing these operations, can be bridged, generally
over the already existing IN one. Rather, it is to build accepted software engineering principles, object-oriented
it on top of the same lower-level facilities and services or otherwise could serve to provide the substrate of
that IN is using as part of its own implementation: the distributed IN [6]. This substrate consists of set of
and such lower-level facilities are accessible only to remotely enable objects (CORBA objects ‘‘switch interface,’’
equipment vendors. An external integrator exploring the ‘‘media interface,’’ and ‘‘mail server interface’’ depicted in
programmability features of a switch will be able to Fig. 2) that expose the facilities of the network resources
find standard interfaces only at the IN level, which, as they represent. Once this critical implementation phase
we noted, is conceptually very elevated to be useful as has been carried out, subsequent implementation is
the infrastructure of a distributed IN architecture (note, entirely at the service layer. The ‘‘switch interface,’’ ‘‘media
DISTRIBUTED INTELLIGENT NETWORKS 727
interface,’’ and ‘‘mail server interface’’ objects are each interfaces from the original protocol specification and to
responsible for exposing a remotely accessible facet of express the protocol semantics in terms of method calls
the functionality of the network resource they represent. and argument passing instead of asynchronous exchange
Their implementation has then to mediate the method of messages and packets with their payloads described
invocations it receives remotely, to the local proprietary in Abstract Syntax Notation 1. In that sense, API-based
API that each network resource natively provides. In the versions of INAP can be used to implement the control
opposite direction, one has events that are detected by plane of the distributed intelligent network, and so can,
the network resources and have, ultimately, to reach for that matter, emerging technologies such as Parlay and
the components residing in the service layer. This is JAIN. The next section discusses such approaches.
accomplished by having the CORBA objects register
themselves as listeners for these events. The process 6. EMERGING ARCHITECTURES
of registering an external object as a listener involves
identifying the events to which it is interested and A number of architectures and most notably JAIN and
providing a callback function or remote pointer that will Parlay have emerged that, although not classified under
serve to convey the notification when an event satisfying the caption of ‘‘distributed intelligent network,’’ have nev-
the criteria is encountered. From that point on, the ertheless many similarities with it. Distributed intelligent
implementation of the CORBA object will itself convey the networks make telecommunications service provisioning
notification to the higher software layers. It is interesting more open by resting on distributed object technologies
to note that at this second stage the same pattern of and utilizing software technologies such as mobile code,
listener and controller objects is also observed, with the which make the resulting implementation more flexible
exception that now the listener objects are the service and responsive to varying network conditions. Parlay and
logic programs or, in general, the service control logic in JAIN move a step further toward this direction again by
the network and the resource is the CORBA object itself. leveraging on CORBA, DCOM, or RMI to implement a
This observation is depicted at Fig. 3. yet more open service model that allows the execution
Through this wrapping of the native control interface of the services to be undertaken by actors different than
to remote API-based ones two things are achieved: (1) the the operator’s organization, in their own premises closer
middleware objects residing at the CORBA servers in the to the corporate data they control and manipulate [7]. In
middle tier of Fig. 3 can be used to implement standardized particular, Parlay and JAIN use the same approach as in
interfaces. As an example, Fig. 3 indicates INAP or Parlay Ref. 5 of defining methods corresponding more or less to
API. INAP is, of course, a message-based protocol, but actual INAP information flows and of exposing a switch’s
it is relatively straightforward to derive method — based (or, for that matter, any other network resident resources)
Service layer
Network - enterprise
Listener interfaces side boundary
INAP or according to Parlay
Parlay
Controls
API
level
Notifies
CORBA server
Control interfaces
Middleware
Listener interfaces
Controls
Notifies
Proprietary
API
level Telecom
resource
Control interfaces
Call control resources
Figure 3. Parlay or other remote API implemented in middleware form over a native resource.
728 DISTRIBUTED INTELLIGENT NETWORKS
5. F. Chatzipapadopoulos, M. Perdikeas, and I. Venieris, Mobile the same scenario (negative correlation) applies. In a way,
agent and CORBA technologies in the broadband intelligent expert investment firms, claiming to provide a diversified
network, IEEE Commun. Mag. 38(6): 116–124 (June 2000). portfolio to investors, try to do the same. They opt for
6. I. Venieris, F. Zizza, and T. Magedanz, eds., Object Oriented capital investment from among those economy sectors
Software Technologies in Telecommunications, Wiley, 2000. whose mutual fund return values are to some degree
7. M. Perdikeas and I. Venieris, Parlay-based service engineering negatively correlated or at least fluctuate, independently
in a converged Internet-PSTN environment, Comput. Networks from one sector to another. Consequently, over a long
35(6): 565–578 (2001). time period, there will be a net gain associated with the
diversified portfolio.
The principles of diversity combining have been
DIVERSITY IN COMMUNICATIONS known in the radio communication field for decades;
the first experiments were reported in 1927 [1]. In
MOHSEN KAVEHRAD diversity transmission techniques, one usually settles for
Pennsylvania State University fluctuations of signal transmissions over each channel that
University Park, Pennsylvania are more or less uncorrelated with those in other channels
and the simultaneous loss of signal will occur rarely over a
number of such channels. In order to make the probability
1. INTRODUCTION of signal loss as low as possible, an effort is made to find
many channels that are either statistically independent
In designing a reliable communication link, the system or negatively correlated. This may be performed over the
must be planned around the chosen transmission medium dimensions of time, frequency and space in a wireless
referred to as the channel. The disturbances of the medium system. For this purpose, it is occasionally possible to
must be taken into account in the process of encoding the use two different polarizations on the receiving antennas,
signal at the transmitting end, and in the process of or receivers at several different angles of arrival for
extracting the message from the received waveform at the electromagnetic wavefront, or to place antennas in
the receiving end. Once a satisfactory characterization several different spatial locations (spatial diversity) or
of the anticipated channel disturbances has been made, to transmit the signal over several widely separated
the message encoding chosen for transmission must be carrier frequencies or at several widely separated times
designed so that the disturbances will not damage the (time diversity). The term ‘‘diversity improvement’’ or
message beyond recognition at the receiving end. With a ‘‘diversity gain’’ is commonly employed to describe the
corrupted message at hand, the receiving system must be effectiveness of various diversity configurations. There is
prepared to operate continuously in the presence of the no standard definition for the effectiveness of diversity
disturbances and to take maximum advantage of the basic reception techniques. One common definition is based on
differences between the characteristics of messages and of the significance of diversity in reducing the fraction of the
disturbances. time in which the signal drops below an unusable level.
In this article, we assume that the encoded form Thus, one may define an outage rate at some specified
in which the message is to be transmitted has been level, usually with respect to the mean output noise level
selected, and that the encoded form has been translated of the combiner of the diversity channel outputs. In the
into a radiofrequency (RF) or lightwave signal by an rest of this section, the problem of correlation among such
appropriate modulation technique, such as by varying multiport channels is discussed.
some distinguishable parameter of a sinusoidal carrier Assume that a number of terminal pairs are available
in the RF or optical frequency spectrum. Improvements for different output signals yi (t) from one or more input
in system performance can be realized only through signals xj (t). When frequency (or time) diversity is used,
the utilization of appropriate corrective signal processing there is no mathematical distinction between a set
measures. Of primary interest here will be what is widely of multi-terminal-pair channels, each centered on the
known as diversity techniques [1,2] as countermeasures different carriers (or different times) and a single channel
for combating the effects of loss of received signal energy whose system function encompasses all frequencies (or all
in parts or over its entire transmission bandwidth, termed times) in use. In practice, since one may use physically
as signal fading. different receivers, the use of separate system functions
In many practical situations, one seeks economical ways to characterize the outputs from each receiver is useful.
of either transmitting and/or receiving signals in such a If space diversity is used, one may be concerned with the
way that the signal is never completely lost as a result of system function that depends on the spatial position as a
transmission disturbances. This has been traditionally the continuous variable.
case, in particular, in wireless communications. Ideally, The cross-correlation function between the outputs of
one would like to find transmission methods that are two diversity channels when the channels are both excited
negatively correlated in the sense that the loss of signal by the same signal xj (t) is fully determined by a complete
in one channel is offset by the guaranteed presence knowledge of the system functions for each channel alone.
of signal in another channel. This can occur in some In view of the random nature of the channels, the most
diversity systems, such as those that utilize antennas that can be done to provide this knowledge in practice
at different elevations in order to minimize the received is to determine the joint statistical properties of the
signal loss of energy. Also, in Section 4.3 of this article channels.
730 DIVERSITY IN COMMUNICATIONS
The signal diversity techniques mentioned above do Propagation media are generally time-varying in
not all lead to independent results. One must, therefore, character, and this causes transmitted signals to fluctuate
recognize those diversity techniques that are dependent randomly with time. These fluctuations are usually of
in order to avoid trying to ‘‘squeeze blood out of a bone.’’ three types:
For example, one cannot apply both frequency and time
diversity to the same channel in any wide sense. In fact one 1. Rapid fluctuations, or fluctuations in the instan-
might argue that complete distinction between frequency taneous signal strength, whose cause can be traced to
and time diversity may be wholly artificial, since the signal interference among two or more slowly varying copies of
designer usually has a given time–bandwidth product the signal arriving via different paths. This may conve-
available that can be exploited in conjunction with the niently be called multipath fading. If the multiple paths
channel characteristics. Another pair of diversity channels are resolved by the receiver [2], fading is called frequency-
that are not necessarily independent of each other are selective. Otherwise, it is called single-path (flat) fading.
distinguished as angular diversity and space diversity This type of fading often leads to a complete loss of the
channels. Consider an array of n isotropic antennas message during time intervals that are long even when
that are spaced a sufficient distance apart so that the compared with the slowest components of the message.
mutual impedances between antennas can be ignored. It is observed, however, that if widely spaced receiving
If the transmission characteristics of the medium are antennas are used to pick up the same signal, then the
measured between the transmitting-antenna terminals instantaneous fluctuations in signal-to-noise ratio (SNR)
and the terminals of each of the n receiving antennas, at any one of the receiving sites is almost completely inde-
then n channel system functions will result, each of pendent of the instantaneous fluctuations experienced at
which is associated with one of the spatially dispersed the other sites. In other words, at times when the signal
antennas. If the antenna outputs are added through a at one of the locations is observed to fade to a very low
phase-shifting network, the resultant array will have a level, the same signal at some other sufficiently distant
receiving pattern that can be adjusted by changing the site may very well be at a much higher level compared
phase shifting network to exhibit preferences for a variety to its own ambient noise. This type of variation is also
of different angles of arrival. The problem of combining the referred to as macrodiversity. Signals received at widely
outputs of the array through appropriate phase shifters, in spaced time intervals or widely spaced frequencies also
order to achieve major lobes that are directed at favorable show almost completely independent patterns of instan-
angles of arrival would be considered a problem in angular taneous fading behavior. Nearly uncorrelated multipath
diversity, while the problem of combining the outputs of fading has also been observed with signal waves differing
the elements in order to obtain a resultant signal whose only in polarization. It will be evident that by appropriate
qualities are superior to those of the individual outputs selection or combining techniques, it should be possible to
is normally considered as the problem of space diversity obtain from such a diversity of signals a better or more
combining, yet both can lead to the same end result. Signal reliable reception of the desired message than is possible
diversity techniques and methods of combining signals from processing only one of the signals all the time.
from a number of such channels are discussed in the next 2. The instantaneous fluctuations in signal strength
section. occur about a mean value of signal amplitude that changes
relatively so slowly that its values must be compared
2. DIVERSITY AND COMBINING TECHNIQUES at instants separated by minutes to hours before any
significant differences can be perceived. These changes
Diversity is defined here as a general technique that in short-term (or ‘‘hourly’’) mean signal amplitude are
utilizes two or more copies of a signal with varying usually attributable to changes in the attenuation in
degrees of disturbance to achieve, by a selection or the medium that the signals will experience in transit
a combination scheme, a consistently higher degree of between two relatively small geographic or space locations.
message recovery performance than is achievable from No significant random spatial variations in the received
any one of the individual copies, separately. Although mean signal amplitude are usually perceived in receiving
diversity is commonly understood to aim at improving localities that could be utilized for diversity protection
the reliability of reception of signals that are subject to against this attenuation fading or, as sometimes called,
fading in the presence of random noise, the significance ‘‘fading by shadowing’’. However, it is possible to combat
of the term will be extended here to cover conceptually this type of fading by a feedback operation in which
related techniques that are intended for other channel the receiver informs the transmitter about the level of
disturbances. the received mean signal amplitude, thus ‘‘instructing’’
The first problem in diversity is the procurement of it to radiate an adequate amount of power. But the
the ‘‘diverse’’ copies of the disturbed signal, or, if only usual practice is to anticipate the greatest attenuation
one copy is available, the operation on this copy to to be expected at the design stage and counteract it by
generate additional ‘‘diversified’’ copies. When the signal is appropriate antenna design and adequate transmitter
disturbed by a combination of multiplicative and additive power.
disturbances, as in the case of fading in the presence of 3. Another type of attenuation fading is much slower
additive random noise, the transmission medium can be than that just described. The ‘‘hourly’’ mean signal levels
tapped for a permanently available supply of diversity are different from day to day, just as they are from hour
copies in any desired numbers. to hour in any single day. The mean signal level over one
DIVERSITY IN COMMUNICATIONS 731
day changes from day to day and from month to month. signals simultaneously and selects only the best one for
The mean signal level for a period of one month changes delivery is conceptually (although not always practically)
from month to month and from season to season, and preferable. Such a technique is referred to as optimal
then there are yearly variations, and so on. As in the selection diversity.
case of the ‘‘hourly’’ fluctuations in paragraph 2, the long- In the combining techniques, all the available noisy
term fluctuations are generally caused by changes in the waveforms, good and poor, are utilized simultaneously as
constitution of the transmission medium, but the scale and indicated in Eq. (1); the ak values are all nonzero all the
duration of these changes for the long-term fluctuations time. Of all the possible choices of nonzero ak values, only
are vastly greater than those for the ‘‘hourly’’ changes. two are of principal interest. First, on the assumption that
Diversity techniques per se are ineffective here. there is no a priori knowledge or design that suggests that
some of the fk (t) values will always be poorer than the
In addition to the instantaneous-signal diversity that others, all the available copies are weighted equally in
can be achieved by seeking two or more separate channels the summation of Eq. (1) irrespective of the fluctuations in
between the transmitting and receiving antennas, certain quality that will be experienced. Thus, equal mean values
types of useful diversity can also be achieved by of signal level and equal RMS (root-mean-square) values of
appropriate design of the patterns of two or more noise being assumed, the choice a1 = a2 = · · · = ak is made,
receiving antennas placed essentially in the same location and the technique is known as equal-weight or equal-
(microdiversity), or by operations in the receiver on only gain combining. The second possible choice of nonzero
one of the available replicas of a disturbed signal. The weighting factors that is of wide interest is one in which
usefulness of ‘‘receiver diversity’’ of a disturbed signal ak depends upon the quality of fk (t) and during any short
will be demonstrated in examples of the next section. time interval the ak values are adjusted automatically to
The application discussed in Section 4.1 demonstrates yield the maximum SNR for the sum f (t). This is known
the use of diversity (under certain circumstances) from as maximal ratio combining.
a delayed replica of the desired signal arriving via a In the alternate switching–combining technique a
different path of multiple fading paths. The latter is number of the ak values up to K − 1 can be zero during
referred to as multipath diversity, where the same message certain time intervals because some of the available sig-
arrives at distinct arrival times at a receiver equipped to nals are dropped when they become markedly noisier than
resolve the multipath into a number of distinct paths [3] the others. This approach is based on the fact that the per-
with different path lengths. The example presented in formance of an equal-gain combiner will approximate that
Section 4.2 shows application of diversity for the case in of the maximal ratio combiner as long as the SNRs of the
which the interference from some other undesired signal various channels are nearly equal. But if any of the SNRs
source is the cause of signal distortion. become significantly inferior to the others, the overall SNR
The second problem in diversity is the question of how can be kept closer to the maximum ratio obtainable if the
to utilize the available disturbed copies of the signal in inferior signals are dropped out of the sum f (t).
order to achieve the least possible loss of information Over a single-path fading channel, implementation
in extracting the desired message. The techniques that of selection combining does not require any knowledge
have thus far been developed can be classified into about the channel, that is, no channel state information
(1) switching, (2) combining, and (3) a combination of (CSI) is necessary at the receiver, other than that needed
switching and combining. These operations can be carried for coherent carrier recovery, if that is employed. The
out either on the noisy modulated carriers (predetection) receiver simply selects the diversity branch that offers
or on the noisy, extracted modulations that carry the the maximum SNR. For equal-gain combining some CSI
message specifications (postdetection). In any case, if K estimation can be helpful in improving the combiner
suitable noisy waveforms described by f1 (t), f2 (t), . . . , fk (t) performance. For example, in a multipath diversity
are available, let the kth function be weighted by the factor receiver, the maximum delay spread of a multipath fading
ak , and consider the sum channel, which is indicative of the channel memory length,
can guide the integration time in equal-gain combining,
K
such that the combiner can collect the dispersed signal
f (t) = ak fk (t) (1)
k=1
energy more effectively and perhaps avoid collecting noise
over low signal energy time intervals [4]. Maximal ratio
In the switching techniques only one of the ak values combining (MRC) performs optimally, when CSI estimates
is different from zero at any given time. In one of on both channel phase and multiplicative path attenuation
these techniques, called scanning diversity, the available coefficient are available to the combiner. MRC, by using
waveforms are tried one at a time, in a fixed sequence, until these estimates and proper weighting of the received signal
one is found whose quality exceeds a preset threshold. on each branch, yields the maximum SNR ratio for the sum
That one is then delivered for further processing in order f (t), compared to the selection and equal-gain combining.
to extract the desired message, until its quality falls In the MRC case, diversity branches that bear a strong
below the preset threshold as a result of fading. It is signal are accentuated and those that carry weak signals
then dropped and the next one that meets the threshold are suppressed such that the total sum f (t) will yield
requirement in the fixed sequence is chosen. In scanning the maximum SNR [2]. This is similar to the philosophy
diversity, the signal chosen is often not the best one that in a free-market society, by making the rich richer,
available. A technique that examines the K available the society as a whole is better off, perhaps because of
732 DIVERSITY IN COMMUNICATIONS
FSK detection
2 coherent increases above the m = 1 value.
detection
10−2 3. DIVERSITY THROUGH CHANNEL CODING WITH
5 DPSK INTERLEAVING
2 PSK
In general, time and/or frequency diversity techniques
10−3 may be viewed as a form of trivial repetition (block)
5 coding of the information signal. The combining techniques
2 can then be considered as soft-decision decoding of the
trivial repetition codes. Intelligent (nontrivial) coding in
10−4
conjunction with interleaving provides an efficient method
5 of achieving diversity on a fading channel, as emphasized,
2 for example, by chase [6]. The amount of diversity provided
by a code is directly related to the code minimum distance,
10−5
0 5 10 15 20 25 30 35 dmin . With soft-decision decoding, the order of diversity
SNR per bit, g b (dB) is increased by a factor of dmin , whereas, if hard-decision
decoding is employed, the order of diversity is increased by
Figure 1. Performance of binary signaling on a Rayleigh fading
channel (from Proakis [2]).
a factor of dmin
2
. Note that, although coding in conjunction
with interleaving can be an effective tool in achieving
diversity on a fading channel, it cannot help the signal
quality received through a single stationary antenna
10−1 located in a deep-fade null. The interleaving will not be
5 Key helpful, since in practice, the interleaving depth cannot be
FSK indefinite.
2
DPSK In the next section, we present some examples
−2
10 illustrating the benefits of diversity combining in various
PSK
5 communications applications.
Probability of a bit error, P2
2
4. APPLICATION EXAMPLES
10−3
5 Three examples, illustrating benefits of diversity tech-
niques in practical communications systems, are presented
2
in this section.
10−4
5 4.1. Wideband Code-Division-Multiple-Access Using
L =1 Direct-Sequence Spread-Spectrum (DSSS) Communications
2 One application [4] treats a wireless cellular scenario.
10−5 This represents the up and downstream transmission
L =4
5 mode, from user to base station (BS) and from the base to
L =2 user, of a wireless local-area network (LAN) with a star
2 architecture. Each user has a unique DSSS code and a
10−6 correlator exists for each user in a channel bank structure
5 10 15 20 25 30 35 40
at the base station. The output of the bank of correlators
SNR per bit, gb (dB)
is fed to the usual BS circuitry and call setup is handled
Figure 2. Performance of binary signals with diversity (from using standard BS features. Average power control is used
Proakis [2]). in this system to avoid the classical near/far problem.
In wireless LAN we have a severe multipath fading
problem. It is really a classical Rayleigh multipath, fading
signal values. Statistically, the Rayleigh PDF is modified channel scenario. The role of asynchronous transmission is
to a Rice PDF [2] that is much milder in the manner clear — a user transmits at random. The signal arrives at
by which it affects the transmitted signal. Transmission the correlator bank and is detected, along with interfering
over a short (∼3 km) line-of-sight microwave channel is signals. Because the broad bandwidth of a DSSS signal
often subject to Rice fading. Finally, another PDF that can indeed exceed the channel coherence band (this is
is frequently used to describe the statistical fluctuations the channel band over which all frequency components
of signals received from a multipath fading channel is of the transmitted signal are treated in a correlated
734 DIVERSITY IN COMMUNICATIONS
manner), there is inherent diversity in transmission that in order to make this feasible, the communication system
can be exploited as multipath diversity [3] by a correlation design has to offer solutions to the problems associated
receiver. The psuedonoise (PN) sequences used as direct with IR propagation in a noisy environment. Various
sequence spreading codes in this application are indeed link designs may be employed in indoor wireless infrared
trivial repeat codes. By exclusive-OR addition of a PN communication systems. The classification is based on
sequence to a data bit, the narrowband of data is spread the degree of directionality and existence of a line-
out to the level of the wide bandwidth of PN sequence. of-sight path between a transmitter and a receiver.
A correlation receiver that knows and has available the In one configuration, instead of transmitting one wide
matching PN sequence, through a correlation operation, beam, multi-beam transmitters are utilized [9]. These emit
generates correlation function peaks representing the multiple narrow beams of equal intensity, illuminating
narrowband information bits with a correlation function multiple small areas (often called diffusing spots) on
base, in time, twice the size of a square PN sequence a reflecting surface. Each beam aims in a prespecified
pulse, called a PN chip. In this application, the correlator, direction. Such a transmitting scheme produces multiple-
or matched-filter output, will be a number of resolved line-of-sight (as seen by the receiver) diffusing spots, all of
replicas of the same transmitted information bit, displayed equal power, on an extended reflecting surface, such as a
by several correlation peaks. The number of correlation ceiling of a room. Each diffusing spot in this arrangement
peaks representing the same information bit corresponds may be considered as a secondary line-of-sight light
to the diversity order, in this application. Many orders of source having a Lambertian illumination pattern [10].
diversity can be achieved this way. Compared to other configurations, this has the advantage
The replicas obtained this way may now be presented of creating a regular grid of diffusing spots on the
to a combiner for diversity gain. A simple equal gain ceiling, thus distributing the optical signal as uniformly
combiner has been adopted [4] that is by far simpler in as possible within the communication cell as shown in
implementation than a RAKE receiver [5]. The multipath Fig. 3. A direction diversity (also known as angle diversity)
diversity exploitation in conjunction with multiantenna receiver that utilizes multiple receiving elements, with
space diversity [7] establishes a foundation for joint space each element pointed at a different direction, is used, in
and time coding. order to provide a diversity scheme for optimal rejection of
ambient noise power from sunlight or incandescent light,
4.2. Indoor Wireless Infrared (IR) Communications for example, and to substantially reduce the deleterious
effects of time dispersion via multiple arrivals of reflecting
The purpose of using infrared wireless communication
light rays, causing intersymbol interference in digital
systems in an indoor environment is to eliminate wiring.
transmission. The composite receiver consists of several
Utilization of IR radiation to enable wireless communi-
narrow field-of-view (FoV) elements replacing a single
cation has been widely studied and remote-control units
element wide-FoV receiver. The receiver consists of more
used at homes introduce the most primitive applications
than one element in order to cover several diffusing
of this type of wireless systems. A major advantage of an
spots, thus ensuring uninterrupted communication in case
IR system over an RF system is the absence of electromag-
some of the transmitter beams are blocked. Additionally,
netic wave interference. Consequently, IR systems are not
a multiple-element receiver provides diversity, thus
subject to spectral regulations as RF systems are. Infrared
allowing combining of the output signals from different
radiation, as a medium for short-range indoor communica-
receiver elements, using optimum combining methods.
tions, offers unique features compared to radio. Wireless
An increased system complexity is the price that one
infrared systems offer an inherent spatial diversity, mak-
has to pay to escape from restrictions of line-of-sight
ing multipath fading much less of a problem. It is known
links, retaining the potential for a high capacity wireless
that the dimension of the coherence area of a fully scat-
communication system.
tered light field is roughly of the order of its wavelength [8].
For indoor/outdoor wireless transmission systems, use
This is due to the fact that the receive aperture diam-
of multiple antennas at both transmit and receive sides
eter of a photodiode by far exceeds the wavelength of an
to achieve spatial diversity at RF has gained a significant
infrared waveform. Therefore, the random path phase of a
amount of attention. This is motivated by the lack of
fading channel is averaged over the photo-receiver surface.
Hence, the signal strength fluctuations that are caused by
phase cancellations in the RF domain are nonexistent in
the optical domain. An optical receiver actually receives
a large number (hundreds of thousands) of independent
signal elements at different locations on the receiving
aperture of the photodiode. This in fact provides spatial
diversity, which is very similar to employing multiple,
geographically separated antennae in an RF fading envi-
ronment. In summary, because of the inherent diversity T
channels, the frequency-selective fading effect at the opti-
R
cal carrier frequency level is not a problem in an IR system.
Since infrared transmission systems use an optical
medium for data transmission, they have an inherent Figure 3. Multispot diffusing configuration (T — transmitter;
potential for achieving a very high capacity level. However, R — receiver).
DIVERSITY IN COMMUNICATIONS 735
available bandwidth at the low-frequency end of the end. We have concentrated in this article on diversity
radio spectrum. The approach, in a way, is similar to and the effects it may have upon the signal reliabil-
the IR system described earlier. The capacity increase ity. It is established that ‘‘diversification’’ offers a gain
and spatial multiplexing for high-data-rate transmissions in reliable signal detection. However, a wireless channel
via multiple transmit antennas have been illustrated in offers endless possibilities over a multiplicity of dimen-
Ref. 11. Earlier, transmission of orthogonally-polarized sions. Diversity is only one way of introducing a long-term
transmitted digitally modulated RF waveforms was average gain into the detection process. More recently,
introduced to achieve capacity increase over multipath the availability of low-cost and powerful processors and
fading channels of point-to-point high-data-rate digital the development of good channel estimation methods have
radio routes [12]. rejuvenated an interest in adaptive transmission rate tech-
niques with feedback. This new way of thinking is termed
4.3. Polarization-Insensitive Fiberoptic Communications opportunistic communication, whereby dynamic rate and
power allocation may be performed over the dimensions
In a coherent optical receiver, receiving a modulated light-
of time, frequency, and space in a wireless system. In
wave on a single-mode fiber, the polarization state of the
a fading (scattering) environment, the channel can be
received optical signal must be matched with that of the
strong sometimes, somewhere, and opportunistic schemes
receiver local laser source signal. A mismatch reduces (per-
can choose to transmit in only those channel states. Obvi-
haps drastically) the received signal strength. Unless the
ously, some channel state information is required for an
polarization states of the light fields are controlled care-
opportunistic communication approach to be successful.
fully, receiver performance degradation is unavoidable.
Otherwise, it becomes like shooting in the dark. This is in
The problem encountered here is very similar to the flat
some respects similar to building a financial investment
fading on a single-path fading channel. Occasionally, the
portfolio of stocks, based on some ‘‘insider’s’’ information.
entire band of the received signal is severely attenuated.
Clearly, it results in more gain compared to traditional
In an earlier study [13] we examined a polarization-
methods of building a diversified portfolio of stocks, based
insensitive receiver for binary frequency shift keying
on long-term published trends. Similarly, one would expect
(FSK) transmission with discriminator demodulation.
a great loss, if the ‘‘insider’s’’ information turned out to
The polarization-state-insensitive discriminator receiver
be wrong. Consequently, opportunistic communications
is shown in Fig. 4. The two branches carry horizontally
based on wrong channel states will result in a great loss
and vertically polarized optical beams obtained through a
in the wireless network capacity. Thus, in those wireless
polarization beamsplitter. The information-carrying opti-
applications where reliable channel states may easily be
cal beams are subsequently heterodyne demodulated
obtained, it is possible to achieve enormous capacities, at a
down to FSK-modulated IF signals and are received
moderate realization complexity.
by the discriminators for demodulation. The demodu-
lated baseband signals are then combined; thereby a
polarization-independent signal is obtained. This is yet BIOGRAPHY
another example of diversity combining of equal-gain type.
Mohsen Kavehrad received his B.Sc. degree in electron-
ics from Tehran Polytechnic, Iran, 1973, his M.Sc. degree
5. CONCLUDING REMARKS
from Worcester Polytechnic in Massachusetts, 1975, and
his Ph.D. degree from Polytechnic University (Brooklyn
Given proper operating condition of the equipment, the
Polytechnic), Brooklyn, New York, November 1977 in elec-
reliability of a communication system is basically deter-
trical engineering. Between 1978 and 1989, he worked
mined by the properties of the signal at the receiving
on telecommunications problems for Fairchild Industries,
GTE (Satellite and Labs.), and AT&T Bell Laboratories.
Polarization
In 1989, he joined the University of Ottawa, Canada,
Directional beam EE Department, as a full professor. Since January 1997,
coupler splitter he has been with The Pennsylvania State University EE
Received Horizontal
signal Department as W. L. Weiss Chair Professor and founding
Vertical Photodiodes director of the Center for Information and Communica-
Local optical tions Technology Research. He is an IEEE fellow for
signal If filters his contributions to wireless communications and opti-
cal networking. He received three Bell Labs awards for his
Ideal limiters contributions to wireless communications, the 1991 TRIO
Discriminators
feedback award for a patent on an optical interconnect,
2001 IEEE VTS Neal Shepherd Best Paper Award, 5 IEEE
Lasers and Electro-Optics Society Best Paper Awards
Σ between 1991–95 and a Canada NSERC Ph.D.-thesis
Baseband equal
gain combiner award in 1995, with his graduate students for contri-
butions to wireless systems and optical networks. He has
To data over 250 published papers, several book chapters, books,
recovery circuit
and patents in these areas. His current research interests
Figure 4. Basic proposed polarization-insensitive receiver. are in wireless communications and optical networks.
736 DMT MODULATION
DMT MODULATION
N−1
s(t) = xν (t) cos(ων t) (1)
ν=0
ROMED SCHUR
STEPHAN PFLETSCHINGER For the spectra of the transmitter signals, we have
JOACHIM SPEIDEL
Institute of Telecommunications s(t) ↔ S(ω) (2)
University of Stuttgart
Stuttgart, Germany xν (t) cos(ων t) ↔ 12 Xν (ω − ων ) + 12 Xν (ω + ων ) (3)
1
N−1
1.1. Outline
The term S(ω) is shown in Fig. 2 for the simplified case
Discrete multitone (DMT) modulation as well as orthog- that all Xν (ω) are real-valued. In conventional FDM
onal frequency-division multiplex (OFDM) belong to the systems, the frequencies ων have to be chosen in such
category of multicarrier schemes. The early ideas go back a way that the spectra of the modulated signals do not
to the late 1960s and early 1970s [e.g., 1,2]. With the overlap. If all baseband signals Xν (ω) have the same
DMT MODULATION 737
cos(w0t )
cos(w0t )
x 0(t ) x^0(t )
BP LP
cos(wnt ) cos(wmt )
xn(t ) s (t ) x^m(t )
Channel BP LP
bandwidth, the carrier frequencies are normally chosen The output signal s(t) of the multicarrier transmitter is
equidistantly. However, the FDM system described above given by
wastes valuable bandwidth because (1) some space must be
N−1
∞
available for the transition between the passband and the s(t) = TS ejων t Xν [k]g(t − kTS ) (7)
stopband of each bandpass filter in Fig. 1; and (2), if s(t) is ν=0 k=−∞
real, S(ω) is conjugated symmetric, that is, S(−ω) = S∗ (ω),
where ∗ denotes the conjugate complex operation. As a where g(t) is the impulse response of the impulse shaping
consequence, the bandwidth of S(ω) is twice as required filter. The carrier frequencies ων are chosen as integer
theoretically. multiples of the carrier spacing ω:
The second drawback can be solved by quadrature ων = ν · ω, ν = 0, 1, . . . , N − 1 (8)
amplitude modulation (QAM) [9]. With QAM, two inde-
pendent baseband signals are modulated onto a sine and a All practical multicarrier systems are realized with
cosine carrier with the same frequency ων which gives an digital signal processing. Nevertheless, we will use the
unsymmetric spectrum. As a result the spectral efficiency analog model of Fig. 3 in this section, because the analysis
is doubled. can be done more conveniently and the understanding
The first drawback of FDM can be overcome by of the principles of multicarrier modulation is easy.
multicarrier modulation, such as by DMT, which allows Section 3.1 deals with the digital implementation.
for a certain overlap between the spectra illustrated in The output signal s(t) can be either real or complex. If
Fig. 2. Of course, spectral overlap in general could lead to complex, s(t) can be considered as the complex envelope
nontolerable distortions. So the conditions for this overlap and the channel is the equivalent baseband channel. As
have to be carefully established in order to recover the will be shown in Section 3.2 , DMT modulation provides a
signal perfectly at the receiver side. This will be done in real-valued output signal s(t). Nevertheless, the following
the next section. derivations hold for both cases.
The receiver input signal w(t) is demodulated and
2. MULTICARRIER BASICS filtered by the receiver filters with impulse response h(t),
resulting in the signal yµ (t), µ ∈ {0, . . . , N − 1}. Sampling
2.1. Block Diagram and Elementary Impulses this signals at the time instants t = kTS gives the discrete-
Following the ideas outlined in the previous section and time output Yµ [k]. The signal after filtering is given by
applying discrete-time signals Xν [k] at the input leads us yµ (t) = (w(t)e−jωµ t ) h(t)
to the block diagram in Fig. 3. ∞
The impulse modulator translates the sequence = w(τ )e−jωµ τ h(t − τ )dτ, µ = 0, . . . , N − 1 (9)
{. . . , Xν [0], Xν [1], . . .} into a continuous-time function −∞
∞ where denotes the convolution operation. For the
xν (t) = TS Xν [k] · δ(t − kTS ), ν = 0, 1, . . . , N − 1 moment, we assume an ideal channel. Thus w(t) = s(t)
k=−∞ and we get from (9) with (7) and (8)
(5) ∞ 'N−1
where TS is the symbol interval, δ(t) is the Dirac impulse
yµ (t) = TS ej(ν−µ)ωτ
and δ[k] is the unit impulse with −∞ ν=0
)
1 for k = 0
∞
δ[k] = (6)
0 for k = 0 × Xν [k]g(τ − kTS )h(t − τ ) dτ (10)
k=−∞
Now, the target is to recover the sequences Xν [k] at We can now formulate the condition for zero interfer-
the receiver side without distortion. As the given system ence as
is linear, we can restrict our analysis to the evaluation rd (kTS ) = δ[d]δ[k] (17)
of the response to one single transmitted symbol on one
subcarrier. We can then make use of the superposition This can be interpreted as a more general form of
theorem for the general case of arbitrary input sequences. the first Nyquist criterion because it forces not only
Basically, two types of interference can be seen from zero intersymbol interference but also zero intercarrier
Eq. (11): interference [10,11]. If we set d = 0, Eq. (17) simplifies to
the Nyquist criterion for single-carrier systems. In the
1. Intersymbol interference (ISI), in which a symbol context of multicarrier systems and filterbanks, (17) is
sent at time instant k on subcarrier µ has impact on also often called an orthogonality condition or a criterion
previous or following samples yµ (lTS ) with l = k. for perfect reconstruction.
2. Intercarrier interference (ICI), which is the result of For DMT systems without guard interval g(t) = h(t)
crosstalking between different subchannels at time holds and they are rectangular with duration TS . The
instants kTS . purpose of the guard interval will be explained in the next
section. With the carrier spacing
Without loss of generality we assume that the system
model in Fig. 3 has zero delay, i.e., the filters g(t), h(t) are 2π
not causal. We sent one unit impulse δ[k] at the time k = 0 ω = , (18)
TS
on subcarrier ν:
we obtain from Eq. (16) the elementary impulses
Xi [k] = δ[ν − i]δ[k] (12)
,
|t|
The received signal yµ (t) is free of any interference at the r0 (t) = 1− |t| ≤ TS (19)
TS
sampling instants kTS , if 0 elsewhere
,
sgn(t)(−1)d
Yµ [k] = δ[ν − µ]δ[k] (13) 1 − ejd(2π/TS )t for |t| ≤ TS
rd (t) = j2π d d = 0
To gain further insight into the nature of ISI and ICI, 0 elsewhere
we take a closer look at the received signals before they (20)
are being sampled. A unit impulse Xν [k] = δ[k] on carrier
ν yields the transmitter output signal These functions are shown in Fig. 4a and give us some
insight into the nature of intersymbol and intercarrier
s(t) = TS g(t)ejνωt (14) interference. We clearly see that the sampling instants
t = kTS do not contribute to any interference. As the
which is also the input signal w(t) as we assume an ideal elementary impulses are not always zero between the
channel. We now define the elementary impulse rν,µ (t) as sampling instants, they cause quite some crosstalk
the response yµ (t) to the receiver input signal of Eq. (14). between the subchannels. The oscillation of the crosstalk
is increased with the distance d between transmitter
rν,µ (t) = TS g(t)ej(ν−µ)ωt h(t) and receiver subchannel, whereas their amplitude is
∞ decreasing proportional to 1/d.
= TS g(τ )ej(ν−µ)ωτ h(t − τ ) dτ (15) The impact of interference can also be seen from Fig. 4b,
−∞
where the eye diagram of the real part of yµ (t) is shown
Obviously, rν,µ (t) depends only on the difference d = ν − µ. for a 16-QAM on N = 256 subcarriers; that is, the real
Thus we get and imaginary parts of the symbols Xν [k] are taken at
random out of the set {±1, ±3} and are modulated on all
rd (t) = TS (g(t)ejdωt ) h(t) (16) subcarriers.
DMT MODULATION 739
(a) (b) 6
1 r0
Re{r1}
4
0.8 Im{r1}
Re{r2}
0.6 Im{r2} 2
Re{ym(t )}
0.4
0
0.2
−2
1
−4
−0.2
−0.4 −6
−1 −0.5 0 0.5 1 0.8 0.9 1 1.1 1.2
t /TS t /TS
Figure 4. (a) Elementary impulses rd (t) of a multicarrier system with rectangular impulse
shapes; (b) eye diagram of real part of yµ (t) for 16-QAM and N = 256 carriers.
Obviously, even with an ideal channel the horizontal For the elementary impulses follows
eye opening tends to zero, requiring very small sampling
−|t| + TS − TG /2 TG TG
jitter and making a correct detection of the transmitted
for < |t| < TS −
T − T 2 2
symbols extremely difficult. S G
TG
r0 (t) = 1 for |t| <
2.2. Introduction of a Guard Interval
2
0 TG
An effective solution to increase the horizontal eye opening for |t| > TS −
2
is the introduction of a guard interval. It is introduced (23)
by choosing different impulse responses g(t) = h(t) at
sgn(t)(−1)d jsgn(t) dπ(T /Tu ) TG
e G for < |t|
transmitter and receiver. The duration of the guard j2π d 2
interval is given as rd (t) = TG d = 0
− ejd(2π /Tu )t < TS −
2
0 elsewhere
TG = TS − Tu ≥ 0 (21)
(24)
where TS and Tu denote the length of the impulse response
As can be seen from Fig. 5, there are flat regions
of the transmitter and the receiver filter g(t) and h(t),
around the sampling instants t = kTS , k = 0, ±1, . . .
respectively. Both impulse responses are rectangular and
symmetric to t = 0. which prevent any interference. In many multicarrier
In contrast to (18), the carrier spacing is now systems, DMT as well as OFDM, the guard interval is
chosen as TG /Tu = 16
1
, . . . , 14 . In Fig. 5b the eye diagram
2π for a DMT system with guard interval is shown.
ω = (22) Obviously, the horizontal eye opening is now increased
Tu
(a) (b) 6
1 r0
Re{r1}
4
0.8 Im{r1}
Re{r2}
0.6 Im{r2} 2
Re{ym(t )}
0.4
0
0.2
−2
0
−4
−0.2
−0.4 −6
−1 −0.5 0 0.5 1 0.8 0.9 1 1.1 1.2
t /TS t /TS
Figure 5. (a) Elementary impulses rd (t) with guard interval TG = Tu /4; (b) eye diagram of real
part of yµ (t) for 16-QAM and N = 256 carriers and TG = Tu /4.
740 DMT MODULATION
and is as long as the guard interval. Therefore, the can be used similarly. We now adopt causal discrete-time
impact of timing jitter and interference is reduced filters:
considerably. However, DMT and OFDM systems are
much more sensitive to timing jitter than are single-carrier 1
√ for n = 0, . . . , NS − 1
systems [12]. g[n] = (28)
N
During the guard interval period TG , no information can 0 elsewhere
be transmitted. Thus, the spectral efficiency is reduced by
a factor TG /TS . The output signal of the transmitter in Fig. 6 can be
written as
3. PRINCIPLES OF DISCRETE MULTITONE MODULATION
N−1
∞
s[n] = w−νn Xν [k]g[n − kNS ] (29)
3.1. Implementation with Digital Signal Processing ν=0 k=−∞
w −nn w mn
while the second block (k = 1) yields the output and set the latter to zero. After some calculations, this
provides the following conditions on the input sequences
{s[NS ], . . . , s[2NS − 1]} Xν [k]:
∗ N
We recognize that each output block contains the Xν = XN−ν , ν = 1, . . . , −1 (37)
2
symbols x0 [k], . . . , xN−1 [k] plus G additionally samples
taken out of the same set. The calculation of s[n] Consequently, we get from (32) with Xν [k] = Xν [k] + jXν [k]
applying a blockwise IDFT is illustrated in Fig. 7, where
a block of N input symbols is first IDFT-transformed and
then parallel–serial-converted. The commutator puts out 1
NS symbols for each block by turning more than one xi [k] = √ X0 [k] + (−1)i XN/2 [k] + 2
N
revolution, resting after each block in a different position.
Thus, although each block contains all transformed
N
symbols, the ordering and the subset of doubled samples
2
−1
2π 2π
vary. To overcome this inconvenience and to facilitate × Xν [k] cos iν − Xν [k] sin iν (38)
N N
the block processing at the receiver side, practically all ν=1
DMT systems compute a block of N samples by inverse
DFT processing and insert the additional samples at the To simplify the transmitter scheme, we can choose
beginning of the block. Therefore the guard interval is also X̃0 = X0 + jXN/2 as indicated in Fig. 8. For practical
called cyclic prefix (CP). Thus, a DMT block for index k implementations, this is of minor importance as usually
will be ordered as follows: X0 and XN/2 are set to zero. The reason will be discussed
in Section 3.3.
xN−G [k], xN−G+1 [k], . . . , xN−1 [k], x0 [k], x1 [k], . . . , xN−1 [k] It is convenient to interpret the DMT signal in (38) as a
- ./ 0 - ./ 0
sum of N/2 QAM carriers where Xν [k] modulates the νth
G samples, cyclic prefix N data samples
(34) carrier.
Thus, we can express the transmitter signal after insertion The detailed block diagram of a DMT transmitter
of the CP as as realized in practice is depicted in Fig. 8. The
parallel–serial conversion and the insertion of the guard
x̃[n] = xn mod NS [n div NS ] (35)
interval is symbolized by the commutator which inserts
at the beginning of each block G guard samples,
The discrete-time output signal with block processing copied form the end of the block. A digital-to-analog
is denoted as x̃[n] in order to distinguish it from converter (DAC) provides the continuous-time output
s[n]. At this point, the mathematical notation seems x̃(t). We clearly see that the guard interval adds some
to be a little bit more tedious than in Eq. (30), overhead to the signal and reduces the total data
but (32) and (35) describe the signals for independent throughput by a factor η = G/NS . Therefore, the length
block processing, which simplifies the operations in the G of the cyclic prefix is normally chosen much smaller
transmitter and especially in the receiver considerably. than the size N of the inverse DFT. We will see in
Later we will derive a compact matrix notation that Section 3.5 that the cyclic prefix allows for a rather simple
operates blockwise, taking advantage of the block equalizer.
independence. Figure 9 depicts the corresponding DMT receiver.
After analog-to-digital conversion and synchronization,
3.2. Real-Valued Output Signal for Baseband Transmission the signal ỹ[n] is serial–parallel converted. Out of
the NS symbols of one block, the G guard samples
Because the DMT signal is transmitted over baseband are discarded and the remaining N symbols are fed
channels, we must ensure that the output signal x̃[n] into a DFT. Note that synchronization and timing
and thus xi [k] are real-valued. Therefore we have to estimation are essential receiver functions to produce
decompose xi [k] of (32) into real and imaginary parts reliable estimates of the transmitted data at the receiver.
There have been many proposals for synchronization
with DMT modulation, such as that by Pollet and
x0[k ] Peeters [13].
X0[k ]
Another major item to be considered with DMT modu-
x1[k ]
X1[k ] lation is the large peak-to-average power ratio (PAPR)
N -point s[n ] of the output signal x̃[n]. Thus, a wide input range
•
• IDFT •
•
of the DAC is required. Furthermore, large PAPR can
• •
cause severe nonlinear effects in a subsequent power
xN −1[k ] amplifier. The PAPR should be considered particu-
XN −1[k ] larly for multicarrier modulation with a large num-
bers of subcarriers, because this ratio increases with
Figure 7. Inverse IDFT with parallel–serial conversion. the number of subcarriers. In order to reduce the
742 DMT MODULATION
m0 ~
X0 X0 x0
Re 0
S
m1 X1 x1
b [n ] 1
N -point
IDFT
P mN /2−1 XN /2−1
N /2−1
XN /2
Im N /2
Bitloading xN −G
& gain X *N /2−1
table N /2+1
conjugate
Complex
~
x [n]
DAC
X 1* xN −1 and
N−1
lowpass
~
x (t )
Channel
FDE Demapping
y0 Y0
Cyclic ψ0
prefix
y1 Y1 P
ψ1 ^
b [n ]
Not Bitloading
used & gain table
yN −1
PAPR, several techniques have been proposed, such From Eqs. (22) and (25) we obtain TA = 2π/(Nω).
as ‘‘selective mapping’’ and ‘‘partial transmit sequence’’ Together with (28), it follows from (39) that
approaches [14].
1−w
(ω/ω−ν)NS
ω
1 for = ν
3.3. Spectral Properties S(ω) = √ 1 − w(ω/ω−ν) ω (40)
N
NS for
ω
=ν
We calculate the spectrum of the output signal s[n] in ω
Fig. 6 for one modulated subchannel, namely, the input
In order to obtain a real-valued output signal, according
signal is given by (12). This gives
to (37), we have to sent a unit impulse on subchannel
N − ν, too. In Fig. 10a the magnitude of the spectrum is
∞
s[n] = g[n] · w−νn ↔ S(ω) = g[n] · w−νn · e−jωnTA (39) depicted for the case that subcarrier ν is modulated. The
n=−∞ spectrum is quite similar to a sin(ω)/ω-function. However,
DMT MODULATION 743
(a) 10 (b) 2 dB
9 n = 10 0 dB
8 n = 11
7 n = 12 −2 dB
6
S (w)
−4 dB
Φss(w)
5
4 −6 dB
3 −8 dB
2
−10 dB
1
0
6 7 8 9 10 11 12 13 14 15 16 −20 −15 −10 −5 0 5 10 15 20
w /∆w w /∆w
Figure 10. (a) Magnitude response |S(ω)| for modulated carrier ν, G = N/8; (b) power spectral
density ss (ω) for N = 32, G = 4. The ripple increases with the duration of the guard interval.
as Fig. 6 is a discrete-time system, S(ω) is periodic with conjugate of the sequences X1 [k], . . . , X15 [k]. Note that we
ωA = N · ω. Note that the spectrum of a single subcarrier can approximately add the PSD of all sequences sν [n]
with carrier frequency νω does not have zero crossings despite their pairwise correlation because their spectra
at ω = (ν ± m)ω, m = 1, 2, . . . if a guard interval with overlap to only a very small extent. As the antialiasing
length G > 0 is used. lowpass in the DAC has a cutoff frequency of ωA /2 [15],
In order to calculate the power spectral density (PSD) of the carriers in this frequency region cannot be modulated;
a DMT signal, we now consider stochastic input sequences therefore, at least the carrier at ν = N/2 remains unused.
Xν [k] that are uncorrelated, have zero mean and variances The carrier at ν = 0 is seldom used, either, because most
σν2 . Then the PSD of the output signal sν [n] of modulator ν channels do not support a DC component.
in Fig. 6 becomes
3.4. System Description with Matrix Notation
σν2
sν sν (ω) = · |G(ω)|2 , with We have seen that the transmitter processes the data
NS
ω blockwise, resulting in simple implementation without
∞
1 1− w ω NS the need of saving data from previous blocks. We now
G(ω) = g[n]e−jωnTA = √ ω (41)
N 1 − w ω consider how this block independence can be extended
n=−∞
to the receiver when including the channel. Figure 11
From this it follows the total PSD of the transmitter shows a block diagram for the complete DMT transmission
system. For the moment we focus on the signal only and
1 2
15
ss (ω) = σ · |G(ω − νω)|2 (42) consider the influence of the noise later. The receiver input
NS ν=−15 ν signal is then given by
ν=0
ψ0 ^
x0 r [n ] y0 Y0 X0
X0
x1 P S y1 Y1 ψ1 ^
X1 X1
~
x [n ] x~[n ] y [n ] y [n ]
IDFT CP c [n ] CP DFT YN /2−1 ψN /2−1 X^N
−1
2
Not
xN −1 S P yN −1 used
XN −1
F −1 C F
impulse response c[n] with L + 1 samples c[0], . . . , c[L]. The latter expression is preferred as it directly relates
From (43) we conclude that for independence of the the signals x and y and transforms the effect of the
received blocks, it must hold that guard interval insertion and removal into the channel
matrix C. The matrix equation (47) represents the circular
L≤G (44) convolution, which is equivalent to multiplication in the
frequency domain [15]:
Thus, the cyclic prefix must be at least as long as the
length of the channel impulse response. If this condition
N−1
is satisfied, we can adopt a vector notation for the block y[n] = x[n] ⊗ c[n] = x[m] · c[(n − m) mod N] (48)
data: m=0
where y is the input signal after removal of the cyclic This can be easily verified by checking the equation
prefix. It can be expressed as Cϑµ = λµ ϑµ . We identify the inverse DFT matrix as
c[G] c[G − 1] ··· c[0] 0 ··· 0 F−1 = (ϑ0 , ϑ1 , . . . , ϑN−1 ) (50)
0 c[G] c[G − 1] ··· c[0] ··· 0
y= · x̃
··· ··· This means that the eigenvectors of the channel matrix
0 0 ··· c[G] c[G − 1] · · · c[0] C are the columns of the inverse DFT matrix. As a
(46) consequence, we can diagonalize the channel matrix by
multiplying with the IDFT and its inverse, the DFT matrix
or F
FCF−1 = where = diag(λµ ) (51)
y0
y1 With X = (X0 [k], . . . , XN−1 [k])T and Y = (Y0 [k], . . . ,
.
. YN−1 [k])T we can write
.
yG−1
= x = F−1 X, y = Cx = CF−1 X, Y = Fy = FCF−1 X
yG
(52)
yG+1
. We can now describe the input-output-relation of
..
the whole transmission system of Fig. 11 with a single
yN−1 diagonal matrix: Y = X, or Yµ [k] = λµ · Xµ [k]. The result
shows that due to the cyclic prefix the parallel subchannels
c[0] 0 ··· 0 c[G] ··· c[2] c[1]
c[1] are independent, and perfect reconstruction can be
c[0] 0 · ·· 0 c[G] ··· c[2]
··· realized by a simple one-tap equalizer with tap weight ψµ
· · · · · · · · · ··· ··· ··· ···
c[G − 1] per subchannel located at the output of the receiver DFT
··· ··· c[0] 0 ··· 0 c[G]
c[G] c[G − 1] as shown in Fig. 11. As equalization is done after the DFT,
··· ··· c[0] 0 ··· 0
this equalizer is usually referred to as a frequency-domain
0 c[G] c[G − 1] ··· ··· c[0] 0 ···
··· equalizer (FDE).
··· ··· ··· ··· ··· ··· ···
Following from Eq. (49), we can interpret the eigenval-
0 ··· ··· c[G] ··· ··· c[1] c[0]
- ./ 0 ues λµ of the channel matrix as the DFT of the channel
C impulse response. If we define
x0
N−1
x1 C(ω) = c[n] · e−jωnTA (53)
.
.
. n=0
x
× G−1 (47) as the discrete-time Fourier transform of c[n], we see that
xG
the eigenvalues are just the values of the channel transfer
xG+1
. function at the frequencies ωµ = µ · ω:
..
xN−1 λµ = C(ωµ ) (54)
DMT MODULATION 745
Because DMT is a baseband transmission scheme, the For minimization of the MSE, we set the gradient of (58)
channel impulse response c[n] is real-valued and therefore to zero:
its spectrum shows hermitian symmetry:
∂ ∂
E{|X̂µ − Xµ |2 } = 0 E{|X̂µ − Xµ |2 } = 0 (59)
λ0 , λN/2 ∈ R; λµ = λ∗N−µ for µ = 1, . . . , N/2 − 1 ∂ψµ ∂ψµ
(55)
which results in
Let us summarize the results. We have shown that
the insertion of the cyclic prefix translates the linear λµ −λµ
convolution (43) into a circular convolution (48) that ψµ = , ψµ =
|λµ |2 + σq,µ
2 /σ
µ
2 |λµ |2 + σq,µ
2 /σ 2
µ
corresponds to a multiplication in frequency domain,
as long as the impulse response of the channel is not λ∗µ
⇒ ψµ = (60)
longer than the guard interval. If the guard interval |λµ |2 + σq,µ
2 /σ 2
µ
is sufficiently large, no interblock and no intercarrier
interference occur and each subchannel signal is weighted The subchannel SNR can be calculated as
only by the channel transfer function at the subchannel
frequency. If the channel impulse response c[n] exceeds E{|Xµ |2 } σµ2 · |λµ |2
SNRµ = = 2
+1 (61)
the guard interval, the described features of DMT are E{|X̂µ − Xµ |2 } σq,µ
not valid anymore. To overcome this problem, c[n] can be
compressed by a so-called time-domain equalizer (TDE) which gives a better performance than the first case,
to the length of the guard interval. The TDE will be especially for low SNR.
applied before the guard interval is removed at the receiver It is this amazingly simple equalization with a one-tap
[e.g., 16–18]. equalizer per subchannel that makes DMT so popular for
the transmission over frequency-selective channels like
3.5. Frequency-Domain Equalization (FDE) the telephone subscriber line.
The output signal of the DFT at the receiver in Fig. 11 is
given by 4. CHANNEL CAPACITY AND BIT LOADING
Yµ [k] = λµ · Xµ [k] + qµ [k] (56)
The channel capacity [19], or the maximum error-free
where qµ [k] represents the noise introduced on the bitrate, of an ideal AWGN channel is given by
subchannel after DFT processing. For many practical ' )
channels, the noise r[n] on the channel will be Gaussian ωB σy2
Rmax = · log2 1 + 2 (62)
but not white. Therefore, the power of qµ [k] will depend 2π σr
on the subchannel. If N is chosen sufficiently large, the
subchannel bandwidth is very small and thus the noise where ωB = N · ω is the total channel bandwidth, σy2 is
in each subchannel will be approximately white with the received signal power, and σr2 is the noise power. The
2
variance σq,µ . Further, the noise is uncorrelated with capacity of subchannel µ with very small bandwidth ω
the noise on any other subchannel [5]. The equalizer is then
coefficients ψµ can be derived with or without considering
the noise term in Eq. (56). If the signal-to-noise ratio ω ωxx (ωµ )|C(ωµ )|2
Rµ = · log2 1 + (63)
(SNR) is high, we approximately neglect the noise and 2π ωrr (ωµ )
find the optimal FDE parameters as ψµ = 1/λµ . With
where xx (ω) denotes the power spectral density (PSD) of
X̂µ [k] = ψµ Yµ [k] follows X̂µ [k] = Xµ [k] + qµ [k]/λµ for the
the transmitter signal x̃[n], C(ω) is the channel transfer
reconstructed signal at the receiver. The SNR at the output
function in (53) and rr (ω) is the PSD of the (colored) noise
of the µ-th subchannel is then given by
r[n]. The total capacity of all subchannels is
E{|Xµ |2 } σµ2 · |λµ |2
ω
N−1
SNRµ = = (57) xx (ωµ )|C(ωµ )|2
E{|X̂µ − Xµ |2 }
2
σq,µ R= · log2 1 + (64)
2π µ=0 rr (ωµ )
where σµ2 = E{|Xµ [k]|2 } is the signal power of the µth input
signal at the transmitter. In the limit for ω → 0 with ω · N = ωB = const. we get
However, the SNR per subchannel can be improved by ωB
1 xx (ω)|C(ω)|2
considering the noise in the derivation of the equalizer R= log2 1 + dω (65)
2π 0 rr (ω)
coefficients ψµ . Considering the AWGN (additive white
Gaussian noise) in the subchannels, we calculate the We now want to maximize the channel capacity R,
equalizer coefficients by minimizing the mean-square subject to the constraint that the transmitter power is
error (MSE) E{|X̂µ − Xµ |2 } which can be written with limited: ωB
ψµ = ψµ + jψµ and λµ = λµ + jλµ as 1
xx (ω)dω = Pt (66)
2π 0
E{|X̂µ − Xµ |2 } = (ψµ2 + ψµ2 )(|λµ |2 σµ2 + σq,µ
2
)
Thus, we search for a function xx (ω) that maximizes
− 2σµ2 (ψµ λµ − ψµ λµ ) + σµ2 (58) Eq. (65) subject to the constraint (66). This can be
746 DMT MODULATION
accomplished by calculus of variations [20]. Therefore we increases with frequency and distance. A modulation
set up the Lagrange function scheme that is well suited for the wide variations in
the ADSL channel characteristics is provided by DMT.
|C(ω)|2 In particular, rate adaption is quite simple for DMT
L(xx , ω) = log2 1 + xx (ω) + λ · xx (ω) (67)
rr (ω) modulation to optimize the ADSL system transmission.
Some aspects of an ADSL system are regarded in the
which must fulfill the Euler–Lagrange equation remainder of this section.
The parameter of the DMT modulation scheme for
∂L d ∂L dxx
= , where xx = (68) downstream transmission are determined by the ADSL
∂xx dω ∂xx dω standard [8] as follows:
This requires that xx (ω) + rr (ω)/|C(ω)|2 = 0 = const.
Thus, • FFT size of N = 512. Consequently, excluding the
two carriers at ν = 0 and ν = N/2, 255 usable parallel
xx (ω) = 0 − rr (ω)/|C(ω)| for |ω| < ωB
2 subchannels result.
(69)
0 elsewhere • Sampling rate fA = 1/TA = 2.208 MHz.
• Guard interval length G = 32 = N/16.
This represents the ‘‘water-filling solution.’’ We can
interpret rr (ω)/|C(ω)|2 as the bottom of a bowl in which • Adaptive bitloading with a maximum number of
we fill an amount of water corresponding to Pt . The water mmax = 15 bits per subcarrier (tone).
will distribute in a way that the depth represents the • A flat transmit power spectral density of about
wanted function xx (ω), as illustrated in Fig. 12. −40 dBm/Hz (dBm = decibels per milliwatt). As the
The optimum solution according to (69) can be achieved spectrum ranges up to fA /2 = 1.104 MHz, the total
by appropriately adjusting the constellation sizes and the transmit power is about 20 dBm.
gain factors for the bit-to-symbol mapping for each carrier.
In practice, when the bit rate assignments are The achievable data throughput depends on the bit
constrained to be integer, the ‘‘water-filling solution’’ has allocation that is established during initialization of the
to be modified. There have been extensive studies on the modem and is given by
allocation of power and data rate to the subchannels,
1
known as bitloading algorithms [21–24]. In the following, N/2−1
(a) 40 dB (b) 8
7
35 dB
17. R. Schur, J. Speidel, and R. Angerbauer, Reduction of guard uses a dedicated optical wavelength (or, equivalently, color
interval by impulse compression for DMT modulation on or frequency) [2]. Multiplexing in the frequency domain
twisted pair cables, Proc. IEEE Globecom’00, Nov. 2000, has been used for many decades in the communication
Vol. 3, pp. 1632–1636. industry, including radio or TV broadcast, where several
18. W. Henkel, Maximizing the channel capacity of multicarrier channels are multiplexed and transmitted over the air
transmission by suitable adaption of time-domain equalizer, or cable. The information signals are modulated at the
IEEE Trans. Commun. 48(12): 2000–2004 (Dec. 2000). transmitter side on carrier signals of different frequencies
19. C. E. Shannon, A mathematical theory of communication, (wavelengths), where they are then combined for trans-
Bell Syst. Tech. J. 27: 379–423, 623–656 (July, Oct. 1948). mission over the same medium. At the receiver side, the
20. K. F. Riley, M. P. Hobson, and S. J. Bence, Mathematical channels are selected and separated using bandpass fil-
Methods for Physics and Engineering, Cambridge Univ. Press, ters. More recent advances in optical technology allow for
Cambridge, UK, 1998. the separation of densely spaced optical channels in the
21. U.S. Patent 4,679,227 (July, 1987), D. Hughes-Hartogs, spectrum. Consequently, more than 80 optical channels
Ensemble modem structure for imperfect transmission media. can be transmitted over the same fiber, with each channel
transporting payload signals of ≥2.5 Gbps (gigabits per
22. P. S. Chow, J. M. Cioffi, and J. A. C. Bingham, A practical
discrete multitone transceiver loading algorithm for data
second). In combination with optical amplification of mul-
transmission over spectrally shaped channels, IEEE Trans. tiplexed signals, DWDM technology provides cost-efficient
Commun. 43(2–4): 773–775 (Feb.–April 1995). high-capacity long-haul (several 100 km) transport sys-
tems. Because of its analog nature, DWDM transport is
23. R. F. H. Fischer and J. B. Huber, A new loading algorithm
for discrete multitone transmission, Proc. IEEE Globecom’96,
agnostic to the frame format and bit rate (within a cer-
Nov. 1996, Vol. 1; pp. 724–728. tain range) of the payload signals. Therefore, it is ideal
in supporting the transportation of various high–speed
24. R. V. Sonalkar and R. R. Shively, An efficient bit-loading
data and telecommunication network services, such as
algorithm for DMT applications, IEEE Commun. Lett. 4(3):
80–82 (March 2000).
Ethernet, SONET, or SDH.
These advantages fit the demands of modern com-
25. S. V. Ahamed, P. L. Gruber, and J.-J. Werner, Digital sub-
munication networks. Service providers are facing the
scriber line (HDSL and ADSL) capacity of the outside loop
merging of telecommunication and data networks and a
plant, IEEE J. Select. Areas Commun. 13(9): 1540–1549 (Dec.
1995). rapid increase in traffic flow, driven mainly by Internet
applications. The level of transmission quality is stan-
dardized in the network operation business, or, commonly
agreed. There are few possibilities to achieve competitive
DWDM RING NETWORKS advantages for network operators. But these differences,
namely, better reliability of service or flexibility of ser-
DETLEF STOLL vice provisioning, have a major influence on the market
JIMIN XIE position. This explains why the transportation of telecom-
Siemens ICN, Optisphere Networks munication and data traffic is a typical volume market [3,
Boca Raton, Florida
p. 288]. As a default strategy in volume markets, it is
JUERGEN HEILES the primary goal to provide highest data throughput at
Siemens Information & Communication the most competitive price in order to maximize market
Networks share. As a secondary goal, network operators tend to
Munich, Germany provide flexibility and reliability as competitive differen-
tiators. DWDM ring networks fulfill these basic market
1. INTRODUCTION demands in an almost ideal way.
In the following sections, we will approach DWDM ring
Ring networks are well known and widely used in today’s networks from three different viewpoints:
Synchronous Optical Network (SONET) or Synchronous
Digital Hierarchy (SDH) transport networks [1]. They • The Network Architecture Viewpoint. DWDM ring
are attractive because they offer reliable and cost- networks are an integral part of the worldwide
efficient transport. In combination with dense wavelength- communication network infrastructure. The network
division multiplexing (DWDM), ring networks provide design and the architecture have to meet certain
high capacity and transparent transport for a variety rules in order to provide interoperability and keep
of client signals. Compared to a star and bus topology, the network structured, reliable, and manageable.
the ring topology provides two diverse routes between any The DWDM ring network architecture, the network
two network nodes. Ring network topologies are therefore layers, and specific constrains of DWDM ring
often used in combination with protection schemes for networks will be discussed in Section 2. We will
increased reliability. Additionally, network management, also take a look at protection schemes, which are
operation and node design are less complex in rings than an important feature of ring networks.
in mesh topologies. • The Network Node Viewpoint. Optical add/drop
DWDM is the key technology to provide high trans- multiplexers (OADMs) are the dominant type of
mission capacity by transmitting many optical channels network elements in DWDM ring networks. The
simultaneously over the same optical fiber. Each channel ring features will be defined mainly by the OADM
DWDM RING NETWORKS 749
functionality. The basic OADM concept and its • ‘‘Signal Quality Supervision.’’ As today’s services
functionality is discussed in Section 3. are mainly digital, the bit error ratio (BER) is
• The Component Viewpoint. Different component the preferred quality-of-service information. All-
technologies can be deployed to realize the OADM optical performance supervision does not provide a
functionality. It is the goal of system suppliers, to uti- parameter that directly corresponds to the BER of
lize those components that minimize the cost involved the signal.
in purchasing and assembling these components and • Optical Transparency Length. This is the maximum
maximize the functionality of the OADM and DWDM distance that can be crossed by an optical channel.
ring network. State-of-the-art technologies and new The transparency length is limited by optical signal
component concepts are presented in Section 4. impairments, such as attenuation of optical signal
power (loss), dispersion, and nonlinear effects. It
2. DWDM RING NETWORK ARCHITECTURE depends on the data rate transported by the
optical channels, the ring capacity, the quality
2.1. Basic Architecture of the optical components, and the type of fiber
(e.g., standard single-mode fiber, dispersion shifted
The basic ring architecture is independent of transport
fiber). For 10-Gbps signals, transparency lengths
technologies such as SONET, SDH, or DWDM. Two-fiber
of 300 km can be achieved with reasonable effort.
rings (Fig. 1, left) use a single fiber pair; one fiber for each
This makes DWDM ring networks most attractive
traffic direction between any two adjacent ring nodes.
for metropolitan-area networks (MAN; ≤80 km ring
Four-fiber rings (Fig. 1, right) use two fiber pairs between
circumference) and regional area networks (≤300 km
each of the nodes. The second fiber pair is dedicated to
ring circumference).
protection or low priority traffic. It allows for enhanced
protection schemes as discussed in Section 2.4.
2.2. Transparent Networks and Opaque Networks
The optical channels are added and dropped by the
OADM nodes. Signal processing is performed mainly If the optical transparency length is exceeded, full
in the optical domain using wavelength-division multi- regeneration of the signal, including reamplification,
plexing and optical amplification techniques. All-optical reshaping, and retiming (3R regeneration) is necessary.
processing provides transparent and cost-efficient trans- With 3R regeneration however, the transport capability is
port of multiwavelength signals. However, it has some restricted to certain bit rates and signal formats. In this
constraints: case, the ring is called opaque. Specific digital processing
equipment must be available for each format that shall
• Wavelength Blocking. In order to establish an optical be transported. Furthermore, 3R regeneration is very
path, either the wavelength of this path has to be costly. The typical opaque DWDM ring topology uses all-
available in all spans between both terminating optical processing within the ring. 3R regeneration is only
add/drop nodes or the wavelength has to be converted performed at the tributary interfaces where the signals
in intermediate nodes along the path. Otherwise, enter and exit the ring (i.e., at the add/drop locations).
wavelength blocking occurs. It results in lower This decouples the optical path design in the ring from
utilization of the ring resources. It can be minimized, any client signal characteristic. Furthermore, it provides
but not avoided completely by careful and farsighted digital signal quality supervision the add/drop traffic.
network planning and wavelength management. It Support of different signal formats is still possible by
has been shown that the positive effects of wavelength using the appropriate tributary interface cards without
conversion in intermediate nodes are limited [4]. the need for reconfiguration at intermediate ring nodes.
Tributary
interfaces
OADM OADM
Line Line
interfaces Working traffic Working traffic
interfaces
Tributary
interfaces
Tributary
interfaces Line Line
interfaces interfaces Working traffic Working traffic
OADM OADM
Tributary
interfaces
Figure 1. Basic ring architectures: left, two-fiber ring; right, four-fiber ring.
750 DWDM RING NETWORKS
h
OC S OADM
S OT
OM
S
OT A
OL
OADM OADM
S
OT A OT
S OL
OM S
S
OADM S OM
OT
OCh
Figure 2. Optical-layer networks.
DWDM RING NETWORKS 751
Protection Loop
switching back
Failure
(e.g., fiber cut)
and protection traffic runs in a clockwise direction. In the optical channel are needed, while the OMS-BSHR allows
counter clockwise direction, the wavelengths N/2 + 1 to the use of a protection switch for all working/protection
N are used for working connections and the wavelengths connections in common.
1 to N/2 are used for protection. In case of a failure, the A general issue of shared protection schemes in all-
two nodes adjacent to the failure location perform ring optical networks is the restricted supervision capability
protection switching (Fig. 3, right). They bridge and select of inactive protection connections. Failures of protection
the clockwise working connection to and from the counter connections that are not illuminated can be detected
clockwise protection connection and vise versa. The whole only after activation. Usually, activation occurs due to a
working traffic that passed the failure location is now protection situation. In that case, the detection of failures
looped back to the other direction of the ring. in the protection connection comes too late. Furthermore,
In a four-fiber BSHR, dedicated fibers for working traffic the dynamic activation/deactivation of protection channels
and protection traffic are used. By this, four-fiber BSHR on a span has an influence on the other channels of the
rings provide twice the capacity than the two-fiber BSHR span. It will result in bit errors for these channels if not
rings. In addition to the ring protection of the two-fiber carefully performed.
BSHR, the four-fiber BSHR can perform span protection.
2.5. Single-Fiber Bidirectional Transmission
This provides protection against failures of the working
fiber pair between two adjacent nodes. An ‘‘automatic DWDM allows bidirectional signal transmission over a
protection switching’’ (APS) protocol is necessary for the single fiber by using different wavelengths for the two
coordination of the bridge and switch actions and for the signal directions. This allows for building two-fiber ring
use of the shared protection connection. Unlike in 1 + 1 architectures by only using a single fiber between the
OCh protecting the utilization of the ring capacity for ring nodes. It also allows for four-fiber architectures with
working traffic is always exactly 50%. A disadvantage of two physical fibers. However, the two-fiber OMS-BSHR
OMS-BSHR are the long pathlengths that are possible and OCh-SPR protection schemes with their sophisticated
due to the loop back in the protection case. In order not to wavelength assignment are not supported.
exceed the transparency length in a case of protection, the
ring circumference is reduced. 3. OADM NODE ARCHITECTURE
The bidirectional optical channel shared protection ring
(OCh-SPR) avoids the loop back problem of the OMS- The network functionalities described in the previous
BSHR. Here, ring protection is performed by the add and section are realized in the optical add/drop multiplexers.
drop nodes of the optical channel. The add nodes switch The OADM node architecture and functionality is
the add channels to the working or protection connection. described below.
The drop nodes select the working or protection channels
directly (Fig. 4). 3.1. Basic OADM Architecture
OCh-SPR can be used in both the two- and four- An OADM consists of three basic modules as shown in
fiber rings. An automatic protection switching protocol Fig. 5:
is required for the coordination of the bridge and switch
actions; as well as, for the use of the shared protection • The DWDM line interfaces, which physically process
connection. In an OCh-SPR, protection switches for each the ring traffic
752 DWDM RING NETWORKS
Protection
switching
Failure
(e.g., fiber cut)
Figure 4. Two-fiber OCh shared protection: left, normal operation; right, protection case.
DWDM DWDM
DWDM Add/drop DWDM
line line
ring module ring
interface interface
Tributary
interfaces
• The add/drop module, which performs both through 100% add/drop capacity if the capacity of the tributary
connections and the add/drop function for the optical interfaces equals the capacity of both line interfaces.
channels For instance, an OADM of 100% add/drop capacity in
• The tributary interfaces, which physically process the an 80-channel DWDM ring provides 160 channels at
add/drop channels its tributary interfaces. Designing an OADM for an
appropriate add/drop capacity is an efficient approach in
The detailed functionality of these modules depends on minimizing cost and complexity.
the OADM features. 3.2.2. Flexibility. The selection of add/drop channels in
an OADM can either be remotely configurable for channels
3.2. Characteristic OADM Features
that require frequent reconfiguration (dynamic), or it can
This section describes the most important features of the be performed manually for ‘‘long term’’ connections (static).
OADM nodes. Furthermore, channels that will never be dropped can be
passed through directly from one line interface to the
3.2.1. Add/Drop Capacity. The add/drop capacity is other without add/drop capabilities (express channels).
defined as the relationship between the capacity of Other important OADM features are
the tributary interfaces and the capacity of both line
interfaces. In an extreme case, the entire traffic entering • Transparency or opacity as discussed in Section 2.2.
the OADM from both sides of the ring must be dropped. • Wavelength conversion capability as discussed in
Therefore, an add/drop multiplexer is considered to have Sections 2.
DWDM RING NETWORKS 753
• Protection switching features as discussed in to minimize the hardware complexity, in other words
Section 2.4. installation costs, traffic classes are assigned to optical
• Signal supervision capabilities can be based on channel groups accordingly. The optical channel groups
optical parameters only, such as optical power and are processed according to their specific characteristics.
optical signal-to-noise ratio (OSNR). Additionally, in Figure 6 illustrates the group separation (classes) of
opaque rings, signal supervision can be based on bit DWDM traffic by group multiplexer and demultiplexer
error ratio monitoring. components. The groups are processed within the add/drop
module using different techniques. Methods for defining
3.3. The DWDM Line Interfaces groups and related component technologies are discussed
The main functionality of the DWDM line interfaces in Section 4 based on technological boundary conditions.
includes, primarily the compensation for physical degra- For dynamic traffic, wavelength routing technologies,
dation of the DWDM signal during transmission and such as remotely controlled optical switching fabrics
secondarily the supervision of the DWDM signal. There or tunable filters, are used. For static traffic, manual
are three major categories of degradation: (1) loss of configuration via distribution frames (patch panels)
optical signal power due to attenuation, (2) dispersion is employed. Wavelength routing necessitates higher
of the impulse shape of high-speed optical signals, and installation costs than manual configuration. However,
(3) impulse distortion or neighbor channel crosstalk due it enables the reduction of operational costs due to
to nonlinear fiber-optical effects. To overcome these degra- automation. Operational costs can become prohibitive
dations, optical amplifiers and dispersion compensation for manual configuration if frequent reconfiguration is
components may be used. For metropolitan applications, necessary. A cost-optimal OADM processes part of the
moderate-gain amplifiers can be used to reduce the non- traffic via distribution frames and the other part via
linear fiber-optical effects, since the node distance is wavelength routing techniques. Hence, the use of costly
relatively short. Design approaches are mentioned in the components is focused on cases where they benefit
references [2,10]. In the line interfaces, signal supervision most [11]. Designing an OADM for a specific add/drop
is based on optical parameters (e.g., optical power, OSNR). capacity is very efficient especially in the wavelength
routing part of the add/drop module. Figure 7 shows a
3.4. The Add/Drop Module possible architecture of a wavelength routing add/drop
The add/drop module performs the following major matrix.
functions: The add/drop process of the dynamic traffic can
be performed in two stages: the add/drop stage and
• It configures the add/drop of up to 100% of the ring the distribution stage. In order to perform wavelength
traffic. conversion, wavelength converter arrays (WCA) can
• It converts the wavelengths of the add/drop and the be used in addition. The add/drop stage is the only
through signals (optional). requirement. The distribution stage and the wavelength
• It protects the traffic on the OMS or OCh level converters are optional.
(optional). The add/drop-stage filters the drop channels out of
the DWDM signal and adds new channels to the ring.
3.4.1. Optical Channel Groups. Optical channels can Single optical channels are typically processed. But it is
be classified as dynamic, static, or express traffic. In order also possible to add or to drop optical channel groups to
Express channels
Patch panel
Dynamic
traffic Wavelength routing
add/drop matrix
Tributary interfaces
OCh OCh
group group
(De-) Add/drop stage (De-)
Mux Mux
... add/drop ports ...
Wavelength
converter array
Hair pinning
Wavelength conversion
Tributary interfaces
a single port. Various technologies for realizing add/drop element of a specific wavelength. This function can
stages will be discussed in Section 4.2. be part of the tributary interfaces as well.
The wavelength converter array can be used for wave-
length assignment of add channels from the tributary Techniques of switching fabrics to perform these functions
interfaces and for wavelength conversion of through chan- are mentioned in Section 4.3.
nels. Technologies are discussed in Sections 4.3 and 4.4. 3.5. The Tributary Interfaces
The distribution stage performs the following functions:
The tributary interfaces prepare the incoming signals
1. Assignment of Channels to Specific Tributary for transmission over the DWDM ring. Incoming signals
Interfaces. Not every tributary signal must be do not necessarily have DWDM quality. Furthermore,
connected to the ring at anytime. There may they monitor the quality of the incoming and outgoing
be ‘‘part-time’’ leased lines that share the same tributary signals. It is a strong customer requirement to
wavelength channel such as one at business hours support a large variety of signal formats and data rates as
other one at nighttime or at weekends. For example, tributary signals. In the simplest case the tributary signal
one customer uses the wavelength channel at the is directly forwarded to the add/drop module (transparent
distribution stage connects tributary signals to OADM). In this case, the incoming optical signal must
specific add channels for insertion into the ring. have the correct wavelength to fit into the DWDM line
For extraction, it connects them to specific drop signal; otherwise, the signal has to enter the ring via
channels. wavelength converters, either as part of the tributary
interface or via the WCA of the add/drop module. The
2. Wavelength Conversion for the Through Chan-
simplest way of signal supervision is monitoring the
nels. Dropped channels can be reinserted on another
optical power (loss of signal) of the incoming and outgoing
wavelength via the link ‘‘wavelength conversion’’ in
tributary signals. A more advanced approach in the optical
Fig. 7. A WCA is required for this application.
domain is the measurement of the optical signal-to-noise
3. Hairpinning. In some applications, add/drop multi- ratio. If digital signal quality supervision (i.e., bit error
plexers are used to route signals between tributary ratio measurement) is needed, 3R regeneration has to be
interfaces. This function is known as ‘‘hairpinning.’’ performed. This normally requires client signal-specific
It can be realized by the connection as shown in processing using dedicated tributary ports for different
Fig. 7. client formats, such as SONET or Gigabit Ethernet.
4. Flexible Wavelength Assignment for the Add Chan-
nels. Besides wavelength conversion, the WCA can 4. OADM COMPONENTS
be used to assign a specific wavelength to an incom-
ing tributary signal. The distribution stage allows This section focuses on component technologies of the
for flexible assignment of tributary signals to a WCA add/drop module and the tributary interfaces.
DWDM RING NETWORKS 755
4.1. Group Filters and Multiplexer/Demultiplexer technologies such as multilayer dielectric thin-film fil-
Components ter (TFF), fixed fiber bragg grating (FBG), diffraction
grating, arrayed waveguide grating (AWG), cascaded
A multiplexer/demultiplexer in the sense of a WDM com- Mach–Zehnder interferometer, and Fabry–Perot interfer-
ponent is a device that provides one optical fiber port for ometer. A detailed discussion of these technologies is given
DWDM signals and multiple fiber ports for optical chan- in Section 3.3 of Ref. 2. Interleaver filter technologies are
nels or channel groups. These multiplexers/demultiplexers described in Refs. 12 and 13.
are passive components that work in both directions.
They can be used for either separating DWDM signals 4.2. Add/Drop Filter Technologies
or for combining optical channels to one DWDM signal. Figure 9 shows two popular realization concepts of
Group filters and WDM multiplexers/demultiplexers are add/drop filters: (left) wavelength multiplexer/demultipl-
basically the same type of component. They just have exer — optical switch combination and (right) tunable filter
different optical filter characteristics. A WDM multi- cascade — circulator/coupler combination.
plexer/demultiplexer combines or separates single optical The multiplexer/demultiplexer–optical switch solution
channels whereas a group filter combines or separates opti- usually demultiplexes the entire DWDM signal and
cal channel groups. Figure 8 shows three basic realizations provides single optical channels via switches at each
of wavelength multiplexer/demultiplexer filters. add/drop port. The tunable filter cascade (Fig. 9, right)
A basic advantage of block channel groups as opposed provides exactly the same functionality. The incoming
to interleaved optical channel groups is the stronger DWDM signal is forwarded clockwise to the tunable filter
suppression of neighbor group signals. Nevertheless, we cascade by a circulator. The tunable filters reflect selected
will see in Section 4.2 that there are applications in channels. All other channels pass the filter cascade.
which interleaved channel groups support the overall The reflected channels travel back to the circulator to
system performance better than channel blocks. Static be forwarded to the drop port, again in a clockwise
multiplexer/demultiplexer filters are usually based on motion [14]. For adding optical channels, either a coupler
λ1 λ 1 ... 4 λ 1, 5, 9...
λ2 λ 5 ... 8 λ 2, 6, 10...
λ 1 ... λ N λ 1 ... λ N λ 1 ... λN
...
...
λN λ N -3 ... N λ N /4 ... N
Circulator Coupler
...
...
... ...
or a circulator can be used. A coupler provides better technology is given in Ref. 18. The optical receiver and
isolation between the add ports and the drop ports. It is transmitter components can be connected via analog
also the least expensive approach. On the other hand, electrical amplifiers allowing for the transmission of any
circulators allow for a lower transmission loss. Here, signal format up to a certain bit rate. If 3R regeneration
the advantage of interleaved channel groups becomes is performed (digital signal processing between receiver
evident: the bandwidth requirements for tunable filters and transmitter), a further limitation to certain signal
are lower. In tunable filter cascades, the reconfiguration formats may apply. In future transparent systems, the
process can affect uninvolved traffic. For example, if use of all-optical wavelength converters is expected. All-
a filter is tuned from wavelength 1 to wavelength 4, it optical wavelength converters make use of nonlinear
passes the wavelengths 2 and 3. Therefore, traffic running optical effects such as cross-gain modulation (CGM), cross
on these channels is affected by the retuning process phase modulation (XPM) or four-wave mixing (FWM).
and thus may render it unusable. The influence of the These effects are treated in detail in Ref. 19, Section 2.7.
dispersion of the tunable filters is strong compared to the Semiconductor optical amplifiers (SOA) are the preferred
multiplexer/demultiplexer–optical switch solution. This active medium as they exhibit strong nonlinearity, wide
is another drawback of tunable filter cascades. If tunable gain bandwidth and easy integration. As a pump source,
filters are tuned to a neutral park position between two unmodulated lasers are used. Tunable pump lasers
densely spaced channels, these channels can be degraded provide the capability of tunable all-optical wavelength
by the dispersion of the filter. This design issue can be converters.
overcome by using interleaved channel groups. Wider
spacing between the channels lowers the bandwidth 5. SUMMARY
requirements for the filters, and thus resulting in a lower
dispersion. The basic advantage of tunable add/drop filters DWDM ring networks provide high capacity, high
is the lower number of internal fiber connections and flexibility (multiservice integration), and high reliability
splices. However, because of the typical characteristics of (protection) at low operational costs to the operators
fiber gratings (e.g., cladding modes), this application is of metropolitan and regional networks. Because of
limited to small and medium size channel groups. A more the simple ring topology, the network management is
detailed comparison of both concepts is given in Ref. 15. relatively less complex. The ring functions are determined
mainly by the optical add/drop multiplexers. The DWDM
4.3. Optical Switching Fabrics line interface mainly determines the maximum ring
circumference that can be achieved. An add/drop module,
Optical switching fabrics provide cross–connection func-
that provides manual routing capabilities, allows for
tionality. They are the technology of choice in distribution
low installation costs. But if frequent reconfiguration is
stages of optical add/drop multiplexers (see Fig. 7). From
necessary, operational costs can become prohibitive. For
the viewpoint of OADM optical signal processing, we can
this type of traffic, wavelength routing capabilities that
distinguish between protection switching and wavelength
provide remotely controlled dynamic routing should be
routing applications.
implemented. The technologies of choice for wavelength
Protection switching requires switching times of a few
routing are integrated optical switching fabrics such as
milliseconds. Switching times up to hundreds of millisec-
MEMS and tunable filters.
onds are sufficient for wavelength routing applications.
DWDM networks can either be transparent or opaque.
For protection switching and wavelength routing appli-
In the transparent realization, no electronic traffic
cations, electromechanical, thermooptic, electrooptic, and
processing occurs within the ring. The transport is
acoustooptic switches can be used, see Ref. 2, Section 3.7
independent of the data format. In opaque networks, for
and Ref. 16, Chap. 10. Today, among the electromechani-
quality-of-service supervision and management reasons,
cal switches, the MEMS technology (microelectromechan-
digital (electronic) processing is performed at the borders
ical system) is seen as the most promising technology
of the DWDM ring network. The transport in opaque
in providing high port counts. Switches of more than
networks may be limited to certain data formats.
100 × 100 ports have been realized.
The business success of network operation is driven
Tunable filters, as discussed in Section 4.2, are also
mainly by an overall cost minimization by selling data
an alternative technology to switching fabrics in order
transport in high volume. This is a general rule in markets
to realize cross–connection functionality. Optical circuits
that are characterized by low differentiation possibilities
using tunable filters and circulators have been presented
and high impact of small differences (volume markets). In
by Chen and Lee [17].
the network operation business, differentiation is possible
by better reliability, availability and flexibility. Therefore,
4.4. Wavelength Converter Technologies DWDM ring networks are ideal in paving the way for
Today, wavelength conversion is performed by O/E/O the business success of metropolitan and regional network
(optical/electrical/optical) conversion using photodiodes operators.
and tunable or fixed-wavelength lasers that are either
modulated externally or internally (directly). Tunable BIOGRAPHIES
lasers allow for covering a wide range of wavelengths
and to minimize the number of wavelength converters Detlef Stoll received a Dipl.-Ing. degree (M.S.) in com-
that are needed. An overview about tunable laser munication engineering in 1988 from the University of
DWDM RING NETWORKS 757
Hannover, Germany, and a Ph.D. degree in electrical Inc. located in Boca Raton, Florida. He participates in
engineering in 1993 from the University of Paderborn, Ger- the OIF standardization activities for Siemens. His areas
many. The subject of his Ph.D. thesis was the derivation of interest are the new technologies of components and
of an integrated model for the calculation of the nonlinear modules for optical networks.
transmission in single-mode optical fibers including all
linear and nonlinear effects. He joined the Information &
Communication Networks Division of the Siemens Cor- BIBLIOGRAPHY
poration in 1994 as a research & development engineer.
At Siemens, he worked on the advance development of 1. M. Sexton and A. Reid, Broadband Networking: ATM, SDH,
broadband radio access networks and on the development and SONET, Artech House, Norwood, MA, 1997.
of SONET/SDH systems. From 1998 to 2000 he led the 2. R. Ramaswami and K. N. Sivarajan, Optical Networks, Mor-
advance development of a Wavelength Routing Metropoli- gan Kaufmann, San Francisco, CA, 1998.
tan DWDM Ring Network for a multi-vendor field trial.
3. P. Kotler, Marketing Management, Prentice-Hall, Englewood
Since 2000 he is managing the development of optical
Cliffs, NJ, 1999.
network solutions at Optisphere Networks, Inc. located in
Boca Raton, Florida. Dr. Stoll has filed approximately 20 4. P. Arijs, M. Gryseels, and P. Demeester, Planning of WDM
patents and published numerous articles in the field of ring networks, Photon. Network Commun. 1: 33–51 (2000).
optical networks and forward error correction techniques. 5. ITU-T Recommendation G.872, Architecture of the Opti-
His areas of interest are linear and nonlinear systems and cal Transport Networks, International Telecommunication
wavelength routing networks. Union, Geneva, Switzerland, Feb. 1999.
6. ITU-T Recommendation G.709, Interface for the Optical
Juergen Heiles received a Dipl.-Ing. (FH) degree in Transport Network (OTN), International Telecommunication
Union, Geneva, Switzerland, Oct. 2001.
electrical engineering from the University of Rhineland-
Palatinate, Koblenz, Germany, in 1986. He joined the 7. I. S. Reed and X. Chen, Error-Control Coding for Data
Public Networks Division of the Siemens Corporation, Networks, Kluwer, Norwell, MA, 1999.
Munich, Germany, in 1986 as a research & development 8. ITU-T Recommendation G.841, Types and Characteristics
engineer of satellite communication systems and, later of SDH Network Protection Architectures, International
on, fiber optic communication systems. Since 1998 he has Telecommunication Union, Geneva, Switzerland, Oct. 1998.
been responsible for the standardization activities of the 9. GR-1230-CORE, SONET Bidirectional Line-Switched Ring
Siemens Business Unit Optical Networks. He participates Equipment Generic Criteria, Telcordia, 1998.
in the ITU, T1, OIF and IETF on SONET/SDH, OTN, 10. P. C. Becker, N. A. Olsson, and J. R. Simpson, Erbium-Doped
ASON, and GMPLS and is editor or coauthor of several Fiber Amplifiers, Fundamentals and Technology, Academic
standard documents. Juergen Heiles has over 15 years Press, San Diego, CA, 1997.
of experience in the design, development, and system
11. D. Stoll, P. Leisching, H. Bock, and A. Richter, Best effort
engineering of optical communication systems starting lambda routing by cost optimized optical add/drop multi-
from PDH systems over the first SONET/SDH systems plexers and cross-connects, Proc. NFOEC, Baltimore, MD,
to today’s DWDM networks. He holds three patents Session A-1, July 10, 2001.
in the areas of digital signal processing for optical
12. H. van de Stadt and J. M. Muller, Multimirror Fabry-Perot
communication systems. His areas of interest are network interferometers, J. Opt. Soc. Am. A 8: 1363–1370 (1985).
architectures and the functional modeling of optical
networks. 13. B. B. Dingel and M. Izutsu, Multifunction optical filter with
a Michelson-Gires-Tournois interferometer for wavelength-
division-multiplexed network system applications, Opt. Lett.
Jimin Xie received a B.S. degree in electronics in 14: 1099–1101 (1998).
1986 from the Institute of Technology in Nanjing,
14. U.S. Patent 5,748,349 (1998), V. Mizrahi, Gratings-based
China, and an M.S. degree in radio communications in optical add-drop multiplexers for WDM optical communi-
1988 from the Ecole Superieure d’Electricite in Paris, cation systems.
France, and a Ph.D. degree in opto-electronics in 1993
15. D. Stoll, P. Leisching, H. Bock, and A. Richter, Metropolitan
from the Université Paris-Sud, France. He worked on
DWDM: A dynamically configurable ring for the KomNet field
submarine soliton transmission systems for Alcatel in trial in Berlin, IEEE Commun. Mag. 2: 106–113 (2001).
France from 1993 to 1995. In 1996, he worked at
16. I. P. Kaminow and T. L. Koch, Optical Fiber Telecommunica-
JDS Uniphase as a group leader in the Strategy and
tions III B, Academic Press, Boston, MA, 1997.
Research Department in Ottawa, Canada. At JDSU, he
worked on the design and development of passive DWDM 17. Y.-K. Chen and C.-C. Lee, Fiber Bragg grating based
components, such as interleaver filters, DWDM filters, large nonblocking multiwavelength cross-connects, IEEE J.
Lightwave Technol. 16: 1746–1756 (1998).
OADM modules, polarization detectors and controllers,
dispersion compensators, gain flatteners, optical switches, 18. E. Kapon, P. L. Kelley, I. P. Kaminow, and G. P. Agrawal,
etc. Dr. Xie has filed 12 patents in the field of optical Semiconductor Lasers II, Academic Press, Boston, MA, 1999.
components. Since 2001, he has been the manager of the 19. G. P. Agrawal, Fiber-optic Communication Systems, Wiley,
Optical Technologies department at Optisphere Networks, New York, NY, 1998.
E
EXTREMELY LOW FREQUENCY (ELF) with submarines [7]. Unlike many other radiobands, there
ELECTROMAGNETIC WAVE PROPAGATION are strong natural sources of ELF electromagnetic waves.
The strongest of these in most locations on the earth
STEVEN CUMMER is lightning, but at high latitudes natural emissions from
Duke University processes in the magnetosphere (the highest portions of the
Durham, North Carolina earth’s atmosphere) can also be strong. There are a variety
of scientific and geophysical applications that rely on either
natural or man-made ELF waves, including subsurface
1. INTRODUCTION geologic exploration [8], ionospheric remote sensing [9],
and lightning remote sensing [10].
Extremely low frequency (ELF) electromagnetic waves
are currently the lowest frequency waves used routinely
for wireless communication. The IEEE standard radio 2. THE NATURE OF ELF PROPAGATION
frequency spectrum defines the ELF band from 3 Hz
to 3 kHz [1], although the acronym ELF is often used Communication system performance analysis and most
somewhat loosely with band limits near these official geophysical applications require accurate, quantitative
boundaries. Because of the low frequencies involved, models of ELF propagation. Because the ELF fields
the bandwidth available for ELF communication is very are confined to a region comparable to or smaller than
small, and the data rate of correspondingly low. Existing a wavelength, phenomena in ELF transmission and
ELF communications systems with signal frequencies propagation are substantially different from those at
between 40 and 80 Hz transmit only a few bits per higher frequencies. But before delving too deeply into the
minute [2]. Despite this severe limitation, ELF waves have relevant mathematics, a qualitative description will give
many unique and desirable properties because of their substantial insight into the physics of ELF transmission,
low frequencies. These properties enable communication propagation, and reception.
under conditions where higher frequencies are simply not
usable. Like all electromagnetic waves excited on the 2.1. Transmission
ground in the HF (3–30 MHz) and lower bands, ELF A fundamental limit on essentially all antennas that
waves are strongly reflected by the ground and by the radiate electromagnetic waves is that, for maximum
Earth’s ionosphere, the electrically conducting portion of efficiency, their physical size must be a significant fraction
the atmosphere above roughly 60 km altitude [3]. The of the wavelength of the radiated energy [11]. At ELF, this
ground and ionosphere bound a spherical region commonly is a major practical hurdle because of the tremendously
referred to as the earth–ionosphere waveguide in which long wavelengths (3000 km at 100 Hz) of the radiated
ELF and VLF (very low frequency, 3–30 kHz) propagate. waves. Any practical ELF transmission system will be very
The combination of strongly reflecting boundaries and small relative to the radiated wavelength. Very large and
long ELF wavelengths (3000 km at 100 Hz) enables ELF very carefully located systems are required to radiate ELF
waves to propagate extremely long distances with minimal energy with sufficient power for communication purposes,
attenuation. Measured ELF attenuation rates (defined as and even then the antenna will still be very inefficient.
the signal attenuation rate in excess of the unavoidable Multikilometer vertical antennas are not currently
geometric spreading of the wave energy) are typically practical. Thus a long horizontal wire with buried ends
only ∼1 dB per 1000 km at 70–80 Hz [4–6]. The signal forms the aboveground portion of existing ELF antennas.
from a single ELF transmitter can, in fact, be received The aboveground wire is effectively one half of a magnetic
almost everywhere on the surface of the planet, even loop antenna, with the other half formed by the currents
though only a few watts of power are radiated by closing below ground through the earth, as shown in Fig. 1.
existing ELF systems. Because the wavelength at ELF The equivalent depth deq of the closing current with fre-
is so long, ELF antennas are very inefficient and it quency ω over a homogeneous ground of conductivity σg
takes a very large antenna (wires tens of kilometers is [12]
long) to radiate just a few watts of ELF energy. This deq = (ωσg µ0 )−1/2 = 2−1/2 δ (1)
makes ELF transmitters rather large and expensive.
Only two ELF transmitters currently operate in the where δ is the skin depth [13] of the fields in the conducting
world, one in the United States and one in Russia. ground. A grounded horizontal wire antenna of length
Besides their global propagation, the value of ELF waves l is thus equivalent to an electrically small magnetic
results from their ability to penetrate effectively through loop of area 2−1/2 δl. This effective antenna area, which is
electrically conducting materials, such as seawater and linearly related to the radiated field strength, is inversely
rock, that higher frequency waves cannot penetrate. These proportional to the ground conductivity. This implies that
properties make ELF waves indispensable for applications a poorly conducting ground forces a deeper closing current
that require long-distance propagation and penetration and therefore increases the efficiency of the antenna.
through conducting materials, such as communication This general effect is opposite that for vertically oriented
758
EXTREMELY LOW FREQUENCY (ELF) ELECTROMAGNETIC WAVE PROPAGATION 759
The low attenuation of ELF waves propagating on involved in towing a long wire behind a moving submarine
the surface of a sphere also produces a number of to act as an ELF antenna [23].
interesting propagation effects. As an ELF receiver Designing an antenna preamplifier for an ELF system
approaches the point on the earth directly opposite the is also not trivial, especially if precise signal amplitude
transmitter (the antipode), the signals arriving from all calibration is needed. Because the output reactance of
angles are comparable in amplitude, and therefore all an electrically short wire antenna is very large, the
contribute to the electromagnetic fields. This leads to so- preamplifier in such a system must have a very high input
called antipodal focusing, which has been theoretically impedance [12]. And even though the output impedance
investigated [12,19,20] but was experimentally verified of a small loop antenna is very small, system noise
only in 1998 [16]. issues usually force a design involving a stepup voltage
At frequencies low enough that the attenuation of transformer and a low-input-impedance preamplifier [24].
a fully around-the-world signal is not too severe, the Another practical difficulty with ELF receivers is that in
multi-round-trip signals self-interfere, producing cavity many locations, electromagnetic fields from power lines
resonances of the earth–ionosphere shell. These reso- (60 Hz and harmonics in the United States, 50 Hz and
nances are commonly called the Schumann resonances, harmonics in Europe and Japan) are much stronger
after W. O. Schumann who first predicted them theoret- than the desired signal. Narrowband ELF receivers must
ically [21]. They can often be observed as peaks in the carefully filter this noise, while ELF broadband receivers,
broadband background ELF fields in an electromagneti- usually used for scientific purposes, need to be located far
cally quiet location and are generated primarily by steady from civilization to minimize the power line noise.
lightning activity around the globe [22]. The frequencies The frequencies of ELF and VLF electromagnetic
of the first three resonances are approximately 8, 14, and waves happen to overlap with the frequencies of audible
20 Hz [22], and in a low-noise location they can be observed acoustic waves. By simply connecting the output of an
up to at least 7 orders [16]. ELF sensor directly to a loudspeaker, one can ‘‘listen’’ to
Wave propagation in the earth–ionosphere waveguide, the ELF and VLF electromagnetic environment. Besides
like that in a simple parallel-plate waveguide, can be single-frequency signals from ELF transmitters and short,
very compactly described mathematically by a sum of broadband pulses radiated by lightning, one might hear
discrete waveguide modes that propagate independently more exotic ELF emissions such as whistlers [25], which
within the boundaries. The mathematical details of the are discussed below, and chorus [26].
mode theory of waveguide propagation are discussed in
later sections. An important consequence of this theory is 3. APPLICATIONS OF ELF WAVES
that frequencies with a wavelength greater than half the
waveguide height can propagate in only a single mode. For The unique properties of ELF electromagnetic waves
the earth–ionosphere waveguide, propagation is single propagating between the earth and the ionosphere,
mode at frequencies less than approximately 1.5–2.0 kHz, specifically their low attenuation with distance and their
depending on the specific ionospheric conditions. This relatively deep penetration into conducting materials,
suggests a propagation-based definition of ELF, which make them valuable in a variety of communication and
is sometimes used in the scientific literature, as the scientific applications discussed below.
frequencies that propagate with only a single mode in the
earth–ionosphere waveguide (i.e., f 2 kHz) and those for 3.1. Submarine Communication
which multiple paths around the earth are not important The primary task of existing ELF communications systems
except near the antipode (i.e., f 50 Hz, above Schumann is communication with distant submerged submarines.
resonance frequencies). The ability to send information to a very distant submarine
(thousands of kilometers away) without requiring it
2.3. Reception to surface (and therefore disclose its position) or even
rise to a depth where its wake may be detectable
Receiving or detecting ELF electromagnetic waves is on the surface is of substantial military importance.
substantially simpler than transmitting them. Because The chief difficulty in this, however, is in transmitting
of the long wavelength at ELF, any practical receiving electromagnetic waves into highly conducting seawater.
antenna will be electrically small and the fields will be The amplitude of electromagnetic waves propagating in
spatially uniform over the receiving antenna aperture. An an electrically conducting material decays exponentially
ELF receiving antenna is thus usually made from either with distance. The distance over which waves attenuate
straight wires or loops as long or as large as possible. Both by a factor of e−1 is commonly referred to as the ‘‘skin
of these configurations are used as ELF receiving antennas depth.’’ This attenuation makes signal penetration beyond
in communication and scientific applications [12]. a few skin depths into any material essentially impossible.
Again, because of the long wavelength and low For an electromagnetic wave with angular frequency
transmitted field strength, the maximum practical length ω propagating in a material with conductivity σ and
possible for an electric wire antenna and the maximum permittivity ε, if σ/ωε 1, then the skin depth δ is very
area and number of turns for a magnetic loop antenna closely approximated by
are normally preferred to receive the maximum possible
−1/2
signal. How to deploy an antenna as big as possible is often 2
δ= (2)
an engineering challenge, as demonstrated by the issues ωσ µ0
EXTREMELY LOW FREQUENCY (ELF) ELECTROMAGNETIC WAVE PROPAGATION 761
propagation of the ELF signal (techniques for this are approaches taken by these researchers are fundamentally
described in the later sections of this article), important similar but treat the ionospheric boundary in different
parameters describing the current and charge transfer ways. A solution for a simplified ELF propagation problem
in the lightning can be quantitatively measured [10]. that provides significant insight into subionospheric ELF
Applications of this technique for lightning remote sensing propagation is summarized below, followed by a treatment
have led to discoveries about the strength of certain with minimal approximations that is capable of more
lightning processes [35] and have helped us understand accurate propagation predictions.
the kind of lightning responsible for a variety of the most
recently discovered effects in the mesosphere from strong 4.1. Simplified Waveguide Boundaries
lightning [36]. To understand the basic principles of ELF propagation,
A specific kind of natural ELF–VLF electromagnetic we first consider a simplified version of the problem.
emission led to the discovery in the 1950s that near-earth Consider a vertically oriented time-harmonic short electric
space is not empty but rather filled with ionized gas, or dipole source at a height z = zs between two flat, perfectly
plasma. A portion of the ELF and VLF electromagnetic conducting plates separated by a distance h, as shown in
wave energy launched by lightning discharges escapes Fig. 4. This simple approximation gives substantial insight
the ionosphere and, under certain conditions, propagates into the ELF propagation problem, which is harder to see
along magnetic field lines from one hemisphere of in more exact treatments of the problem. This simplified
earth to the other. These signals are called ‘‘whistlers’’ problem is, in fact, a reasonably accurate representation
because when the signal is played on a loudspeaker, of the true problem because at ELF the ionosphere and
the sound is a frequency-descending, whistling tone that ground are very good electrical conductors. The main
lasts around a second. Thorough reviews of the whistler factors neglected in this treatment are the curvature of
phenomenon can be found in Refs. 25 and 37. L. Storey, the earth and the complicated reflection from and losses
in groundbreaking research, identified lightning as the generated in a realistic ionosphere.
source of whistlers and realized that the slow decrease in Wait [19] solved this problem analytically and showed
frequency with time in the signal could be explained if the that the vertical electric field produced by this short
ELF–VLF waves propagated over a long, high-altitude electric dipole source, as a function of height z and distance
path through an ionized medium [38]. The presence of r from the source, is given by
plasma in most of near-earth space was later confirmed
µ0 ω I dl
with direct measurements from satellites. The study of ∞
natural ELF waves remains an area of active research, Ez (r, z) = δn S2n H0(2) (kSn r)
4h
including emissions observable on the ground at high n=0
latitudes [26,39] and emissions observed directly in space × cos(kCn Z0 ) cos(kCn z) (3)
on satellites [e.g., 40] and rockets [e.g., 41].
An interesting modern ELF-related research area is where ω = 2π × source frequency
ionospheric heating. A high-power HF electromagnetic k = ω/c
wave launched upward into the ionosphere can nonlinearly Idl = source current × dipole length
interact with the medium and modify the electrical δ0 = 12 , δn = 1, n ≥ 1
characteristics of the ionosphere [42,43]. By modulating Cn = nλ/2h = cosine of the eigenangle θn of the
the HF wave at a frequency in the ELF band, and nth waveguide mode
by doing so at high latitudes where strong ionospheric Sn = (1 − C2n )1/2 = sine of the eigenangle θn of the
electric currents routinely flow [44], these ionospheric nth waveguide mode
currents can be modulated, forming a high-altitude ELF H0(2) = Hankel function of zero order and
antenna [45]. The ELF signals launched by this novel second kind
antenna could be used for any of the applications discussed
If |kSn /r| 1, the Hankel function can be
above. Currently, a major research project, the High-
replaced by its asymptotic expansion H0(2) (kSn r) =
frequency Auroral Active Research Program (HAARP),
is devoted to developing such a system in Alaska [46].
Perfect conductor
4. MATHEMATICAL MODELING OF ELF PROPAGATION
Short
Accurate mathematical modeling of ELF propagation is z vertical Vacuum
needed for many of the applications described in the antenna
h
previous sections. It is also complicated because of the r (z,r )
influence of the inhomogeneous and anisotropic ionosphere
on the propagation. Some of the best-known researchers zs
in electromagnetic theory (most notably J. Wait and K.
Budden) worked on and solved the problem of ELF and Perfect conductor
VLF propagation in the earth–ionosphere waveguide. This Figure 4. A simplified version of the earth–ionosphere wave-
work is described in many scientific articles [e.g., 47–51] guide propagation problem. A slab of free space containing
and technical reports [e.g., 52,53], and is also conveniently a vertical antenna is bounded above and below by perfectly
and thoroughly summarized in a few books [19,20,54]. The conducting planes.
EXTREMELY LOW FREQUENCY (ELF) ELECTROMAGNETIC WAVE PROPAGATION 763
Altitude (km)
from the solution for Ez [19]. In general, all the field 120
Midnight
components in a single mode vary sinusoidally with Midday
altitude; thus the altitude variation of the fields is also 80
quite complicated in the case of multimode propagation. Positive
40
The fields produced by a horizontal electric dipole, such as Electrons ions
the ELF antenna described above for submarine signaling,
0
are a similar modal series containing different field 100 101 102 103 104 105 106
components [19]. Electron and ion density (cm−3)
Altitude (km)
propagation, it is rarely accurate enough for quantita- 120
Electrons
tively correct simulations of earth–ionosphere waveguide
propagation. The real ionosphere is not a sharp interface Ions
80
like that assumed above, but is a smoothly varying waveg-
uide boundary. Nor is it a simple electrical conductor as 40
assumed; it is a magnetized cold plasma with complicated
electromagnetic properties. The key ionospheric parame- 0
ters for electromagnetic wave propagation and reflection 102 104 106 108 1010 1012
are the concentration of free electrons and ions (electron Collision frequency (sec−1)
and ion density), the rate of collisions for electrons and Figure 6. Representative midday and midnight ionospheric
ions with other atmospheric molecules (collision frequen- electron density, ion density, and collision frequency profiles.
cies), and the strength and direction of Earth’s magnetic Negative ions are also present where needed to maintain charge
field. Figure 6 shows representative altitude profiles of neutrality.
electron and ion density for midday and midnight and
profiles of electron and ion collision frequencies. The index
of refraction of a cold plasma like the ionosphere is a com- Ionospheric boundary
plicated function of all of these parameters [58], in which R = Ri (f, qinc)
is therefore difficult to handle in analytic calculations.
Also, in general, the ground is not a perfect conductor and Short
may even be composed of layers with different electromag- vertical
Vacuum
netic properties, such as an ice sheet on top of ordinary antenna
ground. These realistic boundaries cannot be treated as h (z,r )
z
homogeneous and sharp.
r zs
Both Budden [49] and Wait [19] derived solutions for
the earth–ionosphere waveguide problem with arbitrary
Ground boundary
boundaries. Their solutions are fundamentally similar, R = Rg (f, qinc)
with slight differences in the derivation and how certain
complexities, such as the curved earth, are treated. We Figure 7. A generalized version of the earth–ionosphere waveg-
follow Wait’s treatment below. uide propagation problem. A slab of free space containing a
vertical antenna is bounded above and below by completely arbi-
trary reflecting boundaries.
4.2.1. Propagation Modeling with General Bound-
aries. We again consider propagation inside a free-space
region between two boundaries separated by a distance h.
Now, however, the boundaries at z = 0 and z = h are com- plasma and thus is anisotropic, these field groups are
pletely general and are described only by their plane-wave coupled at the upper boundary and an incident TE or TM
reflection coefficients, as shown in Fig. 7. These reflec- wave produces both TE and TM reflections. Purely TE or
tion coefficients are functions of incidence angle, incident TM fields do not exist in the earth–ionosphere waveguide.
polarization, and frequency. This important generalization The coupling between the fields means that the reflection
makes the following formulation applicable to essentially coefficient from the upper boundary is not a scalar quantity
any two-dimensional planar waveguide, regardless of size but is rather a 2 × 2 matrix, where each matrix element is
or bounding material. one of the four different reflection coefficients for a specific
Ordinarily, the electromagnetic fields can be separated incident and reflected polarization. The lower boundary
into transverse electric (TE) and transverse magnetic (TM) of the earth–ionosphere waveguide is the ground, which
groups that propagate independently in a two-dimensional is generally isotropic so that the cross-polarized reflection
waveguide. However, since the ionosphere is a magnetized coefficients in the ground reflection matrix are zero.
EXTREMELY LOW FREQUENCY (ELF) ELECTROMAGNETIC WAVE PROPAGATION 765
With this in mind, we define RI , the reflection matrix From Eqs. (5)–(7), an explicit expression for the
of the ionosphere at altitude z = h, and RG , the reflection Hertz vectors in the free-space region between the two
matrix of the ground at z = 0, as boundaries is given by
i i g Uz ikIdl exp 2ikCn h
|| R|| || R⊥ || R|| 0 =
RI (θ ) = i i RG (θ ) = g (4) Vz 4ωε0 n
⊥ R|| ⊥ R⊥ 0 ⊥ R⊥ ∂/∂C
θ =θn
in 2000 and a Presidential Early Career Award for 21. W. O. Schumann, Über die strahlungslosen eigenschingun-
Scientists and Engineers (PECASE) in 2001. His current gen einer leitenden kugel, diel von einer luftschicht und einer
research is in a variety of problems in ionospheric ionosphärenhülle umgeben ist, Z. Naturforsch. 7a: 149 (1952).
and space physics, emphasizing electromagnetic modeling 22. D. D. Sentman, Schumann resonances, in H. Volland, ed.,
and remote sensing using ground-based and satellite Handbook of Atmospheric Electrodynamics, Vol. 1, CRC
instruments. Press, Boca Raton, FL, 1995, pp. 267–295.
23. C. T. Fessenden and D. H. S. Cheng, Development of a
trailing-wire E-field submarine antenna for extremely low
BIBLIOGRAPHY frequency (ELF) reception, IEEE Trans. Commun. 22:
428–437 (1974).
1. J. Radatz, ed., The IEEE Standard Dictionary of Electrical 24. C. D. Motchenbacher and J. A. Connelly, Low Noise Elec-
and Electronics Terms, IEEE, New York, 1997. tronic System Design, Wiley-Interscience, Englewood Cliffs
NJ, 1993.
2. N. Friedman, The Naval Institute Guide to World Naval
Weapons Systems, U.S. Naval Institute, Annapolis, MD, 1997. 25. V. S. Sonwalkar, Whistlers, in J. G. Webster, ed., Wiley
Encyclopedia of Electrical and Electronic Engineering, pages
3. K. Davies, Ionospheric Radio, Peter Peregrinus, London,
1990. Wiley, New York, 1999, pp. 580–591.
4. W. L. Taylor and K. Sao, ELF attenuation rates and 26. S. S. Sazhin and M. Hayakawa, Magnetospheric chorus
phase velocities observed from slow tail components of emissions — a review, Planet. Space Sci. 50: 681–697 (May
atmospherics, Radio Sci. 5: 1453–1460 (1970). 1992).
5. D. P. White and D. K. Willim, Propagation measurements 27. J. N. Murphy and H. E. Parkinson, Underground mine com-
in the extremely low frequency (ELF) band, IEEE Trans. munications, Proc. IEEE 66: 26–50 (1978).
Commun. 22(4): 457–467 (April 1974). 28. E. R. Swanson, Omega, Proc. IEEE 71(10): 1140–1155 (1983).
6. P. R. Bannister, Far-field extremely low frequency (ELF) 29. G. M. Hoversten, Papua new guinea MT: Looking where
propagation measurements, IEEE Trans. Commun. 22(4): seismic is blind, Geophys. Prospect. 44(6): 935–961 (1996).
468–473 (1974).
30. H. G. Hughes, R. J. Gallenberger, and R. A. Pappert, Evalu-
7. T. A. Heppenheimer, Signalling subs, Popular Sci. 230(4): ation of nighttime exponential ionospheric models using VLF
44–48 (1987). atmospherics, Radio Sci. 9: 1109 (1974).
8. K. Vozoff, The magnetotelluric method, in M. Nabighian, 31. S. A. Cummer, U. S. Inan, and T. F. Bell, Ionospheric D
ed., Electromagnetic Methods in Applied Geophysics, Vol. 2, region remote sensing using VLF radio atmospherics, Radio
Society of Exploration Geophysics, Tulsa, OK, 1991, Sci. 33(6): 1781–1792 (Nov. 1998).
pp. 641–711.
32. S. A. Cummer, Modeling electromagnetic propagation in the
9. S. A. Cummer and U. S. Inan, Ionospheric E region remote earth-ionosphere waveguide, IEEE Trans. Antennas Propag.
sensing with ELF radio atmospherics, Radio Sci. 35: 1437 48: 1420 (2000).
(2000).
33. J. V. Evans, Theory and practice of ionosphere study by
10. S. A. Cummer and U. S. Inan, Modeling ELF radio atmo- thomson scatter radar, Proc. IEEE 57: 496 (1969).
spheric propagation and extracting lightning currents from
ELF observations, Radio Sci. 35: 385–394 (2000). 34. J. D. Mathews, J. K. Breakall, and S. Ganguly, The measure-
ment of diurnal variations of electron concentration in the
11. J. D. Kraus, Antennas, McGraw-Hill, New York, 1988.
60–100 km, J. Atmos. Terr. Phys. 44: 441 (1982).
12. M. L. Burrows, ELF Communications Antennas, Peter Pere-
35. S. A. Cummer and M. Füllekrug, Unusually intense continu-
grinus, Herts, UK, 1978.
ing current in lightning causes delayed mesospheric break-
13. U. Inan and A. Inan, Engineering Electromagnetics, Prentice- down, Geophys. Res. Lett. 28: 495 (2001).
Hall, Englewood Cliffs, NJ, 1999.
36. S. A. Cummer and M. Stanley, Submillisecond resolution
14. P. R. Bannister, Summary of the Wisconsin test facility lightning currents and sprite development: Observations and
effective earth conductivity measurements, Radio Sci. 11: implications, Geophys. Res. Lett. 26(20): 3205–3208 (Oct.
405–411 (1976). 1999).
15. J. C. Kim and E. I. Muehldorf, Naval Shipboard Communi- 37. R. A. Helliwell, Whistlers and Related Ionospheric Phenom-
cations Systems, Prentice-Hall, Englewood Cliffs, NJ, 1995. ena, Stanford Univ. Press, Stanford, CA, 1965.
16. A. C. Fraser-Smith and P. R. Bannister, Reception of ELF 38. L. R. O. Storey, An investigation of whistling atmospherics,
signals at antipodal distances, Radio Sci. 33: 83–88 (1998). Phil. Trans. Roy. Soc. London, Ser. A 246: 113 (1953).
17. D. R. MacGorman and W. D. Rust, The Electrical Nature of 39. A. J. Smith et al., Periodic and quasiperiodic ELF/VLF
Storms, Oxford Univ. Press, New York, 1998. emissions observed by an array of antarctic stations, J.
18. D. A. Chrissan and A. C. Fraser-Smith, Seasonal variations Geophys. Res. 103: 23611–23622 (1998).
of globally measured ELF/VLF radio noise, Radio Sci. 31(5): 40. P. Song et al., Properties of ELF emissions in the dayside
1141–1152 (Sept. 1996). magnetopause, J. Geophys. Res. 103: 26495–26506 (1998).
19. J. R. Wait, Electromagnetic Waves in Stratified Media, 41. P. M. Kintner, J. Franz, P. Schuck, and E. Klatt, Interfero-
Pergamon Press, Oxford, 1970. metric coherency determination of wavelength or what are
20. J. Galejs, Terrestrial Propagation of Long Electromagnetic broadband ELF waves? J. Geophys. Res. 105: 21237–21250
Waves, Pergamon Press, Oxford, 1972. (Sept. 2000).
EM ALGORITHM IN TELECOMMUNICATIONS 769
42. A. V. Gurevich, Nonlinear Phenomena in the Ionosphere, 63. J. R. Wait, Mode conversion and refraction effects in the
Springer-Verlag, Berlin, 1978. earth-ionosphere waveguide for VLF radio waves, J. Geophys.
43. K. Papadopoulos, Ionospheric modification by radio waves, in Res. 73(11): 3537–3548 (1968).
V. Stefan, ed., Nonlinear and Relativistic Effects in Plasmas, 64. R. A. Pappert and D. G. Morfitt, Theoretical and experimen-
American Institute of Physics, New York, 1992. tal sunrise mode conversion results at VLF, Radio Sci. 10:
44. A. D. Richmond and J. P. Thayer, Ionospheric electrodynam- 537 (1975).
ics: A tutorial, in S. Ohtani, ed., Magnetospheric Current 65. J. E. Bickel, J. A. Ferguson, and G. V. Stanley, Experimental
Systems, American Geophysical Union, Washington, DC, observation of magnetic field effects on VLF propagation at
2000, pp. 131–146. night, Radio Sci. 5: 19 (1970).
45. R. Barr, The generation of ELF and VLF radio waves in the 66. C. Greifinger and P. Greifinger, On the ionospheric param-
ionosphere using powerful HF transmitters, Adv. Space Res. eters which govern high-latitude ELF propagation in the
21(5): 677–687 (1998). earth-ionosphere waveguide, Radio Sci. 14(5): 889–895 (Sept.
46. G. M. Milikh, M. J. Freeman, and L. M. Duncan, First esti- 1979).
mates of HF-induced modifications of the D-region by the HF 67. J. D. Huba and H. L. Rowland, Propagation of electromag-
active auroral research program facility, Radio Sci. 29(5): netic waves parallel to the magnetic field in the nightside
1355–1362 (Sept. 1994). Venus ionosphere, J. Geophys. Res. 98: 5291 (1993).
47. K. G. Budden, The propagation of very-low-frequency radio 68. C. Greifinger and P. Greifinger, Noniterative procedure for
waves to great distances, Phil. Mag. 44: 504 (1953). calculating ELF mode constants in the anisotropic Earth-
48. J. R. Wait, The mode theory of v.l.f. ionospheric propagation ionosphere waveguide, Radio Sci. 21(6): 981–990 (1986).
for finite ground conductivity, Proc. IRE 45: 760 (1957). 69. A. Taflove and S. C. Hagness, Computational Electrodynam-
49. K. G. Budden, The influence of the earth’s magnetic field on ics: The Finite-Difference Time-Domain Method, Artech
radio propagation by wave-guide modes, Proc. Roy. Soc. A House, Norwood, MA, 2000.
265: 538 (1962). 70. M. Thevenot, J. P. Berenger, T. Monediere, and F. Jecko, A
50. J. R. Wait, On the propagation of E.L.F. pulses in the earth- FDTD scheme for the computation of VLF-LF propagation in
ionosphere waveguide, Can. J. Phys. 40: 1360 (1962). the anisotropic earth-ionosphere waveguide, Ann. Telecom-
mun. 54: 297–310 (1999).
51. R. A. Pappert and W. F. Moler, Propagation theory and
calculations at lower extremely low frequencies (ELF), IEEE
Trans. Commun. 22(4): 438–451 (April 1974).
52. J. R. Wait and K. P. Spies, Characteristics of the Earth- EM ALGORITHM IN TELECOMMUNICATIONS
Ionosphere Waveguide for VLF Radio Waves, Technical
Report, NBS Technical Note 300, National Bureau of COSTAS N. GEORGHIADES
Standards, 1964. Texas A&M University
53. D. G. Morfitt and C. H. Shellman, MODESRCH, an Improved College Station, Texas
Computer Program for Obtaining ELF/VLF/LF Mode PREDRAG SPASOJEVIĆ
Constants in an Earth-Ionosphere Waveguide, Technical Rutgers, The State University of
Report Interim Report 77T, Naval Electronic Laboratory New Jersey
Center, San Diego, CA, 1976. Piscataway, New Jersey
54. K. G. Budden, The Wave-Guide Mode Theory of Wave
Propagation, Logos Press, London, 1961. 1. INTRODUCTION
55. G. Keiser, Optical Fiber Communications, McGraw-Hill,
Boston, MA, 2000. Since its introduction in the late 1970s as a general
iterative procedure for producing maximum-likelihood
56. F. B. Jensen, W. A. Kuperman, M. B. Porter, and H. Schmidt,
estimates in cases a direct approach is computa-
Computational Ocean Acoustics, American Institute of
Physics, Woodbury, NY, 1994. tionally or analytically intractable, the expectation-
maximization (EM) algorithm has been used with
57. J. R. Wait and A. Murphy, The geometrical optics of VLF sky
increasing frequency in a wide variety of application areas.
wave propagation, Proc. IRE 45: 754 (1957).
Perhaps not surprisingly, one of the areas that has seen
58. K. G. Budden, The Propagation of Radio Waves, Cambridge an almost explosive use of the algorithm since the early
Univ. Press, New York, 1985. 1990s is telecommunications. In this article we describe
59. R. V. Churchill and J. W. Brown, Complex Variables and some of the varied uses of the EM algorithm that appeared
Applications, McGraw-Hill, New York, 1990. in the literature in the area of telecommunications since
60. R. A. Pappert and J. A. Ferguson, VLF/LF mode conversion its introduction and include some new results on the algo-
model calculations for air to air transmissions in the earth- rithm that have not appeared elsewhere in the literature.
ionosphere waveguide, Radio Sci. 21: 551–558 (1986). The expectation-maximization (EM) algorithm was
61. J. A. Ferguson and F. P. Snyder, Approximate VLF/LF Mode introduced in its general form by Dempster et al. in
Conversion Model, Technical Report, Technical Document 1977 [1]. Previous to that time a number of authors
400, Naval Ocean Systems Center, San Diego, CA, 1980. had in fact proposed versions of the EM algorithm for
62. J. A. Ferguson, F. P. Snyder, D. G. Morfitt, and C. H. Shell- particular applications (see, e.g., Ref. 4), but it was [1]
man, Long-wave Propagation Capability and Documentation, that established the algorithm (in fact, as argued by some,
Technical Report, Technical Document. 400, Naval Ocean a ‘‘procedure’’, rather than an algorithm) as a general
Systems Center, San Diego, CA, 1989. tool for producing maximum-likelihood (ML) estimates in
770 EM ALGORITHM IN TELECOMMUNICATIONS
situations where the observed data can be viewed as where g(y | b) is the conditional density of the data given
‘‘incomplete’’ in some sense. In addition to introducing the parameter vector to be estimated. In many cases a
the EM algorithm as a general tool for ML estimation, simple explicit expression for this conditional density does
Dempster et al. [1] also dealt with the important issue of not exist, or is hard to obtain. In other cases such an
convergence, which was further studied and refined by the expression may be available, but it is one that does not
work of Wu [5]. lend itself to efficient maximization over the set of param-
Following the publication of [1], the research com- eters. In such situations, the expectation–maximization
munity experienced an almost explosive use of the EM algorithm may provide a solution, albeit iterative (and
algorithm in a wide variety of applications beyond the possibly numerical), to the ML estimation problem.
statistics area in which it was introduced. Early applica- The EM-based solution proceeds as follows. Suppose
tion areas for the EM algorithm outside its native area that instead of the data y actually available one had data
of statistics include genetics, image processing, and, in x ∈ X from which y could be obtained through a many-to-
particular, positron emission tomography (PET) [6,7], (an one mapping x → y(x), and such that their knowledge
excellent treatment of the application of the EM algorithm makes the estimation problem easy (for example, the
to the PET problem can be found in Snyder and Miller [8]). conditional density f (x | b) is easily obtained.) In the EM
In the late 1980s there followed an increasing number of terminology, the two sets of data y and x are referred to
applications of the EM algorithm in the signal processing as the incomplete and complete data, respectively.
area, including in speech recognition, neural networks, The EM algorithm makes use of the loglikelihood
and noise cancellation. Admittedly somewhat arbitrarily, function for the complete data in a two-step iterative
we will not cite references here in the signal processing procedure that under some conditions converges to the
area, with the exception of one that had significant impli- ML estimate given in (1) [1,5]. At each step of the EM
cations in the area of communications: the paper by Feder iteration, the likelihood function can be shown to be
and Weistein [9]. This paper will be described in some- nondecreasing [1,5]; if it is also bounded (which is mostly
what more detail later, and it is important in the area the case in practice), then the algorithm converges. The
of telecommunications in that it dealt with the problem two-step procedure at the ith iteration is
of processing a received signal that is the superposition
of a number of (not individually observable) received sig- 1. E step: Compute Q(b | bi ) ≡ E[log f (x | b) | y, bi ].
nals. Clearly, this framework applies to a large number of 2. M step: Solve bi+1 = arg maxb∈B Q(b | bi ).
telecommunications problems, including multiuser detec-
tion and detection in multipath environments. The reader Here bi is the parameter vector estimate at the ith
can find other references (not a complete list) on the use iteration. The two steps of the iteration are referred to
of the EM algorithm in the signal processing (and other as the expectation (E step) and maximization (M step)
areas) in the tutorial paper by Moon [10]. For a general steps, respectively. Note that in the absence of the data x,
introduction to the EM algorithm and its applications in which makes log f (x | b) a random variable, the algorithm
the statistics area, the reader is urged to read the original maximizes its conditional expectation instead, given the
paper of Dempster et al. [1], as well as a book published incomplete data and the most recent estimate of the
in 1997 [11] on the algorithm. The appearance of a book parameter vector to be estimated.
on the EM algorithm is perhaps the best indication of its As mentioned earlier, the EM algorithm has been
widespread use and appeal. shown [1,5] to result in a monotonically nondecreasing
As mentioned above, in this manuscript we are loglikelihood. Thus, if the loglikelihood is bounded (which
interested in a somewhat narrow application area is the case in most practical systems), then the algorithm
for the EM algorithm, that of telecommunications. converges. Under some conditions, the stationary point
Clearly, the boundaries between signal processing and coincides with the ML estimate.
telecommunications are not always well defined and in
presenting the material below some seemingly arbitrary 3. AN OVERVIEW OF APPLICATIONS TO
decisions were made: Only techniques that deal with some TELECOMMUNICATIONS
aspect of data transmission are included.
In Section 2 and for completeness we first give a brief In this section we provide a brief overview on the use of
introduction of the EM algorithm. In Section 3 we present the EM algorithm in the telecommunications area.
a summary of some of the main published results on the
application of the EM algorithm to telecommunications. 3.1. Parameter Estimation from Superimposed Signals
Section 4 contains some recent results on the EM
algorithm, and Section 5 concludes. One of the earliest uses of the EM algorithm with
strong direct implications in communications appeared
in 1988 [9]. Feder and Weinstein [9] look at the problem of
2. THE ESSENTIAL EM ALGORITHM parameter estimation from a received signal, y(t) (it can be
a vector in general), which is a superposition of a number
Let b ∈ B be a set of parameters to be estimated from of signals plus Gaussian noise:
some observed data y ∈ Y. Then the ML estimate
b of b is
a solution to
K
For simplicity in illustrating the basic idea, we will assume a multiuser environment, in the presence of modulation. In
that y(t), n(t), and each θk , k = 1, 2, . . . , K are scalar this application, the modulation symbols (binary-valued)
(Feder and Weinstein [9] handle the more general case are treated as ‘‘nuisance’’ parameters. The complete
of vectors); the objective is to estimate the parameters data are chosen as the received (incomplete) data along
θk , k = 1, 2, . . . , K from the data y(t) observed over a with the binary modulation symbols over an observation
T-second interval. It is assumed that the sk (t; θk ) are window. The E step of the EM iteration involves the
known signals given the corresponding parameters θk , and computation of conditional expectations of the binary-
n(t) is white Gaussian noise with σ 2 = E[|n(t)|2 ]. Clearly, if valued symbols, which results in ‘‘soft’’ data estimates
instead of the superimposed data y(t) one had available the in the form of hyperbolic tangents. At convergence,
individual components of y(t), the problem of estimating the algorithm yields the amplitude estimates and on
the θk would become much simpler since there would be no slicing the soft-data estimates, also estimates of the
coupling between the parameters to be estimated. Thus, binary modulation symbols (although no optimality can
the complete data x(t) are chosen to be a decomposition of be claimed). Follow-up work that relates to Ref. 17 can
y(t): x(t) = [x1 (t), x2 (t), . . . , xK (t)]
(where the prime sign
be found in Refs. 18–21, in which the emphasis is not
means transpose) where on amplitude estimation (amplitudes are assumed known)
but on K-user multiuser detection. In Refs. 18, 20, and 21
xk (t) = sk (t; θk ) + nk (t), k = 1, 2, . . . , K (3) the complete data are taken to be the received (baud rate)
K matched-filter samples along with the binary modulation
nk (t) = n(t) (4) symbols of (K − 1) users, treated as interference in
k=1 detecting a particular user. In addition to the application
of the standard EM algorithm, the authors in Refs. 18,
and, thus 20, and 21 also study the use of the space-alternating
K
generalized EM (SAGE) algorithm [22,23], a variant of
y(t) = xk (t)
the EM algorithm that has better convergence properties
k=1
and may simplify the maximization step. Other work in
The nk (t) are assumed statistically independent (for the area of spread-spectrum research that uses the SAGE
analytic convenience) and have corresponding variances algorithm in jointly detecting a single user in the presence
βk σ 2 , where it is assume that βk ≥ 0 and of amplitude, phase, and time-delay uncertainties has also
been presented [24,25].
K
βk = 1 3.3. Channel Estimation
k=1
In Refs. 26 and 27, the authors deal with detection in an
Then the E step of the EM algorithm yields [9] impulsive noise channel, modeled as a class A mixture.
In these papers the role of the EM algorithm is not
K
1 directly in detection itself, but rather in estimating the
Q(θ | θ̂ ) = − x̂k (t) − sk (t; θk )2 dt (5)
βk T triplet of parameters of the (Gaussian) mixture model,
k=1
namely, the variances of the nominal and contaminant
where distributions and the probability of being under one or the
other distribution. The estimate of the mixture model is
K then used (as if it were perfect) to select the nonlinearity
x̂k (t) = sk (t; θ̂k ) + βk y(t) − si (t; θ̂i ) (6) to be used for detection. This particular application of the
i=1 EM algorithm to estimate the Gaussian mixture model
is one of its original and most typical uses. Follow-up
and θ̂i , i = 1, 2, . . . , K are the most recent estimates. At the (but more in-depth) work on the use of the EM algorithm
M step, maximization of (5) with respect to the parameter to estimating class A noise parameters can be found in
vector θ corresponds to minimizing each term in the Ref. 28. Further follow-up work on the problem of detection
sum in (5) with respect to the corresponding individual in impulsive noise can be found in Refs. 29 and 30. As in
parameter. Thus, the desired decoupling. Refs. 26 and 27, the authors in Refs. 29 and 30 use the
Feder and Weinstein [9] went on to apply the general EM algorithm to estimate parameters in the impulsive
results presented above to the problems of estimating noise model, which are then used for detection in a spread-
the multipath delays and to the problem of multiple spectrum, coded environment. In Ref. 31 the EM algorithm
source location. No modulation was present in the received is used to estimate the noise parameters in a spatial
signals, but, as mentioned above, the technique is quite diversity reception system when the noise is modeled as
general and has over the years been used in a number of a mixture of Gaussian distributions. Also in a spatial
applications in telecommunications. Direct applications of diversity environment, Baroso et al. [32] apply the EM
the results in Ref. 9 to the problem of multiuser detection algorithm to estimate blindly the multiuser array channel
can be found in Refs. 12–16. transfer function. Data detection in the paper is considered
separately.
3.2. The Multiuser Channel
The EM algorithm has also been used for channel
In Ref. 17, also published in 1988, Poor uses the EM estimation under discrete signaling. In a symposium
algorithm to estimate the amplitudes of the user signals in paper [33] and in a follow-up journal paper [34], Vallet
772 EM ALGORITHM IN TELECOMMUNICATIONS
and Kaleh study the use of the EM algorithm for random phase vector θ , namely, x = (r, θ ). Assuming PSK
channel estimation modeled as having a finite impulse modulation (the case of QAM can also be handled easily),
response, for both linear and nonlinear channels. In the E step of the algorithm at the ith iteration yields
this work, the modulation symbols are considered as
the ‘‘nuisance’’ parameters and the authors pose the Q(s | si ) = {r† s · (r† si )∗ } (8)
problem as one of estimating the channel parameters.
At convergence, symbol detection can be performed as For the maximization step of the EM algorithm, we
well. Other work on the use of the EM algorithm to distinguish between coded and uncoded transmission.
channel estimation/equalization in the presence of data Observe from (8) that in the absence of coding, maximizing
modulation can be found in the literature [12,35–37]. Q(s | si ) with respect to sequences s is equivalent to
Zamiri-Jafarian and Pasupathy present an algorithm for maximizing each individual term in the sum, specifically,
channel estimation, motivated by the EM algorithm, that making symbol-by-symbol decisions. When trellis coding
is recursive in time. is used, the maximization over all trellis sequences can
be done efficiently using the Viterbi algorithm. Follow-
3.4. The Unsynchronized Channel up work on sequence estimation under phase offset can
be found in the literature [44–47]. The use of the EM
In 1989 a paper dealing (for the first time) with sequence algorithm for phase synchronization in OFDM systems
estimation using the EM algorithm appeared [38] with a has also been studied [48].
follow-up journal paper appearing in 1991 [39]. In Refs. 38
and 39 the received data are ‘‘incomplete’’ in estimating 3.6. The Fading Channel
the transmitted sequence because time synchronization is
absent. The complete data in the paper were defined as Sequence estimation for fading channels using the EM
the baud rate (nonsynchronized) matched-filter samples algorithm was first studied in 1993 [49], with follow-up
along with the correct timing phase. The E step of the work a bit later [45,47,50]. Here the authors look at the
algorithm was evaluated numerically, and the M step problem as one of sequence estimation, where the random
trivially corresponded to symbol-by-symbol detection in fading are the unwanted parameters. A brief summary of
the absence of coding. The algorithm converged within 2–4 the EM formulation as presented in the references cited
iterations, depending on the signal-to-noise ratio (SNR). above is given below.
The fast convergence of the algorithm can be attributed to Let D, s, and N be as defined above and a be the
the fact that the parameters to be estimated came from a complex fading process modeled as a zero-mean, Gaussian
discrete set. Follow-up work using the EM algorithm for vector with independent real and imaginary parts having
the time unsynchronized channel appeared in 1994 and covariance matrix Q. Then the received data r are
1995 [40–42]. In one of Kaleh’s papers [40], which in fact
uses the Baum–Welch algorithm [4], a predecessor to EM, r = Da + N (9)
instead of posing the problem as sequence estimation in
the presence of timing offset, the author poses the problem Let the complete data x be the incomplete data r along
as one of estimating the timing offset (and additive with the random fading vector a: x = (r, a). Then, for PSK
noise variance) in the presence of random modulation. signaling, the expectation step of the EM algorithm yields
At convergence, symbol estimates can also be obtained. In
Q(s | si ) = [r† Dâi ] (10)
Ref. 42, in addition to the absence of time synchronization,
the authors assume an unknown amplitude. In the paper
where
the authors use the SAGE algorithm [22] to perform
sequence estimation. −1
I
âi = E[a | r, si ] = Q Q + (Di )∗ r (11)
SNR
3.5. The Random Phase Channel
Similar to Refs. 38 and 39, in a 1990 paper [43] the authors When the fading is static over a number of symbols,
consider sequence estimation in the presence of random the expectation step of the algorithm becomes Q(s | si ) =
phase offset, for both uncoded and trellis-coded systems. [r† sâi ], where âi = rsi† . Again, the maximization step
We provide a brief description of the results here. can be done easily and corresponds to symbol-by-symbol
Let s = (s1 , s2 , . . . , sN ) be the vector containing the detection in the uncoded case, or Viterbi decoding in the
complex modulation symbols, D be a diagonal matrix trellis-coded case.
with the elements of s as diagonal elements, and θ be Other work on the use of the EM algorithm for sequence
the phase offset over the observation window. Then the estimation in multipath/fading channels can be found in
received (incomplete) data vector r can be expressed as the literature [51–57]. In Refs. 54 and 55 the authors
introduce an approximate sequence estimation algorithm
r = Dejθ + N (7) motivated by the EM algorithm.
suppression have appeared in the literature more recently. simultaneously. The transmitted code block can be written
These include Refs. 58–62. An application to sequence in matrix form as
estimation [60] is briefly presented below.
Let d(1)
1 d(1)
2 ... d(1)
N
r(t) = S(t; a) + J(t) + n(t) d(2) d(2) d(2)
(12) 1 2 ... N
D=
.. .. .. .. (17)
be the received data, where S(t; a), 0 ≤ t < T, is the . . . .
transmitted signal (assuming some arbitrary modulation) d(L)
1 d(L)
2 ... d(L)
N
corresponding to a sequence of M information symbols a.
J(t) models the interference, and n(t) is zero-mean, where the superscript in d(l)n represents time index and
white, Gaussian noise having spectral density N0 /2. the subscript is the space index. In other words, d(l)
n is the
For generality, we will not specify the form of the complex symbol transmitted by the nth antenna during
signal part, S(t; a), but special cases of interest include the lth symbol time. We denote the row vectors of D by
direct-sequence spread-spectrum (DSSS) and multicarrier D(l) .
spread-spectrum (MCSS) applications. Assuming quasistatic fading (the nonstatic case has
Let the complete data be the received data r(t) along been handled as well), let
with the interference J(t) (which is arbitrary at this point).
Application of the EM algorithm yields the following j = ( γ1j γ2j ··· γNj )T , j = 1, 2, . . . , M
expectation step (ak is the sequence estimate at the kth
iteration and ·, · denotes inner product): be the fading vector whose components γij are the fading
coefficients in the channel connecting the ith transmit
Q(a | ak ) = r(t) − Ĵ k (t), S(t; a) (13) antenna to the jth receive antenna, for i = 1, 2, . . . , N. The
γij are assumed independent. Then the received data at
Ĵ (t) = E[J(t) | {r(τ ) : 0 ≤ τ ≤ T}, a ]
k k
(14)
the jth receive antenna are
These expressions are quite general and can be applied
to arbitrary interference models, provided the conditional Yj = Dj + Nj , j = 1, 2, . . . , M
mean estimates (CMEs) in (14) can be computed. For a
vector-space interference model Defining the complete data as {Yj , j }j=1
M
, the E step of
the EM algorithm at the kth iteration yields
N
M
L
J(t) = gj φj (t) (15) 1
Q(D | Dk ) = ˆ j )k (D(l) )∗
(yj(l) D(l) ˆ jk ) − D(l) (
j=1
2
l=1 j=1
10−3
EM decoder, J = 16, K = 2
EM decoder, J = 32, K = 2
Genie bound
√
here an application of the EM algorithm to this problem.
N
Ei
This is a summary of results from Ref. 66. B =
i
rk tanh rk (28)
vi
Let the received data be k=1
√ √
rk = Edk + vnk , k = 1, 2, . . . , N Figure 2 shows results on the bias and mean-square error
in estimating the SNR and the received signal energy for
where E is the received signal energy; v is the additive the EM algorithm and a popular high-SNR approximation
noise variance; the nk are zero-mean, unit-variance, i.i.d. to the ML estimator (the true ML estimator is much too
Gaussian; and the dk are binary modulation symbols (the complex to implement).
more general case is handled as well) with dk ∈ {1, −1}.
We are interested in estimating the vector θ = (E, v) from 4.3. On the (Non) Convergence of the EM Algorithm for
the received (incomplete) data vector r using the EM Discrete Parameters
algorithm. Let the complete data be x = (r, d). Then the
E step of the EM algorithm yields For continuous parameter estimation, under continuity
and weak regularity conditions, stationary points of
√
1 2 N·E 2 E the EM algorithm have to be stationary points of the
N N
Q(θ | θ ) = −N ln(v) −
i
rk − + · rk d̂ik loglikelihood function [e.g., 11]. On the other hand, any
v k=1 v v k=1 fixed point can be a convergence point of the discrete EM
(22) algorithm even though the likelihood of such a point can be
where √ lower than the likelihood of a neighboring discrete point.
Ei
d̂ik = E dk | r, θ i = tanh rk (23) In this subsection we summarize some results on the
vi (non)convergence of the discrete EM algorithm presented
in Ref. 67.
Taking derivatives w.r.t. (with respect to) E and v, the M Let a be a M-dimensional discrete parameter with
step of the EM algorithm yields the following recursions: values in A. It is assumed that, along with the discrete
2 EM algorithm, a companion continuous EM algorithm
Bi
Êi+1 = (24)
N ak+1 = arg max Q(a | ak ) (29)
c
a∈CM
A
v̂i+1 = − Êi+1 (25)
N where CM is the M-dimensional complex space, is well
i+1 Êi+1 defined. The companion continuous EM mapping is the
ˆ
SNR = i+1 (26)
v̂ map Mc : ak → ack+1 , for ak+1
c ∈ CM . The Jacobian of the
continuous EM mapping Mc (a) is denoted with Jc (a),
where and the Hessian of its corresponding objective function
Q(a | ak ) is denoted with HQ (a).
N
A= r2k (27) Computationally ‘‘inexpensive’’ maximization of the
k=1 objective function Q(a | ak ) over A can be obtained for
EM ALGORITHM IN TELECOMMUNICATIONS 775
4 1.5
Mean energy estimate, dB
1
0.5
0
True energy 0
−1
−5 0 5 10 −5 0 5 10
SNR, dB SNR, dB
10 8
Mean SNR estimate, dB
6
5
0
2
True SNR
Figure 2. Comparison between the EM
−5 0 and high SNR ML approximation
−5 0 5 10 −5 0 5 10
(dashed lines) for sequence length
SNR, dB SNR, dB N = 50.
An important implication of (32) is that λ̂min = 0 is Next, we provide symbol error rate results for a binary
sufficient for the nonconvergence ball to hold at most communications example where N > M and span{h(t −
one discrete stationary point. Sequence estimation, in mT), m ∈ [0, M − 1]} ⊂ span{φn (t), n ∈ [0, N − 1]} holds.
the presence of interference when λ̂min = 0 or when this Figure 3 demonstrates that, in the case of an interference
identity can be forced using rank-reduction principles, has process whose full spectrum covers the spectrum of
been analyzed in Ref. 67 and in part in Ref. 70, and is the useful signal, one could significantly increase the
presented briefly next. convergence probability of the EM algorithm using a
In the following, the received signal model given in (12) rank-reduction approach. Two second-order Gaussian
with a linear modulation processes uniformly positioned in frequency within the
signal spectra are used to model interference using N > M
M−1
eigenfunctions. The EM algorithm obtained by modeling
S(t; a) = an h(t − mT) (33)
the interference with P < M < N largest eigenfunctions
m=0
φn (t), and thus forcing λ̂min = 0 in (32), is compared to
where pulses {h(t − mT), m ∈ [0, M − 1]} are orthonormal, the performance of the EM algorithm that uses the full-
is assumed. rank model. It can be observed that the reduced-rank EM
For this problem, a reduced nonconvergence ball algorithm has little degradation, whereas the full-rank
radius has been derived [67] without having to resort EM algorithm does not converge and has a catastrophic
to the Taylor expansion of Mc (a). The reduced performance for large SNR.
nonconvergence radius is a function of only dmin and
the statistics of noise and interference. In the special 4.4. Detection in Fading Channels with Highly Correlated
case when N = M and span{h(t − nT), n ∈ [0, N − 1]} = Multipath Components
span{φn (t), n ∈ [0, N − 1]}, where φn (t) are defined in (15), Here we study another example where the convergence
the reduced radius and rnc are equal. Both are of the discrete EM algorithm can be stymied by a large
nonconvergence ball.
λ̃min We assume a continuous-time L-path Rayleigh fading
r = 1+
nc
dmin (34)
N0 channel model:
K
L
where λ̃min = minj {λj }; λj is the variance of the coefficient r1 (t) = αl,k,1 S(t − τl,k,1 ; ak ) + n1 (t)
gj in (15). For example, let’s assume M = N = 2, interfer- k=1 l=1
ence energy J = λ1 + λ2 = 2λ1 = 2λ̃min , binary antipodal
..
signaling with dmin = 1, and noise and signal realiza- .
tions such √ that âc ≈ 0 (see Ref. 67 for details). Then, if
K
L
J/N0 > 2 2 − 2 all possible discrete points in A will be
rNa (t) = αl,k,Na S(t − τl,k,Na ; ak ) + nNa (t) (35)
found inside the convergence ball whose radius is given
k=1 l=1
by (34). Consequently, the discrete EM algorithm will gen-
erate a sequence of identical estimates ak = a0 for any k Here Na is the number of receive antennas and K is
and any initial point a0 ∈ A. the number of users; ak is the vector of N symbols of
10−2
Maximum−likelihood
Symbol error−rate
10−4
Processing gain = 16
Sequence length = 4
Signal−to−interference ratio = −20 dB
Number of interferers = 2
user k; αl,k,i are zero-mean, complex circular Gaussian The E step requires estimation of the phase and
random variables of respective variances γl,k,i and τl,k,i amplitude compensated signal path components
are (assumed known) path delays. We group the fading
∗
coefficients, αl,k,i , for each i into a vector αi such that χ̂l,k,i (t; am ) = E{αl,k,i xl,k,i (t) | {ri (t); t ∈ [Ti , . . . , Tf ]}, am }
αl,k,i = [αi ]l+L·(k−1) is the l + L · (k − 1)th element of the
vector αi . The covariance matrix of αi is i . Ps of the
= ρ̂ll,kk,i (am )S(t − τl,k,i ; am ) + βl,k α̂l,k,i
∗
(am )ri (t)
transmitted symbols are assumed to be known pilot
symbols that allow for ambiguity resolution (67). The
signal is assumed to have the following form:
K
L
− ρ̂lj,kq,i (am )S(t − τj,q,i ; am )
N q=1 j=1
S(t; ak ) = an,k h(t − nT)
n=1 Signal path components are estimated by successive
refinement. The refinement attempts to find those
where h(t) is the known signaling pulse (assumed component estimates that explain the received signal with
common for simplicity) modulated by the transmitted a smallest measurement error. α̂l,k,i (am ) and ρ̂lj,kq,i (am )
symbol sequence a of length N. Pulses h(t − iT) are are, respectively, the conditional mean estimates of the
typically orthogonal to avoid ISI for frequency nonselec- complex coefficients
tive channels. The vector of time-delayed modulated sig-
nals is si (t; a) = [S(t − τ1,1,i ; a1 ), S(t − τ2,1,i ; a1 ), . . . , S(t − α̂l,k,i (am ) = E{αl,k,i | {ri (t); t ∈ [Ti , . . . , Tf ]}, am }
τL,K,i ; aK )]T , where a is a NK-dimensional vector of user
symbols such that its subvector of length K[a]k·N = [(N0 i−1 + Gi (am ))−1 si (t; am ), rH
i (t)]l
(k−1)·N+1 is
the vector of N symbols of user k.
For simplicity the receiver antennas are assumed to be and their cross-correlations
sufficiently separated so that any αi and αj for i = j are ∗
independent. Gi (a) = si (t; a), sH ρ̂lj,kq,i (am ) = E{αl,k,i αj,q,i | {ri (t); t ∈ [Ti , . . . , Tf ]}, am }
i (t; a) is the Gramian of
si (t; a) in the L2 space. Gi (am ) −1
An EM solution is nontrivially based on the EM = i−1 +
N0
solution for superimposed signals introduced by Feder
and Weinstein described in Section 3.1 (see, also Refs. 53
and 69). The complete data vector includes not only the + α̂i (am )α̂iH (am ) (39)
l+(k−1)L,j+(q−1)L
path plus noise components but also the fading vectors
αi , as The M step of the companion continuous EM algorithm
can be expressed in closed form as
(x, α) = ([x1 , . . . , xNA ], α1 , . . . , αNa ) (36)
Na
L
= ({x1,1,1 (t), . . . , xL,K,Na (t), βl,k ẑl,n,k,i (am )
i=1 l=1
t ∈ [Ti , . . . , Tf ]}, α1 , . . . , αNa ) (37) cu,n,k =
âm+1 , (n, k) ∈ Jps
Na
L
m
βl,k ρ̂ll,kk,i (a )
where xl,k,i (t) = αl,k,i S(t − τl,k,i ; ak ) + nl,k,i (t) · nl,k,i (t) is a i=1 l=1
10−1
Maximum likelihood
EM−continuous
EM−discrete
10−2
Bit error−rate
10−3
10−4
14. A. Radović, An iterative near-far resistant algorithm for 31. R. S. Blum, R. J. Kozick, and B. M. Sadler, An adaptive
joint parameter estimation in asynchronous CDMA systems, spatial diversity receiver for non-Gaussian interference and
5th IEEE Int. Symp. Personal, Indoor and Mobile Radio noise, 1st IEEE Signal Processing Workshop on Signal
Communications, 1994, Vol. 1, pp. 199–203. Processing Advances in Wireless Communications, 1997,
pp. 385–388.
15. M. J. Borran and M. Nasiri-Kenari, An efficient decoding
technique for CDMA communication systems based on the 32. V. A. N. Baroso, J. M. F. Moura, and J. Xavier, Blind array
expectation maximization algorithm, IEEE 4th Int. Symp. channel division multiple access (AChDMA) for mobile
Spread Spectrum Techniques and Applications Proc., 1996, communications, IEEE Trans. Signal Process. 46: 737–752
Vol. 3, pp. 1305–1309. (March 1998).
16. A. Chkeif and G. K. Kaleh, Iterative multiuser detector with 33. R. Vallet and G. K. Kaleh, Joint channel identification and
antenna array: An EM-based approach, Proc. Int. Symp. symbols detection, Proc. Int. Symp. Information Theory, 1991,
Information Theory, 1998, p. 424. p. 353.
17. H. V. Poor, On parameter estimation in DS/SSMA formats, 34. G. K. Kaleh and R. Vallet, Joint parameter estimation and
Proc. Advances in Communications and Control Systems, symbol detection for linear and nonlinear unknown channels,
Baton Rouge, LA, Oct. 1988, pp. 59–70. IEEE Trans. Commun. 42: 2406–2413 (July 1994).
18. L. B. Nelson and H. V. Poor, Soft-decision interference can- 35. Min Shao and C. L. Nikias, An ML/MMSE estimation
cellation for AWGN multiuser channels, Proc. 1994 IEEE Int. approach to blind equalization, 1994 IEEE Int. Conf.
Symp. Information Theory, 1994, p. 134. Acoustics, Speech, and Signal Processing (ICASSP-94), 1994,
Vol. 4, pp. 569–572.
19. H. V. Poor, Adaptivity in multiple-access communications,
Proc. 34th IEEE Conf. Decision and Control, 1995, Vol. 1, 36. H. Zamiri-Jafarian and S. Pasupathy, Recursive channel
pp. 835–840. estimation for wireless communication via the EM algorithm,
780 EM ALGORITHM IN TELECOMMUNICATIONS
1997 IEEE Int. Conf. Personal Wireless Communications, Communications, Computers and Signal Processing, 1999,
1997, pp. 33–37. pp. 511–515.
37. H. Zamiri-Jafarian and S. Pasupathy, EM-Based Recursive 53. P. Spasojević and C. N. Georghiades, Implicit diversity com-
estimation of channel parameters, IEEE Trans. Commun. bining based on the EM algorithm, Proc. WCNC ’99, New
47: 1297–1302 (Sept. 1999). Orleans, LA, Sept. 1999.
38. C. N. Georghiades and D. L. Snyder, An application of the 54. H. Zamiri-Jafarian and S. Pasupathy, Generalized MLSDE
expectation-maximization algorithm to sequence detection in via the EM algorithm, 1999 IEEE Int. Conf. Communication,
the absence of synchronization, Proc. Johns Hopkins Conf. 1999, pp. 130–134.
Information Sciences and Systems, Baltimore, MD, March 55. H. Zamiri-Jafarian and S. Pasupathy, Adaptive MLSDE
1989. using the EM algorithm, IEEE Trans. Commun. 47(8):
39. C. N. Georghiades and D. L. Snyder, The expectation max- 1181–1193 (Aug. 1999).
imization algorithm for symbol unsynchronized sequence 56. M. Leconte and F. Hamon, Performance of /spl pi/-
detection, IEEE Trans. Commun. 39(1): 54–61 (Jan. constellations in Rayleigh fading with a real channel estima-
1991). tor, 1999 IEEE Int. Conf. Personal Wireless Communication,
40. G. K. Kaleh, The Baum-Welch Algorithm for detection of 1999, pp. 183–187.
time-unsynchronized rectangular PAM signals, IEEE Trans. 57. W. Turin, MAP decoding using the EM algorithm, 1999
Commun. 42: 260–262 (Feb.–April 1994). IEEE 49th Vehicular Technology Conf., 1999, Vol. 3,
41. N. Antoniadis and A. O. Hero, Time-delay estimation for pp. 1866–1870.
filtered Poisson processes using an EM-type algorithm, IEEE 58. Q. Zhang and C. N. Georghiades, An application of the EM
Trans. Signal Process. 42(8): 2112–2123 (Aug. 1994). algorithm to sequence estimation in the presence of tone
42. I. Sharfer and A. O. Hero, Spread spectrum sequence estima- interference, Proc. 5th IEEE Mediterranean Conf. Control
tion and bit synchronization using an EM-type algorithm, and Systems, Paphos, Cyprus, July 1997.
1995 Int. Conf. Acoustics, Speech, and Signal Processing 59. O. C. Park and J. F. Doherty, Generalized projection algo-
(ICASSP-95), 1995, Vol. 3, pp. 1864–1867. rithm for blind interference suppression in DS/CDMA com-
43. C. N. Georghiades and J. C. Han, Optimum decoding of munications, IEEE Trans. Circuits Syst. II: Analog Digital
trellis-coded modulation in the presence of phase-errors, Proc. Signal Process. 44(6): 453–460 (June 1997).
1990 Int. Symp. Its Applications (ISITA’ 90), Hawaii, Nov. 60. C. N. Georghiades, Maximum-likelihood detection in the
1990. presence of interference, Proc. IEEE Int. Symp. Inform.
44. C. N. Georghiades, Algorithms for Joint Synchronization and Theory 344 (Aug. 1998).
Detection, in Coded Modulation and Bandwidth Efficient 61. C. N. Georghiades and D. Reynolds, Interference Rejection for
Transmission, Elsevier, Amsterdam, 1992. Spread-Spectrum Systems Using the EM Algorithm, Springer-
Verlag, London, 1998.
45. C. N. Georghiades and J. C. Han, On the application of the
EM algorithm to sequence estimation for degraded channels, 62. Q. Zhang and C. N. Georghiades, An interference rejection
Proc. 32nd Allerton Conf. Univ. Illinois, Sept. 1994. application of the EM algorithm to direct-sequence signals,
Kybernetika (March 1999).
46. C. R. Nassar and M. R. Soleymani, Joint sequence detection
and phase estimation using the EM algorithm, Proc. 1994 63. Y. Li, C. N. Georghiades, and G. Huang, Iterative maximum
Canadian Conf. Electrical and Computer Engineering, 1994, likelihood sequence estimation of space-time codes, IEEE
Vol. 1, pp. 296–299. Trans. Commun. 49: 948–951 (June 2001).
47. C. N. Georghiades and J. C. Han, Sequence Estimation in the 64. V. Tarokh, N. Seshadri, and A. R. Calderbank, Space-time
presence of random parameters via the EM algorithm, IEEE codes for high data rate wireless communication: Performance
Trans. Commun. 45: 300–308 (March 1997). criterion and code construction, IEEE Trans. Inform. Theory
44: 744–765 (March 1998).
48. E. Panayirci and C. N. Georghiades, Carrier phase synchro-
nization of OFDM systems over frequency-selective channels 65. Y. Xie and C. N. Georghiades, An EM-based channel estima-
via the EM algorithm, 1999 IEEE 49th Vehicular Technology tion algorithm for OFDM with transmitter diversity, Proc.
Conf., 1999, Vol. 1, pp. 675–679. Globecom 2001, San Antonio, TX, Nov. 2001.
49. J. C. Han and C. N. Georghiades, Maximum-likelihood 66. C. N. Georghiades and U. Dasgupta, On SNR and energy
sequence estimation for fading channels via the EM estimation, manuscript in preparation.
algorithm, Proc. Communication Theory Mini Conf., Houston, 67. P. Spasojević, Sequence and Channel Estimation for Channels
TX, Nov. 1993. with Memory, dissertation, Texas A&M Univ., College
50. J. C. Han and C. N. Georghiades, Pilot symbol initiated Station, TX, Dec. 1999.
optimal decoder for the land mobile fading channel, 1995 68. P. Spasojević and C. N. Georghiades, The slowest descent
IEEE Global Telecommunications Conf. (GLOBECOM ’95), method and its application to sequence estimation, IEEE
1995, pp. 42–47. Trans. Commun. 49: 1592–1601 (Sept. 2001).
51. K. Park and J. W. Modestino, An EM-based procedure for 69. P. Spasojević, Generalized decorrelators for fading channels,
iterative maximum-likelihood decoding and simultaneous Int. Conf. Information Technology: Coding and Computing,
channel state estimation on slow-fading channels, Proc. 1994 April 2001, pp. 312–316.
IEEE Int. Symp. Information Theory, 1994, p. 27.
70. C. N. Georghiades and P. Spasojević, Maximum-likelihood
52. L. M. Zeger and H. Kobayashi, MLSE for CPM signals in detection in the presence of interference, IEEE Trans.
a fading multipath channel, 1999 IEEE Pacific Rim Conf. Commun. (submitted).
F
FADING CHANNELS In Section 2, we review the large-scale fading mod-
els. Sections 3–5 describe small-scale fading. Multipath
ALEXANDRA DUEL-HALLEN fading channels are characterized in Section 3. Section 4
North Carolina State University describes useful fading models. Performance analysis for
Raleigh, North Carolina flat fading channels and diversity combining approaches
are described in Section 5.
1. INTRODUCTION
2. LARGE-SCALE FADING
Radio communication channels include shortwave iono-
spheric radiocommunication in the 3–30-MHz frequency Propagation path loss Ls (d) is defined as the ratio
band (HF), tropospheric scatter (beyond-the-horizon) radio of the transmitted power to the received power in a
communications in the 300–3000 MHz frequency band radiofrequency (RF) channel. In a free-space model, the
(UHF) and 3000–30,000-MHz frequency band (SHF), and propagation path loss is proportional to d2 , where d is the
ionospheric forward scatter in the 30–300-MHz frequency distance between the transmitter and the receiver. When
band (VHF) [1,2]. These channels usually exhibit multi- the receiving antenna is isotropic, this loss is given by [2]
path propagation that results from reflection, diffraction 2
4π d
and scattering of the transmitted signal. Multipath prop- Ls (d) = (1)
agation and Doppler effects due to the motion of the λ
transmitter and/or receiver give rise to multipath fading
where λ is the wavelength of the RF signal. In the mobile
channel characterization. The fading signal is character-
radio channel, the average path loss is usually more severe
ized in terms of large-scale and small-scale fading.
due to obstructions and is inversely proportional to dn ,
The large-scale fading model describes the average
where 2 ≤ n ≤ 6. The average path loss is given by [2,3]
received signal power as a function of the distance between
the transmitter and the receiver. Statistical variation d
around this mean (on the order of 6–10 dB) resulting from Lav (d) (dB) = Ls (d0 ) + 10n log10 (2)
d0
shadowing of the signal due to large obstructions is also
included in the model of large-scale fading. Large-scale where d0 is the close-in reference distance. This distance
fading describes the variation of the received power over corresponds to a point located in the far field of the
large areas and is useful in estimating radio coverage of the antenna. For large cells, it is usually assumed to be 1 km,
transmitter. The large scale fading model is determined by whereas for smaller cells and indoor environments, the
averaging the received power over many wavelengths (e.g., values of 100 m and 1 m, respectively, are used. The value
1–10 m for cellular and PCS frequencies of 1–2 GHz) [3]. of the exponent n depends on the frequency, antenna
The small-scale fading model describes the instanta- heights, and propagation conditions. For example, in
neous variation of the received power over a few wave- urban area cellular radio, n takes on values from 2.7 to 3.5;
lengths. This variation can result in dramatic loss of power in building line-of-sight conditions, 1.6 ≤ n ≤ 1.8; whereas
on the order of 40 dB. It is due to superposition of several in obstructed in building environments, n = 4–6 [3].
reflected or scattered components coming from different The actual path loss in a particular location can deviate
directions. Figure 1 depicts small-scale fading and slower significantly from its average value (2) due to shadowing of
large-scale fading for a mobile radiocommunication sys- the signal by large obstructions. Measurements show that
tem. The figure illustrates that small-scale variation is this variation is approximately lognormally distributed.
averaged out in the large-scale fading model. Thus, the path loss is represented by the random variable
n=2
100
n=1
90
80 n = 2.7
s = 11.8 dB
Figure 2. Scatterplot of measured data and corre-
sponding MMSE path loss model for many cities in 70
0.1 2 3 4 1 2 3 4 10
Germany. For these data, n = 2.7 and σ = 11.8 dB
(reprinted from Ref. 15, IEEE). T − R separation (km)
−10
N
c(t) = An ej(2π fn t+2π fc τn +φn ) (4) −15
n=1
−20
r (∆t)
φc (0; t) ≡ φc (t) (10)
selectivity also causes dispersion, or ISI, since delayed where (.) is the Gamma function [11],
= E(R2 ) and
versions of the transmitted signal arrive at the receiver the parameter m is the fading figure given by m =
much later (relative to the symbol interval) than
2 /E[(R2 −
)2 ], m ≥ 1/2. While the Rayleigh distribution
components associated with small delays. This channel uses a single parameter E(R2 ) = 2σ 2 to match the fading
is often modeled using several fading rays with different statistics, the Nakagami-m distribution depends on two
excess multipath delays parameters, E(R2 ) and m. For m = 1 the density (16)
reduces to Rayleigh distribution. For 12 ≤ m < 1, the tail
c(t) = c1 (t)δ(t − τ1 ) + c2 (t)δ(t − τ2 ) + · · · + cL (t)δ(t − τL ) of the Nakagami-m PDF decays slower than for Rayleigh
(13) fading, whereas for m > 1, the decay is faster. As a result,
where the components cl (t), l = 1, . . . , L are uncorrelated the Nakagami-m distribution can model fading conditions
flat fading (e.g., Rayleigh distributed) random variables. that are either more or less severe than Rayleigh fading.
The powers associated with these rays are determined by The Nakagami-m PDF for different values of m is
the multipath intensity profile (8). illustrated in Fig. 7.
Now, consider rapidity, or time variation of the fading The Rayleigh distribution models a complex fading
channel. The channel is considered slowly varying if the channel with zero mean that is appropriate for channels
channel response changes much slower than the symbol without the line-of-sight (LoS) propagation. When strong
rate. In this case, the symbol interval is much smaller than nonfading, or specular, components, such as LoS propaga-
the coherence time, T (t)c , or the signal bandwidth tion paths, are present, a DC component needs to be added
significantly exceeds the Doppler spread, W Bd . If the to random multipath, resulting in Ricean distribution. The
symbol interval is comparable to or greater than the pdf of the amplitude of Ricean fading is given by
coherence time, (or the coherence bandwidth is similar
to or exceeds the signal bandwidth), the channel is fast- 2
r r + s2 rs
fading. While most mobile radio, or PCS, channels are pR (r) = exp − I0 , r≥0 (17)
σ2 2σ 2 σ2
slowly fading, as the velocity of the mobile and the carrier
frequency increase, the channel becomes rapidly time- where s is the peak amplitude of the dominant nonfading
varying since the Doppler shift increases [see (5)]. This component and I0 (.) is the modified Bessel function of
rapid time variation results in time selectivity (which can the first kind and zero order [11]. The Ricean factor K
be exploited as time diversity), but degrades reliability of
detection and channel estimation [12–14].
4. FADING-CHANNEL MODELS
1
pθ (θ ) = , |θ | ≤ π (15)
2π
The Rayleigh distribution is a special case of the
Nakagami-m distribution that provides a more flexible
model of the statistics of the fading channel. The PDF of 0
the amplitude of the Nakagami-m distribution is [1] 0 0.5 1.0 1.5 2.0
R
2 m m 2m−1 mr2 Figure 7. The PDF for the Nakagami-m distribution, shown with
pR (r) = r exp − (16)
(m)
= 1, where m is the fading figure (reprinted from Ref. 17).
786 FADING CHANNELS
specifies the ratio of the deterministic signal power and can be extended to produce several uncorrelated fading
the variance of the multipath: components using the same set of oscillators [10]. The
autocorrelation function and the Doppler spectrum of the
s2 signals generated by the Jakes model are characterized
K = 10 log10 (dB) (18)
2σ 2 by Eqs. (11) and (12), respectively, and are shown in
Fig. 6. This statistical characterization is appropriate
As s → 0 (K → −∞), the power of the dominant path for many channels where the reflectors are distributed
diminishes, and the Ricean PDF converges to the Rayleigh uniformly around the mobile (isotropic scattering). Several
PDF. Examples of Ricean fading and other LoS channels other approaches to simulating Rayleigh fading channels
include airplane to ground communication links and based on Clarke and Gans fading models are described
microwave radio channels [1]. by Rappaport [3]. Moreover, physical models are useful
As an alternative to modeling the Rayleigh fading as a [e.g., when the variation of amplitudes, frequencies, and
complex Gaussian process, one can instead approximate phases in (4) is important to model as in long-range fading
the channel by summing a set of complex sinusoids as prediction] and autoregressive (Gauss–Markov) models are
in (4). The number of sinusoids in the set must be large often utilized to approximate fading statistics in fading
enough so that the PDF of the resulting envelope provides estimation algorithms since they result in rational spectral
an accurate approximation to the Rayleigh PDF. The characterization [12,13].
Jakes model is a popular simulation method based on
this principle [10]. The signal generated by the model is
5. DIVERSITY TECHNIQUES AND PERFORMANCE
2 j(ωd t cos αn +φn ) ANALYSIS
N
c(t) = e (19)
N n=1
5.1. Performance Analysis for Flat Fading Channels
where N is the total number of plane waves arriving at Fading channels undergo dramatic changes in received
uniformly spaced angles αn as shown in Fig. 8. The c(t) power due to multipath and Doppler effects. When com-
can be further represented as munication signals are transmitted over these channels,
the bit error rate (BER) varies as a function of the signal-
E0
c(t) = √ (c1 (t) + jcQ (t)) to-noise ratio (SNR) and is significantly degraded relative
2N0 + 1 to the BER for the additive white Gaussian noise (AWGN)
N0
√ channel with the same average SNR. Consider the follow-
cI (t) = 2 cos φn cos ωn t + 2 cos φN cos ωm t ing example of transmission of binary phase-shift-keyed
n=1 (BPSK) signal over the flat Rayleigh fading channel. At
the output of the matched filter and sampler at the bit rate
N0
√
cQ (t) = 2 sin φn cos ωn t + 2 sin φN cos ωm t 1/Tb (where Tb is the bit interval), the complex envelope
n=1 of the received signal is
The BER for other fading distributions (e.g., Ricean or Figure 9. Average error probability for two-phase PSK with
Nakagami-m) can be obtained similarly by averaging Nakagami fading (reprinted from Ref. 1).
the BER for AWGN (23) over the fading statistics as
in (24). Figure 9 illustrates performance of BPSK over the coherence time (t)c . Time diversity is utilized in
a Nakagami-m fading channel for different values of m. coded systems by interleaving the outputs of the encoder.
Observe that as m increases, the fading becomes less Frequency diversity can be employed when the trans-
severe, and the BER approaches that of the AWGN mitter bandwidth is larger than the coherence bandwidth
channel. Rayleigh fading (m = 1) results in significant of the channel, and several fully or partially uncorre-
SNR loss relative to the nonfading channel. In fact, for lated fading components can be resolved. The number of
large SNR, the BER in (25) behaves asymptotically as 14 , such uncorrelated components is determined by the ratio
whereas the BER decreases exponentially with SNR for W/(f )c [1]. In PSK or quadrature-amplitude-modulated
AWGN. The error rates of other modulation methods [e.g., (QAM) systems, the transmitter bandwidth is approxi-
coherent and noncoherent frequency shift keying (FSK), mately 1/T. Because of the narrow transmission band-
differential PSK (DPSK)] also decrease only inversely with width, these channels are often frequency-nonselective.
SNR, causing very large power consumption. When frequency selectivity is present, it usually causes
ISI since the multipath delay is significant relative to
5.2. Diversity Techniques the symbol interval. Thus, equalizers are used to mitigate
the ISI and to obtain the diversity benefit [1,12–14]. On
The poor performance of flat fading channels is due to the other hand, direct-sequence spread-spectrum (DSSS)
the presence of deep fades in the received signal (Fig. 4) systems employ waveforms with the transmission band-
when the received SNR is much lower than the average width that is much larger than the symbol rate 1/T,
SNR value. Diversity techniques help to combat fading and thus enjoy significant frequency diversity. DSSS sig-
by sending replicas of transmitted data over several nals are designed to achieve approximate orthogonality of
uncorrelated (or partially correlated) fading channels. multipath-induced components, thus eliminating the need
Since these channels are unlikely to go through a deep for equalizers. Frequency diversity combining in these
fade at the same time, higher average received SNR results systems is achieved using a RAKE correlator [1].
when the outputs of the diversity branches are combined. In addition to the diversity techniques mentioned
Many diversity techniques are used in practice. above, angle of arrival and polarization diversity are
In space, or antenna diversity systems, several antenna utilized in practice. Different diversity methods are often
elements are placed at the transmitter and/or receiver and combined to maximize diversity gain at the receiver.
separated sufficiently far apart to achieve desired degree To illustrate the effect of diversity on the performance
of independence. (Note that the correlation function in of communication systems in fading channels, consider
space is analogous to that in time [10]. Typically, antenna the following simple example. Suppose the transmitted
separations of about λ/2 at the mobile station and several BPSK symbol bk is sent over L independent Rayleigh
λ at the base station are required to achieve significant fading channels (we suppress the time index k below). The
antenna decorrelation, where λ is the wavelength.) received equivalent lowpass samples are
Time diversity relies on transmitting the same
information in several time slots that are separated by ri = ci b + zi , i = 1, . . . , L (26)
788 FADING CHANNELS
covers mostly the linear FSRs. The linear FSR sequences to be fed back to the rightmost stage. The leftmost stage
have been studied in various mathematics literature under gives an output sequence in which the first L terms are in
the term linear recurring sequences. Lidl and Niederreiter fact given as an initial condition.
gave a comprehensive treatment on this subject [13]. Some A state of this FSR at one instant k can be defined
other well-known textbooks on the theory of finite fields simply as the vector (xk−L , xk−L+1 , . . . , xk−1 ), and this will
and linear recurring sequences are available [14–18]. In be changed into (xk−L+1 , xk−L+2 , . . . , xk ) at the next instant.
this article, we will discuss the condition for their output An FSR is called linear if the connection logic is a linear
sequences to have maximum possible period. The maxi- function on xk−L , xk−L+1 , . . . , xk−1 , that is, if it is of the form
mum period sequences, which are known as m-sequences,
are described in detail, including randomness properties. xk = f (xk−L , xk−L+1 , . . . , xk−1 )
Two properties of m-sequences deserve special attention:
= cL xk−L ⊕ cL−1 xk−L+1 ⊕ · · · ⊕ c1 xk−1 (2)
m-sequences of period P have the two-level ideal autocor-
relation, which is the best over all the balanced binary
sequences of the same period, and they have the linear for some fixed constants c1 , c2 , . . . , cL . Otherwise, it is
complexity of log2 (P), which is the worst (or the small- called nonlinear. Here, the values of xi are either 0 or 1,
est) over the same set. Some related topics on these and hence the sequence is said to be over a binary alphabet,
properties will also be discussed. To give some further which is usually denoted as F2 , and ci ∈ F2 for all i. The
details of the ideal two-level autocorrelation property, we operation ⊕ can easily be implemented as an exclusive-OR
describe a larger family of balanced binary sequences operation and ci xk−i as an AND operation both using digital
which come from, so called, cyclic Hadamard difference logic gates. In the remaining discussion, we will simply
sets. An m-sequence can be regarded as the characteristic use addition and multiplication (mod 2), respectively, for
sequence of a cyclic Hadamard difference set of the Singer these operations. Over this binary alphabet, therefore, one
type. To give some better understanding of the linear can add and multiply two elements, and the subtraction is
complexity property, we describe the Berlekamp–Massey the same as addition. There is only one nonzero element
algorithm (BMA), which determines the shortest possible (which is 1), and the division by 1 is the same as the
linear FSR that generates a given sequence. multiplication by 1.
In Section 4, we describe some special cases of FSRs Another method of describing an FSR is to use its
(including nonlinear FSRs) with disjoint cycles in their truth table, in which all the 2L states are listed on
state diagrams. Branchless condition and balanced logic the left column and the next bits calculated from the
condition will be discussed. De Bruijn sequences will connection logic f are listed on the right. The state change
briefly be described. Finally, four-stage FSRs are analyzed can easily be illustrated by the state diagram in which
in detail for a complete example. Section 5 gives some every state is a node and an arrow indicates the beginning
concluding remarks. We will restrict our treatment to only (predecessor) and ending (successor) states. Figures 2 and
the sequences over a binary alphabet {0, 1} in this article. 3 show examples of three-stage FSRs, including their
truth tables, and state diagrams. Note that there are
exactly 2L states in total, and every state has at most
2. FEEDBACK SHIFT REGISTER SEQUENCES, TRUTH two predecessors and exactly one successor. Note also that
L
TABLES, AND STATE DIAGRAMS there are 22 different L-stage FSRs, corresponding to the
number of choices for the next bit column in the truth table.
The operation of an FSR can best be described by its state Finally, note that Fig. 3 has two disjoint cycles while Fig. 2
transition diagram. Its output at one time instant depends has branches and some absorbing states. Therefore, the
only on the previous state. Figure 1 shows a generic block FSR in Fig. 2 eventually will output the all-zero sequence,
diagram of an FSR (linear or non-linear) with L stages. At while that in Fig. 3 will output a sequence of period 7
every clock, the content of a stage is shifted to the left, and unless its initial condition is 000.
the connection logic or the boolean function f calculates a In order to investigate this situation more closely,
new value xk observe that any state has a unique successor, but up
to two predecessors. From this, we observe that a branch
xk = f (xk−L , xk−L+1 , . . . , xk−1 ) (1) occurs at a state that has two predecessors, and this
f (a0 , a1 , . . . , aL−1 ) = f (a0 , a1 , . . . , aL−1 ) ⊕ 1 This contains all the connection coefficients, and will
completely determine the operation of the LFSR provided
= f (a0 , a1 , . . . , aL−1 ) that the initial condition is specified. Note that cL = 0
in order for this LFSR to be genuinely with L stages.
Let g(a1 , a2 , . . . , aL−1 ) be a boolean function on L − 1 Otherwise, the recursion becomes of degree less than L,
variables such that and the LFSR with less than L stages can also be used to
implement the recursion.
f (0, a1 , . . . , aL−1 ) = g(a1 , a2 , . . . , aL−1 ) Note that it is also the characteristic polynomial of
the sequence satisfying this recursion. A given sequence
Then, the relation shown above can be written as may satisfy many other recursions that differ from each
other. The minimal polynomial of a given sequence is
f (a0 , a1 , . . . , aL−1 ) = a0 ⊕ g(a1 , a2 , . . . , aL−1 ) (3) defined as the minmum degree characteristic polynomial
of the sequence. It is irreducible, it becomes unique if it is
restricted to be monic, and it divides all the characteristic
This is called the branchless condition for an L-stage FSR.
polynomials of the sequence. We will return to this and
For FSRs with the branchless condition, the corresponding
more later when we discuss the linear complexity of
truth table has only 2L−1 independent entries, and the top
sequences.
half of the truth table must be the complement of the
In the state diagram of any LFSR, every state will have
bottom half. This condition is automatically satisfied with
a unique successor and a unique predecessor, as stated at
linear FSRs, which are the topic of the next section.
the end of the previous section. This forces the diagram to
be (possibly several) disjoint cycles of states. In Fig. 3, the
3. LINEAR FEEDBACK SHIFT REGISTERS AND state diagram has two disjoint cycles, one with length 7,
m-SEQUENCES and the other with length 1. From this, we can easily see
that the output sequence of an LFSR is periodic, and the
3.1. Basics period is the length of the cycle that the initial state (or the
initial condition) belongs to. We can conclude, therefore,
The output sequence {s(k)|k = 0, 1, 2, . . .} of a linear that the output sequence of a LFSR is periodic with some
feedback shift register (LFSR) with L stages as shown period P that depends on both the initial condition and the
in Fig. 4 satisfies a linear recursion of degree L. Given characteristic polynomial.
792 FEEDBACK SHIFT REGISTER SEQUENCES
One special initial condition is the all-zero state, and which we will not discuss in detail here. Instead, we refer
this state will always form a cycle of length 1 for any the reader to some references for further exploration in
LFSR. For any other initial state, the cycle will have theory [13–18]. See the article by Hansen and Mullen [19]
a certain length ≥1, and this length is the period of for a list of primitive polynomials of degree up to a few
the output sequence with the given initial condition. hundreds, which will generally suffice for any practical
Certainly, the cycle with the maximum possible length application. There are φ(2L − 1)/L primitive polynomials
must contain every not-all-zero state exactly once, and the of degree L over F2 . Some primitive polynomials are shown
output sequence in this case is known as the maximal in Table 1 for L up to 10. Here, φ(n) is the Euler φ function,
length linear feedback shift register sequence, or the m- and it counts the number of integers from 1 to n that are
sequence, in short. Sometimes, PN sequences are used relatively prime to n.
instead of m-sequences and vice versa, but we will make a In order to describe some properties of the output
clear distinction between these two terms. PN sequences sequences of LFSRs, we include all the four-stage LFSRs:
refer to (general) pseudonoise sequences that possess some block diagrams, characteristic polynomials, truth tables,
or various randomness properties, and m-sequences are state transition diagrams, and the output sequences
a specific and very special example of PN sequences. in Figs. 5–8. Note that there are 16 linear logics for
For an L-stage LFSR, this gives the period 2L − 1, and four-stage LFSRs, and the condition cL = 1 reduces it
Fig. 3 shows an example of an m-sequence of period 7. In into half.
fact, it shows seven different phases of this m-sequence Figure 5 shows two LFSRs that generate m-sequences
depending on the seven initial conditions. Note also that of period 15. The detailed properties of m-sequences
the history of any stage is the same m-sequence in different will be described in Section 3.2. Here, we note that
phase. two characteristic polynomials f1 (x) = x4 + x3 + 1 and
The operation of an LFSR is largely determined by its f2 (x) = x4 + x + 1 are reciprocal to each other. That is,
characteristic polynomial. It is a polynomial of degree L the coefficients are 11001 and 10011. This gives two m-
over the binary alphabet F2 . How it factors over F2 is sequences that are reciprocal to each other. In other words,
closely related to the properties of the output sequence. one is a 14 decimation of the other. Little arrows with a dot
In order to discuss the relation between the characteristic under the output sequences indicate this fact. Note that
polynomials and the corresponding output sequences, we the roots of f2 (x) are the 14th power of those of f1 (x). In
define some relations between sequences of the same general, if f (x) and g(x) are primitive of the same degree
period. and the roots of one polynomial are dth power of the
Let {s(k)} and {t(k)} be arbitrary binary sequences of other, then the m-sequence from one polynomial is a d
period P. Then we have the following three important decimation of that from the other. The truth table shows
relations between these two sequences: only the top half since its bottom half is the complement
of what is shown here, as a result of the branchless
1. One is said to be a cyclic shift of the other if there is condition.
a constant integer τ such that t(k − τ ) = s(k) for all Figure 6 shows LFSRs with the characteristic poly-
k. Otherwise, two sequences are said to be cyclically nomials that factor into smaller-degree polynomials.
distinct. When one is a cyclic shift of the other Note that f3 (x) = x4 + x3 + x2 + 1 = (x + 1)(x3 + x + 1) and
with τ = 0, they are said to be in different phases. f4 (x) = x4 + x2 + x + 1 = (x + 1)(x3 + x2 + 1). Since both of
Therefore, there are P distinct phases of {s(k)} that the degree 3 polynomials in their factorizations are prim-
are all cyclically equivalent. itive, the LFSR generates m-sequences of period 7, which
2. One is a complement of the other if t(k) = s(k) + 1 for could have been generated by a three-stage LFSR. Two
all k. characteristic polynomials are reciprocal, and the output
3. Finally, one is a decimation (or d decimation) of
the other if there are constants d and τ such that
t(k − τ ) = s(dk) for all k. If d is not relatively prime
Table 1. The Number of Primitive
to the period P, then the d decimation will result in
Irreducible Polynomials of Degree L and
a sequence with shorter period, which is P/g, where
Some Examplesa
g is the GCD of P and d.
Degree L φ(2L − 1)/L Primitive Polynomial
If some combination of these three relations applies
1 1 11
to {s(k)} and {t(k)}, then they are called equivalent. 2 1 111
Equivalent sequences share lots of common properties, 3 2 1011
and they are essentially the same sequences even if they 4 2 10011
look very different. 5 6 100101
The necessary and sufficient condition for an L-stage 6 6 1000011
LFSR to produce an m-sequence of period 2L − 1 is that the 7 18 10000011
characteristic polynomial of degree L is primitive over F2 . 8 16 100011101
This means simply that f (x) is irreducible and f (x) divides 9 48 1000010001
L 10 60 10000001001
x2 −1 − 1, but f (x) does not divide x j − 1 for all j from 1
to 2L − 2. The elementary theory of finite fields (or Galois a
The binary vector 1011 for L = 3 represents either
fields) deals much more with these primitive polynomials, x3 + x + 1 or 1 + x2 + x3 .
x0 x1 x2 x3 x0 x1 x2 x3
x4 x4
+ +
f1 ( x ) = x 4 + x 3 + 1 f2(x ) = x 4 + x + 1
States f1 f2
0000 0000
0000 0 0
0001 0 1
0001 0001
0010 0 0
0010 0011
0100 0011 0 1 0111
1001 0100 1 0 1111
0011 1110
0101 1 1
0110 1101
1101 0110 1 0 1010
1010 0111 1 1 0101
0101 1011
1011 0110
f1 (0), (000100110101111)
0111 1100
1111 1001 Figure 5. Block diagrams, state dia-
1110 f2 (0), (000111101011001) 0010 grams, truth tables, and output sequences
of four-stage LFSRs that generate m-
1100 0100
sequences: f1 (x) = x4 + x3 + 1 and f2 (x) =
1000 1000 x4 + x + 1.
x0 x1 x2 x3 x0 x1 x2 x3
x4 x4
+ +
f3(x ) = x 4 + x 3 + x 2 + 1 f4(x ) = x 4 + x 2 + x + 1
States f3 f4
0000 1111 0000 1111
0000 0 0
793
794 FEEDBACK SHIFT REGISTER SEQUENCES
x0 x1 x2 x3 x0 x1 x2 x3
x4 x4
+ +
f5(x ) = x 4 + x 2 + 1 f6(x ) = x 4 + x 3 + x + 1
States f5 f6
0000 1111
1011
0000 0 0
0000 0110
1101 0001 0 1 0010 0101
0100 1010
0010 1 0
0001 0011 1001
0011 1 1 0001
0010 0111
0100 0 1 1011 0011
0101 1111
0110 0111
1010 1110 0101 0 0
1101 1110
0100 1100 0110 1 1 1100
1000 1001
0111 1 0 1000
x0 x1 x2 x3 x0 x1 x2 x3
PSR
x4 PCR x4
+ +
f7(x ) = x 4 + x 3 + x 2 + x + 1 f8(x ) = x 4 + 1
0001 States f7 f8
0000 1111
0011
0000 0 0
0000 0110
0001 1 0 0101 0001
1100
1010 0010
1000 0010 1 0
0100
0010
0011 0 0 1000
0101 0111 0011
0100 1 0
1010 1111 0110 0111
0100 1110 0101 0 0 1100 1110
1001 1101 0110 0 0 1001 1101
1011 1011
0111 1 0
Figure 8. Block diagrams, state diagrams,
truth tables, and output sequences of four-stage f7 (0), (00011), (00101), (01111)
LFSRs with characteristic polynomials f7 (x) =
x4 + x3 + x2 + x + 1 (PSR) and f8 (x) = x4 + 1 f8 (0), (1), (01), (0001), (0011), (0111)
(PCR).
sequences are reciprocal. Figure 7 shows LFSRs with the has an irreducible but not primitive characteristic
self-reciprocal characteristic polynomials, and the output polynomial f7 . Observe that all the cycles except for the
sequences are self-reciprocal also. This means that read- cycle containing (0000) have the same length, or the
ing a sequence in the reverse direction gives a cyclically same period for its output sequences except for the all-
equivalent one to the original. zero sequence. This happens because the characteristic
Figure 8 shows two special LFSRs: the pure summing polynomial f7 is irreducible. An irreducible polynomial,
register (PSR) and the pure cycling register (PCR). PSR therefore, corresponds to a unique period, and it is called
FEEDBACK SHIFT REGISTER SEQUENCES 795
the period of the irreducible polynomial. A primitive where k − τ is computed mod P. When binary phase-
polynomial of degree L is simply an irreducible polynomial shift-keying is used to digitally modulate incoming bits,
with period 2L − 1. Possible periods of a given irreducible we are considering the incoming bits whose values are
polynomial of degree L are the factors of the integer 2L − 1, taken from the complex values {+1, −1}. The change in
which are not of the form 2 j − 1 for j < L. When 2L − 1 is alphabet between si ∈ {0, 1} and ti ∈ {+1, −1} is commonly
a prime, called a Mersenne prime, then every irreducible performed by the relation ti = (−1)si . Then we have R(τ ) =
polynomial of degree 2L − 1 must be primitive.
P−1
Details on the property of PCR and some other t(k)t(k − τ ) for each τ , and this calculates the number of
properties of FSRs will be given at the end of Section 4. k=0
agreements minus the number of disagreements when one
3.2. Properties of m-Sequences period of {s(k)} is placed on top of its (cyclically) τ -shifted
version. For any integer L ≥ 2, and for any m-sequence
Now, we will describe some basic properties of m-
{s(k)} of period P = 2L − 1, the ideal autocorrelation
sequences of period 2L − 1, mostly without proofs. The first
property of m-sequences refers to the following:
three properties, namely, balance, run-distribution, and
ideal autocorrelation are commonly known as ‘‘Golomb’s
2L − 1, τ ≡0 (mod 2L − 1)
postulates on random sequences’’ [10]. Most of the R(τ ) = (7)
−1, τ ≡ 0 (mod 2L − 1)
following properties can be easily checked for the examples
shown in Figs. 3 and 5. The ideal two-level autocorrelation property of an m-
3.2.1. Balance Property. In one period of an m- sequence enables one to construct a Hadamard matrix
sequence, the number of 1s and that of 0s are nearly of order 2L of, so called, cyclic type. A Hadamard matrix of
the same. Since the period is an odd integer, they cannot order n is an n × n matrix with entries only of ±1 such that
be exactly the same, but differ by one. This is called the any two distinct rows are orthogonal to each other [20].
balance property. When a matrix of size (2L − 1) × L is When the symbols of an m-sequence are mapped onto {±1}
formed by listing all the states of the maximum length and all the cyclic shifts are arranged in a square matrix
cycle, then the rightmost column is the m-sequence and it of order (2L − 1), the relation in (7) implies that the dot
will contain 2L−1 ones and 2L−1 − 1 zeros since the rows product of any two distinct rows is exactly −1 over the
are permutations of all the vectors of length L except for complex numbers. Therefore, adjoining a leftmost column
the all-zero vector. of all +1s and a top row of all +1s will give a cyclic
Hadamard matrix of order 2L .
3.2.2. Run Distribution Property. A string of the same Cyclic-type Hadamard matrices can be constructed
symbol of length l surrounded by different symbols at from a balanced binary sequence of period P ≡ 3 (mod 4)
both ends is called a ‘‘run of length l.’’ For example, a that has the ideal two-level autocorrelation function.
run of 1s of length 4 looks like . . . 011110 . . . . The run The m-sequences are one such class of sequences. Some
distribution property of m-sequences refers to the fact other well-known balanced binary sequences with period
that a shorter run appears more often than a longer run, P ≡ 3 (mod 4) will be described later.
and that the number of runs of 1s is the same as that of
0s. Specifically, it counts the number of runs of length l for 3.2.4. Span Property. If two vectors, (s(i), s(i + 1), . . . ,
l ≥ 1 in one period as shown in Table 2. The span property s(i + L − 1)) and (s(j), s(j + 1), . . . , s(j + L − 1)), of length L
of m-sequences implies this run distribution property. are distinct whenever i = j, then the sequence {s(k)} is said
to have this property. The indices of terms are considered
3.2.3. Ideal Autocorrelation Property. A periodic unnor-
mod P. For an m-sequence of period P, in addition, all the
malized autocorrelation function R(τ ) of a binary sequence
not-all-zero vectors of length L appear exactly once on the
{s(k)} of period P is defined as
windows of length L. This can easily be seen by observing
P−1 that an m-sequence is the sequence of the rightmost bits of
R(τ ) = (−1)s(k)+s(k−τ ) , τ = 0, 1, 2, . . . , states in the maximum length cycle of the state diagram.
k=0 Each window of length L can then easily be identified with
a state in this state diagram.
If we insert an additional 0 right after the run of 0’s of
Table 2. Run Distribution Property of
m-Sequences of Period 2L − 1 length L − 1, the sequence will have period 2L and the span
property becomes perfect in that every window of length
Length Number of Runs of 1s Number of Runs of 0s L shows all the vectors of length L exactly once. This is
L 1 0 an example of a de Bruijn sequence of order L. The above
L−1 0 1 construction has been successfully adopted [12] for use
L−2 1 1 in spread-spectrum modulations, for the values of L = 14
L−3 2 2 and L = 41. In general, many algorithms are currently
L−4 4 4 known for de Bruijn sequences of period 2L [21], and the
.. .. .. modification described above can easily be implemented,
. . .
as described in Section 4.
2 2L−4 2L−4
1 2L−3 2L−3
Total 2L−3 2L−3 3.2.5. Constant-on-the-Coset Property. For any m-
sequence of period 2L − 1, there are 2L − 1 cyclically
796 FEEDBACK SHIFT REGISTER SEQUENCES
equivalent sequences corresponding to the 2L − 1 start- m. Then the trace function trnm (·) is a mapping from F2n to
ing points. The term constant-on-the-coset property refers its subfield F2m given by
to the fact that there exists exactly one among all these
such that it is fixed with 2 decimation. An m-sequence in
e−1
mi
this phase is said to be in the characteristic phase. There- trnm (x) = x2
i=0
fore, for the m-sequence {s(k)} in the characteristic phase,
the following relation is satisfied:
It is easy to check that the trace function satisfies
the following: (1) trnm (ax + by) = a trnm (x) + b trnm (y), for
s(2k) = s(k), for all k (8) m
all a, b ∈ F2m , x, y ∈ F2n ; (2) trnm (x2 ) = trnm (x), for all x ∈
F2n ; and (3) tr1 (x) = tr1 (trm (x)), for all x ∈ F2n . See the
n m n
This relation deserves some special attention. It implies literature [13–18] for the detailed properties of the trace
that every term in the 2kth position is the same as the function.
one in the kth position. This gives a set (or several sets) of Let q = 2L and α be a primitive element of Fq . Then, an
positions in which the corresponding terms are the same. m-sequence {s(k)} of period 2L − 1 can be represented as
For example, {1, 2, 4, 8, 16, . . .} is one such set so that all
the terms indexed by any number in this set are the s(k) = trL1 (λα k ), k = 0, 1, 2, . . . , 2L − 2 (10)
same. Since the sequence is periodic with period 2L − 1,
the preceding set is a finite set and called a cyclotomic where λ = 0 is a fixed constant in Fq . We just give a
coset mod 2L − 1. Starting from 3 gives another such remark that λ corresponds to the initial condition and
set, {3, 6, 12, 24, . . .}, and so on. In general, the set of the choice of α corresponds to the choice of a primitive
integers mod 2L − 1 can be decomposed into some number polynomial as a connection polynomial when this sequence
of disjoint cyclotomic cosets, and now the constant-on-the- is generated using an LFSR. Any such representation, on
coset property describes itself clearly. the other hand, gives an m-sequence [17]. When λ = 1,
the sequence is in the characteristic phase, and the
3.2.6. Cycle-and-Add Property. When two distinct constant-on-the-coset property can easily be checked since
phases of an m-sequence are added term by term, a s(2k) = trL1 (α 2k ) = trL1 (α k ) = s(k) for all k.
sequence of the same period appears and it is a differ-
ent phase of the same m-sequence. In other words, for 3.2.9. Cross-Correlation Properties of m-Sequences. No
any given constants τ1 ≡ τ2 (mod 2L − 1), there exists yet pair of m-sequences of the same period have the ideal
another constant τ3 such that cross-correlation. The best one can achieve is a three-level
cross-correlation, and the pair of m-sequences with this
s(k − τ1 ) + s(k − τ2 ) = s(k − τ3 ), k = 0, 1, 2, . . . (9) property is called a preferred pair. Since all m-sequences
of a given period are some decimations or cyclic shifts of
This is the cycle-and-add property of m-sequences. On the each other, and they can all be represented as a single
other hand, if a balanced binary sequence of period P trace function from F2L to F2 , the cross-correlation of a
satisfies the cycle-and-add property, then P must be of the pair of m-sequences can be described as
form 2L − 1 for some integer L and the sequence must be
L −1
2
an m-sequence. L k+τ )+trL (α dk )
Golomb has conjectured that the span property and Rd (τ ) = (−1)tr1 (α 1
k=0
the ideal two-level autocorrelation property of a balanced
binary sequence implies its cycle-and-add property [22].
where d indicates that the second m-sequence is a d
This has been confirmed for L up to 10 by many others,
decimation of the first, and τ represents the amount of
but still awaits a complete solution.
phase offset with each other.
Many values of d have been identified that result in
3.2.7. Number of Cyclically Distinct m-Sequences of a preferred pair of m-sequences, but it is still unknown
Period 2L − 1. For a given L, the number of cyclically whether we have found them all. The most famous one
distinct m-sequences of period 2L − 1 is equal to the that gives a Gold sequence family comes from the value
number of primitive polynomials of degree L over F2 , d = 2i + 1 or d = 22i − 2i + 1 for some integer i when
and this is given by φ(2L − 1)/L, where φ(n) is the Euler L/(L, i) is odd. Some good references on this topic are
φ function and counts the number of integers from 1 to Refs. [36–38], and also Chapter 5 of Ref. [2].
n that are relatively prime to n. All these φ(2L − 1)/L
m-sequences of period 2L − 1 are equivalent, and they are 3.2.10. Linear Complexity. Given a binary periodic
related with some decimation of each other. Therefore, sequence {s(k)} of period P, one can always construct an
any given one m-sequence can be used to generate all LFSR that outputs {s(k)} with a suitable initial condition.
the others of the same period by using some appropriate One trivial solution is the pure cycling register as shown
decimations. in Fig. 8. It has P stages, the whole period is given as
its initial condition, and the characteristic polynomial (or
3.2.8. Trace Function Representation of m-Sequences. the connection) is given by f (x) = xP + 1 corresponding
Let q be a prime power, and let Fq be the finite field with q to s(k) = s(k − P). The best one can do is to find the
elements. Let n = em > 1 for some positive integers e and LFSR with the smallest number of stages, and the linear
FEEDBACK SHIFT REGISTER SEQUENCES 797
complexity of a sequence is defined as this number. The LFSR with the characteristic polynomial f (m) (x) and
Equivalently, it is the degree of the minimal polynomial length Lm (S) could not have generated s(0), s(1), . . . , s(m −
of the given sequence, which is defined as the minimum 1), s(m). Therefore, dm = 0.
degree characteristic polynomial of the sequence. If dn = 0, then this LFSR also generates the first n + 1
The linear complexity of a PN sequence, in general, bits s(0), s(1), . . . , s(n) and therefore, Ln+1 (S) = Ln (S) and
measures cryptographically how strong it is. It is well f (n+1) (x) = f (n) (x).
known that the same copy (whole period) of a binary If dn = 0, a new LFSR must be found to generate
sequence can be generated whenever 2N consecutive terms the first n + 1 bits s(0), s(1), . . . , s(n). The connection
or more are observed by a third party where N is the polynomial and the length of the new LFSR are updated
linear complexity of the sequence. This forces us to use by the following:
those sequences with larger linear complexity in some
practice. The m-sequences are the worst in this sense f (n+1) (x) = f (n) (x) − dn d−1 n−m (m)
m x f (x)
because an m-sequence of period 2L − 1 has its linear
complexity L, and this number is the smallest possible Ln+1 (S) = max [Ln (S), n + 1 − Ln (S)]
over all the balanced binary sequences of period 2L − 1. In
the following, we describe the famous Berlekamp–Massey The complete BM algorithm for implementations is as
algorithm for determining the linear complexity of a binary follows:
sequence [23].
condition of s(0), s(1), . . . , s(LN (S) − 1). We will denote this d = s(n) − ci s(n − i)
i=1
LFSR as LFSR(f (N) (x), LN (S)), where the characteristic
polynomial after the Nth iteration is given by
3. If d = 0, then r = r + 1, and go to step 6.
f (N) (x) = 1 + c(N)
1 x + c2 x + · · · + cLN (S) x
(N) 2 (N) LN (S) 4. If d = 0 and 2L > n, then
It is not difficult to check that (1) LN (S) = 0 if and f (x) = f (x) − db−1 xr g(x), r=r+1
only if s(0), s(1), . . . , s(N − 1) are all zeros, (2) LN (S) ≤ N,
and (3) LN (S) must be monotonically nondecreasing with and go to step 6.
increasing N.
5. If d = 0 and 2L ≤ n, then
The BMA updates the degree Ln (S) and the charac-
teristic polynomial f (n) (x) for each n = 1, 2, . . . , N. Assume
that f (1) (x), f (2) (x), . . . , f (n) (x) have been constructed, where h(x) = f (x), f (x) = f (x) − db−1 xr g(x), L = n + 1 − L,
the LFSR with connection f (n) (x) of degree Ln (S) generates g(x) = h(x), b = d, r = 1.
s(0), s(1), . . . , s(n − 1). Let
Ln (S) 6. Increase n by 1 and return to step 2.
f (n) (x) = 1 + c(n)
i x
i
i=1
When n = N and the algorithm is stopped in step (2), the
The next discrepancy, dn , is the difference between s(n) quantities produced by the algorithm bear the following
and the (n + 1)st bit generated by so far the minimal- relations:
length LFSR with Ln (S) stages, and given as
f (x) = f (N) (x)
Ln (S)
dn = s(n) + i s(n − i)
c(n) L = LN (S)
i=1
r=N−m
Let m be the sequence length before the last length change d = dN−1
in the minimal-length register:
g(x) = f (m) (x)
Lm (S) < Ln (S), and Lm+1 (S) = Ln (S) b = dm
798 FEEDBACK SHIFT REGISTER SEQUENCES
Table 3. Example of BM Algorithm to the Sequence and this has been confirmed for all the values of v = 4n − 1
(s0 , s1 , s2 , s3 , s4 , s5 , s6 ) = (1, 0, 1, 0, 0, 1, 1) over F2 up to v = 10000 with 13 cases unsettled, the smallest of
n L f (x) r g(x) b sn d which is v = 3439 [30].
0 0 1 1 1 1 1 1
3.4.1. When v = 4n − 1 Is a Prime. There are two
1 1 1+x 1 1 1 0 1
methods in this case [24,29]. The first corresponds to all
2 1 1 2 1 1 1 1
3 2 1 + x2 1 1 1 0 0 such values of v, and the resulting sequences are called
4 2 1 + x2 2 1 1 0 1 Legendre sequences. Here, D picks up integers mod v that
5 3 1 1 1 + x2 1 1 1 are squares mod v. The second corresponds to some such
6 3 1 + x + x3 2 1 + x2 1 1 0 values of v that can be represented as 4x2 + 27 for some
7 3 1 + x + x3 3 1 + x2 1 integer x, and the resulting sequences are called Hall’s
sextic residue sequences. Here, D picks up integers mod v
that are sixth powers mod v and some others.
An example of BM algorithm applied to a binary sequence
3.4.2. When v = 4n − 1 Is a Product of Twin Primes.
of length 7 is shown in Table 3.
This is a generalization of the method for constructing
Legendre sequences, and the resulting sequences are
3.4. Balanced Binary Sequences with the Ideal Two-Level
called twin prime sequences. Let v = p(p + 1), where both
Autocorrelation
p and p + 2 are prime. Then, D picks up the integers d that
In addition to m-sequences, there are some other well- are (1) squares both mod p and mod p + 2, (2) nonsquares
known balanced binary sequences with the ideal two- both mod p and mod p + 2, and (3) 0 mod p + 2.
level autocorrelation function. For period P = 4n − 1 for
some positive integer n, all these are equivalent to 3.4.3. When v = 4n − 1 = 2L − 1. Currently, this is a
(4n − 1, 2n − 1, n − 1)-cyclic difference sets [24]. most active area of research, and at least seven families are
In general, a (v, k, λ)-cyclic difference set (CDS) is a known. All the sequences of this case can best be described
k-subset D of the integers mod v, Zv , such that for each as a sum of some decimations of an m-sequence, or a sum
nonzero z ∈ Zv there are exactly λ ordered pairs (x, y), of trace functions from F2L to F2 . The m-sequences for all
x ∈ D, y ∈ D with z = x − y (mod v) [24–28]. For example, the positive integers L and GMW sequences for all the
D = {1, 3, 4, 5, 9} is a (11, 5, 2)-CDS and every nonzero composite integers L [32,33] have been known for many
integer from 1 to 10 is represented by the difference years. One important construction of a larger period from
x − y (mod 11), x ∈ D, y ∈ D, exactly twice. a given one is described by No et al. [34]. More recent
The characteristic sequence {s(t)} of a (v, k, λ)-cyclic discoveries were summarized by No et al. [35], most of
difference set D is defined as s(t) = 0 if and only if t ∈ D. which have now been proved by many others.
For other values of t, the value 1 is assigned to s(t). This
completely characterizes binary sequences of period v with
two-level autocorrelation function, and it is not difficult 4. SOME PROPERTIES OF FSR WITH DISJOINT CYCLES
to check that the periodic unnormalized autocorrelation
function is given as [29] We now return to the basic block diagram of an L-
stage FSR as shown in Fig. 1, and its truth table, and
state transition diagram as shown in Figs. 2 and 3.
v, τ ≡ 0 (mod v)
R(τ ) = (11) Unless otherwise stated, the feedback connection logic
v − 4(k − λ), τ ≡ 0 (mod v)
f (xk−L , xk−L+1 , . . . , xk−1 ) of all FSRs in this section satisfy
The out-of-phase value should be kept as low as possible the branchless condition given in Eq. (3).
in some practice, and this could happen when v = The simplest FSR with L stages is the pure cycling
4(k − λ) − 1, resulting in the out-of-phase value to be register (PCR), as shown in Fig. 8 for L = 4. It is linear
independent of the period v. The CDS with this parameter and has the characteristic polynomial f (x) = xL + 1, or
is called a cyclic Hadamard difference set, and this has the feedback connection logic xk = xk−L . This obviously
been investigated by many researchers [24–28]. satisfies the branchless condition (3), and the state
In the following, we will simply summarize all diagram consists only of disjoint cycles. In fact, one can
the known constructions for (4n − 1, 2n − 1, n − 1) cyclic prove that the number Z(L) of disjoint cycles of an L-stage
Hadamard difference sets, or equivalently, balanced PCR is given as
binary sequences of period v = 4n − 1 with the two-level
1
ideal autocorrelation function, which are also known as Z(L) = φ(d)2L/d , (12)
Hadamard sequences [30]. L d|L
Three types of periods are currently known: (1) v =
4n − 1 is a prime, (2) v = 4n − 1 is a product of twin where φ(d) is the Euler φ function and the summation is
primes, and (3) v = 4n − 1 is one less than a power of over all the divisors of L. It is not very surprising that
2. All these sequences can be represented as a sum of this number is the same as the number of irreducible
some decimations of an m-sequence. Song and Golomb polynomials of degree L over the binary alphabet F2 .
have conjectured that a Hadamard sequence of period v Golomb had conjectured that the number of disjoint cycles
exists if and only if v is one of the above three types [31], from an L-stage FSR with branchless condition satisfied
FEEDBACK SHIFT REGISTER SEQUENCES 799
by its connection logic is at most Z(L) given in (12), and Table 4. Output Sequences from
this was confirmed by Mykkeltveit in 1972 [39]. Two Degenerated FSR with L = 4
On the other hand, the minimum number of disjoint CCR Period CSR Period
cycles is 1, and this corresponds to de Bruijn sequences
of period 2L . Inserting a single 0 into any m-sequence (00001111) 8 (1) 1
right after the run of 0s of length L − 1 gives a de Bruijn (00001) 5
(01011010) 8 (00111) 5
sequence of period 2L . This can be done by the following
(01011) 5
modification to the linear logic fold that generates the
m-sequence
fnew = fold ⊕ xk−1 xk−2 · · · xk−L+1 , (13) All these satisfy the branchless condition, and the output
sequences from CCR and CSR for L = 4 are listed in
Table 4.
where x represents the complement of x. A de Bruijn
The number of L-stage FSRs with branchless condition
sequence can best be described using Good’s diagram. L−1
This is shown in Fig. 9 for L = 2 and L = 3. Note that is given as 22 . Another condition for the truth table
any node in a Good diagram has two incoming edges as of an FSR to be practically useful is the balanced logic
well as two outgoing edges. A de Bruijn sequence of period condition. A truth table of f for an L-stage FSR is said
2L corresponds to a closed path (or a cycle) on the Good to have the balanced logic condition if f has equally
diagram of order L, which visits every node exactly once. many 1s and 0s in its truth table. Both the branchless
It was shown earlier in 1946 by de Bruijn that the number condition and the balanced logic condition together imply
L−1 that f has equally many 1s and 0s in its top half of
of such cycles is given as 22 −L [10].
the truth table. The balanced logic condition guarantees
The number of disjoint cycles of an L-stage FSR with
that the autocorrelation of the output sequence of period
(possibly) nonlinear logic connections is not completely
approaching 2L tends to zero for τ = 1, 2, . . . , L.
determined. Toward this direction, we simply state a
Finally, we give a detailed analysis on the branchless
condition for the parity of this number for FSRs with
four-stage FSRs with all possible connections. With the
branchless condition. The number of disjoint cycles of 3
branchless condition on it, there are 22 = 28 = 256 such
an FSR with the branchless condition is even (or odd,
FSRs. Among these, 8 FSRs are linear as shown in
respectively) if and only if the number of 1s in its truth table
Figs. 5–8. These are collected in Table 5. Except for PCR,
of g(xk−1 , xk−2 , . . . , xk−L+1 ) is even (or odd, respectively) [10].
all the other 7 LFSRs are balanced logic FSRs. Among
In other words, the parity of the top half of the truth 3
the 248 nonlinear FSRs, there are 22 −4 = 16 FSRs, which
table is the parity of the number of disjoint cycles.
generate de Bruijn sequences of period 16, as shown in
This implies that PCR has an even number of disjoint
Table 6. This table also shows all the 16 output sequences
cycles L > 2.
in four equivalent classes, denoted as A, B, C, and D.
In addition to PCR, there are three more degener-
The linear complexity of each de Bruijn sequence is also
ate cases: complemented cycling registers (CCRs), pure
shown here. The 4th and 14th sequences are modified
summing registers (PSRs), and complemented summing
versions from the 6th and 8th m-sequences in Table 5,
registers (CSRs). Note that PCR and PSR are homo-
respectively. The modification is the insertion of a single
geneous linear, but CCR and CSR are inhomogeneous
0 into the m-sequence right after the run of 0s of length
linear:
3. Note that none of the connection logic satisfies the
balanced logic condition. Among the 256 FSRs with the
xk = xk−L , PCR
branchless condition, there are 84 = 70 FSRs that satisfy
= 1 + xk−L , CCR the balanced logic condition. Table 7 shows all of them. In
this table, ∗ represents that it is linear and is also shown
= xk−1 + xk−2 + · · · + xk−L PSR in Table 5.
= 1 + xk−1 + xk−2 + · · · + xk−L CSR
5. CONCLUDING REMARKS
Table 5. Truth Tables of All Four-Stage LFSRs, Including Their Output Sequences and
Characteristic Polynomials
Truth Table Output Sequences Characteristic Polynomials Figures
Table 6. The Truth Tables of All Four-Stage FSRs That Generate de Bruijn
Sequences in Four Equivalent Classesa
Truth Table Output Sequence Equivalent Class LC
operating. Integers mod 4 is the best known in this regard, Southern California, Los Angeles, in 1986 and 1991,
due to the application of the output sequences into QPSK respectively. After spending 2 years on the research
modulation [13,38]. staff in the Communication Sciences Institute at USC,
There are other directions to which FSR may be he joined Qualcomm Inc., San Diego, California in 1994 as
generalized. For example, one can consider LFSR with a Senior Engineer and worked in a team researching
inputs. One application is to use the LFSR with input and developing North American CDMA Standards for
as a polynomial division circuit. These are used in the PCS and cellular air interface. Finally, he joined the
decoding/encoding of Hamming codes or other channel Department of Electrical and Electronics Engineering at
(block) codes. Another example is to use multiple Yonsei University, Seoul, South Korea in 1995, and is
(nonlinear) FSRs on a stack so that the stages of an currently serving as an Associate Professor. His areas
FSR in one layer are used to produce inputs to upper layer of research interest are sequence design and analysis
FSRs on top of it. These find some important applications for speed-spectrum communications and stream ciphers,
to generating PN sequences with larger linear complexity theory and application of cyclic difference sets, and channel
in streamcipher systems. coding/decoding problems for wireless communications.
All these topics are currently under active research, He is currently visiting the University of Waterloo,
and one should look at journal transactions for the most Ontario, Canada, from March 2002 to Feb 2003 during
recent results and applications. his sabbatical leave.
BIOGRAPHY
BIBLIOGRAPHY
Hong-Yeop Song received his B.S. degree in Electronics
Engineering from Yonsei University in 1984 and his 1. S. W. Golomb, Digital Communications with Space Applica-
M.S.E.E. and Ph.D. degrees from the University of tions, Prentice-Hall, Englewood Cliffs, NJ, 1964.
FEEDBACK SHIFT REGISTER SEQUENCES 801
Table 7. Truth Tables and Cycle Length Distributions of All Four-Stage FSRs That Satisfy Both Branchless and Balanced
Logic Conditionsa
Truth Table Cycle Length Distribution Truth Table Cycle Length Distribution
1 1111000000001111 1 : 1, 15 : 1 36 1110000100011110 5 : 1, 11 : 1
2 1110100000010111 1 : 1, 4 : 1, 5 : 1, 6 : 1 37 1101000100101110 2 : 1, 14 : 1
3 1101100000100111 1 : 1, 2 : 1, 3 : 1, 10 : 1 38 1011000101001110 7 : 1, 9 : 1
4 1011100001000111 1 : 1, 15 : 1 39 0111000110001110 1 : 1, 15 : 1
5 0111100010000111 1 : 2, 5 : 1, 9 : 1 40 1100100100110110 2 : 1, 3 : 1, 5 : 1, 6 : 1
6 1110010000011011 1 : 1, 15 : 1 41 1010100101010110 5 : 1, 11 : 1
7 1101010000101011 1 : 1, 15 : 1 42* 0110100110010110 1 : 1, 5 : 3
8 1011010001001011 1 : 1, 15 : 1 43 1001100101100110 2 : 1, 14 : 1
9 0111010010001011 1 : 2, 6 : 1, 8 : 1 44 0101100110100110 1 : 1, 2 : 1, 3 : 1, 10 : 1
10 1100110000110011 1 : 1, 3 : 1, 6 : 2 45 0011100111000110 1 : 1, 15 : 1
11 1010110001010011 1 : 1, 15 : 1 46 1100010100111010 7 : 1, 9 : 1
12 0110110010010011 1 : 2, 5 : 1, 9 : 1 47 1010010101011010 4 : 1, 12 : 1
13 1001110001100011 1 : 1, 15 : 1 48 0110010110011010 1 : 1, 15 : 1
14 0101110010100011 1 : 2, 3 : 1, 11 : 1 49 1001010101101010 5 : 1, 11 : 1
15* 0011110011000011 1 : 2, 7 : 2 50* 0101010110101010 1 : 1, 15 : 1
16 1110001000011101 1 : 1, 15 : 1 51 0011010111001010 1 : 1, 15 : 1
17 1101001000101101 1 : 1, 2 : 1, 3 : 1, 10 : 1 52 1000110101110010 7 : 1, 9 : 1
18 1011001001001101 1 : 1, 3 : 1, 5 : 1, 7 : 1 53 0100110110110010 1 : 1, 3 : 1, 5 : 1, 7 : 1
19 0111001010001101 1 : 2, 3 : 1, 11 : 1 54 0010110111010010 1 : 1, 15 : 1
20 1100101000110101 1 : 1, 2 : 1, 3 : 1, 10 : 1 55 0001110111100010 1 : 1, 15 : 1
21 1010101001010101 1 : 1, 15 : 1 56 1100001100111100 2 : 1, 14 : 1
22 0110101010010101 1 : 2, 5 : 1, 9 : 1 57 1010001101011100 7 : 1, 9 : 1
23 1001101001100101 1 : 1, 2 : 1, 3 : 1, 10 : 1 58 0110001110011100 1 : 1, 15 : 1
24* 0101101010100101 1 : 2, 2 : 1, 3 : 2, 6 : 1 59 1001001101101100 2 : 1, 3 : 1, 5 : 1, 6 : 1
25 0011101011000101 1 : 2, 3 : 1, 11 : 1 60 0101001110101100 1 : 1, 2 : 1, 3 : 1, 10 : 1
26 1100011000111001 1 : 1, 15 : 1 61* 0011001111001100 1 : 1, 3 : 1, 6 : 2
27 1010011001011001 1 : 1, 15 : 1 62 1000101101110100 2 : 1, 14 : 1
28* 0110011010011001 1 : 2, 7 : 2 63 0100101110110100 1 : 1, 2 : 1, 3 : 1, 10 : 1
29 1001011001101001 1 : 1, 5 : 3 64 0010101111010100 1 : 1, 15 : 1
30 0101011010101001 1 : 2, 5 : 1, 9 : 1 65 0001101111100100 1 : 1, 2 : 1, 3 : 1, 10 : 1
31 0011011011001001 1 : 2, 5 : 1, 9 : 1 66 1000011101111000 5 : 1, 11 : 1
32 1000111001110001 1 : 1, 15 : 1 67 0100011110111000 1 : 1, 15 : 1
33 0100111010110001 1 : 2, 3 : 1, 11 : 1 68 0010011111011000 1 : 1, 15 : 1
34 0010111011010001 1 : 2, 6 : 1, 8 : 1 69 0001011111101000 1 : 1, 4 : 1, 5 : 1, 6 : 1
35 0001111011100001 1 : 2, 5 : 1, 9 : 1 70* 0000111111110000 1 : 1, 15 : 1
a
Here, a : b means that there are b cycles of length a, and * implies that the FSR is linear.
2. M. K. Simon, J. K. Omura, R. A. Scholtz, and B. K. Levitt, 10. S. W. Golomb, Shift Register Sequences, Holden-Day, San
Spread Spectrum Communications Handbook, Computer Francisco, CA, 1967; rev. ed., Aegean Park Press, Laguna
Science Press, Rockville, MD, 1985; rev. ed., McGraw-Hill, Hills, CA, 1982.
1994. 11. D. R. Stinson, Cryptography: Theory and Practice, CRC Press,
3. R. L. Perterson, R. E. Ziemer, and D. E. Borth, Introduction Boca Raton, FL, 1995.
to Spread Spectrum Communications, Prentice-Hall, Engle- 12. TIA/EIA/IS-95, Mobile Station–Base Station Compatibility
wood Cliffs, NJ, 1995. Standard for Dual-Mode Wideband Spread Spectrum Cel-
4. J. G. Proakis and M Salehi, Communication Systems Engi- lular System, published by Telecommunications Industry
Association as a North American 1.5 MHz Cellular CDMA
neering, Prentice-Hall, Englewood Cliffs, NJ, 1994.
Air-Interface Standard, July 1993.
5. A. J. Viterbi, CDMA: Principles of Spread Spectrum Commu-
13. R. Lidl and H. Niederreiter, Finite fields, in Encyclopedia of
nication, Addison-Wesley, Reading, MA, 1995.
Mathematics and Its Applications, Vol. 20, Addison-Wesley,
6. J. G. Proakis, Digital Communications, 4th ed., McGraw-Hill, Reading, MA, 1983.
New York, 2001. 14. E. R. Berlekamp, Algebraic Coding Theory, McGraw-Hill,
7. R. A. Rueppel, Analysis and Design of Stream Ciphers, New York, 1968.
Springer-Verlag, Berlin, 1986. 15. W. W. Peterson and E. J. Weldon, Error-Correcting Codes,
8. A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone, Hand- 2nd ed., MIT Press, Cambridge, MA, 1972.
book of Applied Cryptography, CRC Press, Boca Raton, FL, 16. F. MacWilliams and N. J. A. Sloane, The Theory of Error-
1996. Correcting Codes, North-Holland, 1977.
9. S. B. Volchan, What is a Random Sequence? Am. Math. 17. R. J. McEliece, Finite Fields for Computer Scientists and
Monthly 109: 46–63 (2002). Engineers, Kluwer, 1987.
802 FINITE-GEOMETRY CODES
2.2. PG Codes
N = 22s − 1
K = 22s − 3s Since GF(2(m+1)s ) contains GF(2s ) as a subfield, it is possible
to divide the 2(m+1)s − 1 nonzero elements of GF(2(m+1)s )
dmin = 2s + 1 into N(m, s) = (2(m+1)s − 1)/(2s − 1) = 2ms + 2(m−1)s + · · · +
2s + 1 disjoint subsets of 2s − 1 elements each. Then
each subset is regarded as a point in PG(m, 2s ), the
For s = 2, EG(2, 22 ) has 15 lines not passing through 0, m-dimensional projective geometry over the finite field
each point belonging to four different lines. Then, the GF(2s ).
804 FINITE-GEOMETRY CODES
Two distinct points p1 and p2 of PG(m, 2s ) define a Table 1. Some EG and PG Codes of Length
unique line L(p1 , p2 ) passing through p1 and p2 . The N ≤ 1000
line L(p1 , p2 ) contains a total of (22s − 1)/(2s − 1) = 2s + 1 EG Codes PG Codes
points pi satisfying the equation
m s µ N K dmin N K dmin
pi = α1 p1 + α2 p2 , (3) 2 2 1 15 7 5 21 11 6
2 3 1 63 37 9 73 45 10
with α1 and α2 in GF(2s ), and not both zero. Given a point 3 2 1 63 13 21 85 24 22
p, the N(m, s) − 1 = 2s (2ms − 1)/(2s − 1) other points in 3 2 2 63 48 5 85 68 6
PG(m, 2s ) can be divided into subsets of 2s points each 2 4 1 255 175 17 273 191 18
based on Eq. (3) such that each subset corresponds to a 4 2 1 255 21 85 341 45 86
4 2 2 255 127 21 341 195 22
distinct line of PG(m, 2s ) containing p. As a result, there
4 2 3 255 231 5 341 315 6
are (2ms − 1)/(2s − 1) lines intersecting on each point p of 3 3 1 511 139 73 585 184 74
PG(m, 2s ) and the total number of lines in PG(m, 2s ) is 3 3 2 511 448 9 585 520 10
N(m, s)/(2s + 1) · (2ms − 1)/(2s − 1).
As for EG codes, we associate to PG(m, 2s ) an
incidence matrix H whose rows are associated with the
N(m, s)/(2s + 1) · (2ms − 1)/(2s − 1) lines of PG(m, 2s ), and original definition). It is readily seen that both EG and PG
whose columns are associated with the N(m, s) points codes satisfy this definition. However, compared with their
of PG(m, 2s ). This incidence matrix H defines the dual counterpart LDPC codes presented by Gallager [24], EG
space of a linear block PG code C denoted CPG (m, s) of and PG codes have several fundamental differences: (1)
length N = N(m, s). As for EG codes, the dimension K their values J and L are in general much larger, (2) their
and minimum distance dmin of CPG (m, s) depend on the total number of check sums defining H is larger, and (3)
values m and s, with dmin ≥ (2ms − 1)/(2s − 1) + 1. Also, they usually have a cyclic structure. On the basis of these
PG codes are cyclic codes and have structural properties observations, new LDPC codes can be constructed from
similar to those of EG codes, namely (1) every row of EG and PG codes by splitting the rows or columns of H
H contains exactly 2s + 1 ‘‘ones,’’ (2) every column of H or its transposed matrix [17]. If these operations are done
contains exactly (2ms − 1)/(2s − 1) ‘‘ones,’’ and (3) any two in a systematic and uniform way, new LDPC codes are
rows of H have at most one ‘‘one’’ in common. obtained. Interestingly, these codes become quasicyclic, so
For m = 2, the 2-dimensional PG codes CPG (2, s) are that fast encoding based on LFSRs is still possible.
equivalent to the DSC codes described by Weldon [9] and As an example, let us consider CEG (2, 6), a (4095,3367)
have the following parameters: EG code with J = L = 64, and let us split each column
of H into 16 columns of weight 4 each. Then we
N = 22s + 2s + 1 obtain a (65520,61425) EG extended code with J = 4
K = 22s + 2s − 3s and L = 64 [17]. Note that this new code has rate
R = 0.9375 = 1 − J/L.
dmin = 2s + 2
1. For 0 ≤ i ≤ N − 1, evaluate all checksums y · h in each of the N initial inputs a new estimate of this
B(i). quantity, based on exactly the same set of constraints.
2. Let B(i)+ and B(i)− represent the sets of satisfied This is due to the fact that all EG, extended EG and PG
and unsatisfied checksums in B(i), respectively. codes presented in Section 2 are one-step majority-logic
If |B(i)+ | ≥ |B(i)− |, decode v̂i = yi . decodable codes. Consequently, this observation suggests
Else, decode v̂i = yi ⊕ 1. a straightforward heuristic approach; we may use these
estimates as new inputs and iterate the decoding process.
Since a majority vote is taken for each bit, this decoding If for 0 ≤ i ≤ N − 1, x(0)
i represents the a priori information
method is known as ‘‘majority-logic decoding.’’ It was about position i and for l ≥ 1, and x(l)i represents the a
first introduced by Reed [11] to decode Reed–Muller posteriori estimate of position i evaluated at iteration l,
codes. Furthermore, since for the codes considered, all then iterative decoding is achieved on the basis of:
N bits can be decoded by this algorithm, these codes
i = xi + fi (x
x(l) (0) (l−1)
belong to the class of ‘‘one-step majority-logic decodable ) (6)
codes.’’ Majority-logic decoding is a very simple algebraic where fi () represents the function used to compute the
decoding method that fits high-speed implementations a posteriori estimate of position i from the a priori
since only binary operations are involved. It allows one inputs and x(l−1) represents the vector of a posteriori
to correct any error pattern of Hamming weight (|B(i)| − values evaluated at iteration (l − 1). We notice that (6)
1)/2 = (2ms−1 − 2s−1 )/(2s − 1), which corresponds to the implicitly assumes an iterative updating of the original
guaranteed error-correcting capability t = (dmin − 1)/2 a priori values on the basis of the latest a posteriori
of extended EG and PG codes. Since for EG codes, estimates. Iterative majority-logic decoding (also known
|B(i)| = (2ms − 1)/(2s − 1) − 1 is even, this algorithm has as ‘‘iterative bit flipping’’ decoding) and iterative APP
to be refined in order to consider the case where |B(i)|/2 decoding approaches were first proposed [24] to decode
checksums are satisfied and |B(i)|/2 checksums are LDPC codes. Refined versions of this heuristic approach
unsatisfied. In that case, we always decode v̂i = yi , so that for some EG, extended EG and PG codes have been
the guaranteed error-correcting capability t = |B(i)|/2 = proposed [14,15].
(dmin − 1)/2 of EG codes is also achieved. Although these iterative decoding methods allow one
The sets B(i), 0 ≤ i ≤ N − 1 can also be used to evaluate to achieve significant performance improvements on the
the a posteriori probabilities qi that the values yi are in first iteration, they fall short of optimum decoding.
error, based on the a priori error probabilities pi defined The main reason is the introduction and propagation
by the channel model considered. For example, pi = ε for a of correlated values from the second iteration, which is
binary symmetric channel (BSC) with crossover probabil- readily explained as follows. Let us assume that positions
ity ε and pi = e−|4 ri /N0 | /(1 + e−|4 ri /N0 | ) for an additive white i and j contribute to the same checksum. Then, according
Gaussian noise (AWGN) channel with associated variance to (6), for l ≥ 2, xj(l−1) depends on x(0) (0)
i and consequently, xi
N0 /2. Since the checksums in each set B(i), 0 ≤ i ≤ N − 1
and fi (x(l−1) ) are correlated when evaluating x(l)
i .
are orthogonal on position i, it follows that
This problem can be overcome by computing for each
−1
|B(i)|
1 − mj,i σj ⊕1 mj,i σj position i as many a posteriori values as checksums inter-
1 − p
qi = 1 +
i
secting on that position. This method, implicitly contained
pi j=1
mj,i 1 − mj,i in the performance analysis of Ref. 24, is also known as
(5) ‘‘belief propagation’’ (BP) [26]. A good presentation of BP
where σj is the result of the jth checksum in B(i), and decoding can be found in Ref. 27. BP decoding of long
LDPC codes has been showed to closely approach the Shan-
1 non capacity of the BSC and AWGN channel [28,29]. For
mj,i = 1− (1 − 2pi ) 0 ≤ i ≤ N − 1, let x(0) represent the a priori information
2 i
i ∈N(j)\i
about position i and for 1 ≤ j ≤ |B(i)|, and l ≥ 1 let xj,i
(l)
rep-
N(j)\i representing the set of nonzero positions corre- resent the a posteriori estimate of position i computed at
sponding to checksum σj , but position i. Note that mj,i iteration l based on the checksums other than checksum j
represents the probability that the sum of the bits in in B(i). Then at iteration l, BP decoding evaluates
N(j)\i mismatches the transmitted bit i. The correspond-
ing decoding algorithm is given as follows:
(l)
xj,i = x(0)
i + fj,i (x
(l−1)
(i)) (7)
paper [17]. Despite the presence of loops in the Tanner 7. C. R. P. Hartmann, J. B. Ducey, and L. D. Rudolph, On the
graph of all these codes, near-optimum performance can structure of generalized finite geometry codes, IEEE Trans.
still be achieved. Inform. Theory 20: 240–252 (March 1974).
8. S. Lin and K. P. Yiu, An improvement to multifold euclidean
3.2. General Decoding of EG and PG Codes geometry codes, Inform. Control 28: (July 1975).
One-stage decoding of EG and PG codes in general 9. E. J. Weldon, Difference-set cyclic codes, Bell Syst. Tech. J.
is achieved by repeating the method described in 45: 1045–1055 (Sept. 1966).
Section 3.1.1 in at most µ steps. At each step, checksums
10. D. J. Costello, Jr., J. Hagenauer, H. Imai, and S. B. Wicker,
with less and less common positions are estimated until Applications of error-control coding, Commemorative Issue,
at most only one common position between any two IEEE Trans. Inform. Theory 44: 2531–2560 (Oct. 1998).
checksums remains, as in Section 3.1.1. A good description
11. I. S. Reed, A class of multiple-error-correcting codes and the
of majority-logic decoding of EG and PG codes can be found
decoding scheme, IRE Trans. Inform. Theory 4: 38–49 (Sept.
in Chaps. 7 and 8 of Ref. 23. 1954).
On the other hand, iterative decoding of EG and PG
codes in general is not as straightforwardly generalizable. 12. C. L. Chen, On majority-logic decoding of finite geometry
codes, IEEE Trans. Inform. Theory 17: 332–336 (May 1971).
In general, a multistep decoding method is also required
to obtain the a posteriori error probabilities of all N bits, 13. T. Kasami and S. Lin, On majority-logic decoding for duals of
which inherently complicates the application of iterative primitive polynomial codes, IEEE Trans. Inform. Theory 17:
decoding. 322–331 (May 1971).
14. K. Yamaguchi, H. Iizuka, E. Nomura, and H. Imai, Variable
BIOGRAPHY threshold soft decision decoding, IEICE Trans. Elect.
Commun. 72: 65–74 (Sept. 1989).
Marc P.C. Fossorier was born in Annemasse, France, 15. R. Lucas, M. Bossert, and M. Breitbach, On iterative soft-
on March 8, 1964. He received the B.E. degree from the decision decoding of linear binary block codes and product
National Institute of Applied Sciences (I.N.S.A.) Lyon, codes, IEEE J. Select. Areas Commun. 16: 276–296 (Feb.
France, in 1987, and the M.S. and Ph.D. degrees from the 1998).
University of Hawaii at Manoa, Honolulu, in 1991 and 16. R. Lucas, M. Fossorier, Y. Kou, and S. Lin, Iterative decoding
1994, all in electrical engineering. of one-step majority logic decodable codes based on belief
In 1996, he joined the Faculty of the University of propagation, IEEE Trans. Commun. 48: 931–937 (June
Hawaii, Honolulu, as an Assistant Professor of Electrical 2000).
Engineering. He was promoted to Associate Professor 17. Y. Kou, S. Lin, and M. Fossorier, Low density parity check
in 1999. codes based on finite geometries: A rediscovery and more,
His research interests include decoding techniques for IEEE Trans. Inform. Theory (Oct. 1999).
linear codes, communication algorithms, combining coding 18. H. B. Mann, Analysis and Design of Experiments, Dover, New
and equalization for ISI channels, magnetic recording and York, 1949.
statistics. He coauthored (with S. Lin, T. Kasami, and T.
19. R. D. Carmichael, Introduction to the Theory of Groups of
Fujiwara) the book Trellises and Trellis-Based Decoding Finite Order, Dover, New York, 1956.
Algorithms, (Kluwer Academic Publishers, 1998).
20. W. W. Peterson and E. J. Weldon, Jr., Error-Correcting
Dr. Fossorier is a recipient of a 1998 NSF Career
Codes, 2nd ed., MIT Press, Cambridge, MA, 1972.
Development award. He has served as editor for the IEEE
Transactions on Communications since 1996, as editor 21. I. F. Blake and R. C. Mullin, The Mathematical Theory of
for the IEEE Communications Letters since 1999, and he Coding, Academic Press, New York, 1975.
currently is the treasurer of the IEEE Information Theory 22. F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-
Society. Since 2002, he has also been a member of the Board Correcting Codes, North-Holland Mathematical Library,
of Governors of the IEEE Information Theory Society. Amsterdam, 1977.
23. S. Lin and D. J. Costello, Jr., Error Control Coding, Funda-
BIBLIOGRAPHY mentals and Applications, Prentice-Hall, Englewood Cliffs,
NJ, 1983.
1. L. D. Rudolph, Geometric Configuration and Majority Logic 24. R. G. Gallager, Low-Density Parity-Check Codes, MIT Press,
Decodable Codes, M.E.E. thesis, Univ. Oklahoma, Norman, Cambridge, MA, 1963.
1964.
25. J. L. Massey, Threshold Decoding, MIT Press, Cambridge,
2. L. D. Rudolph, A class of majority logic decodable codes, IEEE MA, 1963.
Trans. Inform. Theory 13: 305–307 (April 1967).
26. J. Pearl, Probabilistic Reasoning in Intelligent Systems:
3. P. Delsarte, A geometric approach to a class of cyclic codes, J. Networks of Plausible Inference, Morgan Kaufmann, San
Combin. Theory 6: 340–358 (1969). Mateo, CA, 1988.
4. P. Delsarte, J. M. Goethals, and J. MacWilliams, On GRM 27. D. J. C. MacKay, Good error-correcting codes based on very
and related codes, Inform. Control 16: 403–442 (July 1970). sparse matrices, IEEE Trans. Inform. Theory 45: 399–432
5. S. Lin, Shortened finite geometry codes, IEEE Trans. Inform. (March 1999).
Theory 18: 692–696 (July 1972). 28. T. Richardson, A. Shokrollahi, and R. Urbanke, Design of
6. S. Lin, Multifold Euclidean geometry codes, IEEE Trans. probably good low-density parity check codes, IEEE Trans.
Inform. Theory 19: 537–548 (July 1973). Inform. Theory (in press).
FREQUENCY AND PHASE MODULATION 807
29. S. Y. Chung, G. D. Forney, Jr., T. Richardson, and R. Ur- 2. REPRESENTATION OF FM AND PM SIGNALS
banke, On the design of low-density parity-check codes within
0.0051 dB of the Shannon limit, IEEE Commun. Lett. (in An angle modulated signal in general can be expressed as
press).
30. R. M. Tanner, A recursive approach to low complexity codes, u(t) = Ac cos(θ (t))
IEEE Trans. Inform. Theory 27: 533–547 (Sept. 1981).
θ (t) is the phase of the signal and its instantaneous
frequency fi (t) is given by
FREQUENCY AND PHASE MODULATION
1 d
fi (t) = θ (t) (1)
2π dt
MASOUD SALEHI
Northeastern University Since u(t) is a bandpass signal it can be represented as
Boston, Massachusetts
u(t) = Ac cos(2π fc t + φ(t)) (2)
kp m(t), PM
can be done. Another property of angle modulation is its φ(t) = t (6)
bandwidth expansion property. Frequency and phase mod- 2π kf −∞ m(τ ) dτ, FM
ulation systems generally expand the bandwidth such that
the effective bandwidth of the modulated signal is usually This expression shows the close relation between FM and
many times the bandwidth of the message signal.1 With PM. This close relationship makes it possible to analyze
a higher implementation complexity and a higher band- these systems in parallel and only emphasize their main
width occupancy, one would naturally raise a question as differences. The first interesting result observed from the
to the usefulness of these systems. As we will show, the above is that if we phase modulate the carrier with
the integral of a message, it is equivalent to frequency
major benefit of these systems is their high degree of noise
modulation of the carrier with the original message. On
immunity. In fact, these systems trade off bandwidth for
the other hand, the above relation can be expressed as
high noise immunity. That is the reason why FM systems
are widely used in high-fidelity music broadcasting and
d d
point-to-point communication systems where the trans- φ(t) = kp m(t), PM (7)
dt dt
mitter power is quite limited. Another benefit of theses 2π m(t), FM
systems is their constant envelop property. Unlike ampli-
tude modulation, these systems have a constant amplitude which shows that if we frequency-modulate the carrier
which makes them attractive choices to use with nonlinear with the derivative of a message the result is equivalent
amplification devices (class C amplifiers or TWT devices). to phase modulation of the carrier with the message itself.
FM radio broadcasting was first initiated by Edwin Figure 1 shows the above relation between FM and PM.
H. Armstrong in 1935 in New York on a frequency of Figure 2 illustrates a square-wave signal and its integral,
42.1 MHz. At that time when FM broadcasting started. a sawtooth signal, and their corresponding FM and PM
Later with the advent of TV broadcasting the 88–108- signals.
MHz frequency band was used for FM broadcasting. The demodulation of an FM signal involves finding
Commercial stereo FM broadcasting started in Chicago the instantaneous frequency of the modulated signal and
in 1961. then subtracting the carrier frequency from it. In the
demodulation of PM, the demodulation process is done by
finding the phase of the signal and then recovering m(t).
1Strictly speaking, the bandwidth of the modulated signal, as will The maximum phase deviation in a PM system is given by
be shown later, is infinite. That is why we talk about the effective
bandwidth. φmax = kp max[|m(t)|] (8)
808 FREQUENCY AND PHASE MODULATION
1 2
1
0 2 4 t
−1 0
0 1 2 3 4 t
1 1
FM signal
0 0
−1 Eq −1
0 1 2 3 4 ui 0 1 2 3 4
va
le
nt
1
1
PM signal
0
0
−1
Figure 2. Frequency and phase modulation of 0 1 2 3 4 −1
square and sawtooth waves. 0 1 2 3 4
FREQUENCY AND PHASE MODULATION 809
Vu (t )
the total power in the signal. A series expansion for the The bandwidth3 of the modulated signal is given by
Bessel function is given by
n+2k p a + 1)f
2(k m, PM
β Bc = 2(β + 1)fm = kf a (31)
(−1) k 2 + 1 fm , FM
∞
2 fm
Jn (β) = (26)
k!(k + n)!
k=0 or
2(kp a + 1)fm , PM
Bc = (32)
The expansion here shows that for small β, we can use the 2(kf a + fm ), FM
approximation
βn This relation shows that increasing a, the amplitude of the
Jn (β) ≈ n (27) modulating signal, in PM and FM has almost the same
2 n!
effect on increasing the bandwidth Bc . On the other hand,
Thus for a small modulation index β, only the first increasing fm , the frequency of the message signal, has
sideband corresponding to n = 1 is of importance. Also, a more profound effect in increasing the bandwidth of a
using the expansion in (26), it is easy to verify the following PM signal as compared to an FM signal. In both PM and
symmetry properties of the Bessel function. FM the bandwidth Bc increases by increasing fm , but in
PM this increase is a proportional increase and in FM
this is only an additive increase which in most cases of
Jn (β), n even
J−n (β) = (28) interest, (for large β) is not substantial. Now if we look at
−Jn (β), n odd
the number of harmonics in the bandwidth (including the
Plots of Jn (β) for various values of n are given in Fig. 4, carrier) and denote it with Mc , we have
and a tabulation of the values of the Bessel function is
given in Table 1. The underlined and doubly underlined 2kp a + 3,
PM
!
entries for each β indicate the minimum value of n that Mc = 2β + 3 = kf a (33)
2 + 3, FM
guarantees the signal will contain at least 70% or 98% of fm
the total power, respectively.
In general the effective bandwidth of an angle- Increasing the amplitude a increases the number of
modulated signal, which contains at least 98% of the harmonics in the bandwidth of the modulated signal
signal power, is approximately given by the relation in both cases. However, increasing fm , has no effect on
the number of harmonics in the bandwidth of the PM
Bc = 2(β + 1)fm (29) signal and decreases the number of harmonics in the
FM signal almost linearly. This explains the relative
where β is the modulation index and fm is the frequency of insensitivity of the bandwidth of the FM signal to the
the sinusoidal message signal. message frequency. On one hand, increasing fm decreases
It is instructive to study the effect of the amplitude the number of harmonics in the bandwidth and, at the
same time, increases the spacing between the harmonics.
and frequency of the sinusoidal message signal on the
The net effect is a slight increase in the bandwidth. In
bandwidth and the number of harmonics in the modulated
signal. Let the message signal be given by
3 From now on, by bandwidth we mean effective bandwidth unless
0.2
−0.2
−0.4
−0.6
0 1 2 3 4 5 6 7 8 9 10
Figure 4. Bessel functions for various values of n. b
FREQUENCY AND PHASE MODULATION 811
Source: From Ziemer and Tranter (1990) Houghton Mifflin, reprinted by permission.
PM, however, the number of harmonics remains constant period and we can find its Fourier series expansion as
and only the spacing between them increases. Therefore,
the net effect is a linear increase in bandwidth. Figure 5
∞
shows the effect of increasing the frequency of the message ejβm(t) = cn ej2π nfm t (36)
n=−∞
in both FM and PM.
PM FM
fc + 2fm fc + 2fm
fc fc + fm fc + 3fm f fc fc + f m fc + 3f m f
f ′m >fm f ′m >fm
fc + 2f ′m fc + 2f ′m
fc fc + f ′m fc + 3f ′m f fc fc + f ′m f
It is seen again that the modulated signal contains all When m(t) = 0, the frequency of the tuned circuit is given
frequencies of the form fc + nfm . 1
by fc = √ . In general, for nonzero m(t), we have
The detailed treatment of the spectral characteristics 2π L0 C0
of an angle-modulated signal for a general nonperiodic
deterministic message signal m(t) is quite involved 1
fi (t) = $
because of the nonlinear nature of the modulation process. π L0 (C0 + k0 m(t))
However, there exists an approximate relation for the 1 1
effective bandwidth of the modulated signal, known as the = √
2π L0 C0 k0
Carson’s rule, and given by 1+ m(t)
C0
1
Bc = 2(β + 1)W (39) = fc (42)
k0
1+ m(t)
C0
where β is the modulation index defined as
Assuming that
k0
kp max[|m(t)|], PM
ε= m(t) 1
β= kf max[|m(t)|] (40) C0
, FM
W
and using the approximations
and W is the bandwidth of the message signal m(t). Since √ ε
in wideband FM the value of β is usually around ≥5, it 1+ε ≈1+ (43)
2
is seen that the bandwidth of an angle-modulated signal
1
is much greater than the bandwidth of various amplitude ≈1−ε (44)
modulation schemes, which is either W (in SSB) or 2W (in 1+ε
DSB or conventional AM). we obtain
k0
fi (t) ≈ fc 1 − m(t) (45)
2C0
4. IMPLEMENTATION OF ANGLE MODULATORS AND
DEMODULATORS which is the relation for a frequency-modulated signal.
A second approach for generating an FM signal is by use
of a reactance tube. In the reactance tube implementation
Angle modulators are in general time-varying and an inductor whose inductance varies with the applied
nonlinear systems. One method for generating an FM voltage is employed and the analysis is very similar to
signal directly is to design an oscillator whose frequency the analysis presented for the varactor diode. It should
changes with the input voltage. When the input voltage be noted that although we described these methods for
is zero the oscillator generates a sinusoid with frequency generation of FM signals, due to the close relation between
fc , and when the input voltage changes, this frequency FM and PM signals, basically the same methods can be
changes accordingly. There are two approaches to design applied for generation of PM signals (see Fig. 1).
such an oscillator, usually called a VCO or voltage-
controlled oscillator. One approach is to use a varactor 4.1. Indirect Method for Generation of Angle-Modulated
diode. A varactor diode is a capacitor whose capacitance Signals
changes with the applied voltage. Therefore, if this
capacitor is used in the tuned circuit of the oscillator Another approach for generating an angle-modulated
and the message signal is applied to it, the frequency signal is to first generate a narrowband angle-modulated
of the tuned circuit, and the oscillator, will change in signal and then change it to a wideband signal. This
accordance with the message signal. Let us assume that method is usually known as the indirect method for
the inductance of the inductor in the tuned circuit of generation of FM and PM signals. Because of the similarity
Fig. 6 is L0 and the capacitance of the varactor diode is of conventional AM signals, generation of narrowband
given by angle modulated signals is straightforward. In fact, any
modulator for conventional AM generation can be easily
C(t) = C0 + k0 m(t) (41) modified to generate a narrowband angle modulated
signal. Figure 7 is a block diagram of a narrowband
angle modulator. The next step is to use the narrowband
angle-modulated signal to generate a wideband angle-
modulated signal. Figure 8 is a block diagram of a system
Cv that generates wideband angle-modulated signals from
narrowband angle-modulated signals. The first stage of
+ Ca L To oscillator circuit such a system is, of course, a narrowband angle modulator
m (t ) such as the one shown in Fig. 7. The narrowband angle-
− modulated signal enters a frequency multiplier that
multiplies the instantaneous frequency of the input by
Figure 6. Varactor diode implementation of an angle modulator. some constant n. This is usually done by applying the
FREQUENCY AND PHASE MODULATION 813
Integrator
FM
m (t ) f(t ) −
× +
+
PM
Frequency Frequency
Input fc nfc Output
Narrowband Frequency BP
×
angle modulator ×n filter
Local
oscillator Frequency Figure 8. Indirect generation of angle-modu-
fL 0
lated signals.
input signal to a nonlinear element and then passing its is approximately a straight line in the frequency band of
output through a bandpass filter tuned to the desired the FM signal. If the frequency response of such a system
central frequency. If the narrowband modulated signal is is given by
represented by
Bc
|H(f )| = V0 + k(f − fc ) for |f − fc | < (49)
un (t) = Ac cos(2π fc t + φ(t)) (46) 2
the output of the frequency multiplier (output of the and if the input to the system is
bandpass filter) is given by t
u(t) = Ac cos 2π fc t + 2π kf m(τ ) dτ (50)
y(t) = Ac cos(2π nfc t + nφ(t)) (47) −∞
(a)
Amplitude response
H1(f )
H (f )
f1 f
fc f
Linear region
Amplitude response
(b)
H2(f ) H1(f )
a circuit can be easily implemented but usually the linear
region of the frequency characteristic may not be wide
enough. To obtain a linear characteristics over a wider
range of frequencies, usually two circuits tuned at two fre- f2 f1 f
quencies f1 and f2 are connected in a configuration which
is known as a balanced discriminator. A balanced discrim- (c)
inator with the corresponding frequency characteristics is
shown in Fig. 11. H (f )=H1(f )−H2(f ) H1(f )
Amplitude response
The FM demodulation methods described above that
transform the FM signal into an AM signal have a
bandwidth equal to the channel bandwidth Bc occupied
f
by the FM signal. Consequently, the noise that is passed
by the demodulator is the noise contained within Bc .
−H2(f )
Received Output
signal Bandpass Lowpass signal
× Discriminator
filter filter
VCO
Figure 12. Block diagram of FMFB demodulator.
FREQUENCY AND PHASE MODULATION 815
and, hence
VCO 1
e (f ) = (f ) (64)
kv
Figure 13. Block diagram of PLL FM demodulator. 1+ G(f )
jf
yv (t) = Av sin[2π fc t + φv (t)] (56) Now, suppose that we design G(f ) such that
% %
% G(f ) %
where %kv %1 (66)
t % jf %
φv (t) = 2π kv v(τ ) dτ (57)
0
in the frequency band |f | < W of the message signal. Then
The phase comparator is basically a multiplier and a from (65), we have
filter that rejects the signal component centered at 2fc .
j2π f
Hence, its output may be expressed as V(f ) = (f ) (67)
2π kv
e(t) = 12 Av Ac sin[φ(t) − φv (t)] (58)
or, equivalently
where the difference φ(t) − φv (t) ≡ φe (t) constitutes the 1 d
phase error. The signal e(t) is the input to the loop filter. v(t) = (t)
2π kv dt
Let us assume that the PLL is in lock, so that the phase
error is small. Then kf
= m(t) (68)
kv
sin[φ(t) − φv (t)] ≈ φ(t) − φv (t) = φe (t) (59)
Since the control voltage of the VCO is proportional to the
message signal, v(t) is the demodulated signal.
under this condition, we may deal with the linearized
We observe that the output of the loop filter with
model of the PLL, shown in Fig. 14. we may express the
frequency response G(f ) is the desired message signal.
phase error as
Hence, the bandwidth of G(f ) should be the same as the
t bandwidth W of the message signal. Consequently, the
φe (t) = φ(t) − 2π kv v(τ ) dτ (60) noise at the output of the loop filter is also limited to the
0
bandwidth W. On the other hand, the output from the VCO
or, equivalently, as either is a wideband FM signal with an instantaneous frequency
that follows the instantaneous frequency of the received
d d FM signal.
φe (t) + 2π kv v(t) = φ(t) (61) The major benefit of using feedback in FM signal
dt dt
demodulation is to reduce the threshold effect that
or occurs when the input signal-to-noise-ratio to the FM
∞ demodulator drops below a critical value. We will study
d d
φe (t) + 2π kv φe (τ )g(t − τ ) dτ = φ(t) (62) the threshold effect later in this article.
dt 0 dt
since noise is additive, the noise is directly added to filter whose role is to remove the out-of-band noise. The
the signal. However, in a frequency-modulated signal, bandwidth of this filter is equal to the bandwidth of the
the noise is added to the amplitude and the message modulated signal and, therefore, it passes the modulated
information is contained in the frequency of the modulated signal without distortion. However, it eliminates the out-
signal. Therefore the message is contaminated by the noise of-band noise and, hence, the noise output of the filter is
to the extent that the added noise changes the frequency a bandpass Gaussian noise denoted by n(t). The output of
of the modulated signal. The frequency of a signal can this filter is
be described by its zero crossings. Therefore, the effect
of additive noise on the demodulated FM signal can be
described by the changes that it produces in the zero r(t) = u(t) + n(t)
crossings of the modulated FM signal. Figure 15 shows = u(t) + nc (t) cos 2π fc t − ns (t) sin 2π fc t (70)
the effect of additive noise on the zero crossings of two
frequency-modulated signals, one with high power and
the other with low power. From the above discussion and where nc (t) and ns (t) denote the in-phase and the
also from Fig. 15 it should be clear that the effect of noise quadrature components of the bandpass noise. Because
in an FM system is less than that for an AM system. It of the nonlinearity of the modulation process, an exact
is also observed that the effect of noise in a low-power analysis of the performance of the system in the presence
FM system is more than in a high-power FM system. of noise is mathematically involved. Let us make the
The analysis that we present in this chapter verifies our assumption that the signal power is much higher than the
intuition based on these observations. noise power. Then, if the bandpass noise is represented as
A block diagram of the receiver for a general angle-
modulated signal is shown in Fig. 16. The angle-modulated &
signal is represented as4 ns (t)
n(t) = n2c (t) + n2s (t) cos 2π fc t + arctan
nc (t)
u(t) = Ac cos(2π fc t + φ(t)) = Vn (t) cos(2π fc t + n (t)) (71)
t
Ac cos(2π fc t + 2π kf −∞ m(τ ) dτ ), FM
= (69)
Ac cos(2π fc t + kp m(t)), PM
where Vn (t) and n (t) represent the envelope and the
phase of the bandpass noise process, respectively, the
The additive white Gaussian noise nw (t) is added to
assumption that the signal is much larger than the noise
u(t) and the result is passed through a noise limiting
means that
p(Vn (t) Ac ) ≈ 1 (72)
(a)
5
Therefore, the phasor diagram of the signal and the noise
are as shown in Fig. 17. From this figure it is obvious that
0
t we can write
−5
r(t) ≈ (Ac + Vn (t) cos(n (t) − φ(t)))×
(b)
5 × cos 2π fc t + φ(t)
0 Vn (t) sin(n (t) − φ(t))
t + arctan
Ac + Vn (t) cos(n (t) − φ(t))
−5 ≈ (Ac + Vn (t) cos(n (t) − φ(t)))
Figure 15. Effect of noise on the zero crossings of (a) low-power Vn (t)
× cos 2π fc t + φ(t) + sin(n (t) − φ(t))
and (b) high-power modulated signals. Ac
4 When we refer to the modulated signal, we mean the signal as The demodulator processes this signal and, depending
received by the receiver. Therefore, the signal power is the power whether it is a phase or a frequency demodulator, its
in the received signal, not the transmitted power. output will be the phase or the instantaneous frequency of
BW = Bc BW = W S
u (t ) + nw (t ) r (t ) = u (t ) + n (t ) y (t ) N 0
Bandpass Angle Lowpass
filter demodulator filter
Im
Φn (t )
f(t )
Re
1
E[Yn (t + τ )Yn (t)] = Rn (τ )E cos((t + τ ) − (t))
A2c c
1
= Rn (τ )Re[Eej((t+τ )−(t)) ]
A2c c
−Bc /2 −W W Bc /2 f
1
= 2 Rnc (τ )Re[EejZ(t,τ ) ]
Ac Snc(f )
1 2
= Rn (τ )Re[e−(1/2)σZ ] N0
A2c c
1
= Rn (τ )Re[e−(R (0)−R (τ )) ]
A2c c
−Bc /2 −W W Bc /2 f
1
= 2 Rnc (τ )e−(R (0)−R (τ )) (81)
Ac Snc(f ) *G(f )
2
βp PM
PR , PM
S max |m(t)| N0 W
N0 = 2 (90)
N
βf PM
A2c o 3PR , FM
max |m(t)| N0 W
If we denote NP0RW by NS b , the signal-to-noise ratio of a
baseband system with the same received power, we obtain
−W W f
PM βp2 S
, PM
(b) N0 S (max |m(t)|)2 N b
f2 = (91)
A2c N
PM βf2 S
o
3 , FM
(max |m(t|)2 N b
− 1. Therefore
3A2c 2
2
Now we can use (74) to determine the output signal-to-
−1
noise ratio in angle modulation. First we have the output
2 S
P M , PM
signal power
max |m(t)| N b
2 S
kp PM , PM =
2 (94)
Pso = (87) N o
kf2 PM , FM
− 1
2 S
3PM , FM
max |m(t)| N b
Then, the signal-to-noise ratio, defined as
From (90) and (94), we observe that
S Pso
=
N o P no
1. In both PM and FM the output SNR is proportional
becomes 2 2 to the square of the modulation index β. Therefore,
kp Ac PM increasing β increases the output SNR even with
, PM
S 2 N0 W low received power. This is in contrast to amplitude
= (88)
N
2 2
3k A modulation, where such an increase in the received
o f c PM , FM
2W 2 N0 W signal-to-noise ratio is not possible.
2. The increase in the received signal-to-noise ratio is
A2c obtained by increasing the bandwidth. Therefore
Nothing that 2
is the received signal power, denoted by
angle modulation provides a way to trade off
PR , and
bandwidth for transmitted power.
βp = kp max |m(t)|, PM
kf max |m(t)| (89) 3. The relation between the output SNR and the
βf = , FM bandwidth expansion factor,
, is a quadratic
W
820 FREQUENCY AND PHASE MODULATION
relation. This is far from optimal.6 An information- no reason that the additive signal and noise components
theoretic analysis shows that the optimal relation at the input of the modulator result in additive signal
between the output SNR and the bandwidth and noise components at the output of the demodulator.
expansion factor is an exponential relation. In fact, this assumption is not at all correct in general,
4. Although we can increase the output signal-to-noise and the signal and noise processes at the output of the
ratio by increasing β, having a large β means having demodulator are completely mixed in a single process by
a large Bc (by Carson’s rule). Having a large Bc a complicated nonlinear relation. Only under the high
means having a large noise power at the input of the signal-to-noise ratio assumption is this highly nonlinear
demodulator. This means that the approximation relation approximated as an additive form. Particularly
p(Vn (t) Ac ) ≈ 1 will no longer apply and that the at low signal-to-noise ratios, signal and noise components
preceding analysis will not hold. In fact if we increase are so intermingled that one cannot recognize the signal
β such that the preceding approximation does not from the noise and, therefore, no meaningful signal-to-
hold, a phenomenon known as the threshold effect noise ratio as a measure of performance can be defined.
will occur and the signal will be lost in the noise. In such cases the signal is not distinguishable from the
noise and a mutilation or threshold effect is present. There
5. A comparison of the preceding result with the
exists a specific signal to noise ratio at the input of the
signal-to-noise ratio in amplitude modulation shows
demodulator known as the threshold SNR beyond which
that in both cases increasing the transmitter
signal mutilation occurs. The existence of the threshold
power (or the received power), will increase the
effect places an upper limit on the tradeoff between
output signal-to-noise ratio, but the mechanisms
bandwidth and power in an FM system. This limit is a
are totally different. In AM, any increase in the
practical limit in the value of the modulation index βf .
received power directly increases the signal power
It can be shown that at threshold the following
at the output of the receiver. This is basically
approximate relation between NP0RW = ( NS )b and βf holds
due to the fact the message is in the amplitude
of the transmitted signal and an increase in the in an FM system:
transmitted power directly affects the demodulated
S
signal power. However, in angle modulation, the = 20(β + 1) (95)
message is in the phase of the modulated signal and, N b,th
behaves as an integrator. A good approximation to such a To analyze the effect of preemphasis and deemphasis
filter is a simple lowpass filter. The modulator filter which filtering on the overall signal-to-noise ratio in FM
emphasizes high frequencies is called the pre-emphasis broadcasting, we note that since the transmitter and the
filter and the demodulator filter which is the inverse of the receiver filters cancel the effect of each other, the received
modulator filter is called the deemphasis filter. Frequency power in the message signal remains unchanged and we
responses of a sample preemphasis and deemphasis filter only have to consider the effect of filtering on the received
are given in Fig. 21. noise. Of course, the only filter that has an effect on
Another way to look at preemphasis and deemphasis the received noise is the receiver filter that shapes the
filtering is to note that due to the high level of noise in power spectral density of the noise within the message
the high-frequency components of the message in FM, it bandwidth. The noise component before filtering has a
is desirable to attenuate the high frequency components parabolic power spectrum. Therefore, the noise component
of the demodulated signal. This results in a reduction after the deemphasis filter has a power spectral density
in the noise level but it causes the higher-frequency given by
components of the message signal to be also attenuated. To SnPD (f ) = Sno (f )|Hd (f )|2
compensate for the attenuation of the higher components
N0 2 1
of the message signal we can amplify these components = f (105)
at the transmitter before modulation. Therefore at the A2c f2
1+ 2
transmitter we need a highpass filter, and at the receiver f0
we must use a lowpass filter. The net effect of these
filters should be a flat frequency response. Therefore, the where we have used (85). The noise power at the output of
receiver filter should be the inverse of the transmitter the demodulator now can be obtained as
filter. +W
The characteristics of the preemphasis and deemphasis PnPD = SnPD (f ) df
filters depend largely on the power spectral density of −W
Thus an FM system with preemphasis and deemphasis lowpass filter to remove out-of-band noise and its output
filtering improves the performance of a baseband system is used to drive a loudspeaker.
by roughly 29–30 dB.
6.1. FM Stereo Broadcasting
6. FM RADIO BROADCASTING Many FM radio stations transmit music programs in
stereo by using the outputs of two microphones placed
Commercial FM radio broadcasting utilizes the frequency in two different parts of the stage. Figure 23 is a block
band 88–108 MHz for transmission of voice and music diagram of an FM stereo transmitter. The signals from
signals. The carrier frequencies are separated by 200 kHz the left and right microphones, m (t) and mr (t), are added
and the peak frequency deviation is fixed at 75 kHz. With and subtracted as shown. The sum signal m (t) + mr (t)
a signal bandwidth of 15 kHz, this results in a modulation is left as is and occupies the frequency band 0–15 kHz.
index of β = 5. Preemphasis filtering with f0 = 2100 Hz The difference signal m (t) − mr (t) is used to AM modulate
is generally used, as described in the previous section, to (DSB-SC) a 38-kHz carrier that is generated from a 19-
improve the demodulator performance in the presence kHz oscillator. A pilot tone at the frequency of 19 kHz is
of noise in the received signal. The lower 4 MHz of added to the signal for the purpose of demodulating the
the allocated bandwidth is reserved for noncommercial DSB SC AM signal. The reason for placing the pilot tone
stations; this accounts for a total of 20 stations, and the at 19 kHz instead of 38 kHz is that the pilot is more easily
remaining 80 stations in the 92–108-MHz bandwidth are separated from the composite signal at the receiver. The
allocated to commercial FM broadcasting. combined signal is used to frequency modulate a carrier.
The receiver most commonly used in FM radio By configuring the baseband signal as an FDM signal,
broadcast is a superheterodyne type. The block diagram a monophonic FM receiver can recover the sum signal
of such a receiver is shown in Fig. 22. As in AM radio m (t) + mr (t) by use of a conventional FM demodulator.
reception, common tuning between the RF amplifier and Hence, FM stereo broadcasting is compatible with
the local oscillator allows the mixer to bring all FM conventional FM. The second requirement is that the
radio signals to a common IF bandwidth of 200 kHz, resulting FM signal does not exceed the allocated 200-kHz
centered at fIF = 10.7 MHz. Since the message signal bandwidth.
m(t) is embedded in the frequency of the carrier, any The FM demodulator for FM stereo is basically
amplitude variations in the received signal are a result the same as a conventional FM demodulator down to
of additive noise and interference. The amplitude limiter the limiter/discriminator. Thus, the received signal is
removes any amplitude variations in the received signal converted to baseband. Following the discriminator, the
at the output of the IF amplifier by band-limiting the baseband message signal is separated into the two
signal. A bandpass filter centered at fIF = 10.7 MHz with signals m (t) + mr (t) and m (t) − mr (t) and passed through
a bandwidth of 200 kHz is included in the limiter to deemphasis filters, as shown in Fig. 24. The difference
remove higher-order frequency components introduced by signal is obtained from the DSB SC signal by means
the nonlinearity inherent in the hard limiter. of a synchronous demodulator using the pilot tone. By
A balanced frequency discriminator is used for taking the sum and difference of the two composite signals,
frequency demodulation. The resulting message signal we recover the two signals m (t) and mr (t). These audio
is then passed to the audiofrequency amplifier, which signals are amplified by audio band amplifiers and the
performs the functions of deemphasis and amplification. two outputs drive dual loudspeakers. As indicated above,
The output of the audio amplifier is further filtered by a an FM receiver that is not configured to receive the
RF
amplifier
Automatic
∼ frequency
control (AFC)
Local Lowpass
oscillator filter
m (t ) +
+ Preemphasis
+ Frequency Pilot FM
∼ +
doubler modulator
+ DSB SC
mr (t )
+ Preemphasis AM
−
modulator
FM stereo transmitter
L+R
M (f ) + Mr (f )
Pilot tone
L−R
m (t ) + mr (t ) m (t )
Lowpass
filter Deemphasis +
+
0−15 kHz +
Signal
from IF
amplifier Narrowband To audio-
FM Frequency To stereo
tuned filter band
discriminator doubler indicator
19 kHz amplifier
38 kHz
Bandpass + mr (t )
Synchronous
filter Deemphasis +
demodulator −
23−53 kHz
Figure 24. FM stereo receiver.
FM stereo sees only the baseband signal m (t) + mr (t) interest are network information theory, source-channel
in the frequency range 0–15 kHz. Thus, it produces a matching problems in single and multiple user environ-
monophonic output signal which consists of the sum of the ments, data compression, channel coding, and particu-
signals at the two microphones. larly turbo codes. Professor Salehi’s research has been
supported by research grants from the National Science
BIOGRAPHY Foundation, DARPA, GTE, and Analog Devices. Professor
Salehi is the coauthor of the textbooks Communication Sys-
Masoud Salehi received a B.S. degree from Tehran tems Engineering, published by Prentice-Hall in 1994 and
University and M.S. and Ph.D. degrees from Stanford 2002, and Contemporary Communication Systems Using
University, all in electrical engineering. Before joining MATLAB, published by Brooks/Cole in 1998 and 2000.
Northeastern University, he was with the Electrical and
Computer Engineering Departments, Isfahan University BIBLIOGRAPHY
of Technology and Tehran University.
From February 1988 until May 1989 Dr. Salehi was
1. J. G. Proakis and M. Salehi, Communication Systems Engi-
a visiting professor at the Information Theory Research neering, 2nd ed., Prentice-Hall, Upper Saddle River, NJ, in
Group, Department of Electrical Engineering, Eindhoven press.
University of Technology, The Netherlands, where he did
2. D. J. Sakrison, Communication Theory: Transmission of
research in network information theory and coding for
Waveforms and Digital Information, Wiley, New York, 1968.
storage media.
Professor Salehi is currently with the Department of 3. A. B. Carlson, Communication Systems, 3rd ed., McGraw-
Hill, New York, 1986.
Electrical and Computer Engineering and a member of
the CDSP (Communication and Digital Signal Processing) 4. L. W. Couch, II, Digital and Analog Communication Systems,
Center, Northeastern University, where he is involved in 4th ed., Macmillan, New York, 1993.
teaching and supervising graduate students in informa- 5. J. D. Gibson, Principles of Digital and Analog Communica-
tion and communication theory. His main areas of research tions, 2nd ed., Macmillan, New York, 1993.
FREQUENCY-DIVISION MULTIPLE ACCESS (FDMA) 825
6. M. A. McMahon, The Making of A Profession — Century of technique each of the K users (transmitting stations) can
Electrical Engineering in America, IEEE Press, 1984. transmit all of the time, or at least for extended periods of
7. K. S. Shanmugam, Digital and Analog Communication Sys- time but using only a portion Bi (subchannel) of the total
tems, Wiley, New York, 1979. channel bandwidth B such that Bi = B/K Hz. If the users
8. H. Taub and D. A. Schilling, Principles of Communication generate constantly unequal amounts of traffic, one can
Systems, 2nd ed., McGraw-Hill, New York, 1986. modify this scheme to assign bandwidth in proportion to
the traffic generated by each one.
9. R. E. Ziemer and W. H. Tranter, Principles of Communica-
tions, 4th ed., Wiley, New York, 1995. Adjacent users occupy different carriers of Bi band-
width, with guard channels D Hz between them to avoid
10. M. Schwartz, Information Transmission, Modulation, and
interference. Then the actual bandwidth available to
Noise, 4th ed., McGraw-Hill, New York, 1990.
each station for information is Bi = {B − (K + 1)D}/K.
11. F. G. Stremler, Introduction to Communication Systems, 3rd The input of each source is modulated over a carrier
ed., Addison-Wesley, Reading, MA, 1990. and transmitted to the channel. So the channel transmits
12. S. Haykin, Communication Systems, 4th ed., Wiley, New several carriers simultaneously at different frequencies.
York, 2001. User separability is therefore achieved by separation in
13. B. P. Lathi, Modern Digital and Analog Communication frequency. At the receiving end, the user bandpass filters
Systems, 3rd ed., Oxford Univ. Press, New York, 1998. select the designated channel out of the composite signal.
Then a demodulator obtains the transmitted baseband
signal.
FREQUENCY-DIVISION MULTIPLE ACCESS Instead of transmitting one source signal on the carrier,
(FDMA): OVERVIEW AND PERFORMANCE we can feed a multiplexed signal on it [e.g., a pulse
EVALUATION code modulation (PCM) telephone line]. Depending on
the multiplexing and modulation techniques used, several
FOTINI-NIOVI PAVLIDOU transmission schemes can be considered. Although FDMA
is usually considered to be built on the well-known FDM
Aristotle University of
Thessaloniki scheme, any multiplexing and modulation technique can
Thessaloniki, Greece be used for the processing of the baseband data, so several
forms of FDMA are possible [4–6].
In Figs. 1b,c we give a very general implementation
1. BASIC SYSTEM DESCRIPTION of the system. The first ‘‘user/station’’ transmits analog
baseband signals, which are combined in a frequency-
FDMA (frequency-division multiple access) is a very basic division multiplex (FDM) scheme. This multiplexed signal
multiple-access technique for terrestrial and satellite can modulate a carrier in frequency (FM) and then it is
systems. We recall that multiple access is defined transmitted on the common channel, together with other
as the ability of a number of users to share a carriers from other stations. If the technique used for
common transmission channel (coaxial cable, fiber, all the stations is FDM/FM, the access method is called
wireless transmission, etc.). Referring to the seven-layer FDM/FM/FDMA (Fig. 1b).
OSI (Open System Interconnection) model, the access If the stations transmit digital data, other modulation
methods and the radio channel (frequency, time, and techniques can be used like PSK (phase shift keying)
space) assignment are determined by the media access and TDM (time-division multiplex) can be applied. Then
control (MAC) unit of the second layer. again this multiplexed signal can be transmitted on a
Historically, FDMA has the highest usage and appli- frequency band on the common channel together with the
cation of the various access techniques. It is one of the carriers of other similar stations. This application is called
three major categories of fixed-assignment access meth- TDM/PSK/FDMA.
ods (TDMA, FDMA, CDMA), and since it is definitely the The possible combinations of the form mux/mod/FDMA
simplest one, it has been extensively used in telephony, in are numerous even if in practice the schemes shown
commercial radio, in television broadcasting industries, in in Fig. 1 are the most commonly used ones [3,6]. They
the existing cellular mobile systems, in cordless systems are generally known as multichannel-per-carrier (MCPC)
(CT2), and generally in many terrestrial and satellite techniques.
wireless applications [1–3]. Of course, for lower-traffic requirements the baseband
This access method is efficient if the user has a steady signals can modulate directly a carrier in either analog
flow of information to send (digitized voice, video, transfer or digital form (Fig. 1c). This concept, single channel
of long files) and uses the system for a long period of time, per carrier (SCPC), has been extensively used in
but it can be very inefficient if user data are sporadic mobile satellites since it allows frequency reallocations
in nature, as is the case with bursty computer data or according to increases of traffic and future developments
short-message traffic. In this case it can be effectively in modulation schemes. The best-known application of
applied only in hybrid implementations, as, for example, SCPC/FDMA is the single-channel-per-carrier pulse-code-
in FDMA/Aloha systems. modulation multiple-access demand assigned equipment
The principle of operation of FDMA is shown in (SPADE) system applied in Intelsat systems.
Figs. 1a–d. The total common channel bandwidth is B FDMA carriers are normally assigned according to
Hz, and K users are trying to share it. In the FDMA a fixed-frequency plan. However, in applications where
826 FREQUENCY-DIVISION MULTIPLE ACCESS (FDMA)
(a) B Hz
B/K Hz Guard band (D)
Frequency
~
Channel
1 2 3 K-1 K
(b) Users
TX 1
1~M
FDM (MUX) FDM (MOD)
F1
l
l
Users ~
1~M TXK
FK
FDM (MUX) FDM (MOD)
(c) TX 1
Terminal 1 F1
FM (MOD)
l
l TXK
Terminal K ~
PSK (MOD) FK
(d)
Bi
limited space segment resources are to be shared by 2.1. Adjacent-Channel Interference and Intermodulation
a relatively large number of users, the channel can be Noise
accessed on demand [demand assignment multiple access
As we have noted, in FDMA applications frequency
(DAMA)].
spacing is required between adjacent channels. This
In wireless applications, the FDMA architecture is also
issue has been put forward several times by opponents
known as ‘‘narrowband radio,’’ as the bandwidth of the
of FDMA to claim that the technique is not as
individual data or digitized analog signal (voice, facsimile,
efficient in spectrum use as TDMA or CDMA. Indeed,
etc.) is relatively narrow compared with TDMA and CDMA
excessive separation causes needless waste of the available
applications.
bandwidth. Whatever filters are used to obtain a sharp
frequency band for each carrier, part of the power
2. SYSTEM PERFORMANCE of a carrier adjacent to the one considered will be
captured by the receiver of the last one. In Fig. 1d we
can see three adjacent bands of the FDMA spectrum
The major factor that determines the performance of
(received at a power amplifier) composed of K carriers
the FDMA scheme is the received carrier-to-noise (plus
of equal power and identically modulated. To determine
interference) ratio, and for a FDM/FM/FDMA satellite
the proper spacing between FDMA carrier spectra, this
system this is given by [6]
adjacent channel interference (crosstalk power) must be
carefully calculated. Spacings can then be selected for
1 1 Cu Cd
= +1 +1 any acceptable crosstalk level desired. Common practice
CT /NT Cu /Nu Iu IM
is to define the guard bands equal to around 10% of
Cd 1 the carrier bandwidth (for carriers equal in amplitude
+1 +
Nd Cd /Id and bandwidth). This will keep the noise level below the
ITU-T (CCITT) requirements [6]. Simplified equations as
where subscripts T, u, and d denote total, uplink, and well as detailed analysis for crosstalk are given in Refs. 3
downlink factors: N gives the white noise; I denotes the and 6.
interference from other systems using the same frequency; In addition, when the multiple FDMA carriers pass
and IM is the intermodulation distortion, explained in through nonlinear systems, like power amplifiers in
Sections 2.1 and 2.2. satellites, two basic effects occur: (1) the nonlinear
FREQUENCY-DIVISION MULTIPLE ACCESS (FDMA) 827
Carrier-to-intermodulation power, dB
reduce the input to the amplifier from its maximum drive 20
level in order to control intermodulation distortion. In
satellite repeaters, this procedure is referred to as input 18
backoff and is an important factor in maximizing the 16
power efficiency of the repeater. So the traveling-wave
tube (TWT) has to be backed off substantially in order 14
to operate it as a linear amplifier. For example, Intelsat K
12
has established a series of monitoring stations to ensure
that its uplink power levels are maintained. This in turn 10 4
leads to inefficient usage of the available satellite power.
8 8
Several nonlinear models for nonlinear power amplifiers, 16
which have both amplitude and phase nonlinearities, have x
been proposed to calculate the carrier-to-IM versus input 8 4 0 −4 −8 −10
backoff. Input backoff, dB
An analysis of IM products for a satellite environ-
Figure 2. Intermodulation distortion.
ment can be found elsewhere in the literature, where
a detailed description and an exact calculation of inter-
modulation distortion are given [3,4,6–8]. Generally we 2.2. FDMA Throughput
have to determine the spectral distribution obtained
The throughput capability of an FDM/FM/FDMA scheme
from the mixing of the FDMA carriers. In Fig. 2a the
has been studied [3,6,7] as a function of the number
intermodulation spectrum received by the mixing of the
of carriers taking into account the carrier-to-total noise
spectra of Fig. 1d is shown. The intermodulation power
(CT /NT ) factor. The carriers are modulated by multiplexed
tends to be concentrated in the center of the total band-
signals of equal capacity. As the number of carriers
width B, so that the center carriers receive the most increases, the bandwidth allocated to each carrier must
interference. Figure 2b shows the ratio of carrier power decrease, and this leads to a reduction of the capacity of
CT to the intermodulation noise power IM, as a func- the modulating multiplexed signal. As the total capacity
tion of the number of carriers K and the degree of is the product of the capacity of each carrier and the total
backoff. number of carriers, it could be imagined that the total
In an ideal FDMA system the carriers have equal capacity would remain sensibly constant. But it is not;
powers, but this is not the case in practice. When a the total capacity decreases as the number of carriers
mixture of both strong and weak carriers are present increases. This results from the fact that each carrier
on the channel, we must ensure that the weaker carriers is subjected to a reduction in the value of C/N since
can maintain a communication link of acceptable quality. the backoff is large when the number of carriers is high
Some techniques have been implemented to accommodate (extra carriers bring more IM products). Another reason
this situation; one is known as channelization. In is the increased need for guard bands. Figure 3 depicts
channelization carriers are grouped in channels, each the throughput of a FDMA system as a function of the
with its own bandpass filter and power amplifier; in number of carriers for an Intelsat transponder of 3 MHz
each channel (group) only carriers of the same power bandwidth; it effectively shows the ratio of the total real
are transmitted. channel capacity and the potential capacity of the channel.
100
Throughput (%)
80
60
40
20
0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Figure 3. Throughput of FDM/FM/FDMA (channel band-
Number of carriers width 36 MHz) (Dicks and Brown [4], 1974 IEEE).
828 FREQUENCY-DIVISION MULTIPLE ACCESS (FDMA)
3. IMPLEMENTATION ALTERNATIVES F
r 1 3 2
As we have noted, FDMA is a very basic technology e
1 3 2
that can be combined with almost all the other access q
u 2 1 3
techniques. Figure 4a shows FDD/FDMA for analog and e 2 1 3
digital transmission. In frequency-division duplex (FDD) n 3 2 1
there is a group of K subbands for transmission in one c
3 2 1
direction and a similar contiguous group of K subbands y
for transmission in the reverse direction. A band of Time
frequencies separates the two groups. Each station is Figure 5. OFDMA with frequency hopping.
allocated a subband in both FDD bands for the duration of
its call. All the first generation analog cellular systems
orthogonal to each other. So there is no need for the
use FDD/FDMA. FDMA can also be used with TDD
relatively large guard bands necessary in conventional
(time-division duplex). Here only one band is provided for
FDMA. An example of an OFDMA/TDMA time-frequency
transmissions, so a timeframe structure is used allowing
grid is shown in Fig. 5, where users 1, 2, and 3 each
transmissions to be done during one-half of the frame while
use a certain part of the available subcarriers. The part
the other half of the frame is available to receive signals.
can be different for each one. Each user can have a
TDD/FDMA is used in cordless communications (CT2). The
fixed set of subcarriers, but it is relatively easy to allow
TDMA/FDMA structure is used in GSM (Global System for
hopping with different hopping patterns for each user and
Mobile Communication) systems (Fig. 4b), where carriers
to result in orthogonal hopping schemes. OFDMA has
of 200 kHz bandwidth carry a frame of 8 time slots.
been proposed for the European UMTS (Universal Mobile
Further generalization of the strict fixed-assignment
Telecommunications System) [9].
FDMA system is possible and has been implemented
in commercial products. Frequency-hopping schemes in
FDMA have been proposed for cellular systems. Hop- 4. COMPARISON WITH OTHER TECHNIQUES
ping is based on a random sequence. In particular,
slow frequency-hopped spread-spectrum (FHSS) FDMA The comparison between the FDMA and other access
systems, combined with power and spectrally efficient methods is based on a number of performance and
modulation techniques such as FQPSK, can have signifi- implementation criteria, the importance of which is
cantly increased capacity over other access methods. The varying very much depending on the type of system in
GSM system supports the possibility for frequency hop- which the access method is to be employed.
ping. FDMA systems in which users can randomly access In digital transmission TDMA appears to be more
each channel with an Aloha-type attempt have also been ‘‘natural,’’ so today most of the systems operate in this
proposed. scheme or at least in combination with FDMA schemes.
An interesting variation of FDMA is the OFDMA TDMA offers format flexibility since time-slot assignments
(orthogonal frequency–division multiple access) scheme among the multiple users are readily adjusted to provide
proposed for wideband communications. In OFDMA, different access rates for different users.
multiple access is achieved by providing each user with In some circumstances, FDMA schemes may be compar-
a number of the available subcarriers, which are now atively inefficient since they require guard bands between
neighboring bands to prevent interfering phenomena, so
this results in a waste of the system resources.
(a) Another problem with FDMA systems is that they suffer
1 2 K 1 2 K
from system nonlinearities; for example, they require the
lll lll
satellite transponder to be linear in nature, which cannot
be achieved in practice. However, this is not necessarily
FDD band as important in terrestrial communication systems, where
separation
the power consumption in the base station electronics is
B B
not a major design issue.
Referring to throughput performance, FDMA and
(b) Time TDMA should provide the same capability for carrying
information over a network, and this is true with respect
to bit-rate capability, if we neglect all overhead elements
8 such as guard bands in FDMA and guard times in TDMA.
7
For K users generating data at a constant uniform rate
6
8 TDMA 5 and for an overall rate capability of the system equal to
timeslots 4 R bits per second, R/K bps (bits per second) is available
3 (124 FDMA channels) to each user in both systems. Furthermore, both systems
2 lll have the same capacity-wasting properties, because if a
1 user has nothing to transmit, its frequency band cannot
200 KHz 200 KHz Frequency be used by another user.
Figure 4. (a) FDMA/FDD arrangement; (b) TDMA/FDMA for However, if we examine the average delay, assuming
GSM system. that each user is transmitting every T seconds, the delay
FREQUENCY-DIVISION MULTIPLE ACCESS (FDMA) 829
is found to be DFDMA = T for the FDMA system while it to as a thin-route service. The satellite multiservice system
is DTDMA = DFDMA − T/2{1 − 1/K} in TDMA [1]. Therefore, (SMS), provided by Eutelsat for business applications,
for two or more users TDMA is superior to FDMA. Note has up to 1600 carriers in its 72 MHz bandwidth, which
that for large numbers of users the difference in packet can be reallocated extremely quickly in response to
delay is approximately T/2. Because of these problems, changes in demand. FDMA has been used also for VSAT
fixed assignment strategies have increasingly tended to (very small-aperture terminals) satellite access, especially
shift from FDMA to TDMA as certain technical problems as DAMA method with FDMA/SCPC application. Most
associated with the latter were overcome. of today’s commercial networks are based on demand
On the other hand, in transmission environments assignment FDMA due to simple network control needed.
where spurious narrowband interference is a problem, the Detailed VSAT implementations can be easily found in the
FDMA format, with a single user channel per carrier, has literature.
an advantage compared to TDMA in that a narrowband At the present time, all mobile satellite systems that
interferer can impair the performance of only one user offer voice services have adopted a frequency–division
channel. multiple-access approach. The usual channel bandwidth
The major advantage in the implementation of FDMA is around 30 kHz, but figures of 22.5, 36, and 45 kHz
systems is that FDMA is a very mature technology and are also used. The use of FDMA in conjunction with
has the advantage of inexpensive terminals. Furthermore, common channel signaling enables future developments
channel assignment is simple and straightforward, in modulation and processing technologies to be easily
and no network timing is required. This absence of incorporated into the system and allows a variety of
synchronization problems is very attractive in fading communication standards to be supported. Also, as growth
communications channels. in the mobile terminal population occurs, incremental
Finally, in multipath fading environments, in a typical increases in the required system spectrum allocation can
FDMA system, channel bandwidth is usually smaller than be easily assigned [6].
the coherence bandwidth of the transmission channel and The modulation and multiple-access techniques used
there is no need to use an adaptive equalizer at the in the IRIDIUM system are patterned after the GSM
receiver. But at the same time, this situation removes terrestrial cellular system. A combined FDMA-TDMA
the opportunity for the implicit frequency diversity gains access format is used along with data or vocoded voice and
that are achievable when signal bandwidth approaches digital modulation techniques. Each FDMA frequency slot
the coherence bandwidth. Of course, in multipath is 41.67 kHz wide, including guard bands, and supports
environments CDMA is proved to be very effective. Its 4-duplex voice channels in a TDMA arrangement [5].
immunity to external interference and jamming, its low All the analog terrestrial mobile systems were based on
probability of intercept, and the easy integration of FDMA techniques. The band of frequencies was divided
voice/data messages it offers establish its use in future into segments, and half of the contiguous segments
multimedia communications systems. But a combination are assigned to outbound and the other to inbound
of FDMA/CDMA can improve the capacity of the system. (FDD/FDMA) cell sites with a guard band between
A detailed comparison of FDMA with other multi- outbound and inbound contiguous channels. A key
ple access techniques can be found in well-known text- question in these systems was the actual width of one user
books [5,6]. Schwartz et al. have provided a very funda- segment (e.g., in North American amperage, the segment
mental comparison [10]. width was 30 kHz).
FDMA is also applied in the contemporary cellular
5. BASIC APPLICATIONS systems and in VHF and UHF land-mobile radio
systems. The well-known GSM standard is based on a
FDMA was the first method of multiple access used for TDMA/FDMA combination that has been considered a
satellite communication systems (fixed, broadcasting, and very efficient scheme. The European digital cordless phone
mobile services) since around 1965 and will probably (DECT) is also based on a TDMA/TDD/FDMA format. In
remain so for quite some time. In 1969 three satellites, the United States in several FCC-authorized frequency
Intelsat III, provided the first worldwide satellite service bands, particularly those below 470 MHz, the authorized
via analog FDM/FM/FDMA techniques for assemblies bandwidth per channel is limited to the 5–12.5-kHz range.
of 24, 60, or 120 telephone channels, while later, in In these narrowband mobile or cellular systems, digital
Intelsat VI, 792 or 972 voice channels were delivered. FDMA could offer the most spectral- and cost-efficient
Also, special techniques are applied for the distribution solutions.
of audio and video channels on the same transponder A very interesting application of the FDMA concept is
(TV/FM/FDMA) [6]. Further, as a good example of digital found in the third-generation optical networks under the
operation in the FDMA mode, we can refer to Intelsat’s name of wavelength-division multiple access (WDMA) [11],
intermediate data rate (IDR) carrier system described by meaning that the bits of the message are addressed on the
Freeman [5]. basis of different wavelengths. WDMA networks have been
A specific FDMA application is the single-channel-per- investigated and prototyped at the laboratory level by a
carrier (SCPC) system; one of the best known is the number of groups such as British Telecom Laboratories
Intelsat (SPADE) system described above. SCPC systems and AT&T Bell Laboratories.
are most suitable for use where each terminal is required to In conclusion, we can state that FDMA as a stand-alone
provide only a small number of telephone circuits, referred concept or as a basic component in hybrid implementations
830 FREQUENCY SYNTHESIZERS
is being applied and will be applied in the future in most (UTRA) System, Part 1: System Description and Performance
communications systems. Evaluation, SMG2 Tdoc 362a/97, 1997.
10. J. W. Schwartz, J. M. Aein, and J. Kaiser, Modulation tech-
niques for multiple-access to a hard limiting repeater, Proc.
BIOGRAPHY IEEE 54: 763–777 (1966).
11. P. E. Green, Fiber Optic Networks, Prentice-Hall, Englewood
Fotini-Niovi Pavlidou received her Ph.D. degree in Cliffs, NJ, 1993.
electrical engineering from the Aristotle University of
Thessaloniki, Greece, in 1988 and the Diploma in
mechanical–electrical engineering in 1979 from the same
institution. FREQUENCY SYNTHESIZERS
She is currently an associate professor at the Depart-
ment of Electrical and Computer Engineering at the ULRICH L. ROHDE
Aristotle University engaged in teaching for the under- Synergy Microwave Corporation
and postgraduate program in the areas of mobile commu- Paterson, New Jersey
nications and telecommunications networks. Her research
interests are in the field of mobile and personal com- 1. INTRODUCTION
munications, satellite communications, multiple access
systems, routing and traffic flow in networks, and QoS Frequency synthesizers are found in all modern commu-
studies for multimedia applications over the Internet. nication equipment, and signal generators, particularly
She is involved with many national and international wireless communication systems such as cell phones,
projects in these areas (Tempus, COST, Telematics, IST) require these building blocks [1]. On the basis of a fre-
and she has been chairing the European COST262 Action quency standard, the synthesizer provides a stable refer-
on ‘‘Spread Spectrum Systems and Techniques for Wired ence frequency for the system.
and Wireless Communications.’’ She has served as a mem- Synthesizers are used to generate frequencies with
ber of the TPC in many IEEE/IEE conferences and she arbitrary resolution covering the frequency range from
has organized/chaired some conferences like the ‘‘IST audio to millimeterwave. Today, simple frequency syn-
Mobile Summit 2002,’’ the 6th ‘‘International Symposium thesizers consist of a variety of synthesizer chips, an
on Power Lines Communications-ISPLC2002,’’ the ‘‘Inter- external voltage-controlled oscillator (VCO), and a fre-
national Conference on Communications-ICT 1998,’’ etc. quency standard. For high-volume applications, such as
She is a permanent reviewer for many IEEE/IEE1 cell phones, cordless telephones, walkie-talkies, or sys-
journals. She has published about 60 studies in refereed tems where frequency synthesizers are required, a high
journals and conferences. degree of integration is desired. The requirements for syn-
She is a senior member of IEEE, currently chairing the thesizers in cordless telephones are not as stringent as in
joint IEEE VT & AES Chapter in Greece. test equipment. Synthesizers in test equipment use cus-
tom building blocks and can be modulated to be part of
arbitrary waveform generators [2].
BIBLIOGRAPHY The VCO typically consists of an oscillator with a tuning
diode attached. The voltage applied to the tuning diode
1. K. Pahlavan and A. Levesque, Wireless Information Net- tunes the frequency of the oscillator. Such a simple system
works, Wiley, New York, 1995. is a phase-locked loop (PLL). The stability of the VCO is the
2. V. K. Bhargava, D. Haccoun, R. Matyas, and P. P. Nuspl, same as the reference. There are single-loop and multiloop
Digital Communications by Satellite-Modulation, Multiple PLL systems. Their selection depends on the characteristic
Access and Coding, Wiley, New York, 1981. requirements. Figure 1 shows the block diagram of a single
3. G. Maral and M. Bousquet, Satellite Communications Sys- loop PLL [3].
tems; Systems, Techniques and Technology, Wiley, New York, The PLL consists of a VCO, a frequency divider, a phase
1996. detector, a frequency standard, and a loop filter [4–7].
4. R. M. Gagliardi, Introduction to Communications Engineer- There are limits to how high the frequency-division
ing, Wiley, New York, 1988. ratio can be. Typically, the loop where the RF frequency
is divided below 1 kHz, becomes unstable. This is due
5. R. L. Freeman, Telecommunications Transmission Hand-
book, Wiley, New York, 1998.
to the fact that microphonic effects of the resonator
will unlock the system at each occurrence of mechanical
6. W. L. Morgan and G. D. Gordon, Communications Satellite vibration. At 1 GHz, this would be a division ratio of
Handbook, Wiley, New York, 1989.
one million. To avoid such high division ratios, either
7. J. L. Dicks and M. P. Brown, Jr., Frequency division multiple multiloop synthesizers are created, or a new breed of PLL
access (FDMA) for satellite communications systems, paper synthesizers called fractional-N division synthesizers will
presented at IEEE Electronics and Aerospace Systems be considered. At the same time, direct digital synthesis is
Convention, Washington, DC, Oct. 7–9, 1974.
being improved. Using direct digital synthesis in loops can
8. N. J. Muller, Desktop Encyclopedia of Telecommunications, also overcome some of the difficulties associated with high
McGraw-Hill, New York, 1998. division ratios. There are also combinations of techniques,
9. OFDMA evaluation report, The Multiple Access Scheme which we will refer to as hybrid synthesizers. They will be
Proposal for the UMTS Terrestrial Radio Air Interface covered here.
FREQUENCY SYNTHESIZERS 831
Synthesizer chip
Crystal Voltage-
Phase Low-pass Output
oscillator ÷ R counter controlled
detector filter frequency
(Reference) oscillator
Dual
÷N ÷A
modulus
counter counter
prescaler
Control
logic
Figure 1. Block diagram of an integrated frequency synthesizer. In this case, the designer has
control over the VCO and the loop filter; the reference oscillator is part of the chip. In most cases
(≤2.5 GHz), the dual-modulus prescaler is also inside the chip.
The double-mix-divide module has increased the input required would be extremely complex. If, instead, a 5-MHz
frequency by the switch-selectable frequency increment signal f2 is also used so that fin + f1 + f2 − 10 MHz, the
f ∗ /10. These double-mix-divide modules can be cascaded two frequencies at the first mixer output will (for an f1∗ of
to form a frequency synthesizer with any degree of 1 MHz) be 1 and 11 MHz. In this case the two frequencies
resolution. The double-mix-divide modular approach has will be much easier to separate with a bandpass filter.
the additional advantage that the frequencies f1 , f2 , and fin The auxiliary frequencies f1 and f2 can be selected in
can be the same in each module, so that all modules can each design only after considering all possible frequency
contain identical components. products at the mixer output.
A direct frequency synthesizer with three digits of Direct synthesis can produce fast frequency switching,
resolution is shown in Fig. 4. Each decade switch selects almost arbitrarily fine frequency resolution, low phase
one of 10 frequencies f2 + f ∗ . In this example the output of noise, and the highest-frequency operation of any of the
the third module is taken before the decade divider. methods. Direct frequency synthesis requires considerably
For example, it is possible to generate the frequencies more hardware (oscillators, mixers, and bandpass filters)
between 10 and 19.99 MHz (in 10-kHz increments), using than do the two other synthesis techniques to be
the three module synthesizer, by selecting described. The hardware requirements result in direct
synthesizers becoming larger and more expensive. Another
fin = 1 MHz disadvantage of the direct synthesis technique is that
unwanted (spurious) frequencies can appear at the output.
f1 = 4 MHz The wider the frequency range, the more likely that the
spurious components will appear in the output. These
f2 = 5 MHz
disadvantages are offset by the versatility, speed, and
flexibility of direct synthesis.
Since
fin + f1 + f2 = 10 fin (3)
2.1. PLL Synthesizer
the output frequency will be The most popular synthesizer is based on PLL [9]. An
example of such a PLL-based synthesizer is shown in
f2∗ f∗ Fig. 5. In preparing ourselves for hybrid synthesizers,
f0 = 10 fin = f3∗ + + 1 (4) it needs to be noted that the frequency standard can
10 100
also be replaced by a direct digital synthesizer (DDS).
Since f ∗ occurs in 1-MHz increments, f1∗ /100 will provide The number of references for synthesizers seems endless,
the desired 10-kHz frequency increments. but the most complete and relevant ones are given at
Theoretically, either f1 or f2 could be eliminated, the end. Its also very important to monitor the patents.
provided If any incremental improvements are achieved there,
fin + f1 (or f2 ) = 10 fin (5) the inventors immediately try to protect them with a
patent. The following will give some insight into the major
but the additional frequency is used in practice to building blocks used for PLL and hybrid synthesizers. It
provide additional frequency separation at the mixer should be noted that most designers use a combination
output. This frequency separation eases the bandpass of available integrated circuits, and most of the high-
filter requirements. For example, if f2 is eliminated, performance solutions are due to the careful design
f1 + fin must equal 10 fin or 10 MHz. If an f1∗ of 1 MHz of the oscillator, the integrated/lowpass filter, and the
is selected, the output of the first mixer will consist of the systems architecture. All of these items will be addressed,
two frequencies 9 and 11 MHz. The lower of these closely particularly the fractional-N synthesizer, which requires
spaced frequencies must be removed by the filter. The filter a lot of handholding in the removal of spurious products.
f in + f 1∗/10 f in + f ∗/ 100 + f 2∗ / 10
f 1∗ f 2∗
Double-mix- Double-mix- Double-mix- f 0 = 10 f in + 2 + + f 3∗
f in 10 10
divide module divide module divide module
#1 #2 #3
f2+f∗ f 2 + f 2∗ f 2 + f 3∗
Ten crystal
Figure 4. Phase-incoherent frequency syn- oscillators
f2 + f ∗
thesizer with three-digit resolution.
FREQUENCY SYNTHESIZERS 833
15 MHz
(freq std or DDS)
[ (f )]
3000 MHz
Phase comparator Integrator 3000 MHz VTO output
[Kp ] [T0,T1,T2,T3, A 0] [Kv ,Tv ] [ (f )VTO]
sampled waveform.
To complete the DDS, the memory size and length
(number of bits) of the memory word must be determined.
∆t = 1/Fclock
The word length is determined by system noise require-
∆t t
ments. The amplitude of the D/A output is that of an
Figure 8. Samples per cycle sine wave. This is typical of a exact sinusoid corrupted with the deterministic noise due
single-tone DAC output. F0 = Fclock /16 after low pass filtering; to truncation caused by the finite length of the digital
only the sine envelope is present. The lowpass filter removes the words (quantization noise). If an (n + 1)-bit word length
sampling energy. Each amplitude step is held for a clock period. (including one sign bit) is used and the output of the A/D
FREQUENCY SYNTHESIZERS 835
converter varies between ±1, the mean noise from the one would use the microprocessor to round it to the next
quantization will be increment of 1 Hz relative to the output frequency.
As to the spurious response, the worst-case spurious
2n response is approximately 20 log 2R , where R is the
1 1 1 1 2(n+1)
ρ2 = = (7) resolution of the digital/analog converter. For an 8-bit
12 2 3 2
A/D converter, this would mean approximately 48 dB
down (worst case), as the output loop would have an
The mean noise is averaged over all possible waveforms. analog filter to suppress close-in spurious noise. Modern
For a worst-case waveform, the noise is a square wave devices have a 14-bit resolution. Fourteen bits of resolution
with amplitude 12 ( 12 )n and ρ 2 = 14 ( 12 )2n for each bit added can translate into 20 log 214 or 80 dB, worse case, of
to the word length, the spectral purity improves by 6 dB. suppression. The actual spurious response would be much
The main drawback of a low-power DDS is that it is better. The current production designs for communication
limited to relatively low frequencies. The upper frequency applications, such as shortwave transceivers, despite the
is directly related to the maximum usable clock frequency; fact that they are resorting to a combination of PLLs
today, the limit is about 1 GHz. DDS tends to be noisier and DDSs, still end up somewhat complicated. By using
than other methods, but adequate spectral purity can 10 MHz from the DDS and using a single-loop PLL system,
be obtained if sufficient lowpass filtering is used at the one can easily extend the operation to above 1 GHz but
output. DDS systems are easily constructed using readily with higher complexity and power consumption. This
available microprocessors. The combination of DDS for was shown in Fig 5. Figure 10 shows a multiple-loop
fine frequency resolution plus other synthesis techniques synthesizer using a DDS for fine resolution.
to obtain higher-frequency output can provide high
resolution with very rapid setting time after a frequency
change. This is especially valuable for frequency-hopping 3. IMPORTANT CHARACTERISTICS OF SYNTHESIZERS
spread-spectrum systems.
The following is a list of parameters that are used to
In analyzing both the resolution and signal-to-noise
describe the performance of the synthesizer. These are
ratio (or rather signal-to-spurious performance) of the
referred to as figures of merit.
DDS, one has to know the resolution and input
frequencies. As an example, if the input frequency is
3.1. Frequency Range
approximately 35 MHz and the implementation is for a 32-
bit device, the frequency resolution compared to the input The output frequency of a synthesizer can vary over a
frequency is 35 × 106 ÷ 232 = 35 × 106 ÷ 4.294967296 × wide range. A synthesizer signal generator typically offers
109 or 0.00815 Hz ≈0.01 Hz. Given the fact that modern output frequencies from as low as 100 kHz to as high
shortwave radios with a first IF of about 75 MHz will have as several gigahertz. The frequency range is determined
an oscillator between 75 and 105 MHz, the resolution by the architecture of the signal generator as the system
at the output range is more than adequate. In practice, frequently uses complex schemes of combining frequencies
f0
42 to
1.0 to 21.0 to 108 MHz
2 MHz 2.0 MHz 22.0 MHz Bandpass Loop
low-pass VCO
filter filter
filter
20 MHz
÷4
Direct 5.25 to
Fine frequency
digital 5.25 to 5.5 MHz
0.01 Hz
synthesizer 5.5 MHz Bandpass
f
filter
36.75 to
10 MHz
102.5 MHz
÷2 ÷4
147 to
410 MHz
1 MHz Loop
÷ 20 f VCO
filter
20 MHz
Coarse frequency ÷N
TCXO
dBm (PO2)
0.00
noise sources in and outside of the transistor modulate
the VCO, resulting in energy or spectral distribution on
both sides of the carrier. This occurs via modulation and −10.00
conversion. The noise, or better, FM noise is expressed as
the ratio of output power divided by the noise power
relative to 1 Hz bandwidth measured at an offset of −20.00
the carrier. Figure 11 shows a typical phase noise plot
of a synthesizer. Inside the loop bandwidth, the carrier
signal is cleaned up, and outside the loop bandwidth, the −30.00
measurement shows the performance of the VCO itself. 0.00 1.00 2.00 3.00 4.00
Spectrum (GHz)
3.3. Output Power
Figure 12. Predicted harmonics at the output of a VCO.
The output power is measured at the designated output
port of the frequency synthesizer. Practical designs require
an isolation stage. Typical designs require one or more 3.6. Spurious Response
isolation stages between the oscillator and the output.
Spurious outputs are signals found around the carrier of a
The output power needs to be flat. While the synthesized
synthesizer that are not harmonically related. Good, clean
generator typically is flat with only 0.1 dB +/− deviation,
synthesizers need to have a spurious-free range of 90 dB,
the VCO itself can vary as much as +/− 2 dB over the
but these requirements make them expensive. While
frequency range.
oscillators typically have no spurious frequencies besides
3.4. Harmonic Suppression possibly 60- and 120-Hz pickup, the digital electronics in a
synthesizer generates a lot of signals, and when modulated
The VCO inside a synthesizer has a typical harmonic on the VCO, are responsible for these unwanted output
suppression of better than 15 dB. For high-performance products. (See also Fig. 11.)
applications, a set of lowpass filters at the output will
reduce the harmonic contents to a desired level. Figure 12 3.7. Step Size
shows a typical output power plot of a VCO.
The resolution, or step size, is determined by the
3.5. Output Power as a Function of Temperature architecture.
All active circuits vary in performance as a function of
3.8. Frequency Pushing
temperature. The output power of an oscillator over a
temperature range should vary less than a specified value, Frequency pushing characterizes the degree to which an
such as 1 dB. oscillator’s frequency is affected by its supply voltage. For
example, a sudden current surge caused by activating a input port. The input port’s parasitic elements determine
transceiver’s RF power amplifier may produce a spike on the VCO’s maximum possible modulation bandwidth.
the VCO’s DC power supply and a consequent frequency
jump. Frequency pushing is specified in frequency/voltage
3.16. Power Consumption
form and is tested by varying the VCO’s DC supply voltage
(typically ±1 V) with its tuning voltage held constant. This characteristic conveys the DC power, usually
specified in milliwatts and sometimes qualified by
3.9. Sensitivity to Load Changes operating voltage, required by the oscillator to function
To keep manufacturing costs down, many wireless properly [70].
applications use a VCO alone, without the buffering
action of a high reverse-isolation amplifier stage. In such
4. BUILDING BLOCKS OF SYNTHESIZERS
applications, frequency pulling, the change of frequency
resulting from partially reactive loads, is an important
oscillator characteristic. Pulling is commonly specified in 4.1. Oscillator
terms of the frequency shift that occurs when the oscillator
An oscillator is essentially an amplifier with sufficient
is connected to a load that exhibits a nonunity VSWR
feedback so the amplifier becomes unstable and begins
(such as 1.75, usually referenced to 50
), compared to
oscillation. The oscillator can be divided into an amplifier,
the frequency that results with unity-VSWR load (usually
a resonator, and a feedback system [71]. One of the most
50
). Frequency pulling must be minimized, especially in
simple equations describes this. It describes the input
cases where power stages are close to the VCO unit and
admittance of an amplifier with a tuned circuit at the
short pulses may affect the output frequency. Such poor
output described by the term YL .
isolation can make phase locking impossible.
∗ Y12 × Y21
3.10. Tuning Sensitivity Y11 = Y11 − (8)
Y22 + YL
This is a VCO parameter also expressed in fre-
quency/voltage and is not part of a synthesizer specifi- The feedback is determined by the term Y12 , or in practical
cation. terms, by the feedback capacitor. The transistor oscillator
(Fig. 13a) shows a grounded base circuit. For this type of
3.11. Posttuning Drift oscillator, using microwave transistors, the emitter and
After a voltage step is applied to the tuning diode input, collector currents are in phase. That means that the input
the oscillator frequency may continue to change until it circuit of the transistor needs to adjust the phase of the
settles to a final value. The posttuning drift is one of the oscillation. The transistor stage forms the amplifier, the
parameters that limits the bandwidth of the VCO input. tuned resonator at the output determines the frequency
of oscillation, and the feedback capacitor provides enough
3.12. Tuning Characteristic energy back into the transistor so that oscillation can
occur. To start oscillation, the condition, relative to the
This specification shows the relationship, depicted as output, can be derived in a similar fashion from the input:
a graph, between the VCO operating frequency and
the tuning voltage applied. Ideally, the correspondence Y12 × Y21
∗
between operating frequency and tuning voltage is linear. Y22 = (Y22 + YL ) − (9)
Y11 + YT
3.13. Tuning Linearity If the second term of the equation on the right side is larger
For stable synthesizers, a constant deviation of frequency than the first term on the right of the = sign, then Real
∗
versus tuning voltage is desirable. It is also important to (Y22 ) is negative and oscillation at the resonant frequency
make sure that there are no breaks in tuning range, for occurs.
example, that the oscillator does not stop operating with a The oscillator circuit shown in Fig 13c is good for
tuning voltage of 0 V. low-frequency applications, but the Colpitts oscillator is
preferred for higher frequencies. The Colpitts oscillator
3.14. Tuning Sensitivity and Tuning Performance works by rotating this circuit and grounds the collector
instead of the base. The advantage of the Colpitts oscillator
This datum, typically expressed in megahertz per volt
is the fact that it is an emitter–follower amplifier with
(MHz/V), characterizes how much the frequency of a VCO
feedback and shows a more uniform gain of a wider
changes per unit of tuning voltage change.
frequency range (see Fig. 14). Oscillators with a grounded
emitter or source have more stability difficulties over wider
3.15. Tuning Speed
frequency ranges.
This characteristic is defined as the time necessary for Depending on the frequency range, there are several
the VCO to reach 90% of its final frequency upon the resonators available. For low frequencies up to about
application of a tuning voltage step. Tuning speed depends 500 MHz, lumped elements such as inductors are useful.
on the internal components between the input pin and the Above these frequencies, transmission line–based res-
tuning diode, including the capacitance present at the onators are microwave resonators, which are better. Very
(a) cap
6 pF
osc
cap ptr
cap
200 nh
6.8 pF
cap
bip
1.3 pF
300 nh
p1
ind
2 nH
8 pF
cap ind
ind
cap
cap
bFr93 ow
− [ib]
5 pF
nois : bipnoise
res
b
1 nF 1 nF
2 nH
cap
170
res
8.2 koh
ind
2.2 koh
1 nF
res
bias
+
−
V:12
(b) −50.00
−75.00
−100.00
PN1<H1> [dBc/Hz]
−125.00
−150.00
−175.00
−200.00
1.00E02 1.00E03 1.00E04 1.00E05 1.00E06 1.00E07 1.00E08
FDev [Hz]
(c)
Cf
t
Gk
y11 y12 v2 y21 v1 y22
Cc L1 C′
Figure 13. (a) A grounded base VHF/UHF oscillator with a tuned resonator tapped in the middle
to improve operational Q; (b) the predicted phase noise of the oscillator in (a); c the equivalent
circuit of the oscillator configuration of (a). For the grounded base configuration, emitter and
collector currents have to be in phase. The phase shift introduced by Cf is compensated by the
input tuned circuit.
838
FREQUENCY SYNTHESIZERS 839
res 20 cap
3300 res
nois : bipnoise 2 pF Pc
bias c
3000 1000
+ bip
−
cap b
P : 11 mm
V:5 W : 2 mm
cap
50 nH
12 pF
1 pF
ind
cap
12 pF
220
1 pF
res
Pe
ms
HU:
H : .5 mm ER : 3
Label : sub
Figure 14. Colpitts oscillator using a coil inductor or a transmission line as a resonator. In this
case all the physical parameters of the transmission line have to be provided.
small oscillators use printed transmission lines. High- Q), once excited, will oscillate infinitely because there is
performance oscillators use ceramic resonators (CROs), no resistance element present to dissipate the energy.
dielectric resonators (DROs), and surface acoustic wave In the actual case where the inductor Q is finite, the
resonators (SAWs), to name only a few. The frequency oscillations die out because energy is dissipated in the
standard, if not derived from an atomic frequency stan- resistance. It is the function of the amplifier to maintain
dard is typically a crystal oscillator. Even the atomic oscillations by supplying an amount of energy equal to
frequency standards synchronize a crystal against an that dissipated. This source of energy can be interpreted
atomic resonance to compensate for the aging of the crys- as a negative resistor in series with the tuned circuit. If
tal. Figure 15a shows a circuit of a crystal oscillator with the total resistance is positive, the oscillations will die
typical component values that are based on a 10-MHz out, while the oscillation amplitude will increase if the
third overtone AT cut crystal. Q is the ratio of stored total resistance is negative. To maintain oscillations, the
energy divided by dissipated energy. The Q of the crystal two resistors must be of equal magnitude. To see how a
can be as high as one million. The figure of merit Q is negative resistance is realized, the input impedance of the
defined by ωL/R. In our case ωL/R is circuit in Fig. 16 will be derived.
If Y22 is sufficiently small (Y22 1/RL ), the equivalent
2π × 1Hy × 10 × 106 circuit is as shown in Fig. 16. The steady-state loop
= 1.25 million
50 equations are
Therefore, the resulting Q is 1.25 million. Typical Vin = Iin (XC1 + XC2 ) − Ib (XC1 − βXC2 ) (10)
resonator Q values for LC oscillators are 200; for structure-
based resonators such as ceramic resonators or dielectric 0 = −Iin (XC1 ) + Ib (XC1 +hie ) (11)
resonators, Q values of 400 are not uncommon.
An oscillator operating at 700 MHz is shown in Fig. 15a, After Ib is eliminated from these two equations, Zin is
including its schematic. CAD simulation was used to obtained as
determine the resonance frequency phase noise and the
harmonic contents, shown in Fig. 15b. The output power Vin (1 + β)XC1 XC2 + hie (XC1 + XC2 )
can be taken off either the emitter, at which provides better Zin = = (12)
Iin XC1 + hie
harmonic filtering, or from the collector, which provides
smaller interaction between the oscillator frequency and
If XC1 hie , the input impedance is approximately
the load. This effect is defined as frequency pulling.
equal to
The tuned capacitor can now be replaced by a voltage-
dependent capacitor. A tuning diode is a diode operated in
reverse. Its PN junction capacitance changes as a function 1+β
Zin ≈ XC1 XC2 + (XC1 + XC2 ) (13)
of applied voltage. hie
A two-port oscillator analysis will now be presented. It −gm 1
is based on the fact that an ideal tuned circuit (infinite Zin ≈ + (14)
ω 2 C1 C2 jω[C1 C2 /(C1 + C2 )]
840 FREQUENCY SYNTHESIZERS
(a) + 10 V
1 µF
Crystal 100 Ω
equivalent circuit
1 µF
32 pF
156 pF
15
1H
1.2 pF 156 pF 10 µH
50 Ω 470
Derivation: That is, the input impedance of the circuit shown in Fig. 17
is a negative resistor
I in
−gm
R= (15)
ω 2 C1 C2
in series with a capacitor
C1
C1 C2
Cin = (16)
RL C1 + C2
V in Z in which is the series combination of the two capacitors.
With an inductor L (with the series resistance RS )
connected across the input, it is clear that the condition
C2 for sustained oscillation is
gm
RS = (17)
ω 2 C1 C2
and the frequency of oscillation
1
XC2 (20)
Y22
gm G
r≤ ≤ 2 (21) C2
ω 2 C1 C2 ω C1 C2 C0
5.00
Real
0.00
Imaginary
Y 1 [mA]
−5.00
−10.00
550.00 600.00 650.00 700.00 750.00 800.00 850.00 900.00 Figure 18. The CAD-based linearized currents. Condi-
Freq [MHz] tions for oscillation are X = 0 and R < 0.
842 FREQUENCY SYNTHESIZERS
IC . KF is therefore higher for a given BJT operating as an Note: Results normalized to a center
oscillator than for the same transistor operating as a small- frequency of 6 GHz
−40
signal amplifier. This must be kept in mind when consid- VCOs
ering published fC data, which are usually determined −60 YTOs
Therefore ECL II
J− K F/F Q
fout = (MP + A)fref (29) Clock
Q C
MC 1013
output
SP8640 MC 1213 47
If A is incremented by one, the output frequency changes
by fref . In other words, the channel spacing is equal to ECL III
PE1 PE2 output
fref . This is the channel spacing that would be obtained
with a fully programmable divider operating at the same Control
frequency as the P/(P + 1) divider. For this system to work, 1.5 kΩ
the A counter must underflow before the M counter does; −VE
otherwise, P/(P + 1) will remain permanently in the P + 1
Figure 24. Level shifting information for connecting the various
mode. Thus, there is a minimum system division ratio,
ECL2 and ECL3 stages.
Mmin , below which the P/(P + 1) system will not function.
To find that minimum ratio, consider the following. The A
counter must be capable of counting all numbers up to and Q
including P − 1 if every division ratio is to be possible, or Q J–K F/F
C
J–K F/F Q
Clock 10 ÷ 11 1/2 MC1032 1/2 MC1032
C Q
SP8640 1232 1232
Amax = P − 1 (30)
Mmin = P since M > A (31) PE1 PE2
Reference
From
function High Z
latch AVDO
13
N = BP + A MUX MUXout
13-bit
B counter SDout
RFinA + Prescaler Load
RFinB − P/P+1 Load
6-bit
A counter M3 M2 M1
CE AGND DGND
Figure 26. Block diagram of an advanced digital fractional-N synthesizer.
R divider
5. PHASE-LOCKED LOOP DESIGNS
N divider
5.1. The Type 2, Second-Order Loop
CP output
The following is a derivation of the properties of the
type 2, second-order loop. This means that the loop
Figure 27. A digital phase frequency discriminator with a has two integrators, one being the diode and the other
programmable delay. the operational amplifier, and is built with the order
846 FREQUENCY SYNTHESIZERS
Passive Active
Type
1 2 3 4
R1 R1 R1 C R1 R2 C
Circuit C R2 − −
C + +
F ( j ω)
F ( j ω)
F ( j w)
F ( j w)
Transfer
characteristic w w w w
1 1+ j wt 2 1 1 + j wt 2
F ( j w) = 1+ j wt 1 1+ j w(t 1 + t 2) j wt 1 j wt 1
t1 = R1C , t2 = R2C
Passive lead-lag Passive lead lag with pole Active integrator Active integrator with pole
R1 R1
R2 C2
R1 C3
R2 −
R2
C3 +
R2 C2
C2 R1
C2 −
+
Type 1.5, 2nd order (Low gain) Type 1.5, 3rd order (Low gain) Type 2, 2nd order (High gain) Type 2, 3rd order (High gain)
of 2 as can be seen from the pictures above. The basic type 3 filter, and therefore it does not have to be treated
principle to derive the performance for higher-order loops separately. Another possible transfer function is
follows the same principle, although the derivation is more
complicated. Following the math section, we will show 1 1 + τ2 s
F(s) = (34)
some typical responses [95]. R1 C s
The type 2, second-order loop uses a loop filter in the
form with
1 τ2 s + 1 τ2 = R2 C (35)
F(s) = (33)
s τ1
Under these conditions, the magnitude of the transfer
The multiplier 1/s indicates a second integrator, which function is
is generated by the active amplifier. In Table 1, this is
the type 3 filter. The type 4 filter is mentioned there as 1 $
|F(jω)| = 1 + (ωR2 C)2 (36)
a possible configuration but is not recommended because, R1 Cω
as stated previously, the addition of the pole of the origin
creates difficulties with loop stability and, in most cases, and the phase is
requires a change from the type 4 to the type 3 filter.
◦
One can consider the type 4 filter as a special case of the θ = arctan(ωτ2 ) − 90 (37)
FREQUENCY SYNTHESIZERS 847
R1 R2
R1 R1 C2
C2 C3
C1
C1 C1
st1 + 1 st1+1 s t1 + 1
F (s ) = R1 F (s ) = R1 F (s ) = R1
st1 st1(st2 + 1) s t 1( st 2 + 1)(st3 + 1)
t1 = R1C1 C1C2 C C
t1 = R1C1 ; t2 = R2 t1 = R1C1 ; t2 = R1 1 3 ;
C1 + C2 C1 + C3
t3 = R2C2
Type 2, 2nd order Type 2, 3rd order Type 2, 4th order Figure 30. Recommended passive filters for
charge pumps.
Again, as if for a practical case, we start off with the and the noise bandwidth is
design values ωn and ξ , and we have to determine τ1 and
τ2 . Taking an approach similar to that for the type 1, K(R2 /R1 ) + 1/τ2
Bn = Hz (47)
second-order loop, the results are 4
K Again, we ask the question of the final error and use the
τ1 = (38) previous error function
ωn
sθ (s)
and E(s) = (48)
2ζ s + K(R2 /R1 ){[s + (1/τ2 )]/s}
τ2 = (39)
ωn
or
s2 θ (s)
and E(s) = (49)
R1 =
τ1
(40) s2 + K(R2 /R1 )s + (K/τ2 )(R2 /R1 )
C
As a result of the perfect integrator, the steady-state error
and resulting from a step change in input phase or change of
τ2
R2 = (41) magnitude of frequency is zero.
C
If the input frequency is swept with a constant range
The closed-loop transfer function of a type 2, second-order change of input frequency (ω/dt), for θ (s) = (2ω/dt)/s3 ,
PLL with a perfect integrator is the steady-state phase error is
By introducing the terms ξ and ωn , the transfer function The maximum rate at which the VCO frequency can be
now becomes swept for maintaining lock is
2ζ ωn s + ωn2 2ω N 1
B(s) = (43) = 4Bn − rad/s (51)
s2 + 2ζ ωn s + ωn2 dt 2τ2 τ2
with the abbreviations The introduction of N indicates that this is referred to the
1/2 VCO rather than to the phase/frequency comparator. In
K R2 the previous example of the type 1, first-order loop, we
ωn = rad/s (44)
τ2 R1 referred it only to the phase/frequency comparator rather
than the VCO.
and Figure 31 shows the closed-loop response of a type 2,
1 R2 1/2
ζ = Kτ2 (45) third-order loop having a phase margin of 10◦ and with
2 R1 the optimal 45◦ .
A phase margin of 10◦ results in overshoot, which in
and K = Kθ Ko /N.
the frequency domain would be seen as peaks in the
The 3-dB bandwidth of the type 2, second-order loop is
oscillator noise–sideband spectrum. Needless to say, this
ωn , 2 $ -1/2 is a totally undesirable effect, and since the operational
B3 dB = 2ζ + 1 + (2ζ 2 + 1)2 + 1 Hz (46) amplifiers and other active and passive elements add to
2π
848 FREQUENCY SYNTHESIZERS
−20
1SA
−30
−40
−50
−60
−70
Figure 31. Measured spectrum of a synthe-
sizer where the loop filter is underdamped, −80
resulting in ≈10-dB increase of the phase noise
at the loop filter bandwidth. In this case, we −90
either don’t meet the 45◦ phase margin crite- FXD
rion, or the filter is too wide, so it shows the −100
effect of the up-converted reference frequency. Center 830.0000251 MHz 2.5 kHz Span 25 kHz
40
(a) (b) +20
(a) Second order PLL, d.f. = 0.45
Integrated response F (s ) (dB)
30 (b) Second order PLL, d.f. = 0.1 +10 Phase margin = 10°
(c) Third order PLL, q = 45°
0
Response (dB)
1 1
1 (s) = L[ω1 (t)] (reference input to δ/ω detector)
e (s) = ω1 N
s s[N(τ1 + τ2 ) + Ko Kd τ2 ] + (N + Ko Kd )
2 (s) = L[ω2 (t)] (signal VCO output frequency) +
τ1 + τ2
e (s) = L[ωe (t)] (error frequency at δ/ω detector) + (61)
s[N(τ1 + τ2 ) + Ko Kd τ2 ] + (N + Ko Kd )
2 (s)
e (s) =
1 (s) − For the first term we will use the abbreviation A, and for
N
the second term we will use the abbreviation B:
2 (s) = [
1 (s) −
e (s)]N
1/[N(τ1 + τ2 ) + Ko Kd τ2 ]
A= ' ( (62)
From the circuit described above N + Ko Kd
s s+
N(τ1 + τ2 ) + Ko Kd τ2
A(s) =
e (s)Kd τ1 + τ2
B(s) = A(s)F(s) N(τ1 + τ2 ) + Ko Kd τ2
B= (63)
N + Ko Kd
2 (s) = B(s)Ko s+
N(τ1 + τ2 ) + Ko Kd τ2
The error frequency at the detector is
After the inverse Laplace transformation, our final result
becomes
1
e (s) =
1 (s)N (52)
N + Ko Kd F(s) 1
L−1 (A) =
N + Ko Kd
The signal is stepped in frequency:
' (+
N + Ko Kd
× 1 − exp −t (64)
ω1 N(τ1 + τ2 ) + Ko Kd τ2
1 (s) = (ω1 = magnitude of frequency step)
s τ1 + τ2
(53) L−1 (B) =
N(τ1 + τ2 ) + Ko Kd τ2
N + Ko Kd
5.2.1.1. Active Filter of First Order. If we use an active × exp −t (65)
filter N(τ1 + τ2 ) + Ko Kd τ2
1 + sτ2
F(s) = (54) and finally
sτ1
and insert this in (51), the error frequency is ωe (t) = ω1 N[L−1 (A) + (τ1 + τ2 )L−1 (B)] (66)
input of the PLL as long as the loop can follow and the 5.2.2.1. Active Filter. The transfer characteristic of the
phase error does not become larger than 2π . Once the active filter is
error is larger than 2π , the loop jumps out of lock. When 1 + sτ2
F(s) = (69)
the loop is out of lock, a beat note occurs at the output of sτ1
the loop filter following the phase/frequency detector.
The tristate state phase/frequency comparator, how- This results in the formula for the phase error at the
ever, works on a different principle, and the pulses detector:
generated and supplied to the charge pump do not allow s
the generation of an AC voltage. The output of such a θe (s) = 2π (70)
s2 + (sKo Kd τ2 /τ1 )/N + (Ko Kd /τ1 )/N
phase/frequency detector is always unipolar, but relative
to the value of Vbatt /2, the integrator voltage can be either The polynomial coefficients for the denominator are
positive or negative. If we assume for a moment that
this voltage should be the final voltage under a locked a2 = 1
condition, we will observe that the resulting DC voltage
is either more negative or more positive relative to this Ko Kd τ2 /τ1
a1 =
value, and because of this, the VCO will be ‘‘pulled in’’ to N
this final frequency rather than swept in. The swept-in Ko Kd /τ1
a0 =
technique applies only in cases of phase/frequency com- N
parators, where this beat note is being generated. A typical
case would be the exclusive-OR gate or even a sample/hold and we have to find the roots W1 and W2 . Expressed in the
comparator. This phenomenon is rarely covered in the lit- form of a polynomial coefficient, the phase error is
erature and is probably discussed in detail for the first
s
time in the book by Roland Best [9]. θe (s) = 2π (71)
Let us assume now that the VCO has been pulled in (s + W1 )(s + W2 )
to final frequency to be within 2π of the final frequency,
and the time t is known. The next step is to determine the After the inverse Laplace transformation has been
lockin characteristic. performed, the result can be written in the form
W1 e−W1 t − W2 e−W2 t
5.2.2. Lockin Characteristic. We will now determine δe (t) = 2π (72)
W1 − W2
the lockin characteristic, and this requires the use of
a different block diagram. Figure 5 shows the familiar with
block diagram of the PLL, and we will use the following lim δe (t) = 2π
definitions: t→0
θe (s) = L[δe (t)] (phase error at δ/ω detector) The same can be done using a passive filter.
θ2 (s)
θe (s) = θ1 (s) − 5.2.2.2. Passive Filter. The transfer function of the
N
passive filter is
From the block diagram, the following is apparent:
1 + sτ2
F(s) = (73)
A(s) = θe (s)Kd 1 + s(τ1 + τ2 )
and finally find the roots for W1 and W2 . This can be 2.0
written in the form
' ( 1.8
Damping factor = 0.1
1 1 s
θe (s) = 2π +
τ1 + τ2 (s + W1 )(s + W2 ) (s + W1 )(s + W2 ) 1.6
with 0.4
lim δe (t) = 2π 0.2
t→0
with 0 10 20 30
lim δe (t) = 0
t→∞ Time (ms)
Figure 35. Normalized output response of a type 2, second-order
When analyzing the frequency response for the various loop with a damping factor of 0.1 and 0.05 for
n = 0.631.
types and orders of PLLs, the phase margin played an
important role. For the transient time, the type 2, second-
order loop can be represented with a damping factor or,
(Butterworth response) is shown in Fig. 40. Figure 41
for higher orders, with the phase margin. Figure 35 shows
shows an actual settling time measurement. Any deviation
the normalized output response for a damping factor of
from ideal damping, as we’ll soon see, results in ringing
0.1 and 0.47. The ideal Butterworth response would be
(in an underdamped system) or, in an overdamped system,
a damping factor of 0.7, which correlates with a phase
the voltage will crawl to its final value. This system can be
margin of 45◦ .
increased in its order by selecting a type 2, third-order loop
using the filter shown in Fig. 42. For an ideal synthesis of
5.3. Loop Gain/Transient Response Examples
the values, the Bode diagram looks as shown in Fig. 43,
Given the simple filter shown in Fig. 36 and the and its resulting response is given in Fig. 44.
parameters as listed, the Bode plot is shown in Fig. 37. The order can be increased by adding an additional
This approach can also be translated from a type 1 into a lowpass filter after the standard loop filter. The resulting
type 2 filter as shown in Fig. 38 and its frequency response system is called a type 2, fifth-order loop. Figure 45
is shown in Fig. 39. The lockin function for this type 2, shows the Bode diagram or open-loop diagram, and
second-order loop with an ideal damping factor of 0.707 Fig. 46 shows the locking function. By using a very
1.33 KOhm
391 Ohm
100. nF
1.72 KOhm
−
852
FREQUENCY SYNTHESIZERS 853
850.87500MHz
0.00 s 50.00 µs 100.0 µs
10.00 µs/div Analyze
T1 25.56 µs T2 17.56 µs ∆ −8.00 µs All Between
markers
Settling time -------
ref int Figure 41. Example of settling time measurement.
wide loop bandwidth, this can be used to clean up very slight overshoot remains, improving the phase noise
microwave oscillators with inherent comparatively poor significantly. This is shown in Fig. 53.
phase noise. This cleanup, has a dramatic influence on the
performance.
By deviating from the ideal 45◦ to a phase margin of 5.4. Practical Circuits
◦
33 , one obtains the above mentioned ringing, as is evident
from Fig. 47. The time to settle has increased from 13.3 to Figure 54 shows a passive filter used for a synthesizer chip
62 µs. with constant current output. This chip has a charge pump
To more fully illustrate the effects of nonideal phase output, which explains the need for the first capacitor.
margin, Figs. 48, 49, 50, and 51 show the lockin function Figure 55 shows an active integrator operating at
of a different type 2, fifth-order loop configured for phase a reference frequency of several megahertz. The notch
margins of 25◦ , 35◦ , 45◦ , and 55◦ , respectively. filter at the output reduces the reference frequency
I have already mentioned that the loop should avoid considerably. The notch is about 4.5 MHz.
‘‘ears’’ (Fig. 31) with poorly designed loop filters. Another Figure 56 shows the combination of a phase/frequency
interesting phenomenon is the tradeoff between loop discriminator and a higher-order loop filter as used in
bandwidth and phase noise. In Fig. 52 the loop bandwidth more complicated systems, such as fractional-division
has been made too wide, resulting in a degradation of the synthesizers.
phase noise but provides faster settling time. By reducing Figure 57 shows a custom-built phase detector with a
the loop bandwidth from about 1 kHz to 300 Hz, only a noise floor of better than — 168 dBc/Hz.
854 FREQUENCY SYNTHESIZERS
95.5 nF
461 nF
6. THE FRACTIONAL-N PRINCIPLE has to be divided by 996× every 1000 cycles. This can
easily be implemented by adding the number 0.996 to
The principle of the fractional-N PLL synthesizer was the contents of an accumulator every cycle. Every time
briefly mentioned in Section 2. The following is a the accumulator overflows, the divider divides by 18
numerical example for better understanding. rather than by 17. Only the fractional value of the
addition is retained in the phase accumulator. If we
Example 1. Considering the problem of generating move to the lower band or try to generate 850.2 MHz,
4
899.8 MHz using a fractional-N loop with a 50-MHz N remains 17, and K/F becomes 1000 . This method of
reference frequency, we obtain using fractional division was first introduced by using
analog implementation and noise cancellation, but today
it is implemented totally as a digital approach. The
K
899.8 MHz = 50 MHz N + necessary resolution is obtained from the dual-modulus
F
prescaling, which allows for a well-established method
The integral part of the division N has to be set to for achieving a high-performance frequency synthesizer
996
17 and the fractional part K/F needs to be 1000 ; (the operating at UHF and higher frequencies. Dual-modulus
K prescaling avoids the loss of resolution in a system
fractional part is not a integer) and the VCO output compared to a simple prescaler; it allows a VCO step
F
FREQUENCY SYNTHESIZERS 855
Phase
to implementing the fractional-N synthesizer principle.
Gain
0 0
Although the fractional-N technique appears to have a −10 Phase −10
good potential of solving the resolution limitation, it is not
free of having its own complications. Typically, an overflow −30 −30
from the phase accumulator, which is the adder with the −50 −50
Fref
output feedback to the input after being latched, is used
−70 −70
to change the instantaneous division ratio. Each overflow
produces a jitter at the output frequency, caused by the −90 −90
z
z
fractional division, and is limited to the fractional portion
H
kH
kH
kH
H
M
M
0
10
0
10
10
0
10
10
In our case, we had chosen a step size of 200 kHz, Frequency (Hz)
and yet the discrete sidebands vary from 200 kHz for
Figure 45. Bode plot of the fifth-order PLL system for a
K/F = 10004
to 49.8 MHz for K/F = 1000 996
. It will become microwave synthesizer. The theoretical reference suppression is
the task of the loop filter to remove those discrete better than 90 dB.
spurious components. While in the past the removal
of the discrete spurs has been accomplished by using
analog techniques, various digital methods are now [16932 + 72]
available. The microprocessor has to solve the following Fout = 50 MHz ×
1000
equation:
= 846.6 MHz + 3.6 MHz
K = 850.2 MHz
N∗ = N + = [N(F − K) + (N + 1)K]
F
By increasing the number of accumulators, frequency
Example 2. For FO = 850.2 MHz, we obtain
resolution much below 1-Hz step size is possible with
850.2 MHz the same switching speed.
N∗ = = 17.004
50 MHz
There is an interesting, generic problem associated
Following the formula above we have with all fractional-N synthesizers. Assume for a moment
that we use our 50-MHz reference and generate a 550-
K [17(1000 − 4) + (17 + 1) × 4] MHz output frequency. This means that our division
N∗ = N + =
F 1000 factor is 11. Aside from reference frequency sidebands
[16932 + 72] (±50 MHz) and harmonics, there will be no unwanted
= = 17.004 spurious frequencies. Of course, the reference sidebands
1000
360 360
280 280
200 200
Phase (degrees)
Phase (degrees)
120 120
40 40
0
−40 0
−40
−120
−120
−200
−200
−280
−280
−360
0.0 2 4 6 8 10 12 14 16 18 20 −360
0.0 7 14 21 28 35 42 49 56 63 70
Time (µs)
Time (µs)
Figure 47. Lockin function of the fifth-order PLL. Note that the
Figure 46. Lockin function of the fifth-order PLL. Note that the phase margin has been reduced from the ideal 45◦ . This results
phase lock time is approximately 13.3 µs. in a much longer settling time of 62 µs.
856
FREQUENCY SYNTHESIZERS 857
will be suppressed by the loop filter by more than 90 dB. might still be acceptable, but for a 1-kHz offset, the
For reasons of phase noise and switching speed, a loop necessary loop bandwidth of 100 Hz would make the loop
bandwidth of 100 kHz has been considered. Now, taking too slow. A better way is to use a different reference
advantage of the fractional-N principle, say that we frequency — one that would place the resulting spurious
want to operate at an offset of 30 kHz (550.03 MHz). product considerably outside the 100-kHz loop filter
With this new output frequency, the inherent spurious window. If, for instance, we used a 49-MHz reference,
signal reduction mechanism in the fractional-N chip multiplication by 11 would result in 539 MHz. Mixing
limits the reduction to about 55 dB. Part of the reason this with 550.03 MHz would result in spurious signals at
why the spurious signal suppression is less in this case ±11.03 MHz, a frequency so far outside the loop bandwidth
is that the phase frequency detector acts as a mixer, that it would essentially disappear. Starting with a
collecting both the 50-MHz reference (and its harmonics) VHF, low-phase-noise crystal oscillator, such as 130 MHz,
and 550.03 MHz. Mixing the 11th reference harmonic one can implement an intelligent reference frequency
(550 MHz) and the output frequency (550.03 MHz) results selection to avoid these discrete spurious signals. An
in output at 30 kHz; since the loop bandwidth is 100 kHz, additional method of reducing the spurious contents
it adds nothing to the suppression of this signal. To solve is maintaining a division ratio greater than 12 in all
this, we could consider narrowing the loop bandwidth cases. Actual tests have shown that these reference-based
to 10% of the offset. A 30-kHz offset would equate to spurious frequencies can be repeatedly suppressed by 80
a loop bandwidth of 3 kHz, at which the loop speed to 90 dB.
858 FREQUENCY SYNTHESIZERS
−50 res
P1 P2
Open loop
res
Phase noise (dBc/ Hz)
3.3 kOH
−70
cap
cap
sour
5 mA 1 nF 100 pF
cap
−90 10 nF
Closed loop
−110
−130
Figure 54. Type 1 high-order loop filter used for passive filter
−150 evaluation. The 1-nF capacitor is used for spike suppression as
explained in the text. The filter consists of a lag portion and an
−170 additional lowpass section.
1 10 100 1000 10 K 100 K 1M 10 M
Frequency offset from carrier (Hz) 6.1. Spur Suppression Techniques
Figure 52. Comparison between open- and closed-loop noise
While several methods have been proposed in the
prediction. Note the overshoot of around 1 kHz off the carrier.
literature, the method of reducing the noise by using a
sigma–delta modulator has shown to be most promising.
The concept is to get rid of the low-frequency phase
−30 error by rapidly switching the division ratio to eliminate
the gradual phase error at the discriminator input. By
−50 changing the division ratio rapidly between different
values, the phase errors occur in both polarities, positive
Phase noise (dBc / Hz)
cap
cap P2
n4 r1:1E6 221
res
r2:10 22 pF 221
res res 1.5 nF 1.5 nF
470 220
cap
0.1
1 nF 2.74 kOH
res
bias
+ V:5
−
Figure 55. A type 2 high-order filter with a notch to suppress the discrete reference spurs.
FREQUENCY SYNTHESIZERS 859
Phase – detector
N1.8
D3
74ACT74M
10 9 R14 R15 R39
S
11 C1
PDREF 12 1D R24 470 470 1K C16
13 8 22 100 N
R 8
2 − R40 UAB
C10 6
V10 R13 4.7 N
BAV70 3 + 1K R41
3 N1 100 C2
D3
220 CLC430AJE 1N
74ACT74M 2
4 R16 R17 C12
5
PDVCO 3 S C40
1U
2 1D C1 470 470 R20
47P 1.5 K
1 6 C11
+5 PD R 4.7 N
Figure 56. Phase/frequency discriminator including an active loop filter capable of operating up to 100 MHz.
REFCLKGENEN
DL2D2
A Y
A
B
AD 2D2
D Q Q2
DQ1
FD2
Y
A CK QN
FREQ_ON A Y
Y B RN
B REFCLK
DL1D2
REF_HALBE XO2
Y
AD2D2
Y
AD2D2
A
A
A Y
Y B
B
AD2D2
A
B
XO2D2
DETECTEN A
FREQ_ON B Y
RESB C
AD3
CLKOTOG
DETECTEN ND302
C
A Y B Y Phase
D Q B
A
FD2 ND2
CK QN
CLKO RN
A
B
AD2D2
Y
Q3
SN
D Q
Figure 57. Custom-built phase detec-
FD3 tor with a noise floor of better than
CK QN −168 dBc/Hz. This phase detector shows
extremely low phase jitter.
generators. However, this method does not allow us to The power spectral response of the phase noise for the
increase the loop bandwidth beyond the 100-kHz limit, three-stage sigma–delta modulator is calculated from
where the noise contribution of the sigma–delta modulator ' (2(n−1)
(2π )2 πf
takes over. L(f ) = × 2 sin rad2 /Hz (78)
12 × fref fref
Table 2 shows some of the modern spur suppression
methods. These three-stage sigma–delta methods with where n is the number of the stage of the cascaded
larger accumulators have the most potential. sigma–delta modulator. Equation (78) shows that the
860 FREQUENCY SYNTHESIZERS
Filter frequency response / predicted SSB modulator noise Table 2. Spur Suppression Methods
−100
−80 dB Random jittering Randomize divider Frequency jitter
Reference noise floor (−127 dBc) Sigma–delta Modulate division Quantization noise
Noise from Σ∆
−150 modulator modulation ratio
−200
f ref
f LPF
50 MHz
VCO Output
Dual-modulus prescaler
800 – 1000 Buffer
Frequency & programmable divider
MHz
reference
Data input
k D D
m bits a+b D
a+b D
a+b D
3-stage Σ∆ converter
Figure 59. Block diagram of the fractional-N synthesizer built using a custom IC capable of
operation at reference frequencies up to 150 MHz. The frequency is extensible up to 3 GHz using
binary (÷2, ÷4, ÷8, etc.) and fixed-division counters.
Modulation /sweep Frequency divider
ACTFREQ<54:0 >
WOBMODE
1 55
Sweep
mux 56Start frequency
Frequency Compensation
0 STFREQ <54:0 > register
register
55 CLKEN CLKEN
step value
A CLK CLK Z−1 Compensation
Schwert <3:0>
1
Schritwertigkeit
Modulation RES RES 15 adder Progmux <2:0>
4 Start 55 55 D Q A
D Q MSB 15 3
mux
S 55 0 LSB 40
Step width 32 ≥ LD
SCHWEIT<31:0> 1 40 15
Σ ≥ S mux TF<14 : 0>
56 Prescaler
B Newfreq 15
2
Σ
Clear
Value mux
0 +/− B
Compensation
Step clock value CIN
STEPCLK 26 calculation <14 : 0>
CLKEN Output register
CLK
861
AUFAB RES <14 : 0> <13:6>
11
Multiplier Multi<10:0>
15 B DIVPROG<7:0>
X
CLK
FASTEN
RES
16 PARSER MULTI<10:0> FASTTF<3:0> 5
CLKEN
Mod<15:0> SCHWERT<3:0> MCO_INV CLKO
CENB
WOBMODE CLKI Divider MCO
CLKEN Modulation WRB
CLK RDB SCHWEIT<31:0> RESET
register 2 CLKO2
RES ADR<4:0> 5 STRFEQ<54:0> SELCLK<1:0>
16 PROGMUX<2:0>
8
Modulation signal DATA<7:0> REFCLKGENEN
ADIN<15:0> 3 DETECTEN
INPORT<2:0> Interface FREQ_ON
CLK
RES
16 MCO_INV
CLKEN
OUTPORT<15:0> SOFTRES
KALSPEED ACDC
FMDKALEN RESETKAL
ACDC FMDCKAL RES
FMDCKALEN
RESETKAL KALSPEED
FASTF<3:0>
FASTEN DELTAIN DELTAOUT
Figure 60. Detailed block diagram of the inner workings of the fractional-N-division synthesizer chip.
862 FREQUENCY SYNTHESIZERS
−110 G N/1
−120 RF-Source/MHz
Divider:
1.000
1000.00
−130 Noise floor:
−170
−140
Figure 61. Composite phase noise of the fractional-N synthesizer system, including all noise
and spurious signals generated within the system. The discrete spurious of 7 kHz is due to the
nonlinearity of the phase detector. Its value needs to be corrected by 20.58 dB to a lesser value
because of the bandwidth of the FFT analyzer.
phase noise. Therefore, a comb generator is needed that 5. F. Gardner, Phaselock Techniques, 2nd ed., Wiley-Inter-
would take the output of the oscillator and multiply this science, New York, 1979.
up 10–20 times. 6. J. Gorski-Popiel, Frequency Synthesis: Techniques and
Figure 61 shows a simulated phase noise and termina- Applications, IEEE, New York, 1975.
tion of spurious outputs for the fraction-N-division syn- 7. V. F. Kroupa, Frequency Synthesis: Theory, Design, and
thesizer with converter. At the moment, it is unclear Applications, Wiley, New York, 1973.
if the PLL with DDS or the fractional-N-division synthe-
8. V. Manassewitsch, Frequency Synthesizers: Theory and
sizer principle with converter is the winning approach.
Design, Wiley, New York, 1976.
The DDS typically requires two or three loops and is
much more expensive, while the fractional-N approach 9. R. E. Best, Phase-Locked Loops: Theory, Design, and Appli-
requires only one loop and is a very intelligent spuri- cations, McGraw-Hill, New York, 1989.
ous removal circuit, and high-end solutions are typically 10. P. Danzer, ed., The ARRL Handbook for Radio Amateurs,
custom-built [96–111]. 75th ed., ARRL, Newington, 1997, Chap. 14, AC/RF sources
(Oscillators and Synthesizers).
BIBLIOGRAPHY 11. C. R. Chang and U. L. Rohde, The accurate simulation of
oscillator and PLL phase noise in RF sources, Wireless ’94
1. U. L. Rohde and J. Whitaker, Communications Receivers, Symp., Santa Clara, CA Feb. 15–18, 1994.
3rd ed., McGraw-Hill, New York, Dec. 2000. 12. M. M. Driscoll, Low noise oscillator design using acoustic
2. L. E. Larson, RF and Microwave Circuit Design for Wireless and other high Q resonators, 44th Annual Symp. Frequency
Communications, Artech House, Norwood, MA, 1996. Control, Baltimore, MD, May 1990.
3. U. L. Rohde, Microwave and Wireless Synthesizers: Theory 13. U. L. Rohde, Low-noise frequency synthesizers fractional N
and Design, Wiley, New York, Aug. 1997. phase-locked loops, Proc. SOUTHCON/81, Jan. 1981.
4. W. F. Egan, Frequency Synthesis by Phase Lock, Wiley- 14. W. C. Lindsey and C. M. Chie, eds., Phase-Locked Loops,
Interscience, New York, 1981. IEEE Press, New York, 1986.
FREQUENCY SYNTHESIZERS 863
15. U.S. Patent 4,492,936 (Jan. 8, 1985), A. Albarello, A. Roullet, 36. U.S. Patent 4,918,403 (April 17, 1990), F. L. Martin, Fre-
and A. Pimentel, Fractional-division frequency synthesizer quency synthesizer with spur compensation (Motorola, Inc.,
for digital angle-modulation (Thomson-CSF). Schaumburg, IL).
16. U.S. Patent 4,686,488 (Aug. 11, 1987), C. Attenborough, 37. U.S. Patent 4,816,774 (March 28, 1989), F. L. Martin,
Fractional-N frequency synthesizer with modulation com- Frequency synthesizer with spur compensation (Motorola).
pensation, (Plessey Overseas Ltd., Ilford, UK).
38. U.S. Patent 4,599,579 (July 8, 1986), K. D. McCann, Fre-
17. U.S. Patent 3,913,928 (Oct., 1975), R. J. Bosselaers, PLL quency synthesizer having jitter compensation (U.S. Philips
including an arithmetic unit. Corp., New York).
18. U.S. Patent 2,976,945, R. G. Cox, Frequency synthesizer 39. U.S. Patent 5,038,117 (Aug. 6, 1991), B. M. Miller, Multiple-
(Hewlett-Packard). modulator fractional-N divider (Hewlett-Packard).
19. U.S. Patent 4,586,005 (April 29, 1986), J. A. Crawford, 40. U.S. Patent 4,206,425 (June 3, 1980), E. J. Nossen, Digitized
Enhanced analog phase interpolation for fractional-N frequency synthesizer (RCA Corp., New York).
frequency synthesis (Hughes Aircraft Co., Los Angeles).
41. O. Peña, SPICE tools provide behavioral modeling of PLLs,
20. U.S. Patent 4,468,632 (Aug. 28, 1984), A. T. Crowley, PLL Microwaves RF Mag. 71–80 (Nov. 1997).
frequency synthesizer including fractional digital frequency
divider (RCA Corp., New York). 42. U.S. Patent 4,815,018 (March 21, 1989), V. S. Reinhardt
and I. Shahriary, Spurless fractional divider direct digital
21. U.S. Patent 4,763,083 (Aug. 9, 1988), A. P. Edwards, Low frequency synthesizer and method (Hughes Aircraft Co., Los
phase noise RF synthesizer (Hewlett-Packard), Palo Alto, Angeles).
CA.
43. U.S. Patent 4,458,329 (July 3, 1984), J. Remy, Frequency
22. U.S. Patent 4,752,902 (June 21, 1988), B. G. Goldberg, synthesizer including a fractional multiplier, (Adret Elec-
Digital frequency synthesizer (Sciteq Electronics, Inc., San tronique, Paris).
Diego, CA).
44. U.S. Patent 4,965,531 (Oct. 23, 1990), T. A. D. Riley, Fre-
23. U.S. Patent 4,958,310 (Sept. 18, 1990), B. G. Goldberg, quency synthesizers having dividing ratio controlled sigma-
Digital frequency synthesizer having multiple processing delta modulator (Carleton Univ., Ottawa, Canada).
paths.
45. U.S. Patent 5,021,754 (June 4, 1991), W. P. Shepherd,
24. U.S. Patent 5,224,132 (June 29, 1993), B. G. Goldberg,
D. E. Davis, and W. F. Tay, Fractional-N synthesizer having
Programmable fractional-N frequency synthesizer (Sciteq
modulation spur compensation (Motorola, Inc., Schaumburg,
Electronics, Inc., San Diego, CA).
IL).
25. B. G. Goldberg, Analog and digital fractional-n PLL fre-
46. U.S. Patent 3,959,737 (May 25, 1976), W. J. Tanis, Fre-
quency synthesis: A survey and update, Appl. Microwave
quency synthesizer having fractional-N frequency divider
Wireless 32–42 (June 1999).
in PLL (Engelman Microwave).
26. U.S. Patent 3,882,403, W. G. Greken, Digital frequency
47. Eur. Patent 0125790B2 (July 5, 1995), J. N. Wells, Fre-
synthesizer (General Dynamics).
quency synthesizers (Marconi Instruments, St. Albans,
27. U.S. Patent 5,093,632 (March 3, 1992), A. W. Hietala and Hertfordshire, UK).
D. C. Rabe, Latched accumulator fractional N synthesis
48. U.S. Patent 4,609,881 (Sept. 2, 1986), J. N. Wells, Frequency
with residual error reduction (Motorola, Inc., schaumburg,
IL). synthesizers (Marconi Instruments Ltd., St. Albans, UK).
28. U.S. Patent 3,734,269 (May, 1973), L. Jackson, Digital 49. U.S. Patent 4,410,954 (Oct. 18, 1983), C. E. Wheatley, III,
frequency synthesizer. Digital frequency synthesizer with random jittering for
reducing discrete spectral spurs (Rockwell International
29. U.S. Patent 4,758,802 (July 19, 1988), T. Jackson, Frac- Corp., El Segundo, CA).
tional-N synthesizer (Plessey Overseas Ltd., Ilford, UK).
50. U.S. Patent 5,038,120 (Aug. 6, 1991), M. A. Wheatley,
30. U.S. Patent 4,800,342 (Jan. 24, 1989), T. Jackson, Fre- L. A. Lepper, and N. K. Webb, Frequency modulated phase
quency synthesizer of the fractional type (Plessey Overseas locked loop with fractional divider and jitter compensation
Ltd., Ilford, UK). (Racal-Dana Instruments Ltd., Berkshire, UK).
31. Eur. Patent 0214217B1 (June 6, 1996), T. Jackson, Improve- 51. U.S. Patent 4,573,176 (Feb. 25, 1986), R. O. Yaeger, Frac-
ment in or relating to synthesizers (Plessey Overseas Ltd., tional frequency divider, (RCA Corp., Princeton, NJ).
Ilford, Essex, UK).
52. U. L. Rohde, A high performance synthesizer for base
32. Eur. Patent WO86/05046 (Aug. 28, 1996), T. Jackson,
stations based on the fractional-N synthesis principle,
Improvement in or relating to synthesizers (Plessey
Microwaves RF Mag. (April 1998).
Overseas Ltd., Ilford, Essex, UK).
53. U. L. Rohde and G. Klage, Analyze VCOs and fractional-N
33. U.S. Patent 4,204,174 (May 20, 1980), N. J. R. King, Phase
synthesizers, Microwaves RF Mag. (Aug. 2000).
locked loop variable frequency generator (Racal Communi-
cations Equipment Ltd., England). 54. R. Hassun and A. Kovalic, An arbitrary waveform synthe-
sizer for DC to 50 MHz, Hewlett-Packard J. (Palo Alto, CA)
34. U.S. Patent 4,179,670 (Dec. 18, 1979), N. G. Kingsbury,
(April 1988).
Frequency synthesizer with fractional division ratio and
jitter compensation (Marconi Co. Ltd., Chelmsford, UK). 55. L. R. Rabiner and B. Gold, Theory and Application of Digital
Signal Processing, Prentice-Hall, Englewood Cliffs, NJ,
35. U.S. Patent 3,928,813 (Dec. 23, 1975), C. A. Kingsford-
1975, Chap. 2.
Smith, Device for synthesizing frequencies which are
rational multiples of a fundamental frequency (Hewlett- 56. H. T. Nicholas and H. Samueli, An analysis of the output
Packard, Palo Alto, CA). spectrum of direct digital frequency synthesizers in the
864 FREQUENCY SYNTHESIZERS
presence of phase-accumulator truncation, 41st Annual 80. P. Braun, B. Roth, and A. Beyer, A measurement setup
Frequency Control Symp., IEEE Press, New York, 1987. for the analysis of integrated microwave oscillators, 2nd
Int. Workshop of Integrated Nonlinear Microwave and
57. L. B. Jackson, Roundoff noise for fixed point digital filters
Millimeterwave Circuits, 1992.
realized in cascade or parallel form, IEEE Trans. Audio
Electroacous. AU-18: 107–122 (June 1970). 81. C. R. Chang et al., Computer-aided analysis of free-running
microwave oscillators, IEEE Trans. Microwave Theory Tech.
58. Technical Staff of Bell Laboratories, Transmission Systems
39: 1735–1745 (Oct. 1991).
for Communication, Bell Labs, Inc., 1970, Chap. 25.
82. P. Davis et al., Silicon-on-silicon integration of a GSM
59. U.S. Patent 4,482,974, A. Kovalick, Apparatus and method
transceiver with VCO resonator, 1998 IEEE Int. Solid-State
of phase to amplitude conversion in a SIN function
Circuits Conf. Digest of Technical Papers, pp. 248–249.
generator.
83. P. J. Garner, M. H. Howes, and C. M. Snowden, Ka-band
60. C. J. Paull and W. A. Evans, Waveform shaping techniques
and MMIC pHEMT-basd VCO’s with low phase-noise
for the design of signal sources, Radio Electron. Eng. 44(10):
properties, IEEE Trans. Microwave Theory Tech. 46:
(Oct. 1974).
1531–1536 (Oct. 1998).
61. L. Barnes, Linear-segment approximations to a sinewave,
84. A. V. Grebennikov and V. V. Nikiforov, An analytic method
Electron. Eng. 40: (Sept. 1968).
of microwave transistor oscillator design, Int. J. Electron.
62. U.S. Patent 4,454,486, R. Hassun and A. Kovalic, Waveform 83: 849–858 (Dec. 1997).
synthesis using multiplexed parallel synthesizers.
85. A. Hajimiri and T. H. Lee, A general theory of phase noise
63. D. K. Kikuchi, R. F. Miranda, and P. A. Thysel, A waveform in electrical oscillators, IEEE J. Solid-State Circuits 33:
generation language for arbitrary waveform synthesis, 179–194 (Feb. 1998).
Hewlett-Packard J. (Palo Alto, CA) (April 1988).
86. Q. Huang, On the exact design of RF oscillators, Proc. IEEE
64. H. M. Stark, An Introduction to Number Theory, MIT Press, 1998 Custom Integrated Circuits Conf., pp. 41–44.
Cambridge, MA, 1978, Chap. 7. 87. Fairchild Data Sheet, Phase/Frequency Detector, 11C44,
65. W. Sagun, Generate complex waveforms at very high Fairchild Semiconductor, Mountain View, CA.
frequencies, Electron. Design (Jan. 26 1989). 88. Fairchild Preliminary Data Sheet, SH8096 Programmable
66. G. Lowitz and R. Armitano, Predistortion improves digital Divider-Fairchild Integrated Microsystems, April 1970.
synthesizer accuracy, Electron. Design (March 31 1988). 89. U. L. Rohde, Digital PLL Frequency Synthesizers — Theory
67. A. Kovalic, Digital synthesizer aids in testing of complex and Design, Prentice-Hall, Englewood Cliffs, NJ, Jan. 1983.
waveforms, EDN Mag. (Sept. 1 1988). 90. W. Egan and E. Clark, Test your charge-pump phase
68. G. Lowitz and C. Pederson, RF testing with complex detectors, Electron. Design 26(12): 134–137 (June 7 1978).
waveforms, RF Design (Nov. 1988). 91. S. Krishnan, Diode phase detectors, Electron. Radio Eng.
69. C. M. Merigold, in Kamilo Fehrer, ed., Telecommunications 45–50 (Feb. 1959).
Measurement Analysis and Instrumentation, Prentice-Hall, 92. Motorola Data Sheet, MC12012, Motorola Semiconductor
Englewood Cliffs, NJ, 1987. Products, Inc. Phoenix, AZ, 1973.
70. Synergy Microwave Corporation Designer Handbook, Spec- 93. Motorola Data Sheet, Phase-Frequency Detector, MC4344,
ifications of Synthesizers, 2001. MC4044.
71. G. Vendelin, A. M. Pavio, and U. L. Rohde, Microwave 94. U. L. Rohde and D. P. Newkirk, RF/Microwave Circuit
Circuit Design Using Linear and Nonlinear Techniques, Design for Wireless Applications, Wiley, 2000.
Wiley, New York, 1990.
95. W. C. Lindsey and M. K. Simon, eds., Phase-Locked Loops &
72. J. A. Crawford, Frequency Synthesizer Design Handbook, Their Application, IEEE Press, New York, 1978.
Artech House, Norwood, MA, 1994.
96. W. Z. Chen and J. T. Wu, A 2 V 1.6 GHz BJT phase-locked
73. D. B. Leeson, Short-term stable microwave sources, loop, Proc. IEEE 1998 Custom Integrated Circuits Conf.,
Microwave J. 59–69 (June 1970). pp. 563–566.
74. J. Smith, Modern Communication Circuits, McGraw-Hill, 97. J. Craninckx and M. Steyaert, A fully integrated CMOS
New York, 1986, pp. 252–261. DCS-1800 frequency synthesizer, 1998 IEEE Int. Solid-State
75. J. S. Yuan, Modeling the bipolar oscillator phase noise, Solid Circuits Conf. Digest of Technical Papers, pp. 372–373.
State Electron. 37(10): 1765–1767 (Oct. 1994). 98. B. De Smedt and G. Gielen, Nonlinear behavioral modeling
76. D. Scherer, Design principles and test methods for low and phase noise evaluation in phase locked loops, Proc. IEEE
phase noise RF and microwave sources, RF & Microwave 1998 Custom Integrated Circuits Conf., pp. 53–56.
Measurement Symp. Exhibition, Hewlett-Packard. 99. N. M. Filiol et al., An agile ISM band frequency synthesizer
77. S. Alechno, Analysis method characterizes microwave oscil- with built-in GMSK data modulation, IEEE J. Solid-State
lators, Microwaves RF Mag. 82–86 (Nov. 1997). Circuits 33(7): 998–1007 (July 1998).
78. W. Anzill, F. X. Kärtner, and P. Russer, Simulation of the 100. Fujitsu Microelectronics, Inc., Super PLL Application Guide,
single-sideband phase noise of oscillators, 2nd Int. Workshop 1998.
of Integrated Nonlinear Microwave and Millimeterwave 101. Hewlett-Packard Application Note 164-3, New Technique for
Circuits, 1992. Analyzing Phase-Locked Loops, June 1975.
79. N. Boutin, RF oscillator analysis and design by the loop gain 102. V. F. Kroupa, ed., Direct Digital Frequency Synthesizers,
method, Appl. Microwave Wireless 32–48 (Aug. 1999). IEEE Press, New York, 1999.
FREQUENCY SYNTHESIZERS 865
103. S. Lo, C. Olgaard, and D. Rose, A 1.8V/3.5 mA 1.1 GHz/ 107. M. Smith, Phase Noise Measurement Using the Phase Lock
300 MHz CMOS dual PLL frequency synthesizer IC for Technique, Wireless Subscriber Systems Group (WSSG)
RF communications, Proc. IEEE 1998 Custom Integrated AN1639, Motorola.
Circuits Conf., pp. 571–574. 108. V. von Kaenel et al., A 600 MHz CMOS PLL microprocessor
104. G. Palmisano et al., Noise in fully-integrated PLLs, Proc. clock generator with a 1.2 GHz VCO, 1998 IEEE Int. Solid-
6th AACD 97, Como, Italy, 1997. State Circuits Conf. Digest of Technical Papers, pp. 396–397.
105. B. H. Park and P. E. Allen, A 1 GHz, low-phase-noise CMOS 109. Motorola MECL Data Book, Chapter 6, Phase-Locked Loops,
frequency synthesizer with integrated LC VCO for wireless 1993.
communications, Proc. IEEE 1998 Custom Integrated 110. U. L. Rohde, Low noise microwave synthesizers, WFFDS:
Circuits Conf., pp. 567–570. Advances in microwave and millimeter-wave synthesizer
106. B. Sam, Hybrid frequency synthesizer combines octave technology, IEEE-MTT-Symp., Orlando, FL, May 19, 1995.
tuning range and millihertz steps, Appl. Microwave Wireless 111. U. L. Rohde, Oscillator design for lowest phase noise,
76–84 (May 1999). Microwave Eng. Eur. 35–40 (May 1994).
G
GENERAL PACKET RADIO SERVICE (GPRS) data rates up to ∼50 kbps and session establishment
times below one second. Furthermore, GPRS supports
CHRISTIAN BETTSTETTER a more user-friendly billing than that offered by circuit-
CHRISTIAN HARTMANN switched data services. In circuit-switched services, billing
Technische Universität is based on the duration of the connection. This is
München unsuitable for applications with bursty traffic, since
Institute of Communication the user must pay for the entire airtime even for
Networks idle periods when no packets are sent (e.g., when the
Munich, Germany user reads a Webpage). In contrast to this, packet-
switched services allow charging based on the amount
of transmitted data (e.g., in kilobytes) and the quality of
1. INTRODUCTION
service (QoS) rather than connection time. The advantage
for the user is that he/she can be online over a long
The General Packet Radio Service (GPRS) is a data period of time (‘‘always online’’) but will be billed
bearer service for GSM and IS136 cellular networks.
only when data are actually transmitted. The network
Its packet-oriented radio transmission technology enables
operators can utilize their radio resources in a more
efficient and simplified wireless access to Internet
efficient way and simplify the access to external data
protocol-based networks and services. With GPRS-enabled
networks.
mobile devices, users benefit from higher data rates [up
Typical scenarios for GPRS are the wireless access
to ∼50 kbps (kilobits per second)], shorter access times,
to the World Wide Web (WWW) and corporate local-
an ‘‘always on’’ wireless connectivity, and volume-based
area networks. Here, GPRS provides an end-to-end IP
billing.
connectivity; that is, users can access the Internet
In conventional GSM networks without GPRS, access
without first requiring to dial in to an Internet service
to external data networks from GSM mobile devices has
provider. Examples for applications that can be offered
already been standardized in GSM phase 2; however,
over GPRS are mobile e-commerce services, location-
on the air interface, such access occupied a complete
based tourist guides, and applications in the telemetry
circuit-switched traffic channel for the entire call period.
field. GPRS can also be used as a bearer for the Wireless
In case of bursty traffic (e.g., Internet traffic), it is obvious
Application Protocol (WAP) and the Short Message Service
that packet-switched bearer services result in a much
(SMS).
better utilization of the traffic channels. A packet channel
Considering the network evolution of second generation
will be allocated only when needed and will be released
cellular networks to third generation, GPRS enables a
after the transmission of the packets. With this principle,
smooth transition path from GSM/TDMA networks toward
multiple users can share one physical channel (statistical
the Universal Mobile Telecommunication System (UMTS).
multiplexing). Moreover, in conventional GSM exactly one
Especially, the IP backbone network of GPRS forms the
channel is assigned for both uplink and downlink. This
basis for the UMTS core network. With the introduction of
entails two disadvantages: (1) only symmetric connections
Enhanced Data Rates for GSM Evolution (EDGE), which
are supported and (2) the maximum data rate is restricted
will use 8-PSK modulation, GPRS will offer approximately
since the use of multiple channels in parallel is not
a 3 times higher data rate and a higher spectral efficiency.
possible. In order to address the inefficiencies of GSM
This mode will be called Enhanced GPRS (EGPRS).
phase 2 data services, the General Packet Radio Service
has been developed in GSM phase 2+ by the European
Telecommunications Standards Institute (ETSI). It offers
a genuine packet-switched transmission technology also 2. SYSTEM ARCHITECTURE
over the air interface, and provides access to networks
based on the Internet Protocol (IP) (e.g., the global To incorporate GPRS into existing GSM networks, several
Internet or private/corporate intranets) and X.25. GPRS modifications and enhancements have been made in the
enables asymmetric data rates as well as simultaneous GSM network infrastructure as well as in the mobile
transmission on several channels (multislot operation). stations. On the network side, a new class of nodes has
Initial work on the GPRS standardization began in been introduced, namely, the GPRS support nodes (GSNs).
1994, and the main set of specifications was approved They are responsible for routing and delivery of data
in 1997 and completed in 1999. Market introduction took packets between the mobile stations and external packet
place in the year 2000. GPRS was also standardized for data networks. Figure 1 shows the resulting GSM/GPRS
IS136 [1]; however, in this description we focus on GPRS system architecture [2,3]. A mobile user carries a mobile
for GSM. station (MS), which can communicate over a wireless link
Data transmission in conventional circuit switched with a base transceiver station (BTS). Several BTSs are
GSM is restricted to 14.4 kbps, and the connection setup controlled by a base station controller (BSC). The BTSs
takes several seconds. GPRS offers almost ISDN-like and BSC together form a base station subsystem (BSS).
866
GENERAL PACKET RADIO SERVICE (GPRS) 867
SMS-GMSC
Gd
Gp GGSN
SGSN
Gb Gf Gn
BTS External packet
BSC Gc GGSN data network
Gs Gr Gi
(e.g. Internet)
BTS
MS EIR
D
MSC/VLR HLR
SGSN Serving GPRS support node User data and signaling data
GGSN Gateway GPRS support node Signaling data
For an detailed explanation of the GSM elements and the GGSN are located in the same network, whereas the Gp
basic addresses, see the → GSM entry. interface will be used, if they are in different networks.
A serving GPRS support node (SGSN) delivers packets These interfaces are also defined between two SGSNs.
from and to the MSs within its service area. Its This allows the SGSNs to exchange user profiles when a
tasks include packet routing and transfer, functions for mobile station moves from one SGSN area to another. The
attach/detach of MSs and their authentication, mobility Gi interface connects the GGSN with external networks.
management, radio resource management, and logical From an external network’s point of view, the GGSN looks
link management. The location register of the SGSN stores like a usual IP router, and the GPRS network looks like
location information of all GPRS users registered with this any other IP subnetwork.
SGSN [e.g., current cell, current visitor location register Figure 1 also shows the signaling interfaces between
(VLR)] and their user profiles [e.g., international mobile the GSNs and the conventional GSM network entities [2].
subscriber identity (IMSI), address used in the packet data Across the Gf interface, the SGSN may query and verify
network]. the international mobile equipment identity (IMEI) of a
A gateway GPRS support node (GGSN) acts as an mobile station trying to attach to the network. GPRS
interface to external packet data networks (e.g., to the also adds some entries to the GSM registers. For mobility
Internet). It converts GPRS packets coming from the management, a user’s entry in the home location register
SGSN into the appropriate packet data protocol (PDP)
(HLR) is extended with a link to its current SGSN.
format (i.e., IP) and sends them out on the corresponding
Moreover, his/her GPRS-specific profile and current PDP
external network. In the other direction, the PDP address
address(es) are stored. The Gr interface is used to exchange
of incoming data packets (e.g., the IP destination address)
this information between HLR and SGSN. For example,
is mapped to the GSM address of the destination user. The
the SGSN informs the HLR about the current location
readdressed packets are sent to the responsible SGSN.
of the mobile station, and when a mobile station registers
For this purpose, the GGSN stores the current SGSN
addresses and profiles of registered users in its location with a new SGSN, the HLR will send the user profile to the
register. new SGSN. In a similar manner, the signaling interface
The functionality of the SGSN and GGSN can be between GGSN and HLR (Gc interface) may be used by
implemented in a single physical unit or in separate units. the GGSN to query the location and profile of a user
All GSNs are connected via an IP-based backbone network. who is unknown to the GGSN. In addition, the MSC/VLR
Within this GPRS backbone, the GSNs encapsulate the may be extended with functions and register entries that
external packets and transmit (tunnel) them using the allow efficient coordination between packet-switched and
so-called GPRS tunneling protocol (GTP). If there is a conventional circuit-switched GSM services. Examples
roaming agreement between two operators, they may for this optional feature are combined GPRS and GSM
install an inter-operator backbone between their networks location updates and combined attachment procedures.
(see Fig. 2). The border gateways (BGs) perform security Moreover, paging requests of circuit-switched GSM calls
functions in order to protect the private GPRS networks can be performed via the SGSN. For this purpose, the Gs
against attacks and unauthorized users. interface connects the registers of SGSN and MSC/VLR.
Via the Gn and the Gp interfaces (see Figs. 1 and 2), In order to exchange messages of the Short Message
user payload and signaling data are transmitted between Service via GPRS, the Gd interface interconnects the SMS
the GSNs. The Gn interface will be used, if SGSN and gateway MSC (SMS-GMSC) with the SGSN.
868 GENERAL PACKET RADIO SERVICE (GPRS)
BSC
MS BTS
BTS BSC
Inter-operator
GPRS backbone
Gn SGSN SGSN
GPRS backbone Gp
Border Border GPRS backbone
gateway gateway
Gn
GPRS net-
Gi work 2
GPRS net-
work 1 GGSN GGSN
Packet data network
Internet
Router Server
Figure 2. GPRS system architecture
and routing example. LAN
3. SERVICES and the QoS profile. The GPRS standard [4] defines four
QoS parameters: service precedence, reliability, delay, and
The bearer services of GPRS offer end-to-end packet- throughput. Using these parameters, QoS profiles can
switched data transfer to mobile subscribers. A point- be negotiated between the mobile user and the network
to-point (PTP) service is specified, which comes in two for each session, depending on the QoS demand and the
variants [4]: a connectionless mode for IP and a connection- currently available resources.
oriented mode for X.25. Furthermore, SMS messages can The service precedence is the priority of a service (in
be sent and received over GPRS. relation to other services). There are three levels of priority
It is planned to implement a point-to-multipoint defined in GPRS. In case of heavy traffic load, for example,
(PTM) service, which offers transfer of data packets to packets of low priority will be discarded first.
several mobile stations. For example, IP multicast routing The reliability indicates the transmission characteris-
protocols can be employed over GPRS for this purpose [5]. tics required by an application. Three reliability classes
Packets addressed to an IP multicast group will then be are defined, which guarantee certain maximum values for
routed to all group members. Also supplementary services the probability of packet loss, packet duplication, misse-
can be implemented, such as reverse charging and barring. quencing, and packet corruption (i.e., undetected errors in
Based on these standardized bearer and supplementary a packet); see Table 1.
services, a huge variety of nonstandardized services can As shown in Table 2, the delay parameters define
be offered over GPRS. The most important application maximum values for the mean delay and the 95-percentile
scenario is the wireless access to the Internet and to delay. The latter is the maximum delay guaranteed in 95%
corporate intranets (for e-mail communication, database of all transfers. Here, delay is defined as the end-to-end
access, Web browsing). Also WAP services can be accessed transfer time between two communicating mobile stations
in a more efficient manner than with circuit-switched or between a mobile station and the Gi interface to an
GSM. Especially mobility-related services that require external network, respectively. This includes all delays
an ‘‘always on’’ connectivity but only infrequent data within the GPRS network, such as the delay for request
transmissions (e.g., interactive city guides) benefit from and assignment of radio resources, transmission over the
the packet-oriented transmission and billing method of air interface, and the transit delay in the GPRS backbone
GPRS. network. Delays outside the GPRS network, for example
in external transit networks, are not taken into account.
3.1. Quality of Service (QoS)
Table 1. QoS Reliability Classes
Support of different QoS classes is an important feature
in order to support a broad variety of applications but Class Loss Duplication Missequencing Corruption
still preserve radio and network resources in an efficient
1 10−9 10−9 10−9 10−9
way. Moreover, QoS classes enable providers to offer 2 10−4 10−5 10−5 10−6
different billing options. The billing can be based on 3 10−2 10−5 10−5 10−2
the amount of transmitted data, the service type itself,
GENERAL PACKET RADIO SERVICE (GPRS) 869
Table 2. QoS Delay Classes (in seconds) network. In general, this address is called PDP address
128 Byte 1024 Byte (packet data protocol address). In case the external
network is an IP network, the PDP address is an IP
Class Mean 95% Mean 95%
address.
1 <0.5 <1.5 <2 <7 For each session, a so-called PDP context is created [2],
2 <5 <25 <15 <75 which describes the characteristics of the session. It
3 <50 <250 <75 <375 includes the PDP type (e.g., IPv6), the PDP address
4 Best effort assigned to the mobile station (e.g., an IP address), the
requested QoS class, and the address of a GGSN that
serves as the access point to the external network. This
Finally, the throughput parameter specifies the maxi- context is stored in the MS, the SGSN, and the GGSN.
mum and mean bit rate. Once a mobile station has an active PDP context, it
is visible for the external network and can send and
3.2. Simultaneous Usage of Packet-Switched and receive data packets. The mapping between the two
Circuit-Switched Services addresses (PDP ↔ GSM address) makes the transfer
of packets between MS and GGSN possible. In the
GPRS services can be used in parallel to circuit-switched
following we assume that access to an IP-based network
GSM services. The GPRS standard defines three classes of
is intended.
mobile stations [4]. Mobile stations of class A fully support
The allocation of an IP address can be static or dynamic.
simultaneous operation of GPRS and conventional GSM
In the first case, the mobile station permanently owns an
services. Class B mobile stations are able to register with
IP address. In the second case, using a dynamic addressing
the network for both GPRS and conventional GSM services
concept, an IP address is assigned on activation of a
simultaneously and listen to both types of signaling
PDP context. In other words, the network provider has
messages, but they can use only one of the service types at
reserved a certain number of IP addresses, and each time
a given time. Finally, class C mobile stations can attach for
a mobile station attaches to the GPRS network, it will
either GPRS or conventional GSM services at a given time.
obtain an IP address. After its GPRS detach, this IP
Simultaneous registration (and usage) is not possible,
address will be available to other users again. The IP
except for SMS messages, which can be received and sent
address can be assigned either by the user’s home network
at any time.
operator or by the operator of the visited network. The
GGSN is responsible for the allocation and deactivation
4. SESSION MANAGEMENT, MOBILITY MANAGEMENT, of addresses. Thus, the GGSN should also include DHCP
AND ROUTING (Dynamic Host Configuration Protocol [6]) functionality,
which automatically manages the available IP address
In this section we describe how a mobile station registers space.
with the GPRS network and becomes known to an external A basic PDP context activation procedure initialized
packet data network. We show how packets are routed to by an MS is as follows [2]: Using the message ACTIVATE
or from mobile stations, and how the network keeps track PDP CONTEXT REQUEST, the MS informs the SGSN about
of the user’s current location [2]. the requested PDP context. Afterward, the usual GSM
security functions are performed. If access is granted, the
4.1. Attachment and Detachment Procedure
SGSN will send a CREATE PDP CONTEXT REQUEST to the affected
Before a mobile station can use GPRS services, it must GGSN. The GGSN creates a new entry in its PDP context
attach to the network (similar to the IMSI attach used table, which enables the GGSN to route data packets
for circuit-switched GSM services). The mobile station’s between the SGSN and the external network. It confirms
ATTACH REQUEST message is sent to the SGSN. The network this to the SGSN and transmits the dynamic PDP address
then checks if the user is authorized, copies the user (if needed). Finally, the SGSN updates its PDP context
profile from the HLR to the SGSN, and assigns a packet table and confirms the activation of the new PDP context
temporary mobile subscriber identity (P-TMSI) to the user. to the MS.
This procedure is called GPRS attach. It establishes a In case the GGSN receives packets from the external
logical link between the mobile station and the SGSN, network that are addressed to a known static PDP address
such that the SGSN can perform paging of the mobile of an MS, it can perform a network-initiated PDP context
station and deliver SMS messages. For mobile stations activation procedure.
using circuit- and packet-switched services, it is possible
to implement combined GPRS/IMSI attach procedures. 4.3. Routing
The disconnection from the GPRS network is called GPRS In Fig. 2 we give an example of how IP packets are
detach. It can be initiated by the mobile station or by the routed to an external IP-based data network. A GPRS
network. mobile station located in the GPRS network 1 addresses
IP packets to a Web server connected to the Internet.
4.2. Session Management and PDP Context
The SGSN to which the mobile station is attached
To exchange data packets with external packet data encapsulates the IP packets coming from the mobile
networks after a successful GPRS attach, a mobile station station, examines the PDP context, and routes them
must apply for an address to be used in the external through the GPRS backbone to the appropriate GGSN.
870 GENERAL PACKET RADIO SERVICE (GPRS)
The GGSN decapsulates the IP packets and sends them the routing context does not change, there is no need to
out on the IP network, where IP routing mechanisms inform other network elements, such as GGSN or HLR.
transfer the packets to the access router of the destination In the second case, the new routing area is administered
network. The latter delivers the IP packets to the Web by a SGSN different from the old routing area. The new
server. SGSN realizes that the MS has entered its area. It requests
In the other direction, the Web server addresses its the PDP context(s) of the user from the old SGSN and
IP packets to the mobile station. They are routed to the informs the involved GGSNs about the user’s new routing
GGSN from which the mobile station has its IP address context. In addition, the HLR and (if needed) the MSC/VLR
(e.g., its home GGSN). The GGSN queries the HLR and are informed about the user’s new SGSN number.
obtains information about the current location of the user. Besides pure routing updates, there also exist combined
In the following, it encapsulates the incoming IP packets routing/location area updates. They are performed when-
and tunnels them to the appropriate SGSN in the current ever an MS using GPRS as well as conventional GSM
network of the user. The SGSN decapsulates the packets services moves to a new location area.
and delivers them to the mobile station. To sum up, we can say that GPRS mobility management
consists of two levels: (1) micromobility management
tracks the current routing area or cell of the user and
4.4. Location Management
(2) macromobility management keeps track of the user’s
As in circuit-switched GSM, the main task of location current SGSN and stores it in the HLR, VLR, and GGSN.
management is to keep track of the user’s current location,
so that incoming packets can be routed to his/her MS. For 5. PROTOCOL ARCHITECTURE
this purpose, the MS frequently sends location update
messages to its SGSN.
The protocol architecture of GPRS comprises transmission
In order to use the radio resources occupied for mobility-
and signaling protocols. This includes standard GSM
related signaling traffic in an efficient way, a state
protocols (with slight modifications), standard protocols
model for GPRS mobile stations with three states has
of the Internet Protocol suite, and protocols that have
been defined [2]. In IDLE state the MS is not reachable.
specifically been developed for GPRS. Figure 3 illustrates
Performing a GPRS attach, it turns into READY state.
the protocol architecture of the transmission plane [2].
With a GPRS detach it may deregister from the network
The architecture of the signaling plane includes functions
and fall back to IDLE state, and all PDP contexts will be
for the execution of GPRS attach and detach, mobility
deleted. The STANDBY state will be reached when an MS in
management, PDP context activation, and the allocation
READY state does not send any packets for a long period of
of network resources.
time, and therefore the READY timer (which was started at
GPRS attach and is reset for each incoming and outgoing
5.1. GPRS Backbone: SGSN-GGSN
transmission) expires.
The location update frequency depends on the state As mentioned earlier, the GPRS tunneling protocol (GTP)
in which the MS currently is. In IDLE state, no location carries the user’s IP or X.25 packets in an encapsulated
updating is performed; that is, the current location of the manner within the GPRS backbone (see Fig. 3). GTP is
MS is unknown. If an MS is in READY state, it will inform defined both between GSNs within the same network (Gn
its SGSN of every movement to a new cell. For the location interface) and between GSNs of different networks (Gp
management of an MS in STANDBY state, a GSM location interface).
area (→ see GSM entry) is divided into so-called routing The signaling part of GTP specifies a tunneling control
areas. In general, a routing area consists of several cells. and management protocol. The signaling is used to
The SGSN will be informed when an MS moves to a new create, modify, and delete tunnels. A tunnel identifier
routing area; cell changes will not be indicated. In addition (TID), which is composed of the IMSI of the user and
to these event-triggered routing area updates, periodic a network-layer service access point identifier (NSAPI),
routing area updating is also standardized. To determine uniquely indicates a PDP context. Below GTP, one of
the current cell of an MS that is in STANDBY state, paging of the standard Internet protocols of the transport layer,
the MS within a certain routing area must be performed. Transmission Control Protocol (TCP) or User Datagram
For MSs in READY state, no paging is necessary. Protocol (UDP), are employed to transport the GTP packets
Whenever a mobile user moves to a new routing area, within the backbone network. TCP is used for X.25 (since
it sends a ROUTING AREA UPDATE REQUEST to its assigned X.25 expects a reliable end-to-end connection), and UDP
SGSN [2]. The message contains the routing area identity is used for access to IP-based networks (which do not
(RAI) of its old routing area. The BSS adds the cell expect reliability in the network layer or below). In the
identifier (CI) of the new cell to the request, from which network layer, IP is employed to route the packets through
the SGSN can derive the new RAI. Two different scenarios the backbone. Ethernet, ISDN, or ATM-based protocols
are possible: intra-SGSN routing area updates and inter- may be used below IP. To summarize, in the GPRS
SGSN routing area updates. backbone we have an IP/X.25-over-GTP-over-UDP/TCP-
In the first case, the mobile user has moved to a routing over-IP protocol architecture.
area that is assigned to the same SGSN as the old routing For signaling between SGSN and the registers HLR,
area. The SGSN has already stored the necessary user VLR, and EIR, protocols known from conventional GSM
profile and can immediately assign a new P-TMSI. Since are employed, which have been partly extended with
GENERAL PACKET RADIO SERVICE (GPRS) 871
Application
Network Network
layer layer
(IP, X.25) (IP. X.25)
Relay
SNDCP SNDCP GTP GTP
LLC LLC
TCP/UDP TCP/UDP
Relay
Layer 2 RLC RLC BSSGP BSSGP
IP IP
PLL PLL
Layer 1 Phy. layer Phy. layer Phy. layer Phy. layer
RFL RFL
Um Gb Gn Gi
SNDCP Subnetwork dependent convergence protocol BSSGP BSS GPRS application protocol
LLC Logical link control GTP GPRS tunneling protocol
RLC Radio link control TCP Transmission control protocol
MAC Medium access control UDP User datagram protocol
PLL Physical link layer IP Internet protocol
RFL Physical RF layer
GPRS-specific functionality. Between SGSN and HLR as on the receiver side. Moreover, SNDCP offers compression
well as between SGSN and EIR, an enhanced mobile and decompression of user data and redundant header
application part (MAP) is used. The exchange of MAP information (e.g., TCP/IP header compression).
messages is accomplished over the transaction capabilities
application part (TCAP), the signaling connection control 5.2.2. GPRS Mobility Management and Session Man-
part (SCCP), and the message transfer part (MTP). The agement. For signaling between an MS and the SGSN,
BSS application part (BSSAP+), which is based on the GPRS mobility management and session management
GSM’s BSSAP, is applied to transfer signaling information (GMM/SM) protocol is employed above the LLC layer. It
between SGSN and VLR (Gs interface). This includes, in includes functions for GPRS attach/detach, PDP context
particular, signaling of the mobility management when activation, routing area updates, and security procedures.
coordination of GPRS and conventional GSM functions is
necessary (e.g., for combined GPRS and non-GPRS location 5.2.3. Data-Link Layer. The data-link layer is divided
updates, combined GPRS/IMSI attach, or paging of a user into two sublayers [7]:
via GPRS for an incoming GSM call).
• Logical link control (LLC) layer (between MS and
5.2. Air Interface SGSN)
In the following we consider the transport and signaling • Radio-link control/medium access control (RLC/
protocols at the air interface (Um) between the mobile MAC) layer (between MS and BSS)
station and the BSS or SGSN, respectively.
The LLC layer provides a reliable logical link between
5.2.1. Subnetwork-Dependent Convergence Proto- an MS and its assigned SGSN. Its functionality is based
col. An application running in the GPRS mobile station on the link access procedure D mobile (LAPDm) protocol,
(e.g., a Web browser) uses IP or X.25, respectively, in which is a protocol similar to high-level data-link control
the network layer. The subnetwork-dependent conver- (HDLC). LLC includes in-order delivery, flow control,
gence protocol (SNDCP) is used to transfer these pack- error detection, and retransmission of packets [automatic
ets between the MSs and their SGSN. Its functionality repeat request (ARQ)], and ciphering functions. It supports
includes multiplexing of several PDP contexts of the net- variable frame lengths and different QoS classes, and
work layer onto one virtual logical connection of the under- besides point-to-point, point-to-multipoint transfer is also
lying logical link control (LLC) layer and segmentation of possible. A logical link is uniquely addressed with a
network-layer packets onto LLC frames and reassembly temporary logical link identifier (TLLI). The mapping
872 GENERAL PACKET RADIO SERVICE (GPRS)
between TLLI and IMSI is unique within a routing area. QoS-related information between BSS and SGSN. The
However, the user’s identity remains confidential, since underlying network service (NS) protocol is based on the
the TLLI is derived from the P-TMSI of the user. Frame Relay protocol.
The RLC/MAC layer has two functions. The purpose
of the radio-link control (RLC) layer is to establish a 5.4. Routing and Conversion of Addresses
reliable link between the MS and the BSS. This includes
the segmentation and reassembly of LLC frames into We now explain the routing of incoming IP packets in
RLC data blocks and ARQ of uncorrectable blocks. The more detail. Figure 5 illustrates how a packet arrives
MAC layer employs algorithms for contention resolution at the GGSN and is then routed through the GPRS
of random access attempts of mobile stations on the backbone to the responsible SGSN and finally to the MS.
radio channel (slotted ALOHA), statistical multiplexing of Using the PDP context, the GGSN determines from the
channels, and a scheduling and prioritizing scheme, which IP destination address a TID and the IP address of the
takes into account the negotiated QoS. On one hand, the relevant SGSN. Between GGSN and the SGSN, the GPRS
MAC protocol allows for a single MS to simultaneously tunneling protocol is employed. The SGSN derives the
use several physical channels (several time slots of the TLLI from the TID and finally transfers the IP packet to
same TDMA frame). On the other hand, it also controls the MS. The NSAPI, which is part of the TID, maps a
the statistical multiplexing; that is, it controls how several given IP address to the corresponding PDP context. An
MSs can access the same physical channel (the same time NSAPI/TLLI pair is unique within one routing area.
slot of successive TDMA frames). This will be explained in
more detail in Section 6.
6. AIR INTERFACE
5.2.4. Physical Layer. The physical layer between MS
and BSS can be divided into the two sublayers: physical The packet-oriented air interface [7] is one of the key
link layer (PLL) and physical radiofrequency layer (RFL). aspects of GPRS. Mobile stations with multislot capability
The PLL provides a physical channel between the MS and can transmit on several time slots of a TDMA frame,
the BSS. Its tasks include channel coding (i.e., detection uplink and downlink are allocated separately, and physical
of transmission errors, forward error correction, and channels are assigned only for the duration of the
indication of uncorrectable codewords), interleaving, and transmission, which leads to a statistical multiplexing
detection of physical link congestion. The RFL operates gain. This flexibility in the channel allocation results in a
below the PLL and includes modulation and demodulation. more efficient utilization of the radio resources. On top of
the physical channels, a number of logical packet channels
5.2.5. Data Flow. To conclude this section, Fig. 4 have been standardized. A special packet traffic channel
illustrates the data flow between the protocol layers in is used for payload transmission. The GPRS signaling
the mobile station. Packets of the network layer (e.g., IP channels are used, such as for broadcast of system
packets) are passed down to the SNDCP layer, where information, multiple access control, and paging. GPRS
they are segmented to LLC frames. After adding header channel coding defines four different coding schemes,
information and a framecheck sequence (FCS) for error which allow one to adjust the tradeoff between the level of
protection, these frames are segmented into one or several error protection and data rate.
RLC data blocks. Those are then passed down to the
MAC layer. One RLC/MAC block contains a MAC and 6.1. Logical Channels
RLC header, the RLC payload (information bits), and a
block-check sequence (BCS) at the end. The channel coding Several GPRS logical channels are defined in addition to
of RLC/MAC blocks and the mapping to a burst in the the logical channels of GSM. As with logical channels in
physical layer will be explained in Section 6.3. conventional GSM, they can be grouped into two cate-
gories: traffic channels and signaling (control) channels.
5.3. BSS-SGSN Interface The signaling channels can be further divided into packet
At the Gb interface, the BSS GPRS application protocol broadcast control, packet common control, and packet
(BSSGP) is defined on layer 3. It delivers routing and dedicated control channels.
SGSN
GTP [SGSN address, Internet
BSC
TID, IP packet]
GGSN
MS SNDCP IP packet
BTS [TLLI, NSAPI, IP packet] [IP source, IP destination]
MS
IP packet
The packet data traffic channel (PDTCH) is employed MS sends over the uplink part of the PTCCH, the
for the transfer of user data. It is assigned to one mobile PTCCH/U, access bursts to the BTS. From the
station (or in case of PTM to multiple mobile stations). One delay of these bursts, the correct value for the
mobile station can use several PDTCHs simultaneously. timing advance can be derived. This value is then
The packet broadcast control channel (PBCCH) is a transmitted in the downlink part, the PTCCH/D, to
unidirectional point-to-multipoint signaling channel from inform the MS.
the BSS to the mobile stations. It is used to broadcast
information about the organization of the GPRS radio Coordination between GPRS and GSM logical channels
network to all GPRS mobile stations of a cell. Besides is also possible here to save radio resources. If the PCCCH
system information about GPRS, the PBCCH should also is not available in a cell, a GPRS mobile station can use
broadcast important system information about circuit- the common control channel (CCCH) of circuit-switched
switched services, so that a GSM/GPRS mobile station GSM to initiate the packet transfer. Moreover, if the
does not need to listen to the GSM broadcast control PBCCH is not available, it can obtain the necessary system
channel (BCCH). information via the BCCH.
The packet common control channels (PCCCHs) trans- Four different coding schemes (CS1–CS4) are defined
port signaling information for functions of the network for data transmission on the PDTCH. Depending on
access management, specifically, for allocation of radio the used coding scheme, the net data throughput on
channels, medium access control, and paging. The group the PDTCH can be 9.05, 13.4, 15.6, or 21.4 kbps. The
of PCCCHs comprises the following channels: respective coding schemes are described in Section 6.3.
Theoretically, a mobile station can be assigned up to eight
• The packet random access channel (PRACH) is used PDTCHs, each of which can be either unidirectional or
by the mobile stations to request one or more PDTCH. bidirectional.
• The packet access grant channel (PAGCH) is used to
allocate one or more PDTCH to a mobile station. 6.2. Multiple Access and Radio Resource Management
• The packet paging channel (PPCH) is used by the
BSS to determine the location of a mobile station On the physical layer, GPRS uses the GSM combination
(paging) prior to downlink packet transmission. of FDMA and TDMA with eight time slots per TDMA
frame. However, several new methods are used for channel
• The packet notification channel (PNCH) is used to
allocation and multiple access [7]. They have significant
inform mobile stations of incoming PTM messages.
impact on the performance of GPRS. In circuit-switched
GSM, a physical channel (i.e., one time slot of successive
The packet dedicated control channels are bidirectional
TDMA frames) is permanently allocated for a particular
point-to-point signaling channels. This group consists of
MS during the entire call period (regardless of whether
the following two channels:
data are transmitted). Moreover, a GSM connection is
always symmetric; that is, exactly one time slot is assigned
• The packet-associated control channel (PACCH) is to uplink and downlink.
always allocated in combination with one or more GPRS enables a far more flexible resource allocation
PDTCH. It transports signaling information related scheme for packet transmission. A GPRS mobile station
to one specific mobile station (e.g., power control can transmit on several of the eight time slots within the
information). same TDMA frame (multislot operation). The number of
• The packet timing-advance control channel (PTCCH) time slots that an MS is able to use is called multislot class.
is used for adaptive frame synchronization. The In addition, uplink and downlink are allocated separately,
874 GENERAL PACKET RADIO SERVICE (GPRS)
which saves radio resources for asymmetric traffic (e.g., (TBF) is established. With that, resources (e.g., PDTCH
Web browsing). and buffers) are allocated for the mobile station, and data
The radio resources of a cell are shared by all GSM transmission can start. During transfer, the USF in the
and GPRS users located in this cell. A cell supporting header of downlink blocks indicates to other MSs that this
GPRS must allocate physical channels for GPRS traffic. uplink PDTCH is already in use. On the receiver side,
A physical channel that has been allocated for GPRS a temporary flow identifier (TFI) helps to reassemble the
transmission is denoted as packet data channel (PDCH). packets. Once all data have been transmitted, the TBF
The number of PDCHs can be adjusted according to the and the resources are released again.
current traffic demand (‘‘capacity on demand’’ principle). The downlink channel allocation (mobile terminated
For example, physical channels not currently in use for packet transfer) is performed in a similar fashion. Here,
GSM calls can be allocated as PDCHs for GPRS to increase the BSS sends a PACKET PAGING REQUEST message on the
the quality of service for GPRS. When there is a resource PPCH or PCH [the GSM equivalent of the PPCH (→ GSM
demand for GSM calls, PDCHs may be de-allocated. entry)] to the mobile station. The mobile station replies
As already mentioned, physical channels for packet- with a PACKET CHANNEL REQUEST message on the PRACH or
switched transmission (PDCHs) are allocated only for RACH, and the further channel allocation procedure is
a particular MS when this MS sends or receives data similar to the uplink case.
packets, and they are released after the transmission.
With this dynamic channel allocation principle, multiple 6.3. Channel Coding
MSs can share one physical channel. For bursty traffic this
results in a much more efficient use of the radio resources. Channel coding is used to protect the transmitted
The channel allocation is controlled by the BSC. To data packets against errors and perform forward error
prevent collisions, the network indicates in the downlink correction. The channel coding technique in GPRS is quite
which channels are currently available. An uplink state similar to the one employed in conventional GSM (→ see
flag (USF) in the header of downlink packets shows which GSM entry). An outer block coding, an inner convolutional
MS is allowed to use this channel in the uplink. The coding, and an interleaving scheme is used. Figure 6 shows
allocation of PDCHs to an MS also depends on its multislot how a block of the RLC/MAC layer is encoded and mapped
class and the QoS of its current session. onto four bursts.
In the following we describe the procedure of uplink As shown in Table 3, four coding schemes (CS1, CS2,
channel allocation (mobile originated packet transfer). A CS3, and CS4) with different code rates are defined [8].
mobile station requests a channel by sending a PACKET For each scheme, a block of 456 bits results after encoding;
CHANNEL REQUEST message on the PRACH or RACH [the however, different data rates are obtained depending on
GSM equivalent of the PRACH (→ GSM entry)]. The BSS the used coding scheme, due to the different code rates
answers on the PAGCH or AGCH [the GSM equivalent of the convolutional encoder and to different numbers
of the PAGCH (→ GSM entry)], respectively. Once the of parity bits. Figure 6 illustrates the encoding process,
PACKET CHANNEL REQUEST is successful, a temporary block flow which will be briefly explained in the following.
Convolutional coding
Puncturing
As an example we choose coding scheme CS2. First a signature response (SRES) and transmits it back to
of all, the 271 information bits of an RLC/MAC block the SGSN. If the mobile station’s SRES is equal to the
(268 bits plus 3 bits USF) are mapped to 287 bits using a SRES calculated (or maintained) by the SGSN, the user is
systematic block encoder; thus 16 parity bits are added. authenticated and is allowed to use GPRS services. If the
These parity bits are denoted as block-check sequence SGSN does not have authentication sets for a user (i.e., Kc,
(BCS). The USF pre-encoding maps the first 3 bits of random number, SRES), it requests them from the HLR.
the block (i.e., the USF) to 6 bits in a systematic way.
Afterward, 4 zero bits (tail bits) are added at the end of
7.2. Ciphering
the entire block. The tail bits are needed for termination of
the subsequent convolutional coding. For the convolutional The ciphering functionality is performed in the LLC layer
coding, a nonsystematic rate- 12 encoder with memory 4 is between MS and SGSN (see Fig. 3). Thus, the ciphering
used. This is the same encoder as used in conventional scope reaches from the MS all the way to the SGSN
GSM for full-rate speech coding (→ see GSM entry). At (and vice versa), whereas in conventional GSM the scope
the output of the convolutional encoder a codeword of is only between MS and BTS/BSC. The standard GSM
length 588 bits results. Afterward, 132 bits are punctured algorithm generates the key Kc from the key Ki and a
(deleted), resulting in a radio block of length 456 bits. random number. Kc is then used by the GPRS encryption
Thus, we obtain a code rate of the convolutional encoder algorithm for encryption of user data and signaling. The
(taking puncturing into account) of 23 . After encoding, key Kc that is handled by the SGSN is independent of
the codewords are finally fed into a block interleaver of the key Kc handled by the MSC for conventional GSM
depth 4. services. An MS may thus have more than one Kc key.
For the encoding of the traffic channels (PDTCH), one
of the four coding schemes is chosen, depending on the
quality of the signal. The two stealing flags in a normal 7.3. Confidentiality of User Identity
burst (→ GSM entry) are used to indicate which coding
scheme is applied. The signaling channels are encoded As in GSM, the identity of the subscriber (i.e., his/her
using CS1 (an exception is the PRACH). IMSI) is held confidential. This is done by using temporary
For CS1 a systematic ‘‘fire’’ code is used for block identities on the radio channel. In particular, a packet
coding and there is no precoding of the USF bits. The temporary mobile subscriber identity (P-TMSI) is assigned
convolutional coding is done with the known rate- 12 to each user by the SGSN. This address is valid and unique
encoder, however, this time the output sequence is not only in the service area of this SGSN. From the P-TMSI,
punctured. Using CS4, the 3 USF bits are mapped to a temporary logical link identity (TLLI) can be derived.
12 bits, and no convolutional coding is applied. We achieve The mapping between these temporary identities and the
a data rate of 21.4 kbps per time slot, and thus obtain a IMSI is stored only in the MS and in the SGSN.
theoretical maximum data rate of 171.2 kbps per TDMA
frame. In practice, multiple users share the time slots, and,
thus, a much lower bit rate is available to the individual BIOGRAPHIES
user. Moreover, the quality of the radio channel will not
always allow the use of CS4. The data rate available to Christian Bettstetter is a research and teaching staff
the user depends (among other things) on the current total member at the Institute of Communication Networks
traffic load in the cell (i.e., the number of users and their at Technische Universität München TUM, Germany.
traffic characteristics), the coding scheme used, and the He graduated from TUM in electrical engineering and
multislot class of the MS. information technology Dipl.-Ing. in 1998 and then joined
the Institute of Communication Networks, where is he
7. SECURITY ASPECTS working toward his Ph.D. degree. Christian’s interests are
in the area of mobile communication networks, where his
The security principles of GSM (→ GSM entry) have been current main research area is wireless ad-hoc networking.
extended for GPRS [9]. As in GSM, they protect against His interests also include 2G and 3G cellular systems
unauthorized use of services (by authentication and and protocols for a mobile Internet. He is coauthor
service request validation), provide data confidentiality of the book GSM–Switching, Services and Protocols
(using ciphering), and keep the subscriber identity (Wiley/Teubner) and a number of articles in journals,
confidential. The standard GSM algorithms are employed books, and conferences.
to generate security data. Moreover, the two keys known
from GSM, the subscriber authentication key (Ki) and the Christian Hartmann studied electrical engineering at
cipher key (Kc), are used. The main difference is that the University of Karlsruhe (TH), Germany, where he
not the MSC but the SGSN handles authentication. In received the Dipl.-Ing. degree in 1996. Since 1997, he
addition, a special GPRS ciphering algorithm has been has been with the Institute of Communication Networks
defined, which is optimized for encryption of packet data. at the Technische Universität München, Germany, as a
member of the research and teaching staff, pursuing a
7.1. User Authentication
doctoral degree. His main research interests are in the
In order to authenticate a user, the SGSN offers a random area of mobile and wireless networks including capacity
number to the MS. Using the key Ki, the MS calculates and performance evaluation, radio resource management,
876 GEOSYNCHRONOUS SATELLITE COMMUNICATIONS
employed to provide additional traffic-carrying capa- To maximize reliability, early geosynchronous satellites
bility and redundancy. The birth of these national were designed as simple repeaters. The signals are
and regional systems has resulted in many forms of transmitted from the ground and received by the satellite
new applications, including network television distribu- in one frequency band. They are frequency translated to
tion, private data networking, and rural telephony ser- another frequency band, then amplified and transmitted
vices. back to the ground terminals. This approach generally fits
In the late 1980s and early 1990s, satellite direct well with the peer-to-peer network architecture common
broadcast of television to homes emerged as cost for for the FSS. The communications link that ground
receiving equipment came down. These direct broadcast terminals transmit and the satellite receives is referred
satellites provide tens of millions of rural as well as to as uplink, and that satellite transmits and ground
urban and suburban consumers with access to a large terminals receive is referred to as downlink.
number of channels of television programming using For MSS, however, the mobile terminals generally
small, inexpensive satellite receivers within their coverage need to communicate with a PSTN through a gateway
areas. New geosynchronous satellite systems capable station, which is fixed and much more capable. Also,
of public switched telephone network (PSTN) access the MSS frequency bands are generally quite limited.
from consumer handheld mobile terminals have also Therefore, signal from the mobile terminals in the MSS
been developed. At this writing, two audio broadcast band is translated to a feeder link frequency, then sent
satellite systems are in the process of launching their down to the gateway stations. Similarly, signals from
services in the United States to provide digital compact- the gateway station are transmitted on a feeder link
disk (CD) quality audio programming to automobiles. A frequency to the satellite, and then amplified and send
new generation of onboard processing satellites is being to the MSS downlink. In this manner, the more valuable
developed for broadband Internet access. It is clear that MSS spectrum is more efficiently used.
new geosynchronous satellite systems and applications
continue to evolve well into the twenty-first century as the 3. SATELLITE ANTENNA BEAM PATTERNS
required technologies continue to become more available.
As the spectrum and orbital slots are fixed resources that
2. FREQUENCY BANDS AND ORBITAL SLOTS cannot expand with traffic increase, frequency reuse on
the same satellite is essential to increase the capacity of
Like all radio systems, bandwidth is a resource being each satellite. One way to accomplish it is to use spatial
shared by all communications. Coordination is required separation between multiple antenna beams.
to prevent systems from interfering one another. Fre- As a primary application for the INTELSAT series of
quency bands are allocated by the International Telecom- satellites is transoceanic trunking, it becomes obvious that
munications Union (ITU), based on the intended use. the same frequency band can be reused at both continents
Satellite bands are typically designated as Fixed Satel- across the ocean with two separate, focused beams. The
lite Services (FSS), Mobile Satellite Services (MSS), and uplink from the one beam is connected to the downlink
Broadcast Satellite Services (BSS). Actual assignment to the other beam and vice versa. This leads to the use
of frequency bands may vary, however, from country to of so called ‘‘hemibeam’’ and ‘‘spot beams’’. Most of the
country because of nationalistic considerations as well as modern INTELSAT satellites include a mix of ‘‘global
requirements to protect legacy systems. As device technol- beam,’’ hemibeams, and spot beams. Hemibeams and spot
ogy for the lower frequency bands are more available than beams often use different polarization from the global
the higher frequencies, lower-frequency bands are often beam, allowing more frequency reuse.
more desirable and being used up first. Equipment for A narrower, more focused beam also provides higher
higher-frequency bands, however, can be constructed with gain, which helps reduce the size of the ground terminals.
much smaller physical dimension, due to their shorter Therefore, most national and regional satellite systems
wavelength. Higher-frequency bands are also less con- employ shaped antenna beams matching their intended
gested. The common frequency bands for geosynchronous service area. For example, the a CONUS beam covers the
satellite communications are C band and Ku band for FSS, lower 48 of the continental United States. Spot beams are
L band for MSS, Ku band for BSS. After decades of inten- often added to provide coverage of Hawaii and parts of
sive research and development, Ka-band technology is just Alaska in a number of domestic satellite systems.
becoming commercially viable at the turn of this century, As more spot beams are placed on a satellite,
and it has become the new frontier for intensive activities. interconnection between these antenna beams becomes a
For FSS and BSS, highly directional antennas are used significant problem. Beam interconnection evolves from
to transmit and receive signals at the ground terminal. a fixed cross-connect to an onboard RF switch, and
It is therefore possible to reuse the same frequency band eventually to an onboard baseband packet switch. (See
by other satellites a few degrees away. Orbital slots are Sections 10 and 11.)
allocated by the ITU along the geosynchronous ring around Multibeam satellite antennas are typically based either
the world. The spacing between satellites is 2◦ for FSS. It on offset multiple feeds or phased-array technology.
is extended to 9◦ for BSS to allow the use of smaller receive Higher antenna gain, however, implies larger physical
antennas. To accommodate the use of low-gain handheld dimensions. Given the limited launching vehicle alterna-
antennas, however, the MSS satellites need much wider tives, they are relatively easier to implement for higher-
orbit spacing. frequency bands. At lower frequencies, designs based on
878 GEOSYNCHRONOUS SATELLITE COMMUNICATIONS
foldable reflectors that are deployed after launch are often Because of the relatively higher bandwidth of television
used to get around the problem. signals, most of the analog television signals have been
transmitted with ‘‘full-transponder TV.’’ The baseband
video signal is frequency modulated (FM) to occupy a
4. THE SATELLITE TRANSPONDER significant portion of the satellite transponder bandwidth,
typically in the range of 24–30 MHz. The satellite
Most of the geosynchronous satellite systems assume a transponder is often operated at saturation, because
simple, wideband repeater architecture. The frequency the FM/TV is the only signal in the transponder. In
band is partitioned into sections of a reasonable band- the INTELSAT system, a ‘‘half-transponder TV’’ scheme
width, typically in the tens of megahertz (MHz). At the have also been used. In such case, there are two
receive antenna output, the individual sections are fil- video signals, each FM modulated (with overdeviation)
tered, frequency-converted and then amplified. The ampli- to ∼17.5–20 MHz, depending on the width of the
fied signal is then filtered, multiplexed, and transmitted to transponder. The satellite transponder is operated with
the ground via the transmit antenna. The repeater chain moderate output backoff, typically 2 dB. The half-
of such a section is often called a transponder. A typical transponder TV is a tradeoff between transponder
satellite may carry tens of transponders. The amplitude- bandwidth efficiency and signal quality. Since network
to-amplitude (AM/AM) and amplitude-to-phase (AM/PM) television distribution has been a large application
response of its power amplifier, and the overall amplitude segment for national and regional satellite systems, full-
and group delay responses of the filter and multiplexer transponder FM/TV are still widely used throughout the
chain usually characterize the behavior of a satellite world.
transponder. Solid-state power amplifiers are gradually Advances in solid-state power amplifier technology
replacing traveling-wave tube (TWT) amplifiers for low and use of digital transmission techniques have brought
and medium power satellites. For further discussion, see dramatic cost reduction to satellite communications in
Section 11. terms of both space and ground segments. The single-
Maximum power is delivered by a satellite transponder channel per carrier (SCPC) mode of transmission coupled
when its power amplifier is operated at saturation. The with power efficient forward error correction (FEC) coding
amplitude and phase response of the transponder is allows small ground terminals to transmit a single digital
highly nonlinear when the power amplifier is saturated. circuit at speeds from a few kilo bits per second (kbps)
Carriers of different frequencies may generate significant up to hundreds of kbps with very small antenna and
intermodulation as a result. Small signals may be power amplifier. Digital transmission with power efficient
suppressed by higher power signals when the transponder FEC coding also allows direct broadcasting satellites to
is operated in the nonlinear region. However, when a transmit tens of megabits per second of digitally encoded
satellite transponder is used to transmit a single wideband and multiplexed television signals to very small receive
carrier, such as a full-transponder analog television antennas with the satellite transponder operating at
(TV) or a high-speed digital bitstream, operating the very close to saturation. Technology advancements in
transponder at or close to saturation minimizes the digital modulation, FEC coding, satellite power amplifiers,
size of receiving antennas on the ground. When the and antennas have enabled millions of consumers to
transponder is used to send many signals with different watch direct satellite broadcasts of hundreds of television
center frequencies, however, the common practice is to programming and to Web-surf via the geosynchronous
operate the transponder with sufficient input backoff so satellites at speeds comparable to or higher than cable
that the amplifier is operating at its linear region to modems or digital subscriber lines (DSLs).
minimize the impairments due to amplifier nonlinearity.
6. MULTIPLE-ACCESS TECHNIQUES FOR
5. TRANSMISSION TECHNIQUES FOR GEOSYNCHRONOUS SATELLITES
GEOSYNCHRONOUS SATELLITES
With the exception of television broadcast and distribu-
As communications technology has evolved from applica- tion, most of the user applications do not require a single
tions to applications since the mid-1960s, the transmission ground terminal to use the entire capacity of a satellite
techniques for geosynchronous satellites also changed. transponder. To use the transponder resource effectively,
The most significant changes are certainly caused by the individual ground terminals may be assigned a differ-
conversion from analog to digital in all forms of commu- ent section of the frequency band within the transponder
nications. For example, early transoceanic links carried so that they can share the satellite transponder without
groups of frequency-division multiplexed (FDM) analog interfering with one another. Similarly, terminals may
telephony. The power amplifiers at the transmitting earth use the same frequency, but each is assigned a different
stations as well as the satellite transponder were required time slot periodically, so that only one terminal uses this
to operate with substantial backoff to ensure their linear- frequency at any given time instant. Sharing of the same
ity. This technique has very much been replaced by digital satellite transponder by frequency separation is known
trunking, for which a single high-speed digital carrier con- as frequency-division multiple access (FDMA), sharing by
taining hundreds of telephone circuits may be transmitted time separation is known as time-division multiple access
by the earth terminal. In this way the earth station power (TDMA). Generally, FDMA requires less transmit power
amplifier can be operated much more efficiently. for each ground terminal, but the satellite transponder
GEOSYNCHRONOUS SATELLITE COMMUNICATIONS 879
must be operated at the linear region, resulting in less is reproduced and subtracted from the received signal,
efficiency in the downlink. For TDMA, each ground sta- thus canceling the echo. Although long propagation delay
tion must send information in a small time slot at higher also introduces response delay in conversational speech,
speed and not transmitting during the rest of the time. subjective tests have demonstrated that ≤400-ms round-
Therefore, higher power is needed at the ground trans- trip delay is tolerable when echo cancelers are employed.
mitter, but the satellite transponder can be operated at The quality of conversational speech degrades consider-
saturation. When sending packetized data, TDMA also ably even with echo cancellation, however, if two hops of
has the advantage of inherently high burst rate. Many geosynchronous satellite links are in between.
practical systems use a combination of FDMA and TDMA The round-trip propagation delay must also be
to obtain the best engineering tradeoff. considered in designing data links using automatic repeat
The ground stations can also modulate their digital request (ARQ) for error recovery. For high-speed links, a
transmission using different code sequences with very low, very sizable amount of data can be transmitted during
or no cross-correlation, and transmit the resulting signal the two round-trip times it takes for an acknowledgment
at the same frequency. The signals can then be separated (ack) for the initial data packet to arrive at the transmit
at the receiver by correlating the received signal with the terminal. Well-known protocols such as Go-back-N can
same sequence. This technique is known as code-division perform poorly in this situation. The round-trip delay also
multiple access (CDMA). It is possible to combine CDMA adversely affects the Internet end-to-end Transmit Control
with TDMA, FDMA, or TDMA and FDMA. Protocol (TCP). The TCP transmission window is opened
To achieve higher efficiency, a ground terminal is gradually on successful transmission of successive packets.
assigned a transmission resource, such as a frequency, a The window is cut back quickly when a packet needs to
time slot, or a code sequence, only when they need it. This be retransmitted. The longer propagation delay can cause
is accomplished by setting aside a separate communication the TCP to operate at a very small transmission window
channel for the ground terminals to request the resources. most of the time. Since the end user devices generally are
The transmission resource is then allocated based on the unaware of the presence of a satellite link in between,
requests received from all the terminals. This general such issues are best resolved by performance enhancement
approach is known as demand assignment multiple access proxies (PEPs). PEP is a software proxy installed at the
(DAMA). Most DAMA systems use centralized control, ground terminal. The proxy at the transmitting terminal
and the resource assignments are sent to the terminals emulates the destination device for the transmitting end
via another separate channel. In some case, a distributed user, whereas the proxy at the receiving terminal emulates
control approach has also been implemented in the the source device for the receiving end user. The proxy
INTELSAT system, since all national gateway stations also implements a protocol optimized for the satellite link
in the INTELSAT network are considered to be equals. between the transmit and receive ground terminals. By
For distributed control, requests are received and used by breaking the communications into three segments, PEP
all ground terminals. They then execute the same demand is able to maximize the performance for the satellite link
assignment algorithm to reach identical assignments. No while maintaining the compatibility to the standard data
separate assignment messages need to be sent. DAMA communications protocols with the end user devices at
has been adopted by terrestrial cellular radio later, and is both ends.
now commonly known as part of the media access control
(MAC) protocol. 8. VERY SMALL APERTURE TERMINALS (VSAT) FOR
GEOSYNCHRONOUS SATELLITES
7. ROUND-TRIP PROPAGATION DELAY AND SERVICE Geosynchronous satellites were initially conceived as
QUALITY ‘‘repeaters in the sky.’’ Every ground terminal is capable
of communicating to any other ground terminals within a
Depending on the location of the ground terminals satellite’s coverage area on a peer-to-peer basis. This fully
with respect to the geosynchronous satellites, it takes connected mesh-shaped network is ideal for transoceanic
about 240–270 ms for the transmitted signal to prop- trunking in the INTELSAT system. When maritime
agate to a geosynchronous satellite and back down to communications via satellite emerged, it became clear
the receive ground terminal. This round-trip propagation that most of the communications are between individual
delay presents service quality issues for two-way commu- ship-terminals and their home country through their
nications. PSTN handsets are connected to the subscriber ‘‘home’’ coastal earth station. The INMARSAT system
lines via 4-wire–2-wire 2/4-wire hybrid. As a result of lim- is essentially made up of many star-shaped networks
ited isolation at the receiving end hybrid, an attenuated with coastal earth stations as their hubs. Ship-to-
echo of one’s own speech can be heard two round trips ship communications must be relayed through the hub.
later. Because of the long round-trip delay of the geosyn- By trading off the direct peer-to-peer communication
chronous satellites, the quality of conversational speech capability, total ground segment cost is drastically
can be degraded significantly by the echo. This problem is reduced. Since there are far more remote terminals
overcome by the introduction of echo cancelers. An echo than hub terminals, minimizing the remote terminal cost
canceler stores a digital copy of the speech at the transmit tends to minimize the overall system cost. Very small
end. It estimates the amount of the round-trip delay and aperture terminals (VSATs) exploit this property to create
the strength of the echo at the receive end. The prop- inexpensive private networks with geosynchronous FSS
erly delayed and attenuated replica of the stored signal satellites.
880 GEOSYNCHRONOUS SATELLITE COMMUNICATIONS
In private networks, the remote VSAT terminals beginning of the digital conversion, 64-kbps pulse-coded
typically need to send data at speeds up to low hundreds modulation (PCM) and 56-kbps PCM were used by the
of kbps (kilobits per second) to the hub. The remote-to- INTELSAT system. In the late 1980s, 32-kbps adaptive
hub communication is often referred to as ‘‘inroute’’. At differential PCM (ADPCM) combined with digital speech
these rates, reliable data communications can typically interpolation (DSI) was used to increase INTELSAT
be established via Ku-band FSS satellites using low-cost trunking capacity by a factor of four. DSI detects the
antennas with diameter less than a meter. Such antennas silence periods of conversational speech and transmits
have reasonable beamwidth, requiring no constant only the active periods of the speech. By statistically
tracking of the satellite position. Typically, VSATs use aggregating a large number of circuits, DSI provides
near-constant envelope digital modulation along with twofold capacity after taking into account the control
powerful FEC codes, and a small power amplifier operated overhead required.
at near saturation. With SCPC transmission, a large Most of the early national and regional FSS networks
number of remote terminals share a part or a whole use 64- or 56-kbps PCM. Newer networks have adopted
satellite transponder on a demand assigned TDMA/FDMA the ITU G-728 16-kbps voice coder and the G.729 8-kbps
basis. Demand-assigned TDMA/FDMA takes advantage voice coder. The lower bit rates not only increase the
of the low-duty factor, bursty nature of the data traffic, number of circuits carried by a satellite transponder, but
allowing a number of terminals to share a single frequency. also minimize the size of the remote terminal by reducing
To minimize the size of the remote terminal antenna the transmit power needed to support a fixed number of
and power amplifier, the hub terminal typically uses a voice circuits.
much larger antenna so that the downlink contributes INMARSAT used low-bit-rate voice coders such as 16
very little to the overall link noise. Signal from each and 9.6 kbps in their early digital terminals. It selected
individual remote terminal is demodulated and decoded a 4-kbps voice coder for their smallest terminals in
separately, and then routed to the appropriate terrestrial the early 1990s. A similar 4-kbps voice coder is also
network interfaces. used by a geosynchronous mobile satellite system that
For the ‘‘outroute,’’ the hub station typically time supports handheld cell-phone-like terminals. In fact,
multiplexes all the traffic toward the remote terminals into with its low-gain antennas, handheld, high-quality, voice
a single high-speed time-division multiplex (TDM) carrier, communications via satellite can only be made possible
and transmits it to the entire network of terminals via with the success of these very low bit rate voice coders.
either a part of the transponder at a different frequency, Digital television signals based on ADPCM were
or through a separate transponder. Similarly, because of demonstrated via satellites at 45 Mbps in the late
the much larger transmit antenna at the hub, the overall 1970s. 1.5-Mbps and 384-kbps coders have been used for
outroute link noise is dominated by the downlink to the teleconference applications in the late 1980s. Not until
remote terminal. The speed of the TDM data carrier is digital direct satellite broadcast services were launched in
typically in the range from submegabits per second to tens the early 1990s, has digital television coding been widely
of megabits to second, scaled according to network size. used. Based on enhanced Motion Pictures Experts Group
In the rare cases for which data must be exchanged (MPEG) or MPEG-2 standards, these video coders are
between two remote terminals, they are routed through capable of compressing broadcast quality television signals
the hub. The extra delay does not cause problem for to about 3–5 Mbps, depending on motion contents. The
most private network applications. But double hopping direct broadcast satellites typically time-multiplex up to
through the synchronous satellite creates too much delay 10 such signals into a single satellite transponder. Digital
for conversational telephony. Alternative full-mesh VSATs video compression is truly the underlining technology that
have also been developed for voice applications. In such makes satellite direct broadcast a commercial success.
private networks, a hub is responsible for interface to the
PSTN and demand assign control. The remote terminals 10. ONBOARD PROCESSING FOR GEOSYNCHRONOUS
typically utilize low-bit-rate voice coders such as the ITU SATELLITES
G.729 8-kbps voice coder. When a remote-to-remote call is
initiated, the hub assigns a pair of frequencies for the two Advanced geosynchronous satellites use multibeam anten-
remote terminals to transmit on. They in turn tune their nas to divide their coverage areas into cells like a cellular
transmitters and receivers to the respective frequencies network to increase their traffic carrying capacity. Similar
accordingly, and the circuit is established. The remote- to cellular networks, the frequency band is reused by cell
to-remote communications is possible because the SCPC clusters. The adjacent cells within a cluster use differ-
signals transmitted by these remotes are at lower bit rate ent frequency bands or different polarization to provide
than normally used for data communications. the needed isolation in addition to the spatial discrimina-
tion provided by the antenna pattern. As the number
9. DIGITAL VOICE AND TELEVISION FOR of antenna beams increases, cross-connection between
GEOSYNCHRONOUS SATELLITES uplink beams and downlink beams becomes increasingly
complicated. When the number of beams are more than a
As just demonstrated by the example of one-hop few tens on both the uplink and downlink, the task can
voice communications between VSATs, low-bit-rate voice no longer be handled with simple RF switches. Telephone
coders are instrumental for voice communications via switch technology has evolved since the 1970s from cross-
geosynchronous satellites using small terminals. At the bar to digital, and again from circuit-based ‘‘hard’’ switch
GEOSYNCHRONOUS SATELLITE COMMUNICATIONS 881
to packet-based ‘‘soft’’ switch. With each step of the evolu- The gateway will transmit, via a satellite transpon-
tion, the capacity of the switch has increased and the cost der, to multiple subscriber terminals (STs) on a single
reduced by orders of magnitude. Beam cross-connection on wideband TDM signal occupying one transponder band-
the latest high-capacity geosynchronous satellites requires width (e.g., 30 MHz.). This outbound signal will be in the
a digital switch solution. 6- or 14-GHz bands for C-band or K band systems, respec-
Onboard processing satellites typically demodulate the tively. Similarly, the gateway will receive FDM/TDMA
uplink in each antenna beam into digital signals. Data bursts from multiple STs via another satellite transpon-
packets are then FEC decoded, and put into an input der. As an example, the composite 30-MHz inbound signal
buffer. Based on the header information in each packet, could be 100 FDM channels, each 300 kHz wide, carrying
an onboard switch routes the packets to the output queue 200 ksymbol/sec QPSK modulated carrier time-shared by
of each individual downlink beam. Header information multiple users in a TDMA protocol.
may also provide traffic classification and quality of The gateway transmit traffic is converted to data blocks
service (QoS) requirements. The onboard packet switch (slots), CRC and FEC encoded, and multiplexed into
can also assign priority and exercise flow control in a TDM frame structure. The wideband TDM baseband
case of traffic congestion. For each downlink beam, data signal is applied to a QPSK modulator, which converts it
in the output queue are FEC-encoded, modulated, and to a fixed IF. The IF is up converted (U/C) to C or K band
upconverted to the RF frequency. They are then amplified and transmitted to the satellite as the outbound signal,
and transmitted. via the RF Subsystem, which includes the power amplifier
Onboard processing, in addition, offers the opportunity (HPA) and the antenna.
to optimize the uplink and downlink independently. The gateway inbound FDM/TDMA signal at the C or
Satellite transponders are most efficiently used when K band, which is composed of transmission from multiple
operated in TDM or TDMA mode at saturation. The STs, is received by the same antenna, amplified by
size of ground terminals is minimized, however, if the the LNA and downconverted (D/C) to IF. The IF is
satellite transponder is accessed via FDMA or CDMA. demodulated by the TDMA burst demodulators, each
With onboard processing, both links can be operated with operating on a separate 300-kHz channel in this example.
their most efficient access approach. When a large number Each demodulator decodes its traffic slot by slot which
of downlink beams are needed to cover an area with uneven is subsequently routed to appropriate destination by the
traffic load, a common practice is to switch a much smaller TDMA Rx controller.
number of active downlink transmitters to all the beams by The gateway transmit power requirements are high
phase-array technology. The dwell time on each individual since it has to transmit a wideband TDM signal. The high
beam can be directly proportional to the downlink traffic effective isotropic radiated power (EIRP) requirement is
to its corresponding cell, thus optimizing the utilization of adjusted by suitable choice of HPA power rating and the
the satellite resources. antenna gain. Typically, gateway antenna beamwidths are
Also, the bit error rate (BER) or the packet error rate narrow because of the HPA output power limitations, and
(PER) of the overall link equals to the sum of those incurred thus a tracking antenna is required at most Gateways
in the two links in a processing satellite, whereas the stations to track the satellite motion.
thermal noise and interference for the overall link is the
sum of those incurred in the two links with a classic ‘‘bent 11.2. Satellite Communications Payload
pipe’’ satellite. Since the slope of BER or PER versus noise Satellite communications payload is also very specific to
and interference is very steep when the link is protected applications. Conventional designs evolved from a simple
by the FEC, the overall link can be much better optimized bent pipe, global beam approach that acted as a simple
for an onboard processing satellite. repeater in the sky, to spot beam, regenerative repeaters
As a satellite system generally needs to accommodate with traffic switched between spots at baseband. There
different sizes of terminals, one of the challenges for cost- are far more advanced designs expected to operate in the
effective onboard processing is to implement the capability near future. The fundamental objective is to support as
of dynamically assigning demodulator and FEC decoder much simultaneously active traffic as possible within the
resources for different-sized uplinks. This is accomplished allocated bandwidth and the satellite power limitation.
with digital signal processing techniques that are scalable In general, noninterfering spot beams and polarization
within limits. are used to achieve frequency reuse. Spot beams with
high gain, also provide higher EIRP that may illuminate
11. SATELLITE COMMUNICATIONS SYSTEM EQUIPMENT areas with dense traffic, thus providing bandwidth reuse
and power efficiency. Further, hopping spot beams, at the
expense of complex control, provide further improvement
11.1. Gateway Architecture
in power efficiency. In this approach many spots are
There are several configurations of the satellite communi- arranged over the coverage area, but the available
cations payload. It would be difficult to develop a generic transmit power is dynamically assigned only to a small
architecture for the system since satellite payloads and sub-set of spots, at any given time. A mix of configurations
Gateways are designed based on the traffic requirements is also possible.
it is meant to serve. In Fig. 1, we illustrate a simplified In Fig. 2 we show a simple fixed spot beam payload
gateway architecture that supports digitized voice and system with two spot beams where traffic can be routed
data for TDMA application. between the East and West spots by a RF switch matrix.
UP-link
6 GHz for C-band system
14 GHz for Ku-band system
TX data
TX data
FEC QPSK
TDM encoder WB U/C BPF HPA
TX modulator IF-TX
control
Speech
Speech
Diplexer
Sampling OSC RF L.O &
Reference clock clock Polarizer
PLL
RX data
RX data
TDMA
RX FEC QPSK
burst D/C BPF LNA
control N decoder N
Speech demod
Speech
N demodulators Down-link
4 GHz for C-band system
12 GHz for Ku-band system
Receiver R M
U/C HPA F2′
D/C F2 F U
X
S
DPLXR W U/C HPA Fn′
Fn
I
West spot TX-W
T
C To diplexers TX
TX-E H
DPLXR F1 U/C HPA F1′
M RF
East spot A
Receiver T F2′ M
F2 U/C HPA
D/C R U
I X
X
U/C HPA Fn′
Fn
882
GEOSYNCHRONOUS SATELLITE COMMUNICATIONS 883
The switch configuration is programmable by the control edge of a satellite antenna beam is defined by −3 dB
equipment. The satellite payload is controlled from ground contour, then this loss will be 3 dB.
by a dedicated command and control link with a control Antenna Pointing Loss. The ground equipment
station. antenna boresight may not point exactly at the
Signals received from the ground are transmitted from satellite. To account for this pointing error, typically
either gateways or STs. The received signals are amplified 0.5 dB pointing loss is assumed.
by the LNA and demultiplexed by bandpass filters F. Free-Space Path Loss. The square-law propagation
The bandwidth of these filters sets the transponder loss factor normalized with the wavelength. This
bandwidth. In our example F1 could be assigned to definition simplifies calculation of received carrier
outbound transmission from a gateway, while F2 could power based on transmit EIRP propagation loss and
be assigned to Inbound transmissions from a group of STs. receive antenna gain. Loss = (4π d/λ)2 , where d is
The HPA for outbound may operate near saturation since the distance to the satellite, a function of elevation
there is only one (wideband TDMA) signal present. The angle. A value of 40,000 km is typically used.
HPA for Inbound traffic has to be backed off (typically
Atmospheric Loss. The atmospheric loss is due to the
output backoff of 4 dB) to support FDM/ TDMA carriers.
gaseous content of the atmosphere and higher layers,
The RF switch provides a simple return path to the
encountered during wave propagation even in clear-
same region or to a different region, based on the
sky conditions. Typical values range from 0.5 (C
switch configuration. Typically this configuration would
band) to 0.8 dB (Ka band).
support traffic between hotspots on two ends of the
coverage area for example New York and Los Angeles. Receive Flux Density. The receive flux density at the
Actual payload designs are more complicated than the satellite is a measure of received power per unit
example here. There could be a mix of CONUS beams area assuming isotropic radiation from the transmit
and many spot beams, and a mix of C- and K-band antenna. It can be obtained from the EIRP and
transponders all interconnected via the RF switch matrix the distance d (after accounting for other fixed
or matrices. It all depends on the system requirements losses such as pointing error, edge-of-beam loss, and
and payload cost. atmospheric loss but excluding free-space path loss).
(dBW)
Rx flux density
12. GEO SATELLITE SYSTEM LINK BUDGETS m2
EIRP
As in any radio communications system design, link = 10 log − losses (dB)
4π d2
budgets are needed to size the system components
to achieve performance objectives. For geosynchronous Typically the satellite repeater function is specified
satellites, the uplink and downlink microwave path losses in terms of the Rx flux density and transmit EIRP
should be overcome by the antenna gain and transmit at saturation. Actual EIRP is computed by first
power to overcome the inherent noise and interference in computing the incident flux density, hence, the
the receivers. This analysis provides the basis of designing backoff from the saturation point, then finding the
the hardware for the gateway and the ST, constrained by output backoff from saturation EIRP, based on the
the cost and data rate capability tradeoffs. The following power amplifier nonlinear gain characteristics.
terminology is used for the various parameters of a link
Receiver Noise Figure. The receiver noise figure (NF)
budget calculation.
is a measure of the thermal noise created in the
receive chain referenced to the antenna port (LNA
Transmit Power. Actual power delivered to the input). In satellite applications this can be assumed
antenna subsystem. If there are no losses between to come entirely from the LNA. Gateway LNA noise
the HPA and the antenna, this will be the power figures are very low (<1 dB), while satellite LNA NF
output of the HPA. could be >3 dB.
Antenna Gain. The actual gain of the Tx (transmitting) Equivalent Noise Temperature. The equivalent noise
antenna, i.e., directivity minus losses from illumi- temperature (Te ), a more practical parameter, is
nation efficiency and other reflector and/or radome directly related to the NF by the relationship
losses. The gain of a parabolic reflector fed by a feed- Te = (NF − 1)T0 , where T0 is ambient temperature
horn is given by G = 4π Aη/λ2 , where A is the planar (290 K is assumed for room temp).
area of the aperture, λ is the wavelength, and η is the Antenna Noise Temperature. The antenna noise
aperture efficiency of the antenna (typically 65%). temperature (Ta ) is the noise picked up by the
EIRP. Effective isotropic radiated power is the product antenna environment, which includes galactic,
of the antenna gain and transmit power (or the sum atmospheric and spillover noise. For a gateway
if these are in decibels). looking at the satellite, this parameter is only
Edge-of-Beam Loss. The satellite antenna boresight 40–60 K. For the satellite looking at the earth, this
points to one location on earth. The coverage area noise temperature could be >500 K.
is defined by constant power contours decreasing in System Noise Temperature (Ts ). This is the total
power as one moves away from the boresight. If the noise temperature at antenna port: Ts = Te + Ta .
884 GEOSYNCHRONOUS SATELLITE COMMUNICATIONS
Typically, system noise temperature of satellites is Available Fade Margin. Typically the link budget is set
around 1000 K. up for clear-sky conditions. The difference between
Receive G/T. This ratio establishes the figure of merit the received Ebi /(N0 + I0 ) and the threshold Ebi /N0
for the receiver subsystem. It is the ratio of Rx gives the fade margin. The fade margin reflects
antenna gain to the system noise temperature. the availability of the system during rain fades.
Received C/N0 . The carrier-to-noise density ratio is System availability as a function of rain statistics
computed directly from the G/T and EIRP after is beyond the scope of this paper. For the C band,
accounting for all the losses. It is independent of the a 1.5 dB margin is adequate to achieve reasonable
bit rate. (In fact, it gives an easy way to determine availability for typical locations.
what bit rates can be supported by the system given
the threshold Eb /N0 ). 12.1. Example of a Link Budget
A hypothetical system is chosen as an example to be
C/N0 (dB − Hz) = EIRP (dBW) + G/T (dB/K) used with an INTELSAT standard B, C-band (6/4-GHz)
gateway (GW) earth station. The outbound is a 20-Mbps
− losses (dB) + 228.6 dB/K/Hz
TDM, transmitted with antenna gain of 58 dBi and HPA
(Boltzmann) power 200 W. The G/T of the GW is 33.5 dB/K.
The satellite is a bent-pipe model with saturation
In a bent-pipe repeater model the uplink and flux density of −83 dBW/m2 corresponding to an EIRP
downlink system noise temperatures should be of 37 dBW, giving a gain of 120 dB/m2 at saturation
accounted for using the relationship or 125 dB/m2 in linear region, based on a standard
TWTA nonlinear transfer characteristics. The system
−1 −1 −1 noise temperature is 1000 K. The G/T is −6.0 dB, based
C C C on a gain of 24 dBi, which gives an 8◦ half-power CONUS
= +
N0 Nou Nod beam. The inbound traffic is FDM/TDMA with 250 kbps
information per channel, 120 channels spaced 300 khz
Received C/I0 . The carrier-to-interference density is apart using one transponder. The subscriber terminal (ST)
computed by finding the ratio of the received carrier contains a 2-W power amplifier and 1.8 m dish antenna.
power to the interference density. The interference Its receiver noise figure is 1.5 dB.
density is obtained by the interference power in the Threshold Eb /N0 of 3.0 dB (including 1 dB modem
channel divided by the channel bandwidth. In a implementation loss) and fade margin of 1.5 dB are
bent-pipe repeater model the uplink and downlink assumed in both directions. Other parameters are in the
interference should be accounted for by a similar link budgets shown in Tables 1 and 2 for outbound and
relationship as given above: inbound time-division multiplexing, respectively.
Table 1. Outbound Link Budget for 20-Mbps TDM Table 2. Inbound Link Budget for 250-kbps/s
FDMA/TDMA (120 Channels)
Units Clear Sky
Units Clear Sky
6-GHz Uplink
6 GHz Uplink
Outbound bit rate Mbps 20.0
Gateway antenna gain dB 58.0 Inbound bit rate Mbps 0.25
HPA Tx power dBW 23.0 ST Tx antenna gain dB 39.5
Transmit EIRP dBW 81.0 ST SSPA Tx power dBW 3.0
Uplink frequency GHz 6.0 Transmit EIRP dBW 42.5
Wavelength m 0.05 Uplink frequency GHz 6.0
Distance to satellite (max) km 40,000 Wavelength m 0.05
Free-space path loss at 6 GHz dB 200 Distance to satellite km 40,000
Atmospheric loss dB 0.5 Free-space path loss dB 200
Edge-of-beam loss dB 3.0 at 6 GHz
Gateway antenna pointing loss dB 0.5 Atmospheric loss dB 0.5
Satellite G/T (G = 24 dB; Ts = 1000 K) dB/K −6.0 Edge-of-beam loss dB 3.0
Carrier noise density (C/N0 )u dB–Hz 99.6 ST antenna pointing loss dB 0.5
Rx flux density (saturation = −83) (dBW)/m2 −86 Satellite G/T (G = 24 dB; dB/K −6.0
Ts = 1000 K)
4-GHz Downlink
Carrier noise density (C/N0 )u dB–Hz 61.1
Satellite EIRP (0.5 dB backoff) dBW 36.5 Rx flux density (saturation = −83) (dBW)/m2 −124.5
Downlink Frequency GHz 4.0
Wavelength m 0.075 4 GHz Downlink
Free-space path loss at 4 GHz dB 196.5
Downlink EIRP/ ST (linear) dBW 0.5
Atmospheric loss dB 0.5
Edge-of-beam loss dB 3.0 Downlink frequency GHz 4.0
ST antenna diameter m 1.8 Wavelength m 0.075
ST antenna Rx gain dB 36.0 Free-space path loss at 4 GHz dB 196.5
ST antenna pointing loss dB 0.5 Atmospheric loss dB 0.5
ST noise figure dB 1.5 Edge-of-beam loss dB 3.0
Equivalent noise temp (NF = 1.5 dB) K 119.6 Gateway antenna diameter m 15.2
Antenna noise temperature K 52.0 Gateway antenna Rx gain dB 54.5
System noise temperature dBK 22.3 GW antenna pointing loss dB 0.5
ST G/T dB/K 13.7 GW noise figure dB 1.0
Carrier noise density (C/N0 )d dB–Hz 78.2 Equivalent noise K 75.1
temperature (NF = 1.0 dB)
Outbound Overall Antenna noise temperature K 52.0
System noise temperature dBK 21.0
(C/N0 ) received dB–Hz 78.2 GW G/T dB/K 33.5
(C/I0 ) assumed dB–Hz 86.0 Carrier noise Density (C/N0 )d dB–Hz 62.0
C/(N0 + I0 ) dB–Hz 77.5
Eb /(N0 + I0 ) dB 4.5 Inbound Overall
Threshold Eb /N0 dB 3.0
Fade margin dB 1.5 (C/N0 ) Received dB–Hz 58.5
(C/I0 ) Assumed dB–Hz 86.0
C/(N0 + I0 ) dB–Hz 58.5
Eb /(N0 + I0 ) dB 4.5
COMSAT Laboratories, Maryland, where he worked on Threshold Eb /N0 dB 3.0
regenerative satellite transponder modem designs. He Fade margin dB 1.5
later joined MA-Com Linkabit, San Diego, California,
in April 1987, where he developed his expertise in the
areas of communications and signal processing. He joined
GOLAY CODES
Hughes Network Systems, Germantown, Maryland, in
1989 and has since worked on CDMA technology. Cur-
MARCUS GREFERATH
rently, he works at HNS as a senior Director, engineering,
San Diego State University
involved in R.F. and CDMA technology related activities. San Diego, California
He has been active in the TIA/EIA TR45.5 cdma2000 stan-
dards development, participating in the physical layer
and enhanced access procedures development. He chaired 1. INTRODUCTION
the TR45.5 cdma2000 enhanced access ad-hoc group. He
has authored/ coauthored several patents in the CDMA Among the various codes and code families that have been
physical layer and enhanced access procedures. In 1998, enjoying the attention of coding theorists, two sporadic
he received Hughes Electronics Patent Award and was examples of (extended) cyclic codes have continuously
the corecipient of the 1998 CDMA Technical Achievement attracted the interest of many scholars. These two codes,
Award. named after their discoverer M. Golay, have been known
886 GOLAY CODES
since the very first days of algebraic coding theory in the ring [x]/(xn − 1), and for a cyclic code C there exists
late 1940s. Their enduring role in contemporary coding a unique monic divisor g of xn − 1 in [x] such that
theory results from their simplicity, structural beauty, C = [x]g/(xn − 1).
depth of mathematical background, and connection to Given a linear code C, we define the dual code
other fields of discrete mathematics, such as finite
geometry and the theory of lattices. C⊥ := {x ∈ n | c · x = 0 for all c ∈ C}
This presentation is devoted to a description of the
basics about these codes. The reader should be aware and we call C self-dual, if C = C⊥ .
that the issues that we have chosen for a more detailed
discussion form only a narrow selection out of the material 2. THE GOLAY CODES AS (EXTENDED) QUADRATIC
that is available about these codes. So the article at hand RESIDUE CODES
does not claim in any way to be a comprehensive or
complete treatment of the issue. For the many aspects M. J. E. Golay (1902–1989) was a Swiss physicist known
that we are not discussing here, the reader is referred for his work in infrared spectroscopy, among other things.
to the Bibliography, and in particular to the treatise He was one of the founding fathers of coding theory,
by Conway and Sloane [6]. This article is organized as discovering the two binary Golay codes in 1949 [11] and the
follows. In Section 1 we will introduce the Golay codes ternary Golay codes in 1954. These codes, which have been
as (extended) quadratic residue codes. We will discuss called Golay codes since then, have been important not
their parameters and immediate properties. In Section 2 only because of theoretical but also practical reasons — the
we will address alternative descriptions of the Golay extended binary Golay code, for example, has frequently
codes. Besides a collection of nice generator matrices, we been applied in the U.S. space program, most notably with
will briefly mention the miracle octad generator (MOG) the Voyager I and II spacecraft that transmitted clear
construction and also Pasquier’s description of the binary color pictures of Jupiter and Saturn.
Golay code. Section 3 is devoted to the decoding of these Let p and q be prime numbers such that p is odd
codes. Among the various algorithms from the literature and q is a quadratic residue modulo p, and let ω be
we have picked a particular one due to M. Elia, which is a primitive pth root in a suitable extension field of q .
an algebraic decoder. Let QR := {i2 | i ∈ p× } denote the set of quadratic residues
We will then briefly come to the relationship of these modulo p and set NQR := p× \ QR. Both of these sets have
codes to finite geometry (design theory) and the theory size (p − 1)/2, the polynomials
of lattices. In particular, we will mention an important
lattice called the Leech lattice, which closely connected to
fQR = (x − ωi ) and fNQR = (x − ωi )
the binary Golay code, more precisely, with a 4 -linear
i∈QR i∈NQR
version of this code. This gives us a natural bridge to the
final discussion of the article, the role of the Golay codes have coefficients in q , and xp − 1 = fQR · fNQR · (x − 1).
in the investigation of ring-linear codes.
Example 1
1.1. Basic Notions
We define the Hamming distance dH on a finite set (a) For p = 23 and q = 2 and choosing a suitable
(usually a field or a ring) defined by primitive 23rd root in 211 , we obtain
0, x=y fQR = x11 + x9 + x7 + x6 + x5 + x + 1 and
dH : × → , (x, y) →
1, else
fNQR = x11 + x10 + x6 + x5 + x4 + x2 + 1.
This function is usually extended additively to the space
n for every n ∈ . A block code is a subset C of n , and its (b) For p = 11 and q = 3 and a suitable primitive 11th
Hamming minimum distance is root in 35 , we obtain
its extension by a parity check will be denoted by π and a set σ1 , . . . , σn of permutation on such that C is
G3 and is a [12,6,6] code. mapped to D under the bijection
The sphere packing bound says that if |F| = q and n → n , (c1 , . . . , cn ) → (σ1 (cπ (1)), . . . , σn (cπ (n)))
dmin (C) = 2t + 1, then
It has been proved by Pless, Goethals, and Delsarte that
t
the Golay codes are uniquely determined up to equivalence
n
|C| (q − 1)i ≤ qn by their parameters. This is the content of the following
i
i=0 theorem. Proofs can be found in the literature [8,21].
Codes meeting this bound with equality are called perfect Theorem 1
codes. Given their minimum distance, it is indeed easily
verified that G2 as well as G3 are perfect codes, since
(a) Every binary (23, 212 , 7) code is equivalent to the
Golay code G2 , and every binary (24, 212 , 8) code is
23 23 23 23
212 + + + = 223 and equivalent to G2 .
0 1 2 3
(b) Every ternary (11, 36 , 5) code is equivalent to the
11 11 11 Golay code G3 , and every ternary (12, 36 , 6) code is
36 + 2+ 4 = 311
0 1 2 equivalent to G3 .
therefore see that they are (up to equivalence) the binary entry in the first row. As ϕ(G) ∈ H we conclude that there
and ternary Golay codes. is an even number of zero entries in ϕ(G) contradicting
the fact that the parity of the first row of G is 1. Hence,
3.1. The MOG Construction this case cannot happen, and we know that the weight of
G must be at least 8.
We now discuss a construction of the binary Golay code
using the 4 -linear hexacode [6, Chap. 11].
3.2. Pasquier’s Construction
The hexacode is the [6,3,4] code H generated by
the matrix Pasquier observed [20] that the extended binary Golay
1 0 0 ω2 1 ω code can be obtained from a Reed-Solomon code over 8 .
0 1 0 1 1 1
In order to see this let
0 0 1 ω 1 ω2
tr: 8 → 2 , x → x + x2 + x4
Its Hamming weight enumerator is given by 1 + 45t4 +
18t6 , and it is clear that any word of the hexacode has 0, be the trace map and let α ∈ 8 be an element satisfying the
2, or 6 zeros. equation α 3 + α 2 + 1 = 0. It is easy to see that tr(α) = 1,
Let 4×6
2 denote the space of all 4 × 6 matrices with and that B := {α, α 2 , α 4 } forms a trace-orthogonal basis of
binary entries. As an 2 -vector space it can clearly be 8 over 2 . This means that
4×6
identified with 242 . We are going to isolate a subset of 2
that will in a natural way turn out to be equivalent to the 1, x = y
tr(xy) =
Golay code. 0, x = y
For this we first define a mapping ϕ: 24×6 → 64 , G →
(0, 1, ω, ω2 )G. Now define C to be the subset of all matrices for all x, y ∈ B.
G ∈ 4×6
2 having the following two properties: Now consider the [7,4,4] Reed–Solomon code generated
by the polynomial 3i=1 (x + α i ). Its extension by a parity
4 check yields the self-dual [8,4,5] extended Reed–Solomon
1. For every column j ∈ {1, . . . , 6} there holds gij = code generated by the matrix
i=1
6
1 α5 1 α6 0 0 0 α3
g1j ; thus, the parity of each column is that of the 0 1 α5 1 α6 0 0 α3
R :=
0 0
j=1
1 α5 1 α6 0 α3
first row of G.
0 0 0 1 α5 1 α6 α 3
2. ϕ(G) ∈ H.
We now consider the 2 -linear mapping
As both of these conditions are preserved under addition
of matrices that satisfy these conditions, we have C to be a 8 → 32 , a0 α + a2 α 2 + a1 α 4 → (a0 , a1 , a2 )
subspace of 4×6
2 . The first condition imposes 6 restrictions
on the matrix, and the second (at most) another 6. For this and extend it componentwise to an 2 -linear mapping
reason dim(C) ≥ 24 − (6 + 6) = 12. To find that this code ϕ: 88 → 24
2 . The image of the extended Reed–Solomon
is equivalent to the Golay code, we only have to check if its code described above is clearly a binary [24,12] code,
minimum weight is given by at least 8, because this forces because the vectors {αv, α 2 v, α 4 v} are independent over
its dimension down to 12 and hence makes it a [24,12,8] 2 for each of the rows v of the preceding matrix R. Hence
code, and we are finished by Theorem 1. this binary image is a [24,12] code.
Let G be a nonzero element of C. First, assume that Observing tr(xy) = ϕ(x)ϕ(x) for all vectors x, y ∈ 88 ,
6
we see that our [24,12] code is self-dual because the
g1j = 0, which by the preceding description means that
Reed–Solomon code that we started with is so. Even more
j=1
is true: this code is ‘‘doubly even,’’ which means that the
4
gij = 0 for all j ∈ {1, . . . , 6}. If ϕ(G) = 0, then ϕ(G) has Hamming weight of its vectors is always a multiple of
i=1
4. We can easily check this by finding a basis consisting
at least 4 nonzero entries, and hence G must have at of doubly even words and keeping in mind that under
least 4 nonzero columns. As each of these columns has self-duality this property inherits to linear combinations
an even number of ones, the weight of G is at least 8. If of vectors that have this property. Hence the minimum
ϕ(G) = 0 then each column of G must be the all-zero or weight of the above code is a multiple of 4. However
the all-one vector because 0, 1, ω and ω2 can be combined the underlying Reed–Solomon code has already minimum
to zero under even parity only in these two ways. Hence, weight 5, and so our code must have minimum weight ≥ 8.
again the weight of G is at least 8 because we have an Finally we can apply Theorem 1 and find that it is the
even number of nonzero columns. binary Golay code.
Now assume that the parity of the first row of G is 1.
Then each column of G has either 1 or 3 nonzero entries. Its Remark 1. It is worth noting that Goldberg [12] has
weight will therefore be at least 8 unless each column has constructed the ternary Golay code as an image of an 9 -
exactly 1 nonzero entry. Exactly the zero-entries of ϕ(G) linear [6,3,4] code in a similar manner. Again an obvious
correspond to those columns of G having their nonzero mapping between 69 and 12 3 is used to map the [6,3,4] code
GOLAY CODES 889
into a [12,6] code, which finally turns out to be equivalent Case 1 is easily recognized by the fact that here s1 =
to the ternary Golay code. s3 = s9 = 0. Case 2 occurs exactly if s31 = s3 and s33 = s9 ,
which is easily verified considering the above mentioned
definitions. For the remaining cases we observe first
4. DECODING THE GOLAY CODES
that s31 = s3 . Furthermore we still have σ1 = s1 by
definition. Setting
There are various decoders for the (cyclic) binary and
ternary Golay codes; the most common are probably that s91 + s9
by Kasami and the systematic search decoder (both have D = (s31 + s3 )2 +
s31 + s3
been nicely described in Ref. 16). Focusing on the binary
Golay code G2 , we prefer to discuss an algebraic decoder
and verifying that D = (σ2 + s21 )3 , we easily get
that has been developed by M. Elia [9]. Even though this
decoder is not the fastest known, it is of interest because √
3
√
3
σ2 = s21 + D and σ3 = s3 + s1 D
it makes use of the algebraic structure of the code in
question. Furthermore in a modified form it has been
where in the finite field 211 the (unique) third root of D
used in order to decode ring-linear versions of the Golay
can is also computed as D1365 .
code [14].
All in all, the knowledge of s1 , s3 , and s9 can be used to
compute the error locator polynomial L (z) and by finding
4.1. Decoding the Binary [23,12,7] Code
its roots, we are able to solve for e and f.
Let α ∈ 211 be a root of the generator polynomial
g = x11 + x9 + x7 + x6 + x5 + x + 1 of G2 . It clearly satisfies Remark 1. It is worth noting that Elia and Viterbo
α 23 = 1, and its associated cyclotomic coset is given by have developed an algebraic decoder also for G3 [10]. This
B = {1, 2, 4, 8, 16, 9, 18, 13, 3, 6, 12}. This shows that g has decoder works in a similar fashion and makes use of two
roots α, α 3 and α 9 . syndromes. Even though it has to consider error values
As we are dealing with a cyclic code, we will represent in addition to error locations, it treats the entire problem
its words by polynomials in the sequel. Assume the word just by determining one polynomial.
r = fg + e has been received, where the Hamming weight
of e is at most 3. We compute the syndromes 5. GOLAY CODES AND FINITE
GEOMETRY — AUTOMORPHISM GROUPS
s1 : = r(α) = e(α), s3 : = r(α ) = e(α ), and
3 3
Corollary 1. The words of weight 7 of the binary Golay codewords than any known linear code of the same length
code hold a 4 − (23, 7, 1) design. and minimum distance.
Defining the Lee weight on 4 as wLee : 4 → , r →
For the extended binary Golay code, there is another min{|r|, |4 − r|}, we obtain what is called the Gray isometry
interesting result. of 4 into 22 as
6. GOLAY CODES AND RING-LINEAR CODES where the asterisks represent some binary entries.
From this matrix it can be seen that there are 32 = 212−7
One very important observation in algebraic coding codewords that have zeros in their first 8 positions.
theory in the early 1990s was the discovery of the Furthermore for every i ∈ {1, . . . , 7} there are 32 codewords
4 -linearity of the Preparata codes, the Kerdock codes, and that have a 1 in the ith and 8th positions. We define N
related families [15]. These had previously been known to be the (nonlinear) binary code to consist of the union
as notoriously nonlinear binary codes that had more of all these 8 · 32 words, where we have cut off the first 8
GOLAY CODES 891
entries. This is certainly a code of length 16. To determine Specifically, the normalized homogeneous weight
its minimum distance we observe that any word in this (defined in [4]) on 8 as
code results from truncation of a word of the Golay code
starting with either (0, . . . , 1 . . . , 0, 1) or (0, . . . , 0). Any 0, r=0
two words of these words therefore differ in at most two whom : 8 → , r → 2, r=4
entries in the first 8 positions, and since the minimum 1, else
distance of the Golay code is 8, the initially given words
must differ in at least 6 positions. Applying the sphere is such a weight. The according Gray map of this ring into
packing bound we see that the minimum distance is at 42 is given by
most 6, and overall this proves it to be given by 6.
γ : (8 , 2whom ) → (42 , wH )
Remark 3. In light of the statements at the beginning of r → r0 (0, 0, 1, 1) + r1 (0, 1, 0, 1) + r2 (1, 1, 1, 1)
this section we have questioned whether the binary Golay
code itself might enjoy a 4 -linear representation. This
where ri ∈ 2 are the coefficients of the binary represen-
has been answered to the negative in Ref. 15.
tation of r ∈ 8 , namely, r = r0 + 2r1 + 4r2 . Again we also
denote by γ its componentwise extension to n8 → 4n 2 .
6.2. The 4 -Linear Golay Code and the Leech Lattice Following the presentation in Ref. 7 Hensel lifting the
Bonnecaze et al. investigated 4 -linear lifts of binary generator polynomial of G2 to 8 [x] results in the poly-
codes in connection with lattices. To explain how this nomial x11 + 2x10 − x9 + 4x8 + 3x7 + 3x6 − x5 + 2x4 + 4x3 +
works for the binary Golay code, we first Hensel-lift its 4x2 + x − 1, which generates a free [23,12] code G8 over 8 .
generator polynomial f = x11 + x9 + x7 + x6 + x5 + x + 1 ∈ Extending the latter code by a parity check, we obtain a
2 [x] to the divisor free 8 -linear self-dual [24,12] code G8 .
Looking into Brouwer’s and Litsyn’s tables [3,17], it
F = x11 + 2x10 − x9 − x7 − x6 − x5 + 2x4 + x − 1 is remarkable that this code has more codewords than
does any presently known binary code of length 96 and
of x23 − 1 in 4 [x]. (Recall that according to Hensel’s minimum distance 24. Hence we have an outperforming
lemma [18, Sect. XIII.4] the polynomial F is the unique example of a nonlinear code that is constructed using the
polynomial that reduces to f modulo 2 and divides x23 − 1 binary Golay code.
in 4 [x].)
Extending the cyclic [23, 12] code G4 that is generated Remark 4. Using a similar technique it has been shown
by F, we obtain a [24,12] code G4 of minimal Lee weight that a non-linear ternary (36, 312 , 15) code can be
12. Defining the Euclidean weight on 4 as the squared constructed as the image of a 9 -linear lift of the ternary
Lee weight and extending it additively, we obtain the Golay code. This code is not outperforming, but so far no
minimum Euclidean weight as 16. ternary (36, 312 ) codes with a better minimum distance is
The so-called construction A provides a means to known. The details can be found in Ref. 13.
construct a lattice
from G4 , by setting
:= ν −1 (G4 ) BIOGRAPHY
wave design, the Loran C precision navigation sys- At every point these two power spectra add to 20, as
tem, channel-measurement, optical time-domain reflec- required by (4). √
tometry [3], synchronization, spread-spectrum communi- It should be stressed that the bound of 2N on the
cations, and, recently, orthogonal frequency division mul- amplitude of a(z) on |z| = 1 is extremely low for any
tiplexing (OFDM) systems [4–7]. Initially, the properties bipolar sequence of length greater or equal to about 16.
of the pair were primarily exploited [1] in a two-channel One would not find such sequences by chance, and the
setting, and periodic GCPs have lately been proposed for complementary sequence/aperiodic correlation approach
two-sided channel-estimation, where the two sequences in is currently the only construction method known that
the pair form a preamble and postamble training sequence, tightly upper bounds these values for bipolar sequences.
respectively [8]. In recent years, the spectral spread prop- There is also the Rudin-Shapiro (RuS) construction [9],
erties of each individual sequence in the pair have also which appeared soon after Golay’s initial work, but the
been used. As an example of this, we briefly describe the RuS construction can be viewed as a basic recursive Golay
application of Golay sequences in OFDM. Here, given a construction technique. This construction is described
data sequence, a = (a0 , a1 , . . . , aN−1 ), the transmitted sig- in Section 2.2. Indeed, research on the uniformity of
nal sa (t) as a function of time t is essentially the real part polynomials on the unit circle has continued largely
of a discrete Fourier transform (DFT) of a: independently in the mathematical community for many
years [10–13], and this work indicates that sequences
N−1
with good AACFs and flat DFT spectra, or equivalently,
sa (t) = ai e2π j(if +f0 )t (5)
i=0
polynomials that are approximately uniform on |z| = 1, are
rather difficult to construct. For example, the celebrated
where f the frequency separation between adjacent conjecture of Littlewood on flat polynomials on the unit
subcarrier pairs and f0 is the base frequency. Notice circle is still open:
that |sa (t)| = |a(e2π jift )| where a(z) is the polynomial
corresponding to a. Thus, the power characteristics of the Conjecture 1 [12]. There exist a pair of constants C0 ,
OFDM signal can be studied by examining the behavior C1 and a series of degree N − 1 polynomials a(z) with ±1
of an associated polynomial on |z| = 1. In particular, if coefficients such that, as N → ∞,
a is a GCS, then we have that |sa (t)|2 ≤ 2N so that the
peak-to-mean envelope power ratio (PMEPR) of the signal √ √
C0 N ≤ |a(z)| ≤ C1 N, |z| = 1 (6)
is at most 2.0. Having such tightly bounded OFDM signals
eases amplifier specification at the OFDM transmitter. There is no known construction that produces polynomials
Let A = (A0 , A1 , . . . , AN −1 ) be the N -point oversampled satisfying the lower bound of Conjecture 1, and the
DFT of a, where N ≥ N, i.e., complementary sequence approach is the only one known
N−1 that gives polynomials satisfying the upper bound.
Ak = ai ωik = a(ωk ) 0 ≤ k < N ,
i=0
2. EXISTENCE AND CONSTRUCTION
where ω = e2π j/N is a complex N -th root of unity. For N
large, the values of the N -point oversampled DFT of a can 2.1. Necessary Conditions
be used to approximate the values a(z), |z| = 1, and thus
the complex OFDM signal in (5). As we shall see in the next section, GCPs are known to
exist for all lengths N = 2α 10β 26γ , α, β, γ ≥ 0, [14]. GCPs
Example 1. Let a = − + + − + − + + +−, b = − + + + are not known for any other lengths. Golay showed that
+ + + − −+, where ‘+’ and ‘−’ mean 1 and −1, respectively. the length N of a Golay sequence must be the sum
The AACFs of a and b are: of two squares (where one square may be 0) [2]. More
recently, it has been shown that GCPs of length N do not
ρa (k) = (10, −3, 0, −1, 0, 1, 2, −1, −2, 1),
exist if there is a prime p with p = 3(mod 4) such that
ρb (k) = (10, 3, 0, 1, 0, −1, −2, 1, 2, −1). p | N, [16]. This generalized earlier, weaker nonexistence
results. Therefore, the admissible lengths <100 are
It is evident that the AACFs of a and b sum to a δ-
function, as required by (2) and (3), so (a, b) is a GCP. The 1, 2, 4, 8, 10, 16, 20, 26, 32, 34∗ , 40, 50∗ , 52, 58∗ ,
absolute squared values of the 20-point oversampled DFT
of a and b are: 64, 68∗ , 74∗ , 80, 82∗
A = 10 · (0.40, 0.44, 0.15, 0.73, 1.85, 0.20, 1.05, Moreover, various computer searches have eliminated
1.67, 0.95, 1.96, 1.60, 1.96, 0.95, 1.67, lengths marked by ‘‘∗’’ in the preceding list. Therefore,
the lengths, N < 100, for which GCPs exist are:
1.05, 0.20, 1.85, 0.73, 0.15, 0.44)
B = 10 · (1.60, 1.56, 1.85, 1.27, 0.15, 1.80, 0.95, 1, 2, 4, 8, 10, 16, 20, 26, 32, 40, 52, 64, 80 (7)
Repeated application of Turyn’s construction, beginning The action of these symmetries on a GCP generates a set
with pairs of lengths 2, 10, and 26 given in Section 2.4, can of GCPs of size at most 64, which we call the conjugates of
be used to construct GCPs for all lengths N = 2α 10β 26γ , the original GCP. These symmetries can aid in computer
α, β, γ ≥ 0. searches for new GCPs.
We define a primitive GCP to be one that cannot
2.3. Direct Constructions be constructed from any shorter GCP by means of the
recursive constructions described in Section 2.2. Primitive
In Ref. 2, Golay gave a direct construction for GCPs
GCPs are only known to exist for lengths 2, 10, and 26.
of length N = 2m . Reference [4] gave a particularly
There is one primitive pair for lengths 2 and 26, and
compact description of this construction by using algebraic
two primitive pairs for length 10, up to equivalence via
normal forms (ANFs). With a Boolean function a(x) =
the above symmetry operations. These primitive pairs are
a(x0 , x1 , . . . , xm−1 ) in m variables, we associate a length as follows:
2m sequence a = (a0 , a1 , . . . , a2m −1 ), where
(++, +−);
m−1
ai = (−1)a(i0 ,i1 ,...,im−1 ) , i= ik 2k (−++−+−+++−, −++++++−−+);
k=0
(+−+−++++−−, ++++−++−−+);
Thus the i-th term of the sequence a is obtained by
(+−++−−+−−−−+−+−−−−++−−−+−+,
evaluating the function a at the 2-adic decomposition of
i. Then Ref. 4 showed that, for any permutation π of −+−−++−++++−−−−−−−++−−−+−+).
GOLAY COMPLEMENTARY SEQUENCES 895
Golay points out that the two GCPs of length 10 are in Ref. 18. Let a be any length N sequence. Then the merit
equivalent under decimation [2]. Specifically, the second factor of a is defined to be
pair of length 10 above is obtained from the first pair
by taking successive 3rd sequence elements, cyclically. N2
F(a) = (13)
There is no proof that more primitive pairs cannot exist
N−1
for N > 100, but none have been discovered for 40 years 2 |ρa (k)| 2
Table 1. The Number of Golay Complementary Pairs for All Lengths, N < 100
N 1 2 4 8 10 16 20 26 32 40 52 64 80
#GCPs [17] 4 8 32 192 128 1536 1088 64 15360 9728 512 184320 102912
896 GOLAY COMPLEMENTARY SEQUENCES
m-sequences for application as pseudo-noise sequences 2048-pt power DFT for length 127 m -sequence
due to their superior aperiodic spectral properties [22]. 3
Also, there are more Golay sequences than m-sequences.
Figure 1 lends some support for this view, where the 2.5
Fourier spectra (from left to right) of a length 127 m-
sequence and a length 127 shifted-Legendre sequence are
2
compared with that of a length 128 RuS sequence.
Budisin [22] proposed a highly efficient method to
perform correlation of an incoming data stream with a 1.5
Golay sequence of length N, which achieves a complexity
of 2 log2 (N) operations per sample, as opposed to N 1
operations per sample for direct correlation. To do this,
he interpreted the Golay construction for length N = 2m 0.5
sequences using delay to implement concatenation, i.e.,
a | b can be implemented as a[k] + b[k + D], where [k]
0
indicates the starting time of a and D is the length 0 200 400 600 800 1000 1200 1400 1600 1800 2000
(duration) of a. Implementing the recursion of (8) is then
achieved by serially combining delay elements, Di , of 2048-pt power DFT for length 127 legendre sequence
duration 2i , as shown in Fig. 2. Now we commence our 3
Rudin–Shapiro recursion with a = b = 1. In other words,
we input the δ function to the left-hand side of Fig. 2, and 2.5
output our GCP, (a , b ), on the right. So the pair of Golay
sequences realized by Fig. 2 are two impulse responses. We
2
can therefore reinterpret Fig. 2 as a filter that correlates a
received sequence, input from the left, with the reversals
of a and b. By choosing the ωi in Fig. 2 from {1, −1}, we 1.5
can choose to correlate with different length 2m GCPs.
1
1
In terms of polynomials, this is equivalent to
0.5
T
|ai (z)| = TN,
2
|z| = 1 (17)
i=1 0
0 200 400 600 800 1000 1200 1400 1600 1800 2000
Figure 1. Power spectra for length 127 m-sequence, length 127
Thus, all the DFT components √of a sequence a that lies in shifted-Legendre, and length 128 Rudin-Shapiro sequences,
a T-CSS are of size at most TN. Of course, a 2-CSS is (power on y-axis, spectral index on x-axis).
just a Golay complementary pair.
To date, little work has been done to formally establish
a a´
primitivity conditions for T-CSS, T > 2, although Golay d + + +
already found 4-CSS in Ref. 1. It can be shown that CSS
only occur for T even, and Turyn showed that 4-CSS are
D0 − D1 − D2 −
only admissible at lengths N if N is a sum of at most three b w0 w1 w2 b´
squares [14]. Dokovic later showed that 4-CSS exist for all
even N < 66 [24]. Tseng and Liu [23] showed that for a Figure 2. Fast Golay correlator.
GOLAY COMPLEMENTARY SEQUENCES 897
CSS of odd length N, T must be a multiple of 4. By way of permutations of O and as point-multiplication of rows of
example, here are 4-CSS of lengths 3, 5, 7: O by a constant vector (these operations maintain the
orthogonality of O).
{+++, −++, +−+, ++−} The following theorem combines T-CSS of different
lengths to build T -CSS, where T > T.
{+−−−−, −++−+, +−−−+, −−−+−}
{+++−+++, +−+++−−, +−−+−++, ++−+−−−} Theorem 5 [25]. Suppose there exist T0 -CSS, T1 -CSS,
. . ., Tt−1 -CSS, of lengths N0 , N1 , . . . , Nt−1 , respectively.
Thus, 4-CSS can exist at lengths N where 2-CSS cannot. Let T = lcm(T0 , T1 , . . . , Tt−1 ). Suppose there also exists
Turyn presented constructions for 4-CSS for all odd an S × T orthogonal matrix with ±1 entries. Then there
lengths N ≤ 33, and N = 59 [14]. exists an ST-CSS of length N = N0 + N1 + . . . + Nt−1 .
An orthogonal matrix is defined as a matrix whose
columns are pairwise orthogonal. The following theorem As with GCPs, CSS also have applications to OFDM:
is straightforward. a primary drawback with the proposal to use the set of
length 2m GCPs as a codeset for OFDM is that the code
Theorem 3 [23]. Let P be a T × N orthogonal matrix, rate of the set rapidly decreases as m increases. To obtain
N ≤ T. Then the rows of P form a T-CSS of length N. a larger codeset, one can consider sequences that lie in
T-CSS for some T > 2. The resulting codeset will have
The primitive GCP (++, +−), is an example of PMEPR at most T.
Theorem 3. When the elements of P are ±1 and N = T,
then P is a Hadamard matrix, so a subset of T-CSS is 5.3. Complementarity with Respect to a Larger Set of
provided by the set of Hadamard matrices. Transforms
Although CS pairs and, more generally, CSS, are usually
5.2. Symmetries and Constructions defined to be complementary with respect to their AACFs
The symmetries and constructions for GCPs given in (with a corresponding property on power spectra under the
Sections 2.2 and 2.3 generalize to give constructions for T- one-dimensional DFT), one can more generally define and
CSS. One can also construct T -CSS by combining T-CSS, discover sets that are complementary with respect to any
T > T. As one example, we have: specified transform. It can be shown, Ref. 26, that Golay
CSS of length 2m , as constructed using Theorem 5, have a
• Let (u, v) and (x, y) be GCPs of length N0 and N1 , very strong property:
respectively. Then, (a, b, c, d) is a 4-CSS of length
N0 + N1 , where Theorem 6. Let U be a 2 × 2 complex-valued matrix
such that UU† = 2I, where I is the 2 × 2 identity matrix,
a = (u | x), b = (u | −x), c = (v | y), d = (v | −y) ‘†’ means transpose-conjugate, and the elements of U, uij ,
satisfy |u00 | = |u01 | = |u10 | = |u11 |. Let {Uk , 0 ≤ k < m} be
A fundamental recursive construction generalizing a set of any m of these matrices. Define M = U0 ⊗ U1 ⊗
Golay’s recursive constructions and relating CSS to · · · ⊗ Um−1 . Let N = 2m and let (a0 , a1 , . . . , aT−1 ) be any
orthogonal matrices is given in Ref. 23. T-CSS of length N constructed by Theorem 5. Finally, let
AiM = Mai be the N-point spectrum of a with respect to
Theorem 4 [23]. Let (a0 , a1 , . . . , aT−1 ) be a T-CSS of Ma with elements, AM k,i , 0 ≤ k < N. Then:
length N, represented by a T × N matrix, {F}, with
rows aj . Let O = (oik ) be an S × T orthogonal matrix (so
T−1
spectrum, (see, e.g., Ref. 27). It is known that for m 60,61,63–66,72,73,75,78,80,81,90,96,97,100 Polyphase 4-
even, length 2m GCPs are bent, that is, have a completely CSS: All lengths except 71,89 T > 4: All lengths.
flat WHT spectrum [28]. Reference [26] shows that Golay A recent exhaustive search [29] found all quadriphase
constructions generate a large set of GCPs and CSS that Golay pairs up to length N = 13. These are summarized
also have a relatively flat spectrum with respect to the below where only those for which no construction is known
WHT, among other transforms. So Golay CSS may have have been counted. These are the possible primitive pairs.
applications to cryptography. The figures also omit the GCPs that are a subset of
quadriphase pairs:
5.4. CSS Mates Golay pairs over the alphabet {0, 1, −1} have been found
for all lengths, N. For such a set, the weight of the set
Let A = (a0 , a1 , . . . , aT−1 ) and B = (b0 , b1 , . . . , bT−1 ) be two
becomes an important extra parameter. The weight W is
T-CSS. Then A and B are called ‘‘mates’’ if ai and bi are
the sum of the in-phase
AACF coefficients, i.e., for a set
orthogonal as vectors for each i. We say that A and B are
(aj ), we have W = ρ0 (aj ). For example, here is a 4-CSS
‘‘mutually orthogonal CSS’’ (although, in general, ai is not
j
orthogonal to bj , i = j). Sets (A0 , A1 , . . . , AU−1 ) of pairwise of weight 7 and length 7 over the alphabet {0, 1, −1}:
mutually orthogonal CSS can be recursively constructed
in a similar way to CSS [23]. {+000 + 00, 0 + 0 + 0 − 0, 00 + 0000, 000000+}
6. COMPLEMENTARY SEQUENCES OVER LARGER The larger W, the closer is the CSS to one over a bipolar
ALPHABETS alphabet.
Yet more CSS can be found by considering multilevel
Virtually all the symmetries and constructions mentioned and QAM alphabets.
so far for bipolar sequences can be generalized to sequences
over other alphabets, but note that autocorrelation is now 7. HADAMARD MATRICES FROM COMPLEMENTARY
modified to include conjugacy, i.e., Eq. (1) is modified to, SEQUENCES
N−k−1 In Section 5, we showed that Hadamard matrices can be
ρa (k) = ai a∗i+k , 0≤k≤N−1 (21) used to construct CSS. The converse is also true: CSS can
i=0 be used to construct Hadamard matrices. Here, we present
the two best-known constructions, where Theorems 7
where ∗ means ‘‘complex conjugate.’’ and 8 use the periodic complementary property of a
For quadriphase CSS, the symmetry operations gen- complementary set (see Section 8).
erate an equivalence class of up to 1,024 sequences [29].
For polyphase pairs, unlike the GCP case, there is no Theorem 7 [32]. Let (a, b) be a GCP of length N. Let A
restriction that the length must be the sum of two and B be N × N circulant matrices with first rows a and
squares. Sivaswamy and Frank investigated and dis- b, respectively. Then,
covered many polyphase CSS, including those of odd
length [30,31]. The simplest polyphase T-CSS of length A −B
is a 2N × 2N Hadamard matrix
T is formed from the rows of the T-point DFT matrix. BT AT
In fact, from Theorem 3, the rows of any T × T orthog-
onal polyphase matrix form a T-CSS of length T. Theorem 7 can be generalized to quadriphase Hadamard
Sivaswamy [30] identified a length 3 quadriphase prim- matrices by making (a, b) a quadriphase Golay pair and
itive pair, (002, 010) (where 0, 1, 2, 3 mean i0 , i1 , i2 , i3 , by substituting conjugation for transpose.
respectively), derived quadriphase versions of GCPs and
synthesized sequence pairs of lengths 3.2k . Frank [31] Theorem 8 [14,15]. Let (u, v) and (x, y) be GCPs
further presented the following primitive quadriphase of lengths N0 and N1 , respectively. Then, a = u | x,
Golay pairs, of lengths 5 and 13: (01321, 00013), and b = u | −x, c = v | y, and d = v | −y form a length N =
(0001200302031, 0122212003203). Note that the lengths N0 + N1 4-CSS. Let A, B, C, D be N × N circulant matrices
here, 5 and 13, are half the length of the lengths 10 and with first rows a, b, c, d, respectively. Let R be a back-
26 primitive bipolar GCPs, but no transform is known circulant N × N permutation matrix. Then,
between the sets.
A −BR −CR −DR
Davis and Jedwab [4] and Paterson [7] constructed
BR A −DT R CT R is a 4N × 4N
many CSS with phase alphabet 2h and any even phase Goethals–Seidel
CR DT R A −BT R
alphabet, respectively. To do so, they worked with (Hadamard) matrix.
nonbinary generalizations of the Reed–Muller codes. DR −CT R BT R A
The resulting sequences have application to OFDM with
nonbinary modulation. 8. PERIODIC AND ODD-PERIODIC (NEGAPERIODIC)
Many polyphase 3-CSS exist. For example, Frank [31] COMPLEMENTARY SEQUENCES
presented the triphase 3-CSS (01110,11210,00201) and
provided a (possibly nonexhaustive) list of lengths, N, Researchers have recently become interested in construct-
for which a CSS exists, N up to 100: Polyphase 3-CSS: ing periodic and/or odd-periodic CSS. Periodic CSS were
1–22,24–27,30,32,33,36,37,39–42,45,48,49,51–54,57,58, considered in Ref. 33.
GOLAY COMPLEMENTARY SEQUENCES 899
Using polynomial form, periodic autocorrelation 1995 to 1998 he was a postdoctoral researcher in the
(PACF) of the length N sequence a can be expressed as Telecommunications Research Group at the University
of Bradford, U.K., researching into coding for peak fac-
PACF(a(x)) = a(x)a(x−1 )xN−1 tor reduction in OFDM systems. Since 1998 he has been
working as a postdoctoral researcher with the Coding and
where ‘∗M ’ reduces ∗ mod M. Cryptography Group at the University of Bergen, Norway.
Similarly, negaperiodic (odd-periodic) autocorrelation He has published on residue number systems, number-
(NACF) can be expressed as theoretic transforms, complementary sequences, sequence
design, quantum entanglement, coding for peak power
NACF(a(x)) = a(x)a(x−1 )xN+1 reduction, factor graphs, linear cryptanalysis, and VLSI
implementation of Modular arithmetic.
There is a simple relationship relating aperiodic, periodic,
and odd-periodic AACF, as follows:
Let a(x) represent a length N sequence. Then, the Kenneth G. Paterson received a B.Sc. (Hons) degree in
AACF of a can be computed, via the Chinese remainder mathematics in 1990 from the University of Glasgow and
theorem, as a Ph.D. degree, also in mathematics, from the University
of London in 1993. He was a Royal Society Fellow
xN + 1 at Institute for Signal and Information Processing at
a(x)a(x−1 ) = a(x)a(x−1 )xN−1 the Swiss Federal Institute of Technology, Zurich, from
2
1993 to 1994, investigating algebraic properties of block
xN − 1
− a(x)a(x−1 )xN+1 (22) chipers. He was then a Lloyd’s of London Tercentenary
2 Foundation Fellow at Royal Holloway, University of
Equation (22) expresses AACF in terms of PACF and London from 1994 to 1996, working on digital signatures.
NACF. It follows that He joined the mathematics group at Hewlett-Packard
Laboratories Bristol in 1996, becoming project manager
of the Mathematics, Cryptography and Security Group in
• A T-CSS is also a periodic and a negaperiodic T-CSS.
1999. While at Hewlett-Packard, he worked on a wide
• A set of length T sequences is only a T-CSS if it is variety of pure and applied problems in cryptography and
both a periodic and negaperiodic T-CSS. communications theory. In 2001, he joined the Information
Security Group at Royal Holloway, University of London,
Periodic and negaperiodic CSS can be used instead
becoming a Reader in 2002. His areas of research
of, say, m-sequences, for their desirable correlation
are sequences, the mathematics of communications, and
and spectral properties. They are much easier to find
cryptography and cryptographic protocols.
than aperiodic CSS due to their algebraic structure
via embedding in a finite polynomial ring. Moreover,
in a search for aperiodic CSS, an initial sieve can be C. Tellambura received his B.Sc. degree with honors from
undertaken by first identifying periodic and negaperiodic the University of Moratuwa, Sri Lanka, in 1986; his M.Sc.
CSS and then looking for the intersection of the two sets. in electronics from the King’s College, U.K., in 1988; and
It is known that periodic GCP do not exist for lengths his Ph.D. in electrical engineering from the University
36 and 18, respectively. of Victoria, Canada, in 1993. He was a postdoctoral
We have the following theorem. research fellow with the University of Victoria and the
University of Bradford. Currently, he is a senior lecturer
Theorem 9 [34]. If a length N periodic GCP exists, such at Monash University, Australia. He is an editor for the
that N = p2l u, p = u, p prime, p = 3(mod 4), then u ≥ 2pl . IEEE Transactions on Communications and the IEEE
Journal on Selected Areas in Communications (Wireless
Dokovic [24] discovered a periodic GCP of length 34. Communications Series). His research interests include
This is significant because no aperiodic GCP exists at that coding, communications theory, modulation, equalization,
length. The next unresolved case for periodic GCPs is at and wireless communications.
length 50. References 33 and 24 also present many T-CSS
for T > 2.
Lüke [35] has found many odd-periodic GCPs for even BIBLIOGRAPHY
lengths N where an aperiodic GCP cannot exist, e.g.,
u+1
N = p 2 , p an odd prime, and also for all even N, N < 50. 1. M. J. E. Golay, Multislit spectroscopy, J. Opt. Soc. Amer. 39:
He did not find any odd-length pairs. 437–444 (1949).
2. M. J. E. Golay, Complementary series, IRE Trans. Inform.
BIOGRAPHIES Theory IT-7: 82–87 (1961).
3. M. Nazarathy et al., Jr., Real-time long range complementary
Matthew G. Parker received a B.Sc. in electrical and correlation optical time domain reflectometer, IEEE J.
electronic engineering in 1982 from University of Manch- Lightwave Technol. 7: 24–38 (1989).
ester Institute of Science and Technology, U.K. and, in 4. J. A. Davis and J. Jedwab, Peak-to-mean power control in
1995, a Ph.D. in residue and polynomial residue num- OFDM, Golay complementary sequences and Reed-Muller
ber systems from University of Huddersfield, U.K. From codes, IEEE Trans. Inform. Theory IT-45: 2397–2417 (1999).
900 GOLD SEQUENCES
5. R. D. J. van Nee, OFDM codes for peak-to-average power Symp. Inform. Theory, Lausanne, Switzerland, June 30–July
reduction and error correction, in IEEE Globecomm 1996 5, 2002.
(London, U.K., Nov. 1996), pp. 740–744. 27. M. Matsui, Linear cryptanalysis method for DES, in Advances
6. H. Ochiai and H. Imai, Block coding scheme based on in Cryptology — Eurocrypt93, Lecture Notes in Computer
complementary sequences for multicarrier signals, IEICE Science, Vol. 765, pp. 386–397, Springer, 1993.
Trans. Fundamentals E80-A: 2136–2143, (1997). 28. F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-
7. K. G. Paterson, Generalized Reed-Muller codes and power Correcting Codes, Amsterdam, North-Holland, 1977.
control in OFDM modulation, IEEE Trans. Inform. Theory 29. W. H. Holzmann and H. Kharaghani, A computer search for
IT-46: 104–120 (2000). complex Golay sequences, Aust. J. Comb. 10: 251–258 (1994).
8. P. Spasojevic and C. N. Georghiades, Complementary se- 30. R. Sivaswamy, Multiphase complementary codes, IEEE
quences for ISI channel estimation, IEEE Trans. Inform. Trans. Inform. Theory IT-24: 546–552 (1978).
Theory IT-47: 1145–1152 (2001).
31. R. L. Frank, Polyphase complementary codes, IEEE Trans.
9. H. S. Shapiro, Extremal Problems for Polynomials, M.S. Inform. Theory IT-26: 641–647 (1980).
Thesis, M.I.T., 1951.
32. C. H. Yang, On Hadamard matrices constructible by circulant
10. J. Beck, Flat polynomials on the unit circle — note on a submatrices, Math. Comp. 25: 181–186 (1971).
problem of Littlewood, Bull. London Math. Soc. 23: 269–277
33. L. Bomer and M. Antweiler, Periodic complementary binary
(1991).
sequences, IEEE Trans. Inform. Theory IT-36: 1487–1494
11. J. Kahane, Sur les polynomes à coefficients unimodulaires, (1990).
Bull. London Math. Soc. 12: 321–342 (1980).
34. K. T. Arasu and Q. Xiang, On the existence of periodic comple-
12. J. E. Littlewood, On polynomials ±zm , exp(αm )zm , mentary binary sequences, Designs, Codes and Cryptography
z = eiθ , J. London Math. Soc. 41: 367–376 (1966). 2: 257–262 (1992).
13. W. Rudin, Some theorems on Fourier coefficients, Proc. Amer. 35. H. D. Lüke, Binary odd-periodic complementary sequences,
Math. Soc. 10: 855–859 (1959). IEEE Trans. Inform. Theory IT-43: 365–367 (1997).
14. R. Turyn, Hadamard matrices, Baumert-Hall units, four-
symbol sequences, pulse compression, and surface wave
encodings, J. Comb. Theory Ser. A 16: 313–333 (1974). GOLD SEQUENCES
15. C. H. Yang, Hadamard matrices, finite sequences, and
polynomials defined on the unit circle, Math. Comp. 33: HABONG CHUNG
688–693 (1979). Hongik University
16. S. Eliahou, M. Kervaire, and B. Saffari, A new restriction on Seoul, Korea
the lengths of Golay complementary sequences, J. Comb.
Theory Ser. A 55: 49–59 (1990).
1. INTRODUCTION
17. P. Borwein and R. A. Ferguson, A complete description
of Golay pairs for lengths up to 100, in prepara-
In such applications as CDMA communications, ranging,
tion, preprint available [Online]. Simon Fraser University,
and synchronization, sequences having good correlation
http://www.cecm.sfu.ca/∼pborwein/ [May, 2002].
properties have played an important role in signal designs.
18. M. J. E. Golay, Sieves for low autocorrelation binary When a single sequence is used for the purpose of
sequences, IEEE Trans. Inform. Theory IT-23: 43–51 (1977). synchronization, it must possess a good autocorrelation
19. T. Høholdt, H. E. Jensen, and J. Justesen, Aperiodic correla- property so that it can be easily distinguished from
tions and merit factor of a class of binary sequences, IEEE its time-delayed versions. Similarly, when a set of
Trans. Inform. Theory IT-31: 549–552 (1985). sequences are used for CDMA communications systems or
20. T. Høholdt and H. E. Jensen, Determination of the merit multitarget ranging systems, the sequences should exhibit
factor of Legendre sequences, IEEE Trans. Inform. Theory good cross-correlation properties so that each sequence is
IT-34: 161–164 (1988). easy to distinguish from every other sequence in the set.
21. P. Borwein and M. Mossinghoff, Rudin-Shapiro like polyno- Many individual sequences as well as families of sequences
mials in L4 , Math. Comp. 69: 1157–1166 (2000). with desirable correlation properties have been found and
22. S. Z. Budisin, Efficient pulse compressor for Golay comple-
reported. The Gold sequence family [1] is one of the oldest
mentary sequences, Elec. Lett. 27: 219–220 (1991). and best-known families of binary sequences with optimal
correlation properties. This article focuses on the binary
23. C.-C. Tseng and C. L. Liu, Complementary sets of sequences,
Gold sequences and their relatives. After briefly reviewing
IEEE Trans. Inform. Theory IT-18: 644–651 (1972).
basic definitions, some of the well-known bounds on the
24. D. Z. Dokovic, Note on periodic complementary sets of binary correlation magnitude, and cross-correlation properties of
sequences, Designs, Codes and Cryptography 13: 251–256 m sequences in Sections 2 and 3, we discuss the Gold
(1998). sequences and Gold-like sequences in Section 4.
25. K. Feng, P. J.-S. Shiue, and Q. Xiang, On aperiodic and A longer and more detailed overview on the subject of
periodic complementary binary sequences, IEEE Trans. Inf. the well-correlated sequences in general can be found in
Theory 45: 296–303 (1999). Chapter 21 of the book by Pless and Huffman [11]. The
26. M. G. Parker and C. Tellambura, A construction for binary reader can also find the article by Sarwate and Pursley [3]
sequence sets with low peak-to-average power ratio, Int. to be an excellent survey.
GOLD SEQUENCES 901
n/m−1
m·i
The maximum of θa and θc is called the maximum Trnm (x) = xq
correlation parameter θmax of the set S: i=0
From Eq. (2), one can obtain a simple lower bound on θmax s(t) = Trn1 (aα t ) (7)
as
M−1 where a is some nonzero element in Fqn and α is a
θmax ≥ N (3)
NM − 1 primitive element of Fqn . Note that the m sequence in
(7) is not complex-valued. Its symbols are the elements of
Other than the bound in (3), various lower bounds on θmax the finite field Fq . When q is a prime, the natural way of
in terms of N and M are known. The following bound is converting this finite-field sequence s(t) into a complex-
due to Welch [8]. valued sequence is taking ωs(t) , where ω is the primitive
902 GOLD SEQUENCES
qth root of unity, ej2π/q . For example, an m sequence s(t) of the pair of m sequences s(t) and s(dt) is called a preferred
length 7 is given by pair. Note that either when n = 2m or n = 2m + 1, the
above three values become −1, −1 + 2m+1 , and −1 − 2m+1 .
s(t) = Tr31 (α t ) = 1001011 Theorem 3 can be applied to obtain a preferred pair as long
as n ≡ 0 (mod 4). In the case when n is odd, selecting k
when α is the primitive element of F8 having minimal relatively prime to n yields a preferred pair, and in the case
polynomial x3 + x + 1. This m sequence s(t) is easily when n ≡ 2 (mod 4), making e = 2 also yields a preferred
converted to its complex-valued counterpart: pair. When n ≡ 0 (mod 4), Calderbank and McGuire [12]
proved the nonexistence of a preferred pair.
(−1)s(t) = − + + − + − −
4. GOLD SEQUENCES AND GOLD-LIKE SEQUENCES
An m sequence possesses many desirable properties
such as balance property, run property, shift-and-add Consider the set G of (N + 2) sequences constructed from
property, and ideal autocorrelation property [15]. Given two binary sequences u(t) and v(t) of length N given
a finite field Fqn , there are φ(qn − 1)/n cyclically distinct m as follows:
sequences whose symbols are drawn from Fq . Each of them
corresponds to different primitive element α values with G = {u(t), v(t), u(t) + v(t + i) | i = 0, 1, . . . , N − 1} (10)
different minimal polynomials. Thus, in other words, each
of the cyclically distinct m sequences of given length can be Especially when both u(t) and v(t) are m sequences, the
viewed as the decimation s(dt) of a given m sequence s(t) cross-correlation function between any two members in the
by some d relatively prime to qn − 1. The cross-correlation set becomes either the cross-correlation function between
properties between an m sequence and its decimation are u(t) and v(t), or simply the autocorrelation function of m
very important, since many sequence families, including sequence u(t) or v(t), due to the shift-and-add property of
the Gold sequence family, having optimal cross-correlation m sequences. In the late 1960s, Gold used this method
properties, are constructed from pairs of m sequences. to construct the set called Gold sequences family. Gold
Now, consider the case of q = 2, that is, of binary m sequences family is defined as the set G when u(t) and
sequences. Without loss of generality, we can assume that v(t) are preferred pair of m sequences of length 2n − 1.
an m sequence s(t) is given by The cross-correlation values of the Gold sequence family
can be directly computed from Theorem 3. Applying the
s(t) = Trn1 (α t ) set construction method above with u(t) = Trn1 (α dt ) and
v(t) = Trn1 (α t ), the pair of m sequences in Theorem 3, one
Let θ1,d (τ ) be the cross-correlation function between s(t) can easily construct the set W (referred to here as the
and its d-decimation s(dt), where d is some integer sequences family) of size 2n + 1 as follows
relatively prime to 2n − 1:
W = {wi (t) | 0 ≤ i ≤ 2n }
N−1
θ1,d (τ ) = (−1)s(t+τ )+s(dt) (8) where
t=0 n (t+i)
Tr1 (α ) + Trn1 (α dt ), for 0 ≤ i ≤ 2n − 2
From previous research, the values θ1,d (τ ) have been wi (t) = Trn1 (α dt ), for i = 2n − 1
known for various d [11], although the complete evaluation
n t
Tr1 (α ), for i = 2n
of θ1,d (τ ) for each possible d is still ongoing. One well-
known result on θ1,d (τ ) is that θ1,d (τ ) takes on at least As mentioned in the previous section, when n is odd and
three distinct values as τ varies from 0 to 2n − 2, as e = 1, the two m sequences Trn1 (α t ) and Trn1 (α dt ) in W are
long as the decimation s(dt) is cyclically distinct to s(t). preferred pair, and in this case, the family W becomes the
One can obtain two examples of such decimation d from Gold sequence family. Then θmax is given by
the following theorem which is in part due to Gold [1],
Kasami [4], and Welch [10]. θmax = 2(n+1)/2 + 1
Theorem 3. Let e = gcd(n, k) and ne be odd. Let d = When we apply the Sidelnikov bound in Eq. (5) with k = 1,
2k + 1 or d = 22k − 2k + 1. Then the cross-correlation N = 2n − 1, and M = 2n + 1, we have θmax 2
> 2n+1 − 2,
θ1,d (τ ) of m sequence Trn1 (α t ) and its decimated sequence specifically, θmax ≥ 2(n+1)/2
. But, since N is odd, θmax must
Trn1 (α dt ) by d takes on the following three values: be odd. Therefore, the Sidelnikov bound tells us that
−1 + 2(n+e)/2 , 2n−e−1 + 2(n−e−2)/2 times θmax ≥ 2(n+1)/2 + 1 (11)
−1, 2n − 2n−e − 1 times (9)
−1 − 2(n+e)/2 , 2n−e−1 − 2(n−e−2)/2 times which, in turn, implies that the Gold sequence family is
optimal with respect to the Sidelnikov bound when n is
When θ1,d (τ ) takes on the following three values odd. On the other hand, the Gold sequence family in the
case when n ≡ 2 (mod 4) is not optimal, since the
√ actual
−1, −1 + 2(n+2)/2 , −1 − 2(n+2)/2 θmax in this case is 2(n/2)+1 , which is roughly 2 times
GOLD SEQUENCES 903
the bound in (11). Finally, when n ≡ 0 (mod 4), no Gold be a family of 2n + 1 binary sequences of length N = 2n − 1.
sequence family exists, since no preferred pair exists. Then the cross-correlation function of the sequences in N
The following example shows the Gold sequence family takes on the following four values:
of length 31.
{−1 + 2m+1 , −1 + 2m , −1, −1 − 2m }
Example 1: Gold Sequence of Length 31. There are
Note that compared to the Gold-like sequences in
32 sequences of length 31 in the set. By taking α as
Example 2, the Niho family has one more sequence in
the primitive element in F25 having minimal polynomial
the set and slightly smaller θmax .
x5 + x2 + 1, we have
Udaya [14] introduced the family of binary sequences
for even n with five-valued cross-correlation property as in
w32 (t) = Tr51 (α t ) = 1001011001111100011011101010000 the following definition and theorem.
The set construction method for the families Ge and Go of Southern California in 1985 and 1988, respectively.
is identical to that of the Gold sequence family. In other From 1988 to 1991, he was an Assistant Professor in
words, the sets Ge and Go are of the same type as G in the Department of Electrical and Computer Engineering,
Eq. (10). The difference is that in Ge and Go , the sequence the State University of New York at Buffalo. Since 1991,
u(t) is the sum of many m sequences, whereas it is a single he has been with the School of Electronic and Electrical
m sequence in the Gold sequence family. For this reason, Engineering, Hongik University, Seoul, Korea, where he is
the linear span of the sequences in the families Ge and Go a professor. His research interests include coding theory,
is much larger than that of the Gold sequences. combinatorics, and sequence design.
The Gold sequences and Gold-like sequences we have
reviewed are not optimal when n is even. Here, we end this BIBLIOGRAPHY
article by introducing two examples of optimal families in
the case when n is even. 1. R. Gold, Maximal recursive sequences with 3-valued recur-
sive cross-correlation functions, IEEE Trans. Inform. Theory
Example 3: Small Set of Kasami Sequences. Let 14: 154–156 (Jan. 1968).
n = 2m, m ≥ 2, α be a primitive element of F2n , d = 2m + 1
2. D. V. Sarwate, Bounds on cross-correlation and autocorre-
and the set lation of sequences, IEEE Trans. Inform. Theory IT-25:
720–724 (1979).
K = {sb (t) = Trn1 (α t ) + Trm
1 (bα ) | b ∈ F2m }
dt
3. D. V. Sarwate and M. B. Pursley, Crosscorrelation properties
of pseudorandom and related sequences, Proc. IEEE 68:
be a family of 2 binary sequences of length N = 2 − 1.
m n
593–619 (1980).
The cross-correlation function of the sequences in the
4. T. Kasami, Weight Distribution Formula for Some Class
family K takes on the following three values:
of Cyclic Codes, Technical Report R-285 (AD 632574),
Coordinated Science Laboratory, Univ. Illinois, Urbana, April
−1, −1 + 2m , −1 − 2m 1966.
The set K is called the small set of Kasami sequences. The 5. T. Kasami, Weight distribution of Bose-Chaudhuri-Hocqu-
enghem codes, in Combinatorial Mathematics and Its
Welch bound in (4) with k = 1 gives us
Applications, Univ. North Carolina Press, Chapel Hill, NC,
1969.
θmax > 2m − 1
6. J. S. No and P. V. Kumar, A new family of binary pseudo-
random sequences having optimal correlation properties and
when N = 2 − 1 and M = 2 . But, since θmax in this case
2m m
large linear span, IEEE Trans. Inform. Theory 35: 371–379
must be an odd integer, we have (March 1989).
17. J. S. No, K. Yang, H. Chung, and H. Y. Song, New construc- of the newly founded European Telecommunications
tion for families of binary sequences with optimal correlation Standards Institute (ETSI). The first set of GSM technical
properties, IEEE Trans. Inform. Theory 43: 1596–1602 (Sept. specifications was published in 1990, and in 1991 the first
1997). GSM networks started operation. After 2 years, more than
one million users made phone calls in GSM networks.
The GSM standard soon received recognition also outside
GSM DIGITAL CELLULAR COMMUNICATION Europe: At the end of 1993, networks were installed for
SYSTEM example in Australia, Hong Kong, and New Zealand. In
the following years the number of subscribers increased
CHRISTIAN BETTSTETTER rapidly, and GSM was deployed in many countries on all
CHRISTIAN HARTMANN continents. Figure 1 shows the development of the GSM
Technische Universität subscribers worldwide and the number of networks and
München countries on the air.
Institute of Communication The aim of this article is to give an overview of the
Networks technical aspects of a GSM network. We first explain the
Munich, Germany functionality of the GSM components and their interwork-
ing. Next, in Section 3, we describe the services that GSM
1. INTRODUCTION AND OVERVIEW offers to its subscribers. Section 4 explains how data is
transmitted over the radio interface (frequencies, modu-
The Global System for Mobile Communication (GSM) lation, channels, multiple access, coding). Section 5 cov-
is an international standard for wireless, cellular, ers networking-related topics, such as mobility manage-
digital telecommunication networks. GSM subscribers ment and handover. Section 6 discusses security-related
can use their mobile phones almost worldwide for high- aspects.
quality voice telephony and low-rate data applications.
International roaming and automatic handover functions 2. SYSTEM ARCHITECTURE
make GSM a system that supports seamless connectivity
and mobility. A GSM network consists of several components, whose
Work on GSM was started by the Groupe Spécial tasks, functions, and interfaces are defined in the
Mobile of the European CEPT (Conférence Européenne des standard [1]. Figure 2 shows the fundamental components
Administrations des Postes et des Télécommunications) of a typical GSM network. A mobile user carries a mobile
in 1982. The aim of this working group was to develop station (MS) that can communicate over the radio interface
and standardize a new pan-European mobile digital with a base transceiver station (BTS). The BTS contains
communication system to replace the multitude of transmitter and receiver equipment as well as a few
incompatible analog cellular systems existing at that time. components for signal and protocol processing. The radio
The acronym GSM was derived from the name of this range of a BTS in Fig. 2 forms one cell. In practice, a BTS
group; later it was changed to Global System for Mobile with sectorized antennas can supply several cells (typically
Communication. three). Also, transcoding and rate adaption of speech, error
In 1987 the prospective network operators and the protection coding, and link control are performed in the
national administrations signed a common memorandum BTS. The essential control and protocol functions reside in
of understanding, which confirmed their commitment to the base station controller (BSC). It handles the allocation
introducing the new system based on a comprehensive of radio channels, channel setup, frequency hopping, and
set of GSM guidelines. This was an important step management of handovers. Typically, one BSC controls
for international operation of the new system. In several BTSs. The BTSs and BSC together form a base
1989 the GSM group became a technical committee station subsystem (BSS).
450
GSM networks and countries on air
500 400
GSM subscribers in million
350
400
300
GSM networks on air
300 250
200
200 GSM countries
150
100
100
50
0 0
1993 1994 1995 1996 1997 1998 1999 2000 2001 1993 1994 1995 1996 1997 1998 1999 2000 2001
EIR
HLR
SS
#7
AUC
PSTN
ISDN
GMSC
VLR
MSC
MSC
BSC VLR
BTS BSC
BTS
MS BTS MS
MS
Figure 2. GSM system architecture.
The mobile switching center (MSC) performs the home network or with a VLR of a ‘‘foreign’’ network. On
switching functions needed to route a phone call toward a location update, the MSC forwards the identity of the
its destination user. Usually one MSC is allocated to user and his/her current location area to the VLR, which
several BSCs. In addition to the functionality known subsequently updates its database. If the user has not
from usual switches in fixed ISDN networks, it must been registered with this VLR before, the HLR is informed
also handle the mobility of users. Such functions include about the current VLR of the user.
the authentication of users, location updating, handover, Each user is identified by a so-called international
and call routing to roaming users. Traffic between the mobile subscriber identity (IMSI). Together with all other
GSM network and the fixed network [e.g., PSTN (public personal information about the subscriber, the IMSI is
switched telephone network) and ISDN] is handled by stored on a chip card. This card is denoted as the subscriber
a dedicated gateway MSC (GMSC). GSM was designed identity module (SIM) in GSM and must be inserted into
to be compatible with ISDN systems using standardized the mobile terminal in order to access the network and
interfaces. use the services. The IMSI is also stored in the HLR. In
Two types of databases, namely, the home location addition to this worldwide unique address, a mobile user
register (HLR) and the visitor location registers (VLRs), are receives a temporary identifier, denoted as the temporary
responsible for storage of profiles and location information mobile subscriber identity (TMSI). It is assigned by the
of mobile users. There is one central HLR per network VLR currently responsible for the user and has only
operator and typically one VLR for each MSC. The specific local validity. The TMSI is used instead of the IMSI for
configuration is left to the network operator. transmissions over the air interface. This way, nobody can
The HLR has a record for all subscribers registered with determine the identity of the subscriber.
a network operator. It stores, for example, each user’s tele- The actual ‘‘telephone number’’ of a user is denoted
phone number, subscription profile, and authentication as the mobile subscriber ISDN number (MSISDN). It is
data. Besides this permanent administrative data, it also stored in the SIM card and in the HLR. In general, one
contains temporary data, such as the current location of a user can have several MSISDNs.
user. In case of incoming traffic to a mobile user, the HLR Two further databases are responsible for various
is queried in order to determine the user’s current location. aspects of security (verification of equipment and sub-
This allows for routing the traffic to the appropriate MSC. scriber identities, ciphering). The authentication center
The mobile station must periodically inform the network (AUC) generates and stores keys employed for user
about its current location. To assist this process, several authentication and encryption over the radio channel.
cells are combined to a so-called location area. Whenever a The equipment identity register (EIR) contains a list of all
mobile station changes its location area, it sends a location serial numbers of the mobile terminals, denoted as inter-
update to the network, indicating its current location. national mobile equipment identities (IMEI). This register
A VLR is responsible for a group of location areas and allows the network to identify stolen or faulty terminals
stores data of all users that are currently located in this and deny network access.
area. The data include part of the permanent subscriber As shown in Fig. 2, signaling between the GSM
data, which have been copied from the HLR to the VLR components in the mobile switching network is based
for faster access. In addition to this, the VLR may also on the Signaling System Number 7 (SS#7). For mobility-
assign and store local data, such as temporary identifiers. specific signaling, the MSC, HLR, and VLR hold extensions
A user may be registered either with a VLR of his/her of SS#7, the so-called mobile application part (MAP).
GSM DIGITAL CELLULAR COMMUNICATION SYSTEM 907
Operation and maintenance of a GSM network are Besides mobile voice telephony, GSM also offers
organized from a central operation and maintenance center services for data transmission, such as fax and circuit-
(OMC), which is not shown in Fig. 2. Its functions include switched access to data networks (e.g., X.25 networks)
network configuration, operation, and performance man- with data rates up to 9.6 kbps (Kilobits per second).
agement, as well as administration of subscribers and Of particular importance is the short message service
terminals. (SMS). It allows users to exchange short text messages
To summarize, the entire GSM network can be divided in a store-and-forward fashion. The network operator
into three major subsystems: the radio network (BSS), establishes a service center that accepts and stores text
the switching network (including MSCs, databases, and messages. Later, some value-added services, such as SMS
wired core network), and the operation and maintenance cell broadcast and conversion of SMS messages from/to
subsystem (OSS). email and to speech, have been implemented.
The standardization of phase 2 basically added further
supplementary services, such as call waiting, call holding,
conference calling, call transfer, and calling line identifica-
3. SERVICES AND EVOLUTION
tion. Many parts of the GSM technical specifications had
to be reworked, but all networks and terminals retained
The first GSM networks mainly offered basic telecom- compatibility to phase 1. Phase 2 was completed in 1995,
munication services — in the first place, mobile voice and market introduction followed in 1996.
telephony — and a few supplementary services. This step In the following years, a broad number of additional
in the GSM evolution is called phase 1. The supplemen- services and improvements have been developed in
tary services of phase 1 include features for call forwarding independent standardization units (GSM phase 2+) [2].
(e.g., call forwarding on mobile subscriber busy, call for- These topics affect almost all aspects of GSM, and
warding on mobile subscriber not reachable) and call enable a smooth transition from GSM to the Universal
restriction (e.g., barring of all outgoing/incoming calls, Mobile Telecommunication System (UMTS); see Fig. 3.
barring of all outgoing international calls, and barring For example, the GSM voice codecs have been improved to
of incoming international calls when roaming outside the achieve a much better speech quality (see Section 4.2).
home network). All these services had to be implemented Furthermore, a set of group call and push-to-talk
as mandatory features by all network operators. speech services with fast connection setup has been
Universal Mobile
Telecommunication
System (UMTS)
Functionality
data rate
quality of transmission
Location
service
Wireless (LCS) Enhanced Data rates
application General Packet for GSM Evolution
protocol Radio Service (EDGE)
(WAP) CAMEL (GPRS)
High Speed
SIM application
Circuit Switched
toolkit (SAT)
Data (HSCSD)
MS application
execution environment Advanced
(MExE) Enhanced full speech call
rate codec items (ASCI)
(EFR)
Data services
(circuit switched)
GSM speech
standardized under the name advanced speech call Development also continued with improved bearer
items (ASCIs). These services are especially important services for data transmission. The High-Speed Circuit-
for closed user groups, such as police and railroad Switched Data (HSCSD) service achieves higher data
operators. rates by transmitting in parallel on several traffic chan-
Another important aspect is the definition of service nels (multislot operation). The General Packet Radio
platforms. Instead of standardizing services, only features Service (GPRS) offers a packet-switched transmission
to create services and mechanisms that enable their at the air interface. It improves and simplifies wire-
introduction are specified. This allows the network less access to the Internet. Users of GPRS benefit from
providers to introduce new operator-specific services in shorter access times, higher data rates, volume-based
a faster way. Customized application for mobile network billing, and an ‘‘always on’’ wireless connectivity. (For
enhanced logic (CAMEL) represents the integration of further details, see the GPRS entry of this encyclopedia.)
intelligent network (IN) principles into GSM. It enables The Enhanced Data Rates for GSM Evolution (EDGE)
use of operator-specific services also in foreign networks. service achieves even higher data rates and a better
For example, a subscriber roaming in a different country spectral efficiency. It replaces the original GSM modu-
lation by an 8-PSK (8-phase shift keying) modulation
can easily check his/her voicebox with the usual short
scheme.
number. Other features include roaming services for
prepaid subscriptions and speed dial for closed user
groups. Also on the mobile station side, service platforms 4. AIR INTERFACE: PHYSICAL LAYER
have been developed: the SIM application toolkit and the
MS application execution environment (MExE). The SIM Figure 4 gives a schematic overview of the basic elements
application toolkit allows the operator to run specific of the GSM transmission chain. The stream of sampled
applications on the SIM card. With the toolkit, the speech is fed into a source encoder, which compresses the
SIM card is able to display new operator-specific items data. The resulting bit sequence is passed to the channel
and logos and to play sounds. For example, users encoder. Its purpose is to add, in a controlled manner, some
can download new ringing tones to their SIM card. redundancy to the bit sequence. This redundancy serves
The new applications can be transmitted, for example, to protect the data against the negative effects of noise
via SMS to the mobile station. The most important and interference encountered in the transmission over
the radio channel. On the receiver side, the introduced
components of MExE are a virtual machine for execution
redundancy allows the channel decoder to detect and
of Java code and the Wireless Application Protocol (WAP).
correct transmission errors. Without channel coding, the
With a virtual machine running on the mobile station,
achieved bit error rate would be insufficient, not only
applications can be downloaded and executed. The WAP
for speech but also for data communication. Reasonable
defines a system architecture, a protocol family, and an
bit error rates are on the order of 10−5 to 10−6 . To
application environment for transmission and display
achieve these rates, GSM uses a combination of block and
of Web-like pages for mobile devices. WAP has been convolutional coding. Moreover, an interleaving scheme is
developed by the WAP forum; services and terminals used to deal with burst errors that occur over multipath
have been available since 1999. Using a WAP-enabled and fading channels. After coding and interleaving, the
mobile GSM phone, subscribers can download information data are encrypted to guarantee secure and confident
pages, such as news, weather forecasts, stock reports, and data transmission. The encryption technique as well as
local city information. Furthermore, mobile e-commerce the methods for subscriber authentication and secrecy
services (e.g., ticket reservation, mobile banking) are of the subscriber identity are explained in Section 6.
offered. The encrypted data are subsequently mapped to bursts,
If information about the current physical location of which are then multiplexed. Finally, the stream of bits
a user is provided by the GSM network or by GPS is differentially coded, modulated, and transmitted on
(Global Positioning System), location-aware applications the respective carrier frequency over the mobile radio
are possible. A typical example is a location-aware city channel.
guide that informs mobile users about nearby sightseeing After transmission, the demodulator processes the
attractions, restaurants, hotels, and public transportation. signal, which was corrupted by the noisy channel. It
Figure 4. Basic function of the GSM transmission chain on the physical layer at the air interface.
GSM DIGITAL CELLULAR COMMUNICATION SYSTEM 909
attempts to recover the actual signal from the received is necessary to update location information, for
signal. The next steps are demultiplexing and decryption. instance.
The channel decoder attempts to reconstruct the original • The slow associated control channel (SACCH),
bit sequence, and, as a final step, the source decoder tries which carries information for synchronization, power
to rebuild the original source signal. control, and channel measurements. It is always
assigned in conjunction with a TCH or an SDCCH.
4.1. Logical Channels • The fast associated control channel (FACCH), which
GSM defines a set of logical channels [3], which are is used for short time signaling. It can be made
divided into two categories: traffic channels and signaling available by stealing bursts from a TCH.
channels. Some logical channels are assigned in a
dedicated manner to a specific user; others are common GSM also defines a set of logical channels for the General
channels that are of interest for all users in a cell. Packet Radio Service (GPRS). They are treated in the
The traffic channels (TCHs) are used for the transmis- GPRS entry of this encyclopedia.
sion of user data (e.g., speech, fax). A TCH may either be
fully used (full-rate TCH; TCH/F) or split into two half- 4.2. Speech Processing Functions and Codecs
rate channels that can be allocated to different subscribers
Transmission of voice is one of the most important services
(half-rate TCH; TCH/H). A TCH/F carries either 13 kbps
in GSM. The user’s analog speech signal is sampled at
of coded speech or datastreams at 14.5, 12, 6, or 3.6 kbps.
the transmitter at a rate of 8000 samples/s, and these
A TCH/H transmits 5.6 kbps of half-rate coded speech or
samples are quantized with a resolution of 13 bits. At the
datastreams at 6 or 3.6 kbps.
input of the speech encoder, a speech frame containing
In addition to the traffic channels, GSM specifies
160 samples, each 13 bits long, arrives every 20 ms. This
various signaling channels. They are grouped into
corresponds to a bit rate of 104 kbps for the speech signal.
broadcast channels (BCHs), common control channels
The compression of this speech signal is performed in
(CCCHs), and dedicated control channels (DCCHs).
the speech encoder. The functions of encoder and decoder
The BCHs are unidirectional and used to continually
are typically combined in a single building block, called a
distribute information to all MSs in a cell. The following
codec.
three broadcast channels are defined:
An optional speech processing function is discontinuous
transmission (DTX). It allows the radio transmitter to be
• The broadcast control channel (BCCH) broadcasts switched off during speech pauses. This saves battery
configuration information, including the network power of the MS and reduces the overall interference level
and BTS identity, the frequency allocations, and at the air interface. Voice pauses are recognized by the
availability of optional features such as voice voice activity detection (VAD). During pauses, the missing
activity detection (Section 4.2) and frequency hopping speech frames are replaced by a synthetic background
(Section 4.5). noise generated by the comfort noise synthesizer.
• The frequency correction channel (FCCH) is used Another function on the receiver side is the replacement
to distribute information about correction of the of bad frames. If a transmitted speech frame cannot be
transmission frequency. corrected by the channel coding mechanism, it is discarded
• The synchronization channel (SCH) broadcasts data and replaced by a frame that is predictively calculated
for frame synchronization of an MS. from the preceding frame.
In the following paragraphs, the speech codecs used
The group of CCCHs consists of four unidirectional in GSM are briefly characterized. The channel coding is
channels that are used for radio access control: described in Section 4.3.
• The random-access channel (RACH) is an uplink 4.2.1. Full-Rate Speech Codec. The first set of GSM
channel that is used by the MSs in a slotted ALOHA standards defined a full-rate speech codec for transmission
fashion for the purpose of requesting network access. via the TCH/F. It is an RPE-LTP (regular pulse
excitation–long-term prediction) codec [4], which is based
• The access grant channel (AGCH) is a downlink
on linear predictive coding (LPC). The RPE-LTP codec has
channel used to assign a TCH or a DCCH to an MS.
a compression rate of 1/8 and thus produces a data rate of
• The paging channel (PCH), which is also a downlink 13 kbps at its output.
channel, is employed to locate an MS in order to With the further development of GSM the speech codecs
inform it about an incoming call. have also been improved. Two competing objectives have
• The notification channel (NCH) serves to inform MSs been considered: (1) the improvement of speech quality
about incoming group and broadcast calls. toward the quality offered by fixed ISDN networks and
(2) better utilization of the frequency bands assigned to
Finally, GSM uses three different DCCHs: GSM, in order to increase the network capacity.
• The stand-alone dedicated control channel (SDCCH), 4.2.2. Half-Rate Speech Codec. The half-rate speech
which is applied for signaling between BSS and codec has been developed to improve bandwidth utiliza-
MS when there is no active TCH connection. This tion. It produces a bit stream of 5.6 kbps and is used for
910 GSM DIGITAL CELLULAR COMMUNICATION SYSTEM
speech transmission over the TCH/H. Instead of using the 4.3. Channel Coding
RPE-LTP coding scheme, the algorithm is based on code-
GSM uses a combination of block coding (as external error
excited linear prediction (CELP), in which the excitation
protection) and convolutional coding (as internal error
signal is an entry in a very large stochastically populated
protection). Additionally, interleaving of the encoded bits
codebook. The codec has a higher complexity and higher
is performed in order to spread the channel symbol errors.
latency. Under normal channel conditions, it achieves — in
The channel encoding and decoding chain is depicted in
spite of half the bit rate — almost the same speech qual-
Fig. 4. The bits coming from the source encoder are first
ity as the full-rate codec. However, quality loss occurs for
block-encoded; that is, parity and tail bits are appended
mobile-to-mobile communication, since in this case (due to
to the blocks of bits. The resulting stream of coded bits is
the ISDN architecture) one has to go twice through the
then fed into the convolutional encoder, where further
GSM speech coding/decoding process. A method to avoid
redundancy is added for error correction. Finally, the
multiple transcoding has been passed under the name
blocks are interleaved. This is done because the mobile
tandem free operation in GSM Release 98.
radio channel frequently causes burst errors (a sequence
of erroneous bits), due to long and deep fading events.
4.2.3. EFR Speech Codec. The enhanced full-rate Spreading the channel bit errors by means of interleaving
(EFR) speech codec [5] was standardized by ETSI in 1996 diminishes this statistical dependence and transforms
and has been implemented in GSM since 1998. It improves the mobile radio channel into a memoryless binary
the speech quality compared to the full- and half-rate channel. On the receiver side, the sequence of received
speech codecs without using more system capacity than channel symbols — after demodulation, demultiplexing,
the full-rate codec. The EFR codec produces a bitstream and deciphering — is deinterleaved before it is fed into
of 12.2 kbps (244 code bits for each 20-ms frame) and is the convolutional decoder for error correction. Finally, a
based on the algebraic code excitation linear prediction parity check is performed based on the respective block
(ACELP) technique [6]. encoding.
A detailed explanation of the full-rate, half-rate, and It depends on the channel type which specific channel
EFR codecs can be found in Ref. 4. coding scheme is employed. Different codes are used, for
example, for speech traffic channels, data traffic channels,
4.2.4. AMR Codec. The speech codecs mentioned and signaling channels. The block coding used for external
before all use a fixed source bit rate, which has been error protection is either a cyclic redundancy check (CRC)
optimized for typical radio channel conditions. The code or a fire code. In some cases, just some tail bits
problem with this approach is its inflexibility; whenever are added. For the convolutional coding, the memory of
the channel conditions are much worse than usual, very the used codes is either 4 or 6. The basic code rates
poor speech quality will result, since the channel capacity are r = 12 and r = 13 ; however, other code rates can be
assigned to the mobile station is too small for error-free obtained by means of puncturing (deleting some bits after
transmission. On the other hand, radio resources will encoding). In the following, the basic coding process is
be wasted for unneeded error protection if the radio explained for some channels. The detailed channel coding
conditions are better than usual. To overcome these and interleaving procedures of all channels are described
problems, a much more flexible codec has been developed in Ref. 8.
and standardized: the adaptive multirate (AMR) codec.
It can improve speech quality by adaptively switching 4.3.1. Full-Rate Speech Codec. Let us explain how the
between different speech coding schemes (with different speech of the full-rate codec is protected. One speech block
levels of error protection) according to the current channel coming from the source encoder consists of 260 bits. These
quality. bits are graded into different classes, which have different
To be more precise, AMR has two principles of impact on speech quality. The 182 bits of class 1 have
adaptability [7]: channel mode adaptation and codec mode more impact on speech quality and thus must be better
adaptation. Channel mode adaptation dynamically selects protected. The block coding stage calculates three parity
the type of traffic channel that a connection should be bits for the most important 50 bits of class 1 (known
assigned to: either a full-rate (TCH/F) or a half-rate traffic as class 1a bits). Next, four tail bits are added, and
channel (TCH/H). The basic idea here is to adapt a user’s the resulting 189 bits are fed into a rate- 12 convolutional
gross bit rate in order to optimize the usage of radio encoder of memory 4 (see Fig. 5). The 78 bits of class
resources. The task of codec mode adaptation is to adapt 2 are not encoded. Finally, the resulting 456 bits are
the coding rate (i.e., the tradeoff between the level of error interleaved.
protection versus the source bit rate) according to the A frame in which the bits of class 1 have been recognized
current channel conditions. When the radio channel is bad, as erroneous in the receiver is reported to the speech codec
the encoder operates at low source bit rates at its input using the bad-frame indication (BFI). In order to maintain
and uses more bits for forward error protection. When a good speech quality, these frames are discarded, and the
the quality of the channel is good, less error protection is last frame received correctly is repeated instead, or an
employed. extrapolation of received speech data is performed.
The AMR codec consists of eight different modes with
source bit rates ranging from 12.2 to 4.75 kbps. All modes 4.3.2. EFR Speech Codec. Using the EFR codec, a
are scaled versions of a common ACELP basis codec; the special preliminary channel coding is employed for the
12.2-kbps mode is equivalent to the EFR codec. most significant bits; 8 parity bits (generated by a CRC
GSM DIGITAL CELLULAR COMMUNICATION SYSTEM 911
960 MHz
959.8 MHz 124 Each of the 200 kHz channels
123 has 8 TDMA channels
... Downlink
200 kHz ...
... 1 2 3 4 5 6 7 8
2
935.2 MHz 1
935 MHz
Data burst, 156.25 bit periods ≈ 576.9 µs
915 MHz
914.8 MHz 124 Uplink
123 Delay
45 MHz 1 2 3 4 5 6 7 8
... 3 time slots
separation
...
200 kHz ...
2
890.2 MHz 1 Figure 6. Carrier frequencies, duplex-
890 MHz ing, and TDMA frames.
912 GSM DIGITAL CELLULAR COMMUNICATION SYSTEM
There are 64 steps for the timing advance, where one comparative measurements of neighboring BCCH carriers
step corresponds to 1-bit duration. The value tells the MS by the mobile stations.
when it must transmit relative to the downlink. Thus, the GSM uses two parameters to describe the quality of a
required adjustment always corresponds to the round-trip connection: the received signal level (RXLEV, measured in
delay between the respective mobile station and the base dBm) and the received signal quality (RXQUAL, measured
station. Therefore, the maximum timing advance value as bit error ratio before error correction). Power control as
defined in GSM also determines the maximum cell size well as handover decisions are based on these parameters.
(radius 35 km). The received signal power is measured continuously by
mobile and base stations in each received burst within a
4.7. Modulation range of −110 to −48 dBm. The respective RXLEV values
are obtained by averaging.
The modulation technique used in GSM is Gaussian For power control, upper and lower threshold values
minimum shift keying (GMSK). GMSK belongs to the for uplink and downlink are defined for the parameters
family of continuous-phase modulation, which have the RXLEV and RXQUAL. If a certain number of consecutive
advantages of a narrow transmitter power spectrum values of RXLEV or RXQUAL are above or below the
with low adjacent channel interference and a constant respective threshold value, the BSS can adjust the
amplitude envelope. This allows for use of simple transmitter power. For example, if the upper threshold
and inexpensive amplifiers in the transmitters without value for RXLEV on the uplink is exceeded, the
stringent linearity requirements. In order to facilitate transmission power of the MS will be reduced. In the
demodulation, each burst is encoded differentially before other case, if RXLEV on the uplink falls below the
modulation is performed. respective lower threshold, the MS will be ordered to
increase its transmission power. If the criteria for the
4.8. Power Control downlink are exceeded, the transmitter power of the BTS
The main purpose of power control in a mobile communica- will be adjusted. Equivalent procedures will be performed
tion system is to minimize the overall transmitted power if the values for RXQUAL violate the respective range.
in order to keep interference levels low, while providing
sufficient signal quality for all ongoing communications. 5. NETWORKING ASPECTS AND PROTOCOLS
In GSM, the transmit power of the BTS and each
MS can be controlled adaptively. As part of the radio The GSM standard defines a complex protocol architecture
subsystem link control [11], the transmit power is that includes protocols for transport of user data as well
controlled in steps of 2 dBm. For the uplink power control, as for signaling. The fact that users can roam within a
16 control steps are defined from step 0 (43 dBm = 20 W) network from cell to cell and also internationally to other
to step 15 (13 dBm) with a gap of 2 dBm between networks, requires the GSM system to handle signaling
neighboring values. Similarly, the downlink can be functions for registration, user authentication, location
controlled in steps of 2 dBm. However, the number of updating, routing, handover, allocation of channels, and so
downlink power control steps depends on the power class on. Some signaling protocols for these tasks are explained
of the BTS, which defines the maximum transmission in the following; a more comprehensive explanation can
power of a BTS (up to 320 W). It should be noted that be found in Ref. 2.
downlink power control is not applied to the BCCH Figure 8 illustrates the signaling protocol architecture
carrier, which must maintain constant power to allow between MS and MSC. Layer 1 at the air interface
Connection Connection
management management
...
Mobility Mobility
Layer 3 management management
MS BSS MSC
(between MS and BTS) is the physical layer as described a call. After receiving the channel release command, the
in Section 4. It implements the logical signaling channels. MS changes back to idle state.
The data-link layer (layer 2) at the air interface employs Another important procedure of GSM radio resource
a modified version of the LAPD (link access procedure management is the activation of ciphering, which is
D) protocol used in ISDN. It is denoted as LAPDm (LAPD initiated by the BSS by means of the cipher mode
mobile). The signaling protocols in Layer 3 between mobile command.
stations and the network are divided into three sublayers:
(1) radio resource management, (2) mobility management, 5.2. Mobility Management and Call Routing
and (3) connection management [12]. They are explained Mobility management includes tasks that are related to
in the following subsections. In the fixed network part, the mobility of a user. It keeps track of a user’s current
a slightly modified version of the message transfer part location (location management) and performs attach and
(MTP) of SS#7 is used to transport signaling messages detach procedures, including user authentication.
between BSC and MSC. Above this protocol, the signaling Before a subscriber can make a phone call or use other
connection control part (SCCP) and the BSS application GSM services, his/her mobile station must attach to the
part (BSSAP) have been defined. The latter consists of network and register its current location. Usually, the
the direct transfer application part (DTAP) and the BSS subscriber will attach to its home network, that is, the
mobile application part (BSSMAP). network with which he/she has a contract. Attachment to
other networks is possible if there is a roaming agreement
5.1. Radio Resource Management between the providers. To perform an attach, the MS sends
the IMSI of the user to the current network. The MSC/VLR
The objective of radio resource management is to set queries the HLR to check whether the user is allowed to
up, maintain, and release traffic and signaling channels access the network. If authentication is successful, the
between a mobile station and the network. This also subscriber will be assigned a TMSI and an MSRN (mobile
includes cell selection and handover procedures. station roaming number) for further use. The TMSI is only
A mobile station is continuously monitoring the BCCH valid within a location area. The MSRN contains routing
and CCCH on the downlink. It measures the signaling information, so that incoming traffic can be routed to the
strength of the BCCHs broadcasted by the nearby BTSs in appropriate MSC of the user.
order to select an appropriate cell. At the same time, the After a successful attach, GSM’s location management
MS periodically monitors the PCH for incoming calls. functions are responsible for keeping track of a user’s
A radio resource management connection establish- current location, so that incoming calls can be routed to the
ment can be initiated either by the network or by the MS. user. Whenever a powered-on MS crosses the boundary of
In either case, the MS sends a channel request on the a location area, it sends a location update request message
RACH in order to get a channel assigned on the AGCH. to its MSC. The VLR issues a new MSRN and informs the
In case of a network-initiated connection, this procedure HLR about the new location. As opposed to the location
is preceded by a paging call to be answered by the MS. registration procedure, the MS has already got a valid
Once a radio resource management connection has TMSI in this case. In addition to this event-triggered
been set up, the MS has either an SDCCH or a TCH location update procedure, GSM also supports periodic
with associated SACCH/FACCH available for exclusive location updating.
bidirectional use. On the SACCH the MS continuously The telephone number dialed to reach a user (MSISDN)
sends channel measurements if no other messages need gives no information about the current location of this
to be sent. These measurements include the values user. Clearly, we always dial the same number no matter
RXLEV and RXQUAL (see Section 4.8) of the serving cell whether our communication partner is attached to its
and RXLEV of up to six neighboring cells. The system home network or he/she is currently located in another
information sent by the BSS on the SACCH downlink country. To establish a connection, the GSM network must
contains information about the current and neighboring determine the cell in which the user resides.
cells and their BCCH carriers. Figure 9 gives an example for an incoming call from
In order to change the configuration of the physical the fixed ISDN network to a mobile user. An ISDN switch
channel in use, a channel change within the cell can be realizes that the dialed number corresponds to a subscriber
performed. The channel change can be requested by the in a mobile GSM network. From the country and network
radio resource management sublayer or by higher protocol code of the MSISDN it can determine the home network
layers. However, it is always initiated by the network of the called user and forward the call to the appropriate
and reported to the MS by means of an assignment GMSC. On arrival of the call, the GMSC queries the HLR
command. A second signaling procedure to change the in order to obtain the current MSRN of the user. With this
physical channel configuration of an established radio number, the call can be forwarded to the responsible MSC.
resource management connection is a handover, which is Subsequently, the MSC obtains the user’s TMSI from its
described in Section 5.4. VLR and sends out a paging message for the user in the
The release of radio resource management connections cells of the relevant location area. The MS responds, and
is always initiated by the network. Reasons for the channel the call can be switched through.
release could be the end of the signaling transaction, It is important to note that a subscriber can attach to
insufficient signal quality, removal of the channel in favor a GSM network irrespective of a specific mobile terminal.
of a higher priority call (e.g., emergency call), or the end of By inserting the SIM card into another terminal, he/she
GSM DIGITAL CELLULAR COMMUNICATION SYSTEM 915
ISD
MSISDN 1 N
MS
MSRN 4
LA 2 BSC GMSC
BTS
MSRN
TMSI 2
3
7 MSC MSISDN
7 TMSI
EIR
BTS BSC AUC
HLR
LA 1 7 5 VLR
8 TMSI 6 MSRN
MS
TMSI TMSI
BTS Figure 9. Routing calls to mobile
users.
can make or receive calls at that terminal and use other supports the continuation of ongoing connections, roaming
subscribed services. This is possible because the IMEI from and to other networks terminates the connection and
(identifying the terminal) and the IMSI (identifying the requires a new attachment.
user, stored in the SIM) are independent of each other. When a mobile user moves within a network and
Consequently, GSM offers personal mobility in addition to crosses cell borders, the ongoing connection is switched
terminal mobility. to a different channel through the neighboring BTS (see
Fig. 10). This procedure is called intercell handover [13].
5.3. Connection Management Handovers may even take place within the same cell,
when the signal quality on the current channel becomes
The GSM connection management consists of three
insufficient and an alternative channel of the same BTS
entities: call control, supplementary services, and short
can offer improved reception. This event is called intracell
message service (SMS). The call control handles the tasks
handover.
related to setting up, maintaining, and taking down calls.
In case of an intercell handover, GSM distinguishes
The services of call control include the establishment
between intra-MSC handover (if both old and new BTS
of normal calls (MS-originating and -terminating) and
are connected to the same MSC) and inter-MSC handover
emergency calls (MS-originating only), the termination
(in case the new BTS is connected to a different MSC
of calls, dual-tone multifrequency (DTMF) signaling, and
than the old BTS). In the latter case, the connection is
incall modifications. The latter allows for changing the
rerouted from the old MSC (MSC-A) and through the new
bearer service during a connection, for example, from
MSC (MSC-B). If during the same connection another
speech to data transmission. Essentially, call control
inter-MSC handover takes place, either back to MSC-A
signaling in GSM corresponds to the call setup according to
or to a third one (MSC-C), the connection between MSC-
Q.931 in ISDN. Some additional features are incorporated
A and MSC-B is taken down and the connection is newly
to account for the limited resources in a wireless system.
routed from MSC-A to MSC-C (unless MSC-C equals MSC-
For example, it is possible to queue call requests if no
A). This repeated inter-MSC handover event is called a
traffic channel is immediately available. Furthermore, the
subsequent handover (see Fig. 10).
network operator has the option to choose between early
Since handovers not only induce additional signaling
and late assignment procedures. With early assignment,
traffic load but also temporarily reduce the speech quality,
the TCH is assigned immediately after acknowledging the
the importance of a well-dimensioned handover decision
call request. In case of late assignment, the call is first
algorithm, which should even be locally optimized, is
processed completely, and the TCH assignment occurs
obvious. This is one reason why GSM does not have a
only after the destination subscriber has been called. This
standardized uniform algorithm for the determination of
variant avoids unnecessary allocation of radio resources
the moment of a handover. Network operators can develop
if the called subscriber is not available. Thus, the call
and deploy their own algorithms that are optimally
blocking probability can be reduced.
tuned for their networks. This is made possible through
standardizing only the signaling interface that defines the
5.4. Roaming and Handover
processing of the handover and through transferring the
GSM supports roaming of mobile users within a network handover decision to the BSS. The GSM handover is thus
as well as between different GSM networks (as long as a a network-originated handover as opposed to a mobile-
roaming agreement exists between the respective network originated handover, where the handover decision is made
providers). While roaming within a single network by the mobile station. An advantage of the GSM handover
916 GSM DIGITAL CELLULAR COMMUNICATION SYSTEM
Connection route
MSC-B
MSC-A
BTS 1
BSC
MS
Intracell MS BSC
handover MS
BTS 2
MS BTS 3
Intercell
handover
Intra- MSC
handover Inter-MSC
handover
approach is that the software of the MS does not need to and BCCH frequency) and a handover reference number.
be changed when the handover strategy or the handover On reception of the HANDOVER COMMAND, the mobile station
decision algorithm is changed in the network. Even though interrupts the current connection, deactivates the old
the GSM standard does not prescribe a mandatory decision physical channel, and switches over to the channel newly
algorithm, a simple algorithm is proposed. assigned in the HANDOVER COMMAND. On the FACCH, the
The GSM handover decision is based on measurement mobile sends the message HANDOVER ACCESS in an access
data transmitted on the SACCH, most importantly the burst to the new base station. In case both BTSs have
values RXLEV and RXQUAL. At least 32 values of RXLEV synchronized their TDMA transmission, the access burst
and RXQUAL must be averaged and the resulting mean is sent in exactly four subsequent time slots of the FACCH.
values are continuously compared with threshold values. Thereafter, the mobile station activates the new physical
Additional handover decision criteria are the distance channel in both directions, activates encryption, and sends
to the current BTS, an indication of unsuccessful power the message HANDOVER COMPLETE to the BSS. In case the
control, and the pathloss values to neighboring BTSs. The BTSs are not synchronized, the mobile station repeats the
latter are obtained by monitoring the BCCH carriers of access burst until either a timer expires (handover failure)
nearby cells. Thus, either of the following handover causes or the BTS answers with a message PHYSICAL INFORMATION
can be distinguished: that contains the currently needed timing advance to
enable the mobile station to activate the new physical
channel.
• Received signal level too low (uplink or downlink)
To avoid a series of repeated handover events at
• Bit error ratio too high (uplink or downlink) the cell boundary where varying channel conditions can
• Power control range exceeded lead to frequent changes of the pathloss to both base
• Distance between MS and BTS too high stations, a hysteresis is used in GSM. This is done by
• Pathloss to neighboring BTS is lower than pathloss defining a handover margin for each cell. A handover
to the current BTS will be performed only if the path-loss difference of both
base stations is higher than the handover margin of the
A special handover situation exists if the bit error ratio potential new cell.
(uplink or downlink) is too high but at the same time the
uplink and downlink received signal levels are sufficiently 6. SECURITY ASPECTS
high. This strongly hints at severe cochannel interference.
This problem can be solved with an intracell handover, The wireless transmission over the air interface leads to
which the BSS can perform on its own without support the danger that unauthorized people may eavesdrop on
from the MSC. the communication of a subscriber or that they use radio
Once the decision on an intercell handover is made, resources at the cost of the network provider. The security
a target cell must be chosen. Therefore it is determined functions of GSM protect against unauthorized use of
which neighboring cell’s BCCH is received with sufficient services (by authentication), provide data confidentiality
signal level. All potential target channels with lower path (by encryption), and ensure the confidentiality of the
loss than the current channel are then reported to the subscriber identity [14].
MSC with the message HANDOVER REQUIRED. Once the target Authentication is needed to verify that users are
cell has been determined, the BSS sends the signaling who they claim to be. Each subscriber has a secret key
message HANDOVER COMMAND to the MS on the FACCH. The (subscriber authentication key Ki), which is stored in the
HANDOVER COMMAND contains the new channel configuration SIM as well as in the AUC. The SIM is protected by a
and also information about the new cell (e.g., BTS identity personal identity number (PIN) that is known only by the
GSM DIGITAL CELLULAR COMMUNICATION SYSTEM 917
subscriber. To authenticate a user, the network sends a area is wireless ad-hoc networking. His interests also
random number to the mobile station. Using the key Ki include 2G and 3G cellular systems and protocols
and a specified algorithm, the mobile station calculates for a mobile Internet. He is coauthor of the book
a signature response (SRES) from the random number GSM–Switching, Services and Protocols (Wiley/Teubner)
and sends it back to the network. If the SRES value and a number of articles in journals, books, and confer-
received from the MS matches the value calculated at ences.
the network side, the user is authenticated. In addition
to user authentication, the mobile terminal must also be Christian Hartmann studied electrical engineering at
authenticated. On the basis of the IMEI, the network can the University of Karlsruhe (TH), Germany, where he
restrict access for terminals that are listed in the EIR as received the Dipl.-Ing. degree in 1996. Since 1997, he
stolen or corrupted. has been with the Institute of Communication Networks
The encryption of speech and data is a feature of GSM at the Technische Universität München, Germany, as a
that is not supported in analog cellular and fixed ISDN member of the research and teaching staff, pursuing a
networks. It is used to protect the transmission over the doctoral degree. His main research interests are in the
air interface against eavesdropping. As illustrated in the area of mobile and wireless networks including capacity
transmission chain of Fig. 4, data coming from the channel and performance evaluation, radio resource management,
encoder is encrypted before it is passed to the multiplexer. modeling, and simulation. Christian Hartmann is a
The initial random number and the key Ki are used on both student member of the IEEE.
sides, the network and the MS, to calculate a ciphering key
Kc. This key is then employed by the encryption algorithm BIBLIOGRAPHY
for the symmetric ciphering of user data and signaling
information. On the receiver side, the key Kc can be used 1. ETSI, GSM 03.02: Network Architecture, Technical Specifica-
to decipher the data. tion, 2000.
Another security feature in GSM is that a user’s identity
2. J. Eberspächer, H.-J. Vögel, and C. Bettstetter, GSM — Swit-
remains undisclosed to listeners on the radio channel.
ching, Services, and Protocols, 2nd ed., Wiley, Chichester,
The IMSI uniquely identifies a subscriber worldwide and March 2001.
should thus not be transmitted over the air interface.
Thus, in order to protect his/her identity on network 3. ETSI, GSM 05.02: Multiplexing and Multiple Access on the
Radio Path, Technical Specification, 2000.
attach, the subscriber obtains a temporary identifier, the
TMSI, from the network. The TMSI is used instead of 4. R. Steele and L. Hanzo, eds., Mobile Radio Communications,
the IMSI for transmission over the air interface. The 2nd ed., Wiley, 1999.
association between the two values is stored in the VLR. 5. R. Salami et al., Description of the GSM enhanced full rate
The TMSI changes frequently and has only local validity speech codec, in Proc. IEEE Int. Conf. Communications
within a location area. Thus, an attacker listening on the (ICC’97), Montreal, Canada, June 1997, 725–729.
radio channel cannot deduce the subscriber’s identity from 6. J. Adoul, P. Mabilleau, M. Delprat, and S. Morisette, Fast
the TMSI. CELP coding based on algebraic codes, Proc. ICASSP, April
1987, pp. 1957–1960.
7. D. Bruhn, E. Ekudden, and K. Hellwig, Adaptive multi-rate:
7. BIBLIOGRAPHIC NOTES
a new speech service for GSM and beyond, Proc. 3rd ITG
Conf. Source and Channel Coding, Munich, Germany, Jan.
Article 15 is an early survey article mainly describing the 2000.
system architecture and signaling protocols. The GSM
8. ETSI, GSM 05.03: Channel Coding, Technical Specification,
book [2] covers both network aspects (system architecture, 1999.
protocols, services, roaming) and transmission aspects. It
9. ETSI, GSM 05.01: Physical Layer on the Radio Path; General
also includes a detailed description of the GSM phase 2+
Description, Technical Specification, 2000.
services. The book [16] also contains a detailed GSM part.
The GSM chapter of Ref. 4 covers the physical layer at the 10. ETSI, GSM 05.10: Radio Subsystem Synchronization, Tech-
air interface in detail. nical Specification, 2001.
11. ETSI, GSM 05.08: Radio Subsystem Link Control, Technical
Specification, 2000.
BIOGRAPHIES
12. ETSI, GSM 04.08: Mobile Radio Interface Layer 3 Specifica-
tion, Technical Specification, 2000.
Christian Bettstetter is a research and teaching
13. ETSI, GSM 03.09: Handover Procedures, Technical Specifica-
staff member at the Institute of Communication Net-
tion, 1999.
works at Technische Universität München TUM, Ger-
many. He graduated from TUM in electrical engi- 14. ETSI, GSM 03.20: Security Related Network Functions,
neering and information technology Dipl.-Ing. in 1998 Technical Specification, 2001.
and then joined the Institute of Communication Net- 15. M. Rahnema, Overview of the GSM system and protocol
works, where is he working toward his Ph.D. degree. architecture, IEEE Commun. Mag. 92–100 (April 1993).
Christian’s interests are in the area of mobile com- 16. B. Walke, Mobile Radio Networks, 2nd ed., Wiley, Chichester,
munication networks, where his current main research New York, 2002.
H
H.324: VIDEOTELEPHONY AND MULTIMEDIA may be better in various ways, to be used automatically if
FOR CIRCUIT-SWITCHED AND WIRELESS both ends have the capability to do so.
NETWORKS∗ Above all, H.324 is designed to provide the best
performance possible (video and audio quality, delay,
DAVE LINDBERGH etc.) on low-bit-rate networks. This is achieved primarily
Polycom, Inc. by reducing protocol overhead to the minimum extent
Andover, Massachusetts possible, and by designing the multiplexer and channel
BERNHARD WIMMER aggregation protocols to allow different media channels
to interleave frequently and flexibly, minimizing latency.
Siemens AG
Munich, Germany These protocol optimizations mean that H.324 is not
suitable for use on packet-switched networks, including IP
networks, because H.324 depends on reasonably constant
1. INTRODUCTION end-to-end latency in the connection, which can’t be
guaranteed in packet-routed networks.
ITU-T Recommendation H.324 [1] is the international The design of the H.324 standard benefits from
standard for videotelephony and real-time multimedia industry’s earlier experience with ITU-T H.320, the
communication systems on low-bit-rate circuit-switched widespread international standard for ISDN videocon-
networks. The basic H.324 protocol can be used over ferencing, approved in 1990. H.324 shares H.320’s basic
almost any circuit-switched network, including modems architecture, consisting of a multiplexer that mixes the
on the PSTN (public switched telephone network, often various media types into a single bitstream (H.223), audio
known as POTS — ‘‘plain old telephone service’’), on ISDN and video compression algorithms (G.723.1 and H.263),
networks, and on wireless digital cellular networks. and a control protocol that performs automatic capabil-
H.324 is most commonly used for dialup videotelephony ity negotiation and logical channel control (H.245). Other
service over modems, and as the basis for videotelephony parts of the H.324 set, such as H.261 video compression,
service in the Universal Mobile Telecommunications H.233/234 encryption, and H.224/281 far-end camera con-
System (UMTS) of the Third Generation Partnership trol, come directly from H.320. One of the considerations
Project (3GPP). The standard enables interoperability in the development of H.324 was practical interworking,
among a diverse variety of terminal devices, including PC- through a gateway, with the installed base of H.320 ISDN
based multimedia videoconferencing systems, inexpensive systems, as well as with the ITU-T standards for mul-
voice/data modems, encrypted telephones, and remote timedia on LANs and ATM networks, H.323 and H.310
security cameras, as well as standalone videophones. respectively, and the ITU-T T.120 series of data confer-
H.324 is a ‘‘toolkit’’ standard that gives implementers encing standards. Table 1 compares H.324 with H.320 and
flexibility to decide which media types (audio, data, video, the other major ITU-T multimedia conference standards.
etc.) and features are needed in a given product, but As a second-generation standard, the design of H.324
ensures interoperability by specifying a common baseline was able to avoid the limitations of H.320. As a result,
mode of operation for all systems that support a given H.324 has features missing from (or retrofitted to) H.320
feature. In addition to the baseline modes, H.324 allows such as receiver-controlled mode preferences, the ability
other optional modes, standard or nonstandard, which to support multiple channels of each media type, and
dynamic assignment of bandwidth to different channels.
∗This article is based on the article entitled The H.324 Multimedia 1.1. Variations
Communication Standard, by Dave Lindbergh, which appeared
in IEEE Communications Magazine, Dec. 1996, Vol. 24, No. 12, The original version of the H.324 standard was approved
pp. 46–51. 1996 IEEE. in 1995, exclusively for use on PSTN connections over
918
H.324: VIDEOTELEPHONY AND MULTIMEDIA FOR CIRCUIT-SWITCHED AND WIRELESS NETWORKS 919
dialup V.34 modems, at rates of up to 28,800 bps. Since protocol. The data, video, and audio streams are optional,
then, modems have increased in speed and H.324 has been and several of each kind may be present, set up
extended to support not only PSTN connections but also dynamically during a call. H.324 terminals negotiate a
ISDN connections and wireless mobile radio links. common set of capabilities with the far end, independently
The original PSTN version of H.324 is called ‘‘H.324.’’ for each direction of transmission (terminals with
The ISDN version is called ‘‘H.324/I,’’ and the mobile asymmetric capabilities are allowed). Multipoint operation
version ‘‘H.324/M.’’ The specific variant of H.324/M used in H.324, in which three or more sites can join in the same
in UMTS is called ‘‘3G-324’’ (see Table 2). call, is possible through the use of a multipoint control
The requirements, capabilities, and protocols during a unit (MCU, also known as a ‘‘bridge’’).
call are essentially the same for all variants of H.324. The
main differences are the network interfaces and call setup
3. MODEM
procedures, and in the case of H.324/M, the use of a variant
of the H.223 multiplex that is more resilient to the very When operating on PSTN circuits, H.324 requires support
high bit error rates encountered on wireless networks. for the full-duplex V.34 modem [2], which runs at rates
We begin by describing the base PSTN version of of up to 33,600 bps. The modem can operate at lower
H.324, and describe the differences in the other variations rates, in steps of 2400 bps, as line conditions require.
afterwards. H.324 is intended to work at rates down to 9600 bps. The
mandatory V.8 or optional V.8bis protocol is used at call
2. SYSTEM OVERVIEW startup to identify the modem type and operation mode.
In most implementations, the preferred V.8bis protocol
Figure 1 illustrates the major elements of the H.324 allows a normal voice telephone call to switch into a
system. The mandatory components are the V.34 modem multimedia call at any time. ITU-T V.25ter (the ‘‘AT’’
(for PSTN use), H.223 multiplexer, and H.245 control command set) is used to control local modem functions
such as dialing. The V.34 modulation/demodulation is
quite complex, and typically adds 30–40 ms of end-to-end
Table 2. Variants of H.324 delay.
Name Description Main Differences
An important V.34 characteristic for H.324 system
design is its error burst behavior. Because of the V.34
H.324 For PSTN over modems — trellis coder and other factors, bit errors rarely occur
H.324/I For ISDN Network interface, call singly, but instead in bursts. The rate of these error bursts
setup is also highly dependent on the bit rate of the modem. A
H.324/M For mobile wireless Network interface, single ‘‘step’’ of 2400 bps changes the error rate by more
networks error-robust multiplex than an order of magnitude. This influences the proper
3G-324M Variant of H.324/M for Network interface, ‘‘tuning’’ of the modem for use with H.324: a modem used
3GPP UMTS wireless error-robust multiplex, for simple V.42 data transfer is often more efficient at
video telephony AMR audio codec the ‘‘next higher’’ rate, since retransmissions will consume
instead of G.723.1 less bandwidth than will the loss from stepping down. But
a H.324 system should be run at the ‘‘next lower’’ rate,
Video codec
Video I/O equipment H.263/H.261 A
D M
A U
P X
Audio codec Receive T
Audio I/O equipment G.723.1 path delay
Multiplex/ Modem
V.34, GSTN
Demultiplex Network
User data applications Data protocols H.223 V.8/V.8bis
T.120, etc. V.14, LAPM, etc. L L
A A
Y Y
Control protocol SRP/LAPM E E
H.245 procedures R R MCU
System control Modem
control
V.25ter
since the real-time requirements make retransmission and H.324 protocol. A PC or other external DTE can con-
inappropriate in many cases, requiring a lower error rate. trol this V.8 or V.8bis negotiation using the procedures
H.324 uses the V.34 modem directly as a synchronous of V.25ter Annex A. If V.8bis is used, the terminal’s gross
data pump, to send and receive the bitstream generated by ability to support H.324 video, audio, data, or encryp-
the H.223 multiplexer. Data compression such as V.42bis, tion is also communicated, to quickly determine if the
and retransmission protocols such as V.42 link-access desired mode is available. The V.34 startup procedure
procedure for modems (LAPM) or the Microcom network then takes about 10 seconds, after which time end-to-end
protocol (MNP) are not used at the modem level, although digital communication is established at the full V.34 data
these same protocols can be used at a higher layer for some rate allowed by the quality of the telephone connection, up
types of data streams. to 33,600 bps.
Like the V.42 error correction protocol supported in Phase D then starts, in which the H.324 terminals
most modems, H.324 requires a synchronous interface first communicate with each other on the H.245 control
to the V.34 data pump. Since most PCs have only channel. The terminals are initialized, and detailed
asynchronous RS-232 modem interfaces, PC-based H.324 terminal capabilities are exchanged using H.245. This
terminals require either synchronous interface hardware happens very quickly (less than 1 second), as the full
or some other method to pass the synchronous bitstream connection bit rate is available for transfer of this control
to the modem over the PC’s asynchronous interface. Many information. Logical channels can then be opened for the
modems support ITU-T V.80, which provides a standard various media types, and table entries can be downloaded
sync-over-async ‘‘tunneling’’ protocol between the PC and to support those channels in the H.223 multiplexer.
the modem for this purpose. At this point the terminal enters phase E, normal
Faster modems such as V.90 can also be used, but multimedia communication mode. The number and type
in most cases these modems offer asymmetric bit rates, of logical channels can be changed during the call, and
where the modem is able to receive data faster than it each terminal can change its capabilities dynamically.
can transmit. When two such modems interconnect, both Phase F is entered when either user wants to end the
sides transmit at the lower rate, yielding little or no call. All logical channels are closed, and an H.245 message
improvement over V.34 speeds. tells the far-end terminal to terminate the H.324 portion
When operating on circuit-switched networks other of the call, and specifies the new mode (disconnect, back to
than the PSTN, the modem is replaced by an equivalent voice mode, or other digital mode) to use.
interface that allows the output from the H.223 multiplex Finally, in phase G, the terminals disconnect, return to
to be carried over the network in use. H.324 Annex C voice telephone mode, or go into whatever new mode (fax,
specifies operation of H.324 on wireless (digital cellular V.34 data, etc.) was specified.
and satellite) networks, and H.324 Annex D specifies
H.324 operation on ISDN.
5. MULTIPLEX
4. CALL SETUP AND TEARDOWN H.324 uses a new multiplexer standard, H.223, to mix
the various streams of video, audio, data, and the control
The setup and teardown of a H.324 call proceeds in seven channel, together into a single bitstream for transmission.
phases, phase A–phase G. The H.324 application required the development of a new
In phase A, an ordinary telephone call is setup using multiplexing method, as the goal was to combine low
the normal procedures for dialing, ringing, and answering. multiplexer delay with high efficiency and the ability to
This can be completely manual, with the user dialing and handle bursty data traffic from a variable number of logical
answering by hand; or automatic, with the V.25ter protocol channels.
(the ‘‘AT’’ command set) used to control the modem’s Time-division multiplexers (TDMs), such as H.221,
dialing and answering functions. were considered unsuitable because they can’t easily adapt
Once the call is connected, the H.324 terminal can to dynamically changing modem and payload data rates,
immediately start the multimedia call, or, if both terminals and are difficult to implement in software because of
support the optional V.8bis protocol, the two users can complex frame synchronization and bit-oriented channel
choose to have an ordinary voice conversation first. This is allocation.
called phase B, which can continue indefinitely, until one of Packet multiplexers, such as V.42 (LAPM) and Q.922
the users decides to switch the call into H.324 multimedia (LAPF), avoid these problems but suffer from ‘‘blocking
mode. V.8bis allows this ‘‘late start’’ mode, in which an delay,’’ where transmission of urgent data, such as audio,
ordinary phone call can be switched into multimedia mode must wait for the completion of a large packet already
at any time. On the ISDN network, the V.140 protocol started. This problem occurs when the underlying channel
substitutes for V.8bis. bit rate is low enough to make the transmission time
In phase C, digital communication between the termi- of a single packet significant. In a packet multiplexer,
nals is set up. For ISDN and digital wireless networks, this delay can be reduced only by limiting the maximum
this is inherent in the network connection. For PSTN calls, packet size or aborting large packets, both of which reduce
either terminal starts the modem negotiation by sending efficiency.
V.8 or V.8bis messages, using V.23 300 bps FSK modu- The H.223 multiplexer combines the best features of
lation (which doesn’t require any ‘‘training’’ time). The TDM and packet multiplexers, along with some new ideas.
modems exchange capabilities and select V.34 modulation It incurs less delay than TDM and packet multiplexers,
H.324: VIDEOTELEPHONY AND MULTIMEDIA FOR CIRCUIT-SWITCHED AND WIRELESS NETWORKS 921
has low overhead, and is extensible to multiple channels A slightly different, but equally valid, way of thinking
of each data type. H.223 is byte-oriented for ease of about H.223 is as a continuous stream of bit-stuffed
implementation, able to byte-fill with flags to match bytes carrying information in a given pattern of logical
differing data rates, and uses a unique synchronization channels. This pattern (again refer to Fig. 3) is occasionally
character that cannot occur in valid data. In H.223, interrupted by an ‘‘escape sequence’’ consisting of an
each HDLC (high-level data link control) framed [3] HDLC flag followed by a single ‘‘header’’ byte. The header
multiplex–protocol data unit (MUX-PDU) can carry a mix byte contains a multiplex code that indicates a change to
of different data streams in different proportions, allowing a new pattern of logical channel bytes.
fully dynamic allocation of bandwidth to the different Generally, an HDLC flag may be inserted on any
channels. byte boundary, terminating the previous MUX-PDU
H.223 consists of a lower multiplex layer, which and beginning a new one. Unlike traditional packet
actually mixes the different media streams, and a set of multiplexers, H.223 can interrupt a adaptation-layer
adaptation layers that perform logical framing, sequence variable-length packet at any byte boundary to reallocate
numbering, error detection, and error correction by bandwidth, such as when urgent real-time data must be
retransmission, as appropriate to each media type. sent. The reallocation can occur within 16–24 bit times
(16 bits for the flag and header, plus 0 to 8 bits for the byte
5.1. Multiplex Layer already being transmitted), less than 1 ms at 28,800 bps.
The multiplex layer uses normal HDLC zero insertion, This compares very favorably with both TDM muxes and
which inserts a 0 bit after every sequence of five 1 bits, packet multiplexes which suffer from ‘‘blocking delay.’’
making the HDLC flag (01111110) unique. The multiplex H.223 maintains a 16-entry multiplex table at all times,
consists of one or more flags followed by a one-byte header, selected by 4 bits of the MUX-PDU header byte. Entries
followed by a variable number of bytes in the information in the multiplex table may be changed during the call, to
field. This sequence repeats. Each sequence of header byte meet requirements as various logical channels are opened
and information field is called a MUX-PDU, as shown in or closed.
Fig. 2. Of the remaining 4 bits in the header, 3 are used as a
The header byte includes a multiplex code that specifies, CRC (cyclic redundancy code) on the multiplex code. The
by reference to a multiplex table, the mapping of the final bit is the packet marker (PM) bit, which is used for
information field bytes to various logical channels. Each marking the end of some types of higher-level packets.
MUX-PDU may contain a different multiplex code, and
5.2. Adaptation Layer
therefore a different mix of logical channels.
The selected multiplex table entry specifies a pattern Different media types (audio, video, data) require different
of bytes and corresponding logical channels contained in levels of error protection. For example, T.120 application
the information field, which repeats until the closing flag. data are relatively delay-insensitive, but require full error
Each MUX-PDU may contain bytes from several different correction. Real-time audio is extremely delay sensitive,
logical channels, and may select a different multiplex but may be able to accept occasional errors with only minor
table entry, so bandwidth allocation may change with degradation of performance. Video falls between these two
each MUX-PDU. This allows many logical channels, low- extremes. Ideally, each media type should use an error-
overhead switching of bandwidth allocation, and many handling scheme appropriate to the requirements of the
different types of channel interleave and priority. All of application.
this is under control of the transmitter, which may choose The H.223 adaptation layers provide this function,
any appropriate multiplex for the application, and change working above the multiplex layer on the unmultiplexed
multiplex table entries as needed. Many syntactically sequence of logical channel bytes. H.223 defines three
compliant multiplexing algorithms, optimized for different adaptation layers, AL1, AL2, and AL3. AL1 is intended
applications, are possible [4]. primarily for variable-rate framed information, such as
Figure 3 illustrates only four logical channels (audio, HDLC protocols and the H.245 control channel. AL2 is
video, data, and control), but there may be any number of intended primarily for digital audio such as G.723.1, and
channels, as specified by the multiplex table. This allows includes an 8-bit CRC and optional sequence numbers.
multiple audio, video, and data channels for different AL3 is intended primarily for digital video such as H.261
data protocols, multiple languages, continuous presence, and H.263, and includes provision for retransmission using
or other uses. sequence numbers and a 16-bit CRC.
MUX-PDU
Figure 2. MUX-PDU in an H.223 multiplex stream
... F F F H information field F F H information field F H information field F ...
(F = HDLC flag, H = header byte).
(Each letter represents one byte, F = Flag, H = Header, A = Audio, V = Video, D = Data, C = Control)
5.3. Encryption which the terminal may describe its ability (or inability) to
operate in various combinations of modes simultaneously,
Encryption is an option in H.324, and makes use of ITU-T
as some implementations are limited in processing cycles
H.233 and H.234, both of which were originally developed
or memory availability.
for use with H.320. H.233 covers encryption procedures
Receive capabilities describe the terminal’s ability
and algorithm selection, in which FEAL, B-CRYPT, and
to receive and process incoming information streams.
DES may be used as well as other standardized and
Transmitters are required to limit the content of their
nonstandardized algorithms. H.234 covers key exchange
transmitted information to that which the receiver has
and authentication using the ISO 8732, Diffie-Hellman, or
indicated it is capable of receiving. The absence of a receive
RSA algorithms.
capability indicates that the terminal cannot receive (is a
In H.324, encryption is applied at the H.223 multi-
transmitter only).
plexer, where the encryptor produces a pseudorandom
Transmit capabilities describe the terminal’s ability
bitstream (cipher stream), which, prior to flag insertion
to transmit information streams. They serve to offer
and HDLC zero insertion, is exclusive-ORed with the out-
receivers a choice of possible modes of operation, so the
put of the H.223 multiplexer. The exclusive-OR procedure
receiver can request the mode it prefers to receive using
is not applied to the H.223 MUX-PDU header and the
the H.245 RequestMode message. This is an important
H.245 control channel. The receiver reverses this process.
feature, as local terminals directly control only what
they transmit, but users care about controlling what they
6. CONTROL receive. The absence of a transmit capability indicates that
the terminal is not offering a choice of preferred modes to
H.324 uses the H.245 multimedia system control protocol, the receiver (but it may still transmit anything within the
which is also used by H.310 (for ATM networks) and H.323 capability of the receiver).
(for packet-switched LANs and corporate intranets). Many Terminals may dynamically add or remove capabilities
of the extensions added to H.245 to support H.323 on IP during a call. Since many H.324 implementations are
networks are not used by H.324. on general-purpose PCs, other application activity on the
The control model of H.245 is based on ‘‘logical machine can result in changing resource levels available to
channels,’’ independent unidirectional bitstreams with H.324. H.245 is flexible enough to handle such a scenario.
defined content, identified by unique numbers arbitrarily Nonstandard capabilities and control messages may be
chosen by the transmitter. There may be up to 65,535 issued using the NonStandardParameter structure defined
logical channels. in H.245. This allows nonstandardized features to be
The H.245 control channel carries end-to-end control supported automatically, in the same way as standardized
messages governing operation of the H.324 system, features.
including capabilities exchange, opening and closing of
logical channels, mode preference requests, multiplex 6.2. Logical Channel Signaling
table entry transmission, flow control messages, and H.324 terminals transmit media information from trans-
general commands and indications. The H.245 structure mitter to receiver over unidirectional streams called log-
allows future expansion to additional capabilities, as ical channels (LCs). Each LC carries exactly one channel
well as manufacturer-defined nonstandard extensions to of one media type, and is identified by a logical channel
support additional features. number (LCN) arbitrarily chosen by the transmitter. Since
H.245 messages are defined using ASN.1 syntax, coded transmitters completely control allocation of LCNs, there
according to the packed encoding rules (PERs) [5] of is no need for end-to-end negotiation of LCNs.
ITU-T X.691, which provide both clarity of definition and When a transmitter opens a logical channel, the H.245
flexibility of specification. In H.324, the H.245 control OpenLogicalChannel message fully describes the content
channel runs over logical channel (LC) 0, a separate of the logical channel, including media type, codec in
channel out of band from the various media streams. use, H.223 adaptation layer and any options, and all other
LC 0 is considered to be already open when the H.324 call information needed for the receiver to interpret the content
starts up. of the logical channel. Logical channels may be closed when
Within LC 0, H.245 is carried on a numbered simplified no longer needed. Open logical channels may be inactive,
retransmission protocol (NSRP) based on V.42 exchange if the information source has nothing to send.
identification (XID) procedures. Optionally, the full V.42 Logical channels in H.324 are unidirectional, so
protocol may be used instead. H.245 itself assumes that asymmetrical operation, in which the number and type
this link layer guarantees correct, in-order delivery of of information streams is different in each direction of
messages. transmission, is allowed. However, if a terminal is capable
only of certain symmetrical modes of operation, it may
6.1. Capabilities Exchange send a capability set that reflects its limitations.
The large set of optional features in H.324 necessitates a
method for the exchange of capabilities, so that terminals 6.3. Bidirectional Logical Channels
can become aware of the common subset of capabilities Certain media types, including data protocols such as
supported by both ends. T.120 and LAPM, and video carried over AL3, inherently
H.245 capabilities exchange provides for separate require a bidirectional channel for their operation. In such
receive and transmit capabilities, as well as a system by cases a pair of unidirectional logical channels, one in each
H.324: VIDEOTELEPHONY AND MULTIMEDIA FOR CIRCUIT-SWITCHED AND WIRELESS NETWORKS 923
direction, may be opened and associated together to form Other optional audio modes can also be used, such as the
a bidirectional channel. Such pairs of associated channels wideband ITU-T G.722.1 codec, which offers 7-kHz audio
need not share the same logical channel number, since bandwidth (approximately double that of conventional
logical channel numbers are independent in each direction telephone lines) at rates of 24 or 32 kbps. Nonstandard
of transmission. To avoid race conditions that could cause audio modes can be also used in the same manner as
duplicate sets of bidirectional channels to be opened, a standardized codecs.
symmetry-breaking procedure is used, in which master
and slave terminals are chosen on the basis of random 8. VIDEO CHANNELS
numbers. There is no advantage to being master or slave.
H.324 can send color motion video over any desired fraction
of the available modem bandwidth. H.324 supports both
7. AUDIO CHANNELS H.263 and H.261 for video coding. H.263 is the preferred
method, with H.261 available to allow interworking with
The baseline audio mode for H.324 is the G.723.1 speech older ISDN H.320 videoconferencing systems without
coder/decoder (codec), which runs at 5.3 or 6.4 kbps. The the need to convert video formats, which would add an
3G-324M variant of H.324 for 3G wireless networks uses unacceptable delay.
the adaptive multirate (AMR) codec as the baseline. H.263 is based on the same video compression
G.723.1 provides near-toll-quality speech, using a techniques as H.261, but includes many enhancements,
30 ms frame size and 7.5 ms lookahead. A G.723.1 including much improved motion compensation, which
implementation is estimated to require 18–20 fixed-point result in H.324 video quality estimated equivalent to
(MIPS) (million instructions per second) in a general- H.261 at a 50–100% higher bitrate. This dramatic
purpose DSP (digital signal processor). Transmitters may improvement is most apparent at the low bit rates used by
use either of the two rates, and can change rates for each H.324; the difference is less when H.263 is used at higher
transmitted frame, as the coder rate is sent as part of bitrates. H.263 also includes a broader range of picture
the syntax of each frame. The average audio bit rate can formats, as shown in Table 3.
be lowered further by using silence suppression, in which H.324 video can range from 5 to 30 frames/second,
silence frames are not transmitted, or are replaced with depending on bitrate and picture format, H.263 options in
smaller frames carrying background noise information. In use, and the amount of complexity and movement in the
typical conversations both ends rarely speak at the same scene. A H.245 control message, the videoTemporalSpatial-
time, so this can save significant bandwidth for use by TradeOff, feature allows the receiver to specify a preference
video or data channels. for the tradeoff between frame rate and picture resolution.
Receivers can use a H.245 message to signal their Video channels use H.223 adaptation layer AL3, which
preference for low- or high-rate audio. The audio channel includes a 16-bit CRC and sequence numbering, and
uses H.223 adaptation layer AL2, which includes an 8- provision for retransmission of errored video data, at the
bit cyclic redundancy code (CRC) on each audio frame or option of the receiver.
group of frames. Since H.324 systems can support multiple channels
The G.723.1 codec imposes about 97.5 ms of end-to-end of video (although bandwidth constraints may make this
audio delay, which, together with modem, jitter buffer, impractical for many applications), continuous-presence
transmission time, multiplexer, and other system delays, multipoint operation, in which separate images of each
results in about 150 ms of total end-to-end audio delay transmitting site are presented at the receiver ‘‘Hollywood
(exclusive of propagation delay) [6]. On ISDN connections, Squares’’ style, can be easily implemented via multiple
modem latency is eliminated, leading to about 115 ms end- logical channels of video. It is up to the receiver to locally
to-end delay. These audio delays are generally less than arrange an appropriate set of the channels for display.
that of the video codec, so additional delay needs to be
added in the audio path if lip synchronization is desired. 9. DATA CHANNELS
This is achieved by adding audio delay in the receiver
only, as shown in Fig. 1. H.245 is used to send a message The H.324 multimedia system can carry data channels as
containing the time skew between the transmitted video well as video and audio channels. These can be used for any
and audio signals. Since the receiver knows its decoding
delay for each stream, the time skew message allows the Table 3. Video Picture Formats
receiver to insert the correct audio delay, or alternatively,
to bypass lip synchronization, and present the audio with Picture Luminance H.324 Video
Format Pixels Decoder Requirements
minimal delay. While multiple channels of audio can be
— — H.261 H.263
transmitted, H.324 does not currently provide for the exact
sample-level channel synchronization needed for stereo SQCIF 128 × 96 for H.263a Optionala Required
audio. QCIF 176 × 144 Required Required
Many H.324 applications do not require lip synchro- CIF 352 × 288 Optional Optional
nization, or do not require video at all. For these appli- 4CIF 704 × 576 Not defined Optional
16CIF 1408 × 1152 Not defined Optional
cations, optional H.324 audio codecs such as G.729, an
Custom formats ≤2048 × 1152 Not defined Optional
8-kbps speech codec which can reduce the total end-to-end
audio delay to about 85 ms for PSTN use, or 50 ms on a
H.261 SQCIF is any active size less than QCIF, filled out by a black
ISDN, can be important. border, coded in QCIF format.
924 H.324: VIDEOTELEPHONY AND MULTIMEDIA FOR CIRCUIT-SWITCHED AND WIRELESS NETWORKS
data protocol or application in the same way as an ordinary 11. CHANNEL AGGREGATION FOR MULTILINK
modem. Standardized protocols include T.120 data [7] for OPERATION
conferencing applications such as electronic whiteboards
and computer application sharing, user data via V.14 H.324 Annex F specifies the use of the H.226 channel
or V.42 (with retransmission), T.84 still image transfer, aggregation protocol, which allows H.324 PSTN and
T.434 binary file transfers, H.224/H.281 for remote control ISDN calls to aggregate the capacity of multiple separate
of far-end cameras with pan, tilt, and zoom functions, connections into one faster channel. This can provide
and ISO/IEC TR9577 network-layer protocols [8] such improved video quality, compared to what can be delivered
as IP (Internet protocol) and IETF PPP (point-to-point by a single V.34 or V.90 connection (at no more than
protocol) [9], which can be used to provide Internet access 56 kbps) or by a single ISDN B-channel (at 64 kbps).
over H.324. As with other media types, H.324 provides Although H.226 is a general-purpose channel aggre-
for extension to non-standard protocols, which can be gation protocol, it was designed specifically to meet the
negotiated automatically and used just like standardized requirements of H.324. Modems like V.34 vary their bit
protocols. rate as telephone-line noise conditions change. When mul-
The same capability exchange and logical channel tiple modems are used on separate lines, each modem
signaling procedures defined for video and audio channels changes its bit rate independently, so the bitrate available
are also used for data channels, so automatic negotiation on each channel varies over time without channel-to-
channel coordination.
of data capabilities can be performed. This represents a
This characteristic rules out the use of conventional
major step forward from current practice on data-only
synchronous channel aggregation protocols like H.221 and
PSTN modems, where data protocols and applications to
ISO/IEC CD 13871 ‘‘BONDING’’ [10]. These synchronous
be used must generally be arranged manually by users
protocols work by distributing units of data to each channel
before a call.
in a fixed pattern, ‘‘spreading out’’ a higher-rate data
As with all other media types, data channels are carried
stream over a number of lower-rate channels. The fixed
as distinct logical channels over the H.223 multiplexer,
pattern avoids the need to send overhead data to tell
which can accommodate bursty data traffic by dynamically
the receiver to how to reconstruct the original data.
altering the allocation of bandwidth among the different
With no overhead, the units of distributed data can be
channels in use. For instance, a video channel can be
arbitrarily small, resulting in very little latency added by
reduced in rate (or stopped altogether) when a data
the channel aggregation protocol, as well as high efficiency.
channel requires additional bandwidth, such as during
However, because the distribution pattern is fixed, these
a file or still image transfer.
synchronous protocols cannot make use of channels with
arbitrary or varying bitrates, such as those provided by
10. MULTIPOINT OPERATION AND H.320 ISDN V.34 modems.
INTEROPERABILITY Packet-oriented channel aggregation protocols avoid
this problem, but cannot simultaneously provide both
very low latency and very low overhead. In a packet-
H.324 terminals can directly participate in multipoint oriented channel aggregation protocol, such as the ‘‘PPP
calls, in which three or more participants join the same Multilink Protocol’’ of RFC 1990 [11], data are repeatedly
call, through a central bridge called a multipoint control divided into packets, and each packet is transmitted
unit (MCU). Since the connections on each link in a on the channel whose transmit queue is least full.
multipoint call may be operating at different rates, MCUs This proportionally distributes packets on all channels
can send H.245 FlowControlCommand messages to limit regardless of their rate. Because the receiver doesn’t
the overall bit rate of one or more logical channels, or the know which packets were sent on which channels, the
entire multiplex, to a supportable common mode. transmitter must include a header to delineate packet
A similar situation arises when a H.324 terminal boundaries and identify the original order of the packets.
interoperates with an H.320 terminal on the ISDN. The overhead of such a protocol is inversely proportional
For interworking with H.320 terminals, an H.324/H.320 the size of the packets — on small packets, the header
gateway can transcode the H.223 and H.221 multiplexes, overhead forms a larger proportion of the total rate.
and the content of control, audio, and data logical The latency of transmission, however, is proportional to
channels between the H.324 and H.320 protocols. If the size of the packets — larger packets mean greater
the H.320 terminal doesn’t support H.263, then H.261 buffering, and thus more latency. This results in an
(QCIF) Quarter CIF video can be used to avoid the unavoidable tradeoff between latency and overhead in
delay of video transcoding. In this case the gateway, a packet-based channel aggregation protocol. H.226
like the MCU in the multipoint case, can send the H.245 provides the ability to operate on channels with arbitrary
FlowControlCommand to force the transmitted H.324 video or varying bitrates, while still achieving low latency and
bit rate to match the H.320 video bit rate in use by the efficiency close to that of synchronous channel aggregation
H.221 multiplexer. protocols.
One way for a dual-mode (H.320 ISDN and H.324
PSTN) terminal on the ISDN to work directly with H.324 11.1. H.226 Operation
terminals on the PSTN is by using a ‘‘virtual modem’’ The H.226 channel aggregation protocol operates between
on the ISDN, which generates a G.711 audio bitstream the H.223 multiplexer and multiple physical network
representing the V.34 analog modem signal. connections, as shown in Fig. 4.
H.324: VIDEOTELEPHONY AND MULTIMEDIA FOR CIRCUIT-SWITCHED AND WIRELESS NETWORKS 925
sequence (FAS) used to start H.320. By transmitting structure. Level 0 is the same as the PSTN and ISDN
in the low-order bits, these patterns create minimal variants of H.324. Levels 1 and 2 (in H.223 Annexes
disruption to G.711 coded audio signals, creating only A and B) add robustness to the H.223 multiplex layer,
low-level background noise. Therefore G.711 audio, which while levels 3 and 3a add more error-robust adaptation
can contain speech or tones from PSTN modems, the layers in H.223 Annexes C and D. Each higher-numbered
H.320 FAS signal, and the V.140 signature, can all be level is more error-robust than the previous level, and
sent simultaneously. This allows the terminal to negotiate also requires more overhead and processing cycles. Each
and interoperate correctly if the far-end terminal supports level supports the features of all lower levels. Table 4
any of these protocols, even if it doesn’t support H.324/I summarizes these levels.
or V.140. Once both terminals detect the V.140 signature,
phase 2 is entered. 13.1.1. Level 0. Level 0 is identical to the base H.324
Phase 2 of V.140 probes the network connection to and H.223.
determine which of a variety of national ISDN and ISDN-
13.1.2. Level 1. Defined in H.223 Annex A, Level 1
like digital network types is use. Most ISDN networks
replaces the 8-bit HDLC synchronization flag ‘‘01111110’’
supply an 8.0-kHz clock that times each 8-bit byte
with a 16-bit pseudo-noise (PN) flag ‘‘1011001010000111.’’
transmitted over the network. The clock indicates the
Longer flags allow more robust detection of MUX-PDU
boundaries between bytes. Some networks transmit only
boundaries in the case of high bit error rates, because
7 bits out of each byte end-to-end (providing 56 kbps),
implementations can use correlation thresholds for flag
while others carry all 8 bits (for 64 kbps). Other network
detection. For example, if 15 out of 16 bits are correct, a
types don’t provide any byte timing at all. Even between
flag may be detected. In addition, implementations should
two terminals with full 64-kbps connections with byte
also consider the correctness of the header byte to verify
timing, sometimes there can be an intervening network
the correct detection of the flag. In an optional extension,
that drops 1 out of each 8 bits. Phase 2 of V.140 sorts
two consecutive PN flags may be used (double flag mode),
this out by exchanging a series of test patterns across
permitting even more robust flag detection.
the connections. Each terminal can then determine which
At level 1, there is no equivalent to the HDLC zero-
bits are usable for transmission, and in what order. This
insertion process, so emulation of the PN flag in the
process can make H.324/I calls possible on connections
data stream is not prevented. If possible, transmitters
that would be mismatched otherwise. should prevent the appearance of a PN flag pattern in
Finally, in phase 3, the two terminals exchange a list the information field by terminating the MUX-PDU and
of modes that they are capable of using, and choose one starting a new one when a PN flag emulation occurs.
to be used for the call. This capability exchange process Avoiding HDLC zero-insertion ensures that bit errors
can also be used as a substitute for the V.8bis ‘‘late start’’ cannot unsynchronize the zero-removal process, which
procedure in phase B of the PSTN H.324 call startup. could lead to error propagation through the entire MUX-
After V.140 is complete, call setup continues from PDU [15].
phase D. Operation of H.324/I after that is identical to
H.324 on the PSTN. 13.1.3. Level 2. Level 2, defined in H.223 Annex B,
uses the 16-bit PN flag from level 1 and additionally
13. H.324/M FOR MOBILE (WIRELESS) NETWORKS
Table 4. Level Structure of H.324/M
The success of digital cellular telephone systems led to a
strong industry desire for a real-time videophone service Level Multiplex Summary Ref.
for 2 12 and 3G cellular systems offering higher data rates, 3a 16-bit PN flag H.324, Annex C
such as GPRS (general packet radio service), EDGE
24-bit multiplex header H.223, Annexes A, B, C, D
(enhanced data rates for GSM evolution), cmda2000,
error-robust AL1M ,
and UMTS. AL2M , AL3M
The H.324 variant called H.324/M supports this
3 16-bit PN flag H.324, Annex C
application, and is described in H.324 Annex C. The
main difference from H.324 is the addition of robust 24-bit multiplex header H.223, Annexes A, B, C
error-robust AL1M,
error-detection and error-correction features in the H.223
AL2M, AL3M
multiplexer, and the use of G.723.1 Annex C scalable error-
protection coding for audio. These allow H.324/M to cope 2 16-bit PN flag H.324, Annex C
with the high bit error rates (as poor as 5 × 10−2 ) of these 24-bit multiplex header H.223, Annexes A, B
wireless networks. AL1, AL2, AL3
The 3rd Generation Partnership Project (3GPP) video- 1 16-bit PN flag H.324, Annex C
phone standard for UMTS, 3G-324M [14], uses the struc- 8-bit multiplex header H.223, Annex A
ture of H.324/M with a slightly different set of codecs and AL1, AL2, AL3
requirements. 0 8-bit HDLC flag H.324
8-bit multiplex header H.223
13.1. Level Structure of H.324/M
AL1, AL2, AL3 (same
H.324/M supports five different error-detection/correction as base PSTN H.324)
schemes, called ‘‘levels,’’ organized in a hierarchical
H.324: VIDEOTELEPHONY AND MULTIMEDIA FOR CIRCUIT-SWITCHED AND WIRELESS NETWORKS 927
replaces the 8-bit H.223 header with an extended 24-bit V.34 modem is replaced by a suitable wireless interface,
level 2 header. and a digital end-to-end connection is established.
The level 2 header includes a 4-bit multiplex code (the The main difference is the negotiation used to select
same as in base H.324), an 8-bit multiplex payload length which of the different H.223 levels, just described, will be
(MPL), and a 12-bit extended Golay code. used. The ‘‘stuffing sequence,’’ sent when the transmitter
The MPL field indicates the count of bytes in is has no information to send, is unique for each level and
the information field. The combination of PN flag is therefore used to signal the sending terminal’s highest
synchronization and the MPL makes it possible to supported level. Table 5 shows the stuffing sequences sent
overcome critical bit error or packet-loss situations that by each level.
are likely in the mobile environment. Each terminal starts by sending consecutive stuffing
The extended Golay code is used for error detection. sequences for the highest level it supports. Each receiver
It can also be used for a combination of error detection tries to detect stuffing sequences that correspond to one
and error correction, where error-detection capability of the levels supported by the receiver. If either terminal
decreases as error-correction increases. receives stuffing corresponding to a level lower than the
The packet marker (PM) bit is not included in the one it is sending, it changes its transmitted level to match
MUX-PDU header, as in levels 0 and 1. Instead, the PM is the far-end terminal. Once both terminals detect the same
signaled by inverting the following PN flag. stuffing sequence, the terminals operate at the same level
An optional extension provides even more redundancy and proceed with phase D of the H.324 call setup. (The
by re-sending the previous MUX-PDU header, in the 8- choice between levels 3 and 3a is signaled using H.245.)
bit level 0 header format, immediately after each level 2 This procedure guarantees that the terminals initially
header. start at the highest commonly supported level.
When the transmitter has no information to send, the H.324/M also defines an optional procedure to change
‘‘stuffing sequence,’’ sent repeatedly to fill the channel, is levels dynamically during the call (in phase E). This
the 2-byte PN flag followed by the 3-byte level 2 header, procedure is signaled via H.245 and applies separately
containing a multiplex code of ‘‘0000.’’ to each direction of transmission. For example, it is
possible to operate at level 1 in one direction and at
13.1.4. Level 3. H.223 Annex C defines level 3, which, level 2 in the opposite direction. This procedure is useful if
in addition to the level 2 multiplex enhancements, replaces the transmission characteristics (primarily bit error rate)
H.223 adaptation layers AL1, AL2, and AL3 with more change, and if this doesn’t happen too frequently.
error-robust ‘‘mobile’’ versions: AL1M, AL2M, and AL3M.
The level 3 adaptation layers can protect data with 13.3. Control Channel Segmentation and Reassembly Layer
forward error correction (FEC) using a rate-compatible (CCSRL)
punctured convolutional code (RCPC) [16], and with auto- At levels 1 and higher a control channel segmentation
matic repeat request (ARQ) schemes, to allow incorrectly and reassembly layer (CCSRL) is added between the
received data to be automatically retransmitted. A single NSRP and H.245 layers for logical channel 0. H.324/M
higher-level packet can also be split into several smaller has to guarantee error-free delivery for H.245 messages.
parts, to reduce the amount of data that can be damaged In general, longer NSRP frames are more likely to be
by uncorrected bit errors. affected by errors than are shorter ones. Therefore the
An optional interleaving scheme, which scrambles the CCSRL layer segments long H.245 messages into smaller
order of the payload data, can be used to make the FEC NSRP frames. This reduces the amount of NSRP data that
more effective by spreading out the effect of a short burst needs to be retransmitted due to errors.
of errors over a longer period. The headers of the higher-
level packets are also protected with FEC, using either a 13.4. Videotelephony in UMTS: 3G-324M
systematic extended BCH code, or an extended Golay code.
The level 3 stuffing sequence is the 2-byte PN flag 3G-324M [14] is the videotelephony standard for the
followed by the 3-byte level 2 header, containing a 3GPP’s UMTS. H.324/M was taken as the baseline, but
with a few changes, including
multiplex code of ‘‘1111.’’
0 HDLC flag
13.2. Call Setup 1 PN flag
2 PN flag + level 2 header, multiplex code 0000
The H.324/M call setup is essentially the same as for the 3 and 3a PN flag + level 2 header, multiplex code 1111
PSTN and ISDN variants of H.324. Similar to H.324/I, the
928 H.324: VIDEOTELEPHONY AND MULTIMEDIA FOR CIRCUIT-SWITCHED AND WIRELESS NETWORKS
Systems — Private Telecommunications Networks — Digital In signal processing the concept of orthogonal trans-
Channel Aggregation, ISO, Geneva, 1995, http://www.iso.ch. forms have a wide range of applications. Hadamard
11. K. Sklower et al., The PPP Multilink Protocol (MP), IETF matrices can be used as an orthogonal transform and,
RFC 1990, Aug. 1996. http://www.ietf.org/rfc. as in the case of the Fourier transform, a fast Hadamard
12. ITU-T Study Group 2, Recommendation E.164, ITU, Geneva, transform can be derived.
1997, http://www.itu.int. First we will give the definition and two possible con-
structions of Hadamard matrices, namely, the Sylvester
13. ITU-T Study Group 15, Recommendations of the I Series, ITU,
Geneva, 1988–2000, http://www.itu.int. and the Payley constructions. For n ≥ 4 only orders n
divisible by 4 may exist. By these two constructions, all
14. 3rd Generation Partnership Project (3GPP), TSG-SA Codec
but six possible Hadamard matrices of order up to 256
Working Group, 3G TS 26.110, Codec(s) for Circuit Switched
will be constructed. Namely, the orders 156, 172, 184, 188,
Multimedia Telephony Service: General Description, V.4.0.0,
http://www.3gpp.org. 232, and 236 cannot be constructed; however, there exist
constructions for these orders, for example, in Refs. 5 and
15. J. Hagenauer, E. Hundt, T. Stockhammer, and B. Wimmer,
3. Of course, orders larger than 256 may be obtained by
Error robust multiplexing for multimedia applications, Signal
Process. Image Commun. 14: 585–597 (1999).
the two constructions presented.
Afterward we will describe the connections to coding
16. J. Hagenauer, Rate-compatible punctured convolutional theory and then derive the fast Hadamard transform
codes (RCPC codes) and their applications, IEEE Trans.
for signal processing. Finally, we comment on several
Commun. 36(4): 389–400 (April 1988).
applications.
17. 3GPP TS 26.071: Mandatory Speech Codec; General Descrip-
tion, http://www.3gpp.org.
2. DEFINITIONS AND CONSTRUCTIONS OF
18. ISO/IEC 14496-1: 1999, Information Technology — Coding of
Audio-visual Objects, Part 1: Systems, ISO, Geneva, 1999,
HADAMARD MATRICES
http://www.iso.ch.
Definition 1. A Hadamard matrix, H = H(n), of order
19. 3GPP2, Video Conferencing Service in cdma2000, S.R0022,
V.1.0, July 2000, http://www.3gpp2.org. n, is an n × n matrix with entries +1 and −1, such that
HHT = nI, where I is the n × n identity matrix and H T is
the transposed matrix of H.
Let H(n) and H(m) be two Hadamard matrices. Then Payley Construction P2 (Symmetric Conference Matri-
H(nm) := H(n) ⊗ H(m) is a Hadamard matrix of order ces). If n ≡ 2 (mod 4) and there exists a symmetric
nm. conference matrix C(n), then
This construction was given by Sylvester [1]. Note that
often only constructions with n = 2 are called Sylvester I + C(n) −I + C(n)
H(2n) = (2)
type, however his original definition was more general. −I + C(n) −I − C(n)
+1 +1 is a Hadamard matrix.
Example 2. Let H(2) = and H(n) be Hada-
+1 −1 Construction of Conference Matrices. Now we need
mard matrices. Then the Hadamard matrix of order 2n, to construct conference matrices C(n), and for this
using Sylvester’s construction, is as follows: we will describe Paley’s constructions based on
quadratic residues in finite fields [2]. Note that in
H(n) H(n) any field, addition and multiplication are defined
H(2n) =
H(n) −H(n) and, in the following, we assume that the operations
with field elements are done accordingly. Let q be a
Walsh–Hadamard. In particular, the mth Kronecker finite field, q odd. A non-zero-element x ∈ q is called
powers of the normalized matrix S1 := H(2) gives a a quadratic residue if an element y ∈ q exists such
sequence of Hadamard matrices of the Sylvester type that x = y2 . Among the q − 1 nonzero elements of
q there exist exactly (q − 1)/2 quadratic residues
Sm := H(2m ) = S1 ⊗ Sm−1 , m = 2, 3, . . . and (q − 1)/2 quadratic nonresidues. The Legendre
function χ : q ⇒ {0, ±1} is defined by.
which are of special interest in communications. The rows
are often called Walsh or Walsh–Hadamard sequences. 0 if x = 0
χ (x) = +1 if x is a quadratic residue
−1 if x is a quadratic nonresidue
2.2. The Paley Construction
A conference matrix C(n) of order n is an n × n matrix Clearly the Legendre symbol is an indicator for
with diagonal entries 0 and other entries +1 or −1, which quadratic residues and nonresidues. A q × q matrix
satisfies CCT = (n − 1)I. Q = (Qx,y ) can be defined where the indices x,y of the
entries Qx,y are the elements of the field, x, y ∈ q and
Example 3. As an example, we give a conference matrix their value is Qx,y := χ (x − y). We denote by 1 the all one
of order 4: vector.
Skew-Symmetric Payley Conference Matrix. If q ≡ 3
0 −1 −1 −1 (mod 4), then
1 0 1 −1
C(4) =
1
.
−1 0 1 0 1
C(q + 1) = (3)
1 1 −1 0 −1T Q
The property CCT = (n − 1)I remains unchanged for is a skew-symmetric Paley conference matrix of
permutations of rows and columns that do not change order q + 1.
the diagonal elements or negations of rows and columns. Symmetric Payley Conference Matrix. If q ≡ 1 (mod
Again, matrices constructed by such operations are called 4), then
equivalent. 0 1
C(q + 1) = T (4)
Conference matrices C(n) exist only for even n. Two 1 Q
special cases are distinguished: (1) if C(n) = −CT (n) this
matrix is called a skew-symmetric conference matrix, which is a symmetric Paley conference matrix of order q + 1.
is possible only for n ≡ 0 (mod 4); and (2) if C(n) = CT (n),
this is called a symmetric conference matrix and n ≡ 2 Clearly Eq. (1) gives Paley construction P1 for a
(mod 4). Hadamard matrix of order q + 1 if q ≡ 3 (mod 4), and
We denote by I an identity matrix of order n. Eq. (2) gives Paley construction P2 for a Hadamard matrix
With both, symmetric and skew-symmetric conference of order 2(q + 1) if q ≡ 1 (mod 4). For both cases we will
matrices, Hadamard matrices can be constructed in the give an example.
following way.
Payley Construction P1 (Skew-Symmetric Conference Example 4. Let q = 7, q = 7 = {0, 1, 2, 3, 4, 5, 6}. The
Matrices). If n ≡ 0 (mod 4) and there exists a skew- elements {1, 2, 4} are quadratic residues while {3, 5, 6}
symmetric conference matrix C(n), then are quadratic nonresidues. We have q ≡ 3 mod 4, and
thus we will get a Hadamard matrix of order q + 1 = 8
H(n) = I + C(n) (1) using construction P1 in Eq. (1). First we construct
the skew-symmetric conference matrix with the help of
is a Hadamard matrix. matrix Q defined by the Legendre symbols. The rows and
HADAMARD MATRICES AND CODES 931
columns of Q are labeled by the elements of the field and the Hadamard matrix of order 20 is given by:
7 = {0, 1, 2, 3, 4, 5, 6}, respectively.
I + C(10) −I + C(10)
H(20) =
0 1 1 −1 1 −1 −1 −I + C(10) −I − C(10)
−1 0 1 1 −1 1 −1
There exist many other constructions of Hadamard
−1 −1 0 1 1 −1 1
matrices (see, e.g., constructions and tables in Refs. 3–5).
Q= 1 −1 −1 0 1 1 −1
,
−1 However, the two constructions we have presented give
1 −1 −1 0 1 1
1 almost all Hadamard matrices of order ≤256 as shown
−1 1 −1 −1 0 1
in Table 1. The matrices are constructed by combining
1 1 −1 1 −1 −1 0
the Sylvester and Paley constructions. Sm denotes the
0 1 1 1 1 1 1 1 Hadamard matrix H(2m ) of order 2m constructed by
−1 0 1 1 −1 1 −1 −1 recursion with the Hadamard matrix H(2). P1(q) and P2(q)
−1 −1 0 1 1 −1 1 −1 denote Hadamard matrices of order q + 1 and 2(q + 1),
−1 −1 −1 0 1 1 −1 1 respectively, given by the Paley constructions P1 and
C(8) =
−1
1 −1 −1 0 1 1 −1
the P2.
−1 −1 1 −1 −1 0 1 1
−1 1 −1 1 −1 −1 0 1
3. HADAMARD CODES
−1 1 1 −1 1 −1 −1 0
A binary code C of length n, of cardinality (number of code
With C(8) we get the Hadamard matrix as
words) M, and of minimal Hamming distance d is called
an [n, M, d] code.
H(8) = C(8) + I
1 1 1 1 1 1 1 1 3.1. Binary Codes from a Hadamard Matrix
−1 1 1 1 −1 1 −1 −1
By replacing +1s by 0s and −1s by 1s in a normalized
−1 −1 1 1 1 −1 1 −1
Hadamard matrix H(n) of order n, a base binary code
−1 −1 −1 1 1 1 −1 1
=
−1 1 −1 −1 1 1 1 −1
−1 −1 1 −1 −1 1 1 1
Table 1. List of Hadamard Matrices
−1 1 −1 1 −1 −1 1 1
−1 1 1 −1 1 −1 −1 1 n Construction n Construction
4 S2 132 P1(131)
Example 5. Let q = 32 = 9, q = 9 = {0, 1, α, α 2 , α 3 , 8 S3 136 S1 ⊗ H(68)
α 4 , α 5 , α 6 , α 7 }, where α is a root of the polynomial x2 − x − 1. 12 P1(11) 140 P1(139)
Elements {1, α 2 , α 4 , α 6 } are quadratic residues, while 16 S4 , P2(7) 144 S2 ⊗ H(36)
{α, α 3 , α 5 , α 7 } are quadratic nonresidues. With this, we 20 P1(19), P2(32 ) 148 P2(73)
again can construct the matrix Q as 24 P1(23), S1 ⊗ H(12) 152 S1 ⊗ H(76)
28 P2(13) 156 —
0 1 −1 1 −1 1 −1 1 −1 32 S5 160 S2 ⊗ H(40)
1 0 −1 −1 −1 1 1 −1 1 36 P2(17) 164 P1(163), P2(34 )
S1 ⊗ H(20)
−1 −1 0 1 1 1 −1 −1 1 40 168 P1(167)
1 −1 1 0 −1 −1 −1 1 1
44 P1(23) 172 —
48 P1(23), S2 ⊗ H(12) 176 S2 ⊗ H(44)
Q=
−1 −1 1 −1 0 1 1 1 −1
1 52 P2(52 ) 180 P1(179), P2(89)
1 1 −1 1 0 −1 −1 −1
56 S1 ⊗ H(28) 184 —
−1 1 −1 −1 1 −1 0 1 1
60 P1(59) 188 —
1 −1 −1 1 1 −1 1 0 −1 64 S6 192 S4 ⊗ H(12)
−1 1 1 1 −1 −1 1 −1 0 68 P1(67) 196 P2(97)
72 S1 ⊗ H(36) 200 P1(199)
The symmetric conference matrix is 76 P2(37) 204 P2(101)
80 P1(67), S2 ⊗ H(20) 208 S2 ⊗ H(52)
C(10) 84 P1(83) 212 P1(211)
88 P1(87), S1 ⊗ H(44) 216 —
0 1 1 1 1 1 1 1 1 1 92 — 220 P2(109)
1 0 1 −1 1 −1 1 −1 1 −1 96 S1 ⊗ H(48), S2 ⊗ H(24) 224 S3 ⊗ H(28)
1 1 0 −1 −1 −1 1 1 −1 1 100 P2(72 ) 228 P2(113)
1 −1 −1 0 1 1 1 −1 −1 1 104 P1(103) 232 —
1 1 −1 1 0 −1 −1 −1 1 1 108 P2(53) 236 —
=1
S2 ⊗ H(28) S2 ⊗ H(60)
−1
112 240
−1 −1 1 −1 0 1 1 1
1 116 — 244 P1(35 )
1 1 1 −1 1 0 −1 −1 −1
120 S1 ⊗ H(60) 248 S1 ⊗ H(124)
1 −1 1 −1 −1 1 −1 0 1 1
124 P2(61) 252 P1(251), P2(53 )
1 1 −1 −1 1 1 −1 1 0 −1 128 S7 256 S8
1 −1 1 1 1 −1 −1 1 −1 0
932 HADAMARD MATRICES AND CODES
C(= [n, n, d]) of length n and cardinality M = n is obtained. where subscripts are calculated (mod 2m − 1), and
Since the rows of H(n) are orthogonal, any two codewords
of C differ exactly in n/2 coordinates, hence the minimal 1 + h1 x + · · · + hm−1 xm−1 + xm
(Hamming) distance of C is d = n/2. This code is called
the Walsh–Hadamard code. With small manipulations we is a primitive irreducible polynomial.
can get three types of code from the base code C. Simplex codes have many applications, such as
range-finding, synchronizing, modulation, scrambling, or
pseudonoise sequences. The dual code of a simplex code is
1. C1[n − 1, M = n, d = n/2]. Note that the first column the Hamming code, which is a perfect code with minimal
of H(n) consists of +1s, since we assumed a distance 3. In the application section below, we will
normalized Hadamard matrix. Correspondingly, the describe the use of m sequences. For further information,
first coordinate of any codeword of C is equal see also Ref. 7.
to 0. So it can be deleted without changing the
cardinality and minimal distance. Thus one gets an 3.4. Reed–Muller Codes of First Order
[n − 1, M = n, d = n/2] code C1.
Codes C2 obtained from the Hadamard matrix Sm =
2. C2[n, M = 2n, d = n/2]. Let C2 be a code consisting S1 ⊗ Sm−1 are linear (2m , m + 1, 2m−1 ) codes. They are
of all codewords of the base code C and all known also as first-order Reed–Muller codes and are
their complements. Then C2 is an [n, M = 2n, d = denoted by R(1, m). The all-zero vector 0 and the all-
n/2] code. one vector 1 are valid codewords. All the other codewords
3. C3[n − 1, M = 2n, d = n/2 − 1]. Let C3 be a code have Hamming weight 2m−1 .
obtained from the code C2 by deleting first coor- The encoding is the mapping between an information
dinates. Then C3 is an [n − 1, M = 2n, d = n/2 − vector into a codeword. This mapping can be a multipli-
1] code. cation of the information vector by a so-called generator
matrix. One possible encoding of Reed–Muller codes is
based on a special representation of the generator matrix
3.2. The Plotkin Bound and Hadamard Matrices
Gm consisting of m + 1 rows and 2m columns. The first row
The Plotkin bound states that for an [n, M, d] binary code v0 of G is the all-one vector 1. Consider this vector as a
run of 1s of length 2m . The second vector v1 is constructed
by replacing this run by two runs of identical length 2m−1
d
M≤2 if d is even, 2d > n where the first run consists of 0s and the second of 1s. In
[2d − n] the third stage (v2 ), each run of length 2m−1 is replaced by
M ≤ 4d if d is even, n = 2d
(5) two runs of 0s and 1s with lengths 2m−2 . The procedure is
d+1 continued in a similar manner until one gets the last row
M≤2 if d is odd, 2d + 1 > n
[2d + 1 − n] vm of the form 0101 · · · 0101.
M ≤ 4d + 4 if d is odd, n = 2d + 1.
Example 6. Let m = 4, n = 2m = 16. Then the generator
There exists a conjecture that for any integer s > 0 matrix of R(1, m) is as follows:
there exists a Hadamard matrix H(4s). Levenstein proved
that if this conjecture is true then codes exist meeting the v0
v1
Plotkin bound (5) (for details, see Ref. 6).
Gm =
v2
v3
3.3. Simplex Codes
v4
The class of codes C1 is known as simplex codes, because
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
the Hamming distance of every pair of codewords is 0 0
0 0 0 0 0 0 1 1 1 1 1 1 1 1
the same. Codewords can be geometrically interpreted
=0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
as vertices of a unit cube in n − 1 dimensions and 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
form a regular simplex. If n = 2m and C1 is obtained
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
from the Hadamard matrix Sm = S1 ⊗ Sm−1 , then C1 is a
linear (2m − 1, m, 2m−1 ) code1 known also as a maximal- If the information bits are u0 , u1 , . . . , um , then the
length feedback shift register code. Any nonzero codeword corresponding codeword is
is known also as an m-sequence, or as a pseudonoise
(PN) sequence. If c = (c0 , c1 , . . . , c2 m−2 ) is a nonzero c = u0 v0 + u1 v1 + · · · + um vm
codeword, then
For the code R(1, m), a systematic encoding is also
cs = cs−1 hm−1 + cs−2 hm−2 + · · · + cs−m+1 h1 + cs−m , possible as an extended cyclic code [7].
s = m, m + 1, . . . , The decoding can be performed by the fast Hadamard
transform described below. However, there exist several
decoding methods; so-called soft decoding, which uses
1In case of linear codes one can give the dimension k instead of reliability information on the code symbols (if available) is
the number of codewords M. It is M = 2k for binary linear codes. also possible [7]. In the following we describe the multistep
HADAMARD MATRICES AND CODES 933
majority-logic decoding algorithm (the Reed algorithm), where Is denotes an identity matrix of order s. Such a
using Example 6. Let u0 , u1 , u2 , u3 , u4 be information bits matrix has exactly two nonzero entries (+1 or −1) in each
and c0 , c1 , . . . , c15 be the corresponding code symbols. row and in each column.
Note that
c + c or c + c or Example 7. Let m = 3, n = 8. Then
0 8
0 4
1 c + c or
c1 + c5 or
9
+1 + 1
c2 + c10 or
c2 + c6 or
+1 − 1
c3 + c11 or c3 + c7 or +1 + 1
u1 = , u2 = ,
c4 + c12 or
c8 + c12 or
+1 − 1
M(1) = ,
c5 + c13 or
c9 + c13 or 8 +1 + 1
c6 + c14 or c10 + c14 or
+1 − 1
c7 + c15 c11 + c15
+1 + 1
c + c or c + c or
0 2 0 1 +1 − 1
+
+
c1 c3 or
c 2 c 3 or
+1 +1
c4 + c6 or
c4 + c5 or
c + c or c + c or +1 +1
5 7 6 7
u3 = , u4 = . +1 −1
c8 + c10 or
c8 + c9 or
+1 −1
c9 + c11 or
c10 + c11 or
M(2) =
,
c12 + c14 or
c12 + c13 or
8
+1 +1
c13 + c15 c14 + c15 +1 +1
+1 −1
These equations give 8 votes for each information bit +1 −1
u1 , u2 , u3 , u4 and the majority of the votes determines the
+1 +1
value of each information bit. If 3 or less errors occurred
+1 +1
during transmission then all those bits will be decoded
+1 +1
correctly. It remains to decode u0 . Since c − (u1 v1 + u2 v2 +
+1 +1
.
u3 v3 + u4 v) = u0 v0 , we have an equation to determine u0 M(3) =
+1 −1
8
by the majority rule.
+1 −1
One of the first applications of coding was the first
+1 −1
order Reed–Muller code of length 32, which was used in
+1 −1
the beginning of the 1970s for data transmission in the
Mariner 9 mission.
The matrix Sm can be written as
from Paley constructions P1 for prime q. Such linear codes Next, y2 = y1 M(2)
n is obtained with the same number of
have length q, dimension k = 12 (q ± 1), and minimum calculations and so on. In the mth stage, one obtains y
distance satisfying the inequality d2 − d + 1 ≥ q. The with a total number of m2m calculations.
linear quadratic residue code obtained from H(24) is the
famous perfect Golay (23,12,7) code. 5. COMMUNICATION APPLICATIONS
4. THE HADAMARD TRANSFORM In this section we describe two main principles as examples
of the various applications of Hadamard matrices and
related codes and sequences. Example 8 is for user
Let x = (x1 , x2 , . . . , xn ) be a real-valued vector. If a
separation in communication systems; Example 9 is for
Hadamard matrix H(n) exists, then the vector y =
correlation in distance measurement or to measure the
xH(n) is said to be its Hadamard transform (or
channel impulse response.
Walsh–Hadamard transform). The Hadamard transform
is used in communications and physics, mostly for n = 2m
5.1. Code-Division Multiple Access (CDMA)
and H(n) = Sm . In general, the Hadamard transform
with Sm requires 2m × 2m additions and subtractions. A In CDMA communication systems, such as the standards
computationally more efficient implementation is the fast IS95 and UMTS, the user separation is done with
Hadamard transform based on the representation of Sm Walsh–Hadamard sequences. Hereby, the link from the
as a product of sparse matrices. Specifically, define for mobiles to the base station uses other means of user
1 ≤ i ≤ m matrices by separation but the so-called downlink (base station to
mobiles) uses Walsh–Hadamard sequences. First we
n := I2m−i ⊗ S1 ⊗ I2i−1
M(i) consider a small example.
934 HADAMARD MATRICES AND CODES
Example 8. Each user is assigned one row of the namely, between 4 and 256 in order to adapt to variable
Hadamard matrix. In the case of H(4) we can have four data rates and transmission conditions.
users:
5.2. Measurement of the Channel Impulse Response and
Alice: +1, +1, +1, +1 Bob: +1, −1, +1, −1 Scrambling
Charlie: +1, +1, −1, −1 Dan: +1, −1, −1, +1 For user separation, the receiver evaluates the cross-
correlation function between the received sequence and
Now for each user, information bits can be sent by using each user sequence. Measurement of distances or the chan-
her/his sequence or the inverted (negative) sequence, for nel impulse response exploits the autocorrelation function.
example, in the case of Charlie +1, +1, −1, −1 for bit zero Again we restrict ourself to real-valued sequences. Let x(k)
and −1, −1, +1, +1 for bit one. We assume that there is a be the cyclic shift of x by k positions. Then, the autocorre-
synchronous transmission of information for all users. For lation is defined as
Alice, Bob, and Dan, bit 0 is send and for Charlie a one.
1 (k)
n
Each of the four user’s receiver gets the sum of all signals
x (k) = xi xi
n
i=1
(+2, −2, +2, +2) = (+1, +1, +1, +1) + (+1, −1, +1, −1)
+ (−1, −1, +1, +1) + (+1, −1, −1, +1) The autocorrelation can take values between −1 and +1.
In the case of PN sequences the autocorrelation has either
Now the receiver can detect which bit was send by so the value 1 for k = 0 and k = n or
called correlation of the user’s sequence with the received
1
sequence as follows. For example, Bob computes − 0 < k < n, n odd,
x (k) = n
− 1
(+1) · (+2) + (−1) · (−2) + (+1) · (+2) + (−1) · (+2) = +4 0 < k < n, n even,
n−1
From the result +4, Bob concludes that a zero (the The mobile communication channel suffers from
sequence itself) was transmitted. multipath propagation when the receiver gets the sum
For Charlie we get of delayed and attenuated versions of the transmitted
signal. Usually it is assumed that the channel is time-
(+1) · (+2) + (+1) · (−2) + (−1) · (+2) + (−1) · (+2) = −4 invariant for a small period and can be described by
the impulse response of a linear time-invariant system.
From the result −4, Charlie concludes that the transmis- We will explain the measurement of the channel impulse
sion was a one (the inverted sequence). response using an example.
For the description of the CDMA principle, we Example 9. We assume that the sender transmits
restrict ourselves to real-valued sequences. Then the periodically the PN sequence pn = +1, +1, −1, −1, −1,
crosscorrelation x,y of two sequences x and y of length n +1, −1 of length 7. Therefore the transmitted signal is
is defined by (‘‘;’’ is used to mark the period):
1
n
x,y = xi yi
n
i=1
x = . . . ; +1, +1, −1, −1, −1, +1, −1;
+ 1, +1, −1, −1, −1, +1, −1; . . .
Clearly, with this definition, the crosscorrelation x,y = 0,
if x and y are orthogonal and x,y = 1 if x = y. Let Suppose the receiver gets
r = u1 x1 + u2 x2 + · · · + un xn be a received signal, where
u1 , u2 , . . . , un are the information bits {+1, −1} and the
yi = xi + 12 xi−2
xi are the rows of a Hadamard matrix. Then the CDMA
principle gives = . . . ; 32 , 12 , − 12 , − 12 , − 23 , 12 , − 32 ; 32 , 12 , − 21 , − 12 , . . .
In many CDMA systems, base stations transmit pilot 3. W. D. Wallis, A. P. Street, and J. Seberry Wallis, Combina-
tones consisting of PN sequences. Using the measurement torics: Room squares, sum-free sets, Hadamard matrices, in
method presented above, the mobile terminals can Lecture Notes in Mathematics, Springer, Berlin, 1972, p. 292.
determine the channel impulse response and use it 4. J. Seberry and M. Yamada, Hadamard matrices, sequences,
to adjust a so-called RAKE receiver. Because of the and block designs, in J. H. Dinitz and D. R. Stinson, eds.,
good autocorrelation and good cross-correlation properties Contemporary Design Theory: A Collection of Surveys, Wiley,
of PN sequences, they are also used to scramble the New York, 1992, pp. 431–560.
transmitted signals. Clearly the autocorrelation property 5. R. Craigen, Hadamard matrices and designs, in C. J. Colbourn
reduces the influence of the paths on each other and the and J. H. Dinitz, eds., The CRC Handbook of Combinatorial
crosscorrelation property guarantees the user separation. Designs, CRC Press, New York, 1996, pp. 370–377.
Thus, if we know the channel paths at the receiver, we can 6. F. J. MacWilliams and N. J. A. Sloan, The Theory of Error-
use a correlator for each path and add their results. This Correcting Codes, (3rd printing) North-Holland, 1993.
is the concept of a RAKE receiver. In IS95, a PN sequence 7. M. Bossert, Channel Coding for Telecommunications, Wiley,
of length 242 − 1 is used for this purpose, and in UMTS, New York, 1999.
scrambling codes are used on the basis of the PN sequences
of the polynomials x41 + x3 + 1 and x41 + x20 + 1.
The GPS system exploits both the good autocorrelation
and crosscorrelation properties of PN sequences. Several HELICAL AND SPIRAL ANTENNAS
satellites synchronously send their PN sequences. A GPS
receiver exploits the crosscorrelation to separate the HISAMATSU NAKANO
signals from different satellites and the autocorrelation to Hosei University
determine the timing offsets between different satellites Koganei, Tokyo, Japan
caused by signal propagation. Combining these timing
offsets with information on the actual position of the
satellites allows the precise calculations of the receiver’s 1. INTRODUCTION
position. The public PN sequences of length 1023 used
in GPS are generated by the polynomials x10 + x3 + 1 or This article discusses the radiation characteristics of
x10 + x9 + x8 + x6 = x3 + x2 + 1. various helical and spiral antennas. For this, numerical
techniques applied to these antennas are summarized
Acknowledgment in Section 2, where fundamental formulas to evaluate
The author would like to acknowledge the valuable hints and help the radiation characteristics are presented. Section 3
from Professor Ernst Gabidulin from the Institute of Physics and on helical antennas, is composed of three subsections,
Technology, Moscow, Russia. which present the radiation characteristics of normal-
mode, axial-mode, and conical-mode helical antennas,
respectively. Section 4, on spiral antennas, is composed
BIOGRAPHY
of five subsections; Sections 4.2 and 4.3 qualitatively
Martin Bossert received the Dipl.-Ing. degree in describe the radiation mechanism of spiral antennas, and
electrical engineering from the Technical University of Sections 4.4 and 4.5 quantitatively refer to the radiation
Karlsruhe, Germany, in 1981, and a Ph.D. degree form characteristics of the spirals. Finally, Section 5 presents
the Technical University of Darmstadt, Germany, in additional information on helical and spiral antennas: a
1987. After a DFG scholarship at Lingkoeping University, backfire-mode helix and techniques for changing the beam
Sweden, he joined AEG Mobile Communication, Ulm, direction of a spiral antenna.
Germany, where he was, among others, involved in the
specification and developement of the GSM System. Since
1993, he has been a professor with the University of Ulm, 2. NUMERICAL ANALYSIS TECHNIQUES
Germany, where he is currently head of the Departement
of Telecommunications and Applied Information Theory. This section summarizes numerical analysis techniques
His main research areas concern secure and reliable data for helical and spiral antennas. The analysis is based on
transmission, especially generalized concatenation/coded an electric field integral equation [1,2]. Using the current
modulation, code constructions for block and convolutional distribution obtained by solving the integral equation,
codes, and soft-decision decoding. the radiation characteristics, including the radiation field,
axial ratio, input impedance, and gain, are formulated.
BIBLIOGRAPHY
2.1. Current on a Wire
1. J. J. Sylvester, Thoughts on orthogonal matrices, simultane-
ous sign-successions, and tessellated pavements in two or more Figure 1 shows an arbitrarily shaped wire with a length
colours, with applications to Newton’s rule, ornamental tile- of Larm (from S0 to SE ) in free space. It is assumed that
work, and the theory of numbers, Phil. Mag. 34: 461–475 the wire is thin relative to an operating wavelength and
(1867). the current flows only in the wire axis direction. It is also
2. R. E. A. C. Paley, On orthogonal matrices, J. Math. Phys. 12: assumed that the wire is perfectly conducting and hence
311–320 (1933). the tangential component of the electric field on the wire
936 HELICAL AND SPIRAL ANTENNAS
z where
∧ SE 2
S 1 SE
∂ G(s, s )
S en (s) = Jn (s ) − + β 2
G(s, s
)ŝ • ŝ
ds
jωε0 S0 ∂s ∂s
(5)
Multiplying both sides of Eq. (1) by functions Wm (s)(m =
1, 2, . . . , N) and integrating the multiplied results over the
S′
∧ wire length from S0 to SE , one obtains
S′
(m0, e0)
N
In Zmn = Vm , m = 1, 2, . . . , N (6)
S0 n=1
y
O where
SE
Zmn = en (s)Wm (s)ds (7)
S0
SE
Vm = − Eis (s)Wm (s)ds (8)
S0
component. Using these two components, the axial ratio The ground plane in Fig. 3, which backs each helical
(AR) is defined as AR = {|ER | + |EL |}/{||ER | − |EL ||}. The AR arm, is very large and assumed to be of infinite extent
is an indicator of the uniformity of a CP wave. Note that in the following analysis. This assumption allows the use
AR = 1 (0 dB) when the radiation is perfectly circularly of image theory, where the ground plane is removed.
polarized. The removal of the ground plane enables one to use the
The input impedance Zin is defined as Zin = Rin + jXin = techniques described in Section 2.
Vin /Iin , where Vin and Iin are the voltage and current at
antenna feed terminals, respectively. √ The power input to 3.1. Normal-Mode Helical Antenna
the antenna is given as Pin = Rin |Iin / 2|2 . The gain G is
defined as The radiation when the circumference C is very small
√ relative to a given wavelength is investigated in this
|E(r, θ, φ)/ 2|2 /Z0 subsection. For this, a circumference of C = 0.0422λ0.5
G(θ, φ) = (25.3 mm) is chosen for the antenna structure shown in
Pin /4π r2
Fig. 3a, together with a pitch angle of α = 40◦ and number
|D(θ, φ)|2 of helical turns n = 4.5, where λ0.5 is the wavelength at
G(θ, φ) = (15)
60Pin a test frequency of 0.5 GHz = 500 MHz. The total arm
length Larm , including an initial wire length of
h =
where Z0 is the intrinsic impedance of free space (Z0 =
1.5 mm, is 14 λ0.5 . Therefore, the antenna characteristics
120 ) and D(θ, φ) is defined as
are expected to be similar to those of a monopole antenna
r of one-quarter wavelength.
D(θ, φ) = −jβr E(r, θ, φ) (16) Figure 4 shows the current I(= Ir + jIi ) distributed
e
along the helical arm at 0.5 GHz. The helical arm is chosen
D(θ, φ) is decomposed into two components as E(r, θ, φ) in to be thin: wire radius ρ = 0.001λ0.5 . It is found that the
Eq. (14): current distribution is a standing wave, as seen from the
phase distribution. Note that the phase is calculated from
D(θ, φ) = DR (θ, φ)(θ̂ − jφ̂) + DL (θ, φ)(θ̂ + jφ̂) (17) tan−1 (Ii /Ir ).
The radiation field from the current distribution
Therefore, the gains for right- and left-hand CP waves are at 0.5 GHz is shown in Fig. 5, where parts (a) and
calculated by GR (θ, φ) = |DR (θ, φ)|2 /30Pin and GL (θ, φ) = (b) are the radiation patterns in the x–z and y–z
|DL (θ, φ)|2 /30Pin , respectively. planes, respectively, and part (c) is the azimuth radiation
pattern in the horizontal plane (θ = 90◦ , 0◦ ≤ φ ≤ 360◦ ).
3. HELICAL ANTENNAS It is clearly seen that the helical antenna radiates a
linearly polarized wave: Eθ
= 0 and Eφ = 0. The maximum
Figure 2 shows a helical arm, which is specified by radiation is in the horizontal direction (θ = 90◦ ), where the
the pitch angle α, the number of helical turns n, and polarization is in the antenna axis (z-axis) direction. The
the circumference of the helix C. Helical antennas are radiation field component Eθ in the horizontal plane is
classified into three groups in terms of the circumference omnidirectional. The gain in the x direction is calculated
C relative to a given wavelength λ [4]: a normal-mode to be approximately 4.9 dB.
helical antenna (C
λ), an axial-mode helical antenna Additionally, Fig. 6 shows the radiation patterns at
(C ≈ λ), and a conical-mode helical antenna (C ≈ 2λ). The 0.5 GHz as a function of the number of helical turns
normal-mode helical antenna radiates a linearly polarized n, where the pitch angle and circumference remain
wave. The axial-mode and conical-mode helical antennas unchanged: α = 40◦ and C = 0.0422λ0.5 . It is found that,
radiate CP waves. The beam direction for each mode is as n increases, the radiation beam becomes sharper.
illustrated using arrows in Fig. 3.
3.2. Axial-Mode Helical Antenna
F
G.P.
C << l F
Figure 3. Helical antennas: (a) a normal-mode G.P.
helical antenna (C
λ); (b) an axial-mode heli- F C ≈ 2l
cal antenna (C ≈ λ); and (c) a conical-mode heli- G.P.
cal antenna (C ≈ 2λ). C ≈ 1l
arm end. The third region P2 –SE reflects this fact. The
0 reflected current from the arm end SE is not desirable
for a CP antenna, because it radiates a CP wave whose
rotational sense is opposite that of the forward current,
−90 resulting in degradation of the axial ratio.
F SE Figure 8 shows radiation patterns at 11.85 GHz, where
Arm length the electric field at a far-field point is decomposed into
Figure 4. Current distribution of a normal-mode helical right- and left-hand CP wave components. The forward
antenna. current traveling toward the arm end generates the
copolarization component. The copolarization component
for this helical antenna is a right-hand CP wave component
The amplitude |I| in the first region F –P1 shows a ER . The component EL for this helical antenna is called
rapid decay. Point P1 is the position where the current the ‘‘cross-polarization component.’’ The cross-polarization
first becomes minimal. The current in this region F –P1 component is generated by the undesired reflected current.
is a traveling wave, as seen from the phase progression. The axial ratio and gain in the z direction are calculated
It is noted that the first region acts as an exciter for the to be approximately 0.8 and 11.5 dB, respectively.
remaining helical turns. This is proved by the fact that the Additionally, Fig. 9 shows the gains for a right-hand
current distribution remains almost the same even when CP wave at 11.85 GHz as a function of the number of
the helical wire is cut at point P1 [5]. helical turns, n, for various pitch angles α, where the
30°
90° x 0°
0 −10 −20 [dB]
x f(q = 90°)
Figure 5. Radiation patterns of a normal-mode helical antenna: (a) in the x–z plane; (b) in the
y–z plane; (c) in the horizontal plane (θ = 90◦ , 0◦ ≤ φ ≤ 360◦ ).
HELICAL AND SPIRAL ANTENNAS 939
90° x x 90° x
0 −10 −20 [dB] 90° 0 −10 −20 0 −10 −20 [dB]
[dB]
Figure 6. Radiation pattern of a normal-mode helical antenna as a function of number of helical
turns: (a) n = 7.5; (b) n = 10.5; (c) n = 13.5.
4
C = 1.0l11.85
a = 12.5°
2 n = 15
Current [mA]
0
Ir
−2 Ii
I
f = 11.85 GHz
−4
0
−1000
Phase [deg.]
−3000
−5000
−7000
F P1 P2 SE
Arm length
(a) z (b) z
f = 11.85 GHz
C = 1.0l11.85 q q
a = 12.5° 0° 0°
n = 15 30° 30° 30° 30°
ER
EL 60° 60° 60° 60°
Figure 8. Radiation patterns of an axial-mode
90° x 90° y helical antenna: (a) in the x–z plane; (b) in the
0 −10 −20 [dB] 0 −10 −20 [dB] y–z plane.
circumference is kept constant: C = 1.0λ11.85 . Note that turns n = 2. The total arm length, including the initial
each gain increases as n increases. However, the gain wire length
h = 1 mm, is Larm = 4.09λ23.7 .
increase is bounded; that is, there is a maximum gain Figure 10 shows the current distribution along the
value for a given pitch angle α. helical arm, whose wire radius is ρ = 0.001λ23.7 . As
observed in the first region F –P1 of the axial-mode helical
3.3. Conical-Mode Helical Antenna antenna (see Fig. 7), the current is a decaying traveling
When the helical arm has a circumference of approxi- wave, which radiates a CP wave.
mately two wavelengths (2λ), the radiation from the helix It is obvious that local radiation from the helical arm
forms a conical beam [7,8]. To reflect this fact, a frequency forms the total radiation. If each turn of the helix is a
of 23.7 GHz is used for an antenna with a circumference of local radiation element and approximated by a loop whose
C = 25.3 mm = 2.0λ23.7 , where λ23.7 is the wavelength at circumference is two wavelengths, the currents along n
23.7 GHz. Other configuration parameters are arbitrarily local loops produce a zero field in the z direction and
chosen as follows: pitch angle α = 4◦ and number of helical a maximum radiation off the z axis; that is, the total
940 HELICAL AND SPIRAL ANTENNAS
−1000
4.2.1. First-Mode Radiation. The current along arm
A travels through point PA and reaches arm end EA .
−1500
F SE Similarly, the current along arm B travels through point
Arm length PB and reaches arm end EB . The phase of the current
Figure 10. Current distribution of a conical-mode helical
at point PB always differs from that at point PA by 180◦ ,
antenna. because of the antiphase excitation at terminals TA and
TB . This is illustrated using arrows at PA and PB in Fig. 13.
Note that the direction of the arrow at PB is opposite that
radiation forms a conical beam. Figure 11 reflects this of the arm growth, and the two arrows at PA and PB are in
fact, where the radiation in the beam direction (θ = 34◦ ) the same direction.
is circularly polarized. Note that the azimuth pattern An interesting phenomenon is found when points PA
of Fig. 11c is calculated in a plane of (θ = 34◦ , 0◦ ≤ φ ≤ and PB are located on a ring region, with a center
360◦ ), where the copolarization component ER shows an circumference of one wavelength (λ), in the spiral
omnidirectional pattern. plane. Points PA and PB in this case are separated
by approximately one-half wavelength (0.5λ) along the
circumference. The current at PA and the current at
4. SPIRAL ANTENNA its neighboring point PB on arm B are approximately
in phase, because the current traveling from point PB
A spiral antenna is an element that radiates a CP wave to point PB along arm B (traveling by approximately
over a wide frequency bandwidth [9–12]. The mechanism one-half wavelength, since the spiral arms are tightly
of the CP wave radiation is investigated for two cases of wounded) experiences a phase change of approximately
excitation: (1) antiphase and (2) in-phase. First, the CP 180◦ . Similarly, the current at PB and the current at its
radiation is qualitatively explained in terms of current neighboring point PA on arm A are approximately in phase.
band theory [9], which is based on transmission line Figure 13 illustrates these four currents at points PA , PB ,
HELICAL AND SPIRAL ANTENNAS 941
90° x 90° y
0 −10 −20 [dB] 0 −10 −20 [dB] 60°
30°
0°
f (q = 34°)
x
Figure 11. Radiation patterns of a conical-mode helical antenna: (a) in the x–z plane; (b) in the
y–z plane; (c) in a plane of (θ = 34◦ , 0◦ ≤ φ ≤ 360◦ ).
(a) z (b) y
y
EB fwn
Arm B
rA
x fst
EA TA
O x
TB
Arm A
Figure 12. A two-wire spiral antenna: (a) perspective view; (b) top view.
+
4.2.2. Third-Mode Radiation. When points PA and PB
are located on a ring region of three-wavelength (3λ) z x
circumference in the spiral plane (see Fig. 14), points −
PA and PB are separated by 1.5 wavelengths along this
3λ circumference. Therefore, the current along arm B
experiences a phase change of approximately 360◦ + 180◦
PB
from point PB to point PB . As a result, the currents at points
PA and PB are approximately in phase; that is, a current
PA′
band is formed. Similarly, the currents at points PB and PA
are approximately in phase, and a current band is formed.
The currents on arms A and B become in-phase with a
Figure 13. Current bands for first-mode radiation.
period of one-half wavelength along the 3λ circumference.
942 HELICAL AND SPIRAL ANTENNAS
y y
PB′ PB′
PA
PA
z x z x
2l loop
PB
PB PA′
PA′
3l loop
As a result, four more current bands are formed between Arrows PA and PB in Fig. 15 illustrate the phases of the
points PA and PB , as shown in Fig. 14. The directions of currents at points PA and PB . The directions of the arrows
all current bands along the 3λ circumference cause the at points PA and PB are in the arm-growth direction and
electric fields, rotating with time, to cancel on the z axis opposite each other.
and add off the z axis. This radiation is called ‘‘third-mode The current traveling from points PB to PB experiences a
radiation.’’ phase change of approximately 360◦ , because the distance
from points PB to PB along arm B is approximately one
4.2.3. Odd-Mode Radiation. So far, first-mode radia- wavelength. It follows that the direction of the arrow for
tion and third-mode radiation have been discussed under the current at point PB is in the arm-growth direction, as
the condition that the excitation at terminals TA and TB is shown in Fig. 15. The arrows at points PA and PB have
antiphase. The first- and third-mode radiation components the same direction, and indicate the formation of a current
result from the current bands formed over two regions of band.
1λ and 3λ circumferences in the spiral plane, respec- Similarly, the current traveling from points PA to PA
tively. The mechanism described in the previous section experiences a phase change of approximately 360◦ , due
leads to the formation of higher odd mth-mode radiation to an approximately one-wavelength difference between
(m = 5, 7, . . .) as long as the currents exist over regions these points. An arrow at point PA in Fig. 15 shows this
of circumference of mλ in the spiral plane. Note that fact. The arrows at points PA and PB have the same
each higher odd mth-mode radiation component becomes direction, and indicate the formation of a current band.
maximal off the z axis. The two middle points between points PA and PB along
As the currents leave the regions of circumference the 2λ circumference are separated from PA (and PB ) by
of mλ(m = 1, 3, 5, . . .) in the spiral plane, the in-phase one-half wavelength. A current band is formed at each
condition of the neighboring currents on arms A and B middle point. It follows that four current bands are formed
becomes destructive. The destructive currents contribute over a ring region of 2λ circumference in the spiral plane.
little to the radiation. The radiation, therefore, is As seen from the direction of the arrows in Fig. 15, the
characterized by a sum of odd-mode radiation components radiation from these four current bands is zero on the z
that the spiral supports. axis and maximal off the z axis. This radiation is called
‘‘second-mode radiation.’’
4.3. In-Phase Excitation
When terminals TA and TB are excited in antiphase, the 4.3.2. Even-Mode Radiation. One can conclude from
spiral has odd-mode radiation components, as discussed in the previous observation that the spiral with in-phase
the previous subsection. Now the radiation when terminals excitation does not have current bands in the ring region
TA and TB are excited in phase (terminals TA and TB are whose circumference is 3λ in the spiral plane. The phase
excited with the same amplitude and the same phase) is relationship of the currents over the 3λ ring region is
considered. Realization of in-phase excitation is found in destructive (out of phase), thereby not forming current
Section 4.5. bands. However, as the currents on arms A and B further
travel toward their arm ends, the phase relationship
4.3.1. Second-Mode Radiation. A situation where of the currents gradually becomes constructive. When
points PA and PB are located on a ring region of 2λ cir- the currents reach a ring region whose circumference
cumference in the spiral plane is investigated. The phases is four wavelengths in the spiral plane, the phases of
of the currents at points PA and PB are the same due to the neighboring currents become in-phase. Again current
two facts; terminals TA and TB are excited in phase, and bands are formed. This radiation is called ‘‘fourth-mode
the distance from terminal TA to point PA along arm A is radiation.’’ Similarly, higher even-mode radiation occurs
equal to that from terminal TB to point PB along arm B. until the currents die out. It is noted that even mth-mode
HELICAL AND SPIRAL ANTENNAS 943
radiation (m = 2, 4, . . .) have zero intensity on the z axis So far, the antenna characteristics at a frequency of
and maximal off the z axis. 6 GHz have been discussed. Now, the frequency responses
of the radiation characteristics are investigated. Figure 18
4.4. Numerical Results of a Spiral Antenna with Antiphase shows the axial ratio (AR) in the positive z direction
Excitation as a function of frequency, together with the gain for a
The radiation mechanisms of a spiral antenna have right-hand CP wave, GR , in the positive z direction. It
been qualitatively discussed in Sections 4.2 and 4.3. In is observed that, as the frequency decreases, the axial
this subsection, the radiation characteristics of a spiral ratio deteriorates. This is due to the fact that, as the
antenna with antiphase excitation are quantitatively frequency decreases, the ring region of one-wavelength
obtained on the basis of the numerical techniques circumference in the spiral plane moves toward the
presented in Section 2. periphery of the spiral and finally disappears. With the
The configuration parameters of the spiral are chosen as movement of the one-wavelength circumference ring, the
follows: spiral constant asp = 0.0764 cm/rad and winding polarization of the radiation becomes elliptical, that is, the
angle φwn ranging from φst = 2.60 rad to φend = 36.46 rad. axial ratio increases.
These configuration parameters lead to an antenna Figure 19 shows the input impedance Zin (= Rin + jXin )
diameter of 2π asp φend = 17.5 cm = 3.5λ6 , where λ6 is the as a function of frequency. The input impedance is rela-
wavelength at a frequency of 6 GHz. In other words, the tively constant over a wide frequency bandwidth. Note that
spiral at a frequency of 6 GHz includes a ring region of the input impedance is always Zin = 60π when the fol-
three-wavelength circumference in the spiral plane. Note lowing two conditions are satisfied: (1) arms A and B, each
that the wire radius of the spiral arm is ρ = 0.012λ6 . made of a strip conductor, are infinitely wound; and (2)
Figure 16 shows the current I(= Ir + jIi ) distributed the spacing between the two arms equals the width of the
along arm A at a frequency of 6 GHz. Since the currents strip conductor. The antenna satisfying these conditions
on arms A and B are symmetric with respect to the is called the ‘‘self-complementary antenna’’ [13].
centerpoint o (note that the distance between terminals
TA and TB is assumed to be infinitesimal), this figure 4.5. Numerical Results of a Spiral Antenna with In-Phase
shows only the current along arm A. It is found that Excitation
the current decreases, traveling toward the arm end. The
Figure 20 shows a spiral antenna with in-phase excitation,
phase progression of the current close to that in free space
where a round conducting disk, approximated by wires for
(= −2π s/λ6 , where s is the distance measured along the
analysis (see Fig. 20c), is used for exciting the spiral.
spiral arm from the centerpoint o to an observation point).
A voltage source is inserted between the spiral and the
This means that the wavelength of the current along the
conducting disk. The spiral is a backed by a conducting
arm, called the ‘‘guided wavelength λg ,’’ is close to the
plane of infinite extent.
wavelength propagating in free space, λ6 .
The configuration parameters are as follows: spiral
Figure 17 shows the radiation patterns in the x–z plane
constant asp = 0.04817 cm/rad, winding angle φwn ranging
and y–z plane. The spiral equally radiates waves in the ±z
from φst = 8π rad to φend = 37.5 rad, wire radius ρ =
hemispheres. The radiation has a maximum value in the
0.00314λ6 , disk diameter Ddisk = 0.49λ6 , spacing between
±z directions and is circularly polarized. The axial ratio in
spiral and disk H = 0.046λ6 , and spacing between spiral
the ±z directions is approximately 0.1 dB, and the gain is
and conducting plane Hr = 14 λ6 . The spiral at 6 GHz
approximately 5.5 dB.
includes a ring region of two-wavelength circumference in
the spiral plane. Note that a ring region of one-wavelength
8
circumference does not contribute to the radiation, and
hence the arms inside the one–wavelength circumference
are deleted, as shown in Fig. 20a.
4
Current [mA]
−4000
5. ADDITIONAL INFORMATION
o SE
Arm length s The axial-mode helix in Section 3.2 has been analyzed
Figure 16. Current distribution of a spiral antenna with under the condition that its ground plane is of infinite
antiphase excitation. extent. An interesting phenomenon is observed when
(a) ER z (b) z
EL q q
6GHz 0° 0°
30° 30° 30° 30°
x y
−20 −20
−10 −10
Figure 17. Radiation patterns of a spiral
antenna with antiphase excitation: (a) in the 0 0
x–z plane; (b) in the y–z plane. (dB) (dB)
6 200
5
GR R in
4 100
AR, GR [dB]
Z in [Ω]
3
2 0
X in
1 AR
0 −100
2 3 4 5 6 7 2 3 4 5 6 7
Frequency [GHz] Frequency [GHz]
Figure 18. Axial ratio and gain as a function of frequency. Figure 19. Input impedance as a function of frequency.
H ∼
Spiral antenna
Hr Disk
Ddisk
Disk
x Conducting plane
of infinite extent
(c)
Ddisk
Figure 20. A spiral antenna with in-phase excitation: (a) perspective view; (b) side view; (c) disk
approximated by wires.
944
HELICAL AND SPIRAL ANTENNAS 945
−20 6
−10 9
(dB)
0
10 (dB)
Figure 21. Radiation from a spiral antenna with in-phase excitation: (a) radiation pattern in the
x–z plane; (b) radiation pattern in a plane of (θ = 40◦ , 0◦ ≤ φ ≤ 360◦ ); and (c) axial ratio pattern
in a plane of (θ = 40◦ , 0 ≤ φ ≤ 360◦ ).
the size of the ground plane is made comparable to the characteristics are stable over a wide frequency band, for
circumference of the helix — the helix radiates a CP wave example, ranging from 1 to 10 GHz.
in the backward direction (in the negative z direction).
This is called the ‘‘backfiremode.’’ An application of the BIOGRAPHY
backfiremode is found in Ref. 14, where the helix is used as
a CP primary feed for a parabolic reflector. This reflector Hisamatsu Nakano received his B.E., M.E., and Dr. E.
antenna has a high gain of more than 30 dB with an degrees in electrical engineering from Hosei University,
aperture efficiency of more than 70%. It is widely used Tokyo, Japan, in 1968, 1970, and 1974, respectively.
for direct broadcasting satellite (DBS) signal reception. Since 1973 he has been a member of the faculty
For more recent research on helical antennas, including of Hosei University, where he is now a professor
the analysis of a helix wound on a dielectric rod and of electronic informatics. His research topics include
four helices with a cavity, readers are directed to Refs. 15 numerical methods for antennas, electromagnetic wave
and 16. scattering problems, and light wave problems. He has
The spiral antenna discussed in Section 4.4 has a, published more than 170 refereed journal papers and
‘‘bidirectional beam’’; that is, the radiation occurs in the 140 international symposium papers on antenna and
±z-hemispheres, as shown in Fig. 17. It is possible to relevant problems. He is the author of Helical and Spiral
change the bi-directional beam to a ‘‘unidirectional beam’’ Antennas (Research Studies Press, England, Wiley, 1987).
(radiation in the +z hemisphere) using two techniques. He published the chapter ‘‘Antenna analysis using integral
One is to use a conducting plane, as shown in Fig. 22a, equations,’’ in Analysis Methods of Electromagnetic Wave
and the other is to use a cavity, as shown in Fig. 22b. Problems, vol. 2 (Norwood, MA: Artech House, 1996).
The conducting plane in Fig. 22a is put behind the spiral He was a visiting associate professor at Syracuse
usually with a spacing of one-quarter wavelength (0.25λ) University, New York, during March–September 1981,
at the operating center frequency. The gain is increased a visiting professor at University of Manitoba, Canada,
because the rear radiation is reflected by the conducting during March–September 1986, and a visiting professor
plane and added to the front radiation. However, the at the University of California, Los Angeles, during
conductor plane affects the wideband characteristics of September 1986–March 1987.
the spiral. To keep the antenna characteristics over a Dr. Nakano received the Best Paper Award from the
wide frequency bandwidth, it is recommended that the IEEE 5th International Conference on antennas and
spacing between the spiral and the conducting plane propagation in 1987. In 1994, he received the IEEE AP-S
be small and absorbing material be inserted between Best Application Paper Award (H. A. Wheeler Award).
the outermost arms and the conducting plane [12]. A
theoretical analysis for this case using a finite-difference BIBLIOGRAPHY
time-domain method [17] is found in Ref. 18.
The inside of the cavity in Fig. 22b is filled with 1. K. K. Mei, On the integral equations of thin wire antennas,
absorbing material. The rear radiation is absorbed and IEEE Trans. Antennas Propag. 13(3): 374–378 (May 1965).
does not contribute to the front radiation. Only one-half 2. E. Yamashita, ed., Analysis Methods for Electromagnetic
of the power input to the spiral is used for the radiation. Wave Problems, Artech House, Boston, 1996, Chap. 3.
Therefore, the gain does not increase, unlike the gain for 3. R. F. Harrington, Field Computation by Moment Methods,
the spiral with a conducting plane. However, the antenna Macmillan, New York, 1968.
946 HF COMMUNICATIONS
(a) z (b) z
y y
x x
Cavity
Conducting plane
z z
Spiral Spiral
∼ ∼
Absorbing material
y y
x x
Conducting plane Cavity
Figure 22. Techniques for unidirectional beam: (a) a conducting plane; (b) a cavity filled with
absorbing material.
4. J. D. Kraus, Antennas, 2nd ed., McGraw-Hill, New York, 17. A. Taflove, Computational Electrodynamics: The Finite-
1988, Chap. 7. Difference Time Domain Method, Artech House, Norwood,
5. H. Nakano and J. Yamauchi, Radiation characteristics of MA, 1995.
helix antenna with parasitic elements, Electron. Lett. 16(18): 18. Y. Okabe, J. Yamauchi, and H. Nakano, A strip spiral
687–688 (Aug. 1980). antenna with absorbing layers inside a cavity, Proc. 2001
Communications Society Conf. IEICE, Choufu, Japan, Sept.
6. H. Nakano and J. Yamauchi, The balanced helices radiating
2001, p. B-1-107.
in the axial mode, IEEE AP-S Int. Symp. Digest, Seattle, WA,
1979, pp. 404–407.
7. R. C. Johnson, Antenna Engineering Handbook, 3rd ed.,
McGraw-Hill, New York, 1993, Chap. 13. HF COMMUNICATIONS
8. H. Mimaki and H. Nakano, A small pitch helical antenna
radiating a circularly polarized conical beam, Proc. 2000 JULIAN J. BUSSGANG
Communications Society Conf. IECE, Nagoya, Japan, Sept. STEEN A. PARL
2000, p. B-1-82. Signatron Technology
Corporation
9. J. A. Kaiser, The Archimedean two-wire spiral antenna, IRE
Concord, Massachusetts
Trans. Antennas Propag. AP-8(3): 312–323 (May 1960).
10. H. Nakano and J. Yamauchi, Characteristics of modified 1. INTRODUCTION TO HF COMMUNICATION
spiral and helical antennas, IEE Proc. 129(5)(Pt. H): 232–237
(Oct. 1982). High-frequency (HF) communications is defined by the
International Telecommunication Union (ITU) as radio
11. H. Nakano et al., A spiral antenna backed by a conducting
plane reflector, IEEE Trans. Antennas Propag. AP-34(6): transmission in the frequency range 3–30 MHz. However,
791–796 (June 1986). it is common to consider the lower end of the HF band as
extending to 2 MHz.
12. J. J. H. Wang and V. K. Tripp, Design of multioctave spiral-
HF radio communications originally became popular
mode microstrip antennas, IEEE Trans. Antennas Propag.
because of its long-range capabilities and low cost. For
39(3): 332–335 (March 1991).
high-rate digital communications, the HF channel is,
13. Y. Mushiake, Self-complementary antennas, IEEE Antennas however, a rather difficult communications medium. It
Propag. Mag. 34(6): 23–29 (Dec. 1992). subjects the transmitted signal to large variations in
14. H. Nakano, J. Yamauchi, and H. Mimaki, Backfire radiation signal level and multiple propagation paths with delay-
from a monofilar helix with a small ground plane, IEEE time differences large enough to cause signal overlap, and
Trans. Antennas Propag. 36(10): 1359–1364 (Oct. 1988). hence self-interference. A relatively high rate of fading
15. H. Nakano, M. Sakai, and J. Yamauchi, A quadrifilar helical can result and hinder the ability of receivers to adapt to
antenna printed on a dielectric prism, Proc. 2000 Asia-Pacific the channel. High-data-rate communications over HF have
Microwave Conf., Sydney, Australia, (Dec. 2000), Vol. 1, also been hampered by having to contend with narrowband
pp. 1432–1435. channel allocations, a large number of legacy users, and
16. M. Ikeda, J. Yamauchi, and H. Nakano, A quadrifilar helical restrictive regulatory requirements.
antenna with a conducting wall, Proc. 2001 Communications Communications in the HF band rely on either
Society Conf. IEICE, Choufu, Japan, (Sept. 2001), p. B-1-70. ground-wave propagation or sky-wave propagation, which
HF COMMUNICATIONS 947
2. HF RADIO APPLICATIONS
involves reflections from the ionosphere (Fig. 1). Ground-
wave propagation is composed of a direct ray, a ground HF radio was initially used for Morse code and voice
reflected ray, and a surface wave due to ground currents. transmission. Accordingly, under international radio
The lower the frequency is, the farther ground-wave conventions, much of the spectrum was organized into
signals propagate. At the upper end of HF frequencies, a series of narrowband channels, with voice channel
the ground-wave attenuates rather rapidly with distance allotted a nominal 3-kHz band for single sideband (SSB)
and plays less of a role for long-distance communications. transmission. Either the upper (USB) or the lower (LSB)
The other mode of HF propagation, sky-wave, relies on sideband could be selected. For voice communications,
the ionosphere, which is a series of ionized layers above HF radio has sometimes been combined with vocoders
the earth. Ionization is induced by solar radiation on the (speech bandwidth compression devices), particularly
day side of the earth and by cosmic rays and meteors on when double-sideband modulation of speech had to be
the night side. This causes rays to be reflected, or rather accommodated within the narrow bandwidth of a typical
refracted, from the ionosphere. HF channel.
Refraction in the ionosphere is stronger at lower Nowadays even voice is transmitted digitally, so that
frequencies. As a result, frequencies below HF are the principal interest has shifted to various digital
refracted from lower layers. At frequencies above HF rays modulation schemes and modems tailored to be effective
do not refract enough, even from the higher layers. Thus over HF channels.
there is a highest frequency supported by HF sky-wave As digital transmission was introduced, frequency
propagation. D-layer absorption determines the lowest shift keying (FSK) and phase shift keying (PSK)
frequency in the HF band supporting a sky-wave mode. became increasingly applied as modulation techniques.
Well below HF, in the very low frequency (VLF) In addition, frequency hopping has been used for military
band ground-wave and sky-wave propagation combine to digital transmission, when it is desirable to avoid detection
form the earth–ionosphere waveguide, which can become of the transmissions so as to prevent a jammer from
nearly lossless, allowing worldwide communications. HF locating and concentrating on a particular transmission
is normally the highest frequency band that can propagate frequency.
over long distances within this waveguide. A typical application of HF radio has been marine radio
The importance of HF derives from the fact that HF from coastal stations to ships and back, and between naval
radiowaves are capable of long-distance transmission vessels. The frequency is selected according to the time of
with relatively little power, which was crucial before day, season of the year, and distance to be covered. Some
satellites and long-range cables became available. The of the marine communications have switched to satellite
ability to communicate using radio transmission at long links, but HF is still commonly used because of its lower
distances was discovered by Guglielmo Marconi when he cost. HF voice links continue to be very much in use for
demonstrated the first transatlantic wireless telegraphy emergency or distress calls.
in December 1901. He used radio transmission to send A new digital service has been introduced by ARINC,
Morse code signals from England to Newfoundland. The augmenting existing HF voice radios to provide data
existence of the ionosphere was not known at that time. communications between commercial aircraft and ground.
A year later Oliver Heaviside and independently Arthur This service is intended primarily for commercial aircraft
Edwin Kennelly suggested the existence of an ionized layer flying over the oceans, out of reach of other radio
to explain long-range propagation. It was not until 1924 communications. The service, HF Data Link (HFDL) [1],
that Edward V. Appleton confirmed this. and 11 HF shore stations around the world under control
More recently the role of HF has to some extent been of two long-distance operational control (LDOC) centers,
taken over by communications satellites, which hover over one in New York and the other in San Francisco. HF
the earth in geostationary positions and permit over- links operated by ARINC are used by the Federal Aviation
the-horizon communication independent of day or night. Agency (FAA) for air traffic control (ATC) communications.
However, satellite communications is more costly than HF, Above 80◦ north, HFDL is the only communications
948 HF COMMUNICATIONS
medium available to commercial aviation. Elsewhere, the various countries and private networks linking remote
HFDL service competes with satellite services primarily outposts around the world, such as International Red
on cost. HF is still used in Australia for the Royal Flying Cross field sites and oil exploration stations.
Doctor Service (RFDS), but its role has shifted to function The allocation of HF frequencies is strictly regulated
more as an emergency backup to the telephone network by licensing in individual countries, and is coordinated
and to satellite systems. worldwide by the ITU and in the United States by
As HF data links become more sophisticated and the Federal Communications Commission (FCC). Some
better able to cope with link variability, they find more of the frequencies are allocated to citizens band (CB),
applications in digital networking, data transmission, and some to amateur fixed public use, and some to marine
email. Indeed, email is a growing marine application. and aeronautical communications. Government groups,
The shipboard HF radio stays in contact with one of including the military, have their own allocations. The full
several shore-based HF stations, which act as mail FCC table of HF allocations is available on the Internet [4].
servers. Several commercial HF email networks have been Wideband HF has been an active topic of research, but
established as an outgrowth of marine amateur radio it is still considered impractical by many because of the
operations [2]. The Red Cross has used the CLOVER legacy of narrowband frequency allocations.
protocol. Another network protocol is called PACTOR,
which is an FSK based scheme developed in Germany
in the late 1980s by a group of ham operators. A newer 3. HF CHANNEL PROPAGATION
proprietary protocol is PACTOR-II, which uses a two-
tone differential phase shift keying (DPSK) [3]. Both use HF frequencies propagate by ground-wave signals along
a parallel-tone modulation scheme. The data rates for the conducting earth or by skywave signals refracted
PACTOR-II range from 100 to 800 baud depending on from the layers of the ionosphere, as summarized in
conditions. The throughput is up to 140 characters per Table 1. Many additional details about HF propagation
second. and communication methods may be found in the
HF radios serve as important links for data communica- literature [5–7].
tions for all military services. The U.S. Air Force operates Ground-wave propagation is quite predictable and
Scope Command, a network of HF stations. The U.S. Army steady. Signal strength drops off inversely with distance,
uses HF for long-range handheld radio communications. but depends strongly on the polarization, the ground
The U.S. Navy uses HF for ship-to-ship and shore-to- conductivity, the dielectric constant, and the transmission
ship communications, mostly at short range using ground frequency. Saltwater provides excellent conductivity, and
waves. The U.S. Army and the U.S. Marine Corps often therefore transmission over seawater is subject to the
use HF in near-vertical incidence sky-wave (NVIS) mode least attenuation. At frequencies below HF, ground-
to communicate short range over mountains and other wave communications reach several thousand kilometers,
obstacles (Fig. 1). The National Communications System whereas at HF communications can range from tens of
(NCS) links a large network of HF stations (SHARES) kilometers over dry ground to hundreds of kilometers over
and a large number of allocated frequencies for national seawater.
emergency communications. The NATO Standardization Skywave signals depend primarily on the ionospheric
Agreement (STANAG) 5066 data-link protocol is used in layer from which they reflect. The layers can often
HF email gateways by the NCS and by U.S. and European support several rays, or paths, between two terminals.
military services, and has been adopted by NATO for the The transmissions at lower frequencies tend to reflect
Broadcast and Ship-Shore (BRASS) system. from lower layers, while the transmissions at higher
Other important HF radio applications include frequency penetrate deeper and reflect from higher layers.
encrypted communications with diplomatic posts in Transmissions at frequencies above the HF band tend to
go right through the ionosphere, as they are not reflected on system parameters (transmitter power, antenna,
back to earth. modulation, and noise) as well.
We use the term reflection to indicate that a ray returns The E layer sometimes contains patches of denser
to earth. Actually as a radiowave hits the ionosphere, it is ionization that generate an additional reflection called
not reflected as if from a smooth surface boundary; rather, ‘‘sporadic E.’’ Sporadic E is strong and usually nonfading.
it is refracted, or somewhat bent, so that it appears as if it It can degrade long-range HF communications by
had been reflected from a mirrorlike surface at a greater preventing signals from reaching the F2 layer. On
height. This apparent height of reflection is called the the other hand, it can make possible medium-range
virtual height. communications at frequencies well above the HF band.
The lowest ionospheric layer is the D layer, which thins A ray reflected from the ionosphere can exhibit fading
out at night. The E layer above it also thins out at night. and time-delay dispersion due to absorption and small
During the day it can refract frequencies in the lower HF variations of the propagation medium along the path. The
band. The highest layer, F, is split during the day into fading of a single ray is generally uniform across a wide
the F1 layer and the higher F2 layer. By refracting from a bandwidth and is called flat fading.
higher layer, the radiowaves can reach out further. The F2 Multiple rays can be created by reflections from
layer is the most important layer for daytime propagation different layers (see Fig. 1). The different rays arrive
of long-range HF rays; it permits communications in the at the receiver with different delays and combine to
higher end of the HF band. cause a very-frequency-dependent fading called frequency-
The ionization in the F2 layer, so important for long- selective fading.
range HF, is highly variable because of its dependence on A sky-wave signal can be rereflected by the ground
the sun. These variations have cycles of 1 day (due to the (especially seawater) to create a ray with multiple
rotation of the earth), about 27 days (due to the rotation ionospheric reflections, usually referred to as hops, as also
of the sun), seasonal (due to the movement of the earth illustrated in Fig. 1. When this happens, delay spreads
around the sun), and about 11 years (due to the observed can be especially large.
period of sunspot activity). The decomposition of a transmitted ray into multiple
The reflecting property of the ionosphere is often rays can also be caused by the influence of the earth’s
characterized by the critical frequency, which is the highest magnetic field. This can be explained by noting that
frequency that can be reflected from the ionosphere at a transmitted vertically polarized electromagnetic wave
vertical incidence. The critical frequency fc is determined can be considered as a superposition of two waves of
by the electron density N (the number of electrons per cm3 ) opposite circular polarizations. Because of the interaction
and it varies with time and location, but has a typical value of the magnetic field with electrons in the ionosphere,
around 12 MHz. The critical frequency can be determined each of these component waves is subjected to different
by vertical sounding or predicted from refraction levels and therefore follows a slightly different
√ ray path between transmitter and receiver. These two
fc = 9 · 10−3 N rays, which are termed the ordinary ray (O-mode) and
the extraordinary ray (X-mode), can arrive with different
The maximum usable frequency (MUF) is the highest delays, also generating multipath.
frequency that connects transmitter and receiver on a Sky-wave signals exhibit a considerable amount of
given link: variability. The ionosphere is not static and has variations
kfc kfc in the electron density distribution. Solar flares and
MUF = =
sin ϕ cos(θ/2) sunspots cause ionospheric turbulence and atmospheric
phenomena such as sudden ionospheric disturbances
where ϕ is the angle of incidence, θ is the angle by (SIDs), and polar cap absorption (PCA) can disrupt and
which the ray is refracted, and k ∼ = 1 is a correction disturb transmission.
factor accounting for ray bending in lower layers. This The initial frequency can be selected from ionospheric
equation can also be used to determine the critical angle, predictions based on historical data, vertical sounding, or
the maximum angle of incidence at which reflection can oblique ionospheric sounding probes. It can be updated by
occur at a particular frequency. monitoring the transmitted signal directly and switching
When the critical frequency is high enough, it is frequencies as necessary.
possible to establish HF sky-wave communications at External noise is often a limiting factor on HF links.
short ranges by tilting the HF antennas so that enough Its three components, atmospheric noise, galactic noise,
energy is radiated straight up. This unique mode of and man-made (human-generated) noise, are important
HF propagation, called near-vertical incidence sky-wave sources of noise in the HF band. We note that below HF
(NVIS) communication, is used for short-range valley-to- only atmospheric noise and synthetic noise are significant.
valley-type communication links. The NVIS transmission Atmospheric noise is due mainly to lightning and is
mode is possible only at the lower HF frequencies. therefore highly variable. Atmospheric noise levels tend
At the low end of the HF band D-layer absorption to be more severe in equatorial regions and are 30–40 dB
increases, making daytime communications difficult. The weaker near the poles of the earth. Atmospheric noise can
lowest usable frequency (LUF) is the frequency below be dominant at nighttime at frequencies below 16 MHz.
which the signal becomes too weak. Whereas the MUF is Like the other external noise sources, it decreases with
defined entirely by propagation effects, the LUF depends frequency. Galactic noise is fairly predictable and can
950 HF COMMUNICATIONS
be dominant in radio-quiet areas at frequencies above illustrated in Fig. 2. These functions are elements of the
4 MHz. Synthetic noise is highly variable and depends on ISO layered protocol model illustrated in Fig. 3.1 The ter-
transmission activity in adjoining frequency bands. minals can range in complexity from fairly simple ‘‘ham’’
In summary, the properties of HF sky-wave transmis- (amateur) radios to sophisticated automated radio net-
sion depend on the height and density of the ionospheric work terminals.
layers and on ambient noise sources. Because the iono- HF voice terminals traditionally operate in a one-way
sphere has several layers and sublayers, several refracted (simplex) mode using manual or automated ‘‘push to talk’’
ray paths are possible. Thus multipath and fading effects (PTT). Modern data modems operate in a full-duplex mode,
are common. Typical time spread within each path is requiring separate frequency allocations for each direction
20–40 µs, with individual paths separated by 2–8 ms. of communications. Typically each terminal is allocated
Some of the fades last just a fraction of a second and some several frequencies on which it can receive, so as to
a few minutes. Fading and the limited bandwidth impose increase the likelihood that one of them will permit a
critical constraints on the achievable data rates, which good connection.
can be quantified using Shannon’s channel capacity [8]. More recent advances in electronic hardware, radiofre-
quency circuits, programmable chips, and software tech-
4. KEY ELEMENTS OF THE HF DIGITAL TERMINAL nology have permitted the building of HF radio terminals
that are miniaturized and relatively inexpensive. These
A typical HF digital terminal consists of an antenna,
a radio (transmitter, receiver), control equipment (fre-
quency scanning, frequency selection, link establishment, 1 The International Standards Organization (ISO) is a worldwide
synchronization, selective calling, and link quality con- federation of national standards bodies from some 140 countries
trol), a modem, and computer or network interfaces as that publishes international standards.
Network Networking
Route selection, topology monitoring, relay management,
traffic control
Modem
Modulation / demodulation, coding / decoding,
scrambling / descrambling
Radio
Physical
Transmitter receiver(s)
HF antennas
radios can be versatile. They operate in different modes, degrade the signal-to-noise ratio when external noise
use a range of data rates, and interface with different dominates, as is the case in the HF band.
networks.
With this background, we describe the components 4.2. HF Radio
shown in Fig. 2, starting with antennas (the bottom layer
The radio transmitter generates the radiated power. The
in Fig. 3).
required amount of power (hence, the size of the power
supply) depends, of course, on the intended transmission
4.1. Antennas and Couplers distance. Power can be emitted in bursts or continuously,
according to the selected type of modulation. Transmitted
Antennas are characterized by directivity, gain, band- power can range from tens of watts for handheld units
width, radiation efficiency, physical size, and polarization. to several kilowatts for point-to-multipoint (broadcast)
HF antennas [5] on the ground often require a ground ground stations. Automatic level control (ALC) or trans-
screen to improve ground conductivity. Directivity is mea- mit level control (TLC) is used to moderate the increase in
sured by assessing the proportion of the total radiated transmitted RF (radiofrequency) power when audio input
power propagating in a given direction. Gain expresses increases.
the combined directivity and radiation efficiency and thus A single HF radio channel accommodates transmission
measures the portion of the total transmitted power radi- of voice in the range of 300–3000 Hz. HF radios generally
ated in a given direction. For ground-wave propagation, use single-sideband (SSB) modulation, transmitting only
the transmitter and receiver must have the same polariza- one of the two sidebands generated by modulating the
tion. For sky-wave propagation, the received polarization voiceband signal onto a carrier frequency. An SSB HF
is generally elliptic and can vary greatly from the transmit- radio is typically designed for voice and supports only
ted polarization due to different propagation of the ordi- low-data-rate transmission.
nary and extraordinary waves. Therefore, some receiver To introduce higher data throughput, some modern
terminals use diversity, whereby the received signals from radios operate over two channels (6-kHz band) or four
two orthogonally polarized antennas are combined to get channels (12-kHz band) at once. Two adjoining channels
the best signal. use both sidebands in an independent sideband (ISB)
Vertically polarized antennas, such as the vertical whip modulation. In a four-channel mode two sidebands on
and high-power towers, tend to be narrowband. These either side of the carrier are used [9].
antennas are best suited for ground-wave propagation At the receiving end, it is possible to improve
or for long-range sky-wave propagation. Their antenna performance by using polarization and space diversity.
patterns are omnidirectional in azimuth, making such This works best if the antennas can be spaced several
antennas ideal for point-to-multipoint communications. wavelengths apart, which may not always be practical
A vertical whip can be tilted or bent for intermediate- at HF frequencies. However, some diversity improvement
or short-range NVIS skywave. Horizontal polarization can still be achieved with smaller antenna spacing [10,11].
requires that the antenna be raised above the ground.
Another common HF antenna type is the half-wave 4.3. HF Modem
horizontal dipole on a mast, which can be used at
low frequencies and medium range. The horizontal Yagi While the earliest radios used analog amplitude modula-
antenna is also used at HF, but becomes large at low tion (AM) or frequency modulation (FM), modern digital
frequencies (wavelength at 3 MHz is 50 m or 150 ft) HF systems achieve significant information throughput
and is very narrowband. An omnidirectional pattern by using a modem (modulator/demodulator) at each end of
can be achieved with a crossed-dipole antenna. Rhombic the link. The modem modulates the data onto a transmit-
antennas can achieve significantly better antenna gain, ted carrier, and inversely demodulates the received carrier
but when designed for the low end of the HF band waveform to get the transmitted data.
(2–10 MHz) can be very large, requiring several acres. The modem must be designed to effectively cope
Narrowband antennas require an antenna tuner, which with the severe multipath and fading found on the HF
is a circuit of capacitors and inductors that allows adjust- channel. A well-designed modem can take advantage of the
redundancies generated by encoding and by propagation
ment of the antenna impedance to maximize radiation
through the different paths spaced in time or space. It
efficiency at the desired frequency. An antenna tuner auto-
can reassemble the signal that may have been spread out,
mated under microprocessor control is called a coupler.
taking advantage of all propagation paths.
Wideband antennas that do not require a tuner or
The modem is the heart of the digital HF terminal and
coupler are usually a variation of the logperiodic antenna,
is discussed in more detail in a later section.
which can be either vertically or horizontally polarized.
The logperiodic antenna is a combination of several
4.4. Link Establishment and Maintenance
narrowband elements, each tuned to a different frequency.
Properly designed, they can cover the entire 2–30 MHz The station desiring to transmit has to select from a
HF band. number of preassigned frequencies. While in the past
Small wideband antennas are used primarily for link establishment was a difficult manual task for the
receive-only functions, as they have low radiation operator, more and more modern HF radios use automatic
efficiency due to the difficulty of matching the impedance link establishment (ALE) [12]. At the receiving station the
over a wide bandwidth. Low-efficiency antennas do not ALE receiver scans the allocated frequencies to detect the
952 HF COMMUNICATIONS
transmission and to make the best frequency selection ARQ can be considered an alternative, or supplement,
for establishing the link. The selection may be based on to interleaving and coding, which are discussed below.
what is best for one particular link or for connecting a Essentially error-free communications can be achieved
transmitting station to a number of users at the same time. this way at the cost of significant delay variations.
An ALE-equipped radio can passively or actively Combining interleaving and coding with an ARQ method
evaluate the channel for the transmission frequency best can get the benefits of both approaches. ARQ is commonly
suited for the intended receiver. Active evaluation of what used by higher layer protocols, such as TCP/IP, to provide
is the proper frequency might be the result of a broadband end-to-end reliability. It can also be very useful at the link
sounder identifying the ionospheric layers. It can involve level to minimize end-to-end retransmissions.
feedback from an ALE receiver to report when a carrier The most common ARQ scheme is ‘‘go-Back-N’’ [15],
frequency change is needed. Passive evaluation, such as where retransmissions start over with the frame that
link prediction based on stored or broadcast ionospheric contains errors. The most advanced ARQ method is
data, can be reasonably effective at estimating gross selective ARQ, which allows the terminals to select
ionospheric effects. individual data frames for retransmission. One advantage
The ALE module coordinates communications for of selective ARQ over coding and interleaving is that the
automatic HF link establishment (handshake) and status delay is determined by the actual channel characteristics,
information between the two nodes, controlling both while an interleaver generally has to be designed for the
half- and full-duplex modes. Another function of ALE is worst-case scenario (long fades).
selective calling. This feature allows a radio to mute the
received signal until the transmission intended for that
particular radio is received. Selective calling requires that 5. HF MODEMS
the receiving radio have a unique address and that this
address be included as part of the transmission. HF modems perform several functions other than mod-
ALE functions international rely on standards to ulation and demodulation. These include interleaving
make radios from different manufacturers as compatible and deinterleaving, coding and decoding, and security.
as possible, such as FED-STD-1045 [13] and MIL-STD- Modems incorporate circuits for initial acquisition, adap-
188-141B [9]. Combined with availability of adaptive tive equalizers for automatic compensation of multipath
equalization in the modem, ALE enhances the likelihood dispersion, and circuits for performance monitoring and
of link establishment and good communications. for controlling the switching of carrier frequencies. This
Reliable HF communication also requires monitoring of section describes some applicable standards and key func-
the connectivity between the transmitter and the receiver tions of the modem.
to enable automatic switching to another frequency when
the current frequency has faded out or has become less
5.1. Standards
effective for the desired range of transmission.
The proliferation of different modulations, link protocols,
4.5. Network Interface and Traffic Management and data rates has led to the establishment of a series of
A network controller couples the modem to the network. standards promoting interoperability.
It controls network traffic using the link and manages The ITU, the Department of Defense (DoD), NATO,
changing conditions over the HF link. Interfaces may be Telecommunications Industries Association (TIA), and
able to adapt the modem to more than one local-area others publish carefully crafted standards. The DoD, in
network (LAN) standard (e.g., ATM or Ethernet). The particular, has issued a comprehensive series of manda-
interface performs appropriate handshake operations with tory and recommended standards for military commu-
the network equipment and may also implement a traffic nications, many of which have been adopted for com-
management function, such as congestion control. mercial use. Military HF modems follow the MIL-STD-
As an example, consider a marine email application. 188-110B [14], which specifies a number of modulation
The modem at the user’s end of the link is connected and coding waveforms for various data rates and for
through a network interface to the computer running the both single- and multichannel modems. The HF terminals
local email software, while the modem at the shore station follow the MIL-STD-188-141B [9], which covers transmit-
is connected to the Internet. ters, receivers, link establishment procedures, data-link
protocols, security, antijam technology, and network main-
tenance (using SNMP). This standard is coordinated with
4.6. Data-Link Protocol
FED-STD-1045 [13]. Military standards are available on
Digital transmission over the HF link must be controlled the Internet [16]. International standards, and many oth-
by a well-defined protocol. For instance, military radios use ers, are available from the American National Standards
the data-link protocol in MIL-STD-188-110B, Appendix Institute (ANSI) [17].
E [14]. The link data frame usually includes cyclic NATO has established its own Standardization Agree-
redundancy code (CRC) check bits, which allow detection ments (STANAGs). The U.S., NATO, and ITU standards
of errors in a frame. When errors are detected, the are highly coordinated. For instance, MIL-188-STD-110B
link protocol may use a feedback channel to initiate specifies STANAG 5066 as an optional data-link protocol.
retransmissions. Since such a feature is usually automatic, Table 2 lists representative standardized HF modula-
it is known as automatic repeat request (ARQ). tions, several of which are discussed further below.
Table 2. Characteristics of Selected HF Modem Standards
16-Tone DPSK Modem, 39-Tone DPSK Modem, ALE Single Modem, PSK Single-Tone Modem, PSK Single-Tone Modem,
MIL-STD-188-110B, MIL-STD-188-110B, MIL-STD-188-141B, MIL-STD-188-110B, MIL-STD-188-110B,
Appendix A Appendix B Appendix A Fixed Frequency Appendix C
953
110 Hz QPSK spread by 32 (75 bps) 32-QAM (8000 bps)
64-QAM (9600,
12,800 bps)
Coding and Bits repeated at rates Shortened Reed-Solomon Rate- 12 Golay code, triple 4800 bps: no coding Rate 34 convolutional code
interleaving below 2400 bps (15,11) : (14,10) at redundancy 75, 600–2400 bps: rate 12 (with tail-biting to
2400 bps, (7,3) 150–300 bps: rate 12 plus match interleaver
otherwise; repeat coded repetitions length)
bits at rates below 2400
Interleaving None Selectable 48-bit interleaving 0, 0.6, or 4.8 s 0.12-8.64 s (6 steps)
Preamble 2 tones for 66.6 ms 4 tones over 14 or 27 No fixed preamble; 0.6 or 4.8 s of a fixed BPSK 0-7 blocks of 184 8PSK
followed by all tones for periods followed by 3 transmission is an sequence symbols followed by
13.3 ms tones over 8 or 27 asynchronous sequence 184 sync symbols
periods followed by all of 24-bit words; each followed by 103
tones over 1 or 12 word has three 7-bit symbols with
periods (second number characters and a 3-bit information on data
for optional extended preamble identifying rate and interleaver
preamble) word type settings
954 HF COMMUNICATIONS
5.2. Signal Acquisition the data rate. This means that intersymbol interference
cannot be ignored, except perhaps at the lowest data rates.
HF modems must be able to achieve signal acquisition
Single-tone modems use adaptive equalization to undo the
in the presence of several channel disturbances: severe
intersymbol interference. By using the entire received
fading, multipath, non-Gaussian noise, and interference
signal, a serial modem can get better performance than
from other transmitters. Transmissions normally include
a modem that ignores the part of the signal containing
a known preamble designed to permit the modem to adjust
intersymbol interference.
as necessary to achieve signal acquisition.
Thanks to more recent advances in signal processing
After the initial acquisition, transmissions may incor-
and linear amplification technologies, single-tone modems
porate a known signal to permit continuous monitoring
can use amplitude modulation to achieve even higher data
of channel conditions. Such a signal may be in the form
rates. For example, the MIL-STD-188-110B includes 16-,
of a pilot tone in a parallel-tone system or a periodically
32-, and 64-quadrature amplitude modulation (QAM)
inserted training sequence in a single-tone system. Equal-
modems (see the last column in Table 2). These new
ization without a training sequence is known as ‘‘blind’’
modulations can have rates up to 12,800 bps within a
equalization [18] and has been studied for HF [19].
3-kHz band. Although QAM spaces signal points as far
apart as possible, it requires significantly higher signal-
5.3. Types of Modulation to-noise ratio (SNR), due the smaller spacing between
The easiest way to communicate over a channel with signal constellation points. In general, one result of this
multipath is to modulate tones with symbols of duration ‘‘compressed’’ signal constellation is the constellation loss,
much longer than the multipath spread. By ignoring the meaning that the required power grows faster than the
first part of each received tone, intersymbol interference increase in data rate. Higher-data-rate modes can be used
can be avoided. This principle is used in older frequency in short-range ground-wave applications where the signal
shift keying (FSK) modems, which transmit one or more is sufficiently strong. With sufficient transmitter power,
of several possible tones at a time, and in the parallel-tone they may also be appropriate for short- or medium-range
phase shift keying (PSK) modems commonly employed sky-wave communications.
by military and commercial users. Table 2 shows an
example of two parallel-tone PSK modems (16-tone modem 5.4. Coding
in column 1 and 39-tone modem in column 2) and a Coding can be used to correct errors by introducing
robust FSK modem used for ALE (column 3). Parallel redundancy in the transmitted data [20]. It can be used
tone modems are also used with marine single-sideband with both parallel- and serial-tone modulations. The
(SSB) radios for commercial HF email and HF Web selection of a proper code is one of the key design
applications [3]. issues. Forward error correction (FEC) block codes are
Modems with a narrow HF bandwidth allocation can simple to use because they are inherently compatible
benefit from using multiphase PSK (e.g., 4, 8, or 16 with the data frame structure. However, short-constraint-
phases), to achieve higher data rates. Differential PSK length convolutional codes are often applied and can
(DPSK), which compares the phase received in each time offer performance advantages, particularly as they can
slot to the phase received in the preceding slot, offers easily be decoded using the actual received samples (soft
a simple implementation that does not require accurate decision), as opposed to only using demodulated symbols
phase tracking. However, since the phase in the previous (hard decision). Table 2 lists several types of codes used
slot is also noisy, DPSK incurs a performance loss relative in standard HF systems. All these coding techniques add
to coherent PSK. redundancy, thus increasing the required channel data
Parallel-tone modems with narrow frequency bands rate. With conventional modulations it is then necessary
for each tone and low keying rates have several to expand the bandwidth.
advantages: (1) they are easy to implement, (2) they can Trellis modulation coding [21] is a way of combining
be quite bandwidth-efficient, and (3) the multipath spread coding and modulation that can offer coding gain without
is usually a fraction of the tone duration. The main expanding bandwidth. It has seen few applications in HF
disadvantage is the high peak-to-average power ratio in spite of the band-limited nature of HF channels. A newer
resulting from the superposition of several parallel tones. form of coding, Turbo coding [22], promises performance
This leads to the need for a linear power amplifier with a even closer to Shannon’s capacity limit by interleaving the
dynamic range much larger than the average transmitted coded data over relatively long time periods.
power. Another disadvantage is the fact that Doppler
spread can cause interference between the parallel tones.
5.5. Interleaving
Coding is used to provide an in-band diversity gain,
compensating for the selective fading caused by multipath. Fades cause errors to appear in clusters. The objective of
In serial transmission, with single-tone modems, the interleaving is to randomize error occurrences by forming
modulation can have virtually constant amplitude, so the the encoded block, not out of consecutive bits, but out
peak-to-average power ratio is close to unity. A higher of bits that are selected in a pattern spread out over
average power is then achieved with a given peak power. several blocks. When the inverse process is implemented
This means that a more efficient class C amplifier can be at the receiver, the received clusters of error are converted
used, and the transmitter can be made more compact. into randomly distributed errors. Coding then permits the
A single-tone modem requires faster keying to match random errors to be corrected at the receiver.
HF COMMUNICATIONS 955
To be effective, the interleaving period needs to be not been widely used with HF transmission. The MLSE
longer than the duration of a fade, which can be several equalizer is an optimal receiver when the channel is
seconds on the HF channel. The resulting delay can be known, but it has rarely been used because of its
undesirable, and many modems using interleaving include complexity and sensitivity to channel perturbations. It
means for selecting the interleaving period. has been used mostly at HF in experimental modems
but could become more practical as the processing power
5.6. Link Protection of new digital signal processing (DSP) chips increases.
A number of methods may be used to protect the data Because of its simplicity, the DFE is popular for modems
transmitted over the link. These include data scrambling with data rates that are high relative to the fade rate,
and encryption. Similarly, ALE transmissions may be but it has not been used much at HF because of the low
protected, for instance, as described in Appendix B of the data rates involved. As the trend to increase HF modem
military standard for automatic link establishment [9]. data rates continues, however, the DFE may become more
common.
5.7. Adaptive Equalization Since the channel is continuously changing as a result
of fading, a training sequence is periodically inserted to
Adaptive equalization is a modem feature that reassem-
effectively measure the individual path gains and phase
bles a signal that has been dispersed in time by the
shifts. The MIL-STD-188-110B serial-mode waveform uses
channel propagation effects. It was first introduced in
33–50% of the transmissions for training sequences (see
telephony [23] and later applied to fading channels [24].
Table 2, column 4). The extra reference permits using
Adaptive equalization is an effective technique for HF
links to overcome multipath. other equalization methods offering performance slightly
When the delay spread is small compared to the symbol better than a DFE, such as the data decision estimation
duration, a simpler technique called adaptive matched method [25].
filtering, or a RAKE receiver, may be used [20]. A RAKE
receiver estimates individual channel gains and uses these
6. LINK PREDICTION AND SIMULATION
estimates to adjust the gains of a tapped delay line to best
match the channel. This approach is most often used
with spread-spectrum systems, where bandwidth is large Performance prediction, based on HF channel models,
enough to resolve the individual paths, and symbols are is used to plan and operate HF communications links.
long enough to permit neglecting intersymbol interference. Channel simulators are used to test modem performance
When intersymbol interference cannot be neglected, using synthesized model channels.
adaptive equalization is needed. Many forms of adaptive Multipath characterization of the HF communications
equalization may be used to undo the channel multi- channel is traditionally based on Watterson’s narrowband
path [20]. The main concept is to combine different delayed HF channel model [26]. The Watterson model represents
versions of the received signal in order to maximize the the channel as an ideal tapped-delay line with tap spacings
signal-to-interference ratio in each symbol being demodu- selected to match the relative propagation delays of the
lated. The interference meant here includes that from resolvable multipath components. Each tap has a complex
adjoining symbols. The equalizer applies proper ampli- gain multiplier, and summation elements combine the
tude and phase to each delayed tap and combines the delayed and weighted signal components (see Fig. 4). The
resulting signals in such a way as to best reconstruct the transmitted signal feeds the tapped-delay-line channel
transmitted symbol. model. The tap gains of each path are modeled as
Two common adaptive equalizer techniques, the mutually independent complex-valued Gaussian random
decision-feedback equalizer (DFE) and the maximum- processes, producing Rayleigh fading. Each tap gain fades
likelihood sequence estimation (MLSE) equalizer, have in accordance with a Gaussian spectrum.
Time varying
path gains: g1(t ) × g2 (t ) × gn (t ) ×
Σ + Received
waveform
26. C. C. Watterson, J. R. Juroshek, and W. D. Bensema, Exper- characterization of the average behavior as well as the
imental confirmation of an HF channel model, IEEE Trans. nature of behavioral changes of the system gives rise to
Commun. Technol. COM-18(6): 792–803 (1970). the mathematical formalism called the hidden Markov
27. Radio Noise, International Telecommunications Union Rec- model (HMM).
ommendation ITU-R P.372-7, 2001. A hidden Markov model is a doubly stochastic process
28. J. Coleman (no date), Propagation Theory and Software, with an underlying stochastic process that is not readily
Part II (online): http://www.n2hos.com/digital/prop2.html observable but can only be observed through another
(Oct. 2001). set of stochastic processes that produce the sequence of
29. HF Propagation Prediction Method, International Telecom- observations [1–3]. By combining two levels of stochastic
munications Union Recommendation ITU-R, 2001, pp. 533– processes in a hierarchy, one is able to address the
537. short(er) term or instantaneous randomness in the
30. Ground-Wave Propagation Curves for Frequencies between observed event via one set of probabilistic measures while
10 kHz and 30 MHz, International Telecommunications coping with the long(er) term, characteristic variation
Union Recommendation ITU-R, 2000, pp. 368–377. with another stochastic process, namely, the Markov
31. G. R. Hand (no date), NTIA/ITS High Frequency Propagation
chain. This formalism underwent rigorous theoretical
Models (online): http://elbert.its.bldrdoc.gov/hf.html (Oct. developments in the 1960s and 1970s and reached
2001). some important milestones in the 1980s [1–7]. Today,
it has found widespread applications in stock market
32. A. Malaga, A characterization and prediction of wideband
HF skywave propagation, Proc IEEE MILCOM85: 1985,
prediction [6], ecological modeling [7], cryptoanalysis [8],
pp. 281–288. computer vision [9], and most notably, automatic speech
recognition [10]. Most, if not all, of the modern speech
33. L. Ehrman, L. Bates, J. Eschle, and J. Kates, Simulation
recognition systems are based on the HMM methodology.
of the HF channel, IEEE Trans. Commun. COM-30(8):
1809–1817 (1982).
34. B. D. Perry, A new wideband HF technique for MHz-
2. DEFINITION OF HIDDEN MARKOV MODEL
bandwidth spread spectrum radio communications, IEEE
Commun. Mag. 21(6): 28–36 (1983).
35. DAMA Demand Assignment Multiple Access, National Let X be a sequence of observations, X = (x1 , x2 , . . . , xT ),
Communications System, Office of Technology and Standards, where xt denotes an observation or measurement, possibly
FED-STD-1037C, 2000. vector-valued. Further consider a first-order N-state
Markov chain governed by a state transition probability
matrix A = [aij ], where aij is the probability of the Markov
HIDDEN MARKOV MODELS system making a transition from state i to state j; that is
model thus defines a density function for the observation It should be obvious that the number of states, N,
sequence X as follows: defines the ‘‘resolution’’ or ‘‘complexity’’ of the model, which
is intended to explain the generation of the color sequence
P(X|π, A, {bj }j=1
N
) = P(X|) as accurately as possible. One could attempt to solve the
modeling problem using a single-state machine, namely,
= P(X, q|) N = 1, with no possibility of addressing the potential
q
distinction among various urns in their composition of
= P(X|q, )P(q|) (5) colored balls. Alternatively, one can construct a model
q with a large N, such that detailed distinctions in the
collection of colored balls among urns can be analyzed and
T
the sequence of urns that led to the color observations can
= πq0 aqt−1 qt bqt (xt )
be hypothesized. This is the essence of the hidden Markov
q t=1
model.
where = (π, A, {bj }j=1
N
) is the parameter set for the model.
2.2. Elements of HMM
In this model, {bqt } defines the distribution for short-
time observations xt and A characterizes the behavior and A hidden Markov model is parametrically defined by the
interrelationship between different states of the system. triplet {π, A, B = {bj }j=1
N
}. The significance of each of these
In other words, the structure of a hidden Markov model elements is as follows:
provides a reasonable means for defining the distribution
of a signal in which characteristic changes take place from 1. N, the number of states. The number of states defines
one state to another in a stochastic manner. Normally N, the resolution and complexity of the model. Although
the total number of states in the system, is much smaller the states are hidden, for many applications there
than T, the time duration of the observation sequence. is often some physical significance attached to the
The state sequence q displays a certain degree of stability states or to sets of states of the model. In the urn-
among adjacent qt s if the rate of change of state is slow and-ball model, the states correspond to the urns.
compared to the rate of change in observations. The use of If in the application it is important to recover at
HMM has been shown to be practically effective for many any time the state the system is in, some prior
real-world processes such as a speech signal. knowledge on the meaning or the utility of the states
has to be assumed. In other applications, this prior
2.1. An Example of HMM: Drawing Colored Balls in Urns assumption may not be necessary or desirable. In
To fix the idea of an HMM model, let us try to analyze an any event, the choice of N implies our assumed
observation sequence consisting of a series of colors, say, knowledge of the source in terms of the number of
{G(reen), G, B(lue), R(ed), R, G, Y(ellow), B, . . .} (see [3]). distinct states in the observation.
The scenario may be as depicted in Fig. 1, in which N urns 2. A = [aij ], the state-transition probability matrix.
each with a large quantity of colored balls are shown. We The states of the Markov model are generally
assume there are M distinct colors of the balls. We choose interconnected according to a certain topology. An
an urn, according to some random procedure, and then ergodic model [3] is one in which each state can
pick out a ball from the chosen urn again randomly. The be reached from any other state (in a nonperiodic
color of the ball is recorded as the observation. The ball is fashion). If the process is progressive in nature,
replaced back in the urn from which it was selected and say, from state 1, 2, . . . to state N as time goes
the procedure repeats itself — a new urn is chosen followed on and observations are made, then a so-called
by a random selection of a colored ball. The entire process left-to-right topology may be appropriate. Figure 2
generates a sequence of colors (i.e., observations) that can shows examples of an ergodic model and two left-to-
be a candidate for hidden Markov modeling. right models. The topological interconnections of the
Markov chain can be expressed in a state-transition length T together with the observation sequence as defined
probability matrix. The state-transition probability in Eqs. (4) and (5):
is defined by Eq. (1) and satisfies the probability
constraints of Eq. (2) For an ergodic model, one P(X|) = P(X, q|)
can calculate the ‘‘stationary’’ or ‘‘equilibrium’’ state q
probability, the probability that the system is in
a particular state at an arbitrary time, by finding = πq0 aq0 q1 bq1 (x1 )aq1 q2 bq2 (x2 )
the eigenvalues of the state-transition probability q0 ,q1 ,...,qT
Three basic problems associated with the HMM must αt (i) = P(x1 , x2 , . . . , xt , qt = i|) (7)
be solved for the model to be useful in real-world
applications. The first, the evaluation problem, relates to that is, the probability of the partial observation sequence
the computation of the probability of an observed sequence (x1 , x2 , . . . , xt ) and state i at time t. Calculation of
evaluated on a given model. This is important because a the forward probabilities can be realized inductively as
given model represents our knowledge of an information follows:
source, and the evaluated probability indicates how likely
it is that the observed sequence came from the information
source. The second, the decoding problem, is to uncover 1. Initialization:
the sequence of states that is most likely to have led α0 (i) = πi (8)
to the generation of the observed sequence in some
optimal sense. We have mentioned that in many real- 2. Induction:
world problems a state often carries a certain physical
meaning, such as realization of a phoneme in a speech
N
utterance or a letter in handwritten script. Decoding aims αt+1 (j) = αt (i)aij bj (xt+1 )
i=1
at making the associated meaning explicit. The third, the
estimation problem, is about obtaining the values of the (0 ≤ t ≤ T − 1, 1 ≤ j ≤ N) (9)
model parameters, given an observation sequence or a
set of observation sequences. A model can be viewed as a 3. Termination:
characterization of the regularity in the ‘‘random’’ event,
which forms the basis of our knowledge of the source.
N
The regularity is encapsulated by the model parameters, P(X|) = αT (i). (10)
which have to be estimated from a set of given observation i=1
sequences, known to have come from the source.
This procedure requires on the order of N 2 T calculations.
3.1. The Evaluation Problem Using the same example of N = 5 and T = 100, we see that
We wish to calculate the probability of the observation this procedure entails about 3000 calculations instead of
sequence X = (x1 , x2 , . . . , xT ), given the model , specifi- 1072 as required in the direct procedure.
cally, to compute P(X|). Obviously, this can be accom- In a similar manner, one can define the backward prob-
plished by enumerating every possible state sequence of ability βt (i) as the probability of the partial observation
HIDDEN MARKOV MODELS 961
another model , a variable to be optimized. The auxil- hidden Markov source (of the right order), the attainable
iary function is defined, for the simplest case with a single minimum discrimination information is zero. While the
observation sequence X, as MDI criterion does not fundamentally change the problem
in model selection, it provides a way to deemphasize the
Q(, ) = P(X, q|) log P(X, q| ) (24) potentially acute mismatch between the data and the
q chosen model.
Another concern about the choice of the HMM
where P(X, q|) can be found in Eq. (6). It can be optimization criterion arises in pattern (e.g., speech)
shown that Q(, ) ≥ Q(, ) for a certain implies recognition problems in which one needs to estimate a
P(X, q| ) ≥ P(X, q|). Therefore, the second step of the number of distributions, each corresponding to a class of
algorithm involves maximizing Q(, ) as a function of events to be recognized. Let V be the number of classes
to obtain a higher, improved likelihood. These two steps of events, each of which is characterized by an HMM
iterate interleavingly until the likelihood reaches a fixed v , 1 ≤ v ≤ V. Also let P(v) be the prior distribution of
point. Detailed derivation of the reestimation formulas the classes of events. The set of HMMs and the prior
can be found in three papers [1–3]. distribution thus define a probability measure for an
arbitrary observation sequence X:
3.3.2. Other Optimization Criteria. The need to con-
sider other optimization criteria comes primarily from the
V
P(X) = P(X|v )P(v) (27)
potential inconsistency between the form of the chosen
v=1
model (i.e., an HMM) and that of the true distribution
of the data sequence. If inconsistency or model mismatch A measure of mutual information between class v and
exists, the optimality achieved in ML (maximum likeli- a realized class v observation X v can be defined, in
hood) estimation may not represent its real significance, the context of the composite probability distribution
either in the sense of data fitting or in indirect applications {P(X v |v )}Vv=1 , as
such as statistical pattern recognition. In statistical pat-
tern recognition, one needs to evaluate and compare the a P(X v , v|{v }Vv=1 )
I(X v , v; {v }Vv=1 ) = log
posteriori probabilities of all the classes, on receipt of an P(X)P(v)
unknown observation, in order to achieve the theoretically
V
optimal performance of the so-called Bayes risk [15]. A = log P(X v |v ) − log P(X v |w )P(w)
model mismatch prevents one from achieving the goal of w=1
Bayes risk. In these situations, application of optimization (28)
criteria other than maximum likelihood may be advisable. The maximum mutual information (MMI) criterion [17]
One proposal to deal with the potential problem of a is to find the entire model parameter set such that the
model mismatch is the method of minimum discrimination mutual information is maximized:
information (MDI) [16]. Let the observation sequence V
X = (x1 , x2 , . . . , xT ) be associated with a constraint R,
({v }v=1 )MMI = arg max
V
I(X , ν; {v }v=1 )
v V
(29)
for example, R = (r1 , r2 , . . . , rT ), in which each rt is {v }V
v=1 v=1
an autocorrelation vector corresponding to a short-time
observation xt . Note that X is just a realization of a The key difference between an MMI and an ML estimate
random event, which might be governed by a set of is that optimization of model parameters is carried out
possibly uncountably many distributions that satisfy the on all the models in the set for any given observation
constraint R. Let’s denote this set of distributions by (R). sequence. Equation (28) also indicates that the mutual
The minimum discrimination information is a measure of information is a measure of the class likelihood evaluated
closeness between two probability measures under the in contrast to an overall ‘‘probability background’’ (the
given constraint R and is defined by second term in the equation). Since all the distributions
are involved in the optimization process for the purpose
v(R, P(X|)) ≡ inf I(G : P(X|)) (25) of comparison rather than individual class data fitting,
G∈(R)
it in some way mitigates the model mismatch problem.
For direct minimization of recognition errors in pattern
where
recognition problems, the minimum classification error
(MCE) criterion provides another alternative [18].
g(X)
I(G : P(X|)) ≡ g(X) log dX (26) Optimization of these alternative criteria is usually
p(X|)
more difficult than the ML hill-climbing method and
where g(·) and p(·|) denote the probability density requires general optimization procedures such as the
functions corresponding to G and P(X|), respectively. The descent algorithm to obtain the parameter values.
MDI criterion tries to choose a model parameter set such
that v(R, P(X|)) is minimized. An interpretation of MDI 4. TYPES OF HIDDEN MARKOV MODELS
is that it attempts to find not just the HMM parameter
set to fit the data X but also an HMM that is as close as it Hidden Markov models can be classified according to
can be to a member of the distribution set (R), of which the Markov chain topology and the form of the in-
X could have been a true realization. If X is indeed from a state observation distribution functions. Examples of the
HIDDEN MARKOV MODELS 963
Markov chain topology have been shown in Fig. 2. The in- This leads to the following reestimation transformation
state observation distribution warrants further discussion for the HMM parameters
as it significantly affects the re-estimation procedure.
In many applications, the observation is discrete, P(X, q0 = i|) P(X, q0 = i|)
πi = = (37)
assuming a value from a finite set or alphabet, with
N P(X|)
cardinality, say, M, and thus warrants the use of a P(X, q0 = j|)
discrete HMM. Associated with a discrete HMM is the j=1
T
forming a matrix B = [bjk ], 1 ≤ j ≤ N, 1 ≤ k ≤ M. To obtain P(X, qt−1 = i, qt = j|)
an ML estimate, the Baum–Welch algorithm involves t=1
maximization of the auxiliary function Q(, ) defined in = (38)
T
Eq. (24). Note that P(X, qt−1 = i|)
t=1
T
T
log P(X, q| ) = log πq 0 + log aqt−1 qt + log bqt (xt )
T
t=1 t=1
P(X, qt = i|)δ(xt , vm )
(31) bim =
t=1
N
N
T
Q(, ) = Qπ (, π ) + Qa (, ai ) + Qb (, bi ) (32) P(X, qt = i|)δ(xt , vm )
i=1 i=1 t=1
= (39)
where π = [π1 , π2 , . . . , πN ], ai = [ai1 , ai2 , . . . , aiN ], bi =
T
P(X, qt = i|)
[bi1 , bi2 , . . . , biM ], and t=1
N
where {π }, {a}, and {b} achieve max Q(, ) as a function
Qπ (, π ) = P(X, q0 = i|) log πi (33)
i=1
of .
For continuous-density HMMs, the same iterative
N
T
procedure applies, although one needs to construct the
Qa (, ai ) = P(X, qt−1 = i, qt = j|) log aij (34) right form of the auxiliary function in order to be able
j=1 t=1
to carry out the maximization step. During the course
T of development of the HMM theory, Baum et al. [6] first
Qb (, bi ) = P(X, qt = i|) log bi (xt ) (35a) showed an algorithm to accomplish ML estimation for
t=1 an HMM with continuous in-state observation densities
M
T that are log-concave. The Cauchy distribution, which is
= P(X, qt = i|)δ(xt , vm ) log bim (35b) not log-concave, was cited as one that the algorithm
m=1 t=1 would have difficulty with. Later, Liporace [4], by invoking
a representation theorem, relaxed this restriction and
Note that in Eq. 35b, which is for a discrete HMM extended the algorithm to enable it to cope with the family
specifically, we define of elliptically symmetric density functions. The algorithm
was further enhanced at Bell Laboratories in the early
δ(xt , vm ) = 1 if xt = vm ; = 0, otherwise. 1980s [5] and extended to the case of general mixture
densities taking the form, for state i
These individual terms can be maximized separately over
π , {ai }N N
i=1 , {bi }i=1 . They share an identical form of an
M
optimization problem. Find yi , i = 1, 2, . . . , N, (or M in bi (x) = cik f (x; bik ) (40)
N
N
k=1
35b) to maximize wi log yi subject to yi = 1 and
i=1 i=1 where M is the number of mixture components, cik is
yi ≥ 0. This problem attains a global maximum at the the weight for mixture component k, and f (x; bik ) is
single point a continuous probability density function, log-concave
wi or elliptically symmetric, parameterized by bik . The
yi = , i = 1, 2, . . . , N (36) significance of this development is that the modeling
N
wk technique now has the capacity to deal with virtually
k=1 any distribution function since the mixture density can be
964 HIDDEN MARKOV MODELS
Λ1 HMM for
word 1
Probability P (X |Λ1)
evaluation
HMM for
Λ2 Index of
word 2
Observations recognized
word
Speech Feature Probability Maximum
analysis evaluation selection
V * = max P (X |Λv)
v
HMM for
Λv
word V
Probability
Figure 3. Block diagram of an isolated evaluation P (X |Λv )
word HMM recognizer (after Rabiner [3]).
used to approximate a distribution with arbitrary accuracy large-vocabulary ASR systems are based on mixture-
(by increasing M). This is important because many real- density HMMs.
world processes such as speech parameters do not follow Figure 3 illustrates the use of the HMM in an
the usual symmetric or single modal distribution form. isolated word recognition task. A recognizer requires
For detailed derivations of the reestimation formulas the knowledge of the a posteriori probabilities P(v|X)
for various kinds of density functions embedded in a for all the words/classes, v, 1 ≤ v ≤ V, evaluated on an
hidden Markov model, consult Rabiner [3] and Rabiner observed pattern or sequence, X. For simplicity, we
and Juang [10]. assume that the prior probabilities for all the classes
are equal and the recognition decision is thus based on
5. APPLICATIONS OF HMM the likelihood functions P(X|v ), 1 ≤ v ≤ V, with each
model v corresponding to a class v. The models P(X|v ),
Since its inception in the late 1960s and through its 1 ≤ v ≤ V, need to be ‘‘trained’’ on the basis of a set of
development in the 1970s and 1980s, the HMM has known samples during the design phase. ‘‘Training’’ is
found widespread use in many problem areas, such as the task of model parameter estimation using one or more
stockmarket analysis, cryptoanalysis, telecommunication observation sequences or patterns. Section 3.3 outlines the
modeling and management, and most notably automatic procedure for parameter estimation. Once these models
speech recognition, which we highlight here. are trained, they are stored in the system for likelihood
Automatic speech recognition (ASR) was traditionally evaluation on receipt of an unknown observation sequence.
treated as a speech analysis problem because of the The solution based on the forward probability calculation
complex nature of the signal. The main impact of the HMM as discussed in Section 3.1 is an efficient way to obtain
methodology was to formalize the statistical approach as the likelihood for each word in the vocabulary. The word
a feasible paradigm in dealing with the vast variability model that achieves the highest likelihood determines the
in the speech signal. Major sources of variability in a recognized word. This simple design principle has become
speech signal include the inherent uncertainty in speaking the foundation of many ASR systems.
rate variation, physiological variation such as speaker- For more sophisticated speech recognition tasks, such
specific articulatory characteristics, changes in speaking as large-vocabulary, continuous speech recognition, the
conditions (e.g., the Lombard effect [19]), uncertainty and design principle described above has to be modified to
coarticulation in speech production (e.g., underarticulated cope with the increased complexity. In continuous speech
phonemes), regional accents, and use of language. The recognition, event classes (whether they are based on
HMM has an intrinsic ability to cope with the speaking ‘‘lexical words’’ or ‘‘phonemes’’) are observed consecutively
rate variation due to the Markov chain, but the broad without clear demarcation (or pause), making it necessary
nature of variability calls for the use of a mixture- to consider composite HMMs, constructed from elementary
density HMM, particularly in the design of a system HMMs. For example, an elementary HMM may be chosen
that is supposed to perform equally well for a large to represent a phoneme, and a sequence of words would
population of users (i.e., a speaker-independent speech thus be modeled by a concatenation of the corresponding
recognition system [10]). Today, most high-performance phoneme models. The concatenated model is then used
HIDDEN MARKOV MODELS 965
in the likelihood calculation for making the recognition 2. L. R. Rabiner and B. H. Juang, An introduction to hidden
decision. The procedure is rather complex, and dynamic Markov models, IEEE ASSP Mag. 3(1): 4–16 (Jan. 1986).
programming techniques together with heuristics and 3. L. R. Rabiner, A tutorial on hidden Markov models and
combinatorics (to form search algorithms) are usually selected applications in speech recognition, Proc. IEEE 77(2):
employed to obtain the result. Also, because of the much- 257–286 (Feb. 1989).
increased variability in the realization of a continuous 4. L. R. Liporace, Maximum likelihood estimation for multivari-
speech signal, an elementary model is often qualified by ate observations of Markov sources, IEEE Trans. Inform.
its context (e.g., an ‘‘e-l-i’’ model for the phoneme /l/ to be Theory IT-28: 729–734 (Sept. 1982).
used in the word ‘‘element’’) in order to achieve the needed 5. B. H. Juang, Maximum likelihood estimation for mixture
accuracy in modeling. These context-dependent models multivariate stochastic observations of Markov chains, AT&T
greatly increase the complexity of the system design. Tech. J. 64: 1235–1249 (1985).
Today, HMM-based ASR systems are deployed in
6. L. E. Baum, T. Petri, G. Soules, and N. Weiss, A maximiza-
telecommunication networks for automatic call routing tion technique occurring in the statistical analysis of prob-
and information services, and in personal computers for abilistic functions of Markov chains, Ann. Math. Stat. 41:
dictation and word-processing. Table 1 provides a gist of 164–171 (1970).
the performance, in terms of word accuracy of various
7. L. E. Baum and J. A. Eagon, An inequality with applications
systems and tasks. Voice-enabled portals that bring
to statistical estimation for probabilistic functions of Markov
information services to the user over the Internet are also processes and to a model for ecology, Bull. Am. Math. Soc. 73:
emerging and are expected to gain popular acceptance in 360–363 (1967).
the near future. The HMM is the underpinning technology
8. D. Andelman and J. Reeds, On the cryptoanalysis of rotor
of these modern-day communication services.
machines and substitution-permutation networks, IEEE
Trans. Inform. Theory IT-28(4): 578–584 (July 1982).
BIOGRAPHY
9. H. Bunke and T. Caelli, Hidden Markov Model in Vision,
Dr. Biing-Hwang Juang received his Ph.D. in electrical a special issue of the International Journal of Pattern
and computer engineering from the University of Cali- Recognition and Artificial Intelligence, World Scientific,
fornia, Santa Barbara in 1981 and is currently Director Singapore, Vol. 45, June 2001.
of Multimedia Technologies Research at AVAYA Labs 10. L. R. Rabiner and B. H. Juang, Fundamentals of Speech
Research, a spin-off of Bell Laboratories. Before joining Recognition, Prentice-Hall, Englewood Cliffs, NJ, 1993.
AVAYA Labs Research, he was Director of Acoustics and 11. A. T. Bharucha-Reid, Elements of the Theory of Markov
Speech Research at Bell Laboratories. He is engaged in Processes and Their Applications, McGraw-Hill, New York,
and responsible for a wide range of research activities, 1960.
from speech coding, speech recognition, and intelligent sys- 12. G. D. Forney, The Viterbi algorithm, Proc. IEEE 61: 268–278
tems to multimedia and broadband communications. He (March 1973).
has published extensively and holds a number of patents
13. S. E. Levinson, L. R. Rabiner, and M. M. Sondhi, An intro-
in the area of pattern recognition, speech communica- duction to the application of the theory of probabilistic
tion and telecommunication services. He is co-authr of functions of a Markov process to automatic speech recog-
the book Fundamentals of Speech Recognition published nition, Bell Syst. Tech. J. 62(4): 1035–1074 (April 1983).
by Prentice-Hall. He received the Technical Achievement
14. A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum
Award from the IEEE Signal Processing Society in 1998 likelihood from incomplete data via the EM algorithm, J.
and the IEEE Third Millennium Medal in 2000. He is an Roy. Stat. Soc. 39(1): 1–38 (1977).
IEEE Fellow and a Bell Labs Fellow.
15. R. Duda and P. Hart, Pattern Classification and Scene
Analysis, Wiley, New York, 1973.
BIBLIOGRAPHY
16. Y. Ephraim, A. Dembo, and L. R. Rabiner, A minimum
1. L. E. Baum, An inequality and associated maximization discrimination information approach for hidden Markov
techniques in statistical estimation for probabilistic functions modeling, IEEE. Trans. Inform. Theory IT-35(5): 1001–1003
of Markov processes, Inequalities 3: 1–8 (1972). (Sept. 1989).
966 HIGH-DEFINITION TELEVISION
17. A. Nadas, D. Nahamoo, and M. A. Picheny, On a model- high-resolution video. Resolution represents the amount
robust training method for speech recognition, IEEE Trans. of detail contained within the video, which can also be
Acoust. Speech Signal Process. ASSP-36(9): 1432–1436 (Sept. called ‘‘definition.’’ This is the basis for high-definition
1988). television. An NTSC system delivers video at a resolution
18. B. H. Juang, Wu Chou, and C. H. Lee, Minimum classifica- of approximately 480 lines in an interlaced format at an
tion error rate methods for speech recognition, IEEE Trans. approximate rate of 60 fields/s (it is actually 59.94 Hz, but
Speech Audio Process. SA-5(3): 257–265 (May 1997). we will not make a distinction between 59.94 and 60).
19. R. P. Lippmann, E. A. Martin, and D. B. Paul, Multi-style Each line of resolution contains approximately 420 pixels
training for robust isolated-word speech recognition, IEEE or picture elements. The number of lines represents the
ICASSP-87 Proc., April 1987, pp. 705–708. vertical spatial resolution in the picture, and the number of
pixels per line represents the horizontal spatial resolution.
Interlaced scanning refers to the scanning format. All
HIGH-DEFINITION TELEVISION∗ conventional television systems use this format. Television
systems deliver pictures that are snapshots of a scene
JAE S. LIM recorded at a certain number of times per second. In
Massachusetts Institute of interlaced scanning, a single snapshot consists of only odd
Technology lines, the next snapshot consists of only even lines, and
Cambridge, Massachusetts this sequence repeats. A snapshot in interlaced scanning
is called a field. In the NTSC system, 60 fields are used per
A high-definition television (HDTV) system is a television second. Although only snapshots of a scene are shown, the
system whose performance is significantly better than a human visual system perceives it as continuous motion, as
conventional television system. An HDTV system delivers long as the snapshots are shown at a sufficiently high rate.
spectacular video and multichannel CD (compact disk) In this way, the video provides accurate motion rendition.
quality sound. The system also has many features that More lines and more pixels per line in a field provide
are absent in conventional systems, such as auxiliary data more spatial details than the field can retain. An HDTV
channels and easy interoperability with computers and system may have 1080 lines and 1920-pixel/line resolution
telecommunications networks. in an interlaced format of 60 fields/sec. In this case, the
Conventional television systems were developed dur- spatial resolution of an HDTV system would be almost
ing the 1940s and 1950s. Examples of conventional 10 times that of an NTSC system. This high spatial
systems are NTSC (National Television Systems Com- resolution is capable of showing details in the picture
mittee), SECAM (Sequential Couleur a Memoire), and much more clearly, and the resultant video appears much
PAL (phase-alternating line). These systems have com- sharper. It is particularly useful for sports events, graphic
parable performance in video quality, audio quality, and material, written letters, and movies.
transmission robustness. The NTSC system, used in North The high spatial resolution in an HDTV system enables
America, will be used as a reference for conventional tele- a large-screen display and increased realism. For an NTSC
vision systems when discussing HDTV in this article. system, the spatial resolution is not high. To avoid the
For many decades, conventional television systems visibility of a line structure in an NTSC system, the
have been quite successful. However, they were developed recommended viewing distance is approximately 7 times
on the basis of the technology that was available during the picture height. For a 2-ft-high display screen the
the 1940s and 1950s. Advances in technologies such as recommended viewing distance from the screen is 14 ft,
communications, signal processing, and very-large-scale 7 times the picture height. This makes it difficult to
integration (VLSI) has enabled a major redesign of a have a large-screen television receiver in many homes.
television system with substantial improvements over Because of the long viewing distance, the viewing angle
conventional television systems. An HDTV system is one is approximately 10◦ , which limits realism. For an HDTV
result of this technological revolution. system with more than twice the number of lines, the
recommended viewing distance is typically 3 times the
1. CHARACTERISTICS OF HDTV picture height. For a 2-ft-high display, the recommended
viewing distance would be 6 ft. This can accommodate
Many characteristics of an HDTV system markedly a large-screen display in many environments. Because
differ from a conventional television system. These of a short viewing distance and wider aspect (width-to-
characteristics are described in this section. height) ratio, the viewing angle for an HDTV system is
1.1. High Resolution approximately 30◦ , which significantly increases realism.
An HDTV system can also deliver higher temporal res-
An HDTV system can deliver video with spatial resolution olution by using progressive scanning. Unlike interlaced
much higher than a conventional television system. scanning, where a snapshot (field) consists of only even
Typically, video with a spatial resolution of at least four lines or only odd lines, all the lines in progressive scanning
times that of a conventional television system is called are scanned for each snapshot. The snapshot in progres-
sive scanning is called a frame. Both progressive scanning
∗This article is a modified version of High Definition Television, and interlaced scanning have their own merits, and the
published in the Wiley Encyclopedia of Electrical and Electronics choice between the two generated much discussion dur-
Engineering; 1999, Vol. 8, pp. 725–739. ing the digital television standardization process in the
HIGH-DEFINITION TELEVISION 967
requires 6 MHz of bandwidth, the available VHF (very- converted to a more efficient technical system. In addition,
high-frequency) and UHF (ultra-high-frequency) bands, the introduction of HDTV would permanently require a
which are suitable for the terrestrial broadcasting of new channel for each existing NTSC channel. The FCC
television signals, were divided into 6-MHz channels. rejected this spectrum-inefficient augmentation channel
Initially, there was plenty of spectrum. The NTSC approach.
system, however, utilizes its given 6 MHz of spectrum Although the FCC’s decision did not require receiver
quite inefficiently. This inefficiency generates interference compatibility, it did require transmission of an entire
among the different NTSC signals. As the number of NTSC HDTV signal within a single 6-MHz channel. In the
signals that were broadcast terrestrially increased, the simulcast approach adopted by the FCC, an HDTV signal
interference problem became serious. The solution was to that can be transmitted in a single 6-MHz channel can
avoid using certain channels. These unused channels are be designed independently of the NTSC signal. An NTSC
known as ‘‘taboo channels.’’ In a typical highly populated television receiver cannot receive an HDTV signal. In order
geographic location in the United States, only one of two to receive an HDTV signal, a new television receiver would
VHF channels is used and only one of six UHF channels be needed. To ensure that existing television receivers do
is used. In addition, in the 1980s, other services such as not become obsolete when HDTV service is introduced,
mobile radio requested the use of the UHF band spectrum. the FCC would give one new channel for HDTV service to
As a result, an HDTV system that requires a large amount each NTSC station that requested it. During the transition
of bandwidth, such as Japan’s MUSE system, was not period, both NTSC and HDTV services coexist. After
an acceptable solution for terrestrial broadcasting in the sufficient penetration of HDTV service, NTSC service
United States. will be discontinued. The spectrum previously occupied by
At the request of the broadcast organizations, NTSC services will be used for additional HDTV channels
the United States Federal Communications Commis- or for other services. Initially, the FCC envisioned that the
sion (FCC) created the Advisory Committee on Advanced new HDTV channel and the existing NTSC channel would
Television Service (ACATS) in September 1987. ACATS carry the same programs, so as not to disadvantage NTSC
was chartered to advise the FCC on matters related to receivers during the transition period. This is the basis
the standardization of the advanced television service in for the term ‘‘simulcasting.’’ Later, this requirement was
the United States, including establishment of a technical removed. The simulcast approach is illustrated in Fig. 2.
standard. At the request of ACATS, industries, universi- The simulcast approach provides several major advan-
ties, and research laboratories submitted proposals for the tages. It presents the possibility of designing a new
ATV (advanced television) technical standard in 1988. spectrum-efficient HDTV signal that requires significantly
While the ACATS screened the proposals and prepared less power and that does not interfere with other signals,
testing laboratories for their formal technical evaluation, including the NTSC signal. This allows the use of the
the FCC made a key decision. In March 1990, the FCC taboo channels, which could not be used for additional
selected the simulcast approach for advanced television NTSC service because of the strong interference charac-
service rather than the receiver-compatible approach, in teristics of the NTSC signals. Without the taboo channels,
which existing NTSC television receivers can receive an it would not have been possible to give an additional chan-
HDTV signal and generate a viewable picture. This was nel to each existing NTSC broadcaster for HDTV service.
the approach taken when the NTSC introduced color. In addition, it eliminates the spectrum-inefficient NTSC
A black-and-white television receiver can receive a color channels following the transition period. The elimination
television signal and display it as a viewable black-and-
white picture. In this way, the then-existing black-and-
white television receivers would not become obsolete. It
was possible to use the receiver-compatible approach in the Simulcasting
case of color introduction. This is because color information Full HDTV Full HDTV
did not require a large amount of bandwidth and a small source display
portion of the 6-MHz channel used for a black-and-white
picture could be used to insert the color information Video Video
without seriously affecting the black-and-white picture. encoder decoder
In the case of HDTV, the additional information 6 MHz 6 MHz
needed was much more than the original NTSC signal Transmission 6 MHz NTSC
and the receiver-compatibility requirement would require Receiver
system taboo channel
additional spectrum to carry the HDTV signal. Among the
proposals received, the receiver-compatible approaches
typically required a 6-MHz augmentation channel that
carried the enhancement information, which was the NTSC NTSC
source display
difference between the HDTV signal and the NTSC
signal. Even though the augmentation approach solves
the receiver-compatibility problem, it has several major Transmitter Existing 6 MHz NTSC Receiver
problems. The approach requires an NTSC channel as allocated channel
a basis to transmit an HDTV signal. This means that Figure 2. Flowchart of simulcasting approach for transition from
the highly spectrum-inefficient NTSC system cannot be an NTSC system to a digital high-definition television system.
HIGH-DEFINITION TELEVISION 969
of NTSC broadcasting vacates the spectrum that it occu- the Grand Alliance chose the best technical elements
pied. Furthermore, by removing the NTSC signals that from the four systems and made further improvements
have strong interference characteristics, other channels on them. The Grand Alliance HDTV system was
could be used more efficiently. The 1990 FCC ruling was submitted to the ATTC and ATEL for performance
a key decision in the process to standardize the HDTV verification. Test results verified that the Grand Alliance
system in the United States. system performed better than the previous four digital
The 1990 decision also created several technical systems. A technical standard based on the Grand
challenges. The HDTV signal had to be transmitted in a Alliance HDTV prototype system was documented by
single 6-MHz channel. In addition, the signal was required the Advanced Television System Committee (ATSC), an
to produce minimal interference with NTSC signals and industry consortium.
other HDTV signals. At the time of the FCC’s decision The HDTV prototype proposed by the Grand Alliance
in 1990, it was not clear whether such a system could be was a flexible system that carried approximately
developed within a reasonable period of time. Later events 20 million bits per second (20 Mbps). Even though it used
proved that developing such a system at a reasonable the available bit capacity to transmit one HDTV pro-
cost to broadcasters and consumers was possible using gram, the bit capacity could also be used to transmit
modern communications, signal processing, and VLSI several programs of standard definition television (SDTV)
technologies. or other digital data such as stock quotes. SDTV res-
Before the formal technical evaluation of the initial olution is comparable to that of the NTSC, but it is
HDTV proposals began, some were eliminated, others substantially less than the HDTV. The documented tech-
were substantially modified, and still others combined nical standard (known as the ATSC standard) allowed
their efforts. Five HDTV system proposals were ultimately the transmission of SDTV programs as well as HDTV
approved for formal evaluation. One proposed an analog programs.
system while four others proposed all-digital systems. In November 1995, the ACATS recommended the
The five systems were evaluated in laboratory tests ATSC standard as the United States advanced television
at the Advanced Television Testing Center (ATTC) in standard to the FCC. The ATSC standard had allowed
Alexandria, Virginia. Subjective evaluation of picture a set of only 18 video resolution formats for HDTV
quality was performed at the Advanced Television and SDTV programs. The FCC eased this restriction in
Evaluation Laboratory (ATEL) in Ottawa, Canada. In December 1996, and decided that the ATSC standard
February 1993, a special panel of experts reviewed the with a relaxation of the requirements for video resolution
test results of the five HDTV system proposals and made format would be the United States digital television
a recommendation to the ACATS. standard. In early 1997, the FCC made additional rulings
The panel concluded that the four digital systems to support the new technical standard, such as channel
performed substantially better than the analog system. allocation for digital television service.
The panel also concluded that each of the four digital
systems excelled in different aspects. Therefore, the panel
could not recommend one particular system. The panel 3. AN HDTV SYSTEM
recommended that each digital system be retested after
improvements were made by the proponents. The four A block diagram of a typical HDTV system is shown
digital proponents had stated earlier that substantial in Fig. 3. The information transmitted includes video,
improvements could be made to their respective system. audio, and other auxiliary data such as stock quotes.
The ACATS accepted the panel’s recommendation and The input video source may have a format (spatial
decided to retest the four systems after improvements resolution, temporal resolution, or scanning format)
were made. As an alternative to the retest, the ACATS different from the formats used or preferred by the
encouraged the four proponents to combine the best video encoder. In this case, the input video format is
elements of the different systems and submit one single converted to a format used or preferred by the video
system for evaluation. encoder. The video is then compressed by a video encoder.
The four digital system proponents evaluated their Compression is needed because the bit rate supported by
options and decided to submit a single system. In May the modulation system is typically much less than the
1993, they formed a consortium called the Grand Alliance bit rate needed for digital video without compression.
to design and construct an HDTV prototype system. The audio, which may be multichannel for one video
The Grand Alliance consisted of seven organizations program, is also compressed. Since the bit rate required
who were members at the inception of the Grand for audio is much less than that for video, the need
Alliance. Later, some member organizations changed to compress the audio is not as crucial. Any bit-rate
their names: (1) General Instrument first proposed digital savings, however, can be used for additional bits for
transmission of an HDTV signal and submitted one video or other auxiliary data. The data may represent
of the four initial systems; Massachusetts Institute of any digital data, including additional information for
Technology submitted a system together with General video and audio. The compressed video data, compressed
Instrument; AT&T and Zenith submitted one system audio data, and any other data are multiplexed by a
together; and Philips, the David Sarnoff Research transport system. The resulting bitstream is modulated.
Center, and Thomson Consumer Electronics submitted The modulated signal is then transmitted over a
one system together. Between the years 1993 and 1995, communication channel.
970 HIGH-DEFINITION TELEVISION
(a)
VHF / UHF
transmitter
• Transmission
Video
Video • format
• coding
selection
_ _
•
Multiplexer/
_ •_
transport
Transmission _ _
•
Audio _ _
Audio system: •
coding
modulator
Data
(b)
Format Video
conversion decoding
Demultiplexer
Display
Transmission
system:
tuner
demodulator
Audio
decoding
Figure 3. A block diagram of a typical HDTV system: (a) transmitter; (b) receiver.
At the receiver, the received signal is demodulated to for its intended application. The system was the basis for
generate a bitstream, which is demultiplexed to produce the United States digital television standard. Even though
compressed video, compressed audio, and other data. The this article focuses on one system, many issues and design
compressed video is then decompressed. The video format considerations encountered in the Grand Alliance HDTV
received may not be the same as the format used in the system could be applied to any HDTV system.
display. In this case, the received video format is converted The overall Grand Alliance HDTV system consists of
to the proper display format. The compressed audio, which five elements: transmission format selection, video coding,
may be multichannel, is decompressed and distributed audio coding, multiplexing, and modulation. These are
to different speakers. The use of the data received described in the following sections.
depends on the type of information the data contains. The
communication channel in Fig. 3 may represent a storage 3.1. Transmission Format Selection
device such as digital video disk. If the available bit rate
can support more than one video program, multiple video A television system accommodates many video input
programs can be transmitted. sources such as videocameras, film, magnetic and optical
There are many different possibilities for the design media, and synthetic imagery. Even though these different
of an HDTV system. For example, various methods can input sources have different video formats, a conventional
be used for video compression, audio compression, and television system such as the NTSC uses only one single
modulation. Some modulation methods may be more transmission format. This means the various input sources
suitable for terrestrial transmission, while others may are converted to one format and then transmitted. Using
be more suitable for satellite transmission. Among the one format simplifies the receiver design because a receiver
many possibilities, this article will focus on the Grand can eliminate format conversion by designating the display
Alliance HDTV system. This system was designed over format to be the same as the transmission format. This is
many years of industry competition and cooperation. shown in Fig. 4. When the NTSC system was standardized
The system’s performance was carefully evaluated by in the 1940s and 1950s, format conversion would have been
laboratory and field tests, and was judged to be acceptable costly.
HIGH-DEFINITION TELEVISION 971
Transmitter
Receiver
One
format
Display Decode Receive Figure 4. A television system with one single
video transmission format.
The disadvantage of using one transmission format is Table 1. HDTV Transmission Formats Used in the Grand
the inefficient use of the available spectrum, since all Alliance System
the video input sources must be converted to one format Frame/Field Rate
and then transmitted in that format. For example, in the Spatial Resolution Scanning Format (Frames/s)
NTSC system, film (whose native format is 24 frames/s
720 × 1280 Progressive scanning 60
with progressive scanning) is converted to 60 fields/sec
720 × 1280 Progressive scanning 30
with interlaced scanning. It is then transmitted in the 720 × 1280 Progressive scanning 24
NTSC format. Transmission of video in a format other 1080 × 1920 Progressive scanning 30
than its native format is an inefficient use of the spectrum. 1080 × 1920 Progressive scanning 24
The Grand Alliance HDTV system utilizes multiple 1080 × 1920 Interlaced scanning 60
transmission formats. This allows the use of a video
transmission format that is identical to or approximates
the native video source format. In addition, the system
The Grand Alliance system utilizes both 720 lines
allows the use of different formats for various applications.
and 1080 lines. The number of pixels per line was
From the viewpoint of spectrum efficiency, allowing all
chosen so that the aspect ratio (width-to-height ratio)
possible video formats would be ideal. Since a display (such is 16 × 9 with square pixels. When the spatial vertical
as a CRT) has typically one display format, the different dimension that corresponds to one line equals the spatial
formats received must be converted to one display format, horizontal dimension that corresponds to one pixel, it is
as shown in Fig. 5. Allowing for too many formats can called ‘‘square pixel.’’ For 720 lines, the scanning format
complicate the format conversion operation. In addition, is progressive. The highest frame rate is 60 frames/s.
most of the benefits derived from multiple formats can be The pixel rate is approximately 55 Mpixels/s (million
obtained by carefully selecting a small set of formats. For pixels per second). For the video compression and
HDTV applications, the Grand Alliance system utilizes modulation technologies used, a substantial increase in
six transmission formats as shown in Table 1. In the the pixel rate above 60–70 Mpixels/s may result in a
table, the spatial resolution of C × D means C lines of noticeable degradation in video quality. At 60 frames/s
vertical resolution with D pixels of horizontal resolution. with progressive scanning, the temporal resolution is
The scanning format is either a progressive scan or an very high, and smooth motion rendition results. This
interlaced scan format. The ‘‘frame/field rate’’ refers to the format is useful for sports events and commercials. The
number of frames/s for progressive scan and the number 720-line format also allows the temporal resolution of
of fields/s for interlaced scan. 30 frames/s and 24 frames/s. These frame rates were
Transmitter
Format 1
Source material 1 encoder
Transmit
Format N
Source material N encoder
Receiver
Format 1
decoder
Format
Display conversion Receive
Format N Figure 5. A television system with multiple
decoder
video transmission formats.
972 HIGH-DEFINITION TELEVISION
chosen to accommodate film and graphics. For film, whose For this and other reasons like simple processing,
native format is 24 frames/s, conversion to 60 frames/s and the computer industry preferred only the progressive
then compressing it will result in a substantial inefficiency transmission format for television. Other industries, such
in spectrum utilization. For 720 lines at 24 frames/s, it is as television manufacturers and broadcasters, preferred
possible to simultaneously transmit two high-resolution interlaced scanning. This is because they were accustomed
video programs within a 6-MHz channel because of the to interlaced scanning. Interlaced scanning worked well
lower pixel rate (approximately 22 Mpixels/s each). for entertainment material. Early video equipment, such
In the 1080-line format, two temporal rates in as videocameras, was developed only for interlaced
progressive scan are 30 and 24 frames/s. These temporal scanning. The Grand Alliance HDTV system used five
rates were chosen for film and graphics with the highest progressive scan formats and one interlaced scan format.
spatial resolution. Another temporal rate used is the The FCC decision in December 1996 removed most of
1080-line interlaced scan at 60 fields/s. This is the only the restrictions on transmission formats and allowed both
interlaced scan HDTV format used in the Grand Alliance progressive and interlaced formats. This decision left the
system. It is useful for scenes obtained with a 1080- choice of transmission format to free market forces.
line interlaced-scan camera. The pixel rate for 1080-line A multiple transmission format system utilizes the
progressive scan at 60 frames/s would be more than available spectrum more efficiently than does a single
120 Mpixels/s. The video encoded for such a high pixel rate transmission format system by better accommodating
can result in a substantial degradation in video quality for video source materials with different native formats. In
the compression and modulation technologies used in the addition, multiple transmission formats can be used for
Grand Alliance system. Therefore, there is no 1080-line, various applications. Table 2 shows possible applications
60-frame/s progressive scan format in the system. for the six Grand Alliance HDTV formats. For a
All conventional television systems, such as the NTSC, multiple transmission format system, one of the allowed
utilize only interlaced scanning. In such systems, a display transmission formats is chosen for a given video program
format is matched to the single-transmission format. The prior to video encoding. The specific choice depends on the
display requires at least a 50–60 Hz rate with a reasonable native format of the input video material and its intended
amount of spatial resolution (approximately 480 active application. The same video program within a given
lines for the NTSC system). An alternative strategy in format may be assigned to a different transmission format,
the NTSC system would be to preserve 480 lines with depending on the time of broadcast. If the transmission
progressive scan, but at 30 frames/s. To avoid display format chosen is different from the native format of the
flicker, each frame can be repeated twice at the display, video material, format conversion occurs.
making the display rate 60 Hz. Repetition of a frame at
the receiver would require frame memory, which was not 3.2. Video Coding
possible with the technologies available when the NTSC The Grand Alliance HDTV system transmits at least
system was standardized. Because of the exclusive use one HDTV program in a single 6-MHz channel. For
of interlaced scanning in conventional television systems, the modulation technology used in the Grand Alliance
early HDTV video equipment, such as videocameras, was system the maximum bit rate available for video is
developed for interlaced scanning. approximately 19 Mbps. For a typical HDTV video input,
An interlaced display has video artifacts such as the incompressed bit rate is on the order of 1 Gbps. This
interline flicker. Consider a sharp horizontal line that means the input video must be compressed by a factor
is in the odd field, but not in the even field. Even though of >50.
the overall large area flicker rate is 60 Hz, the flicker For example, consider an HDTV video input of
rate for the sharp horizontal line is only 30 Hz. As a 720 × 1280 with progressive scan at 60 frames/s. The
result, the line flickers in a phenomenon called interline pixel rate is 55.296 Mpixels/s. A color picture consists
flicker. Interline flickers are particularly troublesome for of three monochrome images: red, green, and blue. The
computer graphics or written material that contains many red, green, and blue colors are three primary colors of an
sharp lines. Partly for this reason, almost all computer
monitors use progressive scanning.
When a television system is used as a standalone Table 2. Applications of HDTV Transmission Formats
entertainment device, its interoperability with computers Used in the Grand Alliance System
is not a serious issue. For a digital HDTV system, however,
Format Applications
it is no longer a standalone entertainment device, and its
interoperability with computers and telecommunications 720 × 1280, PS, 60 frames/s Sports, concerts, animation,
networks is useful. When a display device uses progressive graphics, upconverted
scan and an interlaced transmission format is used, a NTSC, commercials
conversion process called deinterlacing must convert the 720 × 1280, PS, 24 frames/s Complex film scenes,
interlaced scan format to a progressive scan format before or 30 frames/s graphics, animation
it is displayed. A high-performance deinterlacer requires 1080 × 1920, PS, 24 frames/s Films with highest spatial
complex signal processing. Even when a high-performance or 30 frames/s resolution
deinterlacer is used, a progressive transmission format 1080 × 1920, IS, 60 fields/s Scenes shot with an
yields better performance than an interlaced transmission interlaced scan camera
format for graphics, animation, and written material.
HIGH-DEFINITION TELEVISION 973
additive color system. By mixing the appropriate amounts coding standards for moving pictures and associated audio.
of red, green, and blue lights, many different color lights In 1991, the group developed the ISO standard 11172,
can be generated. By mixing a red light and a green called coding of moving pictures and associated audio. This
light, for example, a yellow light can be generated. A standard, known as MPEG-1, is used for digital storage
pixel of a color picture consists of the red, green, and media at up to ∼1.5 Mbps.
blue components. Each component is typically represented In 1996, the group developed the ISO standard 13818
by 8 bits (256 levels) of quantization. For many video called Generic Coding of Moving Pictures and Associated
applications, such as television, 8 bits of quantization are Audio. This standard, known as MPEG-2, is an extension
considered sufficient to avoid video quality degradation by of MPEG-1 that allows flexibility in the input format
quantization. Each pixel is then represented by 24 bits. and bit rates. The MPEG-2 standard specifies only the
The bit rate for the video input of 720 × 1280 with syntax of the coded bitstream and the decoding process.
progressive scan at 60 frames/s is approximately 1.3 Gbps. This means that there is some flexibility in the encoder.
In this example, reducing the data rate to 19 Mbps As long as the encoder generates a bitstream that is
requires video compression by a factor of 70. consistent with the MPEG-2 bitstream syntax and the
Video compression is achieved by exploiting the MPEG-2 decoding process, it is considered a ‘‘valid’’
redundancy in the video data and the limitations of encoder. Since many methods of generating the coded
the human visual system. For typical video, there is a bitstream are consistent with the syntax and the decoding
considerable amount of redundancy. For example, much process, some optimizations and improvements can be
of the change between two consecutive frames is due made without changing the standard. The Grand Alliance
to the motion of an object or the camera. Therefore, HDTV system uses some compression methods included
a considerable amount of similarity exists between in MPEG-2 to generate a bitstream that conforms to the
two consecutive frames. Even within the same frame, MPEG-2 syntax. An MPEG-2 decoder can decode a video
the pixels in a neighborhood region typically do not bitstream generated by the Grand Alliance video coder.
vary randomly. By removing the redundancy, the same
(redundant) information is not transmitted. 3.3. Audio Processing. The Grand Alliance HDTV
For television applications, the video is displayed for system compresses the audio signal to efficiently use
human viewers. Even though the human visual system the available spectrum. To reconstruct CD-quality audio
has enormous capabilities, it has many limitations. For after compression, the compression factor for audio is
example, the human visual system does not perceive substantially less than that of the video. High-quality
well the spatial details of fast-changing regions. The audio is important for HDTV viewing. The data rate for
high spatial resolution in such cases does not need audio is inherently much lower than that for video, and
to be preserved. By removing the redundancy in the the additional bit-rate efficiency that can be achieved at
data and exploiting the limitations of the human visual the expense of audio quality is not worthwhile for HDTV
system, many methods of digital video compression were applications.
developed. A digital video encoder usually consists of the Consider one channel of audio. The human auditory
three basic elements shown in Fig. 6. The first element system is not sensitive to frequencies above 20 kHz.
is representation. This element maps the input video to The audio signal sampled at a 48-kHz rate is sufficient
a domain more suitable for subsequent quantization and to ensure that the audio information up to 20 kHz is
codeword assignment. The quantization element assigns preserved. Each audio sample is typically quantized at
reconstruction (quantization) levels to the output of the 16 bits/sample. The total bit rate for one channel of
representation element. The codeword assignment selects audio input is 0.768 Mbps. Exploiting the limitations of
specific codewords (a string of zeros and ones) to the the human auditory system, the bit-rate requirement
reconstruction levels. The three elements work together to is reduced to 0.128 Mbps, with the reproduced audio
reduce the required bit rate by removing the redundancy quality almost indistinguishable from that of the input
in the data and exploiting the limitations of the human audio. The compression factor achieved is 6, which is
visual system. substantially less than the video’s compression factor of
Many different methods exist for each of the three >50. In the case of video, it is necessary to obtain a
elements in the image coder. The Grand Alliance system very high compression factor, even at the expense of some
utilizes a combination of video compression techniques noticeable quality degradation for difficult scenes because
that conform to the specifications of the MPEG-2 (Moving of the very high-input video bit rate (>1 Gbps). In the case
Pictures Expert Group) video compression standard. This of audio, additional bit-rate savings from 0.128 Mbps at
is one of many possible approaches to video compression. the expense of possible audio quality degradation is not
considered worthwhile for HDTV applications. Similar to
MPEG-2 Standard. The International Standard Orga- video compression, audio compression utilizes reduction of
nization (ISO) established the Moving Pictures Expert redundancy in the audio data and the limitations of the
Group (MPEG) in 1988. Its mission was to develop video human auditory system.
Video Codeword
source Representation Quantization Bit stream Figure 6. Three basic elements of
assignment
a video encoder.
974 HIGH-DEFINITION TELEVISION
The Grand Alliance HDTV system uses a modular it does not utilize all of its capabilities. This means that
approach to the overall system design. Various technolo- the Grand Alliance decoder cannot decode an arbitrary
gies needed for a complete HDTV system can be chosen MPEG-2 systems bitstream, but all MPEG-2 decoders can
independently from each other. The audio compression decode the Grand Alliance bitstream.
method, for example, can be chosen independent from The bitstream that results from a particular application
the video compression method. Even though the MPEG-2 such as video, audio, or data is called an elementary
standard used for video compression in the Grand Alliance bitstream. The elementary bitstreams transmitted in a
system includes an audio compression method, the Grand 6-MHz channel are multiplexed to form the program
Alliance selected the Audio Coder 3 (AC-3) standard on transport bitstream. Each elementary bitstream has a
the basis of several factors including performance, bit-rate unique program identification (PID) number, and all
requirement, and cost. the elementary bitstreams within a program transport
The Grand Alliance system can encode a maximum of bitstream have a common timebase.
six audio channels per audio program. The channelization An example of the multiplex function used to form a
follows the ITU-R recommendation BS-775: Multichannel program transport stream is shown in Fig. 7. The first two
Stereophonic Sound System with and without Accompany- elementary streams are from one television program. The
ing Picture. The six audio channels are left, center, right, next two elementary streams are from another television
left surround, right surround, and low-frequency enhance- program. As long as the available bit rate for the channel
ment. The bandwidth of the low-frequency enhancement can accommodate more than one television program,
channel extends to 120 Hz, while the other five chan- the transport system will accommodate them. The fifth
nels extend to 20 kHz. The six audio channels are also elementary stream is only an audio program without the
referred to as ‘‘5.1 channels.’’ Since the six channels are corresponding video. The next two elementary streams
not completely independent for a given audio program, are from two different datastreams. The last elementary
this dependence can be exploited to reduce the bit rate. stream contains the control information, which includes a
The Grand Alliance system encodes 5.1 channel audio at program table that lists all the elementary bit streams,
a bit rate of 384 kbps with audio quality essentially the their PIDs, and the applications such as video, audio, or
same as that of the original. data. All eight elementary streams that form the program
transport bitstream have the same timebase.
3.4. Transport System The Grand Alliance transport system uses the fixed-
length packet structure shown in Fig. 8. Each packet
The bitstreams generated by the video and audio encoders consists of 188 bytes, which is divided into a header
and the data channel must be multiplexed in an organized field and payload. The header field contains overhead
manner so that the receiver can demultiplex them information, and the payload contains the actual data
efficiently. This is the main function of the transport that must be transmitted. The size of the packet is
system. The Grand Alliance system uses a transport chosen to ensure that the actual payload-to-overhead
format that conforms to the MPEG-2 system standard, but ratio is sufficiently high and that a packet lost during
188 bytes
4 bytes
Variable length
Link header Payload
adaptation header
(not to scale)
Figure 8. Fixed-length packet structure used in the Grand Alliance transport system.
HIGH-DEFINITION TELEVISION 975
transmission will not have serious consequences on the not high, several standard definition television programs
received video, audio, and data. The payload of each (comparable to the NTSC resolution) can be transmitted.
packet contains bits from only one particular elementary This is in sharp contrast with the NTSC system, where
bitstream. For example, a packet cannot have bits a fixed bandwidth is allocated to one video program and
from both a video elementary bitstream and an audio a fixed bandwidth is allocated to audio. The capability to
elementary bitstream. dynamically allocate bits as the need arises is a major
Information that is not actual data but is important feature of the transport system.
or useful for decoding the received bitstream is con- The transport system is also scalable. If a higher-bit-
tained in the header. The header of each packet includes rate channel is available, the same transport system can be
a 4-byte link header and may also include a variable- used by simply adding elementary bitstreams. The system
length adaptation header when needed. The link header is also extensible. If future services become available,
includes 1-byte synchronization to indicate the begin- such as 3D television, they can be added as an elementary
ning of each packet, a 13-bit PID to identify which stream with a new PID. Existing receivers that do not
elementary stream is contained in the packet, and infor- recognize the new PID will ignore the new elementary
mation as to whether the adaptation header is included stream. New receivers will recognize the new PID.
in the packet. The adaptation header contains timing The transport system is also robust in terms of
information to synchronize decoding and a presentation transmission errors, and is amenable to cost-effective
of applications such as video and audio. This informa- implementation. The detection and correction of trans-
tion can be inserted into a selected set of packets. The mission errors can be synchronized easily because of the
adaptation header also contains information that facili- fixed-length packet structure; this structure also facili-
tates random entry into application bitstreams in order tates simple demultiplex designs for low-cost, high-speed
to support functions such as program acquisition and implementation.
change.
On the transmitter side, the bits from each elementary 3.5. Transmission System
stream are divided into packets that contain the same
PID. The packets are multiplexed to form the program The bitstream generated by the transport system must
transport bitstream. An example is shown in Fig. 9. At be processed in preparation for modulation, and then
the receiver, the synchronization byte, which is the first modulated for transmission. Choosing from the many
byte in each packet, is used to identify the beginning of modulation methods depends on several factors, includ-
each packet. From the program table contained in the ing the transmission medium and the specific application.
control packet, information can be obtained on which The best method for terrestrial broadcasting may not be
elementary streams are in the received program transport the best for satellite broadcasting. Even for terrestrial
bitstream. This information, together with the PID in broadcasting, the use of taboo channels means that inter-
each packet, is used to separate the packets into different ference with existing NTSC channels must be considered
elementary bitstreams. The information contained in the to determine the specific modulation technology. Consid-
adaptation header in a selected set of packets is used for ering several factors, such as coverage area, available bit
the timing and synchronization of the decoding and for rate, and complexity, the Grand Alliance system uses an 8-
the presentation of different applications (video, audio, VSB system for terrestrial broadcasting. A block diagram
data, etc.). of the 8-VSB system is shown in Fig. 10.
The transport system used in the Grand Alliance
system has many advantages. The system is very flexible 3.5.1. Data Processing. The data processing part of
in dynamically allocating the available channel capacity the 8-VSB system consists of a data randomizer, a
to video, audio, and data. The system can devote all Reed–Solomon encoder, a data interleaver, and trellis
available bits to video, audio, or data, or any combination encoder. Data packets of 188 bytes per packet are
thereof. The system also can allocate available bits to received from the transport system and randomized.
more than one television program. If video resolution is Portions of the bitstream from the transport system
Data processing
Reed-
Bit stream Data Data Trellis
solomon
randomizer interleaver encoder
encoder
Multiplexer
Sync
RF VSB Pilot
RF
Up-converter modulator insertion
Figure 10. A block diagram of an 8-VSB system used in the Grand Alliance HDTV system.
may have some pattern and may not be completely decoder is effective in combating the random and short-
random. Randomization ensures that the spectrum of burst bit errors, but it can create long-burst bit errors
the transmitted signal is flat and is used efficiently. In in the presence of strong interference and bursts. The
addition, the receiver exhibits optimal performance when Reed–Solomon code is effective in combating long-burst
the spectrum is flat. bit errors.
The transmission channel introduces various types The 8-VSB system transmits approximately 10.76
of noise such as random noise and multipath. They million symbols per second (SPS), with each symbol
manifest themselves as random and bursty bit errors. To representing 3 bits (eight levels). When accounting for
handle these bit errors, two forward error correction (FEC) the overhead associated with the trellis coder, the
schemes are used. These FEC schemes add redundancy Reed–Solomon coder, and additional synchronization
to the bitstream. Errors are then detected and corrected bytes, the bit rate available to the transport system’s
by exploiting this added redundancy. Even though error decoder is approximately 19.4 Mbps. The bit rate is not
correction bits use some of the available bit rate, they only for applications such as video, audio, and data, but
increase overall performance of the system by detecting also for overhead information (link header, etc.) in the
and correcting errors. 188-byte packets. The actual bit rate available for the
The first error correction method is the Reed–Solomon applications is less than 19.4 Mbps. The VSB system can
code, known for its burst noise correction capability and
deliver a higher bit rate, for example, by reducing the
efficiency in overhead bits. The burst noise correction
redundancy in the error correction codes and by increasing
capability is particularly useful when the Reed–Solomon
the number of levels per symbol. However, the result is
code is used with the trellis code. The trellis code
loss of performance in other aspects such as the coverage
is effective in combating random and short impulsive
area. The 8-VSB system delivers the 19.4-Mbps bit rate to
noise, but it tends to generate burst errors in the
ensure that an HDTV program can be delivered within a
presence of strong interference and burst noise. The
Reed–Solomon code used in the 8-VSB system adds 6-MHz channel with a coverage area at least comparable
approximately 10% of overhead to each packet of data. to an NTSC system, and with an average power level below
The result is a data segment that consists of a packet the NTSC power level in order to reduce interference with
from the transport system and the Reed–Solomon code the NTSC signals.
bits. The resulting bytes are convolutionally interleaved The trellis encoding results in data segments. At the
over many data segments. The convolutional interleaving beginning of each set of 312 data segments, a data segment
that is useful in combating the effects of burst noise and is inserted that contains the synchronization information
interference is part of the trellis encoding. The trellis for the set of 312 data segments. This data segment
encoder, which is a powerful technique for correcting also contains a training sequence that can be used for
random and short-burst bit errors, adds additional channel equalization at the receiver. Linear distortion in
redundancy. In the 8-VSB system, the trellis coder the channel can be accounted for by an equalizer at the
creates a 3-bit symbol (eight levels) from a 2-bit data receiver. At the beginning of each data segment, a 4-symbol
symbol. synchronization signal is inserted. The data segment sync
At the transmitter, the Reed–Solomon encoder pre- and the 312-segment set sync are not affected by the trellis
cedes the trellis encoder. At the receiver, the trellis encoder and can provide synchronization independent of
decoder precedes the Reed–Solomon decoder. The trellis the data.
HIGH-DEFINITION TELEVISION 977
3.5.2. Pilot Insertion. Prior to modulation, a small pilot of the receiver is the NTSC rejection filter, which
is inserted in the lower band within the 6-MHz band. The is useful because of the way HDTV service is being
location of the pilot is on the Nyquist slope of NTSC introduced in the United States. In some locations, an
receivers. This ensures that the pilot does not seriously HDTV channel occupies the same frequency band as
impair existing NTSC service. The channel assigned for an existing NTSC channel located some distance away.
the HDTV service may be a taboo channel that is currently The interference of the HDTV signal with the NTSC
unused because of cochannel interference with an existing channel is minimized by a very-low-power level of the
NTSC service located some distance away. The HDTV HDTV signal. The low-power level was made possible
signal must be designed to ensure that its effect on existing for HDTV service because of its efficient use of the
service is minimal. spectrum. To reduce interference between the NTSC signal
The NTSC system is rugged and reliable. The main and the HDTV channel, the 8-VSB receiver contains an
reason for this is the use of additional signals for NTSC rejection filter. This is a simple comb filter whose
synchronization that do not depend on the video signals. rejection null frequencies are close to the video carrier,
The NTSC receiver reliably synchronizes at noise levels chroma carrier, and audio carrier frequencies of the
well below the loss of pictures. In the 8-VSB system, a NTSC signal. The comb filter, which reduces the overall
similar approach is taken. The pilot signal that does not performance of the system, can be activated only when a
depend on the data is used for carrier acquisition. In strong cochannel NTSC signal interferes with the HDTV
addition, the data segment sync is used to synchronize the channel.
data clock for both frequency and phase. The 312-segment-
set sync is used to synchronize the 312 segment-set
and equalizer training. Reliance on additional signals for 3.5.5. Cliff Effect. Although additive white Gaussian
carrier acquisition and clock recovery is very useful. Even noise does not represent typical channel noise, it is
when occasional noise in the field causes a temporary often used to characterize the robustness of a digital
loss of data, a quick recovery is possible as long as communication system. The segment error probability
the carrier acquisition and clock recovery remain locked for the 8-VSB system in the presence of additive white
during the data loss. The 8-VSB system ensures that Gaussian noise is shown in Fig. 11. At the signal-to-noise
carrier acquisition and clock recovery remain intact well ratio (S/N) of 14.9 dB, the segment error probability is
below the threshold level of data loss by relying on 1.93 × 10−4 or 2.5 segment errors/sec. At this threshold
additional signals for such functions. the segment errors become visible. Thus, up to an S/N of
14.9 dB, the system is perfect. At an S/N of 14 dB, which
3.5.3. Modulation. For transmission of the prepared is just 0.9 dB less than the threshold of visibility (ToV),
bitstream (message) over the air, the bitstream must practically all segments are in error. This means a system
be mapped to a bandpass signal that occupies a 6- that operates perfectly above the threshold of visibility
MHz channel allocated to a station’s HDTV service. A becomes unusable when the signal level decreases by
modulator modulates a carrier wave according to the
1 dB or when the noise level increases by 1 dB. This is
prepared bitstream. The result is a bandpass signal
known as the ‘‘cliff effect’’ in a digital communication
that occupies a given 6-MHz channel. The 8-VSB system
system. In the case of the NTSC system, the picture
modulates the signal onto an IF (intermediate-frequency)
quality decreases gradually as the S/N decreases. To
carrier, which is the same frequency for all channels.
avoid operating near the cliff region, a digital system
It is followed by an upconversion to the desired HDTV
is designed to operate well above the threshold region
channel.
within the intended coverage area. Both laboratory and
3.5.4. Receiver. At the receiver, the signal is processed field tests have indicated that the coverage area of the
in order to obtain the data segments. One feature HDTV channel is comparable to or greater than the NTSC
1E+01
1E+00
1E− 01
Probability of error
1E− 02
1E−03
1E−04
1E −05
1E− 06
1E− 07
1E− 08
Figure 11. Segment error probability for the
8 9 10 11 12 13 14 15 16 17 18
8-VSB system in the presence of additive white
S/N in db Gaussian noise.
978 HIGH-DEFINITION TELEVISION
channel, despite the substantially lower power level used of common elements, interoperability among different
for the HDTV signal. standards will become an important issue. In terms
of program exchange for different delivery media and
3.5.6. Cable Mode. A cable environment introduces throughout the world, technologies that convert one
substantially less channel impairment than does terres- standard to another will continue to be developed.
trial broadcasting for a variety of reasons. In the case of Efforts to facilitate the conversion process by adopting
cable, for example, cochannel interference from an exist- common elements among the different standards also will
ing NTSC station is not an issue. This can be exploited to continue.
deliver a higher bit rate for applications in a 6-MHz cable Interoperability will be an issue not only among the
channel. Although the Grand Alliance HDTV system was different HDTV standards, but among telecommunication
designed for terrestrial broadcasting, the system includes services and computers as well. A traditional television
a cable mode that doubles the available bit rate using a system has been a standalone device whose primary pur-
small modification. Doubling the bit rate means the ability pose was entertainment. Although an HDTV system is
to transmit two HDTV programs within a single 6-MHz used for entertainment, it can be an integral part of
cable channel. a home center for entertainment, telecommunications,
In order to double the bit rate for cable mode, the and information. The HDTV display can be used as a
Grand Alliance System uses 16-VSB rather than 8-VSB. videophone, a newspaper service, or a computer display.
All other aspects of the system such as video compression, Interoperability between an HDTV system and other ser-
audio compression, and transport remain the same. In the vices that will be integrated in the future is an important
8-VSB system, a symbol is represented by eight levels or consideration.
3 bits. One of the 3 bits is due to the redundancy created by
trellis encoding. For a cable environment, error correction BIBLIOGRAPHY
from the powerful trellis coding is no longer needed. In
addition, the higher available S/N for cable means that 1. ATSC, Digital Audio Compression (AC-3), Dec. 20, 1995.
a symbol can be represented by 16 levels or 4 bits. Since 2. ATSC, Digital Television Standard, Sept. 16, 1995.
the symbol rate remains the same between the 8-VSB and 3. ATSC, Guide to the Use of the ATSC Digital Television
16-VSB systems, the available bit rate for the 16-VSB Standard, Oct. 4, 1995.
system doubles in comparison with the 8-VSB system. For
4. V. Bhaskaran and K. Konstaninides, Image and Video
the 16-VSB system without trellis coding, the S/N ratio for
Compression Standards and Architectures, Kluwer Academic
the segment error rate that corresponds to the threshold Publishers, 1995.
of visibility in the environment of additive white Gaussian
5. W. Bretl, G. Sgrignoli, and P. Snopko, VSB Modem Subsystem
noise is approximately 28 dB. This is 13 dB higher than
Design for Grand Alliance Digital Television Receivers, ICCE,
the 8-VSB system with the trellis coding. This increase
1995.
in the S/N is acceptable in a cable environment that has
substantially less channel impairments than a typical 6. A. B. Carlson, Communication Systems, 3rd ed., McGraw-
Hill, 1986.
terrestrial environment.
7. The Grand Alliance Members, The U.S. HDTV standard. The
Grand Alliance, IEEE Spectrum 36–45 (April 1995).
4. HDTV AND INTEROPERABILITY
8. ISO/IEC JTC1 CD 11172, Coding of Moving Pictures
and Associated Audio for Digital Storage Media up to
The Grand Alliance HDTV system has served as the 1.5 Mbits/s, International Organization for Standardiza-
basis for the digital television standard in the United tion (ISO),
States. The standard itself defines a significant number 1992.
of technical elements. The technologies involved, however,
9. ISO/IEC JTC1 CD 13818, Generic Coding of Moving
will continue to develop without the need to modify the Pictures and Associated Audio, International Organization
standard. For example, the video compression system for Standardization (ISO), 1994.
that was adopted defines syntax for only the decoder.
10. N. S. Jayant and P. Noll, Digital Coding of Waveforms,
There is much room for improvement and advances
Prentice-Hall, Englewood Cliffs, NJ, 1984.
in the encoder. Technical elements can also be added
in a backward-compatible manner. The transmission of 11. J. S. Lim, Two-Dimensional Signal and Image Processing,
Prentice-Hall, Englewood Cliffs, NJ, 1990.
a very high-definition (VHD) television format, which
was not provided in the initial standard, can be 12. A. N. Netravali and B. G. Haskell, Digital Pictures: Rep-
accomplished in a backward-compatible manner. This resentation and Compression, Plenum Press, New York,
can be achieved by standardizing a method to transmit 1988.
enhancement data. This, in turn can be combined with 13. C. C. Todd et al., AC-3: Flexible Perceptual Coding for
an allowed video transmission format to deliver the VHD Audio Transmission and Storage, Audio Engineering Society
format. Convention, Amsterdam, Feb. 28, 1994.
The Grand Alliance system was designed for HDTV 14. J. S. Lim, Digital television: Here at last, Sci. Am. 78–83
delivery in terrestrial environments in the United (May 1998).
States. For other delivery environments such as satellite, 15. B. G. Haskell, A. Puri, and A. N. Netravali, Digital Video: An
cable, and environments in other parts of the world, Introduction to MPEG-2, Digital Multimedia Standard Series,
other standards will emerge. Depending on the degree Chapman and Hill, New York, 1997.
HIGH-RATE PUNCTURED CONVOLUTIONAL CODES 979
16. J. L. Mitchell, W. B. Pennebaker, C. E. Fogg, and D. J. 9 ≤ M ≤ 23. All these codes are derived from the best-
LeGall, MPEG Video Compression Standard, Digital Mul- known rate-1/2 codes of the same memory lengths. The
timedia Standard Series, Chapman and Hill, New York, short memory codes are useful for Viterbi decoding,
1997. whereas the long memory codes are provided for archival
17. T. Sikora, MPEG digital video-coding standards, IEEE Signal purposes in addition to being suitable for sequential
Process. Mag. 14(5): 82–100 (Sept. 1997). decoding.
We assume that the reader is familiar with the
basic notions of convolutional encoding and the tree,
HIGH-RATE PUNCTURED CONVOLUTIONAL trellis, and state diagram representations of convolutional
CODES codes. Without loss of generality, we consider binary
convolutional codes of coding rate R = b/v, b and v integers
DAVID HACCOUN with 1 ≤ b ≤ v. A rate R = b/v convolutional code produced
Ecole Polytechnique de Montréal by a b-input/ v-output encoder may also be denoted as a
Montréal, Quebec, Canada (v, b) convolutional code. The encoder is specified by a
generating matrix G(D) of dimension b.v whose elements
are the generator polynomials
1. INTRODUCTION
mi
m
gij (D) = gkij Dk = g0ij + g1ij D + . . . + gij i Dmi (1)
For discrete memoryless channels where the noise is k=0
essentially white (such as the space and satellite chan-
nels), error control coding systems using convolutional where i = 1, . . . , b; j = 1, . . . , b.
encoding and probabilistic decoding are among the most The total memory of the encoder is M = mi . Hence,
attractive means of approaching the reliability of commu- i
nication predicted by the Shannon theory; these systems for low rate R = 1/v codes, there are two branches
provide substantial coding gains while being readily imple- emerging from each node in all the representations
mentable [1–3]. of the code, with two branches or paths remerging
By far, error control techniques using convolutional at each node or state of the encoder. For usual high
codes have been dominated by low-rate R = 1/v rate R = b/v codes, 2b branches enter and leave each
codes [1–3,11,16,19]. Optimal low-rate codes provid- encoder state. As a consequence, compared to R = 1/v
ing large coding gains are available in the litera- codes, for R = b/v codes the encoding complexity is
ture [1,3,11,16,19], and practical implementations of pow- multiplied by b, whereas the Viterbi decoding complexity is
erful decoders using Viterbi, sequential, iterative, or Turbo multiplied by 2b . By using the notion of puncturing, these
decoding schemes exist for high data rates in tens of difficulties can be entirely circumvented, since regardless
Mbits/s. However, as the trend for ever-increasing data of the coding rate R = (v − 1)/v, the encoding or decoding
transmission rates and high error performance contin- procedure is hardly more complex than for the rate
ues while conserving bandwidth, the needs arise for good R = 1/v codes.
high-rate R = b/v convolutional codes as well as prac- The article is structured as follows. Section 2 introduces
tical encoding and decoding techniques for these codes. the basic concepts of encoding for punctured codes
Unfortunately, a straightforward application of the usual and their perforation patterns. Section 3 presents
decoding techniques for rates R = 1/v codes to high- Viterbi decoding for punctured codes and their error
rate R = b/v codes becomes very rapidly impractical as performances. The search for good punctured codes is the
the coding rate increases. Furthermore, a conspicuous objective of Section 4. This section presents extensive lists
lack of good nonsystematic long-memory (M > 9) convolu- of the best-known punctured codes of coding rates varying
tional codes with rates R larger than 2/3 prevails in the from R = 2/3 to R = 7/8 together with their weight spectra
literature. and bit error probability bounds. Finally, the problem
With the advent of high-rate punctured convolutional of generating punctured codes equivalent to the best-
codes [6], the inherent difficulties of coding and decoding known usual nonpunctured high-rate codes is presented in
of high-rate codes can be almost entirely circumvented. Section 5. Again, extensive lists of long memory punctured
Decoding of rate R = b/v punctured convolutional codes equivalent to the best usual codes of the same rate are
is hardly more complex than for rate R = 1/v codes; provided.
furthermore, puncturing facilitates the implementation of
rate-adaptive, variable-rate, and rate-compatible coding-
decoding [3,6–9]. 2. BASIC CONCEPTS OF PUNCTURED
In this article, we present high-rate punctured CONVOLUTIONAL CODES
convolutional codes especially suitable for Viterbi and
sequential decoding. We provide the weight spectra 2.1. Encoding of Punctured Codes
and upper bounds on the error probability of the best-
known punctured codes having memory 2 ≤ M ≤ 8 and A punctured convolutional code is a high-rate code
coding rates 2/3 ≤ R ≤ 7/8 together with rate-2/3 and obtained by the periodic elimination (i.e., puncturing)
3/4 long-memory punctured convolutional codes having of specific code symbols from the output of a low-rate
980 HIGH-RATE PUNCTURED CONVOLUTIONAL CODES
encoder. The resulting high-rate code depends on both (a) 000 000
the low-rate code, called the original code or mother code, 001 101 001 101
110 110
and the number and specific positions of the punctured
symbols. The pattern of punctured symbols is called 111 110 111 110
(b) 100 100
the perforation pattern of the punctured code, and it
is conveniently described in a matrix form called the 111 010 111 010
perforation matrix.
Consider constructing a high-rate R = b/v punctured (c) 001 001
010 011 010 011
convolutional code from a given original code of any low- 011 011
rate R = 1/vo . From every vo b code symbols corresponding 100 000 100 000
Punctured
00 0X 00 0X Input data Encoder for
(a) Symbol coded data
original R = 1/Vo
11 1X 11 1X selector ( R = b /V )
code
(b) 11 1X 11 1X
00 0X 00 0X
Vo 1 0 1
( Perforation
matrix [P] )
1 1 0
10 1X 10 1X
(c) 01 0X 01 0X
01 0X 01 0X b
1 : Transmitting V symbols
(d)
10 1X 10 1X ( 0 : Deleting S = Vo b − Vsymbols )
Figure 1. Trellis for M = 2, R = 1/2 convolutional code. Figure 3. Schematic of an encoder for high rate punctured codes.
HIGH-RATE PUNCTURED CONVOLUTIONAL CODES 981
symbols. Given the perforation pattern of the punctured In these expressions, dfree is the free distance of the code
code, this can be readily performed by inserting dummy and aj is the number of incorrect paths or adversaries of
data into the positions corresponding to the deleted code Hamming weight j, j ≥ dfree , that diverge from the correct
symbols. In the decoding process, the dummy data are path and remerge with it sometime later. As for cj , it is
discarded by assigning them the same metric value simply the total number of bit errors in all the adversaries
(usually zero) regardless of the code symbol, 0 or 1. For having a given Hamming weight j.
either hard- or soft-quantized channels, this procedure in Using the weight spectrum, upper bounds on the
effect inhibits the conventional metric calculation for the sequence error probability PE of a code of rate R = b/v
punctured symbols. In addition to that metric inhibition, is given by
the only coding rate-dependent modification in a variable- ∞
PE ≤ aj Pj (6)
rate codec is the truncation path length, which must be
j=dfree
increased with the coding rate. All other operations of the
decoder remain essentially unchanged [6–9,21].
where Pj is the pair-wise error probability between two
Therefore, Viterbi codecs for high-rate punctured
codewords having Hamming distance j. As for the upper
codes involve none of the complexity required for the
bound on the bit error probability PB , it is easily obtained
straightforward decoding of high-rate b/v codes. They can
by weighing each erroneous path by the number of
be implemented by adding relatively simple circuitry to
information bits 1 contained in that path.
the codecs of the original low-rate 1/vo code. Furthermore,
Since the coding rate is R = b/v, PB is bounded by
since a given low-rate 1/vo code can give rise to a
large number of high-rate punctured codes, the punctured
1
∞
approach leads to very attractive realizations of variable- PB ≤ cj Pj (7)
rate Viterbi decoding. In fact, virtually all hardware b
j=dfree
implementations of convolutional codecs for high-rate
codes use the punctured approach. Evaluation of this bound depends on the actual
The punctured approach to high-rate codes can just expression of the pair-wise error probability Pj , which
as easily be applied to sequential decoding, where the in turn depends on the type of modulation and channel
decoding of the received message is performed one tree- parameters used [1,16]. For coherent PSK modulation and
branch at a time, without searching the entire tree. unquantized additive white Gaussian noise channels, the
Here again, decoding is performed on the tree of the error probability between two codewords that differ over j
original low-rate code rather than on the tree on the symbols is given by
high-rate code where, like for Viterbi decoding, the metric
of the punctured symbols is inhibited. Therefore, either Pj = Q( 2jREb /No) (8)
the Fano [10] or the Ziganzirov–Jelinek stack decoding
algorithm could be used for the decoding of high-rate where ∞
punctured codes with minimal modifications [13–15,19]. 1 1
Q(x) = √ exp − z2 dz (9)
Finally, the same approach as described above can also x 2π 2
be applied for iterative Turbo decoding using the MAP
algorithm or its variants [4,5]. and where Eb /No is the energy per bit-to-noise den-
sity ratio.
For binary symmetric memoryless channels having
3.2. Error Performance transition probability p, p < 0.5, the pair-wise error
For discrete memoryless channels, upper bounds on both probabilities Pj may be bounded by
the sequence and bit error probabilities of a convolutional
code can be obtained. The derivation of the bounds is Pj ≤ 2j (p(1 − p))j/2 (10)
based on a union bound argument on the transfer function
T(D, B) of the code that describes the weight distribution, However, using the exact expressions for Pj yields
or weight spectrum, of the incorrect codewords and the tighter bounds for PB and PE that closely match
number of bit errors on these codewords or paths [1,16]. performance simulation results, that is
Except for short memory codes, the entire transfer function
of the code is rarely known in closed form, but upper
j
j
Ci pi (1 − p)j−i , j odd
bounds on the error performances can still be calculated
i=(j+1)/(2)
using only the first few terms of two series expansions Pj =
j
1 j
related to the transfer function T(D, B), that is
j
Ci pi (1 − p)j−i + C (p(1 − p))j/2 ,
j even
2 j/2
i=(j)/(2)+1
∞
T(D, B) |B=1 = aj Dj (4)
j j!
j=dfree where Ci = , j > i.
i!(j − i)!
A good evaluation of the bounds on either PE or PB
and requires knowledge of the transfer functions (4) or (5).
dT(D, B)
∞
|B=1 = cj Dj (5) However, for the vast majority of codes, only the first
dB few terms of either functions are known, and sometimes
j=dfree
HIGH-RATE PUNCTURED CONVOLUTIONAL CODES 983
only the leading coefficients adfree and cdfree are available. Naturally, if families of variable-rate punctured codes
However, for channels with large Eb /No values such as are desired, then all the required perforation patterns
those usually used with high-rate codes, the bounds on PE must be applied to the same low-rate original code.
and PB are dominated by the very first terms adfree and Furthermore, if the codes are to be rate-compatible, then
cdfree . the perforation patterns must be selected accordingly [9].
Naturally, bounds (6) and (7) are also applicable Obviously, puncturing a code reduces its free distance,
for punctured codes. Therefore, in deriving the error and hence a punctured code cannot achieve the free
performances of punctured codes, the free distances and distance of its original code. Although the free distance
at least the first few terms of the weight spectra of these of a code increases as its rate decreases, using original
codes must be known. Partial weight spectra are provided codes with rates 1/vo lower than 1/2 does not always
here for both the best known short-memory and for long- warrant punctured codes with larger free distances since,
memory codes. for a given b and v, the proportion of deleted symbols,
S/vo b = 1 − (v/vo b), also increases with vo . Consequently,
good results and ease of implementation tend to favor the
4. SEARCH FOR GOOD PUNCTURED CODES
use of rate-1/2 original codes for generating good punctured
codes with coding rates of the form R = (v − 1)/v. Results
Since punctured coding was originally devised for Viterbi
on both short-(M ≤ 8) and long-memory punctured codes
decoding, the criterion of goodness for these codes was the
(M ≥ 9) that are derived from the best-known R = 1/2
free distance, and hence the best free distance punctured
codes are provided in this article [21–23].
codes that first appeared in the literature were all short-
Although one could select the punctured code on the
memory codes [6–9]. For sequential decoding, good long-
basis of its free distance only, a finer method consists of
memory punctured codes should have, in addition to a
determining the weight spectrum of the punctured code
large free distance, a good distance profile.
according to Eqs. (4) and (5) and then calculating the error
In searching for good punctured codes of rate b/v
probability bounds Eqs. (6) and (7). The code yielding the
and memory M, one is confronted with the problem of
best error performance may thus be selected as the best
determining both an original code of memory M and
punctured code, provided that it is not catastrophic.
rate R = 1/vo , and its accompanying perforation pattern.
Clearly, then, starting from a known optimal low-
Not unlike the search for usual convolutional codes, the
rate code, a successful search for good punctured
search for punctured codes is often based on intuition
codes hinges on the ability for determining the weight
and trial and error rather than on a strict mathematical
spectrum corresponding to each possible perforation
construction [21–23]. An approach that yielded good
pattern. Although seemingly simple, finding the weight
results is based on the notion that ‘‘good codes generate
spectrum of a punctured code is a difficult task. This is
good codes.’’ Consequently, one could choose as the mother
due to the fact that even if the spectrum of the low-
code a known good (even the best) code of memory M and
rate original code were available, the spectrum of the
rate R = 1/vo , (e.g., R = 1/2, 1/3, 1/4, . . .) and exhaustively
punctured code cannot easily be derived from it. One has
try out all possible perforation patterns to generate the
to go back exploring the tree or trellis of the low-rate
best possible punctured codes of rates R = b/v and same
original code and apply the perforation pattern to each
memory M.
path of interest. For the well-known short-memory codes,
For each perforation pattern, the error probability
the procedure is at best a rediscovery of their weight
must be evaluated, which in turn implies determining the
spectra, whereas for long-memory codes where often only
corresponding weight spectrum. For a punctured code of
the free distance is known, it is a novel determination of
rate R = b/v obtained from a mother code of rate R = 1/vo ,
their spectra. The problem is further compounded by the
the number of possible perforation matrices to consider
fact that since puncturing a path reduces its Hamming
is equal to Cbvv . For example, using a mother code of
o
weight, in obtaining spectral terms up to some target
rate R = 1/2, there are C14 8 = 3003 different perforation
Hamming distance, a larger number of paths must be
matrices to consider for determining the best punctured
explored over much longer lengths for a punctured than
codes of rate R = 7/8. This number may be substantially
for a usual non-punctured low-rate code.
reduced if the code must satisfy some specific properties.
For each perforation pattern, the corresponding weight
For example, if the code is systematic, that is the
spectrum consists essentially in the triplet {Dl , al , cl } from
information sequence appears unaltered in the output
which the bounds on the error probabilities PE and PB
code sequence, then clearly the first row of the perforation
can be calculated. Of course, the best punctured code
matrix is all composed of 1s, and the puncturing may take
will correspond to the perforation pattern yielding the
place only in the other rows. For example, for R = 7/8
best error probability bounds. In the search for the best
systematic punctured codes, the puncturing matrices are
punctured codes, the difficulty is compounded by the fact
of the following form
that in determining the weight spectrum of a code of a
1 1 1 1 1 1 1 memory length M up to some Hamming weight value L,
P= the computational effort is exponential in both M and L.
1 0 0 0 0 0 0
The best punctured codes presented here have been
There are only seven possible alternatives, and in general obtained using very efficient trellis-search algorithms
the mother codes for R = b/v systematic punctured codes for determining the weight spectra of convolutional
must also be systematic. codes [17,18]. Each algorithm has been designed to be
984 HIGH-RATE PUNCTURED CONVOLUTIONAL CODES
especially efficient within a given range of memory lengths, analytically derived for both rate-2/3 and rate-4/5 memory-
making it possible to obtain substantial extensions of the 2 punctured codes. The transfer functions of the codes
known spectra of the best convolutional codes. The spectral have been determined by solving the linear equations
terms obtained by these exploration methods have been describing the transitions between the different states of
shown to match exactly the corresponding terms derived the encoder. As expected, all the spectral terms obtained by
from the transfer function of the code [21], thus validating the search algorithms have been shown to match exactly
the search procedure. the corresponding terms of the transfer functions [22].
The bit error probability upper bound PB over both
4.1. Short-Memory Punctured Codes
unquantized and hard quantized additive white Gaussian
A number of short-memory lengths (M ≤ 8) punctured noise channels has been evaluated for all the codes listed
codes of rates R = (v − 1)/v varying from 2/3 to 16/17 in Tables 1 to 6 [21–23], using all the listed weight
have been determined first by Cain et al. [6], Yasuda spectra terms. For all these codes, the error performance
et al. [7] and [8], and then by Hagenauer [9] for rate- improves as the coding rate decreases, indicating well-
compatible codes. In particular, in Ref. [7] all the memory- chosen perforation patterns. A notable exception concerns
6 punctured codes of rates varying from 2/3 to 16/17 have the memory M = 3 code, where the bit error performance
been derived from the same original memory-6, rate-1/2,
turned out to be slightly better at rate 4/5 than at rate
optimal convolutional code due to Odenwalder [16]. A more
3/4. This anomaly may be explained by an examination
complete list of rate 2/3 to 13/14 punctured codes has been
of the spectra of these two codes. As shown in Tables 2
derived from the optimal rate-1/2 codes with memory M
and 3, although the free distances of the rates 3/4 and 4/5
varying from 2 to 8 and compiled by Yasuda et al. [8]. In
codes are df = 4 and df = 3, respectively, the number of bit
this list, the perforation matrix for each code is provided,
errors cn on the various spectral terms are far larger for
but the weight spectrum is limited to the first term only,
the rate 3/4 code than for the rate 4/5 code. Clearly, it is the
that is, the term cdfree corresponding to dfree .
coefficients cn that adversely affect the error performance
Using the given perforation patterns, Haccoun
of the rate 3/4 code. This anomaly illustrates the fact that
et al. [21–23] have extended Yasuda’s results by deter-
selecting a code according to only the free distance is good
mining the first 10 spectral terms an and cn for all the
in general but may sometimes be insufficient. Knowledge
codes having memory lengths 2 ≤ M ≤ 8 and coding rates
of further terms of the spectrum will always provide more
2/3, 3/4, 4/5, 5/6, 6/7, and 7/8. These results are given in
insight on the code performance.
Tables 1 to 6, respectively. Each table lists the genera-
The theoretical bound on PB for the original M = 6,
tors of the original low-rate code, the perforation matrix,
R = 1/2 optimal-free-distance code is plotted in Fig. 4,
the free distance of the resulting punctured code, and
together with those of all the best punctured codes with
the coefficients an and cn of the series expansions of the
rates R = 2/3, 3/4, 4/5, 5/6, 6/7, and 7/8 derived from it. For
corresponding weight spectra up to the 10 spectral terms.
comparison purposes, Fig. 4 also includes the performance
Beyond 10 spectral coefficients the required computer time
curve for the uncoded coherent PSK modulation. It shows
becomes rather prohibitive, and hardly any further preci-
that the performance degradation from the original rate-
sion of the bounds is gained by considering more than 10
1/2 code is rather gentle as the coding rate increases from
spectral terms.
To verify the accuracy and validity of these results, the 1/2 to 7/8. For example, at PB = 10−5 , the coding gains for
D(T, B) the punctured rate-2/3 and 3/4 codes are 5.1 dB and 4.6 dB,
entire transfer functions T(D, B) and have been respectively. These results indicate that these codes are
dB
Table 1. Weight Spectra of Yasuda et al. Punctured Codes with R = 2/3, 2 ≤ M ≤ 8, [21]
Original Code Punctured Code
(an , n = df , df +1 , df +2 , . . .)
M G1 G2 df [P] df [cn , n = df , df +1 , df +2 , . . .]
Table 2. Weight Spectra of Yasuda et al. Punctured Codes with R = 3/4, 2 ≤ M ≤ 8, [21]
Original Code Punctured Code
(an , n = df , df +1 , df +2 , . . .)
M G1 G2 df [P] df [cn , n = df , df +1 , df +2 , . . .]
2 5 7 5 101 3 (6, 23, 80, 290, 1050, 3804, 13782, 49930, 180890, 655342)
110 [15, 104, 540, 2557, 11441, 49340, 207335, 854699, 3471621, 13936381]
3 15 17 6 110 4 (29, 0, 532, 0, 10059, 0, 190112, 0, 3593147, 0)
101 [124, 0, 4504, 0, 126049, 0, 3156062, 0, 74273624, 0]
4 23 35 7 101 3 (1, 2, 23, 124, 576, 2852, 14192, 70301, 348427, 1726620)
110 [1, 7, 125, 936, 5915, 36608, 216972, 1250139, 7064198, 39308779]
5 53 75 8 100 4 (1, 15, 65, 321, 1661, 8396, 42626, 216131, 1095495, 5557252)
111 [3, 85, 490, 3198, 20557, 123384, 725389, 4184444, 23776067, 133597207]
6 133 171 10 110 5 (8, 31, 160, 892, 4512, 23307, 121077, 625059, 3234886, 16753077)
101 [42, 201, 1492, 10469, 62935, 379644, 2253373, 13073811, 75152755, 428005675]
7 247 371 10 110 6 (36, 0, 990, 0, 26668, 0, 726863, 0, 19778653, 0)
101 [239, 0, 11165, 0, 422030, 0, 14812557, 0, 493081189, 0]
8 561 753 12 111 6 (10, 77, 303, 1599, 8565, 44820, 236294, 1236990, 6488527, 34056195)
101 [52, 659, 3265, 21442, 133697, 805582, 4812492, 28107867, 162840763, 935232173]
Table 3. Weight Spectra of Yasuda et al. Punctured Codes with R = 4/5, 2 ≤ M ≤ 8, [21]
Original Code Punctured Code
(an , n = df , df +1 , df +2 , . . .)
M G1 G2 df [P] df [cn , n = df , df +1 , df +2 , . . .]
2 5 7 5 1011 2 (1, 12, 53, 238, 1091, 4947, 22459, 102030, 463451)
1100 [1, 36, 309, 2060, 12320, 69343, 375784, 1983150, 10262827]
3 15 17 6 1011 3 (5, 36, 200, 1070, 5919, 32721, 180476, 995885, 5495386, 30323667)
1100 [14, 194, 1579, 11313, 77947, 514705, 3305113, 20808587, 129003699, 790098445]
4 23 35 7 1010 3 (3, 16, 103, 675, 3969, 24328, 147313, 897523, 5447618, 33133398)
1101 [11, 78, 753, 6901, 51737, 386465, 2746036, 19259760, 132078031, 896198879]
5 53 75 8 1000 4 (7, 54, 307, 2005, 12970, 83276, 534556, 3431703, 22040110)
1111 [40, 381, 3251, 27123, 213451, 1621873, 12011339, 87380826, 627189942]
6 133 171 10 1111 4 (3, 24, 172, 1158, 7409, 48729, 319861, 2097971, 13765538, 90315667)
1000 [12, 188, 1732, 15256, 121372, 945645, 7171532, 53399130, 392137968, 2846810288]
7 247 371 10 1010 5 (20, 115, 694, 4816, 32027, 210909, 1392866, 9223171, 61013236)
1101 [168, 1232, 9120, 78715, 626483, 4758850, 35623239, 263865149, 1930228800]
8 561 753 12 1101 5 (7, 49, 351, 2259, 14749, 99602, 663936, 4431049, 29536078, 197041141)
1010 [31, 469, 4205, 34011, 268650, 2113955, 16118309, 121208809, 898282660, 2301585211]
very good indeed, even though their free distances, which that is hardly more complex than for a rate-1/2 code makes
are equal to 6 and 5, respectively, are slightly smaller than the punctured coding technique very attractive indeed for
the free distances of the best-known usual rate-2/3 and 3/4 short-memory codes. For longer codes and larger coding
codes, which are equal to 7 and 6, respectively [11]. gains, Viterbi decoding becomes impractical and other
The error performance of these codes has been decoding techniques such as sequential decoding should
verified using an actual punctured Viterbi codec [7,8], be considered instead. Long-memory length punctured
and independently, using computer simulation [21–23]. codes and their error performance are presented next.
Both evaluations have been performed using 8-level soft
decision Viterbi decoding with truncation path lengths 4.2. Long-Memory Punctured Codes
equal to 50, 56, 96, and 240 bits for the coding rates 2/3, 3/4, Following essentially the same approach as for the
7/8, and 15/16, respectively. Both hardware and software short-memory codes, one could choose a known optimal
evaluations have yielded identical error performances that long-memory code of rate 1/2 and exhaustively try out
closely match the theoretical upper bounds. all possible perforation patterns to generate all punctured
Even for the rate 15/16, the coding gain of the M = 6 codes of rate R = b/v. The selection of the best punctured
code has been shown to reach a substantial 3 dB at code is again based on its bit error performance, which
PB = 10−6 [3]. The fact that such a coding gain can be is calculated from the series expansion of its transfer
achieved with only a 7% redundancy and a Viterbi decoder function. Here, one of the difficulties is that for the original
986 HIGH-RATE PUNCTURED CONVOLUTIONAL CODES
Table 4. Weight Spectra of Yasuda et al. Punctured Codes with R = 5/6, 2 ≤ M ≤ 8, [21]
Original Code Punctured Code
(an , n = df , df +1 , df +2 , . . .)
M G1 G2 df [P] df [cn , n = df , df +1 , df +2 , . . .]
2 5 7 5 10111 2 (2, 26, 129, 633, 3316, 17194, 88800, 459295, 2375897, 12288610)
11000 [2, 111, 974, 6857, 45555, 288020, 1758617, 10487425, 61445892, 355061333]
3 15 17 6 10100 3 (15, 96, 601, 3918, 25391, 164481, 1065835, 6906182, 44749517)
11011 [63, 697, 6367, 53574, 426471, 3277878, 24573195, 180823448, 1311630186]
4 23 35 7 10111 3 (5, 37, 309, 2282, 16614, 122308, 900991, 6634698, 48853474)
11000 [20, 265, 3248, 32328, 297825, 2638257, 22710170, 191432589, 1587788458]
5 53 75 8 10000 4 (19, 171, 1251, 9573, 75167, 585675, 4558463, 35513472)
11111 [100, 1592, 17441, 166331, 1591841, 14627480, 131090525, 1155743839]
6 133 171 10 11010 4 (14, 69, 654, 4996, 39699, 315371, 2507890, 19921920, 158275483)
10101 [92, 528, 8694, 79453, 792114, 7375573, 67884974, 610875423, 1132308080]
7 247 371 10 11100 4 (2, 51, 415, 3044, 25530, 200878, 1628427, 12995292, 104837990)
10011 [7, 426, 5244, 49920, 514857, 4779338, 44929071, 406470311, 3672580016]
8 561 753 12 10110 5 (19, 187, 1499, 11809, 95407, 775775, 6281882, 50851245)
11001 [168, 2469, 25174, 242850, 2320429, 21768364, 199755735, 1807353406]
Table 5. Weight Spectra of Yasuda et al. Punctured Codes with R = 6/7, 2 ≤ M ≤ 8, [21]
Original Code Punctured Code
(an , n = df , df +1 , df +2 , . . .)
M G1 G2 df [P] df [cn , n = df , df +1 , df +2 , . . .]
2 5 7 5 101111 2 (4, 39, 221, 1330, 8190, 49754, 302405, 1840129, 11194714, 68101647)
110000 [5, 186, 1942, 16642, 131415, 981578, 7076932, 49784878, 343825123, 2340813323]
3 15 17 6 100011 2 (1, 25, 188, 1416, 10757, 81593, 619023, 4697330, 35643844)
111100 [2, 134, 1696, 18284, 179989, 1676667, 15082912, 132368246, 1140378555]
4 23 35 7 101010 3 (14, 100, 828, 7198, 60847, 513573, 4344769, 36751720)
110101 [69, 779, 9770, 113537, 1203746, 12217198, 120704682, 1167799637]
5 53 75 8 110110 3 (5, 55, 517, 4523, 40476, 362074, 3232848, 28872572)
101001 [25, 475, 6302, 73704, 823440, 8816634, 91722717, 935227325]
6 133 171 10 111010 3 (1, 20, 223, 1961, 18093, 169175, 1576108, 14656816, 136394365)
100101 [5, 169, 2725, 32233, 370861, 4169788, 45417406, 483171499, 768072194]
7 247 371 10 101001 4 (11, 155, 1399, 13018, 122560, 1154067, 10875198, 102494819, 965649475)
110110 [85, 1979, 24038, 282998, 3224456, 35514447, 383469825, 4075982541, 4092715598]
8 561 753 12 110110 4 (2, 48, 427, 4153, 39645, 377500, 3600650, 34334182)
101001 [9, 447, 5954, 76660, 912140, 10399543, 115459173, 1256388707]
low-rate and long-memory codes of interest, only very comparison of the resulting punctured codes with the best-
partial knowledge of their weight spectra is available. known nonsystematic high-rate codes is limited to the rate-
In fact, beyond memory length M = 15, very often only 2/3 and 3/4 codes, since with very few exceptions, optimal
the free distances of these codes are available in the long memory codes suitable for sequential decoding are
literature [12]. known only for rates 2/3 and 3/4.
Results of computer search for the best rate-2/3 and Tables 7 and 8 list the characteristics of the best
3/4 punctured codes with memory lengths ranging from punctured codes of rate 2/3 and 3/4, respectively, with
9 to 23 that are derived from the best-known rate-1/2 memory lengths M varying from 9 to 23, that are derived
are provided in Refs. 21–23, where for each code the first from the best nonsystematic rate-1/2 codes [21]. From
few terms of the weight spectrum have been obtained M = 9 to M = 13 the original codes are the maximal
for each possible distinct perforation pattern. The final free distance codes discovered by Larsen [19], whereas for
selection of the best punctured codes was based on the 14 ≤ M ≤ 23 the original codes are those of Johannesson
evaluation of the upper bound on the bit error probability. and Paaske [12]. In both of these tables, for each
Naturally, the codes obtained with this approach are memory length the generators of the original code and
suitable for variable-rate codecs using an appropriate its perforation matrix are given, together with the free
decoding technique such as sequential decoding. Only distances of both the original and resulting punctured
these two rates have been considered because a definite code. Just as with short-memory codes, the first few terms
HIGH-RATE PUNCTURED CONVOLUTIONAL CODES 987
Table 6. Weight Spectra of Yasuda et al. Punctured Codes with R = 7/8, 2 ≤ M ≤ 8, [21]
Original Code Punctured Code
(an , n = df , df +1 , df +2 , . . .)
M G1 G2 df [P] df [cn , n = df , df +1 , df +2 , . . .]
2 5 7 5 1011111 2 (6, 66, 408, 2636, 17844, 119144, 793483, 5293846, 35318216)
1100000 [8, 393, 4248, 38142, 325739, 2647528, 20794494, 159653495, 1204812440]
3 15 17 6 1000010 2 (2, 38, 346, 2772, 23958, 201842, 1717289, 14547758, 123478503)
1111101 [4, 219, 3456, 38973, 437072, 4492304, 45303102, 442940668, 4265246076]
4 23 35 7 1010011 3 (13, 145, 1471, 14473, 143110, 1416407, 14019214, 138760394)
1101100 [49, 1414, 21358, 284324, 3544716, 42278392, 489726840, 1257797047]
5 53 75 8 1011101 3 (9, 122, 1195, 12139, 123889, 1259682, 12834712, 130730637, 1331513258)
1100010 [60, 1360, 18971, 252751, 3165885, 38226720, 450898174, 923001734, 3683554219]
6 133 171 10 1111010 3 (2, 46, 499, 5291, 56179, 599557, 6387194, 68117821)
1000101 [9, 500, 7437, 105707, 1402743, 17909268, 222292299, 2706822556]
7 247 371 10 1010100 4 (26, 264, 2732, 30389, 328927, 3571607, 38799203)
1101011 [258, 3652, 52824, 746564, 9825110, 125472545, 1567656165]
8 561 753 12 1101011 4 (6, 132, 1289, 13986, 154839, 1694634, 18532566)
1010100 [70, 1842, 24096, 337514, 4548454, 58634237, 738611595]
an and cn , n = dfree , dfree + 1, dfree + 2, . . . of the weight equivalences of the perforation patterns under cyclical
spectra are also given for each punctured codes. In deriving shifts of their rows [21–23]. Among all the codes that were
these spectral coefficients, the tree exploration of the found, Tables 7 and 8 list only those having the smallest
original code had to be performed over a considerable number of bit errors cdfree at the free distance dfree , and
length [21]. obviously all the codes listed are noncatastrophic.
In the search for the best punctured codes, the Table 7 lists two M = 19, R = 2/3 punctured codes
perforation patterns were chosen as to yield both a derived from two distinct good M = 19, R = 1/2 original
maximal free distance and a good distance profile. codes. The free distances of these two punctured codes are
Although all perforation patterns were exhaustively equal to 12 and 13, respectively, but the coefficients cn are
examined, the search was somewhat reduced by exploiting larger for the dfree = 13 code than they are for the dfree = 12
code. Consequently, as confirmed by the calculation of the
bit error bound PB , the code with dfree = 13 turned out to
be slightly worse by approximately 0.35 dB than the code
Bit error performance bounds with dfree = 12. This anomaly again confirms the need to
unquantized channel
M = 6 convolutional codes
determine at least the first few terms of the weight spectra
when searching for the best punctured codes.
10−3
Figure 5 plots the free distances of the original rate
Uncoded PSK 1/2 codes and the punctured rate-2/3 and 3/4 codes as
a function of the memory length M, 2 ≤ M ≤ 22. Except
for the two anomalies with the M = 19 mentioned above,
10−4 the behavior of the free distances is as expected: the free
distance of the punctured codes of a given rate is generally
nondecreasing with the memory length, and for a given
10−5 memory length the free distance decreases with increasing
coding rates.
PB
(an , n = df , df +1 , df +2 , . . .)
M G1 G2 df [P] df [cn , n = df , df +1 , df +2 , . . .]
The upper bounds on the bit error probability the code should also be based on the distance profile
over unquantized white Gaussian channels have been and computational performance. Short of analyzing the
evaluated for all the punctured codes of rates R = 2/3 and computational behavior, when in doubt, the codes finally
3/4 and are shown for even values of memory lengths in selected and listed in the tables had the fastest-growing
Figs. 6 and 7. These bounds indicate a normal behavior column distance function.
for all the punctured codes listed in Tables 7 and 8. In the search for punctured codes, the above approach
All the bit error performances improve as the coding will produce good but not necessarily optimal codes
rate decreases and/or as the memory increases, with since the original low-rate code is imposed at the
approximately 0.4 dB improvement for each unit increase outset. A measure of the discrepancy between optimal
of the memory length. At PB = 10−5 , the M = 22, R = 2/3, and punctured codes of the same rate and memory
and R = 3/4 punctured codes can yield substantial coding length is provided in Fig. 8, which shows the bit error
gains of 8.3 dB and 7.7 dB, respectively. performance bound for both the best punctured and
The selection of the best punctured codes listed in maximal free distance codes of memory length 9 and
Tables 7 and 8 has been based on both the maximal rates 2/3 and 3/4. These bounds have been computed
free distance and the calculated bit error probability using only the term at dfree for the maximum free
bound. However, the choice of the best punctured code distance (MFD) codes and using both the terms at dfree
is not always clear-cut as different puncturing patterns and dfree + 1 for the punctured codes. Based on these
may yield only marginally different error performances. terms only, the two MFD codes appear to be only
In some cases, the performance curves may even be slightly better than the punctured codes. Therefore, it
undistinguishable, leading to several ‘‘best’’ punctured may be concluded that, although not optimal, the error
codes having the same coding rate and memory length. performances of the rate-2/3 and 3/4 punctured codes
However, since these long codes are typically for of memory 9 closely match those of the MFD codes of
sequential decoding applications, the final selection of the same rates and memory lengths. The same general
HIGH-RATE PUNCTURED CONVOLUTIONAL CODES 989
(an , n = df , df +1 , df +2 , . . .)
M G1 G2 df [P] df [cn , n = df , df +1 , df +2 , . . .]
conclusions may be made about the slight suboptimality Free distance of punctured codes
of the other punctured codes with different memory 24
Original
lengths [21–23]. However, as mentioned earlier, the 21 R = 1/2
small error performance degradation of the punctured
codes is compensated for by a far simpler practical 18
Upper bound Punctured
implementation. R = 2/3
15 R = 2/3
Given an optimal usual high-rate code of rate R = b/v
Dfree
Upper bound
and memory length M, one could attempt to determine 12 R = 3/4
the low-rate 1/v code that, after puncturing, will yield
9
a punctured code that is equivalent to that optimal Punctured
code. This approach, which is the converse of the 6 R = 3/4
usual code searching method, can be used to find the
3
punctured code equivalent to any known usual high-rate 2 4 6 8 10 12 14 16 18 20 22
code. Based on this approach and using the notion of M
orthogonal perforation patterns, a systematic construction Figure 5. Free distances of original rate-1/2 codes and punctured
technique has been developed by Begin and Haccoun [22]. R = 2/3 and 3/4 codes derived from them as a function of M,
Using this technique, punctured codes strictly equivalent 2 ≤ M ≤ 22, and upper bounds, [21].
to the best-known nonsystematic rate-2/3 codes with
memory lengths up to M = 24 have been found. Likewise, be pointed out that the punctured codes generated by
punctured codes equivalent to the best-known rate-3/4 this latter approach are not suitable for variable-rate
codes with memory lengths up to M = 9 have also applications since each punctured code has its own
been determined [22]. Therefore, optimal high-rate codes distinct low-rate original code and orthogonal perforation
may be obtained as punctured codes, but it should pattern.
990 HIGH-RATE PUNCTURED CONVOLUTIONAL CODES
10−4 10−4
M=2
10−5
PB
10−5
PB
M=2 M=4
10−6 10−6
10−7
10−7 M=4 M=6
M = 20 M = 14
M = 22 M = 18 M = 14 M = 12 M=8 M = 22 M = 16
M = 10 M=6 M = 18 M = 12 M = 10 M = 8
M = 20 M = 16 10−8
10−8 2 3 4 5 6 7
2 3 4 5 6 7
Eb/NO (dB)
Eb/NO (dB)
Figure 7. Upper bounds on the bit error probability over unquan-
Figure 6. Upper bounds on the bit error probability over unquan- tized white Gaussian noise channels for R = 3/4 punctured codes
tized white Gaussian noise channels for R = 2/3 punctured codes with 2 ≤ M ≤ 22, M even, [21].
with 2 ≤ M ≤ 22, M even, [21].
code having both the same rate and same weight spectrum codes
as the best-known usual code [22]. These codes are
listed in Tables 9 to 12. Tables 9 to 11 use the same 10−6
Max. free dist.
orthogonal perforation matrix P01 whereas Table 12 uses
the orthogonal perforation matrix P02 : Punctured
10−7
1 0 0
1 0 0
1 0
P01 = 0 1 P02 = (13)
0 0 1
0 1 10−8
0 0 1 3 4 5 6
Eb/NO (dB)
The R = 2/3 codes are obtained by puncturing R = Figure 8. Bit error performance bounds for maximal free
1/3 original codes, and the R = 3/4 punctured codes distance (MFD) codes and punctured codes of rates 2/3 and 3/4
are obtained from R = 1/4 original codes. Tables 9 to with memory M = 9, [21].
12 provide the memory lengths and generators of the
original low-rate codes and the resulting punctured codes difference in memory length is quite small, usually one or
equivalent to their respective best usual codes. two units, and this difference appears to be independent
We can observe from the tables that, for most of the of the actual memory length of the code. Therefore, its
cases, the memory of the required original code is larger relative importance decreases as the memory length of
than that of the resultant equivalent punctured code. The the code increases. For large memory codes, this memory
HIGH-RATE PUNCTURED CONVOLUTIONAL CODES 991
increment is of no consequence whatsoever since the Table 9. Original Codes that Yield the R = 2/3 Codes
codes certainly would be decoded by sequential decoding of Johannesson and Paaske, Perforation Pattern
methods. P01 , [22]
As mentioned earlier, the punctured codes listed in Original R = 1/3 Punctured R = 2/3
Tables 9 to 12 are not suitable for variable-rate or
rate-compatible decoding applications. However, they do G11 G21 G31
provide a practical method for coding and decoding the M0 G1 G2 G3 M G12 G22 G32
best-known long memory rate 2/3 and 3/4 codes, especially 4 26 22 35 3 6 2 4
using sequential decoding. Decoding these codes by the 1 4 7
normal (nonpunctured) approach is cumbersome because 5 54 47 67 4 6 3 7
of the large number (2b ) of nodes stemming from each 1 5 5
node in the tree of the high-rate R = b/v code. With the 6 172 137 152 5 14 06 16
punctured approach, the decoding proceeds on the low- 07 17 10
rate tree structure, so that the number of nodes stemming
7 314 271 317 6 12 05 13
from a single node is always 2, regardless of the actual 05 16 13
coding rate.
8 424 455 747 7 26 14 32
00 23 33
5.2. Systematic Codes 9 1634 1233 1431 8 32 05 25
13 33 22
Using a similar procedure, all high-rate systematic
codes, whether known or yet to be discovered, may be 10 3162 2553 3612 9 54 16 66
obtained by puncturing some low-rate original (system- 25 71 60
atic) code. Furthermore, since for any R = b/(b + 1) target 11 6732 4617 7153 10 53 23 51
code the branches of the high-rate code consist of b infor- 36 53 67
mation digits and a single parity symbol, the original code 12 17444 11051 17457 11 162 054 156
need only be a R = 1/2 systematic code: the b information 064 101 163
digits are directly issued from the information digits of the
Table 10. Original Codes that Yield the R = 2/3 Codes of Johannesson
and Paaske, Perforation Pattern P01 , [22]
Original R = 1/3 Punctured R = 2/3
Table 11. Original Codes that Yield the R = 2/3 Codes of Table 13. Systematic Punctured Codes that
Johannesson and Paaske, Perforation Pattern P01 , [22] Duplicate the Codes of Hagenauer. G2 is Given in
Octal [22]
Original R = 1/3 Punctured R = 2/3
R P G2
G11 G21 G31
M0 G1 G2 G3 M G12 G22 G32 2/3 11 33275606556377737
01
3 16 11 15 2 3 1 3 3/4 111 756730246717030774725
1 2 2 001
4 22 22 37 3 4 2 6 4/5 1111 7475464466521133456725475223
1 4 7 0001
6 72 43 72 4 7 1 4 5/6 11111 17175113117122772233670106777
2 5 7 00001
7/8 1111111 1773634453774014541375437553121
7 132 112 177 5 14 06 16 0000001
03 10 17
8 362 266 373 6 15 06 15
06 15 17
9 552 457 736 7 30 16 26 allows once again their easy and practical decoding by
07 23 36 any sequential decoding algorithm. However, just like
10 2146 2512 3355 9 52 06 74 the nonsystematic codes, the systematic punctured codes
05 70 53 found here are optimal but do not readily lend themselves
to variable-rate or rate-compatible decoding.
11 7432 5163 7026 10 63 15 46
32 65 61
6. CONCLUSION
Table 12. Original Codes that Yield the R = 3/4 Codes of In this article, we have presented the basic notions,
Johannesson and Paaske, Perforation Pattern P02 , [22] properties and error performances of high-rate punctured
convolutional codes. These high-rate codes, which are
Original R = 1/4 Punctured R = 3/4
derived from well-known optimal low-rate codes and a
G11 G21 G31 G41 judicious choice of the perforation patterns, are no more
G12 G22 G32 G42 complex to encode and decode than low-rate codes, yielding
M0 G1 G2 G3 G4 M G13 G23 G33 G43 easy implementations of high coding rate codecs as well
6 100 170 125 161 3 4 4 4 4
as variable-rate and rate-compatible coding and decoding.
0 6 2 4 Extensive lists of both short- and long-memory length
0 2 5 5 punctured codes have been provided together with up to
7 224 270 206 357 5 6 2 2 6
the first 10 terms of their weight spectra and their bit error
1 6 0 7 probability bounds. The substantial advantages of using
0 2 5 5 high-rate punctured codes over the usual high-rate codes
8 750 512 446 731 6 6 1 0 7 open the way for powerful, versatile, and yet practical
3 4 1 6 implementations of variable-rate codecs extending from
2 3 7 4 very low to very high coding rates.
10 2274 2170 3262 3411 8 16 06 04 10
03 12 00 13
BIOGRAPHY
01 02 17 10
11 6230 4426 4711 7724 9 10 03 07 14
01 15 04 16
David Haccoun received the Engineer and B.Sc.Ap.
07 00 14 15 degrees (Magna Cum Laude) in engineering physics
from École Polytechnique de Montréal, Canada, in
1965; the S.M. degree in electrical engineering from
the Massachusetts Institute of Technology, Cambridge,
original code and the single parity symbol is obtained by Massachusetts, in 1966; and the Ph.D. degree in electrical
puncturing all but one of the b remaining code symbols engineering from McGill University, Montréal, Canada, in
at the output of the original encoder. Equivalently, the 1974.
perforation matrix has two rows, whereby the first one is Since 1966 he has been with the Department of Electri-
filled with 1s and the second one is filled with 0s except at cal Engineering, École Polytechnique de Montréal, where
one position. he has been Professor of Electrical Engineering since 1980
Using this procedure, Haccoun and Begin [22] have and was the (founding) Head of the Communication and
obtained the original codes and associated perforation Computer Section from 1980 to 1996. He was a Research
matrices allowing to duplicate all the systematic codes Visiting Professor at several universities in Canada and in
of Hagenauer [24]. These codes are listed in Table 13. France. Dr. Haccoun is involved in teaching at both under-
The possibility of generating these codes by perforation graduate and graduate levels, conducting research, and
HIGH-SPEED PHOTODETECTORS FOR OPTICAL COMMUNICATIONS 993
performing consulting work for both government agencies 17. P. Montreuil, Algorithmes de determination de spectres des
and industries. codes convolutionnels, M.Sc.A. thesis, Dep. Elect. Eng., Ecole
His current research interests include the theory and Polytechnique de Montreal, 1987.
applications of error-control coding, mobile and personal 18. D. Haccoun and P. Montreuil, Weight spectrum determina-
communications, and digital communications systems by tion of convolutional codes, to be submitted to IEEE Trans.
satellite. He is the author or coauthor of a large number of Commun., Book of Abstracts, 1988 Int. Symp. Inform. Theory,
journal and conference papers in these areas. He holds a Kobe, Japan, June 1988, 49–50.
patent on an error control technique and is a coauthor of 19. K. Larsen, Short convolutional codes with maximal free
the books The communications Handbook (CRC press and distance for rates 1/2, 1/3, and 1/4, IEEE Trans. Inform.
IEEE Press, 1997 and 2001), and Digital Communications Theory IT-19: 371–372 (1973).
by Satellite: Modulation, Multiple-Access and Coding (New 20. G. Begin and D. Haccoun, Performance of sequential decoding
York : Wiley, 1981). A Japanese translation of that book of high rate punctured convolutional codes, IEEE Trans.
was published in 1984. Commun. 42: 996–978 (1994).
21. D. Haccoun and G. Begin, High-rate punctured convolutional
BIBLIOGRAPHY codes for Viterbi and sequential decoding, IEEE Trans.
Commun. 37: 1113–1125 (1989).
1. A. J. Viterbi, Convolutional codes and their performance in 22. G. Begin and D. Haccoun, High-rate punctured convolutional
communications systems, IEEE Trans. Commun. Technol. codes: structure properties and construction technique, IEEE
COM-19 (1971). Trans. Commun. 37: 1381–1385 (1989).
2. I. M. Jacobs, Practical applications of coding, IEEE Trans. 23. G. Begin, D. Haccoun, and C. Paquin, Further results on
Inform. Theory IT-20: 305–310 (1974). high-rate punctures convolutional codes for Viterbi and
3. W. W. Wu, D. Haccoun, R. Peile, and Y. Hirata, Coding for sequential decoding, IEEE Trans. Commun. 38: 1922–1928
satellite communication, IEEE J. Select. Areas. Commun. (1990).
SAC-5: 724–748 (1987). 24. J. Hagenauer, High rate convolutional codes with good
4. C. Berrou, A. Glavieux, and P. Thitimasjhima, Near Shannon profiles, IEEE Trans. Inform. Theory IT-23: 615–618 (1977).
Limit Error Correcting Coding and Decoding: Turbo Codes,
Proceedings of ICC’93, pp. 1064–1070 (1993).
5. D. Divsalar and F. Pollara, Turbo codes for deep-space HIGH-SPEED PHOTODETECTORS FOR OPTICAL
communications, TDA Progress Report 42–120: 29–39 (1995). COMMUNICATIONS
6. J. B. Cain, G. C. Clark, and J. Geist, Punctured convolutional
codes of rate (n − 1)/n and simplified maximum likelihood M. SELIM ÜNLÜ
decoding, IEEE Trans. Inform. Theory IT-25: 97–100 (1979).
OLUFEMI DOSUNMU
7. Y. Yasuda, Y. Hirata, K. Nakamura, and S. Otani, Develop- MATTHEW EMSLEY
ment of a variable-rate Viterbi decoder and its performance Boston University
characteristics, 6th Int. Conf. Digital Satellite Commun., Boston, Massachusetts
Phoenix, Arizona, Sept. 1983.
8. Y. Yasuda, K. Kashiki, and Y. Hirata, High-rate punctured
convolutional codes for soft decision Viterbi decoding, IEEE 1. INTRODUCTION
Trans. Commun. COM-32: 315–319 (1984).
9. J. Hagenauer, Rate compatible punctured convolutional codes
The capability to detect, quantify, and analyze an optical
and their applications, IEEE Trans. Commun. 36: 389–400 signal is the first requirement of any optical system.
(1988). In optical communication systems, the detector is a
crucial element whose function is to convert the optical
10. R. M. Fano, A heuristic discussion of probabilistic decoding,
IEEE Trans. Inform. Theory IT-9 (1962). signal at the receiver end into an electrical signal,
which is then amplified and processed to extract the
11. E. Paaske, Short binary convolutional codes with maximal
information content. The performance characteristics of
free distance for rates 2/3 and 3/4, IEEE Trans. Inform.
the photodetector, therefore, determines the requirements
Theory IT-20: 683–686 (1974).
for the received optical power and dictates the overall
12. R. Johannesson and E. Paaske, Further results on binary system performance along with other system parameters
convolutional codes with an optimum distance profile, IEEE
such as the allowed optical attenuation, and thus the
Trans. Inform. Theory IT-24: 264–268 (1978).
length of the transmission channel.
13. F. Jelinek, A fast sequential decoding algorithm using a stack, Progress in optical communications and processing
IBM J. Res. Develop. 13: 675–685 (1969). requires simultaneous development in light sources, inter-
14. K. Zigangirov, Some sequential decoding procedures, Prob- connects, and photodetectors. While optical fibers have
lemii Peradachi Informatsii 2: 13–15 (1966). been developed for nearly ideal links for optical signal
15. D. Haccoun and M. J. Ferguson, Generalized stack algo- transmission, semiconductors have become the material
rithms for decoding convolutional codes, IEEE Trans. Inform. of choice for optical sources and detectors, primarily
Theory IT-21: 638–651 (1975). because of their well-established technology, fast electrical
16. J. P. Odenwalder, Optimal decoding of convolutional codes, response, and optical generation and absorption proper-
Ph.D. dissertation, Dept. Elect. Eng., U.C.L.A., Los Angeles, ties. Many years of research on semiconductor devices have
1970. led to the development of high-performance photodetectors
994 HIGH-SPEED PHOTODETECTORS FOR OPTICAL COMMUNICATIONS
in all of the relevant wavelengths throughout the visible 1.2. Basic Performance Requirements
and near-infrared spectra. The choice of wavelengths in
Before we continue with specific photodetector structures,
optical communications is driven by limitations relating we must first discuss some of the performance require-
to the availability of suitable transmission media and, to ments. Since the relevant wavelengths for optical commu-
a lesser extent, light-emitting devices (LEDs). In general, nications are in the near-infrared region of the spectrum,
photodetectors are not the limiting factors. As a result, both external and internal photoemission of electrons can
discussion of photodetectors is usually limited to one or be utilized to convert the optical signal to electrical signal.
two chapters in books on optoelectronic devices [1–4] or External photoemission devices such as vacuum tubes
optical communications [5,6] with only a few dedicated not only are very bulky but also require high voltages,
books on photodetectors [7–9]. and therefore cannot be used for optical communication
applications. Internal photoemission devices, especially
1.1. Optical Communication Wavelengths semiconductor photodiodes, provide high performance in
compact and relatively inexpensive structures. Semicon-
The early development work on optical fiber waveguides ductor photodetectors are made in a variety of materials
focused on the 0.8–0.9-µm wavelength range because the such as silicon, germanium and alloys of III–V compounds,
first semiconductor optical sources, based in GaAs/AlAs and satisfy many of the important performance and com-
alloys, operated in this region. As silica fibers were fur- patibility requirements:
ther refined, however, it was discovered that transmission
at longer wavelengths (1.3–1.6 µm) would result in lower 1. Compact size for efficient coupling with fibers and
losses [6]. The rapid development of long wavelength fibers easy packaging
has led to attenuation values as low as 0.2 dB/km, which 2. High responsivity (efficiency) to produce a maxi-
is very close to the theoretical limit for silicate glass mum electrical signal for given optical power
fiber [10]. In addition to the intrinsic absorption due to 3. Wide spectral coverage through the use of different
electronic transitions at short wavelengths (high photon materials to allow for photodetectors in all
energies) and interaction of photons with molecular vibra- communication wavelengths
tions within the glass at long (infrared) wavelengths,
4. Short response time to operate at high bit rates
absorption due to impurities in the silica host, most
(wide bandwidth)
notably water incorporated into the glass as hydroxyl or
OH ions, limits the transmission properties of the optical 5. Low operating (bias) voltage to be compatible with
fiber. Furthermore, Rayleigh scattering resulting from the electronic circuits
unavoidable inhomogeneities in glass density, manifesting 6. Low noise operation to minimize the received power
themselves as subwavelength refractive-index fluctua- requirements
tions, represents the dominant loss mechanism at short 7. Low cost to reduce the overall cost of the
wavelengths. The overall typical silica fiber loss has the communication link
well-known form given in Fig. 1. Two of the important com- 8. Reliability (long mean time to failure) to prevent
munication wavelength regions are clearly identifiable as failure of the system
1.3 and 1.55 µm as a direct result of the fiber attenuation 9. Stability of performance characteristics working
characteristics. The third wavelength region we will con- within a variety of ambient conditions, especially
sider is at 0.85 µm, due to the advent of GaAs-based light over a wide range of temperatures
emitters and detectors.
10. Good uniformity of performance parameters to
allow for batch production
100
2. FUNDAMENTAL PROPERTIES AND DEFINITIONS
Experimental OH absorption
Theoretical limit In this section, we describe various definitions relating
10 to the performance of photodetectors, including detection
Attenuation (dB/km)
2.1. Optical Absorption and Quantum Efficiency coefficient α, can be expressed as [11]
The first requirement of photodetection — absorption of
= (0) · (1 − R) · [1 − exp(−αL)] (2)
light — implies that photons have sufficient energy to
excite a charge carrier inside the semiconductor. The most where R is the power reflectivity at the incidence surface.
common absorption event results in the generation of The quantum efficiency (0 ≤ η ≤ 1) of a photodetector
an electron–hole pair in typical intrinsic photodetectors. is the probability that an incident photon will create or
In contrast, an extrinsic photodetector responds to excite a charge carrier that contributes to the detected
photons of energy less than the bandgap energy. In photocurrent. When many photons are present, we
these photodetectors, the absorption of photons result consider the ratio of the detected flux to the incident
in transitions to or from impurity levels, or between the flux of photons. Assuming that all photoexcited carriers
subband energies in quantum wells as is depicted in Fig. 2. contribute to the photocurrent, we obtain
As will be shown below, thin semiconductor devices for
high-speed operation will be required and, therefore, it is
η= (3)
crucial to have materials that absorb the incident light (0)
very quickly. The absorption of photons at a particular
η = (1 − R) · [1 − exp(−αL)] (4)
wavelength is dependent on the absorption coefficient α,
which is the measure of how fast the photon flux decays 2.2. Material Selection
in the material:
As can be deduced from the equations above, it is
d necessary to have a large α to realize high-efficiency
= −α ⇒ (x) = (0) exp(−αx) (1)
dx photodetectors. To capture most of the photons for
single-pass absorption, a semiconductor with a thickness
Therefore, the amount of photon flux that is absorbed of several absorption depth lengths (Labs = 1/α) is
in a material of thickness (or length) L and absorption needed. Figure 3 shows the wavelength dependence of
106 10−2
105 10−1
Penetration depth (µm)
Ge
Absorption (cm−1)
In Ga As
104 0.53 0.47 100
InP
103 101
Si
102 102
GaAs
101 103
400 600 800 1000 1200 1400 1600 1800
Wavelength (nm)
α and absorption (or penetration) depth for a variety photocurrent (in amperes) to the incident light (in watts):
of semiconductor materials. For very small α, Labs is
very large; this results not only in impractical detector photocurrent Ip
= = (A/W) (5)
structures but also in detectors with a necessarily slow incident optical power P0
response. At the other extreme, Labs is very small; that is,
all the radiation is absorbed in the immediate vicinity From here, a simple relationship between quantum
of the surface. For typical high-speed semiconductor efficiency and responsivity can be developed:
photodetectors, it is desirable to have an absorption qη
region between a fraction of a micron and a few {P0 = (0)hν and Ip = q ·
} ⇒ = (6)
hν
micrometers in length; specifically, α should be about
104 cm−1 . Therefore, the high-speed application of the most Responsivity () can also be expressed in terms of
commonly used detector material, silicon, is limited to wavelength (λ):
the visible wavelengths (0.4 µm < λ < 0.7 µm). While the
absorption spectrum of an indirect bandgap semiconductor c qλη η · λ(µ m)
ν= ⇒= (7)
like germanium covers all the optical communication λ hc 1.24
wavelengths, direct bandgap semiconductors such as
It should be noted that the responsivity is directly
III–V compounds are more commonly used for high-speed
proportional to η at a given wavelength. The ideal
applications. For example, GaAs has a cutoff wavelength
responsivity versus λ is illustrated in Fig. 5 together with
of λc = 0.87 µm and is ideal for optical communications
the response of a generic InGaAs photodiode showing
at 0.85 µm. At longer wavelengths, ternary compounds
long- and short-wavelength cutoff and deviation from the
such as InGa(Al)As and quaternary materials such as
ideal behavior.
InGa(Al)AsP and a variety of their combinations are used.
An important consideration when various semiconductor 2.4. Noise Performance and Detectivity
materials are used together to form heterostructures is
the lattice matching of the different constituents and The responsivity equation Ip = P0 suggests that the
the availability of a suitable substrate. Figure 4 shows transformation from optical power to photocurrent is
the lattice constants and bandgap energies of various deterministic or noise-free. In reality, even in an ideal pho-
compound semiconductors. For most of the photodetector toreceiver, two fundamental noise mechanisms [12] — shot
structures designed to operate at 1.3 and 1.55 µm noise [13] and thermal noise [14,15] — lead to fluctuations
wavelengths, compound semiconductors lattice-matched in the current even with constant optical input. The over-
to InP are used. all sensitivity of the photodetector is determined by these
random fluctuations of current that occur in the presence
2.3. Responsivity and absence of an incident optical signal. Various figures
In an analog photodetection system η is recast into a of merit have been developed to assess the noise perfor-
new variable, namely, responsivity, or the ratio of the mance for photodetectors. Although these are not always
2.5
GaP
AIAs
AISb
Bandgap energy (eV)
1.5
GaAs InP
GaSb
0.5
InAs
0
5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2
Lattice constant (angstroms)
Figure 4. Correlation between lattice constant and bandgap energy for various semiconduc-
tor materials.
HIGH-SPEED PHOTODETECTORS FOR OPTICAL COMMUNICATIONS 997
1.4
100%
1.2 90%
80%
1
70%
Responsivity (A/W)
0.8 60%
50%
0.6
40%
0.4 30%
20%
0.2
Q.E. = 10%
0
0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
Wavelength (µm)
Figure 5. Responsivity versus wavelength for ideal case (dashed) and a generic InGaAs
photodiode (solid).
appropriate for high-speed photodetectors for optical com- noise limit, the noise power scales with the total current
munications, it is instructive to discuss the most commonly (Ip + Id , photocurrent + dark current) and bandwidth
f :
used definitions.
The noise equivalent power (NEP) is the input optical σs2 = i2s (t) = 2q(Ip + Id )
f (12)
power that results in unity signal-to-noise ratio (SNR = 1).
The detectivity D is defined as the inverse of NEP where q is the electronic charge. For high-gain detectors
(D = 1/NEP), while specific detectivity D∗ (D-star) incor- such as avalanche photodetectors, or at high incident
porates the area of the photodetector A and the bandwidth power limit, shot noise is much greater than thermal noise:
f of the signal:
Pin ηPin
∗ (A ×
f )1/2 SNR = = (13)
D = (8) 2q
f 2hv
f
NEP
This further neglects the dark current:
For thermal-noise-limited photodetectors (most semicon-
ductor detectors without internal gain), the noise power
Ip2 (Pin )2
(rms) is given by SNR = = (14)
σ2 2q(Pin + Id )
f
4kB T
σT2 = i2T (t) =
f (9) In this case, the SNR scales linearly with the input optical
RL
power and the NEP would not have the traditional units
where RL is the load resistance, T is temperature of W/Hz1/2 and is not traditionally used. Instead, one can
in degrees Kelvin and kB is the Boltzmann constant. calculate and refer to the power (or number of photons) in
Therefore, the NEP has the units of W/Hz1/2 as a ‘‘one’’ bit at given bit rate and SNR values.
shown below:
L
fRC = (20)
2π RL εA
e hn e
h h e h
Figure 11. Photodetector structures: (a) vertically illuminated (top or bottom); (b) vertically
illuminated with lateral carrier collections; (c) resonant cavity enhanced (RCE); (d) waveguide
photodetector (edge-illuminated).
HIGH-SPEED PHOTODETECTORS FOR OPTICAL COMMUNICATIONS 1001
optimization of both the bandwidth and efficiency. A and n regions, it is sufficient to say that the field is at a
further refinement of the WGPD is the traveling-wave maximum across the entire depletion length. Therefore,
photodetector (TWPD), in which the optical signal and the mean electric field is twice that of a conventional
resulting electrical signal travel at the same speed, PN junction and, correspondingly, the transit time is cut
overcoming the capacitance limitations [20]. in half [7], assuming the carriers have not reached their
saturation velocities:
4.2. PIN Detectors
L2
PN junctions are formed simply by bringing into τd ≈ (25)
4Vµ
contact n- (donors) and p-doped (acceptors) semiconductor
regions [21]. When a junction between these two regions
An added benefit of the PIN photodiode is that one does
is formed, diffusion of the electrons from the n region and
not need to apply an external bias to deplete the intrinsic
holes from the p region will cause a charge imbalance
region, as it is usually fully depleted by the built-in
to form as a result of the uncompensated ions that are
potential. Also, having L much greater in length than
left behind. This charge imbalance will create an opposing
the neutral regions results in a greatly reduced diffusion
electric field that will prevent further diffusion of electrons
contribution to the photocurrent. PIN photodiodes can
and holes, resulting in a region depleted of carriers at the
also have very highly doped p and n regions, thereby
junction. The length of the depletion was stated earlier in
reducing access resistance as compared to the one-sided
Eq. (15).
PN junction photodiode.
As was described earlier, photons absorbed in the
An alternative approach to making the neutral regions
depletion region will create electron–hole pairs that are
thin is the use of heterojunctions, or junctions between
separated by the built-in or externally applied field and
two different semiconductors. As was discussed earlier,
drift across the depletion region, inducing current at the
the neutral regions can be made of materials that do not
contacts until reaching the depletion edge. Conversely,
absorb at the operating wavelength. Therefore, carriers
photons that are absorbed in the neutral regions can
will be photogenerated only within the depletion region,
contribute to the photocurrent if the minority carrier is
thereby eliminating any diffusion current. For example,
sufficiently close to the depletion region, or if the diffusion
InGaAs-InP heterojunction PIN photodiodes operating at
length is long enough that recombination does not occur
speeds in excess of 100 GHz have been reported [23,24].
before it reaches the depletion region, where it will be
swept across while inducing current at the contacts. The
minority carrier moves by diffusion through the neutral 4.3. MSM Photodetectors
region, which is an inherently slow process. Therefore, Unlike most photodetectors used in high-speed opti-
great care must be taken to inhibit this process by making cal communications, metal–semiconductor–metal (MSM)
the highly doped P and N regions extremely thin. photodetectors are photoconductors. Photoconductors
In general, the depletion region is designed to be as operate by illuminating a biased semiconductor material
large as possible to increase the absorption of photons. layer, which results in the creation of electron–hole pairs
For PN junctions, applying a reverse bias increases the that raise the carrier concentration and, in turn, increase
depletion region. An alternative approach is a one-sided the conductivity as given by
abrupt junction, formed between a highly doped p(n)
region and lightly doped n(p) region. The depletion length
σ = q(µn n + µp p) (26)
is given by [22]
2εs (Vbi − V) where µn and µp are the electron and hole mobility,
L= (23)
qNB and n and p represent the electron and hole carrier
concentrations, respectively. This increase in carrier
where NB is the doping concentration on the lightly concentration and subsequent increase in conductivity
doped side. However, one-sided PN junctions suffer from results in an increase of the photocurrent given by [4]
high series resistance, which is detrimental to high-speed
photodiodes. Increasing the background doping, NB , in a qηGP
PN junction decreases the transit time τd for the carriers Iph = (27)
hν
by increasing the maximum field strength in the depletion
region. Since the field is triangular, however, the mean where q is the electron charge, G is the photoconductor
electric field will be half the maximum E field: gain, and P represents the optical input power. The
photoconductor gain G, which results from the difference
L2 between the transit time for the majority carrier and the
τd ≈ (24)
2Vµ recombination lifetime of the minority carrier, is given by
absorption area so that the carrier lifetime is reduced, without any absorption and will excite carriers within the
resulting in a reduction of the impulse response [9]. metal over the metal–semiconductor barrier.
MSMs are important devices because they can be One advantage that Schottky detectors have over their
easily monolithically integrated into FET-based circuit PIN counterparts is their reduced contact resistance,
technologies. MSMs, for example, have been demonstrated resulting in a faster response. For example, Schottky
on InGaAs with bandwidths in excess of 40 GHz operating detectors with measured 3-dB bandwidths well above
at 1.3 µm [25], and it has been shown that monolithic 100 GHz have been reported [9,27]. An additional advan-
integration with InP-based devices is possible [26]. tage is their simple material structure and relative ease in
fabrication [18]; however, detector illumination is a com-
4.4. Schottky Photodetectors mon design issue. Because metals and silicides are very
absorptive because of the effects of free-carrier absorption,
A Schottky detector consists of a conductor in contact one cannot design a traditional Schottky detector to be illu-
with a semiconductor, which results in a rectifying minated from the metal side, unless the metal layer is very
contact. As a result, a depletion region forms at thin such that reflection is minimized. Ignoring this issue
the conductor/semiconductor junction, entirely on the would result in a photodiode with poor quantum efficiency.
semiconductor side. The conductor can be metal or silicide,
while the semiconductor region is usually moderately 4.5. Avalanche Photodetectors
doped (∼1017 cm−3 ) to prevent tunneling through a thin
barrier. A generic energy band diagram of a Schottky Avalanche photodetectors are PN junctions with a
junction is illustrated in Fig. 12. large field applied across the depletion region so that
In a basic Schottky detector, the depletion length W and electron–hole pairs created in this region will have
semiconductor work function s are determined by the enough energy to cause impact ionization, and in turn
semiconductor doping, while the metal-to-semiconductor avalanche multiplication, while they are swept across the
barrier height b is set by the high density of surface depletion length. Under this condition, a carrier that is
states formed at the metal/semiconductor interface, where generated by photon absorption is accelerated across the
the Fermi level is ‘‘pinned’’ at a certain position, regardless depletion region, lifting it to a kinetic energy state that
of doping. The semiconductor-to-metal barrier height v , is large enough to ionize the neighboring crystal lattice,
on the other hand, can be controlled through doping, as is resulting in new electron–hole pair generation. Likewise,
described through the following equation [21]: these new pairs are accelerated and multiply with the
production of more electron–hole pairs, until finally these
v = m − s (29) carriers reach the contacts. This avalanche effect results
in a multiplication of the photogenerated carriers and,
in turn, an increase in the collected photocurrent. The
where m represents the metal work function.
multiplication factor M is used to refer to the total number
In the case where the photon energy is greater than
of pairs produced by the initial photoelectron. This factor
the semiconductor bandgap, the Schottky barrier behaves
represents the internal gain of the photodiode; as a result,
much like a one-sided PN junction; for photons incident
APDs can have quantum efficiencies greater than unity
on the metal side of the Schottky barrier, however,
the metal must be thin such that it is effectively 1
transparent. For energies less than the semiconductor M= (30)
1 − (V/VBR )r
bandgap, but greater than the barrier formed at the
metal–semiconductor junction (v < hν < Eg ), photons where VBR represents the breakdown voltage and r is a
incident on the semiconductor side will cross the region material dependent coefficient. Since APDs exhibit gain,
they are highly sensitive to incident light and therefore
are limited by shot noise, as a single incident photon has
the potential to create many carrier pairs. Also, there
is a finite time associated with the avalanche process,
resulting in a slower impulse response. Regardless,
Eo APDs for optical communications have been demonstrated
exhibiting internal gains in excess of 200 at 9 V reverse
q Φm
bias [28], as well as devices with BWE product of
q Φs
q Φb q Φv 90 GHz [29].
Young Investigator Awards in 1996. Dr. Ünlü has 6. J. M. Senior, Optical Fiber Communications, Prentice-Hall,
authored and co-authored more than 150 technical articles New York, 1992.
and several book chapters and magazine articles; edited 7. S. Donati, Photodetectors: Devices, Circuits, and Applications,
one book; holds one U.S. patent; and has several patents Prentice-Hall, Englewood Cliffs, NJ, 2000.
pending. During 1999–2001, he served as the chair of 8. E. L. Dereniak and G. D. Boreman, Infrared Detectors and
the IEEE/LEOS technical subcommittee on photodetectors Systems, Wiley, New York, 1996.
and imaging, and he is currently an associate editor for 9. H. S. Nalwa, ed., Photodetectors and Fiber-Optics, Academic
IEEE Journal of Quantum Electronics. Press, New York, 2001.
10. T. Miya, Y. Terunuma, T. Hosaka, and T. Miyashita, Ulti-
Olufemi Dosunmu was born in Bronx, New York, in
mate low-loss single-mode fibre at 1.55 µm, Electron. Lett.
1977. He graduated as Salutatorian from Lakewood High 15(4): 106–108 (1979).
School in Lakewood, New Jersey in 1995, and received both
11. T. P. Lee and T. Li, Photodetectors, in S. E. Miller and
his B.S. and M.S. degrees in Electrical Engineering with
A. G. Chynoweth, eds., Optical Fiber Telecommunications,
his thesis entitled Modeling and Simulation of Intrinsic
Academic Press, New York, 1979, pp. 593–626.
and Measured Response of High Speed Photodiodes, along
with a minor in Physics from Boston University in 12. D. K. C. MacDonald, Noise and Fluctuations in Electronic
Devices and Circuits, Oxford Univ. Press, Oxford, 1962.
May 1999. While at Boston University, he was awarded
the 4-year Trustee Scholarship in 1995, the Golden 13. W. Schottky, Ann. Phys. 57: 541 (1918).
Key National Honor Society award in 1998, as well as 14. J. B. Johnson, Phys. Rev. 32: 97 (1928).
the National Defense Science & Engineering Graduate 15. H. Nyquist, Phys. Rev. 32: 110 (1928).
(NDSEG) Fellowship in 2001. In 1997, he was a summer
intern at AT&T Labs in Red Bank, New Jersey and, 16. D. G. Parker, The theory, fabrication and assessment of ultra
high speed photodiodes, Gec. J. Res. 6(2): 106–117 (1988).
in 1998, at Princeton Plasma Physics Laboratories in
Princeton, New Jersey. Between 1999 and 2000, he was 17. S. M. Sze, Semiconductor Device Physics and Technology,
employed at Lucent Technologies in the area of ASIC Wiley, New York, 1985.
design for high-speed telecommunication systems. Mr. 18. M. Gökkavas et al., Design and optimization of high-speed
Dosunmu is expected to complete his PhD in January resonant cavity enhanced schottky photodiodes, IEEE J.
2004. Quant. Electron. 35(2): 208–215 (1999).
19. B. E. A. Saleh and M. C. Teich, Fundamentals of Photonics,
Matthew K. Emsley was born in 1975, in the northern Wiley, New York, 1991.
suburbs of Wilmington, Delaware. He graduated from 20. K. S. Giboney, M. J. W. Rodwell, and J. E. Bowers, Traveling-
Brandywine High School in 1993 and received his B.S. wave photodetectors, IEEE Photon. Technol. Lett. 4(12):
degree in Electrical Engineering from The Pennsylvania 1363–1365 (1992).
State University in December 1996, and a M.S. degree 21. B. G. Streetman, Solid State Electronic Devices, 4th ed.,
in Electrical Engineering from Boston University in May Prentice-Hall, Englewood Cliffs, NJ, 1995.
2000 for his thesis entitled Reflecting Silicon-on-Insulator
22. S. M. Sze, Physics of Semiconductor Devices, 2nd ed., Wiley,
(SOI) Substrates for Optoelectronic Applications. While at
New York, 1981.
Boston University Mr. Emsley was awarded the Electrical
and Computer Engineering Chair Fellowship in 1997 and 23. D. L. Crawford et al., High speed InGaAs-InP p-i-n photodi-
odes fabricated on a semi-insulating substrate, IEEE Photon.
the Outstanding Graduate Teaching Fellow award for
Technol. Lett. 2(9): 647–649 (1990).
1997/98. In 2001 Matthew was awarded a LEOS Travel
Award as well as the H.J. Berman ‘‘Future of Light’’ 24. Y. Wey et al., 108-GHz GaInAs/InP p-i-n photodiodes with
Prize in Photonics for his poster entitled Silicon Resonant- integrated bias tees and matched resistors, IEEE Photon.
Tech. Lett. 5(11): 1310–1312 (1993).
Cavity-Enhanced Photodetectors Using Reflecting Silicon
on Insulator Substrates at the annual Boston University 25. E. H. Böttcher et al., Ultra-wide-band (>40 GHz) submicron
Science Day. Mr. Emsley is expected to complete his Ph.D. InGaAs metal-semiconductor-metal photodetectors, IEEE
in May 2003. Photon. Technol. Lett. 8(9): 1226–1228 (1996).
26. J. B. D. Soole and H. Schumacher, InGaAs metal-semiconduc-
tor-metal photodetectors for long wavelength optical commu-
BIBLIOGRAPHY nications, IEEE J. Quantum Electron. 27(3): (1991).
27. E. Özbay, K. D. Li, and D. M. Bloom, 2.0 ps, 150 GHz GaAs
1. D. Wood, Optoelectronic Semiconductor Devices, Prentice-
monolithic photodiode and all-electronic sampler, IEEE
Hall, New York, 1994.
Photon. Technol. Lett. 3(6): 570–572 (1991).
2. J. Singh, Semiconductor Optoelectronics, McGraw Hill, New
28. R. Kuchibhotla et al., Low-voltage high-gain resonant-cavity
York, 1995.
avalanche photodiode, IEEE Photon. Technol. Lett. 3(4):
3. K. J. Ebeling, Integrated Opto-electronics, Springer-Verlag, 354–356 (1991).
New York, 1992.
29. H. Kuwatsuka et al., An Alx Ga1−x Sb avalanche photodiode
4. P. Bhattacharya, Semiconductor Optoelectronic Devices, 2nd with a gain bandwidth product of 90 GHz, IEEE Photon.
ed., Prentice-Hall, Englewood Cliffs, NJ, 1997. Technol. Lett. 2(1): 54–55 (1990).
5. G. P. Agrawal, Fiber-Optic Communication Systems, Wiley, 30. M. Gökkavas et al., High-speed high-efficiency large-area
New York, 1992. resonant cavity enhanced p-i-n photodiodes for multimode
1006 HORN ANTENNAS
EDWARD V. JULL b
University of British Columbia
Vancouver, British Columbia,
Canada
x2
ρ = (( cos α)2 + x2 )1/2 ≈ cos α + (1)
2 cos α
when x
cos α. Thus the phase variation in radians
across the aperture for small flare angles α is approx-
imately kx2 /(2), where k = 2π /λ is the propagation
r x constant. This quadratic phase variation increases with
a
increasing flare angle, thus reducing directivity increase
d due to the enlarged aperture dimension. It is convenient
0
to quantify aperture phase variation by the parameter
(1 − cos α) d2
s= ≈ ,d
(2)
λ 8λ
Figure 2. Effect of horn flare on the aperture field phase of a which is the approximate difference in wavelengths
horn. between the distance from the apex to the edge (x = d/2)
and the center (x = 0) of the aperture. The radiation
patterns of Fig. 3a,b [2] show the effect of increasing s
patterns, so when used as a feed for a reflector, there on the E- and H-plane radiation patterns of sectoral and
is substantial spillover, or radiation missing the reflector pyramidal horns. The main beam is broadened, the pattern
and radiation directly backward from the feed. To increase nulls are filled, and the sidelobe levels are raised over
the directivity of a radiating waveguide and its efficiency those for an in-phase aperture field (s = 0). With large
as a reflector feed, for example, its aperture dimensions flare angles radiation from the extremities of the aperture
must be enlarged, for the beamwidth of an aperture of can be so out of phase with that from the center that the
width a λ is proportional to λ/a radians. horn directivity decreases with increasing aperture width.
This waveguide enlargement by a flare characterizes The adverse effects of the flare can be compensated by
horns. The aperture fields of a horn are spherical waves a lens in the aperture (Fig. 4a), but because that adds to
originating at the horn apex (Fig. 2). The path from the the weight and cost and because bandwidth limitations
H
0.9 0.9 q
a
0.8 0.8
Absolute value of relative field strength
1 a2
s=
0.7 0.7 8l H
s =1
3/4
E
0.6 q 0.6
b
0.5 b2 0.5
Maximum phase deviation
s= =
8 l H in wavelengths
0.4 0.4 s=1
3/4
0.3 0.3
Clearly (5) and (6) differ significantly only for large angles is in space, then (5) is used for A(r, θ, φ), but this is not
θ off the beam axis. an accurate solution since the aperture dimensions are
If the aperture field is separable in the aperture not large. Rectangular waveguides mounted in conducting
coordinates — that is, if in (3), Ex (x, y, 0) = E0 E1 (x)E2 (y) planes use (6) for A(r, θ, φ) in (7), which then accurately
where E1 (x) and E2 (y) are field distributions normalized provides the far field. The pattern has a single broad
to E0 , the double integral is the product of two single lobe with no sidelobes. For large apertures plots of
integrals. the normalized E-plane (φ = 0) and H-plane (φ = π/2)
patterns of (7) appear in Fig. 3a,b for those of a horn
E(r, θ, φ) = A(r, θ, φ)E0 F1 (k1 )F2 (k2 ) (7) with no flare (s = 0), but without the factor (1 + cos θ )/2
from (5) or cos θ from (6).
where
a/2 4.2. Circular Waveguides
F1 (k1 ) = E1 (x)ejk1 y dx (8)
−(a/2)
The dominant TE11 mode field in circular waveguide
b/2
produces an aperture field distribution, which in the
F2 (k2 ) = E2 (y)ejk2 y dy (9) aperture coordinates ρ ,φ of Fig. 6b is
−(b/2)
J1 (kc ρ )
define the radiation field. E(ρ , φ ) = E0 ρ̂ cos φ
+ φ̂
J (k
1 c ρ
) sin φ
(13)
kc ρ
4. OPEN-ENDED WAVEGUIDES where J1 is the Bessel function of the first kind and order,
J1 is its derivative with respect to its argument kc ρ , and
4.1. Rectangular Waveguides kc a/2 = 1.841 is its first root; E0 is the electric field at
the aperture center (ρ = 0). Since (13) is not linearly
With the TE10 waveguide mode the aperture field polarized, its use in (3) provides only part of the total
radiated far field. The total field
πy
Ex (x, y, 0) = E0 cos (10)
a ka
e −jkr
J1
2
in (7) yields the following equations for (8) and (9): E(r, θ, φ) = jkaE0 J1 (1.841) θ̂ cos φ
r
k a
k1 b 2
sin
F1 (k1 ) = b
2 ka
(11) J1
k1 b 2
2 +φ̂ sin φ cos θ 2 (14)
ka
k2 a 1−
cos 3.682
2
F2 (k2 ) = a
π 2 − (k2 a)2 (12)
in which k = k sin θ .
In the E and H planes (φ = 0 and π/2) the cross-
polarized fields cancel and the patterns shown in Fig. 14a
This defines the radiation pattern in the forward are similar to those of (11) and (12), respectively, but
hemisphere −π/2 < θ < π/2, 0 < φ < 2π . If the aperture with slightly broader beams and lower sidelobes for
x
x
f
f
f′
r
r r′
a q
b q 2
y
y z
a z
Figure 6. Coordinates for radiation from (a) rectangular and (b) circular apertures.
HORN ANTENNAS 1011
the same aperture dimensions. As with rectangular Fig. 3b shows corresponding plots of |I2 (k sin θ )/I2 (0)| for
waveguides, open-ended circular waveguide apertures are the H-plane flare parameter s = a2 /8λH . For no flare
insufficiently large for (14) to represent all the radiated (s = 0) the patterns are those of a large open-ended
fields accurately. In the principal planes (φ = 0, π/2), it rectangular waveguide supporting only the TE10 mode.
can give a reasonable approximation for the copolarized The effect of the flare is to broaden the main beam, raise
fields but fails to accurately represent the cross-polarized the sidelobes, and fill the pattern nulls. For larger values
field patterns in φ = π/4. This is evident from a comparison of s, there is enhanced pattern beam broadening and
of numerical results from approximate and exact solutions eventually a splitting of the main beam on its axis.
(see Collin [4], p. 233). These curves also represent the radiation patterns of
the E/H-plane sectoral horns of Fig. 1b,c. For an E-plane
5. PYRAMIDAL AND SECTORAL HORNS sectoral horn (H → ∞), the E-plane pattern is given
by (19) and the H-plane pattern approximately by (12).
5.1. Radiation Patterns For an H-plane sectoral horn (E → ∞), the E-plane
pattern is given approximately by (11) and the H-plane
A pyramidal horn fed by a rectangular waveguide pattern by (20).
supporting the TE10 mode has an incident electric field In comparing parts (a) and (b) of Fig. 3 it is evident that
in the aperture of Fig. 6a that is approximately the mode E-plane beamwidths of a square aperture are narrower
distribution modified by a quadratic phase variation in the than H-plane beamwidths. For horns of moderate
two aperture dimensions: flare angle and optimum horns the E-plane half-power
πy 2 beamwidth is 0.89λ/b radians and the H-plane half-power
x y2
Ex (x, y, 0) = E0 cos exp −jk + (15) beamwidth is 1.22λ/a radians. E-plane patterns have
a 2E 2H minimum sidelobes of −13.3 dB below peak power, while
H-plane pattern minimum sidelobes levels are −23.1 dB.
With (15), Eq. (3) becomes
The universal patterns of Fig. 3a,b can also be used to
E(r, θ, φ) = A(r, θ, φ)E0 I1 (k1 )I2 (k2 ) (16) predict the approximate near-field radiation patterns of
horns by including the quadratic phase error; which is a
where (5) is used for A(r, θ, φ) and first-order effect of finite range r. This is done by including
2 π
b/2
πx exp −j (x2 + y2 ) (25)
I1 (k1 ) = exp −j − k1 x dx (17) rλ
−(b/2) λE
a/2 πy 2 in (15). Then the near field principal plane patterns of
πy
I2 (k2 ) = cos exp −j − k2 y dy (18) a pyramidal horn are given by (17) and (18) with E , H
−(a/2) a λH
replaced by
rH
The E-plane (φ = 0) and H-plane (φ = π/2) radiation H = (26)
r + H
patterns are, respectively
(a) 25
Moment method
15 Approximate
Measured
Gain (dB)
−5
−15
−25
−35
180 150 120 90 60 30 0 30 60 90 120 150 180
q (degrees)
E - plane
(b) 25
15 Moment method
Approximate
Measured
5
−5
Gain (dB)
−15
−25
−35
−45
Figure 7. Calculated and measured (a) E-plane
and (b) H-plane radiation patterns of a pyra- −55
180 150 120 90 60 30 0 30 60 90 120 150 180
midal horn of dimensions a = 4.12λ, b = 3.06λ,
q (degrees)
E = 10.52λ E = 9.70λ. (Copyright 1993, IEEE, from
Liu et al. [5].) H - plane
intensive, but some results from Liu et al. (5) are this gain due to the phase variation introduced by the
shown in Fig. 7 and compared with measurements and E-plane flare of the horn is
approximate computations. Their numerical computations
and measurements by Nye and Liang (6) of the aperture C2 (u) + S2 (u)
fields show that higher-order modes need to be added to RE (u) = (29)
u2
the dominant mode field of (15) and that the parabolic
phase approximation of (1) improves as the aperture size where the Fresnel integrals and their argument are
increases. defined by (21) and (23). Similarly the gain reduction
factor due to the H-plane flare of the horn is
5.3. Gain
π 2 [C(v) − C(w)]2 + [S(v) − S(w)]2
Pyramidal horns are used as gain standards at microwave RH (v, w) = (30)
4 (v − w)2
frequencies because they can be accurately constructed
and their axial directive gain reliably predicted from a where
relatively simple formula. The ratio of axial far-field power v a 1 λH
density to the average radiated power density from (16) = ±√ + (31)
w 2λH a 2
yields
G = G0 RE (u)RH (v, w) (28) A plot of RE and RH in decibels as a function of the
parameter 2d2 /λ, where d is the appropriate aperture
where G0 = 32ab/(π λ2 ) is the gain of an in-phase uniform dimension b or a and the slant length E or H ,
and cosinusoidal aperture distribution. The reduction of respectively, is shown in Fig. 8. Calculation of the gain
HORN ANTENNAS 1013
RH 6. CONICAL HORNS
Gain reduction (DB)
4π
G = 0.5 ab (32)
λ2
where
b
u = (34)
2λg E
−40
and −50 −25 0 25 50
λ q deg
λg = (35)
1 − (λ/2a)2 Figure 9. Copolar and crosspolar radiation patterns for a conical
horn with dimensions a = 4λ, = 23λ - - - - E-plane, − · − · −
is the guide wavelength. The accuracy of (33) is comparable H-plane − − − − crosspolarization. (Copyright 1994, IEE, from
to that of (28) for the horns of similar b dimension. Olver et al. [7].)
1014 HORN ANTENNAS
7. MULTIMODE AND CORRUGATED HORNS but without the frequency bandwidth limitations of the
multimode horn. This is achieved by introducing annular
Lack of axisymmetric radiation patterns make rectangular corrugations to the interior walls of a conical horn. There
and conical horns inefficient reflector feeds. Conical horns must be sufficient corrugations per wavelength (at least 3)
also have unacceptably high cross-polarization levels if that the annular electric field Eφ is essentially zero on
used as reflector feeds in a system with dual polarization. the interior walls. The corrugations make the annular
Multimode and corrugated horns were developed largely to magnetic field Hφ also vanish. This requires corrugation
overcome these deficiencies. In a dual-mode horn in (Ref. 3, depths such that short circuits at the bottom of the grooves
p. 195), this is done by exciting the TM11 mode, which appear as open circuits at the top, suppressing axial
propagates for circular waveguide diameters a > 1.22λ, current flow on the interior walls of the horn. This groove
in addition to the TE11 mode, which propagates for depth is λ/4 on a plane corrugated surface or a curved
a > 0.59λ. The electric field configuration of these modes surface of large radius. For a curved surface of smaller
in a waveguide cross section is shown in Fig. 10a,b. Added radius, such as near the throat of the horn, the slot depths
in phase and in the right proportion, cross-polarized and need to be increased; For example, for a surface radius of
aperture perimeter fields cancel, while the copolar fields 2λ, the depth required is 0.3λ. Usually slots are normal
around the aperture center add, yielding the aperture to the conical surface in wide-flare horns but are often
field configuration of Fig. 10c. These mixed mode fields are perpendicular to the horn axis with small flares. To provide
linearly polarized and taper approximately cosinusoidally a gradual transition from the TE11 mode in the wave guide
radially across the aperture. This yields the essentially to a hybrid HE11 mode in the aperture, the depth of the first
linearly polarized and axisymmetric radiation patterns corrugation in the throat should be about 0.5λ so that the
desired. surface there resembles that of a conducting cone interior.
Partial conversion of TE11 to TM11 fields can be effected Propagation in corrugated conical horns can be accurately
by a step discontinuity in the circular waveguide feed, calculated numerically by mode matching techniques. The
as in Fig. 10d, or by a circular iris or dielectric ring in aperture field is approximately
the horn. The TM11 /TE11 amplitude ratio depends on the
ratio of waveguide diameters, and the relative phase of the −jkρ 2
Ex (ρ ) = AJ0 (kc ρ ) exp (37)
modes depends on the length of larger-diameter circular 2
waveguide and the horn. This dependence limits the
where kc a/2 is 2.405, the first zero of the zero order Bessel
frequency bandwidth of the horn to about 5%. A multimode
function J0 ; is the slant length of the horn; and A is a
square pyramidal horn has similarly low sidelobe levels
constant. This aperture field is similar to that of Fig. 10c,
in its E- and H-plane radiation patterns because of an
and the resulting E and H patterns are similarly equal
essentially cosinusoidal aperture distribution in both E
down to about −25 dB. Some universal patterns are shown
and H planes [2]. This is achieved by excitation of a hybrid
in Fig. 11. Cross-polarization fields are also about −30 dB
TE21 /TM21 mode either by an E-plane step discontinuity
from the axial values, but now over a bandwidth of 2–1
or by changes in the E-plane flare. With their bandwidth
or more.
very limited, dual-mode horns have largely been replaced
Broadband axisymmetric patterns with low cross-
by corrugated horns in dual-polarization systems, except
polarization make corrugated horns particularly attractive
where a lack of space may give an advantage to a thin-
as feeds for reflectors. Low cross-polarization allows the
walled horn.
use of dual polarization to double the capacity of the
Corrugated horns have aperture fields similar to those
system. Another notable feature for this application is
of Fig. 10c and consequently similar radiation patterns,
that the position of the E- and H-plane pattern phase
centers coincide. Figure 12 shows the distance of the phase
center from the horn apex, divided by the slant length, of
(a) (b) (c) small flare angle conical [8] and corrugated [9] horns for
values of the phase parameter s given by (2). For a conical
horn the E-plane phase center is significantly farther from
the aperture than the H-plane phase center. Thus, if a
conical horn is used to feed a parabolic reflector, the best
location for the feed is approximately midway between the
TE11 TM11 0.85 TE11 + 0.15 TM11 E- and H-plane phase centers. With a corrugated horn
such a compromise is not required, so it is inherently more
(d)
efficient.
Corrugated horns may have wide flare angles, and their
aperture size for optimum gain decreases correspondingly.
For example, with a semiflare angle of 20◦ , the optimum
aperture diameter is about 8λ, whereas for a semiflare
angle of 70◦ it is 2λ. Wide-flare corrugated horns are
sometimes called ‘‘scalar horns’’ because of their low cross-
Figure 10. Excitation of axisymmetric linearly polarized aper- polarization levels.
ture fields in a stepped conical horn. (Copyright 1984, For radio astronomy telescope feeds and other space-
McGraw-Hill, Inc. from Love [2].) science applications, efficient corrugated horns have been
HORN ANTENNAS 1015
0
l
4
−10 l
0.5 2
0.4 a
Relative power, dB
0.3
−20
0.2
0.1
S =0 L
−30
−40
0 1 2 3 4 5 6 7 8 9 10 11 12 curve along its length. This arrangement provides a
pa sin q horn shorter than a conical corrugated horn of similar
l beamwidth, with a better impedance match due to the
Figure 11. Universal patterns of small-flare-angle corrugated curved profile at the throat and an essentially in-phase
horns as a function of the parameter s = a2 /8λ. (Copyright 1984, aperture field distribution due to the profile at the
McGraw-Hill, Inc. from Love [2].) aperture. Consequently the aperture efficiency is higher
than that of conical corrugated horns. The phase center of
the horn is near the aperture center and remains nearly
made by electroforming techniques for frequencies up to fixed over a wide frequency band. Radiation patterns of
640 GHz. Their axisymmetric radiation patterns with very a short profile horn similar to that of Fig. 13, but with
low sidelobe levels resemble Gaussian beams, which is hyperbolic profile curves, are shown in Fig. 14 [10]. A
often essential at submillimeter wavelengths. Gaussian profile curve has also been used. All produce
patterns similar to those of a Gaussian beam, such as is
8. PROFILE HORNS radiated from the end of an optical fiber supporting the
HE11 mode. The performance of this small horn as a feed
Most corrugated horns are conical with a constant flare seems close to ideal, but larger-profile horns may exhibit
angle. Figure 13 shows a profile conical horn in which higher sidelobe levels due to excitation of the HE12 mode
the flare angle varies as on a sine-squared or similar at the aperture.
1.0
0.8 a
ph
0.6
ph
0.4
0.2
(a) 0 (b) 0
20 −20
dB
40 −40
60 −60
90° 45° 0 45° 90° 90° 45° 0 45° 90°
q q
Figure 14. (a) Far field radiation patterns of TE11 mode and (b) radiation patterns of a profile
corrugated horn of aperture a = 15.8 mm and length L = 26.7 mm at 30 GHz (- - - - - E-plane,
H-plane . . . . . . crosspolarization). (Copyright 1997, IEEE, from Gonzalo et al. [10].)
9. HORN IMPEDANCE sidelobe levels and less back radiation in their E-plane
patterns than do conventional pyramidal and conical
Antennas must be well matched to their transmission horns. Their H-plane flare patterns are affected little by
lines to ensure a low level of reflected power. In such aperture matching because the electric field vanishes
microwave communications systems levels below −30 dB at the relevant edges.
are commonly required. For dual-mode and corrugated horns there are also
The impedance behavior of a horn depends on the negligible fields at the aperture edges and hence little
mismatch at the waveguide/horn junction and at its diffraction there. Corrugated horns with initial groove
aperture. For an E-plane sectoral horn, reflections depths near the throat of about a half-wavelength and
from these discontinuities are comparable in magnitude, which gradually decrease to a quarter-wavelength near
and since they interfere, the total reflection coefficient the aperture, as in Fig. 13, are well matched at both throat
oscillates with frequency and the input voltage standing- and aperture. For most well-designed corrugated horns a
wave ratio (VSWR) may vary from 1.05 at high frequencies VSWR of less than 1.25 is possible over a frequency range
to 1.5 at the lowest frequency. With E-plane sectoral of about 1.5–1. Dual-mode horns using a step discontinuity
horns aperture reflection is much stronger than junction as in Fig. 10d may have a VSWR of 1.2–1.4. If an iris is
reflection, so their VSWRs increase almost monotonically required for a match, the frequency bandwidth will, of
with decreasing frequency. An inductive iris in the course, be limited. Conical and pyramidal horns using
waveguide near the E-plane horn junction can match its flare angle changes to generate the higher-order modes
discontinuity. A capacitive iris may be similarly used for can have VSWRs less than 1.03 and require no matching
an H-plane sectoral horn. Aperture reflections in these devices.
horns may be matched with dielectric covers.
Pyramidal horns of sufficient size and optimum
design tend to be inherently well matched to their BIOGRAPHY
waveguide feeds because the E/H-plane aperture and flare
discontinuities partially cancel. For example, a 22-dB-gain Edward V. Jull received his B.Sc.degree in engineering
horn has a VSWR of about 1.04 and an 18 dB horn a VSWR physics from Queen’s University, Kingston, Ontario,
of less than 1.1. Canada in 1956, a Ph.D in electrical engineering in 1960
Conical horns fed by circular waveguides supporting and a D.Sc.(Eng.) in 1979, both from the University
the dominant TE11 mode have an impedance behavior of London, United Kingdom. He was with the Division
similar to that of pyramidal horns of comparable size of Radio and Electrical Engineering of the National
fed by rectangular waveguides. The waveguide/horn Research Council, Ottawa, Canada, from 1956 to 1957
discontinuities of both horns may be matched by an iris in the microwave section and from 1961 to 1972 in the
placed in the waveguide near the junction. A broader antenna engineering section. From 1963 to 1965 he was
bandwidth match is provided by a curved transition a guest researcher in the Laboratory of Electromagnetic
between the interior walls of the waveguide and the Theory of the Technical University of Denmark in Lyngby,
horn. Broadband reduction of aperture reflection may be and the Microwave Department of the Royal Institute of
similarly reduced by a curved surface of a few wavelengths’ Technology in Stockholm, Sweden. In 1972, he joined the
radius. Such ‘‘aperture-matched’’ horns also have lower Department of Electrical Engineering at the University
HUFFMAN CODING 1017
of British Columbia in Vancouver, British Columbia, letter a and then replaces each letter xi in x by its codeword
Canada. He became professor emeritus in 2000, but C(xi ), yielding a binary sequence C(x1 )C(x2 ) · · · C(xn ). The
remains involved in teaching and research on aperture mapping C, which maps each letter a ∈ A into its codeword
antennas and diffraction theory and is the author of a book C(a), is called a code. For example, the American Standard
so titled. He was president of the International Union of Code for Information Interchange (ASCII) assigns a 7-bit
Radio Science (URSI) from 1990 to 1993. codeword to each letter in the alphabet of size 128
consisting of numbers, English letters, punctuation marks,
BIBLIOGRAPHY and some special characters.
The ASCII code is a fixed-length code because all
1. J. F. Ramsay, Microwave antenna and waveguide techniques codewords have the same length. Fixed-length codes are
before 1900, Proc. IRE 46: 405–415 (1958). efficient only when all letters are equally likely. In practice,
2. A. W. Love, Horn antennas, in R. C. Johnson and H. Jasik, however, letters appear with different frequencies in the
eds., Antenna Engineering Handbook, 2nd ed., McGraw-Hill, raw-data sequence. In this case, one may want to assign
New York, 1984, Chap. 15. short codewords to letters with high frequencies and long
3. A. W. Love, ed., Electromagnetic Horn Antennas, IEEE Press,
codewords to letters with low frequencies, thus making
Piscataway, NJ, 1976. the whole binary sequence C(x1 )C(x2 ) · · · C(xn ) shorter. In
general, variable-length codes, in which codewords may
4. R. E. Collin, Antennas and Radiowave Propagation, McGraw-
be of different lengths, are more efficient than fixed-
Hill, New York, 1985.
length codes.
5. K. Liu, C. A. Balanis, C. R. Birtcher, and G. C. Barber, Anal- For a variable-length code, the decoding process — the
ysis of pyramidal horn antennas using moment methods,
process of recovering x from C(x1 )C(x2 ) · · · C(xn ) — is a bit
IEEE Trans. Antennas Propag. 41: 1379–1389 (1993).
more complicated than in the case of fixed-length codes. A
6. J. F. Nye and W. Liang, Theory and measurement of the field variable-length code (or simply code) is said to be uniquely
of a pyramidal horn, IEEE Trans. Antennas Propag. 44: decodable if one can recover x from C(x1 )C(x2 ) · · · C(xn )
1488–1498 (1996).
without ambiguity. As the uniquely decodable concept
7. A. D. Olver, P. J. B. Clarricoats, A. A. Kishk, and L. Shafai, suggests, not all one-to-one mappings from letters to
Microwave Horns and Feeds, IEE Electromagnetic Waves codewords are uniquely decodable codes [1, Table 5.1].
Series, Vol. 39, IEE, London, 1994. Also, for some uniquely decodable code, the decoding
8. T. Milligan, Modern Antenna Design, McGraw-Hill, New process is quite involved; one may have to look at the entire
York, 1985, Chap. 7. binary sequence C(x1 )C(x2 ) · · · C(xn ) to even determine
9. B. MacA. Thomas, Design of corrugated horns, IEEE Trans. the first letter x1 . A uniquely decodable code with
Antennas Propag. 26: 367–372 (1978). instantaneous decoding capability — once a codeword is
10. R. Gonzalo, J. Teniente, and C. del Rio, Very short and received, it can be decoded immediately without reference
efficient feeder design from monomode waveguide, IEEE to the future codewords — is called a prefix code. A simple
Antennas and Propagation Soc. Int. Symp. Digest, Montreal, way to check whether a code is a prefix code is to verify
1997, pp. 468–470. the prefix-free condition: a code C is a prefix code if and
11. E. V. Jull, Aperture Antennas and Diffraction Theory, IEE only if no codeword in C is a prefix of any other codeword.
Electromagnetic Waves Series, Vol. 10, IEE, London, 1981. A prefix code can also be represented by a binary tree in
which terminal nodes are assigned letters from A and the
codeword of a letter is the sequence of labels read from the
root to the terminal node corresponding to the letter.
HUFFMAN CODING
0.4 0.2 0.2 0.2 0.4 0.4 0.2 0.4 0.6 1.0
a1 a2 a3 a1 a1
0.1 0.1 0.2 0.2 0.1 0.1 0.4 0.2 0.4 0.6
a4 a5 a2 a3 a4 a5 a1
0
0.2 0.2 0.1 0.1 0.4 0.2
a2 a3 a4 a5
0.2 0.2 0.1 0.1
a2 a3 a4 a5
100 101 110 111
Therefore, in view of the above result, it suffices to find Property P1 is straightforward. If a node other than the
an optimal prefix code C such that the total length nr(C) root has no sibling, then this node can be merged with
of C(x1 )C(x2 ) · · · C(xn ) is a minimum, where its parent. As a result, all codewords of the terminal
nodes in the subtree rooted at this node are shortened
r(C) = p(a)l(a) by 1, and the resulting prefix code will be better than C,
a∈A which contradicts to the optimality of C. Property P2 is
denotes the average codeword length of C in bits per letter, equivalent to the principle of assigning short codewords
and p(a) is the normalized frequency of a in x if x is to highly likely letters and long codewords to less likely
deterministic and the probability of a if x is random. letters. Properties P3 and P4 are implied by properties P1
and P2.
Given a probability distribution or a set of normalized Property P4 implies a recursive procedure to design an
frequencies {p(a) : a ∈ A}, an interesting problem is how optimal prefix code C given the probabilities p1 , p2 , . . . , pJ .
to find such an optimal prefix code C. This problem is Rewrite C as CJ . Suppose that pJ−1 and pJ are the two
solved when the size of A is finite. Instead of performing least probabilities. From property P4, we know that aJ−1
an exhaustive search among all prefix codes, Huffman [2] and aJ are sibling terminal nodes in the binary tree
proposed an elegant algorithm in 1952, which is now corresponding to CJ . Merge aJ−1 and aJ with their common
known as the Huffman coding algorithm, to generate parent, which then becomes a terminal node corresponding
optimal prefix codes based on a set of probabilities to a merged letter with probability pJ−1 + pJ . The new
or frequencies. The resulting optimal codes are called binary tree has J − 1 terminal nodes and gives rise to
Huffman codes. a new prefix code CJ−1 with J − 1 codewords of lengths
l1 , l2 , . . . , lJ−2 , lJ−1 − 1. The average codeword length of CJ
2. HUFFMAN CODING ALGORITHM is related to that of CJ−1 by
Any elegant algorithm has its recursive procedure. There r(CJ ) = r(CJ−1 ) + pJ−1 + pJ
is no exception in the case of the Huffman coding Since the quantity pJ−1 + pJ is independent of CJ−1 ,
algorithm. Let A = {a1 , a2 , . . . , aJ } with 2 ≤ J < ∞. For minimizing r(CJ ) is equivalent to minimizing r(CJ−1 ).
each 1 ≤ j ≤ J, rewrite p(aj ) and l(aj ) as pj and lj , Consequently, in order to design an optimal code CJ
respectively. Let C be an optimal prefix code such that for J letters, we can first design an optimal code CJ−1
J for J − 1 letters, and then extend CJ−1 to CJ while
r(C) = pj lj (1) maintaining optimality. Recursively reduce the alphabet
j=1 size by merging the two least likely letters of the
corresponding alphabet. We finally reduce the problem
is minimized among all prefix codes. The following to design an optimal code C2 for two letters, for which the
properties of the optimal code C are helpful to uncover solution is obvious; that is, we assign 0 to one letter and 1
the recursive procedure of the Huffman coding algorithm: to the other. Since r(Cj ) is minimized for each 2 ≤ j ≤ J, we
find that C2 , C3 , . . . , CJ are all optimal prefix codes with
P1. C is a complete prefix code, that is, the binary respect to their corresponding probability distributions.
tree corresponding to C is a complete binary tree This is the essence of the recursive procedure of the
in which every node other than the root has its Huffman coding algorithm.
sibling. To summarize, given a set of probabilities, the Huffman
P2. If pi > pj , then li ≤ lj . coding algorithm generates optimal prefix codes for A
P3. The two longest codewords of C, which correspond according to the following procedure:
to the two least likely letters, have the same length.
P4. If in the binary tree corresponding to C, terminal Step 1. Start with m = J trees, each of which consists
nodes with the same depth are rearranged of exactly one node corresponding to a letter in A.
properly, then the two least likely letters are sibling Set the weight of each node as the probability of the
terminal nodes. corresponding letter.
HUFFMAN CODING 1019
Step 2. For m ≥ 2, find the two trees T1(m) and T2(m) difference between r(C) and H(p1 , p2 , . . . , pJ ) may be well
with the least weights at their roots. Combine T1(m) below 1. For detailed improvements on the upper and lower
and T2(m) into a new tree so that in the new tree, the bounds, the reader is referred to Gallager [3], Capocelli
roots of T1(m) and T2(m) are siblings with the root of et al. [4], and Capocelli and De Santis [5].
the new tree as their parent. The root of the new Another method to improve the upper bound is to design
tree carries a weight equal to the sum of the weights a Huffman code C for an extended alphabet AN , where AN
of its two children. The total number of trees m is consists of all length N strings from A. In this case, one
reduced by 1. assigns a codeword to each block of N letters rather than
a single letter. Accordingly, r(C ) is the average codeword
Step 3. Repeat step 2 for m = J, J − 1, . . . , 2 until only
length in bits per block and is within one bit of the block
one tree is left. The final tree gives rise to an optimal
entropy. Thus, r(C )/N, the average codeword length in
prefix code.
bits per letter, is within 1/N bits of the entropy per letter.
The following example further illustrates the procedure. As N increases, r(C )/N can be arbitrarily close to the
entropy per letter. Since the complexity of the Huffman
Example 2. Consider an alphabet A = {a1 , a2 , a3 , a4 , a5 } coding algorithm grows exponentially with respect to N,
with the probability distribution (0.4, 0.2, 0.2, 0.1, 0.1). this method works only when both N and the size of A
The Huffman coding algorithm generates the prefix code are small.
C in Example 1 recursively as shown in Fig. 1. By
convention, the left and right branches emanating from
4. HUFFMAN CODING FOR AN INFINITE ALPHABET
an internal node in a binary tree are labeled 0 and 1,
respectively. Since the Huffman coding algorithm constructs a binary
It is worthwhile to point out that the Huffman codes tree using the bottom–up approach by successively
generated by the Huffman coding procedure are not merging the two least probable letters in the alphabet,
unique. Whenever there are two or more pairs of trees it cannot be applied directly to infinite alphabets. Let
having the least weights at their roots in step 2, one can A = {0, 1, 2, . . .}. Indeed, given an arbitrary distribution
combine any such pair of trees into a new tree, resulting in (p0 , p1 , p2 , . . .), the problem of constructing an optimal
a possibly different final tree. In Example 2, after merging prefix code C with respect to (p0 , p1 , p2 , . . .) is still open at
a4 and a5 into one letter, say, a4 , with probability 0.2, the writing of this article, even though it can be shown [6]
we can choose to merge a3 and a4 instead of merging that such an optimal prefix code exists.
a2 and a3 as in Fig. 1. Continuing the algorithm, we However, when the distribution is a geometric proba-
can get another prefix code C with the binary codeword bility distribution
set B = {0, 10, 110, 1110, 1111} in which the jth string
represents the codeword for aj in A, 1 ≤ j ≤ 5. It is easy pi = (1 − θ )θ i , i≥0 (3)
to verify that these both C and C have the same average
codeword length of 2.1 bits. This non-uniqueness will allow where 0 < θ < 1, a simple procedure does exist for
us later to develop adaptive Huffman coding algorithms. construction of the optimal prefix code C. In this case,
given each i ∈ A, one can even compute the codeword C(i)
3. ENTROPY AND PERFORMANCE without involving any tree manipulation; this property is
desirable in the case of infinite alphabets since there is no
The entropy of a probability distribution (p1 , p2 , . . . , pJ ) is way to store an infinite binary tree. The procedure was
defined as first observed by Golomb [7] in the case of θ k = 12 for some
integer k, and later extended by Gallager and Voorhis [8] to
J
H(p1 , p2 , . . . , pJ ) = − pj log pj general geometric distributions in (3). The corresponding
j=1 optimal prefix code is now called the Golomb code in the
case of θ k = 12 for some integer k, and the Gallager–Voorhis
where log stands for the logarithm with base 2 code in the general case.
and the entropy is measured in bits. The entropy The procedure used to construct the Gallager–Voorhis
represents the ultimate compression rate in bits per code for a geometrical probability distribution with the
letter one can possibly achieve with all possible data parameter θ is as follows:
compression schemes.
A figure of merit of a prefix code is its performance Step 1. Find the unique positive integer k such that
against the entropy. It can be shown [1] that if C
is a Huffman code with respect to the distribution θ k (1 + θ ) ≤ 1 < θ k−1 (1 + θ ) (4)
(p1 , p2 , . . . , pJ ), then its average codeword length r(C) in
bits per letter is within one bit of the entropy. This unique k exists because 0 < θ < 1.
Step 2. If k = 1, let Ck (0) denote the empty binary
Theorem 1. The average codeword length r(C) of a string. Otherwise, let k = 2n + m, where n and
Huffman code with respect to (p1 , p2 , . . . , pJ ) satisfies m are positive integers satisfying 0 ≤ m < 2n . For
H(p1 , p2 , . . . , pJ ) ≤ r(C) < H(p1 , p2 , . . . , pJ ) + 1 (2) any integer 0 ≤ j < 2n − m, let Ck (j) be the binary
representation of j padded with possible zeros
The upper bound in (2) is uniform and applies to to the left to ensure that the length of Ck (j)
every distribution. For some distributions, however, the is n. For 2n − m ≤ j < k, let Ck (j) be the binary
1020 HUFFMAN CODING
representation of j + 2n − m padded with possible dynamic Huffman coding algorithm. It was later improved
zeros to the left to ensure that the length of Ck (j) is by Knuth [10] and Vitter [11].
n + 1. The constructed code Ck is a prefix code for Suppose that the sequence x = x1 x2 · · · xn is drawn from
{0, 1, . . . , k − 1}. the alphabet A = {a1 , a2 , . . . , aJ } with 2 ≤ J < ∞. Before
Step 3. To encode an integer i ≥ 0, we find a pair of encoding x1 , it is reasonable to assume that all letters
nonnegative integers (s, j) such that i = sk + j and a ∈ A are equally likely since we have no knowledge about
0 ≤ j < k. The Gallager–Voorhis code encodes i into x other than the alphabet A. Maintain a counter c(aj ) for
a codeword Gk (i) consisting of s zeros followed by a each letter aj , 1 ≤ j ≤ J. All counters are initially set to 1.
single one and then by Ck (i). The initial probability distribution is
Example 3. Suppose that θ 3 = 12 . In this case, k = 3, and
c(a ) c(aJ )
1 c(a2 ) 1 1 1
C3 (0) = 0, C3 (1) = 10, C3 (2) = 11 J , , . . . , = , , . . . ,
J
J J J J
c(aj ) c(aj ) c(aj )
Table 1 illustrates Golomb (Gallager-Voorhis) codewords j=1 j=1 j=1
G3 (i) for integers i = 0, 1, 2, . . . , 11.
Pick an initial Huffman code C1 for A based on the
initial probability distribution. To encode x = x1 x2 · · · xn ,
5. ADAPTIVE HUFFMAN CODING the adaptive Huffman coding algorithm works as follows:
In previous sections, we have assumed that the probability Step 1. Use the Huffman code Ci to encode xi .
distribution is available and known to both the encoder Step 2. Increase c(xi ) by 1.
and decoder. In many practical applications, however, Step 3. Update Ci into a new prefix code Ci+1 , so
the distribution is unknown. Since Huffman coding needs that Ci+1 is a Huffman code for the new probability
a distribution to begin with, to encode a sequence distribution
x = x1 x2 · · · xn , one way to apply Huffman coding is to
use the following two-pass approach: c(a1 ) c(a2 ) c(aJ )
, ,...,
J
J
J
Pass 1. Read the sequence x to collect the frequency c(aj ) c(aj ) c(aj )
j=1 j=1 j=1
of each letter in the sequence. Use the frequencies
of all letters to estimate the probabilities of these
letters, and design a Huffman code based on Step 4. Repeat steps 1–3 for i = 1, 2, . . . , n until all
the estimated probability distribution. Send the letters in x are encoded.
estimated probability distribution or the designed
Huffman code to the user; It is clear that step 3 is the major step. To understand
Pass 2. Use the designed Huffman code to encode the it properly, let us first discuss the sibling property of a
sequence. complete binary tree.
This two-pass coding scheme is not desirable in applica- 5.1. Sibling Property
tions such as streaming procedures, where timely encoding Assign a positive weight wj to each letter aj , 1 ≤ j ≤ J. Let
of current letters is required. It would be nice to have a C be a complete prefix code for A. Let T be the binary
one-pass coding scheme in which we estimate the proba- tree corresponding to C. Label the nodes of T by an index
bilities of letters on the fly on the basis of the previously from {1, 2, . . . , 2J − 1}. Assign a weight to each node in T
encoded letters in x and adaptively choose a prefix code recursively; each terminal node carries the weight of the
based on the estimated probability distribution to encode corresponding letter in A, and each internal node carries a
the current letter. Faller [9] and Gallager [3] indepen- weight equal to the sum of the weights of its two children.
dently developed a one-pass coding algorithm called the A complete prefix code, C, is said to satisfy the sibling
adaptive Huffman coding algorithm, also known as the property with respect to weights w1 , w2 , . . . , wJ if the nodes
of T can be arranged in a sequence i1 , i2 , . . . , i2J−1 such that
(continuing the ‘‘properties’’ list from Section 2, above)
Table 1. Golomb Codewords G3 (i) for Integers
i = 0, 1, 2, . . . , 11
P5. w(i1 ) ≤ w(i2 ) ≤ · · · ≤ w(i2J−1 ), where w(ij ) denotes
Integer i G3 (i) Integer i G3 (i)
the weight of the node ij .
0 10 6 0010 P6. nodes i2j−1 and i2j are siblings for any 1 ≤ j ≤ J; and
1 110 7 00110 the parent of nodes i2j−1 and i2j does not precede
2 111 8 00111 nodes i2j−1 and i2j in the sequence.
3 010 9 00010
4 0110 10 000110
5 0111 11 000111 The sibling property is related to the Huffman coding
algorithm.
HUFFMAN CODING 1021
Example 4. We now revisit the Huffman code in leading from the terminal node corresponding to xi to
Example 2. It is not hard to see that the Huffman code the root.
shown in Fig. 1 satisfies the sibling property with respect
to weights w1 = 4, w2 = 2, w3 = 2, w4 = 1, and w5 = 1. Case 1. If for any 0 ≤ k < l
Example 5. Let us now increase the weight w2 in w(ijk ) < w(ijk +1 ) (5)
Example 4 by 1. Accordingly, the weight of each node
along the path from the terminal node corresponding to a2 then after the weights of nodes ijk , 0 ≤ k < l, are updated,
to the root increases by 1, as shown in Fig. 2. It is easy to Ci still satisfies the sibling property. The update of the
see that the prefix code in Fig. 2 is not a Huffman code for weights of nodes is due to the increment of c(xi ) by 1 after
the probability distribution xi is encoded by Ci . In this case, Ci is also a Huffman code
for the new probability distribution after c(xi ) increases by
1. Hence Ci+1 = Ci .
w w5
1 w2 4 3 2 1 1
5 , 5 ,..., 5 = , , , , Case 2. If the inequality (5) is not valid for some k, then
11 11 11 11 11
wj wj wj
we can obtain Ci+1 from Ci by exchanging some subtrees
rooted at nodes of equal weight. Let j0 be the largest
j=1 j=1 j=1
integer such that w(ij ) = w(ij0 ). Having jk defined, we let
0
Moreover, this prefix code does not satisfy the sibling jk+1 be the largest integer such that w(ij ) is equal to the
k+1
property with respect to weights 4, 3, 2, 1, and 1, either. weight of the parent of the node ij . If jk = 2J − 1, that is,
k
if the node ij is the root of the binary tree, the procedure
k
In general, the following theorem is implied by the terminates. Denote the maximum k by l . It is easy to see
Huffman coding procedure. that l < l. In this case, the following must be done:
Theorem 2. A complete prefix code C is a Huffman code Step 1. Exchange the subtree rooted at the node ij0
for the probability distribution with the subtree rooted at the node ij0 .
w1 w2 wJ Step 2. For k = 0, 1, . . . , l − 1, exchange (in the new
, ,..., binary tree resulting from the last exchange
J
J
J
wj wj wj operation) the subtree rooted at the parent of the
j=1 j=1 j=1 node ij with the subtree rooted at the node ij . (The
k k+1
two roots of the two subtrees are not exchanged since
if and only if C satisfies the sibling property with respect they have the same weight.)
to weights w1 , w2 , . . . , wJ . Step 3. Update the weight of each node along the path
from node ij0 to the root.
Update Ci into Ci+1 : Note that Ci is a Huffman code for
the current probability distribution The final binary tree satisfies properties P5 and P6,
and gives rise to the Huffman code Ci+1 .
c(a1 ) c(a2 ) c(aJ )
, ,...,
J
J
J
Example 6. Suppose that a2 is the current letter to be
c(aj ) c(aj ) c(aj )
encoded by the prefix code in Fig. 2. Denote the code by Ci .
j=1 j=1 j=1
In Example 5, we know that Ci does not satisfy the sibling
Think of c(aj ) as the weight of aj , that is, wj = c(aj ), property with respect to weights 4, 3, 2, 1, and 1. Exchange
1 ≤ j ≤ J. Let the nodes of the binary tree corresponding the subtree rooted at node i4 with the subtree rooted at
to Ci be arranged in a sequence i1 , i2 , . . . , i2J−1 such that node i5 , and update relevant weights accordingly. We get
properties P5 and P6 are satisfied. Let j0 , j1 , . . . , jl be a the two binary trees shown in Fig. 3. The binary tree on
sequence such that ij0 , ij1 , . . . , ijl is the sequence of nodes the right-hand side of Fig. 3 gives rise to a Huffman ! code
4 3 2 1 1
for the probability distribution 11 , 11 , 11 , 11 , 11 after the
current letter a2 is encoded by Ci .
10 i9 11 i9
distribution (p1 , . . . , pJ ). Then with probability one, Ci con-
verges and for sufficiently large i, Ci itself is a Huffman
code for the true distribution (p1 , . . . , pJ ).
4 i8 6 i7 4 i8 7 i7
a1 a1
4 i6 2 i5 4 i6 3 i5 6. APPLICATIONS
a2 a2
The years since 1977 have witnessed widespread appli-
2 i4 2 i3 2 i4 2 i3
cations of Huffman coding in data compression, in sharp
a3 a3
contrast with the first 25 years since the appearance of
1 i2 1 i1 1 i2 1 i1 Huffman’s groundbreaking paper [2]. This phenomenon
a4 a5 a4 a5 can be explained by the increasing affordability of comput-
ing power and the growing demand of data compression
Figure 3. Exchange of subtrees and update of relevant weights. to save transmission time and/or storage space. Nowa-
days, Huffman coding competes with state-of-the-art cod-
ing schemes such as arithmetic coding and Lempel–Ziv
By exchanging subtrees and updating all relevant weights, coding in applications requiring data compression. Since
we get the Huffman code C2 in Fig. 4. Encode the second each letter in a data sequence to be compressed must
letter a1 by C2 , and then increase the counter c(a1 ) by 1. be encoded into an integer number of bits, the compres-
C2 is updated into C3 in Fig. 4 by exchanging subtrees sion performance of Huffman coding is often worse than
rooted at nodes i6 and i8 , and increasing all relevant that of arithmetic coding or Lempel–Ziv coding. How-
weights by 1. Encode the third letter a2 by C3 , and then ever, in practical applications, the selection of a data
increase the counter c(a2 ) by 1. C3 is updated into C4 in compression scheme is based not only on its compression
Fig. 4 by exchanging subtrees rooted at nodes i2 and i5 , performance but also on its computational speed and mem-
and updating relevant weights. Finally, encode the fourth ory requirement. Since a fixed Huffman code can be easily
letter a3 by C4 . The codeword sequence is implemented as a lookup table in practice, Huffman coding
has the advantages of real-time computational speed and
the need of only a fixed amount of memory. Thus, Huffman
C1 (a1 )C2 (a1 )C3 (a2 )C4 (a3 ) = 00011001110
coding remains a popular choice in time-critical applica-
tions or applications in which computing resources such
A very interesting fact about the adaptive Huffman as memory or computing power are limited. Some typical
coding algorithm is that as i is large enough, the applications of Huffman coding in telecommunications are
adaptive prefix code Ci converges and is indeed a Huffman described in the following paragraphs.
code for the true distribution. This is expressed in the
following theorem. 6.1. Facsimile Compression
One of the earliest applications of Huffman coding in
Theorem 3. Apply the adaptive Huffman coding algo- telecommunications is in facsimile compression. In 1980,
rithm to encode a stationary ergodic source X1 X2 · · · Xn · · · the CCITT Group 3 digital facsimile standard was put
taking values from the finite alphabet A with a common into force by the Consultative Committee on International
6 i11 7 i11
4 i10 2 i9 4 i10 3 i9
2 i7 2 i8 1 i5 1 i6 2 i7 2 i8 1 i5 2 i6
a5 a6 a5 a1
1 i1 1 i2 1 i3 1 i4 1 i1 1 i2 1 i3 1 i4
a1 a2 a3 a4 a6 a2 a3 a4
C1 C2
8 i11 9 i11
5 i10 3 i9 5 i10 4 i9
2 i7 3 i8 1 i5 2 i6 2 i7 3 i8 2 i5 2 i6
a1 a5 a1 a2
1 i1 1 i2 1 i3 1 i4 1 i1 1 i2 1 i3 1 i4
a6 a2 a3 a4 a6 a5 a3 a4
Figure 4. The Huffman codes C1 , C2 , C3 , and
C3 C4
C4 in Example 6.
HUFFMAN CODING 1023
7. S. W. Golomb, Run-length encodings, IEEE Trans. Inform. 10. D. E. Knuth, Dynamic Huffman coding, J. Algorithms 6:
Theory IT-12: 399–401 (1966). 163–180 (1985).
8. R. G. Gallager and D. C. Van Voorhis, Optimal source codes 11. J. S. Vitter, Design and analysis of dynamic Huffman codes,
for geometrically distributed integer alphabets, IEEE Trans. J. Assoc. Comput. Mach. 34: 825–845 (1987).
Inform. Theory IT-21: 228–230 (1975).
12. M. Weinberger, G. Seroussi, and G. Sapiro, The LOCO-
9. N. Faller, An adaptive system for data compression, Record I lossless image compression algorithm: Principles and
7th Asilomar Conf. Circuits, Systems and Computers, 1973, standardization into JPEG-LS, IEEE Trans. Image Process.
pp. 593–597. 9: 1309–1324 (2000).
I
IMAGE AND VIDEO CODING transmission time from the output of the encoder to the
input of the decoder. Delay is a critical parameter for
SHIPENG LI two-way communications. For such applications, delay
Microsoft Research Asia is the fourth dimension in image and video coding, and
Beijing, P. R. China the objective is to have the delay lower than a given
threshold. Therefore, in general, the problem space has
WEIPING LI
four dimensions, namely, rate, distortion, complexity, and
WebCast Technologies, Inc.
Sunnyvale, California
delay. The overall problem is to minimize both rate and
distortion under a constraint of complexity and possibly
another constraint of delay.
1. INTRODUCTION This article is organized as follows. In Section 2, some
basic concepts of image and video signals are presented.
Image and video coding has been a very active research Its goal is to establish a good basis for the characteristics
area for a long time. While transmission bandwidth and of the source in image and video coding. Section 3 reviews
storage capacity have been growing dramatically, the some basic principles and techniques of image and video
demand for better image and video coding technology coding. This is to present some components in an image
has also been growing. The reason is the ever-increasing or video coding system. Section 4 is devoted to the
demand for higher quality of images and video, which existing and emerging image and video coding standards
requires ever-increasing quantities of data to be transmit- that exemplify the usage of many basic principles and
ted and/or stored. techniques. Because image and video coding is a part
There are many different types of image and video of a communication system, standards are extremely
coding techniques available. The term coding used to important for practical applications. Section 5 concludes
solely refer to compression of image and video signals. with some thoughts on the possible future research in
However, in recent years it is generalized more toward image and video coding.
representation of image and video data that provides not
only compression but also other functionalities. In this
article, we still focus on discussions in the traditional 2. BASIC CONCEPTS OF IMAGE AND VIDEO SIGNALS
coding sense (i.e., compression). We briefly touch on
the topic of coding with different functionalities at Before discussing image and video coding, some basic
the end. concepts about the characteristics of image and video
To better understand the details of different image and signals are briefly presented in this section. Images and
video coding techniques and standards, many of which video are multidimensional signals. A grayscale image
seem to be more art than science, a good understanding can be considered as a function of two variables f (x, y)
of the general problem of space of image and video where x and y are the horizontal and vertical coordinates,
coding is extremely important. The basic problem is to respectively, of a two-dimensional plane, and f (x, y) is
reduce the data rate required for representing images the intensity of brightness at the point with coordinates
and video as much as possible. In compressing image (x, y). In most practical cases, the coordinates x and y are
and video data, often some distortion is introduced so sampled into discrete values so that a grayscale image
that the received image and video signals may not be is represented by a two-dimensional array of intensity
exactly the same as the original. Therefore, the second values. Each of the array elements is often called a
objective is to have as little distortion as possible. In picture element or pixel. By adding a new dimension
theoretic source coding, data rate and distortion are in time, an image sequence is usually represented by a
the only two dimensions to be considered, and the time sequence of two-dimensional spatial intensity arrays
objective is to have both rate and distortion as small (images) or, equivalently, a three-dimensional spacetime
as possible. Image and video coding is a type of source array. Figure 1 illustrates the concepts of image and
coding that also requires other practical considerations. image sequence. A video signal is a special type of image
One of the practical concerns is the complexity of a sequence. It is different from a film, which is another type
coding technique, which is further divided into encoding of image sequence. In addition to such a simple description,
complexity and decoding complexity. Therefore, in image more aspects of image and video are presented in the
and video coding, complexity is the third dimension to following subsections.
be considered, and the additional objective is to have
2.1. Imaging
complexity lower than a given threshold. Yet another
practical concern is the delay (or latency) of a coding The term imaging usually refers to the process of
technique, which is the time between when an image or generating an image by a certain physical means. Images
a video frame is available at the input of the encoder and video may come from a rich variety of sources,
and when the reconstructed image or video frame is from natural photographs to all sorts of medical images,
available at the output of the decoder, excluding the from microscopy to meteorology, not necessarily directly
1025
1026 IMAGE AND VIDEO CODING
3
e) n
m io
(ti ns
e
im
D
Digital Digital
Dimension 2
image Pixel
Dimension 2 video
(vertical)
(vertical) sequence
YUV 4:4:4 sampling scheme
(b)
Dimension 1 Dimension 1
(horizontal) (horizontal)
is frequently represented with 24 bits per pixel, with 8 is reduced. This is possible because interlaced scanning
bits for each color component, which is commonly called refreshes every other line at each frame refresh and
a true color image. For an image with a YUV 4 : 2 : 2 the full vertical resolution is covered in two refresh
color space and 8 bits per color component, there are periods. However, the interlaced scan of more than two
4∗ 8 + 2∗ 8 + 2∗ 8 = 64 bits for 4 pixels and equivalently 16 lines would result in noticeable reduction of the vertical
bits per pixel, which is commonly called a 16-bit color resolution. The subframes formed by all the even or
image. Similarly, an image with a YUV 4 : 2 : 0 color space odd scan lines are called fields. Correspondingly, fields
and 8 bits per color component is commonly called a 12-bit can be classified into even fields and odd fields, or
color image. top fields and bottom fields according to their relative
vertical positions. The top and bottom fields are sent
2.5. Video Scanning and Frame Rate alternately to an interlaced monitor at a field rate
equal to the refresh rate. Figure 3 (a) and (b) depict
Besides these commonalities with image signals, video the progressive and interlaced video scanning processes,
signals have some special features. Unlike an image respectively. A video signal of either scan type can be
sequence obtained from film that is a time sequence of digitized naturally by sampling horizontally along the
two-dimensional arrays, a video signal is actually a one- scan line, which results in a single rectangular frame
dimensional function of time. A video signal is not only for progressive scan and two interlaced fields (in one
sampled along the time axis, but also sampled along one frame) for interlaced scan. Figure 3 (c) and (d) illustrate
space dimension (vertical). Such a sampling process is such digitized progressive and interlaced video frames,
called scanning. The result is a series of time samples, respectively.
or frames, each of which is composed of space samples,
or scan lines. Therefore, video is a one-dimensional
2.7. Uncompressed Digital Video
analog signal over time. There are two types of video
scanning: progressive scanning and interlaced scanning. With the image and video signals in digital format, we
A progressive scan traces a complete frame, line by line can process, store, and transmit them digitally using
from top to bottom, at a high refresh rate (>50 frames per computers and computer networks. However, the high
second to avoid flickering). For example, video displayed volume nature of image and video data is prohibitive
on most computer monitors uses progressive scanning. It to many applications without efficient compression.
is well known that the frame rate of a film is 24 frames For example, considering a 2-hour video with spatial
per second because the human brain interprets a film at resolution of 720 × 480 pixels per frame, YUV 4 : 2 : 2
24 or more frames per second as ‘‘continuous’’ without a format, 30 frames per second frame rate, and each
‘‘gap’’ between any two frames. A major difference between color component quantized to 8 bits (1 byte), the total
a (scanned) video signal and a film is that all pixels in a uncompressed data size of such a video sequence is
film frame are illuminated at the same time for the same 2 × 60 × 60 × 30 × 720 × 480 × 2 = 149 Gbytes with a bit
period of time, and the pixels at the upper left corner rate of 30 × 720 × 480 × 2 × 8 = 165 Mbps, and it is
and the lower right corner of a frame of the (scanned) still nowhere near the cinema quality. This is a huge
video signal are illuminated at different times due to the amount of data for computers and networks with
time period for scanning from the upper left corner to the today’s technology.
lower right corner. This is why the frame rate of a video
signal must be more than 50 frames per second to avoid 2.8. Redundancy and Irrelevancy
flickering, while 24 frames per second is a sufficient rate
for film. Fortunately, image and video data contain a great deal
of redundancy and irrelevancy that can be reduced
or removed. Redundancy refers to the redundant or
2.6. Interlaced Scanning
duplicated information in an image or video signal
A TV signal is a good example of interlaced (scan) and can generally be classified as spatial redundancy
video. Historically, interlaced video format was invented (correlation between neighboring pixel values), spectral
to achieve a good balance between signal bandwidth, redundancy (correlation between different color planes
flickering, and vertical resolution. As discussed in the or spectral bands), temporal redundancy (correlation
previous subsection, the refresh rate for a video frame between adjacent frames in a sequence of images),
must be more than 50 times per second. The number of statistical redundancy (nonuniform distribution of image
scan lines per frame determines the vertical resolution and video pixel value), and so on. Irrelevancy refers
of a video frame. For a given refresh rate, the number to the part of the signal that is not noticeable
of scan lines determines the bandwidth of the video by the receiver (e.g., the human visual system, or
signal, because, within the same time period of scanning HVS). Image and video compression research aims at
one frame, more scan lines result in a faster change of reducing the number of bits needed to represent an
intensity (i.e., higher bandwidth). To reduce the video image or video signal by removing the redundancy
signal bandwidth while maintaining the same refresh and irrelevancy as much as possible, with or without
rate to avoid flickering, one must reduce the number noticeable difference to the human eye. Of course, with
of scan lines in one refresh period. The advantage of the rapid developments in high-capacity computers and
using interlaced scanning is that the vertical resolution high-speed computer networks, the tractable data size
is not noticeably reduced while the number of scan lines and bit rates will be increased significantly. However,
1028 IMAGE AND VIDEO CODING
(a) (b)
(c) (d)
A sampled frame in progressive video A sampled frame (two fields) in interlaced video
the research and development of advanced algorithms Ultimately, the quality of a compressed image or video
for digital image and video compression continue to bit stream should be judged subjectively by human eyes.
be necessary in the context of providing a novel and However, it is normally very costly and time-consuming
rich multimedia experience with much improved quality to perform a formal subjective test. Although subjective
more efficiently, more flexibly, more robustly, and more quality evaluation is still a necessary step for formal
ubiquitously. tests in various image and video coding standardization
processes, many objective measurements can also be
used as rough quality indicators. For example, it can
2.9. Evaluation of Image and Video Coding Schemes
be measured by the distortion of the decoded image with
In practice, we face many choices of various image and reference to the original image under different criteria,
video coding schemes. Even for the same coding scheme, and a particular criterion among them is the mean squared
we have the choice of different sets of parameters. It error defined as follows:
is important to first establish measures for quality and
performance before any attempt to select the ones that [(r(i, j) − o(i, j)]2
best fit our needs. Some basic measurements commonly i,j
D= (3)
used in image and video compression are explained N
as follows.
The effectiveness of an image or video compression where r(i, j) and o(i, j) are the reconstructed and original
session is normally measured by the bit rate of the image intensities of pixel position (i, j), respectively, and
compressed bit stream generated, which is the average N is the number of pixels in the image, and summation is
number of bits representing the compressed image or carried out for all pixels in an image. Another commonly
video signal, with average number of bits per pixel (bpp) used measurement is PSNR (peak signal to noise ratio)
for images and average number of bits per second (bps) for defined as follows:
video. It can also be measured by the compression ratio
defined as follows: P2
PSNR = 10 log10 (4)
D
Compression Ratio
where D is the mean squared error calculated in Eq. (3)
total size in bits of the original image or video
= and P is the maximum possible pixel value of the original
total size in bits of the compressed bitstream image; for example, if the intensities of the original images
bit rate of the original image or video are represented by 8-bit integers, then the peak value
= (2) would be P = 28 − 1 = 255.
bit rate of the compressed bitstream
IMAGE AND VIDEO CODING 1029
The performance or coding efficiency of an image or yet to be found. Many practical coding systems still need
video coding scheme is best illustrated in a rate-distortion close supervision by so-called compressionists.
or bit rate versus PSNR curve, where the encoder would Lossy compression provides a significant reduction in
encode the same image or video signal at a few bit rates bit rate that enables a multitude of real-time applications
and the decoded quality is measured using the above- involving processing, storing, and transmission of audio-
defined criteria. This makes it a very intuitive tool for visual information, such as digital camera, multimedia
evaluating various coding schemes. web, video conferencing, digital TV, and so on. Most image
and video coding standards, such as JPEG, JPEG2000,
3. Basic Principles and Techniques for Image and Video MPEG-1, MPGE-2, MPEG-4, H.26x, and the like, are all
Coding examples of lossy compression.
Although a higher compression ratio can be achieved
Image and video coding techniques can be classified into with lossy compression, there exist several applications
two categories: lossless compression and lossy compres- that require lossless coding, such as digital medical
sion. In lossless compression, video data can be identically imagery and facsimile. A couple of lossless coding stan-
recovered (decompressed) both quantitatively (numeri- dards have been developed for lossless compression, such
cally) and qualitatively (visually) from a compressed bit as lossless JPEG, JPEG-LS, ITU Tele-Fax standards, and
stream. Lossless compression tries to represent the video the JBIG. Furthermore, lossless compression components
data with the smallest possible number of bits without generally can be used in lossy compression to further
loss of any information. Lossless compression works by reduce the redundancy of the signal being compressed.
removing the redundancy present in video data informa-
tion that, if removed, can be recreated from the remaining 3.1. A Generic Model for Image and Video Coding
data. Although lossless compression preserves exactly the
accuracy of image or video representation, it typically To help better understand the basic ideas behind different
offers a relatively small compression ratio, normally a image and video coding schemes. Figure 4 illustrates the
factor of 2 or 3. Moreover, the compression ratio is very components and their relationship in a typical image and
dependent on the input data, and there is no guarantee video coding system. Depending on the target applications,
that a given output bit rate can always be achieved. not all these components may appear in every coding
By allowing a certain amount of distortion or informa- scheme. In Figure 4, the encoder takes as input an image
tion loss, a much higher compression ratio can be achieved. or video sequence and generates as output a compressed
In lossy compression, once compressed, the original data bit stream. The decoder, conversely, takes as input the
cannot be identically reconstructed. The reconstructed compressed bit stream and reconstructs an image or
data are similar to the original but not identical. Lossy video sequence that may or may not be the same as the
compression attempts to achieve the best possible fidelity original input. The components in Fig. 4 are explained in
given an available bit-rate capacity, or to minimize the this subsection.
number of bits representing the image or video signal An image or video coding scheme normally starts
subject to some allowable loss of information. Lossy com- with certain preprocessing of the input image or video
pression may take advantage of the human visual system signal, such as denoising, color space conversion, or
that is insensitive to certain distortion in image and video spatial resolution conversion to better serve the target
data and enables rate control in the compressed data applications or to improve the coding efficiency for the
stream. As a special case of lossy compression, percep- subsequent compression stages. Normally, preprocessing
tually lossless or near lossless coding methods attempt is a lossy process.
to remove redundant, as well as perceptually irrelevant, The preprocessed visual signal is passed to a reversible
information so that the original and the decoded images (one-to-one) transformation stage, where the visual data
may be visually but not numerically identical. Unfortu- are transformed into a form that can be more efficiently
nately, the measure of perceptive quality of images or video compressed. After the transformation, the correlation
is a rather complex one, especially for video. Many efforts (interdependency, redundancy) among transformed coef-
have been devoted to derive an objective measurement by ficients is reduced, the statistical distribution of the
modeling the human visual system, and an effective one is coefficients can be shaped to make subsequent entropy
Encoder
Image &
video input Pre- Lossless
Transformation Quantization
processing coding
Channel
To display
Post- Inverse De- Lossless
processing transformation quantization decoding
coding more efficient, and/or most energy is packed into either mapped into a single symbol or formed as a context
only a few coefficients or subband regions so that the to address different probability distribution tables for
majority of the transformed coefficients are zeros. Depend- one or more of the coefficients to be entropy coded. In
ing on the transform type and the arithmetic precision, either case, the correlation between the coefficients can
even if this step is not lossless, it is close to it. Typical be exploited to result in higher compression ratios by the
transformations include differential or predictive mapping entropy coder that follows. The entropy coding part tries
(spatially or temporally); unitary transforms such as the to generate the shortest binary bit stream by assigning
discrete cosine transform (DCT); subband decomposition binary codes to the symbols according to their frequency
such as wavelet transform; and adaptive transforms such of occurrence. Entropy coding can usually be achieved
as adaptive DCT, wavelet packets, fractals, and so on. by either statistical schemes or dictionary-based schemes.
Modern compression systems normally use a combina- In a statistical scheme, fixed-length symbols are coded
tion of these techniques and allow different modes in the using variable-length codewords (VLC), where the shorter
transformation stage to decompose the image or video sig- codewords are assigned to the symbols that occur more
nal adaptively. For example, there are intramodes and frequently (e.g., Huffman coders and arithmetic coders).
intermodes in MPEG video coding standards. Alternatively, in a dictionary-based scheme, variable-
The transformed signal is fed into a quantization length strings of symbols are coded using fixed-length
module where the coefficients are mapped into a smaller binary codewords, where a priori knowledge is not required
set of discrete values, thus requiring fewer bits to (e.g., Lempel Ziv coder).
represent these coefficients. Because quantization causes After the lossless data compression stage, the image
loss of information, it is a lossy process. Quantization is or video data are encoded into a bit stream that can
actually where a significant amount of data reduction can be either directly sent to, or wrapped with some system
be attained. Rate control in lossy compression can be easily layer information such as synchronization information
achieved by adjusting the quantization step size. There are or encryption, and then sent to a channel that can be
two main types of quantization: scalar quantization (SQ), either a storage device, a dedicated link, an IP network,
where each coefficient is quantized independently, and or a processing unit. The decoder receives the bit stream
vector quantization (VQ), where several coefficients are from the channel with or without possible corruptions and
grouped together and quantized jointly. starts to decode the received bit stream. The decoding is
In some cases, especially in predictive coding, the just the reverse process of the encoding except for the
differential or predictive operation and the quantization postprocessing part.
may work iteratively in a feedback loop to prevent error Applications of image and video coding may be classified
propagation, because otherwise the original reference into symmetric or asymmetric. For example, video
could not be reconstructed perfectly in the decoder after conferencing applications are symmetric because both
quantization. ends of the communication have the same requirements.
Although not explicitly shown in Fig. 4, in practical In video streaming applications, where the same video
lossy compression systems, there is always an optimization content can be preencoded and stored in a server and
process that is closely related to rate control. The goal accessed by many users, encoding may be much more
of this process is to minimize the bit rate given a complicated than decoding, because encoding is only
certain quality constraint, or to maximize the decoded required to be performed once but decoding is required by
quality given a certain bit rate budget, by adjust many users for many times. With more allowable encoding
encoding parameters such as transformation or prediction complexity, asymmetric algorithms usually lead to much
modes, quantization parameters, and bit allocation among better coding efficiency than the symmetric ones.
different portions of the image or video signal. As a For predictive video coding, since previous decoded
matter of fact, many compression standards specify only frames are normally needed as prediction references, the
the decoding process and leave the encoding process, encoder would include the major parts of a decoder in it.
especially the optimization part, open to the implementers In this sense, modules designed for decoder are also parts
as long as they produce a bit stream that can be of the encoder, and to prevent drifting errors, the decoder
decoded by a compliant decoder. This allows the encoding must also follow the same procedures as in the decoding
algorithms to be improved over time, and yet compliant loop of the encoder.
decoders will continue to be able to decode them. The There are many reasons for postprocessing. For
compression process rarely just stops after quantization. example, if the decoded video format does not match
The redundancy of the quantized coefficients can be the display, a scaling of spatial resolution or temporal
further removed by some generic or specific lossless data frame rate, or a de-interlacing operation will be used for
compression schemes. Lossless data compression normally the conversion. If the bit-rate is so low that there are
involves two parts, symbol formation and entropy coding. many artifacts in the decoded image or video, some de-
The symbol formation part tries to convert data or blocking or de-ringing operations will be used to reduce
coefficients into symbols that can be more efficiently these artifacts. If there are channel errors or packet losses,
encoded by entropy coding. Such a mapping can be there would be some distortions in the decoded image or
achieved through, for example, coefficients partitioning, video, then an error concealment operation is required to
or run length coding (RLC). Image or video coefficients repair the image or video to the best possible extent. If
can be partitioned and grouped into data blocks based there are multiple image or video objects that need to be
on their potential correlation. Such data blocks can be presented on the same display, a composition operation
IMAGE AND VIDEO CODING 1031
will be used to generate a meaningful scene to present where shorter code words are assigned to frequently
to the user. Moreover, it has been shown that in-loop occurring symbols, and vice versa. According to Shannon’s
filtering of the decoded frames can improve the efficiency noiseless source encoding theorem, the entropy H(U0 )
of predictive video coding significantly at low bit rates. is the lower bound for the average word length of a
Although the in-loop filtering results may or may not be uniquely decodable variable-length code for the symbols.
output directly to the display, we regard such filtering as Conversely, the average word length can approach H(U0 )
a kind of postprocessing as well. Postprocessing is a stage if sufficiently large blocks of symbols are encoded jointly.
that prepares the decoded image or video data into a more
favorable form for improving visual quality or subsequent 3.2.1. Huffman Coding. One particular set of uniquely
coding efficiency. decodable codes are called prefix codes. In such a code,
As mentioned before, not all stages would appear one code word cannot be the prefix of another one. In 1952
in every image or video coder. For example, a lossless Huffman proposed an algorithm for constructing optimal
coder normally does not contain lossy stages, such as, variable-length prefix codes with minimum redundancy
preprocessing and quantization. for memoryless sources. This method remains the most
So far, we only provided a generic overview of commonly used today, for example, in the JPEG and
what components are involved in an image and video MPEG compression standards.
compression system. In the following subsections, we The construction of a Huffman code is as follows:
present common principles and basic techniques of image
and video coding in some details. 1. Pick the two symbols in the alphabet with lowest
Although color components in images or video can be probabilities and merge them into a new combined
coded jointly as a vector based signal to achieve improved symbol. This generates a new alphabet with one
compression efficiency, practical image and video coding less symbol. Assign ‘‘0’’ and ‘‘1’’ to the two branches
systems normally choose to compress them independently linking the two original symbols to the new combined
due to its simplicity. From now on in this article, unless one, respectively.
especially noted, the algorithms and schemes described 2. Calculate the probability of the combined symbol
are for a single color component. by adding up the probabilities of the two origi-
nal symbols.
3.2. Entropy Coding
3. If the new alphabet contains more than one symbol,
Entropy is a measure of disorder, or uncertainty. In repeat steps 1 and 2 for the new alphabet. Otherwise,
information systems, the degree of unpredictability of a the last combined symbol becomes the Huffman
message can be used as a measure of the information tree root.
carried by the message. In 1948, Shannon defined the 4. For each original symbol in the alphabet, traverse all
information conveyed by an event I(E), measured in bits, the branches from the root and append the assigned
in terms of the probability of the event P(E), ‘‘0’’ or ‘‘1’’ for each branch along the way to generate
a Huffman codeword for the original symbol.
I(E) = − log2 (P(E)). (5)
We now have a Huffman code for each member of the
The physical meaning of the above definition is not hard to alphabet. Huffman codes are prefix codes and each of
understand. The higher the probability of an event (i.e., the them is uniquely decodable. A Huffman code is not unique.
more predictable event), the less information is conveyed For each symbol set, there exist several possible Huffman
by that event when it happens. Moreover, information codes with equal efficiency. It can be shown that it is not
conveyed by a particular sequence of independent events possible to generate a code that is both uniquely decodable
is the sum of the information conveyed by each event in and more efficient than a Huffman code [16]. However,
the sequence, whereas the probability of the sequence if the probability distribution somehow changes, such
is the product of the individual probabilities of the preconstructed codes would be less efficient and sometimes
events in the sequence, which is exactly what the log would even bring expansion. Moreover, Huffman codes are
function reflects. A discrete memoryless source (DMS) most efficient for data sources with nonuniform probability
generates symbols from a known set of alphabet symbols distributions. Sometimes, we may have to manipulate the
one at a time. It is memoryless since the probability of data so as to achieve such a distribution.
any symbol being generated is independent of the past In many cases, the data source contains a large alphabet
history. Assume a DMS U0 with alphabet {a0 , a1 , . . . , aK−1 } but with only a few frequent symbols. Huffman codes
and probabilities {P(a0 ), P(a1 ), . . . , P(aK−1 )}, the entropy constructed for such a source would require a very large
of such an information source is defined as the average code table and it would be very difficult to adapt to any
amount of information conveyed by each symbol output by probability distribution variations. An alternative method
the source, used in many image and video coding systems is to
group all infrequent symbols as one composite symbol
K−1
H(U0 ) = − P(ai ) log2 (P(ai )). (6) and construct a Huffman table for the reduced alphabet.
k=0
The composite symbol is assigned a special escape code
used to signal that it is followed by a fixed-length index of
Generally, a source with nonuniform distribution can be one of the infrequent symbols in the composite group. Only
represented or compressed using a variable-length code a very small code table is used and the statistics of the vast
1032 IMAGE AND VIDEO CODING
majority of infrequent symbols in the alphabet is shielded ideal code word length. Any number within that interval
by the composite symbol. This greatly simplifies the can now be used as the code for the string (usually the
Huffman code while maintaining good coding efficiency. one with the smallest number of digits is chosen). In
the encoding process, there is no assumption that the
3.2.2. Arithmetic Coding. Huffman codes and deriva- probability distribution would stay the same at all time.
tives can provide efficient coding for many sources. How- Therefore, by nature the arithmetic encoding can well
ever, the Huffman coding schemes cannot optimally adapt adapt to the changing statistics of the input and this
to given symbol probabilities because they encode each is a significant advantage over Huffman coding. From
input symbol separately with an integer number of bits. an information theory point of view, arithmetic coding is
It is optimal only for a ‘‘quantized’’ version of the orig- better than Huffman coding; it can generate fractional
inal probability distribution so the average code length bits for a symbol, and the total length of the encoded data
is always close to but seldom reaches the entropy of the stream is minimal. There are also many implementation
source. Moreover, no code is shorter than 1 bit, so it is not issues associated with arithmetic coding, for example,
efficient for an alphabet with highly skewed probability limited numerical precision, a marker for the end of a
distribution. Furthermore, there are no easy methods to string, multiplication free algorithm, and so on. For a
make Huffman coding adapt to changing statistics. detailed discussion of arithmetic coding, please see [17]. In
On the other hand, an arithmetic encoder computes a practice, arithmetic and Huffman coding often offer similar
code representing the entire sequence of symbols (called average compression rates while arithmetic coding is a
a string) rather than encodes each symbol separately. little better (ranging from 0 to 10% less bits). Arithmetic
Coding is performed by representing the string by a coding is an option for many image and video coding
subinterval through a sequence of divisions of an initial standards, such as JPEG, MPEG, H.26x, and so forth.
interval according to the probability of each symbol to
be encoded. 3.2.3. Lempel–Ziv Coding. Huffman coding and arith-
Assume that a string of N symbols, S = {s0 , s1 , . . . , st , metic coding require a priori knowledge of the probabilities
. . . , sN−1 } are from an alphabet with K symbols or an accurate statistical model of the source which in some
{a0 , a1 , . . . , ai , . . . , aK−1 } with a probability distribution cases is difficult to obtain, especially with mixed data
that is varied with time t as P(ai , t). Then the arith- types. Conversely, Lempel–Ziv (LZ) coding developed by
metic encoding process of such a string can be described Ziv and Lempel [18] does not need an explicit model of the
as follows: source statistics. It is a dictionary-based universal coding
that can dynamically adapt to any sources.
1. Set the initial interval [b, e) to the unit interval [0,1) In LZ coding, the code table (dictionary) of variable-
and t = 0. length symbol strings is constructed dynamically. Fixed-
2. Divide the interval [b, e) into K subintervals length binary codewords are assigned to the variable-
proportional to the probability distribution P(ai , t) length input symbol strings by indexing into the code
for each symbol ai at time t, that is, table. The basic idea is always to encode a symbol string
that the encoder has encountered before as a whole. The
i−1 longest symbol string the encoder has not seen so far is
bi = b + (e − b) P(aj , t) and added as a new entry in the dictionary, and will in turn be
j=0 used to encode all future occurrences of the same string.
i At any time, the dictionary contains all the substrings
ei = b + (e − b) P(aj , t). (prefixes) the encoder has already seen. With the initial
j=0 code table and the indices received, the decoder can also
dynamically reconstruct the same dictionary without any
3. Pick up the subinterval corresponding to symbol st , overhead information.
say st = ai , update [b, e) with [bi , ei ) and t = t + 1. A popular implementation of LZ coding is the Lem-
4. Repeat step 2 and 3 until t = N. Then output a binary pel–Ziv–Welch (LZW) algorithm developed by Welch [20].
arithmetic code that can uniquely identify the final Let A be the source alphabet consisting K symbols
interval selected. {ak , k = 0, . . . , K − 1}. The LZW algorithm can be described
as follows,
Disregarding the numerical precision issue, the width of
the final subinterval is equal to the probability of the 1. Initialize the first K entries of the dictionary with
N−1
each symbol ak from A and set the scan string w
string P(S) = P(st , t). It can be shown that the final to empty.
t=0
subinterval of width P(S) is guaranteed to contain one 2. Input the next symbol S and concatenate it with w
number that can be represented by B binary digits, with to form a new string wS.
3. If wS has a matching entry in the dictionary, update
− log2 (P(S)) + 1 ≤ B < − log2 (P(S)) + 2, (7) the scan string w with wS, and go to 2. Otherwise,
add wS as a new entry in the dictionary, output the
which means that the subinterval can be represented index of the entry matching w update the scan string
by a number which needs 1 to 2 bits more than the w with S and go to 2.
IMAGE AND VIDEO CODING 1033
4. When the end of the input sequence is reached, where ZT represents the state of the Markov source at
process the scan string w from left to right, output time T The conditional entropy of such an Nth-order of
the indices of entries in the dictionary that match Markov source is given by,
with the longest possible substrings of w.
H(UT , ZT ) = H(UT | UT−1 , UT−2 , . . . , UT−N )
If the maximum dictionary size is M entries, the length
of the codewords would be log2 (M) rounded to the next = E(− log2 (P(uT | uT−1 , uT−2 , . . . uT−N )))
smallest integer. The larger the dictionary is, the better
the compression. In practice the size of the dictionary =− ··· P(uT , uT−1 , uT−2 , . . . uT−N )
uT uT−N
is a trade-off between speed and compression ratio. It
can be shown that LZ coding asymptotically approaches × log2 (P(uT | uT−1 , uT−2 , . . . uT−N ))
the source entropy rate for very long sequences [19]. For
short sequences, however, LZ codes are not very efficient = ··· P(uT−1 , uT−2 , . . . uT−N )
uT−1 uT−N
because of their adaptive nature. LZ coding is used in
the UNIX compress utility and in many other modern file × H(UT | uT−1 , uT−2 , . . . uT−N ). (11)
compression programs.
3.3. Markov Sources Moreover, it can be shown that,
Though some sources are indeed memoryless, many
others, for example image and video data where the
T
probability distribution of values for one symbol can be H(UT , UT−1 , . . . , UT0 ) = H(Ut , Zt ). (12)
very dependent on one or more previous values, are sources t=T0
with memory. A source with memory can be modeled as a
Markov source. If a symbol from a source is dependent on From the above equation, we can clearly see that
N previous value(s), the source is known as an Nth-order for a Markov source, the complicated joint entropy of a
Markov source. Natural or computer rendered images and whole symbol sequence can be simplified to the sum of
video data are examples of Markov sources. the conditional entropy of each symbol in the sequence,
Conversely, joint sources generate N symbols simulta- which means that the entropy coding of such joint Markov
neously. A coding gain can be achieved by encoding those sources can be simplified to the conditional entropy coding
symbols jointly. The lower bound for the average code word of each symbol in the sequence given N previous symbols.
length is the joint entropy, In addition, the last equation in ( 11) suggests a simple
H(U1 , U2 , . . . , UN ) = − ··· P(u1 , u2 , . . . , uN ) conditional entropy coding method: any entropy coding
u1 u2 uN
method for a memoryless source can be used to encode the
symbol at time T as long as the probability distribution
× log2 (P(u1 , u2 , . . . , uN )). (8) used is switched according to the contexts (or states) of N
It generally holds that previous symbols.
Image and video data are normally highly correlated.
H(U1 , U2 , . . . , UN ) ≤ H(U1 ) + H(U2 ) + · · · + H(UN ) (9) The value of a pixel has dependency with a few neighboring
with equality, if U1 , U2 , . . . , UN are statistically indepen- pixels. Even for the simplest first-order Markov separable
dent. This states that for sources with memory, they can model, a pixel in a 2-D image would have dependency
be best coded jointly with a smaller lower bound (the joint with at least 3 neighboring pixels and a pixel in a 3-D
entropy) for the average word length than otherwise coded video sequence would have dependency with at least
independently. Moreover, the word length of jointly coding 7 neighboring pixels. If each pixel is represented by 8
the memoryless sources has the same lower bound as inde- bits, in order to most efficiently compress the pixel value
pendently coding them. However, coding each memoryless with the above derived simplified context-based entropy
source independently is much easier to implement than coding method, it would require 2563 probability tables for
coding jointly in real applications. image coding and 2567 probability tables for video coding.
For an image frame or a video sequence, since each pixel Apparently, it is impractical for the encoder or decoder to
in it is correlated with neighboring pixels, to obtain the maintain such a huge number of probability distribution
best coding efficiency, it is ideal to encode all the pixels in tables. For an efficient, independent coding of symbols,
the whole image frame or video sequence jointly. However, statistical dependencies should be reduced.
practical implementation complexity prohibits us to do so.
Fortunately, with the Markov model for image and video
data, the problem can be much simplified. 3.4. Predictive Coding
For an Nth-order Markov source, assume the first
Predictive coding is a way to reduce correlation between
symbol starts at time T0 the conditional probabilities of
data from Markov sources. It is much simpler than a
the source symbols are,
conditional entropy coder described in the previous section.
P(uT , ZT ) = P(uT | uT−1 , uT−2 , . . . , uT−N ) Instead of coding an original symbol value, the difference
or error between the original value and a predicted value
= P(uT | uT−1 , uT−2 , . . . , uT−N , uT−N−1 , . . . uT0 )
based on values of one or more past symbols is encoded.
(10) The decoder will perform the same prediction and use
1034 IMAGE AND VIDEO CODING
e Entropy
make the source more suitable for subsequent compression
x + are very common in an image or video compression system.
coding
− For nonstationary sources, adaptive prediction can
+ be used for more accurate prediction where the system
x switches between several predefined predictors according
r =x
Channel
Predictor to the characteristics of the data being compressed. The
choices of the predictors can be either explicitly coded
Encoder with a small overhead or implicitly derived from the
reconstructed values to avoid the overhead.
Predictive coding is important in a number of ways.
In its lossless form, predictive coding is used in many
e Entropy sophisticated compression schemes to compress critical
x +
decoding data, such as DC coefficients and motion vectors. If some
degree of loss is acceptable, the technique can be used
more extensively, (e.g., interframe motion compensated
r =x x prediction in video coding).
Predictor
3.5. Signal Models for Images and Video
Decoder
In general, there is no good model to describe exactly
Figure 5. Diagram of a predictive coder.
the nature of real world images and video. A first-order
Gaussian-Markov source is often used as a first order
approximation due to its tractability. Though crude, this
the encoded error to reconstruct the original value of that
model provides many insights in understanding image and
symbol. Figure 5 illustrates the predictive coding process.
video coding principles.
The linear predictor is the most used one in predictive
A one-dimensional (1-D) Gaussian-Markov (AR(1)
coding. It creates a linear weighting of the last N symbols
source) can be defined as,
(Nth-order Markov) to predict the next symbol. That is,
x(n) = αx(n − 1) + ε(n), for all n > n0 , (17)
Ŝ0 = α−1 S−1 + α−2 S−2 + · · · + α−N S−N , (13)
where |α| < 1 is the regression coefficient, and {ε(n)} is
where {S−1 , S−2 , . . . , S−N } are the last N symbols and α−i the i.i.d (independent identically distributed) zero mean
are the weights in the linear predictor. Without loss of normal random process with variance σε2 , and x(n0 ) is a
generality, assume the input symbols have zero mean, i.e., zero mean finite variance random variable. Such a source
E{S} = 0. The variance of the prediction error e0 = S0 − Ŝ0 is known to be asymptotically stationary [1]. For a two-
is given by, dimensional (2-D) image signal, the simplest source model
N N
is a two-dimensional separable correlation AR(1) model,
σe20 = α−i α−j Rij , (14)
i=0 j=0
x(m, n) = αh x(m − 1, n) + αv (m, n − 1)
where Rij is the covariance of symbols S−i and S−j , and − αh αv x(m − 1, n − 1) + ε(m, n), (18)
α0 = −1.
Minimization of the prediction error variance leads to where ε(m, n) is an i.i.d zero mean Gaussian noise
the orthogonality principle, source with variance σN , and αh , αv denote the first
order horizontal and vertical correlation coefficients,
E{e0 S−i } = 0, for all i = 1, 2, . . . , N. (15) respectively. Its autocorrelation function is separable and
can be expressed as a product of two 1-D autocorrelations.
The optimum coefficients {α−1 , α−2 , . . . , α−N } can be The separable correlation model of a 2-D image enables us
obtained by solving the above equation. Moreover, the to use 2-D separable transforms or other signal processing
orthogonality principle implies de-correlation of errors, techniques to process the 2-D sources using separate 1-
D processing in both horizontal and vertical directions,
E{e0 e−i } = E{e0 }E{e−i } = 0, for all i = 1, 2, . . . , N. (16) respectively. Therefore, in most cases, 1-D results can
be generalized to 2-D cases in accordance with the 2-D
For Gaussian random processes, de-correlation means separable correlation model.
statistical independence. Clearly, after optimum linear For video signals, a 3-D separable signal model could
prediction, the prediction errors are uncorrelated and the still apply. However, the special feature of moving pictures
much simpler and independent entropy coding can be in natural video is the relative motion of video objects
used to efficiently encode these errors. Intuitively, after in adjacent frames. A more precise signal model should
the process we have transformed a source of highly- incorporate the motion information into the 3-D signal
correlated values with any possible distribution into a model, for example, forming signal threads along the
source of much less correlated values but with consistently temporal direction and then applying the 3-D separable
high probability of being small. Such transformations that signal model. Fortunately, most modern video compression
IMAGE AND VIDEO CODING 1035
technologies have already explicitly used such motion quantization (ECQ) shows that, if an efficient entropy
information in the encoding process to take advantages coder is applied after quantization, the optimal coding
of the temporal redundancy. gain can be always achieved by a uniform quantizer [3].
In other words, for most applications in image and video
3.6. Quantization compression, the simplest possible quantizer, followed by
The coding theory and techniques we discussed so far variable-length coding, produces the best results. In order
are mostly focused on lossless coding. However, the state- to take advantage of the subsequent entropy coder, many
of-the-art lossless image and coding schemes exploiting practical scalar quantizers have included a dead-zone,
the statistical redundancy of image and video data can where a relatively larger partition is allocated for zero.
only achieve an average compression factor of 2 or 3. Note that if the probability distribution of the data to be
From Shannon’s rate distortion theory, we know that quantized is highly skewed, the inverse quantizer might
the coding rate of a data source can be significantly achieve smaller distortion if choosing a biased reconstruc-
reduced by introducing some numerical distortion [4]. tion point towards the higher probability end rather than
Fortunately, because the human visual system can the usual mid-point for uniform distribution.
tolerate some distortion under certain circumstances, As shown above, the uniform quantizer combined
such a distortion may or may not be perceptible. Lossy with entropy coding is simple and well suited in image
compression algorithms reduce both the redundant and and video coding. However, new applications such as
irrelevant information to achieve higher compression delivery of image or video over unstable or low bandwidth
ratio. Quantization is usually the only lossy operation networks require progressive transmission and/or exact
that removes perceptual irrelevancy. It is a many-to- rate control of the bit stream. To equip the image or
one mapping that reduces the number of possible signal video coding with these new functionalities, progressive
values at the cost of introducing some numerical errors in quantization strategies have to be adopted where a
the reconstructed signal. Quantization can be performed coefficient is quantized in a multi-pass fashion and
either on individual values (called scalar quantization) the quantization error can be reduced by successive
or on a group of values (a coding block, called vector enhancement information obtained from each pass.
quantization). The rate-distortion theory also indicates Successive-Approximation Quantization (SAQ) of the
that, as the size of the coding block increases, the distortion coefficients is one such approach and is widely used
asymptotically approaches Shannon lower bound; in other in image and video coding systems [37,38,44,47]. As a
words, if a source is coded as an infinitely large block, then special case of SAQ, bit-plane coding encodes each bit in
it is possible to find a block-coding scheme with rate R(D) a binary representation of the coefficient from the most
that can achieve distortion D where R(D) is the minimum significant bit (MSB) to the least significant bit (LSB) at
possible rate necessary to achieve an average distortion each quantization scan.
D [4]. This states that vector quantization is always better
than scalar quantization. However, due to the complexity 3.6.2. Vector Quantization. In contrast to a scalar
issue, many practical image and video coders still prefer quantizer that operates upon a single, one-dimensional,
to use scalar quantization. The basics on scalar and vector variable, a vector quantizer acts on a multidimensional
quantization techniques are discussed as follows. vector. Vector quantization is a mapping of k-dimensional
Euclidean space Rk into a finite subset C of Rk , where
3.6.1. Scalar Quantization. An N-point scalar quanti- C = {ci : i = 1, 2, . . . , N} is the set of N reproduction
zation is a mapping from a real one-dimensional space vectors or codebook, and the element ci is referred to
to a finite set C of discrete points in the real space, as a codeword or a code-vector. An encoder Q(x) takes
C = {c0 , c1 , . . . , cN−1 }. Normally, the mapping is to find the an input vector x and generates the index i of the best
closest match in C for the input signal according to a cer- matched vector ci to x in the set C according to certain
tain distortion criterion, for example, mean squared error distortion criteria, for example, MSE. A decoder Q−1 (i)
(MSE). The values of ci are referred to as reproduction val- regenerates the codeword ci using the input index i Fig. 6
ues. The output is the index of the best matched reproduc- illustrates the VQ encoding and decoding processes.
tion value. The transmission rate r = log2 N is defined to Vector quantization always outperforms the scalar
indicate the number of bits per sample. The uniform quan- quantization in terms of error measurement under the
tizer with ci distributed uniformly on the real axis is the same bit rate by Shannon’s rate-distortion theory [3–6].
optimal solution when quantizing a uniformly distributed Vector quantization takes advantage of the joint probabil-
source. For a random nonuniformly distributed source ity distribution of a set of random variables while scalar
(such as image luminance levels) and even for a source quantization only uses the marginal probability distribu-
with an unknown distribution, the Lloyd–Max quantizer tion of a one-dimensional random variable [3]. In [7], Look-
design algorithm [2] provides an essential approach to abaugh and Gray summarized the vector quantization
the (locally) optimal scalar quantizer design. As a special advantages over scalar quantization in three categories:
case of the generalized Lloyd algorithm to be discussed memory advantage (correlation between vector compo-
for vector quantization, the Lloyd-Max quantizer tries nents, Fig. 7), shape advantage (probability distribution,
to minimize the distortion for a given number of levels Fig. 8) and space filling advantage (higher dimensionality,
without the need for a subsequent entropy coder. Though Fig. 9). Table 1 lists the high-rate approximation of cod-
optimal, it involves a complicated iterative training pro- ing gain brought by these VQ advantages over a scalar
cess. On the other hand, the study of entropy-constrained quantizer for different VQ dimensionalities. The results
1036 IMAGE AND VIDEO CODING
Codeword 0 Codeword 0
Codeword 1 Codeword 1
Codeword N −1 Codeword N −1
Codebook Codebook
VQ encoding VQ decoding
1 0 0 0 0
2 0.17 1.14 5.05 6.36
3 0.29 1.61 6.74 8.64
4 0.39 1.87 7.58 9.84
5 0.47 2.04 8.09 10.6
6 0.54 2.16 8.42 11.12
7 0.60 2.25 8.67 11.52
8 0.66 2.31 8.85 11.82
9 0.70 2.36 8.99 12.05
10 0.74 2.41 9.10 12.25
12 0.81 2.47 9.27 12.55
16 0.91 2.55 9.48 12.94
24 1.04 2.64 9.67 13.35
100 1.35 2.77 10.01 14.13
∞ 1.53 2.81 10.11 14.45
Channel
how to reduce the complexity of codebook training and
codebook searching [3,12–15,43]. Some efforts have also Encoder
been put on the analogy of a uniform scalar quantizer
in multidimensions — lattice VQ (LVQ) [8–10,45], where
a codebook is not necessary. However, it seems that LVQ x' e' Entropy
just puts off the burden of designing and searching for a + decoding
large codebook to the design of a complex entropy coder.
Original Y
mismatch errors.
The design of a DPCM system should consist of
optimizing both the predictor and the quantizer jointly. X
However, it has been shown that under the mean-squared
ed
m
error optimization criterion, independent optimizations
or
sf
an
of the predictor and the quantizer discussed in previous
Tt
sections are good approximations to the jointly optimal
solution. Because of the reconstruction dependency of a
predictive coder, any channel errors could be propagated
throughout the remainder of the reconstructed values.
Usually, the sum of the coefficients is made slightly less
than one (called leaky prediction) to reduce the effects of
channel errors.
Tr
drawbacks. First, its IIR filtering nature decides that
an
sf
the correct reconstruction of future values is always Original X
or
m
dependent on the previously correctly reconstructed
ed
Y
values. Thus, channel errors are not only propagated
but also accumulated to future reconstructed values. Figure 11. Advantage of transformation in signal compression.
This makes predictive coding an unstable system under
channel errors. Secondly, for lossy coding, because of
the iterative prediction and quantization processes, it the original signal with the same distortion although
is hard to establish a direct relation between average they both use a uniform scalar quantizer in each
distortion and rate of the quantizer, which in turn vector dimension. Furthermore, the complexity of entropy
makes it hard for optimal rate control. Thirdly, predictive coding following the quantization step is reduced in
coding is a model-based approach and is less robust the transformed domain since the number of codewords
to source statistics. When source statistics changes, is reduced.
adaptive predictive coding normally has to choose different The efficiency of a transform coding system will depend
predictors to match the source. Moreover, the prediction on the type of linear transform and the nature of bit
coding is a waveform compression technique. Since the allocation for quantizing transform coefficients. Most
human visual system model is best described in the practical systems are based on suboptimal approaches
frequency domain, it is difficult to apply visual masking to for transform operation as well as bit allocation. There
the prediction error. are mainly two forms of linear transformations that are
Alternatively, transform coding techniques can be used commonly used in transform coding: block transforms and
to reduce the correlation in source data. Transform subband decompositions.
coders take an M input source samples and perform a
reversible linear transform or decomposition to obtain 3.8.1. Block Transforms. Block transform coding, also
M transform domain coefficients that are decorrelated called block quantization, is a widely used technique
and more energy compacted for better compression. in image and video compression. A block of data is
They decorrelate coefficients to make them amendable transformed so that a large portion of its energy is packed
to efficient entropy coding with low-order models, and in relatively few transform coefficients, which are then
distribute energy to only a few coefficients and thus make it quantized independently.
easy to remove redundancy and irrelevancy. Theoretically, In general, a 1-D transformation scheme can normally
the asymptotic MSE performance is the same for both be represented by a kernel matrix T = {t(k, n)}k,n=0,1,...,N−1
predictive coding and transform coding [21]. However, and the transformed results Y = {y0 , y1 , . . . , yN−1 }T can be
transform coding is more robust to channel errors and represented as the multiplications of the transform matrix
source statistics. There is an explicit relationship between T and the signal vector X = {x0 , x1 , . . . , xN−1 }T . That is,
distortion and data rate after transformation, and optimal
bit allocation or rate control can be easily implemented. Y = T • X. (19)
The subjective quality is better at low bit rates since
transform coding is normally a frequency domain approach The original signal can be recovered by multiplying the
and the HVS model can be easily incorporated in the inverse matrix of T with the transformed signal without
encoding process. any distortion if we ignore the rounding errors caused
Figure 11 illustrates the advantage of transformation by limited arithmetic precision. From a signal analysis
over a highly correlated two-dimensional vector source point of view, the original signal is represented as the
with scalar quantization. We can see that the transformed weighted sum of basis vectors, where the weights are just
signal would require much fewer bits to code than the transform coefficients.
IMAGE AND VIDEO CODING 1039
Block transforms are normally orthonormal (unitary) The optimal block transform is the Karhunen-Loeve
transforms, which means that, Transform (KLT) that yields decorrelated transform
coefficients and optimum energy concentration. The basis
T • T T = T T • T = IN×N , (20) vectors of KLT are eigenvectors of the covariance matrix of
the input signal. However, the KLT depends on the second-
where IN×N is the identity matrix. Orthogonality is clearly order signal statistics as well as the size of the block, and
a necessary condition for basis vectors to decompose an the basis vectors are not known analytically. Even when a
input into uncorrelated components in an N-dimensional transform matrix is available, it still involves quite a large
space. Orthonormality of basis vectors is a stronger amount of transformation operations because the KLT is
condition that leads to the signal energy preservation not separable for image blocks and the transform matrix
property in both the signal domain and the transform cannot be factored into sparse matrices for fast calculation.
domain. Moreover, the mean squared error caused by Therefore, the KLT is not appropriate for image coding
quantization in the transform domain is the same as that applications. Fortunately, there exists a unitary transform
in the signal domain and is independent of the transform. that performs nearly as well as the KLT on natural images
This greatly eases the encoding process where otherwise but without the disadvantages of KLT. This leads us to
an inverse transform is needed to find the distortion in the the Discrete Cosine Transforms (DCT).
original domain. The Discrete Cosine Transform kernel matrix is defined
Image signals are two-dimensional signals. By nature as follows,
the correlation between pixels is not separable, so a
non-separable transform should be applied to decorrelate π(2n + 1)k
tkn = u(k) cos , (21)
the pixels. However, since a 2-D separable correlation 2N
model provides practical simplicity and sufficiently good
performance, 2-D separable transforms are widely used where
1
in practical image coding systems. A 2-D separable
√ , if k = 0;
transforms can be easily implemented with two steps u(k) =
N (22)
of 1-D transforms for both all rows and all columns
2
, if k = 0.
subsequently. Similarly, for 3-D video signals, 3-D N
separable transforms can be implemented with 1-D
transforms in each direction: horizontal, vertical and The DCT has some very interesting properties. First,
temporal, separately. it is verified that the DCT is very close — in terms of
In practice, the block transforms are not applied to a energy compaction and decorrelation — to the optimal
whole image itself. Normally, the image is divided into KLT for a highly correlated first-order stationary Markov
subimages or blocks and each block is then transformed sequence [22]. Secondly, its transform kernel is a real
and coded independently. The transform coding based function, so only the real part of the transform domain
on a small block size does not necessarily degrade too coefficients of a natural image or video must be coded.
much the coding efficiency, because: (1) from the VQ Moreover, there exist fast algorithms for computing
theory, we learned that beyond a certain vector size, the the DCT in one or two dimensions [25–28]. All these
coding gain increase tends to saturate, so in this case, have made DCT a popular transform in various image
larger block sizes don’t bring us significant additional and video coding schemes. For a natural image block,
coding gain anyway; (2) natural images and video are after the DCT, the DC coefficient is typically uniformly
generally nonstationary signals, dividing them into small distributed, whereas the distribution for the other
blocks is particularly efficient in cases where correlations coefficients resembles a Laplacian one.
are localized to neighboring pixels, and where structural There are some other transforms that also could be
details tend to cluster; and (3) the inter block correlation used for image and video coding, such as Haar Transform,
can still be exploited by predictive coding techniques for or a Walsh-Hadamard Transform. The reason to use
certain transform domain coefficients, for example, the DC them is not because of their performance but because
component. From a theoretical analysis and simulation of their simplicity.
results, it is shown that for natural images the block size
is optimal around 8 to 16. In most image and video coding 3.8.2. Subband Decomposition. Another form of linear
standards such as JPEG, MPEG, a value of 8 has been transformation that brings energy compaction and decor-
chosen. Recently, there is also a trend to use adaptive block relation is subband decomposition. In subband decomposi-
transforms where transform blocks may adapt to signal tion (called analysis process), the source to be compressed
local statistics. The block processing has a significant is passed through a bank of analysis filters (filter bank)
drawback, however, since it introduces a distortion followed by critical subsampling to generate signal sub-
termed blocking artifact, which becomes visible at high bands. Each subband represents a particular portion of the
compression ratios, especially in image regions with low frequency spectrum of the image. At the decoder, the sub-
local variance. Lapped Orthogonal Transforms (LOT) [30] band signals are decoded, upsampled and passed through
attempt to reduce the blocking artifacts by using smoothly a bank of synthesis filters and properly summed up to
overlapping blocks. However, the increased computational yield the reconstructed signal. This process is called the
complexity of such algorithms does not seem to justify wide synthesis process. The fundamental concept behind sub-
replacement of block transforms by LOT. band coding is to split up the frequency band of a signal
1040 IMAGE AND VIDEO CODING
and then to code each subband using a coder and bit rate As a special case to subband decomposition, wavelet
accurately matched to the statistics of the band. decomposition allows nonuniform tiling of the time-
Compared with block transforms, subband decomposi- frequency plane. All wavelet basis functions (baby
tion is normally applied to the entire signal and thus it can wavelets) are derived from a single prototype (mother
decorrelate a signal across a larger scale than block trans- wavelet) by dilations (scaling) and translations (shifts).
forms, which translates into more potential coding gain. Besides the perfect reconstruction property, wavelet filters
At high compression ratios, block transform coding suf- used for image and video compression face additional and
fers severe blocking artifacts at block boundaries whereas often conflicting requirements, such as, compact support
subband coding does not. The capability to encode each (short impulse response) of the analysis filters to preserve
subband separately in accordance with its visual impor- the localization of image features; compact support of the
tance leads to visually pleasing image reconstruction. As synthesis filters to prevent spreading of ringing artifacts;
a subset of subband decomposition, wavelet decomposi- linear phase to avoid unpleasant waveform distortions
tion provides an intrinsic multi-resolution representation around edges; and orthogonality to provide preservation of
of signals that is important for many attractive image energy. Among them, orthogonality is mutually exclusive
and video coding functionalities, such as adaptive cod- with linear phase in two-band FIR systems, so it is often
ing, scalable coding, progressive transmission, optimal bit sacrificed for linear phase. More information on wavelets
allocation and rate control, error robustness, and so forth. and filter banks can be found in [146–148]. Wavelet
It has been shown that the analysis and synthesis decomposition of discrete sources is also often referred
filters play an important role in the performance of to as Discrete Wavelet Transforms (DWT).
the decomposition for compression purposes. One of the Two-band systems are the basic component of most sub-
original challenges in subband coding was to design band decomposition schemes. Recursive application of a
subband filters that cover well the desired frequency band two-band filter bank to the subbands of the previous stage
but without aliasing upon the reconstruction step caused yields subbands with various tree structures. Examples
by the intermediate subsampling. The key advance was are uniform decomposition, octave-band (pyramid) decom-
the development of quadrature mirror filters (QMF) [29]. position, and adaptive or wavelet-packet decomposition.
Although aliasing is allowed in the subsampling step at Among them, pyramid decomposition is the most widely
the encoder, the QMF filters cancel the aliasing during the used in image and video coding where a multi-resolution
reconstruction at the receiver. These ideas continue to be representation of image and video is generated. Based on
generalized and extended. the fact that most of the frequency of an image is concen-
A two-band filter bank is illustrated in Fig. 12. The trated in low-frequency regions, pyramid decomposition
filters F0 (ω) and F1 (ω) are the analysis lowpass and further splits the lower frequency part while keeping
highpass filters, respectively, while G0 (ω) and G1 (ω) are the high-pass part at each level untouched. Figure 13
the synthesis filters. In this system, the input/output illustrates such 2-D separable pyramid decomposition
relationship is given by intuitively. Pyramid decomposition provides inherent spa-
tial scalability in the encoded bit stream. On the other
1 hand, an adaptive wavelet transform or wavelet-packet
X (ω) = [F0 (ω)G0 (w) + F1 (ω)G1 (ω)]X(ω) decomposition chooses the band splitting method accord-
2
1 ing to the local characteristics of the signal source in each
+ [F0 (ω + π )G0 (w) subband to achieve better compression.
2
Although applied to a whole image frame, the discrete
+ F1 (ω + π )G1 (ω)]X(ω + π ), (23) wavelet transform is not as complicated as it seems.
Lifting [48] is a fast algorithm that can be used to
where the underlined term is where the aliasing occurs. efficiently compute DWT transforms while providing some
Perfect reconstruction can be achieved by removing new insights on wavelet transforms.
the aliasing distortion. One such condition could be, As we mentioned before, one advantage of subband
G0 (ω) = F1 (w + π ) and −G1 (ω) = F0 (w + π ). Furthermore coding over block transform coding is the absence of the
QMF filters achieve aliasing cancellation by choosing, blocking effect. However, it introduces another major
F0 (ω) = F1 (ω + π ) = −G0 (ω) = G1 (ω + π ), where the high- artifact — a ringing effect which occurs around high-
pass band is the mirror image of the lowpass band in the contrast edges due to the Gibbs phenomenon of linear
frequency domain. filters. This artifact can be reduced or even removed by an
L (t ) L′(t )
F0(w) 2 2 G0(w)
x (t ) x ′(t )
+
Figure 12. Two-band wavelet analysis/synthesis H (t ) H ′(t )
system. F0 ( ) and F1 ( ) are the lowpass and F1(w) 2 2 G1(w)
highpass analysis filters, respectively; G0 ( ) and
Channel
G1 ( ) are the lowpass and highpass synthesis
filters, respectively. Analysis filtering Synthesis filtering
IMAGE AND VIDEO CODING 1041
Previous frame
∆t
Current frame
time t
x
Moving object y
Stationary
background
Displacement
Figure 14. Illustration of motion compensation vector (∆ x,∆y) Shifted object
and estimation concept.
applied to each row of an image and then on each column. of de-correlating samples in different frames using motion
The key concept of SA-DWT is to keep the phase of each information is called motion compensation. The concept
transformed coefficient aligned so that locality and spatial of motion estimation and compensation is illustrated
relation between neighboring pixels are well preserved. in Fig. 14. Motion compensation is probably the most
Based on the available shape information, SA-DWT starts important factor in video coding that dominates most of the
by searching disjoined pixel segments in each row of an significant video coding advancements in recent years. It
arbitrarily-shaped region and applies the normal DWT seems that further improvements of motion compensation
to each of the segments separately but the subsampling are a continuing need.
positions of each segment are aligned no matter where The analogies to the normal prediction techniques and
each segment starts. The subsampled coefficients are put transformations in the temporal direction considering
in their corresponding spatial positions so that relative motion information are motion compensated predictions
positions are preserved for efficient decorrelation in later or motion predictions and motion compensated transfor-
steps. The same operation is then applied for each mations or spatiotemporal transformations, respectively.
column of the row-transformed coefficients. SA-DWT also Motion estimation is the first step of these techniques,
keeps the number of transform coefficients equal to the which is essentially a search process that normally
number of pixels in an original shape but with much involves heavy computation. Fortunately, there are
improved decorrelation properties across an arbitrarily- many fast yet efficient motion estimation algorithms
shaped object texture. It has been shown that SA-DWT available today.
combined with advanced quantization and entropy coding Ideally, each pixel in a video frame should be assigned
techniques can achieve much improved efficiency over SA- a motion vector that refers to a pixel most correlated with
DCT for still image object coding. Moreover, it will not it. However, it requires not only increases in the already
suffer from the blocking artifacts existing in SA-DCT. heavy computational load for motion vector search, but
also a significant amount of bits to encode the motion
3.9. Motion Estimation and Compensation vector overhead. Therefore, in practice, pixels are usually
grouped into a block with fixed or variable size for the
The prediction or transformation techniques discussed so motion estimation. Variable block size can adapt to the
far all aim at de-correlating highly correlated neighboring characteristics and object distribution in video frames
signal samples. They are especially effective for spatial and achieve better estimation accuracy and motion vector
neighbors within an image or video frames. Video coding efficiency.
is a set of temporal samples that capture a moving There are many motion estimation algorithms avail-
scene across time. In a typical scene there is a great able, for example, block matching, where the motion vector
deal of similarity or correlation between neighboring is obtained by searching through all the possible posi-
images of the same sequence. However, unless the video tions according to a matching criterion; hierarchical block
sequence has slow motion, directly extending the above de- matching, where the estimated motion vectors are succes-
correlating techniques to the temporal direction would not sively refined from larger size blocks to smaller size ones;
always be very effective. The direct temporal neighboring gradient matching, where the spatial and temporal gra-
samples that are spatially collocated are not necessarily dients are measured to calculate the motion vectors, and
highly correlated because of the motion in the video phase correlation, where the phases of the spectral compo-
sequence. Rather, the pixels in neighboring frames that nents in two frames are used to derive the motion direction
are more correlated are along the motion trajectory or and speed. Among them, gradient matching and phase cor-
optical flow and there usually exists a spatial displacement relation techniques actually calculate the motion vectors
between them. Therefore, an efficient signal decorrelation of moving objects rather than estimating, extrapolating
scheme in the temporal direction should always operate or searching for them. Block matching is the simplest
on pixels along the same motion trajectory. Such a spatial and best studied one, and it is being widely used in
displacement of visual objects (pixels, blocks or objects) practical video coding systems. To reduce the complex-
is called a motion vector and the process of determining ity of a brute-force full search while keeping the motion
how objects move from one frame to another or finding the vector accuracy, many fast search schemes have been
motion vector is called motion estimation. The operation developed for block matching algorithms, for example,
IMAGE AND VIDEO CODING 1043
3-step search (TSS) [49], Diamond search (DS) [50,51], coding schemes can be used to efficiently encode the
Zonal-based search [52], and so on. Advanced search algo- prediction residue images with a little modification on
rithms [53–55] even take advantage of the interframe the parts that heavily depend on the statistics of pixel
motion field prediction to further speed up the motion values, for example, the entropy coding tables. Indeed,
estimation process. many existing video coding systems extend the still image
Moreover, because the motion of objects in a video coding methods to encode the residue image and achieve
scene is not just limited to 2D translation, global great results, for example, MPEG and H.26x coding
motion estimation algorithms [56] try to capture the true standards using block-based motion compensation and
object motion and further improve video coding efficiency block transformation for the residue coding.
by using an extended set of motion parameters for Unfortunately, block-based systems have trouble cod-
translation, rotation, and zooming. ing sequences at very low-bit rates and the resultant
coded images have noticeable blocking artifacts. The high-
3.9.1. Motion Compensated Prediction. The most wide- frequency components caused by the blocking artifacts
ly used technique for de-correlating the samples along in the residue image partly change the high-correlation
a temporal motion trajectory is motion compensated nature of the residue. One solution to this problem is a
prediction. Motion compensated prediction forms a lowpass filter in the motion compensation loop (either on
prediction image based on one or more previously encoded the reference image or on the predicted image) to remove
frames (not necessarily past frames in display order) the high-frequency components. Another solution is to
in a video sequence and subtracts it from the frame apply new signal decomposition techniques that match
to be encoded to generate a residue or error image to the statistics of the residue. Matching Pursuit [61–63] is
be encoded by various coding schemes. In fact, motion one of these techniques that can be used to efficiently
compensated prediction can be viewed as an adaptive encode the prediction residue image. Instead of expanding
predictor where the goal is to find minimum difference the motion residual signal on a complete basis such as
between the original image and the predicted image. the DCT, an expansion on an overcomplete set of separa-
Different techniques developed for improving motion ble Gabor functions, which do not contain artificial block
prediction are just variations of the adaptive predictor. edges, is used. Meanwhile, it removes grid positioning
For example, integer-pel motion compensation is the restrictions, allowing elements of the basis set to exist at
basic form of the adaptive predictor where the choices any pixel resolution location within the image. All these
allow the system to avoid the artifacts most often produced
of different predictions are signaled by the motion
by low-bit rate block-based systems.
vectors; fractional-pel (half-pel, quarter-pel, or 1/8-pel)
Motion compensated prediction can work efficiently
motion prediction is a refinement of integer-pel motion
with dependency only on one immediate previous frame.
prediction where the choices of predictions are increased
This makes it very suitable for real-time video applica-
to provide more accurate predictions and they are signaled
tions, such as video phone and video conferencing, where
by the additional fractional-pel precision [57]; P-type
low delay is a critical prerequisite. As with any prediction
prediction is formed by predictors using only a past
coding schemes, the major drawback of predictive coding
frame as references; B-type prediction uses both the past
is the dependency on previously coded frames. This is
frame and future frame; long-term prediction maintains
especially more important for video. Features like random
a reference that is used frequently by other frames;
access and error robustness are limited in motion com-
advanced video coding schemes also use multiple frames
pensated predictive coding and transmission errors could
as the input to the adaptive predictor to maximize
be propagated and accumulated (called error drifting).
the chance of further reducing the prediction errors
Moreover, a predictive scalable coder must design ways to
with the expense of increased encoding complexity and
compensate the loss caused by the unavailability of part
more overhead for predictor parameters; overlapped block
of the reference bit stream.
motion compensation (OBMC) [57] and de-blocking or loop
filters in the prediction loop are ways to exploit more 3.9.2. Motion Compensated Transformations. As we
spatially correlated pixels at the input of the adaptive mentioned, direct extension of 2-D spatial signal decompo-
predictor without the need to transmit the overhead for sition to 3-D spatiotemporal decomposition without motion
more predictor parameters, while improving the visual compensation [64] is not efficient especially for moderate
quality of the decoded image; adaptive block size in to high motion sequences, though they offer computational
motion prediction forces the motion predictor to adapt to simplicity and freedom from motion artifacts. Without
the local characteristics of an image; global motion based motion compensation, the decoded video may normally
prediction is not a linear prediction anymore and it forms present severe blurring and even ghost image artifacts.
more accurate prediction with complex warping operations With added complexity, motion compensated 3-D trans-
considering the fact that the motion of video objects is not formations align video samples along the motion tra-
limited to 2-D translations. There are continuing research jectory to better decorrelate them for better coding effi-
efforts on choosing better adaptive predictors, and there is ciency. In addition, motion compensated transformations
always a delicate trade-off between complexity, overhead provide enhanced functionalities like scalability, easier
bits for predictor parameters, and prediction errors. rate-distortion optimization, easier error concealment and
Ideally, if the motion compensated prediction simply limited error propagation in error-prone channels, and
removes the temporal correlation without affecting the so on. A particularly interesting example is the spa-
spatial correlation between pixels, any efficient image tiotemporal subband/wavelet coding of video, where 3-D
1044 IMAGE AND VIDEO CODING
subband/wavelet transforms are applied to the motion In fact, if the target is just concerned about bit
aligned video data [65–68]. One of the key issues is to rate versus quality, rate control and bit allocation are
align the pixels in successive frames with motion informa- closely related and sometimes it is hard to distinguish
tion. This is still an active research topic and some good the difference between them. In this case, a generic
results have been reported. Another issue is related to the rate control (bit allocation) problem can be formulated
boundary effect that may be present in the temporal direc- as the problem to minimize the overall distortion
tion due to limited-length wavelet transforms. A solution (quantization error),
is proposed in [69] to use a lifting algorithm.
One of the major drawbacks of 3-D transformations is D= Di , (24)
that normally a few frames are involved and there exist i
considerable encoding and decoding delays. Therefore, it
is not suitable for real-time communications. under the constraint of a given total bit rate
constraint of a given bit budget. Integer programming the- should be within a threshold θ , the optimal bit allocation
ory is normally used in this approach to find the optimal can be derived as,
discrete solution. Because the operational R-D approach 2
can achieve exactly the practical optimal performance for 1 σi
log2 , if σi2 > θ ;
completely arbitrary inputs and choices of discrete quan- Ri = 2 θ (27)
tizers, it is often preferred in a practical coding system. 0, if σi2 ≤ θ ;
However, it requires that: (1) the number of admissible
quantizers or quantizer combinations is tractable for prac- and the distortion for each component is given by,
tical coding systems; (2) the coding system is capable of
θ, if σi2 > θ ;
providing all the operational R-D points easily or tolerates Di = (28)
the possible long delays and high complexity caused by σi2 , if σi2 ≤ θ ;
calculation of these R-D points. Therefore, practically the
This result shows that all the components should have
operational approach is normally applied to independent
the same quantization error except for those with energy
coding cases or scalable coding schemes where a set of
σi2 lower than certain threshold θ . By adjusting the
operational points can be easily obtained from the embed-
value θ , various bit rates can be achieved using the
ded bit streams. The model-based approach is normally
above equations.
applied in scenarios where delay and complexity cannot
Using such a fixed bit allocation, the number of bits
be tolerated or dependent coding cases where the combina-
used for each component is fixed for all compression units
tions of admissible quantizers are out of control because of
of the source, so the allocation information only needs to be
the dependency. If the input statistics are known, model-
sent out once and all other bits can be used to encode the
based bit allocation can be used to derive a predetermined
coefficient values. Such an optimal bit allocation is totally
bit allocation strategy that best fits the input source. Oth-
dependent on the statistical characteristics of the source;
erwise, it will work in a feed-forward fashion based on
specifically, the variances of transform components are
the heuristics where the parameters of the model can be
needed and in general, the source has to be stationary.
updated adaptively and hopefully converge to an optimal
While producing a constant-rate bit stream, coders using
bit allocation plan in the long run. The performance of
fixed bit allocation cannot adapt to the spatial or temporal
model-based bit allocation can be also improved through
changes of the source, and thus coding distortion may vary
multipass coding to refine the bit allocation among com-
from one compression unit to another due to changes of
pression units.
the source. Conversely, adaptive bit allocation schemes
For operational R-D based bit allocation, the problem
can be used to deal with the random changes of a source.
normally involves finding the convex hull where the
However, overhead bits will be introduced to indicate
optimal points lie on [82]. Figure 15 shows an example
the difference in bit allocation schemes. Combining with
of the convex hull on an R-D plot. Finding a convex hull
the possible entropy coding afterwards, a closed-form
of a set of points has been extensively investigated and
formula is generally not available to explicitly indicate
many fast algorithms have been designed [83]. In order
the relationship between bits and distortion. A feed-
to avoid the impractical exhaustive computation of all
forward heuristic or multipass approach can be applied
the combinations, an algorithm was designed to find the
to hopefully obtain an optimal bit allocation on average. A
convex hull in a limited number of computations [84]. For
simple example is the threshold coding that is used widely
a single optimization point, the BFOS algorithm can be
in many image and video coding standards nowadays.
used to quickly allocate the available bits [85].
Threshold coding fully takes advantage of the energy
For a Gaussian source and MSE distortion criterion,
packing property of transformations by encoding only
assuming that the distortion error of each component
those significant coefficients. In such a coding scheme,
only those coefficients whose energy is higher than the
threshold are quantized and encoded; all others will be
treated as zero and discarded. This significantly increases
the number of zeros and reduces that of significant
coefficients to be encoded. Besides, the threshold could be
much larger than the quantization step to take advantage
of efficient zero coding. This is the basis for many dead-
zone quantizers. The threshold itself does not need to
Rate
When delivering image or video data over time-varying more consecutive zeros toward the end of the sequence;
channels such as wireless channels and the best effort thus, it makes run-length coding more efficient. Zigzag
IP networks, the bit budget that best fits the channels scanning takes advantage of the inherent statistics in the
could not be predetermined at the encoder time. Scalable image and video data and arranges it in an order that is
image [86] and video coding [38,87] can dynamically more convenient for subsequent coding. Figure 16 gives
adjust the bit rate on the fly to adapt to the channel an example of such a zigzag scan order for an 8 × 8 DCT
conditions (bandwidth, throughput, or error rate, etc.). domain coefficients, which is commonly used in JPEG and
This essentially results in a new coding scheme where the MPEG image and video coding standards.
rate control is put off from the encoder time to the delivery
time. How to quickly and optimally allocate the bits on the 3.12.3. Zerotree coding. For subband/wavelet based
fly is an active research topic [86,88]. decompositions, symbol formation is a little bit different.
In the tree structure of octave-band wavelet decomposition
3.12. Symbol Formation shown in Fig. 17, each coefficient in the high-pass
As an important and sometimes crucial part of final step in subbands has four coefficients corresponding to its spatial
image and video compression systems, symbol formation position at a higher scale. Because of this very structure of
essentially organizes the final coefficients that may or may the decomposition, there should be a good way to arrange
not be quantized in a form that is more suitable for efficient its coefficients to achieve better compression efficiency.
entropy coding subsequently. Common symbol formation Based on the statistics of the decomposed coefficients in
techniques are run-length coding (RLC), zig-zag scanning, each subband, it has been observed that if a wavelet
zerotree scanning, and context formation for conditional coefficient at a coarse scale is insignificant with respect
entropy coding, and so forth.
to a given threshold T, then all wavelet coefficients of drive a conditional entropy coding scheme. However, this
the same orientation in the same spatial location at would require overhead bits in explicitly coding the class of
a finer scale are also likely to be insignificant with the subgroup. Fortunately, almost any data to be encoded
respect to T. EZW image compression is a wavelet based in image and video compression are very dependent on its
coder that was first proposed by Shapiro [89] in 1993. neighboring context. Such context information can be used
It uses the zerotree structure and takes advantage of to drive different entropy coders that are most suitable
the self-similarity between subbands at different scale. for the data to be coded. Context-based entropy coding
A zerotree [89] is formed with the root coefficient and is essentially a prediction coding scheme. The best part
all its children are insignificant. Figure 17 shows such is that the prediction is implicit and the dependency or
a tree structure. An encoder can stop traversing the correlation between the data and the context does not have
tree branch once it detects that the node is a zerotree to be known explicitly.
root. Many insignificant coefficients at higher frequency As we have discussed in the entropy coding part, one
subbands (finer resolutions) can be discarded. Zerotree challenge of context formation is to keep the contexts as
coding can be considered as the counterpart of zigzag scan small as possible while reflecting as much dependency
order in subband/wavelet coding. It also uses successive- as possible. Lower entropy can be achieved through
approximation quantization to quantize the significant higher order conditioning (larger contexts). However,
coefficients and context-based arithmetic coding to encode larger context implies a larger model cost [92], which
the generated bits. Bits are coded in order of importance, reflects the penalties of context dilution when count
yielding a fully embedded code. The main advantage statistics must be spread over too many contexts, thus
is that the encoder can terminate the encoding at any affecting the accuracy of the corresponding estimates,
point, thereby allowing a target bit rate to be met exactly. especially for adaptive context-based coding. Moreover,
Similarly, the decoder can also stop decoding at any point larger contexts means more memory is needed to store
resulting in the best image at the rate of the truncated bit the probability tables. This observation suggests that the
stream. The algorithm produces excellent results without choice of context model should be guided by the use,
any prestored tables or codebooks, training, or prior whenever possible, of available prior knowledge on the
knowledge of the image source. data to be modeled, thus avoiding unnecessary learning
In general, ordered data are more easily compressed costs. Often explicit prediction and context-based coding
than un-ordered data. But when organizing data in an schemes can be combined to reduce the model cost even
order that is not predefined, the order information has to be though they might be based on the same context [93].
encoded in the bitstreams as overhead. Zigzag scan works
well because it is a predetermined order and fits the source 3.13. Preprocessing and Postprocessing
statistics well. On the other hand, for zerotree coding, In addition to the core techniques we have discussed,
partial order information such as a significance map has additional preprocessing and postprocessing stages are
to be sent to the decoder. An efficient coding algorithm extensively used in image and video compression systems
can be derived, if a good balance between overhead in order to render the input or output images in a more
information for ordering and a good ordering scheme for appropriate format for the purpose of coding or display.
subsequent entropy coding is achieved. Set Partitioning in The possible operations include but are not limited to de-
Hierarchical Trees (SPIHT) algorithm [90], rearranges the noising, format/color conversions, compression artifacts
transformed coefficients by partial ordering according to removal, error concealment, and so on.
their magnitudes with a set partitioning sorting algorithm, As with any other signals captured from a natural
transmits refinement bits in an order according to the source, image and video data normally contain noise in
ordered bitplane, and exploits the self-similarity of the them. The problem becomes more severe as many low-
wavelet transform across different scales. The algorithm end consumer-grade digital image and video devices are
enables progressive transmission of the coefficient values gaining in popularity. As we learned from information
by ordering the coefficients and transmitting the most theory, truly random noise not only is very hard to
significant bits first. The ordering information also makes compress but also degrades the image and video quality.
quantization more efficient by first allocating bits to It is very common to use a de-noising filter [94,95] prior
coefficients with larger magnitudes. An efficient coding to coding in order to enhance the quality of the final
scheme of the ordering information is also included in the pictures and to remove the various noises that will affect
algorithm. The results [90] show that the SPIHT coding the performance of compression algorithms.
algorithm in most cases surpasses those schemes obtained The simplest compression techniques are based on
from various zerotree algorithms [89]. interpolation and subsampling of the input image and
video data to match with the resolution and format of
3.12.4. Context Formation. It can be shown that when the target display devices. For example, a mobile device
a data source is partitioned into different classes, the is normally only equipped with limited display screen
attainable entropy can be smaller than the unpartitioned resolution, and a subsampled image that matches its
one. Thus, it suggests that we can achieve more coding screen size can just meet its needs while greatly reducing
efficiency when a source is divided into different groups the bit rate. The challenge of subsampling is how to
with significant different probability distributions. One maintain the crispness of a picture without introducing
way is to use a search algorithm to find which subgroup aliasing, that is, how to select the lowpass filters, which is
the encoded data belong to and use this information to a classic image processing problem [96,97].
1048 IMAGE AND VIDEO CODING
The human eyes are more sensitive to the difference being made rapidly. In a limited space, we can grasp
in brightness than to the differences in color. To exploit only the fundamentals of these techniques. Hopefully,
this in image and video coding systems, the input images these basic principles can inspire and at least help
are often converted to YUV components (one luminance readers to understand new image and video compression
Y and two chrominance differences U and V) instead of technologies.
using RGB components (red, green, blue). Furthermore,
the components U and V often can be subsampled at a
4. IMAGE AND VIDEO CODING STANDARDS
lower resolution.
Image format conversion is also a common postprocess-
We have introduced a number of well-known basic
ing step for image and video decoding. Often the image
compression techniques commonly used in image and
or video signal is not encoded exactly as the same format
video coding. Practical coding systems all contain one or
supported by the display devices, for example, in terms of
more of these basic algorithms. Now, we present several
image size, frame rate, interlaced or progressive display,
image and video coding standards. The importance of
color space, and so on. Postprocessing must be able to con-
standards in image and video coding is due to the fact
vert these different formats of image video into the display
that an encoder and a decoder of a coding algorithm may
format. There are active studies on this issue, especially on
be designed and manufactured by different parties. To
resizing of images [96,97], frame rate conversion [98,99],
ensure the interoperability, that is, one party’s encoded
interlaced to progressive video (de-interlacing) [100–102].
bit stream can be decoded by the other party’s decoder,
It is normal to expect a certain degree of distortion of the
a well-defined standard has to be established. It should
decoded images for very low bit rate applications though
be pointed out that a standard, such as MPEG-1, MPEG-
a good coding scheme should introduce these distortions
2, or MPEG-4, only defines the bit stream syntax and
in areas less annoying for the users. Postprocessing can
the decoding process, leaving the encoding part open to
be used to further reduce these distortions. For block
different implementations. Therefore, strictly speaking,
transform coding, solutions were proposed to reduce
there is no such thing as video quality of a particular
the blocking artifacts appearing at high-compression
standard. The quality of a standard compliant bit stream
ratios [103–108]. Similar approaches have also been
at any given bit rate depends on the implementation of
used to improve the quality of decoded signals in other
the encoder that generates the bit stream.
coding schemes such as subband/wavelet coding, reducing
different kinds of artifacts such as ringing, blurring,
4.1. Image Coding Standards
mosquito noise, and so on [109,110]. In addition to
improving the visual quality, filtering in the prediction 4.1.1. ITU-T Group 3 and Group 4 Facsimile Compres-
loop such as in motion compensation could also improve sion Standards. One of the earliest lossless compression
the coding efficiency [111,112]. standards are the ITU-T (former CCITT) facsimile stan-
When delivering encoded image and video bitstreams dards. The most common ones are Group 3 [123] and
over error-prone or unreliable channels such as the best- Group 4 [124] standards. Group 3 includes actually two
effort Internet and wireless channels, it is quite possible distinct compression algorithms, known as Modified Huff-
to receive a partially corrupted bit stream due to packet man (MH) coding and Modified READ (Relative Element
losses or channel errors. Error concealment is a type of Address Designate) coding (MR) modes. The algorithm of
postprocessing technique [113–116] used to recover the Group 4 standard is commonly called Modified Modified
corrupted areas in an image or video based on prior READ (MMR) coding.
knowledge about the image and video characteristics, MH coding is a one-dimensional coding scheme that
for example, spatial correlation for images or motion encodes each row of pixels in an image independently. It
information for video. Other related preprocessing and uses run-length coding with a static Huffman coder. It
postprocessing techniques are image and video object reduces the size of the Huffman code table by using a
segmentation and composition for object-based video prefix markup code for runs over 63 and thus accounts for
coding such as in MPEG-4 [117]; scene change detection for the term modified in the name. Each scan line ends with
inserting key-frames in video coding [118]; variable frame- a unique EOL (end of line) code and it doubles as an error
rate coding [119] according to the video contents and the recovery code.
related frame interpolation [120]; region of interests (ROI) To exploit the fact that most transitions in bi-level
coding of images [121]; and so forth. facsimile images occur at one pixel to the right or left
Image and video coding is a very active research or directly below a transition on the line above, MR
topic and progress on new compression technologies is coding uses a 2-dimensional reference (Figure 18) with
a1b1 b1 b2
Reference line
Coding line
Entropy
FDCT Quantizer …...
encoder
run-length coding and a static Huffman coder. The name 4.1.3. JPEG. This is probably the best known ISO/ITU-
Relative Element Address Designate (READ) is based on T standard created in the late 1980s. The JPEG (Joint
the fact that only the relative positions of transitions Photographic Experts Group) is a DCT-based standard
are encoded. The most efficient MR coding mode is that specifies three lossy encoding modes, namely,
vertical mode (interline prediction) where only the small sequential, progressive, and hierarchical, and one lossless
difference between the same transition positions in two encoding mode. The baseline JPEG coder [122] is the
adjacent lines is encoded. It also includes EOL codewords sequential encoding in its simplest form. Baseline mode
and periodically includes MH-coded lines (intra-lines) to is the most popular one and it supports lossy coding only.
minimize the effect of errors. The MR coding scheme, Figure 19 shows the key processing steps in a baseline
though simple, provides some very important concepts for JPEG coder. It is based on the 8 × 8 block DCT, uniform
modern image and video coding. We will see later in this scalar quantization with a perceptual weighting matrix,
section, it has much in common to the intra- and inter zigzag scanning of AC components, predictive coding of
video coding schemes. DC components, and Huffman coding (or arithmetic coding
MMR coding in the Group 4 facsimile standard is with more complexity but better performance).
based on MR coding in Group 3. It modifies the MR The progressive and hierarchical modes of JPEG are
algorithm to maximize compression by removing the MR both lossy and differ only in the way the DCT coefficients
error prevention mechanisms. There is no EOL symbols are coded or computed, when compared to the baseline
and MH coding in Group 4. To enable the MR coding mode. They allow a reconstruction of a lower quality or
in Group 4, when coding the first line, a virtual white lower resolution version of the image, respectively, by
line is used as the reference. The transmission errors are partial decoding of the compressed bit stream. Progressive
corrected with lower level control procedures. mode encodes the quantized coefficients by a mixture of
spectral selection and successive approximation, while
the hierarchical mode utilizes a pyramidal approach to
4.1.2. JBIG. Group 3 and Group 4 fax coding has
computing the DCT coefficients in a multi-resolution way.
proven adequate for text-based documents, but does
The lossless mode or lossless JPEG is a lossless coding
not provide good compression or quality for documents
scheme for continuous-tone image. It is an adaptive
with handwritten text or continuous tone images. As a
prediction coding scheme with a few predictors to choose
consequence, a new set of fax standards, such as JBIG
from, which is similar to the DC component coding in the
and the more recent JBIG2, has been created since the
baseline mode but operated on the image domain pixel
late 1980s.
instead. The prediction difference is efficiently encoded
JBIG stands for Joint Bi-level Image experts Group
with Huffman coding or arithmetic coding. It should be
coding: The algorithm is defined in CCITT Recommen-
noted that lossless JPEG is not DCT-based as in the lossy
dation T.82 [125,126] and it is a lossless bi-level image
modes. It is a pure pixel domain predictive coding.
coding standard based on an adaptive arithmetic coding
with adaptive 2D contexts. 4.1.4. JPEG-LS. Not to be confused with the lossless
Besides more efficient compression, JBIG provides pro- mode of JPEG (Lossless JPEG), JPEG-LS [131] is the
gressive transmission of image through multi-resolution latest and totally different ISO/ITU-T standard for
coding. As a new emerging standard for bi-level image lossless coding of still images and which also provides
coding, JBIG2 [127] allows both lossy and lossless bi-level for near-lossless coding. The baseline system is based
image compression. It is the first international stan- on the LOCO-I algorithm (LOw COmplexity LOssless
dard that provides lossy compression of bi-level images COmpression for Images) [132] developed at Hewlett-
to achieve much higher compression ratio with almost Packard Laboratories.
no quality degradation. Besides higher compression per- LOCO-I combines the simplicity of Huffman coding
formance, JBIG2 allows both quality-progressive coding with the compression potential of context models. The
from lower to higher (or lossless) quality, and content- algorithm uses a nonlinear predictor with rudimentary
progressive coding, successively adding different types of edge detecting capability, and is based on a very simple
image data (e.g., first text, then halftones). The key tech- context model, determined by quantized gradients. A small
nology in JBIG2 is pattern matching predictive coding number of free statistical parameters are used to capture
schemes where a library of dynamically built templates is high-order dependencies, without the drawbacks of context
used to predict the repetitive character-based pixel blocks dilution. The prediction residues are modeled by a double-
in a document. sided geometry distribution with two parameters that
1050 IMAGE AND VIDEO CODING
Context
Fixed Predicted
Gradients
predictor values
c b c
a x +
Flat Adaptive
region? correction
Predictor Normal Compressed
Mode Normal bit stream
Run Mode
Image Run-length
Image Run samples Run code spec. Run
samples
counter coder
Modeler Coder
can be updated symbol by symbol based on the simple Group that is designed for different types of still images
context, and in turn are efficiently encoded by a simple allowing different imaging models within a unified system.
and adaptive Golomb code [133,134] that corresponds to JPEG 2000 is intended to complement, not replace, the
Huffman coding for a geometric distribution. In addition, current JPEG standards and it has two coding modes:
a run coding mode is defined for low-entropy flat area, a DCT-based coding mode that uses currently baseline
where runs of identical symbols are encoded using JPEG and a wavelet-based coding mode. JPEG 2000
extended Golomb coding with improved performance and normally refers to the wavelet-based coding mode and it
adaptability. Figure 20 shows the block diagram of the is based on the discrete wavelet transform (DWT), scalar
JPEG-LS encoder. This algorithm was designed for low- quantization, context modeling, arithmetic coding, and
complexity while providing high compression. However, it post-compression rate allocation techniques.
does not provide for scalability, error resilience, or other This core compression algorithm in JPEG 2000 is based
additional functionality. on independent Embedded Block Coding with Optimized
Truncation (EBCOT) of the embedded bit streams [91].
4.1.5. Visual Texture Coding (VTC) in MPEG-4. Visual The EBCOT algorithm uses a wavelet transform to
Texture Coding (VTC) [135] is the algorithm in the generate the subband coefficients, where the DWT can
MPEG-4 standard (see Section 4.2.5) used to compress be performed with reversible filters for lossless coding,
the texture information in photo realistic 3D models or nonreversible filters for lossy coding with higher
as well as still images. It is based on the discrete compression efficiency. EBCOT partitions each subband
wavelet transform (DWT), scalar quantization, zerotree into relatively small blocks of samples (called codeblocks)
coding, and context-based arithmetic coding. Different and encodes them independently. A multi-pass bitplane
quantization strategies are used to provide different coding scheme based on context-based arithmetic coding
SNR scalability: single quantization step (SQ) provides is used to code the original or quantized coefficients in
no SNR scalability; multiple quantization steps (MQ) each codeblock into an embedded bit stream in the order
provides discrete (coarse-grain) SNR scalability and bi- of importance along with the rate-distortion pairs at each
level quantization (BQ) supports fine grain SNR scalability pass of the fractional bitplane (truncation point). It seems
at the bit level. In addition to the traditional tree-depth that failing to exploit the inter-subband redundancy would
(TD) zerotree scanning similar to the EZW algorithm, have a sizable adverse effect on coding efficiency. However,
band-by-band (BB) scanning is also used in MPEG-4 this is more than compensated by the finer scalability that
VTC to support resolution scalability. MPEG-4 VTC also results from the multi-pass coding.
supports coding of arbitrarily shaped objects, by the means JPEG 2000 also supports many new features, such
of a shape adaptive DWT [36,150], but does not support as, compression of large images, single decompression
lossless texture coding. Besides, a resolution scalable architecture, compound documents, static and dynamic
lossless shape coding algorithm [136] that matches the region-of-interest (ROI), multiple component images,
shape adaptive wavelet decomposition at different scales content-based description, and protective image security.
is also adopted. The scalable shape coding scheme uses
fixed contexts and fixed probability tables that are suitable 4.2. Video Coding Standards
for the bi-level shape masks with large continuous regions
of identical pixel values. Most practical video coding systems including all interna-
tional video coding standards are based on a hybrid coder
4.1.6. JPEG 2000. JPEG 2000 [86,137] is the latest that combines motion compensation in the temporal direc-
emerging standard from the Joint Photographic Experts tion and DCT transform coding within each independent
IMAGE AND VIDEO CODING 1051
Rate control
Bit stream
Video in
- DCT Q VLC
Output
buffer
out
IDCT Q −1
+
Motion Frame
compensation buffer Motion
vectors
Motion
estimation
Encoder
Bit stream
in Input Q −1
VLD IDCT
buffer
Motion +
vectors Video out
Motion Frame
compensation buffer
Decoder Figure 21. A generic DCT-based predictive video
coder.
frame or prediction frame. A generic DCT-based predictive belong to the DCT-based motion predictive coder category.
video coding system is illustrated in Fig. 21. In the following sections, we provide brief summaries of
For independent frames, a DCT transform coder similar the video coders with emphasis on the differences of the
to the baseline JPEG is applied to compress the frame coding algorithms.
without using any information from other frames (referred
to as intracoding). For prediction frames, a compensation 4.2.1. H.261. The H.261 video codec [138] (1990), ini-
module first generates a predicted image from one or tially intended for ISDN teleconferencing, is the baseline
more previously coded frames based on the motion vectors video mode for most multimedia conferencing systems.
estimated and subtract it from the original frame to The basic coding unit in H.261 is macroblock that
obtain the motion predicted residue image. Another DCT contains one 16 × 16 or four 8 × 8 luminance blocks and
transform coder that fits the characteristics of the residue two 8 × 8 chrominance blocks. The H.261 codec encodes
image is then applied to further exploit the spatial video frames using an 8 × 8 DCT. An initial frame (called
correlation to efficiently compress the residue image. In an I or intra frame) is coded and transmitted as an
order to avoid the drifting errors caused by the mismatch independent frame. Subsequent frames are efficiently
between reference frames, the encoder normally embeds coded in the inter mode (P or predictive frame) using
the same decoding structure to reconstruct exactly the motion compensated predictive coding described above,
same reference frames as in the decoder. Optionally, where motion compensation is based on a 16 × 16 pixel
a loop filter to smooth out the blocking artifacts or block with integer pixel motion vectors and always
generate a better reference image may be inserted in referenced to the immediately previous frame. The DCT
the reconstruction loop to generate a visually better image coefficients are quantized with a uniform quantizer
and enhance the prediction efficiency. and arranged in a zigzag scanning order. Run-length
There have been several major initiatives in video coding combined with variable length coding is used
coding that have led to a range of video standards: ITU to compress the quantized coefficients into a video bit
video coding for video teleconferencing standards, H.261 stream. An optional loop filter is introduced after motion
for ISDN, H.262 (same as MPEG-2) for ATM/broadband, compensation to improve the motion prediction efficiency.
and H.263 for POTS (Plain Old telephone Service); H.261 is intended for head-and-shoulders type of scene
ISO video coding standards MPEG-1 (Moving Picture in video conferencing applications where only small,
Experts Group), MPEG-2 and MPEG-4; and the emerging controlled amounts of motion are present so the motion
standards such as H.264 | MPEG-4 Part 10 by JVT (Joint vector range can be limited. The supported video formats
Video Team of ITU and ISO). These coding systems all include both the CIF (Common Interchange Format with
1052 IMAGE AND VIDEO CODING
a resolution of 352 × 288 and YCbCr 4 : 2 : 0) and the QCIF including, a more flexible coding mode selection at the
(quarter CIF) format. All H.261 video is noninterlaced, macroblock level; a nonlinear quantization table with
using a simple progressive scanning pattern. increased accuracy for small values; an alternative zigzag
scan for DCT coefficients especially for the interlaced video
4.2.2. MPEG-1. The MPEG-1 standard [139] (1993) is coding; much increased permissible motion vector range;
a true multimedia standard that contains specifications new VLC tables for the increase bit rate range; customized
for audio coding, video coding, and systems. MPEG-1 was perceptual quantization matrix support; and dual prime
intended for storage of multimedia content on a standard prediction for interlaced video encoded as P pictures, which
CD-ROM, with data rates of up to 1.5 Mbits/s and a storage mimics the B picture prediction especially for low-delay
capacity of about 600 Mbytes. Noninterlaced CIF video applications.
format (352 × 288 at 25 fps or 352 × 240 at 30 fps) is used MPEG-2 also introduces some error resilience tools
to provide VHS-like video quality. such as independent slice structure, data partitioning
The video coding in MPEG-1 is very similar to the video to separate data with different importance, concealment
coding of the H.261 described above with the difference motion vectors in intra-pictures, and different scalabilities
that the uniform quantization is now based on perceptual as we discussed.
weighting criteria. The temporal coding was based on both MPEG-2 is probably the most widely used video coding
uni- and bi-directional motion-compensated prediction. standard so far that enables many applications, such as
A new B or bi-directionally predictive picture type is DVD, HDTV, and digital satellite and cable TV broadcast.
introduced in MPEG-1 which can be coded based on either
the next and/or the previous I or P pictures. In contrast, an
4.2.4. H.263, H.263+, and H.263++. The H.263 stan-
I or intra picture is encoded independently of all previous
dard [141] (1995) was intended for use in POTS confer-
or future pictures and a P or predictive picture is coded
encing and the video codec is based on the same DCT
based on only a previous I or P picture. MPEG-1 also allows
half-pel motion vector precision to improve the prediction and motion compensation techniques as used in H.261.
accuracy. There is no loop filter present in an MPEG-1 H.263 now supports 5 picture formats including sub-
motion compensation loop. QCIF (126 × 96) QCIF (176 × 144), CIF (352 × 288), 4CIF
(704 × 576) and 16CIF (1408 × 1152). Several major differ-
4.2.3. MPEG-2. The MPEG-2 standard [140] (1995) ences exist between H.263 and H.261 including, more accu-
was initially developed primarily for coding interlaced rate half-pel motion compensation in H.263 but integer-pel
video at 4-9 Mbits/s for broadcast TV and high quality in H.261; no loop filter in H.263 but optional in H.261
digital storage media (such as DVD video); it has now also since the optional overlapped block motion compensation
been used in HDTV, cable/satellite TV, video services over (OBMC) in H.263 could have the similar de-blocking filter
broadband networks, and high quality video conferencing effect; motion vector predictor in H.263 using the median
(same as H.262). The MPEG-2 standard includes video value of three candidate vectors while only the preceding
coding, audio coding, system format for program and one being used in H.261; 3-dimensional run length coder
transport streams, and other information related to (last, run, level) in H.263 versus (run, level) and an eob
practical implementations. symbol in H.261.
MPEG-2 video supports both interlaced and progressive As H.263 version 2, H.263+ [142] (1998) provides
video, multiple color format (4 : 2 : 0, 4 : 2 : 2 and 4 : 4 : 4), optional extensions to baseline H.263. These options
flexible picture size and frame rates, hierarchical or broaden the range of useful applications and improve
scalable video coding, and is backward compatible the compression performance. With these new options,
with MPEG-1. To best satisfy the needs of different H.263+ supports custom source format with different
applications, different profiles (subset of the entire picture size, aspect ratio, and clock frequency. It also
admissible bit stream syntax) and levels (set of constraints provides enhanced error-resilience capability with slice
imposed on the parameters of the bitstreams within a structured coding, prediction reference selection, motion
profile) are also defined in MPEG-2. vector coding with reversible VLC (RVLC), where RVLCs
The most distinguishing feature of MPEG-2 from are codewords which can be decoded in a forward
previous standards is the support for interlaced video as well as a backward manner, and independent
coding. For interlaced video, it can be either encoded as segment decoding. Backward-compatible supplemental
a frame picture or a field picture, with adaptive frame enhancement information can be embedded into the video
or field DCT and frame or field motion compensation at bit stream to assist operations in the receiver. In addition,
macroblock-level. MPEG-2 also offer another new feature temporal scalability can be provided by B picture, while
in providing the temporal, spatial, and SNR scalabilities. SNR and spatial scalabilities are enabled by newly defined
Temporal scalability is achieved with B-frame coding; EI pictures (referencing to a temporally simultaneous
spatial scalability is obtained by encoding the prediction picture) and EP pictures (referencing to both a temporally
error with a reference formed from both an upsampled low- preceding picture and a temporally simultaneous one).
resolution current frame and a high-resolution previous Figure 22 gives an example of mixed scalabilities enabled
frame; SNR scalability is provided by finely quantizing by these new picture types.
the error residue from the coarsely quantized low-quality H.263++ [143] (H.263 version 3, 2001) provides
layer (there is a drifting problem). more additional options for H.263, including, Enhanced
Compared with MPEG-1, a number of improvements Reference Picture Selection to provide enhanced coding
have been made to further improve the coding efficiency, efficiency and enhanced error resilience (particularly
IMAGE AND VIDEO CODING 1053
Temporal
enhancement
layer 3
Spatial EI B EI B EP
enhancement
layer 2
SNR
enhancement EI EP EP EP EP
layer 1
I P P P P
Base layer Figure 22. Example for mixed temporal,
SNR and spatial scalabilities supported in
H.263+.
against packet losses) through the use of multiple For reasons of efficiency and backward compatibility,
reference pictures; Data Partitioned Slice to enhance error video objects are coded in a hybrid coding scheme
resilience (particularly against localized corruption of bit somewhat similar to previous MPEG standards. Figure 23
stream) by separating header and motion vectors from outlines the basic approach of the MPEG-4 video
DCT coefficients in the bit stream and protecting motion algorithms to encode rectangular as well as arbitrarily-
vectors using a reversibly decodable variable length shaped input image sequences. Compared with traditional
coder (RVLC); and additional Supplemental Enhancement rectangular video coding, there is an additional shape
Information including support for interlaced video. coding module in the encoder that compresses the shape
information necessary for the decoding of a video object.
4.2.5. MPEG-4. Coding of separate audio–visual Shape information is also used to control the behavior of
objects, both natural and synthetic, leads to the latest the DCT transform, motion estimation and compensation,
ISO MPEG-4 standard [144] (1998). The MPEG-4 visual and residue coding.
standard is developed to provide users a new level of Shape coding can be performed in binary mode, where
interaction with visual content. It provides technologies the shape of each object is described by a binary mask,
to view, access and manipulate objects rather than pixels, or in grayscale mode, where the shape is described in a
with great error robustness at a large range of bit rates. form similar to an alpha channel, allowing transparency,
Application areas range from digital television, stream- and reducing aliasing. The binary shape can be losslessly
ing video, to mobile multimedia, 2D/3D games, and visual or lossy coded using context-based arithmetic encoding
communications. (CAE) based on the context either from the current video
The MPEG-4 visual standard consists of a set of object (intraCAE) or from the motion predicted video
tools supporting mainly three classes of functionalities: object (interCAE). By using binary shape coding for coding
compression efficiency, content-based interactivity, and its support region, grayscale shape information is lossy
universal access. Among them, content-based interactivity encoded using a block based motion compensated DCT
is one of the most important novelties offered by similar to that of texture coding.
MPEG-4. To support object-based functionality, the Texture coding is based on an 8 × 8 DCT, with
basic compression unit in MPEG-4 can be arbitrarily appropriate modifications for object boundary blocks for
shaped video object plane (VOP) instead of always the both intramacroblocks and intermacroblocks. Low-pass
rectangular frame. Each object can be encoded with extrapolation (LPE) (also known mean-repetitive padding)
different parameters, and at different qualities. Tools is used for intra boundary blocks and zero padding is
provided in the MPEG-4 standard include shape coding, used for inter boundary blocks before DCT transforms.
motion estimation and compensation, texture coding, error Furthermore, shape adaptive DCT (SA-DCT) can also be
resilience, sprite coding, and scalability. Conformance used to decompose the arbitrarily shaped boundary blocks
points, in the form of object types, profiles, and levels, to achieve more coding efficiency. The DCT coefficients
provide the basis for interoperability. MPEG-4 provides are quantized with or without perceptual quantization
support for both interlaced and progressive material. The matrices. MPEG-4 also allows for a nonlinear quantization
chrominance format that is supported is 4 : 2 : 0. of DC values. To further reduce the average energy of the
1054 IMAGE AND VIDEO CODING
Padding Padding or
or Inverse SA-
SA-DCT DCT Texture,
Video motion Bit stream
texture in bits Video out
− DCT Q VLC
multiplex
shape bits
IDCT Q−1
Padding
+
Motion Frame
compensation buffer
Motion
estimation
Padding
Video
shape in Shape
Figure 23. Block diagram of a MPEG-4
coding
video coder.
quantized coefficients, the adaptive prediction of DC and MPEG-4 also includes tool for sprite coding where
AC coefficients from their neighboring blocks can be used a sprite consists of those regions present in the
to improve the coding efficiency. Three types of zigzag scene throughout the video segment, for example, the
scans are allowed to reorder the quantized coefficients background. Sprite-based coding is very suitable for
according to the DC prediction types. Two VLC tables synthetic objects as well as objects in natural scenes with
switched by the quantization level can be used for the run rigid motion, where sprites can provide high compression
length coding. efficiency by appropriate warping/cropping operations.
Still textures with possible arbitrary shapes can be Shape and texture information for a sprite is encoded
encoded using wavelet transforms as described before with as an intra VOP.
the difference that the shape-adaptive discrete wavelet Scalability is provided for object, SNR, spatial and
transform (SA-DWT) and shape adaptive zerotree coding temporal resolution enhancement in MPEG-4. Object
are used to extend the still texture coding to arbitrarily scalability is inherently supported by the object based
shaped image objects. coding scheme. SNR scalability is provided by fine
Motion compensation is block based, with appropriate granularity scalability (FGS) where the residue of the
modifications for object boundaries through repetitive base layer data is further encoded as an out-of-loop
padding and ‘‘polygon matching.’’ Motion vectors are enhancement layer. Spatial scalability is achieved by
predictively encoded. The block size can be 16 × 16, or referencing both the temporally simultaneous base layer
8 × 8, with up to quarter pixel precision. MPEG-4 also and the temporally preceding spatial enhancement layer.
provides modes for overlapped block motion compensation Temporal scalability is provided by using bi-directional
(OMBC) to reduce blocking artifacts and get better coding. One of the most advantageous features of MPEG-
prediction quality at lower bit rates, and global motion 4 temporal and spatial scalabilities is that they can be
compensation (GMC) through image warping. Moreover, applied to both rectangular frames and arbitrarily-shaped
static sprite can also be encoded efficiently with only 8 video objects.
global motion parameters describing camera motion that
represent the appropriate affine transform of the sprite.
When the video content to be coded is interlaced, 4.2.6. The Emerging MPEG-4 Part-10/ITU H.264 Stan-
additional coding efficiency can be achieved by adaptively dard. At the time of completing this manuscript, there
switching between field and frame coding. is a very active new standardization effort in develop-
Error resilience functionality is important for universal ing a video coding standard that can achieve substan-
access through error-prone environments, such as mobile tially higher coding efficiency compared to what could be
communications. It offers means for resynchronization, achieved using any of the existing video coding standards.
error detection, data recovery, and error concealment. This is the emerging MPEG-4 Part-10/ITU H.264 stan-
Error resilience in MPEG-4 is provided by resynchro- dard jointly developed by a Joint Video Team (JVT) [145].
nization markers to resume decoding from errors; data In MPEG, this new standard is called the Advanced Video
partitioning to separate information with different impor- Coding (AVC) standard.
tance; header extension codes to insert redundant impor- The new standard takes a detailed look at many
tant header information in the bit stream; and reversible fundamental issues in video coding. The underlying coding
variable length codes (RVLC) to decode portions of the structure of the new standard is still a block-based motion
corrupted bit stream in the reverse order. compensated transform coder similar to that adopted in
IMAGE AND VIDEO CODING 1055
Quantization step
10 Intra increases at a
Prediction Coding control compounding rate of
Modes 12.5%
4x4 Integer
Transform Inverse UVLC
(fixed) Transform Q−1 or
CABAC
Loop filter
No
+ mismatch Multiple
reference
frames
Motion Frame
compensation buffer
Motion
vectors
previous standards. The block diagram of an AVC encoder frame. The use of adaptive de-blocking filters in the pre-
is given in Fig. 24. diction loop substantially reduces the blocking artifacts,
There are several significant advances in AVC that with additional complexity. Because bit savings depend on
provides high coding efficiency. The key feature that video sequences and bit rates, the above numbers are only
enables such high coding efficiency is adaptivity. For meant to give a rough idea about how much improvement
intracoding, the flexible intra prediction is used to exploit one may expect from a coding tool.
spatial correlation between adjacent macroblocks and In AVC, the quantization step sizes are increased at
adapt to the local characteristics. AVC uses a simpler and a compounding rate of approximately 12.5%, rather than
integer-based 4 × 4 block transform that does not have a increasing it by a constant increment, to provide finer
mismatch problem as in the 8 × 8 DCT due to different quantization yet covering a greater bit rate range.
implementation precisions dealing with the real-valued In AVC, entropy coding is performed using either
DCT basis function. The smaller transform size also helps variable length codes (VLC) or using context-based
to reduce blocking and ringing artifacts. adaptive binary arithmetic coding (CABAC). The use
Most of the coding gain achieved in AVC is obtained of CABAC in AVC yields a consistent improvement in
through advanced motion compensation. Motion compen- bit savings. AVC is still under development and new
sation in AVC supports most of the key features found technologies continue to be adopted into the draft standard
in earlier video standards, but its efficiency is improved to further improve the coding efficiency. To provide a
through added flexibility and functionality. For motion high level picture of the various video coding standards,
estimation/compensation, AVC uses blocks of different Fig. 25 shows the approximate timeline and evolution of
sizes and shapes that provide capability to handle fine different standards.
motion details, reduce high frequency components in
the prediction residue and avoid large blocking artifacts. 5. FUTURE RESEARCH DIRECTIONS IN IMAGE AND
Using seven different block sizes and shapes can trans- VIDEO CODING
late into bit savings of more than 15% as compared to
using only a 16 × 16 block size. Higher precision sub-pel Further improving the coding efficiency of image and
motion compensation is supported in AVC to improve the video compression algorithms continues to be the goal
motion prediction efficiency. Using 1/4-pel spatial accu- that researchers will be pursuing for a long time. From
racy can yield more than 20% in bit savings as compared the success of the JPEG2000 and MPEG-4 AVC codecs,
to using integer-pel spatial accuracy. AVC offers an option we learned that better schemes to enable rate-distortion
of multiple reference frame selection that can improve optimization (normally generating a variable bit rate
both the coding efficiency and error-resilience capability. bit stream) and content-adaptivity are two areas worth
Using five reference frames for prediction can yield 5-10% further investigation. Adaptivity is the most important
in bit savings as compared to using only one reference feature of AVC, block-sizes, intra prediction modes, motion
1056 IMAGE AND VIDEO CODING
Joint ITU-T/MPEG
H.262/MPEG-2 H.264/MPEG-4 AVC
Standards
1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004
prediction modes, multiple reference frames, adaptive different scalabilities for a video bit stream. The principal
entropy coding, adaptive loop filtering, and so on, all concept in scalability is that an improvement in source
reflect the significance of adaptivity. The main reason is reproduction can be achieved by sending only an
that image and video signals by nature are not stationary incremental amount of bits over the current transmitted
signals, although traditional source coding theories all rate. Quantization, spatial, and temporal scalabilities
assume them as statistically stationary sources. are all important in applications. A key feature of
Besides coding efficiency, there are also increasing scalable coding is that the encoding only needs to be
demands for more special functionalities, such as scal- done once and a single scalable bit stream will fit
ability, error-resilience, transcoding, object-based coding, in all applications with different target bit rates, or
model-based coding, and hybrid natural and synthetic spatial, or temporal resolutions. Scalable coding can
coding, and so forth. These functionalities may require bring many benefits in practical applications: potential
unconventional image and video coding techniques and high coding efficiency through rate-distortion optimization
may need to balance the trade-off between coding efficiency (e.g., JPEG-2000); easy adaptation to heterogeneous
and functionalities. channels and user device capabilities and support for
Scalable coding refers to coding an image or a video multicast; better received picture quality over dynamic
sequence into a bit stream that is partially decodable changing networks through channel adaptation; robust
to produce a reconstructed image or video sequence delivery over error-prone, packet-loss-ridden through
at possibly a lower spatial, temporal, or quantization unequal error protection (UEP) and priority streaming
resolution. Figure 26 shows the relationship between because some enhancement bit stream can be lost
or discarded without irreparable harm to the overall
representation; reduced storage space required to store
multiple bitstreams otherwise; no complicated transcoding
y
ilit
may be considered as repairing damaged bitstreams while face and body animation is included. Using such a model-
scalable coding is trying to prevent bitstreams from being based coding, it is possible to transmit visual information
damaged. For a given transmission bit rate, splitting bits at extremely low bit rates. The challenges are to establish
between source coding and channel coding is a bit tricky. good models for different visual objects and to accurately
If bits are allocated to channel coding and the channel is extract the model parameters.
ideal, there is a loss in performance compared to source Image and video coding techniques mainly deal with
coding alone. Similarly, if there are no bits allocated to natural images and video. Computer graphics generate
channel coding and the channel is very noisy, there will be visual scenes as realistic as possible. Hybrid natural and
a loss in performance compared to using some error control synthetic coding is to combine these two areas. Usually,
coding. How to optimally allocate these bits between source it is relatively easy for computer graphics to generate a
coding and channel coding brings us to the problem of joint structure of a visual object, but harder to generate realistic
source and channel coding. texture. With hybrid natural and synthetic coding, natural
Transcoding is usually considered as converting one texture mapping can be used to make animated objects
compressed bit stream into another compressed bit stream. look more realistic. Latest developments in computer
There are several different reasons for using transcoding. graphics/vision tend to use naturally captured image and
One of them is to change the bit rate. Another reason video contents to render realistic environment (called
is to convert the bit stream format from one standard to image based rendering or IBR), such as lumigraph [155],
another. The challenge is to have the coding efficiency light-fields [156], concentric mosaics [157], and so on.
of the transcoded bit stream close to that of an encoded How to efficiently encode these multidimensional cross-
bit stream from the original image and video. Another correlated image and video contents is an active research
challenge is to have the transcoding operation as simple topic [158].
as possible without a complete re-encoding process. In summary, image and video coding research has been
Object-based coding is required for object-based inter- a very active field with a significant impact to industry
activity and manipulation. In MPEG-4 video coding stan- and society. It will continue to be active with many more
dard, object-based coding is defined for many novel applica- interesting problems to be explored.
tions to provide content-based interactivity, content-based
media search and indexing, and content scalable media BIOGRAPHIES
delivery, and so on. Figure 27 illustrates the concept of
an object-based scenario. One of the challenges in object- Shipeng Li joined Microsoft Research Asia in May 1999.
based coding is the preprocessing part that separates an He is currently the research manager of Internet Media
image or video object from a normal scene. Good progresses group. From October 1996 to May 1999, he was with
have been reported in this direction [151–154]. Multimedia Technology Laboratory at Sarnoff Corporation
For some visual objects, it is possible to establish a in Princeton, NJ (formerly David Sarnoff Research Center,
2D or 3D structural model with which the movement of and RCA Laboratories) as a member of technical staff. He
different parts of the object can be parameterized and has been actively involved in research and development of
transmitted. One of such successful visual models is the digital television, MPEG, JPEG, image/video compression,
human face and body. In MPEG-4 visual coding standard, next generation multimedia standards and applications,
Object Object
coding Decoding
Mux Scene
Mux Demux
Composition
Object
Decoding
Object
coding
Object
Decoding
Scene
Mux Demux Composition
Object
coding Object
Decoding
and consumer electronics. He made contributions in shape- 7. T. D. Lookabaugh and R. M. Gray, High-resolution quanti-
adaptive wavelet transforms, scalable shape coding, and zation theory and the vector quantizer advantage, IEEE
the error resilience tool in the FGS profile for the Trans. Inform. Theory IT-35: 1023–1033 (Sept. 1989).
MPEG-4 standard. He has authored and coauthored more 8. T. R. Fischer, A pyramid vector quantizer, IEEE Trans.
than 70 technical publications and more than 20 grants Inform. Theory 32: 568–583 (July 1986).
and pending U.S. patents in image/video compression 9. M. Barlaud et al., Pyramid lattice vector quantization for
and communications, digital television, multimedia, and multiscale image coding, IEEE Trans. Image Process. 3:
wireless communication. He is the coauthor of a chapter 367–381 (July 1994).
in Multimedia Systems and Standards published by the
10. J. H. Conway and N. Sloane, A fast encoding method for
Marcel Dekker, Inc. He is a member of Visual Signal
lattice codes and quantizers, IEEE Trans. Inform. Theory
Processing and Communications committee of IEEE IT-29: 820–824 (Nov. 1983).
Circuits and Systems Society. He received his B.S. and
M.S. both in Electrical Engineering from the University 11. A. Buzo, Y. Linde, and R. M. Gray, An algorithm for vector
quantizer design, IEEE Trans. Commun. 20: 84–95 (Jan.
of Science and Technology of China (USTC) in 1988 and
1980).
1991, respectively. He received his Ph.D. in Electrical
Engineering from Lehigh University, Bethlehem, PA, 12. B. Ramamurthi and A. Gersho, Classified vector quantiza-
in 1996. He was an assistant professor in Electrical tion of image, IEEE Trans. Commun. 34: 1105–1115 (Nov.
Engineering department at University of Science and 1986).
Technology of China in 1991–1992. 13. D.-Y. Cheng, A. Gersho, B. Ramamurthi, and Y. Shoham,
Fast search algorithms for vector quantization and pattern
Weiping Li received his B.S. degree from University of matching, Proc. of the International Conference on ASSP,
Science and Technology of China (USTC) in 1982, and 911.1–911.4 (March 1984).
his M.S. and Ph.D. degrees from Stanford University in
14. J. L. Bentley, Multidimensional binary search tree used for
1983 and 1988 respectively, all in electrical engineering.
associative searching, Comm. ACM, 509–526 (Sept. 1975).
Since 1987, he has been a faculty member in the depart-
ment of Electrical and Computer Engineering at Lehigh 15. M. Soleymani and S. Morgera, An efficient nearest neighbor
University. From 1998 to 1999, he was with Optivision, search method, IEEE Trans. Commun. 20: 677–679 (1987).
Inc., in Palo Alto, California, as the Director of R&D. He 16. Abramson, Norman, Information Theory and Coding,
has been with WebCast Technologies, Inc., since 2000. McGraw-Hill, New York, 1963.
Weiping Li has been elected a Fellow of IEEE for contri- 17. P. G. Howard and J. S. Vitter, Practical implementations
butions to image and video coding algorithms, standards, of arithmetic coding, in Image and Text Compression,
and implementations. He served as the Editor-in-Chief J. A. Storer, ed., Kluwer Academic Publishers, Boston, MA,
of IEEE Transactions on Circuits and Systems for Video 1992, pp. 85–112.
Technology from 1999 to 2001. He has served as an Editor 18. J. Ziv and A. Lempel, A universal algorithm for sequential
for the Streaming Video Profile Amendment of MPEG-4 data compression, IEEE Trans. Inform. Theory IT-23:
International Standard. He is a member of the Board of 337–343 (1977).
Directors for MPEG-4 Industry Forum. He served as one of
19. R. B. Wells, Applied Coding and Information Theory for
the Guest Editors for a special issue of IEEE Proceedings
Engineers, Prentice-Hall, Englewood Cliffs, NJ, 1999.
on image and video compression (February 1995). Weiping
Li was awarded Guest Professorships in University of Sci- 20. T. A. Welch, A technique for high-performance data com-
ence and Technology of China and in Huazhong University pression, IEEE Trans. Comput. 17:8–19 (1997).
of Science and Technology in 2001. He received the Spira 21. M. Rabani and P. Jones, Digital Image Compression Tech-
Award for Excellence in Teaching in 1992 at Lehigh Uni- niques, SPIE Optical Engineering Press, Bellingham, WA,
versity and the Guo Mo-Ruo Prize for Outstanding Student 1991.
in 1980 at University of Science and Technology of China. 22. O. Egger, P. Fleury, T. Ebrahimi, and M. Kunt, High-
performance compression of visual information — A tutorial
BIBLIOGRAPHY review — Part I: Still pictures, Proc. IEEE 87(6): 976–1011
(June 1999).
1. M. B. Priestley, Spectral Analysis and Time Series, Aca- 23. T. Berger and J. D. Gibson, Lossy source coding, IEEE
demic Press, NY, 1981. Trans. Inform. Theory 44(6): 2693–2723 (1998).
2. S. P. Lloyd, Least squares quantization in PCM, IEEE 24. T. Ebrahimi and M. Kunt, Visual data compression for
Trans. Inform. Theory 28: 127–135 (1982). multimedia applications, Proc. IEEE 86(6): 1109–1125 (Jun.
3. A. Gersho and R. M. Gray, Vector Quantization and Signal 1998).
Compression, Kluwer Academic Publishers, 1992. 25. M. J. Narasimha and A. M. Peterson, On the computation
4. C. E. Shannon, Coding theorems for a discrete source with of the discrete cosine transform, IEEE Trans. Commun.
a fidelity criteria, IRE National Convention Record, Part 4, COM-26: 934–936 (Jun. 1978).
142–163, 1959. 26. W. Li, A new algorithm to compute the DCT and its inverse,
5. R. M. Gray, Source Coding Theory, Kluwer Academic Press, IEEE Trans. Signal Proc. 39(6): 1305–1313 (June 1991).
1992. 27. D. Slawecki and W. Li, DCT/IDCT processor design for high
6. T. Berger, Rate Distortion Theory, Prentice-Hall Inc., data rate image coding, IEEE Trans. Circuits Syst. Video
Englewood Cliffs, 1971. Technol. 2(2): 135–146 (1992).
IMAGE AND VIDEO CODING 1059
28. W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data 49. T. Koga et al., Motion compensated interframe coding for
Compression Standard, Van Nostrand Reinhold, New York, video conferencing, Proc. of the National Telecommunica-
1993. tions Conference, New Orleans, LA, pp. G5.3.1–G5.3.5 (Dec.
29. J. D. Johnson, A filter family designed for use in quadrature 1981).
mirror filter bands, Proc. Int. Conf. ASSP (ICASSP) 291–294 50. J. Y. Tham, S. Ranganath, M. Ranganath, and A. A. Kassim,
(Apr. 1980). A novel unrestricted center-biased diamond search algo-
30. H. S. Malavar, Signal Processing with Lapped Transforms, rithm for block motion estimation, IEEE Trans. Circuits
Artech House Norwood, MA, 1992. Syst. Video Technol. 8(4): 369–377 (Aug. 1998).
31. Z. Zhou and A. N. Venetsanopoulos, Morphological methods 51. S. Zhu and K. K. Ma, A new Diamond search algorithm for
in image coding, Proc. Int. Conf. ASSP (ICASSP) 3: 481–484 fast block matching motion estimation, Proc. 1st Int. Conf.
(March 1992). Inform. Commun. Signal Process. ICICS ‘97, 1: 292–296
(Sept. 1997).
32. J. R. Casas, L. Torres, and M. Jare no, Efficient coding of
residual images, Proc. SPIE, Visual Communications Image 52. Z. L. He and M. L. Liou, A high performance fast search
Processing 2094: 694–705 (1993). algorithm for block matching motion estimation, IEEE
Trans. Circuits Syst. Video Technol. 7(5): 826–828 (Oct.
33. A. Toet, A morphological pyramid image decomposition,
Pattern Reconstruction Lett. 9(4): 255–261 (May 1989). 1997).
34. T. Sikora, Low complexity shape-adaptive DCT for coding of 53. S. Zafar, Y. Zhang, and J. S. Baras, Predictive block-
arbitrarily shaped image systems, Signal Processing: Image matching motion estimation schemes for video compres-
Commun. 7: 381–395 (Nov. 1995). sion — Part II. Inter-frame prediction of motion vectors,
IEEE Proc. SOUTHEASTCON 91 2: 1093–1095 (April
35. T. Sikora and B. Makai, Shape-adaptive DCT for generic 1991).
coding of video, IEEE Trans. Circuits Syst. Video Technol.
5: 59–62 (Feb. 1995). 54. L. W. Lee, J. F. Wang, J. Y. Lee, and J. D. Shie, Dynamic
search-window adjustment and interlaced search for block-
36. S. Li and W. Li, Shape-adaptive discrete wavelet transforms
matching algorithm, IEEE Trans. Circuits Syst. Video
for arbitrarily shaped visual object coding, IEEE Trans.
Technol. 3(1): 85–87 (Feb. 1993).
Circuits Syst. Video Technol. 10(5) 725–743 (Aug. 2000).
55. B. Liu and A. Zaccarin, New fast algorithms for the
37. J. M. Shapiro, Embedded image coding using zerotrees
estimation of block motion vectors, IEEE Trans. Circuits
of wavelet coefficients, IEEE Trans. Signal Process. 41:
Syst. Video Technol. 3(2): 148–157 (April 1993).
3445–3462 (Dec. 1993).
56. A. Smolic, T. Sikora, and J.-R. Ohm, Long-term global
38. W. Li, Overview of fine granularity scalability in MPEG-4
motion estimation and its application for sprite coding,
video standard, IEEE Trans. Circuit Syst. Video Technol.
11(3): 301–317 (March 2001). Special issue on streaming content description and segmentation, IEEE Trans. CSVT
video. 9(8): 1227–1242 (Dec. 1999).
39. W. Li and Y.-Q. Zhang, Vector-based signal processing and 57. B. Girod, Motion-compensating prediction with fractional-
quantization for image and video compression, Proc. IEEE pel accuracy, IEEE Trans. Commun. 41: 604–612 (April
83(2): 317–335 (Feb. 1995). 1993).
40. W. Li and Y.-Q. Zhang, Vector transform coding of subband- 58. M. T. Orchard and G. J. Sullivan, Overlapped block motion
decomposed images, IEEE Trans. Circuits Syst. Video compensation: an estimation-theoretic approach, IEEE
Technol. 4(4): 383–391 (Aug. 1994). Trans. Image Proc. 3(5): 693–699 (Sept. 1994).
41. W. Li, On vector transformation, IEEE Trans Signal Proc. 59. C. J. Hollier, J. F. Arnold, and M. C. Cavenor, The effect of
41(11): 3114–3126 (Nov. 1993). a loop filter on circulating noise in interframe video coding,
IEEE Trans. Circuits Syst. Video Technol. 4(4): 442–446
42. W. Li, Vector transform and image coding, IEEE Trans.
(Aug. 1994).
Circuits Syst. Video Technol. 1(4): 297–307 (Dec. 1991).
60. M. Yuen and H. R. Wu, Performance of loop filters in
43. H. Q. Cao and W. Li, A fast search algorithm for vector
MC/DPCM/DCT video coding, Proceedings of ICSP’96,
quantization using a directed graph, IEEE Trans. Circuits
pp. 1182–1186.
Syst. Video Technol. 10(4): 585–593 (June 2000).
44. F. Ling, W. Li, and H. Sun, Bitplane coding of DCT 61. S. Mallat and Z. Zhang, Matching pursuit with time-
coefficients for image and video compression, Proc. of SPIE frequency dictionaries, IEEE Trans. Signal Proc. 41:
VCIP’99 (Jan. 1999). 3397–3415 (Dec. 1993).
45. C. Wang, H. Q. Cao, W. Li, and K. K. Tzeng, Lattice labeling 62. R. Neff and A. Zakhor, Very low bit-rate video coding based
algorithms for vector quantization, IEEE Trans. Circuits on matching pursuits, IEEE Trans. Circuits Syst. Video
Syst. Video Technol. 8(2): 206–220 (Apr. 1998). Technol. 7(1): 158–171 (Feb. 1997).
46. W. Li et al., A video coding algorithm using vector-based 63. O. K. Al-Shaykh et al., Video compression using matching
techniques, IEEE Trans. Circuits Syst. Video Technol. 7(1): pursuits, IEEE Trans. Circuits Syst. Video Technol. 9(1):
146–157 (Feb. 1997). 123–143 (Feb. 1999).
47. F. Ling and W. Li, Dimensional adaptive arithmetic coding 64. W. E. Glenn, J. Marcinka, and R. Dhein, Simple scalable
for image compression, Proc. of IEEE International Sympo- video compression using 3-D subband coding, SMPTE J.
sium on Circuits and Systems (May–June 1998). 106: 140–143 (March 1996).
48. I. Daubechies and W. Sweldens, Factoring wavelet trans- 65. W. A. Pearlman, B.-J. Kim, and Z. Xiong, Embedded video
forms into lifting steps, J. Fourier Anal. Appl. 4: 247–269 coding with 3D SPIHT, P. N. Topiwala, ed., in Wavelet Image
(1998). and Video Compression, Kluwer, Boston, MA, 1998.
1060 IMAGE AND VIDEO CODING
66. S. J. Choi and J. W. Woods, Motion-compensated 3-D sub- 85. E. A. Riskin, Optimal bit allocation via the generalized
band coding of video, IEEE Trans. Image Proc. 8: 155–167 BFOS algorithm, IEEE Trans. Inform. Theory 37: 400–402
(Feb. 1999). (March 1991).
67. J. R. Ohm, Three-dimensional subband coding with motion 86. M. Rabbani and R. Joshi, An overview of the JPEG 2000
compensation, IEEE Trans. Image Process. 3(5): 559–571 still image compression standard, Signal Process. Image
(Sept. 1994). Commun. 17(1): 3–48 (Jan. 2002). Special issue on JPEG
2000.
68. J. Xu, S. Li, Z, Xiong, and Y.-Q. Zhang, 3-D embedded
subband coding with optimized truncation (3-D ESCOT), 87. F. Wu, S. Li, and Y.-Q. Zhang, A framework for efficient
ACHA Special Issue on Wavelets (May 2001). progressive fine granular scalable video coding, IEEE Trans.
Circuits Syst. Video Technol. 11(3): 332–344 (March 2001).
69. J. Xu, S. Li, Z. Xiong, and Y.-Q. Zhang, On boundary effects Special Issue for Streaming Video.
in 3-D wavelet video coding, Proc. Symposium on Optical
Science and Technology, (July 2000). 88. Q. Wang, Z. Xiong, F. Wu, and S. Li, Optimal rate allocation
for progressive fine granularity scalable video coding, IEEE
70. M. F. Barnsley, Fractals Everywhere, Academic Press, San Signal Process. Lett. 9: 33–39 (Feb. 2002).
Diego, CA, 1988.
89. J. M. Shapiro, Embedded image coding using zerotrees of
71. A. E. Jacquin, Image coding based on a fractal theory of wavelet coefficients, IEEE Trans. SP 41(12): 3445–3462
iterated contractive image transformations, IEEE Trans. (Dec. 1993).
Image Process. 1: 18–30 (Jan. 1992).
90. A. Said and W. A. Pearlman, A new, fast, and efficient image
72. Y. Fisher, A discussion of fractal image compression, codec based on set partitioning in hierarchical trees, IEEE
in Saupe D. H. O. Peitgen, H. Jurgens, eds., Chaos and Trans. CSVT 6(3): 243–250 (June 1996).
Fractals, Springer-Verlag, New York, 1992, pp. 903–919.
91. D. Taubman, EBCOT (embedded block coding with
73. E. W. Jacobs, Y. Fisher, and R. D. Boss, Image compression: optimized truncation): A complete reference, ISO/IEC
A study of iterated transform method, Signal Process. 29: JTC1/SC29/WG1 N983, (Sept. 1998).
251–263 (Dec. 1992). 92. J. Rissanen, Universal coding, information, prediction, and
74. K. Barthel, T. Voy’e, and P. Noll, Improved fractal image estimation, IEEE Trans. Inform. Theory IT-30: 629–636
coding, in Proc. Picture Coding Symp. (PCS) 1.5: (March (July 1984).
1993). 93. M. J. Weinberger and G. Seroussi, Sequential prediction
75. J. Y. Huang and O. M. Schultheiss, Block quantization and ranking in universal context modeling and data
of correlated Gaussian random variables, IEEE Trans. compression, IEEE Trans. Inform. Theory 43: 1697–1706
Commun. 11: 289–296 (Sept. 1963). (Sept. 1997).
76. N. S. Jayant and P. Noll, Digital Coding of waveforms, 94. M. Mattavelli and A. Nicoulin, Pre and post processing for
Prentice-Hall, Englewood Cliffs, NJ, 1984. very low bit-rate video coding, in Proc. Int. Workshop HDTV
(Oct. 1994).
77. A. Segall, Bit allocation and encoding for vector sources,
IEEE Trans. Inform. Theory IT-22: 162–169 (March 1976). 95. M. I. Sezan, M. K. Ozkan, and S. V. Fogel, Temporally
adaptive filtering of noisy image sequences using a robust
78. T. Chiang and Y.-Q. Zhang, A new rate control scheme using motion estimation algorithm, in Proc. IEEE Int. Conf.
quadratic rate distortion model, IEEE Trans. Circuits Syst. Acoustics, Speech, and Signal Processing (ICASSP) IV:
Video Technol. 7(1): 246–250 (Feb. 1997). 2429–2432 (May 1991).
79. Y. Shoham and A. Gersho, Efficient bit allocation for 96. L. Chulhee, M. Eden, and M. Unser, High-quality image
an arbitrary set of quantizers, IEEE Trans. ASSP 36: resizing using oblique projection operators, IEEE Trans.
1445–1453 (Sept. 1988). Image Proc. 7(5): 679–692 (May 1998).
80. K. Ramchandran, A. Ortega, and M. Vetterli, Bit allocation 97. A. Munoz, T. Blu, and M. Unser, Least-squares image
for dependent quantization with applications to multireso- resizing using finite differences, IEEE Trans. Image Proc.
lution and MPEG video coders, IEEE Trans. Image Process. 10(9): 1365–1378 (Sept. 2001).
3(5): 533–545 (Sept. 1994).
98. R. Castagno, P. Haavisto, and G. Ramponi, A method for
81. T. Weigand, M. Lightstone, D. Mukherjee, T. G. Campbell, motion adaptive frame rate up-conversion, IEEE Trans.
and S. K. Mitra, Rate-distortion optimized mode selection Circuits Systems Video Technol. 6(5): 436–466 (Oct. 1996).
for very low bit-rate video coding and the emerging H.263 99. Y.-K. Chen, A. Vetro, H. Sun, and S. Y. Kung, Frame-
standard, IEEE Trans. Circuits Syst. Video Technol. 6: rate up-conversion using transmitted true motion vectors,
182–190 (April 1996). in Proc. IEEE Second Workshop on Multimedia Signal
82. H. Everett, Generalized lagrange multiplier method for Processing, 622–627 (Dec. 1998).
solving problems of optimum allocation of resources, Oper. 100. S.-K. Kwon, K.-S. Seo, J.-K. Kim, and Y.-G. Kim, A motion-
Res. 11: 399–417 (1963). adaptive de-interlacing method, IEEE Trans. Consumer
83. G. T. Toussaint, Pattern recognition and geometrical com- Electron. 38(3): 145–150 (Aug. 1992).
plexity, in Proc. 5th Int. Conf. Pattern Recognition 101. C. Sun, De-interlacing of video images using a shortest path
1324–1347 (Dec. 1980). technique, IEEE Trans. Consumer Electron. 47(2): 225–230
84. P. H. Westerink, J. Biemond, and D. E. Boekee, An optimal (May 2001).
bit allocation algorithm for subband coding, in Proc. Int. 102. J. Schwendowius and G. R. Arce, Data-adaptive digital
Conf. Acoustics, Speech, and Signal Processing (ICASSP) video format conversion algorithms, IEEE Trans. Circuits
757–760 (1988). Systs. Video Technol. 7(3): 511–526 (June 1997).
IMAGE AND VIDEO CODING 1061
103. H. C. Reeve and J. S. Lim, Reduction of blocking effects in coding, IEEE Trans. Circuits Syst. Video Technol. 7(2):
image coding, Opt. Eng. 23(1): 34–37 (1984). 299–311 (April 1997).
104. B. Ramamurthi and A. Gersho, Nonlinear space-variant 120. T.-Y. Kuo, J. W. Kim, and C.-J. Kuo, Motion-compensated
postprocessing of block coded images, IEEE Trans. Acoust. frame interpolation scheme for H.263 codec, in Proc. IEEE
Speech Signal Process. 34: 1258–1268 (Oct. 1986). International Symposium on Circuits and Systems (ISCAS)
(1999).
105. R. L. Stevenson, Reduction of coding artifacts in transform
image coding, in Proc. IEEE Int. Conf. Acoustics, Speech, 121. C. Christopoulos, J. Askelof, and M. Larsson, Efficient
and Signal Processing (ICASSP) V: 401–404 (April 1993). methods for encoding regions of interest in the upcoming
JPEG2000 still image coding standard, IEEE Signal Process.
106. L. Yan, A nonlinear algorithm for enhancing low bit-rate
Lett. 7(9): 247–249 (Sept. 2000).
coded motion video sequence, in Proc. IEEE Int. Conf.
Acoustics, Speech, and Signal Processing (ICASSP) II: 122. K. R. Rao and P. Yip, Discrete Cosine Transforms —
923–927 (Nov. 1994). Algorithms, Advantages, Applications, Academic Press,
(1990).
107. Y. Yang, N. P. Galatsanos, and A. K. Katsaggelos, Iterative
projection algorithms for removing the blocking artifacts of 123. CCITT Recommendation T.4, Standardization of Group 3
block-DCT compressed images, in Proc. IEEE Int. Conf. Facsimile Apparatus for Document Transmission, (1980).
Acoustics, Speech, and Signal Processing (ICASSP) V: 124. CCITT Recommendation T.6, Facsimile Coding Schemes and
401–408 (April 1993). Coding Control Functions for Group 4 Facsimile Apparatus,
108. B. Macq et al., Image visual quality restoration by cancel- (1984).
lation of the unmasked noise, in Proc. IEEE Int. Conf. 125. ITU-T Recommendation T.82, Information technology —
Acoustics, Speech, and Signal Processing (ICASSP) V: 53–56 Coded representation of picture and audio information-
(1994). Progressive bi-level image compression, (1993).
109. T. Chen, Elimination of subband-coding artifacts using the 126. ISO/IEC-11544, Progressive Bi-level Image Compression.
dithering technique, in Proc. IEEE Int. Conf. Acoustics, International Standard, (1993).
Speech, and Signal Processing (ICASSP) II: 874–877 (Nov.
1994). 127. ISO/IEC-14492 FCD, Information Technology — Coded Rep-
resentation of Picture and Audio Information — Lossy/Loss-
110. W. Li, O. Egger, and M. Kunt, Efficient quantization noise less Coding of Bi-Level Images, Final Committee Draft.
reduction device for subband image coding schemes, in Proc. ISO/IEC JTC 1/SC 29/WG 1 N 1359, (July, 1999).
IEEE Int. Conf. Acoustics, Speech, and Signal Processing
(ICASSP) IV: 2209–2212 (May 1995). 128. ISO/IEC JTC 1/SC 29/WG 1, ISO/IEC FCD 15444-1:
Information technology — JPEG 2000 image coding system:
111. K. K. Pang and T. K. Tan, Optimum loop filter in hybrid Core coding system [WG1 N 1646], (March 2000).
coders, IEEE Trans. Circuits Syst. Video Technol. 4(2):
158–167 (April 1994). 129. W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image
Data Compression Standard, Van Nostrand Reinhold, New
112. Tao Bo and M. T. Orchard, Removal of motion uncertainty York, (1992).
and quantization noise in motion compensation, IEEE
Trans. Circuits Syst. Video Technol. 11(1): 80–90 (Jan. 130. ISO/IEC, ISO/IEC 14496-2:1999: Information technol-
2001). ogy — Coding of audio-visual objects — Part 2: Visual, (Dec.
1999).
113. S. Shirani, F. Kossentini, and R. Ward, A concealment
method for video communications in an error-prone 131. ISO/IEC, ISO/IEC 14495-1:1999: Information technol-
environment, IEEE J. Select. Areas Commun. 18(6): ogy — Lossless and near-lossless compression of continuous-
1122–1128 (June 2000). tone still images: Baseline, (Dec. 1999).
114. S. Shirani, B. Erol, and F. Kossentini, Error concealment 132. M. J. Weinberger, G. Seroussi, and G. Sapiro, The LOCO-
for MPEG-4 video communication in an error prone I lossless image compression algorithm: Principles and
environment, in Proc. IEEE International Conference on standardization into JPEG-LS, IEEE Trans. Img. Process.
Acoustics, Speech, and Signal Processing (2000). 9(8): 1309–1324 (Aug. 2000).
115. Y. Wang and Q.-F. Zhu, Error control and concealment for 133. S. W. Golomb, Run-length encodings, IEEE Trans. Inform.
video communication: a review, Proc. IEEE 86(5): 974–997 Theory IT-12: 399–401 (July 1966).
(May 1998). 134. R. F. Rice, Some practical universal noiseless coding tech-
116. H. R. Rabiee, H. Radha, and R. L. Kashyap, Error conceal- niques, Tech. Rep. JPL-79-22, Jet Propulsion Laboratory,
ment of still image and video streams with multidirectional Pasadena, CA, (March 1979).
recursive nonlinear filters, in Proceedings of International 135. W. Li, Y.-Q. Zhang, I. Sodagar, J. Liang, and S. Li, MPEG-4
Conference on Image Processing (1996). texture coding, A. Puri and C. Chen, eds., in Multimedia
117. D. Gatica-Perez, M.-T. Sun, and C. Gu, Semantic video Systems, Standards, and Networks, Marcel Dekker, Inc.,
object extraction using four-band watershed and partition New York, 2000.
lattice operators, IEEE Trans. Circuits Syst. Video Technol. 136. S. Li and I. Sodagar, Generic, scalable and efficient shape
11(5): 603–618 (May 2001). coding for visual texture objects in MPEG-4, In proc.
118. K. Sethi and N. V. Patel, A statistical approach to scene International Symposium on Circuits and Systems 2000
change detection, in IS&T SPIE Proc.: Storage and Retrieval (May 2000).
for Image and Video Databases III 2420: 329–339 (Feb. 137. ISO/IEC FCD15444-1:2000, Information Technology — Jpeg
1995). 2000 Image Coding System, V1.0, March 2000.
119. J.-J. Chen and H.-M. Hang, Source model for transform 138. ITU-T Recommendation H.261, Video codec for audiovisual
video coder and its application II. Variable frame rate services at p∗ 64 kbits/sec, 1990.
1062 IMAGE COMPRESSION
139. MPEG-1 Video Group, Information Technology — Coding of 161. W. Effelsberg and R. Steinmetz, Video Compression Tech-
Moving Pictures and Associated Audio for Digital Storage niques, dpunkt-Verlag, 1998.
Media up to about 1.5 Mbit/s—: Part 2 — Video, ISO/IEC 162. A. Bovik, ed., Handbook of image and video processing,
11172-2, International Standard, 1993. Academic Press, 2000.
140. MPEG-2 Video Group, Information Technology — Generic
Coding of Moving Pictures and Associated Audio:
Part 2 — Video, ISO/IEC 13818-2, International Standard,
1995.
IMAGE COMPRESSION
141. ITU-T Experts Group on Very Bit rate Visual Telephony, ALFRED MERTINS
ITU-T Recommendation H.263: Video Coding for Low Bit
University of Wollongong
rate Communication, Dec. 1995.
Wollongong, Australia
142. Video coding for low bit rate communication, ITU-T SG XVI,
DRAFT 13, H.263+, Q15-A-60 rev. 0, 1997.
143. ITU-T Q.15/16, Draft for H.263 + + Annexes U, V, and W to 1. INTRODUCTION
Recommendation H.263, November, 2000.
144. MPEG-4 Video Group, Generic coding of audio-visual
The extensive use of digital imaging in recent years has
objects: Part 2 — Visual, ISO/IEC JTC1/SC29/WG11 N1902, led to an explosion of image data that need to be stored on
FDIS of ISO/IEC 14496-2, Atlantic City, Nov. 1998. disks or transmitted over networks. Application examples
are facsimile, scanning, printing, digital photography,
145. Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG
multimedia, Internet Websites, electronic commerce,
T, Working Draft Number 2, Revision 8 (WD-2 rev 8), JVT-
B118r8, April, 2002. digital libraries, medical imaging, and remote sensing. As
a result of this ever-increasing volume of data, methods
146. Y. T. Chan, Wavelet Basics, Kluwer Academic Publishers, for the efficient compression of digital images have become
Norwell, MA, 1995.
more and more important. To give an example, a modern
147. G. Strang and T. Nguyen, Wavelets and Filter Banks, digital camera stores about 3.3 megapixels per image.
Wellesley-Cambridge Press, Wellesley, MA, 1996. With three color components and 8-bit resolution per color
148. M. Vetterli and J. Kovacevic, Wavelets and Subband Coding, component, this makes an amount of approximately 10 MB
Prentice-Hall, Englewood Cliffs, NJ, 1995. (megabytes) of raw data per image. Using a 64-MB memory
149. S. Li and W. Li, Shape adaptive vector wavelet coding card, direct storage would allow only six images to be
of arbitrarily shaped objects, SPIE Proceeding: Visual stored. With digital image compression, however, the same
Communications and Image Processing’97 San-Jose, (Feb. camera can store about 60 images or more on the memory
1997). card without noticeable differences in image quality. High
150. S. Li, W. Li, H. Sun, and Z. Wu, Shape adaptive wavelet compression factors without much degradation of image
coding, Proc. IEEE International Symposium on Circuits quality are possible because images usually contain a large
and Systems ISCAS’98 5: 281–284 (May 1998). amount of spatial correlation. In other words, images will
typically have larger areas showing similar color, gray
151. C. Gu and M.-C. Lee, Semiautomatic segmentation and
tracking of semantic video objects, IEEE Trans. Circuits
level, or texture, and these similarities can be exploited to
Syst. Video Technol. 8(5): (Sep. 1998). obtain compression.
Image compression can generally be categorized
152. J.-H. Pan, S. Li, and Y.-Q. Zhang, Automatic moving video
into lossless and lossy compression. Obviously, lossless
object extraction using multiple features and multiple
compression is most desirable as the reconstructed image
frames, ISCAS 2000 (May 2000).
is an exact copy of the original. The compression ratios
153. H. Zhong, W. Liu, and S. Li, A semi-automatic system for that can be obtained with lossless compression, however,
video object segmentation, ICME 2001 (Aug. 2001). are fairly low, typically ranging from 3 and 5, depending
154. N. Li, S. Li, and W. Liu, A novel framework for semi- on the image. Such low compression ratios are justified
automatic video object segmentation, IEEE ISCAS 2002 in applications where no loss of quality can be tolerated,
(May 2002). as is often the case in the compression and storage of
155. S. J. Gortler, R. Grzeszezuk, R. Szeliski, and M. F. Cohen, medical images. For most other applications such as
The lumigraph, Computer Graphics (SIGGRAPH’96) 43–54 internet browsing or storage for printing some loss is
(Aug. 1996). usually acceptable. Allowing for loss allows for much
156. M. Levoy and P. Hanrahan, Light field rendering, Computer higher compression ratios.
Graphics (SIGGRAPH’96) (Aug. 1996). Early image compression algorithms have mainly
focused on achieving low distortion for a given (fixed)
157. H. Shum and L. He, Rendering with concentric mosaics,
Computer Graphics (SIGGRAPH’99) 299–306 (Aug. 1999).
rate, or conversely, the lowest rate for a given maximum
distortion. While these goals are still valid, modern
158. J. Li, H. Shum, and Y.-Q. Zhang, On the compression multimedia applications have led to a series of further
of image based rendering scene, in Proc. International
requirements such as spatial and signal-to-noise ratio
Conference on Image Processing, 2: 21–24 (Sept. 2000).
(SNR) scalability and random access to parts of the image
159. P. D. Symes, Video Compression, McGraw-Hill, New York, data. For example, in a typical Internet application the
1998. same image is to be accessed from various users with
160. J. Watkinson, The MPEG Handbook, Focal Press, 2001. a wide range of devices (from a low-resolution palmtop
IMAGE COMPRESSION 1063
Distortion
Coder 1
desirable to transcode image data within the network Coder 2
to the various resolutions that best serve the respective Lower bound
users. On the fly transcoding, however, requires the
compressed image data to be organized in such a way that
different content variations can be easily extracted from
a given high-resolution codestream. The new JPEG2000
0 1 2 3 4 5 6 7 8
standard takes a step in this direction and combines
Rate (bpp)
abundant functionalities with very high compression ratio
and random codestream access. Figure 1. Distortion-rate curves for different coders.
The aim of this article is to give an overview of some
of the most important image compression tools and tech-
niques. We start in Section 2 by looking at the theoretical low rates, but reaches the lossless stage only at extremely
background of data compression. Section 3 then reviews high rates. The lowest curve shows the minimum rate
the discrete cosine and wavelet transforms, the two most required to code the information at a given distortion,
important transforms in image compression. Section 4 assuming optimal compression for all rates. This is the
looks at embedded state-of-the-art wavelet compression desired curve, combining lossy and lossless compression
methods, and Section 5 gives an overview of standards in an optimal way.
for the compression of continuous-tone images. Finally,
Section 6 gives a number of conclusions and outlook. 2.2. Coding Gain Through Decorrelation
The simplest way to compress an image in a lossy manner
2. ELEMENTS OF SOURCE CODING THEORY would be to quantize all pixels separately and to provide
a bit stream that represents the quantized values. This
2.1. Rate Versus Distortion strategy is known as pulse code modulation (PCM). For
example an 8-bit gray-scale image whose pixels xm,n are
The mathematical background that describes the tradeoff integers in the range from 0 to 255 could be compressed by
between compression ratio and fidelity was established by a factor of four through neglecting the six least significant
Shannon in his work on rate–distortion (RD) theory [1]. bits of the binary representations of xm,n , resulting in
The aim of this work was to determine the minimum bit an image with only four different levels of gray. Entropy
rate required to code the output of a stochastic source coding (see Section 2.4) of the PCM values would generally
at a given maximum distortion. The theoretical results allow increased compression, but such a strategy would
obtained by Shannon have clear implications on the per- still yield a poor tradeoff between the amount of distortion
formance of any practical coder, and in the following we introduced and the compression ratio achieved. The reason
want to have a brief look at what the RD tradeoff means for this is that the spatial relationships between pixels are
in practice. For this we consider the graphs in Fig. 1, not utilized by PCM.
which show the distortion obtained with different coders Images usually contain a large amount of spatial
versus the rate required to store the information. Such correlation, which can be exploited to obtain compression
graphs obtained for a specific image and encoding algo- schemes with a better RD tradeoff than that obtained with
rithm are known as operational distortion–rate curves. PCM. For this the data first needs to be decorrelated,
The distortion is typically measured as the mean-squared and then quantization and entropy coding can take
error (MSE) place. Figure 2 shows the basic structure of an image
coder that follows such a strategy. The quantization
1
M−1 N−1
D= |x̂m,n − xm,n |2 (1) step is to be seen as the assignment of a discrete
NM m=0 n=0 symbol to a range of input values. In the simplest case
this could be the assignment of symbol a through the
where xm,n is the original image and x̂m,n is the operation a = round(x/q), where q is the quantization
reconstructed image. However, also of interest are other step size. The inverse quantization then corresponds to
distortion measures that possibly better reflect the amount the recovery of the actual numerical values that belong to
of distortion as perceived by a human observer. The rate the symbols (e.g., the operation x̂ = q · a). The entropy
is measured as the required average number of bits per coding stage subsequently endeavors to represent the
pixel (bpp). As one might expect, all coders show the generated discrete symbols with the minimum possible
same maximum distortion at rate zero (no transmission). number of bits. This step is lossless. Errors occur only due
With increasing rate the distortion decreases and different to quantization. More details on entropy coding will be
coders reach the point of zero distortion at different rates. given in Section 2.4.
Coder 1 is an example of a coder that combines lossy One of the simplest decorrelating transforms is a
and lossless compression in one bit stream and that is prediction error filter, which uses the knowledge of
optimized for lossless compression. Coder 2, on the other neighboring pixels to predict the value of the pixel of
hand, is a lossy coder that shows good performance for interest and then outputs the prediction error made.
1064 IMAGE COMPRESSION
Encoder
Decoder
Inverse
Compressed Entropy Inverse Reconstructed
decorrelating
data decoder quantizer image
Figure 2. Typical image encoder and transform
decoder structures.
In this case the coding paradigm changes from PCM to quantizers, and (3) what are the optimal quantization
differential PCM, known as DPCM. To give a practical levels? Answers to these questions have been derived for
example, Fig. 12 shows the neighborhood relationship both unitary and biorthogonal transforms. We will briefly
used in the JPEG-LS standard where the pixel values sketch the derivation for the unitary case. To simplify the
a, b, c, and d are used to predict an estimate x̂ for expressions we assume that x0 , x1 , . . . , xM−1 and thus also
the actual value x. Under ideal conditions (stationary y0 , y1 , . . . , yM−1 are zero-mean Gaussian random variables.
source and optimal prediction), the output values e = x − x̂ The optimal scalar quantizers Qk that minimize the
of a prediction error filter would be entirely mutually individual error variances σq2k = E{q2k } with qk = yk − ŷk
uncorrelated, but in practice there will still be some for a given number of quantization steps are known as
correlation left, for example due to spatially varying Lloyd–Max quantizers [7,8]. Important properties of these
image properties. A coding gain of DPCM over PCM arises optimal quantizers are E{qk } = 0 (zero-mean error) and
from the fact that the prediction error usually has much E{qk ŷk } = 0 (orthogonality between the quantized value
less power than the signal itself and can therefore be and the error) [9].
compressed more efficiently. The bit allocation can be derived under the assumption
Modern state-of-the-art image compression algorithms of mutually uncorrelated quantization errors and RD
use transforms like the discrete cosine transform (DCT) relationships of the form
or the discrete wavelet transform (DWT) to carry
out decorrelation [2–4]. These transforms are extremely σq2k = γk σy2k 2−2Bk , k = 0, 1, . . . , M − 1 (2)
efficient in decorrelating the pixels of an image and are
employed in a number of compression standards [5,6]. To for the individual quantizers. The term σq2k in (2) is the
see how and why transform coding works, we consider variance of the quantization error produced by quantizer
the system in Fig. 3. The input is a zero-mean random Qk , σy2k is the variance of yk , and Bk is the number of
vector x = [x0 , x1 , . . . , xM−1 ]T with correlation matrix Rxx = bits spent for coding yk . The values γk , k = 0, 1, . . . , M − 1
E{xxT }, where E{·} denotes the expectation operation. The depend on the PDFs of the random variables yk . Because
output y = Tx of the transform is then a zero-mean random of the Gaussian assumption made earlier we have equal
process with correlation matrix Ryy = TRxx T T . Because γk for all k. The assumption of uncorrelated quantization
the random variables y0 , y1 , . . . , yM−1 stored in y may have errors stated above means that E{qi qk } = 0 for i = k, which
different variances, they are subsequently quantized with is usually satisfied if the quantization is sufficiently fine,
different quantizers Q0 , Q1 , . . . , QM−1 . In the synthesis even if the random variables yk are mutually correlated.
stage the inverse transform is applied to reconstruct an Minimizing the average quantization error
approximation x̂ of the original vector x based on the
quantized coefficients ŷ0 , ŷ1 , . . . , ŷM−1 . The aim is to design 1 2
M−1
the system in such a way that the mean-squared error σq2 = σqk (3)
M
E{x − x2 } becomes minimal for a given total bit budget. k=0
According to (5) the bits are to be allocated such that same limit as the one obtained for DPCM with ideal
2Bk ∼ σy2k , which intuitively makes sense as more bits are prediction [9].
assigned to random variables with a larger variance.
Another interesting property of optimal bit allocation It is interesting to note that the expression (8) for
and quantizer selection can be seen when substituting the coding gain also holds for subband coding based on
Bk according to (5) into (2). One obtains uniform, paraunitary filterbanks (i.e., filterbanks that
carry out unitary transforms). The term σx2 is then the
1/M
M−1 variance of an ongoing stationary input process, and the
σq2k −2B
= γ2 σy2j , k = 0, 1, . . . , M − 1 (6) values σy2k are the subband variances. A more general
j=0 expression for the coding gain has been derived by Katto
and Yasuda [11]. Their formula also holds for biorthogonal
which means that in the optimum situation all quantizers transform and subband coding as well as other schemes
contribute equally to the final output error. such as DPCM.
In order to answer the question of which transform is
optimal for a given coding task we consider the coding gain 2.3. Vector Quantization
defined as
σ2 Vector quantization (VQ) is a multidimensional extension
GTC = PCM2
(7) of scalar quantization in that an entire vector of values
σTC
is encoded as a unit. Let x be such an N-dimensional
2 vector of values, and let xi , i = 1, 2, . . . , I be a set of N-
where σPCM is the error variance of simple PCM and
2 dimensional codevectors stored in a codebook. Given x a
σTC is the error variance produced by transform coding,
vector quantizer finds the codevector x from the codebook
both at the same average bit rate B. Assuming that
that best matches x and transmits the corresponding index
all random variables xk have the same variance σx2
. Knowing the receiver reconstructs x as x . An often
the error of PCM amounts to σPCM 2
= γ 2−2B σx2 . Using (3)
used quality criterion to determine which vector from
and (6) the error of transform coding can be written as
1/M the codebook gives the best match for x is the Euclidean
M−1
distance d(x, xi ) = x − xi 2 . Theoretically, if the vector
2
σTC = γ 2−2B σy2j , and the coding gain becomes length N tends to infinity, VQ becomes optimal and
j=0
approaches the performance indicated by rate distortion
theory. In practice, however, the cost associated with
σx2 searching through a large codebook is the major obstacle.
GTC = 1/M (8)
M−1 See Ref. 12 for discussions of computationally efficient
σy2k forms of the VQ technique and details on codebook design.
k=0
1. A maximum coding gain is obtained if the coefficients H=− p(ai ) log2 p(ai ) (9)
i=1
yk are mutually uncorrelated.
2. The more unequal the variances σy2k are, the higher and describes the average information per symbol (in
the coding gain is, because a high dissimilarity leads bits). According to this equation, the more skewed the
to a high ratio of arithmetic and geometric mean. probability distribution, the lower the entropy. For any
Consequently, if the input values xk are already given number of symbols L the maximum entropy is
mutually uncorrelated (white process), a transform obtained if all symbols are equally likely. The entropy
cannot provide any further coding gain. provides a lower bound for the average number of bits per
3. With increasing M one may expect an increasing symbol required to encode the symbols emitted by a DMS.
coding gain that moves toward an upper limit as M The most popular entropy coding methods are Huffman
goes to infinity. In fact, one can show that this is the coding, arithmetic coding, and Lempel–Ziv coding; the
1066 IMAGE COMPRESSION
first two are frequently applied in image compression. The DCT is a unitary transform, so that the inverse
Lempel–Ziv coding, a type of universal coding, is used transform (2D IDCT) uses the same basis sequences. It
more often for document compression, as it does not is given by
require a priori knowledge of the statistical properties
of the source. M−1
M−1 2γk γ k(m + 12 )π (n + 12 )π
xm,n = yk, cos cos ,
Huffman coding uses variable-length codewords and M M M
k=0 =0
produces a uniquely decodable code. To construct a
Huffman code, the symbol probabilities must be known m, n = 0, 1, . . . , M − 1 (11)
a priori. As an example, consider the symbols a1 , a2 , a3 , a4
The popularity of the DCT comes from the fact that it
with probabilities 0.5, 0.25, 0.125, 0.125, respectively. A
almost reaches the coding gain obtained by the KLT for
fixed-length code would use two bits per symbol. A possible
typical image data while having the advantage of fast
Huffman code is given by a1 : 0, a2 : 10, a3 : 110, a4 : 111.
implementation. In fact, for a 1D (one-dimensional) first-
This code requires only 1.75 bits per symbol on average,
order autoregressive input process with autocorrelation
which is the same as the entropy of the source. In fact,
sequence rxx (m) = σx2 ρ |m| and correlation coefficient ρ →
one can show that Huffman codes are optimal and reach
1 it has been shown that the DCT asymptotically
the lower bound stated by the entropy when the symbol
approaches the KLT [9]. Fast implementations can be
probabilities are powers of 12 . In order to decode a Huffman
obtained through the use of FFT (Fast Fourier Transform)
code, the decoder must know the code table that has been
algorithms or through direct factorization of the DCT
used by the encoder. In practice this means that either
formula [2]. The latter approach is especially interesting
a specified standard code table must be employed or the
for the 2D case where 2D factorizations lead to the most
code table must be transmitted to the decoder as side
efficient implementations [13,14].
information. In image compression the DCT is usually used on
In arithmetic coding there is no one-to-one correspon- nonoverlapping 8 × 8 blocks of the image rather than
dence between symbols and codewords, as it assigns on the entire image in one step. The following aspects
variable-length codewords to variable-length blocks of have lead to this choice. First, from the theory outlined
symbols. The codeword representing a sequence of symbols in Section 2.2 and the good decorrelation properties of
is a binary number that points to a subinterval of the inter- the DCT for smooth signals, it is clear that in order
val [0, 1) that is associated with the given sequence. The to maximize the coding gain for typical images, the
length of the subinterval is equal to the probability of the blocks should be as big as possible. On the other hand,
sequence, and each possible sequence creates a different with increasing block size the likelihood of capturing
subinterval. The advantage of arithmetic over Huffman a nonstationary behavior within a block increases.
coding is that it usually results in a shorter average code This however decreases the usefulness of the DCT for
length when the symbol probabilities are not powers of decorrelation. Finally, quantization errors made for DCT
1
2
, and arithmetic coders can be made adaptive to learn coefficients will spread out over the entire block after
the symbol probabilities on the fly. No side information in reconstruction via the IDCT. At low rates this can lead to
form of a code table is required. annoying artifacts when blocks consist of a combination of
flat and highly textured regions, or if there are significant
edges within a block. These effects are less visible if the
3. TRANSFORMS FOR IMAGE COMPRESSION block size is small. Altogether, the choice of 8 × 8 blocks
has been found to be a good compromise between exploiting
3.1. The Discrete Cosine Transform neighborhood relations in smooth regions and avoiding
annoying artifacts due to inhomogeneous block content.
The discrete cosine transform (DCT) is used in most Figure 4 shows an example of the blockwise 2D DCT
current standards for image and video compression. of an image. The original image of Fig. 4a has a size
Examples are JPEG, MPEG-1, MPEG-2, MPEG-4, H.261, of 144 × 176 pixels (QCIF format) and the blocksize
and H.263. To be precise, there are four different DCTs for the DCT is 8 × 8. Figure 4b shows the blockwise
defined in the literature [2], and in particular it is the transformed image, and Fig. 4c shows the transformed
DCT-II that is used for image compression. Because there image after rearranging the coefficients in such a way
is no ambiguity throughout this text we will simply call it that all coefficients with the same physical meaning (i.e.,
the DCT. The DCT of a two-dimensional (2D) signal xm,n coefficients yk, from the different blocks) are gathered
with m, n = 0, 1, . . . , M − 1 is defined as in a subimage. For example the subimage in the upper
left corner of Fig. 4c contains the coefficients y0,0 of all
the blocks in Fig. 4b. These coefficients are often called
2γk γ
M−1 M−1
k(m + 12 )π (n + 12 )π
yk, = xm,n cos cos , DC coefficients, because they represent the average pixel
M m=0 n=0 M M value within a block. Correspondingly, the remaining
63 coefficients of a block are called AC coefficients. From
k, = 0, 1, . . . , M − 1
Fig. 4c one can see that the DC coefficients contain the
(10)
most important information on the entire image. Toward
with
1 the lower right corner of Fig. 4c the amount of signal
γk = √ for k = 0 energy decreases significantly, represented by the average
2
1 otherwise level of gray.
IMAGE COMPRESSION 1067
Figure 4. Example of a blockwise 2D DCT of an image: (a) original image (144 × 176 pixels);
(b) transformed image (blocksize 8 × 8, 18 × 26 blocks); (c) transformed image after reordering of
coefficients (8 × 8 subimages of size 18 × 26).
3.2. The Discrete Wavelet Transform x̂(n) = x(n − n0 )], the filters must satisfy
The discrete wavelet transform (DWT) is a tool to H0 (−z)G0 (z) + H1 (−z)G1 (z) = 0 (12)
hierarchically decompose a signal into a multiresolution
pyramid. It offers a series of advantages over the and
DCT. For example, contrary to the blockwise DCT H0 (z)G0 (z) + H1 (z)G1 (z) = 2z−n0 (13)
the DWT has overlapping basis functions, resulting
in less visible artifacts when coding at low bit rates.
Equation (12) guarantees that the aliasing components
Moreover, the multiresolution signal representation offers
that occur due to the subsampling operation will be
functionalities such as spatial scalability in a simple
compensated at the output, while (13) finally ensures
and generic way. While most of the image compression
perfect transmission of the signal through the system.
standards are based on the DCT, the new still image
In addition to the PR conditions√ (12) and (13) the filters
compression standard JPEG2000 and parts of the MPEG4
should satisfy H0 (1) = G0 (1) = 2 and H1 (1) = G1 (1) = 0,
multimedia standard rely on the DWT [6,15].
which are essential requirements to make them valid
For discrete-time signals the DWT is essentially an
wavelet filters. Moreover, they should satisfy some
octave-band signal decomposition, carried out through
regularity constraints as outlined by Daubechies [3]. It
successive filtering operations and sampling rate changes.
should be noted that (12) and (13) are the PR constraints
The basic building block of such an octave-band filterbank
for biorthogonal two-channel filterbanks. Paraunitary
is the two-channel structure depicted in Fig. 5. H0 (z) and
filter banks and the corresponding orthonormal wavelets
G0 (z) are lowpass, while H1 (z) and G1 (z) are highpass require the stronger condition
filters. The blocks with arrows pointing downward indicate
downsampling by factor 2 (i.e., taking only every second
|H0 (ejω )|2 + |H0 (ej(ω+π ) )|2 = 2 (14)
sample), and the blocks with arrows pointing upward
indicate upsampling by 2 (insertion of zeros between the
to hold. Apart from the special case where the filter
samples). Downsampling serves to eliminate redundancies
length is 2, Eq. (14) can be satisfied only by filters with
in the subband signals, while upsampling is used to
nonsymmetric impulse responses [16]. Symmetric filters
recover the original sampling rate. Because of the filter
are very desirable because they allow for simple boundary
characteristics (lowpass and highpass), most of the energy
processing (see below). Therefore paraunitary two-channel
of a lowpass-type signal x(n) will be stored in the subband
filterbanks and the corresponding orthonormal wavelets
samples y0 (m). Because y0 (m) occurs at half the sampling
are seldom used in image compression.
rate of x(n) it appears that the filterbank structure
For the decomposition of images, the filtering and down-
concentrates the information in less samples, as required
sampling operations are usually carried out separately in
for efficient compression. More efficiency can be obtained
the horizontal and vertical directions. Figure 6 shows an
by cascading two-channel filterbanks to obtain octave-
illustration of the principle that yields a 2D octave-band
band decompositions or other frequency resolutions. For
decomposition that corresponds to a DWT. In order to
the structure in Fig. 5 to allow perfect reconstruction
ensure that the analysis process results in the same num-
(PR) of the input with a delay of n0 samples [i.e.,
ber of subband samples as there are input pixels, special
boundary processing steps are required. These will be
explained below. An example of the decomposition of an
image is depicted in Fig. 7. One can see that the DWT
H0(z ) 2 y0(m ) 2 G0(z )
concentrates the essential information on the image in a
x (n ) x^ (n )
few samples, resulting in a high coding gain.
When decomposing a finite-length signal (a row or
H1(z ) 2 y1(m ) 2 G1(z ) column of an image) with a filterbank using linear convo-
lution the total number of subband samples is generally
Figure 5. Two-channel filterbank. higher than the number of input samples. Methods to
1068 IMAGE COMPRESSION
LL HL HL HL
L H
LH HH LH HH LH HH
Vertical
H1 ( z ) 2 HH
Horizontal H
H 1 (z ) 2 H0 ( z ) 2 HL Vertical
H 1 (z ) 2
L Horizontal
H 0 (z ) 2 H1 ( z ) 2 LH
H 1 (z ) 2 H 0 (z ) 2
LL
H0 ( z ) 2
H 0 (z ) 2 H 1 (z ) 2
n n n n n
Figure 9. Symmetric extension for even-length filters: (a) impulse response with half-sample
symmetry; (b) half-sample antisymmetry; (c) periodic signal extension with half-sample
symmetry; (d) HSS-HSS, obtained with HSS filter; (e) HSAS-HSAS, obtained with HSAS filter.
Figure 11. JPEG quantization matrices: (a) luminance; (b) chrominance; (c) zigzag scanning order.
reconstruct the subband coefficients bitplane by bitplane. structure is the same as in Fig. 2. The color space is not
The reconstruction levels are always in the middle of the specified in the standard, but mostly the YUV space is
uncertainty interval. This means that if the decoder knows used with each color component treated separately.
for example that a coefficient is larger than or equal to 32 After the blockwise DCT (block size 8 × 8), scalar
and smaller than 63, it reconstructs the coefficient as 48. quantization is carried out with uniform quantizers. This
The bit stream can be truncated at any point to meet a bit is done by dividing the transform coefficients yk, , k, =
budget, and the decoder can reconstruct the best possible 0, 1, . . . , 7 in a block by the corresponding entries qk, in
image for the number of bits received. an 8 × 8 quantization matrix and rounding to the nearest
An interesting modification of the SPIHT algorithm integer: ak, = round(yk, /qk, ). Later in the reconstruction
has been proposed [26], called virtual SPIHT. This coder stage an inverse quantization is carried out as ŷk, = qk, ·
virtually extends the octave decomposition until only ak, . The perceptually optimized quantization matrices
three set roots are required in the LIS at initialization. in Fig. 11 have been included in the standard as a
This formation of larger initial sets results in more recommendation, but they are not a requirement and
efficient sorting during the first rounds where most other quantizers may be used. To obtain more flexibility,
coefficients are insignificant with respect to the threshold. the entire quantization matrices are often scaled such that
Further modifications include 3-D extensions for coding of step sizes qk, = D · qk, are used instead of qk, . The factor
video [27]. D then gives control over the bit rate.
The quantized coefficients in each block are scanned
5. COMPRESSION STANDARDS FOR in a zigzag manner, as shown in Fig. 11, and are then
CONTINUOUS-TONE IMAGES further entropy encoded. First the DC coefficient of
a block is differentially encoded with respect to the
This section presents the basic modes of the JPEG, DC coefficient of the previous block (DPCM), using
JPEG-LS, and JPEG2000 compression standards for still a Huffman code. Then the remaining 63, quantized
images. There also exist standards for coding of bilevel AC coefficients of a block are encoded. Because the
images, such as JBIG and JBIG2, but these will not be occurrence of long stretches of zeros during the zigzag
discussed here. scan is quite likely, zero runs are encoded using a
runlength code. If all remaining coefficients along the
5.1. JPEG zigzag scan are zero, a special end-of-block (EoB) symbol
is used. The actual coefficient values are Huffman-
JPEG is an industry standard for digital image compres- encoded.
sion developed by the Joint Photographic Experts Group,
which is a group of experts nominated by leading compa- 5.2. JPEG-LS
nies and national standards bodies. JPEG was approved
by the principal members of ISO/IEC JTC1 as an inter- JPEG-LS is a standard for lossless and near-lossless
national standard (IS 109181) in 1992 and by the CCITT compression [28]. In the near-lossless mode the user
as recommendation T.81, also in 1992. It includes the can specify the maximum error ε that may occur
following modes of operation [5]: a sequential mode, a pro- during compression. Lossless coding means ε = 0. The
gressive mode, a hierarchical mode, and a lossless mode. performance of JPEG-LS for lossless coding is significantly
In the following we will discuss the so-called baseline better than the lossless mode in JPEG.
coder, which is a simple form of the sequential mode that To achieve compression, context modeling, prediction,
encodes images block by block in scan order from left to and entropy coding based on Golomb–Rice codes are used.
right and top to bottom. Image data are allowed to have 8- Golomb–Rice codes are a special class of Huffman codes
bit or 12-bit precision. The baseline algorithm is designed for geometric distributions. The context of a pixel x is
for lossy compression with target bit rates in the range determined from four reconstructed pixels a, b, c, d in
of 0.25–2 bits per pixel (bpp). The coder uses blockwise the neighborhood of x, as shown in Fig. 12. Context is
DCTs, followed by scalar quantization and entropy coding used to decide whether x will be encoded in the run or
based on run-length and Huffman coding. The general regular mode.
IMAGE COMPRESSION 1071
(a) (b)
(c) (d)
Figure 14. Coding examples (QCIF format): (a) JPEG at 0.5 bpp; (b) JPEG2000 at 0.5 bpp;
(c) JPEG at 2 bpp; (d) JPEG2000 at 2 bpp.
higher rates both standards yield good-quality images with the Dr.-Ing. and Dr.-Ing. habil. degrees in electrical
JPEG2000 still having the better signal-to-noise ratio. engineering from Hamburg University of Technology,
Germany, in 1984, 1991, and 1994, respectively. From
1986 to 1991 he was with the Hamburg University of
6. CONCLUSIONS
Technology, from 1991 to 1995 with the Microelectronics
Applications Center Hamburg, Germany, from 1996 to
This article has reviewed general concepts and standards 1997 with the University of Kiel, Germany, and from
for still-image compression. We started by looking at 1997 to 1998 with the University of Western Australia,
theoretical foundations of data compression and then Crawley. Since 1998, he has been a senior lecturer at the
discussed some of the most popular image compression School of Electrical, Computer, and Telecommunications
techniques and standards. From todays point of view Engineering, University of Wollongong, New South Wales,
the diverse functionalities required by many multimedia Australia. His research interests include digital signal
applications are best provided by coders based on the processing, wavelets and filter banks, image and video
wavelet transform. As demonstrated in the JPEG2000 processing, and digital communications.
standard, wavelet-based coders even allow the integration
of lossy and lossless coding, which is a feature that is
very desirable for applications such as medical imaging
BIBLIOGRAPHY
where highest quality is needed. The compression ratios
obtained with lossless JPEG2000 are in the same order
as the ones obtained with dedicated lossless methods. 1. C. E. Shannon, Coding theorems for a discrete source with a
However, because lossless coding based on the wavelet fidelity criterion, IRE Nat. Conserv. Rec. 4: 142–163 (1959).
transform is still in a very early stage, one may expect 2. K. R. Rao and P. Yip, Discrete Cosine Transform, Academic
even better integrated lossy and lossless wavelet-based Press, New York, 1990.
coders to be developed in the future. 3. I. Daubechies, Ten Lectures on Wavelets, SIAM, 1992.
4. M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies,
BIOGRAPHY Image coding using wavelet transform, IEEE Trans. Image
Process. 1(2): 205–220 (April 1992).
Alfred Mertins received the Dipl.-Ing. degree in electrical 5. G. K. Wallace, The JPEG still picture compression standard,
engineering from University of Paderborn, Germany, and IEEE Trans. Consumer Electron. 38(1): 18–34 (Feb. 1992).
IMAGE PROCESSING 1073
6. Joint Photographic Experts Group, JPEG2000 Final Draft 26. E. Khan and M. Ghanbari, Very low bit rate video coding
International Standard, Part I, ISO/IEC JTC1/SC29/WG1 using virtual SPIHT, Electron. Lett. 37(1): 40–42 (Jan. 2001).
FDIS15444-1, Aug. 2000.
27. B.-J. Kim, Z. Xiong, and W. A. Pearlman, Low bit-rate
7. S. P. Lloyd, Least squares quantization in PCM, Institute of scalable video coding with 3-d set partitioning in hierarchical
Mathematical Statistics Society Meeting, Atlantic City, NJ, trees (3-D SPIHT), IEEE Trans. Circuits Syst. Video Technol.
Sept. 1957, pp. 189–192. 10(8): 1374–1387 (Dec. 2000).
8. J. Max, Quantizing for minimum distortion, IRE Trans. 28. Joint Photographic Experts Group, JPEG-LS Final Commit-
Inform. Theory 7–12 (March 1960). tee Draft, ISO/IEC JTC1/SC29/WG1 FCD14495-1, 1997.
9. N. S. Jayant and P. Noll, Digital Coding of Waveforms, 29. C. Christopoulos, A. Skodras, and T. Ebrahimi, The JPEG-
Prentice-Hall, Englewood Cliffs, NJ, 1984. 2000 still image coding system: An overview, IEEE Trans.
10. R. Bellman, Introduction to Matrix Analysis, McGraw-Hill, Consumer Electron. 46(4): 1103–1127 (Nov. 2000).
New York, 1960. 30. J. H. Kasner, M. W. Marcellin, and B. R. Hunt, Universal
11. J. Katto and Y. Yasuda, Performance evaluation of subband trellis coded quantization, IEEE Trans. Image Process. 8(12):
coding and optimization of its filter coefficients, Proc. SPIE 1677–1687 (Dec. 1999).
Visual Communication and Image Processing, Nov. 1991, 31. D. LeGall and A. Tabatabai, Sub-band coding of digital
pp. 95–106. images using short kernel filters and arithmetic coding
12. A. Gersho and R. M. Gray, Vector Quantization and Signal techniques, Proc. IEEE Int. Conf. Acoustics, Speech, and
Compression, Kluwer, Boston, 1991. Signal Processing, 1988, pp. 761–764.
13. P. Duhamel and C. Guillemot, Polynomial transform com- 32. D. Taubmann, High performance scalable image compression
putation of the 2-D DCT, Proc. IEEE Int. Conf. Acoustics, with EBCOT, IEEE Trans. Image Process. 9(7): 1158–1170
Speech, and Signal Processing, Albuquerqe, NM, April 1990, (July 2000).
pp. 1515–1518. 33. Joint Photographic Experts Group, JPEG2000 Verification
14. W.-H. Fang, N.-C. Hu, and S.-K. Shih, Recursive fast compu- Model 8, ISO/IEC JTC1/SC29/WG1 N1822, July 2000.
tation of the two-dimensional discrete cosine transform, IEE
Proc. Visual Image Signal Process. 146(1): 25–33 (Feb. 1999).
15. MPEG-4 Video Verification Model, Version 14. Generic Coding IMAGE PROCESSING
of Moving Pictures and Associated Audio, ISO/IEC JTC1/SC
29/WG 11, 1999. MAJA BYSTROM
16. P. P. Vaidyanathan, On power-complementary FIR filters, Drexel University
IEEE Trans. Circuits Syst. 32: 1308–1310 (Dec. 1985). Philadelphia, Pennsylvania
17. J. Woods and S. O’Neil, Subband coding of images, IEEE
Trans. Acoust. Speech Signal Process. 34(5): 1278–1288 (May Image processing for telecommunications is a broad
1986). field that can be divided into four general categories:
18. M. J. T. Smith and S. L. Eddins, Analysis/synthesis tech-
acquisition, analysis, compression, and reconstruction.
niques for subband coding, IEEE Trans. Acoust. Speech Images can likewise be classified by their color content into
Signal Process. 1446–1456 (Aug. 1990). binary, monochrome, or color images, or by their method
of acquisition or generation, such as natural, computer-
19. J. N. Bradley, C. M. Brislawn, and V. Faber, Reflected bound-
ary conditions for multirate filter banks, Proc. Int. Symp.
generated, radar, or ultrasonic. Following acquisition,
Time-Frequency and Time-Scale Analysis, Canada, 1992, image data may be processed for efficient storage or
pp. 307–310. representation. Image analysis may be employed to extract
desired information for compression or further processing.
20. R. L. de Queiroz, Subband processing of finite length signals
without border distortions, Proc. IEEE Int. Conf. Acoustics,
Reconstruction or enhancement is posttransmission or
Speech, and Signal Processing, San Francisco, March 1992, postacquisition processing to recover lost or degraded data
Vol. IV, pp. 613–616. or to emphasize visually important image components.
More recently significant attention has focused on the
21. C. Herley, Boundary filters for finite-length signals and time-
varying filter banks, IEEE Trans. Circuits Syst. II 42(2):
related field of digital watermarking or data hiding, in
102–114 (Feb. 1995). which marks or identification patterns are embedded in
images for security purposes.
22. A. Mertins, Boundary filters for size-limited paraunitary filter
banks with maximum coding gain and ideal DC behavior,
IEEE Trans. Circuits Syst. II 48(2): 183–188 (Feb. 2001). 1. IMAGE ACQUISITION AND REPRESENTATION
23. A. Mertins, Signal Analysis: Wavelets, Filter Banks, Time-
Frequency Transforms and Applications, Wiley, Chichester, Images are acquired through a variety of methods such
UK, 1999. as analog or digital cameras, radar, or sonar. Regardless
24. J. M. Shapiro, Embedded image coding using zerotrees of of the method of image generation, in order to provide for
wavelet coefficients, IEEE Trans. Signal Process. 41(12): digital transmission and storage, all input analog signals
3445–3462 (Dec. 1993). must be discretized and quantized. It is well known that
25. A. Said and W. A. Pearlman, A new fast and efficient image for perfect reconstruction, images must be sampled above
codec based on set partitioning in hierarchical trees, IEEE the Nyquist rate, specifically, twice the greatest frequency
Trans. Circuits Syst. Video Technol. 6(3): 243–250 (June in a band-limited signal, and infinite-duration interpola-
1996). tion functions must be employed. In practice, however,
1074 IMAGE PROCESSING
infinite-duration functions are infeasible and images are components can be subsampled, typically by a factor of 2
often represented with fewer than the optimum number of in each direction, with little loss in the subjective quality.
samples for conservation of storage space or transmission
bandwidth. Subsampling of images reduces the number
of picture elements, pixels, used to represent an image. 2. FILTERING AND MORPHOLOGICAL OPERATORS
However, interpolation of a subsampled image may result
in visually apparent degradation, typically in the form of Sampling and interpolation are two examples of image
blockiness or blurring. Jain discusses image sampling and filtering. Many other image processing applications, such
basic interpolation functions [1]. as object identification, edge enhancement, or artifact
Following sampling, an image is represented by a removal, also rely on filtering. Given an original image, x,
two-dimensional signal denoted by xi, j = x(i, j); i = 1 · · · V, the filtering operation is
j = 1 · · · H, where V and H are the vertical and horizontal
length in pixels, respectively. By convention, the upper x̂i,j = Wk,l xi+k,j+l
left corner of the image is pixel x1,1 . The pixel values are k,l∈N
continuous and must be quantized to further limit storage
requirements. The most basic quantizer is the uniform where N is a set of pixels in the designated neighborhood of
quantizer in which the continuous range of values is the (i, j)th pixel, Wk,l is the kernel, or weighting function,
subdivided into a finite number of equal-length intervals. and x̂ is the filtered image. The neighborhood, N, can be
All pixels with values falling within each interval are as small as one pixel or as large as the entire image.
then assigned the same value. If the quantization is fine, In the case of sampling, the kernel is a two-dimensional
that is, if a large number of quantization intervals is comb function; for interpolation, the kernel can be a two-
employed, then no subjective degradation will be apparent. dimensional rectangle or sinc function. Appropriate filter
In practice, sample values often have a nonuniform shapes for particular applications will be mentioned in the
distribution such as Laplacian or Gaussian. In this following sections.
case, better subjective results may be obtained with For a class of images, the binary images in which pixels
a nonuniform quantizer and quantization intervals are can be only black or white, morphological operators can be
determined by the Lloyd–Max algorithm. a valuable tool for many of the analysis and reconstruction
There is a significant trade-off between the storage processing operations. Morphological operators essentially
requirement, which is a function of the number of perform filtering with different shape and size kernels to
quantization intervals, and the subjective quality of the switch the binary value of the pixel, based on the pixel’s
image. The shades of gray in black-and-white (B/W) images neighbors. Through iterations of erosion, the shrinking
are typically represented by 256 quantization levels. The of an object, or dilation, the expanding of an object, and
storage requirement is then log2 256 = 8 bits per pixel. other image operations, such as addition or subtraction,
Thus, even a small image requires significant memory for this class of operators can enhance lines, remove noise,
storage. However, reducing the representation to 7 bits per and segment images. Morphological operators have
pixel may result in loss of small objects in the image [2]. been extended to grayscale images. Dougherty gives an
Color images can be represented in many color spaces. introduction to this class of filters [3].
One of the most frequently used color spaces in image
processing is the RGB color space, which indicates the
3. IMAGE TRANSFORMS
proportion of the red, green, and blue components. The
value of an image pixel, xi, j , is then a vector in three
dimensions. Each vector component assumes a value in Often image processing applications, such as compression
the range [0, max], where max is typically normalized and reconstruction, are performed in a transform domain.
to 2n for n quantization levels. For full-color RGB each The three most commonly used transforms are the discrete
color is represented by 8 bits for a total of 24 bits or 224 Fourier transform (DFT), the discrete cosine transform
color combinations; this number of colors is significantly (DCT), and the discrete wavelet transform (DWT). The
more than the human eye can recognize. Other, more discrete Fourier transform over an N × N region of image
visually intuitive, color spaces such as hue, saturation, and pixels is given by
intensity (HSI) or hue, lightness, and saturation (HLS) can
be employed as well. N−1
N−1
Xk,l = xm, n e−j2π(mk+nl)/N
A drawback of the RGB colorspace is that for natural
m=0 n=0
images, there is significant correlation between the color
components. Other spaces exploit this correlation and while the inverse DFT is given by
thus are more common for applications that require
efficient color representation. In the YIQ space used in
1
N−1 N−1
North American television, the YUV color space used in xm,n = Xk,l ej2π(mk+nl)/N .
European television system, and the YCbCr used in image N 2 k=0 l=0
and video compression standards, there is a luminance or
B/W component and two chrominance components. The DCT is more commonly used in image and video
To further reduce the storage space or transmission compression standards than the DFT, since it has excellent
bandwidth required for each image, the chrominance energy compaction properties and has performance close
IMAGE PROCESSING 1075
to the optimal Karhunen–Loeve transform. Given a pixel Synthesis, or reconstruction, is performed in the
block of size N × N, the DCT is given by opposite manner. The synthesis filters are quadrature
mirror filters designed to cancel the aliasing effects of
N−1
N−1
2 kπ(2m + 1) the interpolation process. With appropriate filter design,
Xk,l = c(k)c(l) xm,n cos that is, choice of wavelet basis, applications such as
N m=0 n=0
2N
compression can produce perceptually better results than
lπ(2n + 1) can DCT-based compression at the same rate, since the
× cos
2N transform can efficiently act on large regions of the image.
5. IMAGE COMPRESSION
invertible transform that preserves all image information version of the original image, the higher subbands can
and is employed on images in which all information is be discarded or coarsely quantized with little effect on
valuable, such as security, medical, or technical images. the reconstructed image quality. The wavelet coefficients
Lossless compression typically involves some form of are quantized and coded independently for each subband.
entropy coding such as Huffman or arithmetic coding. The quantization of coefficients can be controlled by
Image symbols are assigned codewords on the basis of their an algorithm such as the embedded zero-tree wavelet
probabilities; more likely symbols are assigned shorter algorithm, which iteratively determines which coefficients
codewords. Often a form of prediction is employed as are most significant and increases the precision of these
well. For example, pixel values can be predicted on the coefficients. These wavelet compression techniques are
basis of their neighbors’ values. The difference between flexible in that they permit coding of nonsquare regions
the prediction and the actual pixel value is then entropy- of interest. Thus, arbitrarily shaped objects in an image
coded. Naturally, these methods only work well if the may be segmented by an analysis algorithm and more
symbol probability estimation and the prediction are good. important objects compressed less than others for better
Lossy standards such as JPEG, JPEG-2000, MPEG-2, subjective quality. Future compression gains will likely
and MPEG-4 (JPEG — Joint Photographic Experts Group; be made through improved segmentation algorithms and
MPEG — Moving Picture Experts Group) rely on a region-of-interest compression.
combination of either the DCT or DWT, quantization, and
entropy coding in order to compress images or video. In 6. IMAGE ENHANCEMENT AND RECONSTRUCTION
block-based techniques, images are divided into blocks
of pixels. These blocks are DCT transformed and the Because the human visual system is more sensitive
resulting coefficients are quantized and either Huffman to certain image components than others, the more
or arithmetic-coded. Thus, spatial redundancy is taken perceptually important components of an image can be
advantage of and high frequencies are removed from the enhanced, often at the expense of the others, in order
image. For video coding, temporal redundancy is utilized. to provide a subjectively higher-quality image. Either
Blocks of pixels in neighboring images tend to be similar, point operations or histogram equalization can be used
so for many blocks, a motion vector can be determined. to modify the contrast and brightness of an image in
This motion vector indicates the displacement of a block order to enhance or emphasize objects that are washed
to a similar block in a neighboring frame. The difference out or hidden in a dark region. Contrast enhancement,
between the two blocks is then transformed and coded. either contrast stretching or brightness enhancement, is
The more recently developed lossy standards are performed by manipulating pixel grayscale levels, that
wavelet-based. Because the lowest subband is a smoothed is, by nonlinear or linear mappings of grayscale values.
IMAGE PROCESSING 1077
If the slope of a linear mapping is negative, then the measures must be taken to restore these degraded
pixel values are reversed and the mapping will result in a images.
negative image. A similar process, histogram equalization, There are various methods for noise removal; perhaps
which stretches an image histogram so that it spans the the simplest is lowpass filtering or smoothing by
color value range, will also result in the increase of image neighborhood averaging in the spatial domain. Common
contrast. kernels employed in the filtering operation are uniform,
Since the human visual system is sensitive to image Gaussian, and Savitsky–Golay. More complex iterative
edges, these high-frequency image components can be or adaptive noise and blur removal methods can be
enhanced, typically by taking a local directional derivative, employed as well. Both stochastic and deterministic
in order to make the image appear sharper. The Laplacian filtering and estimation have been performed with much
operating on the image luminance is an approximation success [8,9].
Low-bit-rate compression by standard methods often
to the second derivative of the brightness. It acts as a
results in either block artifacts or ringing due to
highpass filter, by increasing the contrast at distinct lines
quantization of transform coefficients. An example of
in images and setting pixels in homogeneous areas to
blockiness resulting from JPEG compression is shown
zero. For edges that are step functions or plateaus in
in Fig. 3c. Techniques for compensating for this type of
brightness, the Roberts and Sobel operators result in degradation are local adaptive filtering in the spatial
better performance than does the Laplacian operator. or frequency domains. Molina et al. provide a survey of
These filters approximate the first derivative and are deblocking techniques [10].
applied oriented in multiple directions to capture diagonal When block-based coding schemes are employed,
edges. channel errors may affect the compressed bitstream and
Following compression, transmission over a noisy cause the decoder to lose synchronization and thus lose
or fading channel, or capture by imperfect methods, an entire block or multiple blocks. A typical example is
there may be degradation in the received image. This shown in the left side of Fig. 3d. For intraimage recovery,
resulting degradation may result from noise or speckling, typically some form of replacement or smoothing is
coding artifacts, or data loss. Figure 3 illustrates three employed. Blocks of pixels can be replaced by surrounding
types of degradation. Error reconstruction or concealment blocks, or interpolated from surrounding pixels, taking
(a) (b)
(c) (d)
Figure 3. Selected examples of image degradation. (a) original image; (b) image with noise;
(c) image with block artifacts; (d) image with block loss.
1078 IMAGE PROCESSING
into consideration image features such as lines. If the is to embed into an image an identifying mark that
errors arise in coded video, a combination of temporal and may or may not be readily identifiable to a viewer,
spatial error concealment is often employed. but is easily detectable either when compared with the
original image or with knowledge of a key. Typically,
7. QUALITY EVALUATION the watermark is spread throughout the image with the
use of a pseudonoise key. Keys may be private or public;
A measure of image quality is vital for either evaluation however, public keys permit the deletion of watermarks
of the appropriateness of an image for a particular since the user can readily identify the location of the
application, or for evaluation of the effects of processing watermark and the method of watermarking. Because
on an image. These measures differ broadly depending images may be stored, transmitted, copied, or printed, the
on the method of image acquisition and the image watermark must be robust in the face of scanning, faxing,
processing application. The simplest metrics are objective compression/decompression, transmission over a noisy
measures. For radar images, measures include the clutter : channel, and the further addition of watermarks [13].
noise ratio, the resolution, and metrics of additive and To effectively and imperceptibly embed data within
multiplicative noise [11]. If, following image processing, an image, image components must be modified only
the original image, x, is available for comparison, the slightly through the use of a key known to the
quality of a reconstructed image denoted by x̂ can be watermark generator. Since many image operations,
measured by one of four common distortion metrics: such as compression or filtering, tend to remove image
components that are perceptually insignificant, the
V
H
watermark must be embedded into perceptually important
x2i,j components. The image data are divided into perceptual
i=1 j=1
SNR = 10 log10 components, such as the frequency or color components,
V
H
and the watermark is embedded into one or more
(xi,j − x̂i,j )2
components depending on the components’ robustness to
i=1 j=1
distortion. Watermarking or data hiding may be performed
(maxi,j xi,j )2 in the spatial or transform domains. In the spatial domain
PSNR = 10 log10 one common watermarking technique is to amplitude-
V
H
(xi,j − x̂i,j )2 modulate a regular pattern of blocks of pixels by a small
i=1 j=1 amount another is to increment or decrement the means
of blocks. In the transform domain, the DFT, DCT, and
1
V
H
MSE = (xi,j − x̂i,j )2 DWT techniques are commonly used and information
HV is embedded in the transform phase or by imposing
i=1 j=1
relationships between transform coefficients.
1
V H
MAD = |xi,j − x̂i,j |
HV BIOGRAPHY
i=1 j=1
where SNR = signal-to-noise ratio, PSNR = peak signal- Maja Bystrom received a B.S. in computer science and a
to-noise ratio, MSE = mean-squared error, and MAD = B.S. in communications from Rensselaer Polytechnic Insti-
mean absolute difference. tute, Troy, New York, in 1991. She joined NASA-Goddard
A more intuitive measure of quality would be a subjec- Space Flight Center where she worked as a computer engi-
tive measure. The (U.S.) National Image Interpretability neer until August 1992. She then returned to Rensselaer
Rating Scale (NIIRS) is a 10-level scale used to rate images where she received M.S. degrees in electrical engineering
for their usefulness in terms of resolution and clarity [12]. and mathematics, and a Ph.D. in electrical engineering.
This scale is typically used to evaluate images acquired In 1997 she joined Drexel University, Philadelphia, Penn-
from airborne or space-based systems. For compression sylvania, as an assistant professor. She has received NSF
and enhancement applications a quality measure such as CAREER and Fulbright awards. Her research interests
the five-point ITU-R BT.500-10 scale can be employed. are image processing for communications, and joint source-
Subjective quality is determined by taking a large sample channel coding/decoding.
of evaluations; that is, a large number of viewers rate the
image on a selected scale, and the result is averaged. How-
ever, care must be taken in selecting evaluators, since BIBLIOGRAPHY
trained viewers tend to notice different effects than do
novices. 1. A. K. Jain, Fundamentals of Digital Image Processing,
Prentice-Hall, Englewood Cliffs, NJ, 1989.
8. DIGITAL WATERMARKING AND DATA HIDING 2. M. A. Sid-Ahmed, Image Processing: Theory, Algorithms, &
Architectures, McGraw-Hill, New York, 1995.
With the growth in image transmission, reproduction, 3. E. R. Dougherty, An Introduction to Morphological Image
and storage capabilities, digital image information hiding Processing, SPIE Optical Engineering Press, Bellingham,
and watermarking have become necessary tools for WA, 1992.
maintaining security and proving ownership of images. 4. G. Strang and T. Nguyen, Wavelets and Filter Banks, 2nd ed.,
The primary goal in image watermarking and data hiding Wellesley-Cambridge Press, Wellesley, MA, 1997.
IMAGE SAMPLING AND RECONSTRUCTION 1079
5. R. Nitzberg, Radar Signal Processing and Adaptive Systems, IMAGE SAMPLING AND RECONSTRUCTION
Artech House, Boston, 1999.
6. H. Wechsler, P. J. Phillips, V. Bruce, F. F. Soulie, and H. J. TRUSSELL
T. S. Huang, eds., Face Recognition: From Theory to North Carolina State University
Applications, Springer-Verlag, Berlin, 1998. Raleigh, North Carolina
7. J. Weng, T. S. Huang, and N. Ahuja, Motion and Structure
from Image Sequences, Springer-Verlag, Berlin, 1993.
1. INTRODUCTION
8. A. K. Katsaggelos, ed., Digital Image Restoration, Springer-
Verlag, Berlin, 1991. Images are the result of a spatial distribution of radiant
9. G. Demoment, Image reconstruction and restoration: energy. We see, record, and create images. The most
Overview of common estimation structures and problems, common images are two-dimensional color images seen on
IEEE Trans. Acoust. Speech Signal Process. 37: 2024–2036 television. Other everyday images include photographs,
(Dec. 1989). magazine and newspaper pictures, computer monitors,
10. R. Molina, A. K. Katsaggelos, and J. Mateos, Removal of and motion pictures. Most of these images represent
blocking artifacts using a hierarchical bayesian approach, realistic or abstract versions of the real world. Medical
in A. Katsaggelos and N. Galatsanos, eds., Signal Recovery and satellite images form classes of images where there
Techniques for Image and Video Compression, Kluwer, is no equivalent scene in the physical world. Computer
Boston, 1998. animation produces images that exit only in the mind of
11. W. G. Carrara, R. S. Goodman, and R. M. Majewski, Spot- the graphic artist.
light Synthetic Aperture Radar: Signal Processing Algo- In the case of continuous variables of space, time, and
rithms, Artech House, Boston, 1995. wavelength, an image is described by a function
12. J. Pike, National image interpretability rating scales
(Jan. 10, 1998), Federation of American Scientists, f (x, y, λ, t) (1)
http://www.fas.org/irp/imint/niirs.htm (posted Aug. 10,
2000); accessed 8/10/2000, updated 1/16/98. where x, y are spatial coordinates (angular coordinates
13. S. Katzenbeisser and F. A. P. Petitcolas, eds., Information can also be used), λ indicates the wavelength of the
Hiding Techniques for Stegnography and Digital Watermark- radiation, and t represents time. It is noted that images
ing, Artech House, Boston, 2000. are inherently two-dimensional (2D) spatial distributions.
Higher-dimensional functions can be represented by
a straightforward extension. Such applications include
FURTHER READING
medical CT and MRI, as well as seismic surveys. For
this article, we will concentrate on the spatial and
Aign S., Error concealment for MPEG-2 video, in A. Katsaggelos
and N. Galatsanos, eds., Signal Recovery Techniques for Image wavelength variables associated with still images. The
and Video Compression, Kluwer, Boston, 1998. temporal coordinate will be left for another chapter.
Bhaskaran V. and K. Konstantinides, Image and Video Com- In order to process images on computers, the images
pression Standards: Algorithms and Architectures, Kluwer, must be sampled to create digital images. This represents
Boston, 1997. a transformation from the analog domain to the discrete
Bowyer K. and N. Ahuja, eds., Advances in Image Understanding, domain. In order to view or display the processed images,
IEEE Computer Society Press, Los Alamitos, CA, 1996. the discrete image must be transformed back into the
Giorgianni E. and T. Madden, Digital Color Management: Encod- analog domain. This article concentrates entirely on the
ing Solutions, Addison-Wesley, Reading, MA, 1998. very basic steps of sampling an image in preparation for
Home site of the JPEG and JBIG committees, http://www.jpeg.org processing and reconstructing or displaying an image. This
(Aug. 10, 2000). may seem to be a very limited topic but let us consider
Netravali A. and B. Haskell, Digital Pictures: Representation, what will not be covered in this limited space.
Compression and Standards, 2nd ed., Plenum Press, New We introduced images as distributions of radiant
York, 1995. energy. The exact representation of this energy and its
Lindley C. A., Practical Image Processing in C, Wiley, New York, measurement is the subject of radiometry. For this article,
1991. we will ignore the physical representation of the radiant
Pennebaker W. B. and J. L. Mitchell, JPEG Still Image Data source. We will treat the image as if everyone knows what
Compression Standard, Van Nostrand Reinhold, New York, the value of f (x, y) means and how to interpret the two-
1993. dimensional gray-level distributions that will be used in
Rihaczek A. and S. Hershkowitz, Radar Resolution and Complex- this chapter to demonstrate various principles.
Image Analysis, Artech House, Boston, 1996. If we include the frequency or wavelength distribution
Russ J. C., The Image Processing Handbook, 2nd ed., CRC Press, of the energy, we can discuss spectrometry. Images for
Boca Raton, FL, 1995. most consumer and commercial uses are the color images
Sangwine S. J. and R. E. N. Horne, eds., The Colour Image that we see everyday. These images are transformations
Processing Handbook, Chapman & Hall, London, 1998. of continuously varying spectral, temporal and spatial
Sezan M. I. and A. M. Tekalp, Survey of recent developments in distributions. In order to fully understand the effects of
digital image restoration, Opt. Eng. 29: (May 1990). sampling and reconstruction of color images, more under-
Vetterli M. and J. Kovacevic, Wavelets and Subband Coding, standing of the human visual system is required than can
Prentice-Hall, Upper Saddle River, NJ, 1995. be presented here. Satellite images are now being recorded
1080 IMAGE SAMPLING AND RECONSTRUCTION
in multispectral and hyperspectral bands. In this termi- The proper display of images requires calibration of both
nology, a hyperspectral image has more than 20 bands. the input and output devices [15,16]. This is another topic
We will only touch on the basics of color sampling. that must be omitted for lack of space. For now, it is
All images exist in time and change with time. We’re reasonable to give some general rules about the display of
all familiar with the stroboscopic effects that we see in monochrome images:
the movies that make car wheels and airplane propellers
appear to move backward.1 The same sampling principles 1. For the comparison of a sequences of images, it is
can be used to explain these phenomena as will be used imperative that all images be displayed using the
to explain the spatial sampling that is presented here. same scaling.
The description of object motion in time and its effect on 2. Display a step-wedge, a strip of sequential gray
images is another rich topic that will be omitted here. levels from minimum to maximum values, with the
Before presenting the fundamentals of image presen- image to show how the image gray levels are mapped
tation, it necessary to define our notation and to review to brightness or density. This allows some idea of the
the prerequisite knowledge that is required to understand quantitative values associated with the pixels.
the following material. A review of rules for the display 3. Use a graytone mapping which allows a wide range of
of images and functions is presented in Section 2, fol- gray levels to be visually distinguished. In software
lowed by a review of mathematical preliminaries and such as MATLAB, the user can control the mapping
sampling effects in Section 3. Section 4 discusses sim- between the continuous values of the image and the
pling on a nonrectangular lattice. The practical case of values sent to the display device. It is recommended
using a finite aperture in the sampling process is pre- that adjustments be made so that a user is able
sented in Section 5. Section 6 reviews color vision and to distinguish all levels of a step-wedge of about
describes multidimensional sampling with concentration 32 levels.
on sampling color spectral signals. We will discuss the fun-
damental differences between sampling the wavelength
3. SPATIAL SAMPLING
and spatial dimensions of the multidimensional signal.
Finally, Section 7 contains a mathematical description of
In most cases, the multidimensional process can be
the display of multidimensional data. This area is often
represented as a straightforward extension of one-
neglected by many texts. The section will emphasize the
dimensional processes. Thus, it is reasonable to mention
requirements for displaying data in a fashion that is both
the one-dimensional operations that are prerequisite to
accurate and effective.
understanding this article and will form the basis of the
mutidimensional processes.
2. PRELIMINARY NOTES ON DISPLAY OF IMAGES
3.1. Ideal Sampling in One Dimension
One difference between 1D and 2D functions is the way Mathematically, ideal sampling is usually represented
they are displayed. One-dimensional functions are easily with the use of a generalized function, the Dirac delta
displayed in a graph where the scaling is obvious. The function, δ(t) [2]. The function is defined as zero for t = 0
observer need examine only the numbers that label the and having an area of unity. The most useful property of
axes to determine the scale of the graph and get a mental the delta function is that of sifting, for instance, extracting
picture of the function. With two-dimensional scalar- single values of a continuous function. This is defined by
valued functions, the display becomes more complicated. the integral
The accurate display of vector-valued two-dimensional ∞ ∞
functions, including color images, will be discussed s(t0 ) = s(t)δ(t − t0 ) dt = s(t0 )δ(t − t0 ) dt (2)
after covering the necessary material on sampling and −∞ −∞
colorimetry.
This shows the production of a single sample. We would
Two-dimensional functions can be displayed as an
represent the sampled signal as a signal that is zero
isometric plot, a contour plot, or a grayscale plot. Since we
everywhere except at the sampling time, st0 (t) = s(t0 )δ(t −
are dealing with images, we will use the grayscale plot for
t0 ). The sampled signal can be represented graphically by
images and the isometric plot for functions. All three types
using the arrow, as shown in Fig. 1.
are supported by MATLAB [1]. The user should choose the
The entire sampled sequence can be represented using
right display for the information to be conveyed. For the
the comb function
images used in this article, we should review some basic
rules for display.
∞
Consider a monochrome image that has been digitized comb(t) = δ(t − n) (3)
n=−∞
by some device, such as a scanner or camera. Without
knowing the physical process that created the image, it is where the sampling interval is unity. The sampled signal
impossible to determine the best way to display the image. is obtained by multiplication
1
∞
∞
We used to use the example of stagecoach wheels moving sd (t) = s(t)comb(t) = s(t) δ(t − n) = s(t)δ(t − n)
backward, but, alas, there are few Western movies anymore. n=−∞ n=−∞
Time marches on. (4)
IMAGE SAMPLING AND RECONSTRUCTION 1081
Sampled signal st0= s (t 0)d(tt0) process is shown in Fig. 3. The spectra in this figure
0 correspond to the time-domain signals in Fig. 2. The most
important feature of the spectrum of the sampled signals
is the replication of the analog spectrum. Mathematically,
if the spectrum of s(t) is denoted S(ω), then the spectrum
of the sampled signal, sd (t), is given by
s (t _0)
∞
Sd (ω) = S(ω − k2π Fs )
k=−∞
2
1 The two-dimensional Dirac delta function can be defined as
the separable product of one-dimensional delta functions,
0
δ(x, y) = δ(x)δ(y). The extension of the comb function to
0 1 2 3 4 5 6 7 8 9 10
two dimensions should probably be called a ‘‘brush,’’ but
3
we will continue to use the term comb and define it by
2
sd (t)
1
∞
∞
comb(x, y) = δ(x − m, y − n)
0 m=−∞ n=−∞
0 1 2 3 4 5 6 7 8 9 10
Time The equation for 2D sampling is
Figure 2. One-dimensional sampling.
s(m, n) = sd (x, y) = s(x, y)comb(x, y) (5)
2The proof of this is also available in the undergraduate signals where a normalized sampling interval of unity is assumed.
and systems texts [e.g., 2]. We have the same constraints on the sampling rate in two
1082 IMAGE SAMPLING AND RECONSTRUCTION
Spectrum analog 1
0.5
0
−1.5 −1 −0.5 0 0.5 1 1.5
Spectrum of comb
0.5
0
−1.5 −1 −0.5 0 0.5 1 1.5
Spectrum sampled
0.5
0
−1.5 −1 −0.5 0 0.5 1 1.5
F (hertz)
dimensions as in one. Of course, the frequency is measured Figure 7 shows the spectrum of a continuous analog image.
not in hertz, but in cycles per millimeter or inch.3 Spatial If the image is sampled with intervals of δx in each
sampling is illustrated in Figs. 4–6. Undersampling in the direction, the sampled image has a spectrum that shows
spatial domain signal results in spatial aliasing. This is periodic replications at 1/δx. The central portion of the
easy to demonstrate using simple sinusoidal images. First, periodic spectrum is shown in Fig. 8. For Fig. 8, we have
let us consider the mathematics. used δx = 301
mm.
Taking Fourier transforms of Eq. (5) yields Note that if the analog image, s(x, y), is bandlimited
to some 2D region, it is possible to recover the original
∞
∞
signal from the samples by using an ideal lowpass filter.
Sd (u, v) = S(u, v) ∗ comb(u, v) = S(u − k, v − l) The proper sampling intervals are determined from the
k=−∞ l=−∞
(6) requirement that the region of support in the frequency
where the asterisk (∗) denotes convolution. Note that domain (band limit) is contained in the rectangle defined
the sampled spectrum is periodic, as in the one- by |u| ≤ (1/2x) and |v| ≤ (1/2 y).
dimensional case. The effect of sampling can be demonstrated in the
Consider the effect of changing the sampling interval following examples. In these examples, aliasing will be
demonstrated by subsampling a high-resolution digital
x y image to produce a low resolution image. First let us
s(m, n) = s(x, y)comb , (7) consider a pure sinusoid. The function
x y
which yields 36x 24y
s(x, y) = cos 2π +
128 128
Sd (u, v) = S(u, v) ∗ comb(u, v)
where x is measured in mm, is sampled at 1 mm spacing in
1
∞
∞
= S(ux − k, vy − l) (8) each direction. This produces no aliasing. The function and
|xy| its spectrum are shown in Figs. 9 and 10, respectively.4
k=−∞ l=−∞
Note that the frequency of the spectrum is in normalized
3 Image processors use linear distance most often, but occasionally
use angular measurement, which yields cycles per degree. This 4 The spectra of the sinusoids appear as crosses instead of points
is done when considering the resolution of the eye or an optical because of the truncation of the image to a finite region. The full
system. explanation is beyond the scope of this article.
Spatial domain signal
4.5
3.5
2.5
1.5
0.5
0
60
40
60
20 40
0 20
−20 0
−20
−40
−40
−60 −60
y (mm)
x (mm)
Comb
4.5
3.5
2.5
1.5
0.5
0
60
40
60
20 40
0 20
−20 0
−40 −20
−40
y (mm) −60 −60 x (mm)
1083
Sampled signal
4.5
3.5
2.5
1.5
0.5
0
60
40
60
20 40
0 20
−20 0
−40 −20
−40
−60 −60
y (mm) x (mm)
0.8
0.6
0.4
0.2
40
20 40
0 20
0
−20
−20
−40
Fy (cycles/mm) −40 Fx (cycles/mm)
1084
IMAGE SAMPLING AND RECONSTRUCTION 1085
0.8
0.6
0.4
0.2
40
20 40
0 20
0
−20
−20
−40 −40
Fy (cycles/mm) Figure 8. Digital spectrum with
Fx (cycles/mm)
aliased images of analog spectrum.
digital frequency.5 For this case, the analog frequency, cos(2*pi*(36*x /128 + 24*y /128))
F and the digital frequency, f , are the same. This
yields Fx = fx = 128
36
= 0.28125 and Fy = f4 = 128 24
= 0.1875.
The spectrum shows peaks at this 2D frequency. The 0.9
image is subsampled by a factor of 4 and shown in
0.8
Fig. 11. This is equivalent to a sampling of the analog
signal with an interval of 4mm in each direction. The 0.7
aliased 2D frequency can be found by finding k and
l so that both |Fx − kFs | < 0.5Fs and |Fy − lFs | < 0.5Fs 0.6
hold. For this case, k = l = 1 and the aliased analog
y (mm)
will yield the same samples as s(x, y) above when sampled 0.2
at 4mm intervals in each direction. The spectrum of
0.1
the sampled signal is shown in Fig. 12. The digital
frequencies can be found by normalizing the aliased 0
analog frequency by the sampling rate. For this case, 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
(fx , fy ) = (0.03125/0.25, −0.0625/0.25) = (0.125, −0.25). x (mm)
An example of sampling a pictorial image is shown in
Figure 9. cos[2π(36x/128 + 24y/128)].
Figs. 13–16, where Fig. 13 is the original; Fig. 14 is its
spectrum; Fig. 15 is a 2 : 1 subsampling of the original;
Fig. 16 is the spectrum of the subsampled image. For this 4. SAMPLING ON NONRECTANGULAR LATTICES
case, we can see that the lower frequencies have been
preserved but the higher frequencies have been aliased. Because images may have oddly shaped regions of support
in the frequency domain, it is often more efficient to sample
with a nonrectangular lattice. A thorough discussion
5Normalized digital frequency is denoted by F and has the of this concept is found in the book by Dudgeon and
constraint |F| ≤ 12 [2]. Mersereau [3]. To develop this concept, it is convenient to
1086 IMAGE SAMPLING AND RECONSTRUCTION
−0.4
−0.2
−0.1
0.1
0.2
0.3
0.4
∞
∞
0.9 comb(r) = δ(r − mx1 − nx2 ) (9)
m=−∞ n=−∞
0.8
This yields the functional form
0.7
s(m, n) = s(mx1 + nx2 ) (10)
0.6
y (mm)
−0.4
−0.2
−0.1
0.1
0.2
0.3
0.4
Original bars
50
100
150
200
250
300
350
400
450
500
100 200 300 400 500
1087
Spectrum of original image
−0.5
−0.4
−0.2
−0.1
0.1
0.2
0.3
0.4
50
100
150
200
250
50 100 150 200 250
1088
IMAGE SAMPLING AND RECONSTRUCTION 1089
−0.4
−0.2
−0.1
0.1
0.2
0.3
0.4
The sampled spectrum can be written where a(t) represents the impulse response (or light
variation) of the aperture. This is simple correlation
S(w) = S(w) ∗ comb(w) and assumes that the same aperture is used for every
1
∞ ∞ sample. The mathematical representation can be written
= S(w − kw1 − lw2 ) (15) as convolution if the aperture is symmetric or, we replace
|X|
k=−∞ l=−∞ the function a(t) with a(−t). The sampling of the signal
can be represented by
(see the books by Dudgeon and Mersereau [3] and Jain [8]).
t
5. SAMPLING USING FINITE APERTURES s(n) = [s(t) ∗ a(t)]comb (17)
T
Practical imaging devices, such as videocameras, CCD where ∗ represents convolution. This model is reasonably
(charge-coupled device) arrays, and scanners, must use a accurate for spatial sampling of most cameras and
finite aperture for sampling. The comb function cannot be scanning systems.
realized by actual devices. The finite aperture is required The sampling model can be generalized to include the
to obtain a finite amount of energy from the scene. The case where each sample is obtained with a different
engineering tradeoff is one of signal-to-noise ratio (SNR) aperture. For this case, the samples which need not be
versus spatial resolution. Large apertures, receive more equally spaced, are given by
light, and thus, will have higher SNR’s than smaller un
apertures, while smaller apertures permit higher spatial s(n) = s(t)an (t) dt (18)
resolution than will larger ones. This is true for apertures ln
larger than the order of the wavelength of light. For
smaller apertures, diffraction limits the resolution. where the limits of integration correspond to the region of
The aperture may cause the light intensity to vary over support for each aperture. A common application of this
the finite region of integration. For a single sample of a representation in two dimensions is the finite area of a
one-dimensional signal at time nT, the sample value can CCD element of an imaging chip. The aperture function
be obtained by a(t) may also take into account the leakage of charge from
one cell to another. Equation (18) is also important in
nT
representing sampling the wavelength dimension of the
s(n) = s(t)a(t − NT) dt (16)
(n−1)T image signals.
1090 IMAGE SAMPLING AND RECONSTRUCTION
The generalized signal reconstruction equation has The problem with this approach is that the spectrum
the form A(u, v) often has values that are very small or zero which
∞
makes the inverse filter ill-conditioned; that is, H(u, v) will
s(t) = s(n)gn (t) (19)
n=−∞
have very large values that will amplify noise. Since most
apertures are lowpass, the small or zero values usually
where the collection of functions, {gn (t)}, provide the occur at higher frequencies. A common modification of the
interpolation from discrete to continuous space. The exact above correction is to include a term to make the filter
form of {gn (t)} depends on the form of {an (t)}. For sampling well-conditioned. Such a form is given by
using the ideal comb function with a sample interval of
T, gn (t) is a shift of the sinc function that represents the Hlp (u, v)
H(u, v) = (26)
ideal band-limited filter A(u, v)
sin(2π(t − nT)/T) where Hlp (u, v) is a lowpass filter.
gn (t) = (20)
2π(t − nT)/T
sensor. The sensitivity functions of the eye are shown in It is clear that there may be many different spectra that
any of the references [9–14]. appear to be the same color to the observer. Two spectra
Sampling of the radiant power signal associated with a that appear the same are called metamers. Metamerism is
color image can be viewed in at least two ways. If the goal one of the greatest and most fascinating problems in color
of the sampling is to reproduce the spectral distribution, science. It is basically color ‘‘aliasing’’ and can be described
then the same criteria for sampling the usual electronic by the generalized sampling described earlier.
signals can be directly applied. However, the goal of color The N-dimensional spectral space can be decomposed
sampling is not often to reproduce the spectral distribution into a 3D subspace known as the human visual subspace
but to allow reproduction of the color sensation. The goal is (HVSS) and an N − 3D subspace known as the black
to sample the continuous color spectrum in such a way that space. All metamers of a particular visible spectrum, r,
the color sensation of the spectrum can be reproduced by are given by
the monitor. To keep this discussion as simple as possible, x = Pv r + Pb g (32)
we will treat the color sampling problem as a subsampling
of a high-resolution discrete space, that is, the N samples
where Pv = S(ST S)−1 ST is the orthogonal projection
are sufficient to reconstruct the original spectrum using
operator to the visual space, Pb = [I − S(ST S)−1 ST ] is the
the uniform sampling of Section 3.
orthogonal projection operator to the black space, and g is
Let us assume that the visual wavelength spectrum
is sampled finely enough to allow the accurate use of any vector in N space.
numerical approximation of integration. A common sample It should be noted that humans cannot see (or detect)
spacing is 10 nanometers over the range 400–700 nm. all possible spectra in the visual space. Since it is a vector
Finer sampling is required for some illuminants with line space, there exist elements with negative values. These
emitters. Sampling of color signals is discussed in detail in elements are not realizable and thus cannot be seen. All
Ref. 9. With the assumption of proper sampling, the space vectors in the black space have negative elements. While
of all possible visible spectra lies in an N-dimensional the vectors in the black space are not realizable and cannot
vector space, where N = 31. The spectral response of be seen, they can be combined with vectors in the visible
each of the eye’s sensors can be sampled as well, giving space to produce a realizable spectrum.
three linearly independent N vectors that define the visual If sampling by an optical device, with sensor M, is done
subspace. correctly, the tristimulus values can be computed from the
Under the assumption of proper sampling, the integral optical measurements, v, that is, B can be chosen so that
of Eq. (27) can be well approximated by a summation
t = (ST r) = BMT r = Bv (33)
U
vk = r(nλ)mk (nλ) (28)
n=L
From the vector space viewpoint, the sampling is correct
if the three-dimensional vector space defined by the cone
where λ represents the sampling interval and the sensitivity functions is the same as the space defined by
summation limits are determined by the region of support the device sensitivity functions. Using matrix terminology,
of the sensitivity of the eye. The sensor mk (·) can represent the range spaces of the S and M are the same.
the eye as well as a photonic device. When we consider the sampling of reflective spectra,
The response of the sensors can be represented by we note that a reflective object must be illuminated to be
a matrix, M = [m1 , m2 , m3 ], where the N vectors, mi , seen. The resulting radiant spectra is the product of the
represent the response of the ith-type sensor (or cone). illuminant and the reflection of the object
Any visible spectrum can be represented by an N vector,
r. The response of the sensors to the input spectrum is a r = Lr0 (34)
3 vector, v, obtained by
where L is diagonal matrix containing the sampled radiant
v = MT r (29)
spectrum of the illuminant and the elements of the
reflectance of the object are constrained, 0 ≤ r0 (k) ≤ 1. The
Since we are interested in sampling to represent human
color sensitivity, let the matrix S = [s1 , s2 , s3 ], represent measurement of the appearance of the reflective object can
the sensitivity of the eye. The result of sensing with the be computed in the same way as the radiant object with the
eye, t note that the sensor matrices now include the illuminant,
t = ST r (30) where LS and LM must have the same range space.
It is noted here that most physical models of the
is given the special name of tristimulus vector or values. eye include some type of nonlinearity in the sensing
Two visible spectra are said to have the same color process. This nonlinearity is often modelled as a logarithm;
if they appear the same to the human observer. In our in any case, it is always assumed to be monotonic
linear model, this means that if r1 and r2 are two N within the intensity range of interest. The nonlinear
vectors representing different spectral distributions, they function, v = V(c), transforms the 3-vector in an element
are equivalent colors if independent manner:
Since equality is required for a color match by Eq. (31), the example, an image may be very small, say 64 × 64. If
function V(·) does not affect our definition of equivalent the image is displayed as one screen pixel for each image
colors. Mathematically pixel, the display device would show a reproduction that
is too small, say, 1 in. × 1 in. The viewer could not see
V(ST r1 ) = V(ST r2 ) (36) this well at normal viewing distances. To use the entire
area of the display requires producing an image with more
is true if, and only if, ST r1 = ST r2 . This nonlinearity does pixels. For this example, 8 × enlargement would produce
have a definite effect on the relative sensitivity in the a 512 × 512 image that would be 8 × 8 in. This type of
color matching process and is one of the causes of much interpolation reconstruction is very common when using
searching for the ‘‘uniform color space.’’ variable sized windows on a monitor.
The simple method of pixel replication is equivalent to
7. PRACTICAL RECONSTRUCTION OF IMAGES using a rectangular interpolating function, h(x, y). This is
shown in Figs. 9 and 11. The square aperture is readily
The theory of sampling states that a band-limited apparent. The figure of the pictorial image, Fig. 13, has
signal can be reconstructed if it is sampled properly. more pixels and uses a proportionally smaller reproduction
The reconstruction requires the infinite summation of aperture. The aperture is not apparent to the eye.5 In
a weighted sum of sinc functions. From the practical the case of the sinusoidal images of Figs. 9 and 11, the
viewpoint, it is impossible to sum an infinite number purpose of the image was to demonstrate sampling effects.
of terms and the sinc function cannot be realized with Thus, the obvious image of the aperture helps to make
incoherent illumination. Let us consider the two problems the sampling rate apparent. If we desire to camouflage the
separately. sampling and produce a smoother image for viewing, other
The finite sum can be modelled as the truncation of the methods are more appropriate.
infinite sum Bilinear interpolation uses a separable function com-
posed of triangle functions in each coordinate direction.
M
N
sin[π(x − m)] sin[π(y − n)] This is an extension of linear interpolation in one dimen-
ŝ(x, y) = s(m, n) sion. The image of Fig. 11 is displayed using bilinear
π(x − m) π(y − n)
m=−M n=−N interpolation in Fig. 17. The separability of the interpola-
(37) tion is noticed in the rectilinear artifacts in the image.
This is equivalent to truncating the number of samples by A more computationally expensive interpolation is
the use of the rect(·) function: the cubic spline. This method is designed to produce
x y continuous derivatives, in addition to producing a
ŝ(x, y) = s(x, y)comb(x, y)rect , ∗ sin c(x, y) (38) continuous function. The result of this method is shown
M N
in Fig. 18. For the smooth sinusoidal image, this method
Clearly, as the number of terms approaches infinity, works extremely well. One can imagine that images exist
the estimate of the function improves. Furthermore, the where the increased smoothness of the spline interpolation
sinc(x, y) is the optimal interpolation function for the would produce a result that appears more blurred than
mean-square error measure. Unfortunately, truncation by the bilinear method. There is no interpolation method
the ideal lowpass filter produces ringing at high-contrast that is guaranteed to be optimal for a particular image.
edges caused by Gibbs phenomenon. There are reasons to use the spline method for a wide
The inclusion of a practical reconstruction interpolation variety of images [19]. There are many interpolating
function can be modeled by replacing the sinc(x, y) by the functions that have been investigated for many different
general function h(x, y). The model is now given by applications [20].
x y
f̂a (x, y) = fa (x, y)comb(x, y)rect , ∗ h(x, y) (39)
M N 8. SUMMARY
The common forms of the interpolation function are the The article has given an introduction to the basics of
same as the sampling aperture when considering actual sampling and reconstruction of images. There are clearly
hardware, such as circular, rectangular, and Gaussian. several areas that the interested reader should expand on
These are used when considering output devices, such by additional reading. In-depth treatment of the frequency
as CRT (cathode ray tube) or flat-panel monitors, domain can be obtained from many of the common image
photographic film, and laser printers. The optimal design processing texts, such as that by Jain, [8]. The processing
of these apertures is primarily a hardware or optical of video and temporal imaging is covered well in Tekalp’s
problem. Software simulation usually plays a significant text [21]. Color imaging is treated in a special issue of
role in developing the hardware. Halftone devices, such the IEEE Transactions on Image Processing [17]. A survey
as inkjet printers, offset and gravure printing, use more paper in that issue is a good starting point on the current
complex models that are a combination of linear and state of the art. Sampling and reconstruction of medical
nonlinear processes [18]. There is another application images is treated in the treatise by Macovski [22].
that can be considered here that uses a wider variety
of functions.
Image interpolation is often done when a sampled 5 This is true even when the image is viewed without the effect of
image is enlarged many times its original size. For halftone reproduction, which is used here.
cos(2*pi*(36*x/128 + 24*y/128)) subsampled/bilinear interpolated
0.9
0.8
0.7
0.6
y (mm)
0.5
0.4
0.3
0.2
0.1
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Figure 17. Linear interpolation of subsampled
x (mm) cos[2π(36x/128 + 24y/128)].
0.9
0.8
0.7
0.6
y (mm)
0.5
0.4
0.3
0.2
0.1
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Figure 18. Spline interpolation of subsampled
x (mm) cos[2π(36x/128 + 24y/128)].
1093
1094 IMT-2000 3G MOBILE SYSTEMS
are examined in Section 5. Section 6 describes IMT- IMT-2000 standardisation activities for the radio
2000 radio access and core network architecture. In interface are conducted in ITU-R/WP 8F. At WARC
Section 7, the authors address the most salient economic 2000, additional frequency bands totalling 400 MHz were
implications of migration toward 3G, with emphasis on allocated for IMT-2000.
license costs and new business models. Section 8 discusses
evolution of mobile systems beyond 3G, which involves 3.2. IMT-2000 Radio Interface
not only an all-IP core network but also optimization
The early impetus for standardizing an IMT-2000 radio
of scarce spectral resources through cooperation among
interface, specifically, the interface between the mobile
heterogeneous networks.
terminal and the base station, can be attributed to an
observation of the contrasting situation of 2G systems in
2. DRIVING FORCES Europe and in the United States. In Europe, the Global
System for Mobile Communications (GSM) standard was
Two major trends are reshaping the world of telecom- developed and universally adopted before 2G systems were
munications. On one hand, mobile services have made first deployed in 1991, ensuring that customers’ terminals
great strides throughout the world, with penetration rates are compatible with mobile networks throughout the
exceeding 50% in many countries. On the other hand, continent. In the United States, the lack of country or
the explosion of Internet traffic testifies to the rapid continentwide harmonization, and the relatively greater
development of the multimedia market. The prospect for success of first-generation analog systems such as
convergence of these two trends, giving rise to mobile advanced mobile phone service (AMPS), led to the parallel
multimedia services, and the resulting need for greater development of three different 2G digital standards:
spectral resources, has driven equipment suppliers, oper-
ators, standards bodies, and regulators throughout the • Time-division multiple access (TDMA) including
world to develop a new generation of mobile systems. The digital AMPS (DAMPS) and IS136
stakes are considerable: around 2010, mobile traffic should • Code-division multiple access (CDMA), known as
be equal to that of fixed telephony [1]. The convergence of IS95
the mobile and Internet worlds, the strong dynamics of • Global System for Mobile Communications (GSM)
innovation, and anticipated cost reductions in these areas
have opened new opportunities for 3G services as of 2001
Table 1 provides a brief overview of the market share of
in Japan and possibly in the United States, and 2002 in
these standards in the United States.
Europe.
Despite the initial goal of a single worldwide 3G
air interface, standardization was strongly influenced by
3. STANDARDIZATION
Table 1. Market Share of 2G Cellular
3.1. IMT-2000 Frequency Spectrum
Standards in the United States
The International Telecommunications Union (ITU) ini- Technology 1999 2002
tiated 3G mobile standardization with the ambition of
defining a global standard replacing the broad variety GSM 4% 11–15%
of second-generation (2G) mobile systems, which implied AMPS 55% 20%
TDMA 25% 36%
common spectrum throughout the world. Hence, the first
CDMA 16% 28%
efforts on 3G systems truly started once the World Admin- Total subscribers (millions) 85 135
istrative Radio Conference (WARC) 92 had identified a
total of 230 MHz for IMT-2000, as illustrated in Fig. 1. Source: France Télécom North America.
IMT-2000 IMT-2000
ITU
Allocations 1885 1980 2010 2025 2110 2170 2200 MHz
mobile operators’ need to leverage their 2G investment. offers the advantage of enabling full use of the IMT-2000
Specifically, this meant that 3G systems must ensure back- frequency bands; the FDD mode is used in priority in the
ward compatibility with existing systems, providing seam- paired bands and the TDD mode, in unpaired bands. This
less handover to and from 2G systems, in addition to shar- compromise was then submitted to ITU-R as the European
ing tower infrastructure and transmission resources. As a proposal for the IMT-2000 radio interface.
result, in November 1999, the International Telecommuni- Standardization in Europe was strongly influenced by
cations Union — Radiocommunications (ITU-R) could not lobbying within forums such as the GSM Association
establish a consensus on any one 3G air interface and thus and the UMTS Forum, which strove to federate the
adopted five different solutions: stances of GSM operators and manufacturers regarding
the development of 3G systems. National regulation
• Universal Mobile Telecommunications System/Wide- authorities also played a fundamental role in defining
band CDMA (UMTS/WCDMA) one of the two modes the use of the spectrum identified by the WARC 92, and
supported by NTT DoCoMo, Nokia, and Ericsson, for the attribution of UMTS licenses.
and developed by the Third-Generation Partnership
Project (3GPP). 3.3.2. Japan, Korea, and China. In Japan, most 3G
• cdma2000, an evolution of the American CDMA developments were financed by mobile operator NTT
IS95A solution originally developed by Qualcomm DoCoMo. Japanese industry supported this R&D effort
and currently standardized by the Third-Generation to develop a new standard and take the lead in this
Partnership Project 2 (3GPP2). very competitive market. European manufacturers Nokia
and Ericsson took part in this effort, which led to the
• UMTS/TD-CDMA: UMTS mode combining time-
establishment of a common solution between them and
division (TD) and code-division multiple access,
Japan, based on WCDMA. This compromise was reached
supported by Siemens. This solution also includes a
just as ETSI was seeking candidates for its 3G mobile
specific option developed for China called TDSCDMA
system, leading to a convergent view between Japan and
(the S stands for synchronous). This mode is also
Europe in favor of WCDMA for the air interface. Although
developed by 3GPP.
other carriers like Japan Telecom followed this direction,
• Enhanced Data for GSM Evolution (EDGE) or UWC- KDDI, the second largest Japanese carrier, is strongly
136. This solution represents an evolution of both involved in cdma2000.
TDMA and GSM. In Korea, the Telecommunications Technology Asso-
• Digital enhanced cordless telephony DECT developed ciation (TTA) kept two paths open for the evolution of
by ETSI. the country’s CDMA mobile networks. Both WCDMA and
cdma2000 are officially supported by the Ministry of Infor-
Among these solutions, cdma2000, WCDMA, and EDGE mation and Communications. In fact, the three mobile
are discussed in greater detail in the following paragraphs. carriers (SK Telecom, KT Freetel and LG Telecom) have
already started to offer cdma2000 1x services in overlay
3.3. Regional Standardization of their 2G networks. The first two operators obtained 3G
licenses to deploy WCDMA, mainly for ease of roaming.
3.3.1. Europe. In Europe, impetus for the development of The government plans to grant another 3G license based
IMT-2000 was largely provided by European Commission on cdma2000.
research programs in the late 1980s and early 1990s In China, work on a 3G standard started in 1992 within
(RACE I and II,1 ACTS/FRAMES2 project). In 1991, with the First Research Institute of the Datang Group, now
the deployment of the first 2G systems, the European the Chinese Academy of Telecommunication Technology
Telecommunications Standards Institute (ETSI) created (CATT). After studying the European GSM standard in
a subcommittee to develop an IMT-2000 system called detail, the group developed its own standard, Synchronous
UMTS. Efforts first focused on defining the technical code-division multiple access (SCDMA), which became the
requirements for the radio interface, and various solutions basis for subsequent 3G developments. With the support
were presented at the December 1996 ETSI conference, of Siemens, TDSCDMA was adopted as an official ITU 3G
including three proposals by ACTS/FRAMES. Following a standard. The Chinese government is devoting significant
vote in January 1998, a compromise was found based resources and energy to building a national industry
on two harmonized modes: WCDMA and TDCDMA. around this standard, instead of relying on imported
WCDMA was adopted for the frequency-domain duplex network equipment and terminals. The country which,
(FDD) mode, namely, one frequency per transmission in 2001, represents the world’s second largest mobiles
direction, and TDCDMA for the time-domain duplex (TDD) market, can afford to develop its own standard, touted as
mode, specifically, time-division multiplexing of the two offering greater spectral efficiency than rivals WCDMA
directions on the same frequency. This combined solution and cdma2000 and being more cost-effective and more
easily integrated in the GSM environment. Carriers such
1 as state-owned China Mobile and China Unicom may
Research on Advanced Communication Technologies in Europe.
2 potentially deploy TDSCDMA.
Advanced Communication Technologies and Services/Future
Radio Wideband Multiple Access System). The main partners
were France Télécom, Nokia, Siemens, Ericsson, and CSEM/Pro 3.3.3. United States. In the United States, tough
Telecom. and unrestricted competition driven by the various
IMT-2000 3G MOBILE SYSTEMS 1097
manufacturers led to the creation of several standards entities3 developed with increasingly close contacts.
committees. Initially, the Telecommunication Industry In 1998, ETSI, the Association of Radio Industries
Association (TIA) was in charge of standardizing TDMA and Businesses (ARIB, Japan), TTC, as well as the
IS136 (TIA/TR43.3) and CDMA IS95A (TIA/TR45.5) while Telecommunications Technology Association (TTA) of
T1P1, the GSM Alliance, and the GSM Association were Korea and T1P1 of the United States founded the Third
involved in GSM standardization. When 3GPP2 was Generation Partnership Project (3GPP) as a forum to
created in 1999, the TIA/TR45.5 working group became a develop a common WCDMA standard, assuming GSM
major player in cdma2000 standardization. TIA/TR45.34, as a basis. The following year, the American National
on the other hand, decided that 3GPP2 was not the Standards Institute (ANSI) International Committee
suitable group for TDMA standardization and joined initiated 3GPP2, geared toward developing standards for
standardization efforts within the Universal Wireless the cdma2000 standard, in continuity with the CDMA
Communications Consortium (UWCC). IS95A standard. Member organizations include ARIB,
Establishing digital cellular coverage in the United China Wireless Telecommunication Standards Group
States is costly and carriers have had to invest a substan- (CWTS), Telecommunications Industry Association (TIA)
tial amount of money. It is thus not surprising to note from North America, Telecommunications Technology
that the primary recommendations of American proposals Association (TTA) from Korea and Telecommunications
for IMT-2000 often correspond to evolutions of existing Technology Committee (TTC) from Japan.
second-generation systems maintaining backward com- Harmonization efforts between 3GPP and 3GPP2
patibility in order to capitalize on the initial investment. resulted in a number of common features in the competing
In particular, cdma2000 1x was designed for smooth evo- technologies, and enabled work to be divided between
lution from the second-generation IS95A standard. them such that 3GPP handles direct-sequence WCDMA
The United States is facing a spectrum shortage for and 3GPP2 focuses on the multicarrier (MC) mode in
3G systems. A large part of the frequency band allocated cdma2000. 3GPP successfully produced a common set
by WARC 92 (see Fig. 1) is currently used by second- of standards for the WCDMA air interface, known as
generation personal communication systems. Although ‘‘Release 99.’’ Meanwhile, 3GPP2 issued Release A of
the United States has allotted only 210 MHz of spectrum cdma2000 1x and is currently (at the time of writing)
for mobile wireless use, compared to an average of working on Release B. In parallel, 3GPP2 also released
355 MHz per country in Europe, companies are currently an evolution of cdma2000 1x called ‘‘cdma2000 1xEV’’
pressing forward with their plans for third-generation (1xEVOLUTION) Phase 1 [also called ‘‘HDR (High Data
(3G) networks, while industry efforts at obtaining more Rate standard) Data Only’’] and is currently working
spectrum, led by the Cellular Telecommunications and on Phase 2 (Data and Voice). Section 4 provides more
Internet Association (CTIA), continue. Sprint PCS and information on these releases. Table 2 presents an
Cingular, for example, have announced plans to squeeze overview of the different CDMA technologies envisioned
more capacity out of existing networks by upgrading for 3G.
technology, although spectrum in the United States is An Operators Harmonization Group (OHG) was also
already more crowded than in other markets. The United created in 1999 for promoting and facilitating convergence
States has nearly 530,000 mobile customers per megahertz of 3G networks One goal is to provide seamless global
of spectrum while the United Kingdom has just more than roaming among the different CDMA 3G modes (cdma2000,
80,000 users, and Finland, the world’s leader in wireless WCDMA, and TDD modes). The OHG was involved
penetration, has only 15,000 users per megahertz. in specifying what mode should be used for the 3G
In October 2000, a memorandum was issued to define radio access network. While cdma2000 was specifically
the need for a radiofrequency spectrum for future mobile oriented toward the multicarrier mode, WCDMA was to
voice, high-speed data, and Internet-accessible wireless be only direct-spread. The objective is to achieve a flexible
services. The memorandum directed the Secretary of connection between the radio transmission technologies
Commerce to work cooperatively with the Federal (RTTs) and the core networks (either evolved GSM MAP
Communications Commission (FCC). (mobile application part) or evolved ANSI-41).
Various frequency bands had been identified for possi-
ble 3G use. The FCC and the National Telecommunica-
4. CONTEXT AND EVOLUTION PATHS
tions and Information Administration (NTIA) undertook
studies of the 2500–2690 MHz and the 1755–1850 MHz
4.1. Development Context
frequency bands in order to provide a full understanding
of all the spectrum options available. Both bodies stated Why are IMT-2000 systems called ‘‘3G’’? First-generation
possible sharing and segmentation options, but a review mobile systems were based on analog standards including
of the reports showed that, for every option, there were AMPS in the United States, TACS (Total Access
several caveats. Therefore, 3G carriers will mainly count Communication System) in the United Kingdom, CT-2
on existing spectrum for 3G services. Sprint PCS and Cin-
gular are heading this way when releasing 3G services 3 ETSI for Europe, the Telecommunications Technology Commit-
toward the end of 2001.
tee (TTC) and the Association for Radio Industries and Businesses
(ARIB) for Japan, the Telecommunication Industry Association
3.3.4. 3GPP and 3GPP2. In this international context, (TIA) and American National Standards Institute (ANSI) for the
standardization activities led within the ITU and regional United States.
1098 IMT-2000 3G MOBILE SYSTEMS
Source: Soundview Technology Group and Motorola (published in Global Wireless Magazine).
in Europe, and that of NTT (Nippon Telephone and packet-based data technologies such as General Packet
Telegraph) in Japan. The term ‘‘second generation’’ refers Radio Service (GPRS or GSM phase 2+).
to digital mobile systems such as TDMA and CDMA GPRS theoretically supports up to 115.2 kbps packet-
in the United States, GSM (USA, Europe, China) and switched mobile data alongside circuit-switched tele-
Personal Digital Cellular (PDC) in Japan. These systems phony. The principle is to utilize GSM time slots to carry
provide mobile telephony, short-message services (SMSs) data. The amount of data per time slot varies from about
and low rate data services relying on standards such as the 9 to 21 kbps, depending on the coding scheme used. GPRS
Wireless Applications Protocol (WAP) and I-mode. WAP is is well adapted to asymmetrical traffic, with a greater
a standard for delivering Internet content and applications number of time slots dedicated to the downlink. It also
to mobile telephones and other wireless devices such enables per volume billing, which, according to the I-mode
as personal digital assistants (PDAs). It can be used example, encourages use of the service. GPRS requires
independently of the type of terminal and network. WAP new equipment in the GSM base station subsystem (BSS)
content is written in wireless markup language (WML), and network subsystem (NSS) in order to handle the
a version of HTML designed specifically for display on packet data. The packet control unit (PCU) located in
wireless terminal screens. To provide WAP content, the the BSS handles the lower levels of the radio interface:
operator must implement a server between the wireless the radio-link control (RLC) and medium access control
network and the Internet, which translates the protocol (MAC) protocols. In the NSS, there are two important ele-
and optimizes data transfer to and from the wireless ments, which are also used in 3G networks. The first is
device. However, slow data rates, poor connections, and the Serving GPRS Support Node (SGSN), basically an IP
a limited number of services have significantly limited (Internet Protocol) router with specific functional features.
the use of mobiles for data applications. I-mode, on the For the subscribers in a given area of the mobile network,
other hand, attracted millions of users within its first it handles authentication and security mechanisms, mobil-
year of existence, starting in 1999, in Japan. Despite ity management, session management, billing functions,
the 9.6-kbps (kilobits per second) data, this proprietary in addition to the transmission of data packets. The sec-
packet-data standard developed by NTT DoCoMo met ond element is the Gateway GPRS Support Node (GGSN),
with widespread success for several reasons: no dial-up, another IP router which acts as gateway for data trans-
volume-based billing, services adapted to the low bit rate, mission between the GPRS network and other packet data
and an extensive choice of content providers. networks. Other elements in the GPRS network include
a domain name server (DNS) and a legal interception
4.2. Migration Paths gateway. The home location register (HLR) of the mobile
network is modified to take into account the data capa-
The technological options taken by different cellular bilities of GPRS customers. Mobile terminals (MTs) must,
operators to deploy 3G networks depend on the 2G of course, be GPRS-compatible. There are several MT
technology employed. In Europe, the starting point is classes; the most basic is a GPRS radio modem for a lap-
GSM. In the United States, in addition to GSM, operators top or handheld device, the most sophisticated handles
are focusing on two other paths, TDMA, and IS95A both voice and data flows simultaneously.
(CDMA). Each of these migration paths is described EDGE or Enhanced GPRS (EGPRS), one of the
hereafter. five standard IMT-2000 radio interfaces, represents an
upgrade not only from GSM but also from TDMA. The
4.2.1. GSM Migration Path. The first step was high- EDGE approach is similar to that of GPRS, in that it makes
speed circuit-switched data (HSCSD), introduced commer- use of existing time slots for packet data transmission.
cially in Finland in 1999. HSCSD supports data rates of However, EDGE uses a more elaborate coding scheme
up to 57.6 kbps by grouping four GSM time slots. Access providing up to 48 kbps per time slot, yielding an overall
to HSCSD, however, involves a new terminal for the cus- rate of up to 384 kbps. A potential drawback of EDGE
tomer. This service has not been developed extensively, is that, as data rates increase, range decreases. While
and operators are putting more energy into developing services are not interrupted, this fact may require the
IMT-2000 3G MOBILE SYSTEMS 1099
operator to build out a denser network. Furthermore, as Another potential alternative called ‘‘cdma2000 1x
in the case of HSCSD and GPRS, a specific terminal EV’’ (1x Evolution) also uses a 1.25-MHz channel. This
is required. EDGE is described in greater detail in transition includes two phases. For Phase 1, the High Data
Section 6.3. Rate (HDR) standard was released in August 2000. This
In its Release 99 (R99), UMTS, taken to be synonymous technology, also named 1xEV-DO (meaning 1x Evolution
with WCDMA, offers symmetrical links at 384 kbps Data Only), is supported by Qualcomm, Ericsson and
for customers at a speed of about 120 km/h, and Lucent. It is expected to increase peak data rates of up to
128 kbps for customers at a speed of up to 300 km/h. 2.4 Mbps. Phase 2 (1xEV-DV meaning 1xEvolution Data
Technically, this technology allows a data rate of up to and Voice), under standardization, will enable both voice
2 Mbps for a quasistationary user. In order to ensure and data channels (up to around 5 Mbps), while enhancing
compatibility with existing GSM networks, dual band, capacity and coverage.
dual-mode UMTS/GSM terminals are required for access Figure 2 illustrates the main alternatives for operators
to nationwide voice and data (GPRS or EDGE) services, at of 2G cellular networks.
least until UMTS coverage attains a significant percentage
of the country, and GSM networks are progressively
phased out. 5. IMT-2000 SERVICES AND TERMINALS
While intermediate steps HSCSD, GPRS, and EDGE
are overlaid onto the GSM radio network, UMTS requires 5.1. Services
an entirely different radio access network. The network A key feature of IMT-2000 is the wide range of services
architecture is described in Section 6.2. offered. In addition to voice, videotelephony, and videocon-
ferencing applications, it covers asymmetrical data flows
4.2.2. TDMA Migration Path. The first step in the (Web browsing, video or audio streaming), as well as low-
evolution of TDMA is IS136+, which advances the data data-rate machine-to-machine communications (metering,
speed to 64 kbps through packet switching over a 200-kHz e-wallet applications). Unlike 2G networks and fixed net-
channel. Since IS136+ require a 200-kHz channel, instead works, IMT-2000 does not preassign a bit rate and service
of the 30-kHz bandwidth previously used by TDMA, the quality level to each type of service, but rather provides
base stations must be upgraded with new hardware, which a framework within which communications can be char-
is expensive. The next phase is IS136 HS (high-speed), acterized in terms of their requirements. In a mobile
which uses EDGE technology, allowing the network to context, in particular, services should remain available
reach a theoretical data rate of 384 kbps, and also requires in a flexible manner (at a lower rate or with less error
a hardware upgrade to the network’s base stations. From protection, e.g., in case of degraded radio conditions) to
there, the network requires another expensive hardware ensure optimal use of the allocated frequency bands while
(base station) upgrade to support WCDMA. As a result, guaranteeing an acceptable service level from the user’s
even if a carrier skipped IS136+, the TDMA migration viewpoint. These considerations led to the definition of
involves two expensive hardware upgrades and therefore four quality-of-service (QoS) classes:
represents the most costly evolution path.
Release A Release B
CDMA2000 1× MC CDMA2000 1×MC CDMA2000 3×MC
CDMA IS-95A
up to 153.8 kbps up to 307 kbps up to 2.14 Mbps
14.4 kbps
1.25 MHz ~144 kbps 1.25 MHz 3 MHz
CDMA IS-95B
up to 115 kbps CDMA2000 1×EV CDMA2000 1×EV
~ 64 kbps up to 2.4 Mbps (Phase 2
(HDR) 1.25 MHz up to 5 Mbps)
Fundamental Preserve time relation Preserve time relation Request response Destination is not
characteristics (variation) between (variation) between pattern expecting the data
information entities information entities Preserve payload within a certain
of the stream of the stream content time
Conversational One-way flow Preserve payload
pattern: stringent content
and low delay
Example of the Voice, video telephony Video, audio Web browsing Background download
application streaming of emails
Quality of service in a packet network consists of at services (call hold, call forward, call conference, etc.).
least the following components: However, it was difficult for operators to propose
innovative services to attract the customer. To provide
Bandwidth — data rate greater flexibility in service creation, the second phase
of GSM standardization included the introduction of
Delay — end-to-end or round-trip latency
‘‘toolkits’’: CAMEL (customized applications of mobile
Jitter — interpacket latency variation network enhanced logic5 ), an intelligent network concept
Loss — rate at which packets are dropped for GSM, SIM toolkit (STK), and MExE (mobile execution
environment), which includes WAP. These toolkits were
Additionally, this QoS may be used in GSM to introduce prepaid services (CAMEL) and
mobile internet portals (WAP). In 3G, these principles are
Unidirectional or bidirectional still valid, but efforts are focused on integrating the various
Guaranteed or statistical toolkits in a single one called Open Service Architecture
(OSA), which is, in fact, an application programming
End-to-end or limited to a particular domain or domains
interface (API) based on PARLAY, a forum developing a
Applied to all traffic or just to a particular session or
sets of sessions.
5 CAMEL is based on an intelligent network architecture that
5.1.1. Open Service Architecture (OSA). Second-gene- separates service logic and database from the basic switching
ration mobile systems offered fully standardized services functions, and implements the Intelligent Network Application
such as voice, fax, short messages, and supplementary Protocol (INAP).
IMT-2000 3G MOBILE SYSTEMS 1101
common API for the different networks. This new concept A major change in 3G terminals with respect to second-
is still under development in 3GPP and will be introduced generation mobile terminals lies in the replacement of the
in post-R99 UMTS releases. 2G subscriber identity module (SIM) card with a more
An important feature of open service architecture is general-purpose card called a universal integrated circuit
that service design and provision can be ensured by card (UICC) of the same size. The UICC contains one or
companies other than the network operator. more user services identity modules (USIMs) as well as
other applications. Communications-related advantages
5.1.2. Virtual Home Environment. The virtual home include enhanced security, the ability to use the same
environment (VHE) concept, based on CAMEL, will handset for business and private use and to roam from
provide customers with the same set of services whatever UMTS to GSM networks as needed. The UICC can
the location. also contain payment mechanisms (micro-payment, credit
When a subscriber is roaming, his or her service profile, card), access badge functions, and user profile information.
or even the service logic registered in the home location
register (HLR), is transferred to the visited network to
offer the required services with the same ergonomics. 6. IMT-2000 NETWORK ARCHITECTURE
Target Source J
A3 traffic
BSC BSC A8 A10
A3 sig
SDU A9 PCF A11 PDSN
function
A7
Figure 4 depicts the logical architecture of the radio A5 Carries a full duplex stream of bytes between the
access network. It describes the overall system functions, interworking function (IWF) and the SDU function.
including services and features required for interfacing A7 Carries signaling information between a source BTS
a BTS with other BTSs, with the MSC for the circuit and a target BTS.
transmission mode, and with the packet control function A8 Carries user traffic between the BTS and the PCF.
(PCF) and the packet data serving node (PDSN) in the A9 Carries signaling information between the BTS and
packet transmission mode. the PCF.
The interfaces defined in this standard are described A10 Carries user traffic between the PCF and the PDSN.
below. A11 Carries signaling information between the PCF and
the PDSN.
A1 Carries signaling information between the call con-
trol (CC) and mobility management (MM) functions A8, A9, A10, and A11 are all based on the use of Internet
of the MSC and the call control component of the BTS Protocol (IP). IP can operate across various physical layer
(BSC). media and link layer protocols. Conversely, A3 and A7 are
A2 Carries 64/56-kbps pulse-code modulation (PCM) based on ATM Transport; and A1, on SS7 Signaling.
information (voice/data) or 64-kbps unrestricted In 2000, the Abis interface (between the BTS and the
digital information (UDI, for ISDN) between the MSC BSC) and Tandem Free Operation (TFO) for cdma2000
switch component and one of the following: systems were also standardized.
• The channel element component of the BTS (in the
case of an analog air interface) 6.1.2. Core Network. The 3GPP2 architecture is based
• The selection/distribution unit (SDU) function (in on the same wireless intelligent network (WIN) concept
the case of a voice call over a digital air interface) as that developed by 2G mobile networks, but this
A3 Carries coded user information (voice/data) and structure is partially modified to introduce packet data
signaling information between the SDU function and technologies such as IP. The objective of these circuit-
the channel element component of the BTS. The A3 switched networks was to bring intelligent network (IN)
interface is composed of two parts: signaling and user capabilities, based on ANSI-41, to wireless networks
traffic. The signaling information is carried across a in a seamless manner without making the network
separate logical channel from the user traffic channel, infrastructure obsolete. ANSI-41 was the standard backed
and controls the allocation and use of channels for by wireless providers because it facilitated roaming. It has
transporting user traffic. capabilities for switching and connecting different systems
IMT-2000 3G MOBILE SYSTEMS 1103
Supporting servers:
DNS, OA&M, DHCP,
Voice mail, SMS, EIR
Internet/Intranets HLR
AAAF/AAAH
IP backbone MSC
PSTN
PDSN/HA
cdmaOne cdmaOne
cdma2000 - 1x cdma2000 - 1x Figure 5. 3GPP2 architecture.
and the ability for performing direct connections between IP’’ and ‘‘mobile IP.’’ Only the mobile IP mode requires
different elements of the WIN based on SS7 Signaling such the use of home agent (HA) and PDSN/foreign agent
as GSM/MAP. (PDSN/FA) entities. It is clear that these two modes are not
Wireless data packet networking is based on the IS835 equivalent from a service point of view. Mobile IP supports
standard, whose architecture is illustrated in Fig. 5. This mobility toward different PDSNs during a session. In this
architecture provides access, via the cdma2000 air inter- sense, the mobile IP solution is similar to the General
face, to public networks (Internet) and private networks Packet Radio System (GPRS — see Section 4.2). Simple IP
(Intranets) considering the required quality of service only offers a connection to the packet domain without any
(QoS) and the pertinent accounting support. 3GPP2 strives real-time mobility between PDSNs. Simple IP supports
to reuse IETF open standards whenever possible in order mobility within a given PDSN.
to keep a high level of interoperability with other cellular It is important to note that this standard uses circuit-
standards, and to increase marketability. These standards switched MSC resources (ANSI-41 Domain and SS7
include mobile IP for interPDSN (packet data serving Signaling) to handle both the voice and packet radio
node) mobility management, radius for authentication, control resources (access registration, QoS profile based
authorization and accounting, and differentiated services verification, paging, etc.).
for the QoS. This model will certainly be modified with the
Figure 6 introduces the general model for the 3GPP2 emergence of the all-IP network, which will support
packet domain. This domain includes two modes, ‘‘simple data capacities largely exceeding those offered in the
Legend SS7
AAA Authentication, authorization and accounting VLR Network HLR
HA Home agent MSC
Home access
HLR Home location register
provider network
MS Mobile station
PCF Packet control function
PDSN Packet data serving node
RN Radio network Radius Radius
RRC Radio resource control Home IP
VLR Visitor location register network
R-P IP Radius
interface network
Broker
network
Mobile station PDSN
RN
HA
Visited access Only for the
provider network Home IP network, mobile IP mode
private network,
home access provider
network
Iu Iu
Figure 7. UMTS (W-CDMA) radio access network Cells Cells Cells Cells
architecture
mixed circuit/packet data scheme. This new architecture is ensure interworking with GSM and dual-mode FDD/TDD
expected to be ready by the end of 2002. The future network operation.
will enable services such as voice over IP, multimedia calls, FDD mode is appropriate for all types of cells, including
and video streaming, based on full use of IP Transport large cells, but is not well adapted to asymmetric
in both the radio access and core networks. This step traffic. TDD is, by definition, more flexible to support
will mark the end of the classical circuit-switched core traffic asymmetry, but it requires synchronization of
network. the base stations, and is not appropriate for large
cells because of the limited guard times between time
6.2. WCDMA slots. Table 4 lists the main characteristics of the two
modes.
6.2.1. Radio Access Network The WCDMA FDD mode is based on CDMA principles
6.2.1.1. Deployment, Duplex Mode. WCDMA deploy- with a bandwidth of 5 MHz. One major difference with
ment involves a multilayer cellular network, with the IS95 standard is that no synchronization is required
macrocells (0.5–10 km in range) for large-scale cover- among base stations, thus allowing easier deployment for
age, microcells (50–500 m) for hotspots, and picocells operators. One of the key advantages of WCDMA is its
(5–50 m) for indoor coverage. Handover is ensured both high spectral efficiency, or capacity per unit of spectrum
among WCDMA cells and between WCDMA and GSM [typically expressed in kilobits per second per megahertz or
cells, without any perceptible cut or degradation of (kbps/MHz)]. Depending on the services offered, WCDMA
quality. offers 2–3 times the capacity of GSM with the same
As indicated in Section 4, the air interface adopted by amount of spectrum.
ETSI in January 1998 is based on two harmonized modes: The TDD mode is based on a mix between TDMA
FDD/WCDMA for the paired bands and TDD/TDCDMA and CDMA. The TDD frame has 15 time slots and
for the unpaired bands. UMTS is to be deployed each time slot supports several simultaneous CDMA
using at least two duplex 5-MHz bands, and must communications.
IMT-2000 3G MOBILE SYSTEMS 1105
UMTS
BS
lub IN Platform
Camel
BTS
UMTS lu
UMTS UMTS MSC
BS RNC
Terminal
lu ISUP
GSM ISDN
Abis Map
BTS
A
GSM HLR
BTS BSC
GSM Gb
GSM/GPRS BTS Map
Terminal
Gi
Gn
A Internet
GSM
Abis SGSN GGSN
GSM
BTS BSC
Intranet
BTS GSM
BTS
GSM
Terminal Base station subsystem Network subsystem
6.2.1.3. WCDMA Radio Dimensioning. Dimensioning capacity made available based on the coverage criterion,
of the radio access network takes into account both it is necessary to add carriers and/or sites.
coverage and capacity. To ensure physical coverage, the
cell radius is based on link budgets calculated according 6.2.2. WCDMA Core Network. The WCDMA network
to the different service types, propagation environment, architecture is illustrated in Fig. 9. As mentioned, the
indoor penetration assumptions, and loading factor. It is radio access network is comprised of specific WCDMA
recalled that, in CDMA networks, each user contributes to base stations and RNCs. The network subsystem requires
the interference level, which causes the cell to ‘‘shrink,’’ a an SGSN (described in Section 4.2), which may be
phenomenon called ‘‘cell breathing.’’ Generally, a loading specific to the UMTS system or shared with the GPRS
factor of 50–70% is taken. This factor is based on a service, and a specific operations and maintenance center
theoretical ‘‘100%’’ load at which interference tends to (OMC). Other NSS components, such as the DNS and
infinity. GGSN, can also be shared with the existing GPRS
On the basis of geomarketing data, the number of system.
users per square kilometer and the average data rate The UMTS Release 99 core network comprises two
per user during busy hour yields a load in terms of data distinct domains: circuit-switched (CS) and packet-
rate (Mbps or kbps) or erlangs per square kilometer. The switched (PS), as in GSM/GPRS networks. The core
corresponding cell size is computed to handle this load. network elements are the same as in these networks:
This cell area is matched against that computed from MSC (mobile switching center) for CS services, and SGSN
the link budgets, and when the offered load exceeds the and GGSN for PS services.
1106 IMT-2000 3G MOBILE SYSTEMS
In Release 99, ATM was chosen for transport in the agreements with these new entrants, enabling them
access network. This choice makes it possible to support to offer their customers national coverage for voice
all the types of services offered: voice, circuit-switched service. The choice of assignment method — auctions or
data, and packet-switched data. Different ATM adaptation comparative hearings — led to highly divergent license
layers (AALs) are used: AAL2 for the voice or data on the costs in Europe. While auctions generated overall state
Iu-cs circuit-switched domain interfaces, Iur and Iubis. revenues ranging from 85 to 630 Euros per capita (with
AAL5 is used for signaling and for user data on the Iu-ps Germany and the United Kingdom gaining the highest
packed-switched domain interface. amounts), comparative hearings yielded revenues ranging
from approximately 0 to 45 Euros per capita; a notable
6.3. EDGE exception was France, where the license fee was set,
shortly after the English auctions, at 335 Euros per capita.
EDGE, as mentioned in Section 3.2., is a convergent However, only two candidates submitted applications for
solution for TDMA (IS136), and GSM. the four licenses available; other candidates cited the
Since July 2000, the 3GPP has been responsible high cost as the main deterrent. Rollout requirements,
for standardization of the GSM/EDGE Radio Access set primarily in conjunction with comparative hearings,
Network (GERAN). In order to harmonize this access generally call for initial deployment as of 2002, with
technology with others developed for IMT-2000, notably coverage of the main cities and extension to 50–80%
WCDMA and cdma2000, the 3GPP chose to align the of the population within the following 6–8 years. In
QoS requirements for EDGE with those of the other Japan, license costs were minimal, while in the United
technologies (see Table 1). However, because the data States, 3G spectrum auctions led to per capita costs
rate varies with distance from the base station in similar to those observed in the United Kingdom and
EDGE, only interactive and background type services Germany.
are expected to be offered initially, even though the The combined obligations of paying license fees and
EDGE standard encompasses enhanced circuit-switched building out entirely new radio access networks, which
data (ECSD) services. represent some 80% of the initial IMT-2000 investment
The core network does not differ from that of a GPRS within a set time frame, have placed a considerable
core network; however, because of the increased data burden on both operators and equipment manufacturers.
rate available, capacity of the data links between base In Sweden, for example, operators have formed joint
stations and BSCs, and between the BSCs and MSC ventures to build the radio infrastructure in areas outside
and SGSN must be dimensioned appropriately. The radio of the major metropolitan centers. In such cases, each
access network is based on existing 2G infrastructure, as operator exploits its own spectrum, but shares the cost
illustrated in Fig. 9. of buildout in order to focus on service development [6].
The introduction of the Iu interfaces enables fast han-
Opinions diverge as to the economic prospects for IMT-
dover in the packet-switched domain, real-time services
2000. The outlook is optimistic in Japan, where customer
in the packet-switched domain, and enhanced multiplex-
awareness of mobile data services is high because of I-
ing capabilities in both the packet- and circuit-switched
mode, and where willingness to pay for such services
domains. The A and Gb interfaces are maintained for
is among the highest in the world. Although Europe
compatibility with legacy systems.
is well advanced in terms of second-generation mobiles
The GERAN has been standardized for the following
penetration, mobile data services have gotten off to
GSM frequency bands: 900, 1800, 1900, 400, and 700 MHz.
a slow start, with the disappointingly limited scope
Note that these bands do not include those allocated to
and speed of WAP. As better data rates and volume-
IMT-2000 systems; hence this technology can be used
based billing become available with GPRS, customers
by operators without IMT-2000 spectrum. However, it is
may more willingly adopt new data services offered by
recalled that the highest data rate possible is 384 kbps
a wide panel of providers [6]. This ‘‘education’’ stage
versus the theoretical maximum of 2 and 2.4 Mbps, for
is considered crucial to the rapid adoption of IMT-
WCDMA and cdma2000, respectively.
2000.
In the United States, mobile data have generally
7. LICENSE ASSIGNMENT AND ECONOMIC been restricted to very low-data-rate exchanges, while
IMPLICATIONS OF IMT-2000 (4/2001) fixed lines remain the preference for Internet access.
Moreover, customers are used to free Internet content.
As mentioned in Section 3.3, national regulators played This means that innovative location-based, customized
a major role in shaping the development of IMT-2000 services will have to be developed for operators to recoup
3G networks and services. In Europe, they focused on their expenses.
three important factors: 3G license cost and allocation In all cases, with respect to 2G, new players will
mechanism, number of 3G operators per country, and be joining the value chain: content suppliers, service
rollout/coverage obligations. With an initial goal of brokers, virtual mobile network operators, and network
encouraging competition to enhance service offerings resource brokers. While some of these new players may
and ensure reasonable prices for the end customer, simply be affiliates of today’s major mobile operators,
regulatory agencies chose to allocate spectrum to at least some will be entirely new entities such as retailers, banks,
one or two new entrants in each country, requiring, in insurance companies, and entertainment companies. This
most cases, incumbent GSM operators to sign roaming means that the corresponding 3G revenues, whether
IMT-2000 3G MOBILE SYSTEMS 1107
from the end user, a third party, or advertising, will of standards organizations such as 3GPP and 3GPP2,
be spread more thinly than in the 2G world, leading the policies developed by national and international
in all likelihood to continentwide consolidation [7] of regulatory bodies that will have learned the lessons of 3G,
operators. the ability of operators and service suppliers to anticipate
customers’ needs and desires and, most importantly, the
customers themselves.
8. BEYOND 3G
Acknowledgments
With many uncertainties remaining as to the economic
The authors would like to thank their colleagues at France
viability of IMT-2000 in the near term, research and
Télécom R&D as well as the partners of the ACTS 364/TERA
development efforts are already turning toward the project for their useful support and discussions.
next phase, coined ‘‘beyond 3G.’’ This phase, expected
to emerge as of 2005, is based on seamless roaming
among heterogeneous wireless environments, including BIBLIOGRAPHY
3G mobile networks and indoor wireless facilities such
as radio LANs and Bluetooth networks. Another topic 1. UMTS Forum, Report 5, 1998.
currently being explored is the cooperation of 3G networks 2. 3rd Generation Partnership Project Technical Specification TS
and broadcast standards such as digital video or audio 23.107 v3.3.0, June 2000.
broadcasting (DVB, DAB).
3. Global Wireless Magazine, Crain Communications Inc.,
The goal is to optimize the service offered to the end March–April 2001, p. 18.
customer by taking advantage of the spectral resources
4. H. Holma and A. Toskala, eds., WCDMA for UMTS — Radio
available. For example, a train passenger can connect
Access for Third Generation Mobile Communications, Wiley,
with the company intranet via the UMTS network with
Chichester, UK, 2000, p. 2285.
the available bit rate and quality inherent in this support
service; then, when this passenger enters a train station or 5. Pyramid Research Perspective, Europe, Feb. 2, 2001.
airport lounge equipped with a wireless LAN, the terminal 6. Strategy Analytics SA Insight, Dec. 8, 2000.
detects the new network and vice versa, and switches 7. L. Godell et al., Forrester Report: Europe’s UMTS Meltdown,
over to this new broadband resource. If any background Dec. 2000.
tasks are being performed, they are uninterrupted
when the terminal goes from one environment to the
other. FURTHER READING
As a complement to IMT-2000, DVB-T can provide
fast one-to-many services such as weather, sports scores, Useful Websites
or stockmarket values. It can thus significantly ease the
burden on IMT-2000 frequency resources which are then Standards
used for bi-directional wideband links.
ITU: http://www.itu.int/imt/
3GPP: http://www.3gpp.org
9. CONCLUSION AND DISCUSSION 3GPP2: http://www.3gpp2.org
(member organizations can be accessed from these
IMT-2000, a family of third-generation mobile standards,
sites)
is on the verge of deployment. The prospect of offering
a wide range of mobile multimedia services prompted
numerous incumbent operators and new entrants to Manufacturers (Network Equipment and/or Terminals)
spend sometimes exorbitant amounts on 3G licenses.
Undoubtedly, the first years will be characterized by Nokia: http://www.nokia.com
turbulence in infrastructure rollout; technical teams Ericsson: http://www.ericsson.com
will have to be trained, the manufacturers will have Motorola: http://www.motorola.com
to ensure adequate levels of production, and — most Lucent Technologies: http://www.lucent.com
importantly — new sites will have to be negotiated. Qualcomm: http://www.qualcomm.com
Operators’ finances will be strained as they seek to attract
Alcatel: http://www.alcatel.com
customers and generate adequate revenues early enough
to offset the investments made. Smaller operators and new NEC: http://www.nec.com
entrants with no infrastructure may find the expenditure Nortel Networks: http://www.nortel.com
and task too daunting and, willingly or unwillingly, merge Siemens: http://www.siemens.com
with a larger competitor. Toshiba: http://www.toshiba.co.jp/worldwide/
As networks and services reach cruising speed,
Sharp: http://www.sharp-usa.com
operators will focus on enhancing services by finding new
spectral resources in other frequency bands (e.g., GSM Sanyo: http://www.sanyo.com
bands) and by encouraging cooperation among different Sony: http://sony.com
wireless networks, both fixed and mobile. Underlying Panasonic, Matsushita Electric Industrial Co., Ltd.:
the success of this evolution are the continued efforts http://www.panasonic.co.jp/global/
1108 IMT-2000 3G MOBILE SYSTEMS
1These sequences are also called weakly typical. A stronger 2 Relative entropy is also referred to by a variety of other names
version of typicality can be defined using the method of types [3]. including Kullback–Leibler distance and information divergence.
INFORMATION THEORY 1111
however it does not share all of the properties. One English text. The performance of a variable-length source
important difference is that differential entropy depends code is given by the expected code length, averaged over
on the choice of coordinates, while entropy is invariant to the source statistics. This will clearly be smaller if shorter
coordinate transformations. codewords are used for more likely symbols. Instead of
The mutual information between two continuous encoding each source letter individually as in the above
random variables is given by example, a source code may encode blocks of source letters
at once.
I(X; Y) = h(X) − h(X | Y) For variable-length codes, it is important to be able to
tell where one codeword ends and the next begins. For
f (x, y)
= f (x, y) log dx dy example, consider a code which assigns 0 to a and 00 to
f (x)f (y)
b; in this case the string 00 could represent aa or b. A
Unlike differential entropy, mutual information is invari- code with the property that any sequence of concatenated
ant to coordinate transformations, and is a natural gener- codewords can be correctly decoded is called uniquely
alization of its discrete counterpart. decodable. A special class of uniquely decodable codes
are codes where no codeword is the prefix of another; these
are called prefix-free codes or instantaneous codes. The
3. SOURCE CODING class of uniquely decodable codes is larger than the class
of prefix-free codes. However, it can be shown that the
Source coding or data compression refers to the problem of performance attained by any uniquely decodable code can
efficiently representing messages from a given information also be achieved by a prefix-free code, that is, it is sufficient
source as binary sequences.3 Here efficiency refers to the to only consider prefix-free codes. This results follows from
length of the resulting representation. the Kraft inequality, which states that a prefix-free code
Information sources are modeled as random processes, can be found with codewords of lengths l1 , l2 , . . . , lk if and
depending on the situation several different models may only if
be appropriate. One class of model consists of discrete k
sources; in this case, the source is modeled as a sequence of 2−li ≤ 1
discrete random variables. When these random variables i=1
are i.i.d. (independent, identically distributed), this model
is called a discrete memoryless source (DMS). Another The set of codeword lengths for any uniquely decodable
class of sources are analog source models, where the code must also satisfy this inequality.
output of the source is a continuous-time, continuous-
valued waveform. For discrete sources, compression may 3.2. Variable-Length Source Codes
be lossless; that is, the original source output can be In this section we discuss the performance of variable-
exactly recovered from the encoded bit sequence. On the length, uniquely decodable codes. Suppose that the code
other hand, for analog sources, compression must be lossy, encodes a single letter from a DMS at a time. From
since an infinite number of bits are required to represent the Kraft inequality, it can be shown that the minimum
any real number. In this case, source coding consists of average length of any such code must be greater than or
sampling the source and quantizing the samples. We equal to the source’s entropy. For a given distribution of
first discuss lossless source coding of discrete sources, source letters, a variable-length source code with minimal
here the information theoretic limitations are provided average length can be found using a simple iterative
by the source’s entropy rate. In Section 3.5 we consider algorithm discovered by Huffman [5]. The average length
analog sources, in this case the fundamental limits are of a Huffman code can be shown to be within one bit of
characterized by the rate distortion function, which is source’s entropy
defined using mutual information.
H(X) ≤ LH ≤ H(X) + 1
3.1. Classification of Source Codes
For a discrete source, a source code is a mapping or where H(X) is the source’s entropy and LH is the average
encoding of sequences generated by the source into a codeword length of the Huffman code. The above can be
corresponding binary strings. An example of a source code extended by considering a code that encodes blocks of N
for a source with an alphabet {a, b, c, d} is source letters at a time. A Huffman code can then be
designed by treating each block as a single ‘‘supersymbol.’’
a ↔ 00 b ↔ 01 c ↔ 10 d ↔ 11 In this case, if LN
H is the average codeword length, then the
compression ratio (the average number of encoded bits per
Notice this code is lossless, because each source letter is source symbol) satisfies
assigned a unique binary string. This is a fixed-length
LN 1
code, since each codeword has the same size. Source codes H(X) ≤ H
≤ H(X) +
may also be variable-length, as in the Morse code used for N N
Hence, as N becomes large, one can achieve a compression
3This easily generalizes to the case where messages are to be ratio that is arbitrarily close to the source’s entropy. At
represented as strings in an arbitrary, finite-sized code alphabet; times, this result is referred to as the variable-length
we focus on the binary case here. source coding theorem. The converse to this theorem
1112 INFORMATION THEORY
states that no lossless, variable-rate code can achieve a for compressing such a source is to first sample the
compression ratio smaller than the sources entropy. This waveform, which yields a sequence of continuous-valued
can be generalized to discrete sources with memory; in random variables. If the source is a band-limited,
this case, the entropy rate of the source is used. stationary random process, then it can be fully recovered
when sampled at twice its bandwidth (the Nyquist
3.3. The AEP and Shannon’s Source Coding Theorem rate). After sampling, the random variables are then
quantized so that they can be represented with a finite
The AEP, discussed in Section 2.1, provides additional number of bits. Quantization inherently introduces some
insight into the attainable performance of source codes. loss or distortion between the original signal and the
Consider a DMS with an alphabet of size K. There are reconstructed signal at the decoder. Distortion can be
K L possible sequences of length L that the source could reduced by using a finer quantizer, but at the expense of
generate. Encoding these with a fixed-length, lossless needing more bits to represent each quantized value. The
source code requires approximately L log2 (K) bits, or branch of information theory called rate-distortion theory
log2 (K) bits per source letter. However, from the AEP, investigates this tradeoff.
approximately 2LH(X) of these sequences are typical. A scalar quantizer can be regarded as a function that
Therefore a lossy source code can be designed that assigns assigns a quantized value or reconstruction point x̂ to
a fixed-length codeword to each typical sequence and each possible sample value x. A rate R quantizer has 2R
simply disregards all nontypical sequences. This requires quantization points. The loss incurred by quantization is
approximately H(X) bits per source letter. The probability quantified via a distortion measure, d(x, x̂), defined for
of not being able to recover a sequence is equal to the each sample value and reconstruction point. A common
probability that the sequence is atypical, which becomes distortion measure is the squared error distortion given by
negligible as L increases. Making this more precise, it
can be shown that for any ε > 0 a source code using no
d(x, x̂) − (x − x̂)2
more than H(X) + ε bits per source letter can be found
with arbitrary small probability of decoding error. This
is the direct part of Shannon’s source coding theorem. Given a probability distribution for the sample values and
The converse to this theorem states that any code that a fixed rate, the set of quantization points that minimize
uses fewer than H(X) bits per source letter will have a the expected distortion can be found using an iterative
probability of decoding failure that increases to one as the approach call the Lloyd–Max algorithm [9].
block length increases.4 Instead of quantizing one sample at a time, performance
can be improved by using a vector quantizer, that is, by
quantizing a block of N samples at once. A rate R vector
3.4. Universal Source Codes
quantizer consists of 2NR reconstruction points, where
Approaches such as Huffman codes require knowledge of each point is an N dimensional vector. In this case, the
the source statistics. In practice it is often desirable to distortion between a sequence of sample values and the
have a data compression algorithm that does not need a corresponding reconstruction points is defined to be the
priori knowledge of the source statistics. Among the more average per sample distortion. A distortion D is said to
widely used approaches of this type are those based on be achievable at rate R if there exists a sequence of rate
two algorithms developed in 1977 and 1978 by Lempel R quantizers with increasing block size N for which the
and Ziv [6,7]. For example, the 1978 version is used in the limiting distortion is no greater than D. For each distortion
UNIX compress utility. The 1977 algorithm is based on D, the infimum of the rates R for which D is achievable is
matching strings of uncoded data with strings of data given by the operational rate distortion function, Rop (D).
already encoded. Pointers to the matched strings are For an i.i.d. source {Xn }, the (Shannon) rate distortion
then used to encode the data. The 1978 algorithm uses function, R(D), is defined to be
a dictionary that changes dynamically based on previous
sequences that have been encoded. Asymptotically, the R(D) = min I(X; X̂)
Lempel–Ziv algorithms can be shown to compress any
stationary, ergodic source to its entropy rate [8]; such an where the minimization is over all conditional distribu-
approach is said to be universal. tions of X̂ given X which yield a joint distribution such
that the expected value of d(X, X̂) is no greater than
3.5. Lossy Compression: Rate Distortion Theory D. Here X is a scalar random variable with the same
Next we consider encoding analog sources. We restrict distribution as Xn . For a large class of sources and distor-
our attention to the case where the source generates real- tion measures, rate distortion theorems have been proved
valued continuous-time waveforms.5 The basic approach showing that Rop (D) = R(D). Therefore, the fundamental
limits of lossy compression are characterized by the infor-
mation rate distortion function. As an example, consider
4 This is sometimes referred to as the strong converse to the source an i.i.d. Gaussian source with variance σ 2 . In this case,
coding theorem. The weak converse simply states that if fewer for squared error distortion, the rate distortion function is
than H(X) bits per source letter are used then the probability of given by
error can not be zero. 1 σ2
R(D) = 2 log D , 0 ≤ D ≤ σ
5 More generally one can consider cases where the source is 2
In addition to i.i.d. sources, rate distortion theory has been 1−e 1−e
generalized to a large variety of other sources [e.g., 10]. 0 0 0 0
e e
E
3.6. Distributed Source Coding e e
An interesting extension of the preceding models are 1 1 1 1
1−e 1−e
various distributed source coding problems, that is Input Output Input Output
situations where there are two or more correlated Figure 2. A binary symmetric channel (left) and a binary erasure
information sources being encoded at different locations. channel (right).
One information theoretic result for this type of situation
is known as the Slepian–Wolf theorem [11]; this theorem
implies the surprising result that the best (lossless) channel; in this case the input alphabet is {0, 1}, but the
compression achievable by the two sources without output alphabet is {0, 1, E}. Each input letter is either
cooperating is as good as what can be achieved with received correctly, with probability 1 − ε, or is received as
complete cooperation. For distributed lossy compression, an erasure E, with probability ε.
characterizing the rate distortion region remains an open Uncoded transmission over a binary symmetric channel
problem. consists of simply mapping each bit arriving at the channel
coder into the corresponding input symbol. The decoder
would map the received signal back into the corresponding
4. CHANNEL CODING bit. In this case, one bit is sent in each channel use and the
probability of error is simply ε. To reduce the probability of
We now turn to the problem of communicating reliably over error, an error-correcting code may be used. For example,
a noisy channel. Specifically given a bit stream arriving suppose each bit is repeated 3 times when transmitted
at the channel coder, the object is to reproduce this bit over the channel. In this case, if at most one bit out
stream at the channel decoder with a small probability of three is in error, the decoder can correct the error.
of error. In addition, it is desirable to transmit bits This reduces the error probability, but also reduces the
over the channel with as high a rate as possible. A key transmission rate by a factor of 3.
result of information theory is that for a large class of The preceding example is called a repetition code.
channels, it is possible to achieve arbitrarily small error More generally, a (M, n) block code for a discrete channel
probabilities provided that the data rate is below a certain consists of a set of M codewords, where each codeword is a
threshold, called the channel capacity. As with the rate sequence of n symbols in the channel input alphabet. The
distortion function, capacity is defined in terms of mutual encoder is a function that maps each sequence of log2 M
information. Conversely, at rates above capacity, the error data bits into one of these codewords and the decoder maps
probability is bounded away from zero. Results of this type each possible received sequence of length n back into one
are known as channel coding theorems. of the original sequences of data bits. In the previous
Mathematically a channel is modeled as a set of possible example M = 2 and n = 3. The rate of the code is
input signals, a set of possible output signals and a set
of conditional distributions giving the probability for each log2 M
R= bits per channel use
output signal conditioned on a given input signal. As n
with sources, several different types of channel models are
In a (M, n) code, the probability of error for each codeword,
studied. One class consists of discrete channels where the
ei , i = 1, . . . , M, can be calculated from the channel model
input and output signals are sequences from a discrete
and the specification of the decoder. The maximal error
alphabet. Another class of channels consists of waveform
probability for a code is defined to be max ei , and the
channels where the input and output are viewed as i
continuous functions of time; the best-known example is average
error probability of the code is defined to be
the band-limited additive white Gaussian noise channel. (1/M) ei .
i
For a DMC, if the input letters are chosen with the
5. DISCRETE MEMORYLESS CHANNELS probability distribution p(x), then the joint probability
mass function of the channel input and output is given by
A discrete memoryless channel (DMC) is a discrete channel, p(x, y) = p(x)p(y | x). Using this probability mass function,
where the channel input is a sequence of letters {xn } the mutual information between the channel input and
chosen from a finite alphabet. Likewise, the channel output can be calculated for any input distribution. The
output is also a sequence {yn } from another finite alphabet. channel capacity of a DMC is defined to be the maximum
Each output letter yn depends statistically only on the mutual information
corresponding input letter xn . This dependence is specified
by a set of transition probabilities P(y | x) for each x C = max I(X; Y)
and y. Two examples of DMC’s are shown in Fig. 2. On
the left is a binary symmetric channel, where both the where the maximization is over all possible probability
input and output alphabet are {0, 1}. Each letter in the distributions on the input alphabet. In many special cases,
input sequence is reproduced exactly at the output with the solution to this optimization can be found in closed
probability 1 − ε and is converted into the opposite letter form; for a general DMC, various numerical approaches
with probability ε. On the right of Fig. 2 is a binary erasure may be used, such as the Arimoto–Blahut algorithm [12].
1114 INFORMATION THEORY
A variety of such problems have been considered. This 9. S. Lloyd, Least Squares Quantization in PCM, Bell Laborato-
includes the multiaccess channel, where several users ries Technical Note, 1957.
are communicating to a single receiver. In this case, a 10. T. Berger, Rate Distortion Theory: A Mathematical Basis for
capacity region is specified indicating the set of all rate Data Compression, Prentice-Hall, Englewood Cliffs, NJ, 1971.
pairs (for the two-user case). Coding theorems have been 11. D. Slepian and J. Wolf, Noiseless coding of correlated
found that establish the multiple access capacity region information sources, IEEE Trans. Inform. Theory 19:
for both the discrete memoryless case [17] and for the 471–480 (1973).
Gaussian case [18]. Another example is the interference 12. R. Blahut, Computation of channel capacity and rate
channel, where two senders each transmit to a separate distortion functions, IEEE Trans. Inform. Theory 18: 460–473
receivers. The complete capacity region for this channel (1972).
remains open.
13. R. Gallager, Information Theory and Reliable Communica-
tion, Wiley, New York, 1968.
7. FURTHER STUDIES 14. J. Wolfowitz, Coding Theorems of Information Theory,
Prentice-Hall, Englewood Cliffs, NJ, 1978.
For more in-depth reading there are a variety of 15. A. Lapidoth and P. Narayan, Reliable communication under
textbooks on information theory including Cover [19] and channel uncertainty, IEEE Trans. Inform. Theory 44:
Gallager [13]. Shannon’s original papers [1,2] are quite 2148–2175 (Oct. 1998).
readable; all of Shannon’s work, including [1,2], can be 16. C. Shannon, The zero error capacity of a noisy channel, IRE
found in the compilation by Sloane and Wyner [20]. A Trans. Inform. Theory 2: 112–124 (Sept. 1956).
collection of good survey articles can also be found in the 17. R. Ahlswede, Multi-way communication channels, Proc. 2nd
volume edited by [21]. Int. Symp. Information Theory, 1971, pp. 103–135.
18. T. Cover, Some advances in broadcast channels, Adv.
Commun. Syst. 4: 229–260 (1975).
BIOGRAPHY
19. T. Cover, Elements of Information Theory, Wiley, New York,
Randall Berry received the B.S. degree in Electrical 1991.
Engineering from the University of Missouri-Rolla in 1993 20. N. J. A. Sloane and A. D. Wyner, eds., Claude Elwood Shan-
and the M.S. and Ph.D. degrees in Electrical Engineering non: Collected Papers, IEEE Press, New York, 1993.
and Computer Science from the Massachusetts Institute 21. S.Verdú, ed., IEEE Trans. Inform. Theory (Special Commem-
of Technology in 1996 and 2000, respectively. Since 2000, orative Issue) 44: (Oct. 1998).
he has been an Assistant Professor in the Department of
Electrical and Computer Engineering at Northwestern
University. In 1998 he was on the technical staff at INTERFERENCE AVOIDANCE FOR WIRELESS
MIT Lincoln Laboratory in the Advanced Networks SYSTEMS
Group. His primary research interests include wireless
communication, data networks, and information theory. DIMITRIE C. POPESCU
CHRISTOPHER ROSE
BIBLIOGRAPHY Rutgers WINLAB
Piscataway, New Jersey
in the 5 GHz range [1] with few restrictions other than users can reliably transmit information (sum capacity).
absolute power levels — a ‘‘Wild West’’ environment of In other words, interference avoidance methods, through
sorts where the only restriction is the size of the weapon! the self-interested action of each user, lead to a socially
Thus, no central control is assumed. Needless to say, such optimum1 equilibrium (Pareto efficient [9,10]) in various
a scenario seems ripe for chaos. Specifically, self-interest mutual interference ‘‘games.’’
by individual users often results in unstable behavior. Interference avoidance was originally introduced in
Such behavior has been seen anecdotally in distributed the context of ‘‘chip-based’’ DS-CDMA systems [11] and
channelized systems such as early cordless telephones minimum mean-square error (MMSE) receivers, but was
in apartment buildings where the number of collocated subsequently developed in a general signal space [12–14]
phones exceeded the number of channels available. Phones framework [15,16] which makes them applicable to a wide
incessantly changed channels in an attempt to find a variety of communication scenarios. Related methods for
usable one. Thus, some means of assuring efficient use in transmitter and receiver adaptation have also been used
such distributed wireless environments is needed. in the CDMA context for asynchronous systems [17] and
Reaction to mutual interference is the heart of the systems affected by multipath [18].
shared-spectrum problem, and traditional approaches to The relationship between codeword assignment in
combating wireless interference start with channel mea- a CDMA system and sum capacity has been studied
surement and/or prediction, followed by an appropriate in several papers [19,20]; the paper by Viswanath and
selection of modulation methods and signal processing Anantharam [20] provides an algorithm to obtain sum
algorithms for reliable transmission — possibly coupled to capacity optimal codeword ensembles in a finite number
exclusive use contracts (licensing). Interestingly, since the of steps — and perhaps more importantly, also shows
time of the first radio transmissions, the methods used to that the optimal linear receiver for such ensembles is
deal with interference can be loosely grouped into three a matched filter for each codeword. Interference avoidance
categories: algorithms also yield optimal codeword ensembles but
seem conceptually simpler and suitable for distributed
• Build a fence (licensing) adaptive implementation in multiuser systems.
• Use only what you need (efficient modulation, power It is worth expanding upon this last point. Interference
control) avoidance is envisioned as a distributed method for
• Grin and bear it (signal processing at the receiver) unlicensed bands as opposed to a centralized procedure
done by an omnescient receiver. Of course, we will see
Examples of the first item are legion, while examples of
that the mathematics of the algorithm also lends itself
the last two items include single-sideband amplitude mod-
to central application, so the distinction is really only
ulation, frequency modulation with preemphasis at the
important for practical application. However, throughout
transmitter and deemphasis at the receiver, power control
we will assume distributed application which implicitly
in cellular wireless systems [2], and code-division multiple
suggests that each user knows its associated channel
access (CDMA) coupled to sophisticated signal processing
and in addition that each user has access to the whole
algorithms for interference suppression/cancellation at the
system covariance through a side-channel beacon. The
receiver [3].
receiver can adaptively track codeword variation in
However, Moore’s law advances in microelectronics
a manner reminiscent of adaptive equalization. Since
have led to the emergence of new transceiver hardware
communication is two-way and physical channels are
that add a new weapon to the interference mitigation
reciprocal, it is not unreasonable to assume that both
arsenal. Specifically, a class of radios that can be
the user and the system can know the channel. More
programmed to transmit almost arbitrary waveforms
important is the rate at which the channel varies. We will
and can act as almost arbitrary receiver types is
assume that channel variation is slow relative the frame
emerging — the so-called software radios [4–7]. So, as
rate [21,22], or if the channel variation rate is rapid that
opposed to traditional radios, which owing to complex
the average channel varies slowly enough for interference
transceiver hardware are difficult to modify once a
avoidance to be applied [23].
modulation method has been chosen, one can now
imagine programming transceivers to use more effective
2. THE EIGEN-ALGORITHM FOR INTERFERENCE
modulation methods. Thus, wireless systems of the near
AVOIDANCE
future will be able to choose modulation methods that
avoid ambient interference as opposed to precluding We consider the uplink of a synchronous CDMA
it via sole-use licenses, overpowering it with increased communication system with L users having signature
transmission power, or mitigating it with receiver signal waveforms {S (t)}L=1 of finite duration T, with equal
processing. received power at the base station and ideal channels.
Interference avoidance is the term used for adap- The received signal is
tive modulation methods where individual users — simply
put — employ their signal energy in ‘‘places’’ where inter-
L
ference is weak. Such methods have been shown to R(t) = b S (t) + n(t) (1)
=1
optimize shared use. More precisely, iterative interference
avoidance algorithms yield optimal waveforms that maxi-
mize the signal-to-interference plus noise-ratio (SINR) for 1 Maximum sum capacity or user capacity [8] in a single-receiver
all users while maximizing the sum of rates at which all multiuser system.
INTERFERENCE AVOIDANCE FOR WIRELESS SYSTEMS 1117
where b represents the information symbol sent by matrix Rk of the corresponding interference-plus-
user with signature S (t), and n(t) is an additive noise process.
Gaussian noise process. We assume that all signals are 3. Repeat step 2 until a fixed point is reached for
representable in an arbitrary N-dimensional signal space. which further modification of codewords will bring
Hence, each user’s signature waveform S (t) is equivalent no improvement.
to an N-dimensional vector s and the noise process n(t)
is equivalent to a noise vector n. The equivalent received It has been shown [22] that in a colored noise background
signal vector r at the base station is a variant of this algorithm, in which step 3 is augmented
with a procedure to escape suboptimal fixed points,
L
r= b s + n (2) converges to the optimal fixed point where the resulting
=1 codeword ensemble ‘‘waterfills’’ over the background noise
energy and maximizes sum capacity. If the background
By defining the N × L matrix S having as columns the noise is white and the system is not overloaded (fewer
user codewords s users than signal space dimensions L ≤ N), the algorithm
yields a set of orthonormal codewords that corresponds
| | |
to an ideal situation when users are orthogonal and
D = s1 s2 . . . sL (3)
therefore noninterfering. In the case of overloaded systems
| | |
(L > N) in white noise, the resulting codeword ensembles
form Welch bound equality (WBE) sets [19], which also
the received signal can be rewritten in vector matrix
minimize total squared correlation [15], a measure of the
form as
total interference in the system. In both underloaded and
r = Sb + n (4)
overloaded cases, the absolute minimum attainable total
where b = [b1 · · · bL ]T is the vector containing the symbols squared correlation is often used as a stopping criterion
sent by users. for the eigen-algorithm.
Assuming simple matched filters at the receiver for all Finally, we note that signal space ‘‘waterfilling’’ and
users and unit energy codewords sk , the SINR for user k is the implied maximization of sum capacity are emergent
properties of interference avoidance algorithms. Thus,
(sTk sk )2 1 individual users do not attempt maximization of sum
γk = L = (5) capacity via an individual or ensemble waterfilling
T
j=1, j =k (sk sj )
2 + E[(sTk n)2 ] sTk Rk sk
scheme, but rather, they greedily maximize the SINR
of their own codeword. In fact, individual waterfilling
where Rk is the correlation matrix of the interference plus
schemes over the whole signal space are impossible in this
noise seen by user k having the expression
framework since each user’s transmit covariance matrix
Rk = SST − sk sTk − W (6) X = s sT is of rank one and cannot possibly span an
N-dimensional signal space. So, emergent waterfilling
where W = E[nnT ] is the correlation matrix of the additive and sum capacity maximization is a pleasantly surprising
Gaussian noise. property of interference avoidance algorithms.
Interference avoidance algorithms maximize the SINR
through adaptation of user codewords. This is also 3. GENERALIZING THE EIGENALGORITHM
equivalent to minimizing the inverse SINR defined as
M
x (t) = b() ()
m sm (t) (9)
so that
m=1
x = S b (11)
as if each symbol in the frame corresponded to a distinct
virtual user. Therefore, the received signal can be rewritten as
In the N -dimensional signal space corresponding to
user , each waveform can be represented as an N -
L
r= H S b + n (12)
dimensional vector, thus the input vector x corresponding =1
to user is equivalent to a linear superposition of unit norm
codeword column vectors s() m scaled by the corresponding
Note that under the assumption that M ≥ N , the N × N
m . Therefore, each user uses an N × M codeword
b() transmit covariance matrix of user , X = E[x xT ] = S ST ,
matrix S has full rank and spans user ’s signal space.
| | | Extension of the eigen-algorithm to this general mul-
S = s()
1 s()
2 · · · s()
M (10) tiaccess vector channel setting is presented elsewhere in
| | | the literature [16,26]. The procedure starts by separating
the interference-plus-noise seen by a given user k and
rewriting the received signal in equation (12) from the
Ψ1 perspective of user k as
L
r = Hk Sk bk + H S b + n = Hk Sk bk + zk (13)
=1, =k
! "# $
ace zk =interference + noise
sp
ub
r 1s The covariance matrix of the interference-plus-noise seen
Use by user k
L
Zk = E[zk zTk ] = H S ST HT + W
e
Ψ2 (14)
c
pa
=1, =k
s
ub
2s
1. Start with a randomly chosen codeword ensemble the frequencies that define the signal space, and the modu-
specified by user codeword matrices {Sk }Lk=1 . lation scheme turns out to be a form of multicarrier CDMA
2. For each user k = 1 · · · L [28]. We note that even though interference avoidance is
a. Compute the transformation matrix Tk that applied in this multicarrier modulation framework [25],
whitens the interference-plus-noise seen by the method is completely general and applicable to various
user k. scenarios with appropriate selection of signal space basis
functions. For example using sinc functions the method
b. Change coordinates and compute the transformed
is applicable to time-domain representations in which the
user k channel matrix H̃k = Tk Hk .
vector channel model is obtained from Nyquist-sampled
c. Apply SVD for H̃k and project the problem onto waveforms (see, e.g., Ref. 29). Using ‘‘time chips’’ as basis
user k’s signal space. functions a vector model for DSCDMA systems with mul-
d. Adjust user k’s codewords sequentially using the tipath is obtained [16,18].
greedy procedure of the basic eigen-algorithm; Application of the generalized eigen-algorithm to
the codeword corresponding to symbol m of user multiuser MIMO systems is also possible [16,30]. The
k is replaced by the minimum eigenvector of same multicarrier modulation framework with multiple
the autocorrelation matrix of the corresponding antennas at the transmitter and receiver imply a
interference-plus-noise process. MIMO channel matrix composed of block diagonal
e. Iterate the previous step until convergence matrices corresponding to each transmit/receive antenna
(making use of escape methods [22] if the pair. We mention again that other MIMO channel
procedure stops in suboptimal points). models — for example, the spatiotemporal MIMO channel
3. Repeat step 2 until a fixed point is reached. model [31], in which the MIMO channel matrix is
composed of convolution matrices corresponding to each
We note that in steps 2d,e of the algorithm, user k transmit/receive antenna pair — are perfectly valid and
waterfills its signal space by application of the basic eigen- the generalized eigen-algorithm can be used for codeword
algorithm [22] for interference avoidance by regarding optimization in conjunction with such models as well.
all other user’s signals as noise. Thus, the generalized A similar signal space approach is used to apply the
eigen-algorithm iteratively waterfills each user’s signal generalized eigen-algorithm for codeword optimization in
space. It has been shown [23] that an iterative waterfilling very general asynchronous CDMA system models [16,32].
algorithm converges to a fixed point where the sum We note that for particular cases less general interference
capacity of the vector multiaccess channel is maximized, avoidance algorithms can be used [33].
which implies that the generalized eigen-algorithm will
always yield a codeword ensemble that maximizes sum 5. CONCLUSION
capacity.
Motivated by the emergence of software radios and the
4. THE GENERALIZED EIGEN-ALGORITHM: A VERSATILE desire to foster creative uses of wireless spectrum in
TOOL FOR CODEWORD OPTIMIZATION unlicensed bands, interference avoidance methods have
been developed for wireless systems. The underlying
The generalized eigen-algorithm is a powerful tool that idea of interference avoidance is for each user to
enables application of interference avoidance methods to greedily optimize spectrum use (SINR or capacity) through
various communication problems in which the underlying appropriate signal placement in response to interferers.
model is a multiaccess vector channel. Among these we Interference avoidance is applicable to a wide range of
mention codeword optimization in the uplink of a CDMA communications scenarios including dispersive channels,
system with nonideal (dispersive) channels, multiuser multiple antenna systems, and asynchronous systems.
systems with multiple inputs and outputs (MIMO), and The utility of interference avoidance methods in real
asynchronous CDMA systems, for which an appropriate wireless systems with multiple receivers — such as is
selection of signal space basis functions leads to particular found in a cellular environment — although currently
cases of the general multiaccess vector channel model for under study, is still an open question in general.
which application of the generalized eigen-algorithm to Specifically, theoretical results have been established
codeword optimization becomes straightforward. for application of interference avoidance methods in
For the uplink of a CDMA system with dispersive chan- a collaborative scenario wherein information from all
nels considered in earlier studies [16,25], the spanning set receivers is pooled and used to decode all users in the
of the signal space consists of a set of real sinusoids (sine system [34,35]. Since all information is available for use
and cosine functions) that are approximately eigenfunc- in decoding users, the collaborative scenario constitutes a
tions for all uplink channels corresponding to all users. best case of sorts. Unfortunately, real systems may not be
Introduced in 1964 [27], channel eigenfunctions form an collaborative or even cooperative.
orthonormal spanning set with the property that their Early experiments with geographically dispersed users
corresponding channel responses are also orthogonal, thus assigned to different bases showed unstable behavior
allowing convenient representation of channel outputs as under direct application of the eigen-algorithm, mirroring
scaled versions of the input vectors. In this case, the chan- the anecdotally reported behavior of early cordless
nel matrix of a given user is a diagonal matrix in which phones (see Section 1, above). However, by allowing
diagonal elements correspond to channel gain factors for each user to send and adapt multiple codewords — a
1120 INTERFERENCE AVOIDANCE FOR WIRELESS SYSTEMS
Dimitrie C. Popescu received the Engineering Diploma 6. Special issue on software radio, IEEE Pers. Commun. Mag.
6(4) (Aug. 1999) K.-C. Chen, R. Prasad, and H. V. Poor, eds.
and M.S. degrees in 1991 from the Polytechnic Institute of
Bucharest, Romania, and the Ph.D. degree from Rutgers 7. J. Mitola, The software radio architecture, IEEE Commun.
University in 2002, all in Electrical Engineering. He Mag. 33(5): 26–38 (May 1995).
is currently an Assistant Professor in the Department 8. P. Viswanath, V. Anantharam, and D. Tse, Optimal sequen-
of Electrical and Computer Engineering, the University ces, power control and capacity of spread spectrum systems
of Texas at San Antonio. His research interests are with multiuser linear receivers, IEEE Trans. Inform. Theory
in the general areas of communication systems, control 45(6): 1968–1983 (Sept. 1999).
engineering, and signal processing. In the summer of 1997 9. E. M. Valsbord and V. I. Zhukovski, Introduction to Multi-
he worked at AT&T Labs in Florham Park, New Jersey, Player Differential Games and Their Applications, Gordon
on signal processing algorithms for speech enhancement, and Breach Science Publishers, 1988.
and in the summer of 2000 he worked at Telcordia 10. R. B. Meyerson, Game Theory: Analysis of Conflict, Harvard
Technologies in Red Bank, New Jersey, on wideband Univ. Press, 1991.
CDMA systems. His work on interference avoidance and 11. S. Ulukus, Power Control, Multiuser Detection and Interfer-
dispersive channels was awarded second prize in the ence Avoidance in CDMA Systems, Ph.D. thesis, Rutgers
AT&T Student Research Symposium in 1999. Univ., Dept. Electrical and Computer Engineering, 1998 (the-
sis director: Prof. R. D. Yates).
Dr. Christopher Rose received the B.S. (1979), M.S. 12. J. G. Proakis, Digital Communications, 4th ed., McGraw-Hill,
(1981), and Ph.D. (1985) degrees all from the Mas- New York, 2000.
sachusetts Institute of Technology in Cambridge, Mas- 13. S. Haykin, Communication Systems, Wiley, New York, 2001.
sachusetts. Dr. Rose joined At&T Bell Laboratories in
14. H. L. Van Trees, Detection, Estimation, and Modulation
Holmdel, New Jersey as a member of the Network Sys- Theory, Part I, Wiley, New York, 1968.
tems Research Department in 1985 and in 1990 moved
to Rutgers University, where he is currently an Asso- 15. C. Rose, S. Ulukus, and R. Yates, Wireless systems and inter-
ference avoidance, IEEE Trans. Wireless Commun. 1(3): (July
ciate Professor of Electrical and Computer Engineering
2002). preprint available at http://steph.rutgers.edu/∼crose/
and Associate Director of the Wireless Networks Labo- papers/avoid17.ps.
ratory. He is Editor for the Wireless Networks (ACM),
Computer Networks (Elsevier) and Transactions on Vehic- 16. D. C. Popescu, Interference Avoidance for Wireless Systems,
Ph.D. thesis, Rutgers Univ., Dept. Electrical and Computer
ular Technology (IEEG) journals and has served on many
Engineering (2002) (thesis director: Prof. C. Rose).
conference technical program committees. Dr. Rose was
technical program Co-Chair for MobiCom’97 and Co-Chair 17. P. B. Rapajic and B. S. Vucetic, Linear adaptive transmitter-
of the WINLAB Focus’98 on the U-NII, the WINLAB receiver structures for asynchronous CDMA systems, Eur.
Trans. Telecommun. 6(1): 21–27 (Jan.–Feb. 1995).
Berkeley Focus’99 on Radio Networks for Everything and
the Berkeley WINLAB Focus 2000 on Picoradio Networks. 18. G. S. Rajappan and M. L. Honig, Signature sequence adap-
Dr. Rose, a past member of the ACM SIGMobile Executive tation for DS-CDMA with multipath, IEEE J. Select. Areas
Committee, is currently a member of the ACM MobiCom Commun. 20(2): 384–395 (Feb. 2002).
Steering Committee and has also served as General Chair 19. M. Rupf and J. L. Massey, Optimum sequence multisets for
of ACM SIGMobile MobiCom 2001 (Rome, July 2001). In synchronous code-division multiple-access channels, IEEE
December 1999 he served on an international panel to Trans. Inform. Theory 40(4): 1226–1266 (July 1994).
evaluate engineering teaching and research in Portugal. 20. P. Viswanath and V. Anantharam, Optimal sequences and
His current technical interests include mobility man- sum capacity of synchronous CDMA systems, IEEE Trans.
agement, short-range high-speed wireless (Infostations), Inform. Theory 45(6): 1984–1991 (Sept. 1999).
and interference avoidance methods for unlicensed band 21. D. C. Popescu and C. Rose, Interference avoidance and
networks. multiaccess dispersive channels. In Proc. 35th Annual
INTERFERENCE MODELING IN WIRELESS COMMUNICATIONS 1121
Asilomar Conference on Signals, Systems, and Computers, and Computer Engineering, 2001 (thesis director: Prof.
Pacific Grove, CA, November 2001. C. Rose).
22. D. C. Popescu and C. Rose, CDMA codeword optimiza- 39. T. M. Cover and J. A. Thomas, Elements of Information
tion for uplink dispersive channels through interference Theory, Wiley-Interscience, New York, 1991.
avoidance. IEEE Transactions on Information Theory. (Sub-
mitted 12/2002, revised 10/2001. Preprint available at
http://www.winlab.rutgers.edu/∼cripop/papers).
INTERFERENCE MODELING IN WIRELESS
23. D. C. Popescu and C. Rose, Fading channels and interference COMMUNICATIONS
avoidance. In Proc. 39th Allerton Conference on Communica-
tion, Control, and Computing, Monticello, IL, October 2001.
XUESHI YANG
24. G. Strang, Linear Algebra and Its Applications, 3rd ed., ATHINA P. PETROPULU
Harcourt Brace Jovanovich College Publishers, 1988. Drexel University
25. C. Rose, CDMA codeword optimization: Interference avoid- Philadelphia, Pennsylvania
ance and convergence via class warfare, IEEE Trans. Inform.
Theory 47(4): 2368–2382 (Sept. 2001).
1. INTRODUCTION
26. W. Yu, W. Rhee, S. Boyd, and J. M. Cioffi, Iterative water-
Filling for Gaussian Vector multiple access channels, 2001
IEEE Int. Symp. Information Theory, ISIT’01, Washington, In wireless communication networks, signal reception
DC, June 2001 (submitted for journal publication). is often corrupted by interference from other sources
27. H. J. Landau and H. O. Pollack, Prolate spheroidal wave that share the same propagation medium. Knowledge of
functions, Fourier analysis and uncertainty — III: The the statistics of interference is important in achieving
dimension of the space of essentially time- and band-limited optimum signal detection and estimation. Construction
signals, Bell Syst. Tech. J. 40(1): 43–64 (Jan. 1961). of most receivers is based on the assumption that the
28. D. C. Popescu and C. Rose, Interference avoidance for mul- interference is i.i.d. (independent, identically distributed)
tiaccess vector channels, 2002 IEEE Int. Symp. Information Gaussian. The Gaussianity assumption is based on
Theory, ISIT’02, Lausanne, Switzerland, July 2002 (submit- the central-limit theory. However, in situations such
ted for publication in IEEE Trans. Inform. Theory). as underwater acoustic noise, urban and synthetic
29. J. L. Holsinger, Digital Communication over Fixed Time-
(human-made) RF noise, low-frequency atmospheric noise,
Continuous Channels with Memory — with Special Appli- and radar clutter noise, the mathematically appealing
cation to Telephone Channels, Technical Report 366, Gaussian noise model is seldom appropriate [14]. For such
MIT — Lincoln Lab., 1964. cases, several mathematical models have been proposed,
30. N. Yee and J. P. Linnartz, Multi-Carrier CDMA in an Indoor
including the class A noise model [18], the Gaussian
Wireless Radio Channel, Technical Memorandum UCB/ERL mixture model [28], and the α-stable model [20]. All these
M94/6, Univ. California, Berkeley, 1994. noise models share a common feature: the tails of their
probability density function decay in a power-law fashion,
31. S. N. Diggavi, On achievable performance of spatial diversity
as opposed to the exponentially decaying tails of the
fading channels, IEEE Trans. Inform. Theory 47(1): 308–325
(Jan. 2001). Gaussian model. This implies that the corresponding time
series appear bursty, or impulsive, since the probability of
32. D. C. Popescu and C. Rose, Interference avoidance and
attaining very large values can be significant.
multiuser MIMO systems, 2002. Preprint available at
Existing models for non-Gaussian impulsive noise
http://www.winlab.rutgers.edu/∼cripop/papers).
are usually divided into two groups: empirical models
33. G. G. Raleigh and J. M. Cioffi, Spatio-temporal coding for and statistical–physical models. The former include
wireless communication, IEEE Trans. Commun. 46(3): the hyperbolic distribution and the Gaussian mixture
357–366 (March 1998).
models [28,29]. They fit a mathematical model to the
34. D. C. Popescu and C. Rose, Codeword optimization for asyn- measured data, without taking into account the physical
chronous CDMA systems through interference avoidance, mechanism that generated the data. Although empirical
Proc. 36th Conf. Information Sciences and Systems, CISS models offer mathematical simplicity in modeling, their
2002, Princeton, NJ, March 2002.
parameters are not related to any physical quantities
35. S. Ulukus and R. Yates, Optimum signature sequence sets related to the data. On the other hand, statistical–physical
for asynchronous CDMA systems, Proc. 38th Allerton Conf. models are grounded on the physical noise generation
Communication, Control, and Computing, Oct. 2000. process, and their parameters are linked to physical
36. O. Popescu and C. Rose, Minimizing total squared correlation parameters. The first physical models for noise can be
for multibase systems, Proc. 39th Allerton Conf. Communica- traced back in the works of Furutsu and Ishida [9],
tion, Control, and Computing, Monticello, IL, Oct. 2001. Middleton [18], and later in the works of Sousa [26],
37. O. Popescu and C. Rose, Interference avoidance and sum Nikias [20], Ilow [13], and Yang [33] and colleagues.
capacity for multibase systems, Proc. 39th Allerton Conf. All of these models consider the following scenario. A
Communication, Control, and Computing, Monticello, IL, Oct. receiver is surrounded by interfering sources that are
2001. randomly distributed in space according to a Poisson
38. D. Tabora, An Analysis of Covariance Estimation, Codeword point process. The receiver picks up the superposition
Feedback, and Multiple Base Performance of Interference of the contributions of all the interferers. By assuming
Avoidance, Master’s thesis, Rutgers Univ., Dept. Electrical that the propagation path loss is inversely proportional
1122 INTERFERENCE MODELING IN WIRELESS COMMUNICATIONS
to the power of the distance between the receiver and σ > 0 : scale parameter; also referred to as
the interferer, and that the pulses have a symmetrically dispersion. 2σ 2 equals to the variance
distributed random amplitude, it has been shown that in the Gaussian distribution case
the resulting instantaneous noise is impulsive and non- µ : location parameter
Gaussian [18,20]. The power-law assumption for path loss
is consistent with empirical measurement data [10,18]. The distribution is called symmetric α-stable (SαS) if
The same model can be applicable for modeling η = 0.
interference in a spread-spectrum wireless system, where Since (1) is characterized by four parameters, we denote
randomly located users access the transmission medium stable distributions by Sα (σ, η, µ), and indicate that the
simultaneously. The user identity is determined by a random variable X is α-stable distributed by
signature, which is embedded in the transmitted signal.
Ideally, signatures of different users are orthogonal; thus, X ∼ Sα (σ, η, µ) (2)
by correlating a certain signature with the total received
signal, the corresponding user signal can be recovered. The probability density functions of α-stable random
However, in practice the received signals from other users variables are seldom given in closed form. Some exceptions
are not perfectly orthogonal to the signature waveform are the following:
of the user of interest, due to non-perfect orthogonal
codes and channel distortion. Therefore, correlation at the • The Gaussian distribution, which can be expressed
receiver results in an interference term, usually referred as S2 (σ, 0, µ), and whose density is
to as cochannel interference. Cochannel interference is 2
the determining factor as far as quality of service is 1 −
(x−µ)
f (x) = √ e 4σ 2 = N(µ, 2σ 2 ) (3)
concerned, and sets a limit to the capacity of a spread- 2σ π
spectrum communication system. Modeling, analysis, and
mitigation of cochannel interference has been the subject • The Cauchy distribution, S1 (σ, 0, µ), whose density is
of numerous studies.
σ
The goal of this article is to provide a mathematical f (x) = (4)
treatment of the interference that occurs in wireless π((x − µ)2 + σ 2 )
communication systems. It is organized as follows. In
Two important properties of stable distributions are
Section 2, we provide a brief treatment of α-stable
the stability property and the generalized central-limit
distributions and heavy-tail processes. We also outline
theorem.
some techniques for deciding whether a set of data follows
The stability property is defined as follows. A random
an α-stable distribution, and for estimating the model
variable X has a stable distribution if and only if for
parameters. In Section 3, we discuss two widely studied
arbitrary constants a1 and a2 , there exist constants a
statistical–physical models for interference: the α-stable
and b such that a1 X1 + a2 X2 has the same distribution
model and the class A noise model.
as aX + b, where X1 and X2 are independent copies
of X. A consequence of the stability property is the
2. PROBABILISTIC BACKGROUND generalized central-limit theorem, according to which, the
stable distribution is the only possible limit distribution of
sums of i.i.d. random variables. In particular, if X1 , X2 , . . .
2.1. α-Stable Distributions are i.i.d. random variables, and there exists sequences of
α-Stable distributions are defined in terms of their positive numbers {an } and real numbers {bn }, such that
characteristic function:
X1 + · · · + Xn
Sn = − bn (5)
an
(ρ) = exp{ıµρ − σ α |ρ|α (1 + ıηsign(ρ)ϕ(ρ, α))} (1a)
converges to X in distribution, then X is stable distributed.
with If the Xi are i.i.d. and have finite variance, then the limit
distribution is Gaussian and the generalized central-limit
απ
tan if α = 1 theorem reduces to the ordinary central-limit theorem.
ϕ(ρ, α) = 2 2 (1b) α-Stable distributions are a special class of the so called
ln |ρ| if α=1 heavy-tail distributions. A random variable X is said be
π
heavy-tail distributed if
1, if ρ>0
sign(ρ) = 0, if ρ=0 (1c) Pr(|X| ≥ x) ∼ L(x)x−α , 0<α<2 (6)
−1, if ρ<0
where L(x) is a slowly varying function at infinity, that
where α ∈ (0, 2] : characteristic exponent — a measure is, limx→∞ L(cx)/L(x) = 1 for all c > 0. Thus, α-stable
of rate of decay of the distribution distributions with α < 2 have power-law tails, as opposed
tails; the smaller the α the heavier to the exponential tails of the Gaussian distribution.
the tails of the distribution Figure 1 illustrates the survival function P(X > x) of α-
η ∈ [−1, 1] : symmetry index stable distributions for different α (α = 0.5, 1, 1.5, 2). We
INTERFERENCE MODELING IN WIRELESS COMMUNICATIONS 1123
consists in plotting αk as a function of k. The sequence αk−1 the receiver according to a Poisson process, if continuous
is a consistent sequence of estimators of α −1 , thus the Hill time is considered, or according to a Bernoulli process if
plot should stabilize after some k to a value roughly equal discrete time is considered.
to α. In practice, finding that stable region of the Hill plot During each symbol interval T at the receiver there
is not a straightforward task, since the estimator exhibits are a random number of transmitting users, which have
extreme volatility. emitted pulses that interfere with signal reception during
the particular interval. The users are assumed to be
2.2.2. The QQ Estimator. The slope of regression 1 − Poisson-distributed in space with density λ:
log(1 − j/(k + 1)) through the points log(X(N−k+j) /X(N−k) )
can be used as an estimator of the tail index. This estimator e−λR (λR)k
P[k transmitting users in a region R] = (11)
is referred to as the QQ estimator [15]. Computing the k!
slope we find that the QQ estimator is given by
The signal transmitted from the ith interfering user,
k that is, pi (t) propagates through the transmission medium
i
− log and the receiver filters, and as a result become attenuated
k+1
i=1 and distorted. For simplicity, we will assume that
k distortion and attenuation can be separated. Let us
×n log(X(N−i+1) ) − log(X(N−j+1) ) first consider only the filtering effect. For short time
j=1 intervals, the propagation channel and the receiver can
α̂k−1 =
k
2
k 2 be represented by a time-invariant filter with impulse
i i
k − log − − log response h(t). As a result of filtering only, the contribution
k+1 N+1 of the ith interfering source at the receiver is of the form
i=1 i=1
(10) xi (t) = pi (t) ∗ h(t), where the asterisk denotes convolution.
The attenuation of the propagation is determined by
The QQ estimator is consistent for i.i.d. data if k−1 + the transmission medium and environment. In wireless
k/N → 0. Its variance is larger than the asymptotic communications, the power loss increases logarithmically
variance of the Hill estimator; however, the variability with the distance between the transmitter and the
of the QQ estimator always seems to be less than that of receiver [18]. If the distance between the transmitter and
the Hill estimator. the receiver is r, the power loss function may be expressed
Again, the choice of k is a difficult problem. Two in terms of signal amplitude loss function, a(r):
different plots can be constructed on the basis of the QQ
estimator: (1) the dynamic QQ plot obtained from plotting 1
a(r) = (12)
(k, α̂k ), l < k < N (similar to the Hill plot) and (2) the rγ /2
static QQ estimator, which is obtained by representing the where γ is the path loss exponent and γ is a function of the
data log(X(N−k+1) /X(N−k) ) as a function of log(1 − j/(k + 1)) antenna height and the signal propagation environment.
together with the least-squares regression line. The slope This may vary from slightly less than 2, for hallways
−1
of that line is used to compute the QQ estimator α̂k,n . within buildings, to >5, in dense urban environments and
hard-partitioned office buildings [21].
3. MODELING INTERFERENCE Taking into account attenuation and filtering, the
contribution of the ith interfering source at the receiver
3.1. A α-Stable Noise Model equals a(ri )xi (t), where ri , is the distance between the
Consider a packet radio network, where the users are interferer and the receiver.
distributed on a plane. The basic unit of time is the slot, Assuming that the interferers within the region of
which is equal to the packet transmission time T. Let us consideration have the same isotropic radiation pattern,
assume that all users are synchronized at the slot level. and that the receiver has an omnidirectional antenna, the
In a typical situation, a user transmits a packet to some total signal at the receiver is
destination at a distance. The success of the reception
depends partly on the amount of interference experienced x(t) = s(t) + a(ri )xi (t) (13)
by the receiving user. The interference at a certain i∈N
network location consists of two terms: self-interference, where s(t) is the signal of interest and N denotes the set of
or interference caused by other transmitting users; and interferers at time t. The number of interfering users,
external interference, such as thermal noise, coming from as already discussed, is a Poisson-distributed random
other systems. We are here concerned with the former variable with parameter λ.
type of interference. The receiver consists of a signal demodulator followed
Self-interference is shaped by the positions of the by the detector. A correlation demodulator decomposes the
network users and the transmission characteristics of received signal into an N-dimensional vector. The signal
each user. Traditionally, the locations of interfering users is expanded into a series of orthonormal basis functions
have been assumed to be distributed on a plane according {fn (s), 0 < s ≤ T, n = 1, . . . , N}. Let Zn (m) be the projection
to a Poisson point process. Although this is a simplistic of x(t) onto fn (·) at time slot m:
assumption, especially in the case of mobile users where
T
their locations can change in a time-varying fashion, it
facilitates analysis. The packets are assumed to arrive at Zn (m) = x(s + (m − 1)T))fn (s) ds (14)
0
INTERFERENCE MODELING IN WIRELESS COMMUNICATIONS 1125
To compute the probability of symbol error, one needs to By substituting (20) in (19), we obtain
determine first the joint probability density function of the
samples Zn (m). The samples Zn (m) may be expressed as Yb (ω) = E{AN }
Zn (m) = Sn (m) + a(ri )Xi,n (m) (15)
∞
= P(N = k)Ak
i∈N
k=0
= Sn (m) + Yn (m) (16)
∞
(λπ b2 )k e−λπ b
2
= Ak
where Xi,n (m) and Sn (m) are, respectively, the result of k!
k=0
the correlations of xi (t) and s(t) with the basis functions
2 (A−1)
fn (·), and Yn (m) represents interference. = eλπ b
Let us assume that the Xi,n (m) are spatially inde-
pendent [e.g., Xi,n (m) independent Xj,n (m) for i = j]. For By considering the logarithm of the characteristic function,
simplicity, we assume tht Xi,k (m), Xi,l (m) are identically and also using (20), we obtain
distributed. Therefore, we shall concentrate on one dimen-
sion only. For notational convenience, we shall drop the Yb (ω) = log Yb (ω)
subscript n, thus denoting Yn (m) and Xi,n (m) by respec-
b
tively Y(m) and Xi (m). According to the i.i.d. assumption,
= λπ b2
f (r)X (a(r)ω) dr − 1 (21)
we will later on denote Xi (m) by X(m). 0
For some time m, Y(m) is the sum of a random number
of i.i.d. random variables, which are contributions of Taking into account (12), setting
interfering users. To compute the characteristic function
of Y(m), we first restrict the sum to contain interferers 4
α= (22)
within a disk, Db , centered at the receiver and has radius b. γ
Later, we will let the disk radius b → ∞. The characteristic
function of interference received from Db , to be denoted by and after a variable substitution, the equation above
Yb (m), is becomes
Yb (ω) = E{ejωYb (m) } α ∞
αω
Nb Yb (ω) = λπ b2
t −1−α
X (t) dt − 1 (23)
= E{ejω i=0 a(ri )Xi (m) } (17) b2 ωb−2/α
where Nb is a random variable representing the number To obtain the first-order characteristic function of Y(n),
of interferers at times m in Db . Since the interferers are we need to evaluate this expression for b → ∞. Using
Poisson-distributed in the space, given that there are k of integration by parts, and noting that X (ωb−2/α ) − 1 → 0
them in a disk Db , they are i.i.d. and uniformly distributed. as b → ∞, we obtain
Thus, the distance ri between the ith interferer and the
receiver has density function Y (ω) = lim Yb (ω) = −σ |ω|α , for 0 < α < 2 (24)
b→∞
2r
r≤b where
f (r) = b2 (18) ∞
∂X (x) −α
0 elsewhere σ = −π λ x dx (25)
0 ∂x
From (17), using the i.i.d. property of Xi (m) and ri , and
also using the property of conditional expected values Equation (25) corresponds to the log-characteristic func-
tion of a SαS distribution with exponent α; thus, it implies
E{X} = E{E{X | Y}} that the interference Y(n), for a fixed n, is SαS, or equiva-
we obtain lently, marginally SαS.
N The basic propagation characteristic that led to the
Yb (ω) = E{E{ejω i=0
a(ri )Xi (m))
| N}} nice closed form result shown above is the power-law
= E{[E{ejωa(r)X(m) }]N } (19) attenuation. However, the attenuation expression in (12)
is valid for large values of r. As r → 0, the signal amplitude
where ri and Xi (m) are random variables, which, according appears to approach infinity. To avoid this problem, an
to the i.i.d. assumption, are generically denoted by r and alternative loss function was proposed [11]
X(m), respectively. The inner expectation in (19) equals
a (r) = min(s, r−γ /2 ) (26)
A = E{ejωa(r)X(m) }
= E{E{ejωa(r)X(m) | r}} for some s > 0. Under this form of attenuation, the log-
characteristic function is as in (24), except that now σ =
= E{X (a(r)ω)} σ (ω). The difference between the two log-characteristic
b functions approaches zero as ω → ∞ while s is fixed. For
= f (r)X (a(r)ω) dr (20) small ω the difference tends to zeros as s → 0.
0
The meaning of (24) is illustrated next via an
where X (.) is the characteristic function of X(·). example. Consider a direct-sequence spread-spectrum
1126 INTERFERENCE MODELING IN WIRELESS COMMUNICATIONS
(DSSS) system with large processing gain and chip a(ri )xi (m) [see Eq. (15)]. Since this particular cochannel
synchronization. Assume that users’ contributions, Xi (m) user continues to be active for some random time duration
values, are Gaussian, and that the attenuation law is 1/r4 . after its initiation of the session, it also contributes to
Equation (24) suggests that the resulting interference at the interference at time m + l with probability F L (l).
the receiver will be Cauchy-distributed. Here F L (·) is the survival function of session life L.
The discussion above applies to both narrowband inter- Therefore, the interference at time m and m + l are
ference and wideband interference. In most communica- correlated, in contrast to conventional i.i.d. assumption
tion systems, the receiver is narrowband. In those cases, for the interference at different times.
we need to derive the statistics of the envelope and phase of The joint statistics of the interference can be analyzed
the impulsive interference. The joint characteristic func- through the joint characteristic function (JCF) of the
tion of the in-phase and quadrature components of the interference at time m and n (m < n):
interference can be found to be [20]
m,n (ω1 , ω2 ) = E exp{jω1 Y(m) + jω2 Y(n)}
log (ω1 , ω2 ) = −c(α)(ω12 + ω22 )α/2 (27)
= E exp jω1 a(ri )Xi (m)
where c(α) is a constant that depends on α. This form
i∈Nn
of joint characteristics function is referred to as isotropic
α-stable.
Let Yc and Ys denote respectively the in-phase and + jω2 a(ri )Xi (m) (30)
quadrature components of the interference. The envelope i∈Nn
n−m
lim xα P(A > x) = β(α, γ ) (29) ln m,n (ω1 , ω2 ) = −σ α F L (l)(|ω1 |α + |ω2 |α )
x→∞
l=1
where β is some function that does not vary with x.
1 α
∞
According to this equation, the envelope is heavy-tailed. − σ FL (l)(|ω1 + ω2 |α + |ω1 − ω2 |α )
To implement optimum signal detection and estima- 2
l=n−m+1
tion, one needs to obtain joint statistics of the interference. (31)
In the literature, for simplicity reasons, interference is
traditionally assumed to be i.i.d. However, as we will with α as defined in (22), and
see next, such assumptions are oversimplified and it is
1/α
inconsistent with practical communication systems. Tem- 1 λπ 3/2 (1 − α/2)
porary dependence arises when the interference sources σ = (32)
2 (1/2 + α/2)
are the cochannel users, whose activity is decided by the
information type being exchanged and the user behavior. Equation (31) implies that the interference at m and n are
Let us define the term emerging interferers at symbol jointly α-stable distributed [24, p. 69].
interval l, which describes the interfering sources whose In high-speed data networks, where large variations
contribution arrive for the first time at the receiver in the of file sizes are exchanged, the holding time of a
beginning of the symbol interval l. It is assumed that the single transmission session exhibits high variability.
emerging interferers at some symbol interval are located It has been shown [30,34] that the holding times
according to a Poisson point process in space. In particular, can be well modeled by heavy-tail-distributed random
at any given symbol interval, the expected number of variables. As data service becomes increasingly popular in
emerging interferers in a unit area/volume is given by λe . wireless communication networks, where more and more
Once a user initiates a transmission, it may last for a bandwidth is available for users, it should be expected that
random duration of time, referred to as session life L. The session life of users will behave similarly as in wired data
distribution of L is assumed to be known a priori. It is not networks, that is, will be heavy-tail-distributed. Indeed,
difficult to show that at any symbol interval m, the active preliminary verification of the latter has been presented
transmitting users are Poisson-distributed in the space in Ref. 16 through statistical analysis of wireless network
with density λ = λe E{L}. traffic data.
Consider an interference source, namely, a cochannel By modeling the session life of the interferers as a
user, whose transmission starts at some random time slot heavy-tail-distributed random variable with tail index
m. According to the previous analysis, it contributes to the 1 < αL < 2, and Xi (m) as i.i.d. Bernoulli random variables
interference observed at time m at the targeted receiver, taking possible values 1 and −1 with equal probability 12 ,
INTERFERENCE MODELING IN WIRELESS COMMUNICATIONS 1127
Interference
where I(·) = codifference of the resulted interference 0
τ = time lag between symbol intervals
αL = tail index of the session life distribution −2000
σ = as defined in (32)
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
processes. For processes with finite autocorrelation, this
implies that the correlation of well separated samples
is not negligible, and decays in a power-law fashion Time
with the distance between the samples. As a result, the
autocorrelation of a long-range dependent process is not Figure 2. Interference at a receiver surrounded by Poisson
distributed interferers.
summable. When the noise exhibits LRD the performance
of signal detection and estimation algorithms, which are
optimized for i.i.d. noise, can degrade [2,32]. For processes
with infinite autocorrelation, such as marginally α-stable 10
processes, LRD is defined in terms of a power-law decaying 9
generalized codifference [22], which is the case in (33).
8
We next present some simulation results based on the
model described above. A wireless network is simulated, 7
Hill estimator ak
k − k0 −α −6000
P{X ≥ k} = 1 + , k = k0 , k0 + 1, k0 + 2 · · · −6000 −4000 −2000 0 2000 4000 6000
σ
Interference quantiles
where k0 is an integer denoting the location parameter, σ > 0 Figure 4. QQ plot of simulated interference and Cauchy random
denotes the scale parameter and α > 0 is the tail index. variables.
1128 INTERFERENCE MODELING IN WIRELESS COMMUNICATIONS
3.2. The Class A Noise Model noise dominates the interfering sources. However, it is
often difficult to evaluate (34). A simplification has been
Another model, which has finite variance, but has
proposed by Spaulding [27], where the infinite sum in (34)
been used to approximate impulsive interference, is the
has been replaced by a finite sum. Note that the density
Middleton class A noise model [18].
function of class A noise can be written as
The class A model assumes that a narrowband receiver
is exposed to an infinite number of potential interfering
∞
sources, which emit basic waveforms with common form, f (ε) = am g(ε; 0, σk2 ) (37)
scale, durations, frequencies, and so on. The parameters m=0
∞
Am −ε2 /2σm2 fmix (x) = ag(x; 0, σ02 ) + (1 − a)g(x; 0, σ12 ) (39)
Pr(ε > ε0 ) ∼
= e−A e 0 , 0 ≤ ε0 < ∞ (34)
m!
m=0 where 0 < a < 1 and σ02 < σ12 .
Efforts have been devoted to developing a multivariate
with 2σm2 = (m/A + )/(1 + ). Here ε, ε0 are normalized class A noise model. The multivariate case arises in the sce-
envelopes nario of communication systems with multiple antennas
or antenna arrays reception, or in times where multiple
E
ε≡ √ (35) time observations are available. In Ref. 7, multivariate
2(1 + ) class A models are developed based on mathematical
E0 extension of the univariate class A model. Three different
ε0 ≡ √ (36)
2(1 + ) extensions — independent, dependent, and uncorrelated
cases — are discussed in Ref. 7. Issues of signal detec-
where E0 is some preselected threshold value of the tion under these models are also investigated. A more
envelope E. There are three parameters involved in the physical-mechanism oriented approach is presented in
class A model, namely, (A, , ). They all have physical Ref. 17. Under assumptions similar to those expressed in
significance as stated next. Ref. 18, the authors consider the noise presented in the
intermediate-frequency (IF) stage in a communication sys-
1. A = the impulsive index, which is defined as the tem with two antennas. By assuming the emission times
average number of emissions multiplied by the mean of the sources are uniformly distributed over a long time
duration of a typical interfering source emission. interval, Ref. 17 obtains the characteristic function of the
When A is large, the central-limit theory comes into resulting noise for cases where the antenna observations
play, and one approaches Gaussian statistics (for may be statistically dependent from antenna to antenna.
envelopes, it is Rayleigh). The probability density function thus can be deduced by
2. ≡ σG2 / = the ratio of the independent Gaussian utilizing Fourier transform techniques.
component σG2 to the intensity of the non-Gaussian,
impulsive component. 4. CONCLUSION
3. = the intensity of the impulsive component.
In this chapter, we present two mathematical models
The phase statistics of the resulting noise is uniformly for interference in wireless communications: the α-stable
distributed in (0, 2π ). model and the class A noise model. Both models are based
The class A noise model is a canonical model, in the on the physical mechanism of the noise generation process,
sense that it is invariant of the particular noise source and lead to a non-Gaussian scenario, where noise may
and associated quantifying parameters. The parameters become impulsive, or heavy-tail-distributed. The noise
of the class A noise model can be deduced from physical process may not be i.i.d. Moreover, under heavy-tailed
measurement, because its derivation is based on a holding times, the noise becomes highly correlated.
general physical mechanism. The class A noise model Impulsiveness of noise can have severe degrading
fits a variety of measurements, and has been applied effects on system performance, particularly on most
in various communication scenaria, where non-Gaussian conventional systems designed for optimal or near-optimal
INTERFERENCE MODELING IN WIRELESS COMMUNICATIONS 1129
performance against white Gaussian noise. A significant 6. J. M. Chambers, Graphical Methods for Data Analysis, PWS
amount of research has been devoted to signal detection Publishing, 1983.
and estimation when the noise is non-Gaussian and/or 7. P. A. Delaney, Signal detection in multivariate Class-A
correlated [e.g., 14,19,20]. interference, IEEE Trans. Commun. 43: 365–373 (Feb.–April
1995).
Xueshi Yang received his B.S. degree in electronic 9. K. Furutsu and T. Ishida, On the theory of amplitude
engineering from Tsinghua University, Beijing, China, distribution of impulsive random noise, J. Appl. Phys. 32(7):
in 1998, and a Ph.D. degree in electrical engineering from (1961).
Drexel University, Philadelphia, Pennsylvania, in 2001. 10. A. Giordano and F. Haber, Modeling of atmosphere noise,
From September 1999 to April 2000 he was a visiting Radio Sci. 7(11): (Nov. 1972).
researcher with Laboratoire de Signaux et Systemes, 11. J. W. Gluck and E. Geraniotis, Throughput and packet error
CNRS-Universite de Paris-Sud, SUPELEC, Paris, France. probability in cellular frequency-hopped spread-spectrum
He is currently a postdoc research associate in Electrical radio networks, IEEE J. Select. Areas Commun. 7: 148–160
Engineering at Princeton University, New Jersey, and (Jan. 1989).
Drexel University. His research interests are in the area 12. B. M. Hill, A simple general approach to inference about the
of non-Gaussian signal processing, self-similar processes, tail of a distribution, Ann. Stat. 3: 1163–1174 (1975).
and wireless/wireline data networking. Dr. Yang received 13. J. Ilow, D. Hatzinakos, and A. N. Venetsanopoulos, Perfor-
the George Hill Jr. Fellowship in 2001 and the Allen mance of FH SS radio networks with interference modeled as
Rothwarf Outstanding Graduate Student Award in 2000, a mixture of Gaussian and Alpha-stable noise, IEEE Trans.
respectively, from Drexel University. Commun. 46(4): (April 1998).
14. S. A. Kassam, Signal Detection in Non-Gaussian Noise,
Athina P. Petropulu received the Diploma in electrical Springer-Verlag, New York, 1987.
engineering from the National Technical University of
15. M. F. Kratz and S. I. Resnick, The QQ estimator and heavy
Athens, Greece, in 1986, the M.Sc. degree in electrical and
tails, Stoch. Models 12(4): 699–724 (1996).
computer engineering in 1988, and her Ph.D. degree in
electrical and computer Engineering in 1991, both from 16. T. Kunz et al., WAP traffic: Description and comparison to
Northeastern University, Boston, Massachusetts. WWW traffic, Proc. 3rd ACM Int. Workshop on Modeling,
Analysis and Simulation of Wireless and Mobile Systems,
In 1992, she joined the Department of Electrical and
Boston, Aug. 2000.
Computer Engineering at Drexel University, Philadelphia,
Pennsylvania, where she is now a professor. During the 17. K. F. McDonald and R. S. Blum, A statistical and physical
academic year 1999–2000 she was an associate profes- mechanisms-based interference and noise model for array
observations, IEEE Trans. Signal Process. 48(7): (July 2000).
sor at LSS, CNRS-Université Paris Sud, École Supérieure
d’Electrícité in France. Dr. Petropulu’s research interests 18. D. Middelton, Statistical-physical models of electromagnetic
span the area of statistical signal processing, communi- interference, IEEE Trans. Electromagn. Compat. EMC-19(3):
cations, higher-order statistics, fractional-order statistics (Aug. 1977).
and ultrasound imaging. She is the coauthor of the text- 19. D. Middleton and A. D. Spaulding, Elements of weak signal
book entitled, Higher-Order Spectra Analysis: A Nonlin- detection in non-Gaussian noise environments, in H. V. Poor
ear Signal Processing Framework, (Englewood Cliffs, NJ: and J. B. Thomas, eds., Advances in Statistical Signal
Prentice-Hall, Inc., 1993). She is the recipient of the 1995 Processing, Vol. 2, Signal Detection, JAI Press, Greenwich,
Presidential Faculty Fellow Award. CT, 1993.
20. C. L. Nikias and M. Shao, Signal Processing with Alpha-
Stable Distributions and Applications, Wiley, New York,
BIBLIOGRAPHY 1995.
21. J. D. Parsons, The Mobile Radio Propagation Channel, Wiley,
1. R. B. D’Agostino and M. A. Stephens, eds., Goodness-of-fit
New York, 1996.
Techniques, Marcel Dekker, New York, 1986.
22. A. P. Petropulu, J.-C. Pesquet, X. Yang, and J. Yin, Power-
2. R. J. Barton and H. V. Poor, Signal detection in fractional
law shot noise and relationship to long-memory processes,
Gaussian noise, IEEE Trans. Inform. Theory 34: 943–959
IEEE Trans. Signal Process. 48(7): (July 2000).
(Sept. 1988).
23. J. Rice, Mathematical Statistics and Data Analysis, Duxbury
3. J. Beran, Statistics for Long-Memory Processes, Chapman &
Press, Belmont, CA, 1995.
Hall, New York, 1994.
24. G. Samorodnitsky and M. S. Taqqu, Stable Non-Gaussian
4. K. L. Blackard, T. S. Rappaport, and C. W. Bostian, Measure-
Random Processes: Stochastic Models with Infinite Variance,
ments and models of radio frequency impulsive noise for
Chapman & Hall, New York, 1994.
indoor wireless communications, IEEE J. Select. Areas Com-
mun. 11: 991–1001 (Sept. 1993). 25. G. W. Snedecor and W. G. Cochran, Statistical Methods, Iowa
State Press, 1990.
5. O. Cappe et al., Long-range dependence and heavy-tail
modeling for teletraffic data, IEEE Signal Process. Mag. 26. E. S. Sousa, Performance of a spread spectrum packet radio
(Special Issue on Analysis and Modeling of High-Speed network link in a Poisson field of interferers, IEEE Trans.
Network Traffic) (in press). Inform. Theory 38(6): (Nov. 1992).
1130 INTERFERENCE SUPPRESSION IN SPREAD-SPECTRUM COMMUNICATION SYSTEMS
27. A. D. Spaulding, Locally optimum and suboptimum detector decode the useful signal reliably. For this reason, signal
performance in a non-Gaussina interference environment, processing techniques are frequently used in conjunction
IEEE Trans. Commun. COM-33(6): 509–517 (1985). with the SS receiver to augment the processing gain,
28. G. V. Trunk, Non-Rayleigh sea clutter: Properties and permitting greater interference protection without an
detection of targets, in D. C. Shleher ed., Automatic Detection increase in the bandwidth. Although much of the work in
and Radar Data Processing, Artech House, Dedham, 1980. this area has been motivated by the applications of SS as
29. E. J. Wegman, S. C. Schwartz, and J. B. Thomas, eds., Topics an antijamming method in military communications, it is
in Non-Gaussian Signal Processing, Springer, New York, equally applicable in commercial communication systems
1989. where SS systems and narrowband communication
30. W. Willinger, M. S. Taqqu, R. Sherman, and D. V. Wilson, systems may share the same frequency bands.
Self-similarity through high-variability: Statistical analysis This article covers both the direct-sequence spread-
of Ethernets LAN traffic at the source level, IEEE/ACM spectrum (DSSS) and frequency-hopping (FH) commu-
Trans. Network. 5(1): (Feb. 1997). nication systems, but the main focus is on the DSSS
31. B. D. Woerner, J. H. Reed, and T. S. Rappaport, Simulation communication systems. For DSSS communication sys-
issues for future wireless modems, IEEE Commun. Mag. tems, two types of interference signals are considered,
(July 1994). namely, narrowband interference (NBI) and nonstationary
32. G. W. Wornell, Wavelet-based representations for the 1/f interference, such as instantaneously narrowband inter-
family of fractal processes, Proc. IEEE 81(10): 1428–1450 ference (INBI).
(Oct. 1993). The early work on narrowband interference rejection
33. X. Yang and A. P. Petropulu, Joint statistics of impulsive techniques in DSSS communications has been reviewed
noise resulted from a Poisson field of interferers, IEEE Trans. comprehensively by Milstein in [1]. Milstein discusses
on Signal Processing (submitted). in depth two classes of rejection schemes: (1) those
34. X. Yang and A. P. Petropulu, The extended alternating
based on least-mean-square (LMS) estimation techniques
fractal renewal process for modeling traffic in high-speed and (2) those based on transform domain processing
communication networks, IEEE Trans. Signal Process. 49(7): structures. The improvement achieved by these techniques
(July 2001). is subject to the constraint that the interference be
relatively narrowband with respect to the DSSS signal
waveform. Poor and Rusch [2] have given an overview
of NBI suppression in SS with the focus on CDMA
INTERFERENCE SUPPRESSION IN
communications. They categorize CDMA interference
SPREAD-SPECTRUM COMMUNICATION
suppression by linear techniques, nonlinear estimation
SYSTEMS
techniques, and multiuser detection techniques (multiuser
detection is outside the scope of this article). Laster
MOENESS G. AMIN
and Reed [3] have provided a comprehensive review
YIMIN ZHANG
of interference rejection techniques in digital wireless
Villanova University
Villanova, Pennsylvania
communications, with the focus on advances not covered
by the previous review articles.
Interference suppression techniques for nonstationary
1. INTRODUCTION signals, such as INBI, have been summarized by Amin and
Akansu [4]. The ideas behind NBI suppression techniques
Suppression of correlated interference is an important can be extended to account for the nonstationary nature
aspect of modern broadband communication platforms. of the interference. For time-domain processing, time-
For wireless communications, in addition to the presence varying notch filters and subspace projection techniques
of benign interferers, relatively narrowband cellular can be used to mitigate interferers characterized by their
systems, employing time-division multiple access (TDMA) instantaneous frequencies and instantaneous bandwidths.
or the Advanced Mobile Phone System (AMPS) may Interference suppression is achieved using linear and
coexist within the same frequency band of the broadband bilinear transforms, where the time–frequency domain
code-division multiple access (CDMA) systems. Hostile and wavelet/Gabor domain are typically considered. Sev-
jamming is certainly a significant issue in military eral methods are available to synthesize the nonstationary
communication systems. Global Positioning System (GPS) interference waveform from the time–frequency domain
receivers potentially experience a mixture of both and subtract it from the received signal.
narrowband and wideband interference, both intentionally Interference rejection for FH is not as well developed as
and unintentionally. interference rejection for DS or for CDMA. In FH systems,
One of the fundamental applications of spread- the fast FH (FFH) is of most interest, and the modulation
spectrum (SS) communication systems is its inherent most commonly used in FH is frequency shift keying (FSK).
capability of interference suppression. SS systems are Two types of interference waveforms can be categorized,
implicitly able to provide a certain degree of protection namely, partial-band interference (PBI) and multitone
against intentional or unintentional interferers. However, interference (MTI). Typically, interference suppression
in some cases, the interference might be much stronger techniques for FH communication systems often employ
than the SS signal, and the limitations on the spectrum a whitening or clipping stage to reject interference, and
bandwidth render the processing gain insufficient to then combined by diversity techniques.
INTERFERENCE SUPPRESSION IN SPREAD-SPECTRUM COMMUNICATION SYSTEMS 1131
The mean-square value E[y2 [n]], representing the output where w[n] is the filter weight vector of elements wl [n],
average power, is given by l = 1, 2, . . . , N, x[n] is the vector that includes the data
within the filter, and y[n] is the output, all at the nth
E(y2 [n]) = E(x2 [n]) − 2wT E(x[n]x[n]) + wT E(x[n]xT [n])w adaptation, and µ is a parameter that determines the
rate of convergence of the algorithm. It is noted that
= E(x2 [n]) − 2wT p + wT Rw (8) in most applications of the LMS algorithm, an external
reference waveform is needed in order to correctly adjust
where p = E(x[n]x[n]) is the correlation vector between the tap weights. However, in this particular application,
x[n] and x[n], and the signal on the reference tap x[n] serves as the external
reference.
R = E(x[n]xT [n]) (9) The drawback of the LMS algorithm is its slow
convergence. To improve the convergence performance,
is the covariance matrix of x[n]. It is noted that, when the techniques including self-orthogonalizing LMS, recur-
PN sequence is sufficiently long, the PN signal samples sive least-squares (RLS), and lattice structure can be
at different taps are approximately uncorrelated. On the used [9,12].
other hand, samples of the narrowband interference at
different taps has high correlations. Since the DSSS signal,
3.1.2. SINR and BER Analysis. The output signal-to-
interference, and thermal noise are mutually uncorrelated,
interference-plus-noise ratio (SINR) and bit error rate
it follows that
(BER) are two important measures for communication
quality and the performance enhancement using the signal
p = E(x[n]x[n])
processing techniques.
= E{(s[n] + u[n] + b[n])(s[n] + u[n] + b[n])} To derive the output SINR, we rewrite the filter output
as1
= E(u[n]u[n]). (10)
N
N
Minimizing the output power E[y2 [n]] yields the y[n] = wl x[n − l] = wl (c[n − l] + u[n − l] + b[n − l])
following well-known Wiener–Hopf solution for the l=0 l=0
optimum weight vector wopt : (15)
where w0 = 1. The signal {y[n]} is then fed to the
wopt = R−1 p (11) PN correlator. Denote L as the number of chips per
information bit. Then the output of the PN correlator,
The cost of notch filtering is the introduction of some which is the decision variable for recovering the binary
distortion into the SS signal. Such distortion results in information, is expressed as
power loss of the desired DSSS signal as well as the
introduction of self-noise. Both effects become negligible
L
L
r= y[n]c[n] = c[n]
when the processing gain is sufficiently large. n=1 n=1
Note that when precise statistical knowledge of the
interference cannot be assumed, adaptive filtering can be
N
× wl (c[n − l] + u[n − l] + b[n − l])
used to update the tap weights. There are a variety of
l=0
adaptive algorithms and receiver structures [6,9–11]. The
optimum Wiener–Hopf filter can be implemented by using
L
L
N
interference escaping the excision process and appearing is commonly applied for time-sampled signals [15].
at the output of the PN correlator. The last term in (16) is Adaptive subband transforms generalize transform-
the additive noise component. domain processing [16,17], and can yield uncorrelated
The mean value of r is transform coefficients.
The interference-suppressed signal based on a block
E(r) = L (17) transform can be written as
N
N
N
where x[n] is the received input vector; A and B are
var(r) = σ 2 = L w2l + L wn wl ρ[n − l] the forward transform matrix and inverse transform
l=1 n=1 l=0
matrix, respectively; and E is a diagonal matrix with
N each diagonal element acting as a weight multiplied to
+ Lσn2 w2l (18) the input signal at each transform bin. The weights can
l=0 be controlled by different schemes. Two commonly used
methods are either to set the weights binary (i.e., a
where σn2 = E(b2 [n]) is the AGWN variance and weight is either one or zero) or to adjust the weights
adaptively. In applying the first method, powerful NBI
ρ[n − l] = E(u[k]u[k + n − l]) is detected by observing the envelope of the spectral
waveform. Substantial interference suppression can be
The three terms of the right-hand side of (18) represent achieved by multiplying the input signal with a weight
the mean square values caused by the self-noise, residual that is set to zero when the output of the envelope detector
narrowband interference, and noise, respectively. at a transform bin exceeds a predetermined level. Figure 3
The output SINR is defined as the ratio of the square of illustrates the concept of transform-domain notch filtering.
the mean to the variance. Thus Adaptive algorithms, such as LMS and RLS, can be used to
E2 (r) L determine the excision weights adaptively. The application
SINR0 = = of these algorithms, however, requires a reference signal
var(r)
N
N
N
N
w2l + wn wl ρ[n − l] + σn2 w2l that is correlated with the DSSS signal.
l=1 n=1 l=0 l=0
When the binary weights are used, the transform-
(19) domain processing technique may suffer from the
Note that if there is no interference suppression filter, interference sidelobes. With block transforms, the energy
wl = 1 for l = 0 and zero otherwise. Therefore, the in the narrowband interference, which initially occupies a
corresponding output SINR is small region of the frequency spectrum, is dispersed in a
relatively large spectral region. In this case, excision of a
L large frequency band may become necessary to effectively
SINRno = (20)
ρ[0] + σn2 remove the interference power from most transform
bins. The frequency dispersion of the interference can
If we assume that the self-noise, residual interference, be nearly eliminated by weighting the received signal
and noise components at the output of correlator is
Gaussian, then the BER can be evaluated in the same
manner as the conventional BPSK corrupted only by
AWGN. Under such assumption, the BER is given by Interference
Signal
0 √ spectrum spectrum
1 2 /2σ 2
Pb = P(r < 0) = √ e−(r−L) dr = Q( SINR0 )
−∞ 2π σ
(21)
where ∞
1 2 /2 w
Q(x) = √ e−v dv (22)
2π x
in the time domain with a nonrectangular weighting process in the presence of AWGN. If the prediction is
function prior to evaluating the transform. In doing so, done in a non-Gaussian environment, as in the case of
the levels of the sidelobes of the interference frequency SS signals, linear prediction methods will no longer be
spectrum are attenuated at the expense of broadening the optimum. In Ref. 2, depending on whether the statistics of
mainlobe [18,14]. In this case, the conventional matched the AR process is known or unknown, time-recursive and
filter is no longer optimal. Using adapted demodulation data-adaptive nonlinear filters with soft-decision feedback
accordingly can improve the receiver performance [19]. are used to estimate the SS signal. For known interference
It is important to point out that for transform-domain statistics, the interference suppression problem is cast in
processing, symbol detection can be performed in either the state space for use with Kalman–Bucy and approximate
time or the transform domain. In the later case, filtering conditional mean (ACM) filters. A fixed-length LMS
and correlation operations can be combined in one step. transversal filter, on the other hand, is used when there
The BER expression for transform-domain interference is no a priori statistical information is provided. With
excision can be easily formulated using the Gaussian the same AR model, both schemes are shown to achieve
tail probability or the Q function. The residual filtered similar performance, which is an improvement over the
and despreaded interference is treated as an equivalent Gaussian assumed environment.
AWGN source. Typically, a uniform interference phase
distribution is assumed. When transform-domain filtering 3.4. Multiuser Detection Techniques
is considered, the BER depends on both the excision
coefficients and the error misadjustment. A narrowband interference could be a digital commu-
nication signal with a data rate much lower than the
3.2. Synthesis and Subtraction for Sinusoidal Interference spread-spectrum chip rate. This is typically the case when
spread-spectrum signals are used in services overlaying
In this section, we view the interference signal as the on existing frequency band occupants. In this case, the
one that is corrupted by the additive noise and the DSSS narrowband interference is a strong communication sig-
signal. In a typical situation, the power level of the DSSS nal that interferes with commercial DSSS communication
signal is negligible relative to the power level of the systems. This type of interferer is poorly modeled as either
interference and, in most cases, relative to the additive a sinusoid or an autoregressive process. Because of the
noise. For high interference-to-noise ratio (INR), the similarity of the spread spectrum signal and the digital
correlation matrix of the received signal vector consists of interference, techniques from multiuser detection theory
a limited number of large eigenvalues contributed mainly are applied to decode the SS user signal and simultane-
by the narrowband interference, and a large number of ously suppressing the interferer [22].
small and almost equal eigenvalues contributed by the In order to apply methods from multiuser detection,
DSSS signal and noise. The eigenanalysis interference the single narrowband interferer is treated as a collection
canceller is designed with a weight vector orthogonal to the of m spread-spectrum users, where m is a function of
eigenvectors corresponding to the large eigenvalues [20]. the relative data rates of the true SS signal and the
The eigendecomposition of the correlation matrix, defined interference. That is, m bits of the narrowband user occur
in (9), results in for each bit of the SS user. As shown in Fig. 4, and using
H square waves for illustrations, each narrowband user’s bit
r 0 Ur
R = UUH = [Ur Un ] (24) can be regarded as a signal arising from a virtual user
0 n UHn having a signature sequence with only one nonzero entry.
where the columns of Ur span the interference subspace, The virtual users are orthogonal, but correlated with the
whereas the columns of Un span the signal with noise SS user.
subspace, and r and n are diagonal matrices whose The optimum receiver implementing the maximum-
elements are the eigenvalues of R. For real sinusoidal likelihood (ML) detector has a complexity that is
interference, the number of dimensions of the interference exponential in the number of virtual users, m. To
subspace is twice the number of interfering tones. overcome such complexity, the optimal linear detector and
The projection of the signal vector on the noise subspace
results in interference suppressed data sequence
n x[n] = (I − Ur Ur )x[n]
x̂[n] = Un UH H
(25)
decorrelating detector are applied. While the first requires efficiency. The BER is given by
knowledge of relative energies of both the narrowband
2m −1
√
interferer and the SS user and maximizes the receiver 1 w2 (1 + αvT p − α(αvT + pT )qi )
asymptotic efficiency, the latter is independent of the Pold = m Q )
2
i=0 σn 1 + 2αvT p + α 2 vT v
receiver energies and achieves the near–far resistance
(28)
of the ML detector. The asymptotic efficiency is the
where the ith element of vector v is given by
limit of the receiver efficiency as the AWGN goes to
zero. It characterizes the detector performance when
1 −ρi > α
the dominant source of corruption is the narrowband
interferer rather than the AWGN. The receiver efficiency, vi = −1ρ ρi > α (29)
− i otherwise.
on the other hand, quantifies the SS user energy that α
would achieve the same probability of error in a system
with the same AWGN and no other users. The input of 4. Ideal Predictor/Subtracter (IPS), which is similar to
both detectors is the output of the filter bank and consists the transversal filter excision techniques described
of filters matched to the spreading codes of each active in Section 3.1. Perfect knowledge of the narrowband
user, as depicted in Fig. 5. signal is assumed. Further, it is assume that perfect
The following expressions have been derived [22] for prediction to the sample interior to the narrowband
the probability of errors of four different detectors (in all bit is achieved and the only error occurs when
four cases, it is assumed that the narrowband signal is predicting at bit transitions. The expressions have
synchronized with the SS signal): been derived [22], where one detector assumes zero
bit estimate of the narrowband bit at the transition
and the other detector takes this estimate to be
1. Conventional Detector (CD), where the received random. For the former detector
signal is sent directly to a single filter matched
m √
1
to the spreading code. The output of the filter is then 2 −1
w2 (1 − α p̃T qi )
compared to a threshold to yield the spread-spectrum Pips = Q (30)
2m σn
bit estimate. This detector is only optimum in the i=0
100
Time
Conv. det. domain excision
Decor. det. f2
Frequency
Opt. lin. det.
10−1 Frequency
Ideal pred./sub.
∆f domain
excision Chirp
Probability of error
f1
10−2
t1 ∆t t2
10−3 Time
distribution properties, including power localization at the based on dechirping techniques commonly applied in radar
IF. As shown in Eq. (32), the TFD is the Fourier transform algorithms. In the first step the time-varying interference
of a time-average estimate of the autocorrelation function. is converted to a fixed-frequency sinusoid eliminated by
A time–frequency notch filter can be designed, in which time-invariant filters. The process is reversed. In the sec-
the position of the filter notch is synchronous with the ond step and the interference-free signal is multiplied by
interference IF estimate. Based on the IF, two constraints the interference t-f signature to restore the DSSS signal
should exist to construct an interference excision filter and noise characteristics that have been strongly impacted
with desirable characteristics. First, an FIR filter with in the first phase.
short impulse response must be used. Long-extent filters Similar to the predictor/subtracter method discussed
are likely to span segments of changing frequency contents in Section 3, Lach et al. proposed synthesis/subtracter
and, as such, allow some of the interference components to technique for FM interference using TFD [30]. A replica
escape to the filter output. Second, at any given time, the of the interference can be synthesized from the t-f domain
filter frequency response must be close to an ideal notch and subtracted from the incoming signal to produce an
filter to be able to null the interference with minimum essentially interference-free channel.
possible distortion of the signal. This property, however, Another synthesis/subtracter method has been intro-
requires filters with infinite or relatively long impulse duced [31] where the discrete evolutionary and the Hough
responses. transforms are used to estimate the IF. The interference
Amin [26] has shown that a linear-phase 5-coefficient amplitude is found by conventional methods such as lin-
filter is effective in FM interference excision. Assuming ear filtering or singular value decomposition. This excision
exact IF values, the corresponding receiver SINR is technique applies equally well to one or multicomponent
given by chirp interferers with constant or time-varying ampli-
L tudes and with instantaneous frequencies not necessarily
SINR = (33) parametrically modeled.
11/8 + 9σn2 /4
To overcome the drawbacks of the potential amplitude
which shows that full interference excision comes at the and phase errors produced by the synthesis methods,
expense of a change in the noise variance in addition of Amin et al. [32] proposed a projection filter approach
a self-noise form, as compared with the noninterference in which the FM interference subspace is constructed
case. The main objective of any excision process is to from its t-f signature. Since the signal space at the
receiver is not specifically mandated, it can be rotated
reduce both effects. The SINR in (33) assumes a random IF
such that a single interferer becomes one of the basis
with uniform distribution over [0, 2π ]. For an interference
functions. In this way, the interference subspace is one-
with fixed frequency ω0 , the receiver SINR becomes
dimensional and its orthogonal subspace is interference-
dependent on ω0 . The receiver performance sensitivity to
free. A projection of the received signal onto the
the interference frequency is discussed in detail in Ref. 26.
orthogonal subspace accomplishes interference excision
Wang and Amin [27] considered the performance
with a minimal message degradation. The projection
analysis of the IF-based excision system using a general
filtering methods compare favorably over the previous
class of multiple-zero FIR excision filters showing the
notch filtering systems.
dependence of the BER on the filter order and its group
Zhang et al. [33] proposed a method to suppress
delay. The effect of inaccuracies in the interference IF on
more general INBI signals. The interference subspace is
receiver performance was also considered as a function of
constructed using t-f synthesis methods. In contrast to the
the filter notch bandwidth. Closed-form approximations
work in Ref. 30, the interferer is removed by projection
for SINR at the receiver are given for the various cases.
rather than subtraction. To estimate the interference
One of the drawbacks to the notch filter approach [26]
waveform, a mask is constructed and applied such that
is the infinite notch depth due to the placement of
the masked t-f region captures the interference energy,
the filter zeros. The effect is a ‘‘self-noise’’ inflicted on
but leaves out most of the DSSS signals.
the received signal by the action of the filter on the
Seong and Loughlin have also extended the projection
PN sequence underlying the spread information signal.
method developed by Amin et al. [32] for excising constant
This problem led to the design of an open-loop filter amplitude FM interferers from DSSS signals to the case of
with adjustable notch depth based on the interference AM/FM interferers [34]. Theoretical performance results
energy. The notch depth is determined by a variable (correlator SNR and BER) for the AM/FM projector filter
embedded in the filter coefficients chosen as the solution show that FM estimation errors generally cause greater
to an optimization problem that maximizes receiver SINR. performance degradation than the same level of error in
The TFD is necessary for this work, even for single estimating the AM. The lower-bound for the correlator
component signals, because simple IF estimators do not SINR for the AM/FM projection filter for the case of both
provide energy information. Amin et al. accomplished this AM and FM errors is given by
work [28], incorporating a ‘‘depth factor’’ into the analysis
and redeveloping all the SINR calculations. The result L−1
SINR = (34)
was a significant improvement in SINR, especially at 1 1 2
−σφ
midrange interference-to-signal ratios (ISR’s), typically + σn2 + A2 (1 − e ) + σ 2
L 1 + σa
2 a
the variances of the estimation errors in the AM and FM, removed the high-value coefficients in all generated
respectively. STFTs, and used the combined results, via efficient least-
Linear t-f signal analysis has also been shown effective square synthesis, to reconstruct an interference-reduced
to characterize a large class of nonstationary interferers. signal. Bultan and Akansu [39] proposed a chirplet-
Roberts and Amin [35] proposed the use of the discrete transform-based exciser to handle chirplike interference
Gabor transform (DGT) as a linear joint time–frequency types in SS communications.
representation. The DGT can attenuate a large class of The block diagram in Fig. 8 depicts the various inter-
nonstationary wideband interferers whose spectra are ference rejection techniques using the time–frequency
localized in the t-f domain. Compared to bilinear TFDs, methods cited above.
the DGT does not suffer from the cross-term interference
problems, and enjoys a low computational complexity. Wei 4.1. Example
et al. [36] devised a DGT-based, iterative time-varying At this point, in order to further illustrate these excision
excision filtering, in which a hypothesis testing approach methods, the work by Amin et al. [32] will be detailed since
was used to design a binary mask in the DGT domain. The it includes comparisons between the two most prominent
time–frequency geometric shape of the mask is adapted techniques based on TFDs currently being studied: notch
to the time-varying spectrum of the interference. They filtering and projection filtering. The signal model is, as
show that such a statistical framework for the transform- expected, given by Eq. (1), and the major theme of the work
domain mask design can be extended to any linear is to annihilate interference via projection of the received
transform. Both the maximum-likelihood test and the local signal onto an ‘‘interference-free’’ subspace generated from
optimal test are presented to demonstrate performance the estimated interference characteristics. This paper
versus complexity. includes a figure, reprinted here as Fig. 9, which clearly
Application of the short-time Fourier transform (STFT) illustrates the tradeoffs between projection and notch
to nonstationary interference excision in DSSS commu- filtering based on the ISR. In the legend, the variable a
nications has been considered [37,38]. In those studies represents the adaptation parameter for the notch filtering
[37,38], due to the inherent property of STFT to trade off scheme and N represents the block size, in samples, for a
temporal and spectral resolutions, several STFTs corre- 128-sample bit duration in the projection method. Thus,
sponding to different analysis windows were generated. N = 128 means no block processing and N = 2 corresponds
Ouyang and Amin [37] used a multiple-pole data window to 64 blocks per bit being processed for projection. Since
to obtain a large class of recursive STFTs. Subsequently, the projection and nonadaptive notch filter techniques are
they employed concentration measures to select the STFT assumed to completely annihilate the interference, their
that localizes the interference in the t-f domain. This pro- performance is decoupled from the interference power,
cedure is followed by applying a binary excision mask and therefore correctly indicate constant SINR across
to remove the high-power t-f region. The remainder is the graph. The dashed line representing the notch filter
synthesized to yield a DSSS signal with improved signal- with a = 0 really indicates no filtering at all, since the
to-interference ratio (SIR). adaptation parameter controls the depth of the notch.
Krongold et al. [38] proposed multiple overdeter- It is evident from Fig. 9 that without adaptation a
mined tiling techniques and utilized a collection of crossover point occurs around 2 dB, where filtering with
STFTs for the purpose of interference excision. Unlike an infinitely deep notch is advantageous. Thus, when
the Ouyang–Amin procedure [37], Krongold et al. [38] the interference power exceeds this point, presumably
Modified PN
15 receiver.
An FFH receiver that employs a prewhitening filter
to reject NBI has been described [44]. For binary
10 Notch filter, a = 0 FSK modulations, it is shown that the FFH signal
Notch filter, a = 1 is statistically uncorrelated at lag values of Th /(4Nh ),
Adaptive notch filter
Projection, N = 2
where Th is the hop duration and 2Nh is the total
5 Projection, N = 4 number of frequency slots (i.e., there are Nh MARK
Projection, N = 8 and Nh SPACEs). Thus, as in the DS case, NBI can be
Projection, N = 128 predicted and suppressed independently of the desired
0 FFH signal. Using the complex LMS algorithm to update
−20 −15 −10 −5 0 5 10 15 20
the prewhitening filter coefficients, this technique is shown
Input ISR (dB)
to compare favorably with the maximal-ratio combiner
Figure 9. Comparison between projection and notch filtering diversity technique. When the interferer is wide-sense
excision methods. stationary, the prewhitening-filter-based receiver provides
performance approaching that of the AGC receiver and
at least 2–3 dB superior to that of the self-normalizing
a user would flip a switch to turn on the excision receiver. However, when hostile interference is present,
subsystem. However, with adaptation, this process the adaptive prewhitening filter technique may not be
happens automatically, while giving superior performance able to track the interference rapidly enough. In this case,
in the midrange. For the projection technique, the nonparametric techniques such as the self-normalizing
block size determines receiver performance conspicuously receiver must be used to reject the jammed hops.
(ceteris paribus). Most important to note, however, is Reed and Agee [46] have extended and improved on
the superior performance of projection over all methods the idea of whitening by using a time-dependent filter
when the block size is equal to the bit duration, namely, structure to estimate and remove interference, based
no block processing. It is feasible that computational on the interference spectral correlation properties. In
complexity may warrant a tradeoff between SINR and this method, the detection of FHSS in the presence of
block size, in which case a hybrid implementation may spectrally correlated interference is nearly independent of
be of benefit — one that automatically switches between the SIR. The process can be viewed as a time-dependent
adaptive notch filtering and projection depending on the whitening process with suppression of signals that
desired SINR. In any case, this example illustrates the exhibit a particular spectral correlation. The technique
parameters involved in the design of modern excision is developed from the maximum-likelihood estimate of the
filters for nonstationary interferers. spectral frequency of a frequency agile signal received
in complex Gaussian interference with unknown spectral
5. INTERFERENCE SUPPRESSION FOR correlation. The resulting algorithm uses the correlation
FREQUENCY-HOPPING COMMUNICATIONS between spectrally separated interference components to
reduce the interference content in each spectral bin prior
Interference rejection for FH is not as well developed to the whitening/detection operation.
as interference rejection for DS or for CDMA. In FH An alternative approach to suppress PBI using the FFT
systems, the fast FH (FFH) is of most interest, and the has been proposed [45]. The major attraction of FFT-based
modulation most commonly used in FH is frequency shift implementation lies in the ability to achieve guaranteed
keying (FSK). Two types of interference waveforms can be accuracy and perfect reproducibility.
categorized, namely, partial-band interference (PBI) and For suppression of MTI, basically the same processing
multitone interference (MTI). methods applied for PBI can be employed. However, the
The effects of PBI and AWGN on several diversity- performance analyses differ from those for PBI situations.
combining receivers in FFH/FSK SS communication sys- The performance depends on the distribution of the MTI
tems have been investigated [40–42,44]. An alternative and, in turn, how many bands are contaminated by
method using a fast Fourier transform (FFT) has been MTI. Performance analyses of FFH SS systems have
proposed [45]. An automatic gain-control (AGC) receiver been presented for linear combining diversity [47,48],
using a diversity technique has also been presented [40]. for clipped diversity [49], for maximum likelihood and
In this method, each soft-decision square-law detected product-combining receivers [50,51].
MARK and SPACE filter output is weighted by the inverse
of the noise power in the slot prior to linear combining. BIOGRAPHIES
This method is near-optimal (in terms of SNR) if the
exact information of noise and interference power can Dr. Moeness Amin received his B.Sc. degree in 1976
be obtained. A similar clipped-linear combining receiver from Cairo University, Egypt, M.Sc. degree in 1980 from
1140 INTERFERENCE SUPPRESSION IN SPREAD-SPECTRUM COMMUNICATION SYSTEMS
University of Petroleum and Minerals, Dhahran, Saudi spread-spectrum systems, IEEE Trans. Commun. COM-30:
Arabia, and his Ph.D. degree in 1984 from University 913–924 (May 1982).
of Colorado at Boulder. All degrees are in electrical 7. L. Li and L. B. Milstein, Rejection of narrow-band interfer-
engineering. He has been on the faculty of the Department ence in PN spread-spectrum system using transversal filters,
of Electrical and Computer Engineering at Villanova IEEE Trans. Commun. COM-30: 925–928 (May 1982).
University, Villanova, Pennsylvania, since 1985, where is 8. R. A. Iltis and L. B. Milstein, Performance analysis of narrow-
now a professor. Dr. Amin is a fellow of the IEEE and the band interference rejection techniques in DS spread-spectrum
recipient of the IEEE Third Millennium Medal. He is also systems, IEEE Trans. Commun. COM-32: 1169–1177 (Nov.
a recipient of the 1997–Villanova University Outstanding 1984).
Faculty Research Award as well as the recipient of 9. S. Haykin, Adaptive Filter Theory, 3rd ed., Prentice-Hall,
the 1997–IEEE Philadelphia Section Service Award. Englewood Cliffs, NJ, 1996.
He is a member of the Franklin Institute Committee
10. R. A. Iltis and L. B. Milstein, An approximate statistical
of Science and Arts. Dr. Amin has 4 book chapters,
analysis of the Widrow LMS algorithm with application to
60 journal articles, 2 review articles, and over 150 narrow-band interference rejection, IEEE Trans. Commun.
conference publications. His research includes the areas of COM-33: 121–130 (Feb. 1985).
wireless communications, time-frequency analysis, smart
11. F. Takawira and L. B. Milstein, Narrowband interference
antennas, anti-jamming gPS, interference cancellation in
rejection in PN spread spectrum system using decision
broadband communication platforms, digitized battlefield, feedback filters, Proc. MILCOM, Oct. 1986, pp. 20.4.1–20.4.5.
high definition TV, target tracking and direction finding,
channel equalization, signal coding and modulation, and 12. J. G. Proakis, Digital Communications, 3rd ed., McGraw-Hill,
New York, 1995.
radar systems. He has two U.S. patents and served as a
consultant to Micronetics Wireless, ELCOM Technology 13. L. B. Milstein and P. K. Das, Spread spectrum receiver using
Corporation, and VIZ Manufacturing Company. surface acoustic wave technology, IEEE Trans. Commun.
COM-25: 841–847 (Aug. 1977).
Yimin Zhang received his M.S. and Ph.D. degrees from 14. S. Davidovici and E. G. Kanterakis, Narrowband interference
the University of Tsukuba, Japan, in 1985 and 1988, rejection using real-time Fourier transform, IEEE Trans.
respectively. He joined the faculty of the Department Commun. 37: 713–722 (July 1989).
of Radio Engineering, Southeast University, Nanjing, 15. R. C. Dipietro, An FFT based technique for suppressing
China, in 1988. He served as a technical manager at narrow-band interference in PN spread spectrum commu-
Communication Laboratory Japan, Kawasaki, Japan, in nication systems, Proc. IEEE Int. Conf. Acoustics, Speech,
1995–1997, and a visiting researcher at ATR Adaptive and Signal Processing, 1989, pp. 1360–1363.
Communications Research Laboratories, Kyoto, Japan, 16. M. V. Tazebay and A. N. Akansu, Adaptive subband trans-
in 1997–1998. Currently, he is a research fellow at the forms in time-frequency excisers for DSSS communication
Department of Electrical and Computer Engineering, Vil- systems, IEEE Trans. Signal Process. 43: 1776–1782 (Nov.
lanova University, Villanova, Pennsylvania. His current 1995).
research interests are in the areas of array signal process- 17. M. Medley, G. J. Saulnier, and P. Das, Adaptive subband
ing, space-time adaptive processing, multiuser detection, filtering of narrowband interference, in H. Szu, ed., SPIE
blind signal processing, digital mobile communications, Proc. — Wavelet Appls. III, Vol. 2762, April 1996.
and time-frequency analysis.
18. J. Gevargiz, M. Rosenmann, P. Das, and L. B. Milstein, A
comparison of weighted and non-weighted transform domain
BIBLIOGRAPHY processing systems for narrowband interference excision,
Proc. MILCOM, 1984, pp. 32.3.1–32.3.4.
1. L. B. Milstein, Interference rejection techniques in spread 19. S. D. Sandberg, Adapted demodulation for spread-spectrum
spectrum communications, Proc. IEEE 76(6): 657–671 (June receivers which employ transform-domain interference exci-
1988). sion, IEEE Trans. Commun. 43: 2502–2510 (Sept. 1995).
2. H. V. Poor and L. A. Rusch, Narrowband interference sup- 20. A. Haimovich and A. Vadhri, Rejection of narrowband
pression in spread-spectrum CDMA, IEEE Pers. Commun. interferences in PN spread spectrum systems using an
Mag. 1(8): 14–27 (Aug. 1994). eigenanalysis approach, Proc. IEEE Signal Processing
3. J. D. Laster and J. H. Reed, Interference rejection in digital Workshop on Statistical Signal and Array Processing, Quebec,
wireless communications, IEEE Signal Process. Mag. 14(3): Canada, June 1994, pp. 1002–1006.
37–62 (May 1997). 21. B. K. Poh, T. S. Quek, C. M. S. See, and A. C. Kot, Suppres-
4. M. G. Amin and A. N. Akansu, Time-frequency for interfer- sion of strong narrowband interference using eigen-structure-
ence excision in spread-spectrum communications (section in based algorithm, Proc. MILCOM, July 1995, pp. 1205–1208.
Highlights of signal processing for communications), IEEE 22. L. A. Rusch and H. V. Poor, Multiuser detection techniques
Signal Process. Mag. 15(5): (Sept. 1998). for narrow-band interference suppression in spread spectrum
5. F. M. Hsu and A. A. Giordano, Digital whitening techniques communications, IEEE Trans. Commun. 43: 1725–1737
for improving spread-spectrum communications performance (Feb.–April 1995).
in the presence of narrow-band jamming and interference, 23. H. V. Poor and X. Wang, Code-aided interference suppression
IEEE Trans. Commun. COM-26: 209–216 (Feb. 1978). for DS/CDMA communications. I. Interference suppression
6. J. W. Ketchum and J. G. Proakis, Adaptive algorithms for capability, IEEE Trans. Commun. 45: 1101–1111 (Sept.
estimating and suppressing narrow-band interference in PN 1997).
INTERLEAVERS FOR SERIAL AND PARALLEL CONCATENATED (TURBO) CODES 1141
24. B. Boashash, Estimating and interpreting the instantaneous 40. J. S. Lee, L. E. Miller, and Y. K. Kim, Probability of error
frequency of a signal. I. Fundamentals, Proc. IEEE 80: analysis of a BFSK frequency-hopping system with diver-
520–538 (April 1992). sity — Part II, IEEE Trans. Commun. COM-32: 1243–1250
(Dec. 1984).
25. B. Boashash, Estimating and interpreting the instantaneous
frequency of a signal. II. Algorithms and applications, Proc. 41. L. E. Miller, J. S. Lee, and A. P. Kadrichu, Probability of
IEEE 80: 540–568 (April 1992). error analyses of a BPSK frequency-hopping system with
diversity under partial-band jamming interference — Part III:
26. M. G. Amin, Interference mitigation in spread-spectrum
Performance of a square-law self-normalizing soft decision
communication systems using time-frequency distributions,
receiver, IEEE Trans. Commun. COM-34: 669–675 (July
IEEE Trans. Signal Process. 45(1): 90–102 (Jan. 1997).
1986).
27. C. Wang and M. G. Amin, Performance analysis of instan-
42. C. M. Keller and M. B. Pursley, Clipper diversity combin-
taneous frequency based interference excision techniques in
ing for channels with partial-band interference — Part I:
spread spectrum communications, IEEE Trans. Signal Pro-
Clipper linear combining, IEEE Trans. Commun. COM-35:
cess. 46(1): 70–83 (Jan. 1998).
1320–1328 (Dec. 1987).
28. M. G. Amin, C. Wang, and A. R. Lindsey, Optimum inter-
43. C. M. Keller and M. B. Pursley, Clipper diversity combin-
ference excision in spread-spectrum communications using
ing for channels with partial-band interference — Part II:
open-loop adaptive filters, IEEE Trans. Signal Process. (July
Ratio-statistic combining, IEEE Trans. Commun. COM-37:
1999).
145–151 (Feb. 1989).
29. S. Barbarossa and A. Scaglione, Adaptive time-varying can-
44. R. A. Iltis, J. A. Ritcey, and L. B. Milstein, Interference rejec-
cellations of wideband interferences in spread-spectrum com-
tion in FFH systems using least squares estimation tech-
munications based on time-frequency distributions, IEEE
niques, IEEE Trans. Commun. 38: 2174–2183 (Dec. 1990).
Trans. Signal Process. 47(4): 957–965 (April 1999).
45. K. C. Teh, A. C. Kot, and K. H. Li, Partial-band jammer
30. S. Lach, M. G. Amin, and A. R. Lindsey, Broadband nonsta- suppression in FFH spread-spectrum system using FFT,
tionary interference excision in spread-spectrum communi- IEEE Trans. Vehic. Technol. 48: 478–486 (March 1999).
cations using time-frequency synthesis techniques, IEEE J.
Select. Areas Commun. 17(4): 704–714 (April 1999). 46. J. H. Reed and B. Agee, A technique for instantaneous
tracking of frequency agile signals in the presence of
31. H. A. Khan and L. F. Chaparro, Formulation and implemen- spectrally correlated interference, Proc. Asilomar Conf.
tation of the non-stationary evolutionary Wiener filtering, Signals, Systems, and Computers, Pacific Grove, CA, Nov.
Signal Process. 76: 253–267 (1999). 1992.
32. M. G. Amin, R. S. Ramineni, and A. R. Lindsey, Interference 47. B. K. Livitt, FH/MFSK performance in multitone jamming,
excision in DSSS communication systems using projection IEEE J. Select. Areas Commun. SAC-3: 627–643 (Sept. 1985).
techniques, Proc. IEEE Int. Conf. Acoustics, Speech, and
Signal Processing, Istanbul, Turkey, June 2000. 48. R. E. Ezers, E. B. Felstead, T. A. Gulliver, and J. S. Wight,
An analytical method for linear combining with application
33. Y. Zhang, M. G. Amin, and A. R. Lindsey, Combined synthe- to FFH NCFSK receivers, IEEE J. Select. Areas Commun. 11:
sis and projection techniques for jammer suppression in 454–464 (April 1993).
DS/SS communications, IEEE Int. Conf. Acoustics, Speech,
and Signal Processing, Orlando, FL, May 2002. 49. J. J. Chang and L. S. Lee, An exact performance analysis
of the clipped diversity combining receiver for FH/MFSK
34. S.-C. Jang and P. J. Loughlin, AM-FM interference excision systems against a band multitone jammer, IEEE Trans.
in spread spectrum communications via projection filtering, Commun. 42: 700–710 (Feb.–April 1994).
J. Appl. Signal Process. 2001(4): 239–248 (Dec. 2001).
50. K. C. Teh, A. C. Kot, and K. H. Li, Performance study of a
35. S. Roberts and M. Amin, Linear vs. bilinear time-frequency maximum-likelihood receiver for FFH/BFSK systems with
methods for interference mitigation in direct-sequence multitone jamming, IEEE Trans. Commun. 47: 766–772
spread-spectrum communication systems, Proc. Asilomar (May 1999).
Conf. Signals, Systems, and Computers, Pacific Grove, CA,
51. K. C. Teh, A. C. Kot, and K. H. Li, Performance analysis of
Nov. 1995.
an FFH/BFSK product-combining receiver under multitone
36. D. Wei, D. S. Harding, and A. C. Bovik, Interference rejection jamming, IEEE Trans. Vehic. Technol. 48: 1946–1953 (Nov.
in direct-sequence spread-spectrum communications using 1999).
the discrete Gabor transform, Proc. IEEE Digital Signal
Processing Workshop, Bryce Canyon, UT, Aug. 1998.
37. X. Ouyang and M. G. Amin, Short-time Fourier transform INTERLEAVERS FOR SERIAL AND PARALLEL
receiver for nonstationary interference excision in direct
CONCATENATED (TURBO) CODES
sequence spread spectrum communications, IEEE Trans.
Signal Process. 49(4): 851–863 (April 2001).
TOLGA M. DUMAN
38. B. S. Krongold, M. L. Kramer, K. Ramchandran, and D. L.
Arizona State University
Jones, Spread-spectrum interference suppression using
Tempe, Arizona
adaptive time-frequency tilings, Proc. IEEE Int. Conf.
Acoustics, Speech, and Signal Processing, Munich, Germany,
April 1997. 1. INTRODUCTION
39. A. Bultan and A. N. Akansu, A novel time-frequency exciser
in spread-spectrum communications for chirp-like interfer- Interleavers are commonly used devices in digital
ence, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal communication systems. Basically, an interleaver is used
Processing, Seattle, WA, May 1998. to reorder a block or a sequence of binary digits [1].
1142 INTERLEAVERS FOR SERIAL AND PARALLEL CONCATENATED (TURBO) CODES
Data Output
Channel Digital Digital Channel
Interleaver Channel Deinterleaver
encoder modulator demodulator decoder
m k information bits
Interleaver of
length mk
(n2,k ) linear
systematic m (n2 − k ) parity bits
Figure 2. Parallel concatenated block code via block code
an interleaver.
Traditionally interleavers have been employed in coded classic paper [2], Shannon showed that the codes with
digital communications, as shown in Fig. 1, to reorder the large blocklengths chosen randomly achieve the channel
coded bits in order to combat the burst errors that may capacity. However, such codes are in general very difficult
be caused by signal fading or different types of interfering to encode or decode. As a remedy, one can construct
signals. codes with large blocklengths by concatenating two simple
A simple way to reorder the coded bits for this purpose (short-blocklength) codes either in parallel or in series
is to use a block interleaver where the sequence is written using an interleaver.
to a matrix rowwise, and read columnwise. For example, if An example of a parallel concatenated code is shown in
the original sequence is {u1 , u2 , u3 , . . . , uN1 N2 }, the N1 × N2 Fig. 2 [3]. Two linear systematic block codes are used as
matrix component encoders: m k-bit blocks are input to the first
component encoder, which produces m(n1 − k) parity bits,
u1 u2 · · · uN1 and the interleaved version is input to the second encoder,
uN +1 uN1 +2 · · · u2N1
1 which produces m(n2 − k) parity bits. The information
u2N +1 u2N1 +2 · · · u3N1
1 bits together with the two sets of parity bits constitute the
··· ··· ··· ··· codeword corresponding to the original mk information
u(N2 −1)N1 +1 u(N2 −1)N1 +2 · · · uN1 N2 sequence. The overall code is an (m(n1 + n2 − k), mk)
systematic block code. Clearly, by proper selection of the
is constructed and read columnwise, resulting in the
parameter m, we can obtain a large-blocklength code.
interleaved sequence
Different code rates can be obtained by selecting the
component codes properly, and by puncturing some of
{u1 , uN1 +1 , u2N1 +1 , . . . , u(N2 −1)N1 +1 ,
the redundant bits produced by the component codes.
u2 , uN1 +2 , u2N1 +2 , . . . , uN1 N2 } Similarly, simple block codes can be concatenated
in series to construct codes with large blocklengths.
Clearly, the order in which the data are written or read An example is shown in Fig. 3. In this case, m k-
may change, resulting in different variations of the block bit blocks are input to an outer linear systematic
interleaver. block code that produces m p-bit coded blocks. These
A more current use of interleavers is in the construction coded bits are interleaved and then input to an inner
of channel codes having very large codelengths. In his encoder that generates m n-bit blocks that constitute
m p-bit m p-bit
m k-bit blocks (p,k ) linear (n,p) linear mn coded bits
blocks Interleaver blocks
systematic systematic
of length mp
block code block code
the overall codeword corresponding to the original since the early 1990s. In particular, long ‘‘randomlike’’
information sequence. The overall code is an (mn, mk) block codes with rather simple decoding algorithms are
linear systematic block code. constructed, and it is shown that their performance is
As will be demonstrated later, a simple block inter- very close to the Shannon limit on the AWGN channel.
leaver is seldom a good choice for constructing concate- To be specific, at a bit error rate of 10−5 , performance
nated codes because they fail to eliminate problematic within 1 dB of the channel capacity is common. Let us now
error events because of their inherent structure. There- describe these codes in more detail.
fore, we need to develop methods of generating good
interleavers for code concatenation. 2.1. Parallel Concatenated Convolutional Codes
The article is organized as follows. In Section 2, we The idea in Turbo coding is to concatenate two recursive
briefly describe Turbo codes together with the iterative systematic convolutional codes in parallel via an inter-
decoding algorithms and explain the role of the interleaver. leaver as shown in Fig. 4. The information sequence is
We summarize the results on the interleaver gain for divided into blocks of a certain length. The input of the
serial and parallel concatenated convolutional codes in first encoder is the information block, and the input of the
Section 3. We devote Section 4 to review of various second encoder is an interleaved version of the information
promising interleaver design techniques and present block. The encoded sequence (codeword) corresponding to
several examples. We specifically focus on the use of that information sequence is then the information block
Turbo codes over AWGN (additive white Gaussian noise) itself, the first parity block, and the second parity block.
channels. Finally, we conclude in Section 5. The block diagram shown in Fig. 4 assumes a rate- 13 Turbo
code with 57 (in octal notation) component convolutional
2. TURBO CODES codes.1 As in the case of parallel concatenated block codes,
higher rate codes can be obtained by puncturing some of
While concatenation of simple block codes with an the parity bits.
interleaver is a viable solution for constructing large- The main role of the interleaver in this construction
blocklength codes to approach the Shannon limit, encoding is to help the code imitate a ‘‘random’’ code with a large
and decoding of such codes is difficult since block codes blocklength.
rarely admit simple (soft-decision) maximum-likelihood
2.2. Serially Concatenated Convolutional Codes
decoding. Therefore, it is desirable to design concatenated
codes using simple convolutional codes as the building Convolutional codes can also be concatenated in series
blocks. via an interleaver to construct powerful error-correcting
Parallel or serial concatenation of convolutional codes
via an interleaver (i.e., Turbo codes [4,5]), coupled with 1 The term p/q convolutional code is used to indicate the places
a suboptimal iterative decoder, has proved to be one of of the feedforward and feedback connections in the convolutional
the most important developments in the coding literature code in octal notation.
Systematic block
Information
block
Interleaver
(Turbo) codes. In this case, the information sequence is decoders, one for each convolutional code, that can
divided into blocks, and each block is encoded using produce soft information about the transmitted bits.
an outer convolutional code. Then, the encoded block At each iteration step, one of the decoders takes the
is scrambled using an interleaver and passed to the systematic information (directly from the observation
inner convolutional code. Both inner and outer codes are of the systematic part) and the extrinsic loglikelihood
usually selected to be systematic. In order to obtain a information produced by the other decoder in the previous
good performance, the inner code needs to be recursive, iteration step, and computes its new extrinsic loglikelihood
whereas the outer code may be selected as a feedback–free information. The newly produced extrinsic information
convolutional code [5]. The block diagram of the encoder is is ideally independent of the systematic information
shown in Fig. 5. and the extrinsic information of the other decoder. The
Similarly, the interleaver is employed to generate a block diagram of the iterative decoding algorithm is
‘‘randomlike’’ long-blocklength code. presented in Fig. 6, where the extrinsic information of
the jth component decoder about the information bit dk is
2.3. Iterative Decoding denoted by Lje (dk ) and its interleaved version by Lje (dk ),
Assume that the parallel or serial concatenated code j = 1, 2. The updated extrinsic information is fed into the
is used for transmission over an AWGN channel. other decoder for the next iteration step. The extrinsic
Because of the usage of the interleaver, performing information of both decoders is initialized to zero before the
maximum-likelihood decoding is very difficult. In general, iterations start. After a number of iterations, the algorithm
one has to consider all the possible codewords (there converges and the decision on each transmitted bit is made
are 2N possibilities, where N is the length of the based on its ‘‘total’’ likelihood (i.e., sum of two extrinsic
information block), compute the squared Euclidean information terms and the systematic loglikelihood).
distance corresponding to each one, and declare the one For serial concatenated convolutional codes, a similar
with the lowest distance as the transmitted codeword. iterative (suboptimal) decoder is employed. As in the
Even for short interleavers, this is a tedious task and case of parallel concatenation, there are two component
cannot be done in practice. Fortunately, there is a decoders for the inner and the outer code, which can
suboptimal iterative decoding algorithm that achieves be implemented using a soft-input/soft-output decoder.
near-optimal performance. However, in this case the information exchanged between
Consider the case of parallel concatenation. The the component decoders include the information about the
iterative decoding algorithm is based on two component parity bits of the outer code along with the systematic
Systematic Parity
info. info.
Deinter- Inter-
leaver leaver
Second
decoder
L′2e (dk ) L′1e (dk )
Systematic Parity
info. info.
Figure 6. Block diagram of the iterative Turbo decoder
for parallel concatenated convolutional codes. L′ : Interleaved likelihoods
INTERLEAVERS FOR SERIAL AND PARALLEL CONCATENATED (TURBO) CODES 1145
bits. Therefore, the soft-in/soft-out algorithms suitable for are employed, there is no interleaver gain. Therefore the
component decoders of parallel concatenated codes should use of the recursive codes is critical.
be modified accordingly. For serially concatenated convolutional codes [5], the
outer code may be selected recursive or nonrecursive,
whereas the inner code must be selected to be a recursive
3. UNIFORM INTERLEAVER AND INTERLEAVER GAIN convolutional code to make sure that an interleaver gain
is observed. Then, under the assumption of uniform
−do /2
interleaving, the interleaver gain is given by N f ,
A uniform interleaver devised by Benedetto and Mon- o
where df is the free distance of the outer convolutional
torsi [6] is a probabilistic device that takes on any one
code.
of the possible N! interleavers with equal probability. It
It is important to emphasize that concatenation of
is used primarily in analysis of Turbo codes, and can
block codes in parallel or series does not result in any
be employed to determine the interleaver gains provided
interleaver gain that increases with interleaver size [5,6].
by the parallel and serial concatenation of convolutional
Therefore, using convolutional codes as component codes
codes. The performance predicted by the uniform inter-
is advantageous.
leaver is an average over the ensemble of all possible
interleavers, and therefore, an interleaver selected at ran- 4. INTERLEAVER DESIGN FOR PARALLEL
dom is expected to achieve this performance. Another CONCATENATED CONVOLUTIONAL CODES
interpretation is that there exists an interleaver whose
performance is at least as good as the performance pre- Mathematically, an interleaver of size N is a one-to-one
dicted by the uniform interleaver. mapping from the set of integers {1, 2, . . . , N} to the same
It is shown [7] that if two recursive convolutional set. Let π(·) denote this mapping; then π(i) = j means that
codes are concatenated in parallel, the average number the ith bit of the original sequence is placed to the jth
of lowest-weight error events for the overall code (under position after interleaving.
the assumption of uniform interleaving) is proportional A very simple but effective way of obtaining interleavers
to 1/N, where N is the interleaver length. If we for Turbo codes is to select them in a pseudorandom
consider the union bound on the bit error probability, manner. To construct such interleavers, one can easily pick
the most important terms of the bound decays with 1/N random integers from {1, 2, . . . , N} without replacement.
providing an interleaver gain. Hence, the performance If the ith number picked is j, then π(i) = j. Pseudorandom
of the code improves with the interleaver length (for interleavers are shown to perform well for Turbo codes.
maximum-likelihood decoding). It is also important to For example, Fig. 7 shows the bit error rate versus signal-
note that if nonrecursive component convolutional codes to-noise ratio (SNR) obtained by a rate- 12 (obtained by
10−1
10−2
Bit error probability
10−3
N = 500
10−4
N = 1000
Waterfall region
10−5
10−6
0 0.5 1 1.5 2 2.5 3 3.5
SNR per bit in dB
puncturing half of the parity bits) parallel concatenated Among the larger input weight information sequences,
Turbo code with 57 component codes. However, as we will the most problematic ones are some of the weight 2
see shortly, the Turbo code performance can be improved information sequences. Clearly, if the input polynomial
considerably by proper interleaver design techniques. is divisible with G(D), then the parity sequence produced
Before going into the details of interleaver design, it is by the component code terminates, resulting in a low-
worthwhile to mention a basic asymptotic result. Turbo weight error event. For example, for the code described
codes are linear; therefore the weight distribution deter- in Fig. 4, the feedback polynomial is 1 + D + D2 , and
mines the distance spectrum of the code. Khandani has any input polynomial of the form Dj (1 + D3l ) corresponds
shown [8] that the weight of the systematic bitstream, to a ‘‘bad’’ weight 2 error sequence. In general, if the
and the two parity bitstreams have Gaussian distribu- degree of the feedback polynomial is m, then it divides
tion for large interleaver sizes, and that the correlation the input polynomials of the form Dj (1 + Dkl ) for some k
coefficients between these three bitstreams are nonneg- with k ≤ 2m − 1. If the feedback polynomial is primitive,
ative and go to zero as N → ∞ for almost any random then the lowest-degree polynomial that is divisible by the
m
interleaver. For proper interleaver design, it is desired feedback polynomial is 1 + D2 −1 . Therefore, as a side note,
that these correlation coefficients be as small as possible, we emphasize that in order to reduce the number of ‘‘bad’’
and asymptotically this is already satisfied by a random error events, the feedback polynomial is usually selected
interleaver. Therefore, for large interleaver sizes a ran- to be primitive.
domly selected interleaver is as good as it gets; that is, The information weight 2 sequences are the most
there are no interleavers that will perform significantly difficult to break down because the percentage of the
better than the average over the ensemble of all possible self-terminating weight 2 sequences is much larger than
interleavers. that for the larger input weight sequences. For example, it
However, Khandani’s result is only an asymptotic is easy to see that with uniform interleaving the number
result, and for practical block sizes, it is important to of error events with input sequence weight 2 drops only
design the interleavers properly. There are two basic with 1/N, whereas the number of higher input weight
approaches to accomplish this goal. One is to attack the error events with low overall Hamming weights reduce
weight distribution of the overall Turbo code assuming with 1/N l , where l ≥ 2. Therefore, asymptotically, the
maximum-likelihood decoding, and the other is to design performance of the turbo code is determined by the error
the interleavers by considering the suboptimal iterative events of information weight two.
decoding algorithms. We consider these two approaches Figure 7 illustrates the bit error rate versus SNR for
separately. a typical Turbo code. From the figure two regimes are
apparent: the waterfall region and the error floor region.
4.1. Interleaver Design Based on Distance Spectrum The waterfall region is due to the ‘‘thin’’ distance spectrum
of the Turbo code, and the error floor is due to the fact that
The recursive convolutional code used to construct the
the minimum distance of the Turbo code is usually small
Turbo code is simply a division circuit in a Galois field
caused by ‘‘bad’’ weight 2 information sequences.
GF(2). Let us denote its transfer function by F(D)/G(D).
Interleaver design based on the distance spectrum of
Turbo codes are linear block codes; thus let us assume
the overall Turbo code is concerned mainly with lowering
that the all-zero codeword is transmitted over an AWGN
the error floor present by attacking the problematic lower
channel. The possible error sequences corresponding to
information weight sequences, in particular sequences
this transmission are all the nonzero codewords of the
with information weight 2.
Turbo code. Consider a codeword corresponding to a weight
1 information block. Since the component encoders are
4.1.1. Block Interleaver. A simple block interleaver is
selected as recursive convolutional encoders, the parity
not suitable for use in Turbo codes as there are certain
sequences corresponding to this information sequence will
‘‘bad’’ information sequences that cannot be broken down.
not terminate until the end of the block is reached. With
For example, consider the information sequence written
a good selection of the interleaver, if the single 1 occurs
rowwise
toward the end of the input sequence, it will occur toward
the beginning of the block for the other component encoder. 0 ··· 0 0 0 0 0 0 ··· 0
Therefore, the codewords with information weight 1 will · ··· · · · · · · ··· ·
have a large parity weight, hence a large total weight, · ··· · · · · · · ··· ·
provided the interleavers that map the bits toward the 0 ··· 0 1 0 0 1 0 ··· 0
end of the original sequence too close to the end of the 0 ··· 0 0 0 0 0 0 ··· 0
interleaved sequence are avoided. 0 ··· 0 0 0 0 0 0 ··· 0
For larger information weight sequences, the premise 0 ··· 0 1 0 0 1 0 ··· 0
is that the interleaver ‘‘breaks down’’ the ‘‘bad’’ sequences. · ··· · · · · · · ··· ·
In other words, if the information block results in a · ··· · · · · · · ··· ·
lower-weight parity sequence corresponding to one of the 0 ··· 0 0 0 0 0 0 ··· 0
encoders, it will have a larger weight parity sequence
corresponding to the other one. Therefore, most of the and read columnwise. The pattern formed by four 1s shown
codewords will have large Hamming weights, or the in boldface cannot be broken down by the block interleaver
average distance spectrum will be ‘‘thinned’’ [9], and the since both the original sequence and the interleaved
code will perform well over an AWGN channel. version contain two terminating weight 2 error patterns
INTERLEAVERS FOR SERIAL AND PARALLEL CONCATENATED (TURBO) CODES 1147
(for the 57 convolutional code) regardless of the size of for Turbo codes is the ‘‘bad’’ weight 2 information
the interleaver. Since we have to consider all possible sequences. If the two 1s resulting in the terminating
binary N-tuples, there are many information sequences parity sequence are close to each other in both the original
containing the same problematic pattern. Clearly, for any sequence and the interleaved version, both sequences
component code, there always exists similar problematic result in low parity weights, and thus the overall codeword
weight 4 information sequences. has a low Hamming weight. In order to increase the
Another important information sequence that cannot Hamming distance and thus reduce the error floor, an s-
be broken down is the one with a single 1 at the end of the random interleaver [11] ensures that any pair of positions
block. that are close to each other in the original sequence are
Although the block interleavers are well suited for separated by more than a preselected value s (called
coded communications over bursty channels (e.g., fading ‘‘spread’’) in the interleaved version. More precisely,
channels), or breaking up bursts errors due to an we want
inner code decoded using the Viterbi algorithm in a max{|i − j|, |π(i) − π(j)|} > s
concatenated coding scheme, they are seldom appropriate i,j
10−3
Reverse block
interleaver
Simulation of reverse block interleaver;
BER
10−5 N = 196
Random interleaver
Random interleaver
N = 1,000
10−6 N = 10,000
the most problematic weight 2 information sequences that The overall cost function is formed as the sum of these
cause the error floor are accommodated. We will give a elementary cost functions. Finally, since for the parallel
specific example in the following section and compare it concatenation, both the actual information sequence and
with other interleaver design methods. its interleaved version are inputs to convolutional codes,
We note that block √interleavers can be )used to achieve the cost function is defined for the inverse interleaver
a good spread, up to N as opposed to N/2, which is as well. The sum of the two are considered for the
achieved with the s-random interleavers. However, with overall interleaver design. Extensions of these cost
block interleaving, the higher-weight sequences become functions for designing interleavers for other channels
very problematic and the overall interleaver does not are straightforward by considering the new appropriate
perform very well. pairwise probability of error expressions.
In addition to the weight 2 input sequences, we Now that a cost function is defined, and a method
can consider larger input weight sequences to improve of increasing the interleaver size by one is described,
the interleaver structure. For instance, Fragouli and the details of the interleaver growth algorithm can
Wesel [12] consider the multiple error events and be presented. We start with a small-size interleaver
ensure that the interleavers that map the problematic optimized with respect to the cost function defined by an
information sequences with multiple error events to exhaustive search over all interleavers. Then, we extend
another problematic sequence are avoided. It is shown this interleaver by appending the ‘‘best’’ prefix (with
that such interleavers, when concatenated with a respect to the cost function defined) to the transposition
higher-order modulation scheme, perform well compared vector that describes the original permutation. Since
to the s-random interleaver. However, there is no there are only a limited number of possible prefixes to
significant improvement when they are used for binary consider, this step is simple to implement. We repeat this
communications over an AWGN channel. process until an interleaver of a desired size is obtained.
Another approach [13] is to consider ‘‘bad’’ [i.e., divisible We simply are looking for a ‘‘greedy’’ solution to find
by G(D)] information sequences of higher weights, and a good interleaver. Clearly, the process is not optimal;
make sure that the interleaver does not map such however, experiments show that it results in very good
sequences to another similar sequence. Simulations show interleavers. We also note that the overall algorithm has
that such interleavers perform slightly better than do the only polynomial complexity.
s-random interleavers. In Fig. 9, we present the performance of the Turbo
‘‘Swap’’ interleavers are considered in another code with 57 component codes with three different
study [14], where the interleaver is initialized to a block interleavers of size N = 160, that is, with the interleaver
interleaver, and two randomly selected positions are designed using the algorithm in Ref. 15, the reverse block
swapped. If the swapping results in violation of the spread interleaver and the s-random interleaver. We observe that
constraint, the modification is rejected, and a new pair the interleaver designed using the iterative interleaver
of positions are selected. After a number of iterations the growth algorithm performs significantly better than the
interleaver is finalized. With swap interleavers, a slight others. In particular, it is ∼0.3 dB better than the
performance improvement over the s-random interleaver reverse block interleaver and ∼0.5 dB better than the
is observed (in the order of 0.05 dB). s-random interleaver at a bit error rate of 10−6 . Also
shown in the figure is the ML (maximum-likelihood)
4.1.4. Iterative Interleaver Growth Algorithms. Danesh- decoding bound for a uniform interleaver (i.e., the average
garan and Mondin [15,16] have proposed and studied performance of all the possible interleavers). We observe
the idea of iterative interleaver growth algorithms in that the interleavers designed outperform the average
a systematic way. The main idea is to argue that we significantly, particularly, in the error floor region.
should be able to construct good interleavers of size
n + 1 from good interleavers of size n. They exploit the 4.1.5. Other Deterministic Interleavers. Several other
algebraic structure of interleavers and represent any deterministic interleaver design algorithms are proposed
interleaver with an equivalent ‘‘transposition vector’’ of in the literature with varying levels of success [e.g.,
size N that defines the permutation precisely. They prove 17–20].
what is called the ‘‘prefix symbol substitution property,’’
4.2. Interleaver Design Based on Iterative Decoder
which intuitively states that if a new transposition
Performance
vector is obtained by appending an index to an existing
transposition vector, the new vector defines a permutation The previous interleaver design algorithms described
of one larger size, and it is very closely related to the old improve the distance spectrum of the turbo code by
one. For details, see Ref. 15. avoiding or reducing the number of problematic low-
They then define a cost function that is closely input-weight error sequences. They consider the maximum
related to the bit error rate that is the ultimate likelihood decoding performance and inherently assume
performance measure. Problematic error sequences are that the performance of the suboptimal iterative decoding
identified using the component code structure, and employed will be close to the optimal decoder. Another
elementary cost functions for each error pattern are approach to interleaver design is to consider the
defined. The elementary cost functions are directly related performance of the iterative decoder as proposed by
to the pairwise error probability that corresponds to Hokfelt et al. [21], and design interleavers according to
the particular error pattern over an AWGN channel. their suitability for iterative decoding.
INTERLEAVERS FOR SERIAL AND PARALLEL CONCATENATED (TURBO) CODES 1149
−1
−2
−3
−4
log10(Pb)
−5
−6
Figure 9. Performance of the Daneshgaran–
Mondin (DM) interleaver (simulated points
−7
shown by ‘‘o’’), the s-random (SR) interleaver
(simulated points shown by ‘‘*’’), and the reverse
−8 block (RB) interleaver with N = 169 (simulated
points shown by ‘‘x’’). The asymptotic BER
curves are shown by dashed lines; at 5 dB,
−9
the uppermost curve is for SR, the middle one
is for RB, and the lowermost one is for DM.
−10 The union bound for the uniform interleaver is
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
shown by the solid line. (From Daneshgaran and
SNR in dB Mondin [15], 1999 IEEE.)
Consider the iterative decoding algorithm summarized The interleaver is designed as follows [21]. Starting with
in Section 2.3 (Fig. 6). Hokfelt et al. [21] showed that i = 1, the ith entry of the interleaver is selected as
the extrinsic information produced at the output of the
component decoders correlates with the systematic input. π(i) = argmin e−c(|π(m)−j|+|i−m|)
Let us denote the jth systematic information input to the j m
lth component decoder with xj,l , j = 1, 2, . . . , N, l = 1, 2.
The extrinsic information produced by the lth decoder where the summation is performed over predefined
about the ith bit is denoted by Lle,i . interleaver elements, and the minimization is over all
Empirical results show that after the first decoding permissible positions to choose from.
step (first half of the iteration), the correlation coefficient We note that this interleaver design technique tries
between the ith extrinsic information produced by the to minimize the number of short cycle error events that
decoder and the jth systematic input can be approximated deteriorate the iterative decoder performance due to the
by [21] excessive correlation of the extrinsic information. We also
ae−c|i−j| if i = j note that the interleaver designed in this manner competes
ρLe,i ,xj =
(1)
0 if i = j very well with the s-random interleaver. In particular, the
iterative decoding algorithm converges faster, and BER
where i, j = 1, 2, . . . , N, and the subscript l denoting the performance improves in the order of 0.1 dB.
component decoder index is suppressed. The constants a Sadjadpour et al. [22] combined the two basic inter-
and c depend on the particular component code selected, leaver design approaches to improve the interleaver
and can be computed using simulations. structure; that is, both the distance spectrum of the
Similarly, after the second iteration step, these code and the correlation of the extrinsic information are
correlations can be approximated by considered. In addition to the iterative decoding suitabil-
ity criterion of Hokfelt et al. [21], Sadjadpour et al. [22]
1 −c|π(j)−i| 1 2 −c(|π −1 (m)−j|+|i−m|)
N
propose another related criterion that tries to ensure
ρL(2)e,i ,xj = ae + a e that the extrinsic information is uniformly correlated
2 2 m=1
m=i with the input, and that the ‘‘power’’ of the correla-
tion coefficients (sum of squares) is minimized. Then, the
where π −1 denotes the inverse of the interleaver. interleaver is designed in two steps: (1) an s-random inter-
The nonzero correlation between the systematic input leaver is selected and then (2) all the low-weight error
and the extrinsic information to the component decoders sequences that result in a lower weight than a predeter-
deteriorates the performance of the iterative decoder. mined value — typically, problematic weight 2 information
With this interleaver design approach, the objective of sequences — are considered. On the basis of these prob-
the interleaver is to make the nearby extrinsic inputs as lematic sequences, the interleaver is updated by using
uncorrelated to each other as possible, or to ensure that the the iterative decoding suitability criterion. This oper-
extrinsic information at the output of the second decoder ation is continued until all the problematic sequences
is uniformly correlated with the systematic information. are considered. Finally, step 2 is repeated with the new
1150 INTERLEAVERS FOR SERIAL AND PARALLEL CONCATENATED (TURBO) CODES
interleaver until there are no low-weight error patterns and therefore is easy to implement. The authors present
with a minimum distance less than the preselected value. an example design and show that much larger mini-
Clearly, this procedure may not converge if the desired mum distances can be obtained compared to codes that
minimum distance of the overall code is selected to be employ random or s-random interleaving. Therefore, the
too large. error floor of the code designed in this manner is much
Examples presented by Sadjadpour et al. [22] indicate smaller.
that this two-step s-random interleaver design technique Another natural approach is to design the interleaver
has a better performance than that of the s-random based on the suboptimal iterative decoding algorithm. In
interleaver, typically in the error floor region. particular, as in the case of parallel concatenation, one can
select the interleaver to make sure that the short cycles
5. INTERLEAVER DESIGN FOR SERIALLY are avoided. However, at this point, this approach has not
CONCATENATED CONVOLUTIONAL CODES been explored.
The roots of cryptographic research can be traced decrypts the message digest using his/her private key
to William F. Friedman’s report Index of Coincidence and then computes the hash value of the message. If
and Its Applications [3], and Edward H. Hebern’s rotor the computed hash value is the same as the decrypted
machine [4] in 1918. In 1948 Claude Shannon presented one, the message integrity is considered to be preserved
his work on the communication theory of secrecy systems during transmission. However, this is not an absolute
in the Bell System Technical Journal [5]. In early 1970s guarantee since a hash collision is possible (i.e., a modified
work by Horst Feistel from IBM Watson Laboratory led or fabricated message may have the same hash value
the first U.S. Data Encryption Standard (DES) [6]. In as the original. Furthermore, the hash function is public
DES both parties must share the same secret key of so that the attacker can intercept a message, modify it,
56 bits before communication begins. However, Michael and compute a new hash value for the modified message.
Weiner showed that exhaustive search can be used to find Thus, it would be a good idea to encrypt the message as
any DES key [7]. More recently, the National Institute well or use a message authentication code (MAC), which
of Standards and Technology (NIST) has selected the is a one-way function with a key.
Advanced Encryption Standard (AES), the successor to
the venerable DES. AES was invented by Joan Daemen 1.1.3. Nonreputiation. In order to prove the source of
and Vincent Rijmen [8]. a message, one-way functions called digital signatures can
be used in conjunction with public key cryptography. To
1.1.1. Confidentiality. Encryption is a function E that stamp a message with its digital signature, the sender
takes plaintext message M as input and produces the encrypts the message digest with its private key. The
encrypted ciphertext C of M: E(M) = C. Decryption is receiver first decrypts the message using the sender’s
the function for the reverse process: D(C) = M. Note that public key and then computes the message digest to
D(E(M)) = M. In modern cryptography, a key K is used compare it to the one that arrives with the message.
for encryption and decryption so that DK (EK (M)) = M.
Cryptographic algorithms, based on using a single key, 1.1.4. Authentication. To prevent an attacker from
(i.e., both encryption and decryption are done by the impersonating a legitimate party, digital certificates are
same key) are called symmetric ciphers and they have two used. At the beginning of a secure Internet session, sender
drawbacks: (1) an arrangement must be made to ensure transmits its digital certificate to have his/her identity
that two parties have the same key prior to communicate to be verified. Digital certificates may be issued in a
with each other and (2) the number of keys required for hierarchical way for distributed administration. A digital
a complete communication mash for an n party network certificate may follow the ITU standard X.509 [11,12] and
is O(n2 ). Although a trusted third party such as a key is issued by a certificate authority (CA) as a part of
distribution center (KDC) can be used to circumvent these public key infrastructure (PKI). The certificate contains
two problems, it requires that KDC must be available in the sender’s public key, the certificate serial number and
real-time to initiate a communication. validity period, and the sender’s and the CA’s domain
Public key cryptography proposed by Whitfield Diffie names. CA must ensure integrity of the issued certificate;
and Martin Hellman in 1975 [9] is based on asymmetric thus it may encrypt the hash value of it using its private
ciphers. In such systems, the encryption key K1 is different key and append it to the certificate.
from the decryption key K2 so that DK2 (EK1 (M)) = M.
The encryption key K1 is called the public key, and it is 1.2. Cryptography and Security
not secret. The second key K2 is called the private key, Although cryptography provides the building blocks, there
and it is kept confidential. The best known public key is much to consider for Internet security. First, the rapid
crypto system, proposed by Ronald Rivest, Adi Shamir, growth of the Internet increases its heterogeneity and
and Leonard Adleman (RSA) [10]. Although the public complexity. A communication channel between a pair
key ciphers reduce the number of keys to O(n), they also of users may pass through different network elements
suffer two problems: (1) key size is much larger than in running diverse protocols. Most of these protocols are
symmetric systems and (2) the encryption and decryption designed with performance considerations and carry
are much slower. These two issues become problematic in design and implementation holes from security point of
a bandwidth processing-constrained environments such view. For example, consider the Anonymous File Transfer
as wireless networks. Thus, the main use of public Protocol (FTP) [13], which provides one of the most
key systems is limited to distribution of symmetric important services in the Internet for distribution of
cipher keys. information. There are several problems with FTP and
its variants. For example, the ftpd deamon runs with
1.1.2. Integrity. Cryptographic solutions for integrity superuser privileges for password and login processing.
protection are based on one-way hash functions. A hash Thus, leaving a sensitive file such as the password file
function is a one-way function that is easy to compute in the anonymous FTP site will be a serious security
but significantly harder to compute in reverse (e.g., ax gap. Another example is the Transport Control Protocol
mod n). It takes a variable-length input and produces a (TCP) [14], which provides a connection between a pair
fixed-length hash value (also known as ‘‘message digest’’). of users. In TCP each connection is identified with
In general it works as follows. The sender computes the a 4-tuple: <source (local) host IP address, local port
hash value of the message, encrypts it using the receiver’s number, destination (remote) host IP address ID, remote
public key, and appends it to the message. The receiver port number>. Since the same 4-tuple can be reused,
INTERNET SECURITY 1153
the sequence numbers are used to detect the lingering 3. NETWORK-LAYER SECURITY
packets from the previous uses of the tuple. There is
a potential threat here since an attacker can ‘‘guess’’ The Internet is composed of many independent manage-
the initial sequence number and convince the remote ment domains called autonomous systems (ASes). Internet
host that it is communicating with a trusted host. This routing algorithms within an AS and among ASes are
is known as a sequence number attack [15]. A remedy different. Most important intra-AS routing (interior rout-
for this attack would be to hide the target host behind ing) and inter-AS (exterior routing) protocols are the Open
a dedicated gateway called a firewall to prevent direct Shortest Paths First (OSPF), and Border Gateway Pro-
connections [16]. tocol (BGP), respectively. However, the common theme
Also, the network software is hierarchical and layered. in these protocols is the exchange of routing informa-
At each layer a different protocol is in charge and interacts tion to converge in a stable routing state. However,
with the protocols in the adjacent layers. In the following because of the lack of scalable authenticity check, rout-
sections we will examine layer-specific security concerns. ing information exchanged between the peers is subject
to attacks. The attacker can eavesdrop, modify, and rein-
ject the exchanged messages. Most of these attacks can
2. LINK-LAYER SECURITY be addressed by deployment of public key infrastructures
(PKI) and certificates for authentication and validation
of messages. For example, Kent et al. [18] discuss how
Link-layer security issues in local-area networks (LANs) to secure BGP protocol using PKI with X.509 certifi-
and wide-area networks (WANs) are fundamentally cates, and IPsec protocol suite. The solution proposes to
different. In a WAN link-layer security requires that end use a new BGP path attribute to ensure the authen-
point of each link is secure and equipped with encryption ticity and integrity of BGP messages and validate the
devices. Although institutions such as the military have source of UPDATE messages. However, if a legitimate
been using link-layer encryption, it is not feasible in router is compromised, then such cryptographic mech-
the Internet. anisms cannot be sufficient and the security problem
In a LAN hosts share the same communication medium degenerates to the Byzantine agreement problem [19] in
that has (in general) a broadcast nature (i.e., transmission distributed computing.
from one node can be received by all others on the Next we discuss the IPsec protocols for securing IP-
same LAN). Thus eavesdropping is easy and, to ensure based intranets and then review the security issues in
confidentiality encryption, is required. For example, in a ATM networks.
LAN it is better to use the SSH protocol [17] instead of
Telnet to avoid compromising passwords. 3.1. IP Security:IPsec
The suite of IPsec protocols are designed to provide
2.1. Access Control security for Internet Protocol version 4 (IPv4) and version
6 (IPv6) [20]. IPsec offers access control, connectionless
Typically, a LAN connects hosts who are in the integrity, source authentication, and confidentiality. IPsec
same security or administrative domain. While allowing defines two headers that are placed after the IP header and
legitimate user accessing to a LAN from outside (via a before the header of layer 4 protocols (i.e., TCP or UDP).
dialup modem, or a DSL, or a cable modem), it is crucial These headers are used in two traffic security protocols: the
to prevent unauthorized access. Firewalls are dedicated authentication header (AH) and the encapsulating security
gateways used for access control and can be grouped payload (ESP). AH is recommended when confidentiality
into three classes [16]: packet filter gateways, circuit- is not required, while ESP provides optional encryption.
level gateways, and application-layer gateways. Packet They both ensure integrity and authentication using tools
filters operate by selectively dropping packets based on such as keyed hash functions. Both AH and ESP use a
source address, destination address, or port number. In simplex connection called a security association (SA). An
a firewall the security policies can be specified in a table SA is uniquely identified by a triple that contains a security
that contains the filtering rules to which the incoming or parameter index (SPI), a destination IP address, and an
outgoing packets are subject. For example, all outgoing identifier for the security protocol (i.e., AH or ESP). The
mail traffic can be permitted to pass through a firewall negotiation of security association between two entities
while Telnet requests from a list of hosts can be dropped. and exchange of keys can be done by using the Internet
Filtering rules must be managed carefully to prevent Key Exchange (IKE) protocol [22]. Conceptually, an SA
loopholes in a firewall. In case of multiple firewalls is a virtual tunnel based on encapsulation. Two types
managed by the same security domain, it is crucial to of SAs are defined in the standard: transport mode, and
eliminate inconsistencies between the rules. tunnel mode. The former is a security association between
Application- and circuit-level gateways are firewalls two hosts, while the latter is established between network
that can secure the usage of a particular application elements. Thus, in the transport mode the security protocol
by screening the commands. Logically it resides in the header comes right after the IP header and encapsulates
middle of a protocol exchange and ensures that only valid any higher-level protocols. In the tunnel mode there is an
commands are sent. For example, it may monitor a FTP outer header and an inner IP header. The outer header
session to ensure that only a specific file is accessed with specifies the next hop, while the inner header indicates
read-only permission. the final destination. The security protocol header in the
1154 INTERNET SECURITY
tunnel mode is inserted after the outer IP header and Internet browsers and servers by Netscape Corporation.
before the inner one, thus protecting the inner header. Protocols such as HTTP run over SSL to provide
There are successful attacks on IPsec in spite of secure connections. The Transport Layer Security (TLS)
the secure ciphers used by the protocol. For example, protocol [25] is expected to become the standard for secure
consider the cut-and-paste attack by Bellovin [21]. In this client/server applications over the Internet. TLS v.1.0
attack, an encrypted ciphertext from a packet carrying is based on SSL v.3.0 and considered to be SSL v.3.1.
sensitive (targeted) information is cut and pasted into the Both SSL and TLS provide encryption, authentication,
ciphertext of another packet. The objective is to trick the and integrity protection over a public network. They are
receiver to decrypt the modified ciphertext and reveal the composed of two subprotocols (layers): the Record Protocol
information. and the Handshake Protocol. The Record Protocol is at
the lowest layer and resides above a reliable transport
3.2. ATM Security layer protocol such as TCP. It provides encryption
Asynchronous transfer mode (ATM) technology is based using symmetric cryptography (e.g., DES), and message
on establishing switched or permanent virtual circuits integrity check using a keyed MAC. The Handshake
(SVCs, PVCs) to transmit fixed-size (53-byte) cells. There Protocol is used to agree on cryptographic algorithms,
are no standards for ATM security, and work is in progress to establish a set of keys to be used by the ciphers, and
at the ATM Forum [23]. Next we review some of the to authenticate the client. The Handshake Protocol starts
security threats inherent in the architecture. All the with a ClientHello message sent to a server by a client. This
cells carrying the same VPI/VCI (virtual path identifier, message contains a random number, version information,
virtual connection identifier, respectively) are carried encryption algorithms that the client supports, and a
on the same virtual channel. Thus, eavesdropping and session ID. The server sends back a ServerHello message
integrity violation attacks can be mounted to all the that also contains random data, session ID, and indicates
cells of a connection from a single point. In particular selected cipher. In addition, the server sends a Certificate
the cells carrying signaling information can be used to message that contains the server’s RSA public key in an
identify communicating parties. For example, capturing X.509 [11,12] certificate. The client verifies the server’s
CONNECT or CALL PROCEEDING messages during public key, generates a 48-byte random number called the
signaling will reveal the VPI/VCIs assigned by the network premaster key, encrypts it using server’s public key, and
to a particular connection. Flooding network with SET UP sends it to the server. The client also computes a master
requests can be used to achieve denial-of-service attacks. secret and uses the master key to derive at a symmetric
Management cells can be abused to disrupt or disconnect session key. The server performs similar operations to
legitimate connections. For example, by tampering with compute a master key and a symmetric key. After the
AIS/FERF cells, the attacker can cause a connection to keys are installed to the record layer, the handshake is
be terminated. completed. Although SSL and TLS are similar there are
also important differences between them. For example,
3.2.1. IP over ATM. ATM networks has been deployed in SSL each message is transmitted with a new socket
in high-speed backbone as the switching plane for IP traffic while in TLS multiple messages can be transmitted over
using IP over ATM protocols. The IP over ATM suite brings the same socket. Security of the SSL protocol is well
security concerns in ATM networks, many of which are examined and reported to be sound, although there are
similar to those in IP networks; however, their remedies some easy-to-fix problems [26]. For example, unprotected
are more difficult. For example, firewalls and packet data structures (e.g., server key exchange) can be exploited
filters used for access control in IP networks will require to perform cryptographic attacks (e.g., ciphersuite rollback
termination of ATM connection, inducing large delays
attack [26]).
and overhead. Authentication between ATMARP (ATM
Address Resolution Protocol) server and hosts is a must 4.1. Multicast Security
for preventing various threads, including spoofing, denial-
of-service, and man-in-the-middle attacks. For example, Multicasting is a group communication with single
it is possible to send spoofed IP packets over an ATM or multiple sources and multiple receivers. It has
connection, if the ATM address of an ATMARP server considerably more challenging security concerns than does
is known. The attacker can first establish a virtual a single source–destination communication:
connection to the server and then use the IP address
of the victim to spoof the packets on this connection. 1. Message authentication and confidentiality in a
Since the server will reply back to the attacker using multicast group requires efficient key management
the same connection the victim may not even know the protocols. Establishing a different key between each
attack. Similarly, the attacker can use the IP addresses of pair of multicast members is not scalable. Thus,
victims to register them with the ATMARP server. Since most of the solutions focus on a shared key [27–29].
each IP address can be used only once, the victims will be However, a single-key approach is not sufficient to
denied service. authenticate the identity of a sender since it is
shared by all the members. Signing each message
4. TRANSPORT-LAYER SECURITY using a public key scheme is a costly solution; thus
MAC-based solutions have been proposed [30].
The Secure Sockets Layer (SSL) protocol [24] was 2. The membership dynamics (i.e., joining and leaving
developed to ensure secure communication between the a multicast group) requires efficient key revocation
INTERNET SECURITY 1155
Kerberos is an authentication service that allows users WLANs use RF technology to receive and transmit data
and services to authenticate themselves to each other [31]. in a local-area network domain. In contrast with a wired
It is typically used when a user on a network requests a LAN, a WLAN offers mobility and flexibility due to lack of
network service, and the server needs to ensure that the any fixed topology. IEEE 802.11 is the most widely adapted
user is a legitimate one. For example, it enables users to log standard for WLANs and it operates in the 2.4–2.48-
in to remote computers over the network without exposing GHz band.
their passwords to network packet-sniffing programs. There are several security vulnerabilities of a WLAN
User authentication is based on a ‘‘ticket’’ issued by the due to its nature: (1) any node within the transmission
Kerberos key distribution center (KDC), which has two range of the source can eavesdrop easily, (2) unsuccessful
modules: authentication server (AS) and ticket-granting attempts to access to a WLAN may be interpreted as a
server (TGS). Both the user and the server are required to high bit error rate (BER) — this misinterpretation can be
have keys registered with the AS. The user’s key is derived used to conceal an intruder’s unauthorized access attack
from a password that is seen by only the local machine; to a WLAN, and (3) the transmission medium is ‘‘shared’’
the server key is selected randomly. The authentication among the users. Thus, intentional interference (called
between a user u and server S has the following steps: ‘‘jamming’’) can be produced in a WLAN for denial of
service attacks.
1. The user u sends a message to the AS specifying the The spread-spectrum transmission technology helps
server S. countermeasure some of these problems in the WLANs.
In spread spectrum, a signal is spread over the channel
2. The AS produces two copies of a key called the session
using two different techniques: (1) the frequency-hopping
key to be used between u and S. AS encrypts one of
spread spectrum (FHSS), and (2) the direct-sequence
the session keys and the identity S of the server
spread spectrum (DSSS). An attacker must know the
using the user’s key. Similarly, it encrypts the other
hopping pattern in FHSS or the codewords in DSSS to
session key and identity of the user with the server
tune into the right frequency for eavesdropping. (Ironically
key. It sends both of the encrypted messages, called
these parameters are made public in the IEEE 802.11
the ‘‘tickets’’ (say, m1 and m2 , respectively) to u.
standard.) Additional help comes from sophisticated
3. u can decrypt m1 with its own key, extracting the network interface cards (NICs) of IEEE 802.11b devices.
session key and the identity of the server S. However, These cards can be equipped with a unique public and
u cannot decrypt m2 instead it timestamps a new private key pair, in addition to their unique address, to
message m3 (called the authenticator), encrypts it prevent unauthorized access to a WLAN.
with the session key and sends both m2 and m3 to S. IEEE 802.11 standard provides a security capability
4. S decrypts m2 with its own key to obtain the session called wired equivalent privacy (WEP). In WEP there is
key and the identity of user u. It then decrypts m3 a secret 40-bit or a 128-bit key that is shared between
with the session key to extract the timestamp in a wireless node and an access point. Communication
order to authenticate the identity of the user u. between a wireless station and its access point can
be encrypted using the key and RSA’s RC4 encryption
Following Step 4, all the communication between u algorithm. RC4 is a stream cipher with a variable key size
and S will be done using the session key. However, in and uses an initialization vector (IV). IV is used to produce
order to avoid performing all the steps above for each different ciphertexts for identical plaintexts by initializing
request, the TGS module in KDC issues a special ticket the shift registers with random bits. IV does not need to
called the ticket-granting ticket (TGT). TGT behaves like a be secret but it should be unique for each transmission.
temporary password, with a lifetime of several hours only, However, IEEE 802.11 does not enforce the uniqueness of
and all other tickets are obtained using TGT. IV. Thus, one potential problem with the WEP is the reuse
1156 INTERNET SECURITY
of IV, which may be exploited for cryptanalyzing and for Turkey, and M.S. and Ph.D. degrees in Computer Science,
fabricating new messages [37]. both from Columbia University, in 1987 and 1994,
respectively. He was a Member of Technical Staff at
6.2. Wireless Transport Layer (WTLS) the Bell Laboratories in Murray Hill, New Jersey during
1998–2001. Before joining to the Bell Laboratories in 1998,
The Wireless Transport Layer Security (WTLS) [38] proto-
he served as an Assistant Professor at Lehigh University
col provides authentication, privacy and integrity for the
and NJIT. His current research interests include quality
Wireless Application Protocol (WAP) [39]. The WTLS is
of service in the IP networks, wireless networks, and
based on TLS v.1.0 and takes into account the character-
Internet security. He has served on the Technical Program
istics of wireless world (e.g., low bandwidth, limited pro-
Committee of leading IEEE conferences and workshops.
cessing and power capacity, and connectionless datagram
Dr. Yener is a member of the IEEE and serves on
service). WTLS supports a rich set of cryptographic algo-
the editorial boards of the Computer Networks Journal
rithms. Confidentiality is provided by using block ciphers
and the IEEE Network Magazine. He is a member
such as DES CBC, integrity is ensured by SHA-1 [41]
of IEEE.
and MD5 [40] MAC algorithms, and the authentication is
checked by RSA and Diffie–Hellman-based key exchange
algorithms. WTLS does not contain any serious security BIBLIOGRAPHY
problems to force an architectural change. Nevertheless
there are several weak points of the protocol: (1) the
1. B. Schneier, Applied Cryptography, 2nd ed., Wiley, New York,
computation of initialization vector (IV) is not a secret,
1996.
(2) some fields in the data structures used by the protocol
are not protected (one example is the sequence numbering, 2. D. R. Stinson, Cryptography: Theory and Practice (Discrete
which enables an attacker to generate replay attacks), and Mathematics and Its Applications), Chapman & Hall,
1995.
(3) the key size should be at least 56 bits since 40-bit keys
are not sufficient. 3. William F. Friedman, Index of Coincidence and Its Applica-
tions in Cryptography, Riverbank Publication 22, Riverbank
Labs., 1920, reprinted by Agean Park Press, 1976.
7. CONCLUSIONS
4. E. H. Hebern, Electronic coding machine, U.S. Patent
1,510,441,30.
Heterogeneity of the Internet requires a skillful integra-
tion of the cryptographic building blocks with protocols 5. C. E. Shannon, in N. J. A. Sloane and A. D. Wyner, eds.,
for ensuring end-to-end security. Thus, deployment of Collected Papers: Claude Elmwood Shannon, IEEE Press,
New York, 1993.
security in the Internet cannot be confined to a par-
ticular crypto algorithm or to a particular architecture. 6. H. Feistel, Cryptography and computer privacy, Sci. Am.
The limitations on the processing capability or band- 228(5): 15–23 (1973).
width forces sacrifices on the security (e.g., smaller 7. M. J. Weiner, Efficient DES key search, Proc. CRYPTO’93,
key sizes, IV reuse, CRC for integrity check). Some of 1993.
these problems can be addressed by efficient crypto- 8. J. Daemen and V. Rijmen, Rijndael Home Page (online)
graphic tools such as ECC, and some will disappear as http://www.esat.kueven.ac.be/rijmen/rijndael (March 26,
the technology improves. Many attacks exploit the way 2002).
protocols are designed and implemented, even if these 9. W. Diffie and M. E. Hellman, New directions in cryptography,
protocols may use very secure ciphers. Examples include IEEE Trans. Inform. Theory IT-22: 644–654 (1976).
unprotected fields in the data structures (e.g., SSL 3.0
10. R. Rivest, A. Shamir, and L. Adleman, A method for obtaining
server key exchange message) and lack of authentica-
digital signatures and public-key cryptosystems, Commun.
tion in ATMARP between server and client. Finally, ACM 21(2): 120–126 (1978).
the security problem in the Internet degenerates to the
11. ITU-T Recommendation X.509, Information Technology —
distributed consensus problem if network elements are
Open System Interconnection — The Directory: Authentication
compromised by the adversary. For example there is
Framework, 1997.
no easy way to check the ‘‘correctness’’ of a routing
exchange message if it is signed by a once-legitimate- 12. R. Housley, W. Ford, W. Polk, and D. Solo, Internet X.509
Public Key Infrastructure Certificate and CRL Profile, IETF
but-compromised router.
RFC 2459, 1999.
Thus the Internet will never be absolutely secure, and
creating a high cost–benefit tradeoff for the attacker, 13. J. Postel and J. Reynolds, File Transfer Protocol, IETF RFC
to reduce the incentive, will always remain a practical 959, 1985.
security measure. 14. J. Postel, Transmission Control Protocol, IETF RFC 791, 19.
15. S. M. Bellovin, Security Problems in the TCP/IP Proto-
BIOGRAPHY col Suite, ACM Comput. Commun. Rev. 19(2): 32–48
(1989).
Bulent Yener is an Associate Professor at the Computer 16. S. M. Bellovin and W. R. Cheswick, Firewalls and Internet
Science Department at Rensselaer Polytechnic Institute. Security, Addison-Wesley, New York, 1994.
Dr. Yener received B.S. and M.S. degrees in Industrial 17. T. Ylonen et al., SSH Protocol Architecture (online) draft-
Engineering from the Technical University of Istanbul, ietfsecsh-architecture-12.txt, 2002.
INTERSYMBOL INTERFERENCE IN DIGITAL COMMUNICATION SYSTEMS 1157
18. S. Kent, C. Lynn, and K. Seo, Secure Border Gateway INTERSYMBOL INTERFERENCE IN DIGITAL
Protocol (S-BGP), IEEE JSAC Network Security 18(34): COMMUNICATION SYSTEMS
582–592 (2000).
19. M. Raynal, Distributed Algorithms and Protocols, Wiley, New JOHN G. PROAKIS
York, 1988. Northeastern University
20. S. Kent and R. Atkinson, Security Architecture for the Internet Boston, Massachusetts
Protocol. IETF RFC 2401, 1998.
21. S. Bellovin, Problem areas for the IP security protocols, Proc. 1. INTRODUCTION
6th USENIX Security Symp. 1996, pp. 205–214.
22. D. Harkins and D. Carrel, The Internet Key Exchange, IETF Intersymbol interference arises in both wireline and wire-
RFC 2409, 1998. less communication systems when data are transmitted
at symbol rates approaching the Nyquist rate and the
23. ATM Forum. http://www.atmforum.org.
characteristics of the physical channels through which the
24. A. Frier, P. Karlton, and P. Koccher, The SSL3.0 Protocol Ver- data are transmitted are nonideal. Such interference can
sion 3.0, Netscape, 1996 (online) http://home.netscape.com/ severely limit the performance that can be achieved in dig-
eng/ssl3/ (March 26, 2002). ital data transmission, where performance is measured in
25. T. Dierks and C. Allen, The TLS Protocol Version 1.0, IETF terms of the data transmission rate and the resulting prob-
RFC 2246, 1999. ability of error in recovering the data from the received
26. D. Wagner and B. Schneier, Analysis of the SSL 3.0 channel corrupted signal.
protocol, Proc. 2nd USENIX Workshop on Electronic The degree to which one must be concerned with chan-
Commerce, USENIX Press, 1996, pp. 29–40 (online) nel impairments generally depends on the transmission
http://citeseer.nj.nec.com/article/wagner96analysi.html rate of data through the channel. If R is the transmis-
(March 26, 2002). sion bit rate and W is the available channel bandwidth,
27. H. Harney and C. Muckenhirn, Group Key Management intersymbol interference (ISI) caused by channel distor-
Protocol (GKMP) Specification, IETF RFC 2093, 1997. tion generally arises when R/W > 1. Since bandwidth is
usually a precious commodity in communication systems,
28. H. Harney and C. Muckenhirn, Group Key Management
Protocol (GKMP) Architecture, IETF RFC 2094, 1997.
it is desirable to utilize the channel to as near its capac-
ity as possible. In such cases, the communication system
29. S. Mittra, Iolus: A framework for scalable secure multicast- designer must employ techniques that mitigate the effects
ing, Proc. ACM SIGCOMM’97, 1997.
of ISI caused by the channel.
30. R. Canetti et al., Multicast security: A taxonomy and efficient This article provides a characterization of ISI resulting
constructions, Proc. IEEE INFOCOM’99, 1999. from channel distortion.
31. J. Kohl and C. Neuman, The Kerberos Network Authentica-
tion Service (V5), IETF RFC 1510, 1993. 2. CHANNEL DISTORTION
32. A. Fiat and M. Naor, Broadcast encryption, Advances in
Cryptography — Crypto’92, 1995, Vol. 8, pp. 189–200. Wireline channels may be characterized as linear
33. D. M. Wallner, E. J. Harder, and R. C. Agee, Key Manage- time-invariant filters with specified frequency-response
ment for Multicast: Issues and Architectures, IETF RFC 2627, characteristics. If a channel is band-limited to W Hz, then
1999. its frequency response C(f ) = 0 for |f | > W. Within the
34. C. K. Wong and S. Lam, Digital signature for flows and
bandwidth of the channel, the frequency response C(f )
multicasts, Proc. IEEE ICNP’98’, 1998. may be expressed as
35. N. Koblitz, Elliptic curve cryptosystems, Math. Comput. 48: C(f ) = |C(f )|e jθ (f ) (1)
203–209 (1987).
36. Certicom White Paper, Current Public-Key Cryptographic where |C(f )| is the amplitude-response characteristic and
Systems, 1997 (online) http://www.certicom.com (March 26, θ (f ) is the phase-response characteristic. Furthermore, the
2002). envelope delay characteristic is defined as
37. N. Borisov, I. Goldberg, and D. Wagner, Intercepting mobile
communications: The insecurity of 802.11, Proc. Mobile 1 dθ (f )
τ (f ) = − (2)
Computing and Networking, 2001. 2π df
38. WAP Forum, Wireless Application Protocol — Wireless
A channel is said to be nondistorting or ideal if the
Transport Layer Security Specification version 1 (online)
http://www.wapforum.org (March 26, 2002).
amplitude response |C(f )| is constant for all |f | ≤ W and
θ (f ) is a linear function of frequency, that is, if τ (f ) is a
39. WAP Forum, Wireless Application Protocol (online) http: constant for all |f | ≤ W. On the other hand, if |C(f )| is not
//www.wapforum.org (March 26, 2002).
constant for all |f | ≤ W, we say that the channel distorts
40. R. Rivest, The MD5 Message-Digest Algorithm, IETF RFC the transmitted signal in amplitude, and, if τ (f ) is not
1321, 1992. constant for all |f | ≤ W, we say that the channel distorts
41. Federal Information Processing Standard Publication 180-1, the signal in delay.
1995 (online) http://www.itl.nist.gov/fibspubs/fip180-1.htm As a result of the amplitude and delay distortion
(March 26, 2002). caused by the nonideal channel frequency-response
1158 INTERSYMBOL INTERFERENCE IN DIGITAL COMMUNICATION SYSTEMS
Amplitude
W are smeared to the point that they are no longer 0.75 3
distinguishable as well-defined pulses at the receiving
0.5 2
terminal. Instead, they overlap, and, thus, we have
0.25 1
intersymbol interference. As an example of the effect of
delay distortion on a transmitted pulse, Fig. 1a illustrates 0 1000 2000 3000 0 1000 2000 3000
a band-limited pulse having zeros periodically spaced in Frequency (Hz) Frequency (Hz)
time at points labeled ±T, ±2T, and so on. If information
Figure 2. Average amplitude and delay characteristics of
is conveyed by the pulse amplitude, as in pulse amplitude
medium-range telephone channel.
modulation (PAM), for example, then one can transmit
a sequence of pulses each of which has a peak at the
periodic zeros of the other pulses. However, transmission
0.5
of the pulse through a channel modeled as having a linear
envelope delay characteristic τ (f ) [quadratic phase θ (f )] 0.4
results in the received pulse shown in Fig. 1b having 0.3
zero crossing that are no longer periodically spaced.
0.2
Consequently, a sequence of successive pulses would be
Amplitude
smeared into one another and the peaks of the pulses 0.1
would no longer be distinguishable. Thus, the channel 0
5 10
delay distortion results in intersymbol interference. −0.1
Another view of channel distortion is obtained by Time (ms)
considering the impulse response of a channel with −0.2
nonideal frequency response characteristics. For example, −0.3
Fig. 2 illustrates the average amplitude response |C(f )| −0.4
and the average envelope delay τ (f ) for a medium-
−0.5
range (180–725-mi) telephone channel of the switched
telecommunications network. It is observed that the Figure 3. Impulse response of medium-range telephone channel
usable band of the channel extends from ∼300 to with frequency response shown in Fig. 2.
∼3000 Hz. The corresponding impulse response of this
average channel is shown in Fig. 3. Its duration is about
10 ms. In comparison, the transmitted symbol rates on such a channel may be of the order of 2500 pulses
or symbols per second. Hence, intersymbol interference
might extend over 20–30 symbols.
Besides wireline channels, there are other physical
(a) channels that exhibit some form of time dispersion and,
thus, introduce intersymbol interference. Radio channels
such as shortwave ionospheric channels (HF), tropospheric
scatter channels, and mobile cellular radio channels are
examples of time-dispersive channels. In these channels,
time dispersion and, hence, intersymbol interference are
the result of multiple propagation paths with different
−5T −4T −3T −2T −T 0 T 2T 3T 4T 5T path delays. The number of paths and the relative time
delays among the paths vary with time, and, for this
reason, these radio channels are usually called time-
(b) variant multipath channels. The time-variant multipath
conditions give rise to a wide variety of frequency-
response characteristics and the resulting phenomenon
called signal fading.
In the following section, a mathematical model for
the intersymbol interference (ISI) is developed and
transmitted signal characteristics are described for
−5T −4T −3T −2T −T 0 T 2T 3T 4T 5T avoiding ISI.
3. CHARACTERIZATION OF INTERSYMBOL
INTERFERENCE
lowpass signal (prior to frequency conversion1 for trans- Now, if y(t) is sampled at times t = kT + τ0 , k = 0, 1, . . .,
mission over the bandpass channel) of the form we have
∞
∞
or, equivalently
where {In } represents the discrete information-bearing
sequence of symbols and g(t) is a modulation fil-
∞
ter pulse that, for the purposes of this discus- yk = In xk−n + vk , k = 0, 1, . . . (8)
sion, is assumed to have a band-limited frequency- n=0
response characteristic G(f ), specifically, G(f ) = 0 for
|f | > W. For PAM, the information-bearing sequence where τ0 is the transmission delay through the channel.
{In } consists of symbols taken from the alphabet The sample values can be expressed as
{±1, ±3, . . . , +(M − 1)} for M-level amplitude modula-
tion. In the case of PSK, the information-bearing ∞
1
sequence
{In } consists of symbols taken from * the alpha- y k = x 0 Ik + In xk−n + vk , k = 0, 1, . . . (9)
2π x0 n=0
bet e , θm =
jθm
m, m = 0, 1, . . . , M − 1 . QAM may be n=k
M
considered as a combined form of digital amplitude and We regard x0 as an arbitrary scale factor, which can be set
phase modulation, so that the sequence {In } takes values equal to unity for convenience. Then
of the form{Am e jθm , m = 0, 1, . . . , M − 1}.
The signal given by Eq. (3) is transmitted over a channel
∞
having a frequency response C(f ), also limited to |f | < W. y k = Ik + In xk−n + vk (10)
Consequently, the received signal can be represented as n=0
n=k
∞
The term Ik represents the desired information symbol at
r(t) = In h(t − nT) + z(t) (4)
the kth sampling instant, the term
n=0
∞
where ∞ In xk−n (11)
h(t) = g(τ )c(t − τ ) dτ (5) n=0
n=k
−∞
∞
y k = Ik + In xk−n + vk (13)
n=0
n=k
Optimum Nyquist [1] formulated and solved this problem in the late
sampling 1920s. He showed that a necessary and sufficient condition
time for zero ISI, given by Eq. (14) is that the Fourier transform
Sensitivity Distortion of
to timing zero crossings X(f ) of the signal pulse x(t) satisfy the condition
error
∞
X(f + m)T) = T (15)
m=−∞
the output of the receiver filter at the demodulator results to a series of ISI components that converges to a finite
in an infinite series of ISI components. Such a series is not value.
absolutely summable because of the 1/t rate of decay of Because of the smooth characteristics of the raised-
the pulse, and, hence, the sum of the resulting ISI does not cosine spectrum, it is possible to design practical filters
converge. Consequently, the signal pulse given by Eq. (17) for the transmitter and the receiver that approximate the
does not provide a practical solution to the signal design overall desired frequency response. In the special case
problem. where the channel is ideal, C(f ) = 1, |f | ≤ W, we have
By reducing the symbol rate 1/T to be slower than the
Nyquist rate, 1/T < 2W, there exists numerous choices for Xrc (f ) = GT (f )GR (f ) (20)
X(f ) that satisfy Eq. (17). A particular pulse spectrum that
has desirable spectral properties and has been widely used where GT (f )and GR (f ) are the frequency responses of the
in practice is the raised-cosine spectrum. The raised-cosine filters at the transmitter and the receiver, respectively. In
frequency characteristic is given as this case, if the receiver filter is matched to the transmitter
filter, we have Xrc (f ) = GT (f )GR (f ) = |GT (f )|2 . Ideally
0 ≤ |f | ≤
1−β )
T
GT (f ) = |Xrc (f )| e−j2π ft0 (21)
2T
T *
πT 1−β 1−β 1−β
Xrc (f ) = 1 + cos |f | − ≤ |f | ≤ and GR (f ) = G∗T (f ), where t0 is some nominal delay that is
2 β 2T 2T 2T
required to ensure physical realizability of the filter. Thus,
1+β the overall raised-cosine spectral characteristic is split
0 |f | >
2T evenly between the transmitting filter and the receiving
(18)
filter. Note also that an additional delay is necessary to
where β is called the rolloff factor and takes values in the
ensure the physical realizability of the receiving filter.
range 0 ≤ β ≤ 1. The bandwidth occupied by the signal
beyond the Nyquist frequency 1/2T is called the excess
5. CHANNEL DISTORTION AND ADAPTIVE
bandwidth and is usually expressed as a percentage of the
EQUALIZATION
Nyquist frequency. For example, when β = 12 , the excess
bandwidth is 50% and when β = 1, the excess bandwidth is Modems that are used for transmitting data either on
100%. The pulse x(t), having the raised-cosine spectrum, is wireline or wireless channels are designed to deal with
channel distortion, which usually differs from channel
sin(π t/T) cos(πβt/T)
x(t) = to channel. For example, a modem that is designed for
π t/T 1 − 4β 2 t2 /T 2 data transmission on the switched telephone network
(19) encounters a different channel response every time a
cos(πβt/T) telephone number is dialed, even if it is the same telephone
= sinc(π t/T)
1 − 4β 2 t2 /T 2 number. This is due to the fact that the route (circuit) from
the calling modem to the called modem will vary from one
Note that x(t) is normalized so that x(0) = 1. Figure 7 telephone call to another.
illustrates the raised-cosine spectral characteristics and Since the channel distortion is variable and unknown a
the corresponding pulses for β = 0, 21 , and 1. Note that priori, a modem contains an additional component, called
for β = 0, the pulse reduces to x(t) = sinc(π t/T), and the an adaptive equalizer, that follows the receiving filter
symbol rate 1/T = 2W. When β = 1, the symbol rate is GR (f ), which further processes the received signal samples
1/T = W. In general, the tails of x(t) decay as 1/t3 for {yk } to reduce the ISI. The most commonly used equalizer
β > 0. Consequently, a mistiming error in sampling leads is a linear, discrete-time, finite-impulse-response duration
(FIR) filter with coefficients that are adjusted to reduce the
ISI. With {yk } as its input, given by Eq. (10) and coefficients
(a) x(t )
{bk }, the equalizer output sequence is an estimate of the
b = 0; 0.5 transmitted symbols {Ik }
N
b=1 Îk = bm yk−m (22)
m=−N
b = 1 2T
0 T 3T 4T where 2N + 1 is the number of equalizer coefficients.
b=0 b = 0.5 When a call is initiated, a sequence {Ik } of training
symbols (known to the receiver) are transmitted. The
(b) x(f ) b=0 receiver compares the known training symbols with the
b = 0.5 sequence of estimates {Îk } from the equalizer and computes
the sequence of error signals
b=1
ek = Ik − Îk
−1 − 1 1 1 f
T 2T 2T T
N (23)
= Ik − bm yk−m
Figure 7. Signal Pulses with a Raised-Cosine Spectrum. m=−N
1162 INTERSYMBOL INTERFERENCE IN DIGITAL COMMUNICATION SYSTEMS
The equalizer coefficients {bk } are the adjusted to minimize MIT in 1961, and the Ph.D. from Harvard University
the mean (average) squared value of the error sequence. in 1967. He is an Adjunct Professor at the University
Thus, the equalizer adapts to the channel characteristics of California at San Diego and a Professor Emeritus
by adjusting its coefficients to reduce ISI. at Northeastern University. He was a faculty member
The effect of the linear adaptive equalizer in com- at Northeastern University from 1969 through 1998
pensating for the channel distortion can also be viewed and held the following academic positions: Associate
in the frequency domain. The transmitter filter GT (f ), Professor of Electrical Engineering, 1969–1976; Professor
the channel frequency response C(f ), the receiver filter of Electrical Engineering, 1976–1998; Associate Dean of
GR (f ), and the equalizer B(f ) are basically liner filters the College of Engineering and Director of the Graduate
in cascade. Hence, their overall frequency response is School of Engineering, 1982–1984; Interim Dean of the
GT (f )C(f )GR (f )B(f ). If the cascade GT (f )GR (f ) is designed College of Engineering, 1992–1993; Chairman of the
for zero ISI as in Eq. (20), then by designing the equal- Department of Electrical and Computer Engineering,
izer frequency response B(f ) such that B(f ) = 1/C(f ), the 1984–1997. Prior to joining Northeastern University,
ISI is eliminated. In this case, the adaptive equalizer has he worked at GTE Laboratories and the MIT Lincoln
a frequency response that is the inverse of the channel Laboratory.
response. If the adaptive equalizer had an infinite num- His professional experience and interests are in the
ber of coefficients (infinite-duration impulse response), its general areas of digital communications and digital sig-
frequency response would be exactly equal to the inverse nal processing and more specifically, in adaptive filtering,
channel characteristic and, thus, the ISI would be com- adaptive communication systems and adaptive equaliza-
pletely eliminated. However, with a finite number of tion techniques, communication through fading multipath
coefficients, the equalizer response can only approximately channels, radar detection, signal parameter estimation,
equal the inverse channel response. Consequently, some communication systems modeling and simulation, opti-
residual ISI will always exist at the output of the equalizer. mization techniques, and statistical analysis. He is active
For more detailed treatments of adaptive equalization, the in research in the areas of digital communications and
interested reader may refer to Refs. 2–4. digital signal processing and has taught undergraduate
and graduate courses in communications, circuit anal-
6. CONCLUDING REMARKS ysis, control systems, probability, stochastic processes,
discrete systems, and digital signal processing. He is the
author of the book Digital Communications (McGraw-
The treatment of ISI in this article was based on
the premise that the signal transmitted through the Hill, New York: 1983, first edition; 1989, second edition;
channel was carried on a single carrier frequency. The 1995, third edition; 2001, fourth edition), and co-author
effect of channel distortion and, hence, ISI can be of the books Introduction to Digital Signal Processing
significantly reduced if a channel with bandwidth W (Macmillan, New York: 1988, first edition; 1992, second
is subdivided into a large number of subchannels, say, edition; 1996, third edition), Digital Signal Processing
N, where each subchannel has a bandwidth B = W/N. Laboratory (Prentice-Hall, Englewood Cliffs, NJ, 1991);
Modulated carrier signals are then transmitted in all Advanced Digital Signal Processing (Macmillan, New
the N subchannels. In this manner, each information York, 1992), Algorithms for Statistical Signal Processing
symbol on each subchannel has a symbol duration of (Prentice-Hall, Englewood Cliffs, NJ, 2002), Discrete-Time
NT, where T is the symbol duration in a single carrier Processing of Speech Signals (Macmillan, New York, 1992,
modulated signal. The modulation of all the carriers in IEEE Press, New York, 2000), Communication Systems
the N subchannels is performed synchronously. This type Engineering (Prentice-Hall, Englewood Cliffs, NJ: 1994,
of multicarrier modulation is called orthogonal frequency- first edition; 2002, second edition), Digital Signal Process-
division multiplexing (OFDM). If the bandwidth B of each ing Using MATLAB V.4 (Brooks/Cole-Thomson Learning,
subchannel is made sufficiently small, each subchannel Boston, 1997, 2000), and Contemporary Communication
will have (approximately) an ideal frequency-response Systems Using MATLAB (Brooks/Cole-Thomson Learning,
characteristic. Hence, the ISI in each subchannel becomes Boston, 1998, 2000). Dr. Proakis is a Fellow of the IEEE.
insignificant. However, OFDM is particularly vulnerable He holds five patents and has published over 150 papers.
to interference among the subcarriers (interchannel
interference) due to frequency offsets or Doppler frequency
spread effects, which are present when either the BIBLIOGRAPHY
transmitter and/or the receiver terminals are mobile.
ISI in digital communications systems is treated
numerous publications in the technical literature and 1. H. Nyquist, Certain topics in telegraph transmission theory,
various textbooks. The interested reader may consult AIEE Trans 47: 617–644 (1928).
Refs. 2–4 for more detailed treatments. 2. J. G. Proakis, Digital Communications, 4th ed., McGraw-Hill,
New York, 2001.
BIOGRAPHY 3. E. A. Lee and D. G. Messerschmitt, Digital Communications,
2nd ed., Kluwer, Boston, 1994.
Dr. John G. Proakis received the B.S.E.E. from the 4. R. W. Lucky, J. Salz, and E. J. Weldon, Jr., Principles of Data
University of Cincinnati in 1959, the M.S.E.E. from Communications, McGraw-Hall, New York, 1968.
INTRANETS AND EXTRANETS 1163
Users Everyone Members of the specific firm Group of closely related firms
Information Fragmented Proprietary Shared in closely trusted
held circles
Access Public Private Semi-private
Security mechanism None Firewall, encryption Intelligent firewall,
encryption, various
document security
standards
1164 INTRANETS AND EXTRANETS
• Training programs or other educational material that software and data content resources will be administered
companies could develop and share for different levels of accessibility and security. This can be
achieved with the help of tools for network management.
2. INTRANET’S HARDWARE AND SOFTWARE 2.1. Extranets and Intergroupware
The intranet’s hardware consists of the client/server Workgroups may often need special tools for communica-
network. The clients are computers that are connected tions, collaboration, and coordination, which are referred
either to the LAN or to the WAN. The servers are high- as groupware [4]. Emphasis is on computer-aided help
performance computers with a large hard disk capacity to implement message-based human communications and
and RAM. information sharing as well as to support typical work-
The network operating system is the software required group tasks such as scheduling and routing of the work-
to run the network and share its resources (printers, flow tasks.
files, applications, etc.) among the users. There are In the typical enterprise communications the inter-
three basic software packages needed for a corporate and intraorganizational media-based communications
intranet: Web software, firewall software, and browser activities are forming two separate parallel planes [4]. The
software. Web software allows the server to support dimensions of each plane are the degree of structure and
HTTP (Hyper Text Transfer Protocol) so it can exchange the degree of mutuality in the communications activities.
information with the clients. Firewall software provides Structure lies between informal (ad hoc) and formally
the security needed to protect corporate information from structured, defined, and managed or edited processes.
the outside world. Browser software allows the user to Mutuality ranges from unidirectional or sequential
read electronic documents published on the Internet or back-and-forth message passing, to true joint work or
corporate intranet. collaborative transactions in a shared space of information.
Other functions can be provided by adding software for The resulting four regions characterize four different
Internet access and searching, authoring and publishing kinds of interaction, which place distinct demands on
documents, collaboration and conferencing, database their media vehicles or tools. Separate tools have been
archive and retrieval, and document management access. developed in each region, usually as isolated solutions, but
A properly constructed intranet allows for the following the need to apply them to the wide spectrum of electronic
to be done effectively [1]: centralized storage of data documents and in coordinated form is causing them to
(real-time and archived), scalability and management of converge. Thus, we can describe groupware in terms of the
application services, centralized control over access to following categories representing three of the mentioned
knowledge, decentralized management of knowledge, and four regions. These are [4]:
universal access to the information that decision makers
have deemed it appropriate to view. • Communications or messaging (notably E-mail)
These of established hardware and software tech- • Collaboration or conferencing (notably forums or
nologies for intranet and extranet infrastructure makes ‘‘bulletin board’’ systems that organize messages into
intranets and extranets very economical in comparison topical ‘‘threads’’ of group discussion, maintained in
with the creation and maintenance of proprietary net- a shared database)
works. The following main approaches are used for build- • Coordination or workflow and transactions (apply-
ing extranet software and inevitably are linked to the ing predefined rules to automatically process and
backing industrial groups or corporations: route messages)
• Crossware: Netscape, Oracle, and Sun Microsystems Figure 1 shows the distinct forms of electronic media
have formed an alliance to ensure that their extranet supported by the groupware, the kinds of interactions they
products can work together by standardizing on support, and how they are converging [4]. Intergroupware
JavaScript and the Common Object Request Broker is just groupware applied with the flexibility to support
Architecture (CORBA). multiple interacting groups, which may be open or closed,
• Microsoft supports the Point-to-Point Tunneling and which may share communications selectively, as
Protocol (PPTP) and is working with American appropriate (as in an extranet). It should be noted that
Express and other companies on an Open Buying Lotus Notes is one of the approaches to implement
on the Internet (OBI) standard. intergroupware, and its success is based on the recognition
• The Lotus Corporation is promoting its groupware of the fact that while these mentioned categories
product, Notes, as well suited for extranet use. have distinct characteristics, they can only be served
effectively by a unified platform that allows them to
interact seamlessly.
The first approach (Crossware) is oriented to development
of open application standards, while the other two are more
2.2. Open Application Standards for Creating Extranets
focused on a ‘‘one company’s product’’ type of philosophy.
Thus, intranets and extranets are rather classes of Broad use of the Internet technology in general and
applications than categories of networks. Applications in development of extranets in particular is now supported by
the public Internet, intranet, and extranet will all run the existence of the open application standards that offer
on the same type of network infrastructure, but their a range of features and functionality across all client and
INTRANETS AND EXTRANETS 1165
Web/ Transaction/
publishing workflow
Messaging/ Collaboration/
E-mail bulletin board
Transaction/ Structured/
Web/ managed
publishing workflow
Messaging/ Collaboration/
E-mail bulletin board
Ad-hoc
Figure 1. Relations between extranets
Communication/messaging Collaboration/ and intergroupware. (Source: Teleshuttle
(directional/sequential) transaction (joint) Corporation, (4), 1997).
and strong authentication capabilities throughout the EDI INT provides a set of recommendations and
extranet. Key benefits include: guidelines that combine existing EDI standards for
transmission of transaction data with the Internet protocol
• Users can search for contact information across suite. By using S/MIME and digital signatures, EDI
enterprises, partners, and customers using the transactions between enterprises can be exchanged in a
same interface and protocols as internal corporate secure and standard fashion.
directories. Thus, open standards provide the most flexible, effi-
• A standard format for storage and exchange of cient, and effective foundation for enterprise networking.
X.509 digital certificates allows single-user logon
applications and secure exchange of documents and 3. NETWORK SECURITY ISSUES
information via S/MIME.
• Replication over open LDAP protocol allows secure Exposing of the organization’s private data and net-
distribution of directory data between enterprises. working infrastructure to the Internet crackers definitely
• Enables extranet applications that rely on fast and increases concerns of the network administrators about
flexible queries of structure information. the security of their networks [5]. It has been very well
expressed that ‘‘security should not be a reason for
X.509 v3 digital certificates provide a secure container avoiding cyberspace, but any corporation that remains
of validated and digitally signed information. They offer amateurish about security is asking for trouble’’ [6].
strong authentication between parties, content, or devices Each user who needs external access has a unique set
on a network including secure servers, firewalls, email, of computing and data requirements, and the solution
and payment systems. They are a foundation for the secu- will not and should not be the same for all [7]. Six basic
rity in S/MIME, object signing, and Electronic Document extranet components can be identified as well as some
Interchange over the Internet (EDI INT). Digital certifi- specialized and hybrid solutions [7]: (1) access to the exter-
cates can be limited to operate within an intranet or they nal resources; (2) Internet protocol (IP) address filtering;
can operate between enterprises with public certificates (3) authentication servers; (4) application layer manage-
coissued by the company and a certification authority ment; (5) proxy servers; and (6) encryption services. Each
such as VeriSign. Certificates surpass passwords in pro- is sufficient to initiate business communications, but each
viding strong security by authenticating identity, verifying carries different performance, cost, and security. Namely,
message and content integrity, ensuring privacy, autho- security (as recent statistics shows [8]) is the main issue
rizing access, authorizing transactions, and supporting preventing organizations from establishing extranets.
nonrepudiation. The term security in general refers to techniques for
Key benefits include: ensuring that data stored in a computer cannot be read
or compromised. Obvious security measures involve data
• Digital certificates eliminate cumbersome login encryption and passwords. Data encryption by definition
and password dialog boxes when connecting to is the translation of data into a form that is unintelligible
secure resources. without a deciphering mechanism. A password, obviously,
is a secret word or phrase that gives a user access to a
• Each party can be confident of the other’s identity.
particular program or system. Thus, in the rest of this
• Digital certificates ensure that only the intended section we focus on the following issues: risk assessment;
recipient can read messages sent. development of the security policy; and establishment of
• Sophisticated access privileges and permissions can the authentication, authorization, and encryption.
be built in, creating precise levels of authority for
Internet transactions. 3.1. Risk Assessment
S/MIME message transmission uses certificate-based Risk assessment procedures should answer the following
authentication and encryption to transmit messages typical questions:
between users and applications. S/MIME enables the
exchange of confidential information without concerns • What are the organization’s most valuable intellec-
about inappropriate access. tual and network assets?
vCards provide a structured format for exchanging • Where do these assets reside?
personal contact information with other users and • What is the risk if they are subjected to unautho-
applications, eliminating the need to retype personal rized access?
information repeatedly. • How much damage could be done — can it be
JavaSoft’s signed objects allow trusted distribution and estimated in terms of money?
execution of software applications and applets as part of an
• Which protocols are involved?
extranet. With signed objects, tasks can be automated and
access to applications and services within the extended
3.2. Security Policy
network can be granted based on capability. A digital
certificate is used with a signed object to authenticate To provide the required level of protection, an organization
the identity of the publisher and grant appropriate access needs a security policy to prevent unauthorized users from
rights to the object. accessing resources on the private network and to protect
INTRANETS AND EXTRANETS 1167
against the unauthorized export of private information. desktop, workgroup, team, project, application, divi-
Even if an organization is not connected to the Internet, it sion, and enterprise level.
may still want to establish an internal security policy to 4. Writing the policy.
manage user access to portions of the network and protect 5. Development of a procedure for revisiting the policy
sensitive or secret information. According to the FBI, 80% as changes are made.
of all break-ins are internal.
6. Writing an implementation plan.
Policy is the allocation, revocation, and management of
permission as a network resource to define who gets access 7. Implementation of the policy.
to what [9]. Rules and policy should be set by business
managers, the chief information officer, and a security 3.3. Authentication, Authorization, Encryption
specialist — someone who understands policy writing and
When it comes to the security aspects of teleworking and
the impact of security decisions. Network managers can
remote access, there is a tension between the goals of
define policy for a given resource by creating an entry in
the participants in all aspects of information technology
access control lists, which are two-dimensional tables that
security. Users want access to information as quickly and
map users to resources.
as easily as possible, whereas information owners want to
A firewall is an implementation of access rules, which
make sure that users can access only the information they
are an articulation of the company’s security policy.
are allowed to access. Security professionals often find it
It is important to make sure that some particular
difficult to reduce this tension because of the demands of
firewall supports all the necessary protocols. If LANs
users in a rapidly changing business world. Smartcards
are segmented along departmental lines, firewalls can
may provide the solution to this problem [10].
be set up at the departmental level. However, multiple
Encryption requires introduction of the key man-
departments often share a LAN. In this case, the creation
agement/updating procedure. Encryption can be imple-
of virtual private network (VPN) for each person is
mented:
highly advisable.
The following are recognized as basic steps for
developing a security policy: • At the application: Examples of this are Pretty Good
Privacy (PGP) and Secure Multipurpose Internet
Mail Extensions (S/MIME), which provide encryption
1. Assessment of the types of risks to the data will
for email.
help to identify weak spots. After correction, regular
assessments will help to determine the ongoing • At the client or host network layer: The advantage of
security of the environment. this approach is that it will provide extra protection
for the hosts that will be in place even if there
2. Identification of the vulnerabilities in the system and
is no firewall or if it is compromised. The other
possible responses, including operating system vul-
advantage is that it allows distribution of the burden
nerabilities, vulnerabilities via clients and modems,
of processing the encryption among the individual
internal vulnerabilities, packet sniffing vulnerabili-
hosts involved. This can be done at the client with
ties, and means to test these vulnerabilities. Possible
products such as Netlock (see [11]), which provides
responses include encrypting data and authenticat-
encryption on multiple operating system platforms
ing users via passwords and biometrically.
at the IP level. A system can be set up so that it will
3. Analysis of the needs of user communities: accept only encrypted communications with certain
• Grouping data in categories and determining hosts. There are similar approaches from Netmanage
access needs. Access rights make the most sense and FTP Software.
on a project basis. • At the firewall network layer: The advantage to
• Determining the time of day, day of week, and this approach is that there is centralized control of
duration of access per individual are the most encryption, which can be set up based on IP address
typical procedures. or port filter. It can cause a processing burden on
Determination and assignment of the security levels the firewall, especially if many streams must be
can include the following, five levels: encrypted or decrypted. Many firewalls come with
• Level one for top-secret data such as pre-released a feature called virtual private network (VPN). VPNs
quarterly financials or a pharmaceutical firm’s allows encryption to take place as data leave the
product formula database firewall. It must be decrypted at a firewall on the
other end before it is sent to the receiving host.
• Level two for highly sensitive data such as the
inventory positions at a retailer • At the link level: The hardware in this case is
dedicated solely to the encryption process, thus off-
• Level three for data covered by nondisclosure
loading the burden from a firewall or router. The
agreements such as six month product plans
other advantage of this method is that the whole
• Level four for key internal documents such as a stream is encrypted, without even a clue as to the
letter from the CEO to the staff IP addresses of the devices communicating. This
• Level five for public domain information can be used only on a point-to-point link, as the IP
To implement this security hierarchy, it is recom- header would not be intact, which would be necessary
mended that firewalls be placed at the personal for routing.
1168 INTRANETS AND EXTRANETS
Products such as those manufactured by Cylink [12] It is a current trend to outsource such VPN applications
can encrypt data after they leave the firewall or router to a commercial service provider (e.g., regional ISP, top-
connected to a WAN link. tier network access provider, public carrier, or a service
Extranet routers combine the functions of a VPN server, provider specializing in managed security services). In
an encryption device, and a row address strobe [13]. such outsourced VPNs, the service provider is responsible
The benefits of using the extranet routers include the for VPN configuration (provisioning) and monitoring.
network’s ability to build secure VPNs and tunnel Service providers may locate their VPN devices at
corporate Internet protocol traffic across public networks customer premises or at the carrier’s POP.
like the Internet. Management is much easier than it is in a
multivendor, multidevice setup. Expenses are significantly 3.5. Use of Tunnels for Enabling Secure VPNs
lower because there is no need for leased lines.
Secure VPN applications are supported by secure,
network-to-network, host-to-network, or host-to-host tun-
3.4. Secure Virtual Private Networks nels — virtual point-to-point connections. VPN tunnels
The goal of all VPN products is to enable deployment of may offer three important security services:
logical networks, independent of physical topology [14].
That is the ‘‘virtual’’ part — allowing a geographically • Authentication to prove the identity of tunnel
distributed group of hosts to interact and be managed endpoints
as a single network, extending the user dynamics of LAN • Encryption to prevent eavesdropping or copying of
or workgroup without concern as to physical location. sensitive information transferred through the tunnel
Some interpret private simply as a ‘‘closed’’ user • Integrity checks to ensure that data are not changed
group — any virtual network with controlled access. This in transit
definition lets the term VPN fit a wide variety of
carrier services, from traditional frame relay and ATM Tunnels can exist at several protocol layers.
networks to emerging MPLS-based networks. Others Layer 2 tunnels carry point-to-point data link (PPP)
interpret private as ‘‘secure’’ virtual networks that provide connections between tunnel endpoints in remote access
confidentiality, message integrity, and authentication VPNs. In a compulsory mode, an ISP’s network access
among participating users and hosts. We focus on server intercepts a corporate user’s PPP connections and
secure VPNs. tunnels these to the corporate network. In a voluntary
VPN topologies try to satisfy one of the three mode, VPN tunnels extend all the way across the public
applications. network, from dial-up client to corporate network. Two
layer 2 tunneling protocols are commonly used:
1. Site-to-site connectivity. Private workgroups can
be provided with secure site-to-site connectivity, • The point-to-point tunnel protocol (PPTP) provides
even when LANs that comprise that workgroup authenticated, encrypted access from Windows desk-
are physically distributed throughout a corporate tops to Microsoft or third-party remote access servers.
network or campus. Intranet services can be offered • The IETF standard Layer 2 Tunneling Protocol
to entire LANs or to a select set of authorized hosts (L2TP) also provides authenticated tunneling, in
on several LANs, for example, allowing accounting compulsory and voluntary modes. However, L2TP
users to securely access a payroll server over network by itself does not provide message integrity or
segments that are not secured. In site-to-site VPNs, confidentiality. To do so, it must be combined
dedicated site-to-site WAN links are replaced by with IPsec.
shared network infrastructure — a service provider
network or the public Internet. Layer 3 tunnels provide IP-based virtual connections.
2. Remote access. Low-cost, ubiquitous, and secure In this approach, normal IP packets are routed between
remote access can be offered by using the public tunnel endpoints that are separated by any intervening
Internet to connect teleworkers and mobile users network topology. Tunneled packets are wrapped inside
to the corporate intranet, thus forming a Virtual IETF-defined headers that provide message integrity
Private Dial Network (VPDN). In remote access and confidentiality. These IP security (IPsec) protocol
VPNs, privately operated dial access servers are extensions, together with the Internet Key Exchange
again replaced by shared infrastructure — the dial (IKE), can be used with many authentication and
access servers at any convenient ISP POP. encryption algorithms (e.g., MD5, SHA1, DES, 3DES). In
3. Extranet or business-to-business services. Site-to- site-to-site VPNs, a security gateway — an IPsec-enabled
site and remote access topologies can also be router, firewall, or appliance — tunnels IP from one LAN
used to provide business partners and customers to another. In remote access VPNs, dial-up clients tunnel
with economical access to extranet or business- IP to security gateways, gaining access to the private
to-business (B2B) services. In extranet and B2B network behind the gateway.
VPNs, a shared infrastructure and the associated Companies with ‘‘email only’’ or ‘‘web only’’ security
soft provisioning of sites and users makes it possible requirements may consider other alternatives, such as
to respond more rapidly and cost-effectively to Secure Shell (SSH, or SecSH). SSH was originally
changing business relationships. developed as a secure replacement for UNIX ‘‘r’’ commands
INTRANETS AND EXTRANETS 1169
(rsh, rcp, and rlogin) and is often used for remote them. Additional information for various platforms can be
system administration. But SSH can actually forward obtained in Refs. 15–19.
any application protocol over an authenticated, encrypted
client–server connection. For example, SSH clients can 4. COST OF RUNNING WEBSITES
securely forward POP and SMTP to a mail server that
is running SSH. SSH can be an inexpensive method Evolutionary scale of Websites suggested by the Positive
of providing trusted users with secure remote access to Support Review Inc. of Santa Monica, California [20]
a single application, but it does require installing SSH includes the following:
client software.
A far more ubiquitous alternative is Netscape’s Secure 1. Promotional: A site focused on a particular product,
Sockets Layer (SSL). Because SSL is supported by every service, or company. Cost: $300,000 to $400,000
web browser today, it can be used to secure HTTP per year (17–20% on hardware and software,
without adding client software. SSL evolved into IETF- 5–10% on marketing, and the balance on content
standard Transport Layer Security (TLS), used to ‘‘add and servicing).
security’’ application protocols like POP, SMTP, IMAP,
2. Knowledge-based: A site that publishes information
and Telnet. Both SSL and TLS provide digital certificate
that is updated constantly. Cost: $1 to $1.5 million
authentication and confidentiality. In most cases, SSL
annually (20–22% on hardware and software,
clients (e.g., browsers) authenticate SSL servers (e.g.,
20–25% on marketing, and 55–60% on content
e-Commerce sites). This is sometimes followed by server-
and servicing).
to-client subauthentication (e.g., user login). SSL or
TLS can be a simple, inexpensive alternative for secure 3. Transaction-based: A site that allows surfers to shop,
remote access to a single application (e.g., secure extranet to receive customer services, or to process orders.
‘‘portals’’). Cost: $3 million per year (20–24% on hardware and
Table 2 shows a comparison of these alternatives. software, 30–35% on marketing, and 45–50% on
content and servicing).
Combining approaches is also possible and, to satisfy
some security policies, absolutely necessary. L2TP does
A similar classification by Zona Research Inc. of Redwood
not provide message integrity or confidentiality. Standard
City, California (cited in Ref. 21) divides Websites into:
IPsec does not provide user-level authentication. The
Windows 2000 VPN client layers L2TP on top of IPsec to
1. Static presence (‘‘Screaming and Yelling’’). According
satisfy both of these secure remote access requirements.
to Zona Research, the page cost for such sites is less
On the other hand, vanilla IPsec is more appropriate for
than $5,000. At present, the over whelming majority
site-to-site VPNs, and SSL is often the simplest secure
of Websites belong to this category.
extranet solution.
2. Interactive (‘‘Business Processes and Data Support’’),
with page costs ranging from $5,000 to $30,000.
3.6. Summary of the Weak Points and Security Hazards
Perhaps 15 to 20% of all current Websites are in
Table 2 provides a summary of the weak points in the this category.
system security, identifies and shows how these problems 3. Strategic (‘‘Large-Scale Commerce’’), with dynamic
can be addressed, and suggests technical solutions for pages that cost more than $30,000 each to produce
and maintain. Currently, less than 0.5% of all sales per hour. Hours of operation are 24 hours a day,
Websites are in this category. seven days a week, 365 days a year.
In actuality, the line managers of the site, not the
Thus, electronic commerce sites are not toys, and before IT staff, should calculate the costs of downtime. This
entering these WWWaters organization must develop a information often is not forthcoming, however. So, this
clear idea about current and strategic investments to this example can be presented to give a general sense of the
part of business. impact that downtime has on the company’s finances as
well as a guideline for estimation of the cost of outages.
4.1. ISDN Cost Model The cost of outages in the hypothetical network with
an availability rate of 99.9% is about $500,000 a year (see
The mobile workers should look at ISDN as an important
Table 3 to Table 5). If hardware and software necessary
technology for building extranet infrastructure because of
to do the job have already been bought, this estimate can
its ability to support flexible access. The ISDN cost model
be considered as a guideline for the additional budget to
should consider the following:
be spent on providing redundancy. This is separate and
apart from the funds required to provide a base level of
• Service fees from local telecom for each location network functionality. Thus, it really is not worth rushing
(installation + monthly + per minute) headlong into designing a fault-tolerant network unless
• Long-distance charges, if applicable all parties agree on all the implications that downtime has
• Cost of equipment (NT1 + TA, NT1 + bridges, NT1 on the operation.
+ routers, etc.)
• Cost of Internet Service Provider (ISP) services,
if applicable Table 3. Weak Points and Security Hazards
Weak Point/Hazard Technical Solution
Thus, planning a budget for a month, for example, will
include $30 a month + 3 cents per minute per channel. Operating system/ Research vulnerabilities; Monitor CERT
Therefore, it is $3.60 per hour for two B connections. If the applications on advisories; Work with vendors; Apply
user needs to be connected three hours per day, 20 days servers appropriate patches or remove
per month, it will cost $16 + $30 = $246 for a month of services/applications; Limit access to
services on host and firewall; Limit
service, just for the local telecom portion of your ISDN
complexity
connection. Long-distance fees and ISP charges naturally
would be added on the top. Viruses Include rules for importing files on disks and
from the Internet in security policy; Use
virus scanning on client, servers, and at
4.2. Mobile Connection Cost Model
Internet firewall
Mobile workers can consider wireless communications as Modems Restrict use; Provide secured alternatives
another option that can be highly cost-effective, but its when possible (such as a dial-out only
costs generally are higher than wireline communications. modem pool)
It should be mentioned that with packet data the modem Clients Unix: Same as server issues above;
occupies the radio channel only for the time it takes to Windows for Workgroups, Win95, NT: Filter
transmit that packet. In ordinary data networks, users TCP/UDP ports 137,138,139 at firewall;
are usually billed for the amount of data they send. In Be careful with shared services, use
contrast, for data communication over a wireless link, Microsoft’s Service Pack for Win95 to fix bugs
there is established a circuit connection and payment is Network snooping Use encryption; Isolate networks with switch
based on the duration of the call just as with an ordinary or router
voice call. The per-minute charges are usually the same. Network attacks Internet firewall; Internal firewall or router;
Wireless modems are complex electronic devices con- Simple router filters that do not have an
taining interface logic and circuitry, sophisticated radios, impact on performance
considerable central processing power, and digital signal Network spoofing Filter out at router or firewall
processing. As such, they cost more than landline modems.
Table 5. Impact to Production tools to help the staff find out who is doing what. These
Profit Per Transaction 0.5 tools include a project registry, a technology and a method-
ology library, employee profiles, and discussion areas for
Transactions per hour per employee 60 consultants. Building these communities creates groups
Missed transactions per hour 30,000 who can help each other out so as not to repeat mistakes
Total missed transactions 262,800
previously made or to prevent others from reinventing the
Impact of missed transactions $131,400
wheel.
Trend 7: New business opportunities and revenues. As
time continues and organizations continue to implement
Table 6. Opportunity Lost and enhance their intranet or extranet technologies, cap-
turing and sharing information will become more efficient.
Profit Per Sale 20
New opportunities can arise from this effectiveness, such
Sales per hour per employee 3 as learning that the organization had internal skills that
Missed sales per hour 1,500 were not obvious, leading it to extend its offerings.
Total missed sales 13,140 Trend 8: Enhancing learning environments. The
Impact of missed sales $262,600 intranet or extranet has entered the educational arena
as it has become feasible to do so with the vast numbers
of adolescents online. This will empower educators to
5. INTRANET AND EXTRANET TRENDS develop their own intranet or extranet pages and update
them accordingly. This will increase student–teacher
Mindbridge.com Inc. has been analyzing and describing interaction and develop a stronger and more productive
trends typical for intranets [1]. We suggest that these relationship. Distance learning, particularly in regard to
finding are also applicable to extranets. overseas students, will also be affected, as the boundaries
Trend 1: Shifting the focus around the customer. of the classroom walls will continue to fade.
Enhancing and simplifying customer interactions with
the supplier or dealer has become a major focus of many CONCLUSION
industries with the proliferation of the Internet. The
primary focus has been to increase the success of the Currently, the extranet is conceptualized as the key
customer, which in turn creates a greater sense of loyalty technology that can enable development of the third
and increases profits. wave of electronic commerce sites. Although technical
Trend 2: Automate routine tasks. No longer is it and cost advantages are of very high importance, the
necessary to fill out and file through request forms for daily real significance of the extranet is that it is the first
routine activities. Now employees can complete requests nonproprietary technical tool that can support rapid
for products such as office supplies when they log onto evolution of electronic commerce.
the intranet or extranet site and click on what is needed. It is already clear that the Internet impacted retail
This saves not only time but also considerable money by sales, the use of credit cards, and various digital cash
allowing the purchasing department to order in bulk, thus and payment settlement schemes. However, the experts
acquiring a rebate on such orders. predict that the real revolution will be in systems for
Trend 3: Delivering information where it is needed. global procurement of goods and services at the wholesale
No longer is location an issue. Organizations can deliver level and that the role of extranets is crucial for this.
information where it is needed by an upload to the network It is also expected that on a more fundamental level
server. Communications have become streamlined and the extranets will stimulate the business evolution of
efficient as virtual teams are now connected and able to conventional corporations into knowledge factories.
collaborate without concern to time or distance.
Trend 4: The acceptance of the intranet or extranet BIOGRAPHY
by top management. Now that the intranet or extranet
has had a chance to prove itself, top management and Algirdas Pakštas received the M.Sc. degree in radio-
many businesses are buying in. It has been identified as physics and electronics in 1980 from Irkutsk State Univer-
a critical part of the organization, and many have moved sity, Irkutsk, Russia; Ph.D. degree in system programming
to improve and expand their existing intranet or extranet from the Institute of Control Sciences in 1987; and DrTech
capabilities. degree from the Lithuanian Science Council in 1993,
Trend 5: More interesting features. Web construction respectively. He first joined the Siberian Institute of Solar-
kits, tools that help customers create their own homepages, Terrestrial Physics (SibIZMIR) in 1980. At SibIZMIR he
and tickers that track the status of major projects are just worked on the design and development of distributed data
a few of the new and interesting features being added acquisition and visualization systems. Since 1987, he has
to the intranet or extranet. This should continue as the been a senior research scientist and later head of the
intranet or extranet becomes an increasingly vital tool for department at the Institute of Mathematics and Informat-
doing business. ics, Lithuanian Academy of Sciences, where he has been
Trend 6: Knowledge management is growing. Knowl- working on software engineering for distributed computer
edge management is growing, which is helping company systems. From 1991 to 1993, he worked as research fel-
communications between parts and developing a set of low at the Norwegian University of Technology (NTH),
1172 IP TELEPHONY
Trondheim, and from 1994 to 1998 as professor and later 20. V. Junalaitis, The true cost of the Web, PC Week 18: 85 (Nov.
full professor at the Agder University College, Grimstad, 1996).
Norway. Currently, Pakstas is with London Metropoli- 21. E. Shein, Natural selection, PC Week 14: E2 (Oct. 1996).
tan University, London, England. He has published 2 22. B. Walsh, (1996). Fault-Tolerant networking. [Online]. CMP
research monographs, edited 1 book, and authored more Net. http://techweb.cmp.com/nc/netdesign/faultmgmt.html
than 130 publications. His areas of interest are communi- [1997, September 21].
cations software, multimedia communications, as well as
enterprise networking and applications.
IP TELEPHONY
BIBLIOGRAPHY
MATTHEW CHAPMAN CAESAR
1. Mindbridge, (2000). Intranet WhitePapers. [Online]. Mind- University of California at
bridge.com Inc. http://www.mindbridge.com/whitepaperinfo. Berkeley
htm [2001, August 2]. Berkeley, California
call centers, virtual phone lines, and real-time billing. Section 8 lists regulatory agencies and standardization
The increased capabilities of the endpoints allow for bodies guiding and accelerating the deployment of IP
better user interfaces than can be supported in the telephony solutions.
PSTN. Cable providers could deploy an interactive set-
top box to offer directory services and caller ID on the
2. ARCHITECTURE AND PROTOCOLS
television screen [30]. Unlike the PSTN, new services can
be deployed without making expensive, time-consuming A typical IP telephony architecture is shown in Fig. 1. The
modifications to the network. In addition, third parties key network elements include the terminal, the IP tele-
may deploy services on an IP telephony service provider’s phony gateway (ITG), the multipoint control unit (MCU),
network. and the gatekeeper. Terminals may be a specialized device
Several difficulties can arise in deploying IP telephony such as an IP phone, or a personal computer (PC) equipped
networks: with IP telephony software. Terminals form the endpoints
of an IP telephony call. The ITG provides a bridge between
1. Delay, loss, and jitter can decrease audio quality the IP network and the PSTN, which uses the Signaling
in IP networks. Audio quality can be improved System 7 (SS7) protocol to set up and tear down dedicated
by increasing network bandwidth, implementing circuits to service a call. A gatekeeper is used to support
QoS enhancements inside the network, applying admission control functionality, while the MCU enables
protocol enhancements such as forward error advanced services such as multiparty conference calls.
correction (FEC) to send redundant data, and using The two key standards for real-time voice and video con-
strategies to limit and load balance congestion in the ferencing are (1) H.323, which is developed by the Inter-
network. national Telecommunications Union’s Telecommunication
2. Some standards for IP telephony are complex, Standardization Sector (ITU-T) [14]; and (2) Session Ini-
partially developed, or vague. Many companies tiation Protocol (SIP), which is developed by the Internet
are resorting to proprietary protocols that limit Engineering Task Force (IETF) [15]. There is some over-
interoperability. Thirdly, in order to recover the lap between these two standards and combinations of
cost of deployment and recover their huge sunken the two may be used. Many vendors are moving to sup-
costs in PSTN equipment, providers need to port both these protocols to increase interoperability, and
implement competitive and economically feasible several application programming interfaces (APIs) have
pricing strategies. been developed to speed development of IP telephony sys-
3. It is not clear how IP telephony will be regulated. tems [3,27,28].
IP telephony traffic may be unregulated, tariffed, or
even outlawed by national governments [19]. 2.1. Functional Entities
2.1.1. H.323. H.323 operations require interactions
This article gives an overview of IP telephony. The between several different components, including termi-
architecture and the related protocols are discussed in nals, gateways, gatekeepers, and multipoint control units.
Section 2. Section 3 describes several common deployment
scenarios. Delivering good audio quality over best effort IP 2.1.1.1. Terminals. IP telephony terminals are able to
networks is a challenging task, and Section 4 highlights initiate and receive calls. They provide an interface for
the key issues. Section 5 discusses design issues of the a user to transmit bidirectional voice, and optionally
various entities in the IP telephony architecture. Section 7 video or data: (1) the terminal must use a compres-
gives an overview of security considerations, while sor/decompressor (codec) to encode and compress voice for
Cellular
telephone Telephone
STP
Fax machine
SSP
STP
transport across the network — the receiving host must gateway. Each gatekeeper has authority over a portion
decode and decompress the voice for playback to the user; of the H.323 network called a zone. It performs admissions
(2) terminals must perform signaling to negotiate codec control of incoming calls originating in the zone based on
type, data transfer speed, and perform call setup proce- H.323 guidelines and user defined policies. These policies
dures; and (3) the terminal may provide an interface to may control admission based on bandwidth usage, network
choose the remote host and services desired. The interface load, or other criteria. The gatekeeper is also responsible
may also display real-time feedback as to the connection for address translation from alias addresses such as
quality and price charged. an H.323 identifier, URL (Uniform Resource Locator), or
H.323 supports several codecs for voice and video in email address to an IP address. The gatekeeper maintains
the terminal, including G.711, G.723.1, and G.722. Codecs a table of these translations, which may be updated using
vary in bit rate, processing delay, frame length and size, the registration–admission–status (RAS) channel. The
memory and processor overhead, and resulting sound gateway also monitors the bandwidth and can restrict
quality [21]. usage to provide enhanced QoS to other applications on
H.323 networks can support a wide variety of end the network.
systems. A PC may be configured to act as an IP telephony Several enhancements may be made to gatekeepers to
end system by installing appropriate client software. Many improve utility:
of these clients may be downloaded free of charge, but
may require multimedia hardware such as a microphone,
1. It may act as a proxy to process call control
speakers, and possibly a videocamera. Specialized sound
signals rather than passing them directly between
cards are available that allow the use of a standard
endpoints.
telephone in place of a microphone. Performance may be
improved and software cost reduced by installing a card 2. Calls may be authorized and potentially rejected.
to perform voice encoding in hardware. PC-less solutions Policies may be implemented that reject a call based
are also available; specialized IP phones are manufactured on a user’s access rights, or to terminate calls if
that can make voice calls across a campus data network. higher priority ones come in.
Standard PSTN telephones may be equipped with an 3. The gatekeeper may send a billing system call details
adapter to support similar functionality. on call termination.
4. The gatekeeper may perform call routing to collect
2.1.1.2. Gateway. A gateway connects IP networks to more detailed call information, or to route the
other types of networks, such as the PSTN: (1) it uses a call to other terminals if the called terminal is
signaling module to translate signaling information such not available. The gatekeeper may perform load
as call setup requests from one network to another, (2) it balancing strategies across multiple gatekeepers to
uses a media gateway to perform translations between avoid congestion.
different data encoding types, and (3) it allows for real- 5. The gatekeeper may limit the bandwidth of a call to
time control of the data stream through the use of a media less than what was requested to decrease network
controller. The signaling module and the media controller congestion.
are sometimes collectively referred to as the signaling 6. The gatekeeper may keep a database of user profiles
gateway. Home users may purchase telephony cards to for directory services. Additional supplementary
transform their PC into a gateway. More commonly, services such as call park/pickup may also be
gateways are sold as specialized devices and deployed implemented with a gatekeeper.
in service provider networks.
For example, a gateway might bridge the PSTN
network to an H.323 network by translating PSTN 2.1.1.4. Multipoint Control Unit (MCU ). A multipoint
Signaling System 7 (SS7) into H.323 style signaling, and control unit (MCU) is used to manage conference calls
encoding PSTN voice. Endpoints communicate with the between 3 or more H.323 endpoints. It may be imple-
gateway using H.245 and Q.931. The PSTN user dials mented as a standalone unit, as part of the gateway, or as
a phone number to connect to the gateway. Address part of the terminal. It performs mixing, transcoding, and
translation may be performed by requesting a name or redistribution operations on a set of media streams.
extension from the user, and forming a connection to the
appropriate IP address. An IP user may call a PSTN user 2.1.2. SIP. The SIP architecture [15] is composed of
by requesting the gateway to dial the appropriate phone terminals, proxy servers, and redirect servers. Unlike
number. H.323, terminals are the only required entities. Terminals
Telecommunications companies (telcos) are often hes- have functionality similar to that of an H.323 terminal.
itant to expose their SS7 network because of its critical Servers are used to implement directory services. Redirect
nature. A preferred deployment method is to install the sig- servers inform the calling SIP terminal the location of the
naling gateway at the telco premises and allow it to control called party, and the terminal may then connect directly
media gateways in data service provider networks [7,21]. to that location. Proxy servers act as an intermediary for
the connection, and forward the call setup request to a
2.1.1.3. Gatekeeper. A gatekeeper is responsible for new location. SIP allows any codec registered with the
managing and administrating the initiated calls in the Internet Assigned Numbers Authority (IANA) to be used,
H.323 network. It is often implemented as part of a in addition to the ITU-T codecs supported in H.323.
IP TELEPHONY 1175
Audio Video
System control
apps apps
G.711
G.722 H.261
H.263 Call Call
G.729 RAS
etc. signaling control
etc. channel
channel channel
H.225
Q.931 H.245
RTP
RTCP
UDP TCP
IP
RTP session
Bye Bye Bye
200 OK 200 OK
200 OK
Figure 4. Sample SIP call establishment
and teardown.
to request an audio channel. This message contains the A sample SIP call through a redirect server is shown in
UDP ports it wishes to use for the RTP audio stream and Fig. 4. Suppose user A at T1 wants to contact user B at T2.
RTCP messages. T2 acknowledges this and attempts to T1 sends an ‘‘Invite’’ message to the SIP server running
set up a channel in the reverse direction to complete in redirect mode to indicate user B is being invited to
the bidirectional call, and RTP flows are established. participate in a session. This message usually contains a
Eventually, user B wishes to terminate the call, and causes session description written in SDP including the media
T2 to send a close logical channel, which is acknowledged types T1 is willing to receive. The redirect server contacts
by T1. T1 sends a similar message to T2, and both sides a location server to lookup user B’s current address, and
send disengage requests to the gatekeeper. returns a 302 ‘‘Moved temporarily’’ message containing the
location of user B. T1 then sends a second invite message
to T2. When T2 receives the message it sends back a 100
2.2.2. SIP. Like H.323, SIP is a standard for signaling
‘‘trying’’ message and tries to alert the user. When user
and control for IP telephony networks. SIP may be used
B picks up, T2 sends a ‘‘200 OK’’ message containing its
in conjunction with H.323. SIP is based on HTTP, and
capabilities. T1 responds with an acknowledgement (Ack)
provides call setup and termination, call control, and call
and RTP flows are established, completing the call. Either
data transfer. SIP is a client/server protocol, and can
side may send a ‘‘Bye’’ message to terminate the call, which
operate over reliable and unreliable transport protocols.
must be answered with a 200 OK.
SIP 2.0 contains client requests to invite a user to join
A call handled by a SIP proxy server is also shown
a session, to acknowledge a new connection, to request
in Fig. 4. All messages pass through the server, and the
the server’s capability information, to register a user’s
calling terminal exchanges connection setup messages as
location for directory services, to terminate a call, and
if it is connecting directly to the callee’s terminal. This
to terminate directory searches. SIP uses the Session
scheme can cause the SIP server to become a bottleneck,
Description Protocol (SDP) for session announcement and
limiting scalability of the IP telephony network. However,
invitation, and to choose the type of communication [43].
proxy servers allow for the accurate measurement of usage
There are several differences between SIP and H.323.
necessary for billing services. Furthermore, the use of a
First, H.323 uses a binary format to exchange information.
proxy server can enhance privacy by hiding each party’s
SIP is text-based, making parsers easier to build and
IP address and identification.
the protocol easier to debug. Although H.323’s format
reduces some bandwidth and processing overhead, this
overhead is minimal given that signaling and control 3. DEPLOYMENT SCENARIOS
happens infrequently during a call. SIP is considered
to be more adaptable, as H.323 is required to be fully In order to understand how an IP telephony system
backward compatible. SIP exchanges fewer messages on is constructed, it is useful to review several common
connection setup, and hence can more quickly establish scenarios for voice telephony. We will note how IP
connections. H.323 may use only ITU-T developed codecs, telephony can affect each of these service markets.
whereas SIP may use any codec registered with IANA.
Furthermore, SIP is considered to be more scalable, allows
3.1. Data Service Provider
the servers to be stateless, provides loop detection, and has
a smaller call setup delay. On the other hand, following Data service providers can provide telephony services to
the early publication of the standard, H.323 has been more increase revenue, and can provide voice with advanced
widely supported. Furthermore, the H.323 specification supplementary services and a regulatory price advantage
provides improved fault tolerance as well as more thorough over the PSTN carriers. Traditionally this has been done
coverage of data sharing, conferencing, and video. Recent by becoming a competitive local exchange carrier (CLEC)
revisions of H.323 are addressing some of its shortcomings. and installing PSTN services in rented space in a central
IP TELEPHONY 1177
office (CO). Today, data service providers that wish to IP telephony service. ITSPs have recently entered the
offer voice services can implement IP telephony solutions market and are facing similar issues.
on top of their existing data networks, eliminating the
need to purchase equipment for and train personnel to 3.2. Local Exchange Carrier (LEC)
run a separate PSTN network. The unreliability of IP
Local exchange carriers (LECs) provide local telephone
networks requires an investment in upgrading network
service. Phone lines run from the customer premises
element software and hardware to increase availability
to a CO. COs are connected in a local access and
and service quality. To succeed, data service providers
transport area (LATA) or to an interexchange carrier (IXC)
must convince the public that IP telephony is not a low-
for long-distance traffic. LECs already provide voice
quality service, reengineer their networks to give good
services to a large number of customers, giving them
service quality under high load, and implement reliable
a significant advantage over data service providers
account control and billing systems [6].
in the IP telephony market. LECs also have well-
The major players in the data service provider space
developed operations for marketing, billing, and complying
include companies that deploy data lines to customer
with federal regulations. Implementing an IP telephony
premises, and those that provide long-haul data transport
network can improve a LEC’s efficiency by eliminating
services. Many cable television (CATV) providers use their
congestion points in and decreasing costs associated with
hybrid fiber coaxial (HFC) networks for bidirectional data
their PSTN network.
transfer to subscribers. These networks consist of an
LECs face several hurdles in implementing an IP
optical node that converts optical signaling from the
telephony solution. LECs have a large investment in
control center transmitter to electrical signaling on coaxial
circuit switched voice services, and implementing VoIP
cable. Instead of deploying specialized HFC telephony
diverts revenue away from these services. Furthermore,
equipment, CATV providers are considering using Voice
in the United States, LECs are required by law to subsidize
over IP (VoIP) to decrease cost and attract customers
certain users, and hence have higher operating costs.
with a set of supplementary services [30]. Problems that Also the Regional Bell Operating Companies (RBOCs) are
need to be solved include updates to the data over cable forbidden by law from offering long distance services.
service interface specification (DOCSIS) to support voice This may interfere with their ability to deploy large-
requirements, and forming interconnection agreements scale IP telephony networks. Finally, LECs must build
to more effectively route the IP telephony calls. Also, a reliable, overprovisioned data network to handle the
IP telephony support will need to be deployed through stringent requirements of voice traffic.
DOCSIS-compliant products at the client and inside
the network. Because of the large investment in circuit
3.3. Interexchange Carrier (IXC)
switched products and the strict QoS requirements of
voice traffic, many CATV providers favor initially using IXCs are long distance carriers. They provide transport for
IP telephony for local calls. The revenue generated inter-LATA calls. Like the LECs, IXCs lack an economic
may then be used to implement a nationwide IP incentive to implement IP telephony networks, due to their
telephony network. Finally, providers will need to decide large investment in PSTN equipment. IXCs must also
between an all-IP-based telephony solution or a hybrid build a large-scale management architecture for billing,
IP/circuit switched solution involving a network call provisioning, and monitoring [5] and face similar problems
signaling gateway (NCSG). Other providers that deploy in developing or partnering with the provider of a data
data connections directly to the subscriber, like digital network scalable enough to handle a large number of
subscriber line (DSL) and wireless data service providers, calls.
face similar issues. IP telephony can provide considerable benefits to an
Internet service providers (ISPs) are also considering IXC:
implementing IP telephony solutions in their networks.
ISPs provide long-haul data transport. There are several (1) IXCs are required by law in several countries by
types of ISPs, including backbone ISPs and access ISPs. A law to pay a minimum fixed price per minute to the
typical access ISP consists of points of presence (POPs) LECs terminating and originating the long-distance
to which users may connect through modem, DSL, call — costs may be reduced by using a data network
cable, Integrated Services Digital Network (ISDN), or to bypass the LECs on each end of the call;
other types of leased lines. Access ISPs can terminate (2) IP telephony allows IXCs to implement a bundled
IP telephony calls at gateways placed at one or more solution for nationwide data, video, and voice
POPs. IP telephony is expected to significantly increase transport with supplementary services;
the operating costs of an ISP, so it is critical that the
(3) cost savings may be achieved by multiplexing
ISP implement good pricing strategies to recover these
PSTN-originated voice calls over a nationwide data
costs. Backbone ISPs provide a long-distance network
network.
to interconnect access ISPs. Backbone ISPs can expect
to generate more revenue with the introduction of IP
3.4. IP Host to IP Host
telephony services because of the resulting increase in data
traffic carrying voice flowing between POPs. There are Two users may install IP telephony software on their
several pure Internet telephony service providers (ITSPs) computers to communicate. If the call does not need to
that lease capacity from existing data network to provide pass through the PSTN, there is no need for any sort of
1178 IP TELEPHONY
network intelligence, and only plain data services need to packet sizes decreases this delay, but increases network
be purchased from the service provider. It is not clear how load, as more bandwidth must be expended on sending
an ISP should treat calls that do not require transit on packet headers. Some coders examine the next frame to
the PSTN network. Although these calls increase network exploit any correlation with the current frame, causing
load, the ISP may not be able to distinguish VoIP traffic, lookahead delays.
and hence may be unable to charge separately for IP
telephony services. However, users are often willing to 4.1.2. Loss. IP network congestion can cause router
pay more for high-quality connections in IP telephony buffers to overflow, resulting in packet loss. Unlike the
networks, and so ISPs may wish to charge more for PSTN, no end-to-end circuits are established, and IP
increased QoS [32]. Furthermore, the IP hosts may wish packets from many links are queued for transmission
for part of the call path to be routed over the PSTN, over an outgoing link. Packets are dropped if there is
improving QoS. no space in the queue. Packet loss interferes with the
ability for the receiving host’s codec to reconstruct the
audio signal. Each packet contains 40–80 ms of speech,
4. ISSUES OF PACKET AUDIO OVER IP NETWORKS
matching the duration of the phoneme, a critical phonetic
unit used to convey meaning in speech. If many packets
IP networks do not provide the QoS guarantees found
are lost, the number of lost phonemes increases, making
in the PSTN. Hence, they are inherently limited in
it more difficult for the human brain to reconstruct the
meeting the requirements of voice transport. However,
missing phonemes and understand the talker. Packet loss
proper network and end system design can provide an
in the Internet is improving, and even long distance paths
infrastructure with good end-to-end voice quality.
average less than 1% packet loss.
4.1. Factors Affecting Audio Quality
4.1.3. Jitter. Random variation in packet interarrival
Several factors affect the quality of the received audio. times at the receiver is referred to as jitter. Jitter occurs
Delay and echo can interfere with normal conversations as a result of the variance in queueing delays at IP
and make communication over the network unpleasant. network routers. If this delay is too long, the packet will be
Packet-switched networks suffer from packet jitter and considered lost by the receiver, decreasing audio quality.
loss, which further decrease audio quality. Packets that arrive late require complex processing in the
receiver’s codec. Jitter increases with both network load
4.1.1. Delay. Delay is the time for the voice signal to and distance. Jitter tends to range from near-zero values
be encoded, sent through the network, and decoded at the for overprovisioned local area networks (LANs) and 20 ms
receiver. One way delays greater than 150 ms cause the for intercity links up to 99 ms for international links [32].
listener to hesitate before responding, making the mood
of the conversation sound cold and unfriendly [25]. PSTN 4.1.4. Echo. Echo is the sound of the talker’s voice
delays are typically 30 ms, whereas Internet delays tend to being reflected back to the talker. The most common cause
range from 50 to 400 ms. Delay in IP telephony networks is of echo in IP telephony systems is an impedance mismatch
caused by IP network delays and signal processing delays. in the network, causing some of the speech energy to be
There are five types of IP network delays: propagation reflected back to the talker. Echo may also be caused
delays, packet capture delays, serialization delays, switch- by acoustic coupling between the terminal’s speaker and
ing delays, and queuing delays. Propagation delay is the microphone. Echo is seldom noticed in the PSTN, because
time for the signal to propagate through a link, and is low delays allow it to return quickly to the talker’s ear.
a function of transmission distance. Packet capture delay The long delays of IP networks make echo more perceptible
is the time required for the router to receive a complete and annoying.
IP packet before processing it and forwarding it towards
the destination. Serialization delay is the time it takes to 4.1.5. End-System Design. Several other factors can
place a packet on the transmission link. Switching delay influence packet audio. Poor design of end-user equipment
is the time for the router to choose the correct output port can accentuate background noise, decreasing intelligibility
on the basis of information contained in the IP packet of speech. Noise can also originate from analog components
header. Queuing delays occur when packets are buffered or bit errors introduced by the end-user system. Finally,
at the router while other packets are being processed. voice activity detectors (VADs) decrease bandwidth uti-
Signal processing delays occur in the codec algorithms lization by only sending packets when the user is speaking.
implemented at the gateway or end-user systems. Improper design of these devices can lead to inadvertently
Codecs perform encoding and compression operations clipping parts of the audiostream.
on the analog voice signal, resulting in digital signal
processing (DSP) delays. These delays include the time
4.2. Measurement Techniques
to detect dual-tone multifrequency (DTMF) tones, detect
silence, and cancel echo. DSP algorithms depend on The quality of the received audio affects the ability of
processing an entire frame at a time, and the time to fill the listener to understand the talker. Hence, appropriate
one of these frames before passing it to the DSP algorithm measurement algorithms are needed to evaluate audio
increases delay. The time for a transmitter to fill a packet quality in telephony networks. These measurement
with voice data is the packetization delay. Using shorter algorithms often operate by comparing an encoded
IP TELEPHONY 1179
waveform to an uncoded reference, and often give outputs or Differentiated Services (DiffServ) models, or operat-
in terms of a mean opinion score (MOS). ing over ATM. Finally, increasing link speeds decreases
The ITU-T defines the perceptual speech quality serialization delay, further increasing service quality.
measurement (PSQM) method to provide a relative score of
the quality of a voice signal, taking into account cognitive 5.2. End System
factors and the physiology of the human ear. Output and
IP telephony endpoints are much more intelligent than
reference waveforms must be time synchronized before their PSTN counterparts, allowing us to implement
they are input to the algorithm. PSQM+ was later defined sophisticated algorithms to achieve improved QoS and
as an improvement to PSQM. PSQM+ more accurately more effective use of bandwidth.
reflects audio quality in the presence of packet drops Proper design of codecs can greatly improve service
and other quality impairments. The perceptual analysis quality. Using compression decreases network utilization,
measurement system (PAMS) is similar to PSQM, but uses but can lower audio quality. Furthermore, it increases
a different signal-processing model. PAMS performs time delay, sensitivity to dropped packets, and computational
alignment to eliminate delay effects, and improves on the overhead. Silence suppression can decrease bandwidth
accuracy of PSQM+. A second alternative to PSQM is utilization by up to 50% by only transmitting packets when
measuring normalized blocks (MNB). PESQ combines the the speaker is talking. However, quality of the signal may
cognitive modeling of PSQM with the time alignment be reduced if the transmitter does not accurately detect
techniques of PAMS. PESQ has been found to give, when speech is present. Increasing the number of frames
on average, the most accurate measurements of these in an IP packet improves network utilization by removing
techniques [26]. packet overhead, but increases packetization delay and
sensitivity to dropped packets [21]. For example, the ITU-
T’s G.711 codec gives the best received audio quality
5. SYSTEM DESIGN
under poor network conditions, while the G.723.1 codec
offers much lower network utilization at the expense of
IP telephony service providers must implement well- audio quality. Furthermore, codec DSP algorithms may
designed systems to meet the strict requirements of voice be implemented in hardware, significantly decreasing the
traffic. Data service providers must provide networks processing latency.
capable of handling a large number of voice calls. Packet loss in today’s IP networks is fairly common.
Equipment vendors must implement hardware and Voice transport in IP networks usually takes place over
software that is highly available and routes voice traffic unreliable transport mechanisms, as the overhead for
with low delay. Finally, protocols and standards must be reliable transport significantly increases delay. However,
developed to support scalable, robust, and secure wide- packet loss may be acceptable because of the ability of the
area telephony systems. human brain to reconstruct speech from lost fragments
and the advanced processing capabilities at IP telephony
5.1. IP Network terminals. FEC can be used to reconstruct lost packets
by transmitting redundant information along with the
High availability is a prerequisite for telecommunications. original information [22–24]. The redundant information
The PSTN has been engineered to provide virtually is used to reconstruct some of the lost original data. For
uninterrupted dial tone to over a billion users worldwide. example, a low-quality copy of the previous packet may be
Matching this impressive achievement with a fully IP- transmitted with the current packet.
based network will be a challenging task. If IP telephony Packet jitter can be alleviated by using a playback
is to be accepted as a viable alternative to the PSTN, buffer at the receiver. The sender transmits RTP packets
IP network hardware and software must be engineered to with a sequence number and a timestamp, and the receiver
provide similar reliability, scalability, and ease of use. This queues each packet to be played at a certain time in the
can be achieved through redundancy, intelligent network future. This playout delay must be long enough so that the
design, and failover to PSTN networks in cases of severe majority of packets are received by their playout times.
congestion or failure. However, too much delay will be uncomfortable to the
The network must provide good QoS to the voice listener. Receivers may estimate the network delay and
application. Designers of campus networks should over- jitter, and calculate this playout delay accordingly.
provision to avoid call blocking, and should mark traffic To deal with unwanted echo, echo cancelers may
to support routing QoS enhancements [5,8]. WANs may be placed in the PSTN, the gateway, or the end-user
run at near-link capacity to reap the benefits of statis- terminal. These devices should be placed as close as
tical multiplexing, and should use QoS enhancements possible to the source of the echo. Long-distance networks
such as traffic prioritization and QoS routing to improve usually implement two echo cancelers per path, one for
service [2,38]. Gateways should be placed close to termi- each direction. A good echo canceler delivers good audio
nals to decrease loss, jitter, and delay. Network topologies quality without echo, distortion, or background noise. Echo
that duplicate voice traffic over several links can improve cancelers work by obtaining a replica of the echo by a
performance at the cost of decreased efficiency. Packet linear filter and subtracting it from the original signal,
classification may also be used to provide priority ser- then using a nonlinear processor (NLP) to eliminate the
vice to voice traffic. QoS improvements may be made in remaining echo by attenuating the signal. PSTN echo
IP networks following the Integrated Services (IntServ) cancelers are seldom constructed to recognize the large
1180 IP TELEPHONY
delays from packet networks and do not attempt to perform the need for QoS request signaling and channel setup
echo cancellation in such circumstances. Echo cancellation delay. However, since DiffServ does not perform admission
must therefore also be implemented in the IP network or control, it cannot provide QoS guarantees to VoIP
terminal. applications.
6.2. Pricing
6. RESOURCE MANAGEMENT
It is expected that a moderate use of IP telephony will
IP telephony networks consist of limited resources that bring about an increase of 50% in an ISP’s operating
must be allocated to a group of subscribers. These costs [37]. It is important that these costs be passed
resources include IP network bandwidth, PSTN trunks, on to the users. The signaling architecture along with
gateway functionality, and access to MCUs. IP telephony the greater intelligence of IP telephony endpoints allows
service providers must design and implement schemes service providers to formulate and implement complex
to allocate resources to the users with greater demand. pricing models that take into account both the dynamic
IntServ and DiffServ are two architectures useful for congestion at the gateway along with the desired QoS of
designing networks in which resources may be easily the call [33–35]. The price charged to the user should take
allocated. Pricing may then be used to control the way into account the amount of resources consumed as well as
in which these resources are allocated to different users. the expected revenue lost to the provider because higher-
paying users are blocked from using those resources [43].
6.1. Resource Allocation Several pricing schemes have been proposed for IP
telephony services. Although flat pricing schemes are well
6.1.1. Integrated Services (IntServ). A simple way to accepted by users and tend to dominate in most Internet
partition resources among a group of users is to let the services, light users subsidize heavy users, making them
network decide whether a call will be admitted on the basis
economically inefficient [41]. Pricing resources based on
of factors such as user requirements and current network
the current congestion in the network causes low paying
load. In the IntServ model, each router in the network is users to back off, freeing resources for high-paying users.
required to know what percentage of its link bandwidths In congestion sensitive pricing, users may be charged
are reserved [18]. An endpoint wishing to place a call
per byte or per minute. In ‘‘smart market’’ (SM) pricing,
must request enough bandwidth for the call, and the
the user puts a bid into each packet. During times of
network may decide whether there are sufficient resources.
congestion, the router will drop the packet if it is below the
If not, the call is blocked. The Resource Reservation
current congestion price. Charging users a higher price for
Protocol (RSVP) is used in an IntServ framework to deliver a higher-QoS connection leaves high-quality connections
QoS requests from a host to a router, and between
free for higher-paying users. In QoS-sensitive pricing, the
intermediate routers [11]. QoS routing may be used to
user may be charged a higher price for closer gateways or
find the optimal path for the call to take through the IP
gateways offering supplementary services. In Paris Metro
network, as RSVP does not determine the links in which
Pricing [36], the network bandwidth is partitioned into
reservations are made.
several subnetworks, each priced differently. Users with
Some networks, such as the Internet, are not con-
low demand will tend to congregate in the lower priced
structed using the IntServ model. However, an adminis-
partitions, increasing the service quality of more expensive
trator defined policy may be implemented at the H.323
partitions. The provider should choose a pricing scheme
gatekeeper or SIP server to provide admission control and
that is economically efficient and accepted by users. These
resource reservation. For example, the gatekeeper may
pricing schemes may be implemented using the Gateway
choose to block incoming calls when the total bandwidth
Location Protocol (GLP) to distribute price information in
in use by callers passing through that gateway exceeds a
real time [10,12,16].
certain threshold.
In the H.323 protocol architecture, security of the international gateways. However, they must pay tariffs
videostream, call setup, call control, and communications to interface with the PSTN. Japan requires ISPs to obtain
with the gatekeeper is achieved by H.235. This is achieved a license to establish and operate circuits and facilities.
by having the endpoints authenticate themselves with Regulatory issues are difficult to resolve and need to be
gateways, gatekeepers, and MCUs. Encrypting data and reviewed in detail.
control channels can prevent attackers from making
unauthorized modifications to the information in transit,
thereby providing data integrity and privacy. This can, for BIOGRAPHIES
example, prevent attackers from using a LAN analyzer to
listen to an unencrypted call on a broadcast network. Matthew Chapman Caesar completed his B.S. degree
Finally, nonrepudiation ensures that no endpoint can in computer science at the University of California at
deny that it participated in a call, and is provided by Davis in 1999. During 2000, he was a member of
the gatekeeper. Being sure of a caller’s identity is critical technical staff at iScale, Mountain View, California.
for billing services. Currently, he is enrolled in the Ph.D. program in computer
SIP derives most of its security properties from science at the University of California at Berkeley. In
HTTP [15]. In addition to these, SIP also supports public 2001, he was awarded the National Science Foundation
key encryption. SIP does not specify the security of (NSF) Graduate Research Fellowship and the Department
user sessions, but SDP messages may be exchanged to of Defense National Defense Science and Engineering
allow the endpoints to decide on a common scheme. Data Graduate (NDSEG) Fellowship. His research interests
integrity and privacy can be implemented with encryption. include Internet economics, congestion control, and real-
Authentication of end users may be performed with HTTP time multimedia services.
authentication or PGP. Having callers register with a SIP
server provides nonrepudiation. Dipak Ghosal received his B.Tech degree in electrical
engineering from Indian Institute of Technology, Kanpur,
8. REGULATORY AND STANDARDIZATION ISSUES India, in 1983, his M.S. degree in computer science from
Indian Institute of Science, Bangalore, India, in 1985,
There are several standards organizations designing stan- and his Ph.D. degree in computer science from University
dards for IP telephony. The Internet Engineering Task of Louisiana, Lafayette, in 1988. From 1988 to 1990 he
Force (IETF) develops IP related standards for the Inter- was a research associate at the Institute for Advanced
net. The IETF publishes requests for comments (RFCs) on Computer Studies at University of Maryland (UMIACS)
many areas, including IntServ, DiffServ, Gateway loca- at College Park. From 1990 to 1996 he was a member of
tion, Telephony Routing over IP (TRIP), RTP, and SIP. technical staff at Bell Communications Research (Bellcore)
The Asynchronous Transfer Mode (ATM) Forum deals at Red Bank. Currently, he is an associate professor of
with IP and ATM issues. Unlike IP, ATM was designed the Computer Science Department at the University of
to support applications with different characteristics and California at Davis. His research interests are in the
service requirements, and networks may use IP over ATM areas of IP telephony, wireless network, web caching, and
networks to improve QoS. Telephony and Internet Pro- performance evaluation of computer and communication
tocol Harmonization Over Networks (TIPHON) develops systems.
specifications covering interoperability, architecture, and
functionality for IP telephony networks. The Digital Audio
BIBLIOGRAPHY
Video Council (DAVIC) was created to address end-to-end
interoperability of audiovisual information and multime-
dia communication. Its charter is to develop specifications 1. J. Kurose and K. Ross, Computer Networking: A Top-Down
for development based on Moving Picture Experts Group Approach Featuring the Internet, Addison-Wesley, 2000.
version 2 (MPEG-2) coding. The ITU-T is concerned with 2. B. Li, M. Hamdi, D. Jiang, and X. Cao, QoS enabled voice
developing standards for all fields of telecommunications support in the next generation internet: Issues, existing
except radio aspects. The ITU-T began to study IP related approaches and challenges, IEEE Commun. Mag. 38(4):
projects in 1997, and established formal collaborations 54–61 (April 2000).
with the IETF, ATM Forum, and DAVIC. 3. D. Bergmark and S. Keshav, Building blocks for IP telephony,
National governments have a wide range of Internet IEEE Commun. Mag. 38(4): 88–94 (April 2000).
and telephony regulatory policies. The technological 4. F. Anjum et al., ChaiTime: A system for rapid creation
advance that brought about IP telephony has moved of portable next-generation telephony services using third-
much faster than most countries were able to change party software components, Proc. 2nd IEEE Conf. Open
policy. Furthermore, it is not clear whether VoIP Architectures and Network Programming (OPENARCH),
should be regulated and taxed as voice in the PSTN. New York, March 1999.
Countries such as the United States, Sweden, Italy, 5. M. Hassan, A. Nayandoro, and M. Atiquzzaman, Internet
and Russia do not restrict access to the Internet nor telephony: Services, technical challenges, and products, IEEE
regulate IP telephony services. China disallows ISPs Commun. Mag. 38(4): 96–103 (April 2000).
from operating international gateways or providing 6. A. Rayes and K. Sage, Integrated management architecture
a nationwide infrastructure, although they may run for IP-based networks, IEEE Commun. Mag. 38(4): 48–53
their own switches. Canada allows ISPs to establish (April 2000).
1182 ITERATIVE DETECTION ALGORITHMS IN COMMUNICATIONS
7. M. Hamdi et al., Voice service interworking for PSTN and IP 28. Microsoft Corp., IP telephony with TAPI 3.0 (Microsoft Corp),
networks, IEEE Commun. Mag. (May 1999). technical white paper, http://www.microsoft.com/windows-
2000/techinfo/howitworks/communications/telephony/
8. Cisco Systems, Architecture for Voice, Video and Integrated
iptelephony.asp.
Data, technical white paper, http://www:cisco:com/warp/
public/cc/so/neso/vvda/iptl/avvid− wp:htm. 29. Cisco, IP telephony Solution Guide: Planning the IP
telephony Network, technical white paper, http://www.cisco.
9. M. Korpi and V. Kumar, Supplementary services in the H.323
com/warp/public/788/solution guide/3 planni.htm.
IP telephony network, IEEE Commun. Mag. 37(7): 118–125
(July 1999). 30. G. Cook, Jr., Taking the hybrid road to IP telephony,
Commun. Eng. Design Mag. (Dec. 2000).
10. J. Rosenberg and H. Schulzrinne, Internet telephony gateway
location, IEEE INFOCOM, San Francisco, March–April 1998. 31. W. Matthews, L. Cottrell, and R. Nitzan, 1-800-CALL-HEP,
presented at Computing in High Energy and Nuclear
11. R. Braden et al., Resource reservation protocol (RSVP) — Physics 2000 (CHEP’00), Feb. 2000 (full text on website
Version 1 Functional Specification, RFC 2205, September http://www.slac.stanford.edu/pubs/slacpubs/8000/slac-
1997. pub-8384.html).
12. J. Rosenberg and H. Schulzrinne, A Framework for Telephony 32. J. Altmann and K. Chu, A proposal for a flexible service plan
Routing over IP, IETF, RFC 2871, June 2000. that is attractive to users and Internet service providers,
13. H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, IEEE INFOCOM, Anchorage, April 2001.
Audio-Video Transport Working Group, RTP: A Transport 33. X. Wang and H. Schulzrinne, Pricing network resources for
Protocol for Real-Time Applications, IETF, RFC 1889, Jan. adaptive applications in a differentiated services network,
1996. IEEE INFOCOM, Anchorage, April 2001.
14. Recommendation H.323, Visual Telephone Systems and 34. M. Falkner, M. Devetsikiotis, and I. Lambadaris, An over-
Equipment for Local Area Networks Which Provide a Non- view of pricing concepts for broadband IP networks, IEEE
guaranteed Quality of Service, International Telecommuni- Commun. Surv. 3(2): (April 2000).
cations Union, Telecommunications Standardization Sector 35. L. DaSilva, Pricing for QoS-enabled networks: A survey,
(ITU-T), Geneva, Switzerland, Nov. 1996. IEEE Commun. Surv. 3(2): (April 2000).
15. M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, 36. A. Odlyzko, Paris Metro Pricing for the Internet, Proc. ACM
SIP: Session Initiation Protocol, IETF, RFC 2543, March Conf. Electronic Commerce, Denver, Nov. 1999, pp. 140–147.
1999.
37. L. McKnight, Internet telephony: Costs, pricing, and policy,
16. C. Agapi et al., Internet Telephony Gateway Location Service Proc. 25th Annual Telecommunications Policy Research Conf.,
Protocol, IETF, Internet Draft, November 1998. 1997.
17. S. Black et al., An Architecture for Differentiated Service, 38. A. Dubrovsky, M. Gerla, S. Lee, and D. Cavendish, Internet
IETF, RFC 2475, December 1998. QoS routing with IP telephony and TCP traffic, Proc. ICC,
New Orleans, June 2000.
18. R. Braden, D. Clark, and S. Shenker, Integrated Services in
the Internet Architecture: An Overview, IETF, RFC 1633, June 39. S. Frankel, Demystifying the Ipsec Puzzle, Artech House,
1994. Boston, 2001.
19. Organization for Economic Co-operation and Development, 40. T. Dierks and C. Allen, The TLS Protocol: Version 1.0, IETF,
OECD Communications Outlook 1999, OECD, 1999. RFC 2246, Jan. 1999.
20. M. Arango et al., Media Gateway Control Protocol (MGCP), 41. G. Huston, ISP Survival Guide: Strategies for Running a
IETF, RFC 2705, Oct. 1999. Competitive ISP, Wiley, New York, 1999.
21. M. Perkins, C. Dvorak, B. Lerich, and J. Zebarth, Speech 42. M. Handley, and V. Jacobson, SDP: Session Description
transmission performance planning in hybrid IP/SCN net- Protocol, RFC 2327, April 1998.
works, IEEE Commun. Mag. 37(7): 126–131 (July 1999). 43. M. Caesar, S. Balaraman, and D. Ghosal, ‘‘A Comparative
Study of Pricing strategies for IP telephony’’, IEEE Globecom
22. J. C. Bolot, S. Fosse-Parisis, and D. Towsley, Adaptive FEC-
2000, Global Internet Symposium, San Francisco, Nov. 2000.
based error control for Internet telephony, Proc. IEEE
INFOCOM, New York, March 1999. 44. M. Caesar and D. Ghosal, ‘‘IP Telephony Annotated Bibli-
ography’’, http://www.cs.berkeley.edu/∼mccaesar/research/
23. M. Podolsky, C. Romer, and S. McCanne, Simulation of FEC-
iptel− litsurv.html.
based control for packet audio on the Internet, Proc. IEEE
INFOCOM, San Francisco, March–April 1998.
24. J. C. Bolot and H. Crepin, Analysis and control of audio
packet loss over packet-switched networks, Proc NOSSDAV, ITERATIVE DETECTION ALGORITHMS IN
Durham, NC, April 1995. COMMUNICATIONS
25. A. Percy, Understanding Latency in IP telephony, tech-
KEITH M. CHUGG
nical white paper, http://www:brooktrout.com/whitepaper/
iptel latency.htm. University of Southern California
Los Angeles, California
26. J. Anderson, Methods for Measuring Perceptual Speech Qual-
ity, technical white paper, http://onenetworks.comms.agi-
1. INTRODUCTION
lent.com/downloads/PerceptSpeech2.pdf.
27. G. Herlein, The Linux telephony kernel API, Linux J. 82: Iterative algorithms are those that repeatedly refine
(Feb. 2001). a current solution to a computational problem until
ITERATIVE DETECTION ALGORITHMS IN COMMUNICATIONS 1183
an optimal or suitable solution is yielded. Iterative an understanding of its practical characteristics and
algorithms have a long history and are widely used range of application. The transfer of this understanding
in modern computing applications. The focus of this in the research community to engineering practice is
article is iterative algorithms applied to communication well under way. The use of turbo-like codes is now
systems, particularly to receiver signal processing in common is systems and standards designed after 1996.
digital communication systems. In this context, the basic Practical implementations of other applications and a
problem is for the receiver to infer the digital information more aggressive exploitation of the potential benefits of
that was encoded at the transmitter using only a waveform iterative detection are currently emerging in industry.
observed after corruption by a noisy transmission channel. The motivation, basic ideas underlying the approach,
Thus, one can imagine that an iterative receiver algorithm and potential advantages are described in the following
would obtain an initial rough estimate of the transmitted subsections.
block of data, and then refine this estimate. The refinement
process can take into account various sources of corruption 1.1. Motivation for Iterative Detection
to the observed signal and various sources of structure that
A common, generic, digital communication system block
have been embedded into the transmitted waveform. Even
diagram is shown in Fig. 1 (a) and (b). In fact, the
within this fairly narrow context, there are a large number
receiver block diagram in Fig. 1 (b) mirrors the processing
of distinct iterative algorithms that have been suggested
performed in most practical receiver implementations.
in the literature.
This segregated design paradigm allows each component of
There is one elegant and relatively simple iterative
the receiver to be designed and ‘‘optimized’’ without much
approach that has recently emerged as a powerful tool
regard to the inner workings of the other blocks of the
in modern communication system design. This approach,
receiver. As long as each block does the job it is intended
which is the focus of this article, is known by many
for, the overall receiver should perform the desired task:
names in the literature, including iterative detection and
extracting the input bits.
decoding [1], belief propagation [2–4], message-passing
Despite the ubiquity of the diagram in Fig. 1 (b) and the
algorithms [5], and the ‘‘turbo principle.’’1 Appreciation
associated design paradigm, it clearly is not optimal from
for the power and generality of this approach in the
the standpoint of performance. More specifically, the prob-
communications and coding literature resulted from the
ability of error for the data estimates is not minimized by
invention of ‘‘turbo codes’’ in 1993 [6,7] and the associated
this structure. This segregated processing is adapted for
decoding algorithm, which is a direct application of
tractability — both conceptual tractability and tractabil-
the standard iterative algorithm addressed in this
ity of hardware implementation. The optimal receiver for
article [8]. Turbo codes are error-correction codes with
virtually any system is conceptually simple, yet typically
large memory and strong structure that are constructed
prohibitively complex to implement. For example, consider
using multiple constituent codes, each with relatively
the transmission of 1000 bits through a system of the form
low complexity, connected together by data permutation
shown in Fig. 1 (a). These bits may undergo forward error-
devices (interleavers). Turbo codes and similar turbo-like
correction coding (FEC), interleaving, training insertion
codes were found to perform very close to the theoretical
(pilots, synchronization fields, training sequences, etc.)
limit, a feat that evaded coding theorist for years and was
before modulation and transmission. The channel may
thought by many to be practically impossible.
corrupt the modulated signal through random distortions
Intrigued by the effectiveness of the iterative decoding
(possibly time-varying and nonlinear), like-signal inter-
algorithm, researchers in the field of data detection and
ference (co-channel, multiple access, crosstalk, etc.), and
decoding pursued several parallel directions shortly after
additive noise. The point is, regardless of the complexity of
the invention of turbo codes. One subject addressed was
the transmitter and/or channel, the optimal receiver would
the general rules for the iterative algorithm, i.e., Is
compute 21000 likelihoods and select the data sequence that
there a standard view of the various iterative algorithms
most closely matches the assumed model. Intuitively, this
suggested to decode turbo-like codes? A related issue
is one of the primary advantages of digital modulation
is In what sense and under what conditions is the
techniques — i.e., the receiver knows what to look for and
algorithm optimal and, otherwise, how may it be viewed
there are but a finite number of possibilities. This is shown
as a good approximation to optimal processing? A third
in Fig. 1 (c). Ignoring the obvious complexity problems,
area of research was the application of the iterative
this requires a good model of the transmitter formatting
decoding paradigm to other receiver processing tasks
and the channel effects. For example, the aforementioned
such as channel equalization, interference mitigation, and
likelihood computation may include averaging over the
parameter tracking. By the late 1990s, all of these issues
statistics of a fading channel model or the possible data
had been relatively well addressed in the literature [1]. A
values of like-signal interferers.
more recent accomplishment was the development of tools
The iterative approaches described in this article
for predicting the convergence properties of the standard
are exciting because they enable receiver processing
iterative algorithms [9,10].
that can closely approximate the above optimal solution
Currently, there is a consensus in the literature
with feasible conceptual and implementation complexity.
regarding the standard iterative detection paradigm and
Specifically, data detection and parameter estimation are
done using the entire global system structure. Unlike
1Unless emphasized otherwise, these terms are used interchange- the direct approach in Fig. 1 (c), the iterative receiver
ably in this article. in Fig. 1 (d) exploits this structure indirectly. The key
1184 ITERATIVE DETECTION ALGORITHMS IN COMMUNICATIONS
Like-signal interference
Distortion (filtering, fading etc.) Channel
Noise
(b) Bit decisions
Deinterleaving Channel
Decoder and Demodulation
unformatting mitigation
Channel
estimation
concept in this approach is the exchange and updating of The standard iterative detection paradigm defines a
‘‘soft information’’ on digital quantities in the system (e.g., way of refining soft information on the system variables,
the coded modulation symbols). This concept is shown in most importantly the digital data inputs to the system.
Fig. 1 (d). The iterative detection receiver is similar to the The components of this approach are:
conventional segregated design in that, for each subsystem
block in the model, there is a corresponding processing • Modeling System Structure: This involves speci-
block. In fact, each of these corresponding processing fying the structure of the system or computational
blocks in the receiver of Fig. 1 (c) exploits only local system problem of interest such that local dependencies are
structure (e.g., the FEC decoder does not use any explicit captured and the global system structure is accu-
knowledge of the channel structure). As a consequence, rately modeled. For example, in Fig. 1 (a) this is
the complexity of the receiver in Fig. 1 (d) is comparable accomplished by modeling the system (global struc-
to the traditional segregated design in Fig. 1 (b) [i.e., the ture) as a concatenated network of subsystems (local
increase in complexity usually is linear as opposed to the structure). This may seem like a trivial step, but, in
exponential increase in complexity associated with the fact, this is possibly the most important and least
optimal processing in Fig. 1 (c)]. well-understood aspect of the field. As will become
apparent, once the model has been selected, the stan-
1.2. Components of the Standard Iterative Detection dard iterative detection algorithm is defined for the
Algorithm most part. The properties of the model selected deter-
There are several basic concepts that form the basis of the mine in large part the complexity of the processing,
standard iterative detection paradigm. At the core of this the optimality or the effectiveness of a suboptimal
paradigm is the exchange and update of soft information algorithm, and many relevant implementation prop-
on digital variables. A digital variable is one that takes erties such as latency and area tradeoffs. This basic
only a finite number of values, which are known to the problem has provided the impetus for applying graph
receiver. Soft information on a digital variable is an array theory to this field. Specifically, graphical models
of numbers that gives a measure ‘‘belief’’ that a given are simply explicit index block diagrams that cap-
conditional value is correct. For example, soft information ture local dependencies between relevant system
on a binary variable b ∈ {0, 1} can be represented by variables.
two numbers. If such soft information were represented • Directly and Fully Exploiting Local Structure:
as probabilities, P[b = 0] = 0.2 and P[b = 1] = 0.8 would For each defined local subsystem or node in the model,
represent the case where the current belief is that b = 1 the associated processing performed is defined in
is four times as likely as the zero hypothesis. Soft terms of the optimal receiver for that subsystem.
information can be represented in various forms and This leads to the notion of a marginal soft inverse
therefore is also referred to as beliefs, reliabilities, soft (or, for brevity, the soft inverse) of a subsystem. The
decisions, messages, and metrics. Soft information should soft inverse takes in marginal soft information on
be contrasted with hard decision information. For the each of the its digital inputs and outputs and updates
example discussed previously, the associated hard decision this information by exploiting the local subsystem
information would simply be the best current guess for the structure. The term marginal soft information means
value of b — i.e., b̂ = 1. that the soft information is provided separately for
ITERATIVE DETECTION ALGORITHMS IN COMMUNICATIONS 1185
each variable indexed in a sequence. For example, suppose that a system using convolutional coding and
if {b0 , b1 , . . . b1023 } is a sequence of binary inputs to interleaving experiences severe like-signal interference
the system, the marginal soft information is 1024 and distortion over the channel. In this case, the channel
arrays of size two as opposed to one array of size mitigation block in Fig. 1 (b) will output hard decisions on
21024 (i.e., the latter is joint soft information). The the coded/interleaved bit sequence ak . Suppose that, given
subsystem soft inverse is defined based only on the the severity of the channel, the error probability associated
local subsystem structure. Specifically, it is based with these coded-bit decisions will be approximately 0.4.
on the optimal data detection algorithm under the Deinterleaving these decisions and performing hard-in
assumption of a sequence of independent inputs and a (Hamming distance) decoding of the convolutional code
memoryless channel corruption of the outputs. Thus, will provide a very high bit error rate (BER) (i.e.,
the soft inverse takes in ‘‘soft-in’’ information on the nearly 0.5).
system inputs and ‘‘soft-in’’ information on the system For the receiver in Fig. 1 (d), however, the channel
outputs, and it produces ‘‘soft-out’’ information on mitigation block produces soft-decision information on the
the system inputs and ‘‘soft-out’’ information on the coded/interleaved bit sequence ak . For example, this may
system outputs. For this reason, the soft inverse be thought of as two numbers P[ak = 1] and P[ak = 0] that
processor is sometimes referred to as a soft-in/soft-out represent a measure of current probability or belief that
(SISO) processor. the k-th coded bit ak takes on the value 1 or 0, respectively.
• Exploit Global Structure via Marginal Soft- Clearly, soft decisions contain more information than
Information Exchange: Since the soft inverse does the corresponding hard decisions. In this example, it
not take into account the overall global system is possible that even though the hard decisions on ak
structure, this structure should be accounted for associated with the receiver of Fig. 1 (b) are hopelessly
somehow if the globally optimal receiver is to be inaccurate, the soft-decision information contains enough
information to jump-start a decoding procedure. For
achieved or well approximated. This occurs by the
example, two different possible sequences of soft-decision
exchange of soft information between soft inverse
information are shown in Table 1. Note that each of these
processors that correspond to subsystems having
correspond to exactly the same hard-decision information
direct dependencies on the same variables. For
(i.e., the hard decisions obtained by thresholding the soft
example, if the output of one subsystem is the
information is the same). However, the soft information
input to another subsystem, they will exchange soft
in case B is much worse than that in case A. Specifically,
information on these common variables. The soft
for case A, there is a high degree of confidence for correct
information can be viewed as a method to bias
decisions and very low confidence for incorrect decisions.
subsequent soft inverse processing. Specifically, soft-
For case B, there is little confidence in any of the decisions.
in information on the system inputs plays the role of a
A receiver of the form in Fig. 1 (d) would pass the soft
priori probabilities on inputs and soft-in information
information through a deinterleaver to a soft-in/soft-out
on the system outputs plays the role of channel
decoder for the convolutional code. Thus, after activation
likelihoods.
of this decoder, one could make a decision on the uncoded
bits. Alternatively, the updated beliefs on the coded bits
Consider the preceding general description in the could be interleaved and used in the role of a priori
context of the example in Fig. 1. As mentioned, the probabilities to bias another activation of the channel
concatenated block diagram defines the model. For each mitigation processing unit in Fig. 1 (d). In fact, this process
block in the model, there is a soft inverse processor in the could be repeated with the channel mitigation and FEC
iterative receiver of Fig. 1 (d). The arrows leading into each decoder exchanging and updating beliefs on the coded bits
soft inverse block correspond to the soft-in information and through the interleaver/deinterleaver pair. After several
the arrows departing represent soft-out information. The iterations, final decisions can be made on the uncoded bits
task of these processing units is to update the beliefs on by thresholding the corresponding beliefs generated by the
the input and output variables of the corresponding system code processing unit. This is what is meant by iterative
sub-block in Fig. 1 (a). Each sub-block processing unit will detection.
be activated several times, each time biased by a different Note that in this example, even though the hard-
(updated) set of beliefs. decision information on the coded bits after activating
The iterative receiver offers significant performance the channel mitigation processing unit is very unreliable
advantages over the segregated design. For example, (e.g., an error rate of 0.4), the soft information may allow
true data: 0 0 1 0 1
case A: (0.99, 0.01) (0.97, 0.03) (0.51, 0.49) (0.48, 0.52) (0.03, 0.97)
case B: (0.51, 0.49) (0.55, 0.45) (0.51, 0.49) (0.48, 0.52) (0.48, 0.52)
decisions: 0 0 0 1 1
1186 ITERATIVE DETECTION ALGORITHMS IN COMMUNICATIONS
system configuration is defined by an allowable input- inputs and a memoryless channel yields MAP symbol
output sequence pair [e.g., (a, x(a)) where boldface detection (MAP-SyD). Under the same assumptions, max-
represents the vector of variables]. In the probability product processing yields soft information that, when
domain, combining is performed by multiplication, in the thresholded, yields decisions optimal under the MAP
same way that the joint probability is the product of the sequence decision (MAP-SqD) criterion. In summary,
marginal probabilities for independent quantities. Thus, while the soft inversion problem was presented as a
for each configuration one could compute computational problem, it is based on Bayesian decision
theory for the system in isolated, ideal conditions.
P[a, x(a)] = P[x(a)] × P[a] As mentioned previously, for numerical stability and
N−1 M−1 hardware efficiency, the soft inversion processing is almost
= PI[xn (a)] × PI[am ] (1) always implemented in the log domain. Both sum-product
n=0 m=0
and max-product processing can be equivalently carried
out in the metric domain. For the max-product case, this is
where each element in the second equality is consistent straightforward since the negative-log and max operations
with the configuration (a, x(a)). commute. Furthermore, in the metric domain, the product
The second step is marginalization of this joint soft combining becomes sum combining. Thus, in the metric
information to produce updated marginal soft information. domain, max-product processing corresponds to min-sum
A natural way to marginalize is to sum over all P[a, x(a)] processing of metrics that are defined as the negative-log
consistent with a specific conditional value of a specific of the probability-domain quantities defined above [e.g.,
input or output variable. For example, to get PO[x10 = 3], MI[am ] = − ln(PI[am ])]. In particular, the min-sum soft
one could sum P[a, x(a)] over all configurations with inversion problem is
x10 = 3. The notation z : y is used to denote ‘‘all z consistent
with y.’’ Thus, marginalization to obtain PO[·] for the input M[a, x(a)] = M[x(a)] + M[a] (3a)
and output variables is N−1 M−1
= MI[xn (a)] + MI[am ] (3b)
n=0 m=0
PO[am ] = P[a, x(a)] ÷ PI[am ] (2a)
a : am
MO[am ] = min M[a, x(a)] − MI[am ] (3c)
a : am
PO[xn ] = P[a, x(a)] ÷ PI[xn ] (2b)
a : xn MO[xn ] = min M[a, x(a)] − MI[xn ] (3d)
a : xn
term Minimum Sequence Metric or Minimum Sum Metric to as intrinsic information, while the soft-in/soft-out
(MSM) is used to denote the minimum configuration information to be passed to other soft inverse modules
metric consistent with a particular conditional value of is in extrinsic (likelihood) form.
a variable. The notation MSM[u] is used to denote this In summary, the soft inverse of any system is defined
quantity. Thresholding MSM[am ] yields MAP-SqD. For based on the optimal receiver processing (MAP) for the
this reason, the min-sum version of the soft inversion system in isolation under the assumption of independent
problem is referred to as a shortest path problem [12]. inputs and a memoryless channel. The exact form of
A very useful result is that under certain condi- the soft inversion problem depends on the optimality
tions on the marginalization and combining operators, criterion (i.e., MAP-SyD or MAP-SqD) and the format
an algorithm to perform the soft inversion under one used to represent the soft information (i.e., metric or
convention can be directly converted to another con- probability domain). For the marginalization-combining
vention. Specifically, the marginalization and combining operators discussed, the semi-ring properties hold. Thus,
operators together with the soft information representa- most soft inversion algorithms of interest can be converted
tion should form a commutative semi-ring [5,12]. In this by simply replacing the combining and marginalization
case, any algorithm that uses only the properties of the operators.
marginalization-combining semi-ring3 can be converted
to any other marginalization-combining convention that
2.1. Specific Example Sub-Subsystems
forms a semi-ring. For example, this allows one to change
a proper MAP-SyD algorithm to a MAP-SqD algorithm It is important to note that the soft inversion [e.g., as
by simply redefining operators. This allows one to work stated in Eq. (3)] is a computational problem rather than
with the most convenient form to obtain a soft inverse an algorithm. The problem statement does suggest a
algorithm, then this algorithm can be converted as nec- method of computing the soft inverse, but this brute-
essary. For example, min-sum algorithms can often be force approach will typically be prohibitively complex. For
derived using pictures and intuition, while sum-product example, if the system in Fig. 4 has M binary inputs, it
algorithms are the most straightforward to prove analyti- will have 2M configurations. Computing soft information
cally. for each of these configurations and then performing
Consider a toy example of a system and its min-sum soft the subsequent marginalization will be prohibitively
inverse described in Table 2. This system has two inputs, complex even for moderate values of M. This brute-
a ∈ {0, 1} and b ∈ {0, 1, 2, 3} and one output c ∈ {0, 1}, with force method is referred to as exhaustive combining and
soft-in metrics as defined in the caption. Note that the best marginalization. In many cases, it is possible to compute
of the eight configurations is (a = 1, b = 1, c = 1). It follows the soft inverse with dramatically less computational effort
that MSM[a = 1] = MSM[b = 1] = MSM[c = 1] = −5 and by exploiting the special structure of the system. Thus,
MO[a = 1] = −7, MO[b = 1] = −2, and MO[c = 1] = −1. while there is really one unique computational problem,
Other values can be computed in a similar manner. it is useful to consider special cases of systems, find
For example, MSM[b = 2] = −2 (when a = 0 and c = 1) efficient algorithms for their soft inversion, and then use
and MO[b = 2] = −4. Note that if one desires the best these as standard modules for iterative detection–based
hard decision for a given variable, the soft-in and soft- receivers.
out information should be combined and this information For the implicit block diagram convention, Benedetto
should be thresholded (this is the MSM information in et al. [13] defined the marginal soft inverse for a variety
the min-sum case and the APP information in the sum- of systems commonly encountered. These are shown in
product case). This information is sometimes referred Fig. 5 with minor modification. Some of these are quite
trivial. For example, the soft inverse of an interleaver
Table 2. Toy Example of a System
is an interleaver/deinterleaver pair. Similarly for rate
with Two Inputs a and b and One converters, no computation is required for the soft
Output c. The Soft-In Information inversion.
is MI[a = 0] = 0, MI[a = 1] = 2, The memoryless mapper is a mapper that maps small
MI[b = 0] = 0, MI[b = 1] = −3, blocks of inputs onto outputs without memory between
MI[b = 2] = 2, MI[b = 3] = 6, blocks. For example, if four bits are collected and mapped
MI[c = 0] = 0, MI[c = 1] = −4 onto a 16-ary constellation, then the memoryless mapper
a b c Configuration Metric is the proper subsystem model. The soft inverse of
the memoryless mapper is computed using exhaustive
0 0 0 0+0+0=0 combining and marginalization over the small block
0 1 0 0 − 3 + 0 = −3
size (e.g., 16 configurations in the modulator example).
0 2 1 0 + 2 − 4 = −2
Another example in which the memoryless mapper
0 3 0 0+6+0=6
1 0 1 2 + 0 − 4 = −2 can be applicable is block error-correction codes. A
1 1 1 2 − 3 − 4 = −5 common special case of the mapper is the broadcaster
1 2 0 2+2+0=4 or repeater where all inputs and outputs are necessarily
1 3 0 2+6+0=8 equal.
One of the most important subsystems is the finite state
machine (FSM), which is discussed in detail in the next
3 These are called semi-ring algorithms in Ref. [1]. section. With the modules shown in Fig. 5, a large number
ITERATIVE DETECTION ALGORITHMS IN COMMUNICATIONS 1189
ak xk SO[ak] SI[xk]
FSM SISO
SI[ak] SO[xk]
Finite state machine
of practical systems can be modeled and the corresponding fading effects [16]. So, an efficient method for soft inversion
standard iterative receivers defined. of an FSM is desirable.
Using the development of the soft inverse notion and
2.1.1. Soft Inversions of an FSM via the For- a presumed familiarity with the Viterbi algorithm [17]
ward–Backward Algorithm. An FSM is a common model for or similar dynamic programming tools [12], an efficient
subsystems of digital communications systems. An FSM is algorithm can be derived pictorially. This is the for-
a system that at time k has a current state sk that takes on ward–backward algorithm (FBA), which can be used to
a finite number of possible values. Application of the input compute the MSM of the inputs and/or outputs of an
at time k, ak , results in the system producing output xk FSM [18–20]. First, note that for a given transition, the
and transitioning to the next state sk+1 . The transition at FSM output xk and input ak are specified uniquely. Thus,
time k is denoted by tk , which is defined by (sk , ak ) or with each well-defined trellis transition can be assigned a tran-
some redundancy (sk , ak , sk+1 ). It is common to represent sition metric of the form
an FSM by a diagram that is explicit in time and in value,
the so-called trellis diagram.
Mk [tk ] = MI[ak (tk )] + MI[xk (tk )]. (5)
Convolutional codes and related trellis-coded modula-
tion schemes are naturally modeled as FSMs. Certain
modulation formats and precoding methods such as dif- Second, the metric of the shortest path through a given
ferential encoding, line coding, and continuous phase transition tk can be obtained if one has the metric of the
modulation (CPM) are also routinely defined in terms shortest path to the states sk from the left edge of the
of FSMs. Channel impairments are also often well mod- index range and the metric of the shortest path from the
eled as FSMs. This includes, for example, the intersymbol right edge to sk+1 . This concept is shown in Fig. 6, where
j
interference (ISI) channel [14], the multiple access inter- MSMi [·] denotes the MSM using input metrics over the
ference channel [15], and even channels with random indices from i to j inclusive. Mathematically, the claim
1190 ITERATIVE DETECTION ALGORITHMS IN COMMUNICATIONS
3 13 6 6
11 11
4 4
6 3 7 10 13 3 7 22
4 4
10 10 9 9
6 6
Completion
Fk − 1[sk] Mk[tk] Bk + 1[sk + 1]
12 4 1
5
3 6
11
4
6 3 7 22
4
10 9
6
MSM0K − 1[ak = +1] = min {(12 + 4 + 1), (3 + 5 + 1), (6 + 11 + 6), (10 + 3 + 6)} = 9
MSM0K − 1[ak = −1] = min {(12 + 4 + 22), (3 + 7 + 22), (6 + 4 + 9), (10 + 6 + 9)} = 19
3+1
6 0+3 22
6+1
3+1
10 9
5+1
MI[ak = +1] = 3 MO[ak = +1] = 9 − 3 = 6 = (3 + 2 + 1)
MI[ak = −1] = 1 MO[ak = −1] = 19 − 1 = 18 = (6 + 3 + 9)
have inputs or outputs that are internal or hidden In most cases of practical interest, there either is
variables). a natural activation schedule or different activation
schedules produce similar results. Thus, for the most
A common stopping criterion is that a fixed number part, the iterative detector is specified once the block
of iterations are to be performed, with this number diagram is given and the subsystem soft inverses are
determined by computer simulation. For example, while determined. For example, the iterative detector for the
formal proofs of convergence for complicated iterative general concatenated system in Fig. 2 is shown in Fig. 3.
detectors are difficult, in most cases of practical interest, As a simple example of this paradigm, a simple
the performance improvement from iteration reaches a turbo code, or parallel concatenated convolutional code
point of diminishing returns. Alternatively, performance (PCCC) is considered. The encoder is shown in Fig. 8 and
may be sacrificed to reduce the number of iterations. It the corresponding standard decoder is shown in Fig. 9.
also is possible to define a stopping rule that results in The blocks used in the encoder are two FSMs, a one-
variable number of iterations. to-two broadcaster and a puncture/binary mapper. The
1192 ITERATIVE DETECTION ALGORITHMS IN COMMUNICATIONS
Figure 9. The min-sum iterative decoder for the PCCC in Fig. 8. SO[xk] SI[xk]
Figure 11. The soft inverse of the transition subsystem.
decoder exploits the special properties of a one-to-two Activation is equivalent to one update of the backward and
broadcaster and efficient min-sum metric representation forward state metric recursions, and completion for both the
using normalization techniques. Even such a simple code FSM input and output.
is capable of approaching the theoretical limits within
1 dB of SNR.
under any activation schedule as long as it satisfies some
3.1. Sufficient Condition for Optimality of the Standard basic requirements. Intuitively, the soft-in information for
Rules each node must be passed to all other nodes before the
values stabilize. So, in this example, the globally optimal
Note that the iterative detector in Fig. 3 has the same solution is achieved by applying the standard rules to a
input/output format as a soft inverse. Specifically, the concatenated system model.
detector takes in soft-in information on the global system This is not the case in general. For example, for
inputs and outputs and performs an update to produce the iterative decoder shown in Fig. 9, one will observe
soft-out information on these same variables. Since the conditions where the soft information passed does not
soft inverse is defined relative to well-established optimal completely stabilize. Consider the explicit index diagram
receiver criteria, the natural question arises: When does for the PCCC encoder of Fig. 8 as illustrated in Fig. 12.
applying the standard iterative detection rules to a system Note that this contains two FSM subgraphs of the form
model yield the global system soft inverse, and hence the in Fig. 10 (a), but with connections due to the interleaver.
optimal receiver for the global system? Interestingly, there Ignoring the directions of the arrows, one can trace a loop
is a simple sufficient condition for this to occur. This is by following the connections in the diagram. Intuitively,
developed through the following examples.
Note that the general rules were not dependent on the
implicit index convention and apply as well to explicit
c0(0) c0(1) c1(0) ck(0) ck(1) cK − 2(0) c (0)
index diagrams. Consider, for example, the explicit index (even k) cK − 2(1) K − 1
··· ···
diagram for the general FSM as sown in Fig. 10 (a) s0(1) s1(1) s2(1) sk(1)+ 1 (1)
(1) sk − 1
in which the FSM has been decomposed or modeled T0(1) T1(1) T (1)
k TK−2 TK(1)− 1
by a series of small transition nodes or subsystems, sk(1) sK(1)− 2
b0 b1 bk bK − 2 bK − 1
each defining one transition in the trellis. Applying the ··· ···
standard definitions, it can be shown that the soft inverse b0 b1 bk
of each of these transition nodes performs one forward (Tail bits shown)
state recursion step, one backward state recursion step, Interleaver
and a completion step for both the FSM input and output.
a0 a1 ··· ak ··· aK − 2 aK − 1
This is illustrated in Fig. 11. As a result, running the
standard iterative detection rules on the receiver shown s0(2) s1(2) s2(2) sk(2)+ 1 sk(2)− 1
T0(2) T1(2) Tk(2) TK(2)− 2 TK(2)− 1
in Fig. 10 (b), with a specific activation schedule, yields sk(2) sK(2)− 2
exactly the FBA-based soft inverse of the FSM. Even more ··· ···
remarkable, it can be shown that after some point, further d1(1) dk(1) (odd k) dK − 1(1)
activation of the soft inverse nodes does not change the Figure 12. The explicit index block diagram of the PCCC shown
soft information. In fact, the soft information will stabilize in Fig. 8.
ITERATIVE DETECTION ALGORITHMS IN COMMUNICATIONS 1193
one may understand how this could compromise the of severe parameter dynamics at very low SNR. Work in
optimality of the global detector. Specifically, the initial this area can be found in [1,27–31].
belief obtained from the channel for a particular variable A similar situation can arise when all parameters are
may propagate through the processing and return to be known, but there is a subsystem that has a prohibitively
‘‘double counted’’ by the original processing node. This can complex soft inverse. In such cases, one can attempt
lead to convergence to a false minimum. to approximate the soft inverse with an algorithm of
In fact, it can be shown rigorously that if no cycles reasonable complexity. One important case is an FSM
exist in the explicit index model (or equivalently the subsystem with a large number of states. One approach
graphical model), then under some simple requirements is to use decision feedback techniques similar to those
on the activation schedule, the standard rules result in used in reduced state sequence detectors [32]. There are
convergence to the globally optimal solution. This fact a number of variations on this theme suggested for
was known in computer science for some time [2,25]. reduced state SISO algorithms primarily applied to ISI
Two remarkable facts did arise, however, from the mitigation (equalization) [33,34]. Another approach is to
research efforts in communications and coding. First, at use an approximate soft inverse based on a constrained
a conceptual level, there really is only one algorithm. receiver structure. For example, one may use a linear
For example, the FBA and the turbo decoding algorithm or decision feedback equalizer, modified to update soft
are both just examples of running the standard rules on information, in place of an FBA-based soft inverse [35].
different system models [3,5,8,26]. Second, although this Another case in which some modification of the rules
processing on graphical models with loops is suboptimal in can be helpful is when there are short cycles in the
general, it is highly effective in many practical scenarios. underlying graphical models. In such cases, it is often
This can be motivated intuitively in a manner similar to observed that convergence occurs quickly, but performance
the finite traceback approximation in the Viterbi algorithm is poor. As is the case in many iterative algorithms,
or the fixed-lag approximation in the FBA. Specifically, if this effect can be alleviated somewhat by attempting to
the local neighborhood of all nodes is cycle-free, then one slow down the convergence of the algorithm. This can be
can expect the processing to well-approximate the optimal accomplished in a number of ways. For example, filtering
solution. the messages over iterations so as to slow their evolution or
In summary, there is one standard iterative detection simply scaling them to degrade the associated confidence
algorithm that is optimal when applied to cycle-free graphs can provide significant improvements in applications
and generally suboptimal, yet very effective, when applied where the basic approximations break down [1].
to graphical models with loops. One reasonable convention
for terminology is to refer to the algorithm on cycle- 4. APPLICATIONS AND IMPACT
free graphs as message passing and on loopy graphs as
iterative message passing. Also note that selection of the Once a general view of the iterative detection algorithm is
system model is critical in determining the complexity understood, applying the approach to various problems is
and performance of the algorithm. For example, use of relatively straightforward. A detailed description of these
the approach on models with cycles is most effective when applications is beyond the scope of this article. Instead, a
there exists some sparse structure in the system that can brief summary of the applications, the performance gains,
be exploited locally by proper reindexing of the variables. complexity issues, and representative references is given
in this section.
3.2. Modified Rules
Despite the elegance of the single set of message-passing • Turbo-Like Codes: Following the invention of turbo
rules, in some applications modification of these rules codes, many variations on the theme were described
is necessary or desirable. One example is when there in the literature. This includes serial concatenated
is some uncertainty in the structure of one or more convolutional codes (SCCCs) [36], low-density parity
subsystems. This occurs, for example, when some channel check (LDPC) codes [37,38], and product codes [39].
parameters are not completely known. In this case, the The most significant difference in the decoding
input–output relation for subsystems in contact with the algorithm from what has been presented herein is
channel may not be completely defined. This requires the use of indirect system models to perform the soft
consideration of the proper marginal soft inverse definition inversion. For example, it is possible to perform soft
in the presence of parametric uncertainty. Typically, inversion using the parity check structure of a block
the theory will suggest that exhaustive combining and code. This is possible because one needs only to be
marginalization is required to perform soft inversion in able to identify allowable configurations to combine
the presence of unknown parameters. However, greedy and marginalize over. Decoding of LDPC codes
approximations to this processing work well in practice. and product codes based on high-rate block codes
Such solutions typically are based on applying decision can be performed efficiently using this approach.
feedback or memory truncation techniques to recursive Codes similar in performance and structure to LDPC
formulations of the soft inverse computation. This yields codes, but with simpler encoding have also been
practical adaptive SISO modules that approximate the suggested. These include the generalized repeat-
soft inverse. Incorporating the parameter estimation and accumulate codes [40,41] and parallel concatenated
tracking tasks into the iteration process is a powerful tool. zig-zag codes [42], which are both based on a
Such adaptive iterative detection algorithms allow tracking recursive single parity check codes and broadcasters.
1194 ITERATIVE DETECTION ALGORITHMS IN COMMUNICATIONS
Code design techniques have developed to the point of code SISO modules. Utilizing the code structure
where turbo-like codes are attractive alternatives is especially helpful if there is a high degree of
to previous approaches at virtually all code rates correlation between the signals on the multiple access
and reasonable input block sizes (e.g., >64). For channel (i.e., the channel is heavily loaded). In fact, it
reasonable block sizes and rates, it is not unusual has been demonstrated that users can be separated
to achieve 3 to 8 dB of additional coding gain relative when the only unique feature is an interleaver
to more conventional FEC designs. Codes based pattern (see [1] and references therein).
on PCCCs and SCCCs are commonly adopted for
standardized systems. In the coding application, code In applications where the operating SNR is very low,
construction allows a designer to achieve interleaver there can be an advantage to using min∗ -sum processing
gain as well as iteration gain. A system with over min-sum processing. This advantage typically is 0.2
interleaver gain will perform better as the size of to 1.0 dB for these applications, which are the turbo-
the interleaver increases, while iteration gain simply like codes and similar coding-modulation constructions.
means that iteration improves the performance. For other applications, such as joint equalization and
• Modulation and Coding: Based on the knowledge decoding or interference mitigation, there is little practical
of the design rules for turbo-like codes and the stan- advantage to using min∗ -sum processing over min-sum
dard iterative detection paradigm, several common processing.
modulation and coding system designs have been The number of iterations required varies significantly
demonstrated to benefit greatly from iterative pro- with application as well. Generally, systems with weak
cessing. One example is the serial concatenation of local structure converge more slowly. This is the case with
a convolutional code, an interleaver, and a recursive LDPC and similar codes that are based on single-parity-
inner modulator, which can be viewed as an effective, check codes. For a typical SCCC or PCCC 6 to 10 iterations
simple SCCC. This has been exploited in the case will yield the majority of the gains for most practical
of the inner system being a differential encoder [43] scenarios. In many of the interference mitigation and
and in the case of the inner modulation being CPM, equalization applications, most of the gains are achieved
which has a recursive representation [44,45]. Simi- with 3 to 5 iterations.
larly, simple schemes such as bit interleaved coded In all cases, the complexity increase relative to a
modulation (BICM), where coded bits are interleaved standard solution is moderate. In the case of turbo-like
and then mapped onto a nonbinary constellation, codes, it is not uncommon to achieve better performance
have been shown to benefit significantly from iter- with less complexity than conventional designs. In other
ative decoding/demodulation [46]. The benefits in applications, the soft inverse typically is 2 to 4 times
these various applications range from 1 to 6 dB of as complex as a subsystem processor based on the
SNR improvement for typical scenarios. segregated design paradigm. Substantially more memory
• Equalization and Decoding: Many systems are is also required. While not insignificant, these increases
designed with coding, interleaving, and a propagation in memory and computation requirements impact digital
channel that results in ISI. A number of researchers circuitry, which continues to experience a steady, rapid
realized the applicability of the standard iterative improvement in capabilities.
approach to such a scenario, typically with convolu-
tional coding and a relatively short ISI channel delay
spread [47–49]. In this case, the ISI and code SISOs 5. CONCLUSION
are both implemented using the FBA. For typical sce-
narios, soft-out equalization provides approximately Iterating soft-decision information to improve the per-
2.5 dB of gain in SNR over hard decision processing formance of practical digital communication systems can
and iteration provides another 4 dB in SNR. If the be motivated as an approach to approximate the opti-
channel is fading, then most of the iteration gain mal receiver and improve upon the performance of the
is lost when the average performance is considered. segregated design. This intuitive notion can be formal-
This is because the worst-case fading conditions, for ized using the standard tools of communication receiver
which there is little iteration gain, dominate the per- design, namely, Bayesian decision theory. This results in
formance [1]. In the case of severe channel dynamics, a standard approach that is based on the notion of the
significant iteration gain will be achieved in fading soft inverse of a system and the exchange and update
if a properly designed adaptive iterative detection of soft information. This approach, viewed generally,
algorithm is used. Unless combined with a turbo-like describes a standard algorithm for optimal or near-optimal
code or equivalent modulation techniques, there is no data detection for complex systems. A sufficient condition
interleaver gain in this application. for optimality (i.e., minimum error probability) is that
• Interference Mitigation: Similar to the previous the underlying graphical model describing dependencies
application, one can use the standard iterative between system variables be cycle-free. In the case where
algorithm to perform joint decoding and like-signal cycles exist, which is the case in many practical applica-
interference mitigation. Specifically, considering the tions of interest, the performance of the standard approach
case where each user’s data is coded and interleaved, is often extremely good. The approach has been demon-
they can be effectively separated by using an strated to improve performance substantially in a number
interference mitigation SISO module and a bank of relevant applications. The general approach is becoming
ITERATIVE DETECTION ALGORITHMS IN COMMUNICATIONS 1195
better known to researchers in the field, and the results are 16. J. Lodge and M. Moher, Maximum likelihood estimation of
finding adoption in engineering practice rather quickly. CPM signals transmitted over Rayleigh flat fading channels,
IEEE Trans. Commun. 38: 787–794 (June 1990).
17. G. D. Forney, Jr., The Viterbi algorithm, Proc. IEEE 61:
Acknowledgments
268–278 (March 1973).
The author thanks John Proakis for his encouragement and
understanding during the preparation of this article; Kluwer 18. R. W. Chang and J. C. Hancock, On receiver structures for
Academic Publishers for permission to use material from [1]; channels having memory, IEEE Trans. Inform. Theory IT-12:
and the co-authors of [1], Achilleas Anastasopoulos and Xiaopeng 463–468 (Oct. 1966).
Chen.
19. P. L. McAdam, L. R. Welch, and C. L. Weber, M.A.P. bit
decoding of convolutional codes, Proc. IEEE Int. Symp. Info.
Theory (1972).
BIBLIOGRAPHY
20. L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, Optimal
decoding of linear codes for minimizing symbol error rate,
1. K. M. Chugg, A. Anastasopoulos, and X. Chen, Iterative
IEEE Trans. Inform. Theory IT-20: 284–287 (March 1974).
Detection: Adaptivity, Complexity Reduction, and Applica-
tions, Kluwer Academic Publishers, 2001. 21. A. J. Viterbi, Justification and implementation of the MAP
decoder for convolutional codes, IEEE J. Select. Areas
2. J. Pearl, Probabilistic Reasoning in Intelligent Systems:
Commun. 16: 260–264 (Feb. 1998).
Networks of Plausible Inference, Morgan Kaufmann, San
Francisco, 1988. 22. G. Masera, G. Piccinini, M. R. Roch, and M. Zamboni, VLSI
architectures for turbo codes, IEEE Trans. VLSI 7: (Sept.
3. R. J. McEliece, D. J. C. MacKay, and J. F. Cheng, Turbo
decoding as an instance of Pearl’s ‘‘belief propagation’’ 1999).
algorithm, IEEE J. Select. Areas Commun. 16: 140–152 (Feb. 23. P. A. Beerel and K. M. Chugg, A low latency SISO with
1998). application to broadband turbo decoding, IEEE J. Select.
4. F. Kschischang and B. Frey, Iterative decoding of compound Areas Commun. 19: 860–870 (May 2001).
codes by probability propagation in graphical models, IEEE 24. P. Thiennviboon and K. M. Chugg, A low-latency SISO via
J. Select. Areas Commun. 16: 219–231 (Feb. 1998). message passing on a binary tree, in Proc. Allerton Conf.
5. S. M. Aji and R. J. McEliece, The generalized distributive Commun., Control, Comp. (Oct. 2000).
law, IEEE Trans. Inform. Theory 46: 325–343 (March 2000). 25. F. V. Jensen, An Introduction to Bayesian Networks,
6. C. Berrou, A. Glavieux, and P. Thitmajshima, Near shannon Springer-Verlag, 1996.
limit error-correcting coding and decoding: turbo-codes, 26. F. Kschischang, B. Frey, and H.-A. Loeliger, Factor graphs
in Proc. International Conf. Communications, (Geneva, and the sum-product algorithm, IEEE Trans. Inform. Theory
Switzerland), pp. 1064–1070, May 1993.
47: 498–519 (Feb. 2001).
7. C. Berrou and A. Glavieux, Near optimum error correcting
27. A. Anastasopoulos and K. M. Chugg, Adaptive soft-input soft-
coding and decoding: turbo-codes, IEEE Trans. Commun. 44:
output algorithms for iterative detection with parametric
1261–1271 (Oct. 1996).
uncertainty, IEEE Trans. Commun. 48: 1638–1649 (Oct.
8. N. Wiberg, Codes and Decoding on General Graphs. PhD 2000).
thesis, Linköping University (Sweden), 1996.
28. A. Anastasopoulos and K. M. Chugg, Adaptive iterative
9. T. Richardson and R. Urbanke, The capacity of low-density detection for phase tracking in turbo-coded systems, IEEE
parity-check codes under message-passing decoding, IEEE Trans. Commun. 49: 2135–2144 (Dec. 2001).
Trans. Inform. Theory 47: 599–618 (Feb. 2001).
29. M. C. Valenti and B. D. Woerner, Refined channel estimation
10. H. E. Gamal and J. A. R. Hammons, Analyzing the turbo for coherent detection of turbo codes over flat-fading channels,
decoder using the Gaussian approximation, IEEE Trans. IEE Electron. Lett. 34: 1033–1039 (Aug. 1998).
Inform. Theory 47: 671–686 (Feb. 2001).
30. J. Garcí a-Frí as and J. Villasenor, Joint turbo decoding and
11. P. Robertson, E. Villebrum, and P. Hoeher, A comparison
estimation of hidden Markov sources, IEEE J. Select. Areas
of optimal and suboptimal MAP decoding algorithms
Commun. 1671–1679 (Sept. 2001).
operating in the log domain, in Proc. International Conf.
Communications, (Seattle, WA), pp. 1009–1013, 1995. 31. G. Colavolpe, G. Ferrari, and R. Raheli, Noncoherent itera-
tive (turbo) detection, IEEE Trans. Commun. 48: 1488–1498
12. T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction
to Algorithms, Cambridge, Mass.: MIT Press, 1990. (Sept. 2000).
13. S. Benedetto, G. Montorsi, D. Divsalar, and F. Pollara, Soft- 32. M. V. Eyuboğlu and S. U. Qureshi, Reduced-state sequence
input soft-output modules for the construction and distributed estimation with set partitioning and decision feedback, IEEE
iterative decoding of code networks, European Trans. Trans. Commun. COM-38: 13–20 (Jan. 1988).
Telecommun. 9: 155–172 (March/Apr. 1998). 33. X. Chen and K. M. Chugg, Reduced state soft-in/soft-out algo-
14. G. D. Forney, Jr., Maximum-likelihood sequence estimation rithms for complexity reduction in iterative and non-iterative
of digital sequences in the presence of intersymbol inter- data detection, in Proc. International Conf. Communications,
ference, IEEE Trans. Inform. Theory IT-18: 284–287 (May (New Orleans, LA), 2000.
1972). 34. P. Thiennviboon, G. Ferrari, and K. Chugg, Generalized
15. S. Verdú, Minimum probability of error for asynchronous trellis-based reduced-state soft-input/soft-output algorithms,
Gaussian multiple-access channels, IEEE Trans. Inform. in Proc. International Conf. Communications, (New York),
Theory 32: 85–96 (Jan. 1986). pp. 1667–1671, May 2002.
1196 ITERATIVE DETECTION METHODS FOR MULTIUSER DIRECT-SEQUENCE CDMA SYSTEMS
35. M. Tuchler, R. Koetter, and A. Singer, Turbo equalization: the demands on enabling communication networks. The
principles and new results, IEEE Trans. Commun. 50: continuous development of ever-faster computers allows
754–767 (May 2002). for the development of ever larger communication systems
36. S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, Serial to meet the ever-increasing demand for capacity to support
concatenation of interleaved codes: performance analysis, the never-ending supply of new communications services.
design, and iterative decoding, IEEE Trans. Inform. Theory The third-generation (3G) mobile network, the so-called
44: 909–926 (May 1998). IMT2000 (International Mobile Telecommunication 2000)
37. R. G. Gallager, Low density parity check codes, IEEE Trans. system, is the next step to bringing wireless Internet
Inform. Theory 8: 21–28 (Jan. 1962). to the general consumer [1]. The 3G cellular mobile
38. D. J. C. MacKay, Good error-correcting codes based on very network provides up to 2 megabits per second (Mbps) for
sparse matrices, IEE Electron. Lett. 33: 457–458 (March indoor environments and 144 kilobits per second (kbps) for
1997). vehicular environments. The two dominating standards
39. J. Hagenauer, E. Offer, and L. Papke, Iterative decoding of for 3G networks are UMTS (Universal Mobile Telephone
binary block and convolutional codes, IEEE Trans. Inform. System) and cdma2000, respectively [1]. They are both
Theory 42: 429–445 (March 1996). based on direct sequence, code-division multiple-access
40. H. Jin, A. Khandekar, and R. McEliece, Irregular repeat (DSCDMA) technology, which was considered to provide
accumulate codes, in Turbo Code Conf., (Brest, France), 2000. the best alternative within the standardization process.
DSCDMA is a spread-spectrum transmission technology
41. K. R. Narayanan, I. Altunbas, and R. Narayanaswami, On
the design of LDPC codes for MSK, in Proc. Globecom Conf., where each user in principle is assigned a unique signature
(San Antonio, TX), pp. 1011–1015, Nov. 2001. waveform, creating a distinguishing feature separating
multiple users and thus providing multiple access [2].
42. L. Ping, X. Huang, and N. Phamdo, Zigzag codes and
concatenated zigzag codes, IEEE Trans. Inform. Theory 47:
In popular terms, these principles may be likened to
800–807 (Feb. 2001). conversations at a cocktail party where each conversation
is conducted in a unique language. Even though
43. P. Hoeher and J. Lodge, Turbo DPSK: iterative differential
a particular conversation can be clearly heard by
PSK demodulation and channel decoding, IEEE Trans.
Commun. 47: 837–843 (June 1999). surrounding people engaged in other conversations, they
do not become disturbed since they do not understand the
44. K. Narayanan and G. Stuber, Performance of trellis-coded
language and thus consider it as background noise. In
CPM with iterative demodulation and decoding, IEEE Trans.
case two languages are closely related, two simultaneous
Commun. 49: 676–687 (Apr. 2001).
conversations may interfere with each other. The same
45. P. Moqvist and T. Aulin, Serially concatenated continuous phenomenon occurs in CDMA if two users are assigned
phase modulation with iterative decoding, IEEE Trans.
signature waveforms that are closely correlated. This is
Commun. 49: 1901–1915 (Nov. 2001).
termed multiple-access interference (MAI) and is one of the
46. X. Li and J. A. Ritcey, Trellis-coded modulation with bit performance-limiting factors when conventional single-
interleaving and iterative decoding, IEEE J. Select. Areas user technologies are used in CDMA systems [3,4]. The
Commun. 17: 715–724 (Apr. 1999).
basics of spread-spectrum and CDMA technologies are
47. C. Douillard, Iterative correction of intersymbol interfer- described in more details elsewhere in this book. For
ence: Turbo equalization, European Trans. Telecommun. 6: further information, see also Refs. 1,2,5 and 6.
507–511 (Sept. 1995).
Third-generation mobile cellular systems are designed
48. A. Anastasopoulos and K. M. Chugg, Iterative equaliza- for multimedia communications. The standards of person-
tion/decoding of TCM for frequency-selective fading chan- to-person communications can be improved through better
nels, in Proc. Asilomar Conf. Signals, Systems, Comp., voice quality and the possibility of exchanging high-quality
pp. 177–181, Nov. 1997.
images and video. In addition, access to information and
49. A. Picart, P. Didier, and A. Glavieux, Turbo-detection: a new services on public and private networks will improve
approach to combat channel frequency selectivity, in Proc. through higher data rates and variable data rate options,
International Conf. Communications, (Montreal, Canada), introducing a high level of flexibility. Through the
1997.
standardization process, CDMA technologies came out
as the overall winners. The UMTS system is based on
so-called wideband CDMA (WCDMA) [7]. There are no
ITERATIVE DETECTION METHODS FOR conceptual differences between WCDMA and CDMA. The
MULTIUSER DIRECT-SEQUENCE CDMA former merely uses a bandwidth that is significantly
SYSTEMS larger than second-generation CDMA systems, leading to
additional spread-spectrum advantages such a robustness
LARS K. RASMUSSEN toward hostile mobile radio channels.
University of South Australia The flexible data rates, and especially the high data
Mawson Lakes, Australia rates of 2 Mbps, represent significant challenges for
equipment manufacturers. For the system load to be
1. INTRODUCTION commercially viable, technologies providing considerable
system capacity improvements are required. Three areas
The continuing development of the Internet, wireless of technology have been identified as enabling techniques
communication and wireless Internet is rapidly increasing for 3G CDMA systems:
ITERATIVE DETECTION METHODS FOR MULTIUSER DIRECT-SEQUENCE CDMA SYSTEMS 1197
• Error control coding [8,9] in order to find the combination that is maximum likely.
• Multiuser detection [5] Maximum-likelihood (ML) joint multiuser detection has
• Space–time processing [10] been suggested by several authors in the literature [12,13],
most notably by Verdú [14]. Verdú [14] suggested the use
of the Viterbi search algorithm [16] for detection was as
In this article, we discuss the use of multiuser detection
an efficient implementation of the exhaustive search.
for CDMA systems in general, and the use of iterative
The ML problem is known to be a so-called NP-hard
multiuser detection strategies in particular.
problem [16] and can be solved only by an exhaustive
search as described above, leading to a detection
1.1. Multiuser Detection complexity that grows exponentially with the number of
Historically, multiple-access systems have been designed users. In some important cases, this complexity growth
to avoid the MAI problem. This is accomplished by dividing is beyond practical implementation and will remain so
the available system resources into dedicated portions for the foreseeable future. To address this complexity
for the exclusive use by a designated communication problem, an abundance of receiver structures have been
connection. In a CDMA system, we break with these proposed [5,17]. Most of these structures are sub-optimal
principles and allow users to utilize all resources approximations to classic design criteria.
simultaneously. Under certain system conditions, it is Among the first complexity reducing schemes, the
possible to avoid MAI. However, such conditions are in decorrelating detector was suggested [18]. The detector
general impossible to achieve in a practical setting, and decorrelates the data symbols, removing the MAI entirely
thus, a certain level of MAI is to be expected. through linear filtering. The filter is determined by the
Initially, the MAI was considered as unavoidable inverse of the channel matrix. The filter removes all
interference and assumed to possess similar statistical MAI at the expense of noise enhancement. A slightly
characteristics as thermal background noise generated different approach is taken by the linear minimum
in electronic components. Based on such arguments, the mean-squared error (LMMSE) detector, which minimizes
optimal receiver structures developed for the case of the mean squared error between detector output and
thermal noise only, are also optimal in the case of MAI the transmitted symbol [19]. This detector takes the
with similar statistical behavior [11]. The performance thermal noise as well as the correlation between users
of such systems are determined by the signal power to into account and therefore generally performs better
noise power ratio. It is therefore tempting to increase than the decorrelator in terms of bit error rate (BER).
the transmitted signal power to improve performance. Both the decorrelator and the LMMSE detector require
However, if all active users do that, the power of matrix inversion, which is generally also prohibitively
the interfering MAI increases with the same amount, complex for practical implementation of realistically sized
providing no performance gains at all. On the basis of systems. As an alternative, the LMMSE detector can be
these assumptions and techniques, it follows that the approximated by adaptive detectors such as described in
systems are interference limited. A strict upper limit on the literature [20–22].
active users, corresponding to a certain signal to total Since even linear detectors have complexity problems,
noise level, decides the system capacity [5]. multiuser detection for CDMA was initially considered
The problem with this approach is that each active to be prohibitively complex for practical implementation.
user is making decisions in isolation, regarding the cor- With the massive research conducted in conjunction with
responding MAI as unstructured noise. As an alternative, IMT2000, however, this view has now changed. For prac-
a joint decision among all users simultaneously takes the tical implementation, interference cancellation schemes
known structure of the MAI into account, treating the MAI have been subject to most attention. These techniques
as additional information in the decision process. Assum- rely on simple processing elements constructed around
ing binary transmission (two possible signal waveform conventional receiver concepts. The main component in a
alternatives for transmission for each user), a decision conventional receiver is a filter matched to the user-specific
made in isolation is based on a choice between which of signaling waveform. An estimate of the contribution of a
the two signal waveform alternatives were transmitted, specific user to the composite received signal of all users
ignoring the known structure of the MAI. In contrast, can be generated based on the corresponding matched-
a joint multiuser decision is based on evaluating a suit- filter output. For a specific user, we can now subtract
able cost function for all possible transmitted waveform the contributions to the received signal made from all
combinations between the active users, selecting the com- other users. If this estimate of the MAI experienced by
bination that maximizes the cost function. For optimal that specific user is correct, we can effectively eliminate
detection, we shall attempt to maximize the probability the disturbance, obtaining a clean signal with no MAI.
of a certain combination of data symbols being trans- In practise the MAI estimate is rarely completely correct,
mitted, given the particular received signal. Assuming leaving some residual interference. Cancellation can then
that all data symbols are equally likely, this optimization be done iteratively to improve the quality of the resulting
criterion is equivalent to maximizing the corresponding signal for detection of a specific user [23,24]. A family of
likelihood function [11]. For three active users, each using structures can be defined based on how MAI estimates are
binary transmission, we need to evaluate the likelihood generated [23,25,26].
function for 23 combinations of data symbols, or in gen- Let us assume that we have tentative decisions for
eral for K active users, 2K combinations of data symbols the data symbols for all users. On the basis of these
1198 ITERATIVE DETECTION METHODS FOR MULTIUSER DIRECT-SEQUENCE CDMA SYSTEMS
decisions we can make an estimate of the MAI experienced is piecewise linear [30]. A similar shape can be obtained
by each user, and subtract the corresponding estimates on the basis of nonlinear principles as shown in Fig. 1
from the received signal. With K parallel processing for a hyperbolic tangent function [31,32]. It should be
units, one iteration for each user can be done in parallel. noted that once the cancellation process is completed, hard
This approach is naturally denoted parallel interference decisions are applied to the resulting decision statistics in
cancellation (PIC) [23]. As an alternative, we can process case a final decision is required. In some cases, the soft
one user at a time. From a single cancellation operation, cancellation output is used directly as input to an error
we get an updated tentative decision for a specific user control decoder [33], in which case no final decisions are
that can now be used in the process of generating made in the iterative detector.
an updated MAI estimate for the following user. This The cancellation strategy and the tentative decision
way, the most updated information is always used in function define the character of an interference cancel-
the cancellation process. In this approach, the users lation structure. The best performance in terms of bit
are processed successively, introducing a detection delay error rate is obtained by SIC schemes as compared to
between users. The strategy is known as successive PIC schemes. This advantage is, however, achieved at the
interference cancellation (SIC) [25,26]. expense of detection delays due to the successive process-
In the discussion above, we have assumed we have ten- ing. For the tentative decision functions, the linear clip
tative decisions available. Tentative decisions are obtained and the hyperbolic tangent generally provide better per-
based on so-called decision statistics, which are passed formance than do linear and hard decisions. These issues
through a corresponding tentative decision function. The are discussed at length in the remainder of the article.
collection of all matched filter outputs represents a suf- The rest of the article is organized as follows. In
ficient statistic, including all information required for Section 2, an algebraic model is derived for a simple
making an optimal ML decision for all users simultane- synchronous CDMA system. The model is kept simple
ously. The cancellation process, however, is not in general to more clearly illustrate the principles of iterative mul-
an iterative joint detection process. The decision statistics tiuser detection. In Section 3 the fundamental principles
resulting from cancellation are thus not necessarily repre- for interference cancellation are formalized and moti-
senting sufficient statistics. In some special cases, such as vated. Corresponding modular structures are suggested,
linear cancellation [27], however, they are. constructed around a simple interference cancellation
Assume that each user is transmitting binary data unit. The hard tentative decision function is discussed
dk , for example represented by dk = +1 or dk = −1. The in Section 4, while the concept of weighted cancellation
corresponding received decision statistic then contains is introduced in Section 5, as a powerful technique for
contributions from the desired transmitted data symbol, improving convergence speed and BER performance. Lin-
noise generated in the receiver and corresponding MAI. ear cancellation based on the linear tentative decision
On the basis of this received decision statistic, a tentative function is presented in Section 6, where the connection
decision is to be made. In the literature mainly four to classic iterations for solving linear equation systems is
different tentative decision functions have been suggested explained. Here it is shown that the cancellation structure
for interference cancellation. These are shown in Fig. 1. in fact leads to an iterative solution to the constrained
The first one is a linear decision where in fact the decision ML problem. The advantages of the clipped linear ten-
statistic is left untouched [28,29]. Alternatively, a hard tative decision function are detailed in Section 7, and
decision can be made where it is decided whether a +1 or in Section 8, structures using the hyperbolic tentative
a −1 was transmitted [23,25]. This is done on the basis of decision function are discussed. Numerical examples are
the polarity of the decision statistic. These two principles included in Section 9 to illustrate the characteristics of the
can be combined to generate the linear clip function, which schemes discussed, and in Section 10, concluding remarks
are made.
d1 T y1
p1(t ) ⌠ s p1(t )dt
⌡0
• •
• • r (t ) • •
• • + + • •
• •
dK T yK
pK (t ) n (t ) ⌠ s pK (t )dt Figure 2. A simple continuous-time model for a
⌡0
synchronous CDMA system.
where {ak (j): 0 ≤ j ≤ N − 1} is a spreading code sequence We also assume that binary phase shift keying (BPSK)
consisting of N chips that take on values {−1, +1}, and modulation formats are used: D = {−1, +1}. All the
p(t) is a chip pulse of duration Tc where Tc is the chip presented concepts, however, generalize to more elaborate
interval. Here, pk (t) = 0 outside the symbol interval, t < 0, cases. For this simplified case, it is sufficient to only
t > Ts since a short-code system is considered. Thus, we consider one symbol interval. The symbol interval index
have N chips per symbol and Ts = NTc . Without loss of is therefore omitted in the following. The corresponding K
generality, we assume that all K signature waveforms user model is shown in Fig. 2, where φk = 0 for 1 ≤ k ≤ K.
have unit energy: The transmitted signal is assumed to be corrupted
by additive white Gaussian noise (AWGN). Hence, the
Ts
received signal may be expressed as
|pk (t)|2 dt = 1 (2)
0
r(t) = s(t) + n(t) (5)
The information sequence of the kth user is denoted
{dk (l): 0 ≤ l ≤ L − 1}, where the value of each information where n(t) is the noise with double-sided power spectral
symbol may be chosen from the set D, and L denotes the density σn2 = N0 /2. The sampled output of a chip matched
block length of the transmission. For binary transmission, filter (CMF) for chip interval j, user k and an arbitrary
we have D = {−1, +1}. The corresponding equivalent symbol interval is [11]
lowpass transmitted waveform may be expressed as (j+1)Tc
rj = r(t)p(t − jTc )dt, 0≤j≤N−1 (6)
L−1
jTc
sk (t) = e jφk (l)
dk (l)pk (t − lTs ) (3)
l=0 Collecting all CMF outputs in a column vector
The composite signal for the K users may be expressed r = (r1 , r2 , . . . , rN )T (7)
as follows, assuming for simplicity a single-path channel
with unit magnitude: we have
1
K
K
K
L−1 r= √ ak dk + n (8)
s(t) = sk (t − τk ) = e jφk (l) dk (l)pk (t − lTs − τk ) N k=1
k=1 k=1 l=0
(4) where ak is a length N vector representing the code
where {τk : 1 ≤ k ≤ K} are the transmission delays, which sequence {ak (j): 0 ≤ j ≤ N − 1} and n is a length N
satisfy the condition 0 ≤ τk ≤ Ts for 1 ≤ k ≤ K and Gaussian noise vector with autocorrelation matrix
{φk (l): 1 ≤ k ≤ K, 0 ≤ l ≤ L − 1} are the random-phase
rotations, assumed constant over one symbol interval. This E{nnT } = σn2 I (9)
is the model for a multiuser signal in asynchronous mode,
which is typical for an uplink scenario. In the special case Define
of synchronous transmission, τk = 0 for 1 ≤ k ≤ K, which 1
sk = √ ak (10)
is a typical scenario for downlink transmission. This model N
can easily be extended to model transmission over multi-
path channels. The extension is discussion in more detail On the basis of Eq. (8), discrete-time code matched filtering
in, for example, Ref. 34. for user k can be conveniently described as
To make the presentation in the following sections
notationwise and conceptually simple, we will focus
K
K
yk = sTk r = sTk si di + sTk n = sTk si di + zk
on a synchronous CDMA system without any random
i=1 i=1
phase rotation, specifically, τk = 0 and φk (l) = 0 for 1 ≤
k ≤ K, 0 ≤ l ≤ L − 1. A synchronous system is naturally = dk + ρki di + zk = dk + wk + zk (11)
associated with downlink transmission while multiuser i =k
detection strategies are intended primarily for uplink
transmission. We still maintain a synchronous model in where ρki = sTk si is the cross-correlation between the
this presentation as it greatly simplifies notation and spreading codes of users k and i, zk is the Gaussian noise
conception. The extensions to asynchronous systems are experienced by user k, and wk is the MAI experienced by
straightforward once the basic principles are understood. user k, respectively.
1200 ITERATIVE DETECTION METHODS FOR MULTIUSER DIRECT-SEQUENCE CDMA SYSTEMS
Considering the algebraic structure of the discrete-time The correlation matrix can be partitioned as R = I + M,
received signal yk described in (11), we can conclude the fol- where I is the identity matrix and M is the corresponding
lowing. Interference arises since, in general, every output, off-diagonal matrix. We can then write (16) as follows
yk , has a contribution from every input, {dk : 1 ≤ k ≤ K}.
Under ideal conditions where all users are orthogonal to y = (I + M)d + z = d + Md + z (19)
each other, all the cross-correlations are zero and there
is no interference. In practice such a scenario is virtually where d is the desired signal vector and Md is MAI.
impossible to achieve, and thus each output has contri-
butions from all users. When the MAI is ignored in the 3. PRINCIPLE STRUCTURE
detection process, the performance is strongly interfer-
ence limited. However, when the structure of the MAI is It was argued in Section 2 that the decision statistics for
considered in the detector, considerable gains are possible. detection are polluted by MAI. Let us for a moment assume
The models in (8) and (11) can conveniently be extended that, somehow, the MAI is perfectly known at the receiver.
to include all users, applying linear algebra to provide a It is then possible to eliminate the interference simply by
compact description. Equation (8) can be described as subtracting it from the received signal:
r = Sd + n (12)
xk = yk − ρki di = dk + zk (20)
i =k
where S is a N × K matrix containing the spreading
codes (10) of all users as columns By subtracting the known MAI, we obtain an interference-
free received signal, and thus the performance is identical
S = (s1 , s2 , . . . , sK ) (13) to the case where only one user is present, the so-called
single-user (SU) case.
and d is a length K vector of user symbols Unfortunately, the MAI is not perfectly known at the
receiver. To have perfect knowledge of the MAI requires
d = (d1 , d2 , . . . , dK )T (14) perfect knowledge of the transmitted symbols, in which
case there would be no information contained in the
Each decision statistic is described by (11). Collecting all transmission. Instead of perfect knowledge, we can use an
decision statistics in a vector estimate of the transmitted symbols. For example, let the
initial estimate for user k, u1,k be determined by a hard
y = (y1 , y2 , . . . , yK )T (15)
decision based on the corresponding received matched-
filter output, yk , i.e., the polarity of yk decides whether
we arrive at the following model
u1,k = 1 or u1,k = −1
y = ST Sd + ST n = Rd + z (16)
u1,k = Sgn (yk ) (21)
where R is a symmetric, positive semi-definite correlation
where Sgn(·) denotes the polarity check function, which is
matrix of dimension K × K
the same as the hard decision in Fig. 1:
1 ρ12 · · · ρ1K
. .. 1 x≥0
R = .. . (17) Sgn (x) = (22)
−1 x < 0
ρK1 ρK2 · · · 1
An updated decision statistic, x2,k after one step of MAI
and z is a vector of length K, containing the Gaussian subtraction is then
noise samples with autocorrelation function:
x2,k = yk − ρki u1,i = dk + ρki (di − u1,i ) + zk (23)
E{zzT } = σn2 R (18) i =k i =k
This strategy can be continued until no further improve- The residual error vector can be updated recursively:
ments are obtained. In case the updates for all the users
are done simultaneously, this iterative multiuser detection
k
K
scheme is also known as hard-decision, multistage parallel em+1,k+1 = r − si um+1,i − si um,i (34)
i=1 i=k+1
interference cancellation. It is called interference cancel-
lation (IC) for obvious reasons as we attempt to cancel
k−1
K
MAI. It is termed ‘‘parallel’’ since updated decision statis- =r− si um+1,i − si um,i
tics for all the users are determined in parallel, based on i=1 i=k
the same tentative estimates of the transmitted symbols. − sk um+1,k + sk um,k (35)
The update process is obviously recursive or iterative, a
characteristic that initially was termed ‘‘multistage detec- = em+1,k − sk (um+1,k − um,k )
tion’’ [23]. Finally the scheme is based on hard decisions, = em+1,k − em+1,k (36)
namely, polarity check, on the resulting decision statistics.
This is, however, not necessarily the best tentative Here, em+1,K+1 = em+2,1 .
decision strategy. In Fig. 1, four alternatives are shown: The PIC structure described previously can also be
linear decision, hard decision, clipped linear decision, and described in this manner. In this case
hyperbolic tangent decision. We will later examine these
tentative decision functions in more detail and try to
K
+ si um−1,i − si um−1,i
um+1,k = fx yk − ρki um,i (28) i=1 i=1
i =k
K
K
= em − si (um+1,k − um,k ) = em − em+1,i (38)
with u0,k = 0. i=1 i=1
Cancellation can also be based on the most current
estimate of the MAI. In this case, the MAI estimate is where em+1,k = sk (um+1,k − um,k ) as before. Comparing
updated for each new tentative decision. As a consequence, the cases for SIC and PIC, we see that for user k, the
the users are processed successively, leading to SIC in required input to make an updated tentative decision is
contrast to the PIC described above. An iterative SIC um,k and em+1,k while the output is conveniently um+1,k
scheme is described by and em+1,k . We can thus define a basic interference
cancellation unit (ICU) as shown in Fig. 4, where fx (·)
k−1
K
is a predetermined tentative decision function, possibly
xm+1,k = yk − ρki um+1,i − ρki um,i (29) selected among the four alternatives illustrated in Fig. 1.
i=1 i=k+1 The SIC and the PIC structures are then obtained
um+1,k = fx (xm+1,k ). (30) by different interconnection strategies of ICUs. In Fig. 5,
we have an SIC structure. The residual error vector is
To arrive at a description convenient for implementation, updated according to (36), as it should be. In contrast,
we first use the fact that yk = sTk r and ρki = sTk si : we have a PIC structure in Fig. 6, where the same
residual error vector is input to all ICUs at the same
k−1
K iteration and it is updated according to (37). This modular
xm+1,k = sTk r − si um+1,i − si um,i (31) structure is quite attractive for practical implementation
i=1 i=k+1 and thus IC structures have received most attention
r
u1,1 u2,1 u3,1
e1,1 ICU ∆e1,1 e2,1 ICU ∆e2,1 e3,1 ICU ∆e3,1
Ts Ts Ts
− − −
+ + +
u1,2 u2,2 u3,2
e1,2 ICU ∆e1,2 e2,2 ICU ∆e2,2 e3,2 ICU ∆e3,2
Ts Ts Ts
− − −
+ + +
u1,3 u2,3 u3,3
e1,3 ICU ∆e1,3 e2,3 ICU ∆e2,3 e3,3 ICU ∆e3,3
Ts Ts Ts
− − −
+ + +
u1,4 u2,4 u3,4
e1,4 ICU ∆e1,4 e2,4 ICU ∆e2,4 e3,4 ICU ∆e3,4
Ts Ts Ts
− − −
+ + +
e4,1
Figure 5. A modular SIC structure.
r
u1,1 u2,1 u3,1
e1,1 ICU ∆e1,1 e2,1 ICU ∆e2,1 e3,1 ICU ∆e3,1
for potential commercial use. The regular structure or in a more compact form
also motivates hybrid, or group, cancellation strategies,
combining successive and parallel techniques [35,36]. A um+1 = fx (y − Mum ) (41)
groupwise cancellation structure is shown in Fig. 7.
As mentioned previously, it should be noted that once with u0 = 0. For a convenient algebraic description of SIC,
the cancellation process is completed, hard decisions are the following partition of M is helpful
applied to the resulting decision statistics in case a final
decision out of the detector is required. M= L+U (42)
As in Section 2, the decision statistics can be collected
in vectors and the cancellation process can be described where L is a strictly lower left triangular matrix and U is
conveniently through matrix algebra. A PIC scheme can a strictly upper right triangular matrix, respectively. An
thus be described as iterative SIC scheme is then described by
r
u1,1 u2,1 u3,1
e1,1 ICU ∆e1,1 e2,1 ICU ∆e2,1 e3,1 ICU ∆e3,1
4. HARD TENTATIVE DECISION FUNCTION and given knowledge of all other transmitted symbols. Let
us first define
In the previous section, we defined SIC and PIC principles
and established how they are related. In the continuing d(k) = (d1 , d2 , . . . , dk−1 , dk+1 , . . . , dK )T (48)
discussion, we focus on PIC structures. The extension to
SIC schemes is usually straightforward, although more We now want to maximize
cumbersome notation is required. Known difficulties are p(y, d(k) | dk )
explicitly pointed out. Also in the previous section, the P(dk | y, d(k)) = P(dk )
p(y, d(k))
principles of cancellation were presented in an intuitive
manner. In the following sections, we show that certain p(y | d(k), dk )P(d(k) | dk )
= P(dk ) (49)
iterative interference cancellation structures are in fact p(y, d(k))
iterative realizations of known theoretically justified
detector structures. with respect to dk . Here, p(·) denotes a probability
One of the first suggested IC structures was hard density function while P(·) denotes a probability. The
decision PIC [23]. This structure can be derived as an equality follows from Baye’s rule [11]. Maximizing (49)
approximation to the optimal ML detector, given certain is equivalent to maximizing p(y | d(k), dk ), a problem that
simplifying assumptions. Considering the signal model can be described by the loglikelihood function,
in (16), then given perfect knowledge of the correlation
matrix, the decision statistics are jointly Gaussian d̂k = arg min [dT Rd − 2yT d]
dk ∈{−1,1}
distributed, leading to the following probability density
function which is also the likelihood function:
= arg min d2k + 2dk ρki di − 2yk dk (50)
-1 −1
. dk ∈{−1,1}
i =k
p(y | d) = C exp 2
(y − Rd) R (y − Rd)
T
(45)
where we have thrown away all terms independent of dk
Here, C is a constant which is independent of the and thus do not influence the optimization problem. We
conditional data vector d. The corresponding log-likelihood can also write this as
function is [37]
(d) = dT Rd − 2yT d (46) d̂k = arg max dk yk − ρki di
dk ∈{−1,1}
i =k
and thus the optimal ML decision is the argument
d ∈ {−1, 1}K that minimizes the loglikelihood function: = Sgn yk − ρki di (51)
i =k
optimal structure. Using current estimates of these 6. LINEAR TENTATIVE DECISION FUNCTION
symbols of course leads to an approximating, suboptimal
approach. Depending on the strategy of cancellation, PIC, Let us now focus on the linear tentative decision function.
SIC or hybrids of the two are obtained. In this case, the corresponding iterative detectors are also
linear. We therefore start by considering optimal linear
detectors. Allowing the symbol estimate vector to be any
5. WEIGHTED CANCELLATION real-valued vector, u ∈ K , the corresponding ML solution
is easily found to be [18]
Hard decision cancellation is prone to error propagation
and can exhibit a significant error floor for high SNR, as it u = R−1 y (56)
is in effect interference-limited. This is due to the coarse
approximation that previous symbol estimates provide for Similarly, on the basis of the linear minimum mean-
the MAI squared error criterion, the solution is [19]
xm+1,k = dk + sTk si (di − um,i ) + sTk n = dk + ẑk (52) u = (R + σn2 I)−1 y (57)
i =k
um+1,k = Sgn (xm+1,k ) (53) Before data are delivered, the real-valued estimate must,
of course, be mapped to a valid data symbol. Both optimal
linear detectors rely on matrix inversion, which has a
where complexity of the order of O(K 3 ). The matrix inverse
ẑk = sTk si (di − um,i ) + sTk n (54) represents the solution to a set of linear equations [43].
i =k
There exist, however, efficient iterative techniques for
solving a set of linear equations. As an example, let us
Inherent
assumptions are that the cancellation error, focus on the implementation of the decorrelator, Eq. (56).
sTk si (di − um,i ) is Gaussian and independent of the The set of linear equations to be solved is described by
i =k
thermal noise n. Neither of these assumptions is true.
y = Ru = (I + M)u = (I + L + U)u (58)
Obviously the second assumption cannot be true since each
tentative decision depends on the thermal noise. This was
where we have applied the partition of R described
taken into account in the improved cancellation scheme
previously. A common iteration used for matrix inversion
suggested by Divsalar et al. [38]. Here, it is assumed that
is the Jacobi iteration
the cancellation error and the thermal noise are correlated.
Also, at iteration (m + 1), the detector is derived based on
observing the received signal yk as well as the previous um+1 = y − Mum (59)
decision statistic xm,k . The joint likelihood function is
still derived as conditioned on perfect knowledge of d(k), which is identical to (39) given a linear tentative decision
however, it now depends on the correlation between the as um = fx (xm ) = xm . Here, fx (·) denotes a vector function
Gaussian noise and the residual interference. applying the decision function, fx (·) to each element of the
Following some manipulations, some simplifying argument vector independently. Similarly, the well-known
assumptions and substituting the most current estimate Gauss–Seidel (GS) iteration is described as
um (k) in place of d(k), we arrive at the following revised
decision statistic: um+1 = y − Lum+1 − Uum (60)
which, in turn, is equivalent to (43). It follows that
xm+1,k = µm+1,k yk − ρki um,i + (1 − µm+1,k )xm,k (55) these two classic iterations are linear PIC and linear
i =k SIC, respectively. This was first realized by Elders-Boll
et al. [27]. For the linear case, IC therefore represents
The updated decision statistic is now determined as an iterative implementation of optimal linear detectors,
a weighted sum of the previous decision statistic and given that the particular iteration converges. The GS
the corresponding decision statistic determined by a iteration is guaranteed to converge while the Jacobi
traditional PIC cancellation. The weighting factor, µm+1,k , iteration is not. The linear PIC converges if the iteration
is described by an involved combination of the correlation matrix Mm = (R − I)m converges for increasing m. For
parameters between the Gaussian noise and the residual Mm to converge, the maximum eigenvalue of R must be
error term [38]. It may not be possible to accurately constrained by [44]
determine the weighting factor analytically, but trial- λmax ≤ 2 (61)
and-error selection has shown that the general structure
is very powerful and provides significant performance which is not true for all possible correlation matrices.
gains over traditional hard-decision structures [38]. This To guarantee convergence for a linear PIC scheme, and
technique was originally termed partial cancellation, and in general also to increase convergence speed for linear
the principles are now used in most practical studies of IC IC, more advanced iterations can be used [44–46]. The
techniques [39–42]. concept of over-relaxation can be used for both PIC and
ITERATIVE DETECTION METHODS FOR MULTIUSER DIRECT-SEQUENCE CDMA SYSTEMS 1205
SIC structures. The Jacobi over-relaxation iteration is relatively reliable. This is in contrast to a hard decision
described by based on a decision statistic only marginally different
from zero, which is bound to be relatively unreliable,
um+1 = µ(y − Rum ) + um (62) potentially introducing additional interference with high
probability. A linear tentative decision function is effective
The corresponding iteration matrix is now (µR − when the decision statistic has a magnitude less than
I)m , and thus the relaxation parameter directly scales one. For very small decision statistics, only a very small
the eigenvalues of R, allowing for tuned, guaranteed interference estimate is subtracted, limiting the potential
convergence. Considering the decision statistic for user damage of a wrong decision. As the decision statistic
k, we get increases, increasingly larger interference estimates are
subtracted. For decision statistics with magnitude larger
K
um+1,k = µ yk − ρki um,i + um,k than one, a linear decision will therefore invariably
i=1 introduce additional interference. The transmitted symbol
is limited to unit magnitude and thus we should not
attempt to cancel out more than that.
= µ yk − ρki um,i + (1 − µ)um,k (63)
The clipped linear decision function depicted in Fig. 1
i =k
seems to be an appropriate choice for combining the
which has the structure of (55), emphasizing that these benefits of a hard decision and a linear decision,
advanced iterations correspond to weighted linear IC. The respectively, avoiding the inherent drawbacks. Therefore,
optimal weighting factor for the Jacobi overrelaxation is assume now that we constrain the allowable solution for
determined by the eigenvalue spread of the correlation each user to −1 ≤ um,k ≤ 1 for all k and m. Enforcing
matrix R [43]. Fastest convergence is assured when the these constraints for all users simultaneously describe
positive and the negative mode of convergence for the a K-dimensional hypercube in Euclidean space. Such a
iteration matrix are equal, which is obtained by constraint is denoted a box-constraint and is formally
defined as
µ=
2
(64) K = {d ∈ K : d ∈ [−b, b]} (66)
λmax + λmin
where b is an all ones vector. The corresponding box-
where λmax and λmin are the maximum and minimum
constrained optimization problem is described as
eigenvalues of R, respectively. An asymptotic analysis for
large systems [44] has shown that this optimal weighting
u = arg min[dT Rd − 2yT d] (67)
factor is well approximated by d∈K
N
µ= (65) Ahn [48] first suggested an iterative algorithm for solving
N+K the general problem of constraining the solution to
when N and K are large. The ICU of Fig. 4 should be any convex set. A hypercube, a so-called box, is a
modified as shown in Fig. 8 to accommodate the weighted tight convex set. For any convex set we can define
cancellation of (62). an orthogonal projection. For the box constraint, the
First-order and second-order iterations have also been orthogonal projection is merely the clipped linear decision
suggested, such as the steepest-descent iteration and the function applied independently to all the users as
conjugant gradient iteration [45–47]. For the steepest- demonstrated in Fig. 9. As shown in Refs. 30,49, and 50,
descent iteration, expressions for optimal weighting the algorithm suggested by Ahn is in fact a generalization
factors for both short and long codes have been derived Guo of first order iterations for solving linear equation systems:
et al. [29,46]. For a more thorough analysis and discussion
of linear IC schemes, please consult Refs. 44–46. xm+1 = µ(y − Qum+1 − (R − Q)um ) + um (68)
um+1 = fx (xm+1 ) (69)
7. CLIPPED LINEAR TENTATIVE DECISION FUNCTION
Letting Q = 0, we have a weighted PIC structure, while
A hard decision is effective when the decision statistic is Q = L leads to a weighted SIC structure. It has been
large, in which case the corresponding decision should be shown that the same convergence conditions as for the
linear case apply [48]. It follows that a traditional SIC
um,k um +1,k scheme based on a clipped linear decision always converges
to the solution of the box-constrained ML problem. The
PIC also converges to this solution when an appropriate
weighting factor is chosen. Unfortunately, no analytic
em +1,k − ∆em +1,k results have yet been obtained for deriving optimal
× + fx ( • ) + × weighting factors. In general an SIC structure converges
faster than a PIC structure. The clipped linear decision is
µ s Tk sk quite attractive as it provides good performance, relatively
fast convergence, and it is simple to implement in
Figure 8. Basic structure of a weighted ICU. hardware. The corresponding ICU has the same structure
1206 ITERATIVE DETECTION METHODS FOR MULTIUSER DIRECT-SEQUENCE CDMA SYSTEMS
p(xm+1,k | dk = 1) − p(xm+1,k | dk = 1)
1 (xm,1, xm,2) um+1,k = (78)
p(xm+1,k | dk = 1) + p(xm+1,k | dk = −1)
as shown in Fig. 8. This approach has been suggested for In this case, the corresponding ICU has the form shown in
practical implementation in several papers [e.g., 41]. Fig. 4.
The final tentative decision function we consider is based In this section numerical examples are presented,
on an MMSE optimized estimate. illustrating the characteristics of the different cancellation
strategies and the different tentative decision functions,
um,k = E{dk | xm,k } (70) respectively. The impact of weighted cancellation is
demonstrated, although no attempts have been made for
This was first suggested by Tarköy [51] and later optimizing the weighting factors. Only general trends are
developed further by other authors [31,32]. Considering illustrated, leaving the interested reader to consult the
the general description of PIC, we can write the decision vast literature on the topic for more details regarding
statistic for user k and iteration m as done in (23) weight optimization.
A symbol synchronous CDMA system with processing
xm+1,k = dk − ρki (di − um,i ) + zk = dk − ρki εm,i + zk gain N = 32 and the simple channel model of Eq. (16) is
i =k i =k considered. Long codes are assumed, so a new random
(71) spreading sequence is used for each user and each symbol
Assuming that the cancellation error is a zero-mean interval. In Fig. 10, the BER performance of PIC and
Gaussian random variable, independent of the thermal weighted PIC (WPIC) is shown as a function of the
noise, xm+1,k is also a Gaussian random variable with number of users in the system. Here, the four tentative
2
mean dk and variance σm,k : decision functions depicted in Fig. 1 are used, respectively.
For weighted cancellation, a factor of µ = 0.5 have been
xm+1,k ∼ N(dk , σm,k
2
) (72) used for all cases. Better performance can be obtained
2
σm,k = 2
σε,m,k + σn2 (73) for more carefully selected weighting factors. The PIC
can be considered as the case of µ = 1. With caution it
The expectation in (70) is obviously determined by is therefore possible to roughly predict performance for
factors 0.5 ≤ µ ≤ 1 as the optimal weights in most cases
um+1,k = P(dk = 1 | xm+1,k ) − P(dk = −1 | xm+1,k ) (74) are within this interval. The performance is captured at
a bit energy to noise ratio Eb /N0 = 5 dB. The iterative
Considering one term at a time detectors have been restricted to five iterations which is
considered reasonable for potential practical applications.
P(dk )p(xm+1,k | dk = 1) In the following paragraphs we will denote the use of the
P(dk = 1 | xm+1,k ) = (75)
p(xm+1,k ) tentative functions in Fig. 1 as LIN, HARD, CLIP, and
TANH, respectively.
Using the fact that Considering first PIC, significant performance losses
are observed as the system load, K/N, increases.
P(dk = 1 | xm+1,k ) + P(dk = −1 | xm+1,k ) = 1 (76) Especially a linear tentative decision function is sensitive
to the load. As K increases, it becomes more likely that
we arrive at the iteration matrix is diverging, leading to detector
collapse with very poor performance. For the other decision
P(dk = 1 | xm+1,k ) functions, the performance degradation is more graceful,
p(xm+1,k | dk = 1) but still severe as the load increases. A load of 10–15%
= (77)
p(xm+1,k | dk = 1) + p(xm+1,k | dk = −1) for the linear case and of 25–30% for the others can be
ITERATIVE DETECTION METHODS FOR MULTIUSER DIRECT-SEQUENCE CDMA SYSTEMS 1207
10−1
Ave BER
HARD, PIC
LIN, PIC
10−2 CLIP, PIC
TANH, PIC
HARD, WPIC
LIN, WPIC
CLIP, WPIC
TANH, WPIC
10−3
6 8 10 12 14 16 18 20 22 24
No. of users
Figure 10. The BER performance for PIC and WPIC as a function of the number of users. HARD,
LIN, CLIP, and TANH are used together with the cases of µ = 1 and µ = 0.5. The solid horizontal
line represents single-user performance at 5 dB.
accommodated with reasonable losses at a processing gain and is mainly left to a trial-and-error approach with
of 32. The TANH shows the best performance, followed little analytic justification. To avoid these drawbacks, the
by the CLIP and HARD. For high load, CLIP and HARD decision function should differentiate on decision statistic
provide similar performance. quality as done by the other three alternatives.
Introducing a fixed weighting factor of µ = 0.5 does Comparing PIC (µ = 1) and WPIC (µ = 0.5), we can
not provide better performance for small loads. The conclude that the weighting factor should decrease with
performance degradation for WPIC as the load increases load, starting at µ = 1 for small loads. For appropriately
is noticeably more graceful, extending potential load with varying weights, reasonable performance is to be expected
appropriately selected weights to about 50% for TANH, at least up to a load of 50%.
CLIP, and LIN. The performance of WPIC for LIN, CLIP, It should be kept in mind that only 5 iterations are
and TANH are quite similar to each other. This is mainly allowed in Fig. 10. In Fig. 11, the BER as a function
due to the limited number of iterations allowed. For a of the number of iterations is shown for the case of
larger number of iterations, differentiating performance is K = 24 and Eb /N0 = 7 dB. In this case, µ = 1 does not
obtained as illustrated in Fig. 11. provide reasonable performance for any decision function.
For HARD, a low weighting factor at small to moderate In all cases, pingpong effects are observed where the
loads is not appropriate. For µ = 0.5, the performance BER oscillates with iterations [52]. For µ = 0.5, HARD
is considerably worse than µ = 1 up to a load of 50%. is still not useful as previously discussed, while LIN and
This illustrates the difficulty of selecting weighting factors TANH improve gradually up to 8 iterations, after which
for HARD. Since a hard decision leads to cancellation the performance converges to a level determined by the
of a ‘‘full’’ MAI contribution scaled by µ, regardless of residual MAI. The CLIP continues to improve even beyond
the quality of the decision statistic, weighting may do 15 iterations, representing the best alternative. The CLIP
more harm than good. In the first iteration decisions will, however, also converge to a level above the SU
are based on the matched-filter output, which usually performance since this strategy provides a box-constrained
provides a BER of less than 0.5. With a weighting factor ML solution and not necessarily a SU solution. At 7 dB,
of µ = 0.5, the additional MAI introduced due to wrong the SU performance is just below 10−3 . The benefits of
decisions are reduced, but at the same time only half of a larger number of iterations at higher loads are nicely
the MAI contributions corresponding to correct decisions illustrated.
are eliminated. The distinct nonlinearity of hard decisions Successive cancellation is expected to provide better
makes the selection of weighting factors more complicated performance than PIC. This is illustrated in Fig. 12,
BER at 7 dB, K = 24
100
10−1
Ave BER
HARD, PIC
LIN, PIC
10−2 CLIP, PIC
TANH, PIC
HARD, WPIC
LIN, WPIC
CLIP, WPIC
TANH, WPIC
10−3
2 4 6 8 10 12 14
No. of iterations
Figure 11. The BER performance for PIC and WPIC as a function of the number of iterations.
HARD, LIN, CLIP, and TANH are used together with the cases of µ = 1 and µ = 0.5.
BER, 5 iterations, K = 24
100
10−1
Ave BER
HARD, WPIC
LIN, WPIC
CLIP, WPIC
10−2 TANH, WPIC
HARD, SIC
LIN, SIC
CLIP, SIC
TANH, SIC
SU
10−3
0 1 2 3 4 5 6 7 8
Eb /N 0 /dB
Figure 12. The BER performance for SIC and WPIC as a function of Eb /N0 , where HARD, LIN,
CLIP, and TANH tentative decisions are used. For WPIC, µ = 0.5.
1208
ITERATIVE DETECTION METHODS FOR MULTIUSER DIRECT-SEQUENCE CDMA SYSTEMS 1209
22. M. Honig, U. Madhow, and S. Verdú, Blind adaptive mul- 37. L. K. Rasmussen, T. J. Lim, and T. M. Aulin, Breadth-first
tiuser detection, IEEE Trans. Inform. Theory 41: 944–960 maximum-likelihood detection in multiuser CDMA, IEEE
(July 1995). Trans. Commun. 45: 1176–1178 (Oct. 1997).
23. M. K. Varanasi and B. Aazhang, Multistage detection in 38. D. Divsalar, M. Simon, and D. Raphaeli, Improved parallel
asynchronous code–division multiple–access communica- interference cancellation for CDMA, IEEE Trans. Commun.
tions, IEEE Trans. Commun. 38: 509–519 (April 1990). 46: 258–268 (Feb. 1998).
24. K. Jamal and E. Dahlman, Multi-stage serial interference 39. M. Sawahashi, Y. Miki, H. Andoh, and K. Higuchi, Serial
cancellation for DS-CDMA, Proc. IEEE VTC ’96, Atlanta, Canceler Using Channel Estimation by Pilot Symbols for
April 1996, pp. 671–675. DS-CDMA, IEICE Technical Report RCS95-50, July 1995,
25. P. Dent, B. Gudmundson, and M. Ewerbring, CDMA-IC: A Vol. 12, pp. 43–48.
novel code divisions multiple access scheme based on 40. T. Ojaperä et al., Design of a 3rd generation multirate CDMA
interference cancellation, Proc. 3rd IEEE Int. Symp. PIMRC system with multiuser detection, MUD-CDMA. Proc. IEEE
’92, Boston, Oct. 1992, pp. 98–102. Int. Symp. Spread Spectrum Techniques and Applications
26. P. Patel and J. Holtzman, Analysis of simple successive (ISSSTA), Mainz, Germany, Sept. 1996, pp. 334–338.
interference cancellation scheme in a DS/CDMA, IEEE J. 41. H. Seki, T. Toda, and Y. Tanaka, Low delay multistage
Select. Areas Commun. 12: 796–807 (June 1994). parallel interference canceller for asynchronous DS/CDMA
27. H. Elders-Boll, H. D. Schotten, and A. Busboom, Efficient systems and its performance with closed-loop TPC, Proc. 3rd
implementation of linear multiuser detectors for asyn- Asia-Pacific Conf. Communications, Sydney, Australia, Dec.
chronous CDMA systems by linear interference cancella- 1997, pp. 832–836.
tion, Eur. Trans. Telecommun. 9(4): 427–437 (Sept.–Nov. 42. M. Sawahashi, H. Andoh, and K. Higuchi, Interference rejec-
1998). tion weight control for pilot symbol-assisted coherent multi-
28. L. K. Rasmussen, T. J. Lim, and A.-L. Johansson, A matrix- stage interference canceller using recursive channel estima-
algebraic approach to successive interference cancellation tion in DS-CDMA mobile radio, IEICE Trans. Fund. E81-A:
in CDMA, IEEE Trans. Commun. 48: 145–151 (Jan. 957–970 (May 1998).
2000). 43. O. Axelsson, Iterative Solution Methods, Cambridge Univ.
29. D. Guo, L. K. Rasmussen, S. Sun, and T. J. Lim, A matrix- Press, 1994.
algebraic approach to linear parallel interference cancellation 44. A. Grant and C. Schlegel, Convergence of linear interference
in CDMA, IEEE Trans. Commun. 48: 152–161 (Jan. 2000). cancellation multiuser receivers, IEEE Trans. Commun. 49:
30. P. H. Tan, L. K. Rasmussen, and T. J. Lim, Constrained 1824–1834 (Oct. 2001).
maximum-likelihood detection in CDMA, IEEE Trans. 45. R. M. Buehrer, S. P. Nicoloso, and S. Gollamudi, Linear
Commun. 49: 142–153 (Jan. 2001). versus nonlinear interference cancellation, J. Commun.
31. R. R. Müller and J. B. Huber, Iterative soft-decision interfer- Networks 1: 118–133 (June 1999).
ence cancellation for CDMA, in Louise and Pupolin, eds., 46. D. Guo, L. K. Rasmussen, and T. J. Lim, Linear parallel
Digital Wireless Communications, Springer Verlag, 1998, interference cancellation in random-code CDMA, IEEE J.
pp. 110–115. Select. Areas Commun. 17: 2074–2081 (Dec. 1999).
32. S. Gollamudi, S. Nagaraj, Y. -F. Huang, and R. M. Buehrer, 47. P. H. Tan and L. K. Rasmussen, Linear interference cancella-
Optimal multistage interference cancellation for CDMA tion in CDMA based on iterative solution techniques for linear
systems using nonlinear MMSE criterion, Proc. Asilomar equation systems, IEEE Trans. Commun. 48: 2099–2108
Conf. Signals, Systems, Computers 98, Oct. 1998, Vol. 5, (Dec. 2000).
pp. 665–669.
48. B. H. Ahn, Iterative methods for linear complementary
33. P. D. Alexander, A. J. Grant, and M. C. Reed, Iterative detec- problems with upper bounds on primary variables, Math.
tion in code-division multiple-access with error control coding, Prog. 26(3): 295–315 (1983).
Eur. Trans. Telecommun. 9: (July–Aug. 1998).
49. A. Yener, R. D. Yates, and S. Ulukus, A nonlinear program-
34. L. K. Rasmussen, P. D. Alexander, and T. J. Lim, A linear ming approach to CDMA multiuser detection, Proc. Asilomar
model for CDMA signals received with multiple antennas Conf. Signals, Systems, Computers 99, Pacific Grove, CA, Oct.
over multipath fading channels, in F. Swarts, P. van Rooyen, 1999, pp. 1579–1583.
I. Oppermann, and M. Lötter, eds., CDMA Techniques
50. P. H. Tan, L. K. Rasmussen, and T. J. Lim, Iterative interfer-
for 3rd Generation Mobile Systems, Kluwer, Sept. 1998,
ence cancellation as maximum-likelihood detection in CDMA,
Chap. 2.
in Proc. Int. Conf. Information, Communication, Signal Pro-
35. S. Sumei, L. K. Rasmussen, T. J. Lim, and H. Sugimoto, A cessing 99, Singapore, Dec. 1999.
hybrid interference canceller in CDMA, Proc. IEEE Int.
51. F. Tarköy, MMSE-optimal feedback and its applications,
Symp. Spread Spectrum Techniques and Applications, Sun
Proc. IEEE Int. Symp. Information Theory, Whistler, Canada,
City, South Africa, Sept. 1998, pp. 150–154.
Sept. 1995, p. 334.
36. S. Sumei, L. K. Rasmussen, T. J. Lim, and H. Sugimoto, A
52. L. K. Rasmussen and I. J. Oppermann, Ping-pong effects in
matrix-algebraic approach to linear hybrid interference can-
linear parallel interference cancellation for CDMA, to IEEE
celler in CDMA, Proc. Int. Conf. Univ. Personal Communica-
Trans. Wireless Commun. (in press).
tion, Florence, Italy, Oct. 1998, pp. 1319–1323.
J
JPEG2000 IMAGE CODING STANDARD Amazingly, all the features of JPEG2000 are realized
by coding the image data only once. This contrasts
B. E. USEVITCH sharply with previously used image compression methods
University of Texas at El Paso where many of the image properties, such as image
El Paso, Texas size or quality, are fixed at compression time. Thus,
to get compressed data representing different image
1. INTRODUCTION sizes or quality, multiple codings of the original data,
and the subsequent storage of multiple compressed
JPEG2000 is a new international standard for the representations were required. Given all the advantages
coding (or compression) of still images. The standard was to JPEG2000, are there any disadvantages? One main
developed in order to address some of the shortcomings of disadvantage, which should become clear after reading
the original JPEG standard, and to implement improved this description, is that JPEG2000 is considerably more
compression methods discovered since the original JPEG complex than its predecessor JPEG. Thus JPEG may still
standard first appeared. The JPEG2000 standard offers a be the preferred method for coding images at medium
number of new features, with one of the most significant compression rates where it is only slightly worse than
being the flexibility of the compressed bit stream. As a JPEG2000 in terms of distortion performance.
result of this flexibility, many image processing operations This article gives a brief overview of the JPEG2000
such as rotation, cropping, random spatial access, panning, coding algorithm, covering only Part 1 or the baseline
and zooming, can be performed in JPEG2000 either of the standard. Section 2 first describes some basic
directly on the compressed data, or by only decompressing concepts, namely, image progressions and embedded
a relevant subset of the compressed data. The flexible bit representations, that are needed to understand the
stream allows the compressed data to be reordered such standard. Section 3, the longest and most detailed section,
that decompression will result in images of progressively describes the JPEG2000 coding algorithm from the
larger size, higher quality, or more colors. All the perspective of encoding an image. Section 4 gives some
bit streams, original and reordered, maintain a strict performance results of JPEG2000 relative to the SPIHT
embedded property in which the most important bits algorithm and JPEG, and give conclusions.
come first in the bit stream. Consequently, truncating
these embedded bit streams at any point gives an optimal
compressed representation for the given bitlength. Other 2. BACKGROUND
desirable features of JPEG2000 include
2.1. Image Scaling
• Lossy and lossless compression using the same
The JPEG2000 standard is capable of progressively
algorithm flow (truncated lossless bit streams give
decoding images in several different ways corresponding
lossy image representations)
to different ways of scaling image data. The current
• Efficient algorithm implementation on both small standard uses four methods of image scaling: resolution,
memory and parallel processing devices quality, position, and color. Resolution scaling corresponds
• Region of interest coding to changing the image size, where increasingly larger
• Improved error resilience. resolution gives larger image sizes (see Fig. 1). Resolution
scaling is useful in applications where an image is decoded rates can be achieved by coding the image only once. The
to different display sizes, such as a palmtop display or a final optimal compressed representation is achieved by
21-in. monitor, and in Web serving applications where a simply truncating the bit stream from the single coding to
small thumbnail image is typically downloaded prior to the desired bitlength.
downloading a full-sized image. Embedded bit streams can be constructed such
Quality scaling, also called signal-to-noise (SNR) that when sequentially decompressed they give the
scaling, refers to altering the numerical precision used image progressions described above. Since the image
to represent the pixels in an image. Increasing the progressions are different, it would appear that a separate
quality corresponds to higher fidelity images as shown coding run would be required to create the embedded
in Fig. 2. Quality scaling is useful when decompressing bit stream corresponding to each progression. A major
pictures to displays having different capabilities, such breakthrough achieved by the JPEG2000 coding algorithm
as displays having only black-and-white pixels or those is that the embedded bit streams corresponding to all the
having 256 levels of grayscale. Position or spatial scaling basic image progressions can be achieved by doing only one
refers to altering the order in which smaller portions coding. The way JPEG2000 is able to do this is by dividing
of an image are used to progressively build up the entire up a transformed image into a number of independent
image. For example, most printers print images from top to codeblocks. Each codeblock is independently compressed
bottom in a raster scan order, corresponding to increasing to form its own quality scalable bit stream called an
the spatial scale of the image. Spatial scaling is useful in elementary embedded bit stream. The set of all elementary
printers which may have to print large images with limited bit streams from all codeblocks are annotated and collected
memory and in applications that require random access to together into a database called a generalized embedded
locations of an image. Color scaling refers to changing the representation [3]. The creation of this database is the
number of colors planes [such as RGB (red-green-blue)] initial step of the coding process. The second and final step
used to represent an image and is useful when a color consists of extracting, annotating, and ordering the bits
image is printed or displayed on a black-and-white printer from this generalized representation to give the final coded
or monitor. image representation. This final coded representation
gives the progression desired as well as being an embedded
2.2. Embedded Bit Streams representation, and thus optimal for any given truncation
A powerful tool used in the JPEG2000 coding algorithm point. JPEG2000 is able to create these final coded
is that of embedded coding. A binary bit stream is said to representations by only selecting and rearranging the bits
be embedded if any truncation of the bit stream to length resulting from the single initial coding.
L results in an optimal compressed representation. By
optimal we mean that no other compressed representation 3. JPEG2000 ENCODING
of length L, embedded or not, will have better resulting
distortion than the truncated embedded bit stream.
3.1. Preprocessing (Tiling and Color Transform)
EZW [1] and SPIHT [2] are good examples of algorithms
that give embedded representations that are quality The first step in the JPEG2000 coding algorithm is to
scalable. Truncating the bit streams resulting from these divide the image into rectangular regions called ‘‘tiles.’’
algorithms thus gives lower SNR representations, where These tiles are all of the same size and completely cover
each representation is optimal for the truncated length. A the image in a regular array. The tile size can be chosen
significant advantage of embedded bit streams is that the to be as large as the image itself, in which case the image
compressed representations of an image at many different consists of only one tile. For purposes of coding, each
tile can be treated as an independent image having its The final preprocessing step is to remove any average
own set of coding parameters. Thus tiling can be useful in (or DC) value in the image coefficients by adding a
coding compound documents, which are documents having constant value to the image. Eliminating the average
separate subregions of texts, graphics, and picture data. value reduces the probability of overflow and allows
Tiles from compound images that contain only text data the JPEG2000 algorithm to take full advantage of fixed
can be compressed with parameters that are very different precision arithmetic hardware.
from tiles that contain only picture data. Tiling can also
be used to process very large images using systems with 3.2. Wavelet Transform
small amounts of memory, and to give random spatial After preprocessing the image coefficients are transformed
access to compressed data (although this spatial access using a standard dyadic (power of 2) discrete wavelet
can also be done in a single tile). The main disadvantage transform (DWT). An example three-level DWT with
of tiling is that it leads to blocking artifacts and reduced corresponding notation is shown in Fig. 3. Note that the
compression performance similar to that found in the number of levels of wavelet decomposition M gives rise
original JPEG standard. to M + 1 well defined image resolutions (sizes). These
After tiling, color images, or images consisting of resolutions are numbered from the smallest (r0 ) to the
more than one color component, can be transformed to full image size before transformation (rM ) as shown in
reduce the redundancy amongst the color components. Two Fig. 3. JPEG2000 uses a special form of the DWT called
transforms are defined in JPEG2000 for use on standard the symmetric wavelet transform (SWT), which is able
RGB color data. The first is a linear transform which to handle border effects and has the property that the
converts the three RGB components into a luminance transformed image has the same number of coefficients
(or black-and-white) component, and two chrominance as the original image [4]. The wavelet transform can be
(or color) components denoted YCb Cr . This transform, implemented using either a filterbank or lifting methods,
called the irreversible color transform (ICT), cannot exactly and special methods can be employed by the transform to
recover the original data from the transformed data and reduce memory requirements [5].
therefore is used for lossy coding applications. The second Part 1 of the JPEG2000 standard specifies only two
transform, a nonlinear transform called the reversible sets of filter coefficients to be used in the transform: the
color transform (RCT), converts RGB data into three 9/7 and 5/3 filters (where the numbers correspond to filter
components Y Db Dr . The RCT is an integer approximation lengths). Wavelet transforms using the 9/7 filters and finite
to the ICT that maps integers to integers in a reversible precision arithmetic cannot exactly recover the original
manner. Because it is reversible, the RCT is the only image from the wavelet transformed coefficients. Thus
transform used for lossless compression. The separate the 9/7 filters are only used in lossy coding applications.
color components resulting from either the ICT or RCT Wavelet transforms using the 5/3 filters and lifting
are then coded independently. Since tiles and color map integers to integers such that the exact original
components are coded independently and in the same image can be reconstructed from the wavelet-transformed
manner, this article assumes for simplicity that the image coefficients. Thus the 5/3 filters are the only ones used
being coded has only one tile and one color component. for lossless compression. The 5/3 filters can also be used
r0 r1 r2 r3
c4 c5 c16 c17
c6 c7 c18 c19
p1
c0 c1 c8 c9
c2 c3 c10 c11
p2
c12 c13 c20 c21
c14 c15 c22 c23
in lossy compression but are not preferred since the 9/7 resulting quantization interval of the compressed image
filters give better lossy compression performance. (2k b ) is determined by truncation of the embedded bit
stream.
3.3. Embedded Quantization
The wavelet coefficients are quantized using a uniform 3.4. Bit Plane Encoding
‘‘deadzone’’ quantizer, where the deadzone is twice as large
as the quantization step size and symmetric about zero (see Prior to coding the quantized wavelet coefficients, the
Fig. 4). A deadzone quantizer is used since the deadzone wavelet subbands are divided into a regular array of
improves coding performance when the quantizer is used relatively small rectangles called codeblocks (see Fig. 3).
with entropy coding. The deadzone quantizer can be Each of these codeblocks is then coded independently
represented mathematically as to form its own elementary embedded bit stream (see
Section 2.2). Coding small blocks rather than entire
|w| subbands or the whole image offers several advantages.
q = sign(w) (1) Since the blocks are small, they can be coded in
b
hardware having limited memory, and since the blocks
where w is the unquantized wavelet coefficient in subband are independently coded, several blocks can be coded in
b, and b is the quantization step size for subband b. parallel. Independently coding blocks also gives better
The quantized wavelet coefficients are represented in error resilience since the errors in one block will not
binary signed magnitude form. Specifically, one bit of propagate into other blocks (error propagation can be a
the binary number represents the sign, zero for positive, significant problem in EZW and SPIHT). Having a large
and the remainder of the bits represent the quantized number of blocks makes the resulting coded bit stream
magnitude (see Fig. 4). The quantization step includes more flexible. The coded blocks can be arranged such
some loss of information since all wavelet coefficients in that different progressive decodings are possible. Also, by
a quantization interval of size b are mapped into the using a large number of codeblocks the rate distortion
same quantized value. For lossless coding applications, the performance of the compressed image can be optimized
wavelet coefficients are not quantized, which corresponds without further coding. This is accomplished by selecting
to setting b = 1 in Eq. (1). only the best bits from each compressed codeblock in a
The quantized wavelet coefficients are coded using what process called postcompression rate distortion optimization
is called bit plane coding [6]. In bit plane coding the most (see book by Taubman and Marcellin [3] and Section 3.6).
significant magnitude bits of all the coefficients are coded Each codeblock is coded to give a quality embedded bit
prior to coding the next most significant bits of all the stream. The bit stream has the property that those bits
coefficient magnitudes, and so forth. The combination that reduce the output distortion the most come first in the
of quantization and bit plane coding can be viewed as bit stream. This is accomplished by bit plane encoding the
quantizing data with a set of successively finer quantizers, wavelet coefficients starting with the most significant bit
and this process is called embedded quantization. A set plane and ending with the least significant one. To see why
of embedded scalar quantizers is shown in Fig. 4. These the most significant bits must be coded first to get a quality
quantizers have the property that finer quantizers are embedded bit stream, consider the case of an encoder
formed by subdividing the quantization intervals of a wanting to send a coefficient to a decoder. If the binary
more coarse quantizer. The result of using embedded value of the coefficient is 10,0102 and only one bit could
quantization in JPEG2000 is that the quantization using be sent, which would be the best bit? The answer from a
interval b includes all the quantizations having coarser squared error perspective is to send the most significant
quantization intervals of 2k b where 0 ≤ k ≤ kmax . Note bit since this results in a squared reconstruction error of
also from Fig. 4 that the quantization interval index is only (10,0102 − 10,0002 )2 = (102 )2 = 4 while sending the
formed by appending a bit to the index of the next coarser lower significant bit results in a larger squared error of
interval to which it belongs. (10,0102 − 102 )2 = (10,0002 )2 = 256.
Embedded quantization is the method in which JPEG2000 uses a quantized coefficient’s significance to
JPEG2000 is able to construct quality scalable compressed determine how it will be coded in a bit plane. A coefficient
data. Because of embedded quantization the choice of is defined to be significant with respect to a bit plane if the
finest quantization interval b is not critical. The interval coefficient’s magnitude has a nonzero bit in the current or
b is typically chosen to be rather narrow (or fine), and the higher bit plane. Defining the significance function σk (q)
−111 −110 −101 −100 −011 −010 −001 000 +001 +010 +011 +100 +101 +110 +111
of a quantized wavelet coefficient q at bit plane k as code length, since on the average keeping a fraction αL
of the bits reduces the distortion by the amount αD. By
|q| coding the higher slope bits first, truncated bit streams
σk (q) =
2k give distortion results that are much closer to the optimum
rate distortion curve.
then a wavelet coefficient is significant in bit plane k The 3 passes corresponding to the three fractional
if σk (q) > 0. Coding of coefficients in a codeblock begins bit planes scan the codeblock using the scanning
in bit plane kmax , where kmax is the highest bit plane pattern shown in Fig. 6. The scan pattern is basically
such that at least one of the wavelet coefficients in a raster scan where each scan line consists of several
the block is significant (kmax = log2 (max |q|), where the height four columns of coefficients. The first pass,
maximization is over all quantized coefficients q in the called the significance propagation (SP) pass, codes all
codeblock). the insignificant bits that are immediate neighbors of
JPEG2000 codes each bit plane using three separate coefficients that have been previously found significant. By
passes, where each pass corresponds to what is called a ‘‘previously significant coefficients,’’ we mean coefficients
fractional bit plane. This is different from earlier coders found significant in a previous bit plane or in the current
such as EZW and SPIHT, which used only one pass per bit plane earlier on in the scanning pattern of this pass.
bit plane. These previous methods argued that all bits Each coefficient in this pass is coded with what is called
in a bit plane reduced the output distortion by the same standard coding, which is either a 0 to indicate that the
amount. Although this is true, further research showed coefficient remains insignificant, or a 1 and a sign bit to
that some bits in a bit plane have a higher distortion indicate that the coefficient has become significant. The
rate slope than do others [7,8]. The intuition behind this second pass is called the magnitude refinement (MR) pass.
is that all bits in a bit plane (input bits) reduce the It codes the current bit in the bit plane corresponding
image distortion by the same amount D. However, after to coefficients found significant in a prior bit plane. That
entropy coding (discussed in the next section), some input is it codes the next most significant bit in the binary
bits require fewer output bits L to encode. The result representation of the these coefficients. Coefficients that
is that the distortion rate slopes D/L are different for became significant in the current bit plane are not coded
different input bits. Coding bits with higher slopes first in this pass since their values were already coded in the
leads to a coding advantage as illustrated in Fig. 5. The SP pass.
bottom curve in this figure shows the optimal distortion The final pass is the ‘‘cleanup’’ (CL) pass, which codes
rate curve resulting from entropy-coded quantization and the significance of all coefficients in the bit plane not
continuously varying the quantization step size. The two coded in the previous two passes. The bits from this pass
dark dots give the distortion rate results at the end are coded either using standard coding or with a special
of coding the k and k − 1 bit planes, corresponding to run mode designed for coding sequences of zeros. The
quantizing with step sizes 2k b and 2k−1 b respectively. special run mode is advantageous since at medium to high
Without fractional bit planes truncation of the bit stream compression ratios most of the output bits from this pass
results in distortion decreasing linearly with increased will be zero [3]. Run mode is entered if a height 4 column
from the scanning pattern satisfies the following: (1) the
four coefficients in the column are currently insignificant
(i.e., all the coefficients are to be coded in this pass) and
(2) all the neighbors of the four coefficients are currently
Fractional insignificant (i.e., the neighbors are either outside the
k bit planes codeblock and thus considered insignificant or are all to be
coded in this pass and if already coded, have not become
significant). Note that the conditions for run mode can be
deduced at both the encoder and decoder so that no extra
Distortion
Single bit
plane
Gain
k −1
T Length
Figure 5. An illustration showing the benefit of using fractional
bit plane coding when truncating to an arbitrary length T.
Fractional bit planes result in reduced distortion when the code Figure 6. The scanning pattern for the three fractional bit plane
stream is not truncated at bit plane boundaries. passes.
1216 JPEG2000 IMAGE CODING STANDARD
side information needs to be sent. When in run mode, a 4. Since −11 was coded in the current bit plane in the
0 indicates that all four coefficients in the column remain SP pass, it is not coded in the MR pass.
insignificant while a 1 or ‘‘run interrupt’’ indicates that
one or more of the coefficients have become significant. 3.5. Entropy Coding
After a run interrupt 2 bits are sent to indicate the length
of the run of zeros in the column (0 to 3), followed by the The sequence of ones and zeros resulting from bit plane
sign of the first significant bit. Standard coding resumes coding is further compressed through entropy coding using
on the next coefficient. a binary arithmetic encoder. Information theory results
The ordering of the passes is SP, MR, and CL for each show that the minimum average number of bits required
bit plane except the first. Because there are no significant to code a sequence of independent input symbols is given
bits from previous bit planes, the first bit plane starts with by the average entropy:
the CL pass.
Eav = −P1 log2 P1 − (1 − P1 ) log2 (1 − P1 ).
3.4.1. Bit Plane Coding Example. Table 1 shows the
results of bit plane coding some example quantized wavelet The input symbols are the zeros and ones from bit plane
coefficients. For clarity signs are shown as +/− instead of coding, and P1 is the probability that the symbol is a 1.
0/1 and the following comments refer to the table: If P1 = 0.5, the formula above shows that it requires on
average 1 bit per symbol to code a sequence; thus, coding
doesn’t help. However, if P1 differs or is skewed from
1. This column satisfies the conditions for run mode,
0.5, then it will require less than 1 bit per symbol on
and since all the coefficients remain insignificant in
average to encode the binary input sequence. Arithmetic
this pass, they can be coded with a single 0.
encoders are very useful since they are able to code
2. This column satisfies the conditions for run mode but sequences of symbols at rates approaching the average
has a coefficient that becomes significant, namely, entropy. A binary arithmetic encoder requires two pieces
25. The first 1 ends the run mode, and the 01 of information in order to do its encoding: (1) the symbol
indicates a run of one insignificant coefficient. The (1 or 0) to be coded and (2) a probability model (the value
next coefficient, 25, is known to be significant so only of P1 ). A great utility of the arithmetic encoder is that it
its sign is coded. can dynamically estimate P1 by keeping a running count
3. Since −11 became significant in this pass and 3 is of the input symbols it receives. The more symbols the
its neighbor later in the scanning pattern, it is coded arithmetic encoder processes, the closer these counts will
in the SP pass. be to approximating the true probability P1 .
The specific arithmetic encoder used in JPEG2000 is
called the MQ coder, and is the same coder used by
Table 1. The First Three Passes Resulting from Bit Plane
the JBIG standard. In order to get probabilities skewed
Coding the Quantized Wavelet Coefficientsa
from 0.5, JPEG2000 codes bits according to their context,
3 5 −1 which is determined by the value of the eight nearest-
neighbor coefficients to the coefficient being coded. A
−1 25 2
symbol probability P1 is computed for each context, and
7 −2 −11 each symbol is coded with the probability corresponding
1 −6 3 to its own context. By using contexts, the final symbol
probabilities are more skewed from 0.5 than they would
Pass be without using contexts. Thus the use of contexts results
in a coding gain.
Coef. cl4 sp3 mr3 cl3 sp2 mr2 cl2 In JPEG2000 contexts are formed by labeling neighbors
3 0(1) 0 0 as significant or insignificant, making a total of 28 contexts
possible. By careful experimentation and consideration of
−1 0 0
symmetries the JPEG2000 algorithm was able to reduce
7 0 1+ this number down to only 18 contexts for arithmetic
1 0 0 encoding: 9 for standard significance coding, 5 for sign
coding, and 3 for magnitude refinement coding. A full
5 101(2) 0 1+ description of the contexts and how they were derived can
25 + 1 0 be found in Taubman and Marcellin’s book [3]. Having a
−2 0 0 0 small number of contexts allows the probability models
formed by counting symbols to quickly adapt to the true
−6 0 0 1−
probability models. This is important since JPEG2000
−1 0 0 0 uses small codeblocks and the probability models are
2 0 0 0 reinitialized for each codeblock.
−11 0 1− (4) 0
3.6. Final Embedded Bit Stream Formation
3 0 0(3) 0
The result of bit plane coding is that each codeblock is
a
Superscript numbers in parentheses refer to listed items in Section 3.4.1. represented by an elementary embedded bit stream. The
JPEG2000 IMAGE CODING STANDARD 1217
next step is to process and arrange these elementary bit packet ordering results in final coded bit streams
streams into a generalized embedded representation. The that are resolution, quality, or spatially scalable. The
structure of this generalized representation is designed to only complexity involved in forming these final coded
make it easy to extract final coded image representations representations is that of reordering data.
having the scaling and embedded properties discussed
in Section 2. It is easy to construct resolution and
4. PERFORMANCE AND CONCLUSIONS
scalable compressed images from the elementary bit
streams. For example, resolution scalable data can be Table 2 shows lossy image coding results (in terms of
formed by concatenating elementary bit streams, starting peak SNR [6]) of JPEG2000 relative to SPIHT and the
with the lowest-resolution codeblocks and following with original JPEG standard. The results were generated
the higher-resolution codeblocks (extra data need to be using publicly available computer programs for JPEG [9]
included to indicate codeblock location and lengths). and SPIHT [10], and programs distributed in [3] for
Sequentially decoding this bit stream gives resolution JPEG2000. The test images used are the monochrome,
scaling since lower-resolution subband data appears 8 bit grayscale woman and bike images shown in Figs. 1
before higher resolution subband data. Spatial scaling and 2. To give an idea of visual quality versus bit rate, the
is implemented from this same bit stream by decoding bike images in Fig. 2 were coded at rates of 0.0625, 0.25,
only those blocks in the datastream associated with a and 1 bit per pixel. The results show that the performance
particular region of interest. The only scaling not possible of SPIHT is very close to JPEG2000. However, remember
from this simple bit stream is quality scaling. that JPEG2000 is more flexible (SPIHT is not resolution
Quality scaling is introduced in JPEG2000 by dividing or color scalable) and thus incurs some overhead due to
up the bits from the elementary embedded bit streams this flexibility.
into a collection of quality layers. A quality layer is The JPEG2000 standard represents a culmination of
defined as the set of bits from all subbands necessary to over a decade of wavelet based compression research.
increase the quality of the full sized image by one quality It represents a fundamental shift from Fourier based
increment. These layers can be implemented by adding techniques to wavelet based techniques that was enabled
information to the elementary bit streams to indicate those by the discovery of wavelet analysis. Research in image
bits belonging to each quality layer. Since each elementary compression for the foreseeable future will focus on
bit stream is embedded, this amounts to selecting a set of improving and extending wavelet-based techniques such
increasing truncation points in each elementary embedded as JPEG2000 until the discovery of new enabling
bit stream. These truncation points are found by selecting technologies and theory.
a set of increasing final codelengths L1 < L2 < · · · < LN ,
where LN is the sum total of all the bits in all the
elementary bit streams. The optimal set of L1 bits from BIOGRAPHY
all the codeblocks that minimizes the distortion is then
Bryan E. Usevitch received the B.S. degree in elec-
selected. These bits are indicated in each elementary bit
trical engineering (magna cum laude) from Brigham
stream by a truncation point and constitute the bits in the
Young University, Provo, Utah, in 1986, and the M.S.
first quality layer. Next, the optimal set of L2 bits from all
and Ph.D. degrees from the University of Illinois at
the codeblocks that minimizes the distortion is selected.
Urbana–Champaign in 1989 and 1993, respectively. From
Since the elementary bit streams are embedded, this set
1986 to 1987 and 1993 to 1995 he was a staff engineer at
consists of the L1 bits in the first quality layer with an
additional L2 − L1 bits. These additional bits are indicated
in each elementary bit stream by another truncation point Table 2. Peak SNR Results from Lossy Coding the Images
(one per elementary bit stream) and constitute the second of Figs. 1 and 2a
quality layer. The process, called postcompression rate
Bit Rate (bpp) 0.1 0.2 0.5 1.0
distortion optimization, is repeated for the remaining
lengths Ln to form all the quality layers. Woman
The data structure JPEG2000 uses in the generalized JPEG — 24.88 29.68 33.55
representation to track resolution, spatial location, and SPIHT (lossless) 26.34 28.56 33.04 37.58
quality information is called a ‘‘packet.’’ Each packet SPIHT (lossy) 26.72 28.93 33.56 38.33
represents one quality increment for one resolution level JPEG2000 (lossless) 26.16 28.31 32.84 37.54
at one spatial location. Since spatial locations are spread JPEG2000 (lossy) 26.76 29.02 33.62 38.48
across three subbands for a particular resolution level,
JPEG2000 uses precincts to refer to spatial locations. A Bike
‘‘precinct’’ is defined as a set of codeblocks at one resolution JPEG — 24.67 30.14 34.37
level that correspond to the same spatial location as shown SPIHT (lossless) 24.67 27.64 32.64 36.98
in Fig. 3. A full-quality layer can then be defined as one SPIHT (lossy) 24.92 28.04 33.01 37.70
packet, from each precinct, at each resolution level. JPEG2000 (lossless) 24.98 28.00 32.95 37.34
Packets are the fundamental data structure that JPEG2000 (lossy) 35.51 28.49 33.52 38.09
JPEG2000 uses to achieve highly scalable embedded bit a
Recall that truncating lossless encoded data gives lossy compression.
streams since packets contain incremental resolution, JPEG values are only approximate due to the difficulty in coding at exact
quality, and spatial data. Simply rearranging the bit rates.
1218 JPEG2000 IMAGE CODING STANDARD
TRW designing satellite communication systems. In 1995 4. B. Usevitch, A tutorial on modern lossy wavelet compression:
he joined the department of electrical engineering at the Foundations of JPEG2000, IEEE Signal Process. Mag. 18:
University of Texas at El Paso where he is currently an 22–35 (Sept. 2001).
associate professor. Dr. Usevitch’s research interests are 5. C. Chrysafis and A. Ortega, Line based, reduced memory,
in signal processing, focusing on wavelet based image wavelet image compression, IEEE Trans. Image Process. 9:
compression and multicarrier modulation. 378–389 (March 2000).
6. A. Jain, Fundamentals of Digital Image Processing, Prentice-
Hall, Englewood Cliffs, NJ, 1989.
BIBLIOGRAPHY
7. J. Li and S. Lei, Rate-distortion optimized embedding, Proc.
1. J. Shapiro, Embedded image coding using zerotrees of wavelet Picture Coding Symp., Berlin, Sept. 1997, pp. 201–206.
coefficients, IEEE Trans. Signal Process. 41: 3445–3462 (Dec. 8. E. Ordentlich, M. Weinberger, and G. Seroussi, A low-
1993). complexity modeling technique for embedded coding of
2. A. Said and W. Pearlman, A new, fast and efficient image wavelet coefficients, Proc. IEEE Data Compression Conf.,
codec based on set partitioning, IEEE Trans. Circuits Syst. Snowbird, UT, March 1998, pp. 408–417.
Video Technol. 6: 243–250 (June 1996). 9. I. J. Group, Home Page, (online) (no date), http://www.ijg.org.
3. D. Taubman and M. Marcellin, JPEG2000: Image Compres- 10. A. Said and W. Pearlman, SPIHT Image Compression,
sion Fundamentals, Standards, and Practice, Kluwer, Boston, (online) (no date), http://www.cipr.rpi.edu/research/
2002. SPIHT/spiht3.html.
K
√
KASAMI SEQUENCES (−1)t for t ∈ {0, 1}, i = −1. For a sequence u, wt(u)
denotes the number of ones per period in u. For sequences
TADAO KASAMI u = {ui } and v = {vj } with the same period N, the periodic
RYUJI KOHNO cross-correlation function θu,v (·) is defined by
Hiroshima City University
Hiroshima, Japan
N−1
θu,v (τ ) = X(uj )X(vj+τ ) for 0≤τ <N
j=0
1. INTRODUCTION = N − 2wt(u ⊕ T τ v) (1)
Code-division multiple-access, (CDMA) based on spread- For a special case of u = v, the expression θu,v (·) is
spectrum communication systems requires many pseudo- called the periodic autocorrelation function. For a set S
noise sequences with sharp autocorrelation and small of sequences with period N, the peak cross-correlation
cross-correlation. On one hand, sharp autocorrelation magnitude θmax of S is defined by
results in not only reliable and quick acquisition but also
tracking of sequence synchronization. On the other hand,
θmax = max{θu,v (τ ) : u, v ∈ S and 0 ≤ τ < N} (2)
small cross-correlation reduces interference between
multiple accessing users’ signals, and thus results in
increasing the number of multiple accessing users in For sequences u = {uj } and v = {vj }, u ⊕ v denotes {uj ⊕ vj },
CDMA [1–5]. that is, the sequence whose jth element is uj ⊕ vj , where
Kasami sequences can be generated with a binary linear ⊕ denotes addition modulo 2, that is, the exclusive OR
feedback shift register in the same manner as maximum- operation. For a positive integer f , consider the sequence
length sequences abbreviated as m sequences, Gold, v = {vj } formed by taking every f th bit of sequence u = {uj },
Gold-like, and dual-BCH sequences. Kasami sequences that is, vj = ufj for every integer j. This sequence, denoted
are classified into two sets: (1) the small set of Kasami u[f ], is said to be a decimation by f of u.
sequences and (2) the large set of Kasami sequences, Let h(x) = a0 xn + a1 xn−1 + · · · + an−1 x + an denote a
which have period 2n − 1, n even. For a small set binary polynomial of degree n where a0 = an = 1 and other
of Kasami sequences, the peak correlation magnitude, coefficients are value 0 or 1. A sequence u = {uj } is said to
the maximum absolute value of cross-correlations, is be a sequence generated by h(x) if for all integers j
optimal and approximately half of that achieved by the
Gold [6,7] and Gold-like sequences of the same period. a0 uj ⊕ a1 uj−1 ⊕ a2 uj−2 ⊕ · · · ⊕ an uj−n = 0 (3)
However, its size — the number of sequences in the set — is
approximately the square root of that of the Gold or Gold- Subsequence u0 , u1 , u2 , . . . can be generated by an n-stage
like sequences. A large set of Kasami sequences contains binary linear feedback shift register that has a feedback
a set of Gold or Gold-like sequences and the small set tap connected to the ith cell if hi = 1 for 0 < i ≤ n. If u
of Kasami sequences as subsets. Compared with a set is a sequence generated by h(x), then its phase shift is
of Gold or Gold-like sequences of the same period, its also generated by h(x). The period of a nonzero sequence
peak correlation magnitude is the same, and its size is u generated by the polynomial h(x) of degree n cannot
approximately 2n/2 times larger. exceed 2n − 1. If u has this maximal period N = 2n − 1, it
In this article, we consider only binary sequences. is called a maximum-length sequence or m sequence, and
Sarwate suggested extension to nonbinary or polyphase h(x) is called a primitive binary polynomial of degree n.
sequences in a manner similar to that for m sequences (for Let gcd(i, j) denote the greatest common divisor of
the term nonbinary Kasami sequences, see Ref. 2). the integers i and j. Let u be an m sequence. For
a positive integer f , if u[f ] is not identically zero,
u[f ] has period N/gcd(N, f ), and is generated by the
2. DEFINITIONS AND BASIC CONCEPTS polynomial hf (x) whose roots are the f th powers of
the roots of h(x). The degree of hf (x) is the smallest
(For further details, see Refs. 1, 2, 8, and 9.) By a positive integer m such that 2m − 1 is divisible by
sequence, we mean a binary infinitely long sequence with a N/gcd(N, f ). For positive integers f1 , f2 , . . . , ft such that
finite period. A sequence u = . . . , u−2 , u−1 , u0 , u1 , u2 , . . . is hf1 (x), hf2 (x), . . . , hft (x) are all different, the set of sequences
abbreviated as {uj }. Let T denote the left-shift operator by generated by hf1 (x)hf2 (x) . . . hft (x) is the set of linear
one bit. For an integer i, T i denotes the i-times applications (with respect to ⊕) sums of some phase shifts of
of T; that is, T i {uj } = {uj+i }. The period N of u is the least u[f1 ], u[f2 ], . . . , u[ft ].
positive integer such that ui = ui+N for all i. T i u is called The linear span of a sequence is defined as the smallest
a phase shift of u. degree of polynomials that generate the sequence, and it is
Define X(0) = +1 and X(1) = −1, where X represents also called the linear complexity. The imbalance between
binary shift keying modulation, that is, X(t) = exp{iπ t} = zeros and ones per period in a sequence u is defined as
1219
1220 KASAMI SEQUENCES
the absolute value of the difference between the numbers Small sets of Kasami sequences contain only M = 2n/2 =
of zeros and ones per period in u, which is equal to (N + 1)1/2 sequences, while Gold and Gold-like sequences
|N − 2wt(u)|, where N is the period of u. contain N + 2 and N + 1 sequences, respectively.
Regarding the linear span, since hS (x) has degree 3n/2,
the maximum linear span of sequences in the set is 3n/2.
3. SMALL SETS OF KASAMI SEQUENCES Considering the imbalance between the numbers of zeros
and ones per period, the maximum is 2n/2 + 1 : 1.
See Refs. 1 and 2.
4. LARGE SETS OF KASAMI SEQUENCES
3.1. Definition
See Refs. 1 and 2.
Let n be even and let u = {uj } denote an m sequence of
period N = 2n − 1 generated by a primitive polynomial 4.1. Definition
h1 (x) of degree n. Define s(n) = 2n/2 + 1. Consider the
Let n be even. Define t(n) = 2(n+2)/2 + 1. Let u and w
sequence w = u[s(n)] derived by decimating or sampling be defined as the nonzero sequences generated by the
the sequence {uj } with every s(n) bits. Then, w is a polynomials h1 (x) of degree n and hs(n) (x) of degree
sequence of period N = N/gcd(N, s(n)) = (2n − 1)/(2n/2 + n/2, respectively, as mentioned above for the small sets
1) = 2n/2 − 1 which is generated by the polynomial hs(n) (x) of Kasami sequences, and let v = u[t(n)] be a nonzero
whose roots are the s(n)th powers of the roots of h1 (x). sequence generated by ht(n) (x) of degree n [derived by
Since hs(n) has degree n/2 and is primitive, w is an decimating or sampling the sequence u with every t(n)
m sequence of period 2n/2 − 1. bits].
Now consider the nonzero sequences generated by the The period of v is given by
polynomial hS (x) = h1 (x)hs(n) (x) of degree 3n/2. As stated
in Section 2, any such sequence must be one of the forms N N/3 for n ≡ 0 mod 4
= (7)
T i u, T j w, T i u ⊕ T j w, 0 ≤ i < 2n − 1, 0 ≤ j < 2n/2 − 1. Thus gcd(N, t(n)) N for n ≡ 2 mod 4
any sequence of period N generated by hS (x) is some phase
shift of a sequence in the following set KS (u, w) defined by Then, the set of sequences generated by hL (x) =
h1 (x)hs(n) (x)ht(n) (x) of degree 5n/2 has period N = 2n − 1,
KS (u, w) = {u, u ⊕ w, u ⊕ Tw, u ⊕ T 2 w, and any sequence with the period N in the set is some
n/2 −2
phase shift of a sequence in the following set KL (u, v, w),
. . . , u ⊕ T2 w} (4) called the large set of Kasami sequences. There are two
cases.
This set of sequences is called the small set of Kasami
sequences.
Case 1. If n ≡ 2 mod 4, then
3.2. Correlation Properties
KL (u, v, w) = {G(u, v), G(u, v) ⊕ w, G(u, v) ⊕ Tw,
It has been proved [10,11] that the periodic correlation n/2 −2
functions θx,y (τ ) of any sequences x and y belonging to the . . . , G(u, v) ⊕ T 2 w} (8)
small sets of Kasami sequences KS (u, w) take only three
values: where G(u, v) is the set of Gold sequences [6,7]
defined by
θx,y (τ ) = −1 or − 2n/2 − 1 or 2n/2 − 1 (5) G(u, v) = {u, v, u ⊕ v, u ⊕ Tv, u ⊕ T 2 v, . . . ,
n −2
It is obvious that the peak correlation magnitude of the u ⊕ T2 v} (9)
periodic correlation function θmax = 2n/2 + 1 = s(n) for the
and G(u, v) ⊕ T i w denotes the set {x ⊕ T i w : x ∈
small set of Kasami sequences is approximately one half of
G(u, v)}.
the values of θmax = 2(n+2)/2 + 1 achieved by the Gold and
Gold-like sequences. Regarding the size of the sequences, KL (u, v, w)
The Welch bound [12] applied to a set of M = 2n/2 contains 2n/2 (2n + 1) sequences.
sequences of period N = 2n − 1 provides a lower bound Case 2. If n ≡ 0 mod 4, then
of θmax for all binary sequences:
KL (u, v, w)
1/2
M−1
= {H(u, v), H(u, v) ⊕ w, H(u, v) ⊕ Tw,
θmax ≥ N > 2n/2 (6)
NM − 1 n/2 −2
. . . , H(u, v) ⊕ T 2 w
Since N is odd, it follows from Eq. (1) and Eq. (2) that θmax v(0) ⊕ w, v(0) ⊕ Tw, . . . , v(0) ⊕ T (2
n/2 −1)/3−1
w
is also odd. This implies that θmax ≥ 2n/2 + 1. Comparing
(2n/2 −1)/3−1
this Welch bound with θmax = 2n/2 + 1 of the small set of v(1)
⊕ w, v (1)
⊕ Tw, . . . , v (1)
⊕T w
Kasami sequences, it is noted that the small set of Kasami (2n/2 −1)/3−1
v(2)
⊕ w, v (2)
⊕ Tw, . . . , v (2)
⊕T w}
sequences is an optimal collection of binary sequences with
respect to the bound. (10)
KASAMI SEQUENCES 1221
where H(u, v) is the set of Gold-like sequences [1–2] N whose parity-check polynomial is h(x). Sequence uc is
defined by called a codeword of C. Corresponding to the left-shift
n −1)/3−1
operator T, the cyclic left-shift operator Tc is defined by
H(u, v) ={u, v(0) ⊕ u, v(0) ⊕ Tu, . . . , v(0) ⊕ T (2 u
Tc (u0 , u1 , u2 , . . . , uN−1 ) = (u1 , u2 , . . . , uN−1 , u0 ). For uc ∈ C,
v(1)
⊕ u, v(1)
⊕ Tu, . . . , v
(1)
⊕T (2n −1)/3−1
u T i uc ∈ C and T i uc is called a cyclic shift of uc . The period
of u is the least positive integer N such that TcN uc = uc ,
(2n −1)/3−1
v(2) ⊕ u, v(2) ⊕ Tu, . . . , v(2) ⊕ T u} which is the same as the period of u. The code C can be
partitioned into blocks in such a way that codewords uc
(11)
and H(u, v) ⊕ T i w denotes the set {x ⊕ T i w : x ∈ and vc belong to a block if and only if they are cyclic shifts
H(u, v)} and v(i) = (T i u)[t(n)] is the result of of the other. The period of a codeword in a block is equal
decimating T i u by every t(n) bits. to the size of the block.
Let S be a minimal subset of S0 such that any nonzero
Regarding the size of the sequences, KL (u, v, w)
sequence with period N in S0 is some phase shift of a
contains 2n/2 (2n + 1) − 1 sequences.
sequence in S. Define Cs = {uc : u ∈ S}. Then, Cs consists
4.2. Correlation Properties of codewords chosen as a unique representative from each
block of size N. Thus the size of S is equal to the number
In either case 1 or 2, the correlation functions for any of blocks of size N.
sequences x, y ∈ KL (u, v, w) take only the following five For a codeword uc , let wt(uc ) denote the weight of uc ,
values [11]; for 0 ≤ τ < N = 2n − 1, we obtain that is, the number of ones in uc . The set W = {wt(uc ) :
Peak Cross
Period Linear Correlation Decimation
Sequence Set Order n N Size M Span θmax Sampler f
BIOGRAPHIES BIBLIOGRAPHY