IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 48, NO. 1, JANUARY 2000
Abstract—Joint source-channel decoding based on residual source redundancy is an effective paradigm for error-resilient data compression. While previous work only considered fixed-rate systems, the extension of these techniques for variable-length encoded data was recently independently proposed by the authors and by Demir and Sayood. In this letter, we describe and compare the performance of a computationally complex exact maximum a posteriori (MAP) decoder, its efficient approximation, an alternative approximate decoder, and an improved version of this decoder suggested here. Moreover, we evaluate several source and channel coding configurations. The results show that our approximate MAP technique outperforms other approximate methods and provides substantial error protection to variable-length encoded data.

Index Terms—Hidden Markov models, joint source-channel decoding, MAP estimation, residual redundancy, variable length codes.

I. INTRODUCTION

and standard forward-error correction (FEC). However, FEC may be bandwidth-inefficient when the channel conditions are fairly mild. A variable-rate extension of JSC decoding could thus potentially replace FEC under mild conditions, and for noisier channels, it could possibly be used with FEC to reduce the coding rate and to extend the range of conditions under which the bit stream is adequately protected.

However, the nature of variable-rate systems greatly complicates the estimation problem at the decoder. Whereas, under certain statistical assumptions, optimal maximum a posteriori (MAP) sequence estimation can be efficiently achieved in the fixed-rate case by dynamic programming on a trellis, this is not possible, in general, in the variable-rate case. Recently, several different variable-rate approaches were proposed. In [6], a computationally complex exact MAP decoding method and an efficient approximation were both suggested. In [2], a different approximate method was independently proposed. In [13], another method was proposed specifically for memoryless sources.
Fig. 1. Communications system models considered in this letter, based on (a) BCE and (b) NCE.
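To make the transmission chain of Fig. 1 concrete, the sketch below Huffman-encodes a block of quantization indices and passes the resulting bit stream through a binary symmetric channel. The codebook, the index block, and the bit-error rate are illustrative assumptions for this sketch, not the letter's actual configuration (the experiments there use 6-, 8-, and 14-level quantizers).

```python
import random

# Hypothetical Huffman codebook for a 4-level quantizer (illustrative only).
CODEBOOK = {0: "0", 1: "10", 2: "110", 3: "111"}

def encode(symbols):
    """Concatenate Huffman codewords; the total bit length B varies per block."""
    return "".join(CODEBOOK[s] for s in symbols)

def bsc(bits, ber, rng):
    """Binary symmetric channel: flip each bit independently with probability ber."""
    return "".join(b if rng.random() >= ber else "01"[b == "0"] for b in bits)

s = [0, 2, 1, 0, 3]                          # quantization-index block (T = 5)
b = encode(s)                                # encoded bit stream ("0110100111")
r = bsc(b, ber=0.05, rng=random.Random(0))   # corrupted stream seen by the decoder
```

Because the code is variable length, the bit length B of a block depends on its symbol content, which is precisely what complicates decoding at the receiver.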
symbol alphabet of size N. In case a), we take s = (s_1, ..., s_T) to be the quantization index sequence output by the source encoder, and in b), we take it to be the symbol sequence output by the nonbinary convolutional encoder. In either case, each index is Huffman coded, resulting in an encoded bit stream b = (b_1, ..., b_B), with the length B varying for each block of T symbols. For decoding, in case a), we propose inner decoding of the convolutional code followed by outer JSC decoding. In case b), we use JSC decoding followed by NCE decoding (a symbol-to-quantization index mapping). In this section, we focus on the JSC decoding, common to both configurations. The JSC decoder, which is assumed to know the length B based on transmitted side information, receives the corrupted bit stream r = (r_1, ..., r_B).1 Its objective is to estimate s given r. As in previous work [4], [8], [12], we model residual redundancy by assuming the random process {s_t} is first-order Markovian, based on the conditional probabilities P[s_t | s_{t-1}] and the initial probabilities P[s_1]. We will also assume that the channel in Fig. 1 is binary symmetric, with bit-error rate (BER) ε. Moreover, in Fig. 1(a), we will find it useful to model the effective channel formed by the BCE encoder-channel-BCE decoder cascade as also being memoryless and binary symmetric. While the true effective channel will in general have memory, the memoryless assumption allows us to directly apply the JSC decoding method from [6]. Also, the decoder based on this (somewhat inaccurate) model will be seen to achieve good performance. An extension of [6] based on more accurate modeling can be identified as a future direction.2 Given our modeling assumptions, the decoder objective addressed here is to estimate the sequence of MAP probability, i.e., to realize the sequence MAP rule ŝ = arg max_s P[s | r]. For the fixed-rate case, sequence MAP decoding can be posed as a search for the optimal path through an N-state trellis, efficiently implementable via dynamic programming [8]. In the variable-rate case, the optimal sequence MAP decoder can also be achieved by dynamic programming as proposed in [6]. However, the problem cannot be represented on a trellis, but rather it must be posed on a more complicated directed graph, due to the lack of synchronization between the information symbols and the corrupted, received bits.

Two alternative graph representations proposed in [6] and [2] are shown, respectively, in Figs. 2 and 3, for a particular alphabet size and set of codewords. In both figures, the small boxes represent possible decoder states, with a decoded sequence specified by a connected path through the directed graph. However, in each case, the states have different meanings. In the bit-constrained case [6] (Fig. 2), the horizontal axis represents bit "time," with each state at "time" n representing a particular set of symbol sequences that exhaust n bits. The sequences in the set are characterized by having a common length and a common terminating symbol. The enclosed number within a state box is the length of the symbol sequence. In the symbol-constrained case [2] (Fig. 3), the horizontal axis represents symbol "time," with each state at "time" t representing a particular set of symbol sequences of length t. The sequences in this set are characterized by having a common terminating symbol and by exhausting the same number of bits. The enclosed number within a state is the length of the associated bit sequence. The statistical modeling assumptions guarantee that for both graph representations, one can compute log P[s, r] (which differs from log P[s | r] by the constant log P[r]) by summing metric contributions associated with each branch along the decoded path. Each branch contributes a term of the form log P[s_t | s_{t-1}] + log P[r_{n_{t-1}+1}, ..., r_{n_t} | s_t], with the second probability based on the BER and the Hamming distance between the Huffman codeword for s_t and its corrupted version (r_{n_{t-1}+1}, ..., r_{n_t}), a subsequence of bits from r. Here, n_t is the length of the bit sequence required to decode up to symbol t. A dynamic programming algorithm for realizing the exact MAP rule was suggested in [6] for the structure in Fig. 2. This approach could be applied to the graph in Fig. 3 as well. However, in both representations, the number of states grows with the sequence length and can become very large, making exact MAP decoding computationally impractical.3 This complexity necessitated the development of approximate schemes, proposed independently in [6] and [2], based, respectively, on the structures in Figs. 2 and 3.

There is no strong reason to favor either graph structure for exact MAP decoding. However, as we indicate in this section, the structures differ markedly when one considers approximate schemes. For the graph in Fig. 2, consider the set of states within any dashed box. In each such set, all states exhaust the same number (n) of bits and terminate a symbol subsequence at bit n using the same decoded symbol. Such sets are defined at all

1 In case a), r is the output of the convolutional decoder.
2 A Markovian assumption, based on a finite number of channel states, is more reasonable than the memoryless BSC for modeling the effective channel. If the channel and source are both assumed Markovian, then the (source, channel) pairs are likewise Markovian. Thus, one can develop extensions of the sequence estimation methods considered here, based on this (somewhat) more complex statistical model.
3 As an example, for T = 256 (an image row length), N = 8, B = 429, and a particular set of VLC codewords, the number of states for the bit-constrained graph structure will reach 1365.
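The per-branch metric described above (a source transition term plus a BSC likelihood term determined by the BER and a Hamming distance) can be sketched as follows. The transition matrix and codebook are invented stand-ins for the letter's models, shown here only to make the form of the metric concrete.

```python
import math

# Illustrative stand-ins: a 3-symbol alphabet with first-order transition
# probabilities P[s_t | s_{t-1}] and a small Huffman codebook.
TRANS = [[0.6, 0.3, 0.1],
         [0.3, 0.4, 0.3],
         [0.1, 0.3, 0.6]]
CODEBOOK = {0: "0", 1: "10", 2: "11"}

def branch_metric(prev_sym, sym, received, ber):
    """log P[s_t | s_{t-1}] + log P[received bits | codeword for s_t].

    On a memoryless binary symmetric channel, the channel term depends only
    on the BER and the Hamming distance between the transmitted Huffman
    codeword and the corresponding received subsequence."""
    code = CODEBOOK[sym]
    assert len(received) == len(code)
    d = sum(c != r for c, r in zip(code, received))  # Hamming distance
    channel = d * math.log(ber) + (len(code) - d) * math.log(1.0 - ber)
    return math.log(TRANS[prev_sym][sym]) + channel
```

Summing such terms along a path through either graph accumulates log P[s, r] for the corresponding decoded sequence.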
Fig. 2. Bit-constrained directed graph representation for VLC decoding/MAP sequence estimation in [6].
bit lengths. Our approximate method applied to this graph [6] is similar to dynamic programming in that it involves a "forward" operation for propagating and accumulating metrics and saving pointers to previous states, and a "traceback" operation to find the best state sequence. However, whereas in standard dynamic programming this forward operation effectively saves, at each time, the best partial sequence terminating in each state, the approximate method reduces complexity by saving only one sequence for each set of states belonging to the same dashed set. The saved sequence is the one best in the sense of a posteriori probability. For each dashed set, this approach compares symbol sequences of different lengths and then prunes all but one. The sequence pruning is achieved by removing, from each dashed set, all but one state (the one indexed by the length of the optimal sequence). Thus, a state reduction operation is performed within the Viterbi algorithm's forward step, for each dashed set, at each bit time.

Let us consider an example of state reduction for the graph in Fig. 2. Denote the jth state in the ith dashed set at bit length n by s_n(i, j). Then, for the first dashed set at a given bit length n, the method will choose between s_n(1, 1) and s_n(1, 2).4 The saved state is the one which terminates the symbol sequence of MAP (log) probability. Note that this operation at bit length n prunes states and branches from the graph, thus restricting the candidate symbol sequences (and states) considered at subsequent bit lengths. Using the same example, if s_n(1, 1) is chosen over s_n(1, 2), then s_n(1, 2) is removed from the graph, along with all branches emanating from this state. When subsequently performing state reduction for the first dashed set at a later bit length, symbol sequences that include a branch from s_n(1, 2) will not be considered as candidates.5

4 Here, s_n(1, 1) corresponds to the state with symbol length 4 and s_n(1, 2) to the state with symbol length 3. The states within dashed sets are enumerated from the top down.
5 Moreover, if a removed state at bit length n − 1 is the only source of branch connections for some possible state at bit length n, then the latter state will not be created. For example, if s_{n−1}(1, 1) is removed, then s_n(1, 1) will not be created.

While it is not implemented this way, it may be useful to think of our approximate method [6] in terms of the following equivalent procedure: 1) perform state reduction on the graph up to bit time n and 2) perform optimal (MAP) sequence estimation on the resulting reduced graph using the Viterbi algorithm. A more detailed pseudocode description of both our approximate and exact MAP algorithms can be found in [6].

Fig. 3. Symbol-constrained directed graph representation for VLC decoding/approximate MAP sequence estimation in [2].

At this point, we make two observations about our approximate method. First, although the cardinality of each dashed set grows with bit length, the number of retained states for each dashed set remains constant, at one. Thus, the complexity of the method remains manageable for increasing bit length. Second, we emphasize that our state reduction rule is based on a MAP criterion. To be precise, consider the jth state in the ith dashed set at bit length n. Let s*(n, i, j) denote the symbol sequence with MAP probability from the set of sequences terminating in this state. Here, G(n − 1) is the graph "carved out" by state reduction through bit length n − 1. The explicit notational dependence of s*(n, i, j) on G(n − 1) indicates that the candidate symbol sequences are restricted to those that trace paths through the reduced graph G(n − 1). Given this notation, state reduction for set i at bit length n realizes the bit-constrained MAP rule

    j* = arg max_{j in J(n, i)} P[s*(n, i, j) | r_1, ..., r_n],

with r_1, ..., r_n the received bits through bit time n and J(n, i) the subset of states in the dashed set consistent with the reduced graph G(n − 1). The state hypotheses all correspond to symbol sequences of different lengths, but they all try to "explain" the same observed data sequence r_1, ..., r_n. To summarize, our "bit-constrained" method performs state reduction consistent with a MAP criterion and finds the symbol sequence that is MAP-optimal on the resulting reduced graph via dynamic programming.

Now, consider approximate decoding for the symbol-constrained graph in Fig. 3 [2]. Again, the dashed sets group together states that terminate symbol sequences using the same decoded symbol. However, now, rather than exhausting the same number of bits, all states in any dashed set exhaust the same number of symbols. The method in [2], similar to [6],
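As a rough illustration of the bit-constrained state reduction described above, the sketch below keeps a single state per dashed set (keyed here by the terminating symbol) at each bit time, inside a Viterbi-style forward pass. This is a simplified reading of the BC-AMAP idea, not the authors' implementation; the source model, codebook, and BER are invented for the example.

```python
import math

CODEBOOK = {0: "0", 1: "10", 2: "11"}
TRANS = [[0.6, 0.3, 0.1],
         [0.3, 0.4, 0.3],
         [0.1, 0.3, 0.6]]
INIT = [0.5, 0.3, 0.2]

def bsc_loglik(code, received, ber):
    # BSC likelihood from the Hamming distance between codeword and received bits
    d = sum(c != r for c, r in zip(code, received))
    return d * math.log(ber) + (len(code) - d) * math.log(1.0 - ber)

def bc_amap(r, ber):
    """One-state-per-dashed-set forward pass on the bit-constrained graph.

    layers[n] maps terminating symbol -> (accumulated log metric, decoded
    sequence) for paths that exhaust exactly n received bits; keeping a
    single entry per key is the state-reduction step."""
    B = len(r)
    layers = [dict() for _ in range(B + 1)]
    layers[0][None] = (0.0, ())
    for n in range(B):
        for last, (metric, seq) in layers[n].items():
            for sym, code in CODEBOOK.items():
                if n + len(code) > B:
                    continue
                prior = INIT[sym] if last is None else TRANS[last][sym]
                m = metric + math.log(prior) + bsc_loglik(code, r[n:n + len(code)], ber)
                layer = layers[n + len(code)]
                # state reduction: retain only the best candidate per dashed set
                if sym not in layer or m > layer[sym][0]:
                    layer[sym] = (m, seq + (sym,))
    # traceback is trivial here: sequences were carried along with the states
    return max(layers[B].values())[1]
```

On a clean stream, decoding "01011" at BER 0.01 recovers the unique parse (0, 1, 2); under channel errors, the retained-state approximation need not match exact MAP decoding.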
involves a "forward" operation and a "traceback." Similar to [6], this approach approximates the Viterbi "forward" step by saving, at each symbol length, only the best state from each dashed set. Thus, the number of saved states remains constant for increasing symbol length, with the complexity of the resulting method again manageable. Clearly, the methods from [6] and [2] implement quite similar operations. Thus, we might expect them to perform roughly the same. However, there are potential problems with the approach in [2], as next indicated. Note that whereas state reduction in [6] compares symbol sequences of different lengths that exhaust the same number of bits, the method in [2] compares symbol sequences of the same length that exhaust different numbers of bits. This suggests that state reduction in [2] might be equivalent to a "symbol-constrained" MAP rule

    j* = arg max_{j in J(t, i)} P[s*(t, i, j) | r(t, i, j)].

Here, t is the symbol sequence length, with r(t, i, j) the received bit sequence associated with the jth state in the ith dashed set. Again, J(t, i) is the subset of states in the dashed set consistent with the reduced graph. Also, s*(t, i, j) is the sequence of symbols with MAP probability from the set of sequences that are consistent with the state-reduced graph and that terminate in state s_t(i, j). We emphasize that unlike the previous bit-constrained case, the bit sequences associated with states in the same dashed set all have different lengths. The significance of this observation is next seen.

It is not obvious which state reduction rule—bit-constrained or symbol-constrained MAP—is to be preferred. However, a difficulty with [2] is that it only implements an approximation to the symbol-constrained MAP rule. In particular, note that for both decoder structures (Figs. 2 and 3), the accumulated metric used for state reduction consists of only two terms: the log likelihood of a symbol sequence starting at symbol time 1 and ending at some symbol time t, log P[s_1, ..., s_t], and the conditional log likelihood of a received bit sequence, log P[r | s_1, ..., s_t]. However, the log a posteriori probability is

    log P[s | r] = log P[s] + log P[r | s] − log P[r].

Thus, for both approximate decoder structures, the term log P[r] is omitted. In the bit-constrained case, this does not introduce any suboptimality in the state reduction rule, since all competing states from the same dashed set involve the same received bit sequence. However, omitting this term does introduce suboptimality in the symbol-constrained case, since each state from the same dashed set corresponds to a bit sequence with a different length. Thus, to achieve MAP state reduction in the symbol-constrained case, each candidate requires a different term log P[r(t, i, j)], omitted in [2]. Moreover, one cannot easily correct this problem in a precise way, since calculation of P[r(t, i, j)] requires at least as much computation as the exact MAP rule itself.6 As a practical way to improve the method from [2], we assume all bit sequences of a given length are equally likely, i.e., we let log P[r] = −n log 2 for a bit sequence of length n. We then modify the symbol-constrained approach to include this term in the accumulated metric used for state reduction. Next, we consider the effect of this modification, and more generally, evaluate the performance of all of the JSC decoders discussed here.

III. RESULTS

We tested the various methods for DPCM coding of images over a binary symmetric channel. The choice of DPCM allows comparison with [2] and provides a simple framework for evaluating decoding techniques and basic system configurations, rather than focusing on absolute image coding performance. We also note that DPCM is often practically used for coding low-frequency transform and subband signals. We considered both variable- and fixed-rate systems, with and without FEC, with the FEC based on BCE or NCE. In [2], a system composed of NCE-based channel encoding and JSC decoding was compared with a more conventional system based on BCE and (standard Viterbi) channel decoding. There, the NCE-based system was found to achieve superior performance. However, in the BCE-based scheme, no JSC decoding was used. Here, for the BCE-based scheme, we use both inner (bit level, Viterbi based) decoding and outer (symbol level, JSC) decoding.

For BCE, we chose a rate 2/3 convolutional encoder with constraint length 2 as in [10]. This was applied to the Huffman-coded indices associated with a six-level quantizer. In this case [Fig. 1(a)], rather than using ε, the JSC decoder used the effective BER of the concatenated BCE-channel-Viterbi decoder system. For NCE, we matched the code to the (six-level) quantization to increase residual redundancy in the sequence, as in earlier work [2]. We thus chose an NCE with 6 input symbols and 12 output symbols.7 Denote the quantization index at symbol time t by u_t. Then, the NCE output v_t is given by the equation

    v_t = …,   if …
          …,   otherwise.

Our particular choices for BCE and NCE were made so that the different systems would achieve comparable bit rates, allowing useful comparisons.

For variable rates, in conjunction with BCE and Viterbi-based (inner) decoding, we implemented the bit-constrained approximate MAP (BC-AMAP) (outer) JSC decoder [6]. This method is denoted BC-AMAP + BCE. For variable rates, in conjunction with NCE, we implemented the exact MAP decoder (denoted MAP + NCE) [6], the bit-constrained approximation (BC-AMAP + NCE) [6], the symbol-constrained method (SC-AMAP + NCE) [2], and its modification suggested here (MSC-AMAP + NCE). We also implemented a 14-level variable-rate quantizer with decoding solely based on BC-AMAP, i.e., without channel coding (denoted BC-AMAP~FEC). For fixed-rate DPCM, we used an eight-level quantizer with: 1) conventional DPCM decoding (denoted FLC); 2) channel-optimized scalar quantization (COSQ) [3]; and 3) sequence-based,

6 Although one may use either the graph structure in Fig. 2 or the one in Fig. 3 to calculate P[r], the probability of a bit sequence of length n, it is both simpler and more intuitive to consider the structure in Fig. 2. For this structure, P[r] can be computed by first recursively calculating forward variables (see [11]) for all states in the graph up to and including bit length n. The sum of the forward variables at length n gives the required probability.
7 We chose an ordered mapping of quantization levels to indices, with an increasing sequence of integer indices assigned to the sequence of quantization levels ordered from smallest to largest.
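The practical correction proposed above, taking all n-bit sequences as equally likely so that −log P[r] ≈ n log 2, can be illustrated with invented metric values. The point of the sketch is that the added length term makes candidates with different bit lengths comparable during symbol-constrained state reduction; the numbers themselves are hypothetical.

```python
import math

def sc_metric(log_prior, log_chan):
    """Accumulated metric used for symbol-constrained state reduction in [2]:
    log P[s] + log P[r | s] only; the log P[r] term is omitted."""
    return log_prior + log_chan

def msc_metric(log_prior, log_chan, n_bits):
    """Modified metric (MSC-AMAP): approximate -log P[r] by n*log 2 for an
    n-bit sequence, i.e., assume all n-bit sequences are equally likely."""
    return log_prior + log_chan + n_bits * math.log(2.0)

# Two hypothetical candidates in the same dashed set (same symbol length,
# different bit lengths) with invented log-probabilities:
a = dict(log_prior=-2.0, log_chan=-1.6, n_bits=10)  # longer bit sequence
b = dict(log_prior=-2.0, log_chan=-1.5, n_bits=7)   # shorter bit sequence

# Uncorrected rule prefers b (-3.5 > -3.6); the length-corrected rule
# prefers a, since the n*log 2 term rewards explaining more received bits.
uncorrected_winner = "a" if sc_metric(-2.0, -1.6) > sc_metric(-2.0, -1.5) else "b"
corrected_winner = "a" if msc_metric(**a) > msc_metric(**b) else "b"
```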
IV. CONCLUSION

We have described several methods for extending JSC decoding to VLC data. In comparing these techniques, we found the approximate MAP method from [6] to achieve the best results among the alternative approximate decoders. Our experiments identified BER regimes where fundamentally different system configurations (fixed rate, variable rate, FEC using either BCE or NCE, no explicit channel coding) are preferred. In very recent work, we have developed an alternative decoder for variable-rate systems based on a minimum mean-squared error criterion [7]. Future work may address extensions, including bit allocation strategies and alternative source and channel models. We may also apply the new JSC methods to other compression environments, such as subband coding.

ACKNOWLEDGMENT

The authors would like to thank Dr. R. Van Dyck from Pennsylvania State University for a suggestion that helped to improve the paper.

REFERENCES

[1] E. Ayanoğlu and R. M. Gray, "The design of joint source and channel trellis waveform coders," IEEE Trans. Inform. Theory, vol. IT-33, pp. 855–865, Nov. 1987.
[2] N. Demir and K. Sayood, "Joint source/channel coding for variable length codes," in Proc. Data Comp. Conf., Snowbird, UT, 1998, pp. 139–148.
[3] A. J. Kurtenbach and P. A. Wintz, "Quantizing for noisy channels," IEEE Trans. Commun., vol. COM-17, pp. 291–302, Apr. 1969.
[4] D. J. Miller and M. Park, "A sequence-based approximate MMSE decoder for source coding over noisy channels using discrete hidden Markov models," IEEE Trans. Commun., vol. 46, pp. 222–231, Feb. 1998.
[5] A. H. Murad and T. E. Fuja, "Joint source-channel decoding of variable-length encoded sources," in Proc. Information Theory Workshop, Killarney, Ireland, June 1998.
[6] M. Park and D. J. Miller, "Decoding entropy-coded symbols over noisy channels by MAP sequence estimation for asynchronous HMMs," in Proc. Conf. Information Sciences and Systems, Princeton, NJ, Mar. 1998, pp. 477–482.
[7] M. Park and D. J. Miller, "Improved joint source-channel decoding for variable-length encoded data using soft decisions and MMSE estimation," in Proc. Data Comp. Conf., Snowbird, UT, Mar. 1999, p. 544.
[8] N. Phamdo and N. Farvardin, "Optimal detection of discrete Markov sources over discrete memoryless channels—Application to combined source channel coding," IEEE Trans. Inform. Theory, vol. 40, pp. 186–193, Jan. 1994.
[9] N. Phamdo, F. Alajaji, and N. Farvardin, "Quantization of memoryless and Gauss–Markov sources over binary Markov channels," IEEE Trans. Commun., vol. 45, pp. 668–675, June 1997.
[10] J. G. Proakis, Digital Communications. New York: McGraw-Hill, 1995, p. 472.
[11] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice Hall, 1993, pp. 335–337.
[12] K. Sayood and J. C. Borkenhagen, "Use of residual redundancy in the design of joint source/channel coders," IEEE Trans. Commun., vol. 39, pp. 838–846, June 1991.
[13] K. P. Subbalakshmi and J. Vaisey, "Optimal decoding of entropy coded memoryless sources over binary symmetric channels," in Proc. Data Comp. Conf., Snowbird, UT, 1998, p. 573.