Rebollo EURASIPSig Proc 2006

High-Rate Quantization and Transform Coding
with Side Information at the Decoder

David Rebollo-Monedero , Shantanu Rane, Anne Aaron and Bernd Girod
Information Systems Lab., Dept. of Electrical Engineering
Stanford University, Stanford, CA 94305, USA
Abstract
We extend high-rate quantization theory to Wyner-Ziv coding, i.e., lossy source coding with side information
at the decoder. Ideal Slepian-Wolf coders are assumed, thus rates are conditional entropies of quantization indices
given the side information. This theory is applied to the analysis of orthonormal block transforms for Wyner-Ziv
coding. A formula for the optimal rate allocation and an approximation to the optimal transform are derived. The
case of noisy high-rate quantization and transform coding is included in our study, in which a noisy observation
of source data is available at the encoder, but we are interested in estimating the unseen data at the decoder,
with the help of side information.
We implement a transform-domain Wyner-Ziv video coder that encodes frames independently but decodes
them conditionally. Experimental results show that using the discrete cosine transform results in a rate-distortion
improvement with respect to the pixel-domain coder. Transform coders of noisy images for dierent communica-
tion constraints are compared. Experimental results show that the noisy Wyner-Ziv transform coder achieves a
performance close to the case in which the side information is also available at the encoder.
Keywords: high-rate quantization, transform coding, side information, Wyner-Ziv coding, distributed source cod-
ing, noisy source coding
1. Introduction nating motion compensation, and decoding using

past frames as side information, while keeping the
Rate-distortion theory for distributed source eciency close to that of motion-compensated en-
coding [36] shows that under certain conditions, coding [911]. In addition, even if the image cap-
the performance of coders with side information tured by the video encoder is corrupted by noise,
available only at the decoder is close to the case we would still wish to recover the clean, unseen
in which both encoder and decoder have access data at the decoder, with the help of side infor-
to the side information. Under much more re- mation, consisting of previously decoded frames,
strictive statistical conditions, this also holds for and perhaps some additional local noisy image.
coding of noisy observations of unseen data [7,8]. In these examples, due to complexity con-
One of the many applications of this result is re- straints in the design of the encoder, or simply
ducing the complexity of video encoders by elimi- due to the unavailability of the side information
This
at the encoder, conventional, joint denoising and
work was supported by NSF under Grant No. CCR-
0310376. The material in this paper was presented par-
coding techniques are not possible. We need prac-
tially at the 37th and 38th editions of the Asilomar Confer- tical systems for noisy source coding with decoder
ence on Signals, Systems and Computers, Pacic Grove, side information, capable of the rate-distortion
CA, Nov. 2003 [1] and 2004 [2]. performance predicted by information-theoretic
Corresponding author. Tel.: +1-650-724-3647. E-mail
address: drebollo@stanford.edu (D. Rebollo-Monedero).

studies. To this end, it is crucial to extend the
2 D. Rebollo-Monedero, S. Rane, A. Aaron, B. Girod
building blocks of traditional source coding and tings. In particular, [32] considered the important
denoising, such as lossless coding, quantization, case of ideal Slepian-Wolf coding of the quantiza-
transform coding and estimation, to distributed tion indices, at a rate equal to the conditional
source coding. entropy given the side information. In [3436],
It was shown by Slepian and Wolf [3] that loss- nested lattice quantizers and trellis-coded quan-
less distributed coding can achieve the same per- tizers followed by Slepian-Wolf coders were used
formance as joint coding. Soon after, Wyner and to implement WZ coders.
Ziv [4,12] established the rate-distortion limits for The Karhunen-Loève Transform (KLT) [37
lossy coding with side information at the decoder, 39] for distributed source coding was investigated
which we shall refer to as Wyner-Ziv (WZ) cod- in [40,41], but it was assumed that the covariance
ing. Later, an upper bound on the rate loss due matrix of the source vector given the side infor-
to the unavailability of the side information at mation does not depend on the values of the side
the encoder was found in [5], which also proved information, and the study was not in the con-
that for power-dierence distortion measures and text of a practical coding scheme with quantiz-
smooth source probability distributions, this rate ers for distributed source coding. Very recently,
loss vanishes in the limit of small distortion. A the distributed KLT was studied in the context
similar high-resolution result was obtained in [13] of compression of Gaussian source data, assum-
for distributed coding of several sources with- ing that the transformed coecients are coded at
out side information, also from an information- the information-theoretic rate-distortion perfor-
theoretic perspective, that is, for arbitrarily large mance [42,43]. Most of the recent experimental
dimension. In [14] (unpublished), it was shown work on WZ coding uses transforms [44,1,24].
that tessellating quantizers followed by Slepian- There is extensive literature on source cod-
Wolf coders are asymptotically optimal in the ing of a noisy observation of an unseen source.
limit of small distortion and large dimension. The nondistributed case was studied in [4547],
It may be concluded from the proof of the con- and [7,4850,8] analyzed the distributed case
verse to the WZ rate-distortion theorem [4] that from an information-theoretic point of view. Us-
there is no asymptotic loss in performance by con- ing Gaussian statistics and Mean-Squared Error
sidering block codes of suciently large length, (MSE) as a distortion measure, [13] proved that
which may be seen as vector quantizers, followed distributed coding of two noisy observations with-
by xed-length coders. This suggests a conve- out side information can be carried out with a
nient implementation of WZ coders as quantiz- performance close to that of joint coding and de-
ers, possibly preceded by transforms, followed by noising, in the limit of small distortion and large
Slepian-Wolf coders, analogously to the imple- dimension. Most of the operational work on dis-
mentation of nondistributed coders. tributed coding of noisy sources, that is, for a
Practical distributed lossless coding schemes xed dimension, deals with quantization design
have been proposed, adapting channel coding for a variety of settings [5154], but does not con-
techniques such as turbo codes and low-density sider the characterization of such quantizers at
parity-check codes, which are approaching the high rates or transforms.
Slepian-Wolf bound, e.g., [1523]. See [24] for a A key aspect in the understanding of opera-
much more exhaustive list. tional coding is undoubtedly the theoretic charac-
The rst studies on quantizers for WZ cod- terization of quantizers at high rates [55], which is
ing were based on high-dimensional nested lat- also fundamental in the theoretic study of trans-
tices [2527], or heuristically designed scalar forms for data compression [56]. In the litera-
quantizers [16,28], often applied to Gaussian ture reviewed at this point, the studies of high-
sources, with xed-length coding or entropy cod- rate distributed coding are information theoretic,
ing of the quantization indices. A dierent ap- thereby requiring arbitrarily large dimension,
proach was followed in [2932], where the Lloyd among other constraints. On the other hand, the
algorithm [33] was generalized for a variety of set- aforementioned studies of transforms applied to
High-Rate Quantization and Transform Coding with Side Information at the Decoder 3
compression are valid only for Gaussian statistics 2. High-Rate WZ Quantization of Clean Sources
and assume that the transformed coecients are
coded at the information-theoretic limit. We study the properties of high-rate quantizers
In this paper, we provide a theoretic charac- for the WZ coding setting in Fig. 1. The source
terization of high-rate WZ quantizers for a xed
dimension, assuming ideal Slepian-Wolf coding of X Q X
q( x) x(q, y )
the quantization indices, and we apply it to de-
velop a theoretic analysis of orthonormal trans-
forms for WZ coding. Both the case of coding
Y
of directly observed data, and the case of coding
of a noisy observation of unseen data, are consid- Fig. 1. WZ quantization.
ered. We shall refer to these two cases as coding
of clean sources and noisy sources, respectively. data to be quantized is modeled by a continuous
The material in this paper was presented partially random vector X of nite dimension n. Let the
in [1,2]. quantization function q(x) map the source data
Section 2 presents a theoretic analysis of high- into the quantization index Q. A random vari-
rate quantization of clean sources, and Section 3, able Y , distributed in an arbitrary alphabet, dis-
of noisy sources. This analysis is applied to the crete or continuous, plays the role of side infor-
study of transforms of the source data in Sec- mation, available only at the receiver. The side
tions 4 and 5, also for clean and noisy sources, re- information and the quantization index are used
spectively. Section 6 analyzes the transformation jointly to estimate the source data. Let X repre-
of the side information itself. In Section 7, exper- sent this estimate, obtained with the reconstruc-
imental results on a video compression scheme tion function x (q, y).
using WZ transform coding, and also on image MSE is used as a distortion measure, thus
denoising, are shown to illustrate the clean and the expected distortion per sample is D =
2
n E X X . The rate for asymmetric Slepian-
noisy coding cases. 1
Throughout the paper, we follow the conven- Wolf coding of Q given Y has been shown to be
tion of using uppercase letters for random vari- H(Q|Y ) in the case when the two alphabets of the
ables, including random scalars, vectors, or ab- random variables involved are nite [3], in the
stract random ensembles, and lowercase letters sense that any rate greater than H(Q|Y ) would
for the particular values that they take on. The allow arbitrarily low probability of decoding er-
measurable space in which a random variable ror, but any rate lesser than H(Q|Y ) would not.
takes values will be called alphabet. Let X In [57], the validity of this result has been general-
be a random variable, discrete, continuous or ized to countable alphabets, but it is still assumed
in an arbitrary alphabet, possibly vector valued. that H(Y ) < . We show in Appendix A that
Its probability function, if it exists, will be de- the asymmetric Slepian-Wolf result remains true
noted by pX (x), whether it is a probability mass under the assumptions in this paper, namely, for
function (PMF) or a probability density func- any Q in a countable alphabet and any Y in an
tion (PDF)(a) . For notational convenience, the arbitrary alphabet, possibly continuous, regard-
covariance operator Cov and the letter will be less of the niteness of H(Y ). The formulation
used interchangeably. For example, the condi- in this work assumes that the coding of the in-
tional covariance of X| y is the matrix function dex Q with side information Y is carried out by an
X|Y (y) = Cov[X| y]. ideal Slepian-Wolf coder, with negligible decod-
ing error probability and rate redundancy. The
expected rate per sample is dened accordingly
as R = n1 H(Q| Y ) [32].
(a) As a matter of fact, a PMF is a PDF with respect to We emphasize that the quantizer only has ac-
the counting measure. cess to the source data, not to the side informa-
tion. However, the joint statistics of X and Y are satises, for large RX|Y (y),
assumed to be known, and are exploited in the 2
design of q(x) and x(q, y). We consider the prob- DX|Y (y) Mn V (y) n , (1)
lem of characterizing the quantization and recon- RX|Y (y) 1
n (h(X| y) log2 V (y)) , (2)
struction functions that minimize the expected 2
2RX|Y (y)
Lagrangian cost C = D + R, with a nonnega- DX|Y (y) Mn 2 n h(X|y)
2 . (3)
tive real number, for high rate R. Then, there exists an asymptotically optimal
The theoretic results are presented in Theo- quantizer q(x) for large R, for the WZ coding set-
rem 1. The theorem holds if the Bennett as- ting considered such that:
sumptions [58,59] apply to the conditional PDF
pX|Y (x| y) for each value of the side informa- 1. q(x) is a uniform tessellating quantizer with
tion y, and if Gershos conjecture [60] is true minimum moment of inertia Mn and cell
(known to be the case for n = 1), among other volume V .
technical conditions, mentioned in [55]. For a rig- 2. No two cells of the partition dened by q(x)
orous treatment of high-rate theory that does not need to be mapped into the same quantiza-
rely on Gershos conjecture, see [61,62]. tion index.
We shall use the term uniform tessellating
quantizer in reference to quantizers whose quan- 3. The rate and distortion satisfy
tization regions are possibly rotated versions of 2
D Mn V n , (4)
a common convex polytope, with equal volume.
Lattice quantizers are, strictly speaking, a par- R 1
n (h(X| Y ) log2 V ) , (5)
ticular case. In the following results, Gershos D Mn 2
2
n h(X|Y )
2 2R
. (6)
conjecture for nondistributed quantizers, which
allows rotations, will be shown to imply that op- Proof: The proof uses the quantization setting
timal WZ quantizers are also tessellating quan- in Fig. 2, which we shall refer to as a condi-
tizers, and the uniformity of the cell volume will tional quantizer, along with an argument of opti-
be proved as well(b) . Mn denotes the minimum mal rate allocation for q(x| y), where q(x| y) can
normalized moment of inertia of the convex poly- be regarded as a quantizer on the values x and y
topes tessellating Rn (e.g., M1 = 1/12). taken by the source data and the side informa-
tion, or a family of quantizers on x indexed by y.
Theorem 1 (High-rate WZ quantization). Sup-
In this case, the side information Y is available
pose that for each value y in the alphabet of Y ,
the statistics of X given Y = y are such that
X Q X
the conditional dierential entropy h(X| y) ex- q( x | y ) x(q, y )
ists and is nite. Suppose further that for
each y, there exists an asymptotically optimal
entropy-constrained uniform tessellating quan- Y
tizer of x, q(x| y), with rate RX|Y (y) and distor-
tion DX|Y (y), with no two cells assigned to the Fig. 2. Conditional quantizer.
same index and with cell volume V (y) > 0, which
to the sender, and the design of the quantiza-
(b) A tessellating quantizer need not be uniform. A triv- tion function q(x| y) on x, for each value y, is a
ial example is a partition of the real line into intervals of nondistributed entropy-constrained quantization
dierent length. In 1 dimension, uniform tesselating quan- problem. More precisely, for all y dene
tizers are uniform lattice quantizers. It is easy to construct
simple examples in R2 of uniform and nonuniform tesse- DX|Y (y) = 1 2 | y],
E[X X
lating quantizers that are not lattice quantizers using rec- n
tangles. However, the optimal nondistributed, xed-rate RX|Y (y) = 1
n H(Q| y),
quantizers for dimensions 1 and 2 are known to be lattices:
the Z-lattice and the hexagonal lattice respectively. CX|Y (y) = DX|Y (y) + RX|Y (y).

By iterated expectation, D = E DX|Y (Y ) and Proof: Use h(X| Y ) = 12 log2 (2e)n det X|Y ,
R = E RX|Y (Y ), thus the overall cost satises and Mn 2e1
[63], together with Theorem 1.
C = E CX|Y (Y ). As a consequence, a family of
quantizers q(x| y) minimizing CX|Y (y) for each y
also minimizes C.
Since CX|Y (y) is a convex function of RX|Y (y) 3. High-Rate WZ Quantization of Noisy Sources
for all y, it has a global minimum where its deriv-
ative vanishes, or equivalently, at RX|Y (y) such In this section, we study the properties of high-
that 2 ln 2 DX|Y (y). Suppose that is small rate quantizers of a noisy source with side infor-
enough for RX|Y (y) to be large and for the ap- mation at the decoder, as illustrated in Fig. 3,
proximations (1)-(3) to hold, for each y. Then, all which we shall refer to as WZ quantizers of a
quantizers q(x| y) introduce the same distortion noisy source. A noisy observation Z of some un-
(proportional to ) and consequently have a com-
mon cell volume V (y) V . This, together with Z Q X
the fact that EY [h(X| y)]y=Y = h(X| Y ), implies q( z ) x(q, y )
(4)-(6). Provided that a translation of the parti-
tion dened by q(x| y) aects neither the distor-
tion nor the rate, all uniform tessellating quantiz- Y
ers q(x| y) may be set to be (approximately) the Fig. 3. WZ quantization of a noisy source.
same, which we denote by q(x). Since none of the
quantizers q(x| y) maps two cells into the same seen source data X is quantized at the encoder.
indices, neither does q(x). Now, since q(x) is The quantizer q(z) maps the observation into a
asymptotically optimal for the conditional quan- quantization index Q. The quantization index is
tizer and does not depend on y, it is also optimal losslessly coded, and used jointly with some side
for the WZ quantizer in Fig. 1. information Y , available only at the decoder, to
obtain an estimate X of the unseen source data.
Equation (6) means that, asymptotically, there
x
(q, y) denotes the reconstruction function at the
is no loss in performance by not having access to
decoder. X, Y and Z are random variables with
the side information in the quantization.
known joint distribution, such that X is a con-
Corollary 2 (High-rate WZ reconstruction). Un- tinuous random vector of nite dimension n. No
der the hypotheses of Theorem 1, asymptotically, restrictions are imposed on the alphabets of Y
there is a quantizer that leads to no loss in per- and Z.
formance by ignoring the side information in the MSE is used as a distortion measure, thus
reconstruction. the expected distortion per sample of the unseen
source is D = n1 E X X 2 . As in the previ-
Proof: Since index repetition is not required, the ous section, it is assumed that the coding of the
distortion (4) would be asymptotically the same if index Q is carried out by an ideal Slepian-Wolf
the reconstruction x
(q, y) were of the form x
(q) = coder, at rate per sample R = n1 H(Q| Y ). We
E[X| q]. emphasize that the quantizer only has access to
the observation, not to the source data or the side
Corollary 3. Let X and Y be jointly Gaussian information. However, the joint statistics of X,
random vectors. Then, the conditional covari- Y and Z can be exploited in the design of q(z)
ance X|Y does not depend on y, and for large R, and x(q, y). We consider the problem of charac-
terizing the quantizers and reconstruction func-
1 tions that minimize the expected Lagrangian cost
D Mn 2e (det X|Y ) n 22R C = D + R, with a nonnegative real number,
1
(det X|Y ) n 22R . for high rate R. This includes the problem in the
n previous section as the particular case Z = X.
3.1. Nondistributed Case Take expectation to obtain (7). Note that the
rst term of (7) does not depend on the quantiza-
We start by considering the simpler case of
tion design, and the second is the MSE between X
quantization of a noisy source without side in-
and X.
formation, depicted in Fig. 4. The following
Let r(q) be the codeword length function of
a uniquely decodable code, that is, satisfying
Z Q X r(q)
q( z ) x(q) q2 1, with R = E r(Q). The La-
grangian cost of the setting in Fig. 4 can be writ-
Fig. 4. Quantization of a noisy source without side infor- ten as
mation.
C= 1
n (E tr Cov[X| Z]+
theorem extends the main result of [46,47] to
entropy-constrained quantization, valid for any + inf E inf {
x(Z) x
(q)2 + r(q)}),
x
(q),r(q) q
rate R = H(Q), not necessarily high. Dene
x
(z) = E[X| z], the best MSE estimator of X and the cost of the setting in Fig. 5 as
given Z, and X =x(Z).
C= 1
n (E tr Cov[X| Z]+
Theorem 4 (MSE noisy quantization). For any x
nonnegative and any Lagrangian-cost optimal + inf E inf {X (q)2 + r(q)}),
x
(q),r(q) q
quantizer of a noisy source without side infor-
mation (Fig. 4), there exists an implementation which give the same result. Now, since the ex-
with the same cost in two steps: pected rate is minimized for the (admissible) rate
measure r(q) = log pQ (q) and E r(Q) = H(Q),

1. Obtain the minimum MSE estimate X. both settings give the same Lagrangian cost with
2. Quantize the estimate X regarded as a a rate equal to the entropy.
clean source, using a quantizer q(x) and Similarly to the remarks on Theorem 1, the
a reconstruction function x
(q), minimizing hypotheses of the next theorem are believed to
E X X
2 + H(Q). hold if the Bennett assumptions apply to the PDF
This is illustrated in Fig. 5. Furthermore, the pX (
x) of the MSE estimate, and if Gershos con-
total distortion per sample is jecture is true among other technical conditions.
X
2 ), Theorem 5 (High-rate noisy quantization). As-
D= 1
n (E tr Cov[X| Z] + E X (7) < and that there exists a uni-
sume that h(X)
where the rst term is the MSE of the estimation form tessellating quantizer q( x) of X with cell
step. volume V that is asymptotically optimal in La-
grangian cost at high rates. Then, there exists an
asymptotically optimal quantizer q(z) of a noisy
Z X Q X source in the setting of Fig. 4 such that:
E[ X | z ] q( x ) x(q)
1. An asymptotically optimal implementation
Fig. 5. Optimal implementation of MSE quantization of a of q(z) is that of Theorem 4, represented
noisy source without side information.
in Fig. 5, with a uniform tessellating quan-
Proof: The proof is a modication of that in [47], tizer q(
x) having cell volume V .
replacing distortion by Lagrangian cost. De- 2. The rate and distortion per sample satisfy
ne the modied distortion measure d(z, x ) =
it is easy 2
E[X x2 | z]. Since X Z X, D 1 n,
X).
2 = E d(Z, By the n E tr Cov[X| Z] + Mn V
to show that E X X
orthogonality principle of linear estimation, R n (h(X) log2 V ),
1
2
x
d(z, (z)2 | z] +
) = E[X x x(z) x
2 . D 1
n E tr Cov[X| Z] + Mn 2 n h(X) 22R .
Proof: Immediate from Theorem 4 and conven- Y ) = h(X

4. h(X| Z | Y ).
tional theory of high-rate quantization of clean
sources. Proof: The proof is similar to that for clean
sources in Theorem 1 and only the dierences are
3.2. Distributed Case emphasized. First, as in the proof of WZ quanti-
zation of a clean source, a conditional quantiza-
We are now ready to consider the WZ quan- tion setting is considered, as represented in Fig. 6.
tization of a noisy source in Fig. 3. Dene An entirely analogous argument using conditional
x
(y, z) = E[X| y, z], the best MSE estimator costs, as dened in the proof for clean sources,
of X given Y and Z, X = x (Y, Z), and D = implies that the optimal conditional quantizer is
1
n E tr Cov[X| Y, Z]. The following theorem ex- an optimal conventional quantizer for each value
tends the results on high-rate WZ quantization of y. Therefore, using statistics conditioned on y
in Section 2 to noisy sources. The remark on
the hypotheses of Theorem 5 also applies here, Z Q X
where the Bennett assumptions apply instead to q( z, y ) x(q, y )
(
the conditional PDF pX|Y x| y) for each y.
Theorem 6 (High-rate noisy WZ quantization).
Y
Suppose that the conditional expectation func-
Fig. 6. Conditional quantization of a noisy source.
tion x
(y, z) is additively separable, i.e., x
(y, z) =
Y (y) + x
x Z (z), and dene X Z = x Z (Z). Sup- everywhere, by Theorem 4, the optimal condi-
pose further that for each value y in the alpha- tional quantizer can be implemented as in Fig. 7,
y) < , and that there exists a
bet of Y , h(X| with conditional costs
uniform tessellating quantizer q( with
x, y) of X,
no two cells assigned to the same index and cell 2
DX|Y (y) 1 n
n E[tr Cov[X| y, Z]| y] + Mn V (y) ,
volume V (y) > 0, with rate RX|Y (y) and dis-
RX|Y (y)
n (h(X| y) log2 V (y)),
1
tortion DX|Y
(y), such that, at high rates, it is
asymptotically optimal in Lagrangian cost and DX|Y (y) 1
n E[tr Cov[X| y, Z]| y]+
2
(y) Mn V (y) ,
DX|Y n 2
+ Mn 2 n h(X|y) 22RX|Y (y) .

RX|Y
(y)
1 y) log2 V (y) ,
h(X|
n
2 The derivative of CX|Y (y) with respect to
h(X|y) 2RX|Y
(y) Mn 2 n
DX|Y 2 (y)
.
Then, there exists an asymptotically optimal Z X Q X
E[ X y, z ] q( x , y ) x(q, y )
quantizer q(z) for large R, for the WZ quanti-
zation setting represented in Fig. 3 such that:
1. q(z) can be implemented as an estimator Y
Z (z) followed by a uniform tessellating
x Fig. 7. Optimal implementation of MSE conditional quan-
quantizer q(
xZ ) with cell volume V . tization of a noisy source.
2. No two cells of the partition dened by 2

xZ ) need to be mapped into the same
q( RX|Y (y) vanishes when 2 ln 2 Mn V (y) n ,
quantization index. which, as in the proof for clean sources, implies
that all conditional quantizers have a common
3. The rate and distortion per sample satisfy cell volume V (y) V (however, only the second
2 term of the distortion is constant, not the over-
D D + Mn V n , (8) all distortion). Taking expectation of the condi-
Y ) log2 V ),
R 1 (h(X| (9) tional costs proves that (8) and (9) are valid for
n
2 the conditional quantizer of Fig. 7. The validity
D D + Mn 2 n h(X|Y ) 22R . (10) of (10) for the conditional quantizer can be shown
by solving for V in (9) and substituting the result xZ ).

are the centroids of q(
into (8).
Proof: In the proof of Theorem 6, q(xZ ) is a uni-
The assumption that x (y, z) = xY (y) + x Z (z)
form tessellating quantizer without index repeti-
means that for two values of y, y1 and y2 , x (y1 , z)
tion, a translated copy of q(
x, y).
and x(y2 , z), seen as functions of z, dier only by
a constant vector. Since the conditional quantizer Theorem 6 and Corollary 7 show that the WZ
q(
of X, x| y), is a uniform tessellating quantizer quantization setting of Fig. 3 can be implemented
at high rates, a translation will neither aect the as depicted in Fig. 8, where xZ (q, y) can be made
distortion nor the rate, and therefore x (y, z) can independent from y without asymptotic loss in
be replaced by x Z (z) with no impact on the La- performance, so that the pair q( Z (q) form
xZ ), x
grangian cost. In addition, since all conditional a uniform tessellating quantizer and reconstructor
quantizers have a common cell volume, the same Z .
for X
translation argument implies that a common un-
conditional quantizer q( xZ ) can be used instead, Z XZ Q X Z X
xZ ( z ) q ( xZ ) xZ (q, y )
with performance given by (8)-(10), and since
conditional quantizers do not reuse indices, nei- xY ( y )
ther does the common unconditional quantizer.
Y
The last item of the theorem follows from the
Z | y) = h(X Z | y). Fig. 8. Asymptotically optimal implementation of MSE
fact that h( xY (y) + X WZ quantization of a noisy source with additively separa-
Clearly, the theorem is a generalization of The- ble x
(y, z).
orem 1, since Z = X implies x Z (z) = z,
(y, z) = x Finally, if x
Z (z) is a bijective vector eld, then,
trivially additively separable. under mild conditions, including continuous dif-
The case in which X can be written as X = ferentiability of x Z (z) and its inverse, it can be
f (Y ) + g(Z) + N , for any (measurable) func- shown that h(X| Y ) in Theorem 6 satises
tions f , g and any random variable N with
E[N | y, z] constant with (y, z), gives an exam- Y ) = h(Z| Y ) + E log2 det dxZ (Z) ,
h(X| dz
ple of additively separable estimator. This in-
where d xZ (z)/dz denotes the Jacobian matrix
cludes the case in which X, Y and Z are jointly
of x
Z (z).
Gaussian. Furthermore, in the Gaussian case,
since xZ (z) is an ane function and q( xZ ) is a
4. WZ Transform Coding of Clean Sources
uniform tessellating quantizer, the overall quan-
xZ (z)) is also a uniform tessellating quan-
tizer q( The following intermediate denitions and re-
tizer, and if Y and Z are uncorrelated, then sults will be useful to analyze orthonormal trans-
Y (y) = E[X| y] and x
x Z (z) = E[X| z], but not forms for WZ coding. Dene the geometric ex-
in general. pectation of a positive random scalar S as G S =
Observe that, according to the theorem, if the bE logb S , for any positive real b dierent from 1.
estimator x (y, z) is additively separable, there is Note that if S were discrete with probability
no asymptotic loss in performance by not using mass function pS (s), then G S = s spS (s) . The
the side information at the encoder. constant factor in the rate-distortion approxima-
tion (3) can be expressed as
Corollary 7. Assume the hypotheses of Theo-
rem 6, and that the optimal reconstruction lev- 2 1/n
Mn 2 n h(X|y) = 2X|Y (y) det X|Y (y) ,

els x
(q, y) for each of the conditional quantizers
q(
x, y) are simply the centroids of the quantiza- where 2X|Y (y) depends only on Mn and
tion cells for a uniform distribution. Then, there pX|Y (x| y), normalized with covariance identity.
is a WZ quantizer q( xZ ) that leads to no as- If h(X| y) is nite, then
ymptotic loss in performance if the reconstruction 1/n
function is x
Z (q) + x
(q, y) = x Z (q)
Y (y), where x 2X|Y (y) det X|Y (y) > 0,
2 2
and since GY [2 n [h(X|y)]y=Y ] = 2 n h(X|Y ) , (6) is estimate of X given Y , i.e., E[X| Y ]. In fact, the
equivalent to orthogonality principle of conditional estimation
implies
1/n 2R
D G[2X|Y (Y )] G[ det X|Y (Y ) ]2 .
X|Y + Cov E[X| Y ] = Cov X,

We are now ready to consider the transform
X|Y Cov X, with equality if and only if
thus
coding setting in Fig. 9. Let X = (X1 , . . . , Xn )
E[X| Y ] is a constant with probability 1.
X1 X 1c Q1c Q1c X 1c X 1
q1c SWC x1c Theorem 8 (WZ transform coding). Assume Ri
large so that the results for high-rate approxima-
X2 X 2c Q2c Q2c X 2c X 2 tion of Theorem 1 can be applied to each subband
UT q2c SWC x2c U in Fig. 9, i.e.,

Xn X nc Qnc Qnc X nc X n Di 1
12 22 h(Xi |Y ) 22Ri . (11)
qnc SWC xnc
Suppose further that the change of the shape
of the PDF of the transformed components
Y with the choice of U is negligible so that
2
i G Xi |Y (Y ) may be considered constant, and
Fig. 9. Transformation of the source vector.
that Var X 2
(Y ) 0, which means that the
i |Y
be a continuous random vector of nite dimen- variance of the conditional distribution does not
sion n, modeling source data, and let Y be an change signicantly with the side information.
arbitrary random variable playing the role of side Then, minimization of the overall Lagrangian
information available at the decoder, for instance, cost C is achieved when the following conditions
a random vector of dimension possibly dierent hold:
from n. The source data undergo an orthogonal
transform represented by the matrix U , precisely, 1. All bands have a common distortion D. All
X = U T X. Each transformed component Xi quantizers are uniform, without index repe-
is coded individually with a scalar WZ quantizer tition, and with a common interval width
(represented in Fig. 1). The quantization index is such that D 121
2 .
assumed to be coded with an ideal Slepian-Wolf
1
h(Xi |Y )
coder, abbreviated as SWC in Fig. 9. The (entire) 2. D 1
12 22 n i 22R .
side information Y is used for Slepian-Wolf decod-
ing and reconstruction to obtain the transformed 3. An optimal choice of U is one that diago-
estimate X , which is inversely transformed to re- X|Y , that is, it is the KLT for the
nalizes
cover an estimate of the original source vector ac- expected conditional covariance matrix.
cording to X =UX .
The expected distortion in subband i is Di = 4. The transform coding gain T , which we de-
E (Xi X )2 . The rate required to code the
i ne as the inverse of the relative decrease of
quantization index Qi is Ri = H(Qi | Y ). De- distortion due to the transform, satises
ne the total expected distortion per sample as
D = n1 E X X 2 , and the total expected rate 2
i G[X i |Y
(Y )]1/n
per sample as R = n1 i Ri . We wish to mini- T 2

G[X (Y )]1/n
mize the Lagrangian cost C = D + R. i
i |Y
2
Dene the expected conditional covariance iG[X i |Y
(Y )]1/n
X|Y = E X|Y (Y ) = EY Cov[X| Y ]. Note that
.
X|Y is the covariance of the error of the best X|Y 1/n
det
X|Y implies that X |Y =

X|Y ,
Proof: Since U is orthogonal, D = n1 Di . The Since U = U
minimization of the overall Lagrangian cost we conclude that the distortion is minimized pre-
cisely for that choice of U . The expression for the
C = n1 Di + Ri transform coding gain follows immediately.
i
yields a common distortion condition, Di D Corollary 9 (Gaussian case). If X and Y are

(proportional to ). Equation (11) is equivalent jointly Gaussian random vectors, then it is only
to necessary to assume the high-rate approximation
2Ri hypothesis of Theorem 8, in order for it to hold.
Di G[2X |Y (Y )] G[X
2
|Y (Y )] 2 .
i i
Furthermore, if DVQ and RVQ denote the distor-
1/n
Since Di D for all i, then D = i Di and tion and the rate when an optimal vector quan-
tizer is used, then we have:
D G[2X |Y (Y )]1/n X|Y = X XY 1 T .
i 1. Y XY
i

1/n 2R
2
G[X |Y (Y )] 2 , (12) 2. h(X| Y ) = i h(Xi | Y ).
i
i
D
which is equivalent to Item 2 in the statement 3. DVQ 1/12
Mn e 1.53 dB.
n 6
of the theorem. The fact that all quantizers are
uniform and the interval width satises D = 12 1
2 4. R RVQ 1
log2 1/12
1
log2 e

2 Mn n 2 6
is a consequence of Theorem 1 for one dimension.
0.25 b/s.
For any positive random scalar S such that
Var S 0, it can be shown that G S E S. It is Proof: Conditionals of Gaussian random vec-
assumed in the theorem that Var X 2
(Y ) 0, tors are Gaussian,
and linear transforms preserve
i |Y
hence Gaussianity, thus i G 2X |Y (Y ), which depends
i
2
G X |Y (Y ) E X |Y (Y ).
2 only on the type of PDF, is constant with U . Fur-
i i thermore,
This,
together with the assumption that
i G 2
Xi |Y (Y ) may be considered constant, im-
X|Y (y) = X XY 1 T
Y XY ,
plies that the choice of U that minimizes the dis-
2
tortion (12) is approximately equal to that mini- constant with y, hence Var X (Y ) = 0. The
i |Y
mizing i E X 2
(Y ). dierential entropy identity follows from the fact
i |Y
X|Y is nonnegative denite. The spectral de-
that for Gaussian random vectors (conditional)
composition theorem implies that there exists an independence is equivalent to (conditional) un-
orthogonal matrix U X|Y and a nonnegative def- correlatedness, and that this is the case for each y.
inite diagonal matrix X|Y =
X|Y such that To complete the proof, apply Corollary 3.
UX|Y X|Y U T . On the other hand,
X|Y The conclusions and the proof of the previous
corollary are equally valid if we only require that
y X |Y (y) = U T X|Y (y) U X| y be Gaussian for every y, and X|Y (y) be
X|Y U,
X |Y = U T constant.
2
As an additional example with Var X (Y ) =
i |Y
where a notation analogous to that of X is used 0, consider X = f (Y ) + N , for any (measurable)
for X . function f , and assume that N and Y are inde-
Finally, from Hadamards inequality and the pendent random vectors. X |Y (y) = U T N U ,
fact that U is orthogonal, it follows that constant with y. If in addition, N is Gaussian,

E X 2
|Y (Y ) det X |Y = det X|Y .
then so is X| y.
i
i
Corollary 10 (DCT). Suppose that for each y, Then, the transform coding gain satises
X|Y (y) is Toeplitz with a square summable as-
1
2
sociated autocorrelation so that it is also asymp- n i X |Y (Y )
T G T (Y ), T (Y ) = 2 i 1/n .
totically circulant as n . In terms of the i X |Y (Y )
i
associated random process, this means that Xi is
conditionally covariance stationary given Y , that Proof: Dene
is, (Xi E[Xi | y]| y)iZ is second-order stationary 2 1/n
for each y. Then, it is not necessary to assume i X |Y (y)
T (y) = 2 i .
that Var X2
(Y ) 0 in Theorem 8 in order for i X |Y (y)
1/n
i |Y i
it to hold, with the following modications for U
and T : According to Theorem 8, it is clear that T
G T (Y ). Now, for each y, since by assumption
1. The Discrete Cosine Transform (DCT) is the conditional variances are constant with i, the
an asymptotically optimal choice for U (c) . numerator of T (y) satises

2
2. The transform coding gain is given by X i |Y
(y)1/n = X
2
0 |Y
(y) = n1 2
X i |Y
(y).
i i
2 1/n
i Xi |Y (Y )
T G T (Y ), T (Y ) = 1/n . Finally, since X = U T X and U is orthonormal,
det X|Y (Y )

2
X i |Y
(y) = E[X E[X| y]2 | y] =
Proof: The proof proceeds along the same lines i

of that of Theorem 8, observing that the DCT = E[X E[X | y]2 | y] = 2
X |Y (y).
i
matrix asymptotically diagonalizes X|Y (y) for i
each y, since it is symmetric and asymptotically
circulant [64, Chapter 3].
5. WZ Transform Coding of Noisy Sources
Observe that the coding performance of the
cases considered in Corollaries 9 and 10 would be 5.1. Fundamental Structure
asymptotically the same if the transform U were
If x(y, z) is additively separable, the asymp-
allowed to be a function of y.
totically optimal implementation of a WZ quan-
We would like to remark that there are sev-
tizer established by Theorem 6 and Corollary 7,
eral ways by which the transform coding gain in
illustrated in Fig. 8, suggests the transform cod-
Item 4 of the statement of Theorem 8, and also
ing setting represented in Fig. 10. In this set-
in Item 2 of Corollary 10, can be manipulated to
ting, the WZ uniform tessellating quantizer and
resemble an arithmetic-geometric mean ratio in- Z , regarded as a clean source,
reconstructor for X
volving the variances of the transform coecients.
have been replaced by a WZ transform coder of
This is consistent with the fact that the transform
clean sources, studied in Section 4. The trans-
coding gain is indeed a gain. The following corol-
form coder is a rotated, scaled Z-lattice quan-
lary is an example.
tizer, and the translation argument used in the
Corollary 11. Suppose, in addition to the hypothe- proof of Theorem 6 still applies. By this argu-
ses of Theorem 8, that X 2 2
(y) = X (y) for ment, an additively separable encoder estimator
i |Y 0 |Y
all i = 1, . . . , n, and for all y. This can be under- x
(y, z) can be replaced by an encoder estimator
stood as a weakened version of the conditional co- Z (z) and a decoder estimator x
x Y (y) with no loss
variance stationarity assumption in Corollary 10. in performance at high rates.
The transform coder acts now on X Z , which
(c) Precisely, undergoes the orthonormal transformation X =
U T is the analysis DCT matrix, and U the Z
T
synthesis DCT matrix. U XZ . Each transformed coecient XZ i is
X Z1 X Zc 1 Q1c Q1c X Zc 1 X Z 1
q1c SWC xZc 1
Z XZ2 X Zc 2 Q2c Q2c X Zc 2 X Z 2 X

xZ ( z ) UT q2c SWC xZc 2 U
XZn X Zc n Qnc Qnc X Zc n X Z n

qnc SWC xZc n
xY ( y )
Y
Fig. 10. WZ transform coding of a noisy source.
coded separately with a WZ scalar quantizer (for All quantizers are uniform, without index
a clean source), followed by an ideal Slepian- repetition, and with a common interval
Wolf coder (SWC), and reconstructed with the width such that D 2 /12.
help of the (entire) side information Y . The re- 2 |Y )
h(X
construction X
is inversely transformed to ob-
Z
D
2. D = D + D, 1
12 2n i Zi
22R .
tain X
Z = U X
. The nal estimate of X is
Z 3. U diagonalizes E Cov[X Z | Y ], i.e., is the
X =x Y (Y ) + X
Z . Clearly, the last summation KLT for the expected conditional covariance
could be omitted by appropriately modifying the Z .
matrix of X
reconstruction functions of each subband. All the
denitions of the previous section are maintained, Proof: Apply Theorem 8 to X Z . Note that since
except for theoverall rate per sample, which is X =X Y + X Z and X
=XY + X , then X
Z
Z
now R = n1 Ri , where Ri is the rate of the

XZ = X X, and use (7) for (Y, Z) instead of Z

i subband. D = n1 E X
th Z X
Z 2 denotes the to prove Item 2.
distortion associated with the clean source X Z . y=x
Similarly to Theorem 6, since X| Y (y) +
The decomposition of a WZ transform coder of
XZ | y, h(XZi | Y ) = h(Xi | Y ). In addition,
a noisy source into an estimator and a WZ trans- = 1 E X 2 and E Cov[X
X Z | Y ] =
D n
form coder of a clean source allows the direct ap-
E Cov[X| Y ] Cov X.
plication of the results for WZ transform coding
of clean sources in Section 4. Corollary 13 (Gaussian case). If X, Y and Z are
Theorem 12 (Noisy WZ transform coding). Sup- jointly Gaussian random vectors, then it is only
pose x(y, z) is additively separable. Assume the necessary to assume the high-rate approximation
hypotheses of Theorem 8 for X Z . In summary, hypotheses of Theorem 12, in order for it to hold.
assume that the high-rate approximation hypothe- Furthermore, if DVQ denotes the distortion when
ses for WZ quantization of clean sources hold for the optimal vector quantizer of Fig. 8 is used, then
each subband, the change in the shape of the PDF
D D 1/12 e
of the transformed components with the choice of 1.53 dB.
DVQ D Mn n 6
the transform U is negligible, and the variance
of the conditional distribution of the transformed Proof: x(y, z) is additively separable. Apply
coecients given the side information does not Corollary 9 to X Z and Y , which are jointly
change signicantly with the values of the side Gaussian.
information. Then, there exists a WZ transform
coder, represented in Fig. 10, asymptotically op-
Corollary 14 (DCT). Suppose that x (y, z) is addi-
timal in Lagrangian cost, such that: y] =
tively separable and that for each y, Cov[X|

1. All bands introduce the same distortion D.
Cov[XZ | y] is Toeplitz with a square summable
associated autocorrelation so that it is also as- a convenient transform-domain estimation struc-

ymptotically circulant as n . In terms of ture. Suppose that X, Y and Z are zero-mean,
the associated random processes, this means that jointly wide-sense stationary random processes.
X i (equivalently, X
Zi ) is conditionally covariance Suppose further that they are jointly Gaussian,
stationary given Y , i.e., ((X i | y])| y)iZ is
i E[X or, merely for simplicity, that a linear estimator
second-order stationary for each y. of X given ( YZ ) is required. Then, under cer-
Then, it is not necessary to assume in Theo- tain regularity conditions, a vector Wiener lter
rem 12 that the conditional variance of the trans- (hY hZ ) can be used to obtain the best linear es-
formed coecients is approximately constant with
timate X:
the values of the side information in order for it
to hold, and the DCT is an asymptotically opti-
X(n) = (hY hZ )(n) ( YZ ) (n) =
mal choice for U . = hY (n) Y (n) + hZ (n) Z(n).
Z and Y .
Proof: Apply Corollary 10 to X
Observe that, in general, hY will dier from the
We remark that the coding performance of the individual Wiener lter to estimate X given Y ,
cases considered in Corollaries 13 and 14 would and similarly for hZ . The Fourier transform of
be asymptotically the same if the transform U the Wiener lter is given by
and the encoder estimator x Z (z) were allowed to
depend on y. (HY HZ )(ej ) = SX ( Y ) (ej ) S( Y ) (ej )1 , (13)
Z Z
For any random vector Y , set X = f (Y ) + Z +
NX and Z = g(Y )+NZ , where f , g are any (mea- where S denotes a power spectral density matrix.
surable) functions, NX is a random vector such For example, let NY , NZ be zero-mean wide-
that E[NX | y, z] is constant with (y, z), and NZ sense stationary random processes, representing
is a random vector independent from Y such that additive noise, uncorrelated with each other and
Cov NZ is Toeplitz. Cov[X| y] = Cov[Z| y] = with X, with a common power spectral density
Cov NZ , thus this is an example of constant matrix SN . Let Y = X + NY and Z = X + NZ
conditional variance of transformed coecients be noisy versions of X. Then, as an easy conse-
which, in addition, satises the hypotheses of quence of (13), we conclude
Corollary 14.
HY (ej ) = HZ (ej ) =
5.2. Variations on the Fundamental Structure
SX (ej )
The fundamental structure of the noisy WZ = . (14)
2SX (ej ) + SN (ej )
transform coder analyzed can be modied in a
number of ways. We now consider variations The factor 2 multiplying SX in the denomina-
on the encoder estimation and transform for this tor reects the fact that 2 signals are using for
structure, represented completely in Fig. 10, and denoising. Suppose now that X, Y and Z are in-
partially in Fig. 11(a). Later, in Section 6, we stead blocks (of equal length) of consecutive sam-
shall focus on variations involving the side infor- ples of random processes. Recall that a block
mation. drawn from the convolution of a sequence with
A general variation consists of performing the a lter can be represented as a product of a
encoder estimation in the transform domain. Toeplitz matrix h, with entries given by the im-
More precisely, dene Z = U T Z, X = U TX
Z
Z pulse response of the lter, and a block x drawn
T
and xZ (z ) = U x Z (U z ) for all z . Then, the en- from the input sequence. If the lter has nite
coder estimator satises x Z (z) = U x Z (U T z), as energy, the Toeplitz matrix h is asymptotically
T
illustrated in Fig. 11(b). Since U U = I, the es- circulant as the block length increases, so that
timation and transform U T x Z (z) can be written it is asymptotically diagonalized by the Discrete
simply as x Z (U T z), as shown in Fig. 11(c). Fourier Transform (DFT) matrix [65,66], denoted
The following informal argument will suggest by U , as h = U HU T . The matrix multiplication
X Z1 X Zc 1 X Z1 X Zc 1 X Zc 1
q1c q1c q1c
xZ ( z )
Z XZ2 X Zc 2 Z Zc X Zc XZ2 X Zc 2 Z Zc X Zc 2
xZ ( z ) UT q2c UT xZc ( z c) U UT q2c UT xZc ( z c) q2c
XZn X Zc n XZn X Zc n X Zc n
qnc qnc qnc
(a) Fundamental structure. (b) Estimation in the transformed domain. (c) Equivalent structure.
Fig. 11. Variations of the fundamental structure of a WZ transform coder of a noisy source.
y = hx, analogous to a convolution, is equiva- underlying processes (14) at the appropriate fre-
lent to U T y = HU T x, analogous to a spectral quencies. Furthermore, if the Wiener lter com-
multiplication for each frequency, since H is di- ponent hZ associated with x Z (z) is even, as in
agonal. This suggests the following structure for the previous example, then the convolution ma-
trix is not only Toeplitz but also symmetric, and
xZ ( z )
the DCT can be used instead of the DFT as the
D1 transform U [64](d) . An ecient method for gen-
Z Zc X Zc XZ eral DCT-domain ltering is presented in [67].
T
U U If the transform-domain estimator is of the
Dn form x Z (z ) = HZ z , for some diagonal ma-
trix HZ , as in the structure suggested above, or
Fig. 12. Structure of the estimator x
Z (z) inspired by linear more generally, if x Z (z ) operates individually on
shift-invariant ltering. A similar structure may be used each transformed coecient zi , then the equiva-
for x
Y (y). lent structure in Fig. 11(c) can be further sim-
plied to group each subband scalar estimation
the estimator used in the WZ transform coder, Z i (zi ) and each scalar quantizer qi (zi ) as a sin-
x
represented in Fig. 12: x(y, z) = x Y (y) + x
Z (z), gle quantizer. The resulting structure transforms
where xZ (z) = U HZ U T z, for some diagonal ma- the noisy observation and then uses a scalar WZ
trix HZ , and similarly for xY (y). The (diagonal) quantizer of a noisy source for each subband.
entries of HY and HZ can be set according

to the This is in general dierent from the fundamen-
Yi
best linear estimate of Xi given Z . For the tal structure in Figs. 10 or 11(a), in which an
i
previous example, in which Y and Z are noisy estimator was applied to the noisy observation,
observations of X, the estimation was transformed, and each trans-
formed coecient was quantized with a WZ quan-
(HY ii HZ ii ) = Y 1
Y
i
tizer for a clean source. Since this modied struc-
Xi i ture is more constrained than the general struc-
Zi Zi
2
ture, its performance may be degraded. However,
X
the design of the noisy WZ scalar quantizers at
HY ii = HZ ii = 2 + 2
i
,
2X N each subband, for instance using the extension of
i i
the Lloyd algorithm in [8], may be simpler than
where X 2 T
= ui X ui is the variance of the i
th the implementation of a nonlinear vector estima-
i
transform coecient of X, and ui the correspond- tor x Z (z), or a noisy WZ vector quantizer oper-
ing (column) analysis vector of U , and similarly (d) If a real Toeplitz matrix is not symmetric, there is no
2
for N . Alternatively, HY ii and HZ ii can be ap-
guarantee that the DCT will asymptotically diagonalize
i
proximated by sampling the Wiener lter for the it, and the DFT may produce complex eigenvalues.
2
ating directly on the noisy observation vector. is equivalent to minimizing X| . Since
X
T T
= Var[X c Y ] Var[Xc Y ] = X|Y ,
2 2
6. Transformation of the Side Information X| X
6.1. Linear Transformations the minimum is achieved, in particular, for c = c

Suppose that the side information is a random and = 1 (and in general for any scaled c ).
vector of nite dimension k. A very convenient The following theorems on transformation of
simplication in the setting of Figs. 9 and 10 the side information are given for the more gen-
would consist of using scalars, obtained by some eral, noisy case, but are immediately applicable
transformation of the side information vector, in to the clean case by setting Z = X = X =XZ .
each of the Slepian-Wolf coders and in the recon-
struction functions. This is represented in Fig. 13. Theorem 16 (Linear transformation of side infor-
Even more conveniently, we are interested in lin- mation). Under the hypotheses of Corollary 13,
ear transforms Y = V T Y that lead to a small for high rates, the transformation of the side in-
loss in terms of rate and distortion. It is not formation given by
required for V to dene an injective transform,
since no inversion is needed. V T = U T X Z Y 1
Y (15)
Proposition 15. Let X be a random scalar with minimizes the total rate R, with no performance
mean X , and let Y be a k-dimensional random loss in distortion or rate with respect to the trans-
vector with mean Y . Suppose that X and Y are form coding setting of Fig. 10 (and in particu-
jointly Gaussian. Let c Rk , which gives the lar Fig. 9), in which the entire vector Y is used
= cT Y . Then,
linear estimate X for decoding and reconstruction. Precisely, recon-
struction functions dened by E[X | q, y] and by
Zi

= h(X| Y ),
min h(X| X) E[XZi | q, yi ] give approximately the same distor-
c tion D i , and Ri = H(X | Y ) H(X | Y ).
Zi i Zi
and the minimum is achieved for c such that X is Proof: Theorems 6 and 12 imply
the best linear estimate of X X given Y Y ,
Zi
Ri = H(X Zi
| Y ) h(X | Y ) log2 ,
in the MSE sense.
Proof: Set c = XY 1 T thus the minimization of Ri is approximately

Y , so that c Y is the | Y ).
best linear estimate of X X given Y Y . (The equivalent to the minimization of h(X Zi
Since linear transforms preserve Gaussianity, X
assumption that Y is Gaussian implies, by de- Z
nition, the invertibility of Y , and therefore the and Y are jointly Gaussian, and Proposition 15
applies to each X . V is determined by the best
existence of a unique estimate.) For each y, X| y Zi
2 linear estimate of X given Y , once the means
is a Gaussian random scalar with variance X|Y , Z
constant with y, equal to the MSE of the best have been removed. This proves that there is no
ane estimate of X given Y . Since additive con- loss in rate. Corollary 2 implies that a subopti-
stants preserve variances, the MSE is equal to the mal reconstruction is asymptotically as ecient,
variance of the error of the best linear estimate of thus there is no loss in distortion either.
X X given Y Y , also equal to Var[X cT Y ]. Observe that X Z Y 1
Y in (15) corresponds
On the other hand, for each c, X| x is a Gaussian to the best linear estimate of X Z from Y , dis-
2
random scalar with variance X| equal to the
X regarding their means. This estimate is trans-
variance of the error of the best linear estimate of formed according to the same transform applied
XX given X T
, denoted by Var[X c Y ]. to X, yielding an estimate of X . In addi-
X Z
Minimizing tion, joint Gaussianity implies the existence of
a matrix B such that X Z = BZ. Consequently,
=
h(X| X) 1 2
log2 (2e X| ) X Z Y = BZY .
2 X
X Z1 X Zc 1 Q1c Q1c X Zc 1 X Z 1
q1c SWC xZc 1
Y1c
Z XZ2 X Zc 2 Q2c Q2c X Zc 2 X Z 2 X
xZ ( z ) UT q2c SWC xZc 2 U
Y2c
XZn X Zc n Qnc Qnc X Zc n X Z n
qnc SWC xZc n
Ync
xY ( y )
yc( y ) Y
Fig. 13. WZ transform coding of a noisy source with transformed side information.
6.2. General Transformations Proof: Theorems 6 and 12 imply Ri =

H(X | Y ) h(X
| Y ) log2 . Proposition 17
Theorem 16 shows that under the hypotheses Zi Zi
| Y ), and Corol-
ensures that h(XZi | Y ) = h(X Zi i
of high-rate approximation, for jointly Gaussian
lary 7 that a suboptimal reconstruction is asymp-
statistics, the side information could be lin-
totically as ecient if Yi is used instead of Y .
early transformed and a scalar estimate used for
Slepian-Wolf decoding and reconstruction in each
subband, instead of the entire vector Y , with no In view of these results, Theorem 16 inciden-
asymptotic loss in performance. Here we extend tally shows that in the Gaussian case, the best
this result to general statistics, connecting WZ linear MSE estimate is a sucient statistic, which
coding and statistical inference. can also be proven directly (for instance com-
Let X and be random variables, represent- bining Propositions 15 and 17). The obtention
ing, respectively, an observation and some data of (minimal) sucient statistics has been stud-
we wish to estimate. A statistic for from X is ied in the eld of statistical inference, and the
a random variable T such that X T , for Lehmann-Schee method is particularly useful
instance, any function of X. A statistic is su- (e.g. [69]).
cient if and only if T X. Many of the ideas on the structure of the es-
timator x (y, z) presented in Section 5.2 can be
Proposition 17. A statistic T for a continuous applied to the transformation of the side informa-
random variable from an observation X satis- tion y (y). For instance, it could be carried out in
es h(| T ) h(| X), with equality if and only the domain of the data transform U . If, in addi-
if T is sucient. tion, xY (y) is also implemented in the transform
domain, for example in the form of Fig. 12, then,
Proof: Use the data processing inequality to write in view of Fig. 13, a single transformation can be
I(; T ) I(; X), with equality if and only if T shared as the rst step of both y (y) and x Y (y).
is sucient [68], and express the mutual informa- Furthermore, the summation x

Y (Y ) + XZ can be
tion as a dierence of entropies.
is
carried out in the transform domain, since X Z
available, eliminating the need to undo the trans-
Theorem 18 (Reduction of side information). Un- form as the last step of x Y (y).
der the hypotheses of Theorem 12 (or Corollar- Finally, suppose that the linear transform
ies 13 or 14), a sucient statistic Yi for X
Zi in (15) is used, and that U (asymptotically) di-
from Y can be used instead of Y for Slepian-Wolf agonalizes both X Z Y and Y . Then, since U is
decoding and reconstruction, for each subband i in orthonormal, it is easy to see that y (y) = V T y =
the WZ transform coding setting of Fig. 10, with X Z Y 1 T
Y U y, where denotes the correspond-
no asymptotic loss in performance. ing diagonal matrices and U T y is the transformed
side information. Of course, the scalar multipli- form distortion across bands. Rate-compatible
cations for each subband may be suppressed by punctured turbo codes are used for Slepian-Wolf
designing the Slepian-Wolf coders and the recon- coding in each subband. The parity bits produced
Y (y) is of
struction functions accordingly, and, if x by the turbo encoder are stored in a buer which
the form of Fig. 12, the additions in the transform transmits a subset of these parity bits to the de-
domain can be incorporated into the reconstruc- coder upon request. The rate-compatible punc-
tion functions. tured turbo code for each band can come close
to achieving the ideal Slepian-Wolf rate for the
7. Experimental Results given transform band.
At the decoder, we take previously recon-
7.1. Transform WZ Coding of Clean Video structed frames to generate side information Y ,
In [11], we apply WZ coding to build a which is used to decode X. In the rst setup
low-complexity, asymmetric video compression (MC-I), we perform motion-compensated inter-
scheme where individual frames are encoded in- polation on the previous and next reconstructed
dependently (intraframe encoding) but decoded key frames to generate Y . In the second
conditionally (interframe decoding). In the pro- scheme (MC-E), we produce Y through motion-
posed scheme we encode the pixel values of a compensated extrapolation using the two previ-
frame independently from other frames. At the ous reconstructed frames: a key frame and a WZ
decoder, previously reconstructed frames are used frame. The DCT is applied to Y , generating the
as side information and WZ decoding is per- dierent side information coecient bands Yi . A
formed by exploiting the temporal similarities be- bank of turbo decoders reconstruct the quantized
tween the current frame and the side information. coecient bands independently using the corre-
In the following experiments, we extend the sponding Yi as side information. Each coecient
WZ video codec, outlined in [11], to a trans- subband is then reconstructed as the best esti-
form domain WZ coder. The spatial transform mate given the previously reconstructed symbols
enables the codec to exploit the statistical de- and the side information. Note that unlike in
pendencies within a frame, thus achieving better Fig. 9, where the entire Y is used as side informa-
rate-distortion performance. tion for decoding and reconstructing each trans-
For the simulations, the odd frames are desig- form band, in our simplied implementation, we
nated as key frames which are encoded and de- only use the corresponding side information sub-
coded using a conventional intraframe codec. The band. More details of the proposed scheme and
even frames are WZ frames which are intraframe extended results can be found in [70].
encoded but interframe decoded, adopting the The compression results for the rst 100 frames
WZ transform coding setup for clean sources and of Mother & Daughter are shown in Fig. 14. MSE
transformed side information, described in Sec- has been used as a distortion measure, expressed
tions 4 and 6. as Peak Signal-To-Noise Ratio (PSNR) in dB, de-
To encode a WZ frame X, we rst apply a ned as 10 log10 (2552 /MSE). For the plots, we
blockwise DCT to generate X . As stated in only include the rate and distortion of the lumi-
Corollary 10, the DCT is an asymptotically op- nance of the even frames. The even frame rate is
timal choice for the orthonormal transform. Al- 15 frames per second. We compare our results to:
though the system is not necessarily high rate, we 1. DCT-based intraframe coding: the even
follow Theorem 8 in the quantizer design. Each frames are encoded as Intracoded (I)
transform coecient subband is independently frames.
quantized using uniform scalar quantizers with
no index repetitions and similar step sizes across 2. H.263+ interframe coding with an I-B-I-B
bands(e) . This allows for a approximately uni- predictive structure, counting only the rate
and each transform band has dierent dynamic range, it
(e) Since we use xed-length codes for Slepian-Wolf coding is not possible to have exactly the same step sizes.
and distortion of the Bidirectionally pre- results of Sections 3, 5 and 6. The source data X
dicted (B) frames. consists of all 8 8 blocks of the rst 25 frames
of the Foreman Quarter Common Intermediate
We also plot the compression results of the pixel- Format (QCIF) video sequence, with the mean
domain WZ codec. For the pixel-domain WZ removed. Assume that the encoder does not
codec, we quantize the pixels using the optimal know X, but has access to Z = X + V , where V
scalar quantizers suggested by Theorem 1, that is a block of white Gaussian noise of variance V2 .
is, uniform quantizers with no index repetition. The decoder has access to side information Y =
50 X + W , where W is white Gaussian noise of vari-
H.263+ I-B-I-B WZ MC-E 4x4 DCT ance W 2
. Note that this experimental setup re-
WZ MC-I 4x4 DCT WZ MC-E pixel dom.
produces our original statement of the problem
WZ MC-I pixel dom. DCT-based intra. cod.
of WZ quantization of noisy sources, drawn in
PSNR of even frames [dB]
45
Fig. 3. In our experiments, V , W and X are sta-
tistically independent. In this case, E[X| y, z] is
not additively separable. However, since we wish
40
to test our theoretic results which apply only to
separable estimates, we constrain our estimators
to be linear. Thus, in all the experiments of this
35 section, the estimate of X, given Y and Z is de-
ned as

30 1 y
0 100 200 300 400 (y, z) = X ( Y ) Y
x =xY (y) + x
Z (z).
Rate of even frames [b/pel] Z ( Z ) z
Fig. 14. Rate and PSNR comparison of WZ codec vs. We now consider the following cases, all con-
DCT-based intraframe coding and H.263+ I-B-I-B coding. structed using linear estimators and WZ 2D-DCT
Mother & Daughter sequence. coders of clean sources:
As observed from the plots, when the side in- 1. Assume that Y is made available to the en-
formation is highly reliable, such as when MC-I is coder estimator, perform conditional linear
used, the transform-domain codec is only 0.5 dB estimation of X given Y and Z, followed
better than the pixel-domain WZ codec. With by WZ transform coding of the estimate.
the less reliable MC-E, using a transform before This corresponds to a conditional coding
encoding results in a 2 to 2.5 dB improvement. scenario where both the encoder and the
These improvements over the pixel-domain sys- decoder have access to the side informa-
tem demonstrate the transform gain in a practical tion Y . This experiment is carried out for
Wyner-Ziv coding setup. the purpose of comparing its performance
Compared to conventional DCT-based in- with that of true WZ transform coding of a
traframe coding, the WZ transform codec is noisy source. Since we are concerned with
about 10 to 12 dB (with MC-I) and 7 to 9 dB the performance of uniform quantization at
(with MC-E) better. The gap from H.263+ in- high rates, the quantizers qi are all chosen
terframe coding is 2 dB for MC-I and about to be uniform, with the same step-size for
5 dB for MC-E. The proposed system allows low- all transform sub-bands. We assume ideal
complexity encoding while approaching to the entropy coding of the quantization indices
compression eciency of interframe video coders. conditioned on the side-information.
7.2. WZ Transform Coding of Noisy Images
2. Perform noisy WZ transform coding of Z
We implement various cases of WZ transform exactly as shown in Fig. 13. As men-
coding of noisy images to conrm the theoretic tioned above, the orthonormal transform
being used is the 2D-DCT applied to 8 8 38.25

Z . As before, all quantiz- PSNR of best affine estimate = 38.2406
pixel blocks of X 38.2
ers are uniform with the same step-size and
ideal Slepian-Wolf coding is assumed, i.e., 38.15
the Slepian-Wolf rate required to encode 38.1
PSNR [dB]
the quantization indices in the ith sub-band (1) Cond. estim. & WZ
is simply the conditional entropy H(Qi |Yi ). 38.05 transform coding
As seen in Fig. 13, the decoder recovers the 38 (2) Noisy WZ

Z , and obtains the nal estimate transform coding of Z
estimate X
37.95 (3) Direct WZ
as X =xY (Y ) + X Z . transform coding of Z
37.9 (4) Noisy WZ w/o side
inform. in reconstruct.
3. Perform WZ transform coding directly 37.85
1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 3 3.2
on Z, reconstruct Z at the decoder and ob- Rate [b/pel]
tain X =x This experiment is per-
(Y, Z).
formed in order to investigate the penalty Fig. 15. WZ transform coding of a noisy image is as-
ymptotically equivalent to the conditional case. Foreman
incurred, when the the noisy input Z is sequence.
treated as a clean source for WZ transform
coding. 8. Conclusions
If ideal Slepian-Wolf coders are used, uniform
4. Perform noisy WZ transform coding of Z tessellating quantizers without index repetition
as in Case 2, except that x Zi (qi , yi ) =
are asymptotically optimal at high rates. It is

E[Xi | qi ], i.e., the reconstruction function known [5] that the rate loss in the WZ problem
does not use the side information Y . This for smooth continuous sources and quadratic dis-
experiment is performed in order to exam- tortion vanishes as D 0. Our work shows that
ine the penalty incurred at high rates for the this is true also for the operational rate loss and
situation described in Corollary 7, where for each nite dimension n.
the side-information is used for Slepian- The theoretic study of transforms shows that
Wolf encoding but ignored in the recon- (under certain conditions) the KLT of the source
struction function. vector is determined by its expected conditional
covariance given the side information, which is
Fig. 15 plots rate vs. PSNR for the above cases, approximated by the DCT for conditionally sta-
with V2 = W 2
= 25, and X 2
= 2730 (mea- tionary processes. Experimental results conrm
sured). The performance of conditional estima- that the use of the DCT may lead to important
tion (Case 1) and WZ transform coding (Case 2) performance improvements.
are in close agreement at high rates as predicted If the conditional expectation of the unseen
by Theorem 12. Our theory does not explain the source data X given the side information Y and
behavior at low rates. Experimentally, we ob- the noisy observation Z is additively separable,
served that Case 2 slightly outperforms Case 1 then, at high rates, optimal WZ quantizers of Z
at lower rates. Both cases show superior rate- can be decomposed into estimators and uniform
distortion performance than direct WZ coding tessellating quantizers for clean sources, achieving
of Z (Case 3). Neglecting the side-information the same rate-distortion performance as if the side
in the reconstruction function (Case 4) is ine- information were available at the encoder. This
cient at low rates, but at high rates, this sim- is consistent with the experimental results of the
pler scheme approaches the performance of Case 2 application of the Lloyd algorithm for noisy WZ
with the ideal reconstruction function, thus con- quantization design in [54].
rming Corollary 7. The additive separability condition for high-
rate WZ quantization of noisy sources, albeit less [71, Lemma 2.1], as assumed in the Wyner-Ziv
restrictive, is similar to the condition required theorem, and the two regularity conditions re-
for zero rate loss in the quadratic Gaussian noisy quired by theorem are clearly satised [12, Equa-
Wyner-Ziv problem [8], which applies exactly for tion (2.2),Theorems 2.1,2.2]. Proceeding as in [4,
any rate but requires arbitrarily large dimension. Remark 3, p.3], and realizing that Fanos inequal-
We propose a WZ transform coder of noisy ity is still valid for arbitrary Y , the asymmetric
sources consisting of an estimator and a WZ Slepian-Wolf result follows immediately from the
transform coder for clean sources. Under certain the Wyner-Ziv theorem. This proves the case of
conditions, in particular if the encoder estimate nite Q and arbitrary Y .
is conditionally covariance stationary given Y , In the case when the alphabet of Q is countably
the DCT is an asymptotically optimal transform. innite, observe that for any rate R > H(Q|Y ),
The side information for the Slepian-Wolf decoder there exists a nite quantization Q of Q such
and the reconstruction function in each subband that R > H(Q|Y ) H(Q|Y ), and Q can be
can be replaced by a sucient statistic with no coded with arbitrarily low probability of decod-
asymptotic loss in performance. ing error between the original Q and the de-
coded Q. This shows achievability. To prove
A. Appendix: Asymmetric Slepian-Wolf Coding the converse, suppose that a rate R < H(Q|Y )
with Arbitrary Side Information is admissible. For any such R, simply by the
denition of conditional entropy for general al-
We show that the asymmetric Slepian-Wolf phabets in terms of nite measurable partitions,
coding result extends to sources in countable al- there exists a nite quantization Q of Q such that
phabets and arbitrary, possibly continuous, side R < H(Q|Y ) H(Q|Y ). Since the same code
information. Let U, V be arbitrary random vari- used for Q could be used for Q, with trivial mod-
ables. The general denition of conditional en- ications, with arbitrarily small probability of er-
tropy used here is H(U |V ) = I(U ; U |V ), with the ror, this would contradict the converse for the
conditional mutual information dened in [71], nite-alphabet case.
consistent with [12].
The proposition can also be proven directly
Proposition 19 (Asymmetric Slepian-Wolf Cod- from the Slepian-Wolf result for nite alphabets,
ing). Let Q be a discrete random variable repre- without invoking the Wyner-Ziv theorem for gen-
senting source data to be encoded, and let Y be eral sources. The arguments used are somewhat
an arbitrary random variable acting as side in- lengthier, but similarly to the previous proof,
formation in an asymmetric Slepian-Wolf coder exploit the denitions of general information-
of Q. Any rate satisfying R > H(Q|Y ) is admis- theoretic quantities in terms of nite measurable
sible in the sense that a code exists with arbitrar- partitions.
ily low probability of decoding error, and any rate
R < H(Q|Y ) is not admissible.
Acknowledgment
Proof: The statement follows from the Wyner- The authors would like to thank the anony-
Ziv theorem for general sources [12], that is, the mous reviewers for their helpful comments, which
one-letter characterization of the Wyner-Ziv rate- motivated a number improvements in this paper.
distortion function for source data and side infor-
mation distributed in arbitrary alphabets. Sup-
pose rst that the alphabet of Q is nite, and References
the alphabet of Y is arbitrary. The reconstruc- [1] D. Rebollo-Monedero, A. Aaron, B. Girod, Trans-
tion alphabet in the Wyner-Ziv coding setting is forms for high-rate distributed source coding, in:
Proc. Asilomar Conf. Signals, Syst., Comput., Vol. 1,
chosen to be the same as that of Q, and the distor- Pacic Grove, CA, 2003, pp. 850854, invited paper.
tion measure is the Hamming distance d(q, q) = [2] D. Rebollo-Monedero, S. Rane, B. Girod, Wyner-Ziv
1{q = q}. Then, I(Q; Y ) = H(Q) H(Q|Y ) < quantization and transform coding of noisy sources at
high rates, in: Proc. Asilomar Conf. Signals, Syst., tributed source coding technique for correlated images
Comput., Vol. 2, Pacic Grove, CA, 2004, pp. 2084 using turbo-codes, IEEE Commun. Lett. 6 (9) (2002)
2088. 379381.
[3] J. D. Slepian, J. K. Wolf, Noiseless coding of corre- [22] A. D. Liveris, Z. Xiong, C. N. Georghiades, Compres-
lated information sources, IEEE Trans. Inform. The- sion of binary sources with side information at the de-
ory IT-19 (1973) 471480. coder using LDPC codes, IEEE Commun. Lett. 6 (10)
[4] A. D. Wyner, J. Ziv, The rate-distortion function for (2002) 440442.
source coding with side information at the decoder, [23] D. Schonberg, S. S. Pradhan, K. Ramchandran,
IEEE Trans. Inform. Theory IT-22 (1) (1976) 110. LDPC codes can approach the Slepian Wolf bound
[5] R. Zamir, The rate loss in the Wyner-Ziv problem, for general binary sources, in: Proc. Allerton Conf.
IEEE Trans. Inform. Theory 42 (6) (1996) 20732084. Commun., Contr., Comput., Allerton, IL, 2002.
[6] T. Linder, R. Zamir, K. Zeger, On source coding [24] B. Girod, A. Aaron, S. Rane, D. Rebollo-Monedero,
with side-information-dependent distortion measures, Distributed video coding, in: Proc. IEEE, Special Is-
IEEE Trans. Inform. Theory 46 (7) (2000) 26972704. sue Advances Video Coding, Delivery, Vol. 93, 2005,
[7] H. Yamamoto, K. Itoh, Source coding theory for pp. 7183, invited paper.
multiterminal communication systems with a remote [25] R. Zamir, S. Shamai, Nested linear/lattice codes for
source, Trans. IECE Japan E63 (1980) 700706. Wyner-Ziv encoding, in: Proc. IEEE Inform. Theory
[8] D. Rebollo-Monedero, B. Girod, A generalization Workshop (ITW), Killarney, Ireland, 1998, pp. 9293.
of the rate-distortion function for Wyner-Ziv coding [26] R. Zamir, S. Shamai, U. Erez, Nested linear/lattice
of noisy sources in the quadratic-Gaussian case, in: codes for structured multiterminal binning, IEEE
Proc. IEEE Data Compression Conf. (DCC), Snow- Trans. Inform. Theory 48 (6) (2002) 12501276.
bird, UT, 2005, pp. 2332. [27] S. D. Servetto, Lattice quantization with side informa-
[9] H. S. Witsenhausen, A. D. Wyner, Interframe coder tion, in: Proc. IEEE Data Compression Conf. (DCC),
for video signals, U.S. Patent 4191970 (Nov. 1980). Snowbird, UT, 2000, pp. 510519.
[10] R. Puri, K. Ramchandran, PRISM: A new robust [28] J. Kusuma, L. Doherty, K. Ramchandran, Distributed
video coding architecture based on distributed com- compression for sensor networks, in: Proc. IEEE Int.
pression principles, in: Proc. Allerton Conf. Com- Conf. Image Processing (ICIP), Vol. 1, Thessaloniki,
mun., Contr., Comput., Allerton, IL, 2002. Greece, 2001, pp. 8285.
[11] A. Aaron, R. Zhang, B. Girod, Wyner-Ziv coding of [29] M. Fleming, M. Eros, Network vector quantization,
motion video, in: Proc. Asilomar Conf. Signals, Syst., in: Proc. IEEE Data Compression Conf. (DCC),
Comput., Pacic Grove, CA, 2002, pp. 240244. Snowbird, UT, 2001, pp. 1322.
[12] A. D. Wyner, The rate-distortion function for source [30] M. Fleming, Q. Zhao, M. Eros, Network vector quan-
coding with side information at the decoderII: Gen- tization, IEEE Trans. Inform. Theory 50 (8) (2004)
eral sources, Inform., Contr. 38 (1) (1978) 6080. 15841604.
[13] R. Zamir, T. Berger, Multiterminal source coding [31] J. Cardinal, G. V. Asche, Joint entropy-constrained
with high resolution, IEEE Trans. Inform. Theory multiterminal quantization, in: Proc. IEEE Int.
45 (1) (1999) 106117. Symp. Inform. Theory (ISIT), Lausanne, Switzerland,
[14] H. Viswanathan, Entropy coded tesselating quantiza- 2002, p. 63.
tion of correlated sources is asymptotically optimal, [32] D. Rebollo-Monedero, R. Zhang, B. Girod, Design of
unpublished (1996). optimal quantizers for distributed source coding, in:
[15] M. E. Hellman, Convolutional source encoding, IEEE Proc. IEEE Data Compression Conf. (DCC), Snow-
Trans. Inform. Theory IT-21 (6) (1975) 651656. bird, UT, 2003, pp. 1322.
[16] S. S. Pradhan, K. Ramchandran, Distributed source [33] S. P. Lloyd, Least squares quantization in PCM, IEEE
coding using syndromes (DISCUS): Design and con- Trans. Inform. Theory IT-28 (1982) 1291373.
struction, in: Proc. IEEE Data Compression Conf. [34] Z. Xiong, A. Liveris, S. Cheng, Z. Liu, Nested quanti-
(DCC), Snowbird, UT, 1999, pp. 158167. zation and Slepian-Wolf coding: A Wyner-Ziv coding
[17] J. Garca-Fras, Y. Zhao, Compression of correlated paradigm for i.i.d. sources, in: Proc. IEEE Workshop
binary sources using turbo codes, IEEE Commun. Stat. Signal Processing (SSP), St. Louis, MO, 2003,
Lett. 5 (10) (2001) 417419. pp. 399402.
[18] J. Bajcsy, P. Mitran, Coding for the Slepian Wolf [35] Y. Yang, S. Cheng, Z. Xiong, W. Zhao, Wyner-Ziv
problem with turbo codes, in: Proc. IEEE Global coding based on TCQ and LDPC codes, in: Proc.
Telecomm. Conf. (GLOBECOM), Vol. 2, 2001, pp. Asilomar Conf. Signals, Syst., Comput., Vol. 1, Pacic
14001404. Grove, CA, 2003, pp. 825829.
[19] G.-C. Zhu, F. Alajaji, Turbo codes for nonuniform [36] Z. Xiong, A. D. Liveris, S. Cheng, Distributed source
memoryless sources over noisy channels, IEEE Com- coding for sensor networks, IEEE Signal Processing
mun. Lett. 6 (2) (2002) 6466. Mag. 21 (5) (2004) 8094.
[20] A. Aaron, B. Girod, Compression with side informa- [37] H. Hotelling, Analysis of a complex of statistical vari-
tion using turbo codes, in: Proc. IEEE Data Compres- ables into principal components, J. Educ. Psychol. 24
sion Conf. (DCC), Snowbird, UT, 2002, pp. 252261. (1933) 417441, 498520.
[21] A. D. Liveris, Z. Xiong, C. N. Georghiades, A dis- [38] K. Karhunen, Uber lineare methoden in der
wahrscheinlichkeitsrechnung, Ann. Acad. Sci. Fenn., [55] R. M. Gray, D. L. Neuho, Quantization, IEEE Trans.
Ser. A I Math.-Phys. 37 (1947) 379. Inform. Theory 44 (1998) 23252383.
[39] M. Lo` eve, Fonctions aleatoires du second ordre, in: [56] V. K. Goyal, Theoretical foundations of transform
P. Levy (Ed.), Processus stochastiques et mouvement coding, IEEE Signal Processing Mag. 18 (5) (2001)
Brownien, Gauthier-Villars, Paris, France, 1948. 921.
[40] M. Gastpar, P. L. Dragotti, M. Vetterli, The distrib- [57] T. M. Cover, A proof of the data compression theorem
uted Karhunen-Loève transform, in: Proc. IEEE Int. of Slepian and Wolf for ergodic sources, IEEE Trans.
Workshop Multimedia Signal Processing (MMSP), St. Inform. Theory 21 (2) (1975) 226228, (Corresp.).
Thomas, US Virgin Islands, 2002, pp. 5760. [58] W. R. Bennett, Spectra of quantized signals, Tech.
[41] M. Gastpar, P. L. Dragotti, M. Vetterli, The dis- J. 27, Bell Syst. (Jul. 1948).
tributed, partial, and conditional Karhunen-Loève [59] S. Na, D. L. Neuho, Bennetts integral for vector
transforms, in: Proc. IEEE Data Compression Conf. quantizers, IEEE Trans. Inform. Theory 41 (1995)
(DCC), Snowbird, UT, 2003, pp. 283292. 886900.
[42] M. Gastpar, P. L. Dragotti, M. Vetterli, On compres- [60] A. Gersho, Asymptotically optimal block quantiza-
sion using the distributed Karhunen-Loève transform, tion, IEEE Trans. Inform. Theory IT-25 (1979) 373
in: Proc. IEEE Int. Conf. Acoust., Speech, Signal 380.
Processing (ICASSP), Vol. 3, Philadelphia, PA, 2004, [61] P. L. Zador, Topics in the asymptotic quantization of
pp. 901904. continuous random variables, Tech. Memo. 1966, Bell
[43] M. Gastpar, P. L. Dragotti, M. Vetterli, The distrib- Lab., unpublished.
uted Karhunen-Loève transform, IEEE Trans. Inform. [62] R. M. Gray, T. Linder, J. Li, A Lagrangian formula-
TheorySubmitted. tion of Zadors entropy-constrained quantization the-
[44] S. S. Pradhan, K. Ramchandran, Enhancing analog orem, IEEE Trans. Inform. Theory 48 (3) (2002) 695
image transmission systems using digital side informa- 707.
tion: A new wavelet-based image coding paradigm, in: [63] R. Zamir, M. Feder, On lattice quantization noise,
Proc. IEEE Data Compression Conf. (DCC), Snow- IEEE Trans. Inform. Theory 42 (4) (1996) 11521159.
bird, UT, 2001, pp. 6372. [64] K. R. Rao, P. Yip, Discrete cosine transform: Al-
[45] R. L. Dobrushin, B. S. Tsybakov, Information trans- gorithms, advantages, applications, Academic Press,
mission with additional noise, IRE Trans. Inform. San Diego, CA, 1990.
Theory IT-8 (1962) S293S304. [65] U. Grenander, G. Szeg o, Toeplitz forms and their ap-
[46] J. K. Wolf, J. Ziv, Transmission of noisy information plications, University of California Press, Berkeley,
to a noisy receiver with minimum distortion, IEEE CA, 1958.
Trans. Inform. Theory IT-16 (4) (1970) 406411. [66] R. M. Gray, Toeplitz and circulant matrices: A review
[47] Y. Ephraim, R. M. Gray, A unied approach for en- (2002).
coding clean and noisy sources by means of waveform URL http://ee.stanford.edu/~gray/toeplitz.pdf
and autoregressive vector quantization, IEEE Trans. [67] R. Kresch, N. Merhav, Fast DCT domain ltering
Inform. Theory IT-34 (1988) 826834. using the DCT and the DST, IEEE Trans. Image
[48] T. Flynn, R. Gray, Encoding of correlated observa- Processing 8 (1999) 821833.
tions, IEEE Trans. Inform. Theory 33 (6) (1987) 773 [68] T. M. Cover, J. A. Thomas, Elements of information
787. theory, Wiley, New York, 1991.
[49] H. S. Witsenhausen, Indirect rate-distortion prob- [69] G. Casella, R. L. Berger, Statistical Inference, 2nd
lems, IEEE Trans. Inform. Theory IT-26 (1980) 518 Edition, Thomson Learning, Australia, 2002.
521. [70] A. Aaron, S. Rane, E. Setton, B. Girod, Transform-
[50] S. C. Draper, G. W. Wornell, Side information aware domain Wyner-Ziv codec for video, in: Proc.
coding strategies for sensor networks, IEEE J. Select. IT&S/SPIE Conf. Visual Commun., Image Process-
Areas Commun. 22 (6) (2004) 966976. ing (VCIP), San Jose, CA, 2004.
[51] W. M. Lam, A. R. Reibman, Quantizer design for de- [71] A. D. Wyner, A denition of conditional mutual infor-
centralized estimation systems with communication mation for arbitrary ensembles, Inform., Contr. 38 (1)
constraints, in: Proc. Conf. Inform. Sci. Syst., Balti- (1978) 5159.
more, MD, 1989.
[52] W. M. Lam, A. R. Reibman, Design of quantizers
for decentralized estimation systems, IEEE Trans. In-
form. Theory 41 (11) (1993) 16021605.
[53] J. A. Gubner, Distributed estimation and quantiza-
tion, IEEE Trans. Inform. Theory 39 (4) (1993) 1456
1459.
[54] D. Rebollo-Monedero, B. Girod, Design of optimal
quantizers for distributed coding of noisy sources,
in: Proc. IEEE Int. Conf. Acoust., Speech, Signal
Processing (ICASSP), Vol. 5, Philadelphia, PA, 2005,
pp. 10971100, invited paper.

Rebollo EURASIPSig Proc 2006

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Rebollo EURASIPSig Proc 2006

Uploaded by

Copyright:

Available Formats

High-Rate Quantization and Transform Coding

with Side Information at the Decoder

1. Introduction nating motion compensation, and decoding using

address: drebollo@stanford.edu (D. Rebollo-Monedero).

Proof: Immediate from Theorem 4 and conven- Y ) = h(X

2. No two cells of the partition dened by 2

by solving for V in (9) and substituting the result xZ ).

10 D. Rebollo-Monedero, S. Rane, A. Aaron, B. Girod

X|Y implies that X |Y =

yields a common distortion condition, Di D Corollary 9 (Gaussian case). If X and Y are

Z XZ2 X Zc 2 Q2c Q2c X Zc 2 X Z 2 X

XZn X Zc n Qnc Qnc X Zc n X Z n

associated autocorrelation so that it is also as- a convenient transform-domain estimation struc-

6.1. Linear Transformations the minimum is achieved, in particular, for c = c

Proof: Set c = XY 1 T thus the minimization of Ri is approximately

6.2. General Transformations Proof: Theorems 6 and 12 imply Ri =

being used is the 2D-DCT applied to 8 8 38.25

You might also like

Rebollo EURASIPSig Proc 2006

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Rebollo EURASIPSig Proc 2006

Uploaded by

Copyright:

Available Formats

High-Rate Quantization and Transform Coding

with Side Information at the Decoder

1. Introduction nating motion compensation, and decoding using

address: drebollo@stanford.edu (D. Rebollo-Monedero).

Proof: Immediate from Theorem 4 and conven- Y ) = h(X

2. No two cells of the partition dened by 2

by solving for V in (9) and substituting the result xZ ).

10 D. Rebollo-Monedero, S. Rane, A. Aaron, B. Girod

X|Y implies that X |Y =

yields a common distortion condition, Di  D Corollary 9 (Gaussian case). If X and Y are

Z XZ2 X Zc 2 Q2c Q2c X Zc 2 X Z 2 X

XZn X Zc n Qnc Qnc X Zc n X Z n

associated autocorrelation so that it is also as- a convenient transform-domain estimation struc-

6.1. Linear Transformations the minimum is achieved, in particular, for c = c

Proof: Set c = XY 1 T thus the minimization of Ri is approximately

6.2. General Transformations Proof: Theorems 6 and 12 imply Ri =

being used is the 2D-DCT applied to 8 8 38.25

You might also like

yields a common distortion condition, Di D Corollary 9 (Gaussian case). If X and Y are