ISSN 0032-9460, Problems of Information Transmission, 2009, Vol. 45, No. 4, pp. 295–308.
Original Russian Text © V.V. Prelov, 2009, published in Problemy Peredachi Informatsii, 2009, Vol. 45, No. 4, pp. 3–17.
INFORMATION THEORY
1. INTRODUCTION
Let P = {p_i} and Q = {q_i}, i ∈ N = {1, 2, ...}, be two discrete probability distributions. Recall that the (information) divergence is defined as

    D(P || Q) = Σ_i p_i ln(p_i / q_i),

and the variational distance V(P, Q) between P and Q is the L1 distance between them; i.e.,

    V(P, Q) = Σ_i |p_i − q_i|

(though sometimes the variational distance between P and Q is defined as half the L1 distance between them).
There is an extensive literature devoted to the investigation of relationships between D(P || Q) and V(P, Q) (see, e.g., [4] and references therein). Here we only mention the so-called Pinsker inequality

    (1/2) V²(P, Q) ≤ D(P || Q),    (1)

though in his original paper [5] Pinsker proved a weaker inequality D(P || Q) ≥ c V²(P, Q) with a constant c < 1/2. Note also that in general it is impossible to bound D(P || Q) from above via V(P, Q) without some additional conditions on the probability distributions P and Q, since D(P || Q) can be arbitrarily large while V(P, Q) is arbitrarily small.
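Both facts are easy to check numerically. The following plain-Python sketch (distributions and constants chosen here purely for illustration, not taken from the paper) verifies Pinsker's inequality (1) on random pairs and exhibits a pair with small variational distance but large divergence:

```python
import math, random

def kl(p, q):
    # D(P || Q) = sum p_i ln(p_i / q_i), with the convention 0 ln 0 = 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def variation(p, q):
    # V(P, Q): L1 distance between P and Q
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def rand_dist(k):
    w = [random.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

random.seed(1)
for _ in range(1000):
    p, q = rand_dist(5), rand_dist(5)
    # Pinsker's inequality (1)
    assert 0.5 * variation(p, q) ** 2 <= kl(p, q) + 1e-12

# No converse bound: V stays small while D gets large
p = [0.99, 0.01]
q = [1 - 1e-300, 1e-300]
print(variation(p, q), kl(p, q))  # V = 0.02, D ≈ 6.85
```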
In this paper, we consider the special case where P is the joint distribution of several discrete random variables and Q is the direct product of the distributions of these random variables. Let X1, ..., Xn be discrete random variables ranging in finite or countable sets Ii, i = 1, ..., n, respectively.
¹ Supported in part by the Russian Foundation for Basic Research, project no. 09-01-00536.
Since in what follows we consider only discrete random variables and operate only with probability distributions of such random variables, we usually assume without loss of generality that Ii = {1, 2, ..., Ni}, i = 1, ..., n, where the Ni are given integers; moreover, some of them may be infinite.
Denote by

    I(X1; ...; Xn) := D(P_{X1...Xn} || P_{X1} × ... × P_{Xn})    (2)

the information divergence, and by

    ρ(X1, ..., Xn) := V(P_{X1...Xn}, P_{X1} × ... × P_{Xn})    (3)

the variational distance between the joint distribution P_{X1...Xn} of the random variables X1, ..., Xn and the product P_{X1} × ... × P_{Xn} of their marginal distributions. The quantity I(X1; ...; Xn) is usually called the mutual information of X1, ..., Xn (see, e.g., [6]); in the special case n = 2 it coincides with the standard mutual information I(X1; X2) of two random variables. The quantities defined above satisfy the inequality

    (1/2) ρ²(X1, ..., Xn) ≤ I(X1; ...; Xn),

which is a special case of inequality (1).
Consider the quantities

    ρ(X1, ..., Xn, Y) := V(P_{X1...XnY}, P_{X1} × ... × P_{Xn} × P_Y)    (4)

and

    I_ε(X1; ...; Xn) := sup_{Y : ρ(X1,...,Xn,Y) ≤ ε} I(X1; ...; Xn; Y),    (5)

where the supremum is over all discrete random variables Y defined by conditional distributions P_{Y | X1...Xn} such that ρ(X1, ..., Xn, Y) ≤ ε. In addition, note that I_ε(X1; ...; Xn) is defined only for ε ≥ ρ(X1, ..., Xn), since it is easily seen that ρ(X1, ..., Xn, Y) ≥ ρ(X1, ..., Xn) for any Y (see also Section 3).
For given integers N1, ..., Nn, define

    I_ε^{(N1,...,Nn)} := sup_{Xi : |Ii| = Ni, i = 1,...,n} I_ε(X1; ...; Xn).    (6)

Note also that

    I(X1; ...; Xn; Y) = Σ_{i=1}^n H(Xi) − H(X1, ..., Xn | Y),    (7)
where, as usual, H(·) and H(· | ·) denote the entropy and conditional entropy of the corresponding random variables, respectively. Therefore, considering (X1, ..., Xn) as a single random variable, one can use some results of [1–3] for estimating sup_Y I(X1; ...; Xn; Y) via ε under the assumption that V(P_{X1...XnY}, P_{X1...Xn} × P_Y) ≤ ε. However, our aim is to estimate sup_Y I(X1; ...; Xn; Y) via ε provided that ρ(X1, ..., Xn, Y) = V(P_{X1...XnY}, P_{X1} × ... × P_{Xn} × P_Y) ≤ ε, and there is no direct dependence between V(P_{X1...XnY}, P_{X1...Xn} × P_Y) and V(P_{X1...XnY}, P_{X1} × ... × P_{Xn} × P_Y). We will see in Section 3 that V(P_{X1...XnY}, P_{X1...Xn} × P_Y) can be both larger and smaller than V(P_{X1...XnY}, P_{X1} × ... × P_{Xn} × P_Y), though in the special case where the random variables X1, ..., Xn are independent the two quantities coincide. Thus, in general we cannot directly apply the results of [1–3]; however, as will be seen in Section 4, some methods of those papers can be partially used in the case considered here.
Let us introduce some necessary notation to state our results. The joint and marginal probability distributions of the random variables X1, ..., Xn are denoted by

    p_{i1...in} := Pr{X1 = i1, ..., Xn = in},
    p^{(k)}_{ik} := Pr{Xk = ik},  ik ∈ Ik,  k = 1, ..., n,    (8)

respectively. Let

    ρ̂(X1, ..., Xn) := max_Y ρ(X1, ..., Xn, Y),    (9)

where the maximum is over all random variables Y. In Section 3 (see Lemma 1), it is shown that
    ρ̂(X1, ..., Xn) = 2 (1 − Σ_{i1,...,in} p_{i1...in} p^{(1)}_{i1} ... p^{(n)}_{in}).    (10)
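The right-hand side of (10) is attained at Y = (X1, ..., Xn); for n = 2 this identity is easy to verify numerically. A plain-Python sketch (the joint distribution is random and serves only as an illustration):

```python
import itertools, random

random.seed(0)
# random joint distribution of (X1, X2) on {0,1} x {0,1,2}
w = [[random.random() for _ in range(3)] for _ in range(2)]
s = sum(map(sum, w))
p = [[x / s for x in row] for row in w]

p1 = [sum(row) for row in p]                              # marginal of X1
p2 = [sum(p[i][j] for i in range(2)) for j in range(3)]   # marginal of X2

# Direct computation of rho(X1, X2, Y) for Y = (X1, X2):
# V(P_{X1 X2 Y}, P_{X1} x P_{X2} x P_Y), where P_Y(a, b) = p[a][b]
v = 0.0
for i, j in itertools.product(range(2), range(3)):
    for a, b in itertools.product(range(2), range(3)):
        joint = p[i][j] if (i, j) == (a, b) else 0.0
        v += abs(joint - p1[i] * p2[j] * p[a][b])

# Closed form (10): 2 * (1 - sum p_{ij} p1_i p2_j)
closed = 2 * (1 - sum(p[i][j] * p1[i] * p2[j]
                      for i, j in itertools.product(range(2), range(3))))
assert abs(v - closed) < 1e-12
```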
Assume that the vectors (i1, ..., in) are enumerated by a single index s = 1, 2, ... in such a way that the corresponding products of marginal probabilities are nonincreasing:

    ∏_{k=1}^n p^{(k)}_{s_k} ≥ ∏_{k=1}^n p^{(k)}_{(s+1)_k},  s = 1, 2, ....    (11)

For s = 1, 2, ..., put

    K_s := Σ_{t=1}^s ∏_{k=1}^n p^{(k)}_{t_k}    (12)

and

    L_s := Σ_{t=1}^s ∏_{k=1}^n p^{(k)}_{t_k} ln (1 / ∏_{k=1}^n p^{(k)}_{t_k}).    (13)

Moreover, let

    M := Σ_{i1,...,in} p_{i1...in} ln (1 / (p^{(1)}_{i1} ... p^{(n)}_{in})).    (14)

Note that at ε = ρ̂(X1, ..., Xn) we have I_ε(X1; ...; Xn) = Σ_{i=1}^n H(Xi), which follows from (7) if we put Y = (X1, ..., Xn). Therefore, to study the behavior of I_ε(X1; ...; Xn), we may restrict ourselves to the case of ρ(X1, ..., Xn) ≤ ε < ρ̂(X1, ..., Xn).
Proofs of this and subsequent propositions are given in Section 4. An upper bound for I_ε^{(N1,...,Nn)} is given in the following proposition.
Proposition 2. For any ε, 0 < ε ≤ 2(1 − 1/N), where N := ∏_{k=1}^n Nk, we have the inequality

    I_ε^{(N1,...,Nn)} ≤ (ε/2) ln(N − 1) + h(ε/2),    (17)

where h(x) := −x ln x − (1 − x) ln(1 − x) is the binary entropy function, and

    I_ε^{(N1,...,Nn)} = ln N  if  ε ≥ 2(1 − 1/N).    (18)
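A quick numerical consistency check of (17) and (18) (plain Python, for illustration only): the right-hand side of (17) is nondecreasing on [0, 2(1 − 1/N)] and meets the constant value ln N from (18) at the right endpoint:

```python
import math

def h(x):
    # binary entropy with natural logarithm, h(0) = h(1) = 0
    return 0.0 if x in (0.0, 1.0) else -x * math.log(x) - (1 - x) * math.log(1 - x)

def upper_bound(eps, N):
    # right-hand side of (17)
    return (eps / 2) * math.log(N - 1) + h(eps / 2)

for N in [2, 3, 10, 1000]:
    eps_max = 2 * (1 - 1 / N)
    # at eps = 2(1 - 1/N) the bound (17) equals ln N, the value from (18)
    assert abs(upper_bound(eps_max, N) - math.log(N)) < 1e-9
    # and the bound is nondecreasing on [0, 2(1 - 1/N)]
    vals = [upper_bound(eps_max * k / 1000, N) for k in range(1001)]
    assert all(a <= b + 1e-12 for a, b in zip(vals, vals[1:]))
```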
The lower bound given in the following proposition, though not optimal, is asymptotically optimal in some cases.

Proposition 3. For any ε, ρ(X1, ..., Xn) ≤ ε < ρ̂(X1, ..., Xn), we have

    I_ε(X1; ...; Xn) ≥ Σ_{i=1}^n H(Xi) − (1 − (ε − ρ(X1, ..., Xn)) / (ρ̂(X1, ..., Xn) − ρ(X1, ..., Xn))) H(X1, ..., Xn).    (19)
Remark 1. In the special case where the random variables X1, ..., Xn are independent, one can easily verify that the upper and lower bounds for I_ε(X1; ...; Xn) and I_ε^{(N1,...,Nn)} given in Propositions 1–3 coincide with the corresponding bounds for these quantities obtained in [1–3] if the vector (X1, ..., Xn) is considered as a single discrete random variable ranging in the set I = I1 × ... × In. In particular, this observation allows us to claim that for any ε, 0 ≤ ε ≤ 2(1 − 1/N), with N = ∏_{k=1}^n Nk we have

    I_ε(X1; ...; Xn) = (ε N / (2(N − 1))) ln N    (20)

if X1, ..., Xn are independent and each Xi takes Ni different values with equal probability.
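Formula (20) is consistent with the upper bound (17): the linear function (20) never exceeds the concave bound, and the two coincide at ε = 2(1 − 1/N). A short numerical check (plain Python; the grid and values of N are chosen here for illustration):

```python
import math

def h(x):
    # binary entropy with natural logarithm
    return 0.0 if x in (0.0, 1.0) else -x * math.log(x) - (1 - x) * math.log(1 - x)

def exact_uniform(eps, N):
    # formula (20): independent uniform X1, ..., Xn with N = N1 * ... * Nn
    return eps * N * math.log(N) / (2 * (N - 1))

def upper_bound(eps, N):
    # right-hand side of (17)
    return (eps / 2) * math.log(N - 1) + h(eps / 2)

for N in [2, 4, 6, 100]:
    eps_max = 2 * (1 - 1 / N)
    # (20) never exceeds the general upper bound (17) ...
    for k in range(1001):
        eps = eps_max * k / 1000
        assert exact_uniform(eps, N) <= upper_bound(eps, N) + 1e-12
    # ... and both equal ln N at eps = 2(1 - 1/N)
    assert abs(exact_uniform(eps_max, N) - math.log(N)) < 1e-9
```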
Remark 2. Note that for ε = ρ(X1, ..., Xn) (i.e., for the minimum value of ε), the lower estimate (19) reduces to the inequality I_ε(X1; ...; Xn) ≥ I(X1; ...; Xn). At first sight, this estimate seems tight, i.e., it seems that there should be equality instead of the inequality, since it is obvious that ρ(X1, ..., Xn, Y) = ρ(X1, ..., Xn) if Y does not depend on the collection of random variables X1, ..., Xn, and therefore I_ε(X1; ...; Xn) = I(X1; ...; Xn). However, we actually have the strict inequality I_ε(X1; ...; Xn) > I(X1; ...; Xn) if the random variables X1, ..., Xn are dependent, since there exists a random variable Y such that ρ(X1, ..., Xn, Y) = ρ(X1, ..., Xn) and at the same time Y depends on the collection of random variables X1, ..., Xn (see Section 3, Lemma 1), and therefore we obviously have

    I_ε(X1; ...; Xn) ≥ I(X1; ...; Xn; Y) = I(X1; ...; Xn) + I((X1, ..., Xn); Y) > I(X1; ...; Xn).
Note also that Propositions 1–3 imply two corollaries stated below, which are proved in the Appendix.

Corollary 1. We have the asymptotic relations

    I_ε^{(N1,...,Nn)} = (ε/2) ln N (1 + o(1)),  N := ∏_{k=1}^n Nk → ∞,    (21)

and

    (n / (2(n + 1))) ε ln(1/ε) + O(ε) ≤ I_ε^{(N1,...,Nn)} ≤ (1/2) ε ln(1/ε) + O(ε),  ε → 0.    (22)
Before formulating the second corollary, recall that I_ε(X1; ...; Xn) was defined as the supremum of I(X1; ...; Xn; Y) over a single auxiliary random variable Y with ρ(X1, ..., Xn, Y) ≤ ε. For an integer m ≥ 1, define

    I_ε^{(m)}(X1; ...; Xn) := sup_{Y1,...,Ym : ρ(X1,...,Xn,Y1,...,Ym) ≤ ε} I(X1; ...; Xn; Y1; ...; Ym).    (23)

Corollary 2. For any ε > ρ(X1, ..., Xn) and any m ≥ 2, we have

    I_ε^{(m)}(X1; ...; Xn) = ∞.    (24)

Along with ρ(X1, ..., Xn), consider the quantities

    ρ((X1, ..., Xk), Xk+1, ..., Xn) := V(P_{X1...Xn}, P_{X1...Xk} × P_{Xk+1} × ... × P_{Xn}),    (25)

where k takes integer values from 1 to n. In particular, for k = 1 we obtain the former function, i.e.,

    ρ((X1), X2, ..., Xn) = ρ(X1, ..., Xn).
The quantities

    ρ(X1, ..., Xk, (Xk+1, ..., Xm), Xm+1, ..., Xn)

are defined similarly. When defining such quantities, one should take into account that the random vector (Xk+1, ..., Xm) is considered as a single random variable whose probability distribution is the joint distribution of the collection of random variables Xk+1, ..., Xm.

Let us list several simple properties of these quantities.
If the random variables Xk+1, ..., Xn and the vector (X1, ..., Xk) are jointly independent, then ρ((X1, ..., Xk), Xk+1, ..., Xn) = 0. In particular, if X1, ..., Xn are independent, then ρ(X1, ..., Xn) = 0.

For any integers k, 1 ≤ k ≤ n − 1, and m ≥ 0, we have

    ρ(X1, ..., Xk, (Xk+1, ..., Xn)) ≤ ρ(X1, ..., Xk, (Xk+1, ..., Xn+m))    (26)

and

    ρ(X1, ..., Xk) ≤ ρ(X1, ..., Xn);    (27)

moreover, ρ(X1, ..., Xk) = ρ(X1, ..., Xn) if the random variables Xk+1, ..., Xn and the vector (X1, ..., Xk) are jointly independent. Inequality (26) follows from the relations

    ρ(X1, ..., Xk, (Xk+1, ..., Xn)) = Σ_{i1,...,in} |p_{i1...in} − p^{(1)}_{i1} ... p^{(k)}_{ik} p_{ik+1...in}|
    ≤ Σ_{i1,...,in+m} |p_{i1...in+m} − p^{(1)}_{i1} ... p^{(k)}_{ik} p_{ik+1...in+m}| = ρ(X1, ..., Xk, (Xk+1, ..., Xn+m)).

If Xi = X with probability 1 for all i = 1, ..., n, then for all integers k, 1 ≤ k ≤ n − 1, we have

    ρ((X1, ..., Xk), Xk+1, ..., Xn) = 2 (1 − Σ_i p_i^{n−k+1});    (28)

in particular,

    ρ(X1, ..., Xn) = ρ(X, ..., X) = 2 (1 − Σ_i p_i^n),    (29)

where {p_i} is the probability distribution of X.
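Equality (29) can be verified directly: for Xi ≡ X the joint distribution is concentrated on the diagonal. A small numerical sketch (the distribution of X is random and serves only as an illustration):

```python
import itertools, random

random.seed(2)
w = [random.random() for _ in range(4)]
p = [x / sum(w) for x in w]       # distribution of X on {0, 1, 2, 3}
n = 3                             # X1 = X2 = X3 = X

# rho(X1, ..., Xn) = V(joint, product of marginals); the joint distribution
# sits on the diagonal (i, i, ..., i) with mass p_i, each marginal equals p
v = 0.0
for idx in itertools.product(range(4), repeat=n):
    joint = p[idx[0]] if len(set(idx)) == 1 else 0.0
    prod = 1.0
    for i in idx:
        prod *= p[i]
    v += abs(joint - prod)

# closed form (29): 2 * (1 - sum p_i^n)
assert abs(v - 2 * (1 - sum(pi ** n for pi in p))) < 1e-12
```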
Some other (less obvious) properties of the variational distance for the considered class of probability distributions are given in the following lemma. Some of these properties were already mentioned in Section 2.

Lemma 1. The following statements are valid:

(1) The function ρ̂(X1, ..., Xn) defined in (9) satisfies equality (10); moreover,

    ρ̂(X1, ..., Xn) = ρ(X1, ..., Xn, (X1, ..., Xn)) = 2 (1 − Σ_{i1,...,in} p_{i1...in} p^{(1)}_{i1} ... p^{(n)}_{in}).    (30)

(2) For any m ≥ 1,

    ρ̂(X1, ..., Xn) = max_{Y1,...,Ym} ρ(X1, ..., Xn, Y1, ..., Ym).    (31)

(3) If the random variables X1, ..., Xn are dependent, then there exists a random variable Y such that ρ(X1, ..., Xn, Y) = ρ(X1, ..., Xn) but Y depends on X1, ..., Xn.

Before proving this lemma, we make two remarks.
Remark 3. The function ρ(X1, ..., Xn) can be both greater and smaller than ρ((X1, ..., Xk), Xk+1, ..., Xn), depending on the probability distribution of the random variables X1, ..., Xn. A similar statement is also valid for the functions ρ̂(X1, ..., Xn) and

    ρ̂((X1, ..., Xk), Xk+1, ..., Xn) := max_Y ρ((X1, ..., Xk), Xk+1, ..., Xn, Y).

Indeed, it easily follows from (28)–(30) that in the case where Xi = X, i = 1, ..., n, and X is a nondegenerate random variable, we have

    ρ(X1, ..., Xn) > ρ((X1, ..., Xk), Xk+1, ..., Xn)

and

    ρ̂(X1, ..., Xn) > ρ̂((X1, ..., Xk), Xk+1, ..., Xn)

for any n ≥ 3 and k, 2 ≤ k ≤ n − 1. These inequalities are valid for most joint probability distributions. However, in some cases the opposite inequalities hold, as the following example shows. Let X and Y be random variables, each of them taking two possible values 1 and 2, and let their joint distribution be given by the formulas

    p11 := Pr{X = 1, Y = 1} = p² + ε,  p22 := Pr{X = 2, Y = 2} = q² − ε,
    p12 := Pr{X = 1, Y = 2} = Pr{X = 2, Y = 1} =: p21 = pq,

where p > 0, q > 0, p + q = 1, and ε > 0 is sufficiently small. Let Z = (X, Y). Then, using equality (30), we get

    ρ̂(X, Y) = ρ(X, Y, Z) = 2 (1 − Σ_{i,j} p_ij p_i q_j),

where

    p1 := p11 + p12 = p + ε,  p2 := 1 − p1 = q − ε,
    q1 := p11 + p21 = p + ε,  q2 := 1 − q1 = q − ε,

and

    ρ̂((X, Y)) = ρ((X, Y), Z) = 2 (1 − Σ_{i,j} p²_ij),

both right-hand sides tending to 0 as p → 1 and ε → 0. Therefore, if p is rather close to 1 and ε is sufficiently small, then ρ̂(X, Y) < ρ̂((X, Y)).
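The conclusion of this example can be confirmed numerically; in the following sketch the values p = 0.9 and ε = 0.005 are chosen for illustration only:

```python
p, q, eps = 0.9, 0.1, 0.005
# joint distribution of (X, Y) from the example in Remark 3
pj = {(1, 1): p * p + eps, (2, 2): q * q - eps, (1, 2): p * q, (2, 1): p * q}
pm = {i: pj[(i, 1)] + pj[(i, 2)] for i in (1, 2)}   # marginal of X
qm = {j: pj[(1, j)] + pj[(2, j)] for j in (1, 2)}   # marginal of Y

# rho-hat(X, Y) and rho-hat((X, Y)) via the closed forms from (30)
rho_xy = 2 * (1 - sum(pj[(i, j)] * pm[i] * qm[j] for i, j in pj))
rho_pair = 2 * (1 - sum(v * v for v in pj.values()))

assert abs(sum(pj.values()) - 1) < 1e-12
assert rho_xy < rho_pair   # the opposite of the "typical" inequality
```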
Remark 4. Let X1, ..., Xn be a collection of random variables with a given joint probability distribution. Denote by X̂1, ..., X̂n a system of independent random variables with the same marginal distributions as the former one. It turns out that ρ̂(X1, ..., Xn) can be both larger and smaller than ρ̂(X̂1, ..., X̂n), depending on the joint distribution of the random variables X1, ..., Xn. Indeed, consider two examples for the case n = 2, denoting as usual p_ij := Pr{X = i, Y = j}, p_i := Pr{X = i}, and q_j := Pr{Y = j}, i, j ∈ {1, 2}.
Let

    p_ij := 1/4 + ε if i = j,  1/4 − ε otherwise.

Then p_i = q_j = 1/2, and

    ρ̂(X, Y) = 2 (1 − Σ_{i,j} p_ij p_i q_j) = 3/2

and

    ρ̂(X̂, Ŷ) = 2 (1 − Σ_i p_i² Σ_j q_j²) = 3/2

for any ε, −1/4 ≤ ε ≤ 1/4, so that in this case we have ρ̂(X, Y) = ρ̂(X̂, Ŷ).

Now let

    p_ij := 1/4 + ε if i = j,  1/4 if i = 1, j = 2,  1/4 − 2ε if i = 2, j = 1.

Then we have

    ρ̂(X, Y) = 2 (3/4 − 2ε² + 4ε³)

and

    ρ̂(X̂, Ŷ) = 2 (3/4 − 2ε² − 4ε⁴),

so that ρ̂(X, Y) > ρ̂(X̂, Ŷ) for ε > 0 and ρ̂(X, Y) < ρ̂(X̂, Ŷ) for ε < 0.
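Both closed-form expressions of the second example can be verified in exact rational arithmetic; a plain-Python sketch (the test values of ε are chosen here for illustration):

```python
from fractions import Fraction as F

def rho_hat(pj):
    # rho-hat via (30): 2 * (1 - sum p_{ij} * p_i * q_j)
    pm = {i: pj[(i, 1)] + pj[(i, 2)] for i in (1, 2)}
    qm = {j: pj[(1, j)] + pj[(2, j)] for j in (1, 2)}
    return 2 * (1 - sum(pj[(i, j)] * pm[i] * qm[j] for i, j in pj))

for eps in [F(1, 10), F(-1, 10), F(1, 100)]:
    pj = {(1, 1): F(1, 4) + eps, (2, 2): F(1, 4) + eps,
          (1, 2): F(1, 4), (2, 1): F(1, 4) - 2 * eps}
    # independent pair with the same marginals
    indep = {(i, j): (pj[(i, 1)] + pj[(i, 2)]) * (pj[(1, j)] + pj[(2, j)])
             for i, j in pj}
    assert rho_hat(pj) == 2 * (F(3, 4) - 2 * eps ** 2 + 4 * eps ** 3)
    assert rho_hat(indep) == 2 * (F(3, 4) - 2 * eps ** 2 - 4 * eps ** 4)
```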
Proof of Lemma 1. (1) Let us prove the first claim of the lemma. To this end, we upper estimate ρ(X1, ..., Xn, Xn+1) for an arbitrary random variable Xn+1 given the joint distribution of the random variables X1, ..., Xn. Consider the set A := {(i1, ..., in+1) : p_{i1...in+1} > p^{(1)}_{i1} ... p^{(n+1)}_{in+1}}. Then, using definition (3), we get

    ρ(X1, ..., Xn, Xn+1) = 2 Σ_{(i1,...,in+1) ∈ A} (p_{i1...in+1} − p^{(1)}_{i1} ... p^{(n+1)}_{in+1})
    = 2 (1 − Σ_{i1,...,in+1} min{p_{i1...in+1}, p^{(1)}_{i1} ... p^{(n+1)}_{in+1}})    (32)
    ≤ 2 (1 − Σ_{i1,...,in} p_{i1...in} p^{(1)}_{i1} ... p^{(n)}_{in}),    (33)

where (33) holds since min{p_{i1...in+1}, p^{(1)}_{i1} ... p^{(n)}_{in} p^{(n+1)}_{in+1}} ≥ p^{(1)}_{i1} ... p^{(n)}_{in} p_{i1...in+1} (because p_{i1...in+1} ≤ p^{(n+1)}_{in+1} and p^{(1)}_{i1} ... p^{(n)}_{in} ≤ 1) and Σ_{in+1} p_{i1...in+1} = p_{i1...in}. On the other hand, a direct computation shows that for Xn+1 = (X1, ..., Xn) the bound (33) is attained, which proves (10) and (30).
(2) The supremum over several auxiliary variables Y1, ..., Ym reduces to the supremum over the single random variable Xn+1 := (Y1, ..., Ym), so equality (31) follows from (10) and (30). Note also that if Xn+1 is independent of X1, ..., Xn and takes M equiprobable values, then, by (30),

    ρ̂(X1, ..., Xn, Xn+1) = 2 (1 − (1/M) Σ_{i1,...,in} p_{i1...in} p^{(1)}_{i1} ... p^{(n)}_{in}) → 2  as  M → ∞.

(3) Since the random variables X1, ..., Xn are dependent, there exists a vector (i1, ..., in) such that

    0 < Pr{X1 = i1, ..., Xn = in} < ∏_{k=1}^n Pr{Xk = ik}.
Now consider a random variable Y taking two equiprobable values, so that Pr{Y = 1} = Pr{Y = 2} = 1/2, and define the joint probability distribution of the random variables X1, ..., Xn and Y as follows:

    Pr{X1 = i1, ..., Xn = in, Y = 1} := (1/2) [Pr{X1 = i1, ..., Xn = in} + ε (1 − Pr{X1 = i1, ..., Xn = in})] if (i1, ..., in) is the chosen vector,
    Pr{X1 = i1, ..., Xn = in, Y = 1} := (1/2) Pr{X1 = i1, ..., Xn = in} (1 − ε) otherwise,

and

    Pr{X1 = i1, ..., Xn = in, Y = 2} := (1/2) [Pr{X1 = i1, ..., Xn = in} − ε (1 − Pr{X1 = i1, ..., Xn = in})] if (i1, ..., in) is the chosen vector,
    Pr{X1 = i1, ..., Xn = in, Y = 2} := (1/2) Pr{X1 = i1, ..., Xn = in} (1 + ε) otherwise.

It is easy to verify that for this definition of a joint distribution of the random variables X1, ..., Xn and Y, both the joint distribution of X1, ..., Xn and the distribution of Y remain the same; moreover, ρ(X1, ..., Xn, Y) = ρ(X1, ..., Xn) if ε > 0 is sufficiently small. Hence, the third statement of the lemma follows, since the random variable Y obviously depends on X1, ..., Xn for any ε ≠ 0.
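The construction can be checked on a small example. In the sketch below the joint distribution (chosen for illustration) is dependent, and all its probabilities differ from the corresponding products of marginals, so the equality ρ(X1, ..., Xn, Y) = ρ(X1, ..., Xn) indeed holds for this small ε:

```python
p = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.3}   # dependent pair
pm = [0.5, 0.5]; qm = [0.5, 0.5]                            # its marginals
prod = {(i, j): pm[i] * qm[j] for i, j in p}
rho = sum(abs(p[ij] - prod[ij]) for ij in p)                # rho(X1, X2) = 0.2

c, eps = (0, 1), 0.01         # chosen vector with p < product, small epsilon
joint = {}                    # joint distribution of (X1, X2, Y)
for ij in p:
    d = eps * (1 - p[ij]) if ij == c else -eps * p[ij]
    joint[ij + (1,)] = 0.5 * (p[ij] + d)
    joint[ij + (2,)] = 0.5 * (p[ij] - d)

# marginals of (X1, X2) and of Y are unchanged
assert all(abs(joint[ij + (1,)] + joint[ij + (2,)] - p[ij]) < 1e-12 for ij in p)
assert abs(sum(joint[ij + (1,)] for ij in p) - 0.5) < 1e-12

# rho(X1, X2, Y) equals rho(X1, X2), yet Y depends on (X1, X2)
rho_y = sum(abs(joint[ij + (y,)] - prod[ij] * 0.5)
            for ij in p for y in (1, 2))
assert abs(rho_y - rho) < 1e-12
assert any(abs(joint[ij + (1,)] - p[ij] * 0.5) > 1e-9 for ij in p)
```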
Proof of Proposition 1. By (7), I_ε(X1; ...; Xn) is the supremum of the functional

    F({p_{i1...in|j}}, {q_j}) := Σ_{i=1}^n H(Xi) + Σ_j q_j Σ_{i1,...,in} p_{i1...in|j} ln p_{i1...in|j}    (34)

over all probability distributions {q_j} and all conditional distributions {p_{i1...in|j}} satisfying the following conditions for all possible values of i1, ..., in and j:

    0 ≤ q_j ≤ 1,  Σ_j q_j = 1,  0 ≤ p_{i1...in|j} ≤ 1,  Σ_{i1,...,in} p_{i1...in|j} = 1,    (35)

    Σ_j q_j p_{i1...in|j} = p_{i1...in},    (36)

    Σ_j q_j Σ_{i1,...,in} |p_{i1...in|j} − p^{(1)}_{i1} ... p^{(n)}_{in}| ≤ ε,    (37)

where {p_{i1...in}} and {p^{(k)}_{ik}}, k = 1, ..., n, are the joint and marginal distributions of the random variables X1, ..., Xn, respectively. Here q_j = Pr{Y = j} and p_{i1...in|j} = Pr{X1 = i1, ..., Xn = in | Y = j}; condition (36) means that the joint distribution of X1, ..., Xn is preserved, and condition (37) is the constraint ρ(X1, ..., Xn, Y) ≤ ε.

Denote by {q_j} and {p_{i1...in|j}} the optimal distributions maximizing the functional (34) under conditions (35)–(37). The following lemma, generalizing Lemma 2 from [1], describes the class of probability distributions to which the optimal distributions {p_{i1...in|j}} belong.
Lemma 2. For any j, the optimal distribution {p_{i1...in|j}} has one of the following three forms:

There exists a vector (i*1, ..., i*n) such that p_{i*1...i*n|j} = 1 and p_{i1...in|j} = 0 for all (i1, ..., in) ≠ (i*1, ..., i*n);

p_{i1...in|j} = p^{(1)}_{i1} ... p^{(n)}_{in} for all (i1, ..., in);

There exists a vector (i*1, ..., i*n) such that

    p_{i1...in|j} = 0 if (i1, ..., in) = (i*1, ..., i*n),
    p_{i1...in|j} = p^{(1)}_{i1} ... p^{(n)}_{in} / (1 − p^{(1)}_{i*1} ... p^{(n)}_{i*n}) otherwise.
A proof of this lemma is given in the Appendix. Now we find the maximum of the functional F({p_{i1...in|j}}, {q_j}), skipping the condition Σ_{i1,...,in} p_{i1...in|j} = 1 but assuming that the other conditions in (35)–(37) are fulfilled; dropping this condition can only increase the maximum. Then Lemma 2 implies that to find the maximum, we may restrict ourselves to conditional distributions each value p_{i1...in|j} of which equals 0, 1, or p^{(1)}_{i1} ... p^{(n)}_{in}, since only such values affect the value of the functional F({p_{i1...in|j}}, {q_j}). Therefore, denoting

    α = α_{i1...in} := Σ_{j : p_{i1...in|j} = p^{(1)}_{i1}...p^{(n)}_{in}} q_j,    (38)

we obtain

    max F({p_{i1...in|j}}, {q_j}) = Σ_{i1,...,in} (α_{i1...in} − 1) ∏_{k=1}^n p^{(k)}_{ik} ln ∏_{k=1}^n p^{(k)}_{ik},    (39)

where the maximum is over all collections {q_j} and {p_{i1...in|j}} under the above conditions.
Further, put π_{i1...in} := p^{(1)}_{i1} ... p^{(n)}_{in} and

    β = β_{i1...in} := Σ_{j : p_{i1...in|j} = 1} q_j,  γ = γ_{i1...in} := Σ_{j : p_{i1...in|j} = 0} q_j,    (40)

and express β and γ through α. To this end, note that we have the equalities

    α_{i1...in} + β_{i1...in} + γ_{i1...in} = 1,  α_{i1...in} π_{i1...in} + β_{i1...in} = p_{i1...in},

which follow from conditions (35) and (36) and definitions (38) and (40). These equalities imply

    β_{i1...in} = p_{i1...in} − α_{i1...in} π_{i1...in},  γ_{i1...in} = 1 − p_{i1...in} − α_{i1...in} (1 − π_{i1...in}).    (41)

Now note that the quantities β = β_{i1...in} defined in (40) must satisfy the inequalities

    Σ_{i1,...,in} β_{i1...in} (1 − π_{i1...in}) ≤ ε/2,  β_{i1...in} ≤ p_{i1...in}.    (42)

Indeed, the first inequality in (42) follows from (37), and the second one follows from definition (40) and the second equality in (36). Therefore, by optimizing the right-hand side of (39) over the α_{i1...in} and taking into account (41) and conditions (42), we derive the required estimate (15).
Proof of Proposition 2. For any j, put

    ε_j := V(P_{X̂1...X̂n}, P_{X1...Xn | Y=j}) = Σ_{i1,...,in} |P_{X̂1...X̂n}(i1, ..., in) − P_{X1...Xn | Y=j}(i1, ..., in)|,

where X̂1, ..., X̂n are independent random variables such that for all i = 1, ..., n the probability distribution of each random variable X̂i coincides with that of Xi. Then

    I(X1; ...; Xn; Y) = Σ_{i=1}^n H(Xi) − H(X1, ..., Xn | Y)
    = H(X̂1, ..., X̂n) − H(X1, ..., Xn | Y)
    = Σ_j P_Y(j) [H(X̂1, ..., X̂n) − H(X1, ..., Xn | Y = j)]
    ≤ Σ_j P_Y(j) [(ε_j/2) ln(N − 1) + h(ε_j/2)],

where N := N1 ... Nn. Here we used the estimate

    |H(U) − H(V)| ≤ (δ/2) ln(N − 1) + h(δ/2),

where δ := V(P_U, P_V), and N is the cardinality of the range of each of the random variables U and V.
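This entropy-difference estimate can be spot-checked numerically (plain Python; random distributions, illustration only, restricted to the range of distances where the bound applies):

```python
import math, random

def h(x):
    # binary entropy with natural logarithm
    return 0.0 if x in (0.0, 1.0) else -x * math.log(x) - (1 - x) * math.log(1 - x)

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

random.seed(3)
N = 4
for _ in range(2000):
    u = [random.random() for _ in range(N)]; u = [x / sum(u) for x in u]
    v = [random.random() for _ in range(N)]; v = [x / sum(v) for x in v]
    delta = sum(abs(a - b) for a, b in zip(u, v))   # L1 distance V(P_U, P_V)
    if delta / 2 <= 1 - 1 / N:                      # range where the bound applies
        bound = (delta / 2) * math.log(N - 1) + h(delta / 2)
        assert abs(entropy(u) - entropy(v)) <= bound + 1e-12
```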
Now note that

    Σ_j P_Y(j) ε_j = Σ_{i1,...,in,j} |P_{X1}(i1) ... P_{Xn}(in) P_Y(j) − P_{X1...XnY}(i1...in, j)| = ρ(X1, ..., Xn, Y) ≤ ε.
Therefore, taking into account the concavity of the function h(x), we conclude that

    I(X1; ...; Xn; Y) ≤ ((Σ_j P_Y(j) ε_j)/2) ln(N − 1) + h((Σ_j P_Y(j) ε_j)/2) ≤ (ε/2) ln(N − 1) + h(ε/2),

since the function (x/2) ln(N − 1) + h(x/2) is monotone increasing on the segment [0, 2(1 − 1/N)]. Taking the supremum over Y and over the distributions of X1, ..., Xn, we obtain (17).
Proof of Proposition 3. Consider the random variable Y that coincides with (X1, ..., Xn) with probability α and is independent of X1, ..., Xn with probability 1 − α, where

    α := (ε − ρ(X1, ..., Xn)) / (ρ̂(X1, ..., Xn) − ρ(X1, ..., Xn)).

For this Y we have ρ(X1, ..., Xn, Y) ≤ ε and

    I(X1; ...; Xn; Y) ≥ Σ_{i=1}^n H(Xi) − (1 − α) H(X1, ..., Xn),

which proves (19).
APPENDIX
Proof of Lemma 2. The proof of this lemma is quite similar to that of Lemmas 1 and 2 from [1], and therefore we only outline the main arguments for our case. Let a collection of probability distributions {p_{i1...in|j}, q_j} be optimal, maximizing the functional F({p_{i1...in|j}}, {q_j}) under conditions (40)–(42). Assume the contrary: let the claim of Lemma 2 be not valid, i.e., let there exist two different vectors (i′1, ..., i′n) and (i″1, ..., i″n) such that for some j0 we have

    p_{i′1...i′n|j0} ∉ {0, p^{(1)}_{i′1} ... p^{(n)}_{i′n}}  and  p_{i″1...i″n|j0} ∉ {0, p^{(1)}_{i″1} ... p^{(n)}_{i″n}}.

For a sufficiently small ε, consider two new conditional probability distributions, {p′_{i1...in|j0}} and {p″_{i1...in|j0}}, defined by the equalities

    p′_{i′1...i′n|j0} = p_{i′1...i′n|j0} + ε,  p′_{i″1...i″n|j0} = p_{i″1...i″n|j0} − ε

and

    p″_{i′1...i′n|j0} = p_{i′1...i′n|j0} − ε,  p″_{i″1...i″n|j0} = p_{i″1...i″n|j0} + ε,

with all other entries unchanged. It is easy to verify that the conditional distribution {p_{i1...in|j0}} is the half-sum of the distributions {p′_{i1...in|j0}} and {p″_{i1...in|j0}}; moreover, we have

    Σ_{i1,...,in} ∏_{k=1}^n p^{(k)}_{ik} p_{i1...in|j0} = (1/2) Σ_{i1,...,in} ∏_{k=1}^n p^{(k)}_{ik} p′_{i1...in|j0} + (1/2) Σ_{i1,...,in} ∏_{k=1}^n p^{(k)}_{ik} p″_{i1...in|j0}.

Then it is easy to show that the conditional distribution {p_{i1...in|j0}} cannot be included in the collection of optimal distributions {p_{i1...in|j}, q_j} maximizing the functional F({p_{i1...in|j}}, {q_j}). This claim generalizes Lemma 1 from [1] and is almost obvious: it suffices to split the state j0 into two states j′0 and j″0 (each with probability q_{j0}/2) and consider the two distributions {p′_{i1...in|j0}} and {p″_{i1...in|j0}} instead of {p_{i1...in|j0}}; this change increases the value of the functional F.
Proof of Corollary 1. Asymptotic equality (21) directly follows from the upper estimate (17) and equality (20). The upper estimate in (22) also follows from (17). To prove the lower bound in (22), we apply inequality (19), putting Xi ≡ X, i = 1, ..., n, with probability 1, where X is a random variable taking two values with probabilities δ and 1 − δ, respectively. Choosing δ = ε/(2(n + 1)), we obtain

    I_ε^{(N1,...,Nn)} ≥ I_ε(X; ...; X) ≥ n h(ε/(2(n + 1))) + O(ε) = (n/(2(n + 1))) ε ln(1/ε) + O(ε),  ε → 0,

as desired.
Proof of Corollary 2. First of all, note that definitions (9) and (23) imply

    I_ε^{(m)}(X1; ...; Xn) ≥ sup_Y I_ε(X1; ...; Xn; Y) ≥ I_ε(X1; ...; Xn; Z)

for any m ≥ 2, where the random variable Z is independent of X1, ..., Xn and takes N equiprobable values. Now we lower estimate I_ε(X1; ...; Xn; Z) by inequality (19) applied to the collection X1, ..., Xn, Z, so that

    I_ε(X1; ...; Xn; Z) ≥ (Σ_{i=1}^n H(Xi) + H(Z)) − (1 − (ε − ρ(X1, ..., Xn, Z)) / (ρ̂(X1, ..., Xn, Z) − ρ(X1, ..., Xn, Z))) H(X1, ..., Xn, Z),

from which we easily conclude that I_ε(X1; ...; Xn; Z) → ∞ as N → ∞. Indeed, the latter follows from the above estimate if we take into account that ρ(X1, ..., Xn, Z) = ρ(X1, ..., Xn) (since the random variable Z is independent of X1, ..., Xn), ρ̂(X1, ..., Xn, Z) → 2 as N → ∞ (which easily follows from (30)), and H(Z) = ln N.
REFERENCES

1. Pinsker, M.S., On Estimation of Information via Variation, Probl. Peredachi Inf., 2005, vol. 41, no. 2, pp. 3–8 [Probl. Inf. Trans. (Engl. Transl.), 2005, vol. 41, no. 2, pp. 71–75].
2. Prelov, V.V., On Inequalities between Mutual Information and Variation, Probl. Peredachi Inf., 2007, vol. 43, no. 1, pp. 15–27 [Probl. Inf. Trans. (Engl. Transl.), 2007, vol. 43, no. 1, pp. 12–23].
3. Prelov, V.V. and van der Meulen, E.C., Mutual Information, Variation, and Fano's Inequality, Probl. Peredachi Inf., 2008, vol. 44, no. 3, pp. 19–32 [Probl. Inf. Trans. (Engl. Transl.), 2008, vol. 44, no. 3, pp. 185–197].
4. Fedotov, A.A., Harremoës, P., and Topsøe, F., Refinements of Pinsker's Inequality, IEEE Trans. Inform. Theory, 2003, vol. 49, no. 6, pp. 1491–1498.
5. Pinsker, M.S., Informatsiya i informatsionnaya ustoichivost' sluchainykh velichin i protsessov, Probl. Peredachi Inf., issue 7, Moscow: Akad. Nauk SSSR, 1960. Translated under the title Information and Information Stability of Random Variables and Processes, San Francisco: Holden-Day, 1964.
6. Csiszár, I. and Körner, J., Information Theory: Coding Theorems for Discrete Memoryless Systems, New York: Academic; Budapest: Akadémiai Kiadó, 1981. Translated under the title Teoriya informatsii: teoremy kodirovaniya dlya diskretnykh sistem bez pamyati, Moscow: Mir, 1985.
7. Zhang, Z., Estimating Mutual Information via Kolmogorov Distance, IEEE Trans. Inform. Theory, 2007, vol. 53, no. 9, pp. 3280–3282.