
KULLBACK-LEIBLER SIMPLEX

Abstract. This technical reference presents the functional structure and the algorithmic implementation of the KL (Kullback-Leibler) simplex. It details the simplex approximation and fusion. The KL simplex is a fundamental, robust, and adaptive informatics agent for computational research in economics, finance, games, and mechanisms. From this perspective the study provides comprehensive results to facilitate future work in these areas.
God does not care about our mathematical difficulties. He integrates empirically. (Albert Einstein)
There is nothing free, except the grace of God. (True Grit, 2010)
1. Introduction
This paper presents an alternative for the sequential optimizing agent, which is crucial for the reliability of computational economics research. In particular, it is a version of an online classifier, a machine learner that processes classification on a data stream. The sequential implementation makes it efficient, fast, and practical for data-flow processing. Among this type of classifier, the informatics-divergence approach stands out, with a solid foundation in mathematical statistics and information theory. It is instructive to see the difference between the two approaches: the standard approach targets performance on an objective function, while the informatics approach works with statistical measures, e.g. the Kullback-Leibler and Renyi divergences [CDR07]. The informatics agent can thus be an effective alternative to standard sequential optimizers.
Furthermore, the informatics approach delivers powerful concepts: (i) the advance leverages notions and insights from dynamic programming [Sni10]; when the control is a simplex and a transition matrix, it has a strong foundation in probability and Markov chains [Beh00]; (ii) being model-free, or agnostic about the data, makes it capable of deriving a superior second-order perceptron working on real-world data [BCG05]. This approach can consequently improve machine learning so that it is robust and applicable for computational research in economics, finance, games, and mechanisms.
The next section lists useful formulas and identities. Section 3 presents the structure of the online machine learning [CDF08, LHZG11] and the key results; section 4 discusses the implementation. Instructive remarks are in section 5, and the proofs are in the Appendix.
2. The matrix
Simplex [CY11]. The relaxed simplex $\tilde\Delta = \{\mu : \mu^\top \mathbf{1} = 1\}$, and the standard simplex $\Delta = \{\mu : \mu^\top \mathbf{1} = 1\}$ with $\min(\mu) \ge 0$.
Taylor expansion. $\ln(\mu \cdot x_i) \approx \ln(\mu_i \cdot x_i) + \dfrac{(\mu - \mu_i) \cdot x_i}{\mu_i \cdot x_i}$.
Date: March 23, 2012.
Key words and phrases. KL divergence, second-order perceptron, informatics agent, simplex projection and fusion.
Acknowledgments. The Economics departments at Thammasat and Queen's University greatly support the study. The author warmly thanks Frank Flatters, Frank Milne, Ted Neave, and Pranee Tinakorn for the encouragement, without which this paper would not have been written. He certainly appreciates comments and discussions with the Sukniyom, Ko-Kao, Chai-Shane, Lek-Air, Ron-Fran, Pui, NaPoj and, of course, Kay and Nongyao.
Contact. facebook.com/popon.kangpenkae.
Symmetric squared decomposition (SSD). $\Upsilon_{[i]} = \Sigma_{[i]}^2$, with $\Sigma_{[i]} = Q_{[i]}\,\mathrm{diag}\big(\sqrt{\lambda_{[i],1}}, \ldots, \sqrt{\lambda_{[i],d}}\big)\,Q_{[i]}^\top$, where $Q_{[i]}$ is orthogonal and holds the eigenvectors of $\Upsilon_{[i]}$, and $\big(\lambda_{[i],1}, \ldots, \lambda_{[i],d}\big)$ are the eigenvalues of $\Upsilon_{[i]}$. Of course $\Upsilon_{[i]}$ and $\Sigma_{[i]}$ are symmetric PSD.
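In practice the SSD is a one-line eigendecomposition. The following numpy sketch is our own illustration (the function name ssd and the clipping guard are assumptions, not part of the paper):

import numpy as np

def ssd(upsilon):
    """Symmetric PSD square root: returns Sigma with Sigma @ Sigma = upsilon."""
    lam, q = np.linalg.eigh(upsilon)      # eigenvalues and orthogonal Q
    lam = np.clip(lam, 0.0, None)         # guard against tiny negative noise
    return q @ np.diag(np.sqrt(lam)) @ q.T

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
upsilon = a @ a.T                          # symmetric PSD test matrix
sigma = ssd(upsilon)
assert np.allclose(sigma @ sigma, upsilon)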
Inversion [PP08][146].
$(A + BCD)^{-1} = A^{-1} - A^{-1}B\big(C^{-1} + DA^{-1}B\big)^{-1}DA^{-1}$;
for our application,
$\Sigma^{-1} = \Sigma_i^{-1} + \dfrac{x_i x_i^\top}{c} \implies \Sigma = \Sigma_i - \dfrac{\Sigma_i x_i x_i^\top \Sigma_i}{c + x_i^\top \Sigma_i x_i}$.
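A quick numeric check of this rank-one application (a sketch; c, x, and sigma_i are our own test values, not the paper's):

import numpy as np

rng = np.random.default_rng(1)
d, c = 5, 2.0
x = rng.standard_normal(d)
a = rng.standard_normal((d, d))
sigma_i = a @ a.T + np.eye(d)             # symmetric positive definite

lhs = np.linalg.inv(np.linalg.inv(sigma_i) + np.outer(x, x) / c)
rhs = sigma_i - (sigma_i @ np.outer(x, x) @ sigma_i) / (c + x @ sigma_i @ x)
assert np.allclose(lhs, rhs)              # the two forms agree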
Differentiation [PP08][78, 49, 102, 83].
$\dfrac{\partial}{\partial \mu}\,(\mu - \mu_i)^\top \Sigma_i^{-2}(\mu - \mu_i) = 2\,\Sigma_i^{-2}(\mu - \mu_i)$
$\dfrac{\partial}{\partial \Sigma}\,\ln\big(\det \Sigma^2\big) = 2\,\Sigma^{-1}$
$\dfrac{\partial}{\partial \Sigma}\,\mathrm{Tr}\big(\Sigma_i^{-2}\Sigma^2\big) = \Sigma_i^{-2}\Sigma + \Sigma\,\Sigma_i^{-2}$
$\dfrac{\partial}{\partial \Sigma}\,x_i^\top \Sigma^2 x_i = x_i x_i^\top \Sigma + \Sigma\,x_i x_i^\top$
$\dfrac{\partial}{\partial \Sigma}\,\lVert \Sigma x_i \rVert = \dfrac{x_i x_i^\top \Sigma + \Sigma x_i x_i^\top}{2\,\lVert \Sigma x_i \rVert} = \dfrac{x_i x_i^\top \Sigma + \Sigma x_i x_i^\top}{2\sqrt{x_i^\top \Sigma^2 x_i}}$
KL divergence.
$D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \Sigma^2)\,\|\,\mathcal{N}(\mu_i, \Sigma_i^2)\big) = \dfrac{1}{2}\Big[\ln\Big(\dfrac{\det \Sigma_i^2}{\det \Sigma^2}\Big) + \mathrm{Tr}\big(\Sigma_i^{-2}\Sigma^2\big) + (\mu - \mu_i)^\top \Sigma_i^{-2}(\mu - \mu_i)\Big]$,
up to the additive constant $-d/2$ of the full Gaussian KL, which does not affect the optimization.
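A small numpy sketch of that last point (our own check, with assumed test matrices): the full KL and the expression above differ only by the constant $d/2$.

import numpy as np

def kl_full(mu, s, mu_i, s_i):
    """Full KL( N(mu, s^2) || N(mu_i, s_i^2) ) for symmetric PD roots s, s_i."""
    d = len(mu)
    si2_inv = np.linalg.inv(s_i @ s_i)
    dm = mu - mu_i
    return 0.5 * (np.log(np.linalg.det(s_i @ s_i) / np.linalg.det(s @ s))
                  + np.trace(si2_inv @ (s @ s)) + dm @ si2_inv @ dm - d)

def kl_paper(mu, s, mu_i, s_i):
    """The objective above: the same expression without the -d/2 constant."""
    si2_inv = np.linalg.inv(s_i @ s_i)
    dm = mu - mu_i
    return 0.5 * (np.log(np.linalg.det(s_i @ s_i) / np.linalg.det(s @ s))
                  + np.trace(si2_inv @ (s @ s)) + dm @ si2_inv @ dm)

rng = np.random.default_rng(2)
d = 3
mu, mu_i = rng.random(d), rng.random(d)
s, s_i = np.eye(d) * 0.5, np.eye(d) * 0.9
assert np.isclose(kl_paper(mu, s, mu_i, s_i) - kl_full(mu, s, mu_i, s_i), d / 2)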
3. Approximation
3.1. This section refers to [CDR07, LHZG11] for the model concept and definitions.
As the KL simplex solution in $\Delta$ does not have a closed form, the approximation will start with $\mu \in \tilde\Delta$:
$\big(\mu_{i+1}, \Sigma_{i+1}\big) = \arg\min\, D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \Sigma)\,\|\,\mathcal{N}(\mu_i, \Sigma_i)\big)$
subject to $\psi\big(y_i f(\mu \cdot x_i) - \varepsilon\big) \ge \varphi\,\sqrt{x_i^\top \Sigma\, x_i}$, $y_i \in \{-1, 1\}$, and $\mu \in \tilde\Delta$.
Applying the main result in [LCLMV04][VI.2], an invariance theorem is straightforward.
Theorem. The optimal pair $\big(\mu_{i+1}, \Sigma_{i+1}\big)$ is invariant to similarity-metric divergences.
We consider a $\psi \in \{\text{normal}, \text{hinge}, \text{hinge}^2\}$ constraint (see section 5), with two flavors $f \in \{\text{linear}, \text{logarithm}\}$, written $\psi^{[\mathrm{lin}]}$ and $\psi^{[\ln]}$ according to $f(\cdot)$. Let $\Upsilon_{[i]} = \Sigma_{[i]}^2$ where $\Upsilon_{[i]}$ has an SSD; the Lagrangian is
$\mathcal{L} = \dfrac{1}{2}\Big[\ln\Big(\dfrac{\det \Sigma_i^2}{\det \Sigma^2}\Big) + \mathrm{Tr}\big(\Sigma_i^{-2}\Sigma^2\big) + (\mu - \mu_i)^\top \Sigma_i^{-2}(\mu - \mu_i)\Big] + \alpha\big(\varphi\,\lVert \Sigma x_i \rVert - \psi\big) + \lambda\big(\mu^\top \mathbf{1} - 1\big)$.
Define the hinge function $[z]_+ = \max\{0, z\}$ and its derivative indicator $\partial[z] = [z]_+ / |z| \in \{0, 1\}$.
3.2. $[\text{normal}]$: $\psi$.
3.2.1. Linear: $\psi^{[\mathrm{lin}]}$.
Lemma 1. $\Sigma_{i+1}^{-1}$ under $\psi^{[\mathrm{lin}]}$:
$\Sigma_{i+1}^{-1} = \Sigma_i^{-1} + \dfrac{\alpha\varphi\, x_i x_i^\top}{\sqrt{x_i^\top \Sigma_{i+1} x_i}}$
Lemma 2. $\Sigma_{i+1}$ under $\psi^{[\mathrm{lin}]}$:
$\Sigma_{i+1} = \Sigma_i - \beta\,\Sigma_i x_i x_i^\top \Sigma_i$, where $\beta = \dfrac{\alpha\varphi}{\sqrt{u_i} + \upsilon_i\,\alpha\varphi}$ and $(u_i, \upsilon_i) = \big(x_i^\top \Sigma_{i+1} x_i,\; x_i^\top \Sigma_i x_i\big)$.
Lemma 3. $\sqrt{u_i}$ under $\psi^{[\mathrm{lin}]}$:
$\sqrt{u_i} = \dfrac{-\alpha\varphi\,\upsilon_i + \sqrt{\alpha^2\varphi^2\upsilon_i^2 + 4\upsilon_i}}{2}$
Lemma 4. $\mu_{i+1}$ under $\psi^{[\mathrm{lin}]}$:
$\mu_{i+1} = \mu_i + \alpha\, y_i\,\Sigma_i\,(x_i - \bar{x}_i)$,
where $\bar{x}_i = \mathbf{1}\,\dfrac{\mathbf{1}^\top \Sigma_i x_i}{\mathbf{1}^\top \Sigma_i \mathbf{1}}$
Lemma 5. $\alpha^{[\mathrm{lin}]}$: $\alpha = \max\Big\{0,\ \dfrac{-b + \sqrt{b^2 - 4ac}}{2a}\Big\}$ such that
$(a, b, c) = \Big(\zeta\big(\zeta + \varphi^2\upsilon_i\big),\; 2\psi\big(\zeta + \tfrac{1}{2}\varphi^2\upsilon_i\big),\; \psi^2 - \varphi^2\upsilon_i\Big)$,
$(\psi, \zeta) = \big(y_i(\mu_i \cdot x_i) - \varepsilon,\; x_i^\top \Sigma_i\,(x_i - \bar{x}_i)\big)$
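Assembling Lemmas 1-5 gives one closed-form $[\mathrm{lin}]$ update step. The following Python sketch is our own assembly of the reconstructed formulas above (the function name, the numerical guard, and all variable names are assumptions, not the paper's code):

import numpy as np

def kl_simplex_step_lin(mu, sigma, x, y, phi, eps):
    """One [lin] update on the relaxed simplex (Lemmas 1-5)."""
    one = np.ones_like(mu)
    xbar = one * (one @ sigma @ x) / (one @ sigma @ one)   # x-bar of Lemma 4
    psi = y * (mu @ x) - eps                               # psi of Lemma 5
    zeta = x @ sigma @ (x - xbar)                          # zeta of Lemma 5
    ups = x @ sigma @ x                                    # upsilon_i
    a = zeta * (zeta + phi**2 * ups)
    b = 2.0 * psi * (zeta + 0.5 * phi**2 * ups)
    c = psi**2 - phi**2 * ups
    if abs(a) < 1e-12:                                     # degenerate quadratic
        alpha = max(0.0, -c / b) if abs(b) > 1e-12 else 0.0
    else:
        alpha = max(0.0, (-b + np.sqrt(max(b * b - 4 * a * c, 0.0))) / (2 * a))
    sqrt_u = 0.5 * (-alpha * phi * ups                     # Lemma 3
                    + np.sqrt(alpha**2 * phi**2 * ups**2 + 4.0 * ups))
    beta = alpha * phi / (sqrt_u + ups * alpha * phi)      # Lemma 2
    mu_new = mu + alpha * y * sigma @ (x - xbar)           # Lemma 4
    sigma_new = sigma - beta * sigma @ np.outer(x, x) @ sigma
    return mu_new, sigma_new

Note that $\mathbf{1}^\top \Sigma_i (x_i - \bar{x}_i) = 0$, so the mean update stays on $\tilde\Delta$ by construction.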
3.2.2. Logarithm: $\psi^{[\ln]}$.
Lemma 6. $\Sigma_{i+1}^{-1}$ under $\psi^{[\ln]}$: as Lemma 1.
Lemma 7. $\Sigma_{i+1}$ under $\psi^{[\ln]}$: as Lemma 2.
Lemma 8. $\sqrt{u_i}$ under $\psi^{[\ln]}$: as Lemma 3.
Lemma 9. $\mu_{i+1}$ under $\psi^{[\ln]}$:
$\mu_{i+1} \approx \mu_i + \dfrac{\alpha\, y_i}{\mu_i \cdot x_i}\,\Sigma_i\,(x_i - \bar{x}_i)$,
where $\bar{x}_i = \mathbf{1}\,\dfrac{\mathbf{1}^\top \Sigma_i x_i}{\mathbf{1}^\top \Sigma_i \mathbf{1}}$
Lemma 10. $\alpha^{[\ln]}$: $\alpha = \max\Big\{0,\ \dfrac{-b + \sqrt{b^2 - 4ac}}{2a}\Big\}$ such that
$(a, b, c) = \Big(\zeta\big(\zeta + \varphi^2\upsilon_i\big),\; 2\psi\big(\zeta + \tfrac{1}{2}\varphi^2\upsilon_i\big),\; \psi^2 - \varphi^2\upsilon_i\Big)$,
$(\psi, \zeta) \approx \Big(y_i \ln(\mu_i \cdot x_i) - \varepsilon,\; \dfrac{x_i^\top \Sigma_i\,(x_i - \bar{x}_i)}{(\mu_i \cdot x_i)^2}\Big)$
3.3. $[\text{hinge}]$: $\psi_1$, and $[\text{hinge}^2]$: $\psi_2$.
3.3.1. Linear: $\psi_1^{[\mathrm{lin}]}$, $\psi_2^{[\mathrm{lin}]}$.

$\Sigma_{i+1}^{-1}$ under $\psi_{1,2}^{[\mathrm{lin}]}$: Lemma 11 and Lemma 21 repeat Lemma 1 ($\Sigma_{i+1}^{-1}$ under $\psi^{[\mathrm{lin}]}$).
$\Sigma_{i+1}$ under $\psi_{1,2}^{[\mathrm{lin}]}$: Lemma 12 and Lemma 22 repeat Lemma 2 ($\Sigma_{i+1}$ under $\psi^{[\mathrm{lin}]}$).
$\sqrt{u_i}$ under $\psi_{1,2}^{[\mathrm{lin}]}$: Lemma 13 and Lemma 23 repeat Lemma 3 ($\sqrt{u_i}$ under $\psi^{[\mathrm{lin}]}$).
Lemma 14. $\mu_{i+1}$ under $\psi_1^{[\mathrm{lin}]}$:
$\mu_{i+1} = \mu_i + \alpha\,\partial\big[y_i(\mu_i \cdot x_i) - \varepsilon\big]\, y_i\,\Sigma_i\,(x_i - \bar{x}_i)$,
where $\bar{x}_i = \mathbf{1}\,\dfrac{\mathbf{1}^\top \Sigma_i x_i}{\mathbf{1}^\top \Sigma_i \mathbf{1}}$
Lemma 15. $\alpha_1^{[\mathrm{lin}]}$: $\alpha = \max\Big\{0,\ \dfrac{-b + \sqrt{b^2 - 4ac}}{2a}\Big\}$ such that
$(a, b, c) = \Big(\zeta\big(\zeta + \varphi^2\upsilon_i\big),\; 2\psi\big(\zeta + \tfrac{1}{2}\varphi^2\upsilon_i\big),\; \psi^2 - \varphi^2\upsilon_i\Big)$,
$(\psi, \zeta) = \big(y_i(\mu_i \cdot x_i) - \varepsilon,\; x_i^\top \Sigma_i\,(x_i - \bar{x}_i)\big)$
Lemma 24. $\mu_{i+1}$ under $\psi_2^{[\mathrm{lin}]}$:
$\mu_{i+1} = \mu_i + \Big[\dfrac{y_i(\mu_i \cdot x_i) - \varepsilon}{0.5\,\alpha^{-1} - x_i^\top \Sigma_i\,(x_i - \bar{x}_i)}\Big]_+\, y_i\,\Sigma_i\,(x_i - \bar{x}_i)$,
where $\bar{x}_i = \mathbf{1}\,\dfrac{\mathbf{1}^\top \Sigma_i x_i}{\mathbf{1}^\top \Sigma_i \mathbf{1}}$
Lemma 25. $\alpha_2^{[\mathrm{lin}]}$: $\alpha = \max\Big\{0,\ \dfrac{-b + \sqrt{b^2 - 4ac}}{2a}\Big\}$ such that
$(a, b, c) = \Big(\zeta\big(\zeta + \varphi^2\upsilon_i\big),\; 2\psi\big(\zeta + \tfrac{1}{2}\varphi^2\upsilon_i\big),\; \psi^2 - \varphi^2\upsilon_i\Big)$,
$(\psi, \zeta) \approx \big((y_i(\mu_i \cdot x_i) - \varepsilon)^2,\; 4\,(y_i(\mu_i \cdot x_i) - \varepsilon)^2\, x_i^\top \Sigma_i\,(x_i - \bar{x}_i)\big)$
3.3.2. Logarithm: $\psi_1^{[\ln]}$, $\psi_2^{[\ln]}$.

$\Sigma_{i+1}^{-1}$ under $\psi_{1,2}^{[\ln]}$: Lemma 16 and Lemma 26 repeat Lemma 6 ($\Sigma_{i+1}^{-1}$ under $\psi^{[\ln]}$).
$\Sigma_{i+1}$ under $\psi_{1,2}^{[\ln]}$: Lemma 17 and Lemma 27 repeat Lemma 7 ($\Sigma_{i+1}$ under $\psi^{[\ln]}$).
$\sqrt{u_i}$ under $\psi_{1,2}^{[\ln]}$: Lemma 18 and Lemma 28 repeat Lemma 8 ($\sqrt{u_i}$ under $\psi^{[\ln]}$).
Lemma 19. $\mu_{i+1}$ under $\psi_1^{[\ln]}$:
$\mu_{i+1} = \mu_i + \alpha\,\partial\big[y_i \ln(\mu_i \cdot x_i) - \varepsilon\big]\,\dfrac{y_i}{\mu_i \cdot x_i}\,\Sigma_i\,(x_i - \bar{x}_i)$,
where $\bar{x}_i = \mathbf{1}\,\dfrac{\mathbf{1}^\top \Sigma_i x_i}{\mathbf{1}^\top \Sigma_i \mathbf{1}}$
Lemma 20. $\alpha_1^{[\ln]}$: $\alpha = \max\Big\{0,\ \dfrac{-b + \sqrt{b^2 - 4ac}}{2a}\Big\}$ such that
$(a, b, c) = \Big(\zeta\big(\zeta + \varphi^2\upsilon_i\big),\; 2\psi\big(\zeta + \tfrac{1}{2}\varphi^2\upsilon_i\big),\; \psi^2 - \varphi^2\upsilon_i\Big)$,
$(\psi, \zeta) \approx \Big(y_i \ln(\mu_i \cdot x_i) - \varepsilon,\; \dfrac{x_i^\top \Sigma_i\,(x_i - \bar{x}_i)}{(\mu_i \cdot x_i)^2}\Big)$
Lemma 29. $\mu_{i+1}$ under $\psi_2^{[\ln]}$:
$\mu_{i+1} \approx \mu_i + \Big[\dfrac{y_i \ln(\mu_i \cdot x_i) - \varepsilon}{0.5\,\alpha^{-1} - x_i^\top \Sigma_i\,(x_i - \bar{x}_i)/(\mu_i \cdot x_i)^2}\Big]_+\,\dfrac{y_i}{\mu_i \cdot x_i}\,\Sigma_i\,(x_i - \bar{x}_i)$,
where $\bar{x}_i = \mathbf{1}\,\dfrac{\mathbf{1}^\top \Sigma_i x_i}{\mathbf{1}^\top \Sigma_i \mathbf{1}}$
Lemma 30. $\alpha_2^{[\ln]}$: $\alpha = \max\Big\{0,\ \dfrac{-b + \sqrt{b^2 - 4ac}}{2a}\Big\}$ such that
$(a, b, c) = \Big(\zeta\big(\zeta + \varphi^2\upsilon_i\big),\; 2\psi\big(\zeta + \tfrac{1}{2}\varphi^2\upsilon_i\big),\; \psi^2 - \varphi^2\upsilon_i\Big)$,
$(\psi, \zeta) \approx \Big(\big(y_i \ln(\mu_i \cdot x_i) - \varepsilon\big)^2,\; 4\,\big(y_i \ln(\mu_i \cdot x_i) - \varepsilon\big)^2\,\dfrac{x_i^\top \Sigma_i\,(x_i - \bar{x}_i)}{(\mu_i \cdot x_i)^2}\Big)$
4. Implementation
4.1. The results in section 3 are valid for the $\tilde\Delta$ simplex. A more common constraint is the $\Delta$ simplex; however, the closed-form solution is not possible with this simplex. Projecting the $\tilde\Delta$ simplex onto $\Delta$ is a practical approximation; [LHZG11] reports the effectiveness of this method. The projection necessarily requires a certain transformation of the $\mu$-covariance matrix. Further information on implementing the projection algorithm and the covariance transformation is in [CY11] and [LHZG11], respectively; a sketch of the projection follows the conjecture below.
Conjecture. The correlation transform is an nSD-effective covariance transformer.
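The following Python sketch is a sort-based Euclidean projection onto the simplex in the spirit of [CY11] (our own rendering; variable names are assumptions):

import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                          # sort descending
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / j > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

mu = np.array([0.8, 0.5, -0.3])                   # on the relaxed simplex
print(project_simplex(mu))                        # [0.65, 0.35, 0.0]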
4.2. Section 3 presents various choices of simplex, from which one can limit the set of simplexes using a statistical dominance concept, e.g. nSD-effectiveness. Projecting the simplexes and then integrating, or fusing, them is in practice an empirical issue. We define a new simplex fusing method, FED (fusing extensive dimension), as follows. Let $\{\mu_i\}_{i \in \{1, \ldots, m\}}$ be a set of nSD-effective simplexes, each $\mu_i \in [0, 1]^N$. Connect the $m$ subsimplexes into a vector in $[0, 1]^{mN}$; apply the simplex projection to the vector. The result is a simplex in $[0, 1]^{mN}$; overlay this simplex, i.e. slot it into $m$ vectors in $[0, 1]^N$ and sum the vectors with the proper array. The overlay composes a FED simplex in $[0, 1]^N$ (see the sketch after the footnote below).
Conjecture. A FED simplex is an nSD-effective fuse of its nSD-effective subsimplexes.
nSD-effective means empirically non-dominated, with respect to the n-order stochastic dominance definition [Dav06].
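A minimal sketch of FED, reusing project_simplex from the section 4.1 sketch (the function name and shapes are our assumptions):

import numpy as np

def fed_fuse(subsimplexes):
    """subsimplexes: (m, N) array, each row an nSD-effective simplex."""
    m, n = subsimplexes.shape
    flat = subsimplexes.reshape(m * n)     # connect into [0,1]^{mN}
    proj = project_simplex(flat)           # simplex projection in mN dimensions
    return proj.reshape(m, n).sum(axis=0)  # overlay: FED simplex in [0,1]^N

The overlay sums the $m$ blocks of a vector that itself sums to one, so the FED output lies on the simplex in $[0, 1]^N$ by construction.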
5. Remark
5.1. The logic of the confidence constraint. Suppose $\dfrac{F(w \cdot x_i) - \bar{F}_{(w \cdot x_i)}}{\sigma_{F(w \cdot x_i)}} = Z \sim Z_{\mathrm{cdf}}$; consider a generic confidence constraint $\Pr\big(F(w \cdot x_i) \ge 0\big) \ge \eta$:
$\Pr\Big(\dfrac{F(w \cdot x_i) - \bar{F}_{(w \cdot x_i)}}{\sigma_{F(w \cdot x_i)}} \ge -\dfrac{\bar{F}_{(w \cdot x_i)}}{\sigma_{F(w \cdot x_i)}}\Big) \ge \eta \iff 1 - Z_{\mathrm{cdf}}\Big(-\dfrac{\bar{F}_{(w \cdot x_i)}}{\sigma_{F(w \cdot x_i)}}\Big) \ge \eta$
$\iff \dfrac{\bar{F}_{(w \cdot x_i)}}{\sigma_{F(w \cdot x_i)}} \ge -Z_{\mathrm{cdf}}^{-1}(1 - \eta) = Z_{\mathrm{cdf}}^{-1}(\eta) = \varphi$ (for a symmetric $Z$)
$\iff \varphi\,\sigma_{F(w \cdot x_i)} \le \bar{F}_{(w \cdot x_i)}$,
i.e. the distance $\bar{F}_{(w \cdot x_i)} \big/ \sigma_{F(w \cdot x_i)}$ determines the proximity to the $\eta$ confidence constraint; [OC09] discusses the validity of a similar approach for online optimization.
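In code, the constraint check is one quantile call (a sketch; scipy is our choice of library and the names are assumptions):

from scipy.stats import norm

def satisfies_confidence(f_bar, sigma_f, eta):
    """Pr(F >= 0) >= eta  <=>  f_bar >= Phi^{-1}(eta) * sigma_f."""
    phi = norm.ppf(eta)                    # Phi^{-1}(eta)
    return f_bar >= phi * sigma_f

print(satisfies_confidence(1.7, 1.0, 0.95))   # True: 1.7 >= 1.645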
5.2. The approximating property of the $\{\text{normal}, \text{hinge}, \text{hinge}^2\}$ confidence. Define the $\{\text{normal}, \text{hinge}, \text{hinge}^2\}$ functions as follows:
normal: $\psi^{[f]} \equiv \big(\psi^{[\mathrm{lin}]}, \psi^{[\ln]}\big) = \big(y_i(\mu \cdot x_i) - \varepsilon,\; y_i \ln(\mu \cdot x_i) - \varepsilon\big)$
hinge: $\psi_1^{[f]} \equiv \big(\psi_1^{[\mathrm{lin}]}, \psi_1^{[\ln]}\big) = \big([y_i(\mu \cdot x_i) - \varepsilon]_+,\; [y_i \ln(\mu \cdot x_i) - \varepsilon]_+\big)$
hinge$^2$: $\psi_2^{[f]} \equiv \big(\psi_2^{[\mathrm{lin}]}, \psi_2^{[\ln]}\big) = \big([y_i(\mu \cdot x_i) - \varepsilon]_+^2,\; [y_i \ln(\mu \cdot x_i) - \varepsilon]_+^2\big)$,
as a result of the assumption $w \sim \mathcal{N}\big(\mu, \Upsilon = \Sigma^2\big)$;
normal: $\psi^{[\mathrm{lin}]}$ is exact, $\psi^{[\ln]}$ is approximate:
$F(w \cdot x_i) = y_i(w \cdot x_i) - \varepsilon \implies \big(\bar{F}_{(w \cdot x_i)},\; \sigma^2_{F(w \cdot x_i)}\big) = \big(y_i(\mu \cdot x_i) - \varepsilon,\; \sigma^2_{w \cdot x_i} = x_i^\top \Upsilon\, x_i\big)$
$F(w \cdot x_i) = y_i \ln(w \cdot x_i) - \varepsilon \implies \bar{F}_{(w \cdot x_i)} \approx y_i \ln(\mu \cdot x_i) - \varepsilon$
hinge: $\psi_1^{[\mathrm{lin}]}$ and $\psi_1^{[\ln]}$ are approximate:
$F(w \cdot x_i) = [y_i(w \cdot x_i) - \varepsilon]_+ \implies \bar{F}_{(w \cdot x_i)} \approx [y_i(\mu \cdot x_i) - \varepsilon]_+$
$F(w \cdot x_i) = [y_i \ln(w \cdot x_i) - \varepsilon]_+ \implies \bar{F}_{(w \cdot x_i)} \approx [y_i \ln(\mu \cdot x_i) - \varepsilon]_+$
hinge$^2$: $\psi_2^{[\mathrm{lin}]}$ and $\psi_2^{[\ln]}$ are approximate:
$F(w \cdot x_i) = [y_i(w \cdot x_i) - \varepsilon]_+^2 \implies \bar{F}_{(w \cdot x_i)} \approx [y_i(\mu \cdot x_i) - \varepsilon]_+^2$
$F(w \cdot x_i) = [y_i \ln(w \cdot x_i) - \varepsilon]_+^2 \implies \bar{F}_{(w \cdot x_i)} \approx [y_i \ln(\mu \cdot x_i) - \varepsilon]_+^2$
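The six $\psi$ flavors evaluated at the mean, as one small Python sketch (our own naming, not the paper's code):

import numpy as np

def psi(mu, x, y, eps, kind="normal", flavor="lin"):
    """psi flavors of section 5.2; y in {-1, +1}, mu on the simplex."""
    m = y * (mu @ x) - eps if flavor == "lin" else y * np.log(mu @ x) - eps
    if kind == "normal":
        return m
    if kind == "hinge":
        return max(0.0, m)
    if kind == "hinge2":
        return max(0.0, m) ** 2
    raise ValueError(kind)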
Appendix
Lemma 1. $\Sigma_{i+1}^{-1}$ under $\psi^{[\mathrm{lin}]}$.
$\dfrac{\partial\mathcal{L}}{\partial\Sigma} = 0 = -\Sigma^{-1} + \dfrac{1}{2}\Sigma_i^{-2}\Sigma + \dfrac{1}{2}\Sigma\,\Sigma_i^{-2} + \dfrac{\alpha\varphi\, x_i x_i^\top \Sigma}{2\sqrt{x_i^\top \Sigma^2 x_i}} + \dfrac{\alpha\varphi\,\Sigma\, x_i x_i^\top}{2\sqrt{x_i^\top \Sigma^2 x_i}}$,
so the $\Sigma^{-1}$ update condition is
$\Sigma^{-1} = \dfrac{1}{2}\Sigma_i^{-2}\Sigma + \dfrac{1}{2}\Sigma\,\Sigma_i^{-2} + \dfrac{\alpha\varphi\, x_i x_i^\top \Sigma}{2\sqrt{x_i^\top \Sigma^2 x_i}} + \dfrac{\alpha\varphi\,\Sigma\, x_i x_i^\top}{2\sqrt{x_i^\top \Sigma^2 x_i}}$.  (*)
Start with the solution's $\Sigma^2$ implicit update,
$\Sigma_{i+1}^{-2} = \Sigma_i^{-2} + \dfrac{\alpha\varphi\, x_i x_i^\top}{\sqrt{x_i^\top \Sigma^2 x_i}}$,
which yields, multiplying by $\Sigma$ on the right and on the left respectively,
$\Sigma^{-1} = \Sigma^{-2}\Sigma = \Sigma_i^{-2}\Sigma + \dfrac{\alpha\varphi\, x_i x_i^\top \Sigma}{\sqrt{x_i^\top \Sigma^2 x_i}}$  [right]
$\Sigma^{-1} = \Sigma\,\Sigma^{-2} = \Sigma\,\Sigma_i^{-2} + \dfrac{\alpha\varphi\,\Sigma\, x_i x_i^\top}{\sqrt{x_i^\top \Sigma^2 x_i}}$  [left]
and $\tfrac{1}{2}\big([\text{right}] + [\text{left}]\big)$ is exactly (*), i.e. the $\Sigma^2$ implicit update satisfies the $\Sigma^{-1}$ update condition. The result is direct from the replacement $\big(\Sigma_i^2, \Sigma^2\big) = \big(\Sigma_i, \Sigma_{i+1}\big)$:
$\Sigma_{i+1}^{-1} = \Sigma_i^{-1} + \dfrac{\alpha\varphi\, x_i x_i^\top}{\sqrt{x_i^\top \Sigma_{i+1} x_i}}$ $\square$
Lemma 2. $\Sigma_{i+1}$ under $\psi^{[\mathrm{lin}]}$. Apply the matrix inversion identity to $\Sigma_{i+1}^{-1} = \Sigma_i^{-1} + \dfrac{\alpha\varphi\, x_i x_i^\top}{\sqrt{x_i^\top \Sigma_{i+1} x_i}}$:
$\Sigma_{i+1} = \Sigma_i - \dfrac{\Sigma_i x_i x_i^\top \Sigma_i}{\dfrac{\sqrt{x_i^\top \Sigma_{i+1} x_i}}{\alpha\varphi} + x_i^\top \Sigma_i x_i} = \Sigma_i - \dfrac{\alpha\varphi\,\Sigma_i x_i x_i^\top \Sigma_i}{\sqrt{x_i^\top \Sigma_{i+1} x_i} + \alpha\varphi\, x_i^\top \Sigma_i x_i}$
$\implies \Sigma_{i+1} = \Sigma_i - \dfrac{\alpha\varphi}{\sqrt{u_i} + \upsilon_i\,\alpha\varphi}\,\Sigma_i x_i x_i^\top \Sigma_i = \Sigma_i - \beta\,\Sigma_i x_i x_i^\top \Sigma_i$ $\square$
Lemma 3. $\sqrt{u_i}$ under $\psi^{[\mathrm{lin}]}$. From
$\Sigma_{i+1} = \Sigma_i - \dfrac{\alpha\varphi}{\sqrt{u_i} + \upsilon_i\,\alpha\varphi}\,\Sigma_i x_i x_i^\top \Sigma_i$:
$x_i^\top \Sigma_{i+1} x_i = x_i^\top \Sigma_i x_i - \dfrac{\alpha\varphi\,\big(x_i^\top \Sigma_i x_i\big)\big(x_i^\top \Sigma_i x_i\big)}{\sqrt{u_i} + \upsilon_i\,\alpha\varphi} \implies u_i = \upsilon_i - \dfrac{\alpha\varphi\,\upsilon_i^2}{\sqrt{u_i} + \upsilon_i\,\alpha\varphi}$,
which rearranges to the quadratic in $\sqrt{u_i}$: $u_i + \alpha\varphi\,\upsilon_i\,\sqrt{u_i} - \upsilon_i = 0$, whose positive root is
$\sqrt{u_i} = \dfrac{-\alpha\varphi\,\upsilon_i + \sqrt{\alpha^2\varphi^2\upsilon_i^2 + 4\upsilon_i}}{2}$ $\square$
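A numeric sanity check of this fixed point (a sketch with our own arbitrary test values):

import numpy as np

alpha, phi, ups = 0.7, 1.3, 2.1                  # arbitrary positive values
sqrt_u = 0.5 * (-alpha * phi * ups + np.sqrt(alpha**2 * phi**2 * ups**2 + 4 * ups))
u = sqrt_u**2
# the closed form satisfies u = ups - alpha*phi*ups^2 / (sqrt_u + alpha*phi*ups)
assert np.isclose(u, ups - alpha * phi * ups**2 / (sqrt_u + alpha * phi * ups))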
Lemma 4. $\mu_{i+1}$ under $\psi^{[\mathrm{lin}]}$.
$\dfrac{\partial\mathcal{L}}{\partial\mu} = 0 = \Sigma_i^{-2}(\mu - \mu_i) - \alpha\,\partial\psi\, f'\, y_i x_i + \lambda\mathbf{1}$;
$\dfrac{\partial\mathcal{L}}{\partial\lambda} = 0 = \mu^\top\mathbf{1} - 1$.
$\Sigma_i^{-2}(\mu - \mu_i) - \alpha\, y_i x_i + \lambda\mathbf{1} = 0 \implies \mu = \mu_i + \Sigma_i^2\big(\alpha\, y_i x_i - \lambda\mathbf{1}\big)$;
$\mathbf{1}^\top\mu = 1 \implies 1 = \mathbf{1}^\top\mu_i + \alpha\, y_i\,\mathbf{1}^\top\Sigma_i^2 x_i - \lambda\,\mathbf{1}^\top\Sigma_i^2\mathbf{1} \implies \lambda = \alpha\, y_i\,\dfrac{\mathbf{1}^\top\Sigma_i^2 x_i}{\mathbf{1}^\top\Sigma_i^2\mathbf{1}}$
$\implies \mu = \mu_i + \alpha\, y_i\,\Sigma_i^2\,(x_i - \bar{x}_i)$.
Use $\partial\psi(\cdot) = 1$, $f'(\cdot) = 1$, and the replacement $\Sigma_i^2 = \Sigma_i$ to have $\mu_{i+1} = \mu = \mu_i + \alpha\, y_i\,\Sigma_i\,(x_i - \bar{x}_i)$ $\square$
Lemma 5. $\alpha^{[\mathrm{lin}]}$. From Lemma 3, $\sqrt{u_i} = \dfrac{-\alpha\varphi\,\upsilon_i + \sqrt{\alpha^2\varphi^2\upsilon_i^2 + 4\upsilon_i}}{2}$. Substituting this into the binding constraint $\varphi\sqrt{u_i} = \psi + \alpha\zeta$ below and squaring simplifies to the quadratic $a\alpha^2 + b\alpha + c = 0$, such that $(a, b, c) = \big(\zeta(\zeta + \varphi^2\upsilon_i),\; 2\psi(\zeta + \tfrac{1}{2}\varphi^2\upsilon_i),\; \psi^2 - \varphi^2\upsilon_i\big)$. We choose $\alpha = \max\big\{0,\ \frac{-b + \sqrt{b^2 - 4ac}}{2a}\big\}$ to ensure a valid $\alpha \ge 0$.
To find $(\psi, \zeta)$, use the binding constraint $\varphi\,\lVert\Sigma x_i\rVert = \psi^{[\mathrm{lin}]} \implies \varphi\,\lVert\Sigma x_i\rVert = y_i(\mu \cdot x_i) - \varepsilon$. Apply the update $\mu = \mu_i + \alpha\, y_i\,\Sigma_i(x_i - \bar{x}_i)$ and $\varphi\sqrt{u_i} \approx \varphi\,\lVert\Sigma x_i\rVert$:
$\varphi\sqrt{u_i} = y_i(\mu_i \cdot x_i) - \varepsilon + \alpha\, x_i^\top\Sigma_i(x_i - \bar{x}_i)$,
i.e. $(\psi, \zeta) = \big(y_i(\mu_i \cdot x_i) - \varepsilon,\; x_i^\top\Sigma_i(x_i - \bar{x}_i)\big)$. $\square$
Lemma 6. $\Sigma_{i+1}^{-1}$ under $\psi^{[\ln]}$: as Lemma 1.
Lemma 7. $\Sigma_{i+1}$ under $\psi^{[\ln]}$: as Lemma 2.
Lemma 8. $\sqrt{u_i}$ under $\psi^{[\ln]}$: as Lemma 3.
Lemma 9. $\mu_{i+1}$ under $\psi^{[\ln]}$. Similar to Lemma 4, $\mu = \mu_i + \alpha\,\partial\psi\, f'\, y_i\,\Sigma_i^2(x_i - \bar{x}_i)$; use $\partial\psi(\cdot) = 1$, the expansion $\ln(\mu \cdot x_i) \approx \ln(\mu_i \cdot x_i) + \dfrac{(\mu - \mu_i) \cdot x_i}{\mu_i \cdot x_i}$ so that $f'(\cdot) = \dfrac{1}{\mu_i \cdot x_i}$, and $\Sigma_i^2 = \Sigma_i$, which gives $\mu_{i+1} = \mu_i + \dfrac{\alpha\, y_i}{\mu_i \cdot x_i}\,\Sigma_i\,(x_i - \bar{x}_i)$ $\square$
Lemma 10. $\alpha^{[\ln]}$. Similar to Lemma 5, $\alpha = \max\big\{0,\ \frac{-b + \sqrt{b^2 - 4ac}}{2a}\big\}$ where $(a, b, c) = \big(\zeta(\zeta + \varphi^2\upsilon_i),\; 2\psi(\zeta + \tfrac{1}{2}\varphi^2\upsilon_i),\; \psi^2 - \varphi^2\upsilon_i\big)$.
To find $(\psi, \zeta)$, set the constraint binding: $\varphi\,\lVert\Sigma x_i\rVert = \psi^{[\ln]} \implies \varphi\,\lVert\Sigma x_i\rVert = y_i \ln(\mu \cdot x_i) - \varepsilon$. Apply the update $\mu = \mu_i + \frac{\alpha y_i}{\mu_i \cdot x_i}\,\Sigma_i(x_i - \bar{x}_i)$, $\varphi\sqrt{u_i} \approx \varphi\,\lVert\Sigma x_i\rVert$, and the approximation $y_i \ln(\mu \cdot x_i) \approx y_i\big[\ln(\mu_i \cdot x_i) + \frac{(\mu - \mu_i) \cdot x_i}{\mu_i \cdot x_i}\big]$:
$\varphi\sqrt{u_i} \approx y_i \ln(\mu_i \cdot x_i) - \varepsilon + \alpha\,\dfrac{x_i^\top\Sigma_i(x_i - \bar{x}_i)}{(\mu_i \cdot x_i)^2}$,
i.e. $(\psi, \zeta) \approx \Big(y_i \ln(\mu_i \cdot x_i) - \varepsilon,\; \dfrac{x_i^\top\Sigma_i(x_i - \bar{x}_i)}{(\mu_i \cdot x_i)^2}\Big)$. $\square$
Lemma 11. $\Sigma_{i+1}^{-1}$ under $\psi_1^{[\mathrm{lin}]}$: as Lemma 1.
Lemma 12. $\Sigma_{i+1}$ under $\psi_1^{[\mathrm{lin}]}$: as Lemma 2.
Lemma 13. $\sqrt{u_i}$ under $\psi_1^{[\mathrm{lin}]}$: as Lemma 3.
Lemma 14. $\mu_{i+1}$ under $\psi_1^{[\mathrm{lin}]}$. Similar to Lemma 4, $\mu = \mu_i + \alpha\,\partial\psi_1\, f'\, y_i\,\Sigma_i^2(x_i - \bar{x}_i)$. There are two cases, $y_i(\mu \cdot x_i) - \varepsilon\ [>]\,[\le]\ 0$.
Case $[>]$: $\partial\psi_1(\cdot) = 1$, $f'(\cdot) = 1$ and $\Sigma_i^2 = \Sigma_i$, so $\mu_{i+1} = \mu = \mu_i + \alpha\, y_i\,\Sigma_i(x_i - \bar{x}_i)$.
Case $[\le]$: $\partial\psi_1(\cdot) = 0$, so $\mu_{i+1} = \mu = \mu_i$.
With some manipulation we find a single update
$\mu_{i+1} = \mu_i + \alpha\,\partial\big[y_i(\mu_i \cdot x_i) - \varepsilon\big]\, y_i\,\Sigma_i(x_i - \bar{x}_i)$ $\square$
Lemma 15. $\alpha_1^{[\mathrm{lin}]}$. Similar to Lemma 5, $\alpha = \max\big\{0,\ \frac{-b + \sqrt{b^2 - 4ac}}{2a}\big\}$ where $(a, b, c) = \big(\zeta(\zeta + \varphi^2\upsilon_i),\; 2\psi(\zeta + \tfrac{1}{2}\varphi^2\upsilon_i),\; \psi^2 - \varphi^2\upsilon_i\big)$.
To find $(\psi, \zeta)$, use the binding constraint $\varphi\,\lVert\Sigma x_i\rVert = \psi_1^{[\mathrm{lin}]} \implies \varphi\,\lVert\Sigma x_i\rVert = [y_i(\mu \cdot x_i) - \varepsilon]_+$. We only need the update case $y_i(\mu \cdot x_i) - \varepsilon > 0$. Apply the update $\mu = \mu_i + \alpha\, y_i\,\Sigma_i(x_i - \bar{x}_i)$ and $\varphi\sqrt{u_i} \approx \varphi\,\lVert\Sigma x_i\rVert$:
$\varphi\sqrt{u_i} = [y_i(\mu \cdot x_i) - \varepsilon]_+ = y_i(\mu \cdot x_i) - \varepsilon = y_i(\mu_i \cdot x_i) - \varepsilon + \alpha\, x_i^\top\Sigma_i(x_i - \bar{x}_i)$,
i.e. $(\psi, \zeta) = \big(y_i(\mu_i \cdot x_i) - \varepsilon,\; x_i^\top\Sigma_i(x_i - \bar{x}_i)\big)$. $\square$
Lemma 16. $\Sigma_{i+1}^{-1}$ under $\psi_1^{[\ln]}$: as Lemma 6.
Lemma 17. $\Sigma_{i+1}$ under $\psi_1^{[\ln]}$: as Lemma 7.
Lemma 18. $\sqrt{u_i}$ under $\psi_1^{[\ln]}$: as Lemma 8.
Lemma 19. $\mu_{i+1}$ under $\psi_1^{[\ln]}$. Similar to Lemma 14, $\mu = \mu_i + \alpha\,\partial\psi_1\, f'\, y_i\,\Sigma_i^2(x_i - \bar{x}_i)$, with two cases $y_i \ln(\mu \cdot x_i) - \varepsilon\ [>]\,[\le]\ 0$.
Case $[>]$: $\partial\psi_1(\cdot) = 1$; the expansion $\ln(\mu \cdot x_i) \approx \ln(\mu_i \cdot x_i) + \frac{(\mu - \mu_i) \cdot x_i}{\mu_i \cdot x_i}$ gives $f'(\cdot) = \frac{1}{\mu_i \cdot x_i}$, and $\Sigma_i^2 = \Sigma_i$, so $\mu_{i+1} = \mu = \mu_i + \dfrac{\alpha\, y_i}{\mu_i \cdot x_i}\,\Sigma_i(x_i - \bar{x}_i)$.
Case $[\le]$: $\partial\psi_1(\cdot) = 0$, so $\mu_{i+1} = \mu = \mu_i$.
With some manipulation we find a single update
$\mu_{i+1} = \mu_i + \alpha\,\partial\big[y_i \ln(\mu_i \cdot x_i) - \varepsilon\big]\,\dfrac{y_i}{\mu_i \cdot x_i}\,\Sigma_i(x_i - \bar{x}_i)$ $\square$
Lemma 20. $\alpha_1^{[\ln]}$. Similar to Lemma 15, $\alpha = \max\big\{0,\ \frac{-b + \sqrt{b^2 - 4ac}}{2a}\big\}$ where $(a, b, c) = \big(\zeta(\zeta + \varphi^2\upsilon_i),\; 2\psi(\zeta + \tfrac{1}{2}\varphi^2\upsilon_i),\; \psi^2 - \varphi^2\upsilon_i\big)$.
To find $(\psi, \zeta)$, set the constraint binding: $\varphi\,\lVert\Sigma x_i\rVert = \psi_1^{[\ln]} \implies \varphi\,\lVert\Sigma x_i\rVert = [y_i \ln(\mu \cdot x_i) - \varepsilon]_+$. We only need the update case $y_i \ln(\mu \cdot x_i) - \varepsilon > 0$. Apply the update $\mu = \mu_i + \frac{\alpha y_i}{\mu_i \cdot x_i}\,\Sigma_i(x_i - \bar{x}_i)$, $\varphi\sqrt{u_i} \approx \varphi\,\lVert\Sigma x_i\rVert$, and the approximation $y_i \ln(\mu \cdot x_i) \approx y_i\big[\ln(\mu_i \cdot x_i) + \frac{(\mu - \mu_i) \cdot x_i}{\mu_i \cdot x_i}\big]$:
$\varphi\sqrt{u_i} = [y_i \ln(\mu \cdot x_i) - \varepsilon]_+ = y_i \ln(\mu \cdot x_i) - \varepsilon \approx y_i \ln(\mu_i \cdot x_i) - \varepsilon + \alpha\,\dfrac{x_i^\top\Sigma_i(x_i - \bar{x}_i)}{(\mu_i \cdot x_i)^2}$,
i.e. $(\psi, \zeta) \approx \Big(y_i \ln(\mu_i \cdot x_i) - \varepsilon,\; \dfrac{x_i^\top\Sigma_i(x_i - \bar{x}_i)}{(\mu_i \cdot x_i)^2}\Big)$. $\square$
Lemma 21. $\Sigma_{i+1}^{-1}$ under $\psi_2^{[\mathrm{lin}]}$: as Lemma 1.
Lemma 22. $\Sigma_{i+1}$ under $\psi_2^{[\mathrm{lin}]}$: as Lemma 2.
Lemma 23. $\sqrt{u_i}$ under $\psi_2^{[\mathrm{lin}]}$: as Lemma 3.
Lemma 24. $\mu_{i+1}$ under $\psi_2^{[\mathrm{lin}]}$. Similar to Lemma 4, $\mu = \mu_i + \alpha\,\partial\psi_2\, f'\, y_i\,\Sigma_i^2(x_i - \bar{x}_i)$. There are two cases, $y_i(\mu \cdot x_i) - \varepsilon\ [>]\,[\le]\ 0$.
Case $[>]$: $\partial\psi_2(\cdot) = 2\big(y_i(\mu \cdot x_i) - \varepsilon\big)$; use $f'(\cdot) = 1$ and $\Sigma_i^2 = \Sigma_i$:
$\mu = \mu_i + 2\alpha\big(y_i(\mu \cdot x_i) - \varepsilon\big)\, y_i\,\Sigma_i(x_i - \bar{x}_i)$
$y_i(\mu \cdot x_i) - \varepsilon = y_i(\mu_i \cdot x_i) - \varepsilon + 2\alpha\big(y_i(\mu \cdot x_i) - \varepsilon\big)\, x_i^\top\Sigma_i(x_i - \bar{x}_i)$.
Write $X = y_i(\mu \cdot x_i) - \varepsilon$, $C = y_i(\mu_i \cdot x_i) - \varepsilon$, $S = 2\alpha\, x_i^\top\Sigma_i(x_i - \bar{x}_i)$:
$(\mu, X) = \Big(\mu_i + 2\alpha X\, y_i\,\Sigma_i(x_i - \bar{x}_i),\; C + SX = \dfrac{C}{1 - S}\Big)$.
Case $[\le]$: $\partial\psi_2(\cdot) = 0 \implies (\mu, X) = \big(\mu_i + 2\alpha X\, y_i\,\Sigma_i(x_i - \bar{x}_i),\; 0\big)$.
We can conclude the update $\mu_{i+1} = \mu = \mu_i + 2\alpha\,[X]_+\, y_i\,\Sigma_i(x_i - \bar{x}_i)$, i.e.
$\mu_{i+1} = \mu_i + \Big[\dfrac{y_i(\mu_i \cdot x_i) - \varepsilon}{0.5\,\alpha^{-1} - x_i^\top\Sigma_i(x_i - \bar{x}_i)}\Big]_+\, y_i\,\Sigma_i(x_i - \bar{x}_i)$ $\square$
Lemma 25. $\alpha_2^{[\mathrm{lin}]}$. Similar to Lemma 5, $\alpha = \max\big\{0,\ \frac{-b + \sqrt{b^2 - 4ac}}{2a}\big\}$ where $(a, b, c) = \big(\zeta(\zeta + \varphi^2\upsilon_i),\; 2\psi(\zeta + \tfrac{1}{2}\varphi^2\upsilon_i),\; \psi^2 - \varphi^2\upsilon_i\big)$.
To find $(\psi, \zeta)$, use the binding constraint $0 \le \varphi\,\lVert\Sigma x_i\rVert = \psi_2^{[\mathrm{lin}]} = [y_i(\mu \cdot x_i) - \varepsilon]_+^2$. We only need the update case $y_i(\mu \cdot x_i) - \varepsilon > 0$. Apply the update $\mu = \mu_i + \Big[\frac{y_i(\mu_i \cdot x_i) - \varepsilon}{0.5\,\alpha^{-1} - x_i^\top\Sigma_i(x_i - \bar{x}_i)}\Big]_+ y_i\,\Sigma_i(x_i - \bar{x}_i)$ and $\varphi\sqrt{u_i} \approx \varphi\,\lVert\Sigma x_i\rVert$:
$\varphi\sqrt{u_i} = [y_i(\mu \cdot x_i) - \varepsilon]_+^2 = \big(y_i(\mu \cdot x_i) - \varepsilon\big)^2 = \Big(y_i(\mu_i \cdot x_i) - \varepsilon + \dfrac{y_i(\mu_i \cdot x_i) - \varepsilon}{0.5\,\alpha^{-1} - x_i^\top\Sigma_i(x_i - \bar{x}_i)}\; x_i^\top\Sigma_i(x_i - \bar{x}_i)\Big)^2$.
Suppose $g(\alpha) = \Big(A + \dfrac{\alpha A C}{0.5 - \alpha C}\Big)^2$, with $(A, C, \alpha_0) = \big(y_i(\mu_i \cdot x_i) - \varepsilon,\; x_i^\top\Sigma_i(x_i - \bar{x}_i),\; 0\big)$, and use the Taylor expansion $g(\alpha) \approx g(\alpha_0) + g'(\alpha_0)\,(\alpha - \alpha_0)$. It follows that $\big(g(0), g'(0)\big) = \big(A^2,\; 4A^2C\big)$, thus $\varphi\sqrt{u_i} = g(\alpha) \approx A^2 + 4A^2C\,\alpha$ and
$(\psi, \zeta) \approx \big((y_i(\mu_i \cdot x_i) - \varepsilon)^2,\; 4\,(y_i(\mu_i \cdot x_i) - \varepsilon)^2\, x_i^\top\Sigma_i(x_i - \bar{x}_i)\big)$. $\square$
Lemma 26. $\Sigma_{i+1}^{-1}$ under $\psi_2^{[\ln]}$: as Lemma 6.
Lemma 27. $\Sigma_{i+1}$ under $\psi_2^{[\ln]}$: as Lemma 7.
Lemma 28. $\sqrt{u_i}$ under $\psi_2^{[\ln]}$: as Lemma 8.
Lemma 29. $\mu_{i+1}$ under $\psi_2^{[\ln]}$. Similar to Lemma 24, $\mu = \mu_i + \alpha\,\partial\psi_2\, f'\, y_i\,\Sigma_i^2(x_i - \bar{x}_i)$, with two cases $y_i \ln(\mu \cdot x_i) - \varepsilon\ [>]\,[\le]\ 0$.
Case $[>]$: $\partial\psi_2(\cdot) = 2\big(y_i \ln(\mu \cdot x_i) - \varepsilon\big)$; use $\ln(\mu \cdot x_i) \approx \ln(\mu_i \cdot x_i) + \frac{(\mu - \mu_i) \cdot x_i}{\mu_i \cdot x_i}$, $f'(\cdot) = \frac{1}{\mu_i \cdot x_i}$, and $\Sigma_i^2 = \Sigma_i$:
$\mu \approx \mu_i + 2\alpha\Big(y_i \ln(\mu_i \cdot x_i) - \varepsilon + \dfrac{(\mu - \mu_i) \cdot y_i x_i}{\mu_i \cdot x_i}\Big)\,\dfrac{y_i}{\mu_i \cdot x_i}\,\Sigma_i(x_i - \bar{x}_i)$
$\dfrac{(\mu - \mu_i) \cdot y_i x_i}{\mu_i \cdot x_i} = 2\alpha\,\dfrac{x_i^\top\Sigma_i(x_i - \bar{x}_i)}{(\mu_i \cdot x_i)^2}\,\Big(y_i \ln(\mu_i \cdot x_i) - \varepsilon + \dfrac{(\mu - \mu_i) \cdot y_i x_i}{\mu_i \cdot x_i}\Big)$.
Write $X = \dfrac{(\mu - \mu_i) \cdot y_i x_i}{\mu_i \cdot x_i}$, $C = y_i \ln(\mu_i \cdot x_i) - \varepsilon$, $S = 2\alpha\,\dfrac{x_i^\top\Sigma_i(x_i - \bar{x}_i)}{(\mu_i \cdot x_i)^2}$;
hence $X = S(C + X) = \dfrac{SC}{1 - S}$ and
$(\mu, C + X) \approx \Big(\mu_i + 2\alpha\,(C + X)\,\dfrac{y_i}{\mu_i \cdot x_i}\,\Sigma_i(x_i - \bar{x}_i),\; \dfrac{C}{1 - S}\Big)$.
Case $[\le]$: $\partial\psi_2(\cdot) = 0 \implies (\mu, C + X) = \Big(\mu_i + 2\alpha\,(C + X)\,\dfrac{y_i}{\mu_i \cdot x_i}\,\Sigma_i(x_i - \bar{x}_i),\; 0\Big)$.
We can conclude with the update $\mu_{i+1} = \mu_i + 2\alpha\,[C + X]_+\,\dfrac{y_i}{\mu_i \cdot x_i}\,\Sigma_i(x_i - \bar{x}_i)$, i.e.
$\mu_{i+1} \approx \mu_i + \Big[\dfrac{y_i \ln(\mu_i \cdot x_i) - \varepsilon}{0.5\,\alpha^{-1} - x_i^\top\Sigma_i(x_i - \bar{x}_i)/(\mu_i \cdot x_i)^2}\Big]_+\,\dfrac{y_i}{\mu_i \cdot x_i}\,\Sigma_i(x_i - \bar{x}_i)$ $\square$
Lemma 30. $\alpha_2^{[\ln]}$. Similar to Lemma 25, $\alpha = \max\big\{0,\ \frac{-b + \sqrt{b^2 - 4ac}}{2a}\big\}$ where $(a, b, c) = \big(\zeta(\zeta + \varphi^2\upsilon_i),\; 2\psi(\zeta + \tfrac{1}{2}\varphi^2\upsilon_i),\; \psi^2 - \varphi^2\upsilon_i\big)$.
To find $(\psi, \zeta)$, use the binding constraint $0 \le \varphi\,\lVert\Sigma x_i\rVert = \psi_2^{[\ln]} = [y_i \ln(\mu \cdot x_i) - \varepsilon]_+^2$. We only need the update case $y_i \ln(\mu \cdot x_i) - \varepsilon > 0$. Apply the update
$\mu \approx \mu_i + \Big[\dfrac{y_i \ln(\mu_i \cdot x_i) - \varepsilon}{0.5\,\alpha^{-1} - x_i^\top\Sigma_i(x_i - \bar{x}_i)/(\mu_i \cdot x_i)^2}\Big]_+\,\dfrac{y_i}{\mu_i \cdot x_i}\,\Sigma_i(x_i - \bar{x}_i)$
and $\varphi\sqrt{u_i} \approx \varphi\,\lVert\Sigma x_i\rVert$ to have $\varphi\sqrt{u_i} = [y_i \ln(\mu \cdot x_i) - \varepsilon]_+^2 = \big(y_i \ln(\mu \cdot x_i) - \varepsilon\big)^2$.
Use the approximation $y_i \ln(\mu \cdot x_i) \approx y_i\big[\ln(\mu_i \cdot x_i) + \frac{(\mu - \mu_i) \cdot x_i}{\mu_i \cdot x_i}\big]$:
$\varphi\sqrt{u_i} \approx \Big(y_i \ln(\mu_i \cdot x_i) - \varepsilon + \dfrac{y_i \ln(\mu_i \cdot x_i) - \varepsilon}{0.5\,\alpha^{-1} - x_i^\top\Sigma_i(x_i - \bar{x}_i)/(\mu_i \cdot x_i)^2}\;\dfrac{x_i^\top\Sigma_i(x_i - \bar{x}_i)}{(\mu_i \cdot x_i)^2}\Big)^2$.
Similar to Lemma 25, with $(A, C, \alpha_0) = \Big(y_i \ln(\mu_i \cdot x_i) - \varepsilon,\; \dfrac{x_i^\top\Sigma_i(x_i - \bar{x}_i)}{(\mu_i \cdot x_i)^2},\; 0\Big)$, one can show $\varphi\sqrt{u_i} \approx A^2 + 4A^2C\,\alpha$, i.e.
$(\psi, \zeta) \approx \Big(\big(y_i \ln(\mu_i \cdot x_i) - \varepsilon\big)^2,\; 4\,\big(y_i \ln(\mu_i \cdot x_i) - \varepsilon\big)^2\,\dfrac{x_i^\top\Sigma_i(x_i - \bar{x}_i)}{(\mu_i \cdot x_i)^2}\Big)$. $\square$
References
[Beh00] E. Behrends. Introduction to Markov Chains with Special Emphasis on Rapid Mixing. Advanced Lectures in Mathematics, Vieweg Verlag, 2000.
[BCG05] N. Cesa-Bianchi, A. Conconi, and C. Gentile. A second-order perceptron algorithm. SIAM Journal on Computing, 34(3): 640-668, 2005.
[CY11] Y. Chen and X. Ye. Projection onto a simplex. Department of Mathematics, University of Florida, 2011.
[CDR07] J.-F. Coeurjolly, R. Drouilhet, and J.-F. Robineau. Normalized information-based divergences. Problems of Information Transmission, 43(3): 167-189, 2007.
[CDF08] K. Crammer, M. Dredze, and F. Pereira. Exact convex confidence-weighted learning. Neural Information Processing Systems (NIPS), 2008.
[Dav06] R. Davidson. Stochastic dominance. Department of Economics, McGill University, 2006.
[LCLMV04] M. Li, X. Chen, X. Li, B. Ma, and P.M.B. Vitanyi. The similarity metric. IEEE Transactions on Information Theory, 50(12): 3250-3264, 2004.
[LHZG11] B. Li, S.C.H. Hoi, P. Zhao, and V. Gopalkrishnan. Confidence weighted mean reversion strategy for on-line portfolio selection. School of Computer Engineering, Nanyang Technological University, 2011.
[OC09] F. Orabona and K. Crammer. New adaptive algorithms for online classification. Neural Information Processing Systems (NIPS), 2010.
[PP08] K.B. Petersen and M.S. Pedersen. The Matrix Cookbook, 2008.
[Sni10] M. Sniedovich. Dynamic Programming: Foundations and Principles, 2nd ed., CRC Press, 2010.
