Professional Documents
Culture Documents
1=(j +t) 1=(n+T), and we want tojbe within = =T . Proof. The function log f is convex. The derivative
Since this must hold for nT different ut ’s, it suffices for, of logf at ~b is is the vector f 0 (~b) = ( b~1 ; b2~ ; : : :; bn~ ).
f (b) f (b) f (b)
; The matrix F 00 of second derivatives has i; j th entry
bi bj
e m2 =(2T 2(n+T ))
nT f (~b)2 .
Thus F 00 = f 0T f 0 is a negative semidefinite matrix, im-
which holds for the number of samples m chosen in the plying that logf is a concave function in the positive or-
theorem. 2 thant. 2
The biased sampler will actually sample from a distribu- The symmetric random walk described above can be
tion that is close to t, call it pt , with the property that modified to have any desired target distribution. This is
called the Metropolis filter [13], and can be viewed as a
Z
jt(b) pt(b)jdb 0
combination of the walk with rejection sampling: If the
walk is at x and chooses the point y as its next step, then
move to y with probability min(1; ff ((yx)) ) and do nothing The main issue is how fast the random walk approaches
with the remaining probability (i.e. try again). Lovasz . We return to the discrete distribution on the grid points.
and Simonovits [12] have shown that this random walk is Let the distribution attained by the random walk after
rapidly mixing, i.e. it attains a distribution close to the sta- steps be p , i.e. p (x) is the probability that the walk is
tionary one in polynomial time. at the grid point x after steps. The progress of the random
For our purpose, however, the following discretized ran- walk can be measured as the distance between its current
dom walk has the best provable bounds on the mixing time. distribution p and the stationary distribution as follows:
First rotate so pthat it is on the plane x = 0 and scale it X
by a factor of 1= 2 so that it has unit diameter. We will jjp jj = jp (x) (x)j
only walk on the set of points in whose coordinates are x 2D
multiples of a fixed parameter > 0 (to be chosen below),
i.e. points on an axis parallel grid whose “unit” length is . In [7], Frieze and Kannan derive a bound on the conver-
Any point on this grid has 2n neighbors, 2 along each axis. gence of this random walk which can be used to derive the
following bound for our situation.
1. Start at a (uniformly) random grid point in the simplex.
Theorem 2 After steps of the random walk,
2. Suppose X() is the location of the walk at time .
3. Let y be a random neighbor of X(). jjp jj2 e nt2 (n+t)2 (n + t)2
4. If y is in , then move to it, i.e. set X( + 1) = y where
> 0 is an absolute constant.
with probability p = min(1; ff ((xy)) ), and stay put with
probability 1 p (i.e. X( + 1) = X()). Corollary 1 For any 0 > 0, after O(nt2(n + t)2 log n+0 t )
steps,
Let the set of grid points be denoted by D. We will ac- jjp jj2 0 :
tually only sample from the set of grid points in that are
not too close to the boundary, namely, each coordinate xi is
for a small enough . For convenience we will
at least n+
Proof (of theorem). Frieze and Kannan prove that
t
assume that each coordinate is at least 2(n1+t) . Let this set
2 2
of grid points be denoted by D. Each grid point x can be 2jjp jj2 e nd2 log 1 + M
nd
2
associated with a unique axis-parallel cube of length cen-
tered at x. Call this cube C(x). The step length is chosen where
> 0 is a constant, d is the diameter of the con-
so that for any grid point x, f(x) is close to f(y) for any vex body in which we are running the random walk, is
y 2 C(x). minx2D (x), is a parameter between 0 and 1, and
Lemma 3 If we choose < 2( log1+ then for any grid point X
n+t)t = (x):
z in D, and any point y 2 C(z), we have x 2D : vol(C (x)\)
vol(C (x))
(1 + ) 1f(z) f(y) (1 + )f(z):
In words, is the probability of the grid points
Proof. Since y 2 C(z), maxj jyj zj j . For any price whose cubes intersect the simplex in less than frac-
relative x , the ratio yzxx is at most maxj zjj . This can be
y tion of their volume. The parameter M is defined1 as
written as maxx p0 (x) log p0((xx)) , where p0 is the initial distribution on
max zj + (yj zj ) = max(1 + ) the states.
For us the diameter d is 1. We will set = 41 and choose
j zj j zj small enough so that is a constant. This can be done
1 for example with any 2(n1+t) . To see this, consider
Since each coordinate is at least 2(n+t) we have that the
ratio is at most (1 + 2(n + t)). Thus the ratio ff ((yz)) is at the simplexPblown up by a factor of 1 i.e. the set 1 =
most (1 + 2(n + t))t and the lemma follows. 2 fyjy 0; i yi = 1 g: Now the set of points with integer
coordinates correspond to the original grid points. Let B be
The stationary distribution of the random walk will the set of cubes on the border of this set, i.e. the volume
be proportional to f(z) for each grid point z . Thus when of each cube in B that is in 1 is less than 14 . Then by
viewed as a distribution on the simplex, for any point y in
1 The M we use here differs slightly from the definition in [7], where
M = maxx p0 xx log p0 xx . However, the theorem holds with either
the simplex, ( ) ( )
( 1 )n
tion of the UNIVERSAL algorithm. Not only does the ap-
proximation have an expected performance equal to that of
UNIVERSAL, but with high probability (1 ) it is within
Also, the performance of these border grid points can only (1 ) times the performance of universal, and runs in time
be (1 + )t better than the corresponding (non-blown up) polynomial in log 1 , 1=, the number of days, and the num-
points in the corresponding points. Thus (1+)n+t <
2 for 2(n1+t) .
ber of stocks. With money, it is especially important to
achieve this expectation. For example, a 50% chance at 10
Thus the bound above on the distance to stationary be- million dollars may not be as valuable to most people as a
comes guaranteed 5 million dollars.
log 1 + 2Mn
2
While our implementation can be used for applications
2jjp jj2 e n
2
of UNIVERSAL, such as data compression [6] and lan-
guage modeling [10], we do not implement it in the case
Next we observe that by our choice of starting point (uni- of transaction costs [2] or for the Dirichelet(1=2; : : :; 1=2)
form over the simplex) M is exponentially small. Thus we UNIVERSAL [4].
can ignore the second term in the right hand side. Finally
)t, which simplifies the
we note that is at least n ( n+ t References
inequality to
2
[1] D. Applegate and R. Kannan Sampling and integration of
jjp jj2 e n (n + t)2 near log-concave functions. In Proceedings of the Twenty
Third Annual ACM Symposium on Theory of Computing,
Our choice of (= O( (n+1 t)t ) implies the theorem (with a 1991.
different
)). 2 [2] A. Blum and A. Kalai Universal Portfolios With and With-
out Transaction Costs. Machine Learning, 35:3, pp 193-205,
1999.
5.1. Collecting samples
[3] Universal portfolios. Math. Finance, 1(1):1-29, January
1991.
Although generating the first sample point takes
O (nt2(n + t)2) steps of the random walk, future samples [4] T.M. Cover and E. Ordentlich. Universal portfolios with
can be collected more efficiently using a trick from [11]. In side information. IEEE Transactions on Information The-
fact, the position of the random walk can be observed ev-
ery O(n2(n + t)2 ) steps to obtain sample points with the
ory, 42(2), March 1996.
property that they are pairwise nearly independent in the [5] E. Ordentlich. and T.M. Cover. Online portfolio selection.
following sense. For any two subsets A; B of the simplex, In Proceedings of the 9th Annual Conference on Computa-
two sample points u and v satisfy tional Learning Theory, Desenzano del Garda, Italy, 1996.
jPr(u 2 A; v 2 B) Pr(u 2 A)Pr(v 2 B)j [6] T.M. Cover. Universal data compression and portfolio selec-
tion. In Proceedings of the 37th IEEE Symposium on Foun-
dations of Computer Science, pages 534-538, Oct 1996.
This can be used to reduce the number of samples. We
collect completely independent groups of samples, each [7] A. Frieze and R. Kannan. Log-Sobolev inequalities and
sample consisting of pairwise independent samples, and sampling from log-concave distributions. Annals of Applied
then compute the average of the groups. Probability 9, 14-26.
As an implementation detail, one can do the random
walk as follows. Choose an initial point at random on the [8] D.P. Foster and R.V. Vohra. Regret in the On-line Decision
Problem. Games and Economic Behavior, Vol. 29, No. 1/2,
simplex, as described earlier, and then choosing an individ-
pp. 7-35, Nov 1999.
ual component at random. Increase (or decrease) that com-
ponent by , and then decrease (or increase) the remaining [9] D. Helmbold, R. Schapire, Y. Singer, and M. Warmuth. On-
components by =(n 1). Use the same rejection technique line portfolio selection using multiplicative updates. Ma-
to decide whether to actually take that step in the random chine Learning: Proceedings of the Thirteenth International
walk. Conference, 1996.
[10] A. Kalai, S. Chen, A. Blum, and R. Rosenfeld. On-line Al-
gorithms for Combining Language Models. In Proceedings
of the International Conference on Accoustics, Speech, and
Signal Processing (ICASSP ’99), 1999.