Markov Chains
Markov chains are an important tool for modeling stochastic processes that occur frequently in computer science. For example, after introducing the basic theory, we will
demonstrate how they are used for deriving randomized algorithms for Satisfiability.
4.1 Basics

Throughout, let S be a countable state set and T = {0, 1, 2, ...} the time set. A transition matrix P = (p_{i,j})_{i,j∈S} is a stochastic matrix, i.e., p_{i,j} ≥ 0 and ∑_{j∈S} p_{i,j} = 1 for all i ∈ S. Given an initial distribution λ = (λ_i)_{i∈S}, a discrete time stochastic process X = (X_t)_{t∈T} with values in S is called a (time invariant) Markov chain with transition matrix P and initial distribution λ, in short M(P, λ), if Pr[X_0 = i] = λ_i and

    Pr[X_{t+1} = j | X_0 = i_0, ..., X_{t−1} = i_{t−1}, X_t = i] = p_{i,j}

for all i ∈ S.
Example 4.1. Suppose that a certain process can have one of two states, i.e., S = {0, 1}. If in state 0, then the process jumps to state 1 with probability α ∈ [0, 1]. If in state 1, then the process jumps to 0 with probability β ∈ [0, 1]. Otherwise the state does not change. This induces a transition diagram with an arrow labelled α from 0 to 1 and an arrow labelled β from 1 to 0, and the transition matrix

    P = ( 1−α    α  )
        (  β    1−β ).
Observation 4.2. A discrete time stochastic process X = (X_s)_{0≤s≤t} is M(P, λ) if and only if

    Pr[X_0 = i_0, X_1 = i_1, ..., X_t = i_t] = λ_{i_0} p_{i_0,i_1} ⋯ p_{i_{t−1},i_t}

holds for all i_0, ..., i_t ∈ S.
Proof. If X = (X_s)_{0≤s≤t} is M(P, λ) then

    Pr[X_0 = i_0, X_1 = i_1, ..., X_t = i_t]
      = Pr[X_0 = i_0] Pr[X_1 = i_1 | X_0 = i_0] ⋯ Pr[X_t = i_t | X_0 = i_0, ..., X_{t−1} = i_{t−1}]
      = λ_{i_0} p_{i_0,i_1} ⋯ p_{i_{t−1},i_t}.

Conversely, if the claimed condition holds for t, then by summing both sides over i_t and using ∑_{j∈S} p_{i,j} = 1 we see that it also holds for t − 1. Thus the condition holds for the times t' = 0, 1, ..., t. In particular we have Pr[X_0 = i_0] = λ_{i_0} and for t' = 0, ..., t − 1

    Pr[X_{t'+1} = i_{t'+1} | X_0 = i_0, ..., X_{t'} = i_{t'}]
      = Pr[X_0 = i_0, ..., X_{t'+1} = i_{t'+1}] / Pr[X_0 = i_0, ..., X_{t'} = i_{t'}]
      = p_{i_{t'},i_{t'+1}},

and so the process is M(P, λ).
Define the unit mass at state i ∈ S by the vector δ_i = (δ_{i,j})_{j∈S} with

    δ_{i,j} = 1 if i = j, and δ_{i,j} = 0 otherwise.
The following result is important for the theory of time invariant Markov chains as it
captures the idea of a restart.
Theorem 4.3 (Markov Property). Let (X_t)_{t∈T} be M(P, λ). Conditional on the event X_m = i, the stochastic process (X_{t+m})_{t∈T} is M(P, δ_i) and independent of X_0, ..., X_m.
Proof. We have to show that for any event A determined by X_0, ..., X_m we have

    Pr[{X_m = i_m, ..., X_{m+t} = i_{m+t}} ∩ A | X_m = i]
      = δ_{i,i_m} p_{i_m,i_{m+1}} ⋯ p_{i_{m+t−1},i_{m+t}} Pr[A | X_m = i],

and then the result follows by Observation 4.2. First consider an elementary event

    A = {X_0 = i_0, ..., X_m = i_m}.

In that case we have

    Pr[{X_m = i_m, ..., X_{m+t} = i_{m+t}} ∩ A | X_m = i]
      = Pr[X_0 = i_0, ..., X_{m+t} = i_{m+t}, i = i_m] / Pr[X_m = i]
      = δ_{i,i_m} p_{i_m,i_{m+1}} ⋯ p_{i_{m+t−1},i_{m+t}} Pr[X_0 = i_0, ..., X_m = i_m] / Pr[X_m = i]
      = δ_{i,i_m} p_{i_m,i_{m+1}} ⋯ p_{i_{m+t−1},i_{m+t}} Pr[X_0 = i_0, ..., X_m = i_m | X_m = i],

where the second equality holds by Observation 4.2. In general, any event A determined by X_0, ..., X_m can be written as a disjoint union of countably many elementary events A_k, i.e., A = ⋃_{k≥1} A_k. The claim now follows by summation over the identities for the A_k.
Let p_i(t) denote the probability that the process X is in state i at time t. Define the row vector p(t) = (p_s(t))_{s∈S}, which is called the distribution of the chain at time t. Observe that p_i(t + 1) = ∑_{j∈S} p_j(t) p_{j,i}.

Observation 4.4. If X is M(P, λ), then we have

    p(t + 1) = p(t) P

for all t ∈ T, where p(0) = λ.
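To make Observation 4.4 concrete, here is a minimal Python sketch that iterates p(t + 1) = p(t)·P for the two-state chain of Example 4.1; the values α = 0.3 and β = 0.6 are arbitrary illustrative choices, not taken from the text.

```python
# Iterate the distribution evolution p(t+1) = p(t) * P (Observation 4.4)
# for the two-state chain of Example 4.1. alpha = 0.3 and beta = 0.6 are
# arbitrary illustrative parameters.

def step(p, P):
    """One step of the evolution: returns the row vector p * P."""
    n = len(P)
    return [sum(p[j] * P[j][i] for j in range(n)) for i in range(n)]

alpha, beta = 0.3, 0.6
P = [[1 - alpha, alpha],
     [beta, 1 - beta]]

p = [1.0, 0.0]          # initial distribution lambda = delta_0
for _ in range(50):     # apply p(t+1) = p(t) P repeatedly
    p = step(p, P)

# p(t) approaches the invariant distribution (beta, alpha) / (alpha + beta)
print(p)
```

Iterating long enough exhibits the convergence to equilibrium that Theorem 4.29 later makes precise.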
Now define the m-step transition probability p_{i,j}^{(m)} = Pr[X_{t+m} = j | X_t = i] and analogously p_i^{(m)}(t) and p^{(m)}(t). The proof of the following basic observation is left as an exercise.
4.2 Class Structure
It is sometimes possible to break a chain into smaller pieces, where each piece is simpler to
understand and which together give an understanding of the whole. One way is to define
the communicating classes of the chain. We need some definitions. Let i, j ∈ S be two states. We say that j is reachable from i if Pr[X_{t+m} = j for some m ∈ T | X_t = i] > 0, and write i → j. The proof of the theorem below is an exercise.
4.3 Hitting Times
Let X be a Markov chain with state set S and let A ⊆ S be a set of states. The random variable H^A : Ω → T ∪ {∞} defined by H^A(ω) = inf{t ∈ T : X_t(ω) ∈ A} is called the hitting time of the state set. (The infimum of the empty set ∅ is ∞.) Define the probability that, starting from state i ∈ S, we hit A by

    h_i^A = Pr[H^A < ∞ | X_0 = i].

Furthermore define by

    t_i^A = E[H^A | X_0 = i]

the expected hitting time for A when starting from i. If A is a closed class, then h_i^A is called the absorption probability and t_i^A the absorption time, respectively. Define the vectors h^A = (h_i^A)_{i∈S} and t^A = (t_i^A)_{i∈S}. Below, the expression minimal solution means that, if h is a minimal solution and h' is any solution, then h'_i ≥ h_i for all i ∈ S.
Theorem 4.9. We have that h^A is the minimal non-negative solution to the system of linear equations given by

    h_i^A = 1                          for i ∈ A,
    h_i^A = ∑_{j∈S} h_j^A p_{i,j}      for i ∉ A.
Proof. First we show that h^A must satisfy the above conditions. If X_0 = i ∈ A, then H^A = 0 and h_i^A = 1. If X_0 = i ∉ A, then H^A ≥ 1 and, by the Markov property,

    Pr[H^A < ∞ | X_1 = j, X_0 = i] = Pr[H^A < ∞ | X_1 = j] = h_j^A.

Thus

    h_i^A = Pr[H^A < ∞ | X_0 = i]
          = ∑_{j∈S} Pr[H^A < ∞ | X_1 = j, X_0 = i] Pr[X_1 = j | X_0 = i]
          = ∑_{j∈S} h_j^A p_{i,j}.
Now suppose that x = (x_i : i ∈ S) is any non-negative solution of the above conditions. We have to show that x ≥ h^A. Clearly, h_i^A = x_i = 1 for i ∈ A. For i ∉ A we have

    x_i = ∑_{j∈S} x_j p_{i,j} = ∑_{j∈A} p_{i,j} + ∑_{j∉A} x_j p_{i,j}.

Substituting the same identity for the x_j gives

    x_i = ∑_{j∈A} p_{i,j} + ∑_{j∉A} ( ∑_{k∈A} p_{j,k} + ∑_{k∉A} x_k p_{j,k} ) p_{i,j}
        = Pr[X_1 ∈ A | X_0 = i] + Pr[X_2 ∈ A, X_1 ∉ A | X_0 = i] + ∑_{j∉A} ∑_{k∉A} p_{i,j} p_{j,k} x_k.

Repeating this substitution m times yields the first m of these probabilities plus a final term summing over paths j_1, ..., j_m ∉ A. Now, if x is non-negative, so is the last term on the right-hand side, and the other terms sum to Pr[H^A ≤ m | X_0 = i]. Therefore x_i ≥ Pr[H^A ≤ m | X_0 = i]. Hence we have

    x_i ≥ lim_{m→∞} Pr[H^A ≤ m | X_0 = i] = Pr[H^A < ∞ | X_0 = i] = h_i^A.
Example 4.10. Consider the chain on S = {0, 1, 2, 3} in which the states 0 and 3 are absorbing and, from each of the states 1 and 2, the chain jumps to the left or the right neighbour with probability 1/2 each. We ask ourselves what the probability of absorption in A = {3} is. Thus, using the abbreviation h_i = h_i^A, we write the required system of equations:

    h_3 = 1,
    h_2 = (1/2) h_1 + (1/2) h_3,
    h_1 = (1/2) h_0 + (1/2) h_2.

We resolve and find h_1 = 1/3 + (2/3) h_0 and h_2 = 2/3 + (1/3) h_0. The value of h_0 is not determined by the system, but the minimality requirement forces the choice h_0 = 0. Hence we have h_1 = 1/3, which means that the probability of absorption in A = {3} when starting from state 1 is 1/3. Analogously h_2 = 2/3.
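The minimality requirement has a convenient computational counterpart: iterating the fixed-point equations of Theorem 4.9 starting from h = 0 increases monotonically to the minimal non-negative solution. A sketch for the four-state chain above:

```python
# Compute the minimal non-negative solution of the hitting-probability
# system of Theorem 4.9 by iteration from h = 0: repeatedly apply
# h_i <- sum_j p_{i,j} h_j for i not in A (and h_i = 1 for i in A).
# The chain is the four-state example with absorbing states 0 and 3.

P = [
    [1.0, 0.0, 0.0, 0.0],   # state 0 is absorbing
    [0.5, 0.0, 0.5, 0.0],   # from 1: left or right with prob 1/2
    [0.0, 0.5, 0.0, 0.5],   # from 2: left or right with prob 1/2
    [0.0, 0.0, 0.0, 1.0],   # state 3 is absorbing
]
A = {3}

h = [1.0 if i in A else 0.0 for i in range(4)]
for _ in range(200):                      # iterate to numerical convergence
    h = [1.0 if i in A else sum(P[i][j] * h[j] for j in range(4))
         for i in range(4)]

print(h)   # h1 ~ 1/3, h2 ~ 2/3, and h0 stays 0 (the minimal choice)
```

Note that the iteration never moves h_0 away from 0, which is exactly the minimality argument made in the example.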
Example 4.11 (Gambler's Ruin). Suppose you gamble with an initial wealth of i Euro. With probability p you win one Euro, with probability 1 − p you lose one Euro. If you are broke, i.e., the wealth is 0 Euro, the game ends. What is the probability that you will go broke?
This process obviously translates into a Markov chain on S = {0, 1, 2, ...} with the transition probabilities

    p_{0,0} = 1,
    p_{i,i+1} = p          for i = 1, 2, ...,
    p_{i,i−1} = 1 − p      for i = 1, 2, ....

With A = {0} and h_i = h_i^A, Theorem 4.9 yields

    h_0 = 1,
    h_i = p h_{i+1} + (1 − p) h_{i−1}    for i = 1, 2, ....

If p ≠ 1 − p, i.e., p ≠ 1/2, then the solution of the recurrence has the form

    h_i = c_1 + c_2 ((1 − p)/p)^i    for i = 0, 1, ....

Together with h_0 = 1, the minimality requirement gives h_i = ((1 − p)/p)^i if p > 1/2, and h_i = 1 if p < 1/2: in a favourable game the ruin probability decays geometrically in the initial wealth, while in an unfavourable game ruin is certain.
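For p > 1/2 the minimal solution of the system is h_i = ((1 − p)/p)^i. As a sanity check, the following sketch approximates the infinite chain by truncating the state space at a large wealth N (treating h_N = 0) and evaluates the resulting closed form exactly with rational arithmetic; N = 200 is an arbitrary truncation level.

```python
# Check the Gambler's Ruin formula for p > 1/2. On the truncated state
# space {0, ..., N} with h_0 = 1 and h_N = 0, the recurrence
# h_i = p*h_{i+1} + (1-p)*h_{i-1} has the closed-form solution
# h_i = (r^i - r^N) / (1 - r^N) with r = (1-p)/p, which for large N
# approaches r^i, the minimal solution of the infinite system.

from fractions import Fraction

def ruin_probability(p, i, N=200):
    """Ruin probability from wealth i on the chain truncated at N."""
    r = Fraction(1 - p).limit_denominator() / Fraction(p).limit_denominator()
    return float((r**i - r**N) / (1 - r**N))

p = 0.6                       # win probability (illustrative choice)
r = (1 - p) / p
for i in (1, 2, 5):
    print(i, ruin_probability(p, i), r**i)   # truncated value vs ((1-p)/p)^i
```

The truncation error is of order r^N and is therefore negligible here, since r = 2/3 and N = 200.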
4.4 The Strong Markov Property
Theorem 4.3 on the Markov property basically states that, if we stop the process at time
m, say, look in which state it is, and then continue, the restarted process is also a Markov
chain. What happens if we wait for the process to hit a certain state i, at a random
time M , say? Is the restarted process also a Markov chain? It turns out that the Markov
property also holds at stopping times.
A random variable M : Ω → T ∪ {∞} is called a stopping time if the event {M = m} depends only on X_0, ..., X_m for any m ∈ T. Intuitively, you can tell whether the event has occurred by watching the process so far, so if you are asked to stop at M, you know when to stop.
Example 4.13. The hitting time H^A defined earlier is a stopping time because

    {H^A = m} = {X_0 ∉ A, ..., X_{m−1} ∉ A, X_m ∈ A},

i.e., the event that the set A is hit at time m depends only on X_0, ..., X_m.
Theorem 4.14 (Strong Markov Property). Let (X_t)_{t∈T} be M(P, λ) and let M be a stopping time for it. Then, conditional on M < ∞ and X_M = i, the process (X_{t+M})_{t∈T} is M(P, δ_i) and independent of X_0, ..., X_M.
Proof. If A is an event determined by X_0, ..., X_M, then A ∩ {M = m} is determined by X_0, ..., X_m. So, by conditioning and the Markov property at time m,

    Pr[{X_M = i_0, ..., X_{M+t} = i_t} ∩ A ∩ {M = m} ∩ {X_M = i}]
      = Pr[X_M = i_0, ..., X_{M+t} = i_t | A ∩ {M = m} ∩ {X_M = i}] Pr[A ∩ {M = m} ∩ {X_M = i}]
      = Pr[X_0 = i_0, ..., X_t = i_t | X_0 = i] Pr[A ∩ {M = m} ∩ {X_M = i}],

where we have used the condition {M = m}. Now we sum over m = 0, 1, ... and divide by Pr[M < ∞, X_M = i] to obtain

    Pr[{X_M = i_0, ..., X_{M+t} = i_t} ∩ A | M < ∞, X_M = i]
      = Pr[X_M = i_0, ..., X_{M+t} = i_t | A, M < ∞, X_M = i] Pr[A | M < ∞, X_M = i]
      = Pr[X_0 = i_0, ..., X_t = i_t | X_0 = i] Pr[A | M < ∞, X_M = i].
4.5 Recurrence and Transience
The first passage time to state i is the random variable T_i defined by T_i = inf{t ≥ 1 : X_t = i}. Now define the r-th passage time inductively by T_i^{(0)} = 0, T_i^{(1)} = T_i, and

    T_i^{(r+1)} = inf{t ≥ T_i^{(r)} + 1 : X_t = i}.
For r ≥ 1, let S_i^{(r)} = T_i^{(r)} − T_i^{(r−1)} denote the length of the r-th excursion to i (defined on the event {T_i^{(r−1)} < ∞}).

Lemma 4.15. For r ≥ 0, conditional on T_i^{(r)} < ∞, the excursion length S_i^{(r+1)} is independent of X_0, ..., X_{T_i^{(r)}} and has the distribution of the first passage time to i of a chain that is M(P, δ_i).

Proof. We apply the strong Markov property at the stopping time R = T_i^{(r)}. We clearly have X_R = i on {R < ∞}. So, conditional on R < ∞, (X_{R+m})_{m≥0} is M(P, δ_i) and independent of X_0, ..., X_R. But S_i^{(r+1)} = inf{m ≥ 1 : X_{R+m} = i}, so S_i^{(r+1)} is the first passage time of (X_{R+m})_{m≥0} to state i.
Let the random variable V_{i,t} indicate the event {X_t = i}, i.e., a visit of state i at time t. Then the number of visits to i is V_i = ∑_{t∈T} V_{i,t}. Notice that

    E[V_i | X_0 = i] = ∑_{t∈T} E[V_{i,t} | X_0 = i] = ∑_{t∈T} Pr[X_t = i | X_0 = i] = ∑_{t∈T} p_{i,i}^{(t)}.
Let f_i = Pr[T_i < ∞ | X_0 = i] denote the return probability of state i.

Lemma 4.16. We have Pr[V_i > r | X_0 = i] = f_i^r for all r ≥ 0.

Proof. Observe that if X_0 = i then {V_i > r} = {T_i^{(r)} < ∞}. When r = 0 the result is true. Suppose inductively that it is true for r; then

    Pr[V_i > r + 1 | X_0 = i] = Pr[T_i^{(r+1)} < ∞ | X_0 = i]
      = Pr[T_i^{(r)} < ∞, S_i^{(r+1)} < ∞ | X_0 = i]
      = Pr[S_i^{(r+1)} < ∞ | T_i^{(r)} < ∞, X_0 = i] Pr[T_i^{(r)} < ∞ | X_0 = i]
      = f_i · f_i^r = f_i^{r+1}

by Lemma 4.15, and the claim is proved.
Now we give the desired result. Recall that a state i is recurrent if Pr[V_i = ∞ | X_0 = i] = 1 and transient if Pr[V_i = ∞ | X_0 = i] = 0.

Theorem 4.17. The following dichotomy holds:

(i) If f_i = 1, then i is recurrent and ∑_{t∈T} p_{i,i}^{(t)} = ∞.
(ii) If f_i < 1, then i is transient and ∑_{t∈T} p_{i,i}^{(t)} < ∞.

Proof. If f_i = 1, then Pr[V_i > r | X_0 = i] = 1 for all r by Lemma 4.16, so Pr[V_i = ∞ | X_0 = i] = 1, i.e., i is recurrent, and

    ∑_{t∈T} p_{i,i}^{(t)} = E[V_i | X_0 = i] = ∞.

If f_i < 1, then

    ∑_{t∈T} p_{i,i}^{(t)} = E[V_i | X_0 = i] = ∑_{r≥0} Pr[V_i > r | X_0 = i] = ∑_{r≥0} f_i^r = 1/(1 − f_i) < ∞,

so Pr[V_i = ∞ | X_0 = i] = 0 and i is transient.
It is useful to remember the criterion that a state i is recurrent if and only if ∑_{t∈T} p_{i,i}^{(t)} = ∞. Moreover, recurrence and transience are properties of communicating classes. Here we state without proof:
Corollary 4.18. We have that:
(i) Either all states in a class are transient or all are recurrent.
(ii) Every recurrent class is closed.
(iii) Every finite closed class is recurrent.
Corollary 4.19. If P is irreducible and recurrent, then for all i ∈ S we have Pr[T_i < ∞] = 1.
Example 4.20. In Example 4.10 the classes {0} and {3} are recurrent. The class {1, 2}
is transient.
It is easy to spot closed classes, so the recurrence or transience of finite classes is easy
to determine. On the other hand, infinite closed classes may be transient. Thus we are
now interested in (irreducible) Markov chains with an infinite number of states.
4.5.1 Random Walk on Z
The random walk on Z with parameter p ∈ (0, 1) has the transition probabilities

    p_{i,i+1} = p  and  p_{i,i−1} = 1 − p    for all i ∈ Z.

Starting from state 0, the walk can return to 0 only after an even number of steps, so p_{0,0}^{(2t+1)} = 0, and

    p_{0,0}^{(2t)} = ( (2t)!/(t! t!) ) (p(1 − p))^t ≈ (4p(1 − p))^t / √(πt)

by Stirling's formula.
Case 1. If p = 1 − p, i.e., p = 1/2 (called the symmetric case), then 4p(1 − p) = 1 and hence p_{0,0}^{(2t)} ≈ 1/√(πt). Therefore

    ∑_{t∈T} p_{0,0}^{(t)} = ∑_{t∈T} p_{0,0}^{(2t)} ≈ ∑_{t∈T} 1/√(πt) = ∞

and the state 0 is recurrent. Hence, since the Markov chain is irreducible, all states are recurrent.
Case 2. If p ≠ 1 − p, then 4p(1 − p) = r < 1 and we have p_{0,0}^{(2t)} ≈ r^t/√(πt). Hence

    ∑_{t∈T} p_{0,0}^{(t)} = ∑_{t∈T} p_{0,0}^{(2t)} ≤ (1/√π) ∑_{t∈T} r^t < ∞

and the state 0, and with it every state, is transient.
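The two cases can be observed numerically. The sketch below sums the exact return probabilities p_{0,0}^{(2t)} = C(2t, t)(p(1 − p))^t using the term ratio a_t/a_{t−1} = 2(2t − 1)/t · p(1 − p) to avoid huge intermediate numbers; the cut-off T = 10 000 is an arbitrary choice for illustration.

```python
# Partial sums of the 2t-step return probabilities of the random walk on Z:
# a_t = C(2t, t) (p(1-p))^t, computed via the ratio
# a_t / a_{t-1} = 2(2t-1)/t * p(1-p) to stay in floating point.
# For p = 1/2 the sums grow without bound (recurrence); for p != 1/2
# they converge (transience).

def partial_sum(p, T):
    """Sum of p_{0,0}^{(2t)} for t = 1, ..., T."""
    s, a = 0.0, 1.0            # a tracks C(2t, t) (p(1-p))^t, with a = 1 at t = 0
    for t in range(1, T + 1):
        a *= 2 * (2 * t - 1) / t * p * (1 - p)
        s += a
    return s

print(partial_sum(0.5, 10_000))    # keeps growing, roughly like 2*sqrt(T/pi)
print(partial_sum(0.45, 10_000))   # converges: terms carry a factor 0.99^t
```

Doubling T visibly increases the first sum but leaves the second essentially unchanged, matching the dichotomy of Theorem 4.17.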
4.5.2 Random Walk on Z^3
While the symmetric random walks on Z and Z^2 are recurrent, this is not the case for Z^3, as we will show here. The symmetric random walk on Z^3 is given by the transition probabilities
    p_{i,j} = 1/6 if |i − j| = 1, and p_{i,j} = 0 otherwise.
That is, in the integer grid Z3 , the walk jumps to one of the nearest neighbors equiprobably.
Again, starting from state 0 = (0, 0, 0), we can only return after an even number of steps,
2t, say. In each of the dimensions, we must have the same number of positive and negative
jumps. Thus

    p_{0,0}^{(2t)} = ∑_{i,j,k≥0, i+j+k=t} C(2t, t) ( t!/(i! j! k!) )² (1/6)^{2t},

where C(2t, t) = (2t)!/(t! t!).
Observe that

    ∑_{i,j,k≥0, i+j+k=t} ( t!/(i! j! k!) ) (1/3)^t = 1

because the left-hand side is the total probability of placing t balls equiprobably into three
bins. For t = 3m we have the upper bound
    t!/(i! j! k!) ≤ (3m)!/(m! m! m!)

for all i, j, k with i + j + k = t = 3m. Hence, using Stirling's formula,
    p_{0,0}^{(6m)} = ∑_{i,j,k≥0, i+j+k=3m} C(6m, 3m) ( (3m)!/(i! j! k!) )² (1/6)^{6m}
      ≤ C(6m, 3m) (1/2)^{6m} · ( (3m)!/(m! m! m!) ) (1/3)^{3m} · ∑_{i,j,k≥0, i+j+k=3m} ( (3m)!/(i! j! k!) ) (1/3)^{3m}
      = C(6m, 3m) (1/2)^{6m} · ( (3m)!/(m! m! m!) ) (1/3)^{3m}
      ≈ ( 2^{6m}/√(3πm) ) (1/2)^{6m} · ( 3^{3m} √3/(2πm) ) (1/3)^{3m}
      = 1/( 2 (πm)^{3/2} ).

Hence

    ∑_{m≥1} p_{0,0}^{(6m)} ≤ ∑_{m≥1} 1/( 2 (πm)^{3/2} ) < ∞.

But p_{0,0}^{(6m)} ≥ (1/6)² p_{0,0}^{(6m−2)} and p_{0,0}^{(6m)} ≥ (1/6)⁴ p_{0,0}^{(6m−4)}, since after a return at time 6m − 2 (or 6m − 4) the walk can return to 0 again at time 6m, stepping forth and back, with probability at least (1/6)² (respectively (1/6)⁴). Therefore

    ∑_{t≥1} p_{0,0}^{(2t)} = ∑_{m≥1} ( p_{0,0}^{(6m)} + p_{0,0}^{(6m−2)} + p_{0,0}^{(6m−4)} ) ≤ (1 + 6² + 6⁴) ∑_{m≥1} p_{0,0}^{(6m)} < ∞

and the state 0, hence every state, is transient.
4.6 Invariant Distributions
As we will see shortly, it is often useful to consider the long-term properties of a Markov
chain. Such behaviour is often connected with the notion of an invariant distribution.
A measure is any row vector ν = (ν_i)_{i∈S} with ν_i ≥ 0 for all i ∈ S. A measure is a distribution if ∑_{i∈S} ν_i = 1. A measure π = (π_i)_{i∈S} is invariant if

    π = π P,

where P is the transition matrix of a (time invariant, discrete) Markov chain. (Equivalent notions are equilibrium or stationary measure.)
Observation 4.21. If (X_t)_{t∈T} is M(P, π), where π is an invariant distribution, then, for any m ∈ T, (X_{t+m})_{t∈T} is also M(P, π).

Proof. We clearly have Pr[X_m = i] = (π P^m)_i = π_i for all i ∈ S. Conditional on X_{m+t} = i, the state X_{m+t+1} is independent of X_m, X_{m+1}, ..., X_{m+t} and has the distribution (p_{i,j})_{j∈S}.
The following observation basically states that, if the m-step transition probabilities converge, then the limit yields an invariant distribution. The central result, Theorem 4.29 presented later on, gives a sufficient condition under which a Markov chain converges to equilibrium (even for every initial distribution).
Observation 4.22. Let X be M(P, λ) with finite S. If for some i ∈ S we have for all j ∈ S that p_{i,j}^{(m)} → π_j as m → ∞, then π = (π_j)_{j∈S} is an invariant distribution.
Proof. We have

    ∑_{j∈S} π_j = ∑_{j∈S} lim_{t→∞} p_{i,j}^{(t)} = lim_{t→∞} ∑_{j∈S} p_{i,j}^{(t)} = 1

and

    π_j = lim_{t→∞} p_{i,j}^{(t)} = lim_{t→∞} ∑_{k∈S} p_{i,k}^{(t−1)} p_{k,j} = ∑_{k∈S} lim_{t→∞} p_{i,k}^{(t−1)} p_{k,j} = ∑_{k∈S} π_k p_{k,j},

where we have used the finiteness of S to justify the exchange of summation and limit. Hence π is an invariant distribution.
Notice that the finiteness of S is essential: in the random walks on Z^d for d ∈ {1, 2, 3} discussed earlier, we clearly have p_{i,j}^{(m)} → 0 as m → ∞. The limit measure is certainly invariant, but it is not a distribution.
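Observation 4.22 suggests a direct numerical recipe for finite chains: raise P to a high power, read off any row as the candidate π, and check invariance. A sketch with an arbitrary irreducible, aperiodic 3-state matrix (not taken from the text):

```python
# Rows of a high power P^m of a finite, irreducible, aperiodic transition
# matrix converge to a common limit row, which is an invariant
# distribution (Observation 4.22). The 3-state matrix is an arbitrary
# illustrative example.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]

Pm = P
for _ in range(100):          # compute a high power of P
    Pm = mat_mul(Pm, P)

pi = Pm[0]                    # any row: in the limit they all agree
residual = max(abs(sum(pi[k] * P[k][j] for k in range(3)) - pi[j])
               for j in range(3))
print(pi, residual)           # residual of pi = pi P is essentially zero
```

Since every entry of P here is positive, the rows contract toward each other geometrically, so a hundred multiplications are far more than enough.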
We now show that every irreducible and recurrent Markov chain has an essentially unique invariant measure. For a fixed state k, let the expected time spent in state i between visits to k be

    γ_i^k = E[ ∑_{t=0}^{T_k − 1} V_{i,t} | X_0 = k ].
Lemma 4.23. Let P be irreducible and recurrent. Then (i) γ_k^k = 1, (ii) γ^k = γ^k P, and (iii) 0 < γ_i^k < ∞ for all i ∈ S.

Proof. For (i), the sum ∑_{t=0}^{T_k−1} V_{k,t} counts exactly the visit at time t = 0. For (ii): since P is recurrent, we have T_k < ∞ with probability one and X_0 = X_{T_k} = k by construction, so the visits to j during {1, ..., T_k} are as many as those during {0, ..., T_k − 1}. Hence

    γ_j^k = E[ ∑_{t=1}^{T_k} 1_{{X_t = j}} | X_0 = k ]
          = E[ ∑_{t≥1} 1_{{X_t = j, T_k ≥ t}} | X_0 = k ]
          = ∑_{t≥1} Pr[X_t = j, T_k ≥ t | X_0 = k]
          = ∑_{i∈S} ∑_{t≥1} Pr[X_{t−1} = i, X_t = j, T_k ≥ t | X_0 = k]
          = ∑_{i∈S} p_{i,j} ∑_{t≥1} Pr[X_{t−1} = i, T_k ≥ t | X_0 = k]
          = ∑_{i∈S} p_{i,j} E[ ∑_{t≥1} 1_{{X_{t−1} = i, T_k ≥ t}} | X_0 = k ]
          = ∑_{i∈S} p_{i,j} E[ ∑_{t=0}^{T_k − 1} 1_{{X_t = i}} | X_0 = k ]
          = ∑_{i∈S} γ_i^k p_{i,j}.

For (iii), for each state i, there exist n, m ≥ 0 with p_{i,k}^{(n)}, p_{k,i}^{(m)} > 0 by irreducibility. Then, using (i) and (ii), γ_i^k ≥ γ_k^k p_{k,i}^{(m)} > 0 and γ_i^k p_{i,k}^{(n)} ≤ γ_k^k = 1, so γ_i^k < ∞.
Theorem 4.24. Let P be irreducible and let μ be an invariant measure for P with μ_k = 1. Then μ ≥ γ^k. If, in addition, P is recurrent, then μ = γ^k.

Proof. For each j ∈ S we have

    μ_j = ∑_{i_0∈S} μ_{i_0} p_{i_0,j} = ∑_{i_0≠k} μ_{i_0} p_{i_0,j} + p_{k,j}
        = ∑_{i_0,i_1≠k} μ_{i_1} p_{i_1,i_0} p_{i_0,j} + p_{k,j} + ∑_{i_0≠k} p_{k,i_0} p_{i_0,j}
        ⋮
        = ∑_{i_0,...,i_t≠k} μ_{i_t} p_{i_t,i_{t−1}} ⋯ p_{i_0,j} + p_{k,j} + ∑_{i_0≠k} p_{k,i_0} p_{i_0,j} + ⋯ + ∑_{i_0,...,i_{t−1}≠k} p_{k,i_{t−1}} ⋯ p_{i_0,j}
        ≥ ∑_{s=1}^{t+1} Pr[X_s = j, T_k ≥ s | X_0 = k] → γ_j^k    for t → ∞.

Hence μ ≥ γ^k. Now suppose in addition that P is recurrent. Then γ^k is an invariant measure by Lemma 4.23, so ν = μ − γ^k is an invariant measure with ν ≥ 0 and ν_k = 0. For each i, irreducibility gives an n with p_{i,k}^{(n)} > 0, and 0 = ν_k ≥ ν_i p_{i,k}^{(n)}, hence ν_i = 0 and μ = γ^k.

A state i is called positive recurrent if the expected return time s_i = E[T_i | X_0 = i] is finite. Note that s_k = ∑_{i∈S} γ_i^k, since the time until the first return to k is the total time spent in all states before it.

Theorem 4.25. Let P be irreducible and suppose that P has an invariant distribution π. Then P is positive recurrent and s_i = 1/π_i for all i ∈ S.

Proof. Fix k ∈ S; one can show that π_k > 0 by irreducibility. The measure μ = π/π_k is invariant with μ_k = 1, so Theorem 4.24 yields

    s_k = ∑_{i∈S} γ_i^k ≤ ∑_{i∈S} μ_i = ∑_{i∈S} π_i/π_k = 1/π_k < ∞.

Hence every state is positive recurrent; in particular P is recurrent, so Theorem 4.24 gives equality and s_k = 1/π_k.
Example 4.26. The two-state chain with p_{0,1} = p_{1,0} = 1 has the property that P^{2m} = I and P^{2m+1} = P. Thus the p_{i,j}^{(m)} do not converge for all i, j. The cause is that the chain is periodic, that is, if we know that the chain is in state 0 at time t, then we know for certain that it is not in state 0 at time t + 1.

Call a state i ∈ S aperiodic if the set {t ∈ T : p_{i,i}^{(t)} > 0} has no common divisor other than one.

Lemma 4.27. A state i is aperiodic if and only if p_{i,i}^{(m)} > 0 for all sufficiently large m.

Proof. Let i ∈ S satisfy the condition that p_{i,i}^{(m)} > 0 for all sufficiently large m. Then the set {t ∈ T : p_{i,i}^{(t)} > 0} has no common divisor other than one. Hence i is aperiodic.
Let i ∈ S be aperiodic. With the use of the Extended Euclidean Algorithm we have the following statement: for a, b ∈ N, there is n_0 ∈ N such that, for d = gcd(a, b) and any n ≥ n_0, there are x, y ∈ N_0 with nd = xa + yb. If p_{i,i}^{(a)}, p_{i,i}^{(b)} > 0, then we also have

    p_{i,i}^{(nd)} = p_{i,i}^{(xa+yb)} ≥ (p_{i,i}^{(a)})^x (p_{i,i}^{(b)})^y > 0.

Since i is aperiodic, we can choose a, b with gcd(a, b) = 1, and the claim follows.

Lemma 4.28. Let P be irreducible with an aperiodic state i. Then, for all j, k ∈ S, we have p_{j,k}^{(m)} > 0 for all sufficiently large m.

Proof. By irreducibility there are r, t ∈ T with p_{j,i}^{(r)} > 0 and p_{i,k}^{(t)} > 0. Hence

    p_{j,k}^{(m)} ≥ p_{j,i}^{(r)} p_{i,i}^{(s)} p_{i,k}^{(t)} > 0

for all sufficiently large m = r + s + t, where we have used that state i is aperiodic and Lemma 4.27.
Theorem 4.29 (Convergence). Let P be irreducible and aperiodic, and suppose that P has an invariant distribution π. Let λ be any distribution. If X is M(P, λ) then

    Pr[X_t = j] → π_j    as t → ∞, for all j ∈ S.
Proof. Let (Y_t)_{t∈T} be M(P, π), independent of X, and consider the chain W_t = (X_t, Y_t) on S × S with transition matrix Q given by q_{(i,k),(j,ℓ)} = p_{i,j} p_{k,ℓ}. By Lemma 4.28 we have

    q_{(i,k),(j,ℓ)}^{(t)} = p_{i,j}^{(t)} p_{k,ℓ}^{(t)} > 0

for sufficiently large t. So the matrix Q is irreducible and has an invariant distribution π̃_{(i,k)} = π_i π_k. So by Theorem 4.25, Q is positive recurrent. Fix a state b ∈ S and let M be the first passage time of W to (b, b); then Pr[M < ∞] = 1 by Corollary 4.19. Define

    Z_t = X_t if t < M, and Z_t = Y_t if t ≥ M.

By the strong Markov property, (Z_t)_{t∈T} is again M(P, λ). Hence

    |Pr[X_t = j] − π_j| = |Pr[Z_t = j] − Pr[Y_t = j]| ≤ Pr[t < M] → 0    for t → ∞.
4.7 Applications

4.7.1 Satisfiability
First we assume that F does not have a satisfying assignment. In that case the algorithm will always (correctly) output unsatisfiable. Now assume that F has a satisfying assignment a ∈ {0, 1}^n, say. Let the assignment after t steps of the algorithm be A_t ∈ {0, 1}^n and let N_t denote the number of variables in A_t that have the same value as in a, i.e., the variables that match. Of course, N_t ∈ {0, ..., n} and if N_t = n holds, then the satisfying assignment a is found.
In each step, we choose an unsatisfied clause C, and hence A_t and a must disagree in at least one variable of C. As k = 3, the probability that the algorithm flips one of these variables is at least 1/3; in that case N_{t+1} = N_t + 1, and otherwise N_{t+1} = N_t − 1. Hence

    Pr[N_{t+1} = i + 1 | N_t = i] ≥ 1/3    for i = 0, ..., n − 1,
    Pr[N_{t+1} = i − 1 | N_t = i] ≤ 2/3    for i = 1, ..., n − 1.
We can analyze the expected number of steps until N_t = n holds with the Markov chain X = (X_t)_{t∈T}, where X_0 = N_0 and

    Pr[X_{t+1} = 1 | X_t = 0] = 1,
    Pr[X_{t+1} = i + 1 | X_t = i] = 1/3    for i = 1, ..., n − 1,
    Pr[X_{t+1} = i − 1 | X_t = i] = 2/3    for i = 1, ..., n − 1,
    Pr[X_{t+1} = n | X_t = n] = 1.

The transition diagram is a path 0, 1, ..., n with probability 1/3 to the right and 2/3 to the left in the interior, a sure step from 0 to 1, and an absorbing state n.
(Technically one has to prove that the Markov chain (N_t)_{t∈T} reaches the state n in expectation faster than the chain (X_t)_{t∈T} does. But we omit the required coupling argument here.)
We give two analyses of the algorithm here. The first analysis is for a naive version of the algorithm Random Walk SAT, which yields (undesirable) expected O(2^n) formula evaluations. Then we slightly modify the algorithm, which then yields O((2 − 2/k)^n) formula evaluations, which is especially useful for small values of k.
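The listing of Random Walk SAT itself is not reproduced in this excerpt; the following Python sketch is one possible reading of the description above (start from an assignment, and for up to m steps pick an unsatisfied clause and flip one of its variables uniformly at random). All names and the clause encoding are choices of this sketch, not from the text.

```python
# A sketch of the random-walk phase of Random Walk SAT, as described in
# the text: repeatedly pick an unsatisfied clause and flip a uniformly
# random variable of it, for up to m steps. A formula is a list of
# clauses; a clause is a list of signed ints (literal v means variable
# |v| must be True if v > 0, False if v < 0).

import random

def unsatisfied(formula, x):
    """Return the clauses unsatisfied under assignment x (dict var -> bool)."""
    def lit_true(v):
        return x[abs(v)] if v > 0 else not x[abs(v)]
    return [c for c in formula if not any(lit_true(v) for v in c)]

def random_walk_sat(formula, n, m, rng):
    """One run: up to m flips from a random start; returns an assignment or None."""
    x = {v: rng.random() < 0.5 for v in range(1, n + 1)}
    for _ in range(m):
        bad = unsatisfied(formula, x)
        if not bad:
            return x                          # satisfying assignment found
        v = abs(rng.choice(rng.choice(bad)))  # random variable of a bad clause
        x[v] = not x[v]
    return None

# (x1 or x2 or x3) and (-x1 or x2 or x3) and (x1 or -x2 or x3)
formula = [[1, 2, 3], [-1, 2, 3], [1, -2, 3]]
sol = random_walk_sat(formula, n=3, m=9, rng=random.Random(0))
print(sol)
```

A single run may fail; the analyses below bound how many steps (and, for the modified version, how many restarts) are needed in expectation.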
Theorem 4.32. Let k = 3. Then the algorithm Random Walk SAT has expected running time O(poly(n) 2^n).
Proof. We choose m = 2^n. If there is no satisfying assignment for the formula F, the algorithm terminates after 2^n formula evaluations. Each evaluation takes time O(poly(n)). Now we assume that there is a satisfying assignment. We compute the vector t^{{n}} of expected hitting times for state n. We abbreviate t_i = t_i^{{n}} for any initial state i ∈ {0, ..., n}. We apply Theorem 4.12 and have to solve the following system of linear equations:

    t_n = 0,
    t_i = (2/3) t_{i−1} + (1/3) t_{i+1} + 1    for i = 1, ..., n − 1,
    t_0 = t_1 + 1.
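The system above can be solved exactly by a "shooting" method: every t_i is an affine function of the unknown t_0, and the boundary condition t_n = 0 then pins t_0 down. A sketch in Python with exact rational arithmetic (the function name is this sketch's own):

```python
# Solve the hitting-time system t_n = 0, t_0 = t_1 + 1,
# t_i = (2/3) t_{i-1} + (1/3) t_{i+1} + 1 exactly. Rewriting the middle
# equation gives t_{i+1} = 3 t_i - 2 t_{i-1} - 3, so each t_i is affine
# in t_0: t_i = a_i + b_i * t_0. Finally t_n = 0 determines t_0.

from fractions import Fraction

def expected_steps(n):
    """Exact value of t_0, the expected hitting time of n from state 0."""
    prev = (Fraction(0), Fraction(1))      # t_0 = 0 + 1 * t_0
    cur = (Fraction(-1), Fraction(1))      # t_1 = t_0 - 1
    for _ in range(n - 1):                 # build t_2, ..., t_n
        prev, cur = cur, (3 * cur[0] - 2 * prev[0] - 3,
                          3 * cur[1] - 2 * prev[1])
    a_n, b_n = cur
    return -a_n / b_n                      # solve a_n + b_n * t_0 = 0

for n in (1, 2, 5, 10):
    print(n, expected_steps(n))            # grows like 2^(n+2)
```

For these parameters the differences d_i = t_i − t_{i+1} satisfy d_0 = 1 and d_i = 2d_{i−1} + 3, which gives t_0 = 2^{n+2} − 3n − 4 and hence the O(2^n) behaviour used in the theorem.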
As there are 2^n possible assignments, the above running time is not very appealing. There are two insights that help to reduce the number of formula evaluations:

(i) If we choose the initial assignment uniformly at random, then the number N_0 is binomially distributed with expected value n/2. However, there is an exponentially small, but non-negligible, probability that N_0 is significantly larger than n/2.

(ii) It is more likely to move towards 0 than towards n. Thus, the longer the process runs, the more likely it is to have moved towards 0. Hence we will choose a small value for m and instead repeat the whole algorithm (with new initial assignments) more often.
Consider the algorithm Random Walk SAT with the following modifications: In Step 1 the assignment x is chosen uniformly at random, in Step 2 we choose m = 3n (for technical reasons), and the whole algorithm is repeated up to r = (n/c)(2 − 2/k)^n times, where c is a constant defined later.
Theorem 4.33. Let k = 3. Then the modified Random Walk SAT algorithm has expected running time O(poly(n) (4/3)^n).
Proof. Recall that m = 3n and that there are up to r = (n/c)(2 − 2/k)^n = (n/c)(4/3)^n repetitions for k = 3, where c is the constant defined below. Thus, if the formula F does not have a satisfying assignment, then the modified algorithm terminates after O(poly(n)(4/3)^n) evaluations that take O(poly(n)) time each. Now, let F have a satisfying assignment a, say. The modified algorithm has up to m = 3n steps to reach a starting from a random assignment. We now compute that r many repetitions suffice in expectation to actually reach a satisfying assignment.
Let q be the probability that the process reaches a in up to m = 3n steps starting from a random assignment. Below we define a lower bound q_i on the probability that a is reached from an initial assignment in which exactly i variables do not agree with a, i.e., N_0 = n − i. Notice that

    C(i + 2j, j) (2/3)^j (1/3)^{i+j}

is the probability that a random walk on Z with p = 1/3 moves exactly j times down and exactly i + j times up. It is hence a lower bound on the probability that the algorithm (i.e., the Markov chain X) reaches the assignment a (i.e., the state n) within i + 2j ≤ 3n many steps starting with an assignment that has exactly i non-matching variables (i.e., initial state N_0 = n − i). Therefore we define and derive
    q_i := max_{j=0,...,i} C(i + 2j, j) (2/3)^j (1/3)^{i+j} ≥ C(3i, i) (2/3)^i (1/3)^{2i},
where we have considered the case j = i. Stirling's formula yields

    C(3i, i) = (3i)!/(i!(2i)!) ≥ (c/√i) (27/4)^i

for a constant c > 0, and hence

    q_i ≥ (c/√i) (27/4)^i (2/3)^i (1/3)^{2i} = (c/√i) (1/2)^i.
Therefore

    q ≥ ∑_{i=0}^{n} Pr[N_0 = n − i] q_i
      ≥ C(n, 0) (1/2)^n + ∑_{i=1}^{n} C(n, i) (1/2)^n (c/√i) (1/2)^i
      ≥ (c/√n) (1/2)^n ∑_{i=0}^{n} C(n, i) (1/2)^i 1^{n−i}
      = (c/√n) (1/2)^n (3/2)^n
      = (c/√n) (3/4)^n,

where we have used Pr[N_0 = n − i] = C(n, i)(1/2)^n, the bound c/√i ≥ c/√n, and the binomial theorem yielding ∑_{i=0}^{n} C(n, i) (1/2)^i 1^{n−i} = (1 + 1/2)^n = (3/2)^n. Thus, by the geometric distribution, 1/q ≤ (√n/c)(4/3)^n ≤ (n/c)(4/3)^n repetitions suffice in expectation.
Remark 4.34. The above algorithm is of course also applicable for general values of k. For example, for k = 2, the problem is solvable in polynomial time and the algorithm achieves an expected running time of O(n²). This is not very compelling, since there is a deterministic algorithm with running time O(n). The following table depicts the expected number of formula evaluations for larger values of k (without proof):

    k              3        4        5        6        k
    evaluations    (4/3)^n  (3/2)^n  (8/5)^n  (5/3)^n  (2 − 2/k)^n
4.7.2 Queueing System
Imagine a queue where objects (customers, jobs, packets) wait for service. We consider a model where time is divided into slots (of length one) and the queue has a capacity of n, i.e., at most n objects can be in the queue at any time. Let 0 < α, β < 1 be two parameters, called the arrival and departure probability (with α + β ≤ 1). Let X_t be the number of objects in the queue at time t. At each time t, exactly one of the following occurs:

(i) If X_t < n, then X_{t+1} = X_t + 1 with probability α, i.e., a new object enters.
(ii) If X_t > 0, then X_{t+1} = X_t − 1 with probability β, i.e., one object disappears.
(iii) The queue remains unchanged otherwise.

The process is obviously a time-invariant Markov chain with the following transition probabilities, yielding a transition matrix P = (p_{i,j})_{0≤i,j≤n}:
    p_{i,i+1} = Pr[X_{t+1} = i + 1 | X_t = i] = α    if i < n,
    p_{i,i−1} = Pr[X_{t+1} = i − 1 | X_t = i] = β    if i > 0,
    p_{i,i}   = Pr[X_{t+1} = i | X_t = i] = 1 − α if i = 0,  1 − α − β if 1 ≤ i ≤ n − 1,  1 − β if i = n.
The Markov chain is irreducible, finite, and aperiodic. So it has a unique stationary distribution π satisfying π = πP. We have

    π_0 = (1 − α) π_0 + β π_1,
    π_i = α π_{i−1} + (1 − α − β) π_i + β π_{i+1}    for i = 1, ..., n − 1,
    π_n = α π_{n−1} + (1 − β) π_n.

One verifies that

    π_i = (α/β)^i π_0    for i = 0, ..., n,

and from ∑_{i=0}^{n} π_i = 1 we have

    π_0 = 1 / ∑_{i=0}^{n} (α/β)^i.

Hence we find for j = 0, ..., n

    π_j = (α/β)^j / ∑_{i=0}^{n} (α/β)^i.
Thus the long-term behaviour of the chain is given by the state vector π = (π_i)_{0≤i≤n} above.
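A quick numerical check of this stationary distribution in Python: build the queue's transition matrix, compute π_j = (α/β)^j / ∑_i (α/β)^i, and verify π = πP. The parameters α = 0.2, β = 0.3, n = 5 are arbitrary illustrative values satisfying α + β ≤ 1.

```python
# Verify the queueing-system stationary distribution
# pi_j = (alpha/beta)^j / sum_i (alpha/beta)^i by checking pi = pi P.
# alpha = 0.2, beta = 0.3, n = 5 are arbitrary illustrative parameters.

alpha, beta, n = 0.2, 0.3, 5

rho = alpha / beta
Z = sum(rho ** i for i in range(n + 1))          # normalizing constant
pi = [rho ** j / Z for j in range(n + 1)]

# build the transition matrix of the queue
P = [[0.0] * (n + 1) for _ in range(n + 1)]
for i in range(n + 1):
    if i < n:
        P[i][i + 1] = alpha
    if i > 0:
        P[i][i - 1] = beta
    P[i][i] = 1.0 - sum(P[i])                    # remaining mass stays put

piP = [sum(pi[i] * P[i][j] for i in range(n + 1)) for j in range(n + 1)]
residual = max(abs(piP[j] - pi[j]) for j in range(n + 1))
print(pi, residual)                              # residual is ~ 0
```

The check works because the chain is a birth-death chain: π satisfies the detailed balance condition π_i α = π_{i+1} β, which implies invariance.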
4.7.3 Random Walks on Graphs
Random walks on graphs occur once in a while in the analysis of algorithms. Let G = (V, E) be a finite, connected, undirected graph. For a vertex i, let N(i) denote the neighbours of i and d(i) = |N(i)| its degree. A random walk on G is a Markov chain X = (X_t)_{t∈T} with X_t ∈ V and the transition probabilities

    p_{i,j} = Pr[X_{t+1} = j | X_t = i] = 1/d(i) if j ∈ N(i), and 0 otherwise.
Thus, a finite, non-bipartite, connected undirected graph yields a Markov chain with
a unique stationary distribution, having the following properties. Recall that si denotes
the expected return time for vertex i.
Theorem 4.36. A random walk X on G converges to a stationary distribution π with

    π_i = d(i)/(2|E|)    and    s_i = 1/π_i    for all i ∈ V.
Proof. Since

    ∑_{i∈V} π_i = ∑_{i∈V} d(i)/(2|E|) = 1,

π is a distribution. It is invariant because

    (πP)_i = ∑_{j∈N(i)} π_j p_{j,i} = ∑_{j∈N(i)} (d(j)/(2|E|)) (1/d(j)) = d(i)/(2|E|) = π_i.

As G is finite, connected, and non-bipartite, the chain is irreducible and aperiodic, so Theorem 4.29 yields the convergence to π, and s_i = 1/π_i follows from Theorem 4.25.
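A quick check of Theorem 4.36 on a small example graph (a 4-cycle with one chord, chosen arbitrarily so that it is connected and non-bipartite): compute π_i = d(i)/(2|E|) and verify invariance under the random-walk transition probabilities.

```python
# Verify pi_i = d(i) / (2|E|) for the random walk on a small graph.
# The graph (a 4-cycle plus the chord {0, 2}) is an arbitrary connected,
# non-bipartite example.

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
V = sorted({v for e in edges for v in e})
neigh = {v: set() for v in V}
for u, w in edges:
    neigh[u].add(w)
    neigh[w].add(u)

two_E = 2 * len(edges)
pi = {v: len(neigh[v]) / two_E for v in V}          # pi_i = d(i) / 2|E|

# (pi P)_j = sum over neighbours i of j of pi_i * (1 / d(i))
piP = {j: sum(pi[i] / len(neigh[i]) for i in neigh[j]) for j in V}
residual = max(abs(piP[j] - pi[j]) for j in V)
print(pi, residual)                                  # residual is ~ 0
```

The invariance holds vertex by vertex: each neighbour i contributes π_i/d(i) = 1/(2|E|) to j, and j has d(j) neighbours, which is exactly the cancellation used in the proof above.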