
Chapter 4

Markov Chains
Markov chains are an important tool for modeling stochastic processes that occur frequently in computer science. For example, after introducing the basic theory, we will
demonstrate how they are used for deriving randomized algorithms for Satisfiability.

4.1 Basics

Let $T$ be any countable set, referred to as time. A sequence of random variables $X = (X_t)_{t \in T}$, where $X_t \in S$, is called a (discrete time) stochastic process with states $S$. We will assume that $T = \mathbb{N}_0$ and $S \subseteq \mathbb{N}_0$ here. A probability distribution $\lambda = (\lambda_i)_{i \in S}$ over $S$ is called the initial distribution if
$$\Pr[X_0 = i] = \lambda_i$$
for all $i \in S$.

A stochastic process $X$ satisfying the Markov condition
$$\Pr[X_t = i_t \mid X_{t-1} = i_{t-1}, \ldots, X_0 = i_0] = \Pr[X_t = i_t \mid X_{t-1} = i_{t-1}]$$
for all $t \in T$ is called a (discrete time) Markov chain.

The condition does not say that the random variables $X_t$ and $X_{t-1}, \ldots, X_0$ are independent. It rather states that the state $i_t$ which $X$ assumes at time $t$ depends only on the state $i_{t-1}$, where $X$ is at time $t-1$. In other words, the whole history earlier than time $t-1$ is forgotten. This property is also called the memoryless property.
A Markov chain $X$ is called time invariant if the equality $\Pr[X_t = j \mid X_{t-1} = i] = \Pr[X_{t'} = j \mid X_{t'-1} = i]$ holds for all times $t, t' \in T$ and all state pairs $i, j \in S$. This allows us to define
$$p_{i,j} = \Pr[X_t = j \mid X_{t-1} = i]$$
as the (1-step) transition probability of moving from state $i$ to state $j$. This obviously induces a matrix
$$P = (p_{i,j})_{i,j \in S},$$
called the transition matrix $P$. We denote a time invariant Markov chain $X$ with initial distribution $\lambda$ and transition matrix $P$ by $M(P, \lambda)$.

There is also a graphical representation of Markov chains: the vertices of a directed graph $G$ are the states $S$, and there is an arc connecting states $i, j \in S$ with label $p_{i,j}$. Arcs with $p_{i,j} = 0$ are usually not drawn.


Example 4.1. Suppose that a certain process can have one of two states, i.e., $S = \{0, 1\}$. If in state 0, the process jumps to state 1 with probability $\alpha \in [0, 1]$. If in state 1, the process jumps to 0 with probability $\beta \in [0, 1]$. Otherwise the state does not change. This induces a two-state diagram (a loop with probability $1 - \alpha$ at state 0, an arc $0 \to 1$ labelled $\alpha$, an arc $1 \to 0$ labelled $\beta$, and a loop with probability $1 - \beta$ at state 1) and the transition matrix
$$P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}.$$

Observation 4.2. A discrete time stochastic process $X = (X_s)_{0 \le s \le t}$ is $M(P, \lambda)$ if and only if
$$\Pr[X_0 = i_0, X_1 = i_1, \ldots, X_t = i_t] = \lambda_{i_0} p_{i_0,i_1} \cdots p_{i_{t-1},i_t}$$
holds for all $i_0, \ldots, i_t \in S$.

Proof. If $X = (X_s)_{0 \le s \le t}$ is $M(P, \lambda)$ then
$$\Pr[X_0 = i_0, X_1 = i_1, \ldots, X_t = i_t] = \Pr[X_0 = i_0]\, \Pr[X_1 = i_1 \mid X_0 = i_0] \cdots \Pr[X_t = i_t \mid X_0 = i_0, \ldots, X_{t-1} = i_{t-1}] = \lambda_{i_0} p_{i_0,i_1} \cdots p_{i_{t-1},i_t}.$$
Conversely, if the claimed condition holds for $t$, then by summing both sides and using $\sum_{j \in S} p_{i,j} = 1$ we see that it also holds for $t-1$. Thus the condition holds for the times $t' = 0, 1, \ldots, t$. In particular we have $\Pr[X_0 = i_0] = \lambda_{i_0}$ and for $t' = 0, \ldots, t-1$
$$\Pr[X_{t'+1} = i_{t'+1} \mid X_0 = i_0, \ldots, X_{t'} = i_{t'}] = \frac{\Pr[X_0 = i_0, X_1 = i_1, \ldots, X_{t'+1} = i_{t'+1}]}{\Pr[X_0 = i_0, X_1 = i_1, \ldots, X_{t'} = i_{t'}]} = p_{i_{t'},i_{t'+1}},$$
and so the process is $M(P, \lambda)$.
Define the unit mass at state $i \in S$ by the vector $\delta_i = (\delta_{i,j})_{j \in S}$ with
$$\delta_{i,j} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$$
The following result is important for the theory of time invariant Markov chains as it captures the idea of a restart.
Theorem 4.3 (Markov Property). Let $(X_t)_{t \in T}$ be $M(P, \lambda)$. Conditional on the event $X_m = i$, the stochastic process $(X_{t+m})_{t \in T}$ is $M(P, \delta_i)$ and independent of $X_0, \ldots, X_m$.

Proof. We have to show that for any event $A$ determined by $X_0, \ldots, X_m$ we have
$$\Pr[\{X_m = i_m, \ldots, X_{m+t} = i_{m+t}\} \cap A \mid X_m = i] = \delta_{i,i_m} p_{i_m,i_{m+1}} \cdots p_{i_{m+t-1},i_{m+t}} \Pr[A \mid X_m = i]$$
and then the result follows by Observation 4.2. First consider an elementary event
$$A = \{X_0 = i_0, \ldots, X_m = i_m\}.$$
In that case we have to show
$$\Pr[\{X_m = i_m, \ldots, X_{m+t} = i_{m+t}\} \cap A \mid X_m = i] = \frac{\Pr[X_0 = i_0, \ldots, X_{m+t} = i_{m+t}, i = i_m]}{\Pr[X_m = i]} = \frac{\delta_{i,i_m} p_{i_m,i_{m+1}} \cdots p_{i_{m+t-1},i_{m+t}} \Pr[X_0 = i_0, \ldots, X_m = i_m]}{\Pr[X_m = i]} = \delta_{i,i_m} p_{i_m,i_{m+1}} \cdots p_{i_{m+t-1},i_{m+t}} \Pr[X_0 = i_0, \ldots, X_m = i_m \mid X_m = i],$$
which is true by Observation 4.2. In general, any event $A$ determined by $X_0, \ldots, X_m$ can be written as a disjoint union of countably many elementary events $A_k$, i.e., $A = \bigcup_{k=1}^{\infty} A_k$. The claim now follows by summation over the identities for the $A_k$.
Let $p_i(t)$ denote the probability that the process $X$ is in state $i$ at time $t$. Define the row vector $p(t) = (p_s(t))_{s \in S}$, which is called the distribution of the chain at time $t$. Observe that $p_i(t+1) = \sum_{j \in S} p_{j,i}\, p_j(t)$.

Observation 4.4. If $X$ is $M(P, \lambda)$, then we have
$$p(t+1) = p(t)\, P$$
for all $t \in T$, where $p(0) = \lambda$.
Now define the $m$-step transition probability $p_{i,j}^{(m)} = \Pr[X_{t+m-1} = j \mid X_{t-1} = i]$, and analogously $p_i^{(m)}(t)$ and $p^{(m)}(t)$. The proof of the following basic observation is left as an exercise.

Observation 4.5. If $X$ is $M(P, \lambda)$, then we have
$$p^{(m)}(t) = p(t+m) = p(t)\, P^m$$
for all $t, m \in T$, where $p(0) = \lambda$.
Example 4.6. An application where the chain from Example 4.1 is useful is this: Suppose you have a (potentially biased) bit $B \in \{0, 1\}$ with $\Pr[B = 1] = p$. If $n$ independent trials $B_i$ with $i = 1, \ldots, n$ are run, what is the probability that the sum $X = \sum_{i=1}^n B_i$ is even (respectively odd)?

The above chain can model the parity of $X$. To this end, we choose $\alpha = \beta = p$. The process is then $M(P, (1-p, p))$. The values $p_0(n)$ and $p_1(n)$ are the probabilities that $X$ (after $n$ trials) is even, respectively odd.
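To make this concrete, here is a minimal sketch (assuming NumPy; the values $p = 0.3$ and $n = 10$ are illustrative only) that computes the parity distribution via Observation 4.5. Starting from the sure-even distribution $(1, 0)$ and taking $n$ steps gives the same result as starting from the text's initial distribution $(1-p, p)$ and taking $n-1$ steps.

```python
import numpy as np

def parity_distribution(p, n):
    """Return (Pr[sum of n bits is even], Pr[odd]) for i.i.d. Bernoulli(p) bits."""
    P = np.array([[1 - p, p],        # chain of Example 4.1 with alpha = beta = p
                  [p, 1 - p]])
    start = np.array([1.0, 0.0])     # an empty sum is even with certainty
    return start @ np.linalg.matrix_power(P, n)

even, odd = parity_distribution(p=0.3, n=10)
print(f"Pr[even] = {even:.6f}, Pr[odd] = {odd:.6f}")
# Closed form for comparison: Pr[even] = (1 + (1 - 2p)^n) / 2
print((1 + (1 - 2 * 0.3) ** 10) / 2)
```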

4.2 Class Structure

It is sometimes possible to break a chain into smaller pieces, where each piece is simpler to understand and which together give an understanding of the whole. One way is to define the communicating classes of the chain. We need some definitions. Let $i, j \in S$ be two states. We say that $j$ is reachable from $i$ if $\Pr[X_{t+m} = j \text{ for some } m \in T \mid X_t = i] > 0$ and write $i \to j$. The proof of the theorem below is an exercise.

Theorem 4.7. For $i \ne j \in S$ the following are equivalent:

(i) $i \to j$,

(ii) $p_{i,j}^{(m)} > 0$ for some $m \in T$,

(iii) $p_{i_0,i_1} p_{i_1,i_2} \cdots p_{i_{m-1},i_m} > 0$ for some states $i = i_0, i_1, i_2, \ldots, i_{m-1}, i_m = j$.
Two states $i, j \in S$ communicate if $j$ is reachable from $i$ and $i$ is reachable from $j$. We then write $i \leftrightarrow j$. The equivalence classes of the relation $\leftrightarrow$ are called the (communicating) classes. We say that a class $C \subseteq S$ is closed if $i \in C$ and $i \to j$ imply $j \in C$. That is, there is no way out of the class. A state $i \in S$ is called absorbing if $\{i\}$ is a closed class. If a chain has exactly one class, then it is called irreducible.
Observation 4.8. The communicating classes of a chain $X$ are exactly the strongly connected components of the underlying directed graph $G$ formed by the arcs with $p_{i,j} > 0$.

4.3 Hitting Times and Probabilities

Let $X$ be a Markov chain with states $S$ and let $A \subseteq S$ be a state set. The random variable $H^A: \Omega \to T \cup \{\infty\}$ defined by $H^A(\omega) = \inf\{t \in T : X_t(\omega) \in A\}$ is called the hitting time of the state set. (The infimum of $\emptyset$ is $\infty$.) Define the probability that, starting from state $i \in S$, we hit $A$ by
$$h_i^A = \Pr[H^A < \infty \mid X_0 = i].$$
Furthermore define by
$$t_i^A = \mathrm{E}[H^A \mid X_0 = i]$$
the expected hitting time for $A$ when starting from $i$. If $A$ is a closed class, then $h_i^A$ is called the absorption probability and $t_i^A$ the absorption time, respectively. Define the vectors $h^A = (h_i^A)_{i \in S}$ and $t^A = (t_i^A)_{i \in S}$. Below, the expression minimal solution means that, if $h$ is a minimal solution and $h'$ is any solution, then $h'_i \ge h_i$ for all $i \in S$.
Theorem 4.9. We have that $h^A$ is the minimal non-negative solution to the system of linear equations given by
$$h_i^A = \begin{cases} 1 & \text{for } i \in A, \\ \sum_{j \in S} h_j^A\, p_{i,j} & \text{for } i \notin A. \end{cases}$$
Proof. First we show that $h^A$ must satisfy the above conditions. If $X_0 = i \in A$, then $H^A = 0$ and $h_i^A = 1$. If $X_0 = i \notin A$, then $H^A \ge 1$ and by the Markov property,
$$\Pr[H^A < \infty \mid X_1 = j, X_0 = i] = \Pr[H^A < \infty \mid X_1 = j] = h_j^A.$$
Thus
$$h_i^A = \Pr[H^A < \infty \mid X_0 = i] = \sum_{j \in S} \Pr[H^A < \infty \mid X_1 = j, X_0 = i]\, \Pr[X_1 = j \mid X_0 = i] = \sum_{j \in S} h_j^A\, p_{i,j}.$$


Now suppose that $x = (x_i : i \in S)$ is any solution of the above conditions. We have to show that $x \ge h^A$. Clearly, $h_i^A = x_i = 1$ for $i \in A$. For $i \notin A$ we have
$$x_i = \sum_{j \in S} x_j p_{i,j} = \sum_{j \in A} p_{i,j} + \sum_{j \notin A} x_j p_{i,j}.$$
Now we substitute $x_j$ and obtain
$$x_i = \sum_{j \in A} p_{i,j} + \sum_{j \notin A} \Big( \sum_{k \in A} p_{j,k} + \sum_{k \notin A} x_k p_{j,k} \Big) p_{i,j} = \Pr[X_1 \in A \mid X_0 = i] + \Pr[X_2 \in A, X_1 \notin A \mid X_0 = i] + \sum_{j \notin A} \sum_{k \notin A} p_{i,j}\, p_{j,k}\, x_k.$$
Thus, after repeated substitution, we obtain after $m$ steps
$$x_i = \Pr[X_1 \in A \mid X_0 = i] + \Pr[X_2 \in A, X_1 \notin A \mid X_0 = i] + \cdots + \Pr[X_m \in A, X_{m-1} \notin A, \ldots, X_1 \notin A \mid X_0 = i] + \sum_{j_1 \notin A} \cdots \sum_{j_m \notin A} p_{i,j_1} \cdots p_{j_{m-1},j_m}\, x_{j_m}.$$
Now, if $x$ is non-negative, so is the last term on the right-hand side above, and the other terms sum to $\Pr[H^A \le m \mid X_0 = i]$. Therefore $x_i \ge \Pr[H^A \le m \mid X_0 = i]$. Hence we have
$$x_i \ge \lim_{m \to \infty} \Pr[H^A \le m \mid X_0 = i] = \Pr[H^A < \infty \mid X_0 = i] = h_i^A,$$
which is what we had to show.


Example 4.10. Consider the chain on the states $\{0, 1, 2, 3\}$ with the following diagram: states 0 and 3 are absorbing, from state 1 the chain moves to 0 or 2 with probability $1/2$ each, and from state 2 it moves to 1 or 3 with probability $1/2$ each.

We ask ourselves what the probability of absorption in $A = \{3\}$ is. Thus, using the abbreviations $h_i = h_i^A$, we write the required system of equations:
$$h_3 = 1, \qquad h_2 = \tfrac{1}{2} h_1 + \tfrac{1}{2} h_3, \qquad h_1 = \tfrac{1}{2} h_0 + \tfrac{1}{2} h_2.$$
We resolve and find $h_1 = 1/3 + (2/3) h_0$ and $h_2 = 2/3 + (1/3) h_0$. The value of $h_0$ is not determined by the system, but the minimality requirement makes us choose $h_0 = 0$. Hence we have $h_1 = 1/3$, which means that the probability of absorption in $A = \{3\}$ when starting from state 1 is $1/3$. Analogously $h_2 = 2/3$.
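As a sanity check, the minimal non-negative solution can also be computed numerically. A minimal sketch (assuming NumPy; the matrix below encodes exactly the chain of this example) restricts the linear system of Theorem 4.9 to the states that can reach $A$ and solves it; all other states get probability 0, which realizes minimality.

```python
import numpy as np

# Transition matrix of the chain in Example 4.10 on states 0, 1, 2, 3;
# states 0 and 3 are absorbing.
P = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
])

def hitting_probabilities(P, A):
    """Minimal non-negative solution of the system in Theorem 4.9 (finite chain)."""
    n = P.shape[0]
    A = set(A)
    # States that can reach A at all (backward search along arcs with p_ij > 0);
    # every other state gets h_i = 0, which realizes the minimality requirement.
    can_reach = set(A)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if i not in can_reach and any(P[i, j] > 0 for j in can_reach):
                can_reach.add(i)
                changed = True
    rest = sorted(can_reach - A)                 # unknowns of the linear system
    Q = P[np.ix_(rest, rest)]
    b = np.array([P[i, sorted(A)].sum() for i in rest])
    h = np.zeros(n)
    h[sorted(A)] = 1.0
    if rest:
        h[rest] = np.linalg.solve(np.eye(len(rest)) - Q, b)
    return h

print(hitting_probabilities(P, {3}))   # -> [0., 0.333..., 0.666..., 1.]
```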
Example 4.11 (Gambler's Ruin). Suppose you gamble with an initial wealth of $i$ Euro. With probability $p$ you win one Euro, with probability $1-p$ you lose one Euro. If you are broke, i.e., the wealth is 0 Euro, the game ends. What is the probability that you will go broke?

This process obviously translates into a Markov chain on the states $0, 1, 2, \ldots$, where state 0 is absorbing and every state $i \ge 1$ moves up to $i+1$ with probability $p$ and down to $i-1$ with probability $1-p$.

Let $0 < p < 1$. The transition probabilities are:
$$p_{0,0} = 1, \qquad p_{i,i+1} = p \quad \text{and} \quad p_{i,i-1} = 1-p \quad \text{for } i = 1, 2, \ldots.$$
We are interested in the absorbing class $A = \{0\}$. As a shorthand define $h_i = h_i^A$, i.e., the hitting probability for state 0 when starting from state $i$. We are interested in the minimal non-negative solution $h$ of
$$h_0 = 1, \qquad h_i = p\, h_{i+1} + (1-p)\, h_{i-1} \quad \text{for } i = 1, 2, \ldots.$$
If $p \ne 1-p$, i.e., $p \ne 1/2$, then the solution of the recurrence has the form
$$h_i = \alpha + \beta \left( \frac{1-p}{p} \right)^i \quad \text{for } i = 0, 1, \ldots,$$
where $\alpha$ and $\beta$ are as follows. If $p < 1-p$, then $0 \le h_i \le 1$ forces $\beta = 0$, and since $h_0 = 1$ we get $h_i = 1$ for all $i \in \mathbb{N}$. If $p > 1-p$, then since $h_0 = 1$ we get
$$h_i = \left( \frac{1-p}{p} \right)^i + \alpha \left( 1 - \left( \frac{1-p}{p} \right)^i \right),$$
where we have $\alpha \ge 0$ due to non-negativity. Thus, by minimality, we must have $\alpha = 0$ and hence $h_i = ((1-p)/p)^i$.

For $p = 1-p$, the recurrence has the general solution
$$h_i = \alpha + \beta i.$$
The restriction $0 \le h_i \le 1$ forces $\beta = 0$ and, again, $h_i = 1$ for all $i \in \mathbb{N}$.

Concluding, even if you play a fair game, i.e., $p = 1-p$, you are certain to end up broke if you happen to gamble for too long.
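A quick Monte Carlo check of the formula $h_i = ((1-p)/p)^i$ for $p > 1/2$ is sketched below (standard library only; the parameters $p = 0.6$, $i = 3$ and the escape barrier are illustrative assumptions, not part of the text).

```python
import random

def ruin_probability_mc(i, p, trials=10000, barrier=200):
    """Monte Carlo estimate of the ruin probability starting with wealth i.

    A run counts as 'escaped' (not ruined) once the wealth reaches the upper
    barrier; for p > 1/2 ruin after that point is exceedingly unlikely, so the
    estimate is close to the true h_i = ((1-p)/p)^i.
    """
    ruined = 0
    for _ in range(trials):
        wealth = i
        while 0 < wealth < barrier:
            wealth += 1 if random.random() < p else -1
        ruined += (wealth == 0)
    return ruined / trials

p, i = 0.6, 3
print("simulated:", ruin_probability_mc(i, p))   # roughly 0.296
print("predicted:", ((1 - p) / p) ** i)          # (2/3)^3 = 0.2963...
```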
Not surprisingly, an analogue of Theorem 4.9 also exists for hitting times. The respective proof is left as an exercise.

Theorem 4.12. We have that $t^A$ is the minimal non-negative solution to the system of linear equations given by
$$t_i^A = \begin{cases} 0 & \text{for } i \in A, \\ 1 + \sum_{j \in S} t_j^A\, p_{i,j} & \text{for } i \notin A. \end{cases}$$
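For a finite chain, the system in Theorem 4.12 can be solved directly. A minimal sketch (assuming NumPy and that every state outside $A$ hits $A$ with probability 1, so all hitting times are finite):

```python
import numpy as np

def expected_hitting_times(P, A):
    """Solve t_i = 0 (i in A), t_i = 1 + sum_j p_ij t_j (i not in A).

    Assumes all states outside A hit A with probability 1; otherwise some
    t_i are infinite and this linear system is not the right tool.
    """
    n = P.shape[0]
    rest = [i for i in range(n) if i not in A]
    Q = P[np.ix_(rest, rest)]
    t = np.zeros(n)
    t[rest] = np.linalg.solve(np.eye(len(rest)) - Q, np.ones(len(rest)))
    return t

# Chain of Example 4.10 with A = {0, 3}: expected time to absorption.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])
print(expected_hitting_times(P, {0, 3}))   # -> [0., 2., 2., 0.]
```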


4.4 Stopping Times and Strong Markov Property

Theorem 4.3 on the Markov property basically states that, if we stop the process at some time $m$, look in which state it is, and then continue, the restarted process is also a Markov chain. What happens if we wait for the process to hit a certain state $i$, at a random time $M$, say? Is the restarted process also a Markov chain? It turns out that the Markov property also holds at stopping times.

A random variable $M: \Omega \to T \cup \{\infty\}$ is called a stopping time if the event $\{M = m\}$ depends only on $X_0, \ldots, X_m$ for any $m \in T$. Intuitively, you can sense when the event occurs by watching the process: if you are asked to stop at $M$, you know when to stop.
Example 4.13. The hitting time $H^A$ defined earlier is a stopping time because
$$\{H^A = m\} = \{X_0 \notin A, \ldots, X_{m-1} \notin A, X_m \in A\},$$
i.e., the event that the set $A$ is first hit at time $m$ depends only on $X_0, \ldots, X_m$.
Theorem 4.14 (Strong Markov Property). Let $(X_t)_{t \in T}$ be $M(P, \lambda)$ and let $M$ be a stopping time for it. Then, conditional on $M < \infty$ and $X_M = i$, the process $(X_{t+M})_{t \in T}$ is $M(P, \delta_i)$ and independent of $X_0, \ldots, X_M$.

Proof. If $A$ is an event determined by $X_0, \ldots, X_M$, then $A \cap \{M = m\}$ is determined by $X_0, \ldots, X_m$. So, by conditioning and the Markov property at time $m$,
$$\Pr[\{X_M = i_0, \ldots, X_{M+t} = i_t\} \cap A \cap \{M = m\} \cap \{X_M = i\}] = \Pr[X_M = i_0, \ldots, X_{M+t} = i_t \mid A \cap \{M = m\} \cap \{X_M = i\}]\, \Pr[A \cap \{M = m\} \cap \{X_M = i\}] = \Pr[X_0 = i_0, \ldots, X_t = i_t \mid X_0 = i]\, \Pr[A \cap \{M = m\} \cap \{X_M = i\}],$$
where we have used the condition $\{M = m\}$. Now we sum over $m = 0, 1, \ldots$ and divide by $\Pr[M < \infty, X_M = i]$ to obtain
$$\Pr[\{X_M = i_0, \ldots, X_{M+t} = i_t\} \cap A \mid M < \infty, X_M = i] = \Pr[X_M = i_0, \ldots, X_{M+t} = i_t \mid A, M < \infty, X_M = i]\, \Pr[A \mid M < \infty, X_M = i] = \Pr[X_0 = i_0, \ldots, X_t = i_t \mid X_0 = i]\, \Pr[A \mid M < \infty, X_M = i],$$
which is what we had to show.

4.5 Recurrence, Transience, and Random Walks

It is sometimes of interest to know whether a process keeps coming back to a certain state (recurrent) or eventually leaves the state forever (transient). We will show that each state is either recurrent or transient. In particular, the states of a communicating class are either all recurrent or all transient.

Let $(X_t)_{t \in T}$ be $M(P, \lambda)$. We say that a state $i$ is recurrent if
$$\Pr[X_t = i \text{ for infinitely many } t \mid X_0 = i] = 1.$$
A state $i$ is transient if
$$\Pr[X_t = i \text{ for infinitely many } t \mid X_0 = i] = 0.$$

The first passage time to state $i$ is the random variable $T_i$ defined by $T_i = \inf\{t \ge 1 : X_t = i\}$. Now define the $r$-th passage time inductively by $T_i^{(0)} = 0$, $T_i^{(1)} = T_i$, and
$$T_i^{(r+1)} = \inf\{t \ge T_i^{(r)} + 1 : X_t = i\}.$$
The length of the $r$-th excursion to $i$ is the $r$-th sojourn time
$$S_i^{(r)} = \begin{cases} T_i^{(r)} - T_i^{(r-1)} & \text{if } T_i^{(r-1)} < \infty, \\ 0 & \text{otherwise.} \end{cases}$$
Lemma 4.15. For $r = 2, 3, \ldots$, conditional on $T_i^{(r-1)} < \infty$, $S_i^{(r)}$ is independent of $\{X_t : t \le T_i^{(r-1)}\}$ and
$$\Pr\big[ S_i^{(r)} = m \,\big|\, T_i^{(r-1)} < \infty \big] = \Pr[T_i = m \mid X_0 = i].$$

Proof. We apply the strong Markov property at the stopping time $R = T_i^{(r-1)}$. We clearly have $X_R = i$ on $R < \infty$. So, conditional on $R < \infty$, $(X_{R+m})_{m \ge 0}$ is $M(P, \delta_i)$ and independent of $X_0, \ldots, X_R$. But $S_i^{(r)} = \inf\{m \ge 1 : X_{R+m} = i\}$, so $S_i^{(r)}$ is the first passage time of $(X_{R+m})_{m \ge 0}$ to state $i$.
Let the random variable $V_{i,t}$ indicate the event $\{X_t = i\}$, i.e., a visit to state $i$ at time $t$. Then the number of visits to $i$ is $V_i = \sum_{t \in T} V_{i,t}$. Notice that
$$\mathrm{E}[V_i \mid X_0 = i] = \sum_{t \in T} \mathrm{E}[V_{i,t} \mid X_0 = i] = \sum_{t \in T} \Pr[X_t = i \mid X_0 = i] = \sum_{t \in T} p_{i,i}^{(t)}.$$
We can compute the distribution of $V_i$ in terms of the return probability
$$f_i = \Pr[T_i < \infty \mid X_0 = i].$$
Lemma 4.16. For $r = 0, 1, 2, \ldots$, we have $\Pr[V_i > r \mid X_0 = i] = f_i^r$.

Proof. Observe that if $X_0 = i$ then $\{V_i > r\} = \{T_i^{(r)} < \infty\}$. When $r = 0$ the result is true. Suppose inductively that it is true for $r$; then
$$\Pr[V_i > r+1 \mid X_0 = i] = \Pr\big[ T_i^{(r+1)} < \infty \,\big|\, X_0 = i \big] = \Pr\big[ T_i^{(r)} < \infty, S_i^{(r+1)} < \infty \,\big|\, X_0 = i \big] = \Pr\big[ S_i^{(r+1)} < \infty \,\big|\, T_i^{(r)} < \infty, X_0 = i \big]\, \Pr\big[ T_i^{(r)} < \infty \,\big|\, X_0 = i \big] = f_i \cdot f_i^r = f_i^{r+1}$$
by Lemma 4.15, and the claim is proved.
Now we give the desired result.

Theorem 4.17. The following dichotomy holds:

(i) If $\Pr[T_i < \infty \mid X_0 = i] = 1$ then $i$ is recurrent and $\sum_{t \in T} p_{i,i}^{(t)} = \infty$.

(ii) If $\Pr[T_i < \infty \mid X_0 = i] < 1$ then $i$ is transient and $\sum_{t \in T} p_{i,i}^{(t)} < \infty$.

In particular, each state is either transient or recurrent.


Proof. If $\Pr[T_i < \infty \mid X_0 = i] = 1$, then by Lemma 4.16
$$\Pr[V_i = \infty \mid X_0 = i] = \lim_{r \to \infty} \Pr[V_i > r \mid X_0 = i] = 1,$$
so the state $i$ is recurrent and
$$\sum_{t \in T} p_{i,i}^{(t)} = \mathrm{E}[V_i \mid X_0 = i] = \infty.$$
On the other hand, if $f_i = \Pr[T_i < \infty \mid X_0 = i] < 1$, then by Lemma 4.16
$$\sum_{t \in T} p_{i,i}^{(t)} = \mathrm{E}[V_i \mid X_0 = i] = \sum_{r=0}^{\infty} \Pr[V_i > r \mid X_0 = i] = \sum_{r=0}^{\infty} f_i^r = \frac{1}{1 - f_i} < \infty,$$
so $\Pr[V_i = \infty \mid X_0 = i] = 0$ and $i$ is transient.
It is useful to remember the criterion that a state $i$ is recurrent if and only if $\sum_{t \in T} p_{i,i}^{(t)} = \infty$. Moreover, recurrence and transience are properties of communicating classes. Here we state without proof:

Corollary 4.18. We have that:

(i) Either all states in a class are transient or all are recurrent.

(ii) Every recurrent class is closed.

(iii) Every finite closed class is recurrent.

Corollary 4.19. If $P$ is irreducible and recurrent, then for all $i \in S$ we have $\Pr[T_i < \infty] = 1$.

Example 4.20. In Example 4.10 the classes $\{0\}$ and $\{3\}$ are recurrent. The class $\{1, 2\}$ is transient.

It is easy to spot closed classes, so the recurrence or transience of finite classes is easy to determine. On the other hand, infinite closed classes may be transient. Thus we are now interested in (irreducible) Markov chains with an infinite number of states.


4.5.1 Random Walk on $\mathbb{Z}$

The random walk on $\mathbb{Z}$ for $0 < p < 1$ is given by
$$p_{i,i+1} = p \quad \text{and} \quad p_{i,i-1} = 1-p \quad \text{for all } i \in \mathbb{Z}.$$
If we start at 0, we cannot return after an odd number of steps, so $p_{0,0}^{(2t+1)} = 0$ for all $t \in T$. Any sequence of steps of length $2t$ from 0 to 0 occurs with probability $p^t (1-p)^t$. There are obviously $\binom{2t}{t}$ ways of choosing such a sequence. Therefore
$$p_{0,0}^{(2t)} = \binom{2t}{t} p^t (1-p)^t.$$
Using Stirling's approximation $n! \simeq \sqrt{2\pi n}\,(n/e)^n$ yields
$$p_{0,0}^{(2t)} = \frac{(2t)!}{(t!)^2} (p(1-p))^t \simeq \frac{(4p(1-p))^t}{\sqrt{\pi t}}.$$

Case 1. If $p = 1-p$, i.e., $p = 1/2$ (called the symmetric case), then $4p(1-p) = 1$, so $p_{0,0}^{(2t)} \simeq 1/\sqrt{\pi t}$ and hence
$$\sum_{t \in T} p_{0,0}^{(t)} = \sum_{t \in T} p_{0,0}^{(2t)} \simeq \sum_{t \in T} \frac{1}{\sqrt{\pi t}} = \infty,$$
and the state 0 is recurrent. Hence, since the Markov chain is irreducible, all states are recurrent.

Case 2. If $p \ne 1-p$, then $4p(1-p) = r < 1$ and we have $p_{0,0}^{(2t)} \simeq r^t / \sqrt{\pi t}$. Hence
$$\sum_{t \in T} p_{0,0}^{(t)} = \sum_{t \in T} p_{0,0}^{(2t)} \le \frac{1}{\sqrt{\pi}} \sum_{t \in T} r^t < \infty,$$
since $r < 1$. In that case the random walk is transient.

It is an exercise to show that the symmetric random walk on $\mathbb{Z}^2$ is also recurrent. (The transition probabilities are $p_{i,j} = 1/4$ if $|i - j| = 1$ and $p_{i,j} = 0$ otherwise.)
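As a quick numerical check of the asymptotics in the symmetric case, the sketch below (standard library only) compares the exact return probability $\binom{2t}{t} 4^{-t}$ with the Stirling estimate $1/\sqrt{\pi t}$.

```python
import math

for t in (1, 10, 100, 1000):
    exact = math.comb(2 * t, t) / 4 ** t     # p_{0,0}^{(2t)} for p = 1/2
    stirling = 1 / math.sqrt(math.pi * t)    # asymptotic estimate
    print(f"t={t:5d}  exact={exact:.6f}  ~1/sqrt(pi t)={stirling:.6f}")
```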

4.5.2 Random Walk on $\mathbb{Z}^3$

While the symmetric random walks on $\mathbb{Z}$ and $\mathbb{Z}^2$ are recurrent, this is not the case for $\mathbb{Z}^3$, as we will show here. The symmetric random walk on $\mathbb{Z}^3$ is given by the transition probabilities
$$p_{i,j} = \begin{cases} \frac{1}{6} & \text{if } |i - j| = 1, \\ 0 & \text{otherwise.} \end{cases}$$
That is, in the integer grid $\mathbb{Z}^3$, the walk jumps to one of the nearest neighbours equiprobably. Again, starting from state $0 = (0, 0, 0)$, we can only return after an even number of steps, $2t$, say. In each of the dimensions, we must have the same number of positive and negative jumps. Thus
$$p_{0,0}^{(2t)} = \binom{2t}{t} \sum_{\substack{i,j,k \ge 0 \\ i+j+k=t}} \binom{t}{i\ j\ k}^2 \frac{1}{6^{2t}}.$$
Observe that
$$\sum_{\substack{i,j,k \ge 0 \\ i+j+k=t}} \binom{t}{i\ j\ k} \frac{1}{3^t} = 1$$
because the left-hand side is the total probability of placing $t$ balls equiprobably into three bins. For $t = 3m$ we have the upper bound
$$\binom{t}{i\ j\ k} = \frac{t!}{i!\,j!\,k!} \le \frac{t!}{m!\,m!\,m!} = \binom{3m}{m\ m\ m}$$
for all $i, j, k$ with $i + j + k = t = 3m$. Hence
$$p_{0,0}^{(2t)} = p_{0,0}^{(6m)} = \binom{6m}{3m} \sum_{\substack{i,j,k \ge 0 \\ i+j+k=3m}} \binom{3m}{i\ j\ k}^2 \frac{1}{6^{6m}} \le \binom{6m}{3m} \frac{1}{2^{6m}} \binom{3m}{m\ m\ m} \frac{1}{3^{3m}} \sum_{\substack{i,j,k \ge 0 \\ i+j+k=3m}} \binom{3m}{i\ j\ k} \frac{1}{3^{3m}} = \binom{6m}{3m} \frac{1}{2^{6m}} \binom{3m}{m\ m\ m} \frac{1}{3^{3m}}.$$
Using Stirling's formula, $\binom{6m}{3m} \simeq 2^{6m}/\sqrt{3\pi m}$ and $\binom{3m}{m\ m\ m} \simeq 3^{3m}\sqrt{3}/(2\pi m)$, so
$$p_{0,0}^{(6m)} \lesssim \frac{1}{2\pi^{3/2} m^{3/2}} \le \frac{c}{m^{3/2}}$$
with $c = 1/2$, and thus
$$\sum_{m \ge 1} p_{0,0}^{(6m)} \le c \sum_{m \ge 1} \frac{1}{m^{3/2}} < \infty.$$
But $p_{0,0}^{(6m)} \ge (1/6)^2\, p_{0,0}^{(6m-2)}$ and $p_{0,0}^{(6m)} \ge (1/6)^4\, p_{0,0}^{(6m-4)}$. Therefore
$$\sum_{t \in T} p_{0,0}^{(2t)} = \sum_{m} \left( p_{0,0}^{(6m)} + p_{0,0}^{(6m-2)} + p_{0,0}^{(6m-4)} \right) < \infty$$
and the random walk is transient.

4.6 Invariant Distributions and Convergence

As we will see shortly, it is often useful to consider the long-term properties of a Markov chain. Such behaviour is often connected with the notion of an invariant distribution. A measure is any row vector $\mu = (\mu_i)_{i \in S}$ with $\mu_i \ge 0$ for all $i \in S$. A measure is a distribution if $\sum_{i \in S} \mu_i = 1$. A measure $\pi = (\pi_i)_{i \in S}$ is invariant if
$$\pi = \pi P,$$
where $P$ is the transition matrix of a (time invariant, discrete) Markov chain. (Equivalent notions are equilibrium or stationary measure.)

Observation 4.21. If $(X_t)_{t \in T}$ is $M(P, \pi)$, where $\pi$ is invariant, then, for any $m \in T$, $(X_{t+m})_{t \in T}$ is also $M(P, \pi)$.

Proof. We clearly have $\Pr[X_m = i] = (\pi P^m)_i = \pi_i$ for all $i \in S$. Conditional on $X_{m+t} = i$, $X_{m+t+1}$ is independent of $X_m, X_{m+1}, \ldots, X_{m+t}$ and has the distribution $(p_{i,j})_{j \in S}$.
The following observation basically states that, if the $m$-step transition probabilities converge, then the limit yields an invariant distribution. The central result, Theorem 4.29 presented later on, gives a sufficient condition for which types of Markov chains converge to equilibrium (even for every initial distribution).

Observation 4.22. Let $X$ be $M(P, \lambda)$ with finite $S$. If for some $i \in S$ we have for all $j \in S$ that $p_{i,j}^{(m)} \to \pi_j$ as $m \to \infty$, then $\pi = (\pi_j)_{j \in S}$ is an invariant distribution.
Proof. We have
$$\sum_{j \in S} \pi_j = \sum_{j \in S} \lim_{t \to \infty} p_{i,j}^{(t)} = \lim_{t \to \infty} \sum_{j \in S} p_{i,j}^{(t)} = 1$$
and
$$\pi_j = \lim_{t \to \infty} p_{i,j}^{(t)} = \lim_{t \to \infty} \sum_{k \in S} p_{i,k}^{(t-1)} p_{k,j} = \sum_{k \in S} \lim_{t \to \infty} p_{i,k}^{(t-1)} p_{k,j} = \sum_{k \in S} \pi_k\, p_{k,j},$$
where we have used the finiteness of $S$ to justify the exchange of summation and limit. Hence $\pi$ is an invariant distribution.

Notice that the finiteness of $S$ is essential: in the random walks on $\mathbb{Z}^{1,2,3}$ discussed earlier, we clearly have $p_{i,j}^{(m)} \to 0$ as $m \to \infty$. The limit measure is certainly invariant, but it is not a distribution.
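For a finite chain, one practical way to approximate an invariant distribution is to iterate Observation 4.4 until the distribution stops changing. A minimal sketch (assuming NumPy; the two-state matrix is Example 4.1 with illustrative values $\alpha = 0.3$, $\beta = 0.6$):

```python
import numpy as np

def invariant_distribution(P, tol=1e-12, max_iter=100_000):
    """Approximate pi with pi = pi P by repeated multiplication (power iteration).

    Converges for finite, irreducible, aperiodic chains (see Theorem 4.29);
    for periodic chains the iteration may oscillate instead of converging.
    """
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)          # start from the uniform distribution
    for _ in range(max_iter):
        new_pi = pi @ P
        if np.abs(new_pi - pi).sum() < tol:
            return new_pi
        pi = new_pi
    return pi

alpha, beta = 0.3, 0.6
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])
print(invariant_distribution(P))      # approx [2/3, 1/3] = (beta, alpha)/(alpha+beta)
```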
We now show that every irreducible and recurrent Markov chain has an essentially unique invariant measure. For a fixed state $k$, let the expected time spent in state $i$ between visits to $k$ be
$$\gamma_i^k = \mathrm{E}\Bigg[ \sum_{t=0}^{T_k - 1} V_{i,t} \,\Bigg|\, X_0 = k \Bigg].$$

Theorem 4.23. Let $P$ be irreducible and recurrent. Then we have

(i) $\gamma_k^k = 1$,

(ii) $\gamma^k = (\gamma_i^k)_{i \in S}$ satisfies $\gamma^k = \gamma^k P$,

(iii) $0 < \gamma_i^k < \infty$ for all $i \in S$.
Proof. Statement (i) is obvious. For statement (ii), for $t = 1, 2, \ldots$ the event $\{T_k \ge t\}$ depends only on $X_0, \ldots, X_{t-1}$, so by the Markov property at $t-1$,
$$\Pr[X_{t-1} = i, X_t = j, T_k \ge t \mid X_0 = k] = p_{i,j} \Pr[X_{t-1} = i, T_k \ge t \mid X_0 = k].$$
Since $P$ is recurrent, we have $T_k < \infty$ with probability one and $X_0 = X_{T_k} = k$ by the conditioning. Hence
$$\gamma_j^k = \mathrm{E}\Bigg[ \sum_{t=1}^{T_k} 1_{\{X_t = j\}} \,\Bigg|\, X_0 = k \Bigg] = \mathrm{E}\Bigg[ \sum_{t=1}^{\infty} 1_{\{X_t = j,\, T_k \ge t\}} \,\Bigg|\, X_0 = k \Bigg] = \sum_{t=1}^{\infty} \Pr[X_t = j, T_k \ge t \mid X_0 = k] = \sum_{i \in S} \sum_{t=1}^{\infty} \Pr[X_{t-1} = i, X_t = j, T_k \ge t \mid X_0 = k] = \sum_{i \in S} p_{i,j} \sum_{t=1}^{\infty} \Pr[X_{t-1} = i, T_k \ge t \mid X_0 = k] = \sum_{i \in S} p_{i,j}\, \mathrm{E}\Bigg[ \sum_{t=1}^{\infty} 1_{\{X_{t-1} = i,\, T_k \ge t\}} \,\Bigg|\, X_0 = k \Bigg] = \sum_{i \in S} p_{i,j}\, \mathrm{E}\Bigg[ \sum_{t=0}^{T_k - 1} 1_{\{X_t = i\}} \,\Bigg|\, X_0 = k \Bigg] = \sum_{i \in S} \gamma_i^k\, p_{i,j}.$$
For (iii), for each state $i$, there exist $n, m \ge 0$ with $p_{i,k}^{(n)}, p_{k,i}^{(m)} > 0$. Then $\gamma_i^k \ge \gamma_k^k\, p_{k,i}^{(m)} > 0$ and $\gamma_i^k\, p_{i,k}^{(n)} \le \gamma_k^k = 1$ by (i) and (ii).

Theorem 4.24. Let $P$ be irreducible and let $\mu$ be an invariant measure for $P$ with $\mu_k = 1$. Then $\mu \ge \gamma^k$. If, in addition, $P$ is recurrent, then $\mu = \gamma^k$.

Proof. For each $j \in S$ we have
$$\mu_j = \sum_{i_0 \in S} \mu_{i_0} p_{i_0,j} = \sum_{i_0 \ne k} \mu_{i_0} p_{i_0,j} + p_{k,j} = \sum_{i_0, i_1 \ne k} \mu_{i_1} p_{i_1,i_0} p_{i_0,j} + p_{k,j} + \sum_{i_0 \ne k} p_{k,i_0} p_{i_0,j} = \cdots = \sum_{i_0, \ldots, i_t \ne k} \mu_{i_t} p_{i_t,i_{t-1}} \cdots p_{i_0,j} + p_{k,j} + \sum_{i_0 \ne k} p_{k,i_0} p_{i_0,j} + \cdots + \sum_{i_0, \ldots, i_{t-1} \ne k} p_{k,i_{t-1}} \cdots p_{i_0,j} \ge \sum_{s=1}^{t} \Pr[X_s = j, T_k \ge s \mid X_0 = k] \to \gamma_j^k \quad \text{for } t \to \infty.$$
Hence $\mu \ge \gamma^k$. If $P$ is recurrent, then $\gamma^k$ is invariant by Theorem 4.23, so $\mu - \gamma^k$ is also invariant and $\ge 0$. Since $P$ is irreducible, given $i \in S$, we have $p_{i,k}^{(t)} > 0$ for some $t$, and
$$0 = \mu_k - \gamma_k^k = \sum_{j \in S} (\mu_j - \gamma_j^k)\, p_{j,k}^{(t)} \ge (\mu_i - \gamma_i^k)\, p_{i,k}^{(t)},$$
so $\mu_i = \gamma_i^k$.
The next result states what invariant distributions look like. Recall the first passage time $T_i$ for state $i$ and define the expected return time $s_i = \mathrm{E}[T_i \mid X_0 = i]$. If a recurrent state has finite expected return time, then it is called positive recurrent, otherwise null recurrent.

Theorem 4.25. Let $P$ be irreducible. Then the following are equivalent:

(i) Every state is positive recurrent.

(ii) Some state $i$ is positive recurrent.

(iii) $P$ has an invariant distribution $\pi = (\pi_i)_{i \in S}$.

In particular, if (iii) holds, then $\pi_i = 1/s_i$ for all $i \in S$.
Proof. Obviously (i) implies (ii). Now we show (ii) implies (iii). If $i$ is positive recurrent, it is certainly recurrent, so $P$ is recurrent. By Theorem 4.23, $\gamma^i$ is then invariant. But
$$\sum_{j \in S} \gamma_j^i = s_i < \infty$$
by positive recurrence, so $\pi_j = \gamma_j^i / s_i$ defines an invariant distribution. Finally, (iii) implies (i): Fix any state $k$. Since $P$ is irreducible and $\sum_{i \in S} \pi_i = 1$ we have $\pi_k = \sum_{i \in S} \pi_i\, p_{i,k}^{(t)} > 0$ for some $t$. Set $\mu_i = \pi_i / \pi_k$. Then $\mu$ is an invariant measure with $\mu_k = 1$. So by Theorem 4.24, $\mu \ge \gamma^k$. Hence
$$s_k = \sum_{i \in S} \gamma_i^k \le \sum_{i \in S} \frac{\pi_i}{\pi_k} = \frac{1}{\pi_k} < \infty$$
and $k$ is positive recurrent.


If the state space is finite and the limit in Observation 4.22 exists, then it must be an invariant distribution. But the limit does not always exist, as the example below shows.

Example 4.26. The two-state Markov chain with the transition matrix
$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
(its diagram consists of two states that swap with probability 1) has the property that $P^{2m} = I$ and $P^{2m+1} = P$. Thus the $p_{i,j}^{(m)}$ do not converge for all $i, j$. The cause is that the chain is periodic, that is, if we know that the chain is in state 0 at time $t$, then we know for certain that it is not in state 0 at time $t+1$.

A state $i$ is called $d$-periodic if $d \ge 1$ is the largest integer such that $p_{i,i}^{(t)} = 0$ unless $t$ is divisible by $d$. If $d > 1$ the state $i$ is periodic, if $d = 1$ the state $i$ is aperiodic. If any state of a Markov chain is periodic, the chain is called periodic. Otherwise, i.e., if all states are aperiodic, the chain is aperiodic. Notice that a state $i$ is aperiodic if and only if the set $\{t \ge 0 : p_{i,i}^{(t)} > 0\}$ has no common divisor other than 1.

The chain in the above example is 2-periodic because $p_{i,i}^{(2t+1)} = 0$ and $p_{i,i}^{(2t)} = 1$ for all $t \ge 0$ and $i \in \{0, 1\}$.

Lemma 4.27. A state $i$ is aperiodic if and only if $p_{i,i}^{(m)} > 0$ for all sufficiently large $m$.

Proof. Let $i \in S$ satisfy the condition that $p_{i,i}^{(m)} > 0$ for all sufficiently large $m$. Then the set $\{t \in T : p_{i,i}^{(t)} > 0\}$ has no common divisor other than one. Hence $i$ is aperiodic.

Conversely, let $i \in S$ be aperiodic. With the use of the Extended Euclidean Algorithm we have the following statement: for $a, b \in \mathbb{N}$, there is $n_0 \in \mathbb{N}$ such that, for $d = \gcd(a, b)$ and any $n \ge n_0$, there are $x, y \in \mathbb{N}_0$ such that $nd = xa + yb$. If $p_{i,i}^{(a)}, p_{i,i}^{(b)} > 0$, then we also have
$$p_{i,i}^{(nd)} = p_{i,i}^{(xa+yb)} \ge (p_{i,i}^{(a)})^x\, (p_{i,i}^{(b)})^y > 0.$$
Since $i$ is aperiodic, there is a sequence of numbers $a_0, a_1, \ldots, a_k$ such that $p_{i,i}^{(a_\ell)} > 0$ for $\ell = 0, \ldots, k$ and $d_1 = \gcd(a_0, a_1)$, $d_\ell = \gcd(d_{\ell-1}, a_\ell)$ for $\ell = 2, \ldots, k$ with the property $d_1 > d_2 > \cdots > d_k = 1$. These two properties imply that $p_{i,i}^{(m)} > 0$ for all sufficiently large $m$, as claimed.
Lemma 4.28. Let $P$ be irreducible and have an aperiodic state $i$. Then, for all $j, k \in S$ we have $p_{j,k}^{(m)} > 0$ for all sufficiently large $m$. In particular, all states are aperiodic.

Proof. There exist $r, s, t \ge 0$ with $p_{j,i}^{(r)}, p_{i,i}^{(s)}, p_{i,k}^{(t)} > 0$. Then
$$p_{j,k}^{(r+s+t)} \ge p_{j,i}^{(r)}\, p_{i,i}^{(s)}\, p_{i,k}^{(t)} > 0$$
for all sufficiently large $m = r + s + t$, where we have used that the state $i$ is aperiodic and Lemma 4.27.
Theorem 4.29 (Convergence). Let $P$ be irreducible and aperiodic, and suppose that $P$ has an invariant distribution $\pi$. Let $\lambda$ be any distribution. If $X$ is $M(P, \lambda)$ then
$$\Pr[X_t = j] \to \pi_j \quad \text{as } t \to \infty \text{ for all } j \in S.$$
In particular, $p_{i,j}^{(t)} \to \pi_j$ as $t \to \infty$ for all $i, j \in S$.


Proof. We use a coupling argument. Let $Y$ be $M(P, \pi)$ and independent of $X$. Fix a reference state $b$ and set
$$M = \inf\{t \ge 1 : X_t = Y_t = b\}.$$

Step 1. We show $\Pr[M < \infty] = 1$. The process $W = (W_t)_{t \in T}$ defined by $W_t = (X_t, Y_t)$ is a Markov chain on $S \times S$ with transition probabilities $q_{(i,k),(j,\ell)} = p_{i,j}\, p_{k,\ell}$ and initial distribution $\mu_{(i,k)} = \lambda_i \pi_k$. Since $P$ is aperiodic, for all states $i, j, k, \ell$ we have $q_{(i,k),(j,\ell)}^{(t)} = p_{i,j}^{(t)} p_{k,\ell}^{(t)} > 0$ for sufficiently large $t$. So the matrix $Q$ is irreducible and has an invariant distribution $\nu_{(i,k)} = \pi_i \pi_k$. So by Theorem 4.25, $Q$ is positive recurrent. $M$ is the first passage time of $W$ to $(b, b)$, so $\Pr[M < \infty] = 1$ by Corollary 4.19.

Step 2. Define the process
$$Z_t = \begin{cases} X_t & \text{if } t < M, \\ Y_t & \text{if } t \ge M, \end{cases}$$
and we will show that $Z = (Z_t)_{t \in T}$ is $M(P, \lambda)$.

The strong Markov property applies to $W$ at time $M$ since this is a stopping time. Thus $(X_{M+t}, Y_{M+t})_{t \in T}$ is $M(Q, \delta_{(b,b)})$ and independent of $(X_0, Y_0), \ldots, (X_M, Y_M)$. By symmetry, we can replace the process $(X_{M+t}, Y_{M+t})_{t \in T}$ by $(Y_{M+t}, X_{M+t})_{t \in T}$, which is also $M(Q, \delta_{(b,b)})$ and independent of $(X_0, Y_0), \ldots, (X_M, Y_M)$. Hence the process $W' = (W'_t)_{t \in T}$ with $W'_t = (Z_t, Z'_t)$ and
$$Z'_t = \begin{cases} Y_t & \text{if } t < M, \\ X_t & \text{if } t \ge M \end{cases}$$
is $M(Q, \mu)$. In particular, $Z$ is $M(P, \lambda)$.

Step 3. We have
$$\Pr[Z_t = j] = \Pr[X_t = j, t < M] + \Pr[Y_t = j, t \ge M]$$
and hence, since $X$ and $Z$ are $M(P, \lambda)$,
$$|\Pr[X_t = j] - \pi_j| = |\Pr[Z_t = j] - \Pr[Y_t = j]| = |\Pr[X_t = j, t < M] - \Pr[Y_t = j, t < M]| \le \Pr[t < M] \to 0 \quad \text{for } t \to \infty,$$
completing the proof.


An aperiodic, positive recurrent state is called an ergodic state. A Markov chain is ergodic if all its states are ergodic.

Observation 4.30. Any finite, irreducible, and aperiodic Markov chain $X = (X_t)_{t \in T}$ is an ergodic chain.

As the central result for discrete space, discrete time, and time-invariant Markov chains, we state the following direct corollary without proof.

Corollary 4.31 (Ergodic Theorem). Let $X = (X_t)_{t \in T}$ be a finite, irreducible, and ergodic Markov chain $M(P, \lambda)$. Then $X$ has the following properties:

(i) $X$ has a unique stationary distribution $\pi = (\pi_i)_{i \in S}$;

(ii) $\Pr[X_t = i] \to \pi_i$ as $t \to \infty$ for all $i \in S$;

(iii) $\pi_i = 1/s_i$ for all $i \in S$.


4.7 Applications

4.7.1 Satisfiability

Recall the problem Satisfiability: There is a set of $n$ Boolean variables $X = \{x_1, \ldots, x_n\}$ with $x_i \in \{0, 1\}$ for $i = 1, \ldots, n$. This induces a set of $2n$ literals $L = \{x_1, \bar{x}_1, \ldots, x_n, \bar{x}_n\}$, where $\bar{x} = 1 - x$ denotes the negation of $x$. A clause is a disjunction of literals, i.e., $C = (\ell_1 \vee \cdots \vee \ell_k)$ with $\ell_i \in L$ for $i = 1, \ldots, k$. The length $k(C)$ of a clause $C$ is the number of literals in $C$. A formula $F$ is a conjunction of clauses, i.e., $F = C_1 \wedge \cdots \wedge C_\ell$. Here the length $\ell(F)$ of the formula $F$ is the number of clauses in $F$. We are given a Boolean formula $F$ and have to decide if it is satisfiable, i.e., if there is an assignment of the $x_i \in \{0, 1\}$ such that each clause evaluates to 1.

The problem k-Satisfiability requires that each clause has length at most $k$. If $k \le 2$, the problem can be solved deterministically in (polynomial) time $O(n)$. For $k \ge 3$, the problem is NP-complete. An obvious solution procedure is to try all assignments, which requires $2^n$ formula evaluations. Here we give a randomized Monte Carlo algorithm which produces a correct answer in expected time $O(\mathrm{poly}(n) \cdot (2 - 2/k)^n)$. This is particularly interesting for small values of $k$. In the analysis, we will assume $k = 3$ only for the sake of exposition. The basic idea of Algorithm 4.1 is this: While there is an unsatisfied clause $C$, pick a literal in it uniformly at random and flip it, i.e., change the value of the corresponding variable $x$ to $1 - x$. If no satisfying assignment was found after a certain number $m$ of these attempts, then abort.
Algorithm 4.1 Random Walk SAT

Input. Boolean formula $F$, integer $m$

Output. Satisfying assignment $x \in \{0, 1\}^n$, or unsatisfiable

Step 1. Start with an arbitrary assignment $x \in \{0, 1\}^n$.

Step 2. Repeat up to $m$ times, terminating if $x$ satisfies $F$:

(a) Choose an arbitrary unsatisfied clause $C$.

(b) Choose a literal $\ell \in C$ uniformly at random and let $x(\ell)$ be the corresponding variable. Set $x(\ell) = 1 - x(\ell)$.

Step 3. Return unsatisfiable.
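A direct Python sketch of Algorithm 4.1 follows. The clause representation as lists of signed variable indices (DIMACS style) and the example formula are assumptions for illustration, not part of the original text; the initial assignment here is drawn at random, which is one admissible choice of an "arbitrary" assignment.

```python
import random

def satisfies(assignment, formula):
    """Check whether assignment (dict var -> 0/1) satisfies all clauses."""
    return all(
        any((assignment[abs(lit)] == 1) == (lit > 0) for lit in clause)
        for clause in formula
    )

def random_walk_sat(formula, n, m, rng=random):
    """Algorithm 4.1: formula is a list of clauses, each a list of signed
    variable indices (e.g. [1, -2, 3] means x1 or not-x2 or x3).
    Returns a satisfying assignment or None (the 'unsatisfiable' answer)."""
    x = {v: rng.randint(0, 1) for v in range(1, n + 1)}    # Step 1
    for _ in range(m):                                      # Step 2
        if satisfies(x, formula):
            return x
        unsat = [c for c in formula
                 if not any((x[abs(l)] == 1) == (l > 0) for l in c)]
        clause = unsat[0]                                   # (a) arbitrary unsatisfied clause
        lit = rng.choice(clause)                            # (b) uniform literal
        x[abs(lit)] = 1 - x[abs(lit)]                       # flip its variable
    return None                                             # Step 3

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
formula = [[1, 2], [-1, 3], [-2, -3]]
print(random_walk_sat(formula, n=3, m=3 * 3))
```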

First we assume that $F$ does not have a satisfying assignment. In that case the algorithm will always (correctly) output unsatisfiable. Now assume that $F$ has a satisfying assignment $a \in \{0, 1\}^n$, say. Let the assignment after $t$ steps of the algorithm be $A_t \in \{0, 1\}^n$ and let $N_t$ denote the number of variables in $A_t$ that have the same value as in $a$, i.e., the variables that match. Of course, $N_t \in \{0, \ldots, n\}$, and if $N_t = n$ holds, then the satisfying assignment $a$ has been found.

In each step, we choose an unsatisfied clause $C$, and hence $A_t$ and $a$ must disagree in at least one variable of $C$. As $k = 3$, the probability that the algorithm flips one of these variables is thus at least $1/3$. Hence we have
$$\Pr[N_{t+1} = i + 1 \mid N_t = i] \ge \frac{1}{3} \quad \text{for } i = 0, \ldots, n-1,$$
$$\Pr[N_{t+1} = i - 1 \mid N_t = i] \le \frac{2}{3} \quad \text{for } i = 1, \ldots, n-1.$$
We can analyze the expected number of steps until $N_t = n$ holds with the Markov chain $X = (X_t)_{t \in T}$, where $X_0 = N_0$ and
$$\Pr[X_{t+1} = 1 \mid X_t = 0] = 1,$$
$$\Pr[X_{t+1} = i + 1 \mid X_t = i] = \frac{1}{3} \quad \text{for } i = 1, \ldots, n-1,$$
$$\Pr[X_{t+1} = i - 1 \mid X_t = i] = \frac{2}{3} \quad \text{for } i = 1, \ldots, n-1,$$
$$\Pr[X_{t+1} = n \mid X_t = n] = 1.$$
The diagram is hence a path on the states $0, 1, \ldots, n$: state 0 moves to 1 with probability 1, each state $1 \le i \le n-1$ moves up with probability $1/3$ and down with probability $2/3$, and state $n$ is absorbing.

(Technically one has to prove that the Markov chain $(N_t)_{t \in T}$ reaches the state $n$ in expectation faster than the chain $(X_t)_{t \in T}$ does. But we omit the required coupling argument here.)
We give two analyses of the algorithm here. The first analysis is for a naive version of the algorithm Random Walk SAT, which yields (undesirable) expected $O(2^n)$ formula evaluations. Then we slightly modify the algorithm, which yields $O((2 - 2/k)^n)$ formula evaluations; this is especially useful for small values of $k$.

Theorem 4.32. Let $k = 3$. Then the algorithm Random Walk SAT has expected running time $O(\mathrm{poly}(n) \cdot 2^n)$.
Proof. We choose $m = 2^n$. If there is no satisfying assignment for the formula $F$, the algorithm terminates after $2^n$ formula evaluations. Each evaluation takes time $O(\mathrm{poly}(n))$. Now we assume that there is a satisfying assignment. We compute the vector $t^{\{n\}}$ of expected hitting times for state $n$. We abbreviate $t_i = t_i^{\{n\}}$ for any initial state $i \in \{0, \ldots, n\}$. We apply Theorem 4.12 and have to solve the following system of linear equations:
$$t_n = 0, \qquad t_i = \frac{2}{3} t_{i-1} + \frac{1}{3} t_{i+1} + 1 \quad \text{for } i = 1, \ldots, n-1, \qquad t_0 = t_1 + 1.$$
Solving this recurrence yields
$$t_i = 2^{n+2} - 2^{i+2} - 3(n - i).$$
In particular, $t_i = O(2^n)$ for every initial state $i$, which gives the claimed bound on the expected number of steps.


As there are $2^n$ possible assignments, the above running time is not very appealing. There are two insights that help reduce the number of formula evaluations:

(i) If we choose the initial assignment uniformly at random, then the number $N_0$ is binomially distributed with expected value $n/2$. However, there is an exponentially small, but non-negligible, probability that $N_0$ is significantly larger than $n/2$.

(ii) It is more likely to move towards 0 than towards $n$. Thus, the longer the process runs, the more likely it has moved towards 0. Hence we will choose a small value for $m$ and repeat the whole algorithm (with new initial assignments) more often.

Consider the algorithm Random Walk SAT with the following modifications: In Step 1 the assignment $x$ is chosen uniformly at random, in Step 2 we choose $m = 3n$ (for technical reasons), and the whole algorithm is repeated up to $r = (n/c) \cdot (2 - 2/k)^n$ times, where $c$ is a constant defined later.
Theorem 4.33. Let $k = 3$. Then the modified Random Walk SAT algorithm has expected running time $O(\mathrm{poly}(n) \cdot (4/3)^n)$.

Proof. Recall that $m = 3n$ and there are up to $r = (n/c) \cdot (2 - 2/k)^n = (n/c) \cdot (4/3)^n$ repetitions for $k = 3$, where $c$ is a constant defined below. Thus, if the formula $F$ does not have a satisfying assignment, then the modified algorithm terminates after $O(\mathrm{poly}(n) \cdot (4/3)^n)$ evaluations that take $O(\mathrm{poly}(n))$ time each. Now, let $F$ have a satisfying assignment $a$, say. The modified algorithm has up to $m = 3n$ steps to reach $a$ starting from a random assignment. We now compute that $r$ many repetitions suffice in expectation to actually reach a satisfying assignment.

Let $q$ be the probability that the process reaches $a$ in up to $m = 3n$ steps starting from a random assignment. Below we define a lower bound $q_i$ on the probability that $a$ is reached from an initial assignment in which exactly $i$ variables do not agree with $a$, i.e., $N_0 = n - i$. Notice that
$$\binom{i+2j}{j} \left(\frac{2}{3}\right)^j \left(\frac{1}{3}\right)^{i+j}$$
is the probability that a random walk on $\mathbb{Z}$ with $p = 1/3$ moves exactly $j$ times down and exactly $i+j$ times up. It is hence a lower bound on the probability that the algorithm (i.e., the Markov chain $X$) reaches the assignment $a$ (i.e., the state $n$) within $i + 2j \le 3n$ many steps, starting with an assignment that has exactly $i$ non-matching variables (i.e., initial state $N_0 = n - i$). Therefore we define and derive
$$q_i := \max_{j=0,\ldots,i} \binom{i+2j}{j} \left(\frac{2}{3}\right)^j \left(\frac{1}{3}\right)^{i+j} \ge \binom{3i}{i} \left(\frac{2}{3}\right)^i \left(\frac{1}{3}\right)^{2i},$$
where we have considered the case $j = i$. Stirling's formula yields
$$\binom{3i}{i} = \frac{(3i)!}{i!\,(2i)!} \simeq \frac{c}{\sqrt{i}} \left(\frac{27}{4}\right)^i$$
with the unimportant constant $c = \sqrt{3/(4\pi)}$. Hence, for $i > 0$,
$$q_i \ge \binom{3i}{i} \left(\frac{2}{3}\right)^i \left(\frac{1}{3}\right)^{2i} \gtrsim \frac{c}{\sqrt{i}} \left(\frac{27}{4}\right)^i \left(\frac{2}{3}\right)^i \left(\frac{1}{3}\right)^{2i} = \frac{c}{\sqrt{i}} \cdot \frac{1}{2^i} \ge \frac{c}{i} \cdot \frac{1}{2^i}.$$
Now we estimate the overall success probability
$$q \ge \sum_{i=0}^{n} \Pr[N_0 = n-i]\, q_i \ge \frac{1}{2^n} + \sum_{i=1}^{n} \binom{n}{i} \frac{1}{2^n} \cdot \frac{c}{i} \cdot \frac{1}{2^i} \ge \frac{c}{n} \cdot \frac{1}{2^n} \sum_{i=0}^{n} \binom{n}{i} \frac{1}{2^i}\, 1^{n-i} = \frac{c}{n} \cdot \frac{1}{2^n} \left(\frac{3}{2}\right)^n = \frac{c}{n} \left(\frac{3}{4}\right)^n,$$
where we have used the Binomial theorem, yielding $\sum_{i=0}^{n} \binom{n}{i} (1/2)^i\, 1^{n-i} = (1 + 1/2)^n = (3/2)^n$. Thus, by the geometric distribution, $1/q \le (n/c) \cdot (4/3)^n$ repetitions suffice in expectation.

Remark 4.34. The above algorithm is of course also applicable for general values of $k$. For example, for $k = 2$, the problem is solvable in polynomial time and the algorithm achieves an expected running time of $O(n^2)$. This is not very compelling, since there is a deterministic algorithm with running time $O(n)$. The following table depicts the expected number of formula evaluations for larger values of $k$ (without proof):

  k   evaluations (up to poly(n))
  3   $(4/3)^n \approx 1.33^n$
  4   $(3/2)^n \approx 1.5^n$
  5   $(8/5)^n \approx 1.6^n$
  6   $(5/3)^n \approx 1.66^n$
  k   $(2 - 2/k)^n$

4.7.2 Queueing System

Imagine a queue where objects (customers, jobs, packets) wait for service. We consider a model where time is divided into slots (of length one) and the queue has a capacity of $n$, i.e., at most $n$ objects can be in the queue at any time. Let $0 < \lambda, \mu < 1$ be two parameters, called the arrival and departure probability (with $\lambda + \mu \le 1$). Let $X_t$ be the number of objects in the queue at time $t$. At each time $t$, exactly one of the following occurs:

(i) If $X_t < n$, then $X_{t+1} = X_t + 1$ with probability $\lambda$, i.e., a new object enters.

(ii) If $X_t > 0$, then $X_{t+1} = X_t - 1$ with probability $\mu$, i.e., one object disappears.

(iii) The queue remains unchanged otherwise.

The process is obviously a time-invariant Markov chain with the following transition probabilities, yielding a transition matrix $P = (p_{i,j})_{0 \le i,j \le n}$:

$$p_{i,i+1} = \Pr[X_{t+1} = i+1 \mid X_t = i] = \lambda \quad \text{if } i < n,$$
$$p_{i,i-1} = \Pr[X_{t+1} = i-1 \mid X_t = i] = \mu \quad \text{if } i > 0,$$
$$\Pr[X_{t+1} = i \mid X_t = i] = \begin{cases} 1 - \lambda & \text{if } i = 0, \\ 1 - \lambda - \mu & \text{if } 1 \le i \le n-1, \\ 1 - \mu & \text{if } i = n. \end{cases}$$
The Markov chain is irreducible, finite, and aperiodic. So it has a unique stationary distribution $\pi$ satisfying $\pi = \pi P$. We have
$$\pi_0 = (1 - \lambda)\pi_0 + \mu \pi_1,$$
$$\pi_i = \lambda \pi_{i-1} + (1 - \lambda - \mu)\pi_i + \mu \pi_{i+1} \quad \text{for } i = 1, \ldots, n-1,$$
$$\pi_n = \lambda \pi_{n-1} + (1 - \mu)\pi_n.$$
One verifies that
$$\pi_i = \pi_0 \left(\frac{\lambda}{\mu}\right)^i$$
is a solution of the above system. With the additional requirement $\sum_{i=0}^n \pi_i = 1$ we have
$$\pi_0 = \frac{1}{\sum_{i=0}^n (\lambda/\mu)^i}.$$
Hence we find for $j = 0, \ldots, n$
$$\pi_j = \frac{(\lambda/\mu)^j}{\sum_{i=0}^n (\lambda/\mu)^i}.$$
Thus the long-term behaviour of the chain is given by the state vector $\pi = (\pi_i)_{0 \le i \le n}$ above.
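A minimal sketch (assuming NumPy; the parameters $\lambda = 0.2$, $\mu = 0.3$, $n = 5$ are illustrative only) that evaluates the closed-form stationary distribution and cross-checks it against $\pi = \pi P$:

```python
import numpy as np

def queue_stationary(lam, mu, n):
    """Stationary distribution pi_j = (lam/mu)^j / sum_i (lam/mu)^i of the
    finite queue with capacity n (closed-form solution from the text)."""
    rho = lam / mu
    weights = rho ** np.arange(n + 1)
    return weights / weights.sum()

lam, mu, n = 0.2, 0.3, 5
P = np.zeros((n + 1, n + 1))
for i in range(n + 1):
    if i < n:
        P[i, i + 1] = lam         # arrival
    if i > 0:
        P[i, i - 1] = mu          # departure
    P[i, i] = 1 - P[i].sum()      # queue unchanged otherwise
pi = queue_stationary(lam, mu, n)
print(np.allclose(pi, pi @ P))    # True: pi is invariant
print(pi)
```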

4.7.3 Random Walks on Graphs

Random walks on graphs occur once in a while in the analysis of algorithms. Let $G = (V, E)$ be a finite, connected, undirected graph. For a vertex $i$, let $N(i)$ and $d(i) = |N(i)|$ denote the neighbours of $i$ and their number, i.e., its degree. A random walk on $G$ is a Markov chain $X = (X_t)_{t \in T}$ with $X_t \in V$ and the transition probabilities
$$p_{i,j} = \Pr[X_{t+1} = j \mid X_t = i] = \frac{1}{d(i)} \quad \text{for any } \{i, j\} \in E, \text{ and } p_{i,j} = 0 \text{ otherwise.}$$
For random walks on undirected graphs, there is a simple criterion for aperiodicity: non-bipartiteness of $G$, which will also be assumed throughout.

Lemma 4.35. A random walk $X$ on $G$ is aperiodic if and only if $G$ is not bipartite.

Proof. A graph is bipartite if and only if it does not have odd cycles. In an undirected graph, there is always a closed walk of even length from a vertex to itself (to a neighbour and back). If the graph is bipartite, then every closed walk has even length, so the random walk is 2-periodic. If the graph is not bipartite, then we have an odd cycle, whose traversal yields a closed walk of odd length from a vertex to itself, so the walk is aperiodic.

Thus, a finite, non-bipartite, connected, undirected graph yields a Markov chain with a unique stationary distribution, having the following properties. Recall that $s_i$ denotes the expected return time for vertex $i$.

Theorem 4.36. A random walk $X$ on $G$ converges to a stationary distribution $\pi$ with
$$\pi_i = \frac{d(i)}{2|E|} \quad \text{and} \quad s_i = \frac{1}{\pi_i}$$
for all $i \in V$.

Proof. Since $\sum_{i \in V} d(i) = 2|E|$ we have that
$$\sum_{i \in V} \pi_i = \sum_{i \in V} \frac{d(i)}{2|E|} = 1$$
and $\pi$ is a distribution over $V$. If $P$ denotes the transition matrix, the condition $\pi = \pi P$ reads
$$\pi_i = \sum_{j \in V} \pi_j\, p_{j,i} = \sum_{j \in N(i)} \frac{d(j)}{2|E|} \cdot \frac{1}{d(j)} = \frac{d(i)}{2|E|}$$
and the proof is complete.


Example 4.37 (Random Chessboard Knight). Suppose a knight is in one corner $i$ of a chessboard. If the knight moves randomly, what is the expected number of moves the knight makes before returning to $i$? The above theorem states that we have to compute $d(i)$ and $|E|$. The vertex $i$ has 2 neighbours. Furthermore, there are 4 corners with degree 2, 8 vertices with degree 3, 20 with degree 4, 16 with degree 6, and 16 with degree 8. Thus $2|E| = \sum_{i \in V} d(i) = 8 + 24 + 80 + 96 + 128 = 336$ and hence $s_i = 336/2 = 168$.

Alternatively, the same result can be obtained with the approach $\pi = \pi P$, if you enjoy solving a system of 64 simultaneous linear equations.
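A short sketch (standard library only) that recomputes the degree count and the expected return time from Theorem 4.36:

```python
def knight_degrees(size=8):
    """Degree of every square in the knight-move graph on a size x size board."""
    moves = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]
    deg = {}
    for r in range(size):
        for c in range(size):
            deg[(r, c)] = sum(
                0 <= r + dr < size and 0 <= c + dc < size for dr, dc in moves
            )
    return deg

deg = knight_degrees()
total_degree = sum(deg.values())          # = 2|E| = 336
corner = deg[(0, 0)]                      # = 2
print(total_degree, corner, total_degree / corner)   # 336 2 168.0
```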

