
THE BELLMAN PRINCIPLE OF OPTIMALITY

IOANID ROSU
Date: April 8, 2002.

As I understand, there are two approaches to dynamic optimization: the Pontrjagin (Hamiltonian) approach, and the Bellman approach. I saw several clear discussions of the Hamiltonian approach (Barro & Sala-i-Martin, Blanchard & Fischer, D. Romer), but to my surprise, I didn't see any clear treatment of the Bellman principle. What I saw so far (Duffie, Chapters 3 and 9, and the Appendix of Mas-Colell, Whinston & Green) is confusing to me. I guess I should take a look at some dynamic optimization textbook, but I'm too lazy for that. Instead, I'm going to try to figure it out on my own, hoping that my freshness on the subject can be put to use.
The first four sections are only about local conditions for having a (finite) optimum. In the last section I will discuss global conditions for optima in Bellman's framework, and give an example where I solve the problem completely. As a bonus, in Section 5, I use the Bellman method to derive the Euler-Lagrange equation of variational calculus.
1. Discrete time, certainty
We start in discrete time, and we assume perfect foresight (so no expectation will be involved). The general problem we want to solve is
\[
\max_{(c_t)} \sum_{t=0}^{\infty} f(t, k_t, c_t)
\quad \text{s.t.} \quad k_{t+1} = g(t, k_t, c_t). \tag{1}
\]
In addition, we impose a budget constraint, which for many examples is the restriction that $k_t$ be eventually positive (i.e. $\liminf_{t\to\infty} k_t \ge 0$). This budget constraint excludes explosive solutions for $c_t$, so that we can apply the Bellman method. I won't mention the budget constraint until the last section, but we should keep in mind that without it (or some constraint like it), we might have no solution.
The usual names for the variables involved are: $c_t$ is the control variable (because it is under the control of the choice maker), and $k_t$ is the state variable (because it describes the state of the system at the beginning of $t$, when the agent makes the decision). In this paper, I call the equation $k_{t+1} = g(t, k_t, c_t)$ the state equation; I don't know what it is called in the literature.
To get some intuition about the problem, think of $k_t$ as capital available for production at time $t$, and of $c_t$ as consumption at $t$. At time 0, for a starting level of capital $k_0$, the consumer chooses the level of consumption $c_0$. This determines the level of capital available for the next period, $k_1 = g(0, k_0, c_0)$. So at time 1, the consumer decides on the level of $c_1$, which together with $k_1$ determines $k_2$, and the cycle is repeated on and on. The infinite sum $\sum_{t=0}^{\infty} f(t, k_t, c_t)$ is to be thought of as the total utility of the consumer, which the latter is supposed to maximize at time 0.
Bellman's idea for solving (1) is to define a value function $V$ at each $t = 0, 1, 2, \ldots$
\[
V(t, k_t) = \max_{(c_s)} \sum_{s=t}^{\infty} f(s, k_s, c_s)
\quad \text{s.t.} \quad k_{s+1} = g(s, k_s, c_s),
\]
which represents the consumer's maximum utility given the initial level of $k_t$. Then we have the following obvious result.
Theorem 1.1. (Bellman's principle of optimality)
For each $t = 0, 1, 2, \ldots$
\[
V(t, k_t) = \max_{c_t} \Big\{ f(t, k_t, c_t) + V\big(t+1,\, g(t, k_t, c_t)\big) \Big\}. \tag{2}
\]
This in principle reduces an infinite-period optimization problem to a two-period optimization problem. But is this the whole story? How do we actually solve the optimization problem (1)? Here is where the textbooks I mentioned above are less clear.¹ Well, let's try to squeeze all we can out of Bellman's equation (2).
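Before doing that, it may help to see (2) used directly as a computational device. Here is a minimal numerical sketch (mine, not from any of the texts above) of value-function iteration for a stationary version of the problem, with $f(t,k,c) = \beta^t u(c)$ and $g(t,k,c) = (1+r)k + e - c$ as in Example 1.2 below; after factoring out $\beta^t$, the value function is a fixed point of the Bellman operator. The grid, parameters, and felicity function are illustrative assumptions.

```python
import numpy as np

# Value-function iteration for the stationary version of (1)-(2):
#   V(k) = max_c { u(c) + beta * V((1+r)*k + e - c) }.
# All parameters and the grid are illustrative choices, not from the text.
beta, r, e = 0.95, 0.06, 1.0
u = np.log                                   # felicity function (assumed)

k_grid = np.linspace(0.1, 10.0, 300)         # grid for the state variable k
V = np.zeros_like(k_grid)                    # initial guess for V

for _ in range(2000):
    resources = (1 + r) * k_grid + e         # what is available at each k
    # consumption implied by each choice of next-period capital k'
    c = resources[:, None] - k_grid[None, :]
    values = np.where(c > 0, u(np.where(c > 0, c, 1.0)) + beta * V[None, :], -np.inf)
    V_new = values.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-9:     # contraction has converged
        break
    V = V_new

k_next = k_grid[values.argmax(axis=1)]       # optimal policy k'(k) on the grid
c_opt = (1 + r) * k_grid + e - k_next        # optimal consumption c(k)
```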
We denote partial derivatives by using subscripts. A star superscript denotes the optimum. Then the first order condition from (2) reads
\[
f_c(t, k_t, c^*_t) + V_k\big(t+1,\, g(t, k_t, c^*_t)\big)\, g_c(t, k_t, c^*_t) = 0. \tag{3}
\]
Looking at this formula, it is clear that we would like to be able to compute the derivative $V_k(t+1, k_{t+1})$. We can try to do that using again the formula (2). Since we are differentiating a maximum operator, we apply the envelope theorem² and obtain
\[
V_k(t, k_t) = f_k(t, k_t, c^*_t) + V_k\big(t+1,\, g(t, k_t, c^*_t)\big)\, g_k(t, k_t, c^*_t). \tag{4}
\]
From (3), we can calculate $V_k\big(t+1,\, g(t, k_t, c^*_t)\big)$, and substituting it in (4), we get
\[
V_k(t, k_t) = \Big( f_k - \frac{f_c}{g_c}\, g_k \Big)(t, k_t, c^*_t). \tag{5}
\]
Finally, substitute this formula into (3) and obtain a condition which does not depend on the value function anymore:
\[
f_c(t, k_t, c^*_t) + g_c(t, k_t, c^*_t)\, \Big( f_k - \frac{f_c}{g_c}\, g_k \Big)\big(t+1,\, g(t, k_t, c^*_t),\, c^*_{t+1}\big) = 0. \tag{6}
\]
Notice that this formula is true for any $k_t$, not necessarily only for the optimal one up to that point. But in that case, $c^*_t$ and $c^*_{t+1}$ are the optimal choices given $k_t$. In any case, from now on we are only going to work at the optimum $(t, k^*_t, c^*_t)$. The previous formula can be written as follows:
\[
f_k(t+1) - \frac{f_c(t+1)}{g_c(t+1)}\, g_k(t+1) = -\frac{f_c(t)}{g_c(t)}. \tag{7}
\]
This is the key equation that allows us to compute the optimum $c^*_t$, using only the initial data ($f$ and $g$). I guess equation (7) should be called the Bellman equation, although in particular cases it goes by the name of the Euler equation (see the next Example). I am going to compromise and call it the Bellman-Euler equation.
For the purposes of comparison with the continuous-time version, write $g(t, k_t, c_t) = k_t + h(t, k_t, c_t)$. Denote $\Delta \psi_t = \psi(t+1) - \psi(t)$. Then we can rewrite the Bellman-Euler equation (7) as
\[
\Delta\Big( \frac{f_c(t)}{h_c(t)} \Big) = f_k(t+1) - \frac{f_c(t+1)}{h_c(t+1)}\, h_k(t+1). \tag{8}
\]
¹MWG only discusses the existence and uniqueness of a value function, while Duffie treats only the Example mentioned above, and leaves a crucial Lemma as an exercise at the end of the chapter.
²To state the envelope theorem, start with a function of two variables $f(x, \alpha)$, such that for every $x$, the maximum $\max_\alpha f(x, \alpha)$ is achieved at a point $\alpha = \alpha^*(x)$ in the interior of the $\alpha$-interval. Then
\[
\frac{d}{dx} \max_\alpha f(x, \alpha) = \frac{\partial f}{\partial x}\big(x, \alpha^*(x)\big).
\]
Example 1.2. In a typical dynamic optimization problem, the consumer has to maximize intertemporal utility, for which the instantaneous felicity is $u(c)$, with $u$ a von Neumann-Morgenstern utility function. Therefore, $f(t, k_t, c_t) = \beta^t u(c_t)$, where $0 < \beta < 1$ is the discount constant. The state equation $k_{t+1} = g(t, k_t, c_t)$ typically is given by
\[
e_t + \varphi(k_t) = c_t + (k_{t+1} - k_t), \tag{9}
\]
where $e_t$ is the endowment (e.g. labor income), and $\varphi(k_t)$ is the production function (technology).
As an example of the technology function, we have $\varphi(k_t) = r k_t$. The derivative $\varphi'(k_t) = r$ is then, as expected, the interest rate on capital. Notice that with the above description we have
\[
k_{t+1} = g(t, k_t, c_t) = k_t + \varphi(k_t) + e_t - c_t.
\]
So we get the following formulas: $f_c(t) = \beta^t u'(c_t)$, $g_c(t) = -1$, $f_k(t) = 0$, $g_k(t) = 1 + \varphi'(k_t) = 1 + r$. The Bellman-Euler equation (7) becomes
\[
u'(c_t) = \beta\, \big(1 + \varphi'(k_{t+1})\big)\, u'(c_{t+1}),
\]
which is the usual Euler equation.
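Here is a small numerical sketch of how this Euler equation pins down the optimum (my construction, with illustrative CRRA felicity $u(c) = c^{1-\gamma}/(1-\gamma)$ and $\varphi(k) = rk$). The Euler equation fixes the growth rate of consumption, so only the initial level $c_0$ remains unknown; it can be found by bisection so that capital stays eventually positive, which is exactly the budget constraint of this section.

```python
import numpy as np

# With u'(c) = c**(-gamma), the Euler equation u'(c_t) = beta*(1+r)*u'(c_{t+1})
# gives c_{t+1} = (beta*(1+r))**(1/gamma) * c_t: only c_0 remains unknown.
# Parameters and the endowment stream are illustrative.
beta, r, gamma, k0, T = 0.95, 0.06, 2.0, 1.0, 300
e = np.ones(T)                                # endowment e_t (assumed constant)
growth = (beta * (1 + r)) ** (1 / gamma)      # consumption growth factor

def min_capital(c0):
    """Minimum of the capital path k_{t+1} = (1+r)k_t + e_t - c_t from c0."""
    k, c, k_min = k0, c0, k0
    for t in range(T):
        k = (1 + r) * k + e[t] - c
        c *= growth
        k_min = min(k_min, k)
    return k_min

# Bisect on c_0: consuming too much violates liminf k_t >= 0; too little is wasteful.
lo, hi = 1e-9, 20.0
for _ in range(80):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if min_capital(mid) >= 0 else (lo, mid)
print("optimal c_0 is approximately", lo)
```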
2. Discrete time, uncertainty
Now we assume everything is stochastic. The only problem with this case is that we have to be more careful about the state equation $k_{t+1} = g(t, k_t, c_t)$. Since $k_t$ and $c_t$ are known at time $t$, it must be that something else is stochastic and generates the uncertainty in $k_{t+1}$. That is usually the labor income $e_t$. The uncertainty then comes from the fact that $e_t$ is known only after $c_t$. In that case, we might as well call it $e_{t+1}$, and then the budget constraint (9) in Example 1.2 becomes $e_{t+1} + \varphi(k_t) = c_t + (k_{t+1} - k_t)$. Even though it may be a bit misleading, we are going to keep the previous notation, and write $g(t, k_t, c_t)$, instead of what we should really write, $g(t+1, k_t, c_t)$. The state equation then reads
\[
k_{t+1} = g(t, k_t, c_t) = k_t + \varphi(k_t) + e_{t+1} - c_t. \tag{10}
\]
To be able to solve the Bellman problem, we also have to assume that $g_c(t+1, k_t, c_t)$ is known at time $t$, i.e. it does not depend on the information revealed at $t+1$. The reason we need this condition is that, in order to obtain (5), we need $g_c$ to come out of the expectation operator $E_t$ in equation (3). In our example the condition is true, since $g_c = -1$.
The agent now solves the problem
\[
\max_{(c_t)} E_0 \sum_{t=0}^{\infty} f(t, k_t, c_t)
\quad \text{s.t.} \quad k_{t+1} = g(t, k_t, c_t). \tag{11}
\]
As usual, we denote by $E_t$ the expectation given information available at time $t$. Then we can define the value function
\[
V(t, k_t) = \max_{(c_s)} E_t \sum_{s=t}^{\infty} f(s, k_s, c_s)
\quad \text{s.t.} \quad k_{s+1} = g(s, k_s, c_s).
\]
The Bellman principle of optimality (2) becomes
\[
V(t, k_t) = \max_{c_t} \Big\{ f(t, k_t, c_t) + E_t\, V\big(t+1,\, g(t, k_t, c_t)\big) \Big\}. \tag{12}
\]
Now in order to derive the Euler equation with uncertainty, all we have to do is replace $V(t+1)$ in the formulas of the previous section by $E_t V(t+1)$ (using, of course, the fact that differentiation commutes with expectation). We arrive at the following Bellman-Euler equation
\[
E_t \Big[ f_k(t+1) - \frac{f_c(t+1)}{g_c(t+1)}\, g_k(t+1) \Big] = -\frac{f_c(t)}{g_c(t)}. \tag{13}
\]
For our particular Example 1.2, we get
\[
u'(c_t) = \beta\, (1+r)\, E_t\, u'(c_{t+1}).
\]
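Equation (13) also gives a practical way to test a candidate policy numerically: estimate both sides by simulation, and any gap measures how far the policy is from optimal. A minimal sketch follows (mine; the consumption rule, parameters, and income distribution are illustrative assumptions).

```python
import numpy as np

# Monte Carlo check of the stochastic Euler equation for Example 1.2:
#   u'(c_t) = beta*(1+r) * E_t[ u'(c_{t+1}) ],  with u'(c) = c**(-gamma).
# The candidate rule c(k) below is a guess; at the true optimum the two
# printed numbers would coincide.
rng = np.random.default_rng(0)
beta, r, gamma = 0.95, 1 / 0.95 - 1, 2.0           # so that beta*(1+r) = 1

def consumption_rule(k):
    return 1.0 + 0.05 * k                           # candidate policy (assumed)

k_t = 5.0
c_t = consumption_rule(k_t)
e_next = rng.lognormal(mean=0.0, sigma=0.2, size=200_000)  # draws of e_{t+1}
k_next = k_t + r * k_t + e_next - c_t               # state equation (10), phi(k) = r*k
c_next = consumption_rule(k_next)

lhs = c_t ** (-gamma)
rhs = beta * (1 + r) * np.mean(c_next ** (-gamma))
print(f"u'(c_t) = {lhs:.4f}   vs   beta*(1+r)*E_t u'(c_t+1) = {rhs:.4f}")
```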
3. Continuous time, certainty
This is a bit trickier, but the same derivation as in discrete time can be used. The difference is that instead of the interval $[t, t+1]$ we now look at $[t, t+dt]$.
The problem that the decision maker has to solve is
\[
\max_{(c_t)} \int_0^{\infty} f(t, k_t, c_t)\, dt
\quad \text{s.t.} \quad \frac{dk_t}{dt} = h(t, k_t, c_t). \tag{14}
\]
The constraint can be rewritten in differential notation
\[
k_{t+dt} = k_t + h(t, k_t, c_t)\, dt, \tag{15}
\]
so we have a problem similar in form to (1), and we can solve it by an analogous method. Define the value function
\[
V(t, k_t) = \max_{(c_s)} \int_t^{\infty} f(s, k_s, c_s)\, ds
\quad \text{s.t.} \quad \frac{dk_s}{ds} = h(s, k_s, c_s).
\]
The Bellman principle of optimality states that
\[
V(t, k_t) = \max_{c_t} \Big\{ \int_t^{t+dt} f(s, k_s, c_s)\, ds + V\big(t+dt,\, k_t + h(t, k_t, c_t)\, dt\big) \Big\}. \tag{16}
\]
We know that $\int_t^{t+dt} f(s, k_s, c_s)\, ds = f(t, k_t, c_t)\, dt$. The first order condition for a maximum is
\[
f_c(t)\, dt + V_k(t+dt, k_{t+dt})\, h_c(t)\, dt = 0. \tag{17}
\]
This is equivalent to
\[
V_k(t+dt) = -\frac{f_c(t)}{h_c(t)}. \tag{18}
\]
As we did in the first section, apply the envelope theorem to derive
\[
V_k(t) = f_k(t)\, dt + V_k(t+dt)\, \big(1 + h_k(t)\, dt\big). \tag{19}
\]
Substitute (18) into (19) to obtain
\[
V_k(t) = -\frac{f_c(t)}{h_c(t)} + \Big( f_k(t) - \frac{f_c(t)}{h_c(t)}\, h_k(t) \Big)\, dt.
\]
If $\psi$ is any differentiable function, then $\psi(t+dt)\, dt = \psi(t)\, dt$ (to first order), so we get the formula
\[
V_k(t+dt) = -\frac{f_c(t+dt)}{h_c(t+dt)} + \Big( f_k(t) - \frac{f_c(t)}{h_c(t)}\, h_k(t) \Big)\, dt. \tag{20}
\]
Putting together equations (18) and (20), we get
\[
\frac{f_c(t+dt)}{h_c(t+dt)} - \frac{f_c(t)}{h_c(t)} = \Big( f_k(t) - \frac{f_c(t)}{h_c(t)}\, h_k(t) \Big)\, dt.
\]
Using the formula $\psi(t+dt) - \psi(t) = \frac{d\psi}{dt}\, dt$, we can rewrite the above formula as
\[
\frac{d}{dt}\Big( \frac{f_c(t)}{h_c(t)} \Big) = f_k(t) - \frac{f_c(t)}{h_c(t)}\, h_k(t). \tag{21}
\]
This is the Bellman-Euler equation in continuous time. One can see that it is pretty similar to our equation (8) in discrete time.
We now have to be more careful with our notation. By $f_c(t)$ in the above formula, what we really mean is $f_c(t, k_t, c_t)$ (as usual, calculated at the optimum, but I'm going to omit the star superscript). Then we calculate
\[
\frac{d}{dt}(f_c) = f_{tc} + f_{kc}\, h + f_{cc}\, \frac{dc}{dt}.
\]
So we can rewrite the Bellman-Euler equation (21) as follows:
\[
-\Big( f_{tc} + f_{kc}\, h + f_{cc}\, \frac{dc}{dt} \Big)
= -\frac{f_c}{h_c}\Big( h_{tc} + h_{kc}\, h + h_{cc}\, \frac{dc}{dt} - h_c\, h_k \Big) - h_c\, f_k. \tag{22}
\]
In general, in order to solve this, notice that we can rewrite (22) as $dc_t/dt = \lambda(t, k_t, c_t)$ for some function $\lambda$, so the optimum is given by the following system of ODEs
\[
\begin{cases}
dc_t/dt = \lambda(t, k_t, c_t) \\
dk_t/dt = h(t, k_t, c_t).
\end{cases} \tag{23}
\]
Example 3.1. Applying the above analysis to our favorite example, we have $f(t, k_t, c_t) = e^{-\delta t} u(c_t)$ and $dk_t/dt = h(t, k_t, c_t) = e_t + \varphi(k_t) - c_t$. The Euler equation (21) becomes
\[
\frac{d}{dt}\Big( e^{-\delta t}\, u'(c_t) \Big) = -e^{-\delta t}\, u'(c_t)\, \varphi'(k_t),
\]
or equivalently
\[
\Big( \frac{-u''(c_t)\, c_t}{u'(c_t)} \Big)\, \frac{dc_t/dt}{c_t} = \varphi'(k_t) - \delta. \tag{24}
\]
Notice that we get the same equation as (7) from Blanchard and Fischer (Chapter 2, p. 40),
so we are on the right track.
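The system (23) is straightforward to integrate numerically. Below is a minimal sketch (mine) for this example, with illustrative CRRA felicity $u(c) = c^{1-\gamma}/(1-\gamma)$ and $\varphi(k) = rk$, so that (24) reduces to $dc_t/dt = (r - \delta)\, c_t / \gamma$; a complete solution would still have to pin down $c_0$ by the budget or transversality condition.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Integrate the ODE system (23) for Example 3.1 with u(c) = c^(1-gamma)/(1-gamma)
# and phi(k) = r*k.  Relative risk aversion is -u''(c)c/u'(c) = gamma, so (24)
# reads dc/dt = (r - delta)*c/gamma.  Parameters and c0 are illustrative.
r, delta, gamma, e = 0.06, 0.04, 2.0, 1.0

def rhs(t, y):
    k, c = y
    return [e + r * k - c,                # dk/dt = h(t, k, c)
            (r - delta) * c / gamma]      # dc/dt from the Euler equation (24)

sol = solve_ivp(rhs, (0.0, 50.0), [1.0, 1.0])   # k_0 = 1, c_0 = 1 (a guess)
print("k(50), c(50) =", sol.y[0, -1], sol.y[1, -1])
```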
4. Continuous time, uncertainty
First, we assume that the uncertainty comes from the function $h$ (for example, if $h$ depends on an uncertain endowment $e_t$). In the second part of this section, we are going to assume that the constraint is stochastic. But for now, the agent solves the problem
\[
\max_{(c_t)} E_0 \int_0^{\infty} f(t, k_t, c_t)\, dt
\quad \text{s.t.} \quad \frac{dk_t}{dt} = h(t, k_t, c_t). \tag{25}
\]
The value function takes the form
\[
V(t, k_t) = \max_{(c_s)} E_t \int_t^{\infty} f(s, k_s, c_s)\, ds
\quad \text{s.t.} \quad \frac{dk_s}{ds} = h(s, k_s, c_s),
\]
and the Bellman principle of optimality (16) becomes
\[
V(t, k_t) = \max_{c_t} \Big\{ \int_t^{t+dt} f(s, k_s, c_s)\, ds + E_t\, V\big(t+dt,\, k_t + h(t, k_t, c_t)\, dt\big) \Big\}. \tag{26}
\]
We arrive at the following Bellman-Euler equation
\[
E_t\, \frac{d}{dt}\Big( \frac{f_c}{h_c} \Big) = f_k - \frac{f_c}{h_c}\, h_k. \tag{27}
\]
For our particular Example 1.2, we get
\[
\Big( \frac{-u''(c_t)\, c_t}{u'(c_t)} \Big)\, \frac{E_t\, [dc_t/dt]}{c_t} = \varphi'(k_t) - \delta. \tag{28}
\]
Now we assume everything is stochastic, and the agent solves the problem
\[
\max_{(c_t)} E_0 \int_0^{\infty} f(t, k_t, c_t)\, dt
\quad \text{s.t.} \quad dk_t = \mu(t, k_t, c_t)\, dt + \sigma(t, k_t, c_t)\, dW_t, \tag{29}
\]
where $W_t$ is a one-dimensional Wiener process (Brownian motion), and $\mu, \sigma$ are deterministic functions. If we define the value function $V(t, k_t)$ as above, the Bellman principle of optimality implies that
\[
V(t, k_t) = \max_{c_t} \Big\{ f(t, k_t, c_t)\, dt + E_t\, V(t+dt, k_{t+dt}) \Big\}. \tag{30}
\]
Now $V(t+dt, k_{t+dt}) = V(t+dt,\, k_t + \mu\, dt + \sigma\, dW_t) = V(t, k_t) + V_t\, dt + V_k\, (\mu\, dt + \sigma\, dW_t) + \tfrac{1}{2} V_{kk}\, \sigma^2\, dt$, where the last equality comes from Itô's lemma. Taking expectation at $t$, it follows that
\[
E_t\, V(t+dt, k_{t+dt}) = V(t, k_t) + V_t\, dt + V_k\, \mu\, dt + \tfrac{1}{2} V_{kk}\, \sigma^2\, dt. \tag{31}
\]
The Bellman principle (30) can be written equivalently as
\[
\sup_{c_t} \Big\{ -V(t, k_t) + f(t, k_t, c_t)\, dt + E_t\, V(t+dt, k_{t+dt}) \Big\} \le 0,
\]
with equality at the optimum $c_t$. Putting this together with (31), we get the following
\[
\sup_a \Big\{ \mathcal{D}^a V(t, y) + f(t, y, a) \Big\} = 0, \tag{32}
\]
where $\mathcal{D}^a$ is a partial differential operator defined by
\[
\mathcal{D}^a V(t, y) = V_t(t, y) + V_y(t, y)\, \mu(t, y, a) + \tfrac{1}{2}\, V_{yy}(t, y)\, \sigma(t, y, a)^2. \tag{33}
\]
Equation (32) is also known as the Hamilton-Jacobi-Bellman equation. (We got the same equation as Duffie, Chapter 9A, so we're fine.) This is not quite a PDE yet, because we have a supremum operator in front. However, if we substitute into (32) the value of $a$ obtained from the first order condition, we can remove the supremum, so indeed we get a PDE. The boundary condition comes from some transversality condition that we have to impose on $V$ at infinity (see Duffie).
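For concreteness, here is that first order condition spelled out (this rendering is mine; the original leaves it implicit), assuming an interior optimum and smoothness in $a$:

```latex
% Differentiating the bracket in (32) with respect to the control a:
f_a(t, y, a) + V_y(t, y)\,\mu_a(t, y, a)
    + V_{yy}(t, y)\,\sigma(t, y, a)\,\sigma_a(t, y, a) = 0 .
% Solving this for a = a^*(t, y) (a function of V_y and V_{yy}) and
% substituting back turns (32) into a genuine nonlinear PDE for V:
\mathcal{D}^{a^*(t,y)} V(t, y) + f\big(t, y, a^*(t, y)\big) = 0 .
```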
Notice that for the stochastic case we took a different route than before. This is because now we cannot eliminate the value function anymore (the reason is that we get an extra term in the first order condition coming from the $dW_t$-term, and that term depends on $V$ as well). So this approach first looks for a value function which satisfies the Hamilton-Jacobi-Bellman equation, and then derives the optimal consumption $c_t$ and capital $k_t$.
5. The EulerLagrange equation
The reason I'm treating this variational problem here is that the Bellman method seems very well suited to solve it. The classical Euler-Lagrange equation
\[
\frac{d}{dt}\big( F_{\dot{c}} \big) = F_c \tag{34}
\]
solves the following problem
\[
\max_{c(t)} \int_a^b F(t, c, \dot{c})\, dt
\quad \text{s.t.} \quad c(a) = P \ \text{ and } \ c(b) = Q. \tag{35}
\]
Suppose we know the optimum curve $c$ only up to some point $x = c(t)$. Then we define the value function
\[
V(t, x) = \max_{c(s)} \int_t^b F(s, c, \dot{c})\, ds
\quad \text{s.t.} \quad c(t) = x \ \text{ and } \ c(b) = Q.
\]
In the discussion that follows, we denote by $\pi$ a direction in $\mathbb{R}^n$ (the curve $c$ also has values in $\mathbb{R}^n$). The Bellman principle of optimality says that
\[
V(t, x) = \max_{\pi} \Big\{ \int_t^{t+dt} F(t, x, \pi)\, dt + V_{t+dt}(x + \pi\, dt) \Big\}. \tag{36}
\]
The last term comes from the identity $c(t+dt) = c(t) + \dot{c}(t)\, dt = x + \pi\, dt$.
The first order condition for this maximum is (after dividing through by $dt$)
\[
\frac{dV_{t+dt}}{dc} = -F_{\dot{c}}.
\]
The envelope theorem applied to equation (36) yields
\[
\frac{dV_t}{dc} = F_c\, dt + \frac{dV_{t+dt}}{dc} = F_c\, dt - F_{\dot{c}}.
\]
Replace $t$ by $t + dt$ in the above equation to get
\[
\frac{dV_{t+dt}}{dc} = F_c\, dt - F_{\dot{c}}(t+dt).
\]
Putting together the two formulas we have for $dV_{t+dt}/dc$, we obtain
\[
F_{\dot{c}}(t+dt) - F_{\dot{c}}(t) = F_c\, dt.
\]
This implies
\[
\Big( \frac{d}{dt} F_{\dot{c}} \Big)\, dt = F_c\, dt,
\]
which after canceling $dt$ is the desired Euler-Lagrange equation (34).
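As a quick illustration (my example, not in the original), take the arc-length functional $F(t, c, \dot{c}) = \sqrt{1 + \dot{c}^2}$ in $\mathbb{R}$, which measures the length of the curve between the two endpoints:

```latex
% Shortest path between (a,P) and (b,Q):  F(t,c,\dot c) = \sqrt{1+\dot c^2},
% so F_c = 0 and F_{\dot c} = \dot c / \sqrt{1+\dot c^2}.
% The Euler--Lagrange equation (34) becomes
\frac{d}{dt}\left( \frac{\dot c}{\sqrt{1+\dot c^{\,2}}} \right) = 0
\quad\Longrightarrow\quad \dot c = \text{const},
% i.e. the optimum is the straight line through the two endpoints.
```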
6. Conditions for a global optimum
There are two main issues in the existence of the solution. One is whether the value function is finite or not. This will be taken care of by the budget constraint, as we will see below. The other issue is whether or not the maximum in the Bellman principle of optimality is attained at an interior point (in order to get the first order condition to hold). This last issue should be resolved once we analyze the solution, and see that we have indeed an interior maximum. Of course, after we know that the Bellman-Euler equation holds, we need to do a little bit of extra work to see which of the possible solutions fits the initial data of the problem.
Since I don't want to develop the general theory for this (one can look it up in a textbook), I will just analyze our initial Example 1.2. Recall that we are solving the problem
\[
\max_{(c_t)} \sum_{t=0}^{\infty} \beta^t u(c_t)
\quad \text{s.t.} \quad k_{t+1} = (1+r)\, k_t + e_t - c_t.
\]
We assume that the endowment is always positive and has a finite present value $\sum_t \beta^t e_t < \infty$. We impose the requirement that the capital $k_t$ be eventually positive, i.e. that $\liminf_{t\to\infty} k_t \ge 0$. For simplicity, assume also that $\beta(1+r) = 1$.
Start with an optimum sequence of consumption $(c_t)$. Such a sequence clearly exists, since we are looking for a $(c_t)$ that achieves the maximum utility at zero (finite or not). Note that we don't assume anything particular about it; for example we don't assume that the value function corresponding to it is finite, or that it attains an interior optimum. Now we start to use our assumptions and show that $(c_t)$ does satisfy all those properties.
Using the state equation, we deduce that for $t$ sufficiently large (so that $k_{t+1}$ becomes positive, hence bigger than $-1$),
\[
c_t \le (1+r)\, k_t + e_t + 1.
\]
Write this inequality for all $s > t$ and multiply through by $\beta^{s-t}$. Denote by $PV_t(c) = \sum_{s \ge t} \beta^{s-t} c_s$ the present value of consumption at time $t$, and similarly for $k$ and $e$. Adding up the above inequalities for $s \ge t$, we get
\[
PV_t(c) \le (1+r)\, PV_t(k) + PV_t(e) + \frac{1}{1-\beta}. \tag{37}
\]
We assumed that $PV_t(e) < \infty$. Let's show that $PV_t(c) < \infty$. If $PV_t(k) < \infty$, we are done. If $PV_t(k) = \infty$, then we can consume out of this capital until $PV_t(k)$ becomes finite. Certainly, by doing this, $PV_t(c)$ will increase, yet it will still satisfy (37). Now all the terms on the right hand side of (37) are finite, hence so is $PV_t(c)$, at the new level of consumption. That means that our original $PV_t(c)$ must be finite. Moreover, it also follows that $PV_t(k)$ is finite.
Suppose it isn't. Then, we can proceed as above and increase $PV_t(c)$, while still keeping it finite. But this is in contradiction with the fact that $(c_t)$ was chosen to be an optimum. (I have to admit that this part is tricky, but we can't avoid it if we want to be rigorous!) Since $u$ is monotone concave, $PV_t\big(u(c)\big)$ is also finite. That means that the Bellman value function $V(t, k_t)$ is finite, so our first concern is resolved.
We now look at the Bellman principle of optimality
\[
V(t, k_t) = \max_{c_t} \Big\{ \beta^t u(c_t) + V\big(t+1,\, (1+r)\, k_t + e_t - c_t\big) \Big\}.
\]
Here's a subtle idea: if we save one small quantity $\varepsilon$ today, tomorrow it becomes $(1+r)\varepsilon$, and we can either consume it, or leave it for later. This latter decision has to be made in an optimum fashion (since this is the definition of $V_{t+1}$), so we are indifferent between consuming $(1+r)\varepsilon$ tomorrow and saving it for later on. Thus, we might as well assume that we are consuming it tomorrow, so when we calculate $V_{t+1}$, only tomorrow's utility is affected. Therefore, for marginal purposes we can regard $V_{t+1}$ as equal to $\beta^{t+1} u(c_{t+1})$. Then, to analyze the Bellman optimum locally is the same as analyzing
\[
\max_{\varepsilon}\ \beta^t u(c_t - \varepsilon) + \beta^{t+1} u\big(c_{t+1} + (1+r)\varepsilon\big)
\]
locally around the optimum $\varepsilon = 0$. Since $\beta(1+r) = 1$, the Bellman-Euler equation is
\[
u'(c_t) = u'(c_{t+1}), \quad \text{or equivalently} \quad c_t = c_{t+1} = c.
\]
Because $u$ is concave, it is easy to see that $\varepsilon = 0$ is a local maximum. That takes care of our second concern.
We notice that consumption should be smoothed completely, so that $c_t = c$ for all $t$.
Recall the state equation: $e_t - c_t = k_{t+1} - (1+r)\, k_t$. Multiplying this by $\beta^t = 1/(1+r)^t$ and summing over all $t$, we get
\[
\sum_t \beta^t (e_t - c_t) = \sum_t \frac{1}{(1+r)^t}\big( k_{t+1} - (1+r)\, k_t \big) = \lim_{t\to\infty} \beta^t k_{t+1} - (1+r)\, k_0.
\]
During the discussion of the finiteness of the value function, we noticed that $PV_t(k) < \infty$. In particular, this implies that $\lim_{t\to\infty} \beta^t k_t = 0$. So we get the formula
\[
\sum_t \beta^t e_t - \frac{1}{1-\beta}\, c = -(1+r)\, k_0,
\]
from which we deduce that
\[
c = r \sum_{t=0}^{\infty} \beta^{t+1} e_t + r\, k_0. \tag{38}
\]
Since there is only one optimum, it must be a global optimum, so this ends the solution of
our example.
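As a final sanity check, here is a small numerical sketch (mine) of formula (38): it computes the constant consumption level $c$ for a given endowment stream and verifies that the discounted capital $\beta^t k_t$ indeed tends to zero along the implied path. The endowment stream and parameters are illustrative.

```python
import numpy as np

# Permanent-income solution (38):  c = r * sum_t beta^(t+1) e_t + r * k0,
# with beta = 1/(1+r).  Then simulate k_{t+1} = (1+r)k_t + e_t - c and
# check that beta^t * k_t -> 0 (the condition used in the argument above).
r, k0, T = 0.05, 1.0, 400
beta = 1 / (1 + r)
e = 1.0 + 0.1 * np.sin(0.3 * np.arange(T))        # a positive endowment stream

t = np.arange(T)
c = r * np.sum(beta ** (t + 1) * e) + r * k0      # formula (38), truncated at T

k = k0
for s in range(T):
    k = (1 + r) * k + e[s] - c
print(f"c = {c:.4f},  beta^T * k_T = {beta**T * k:.2e}")   # should be near 0
```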