CEA - CADARACHE
FRANCE
SUMMER 2012
DIMITRI P. BERTSEKAS
http://www.athenasc.com/dpbook.html
http://web.mit.edu/dimitrib/www/publ.html
APPROXIMATE DYNAMIC PROGRAMMING
BRIEF OUTLINE I
Our subject:
Large-scale DP based on approximations and in part on simulation
This has been a research area of great interest for the last 20 years, known under various names (e.g., reinforcement learning, neuro-dynamic programming)
Emerged through an enormously fruitful cross-fertilization of ideas from artificial intelligence and optimization/control theory
Deals with control of dynamic systems under uncertainty, but applies more broadly (e.g., discrete deterministic optimization)
A vast range of applications in control theory, operations research, artificial intelligence, and beyond ...
The subject is broad, with a rich variety of theory/math, algorithms, and applications. Our focus will be mostly on algorithms ... less on theory and modeling
APPROXIMATE DYNAMIC PROGRAMMING
BRIEF OUTLINE II
Our aim:
A state-of-the-art account of some of the major topics at a graduate level
Show how the use of approximation and simulation can address the dual curses of DP: dimensionality and modeling
Our 7-lecture plan:
Two lectures on exact DP with emphasis on infinite horizon problems and issues of large-scale computational methods
One lecture on general issues of approximation and simulation for large-scale problems
One lecture on approximate policy iteration based on temporal differences (TD)/projected equations/Galerkin approximation
One lecture on aggregation methods
One lecture on stochastic approximation, Q-learning, and other methods
One lecture on Monte Carlo methods for solving general problems involving linear equations and inequalities
APPROXIMATE DYNAMIC PROGRAMMING
LECTURE 1
LECTURE OUTLINE
Discrete-time system
    x_{k+1} = f_k(x_k, u_k, w_k),   k = 0, 1, ..., N − 1
k: Discrete time
x_k: State; summarizes past information that is relevant for future optimization
u_k: Control; decision to be selected at time k from a given set
w_k: Random parameter (also called disturbance or noise depending on the context)
N: Horizon or number of times control is applied
Inventory control example:
w_k: Demand at period k
u_k: Stock ordered at period k
Cost of period k: c u_k + r(x_k + u_k − w_k)
Discrete-time system:
    x_{k+1} = f_k(x_k, u_k, w_k) = x_k + u_k − w_k
Policies: rules u_k = μ_k(x_k), k = 0, ..., N − 1
An optimal policy π* attains the optimal cost: J_{π*}(x_0) = J*(x_0) = min_π J_π(x_0)
DP algorithm: Start with
    J_N(x_N) = g_N(x_N),
and go backwards using
    J_k(x_k) = min_{u_k ∈ U_k(x_k)} E_{w_k} { g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k)) },   k = 0, ..., N − 1.
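To make the backward recursion concrete, here is a minimal sketch of the DP algorithm for the inventory example above. The demand distribution, the cost parameters c and r (with the cost taken as a linear order cost plus a holding/shortage penalty), the state and control grids, and the zero terminal cost are illustrative assumptions, not values from the lectures.

```python
# Minimal sketch of backward DP for the inventory example (assumed data).
import numpy as np

N = 10                       # horizon
states = np.arange(0, 21)    # stock levels x_k (assumed bounded, backlog truncated)
controls = np.arange(0, 11)  # order sizes u_k
demands = [0, 1, 2, 3]       # assumed support of w_k
probs = [0.1, 0.3, 0.4, 0.2]
c, r = 1.0, 0.5              # assumed per-unit order cost and penalty weight

def stage_cost(x, u, w):
    return c * u + r * abs(x + u - w)   # assumed holding/shortage penalty

J = np.zeros(len(states))               # J_N(x) = g_N(x) = 0 (assumed)
for k in reversed(range(N)):
    J_new = np.empty_like(J)
    for xi, x in enumerate(states):
        best = np.inf
        for u in controls:
            # E_w{ g_k(x,u,w) + J_{k+1}(f_k(x,u,w)) }
            val = sum(p * (stage_cost(x, u, w)
                           + J[np.clip(x + u - w, 0, states[-1])])
                      for w, p in zip(demands, probs))
            best = min(best, val)
        J_new[xi] = best
    J = J_new
print("J_0(x) for x = 0..5:", np.round(J[:6], 2))
```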
Some approaches:
(a) Problem Approximation: Use as J̃_k the cost-to-go of a related but simpler problem
(b) Parametric Cost-to-Go Approximation: Use as J̃_k a function of a suitable parametric form, whose parameters are tuned by some heuristic or systematic scheme (we will mostly focus on this)
    This is a major portion of Reinforcement Learning/Neuro-Dynamic Programming
(c) Rollout Approach: Use as J̃_k the cost of some suboptimal policy, which is calculated either analytically or by simulation
ROLLOUT ALGORITHMS
One-step lookahead with cost approximation: at state x_k, apply the control attaining
    min_{u_k ∈ U_k(x_k)} E { g_k(x_k, u_k, w_k) + J̃_{k+1}(f_k(x_k, u_k, w_k)) },
where J̃_{k+1} is the cost of some suboptimal (base) policy.
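The following sketch shows this one-step lookahead with a rollout approximation, i.e., J̃ taken as the simulated cost of a base policy. The simulator interface step(x, u) -> (next_state, stage_cost), the base_policy function, and the horizon/sample counts are assumed names for illustration, not part of the lectures.

```python
# Minimal rollout sketch under an assumed simulator interface.
def rollout_value(x, base_policy, step, horizon, n_sims=20):
    """Estimate by simulation the cost-to-go of the base policy from x."""
    total = 0.0
    for _ in range(n_sims):
        s, cost = x, 0.0
        for _ in range(horizon):
            s, g = step(s, base_policy(s))
            cost += g
        total += cost
    return total / n_sims

def rollout_control(x, controls, base_policy, step, horizon):
    """One-step lookahead: minimize a sampled g + J~(f(x,u,w)), with J~
    the simulated cost of the base policy."""
    def q(u):
        s, g = step(x, u)   # one sampled transition; averaging several improves it
        return g + rollout_value(s, base_policy, step, horizon)
    return min(controls, key=q)
```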
Stationary system
xk+1 = f (xk , uk , wk ), k = 0, 1, . . .
Cost of a policy π = {μ_0, μ_1, ...}:
    J_π(x_0) = lim_{N→∞} E_{w_k, k=0,1,...} { Σ_{k=0}^{N−1} α^k g(x_k, μ_k(x_k), w_k) }
where π_1^N = {μ_1, μ_2, ..., μ_{N−1}}; letting N → ∞ and arguing by induction, we obtain:
Bellman's equation: J* = T J*,   J_μ = T_μ J_μ
Optimality condition:
    μ: optimal   <==>   T_μ J* = T J*
Policy iteration in shorthand:
Policy evaluation: J_{μ^k} = T_{μ^k} J_{μ^k}
Policy improvement: T_{μ^{k+1}} J_{μ^k} = T J_{μ^k}
TWO KEY PROPERTIES
Monotonicity and constant shift; they imply the convergence of value iteration:
If J_0 ≡ 0, then for all x (with M an upper bound on |g|):
    J*(x) − α^k M/(1 − α) ≤ (T^k J_0)(x) ≤ J*(x) + α^k M/(1 − α)
Applying T to this relation,
    (T^{k+1} J*)(x) − α^{k+1} M/(1 − α) ≤ (T^{k+1} J_0)(x) ≤ (T^{k+1} J*)(x) + α^{k+1} M/(1 − α)
Taking the limit as k → ∞, we obtain J* = T J*. Q.E.D.
THE CONTRACTION PROPERTY
Contraction property: For any bounded J and J′,
    max_x |(T J)(x) − (T J′)(x)| ≤ α max_x |J(x) − J′(x)|.
Proof: Denote c = max_{x∈S} |J(x) − J′(x)|. Then
    J(x) − c ≤ J′(x) ≤ J(x) + c,   for all x.
Applying T to all sides and using the monotonicity and constant shift properties,
    (T J)(x) − αc ≤ (T J′)(x) ≤ (T J)(x) + αc,   for all x.
Hence
    |(T J)(x) − (T J′)(x)| ≤ αc,   for all x.
Q.E.D.
NEC. AND SUFFICIENT OPT. CONDITION
μ is optimal if and only if μ attains the minimum in Bellman's equation:
    T_μ J* = T J*.
Indeed, if T_μ J* = T J*, then since Bellman's equation gives
    J* = T J*,
we obtain
    J* = T_μ J*,
so by uniqueness of the fixed point of T_μ, J* = J_μ; i.e., μ is optimal.
LECTURE 2
LECTURE OUTLINE
xk+1 = f (xk , uk , wk ), k = 0, 1, . . .
Cost of a policy π = {μ_0, μ_1, ...}:
    J_π(x_0) = lim_{N→∞} E_{w_k, k=0,1,...} { Σ_{k=0}^{N−1} α^k g(x_k, μ_k(x_k), w_k) }
Bellman's equation: J* = T J*, J_μ = T_μ J_μ, or
    J*(x) = min_{u∈U(x)} E_w { g(x, u, w) + α J*(f(x, u, w)) },   for all x
    J_μ(x) = E_w { g(x, μ(x), w) + α J_μ(f(x, μ(x), w)) },   for all x
Optimality condition:
    μ: optimal   <==>   T_μ J* = T J*
i.e.,
    μ(x) ∈ arg min_{u∈U(x)} E_w { g(x, u, w) + α J*(f(x, u, w)) },   for all x
Contraction property:
    max_x |(T J)(x) − (T J′)(x)| ≤ α max_x |J(x) − J′(x)|
Policy iteration: Given μ^k,
Policy evaluation: Compute J_{μ^k} as the solution of
    J_{μ^k}(x) = E_w { g(x, μ^k(x), w) + α J_{μ^k}(f(x, μ^k(x), w)) },   for all x
or J_{μ^k} = T_{μ^k} J_{μ^k}
Policy improvement: Let μ^{k+1} be such that
    μ^{k+1}(x) ∈ arg min_{u∈U(x)} E_w { g(x, u, w) + α J_{μ^k}(f(x, u, w)) },   for all x
or T_{μ^{k+1}} J_{μ^k} = T J_{μ^k}
For a finite state space, policy evaluation is equivalent to solving a linear system of equations; the dimension of the system is equal to the number of states.
For large problems, exact PI is out of the question (even though it terminates finitely).
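A minimal sketch of exact PI for a finite-state discounted MDP, with policy evaluation done by solving the linear system noted above. The tabular conventions P[u][i, j] = p_ij(u) and g[u][i, j] = g(i, u, j) are assumptions introduced here for illustration.

```python
# Minimal sketch of exact policy iteration (assumed tabular conventions).
import numpy as np

def policy_iteration(P, g, alpha):
    n, m = P[0].shape[0], len(P)          # number of states, controls
    mu = np.zeros(n, dtype=int)           # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - alpha P_mu) J = g_mu, an n x n system
        P_mu = np.array([P[mu[i]][i] for i in range(n)])
        g_mu = np.array([P[mu[i]][i] @ g[mu[i]][i] for i in range(n)])
        J = np.linalg.solve(np.eye(n) - alpha * P_mu, g_mu)
        # Policy improvement: mu'(i) in arg min_u sum_j p_ij(u)(g(i,u,j) + alpha J(j))
        Q = np.array([[P[u][i] @ (g[u][i] + alpha * J) for u in range(m)]
                      for i in range(n)])
        mu_new = Q.argmin(axis=1)
        if np.array_equal(mu_new, mu):
            return J, mu                  # finite termination
        mu = mu_new
```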
INTERPRETATION OF VI AND PI
[Figures: one-dimensional geometric interpretation. VI generates the sequence J_0, T J_0, T^2 J_0, ... by iterating the mapping T J against the 45-degree line, converging to the fixed point J* = T J*. PI alternates exact policy evaluation (finding the fixed point J_μ = T_μ J_μ of the linear mapping T_μ) with policy improvement (choosing μ^1 so that T_{μ^1} J_μ = T J_μ).]
JUSTIFICATION OF POLICY ITERATION
We can show that J_{μ^{k+1}} ≤ J_{μ^k} for all k. Proof: We have
    T_{μ^{k+1}} J_{μ^k} = T J_{μ^k} ≤ T_{μ^k} J_{μ^k} = J_{μ^k}
Since T_{μ^{k+1}} is monotone and
    lim_{N→∞} T_{μ^{k+1}}^N J_{μ^k} = J_{μ^{k+1}},
we have J_{μ^k} ≥ J_{μ^{k+1}}.
If J_{μ^k} = J_{μ^{k+1}}, then J_{μ^k} solves Bellman's equation and is therefore equal to J*
So at iteration k either the algorithm generates a strictly improved policy or it finds an optimal policy
For a finite-state MDP, there are finitely many stationary policies, so the algorithm terminates with an optimal policy
APPROXIMATE PI
Suppose that the policy evaluation is approximate,
    ‖J_k − J_{μ^k}‖ ≤ δ,   k = 0, 1, ...
and the policy improvement is approximate,
    ‖T_{μ^{k+1}} J_k − T J_k‖ ≤ ε,   k = 0, 1, ...
where δ and ε are some positive scalars. Then
Error bound:
    lim sup_{k→∞} ‖J_{μ^k} − J*‖ ≤ (ε + 2αδ) / (1 − α)^2
If in addition the generated policies converge to some μ̄, a better bound holds:
    ‖J_{μ̄} − J*‖ ≤ (ε + 2αδ) / (1 − α)
OPTIMISTIC POLICY ITERATION
Optimistic PI: evaluate the current policy only approximately, using a finite number m_k of value iterations (J ← T_μ^m J):
    T_{μ^k} J_k = T J_k,   J_{k+1} = T_{μ^k}^{m_k} J_k,   k = 0, 1, ...
If m_k ≡ 1 it becomes VI
If m_k = ∞ it becomes PI
Can be shown to converge (in an infinite number of iterations)
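A minimal sketch of optimistic PI under the same assumed tabular conventions as the PI sketch above; m_k is held constant for simplicity. Setting m_k = 1 reproduces VI, and letting m_k grow approaches exact PI.

```python
# Minimal sketch of optimistic PI (assumed tabular conventions as before).
import numpy as np

def optimistic_pi(P, g, alpha, m_k=5, iters=200):
    n, m = P[0].shape[0], len(P)
    J = np.zeros(n)
    for _ in range(iters):
        # Policy improvement: choose mu with T_mu J = T J
        Q = np.array([[P[u][i] @ (g[u][i] + alpha * J) for u in range(m)]
                      for i in range(n)])
        mu = Q.argmin(axis=1)
        # Partial policy evaluation: J <- T_mu^{m_k} J
        for _ in range(m_k):
            J = np.array([P[mu[i]][i] @ (g[mu[i]][i] + alpha * J)
                          for i in range(n)])
    return J
```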
Q-LEARNING I
Define the optimal Q-factor of a state-control pair (x, u) by
    Q*(x, u) = E_w { g(x, u, w) + α J*(x̄) },   with x̄ = f(x, u, w)
Q*(x, u) is called the optimal Q-factor of (x, u); note that J*(x) = min_{u∈U(x)} Q*(x, u)
We can equivalently write the VI method in terms of Q-factors:
    Q_{k+1}(x, u) = E_w { g(x, u, w) + α min_{v∈U(x̄)} Q_k(x̄, v) },   with x̄ = f(x, u, w)
Q-LEARNING II
The Q-factors satisfy the Bellman equation
    Q*(x, u) = E_w { g(x, u, w) + α min_{v∈U(x̄)} Q*(x̄, v) },   where x̄ = f(x, u, w)
VI and PI for Q-factors are mathematically equivalent to VI and PI for costs
They require an equal amount of computation ... they just need more storage
Having optimal Q-factors is convenient when implementing an optimal policy on-line: apply μ*(x) ∈ arg min_{u∈U(x)} Q*(x, u), with no expectation to compute at decision time
The corresponding mapping F is a sup-norm contraction, with fixed point J*:
    ‖F^k J − J*‖ ≤ α^k ‖J − J*‖,   k = 1, 2, ...
Contraction assumption (weighted sup-norm framework):
For every J ∈ B(X), the functions T J and T_μ J belong to B(X).
For some α ∈ (0, 1), and all μ and J, J′ ∈ B(X), we have
    ‖T_μ J − T_μ J′‖ ≤ α ‖J − J′‖
Then
    lim_{k→∞} T_μ^k J = J_μ,   lim_{k→∞} T^k J = J*
Policy evaluation: J_{μ^k} = T_{μ^k} J_{μ^k}
Policy improvement: T_{μ^{k+1}} J_{μ^k} = T J_{μ^k}
Optimistic PI:
    T_{μ^k} J_k = T J_k,   J_{k+1} = T_{μ^k}^{m_k} J_k,   k = 0, 1, ...
If m_k ≡ 1 it becomes VI
If m_k = ∞ it becomes PI
For intermediate values of m_k, it is generally more efficient than either VI or PI
ASYNCHRONOUS ALGORITHMS
Partition the state space into disjoint sets X_1, ..., X_m, and view
    J = (J_1, ..., J_m),
where J_ℓ is the restriction of J on the set X_ℓ.
Asynchronous VI: component ℓ is updated only at times t in a subset R_ℓ, possibly using out-of-date values of the other components:
    J_ℓ^{t+1}(x) = (T(J_1^{τ_{ℓ1}(t)}, ..., J_m^{τ_{ℓm}(t)}))(x)   if t ∈ R_ℓ,
    J_ℓ^{t+1}(x) = J_ℓ^t(x)   if t ∉ R_ℓ
Synchronous special case (τ_{ℓj}(t) = t for all ℓ, j):
    T(J_1^t, ..., J_m^t) = T J^t
One-state-at-a-time (Gauss-Seidel) special case: one state per subset, updated in the cyclic order
    {x_0, x_1, ...} = {1, ..., n, 1, ..., n, 1, ...}
Convergence is guaranteed under a nested-set ("box") condition: there is a sequence of sets S(k) with
    T J ∈ S(k+1),   for all J ∈ S(k),   k = 0, 1, ...
Interpretation of assumptions:
[Figure: the nested sets S(0) ⊃ S(1) ⊃ ... are products, e.g., S(k) = S_1(k) × S_2(k) for J = (J_1, J_2), and T maps S(k) into S(k+1).]
Convergence mechanism:
[Figure: the component iterations on J_1 and J_2 move the iterate through the nested sets S(k) toward J*, regardless of the order of the updates.]
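A minimal sketch of the one-state-at-a-time (Gauss-Seidel) special case: each component update immediately uses the latest values of the other components. Tabular conventions as in the earlier sketches.

```python
# Minimal sketch of Gauss-Seidel (asynchronous, cyclic) value iteration.
import numpy as np

def gauss_seidel_vi(P, g, alpha, sweeps=100):
    n, m = P[0].shape[0], len(P)
    J = np.zeros(n)
    for _ in range(sweeps):
        for i in range(n):          # update component i in place
            J[i] = min(P[u][i] @ (g[u][i] + alpha * J) for u in range(m))
    return J
```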
LECTURE 3
LECTURE OUTLINE
xk+1 = f (xk , uk , wk ), k = 0, 1, . . .
Cost of a policy π = {μ_0, μ_1, ...}:
    J_π(x_0) = lim_{N→∞} E_{w_k, k=0,1,...} { Σ_{k=0}^{N−1} α^k g(x_k, μ_k(x_k), w_k) }
Shorthand DP mapping for the finite-state case:
    (T_μ J)(i) = Σ_{j=1}^n p_ij(μ(i)) ( g(i, μ(i), j) + α J(j) ),   i = 1, ..., n
SHORTHAND THEORY: A SUMMARY
Bellman's equation: J* = T J*, J_μ = T_μ J_μ, or
    J*(i) = min_{u∈U(i)} Σ_{j=1}^n p_ij(u) ( g(i, u, j) + α J*(j) ),   for all i
    J_μ(i) = Σ_{j=1}^n p_ij(μ(i)) ( g(i, μ(i), j) + α J_μ(j) ),   for all i
Optimality condition:
    μ: optimal   <==>   T_μ J* = T J*
i.e.,
    μ(i) ∈ arg min_{u∈U(i)} Σ_{j=1}^n p_ij(u) ( g(i, u, j) + α J*(j) ),   for all i
THE TWO MAIN ALGORITHMS: VI AND PI
Value iteration: J_{k+1} = T J_k, starting from any J_0
Policy iteration: Given μ^k,
Policy evaluation: Compute J_{μ^k} by solving J_{μ^k} = T_{μ^k} J_{μ^k}
Policy improvement: Let μ^{k+1} be such that
    μ^{k+1}(i) ∈ arg min_{u∈U(i)} Σ_{j=1}^n p_ij(u) ( g(i, u, j) + α J_{μ^k}(j) ),   for all i
or T_{μ^{k+1}} J_{μ^k} = T J_{μ^k}
Policy evaluation is equivalent to solving an n × n linear system of equations
For large n, exact PI is out of the question (even though it terminates finitely)
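For completeness, here is a minimal sketch of exact VI, J_{k+1} = T J_k, in the p_ij(u), g(i, u, j) notation of this lecture; the sup-norm stopping tolerance is an illustrative choice.

```python
# Minimal sketch of exact value iteration (assumed tabular conventions).
import numpy as np

def value_iteration(P, g, alpha, tol=1e-8):
    n, m = P[0].shape[0], len(P)
    J = np.zeros(n)
    while True:
        TJ = np.array([min(P[u][i] @ (g[u][i] + alpha * J) for u in range(m))
                       for i in range(n)])
        if np.max(np.abs(TJ - J)) < tol:   # T is a sup-norm contraction
            return TJ
        J = TJ
```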
STOCHASTIC SHORTEST PATH (SSP) PROBLEMS
SSP THEORY
Feature-based approximation architecture:
[Figure: a feature extraction mapping produces a feature vector φ(i) from state i (e.g., for a position evaluator: material balance, mobility, safety scores, with feature weighting), followed by a linear cost approximator φ(i)′r.]
General structure:
[Figure: simulation-based approximation in value space. A system simulator supplies cost samples; a decision generator selects the decision at state i using the values J̃(j, r) supplied by a cost-to-go approximator; the samples drive the algorithm that tunes r.]
=0
Subspace S = {r | r "s } Set
Projection Projection
on S on S
!rk+1
!rk+1
!rk !rk Simulation error
0 0
S: Subspace spanned by basis functions S: Subspace spanned by basis functions
r = DT (r)
[Figure: aggregation framework. Original system states i, j = 1, ..., n transition according to p_ij(u) with cost g(i, u, j); they are linked to aggregate states x, y through disaggregation probabilities d_xi and aggregation probabilities φ_jy.]
Assume the policy evaluation is approximate within δ,
    max_i |J̃(i, r_k) − J_{μ^k}(i)| ≤ δ,   k = 0, 1, ...
and the policy improvement is approximate within ε,
    max_i |(T_{μ^{k+1}} J̃)(i, r_k) − (T J̃)(i, r_k)| ≤ ε,   k = 0, 1, ...
Error bound:
    lim sup_{k→∞} max_i ( J_{μ^k}(i) − J*(i) ) ≤ (ε + 2αδ) / (1 − α)^2
LECTURE 4
LECTURE OUTLINE
Review of discounted MDPs: states i = 1, ..., n; controls u ∈ U(i); transition probabilities p_ij(u); one-stage costs g(i, u, j); discount factor α ∈ [0, 1)
Shorthand notation for DP mappings:
    (T J)(i) = min_{u∈U(i)} Σ_{j=1}^n p_ij(u) ( g(i, u, j) + α J(j) ),   i = 1, ..., n,
    (T_μ J)(i) = Σ_{j=1}^n p_ij(μ(i)) ( g(i, μ(i), j) + α J(j) ),   i = 1, ..., n
SHORTHAND THEORY: A SUMMARY
Bellman's equation: J* = T J*, J_μ = T_μ J_μ, or
    J*(i) = min_{u∈U(i)} Σ_{j=1}^n p_ij(u) ( g(i, u, j) + α J*(j) ),   for all i
    J_μ(i) = Σ_{j=1}^n p_ij(μ(i)) ( g(i, μ(i), j) + α J_μ(j) ),   for all i
Optimality condition:
    μ: optimal   <==>   T_μ J* = T J*
i.e.,
    μ(i) ∈ arg min_{u∈U(i)} Σ_{j=1}^n p_ij(u) ( g(i, u, j) + α J*(j) ),   for all i
THE TWO MAIN ALGORITHMS: VI AND PI
Value iteration: J_{k+1} = T J_k, starting from any J_0
Policy iteration: Given μ^k,
Policy evaluation: Compute J_{μ^k} by solving J_{μ^k} = T_{μ^k} J_{μ^k}
Policy improvement: Let μ^{k+1} be such that
    μ^{k+1}(i) ∈ arg min_{u∈U(i)} Σ_{j=1}^n p_ij(u) ( g(i, u, j) + α J_{μ^k}(j) ),   for all i
or T_{μ^{k+1}} J_{μ^k} = T J_{μ^k}
Policy evaluation is equivalent to solving an n × n linear system of equations
For large n, exact PI is out of the question (even though it terminates finitely)
APPROXIMATION IN VALUE SPACE
Approximate J* or J_μ within the subspace spanned by basis functions:
    J̃(r) = Φr
Error bound for approximate PI with evaluation error δ:
    lim sup_{k→∞} max_{i=1,...,n} ( J_{μ^k}(i) − J*(i) ) ≤ 2αδ / (1 − α)^2
AN EXAMPLE OF FAILURE
[Figure: projection view. T(Φr) is projected on S; the projected-equation solution satisfies Φr* = ΠT(Φr*), while ΠJ is the best approximation of J within S = {Φr | r ∈ ℜ^s}, the subspace spanned by the basis functions.]
Direct approximation: set
    J̃ = Φr*,   where r* = arg min_{r∈ℜ^s} ‖J − Φr‖_ξ^2
PI WITH INDIRECT POLICY EVALUATION
Approximate the evaluation of the current policy μ by solving the projected Bellman equation
    Φr = ΠT_μ(Φr)
Error bound: If Π is the projection with respect to the steady-state distribution norm ‖·‖_ξ (so that ΠT_μ is a contraction of modulus α), the solution Φr* satisfies
    ‖J_μ − Φr*‖_ξ ≤ (1 / √(1 − α²)) ‖J_μ − ΠJ_μ‖_ξ
PRELIMINARIES: PROJECTION PROPERTIES
The projection Π (with respect to a weighted Euclidean norm ‖·‖_ξ) is nonexpansive:
    ‖ΠJ − ΠJ̄‖_ξ ≤ ‖J − J̄‖_ξ,   for all J, J̄ ∈ ℜ^n
(This follows from the Pythagorean theorem: ‖Π(J − J̄)‖_ξ^2 + ‖(I − Π)(J − J̄)‖_ξ^2 = ‖J − J̄‖_ξ^2.)
PROOF OF CONTRACTION PROPERTY
With ξ the steady-state distribution of the policy's Markov chain, ‖PJ‖_ξ ≤ ‖J‖_ξ, so
    ‖ΠT J − ΠT J̄‖_ξ ≤ ‖T J − T J̄‖_ξ = α ‖P(J − J̄)‖_ξ ≤ α ‖J − J̄‖_ξ
Error bound:
    ‖J_μ − Φr*‖_ξ ≤ (1 / √(1 − α²)) ‖J_μ − ΠJ_μ‖_ξ
Proof: We have
    ‖J_μ − Φr*‖_ξ^2 = ‖J_μ − ΠJ_μ‖_ξ^2 + ‖ΠJ_μ − Φr*‖_ξ^2
                     = ‖J_μ − ΠJ_μ‖_ξ^2 + ‖ΠT_μ J_μ − ΠT_μ(Φr*)‖_ξ^2
                     ≤ ‖J_μ − ΠJ_μ‖_ξ^2 + α^2 ‖J_μ − Φr*‖_ξ^2,
where
The first equality uses the Pythagorean Theorem
The second equality holds because J_μ is the fixed point of T_μ and Φr* is the fixed point of ΠT_μ
The inequality uses the contraction property of ΠT_μ.
Q.E.D.
MATRIX FORM OF PROJECTED EQUATION
The projected equation Φr = ΠT_μ(Φr) is equivalent to the linear equation Cr* = d, where
    C = Φ′Ξ(I − αP)Φ,   d = Φ′Ξg,
and Ξ is the diagonal matrix with the steady-state probabilities ξ_1, ..., ξ_n along the diagonal.
PROJECTED EQUATION: SOLUTION METHODS
Matrix inversion: r* = C^{−1} d
Projected Value Iteration (PVI) method:
    Φr_{k+1} = ΠT(Φr_k) = Π(g + αPΦr_k)
Converges to r* because ΠT is a contraction.
[Figure: PVI iteration. The value iterate T(Φr_k) = g + αPΦr_k is projected on S, the subspace spanned by the basis functions, yielding Φr_{k+1}.]
Writing the projection as a least squares minimization and setting the gradient to zero yields
    r_{k+1} = r_k − (Φ′ΞΦ)^{−1} (Cr_k − d)
SIMULATION-BASED IMPLEMENTATIONS
Key idea: calculate simulation-based estimates with
    C_k → C,   d_k → d
Matrix inversion r* = C^{−1}d is approximated by r̂_k = C_k^{−1} d_k; this is the LSTD (Least Squares Temporal Differences) method.
PVI is approximated by
    r_{k+1} = r_k − G_k (C_k r_k − d_k)
where
    G_k ≈ (Φ′ΞΦ)^{−1}
This is the LSPE (Least Squares Policy Evaluation) method.
Key fact: C_k, d_k, and G_k can be computed with low-dimensional linear algebra (of order s, the number of basis functions).
SIMULATION MECHANICS
Generate a long trajectory (i_0, i_1, ...) of the Markov chain of μ, and set
    C_k = (1/(k+1)) Σ_{t=0}^k φ(i_t) ( φ(i_t) − αφ(i_{t+1}) )′  →  C = Φ′Ξ(I − αP)Φ,
    d_k = (1/(k+1)) Σ_{t=0}^k φ(i_t) g(i_t, i_{t+1})  →  d = Φ′Ξg
(by the law of large numbers), and iterate
    r_{k+1} = r_k − G_k (C_k r_k − d_k)
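A minimal sketch of these simulation mechanics: forming C_k, d_k from a sampled trajectory, then LSTD by matrix inversion and one LSPE step. The interface names (traj, phi, g, alpha, s) are assumptions for illustration.

```python
# Minimal sketch of LSTD/LSPE mechanics from a sampled trajectory.
import numpy as np

def lstd(traj, phi, g, alpha, s):
    """Build C_k, d_k from a trajectory and solve C_k r = d_k (LSTD)."""
    C = np.zeros((s, s))
    d = np.zeros(s)
    for t in range(len(traj) - 1):
        i, j = traj[t], traj[t + 1]
        C += np.outer(phi(i), phi(i) - alpha * phi(j))  # -> Phi' Xi (I - alpha P) Phi
        d += phi(i) * g(i, j)                           # -> Phi' Xi g
    C /= len(traj) - 1
    d /= len(traj) - 1
    return np.linalg.solve(C, d)

def lspe_step(r, C_k, d_k, G_k):
    """One LSPE iteration: r_{k+1} = r_k - G_k (C_k r_k - d_k)."""
    return r - G_k @ (C_k @ r - d_k)
```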
MULTISTEP METHODS
Introduce a multistep version of Bellman's equation, J = T^(λ) J, where for λ ∈ [0, 1),
    T^(λ) = (1 − λ) Σ_{ℓ=0}^∞ λ^ℓ T^{ℓ+1}
Since T^ℓ is a contraction with modulus α^ℓ, T^(λ) is a contraction with modulus
    α_λ = (1 − λ) Σ_{ℓ=0}^∞ α^{ℓ+1} λ^ℓ = α(1 − λ) / (1 − αλ)
Note that α_λ → 0 as λ → 1
T^ℓ and T^(λ) have the same fixed point J_μ, and
    ‖J_μ − Φr_λ*‖_ξ ≤ (1 / √(1 − α_λ²)) ‖J_μ − ΠJ_μ‖_ξ,
where Φr_λ* is the fixed point of ΠT^(λ). The fixed point Φr_λ* depends on λ.
[Figure: as λ increases toward 1, the solution Φr_λ* moves toward ΠJ_μ; the slope illustration compares λ = 0 with larger λ.]
The matrix form of the projected equation Φr = ΠT^(λ)(Φr) uses
    P^(λ) = (1 − λ) Σ_{ℓ=0}^∞ α^ℓ λ^ℓ P^{ℓ+1},   g^(λ) = Σ_{ℓ=0}^∞ α^ℓ λ^ℓ P^ℓ g
The simulation process to obtain C_k^(λ) and d_k^(λ) is similar to the case λ = 0 (a single simulation trajectory i_0, i_1, ..., but more complex formulas):
    C_k^(λ) = (1/(k+1)) Σ_{t=0}^k φ(i_t) Σ_{m=t}^k α^{m−t} λ^{m−t} ( φ(i_m) − αφ(i_{m+1}) )′,
    d_k^(λ) = (1/(k+1)) Σ_{t=0}^k φ(i_t) Σ_{m=t}^k α^{m−t} λ^{m−t} g_{i_m}
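A minimal sketch of these multistep estimates: maintaining the eligibility trace z_t = Σ_{m≤t} (αλ)^{t−m} φ(i_m) and accumulating it against the one-step quantities reproduces exactly the double sums above. Interface assumptions as in the earlier LSTD sketch.

```python
# Minimal sketch of C_k^(lambda), d_k^(lambda) via an eligibility trace.
import numpy as np

def lstd_lambda(traj, phi, g, alpha, lam, s):
    C = np.zeros((s, s))
    d = np.zeros(s)
    z = np.zeros(s)                        # eligibility trace
    for t in range(len(traj) - 1):
        i, j = traj[t], traj[t + 1]
        z = (alpha * lam) * z + phi(i)     # z_t = sum_{m<=t} (alpha*lam)^{t-m} phi(i_m)
        C += np.outer(z, phi(i) - alpha * phi(j))
        d += z * g(i, j)
    k = len(traj) - 1
    return C / k, d / k
```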
LECTURE 5
LECTURE OUTLINE
Review of approximate PI
Review of approximate policy evaluation based on projected Bellman equations
Exploration enhancement in policy evaluation
Oscillations in approximate PI
Aggregation: an alternative to the projected equation/Galerkin approach
Examples of aggregation
Simulation-based aggregation
DISCOUNTED MDP
Review: states i = 1, ..., n; controls u ∈ U(i); transition probabilities p_ij(u); one-stage costs g(i, u, j); discount factor α ∈ [0, 1)
Shorthand notation for DP mappings:
    (T J)(i) = min_{u∈U(i)} Σ_{j=1}^n p_ij(u) ( g(i, u, j) + α J(j) ),   i = 1, ..., n,
    (T_μ J)(i) = Σ_{j=1}^n p_ij(μ(i)) ( g(i, μ(i), j) + α J(j) ),   i = 1, ..., n
APPROXIMATE PI
Approximate the cost of the current policy by J̃(r) = Φr.
[Figure: oscillation of approximate PI. The parameter iterates r_{μ^k}, r_{μ^{k+1}}, ... move among the regions R_{μ^1}, R_{μ^2}, R_{μ^3}, ... of the greedy partition and may cycle: policy oscillation.]
Aggregation approach: solve a fixed point equation of the form
    ΦR = (W T)(ΦR),
and approximate the cost by interpolation:
    J̃(j) = Σ_y φ_jy R(y),   for all j
EXAMPLE I: HARD AGGREGATION
Group the original system states into subsets, and view each subset as an aggregate state. Example: states 1, ..., 9 of a 3 × 3 grid, grouped into aggregate states x_1 = {1, 2, 4, 5}, x_2 = {3, 6}, x_3 = {7, 8}, x_4 = {9}, with aggregation probabilities equal to 0 or 1:
    Φ = [ 1 0 0 0
          1 0 0 0
          0 1 0 0
          1 0 0 0
          1 0 0 0
          0 1 0 0
          0 0 1 0
          0 0 1 0
          0 0 0 1 ]
[Figure: special states / aggregate states / features on the grid.]
Representative/Aggregate States:
[Figure: a few representative states y_1, y_2, y_3 serve as aggregate states; each original state j (e.g., j_1, j_2, j_3) relates to them through the aggregation probabilities φ_jy.]
Coarse-grid equations:
    J̃(i) = min_{u∈U(i)} H(i, u, J̃, R),   R(ℓ) = Σ_{i∈S_ℓ} d_ℓi J̃(i),   i ∈ S_ℓ,   ℓ = 1, ..., m.
LECTURE 6
LECTURE OUTLINE
Review of discounted MDPs: states i = 1, ..., n; controls u ∈ U(i); transition probabilities p_ij(u); one-stage costs g(i, u, j); discount factor α ∈ [0, 1)
Shorthand notation for DP mappings:
    (T J)(i) = min_{u∈U(i)} Σ_{j=1}^n p_ij(u) ( g(i, u, j) + α J(j) ),   i = 1, ..., n,
    (T_μ J)(i) = Σ_{j=1}^n p_ij(μ(i)) ( g(i, μ(i), j) + α J(j) ),   i = 1, ..., n
THE TWO MAIN ALGORITHMS: VI AND PI
Value iteration: J_{k+1} = T J_k
Policy iteration: Given μ^k,
Policy evaluation: Compute J_{μ^k} by solving J_{μ^k} = T_{μ^k} J_{μ^k}
Policy improvement: Let μ^{k+1} be such that
    μ^{k+1}(i) ∈ arg min_{u∈U(i)} Σ_{j=1}^n p_ij(u) ( g(i, u, j) + α J_{μ^k}(j) ),   for all i
or T_{μ^{k+1}} J_{μ^k} = T J_{μ^k}
We discussed approximate versions of VI and PI using projection and aggregation
We focused so far on cost functions and their approximation. We now consider Q-factors.
BELLMAN EQUATIONS FOR Q-FACTORS
The optimal Q-factors are defined, for all (i, u), by
    Q*(i, u) = Σ_{j=1}^n p_ij(u) ( g(i, u, j) + α J*(j) )
Since J*(j) = min_{u′∈U(j)} Q*(j, u′), they satisfy Q* = F Q*, where
    (F Q)(i, u) = Σ_{j=1}^n p_ij(u) ( g(i, u, j) + α min_{u′∈U(j)} Q(j, u′) )
Q-factors of a policy μ: For all (i, u)
    Q_μ(i, u) = Σ_{j=1}^n p_ij(u) ( g(i, u, j) + α Q_μ(j, μ(j)) )
Equivalently Q_μ = F_μ Q_μ, where
    (F_μ Q)(i, u) = Σ_{j=1}^n p_ij(u) ( g(i, u, j) + α Q(j, μ(j)) )
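A minimal sketch of VI for Q-factors, Q_{k+1} = F Q_k, using the assumed tabular conventions of the earlier sketches.

```python
# Minimal sketch of value iteration on Q-factors.
import numpy as np

def q_value_iteration(P, g, alpha, iters=1000):
    n, m = P[0].shape[0], len(P)
    Q = np.zeros((n, m))
    for _ in range(iters):
        J = Q.min(axis=1)                 # J(j) = min_{u'} Q(j, u')
        Q = np.array([[P[u][i] @ (g[u][i] + alpha * J) for u in range(m)]
                      for i in range(n)])
    return Q                              # greedy policy: Q.argmin(axis=1)
```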
WHAT IS GOOD AND BAD ABOUT Q-FACTORS
Good: given the Q-factors, an optimal policy can be implemented on-line without computing an expectation; the mapping F is a sup-norm contraction. Bad: there are more Q-factors than costs (one per state-control pair), so more storage is needed.
Generic stochastic approximation algorithm:
Consider a generic fixed point problem involving an expectation:
    x = E_w { f(x, w) }
Assume E_w { f(x, w) } is a contraction with respect to some norm, so the iteration
    x_{k+1} = E_w { f(x_k, w) }
converges to the unique fixed point.
LECTURE 7: MONTE CARLO METHODS FOR LINEAR EQUATIONS
Dimitri P. Bertsekas
We focus on the solution of Cx = d
Use simulation to compute C_k → C and d_k → d
Estimate the solution by matrix inversion: C_k^{−1} d_k → C^{−1} d (assuming C is invertible)
Alternatively, solve C_k x = d_k iteratively
Why simulation? C may be of small dimension, but may be defined in terms of matrix-vector products of huge dimension
Outline
1 Motivating Framework: Low-Dimensional Approximation
2 Sampling Issues
    Simulation for Projected Equations
    Multistep Methods
    Constrained Projected Equations
3 Solution Methods and Singularity Issues
Low-dimensional approximation of a fixed point problem y = Ay + b: approximate the solution within a subspace S spanned by basis functions, via the projected equation
    x = Π(Ax + b)
(a Galerkin approximation of the equation y = Ay + b).
Let Π be the projection onto S with respect to a weighted Euclidean norm ‖y‖_Ξ = (y′Ξy)^{1/2}
Projected equation (the columns of Φ span S):
    Φx = Π(AΦx + b)
or
    Cx = d
where
    C = Φ′Ξ(I − A)Φ,   d = Φ′Ξb
Aggregation equation
By forming convex combinations of variables (i.e., y ≈ Φx) and of equations (using D), we obtain an aggregate form of the fixed point problem y = Ay + b:
    x = D(AΦx + b)
or Cx = d with
    C = I − DAΦ,   d = Db
Least squares problem:
    min_{x∈ℜ^s} ‖Wx − h‖^2,
whose optimality condition is Cx = d with
    C = W′W,   d = W′h
Critical Problem
Compute sums Σ_{i=1}^n a_i for very large n (or n = ∞): write Σ_i a_i = Σ_i ξ_i (a_i/ξ_i) = E_ξ{ a_i/ξ_i } and estimate the expectation by sampling i according to ξ.
Example: d = Φ′Ξb is approximated, using indices i_0, i_1, ... sampled according to ξ, by
    d_k = (1/(k+1)) Σ_{t=0}^k φ(i_t) b_{i_t}  →  d
Algorithms
Matrix inversion approach: x̂_k = C_k^{−1} d_k (if C_k is invertible for large k)
Iterative approach: x_{k+1} = x_k − γ G_k (C_k x_k − d_k)
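A minimal sketch contrasting the two approaches on given estimates C_k, d_k. The scaling matrix G_k and the stepsize γ are illustrative choices; as discussed later in this talk, convergence of the iterative method requires the eigenvalues of I − γ G_k C_k to lie within the unit circle.

```python
# Minimal sketch of the two solution approaches for C x = d estimates.
import numpy as np

def solve_by_inversion(C_k, d_k):
    # Matrix inversion approach: x_k = C_k^{-1} d_k (if C_k is invertible)
    return np.linalg.solve(C_k, d_k)

def solve_iteratively(C_k, d_k, G_k, gamma=0.5, iters=1000):
    # Iterative approach: x_{k+1} = x_k - gamma * G_k (C_k x_k - d_k)
    x = np.zeros(len(d_k))
    for _ in range(iters):
        x = x - gamma * G_k @ (C_k @ x - d_k)
    return x
```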
Simulation for Projected Equations and Multistep Methods
Two approaches for approximating the cost vector J_μ of a policy within the subspace S, where for λ ∈ [0, 1)
    T^(λ) = (1 − λ) Σ_{ℓ=0}^∞ λ^ℓ T^{ℓ+1}
Direct method: projection ΠJ_μ of the cost vector J_μ
Indirect method: solving a projected form of Bellman's equation,
    Φr = ΠT^(λ)(Φr)
Special multistep sampling methods are used to form the associated estimates.
Bias-variance tradeoff: as λ increases from 0 toward 1, the solution of J̃ = ΠT^(λ)(J̃) moves toward ΠJ_μ (smaller bias), while the simulation error of the estimates grows.
[Figures: transition diagram and costs under policy {μ, μ, ...}; the multistep value iterate T^(λ)(Φr_k) projected on the subspace S spanned by the basis functions; the solutions of J̃ = ΠT^(λ)(J̃) for λ = 0, λ = 1, and 0 < λ < 1, with slope, bias, and simulation error indicated; a queueing example with routing decisions u = 1, u = 2 and stage costs 1, 2, and −10.]
Constrained Projected Equations
Consider the constrained projected equation
    x = ΠT(x) = Π(Ax + b)
where Π is the projection operation onto a closed convex subset Ŝ of the subspace S (with respect to a weighted norm ‖·‖_Ξ; Ξ: positive definite).
[Figure: T(x*) is projected back onto the convex set Ŝ within the subspace S spanned by the basis functions, so that x* = ΠT(x*).]
From the properties of projection,
    (x* − T(x*))′ Ξ (y − x*) ≥ 0,   for all y ∈ Ŝ
This is a linear variational inequality: find x* ∈ Ŝ such that
    f(x*)′ (y − x*) ≥ 0,   for all y ∈ Ŝ,
where
    f(y) = Ξ(y − T(y)) = Ξ(y − (Ay + b)).
Equivalence Conclusion
The constrained projected equation is equivalent to the variational inequality: find x* such that
    f(x*)′ (x − x*) ≥ 0,   for all x ∈ X,
where
    f(y) = Ξ(y − T(y)),   X = {x | Φx ∈ Ŝ}
Deterministic solution methods for Cx = d: besides matrix inversion,
    x* = C^{−1} d,
the iteration
    x_{k+1} = x_k − γ G(Cx_k − d)
where:
    G is a scaling matrix, γ > 0 is a stepsize
    Eigenvalues of I − γGC within the unit circle (for convergence)
Special cases:
    Projection/Richardson's method: C positive semidefinite, G positive definite symmetric
    Proximal method (quadratic regularization)
    Splitting/Gauss-Seidel method
Simulation-based methods: given estimates C_k → C and d_k → d,
Matrix Inversion Method
    x̂_k = C_k^{−1} d_k
Iterative Method
    x_{k+1} = x_k − γ G_k (C_k x_k − d_k)
where:
    G_k is a scaling matrix with G_k → G
    γ > 0 is a stepsize
Questions:
    Under what conditions is the stochastic method convergent?
    How to modify the method to restore convergence?
When C is singular, the stochastic iteration
    x_{k+1} = x_k − γ G_k (C_k x_k − d_k)
can behave badly: the eigenvalues of I − γ G_k C_k may get in and out of the unit circle for a long time (until the "size" of the simulation noise becomes comparable to the "stability margin" of the iteration)
Compare with the deterministic iteration
    x_{k+1} = x_k − γ G(Cx_k − d)
for which
    x_{k+1} − x* = (I − γGC)(x_k − x*)
Decompose the iterates along the nullspace N(C) and its orthogonal complement:
    x_k = x* + U y_k + V z_k
However, since the spectral radius satisfies ρ(I − γ G_k C_k) → 1, it may cross above 1 too frequently, and we can have divergence.
[Figure: the solution set x* + N(C); the iterate sequence splits into a nullspace component sequence and an orthogonal component sequence, x_k = x* + U y_k + V z_k.]
A 2 × 2 Example
Let the noise {e_k} be Monte Carlo averages with mean 0, so e_k → 0, and let
    x_{k+1} = [ 1 + e_k    0
                e_k       1/2 ] x_k
The coupling term that drives the second component involves
    e_k ∏_{t=1}^k (1 + e_t),
which can remain large for a long time even though e_k → 0, so the iteration can diverge.
A Simple Example
Consider the inversion of a scalar c > 0, with simulation error δ. The absolute and relative errors are
    E = 1/(c + δ) − 1/c,   E_r = E / (1/c).
By a Taylor expansion around δ = 0:
    E ≈ (∂/∂δ)[ 1/(c + δ) ] |_{δ=0} · δ = −δ/c²,   E_r ≈ −δ/c.
For the estimate 1/(c + δ) to be reliable, it is required that
    |δ| << |c|.
Number of i.i.d. samples needed: k ≫ 1/c².
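A quick numeric check of these first-order error estimates (the values of c and δ are illustrative):

```python
# Check: for small delta, E ~ -delta/c^2 and E_r ~ -delta/c (assumed values).
c, delta = 0.05, 0.001
E = 1 / (c + delta) - 1 / c
print(E, -delta / c**2)          # absolute error vs first-order estimate
print(E / (1 / c), -delta / c)   # relative error vs first-order estimate
```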
Proof Outline: Decompose the iterates as
    x_k = x* + U y_k + V z_k
Then the residual satisfies
    C x_k − d = C V z_k → 0
C_k = M_k,   d_k = h_k,
A Stabilization Scheme
Shift the eigenvalues of I − γ G_k C_k by −δ_k:
    x_{k+1} = (1 − δ_k) x_k − γ G_k (C_k x_k − d_k).
[Figure: ‖(1 − δ_k)I − γ G_k A_k‖ plotted against k = 10^0, ..., 10^4 for (i) δ_k = k^{−1/3} and (ii) δ_k = 0.]
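A minimal sketch of one step of the stabilized iteration, with the vanishing shift δ_k = k^{−1/3} used in the plot above; C_k, d_k, G_k are the simulation-based quantities, and γ is an illustrative stepsize.

```python
# Minimal sketch of the stabilized iteration with a vanishing shift.
import numpy as np

def stabilized_step(x, k, C_k, d_k, G_k, gamma=1.0):
    delta_k = (k + 1) ** (-1.0 / 3.0)   # eigenvalue shift, vanishing as k grows
    return (1 - delta_k) * x - gamma * G_k @ (C_k @ x - d_k)
```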
Thank You!