
Matlab Optimization and Integration

Paul Schrimpf

January 14, 2009


This lecture focuses on two ubiquitous numerical techniques:
1. Optimization and equation solving
   - Agents maximize utility / profits
   - Estimation
2. Integration
   - Expectations of the future or over unobserved variables


Optimization

Want to solve a minimization problem:

  min_x f(x)

Two basic approaches:
1. Heuristic methods search over x in some systematic way
2. Model based approaches use an easily minimized approximation to f() to guide their search

First, x in R for intuition; then, x in R^n


Section Search

Form a bracket x1 < x2 < x3 such that f(x2) < f(x1) and f(x2) < f(x3)
Try a new x in (x1, x3), update the bracket
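For intuition, here is a minimal golden-section search sketch (illustrative, not the lecture's code); the objective, bracket, and tolerance are placeholders, and f is re-evaluated at both interior points each pass for clarity rather than efficiency:

f   = @(x) (x-2).^2 + 1;     % placeholder objective
a   = 0;  b = 5;             % initial bracket endpoints
phi = (sqrt(5)-1)/2;         % inverse golden ratio, about 0.618
c   = b - phi*(b-a);         % left interior point
d   = a + phi*(b-a);         % right interior point
while (b - a) > 1e-8
  if f(c) < f(d)             % minimum lies in [a, d]
    b = d;  d = c;  c = b - phi*(b-a);
  else                       % minimum lies in [c, b]
    a = c;  c = d;  d = a + phi*(b-a);
  end
end
xmin = (a+b)/2;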


Quadratic Interpolation

More general interpolation methods are possible, e.g. Matlab uses both quadratic and cubic interpolation for line search
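A hedged sketch of one parabolic-interpolation step: fit a parabola through the three bracket points and move to its vertex. The objective and bracket values below are made up for illustration:

% One step of successive parabolic interpolation (illustrative sketch)
f  = @(x) x.^4 - 3*x.^2;              % placeholder objective, min near 1.2247
x1 = 0.5; x2 = 1.2; x3 = 1.5;         % bracket with f(x2) below both endpoints
f1 = f(x1); f2 = f(x2); f3 = f(x3);
% vertex of the parabola through (x1,f1), (x2,f2), (x3,f3)
num  = (x2-x1)^2*(f2-f3) - (x2-x3)^2*(f2-f1);
den  = (x2-x1)*(f2-f3)   - (x2-x3)*(f2-f1);
xnew = x2 - 0.5*num/den;              % next trial point; then update the bracket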


Newton-Raphson and Quasi-Newton

Newton: use f'(x) and f''(x) to construct a parabola and step to its minimum:

  x_{n+1} = x_n - f'(x_n)/f''(x_n)

Quasi-Newton: approximate f''(x_n) with (f'(x_n) - f'(x_{n-1})) / (x_n - x_{n-1})

BHHH: for objectives of the form f(x) = sum_{i=1}^N f_i(x), approximate f''(x) with sum_{i=1}^N f_i'(x) f_i'(x)'
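A small sketch of both updates in one dimension, using a placeholder objective; the derivative formulas and starting points are assumptions for illustration only:

% Newton and secant (quasi-Newton) iterations in one dimension (sketch)
% placeholder objective f(x) = x^4 - 3x^2
fp  = @(x) 4*x.^3 - 6*x;        % f'(x)
fpp = @(x) 12*x.^2 - 6;         % f''(x)
x = 2;                          % starting point
for it = 1:20                   % Newton: x_{n+1} = x_n - f'(x_n)/f''(x_n)
  x = x - fp(x)/fpp(x);
end
% secant version: replace f'' by a finite difference of f'
x0 = 2; x1 = 1.9;
for it = 1:20
  fpp_approx = (fp(x1) - fp(x0))/(x1 - x0);
  xnew = x1 - fp(x1)/fpp_approx;
  x0 = x1; x1 = xnew;
end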


Rates of Convergence
Newton's method converges quadratically, i.e. in a neighborhood of the solution,

  lim_{n->inf} ||x_{n+1} - x*|| / ||x_n - x*||^2 = C

Parabolic interpolation and quasi-Newton methods also achieve better than linear rates of convergence, but (usually) less than quadratic, i.e.

  lim_{n->inf} ||x_{n+1} - x*|| / ||x_n - x*||^r = C

for some r in (1, 2]
Can achieve faster than quadratic convergence by using more information
Usually, happy with any rate better than 1

Trust Regions

Problem: for a function that is not globally convex, quadratic interpolation and Newton methods might prescribe an upward step and can fail to converge
Solution: combine them with a sectional search, or more generally, a trust region
- Region R where we trust our quadratic approximation f_n(x) ≈ f(x):

    x_{n+1} = arg min_{x in R} f_n(x)

- Shrink or expand R based on how much better f(x_{n+1}) is than f(x_n)
Brent's method combines quadratic interpolation with sectional search


Matlab Implementation
fminbnd() uses Brent's method
No uni-dimensional implementation of any type of Newton method, but could use the multi-dimensional versions
We used fminbnd() in lecture 1:
optimset('fminbnd')  % this returns the default options for fminbnd
% change some options
opt = optimset('Display','iter', ...   % 'off','final','notify' also possible
               'MaxFunEvals',1000,'MaxIter',1000, ...
               'TolX',1e-8);
% maximize the expected value
[c ev] = fminbnd(@(c) obj(c,x,parm), cLo, cHi, opt);

For all optimization functions, can also set options through a graphical interface by using optimtool()

Multi-Dimensional Optimization with Quadratic Approximation

Basic idea is the same:
1. Construct a quadratic approximation to the objective
2. Minimize the approximation over some region
Complicated by the difficulty of constructing the region
- Cannot bracket a minimum in R^n; would need an n-1 dimensional manifold
Two approaches:
1. Use an n dimensional trust region
2. Break the problem into a sequence of smaller-dimensional minimizations


Directional Search Methods


1. Choose a direction
   - Naive approach: use basis or steepest descent directions
     - Very inefficient in the worst case
   - Try new directions, keep good ones: Powell's method or conjugate gradients
   - Use the Newton or quasi-Newton direction
     - Generally the fastest method
2. Do a univariate minimization along that direction; this step is called a line search
   - Exact: find the minimum along the direction
   - Approximate: just find a point that is enough of an improvement
3. Choose a different direction and repeat (see the sketch after this list)
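A minimal sketch of a directional search: steepest-descent direction plus a backtracking (approximate) line search, assuming the Rosenbrock test function as a placeholder objective; the constants are illustrative choices:

% Steepest descent with an Armijo backtracking line search (sketch)
f    = @(x) 100*(x(2)-x(1)^2)^2 + (1-x(1))^2;
grad = @(x) [-400*x(1)*(x(2)-x(1)^2) - 2*(1-x(1)); 200*(x(2)-x(1)^2)];
x = [-1; 1];
for it = 1:1000
  d = -grad(x);                              % search direction
  s = 1;                                     % initial step length
  while f(x + s*d) > f(x) - 1e-4*s*(d'*d)    % sufficient-decrease test
    s = s/2;                                 % backtrack
  end
  x = x + s*d;
  if norm(grad(x)) < 1e-6, break; end
end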


Trust Region Methods

Same as in one dimension:
- Region R where we trust our quadratic approximation f_n(x) ≈ f(x):

    x_{n+1} = arg min_{x in R} f_n(x)

- Shrink or expand R based on how much better f(x_{n+1}) is than f(x_n)
Hybrid method: combine directional search and a trust region
1. Use one of the approaches from the previous slide to choose an m < n dimensional subspace
2. Use the trust-region method to minimize f within the subspace
3. Choose a new subspace and repeat


Matlab Implementation
fminunc() offers two algorithms
1. optimset('LargeScale','off'): quasi-Newton with approximate line-search
   - Needs the gradient; will compute a finite difference approximation if the gradient is not supplied
   - The Hessian approximation is underdetermined by f'(x_n) and f'(x_{n-1}) in dimension > 1
   - Build the Hessian approximation with a recurrence relation: optimset('HessUpdate','bfgs') (usually better) or optimset('HessUpdate','dfp')
2. optimset('LargeScale','on'): hybrid method, 2-d trust region with conjugate gradients
   - Needs the Hessian; will compute a finite difference approximation if the Hessian is not supplied
   - Can exploit the sparsity pattern of the Hessian: optimset('HessPattern',sparse(kron(eye(K),ones(J))))
Generally, these algorithms perform much better with user-supplied derivatives than with finite-difference approximations to derivatives (see the sketch below):
optimset('GradObj','on','DerivativeCheck','on')
optimset('Hessian','on')
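A hedged sketch of calling fminunc with a user-supplied gradient; the quadratic objective and the deal() idiom for returning [f, g] from an anonymous function are illustrative choices, not the lecture's example:

A = [3 1; 1 2];  b = [1; 1];                  % placeholder quadratic problem
obj = @(x) deal(0.5*x'*A*x - b'*x, A*x - b);  % returns objective and gradient
opt = optimset('LargeScale','off', ...
               'GradObj','on', ...            % tell fminunc we supply the gradient
               'DerivativeCheck','on');       % compare against finite differences
[x,fval] = fminunc(obj, zeros(2,1), opt);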

Set Based Methods


Idea:
1. Evaluate the function on a set of points
2. Use the current points and function values to generate candidate new points
3. Replace points in the set with new points that have lower function values
4. Repeat until the set collapses to a single point
Examples: grid search, Nelder-Mead simplex, pattern search, genetic algorithms, simulated annealing
Pros: simple, robust
Cons: inefficient; an interpolation based method will usually do better


Nelder-Mead Simplex Algorithm

fminsearch() uses it
N + 1 points in N dimensions form a polyhedron; move the polyhedron by
1. Reflecting the worst point across the center, expanding if there's an improvement
2. Shrinking, e.g. toward the best point; other variations are possible
(Animation, Another Animation)
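A small usage sketch of fminsearch on a placeholder two-dimensional objective (the Rosenbrock function and tolerances are illustrative choices):

banana = @(x) 100*(x(2)-x(1)^2)^2 + (1-x(1))^2;   % placeholder objective
opt = optimset('Display','iter','TolX',1e-8,'TolFun',1e-8);
[x,fval] = fminsearch(banana,[-1.2; 1],opt);      % Nelder-Mead simplex search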


Pattern Search

Uses a set of N + 1 or more directions, {d_k}
Each iteration, with a step size Δ:
- Evaluate f(x_i + Δ d_k)
- If f(x_i + Δ d_k) < f(x_i), set x_{i+1} = x_i + Δ d_k and increase Δ
- If min_k f(x_i + Δ d_k) > f(x_i), set x_{i+1} = x_i and decrease Δ
In Matlab, [x fval] = patternsearch(@f,x)
- Requires the Genetic Algorithm and Direct Search Toolbox


Genetic Algorithm
Can find the global optimum (but I do not know whether this has been formally proven)
Begin with a random population of points, {x_n^0}, then
1. Compute the fitness of each point, f(x_n^i)
2. Select more fit points as parents
3. Produce children by mutation, x_n^{i+1} = x_n^i + ε, crossover, x_n^{i+1} = λ x_n^i + (1 - λ) x_m^i, and elites, x_n^{i+1} = x_n^i
4. Repeat until the function has not improved for many iterations
In Matlab, [x fval] = ga(@f,nvars,options)
- Requires the Genetic Algorithm and Direct Search Toolbox
- Many variations and options
  - Options can affect whether it converges to a local or global optimum
  - Read the documentation and/or use optimtool


Simulated Annealing and Threshold Acceptance


Can find the global optimum, and under certain conditions, which are difficult to check, finds the global optimum with probability 1
Algorithm:
1. Random candidate point x̃ = x_i + ε
2. Accept x_{i+1} = x̃ if f(x̃) < f(x_i) or
   - simulated annealing: with probability 1 / (1 + e^{(f(x̃) - f(x_i))/τ})
   - threshold acceptance: if f(x̃) < f(x_i) + T
3. Lower the temperature, τ, and the threshold, T
In Matlab, [x,fval] = simulannealbnd(@objfun,x0) and [x,fval] = threshacceptbnd(@objfun,x0)
- Requires the Genetic Algorithm and Direct Search Toolbox
- Many variations and options
  - Options can affect whether it converges to a local or global optimum
  - Read the documentation and/or use optimtool


Constrained Optimization
min_x f(x)
s.t. g(x) = 0

Quasi-Newton with directional search: sequential quadratic programming
- Choose the direction of search both to minimize the function and relax the constraints
fmincon() (see the sketch below)
- 'LargeScale','off' does sequential quadratic programming
- 'LargeScale','on' only works when the constraints are simple bounds on x; it is the same as large-scale fminunc
Matlab's pattern search and genetic algorithm work for constrained problems
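A minimal fmincon sketch, assuming a made-up objective and a single nonlinear equality constraint; the anonymous constraint function returns [c, ceq] as fmincon expects:

obj    = @(x) (x(1)-1)^2 + (x(2)-2)^2;          % placeholder objective
confun = @(x) deal([], x(1)^2 + x(2)^2 - 1);    % no inequalities, ceq(x) = 0
opt = optimset('LargeScale','off','Display','iter');
[x,fval] = fmincon(obj,[0.5;0.5],[],[],[],[],[],[],confun,opt);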

Solving Systems of Nonlinear Equations


Very similar to optimization: F(x) = 0 is solved as min_x F(x)'F(x)
Large-scale fsolve = large-scale fmincon applied to the least-squares problem
Medium-scale fsolve = medium-scale lsqnonlin
- Gauss-Newton: replace F by its first order expansion,

    x_{n+1} - x_n = -(J'(x_n)J(x_n))^{-1} J'(x_n)F(x_n)

- Levenberg-Marquardt: add dampening to Gauss-Newton to improve performance when the first order approximation is bad,

    x_{n+1} - x_n = -(J'(x_n)J(x_n) + λ_n I)^{-1} J'(x_n)F(x_n)
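A sketch comparing fsolve with one hand-coded Gauss-Newton step on a placeholder two-equation system (the equations and starting point are illustrative):

% Solve F(x) = 0 with fsolve, then take one manual Gauss-Newton step
F = @(x) [x(1)^2 + x(2)^2 - 1;  x(1) - x(2)^2];
[xstar,Fval] = fsolve(F,[0.5; 0.5],optimset('Display','off'));

x = [0.5; 0.5];
J = [2*x(1) 2*x(2); 1 -2*x(2)];     % Jacobian of F at x
x = x - (J'*J)\(J'*F(x));           % Gauss-Newton update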


Derivatives
The best minimization algorithms require derivatives
Can use a finite difference approximation (a sketch follows this list):
- In theory: still get a better than linear rate of convergence
- In practice: can be inaccurate
- Takes n function evaluations; a user written gradient typically takes 2-5 times the work of a function evaluation
Easier analytic derivatives:
- Use a symbolic math program to get formulas, e.g. Matlab Symbolic Math Toolbox / Maple, Mathematica, Maxima
- Use automatic differentiation
  - In Matlab: INTLAB, ADMAT, MAD, ADiMat, or a version that we will create in the next lecture
  - Switch to a language with native automatic differentiation: AMPL, GAMS
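A minimal central finite-difference gradient, as a sketch of what the finite-difference fallback does; the step size h is an illustrative choice (central differences cost 2n evaluations rather than the n quoted for forward differences):

% save as fdGrad.m (sketch)
function g = fdGrad(f,x0)
  h = 1e-6;                           % step size; roundoff limits how small h can be
  n = numel(x0);
  g = zeros(n,1);
  for i = 1:n
    e = zeros(n,1); e(i) = h;
    g(i) = (f(x0+e) - f(x0-e))/(2*h); % central difference in coordinate i
  end
end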


Simple MLE Example: Binary Choice

% Script for estimating a binary choice model
% Paul Schrimpf, May 27, 2007
clear;
% set the parameters
data.N = 1000;               % number of observations
data.nX = 2;                 % number of x's
parm.beta = ones(data.nX,1);
% note use of function handles for distribution
% estimation assumes that the distribution is known
parm.dist.rand = @(m,n) random('norm',0,1,m,n);
parm.dist.pdf  = @(x) pdf('norm',x,0,1);
parm.dist.dpdf = @(x) -pdf('norm',x,0,1).*x;   % derivative of pdf
parm.dist.cdf  = @(x) cdf('norm',x,0,1);


% create some data
data = simulateBinChoice(data,parm);
% set optimization options
opt = optimset('LargeScale','off', ...
               'HessUpdate','bfgs', ...
               'GradObj','on', ...
               'DerivativeCheck','on', ...
               'Display','iter', ...
               'OutputFcn',@binChoicePlot);
b0 = zeros(data.nX,1);
[parm.beta like] = fminunc(@(b) binChoiceLike(b,parm,data), ...
                           b0,opt);
% display results
fprintf('likelihood = %g\n',like);
for i=1:length(parm.beta)
  fprintf('beta(%d) = %g\n',i,parm.beta(i));
end


simulateBinChoice()

function data=simulateBinChoice(dataIn,parm)
% ... comments and error checking omitted ...
data = dataIn;
data.x = randn(data.N,data.nX);
epsilon = parm.dist.rand(data.N,1);   % shocks from the assumed distribution
data.y = (data.x*parm.beta + epsilon > 0);
end


function [like grad hess g_i] = binChoiceLike(b,parm,data)
% returns the negative log-likelihood of 'data' for a binary choice model
% the model is y = (x*b + eps > 0)
% ... more comments omitted
xb = data.x*b;
% l_i will be N by 1, likelihood for each person
l_i = parm.dist.cdf(xb);
l_i(~data.y) = 1 - l_i(~data.y);
if any(l_i==0)
  warning('likelihood = 0 for %d observations\n',sum(l_i==0))
  l_i(l_i==0) = realmin;   % don't take log(0)!
end
like = -sum(log(l_i));     % negated because fminunc minimizes


Gradient for binChoiceLike()

% gradient of l_i
g_i = (parm.dist.pdf(xb)*ones(1,length(b))).*data.x;
g_i(~data.y,:) = -g_i(~data.y,:);
% change to gradient of the (negative) loglike
grad = -sum(g_i./(l_i*ones(1,length(b))),1)';


Hessian for binChoiceLike()

% calculate hessian
h_i = zeros(length(xb),length(b),length(b));
for i=1:length(xb)
  h_i(i,:,:) = (parm.dist.dpdf(xb(i))* ...
                data.x(i,:)'*data.x(i,:)) ...
               / l_i(i);    % make hessian of loglike
end
h_i(~data.y,:,:) = -h_i(~data.y,:,:);
% hessian of the (negative) loglike
hess = -(squeeze(sum(h_i,1)) - g_i'*(g_i./(l_i.^2*ones(1,length(b)))));


binChoicePlot.m
function stop = binChoicePlot(x,optimvalues,state)
if (length(x)==2)
  if (optimvalues.iteration==0)
    hold off;
  end
  grad = optimvalues.gradient;
  f = optimvalues.fval;
  plot3(f,x(1),x(2),'k*');
  if (optimvalues.iteration==0)
    hold on;
  end
  quiver3(f,x(1),x(2),0,grad(1),grad(2));
  drawnow
  pause(0.5);
end
stop = false;
end


Numerical Integration
Want:

  F(x) = ∫_a^b f(x) ω(x) dx

Approximate:

  F(x) ≈ Σ_{i=1}^n f(x_i) w_i

Different methods are different ways to choose n, x_i, and w_i
- Quadrature: choose w_i so the approximation is exact for a set of n basis elements
- Monte Carlo: set w_i = 1/n, choose x_i randomly
- Adaptive: refine n, x_i, and w_i until the approximation error is small



Quadrature
Suppose x_i and n are given; need to choose w_i
Let {e_j(x)} be a basis for the space of functions such that ∫_a^b f(x) ω(x) dx < ∞
∫_a^b e_j(x) ω(x) dx should be known for each j
{w_i}_{i=1}^n solve

  Σ_{i=1}^n w_i e_j(x_i) = ∫_a^b e_j(x) ω(x) dx   for j = 1..n    (4)

Example: Newton-Cotes
- basis = polynomials, ω(x) = 1
- The resulting rule is exact for all polynomials of degree less than or equal to n
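A sketch of solving equation (4) for the weights with a monomial basis and ω(x) = 1 on given, evenly spaced nodes (a Newton-Cotes style rule); the interval, number of points, and test integrand are placeholders:

a = 0; b = 1; n = 5;
x = linspace(a,b,n)';            % given nodes
A = zeros(n);  rhs = zeros(n,1);
for j = 1:n
  A(j,:) = x'.^(j-1);            % e_j(x) = x^(j-1) evaluated at each node
  rhs(j) = (b^j - a^j)/j;        % exact integral of x^(j-1) on [a,b]
end
w = A\rhs;                       % quadrature weights
approx = w'*exp(x);              % test: should be close to exp(1)-1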


Gaussian Quadrature
Now suppose n is given, but we can choose both w_i and x_i
Same idea: {w_i, x_i}_{i=1}^n solve

  Σ_{i=1}^n w_i e_j(x_i) = ∫_a^b e_j(x) ω(x) dx   for j = 1..2n-1    (5)

Exact for functions in the space spanned by {e_j}_{j=1}^{2n-1}
If {e_j} is an orthogonal polynomial basis, then {x_i}_{i=1}^n will be the roots of e_n
Different Gaussian quadrature rules correspond to different values of a, b, and ω(x)


Common Forms of Gaussian Quadrature

Interval     ω(x)                     Name
[-1, 1]      1                        Legendre
(-1, 1)      (1-x)^α (1+x)^β          Jacobi
(-1, 1)      (1-x^2)^{-1/2}           Chebyshev (first kind)
[-1, 1]      (1-x^2)^{1/2}            Chebyshev (second kind)
[0, ∞)       e^{-x}                   Laguerre
(-∞, ∞)      e^{-x^2}                 Hermite


Quadrature in Matlab

Not built in; many implementations are available at the Mathworks file exchange
Great for single dimension integration
Multiple dimension integration is harder
- The tensor product rule Σ_{i1=1}^n ... Σ_{im=1}^n f(x_{i1}, ..., x_{im}) w_{i1} ... w_{im} works, but needs n^m function evaluations (a two-dimensional sketch follows)
- More sophisticated methods exist, called cubature, sparse grid, or complete polynomials; see e.g. the Encyclopedia of Cubature Formulas
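A sketch of the tensor-product construction in two dimensions, assuming the course's gaussHermite() helper (used later in these slides) supplies the one-dimensional nodes and weights; the integrand is a placeholder:

% Two-dimensional tensor-product rule from a one-dimensional rule (sketch)
% approximates the double integral of f(u,v) exp(-u^2) exp(-v^2)
int = gaussHermite(10);                % assumed helper returning fields .x and .w
x = int.x;  w = int.w;
f = @(u,v) exp(-(u-v).^2);             % placeholder integrand
[X1,X2] = ndgrid(x,x);                 % all pairs of nodes: n^2 evaluations
W = w*w';                              % product weights
F2 = sum(sum(W .* f(X1,X2)));          % two-dimensional approximation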


Adaptive Integration

F(x) = ∫_a^b f(x) ω(x) dx

Idea: subdivide (a, b) into smaller intervals, use a simple quadrature rule on each interval, and repeat until convergence
- e.g. trapezoid rule, Simpson's rule
- trapz(), quad(), quadl()
Pro: computes the integral to known accuracy
Con: care must be used when part of an objective function
- Makes the objective function discontinuous at points where the solution goes from k to k + 1 subdivisions; integration accuracy must be set much higher than the convergence criteria of the optimization


Monte Carlo Integration

Randomly draw x_i from the distribution p(x) ∝ ω(x), set w_i = (1/n) ∫ ω(x) dx
Many methods for sampling from p(x): inverse cdf, acceptance sampling, importance sampling, Gibbs sampling, Metropolis-Hastings
Pros: simple to understand, easy to implement, scales well, requires little a priori knowledge of f(x)
Cons: inefficient; for a fixed n, the right quadrature rule will do much better
- But when computing something like Σ_i g(Σ_s f(y_i, x_{i,s}) w_s), errors in g(Σ_s f(y_i, x_{i,s}) w_s) for different i can offset one another


Useful Matlab functions for Monte Carlo Integration

x = rand(2,3);           % 2 by 3 matrix of x ~ U[0,1]
y = randn(10);           % 10 by 10 matrix of y ~ N(0,1)
% more generally
t = random('t',3,2,1);   % 2 by 1, t-dist with 3 df
% many other distributions possible


Integration Example

clear;
% some integration experiments
% Example 1: E[p(x)], x~N(0,1), p(x) polynomial
degree = 10;
p = rand(1,degree+1);    % make some polynomial
fprintf('adaptive quad, tol %.1g = %.10g\n', ...
        1e-10, quad(@(x) polyval(p,x).*normpdf(x), ...
        -30,30,1e-10));


Integration Example

fprintf(' gauss-hermite quadrature\n');
for n=1:((degree+1)/2+4)
  % using hermite will be exact for n >= (degree+1)/2
  int = gaussHermite(n);
  % notice the change of variables
  ep = polyval(p,int.x*sqrt(2))'*int.w/sqrt(pi);
  if (n==round((degree+1)/2))
    fprintf(' the rest should be exact \n');
  end
  fprintf('n=%2d E[p(x)] = %.10g\n',n,ep);
end


Integration Example

fprintf('\n monte carlo integration\n')
for n=1:6
  fprintf('n=10^%d E[p(x)] = %.10g\n',n, ...
          mean(polyval(p,randn(10^n,1))));
end


MCMC

For nonstandard distributions, it is often impossible to sample from p(x) directly
MCMC constructs a Markov chain, x_t ~ p(x_t | x_{t-1}), with stationary distribution p(x) and a transition kernel, p(x_t | x_{t-1}), that can be sampled from
Very common in Bayesian statistics


Metropolis-Hastings

General method for constructing a Markov chain
Algorithm to draw from p(x): beginning with some x_0
1. Draw y ~ q(x_t, ·)
2. Compute α(x_t, y) = p(y) q(y, x_t) / (p(x_t) q(x_t, y))
3. Draw u ~ U[0, 1]
4. If u < α(x_t, y) set x_{t+1} = y, otherwise set x_{t+1} = x_t
Choice of candidate density, q, affects the behavior of the chain
- If q is too disperse, will not accept many draws
- If q is too concentrated, will accept lots of draws, but they'll be close together
Example: metropolisHastings.m (a simplified sketch follows)
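A simplified random-walk Metropolis-Hastings sketch (not the course's metropolisHastings.m); the target density and proposal scale are placeholders, and the symmetric proposal makes the q terms cancel in α:

% Random-walk Metropolis-Hastings (sketch)
p   = @(x) 0.3*normpdf(x,-2,1) + 0.7*normpdf(x,3,2);  % placeholder target density
sig = 2;                            % proposal standard deviation
T   = 10000;
x   = zeros(T,1);                   % chain, started at x(1) = 0
for t = 2:T
  y     = x(t-1) + sig*randn;       % candidate draw from q(x_{t-1}, .)
  alpha = p(y)/p(x(t-1));           % acceptance ratio (symmetric q cancels)
  if rand < alpha
    x(t) = y;                       % accept
  else
    x(t) = x(t-1);                  % reject, stay put
  end
end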


Exercises

1. If you have written any code that involves optimization or integration, try modifying it to use a different method.
2. Modify the dynamic programming code from lecture 1 to allow for a continuous distribution for income. If you are clever, you will be able to evaluate E[v(x', y')] exactly, even if E[v(x', y')] does not have an analytic solution.
3. We know that value function iteration converges linearly. Gauss-Newton and other quadratic approximation based methods converge at faster rates. Change the dynamic programming code from lecture 1 to use one of these methods instead of value function iteration.
4. Change the binary choice model into a multinomial choice model. Allow for correlation between the shocks. Try to preserve the generality of the binary model, but feel free to limit the choice of distribution if it helps.


More Exercises
1. (hard) I don't know much about cubature, but I'd like to learn more. Read about a few methods. Find or write some Matlab code for one of them. Explore the accuracy of the method. To maintain a desired level of accuracy, how does the number of points grow with the number of dimensions? Compare it to Monte Carlo integration.
2. (hard) Matlab lacks an implementation of an optimization algorithm that uses interpolation in multiple dimensions. Remedy this situation. Find or develop an algorithm and implement it.
3. (hard, but not as hard) Write a function for computing arbitrary Gaussian quadrature rules with polynomials as a basis. Given integration limits, a weight function, ω(x), and the number of points, n, your function should return the integration points and weights. You might want to use the following facts taken from Numerical Recipes. Let ⟨f|g⟩ = ∫_a^b f(x) g(x) ω(x) dx denote the inner product. Then the following recurrence relation will construct a set of orthogonal polynomials:

  p_{-1}(x) ≡ 0
  p_0(x) ≡ 1
  p_{j+1}(x) = (x - ⟨x p_j | p_j⟩ / ⟨p_j | p_j⟩) p_j(x) - (⟨p_j | p_j⟩ / ⟨p_{j-1} | p_{j-1}⟩) p_{j-1}(x)

Recall that the roots of the n degree polynomial will be the integration points. If you have the roots, {x_j}_{j=1}^n, then the weights are given by

  w_j = ⟨p_{n-1} | p_{n-1}⟩ / (p_{n-1}(x_j) p_n'(x_j))

