1 Introduction
In this section, we examine regression models that are nonlinear in the parameters and give a brief overview of how they can be estimated and tested. Consider the two general forms

y_i = h(x_i, \beta, \varepsilon_i),   (1)

y_i = h(x_i, \beta) + \varepsilon_i.   (2)

1. h(x_i, \beta, \varepsilon_i) = \beta_0 x_{1i}^{\beta_1} x_{2i}^{\beta_2} \exp(\varepsilon_i). This is an intrinsically linear model because, by taking natural logarithms, we get a model that is linear in the parameters, \ln(y_i) = \ln(\beta_0) + \beta_1 \ln(x_{1i}) + \beta_2 \ln(x_{2i}) + \varepsilon_i. This transformed model can be estimated by ordinary least squares (see the sketch following this list).

2. h(x_i, \beta) = \beta_0 x_{1i}^{\beta_1} x_{2i}^{\beta_2}. Since the error term in (2) is additive, there is no transformation that will produce a linear model. This is an intrinsically nonlinear model (i.e., the relevant first-order conditions are nonlinear in the parameters). Below we consider two methods for estimating such a model: linearizing the underlying regression model and nonlinear optimization of the objective function.
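Before turning to those two methods, here is a minimal MATLAB sketch of the intrinsically linear case in item 1, estimated by OLS on the log-transformed equation. The sample size, parameter values, and error variance are illustrative choices, not taken from the notes.

% Estimating the intrinsically linear model in item 1 by OLS on logs.
n  = 200;
x1 = exp(randn(n,1));                     % positive regressors
x2 = exp(randn(n,1));
u  = 0.2*randn(n,1);
y  = 2.0 * x1.^0.7 .* x2.^0.3 .* exp(u);  % illustrative beta0 = 2, beta1 = 0.7, beta2 = 0.3
Z  = [ones(n,1) log(x1) log(x2)];         % regressors of the log-linear model
b  = Z \ log(y);                          % OLS: [ln(beta0); beta1; beta2]
fprintf('ln(beta0) = %.3f, beta1 = %.3f, beta2 = %.3f\n', b);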
One approach is to linearize the model by taking a first-order Taylor series expansion of h(x_i, \beta) around an initial value \beta^0,

h(x_i, \beta) \approx h(x_i, \beta^0) + g(x_i, \beta^0)(\beta - \beta^0),

where

g(x_i, \beta^0) = (\partial h/\partial\beta_1 |_{\beta=\beta^0}, ..., \partial h/\partial\beta_k |_{\beta=\beta^0}).

Collecting terms and rearranging gives the linearized regression

Y^0 = X^0 \beta + \varepsilon^0,

where

Y^0 \equiv Y - h(X, \beta^0) + g(X, \beta^0)\beta^0,
X^0 \equiv g(X, \beta^0).

The error term \varepsilon^0 contains both the original disturbances and the Taylor series approximation errors.
Given an initial value \beta^0, we can estimate \beta with the following iterative least squares procedure:

b_{t+1} = [X^0(b_t)' X^0(b_t)]^{-1} [X^0(b_t)' (X^0(b_t) b_t + e^0_t)]
        = b_t + [X^0(b_t)' X^0(b_t)]^{-1} X^0(b_t)' e^0_t
        = b_t + \lambda_t W_t g_t,

where e^0_t = Y - h(X, b_t) is the vector of residuals at iteration t, and we iterate until the difference between b_{t+1} and b_t is sufficiently small. This is called the Gauss-Newton algorithm. Interpretations for W_t, \lambda_t and g_t will be given below. A consistent estimator of \sigma^2 is

s^2 = \frac{1}{n-k} \sum_{i=1}^{n} (y_i - h(x_i, b))^2.
Only asymptotic results are available for this estimator. Assuming that the pseudoregressors are well-behaved (i.e., plim (1/n) X^{0\prime} X^0 = Q^0, a finite positive definite matrix), we can apply the CLT to show that

b \overset{asy}{\sim} N[\beta, (\sigma^2/n)(Q^0)^{-1}],

where (\sigma^2/n)(Q^0)^{-1} is estimated by s^2 (X^{0\prime} X^0)^{-1}.
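As a minimal sketch of the procedure above, the following MATLAB fragment runs the Gauss-Newton iteration for an illustrative nonlinear mean function h(x_i, b) = b_1 \exp(b_2 x_i). The model, simulated data, starting values, and tolerance are assumptions made for the example, not something specified in the notes.

% Gauss-Newton iteration (sketch) for y_i = b1*exp(b2*x_i) + e_i.
n = 100;  x = randn(n,1);
y = 2*exp(0.5*x) + 0.1*randn(n,1);            % simulated data, illustrative values
b = [1; 0];                                   % starting values
for t = 1:100
    h  = b(1)*exp(b(2)*x);                    % h(x_i, b)
    X0 = [exp(b(2)*x), b(1)*x.*exp(b(2)*x)];  % pseudoregressors dh/db
    db = (X0'*X0) \ (X0'*(y - h));            % Gauss-Newton step
    b  = b + db;
    if norm(db) < 1e-10, break, end           % stop when the change is small
end
k  = 2;
s2 = sum((y - b(1)*exp(b(2)*x)).^2) / (n - k);  % consistent estimate of sigma^2
V  = s2 * inv(X0'*X0);                          % estimated asymptotic covariance of b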
2.1.3 Notes
1. Depending on the initial value, b_0, the Gauss-Newton algorithm can lead to a local (as opposed to global) optimum.
2. The standard R^2 formula may produce a goodness-of-fit value outside the interval [0, 1].
3. Extensions of the J test are available that allow one to test nonlinear versus linear models.
Consider testing the hypothesis H_0: R(\beta) = q. Below are four tests that are asymptotically equivalent. Begin by letting S(b) = (Y - h(X, b))'(Y - h(X, b)) be the sum of squared residuals evaluated at the unrestricted NLS estimate. Also, let S(b^*) be the corresponding measure evaluated at the restricted estimate. The Wald statistic is

W = [R(b) - q]' [C \hat{V} C']^{-1} [R(b) - q] \overset{asy}{\sim} \chi^2(J),

where \hat{V} = \hat{\sigma}^2 (X^{0\prime} X^0)^{-1}, C = \partial R(b)/\partial b, and \hat{\sigma}^2 = S(b)/n.
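For concreteness, here is a short MATLAB sketch of the Wald computation for a single restriction. The restriction \beta_2 + \beta_3 = 1 (constant returns to scale, as in the MATLAB example of Section 4) is only an example, and b and V are assumed to already hold the unrestricted NLS estimate and its estimated covariance s^2(X^{0\prime}X^0)^{-1}.

% Wald test of R(beta) = q (sketch): here R(b) = b(2) + b(3), q = 1,
% so the Jacobian is C = [0 1 1].  J is the number of restrictions.
q  = 1;
C  = [0 1 1];
Rb = C * b;                                  % value of R at the estimate
W  = (Rb - q)' * ((C * V * C') \ (Rb - q));  % Wald statistic
J  = 1;                                      % compare W with a chi^2(J) critical value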
Assuming \varepsilon \sim N(0, \sigma^2 I), the likelihood ratio statistic is

LR = -2[\ln(L^*) - \ln(L)] \overset{asy}{\sim} \chi^2(J),

where \ln(L) and \ln(L^*) are the unrestricted and restricted log-likelihood values, respectively.
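Although the notes do not spell this out, under the normality assumption the log-likelihood concentrated over \sigma^2 depends on the data only through the sum of squared residuals, so the LR statistic can be computed directly from the restricted and unrestricted sums of squares:

\ln L = -\frac{n}{2}\left[1 + \ln(2\pi) + \ln\!\left(\frac{S(b)}{n}\right)\right]
\quad\Longrightarrow\quad
LR = n\left[\ln S(b^*) - \ln S(b)\right].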
2.2.4 Lagrange Multiplier Test.
The LM statistic is based solely on the restricted model. Occasionally, imposing the restriction R(\beta) = q changes an intrinsically nonlinear model into an intrinsically linear one. The LM statistic is

LM = \frac{e^{*\prime} X^0 [X^{0\prime} X^0]^{-1} X^{0\prime} e^*}{S(b^*)/n} = \frac{ESS}{TSS/n} = n R^2 \overset{asy}{\sim} \chi^2(J),

where e^* = Y - h(X, b^*) are the restricted residuals, X^0 is evaluated at b^*, ESS and TSS are the explained and total sums of squares from the regression of e^* on X^0 (note that TSS = e^{*\prime}e^* = S(b^*)), and R^2 is the (uncentered) coefficient of determination of e^* on X^0.
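A minimal MATLAB sketch of this auxiliary-regression form of the LM statistic follows; estar (the restricted residuals) and X0 (the pseudoregressors evaluated at the restricted estimate) are assumed to have been computed already.

% LM statistic via the auxiliary regression of the restricted residuals on X^0.
btilde = (X0' * X0) \ (X0' * estar);   % regression of e* on X^0
ESS    = estar' * X0 * btilde;         % explained sum of squares
TSS    = estar' * estar;               % total (uncentered) sum of squares
n      = numel(estar);
LM     = n * ESS / TSS;                % = n * (uncentered) R-squared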
An alternative method for estimating the parameters of equation (2) is to apply nonlinear optimization techniques directly to the first-order conditions. Consider the NLS problem of minimizing

S(b) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - h(x_i, b))^2.

The first-order conditions are

\frac{\partial S(b)}{\partial b} = -2 \sum_{i=1}^{n} (y_i - h(x_i, b)) \frac{\partial h(x_i, b)}{\partial b} = 0,   (3)

which are generally nonlinear in the parameters and do not have a nice closed-form, analytical solution. The methods outlined below can be used to solve the set of equations (3).
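Before turning to those methods, note that the sum of squares can also be handed directly to a derivative-free optimizer such as MATLAB's fminsearch (a simplex method of the kind mentioned in the next section). The data and mean function below are illustrative assumptions, not part of the notes.

% Direct minimization of S(b) (sketch) for an illustrative model y = b1*x^b2 + e.
n  = 100;
x  = exp(randn(n,1));
y  = 1.5 * x.^0.8 + 0.1*randn(n,1);
S  = @(b) sum((y - b(1) * x.^b(2)).^2);   % NLS objective
b0 = [1; 1];                              % starting values
bhat = fminsearch(S, b0);                 % derivative-free minimization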
3.1 Introduction
Consider optimizing the quadratic objective function f(\theta) = a + b\theta + c\theta^2. The first-order condition is

\frac{df(\theta)}{d\theta} = b + 2c\theta = 0 \implies \theta^* = -b/(2c).

This is considered a linear optimization problem (the first-order condition is linear in \theta) even though the objective function is nonlinear in the parameters. Alternatively, consider the objective function f(\theta) = a + b\theta^2 + c\ln(\theta). The first-order condition for minimization is

\frac{df(\theta)}{d\theta} = 2b\theta + \frac{c}{\theta} = 0.

This is a nonlinear optimization problem because the first-order condition is nonlinear in \theta.
Here is a general outline of how to solve the nonlinear optimization problem. Let \theta be the parameter vector and G(\theta) the objective function.
Procedure.
1. Specify a starting value \theta_0 and set t = 0.
2. Determine a direction vector \Delta_t.
3. Compute \theta_{t+1} = \theta_t + \lambda_t \Delta_t, where \lambda_t is the step length.
4. Check whether the convergence criterion is satisfied. Yes → Exit. No → set t = t + 1 and return to step 2.
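A minimal MATLAB rendering of this outline follows, using a toy concave objective and a steepest-ascent direction (anticipating the gradient methods below). The objective, step length, and tolerance are illustrative choices, not from the notes.

% Generic iterative scheme theta_{t+1} = theta_t + lambda_t * Delta_t (sketch).
G      = @(th) -(th(1) - 1)^2 - 2*(th(2) + 0.5)^2;   % illustrative concave objective
grad   = @(th) [-2*(th(1) - 1); -4*(th(2) + 0.5)];   % its gradient
theta  = [0; 0];                                     % step 1: starting value
lambda = 0.1;                                        % fixed step length
for t = 1:1000
    Delta     = grad(theta);                 % step 2: direction vector (uphill)
    theta_new = theta + lambda * Delta;      % step 3: update
    if abs(G(theta_new) - G(theta)) < 1e-12  % step 4: convergence check
        theta = theta_new;  break
    end
    theta = theta_new;
end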
There are two general types of nonlinear optimization algorithms: those that do not involve derivatives and those that do. Derivative-free algorithms are used when the number of parameters is small, analytical derivatives are difficult to calculate, or seed values are needed for other algorithms.
1. Grid search. This is a trial-and-error method that is typically not feasible for more than two parameters. It can be a useful means to find starting values for other algorithms (see the sketch after this list).
2. Direct search methods. Using the iterative algorithm \theta_{t+1} = \theta_t + \lambda_t \Delta_t, a search is performed in m directions without using derivative information.
3. Other methods. The simplex algorithm and simulated annealing are examples of other derivative-free methods.
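As referenced in item 1, here is a minimal sketch of a grid search over two parameters used to generate starting values. The data, objective, and grid ranges are illustrative assumptions.

% Grid search (sketch) for starting values of a two-parameter NLS problem.
x = exp(randn(50,1));  y = 1.5*x.^0.8 + 0.1*randn(50,1);
S = @(b) sum((y - b(1)*x.^b(2)).^2);        % illustrative NLS objective
b1grid = linspace(0.5, 3.0, 26);
b2grid = linspace(0.1, 1.5, 29);
best = Inf;  b0 = [b1grid(1); b2grid(1)];
for i = 1:numel(b1grid)
    for j = 1:numel(b2grid)
        val = S([b1grid(i); b2grid(j)]);
        if val < best
            best = val;
            b0 = [b1grid(i); b2grid(j)];    % best grid point so far
        end
    end
end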
The goal is to choose a direction vector \Delta_t that goes uphill (for a maximum) and an appropriate step length \lambda_t. Too big a step may overshoot the maximum and too small a step may be inefficient. (See Figures 5.3 and 5.4 attached.) With this in mind, consider choosing \Delta_t such that the objective function increases (i.e., G(\theta_{t+1}) > G(\theta_t)). Note that

dG(\theta_t + \lambda_t \Delta_t)/d\lambda_t = g_t' \Delta_t,

where g_t = dG(\theta_{t+1})/d\theta_{t+1}. If we let \Delta_t = W_t g_t, where W_t is a positive definite matrix, then we know that

dG(\theta_t + \lambda_t \Delta_t)/d\lambda_t = g_t' W_t g_t \geq 0.

Thus, for a small enough step, moving in the direction W_t g_t does not decrease the objective, and the general updating rule becomes

\theta_{t+1} = \theta_t + \lambda_t W_t g_t,

where \lambda_t is the step length, W_t is a weighting matrix, and g_t is the gradient. The Gauss-Newton algorithm above can be written in this general form. Here are examples of some other algorithms.
1. Steepest Ascent.
W_t = I, so \Delta_t = g_t.
This method has the drawback that it can be slow to converge, especially on long narrow ridges of the objective function.
2. Newton's Method.
Newton's method can be motivated by taking a Taylor series approximation (around \theta_0) of the objective function and maximizing the resulting quadratic, which yields \theta_{t+1} = \theta_t - H^{-1} g_t. Therefore, W_t = -H^{-1} and \lambda_t = 1. The drawbacks are that the Hessian can be difficult to calculate and that W_t may not be positive definite if far from the optimum.
3. Quadratic Hill Climbing.
W_t = -(H(\theta_t) - \alpha I)^{-1}, where \alpha > 0 is chosen to ensure that W_t is positive definite.
4. Davidon-Fletcher-Powell (DFP).
Choose W_0 = I and update W_t at each iteration using the successive changes in \theta_t and g_t, so that W_t builds up an approximation to -H^{-1}.
5. Method of Scoring.
W_t = [-E(\partial^2 \ln L/\partial\theta\,\partial\theta')]^{-1}, the inverse of the information matrix.
6. Outer Product of Gradients (BHHH).
W_t = [g(\theta_t) g(\theta_t)']^{-1}, which is an estimate of -H^{-1}(\theta_t) = [-\partial^2 \ln L/\partial\theta\,\partial\theta']^{-1}.
Notes.
1. Nonlinear optimization with constraints. There are several options, such as forming a Lagrangian function, substituting the constraint directly into the objective, and imposing arbitrary penalties in the objective function.
2. Assessing convergence. Common criteria include a sufficiently small change in the parameter vector, a sufficiently small change in the objective function, or a gradient sufficiently close to zero.
3. The biggest problem in nonlinear optimization is making sure the solution is a global, as opposed to
local, optimum. The above methods work well for globally concave (convex) functions.
1. Example #1. A sample of data (n = 20) was generated from the intrinsically nonlinear regression model

y_t = \beta_1 + \beta_2 x_{2t} + \beta_2^2 x_{3t} + \varepsilon_t,

and the NLS estimates are found by minimizing

G(\beta_1, \beta_2) = (y - h(\beta_1, \beta_2))'(y - h(\beta_1, \beta_2)),

where h(\beta_1, \beta_2) = \beta_1 + \beta_2 x_{2t} + \beta_2^2 x_{3t}. See Figure B.2 and Table B.3 (attached) to see how Newton's method performs in this example.
2. Example #2. The objective is to minimize

G(\theta) = \theta^3 - 3\theta^2 + 5.

The gradient and Hessian are

g(\theta) = 3\theta^2 - 6\theta = 3\theta(\theta - 2),
H(\theta) = 6\theta - 6 = 6(\theta - 1),

so Newton's method gives the iterations

\theta_{t+1} = \theta_t - \frac{3\theta_t(\theta_t - 2)}{6(\theta_t - 1)} = \theta_t - \frac{\theta_t(\theta_t - 2)}{2(\theta_t - 1)}.

This example highlights the fact that, at least for objective functions that are not globally concave (or convex), the choice of starting values is an important aspect of nonlinear optimization (see the sketch below).
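A minimal MATLAB sketch of the Newton iterations for Example #2 follows; the two starting values are illustrative and are chosen to show that the iterations can settle on different stationary points.

% Newton's method for G(theta) = theta^3 - 3*theta^2 + 5 from two starting values.
g = @(theta) 3*theta.^2 - 6*theta;     % gradient
H = @(theta) 6*theta - 6;              % Hessian (second derivative)
for theta0 = [3.0, 0.5]                % illustrative starting values
    theta = theta0;
    for t = 1:50
        step  = g(theta) / H(theta);   % Newton step
        theta = theta - step;
        if abs(step) < 1e-10, break, end
    end
    fprintf('start %.2f -> converged to %.4f (g = %.2e)\n', theta0, theta, g(theta));
end

Starting above \theta = 1, the iterations converge to the local minimum at \theta = 2; starting below 1, they settle on the stationary point at \theta = 0, which is a local maximum rather than a minimum.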
4 MATLAB Example
In this example, we are going to estimate the parameters of an intrinsically nonlinear Cobb-Douglas production function

Q_t = \beta_1 L_t^{\beta_2} K_t^{\beta_3} + \varepsilon_t

using Gauss-Newton and Newton's method, as well as test for constant returns to scale (see MATLAB example 14).
For the Gauss-Newton algorithm, the pseudoregressors evaluated at \beta^0 are

g(X, \beta^0) = \{ L_t^{\beta_2^0} K_t^{\beta_3^0},\ \beta_1^0 \ln(L_t) L_t^{\beta_2^0} K_t^{\beta_3^0},\ \beta_1^0 \ln(K_t) L_t^{\beta_2^0} K_t^{\beta_3^0} \}.
For Newton's method, the relevant gradient and Hessian matrices are

g(b) = -2 \sum_{i=1}^{n} e_i \frac{\partial h(x_i, b)}{\partial b},
H(b) = \frac{\partial g(b)}{\partial b'}.
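Since MATLAB example 14 itself is not reproduced in these notes, the following is a minimal self-contained sketch of the Gauss-Newton estimation of this production function together with a Wald test for constant returns to scale. The simulated data, parameter values, starting values, and tolerance are illustrative assumptions.

% Gauss-Newton estimation of Q_t = b1 * L_t^b2 * K_t^b3 + e_t (sketch).
n = 200;
L = exp(randn(n,1));  K = exp(randn(n,1));
Q = 1.0 * L.^0.6 .* K.^0.4 + 0.05*randn(n,1);   % illustrative "true" values (1, 0.6, 0.4)
b = [0.5; 0.5; 0.5];                            % starting values
for t = 1:200
    h  = b(1) * L.^b(2) .* K.^b(3);             % fitted values
    X0 = [L.^b(2).*K.^b(3), ...                 % pseudoregressors g(X, b)
          b(1)*log(L).*L.^b(2).*K.^b(3), ...
          b(1)*log(K).*L.^b(2).*K.^b(3)];
    db = (X0'*X0) \ (X0'*(Q - h));              % Gauss-Newton step
    b  = b + db;
    if norm(db) < 1e-10, break, end
end
s2 = sum((Q - b(1)*L.^b(2).*K.^b(3)).^2) / (n - 3);
V  = s2 * inv(X0'*X0);                          % asymptotic covariance estimate
% Wald test of constant returns to scale, H0: b2 + b3 = 1 (one restriction).
C = [0 1 1];
W = (C*b - 1)^2 / (C*V*C');                     % compare with a chi^2(1) critical value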