Autocorrelation

Generalized Least Squares (GLS) Theory, Heteroscedasticity & Autocorrelation
Ba Chu E-mail: ba chu@carleton.ca Web: http://www.carleton.ca/bchu

(Note that this is a lecture note. Please refer to the textbooks suggested in the course outline for details. Examples will be given and explained in the class )
Objectives
Until now we have assumed that var(u) = 2 I but it can happen that the errors have non-constant variance, i.e., var(u1 ) = var(u2 ) = = var(uT ) or are correlated, i.e., E (ut us ) = 0 for t = s. This assumption about the errors is true for many economic variables. Suppose instead that V ar(u) = 2 , where the matrix contains terms for heteroscedasticity and autocorrelation. First, we study the GLS estimator and the feasible GLS estimator. Second, we apply these techniques to do statistical inference on the regression model with heteroscedastic errors and test for heteroscedasticity. Third, we learn to estimate regression models with autocorrelated disturbances and test for serial correlation. Fourth, we also learn to use the ML technique to estimate the regression models with autocorrelated disturbances.
GLS
y = X + u, 1 (2.1)
Consider the model:
where E [u] = 0 and E [u u] = 2 with is any matrix, not necessarily diagonal.
2.1
Problem with the OLS
If we estimate (2.1) with the OLS, we will obtain OLS = (X X )1 X y = + (X X )1 X u. It is easy to verify that OLS is still unbiased. However, since V ar(OLS = 2 (X X )1 X X (X X )1 = 2 (X X )1 , inferences based on the estimator 2 (X X )1 , that we have seen, is misleading. Thus, t and F tests are invalid. Moreover, OLS is not always consistent. This is the main reason for us to learn the GLS.
2.2
Derivation of the GLS
Suppose has the eigenvalues 1 , . . . , T , by Choleskys decomposition, we obtain:
= S S ,
where is a diagonal matrix with the diagonal elements (1 , . . . , T ), and S is an orthogonal matrix. Thus,
1 = S 1 1 S 1 = S 1 1 / 2 1 / 2 S 1 = PP , where P = S 1 1/2 and 1/2 is a diagonal matrix with the diagonal elements ( 1 , . . . , T ). It is straight-forward to prove that P P = IT . Now, multiplying both sides of (2.1) by P yields
P y = P X + P u.
Setting y 0 = P y , X 0 = P X and u0 = P u, we end up with the classical linear regression model:
y 0 = X 0 + u0 , 2
where E [u0 ] = 0 and E [u0 u0 ] = E [P uu P ] = 2 P P = 2 IT . The OLS estimate of is given by
GLS = (X 0 X 0 )1 X 0 y 0 = (X P P X )1 X P P y = (X 1 X )1 X 1 y .
The last equation is dened as the GLS estimator denoted by GLS . Note that the subscript GLS is sometimes omitted for brevity. The unbiased estimate of 2 is given by 1 1 (y 0 X 0 ) (y 0 X 0 ) = (y X ) 1 (y X ). T K T K
2 GLS =
2.3
Properties of the GLS estimator
The GLS estimator is unbiased. V ar(GLS ) = 2 (X X )1 . The GLS estimator is BLUE. If u N (0, 2 ) then N ( , 2 (X 1 X )1 ). The F-test formula is given by ( R r ) [R(X 1 X )1 R ]1 (R r ) F =
q K
q 2 GLS
p
If limT X 0 X 0 /T = Q0 > 0, then = as T . T ( ) = N (0, 2 Q0 ).

d 1
2.4
Feasible GLS
Since the matrix is unknown, we need to use an estimator, say, . We have
F GLS = (X 1 X )1 X 1 y .
If the following conditions:
p lim (X 1 X /T ) = p lim (X 1 X /T )
T T
p lim (X 1 u/ T ) = p lim (X 1 u/ T )
T T
hold, then GLS and F GLS are asymptotically equivalent.
Heteroscedasticity
Lets consider the model (2.1) in which the matrix is a diagonal matrix with the diagonal elements
2 2 ). This is a form of heteroscedasticity dened in terms of inconstant variances. We can (1 , . . . , T
estimate this model by using either the FGLS or the modied OLS technique proposed by White [1980]. The FGLS procedure is straight-forward. We shall describe Whites OLS procedure in details.
2 2 2 , . . . , t , . . . , T ) White [1980] proposes a consistent estimator with the diagonal elements (1
given by
2 HC-0 t = u2 t , where ut is the OLS residual. 2 HC-1 t = 2 HC-2 t = 2 HC-3 t = T u2 . T K t u2 t , 1 ht
where ht is t-th diagonal element of the matrix X (X X )1 X .
u2 t . (1ht )2
The asymptotic variance-covariance matrix of OLS is given by
Avar(OLS ) = lim (X X /T )p lim( 2 OLS X X /T ) lim (X X /T ).

T T
Based on this heteroscedasticity-consistent estimator, the general linear hypotheses may be tested by the Wald statistics:
d
W = (ROLS r ) [RAvar(OLS )R ]1 (R r ) = 2 (q ).
The White procedure has large-sample validity. It may not work well in nite samples. We can identify whether or not there exists heteroscedasticity in the noise by using Whites test. The null hypothesis is homoscedasticity. To do Whites test, we proceed by regressing the OLS residuals on a constant, original regressors, the squares of the regressors, and the cross products of the regressors and obtain the R2 value. The test can be constructed by T R2 2 (q ), where q is the number of variables in the auxiliary regression less one. We reject the null if T R2 is suciently large.
Serial Correlation
Note that serial correlation and autocorrelation mean the same thing. Consider the model:
yt = Xt + ut , t = 1, . . . , T,
2 where E [ut ] = 0, E [u2 t = and E [ut us ] = 0 for t = s.
(4.1)
For example, we can model serial correlation by an AR(1) model, i.e.,
ut = ut1 + t ,
(4.2)
where
IID(0, 2 ) and || < 1. We can compute 2 , 1 2 |ts| 2 E [u t u s ] = . 1 2 E [u 2 t] =
More generally, we have 2 E [uu ] = 2 1 1 . . . 1 . . .

2
...
T 1
. . .
T 1 T 2 T 3
... 2 , = u . . ... . ... 1

T 2
(4.3)
2 and have the obvious meanings. where u
4.1
Estimate (4.1)-(4.2) by FGLS
The FGLS can be done in the following steps: 1. Regress yt on Xt and obtain ut = yt Xt OLS . 2. Regress ut on ut1 and obtain OLS =
P u u b P bu b
T t=2 t t1 T 2 t=2 t1
3. Construct the FGLS by F GLS = (X 1 X )1 X 1 y , where is obtained from (4.3) by substituting with OLS . If p limT
1 T T 1
Xt ut = 0, then OLS = and OLS = . Therefore, F GLS is consistent
and asymptotically normal.
4.2
Estimate (4.1)-(4.2) by the ML
An application of the conditional probability rules, we have
pdf (y1 , y2 ) = pdf (y2 |y1 )pdf (y1 ), pdf (y1 , y2 , y3 ) = pdf (y3 |y1 , y2 )pdf (y1 , y2 ) = pdf (y3 |y1 , y2 )pdf (y2 |y1 )pdf (y1 ), .........
T
pdf (y1 , . . . , yT ) = pdf (y1 )

t=2
f (yt |yt1 , . . . , y1 ).
Since
yt = Xt + ut = Xt + ut1 + N (0, 2 ), we have
= Xt + (yt1 Xt1 ) +
and
yt |yt1 N (yt1 + Xt Xt1 , 2 ).

Moreover, since y1 N (X1 , 1 ), we can derive the log-likelihood function: 2
2
log L( , , 2 ) = log pdf (y1 , . . . , yT )

T
= log pdf (y1 )

t=2
pdf (yt |yt1 ).
We then maximize this log likelihood function to obtain the MLEs. The asymptotic variance of the MLEs can be obtained from the CR lower bound.
4.3
Estimate (4.1) by the modied OLS
In general, the model (4.1) with general form of heteroscedasticity and autocorrelation can be consistently estimated by the OLS, i.e., OLS = (X X )1 X y . OLS is AN, i.e., T (OLS ) = N (0, 2 Q1 M Q1 ),
d
where Q = limT X X /T and M = p limT X X /T under the assumption E [uu ] = 2 . Q can be estimated by Q = X X /T ; 2 M can be consistently estimated by the Newey-West heteroscedasticity and autocorrelation consistent covariance (HAC) matrix estimator:
m
HAC = 0 +
j =1
w(j, m)(j + j ),
where 1 j = T
T
Xt ut utj Xtj , j = 1, . . . , m,
t=j +1
where w(j, m) is a lag window, m is a lag truncation parameter. If w(j, m) = 1 j/(m + 1), then w(j, m) is called the Barlett window. Hence, the asymptotic covariance matrix of OLS can be estimated by Q1 HAC Q1 .
4.4
Test for serial correlation

T T
The Durbin-Watson test DW =

2
(ut ut1 )2 /
1
u2 t,
where ut is the OLS residual, is used to test for the rst-order serial correlation (i.e., AR(1)). For H0 : = 0 vs. H1 : > 0, reject H0 if DW DW , do not reject if DW > DWu , and inconclusive if DW < DW < DWu with DW is the lower critical value and DWu is upper critical value. For H0 : = 0 vs. H1 : < 0, reject H0 if DW 4 DW , accept if DW < 4 DWu . 8
The Breusch-Godfrey LM test is used to test for high-order serial correlation (i.e., the AR(p) model as given by
p
ut =
i=1
i uti + t ).
p i=1 bi uti
The test is carried out by doing the OLS regression: ut = Xt +

d
+ error. Next,
we do either the LM test T R2 = 2 (q ) under the null hypothesis of no serial correlation or the F-test for b1 = b2 = = bp = 0.
Autoregressive Conditional Heteroscedasticity (ARCH)
Engle [1982] suggests that heteroscedasticity may occur in time-series context. In speculative markets such as exchange rates and stock market returns, one often observes the phenomenon called volatility clustering large changes tend to be followed by large changes, of either sign, and small changes tend to be followed by small changes. Observations of this phenomenon in nancial time series have led to the use of ARCH models. Engle [1982] formulates the notion that the recent past might give information about the conditional disturbance variance. He postulates the relation:
yt = Xt + t t , where
IID(0, 1),
2 2 t = 0 + 1 u2 t1 + + p utp , where ut = yt Xt .
Testing for ARCH can be done in the following steps: 1. Fit y to X by OLS and obtain the residuals {et }.
2 2 2. Compute the OLS regression, e2 t = 0 + 1 et1 + + p etp + error.
3. Test the joint signicance of 0 , 1 , . . . , p .
Exercises
1. Problems 6.1, 6.2, 6.3 and 6.7 (pp. 202-203, JD97).
2. Consider the model
yt = yt1 + ut , | | < 1, ut =
t
t1 ,
|| < 1,
where t = 1, . . . , T and
IID(0, 2 ).
(a) Show that the OLS estimator of is inconsistent when = 0. (b) Explain how you would test H0 : = 0 vs. H1 : = 0 using the Lagrangian multiplier test principle. Is such test likely to be superior to the Durbin-Watson test?. 3. Consider the model
yt = xt + ut , ut = ut1 + t , || < 1,
where xt is non-stochastic and
IID N (0, 2 ).
(a) Given a sample {xt , yt }T t=1 , construct the log-likelihood function. (b) Estimate the parameters , and 2 by the ML method. (c) Derive the asymptotic variances of these MLEs. (d) Given xT +1 , what do you think is the best predictor of yT +1 ?. 4. Consider the linear regression model y = X + u, where y is a T 1 vector of observations on the dependent variable, X is a T K matrix of observations on K non-stochastic explanatory variables with rank (X ) = K < T , is a K 1 vector of unknown coecients, and u is a T 1 vector of unobserved disturbances such that E [u] = 0 and E [uu ] = 2 , with positive denite and known. (a) Derive the GLS estimator GLS of . 10
(b) Assuming that T 1/2 X 1 u = N (0, 2 limT X 1 X /T ), derive the limiting distribution of GLS as T . State clearly any further assumptions you make and any statistical results you use. 5. In the generalized regression model, suppose that is known. (a) What is the covariance matrix of the OLS and GLS estimators of ? (b) What is the covariance matrix of the OLS residual vector uOLS = y X OLS ?. (c) What is the covariance matrix of the GLS residual uGLS = y X GLS ?. (d) What is the covariance matrix of the OLS and GLS residual vectors? 6. Suppose that y has the pdf pdf (yt |Xt ) = 1/( Xt ) exp{y/((Xt ) )}, y > 0. Then
1 K K 1
E [yt |Xt ] = Xt and V ar[yt |Xt ] = ( Xt ) . For this model, prove that the GLS and MLE are the same, even though this distribution involves the same parameters in the conditional mean function and the disturbance variance. 7. Suppose that the regression model is yt = + t , where has a zero mean, constant variance,
and equal correlation across observations. Then Cov ( t , s ) = 2 if t = s. Prove that the OLS estimator of is inconsistent. 8. Suppose that the regression model is yt = + t , where
E [ t |xt ] = 0, Cov (
t s |xt , xs )
= 0 for i = j, but V ar[ t |xt ] = 2 x2 t , xt > 0.
(a) Given a sample of observations on yt and xt , what is the most ecient estimator of ?, What is its variance?. (b) What is the OLS estimator of ?, and what is its variance?. (c) Prove that the estimator in part (a) is at least as ecient as the estimator in part (b).
11
References
Engle, R. F. [1982]. Autoregressive conditional heteroscedasticity with estimates of variance of united kingdom ination, Econometrica 50: 9871008. White, H. [1980]. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroscedasticity, Econometrica 48: 817838.
12

Autocorrelation

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Autocorrelation

Uploaded by

Copyright:

Available Formats

Generalized Least Squares (GLS) Theory, Heteroscedasticity & Autocorrelation

Ba Chu E-mail: ba chu@carleton.ca Web: http://www.carleton.ca/bchu

Consider the model:

where E [u] = 0 and E [u u] = 2 with is any matrix, not necessarily diagonal.

Problem with the OLS

Derivation of the GLS

Suppose has the eigenvalues 1 , . . . , T , by Choleskys decomposition, we obtain:

Setting y 0 = P y , X 0 = P X and u0 = P u, we end up with the classical linear regression model:

where E [u0 ] = 0 and E [u0 u0 ] = E [P uu P ] = 2 P P = 2 IT . The OLS estimate of is given by

Properties of the GLS estimator

If limT X 0 X 0 /T = Q0 > 0, then = as T . T ( ) = N (0, 2 Q0 ).

Since the matrix is unknown, we need to use an estimator, say, . We have

If the following conditions:

hold, then GLS and F GLS are asymptotically equivalent.

where ht is t-th diagonal element of the matrix X (X X )1 X .

The asymptotic variance-covariance matrix of OLS is given by

Avar(OLS ) = lim (X X /T )p lim( 2 OLS X X /T ) lim (X X /T ).

For example, we can model serial correlation by an AR(1) model, i.e.,

IID(0, 2 ) and || < 1. We can compute 2 , 1 2 |ts| 2 E [u t u s ] = . 1 2 E [u 2 t] =

More generally, we have 2 E [uu ] = 2 1 1 . . . 1 . . .

... 2 , = u . . ... . ... 1

2 and have the obvious meanings. where u

Estimate (4.1)-(4.2) by FGLS

Xt ut = 0, then OLS = and OLS = . Therefore, F GLS is consistent

and asymptotically normal.

Estimate (4.1)-(4.2) by the ML

An application of the conditional probability rules, we have

pdf (y1 , . . . , yT ) = pdf (y1 )

yt = Xt + ut = Xt + ut1 + N (0, 2 ), we have

yt |yt1 N (yt1 + Xt Xt1 , 2 ).

log L( , , 2 ) = log pdf (y1 , . . . , yT )

= log pdf (y1 )

pdf (yt |yt1 ).

Estimate (4.1) by the modied OLS

Test for serial correlation

The Durbin-Watson test DW =

The test is carried out by doing the OLS regression: ut = Xt +

Autoregressive Conditional Heteroscedasticity (ARCH)

3. Test the joint signicance of 0 , 1 , . . . , p .

2. Consider the model

where xt is non-stochastic and

= 0 for i = j, but V ar[ t |xt ] = 2 x2 t , xt > 0.

You might also like