
Carolina found the following site with an example of unit root tests

http://www.econ.uiuc.edu/~econ472/tutorial9.html
Copy the data below (the year, chic and egg columns, 1930 to 1983) and paste it into a blank Word document.
Create a subdirectory called data on your C drive (c:/data).
Use Save As in MS Word with file name eggs and file type Plain Text, an option that is not visible at first glance.
To make sure the file name was typed correctly, try opening the following file in Word:
c:/data/eggs.txt
In R the file is then read with a command such as
money=read.table("c:/data/eggs.txt",header=T)
year chic egg
1930 468491 3581
1931 449743 3532
1932 436815 3327
1933 444523 3255
1934 433937 3156
1935 389958 3081
1936 403446 3166
1937 423921 3443
1938 389624 3424
1939 418591 3561
1940 438288 3640
1941 422841 3840
1942 476935 4456
1943 542047 5000
1944 582197 5366
1945 516497 5154
1946 523227 5130
1947 467217 5077
1948 499644 5032
1949 430876 5148
1950 456549 5404
1951 430988 5322
1952 426555 5323
1953 398156 5307
1954 396776 5402
1955 390708 5407
1956 383690 5500
1957 391363 5442
1958 374281 5442
1959 387002 5542
1960 369484 5339
1961 366082 5358
1962 377392 5403
1963 375575 5345
1964 382262 5435
1965 394118 5474
1966 393019 5540
1967 428746 5836
1968 425158 5777
1969 422096 5629
1970 433280 5704
1971 421763 5806
1972 404191 5742
1973 408769 5502
1974 394101 5461
1975 379754 5382
1976 378361 5377
1977 386518 5408
1978 396933 5608
1979 400585 5777
1980 392110 5825
1981 384838 5625
1982 378609 5800
1983 364584 5656
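Before moving on, it can help to confirm that a three-column layout parses as expected. The following is a minimal sketch, assuming the saved file has a header row (year chic egg) followed by one row per year; the inline string sample_text is a hypothetical stand-in for the first rows of c:/data/eggs.txt.

```r
# Hypothetical stand-in for the first rows of c:/data/eggs.txt
sample_text <- "year chic egg
1930 468491 3581
1931 449743 3532
1932 436815 3327"

# read.table splits on whitespace by default; header = TRUE takes names from row 1
check <- read.table(text = sample_text, header = TRUE)
str(check)    # three integer columns: year, chic, egg
nrow(check)   # 3 rows in this excerpt; the full file should give 54
```

If the real file is read the same way and the row count is not 54, the columns were probably pasted one below the other rather than side by side.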

#now open R and use the following commands

Egd<-read.table("C:/data/eggs.txt", header=TRUE)
year<-Egd$year
egg<-Egd$egg
chic<-Egd$chic
library(stats)
#package ts is merged into stats
year<-ts(year) #make it a time series
chic<-ts(chic)
egg<-ts(egg)
Read Davidson and MacKinnon (DMCK), chapter 14, on random walks and unit roots.
A random walk (RWalk) process with drift $\gamma_1$ is
$y_t = \gamma_1 + y_{t-1} + \varepsilon_t$, $y_0 = 0$, $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2)$.
Using the lag operator $L$ (so that $L y_t = y_{t-1}$) this says
$(1 - L) y_t = \gamma_1 + \varepsilon_t$.
Clearly $(1 - L) = 0$ is a polynomial equation in $L$, and its solution is $L = 1$. Hence it is called the unit root.
Write $(1 - L) = \Delta$.
Take the first difference: $\Delta y_t = y_t - y_{t-1} = \gamma_1 + \varepsilon_t$.

DEFINITION of Integration of order d: If a nondeterministic time series has a stationary, invertible ARMA representation after differencing d times, it is said to be integrated of order d, denoted by:
$y_t \sim I(d)$ if $(1 - L)^d y_t$ is stationary, that is, I(0).
Since $\Delta y_t = \gamma_1 + \varepsilon_t$, we see that $y_t$ is I(1), or integrated of order 1.
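The claim that differencing a random walk with drift leaves only drift plus noise can be checked numerically. This is a small sketch on simulated data; the sample size, the drift value of 1, the normal errors and the seed are all assumptions made for illustration.

```r
set.seed(1)                  # arbitrary seed for reproducibility
n     <- 200                 # sample size (assumed)
drift <- 1                   # drift term (assumed value)
eps   <- rnorm(n)            # IID N(0, 1) errors

y  <- cumsum(drift + eps)    # y_t = drift + y_{t-1} + eps_t with y_0 = 0
dy <- diff(c(0, y))          # first difference, including y_1 - y_0

all.equal(dy, drift + eps)   # differencing recovers drift + noise
mean(dy)                     # sample mean of the differences, near the drift
```

The level y wanders without returning to a fixed mean, while dy fluctuates around the drift, which is what I(1) means in practice.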
SPURIOUS REGRESSION
If we regress a y series with a unit root on regressors that also have unit
roots, the usual t tests on the regression coefficients show statistically
significant relationships even when in reality there are none.
Experiment: regress $y_t$ = RWalk on $x_t$ = RWalk, with T = 20 to 20,000.
See fig 14.1 in DMCK page 611 (both the null and the alternative are false;
the problem is that the test rejects the false null too often).
The vertical axis has the rejection probability (the null is that the slope
coefficient equals 0), which should be 0.05. It is actually much larger,
hence we call the regression spurious. It is curious that including the
lagged dependent variable helps reduce the spuriousness, but the dotted
line still remains above 5%.
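The experiment described above can be imitated on a small scale. This is only a sketch, not the DMCK design: the helper spurious_reject_rate, the three sample sizes, the 500 replications and the seed are assumptions chosen to keep the run time short.

```r
set.seed(123)

# Share of replications in which the t test on the slope rejects at 5%
# when y and x are independent random walks (hypothetical helper)
spurious_reject_rate <- function(n_obs, n_rep = 500) {
  rejections <- replicate(n_rep, {
    y <- cumsum(rnorm(n_obs))   # random walk
    x <- cumsum(rnorm(n_obs))   # second random walk, independent of y
    p <- summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
    p < 0.05
  })
  mean(rejections)
}

rates <- sapply(c(20, 100, 500), spurious_reject_rate)
names(rates) <- c("T=20", "T=100", "T=500")
rates   # compare with the nominal 0.05
```

The rejection rates should come out far above 0.05 and should grow with T, which is the pattern the figure illustrates.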

Thus, unit root testing is motivated by the need to avoid spurious
regressions (regression of $y_t$ = RWalk on $x_t$ = RWalk).
Nelson and Plosser said many economic series are RWalks! How do we know?
We test.
Consider $y_t = \rho y_{t-1} + e_t$. This is a random walk if $\rho = 1$.
Subtract $y_{t-1}$ from both sides:
$y_t - y_{t-1} = \rho y_{t-1} - y_{t-1} + e_t$,
or $\Delta y_t = (\rho - 1) y_{t-1} + e_t$.

Simplest Dickey Fuller tests regress $\Delta y_t = (\rho - 1) y_{t-1} + e_t$.
If $y_t$ is I(1), i.e. a RWalk with a unit root, the regression coefficient of
lagged y on the right side should be zero. The test can use a t test with the
so-called $\tau$ statistic, or n times the OLS coefficient, called the z
statistic. The z statistic is not pivotal unless one includes the intercept
in the above regression. Hence typical Dickey Fuller tests regress
$\Delta y_t = \beta_0 + \beta_1 t + (\rho - 1) y_{t-1} + e_t$,
where the intercept is present and the time variable t is also present.
I repeat: the left side is the first difference and the right side has both
time and lagged y.
Unit root means that $\rho$, the coefficient of lagged y in the levels
equation, is 1. We write the coefficient as $(\rho - 1)$ instead of a
separate symbol such as $\beta_3$ to remind us that the null hypothesis is
$\rho = 1$, or unit root, under which $(\rho - 1) = 0$. The alternative
hypothesis is that the coefficient $(\rho - 1)$ is negative.
Why is this a big deal? The usual t distribution is not applicable, and
simulation is needed to construct a suitable sampling distribution of the
test statistic. Figures in DMCK p. 619 and 620 plot these densities. They
have more mass to the left of the usual t density. This means the test
statistic has to be more negative than the usual -1.64 to be significantly
negative.
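The simulation idea can be sketched in a few lines. The code below generates random walks (so the null is true), runs the Dickey Fuller regression with intercept and trend, and collects the t statistic on lagged y. The helper df_tau, the sample size of 100, the 2000 replications and the seed are assumptions for illustration; published tables rest on much larger experiments.

```r
set.seed(42)

# t statistic on y_{t-1} in: diff(y) ~ intercept + trend + y_{t-1}
df_tau <- function(n_obs) {
  y     <- cumsum(rnorm(n_obs))   # random walk, so the unit root null holds
  dy    <- diff(y)                # dy[j] = y[j + 1] - y[j]
  ylag  <- y[-n_obs]              # level one period before each difference
  trend <- seq_along(dy)
  fit   <- lm(dy ~ ylag + trend)
  summary(fit)$coefficients["ylag", "t value"]
}

tau <- replicate(2000, df_tau(100))
q05 <- unname(quantile(tau, 0.05))
q05            # simulated 5% critical value for the trend case
qnorm(0.05)    # the usual one-sided normal value, for comparison
```

The simulated 5% point should fall well to the left of the normal value and in the neighborhood of the tau3 value of -3.42 reported in the urca output further on.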
Now comes the usual problem in econometrics that the errors are
autocorrelated. To deal with it we augment the Dickey Fuller test by
inserting lagged differences on the right side of the above regression.
Hence typical Augmented Dickey Fuller tests regress
$\Delta y_t = \beta_0 + \beta_1 t + (\rho - 1) y_{t-1} + \delta_1 \Delta y_{t-1} + e_t$.
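The augmented regression can be assembled directly with base R before turning to packaged routines. This is a sketch under stated assumptions: adf_by_hand is a hypothetical helper (not one of the three programs discussed next), it requires at least one lagged difference, and it is run here on a simulated random walk rather than on the chicken data.

```r
# ADF regression with intercept, trend and n_lags >= 1 lagged differences
adf_by_hand <- function(y, n_lags = 1) {
  n    <- length(y)
  dy   <- diff(y)                    # dy[j] = y[j + 1] - y[j]
  E    <- embed(dy, n_lags + 1)      # col 1: current difference; other cols: its lags
  rows <- (n_lags + 1):(n - 1)       # positions in dy that survive the lagging
  dy_t    <- E[, 1]
  dy_lags <- E[, -1, drop = FALSE]
  y_lag   <- y[rows]                 # level one period before each kept difference
  trend   <- rows
  fit <- lm(dy_t ~ trend + y_lag + dy_lags)
  summary(fit)$coefficients["y_lag", ]   # estimate of (rho - 1), its s.e. and t value
}

set.seed(7)
y   <- cumsum(rnorm(60))             # simulated random walk (assumed data)
res <- adf_by_hand(y, n_lags = 1)
res
```

The entry labeled t value is the statistic to compare with Dickey Fuller critical values; the p-value printed next to it comes from the ordinary t distribution and should be ignored.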
We consider 3 R programs for the augmented Dickey Fuller test.
The first one actually writes an R function to do the test, but does not
have DF tables to come to a conclusion.
The second calls the R package tseries (less comprehensive).
The third calls the package urca and gives more comprehensive results.
All say there is a unit root in the chicken data:
Method 1: write the adf function
"adf" <- function(x, L = 2, int = TRUE, trend = TRUE)
{
#Construct data for the Augmented Dickey Fuller model with L lags.
#This is a modified version for R, in which the command rts was
#substituted by ts.
  x <- ts(x) #convert the data x into a time series
  D <- diff(x) #compute the first difference of data x
  if(L > 0) {
    for(i in 1:L)
      D <- ts.intersect(D, lag(diff(x), - i)) #append the i-th lagged difference
  }
  D <- ts.intersect(lag(x, -1), D) #bind the lagged level to D, dropping incomplete rows
  if(trend == TRUE)
    D <- ts.intersect(D, time(x)) #append the time trend
  y <- D[, 2] #dependent variable: the first difference
  x <- D[, -2] #regressors: lagged level, lagged differences, trend
  if(int == TRUE)
    o2=summary(lm(y ~ x))
  else o2=summary(lm(y ~ x - 1))
#if no intercept is wanted, force the regression through the origin using -1
  list(o1=cbind(y,x), o2=o2) #two outputs: the data used and the regression summary
}
#ADF for Chickens
#Model with 1 lag, constant and trend:
adf(chic, L=1, int=T, trend=T)
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)            8.360e+04  4.277e+04   1.955   0.0564 .
xD.lag(x, -1)         -1.821e-01  9.112e-02  -1.998   0.0514 .
xD.D.lag(diff(x), -i) -8.620e-02  1.435e-01  -0.601   0.5510
xtime(x)              -3.156e+02  2.670e+02  -1.182   0.2429
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 25030 on 48 degrees of freedom
Multiple R-Squared: 0.1067, Adjusted R-squared: 0.05085
F-statistic: 1.911 on 3 and 48 DF, p-value: 0.1404

We need to test each series separately. In the above table, look under the
t values and note the statistic of -1.998 on the lagged level,
xD.lag(x, -1). This is the test statistic of interest.
We can compare it with the Dickey Fuller tables for significance.
The table in Banerjee et al., page 103, covers three cases:
(i) Models with intercept and trend (int=T, trend=T)
(ii) Models with intercept but without trend (int=T, trend=F)
(iii) Models without intercept and without trend (int=F, trend=F)
Method 2 in R for doing unit root testing
library(tseries)
adf.test(chic, k=1) #this automatically looks up DF tables and gives p-values.
# It assumes trend and intercept are present, using k lags in the regression
The results are:
Augmented Dickey-Fuller Test
data: chic
Dickey-Fuller = -1.998, Lag order = 1, p-value = 0.5753
alternative hypothesis: stationary

The p-value is large, so we cannot reject the null of a unit root in favor
of the stationary alternative. There is a unit root in the chicken data.
#adf.test(variable name)$p.value extracts the p-value if one uses the tseries library
Method 3 in R for doing unit root testing
library(urca)
chic.df <- ur.df(y=chic, lags=1, type='trend')
summary(chic.df)
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.329e+04  4.260e+04   1.955   0.0564 .
z.lag.1     -1.821e-01  9.112e-02  -1.998   0.0514 .
tt          -3.156e+02  2.670e+02  -1.182   0.2429
z.diff.lag  -8.620e-02  1.435e-01  -0.601   0.5510
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 25030 on 48 degrees of freedom
Multiple R-Squared: 0.1067, Adjusted R-squared: 0.05085
F-statistic: 1.911 on 3 and 48 DF, p-value: 0.1404
Value of test-statistic is: -1.998
Critical values for test statistics:
        1%    5%   10%
tau3 -3.98 -3.42 -3.13

The observed statistic (-1.998) is less extreme than all the critical
values, so we cannot reject the null of a unit root.
Method 3 offers many advanced unit root tests.
Use help(package=urca) for details.
# Schmidt-Phillips Unit Root Test #
sp.chic <- ur.sp(chic, type="tau", pol.deg=1, signif=0.05)
summary(sp.chic) #chicken data
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 91360.7074 40033.5741   2.282   0.0268 *
y.lagged        0.7988     0.0853   9.364 1.40e-12 ***
trend.exp1   -310.3576   255.2688  -1.216   0.2298
Value of test-statistic is: -3.1274
Critical value for a significance level of 0.05 is: -3.06

Since the observed test statistic is more extreme than the critical value,
we reject the null of a unit root in the chicken data by this method. So the
results do not agree with the ADF tests above.

The KPSS test is a fashionable unit root test. It has the null of
stationarity and the alternative hypothesis of a unit root (unlike the tests
above). The urca package can call it as follows:
kpss.chic <- ur.kpss(chic, type="tau", lags="short")
summary(kpss.chic)
# KPSS Unit Root Test #
Test is of type: tau with 3 lags.
Value of test-statistic is: 0.0864
Critical value for a significance level of:
                  10%    5%  2.5%    1%
critical values 0.119 0.146 0.176 0.216

Here the test statistic is smaller than all the critical values, so we
cannot reject the null of stationarity, which speaks against a unit root in
the chicken data. Again this does not agree with the ADF test results.
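The three verdicts on the chicken data differ partly because the tests point in opposite directions. The sketch below just encodes the two decision rules and applies them to the statistics and 5% critical values quoted above; the helper names reject_unit_root and reject_stationarity are made up for illustration.

```r
# ADF and Schmidt-Phillips: null = unit root, reject when the statistic is
# more negative than the critical value
reject_unit_root <- function(stat, crit) stat < crit

# KPSS: null = stationarity, reject when the statistic exceeds the critical value
reject_stationarity <- function(stat, crit) stat > crit

results <- c(
  ADF_rejects_unit_root     = reject_unit_root(-1.998, -3.42),
  SP_rejects_unit_root      = reject_unit_root(-3.1274, -3.06),
  KPSS_rejects_stationarity = reject_stationarity(0.0864, 0.146)
)
print(results)
```

Only the Schmidt-Phillips comparison yields a rejection, so ADF keeps the unit root while the other two tests favor stationarity around a trend.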
data(npext) #Nelson Plosser data
attach(npext)
#variables: nomgnp (nominal GNP) interest indprod gnpperca
# realgnp wages realwag sp500
# unemploy velocity
# we know nominal GNP goes up, up and away, so it does have a unit root
kpss.nomgnp= ur.kpss(nomgnp, type="tau", lags="short")
summary(kpss.nomgnp)
#gives test stat = 0.3411 > crit values 0.119 0.146 0.176 0.216 for
#10% 5% 2.5% 1%. Thus nominal GNP does have unit root nonstationarity

II. Cointegration: Engle-Granger Test


DEFINITION of Co-integration: The components of a $k \times 1$ vector $y_t$ with $k \ge 2$ time
series are said to be co-integrated of order (d, b), denoted by $y_t \sim CI(d, b)$, if there exist $r \ge 1$
"co-integrating vectors" $\alpha_i$ ($\ne 0$) of dimension $k \times 1$ defining r linear combinations which
are integrated of order $d - b$, or
$z_{it} = \alpha_i' y_t \sim I(d - b)$, $b > 0$, $i = 1, 2, \ldots, r$. (1)
where the elements of $y_t$ are denoted by $y_{jt}$ with $j = 1, 2, \ldots, k$.
Clearly, some reduction (b>0) in the order of integration is necessary as a result of the
linear combination by the co-integrating vectors $\alpha_i$.
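A simulated CI(1,1) pair makes the definition concrete. In this sketch two series share one random walk trend, and the combination with vector (-2, 1) removes it; the coefficients, the sample size and the seed are assumptions chosen for illustration.

```r
set.seed(2024)
n  <- 500
w  <- cumsum(rnorm(n))    # common stochastic trend, I(1)
y1 <- w + rnorm(n)        # inherits the trend, so it is I(1)
y2 <- 2 * w + rnorm(n)    # inherits twice the trend, also I(1)

z <- y2 - 2 * y1          # combination (-2, 1): the trend w cancels exactly

c(var_y1 = var(y1), var_y2 = var(y2), var_z = var(z))
```

By construction z is the difference of two stationary noise terms with variance 1 + 4 = 5, so its sample variance stays near 5 however long the sample, whereas the variances of y1 and y2 keep growing with n.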

Here I recommend that you sketch the Engle-Granger test, explaining
the NULL and the ALTERNATIVE hypotheses.
Engle-Granger in R: The test can be done in 3 steps, as follows:
(i) Pre-test the variables for the presence of unit roots (done above)
and check whether they are integrated of the same order.
(ii) Regress the long run equilibrium model of chickens vs. eggs.
Engle<-lm(chic~egg)
summary(Engle)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 470461.481  36111.963  13.028   <2e-16 ***
egg            -10.219      7.133  -1.433    0.158
Residual standard error: 45950 on 52 degrees of freedom
Multiple R-Squared: 0.03798, Adjusted R-squared: 0.01948
F-statistic: 2.053 on 1 and 52 DF, p-value: 0.1579

Obtain the residuals.
residual<-resid(Engle)
Plot the residuals over time.
#residuals as an annual series starting in 1930
ts.plot(ts(residual, start=1930), gpars=list(main="Chickens vs. eggs: Is there cointegration?", xlab="year", ylab="residuals"))

Also plot the residuals versus lagged residuals. Draw your conclusions.
(iii) Test whether the residuals are I(0).
This is a residual-based version of the ADF test. The only difference
between the traditional ADF test and (this version of) the Engle-Granger
test is the critical values. The critical values to be used here are no
longer those provided by Dickey-Fuller, but those provided by
Engle and Yoo (1987) and others (see approximate critical values in
Table B.9, Hamilton 1994). This happens because the residuals above
are not the actual error terms, but estimated values from the long run
equilibrium equation of chickens against eggs.
adf.test(residual, k=1)
#Augmented Dickey-Fuller Test
#data: residual
#Dickey-Fuller = -2.0247, Lag order = 1, p-value = 0.5645
#alternative hypothesis: stationary

The p-value is large, so we cannot reject the null of a unit root in the
residuals. The reported p-value comes from the Dickey-Fuller tables, so it
is only indicative here; the residual-based critical values are more
negative still, which leaves the conclusion unchanged.
Hence the residuals are not stationary, implying that chickens and eggs
are not cointegrated.
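For contrast with the chicken and egg result, here is a sketch of steps (ii) and (iii) on simulated data that are cointegrated by construction. Everything in it is an assumption made for illustration: the data generating process, the seed, and the choice to run the residual regression without intercept or trend (OLS residuals already have mean zero), which differs from what adf.test does.

```r
set.seed(99)
n <- 300
x <- cumsum(rnorm(n))            # I(1) regressor
y <- 5 + 0.5 * x + rnorm(n)      # cointegrated with x by construction

# Step (ii): long run regression and its residuals
long_run <- lm(y ~ x)
u <- resid(long_run)

# Step (iii): DF-type regression on the residuals with one lagged difference
du     <- diff(u)                # du[j] = u[j + 1] - u[j]
E      <- embed(du, 2)           # col 1: current difference, col 2: previous one
du_t   <- E[, 1]
du_lag <- E[, 2]
u_lag  <- u[2:(n - 1)]           # residual one period before each kept difference
eg     <- lm(du_t ~ 0 + u_lag + du_lag)

eg_stat <- summary(eg)$coefficients["u_lag", "t value"]
eg_stat   # compare with residual-based (Engle-Yoo type) critical values
```

Here the statistic should be strongly negative, far beyond any tabulated critical value, whereas the chicken and egg residuals gave only -2.0247.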
#Copy from this point:
"johansen" <- function(x, L = 2)
{
#Johansen Test of cointegration for multivariate time series x
#Returns the vector of eigenvalues; after that you are on your own.
#This is a modified version for R, in which rts is substituted by ts.
  x <- ts(x)
  n <- nrow(x)
  p <- ncol(x)
  Ly <- lag(x[, 1], -1) #lagged level of the first series
  D <- diff(x[, 1]) #first difference of the first series
  for(i in 1:p) {
    if(i > 1) {
      D <- ts.intersect(D, diff(x[, i]))
      Ly <- ts.intersect(Ly, lag(x[, i], -1))
    }
    if(L > 0)
      for(j in 1:L)
        D <- ts.intersect(D, lag(diff(x[, i]), - j))
  }
  iys <- 1 + (L + 1) * (0:(p - 1)) #columns of D holding the current differences
  Y <- D[, iys]
  X <- D[, - iys] #the lagged differences
  Ly <- ts.intersect(Ly, D)[, 1:p]
  ZD <- lm(Y ~ X)$resid #differences purged of the lagged differences
  ZL <- lm(Ly ~ X)$resid #lagged levels purged of the lagged differences
  df <- nrow(X) - ncol(X) - 1
  S00 <- crossprod(ZD)/df
  S11 <- crossprod(ZL)/df
  S01 <- crossprod(ZD, ZL)/df
  M <- solve(S11) %*% t(S01) %*% solve(S00) %*% S01
  eigen(M)$values
}
#To this point.

Your job is to copy the code above and paste it into the R console. This
will create an R function called "johansen" that calculates the eigenvalues.
The command to obtain the eigenvalues is:
johansen(cbind(egg,chic), L=1)
[1] 0.16562116 0.05024913

The code above refers to the case including trend and intercept, and
the appropriate critical values should be used. Note that the
theoretical background is essential here, given that you need to
interpret the eigenvalues and calculate the test statistic yourself
before drawing your conclusions.
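Turning eigenvalues into test statistics takes one line each once the effective sample size is fixed. The sketch below uses the standard Johansen formulas: the maximum eigenvalue statistic is -T log(1 - lambda) for the relevant eigenvalue, and the trace statistic is the sum of these over the remaining smaller eigenvalues. The helper name johansen_stats and the value T = 52 (54 years less one difference and one lag) are assumptions.

```r
# Max-eigenvalue and trace statistics from eigenvalues (hypothetical helper)
johansen_stats <- function(lambda, n_obs) {
  lambda    <- sort(lambda, decreasing = TRUE)
  max_eigen <- -n_obs * log(1 - lambda)        # nulls r = 0, r <= 1, ...
  trace     <- rev(cumsum(rev(max_eigen)))     # sum over the remaining eigenvalues
  data.frame(null_rank = seq_along(lambda) - 1,
             max_eigen = max_eigen,
             trace     = trace)
}

# Eigenvalues returned by johansen(cbind(egg, chic), L = 1); T = 52 is assumed
johansen_stats(c(0.16562116, 0.05024913), n_obs = 52)

# Check against the urca eigenvalues for chic and egg reported further on
urca_check <- johansen_stats(c(0.2675294, 0.1013629), n_obs = 52)
round(urca_check$max_eigen, 2)   # about 16.19 and 5.56, as in that output
```

The statistics still have to be compared with critical values that match the deterministic terms in the model, which is the point the paragraph above makes.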
library(urca)
data(npext)
attach(npext)
#variables: nomgnp (nominal GNP) interest indprod gnpperca
# realgnp wages realwag sp500
# unemploy velocity
np4=cbind(interest, indprod, gnpperca, wages)
#older urca releases wrote this call with constant=TRUE and ctable="A3";
#current releases use ecdet="const" for a constant in the cointegration relation
np4.vecm <- ca.jo(np4, type="eigen", ecdet="const", K=2, spec="longrun", season=4)
summary(np4.vecm)

help(ca.jo) #Johansen in the urca package; be sure to cbind the variables
#before using the ca.jo function; use the trace statistic and L=1 and L=2 if possible
The results from the above ca.jo call in the urca package (not from the
hand-written johansen function) are:
# Johansen-Procedure #

Test type: maximal eigenvalue statistic (lambda max) , without linear trend and constant in cointegration
Eigenvalues (lambda):
[1] 2.582613e-01 2.258575e-01 1.168481e-01 5.847380e-02 -1.594994e-16
Values of teststatistic & critical values of test:
           test   10%    5%    1%
r <= 3 |   4.70  7.56  9.09 12.74   accept null at 5%
r <= 2 |   9.69 13.78 15.75 19.83   accept null at 5%
r <= 1 |  19.97 19.80 21.89 26.41   accept null at 5% (rejected at 10%)
r = 0  |  23.30 25.61 28.17 33.12   accept null at 5%

Eigenvectors, normalised to first column:
(These are the cointegration relations)
                [,1]      [,2]       [,3]       [,4]        [,5]
interest    1.000000   1.00000   1.000000  1.0000000    1.000000
indprod    86.322979  33.47564   4.810538 -3.0716694  -16.823964
gnpperca -187.141061 -41.59155  18.372687  0.5638804   27.596117
wages       3.097134 -12.64356 -13.143730 -0.1988773    6.686496
constant 1078.034097 295.77558 -58.970345  3.6892148 -208.595929

Weights W:
(This is the loading matrix)
              [,1]         [,2]          [,3]          [,4]          [,5]
[1,] -7.790924e-04 -0.012636958  0.0007507385 -0.0644947339  1.719589e-15
[2,] -1.876329e-03  0.011329379 -0.0095969956 -0.0005951474  9.891543e-16
[3,] -7.962321e-06  0.006601913 -0.0056842480 -0.0004599648  1.300587e-16
[4,] -6.189149e-04  0.008883549  0.0006473886 -0.0010482132 -6.601809e-16

This completes the macroeconomic example. Since we accept r<=1,
or say r=1, we can say that there is one significant cointegrating
vector. The evidence is weak, however: the null r = 0 is also not
rejected at 5% (23.30 < 28.17).
Now the chicken eggs example for Cointegration
chegg=cbind(chic,egg)
#older urca releases wrote this call with constant=TRUE and ctable="A3"
chegg.vecm <- ca.jo(chegg, type="eigen", ecdet="const", K=2, spec="longrun", season=4)
summary(chegg.vecm)

# change type="eigen" to type="trace" to get the trace statistic


Eigenvalues (lambda):
[1] 2.675294e-01 1.013629e-01 5.295009e-17

Values of teststatistic & critical values of test:
           test   10%    5%    1%
r <= 1 |   5.56  7.56  9.09 12.74   accept at 5%
r = 0  |  16.19 13.78 15.75 19.83   reject at 5% (16.19 > 15.75), accept at 1%

Eigenvectors, normalised to first column:
(These are the cointegration relations)
                  [,1]          [,2]          [,3]
chic           1.00000       1.00000       1.00000
egg            4.85545      88.10556     -47.42672
constant -393810.49290 -896077.46799 -226055.04523

Weights W:
(This is the loading matrix)
              [,1]         [,2]          [,3]
[1,] -1.638589e-01 -0.046540914 -2.984694e-16
[2,] -2.771808e-05 -0.000542545  1.321877e-18

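We conclude that there is no cointegration between chickens and eggs
by the Johansen maximum eigenvalue test at the 1% level (16.19 < 19.83).
At the 5% level the null r = 0 is narrowly rejected (16.19 > 15.75), so
the evidence is borderline (see near DMCK p. 640).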
