Professor Roy Batchelor, Cass Business School, City of London, and ESCP-EAP, Paris
Box-Jenkins analysis 1
Plan of session
Box-Jenkins (ARIMA) models: statistically sophisticated methods of extrapolating time series. They require:
- a long run of time-series data
- technical expertise on the part of the forecaster
What does ARIMA mean? Identifying ARIMA models.
Box-Jenkins procedure
[Flowchart: the iterative Box-Jenkins cycle — identify a candidate model, estimate it, check diagnostics, and forecast]
[Figure: time-series plot of the STOCKS index]
Is it Stationary?
Regression with nonstationary variables ⇒ spurious correlation.
The random walk y_t = y_{t-1} + u_t, u_t ~ N(0, σ²), is not stationary, since its variance increases linearly with time t.
But its first difference Δy_t = y_t − y_{t-1} = u_t is stationary, so y is integrated of order 1, or y ~ I(1).
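The variance claim can be checked numerically. A minimal sketch (illustrative simulation, not the slides' STOCKS data): simulate many independent random-walk paths and compare the cross-path variance of the level at two dates with the variance of the first difference.

```python
import numpy as np

# Simulate 2000 independent random-walk paths y_t = y_{t-1} + u_t, u_t ~ N(0, 1).
rng = np.random.default_rng(0)
n_paths, T = 2000, 1000
u = rng.normal(size=(n_paths, T))
y = np.cumsum(u, axis=1)           # each row is one random-walk path

# Var(y_t) = t * sigma^2: the cross-path variance grows roughly tenfold
# between t = 100 and t = 1000 ...
v100 = y[:, 99].var()
v1000 = y[:, 999].var()

# ... while the first difference is just the shock u_t, with constant variance.
dy = np.diff(y, axis=1)
print(round(v1000 / v100, 1), round(dy.var(), 2))
```

The ratio of variances comes out near 10, matching Var(y_t) = t·σ², while the differenced series has variance near σ² = 1 at every date.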
Roy Batchelor 2004
Correlogram of STOCKS

[Figure: Autocorrelation Function for STOCKS — autocorrelations at lags 1–12 start at 0.99 and decline only slowly (0.99, 0.98, 0.97, 0.96, 0.94, 0.92, 0.89, …)]

Autocorrelations high and declining slowly ⇒ nonstationary
Augmented Dickey-Fuller Test Equation
Dependent Variable: D(STOCKS)
Method: Least Squares
Sample (adjusted): 1/08/1999–10/25/2002
Included observations: 199 after adjusting endpoints

Variable      Coefficient   Std. Error   t-Statistic   Prob.
STOCKS(-1)    -0.006871     0.008015     -0.857277     0.3923
C              0.023932     0.032796      0.729719     0.4664

R-squared            0.003717    Mean dependent var      0.006884
Adjusted R-squared  -0.001341    S.D. dependent var      0.367653
S.E. of regression   0.367900    Akaike info criterion   0.847987
Sum squared resid   26.66400     Schwarz criterion       0.881085
Log likelihood     -82.37469     F-statistic             0.734925
Durbin-Watson stat   0.961566    Prob(F-statistic)       0.392333

The t-statistic on STOCKS(−1) is only −0.86, well above the 5% Dickey-Fuller critical value of roughly −2.87, so the unit root cannot be rejected: STOCKS is nonstationary in levels.
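The regression behind this output can be sketched directly. A minimal illustration on simulated data (no augmentation lags, not the slides' STOCKS series): regress Δy_t on a constant and y_{t-1}, and compare the t-statistic on y_{t-1} with the Dickey-Fuller 5% critical value of about −2.87 rather than the usual normal bounds.

```python
import numpy as np

def df_tstat(y):
    """t-statistic on y_{t-1} in the Dickey-Fuller regression
    dy_t = c + gamma * y_{t-1} + u_t (no augmentation lags)."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
walk = np.cumsum(rng.normal(size=500))   # unit root: t-stat near zero
noise = rng.normal(size=500)             # stationary: large negative t-stat
print(round(df_tstat(walk), 2), round(df_tstat(noise), 2))
```

For the random walk the statistic sits close to zero (unit root not rejected); for white noise it is strongly negative (unit root rejected).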
ACF of DSTOCKS

[Figure: Autocorrelation Function for DSTOCKS — lag-1 autocorrelation 0.52 (t = 7.30); autocorrelations at later lags are small]
Augmented Dickey-Fuller Test Equation
Dependent Variable: D(STOCKS,2)
Method: Least Squares
Sample (adjusted): 1/15/1999–10/25/2002
Included observations: 198 after adjusting endpoints

Variable         Coefficient   Std. Error   t-Statistic   Prob.
D(STOCKS(-1))    -0.482289     0.061112     -7.891929     0.0000
C                 0.003314     0.022471      0.147497     0.8829

R-squared            0.241141    Mean dependent var      5.05E-05
Adjusted R-squared   0.237269    S.D. dependent var      0.361997
S.E. of regression   0.316148    Akaike info criterion   0.544840
Sum squared resid   19.59017     Schwarz criterion       0.578055
Log likelihood     -51.93915     F-statistic            62.28254
Durbin-Watson stat   1.506855    Prob(F-statistic)       0.000000

Here the t-statistic of −7.89 is far below the critical value, so the unit root in the first difference is rejected: STOCKS ~ I(1), and one regular difference is enough.
ARIMA models
Box and Jenkins show that a wide variety of dynamics can be captured by the class of AutoRegressive Integrated Moving Average (ARIMA) models.
Autoregressive Models
In an autoregressive model, the value of y depends linearly on its own past values:
y_t = b0 + b1 y_{t-1} + b2 y_{t-2} + … + bp y_{t-p} + u_t
If y follows an autoregressive process of order p we write y ~ AR(p), or y ~ ARIMA(p,0,0).
The coefficients b0, b1, …, bp can be estimated by ordinary least squares regression.
AR models have persistent dynamics …
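The OLS estimation step can be sketched as follows. This is a minimal illustration on simulated data; the true coefficients b0 = 1.0, b1 = 0.5, b2 = 0.3 are assumed for the example, not taken from the slides.

```python
import numpy as np

# Simulate an AR(2) process y_t = b0 + b1*y_{t-1} + b2*y_{t-2} + u_t.
rng = np.random.default_rng(2)
n, b0, b1, b2 = 5000, 1.0, 0.5, 0.3
y = np.zeros(n)
for t in range(2, n):
    y[t] = b0 + b1 * y[t - 1] + b2 * y[t - 2] + rng.normal()

# Stack a constant and the two lagged values as regressors, then solve
# the ordinary least squares problem.
X = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])
target = y[2:]
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
print(np.round(coef, 2))  # estimates should be near (1.0, 0.5, 0.3)
```

With a long enough sample the OLS estimates recover the assumed coefficients closely, which is why identification focuses on choosing p rather than on the estimation itself.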
Dynamics of AR models
Consider the model y_t = b1 y_{t-1} + b2 y_{t-2} + u_t. Its homogeneous part can be written (1 − b1 L − b2 L²) y_t = 0, where L is the lag operator. The dynamics depend on the discriminant b1² + 4 b2: if it is positive the characteristic roots are real and the response to a shock decays smoothly; if it is negative the roots are complex and the response cycles as it dies away.
[Figure: two simulated AR(2) responses — one decaying smoothly to zero, one cycling as it dies away]
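The two regimes can be illustrated by tracing the response to a unit shock. This is a sketch with assumed coefficient values chosen to put the discriminant on either side of zero.

```python
import numpy as np

# Impulse responses of two AR(2) models y_t = b1*y_{t-1} + b2*y_{t-2}.
# The discriminant b1^2 + 4*b2 decides the shape: positive -> real roots
# and monotonic decay; negative -> complex roots and damped cycles.
def impulse_response(b1, b2, horizon=50):
    y = np.zeros(horizon)
    y[0] = 1.0                       # unit shock at t = 0
    y[1] = b1 * y[0]
    for t in range(2, horizon):
        y[t] = b1 * y[t - 1] + b2 * y[t - 2]
    return y

smooth = impulse_response(1.2, -0.35)  # 1.2^2 + 4*(-0.35) = 0.04 > 0: decay
cycles = impulse_response(1.2, -0.85)  # 1.2^2 + 4*(-0.85) = -1.96 < 0: cycles
print((smooth > 0).all(), (cycles < 0).any())
```

The first response stays positive and shrinks toward zero; the second overshoots below zero repeatedly before dying out, matching the two panels sketched on the slide.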
Two tools are used to identify ARIMA models:
- the sample autocorrelation function (ACF, correlogram)
- the sample partial autocorrelation function (PACF)
The ACF plots the correlations between y_t and y_{t-k} against the lag k = 1, 2, 3, …: it identifies possible MA terms.
The PACF plots the coefficient on y_{t-k} in a regression of y_t on y_{t-1}, y_{t-2}, …, y_{t-k} against k = 1, 2, 3, …: it identifies possible AR terms.
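Both tools can be computed directly from these definitions. A minimal sketch (the `acf` and `pacf` function names are illustrative, not a library API): the ACF from sample autocovariances, and the PACF as the last coefficient of successively longer autoregressions.

```python
import numpy as np

def acf(y, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    y = y - y.mean()
    c0 = y @ y / len(y)
    return np.array([(y[k:] @ y[:-k]) / len(y) / c0
                     for k in range(1, max_lag + 1)])

def pacf(y, max_lag):
    """Sample partial autocorrelations: for each k, regress y_t on
    y_{t-1}, ..., y_{t-k} and keep the coefficient on lag k."""
    out = []
    for k in range(1, max_lag + 1):
        X = np.column_stack([y[k - j: len(y) - j] for j in range(1, k + 1)])
        X = np.column_stack([np.ones(len(X)), X])
        beta, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        out.append(beta[-1])
    return np.array(out)

rng = np.random.default_rng(3)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.7 * y[t - 1] + rng.normal()   # AR(1): ACF decays, PACF cuts off
print(np.round(acf(y, 3), 2), np.round(pacf(y, 3), 2))
```

For this simulated AR(1), the ACF declines geometrically while the PACF shows a single spike near 0.7 at lag 1 and is negligible afterwards, which is exactly the cutoff pattern used to identify AR orders.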
Box-Jenkins analysis 18 Roy Batchelor 2004
ACF of DSTOCKS

[Figure: Autocorrelation Function for DSTOCKS, repeated for model identification — lag-1 autocorrelation 0.52 (t = 7.30), later autocorrelations small]
PACF of DSTOCKS

[Figure: Partial Autocorrelation Function for DSTOCKS]

PACF cutoff after 4 lags ⇒ AR(4)?
Differencing: 1 regular difference
Residuals: SS = 7.50695 (backforecasts excluded), MS = 0.03870, DF = 194

Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag 12: Chi-Square = 3.6, DF = 7, P-Value = 0.829

The high p-value shows no significant residual autocorrelation, so the fitted model passes this diagnostic.
Case: CASHFLOW

ADF Test Statistics
  Level:      −0.3355
  Difference: −3.2556

The level fails the ADF test but the first difference passes, so CASHFLOW is treated as I(1).

[Figure: time-series plot of CASHFLOW]
PACF cycles gradually dying away
[Fragment of estimation output — only the statistics labels survive extraction; Inverted MA Root: −.99, i.e. an MA root almost on the unit circle]
Practical tips
In many practical applications it is very difficult to tell whether data come from an AR(p) or an MA(p) model:
- choose the best-fitting model
- forecasts will differ a little in the short term, but converge
Do NOT build models with:
- large numbers of MA terms
- large numbers of AR and MA terms together
You may well see very (suspiciously) high t-statistics. This happens because of high correlation (collinearity) among regressors, not because the model is good.
Diagnostic statistics
Random residuals — the Box-Pierce Q-statistic:
Q(s) = n · Σ_{k=1..s} r(k)² ~ χ²(s)
where r(k) is the k-th residual autocorrelation and the summation runs over the first s autocorrelations.
Fit versus parsimony — the Schwarz Bayesian Criterion (SBC):
SBC = ln(RSS/n) + (p+d+q) · ln(n)/n
where RSS is the residual sum of squares, n the sample size, and (p+d+q) the number of parameters.
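Both diagnostics are easy to compute from first principles. A sketch with simulated residuals (the `box_pierce_q` and `sbc` names are illustrative); the SBC call plugs in the RSS and sample size from the STOCKS regression shown earlier.

```python
import numpy as np

def box_pierce_q(resid, s):
    """Box-Pierce Q = n * sum of squared residual autocorrelations
    over the first s lags."""
    n = len(resid)
    e = resid - resid.mean()
    c0 = e @ e / n
    r = np.array([(e[k:] @ e[:-k]) / n / c0 for k in range(1, s + 1)])
    return n * (r ** 2).sum()   # compare with a chi-square(s) critical value

def sbc(rss, n, p, d, q):
    """Schwarz Bayesian Criterion for an ARIMA(p,d,q) fit."""
    return np.log(rss / n) + (p + d + q) * np.log(n) / n

rng = np.random.default_rng(4)
white = rng.normal(size=300)         # ideal residuals: pure white noise
qstat = box_pierce_q(white, 12)      # should be small relative to chi2(12)
print(round(qstat, 1), round(sbc(rss=26.66, n=199, p=0, d=1, q=1), 3))
```

For white-noise residuals Q stays well below the χ²(12) 5% critical value of about 21, so randomness is not rejected; correlated residuals would push Q far above it, as in the CASHFLOW example below.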
Diagnostics OK?
Differencing: 1 regular difference
Residuals: SS = 15.1505 (backforecasts excluded), MS = 0.0773, DF = 196

Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag 12: Chi-Square = 154.1, DF = 9, P-Value = 0.000

Not acceptable, since there is residual correlation, as shown by the high Box-Pierce statistic.
STOCKS

[Figure: STOCKS series with forecasts over 200 periods — note persistent dynamics (beyond 4 periods)]
[Figure: CASHFLOW and its forecast CASHFLOWF, weekly observations through late 2002]
[Figure: LCARBS, quarterly, 1988–2000]
Sample (adjusted): 1989:2–1999:4
Included observations: 43 after adjusting endpoints
Convergence achieved after 11 iterations
Backcast: 1989:1

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          7.337168      0.248463     29.53027      0.0000
AR(1)      0.554554      0.291564      1.901997     0.0646
SAR(4)     0.892048      0.054019     16.51373      0.0000
MA(1)     -0.128166      0.347313     -0.369020     0.7141

R-squared            0.859162    Mean dependent var      7.036425
Adjusted R-squared   0.848329    S.D. dependent var      0.176438
S.E. of regression   0.068714    Akaike info criterion  -2.429322
Sum squared resid    0.184142    Schwarz criterion      -2.265490
Log likelihood      56.23043     F-statistic            79.30478
Durbin-Watson stat   1.939906    Prob(F-statistic)       0.000000

Inverted AR Roots: .55, −.00+.97i, −.00−.97i, −.97   Inverted MA Roots: .13
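A multiplicative seasonal autoregression like the one estimated here can be simulated to see where the seasonal persistence comes from. This is a sketch using the rounded coefficients from the output above (φ = 0.55 for AR(1), Φ = 0.89 for SAR(4)) and dropping the insignificant MA(1) term.

```python
import numpy as np

# Simulate (1 - phi*L)(1 - Phi*L^4)(y_t - mu) = u_t.
# Expanding the product gives lags 1, 4 and 5:
#   x_t = phi*x_{t-1} + Phi*x_{t-4} - phi*Phi*x_{t-5} + u_t,  x = y - mu
rng = np.random.default_rng(5)
phi, Phi, mu, n = 0.55, 0.89, 7.0, 400
x = np.zeros(n)
for t in range(5, n):
    x[t] = (phi * x[t - 1] + Phi * x[t - 4]
            - phi * Phi * x[t - 5] + rng.normal(scale=0.07))
y = mu + x

# Seasonal persistence shows up as a large autocorrelation at lag 4.
e = y - y.mean()
r4 = (e[4:] @ e[:-4]) / (e @ e)
print(round(r4, 2))
```

The near-unit seasonal root (Φ = 0.89, inverted roots with modulus about 0.97) produces a strong lag-4 autocorrelation, which is the quarterly seasonality the SAR(4) term is capturing.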
[Figure: LCARBS and its forecast LCARBSF]