
Box-Jenkins Analysis

Professor Roy Batchelor, Cass Business School, City of London, and ESCP-EAP, Paris


Plan of session
Box-Jenkins (ARIMA) models:
- statistically sophisticated methods of extrapolating time series
- require a large run of time series data
- require technical expertise on the part of the forecaster
What does ARIMA mean?
Identifying ARIMA models


Box-Jenkins procedure

Plot Series → Is it Stationary?
- No: Difference (Integrate) Series, then re-test
- Yes: Identify Possible Model → Diagnostics OK?
  - No: return to identification
  - Yes: Make Forecasts


Case: STOCKS relative to orders

Plot Series

[Time series plot of STOCKS, index 1–200]

Box-Jenkins procedure

Plot Series → Is it Stationary?


Stationarity and integration


A stationary series has:
- constant mean
- constant variance
- constant autocorrelation structure

Regression with nonstationary variables can produce spurious correlation.

The random walk y_t = y_{t-1} + u_t, with u_t ~ N(0, σ²), is not stationary, since its variance increases linearly with time t. But its first difference Δy_t = y_t - y_{t-1} = u_t is stationary, so y is integrated of order 1, written y ~ I(1). A simulation sketch of this point follows.
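A minimal Python sketch of this point (illustrative only; the variable names are invented here, not part of the slides):

    # Simulate a random walk and check that differencing stabilises its variance.
    import numpy as np

    rng = np.random.default_rng(0)
    u = rng.normal(0.0, 1.0, size=1000)   # u_t ~ N(0, sigma^2)
    y = np.cumsum(u)                       # random walk: y_t = y_{t-1} + u_t
    dy = np.diff(y)                        # first difference: dy_t = u_t

    # Variance of the level grows over time; variance of the difference does not.
    print("var(y)  first vs second half:", y[:500].var(), y[500:].var())
    print("var(dy) first vs second half:", dy[:500].var(), dy[500:].var())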

Testing for non-stationarity


Autocorrelation function (Box-Jenkins approach): if the autocorrelations start high and decline slowly, the series is nonstationary and should be differenced.

Dickey-Fuller test: y_t = a + b y_{t-1} + u_t would be a nonstationary random walk if b = 1. So to find out whether y has a unit root we regress Δy_t = a + c y_{t-1} + u_t, where c = b - 1, and test the hypothesis that c = 0 against c < 0 (like a t-test).
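A sketch of how this test can be run with the statsmodels library in Python, assuming `stocks` is a pandas Series holding the data (an assumed name, not supplied in the slides):

    # Augmented Dickey-Fuller test with an intercept ("c" = constant).
    from statsmodels.tsa.stattools import adfuller

    stat, pvalue, usedlag, nobs, crit, icbest = adfuller(stocks, regression="c")
    print(f"ADF statistic: {stat:.3f}  p-value: {pvalue:.3f}")
    print("Critical values:", crit)   # reject the unit root only if stat < critical value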

Correlogram of STOCKS
Autocorrelation Function for STOCKS

Lag   Corr    T       LBQ
 1    0.99   14.04    200.23
 2    0.98    8.04    396.06
 3    0.97    6.18    587.92
 4    0.96    5.21    777.04
 5    0.94    4.56    961.90
 6    0.92    4.05   1139.22
 7    0.89    3.64   1306.77
 8    0.87    3.32   1465.22
 9    0.84    3.06   1615.94
10    0.82    2.84   1758.89
11    0.79    2.64   1892.44
12    0.76    2.44   2015.49

Autocorrelations high and declining slowly = nonstationary.

Dickey-Fuller test on STOCKS


ADF Test Statistic: -0.857277
1% Critical Value*: -3.4645
5% Critical Value: -2.8761
10% Critical Value: -2.5744
*MacKinnon critical values for rejection of hypothesis of a unit root.

Augmented Dickey-Fuller Test Equation
Dependent Variable: D(STOCKS)
Method: Least Squares
Sample(adjusted): 1/08/1999 10/25/2002
Included observations: 199 after adjusting endpoints

Variable      Coefficient   Std. Error   t-Statistic   Prob.
STOCKS(-1)    -0.006871     0.008015     -0.857277     0.3923
C              0.023932     0.032796      0.729719     0.4664

R-squared 0.003717   Adjusted R-squared -0.001341   S.E. of regression 0.367900
Sum squared resid 26.66400   Log likelihood -82.37469   Durbin-Watson stat 0.961566
Mean dependent var 0.006884   S.D. dependent var 0.367653
Akaike info criterion 0.847987   Schwarz criterion 0.881085
F-statistic 0.734925   Prob(F-statistic) 0.392333

The ADF statistic (-0.86) is well above the 5% critical value (-2.88), so the unit root cannot be rejected: STOCKS is nonstationary and should be differenced.


Box-Jenkins procedure

Plot Series → Is it Stationary? → No: Difference (Integrate) Series


ACF of DSTOCKS (first difference of STOCKS)
Autocorrelation Function for DSTOCKS

Lag   Corr    T      LBQ
 1    0.52    7.30    54.13
 2   -0.08   -0.92    55.44
 3   -0.24   -2.77    67.64
 4    0.30    3.24    85.71
 5    0.66    6.88   176.09
 6    0.48    4.14   224.55
 7   -0.08   -0.60   225.74
 8   -0.25   -1.94   238.46
 9    0.04    0.34   238.86
10    0.40    3.13   273.55
11    0.33    2.41   296.15
12   -0.03   -0.22   296.36

Cycles gradually dying away.


Dickey-Fuller test on differenced STOCKS

ADF Test Statistic: -7.891929
1% Critical Value*: -3.4646
5% Critical Value: -2.8761
10% Critical Value: -2.5745
*MacKinnon critical values for rejection of hypothesis of a unit root.

Augmented Dickey-Fuller Test Equation
Dependent Variable: D(STOCKS,2)
Method: Least Squares
Sample(adjusted): 1/15/1999 10/25/2002
Included observations: 198 after adjusting endpoints

Variable         Coefficient   Std. Error   t-Statistic   Prob.
D(STOCKS(-1))    -0.482289     0.061112     -7.891929     0.0000
C                 0.003314     0.022471      0.147497     0.8829

R-squared 0.241141   Adjusted R-squared 0.237269   S.E. of regression 0.316148
Sum squared resid 19.59017   Log likelihood -51.93915   Durbin-Watson stat 1.506855
Mean dependent var 5.05E-05   S.D. dependent var 0.361997
Akaike info criterion 0.544840   Schwarz criterion 0.578055
F-statistic 62.28254   Prob(F-statistic) 0.000000

The ADF statistic (-7.89) is far below all the critical values, so the unit root is rejected: the first difference of STOCKS is stationary.


Box-Jenkins procedure

Plot Series → Is it Stationary?
- No: Difference (Integrate) Series
- Yes: Identify Possible Model


ARIMA models
Box and Jenkins show that a wide variety of dynamics can be captured by the class of AutoRegressive Integrated Moving Average (ARIMA) models.


Autoregressive Models
In an autoregressive model, the value of y depends linearly on its own past values:
y_t = b_0 + b_1 y_{t-1} + b_2 y_{t-2} + … + b_p y_{t-p} + u_t
If y follows an autoregressive process of order p we write y ~ AR(p), or y ~ ARIMA(p,0,0).
The coefficients b_0, b_1, …, b_p can be estimated by ordinary least squares regression, as sketched below.
AR models have persistent dynamics.
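A sketch of this OLS estimation in Python, assuming `y` is a one-dimensional numpy array holding the (stationary) series; the AR order of 2 is just an example:

    # Estimate an AR(2) by regressing y_t on its own lags.
    import numpy as np
    import statsmodels.api as sm

    p = 2
    Y = y[p:]                                                     # y_t
    X = np.column_stack([y[p - k:-k] for k in range(1, p + 1)])   # y_{t-1}, ..., y_{t-p}
    X = sm.add_constant(X)                                        # intercept b_0
    ar_fit = sm.OLS(Y, X).fit()
    print(ar_fit.params)                                          # b_0, b_1, ..., b_p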

Dynamics of AR models
Consider the model y_t = b_1 y_{t-1} + b_2 y_{t-2}, which can be written (1 - b_1 L - b_2 L²) y_t = 0, where L is the lag operator. The dynamics depend on the discriminant b_1² + 4 b_2.
[Two simulated AR(2) paths over 51 periods: with b_1 = 0.70, b_2 = 0.35 (discriminant b_1² + 4 b_2 > 0) the path evolves smoothly without cycles; with b_1 = 1.60, b_2 = -0.90 (discriminant < 0) the path follows damped cycles. A simulation sketch follows.]
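A small Python sketch reproducing these two kinds of path from the slide's parameter values (deterministic part only; the starting values are an assumption):

    # Trace the deterministic AR(2) recursion y_t = b1*y_{t-1} + b2*y_{t-2}.
    def ar2_path(b1, b2, n=51):
        y = [0.0, 1.0]                      # assumed y_{-1}, y_0
        for _ in range(n):
            y.append(b1 * y[-1] + b2 * y[-2])
        return y[2:]

    smooth = ar2_path(0.70, 0.35)           # discriminant > 0: no cycles
    cycles = ar2_path(1.60, -0.90)          # discriminant < 0: damped oscillations
    print(smooth[:5])
    print(cycles[:5])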


Moving Average Models


In moving average models, the current value of y depends linearly on past shocks:
y_t = c_0 + u_t + c_1 u_{t-1} + c_2 u_{t-2} + … + c_q u_{t-q}
If y follows a moving average process of order q we write y ~ MA(q), or y ~ ARIMA(0,0,q).
The parameters c_0, c_1, c_2, …, c_q cannot be estimated by a simple regression; they require a dedicated ARIMA modelling program (see the sketch below).
This is a short-memory process, since a shock only affects y for q periods.
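A sketch of such a dedicated routine in Python using statsmodels; `y` is an assumed name for the stationary series of interest:

    # Fit an MA(2): ARIMA(0,0,q) is an MA(q) with a constant.
    from statsmodels.tsa.arima.model import ARIMA

    ma_fit = ARIMA(y, order=(0, 0, 2)).fit()
    print(ma_fit.params)   # constant c_0, MA coefficients c_1, c_2, and sigma^2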

Identifying possible model(s)


To find provisional orders for the AR and MA parts, Box and Jenkins suggest examining:
- the sample autocorrelation function (ACF, or correlogram)
- the sample partial autocorrelation function (PACF)
The ACF plots the correlations between y_t and y_{t-k} against the lag k = 1, 2, 3, …: it identifies possible MA terms.
The PACF plots the coefficient on y_{t-k} in a regression of y_t on y_{t-1}, y_{t-2}, …, y_{t-k}, against k = 1, 2, 3, …: it identifies possible AR terms.
Both are sketched in code below.
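A plotting sketch in Python with statsmodels; `dstocks` is an assumed name for the first difference of STOCKS:

    # Sample ACF and PACF of the differenced series, up to 12 lags.
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    fig, axes = plt.subplots(2, 1, figsize=(8, 6))
    plot_acf(dstocks, lags=12, ax=axes[0])    # cut-off / decay pattern -> MA order
    plot_pacf(dstocks, lags=12, ax=axes[1])   # cut-off / decay pattern -> AR order
    plt.show()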

Provisional ARIMA models


If the data are AR(p):
- the ACF declines steadily, or follows a damped cycle
- the PACF cuts off suddenly after p lags
If the data are MA(q):
- the ACF cuts off suddenly after q lags
- the PACF declines steadily, or follows a damped cycle
Mixed models have more complex patterns.

ACF of DSTOCKS (repeated)
Autocorrelation Function for DSTOCKS: as shown earlier, the autocorrelations follow cycles that gradually die away.


PACF of DSTOCKS
Partial Autocorrelation Function for DSTOCKS

Lag   PAC     T
 1    0.52    7.30
 2   -0.48   -6.71
 3    0.11    1.58
 4    0.70    9.93
 5    0.00    0.03
 6   -0.05   -0.66
 7   -0.01   -0.18
 8   -0.01   -0.21
 9   -0.10   -1.41
10   -0.08   -1.07
11    0.02    0.28
12   -0.00   -0.06

PACF cutoff after 4 lags = AR(4)?


ARIMA (4,1,0) for STOCKS


Final Estimates of Parameters

Type      Coef      SE Coef   T       P
AR 1      0.7370    0.0508    14.50   0.000
AR 2     -0.1621    0.0654    -2.48   0.014
AR 3     -0.4675    0.0654    -7.15   0.000
AR 4      0.7080    0.0508    13.93   0.000
Constant  0.00040   0.01395    0.03   0.977

Differencing: 1 regular difference
Residuals: SS = 7.50695 (backforecasts excluded), MS = 0.03870, DF = 194
Modified Box-Pierce (Ljung-Box) Chi-Square statistic: Lag 12, Chi-Square 3.6, DF 7, P-Value 0.829
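For comparison, a sketch of the same ARIMA(4,1,0) specification in Python with statsmodels; `stocks` is an assumed series name, and the estimates will not match the Minitab output exactly because the estimation method differs:

    # Fit ARIMA(4,1,0) to the undifferenced STOCKS series.
    from statsmodels.tsa.arima.model import ARIMA

    fit_410 = ARIMA(stocks, order=(4, 1, 0)).fit()
    print(fit_410.summary())   # AR coefficients, standard errors, AIC/BIC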

Case: CASHFLOW
ADF Test Statistics
Level: -0.3355
First difference: -3.2556

[Time series plot of CASHFLOW, index 1–200, values roughly between -2 and 2]

The level fails the ADF test, but the first difference passes, so CASHFLOW is treated as I(1).


Correlogram for CASHFLOW


[ACF and PACF of differenced CASHFLOW]

ACF cutoff after 2 lags = MA(2)?
PACF cycles gradually dying away.

ARIMA(0,1,2) model for CASHFLOW


Dependent Variable: D(CASHFLOW,1)
Method: Least Squares
Sample(adjusted): 1/08/1999 10/25/2002
Included observations: 199 after adjusting endpoints
Convergence achieved after 15 iterations
Backcast: 12/25/1998 1/01/1999

Variable   Coefficient   t-Statistic   Prob.
C           0.013418      0.985935     0.3254
MA(1)       0.500527      5.30472      0.0000
MA(2)      -0.484454     -8.0659       0.0000

R-squared 0.310606   Adjusted R-squared 0.303571   S.E. of regression 0.188608
Sum squared resid 6.972326   Log likelihood 51.09115   Durbin-Watson stat 1.990827
Mean dependent var 0.014372   S.D. dependent var 0.226007
Akaike info criterion -0.483328   Schwarz criterion -0.433680
F-statistic 44.15374   Prob(F-statistic) 0.000000
Inverted MA Roots: .49, -.99

Practical tips
In many practical applications it is very difficult to tell whether the data come from an AR(p) or an MA(q) model:
- choose the best-fitting model
- forecasts will differ a little in the short term, but converge
Do NOT build models with:
- large numbers of MA terms
- large numbers of AR and MA terms together
You may well see very (suspiciously) high t-statistics. This happens because of high correlation (collinearity) among the regressors, not because the model is good.

Box-Jenkins procedure

Plot Series → Is it Stationary?
- No: Difference (Integrate) Series
- Yes: Identify Possible Model → Diagnostics OK?


Diagnostic statistics
Random residuals — the Box-Pierce Q-statistic:
Q(s) = n Σ r(k)² ~ χ²(s)
where r(k) is the k-th residual autocorrelation and the summation runs over the first s autocorrelations.
Fit versus parsimony — the Schwarz Bayesian Criterion (SBC):
SBC = ln(RSS/n) + (p+d+q) ln(n)/n
where RSS is the residual sum of squares, n the sample size, and (p+d+q) the number of parameters. Both diagnostics are sketched in code below.
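A sketch of both diagnostics in Python with statsmodels; `fit_410` is the fitted model from the earlier sketch:

    # Ljung-Box Q test on the residuals, plus the Schwarz/Bayesian criterion.
    from statsmodels.stats.diagnostic import acorr_ljungbox

    lb = acorr_ljungbox(fit_410.resid, lags=[12])
    print(lb)             # small p-value => residuals still autocorrelated: reject the model
    print(fit_410.bic)    # Schwarz/Bayesian criterion: lower is better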

ARIMA (2,1,0) for STOCKS?


ARIMA model for STOCKS

Final Estimates of Parameters

Type      Coef      SE Coef   T       P
AR 1      0.7642    0.0628    12.17   0.000
AR 2     -0.4763    0.0628    -7.58   0.000
Constant  0.00466   0.01971    0.24   0.813

Differencing: 1 regular difference
Residuals: SS = 15.1505 (backforecasts excluded), MS = 0.0773, DF = 196
Modified Box-Pierce (Ljung-Box) Chi-Square statistic: Lag 12, Chi-Square 154.1, DF 9, P-Value 0.000

Not acceptable, since there is residual correlation, as shown by the high Box-Pierce statistic (p-value < .05).

Model selection for CASHFLOW by SBC


Alternative models estimated for CASHFLOW give the following Schwarz Bayesian Criterion statistics:

Model   Schwarz Criterion
2,1,0   -0.2174
0,1,1   -0.2379
0,1,2   -0.4336
0,1,3   -0.4040
1,1,1   -0.4025
1,1,2   -0.3990

ARIMA(0,1,2) has the lowest SBC and would be the preferred model. A sketch of this model search in code follows.
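A Python sketch of the search, assuming `cashflow` holds the series; statsmodels' BIC is scaled differently from the EViews Schwarz criterion, so the values will differ even though the ranking should tell the same story:

    # Compare candidate orders for CASHFLOW by the Schwarz/Bayesian criterion.
    from statsmodels.tsa.arima.model import ARIMA

    candidates = [(2, 1, 0), (0, 1, 1), (0, 1, 2), (0, 1, 3), (1, 1, 1), (1, 1, 2)]
    results = {order: ARIMA(cashflow, order=order).fit().bic for order in candidates}
    for order, bic in sorted(results.items(), key=lambda kv: kv[1]):
        print(order, round(bic, 4))
    print("preferred model:", min(results, key=results.get))   # the slide's table points to (0,1,2)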



Box-Jenkins procedure

Plot Series → Is it Stationary?
- No: Difference (Integrate) Series
- Yes: Identify Possible Model → Diagnostics OK?
  - No: return to identification
  - Yes: Make Forecasts


Forecasts from ARIMA(4,1,0) for STOCKS


Time Series Plot for STOCKS (with forecasts and their 95% confidence limits)

[Plot of STOCKS over 200 periods with out-of-sample ARIMA(4,1,0) forecasts]

Note the persistent dynamics (beyond 4 periods).
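A forecasting sketch in Python, continuing from the earlier `fit_410` model (an assumed object, not part of the slides):

    # Out-of-sample point forecasts and 95% intervals from the fitted ARIMA(4,1,0).
    forecast = fit_410.get_forecast(steps=20)
    print(forecast.predicted_mean)          # point forecasts
    print(forecast.conf_int(alpha=0.05))    # 95% confidence limits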


Forecasts from ARIMA(0,1,2) for CASHFLOW


[Plot of CASHFLOW and its forecasts (CASHFLOWF), observations from 1/04/02 to 12/20/02]

Note there are no dynamics after 2 periods: the forecast is flat beyond two steps ahead.


Seasonality in Box-Jenkins Models


Model identification: seasonality of order s is revealed by spikes at lags s, 2s, 3s, … of the autocorrelation function.
Model estimation: to make the series stationary, you may need to take s-th differences of the raw data before estimation. These seasonal effects may themselves follow AR and MA processes.
The lag operator L can be used to express these complex models more compactly: L^s y_t means y_{t-s}, so
(1 - .55L)(1 - .89L^4) y_t = u_t - .13 u_{t-1}
expands to
y_t = .55 y_{t-1} + .89 y_{t-4} - (.55 × .89) y_{t-5} + u_t - .13 u_{t-1}
A sketch of how such a model can be specified in software follows.
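One way to specify this multiplicative seasonal structure in Python is through the seasonal_order argument of statsmodels' ARIMA; `lcarbs` is an assumed name for the quarterly log-consumption series, and the fitted coefficients need not match the EViews output exactly:

    # Regular part (p,d,q) = (1,0,1); seasonal part (P,D,Q,s) = (1,0,0,4), i.e. a seasonal AR at lag 4.
    from statsmodels.tsa.arima.model import ARIMA

    seasonal_fit = ARIMA(lcarbs, order=(1, 0, 1), seasonal_order=(1, 0, 0, 4)).fit()
    print(seasonal_fit.summary())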

Carbonates consumption: UK, quarterly


[Quarterly plot of LCARBS (log of UK carbonates consumption), 1988–2000, values between about 6.4 and 7.4]

ACF and PACF for ln(carbs), EVIEWS


Sample: 1988:1 2000:4
Included observations: 48

Lag    AC      PAC     Q-Stat   Prob
 1    0.183   0.183    1.7165   0.190
 2    0.171   0.143    3.2489   0.197
 3    0.135   0.087    4.2242   0.238
 4    0.786   0.775   37.918    0.000
 5    0.062  -0.374   38.135    0.000
 6    0.053  -0.162   38.297    0.000
 7    0.047   0.030   38.428    0.000
 8    0.632   0.139   62.415    0.000
 9    0.026   0.094   62.458    0.000
10    0.031   0.037   62.521    0.000
11    0.012  -0.091   62.531    0.000
12    0.514  -0.059   80.176    0.000
13   -0.018  -0.039   80.199    0.000
14    0.002   0.050   80.199    0.000
15   -0.058  -0.081   80.442    0.000
16    0.379  -0.053   91.224    0.000
17   -0.103  -0.114   92.044    0.000
18   -0.060  -0.040   92.329    0.000
19   -0.151  -0.069   94.206    0.000
20    0.235  -0.058   98.942    0.000

The large autocorrelations at lags 4, 8, 12, 16 and 20 are the seasonal spikes: the data are quarterly, so s = 4.


EVIEWS: seasonal ARIMA for ln(carbs)


Dependent Variable: LCARBS
Method: Least Squares
Sample(adjusted): 1989:2 1999:4
Included observations: 43 after adjusting endpoints
Convergence achieved after 11 iterations
Backcast: 1989:1

Estimated model: (1 - .55L)(1 - .89L^4) y_t = u_t - .13 u_{t-1}

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C           7.337168      0.248463     29.53027     0.0000
AR(1)       0.554554      0.291564      1.901997    0.0646
SAR(4)      0.892048      0.054019     16.51373     0.0000
MA(1)      -0.128166      0.347313     -0.369020    0.7141

R-squared 0.859162   Adjusted R-squared 0.848329   S.E. of regression 0.068714
Sum squared resid 0.184142   Log likelihood 56.23043   Durbin-Watson stat 1.939906
Mean dependent var 7.036425   S.D. dependent var 0.176438
Akaike info criterion -2.429322   Schwarz criterion -2.265490
F-statistic 79.30478   Prob(F-statistic) 0.000000

Inverted AR Roots: .97, .55, -.00+.97i, -.00-.97i, -.97
Inverted MA Roots: .13


Forecasts from seasonal model


[Plot of LCARBS with forecasts (LCARBSF) and MIN/MAX bands, 1988–2000]


Box-Jenkins Models: Conclusions


Advantages:
- provides unconditional forecasts
- can be made fully automatic, using an expert system for model identification
- models are parsimonious with respect to coefficients
Disadvantages:
- requires a large number of observations for model identification
- hard to explain and interpret to unsophisticated users
- estimation and selection are sometimes an art form