Professional Documents
Culture Documents
ARIMA Model
by
Dr Ani Shabri
Introduction to Box-Jenkins
Methodology
Box-Jenkins (BJ) methodology or Autoregressive
Integrated Moving Average (ARIMA) models are a class
of linear models that is capable of representing
stationary as well as non-stationary time series.
The BJ methodology refers to a set of procedures for
identifying, fitting, estimating and checking ARIMA
models with time series data. Forecast follow directly
from the form of fitted model.
The BJ methodology aims to obtain a model that is
parsimony. Parsimony referred a model has the smallest
number of parameters needed to adequately fit the
patterns in the data observed.
Stationarity
A stationary process has the property that the mean,
variance and auto-covariance structure do not change
over time.
A time series yt is said to be stationary if it satisfies the
following conditions:
i.
E( X1 ) E( X 2 ) ... E( X n )
ii. V ( X ) V ( X ) ... V ( X ) 2
1
2
n
iii. Cov( X t , X t k ) Cov( X t h , X t hk ) k
yt 1 yt 1 2 yt 2 ... p yt p t 1 t 1 ... q t q
Autoregressive model, AR(p) or ARMA(p, 0)
yt 1 yt 1 2 yt 2 ... p yt p t
rk
y
t 1
y yt k y
yt y
t 1
srk
k 1
1 2 rj2
j 1
The ACF is called cut off at 95% confidence interval if value of rk lie
in the range
[2srk , 2srk ]
Interpretation of Behavior of
Sample ACF
The sample ACF is said to die down if this function
does not cut off but rather decreases in a steady
fashion. The sample ACF can die down in
(i) a damped exponential fashion
(ii) a damped sine-wave fashion
(iii) a fashion dominated by either one of or a
combination of both (i) and (ii).
The SAC can die down fairly quickly or extremely slowly.
Note: Behavior of ACF and PACF usually drawn with
95% confidence interval.
PACF is used to measure the degree of association between Xt and Xt-k, when
the effects of other time lags (1, 2, 3, , k 1) are removed. The sample
PACF is given by
k 1
rkk
rk (rk 1, j )(rk j )
j 1
k 1
1 (rk 1, j )(r j )
j 1
1
rkk where
srkk
.
n
srkk
The PACF is called cut off at 95% confidence interval if value of rk lie in
the range
t rkk
[2 / n , 2 n ]
Identification Model
Summary Of The Behaviour Of Autocorrelation And Partial
Autocorrelation Functions
ACF
PACF
AR(p)
MA(q)
ARMA(p,q)
Exponential
decay/tails
off
towards zero/damped sine wave
* Note:
Diagnostics
In the model-building process, if an ARIMA(p, d, q)
model is chosen (based on the ACFs and PACFs), some
checks on the model adequacy are required. A residual
analysis is usually based on the fact that the residuals of
an adequate model should be approximately white noise.
Basically, a model is adequate if the residuals nearly the
properties white noise process, i.e. the errors
constant on variances
Independent
normally distributed with zero means and variance 2
Constant on variances
Constant on variances
Standardized Residuals
Standardized Residuals
0
T
Random errors
Standardized Residuals
Standardized Residuals
Independent Test
If an ARMA(p,q) model is an adequate representation of the
data generating process, then the residuals should be
independent. 2 Tests were considered
i. ACF of residuals mostly falls inside Barlett confidence
interval.
ii. Portmanteau test statistic uses sample ACF of the residuals as
a group to examine the following hypothesis:
H 0 : 1 2 ... k 0
Portmanteau test statistic:
rl2 (e)
Q (k ) (n d )(n d 2)
~ (2k p q )
l 1 n d k
*
Q* (k ) (2k pq )
Normality of Residuals
In statistics, the JarqueBera (JB) test is one procedure for
determining whether sample data (residuals) are normal
distribution. The test is named after Carlos Jarque and Anil K.
Bera. The test statistic JB is defined as
n
JB [ S 2 14 ( K 3) 2 ]
6
[ 1n i 1 ( xi x ) 2 ]3 / 2
( xi x )
4
n i 1
K 4
n
1
[ n i 1 ( xi x ) 2 ]2
2
JB , 2
AIC ln r
n
Bayesian Information Criterion (BIC)
ln n
BIC ln
r
n
2
SSE
n
Forecasting
Example
Monthly data of water demand in Kluang Johor in Malaysia from January 1995 to
December 2011.
Autocorrelation Function for C1
120
1.0
0.8
0.6
Autocorrelation
Water Demand
110
100
90
80
70
1
20
40
60
80
100
120
140
160
180
200
0.4
0.2
0.0
-0.2
-0.4
-0.6
-0.8
-1.0
1
10
15
20
25
Lag
30
35
40
45
50
1st difference
10
-10
-20
1
20
40
60
80
100
120
Monthly
140
1.0
0.8
0.8
0.6
0.6
Partial Autocorrelation
Autocorrelation
1.0
0.4
0.2
0.0
-0.2
-0.4
-0.6
0.4
0.2
0.0
-0.2
-0.4
-0.6
-0.8
-0.8
-1.0
-1.0
10
15
20
25
Lag
30
35
40
200
180
160
45
50
10
15
20
25
Lag
30
35
40
45
50
Example
The plot and ACF (cuts off quickly) of the 1st
difference of water demand suggests the series is
stationary.
Based on ACF and PACF, 3 tentative models are
identified
i.
ARIMA(0,1,1)-ACF cuts off after lag 1 and PACF
shows a exponential decay
ii. ARIMA (2,1,1)-ACF follows a damped cycle and
PACF cuts off after lag 2.
iii. ARIMA(1,1,1)-ACF and PACF decay
exponentially.
ARIMA(2,1,0)
ARIMA(1,1,1)
Type
Coef SE Coef T P
AR 1 -0.5031 0.0686 -7.33 0.000
AR 2 -0.2305 0.0686 -3.36 0.001
36
48
64.5 83.0
34 46
0.001 0.001
12
25.3
11
0.008
24
37.9
23
0.026
36
48
58.4 77.2
35
47
0.008 0.004
12
17.1
10
0.072
24 36
48
27.6 51.3 70.8
22
34
46
0.190 0.029 0.011
Example
The results indicate that ARIMA(1,1,1) residuals are
uncorrelated at least up to lag 48, while ARIMA(2,1,0) and
ARIMA(0,1,1) residuals are correlated.
The ACF and PACF of residuals of ARIMA(1,1,1) are well
within their two standard error limits indicating residuals are
white noise.
ACF of Residuals for ARIMA(1,1,1)
1.0
1.0
0.8
Partial Autocorrelation
0.6
Autocorrelation
0.4
0.2
0.0
-0.2
-0.4
-0.6
-0.8
0.4
0.2
0.0
-0.2
-0.4
-0.6
-0.8
-1.0
-1.0
2
10
12
14
Lag
16
18
20
22
24
10
12
14
Lag
16
18
20
22
24
Example
Residual Plots for C1
Mean
StDev
N
AD
P-Value
99
80
70
60
50
40
30
20
99
10
90
50
10
0.1
-10
The residual
autocorrelation
follows normal distribution
-10
-5
0
RESI1
10
0
Residual
80
Histogram
15
10
90
100
Fitted Value
110
Versus Order
40
10
30
Residual
-15
Frequency
0
-10
10
0.1
Versus Fits
99.9
Percent
Percent
95
90
0.4558
4.848
203
0.729
0.057
Residual
99.9
20
10
0
0
-10
-12
-8
-4
0
4
Residual
12
t-test
Q-test
x
x
AIC
3.247
3.183
3.176
BIC
3.280
3.200
3.208
Judging these results, it appears that the estimated ARIMA(1,1,1) model best fits
the data.
26
120
110
100
90
80
Data
70
ARIMA(1,1,1)
60
0
25
50
75
100
125
Monthly
150
175
200
Water Demand
120
110
100
90
80
70
1
20
40
60
80
100
120
Monthly
140
160
180
200