Motivation

Recall Australian red wine sales.

[Figure: Australian wine sales series, roughly 500–3000 units over about 140 periods.]

Components

• Trend (global heating brings more rain)
• Seasonal variation (swim suits sold every spring)
• Seasonal and cyclical variation
  • Multiplicative: dependent on current sales (beer consumption increases on hot days)
  • Additive: independent of current sales (beer consumption during Roskilde Festival)
• Irregular variation
Models

• Moving average, weighted moving average
• Trends and seasonal pattern
• Croston’s method

Quality of forecast

• Error: εt = |X̂t − Xt|
• Total error: ∑_{t=1}^{n} εt
• Mean squared error: (1/n) ∑_{t=1}^{n} εt²
• Root mean squared error: √( (1/n) ∑_{t=1}^{n} εt² )

MSE and RMSE punish large errors more than small errors.
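These measures are easy to compute; a minimal sketch in Python (the function name and array interface are illustrative, not part of the slides):

```python
import numpy as np

def forecast_errors(actual, forecast):
    """Total absolute error, MSE and RMSE of a forecast against actuals."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    eps = np.abs(forecast - actual)   # epsilon_t = |X_hat_t - X_t|
    total = eps.sum()                 # sum_t epsilon_t
    mse = np.mean(eps ** 2)           # (1/n) * sum_t epsilon_t^2
    rmse = np.sqrt(mse)               # square root of the MSE
    return total, mse, rmse

print(forecast_errors([10, 12, 11], [9, 13, 11]))
```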
Typical Behavior for Exponential Smoothing

[Figure: demand per period (periods 1–115), illustrating typical exponential smoothing behavior.]
Moving Average

Given observations X1, X2, . . . , Xt.

Level at time t:
Lt = (1/m) ∑_{i=0}^{m−1} Xt−i

Forecast:
X̂t+i = Lt for i = 1, 2, . . .

• Advantage of large m: more smoothing of noise
• Advantage of small m: reacts faster to changes in the level
• Average age of data: m/2
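A minimal sketch of the moving-average forecast just defined (function name and example data are illustrative):

```python
import numpy as np

def moving_average_forecast(x, m):
    """Flat forecast from an m-period moving average.

    The level L_t is the mean of the last m observations; the forecast
    for every future period is that level (X_hat_{t+i} = L_t).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < m:
        raise ValueError("need at least m observations")
    return x[-m:].mean()

# Example: forecast with m = 3 on a short demand history.
history = [12, 15, 14, 16, 18, 17]
print(moving_average_forecast(history, m=3))  # mean of 16, 18, 17
```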
Weighted Moving Average

[Figure: example weights over the last 10 observations.]

Exponential Smoothing

Lt = αXt + (1 − α)Lt−1

• Suited to quite stable processes
• Note: some authors use 1 − α instead of α
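A small sketch of the recursion Lt = αXt + (1 − α)Lt−1; initialising the level to the first observation is an assumption, not something the slide prescribes:

```python
import numpy as np

def exponential_smoothing(x, alpha, level0=None):
    """Simple exponential smoothing: L_t = alpha*X_t + (1-alpha)*L_{t-1}.

    Returns the sequence of levels; the forecast for any horizon is the
    last level.  level0 defaults to the first observation.
    """
    x = np.asarray(x, dtype=float)
    level = x[0] if level0 is None else level0
    levels = []
    for xt in x:
        level = alpha * xt + (1 - alpha) * level
        levels.append(level)
    return np.array(levels)

# Forecast = last smoothed level.
demand = [10, 12, 11, 13, 12, 14]
print(exponential_smoothing(demand, alpha=0.3)[-1])
```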
Exponential Smoothing

[Figure: a series with a shifting mean plus zero-mean white noise, shown with its underlying mean and the exponentially smoothed series for α = 0.3 (periods 1–96).]

[Figure: the same series smoothed with α = 0.1, and the forecast RMSE as a function of α (roughly 1.15–1.45 for α from 0 to 1).]
Exponential Smoothing

[Figure: demand and exponentially smoothed forecasts for α = 0.1, α = 0.3 and α = 0.9.]

• Small α: more averaging, more stable forecast
• Typically α should be 0.05 – 0.3
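One way to produce the RMSE-versus-α curves shown earlier is to sweep α and score one-step-ahead forecasts; the series, the initial level and the grid below are illustrative choices:

```python
import numpy as np

def one_step_rmse(x, alpha):
    """RMSE of one-step-ahead simple exponential smoothing forecasts."""
    x = np.asarray(x, dtype=float)
    level = x[0]
    errors = []
    for xt in x[1:]:
        errors.append(xt - level)                  # forecast for this period is the previous level
        level = alpha * xt + (1 - alpha) * level   # update after observing xt
    return np.sqrt(np.mean(np.square(errors)))

# Sweep alpha over a grid on a noisy shifting-mean series.
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0, 1, 50), rng.normal(2, 1, 50)])
for alpha in np.arange(0.1, 1.0, 0.1):
    print(f"alpha={alpha:.1f}  RMSE={one_step_rmse(series, alpha):.3f}")
```

The α minimising this RMSE on historical data is one way to choose the smoothing constant in practice.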
[Figure: data with a trend and exponentially smoothed forecasts (α = 0.2 and α = 0.5) over 12 periods; forecast RMSE as a function of α (roughly 0.57–0.67).]

Exponential smoothing will lag behind a trend.
Exponential Smoothing on a Trend

Single smoothing:
Lt = αXt + (1 − α)Lt−1

[Figure: trend data with single smoothing (α = 0.2) and double smoothing.]

Double smoothing

Level:
Lt = αXt + (1 − α)(Lt−1 + Tt−1)

Trend:
Tt = β(Lt − Lt−1) + (1 − β)Tt−1

Starting values (t ≥ 2):
T2 = X2 − X1, L2 = X2

Forecast:
X̂t+i = Lt + iTt

Overshoots (dot-com, house prices?)
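A sketch of double smoothing (Holt's linear method) with the level, trend, starting values and forecast as given above; the example data and parameter values are illustrative:

```python
import numpy as np

def holt_forecast(x, alpha, beta, horizon):
    """Double exponential smoothing (Holt's linear method).

    Level:  L_t = alpha*X_t + (1-alpha)*(L_{t-1} + T_{t-1})
    Trend:  T_t = beta*(L_t - L_{t-1}) + (1-beta)*T_{t-1}
    Start:  L_2 = X_2, T_2 = X_2 - X_1 (indices here are 0-based).
    Forecast: X_hat_{t+i} = L_t + i*T_t.
    """
    x = np.asarray(x, dtype=float)
    level, trend = x[1], x[1] - x[0]
    for xt in x[2:]:
        prev_level = level
        level = alpha * xt + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return np.array([level + i * trend for i in range(1, horizon + 1)])

# Example: data with a clear upward trend.
data = [1.0, 2.1, 2.9, 4.2, 5.0, 6.1]
print(holt_forecast(data, alpha=0.3, beta=0.2, horizon=3))
```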
Data have trend and seasonal pattern

Example: Xt = (1 + 0.04t) · st, where st cycles through the seasonal factors (1.5, 0.5, 1).

[Figure: the series Xt and its trend component (1 + 0.04t), periods 1–58.]

• St: multiplicative seasonal factor at time t
• Season length s

Xt = (L + Tt)St + It

Level:
Lt = α · Xt/St−s + (1 − α)(Lt−1 + Tt−1)

Seasonal factor:
St = γ · Xt/(Lt−1 + Tt−1) + (1 − γ)St−s

Forecast:
X̂t+i = (Lt + iTt)St+i−s for i = 1, 2, . . . , s
X̂t+i = (Lt + iTt)St+i−2s for i = s + 1, . . . , 2s
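A sketch of the multiplicative-seasonal smoother above. The trend update (taken to be the same as in double smoothing) and the initial level, trend and seasonal factors are assumptions, not given on the slide:

```python
import numpy as np

def holt_winters_multiplicative(x, s, alpha, beta, gamma, horizon):
    """Exponential smoothing with trend and multiplicative seasonality.

    Level:    L_t = alpha*X_t/S_{t-s} + (1-alpha)*(L_{t-1} + T_{t-1})
    Seasonal: S_t = gamma*X_t/(L_{t-1} + T_{t-1}) + (1-gamma)*S_{t-s}
    Trend:    T_t = beta*(L_t - L_{t-1}) + (1-beta)*T_{t-1}   (assumed, as in double smoothing)
    Forecast: X_hat_{t+i} = (L_t + i*T_t) * S_{t+i-s}
    """
    x = np.asarray(x, dtype=float)
    level = x[:s].mean()          # initial level: mean of the first season (assumption)
    trend = 0.0                   # initial trend (assumption)
    season = list(x[:s] / level)  # initial seasonal factors (assumption)
    for t in range(s, len(x)):
        prev_level, prev_trend = level, trend
        level = alpha * x[t] / season[t - s] + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        season.append(gamma * x[t] / (prev_level + prev_trend) + (1 - gamma) * season[t - s])
    n = len(x)
    # Reuse the last season of factors for any horizon.
    return np.array([(level + i * trend) * season[n - s + (i - 1) % s]
                     for i in range(1, horizon + 1)])

# Example: the slide's series X_t = (1 + 0.04 t) * (1.5, 0.5, 1) repeating, season length 3.
t = np.arange(1, 31)
x = (1 + 0.04 * t) * np.tile([1.5, 0.5, 1.0], 10)
print(holt_winters_multiplicative(x, s=3, alpha=0.3, beta=0.1, gamma=0.2, horizon=3))
```

The additive variant on the next slide is obtained by replacing the divisions by St−s and by (Lt−1 + Tt−1) with subtractions.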
Exponential Smoothing, Seasonal Pattern

Additive seasonal variant:

Level:
Lt = α(Xt − St−s) + (1 − α)(Lt−1 + Tt−1)

Trend:
Tt = β(Lt − Lt−1) + (1 − β)Tt−1

Seasonal factor:
St = γ(Xt − Lt−1 − Tt−1) + (1 − γ)St−s

Forecast:
X̂t+i = Lt + iTt + St+i−s for i = 1, 2, . . . , s
X̂t+i = Lt + iTt + St+i−2s for i = s + 1, s + 2, . . . , 2s
...

Croston’s Method

Suitable when demand is intermittent:
• Small quantities (e.g. sales of cars)
• Fine-grained time series (e.g. automatic data collection)
• Orders in huge quantities (e.g. containers of beer)

[Figure: probability of demand per period.]
Definitions

• Xt: demand at time t
• X̂t: predicted demand at time t
• Zt: estimate of demand when not zero
• Tt: time between non-zero demands
• q: time since last non-zero demand

Example

Exponential smoothing applied (α = 0.2).

[Figure: an intermittent demand series (periods 1–397) with the exponentially smoothed forecast.]

Exponential smoothing:
• Forecast is highest right after non-zero demand
• Forecast is lowest right before non-zero demand
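A tiny demonstration of this behaviour: plain exponential smoothing (α = 0.2, as above) applied to an artificial intermittent demand series:

```python
import numpy as np

# Intermittent demand: mostly zeros with occasional non-zero demands.
demand = np.array([0, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 1, 0, 0, 0], dtype=float)

alpha, level = 0.2, 0.0
forecast = []
for x in demand:
    forecast.append(level)                 # forecast made before seeing x
    level = alpha * x + (1 - alpha) * level

# The forecast jumps right after each non-zero demand and then decays,
# so it is highest just after a demand and lowest just before the next one.
print(np.round(forecast, 3))
```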
Croston’s Method

Example

[Figure: an intermittent demand series, periods 1–397.]

Forecast:
X̂t = Zt / Tt

(q is reset to 1 after each non-zero demand.)

[Figure: Croston forecast for the same series.]
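The update equations are not legible above, so the following is a sketch of the standard Croston procedure consistent with the definitions of Zt, Tt and q; the shared smoothing parameter α and the initialisation are assumptions:

```python
import numpy as np

def croston_forecast(x, alpha=0.1):
    """Croston's method for intermittent demand (standard formulation).

    Z: smoothed size of non-zero demands, T: smoothed interval between
    non-zero demands, q: periods since the last non-zero demand.
    Forecast per period is Z / T; updates happen only when demand occurs.
    """
    x = np.asarray(x, dtype=float)
    nonzero = x[x > 0]
    if len(nonzero) == 0:
        return np.zeros(len(x))
    # Initialise Z from the first non-zero demand and T to 1 (simple choice for the sketch).
    z, t_int, q = nonzero[0], 1.0, 1
    forecasts = []
    for xt in x:
        forecasts.append(z / t_int)
        if xt > 0:
            z = alpha * xt + (1 - alpha) * z
            t_int = alpha * q + (1 - alpha) * t_int
            q = 1
        else:
            q += 1
    return np.array(forecasts)

demand = [0, 0, 3, 0, 0, 0, 2, 0, 1, 0, 0, 4]
print(croston_forecast(demand, alpha=0.2))
```

Between demand occurrences the forecast stays constant, which avoids the decay-and-jump pattern seen with plain exponential smoothing.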
[The following two pages are reproduced from Hyndman, "Forecasting based on state space models for exponential smoothing", pp. 6–7.]

Table 1: Formulae for recursive calculations and point forecasts, by trend component (N none, A additive, M multiplicative, D damped) and seasonal component (N none, A additive, M multiplicative).

Trend N (none):
  seasonal N: Pt = Yt; Qt = ℓt−1; φ = 1; Yt(h) = ℓt
  seasonal A: Pt = Yt − st−m; Qt = ℓt−1; Tt = Yt − Qt; φ = 1; Yt(h) = ℓt + st+h−m
  seasonal M: Pt = Yt/st−m; Qt = ℓt−1; Tt = Yt/Qt; φ = 1; Yt(h) = ℓt·st+h−m

Trend A (additive):
  seasonal N: Pt = Yt; Qt = ℓt−1 + bt−1; Rt = ℓt − ℓt−1; φ = 1; Yt(h) = ℓt + h·bt
  seasonal A: Pt = Yt − st−m; Qt = ℓt−1 + bt−1; Rt = ℓt − ℓt−1; Tt = Yt − Qt; φ = 1; Yt(h) = ℓt + h·bt + st+h−m
  seasonal M: Pt = Yt/st−m; Qt = ℓt−1 + bt−1; Rt = ℓt − ℓt−1; Tt = Yt/Qt; φ = 1; Yt(h) = (ℓt + h·bt)·st+h−m

Trend M (multiplicative):
  seasonal N: Pt = Yt; Qt = ℓt−1·bt−1; Rt = ℓt/ℓt−1; φ = 1; Yt(h) = ℓt·bt^h
  seasonal A: Pt = Yt − st−m; Qt = ℓt−1·bt−1; Rt = ℓt/ℓt−1; Tt = Yt − Qt; φ = 1; Yt(h) = ℓt·bt^h + st+h−m
  seasonal M: Pt = Yt/st−m; Qt = ℓt−1·bt−1; Rt = ℓt/ℓt−1; Tt = Yt/Qt; φ = 1; Yt(h) = ℓt·bt^h·st+h−m

Trend D (damped):
  seasonal N: Pt = Yt; Qt = ℓt−1 + bt−1; Rt = ℓt − ℓt−1; β < φ < 1; Yt(h) = ℓt + (1 + φ + · · · + φ^{h−1})bt
  seasonal A: Pt = Yt − st−m; Qt = ℓt−1 + bt−1; Rt = ℓt − ℓt−1; Tt = Yt − Qt; β < φ < 1; Yt(h) = ℓt + (1 + φ + · · · + φ^{h−1})bt + st+h−m
  seasonal M: Pt = Yt/st−m; Qt = ℓt−1 + bt−1; Rt = ℓt − ℓt−1; Tt = Yt/Qt; β < φ < 1; Yt(h) = [ℓt + (1 + φ + · · · + φ^{h−1})bt]·st+h−m

Writing (2.1)–(2.3) in their error-correction form we obtain

ℓt = Qt + α(Pt − Qt)          (2.4)
bt = φbt−1 + β(Rt − bt−1)     (2.5)
st = st−m + γ(Tt − st−m).     (2.6)

The method with fixed level (constant over time) is obtained by setting α = 0, the method with fixed trend (drift) is obtained by setting β = 0, and the method with fixed seasonal pattern is obtained by setting γ = 0. Note also that the additive trend methods are obtained by letting φ = 1 in the damped trend methods.

Table 2: State space equations for each additive error model in the classification. Multiplicative error models are obtained by replacing εt by µt·εt in the equations.

Trend N (none):
  seasonal N: µt = ℓt−1; ℓt = ℓt−1 + αεt
  seasonal A: µt = ℓt−1 + st−m; ℓt = ℓt−1 + αεt; st = st−m + γεt
  seasonal M: µt = ℓt−1·st−m; ℓt = ℓt−1 + αεt/st−m; st = st−m + γεt/ℓt−1

Trend A (additive):
  seasonal N: µt = ℓt−1 + bt−1; ℓt = ℓt−1 + bt−1 + αεt; bt = bt−1 + αβεt
  seasonal A: µt = ℓt−1 + bt−1 + st−m; ℓt = ℓt−1 + bt−1 + αεt; bt = bt−1 + αβεt; st = st−m + γεt
  seasonal M: µt = (ℓt−1 + bt−1)·st−m; ℓt = ℓt−1 + bt−1 + αεt/st−m; bt = bt−1 + αβεt/st−m; st = st−m + γεt/(ℓt−1 + bt−1)

Trend M (multiplicative):
  seasonal N: µt = ℓt−1·bt−1; ℓt = ℓt−1·bt−1 + αεt; bt = bt−1 + αβεt/ℓt−1
  seasonal A: µt = ℓt−1·bt−1 + st−m; ℓt = ℓt−1·bt−1 + αεt; bt = bt−1 + αβεt/ℓt−1; st = st−m + γεt
  seasonal M: µt = ℓt−1·bt−1·st−m; ℓt = ℓt−1·bt−1 + αεt/st−m; bt = bt−1 + αβεt/(st−m·ℓt−1); st = st−m + γεt/(ℓt−1·bt−1)

Trend D (damped):
  seasonal N: µt = ℓt−1 + bt−1; ℓt = ℓt−1 + bt−1 + αεt; bt = φbt−1 + αβεt
  seasonal A: µt = ℓt−1 + bt−1 + st−m; ℓt = ℓt−1 + bt−1 + αεt; bt = φbt−1 + αβεt; st = st−m + γεt
  seasonal M: µt = (ℓt−1 + bt−1)·st−m; ℓt = ℓt−1 + bt−1 + αεt/st−m; bt = φbt−1 + αβεt/st−m; st = st−m + γεt/(ℓt−1 + bt−1)

3 State space models

HKSG describe the state space models that underlie the exponential smoothing methods. For each method, there are two models: a model with additive errors and a model with multiplicative errors. The pointwise forecasts for the two models are identical, but prediction intervals will differ.

The general model involves a state vector xt = (ℓt, bt, st, st−1, . . . , st−(m−1)) and state space equations of the form

Yt = µt + k(xt−1)εt           (3.1)
xt = f(xt−1) + g(xt−1)εt      (3.2)

where {εt} is a Gaussian white noise process with mean zero and variance σ², and µt = Ŷt−1(1). The model with additive errors has k(xt−1) = 1, so that Yt = µt + εt. The model with multiplicative errors has k(xt−1) = µt, so that Yt = µt(1 + εt). Thus, εt = (Yt − µt)/µt is a relative error for the multiplicative model.

All the methods in Table 1 can be written in the form (3.1) and (3.2). The underlying equations are given in Table 2. The models are not unique. Clearly, any value of k(xt−1) will ...

Key points of the state-space approach:
• Two variants of state-space model (additive error, multiplicative error)
• Likelihood functions
• Model selection
• Automatic model selection, parameter optimization
• Applied to the M3 competition
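To make the additive-error form concrete, a small sketch of the simplest case (no trend, no seasonality: Yt = ℓt−1 + εt with ℓt = ℓt−1 + αεt) run as a filter over observed data; the function name and starting level are illustrative:

```python
import numpy as np

def local_level_additive_filter(y, alpha, level0):
    """One-step forecasts and residuals for the additive-error local level model.

    Observation: Y_t = l_{t-1} + eps_t, state: l_t = l_{t-1} + alpha*eps_t.
    For the multiplicative-error variant, eps_t = (Y_t - mu_t)/mu_t and the
    state update becomes l_t = l_{t-1}*(1 + alpha*eps_t).
    """
    level = level0
    forecasts, residuals = [], []
    for yt in y:
        mu = level                    # mu_t = l_{t-1}
        eps = yt - mu                 # additive error
        forecasts.append(mu)
        residuals.append(eps)
        level = level + alpha * eps   # error-correction form of the level
    return np.array(forecasts), np.array(residuals)

y = np.array([10.2, 9.8, 10.5, 11.0, 10.7])
f, e = local_level_additive_filter(y, alpha=0.3, level0=10.0)
print(f, e)
```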
De-noising
Fourier transformation
• ???
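The slide leaves the details open; one common approach (an illustrative sketch, not necessarily what the lecture intends) is to keep only the low-frequency Fourier coefficients and transform back:

```python
import numpy as np

def fft_denoise(x, keep_frequencies=5):
    """Low-pass de-noising: keep only the lowest-frequency Fourier terms.

    keep_frequencies is the number of frequency bins retained (including
    the constant term); everything above is treated as noise and zeroed.
    """
    x = np.asarray(x, dtype=float)
    coeffs = np.fft.rfft(x)
    coeffs[keep_frequencies:] = 0          # drop high-frequency components
    return np.fft.irfft(coeffs, n=len(x))

t = np.arange(200)
signal = np.sin(2 * np.pi * t / 50)
noisy = signal + np.random.default_rng(1).normal(0, 0.3, size=len(t))
smooth = fft_denoise(noisy, keep_frequencies=8)
```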
Discussion

Forecasting is important:
• Stochastic programming
• Deterministic optimization (using forecasts)

Data sets:
• Famous M1, M2, M3 (April 2006) competition
  http://mktg-sun.wharton.upenn.edu/forecast/m3-competition.html

Acknowledgements