
Forecasting, introduction

Predict demands and resources to schedule production


Sales and operations planning (SOP)

To balance supply with demand and synchronize all operational plans
• Capture demand data
• Demand forecasting
• Balancing of supply, demand, and budgets: determine the best product mix, optimal inventory targets, postponement strategies, and supply plans
• Adapt and adjust to changing business conditions: simulate multiple business scenarios; react quickly to changing conditions through automated exception management
• Integrated solution: drive continuous improvement through integrated performance management

Sub-tasks
• Demand Planning, Advanced Supply Chain Planning, Collaborative Planning, Inventory Optimization, Manufacturing Scheduling, and Global Order Promising

Demand forecasting

Examples
• Sales of computers
• Energy production
• Seats in an airplane

Motivation: Beer game

A time series measures the quantity of interest
X1, X2, . . . , Xt
We want to forecast the values at
Xt+1, Xt+2, . . .
Our predictions are denoted
X̂t+1, X̂t+2, . . .

We assume the measurements in the time series are correlated.

Motivation

Recall Australian Red Wine Sales

[Figure: Australian Wine Sales time series, monthly sales roughly 500–3000 over 140 months]

• Trend
• Seasonal variation
• Irregular variation

Components

• Trend (global heating brings more rain)
• Seasonal variation (swim suits sold every spring)
• Cyclical variation (beer consumption increases during the Soccer Championship)
• Irregular variation

Seasonal and cyclical variation
• Multiplicative: dependent on current sales (beer consumption increases on hot days)
• Additive: independent of current sales (beer consumption during Roskilde Festival)
Models

[Figure: the Australian Wine Sales series again, 500–3000 over 140 months]

Additive model with trend
Xt = L + Tt + St + It
Multiplicative model with trend
Xt = (L + Tt)St + It
where
• L level of series
• Tt trend at time t
• St seasonal variation at time t
• It irregular variation at time t

Forecasting: model fitting to time series

Quality of forecast

• Error εt = |X̂t − Xt|
• Total error ∑_{t=1}^n εt
• Mean squared error (1/n) ∑_{t=1}^n εt²
• Root mean squared error √((1/n) ∑_{t=1}^n εt²)

Squared errors punish large errors more than small errors (see the sketch below).

Model selection
• Analyze the problem, use experience and common sense
• Automatic model selection:
  – for all models
  – for all parameters
  – evaluate the error of the forecast up to time t
  – use the best model for future times
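To make the error measures concrete, here is a minimal Python sketch (the function name and example numbers are mine, not from the slides):

```python
import math

def forecast_errors(actual, predicted):
    """Absolute errors e_t = |X-hat_t - X_t| plus total error, MSE and RMSE."""
    errors = [abs(p - a) for a, p in zip(actual, predicted)]
    total = sum(errors)
    mse = sum(e * e for e in errors) / len(errors)
    rmse = math.sqrt(mse)
    return total, mse, rmse

# Squaring punishes the single large error in the second forecast heavily
print(forecast_errors([10, 12, 11], [10, 13, 11]))  # small error
print(forecast_errors([10, 12, 11], [10, 18, 11]))  # one large error
```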

Overview

• Moving average, weighted moving average
• First order exponential smoothing
• Second order exponential smoothing
• Trends and seasonal pattern
• Croston’s method
• Hyndman unified framework

Moving Average

[Figure: “Typical Behavior for Exponential Smoothing” — demand over periods 1–115; exponential smoothing would work well here]

Given observations
X1, X2, . . . , Xt
Level at time t
Lt = (1/m) ∑_{i=0}^{m−1} Xt−i
Forecast
X̂t+i = Lt for i = 1, 2, . . .

• Advantage of large m: more averaging, stable forecasts
• Advantage of small m: adjusts quickly to changes
• Average age of data: m/2
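A minimal sketch of the moving-average forecast in Python (names and data are illustrative):

```python
def moving_average_forecast(xs, m):
    """Level L_t = average of the last m observations; forecast X-hat_{t+i} = L_t."""
    if len(xs) < m:
        raise ValueError("need at least m observations")
    return sum(xs[-m:]) / m

xs = [12, 14, 13, 15, 16, 15, 17]
print(moving_average_forecast(xs, m=3))  # large m: stable; small m: reacts quickly
```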
Weighted Moving Average

Level at time t
Lt = ∑_{i=1}^t Wi Xi
where Wi is the weight attached to each historic point and
W1 + W2 + . . . + Wt = 1
Forecast
X̂t+i = Lt for i = 1, 2, . . .
New data get more weight than old data.

All forecasting schemes are variants of the weighted moving average.

[Figure: “Weights on past data” — weights 0–0.7 on the last 10 points for exponential smoothing (α = 0.6) versus a moving average (m = 5)]

Exponential Smoothing

Data have no trend or seasonal pattern.

Lt = Lt−1 + α(Xt − Lt−1)
(correct the mistake of the last forecast)
Lt = αXt + (1 − α)Lt−1
(weighted average of the last observation and the last forecast)
Forecast
X̂t+i = Lt for i = 1, 2, . . .

Substituting Lt−1 = αXt−1 + (1 − α)Lt−2 we get
Lt = αXt + α(1 − α)Xt−1 + (1 − α)²Lt−2
Repeating the substitution
Lt = αXt + α(1 − α)Xt−1 + α(1 − α)²Xt−2 + α(1 − α)³Xt−3 + . . .
Weights decrease exponentially (hence “exponential smoothing”).

• Appropriate for mean plus noise
• Or when the mean is wandering around
• Quite stable processes

Note: some authors use 1 − α instead of α.
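A direct transcription of the smoothing recursion into Python (a sketch; seeding the level with the first observation is a common convention, not specified on the slide):

```python
def exponential_smoothing(xs, alpha):
    """L_t = alpha * X_t + (1 - alpha) * L_{t-1}; forecast X-hat_{t+i} = L_t."""
    level = xs[0]                      # seed with the first observation
    for x in xs[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

xs = [4, 4, 4, 4, 8, 8, 8]             # a demand series in the style of the Beer game
for alpha in (0.1, 0.3, 0.9):
    print(alpha, round(exponential_smoothing(xs, alpha), 3))
```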

Exponential Smoothing

Shifting Mean + Zero Mean White Noise

[Figure: a series of zero-mean white noise around a mean that shifts, over about 96 periods]
[Figure: the same series smoothed with α = 0.3]
[Figure: the same series smoothed with α = 0.1]
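The figures can be reproduced in a few lines of Python; this sketch simulates a shifting mean plus white noise and smooths it with the two α values from the plots (all parameter values are my assumptions):

```python
import random

def shifted_mean_series(n=96, shift_at=48, mu1=0.0, mu2=2.0, sigma=1.0, seed=1):
    """Zero-mean white noise around a mean that shifts once, as in the figure."""
    rng = random.Random(seed)
    return [(mu1 if t < shift_at else mu2) + rng.gauss(0.0, sigma) for t in range(n)]

def smooth(xs, alpha):
    level, out = xs[0], []
    for x in xs:
        level = alpha * x + (1 - alpha) * level
        out.append(level)
    return out

xs = shifted_mean_series()
print(smooth(xs, 0.3)[-1], smooth(xs, 0.1)[-1])  # alpha = 0.3 adapts faster to the shift
```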
Choice of smoothing parameter α

[Figure: “RMSE vs Alpha” — one-step RMSE between about 1.15 and 1.45 as α runs from 0 to 1]

Exponential Smoothing

[Figure: “Actual vs Forecast for Various Alpha” — forecasts with α = 0.1, 0.3, 0.9 against the demand series]
[Figure: “Series and Forecast using Alpha=0.9” over 16 periods]

Smoothing parameter α, 0 < α < 1
• Large α: adjusts more quickly to changes
• Small α: more averaging, more stable

Typically α should be 0.05 – 0.3.
If the RMSE analysis favors a larger α, smoothing is not appropriate.
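Choosing α by one-step-ahead RMSE can be automated; a sketch of the grid search behind an “RMSE vs Alpha” plot (data and grid are illustrative):

```python
import math

def rmse_of_alpha(xs, alpha):
    """One-step-ahead RMSE when forecasting X_{t+1} by the current level L_t."""
    level, sq = xs[0], []
    for x in xs[1:]:
        sq.append((x - level) ** 2)              # error measured before updating
        level = alpha * x + (1 - alpha) * level
    return math.sqrt(sum(sq) / len(sq))

xs = [1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.1, 0.9, 1.2]
best = min((a / 10 for a in range(1, 10)), key=lambda a: rmse_of_alpha(xs, a))
print(best)
```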

Exponential Smoothing on a Trend

[Figure: “Forecast RMSE vs Alpha” for trend data — RMSE between about 0.57 and 0.67 as α runs from 0 to 1]
[Figure: trend data and forecasts with α = 0.2 and α = 0.5 over 12 periods]

Exponential smoothing will lag behind a trend.
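The lag is easy to quantify: on a pure linear trend with slope b, the smoothed level settles b(1 − α)/α behind the data. This fact is not on the slide; a sketch that verifies it numerically (parameters are mine):

```python
def smoothed_level(xs, alpha):
    level = xs[0]
    for x in xs[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

slope, alpha = 1.0, 0.2
xs = [slope * t for t in range(200)]          # pure linear trend
lag = xs[-1] - smoothed_level(xs, alpha)
print(lag, slope * (1 - alpha) / alpha)       # both close to 4.0
```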
Exponential Smoothing on a Trend

Data have a trend but no seasonal pattern.

Ordinary exponential smoothing
Lt = αXt + (1 − α)Lt−1

Double exponential smoothing (Holt 1957)

Level:
Lt = αXt + (1 − α)(Lt−1 + Tt−1)
Trend:
Tt = β(Lt − Lt−1) + (1 − β)Tt−1
(weighted average of the new trend and the trend last time)
Starting values (t ≥ 2)
T2 = X2 − X1, L2 = X2
Forecast:
X̂t+i = Lt + iTt
Smoothing parameters 0 < α < 1 and 0 < β < 1 (a code sketch follows the example)

Example

[Figure: a trend plus a noisy series over roughly 100 periods]
[Figure: the same data with single smoothing (α = 0.2) and double smoothing — double smoothing follows the trend]

Double smoothing overshoots when a trend breaks (dot-com, house prices?)
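A sketch of Holt’s method with the starting values from the slide (the example series is made up):

```python
def holt(xs, alpha, beta):
    """Double exponential smoothing (Holt 1957). Returns (level, trend)."""
    level, trend = xs[1], xs[1] - xs[0]       # starting values L2, T2
    for x in xs[2:]:
        prev = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev) + (1 - beta) * trend
    return level, trend

def holt_forecast(level, trend, i):
    return level + i * trend                   # X-hat_{t+i} = L_t + i * T_t

level, trend = holt([1, 2, 3.1, 3.9, 5.2, 6.0], alpha=0.5, beta=0.3)
print([round(holt_forecast(level, trend, i), 2) for i in (1, 2, 3)])
```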

Exponential Smoothing, Seasonal Pattern

[Figure: Xt = (1 + 0.04t)(1.5, 0.5, 1) — a seasonal series around the trend line (1 + 0.04t), values 0–5 over about 58 periods]

Recall Australian Red Wine Sales

[Figure: Australian Wine Sales series, 500–3000 over 140 months]

Data have a trend and a seasonal pattern.

Multiplicative Seasonal Series (Winters 1960)

• St multiplicative seasonal factor at time t
• Season length s

Xt = (L + Tt)St + It

Level:
Lt = α Xt/St−s + (1 − α)(Lt−1 + Tt−1)
Trend:
Tt = β(Lt − Lt−1) + (1 − β)Tt−1
Seasonal factor:
St = γ Xt/(Lt−1 + Tt−1) + (1 − γ)St−s
Forecast:
X̂t+i = (Lt + iTt)St+i−s for i = 1, 2, . . ., s
X̂t+i = (Lt + iTt)St+i−2s for i = s + 1, s + 2, . . ., 2s
. . .
Smoothing parameters 0 < α < 1, 0 < β < 1 and 0 < γ < 1.
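A sketch of Winters’ multiplicative method in Python; the start-up of level, trend and seasonal factors over the first seasons is one common convention, not specified on the slide:

```python
def holt_winters_mult(xs, s, alpha, beta, gamma):
    """Winters' multiplicative method. Assumes len(xs) >= 2*s.
    Start-up (an assumption): level = mean of season 1, trend = average
    per-period growth between seasons 1 and 2, factors from season 1."""
    level = sum(xs[:s]) / s
    trend = (sum(xs[s:2 * s]) - sum(xs[:s])) / s ** 2
    seasonal = [x / level for x in xs[:s]]
    for t in range(s, len(xs)):
        x, s_old = xs[t], seasonal[t % s]
        l_prev, t_prev = level, trend
        level = alpha * x / s_old + (1 - alpha) * (l_prev + t_prev)            # L_t
        trend = beta * (level - l_prev) + (1 - beta) * t_prev                  # T_t
        seasonal[t % s] = gamma * x / (l_prev + t_prev) + (1 - gamma) * s_old  # S_t
    return level, trend, seasonal

# The slide's example series X_t = (1 + 0.04 t)(1.5, 0.5, 1, 1.5, 0.5, 1, ...)
xs = [(1 + 0.04 * t) * [1.5, 0.5, 1.0][t % 3] for t in range(30)]
level, trend, seas = holt_winters_mult(xs, s=3, alpha=0.3, beta=0.1, gamma=0.2)
print([round((level + i * trend) * seas[(len(xs) + i - 1) % 3], 2) for i in (1, 2, 3)])
```

The additive variant on the next slide is obtained by swapping the divisions by St−s and (Lt−1 + Tt−1) for subtractions.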
Exponential Smoothing, Seasonal Pattern

Additive Seasonal Series

Xt = L + Tt + St + It

Level:
Lt = α(Xt − St−s) + (1 − α)(Lt−1 + Tt−1)
Trend:
Tt = β(Lt − Lt−1) + (1 − β)Tt−1
Seasonal factor:
St = γ(Xt − Lt−1 − Tt−1) + (1 − γ)St−s
Forecast:
X̂t+i = Lt + iTt + St+i−s for i = 1, 2, . . ., s
X̂t+i = Lt + iTt + St+i−2s for i = s + 1, s + 2, . . ., 2s
. . .

Croston’s Method

[Figure: “Demand Distribution” — probability 0–0.8 of demands 0–9, with most of the mass at zero]

Intermittent demand arises with
• Small quantities (e.g. sales of cars)
• Fine-grained time series (e.g. automatic data collection)
• Orders in huge quantities (e.g. containers of beer)

Croston’s Method

Example

[Figure: “An Intermittent Demand Series” — demands 0–3.5, mostly zero, over about 400 periods]

Exponential smoothing applied (α = 0.2)

[Figure: “Exponential Smoothing Applied” — the smoothed forecast, 0–0.9, over the same 400 periods]

• The forecast is highest right after a non-zero demand
• The forecast is lowest right before a non-zero demand

Keep track of
• Time between non-zero demands
• Demand size when non-zero
• Smooth both the time between demands and the demand size
• Combine both for forecasting

Definitions
• Xt demand at time t
• X̂t predicted demand at time t
• Zt estimate of demand size when not zero
• Tt estimate of time between non-zero demands
• q time since the last non-zero demand
Croston’s Method

[Figure: the intermittent demand series again]

Update (if zero demand)
Zt = Zt−1
Tt = Tt−1
q = q + 1

Update (if non-zero demand)
Zt = αXt + (1 − α)Zt−1
Tt = αq + (1 − α)Tt−1
q = 1

Forecast
X̂t = Zt / Tt (see the sketch below)

Recall the example: Croston’s method applied (α = 0.2)

[Figure: “Croston’s Method Applied to Example Data” — forecast 0–0.9 over about 400 periods]
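A sketch of the updates above in Python (initializing on the first non-zero demand is my choice; the slides do not specify start-up):

```python
def croston(xs, alpha=0.2):
    """Croston's method: smooth non-zero demand size Z and inter-demand time T."""
    z = t_between = None
    q = 1
    for x in xs:
        if x == 0:
            q += 1                             # zero demand: only age the counter
        else:
            if z is None:
                z, t_between = x, q            # initialize on the first demand
            else:
                z = alpha * x + (1 - alpha) * z
                t_between = alpha * q + (1 - alpha) * t_between
            q = 1
    return z / t_between                       # forecast X-hat_t = Z_t / T_t

print(croston([0, 0, 3, 0, 0, 0, 2, 0, 4, 0, 0, 2]))
```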

Hyndman (2002)

Trend component      Seasonal component
                     N (none)   A (additive)   M (multiplicative)
N (none)             NN         NA             NM
A (additive)         AN         AA             AM
M (multiplicative)   MN         MA             MM
D (damped)           DN         DA             DM

Damped: the trend is damped over long horizons.

NN Simple exponential smoothing
AN Holt’s linear method
AA Holt-Winters’ method (additive)
AM Holt-Winters’ method (multiplicative)

12 exponential smoothing methods, all of the form
Level     ℓt = αPt + (1 − α)Qt
Trend     bt = βRt + (φ − β)bt−1
Seasonal  st = γTt + (1 − γ)st−m
The values Pt, Qt, Rt, Tt vary from method to method.
• Smoothing parameters α, β, γ
• Damping φ

Terminology
• Yt observed value at time t (was Xt)
• Yt(1) forecast one step ahead at time t (was X̂t)
• Level ℓt (was Lt)
• Trend bt (was Tt)
• Season st (was St)
• Season length m (was s)
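To see how the unified recursions specialize, this sketch instantiates Pt, Qt, Rt for the AN case, which reduces to Holt’s linear method when φ = 1 (all values are illustrative):

```python
def hyndman_step(y, level, trend, alpha, beta, phi=1.0):
    """One update of the unified recursions for the AN (Holt) case:
    P_t = Y_t, Q_t = l_{t-1} + b_{t-1}, R_t = l_t - l_{t-1}."""
    p, q = y, level + trend
    new_level = alpha * p + (1 - alpha) * q          # l_t = alpha P_t + (1-alpha) Q_t
    r = new_level - level
    new_trend = beta * r + (phi - beta) * trend      # b_t = beta R_t + (phi-beta) b_{t-1}
    return new_level, new_trend

level, trend = 1.0, 1.0
for y in [2.1, 3.0, 4.2, 4.9]:
    level, trend = hyndman_step(y, level, trend, alpha=0.5, beta=0.3)
print(level + trend)    # one-step forecast Y_t(1) = l_t + b_t; phi < 1 damps the trend
```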
Hyndman
Forecasting based on state space models for exponential smoothing

Table 1: Formulae for recursive calculations and point forecasts.
(Seasonal columns N | A | M; in all cases Pt = Yt for N, Pt = Yt − st−m for A,
Pt = Yt/st−m for M, and Tt = Yt − Qt for A, Tt = Yt/Qt for M.)

Trend N (none): Qt = ℓt−1, φ = 1
  Yt(h) = ℓt | ℓt + st+h−m | ℓt st+h−m
Trend A (additive): Qt = ℓt−1 + bt−1, Rt = ℓt − ℓt−1, φ = 1
  Yt(h) = ℓt + hbt | ℓt + hbt + st+h−m | (ℓt + hbt)st+h−m
Trend M (multiplicative): Qt = ℓt−1 bt−1, Rt = ℓt/ℓt−1, φ = 1
  Yt(h) = ℓt bt^h | ℓt bt^h + st+h−m | ℓt bt^h st+h−m
Trend D (damped): Qt = ℓt−1 + bt−1, Rt = ℓt − ℓt−1, β < φ < 1
  Yt(h) = ℓt + (1 + φ + · · · + φ^{h−1})bt | . . . + st+h−m | [. . .]st+h−m

Writing the recursions in their error-correction form we obtain
ℓt = Qt + α(Pt − Qt)          (2.4)
bt = φbt−1 + β(Rt − bt−1)     (2.5)
st = st−m + γ(Tt − st−m)      (2.6)

The method with fixed level (constant over time) is obtained by setting α = 0, the method with fixed trend (drift) by setting β = 0, and the method with fixed seasonal pattern by setting γ = 0. Note also that the additive trend methods are obtained by letting φ = 1 in the damped trend methods.

3 State space models

HKSG describe the state space models that underlie the exponential smoothing methods. For each method there are two models — a model with additive errors and a model with multiplicative errors. The point forecasts for the two models are identical, but the prediction intervals will differ.

The general model involves a state vector xt = (ℓt, bt, st, st−1, . . . , st−(m−1)) and state space equations of the form
Yt = µt + k(xt−1)εt           (3.1)
xt = f(xt−1) + g(xt−1)εt      (3.2)
where {εt} is a Gaussian white noise process with mean zero and variance σ², and µt = Yt−1(1). The model with additive errors has k(xt−1) = 1, so that Yt = µt + εt. The model with multiplicative errors has k(xt−1) = µt, so that Yt = µt(1 + εt). Thus εt = (Yt − µt)/µt is a relative error for the multiplicative model.

All the methods in Table 1 can be written in the form (3.1) and (3.2). The underlying equations are given in Table 2. The models are not unique; clearly, any value of k(xt−1) will do.

Table 2: State space equations for each additive error model in the classification.
(Seasonal columns N | A | M; multiplicative error models are obtained by replacing εt with µt εt.)

Trend N: µt = ℓt−1 | ℓt−1 + st−m | ℓt−1 st−m
  ℓt = ℓt−1 + αεt | ℓt−1 + αεt | ℓt−1 + αεt/st−m
  st = — | st−m + γεt | st−m + γεt/ℓt−1
Trend A: µt = ℓt−1 + bt−1 | ℓt−1 + bt−1 + st−m | (ℓt−1 + bt−1)st−m
  ℓt = ℓt−1 + bt−1 + αεt | ℓt−1 + bt−1 + αεt | ℓt−1 + bt−1 + αεt/st−m
  bt = bt−1 + αβεt | bt−1 + αβεt | bt−1 + αβεt/st−m
  st = — | st−m + γεt | st−m + γεt/(ℓt−1 + bt−1)
Trend M: µt = ℓt−1 bt−1 | ℓt−1 bt−1 + st−m | ℓt−1 bt−1 st−m
  ℓt = ℓt−1 bt−1 + αεt | ℓt−1 bt−1 + αεt | ℓt−1 bt−1 + αεt/st−m
  bt = bt−1 + αβεt/ℓt−1 | bt−1 + αβεt/ℓt−1 | bt−1 + αβεt/(st−m ℓt−1)
  st = — | st−m + γεt | st−m + γεt/(ℓt−1 bt−1)
Trend D: µt = ℓt−1 + bt−1 | ℓt−1 + bt−1 + st−m | (ℓt−1 + bt−1)st−m
  ℓt = ℓt−1 + bt−1 + αεt | ℓt−1 + bt−1 + αεt | ℓt−1 + bt−1 + αεt/st−m
  bt = φbt−1 + αβεt | φbt−1 + αβεt | φbt−1 + αβεt/st−m
  st = — | st−m + γεt | st−m + γεt/(ℓt−1 + bt−1)

• Two variants of the state-space model (additive error, multiplicative error)
• Likelihood functions
• Model selection: automatic model selection, parameter optimization
• Applied to the M3 competition
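A sketch of the simplest additive-error model, NN with additive errors: Yt = ℓt−1 + εt and ℓt = ℓt−1 + αεt (the first cell of Table 2). Simulated one-step forecast errors should have variance about σ²; all numeric values are assumptions:

```python
import random

def simulate_nn_additive(n, alpha, level0=10.0, sigma=1.0, seed=2):
    """Simulate the NN model with additive errors and record one-step forecasts."""
    rng = random.Random(seed)
    level, ys, forecasts = level0, [], []
    for _ in range(n):
        e = rng.gauss(0.0, sigma)
        forecasts.append(level)        # mu_t = l_{t-1} is the one-step forecast
        ys.append(level + e)
        level = level + alpha * e      # l_t = l_{t-1} + alpha * eps_t
    return ys, forecasts

ys, fs = simulate_nn_additive(1000, alpha=0.3)
print(sum((y - f) ** 2 for y, f in zip(ys, fs)) / len(ys))  # approximately sigma^2
```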

Forecasting is model fitting to time series

• Linear Trend (regression) — see the sketch below
• Linear Trend and Additive Seasonality
• Linear Trend and Multiplicative Seasonality
• Polynomial
• Logarithmic
• Exponential

Advanced methods

• Stochastic models
• Likelihood calculations
• Prediction intervals
• Procedures for model selection

Neural Networks
• Trained for model selection
• Smoothing parameters

Frequency identification
• All forecasting methods assume the season length is known
• Identify the seasonality of a time series

De-noising
• Removing noise from a time series improves the forecast
• Haar Wavelet transformation
• Daubechies Wavelet transformation

Fourier transformation
• ???
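For the first item, fitting a linear trend by least squares has a closed form; a minimal sketch (the example data are made up):

```python
def fit_linear_trend(xs):
    """Least-squares fit X_t = a + b*t over t = 1..n (closed form)."""
    n = len(xs)
    ts = range(1, n + 1)
    t_bar, x_bar = (n + 1) / 2, sum(xs) / n
    b = (sum((t - t_bar) * (x - x_bar) for t, x in zip(ts, xs))
         / sum((t - t_bar) ** 2 for t in ts))
    a = x_bar - b * t_bar
    return a, b

a, b = fit_linear_trend([2.1, 2.9, 4.2, 5.0, 6.1])
print(a + b * 6)  # extrapolate the fitted trend one period ahead
```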
Forecasting is important

• Forecasts of sales
• Sales prices of houses and flats
• Water level in lakes
• Share prices

Which forecast method should we use in the Beer game?
Xt = {4, 4, 4, 4, 8, 8, 8, . . .}

Discussion
• Stochastic programming
• Deterministic optimization (using forecasts)

Data sets
• The famous M1, M2, M3 (April 2006) competitions
  http://mktg-sun.wharton.upenn.edu/forecast/m3-competition.html
  – 645 annual series
  – 756 quarterly series
  – 1428 monthly series
  – 174 other series

Acknowledgements

Figures from
• HOLT WINTERS file archive
  http://www.barbich.net/holt/
