Professional Documents
Culture Documents
Causal
Forecasting
Models
Causal Models
• Used when demand is correlated with some known and
measurable environmental factor.
• Demand (y) is a function of some variables (x1, x2, . . . xk)
Dependent Variable Independent Variables
Disposable Diapers
~ f(births, household income)
Car Repair Parts
~ f(weather/snow)
Promoted Item
~f(discount, placement, advertisements)
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 2
Agenda
• Simple Linear Regression
• Regression in Spreadsheets
• Multiple Linear Regression
• Model Transformations
• Model Fit and Validity
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 3
Simple Linear Regression
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 4
Example: Simple Linear Regression
• Recall from earlier lecture on exponential smoothing
• Estimating initial parameters for Holt-Winter (level, trend, seasonality)
• Removed seasonality in order to estimate initial level and trend
yi 0 1 xi
220
Deseasoned Daily Bagel Demand
160
140
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 5
Simple Linear Regression
• The relationship is described in terms of a linear model
• The data (xi, yi) are the observed pairs from which we try to estimate
the Beta coefficients to find the ‘best fit’
• The error term, ε, is the ‘unaccounted’ or ‘unexplained’ portion
• The error terms are assumed to be iid ~N(0,σ)
Observed demand for period 97
220
= y97 = 204
y = -213.16 + 4.14x
Deseasoned Daily Bagel Demand
200
100
80 85 90 95 100
Time Period (Days)
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 6
Simple Linear Regression
• Residuals or Error Terms
Residuals, ei, are the difference of actual minus predicted values
Find the b’s that “minimize the residuals”
i 1 i i 1 yi yˆi i 1 yi b0 b1 xi
n 2 n 2 n 2
e
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 7
Simple Linear Regression
• Ordinary Least Squares (OLS) Regression
Finds the coefficients (b0 and b1) that minimize the sum of the squared error terms.
We can use partial derivatives to find the first order optimality condition with respect to
each variable.
i 1 i i 1 yi yˆi i 1 yi b0 b1 xi
n 2 n 2 n 2
e
220
y = -213.16 + 4.14x
Deseasoned Daily Bagel Demand
n 200
(xi x )( yi y)
b i1
1 n 180
(xi x ) 2
i1
160
b0 y b1 x 140
120
We know from the data:
x 89.5 y 157.4 100
80 85 90 95 100
Time Period (Days)
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 8
OLS Regression in Spreadsheet
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 9
n
(xi x )( yi y)
Regression – By Hand b1 i1
n
(xi x ) 2
i1
=B7-$B$22
=A7-$A$22
=C7*D7
=C7^2
=SUM(C2:C21) =SUM(D2:D21)
=SUM(E2:E21) =SUM(F2:F21)
=LINEST(A2:A21,B2:B21,1,1)
LINEST(known_y's, known_x's, constant, statistics)
Then, in the 'Insert Function' area, type the formula and press
the keyboard combination Ctrl+Shift+Enter (for Windows &
Linux) or command+shift+return (for Mac OS X).
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 11
Regression – Using LINEST b1
sb1
b0
sb0
n = number of observations R2 se
k = number of explanatory variables (NOT intercept)
F df
df = degrees of freedom (n-k-1)
SSR SSE
b0 = estimate of the intercept b y b x
0 1 n
(xi x )( yi y)
b1 = estimate of the slope (explanatory variable 1) b i1
1 n
(xi x )2
i1
n n n
yi y ŷi y yi ŷ
2 2 2
Total Sum of
Squares (SST) i1 i1 i1
R2 = Coefficient of Determination: the ratio of “explained” to total sum of squares where 0≤R2≤1
SSR SSR
R2
SST SSR SSE
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 12
Regression – Using LINEST b1
sb1
b0
sb0
y ŷ
n n 2
ei2
se
i1 i1 i i
n k 1 n k 1
sb0 = standard error of intercept sb1 = standard error of slope
1 x2 sb1 se
1
sb0 se n
n x x
x x
2 n 2
i i1 i
i1
=LINEST(A2:A21,B2:B21,1,1)
LINEST(known_y's, known_x's, constant, statistics)
=D2/D3
=TDIST(D9,$E$5,2)
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 14
Multiple Linear Regression
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 15
Example: Monthly Iced Coffee Sales
• Develop Forecasting Model #1
Level, trend, & avg. historical temperature
Develop OLS regression model
Yi 0 1 x1i 2 x2i i
DEMAND = LEVEL + TREND(period) + TEMP_EFFECT(temp)
Output b2 b1 b0
(0.27) 52.65 3,254.81 sb2 sb1 sb0
2.75 6.24 174.85
0.78 208.97 #N/A R2 se
36.36 21 #N/A
F df
3,175,996 917,074 #N/A
SSR SSE
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 16
Example: Monthly Iced Coffee Sales
b2 b1 b0
(0.27) 52.65 3,254.81
sb2 sb1 sb0 n= 24 observations
2.75 6.24 174.85 k = 2 variables
0.78 208.97 #N/A R2 se df = n-k-1 = 24-2-1= 21
36.36 21 #N/A F df
3,175,996 917,074 #N/A
SSR SSE
Yi 0 1x1i i
DEMAND = LEVEL + TREND(period)
52.55 3239.89 b1 b0
6.02 86.05 sb1 sb0
0.78 204.22 R2 se
76.14 22 F df
3175570 917500 SSR SSE
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 18
Example: Monthly Iced Coffee Sales
• Compare the goodness of fit between models
Model 1:
DEMAND = LEVEL + TREND(period) + TEMP_EFFECT(temp)
R2 = 0.77594
Model 2:
DEMAND = LEVEL + TREND(period)
R2 = 0.77584
n 1
adj R 1 1 R
2
n k 1
2
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 19
Transforming Variables
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 20
Example: Monthly Iced Coffee Sales
• Develop Forecasting Model #3
Level, trend, & school being open
Develop OLS regression model
Yi 0 1x1i 3 x3i i
DEMAND = LEVEL + TREND(period) + OPEN_EFFECT(open)
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 21
Example: Monthly Iced Coffee Sales (#3)
144.50 51.54 3156.12 b3 b1 b0
85.35 5.81 96.30 sb3 sb1 sb0 n= 24 observations
k = 2 variables
0.80276 196.07 #N/A R2 se
df = n-k-1 = 24-2-1= 21
42.74 21 #N/A F df
3285765 807305 #N/A SSR SSE
• How is the overall fit of the model?
R2 = 0.80276 with adj R2 ≅ 0.78398 (better than #1 or #2)
• Are the individual variables statistically significant?
Run t-tests for each variable and the intercept
Intercept and trend coefficients are strongly significant, school flag is borderline
intercept trend school in session
b0 3156 b1 51.54 b3 144.50
tb0 32.77 tb1 8.87 tb3 1.69
sb0 96.3 sb1 5.81 sb3 85.35
P-value =TDIST(32.77, 21, 2) < 0.0001 P-value =TDIST(8.87, 21, 2) < 0.0001 P-value =TDIST(1.69, 21, 2) = 0.105
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 22
Model & Variable Transformations
• We are using linear regression, so how can we use dummy variables?
The model just needs to be linear in the parameters
For model #3: y = β0+β1(period)+β3(open_flag)
Many transformations can be used:
y 0 1 x1
y 0 1 x1 2 x12
y 0 1 ln(x1 )
y ax b ln( y) ln(a) bln(x)
y ax1b1 x2b2 ln( y) ln(a) b1 ln(x1 ) b2 ln(x2 )
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models
Model Validation
5
Number of Months
4
3
2
• Basic Checks 1
0
Goodness of Fit – look at the R2 values
50
100
150
200
250
-250
-200
-150
-100
-50
0
Individual coefficients – t-tests for p-value Residuals
Residuals
0
residuals -100 35 55 75 95
Does the standard deviation of the error terms differ -200
for different values of the independent variables? -300
-400
Autocorrelation – is there a pattern over time Temperature (F)
Residuals
Make sure dummy variables were not over specified 100
-
• Statistics Software (100) 0 2 4 6 8 10 12 14 16 18 20 22 24
(200)
Most packages check for all of these (300)
(400)
More sophisticated tests and remedies Time Periods
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 25
Modeling Results – which is best?
8
7 Model: Linear Equation
6 8
5
Sales
6
4
y = 0.4x + 4.5
Sales
3 4 R² = 0.16
2
1 2
0
0
0 2 4 6 8
0 2 4 6 8
Time
Time
6 6
Sales
Sales
4 y = 0.5x2 - 2.1x + 7 4
y = 1.3333x3 - 9.5x2 + 20.167x - 7
R² = 0.36 2 R² = 1
2
0 0
0 2 4 6 8 0 2 4 6 8
Time Time
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 27
Key Points Yi 0 1x1i 2 x2i i
• Regression finds correlations between
A single dependent variable (y)
One or more independent variables (x1, x2, …)
• Coefficients are estimates by minimizing the sum of the squares of
the errors
• Always test your model:
Goodness of fit (R2)
Statistical significance of coefficients (p-value)
• Some Warnings:
Correlation is not causation
Avoid over-fitting of data
• Why not use this instead of exponential smoothing?
All data treated the same
Amount of data required to store
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 28
CTL.SC1x -Supply Chain & Logistics Fundamentals
“Casey”
Photo courtesy Yankee Golden
Retriever Rescue (www.ygrr.org)
caplice@mit.edu
Image Sources
• "Pampers packages (2718297564)" by Elizabeth from Lansing, MI, USA - Pampers packagesUploaded by Dolovis. Licensed under
Creative Commons Attribution 2.0 via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:Pampers_packages_(2718297564).jpg#mediaviewer/File:Pampers_packages_(271829756
4).jpg
• "Kraft Coupon" by Julie & Heidi from West Linn & Gillette, USA - Grocery Coupons - Tearpad shelf display of coupons for Kraft
Macaroni & Cheese. Licensed under Creative Commons Attribution-Share Alike 2.0 via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:Kraft_Coupon.jpg#mediaviewer/File:Kraft_Coupon.jpg
• "Damaged car door" by Garitzko - Own work. Licensed under Public domain via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:Damaged_car_door.jpg#mediaviewer/File:Damaged_car_door.jpg
CTL.SC1x - Supply Chain and Logistics Fundamentals Lesson: Causal Forecasting Models 30