Analysis of Variance
Source          DF      SS      MS      F      P
Regression       2  316.19  158.09  20.81  0.000
Residual Error  25  189.92    7.60
Total           27  506.11

Source  DF  Seq SS
GRE      1  315.95
ROE      1    0.24
Yes, the restriction is that lnE equals 0. No, because E always affects PE.
2A.) Consider the following model (don’t estimate it!):
Explain why the coding of the dummy variable in the Excel file you
were given is inappropriate. (Hint: What restrictions does this
model imply?)
2B.) Define two new dummy variables that code the type of industry,
with the consumer goods industry as the “omitted group” (the
baseline or benchmark industry). Referring to these new dummy
variables you have defined, state a model that explains the P/E ratio
as a function of the growth rate of earnings and the return on
equity, and allows the following to differ for the three types of
industry: (i) intercept, and (ii) response of the predicted P/E ratio to
a change in the growth rate of earnings.
Capital Goods: CG = 1 for a company producing capital goods, 0 if not producing capital goods
PE=β0+β1×GRE+β2×ROE+β3×CG+β4×II+β5×CG×GRE+β6×II×GRE+u
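As a rough illustration of how the regressors in this model could be built, the sketch below constructs the two dummies and their interactions with GRE. The industry labels ("CONS", "CG", "II") and the sample numbers are hypothetical, not taken from the data file.

```python
import numpy as np

def design_matrix(gre, roe, industry):
    """Columns: constant, GRE, ROE, CG, II, CG*GRE, II*GRE.

    `industry` holds hypothetical labels: "CONS" (baseline), "CG", "II".
    """
    gre = np.asarray(gre, dtype=float)
    roe = np.asarray(roe, dtype=float)
    industry = np.asarray(industry)
    cg = (industry == "CG").astype(float)  # 1 for capital goods, else 0
    ii = (industry == "II").astype(float)  # 1 for the II industry, else 0
    return np.column_stack(
        [np.ones_like(gre), gre, roe, cg, ii, cg * gre, ii * gre]
    )

# Made-up rows, one per industry type, only to show the coding.
X = design_matrix([10.0, 12.0, 8.0], [15.0, 20.0, 18.0], ["CONS", "CG", "II"])
print(X)
```

For the baseline row every dummy and interaction column is zero, so its intercept is β0 and its GRE slope is β1. For a CG row the intercept is β0+β3 and the GRE slope is β1+β5.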
2C.) Estimate the new model you stated in part b. Interpret the
coefficients in that model.
Analysis of Variance
Source          DF       SS      MS     F      P
Regression       6  327.717  54.619  6.43  0.001
Residual Error  21  178.397   8.495
Total           27  506.114

Source  DF   Seq SS
GRE      1  315.946
ROE      1    0.243
CG       1    2.575
II       1    0.883
CG*GRE   1    1.494
II*GRE   1    6.576
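For each one-unit increase in GRE, the predicted PE rises by 0.0694 for a consumer goods company (the baseline industry), holding ROE fixed.
For each one-unit increase in ROE, the predicted PE rises by 0.0307, holding the other variables fixed.
At GRE = 0, the predicted PE of a company with II = 1 is 1.24 lower than that of a consumer goods company with the same ROE.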
3A.) Make a “run chart” for these sales data. What features stand
out? (Definition: A run chart plots the data points of a time series
against time. In this application, we plot sales on the vertical axis,
and the quarter, going from 1 to 16, on the horizontal axis. Minitab
does this: use “stat” then “time series” then “time series plot” and
choose “simple.” See below for why we need a run chart.)
[Figure: run chart of sales against Observation (quarters 1 to 16)]
Number of runs about median: 4
Expected number of runs: 9.00000
Longest run about median: 6
Approx P-Value for Clustering: 0.00483
Approx P-Value for Mixtures: 0.99517

Number of runs up or down: 8
Expected number of runs: 10.33333
Longest run up or down: 3
Approx P-Value for Trends: 0.07089
Approx P-Value for Oscillation: 0.92911
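The two "expected number of runs" figures can be reproduced with the standard runs-test formulas. The sketch below is a minimal illustration, assuming 8 of the 16 quarters lie above the median and 8 below (the split is not shown in the output). The function names are made up for this example, and dropping points that tie with the median is a simplifying assumption.

```python
from statistics import median

def count_runs_about_median(values):
    """Count runs of consecutive points on the same side of the median.

    Points exactly equal to the median are dropped (an assumption).
    """
    med = median(values)
    sides = [v > med for v in values if v != med]
    return 1 + sum(a != b for a, b in zip(sides, sides[1:]))

def expected_runs_about_median(n_above, n_below):
    """Expected runs about the median under pure randomness."""
    return 2 * n_above * n_below / (n_above + n_below) + 1

def expected_runs_up_down(n):
    """Expected number of up/down runs for n observations."""
    return (2 * n - 1) / 3

print(expected_runs_about_median(8, 8))  # assumed 8/8 split
print(expected_runs_up_down(16))
```

Seeing far fewer runs about the median than expected (4 against 9) is what drives the small p-value for clustering.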
[Figure: Residual Plots for sales. Panels: Normal Probability Plot (Percent against Residual), Versus Fits (Residual against Fitted Value), Frequency of the residuals, and Residual against Observation Order]
Analysis of Variance
Source          DF       SS      MS      F      P
Regression       4  2684.67  671.17  76.83  0.000
Residual Error  11    96.09    8.74
Total           15  2780.76

Source   DF   Seq SS
quarter   1  1687.44
spring    1   199.73
summer    1   791.57
winter    1     5.92
For every quarter that passes, sales increase by 2.24 million cans, holding the season fixed.
Winter sales are 1.77 million cans higher than sales in the omitted season (fall), holding the trend fixed.
Spring sales are 14.8 million cans higher than fall sales, holding the trend fixed.
Summer sales are 18.1 million cans higher than fall sales, holding the trend fixed.
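SER: 2.96 (the square root of the residual mean square, 8.74; 96.09 is the residual sum of squares, not the SER)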
R^2: 96.5%
Rbar^2: 95.3%
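These summary statistics can be recomputed from the ANOVA table for the sales regression. The sketch below is only a worked check; the function name and argument names are made up for this example.

```python
import math

def fit_stats(ss_reg, ss_res, n, k):
    """Return (SER, R^2, adjusted R^2) from ANOVA sums of squares.

    n is the number of observations, k the number of regressors
    (not counting the intercept).
    """
    ss_tot = ss_reg + ss_res
    df_res = n - k - 1
    ser = math.sqrt(ss_res / df_res)           # standard error of regression
    r2 = ss_reg / ss_tot
    adj_r2 = 1 - (ss_res / df_res) / (ss_tot / (n - 1))
    return ser, r2, adj_r2

# Sums of squares from the sales regression: 16 quarters, 4 regressors.
ser, r2, adj_r2 = fit_stats(2684.67, 96.09, n=16, k=4)
print(round(ser, 2), round(100 * r2, 1), round(100 * adj_r2, 1))
```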
[Figure: trend analysis plot of sales against Index (periods 1 to 20)]
Data sales
Length 16
NMissing 0
Yt = 42.12 + 2.22779*t
Accuracy Measures
MAPE 12.9293
MAD 7.7813
MSD 68.3323
Forecasts
Period Forecast
17 79.9925 - Winter
18 82.2203 - Spring
19 84.4481 - Summer
20 86.6759 - Fall
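The forecasts follow directly from the fitted trend line Yt = 42.12 + 2.22779*t. A minimal sketch, using the rounded coefficients as printed (so the last decimal places can differ slightly from Minitab's):

```python
def linear_trend(t, intercept=42.12, slope=2.22779):
    """Fitted linear trend for sales (millions of cans) at period t."""
    return intercept + slope * t

# Periods 17 to 20 are the four quarters after the sample ends.
forecasts = {t: linear_trend(t) for t in range(17, 21)}
for t, value in forecasts.items():
    print(t, round(value, 4))
```

Note that this trend-only model ignores the seasonal pattern, so it gives the same quarter-to-quarter increase regardless of season.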
4A.) Re-estimate the model using the natural logarithm of sales as
the dependent variable (the semi-log or log-lin model). Provide an
interpretation of each of the estimated coefficients. (Hint:
Remember that ln X(2) − ln X(1) ≈ proportional change from X(1) to X(2)!)
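The hint can be checked numerically. The sketch below compares the log difference with the proportional change for a made-up 5% increase, and also shows the exact percentage effect implied by a log-lin coefficient b; the numbers are illustrative only.

```python
import math

def log_difference(x1, x2):
    """ln X(2) - ln X(1)."""
    return math.log(x2) - math.log(x1)

def proportional_change(x1, x2):
    """(X(2) - X(1)) / X(1)."""
    return (x2 - x1) / x1

def exact_percent_effect(b):
    """Exact percent change in Y implied by a log-lin coefficient b."""
    return 100 * (math.exp(b) - 1)

# Hypothetical values: X rises from 100 to 105.
print(log_difference(100, 105), proportional_change(100, 105))
```

The approximation is close for small changes and gets worse as the change grows, which is why large dummy coefficients in a log-lin model are better read through exp(b) − 1.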
Analysis of Variance
Source          DF       SS       MS      F      P
Regression       4  0.75172  0.18793  75.67  0.000
Residual Error  11  0.02732  0.00248
Total           15  0.77904

Source   DF   Seq SS
quarter   1  0.48314
winter    1  0.07596
spring    1  0.02435
summer    1  0.16827
MSD 0.01849
[Figure: trend analysis plot of the natural logarithm of sales against Index (periods 1 to 20)]
Period Forecast
17 4.40828 = 82.1281 millions of cans
18 4.44597 = 85.2826 millions of cans
19 4.48367 = 88.5591 millions of cans
20 4.52137 = 91.9615 millions of cans
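Each log forecast is converted back to millions of cans by exponentiating. A minimal sketch of that step, using the log forecasts as printed:

```python
import math

# Forecasts of ln(sales) for periods 17 to 20, as printed above.
log_forecasts = {17: 4.40828, 18: 4.44597, 19: 4.48367, 20: 4.52137}

# exp() undoes the natural log and returns sales in millions of cans.
sales_forecasts = {t: math.exp(v) for t, v in log_forecasts.items()}
for t, value in sales_forecasts.items():
    print(t, round(value, 4))
```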
5.) The data set called US life expectancy CENSUS.xls contains annual
data from 1970 to 2004 on life expectancy at birth in the US, for the
entire population, for whites, and for African-Americans (and for
each, broken down by male/female, for a total of nine series).
Source: Statistical Abstract of the United States.
Linear
B1: As each year passes, predicted White Female Life Expectancy increases by 0.129580 years.
Trend Analysis Plot for LFW white-female
Linear Trend Model
Yt = 76.3847 + 0.129580*t
Accuracy Measures
MAPE 0.458203
MAD 0.357733
MSD 0.185317
[Figure: actual values, fits, and forecasts of LFW white-female against Year, 1970 to 2012]
[Figure: Residual against Observation Order, observations 1 to 35]
Quadratic
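B1: In the quadratic model the yearly change is not constant. The predicted change in White Female Life Expectancy per year is approximately 0.262778 − 2(0.00369996)t, so it starts near 0.26 years and shrinks as t grows.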
Trend Analysis Plot for LFW white-female
Quadratic Trend Model
Yt = 75.5633 + 0.262778*t - 0.00369996*t**2
Accuracy Measures
MAPE 0.285550
MAD 0.223888
MSD 0.071654
[Figure: actual values, fits, and forecasts of LFW white-female against Year, 1970 to 2012]
[Figure: Residual against Observation Order, observations 1 to 35]
Exponential
B1: As each year passes, White Female Life Expectancy increases by .165%.
Accuracy Measures
MAPE 0.470117
MAD 0.367166
MSD 0.192047
[Figure: trend plot of LFW white-female against Year, 1970 to 2012]
Residuals Versus the Order of the Data
(response is LFW white-female)
[Figure: Residual against Observation Order, observations 1 to 35]
6A.) What are the shapes of the estimated trend lines for these
three models, from 1970 to 2015?
6B.) For each of the three models, examine the printout of the
residuals and the plot of the residuals in order (i.e., in time
sequence). Think of the residuals as estimates of the disturbances.
Our model is based on the premise that the disturbances are purely
random. Which model seems best to conform to that premise?
Briefly explain what you observe in the printouts/plots, and the
reasons for your conclusion. Based on the shapes and the residuals,
which model would you choose for short-term (ten years)
forecasting? (It will turn out that we can adjust LS to deal with
certain patterns of non-randomness in the disturbances.)
The linear models come closest to matching the Bureau’s forecasts. The
quadratic model does not make much logical sense because it says that
life expectancy starts to decrease after 2005.
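The turning point of the quadratic trend can be worked out from the fitted equation Yt = 75.5633 + 0.262778*t - 0.00369996*t**2. The sketch below assumes t = 1 corresponds to 1970 (the first year of the data), which the printout does not state explicitly; the function names are made up for this example.

```python
def linear_fit(t):
    """Fitted linear trend for LFW white-female."""
    return 76.3847 + 0.129580 * t

def quadratic_fit(t):
    """Fitted quadratic trend for LFW white-female."""
    return 75.5633 + 0.262778 * t - 0.00369996 * t ** 2

def quadratic_turning_point(b1=0.262778, b2=-0.00369996):
    """Value of t where the quadratic trend peaks: -b1 / (2 * b2)."""
    return -b1 / (2 * b2)

t_peak = quadratic_turning_point()
year_peak = 1969 + t_peak  # assumes t = 1 is the year 1970
print(round(t_peak, 2), round(year_peak, 1))

# Compare the two trends at t = 46 (the year 2015 under the same assumption).
print(round(linear_fit(46), 2), round(quadratic_fit(46), 2))
```

Past the peak the quadratic fit bends downward while the linear fit keeps rising, which is the difference in shape that matters for a forecast out to 2015.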
7A.) Consider the simple regression model
Yi = β0 + β1×Xi + ui
When there are two data points with different values of X (X1 and X2), it is
possible to estimate the model because only one line passes through both
points.
Because there are two distinct values of X, LS gives a unique estimate of B1
and B0. With only one value of X, by contrast, a whole range of LS lines can
pass through that one point. Even if there are many values of Y at that single
X, every line through the point (X, Ybar) fits equally well, so there is an
infinite set of best fit lines and the slope cannot be pinned down.
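One way to see this is through the rank of the regressor matrix (a column of ones next to the X column): LS has a unique solution only when that matrix has rank 2. The sketch below uses made-up X values and a hypothetical helper name.

```python
import numpy as np

def design_rank(x):
    """Rank of the matrix [1, X] used in the simple regression."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    return int(np.linalg.matrix_rank(X))

# Every observation has the same X: the X column is a multiple of the ones
# column, so the intercept and slope cannot be separated.
print(design_rank([3.0, 3.0, 3.0]))

# Two distinct X values: both coefficients are identified.
print(design_rank([3.0, 5.0]))
```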
8A.) Explain why adding variables to a model can only increase R^2.
Each added variable gives LS one more coefficient to choose when fitting the
data points, so the fit can only stay the same or improve; therefore the R^2
cannot go down.
If variables are added that are not relevant, R^2 still does not decrease,
because LS can always set the new coefficient to zero and reproduce the old
fit, so the residual sum of squares can only stay the same or fall. What can
get worse is the Rbar^2 and the SER, which adjust for the lost degree of
freedom: they improve only if the new variable reduces the residual sum of
squares by enough to offset it.
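A small simulation can illustrate the point. The sketch below uses entirely made-up data (28 observations, one relevant regressor, one pure-noise regressor) and fits both models by least squares; it is an illustration under those assumptions, not a result from the problem set.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an LS fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - (resid @ resid) / ss_tot

def adjusted_r_squared(r2, n, k):
    """Rbar^2 for n observations and k regressors (excluding intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(0)   # fixed seed, simulated data only
n = 28
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)
junk = rng.normal(size=n)        # irrelevant by construction

r2_small = r_squared(x.reshape(-1, 1), y)
r2_big = r_squared(np.column_stack([x, junk]), y)
print(r2_small, r2_big)
print(adjusted_r_squared(r2_small, n, 1), adjusted_r_squared(r2_big, n, 2))
```

R^2 for the larger model is never below that of the smaller one, while the adjusted measure is free to move in either direction.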