Stats - Assignment 3

Assignment 3
Regression Analysis: PE versus GRE, ROE

The regression equation is
PE = 13.9 + 0.0691 GRE + 0.0083 ROE
Predictor      Coef  SE Coef      T      P
Constant     13.912    1.388  10.03  0.000
GRE         0.06909  0.01110   6.23  0.000
ROE         0.00827  0.04623   0.18  0.859

S = 2.75626   R-Sq = 62.5%   R-Sq(adj) = 59.5%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       2  316.19  158.09  20.81  0.000
Residual Error  25  189.92    7.60
Total           27  506.11

Source  DF  Seq SS
GRE      1  315.95
ROE      1    0.24

Unusual Observations
Obs  GRE      PE     Fit  SE Fit  Residual  St Resid
  4    8   8.671  14.595   0.769    -5.924   -2.24 R
  7   44  15.875  17.442   1.714    -1.567   -0.73 X
  9  261  32.046  31.970   2.623     0.076    0.09 X
 10   15  21.883  15.167   0.558     6.716    2.49 R
 26   15  20.533  15.047   0.710     5.486    2.06 R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.
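
As a quick consistency check on the output above, the summary statistics can be recomputed from the Analysis of Variance table. This is only a sketch: the numbers are copied from the table, and the variable names are made up for illustration.

```python
# Values copied from the Analysis of Variance table for PE versus GRE, ROE.
ss_regression = 316.19
ss_residual = 189.92
ss_total = 506.11
df_regression = 2
df_residual = 25

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

r_squared = ss_regression / ss_total         # share of variation explained
f_statistic = ms_regression / ms_residual    # overall F test
s = ms_residual ** 0.5                       # standard error of the regression

print(f"R-Sq = {100 * r_squared:.1f}%")
print(f"F    = {f_statistic:.2f}")
print(f"S    = {s:.3f}")
```

The recomputed values should agree with the printed R-Sq, F and S up to rounding in the table.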
1A.) Estimate (1). Interpret the estimated coefficients.

Model (1) PE = 13.9 + 0.0691 GRE + 0.0083 ROE
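A one-unit increase in the growth rate of earnings (GRE) is associated with a
0.0691 increase in the predicted PE ratio, holding ROE constant.
A one-unit increase in the return on equity (ROE) is associated with a 0.0083
increase in the predicted PE ratio, holding GRE constant. This coefficient is
not statistically significant (P = 0.859).
The constant, 13.9, is the predicted PE ratio when GRE and ROE are both zero.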
1B.) Estimate (2) and (2'). Interpret the estimated coefficients of
(2').
Model (2) lnPE = 2.59 + 0.00344 GRE + 0.00186 ROE
Model (2') lnPE = 2.74 + 0.00275 GRE + 0.00467 ROE - 0.176 lnE
A one-unit increase in GRE is associated with approximately a 0.275% increase
in the PE ratio (100 × 0.00275), holding ROE and earnings per share (E)
constant.
A one-unit increase in ROE is associated with approximately a 0.467% increase
in the PE ratio (100 × 0.00467), holding GRE and E constant.
A 1% increase in earnings per share is associated with approximately a 0.176%
decrease in the PE ratio, holding GRE and ROE constant (the coefficient on lnE
is an elasticity).
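1C.) Is (2) a special case of (2')? I.e., is there a restriction on the
value of one of the parameters of (2') that turns (2') into (2)? Does
this restriction seem to be valid in the data? (Later we will be able
formally to test whether the restriction is valid in the data.)
Yes. The restriction is that the coefficient on lnE equals zero (not that lnE
itself equals zero); imposing it turns (2') into (2). The restriction does not
appear to hold in the data: the estimated coefficient on lnE is -0.176, and
including lnE changes the estimated coefficients on GRE and ROE noticeably
(from 0.00344 to 0.00275 and from 0.00186 to 0.00467). A formal test requires
the standard error of the lnE coefficient.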
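2A.) Consider the following model (don’t estimate it!):
PE = β0 + β1 GRE + β2 ROE + β3 D + β4 D·GRE + u
Explain why the coding of the dummy variable in the Excel file you
were given is inappropriate. (Hint: What restrictions does this
model imply?)
The Excel file codes industry type as a single variable D that takes a
different, arbitrarily assigned number for each of the three industries.
Entering D in the model this way treats those codes as meaningful quantities:
the intercept changes by β3 and the GRE slope changes by β4 for each one-unit
change in D. If the codes are consecutive integers, this forces the difference
between the first and second industries to equal the difference between the
second and third, even though the ordering and spacing of the codes have no
economic meaning. A dummy variable should take only the values 0 or 1, and
three categories require two dummies plus an omitted baseline group, whereas
this equation contains only one variable for industry.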
2B.) Define two new dummy variables that code the type of industry,
with the consumer goods industry as the “omitted group” (the
baseline or benchmark industry). Referring to these new dummy
variables you have defined, state a model that explains the P/E ratio
as a function of the growth rate of earnings and the return on
equity, and allows the following to differ for the three types of
industry: (i) intercept, and (ii) response of the predicted P/E ratio to
a change in the growth rate of earnings.
Capital Goods (CG) = 1 for a company producing capital goods, 0 if not
producing capital goods
Investment Input (II) = 1 for a company producing investment inputs, 0 if not
producing investment inputs
PE=β0+β1×GRE+β2×ROE+β3×CG+β4×II+β5×CG×GRE+β6×II×GRE+u
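
A minimal sketch of how the two dummies and their interactions with GRE could be built from a categorical industry column. The industry labels and GRE values here are invented for illustration; the actual coding in the Excel file is not shown in this document.

```python
# Hypothetical industry labels and growth rates (not the assignment's data).
industries = ["consumer", "capital", "investment", "capital", "consumer"]
gre = [10.0, 25.0, -5.0, 40.0, 15.0]

rows = []
for industry, g in zip(industries, gre):
    cg = 1 if industry == "capital" else 0      # capital goods dummy
    ii = 1 if industry == "investment" else 0   # investment inputs dummy
    # Consumer goods is the omitted group: both dummies are 0 for it.
    rows.append({"CG": cg, "II": ii, "CG*GRE": cg * g, "II*GRE": ii * g})

for industry, row in zip(industries, rows):
    print(industry, row)
```

With this coding, consumer goods companies have zeros in all four new columns, so β0 and β1 describe the baseline industry and the other coefficients measure differences from it.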
2C.) Estimate the new model you stated in part b. Interpret the
coefficients in that model.
Regression Analysis: PE versus GRE, ROE, CG, II, CG*GRE, II*GRE

The regression equation is
PE = 13.6 + 0.0694 GRE + 0.0307 ROE - 0.26 CG - 1.24 II - 0.044 CG*GRE
     + 0.092 II*GRE

Predictor      Coef  SE Coef      T      P
Constant     13.630    1.677   8.13  0.000
GRE         0.06943  0.01239   5.61  0.000
ROE         0.03074  0.06166   0.50  0.623
CG           -0.263    2.548  -0.10  0.919
II           -1.238    1.652  -0.75  0.462
CG*GRE      -0.0443   0.1184  -0.37  0.712
II*GRE       0.0923   0.1049   0.88  0.389

S = 2.91463   R-Sq = 64.8%   R-Sq(adj) = 54.7%

Analysis of Variance
Source      DF       SS      MS     F      P
Regression   6  327.717  54.619  6.43  0.001
Total       27  506.114

Source  DF   Seq SS
GRE      1  315.946
ROE      1    0.243
CG       1    2.575
II       1    0.883
CG*GRE   1    1.494
II*GRE   1    6.576

Unusual Observations
Obs  GRE      PE     Fit  SE Fit  Residual  St Resid
  2  -14  11.596  10.725   2.598     0.871    0.66 X
  4    8   8.671  14.651   0.908    -5.980   -2.16 R
  7   44  15.875  16.294   2.881    -0.419   -0.95 X
  9  261  32.046  31.845   2.799     0.201    0.25 X
 10   15  21.883  15.575   0.851     6.308    2.26 R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.
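GRE (0.0694): for consumer goods companies (the omitted group), a one-unit
increase in GRE raises the predicted PE by 0.0694, holding ROE constant.
ROE (0.0307): a one-unit increase in ROE raises the predicted PE by 0.0307 in
every industry, holding GRE constant.
CG (-0.26): at GRE = 0 and a given ROE, the predicted PE of a capital goods
company is 0.26 lower than that of a consumer goods company (a shift in the
intercept).
II (-1.24): at GRE = 0 and a given ROE, the predicted PE of an investment
inputs company is 1.24 lower than that of a consumer goods company.
CG*GRE (-0.044): the effect of a one-unit increase in GRE on the predicted PE
is 0.044 smaller for capital goods companies than for consumer goods
companies, i.e. 0.0694 - 0.0443 = 0.0251.
II*GRE (0.092): the effect of a one-unit increase in GRE on the predicted PE
is 0.092 larger for investment inputs companies than for consumer goods
companies, i.e. 0.0694 + 0.0923 = 0.1617.
None of the four industry terms is statistically significant at conventional
levels (P-values range from 0.389 to 0.919).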
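
One way to read the estimated dummy and interaction coefficients is to collapse them into a separate intercept and GRE slope for each industry. The sketch below does that with the coefficients printed in the output; the dictionary and variable names are illustrative only.

```python
# Coefficients copied from the regression of PE on GRE, ROE, CG, II,
# CG*GRE and II*GRE (ROE is left out because it is common to all industries).
coef = {
    "const": 13.630, "GRE": 0.06943,
    "CG": -0.263, "II": -1.238,
    "CG*GRE": -0.0443, "II*GRE": 0.0923,
}

# (intercept, slope on GRE) implied for each industry.
industry_lines = {
    "consumer goods": (coef["const"], coef["GRE"]),
    "capital goods": (coef["const"] + coef["CG"],
                      coef["GRE"] + coef["CG*GRE"]),
    "investment inputs": (coef["const"] + coef["II"],
                          coef["GRE"] + coef["II*GRE"]),
}

for name, (intercept, slope) in industry_lines.items():
    print(f"{name:18s} intercept = {intercept:.3f}, GRE slope = {slope:.4f}")
```

The dummies shift the intercept and the interactions shift the slope, both relative to the consumer goods baseline.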
This problem explores seasonality. The data set called soft drink
sales.xls contains quarterly data on the number of cans of a certain
brand of soda sold (4 years = 16 quarters of data).
3A.) Make a “run chart” for these sales data. What features stand
out? (Definition: A run chart plots the data points of a time series
against time. In this application, we plot sales on the vertical axis,
and the quarter, going from 1 to 16, on the horizontal axis. Minitab
does this: use “stat” then “time series” then “time series plot” and
choose “simple.” See below for why we need a run chart.)
[Figure: Run Chart of sales. Vertical axis: sales (40 to 80); horizontal axis: Observation (1 to 16).]

Number of runs about median:     4          Number of runs up or down:       8
Expected number of runs:         9.00000    Expected number of runs:         10.33333
Longest run about median:        6          Longest run up or down:          3
Approx P-Value for Clustering:   0.00483    Approx P-Value for Trends:       0.07089
Approx P-Value for Mixtures:     0.99517    Approx P-Value for Oscillation:  0.92911

[Figure: Residual Plots for sales. Panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order.]
The data are seasonal, with a pattern that repeats every four quarters, and
sales rise on average over the 16 quarters.
3B.) Use the quarter (1 to 16) as a predictor variable, to capture the
trend in sales, and capture seasonality by defining dummy variables
for three of the four seasons (let fall be the “omitted group,” i.e.,
the “omitted season,” and include an intercept in the model). State
this model. Then estimate it, and report the results (give the
estimated coefficients, SER, $R^2$, and $\bar{R}^2$). Provide an
interpretation of each of the estimated coefficients.
Regression Analysis: sales versus quarter, spring, summer, winter

The regression equation is
sales = 33.4 + 2.24 quarter + 14.8 spring + 18.1 summer + 1.77 winter

Predictor    Coef  SE Coef      T      P
Constant   33.356    2.217  15.05  0.000
quarter    2.2394   0.1652  13.55  0.000
spring     14.779    2.116   6.98  0.000
summer     18.114    2.096   8.64  0.000
winter      1.768    2.148   0.82  0.428

S = 2.95560   R-Sq = 96.5%   R-Sq(adj) = 95.3%

Analysis of Variance
Source      DF       SS      MS      F      P
Regression   4  2684.67  671.17  76.83  0.000
Total       15  2780.76

Source   DF   Seq SS
quarter   1  1687.44
spring    1   199.73
summer    1   791.57
winter    1     5.92
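quarter (2.24): holding the season fixed, sales increase by 2.24 million cans
with each passing quarter (the trend).
winter (1.77): sales in a winter quarter are 1.77 million cans higher than in
a fall quarter, other things equal; this difference is not statistically
significant (P = 0.428).
spring (14.8): sales in a spring quarter are 14.8 million cans higher than in
a fall quarter, other things equal.
summer (18.1): sales in a summer quarter are 18.1 million cans higher than in
a fall quarter, other things equal.
Constant (33.4): the predicted sales, in millions of cans, for a fall quarter
at quarter = 0.
SER: 2.956 (S in the output; 96.09 is the residual sum of squares,
2780.76 - 2684.67)
R^2: 96.5%
Rbar^2: 95.3%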
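
The SER can be recovered from the Analysis of Variance table even though the Residual Error row is not shown. The sketch below assumes n = 16 observations and k = 4 regressors, as in the seasonal model; the variable names are illustrative.

```python
import math

# Values copied from the Analysis of Variance table for the sales regression.
ss_total = 2780.76
ss_regression = 2684.67
n = 16   # quarters of data
k = 4    # regressors: quarter, spring, summer, winter

ss_residual = ss_total - ss_regression    # residual sum of squares
df_residual = n - k - 1                   # 11 degrees of freedom
ser = math.sqrt(ss_residual / df_residual)

r_squared = 1 - ss_residual / ss_total
adj_r_squared = 1 - (ss_residual / df_residual) / (ss_total / (n - 1))

print(f"SSR    = {ss_residual:.2f}")
print(f"SER    = {ser:.4f}")
print(f"R^2    = {100 * r_squared:.1f}%")
print(f"Rbar^2 = {100 * adj_r_squared:.1f}%")
```

This makes the distinction explicit: the residual sum of squares is in squared units of sales, while the SER is its degrees-of-freedom-adjusted square root and is in millions of cans.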
3C.) Compute predicted sales for each quarter of the next year
(quarters 17 through 20). (Later in the course, we will (i) test
whether the seasonal dummies, taken together, explain enough of
the variation in sales to justify using up three additional degrees of
freedom, and (ii) quantify the uncertainty in our forecasts for the
fifth year!)
[Figure: Trend Analysis Plot for sales, Linear Trend Model, Yt = 42.12 + 2.22779*t. Series plotted: Actual, Fits, Forecasts. Vertical axis: sales (40 to 90); horizontal axis: Index (1 to 20).]
Trend Analysis for sales
Data sales
Length 16
NMissing 0
Fitted Trend Equation
Yt = 42.12 + 2.22779*t
Accuracy Measures
MAPE 12.9293
MAD 7.7813
MSD 68.3323
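Forecasts
Period  Forecast  Season
17       79.9925  Winter
18       82.2203  Spring
19       84.4481  Summer
20       86.6759  Fall
These forecasts come from the linear trend model alone (Yt = 42.12 +
2.22779*t); they do not include the seasonal effects estimated in 3B.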
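
For comparison, forecasts that also use the seasonal dummies can be computed directly from the coefficients estimated in 3B. This is a sketch under one assumption taken from the season labels in the forecast table: quarter 17 is a winter quarter, 18 spring, 19 summer and 20 fall. The function and dictionary names are made up for illustration.

```python
# Coefficients copied from the regression of sales on quarter and the
# seasonal dummies (fall is the omitted season, so its shift is 0).
intercept = 33.356
trend = 2.2394
season_shift = {"winter": 1.768, "spring": 14.779, "summer": 18.114, "fall": 0.0}

# Assumed season of each forecast quarter (follows the labels in the table).
season_of_quarter = {17: "winter", 18: "spring", 19: "summer", 20: "fall"}

def predict_sales(quarter, season):
    """Predicted sales (millions of cans) from the trend-plus-season model."""
    return intercept + trend * quarter + season_shift[season]

forecasts = {q: predict_sales(q, s) for q, s in season_of_quarter.items()}
for q, value in forecasts.items():
    print(q, season_of_quarter[q], round(value, 2))
```

Unlike the trend-only forecasts, these values move up and down with the season, which matches the pattern in the run chart.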
4A.) Re-estimate the model using the natural logarithm of sales as
the dependent variable (the semi-log or log-lin model). Provide an
interpretation of each of the estimated coefficients. (Hint:
Remember that $\ln X^{(2)} - \ln X^{(1)} \approx$ proportional change
from $X^{(1)}$ to $X^{(2)}$!)
Regression Analysis: lnSales versus quarter, winter, spring, summer

The regression equation is
lnSales = 3.63 + 0.0378 quarter + 0.0225 winter + 0.245 spring + 0.291 summer

Predictor      Coef   SE Coef      T      P
Constant    3.62686   0.03738  97.04  0.000
quarter    0.037821  0.002786  13.58  0.000
winter      0.02250   0.03622   0.62  0.547
spring      0.24463   0.03568   6.86  0.000
summer      0.29097   0.03535   8.23  0.000

S = 0.0498339   R-Sq = 96.5%   R-Sq(adj) = 95.2%

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       4  0.75172  0.18793  75.67  0.000
Residual Error  11  0.02732  0.00248
Total           15  0.77904

Source   DF   Seq SS
quarter   1  0.48314
winter    1  0.07596
spring    1  0.02435
summer    1  0.16827
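quarter (0.0378): holding the season fixed, sales grow by approximately 3.78%
with each passing quarter.
winter (0.0225): sales in a winter quarter are approximately 2.25% higher than
in a fall quarter; this difference is not statistically significant
(P = 0.547).
spring (0.245): sales in a spring quarter are approximately 24.5% higher than
in a fall quarter.
summer (0.291): sales in a summer quarter are approximately 29.1% higher than
in a fall quarter.
For coefficients as large as those on spring and summer the approximation is
rough; the exact percentage difference is 100 × (exp(b) - 1).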
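
The gap between the "100 × coefficient" reading and the exact percentage change in a semi-log model can be checked numerically. The sketch below uses the coefficients from the lnSales regression; the dictionary names are illustrative.

```python
import math

# Coefficients copied from the lnSales regression.
coef = {"quarter": 0.037821, "winter": 0.02250,
        "spring": 0.24463, "summer": 0.29097}

approx_pct = {}
exact_pct = {}
for name, b in coef.items():
    approx_pct[name] = 100 * b                    # log-difference approximation
    exact_pct[name] = 100 * (math.exp(b) - 1)     # exact percentage change
    print(f"{name:8s} approx = {approx_pct[name]:6.2f}%  "
          f"exact = {exact_pct[name]:6.2f}%")
```

The two readings are close for the small quarter and winter coefficients and diverge by several percentage points for spring and summer.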
4B.) Compute predicted sales for quarters 17 through 20 based on

the semi-log model (in millions of cans, not the logarithm of millions
of cans!).
[Figure: Trend Analysis Plot for lnSales, Linear Trend Model. Series plotted: Actual, Fits, Forecasts. Vertical axis: lnSales (3.7 to 4.6); horizontal axis: Index (1 to 20).]
Yt = 3.76744 + 0.0376963*t
Accuracy Measures
MAPE 3.14305
MAD 0.12828
MSD 0.01849
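Period  Forecast (lnSales)  Forecast (millions of cans)
17      4.40828             82.1281
18      4.44597             85.2826
19      4.48367             88.5591
20      4.52137             91.9615
The last column is exp(forecast of lnSales). These forecasts come from the
linear trend in lnSales alone and do not include the seasonal effects
estimated in 4A.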
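
Forecasts from the full semi-log model, including the seasonal dummies, can be sketched as follows. Two assumptions are made here: quarters 17 to 20 are winter, spring, summer and fall respectively (following the labels used for the level forecasts), and the log forecast is converted back with a plain exp(), which ignores any retransformation correction. The names are illustrative.

```python
import math

# Coefficients copied from the lnSales regression (fall is the omitted season).
intercept = 3.62686
trend = 0.037821
season_shift = {"winter": 0.02250, "spring": 0.24463,
                "summer": 0.29097, "fall": 0.0}

# Assumed season of each forecast quarter.
season_of_quarter = {17: "winter", 18: "spring", 19: "summer", 20: "fall"}

def predict_log_sales(quarter, season):
    """Predicted lnSales from the trend-plus-season semi-log model."""
    return intercept + trend * quarter + season_shift[season]

# Naive back-transformation from logs to millions of cans.
forecast_cans = {
    q: math.exp(predict_log_sales(q, s)) for q, s in season_of_quarter.items()
}
for q, value in forecast_cans.items():
    print(q, season_of_quarter[q], round(value, 2))
```

The key step is exponentiating the predicted log, since the question asks for millions of cans rather than the logarithm of millions of cans.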
5.) The data set called US life expectancy CENSUS.xls contains annual
data from 1970 to 2004 on life expectancy at birth in the US, for the
entire population, for whites, and for African-Americans (and for
each, broken down by male/female, for a total of nine series).
Source: Statistical Abstract of the United States.
Focus on one of the following four time series: life expectancy
(L) for white males (LMW), white females (LFW), black males (LMB),
or black females (LFB). Choose one of these series (or more than
one, if you are curious!) Estimate the following models:
(1) $L_t = \beta_0 + \beta_1 TIME_t + u_t$ (linear trend)
(2) $L_t = \beta_0 + \beta_1 TIME_t + \beta_2 (TIME_t)^2 + u_t$ (quadratic trend)
(3) $\ln L_t = \beta_0 + \beta_1 TIME_t + u_t$ (exponential growth => semilog model)
Estimate these three models (see hint on forecasting).
Interpret $\hat{\beta}_1$ (and for the second model, $\hat{\beta}_2$) for each model.
For LFW (White females)
Linear
B1: With each passing year, white female life expectancy increases by 0.1296
years.
[Figure: Trend Analysis Plot for LFW white-female, Linear Trend Model. Series plotted: Actual, Fits, Forecasts. Vertical axis: LFW white-female (75 to 83); horizontal axis: Year (1970 to 2012).]
Yt = 76.3847 + 0.129580*t
Accuracy Measures
MAPE 0.458203
MAD 0.357733
MSD 0.185317
[Figure: Residuals Versus the Order of the Data (response is LFW white-female). Vertical axis: Residual (-1.0 to 0.5); horizontal axis: Observation Order (1 to 35).]
Quadratic
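B1 and B2: In the quadratic model the yearly change in white female life
expectancy is not constant; it is approximately 0.2628 - 2 × 0.0037 × t years.
B1 = 0.2628 is the rate of increase at t = 0, and the negative B2 = -0.0037
means that this rate falls by about 0.0074 years with each passing year.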
[Figure: Trend Analysis Plot for LFW white-female, Quadratic Trend Model. Series plotted: Actual, Fits, Forecasts. Vertical axis: LFW white-female (75 to 81); horizontal axis: Year (1970 to 2012).]
Yt = 75.5633 + 0.262778*t - 0.00369996*t**2
Accuracy Measures
MAPE 0.285550
MAD 0.223888
MSD 0.071654
[Figure: Residuals Versus the Order of the Data. Vertical axis: Residual (-0.50 to 0.50); horizontal axis: Observation Order (1 to 35).]
Exponential
B1: With each passing year, white female life expectancy grows by
approximately 0.166% (the fitted growth factor is 1.00166 per year).

[Figure: Trend Analysis Plot for LFW white-female, Growth Curve Model. Series plotted: Actual, Fits, Forecasts. Vertical axis: LFW white-female (75 to 83); horizontal axis: Year (1970 to 2012).]
Yt = 76.3963 * (1.00166**t)
Accuracy Measures
MAPE 0.470117
MAD 0.367166
MSD 0.192047
[Figure: Residuals Versus the Order of the Data. Vertical axis: Residual (-1.0 to 0.5); horizontal axis: Observation Order (1 to 35).]
6A.) What are the shapes of the estimated trend lines for these
three models, from 1970 to 2015?
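The linear model is a straight line with a constant slope of about 0.13 years
of life expectancy per calendar year.
The quadratic model is concave (an inverted U): because the coefficient on
t**2 is negative, the curve rises at a decreasing rate, peaks near t = 35.5
(around 2005), and declines afterwards.
The exponential model is an upward-sloping curve that is slightly convex, but
with growth of only about 0.17% per year it is almost indistinguishable from a
straight line between 1970 and 2015.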
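
The shape of the quadratic trend can be checked from its fitted coefficients: the slope at time t is b1 + 2*b2*t, and the curve peaks where that slope is zero. The sketch assumes t = 1 corresponds to 1970, which is how the 35 annual observations appear to be indexed; the function and variable names are illustrative.

```python
# Coefficients copied from the fitted quadratic trend for LFW.
b0, b1, b2 = 75.5633, 0.262778, -0.00369996

def slope(t):
    """Instantaneous yearly change in fitted life expectancy at time t."""
    return b1 + 2 * b2 * t

t_peak = -b1 / (2 * b2)       # time index at which the slope is zero
year_peak = 1969 + t_peak     # assumes t = 1 in 1970

print(f"slope at t = 1 (1970):  {slope(1):.4f}")
print(f"slope at t = 46 (2015): {slope(46):.4f}")
print(f"peak at t = {t_peak:.2f}, i.e. around {year_peak:.1f}")
```

A positive slope early in the sample and a negative slope by 2015 is what makes the quadratic fit concave over 1970 to 2015.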
6B.) For each of the three models, examine the printout of the
residuals and the plot of the residuals in order (i.e., in time
sequence). Think of the residuals as estimates of the disturbances.
Our model is based on the premise that the disturbances are purely
random. Which model seems best to conform to that premise?
Briefly explain what you observe in the printouts/plots, and the
reasons for your conclusion. Based on the shapes and the residuals,
which model would you choose for short-term (ten years)
forecasting? (It will turn out that we can adjust LS to deal with
certain patterns of non-randomness in the disturbances.)
The residuals of the exponential model appear to be randomly scattered around
zero when plotted in time order, which suggests that the fitted line does not
systematically over- or under-predict in any part of the sample.
6C.) Below are the Census Bureau’s forecasts of life expectancy. For
your chosen variable, which model comes closest to matching these
forecasts? Obviously, we have to define what we mean by “close!”
Below are two criteria for evaluating “out-of-sample” forecasts
(RMSE and MAD). Use these criteria as a measure of how closely
your predictions match the Census Bureau’s forecasts (i.e., treat
these two years of forecasts as out-of-sample data).
       White   White    Black   Black
       male    female   male    female
2010   76.1    81.8     70.9    77.8
2015   78.0    83.8     71.9    78.9
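The linear and exponential models come closest to matching the Bureau’s
forecasts for white females. Taking t = 1 in 1970, so that 2010 and 2015 are
t = 41 and t = 46, the fitted equations give about 81.7 and 82.3 (linear) and
about 81.8 and 82.5 (exponential), against the Bureau’s 81.8 and 83.8. The
quadratic model does not make much logical sense for forecasting because its
fitted curve peaks around 2005 and then declines, implying that life
expectancy starts to fall.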
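
The RMSE and MAD criteria can be applied to the three fitted trend equations for white females as in the sketch below. It assumes t = 1 corresponds to 1970 (so 2010 is t = 41 and 2015 is t = 46) and treats the two Census Bureau forecasts as the out-of-sample data; all function and variable names are illustrative.

```python
import math

# Census Bureau forecasts for white females, from the table above.
census = {2010: 81.8, 2015: 83.8}

def t_index(year):
    """Time index used by the trend models, assuming t = 1 in 1970."""
    return year - 1969

# Fitted trend equations copied from the Minitab output for LFW.
models = {
    "linear": lambda t: 76.3847 + 0.129580 * t,
    "quadratic": lambda t: 75.5633 + 0.262778 * t - 0.00369996 * t ** 2,
    "exponential": lambda t: 76.3963 * 1.00166 ** t,
}

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mad(errors):
    return sum(abs(e) for e in errors) / len(errors)

results = {}
for name, model in models.items():
    errors = [census[year] - model(t_index(year)) for year in census]
    results[name] = (rmse(errors), mad(errors))
    print(f"{name:12s} RMSE = {results[name][0]:.3f}  MAD = {results[name][1]:.3f}")
```

Both criteria rank the models the same way here, with the quadratic trend furthest from the Bureau's forecasts because its fitted values are already declining by 2010.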
7A.) Consider the simple regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$. You
have $n_1$ observations corresponding to one value of X, say $X_1$.
The mean value of Y for these $n_1$ observations equals $\bar{Y}^{(1)}$. You also
have $n_2$ observations corresponding to another value of X, say $X_2$. The
mean value of Y for these $n_2$ observations equals $\bar{Y}^{(2)}$. These are
your only observations (there are only two values of X in your
sample).
Explain why it is possible to obtain the LS estimates $\hat{\beta}_0$ and $\hat{\beta}_1$
with this data set, while it would be impossible if there were only
one value of X in your data.
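With two distinct values of X there is variation in X, so the sample
determines a unique LS line: the line through the two points (X1, Ybar(1))
and (X2, Ybar(2)). With only one value of X, every observation sits above the
same point on the horizontal axis, so the sum of squared deviations of X from
its mean is zero and the LS slope formula divides by zero. Any line passing
through (X1, Ybar(1)) then gives the same sum of squared residuals, so the
slope and intercept cannot be determined separately.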
7B.) Show (prove, demonstrate) that with data corresponding to only
two values of X, the LS slope estimate will be
$\hat{\beta}_1 = \dfrac{\bar{Y}^{(2)} - \bar{Y}^{(1)}}{X_2 - X_1}$
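For any candidate line with intercept b0 and slope b1, the sum of squared
residuals splits into two parts, one for each value of X: the sum over group 1
of (Yi - b0 - b1 X1)^2 plus the sum over group 2 of (Yi - b0 - b1 X2)^2.
Within a group every observation has the same fitted value, and a sum of
squared deviations from a constant is smallest when that constant equals the
group mean. Because X1 and X2 differ, exactly one line has fitted value
Ybar(1) at X1 and Ybar(2) at X2, so it minimizes both parts at once and is the
LS line. Its slope is the rise over the run between those two points,
B1 = (Ybar(2) - Ybar(1)) / (X2 - X1). With only one value of X, any line
through the single point (X1, Ybar(1)) attains the minimum, so there is an
infinite set of best fit lines.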
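
The result can also be checked numerically. The sketch below uses a small made-up data set with only two X values and compares the usual LS slope formula with the difference in group means divided by the difference in X; none of these numbers come from the assignment.

```python
# Made-up data: n1 = 3 observations at X = 2 and n2 = 2 observations at X = 5.
x1, x2 = 2.0, 5.0
y_group1 = [3.0, 4.0, 5.0]
y_group2 = [9.0, 11.0]

xs = [x1] * len(y_group1) + [x2] * len(y_group2)
ys = y_group1 + y_group2
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Standard LS formulas for simple regression.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
slope_ls = s_xy / s_xx
intercept_ls = y_bar - slope_ls * x_bar

# Slope implied by the two group means.
mean1 = sum(y_group1) / len(y_group1)
mean2 = sum(y_group2) / len(y_group2)
slope_from_means = (mean2 - mean1) / (x2 - x1)

print(slope_ls, slope_from_means)
print(intercept_ls + slope_ls * x1, mean1)   # fitted value at X1 vs group mean
```

If all observations shared one X value, `s_xx` would be zero and the slope formula would fail with a division by zero, which is the numerical counterpart of 7A.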
8A.) Explain why adding variables to a model can only increase $R^2$.
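LS chooses the coefficients that minimize the sum of squared residuals (SSR).
When a variable is added, LS can always set its coefficient to zero and
reproduce the old fit, so the minimized SSR cannot increase. Because the total
sum of squares (TSS) does not depend on which regressors are included,
R^2 = 1 - SSR/TSS cannot decrease; it stays the same only if the new
variable's estimated coefficient is exactly zero.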
8B.) Explain why $\bar{R}^2$ increases if and only if adding a variable to a
model reduced SER.
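Rbar^2 = 1 - [SSR/(n - k - 1)] / [TSS/(n - 1)] = 1 - SER^2 / [TSS/(n - 1)].
The term TSS/(n - 1) is the sample variance of Y, which is the same for every
model, so Rbar^2 rises exactly when SER^2, and hence SER, falls. Adding an
irrelevant variable lowers SSR slightly but also uses up a degree of freedom;
if SSR falls proportionally less than n - k - 1 does, SER rises and Rbar^2
falls even though R^2 does not. Only a variable that reduces the error of the
regression by more than that increases Rbar^2 and reduces SER.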
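
The two PE regressions reported earlier give a concrete instance of this relationship. The sketch below recomputes Rbar^2 from S and the total sum of squares, using n = 28 (Total DF of 27 plus one); the dictionary keys and variable names are illustrative.

```python
# Shared quantities for both PE regressions.
ss_total = 506.11
n = 28
variance_of_y = ss_total / (n - 1)   # TSS / (n - 1), identical for both models

# S (the SER) and R-Sq as printed in the two regression outputs.
models = {
    "GRE, ROE": {"s": 2.75626, "r_sq": 0.625},
    "GRE, ROE, industry terms": {"s": 2.91463, "r_sq": 0.648},
}

adj_r_sq = {}
for name, stats in models.items():
    adj_r_sq[name] = 1 - stats["s"] ** 2 / variance_of_y
    print(f"{name:26s} S = {stats['s']:.3f}  R^2 = {stats['r_sq']:.3f}  "
          f"Rbar^2 = {adj_r_sq[name]:.3f}")
```

Adding the four industry terms raises R^2 from 62.5% to 64.8%, but S rises as well, so Rbar^2 falls, in line with the printed 59.5% and 54.7%.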
