
Assignment 3

Regression Analysis: PE versus GRE, ROE


The regression equation is
PE = 13.9 + 0.0691 GRE + 0.0083 ROE

Predictor      Coef   SE Coef      T      P
Constant     13.912     1.388  10.03  0.000
GRE         0.06909   0.01110   6.23  0.000
ROE         0.00827   0.04623   0.18  0.859

S = 2.75626 R-Sq = 62.5% R-Sq(adj) = 59.5%

Analysis of Variance

Source DF SS MS F P
Regression 2 316.19 158.09 20.81 0.000
Residual Error 25 189.92 7.60
Total 27 506.11

Source DF Seq SS
GRE 1 315.95
ROE 1 0.24

Unusual Observations

Obs  GRE      PE     Fit  SE Fit  Residual  St Resid
  4    8   8.671  14.595   0.769    -5.924   -2.24 R
  7   44  15.875  17.442   1.714    -1.567   -0.73 X
  9  261  32.046  31.970   2.623     0.076    0.09 X
 10   15  21.883  15.167   0.558     6.716    2.49 R
 26   15  20.533  15.047   0.710     5.486    2.06 R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

1A.) Estimate (1). Interpret the estimated coefficients.


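Model (1) PE = 13.9 + 0.0691 GRE + 0.0083 ROE

Holding ROE constant, a one-unit increase in the growth rate of earnings (GRE) is associated with a 0.0691 increase in the predicted P/E ratio.

Holding GRE constant, a one-unit increase in return on equity (ROE) is associated with a 0.0083 increase in the predicted P/E ratio.

The intercept, 13.9, is the predicted P/E ratio for a firm with GRE = 0 and ROE = 0.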

1B.) Estimate (2) and (2'). Interpret the estimated coefficients of (2').
Model (2) lnPE = 2.59 + 0.00344 GRE + 0.00186 ROE
Model (2’) lnPE = 2.74 + 0.00275 GRE + 0.00467 ROE - 0.176 lnE

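Holding ROE and E constant, a one-unit increase in the growth rate of earnings (GRE) is associated with an approximately 0.275% increase in the P/E ratio (100 × 0.00275).

Holding GRE and E constant, a one-unit increase in return on equity (ROE) is associated with an approximately 0.467% increase in the P/E ratio (100 × 0.00467).

Holding GRE and ROE constant, a 1% increase in earnings per share (E) is associated with an approximately 0.176% decrease in the P/E ratio; -0.176 is the elasticity of P/E with respect to E.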
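The percentage readings above follow the usual semi-log and log-log rules. A minimal Python sketch of that arithmetic using the Model (2') estimates; the function names are hypothetical, and the exact semi-log effect 100 × (e^b - 1) is printed only for comparison with the 100 × b approximation.

```python
import math

# Estimated coefficients of Model (2'):
# lnPE = 2.74 + 0.00275 GRE + 0.00467 ROE - 0.176 lnE
B_GRE, B_ROE, B_LNE = 0.00275, 0.00467, -0.176


def semilog_pct(b: float) -> float:
    """Approximate % change in PE for a one-unit change in a level regressor."""
    return 100 * b


def semilog_pct_exact(b: float) -> float:
    """Exact % change in PE for a one-unit change in a level regressor."""
    return 100 * (math.exp(b) - 1)


def elasticity_pct(b: float, pct_change_in_e: float = 1.0) -> float:
    """% change in PE when E rises by pct_change_in_e percent (log-log term)."""
    return 100 * ((1 + pct_change_in_e / 100) ** b - 1)


gre_pct = semilog_pct(B_GRE)
roe_pct = semilog_pct(B_ROE)
lne_pct = elasticity_pct(B_LNE)

print(f"GRE: {gre_pct:.3f}% (exact {semilog_pct_exact(B_GRE):.3f}%)")
print(f"ROE: {roe_pct:.3f}% (exact {semilog_pct_exact(B_ROE):.3f}%)")
print(f"1% rise in E: {lne_pct:.3f}%")
```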

1C.) Is (2) a special case of (2')? I.e., is there a restriction on the
value of one of the parameters of (2') that turns (2') into (2)? Does
this restriction seem to be valid in the data? (Later we will be able
formally to test whether the restriction is valid in the data.)

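Yes. Model (2) is Model (2') with the coefficient on lnE restricted to zero. The restriction does not appear to be valid in the data: the estimated coefficient on lnE is -0.176, not close to zero, and adding lnE noticeably changes the estimated coefficients on GRE (0.00344 to 0.00275) and ROE (0.00186 to 0.00467).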
2A.) Consider the following model (don’t estimate it!):

$$PE = \beta_0 + \beta_1 GRE + \beta_2 ROE + \beta_3 D + \beta_4 (D \cdot GRE) + u$$

Explain why the coding of the dummy variable in the Excel file you
were given is inappropriate. (Hint: What restrictions does this
model imply?)

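The coding is inappropriate because a single variable D assigns each of the three industries an arbitrary number instead of a 0/1 indicator. With that coding, the model forces both the intercept shift ($\beta_3 D$) and the change in the GRE slope ($\beta_4 D$) to be proportional to these arbitrary codes. For example, if the codes are equally spaced (such as 1, 2, 3), the model restricts the difference in intercepts, and in GRE slopes, between the first and second industry to equal the difference between the second and third, and relabeling the industries would change the fitted model. Dummy variables should take only the values 0 or 1, and three industries require two of them (with one industry as the omitted group), not one.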

2B.) Define two new dummy variables that code the type of industry,
with the consumer goods industry as the “omitted group” (the
baseline or benchmark industry). Referring to these new dummy
variables you have defined, state a model that explains the P/E ratio
as a function of the growth rate of earnings and the return on
equity, and allows the following to differ for the three types of
industry: (i) intercept, and (ii) response of the predicted P/E ratio to
a change in the growth rate of earnings.

CG (capital goods) = 1 if the company produces capital goods, 0 otherwise
II (investment inputs) = 1 if the company produces investment inputs, 0 otherwise

Consumer goods firms, the omitted group, have CG = II = 0.

$$PE = \beta_0 + \beta_1 GRE + \beta_2 ROE + \beta_3 CG + \beta_4 II + \beta_5 (CG \times GRE) + \beta_6 (II \times GRE) + u$$
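To make the coding concrete, here is a small Python sketch that maps an industry label to the (CG, II) pair and builds one row of regressors in the order of the model above. The label strings, the function names, and the example values 15 and 12 are hypothetical; the Excel file's own coding is not reproduced here.

```python
# Hypothetical industry labels; the Excel file's own coding is not reproduced here
INDUSTRIES = ("consumer goods", "capital goods", "investment inputs")


def industry_dummies(industry: str) -> tuple[int, int]:
    """Return (CG, II); consumer goods is the omitted group, so it maps to (0, 0)."""
    if industry not in INDUSTRIES:
        raise ValueError(f"unknown industry: {industry!r}")
    cg = int(industry == "capital goods")
    ii = int(industry == "investment inputs")
    return cg, ii


def design_row(gre: float, roe: float, industry: str) -> list[float]:
    """Regressors in model order: 1, GRE, ROE, CG, II, CG*GRE, II*GRE."""
    cg, ii = industry_dummies(industry)
    return [1.0, gre, roe, cg, ii, cg * gre, ii * gre]


print(design_row(15.0, 12.0, "capital goods"))
```

Two dummies plus an intercept cover all three industries; adding a third dummy for consumer goods would make the columns perfectly collinear with the intercept.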

2C.) Estimate the new model you stated in part b. Interpret the
coefficients in that model.

Regression Analysis: PE versus GRE, ROE, CG, II, CG*GRE, II*GRE

The regression equation is
PE = 13.6 + 0.0694 GRE + 0.0307 ROE - 0.26 CG - 1.24 II - 0.044 CG*GRE + 0.092 II*GRE

Predictor      Coef   SE Coef      T      P
Constant     13.630     1.677   8.13  0.000
GRE         0.06943   0.01239   5.61  0.000
ROE         0.03074   0.06166   0.50  0.623
CG           -0.263     2.548  -0.10  0.919
II           -1.238     1.652  -0.75  0.462
CG*GRE      -0.0443    0.1184  -0.37  0.712
II*GRE       0.0923    0.1049   0.88  0.389
S = 2.91463 R-Sq = 64.8% R-Sq(adj) = 54.7%

Analysis of Variance

Source DF SS MS F P
Regression 6 327.717 54.619 6.43 0.001
Residual Error 21 178.397 8.495
Total 27 506.114

Source DF Seq SS
GRE 1 315.946
ROE 1 0.243
CG 1 2.575
II 1 0.883
CG*GRE 1 1.494
II*GRE 1 6.576

Unusual Observations

Obs  GRE      PE     Fit  SE Fit  Residual  St Resid
  2  -14  11.596  10.725   2.598     0.871    0.66 X
  4    8   8.671  14.651   0.908    -5.980   -2.16 R
  7   44  15.875  16.294   2.881    -0.419   -0.95 X
  9  261  32.046  31.845   2.799     0.201    0.25 X
 10   15  21.883  15.575   0.851     6.308    2.26 R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

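GRE (0.0694): for consumer goods firms (the baseline industry), a one-unit increase in GRE raises the predicted P/E ratio by 0.0694, holding ROE constant.

ROE (0.0307): a one-unit increase in ROE raises the predicted P/E ratio by 0.0307, holding GRE and industry constant.

CG (-0.26): at GRE = 0 and the same ROE, capital goods firms have a predicted P/E ratio 0.26 lower than consumer goods firms.

II (-1.24): at GRE = 0 and the same ROE, investment input firms have a predicted P/E ratio 1.24 lower than consumer goods firms.

CG*GRE (-0.044): the effect of GRE on the predicted P/E ratio is 0.044 smaller for capital goods firms than for consumer goods firms, i.e. 0.0694 - 0.0443 = 0.0251 per unit of GRE.

II*GRE (+0.092): the effect of GRE on the predicted P/E ratio is 0.092 larger for investment input firms than for consumer goods firms, i.e. 0.0694 + 0.0923 = 0.1617 per unit of GRE.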
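Combining the coefficients gives each industry its own intercept and GRE slope, which is often the easiest way to read an interaction model. A minimal Python sketch using the 2C estimates; the dictionary keys and function name are illustrative.

```python
# Estimated coefficients from the 2C printout
COEF = {
    "const": 13.630, "GRE": 0.06943, "ROE": 0.03074,
    "CG": -0.263, "II": -1.238, "CG*GRE": -0.0443, "II*GRE": 0.0923,
}

# (CG, II) codes; consumer goods is the omitted group
GROUPS = {"consumer goods": (0, 0), "capital goods": (1, 0), "investment inputs": (0, 1)}


def industry_line(cg: int, ii: int) -> tuple[float, float]:
    """Intercept and GRE slope of the predicted P/E line for one industry (ROE held fixed)."""
    intercept = COEF["const"] + COEF["CG"] * cg + COEF["II"] * ii
    gre_slope = COEF["GRE"] + COEF["CG*GRE"] * cg + COEF["II*GRE"] * ii
    return intercept, gre_slope


lines = {name: industry_line(*codes) for name, codes in GROUPS.items()}
for name, (a, s) in lines.items():
    print(f"{name:18s} intercept = {a:7.3f}   GRE slope = {s:.4f}")
```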
This problem explores seasonality. The data set called soft drink
sales.xls contains quarterly data on the number of cans of a certain
brand of soda sold (4 years = 16 quarters of data).

3A.) Make a “run chart” for these sales data. What features stand
out? (Definition: A run chart plots the data points of a time series
against time. In this application, we plot sales on the vertical axis
and the quarter, going from 1 to 16, on the horizontal axis. Minitab
does this: use “stat”, then “time series”, then “time series plot”, and
choose “simple.” See below for why we need a run chart.)

[Figure: Run Chart of sales; sales on the vertical axis, observation 1 to 16 on the horizontal axis]

Number of runs about median: 4            Number of runs up or down: 8
Expected number of runs: 9.00000          Expected number of runs: 10.33333
Longest run about median: 6               Longest run up or down: 3
Approx P-Value for Clustering: 0.00483    Approx P-Value for Trends: 0.07089
Approx P-Value for Mixtures: 0.99517      Approx P-Value for Oscillation: 0.92911
[Figure: Residual Plots for sales; panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order]

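The series shows a clear seasonal pattern that repeats every four quarters, superimposed on an upward trend. The runs statistics agree: there are only 4 runs about the median versus 9 expected under randomness (approximate p-value for clustering 0.005), which is consistent with the upward trend.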
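The two “Expected number of runs” values in the printout match the standard formulas 2·n_above·n_below/(n_above + n_below) + 1 (9 when 8 points lie on each side of the median) and (2n - 1)/3 (10.33 for n = 16). A small Python sketch of the count about the median, run on a hypothetical trending series rather than the actual sales data; the tie rule is an assumption and may differ from Minitab's.

```python
from statistics import median


def runs_about_median(series: list[float]) -> tuple[int, float]:
    """Return (observed runs about the median, expected runs under randomness).

    Points equal to the median are dropped here, a simplifying assumption
    that may differ from Minitab's tie handling.
    """
    m = median(series)
    above = [x > m for x in series if x != m]
    if not above:
        raise ValueError("all observations equal the median")
    # A new run starts whenever the series crosses to the other side of the median
    runs = 1 + sum(1 for prev, cur in zip(above, above[1:]) if prev != cur)
    n_above = sum(above)
    n_below = len(above) - n_above
    expected = 2 * n_above * n_below / (n_above + n_below) + 1
    return runs, expected


def expected_runs_up_down(n: int) -> float:
    """Expected number of runs up or down in n random observations."""
    return (2 * n - 1) / 3


# Hypothetical trending, seasonal series of 16 quarters (not the soft drink data)
toy = [40, 45, 52, 47, 50, 55, 62, 57, 60, 65, 72, 67, 70, 75, 82, 77]
print(runs_about_median(toy), expected_runs_up_down(len(toy)))
```

A strong trend keeps early values below the median and later values above it, so the observed count falls well short of the expected count.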

3B.) Use the quarter (1 to 16) as a predictor variable, to capture the
trend in sales, and capture seasonality by defining dummy variables
for three of the four seasons (let fall be the “omitted group,” i.e.,
the “omitted season,” and include an intercept in the model). State
this model. Then estimate it, and report the results (give the
estimated coefficients, SER, $R^2$, and $\bar{R}^2$). Provide an
interpretation of each of the estimated coefficients.

Regression Analysis: sales versus quarter, spring, summer, winter

The regression equation is
sales = 33.4 + 2.24 quarter + 14.8 spring + 18.1 summer + 1.77 winter

Predictor     Coef  SE Coef      T      P
Constant    33.356    2.217  15.05  0.000
quarter     2.2394   0.1652  13.55  0.000
spring      14.779    2.116   6.98  0.000
summer      18.114    2.096   8.64  0.000
winter       1.768    2.148   0.82  0.428
S = 2.95560 R-Sq = 96.5% R-Sq(adj) = 95.3%

Analysis of Variance

Source DF SS MS F P
Regression 4 2684.67 671.17 76.83 0.000
Residual Error 11 96.09 8.74
Total 15 2780.76

Source DF Seq SS
quarter 1 1687.44
spring 1 199.73
summer 1 791.57
winter 1 5.92

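Model:
$$sales_t = \beta_0 + \beta_1\,quarter_t + \beta_2\,spring_t + \beta_3\,summer_t + \beta_4\,winter_t + u_t$$
where spring, summer, and winter are 0/1 dummies and fall is the omitted season.

Intercept (33.4): predicted sales in a fall quarter at quarter = 0, a baseline value outside the sample.

quarter (2.24): holding the season fixed, predicted sales rise by 2.24 million cans per quarter.

winter (1.77): after accounting for the trend, winter sales are predicted to be 1.77 million cans higher than fall sales (not statistically significant, p = 0.428).

spring (14.8): after accounting for the trend, spring sales are predicted to be 14.8 million cans higher than fall sales.

summer (18.1): after accounting for the trend, summer sales are predicted to be 18.1 million cans higher than fall sales.

SER: 2.956 (Minitab's S; 96.09 is the residual sum of squares, not the SER)

R^2: 96.5%

Rbar^2: 95.3%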
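The SER, $R^2$, and $\bar{R}^2$ reported above can be recovered from the ANOVA table, which is a quick way to confirm that the SER is Minitab's S rather than the residual sum of squares. A minimal Python check using the printed sums of squares, with n = 16 observations and k = 4 regressors besides the intercept:

```python
import math

# ANOVA quantities from the 3B printout
SSR = 96.09     # Residual Error SS
TSS = 2780.76   # Total SS
N, K = 16, 4    # observations; regressors excluding the intercept

ser = math.sqrt(SSR / (N - K - 1))                   # standard error of the regression (Minitab's S)
r2 = 1 - SSR / TSS
r2_adj = 1 - (SSR / (N - K - 1)) / (TSS / (N - 1))

print(f"SER = {ser:.3f}, R^2 = {r2:.3f}, adjusted R^2 = {r2_adj:.3f}")
```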

3C.) Compute predicted sales for each quarter of the next year
(quarters 17 through 20). (Later in the course, we will (i) test
whether the seasonal dummies, taken together, explain enough of
the variation in sales to justify using up three additional degrees of
freedom, and (ii) quantify the uncertainty in our forecasts for the
fifth year!)
[Figure: Trend Analysis Plot for sales, linear trend model Yt = 42.12 + 2.22779*t; actual values, fits, and forecasts plotted against index 1 to 20]

Trend Analysis for sales

Data sales
Length 16
NMissing 0

Fitted Trend Equation

Yt = 42.12 + 2.22779*t

Accuracy Measures

MAPE 12.9293
MAD 7.7813
MSD 68.3323

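Forecasts (linear trend only, no seasonal dummies)

Period  Forecast  Season
17      79.9925   Winter
18      82.2203   Spring
19      84.4481   Summer
20      86.6759   Fall

Because this trend-only fit ignores the seasonal dummies estimated in 3B, it does not reflect the large spring and summer effects found there.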
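For comparison, forecasts from the seasonal model estimated in 3B can be computed directly from its coefficients. This Python sketch assumes, following the season labels above, that quarters 17 to 20 are winter, spring, summer, and fall; under that assumption it gives about 73.19, 88.44, 94.02, and 78.14 million cans.

```python
# Estimates from 3B; fall is the omitted season
CONST, B_QUARTER = 33.356, 2.2394
SEASON_SHIFT = {"fall": 0.0, "winter": 1.768, "spring": 14.779, "summer": 18.114}

# Assumed season of each forecast quarter, following the labels in the trend forecast table
NEXT_YEAR = [(17, "winter"), (18, "spring"), (19, "summer"), (20, "fall")]

forecast = {q: CONST + B_QUARTER * q + SEASON_SHIFT[s] for q, s in NEXT_YEAR}
for q, s in NEXT_YEAR:
    print(f"quarter {q} ({s}): {forecast[q]:.2f} million cans")
```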
4A.) Re-estimate the model using the natural logarithm of sales as
the dependent variable (the semi-log or log-lin model). Provide an
interpretation of each of the estimated coefficients. (Hint:
Remember that
$$\ln X^{(2)} - \ln X^{(1)} \approx \frac{X^{(2)} - X^{(1)}}{X^{(1)}},$$
the proportional change from $X^{(1)}$ to $X^{(2)}$!)

Regression Analysis: lnSales versus quarter, winter, spring, summer

The regression equation is
lnSales = 3.63 + 0.0378 quarter + 0.0225 winter + 0.245 spring + 0.291 summer

Predictor      Coef   SE Coef      T      P
Constant    3.62686   0.03738  97.04  0.000
quarter    0.037821  0.002786  13.58  0.000
winter      0.02250   0.03622   0.62  0.547
spring      0.24463   0.03568   6.86  0.000
summer      0.29097   0.03535   8.23  0.000

S = 0.0498339 R-Sq = 96.5% R-Sq(adj) = 95.2%

Analysis of Variance

Source DF SS MS F P
Regression 4 0.75172 0.18793 75.67 0.000
Residual Error 11 0.02732 0.00248
Total 15 0.77904

Source DF Seq SS
quarter 1 0.48314
winter 1 0.07596
spring 1 0.02435
summer 1 0.16827

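Intercept (3.63): the predicted log of sales in a fall quarter at quarter = 0, i.e. about exp(3.62686) ≈ 37.6 million cans.

quarter (0.0378): holding the season fixed, predicted sales grow by about 3.78% per quarter.

winter (0.0225): after accounting for the trend, winter sales are predicted to be about 2.25% higher than fall sales.

spring (0.245): after accounting for the trend, spring sales are predicted to be about 24.5% higher than fall sales.

summer (0.291): after accounting for the trend, summer sales are predicted to be about 29.1% higher than fall sales.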
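The 100 × b rule is accurate only for small coefficients; the exact percentage difference implied by a log-lin coefficient b is 100 × (e^b - 1). A short Python comparison for the 4A estimates (the dictionary names are illustrative):

```python
import math

# Semi-log (log-lin) estimates from 4A; fall is the omitted season
COEFS = {"quarter": 0.037821, "winter": 0.02250, "spring": 0.24463, "summer": 0.29097}

approx_pct = {name: 100 * b for name, b in COEFS.items()}
exact_pct = {name: 100 * (math.exp(b) - 1) for name, b in COEFS.items()}

for name in COEFS:
    print(f"{name:8s} approx {approx_pct[name]:6.2f}%  exact {exact_pct[name]:6.2f}%")
```

For quarter and winter the two figures are close, but for spring and summer the exact differences are about 27.7% and 33.8%, noticeably larger than the approximations.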

4B.) Compute predicted sales for quarters 17 through 20 based on
the semi-log model (in millions of cans, not the logarithm of millions
of cans!).
[Figure: Trend Analysis Plot for lnSales, linear trend model Yt = 3.76744 + 0.0376963*t; actual values, fits, and forecasts plotted against index 1 to 20. Accuracy measures: MAPE 3.14305, MAD 0.12828, MSD 0.01849]

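Period  Forecast (lnSales)  Forecast (sales)
17      4.40828             82.1281 million cans
18      4.44597             85.2826 million cans
19      4.48367             88.5591 million cans
20      4.52137             91.9815 million cans

These forecasts come from a trend-only fit of lnSales and omit the seasonal dummies estimated in 4A.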
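As a sketch, the 4A model with seasonal dummies can be used instead: compute the predicted log for each quarter, then exponentiate. This assumes quarters 17 to 20 are winter, spring, summer, and fall (the labels used in 3C) and gives roughly 73.1, 94.8, 103.2, and 80.1 million cans. The simple back-transform used here applies no retransformation bias correction.

```python
import math

# Semi-log estimates from 4A; fall is the omitted season
CONST, B_QUARTER = 3.62686, 0.037821
SEASON_SHIFT = {"fall": 0.0, "winter": 0.02250, "spring": 0.24463, "summer": 0.29097}

# Assumed season of each forecast quarter, following the labels used in 3C
NEXT_YEAR = [(17, "winter"), (18, "spring"), (19, "summer"), (20, "fall")]

log_forecast = {q: CONST + B_QUARTER * q + SEASON_SHIFT[s] for q, s in NEXT_YEAR}
# Simple back-transform; no exp(s^2 / 2) bias correction is applied
level_forecast = {q: math.exp(v) for q, v in log_forecast.items()}

for q, s in NEXT_YEAR:
    print(f"quarter {q} ({s}): ln = {log_forecast[q]:.4f} -> {level_forecast[q]:.2f} million cans")
```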
5.) The data set called US life expectancy CENSUS.xls contains annual
data from 1970 to 2004 on life expectancy at birth in the US, for the
entire population, for whites, and for African-Americans (and for
each, broken down by male/female, for a total of nine series).
Source: Statistical Abstract of the United States.

Focus on one of the following four time series: life expectancy
(L) for white males (LMW), white females (LFW), black males (LMB),
or black females (LFB). Choose one of these series (or more than
one, if you are curious!). Estimate the following models:

(1) Linear trend:
$$L_t = \beta_0 + \beta_1 TIME_t + u_t$$
(2) Quadratic trend:
$$L_t = \beta_0 + \beta_1 TIME_t + \beta_2 (TIME_t)^2 + u_t$$
(3) Exponential growth (semilog model):
$$\ln L_t = \beta_0 + \beta_1 TIME_t + u_t$$

Estimate these three models (see hint on forecasting).
Interpret $\hat\beta_1$ (and, for the second model, $\hat\beta_2$) for each model.
For LFW (white females)

Linear
B1 (0.129580): each additional year raises predicted white female life expectancy by about 0.130 years.
Trend Analysis Plot for LFW white-female (Linear Trend Model): Yt = 76.3847 + 0.129580*t
Accuracy Measures: MAPE 0.458203, MAD 0.357733, MSD 0.185317
[Figure: actual values, fits, and forecasts of LFW plotted against year, 1970 to 2012]
[Figure: Residuals Versus the Order of the Data (response is LFW white-female), observations 1 to 35]

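Quadratic
B1 (0.262778) and B2 (-0.00370): the slope of the quadratic trend is B1 + 2·B2·t, so predicted life expectancy rises by about 0.26 years per year at the start of the sample (1970), and this annual gain shrinks by about 0.0074 years each year. By 2004 (t = 35) the gain is almost zero (about 0.004 years), and the fitted trend peaks at t ≈ 35.5, i.e. around 2004 to 2005.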
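The slope and turning point of the quadratic trend follow from differentiating it. A minimal Python sketch using the printed LFW coefficients, assuming Minitab's index t = 1 corresponds to 1970 (consistent with 35 observations for 1970 to 2004):

```python
# Quadratic trend estimates for LFW (t = 1 in 1970)
B0, B1, B2 = 75.5633, 0.262778, -0.00369996


def trend(t: float) -> float:
    """Fitted life expectancy from the quadratic trend."""
    return B0 + B1 * t + B2 * t**2


def slope(t: float) -> float:
    """Instantaneous yearly change in fitted life expectancy: B1 + 2*B2*t."""
    return B1 + 2 * B2 * t


t_peak = -B1 / (2 * B2)    # slope is zero here (a maximum, since B2 < 0)
year_peak = 1969 + t_peak  # convert the index back to a calendar year

print(f"slope in 1970: {slope(1):.4f}, slope in 2004: {slope(35):.4f}")
print(f"peak at t = {t_peak:.2f} (about {year_peak:.1f})")
```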
Trend Analysis Plot for LFW white-female (Quadratic Trend Model): Yt = 75.5633 + 0.262778*t - 0.00369996*t**2
Accuracy Measures: MAPE 0.285550, MAD 0.223888, MSD 0.071654
[Figure: actual values, fits, and forecasts of LFW plotted against year, 1970 to 2012]
[Figure: Residuals Versus the Order of the Data (response is LFW white-female), observations 1 to 35]
Exponential
B1: each additional year raises predicted white female life expectancy by about 0.166% (the fitted growth factor is 1.00166 per year).

Trend Analysis Plot for LFW white-female (Growth Curve Model): Yt = 76.3963 * (1.00166**t)
Accuracy Measures: MAPE 0.470117, MAD 0.367166, MSD 0.192047
[Figure: actual values, fits, and forecasts of LFW plotted against year, 1970 to 2012]
[Figure: Residuals Versus the Order of the Data (response is LFW white-female), observations 1 to 35]
6A.) What are the shapes of the estimated trend lines for these
three models, from 1970 to 2015?

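The linear model is a straight line with a constant upward slope.

The quadratic model is concave (an inverted U, since B2 < 0): it rises at a decreasing rate, peaks around 2004 to 2005, and then declines.

The exponential model curves slightly upward (convex), but with growth of about 0.17% per year it is almost indistinguishable from a straight line over this period.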

6B.) For each of the three models, examine the printout of the
residuals and the plot of the residuals in order (i.e., in time
sequence). Think of the residuals as estimates of the disturbances.
Our model is based on the premise that the disturbances are purely
random. Which model seems best to conform to that premise?
Briefly explain what you observe in the printouts/plots, and the
reasons for your conclusion. Based on the shapes and the residuals,
which model would you choose for short-term (ten years)
forecasting? (It will turn out that we can adjust LS to deal with
certain patterns of non-randomness in the disturbances.)

The residuals of the exponential model appear randomly scattered around zero over time and do not lean systematically in either direction, so this model seems to conform best to the premise that the disturbances are purely random.

6C.) Below are the Census Bureau’s forecasts of life expectancy. For
your chosen variable, which model comes closest to matching these
forecasts? Obviously, we have to define what we mean by “close!”
Below are two criteria for evaluating “out-of-sample” forecasts
(RMSE and MAD). Use these criteria as a measure of how closely
your predictions match the Census Bureau’s forecasts (i.e., treat
these two years of forecasts as out-of-sample data).

Year  White male  White female  Black male  Black female
2010  76.1        81.8          70.9        77.8
2015  78.0        83.8          71.9        78.9

The linear and exponential models, whose trend lines are nearly straight, come closest to matching the Bureau’s forecasts. The quadratic model does not make much logical sense for forecasting, because it implies that life expectancy peaks around 2005 and declines afterward.
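A Python sketch of the comparison for LFW, assuming the standard definitions RMSE = square root of the mean squared forecast error and MAD = mean absolute forecast error, and assuming Minitab's trend index t = 1 corresponds to 1970, so 2010 is t = 41 and 2015 is t = 46. With the printed coefficients it gives RMSE/MAD of about 1.03/0.78 (linear), 3.05/2.83 (quadratic), and 0.95/0.69 (exponential).

```python
import math

# Census Bureau forecasts for white females (from the 6C table)
CENSUS_LFW = {2010: 81.8, 2015: 83.8}

# Trend models as printed by Minitab; t = 1 corresponds to 1970
MODELS = {
    "linear": lambda t: 76.3847 + 0.129580 * t,
    "quadratic": lambda t: 75.5633 + 0.262778 * t - 0.00369996 * t**2,
    "exponential": lambda t: 76.3963 * 1.00166**t,
}


def rmse(errors: list[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mad(errors: list[float]) -> float:
    """Mean absolute deviation of the errors."""
    return sum(abs(e) for e in errors) / len(errors)


scores = {}
for name, model in MODELS.items():
    errors = [model(year - 1969) - target for year, target in CENSUS_LFW.items()]
    scores[name] = (rmse(errors), mad(errors))
    print(f"{name:12s} RMSE = {scores[name][0]:.3f}  MAD = {scores[name][1]:.3f}")
```

Both nearly straight models undershoot the 2015 figure by roughly 1.3 to 1.5 years, while the quadratic model misses by almost 4 years because its trend turns down after 2005.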
7A.) Consider the simple regression model
$$Y_i = \beta_0 + \beta_1 X_i + u_i.$$
You have $n_1$ observations corresponding to one value of X, say $X_1$.
The mean value of Y for these $n_1$ observations equals $\bar{Y}^{(1)}$. You also have $n_2$
observations corresponding to another value of X, say $X_2$. The
mean value of Y for these $n_2$ observations equals $\bar{Y}^{(2)}$. These are
your only observations (there are only two values of X in your
sample).

Explain why it is possible to obtain the LS estimates $\hat\beta_0$ and $\hat\beta_1$
with this data set, while it would be impossible if there were only
one value of X in your data.

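With two distinct values $X_1 \neq X_2$, the X values vary, so the denominator of the LS slope formula, $\sum_i (X_i - \bar{X})^2$, is positive and the LS problem has a unique solution: the line through the two group means $(X_1, \bar{Y}^{(1)})$ and $(X_2, \bar{Y}^{(2)})$. There may be many values of Y at each X; what matters is that X varies. With only one value of X, $\sum_i (X_i - \bar{X})^2 = 0$, so $\hat\beta_1$ is undefined: every line through $(X_1, \bar{Y})$ gives the same sum of squared residuals, so there are infinitely many LS lines.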

7B.) Show (prove, demonstrate) that with data corresponding to only
two values of X, the LS slope estimate will be
$$\hat\beta_1 = \frac{\bar{Y}^{(2)} - \bar{Y}^{(1)}}{X_2 - X_1}.$$

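The LS estimates minimize
$$SSR(b_0, b_1) = \sum_{i \in \text{group 1}} (Y_i - b_0 - b_1 X_1)^2 + \sum_{i \in \text{group 2}} (Y_i - b_0 - b_1 X_2)^2.$$
Within each group the fitted value is a single number, $a_1 = b_0 + b_1 X_1$ or $a_2 = b_0 + b_1 X_2$, and a sum $\sum (Y_i - a)^2$ is minimized by setting $a$ equal to the group mean. So SSR can be no smaller than its value at $a_1 = \bar{Y}^{(1)}$, $a_2 = \bar{Y}^{(2)}$. Because $X_1 \neq X_2$, the two equations
$$b_0 + b_1 X_1 = \bar{Y}^{(1)}, \qquad b_0 + b_1 X_2 = \bar{Y}^{(2)}$$
have a unique solution, so the LS line passes exactly through both group means. Subtracting the first equation from the second gives $b_1 (X_2 - X_1) = \bar{Y}^{(2)} - \bar{Y}^{(1)}$, hence
$$\hat\beta_1 = \frac{\bar{Y}^{(2)} - \bar{Y}^{(1)}}{X_2 - X_1}.$$
With only one value of X, the second equation disappears and any line through $(X_1, \bar{Y}^{(1)})$ is an LS line, which is why the slope cannot be pinned down.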
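A quick numerical check of this result with NumPy's least-squares polynomial fit, on a small hypothetical sample with 3 observations at X1 = 2 and 4 observations at X2 = 5 (the data are made up for illustration):

```python
import numpy as np

# Hypothetical sample: n1 = 3 observations at X1 = 2 and n2 = 4 observations at X2 = 5
X1, X2 = 2.0, 5.0
x = np.array([X1, X1, X1, X2, X2, X2, X2])
y = np.array([1.0, 3.0, 2.0, 7.0, 9.0, 8.0, 8.0])

# Least-squares line; np.polyfit returns coefficients from highest degree down
slope, intercept = np.polyfit(x, y, deg=1)

# Slope implied by the two group means
ybar1, ybar2 = y[x == X1].mean(), y[x == X2].mean()
slope_from_means = (ybar2 - ybar1) / (X2 - X1)

print(slope, slope_from_means, intercept)
```

Here the group means are 2 and 8, so both routes give a slope of 2, and the fitted line passes through (2, 2) and (5, 8).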
8A.) Explain why adding variables to a model can only increase $R^2$.

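LS chooses the coefficients to minimize the sum of squared residuals (SSR). A model with an extra variable can always reproduce the smaller model's fit by setting the new coefficient to zero, so its minimized SSR can only be lower or the same. Since the total sum of squares (TSS) does not depend on the regressors, $R^2 = 1 - SSR/TSS$ can only rise or stay the same. For example, adding CG, II, CG*GRE, and II*GRE in 2C raised $R^2$ from 62.5% to 64.8%, even though none of these terms was statistically significant.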
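A small simulation makes the point concrete: adding a regressor that is pure noise, unrelated to y by construction, still cannot lower $R^2$. A minimal NumPy sketch with simulated data (the seed, sample size, and coefficients are arbitrary choices):

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 from an OLS fit of y on X; X must already contain a column of ones."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ssr = resid @ resid
    tss = ((y - y.mean()) ** 2).sum()
    return 1 - ssr / tss


rng = np.random.default_rng(seed=0)
n = 30
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
irrelevant = rng.normal(size=n)  # unrelated to y by construction

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, irrelevant])

r2_small = r_squared(X_small, y)
r2_big = r_squared(X_big, y)
print(f"R^2 without the extra variable: {r2_small:.4f}, with it: {r2_big:.4f}")
```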

8B.) Explain why $\bar{R}^2$ increases if and only if adding a variable to a
model reduces SER.

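$\bar{R}^2$ adjusts $R^2$ for degrees of freedom:
$$\bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{TSS/(n-1)} = 1 - \frac{SER^2}{s_Y^2},$$
where $SER^2 = SSR/(n-k-1)$ and $s_Y^2 = TSS/(n-1)$ is the sample variance of Y. Adding a variable does not change $s_Y^2$, so $\bar{R}^2$ rises exactly when $SER^2$ falls, i.e. if and only if SER falls. An irrelevant variable lowers SSR only slightly while using up a degree of freedom, so SER can rise and $\bar{R}^2$ fall: in 2C, SER rose from 2.756 to 2.915 and $\bar{R}^2$ fell from 59.5% to 54.7%.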
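Because $s_Y^2$ is fixed by the data, $\bar{R}^2$ is a decreasing function of SER alone. A small Python check with the two PE regressions above reproduces Minitab's R-Sq(adj) values of 59.5% and 54.7% from their S values:

```python
# Total sum of squares and sample size from the PE regressions (Total DF = 27)
TSS, N = 506.11, 28
S_Y2 = TSS / (N - 1)  # sample variance of PE; the same for every model of PE


def adj_r2_from_ser(ser: float) -> float:
    """Adjusted R^2 = 1 - SER^2 / s_Y^2."""
    return 1 - ser**2 / S_Y2


adj_model_1 = adj_r2_from_ser(2.75626)   # model (1): GRE, ROE
adj_model_2c = adj_r2_from_ser(2.91463)  # 2C: adds CG, II, CG*GRE, II*GRE
print(f"{adj_model_1:.3f} {adj_model_2c:.3f}")
```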
