
Chapter 5: Regression Analysis

8. Construct a scatter diagram for Takeaways and Yards Allowed in the 2000 NFL Data.xls worksheet.
Does there appear to be a linear relationship? Develop a regression model for predicting Yards Allowed
as a function of Takeaways. Explain the statistical significance of the model.
[Figure: scatter diagram of Yards Allowed vs. Takeaways with fitted linear trendline y = -32.412x + 6087.8, R² = 0.2171]

There appears to be a negative linear relationship: as takeaways increase, yards allowed tend to decrease. However, the
relationship is relatively weak (R² = 0.22).
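As a cross-check on the Excel output, the least-squares fit can be reproduced in a few lines of Python. This is a minimal sketch, assuming the 31 (Takeaways, Yards Allowed) pairs in the worksheet data table of this solution are the complete data set, listed in the same observation order. The helper name fit_simple_regression is hypothetical; it is not part of Excel or any library.

```python
# Minimal sketch: least-squares fit of Yards Allowed on Takeaways, standard library only.
# Data copied from the worksheet table in this solution (31 teams, same order).
takeaways = [30, 49, 31, 37, 18, 31, 44, 41, 22, 41, 25, 35, 35, 35, 28, 42,
             33, 29, 38, 30, 29, 29, 21, 25, 20, 23, 25, 21, 25, 20, 22]
yards_allowed = [3813, 3967, 4546, 5249, 5701, 4820, 5544, 4636, 5357, 4800,
                 5494, 4743, 4820, 4713, 5069, 5033, 4474, 4426, 5656, 4845,
                 5293, 6391, 5709, 5329, 5234, 5353, 5607, 5487, 5643, 5737,
                 4959]


def fit_simple_regression(x, y):
    """Hypothetical helper: return (intercept, slope, r, r_squared) for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar      # the fitted line passes through the means
    r = s_xy / (s_xx * s_yy) ** 0.5        # signed correlation; Excel's "Multiple R" is |r|
    return intercept, slope, r, r * r


b0, b1, r, r_squared = fit_simple_regression(takeaways, yards_allowed)
print(f"intercept = {b0:.6f}, slope = {b1:.8f}")
print(f"r = {r:.8f}, R^2 = {r_squared:.9f}")
```

The sign of r (negative) carries the direction of the relationship, while R² = r² measures its strength, which is why a clearly negative slope can coexist with a weak fit.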

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.46593399
R Square            0.217094483
Adjusted R Square   0.190097741
Standard Error      502.3197062
Observations        31

R Square is a low value, indicating that only 21.7% of the variation in Yards Allowed is explained by the number of Takeaways.

The coefficient of variation (Standard Error divided by the average Yards Allowed, 502.32 / 5,111.23) is 0.0983, which means that predicted values tend to be off by about 9.8%. This isn't so bad, even though the model doesn't explain the variation in Yards Allowed very well.

ANOVA
             df   SS            MS            F             Significance F
Regression    1   2029073.89    2029073.89    8.041506744   0.008248331
Residual     29   7317427.529   252325.0872
Total        30   9346501.419

             Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept    6087.762509    355.9876917      17.10104773    1.09108E-16   5359.685533    6815.839486
Takeaways    -32.41181776   11.42969583      -2.835755057   0.008248331   -55.78818323   -9.035452285
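The derived entries of the SUMMARY OUTPUT all follow from the ANOVA sums of squares. The sketch below recomputes them, assuming the standard formulas for a regression with one predictor; the inputs are copied from the output above, and mean_yards is the worksheet average of Yards Allowed.

```python
import math

# Inputs copied from the SUMMARY OUTPUT and the data table.
n = 31                      # observations
k = 1                       # number of predictors (Takeaways)
ss_regression = 2029073.89
ss_residual = 7317427.529
ss_total = ss_regression + ss_residual
mean_yards = 5111.23        # average Yards Allowed

r_squared = ss_regression / ss_total
# Adjusted R^2 compares mean squares instead of raw sums of squares.
adj_r_squared = 1 - (ss_residual / (n - k - 1)) / (ss_total / (n - 1))
ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)
f_stat = ms_regression / ms_residual
std_error = math.sqrt(ms_residual)          # standard error of the estimate
coef_of_variation = std_error / mean_yards  # typical error relative to the average

print(f"R^2 = {r_squared:.6f}, adjusted R^2 = {adj_r_squared:.6f}")
print(f"F = {f_stat:.4f}, standard error = {std_error:.2f}, CV = {coef_of_variation:.4f}")
```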

The regression equation is:

Predicted Yards Allowed = 6087.8 - 32.41 Takeaways

The intercept of 6087.8 means that the model predicts 6087.8 yards allowed for a team with no takeaways. This is an extrapolation, since every team in the data had between 18 and 49 takeaways.

The x-coefficient of -32.41 means that each additional takeaway is predicted to decrease Yards Allowed by 32.41 yards.

RESIDUAL OUTPUT

The Lagged Residuals column holds the residual of the previous observation and is used for the lagged residual plot.

Observation   Predicted Y    Residuals      Lagged Residuals
1             5115.407976    -1302.407976
2             4499.583439    -532.5834391   -1302.407976
3             5082.996159    -536.9961587   -532.5834391
4             4888.525252    360.4747478    -536.9961587
5             5504.34979     196.6502104    360.4747478
6             5082.996159    -262.9961587   196.6502104
7             4661.642528    882.3574721    -262.9961587
8             4758.877981    -122.8779812   882.3574721
9             5374.702519    -17.70251854   -122.8779812
10            4758.877981    41.12201884    -17.70251854
11            5277.467065    216.5329347    41.12201884
12            4953.348888    -210.3488877   216.5329347
13            4953.348888    -133.3488877   -210.3488877
14            4953.348888    -240.3488877   -133.3488877
15            5180.231612    -111.231612    -240.3488877
16            4726.466163    306.5338366    -111.231612
17            5018.172523    -544.1725232   306.5338366
18            5147.819794    -721.8197942   -544.1725232
19            4856.113434    799.8865656    -721.8197942
20            5115.407976    -270.4079765   799.8865656
21            5147.819794    145.1802058    -270.4079765
22            5147.819794    1243.180206    145.1802058
23            5407.114336    301.8856637    1243.180206
24            5277.467065    51.53293473    301.8856637
25            5439.526154    -205.5261541   51.53293473
26            5342.290701    10.70929922    -205.5261541
27            5277.467065    329.5329347    10.70929922
28            5407.114336    79.8856637     329.5329347
29            5277.467065    365.5329347    79.8856637
30            5439.526154    297.4738459    365.5329347
31            5374.702519    -415.7025185   297.4738459
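The first rows of the RESIDUAL OUTPUT can be reproduced directly from the regression equation. This is a sketch under the assumption that the unrounded Excel coefficients are used; predict_yards_allowed is a hypothetical helper name, and only observations 1 to 4 from the data table are included to keep it short.

```python
# Unrounded coefficients from the SUMMARY OUTPUT.
INTERCEPT = 6087.762509
SLOPE = -32.41181776


def predict_yards_allowed(takeaways):
    """Hypothetical helper: Predicted Yards Allowed = 6087.76 - 32.41 * Takeaways."""
    return INTERCEPT + SLOPE * takeaways


# (Takeaways, Yards Allowed) for observations 1-4 of the data table.
observations = [(30, 3813), (49, 3967), (31, 4546), (37, 5249)]

residuals = []
for t, (x, y) in enumerate(observations, start=1):
    predicted = predict_yards_allowed(x)
    residual = y - predicted                          # actual minus predicted
    lagged = residuals[-1] if residuals else None     # observation 1 has no lagged residual
    residuals.append(residual)
    print(t, round(predicted, 6), round(residual, 6), lagged)
```

Using the rounded equation (6087.8 - 32.41 Takeaways) instead would shift each prediction by up to a few tenths of a yard, which is negligible next to a standard error of about 502 yards.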

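The p-value of 0.00825 (Significance F, which equals the p-value for Takeaways in a simple regression) indicates that the regression is significant at the 1% level, even though it has a low coefficient of determination.

This provides strong statistical evidence that the number of takeaways is related to the number of yards allowed, even though there is a great deal of randomness in the relationship.

The formal hypotheses are:

H0: b1 = 0
H1: b1 ≠ 0

We reject H0 at any significance level greater than the p-value of 0.00825.

Suppose we were interested in the question: Is there significant statistical evidence to support the
conclusion that more takeaways lead to fewer yards allowed? In the language of our regression model,
this question amounts to asking whether b1 < 0.

H0: b1 ≥ 0
H1: b1 < 0

We would reject the null hypothesis if both (1) the estimated slope is negative (here it is -32.41) and (2) the appropriate one-tailed p-value (half of the two-tailed p-value for Takeaways in cell F54) is
less than the significance level α. We reject the null hypothesis for all significance levels greater than:

0.008248331 / 2 = 0.004124166

so we have the statistical evidence to say "yes" to the original question at both the 5% level and the 1% level.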
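The arithmetic behind both tests, and behind the 95% confidence interval for the slope, can be sketched as follows. The inputs are copied from the SUMMARY OUTPUT; the critical value t_crit is an assumption read from a t table (29 degrees of freedom, 0.025 in the upper tail), not something computed here.

```python
b1 = -32.41181776         # estimated slope (Takeaways coefficient)
se_b1 = 11.42969583       # standard error of the slope
p_two_tailed = 0.008248331
t_crit = 2.0452           # assumed: approx. t_{0.025, 29} from a t table

t_stat = b1 / se_b1                  # matches the "t Stat" column
f_stat = t_stat ** 2                 # with one predictor, F = t^2
p_one_tailed = p_two_tailed / 2      # valid for H1: b1 < 0 only because b1 is negative

ci_low = b1 - t_crit * se_b1
ci_high = b1 + t_crit * se_b1

for alpha in (0.05, 0.01):
    two_sided = "reject" if p_two_tailed < alpha else "do not reject"
    one_sided = "reject" if (b1 < 0 and p_one_tailed < alpha) else "do not reject"
    print(f"alpha = {alpha}: H0 b1 = 0 -> {two_sided}; H0 b1 >= 0 -> {one_sided}")

print(f"t = {t_stat:.4f}, F = {f_stat:.4f}, one-tailed p = {p_one_tailed:.9f}")
print(f"95% CI for the slope: ({ci_low:.2f}, {ci_high:.2f})")
```

Because the whole interval lies below zero, the confidence interval tells the same story as the tests: a negative slope is supported at the 5% level.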

[Figure: Residuals vs Takeaways. Residual (yards) on the vertical axis, Takeaways on the horizontal axis.]

The residual plot does not show evidence of nonlinearity (wavy pattern) or heteroscedasticity (megaphone pattern).
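A rough numeric companion to the visual check (not a formal test) is to compare the spread of the residuals for teams with fewer and more takeaways; a megaphone pattern would show up as very different spreads. This sketch assumes the residuals from the RESIDUAL OUTPUT and the Takeaways column of the data table, and the split at the average number of takeaways is an arbitrary, illustrative choice.

```python
import statistics

takeaways = [30, 49, 31, 37, 18, 31, 44, 41, 22, 41, 25, 35, 35, 35, 28, 42,
             33, 29, 38, 30, 29, 29, 21, 25, 20, 23, 25, 21, 25, 20, 22]
residuals = [
    -1302.407976, -532.5834391, -536.9961587, 360.4747478, 196.6502104,
    -262.9961587, 882.3574721, -122.8779812, -17.70251854, 41.12201884,
    216.5329347, -210.3488877, -133.3488877, -240.3488877, -111.231612,
    306.5338366, -544.1725232, -721.8197942, 799.8865656, -270.4079765,
    145.1802058, 1243.180206, 301.8856637, 51.53293473, -205.5261541,
    10.70929922, 329.5329347, 79.8856637, 365.5329347, 297.4738459,
    -415.7025185,
]

# Illustrative split point: the average number of takeaways (about 30.13).
split = statistics.mean(takeaways)
low_group = [e for x, e in zip(takeaways, residuals) if x < split]
high_group = [e for x, e in zip(takeaways, residuals) if x >= split]

for label, group in (("fewer takeaways", low_group), ("more takeaways", high_group)):
    print(f"{label}: n = {len(group)}, residual std dev = {statistics.stdev(group):.1f}")
```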

The lagged residual plot is shown below. There is no clear pattern of points residing primarily in diagonally opposing
quadrants, so we conclude that there is no autocorrelation. (Note that this is just an illustration of the technique, as there
is no time series here and autocorrelation is not a relevant concept.)

[Figure: Lagged Residuals vs. Residuals. Residuals on the horizontal axis, Lagged Residuals on the vertical axis.]
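The quadrant reading of the lagged residual plot can also be done by counting. This sketch pairs each residual with the previous one and counts pairs in same-sign quadrants (I and III, which would suggest positive autocorrelation) versus opposite-sign quadrants (II and IV, which would suggest negative autocorrelation). The residuals are copied from the RESIDUAL OUTPUT; as noted above, observation order is not a time order here, so this only illustrates the technique.

```python
residuals = [
    -1302.407976, -532.5834391, -536.9961587, 360.4747478, 196.6502104,
    -262.9961587, 882.3574721, -122.8779812, -17.70251854, 41.12201884,
    216.5329347, -210.3488877, -133.3488877, -240.3488877, -111.231612,
    306.5338366, -544.1725232, -721.8197942, 799.8865656, -270.4079765,
    145.1802058, 1243.180206, 301.8856637, 51.53293473, -205.5261541,
    10.70929922, 329.5329347, 79.8856637, 365.5329347, 297.4738459,
    -415.7025185,
]

# Pair each residual with the one before it: (lagged residual e_{t-1}, residual e_t).
pairs = list(zip(residuals[:-1], residuals[1:]))

# Same sign -> quadrant I or III; opposite sign -> quadrant II or IV.
same_sign = sum(1 for lagged, current in pairs if lagged * current > 0)
opposite_sign = len(pairs) - same_sign

print(f"{len(pairs)} pairs: {same_sign} same-sign, {opposite_sign} opposite-sign")
```

A roughly even split between the two kinds of quadrants, as here, is what "no clear pattern" looks like numerically.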

Takeaways   Yards Allowed
30          3,813
49          3,967
31          4,546
37          5,249
18          5,701
31          4,820
44          5,544
41          4,636
22          5,357
41          4,800
25          5,494
35          4,743
35          4,820
35          4,713
28          5,069
42          5,033
33          4,474
29          4,426
38          5,656
30          4,845
29          5,293
29          6,391
21          5,709
25          5,329
20          5,234
23          5,353
25          5,607
21          5,487
25          5,643
20          5,737
22          4,959
Averages:
30.13       5,111.23

