Chapter 14
Simple Regression Analysis
LEARNING OBJECTIVES
The overall objective of this chapter is to give you an understanding of bivariate
regression analysis, thereby enabling you to:
1. Compute the equation of a simple regression line from a sample of data and
interpret the slope and intercept of the equation.
2. Understand the usefulness of residual analysis in examining the fit of the
regression line to the data and in testing the assumptions underlying regression
analysis.
3. Compute a standard error of the estimate and interpret its meaning.
4. Compute a coefficient of determination and interpret it.
5. Test hypotheses about the slope of the regression model and interpret the results.
6. Estimate values of y using the regression model.
7. Develop a linear trend line and use it to forecast.
independent variable values that are outside the range of values used to construct the
model. MINITAB issues a warning when such an extrapolation is attempted. There are many
instances where the relationship between x and y is linear over a given interval, but
outside that interval the relationship becomes curvilinear or unpredictable. Of course,
with this caution having been given, many forecasters use such regression models to
extrapolate to values of x outside the domain of those used to construct the model. Such
forecasts are introduced in section 14.8, Using Regression to Develop a Forecasting
Trend Line. Whether the forecasts obtained under such conditions are any better than
"seat of the pants" or "crystal ball" estimates remains to be seen.
Residual analysis is a good way to show, graphically and
numerically, how the model relates to the data and the fact that it fits some
points more closely than others. A graphical or numerical analysis of residuals demonstrates that
the regression line fits the data in a manner analogous to the way a mean fits a set of
numbers: the regression line passes through the points such that the vertical distances
from the actual y values to the predicted values sum to zero. The fact that the
residuals sum to zero points out the need to square the errors (residuals) in order to get a
handle on total error. This leads to the sum of squares of error and then on to the standard
error of the estimate. In addition, students can learn why the process is called least
squares analysis (the slope and intercept formulas are derived by calculus such that the
sum of squares of error is minimized - hence "least squares"). Students can learn that by
examining the values of se, the residuals, r², and the t ratio to test the slope, they can begin
to make a judgment about the fit of the model to the data. Many of the chapter problems
ask the student to comment on these items (se, r², etc.).
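The residuals-sum-to-zero property is easy to demonstrate numerically. The short Python sketch below (an illustrative aid, not part of the text's solutions) fits the least squares line to the data of Problem 14.1 using the chapter's computational formulas and confirms the property:

```python
# Sketch: least squares residuals sum to zero, just as deviations
# from a mean do. Data are from Problem 14.1 below.
xs = [12, 21, 28, 8, 20]
ys = [17, 15, 22, 19, 24]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Slope and intercept from the computational formulas
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * (sum_x / n)

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(round(b1, 3), round(b0, 2))          # 0.162 16.51
print(round(abs(sum(residuals)), 10))      # 0.0 -- the line "balances" the points
```

The zero sum is exact in algebra; in floating point it shows up as a value on the order of 1e-14.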
It is my view that for many of these students, the most important facet of this
chapter lies in understanding the "buzz" words of regression, such as standard error of the
estimate and coefficient of determination, because they may encounter regression
again only as some type of computer printout to be deciphered. The concepts, then, may be
more important than the calculations.
CHAPTER OUTLINE

14.1
14.2
14.3  Residual Analysis
          Using Residuals to Test the Assumptions of the Regression Model
          Using the Computer for Residual Analysis
14.4  Standard Error of the Estimate
14.5  Coefficient of Determination
          Relationship Between r and r²
14.6  Hypothesis Tests for the Slope of the Regression Model and Testing the Overall Model
          Testing the Slope
          Testing the Overall Model
14.7  Estimation
          Confidence Intervals to Estimate the Conditional Mean of y: μy|x
          Prediction Intervals to Estimate a Single Value of y
14.8  Using Regression to Develop a Forecasting Trend Line
KEY TERMS
Coefficient of Determination (r²)
Confidence Interval
Dependent Variable
Deterministic Model
Heteroscedasticity
Homoscedasticity
Independent Variable
Least Squares Analysis
Outliers
Prediction Interval
Probabilistic Model
Regression Analysis
Residual
Residual Plot
Scatter Plot
Simple Regression
Standard Error of the Estimate (se)
Sum of Squares of Error (SSE)
SOLUTIONS TO CHAPTER 14
14.1
         x     y
        12    17
        21    15
        28    22
         8    19
        20    24

      [Scatter plot of y versus x omitted.]

      Σx = 89        Σy = 97        Σxy = 1,767
      Σx² = 1,833    Σy² = 1,935    n = 5

      b1 = SSxy/SSx = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]
         = [1,767 - (89)(97)/5] / [1,833 - (89)²/5] = 0.162

      b0 = Σy/n - b1(Σx/n) = 97/5 - (0.162)(89/5) = 16.51

      ŷ = 16.51 + 0.162x
14.2
         x      y
        140     25
        119     29
        103     46
         91     70
         65     88
         29    112
         24    128

      Σx = 571        Σy = 498        Σxy = 30,099
      Σx² = 58,293    Σy² = 45,154    n = 7

      b1 = SSxy/SSx = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]
         = [30,099 - (571)(498)/7] / [58,293 - (571)²/7] = -0.898

      b0 = Σy/n - b1(Σx/n) = 498/7 - (-0.898)(571/7) = 144.414

      ŷ = 144.414 - 0.898x
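The same computational formulas can be wrapped in a small helper function. As an illustrative check (not part of the printed solution), fitting it to the data of Problem 14.2 reproduces the negative slope found above:

```python
def fit_line(xs, ys):
    """Least squares slope (b1) and intercept (b0) from the computational formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    b1 = (sxy - sx * sy / n) / (sx2 - sx ** 2 / n)
    b0 = sy / n - b1 * (sx / n)
    return b1, b0

# Data from Problem 14.2: y decreases as x increases, so the slope is negative.
b1, b0 = fit_line([140, 119, 103, 91, 65, 29, 24],
                  [25, 29, 46, 70, 88, 112, 128])
print(round(b1, 3), round(b0, 2))   # -0.898 144.41
```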
14.3
      (Advertising) x    (Sales) y
            12.5            148
             3.7             55
            21.6            338
            60.0            994
            37.6            541
             6.1             89
            16.8            126
            41.2            379

      Σx = 199.5         Σy = 2,670          Σxy = 107,610.4
      Σx² = 7,667.15     Σy² = 1,587,328     n = 8

      b1 = SSxy/SSx = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]
         = [107,610.4 - (199.5)(2,670)/8] / [7,667.15 - (199.5)²/8] = 15.240

      b0 = Σy/n - b1(Σx/n) = 2,670/8 - (15.240)(199.5/8) = -46.292

      ŷ = -46.292 + 15.240x

14.4
      (Prime) x    (Bond) y
         16            5
          6           12
          8            9
          4           15
          7            7

      Σx = 41      Σy = 48      Σxy = 333
      Σx² = 421    Σy² = 524    n = 5

      b1 = SSxy/SSx = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]
         = [333 - (41)(48)/5] / [421 - (41)²/5] = -0.715

      b0 = Σy/n - b1(Σx/n) = 48/5 - (-0.715)(41/5) = 15.460

      ŷ = 15.460 - 0.715x
14.5
      Firm Births (x)    Bankruptcies (y)
           58.1               34.3
           55.4               35.0
           57.0               38.5
           58.5               40.1
           57.4               35.5
           58.0               37.9

      Σx = 344.4          Σy = 221.3        Σxy = 12,708.08
      Σx² = 19,774.78     Σy² = 8,188.41    n = 6

      b1 = SSxy/SSx = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]
         = [12,708.08 - (344.4)(221.3)/6] / [19,774.78 - (344.4)²/6] = 0.878

      b0 = Σy/n - b1(Σx/n) = 221.3/6 - (0.878)(344.4/6) = -13.503

      ŷ = -13.503 + 0.878x
14.6
        x       y
       5.65    213
       4.65    258
       3.96    297
       3.36    340
       2.95    374
       2.52    420
       2.44    426
       2.29    441
       2.15    460
       2.07    469
       2.17    434
       2.10    444

      Σx = 36.31         Σy = 4,576          Σxy = 12,766.71
      Σx² = 124.7931     Σy² = 1,825,028     n = 12

      b1 = SSxy/SSx = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]
         = [12,766.71 - (36.31)(4,576)/12] / [124.7931 - (36.31)²/12] = -72.328

      b0 = Σy/n - b1(Σx/n) = 4,576/12 - (-72.328)(36.31/12) = 600.186

      ŷ = 600.186 - 72.328x
14.7
Steel
99.9
97.9
98.9
87.9
92.9
97.9
100.6
104.9
105.3
108.6
New Orders
2.74
2.87
2.93
2.87
2.98
3.09
3.36
3.61
3.75
3.95
x = 994.8
y = 32.15
x2 = 99,293.28
y2 = 104.9815
xy = 3,216.652
n = 10
b1 =
SS xy
SS x
xy
x
x y
n
( x ) 2
n
3,216 .652
0.05557
b0 =
10
= -2.31307 + 0.05557 x
10
= -2.31307
14.8
         x     y
        15    47
         8    36
        19    56
        12    44
         5    21

      ŷ = 13.625 + 2.303x

      Residuals:
         x     y     Predicted (ŷ)    Residual (y - ŷ)
        15    47        48.1694          -1.1694
         8    36        32.0489           3.9511
        19    56        57.3811          -1.3811
        12    44        41.2606           2.7394
         5    21        25.1401          -4.1401

14.9
         x     y     Predicted (ŷ)    Residual (y - ŷ)
        12    17        18.4582          -1.4582
        21    15        19.9196          -4.9196
        28    22        21.0563           0.9437
         8    19        17.8087           1.1913
        20    24        19.7572           4.2428

      ŷ = 16.51 + 0.162x

14.10
         x      y
        140     25
        119     29
        103     46
         91     70
         65     88
         29    112
         24    128

      ŷ = 144.414 - 0.898x

14.11
         x       y     Predicted (ŷ)    Residual (y - ŷ)
        12.5    148      144.2053            3.7947
         3.7     55       10.0954           44.9047
        21.6    338      282.8873           55.1127
        60.0    994      868.0945          125.9055
        37.6    541      526.7236           14.2764
         6.1     89       46.6708           42.3292
        16.8    126      209.7364          -83.7364
        41.2    379      581.5868         -202.5868

      ŷ = -46.292 + 15.240x

14.12
         x     y     Predicted (ŷ)    Residual (y - ŷ)
        16     5         4.0259           0.9741
         6    12        11.1722           0.8278
         8     9         9.7429          -0.7429
         4    15        12.6014           2.3986
         7     7        10.4576          -3.4575

      ŷ = 15.460 - 0.715x

14.13
         x       y     Predicted (ŷ)    Residual (y - ŷ)
        58.1    34.3      37.4978          -3.1978
        55.4    35.0      35.1277          -0.1277
        57.0    38.5      36.5322           1.9678
        58.5    40.1      37.8489           2.2511
        57.4    35.5      36.8833          -1.3833
        58.0    37.9      37.4100           0.4900

      ŷ = -13.503 + 0.878x

      The residual for x = 58.1 is relatively large, but the residual for x = 55.4 is quite
      small.
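Residual tables like these can be generated mechanically from a fitted equation. The sketch below (illustrative only, using the rounded coefficients of Problem 14.12) shows the pattern; with rounded coefficients the residuals no longer sum to exactly zero, but they stay close to it:

```python
# Predicted values and residuals from the (rounded) model of Problem 14.12.
b0, b1 = 15.460, -0.715
xs = [16, 6, 8, 4, 7]
ys = [5, 12, 9, 15, 7]

predicted = [b0 + b1 * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

for x, y, p, r in zip(xs, ys, predicted, residuals):
    print(f"{x:3}  {y:3}  {p:8.4f}  {r:8.4f}")

# The small nonzero sum reflects the rounding of b0 and b1.
print(round(sum(residuals), 3))   # 0.015
```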
14.14
_x_ _ y_
5
47
7
38
11
32
12
24
19
22
25
10
Predicted ( y )
42.2756
38.9836
32.3997
30.7537
19.2317
9.3558
12
Residuals (y- y )
4.7244
-0.9836
-0.3996
-6.7537
2.7683
0.6442
= 50.506 - 1.646 x
14.15
Miles (x)
Cost y
1,245
2.64
425
2.31
1,346
2.45
973
2.52
255
2.19
865
2.55
1,080
2.40
296
2.37
= 2.2257 0.00025 x
y
( y )
2.5376
2.3322
2.5629
2.4694
2.2896
2.4424
2.4962
2.2998
(y- y )
.1024
-.0222
-.1128
.0506
-.0996
.1076
-.0962
.0702
14.16  se = √(SSE/(n - 2)) = √(46.5692/3) = 3.94

14.17  se = √(SSE/(n - 2)) = √(272.0/5) = 7.376

14.18  se = √(SSE/(n - 2)) = √(70,940/6) = 108.7

       Six out of eight (75%) of the sales estimates are within $108.7 million.

14.19  se = √(SSE/(n - 2)) = √(19.8885/3) = 2.575

       Four out of five (80%) of the estimates are within 2.575 of the actual rate for
       bonds. This amount of error is probably not acceptable to financial analysts.
14.23
         x       y     Predicted (ŷ)    Residual (y - ŷ)    (y - ŷ)²
        58.1    34.3      37.4978          -3.1978          10.2259
        55.4    35.0      35.1277          -0.1277           0.0163
        57.0    38.5      36.5322           1.9678           3.8722
        58.5    40.1      37.8489           2.2511           5.0675
        57.4    35.5      36.8833          -1.3833           1.9135
        58.0    37.9      37.4100           0.4900           0.2401

      SSE = Σ(y - ŷ)² = 21.3355

      se = √(SSE/(n - 2)) = √(21.3355/4) = 2.3095

      This standard error of the estimate indicates that the regression model is
      within ±2.3095(1,000) bankruptcies about 68% of the time. In this
      particular problem, 5/6 or 83.3% of the residuals are within this standard
      error of the estimate.
14.24
(y- y )2
(y- y )
4.7244
-0.9836
-0.3996
-6.7537
2.7683
0.6442
SSE =
se =
22.3200
.9675
.1597
45.6125
7.6635
.4150
2
y
(y- ) = 77.1382
( y y ) = 77.1382
SSE
=
n 2
77 .1382
= 4.391
4
14.25
      Residual (y - ŷ)    (y - ŷ)²
          .1024            .0105
         -.0222            .0005
         -.1129            .0127
          .0506            .0026
         -.0996            .0099
          .1076            .0116
         -.0962            .0093
          .0702            .0049

      SSE = Σ(y - ŷ)² = .0620

      se = √(SSE/(n - 2)) = √(.0620/6) = .1017

      The model produces estimates that are within ±.1017, or about 10 cents, 68% of the
      time. However, the range of milk costs is only 45 cents for these data.
14.26
      Volume (x)    Sales (y)
        728.6         10.5
        497.9         48.1
        439.1         64.8
        377.9         20.1
        375.5         11.4
        363.8        123.8
        276.3         89.0

      n = 7    Σx = 3,059.1    Σy = 367.7
      Σx² = 1,464,071.97    Σy² = 30,404.31    Σxy = 141,558.6

      b1 = -.1504    b0 = 118.257

      ŷ = 118.257 - .1504x

      SSE = Σy² - b0Σy - b1Σxy
          = 30,404.31 - (118.257)(367.7) - (-0.1504)(141,558.6) = 8,211.6245

      se = √(SSE/(n - 2)) = √(8,211.6245/5) = 40.526

      This is a relatively large standard error of the estimate given the size of the
      sales values (which range from 10.5 to 123.8).
14.27  r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 46.6399/[1,935 - (97)²/5] = .123

14.28  r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 272.121/[45,154 - (498)²/7] = .972

14.29  r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 70,940/[1,587,328 - (2,670)²/8] = .898

14.30  r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 19.8885/[524 - (48)²/5] = .685

14.31  r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 21.33547/[8,188.41 - (221.3)²/6] = .183
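The r² computations above all follow one formula: one minus SSE over the total variation in y. A short sketch (illustrative, using the quantities from Problem 14.30):

```python
# r² as 1 - SSE/SSyy, using the quantities from Problem 14.30.
sse = 19.8885
sum_y2 = 524     # Σy²
sum_y = 48       # Σy
n = 5

ss_yy = sum_y2 - sum_y ** 2 / n    # total variation in y
r_sq = 1 - sse / ss_yy

print(round(ss_yy, 1))   # 63.2
print(round(r_sq, 3))    # 0.685
```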
14.32
      Median Income (x)    CCI (y)
          37.415            116.8
          36.770             91.5
          35.501             68.5
          35.047             61.6
          34.700             65.9
          34.942             90.6
          35.887            100.0
          36.306            104.6
          37.005            125.4

      Σx = 323.573            Σy = 824.9          Σxy = 29,804.4505
      Σx² = 11,640.93413      Σy² = 79,718.79     n = 9

      b1 = SSxy/SSx = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n] = 19.2204

      b0 = Σy/n - b1(Σx/n) = -599.3674

      ŷ = -599.3674 + 19.2204x

      SSE = Σy² - b0Σy - b1Σxy
          = 79,718.79 - (-599.3674)(824.9) - (19.2204)(29,804.4505) = 1,283.13435

      se = √(SSE/(n - 2)) = √(1,283.13435/7) = 13.539

      r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 1,283.13435/[79,718.79 - (824.9)²/9] = .688

14.33  sb = se/√[Σx² - (Σx)²/n] = 3.94/√[1,833 - (89)²/5] = .2498

      b1 = 0.162

      H0: β1 = 0
      Ha: β1 ≠ 0

      α = .05    df = n - 2 = 5 - 2 = 3    t.025,3 = 3.182

      t = (b1 - β1)/sb = (0.162 - 0)/.2498 = 0.65

      Since the observed t = 0.65 < t.025,3 = 3.182, the decision is to fail to reject the
      null hypothesis.
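The two quantities in a slope test of this kind - the standard error of the slope and the t ratio - can be checked with a few lines of Python (an illustrative sketch reproducing Problem 14.33):

```python
import math

# t test of the slope for Problem 14.33 (data of Problem 14.1).
se = 3.94
sum_x2, sum_x, n = 1833, 89, 5
b1 = 0.162

sb = se / math.sqrt(sum_x2 - sum_x ** 2 / n)   # standard error of the slope
t = (b1 - 0) / sb                              # test statistic for H0: beta1 = 0

print(round(sb, 4))   # 0.2498
print(round(t, 2))    # 0.65
```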
14.34  sb = se/√[Σx² - (Σx)²/n] = 7.376/√[58,293 - (571)²/7] = .068145

      b1 = -0.898

      H0: β1 = 0
      Ha: β1 ≠ 0

      α = .01    df = n - 2 = 7 - 2 = 5    t.005,5 = 4.032

      t = (b1 - β1)/sb = (-0.898 - 0)/.068145 = -13.18

      Since the observed t = -13.18 < t.005,5 = -4.032, the decision is to reject the null
      hypothesis.
14.35  sb = se/√[Σx² - (Σx)²/n] = 108.7/√[7,667.15 - (199.5)²/8] = 2.095

      b1 = 15.240

      H0: β1 = 0
      Ha: β1 ≠ 0

      α = .10    df = n - 2 = 8 - 2 = 6    t.05,6 = 1.943

      t = (b1 - β1)/sb = (15.240 - 0)/2.095 = 7.27

      Since the observed t = 7.27 > t.05,6 = 1.943, the decision is to reject the
      null hypothesis.
14.36  sb = se/√[Σx² - (Σx)²/n] = 2.575/√[421 - (41)²/5] = .27963

      b1 = -0.715

      H0: β1 = 0
      Ha: β1 ≠ 0

      α = .05    df = n - 2 = 5 - 2 = 3    t.025,3 = 3.182

      t = (b1 - β1)/sb = (-0.715 - 0)/.27963 = -2.56

      Since the observed t = -2.56 > t.025,3 = -3.182, the decision is to fail to reject the
      null hypothesis.
14.37  sb = se/√[Σx² - (Σx)²/n] = 2.3095/√[19,774.78 - (344.4)²/6] = 0.926025

      b1 = 0.878

      H0: β1 = 0
      Ha: β1 ≠ 0

      α = .05    df = n - 2 = 6 - 2 = 4    t.025,4 = 2.776

      t = (b1 - β1)/sb = (0.878 - 0)/.926025 = 0.948

      Since the observed t = 0.948 < t.025,4 = 2.776, the decision is to fail to reject the
      null hypothesis.
14.38 F = 8.26 with a p-value of .021. The overall model is significant at α = .05 but
      not at α = .01. For simple regression, t = √F = √8.26 = 2.874.
      t.05,8 = 1.86 but t.01,8 = 2.896, so the slope is significant at α = .05 but not at
      α = .01.
14.39  x0 = 25

      For 95% confidence, α/2 = .025    df = n - 2 = 5 - 2 = 3    t.025,3 = 3.182

      x̄ = Σx/n = 89/5 = 17.8

      Σx = 89    Σx² = 1,833    se = 3.94

      ŷ ± t(α/2, n-2) · se · √[1/n + (x0 - x̄)²/(Σx² - (Σx)²/n)]

      20.55 ± 3.182(3.94)√[1/5 + (25 - 17.8)²/(1,833 - (89)²/5)]
          = 20.55 ± 3.182(3.94)(.63903) = 20.55 ± 8.01

      12.54 < E(y25) < 28.56
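The half-width of such a confidence interval for the conditional mean is a single expression and is easy to verify. An illustrative sketch using the numbers of Problem 14.39:

```python
import math

# Confidence interval half-width for E(y) at x0 = 25 (Problem 14.39).
t_crit = 3.182                   # t.025,3
se = 3.94
n, x_bar = 5, 17.8
ss_x = 1833 - 89 ** 2 / 5        # Σx² - (Σx)²/n
x0 = 25

half_width = t_crit * se * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_x)
print(round(half_width, 2))      # 8.01
```

For a prediction interval on a single y value, the radicand gains a leading "1 +" term, which widens the interval.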
14.40  x0 = 100

      For 90% confidence, α/2 = .05    df = n - 2 = 7 - 2 = 5    t.05,5 = 2.015

      x̄ = Σx/n = 571/7 = 81.57143

      Σx = 571    Σx² = 58,293    se = 7.377

      For a single value of y, the prediction interval is

      ŷ ± t(α/2, n-2) · se · √[1 + 1/n + (x0 - x̄)²/(Σx² - (Σx)²/n)]

      For x0 = 100:  ŷ = 144.414 - 0.898(100) = 54.614

      54.614 ± 2.015(7.377)√[1 + 1/7 + (100 - 81.57143)²/(58,293 - (571)²/7)]
          = 54.614 ± 2.015(7.377)(1.0825) = 54.614 ± 16.091

      38.523 < y < 70.705

      For x0 = 130:  ŷ = 144.414 - 0.898(130) = 27.674

      27.674 ± 2.015(7.377)√[1 + 1/7 + (130 - 81.57143)²/(58,293 - (571)²/7)]
          = 27.674 ± 2.015(7.377)(1.1589) = 27.674 ± 17.227

      10.447 < y < 44.901
14.41  x0 = 20

      For 98% confidence, α/2 = .01    df = n - 2 = 8 - 2 = 6    t.01,6 = 3.143

      x̄ = Σx/n = 199.5/8 = 24.9375

      Σx = 199.5    Σx² = 7,667.15    se = 108.8

      ŷ = -46.292 + 15.240(20) = 258.51

      Confidence interval for E(y20):

      ŷ ± t(α/2, n-2) · se · √[1/n + (x0 - x̄)²/(Σx² - (Σx)²/n)]

      258.51 ± (3.143)(108.8)√[1/8 + (20 - 24.9375)²/(7,667.15 - (199.5)²/8)]
          = 258.51 ± (3.143)(108.8)(.36614) = 258.51 ± 125.20

      133.31 < E(y20) < 383.71

      Prediction interval for a single value of y:

      ŷ ± t(α/2, n-2) · se · √[1 + 1/n + (x0 - x̄)²/(Σx² - (Σx)²/n)]

      258.51 ± (3.143)(108.8)√[1 + 1/8 + (20 - 24.9375)²/(7,667.15 - (199.5)²/8)]
          = 258.51 ± (3.143)(108.8)(1.0649) = 258.51 ± 364.16

      -105.65 < y < 622.67
14.42  x0 = 10

      For 99% confidence, α/2 = .005    df = n - 2 = 5 - 2 = 3    t.005,3 = 5.841

      x̄ = Σx/n = 41/5 = 8.20

      Σx = 41    Σx² = 421    se = 2.575

      ŷ = 15.460 - 0.715(10) = 8.31

      ŷ ± t(α/2, n-2) · se · √[1/n + (x0 - x̄)²/(Σx² - (Σx)²/n)]

      8.31 ± 5.841(2.575)√[1/5 + (10 - 8.2)²/(421 - (41)²/5)]
          = 8.31 ± 5.841(2.575)(.48807) = 8.31 ± 7.34

      0.97 < E(y10) < 15.65
14.43
Year
2001
2002
2003
2004
2005
Fertilizer
11.9
17.9
22.0
21.8
26.0
x = 10,015
x2= 20,060,055
b1 =
b0 =
SS xy
SS x
y = 99.6
y2 = 2097.26
xy
x
x y
n
( x ) 2
n
xy = 199,530.9
n=5
199 ,530 .9
y b x = 99 .6 3.21 10 ,015
n
26
= -6,409.71 + 3.21 x
(2008) = -6,409.71 + 3.21(2008) = 35.97
= -6,409.71
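Trend lines of this kind are ordinary least squares fits with the year as x. An illustrative sketch reproducing Problem 14.43 and its 2008 forecast:

```python
# Linear trend line on the raw year values (Problem 14.43), then a forecast.
years = [2001, 2002, 2003, 2004, 2005]
values = [11.9, 17.9, 22.0, 21.8, 26.0]

n = len(years)
sx, sy = sum(years), sum(values)
sxy = sum(x * y for x, y in zip(years, values))
sx2 = sum(x * x for x in years)

b1 = (sxy - sx * sy / n) / (sx2 - sx ** 2 / n)
b0 = sy / n - b1 * (sx / n)

forecast_2008 = b0 + b1 * 2008
print(round(b1, 2), round(b0, 2))   # 3.21 -6409.71
print(round(forecast_2008, 2))      # 35.97
```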
14.44
Year
1998
1999
2000
2001
2002
2003
2004
Fertilizer
5860
6632
7125
6000
4380
3326
2642
x = 14,007
x2= 28,028,035
b1 =
SS xy
SS x
27
y = 35,965
y2 = 202,315,489
xy
x
x y
xy = 71,946,954
n=7
71,946 ,954
n
( x ) 2
-678.9643
b0 =
y b x = 35 ,965
n
678 .9643
14 ,007
7
= 1,363,745.39
= 1,363,745.39 + -678.9643 x
(2007) = 1,363,745.39 + -678.9643(2007) = 1,064.04
14.45
      Year    Quarter    Cum. Quarter (x)    Sales (y)
      2003       1              1              11.93
                 2              2              12.46
                 3              3              13.28
                 4              4              15.08
      2004       1              5              16.08
                 2              6              16.82
                 3              7              17.60
                 4              8              18.66
      2005       1              9              19.73
                 2             10              21.11
                 3             11              22.21
                 4             12              22.94

      Σx = 78      Σy = 207.9          Σxy = 1,499.07
      Σx² = 650    Σy² = 3,755.2084    n = 12

      b1 = SSxy/SSx = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]
         = [1,499.07 - (78)(207.9)/12] / [650 - (78)²/12] = 1.033

      b0 = Σy/n - b1(Σx/n) = 207.9/12 - (1.033)(78/12) = 10.6105

      ŷ = 10.6105 + 1.033x

      Remember, this trend line was constructed using cumulative quarters. To forecast
      sales for the third quarter of 2007, we must convert this time frame to
      cumulative quarters. The third quarter of 2007 is quarter number 19 in our
      scheme:

      ŷ(19) = 10.6105 + 1.033(19) = 30.24
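The cumulative-quarter bookkeeping can be sketched in a few lines of Python (illustrative only):

```python
# Trend line on cumulative quarters (Problem 14.45) and a forecast for
# quarter 19, i.e. the third quarter of 2007.
quarters = list(range(1, 13))
sales = [11.93, 12.46, 13.28, 15.08, 16.08, 16.82,
         17.60, 18.66, 19.73, 21.11, 22.21, 22.94]

n = len(quarters)
sx, sy = sum(quarters), sum(sales)
sxy = sum(x * y for x, y in zip(quarters, sales))
sx2 = sum(x * x for x in quarters)

b1 = (sxy - sx * sy / n) / (sx2 - sx ** 2 / n)
b0 = sy / n - b1 * (sx / n)

forecast_q19 = b0 + b1 * 19
print(round(b1, 3))              # 1.033
print(round(forecast_q19, 2))    # 30.24
```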
14.46
         x     y
         5     8
         7     9
         3    11
        16    27
        12    15
         9    13

      Σx = 52      Σy = 83        Σxy = 865
      Σx² = 564    Σy² = 1,389    n = 6

      a)  b1 = 1.2853    b0 = 2.6941

          ŷ = 2.6941 + 1.2853x

      b)     x     y     Predicted (ŷ)
             5     8        9.1206
             7     9       11.6912
             3    11        6.5500
            16    27       23.2588
            12    15       18.1177
             9    13       14.2618

      c)  Residual (y - ŷ)    (y - ŷ)²
             -1.1206           1.2557
             -2.6912           7.2426
              4.4500          19.8025
              3.7412          13.9966
             -3.1176           9.7194
             -1.2618           1.5921

          SSE = 53.6089

          se = √(SSE/(n - 2)) = √(53.6089/4) = 3.661

      d)  r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 53.6089/[1,389 - (83)²/6] = .777

      e)  H0: β1 = 0
          Ha: β1 ≠ 0

          α = .01    df = n - 2 = 6 - 2 = 4    t.005,4 = 4.604

          sb = se/√[Σx² - (Σx)²/n] = 3.661/√[564 - (52)²/6] = .34389

          t = (b1 - β1)/sb = (1.2853 - 0)/.34389 = 3.74

          Since the observed t = 3.74 < t.005,4 = 4.604, the decision is to fail to reject
          the null hypothesis.

      f)
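Worked problems like 14.46 chain every quantity in the chapter together. The following end-to-end Python sketch (illustrative, not part of the manual's solution) reproduces each figure from the computational formulas:

```python
import math

# End-to-end check of Problem 14.46: fit, residuals, se, r², and the t ratio.
xs = [5, 7, 3, 16, 12, 9]
ys = [8, 9, 11, 27, 15, 13]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sx2 = sum(x * x for x in xs)
sy2 = sum(y * y for y in ys)

ss_x = sx2 - sx ** 2 / n
b1 = (sxy - sx * sy / n) / ss_x          # slope
b0 = sy / n - b1 * (sx / n)              # intercept

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum(r * r for r in residuals)      # sum of squares of error
se = math.sqrt(sse / (n - 2))            # standard error of the estimate
r_sq = 1 - sse / (sy2 - sy ** 2 / n)     # coefficient of determination
t = b1 / (se / math.sqrt(ss_x))          # t ratio for H0: beta1 = 0

print(round(b1, 4), round(b0, 4))  # 1.2853 2.6941
print(round(sse, 3))               # 53.609
print(round(se, 3))                # 3.661
print(round(r_sq, 3))              # 0.777
print(round(t, 2))                 # 3.74
```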
14.47
         x     y
        53     5
        47     5
        41     7
        50     4
        58    10
        62    12
        45     3
        60    11

      Σx = 416        Σy = 57      Σxy = 3,106
      Σx² = 22,032    Σy² = 489    n = 8

      a)  b1 = 0.355    b0 = -11.335

          ŷ = -11.335 + 0.355x

      b)     x     y     Predicted (ŷ)
            53     5        7.480
            47     5        5.350
            41     7        3.220
            50     4        6.415
            58    10        9.255
            62    12       10.675
            45     3        4.640
            60    11        9.965

      c)  Residual (y - ŷ)    (y - ŷ)²
             -2.480            6.1504
             -0.350            0.1225
              3.780           14.2884
             -2.415            5.8322
              0.745            0.5550
              1.325            1.7556
             -1.640            2.6896
              1.035            1.0712

          SSE = 32.4649

      d)  se = √(SSE/(n - 2)) = √(32.4649/6) = 2.3261

      e)  r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 32.4649/[489 - (57)²/8] = .608

      f)  H0: β1 = 0
          Ha: β1 ≠ 0

          α = .05    df = n - 2 = 8 - 2 = 6    t.025,6 = 2.447

          sb = se/√[Σx² - (Σx)²/n] = 2.3261/√[22,032 - (416)²/8] = 0.116305

          t = (b1 - β1)/sb = (0.355 - 0)/.116305 = 3.05

          Since the observed t = 3.05 > t.025,6 = 2.447, the decision is to reject the
          null hypothesis. The population slope is different from zero.

      g)  This model produces only a modest r² = .608. Almost 40% of the
          variance of y is unaccounted for by x. The range of y values is 12 - 3 = 9,
          and the standard error of the estimate is 2.33. Given this small range, the
          se is not small.
14.48
      Σx = 1,263        Σy = 417        Σxy = 88,288
      Σx² = 268,295     Σy² = 29,135    n = 6

      b1 = 0.209369    b0 = 25.42778

      SSE = Σy² - b0Σy - b1Σxy
          = 29,135 - (25.42778)(417) - (0.209369)(88,288) = 46.845468

      r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 46.845468/153.5 = .695
14.49
      Σx = 524        Σy = 215       Σxy = 15,125
      Σx² = 36,224    Σy² = 6,411    n = 8

      b1 = .5481    b0 = -9.026    se = 3.201

      a)  x0 = 60

          95% confidence:  α/2 = .025    df = n - 2 = 8 - 2 = 6    t.025,6 = 2.447

          x̄ = Σx/n = 524/8 = 65.5

          ŷ = -9.026 + .5481(60) = 23.86

          ŷ ± t(α/2, n-2) · se · √[1/n + (x0 - x̄)²/(Σx² - (Σx)²/n)]

          23.86 ± 2.447(3.201)√[1/8 + (60 - 65.5)²/(36,224 - (524)²/8)]
              = 23.86 ± 2.447(3.201)(.37537) = 23.86 ± 2.94

          20.92 < E(y60) < 26.80

      b)  x0 = 70

          ŷ = -9.026 + .5481(70) = 29.341

          ŷ ± t(α/2, n-2) · se · √[1 + 1/n + (x0 - x̄)²/(Σx² - (Σx)²/n)]

          29.341 ± 2.447(3.201)√[1 + 1/8 + (70 - 65.5)²/(36,224 - (524)²/8)]
              = 29.341 ± 2.447(3.201)(1.06567) = 29.341 ± 8.347

          20.994 < y < 37.688
14.50
Year
1
2
3
4
5
Cost
56
54
49
46
45
x = 15
x2= 55
b1 =
b0 =
34
y = 250
y2 = 12,594
SS xy
SS x
xy
x
x y
n
( x ) 2
(15 )( 250 )
5
= -3
(15 ) 2
55
5
720
y b x = 250
1
xy = 720
n=5
(3)
15
= 59
5
= 59 - 3 x
(7) = 59 - 3(7) = 38
14.51
      Σx = 21      Σy = 267        Σxy = 1,256
      Σx² = 101    Σy² = 15,971    n = 5

      b1 = 10.515625    b0 = 9.234375

      SSE = Σy² - b0Σy - b1Σxy
          = 15,971 - (9.234375)(267) - (10.515625)(1,256) = 297.7969

      r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 297.7969/1,713.2 = .826

      If a regression model had been developed to predict the number of cars sold
      from the number of salespeople, the model would have had an r² of 82.6%. The
      same would hold true for a model to predict the number of salespeople from the
      number of cars sold.
14.52
      n = 12    Σx = 548    Σy = 5,940
      Σx² = 26,592    Σy² = 3,211,546    Σxy = 287,908

      b1 = 10.626383    b0 = 9.728511

      ŷ = 9.728511 + 10.626383x

      SSE = Σy² - b0Σy - b1Σxy
          = 3,211,546 - (9.728511)(5,940) - (10.626383)(287,908) = 94,337.9762

      se = √(SSE/(n - 2)) = √(94,337.9762/10) = 97.1277

      r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 94,337.9762/271,246 = .652

      t = (10.626383 - 0) / (97.1277/√[26,592 - (548)²/12]) = 4.33

      If α = .01, then t.005,10 = 3.169. Since the observed t = 4.33 > t.005,10 = 3.169, the
      decision is to reject the null hypothesis.
14.53
Sales(y)
Number of Units(x)
17.1
7.9
4.8
4.7
4.6
4.0
2.9
2.7
2.7
12.4
7.5
6.8
8.7
4.6
5.1
11.2
5.1
2.9
y = 51.4
x2 = 538.97
y2 = 460.1
xy = 440.46
b1 = 0.92025
36
x = 64.3
n=9
b0 = -0.863565
= -0.863565 + 0.92025 x
SSE = y2 - b0 y - b1 xy =
460.1 - (-0.863565)(51.4) - (0.92025)(440.46) = 99.153926
r =
SSE
99 .153926
=1
2
166 .55 = .405
( y )
y2 n
14.54
207,596,350
Year
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
37
Total Employment
11,152
10,935
11,050
10,845
10,776
10,764
10,697
9,234
9,223
9,158
x = 19,995
y = 103,834
x2= 39,980,085
y2 = 1,084,268,984
b1 =
SS xy
SS x
xy
x
x y
n
( x ) 2
n
xy =
n=7
-239.188
b0 =
y b x = 103 ,834
n
10
( 239 .188 )
19 ,995
10
= 488,639.564 + -239.188 x
(2008) = 488,639.564 + -239.188(2008) = 8,350.30
= 488,639.564
14.55
1977
581
213
668
345
1476
1776
2003
666
214
496
204
1600
6278
x= 5059
y2 = 42,750,268
b1 =
b0 =
SS xy
SS x
y = 9458
xy = 14,345,564
xy
x
x y
n
( x ) 2
x2 = 6,280,931
n=6
(5059 )( 9358 )
6
= 3.1612
(5059 ) 2
6,280 ,931
6
13,593 ,272
y b x = 9458
n
38
(3.1612 )
5059
6
= -1089.0712
= -1089.0712 + 3.1612 x
for x = 700:
= 1076.6044
1
+
n
+ t /2,n-2se
( x0 x) 2
( x ) 2
2
x
= 843.167
SSE = y2 b0 y b1 xy =
42,750,268 (-1089.0712)(9458) (3.1612)(14,345,564) = 7,701,506.49
se =
SSE
=
n 2
39
= 1387.58
Confidence Interval =
1
(700 843 .167 ) 2
+
1123.757 + (2.776)(1387.58) 6
(5059 ) 2 =
6,280 ,931
6
1123.757 + 1619.81
-496.05 to 2743.57
H0: 1 = 0
Ha: 1 0
= .05
df = 4
3.1612 0
1387 .58
2.9736
.8231614
= 3.234
Since the observed t = 3.234 > t.025,4 = 2.776, the decision is to reject the null
hypothesis.
14.56
      Σx = 11.902      Σy = 516.8         Σxy = 1,202.867
      Σx² = 25.1215    Σy² = 61,899.06    n = 7

      b1 = 66.36277    b0 = -39.0071

      ŷ = -39.0071 + 66.36277x

      SSE = Σy² - b0Σy - b1Σxy
          = 61,899.06 - (-39.0071)(516.8) - (66.36277)(1,202.867) = 2,232.343

      se = √(SSE/(n - 2)) = √(2,232.343/5) = 21.13

      r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 2,232.343/[61,899.06 - (516.8)²/7]
         = 1 - .094 = .906

14.57
      Σx = 44,754          Σy = 17,314         Σxy = 59,852,571
      Σx² = 167,540,610    Σy² = 24,646,062    n = 13

      b1 = SSxy/SSx = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n] = .01835

      b0 = Σy/n - b1(Σx/n) = 17,314/13 - (.01835)(44,754/13) = 1,268.685

      ŷ = 1,268.685 + .01835x

      Developing a trend line for the x variable over time (periods x = 1 through 13):

      Σx = 91      Σy = 44,754          Σxy = 304,797
      Σx² = 819    Σy² = 167,540,610    n = 13

      b1 = SSxy/SSx = [304,797 - (91)(44,754)/13] / [819 - (91)²/13] = -46.5989

      b0 = Σy/n - b1(Σx/n) = 44,754/13 - (-46.5989)(91/13) = 3,768.81

      ŷ = 3,768.81 - 46.5989x

      ŷ(2007) = 3,768.81 - 46.5989(15) = 3,069.83    (the year 2007 is period 15)
14.58
      Σx = 323.3         Σy = 6,765.8          Σxy = 339,342.76
      Σx² = 29,629.13    Σy² = 7,583,144.64    n = 7

      b1 = SSxy/SSx = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n] = 1.82751

      b0 = 882.138

      ŷ = 882.138 + 1.82751x

      SSE = Σy² - b0Σy - b1Σxy
          = 7,583,144.64 - (882.138)(6,765.8) - (1.82751)(339,342.76) = 994,623.07

      se = √(SSE/(n - 2)) = √(994,623.07/5) = 446.01

      r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 994,623.07/[7,583,144.64 - (6,765.8)²/7]
         = 1 - .953 = .047

      H0: β1 = 0
      Ha: β1 ≠ 0

      α = .05    t.025,5 = 2.571

      SSxx = Σx² - (Σx)²/n = 29,629.13 - (323.3)²/7 = 14,697.29

      t = (b1 - 0)/(se/√SSxx) = (1.82751 - 0)/(446.01/√14,697.29) = 0.50

      Since the observed t = 0.50 < t.025,5 = 2.571, the decision is to fail to reject the
      null hypothesis.
14.59
      Σx = 608        Σy = 1,025        Σxy = 86,006
      Σx² = 49,584    Σy² = 152,711     n = 8

      b1 = 2.40107    b0 = -54.35604

      ŷ = -54.35604 + 2.40107x

      SSE = Σy² - b0Σy - b1Σxy
          = 152,711 - (-54.35604)(1,025) - (2.40107)(86,006) = 1,919.5146

      se = √(SSE/(n - 2)) = √(1,919.5146/6) = 17.886

      r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 1,919.5146/[152,711 - (1,025)²/8]
         = 1 - .09 = .91

      α = .01

      sb = se/√[Σx² - (Σx)²/n] = 17.886/√[49,584 - (608)²/8] = .30783

      t = (b1 - β1)/sb = (2.40107 - 0)/.30783 = 7.80

      Since the observed t = 7.80 > t.005,6 = 3.707, the decision is to reject the null
      hypothesis.
14.60 a)  ŷ = 67.2 - 0.0565x

      b)  For every unit of increase in the value of x, the predicted value of y
          decreases by .0565.

      c)  The t ratio for the slope is -5.50 with an associated p-value of .000. This is
          significant at α = .10. The t ratio is negative because the slope is negative and
          the numerator of the t ratio formula equals the slope minus zero.

      d)  r² is .627; that is, 62.7% of the variability of y is accounted for by x. This is
          only a modest proportion of predictability. The standard error of the estimate is
          10.32. This is best interpreted in light of the data and the magnitude of the
          data.

      e)  The F value, which tests the overall predictability of the model, is 30.25. For
          simple regression analysis, this equals the value of t², which is (-5.50)².

      f)  The negative sign of r is not a surprise, because the slope of the regression
          line is also negative, indicating an inverse relationship between x and y. In
          addition, taking the square root of r² = .627 yields .7906, which is the
          magnitude of the value of r, allowing for rounding error.
14.61 The F value for overall predictability is 7.12 with an associated p-value of .0205,
      which is significant at α = .05. It is not significant at α = .01. The
      coefficient of determination is .372, with an adjusted r² of .32. This represents
      very modest predictability. The standard error of the estimate is 982.219, which
      in units of 1,000 laborers means that about 68% of the predictions are within
      982,219 of the actual figures. The regression model is:
      Number of Union Members = 22,348.97 - 0.0524 Labor Force. For a labor force
      of 100,000 (thousand, actually 100 million), substitute x = 100,000 and get a
      predicted value of 17,108.97 (thousand), which is actually 17,108,970 union
      members.
14.62 The Residual Model Diagnostics from MINITAB indicate a relatively healthy set
      of residuals. The Histogram indicates that the error terms are generally normally
      distributed. This is somewhat confirmed by the nearly straight line in the Normal
      Plot of Residuals. However, the Residuals vs. Fits graph indicates that there may
      be some heteroscedasticity, with greater error variance for small x values.