
Chapter 14
Simple Regression Analysis

LEARNING OBJECTIVES
The overall objective of this chapter is to give you an understanding of bivariate
regression analysis, thereby enabling you to:
1. Compute the equation of a simple regression line from a sample of data and
   interpret the slope and intercept of the equation.
2. Understand the usefulness of residual analysis in examining the fit of the
   regression line to the data and in testing the assumptions underlying regression
   analysis.
3. Compute a standard error of the estimate and interpret its meaning.
4. Compute a coefficient of determination and interpret it.
5. Test hypotheses about the slope of the regression model and interpret the results.
6. Estimate values of y using the regression model.
7. Develop a linear trend line and use it to forecast.

CHAPTER TEACHING STRATEGY


This chapter is about all aspects of simple (bivariate, linear) regression. Early in
the chapter through scatter plots, the student begins to understand that the object of
simple regression is to fit a line through the points. Fairly soon in the process, the student
learns how to solve for slope and y intercept and develop the equation of the regression
line. Most of the remaining material on simple regression is to determine how good the
fit of the line is and if assumptions underlying the process are met.
The student begins to understand that by entering values of the independent
variable into the regression model, predicted values can be determined. The question
then becomes: Are the predicted values good estimates of the actual dependent values?
One rule to emphasize is that the regression model should not be used to predict for
independent variable values that are outside the range of values used to construct the
model. MINITAB issues a warning for such activity when attempted. There are many
instances where the relationship between x and y is linear over a given interval, but
outside the interval the relationship becomes curvilinear or unpredictable. Of course,
with this caution having been given, many forecasters use such regression models to
extrapolate to values of x outside the domain of those used to construct the model. Such
forecasts are introduced in section 14.8, Using Regression to Develop a Forecasting
Trend Line. Whether the forecasts obtained under such conditions are any better than
"seat of the pants" or "crystal ball" estimates remains to be seen.
The concept of residual analysis is a good one to show graphically and
numerically how the model relates to the data and the fact that it more closely fits some
points than others, etc. A graphical or numerical analysis of residuals demonstrates that
the regression line fits the data in a manner analogous to the way a mean fits a set of
numbers. The regression model passes through the points such that the vertical distances
from the actual y values to the predicted values will sum to zero. The fact that the
residuals sum to zero points out the need to square the errors (residuals) in order to get a
handle on total error. This leads to the sum of squares error and then on to the standard
error of the estimate. In addition, students can learn why the process is called least
squares analysis (the slope and intercept formulas are derived by calculus such that the
sum of squares of error is minimized - hence "least squares"). Students can learn that by
examining the values of se, the residuals, r2, and the t ratio to test the slope they can begin
to make a judgment about the fit of the model to the data. Many of the chapter problems
ask the student to comment on these items (se, r2, etc.).
It is my view that for many of these students, the most important facet of this
chapter lies in understanding the "buzz" words of regression such as standard error of the
estimate, coefficient of determination, etc., because they may encounter regression
again only as some type of computer printout to be deciphered. The concepts may then be
more important than the calculations.


CHAPTER OUTLINE
14.1  Introduction to Simple Regression Analysis

14.2  Determining the Equation of the Regression Line

14.3  Residual Analysis
      Using Residuals to Test the Assumptions of the Regression Model
      Using the Computer for Residual Analysis

14.4  Standard Error of the Estimate

14.5  Coefficient of Determination
      Relationship Between r and r²

14.6  Hypothesis Tests for the Slope of the Regression Model and Testing the Overall
      Model
      Testing the Slope
      Testing the Overall Model

14.7  Estimation
      Confidence Intervals to Estimate the Conditional Mean of y: μy|x
      Prediction Intervals to Estimate a Single Value of y

14.8  Using Regression to Develop a Forecasting Trend Line
      Determining the Equation of the Trend Line
      Forecasting Using the Equation of the Trend Line
      Alternate Coding for Time Periods

14.9  Interpreting Computer Output

KEY TERMS
Coefficient of Determination (r2)
Confidence Interval
Dependent Variable
Deterministic Model
Heteroscedasticity
Homoscedasticity
Independent Variable
Least Squares Analysis
Outliers

Prediction Interval
Probabilistic Model
Regression Analysis
Residual
Residual Plot
Scatter Plot
Simple Regression
Standard Error of the Estimate (se)
Sum of Squares of Error (SSE)


SOLUTIONS TO CHAPTER 14
14.1
      x     y
     12    17
     21    15
     28    22
      8    19
     20    24

(A scatter plot of y versus x accompanies this solution.)

Σx = 89        Σy = 97        Σxy = 1,767
Σx² = 1,833    Σy² = 1,935    n = 5

b1 = SSxy/SSx = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n]
   = [1,767 − (89)(97)/5] / [1,833 − (89)²/5] = 0.162

b0 = Σy/n − b1(Σx/n) = 97/5 − 0.162(89/5) = 16.51

ŷ = 16.51 + 0.162x
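The slope and intercept computation above can be sketched in code. This is a minimal illustration (Python is not part of the solutions manual); the summary statistics are taken from the solution to 14.1:

```python
# A minimal sketch (not from the text) of the least-squares computations
# in Problem 14.1, starting from the summary statistics.
n = 5
sum_x, sum_y = 89, 97            # Σx, Σy
sum_x2, sum_xy = 1833, 1767      # Σx², Σxy

ss_xy = sum_xy - sum_x * sum_y / n      # SSxy = Σxy − (Σx)(Σy)/n
ss_x = sum_x2 - sum_x ** 2 / n          # SSx  = Σx² − (Σx)²/n
b1 = ss_xy / ss_x                       # slope
b0 = sum_y / n - b1 * sum_x / n         # intercept = ȳ − b1·x̄

print(round(b1, 3), round(b0, 2))       # 0.162 16.51
```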

14.2
       x      y
     140     25
     119     29
     103     46
      91     70
      65     88
      29    112
      24    128

Σx = 571        Σy = 498        Σxy = 30,099
Σx² = 58,293    Σy² = 45,154    n = 7

b1 = SSxy/SSx = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n]
   = [30,099 − (571)(498)/7] / [58,293 − (571)²/7] = −0.898

b0 = Σy/n − b1(Σx/n) = 498/7 − (−0.898)(571/7) = 144.414

ŷ = 144.414 − 0.898x

14.3
    (Advertising) x    (Sales) y
         12.5             148
          3.7              55
         21.6             338
         60.0             994
         37.6             541
          6.1              89
         16.8             126
         41.2             379

Σx = 199.5         Σy = 2,670          Σxy = 107,610.4
Σx² = 7,667.15     Σy² = 1,587,328     n = 8

b1 = SSxy/SSx = [107,610.4 − (199.5)(2,670)/8] / [7,667.15 − (199.5)²/8] = 15.240

b0 = Σy/n − b1(Σx/n) = 2,670/8 − 15.240(199.5/8) = −46.292

ŷ = −46.292 + 15.240x

14.4
    (Prime) x    (Bond) y
       16            5
        6           12
        8            9
        4           15
        7            7

Σx = 41       Σy = 48      Σxy = 333
Σx² = 421     Σy² = 524    n = 5

b1 = SSxy/SSx = [333 − (41)(48)/5] / [421 − (41)²/5] = −0.715

b0 = Σy/n − b1(Σx/n) = 48/5 − (−0.715)(41/5) = 15.460

ŷ = 15.460 − 0.715x

14.5
    Firm Births (x)    Bankruptcies (y)
         58.1                34.3
         55.4                35.0
         57.0                38.5
         58.5                40.1
         57.4                35.5
         58.0                37.9

Σx = 344.4          Σy = 221.3        Σxy = 12,708.08
Σx² = 19,774.78     Σy² = 8,188.41    n = 6

b1 = SSxy/SSx = [12,708.08 − (344.4)(221.3)/6] / [19,774.78 − (344.4)²/6] = 0.878

b0 = Σy/n − b1(Σx/n) = 221.3/6 − 0.878(344.4/6) = −13.503

ŷ = −13.503 + 0.878x

14.6
    No. of Farms (x)    Avg. Size (y)
         5.65                213
         4.65                258
         3.96                297
         3.36                340
         2.95                374
         2.52                420
         2.44                426
         2.29                441
         2.15                460
         2.07                469
         2.17                434
         2.10                444

Σx = 36.31         Σy = 4,576          Σxy = 12,766.71
Σx² = 124.7931     Σy² = 1,825,028     n = 12

b1 = SSxy/SSx = [12,766.71 − (36.31)(4,576)/12] / [124.7931 − (36.31)²/12] = −72.328

b0 = Σy/n − b1(Σx/n) = 4,576/12 − (−72.328)(36.31/12) = 600.186

ŷ = 600.186 − 72.328x

14.7
    Steel      New Orders
     99.9         2.74
     97.9         2.87
     98.9         2.93
     87.9         2.87
     92.9         2.98
     97.9         3.09
    100.6         3.36
    104.9         3.61
    105.3         3.75
    108.6         3.95

Σx = 994.8          Σy = 32.15         Σxy = 3,216.652
Σx² = 99,293.28     Σy² = 104.9815     n = 10

b1 = SSxy/SSx = [3,216.652 − (994.8)(32.15)/10] / [99,293.28 − (994.8)²/10] = 0.05557

b0 = Σy/n − b1(Σx/n) = 32.15/10 − 0.05557(994.8/10) = −2.31307

ŷ = −2.31307 + 0.05557x

14.8
      x     y
     15    47
      8    36
     19    56
     12    44
      5    21

ŷ = 13.625 + 2.303x

Residuals:
      x     y    Predicted (ŷ)    Residuals (y − ŷ)
     15    47       48.1694           −1.1694
      8    36       32.0489            3.9511
     19    56       57.3811           −1.3811
     12    44       41.2606            2.7394
      5    21       25.1401           −4.1401

14.9
      x     y    Predicted (ŷ)    Residuals (y − ŷ)
     12    17       18.4582           −1.4582
     21    15       19.9196           −4.9196
     28    22       21.0563            0.9437
      8    19       17.8087            1.1913
     20    24       19.7572            4.2428

ŷ = 16.51 + 0.162x

14.10
      x      y    Predicted (ŷ)    Residuals (y − ŷ)
    140     25       18.6597            6.3403
    119     29       37.5229           −8.5229
    103     46       51.8948           −5.8948
     91     70       62.6737            7.3263
     65     88       86.0281            1.9720
     29    112      118.3648           −6.3648
     24    128      122.8561            5.1439

ŷ = 144.414 − 0.898x
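As a check on residual tables like those above, the following sketch (Python; an illustration, not part of the solutions manual) recomputes Problem 14.9's residuals from its raw data and verifies the property emphasized in the teaching strategy, that the residuals sum to zero:

```python
# A minimal sketch (not from the text): residuals for Problem 14.9,
# computed from the raw data with unrounded coefficients.
x = [12, 21, 28, 8, 20]
y = [17, 15, 22, 19, 24]
n = len(x)

sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n   # SSxy
sxx = sum(a * a for a in x) - sum(x) ** 2 / n                  # SSx
b1 = sxy / sxx
b0 = sum(y) / n - b1 * sum(x) / n

residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
print([round(r, 4) for r in residuals])
# matches the table: [-1.4582, -4.9196, 0.9437, 1.1913, 4.2428]
print(abs(sum(residuals)) < 1e-9)   # True: residuals sum to zero
```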

14.11
       x       y    Predicted (ŷ)    Residuals (y − ŷ)
    12.5     148      144.2053            3.7947
     3.7      55       10.0954           44.9047
    21.6     338      282.8873           55.1127
    60.0     994      868.0945          125.9055
    37.6     541      526.7236           14.2764
     6.1      89       46.6708           42.3292
    16.8     126      209.7364          −83.7364
    41.2     379      581.5868         −202.5868

ŷ = −46.292 + 15.240x

14.12
      x     y    Predicted (ŷ)    Residuals (y − ŷ)
     16     5        4.0259            0.9741
      6    12       11.1722            0.8278
      8     9        9.7429           −0.7429
      4    15       12.6014            2.3986
      7     7       10.4576           −3.4575

ŷ = 15.460 − 0.715x

14.13
      x       y     Predicted (ŷ)    Residuals (y − ŷ)
    58.1    34.3       37.4978           −3.1978
    55.4    35.0       35.1277           −0.1277
    57.0    38.5       36.5322            1.9678
    58.5    40.1       37.8489            2.2511
    57.4    35.5       36.8833           −1.3833
    58.0    37.9       37.4100            0.4900

The residual for x = 58.1 is relatively large, but the residual for x = 55.4 is quite
small.

14.14
      x     y    Predicted (ŷ)    Residuals (y − ŷ)
      5    47       42.2756            4.7244
      7    38       38.9836           −0.9836
     11    32       32.3997           −0.3996
     12    24       30.7537           −6.7537
     19    22       19.2317            2.7683
     25    10        9.3558            0.6442

ŷ = 50.506 − 1.646x

No apparent violation of assumptions

14.15
    Miles (x)    Cost (y)    Predicted (ŷ)    Residuals (y − ŷ)
      1,245        2.64         2.5376             .1024
        425        2.31         2.3322            −.0222
      1,346        2.45         2.5629            −.1128
        973        2.52         2.4694             .0506
        255        2.19         2.2896            −.0996
        865        2.55         2.4424             .1076
      1,080        2.40         2.4962            −.0962
        296        2.37         2.2998             .0702

ŷ = 2.2257 + 0.00025x

No apparent violation of assumptions

14.16 The error terms appear to be non-independent.

14.17 There appears to be a nonlinear relationship between x and y.

14.18 The MINITAB Residuals vs. Fits graphic is strongly indicative of a violation of
the homoscedasticity assumption of regression. Because the residuals are very
close together for small values of x, there is little variability in the residuals at the
left end of the graph. On the other hand, for larger values of x, the graph flares
out, indicating a much greater variability at the upper end. Thus, there is a lack of
homogeneity of error across the values of the independent variable.

14.19 SSE = Σy² − b0Σy − b1Σxy = 1,935 − (16.51)(97) − (0.1624)(1,767) = 46.5692

se = √(SSE/(n − 2)) = √(46.5692/3) = 3.94

Approximately 68% of the residuals should fall within ±1se.
3 out of 5, or 60%, of the actual residuals fell within ±1se.

14.20 SSE = Σy² − b0Σy − b1Σxy = 45,154 − (144.414)(498) − (−0.89824)(30,099) = 272.0

se = √(SSE/(n − 2)) = √(272.0/5) = 7.376

6 out of 7 = 85.7% fall within ±1se.
7 out of 7 = 100% fall within ±2se.

14.21 SSE = Σy² − b0Σy − b1Σxy = 1,587,328 − (−46.29)(2,670) − (15.24)(107,610.4) = 70,940

se = √(SSE/(n − 2)) = √(70,940/6) = 108.7

Six out of eight (75%) of the sales estimates are within $108.7 million.

14.22 SSE = Σy² − b0Σy − b1Σxy = 524 − (15.46)(48) − (−0.71462)(333) = 19.8885

se = √(SSE/(n − 2)) = √(19.8885/3) = 2.575

Four out of five (80%) of the estimates are within 2.575 of the actual rate for
bonds. This amount of error is probably not acceptable to financial analysts.
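The SSE and standard-error arithmetic used in 14.19 through 14.22 can be sketched as follows (Python; an illustration, not part of the solutions manual), here with the rounded coefficients quoted in 14.19:

```python
import math

# A minimal sketch (not from the text) of SSE and the standard error of the
# estimate for Problem 14.19, using the rounded coefficients quoted there.
n = 5
sum_y, sum_y2, sum_xy = 97, 1935, 1767
b0, b1 = 16.51, 0.1624

sse = sum_y2 - b0 * sum_y - b1 * sum_xy   # SSE = Σy² − b0Σy − b1Σxy
se = math.sqrt(sse / (n - 2))             # se = √(SSE/(n − 2))

print(round(sse, 4), round(se, 2))        # 46.5692 3.94
```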

14.23
      x       y     Predicted (ŷ)    Residuals (y − ŷ)    (y − ŷ)²
    58.1    34.3       37.4978           −3.1978           10.2259
    55.4    35.0       35.1277           −0.1277            0.0163
    57.0    38.5       36.5322            1.9678            3.8722
    58.5    40.1       37.8489            2.2511            5.0675
    57.4    35.5       36.8833           −1.3833            1.9135
    58.0    37.9       37.4100            0.4900            0.2401
                                          Σ(y − ŷ)² = 21.3355

SSE = Σ(y − ŷ)² = 21.3355

se = √(SSE/(n − 2)) = √(21.3355/4) = 2.3095

This standard error of the estimate indicates that the regression model is
within ±2.3095(1,000) bankruptcies about 68% of the time. In this
particular problem, 5/6 or 83.3% of the residuals are within this standard
error of the estimate.

14.24
    (y − ŷ)     (y − ŷ)²
     4.7244     22.3200
    −0.9836       .9675
    −0.3996       .1597
    −6.7537     45.6125
     2.7683      7.6635
     0.6442       .4150
                Σ(y − ŷ)² = 77.1382

SSE = Σ(y − ŷ)² = 77.1382

se = √(SSE/(n − 2)) = √(77.1382/4) = 4.391

14.25
    (y − ŷ)     (y − ŷ)²
     .1024       .0105
    −.0222       .0005
    −.1129       .0127
     .0506       .0026
    −.0996       .0099
     .1076       .0116
    −.0962       .0093
     .0702       .0049
                Σ(y − ŷ)² = .0620

SSE = Σ(y − ŷ)² = .0620

se = √(SSE/(n − 2)) = √(.0620/6) = .1017

The model produces estimates that are within .1017, or about 10 cents, 68% of the
time. However, the range of milk costs is only 45 cents for these data.

14.26
    Volume (x)    Sales (y)
      728.6         10.5
      497.9         48.1
      439.1         64.8
      377.9         20.1
      375.5         11.4
      363.8        123.8
      276.3         89.0

n = 7     Σx = 3,059.1     Σy = 367.7
Σx² = 1,464,071.97     Σy² = 30,404.31     Σxy = 141,558.6

b1 = −.1504     b0 = 118.257

ŷ = 118.257 − .1504x

SSE = Σy² − b0Σy − b1Σxy
    = 30,404.31 − (118.257)(367.7) − (−0.1504)(141,558.6) = 8,211.6245

se = √(SSE/(n − 2)) = √(8,211.6245/5) = 40.526

This is a relatively large standard error of the estimate given the sales values
(ranging from 10.5 to 123.8).

14.27 r² = 1 − SSE/[Σy² − (Σy)²/n] = 1 − 46.6399/[1,935 − (97)²/5] = .123

This is a low value of r².

14.28 r² = 1 − SSE/[Σy² − (Σy)²/n] = 1 − 272.121/[45,154 − (498)²/7] = .972

This is a high value of r².

14.29 r² = 1 − SSE/[Σy² − (Σy)²/n] = 1 − 70,940/[1,587,328 − (2,670)²/8] = .898

This value of r² is relatively high.

14.30 r² = 1 − SSE/[Σy² − (Σy)²/n] = 1 − 19.8885/[524 − (48)²/5] = .685

This value of r² is a modest value.

68.5% of the variation of y is accounted for by x, but 31.5% is unaccounted for.

14.31 r² = 1 − SSE/[Σy² − (Σy)²/n] = 1 − 21.33547/[8,188.41 − (221.3)²/6] = .183

This value is a low value of r².

Only 18.3% of the variability of y is accounted for by the x values, and 81.7% is
unaccounted for.
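The r² formula used in 14.27 through 14.31 can be sketched as follows (Python; an illustration, not part of the solutions manual), here with the SSE value quoted in 14.27:

```python
# A minimal sketch (not from the text) of the coefficient of determination
# for Problem 14.27, using the SSE value quoted there.
n = 5
sum_y, sum_y2 = 97, 1935
sse = 46.6399

ss_yy = sum_y2 - sum_y ** 2 / n    # SSyy = Σy² − (Σy)²/n
r2 = 1 - sse / ss_yy               # r² = 1 − SSE/SSyy

print(round(r2, 3))                # 0.123, a low value of r²
```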

14.32
      CCI      Median Income
    116.8         37.415
     91.5         36.770
     68.5         35.501
     61.6         35.047
     65.9         34.700
     90.6         34.942
    100.0         35.887
    104.6         36.306
    125.4         37.005

Σx = 323.573           Σy = 824.9          Σxy = 29,804.4505
Σx² = 11,640.93413     Σy² = 79,718.79     n = 9

b1 = SSxy/SSx = [29,804.4505 − (323.573)(824.9)/9] / [11,640.93413 − (323.573)²/9]
   = 19.2204

b0 = Σy/n − b1(Σx/n) = 824.9/9 − 19.2204(323.573/9) = −599.3674

ŷ = −599.3674 + 19.2204x

SSE = Σy² − b0Σy − b1Σxy
    = 79,718.79 − (−599.3674)(824.9) − (19.2204)(29,804.4505) = 1,283.13435

se = √(SSE/(n − 2)) = √(1,283.13435/7) = 13.539

r² = 1 − SSE/[Σy² − (Σy)²/n] = 1 − 1,283.13435/[79,718.79 − (824.9)²/9] = .688

14.33 sb = se/√(Σx² − (Σx)²/n) = 3.94/√(1,833 − (89)²/5) = .2498

b1 = 0.162

H0: β1 = 0
Ha: β1 ≠ 0

α = .05    This is a two-tail test, α/2 = .025

df = n − 2 = 5 − 2 = 3

t.025,3 = ±3.182

t = (b1 − β1)/sb = (0.162 − 0)/.2498 = 0.65

Since the observed t = 0.65 < t.025,3 = 3.182, the decision is to fail to reject the
null hypothesis.
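The slope t test in 14.33 can be sketched as follows (Python; an illustration, not part of the solutions manual):

```python
import math

# A minimal sketch (not from the text) of the slope t test in Problem 14.33.
n = 5
sum_x, sum_x2 = 89, 1833
se, b1 = 3.94, 0.162

sb = se / math.sqrt(sum_x2 - sum_x ** 2 / n)   # sb = se/√(Σx² − (Σx)²/n)
t = (b1 - 0) / sb                              # H0: β1 = 0

print(round(sb, 4), round(t, 2))               # 0.2498 0.65
# 0.65 < 3.182 (t.025,3), so fail to reject H0.
```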

14.34 sb = se/√(Σx² − (Σx)²/n) = 7.376/√(58,293 − (571)²/7) = .068145

b1 = −0.898

H0: β1 = 0
Ha: β1 ≠ 0

α = .01    Two-tail test, α/2 = .005

df = n − 2 = 7 − 2 = 5

t.005,5 = ±4.032

t = (b1 − β1)/sb = (−0.898 − 0)/.068145 = −13.18

Since the observed t = −13.18 < t.005,5 = −4.032, the decision is to reject the null
hypothesis.

14.35 sb = se/√(Σx² − (Σx)²/n) = 108.7/√(7,667.15 − (199.5)²/8) = 2.095

b1 = 15.240

H0: β1 = 0
Ha: β1 ≠ 0

α = .10    For a two-tail test, α/2 = .05

df = n − 2 = 8 − 2 = 6

t.05,6 = ±1.943

t = (b1 − β1)/sb = (15.240 − 0)/2.095 = 7.27

Since the observed t = 7.27 > t.05,6 = 1.943, the decision is to reject the null
hypothesis.

14.36 sb = se/√(Σx² − (Σx)²/n) = 2.575/√(421 − (41)²/5) = .27963

b1 = −0.715

H0: β1 = 0
Ha: β1 ≠ 0

α = .05    For a two-tail test, α/2 = .025

df = n − 2 = 5 − 2 = 3

t.025,3 = ±3.182

t = (b1 − β1)/sb = (−0.715 − 0)/.27963 = −2.56

Since the observed t = −2.56 > t.025,3 = −3.182, the decision is to fail to reject the
null hypothesis.

14.37 sb = se/√(Σx² − (Σx)²/n) = 2.3095/√(19,774.78 − (344.4)²/6) = 0.926025

b1 = 0.878

H0: β1 = 0
Ha: β1 ≠ 0

α = .05    For a two-tail test, α/2 = .025

df = n − 2 = 6 − 2 = 4

t.025,4 = ±2.776

t = (b1 − β1)/sb = (0.878 − 0)/.926025 = 0.948

Since the observed t = 0.948 < t.025,4 = 2.776, the decision is to fail to reject the
null hypothesis.

14.38 F = 8.26 with a p-value of .021. The overall model is significant at α = .05 but
not at α = .01. For simple regression,

t = √F = √8.26 = 2.874

t.05,8 = 1.86 but t.01,8 = 2.896. The slope is significant at α = .05 but not at
α = .01.

14.39 x0 = 25

95% confidence, α/2 = .025
df = n − 2 = 5 − 2 = 3
t.025,3 = ±3.182

Σx = 89     Σx² = 1,833     se = 3.94

x̄ = Σx/n = 89/5 = 17.8

ŷ = 16.5 + 0.162(25) = 20.55

ŷ ± t(α/2,n−2) se √(1/n + (x0 − x̄)²/(Σx² − (Σx)²/n))

20.55 ± 3.182(3.94)√(1/5 + (25 − 17.8)²/(1,833 − (89)²/5))
     = 20.55 ± 3.182(3.94)(.63903) = 20.55 ± 8.01

12.54 < E(y25) < 28.56
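The interval arithmetic in 14.39 can be sketched as follows (Python; an illustration, not part of the solutions manual):

```python
import math

# A minimal sketch (not from the text) of the confidence interval for the
# conditional mean E(y|x0) in Problem 14.39.
n, x0 = 5, 25
sum_x, sum_x2 = 89, 1833
se, t, y_hat = 3.94, 3.182, 20.55

x_bar = sum_x / n
ss_x = sum_x2 - sum_x ** 2 / n
half = t * se * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_x)

lo, hi = y_hat - half, y_hat + half
print(round(lo, 2), round(hi, 2))   # 12.54 28.56
```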

14.40 x0 = 100

For 90% confidence, α/2 = .05
df = n − 2 = 7 − 2 = 5
t.05,5 = ±2.015

Σx = 571     Σx² = 58,293     se = 7.377

x̄ = Σx/n = 571/7 = 81.57143

ŷ = 144.414 − 0.898(100) = 54.614

ŷ ± t(α/2,n−2) se √(1 + 1/n + (x0 − x̄)²/(Σx² − (Σx)²/n))

54.614 ± 2.015(7.377)√(1 + 1/7 + (100 − 81.57143)²/(58,293 − (571)²/7))
       = 54.614 ± 2.015(7.377)(1.08252) = 54.614 ± 16.091

38.523 < y < 70.705

For x0 = 130,

ŷ = 144.414 − 0.898(130) = 27.674

27.674 ± 2.015(7.377)√(1 + 1/7 + (130 − 81.57143)²/(58,293 − (571)²/7))
       = 27.674 ± 2.015(7.377)(1.1589) = 27.674 ± 17.227

10.447 < y < 44.901

The confidence interval of y for x0 = 130 is wider than the
confidence interval of y for x0 = 100 because x0 = 100 is nearer to the value of
x̄ = 81.57 than is x0 = 130.

14.41 x0 = 20

For 98% confidence, α/2 = .01
df = n − 2 = 8 − 2 = 6
t.01,6 = ±3.143

Σx = 199.5     Σx² = 7,667.15     se = 108.8

x̄ = Σx/n = 199.5/8 = 24.9375

ŷ = −46.29 + 15.24(20) = 258.51

ŷ ± t(α/2,n−2) se √(1/n + (x0 − x̄)²/(Σx² − (Σx)²/n))

258.51 ± (3.143)(108.8)√(1/8 + (20 − 24.9375)²/(7,667.15 − (199.5)²/8))
       = 258.51 ± (3.143)(108.8)(0.36614) = 258.51 ± 125.20

133.31 < E(y20) < 383.71

For a single y value:

ŷ ± t(α/2,n−2) se √(1 + 1/n + (x0 − x̄)²/(Σx² − (Σx)²/n))

258.51 ± (3.143)(108.8)√(1 + 1/8 + (20 − 24.9375)²/(7,667.15 − (199.5)²/8))
       = 258.51 ± (3.143)(108.8)(1.06492) = 258.51 ± 364.16

−105.65 < y < 622.67

The confidence interval for the single value of y is wider than the confidence
interval for the average value of y because the average is more towards the
middle and individual values of y can vary more than values of the average.
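The contrast drawn in 14.41 can be sketched as follows (Python; an illustration, not part of the solutions manual): the prediction interval for a single y adds 1 under the radical, which is why it is so much wider.

```python
import math

# A minimal sketch (not from the text) contrasting the two interval
# half-widths in Problem 14.41.
n, x0 = 8, 20
sum_x, sum_x2 = 199.5, 7667.15
se, t = 108.8, 3.143

x_bar = sum_x / n
ss_x = sum_x2 - sum_x ** 2 / n
common = 1 / n + (x0 - x_bar) ** 2 / ss_x

mean_half = t * se * math.sqrt(common)        # interval for E(y|x0)
single_half = t * se * math.sqrt(1 + common)  # prediction interval, single y

print(round(mean_half, 2), round(single_half, 2))   # 125.2 364.16
```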

14.42 x0 = 10

For 99% confidence, α/2 = .005
df = n − 2 = 5 − 2 = 3
t.005,3 = ±5.841

Σx = 41     Σx² = 421     se = 2.575

x̄ = Σx/n = 41/5 = 8.20

ŷ = 15.46 − 0.715(10) = 8.31

ŷ ± t(α/2,n−2) se √(1/n + (x0 − x̄)²/(Σx² − (Σx)²/n))

8.31 ± 5.841(2.575)√(1/5 + (10 − 8.2)²/(421 − (41)²/5))
     = 8.31 ± 5.841(2.575)(.488065) = 8.31 ± 7.34

0.97 < E(y10) < 15.65

If the prime interest rate is 10%, we are 99% confident that the average bond rate
is between 0.97% and 15.65%.

14.43
    Year     Fertilizer
    2001        11.9
    2002        17.9
    2003        22.0
    2004        21.8
    2005        26.0

Σx = 10,015           Σy = 99.6          Σxy = 199,530.9
Σx² = 20,060,055      Σy² = 2,097.26     n = 5

b1 = SSxy/SSx = [199,530.9 − (10,015)(99.6)/5] / [20,060,055 − (10,015)²/5] = 3.21

b0 = Σy/n − b1(Σx/n) = 99.6/5 − 3.21(10,015/5) = −6,409.71

ŷ = −6,409.71 + 3.21x

ŷ(2008) = −6,409.71 + 3.21(2008) = 35.97

14.44
    Year     Fertilizer
    1998       5,860
    1999       6,632
    2000       7,125
    2001       6,000
    2002       4,380
    2003       3,326
    2004       2,642

Σx = 14,007           Σy = 35,965            Σxy = 71,946,954
Σx² = 28,028,035      Σy² = 202,315,489      n = 7

b1 = SSxy/SSx = [71,946,954 − (14,007)(35,965)/7] / [28,028,035 − (14,007)²/7]
   = −678.9643

b0 = Σy/n − b1(Σx/n) = 35,965/7 − (−678.9643)(14,007/7) = 1,363,745.39

ŷ = 1,363,745.39 − 678.9643x

ŷ(2007) = 1,363,745.39 − 678.9643(2007) = 1,064.04

14.45
    Year     Quarter    Cum. Quarter (x)    Sales (y)
    2003        1              1              11.93
                2              2              12.46
                3              3              13.28
                4              4              15.08
    2004        1              5              16.08
                2              6              16.82
                3              7              17.60
                4              8              18.66
    2005        1              9              19.73
                2             10              21.11
                3             11              22.21
                4             12              22.94

Use the cumulative quarters as the predictor variable, x, to predict sales, y.

Σx = 78       Σy = 207.9           Σxy = 1,499.07
Σx² = 650     Σy² = 3,755.2084     n = 12

b1 = SSxy/SSx = [1,499.07 − (78)(207.9)/12] / [650 − (78)²/12] = 1.033

b0 = Σy/n − b1(Σx/n) = 207.9/12 − 1.033(78/12) = 10.6105

ŷ = 10.6105 + 1.033x

Remember, this trend line was constructed using cumulative quarters. To forecast
sales for the third quarter of year 2007, we must convert this time frame to
cumulative quarters. The third quarter of year 2007 is quarter number 19 in our
scheme.

ŷ(19) = 10.6105 + 1.033(19) = 30.2375
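The trend-line forecast in 14.45 can be sketched as follows (Python; an illustration, not part of the solutions manual), using the rounded coefficients quoted in the solution:

```python
# A minimal sketch (not from the text) of the trend-line forecast in
# Problem 14.45, using the rounded coefficients quoted there.
b0, b1 = 10.6105, 1.033

def forecast(quarter):
    """Forecast sales for a cumulative quarter number."""
    return b0 + b1 * quarter

# Quarter 3 of 2007 is cumulative quarter 19 in this coding.
print(round(forecast(19), 4))   # 30.2375
```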

14.46
      x     y
      5     8
      7     9
      3    11
     16    27
     12    15
      9    13

Σx = 52       Σy = 83         Σxy = 865
Σx² = 564     Σy² = 1,389     n = 6

a) b1 = 1.2853     b0 = 2.6941

   ŷ = 2.6941 + 1.2853x

b)
    Predicted (ŷ)    Residuals (y − ŷ)
       9.1206            −1.1206
      11.6912            −2.6912
       6.5500             4.4500
      23.2588             3.7412
      18.1177            −3.1176
      14.2618            −1.2618

c)
    (y − ŷ)²
     1.2557
     7.2426
    19.8025
    13.9966
     9.7194
     1.5921
    SSE = 53.6089

   se = √(SSE/(n − 2)) = √(53.6089/4) = 3.661

d) r² = 1 − SSE/[Σy² − (Σy)²/n] = 1 − 53.6089/[1,389 − (83)²/6] = .777

e) H0: β1 = 0
   Ha: β1 ≠ 0

   α = .01    Two-tailed test, α/2 = .005

   df = n − 2 = 6 − 2 = 4

   t.005,4 = ±4.604

   sb = se/√(Σx² − (Σx)²/n) = 3.661/√(564 − (52)²/6) = .34389

   t = (b1 − β1)/sb = (1.2853 − 0)/.34389 = 3.74

   Since the observed t = 3.74 < t.005,4 = 4.604, the decision is to fail to reject
   the null hypothesis.

f) The r² = 77.74% is modest. There appears to be some prediction with this
   model. The slope of the regression line is not significantly different from
   zero using α = .01. However, for α = .05, the null hypothesis of a zero
   slope is rejected. The standard error of the estimate, se = 3.661, is not
   particularly small given the range of values for y (27 − 8 = 19).

14.47
      x     y
     53     5
     47     5
     41     7
     50     4
     58    10
     62    12
     45     3
     60    11

Σx = 416         Σy = 57       Σxy = 3,106
Σx² = 22,032     Σy² = 489     n = 8

a) b1 = 0.355     b0 = −11.335

   ŷ = −11.335 + 0.355x

b)
    Predicted (ŷ)    Residuals (y − ŷ)
       7.48              −2.48
       5.35              −0.35
       3.22               3.78
       6.415             −2.415
       9.255              0.745
      10.675              1.325
       4.64              −1.64
       9.965              1.035

c)
    (y − ŷ)²
     6.1504
     0.1225
    14.2884
     5.8322
     0.5550
     1.7556
     2.6896
     1.0712
    SSE = 32.4649

d) se = √(SSE/(n − 2)) = √(32.4649/6) = 2.3261

e) r² = 1 − SSE/[Σy² − (Σy)²/n] = 1 − 32.4649/[489 − (57)²/8] = .608

f) H0: β1 = 0
   Ha: β1 ≠ 0

   α = .05    Two-tailed test, α/2 = .025

   df = n − 2 = 8 − 2 = 6

   t.025,6 = ±2.447

   sb = se/√(Σx² − (Σx)²/n) = 2.3261/√(22,032 − (416)²/8) = 0.116305

   t = (b1 − β1)/sb = (0.355 − 0)/.116305 = 3.05

   Since the observed t = 3.05 > t.025,6 = 2.447, the decision is to reject the
   null hypothesis. The population slope is different from zero.

g) This model produces only a modest r² = .608. Almost 40% of the
   variance of y is unaccounted for by x. The range of y values is 12 − 3 = 9,
   and the standard error of the estimate is 2.33. Given this small range, the
   se is not small.

14.48 Σx = 1,263        Σy = 417        Σxy = 88,288
      Σx² = 268,295     Σy² = 29,135    n = 6

b1 = 0.209369     b0 = 25.42778

SSE = Σy² − b0Σy − b1Σxy
    = 29,135 − (25.42778)(417) − (0.209369)(88,288) = 46.845468

r² = 1 − SSE/[Σy² − (Σy)²/n] = 1 − 46.845468/153.5 = .695

Coefficient of determination = r² = .695

14.49 a) x0 = 60

Σx = 524         Σy = 215        Σxy = 15,125
Σx² = 36,224     Σy² = 6,411     n = 8

b1 = .5481     b0 = −9.026     se = 3.201

95% confidence interval, α/2 = .025
df = n − 2 = 8 − 2 = 6
t.025,6 = ±2.447

x̄ = Σx/n = 524/8 = 65.5

ŷ = −9.026 + 0.5481(60) = 23.86

ŷ ± t(α/2,n−2) se √(1/n + (x0 − x̄)²/(Σx² − (Σx)²/n))

23.86 ± 2.447(3.201)√(1/8 + (60 − 65.5)²/(36,224 − (524)²/8))
      = 23.86 ± 2.447(3.201)(.375372) = 23.86 ± 2.94

20.92 < E(y60) < 26.8

b) x0 = 70

ŷ = −9.026 + 0.5481(70) = 29.341

ŷ ± t(α/2,n−2) se √(1 + 1/n + (x0 − x̄)²/(Σx² − (Σx)²/n))

29.341 ± 2.447(3.201)√(1 + 1/8 + (70 − 65.5)²/(36,224 − (524)²/8))
       = 29.341 ± 2.447(3.201)(1.06567) = 29.341 ± 8.347

20.994 < y < 37.688

c) The confidence interval for (b) is much wider because part (b) is for a single value
of y, which produces a much greater possible variation. In actuality, x0 = 70 in
part (b) is slightly closer to the mean (x̄) than x0 = 60. However, the width of the
single interval is much greater than that of the average or expected y value in
part (a).

14.50
    Year    Cost
      1      56
      2      54
      3      49
      4      46
      5      45

Σx = 15      Σy = 250         Σxy = 720
Σx² = 55     Σy² = 12,594     n = 5

b1 = SSxy/SSx = [720 − (15)(250)/5] / [55 − (15)²/5] = −3

b0 = Σy/n − b1(Σx/n) = 250/5 − (−3)(15/5) = 59

ŷ = 59 − 3x

ŷ(7) = 59 − 3(7) = 38

14.51 Σy = 267        Σx = 21       Σxy = 1,256
      Σy² = 15,971    Σx² = 101     n = 5

b1 = 10.515625     b0 = 9.234375

SSE = Σy² − b0Σy − b1Σxy
    = 15,971 − (9.234375)(267) − (10.515625)(1,256) = 297.7969

r² = 1 − SSE/[Σy² − (Σy)²/n] = 1 − 297.7969/1,713.2 = .826

If a regression model had been developed to predict the number of cars sold
by the number of sales people, the model would have had an r² of 82.6%. The
same would hold true for a model to predict the number of sales people by the
number of cars sold.

14.52 n = 12          Σx = 548             Σy = 5,940
      Σx² = 26,592    Σy² = 3,211,546      Σxy = 287,908

b1 = 10.626383     b0 = 9.728511

ŷ = 9.728511 + 10.626383x

SSE = Σy² − b0Σy − b1Σxy
    = 3,211,546 − (9.728511)(5,940) − (10.626383)(287,908) = 94,337.9762

se = √(SSE/(n − 2)) = √(94,337.9762/10) = 97.1277

r² = 1 − SSE/[Σy² − (Σy)²/n] = 1 − 94,337.9762/271,246 = .652

t = (10.626383 − 0) / [97.1277/√(26,592 − (548)²/12)] = 4.33

If α = .01, then t.005,10 = 3.169. Since the observed t = 4.33 > t.005,10 = 3.169, the
decision is to reject the null hypothesis.

14.53
    Sales (y)    Number of Units (x)
      17.1             12.4
       7.9              7.5
       4.8              6.8
       4.7              8.7
       4.6              4.6
       4.0              5.1
       2.9             11.2
       2.7              5.1
       2.7              2.9

Σy = 51.4       Σx = 64.3        Σxy = 440.46
Σy² = 460.1     Σx² = 538.97     n = 9

b1 = 0.92025     b0 = −0.863565

ŷ = −0.863565 + 0.92025x

SSE = Σy² − b0Σy − b1Σxy
    = 460.1 − (−0.863565)(51.4) − (0.92025)(440.46) = 99.153926

r² = 1 − SSE/[Σy² − (Σy)²/n] = 1 − 99.153926/166.55 = .405

14.54
    Year     Total Employment
    1995         11,152
    1996         10,935
    1997         11,050
    1998         10,845
    1999         10,776
    2000         10,764
    2001         10,697
    2002          9,234
    2003          9,223
    2004          9,158

Σx = 19,995           Σy = 103,834            Σxy = 207,596,350
Σx² = 39,980,085      Σy² = 1,084,268,984     n = 10

b1 = SSxy/SSx = [207,596,350 − (19,995)(103,834)/10] / [39,980,085 − (19,995)²/10]
   = −239.188

b0 = Σy/n − b1(Σx/n) = 103,834/10 − (−239.188)(19,995/10) = 488,639.564

ŷ = 488,639.564 − 239.188x

ŷ(2008) = 488,639.564 − 239.188(2008) = 8,350.30

14.55
     1977     2003
      581      666
      213      214
      668      496
      345      204
    1,476    1,600
    1,776    6,278

Σx = 5,059          Σy = 9,458          Σxy = 14,345,564
Σx² = 6,280,931     Σy² = 42,750,268    n = 6

b1 = SSxy/SSx = [14,345,564 − (5,059)(9,458)/6] / [6,280,931 − (5,059)²/6] = 3.1612

b0 = Σy/n − b1(Σx/n) = 9,458/6 − 3.1612(5,059/6) = −1,089.0712

ŷ = −1,089.0712 + 3.1612x

For x0 = 700, n = 6, x̄ = 5,059/6 = 843.167:

ŷ = −1,089.0712 + 3.1612(700) = 1,123.757

SSE = Σy² − b0Σy − b1Σxy
    = 42,750,268 − (−1,089.0712)(9,458) − (3.1612)(14,345,564) = 7,701,506.49

se = √(SSE/(n − 2)) = √(7,701,506.49/4) = 1,387.58

α = .05, t.025,4 = ±2.776

Confidence interval: ŷ ± t(α/2,n−2) se √(1/n + (x0 − x̄)²/(Σx² − (Σx)²/n))

1,123.757 ± (2.776)(1,387.58)√(1/6 + (700 − 843.167)²/(6,280,931 − (5,059)²/6))
          = 1,123.757 ± 1,619.81

−496.05 to 2,743.57

H0: β1 = 0
Ha: β1 ≠ 0

α = .05    df = 4    Table t.025,4 = ±2.776

SSxx = Σx² − (Σx)²/n = 6,280,931 − (5,059)²/6 = 2,015,350.833

sb = se/√SSxx = 1,387.58/√2,015,350.833 = 0.9774

t = (b1 − 0)/sb = (3.1612 − 0)/0.9774 = 3.234

Since the observed t = 3.234 > t.025,4 = 2.776, the decision is to reject the null
hypothesis.

14.56 Σx = 11.902      Σy = 516.8         Σxy = 1,202.867
      Σx² = 25.1215    Σy² = 61,899.06    n = 7

b1 = 66.36277     b0 = −39.0071

ŷ = −39.0071 + 66.36277x

SSE = Σy² − b0Σy − b1Σxy
    = 61,899.06 − (−39.0071)(516.8) − (66.36277)(1,202.867) = 2,232.343

se = √(SSE/(n − 2)) = √(2,232.343/5) = 21.13

r² = 1 − SSE/[Σy² − (Σy)²/n] = 1 − 2,232.343/[61,899.06 − (516.8)²/7]
   = 1 − .094 = .906

14.57 Σx = 44,754            Σy = 17,314           Σxy = 59,852,571
      Σx² = 167,540,610      Σy² = 24,646,062      n = 13

b1 = SSxy/SSx = [59,852,571 − (44,754)(17,314)/13] / [167,540,610 − (44,754)²/13]
   = .01835

b0 = Σy/n − b1(Σx/n) = 17,314/13 − (.01835)(44,754/13) = 1,268.685

ŷ = 1,268.685 + .01835x

r² for this model is .002858. There is no predictability in this model.

Test for slope: t = 0.18 with a p-value of 0.8623. Not significant.

Time-Series Trend Line:

Σx = 91       Σy = 44,754           Σxy = 304,797
Σx² = 819     Σy² = 167,540,610     n = 13

b1 = SSxy/SSx = [304,797 − (91)(44,754)/13] / [819 − (91)²/13] = −46.5989

b0 = Σy/n − b1(Σx/n) = 44,754/13 − (−46.5989)(91/13) = 3,768.81

ŷ = 3,768.81 − 46.5989x

ŷ(2007) = 3,768.81 − 46.5989(15) = 3,069.83

14.58 Σx = 323.3          Σy = 6,765.8           Σxy = 339,342.76
      Σx² = 29,629.13     Σy² = 7,583,144.64     n = 7

b1 = SSxy/SSx = [339,342.76 − (323.3)(6,765.8)/7] / [29,629.13 − (323.3)²/7]
   = 1.82751

b0 = Σy/n − b1(Σx/n) = 6,765.8/7 − (1.82751)(323.3/7) = 882.138

ŷ = 882.138 + 1.82751x

SSE = Σy² − b0Σy − b1Σxy
    = 7,583,144.64 − (882.138)(6,765.8) − (1.82751)(339,342.76) = 994,623.07

se = √(SSE/(n − 2)) = √(994,623.07/5) = 446.01

r² = 1 − SSE/[Σy² − (Σy)²/n] = 1 − 994,623.07/[7,583,144.64 − (6,765.8)²/7]
   = 1 − .953 = .047

H0: β1 = 0
Ha: β1 ≠ 0

α = .05     t.025,5 = ±2.571

SSxx = Σx² − (Σx)²/n = 29,629.13 − (323.3)²/7 = 14,697.29

t = (b1 − 0)/(se/√SSxx) = (1.82751 − 0)/(446.01/√14,697.29) = 0.50

Since the observed t = 0.50 < t.025,5 = 2.571, the decision is to fail to reject the
null hypothesis.

14.59 Let Water use = y and Temperature = x

Σx = 608         Σy = 1,025        Σxy = 86,006
Σx² = 49,584     Σy² = 152,711     n = 8

b1 = 2.40107     b0 = −54.35604

ŷ = −54.35604 + 2.40107x

ŷ(100) = −54.35604 + 2.40107(100) = 185.751

SSE = Σy² − b0Σy − b1Σxy
    = 152,711 − (−54.35604)(1,025) − (2.40107)(86,006) = 1,919.5146

se = √(SSE/(n − 2)) = √(1,919.5146/6) = 17.886

r² = 1 − SSE/[Σy² − (Σy)²/n] = 1 − 1,919.5146/[152,711 − (1,025)²/8]
   = 1 − .09 = .91

Testing the slope:

H0: β1 = 0
Ha: β1 ≠ 0

α = .01    Since this is a two-tailed test, α/2 = .005

df = n − 2 = 8 − 2 = 6

t.005,6 = ±3.707

sb = se/√(Σx² − (Σx)²/n) = 17.886/√(49,584 − (608)²/8) = .30783

t = (b1 − β1)/sb = (2.40107 − 0)/.30783 = 7.80

Since the observed t = 7.80 > t.005,6 = 3.707, the decision is to reject the null
hypothesis.

14.60 a) The regression equation is: ŷ = 67.2 − 0.0565x

b) For every unit of increase in the value of x, the predicted value of y will
decrease by .0565.

c) The t ratio for the slope is −5.50 with an associated p-value of .000. This is
significant at α = .10. The t ratio is negative because the slope is negative and
the numerator of the t ratio formula equals the slope minus zero.

d) r² is .627, or 62.7% of the variability of y is accounted for by x. This is only
a modest proportion of predictability. The standard error of the estimate is
10.32. This is best interpreted in light of the data and the magnitude of the
data.

e) The F value, which tests the overall predictability of the model, is 30.25. For
simple regression analysis, this equals the value of t², which is (−5.50)².

f) The negative value of r is not a surprise because the slope of the regression line
is also negative, indicating an inverse relationship between x and y. In addition,
taking the square root of r², which is .627, yields .7906, which is the magnitude
of the value of r, considering rounding error.

14.61 The F value for overall predictability is 7.12 with an associated p-value of .0205,
which is significant at α = .05. It is not significant at alpha of .01. The
coefficient of determination is .372 with an adjusted r² of .32. This represents
very modest predictability. The standard error of the estimate is 982.219,
which in units of 1,000 laborers means that about 68% of the
predictions are within 982,219 of the actual figures. The regression model is:
Number of Union Members = 22,348.97 − 0.0524 Labor Force. For a labor force
of 100,000 (thousand, actually 100 million), substitute x = 100,000 and get a
predicted value of 17,108.97 (thousand), which is actually 17,108,970 union
members.

14.62 The Residual Model Diagnostics from MINITAB indicate a relatively healthy set
of residuals. The histogram indicates that the error terms are generally normally
distributed. This is somewhat confirmed by the nearly straight-line Normal Plot of
Residuals. However, the Residuals vs. Fits graph indicates that there may be
some heteroscedasticity, with greater error variance for small x values.
