Slides Prepared by

JOHN S. LOUCKS
St. Edwards University

2002 South-Western/Thomson Learning

Chapter 14
Simple Linear Regression

Simple Linear Regression Model
Least Squares Method
Coefficient of Determination
Model Assumptions
Testing for Significance
Using the Estimated Regression Equation for Estimation and Prediction
Computer Solution
Residual Analysis: Validating Model Assumptions
Residual Analysis: Outliers and Influential Observations

The Simple Linear Regression Model

Simple Linear Regression Model


$y = \beta_0 + \beta_1 x + \epsilon$

Simple Linear Regression Equation


$E(y) = \beta_0 + \beta_1 x$

Estimated Simple Linear Regression Equation


$\hat{y} = b_0 + b_1 x$

Least Squares Method

Least Squares Criterion


$\min \sum (y_i - \hat{y}_i)^2$

where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation

The Least Squares Method

Slope for the Estimated Regression Equation


$b_1 = \dfrac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}$

y-Intercept for the Estimated Regression Equation

$b_0 = \bar{y} - b_1 \bar{x}$

where:
$x_i$ = value of independent variable for ith observation
$y_i$ = value of dependent variable for ith observation
$\bar{x}$ = mean value for independent variable
$\bar{y}$ = mean value for dependent variable

Example: Reed Auto Sales

Simple Linear Regression


Reed Auto periodically has a special week-long
sale. As part of the advertising campaign,
Reed runs one or more television commercials
during the weekend preceding the sale. Data
from a sample of 5 previous sales are shown
below.
Number of TV Ads    Number of Cars Sold
1                   14
3                   24
2                   18
1                   17
3                   27

Example: Reed Auto Sales

Slope for the Estimated Regression Equation


$b_1 = \dfrac{220 - (10)(100)/5}{24 - (10)^2/5} = 5$

y-Intercept for the Estimated Regression Equation
$b_0 = 20 - 5(2) = 10$

Estimated Regression Equation
$\hat{y} = 10 + 5x$
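
The slope and intercept above can be checked with a short script. This is a minimal sketch that applies the computational formulas from the slides to the five Reed Auto observations; the function name `least_squares_fit` and the variable names are illustrative choices, not part of the original slides.

```python
# Reed Auto Sales sample from the slides: TV ads (x) and cars sold (y).
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]


def least_squares_fit(x, y):
    """Return (b0, b1) using the computational least squares formulas."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # Slope: (sum xy - (sum x)(sum y)/n) / (sum x^2 - (sum x)^2/n)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: y_bar - b1 * x_bar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares_fit(tv_ads, cars_sold)
print(f"y_hat = {b0:g} + {b1:g}x")
```

With these data the sums are 10, 100, 220 and 24, which match the values substituted in the slide.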

Example: Reed Auto Sales


Scatter Diagram
[Figure: scatter diagram of Cars Sold (vertical axis) against TV Ads (horizontal axis), with the fitted line y = 5x + 10]

The Coefficient of Determination

Relationship Among SST, SSR, SSE


SST = SSR + SSE
$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

Coefficient of Determination
$r^2$ = SSR/SST
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

Example: Reed Auto Sales

Coefficient of Determination
$r^2$ = SSR/SST = 100/114 = .8772
The regression relationship is very strong
since 88% of the variation in number of cars
sold can be explained by the linear
relationship between the number of TV ads
and the number of cars sold.
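
The three sums of squares can be reproduced directly from the data. The sketch below assumes the fitted equation from the slides (intercept 10, slope 5) and uses illustrative variable names.

```python
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation taken from the slides

y_bar = sum(cars_sold) / len(cars_sold)
y_hat = [b0 + b1 * x for x in tv_ads]

# Total, regression, and error sums of squares.
sst = sum((y - y_bar) ** 2 for y in cars_sold)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(cars_sold, y_hat))

r_squared = ssr / sst
print(sst, ssr, sse, round(r_squared, 4))
```

The check SST = SSR + SSE holds here (114 = 100 + 14), which is a quick way to catch arithmetic slips.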


The Correlation Coefficient

Sample Correlation Coefficient

$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}}$

$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
where:
$b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$


Example: Reed Auto Sales

Sample Correlation Coefficient

$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
The sign of $b_1$ in the equation $\hat{y} = 10 + 5x$ is "+".

$r_{xy} = +\sqrt{.8772}$
$r_{xy} = +.9366$
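
As a cross-check, the value obtained from the sign of the slope and the square root of the coefficient of determination should agree with the correlation computed directly from the data. The sketch below uses the standard product-moment formula for the direct calculation, which is not shown on the slides; variable names are illustrative.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b1 = 5                 # slope from the slides
r_squared = 100 / 114  # SSR/SST from the slides

# Route 1: sign of b1 times the square root of r^2.
r_xy = math.copysign(math.sqrt(r_squared), b1)

# Route 2 (assumed standard formula): sum of cross-deviations divided by
# the square root of the product of the sums of squared deviations.
n = len(tv_ads)
x_bar = sum(tv_ads) / n
y_bar = sum(cars_sold) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
syy = sum((y - y_bar) ** 2 for y in cars_sold)
r_direct = sxy / math.sqrt(sxx * syy)

print(round(r_xy, 4), round(r_direct, 4))
```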


Model Assumptions

Assumptions About the Error Term


The error $\epsilon$ is a random variable with mean of zero.
The variance of $\epsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
The values of $\epsilon$ are independent.
The error $\epsilon$ is a normally distributed random variable.


Testing for Significance

To test for a significant regression relationship,
we must conduct a hypothesis test to
determine whether the value of $\beta_1$ is zero.

Two tests are commonly used:
t Test
F Test

Both tests require an estimate of $\sigma^2$, the
variance of $\epsilon$ in the regression model.


Testing for Significance

An Estimate of $\sigma^2$
The mean square error (MSE) provides the estimate
of $\sigma^2$, and the notation $s^2$ is also used.
$s^2$ = MSE = SSE/(n - 2)
where:

$\text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$


Testing for Significance

An Estimate of $\sigma$
To estimate $\sigma$ we take the square root of $s^2$.
The resulting s is called the standard error
of the estimate.

$s = \sqrt{\text{MSE}} = \sqrt{\dfrac{\text{SSE}}{n - 2}}$
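
For the Reed Auto data, the estimate can be computed in two lines. This sketch assumes SSE = 14, which follows from the SST = 114 and SSR = 100 reported in the coefficient of determination example.

```python
import math

n = 5     # number of observations in the Reed Auto sample
sse = 14  # SSE = SST - SSR = 114 - 100 (values from the slides)

mse = sse / (n - 2)  # s^2, the estimate of the error variance
s = math.sqrt(mse)   # standard error of the estimate

print(round(mse, 3), round(s, 4))
```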


Testing for Significance: t Test

Hypotheses
H0: $\beta_1 = 0$
Ha: $\beta_1 \neq 0$

Test Statistic

$t = \dfrac{b_1}{s_{b_1}}$

Rejection Rule
Reject H0 if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
where $t_{\alpha/2}$ is based on a t distribution with
n - 2 degrees of freedom.

Example: Reed Auto Sales

t Test
Hypotheses

H0: $\beta_1 = 0$
Ha: $\beta_1 \neq 0$

Rejection Rule
For $\alpha$ = .05 and d.f. = 3, $t_{.025}$ = 3.182
Reject H0 if t < -3.182 or t > 3.182

Test Statistics
t = 5/1.08 = 4.63
Conclusions
Reject H0
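
The slides use 1.08 as the standard error of the slope without showing where it comes from. The sketch below assumes the usual formula, the standard error of the estimate divided by the square root of the sum of squared deviations of x from its mean, and takes the critical value 3.182 from the slide rather than computing it.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
b1 = 5    # slope from the slides
sse = 14  # SSE = SST - SSR = 114 - 100
n = len(tv_ads)

x_bar = sum(tv_ads) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)

s = math.sqrt(sse / (n - 2))  # standard error of the estimate
s_b1 = s / math.sqrt(sxx)     # assumed formula for the standard error of b1
t_stat = b1 / s_b1

t_crit = 3.182  # t.025 with 3 d.f., copied from the slide
reject_h0 = abs(t_stat) > t_crit
print(round(s_b1, 2), round(t_stat, 2), reject_h0)
```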


Confidence Interval for $\beta_1$

We can use a 95% confidence interval for $\beta_1$ to
test the hypotheses just used in the t test.
H0 is rejected if the hypothesized value of $\beta_1$ is
not included in the confidence interval for $\beta_1$.


Confidence Interval for $\beta_1$

The form of a confidence interval for $\beta_1$ is:

$b_1 \pm t_{\alpha/2} s_{b_1}$

where
$b_1$ is the point estimate
$t_{\alpha/2} s_{b_1}$ is the margin of error
$t_{\alpha/2}$ is the t value providing an area
of $\alpha/2$ in the upper tail of a
t distribution with n - 2 degrees
of freedom


Example: Reed Auto Sales

Rejection Rule
Reject H0 if 0 is not included in the
confidence interval for $\beta_1$.

95% Confidence Interval for $\beta_1$
$b_1 \pm t_{\alpha/2} s_{b_1}$ = 5 +/- 3.182(1.08) = 5 +/- 3.44
or 1.56 to 8.44
Conclusion
Reject H0
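
The interval arithmetic can be sketched as a small helper. The function name `slope_confidence_interval` is hypothetical, and the inputs (slope 5, standard error 1.08, critical value 3.182) are the rounded values shown on the slides.

```python
def slope_confidence_interval(b1, s_b1, t_crit):
    """Return (lower, upper) for b1 +/- t_crit * s_b1."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin


# Values taken from the slides: b1 = 5, s_b1 = 1.08, t.025 (3 d.f.) = 3.182.
lower, upper = slope_confidence_interval(5, 1.08, 3.182)
contains_zero = lower <= 0 <= upper

print(round(lower, 2), round(upper, 2), contains_zero)
```

Because 0 lies outside the interval, the conclusion matches the t test.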


Testing for Significance: F Test

Hypotheses
H0: $\beta_1 = 0$
Ha: $\beta_1 \neq 0$

Test Statistic
F = MSR/MSE
Rejection Rule
Reject H0 if $F > F_{\alpha}$
where $F_{\alpha}$ is based on an F distribution with 1 d.f. in
the numerator and n - 2 d.f. in the
denominator.

Example: Reed Auto Sales

F Test
Hypotheses

H0: $\beta_1 = 0$
Ha: $\beta_1 \neq 0$

Rejection Rule
For $\alpha$ = .05 and d.f. = 1, 3: $F_{.05}$ = 10.13
Reject H0 if F > 10.13.

Test Statistic
F = MSR/MSE = 100/4.667 = 21.43
Conclusion
We can reject H0.
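
The F statistic follows from the sums of squares already computed for this example. The sketch below assumes MSR = SSR/1 (one independent variable) and takes the critical value from the slide. It also checks the known relationship that, in simple linear regression, F equals the square of the t statistic for the slope; that check relies on the standard formula for the slope's standard error, which the slides do not show.

```python
n = 5
ssr = 100  # sum of squares due to regression (from the slides)
sse = 14   # sum of squares due to error (SST - SSR)

msr = ssr / 1        # one independent variable, so 1 numerator d.f.
mse = sse / (n - 2)  # n - 2 denominator d.f.
f_stat = msr / mse

f_crit = 10.13  # F.05 with (1, 3) d.f., copied from the slide
reject_h0 = f_stat > f_crit

# Consistency check: t^2 should equal F. Here sxx = 4 is the sum of squared
# deviations of TV ads from their mean, and b1 = 5 is the fitted slope.
b1, sxx = 5, 4
t_squared = b1 ** 2 / (mse / sxx)

print(round(f_stat, 2), reject_h0, round(t_squared, 2))
```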

Some Cautions about the Interpretation of Significance Tests

Rejecting H0: $\beta_1 = 0$ and concluding that the
relationship between x and y is significant
does not enable us to conclude that a cause-and-effect
relationship is present between x and y.
Just because we are able to reject H0: $\beta_1 = 0$
and demonstrate statistical significance does
not enable us to conclude that there is a linear
relationship between x and y.


Using the Estimated Regression Equation for Estimation and Prediction

Confidence Interval Estimate of $E(y_p)$

$\hat{y}_p \pm t_{\alpha/2} s_{\hat{y}_p}$

Prediction Interval Estimate of $y_p$

$\hat{y}_p \pm t_{\alpha/2} s_{ind}$

where the confidence coefficient is $1 - \alpha$ and
$t_{\alpha/2}$ is based on a t distribution with n - 2 d.f.


Example: Reed Auto Sales

Point Estimation
If 3 TV ads are run prior to a sale, we expect
the mean number of cars sold to be:
$\hat{y}$ = 10 + 5(3) = 25 cars
Confidence Interval for $E(y_p)$
95% confidence interval estimate of the mean
number of cars sold when 3 TV ads are run is:
25 +/- 4.61 = 20.39 to 29.61 cars
Prediction Interval for $y_p$
95% prediction interval estimate of the
number of cars sold in one particular week
when 3 TV ads are run is:
25 +/- 8.28 = 16.72 to 33.28 cars
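
The slides give the two margins of error (4.61 and 8.28) without the standard-error formulas behind them. The sketch below assumes the usual expressions: the standard error for the mean response is s times the square root of 1/n + (xp - x_bar)^2 / sum((xi - x_bar)^2), and the standard error for an individual value adds 1 inside the square root. Variable names are illustrative.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
b0, b1 = 10, 5  # estimated regression equation from the slides
sse = 14        # SSE = SST - SSR = 114 - 100
t_crit = 3.182  # t.025 with 3 d.f., copied from the slides
x_p = 3         # number of TV ads for which we estimate and predict

n = len(tv_ads)
x_bar = sum(tv_ads) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
s = math.sqrt(sse / (n - 2))

y_hat_p = b0 + b1 * x_p
# Assumed standard formulas (not shown on the slides):
distance_term = 1 / n + (x_p - x_bar) ** 2 / sxx
s_mean = s * math.sqrt(distance_term)     # std. error for E(y_p)
s_ind = s * math.sqrt(1 + distance_term)  # std. error for an individual y_p

ci_margin = t_crit * s_mean
pi_margin = t_crit * s_ind
print(y_hat_p, round(ci_margin, 2), round(pi_margin, 2))
```

The prediction interval is wider because it accounts for the variability of a single week's sales around the mean, in addition to the uncertainty in the estimated mean itself.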

Residual Analysis

Residual for Observation i

$y_i - \hat{y}_i$

Standardized Residual for Observation i

$\dfrac{y_i - \hat{y}_i}{s_{y_i - \hat{y}_i}}$

where:

$s_{y_i - \hat{y}_i} = s\sqrt{1 - h_i}$


Example: Reed Auto Sales

Residuals

Observation   Predicted Cars Sold   Residuals
1             15                    -1
2             25                    -1
3             20                    -2
4             15                     2
5             25                     2
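
The residuals in the table, and the corresponding standardized residuals, can be reproduced as follows. The slides do not define the leverage term h_i; this sketch assumes the usual simple-regression expression, 1/n + (xi - x_bar)^2 / sum((xj - x_bar)^2), and applies the +/-2 screening rule described for Minitab in the outlier discussion.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation from the slides

n = len(tv_ads)
x_bar = sum(tv_ads) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)

# Residual: observed value minus predicted value.
residuals = [y - (b0 + b1 * x) for x, y in zip(tv_ads, cars_sold)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Assumed leverage formula for simple linear regression.
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in tv_ads]
standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

for i, (e, z) in enumerate(zip(residuals, standardized), start=1):
    print(i, e, round(z, 2))

# Observations whose standardized residual is below -2 or above +2.
flagged = [i for i, z in enumerate(standardized, start=1) if abs(z) > 2]
print("flagged:", flagged)
```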


Example: Reed Auto Sales


Residual Plot
[Figure: TV Ads Residual Plot, showing Residuals (vertical axis) against TV Ads (horizontal axis)]


Residual Analysis

Detecting Outliers
An outlier is an observation that is unusual
in comparison with the other data.
Minitab classifies an observation as an
outlier if its standardized residual value is
< -2 or > +2.
This standardized residual rule sometimes
fails to identify an unusually large
observation as being an outlier.
This rule's shortcoming can be
circumvented by using studentized deleted
residuals.
If the ith observation is an outlier, the
|ith studentized deleted residual| will
be larger than the |ith standardized
residual|.

End of Chapter 14
