Multiple Regression Analysis
and Model Building
Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-2
Chapter Goals
After completing this chapter, you should be able
to:
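Population model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
where $\beta_0$ is the y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon$ is the random error
Estimated multiple regression model:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
where $\hat{y}$ is the estimated (or predicted) value of $y$, $b_0$ is the estimated intercept, and $b_1, \ldots, b_k$ are the estimated slope coefficients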
Multiple Regression Model
Two variable model
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$
[Figure: regression plane over the axes $y$, $x_1$, $x_2$, labeled "Slope for variable $x_1$" and "Slope for variable $x_2$".]
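Multiple Regression Model
Two variable model
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$
[Figure: regression plane over the axes $y$, $x_1$, $x_2$, showing a sample observation $y_i$ at $(x_{1i}, x_{2i})$, its fitted value $\hat{y}_i$, and the residual $e = (y - \hat{y})$.]
The best fit equation, $\hat{y}$, is found by minimizing the sum of squared errors, $\sum e^2$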
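The least-squares idea can be sketched in a few lines of Python. This is a minimal illustration, not the Excel or PHStat procedure: the function names `fit_ols` and `sum_squared_errors` are hypothetical, and the data are invented so that $y = 2 + 3x_1 - x_2$ holds exactly. The coefficients are found by solving the normal equations with Gauss-Jordan elimination.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]            # prepend the intercept column
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)]
         + [sum(xi[i] * yi for xi, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def sum_squared_errors(coeffs, rows, y):
    """Sum of squared residuals e = y - y_hat for the given coefficients."""
    total = 0.0
    for r, yi in zip(rows, y):
        y_hat = coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], r))
        total += (yi - y_hat) ** 2
    return total


# Invented data (not from the slides): y = 2 + 3*x1 - 1*x2 with no noise
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [3, 7, 7, 11, 11]

b = fit_ols(rows, y)
sse = sum_squared_errors(b, rows, y)
print([round(v, 6) for v in b], round(sse, 6))
```

Because the invented data lie exactly on a plane, the fitted coefficients recover the generating values and the sum of squared errors is essentially zero. With real data the minimum is positive.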
Multiple Regression Assumptions
Slope ($b_i$)
Example: if $b_1 = -20$, then sales ($y$) is expected to decrease by an estimated 20 pies per week for each \$1 increase in selling price ($x_1$), net of the effects of changes due to advertising ($x_2$)
y-intercept ($b_0$)
Excel:
PHStat:
Excel:
PHStat:
Note that Advertising is in \$100s, so \$350 means that $x_2 = 3.5$
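A prediction from the estimated model might be sketched as follows. The coefficients are those reported in the Excel output for the pie sales example; the helper name `predict_sales` and the \$5.50 price used in the call are assumptions for illustration only.

```python
# Coefficients from the Excel regression output for the pie sales example
B0, B_PRICE, B_ADVERTISING = 306.52619, -24.97509, 74.13096


def predict_sales(price, advertising_dollars):
    """Predicted weekly pie sales (hypothetical helper)."""
    x2 = advertising_dollars / 100.0   # advertising enters the model in $100s
    return B0 + B_PRICE * price + B_ADVERTISING * x2


# Assumed inputs: a price of $5.50 (illustrative) and advertising of $350 (x2 = 3.5)
predicted = predict_sales(5.50, 350)
print(round(predicted, 2))
```

Passing advertising in dollars and converting inside the function keeps the unit conversion in one place, which avoids entering 350 where the model expects 3.5.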
Predictions in PHStat
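$R^2$ never decreases when a new $x$ variable is added to the model
The adjusted $R^2$, denoted $R_A^2$, is smaller than $R^2$:
$$R_A^2 = 1 - (1 - R^2)\left(\frac{n - 1}{n - k - 1}\right)$$
where $n$ is the sample size and $k$ is the number of independent variables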
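As a quick check of the adjusted $R^2$ formula, the sketch below plugs in the values from the pie sales regression output (R Square = 0.52148, 15 observations, 2 independent variables). The function name `adjusted_r2` is an assumption for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Inputs taken from the pie sales output; R Square is already rounded there
r2_adj = adjusted_r2(0.52148, 15, 2)
print(round(r2_adj, 4))
```

The result agrees with the reported Adjusted R Square of 0.44172 up to rounding of the inputs.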
Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA
             df   SS          MS          F         Significance F
Regression    2   29460.027   14730.013   6.53861   0.01201
Residual     12   27033.306   2252.776
Total        14   56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888
Multiple Coefficient of Determination
(continued)
$R_A^2 = .44172$
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables
Is the Model Significant?
Hypotheses:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (no linear relationship)
$H_A$: at least one $\beta_i \neq 0$ (at least one independent variable affects $y$)
F-Test for Overall Significance
(continued)
Test statistic:
$$F = \frac{SSR / k}{SSE / (n - k - 1)} = \frac{MSR}{MSE}$$
where $F$ has (numerator) $D_1 = k$ and (denominator) $D_2 = (n - k - 1)$ degrees of freedom
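The F statistic can be reproduced from the ANOVA sums of squares in the pie sales output. The sketch below is illustrative: the function names are assumptions, and the p-value helper uses a closed form that holds only when the numerator degrees of freedom equal 2, which happens to be the case here ($k = 2$). For other values of $k$ a statistics library would be needed.

```python
def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE with k and (n - k - 1) degrees of freedom."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


def f_p_value_two_numerator_df(f, d2):
    """Upper-tail probability of F(2, d2); valid ONLY for numerator df = 2."""
    return (1.0 + 2.0 * f / d2) ** (-d2 / 2.0)


# SSR and SSE from the ANOVA table; n = 15 observations, k = 2 x variables
F = f_statistic(29460.027, 27033.306, 15, 2)
p = f_p_value_two_numerator_df(F, 15 - 2 - 1)
print(round(F, 3), round(p, 4))
```

Both values agree with the F and Significance F columns of the output up to rounding.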
Hypotheses:
$H_0: \beta_i = 0$ (no linear relationship)
$H_A: \beta_i \neq 0$ (linear relationship does exist between $x_i$ and $y$)
Are Individual Variables
Significant?
(continued)
$H_0: \beta_i = 0$ (no linear relationship)
$H_A: \beta_i \neq 0$ (linear relationship does exist between $x_i$ and $y$)
Test Statistic:
$$t = \frac{b_i - 0}{s_{b_i}} \qquad (df = n - k - 1)$$
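The t statistic for each slope is simply the coefficient divided by its standard error. A minimal sketch, using the Price and Advertising rows of the pie sales output (the function and variable names are illustrative assumptions):

```python
def t_statistic(b, se, hypothesized=0.0):
    """t = (b_i - hypothesized value) / s_{b_i}."""
    return (b - hypothesized) / se


# (coefficient, standard error) pairs from the pie sales regression output
coefficients = {
    "Price": (-24.97509, 10.83213),
    "Advertising": (74.13096, 25.96732),
}

t_stats = {name: t_statistic(b, se) for name, (b, se) in coefficients.items()}
for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
```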
Are Individual Variables Significant?
(continued)

Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA
             df   SS          MS          F         Significance F
Regression    2   29460.027   14730.013   6.53861   0.01201
Residual     12   27033.306   2252.776
Total        14   56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888

t-value for Price is t = -2.306, with p-value .0398
t-value for Advertising is t = 2.855, with p-value .0145
Inferences about the Slope: t Test Example
$H_0: \beta_i = 0$
$H_A: \beta_i \neq 0$
d.f. = 15 - 2 - 1 = 12
$\alpha = .05$
$t_{\alpha/2} = 2.1788$
From Excel output:
              Coefficients   Standard Error   t Stat     P-value
Price         -24.97509      10.83213         -2.30565   0.03979
Advertising   74.13096       25.96732         2.85478    0.01449
[Figure: two-tailed t distribution with rejection regions of area $\alpha/2 = .025$ below $-t_{\alpha/2} = -2.1788$ and above $t_{\alpha/2} = 2.1788$; "Do not reject $H_0$" between them.]
The test statistic for each variable falls in the rejection region (p-values < .05)
Decision: Reject $H_0$ for each variable
Conclusion: There is evidence that both Price and Advertising affect pie sales at $\alpha = .05$
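The two-tailed decision rule can be written as a short check. In this sketch the critical value 2.1788 and the t statistics are taken from the slide; the function name `reject_h0` is an assumption.

```python
T_CRITICAL = 2.1788  # t_{alpha/2} for alpha = .05 with 12 d.f., as stated on the slide


def reject_h0(t_stat, t_critical=T_CRITICAL):
    """Two-tailed test: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > t_critical


# t statistics from the Excel output
decisions = {
    "Price": reject_h0(-2.30565),
    "Advertising": reject_h0(2.85478),
}
print(decisions)
```

Comparing $|t|$ with the critical value is equivalent to comparing the p-value with $\alpha$, so both variables lead to the same decision either way.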
Confidence Interval Estimate for the Slope
Confidence interval for the population slope $\beta_1$ (the effect of changes in price on pie sales):
$$b_i \pm t_{\alpha/2}\, s_{b_i}$$
where $t$ has $(n - k - 1)$ d.f.
              Coefficients   Standard Error   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        57.58835    555.46404
Price         -24.97509      10.83213         -48.57626   -1.37392
Advertising   74.13096       25.96732         17.55303    130.70888
Example: Weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each increase of \$1 in the selling price
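The interval for the Price slope can be reproduced from the coefficient, its standard error, and the critical value $t_{\alpha/2} = 2.1788$ for 12 d.f. given earlier. A minimal sketch (the function name is an assumption):

```python
def slope_confidence_interval(b, se, t_critical):
    """Return (lower, upper) for b +/- t_critical * se."""
    margin = t_critical * se
    return b - margin, b + margin


# Price coefficient and standard error from the Excel output
lower, upper = slope_confidence_interval(-24.97509, 10.83213, 2.1788)
print(f"{lower:.2f} to {upper:.2f}")
```

The limits match the Lower 95% and Upper 95% columns to about three decimal places; the small difference comes from rounding the critical value to 2.1788.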
Standard Deviation of the
Regression Model
coded as 0 or 1
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$
Dummy-Variable Model Example (with 2 Levels)
(continued)
$$\hat{y} = b_0 + b_1 x_1 + b_2(1) = (b_0 + b_2) + b_1 x_1 \qquad \text{Holiday}$$
$$\hat{y} = b_0 + b_1 x_1 + b_2(0) = b_0 + b_1 x_1 \qquad \text{No Holiday}$$
Different intercept, same slope
[Figure: $y$ (sales) against $x_1$ (Price); two parallel lines, Holiday with intercept $b_0 + b_2$ and No Holiday with intercept $b_0$.]
If $H_0: \beta_2 = 0$ is rejected, then Holiday has a significant effect on pie sales
Interpreting the Dummy Variable Coefficient (with 2 Levels)
Example:
$$\text{Sales} = 300 - 30(\text{Price}) + 15(\text{Holiday})$$
Sales: number of pies sold per week
Price: pie price in \$
Holiday: 1 if a holiday occurred during the week, 0 if no holiday occurred
$b_2 = 15$: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price
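The interpretation of the dummy coefficient can be checked numerically: at any fixed price, switching Holiday from 0 to 1 shifts predicted sales by exactly $b_2 = 15$. In this sketch the function name and the \$5.00 price are assumptions for illustration.

```python
def predict_holiday_sales(price, holiday):
    """Sales = 300 - 30(Price) + 15(Holiday); holiday is True or False."""
    holiday_dummy = 1 if holiday else 0   # dummy variable coded as 0 or 1
    return 300 - 30 * price + 15 * holiday_dummy


price = 5.0  # assumed price, held fixed for the comparison
difference = predict_holiday_sales(price, True) - predict_holiday_sales(price, False)
print(difference)
```

The difference does not depend on the chosen price, which is what "same slope, different intercept" means for this model.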
Dummy-Variable Models (more than 2 Levels)
(continued)
Example: Let the default category be condo
$y$ = house price; $x_1$ = square feet
$x_2$ = 1 if ranch, 0 if not
$x_3$ = 1 if split level, 0 if not
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$
$b_2$ shows the impact on price if the house is a ranch style, compared to a condo
$b_3$ shows the impact on price if the house is a split level style, compared to a condo
Interpreting the Dummy Variable Coefficients (with 3 Levels)
Suppose the estimated equation is
$$\hat{y} = 20.43 + 0.045x_1 + 23.53x_2 + 18.84x_3$$
For a condo: $x_2 = x_3 = 0$
$$\hat{y} = 20.43 + 0.045x_1$$
For a ranch: $x_2 = 1$, $x_3 = 0$
$$\hat{y} = 20.43 + 0.045x_1 + 23.53$$
For a split level: $x_2 = 0$, $x_3 = 1$
$$\hat{y} = 20.43 + 0.045x_1 + 18.84$$
With the same square feet, a ranch will have an estimated average price of 23.53 thousand dollars more than a condo
With the same square feet, a split level will have an estimated average price of 18.84 thousand dollars more than a condo
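Encoding a three-level category with two dummy variables might look like the sketch below. The coefficients come from the estimated equation on the slide, with price in thousands of dollars; the names `STYLE_DUMMIES` and `predict_price` and the 2,000 square feet used in the call are assumptions for illustration.

```python
# (x2, x3) dummy codes; condo is the default category, so both dummies are 0
STYLE_DUMMIES = {
    "condo": (0, 0),
    "ranch": (1, 0),
    "split level": (0, 1),
}


def predict_price(square_feet, style):
    """Estimated house price in thousands of dollars."""
    x2, x3 = STYLE_DUMMIES[style]
    return 20.43 + 0.045 * square_feet + 23.53 * x2 + 18.84 * x3


sqft = 2000  # assumed size, held fixed across styles
condo = predict_price(sqft, "condo")
for style in ("ranch", "split level"):
    premium = predict_price(sqft, style) - condo
    print(f"{style}: {premium:.2f} thousand dollars more than a condo")
```

Only two dummies are needed for three levels. A third dummy for condo would be perfectly collinear with the intercept.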
Model Building
Best-subset approach
Each $x_i$ is linearly related to $y$
Errors (or Residuals) are given by
$$e_i = (y - \hat{y})$$
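Computing residuals is a one-line operation once observed and fitted values are available. The numbers below are invented purely for illustration and are not taken from the pie sales data; the function name is likewise an assumption.

```python
def residuals(y, y_hat):
    """Residuals e_i = y_i - y_hat_i for paired observed and fitted values."""
    return [yi - yhi for yi, yhi in zip(y, y_hat)]


# Invented observed and fitted values (illustration only)
observed = [350, 460, 350]
fitted = [360, 450, 345]

e = residuals(observed, fitted)
sse = sum(ei ** 2 for ei in e)   # sum of squared errors
print(e, sse)
```

In residual analysis these values would then be plotted against each $x$ variable to look for patterns such as non-constant variance.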
Residual Analysis
[Figure: four panels of residuals plotted against $x$: Non-constant variance, Constant variance, Not Independent, Independent.]
Developed adjusted $R^2$
Described multicollinearity
Stepwise regression