Professional Documents
Culture Documents
David Chow
Nov 2014
Learning Objectives
Understand the simple linear regression model
Model assumptions
Meaning of the coefficients b0 and b1
Chap 13-2
2
Overview
This PPT
Linear regression model
Estimate regression line
Measures of variation
Residual analysis
Statistical inference
Excel file
Eg: House price
Scatter plot
Reg: standard output
Reg: residuals
What We Know
1.
2.
Scatter plot
3.
4.
Hypothesis testing
Chap 13-4
Regression Analysis
Dependent variable (y): the variable we wish to
predict or explain
Independent variable (x): the variable used to
predict or explain y
Linear Regression:
Only one x
Chap 13-5
Examples
Y=___, X=___
Useful tip:
A visual check BEFORE running
regression is always helpful!
Dependent
Variable
Population
Slope
Coefficient
Independent
Variable
Yi 0 1Xi i
Linear component
Random
Error term
Assumptions
1. X and Y are linearly related
2. 0 and 1 are population parameters
3. Given an X-value (say, Xi), Y= Yi is random, because of
4. Assume E()=0, so that E(Y)=____
Chap 13-8
Yi 0 1Xi i
Population Regression
line: E(Y) = 0 +1 X
Observed Value
of Y for Xi
i
Predicted Value
of Y for Xi
Intercept = 0
Slope = 1
Random Error
for this Xi value
Xi
X
Chap 13-9
Linearity
Independence of errors
Normality of error
Estimate of
intercept
Estimate of slope
Yi b0 b1Xi
Value of X for
observation i
Chap 13-12
Chap 13-13
Interpreting b0 and b1
Chap 13-14
Chap 13-15
Square Feet
(X)
245
1400
312
1600
279
1700
308
1875
450
400
350
300
250
200
150
100
50
0
0
500
1000
1500
2000
2500
3000
Square Feet
199
1100
219
1550
405
2350
324
2450
319
1425
255
1700
Chap 13-16
Using Excel
1. Choose Data
0.76211
R Square
0.58082
Adjusted R Square
0.52842
Standard Error
41.33032
Observations
10
ANOVA
df
SS
MS
F
11.0848
Regression
18934.9348
18934.9348
Residual
13665.5652
1708.1957
Total
32600.5000
Coefficients
Intercept
Square Feet
Standard Error
t Stat
P-value
Significance F
0.01039
Lower 95%
Upper 95%
98.24833
58.03348
1.69296
0.12892
-35.57720
232.07386
0.10977
0.03297
3.32938
0.01039
0.03374
0.18580
Chap 13-18
Intercept
= 98.248
450
400
350
300
250
200
150
Slope
= 0.10977
100
50
0
0
500
1000
1500
2000
2500
3000
Square Feet
Eg: Interpreting bo
houseprice 98.24833 0.10977 (square feet)
Chap 13-20
Eg: Interpreting b1
houseprice 98.24833 0.10977 (square feet)
Chap 13-21
Chap 13-22
Making Predictions
General rule:
Predict only within the relevant range of Xs
500
1000
1500
2000
Square Feet
2500
3000
Measures of Variation
and r2
Measures of Variation
SST
Total Sum of
Squares
SSR
Regression Sum
of Squares
SST ( Yi Y)2
Y)2
SSR ( Y
i
SSE
Error Sum of
Squares
)2
SSE ( Yi Y
i
where:
Chap 13-25
Measures of Variation
(Total Variation)
Chap 13-26
Measures of Variation
Y
Yi
SSE = (Yi - Yi )2
_
Y
Xi
_
Y
X
Chap 13-27
Coefficient of Determination, r2
SST
total sum of squares
2
0 r 1
2
Chap 13-28
Examples of r2
Y
0 < r2 < 1
Q: Graph for r2 = 1?
X
Chap 13-29
0.58082
SST 32600.5000
2
Regression Statistics
Multiple R
0.76211
R Square
0.58082
Adjusted R Square
0.52842
Standard Error
41.33032
Observations
10
ANOVA
df
SS
MS
F
11.0848
Regression
18934.9348
18934.9348
Residual
13665.5652
1708.1957
Total
32600.5000
Coefficients
Intercept
Square Feet
Standard Error
t Stat
P-value
Significance F
0.01039
Lower 95%
Upper 95%
98.24833
58.03348
1.69296
0.12892
-35.57720
232.07386
0.10977
0.03297
3.32938
0.01039
0.03374
0.18580
Chap 13-30
SSE
SYX
n2
(Yi Yi ) 2
i 1
n2
where
SSE = error sum of squares
n = sample size
Chap 13-31
0.76211
R Square
0.58082
Adjusted R Square
0.52842
Standard Error
SYX 41.33032
41.33032
Observations
10
ANOVA
df
SS
MS
F
11.0848
Regression
18934.9348
18934.9348
Residual
13665.5652
1708.1957
Total
32600.5000
Coefficients
Intercept
Square Feet
Standard Error
t Stat
P-value
Significance F
0.01039
Lower 95%
Upper 95%
98.24833
58.03348
1.69296
0.12892
-35.57720
232.07386
0.10977
0.03297
3.32938
0.01039
0.03374
0.18580
Chap 13-32
large SYX
small SYX
Salary
32
42
52
61
62
66
65
Residual Analysis
(Autocorrelation is NOT covered)
Residual Analysis
ei Yi Y
i
L: Linearity
Y
Not Linear
residuals
residuals
Linear
Chap 13-37
I: Independence
Not Independent
residuals
residuals
residuals
Independent
Chap 13-38
N: Normality
Stem-and-Leaf
Histogram
Normal Probability Plot, etc.
Chap 13-39
E: Equal Variance
Y
Non-constant variance
residuals
residuals
Constant variance
Chap 13-40
60
Residuals
40
Residuals
Predicted
House Price
80
251.92316
-6.923162
273.87671
38.12329
284.85348
-5.853484
304.06284
3.937162
-40
218.99284
-19.99284
-60
268.38832
-49.38832
356.20251
48.79749
367.17929
-43.17929
254.6674
64.33264
10
284.85348
-29.85348
20
0
-20
1000
2000
3000
Square Feet
SSX
S YX
2
(X
X
)
i
where:
Sb1
S YX
H0: 1 = 0
H1: 1 0
Test statistic
t STAT
b1 1
Sb
d.f. n 2
where:
b1 = regression slope
coefficient
1 = hypothesized slope
Sb1 = standard
error of the slope
Chap 13-44
Intercept
Square Feet
Standard Error
t Stat
P-value
98.24833
58.03348
1.69296
0.12892
0.10977
0.03297
3.32938
0.01039
b1
Sb1
t STAT
b1 1
Sb
0.10977 0
3.32938
0.03297
Chap 13-45
a 0.05
d.f. 10 2 8
a/2=.025
Reject H0
a/2=.025
Do not reject H0
-t/2
-2.3060
Decision: Reject H0
Reject H0
t/2
2.3060
3.329
Chap 13-46
Standard Error
t Stat
P-value
98.24833
58.03348
1.69296
0.12892
0.10977
0.03297
3.32938
0.01039
p-value
Chap 13-47
F Test statistic:
where
H1: 1 0
MSR
FSTAT
MSE
MSR
SSR
k
MSE
SSE
n k 1
0.76211
R Square
0.58082
Adjusted R Square
0.52842
Standard Error
FSTAT
MSR 18934.9348
11.0848
MSE 1708.1957
41.33032
Observations
10
p-value for
the F-Test
ANOVA
df
SS
MS
F
11.0848
Regression
18934.9348
18934.9348
Residual
13665.5652
1708.1957
Total
32600.5000
Significance F
0.01039
Chap 13-49
H0: 1 = 0 H1: 1 0
a = .05
df1= 1
df2 = 8
FSTAT
MSR
11.08
MSE
Decision:
Critical Value:
Reject H0 at a = 0.05
Fa = 5.32
a = .05
Do not
reject H0
Reject H0
F.05 = 5.32
Chap 13-50
b1 t / 2S b
d.f. = n - 2
Excel Printout
Intercept
Square Feet
Coefficients
Standard Error
t Stat
P-value
98.24833
0.10977
Lower 95%
Upper 95%
58.03348
1.69296
0.12892
-35.57720
232.07386
0.03297
3.32938
0.01039
0.03374
0.18580
Hypotheses
H0: = 0
H1: 0
Test statistic
t STAT
r -
1 r2
n2
where
r r 2 if b1 0
r r 2 if b1 0
Chap 13-52
H0: = 0
H1: 0
(No correlation)
(correlation exists)
a =.05 , df = 10 - 2 = 8
t STAT
r
1 r2
n2
.762 0
1 .7622
10 2
3.329
Critical Value =
Decision:
Chap 13-53
Y = b0+b1Xi
Prediction Interval
for an individual Y,
given Xi
Xi
X
Chap 13-54
2.
3.
Chap 13-55
5.
6.
Chap 13-56
Review Question 4
4. Shown below is a portion of an Excel printout for a regression
analysis relating Y (quantity demanded) and X (unit price).
ANOVA
Regression
Residual
Total
Intercept
X
df
1
46
47
SS
5048.818
3132.661
8181.479
Coefficients
80.390
-2.137
Standard Error
3.102
0.248
Review Question 5:
Restaurant Ratings
A regression model is developed to relate the price per person
and the sum of the ratings for food, decor, and service
1. Find r2
2. Calculate and Interpret the regression coefficients
3. At the 0.05 level of significance, is there evidence of a linear
relationship between the price per person and the summated
rating?
4. Construct a 95% confidence interval for the slope
Appendix
b1
( x x )( y y )
(x x )
i
b0 y b1 x