Objectives
After this, you should be able to:
- Calculate and interpret the simple correlation between two variables
- Determine whether the correlation is significant
- Calculate and interpret the simple linear regression equation for a set of data
- Understand the assumptions behind regression analysis
- Determine whether a regression model is significant
Scatter Plot Examples
[Figure: example scatter plots of y vs. x]
- Linear relationships: positive (+ve) and negative (-ve), e.g. demand & supply, altitude & temperature
- Curvilinear relationships
- Weak relationships
- No relationship
Correlation Coefficient
(continued)
Features of ρ and r
- Unit free
- Range between -1 and 1
- The closer to -1, the stronger the negative linear relationship
- The closer to 1, the stronger the positive linear relationship
- The closer to 0, the weaker the linear relationship
Examples of Approximate r Values
[Figure: scatter plots of y vs. x illustrating r = -1, r = -.6, r = 0, r = +.3, r = +1]
Calculating the Correlation Coefficient
Sample correlation coefficient:

r = Σ(x − x̄)(y − ȳ) / √[ Σ(x − x̄)² · Σ(y − ȳ)² ]

or the algebraic equivalent:

r = [ nΣxy − ΣxΣy ] / √[ (nΣx² − (Σx)²)(nΣy² − (Σy)²) ]

where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
Calculation Example

Tree Height (y)   Trunk Diameter (x)   xy      y²      x²
35                 8                   280     1225    64
49                 9                   441     2401    81
27                 7                   189     729     49
33                 6                   198     1089    36
60                13                   780     3600    169
21                 7                   147     441     49
45                11                   495     2025    121
51                12                   612     2601    144
Σy = 321          Σx = 73              Σxy = 3142   Σy² = 14111   Σx² = 713
Calculation Example (continued)

r = [ nΣxy − ΣxΣy ] / √[ (nΣx² − (Σx)²)(nΣy² − (Σy)²) ]
  = [ 8(3142) − (73)(321) ] / √{ [8(713) − (73)²][8(14111) − (321)²] }
  = 0.886

[Figure: scatter plot of Tree Height, y vs. Trunk Diameter, x; r = 0.886 indicates a fairly strong positive linear association]
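The calculation above can be checked with a short script. This is a sketch using the algebraic form of r with the tree data from the table (variable names are illustrative):

```python
from math import sqrt

# Tree data from the worked example
x = [8, 9, 7, 6, 13, 7, 11, 12]       # trunk diameter
y = [35, 49, 27, 33, 60, 21, 45, 51]  # tree height
n = len(x)

# Algebraic form: r = [nΣxy - ΣxΣy] / √[(nΣx² - (Σx)²)(nΣy² - (Σy)²)]
sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)
syy = sum(yi * yi for yi in y)

r = (n * sxy - sx * sy) / sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
print(round(r, 3))  # 0.886
```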
Excel Output
Excel correlation output (Tools / Data Analysis / Correlation):

                  Tree Height   Trunk Diameter
Tree Height           1
Trunk Diameter      0.886              1

Correlation between Tree Height and Trunk Diameter = 0.886
Test statistic for the significance of the correlation (d.f. = n − 2):

t = r / √[ (1 − r²) / (n − 2) ]

For the tree example, with α = .05 and d.f. = 8 − 2 = 6:

t = .886 / √[ (1 − .886²) / (8 − 2) ] = 4.68

[Figure: t distribution with α/2 = .025 in each tail; rejection regions beyond −tα/2 = −2.4469 and tα/2 = 2.4469; the test statistic 4.68 falls in the upper rejection region]

Decision: Reject H0
Conclusion: There is evidence of a linear relationship at the 5% level of significance
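The test statistic and critical value can be reproduced with SciPy (assuming scipy is available; `t.ppf` gives the critical t value):

```python
from math import sqrt
from scipy import stats

r, n, alpha = 0.886, 8, 0.05   # values from the tree example

# t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom
t_stat = r / sqrt((1 - r ** 2) / (n - 2))
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)

print(round(t_stat, 2), round(t_crit, 4))  # 4.68 2.4469
reject = abs(t_stat) > t_crit              # True -> reject H0
```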
Introduction to Regression Analysis
Population linear regression model:

y = β0 + β1x + ε

where β0 is the population y-intercept, β1 is the population slope coefficient, x is the independent variable, and ε is the random error term (residual). β0 + β1x is the linear component; ε is the random error component.
Linear Regression Assumptions
- Error values (ε) are statistically independent
- Error values are normally distributed for any given value of x
- The probability distribution of the errors has constant variance
(continued)
[Figure: population regression line y = β0 + β1x + ε; for a given xi, the plot marks the observed value of y for xi, the predicted value of y for xi, the random error εi for this x value, the intercept β0, and the slope β1]
The estimated (sample) regression line is

ŷi = b0 + b1x

where b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and x is the independent variable.
Least Squares Criterion
b0 and b1 are obtained by finding the values that minimize the sum of the squared residuals:

Σe² = Σ(y − ŷ)² = Σ(y − (b0 + b1x))²

The least squares estimate of the slope:

b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²

algebraic equivalent:

b1 = [ Σxy − (Σx Σy)/n ] / [ Σx² − (Σx)²/n ]

and

b0 = ȳ − b1x̄
Interpretation of the Slope and the Intercept
- b0 is the estimated average value of y when the value of x is zero
- b1 is the estimated change in the average value of y as a result of a one-unit change in x
House Price in $1000s (y)   Square Feet (x)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
Excel Output

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA          df        SS            MS            F         Significance F
Regression      1    18934.9348    18934.9348    11.0848        0.01039
Residual        8    13665.5652     1708.1957
Total           9    32600.5000

              Coefficients   Standard Error   t Stat    P-value    Lower 95%    Upper 95%
Intercept       98.24833       58.03348       1.69296   0.12892    -35.57720    232.07386
Square Feet      0.10977        0.03297       3.32938   0.01039      0.03374      0.18580
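The Excel coefficients can be reproduced from the least squares formulas. A sketch (only the intercept and slope are computed here):

```python
# House-price data from the example: x = square feet, y = price in $1000s
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)

b1 = sxy / sxx           # slope: b1 = Σ(x-x̄)(y-ȳ) / Σ(x-x̄)²
b0 = ybar - b1 * xbar    # intercept: b0 = ȳ - b1·x̄

print(round(b0, 5), round(b1, 5))  # 98.24833 0.10977
```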
Graphical Presentation
House price model: scatter plot and regression line

[Figure: scatter plot of House Price ($1000s) vs. Square Feet with fitted regression line; intercept = 98.248, slope = 0.10977]
Interpretation of the Intercept, b0

house price = 98.24833 + 0.10977 (square feet)

b0 is the estimated average value of y when x = 0. Here no houses had 0 square feet, so b0 = 98.24833 has no practical interpretation.

Interpretation of the Slope Coefficient, b1

house price = 98.24833 + 0.10977 (square feet)

b1 measures the estimated change in the average value of y for a one-unit change in x. Here b1 = 0.10977 tells us that the average house price increases by 0.10977($1000) = $109.77, on average, for each additional square foot of size.
Least Squares Regression Properties
- The sum of the squared residuals Σ(y − ŷ)² is at a minimum (it is minimized by b0 and b1)
- The simple regression line always passes through the mean of the y variable and the mean of the x variable
- The least squares coefficients are unbiased estimates of β0 and β1
Measures of Variation: SST = SSR + SSE

SST = Σ(y − ȳ)²   Total sum of squares
SSR = Σ(ŷ − ȳ)²   Sum of squares regression
SSE = Σ(y − ŷ)²   Sum of squares error

where ȳ = average value of the dependent variable and ŷ = predicted value of y for the given xi.

[Figure: for an observation (xi, yi), the total deviation y − ȳ splits into the explained deviation ŷ − ȳ and the unexplained deviation y − ŷ]
Coefficient of Determination, R²
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. The coefficient of determination is also called R-squared and is denoted as R².

R² = SSR / SST = (sum of squares explained by regression) / (total sum of squares)

where 0 ≤ R² ≤ 1

In the simple (one independent variable) case,

R² = r²

where:
R² = Coefficient of determination
r = Simple correlation coefficient
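A sketch verifying R² for the house-price model both ways, as SSR/SST and as r² (data and fit as in the example):

```python
from math import sqrt

x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = ybar - b1 * xbar

sst = sum((yi - ybar) ** 2 for yi in y)                        # total sum of squares
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
ssr = sst - sse                                                # explained by regression

r = sxy / sqrt(sxx * sst)   # simple correlation coefficient
print(round(ssr / sst, 5), round(r ** 2, 5))  # 0.58082 0.58082
```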
Examples of Approximate R² Values
[Figure: scatter plots of y vs. x illustrating R² values]
- R² = 1: perfect linear relationship between x and y
- 0 < R² < 1: weaker linear relationships between x and y
- R² = 0: no linear relationship between x and y
Excel Output
The regression output (shown earlier) gives R Square = 0.58082:

R² = SSR / SST = 18934.9348 / 32600.5000 = 0.58082

58.08% of the variation in house prices is explained by variation in square feet.
Standard Error of the Estimate
The variation of observations around the regression line is estimated by:

sε = √[ SSE / (n − k − 1) ]

where:
SSE = Sum of squares error
n = Sample size
k = number of independent variables in the model

For simple linear regression (k = 1), sε = √[ SSE / (n − 2) ].

Standard Error of the Regression Slope
The standard error of the regression slope coefficient b1 is estimated by:

s_b1 = sε / √Σ(x − x̄)² = sε / √[ Σx² − (Σx)²/n ]

where sε = √[ SSE / (n − 2) ] is the sample standard error of the estimate.
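Continuing the house-price example, a sketch of both standard errors:

```python
from math import sqrt

x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

s_eps = sqrt(sse / (n - 2))   # standard error of the estimate (k = 1)
s_b1 = s_eps / sqrt(sxx)      # standard error of the slope
print(round(s_eps, 5), round(s_b1, 5))  # 41.33032 0.03297
```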
Excel Output
From the regression output (shown earlier):

sε = 41.33032      (Standard Error)
s_b1 = 0.03297     (Standard Error of the Square Feet coefficient)

[Figure: comparing standard errors — a small sε (or small s_b1) means the observations (or estimated slopes) vary little around the regression line; a large sε (or large s_b1) means they vary widely]
Inference about the Slope: t Test
Test statistic:

t = (b1 − β1) / s_b1        d.f. = n − 2

where:
b1 = Sample regression slope coefficient
β1 = Hypothesized slope
s_b1 = Estimator of the standard error of the slope
t Test Example
Does square footage of the house affect its sales price? (House price data as before, n = 10.)
H0: β1 = 0
HA: β1 ≠ 0

From the Excel output: b1 = 0.10977, s_b1 = 0.03297, t Stat = 3.32938, P-value = 0.01039

t = b1 / s_b1 = 0.10977 / 0.03297 = 3.329

d.f. = 10 − 2 = 8; α/2 = .025; critical values ±tα/2 = ±2.3060

[Figure: t distribution with rejection regions beyond ±2.3060; the test statistic 3.329 falls in the upper rejection region]

Decision: Reject H0
There is sufficient evidence that square footage affects house price
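A sketch of the slope t test with SciPy, starting from the values in the output:

```python
from scipy import stats

b1, s_b1, n = 0.10977, 0.03297, 10   # from the regression output

t_stat = (b1 - 0) / s_b1             # H0: beta1 = 0
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(round(t_stat, 4), round(t_crit, 4), round(p_value, 4))
# t ≈ 3.3294 exceeds 2.3060 and p ≈ 0.0104 < .05: reject H0
```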
Confidence Interval Estimate of the Slope

b1 ± tα/2 · s_b1        d.f. = n − 2

From the Excel output, the 95% confidence interval for the slope is (0.03374, 0.18580):

              Coefficients   Standard Error   t Stat    P-value    Lower 95%    Upper 95%
Intercept       98.24833       58.03348       1.69296   0.12892    -35.57720    232.07386
Square Feet      0.10977        0.03297       3.32938   0.01039      0.03374      0.18580

Since this 95% confidence interval does not include 0, we conclude at the 5% level of significance that square feet is related to house price.
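A sketch reproducing the Lower/Upper 95% values for the slope:

```python
from scipy import stats

b1, s_b1, n = 0.10977, 0.03297, 10
t_crit = stats.t.ppf(0.975, df=n - 2)   # ≈ 2.3060 for d.f. = 8

lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1
print(round(lower, 5), round(upper, 5))  # 0.03374 0.1858
```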
Confidence interval estimate for the mean of y, given a particular xp:

ŷ ± tα/2 · sε · √[ 1/n + (xp − x̄)² / Σ(x − x̄)² ]

Prediction interval estimate for an individual y, given a particular xp:

ŷ ± tα/2 · sε · √[ 1 + 1/n + (xp − x̄)² / Σ(x − x̄)² ]

Interval Estimates for Different Values of x
[Figure: regression line with interval bands; the prediction interval for an individual y, given xp, is wider than the confidence interval for the mean of y, given xp, and both widen as xp moves away from x̄]
Estimation of Mean Values: Example
Confidence interval estimate for E(y|xp): find the 95% confidence interval for the mean price of 2,000-square-foot houses (house price data as before).

Predicted price: ŷ = 98.25 + 0.1098(2000) = 317.85 ($1,000s)

ŷ ± tα/2 · sε · √[ 1/n + (xp − x̄)² / Σ(x − x̄)² ] = 317.85 ± 37.12

Estimation of Individual Values: Example
Prediction interval estimate for y|xp: find the 95% prediction interval for an individual house with 2,000 square feet.

ŷ ± tα/2 · sε · √[ 1 + 1/n + (xp − x̄)² / Σ(x − x̄)² ] = 317.85 ± 102.28
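Both intervals can be sketched end to end. Note the slides' ŷ = 317.85 uses the rounded equation 98.25 + 0.1098x; full-precision coefficients give ŷ ≈ 317.78, while the interval half-widths match:

```python
from math import sqrt
from scipy import stats

x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_eps = sqrt(sse / (n - 2))
t_crit = stats.t.ppf(0.975, df=n - 2)

xp = 2000
y_hat = b0 + b1 * xp                    # ≈ 317.78 (317.85 with rounded coefficients)
h = 1 / n + (xp - xbar) ** 2 / sxx
ci_half = t_crit * s_eps * sqrt(h)      # half-width of CI for the mean of y
pi_half = t_crit * s_eps * sqrt(1 + h)  # half-width of PI for an individual y
print(round(ci_half, 2), round(pi_half, 2))  # 37.12 102.28
```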
Residual Analysis
Purposes:
- Examine for linearity assumption
- Examine for constant variance for all levels of x
- Evaluate normal distribution assumption

Graphical Analysis of Residuals
- Can plot residuals vs. x
- Can create histogram of residuals to check for normality
[Figure: residual plots vs. x — a patterned residual plot indicates the relationship is not linear, while a random scatter indicates linearity; a funnel shape indicates non-constant variance, while an even band indicates constant variance]
Excel Output
RESIDUAL OUTPUT

Observation   Predicted House Price   Residuals
1                 251.92316            -6.923162
2                 273.87671            38.12329
3                 284.85348            -5.853484
4                 304.06284             3.937162
5                 218.99284           -19.99284
6                 268.38832           -49.38832
7                 356.20251            48.79749
8                 367.17929           -43.17929
9                 254.66740            64.33264
10                284.85348           -29.85348

[Figure: residuals plotted against Square Feet, scattered around 0]
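The residual column can be reproduced directly (a sketch; residuals are observed minus predicted values):

```python
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# residual = observed y minus predicted y
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print([round(e, 5) for e in residuals])  # first entry: -6.92316, matching the output
```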
Summary
- Introduced correlation analysis
- Discussed correlation to measure the strength of a linear association
- Introduced simple linear regression analysis
- Calculated the coefficients for the simple linear regression equation
- Described measures of variation (R² and sε)
- Addressed assumptions of regression and correlation