COURSE 6 ECONOMETRICS 2009 Regression

The Simple Linear
Regression Model
MDA course 6
Purpose of Regression and Correlation Analysis
• Regression Analysis is Used Primarily for Prediction
  A statistical model used to predict the values of a dependent or response variable based on values of at least one independent or explanatory variable
• Correlation Analysis is Used to Measure Strength of the Association Between Numerical Variables
The Scatter Diagram
Plot of all (Xi, Yi) pairs
[Figure: example scatter diagram]
Types of Regression Models
[Figure: four scatter-plot panels: Positive Linear Relationship; Relationship NOT Linear; Negative Linear Relationship; No Relationship]

Simple Linear Regression Model
• Relationship Between Variables Is a Linear Function
• The Straight Line that Best Fits the Data
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
$Y_i$ = Dependent (Response) Variable
$X_i$ = Independent (Explanatory) Variable
$\beta_0$ = Y intercept
$\beta_1$ = Slope
$\varepsilon_i$ = Random Error
Error Variable: Required Conditions
• The error $\varepsilon$ is a critical part of the regression model.
• Four requirements involving the distribution of $\varepsilon$ must be satisfied.
  • The probability distribution of $\varepsilon$ is normal.
  • The mean of $\varepsilon$ is zero: $E(\varepsilon) = 0$.
  • The standard deviation of $\varepsilon$ is $\sigma_\varepsilon$ for all values of x.
  • The errors associated with different values of y are all independent.
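Population Linear Regression Model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \quad \text{(observed value)}$$
$$\mu_{Y|X} = \beta_0 + \beta_1 X_i \quad \text{(population regression line)}$$
$\varepsilon_i$ = Random Error
[Figure: observed values plotted around the population regression line, axes X and Y]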
Sample Linear Regression Model
$$\hat{Y}_i = b_0 + b_1 X_i$$
$\hat{Y}_i$ = Predicted Value of Y for observation i
$X_i$ = Value of X for observation i
$b_0$ = Sample Y-intercept used as estimate of the population $\beta_0$
$b_1$ = Sample Slope used as estimate of the population $\beta_1$
To calculate the estimates of the coefficients that minimize the differences between the data points and the line, use the formulas:
$$b_1 = \frac{\operatorname{cov}(X, Y)}{s_x^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$
The regression equation that estimates the equation of the first order linear model is:
$$\hat{y} = b_0 + b_1 x$$
REGRESSION COEFFICIENTS
• To calculate the estimates of the coefficients that minimize the sum of squared differences between the data points and the line, use the formulas (least squares method):
$$b_1 = \frac{n \sum X_i Y_i - \left(\sum X_i\right)\left(\sum Y_i\right)}{n \sum X_i^2 - \left(\sum X_i\right)^2} \quad \text{and} \quad b_0 = \bar{Y} - b_1 \bar{X}$$
• EXCEL offers several approaches to regression, including trendlines, regression functions and the regression analysis tool
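The covariance form and the summation form of the slope describe the same quantity. A short derivation, assuming the sample covariance and the sample variance are both computed with the divisor n - 1 (the slides do not state the divisor), may make the link explicit:

```latex
\begin{aligned}
b_1 &= \frac{\operatorname{cov}(X, Y)}{s_x^2}
     = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
            {\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2} \\
    &= \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2}
     = \frac{n \sum X_i Y_i - \left(\sum X_i\right)\left(\sum Y_i\right)}
            {n \sum X_i^2 - \left(\sum X_i\right)^2}
\end{aligned}
```

The last step multiplies numerator and denominator by n and uses $n\bar{X} = \sum X_i$ and $n\bar{Y} = \sum Y_i$.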
Simple Linear Regression Equation: Example
You wish to examine the relationship between the square footage of produce stores and their annual sales. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best.

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Scatter Diagram Example
[Figure: scatter plot of Annual Sales ($000) against Square Feet for the 7 stores]
Excel Output
Equation for the Best Straight Line
$$\hat{Y}_i = b_0 + b_1 X_i = 1636.415 + 1.487 X_i$$
From Excel Printout:
               Coefficients
Intercept      1636.414726
X Variable 1   1.486633657
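The coefficients can also be reproduced outside Excel. The following is a minimal sketch in plain Python that applies the least squares summation formulas to the seven stores; the function name `least_squares` and the variable names are illustrative choices, not part of the slides.

```python
# Data from the produce-store example: square feet and annual sales ($000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def least_squares(x, y):
    """Return (b0, b1) of the fitted line y = b0 + b1 * x."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope from the summation form of the least squares formula.
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: the fitted line passes through the point of means.
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares(square_feet, annual_sales)
print(f"b0 = {b0:.6f}, b1 = {b1:.9f}")
```

The printed values should agree with the Intercept and X Variable 1 entries of the Excel printout up to rounding.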
Graph of the Best Straight Line
[Figure: the 7 stores and the fitted line, Annual Sales ($000) against Square Feet]
Interpreting the Results
$$\hat{Y}_i = 1636.415 + 1.487 X_i$$
The slope of 1.487 means that for each increase of one unit in X, Y is estimated to increase by 1.487 units.
For each increase of 1 square foot in the size of the store, the model predicts that expected annual sales increase by 1.487 thousand dollars, that is, by $1,487.
Inferences about the Slope: t Test
• t Test for a Population Slope
  Is There a Linear Relationship Between X and Y?
• Null and Alternative Hypotheses
  H0: $\beta_1 = 0$ (No Linear Relationship)
  H1: $\beta_1 \neq 0$ (Linear Relationship)
• Test Statistic:
$$t = \frac{b_1 - \beta_1}{S_{b_1}}, \quad \text{where } S_{b_1} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \quad \text{and } df = n - 2$$
Standard Error of Estimate
$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$$
The standard deviation of the variation of observations around the regression line
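As an illustration of the formula, the sketch below computes the standard error of estimate for the produce stores in plain Python. It takes the coefficients from the Excel printout as given; the function name `standard_error_of_estimate` is an illustrative choice.

```python
import math

# Data from the produce-store example: square feet and annual sales ($000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

# Coefficients as reported in the Excel printout.
b0 = 1636.414726
b1 = 1.486633657


def standard_error_of_estimate(x, y, b0, b1):
    """Return S_YX = sqrt(SSE / (n - 2)) for the line y = b0 + b1 * x."""
    # SSE: sum of squared residuals around the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))


s_yx = standard_error_of_estimate(square_feet, annual_sales, b0, b1)
print(f"S_YX = {s_yx:.2f}")
```

The result is in the units of Y, here thousands of dollars of annual sales.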
Example: Produce Stores
Data for 7 Stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Regression Model Obtained:
$$\hat{Y}_i = 1636.415 + 1.487 X_i$$
The slope of this model is 1.487.
Is there a linear relationship between the square footage of a store and its annual sales?
Inferences about the Slope: t Test Example
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
$\alpha = .05$
df = 7 - 2 = 5
Critical Value(s): ±2.5706 (rejection region of .025 in each tail)

Test Statistic (From Excel Printout):
               t Stat      P-value
Intercept      3.6244333   0.0151488
X Variable 1   9.009944    0.0002812

Decision: Reject H0, since t = 9.0099 > 2.5706
Conclusion: There is evidence of a linear relationship.
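The test statistic can be rebuilt step by step from the raw data. The sketch below, in plain Python, follows the formulas for the slope, the standard error of estimate and the standard error of the slope; the critical value 2.5706 is taken from the slide rather than computed, and the function name `slope_t_test` is an illustrative choice.

```python
import math

# Data from the produce-store example: square feet and annual sales ($000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def slope_t_test(x, y):
    """Return (b1, s_b1, t_stat, df) for testing H0: beta1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))   # standard error of estimate
    s_b1 = s_yx / math.sqrt(ssx)      # standard error of the slope
    t_stat = (b1 - 0.0) / s_b1        # hypothesized slope is 0 under H0
    return b1, s_b1, t_stat, n - 2


# Two-tailed critical value for alpha = .05 (taken from the slide, not computed).
T_CRITICAL = 2.5706

b1, s_b1, t_stat, df = slope_t_test(square_feet, annual_sales)
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.4f}, df = {df}, reject H0: {reject_h0}")
```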
Inferences about the Slope: Confidence Interval Example
Confidence Interval Estimate of the Slope
$$b_1 \pm t_{n-2} S_{b_1}$$
Excel Printout for Produce Stores
               Lower 95%     Upper 95%
Intercept      475.810926    2797.01853
X Variable 1   1.06249037    1.91077694

At the 95% level of confidence, the confidence interval for the slope is (1.062, 1.911), which does not include 0.
Conclusion: There is a significant linear relationship between annual sales and the size of the store.
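The interval for the slope can be checked by hand from the raw data. The sketch below is one way to do so in plain Python; it takes the t value 2.5706 from the slides, so the bounds may differ from the Excel printout in the last digits. The function name `slope_confidence_interval` is an illustrative choice.

```python
import math

# Data from the produce-store example: square feet and annual sales ($000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def slope_confidence_interval(x, y, t_crit):
    """Return (lower, upper) for b1 +/- t_crit * S_b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(ssx)
    half_width = t_crit * s_b1
    return b1 - half_width, b1 + half_width


# t value for 95% confidence and df = n - 2 = 5, as given on the slides.
lower, upper = slope_confidence_interval(square_feet, annual_sales, 2.5706)
excludes_zero = not (lower <= 0.0 <= upper)
print(f"({lower:.3f}, {upper:.3f}), excludes 0: {excludes_zero}")
```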
Measures of Variation: The Sum of Squares
SST = Total Sum of Squares
• measures the variation of the $Y_i$ values around their mean $\bar{Y}$
SSR = Regression Sum of Squares
• explained variation attributable to the relationship between X and Y
SSE = Error Sum of Squares
• variation attributable to factors other than the relationship between X and Y
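Measures of Variation: The Sum of Squares
$$SST = \sum (Y_i - \bar{Y})^2$$
$$SSR = \sum (\hat{Y}_i - \bar{Y})^2$$
$$SSE = \sum (Y_i - \hat{Y}_i)^2$$
[Figure: at a given $X_i$, the deviations of $Y_i$ from $\bar{Y}$ (SST), of $\hat{Y}_i$ from $\bar{Y}$ (SSR) and of $Y_i$ from $\hat{Y}_i$ (SSE)]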
Measures of Variation
The Sum of Squares: Example
Excel Output for Produce Stores

             df   SS
Regression   1    30380456.12   (SSR)
Residual     5    1871199.595   (SSE)
Total        6    32251655.71   (SST)
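The three sums of squares can be recomputed from the raw data to confirm that SST = SSR + SSE. The following plain Python sketch is one way to do it; the function name `sums_of_squares` is an illustrative choice.

```python
# Data from the produce-store example: square feet and annual sales ($000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def sums_of_squares(x, y):
    """Return (sst, ssr, sse) for the least squares line fitted to x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                   # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)              # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))     # unexplained variation
    return sst, ssr, sse


sst, ssr, sse = sums_of_squares(square_feet, annual_sales)
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
```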

 Testing the validity of the model
 We pose the question:

Is there at least one independent variable linearly related to the dependent variable?
• To answer the question we test the hypothesis
  H0: $\beta_1 = 0$
  H1: At least one $\beta_i$ is not equal to 0
• If at least one $\beta_i$ is not equal to zero, the model is valid.

ANOVA - Summary Table

Source of Variation   Degrees of Freedom   Sum of Squares    Mean Square (Variance)   F Test Statistic
Explained (Factor)    k - 1                SSR               MSR = SSR/(k - 1)        F = MSR/MSE
Within (Error)        n - k                SSE               MSE = SSE/(n - k)
Total                 n - 1                SST = SSR + SSE

Here k is the number of estimated coefficients; in simple linear regression k = 2, so the degrees of freedom are 1 and n - 2.

• To test these hypotheses we perform an analysis of variance procedure.
• The F test
  • Construct the F statistic:
$$F = \frac{MSR}{MSE}, \qquad MSR = \frac{SSR}{k-1}, \qquad MSE = \frac{SSE}{n-k}$$
  • [Variation in y] = SSR + SSE. Large F results from a large SSR. Then, much of the variation in y is explained by the regression model. The null hypothesis should be rejected; thus, the model is valid.
  • Rejection region: $F > F_{\alpha,\, k-1,\, n-k}$
  • Required conditions must be satisfied.
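As a worked instance of the F test, the sketch below plugs the sums of squares from the Excel output for the produce stores into the ANOVA formulas. It assumes k = 2 estimated coefficients, which matches the degrees of freedom 1 and 5 in that output. The critical value is not taken from an F table: with one numerator degree of freedom it equals the square of the two-tailed t critical value, so the slide's 2.5706 is squared instead.

```python
# Sums of squares from the Excel output for the produce stores.
ssr = 30380456.12
sse = 1871199.595

n = 7  # observations
k = 2  # estimated coefficients b0 and b1 (assumed, consistent with df = 1 and 5)

msr = ssr / (k - 1)   # mean square regression
mse = sse / (n - k)   # mean square error
f_stat = msr / mse

# With 1 numerator degree of freedom, F critical = (two-tailed t critical) ** 2.
f_critical = 2.5706 ** 2

model_is_valid = f_stat > f_critical
print(f"F = {f_stat:.3f}, critical value = {f_critical:.3f}, reject H0: {model_is_valid}")
```

In simple linear regression the F statistic is the square of the t statistic for the slope, so both tests lead to the same decision.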
The Coefficient of Determination
$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$
Measures the proportion of variation that is explained by the independent variable X in the regression model
Coefficients of Determination (r²) and Correlation (r)
[Figure: four panels, each showing data and the fitted line $\hat{Y}_i = b_0 + b_1 X_i$: r² = 1, r = +1; r² = 1, r = -1; r² = .8, r = +0.9; r² = 0, r = 0]
Measures of Variation: Example
Excel Output for Produce Stores

Regression Statistics
Multiple R          0.9705572
R Square            0.94198129
Adjusted R Square   0.93037754
Standard Error      611.751517   (Syx)
Observations        7

r² = .94
94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage
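The regression statistics follow directly from the sums of squares. The sketch below recomputes them from the Excel ANOVA values; the adjusted R Square line uses the standard formula 1 - (1 - r²)(n - 1)/(n - 2) for one explanatory variable, which the slides report but do not define, and the sign of r is taken from the positive slope.

```python
import math

# Sums of squares from the Excel output for the produce stores.
ssr = 30380456.12
sst = 32251655.71
n = 7          # observations
slope = 1.487  # fitted slope; only its sign is used below

r_squared = ssr / sst
# The correlation coefficient carries the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)
# Adjusted r-squared for a model with one explanatory variable (assumed formula).
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)

print(f"r^2 = {r_squared:.4f}, r = {r:.4f}, adjusted r^2 = {adjusted_r_squared:.4f}")
```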
Estimation of Predicted Values
Confidence Interval Estimate for $\mu_{Y|X}$, the Mean of Y given a particular $X_i$
$$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
• $t_{n-2}$: t value from table with df = n - 2
• $S_{YX}$: standard error of the estimate
• The size of the interval varies according to the distance of $X_i$ from the mean, $\bar{X}$.
Estimation of Predicted Values
Confidence Interval Estimate for Individual Response $Y_i$ at a Particular $X_i$
$$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
The added 1 under the square root makes this interval wider than the interval for the mean of Y.
Interval Estimates for Different Values of X
[Figure: regression line with two bands, the confidence interval for an individual $Y_i$ and the confidence interval for the mean of Y; $\bar{X}$ and a given X are marked on the X axis]
Example: Produce Stores
Data for 7 Stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Predict the annual sales for a store with 2000 square feet.
Regression Model Obtained:
$$\hat{Y}_i = 1636.415 + 1.487 X_i$$
Estimation of Predicted Values: Example
Confidence Interval Estimate for $\mu_{Y|X}$
Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet

Predicted Sales $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
$$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \approx 4610.45 \pm 612.7$$
Confidence interval for mean Y
Estimation of Predicted Values: Example
Confidence Interval Estimate for Individual Y
Find the 95% confidence interval for annual sales of one particular store of 2,000 square feet

Predicted Sales $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
$$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \approx 4610.45 \pm 1687.7$$
Confidence interval for individual Y
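Both interval estimates at 2,000 square feet can be computed directly from the seven data points. The sketch below is a plain Python illustration of the two formulas; it takes the t value 2.5706 from the slides and uses unrounded coefficients, so the point prediction may differ slightly from the value obtained with the rounded equation. The function name `interval_half_widths` is an illustrative choice.

```python
import math

# Data from the produce-store example: square feet and annual sales ($000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def interval_half_widths(x, y, x_new, t_crit):
    """Return (y_hat, half_width_mean, half_width_individual) at x_new."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_new
    # Term shared by both intervals: 1/n + (x_new - x_bar)^2 / SSX.
    h = 1 / n + (x_new - x_bar) ** 2 / ssx
    half_mean = t_crit * s_yx * math.sqrt(h)            # mean of Y at x_new
    half_individual = t_crit * s_yx * math.sqrt(1 + h)  # one individual Y at x_new
    return y_hat, half_mean, half_individual


y_hat, half_mean, half_individual = interval_half_widths(
    square_feet, annual_sales, x_new=2000, t_crit=2.5706
)
print(f"mean of Y:    {y_hat:.2f} +/- {half_mean:.2f}")
print(f"individual Y: {y_hat:.2f} +/- {half_individual:.2f}")
```

The interval for an individual store is always the wider of the two, because it adds the variation of a single observation around the line to the uncertainty in the line itself.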
