Professional Documents
Culture Documents
y = 0 + 1x + u
Some Terminology
In the simple linear regression model, where y = 0 + 1x + u, we typically refer to y as the
Independent Variable, or Right-Hand Side Variable, or Explanatory Variable, or Regressor, or Covariate, or Control Variables
Economics 20 - Prof. Anderson 3
A Simple Assumption
The average value of u, the error term, in the population is 0. That is, E(u) = 0 This is not a restrictive assumption, since we can always use 0 to normalize E(u) to 0
Economics 20 - Prof. Anderson 4
E(y|x) as a linear function of x, where for any x the distribution of y is centered about E(y|x)
y
f(y)
.
x1 x2
Economics 20 - Prof. Anderson
. E(y|x) = + x
0 1
Population regression line, sample data points and the associated error terms
y y4 E(y|x) = 0 + 1x . u4 {
y3 y2
u2 {.
.} u3
y1
} u1
x1
x2
x3
x4
x
8
n n
( y
n i= 1 n i= 1
0 1 xi = 0
xi yi 0 1 xi = 0
Economics 20 - Prof. Anderson 12
y = 0 + 1 x , or 0 = y 1 x
Economics 20 - Prof. Anderson 13
( (
xi ( yi y ) = 1 xi ( xi x )
i =1 n i =1 2 ( xi x )( yi y ) = 1 ( xi x ) i =1 i =1
Economics 20 - Prof. Anderson 14
1 =
( x x )( y
i =1 i n i =1 i n
y)
2
( x x)
i =1
More OLS
Intuitively, OLS is fitting a line through the sample points such that the sum of squared residuals is as small as possible, hence the term least squares The residual, , is an estimate of the error term, u, and is the difference between the fitted line (sample regression function) and the sample point
Economics 20 - Prof. Anderson 17
Sample regression line, sample data points and the associated estimated error terms
y y4 4 {
.
y = 0 + 1 x
y3 y2
2 {.
.} 3
y1
} 1 .
x1 x2 x3 x4 x
18 Economics 20 - Prof. Anderson
i ) = yi 0 1 xi (u
2 i =1 i =1
Economics 20 - Prof. Anderson
19
(y
n i =1 n i
1 xi = 0 0
xi yi 0 1 xi = 0
i =1
Economics 20 - Prof. Anderson 20
u
i =1
=0
x u
i =1 i
=0
y = 0 + 1 x
Economics 20 - Prof. Anderson 22
More terminology
We can think of each observatio n as being made up of an explained part, and an unexplaine d part, yi = yi + ui We then define the following :
( y y ) is the total sum of squares (SST) ( y y ) is the explained sum of squares (SSE) u is the residual sum of squares (SSR)
2 2 i i 2 i
Economics 20 - Prof. Anderson 23
24
Goodness-of-Fit
How do we think about how well our sample regression line fits our sample data? Can compute the fraction of the total sum of squares (SST) that is explained by the model, call this the R-squared of regression R2 = SSE/SST = 1 SSR/SST
Economics 20 - Prof. Anderson 25
Unbiasedness of OLS
Assume the population model is linear in parameters as y = 0 + 1x + u Assume we can use a random sample of size n, {(xi, yi): i=1, 2, , n}, from the population model. Thus we can write the sample model yi = 0 + 1xi + ui Assume E(u|x) = 0 and thus E(ui|xi) = 0 Assume there is variation in the xi
Economics 20 - Prof. Anderson 27
1
2 x
( x x) y =
i
2 x
, where
s ( xi x )
28
( x x ) y = ( x x )( + x ( x x ) + ( x x ) x + ( x x )u = ( x x ) + ( x x )x + ( x x )u
i i i i 0 0 i 1 i i i 0 i 1 i i i i
Economics 20 - Prof. Anderson
1 i
+ ui ) =
29
( x x ) = 0, ( x x)x = ( x x)
i i i i 2 1 x
( x x )u +
i
2 x
30
( )
Unbiasedness Summary
The OLS estimates of 1 and 0 are unbiased Proof of unbiasedness depends on our 4 assumptions if any assumption fails, then OLS is not necessarily unbiased Remember unbiasedness is a description of the estimator in a given sample we may be near or far from the true parameter
Economics 20 - Prof. Anderson 32
Homoskedastic Case
y
f(y|x)
.
x1 x2
Economics 20 - Prof. Anderson
. E(y|x) = + x
0 1
35
Heteroskedastic Case
f(y|x)
y
.
x1 x2 x3
.
Economics 20 - Prof. Anderson
E(y|x) = 0 + 1x
x
36
( )
d i2Var ( ui )
2
1 d = sx2
2
d i2 =
1 2 2 = Var 1 sx = 2 2 sx sx
Economics 20 - Prof. Anderson
( )
37
) (
( )
se 1 = / ( xi x )
( )
41