
1 Simple Linear Regression

Simple linear regression is probably the starting point for constructing econometric theory. The idea behind regression is to explain the dependence of one variable $y_i$, usually referred to as the dependent variable, on another variable $x_i$ (the independent variable) using the simple form
$$y_i = \alpha + \beta x_i + \varepsilon_i,$$
where $\alpha$ is the intercept, $\beta$ is the slope, and $\varepsilon_i$ is an unobserved random variable called the error. The subscript $i$ indicates that the observation belongs to individual $i$. The parameters $\alpha$ and $\beta$ are fixed and unknown to the researcher. Since $\alpha$ and $E\varepsilon_i$ are not separately identified, the restriction $E\varepsilon_i = 0$ is imposed. Then
$$Ey_i = \alpha + \beta x_i,$$
but in fact what we have is the conditional mean
$$E(y_i \mid x_i) = \alpha + \beta x_i,$$
which is assumed to be linear in $x_i$. In real applications it is almost never exactly linear; we just hope that the specified conditional mean is a reasonable approximation. If we assume that the pair $(y_i, x_i)$ has a bivariate normal distribution, then $E(y_i \mid x_i)$ is indeed linear in $x_i$. Bivariate normality of $(y_i, x_i)$ seems to be a reasonable assumption for a number of applications, including, for example, dental care expenditure and income.
The term regression goes back to the work of Francis Galton, who investigated the relationship between the heights of fathers and sons. In general, tall fathers tend to have tall sons and short fathers tend to have short sons. However, very tall fathers tend to have sons shorter than themselves and very short fathers tend to have sons taller than themselves. Galton called this regression towards the mean, and that is why we call this model linear regression.

1.1 Least Squares Estimator

Assume that $n$ independent individuals are observed, $i = 1, \ldots, n$. Define the following quantities:
$$\bar x = \frac{1}{n}\sum_{i=1}^n x_i \quad \text{and} \quad \bar y = \frac{1}{n}\sum_{i=1}^n y_i,$$
$$S_{xx} = \sum_{i=1}^n (x_i - \bar x)^2, \qquad S_{yy} = \sum_{i=1}^n (y_i - \bar y)^2, \qquad S_{xy} = \sum_{i=1}^n (x_i - \bar x)(y_i - \bar y).$$
Also define the residual sum of squares as
$$RSS = \sum_{i=1}^n \bigl(y_i - (\alpha + \beta x_i)\bigr)^2.$$

The least squares estimates of the parameters $\alpha$ and $\beta$ are defined as the values $\hat\alpha$ and $\hat\beta$ that minimize $RSS$.

Theorem
$$\min_a \sum_{i=1}^n (x_i - a)^2 = \sum_{i=1}^n (x_i - \bar x)^2.$$

We leave the proof as an exercise. Then, for any given $\hat\beta$, the minimizing value $\hat\alpha$ is
$$\hat\alpha = \frac{1}{n}\sum_{i=1}^n \bigl(y_i - \hat\beta x_i\bigr) = \bar y - \hat\beta \bar x.$$
Then
$$\sum_{i=1}^n \bigl(y_i - \hat\alpha - \hat\beta x_i\bigr)^2 = \sum_{i=1}^n \bigl((y_i - \bar y) - \hat\beta (x_i - \bar x)\bigr)^2 = S_{yy} - 2\hat\beta S_{xy} + \hat\beta^2 S_{xx}.$$
Taking the derivative of this with respect to $\hat\beta$ and setting it to zero leads to the solution
$$\hat\beta = \frac{S_{xy}}{S_{xx}},$$
which is a minimum since the coefficient on $\hat\beta^2$ is positive. The least squares estimates of $\alpha$ and $\beta$ are therefore
$$\hat\beta_{OLS} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\alpha_{OLS} = \bar y - \hat\beta_{OLS}\,\bar x.$$
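As a quick numerical illustration of these formulas, here is a minimal Python sketch (the simulated data, sample size, and true parameter values are hypothetical; numpy is assumed to be available) that computes $\hat\beta_{OLS}$ and $\hat\alpha_{OLS}$ directly from $S_{xy}$, $S_{xx}$ and the sample means, and cross-checks the result against numpy's built-in least squares fit.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    alpha_true, beta_true = 1.0, 0.5            # hypothetical true parameters
    x = rng.normal(3.0, 1.0, size=n)
    y = alpha_true + beta_true * x + rng.normal(0.0, 1.0, size=n)

    # Sample means and the S-quantities defined above
    x_bar, y_bar = x.mean(), y.mean()
    S_xx = np.sum((x - x_bar) ** 2)
    S_xy = np.sum((x - x_bar) * (y - y_bar))

    beta_hat = S_xy / S_xx                      # slope: S_xy / S_xx
    alpha_hat = y_bar - beta_hat * x_bar        # intercept: y_bar - beta_hat * x_bar
    print(alpha_hat, beta_hat)

    # Cross-check against numpy's polynomial least squares fit
    print(np.polyfit(x, y, deg=1))              # returns [slope, intercept]

The two sets of estimates should agree up to floating-point error.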

1.2 Best Linear Unbiased Estimator

When there is no intercept in the regression, the estimator of the slope coefficient collapses to
$$\hat\beta_{OLS} = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2}.$$
Next we show that in this model the least squares estimator is the best linear unbiased estimator (BLUE). The estimator is linear because it can be presented in the form
$$\hat\beta = \sum_{i=1}^n d_i y_i.$$
It is also unbiased:
$$E\hat\beta = \frac{\sum_{i=1}^n x_i\, E y_i}{\sum_{i=1}^n x_i^2} = \frac{\beta \sum_{i=1}^n x_i^2}{\sum_{i=1}^n x_i^2} = \beta.$$
Since for a general linear estimator
$$E\hat\beta = \sum_{i=1}^n d_i\, E y_i = \beta \sum_{i=1}^n d_i x_i,$$
the unbiasedness of the estimator means that
$$\sum_{i=1}^n d_i x_i = 1.$$
Assume that the $y_i$ are independent with common variance $\sigma^2$. Then
$$\operatorname{Var}\hat\beta = \sigma^2 \sum_{i=1}^n d_i^2,$$
and to find the $d_i$ for which the variance of the estimator $\hat\beta$ is smallest we have to minimize $\sum_{i=1}^n d_i^2\,\sigma^2$ subject to the constraint $\sum_{i=1}^n d_i x_i = 1$. Minimize
$$\sum_{i=1}^n d_i^2 - \lambda\left(\sum_{i=1}^n d_i x_i - 1\right).$$
The derivative with respect to $d_i$ gives
$$2 d_i - \lambda x_i = 0, \qquad d_i = \frac{\lambda x_i}{2}.$$
Multiplying by $x_i$ and summing over $i$,
$$\sum_{i=1}^n d_i x_i = \frac{\lambda}{2}\sum_{i=1}^n x_i^2.$$
Then, since $\sum_{i=1}^n d_i x_i = 1$,
$$\lambda = \frac{2}{\sum_{i=1}^n x_i^2}.$$
Hence
$$d_i = \frac{x_i}{\sum_{i=1}^n x_i^2},$$
which gives the least squares estimator
$$\hat\beta_{OLS} = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2}.$$
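A small simulation can illustrate the BLUE property. The sketch below (hypothetical simulated data and parameter values; numpy assumed) compares, over many replications of the no-intercept model, the variance of the least squares slope with the variance of another linear unbiased estimator, $\tilde\beta = \bar y / \bar x$, whose weights $d_i = 1/(n\bar x)$ also satisfy $\sum_i d_i x_i = 1$.

    import numpy as np

    rng = np.random.default_rng(1)
    n, beta_true, sigma = 50, 2.0, 1.0          # hypothetical settings
    x = rng.uniform(1.0, 5.0, size=n)           # fixed design, reused in every replication

    ols, alt = [], []
    for _ in range(5000):
        y = beta_true * x + rng.normal(0.0, sigma, size=n)    # no-intercept model
        ols.append(np.sum(x * y) / np.sum(x ** 2))            # d_i = x_i / sum(x_i^2)
        alt.append(y.mean() / x.mean())                       # d_i = 1 / (n * x_bar), also unbiased

    print("mean OLS:", np.mean(ols), " mean alt:", np.mean(alt))   # both are close to beta_true
    print("var  OLS:", np.var(ols),  " var  alt:", np.var(alt))    # the OLS variance is the smaller one

Both estimators are unbiased, but the least squares weights give the smaller sampling variance, as the derivation above predicts.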

1.3 Conditional Normal Model

We made very few assumptions to derive the least squares estimator. Specifically, we specified the conditional mean of $y_i$ and its variance, and assumed statistical independence of the individual observations. As a result of these assumptions we were able to derive the least squares estimator and show its unbiasedness and efficiency.

Now let us make a much stronger assumption, that
$$y_i \sim N\!\left(\alpha + \beta x_i,\ \sigma^2\right), \quad i = 1, \ldots, n.$$
Alternatively it can be defined as
$$y_i = \alpha + \beta x_i + \varepsilon_i, \qquad \varepsilon_i \overset{iid}{\sim} N\!\left(0, \sigma^2\right), \quad i = 1, \ldots, n.$$
Then the joint pdf is given by
$$f\!\left(y \mid \alpha, \beta, \sigma^2\right) = \prod_{i=1}^n f\!\left(y_i \mid \alpha, \beta, \sigma^2\right) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma^2}\right] = \frac{1}{\left(2\pi\sigma^2\right)^{n/2}} \exp\!\left[-\frac{\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2}{2\sigma^2}\right].$$
To find the maximum likelihood estimates we form the log-likelihood function
$$\log f\!\left(y \mid \alpha, \beta, \sigma^2\right) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2.$$
From this we see that for any fixed value of $\sigma^2$ the log-likelihood function is maximized where the least squares function $\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2$ is minimized, and therefore the ML estimates of the parameters $\alpha$ and $\beta$ should equal the OLS estimates:
$$\hat\beta_{ML} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\alpha_{ML} = \bar y - \hat\beta_{ML}\,\bar x.$$
Substituting these into the log-likelihood function,
$$-\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n \bigl(y_i - \hat\alpha - \hat\beta x_i\bigr)^2,$$
we maximize it with respect to $\sigma^2$. The derivative with respect to $\sigma^2$ is set to zero,
$$-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n \bigl(y_i - \hat\alpha - \hat\beta x_i\bigr)^2 = 0,$$
which gives the solution
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \bigl(y_i - \hat\alpha - \hat\beta x_i\bigr)^2,$$
which is the RSS evaluated at the least squares line divided by the sample size.

Define the residuals from the regression to be
$$\hat\varepsilon_i = y_i - \hat\alpha - \hat\beta x_i.$$
Since $\hat\alpha$ and $\hat\beta$ are unbiased estimates, $E\hat\varepsilon_i = 0$. It can be shown that
$$E\hat\sigma^2 = \frac{n-2}{n}\,\sigma^2.$$
Thus an unbiased estimator of $\sigma^2$ would be
$$s^2 = \frac{n}{n-2}\,\hat\sigma^2.$$
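The downward bias of $\hat\sigma^2$ and the effect of the degrees-of-freedom correction can be checked numerically. The following sketch (simulated data with hypothetical parameter values; numpy assumed) averages $\hat\sigma^2 = RSS/n$ and $s^2 = RSS/(n-2)$ over many replications.

    import numpy as np

    rng = np.random.default_rng(2)
    n, alpha, beta, sigma2 = 20, 1.0, 0.5, 4.0    # hypothetical true values
    x = rng.normal(0.0, 1.0, size=n)              # fixed regressor

    sig2_ml, s2 = [], []
    for _ in range(20000):
        y = alpha + beta * x + rng.normal(0.0, np.sqrt(sigma2), size=n)
        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        a = y.mean() - b * x.mean()
        rss = np.sum((y - a - b * x) ** 2)
        sig2_ml.append(rss / n)                   # ML estimate, biased downward
        s2.append(rss / (n - 2))                  # degrees-of-freedom corrected estimate

    print(np.mean(sig2_ml))   # close to (n - 2) / n * sigma2 = 3.6
    print(np.mean(s2))        # close to sigma2 = 4.0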

Theorem
The sampling distributions of the estimators $\hat\alpha$, $\hat\beta$ for the conditional normal model are
$$\hat\alpha \sim N\!\left(\alpha,\ \frac{\sigma^2}{n S_{xx}}\sum_{i=1}^n x_i^2\right), \qquad \hat\beta \sim N\!\left(\beta,\ \frac{\sigma^2}{S_{xx}}\right), \qquad \operatorname{cov}\!\left(\hat\alpha, \hat\beta\right) = -\frac{\sigma^2 \bar x}{S_{xx}}.$$
Also $\hat\alpha$, $\hat\beta$ and $s^2$ are independent, and
$$\frac{(n-2)\,s^2}{\sigma^2} \sim \chi^2_{n-2}.$$

Proof
Since $\hat\alpha$ and $\hat\beta$ are linear combinations of the normally distributed $y_i$, they must be normal. Then
$$E\hat\beta_{OLS} = \frac{\sum_{i=1}^n (x_i - \bar x)\, E(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\beta \sum_{i=1}^n (x_i - \bar x)(x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2} = \beta,$$
and, using the representation $\hat\beta_{OLS} = \sum_{i=1}^n (x_i - \bar x) y_i / S_{xx}$ derived below,
$$\operatorname{Var}\hat\beta = \frac{\sum_{i=1}^n (x_i - \bar x)^2 \operatorname{Var}(y_i)}{\left(\sum_{i=1}^n (x_i - \bar x)^2\right)^2} = \frac{\sigma^2 \sum_{i=1}^n (x_i - \bar x)^2}{\left(\sum_{i=1}^n (x_i - \bar x)^2\right)^2} = \frac{\sigma^2}{S_{xx}}.$$

It can be shown that the estimator $\hat\alpha_{OLS}$ can be expressed as
$$\hat\alpha_{OLS} = \sum_{i=1}^n c_i y_i, \qquad \text{where} \qquad c_i = \frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{S_{xx}}.$$
Then it can be shown that $E\hat\alpha_{OLS} = \alpha$, that $\operatorname{Var}\hat\alpha_{OLS} = \frac{\sigma^2}{n S_{xx}}\sum_{i=1}^n x_i^2$, and also that $\operatorname{cov}(\hat\alpha, \hat\beta) = -\frac{\sigma^2 \bar x}{S_{xx}}$, which I leave as an exercise.
The second part of the theorem is somewhat more difficult to prove because it involves considerable algebraic manipulation. In fact, it is much easier to prove using matrix algebra, which we will do later. Here I just give a sketch of the proof.
The random variable $(n-2)s^2/\sigma^2$ is a function of the estimated residuals $\hat\varepsilon_i = y_i - \hat\alpha - \hat\beta x_i$. The regression itself can be thought of as a decomposition of the observed values $y_i$ into the predicted value $\hat y_i = \hat\alpha + \hat\beta x_i$ and the residual $\hat\varepsilon_i = y_i - \hat y_i$. They are uncorrelated because
$$\sum_{i=1}^n \hat y_i \hat\varepsilon_i = \hat\alpha \sum_{i=1}^n \hat\varepsilon_i + \hat\beta \sum_{i=1}^n x_i \hat\varepsilon_i = 0,$$
where $\sum_{i=1}^n \hat\varepsilon_i = 0$ and $\sum_{i=1}^n x_i \hat\varepsilon_i = 0$ follow from the first-order conditions of the ML estimator taken with respect to the parameters $\hat\alpha$ and $\hat\beta$. Zero correlation implies independence under normality. Moreover,
$$\sum_{i=1}^n \frac{\varepsilon_i^2}{\sigma^2} \sim \chi^2_n$$
by construction. And
$$\sum_{i=1}^n \varepsilon_i^2 = \sum_{i=1}^n (y_i - \alpha - \beta x_i)^2 = \sum_{i=1}^n (y_i - \hat y_i + \hat y_i - \alpha - \beta x_i)^2 = \sum_{i=1}^n \hat\varepsilon_i^2 + \sum_{i=1}^n \bigl(\hat\alpha - \alpha + (\hat\beta - \beta) x_i\bigr)^2$$
(the cross product vanishes by the same first-order conditions) is a decomposition into two terms, $\frac{(n-2)s^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^n \hat\varepsilon_i^2$ and $\frac{1}{\sigma^2}\sum_{i=1}^n \bigl(\hat\alpha - \alpha + (\hat\beta - \beta) x_i\bigr)^2$, which are $\chi^2_{n-2}$ and $\chi^2_{2}$ distributed respectively. However, this is not a formal proof.
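The variance and covariance formulas in the theorem can also be checked by simulation. A minimal sketch (hypothetical fixed design and parameter values; numpy assumed):

    import numpy as np

    rng = np.random.default_rng(6)
    n, alpha, beta, sigma = 30, 1.0, 0.5, 1.5             # hypothetical true values
    x = rng.uniform(0.0, 10.0, size=n)                     # fixed design
    S_xx = np.sum((x - x.mean()) ** 2)

    a_draws, b_draws = [], []
    for _ in range(20000):
        y = alpha + beta * x + rng.normal(0.0, sigma, size=n)
        b = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
        a = y.mean() - b * x.mean()
        a_draws.append(a)
        b_draws.append(b)

    print(np.var(b_draws), sigma ** 2 / S_xx)                              # Var(beta_hat)
    print(np.var(a_draws), sigma ** 2 * np.sum(x ** 2) / (n * S_xx))       # Var(alpha_hat)
    print(np.cov(a_draws, b_draws)[0, 1], -sigma ** 2 * x.mean() / S_xx)   # cov(alpha_hat, beta_hat)

The simulated moments should be close to the theoretical expressions stated in the theorem.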

Statistical inference about the parameters $\alpha$ and $\beta$ can be based on the following Student's $t$-distributions:
$$\frac{\hat\alpha - \alpha}{s\sqrt{\sum_{i=1}^n x_i^2 / (n S_{xx})}} \sim t_{n-2}, \qquad \frac{\hat\beta - \beta}{s / \sqrt{S_{xx}}} \sim t_{n-2}.$$
In fact, the joint distribution of these two $t$-statistics is a bivariate Student's $t$-distribution. Practitioners usually concentrate their interest on statistical inference about the slope parameter. If $\beta = 0$ then there is no linear relationship between $y_i$ and $x_i$. The test of the null hypothesis
$$H_0: \beta = 0$$
against
$$H_1: \beta \neq 0$$
can be based on
$$\frac{\hat\beta}{s / \sqrt{S_{xx}}} \sim t_{n-2},$$
or equivalently on
$$\frac{\hat\beta^2}{s^2 / S_{xx}} \sim F_{1, n-2}.$$
In fact,
$$\frac{\hat\beta^2}{s^2 / S_{xx}} = \frac{S_{xy}^2 / S_{xx}}{s^2} = \frac{S_{xy}^2 / S_{xx}}{RSS / (n-2)}.$$

The quantity in the numerator, $S_{xy}^2 / S_{xx}$, has a special name: the regression sum of squares. In the identity
$$\sum_{i=1}^n (y_i - \bar y)^2 = \sum_{i=1}^n (\hat y_i - \bar y)^2 + \sum_{i=1}^n (y_i - \hat y_i)^2,$$
Total sum of squares = Regression sum of squares + Residual sum of squares,
where $\hat y_i = \hat\alpha + \hat\beta x_i$. It can also be shown that
$$\sum_{i=1}^n (\hat y_i - \bar y)^2 = S_{xy}^2 / S_{xx}.$$
This is left as an exercise.

A measure of goodness of fit usually used for linear regression is the coefficient of determination
$$r^2 = \frac{\text{Regression sum of squares}}{\text{Total sum of squares}} = \frac{\sum_{i=1}^n (\hat y_i - \bar y)^2}{\sum_{i=1}^n (y_i - \bar y)^2} = \frac{S_{xy}^2}{S_{xx} S_{yy}}.$$
It satisfies $0 \le r^2 \le 1$ and it measures the proportion of variation in the dependent variable explained by the regression line.
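The relationships between the $t$-statistic, the $F$-statistic and $r^2$ can be verified numerically. A minimal sketch (simulated data with hypothetical parameter values; numpy assumed):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100
    x = rng.normal(0.0, 1.0, size=n)
    y = 1.0 + 0.3 * x + rng.normal(0.0, 1.0, size=n)   # hypothetical data-generating process

    x_bar, y_bar = x.mean(), y.mean()
    S_xx = np.sum((x - x_bar) ** 2)
    S_yy = np.sum((y - y_bar) ** 2)
    S_xy = np.sum((x - x_bar) * (y - y_bar))

    b = S_xy / S_xx
    a = y_bar - b * x_bar
    rss = np.sum((y - a - b * x) ** 2)
    s2 = rss / (n - 2)

    t_stat = b / np.sqrt(s2 / S_xx)            # t-statistic for H0: beta = 0
    F_stat = b ** 2 / (s2 / S_xx)              # F-statistic, equals t_stat ** 2
    r2 = S_xy ** 2 / (S_xx * S_yy)             # coefficient of determination

    print(t_stat ** 2, F_stat)                 # identical up to rounding
    print(r2, (S_xy ** 2 / S_xx) / S_yy)       # regression SS divided by total SS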

1.4 Consistency

The least squares estimator is consistent. Write $y_i = \alpha + \beta x_i + u_i$, where $u_i$ denotes the error term. Then
$$\hat\beta = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\sum_{i=1}^n (x_i - \bar x)\bigl[\beta(x_i - \bar x) + (u_i - \bar u)\bigr]}{\sum_{i=1}^n (x_i - \bar x)^2} = \beta + \frac{\sum_{i=1}^n (x_i - \bar x)(u_i - \bar u)}{\sum_{i=1}^n (x_i - \bar x)^2} = \beta + \frac{\frac{1}{n}\sum_{i=1}^n (x_i - \bar x) u_i}{\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2}.$$
The first term of this sum is $\beta$, which is a constant. The second term is a ratio of a weighted average of the random variables $u_i$ to the non-stochastic term $\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2$. We assume that as $n \to \infty$ the numerical series is convergent, $\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2 \to Q_{xx} < \infty$. We can think of $\frac{1}{n}\sum_{i=1}^n (x_i - \bar x) u_i$ as the sample mean of the random variables $(x_i - \bar x) u_i$, for which
$$E\bigl[(x_i - \bar x) u_i\bigr] = 0, \qquad \operatorname{Var}\bigl[(x_i - \bar x) u_i\bigr] = \sigma^2 (x_i - \bar x)^2 < \infty.$$
An extension of the law of large numbers discussed before, applied to an independently but not identically distributed sequence of random variables, can be utilized here to establish that
$$\frac{1}{n}\sum_{i=1}^n (x_i - \bar x) u_i \overset{p}{\to} E\left[\frac{1}{n}\sum_{i=1}^n (x_i - \bar x) u_i\right] = 0.$$
Therefore,
$$\hat\beta = \beta + \frac{\frac{1}{n}\sum_{i=1}^n (x_i - \bar x) u_i}{\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2} \overset{p}{\to} \beta.$$
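To see the consistency at work, the following sketch (hypothetical simulated design and parameter values; numpy assumed) computes $\hat\beta$ for increasing sample sizes and shows it settling near the true $\beta$.

    import numpy as np

    rng = np.random.default_rng(4)
    beta_true = 0.7                              # hypothetical true slope
    for n in [50, 500, 5000, 50000]:
        x = rng.uniform(0.0, 10.0, size=n)
        u = rng.normal(0.0, 2.0, size=n)
        y = 1.0 + beta_true * x + u
        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        print(n, b)                              # the estimates approach 0.7 as n grows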

1.5 Asymptotic Normality

Given the condition $\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2 \to Q_{xx} < \infty$, the series $\frac{1}{n}\sum_{i=1}^n (x_i - \bar x) u_i$ converges to a degenerate random variable with variance zero. Indeed,
$$\operatorname{Var}\left[\frac{1}{n}\sum_{i=1}^n (x_i - \bar x) u_i\right] = \frac{1}{n^2}\sum_{i=1}^n (x_i - \bar x)^2\,\sigma^2 = \frac{\sigma^2}{n}\left[\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2\right] \to 0 \quad \text{as } n \to \infty,$$
since the term in brackets converges to the finite $Q_{xx}$. Therefore, $\hat\beta - \beta$ converges to a random variable whose distribution places all probability mass of one on the single value of zero. The rate of convergence of $\hat\beta - \beta$ to zero is very fast. However, this convergence can be slowed down by multiplying it by $\sqrt{n}$. Then
$$\sqrt{n}\bigl(\hat\beta - \beta\bigr) = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^n (x_i - \bar x) u_i}{\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2}.$$

Denote
$$z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n (x_i - \bar x) u_i,$$
so that
$$E z_n = 0, \qquad \operatorname{Var} z_n = \frac{\sigma^2}{n}\sum_{i=1}^n (x_i - \bar x)^2 < \infty.$$

Table 1: Summary statistics.

Variable    Description                              Mean     Std. dev.
Lntdcexp    logarithm of total dental expenditure    5.108    1.183
Lnincome    logarithm of income                      3.397    0.910

Then a central limit theorem for such an i.n.i.d. series of random variables can be applied, so that
$$\frac{z_n - E z_n}{\sqrt{\operatorname{Var} z_n}} \overset{d}{\to} N(0, 1),$$
or
$$z_n \overset{d}{\to} N\!\left(0, \sigma^2 Q_{xx}\right).$$
Then
$$\sqrt{n}\bigl(\hat\beta - \beta\bigr) = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^n (x_i - \bar x) u_i}{\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2} \overset{d}{\to} N\!\left(0, \sigma^2 Q_{xx}^{-1}\right).$$
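The asymptotic distribution can also be checked by simulation. In the sketch below (hypothetical settings; numpy assumed), $\sqrt{n}(\hat\beta - \beta)$ is computed over many replications and its sample variance is compared with $\sigma^2 Q_{xx}^{-1}$, where $Q_{xx}$ is approximated by $\frac{1}{n}\sum_i (x_i - \bar x)^2$ of the simulated regressor.

    import numpy as np

    rng = np.random.default_rng(5)
    n, beta_true, sigma = 500, 0.7, 2.0                      # hypothetical settings
    x = rng.uniform(0.0, 10.0, size=n)                       # fixed design
    Qxx = np.sum((x - x.mean()) ** 2) / n

    draws = []
    for _ in range(10000):
        y = 1.0 + beta_true * x + rng.normal(0.0, sigma, size=n)
        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        draws.append(np.sqrt(n) * (b - beta_true))

    print(np.var(draws), sigma ** 2 / Qxx)    # sample variance close to sigma^2 / Qxx
    print(np.mean(draws))                     # close to 0; a histogram of draws looks normal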

Dental Expenditure

We investigate the effect of income on total dental expenditure. The data set is derived from the Medical Expenditure Panel Survey (MEPS). MEPS is a nationally representative survey of health care, including dental, expenditure, sources of payment and insurance coverage for the US civilian non-institutionalized population. We use data from the 1996, 1997, 1998, 1999 and 2000 surveys. The sampling scheme of the MEPS data is a two-year overlapping panel, i.e., in each calendar year after the first survey year, one sample of persons is in its second year of responses while another sample of persons is in its first year of responses. The sample is restricted to the U.S. population between the ages of 25 and 64 years and to those who are employed. Further, we take only those observations whose total dental expenditure and income are positive. To reduce heterogeneity coming from the fact that the effect of income should be structurally different for those with and without dental insurance, we restrict our sample to those without dental insurance. The total number of observations in the data set is 2737. We use a logarithmic transformation of income and dental expenditure to obtain more symmetric distributions. Table 1 gives the summary statistics.

Below is the plot of the data, indicating that Lntdcexp and Lnincome are potentially bivariate normally distributed. When two random variables $(y_i, x_i) \sim BN\!\left(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho\right)$, the conditional distribution of $y_i$ given $x_i$ has mean
$$E(y_i \mid x_i) = \mu_y + \rho\,\frac{\sigma_y}{\sigma_x}\,(x_i - \mu_x).$$
This means that the bivariate normal model implies that $E(y_i \mid x_i)$ is a linear function of $x_i$. The estimated regression output is reported below.
[Figure: scatter plot of lntdcexp against lnincome.]

      Source |       SS           df       MS        Number of obs  =      2737
-------------+----------------------------------     F(1, 2735)     =     32.37
       Model |  44.8104909         1   44.8104909    Prob > F       =    0.0000
    Residual |   3786.1013      2735   1.38431492    R-squared      =    0.0117
-------------+----------------------------------     Adj R-squared  =    0.0113
       Total |  3830.91179      2736   1.40018706    Root MSE       =    1.1766

------------------------------------------------------------------------------
    lntdcexp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnincome |    .140637    .0247188    5.69   0.000      .0921676    .1891064
       _cons |   4.630147    .0869183   53.27   0.000      4.459715    4.800579
------------------------------------------------------------------------------

[Figure: scatter plot of lntdcexp against lnincome with the fitted values from the regression.]

The null hypothesis
$$H_0: \beta = 0$$
is rejected with a t-ratio of 5.69 and a p-value smaller than 0.1%. The fitted regression line is plotted above. As you can see from the regression results, the linear model is a poor fit for the data: only roughly 1% of the variation in expenditure is explained by income. More explanatory variables are needed to improve the fit.
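For readers working in Python rather than Stata, a regression of this form could be reproduced roughly as follows. This is only a sketch: the file name meps_dental.csv and the variable names lntdcexp and lnincome are hypothetical placeholders for however the MEPS extract is stored, and pandas and statsmodels are assumed to be available.

    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical file; assumed to contain the log-transformed variables
    df = pd.read_csv("meps_dental.csv")

    X = sm.add_constant(df["lnincome"])        # adds the intercept column
    model = sm.OLS(df["lntdcexp"], X).fit()
    print(model.summary())                     # coefficient table, t-ratios, R-squared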


Problems
1. Prove that
$$\min_a \sum_{i=1}^n (x_i - a)^2 = \sum_{i=1}^n (x_i - \bar x)^2.$$

2. Show that
$$\operatorname{Var}\hat\sigma^2_{ML} = \frac{2(n-2)\sigma^4}{n^2}.$$

3. Show that the estimator $\hat\alpha_{OLS} = \bar y - \hat\beta_{OLS}\,\bar x$ can be expressed as
$$\hat\alpha_{OLS} = \sum_{i=1}^n c_i y_i, \qquad \text{where} \qquad c_i = \frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{S_{xx}}.$$
Verify that $E\hat\alpha_{OLS} = \alpha$ and $\operatorname{Var}\hat\alpha_{OLS} = \frac{\sigma^2}{n S_{xx}}\sum_{i=1}^n x_i^2$, and verify that $\operatorname{cov}(\hat\alpha, \hat\beta) = -\frac{\sigma^2 \bar x}{S_{xx}}$.

Solution: Note that
$$\hat\alpha = \bar y - \hat\beta \bar x = \sum_{i=1}^n \frac{1}{n}\,y_i - \sum_{i=1}^n \frac{(x_i - \bar x)\,\bar x}{S_{xx}}\,y_i = \sum_{i=1}^n \left[\frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{S_{xx}}\right] y_i = \sum_{i=1}^n c_i y_i,$$
which means that
$$E\hat\alpha = \alpha \sum_{i=1}^n c_i + \beta \sum_{i=1}^n c_i x_i = \alpha,$$
since $\sum_{i=1}^n c_i = 1$ and $\sum_{i=1}^n c_i x_i = \bar x - \bar x = 0$, and
$$\operatorname{Var}\hat\alpha = \sigma^2 \sum_{i=1}^n c_i^2 = \sigma^2 \sum_{i=1}^n \left[\frac{1}{n^2} + \frac{(x_i - \bar x)^2 \bar x^2}{S_{xx}^2} - \frac{2 (x_i - \bar x)\,\bar x}{n\,S_{xx}}\right] = \sigma^2 \left[\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right] = \frac{\sigma^2}{n S_{xx}}\sum_{i=1}^n x_i^2.$$
Finally, to verify the last fact we first notice that
$$\hat\beta_{OLS} = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\sum_{i=1}^n (x_i - \bar x) y_i - \bar y \sum_{i=1}^n (x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\sum_{i=1}^n (x_i - \bar x) y_i}{\sum_{i=1}^n (x_i - \bar x)^2}.$$
Then
$$\operatorname{cov}\!\left(\hat\alpha, \hat\beta\right) = \operatorname{cov}\!\left(\sum_{i=1}^n \left[\frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{S_{xx}}\right] y_i,\ \sum_{i=1}^n \frac{x_i - \bar x}{S_{xx}}\,y_i\right) = \sigma^2 \sum_{i=1}^n \left[\frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{S_{xx}}\right]\frac{x_i - \bar x}{S_{xx}} = -\frac{\sigma^2 \bar x}{S_{xx}}.$$

4. Show that
$$\sum_{i=1}^n (y_i - \bar y)^2 = \sum_{i=1}^n (\hat y_i - \bar y)^2 + \sum_{i=1}^n (y_i - \hat y_i)^2.$$

Solution: We need to show that the cross product is zero:
$$\sum_{i=1}^n (\hat y_i - \bar y)(y_i - \hat y_i) = \sum_{i=1}^n \bigl[\hat\alpha + \hat\beta x_i - \bar y\bigr]\bigl[y_i - \hat\alpha - \hat\beta x_i\bigr].$$
Substitute $\hat\alpha = \bar y - \hat\beta \bar x$ to get
$$= \sum_{i=1}^n \hat\beta (x_i - \bar x)\bigl[(y_i - \bar y) - \hat\beta (x_i - \bar x)\bigr] = \hat\beta \sum_{i=1}^n (y_i - \bar y)(x_i - \bar x) - \hat\beta^2 \sum_{i=1}^n (x_i - \bar x)^2 = 0$$
because of the definition of $\hat\beta$.

5. Show that
$$\sum_{i=1}^n (\hat y_i - \bar y)^2 = S_{xy}^2 / S_{xx}.$$
Solution:
$$\sum_{i=1}^n (\hat y_i - \bar y)^2 = \hat\beta^2 \sum_{i=1}^n (x_i - \bar x)^2 = S_{xy}^2 / S_{xx}.$$

6. Run four simple regressions for the data sets provided in the table below (Anscombe's quartet). Report your results and plot the data sets on four different graphs. What conclusion can you make from this? (A code sketch for setting up the four regressions is given after the table.)

Anscombe's quartet:

  x1      y1      x2      y2      x3      y3      x4      y4
 10.0    8.04    10.0    9.14    10.0    7.46     8.0    6.58
  8.0    6.95     8.0    8.14     8.0    6.77     8.0    5.76
 13.0    7.58    13.0    8.74    13.0   12.74     8.0    7.71
  9.0    8.81     9.0    8.77     9.0    7.11     8.0    8.84
 11.0    8.33    11.0    9.26    11.0    7.81     8.0    8.47
 14.0    9.96    14.0    8.10    14.0    8.84     8.0    7.04
  6.0    7.24     6.0    6.13     6.0    6.08     8.0    5.25
  4.0    4.26     4.0    3.10     4.0    5.39    19.0   12.50
 12.0   10.84    12.0    9.13    12.0    8.15     8.0    5.56
  7.0    4.82     7.0    7.26     7.0    6.42     8.0    7.91
  5.0    5.68     5.0    4.74     5.0    5.73     8.0    6.89
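A minimal sketch for setting up problem 6 (assuming numpy and matplotlib are available; the data are typed in from the table above) runs the four regressions and plots each pair with its fitted line:

    import numpy as np
    import matplotlib.pyplot as plt

    x123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
    x4   = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
    ys = [
        np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
        np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
        np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
        np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
    ]
    xs = [x123, x123, x123, x4]

    fig, axes = plt.subplots(2, 2)
    for k, (x, y, ax) in enumerate(zip(xs, ys, axes.ravel()), start=1):
        b, a = np.polyfit(x, y, deg=1)                # slope, intercept
        ax.scatter(x, y)
        grid = np.linspace(x.min(), x.max(), 100)
        ax.plot(grid, a + b * grid)
        ax.set_title(f"data set {k}: slope={b:.2f}, intercept={a:.2f}")
    plt.show()

The four fitted lines are nearly identical even though the scatter plots look very different, which is the point of the exercise.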
7. Let $x$ and $y$ be independent random variables with means $\mu_1$, $\mu_2$ and variances $\sigma_1^2$, $\sigma_2^2$. Determine the correlation coefficient of $x$ and $z = x - y$ in terms of $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$.
8. Suppose we estimate the model
$$y_i = \alpha + u_i, \qquad u_i \sim N\!\left(0, \sigma^2\right), \quad i = 1, \ldots, N.$$
(a) Show that the OLS estimator of $\alpha$ is $\hat\alpha = \bar y$.
(b) Directly obtain the variance of $\bar y$.


9. Consider the model
$$y = (\alpha + \beta x)\,e,$$
where $y$ and $x$ are scalar observables and $e$ is unobservable. Let $E[e \mid x] = 1$ and $\operatorname{Var}[e \mid x] = 1$. How would you estimate $\alpha$ and $\beta$ by OLS? How would you construct the standard errors?
10. Let
$$z_n \overset{d}{\to} N\!\left(0, \sigma^2 Q_{xx}\right),$$
where $Q_{xx} = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2$. Then show that
$$\frac{z_n}{\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2} \overset{d}{\to} N\!\left(0, \sigma^2 Q_{xx}^{-1}\right).$$
