TA Note 1: Version 1
Instrument Variable (IV) and Two Stage Least Square (2SLS) Estimators
Hisayuki Yoshimoto
Last Modified: April 08, 2008
Abstract: In section 1, we discuss the inconsistency of the ordinary least squares (OLS) estimator when the underlying OLS assumptions are violated. Throughout this TA note, I use the classical labor-economics example of the return to education. In sections 2 and 3, we review the instrumental variable (IV) and two stage least squares (2SLS) estimators with their interpretations. In section 4, we discuss the asymptotic properties of the 2SLS estimator. In section 5, we review partial (residual) regression. Finally, in section 6, we solve Comp 2006S Part III Question 1, an application of partial regression with instruments.
1.1 Inconsistency of the OLS estimator
Roughly speaking,$^1$ a regressor is called exogenous if it is not correlated with the error term, and a regressor is called endogenous if it is correlated with the error term. Here, we consider an OLS model in which the regressors are endogenous:
$$\underbrace{y_i}_{1\times1}=\underbrace{x_i'}_{1\times K}\underbrace{\beta}_{K\times1}+\underbrace{u_i}_{1\times1}.$$
Matrix notation is
$$\underbrace{Y}_{N\times1}=\underbrace{X}_{N\times K}\underbrace{\beta}_{K\times1}+\underbrace{U}_{N\times1},$$
and
$$\frac{1}{N}\sum_{i=1}^{N}x_iu_i\xrightarrow{p}E[x_iu_i]\neq0_{K\times1},\qquad E[\,U\,|\,X\,]\neq0_{N\times1},\qquad \frac{1}{N}\sum_{i=1}^{N}x_ix_i'\xrightarrow{p}E[x_ix_i'].$$
The OLS estimator is
$$\underbrace{\hat\beta_{OLS}}_{K\times1}=(X'X)^{-1}X'Y=(X'X)^{-1}X'(X\beta+U)=\beta+(X'X)^{-1}X'U,$$
so its conditional expectation is
$$E[\,\hat\beta_{OLS}\,|\,X\,]=\beta+(X'X)^{-1}\underbrace{X'E[\,U\,|\,X\,]}_{\neq0_{N\times1}}\neq\beta,$$
i.e., $\hat\beta_{OLS}$ is biased.

$^1$These definitions are absolutely not formal; here I just present the intuition behind exogenous and endogenous regressors.
Checking consistency,
$$\hat\beta_{OLS}=(X'X)^{-1}X'Y=\left(\frac{1}{N}\sum_{i=1}^{N}x_ix_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}x_iy_i=\left(\frac{1}{N}\sum_{i=1}^{N}x_ix_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}x_i(x_i'\beta+u_i)$$
$$=\beta+\left(\frac{1}{N}\sum_{i=1}^{N}x_ix_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}x_iu_i,$$
and
$$\operatorname{plim}\hat\beta_{OLS}=\beta+\operatorname{plim}\left(\frac{1}{N}\sum_{i=1}^{N}x_ix_i'\right)^{-1}\operatorname{plim}\frac{1}{N}\sum_{i=1}^{N}x_iu_i=\beta+E[x_ix_i']^{-1}\underbrace{E[x_iu_i]}_{\neq0_{K\times1}}\neq\beta.$$
1.2 Example
Consider the classic example in labor economics: estimating the return to education. We want to regress
$$\ln(hwage_i)=\beta_1+\beta_2\,edu_i+\beta_3\,ex_i+\beta_4\,ab_i+\varepsilon_i$$
where
$$\begin{cases}hwage_i &: \text{hourly wage}\\ edu_i &: \text{education length}\\ ex_i &: \text{experience on current job}^2\\ ab_i &: \text{ability (unobserved)}.\end{cases}$$
Since ability is unobserved, the equation we can actually run is
$$\ln(hwage_i)=\beta_1+\beta_2\,edu_i+\beta_3\,ex_i+u_i,\qquad(1)$$
where
$$u_i=\beta_4\,ab_i+\varepsilon_i.$$
Denote
$$y_i=[\ln(hwage_i)]\qquad\text{and}\qquad x_i=\begin{bmatrix}1\\ edu_i\\ ex_i\end{bmatrix}.$$
Then
$$\hat\beta_{OLS}=(X'X)^{-1}X'Y=\left(\frac{1}{N}\sum_{i=1}^{N}x_ix_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}x_iy_i=\beta+\left(\frac{1}{N}\sum_{i=1}^{N}x_ix_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}x_iu_i$$
$$=\beta+\left(\frac{1}{N}\sum_{i=1}^{N}x_ix_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}\begin{bmatrix}1\\ edu_i\\ ex_i\end{bmatrix}(\beta_4\,ab_i+\varepsilon_i).$$
The omission of ability ($ab_i$) causes the so-called omitted variable bias. The labor economics literature states that there is a strong correlation between education ($edu_i$) and unobserved ability ($ab_i$): an individual with a long education length is expected to have high ability, i.e., education is an endogenous variable (education is correlated with the error term). Therefore, running the above regression causes a serious problem: a biased and inconsistent estimator. As a consequence, we cannot correctly estimate the return-to-education parameter $\beta_2$.$^3$
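This omitted-variable endogeneity is easy to see in a small simulation. The sketch below (Python/NumPy; all parameter values are hypothetical, chosen only for illustration) generates log wages from the model above, with ability raising both education and wages, and shows that OLS on $[1,\,edu_i,\,ex_i]$ overstates $\beta_2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: ability raises both education and wages.
ab = rng.normal(size=n)                        # unobserved ability
edu = 12 + 2.0 * ab + rng.normal(size=n)       # education, correlated with ability
ex = rng.normal(5.0, 2.0, size=n)              # experience, exogenous
b1, b2, b3, b4 = 1.0, 0.10, 0.05, 0.20         # illustrative true coefficients
y = b1 + b2 * edu + b3 * ex + b4 * ab + rng.normal(size=n)  # ln(hwage)

# OLS that omits ability: regress y on [1, edu, ex]
X = np.column_stack([np.ones(n), edu, ex])
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_ols)  # coefficient on edu is biased upward (about 0.18, not 0.10)
```

In this design the OLS coefficient on education converges to $\beta_2+\operatorname{cov}(edu_i,u_i)/\operatorname{var}(edu_i)=0.10+0.4/5=0.18$, while the coefficient on the exogenous $ex_i$ remains consistent.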
2.1 IV estimator
Again consider the model
$$y_i=x_i'\beta+u_i,\qquad Y=X\beta+U.$$
Now, assume that we have a $K\times1$ instrumental vector $z_i$ (that contains instrumental variables $z_{i1},\ldots,z_{iK}$) that has the following properties:
(1) $z_i$ is uncorrelated with $u_i$;
(2) $z_i$ is correlated with $x_i$.
Mathematically, we need the condition
$$\frac{1}{N}\sum_{i=1}^{N}z_iu_i\xrightarrow{p}0_{K\times1}.$$
The IV estimator is
$$\hat\beta_{IV}=(Z'X)^{-1}Z'Y=\left(\frac{1}{N}\sum_{i=1}^{N}z_ix_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}z_iy_i.$$
Consider the consistency of $\hat\beta_{IV}$. We can rewrite the IV estimator and check its asymptotic behavior:
$$\hat\beta_{IV}=\left(\frac{1}{N}\sum_{i=1}^{N}z_ix_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}z_iy_i=\left(\frac{1}{N}\sum_{i=1}^{N}z_ix_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}z_i(x_i'\beta+u_i)$$
$$=\beta+\left(\frac{1}{N}\sum_{i=1}^{N}z_ix_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}z_iu_i\xrightarrow{p}\beta+E[z_ix_i']^{-1}\underbrace{E[z_iu_i]}_{=0_{K\times1}}=\beta.$$
$^2$If not explicitly stated, we assume experience ($ex_i$) is an exogenous variable (experience on current job is uncorrelated with the error term or ability ($ab_i$)).
2.2 Example of IV estimator
Recall the regression
$$\ln(hwage_i)=\beta_1+\beta_2\,edu_i+\beta_3\,ex_i+\underbrace{\beta_4\,ab_i+\varepsilon_i}_{=u_i},$$
which we rewrite as
$$\ln(hwage_i)=\beta_1+\beta_2\,edu_i+\beta_3\,ex_i+u_i,\qquad\text{where } u_i=\beta_4\,ab_i+\varepsilon_i.$$
As we discussed before, education is an endogenous variable (education is correlated with ability). Therefore, we need to employ the IV estimation method to consistently estimate the return-to-education parameter $\beta_2$. So, what instrumental variable is available for this regression? We need to find a variable that is correlated with education and uncorrelated with ability (equivalently, uncorrelated with the error term).
Using the last digit of the social security number as an instrument is a bad idea: it is not only uncorrelated with an individual's ability, but also uncorrelated with education, so it fails the relevance requirement.
Angrist and Krueger (1991, Quarterly Journal of Economics) suggest birth month as an instrument for education. In the U.S. school system, students are grouped into school-year cohorts. As a consequence of this system, there are first and last school-entry months. It is reported that students who were born in earlier months have higher school grades and SAT scores compared to students who were born later. As a consequence, students born earlier are more likely to go to college. So, birth month is correlated with the length of education.
Define the birth month variable as $birthm_i$ (assigning the first month to 1 and the last month to 12). Then the IV vector is
$$z_i=\begin{bmatrix}1\\ birthm_i\\ ex_i\end{bmatrix}.$$
Here, the constant is by definition uncorrelated with the error term (no matter what value the error takes, the constant is always constant). As we discussed above, $birthm_i$ is uncorrelated with the error term (i.e., ability). Also, experience on the current job is uncorrelated with ability. Recall that we denote the dependent variable and the vector of regressors as
$$y_i=[\ln(hwage_i)],\qquad x_i=\begin{bmatrix}1\\ edu_i\\ ex_i\end{bmatrix},\qquad u_i=\beta_4\,ab_i+\varepsilon_i.$$
Then, the IV estimator is
$$\underbrace{\hat\beta_{IV}}_{3\times1}=(Z'X)^{-1}Z'Y=\beta+\left(\frac{1}{N}\sum_{i=1}^{N}z_ix_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}z_iu_i=\beta+\left(\frac{1}{N}\sum_{i=1}^{N}z_ix_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}\begin{bmatrix}1\\ birthm_i\\ ex_i\end{bmatrix}(\beta_4\,ab_i+\varepsilon_i).$$
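The IV formula $(Z'X)^{-1}Z'Y$ can be applied directly to simulated data. Below is a minimal sketch with a hypothetical data-generating process: the `birthm` variable is only a stand-in, generated to be correlated with education but independent of ability, so IV recovers the true return to education while OLS does not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

ab = rng.normal(size=n)                          # unobserved ability
birthm = rng.integers(1, 13, size=n)             # instrument: independent of ability
ex = rng.normal(5.0, 2.0, size=n)                # exogenous experience
edu = 10 + 2.0 * ab - 0.1 * birthm + rng.normal(size=n)  # education (endogenous)
u = 0.2 * ab + rng.normal(size=n)                # error contains ability
y = 1.0 + 0.10 * edu + 0.05 * ex + u             # true beta_2 = 0.10

X = np.column_stack([np.ones(n), edu, ex])       # regressors
Z = np.column_stack([np.ones(n), birthm, ex])    # instruments (just identified)

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)      # (Z'X)^{-1} Z'Y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_iv[1], beta_ols[1])                   # IV near 0.10; OLS biased upward
```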
2.3 Asymptotic distribution of the IV estimator
Consider
$$\sqrt{N}\left(\hat\beta_{IV}-\beta\right)=\left(\frac{1}{N}\sum_{i=1}^{N}z_ix_i'\right)^{-1}\frac{1}{\sqrt{N}}\sum_{i=1}^{N}z_iu_i.$$
We have
$$\frac{1}{N}\sum_{i=1}^{N}z_ix_i'\xrightarrow{p}E[z_ix_i'],\qquad \frac{1}{\sqrt{N}}\sum_{i=1}^{N}z_iu_i\xrightarrow{d}N\!\left(0,\,E\!\left[u_i^2z_iz_i'\right]\right),$$
so that
$$\sqrt{N}\left(\hat\beta_{IV}-\beta\right)\xrightarrow{d}N\!\left(0,\;E[z_ix_i']^{-1}\,E\!\left[u_i^2z_iz_i'\right]\left(E[z_ix_i']^{-1}\right)'\right).$$
The asymptotic variance is estimated consistently by
$$\left(\frac{1}{N}\sum_{i=1}^{N}z_ix_i'\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^{N}\hat u_i^2z_iz_i'\right)\left(\left(\frac{1}{N}\sum_{i=1}^{N}z_ix_i'\right)^{-1}\right)',$$
where
$$\hat u_i=y_i-x_i'\hat\beta_{IV}.$$
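The sandwich variance estimator above translates directly into sample moments. A sketch on a simulated just-identified design (the DGP and all numbers are illustrative, not from the note):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Illustrative just-identified IV setup
z = rng.normal(size=n)
v = rng.normal(size=n)
x_end = 0.8 * z + v                        # endogenous regressor, relevant instrument
u = rng.normal(size=n) + 0.5 * v           # error correlated with x_end
y = 1.0 + 0.5 * x_end + u                  # true slope = 0.5

X = np.column_stack([np.ones(n), x_end])
Z = np.column_stack([np.ones(n), z])

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
u_hat = y - X @ beta_iv

Szx = (Z.T @ X) / n                                  # (1/n) sum z_i x_i'
Suu = (Z * (u_hat**2)[:, None]).T @ Z / n            # (1/n) sum u_hat_i^2 z_i z_i'
avar = np.linalg.inv(Szx) @ Suu @ np.linalg.inv(Szx).T
se = np.sqrt(np.diag(avar) / n)                      # standard errors of beta_iv
print(beta_iv, se)
```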
3.1 2SLS Estimator
Assume that there are more instrumental variables than regressors. We might not want to discard any of the available instruments. What estimation methodology enables us to use all instruments?
Rewriting the model,
$$\underbrace{y_i}_{1\times1}=\underbrace{x_i'}_{1\times K}\underbrace{\beta}_{K\times1}+\underbrace{u_i}_{1\times1},\qquad \underbrace{Y}_{N\times1}=\underbrace{X}_{N\times K}\underbrace{\beta}_{K\times1}+\underbrace{u}_{N\times1},$$
with instrumental vector
$$\underbrace{z_i}_{L\times1},\qquad\text{where } L\geq K,$$
with condition
$$\frac{1}{N}\sum_{i=1}^{N}z_iu_i\xrightarrow{p}E[z_iu_i]=0_{L\times1}.$$
The two stage least squares (2SLS) estimator is defined as (you should check the dimensions of this estimator)
$$\underbrace{\hat\beta_{2SLS}}_{K\times1}=\left(X'Z(Z'Z)^{-1}Z'X\right)^{-1}X'Z(Z'Z)^{-1}Z'Y=\Big(\underbrace{X'}_{K\times N}\underbrace{Z(Z'Z)^{-1}Z'}_{N\times N}\underbrace{X}_{N\times K}\Big)^{-1}\underbrace{X'}_{K\times N}\underbrace{Z(Z'Z)^{-1}Z'}_{N\times N}\underbrace{Y}_{N\times1}=(X'P_ZX)^{-1}X'P_ZY,$$
where
$$\underbrace{P_Z}_{N\times N}=\underbrace{Z}_{N\times L}\underbrace{(Z'Z)^{-1}}_{L\times L}\underbrace{Z'}_{L\times N}.$$

3.2 Example of 2SLS estimator
Recall
$$\ln(hwage_i)=\beta_1+\beta_2\,edu_i+\beta_3\,ex_i+\underbrace{\beta_4\,ab_i+\varepsilon_i}_{=u_i},$$
rewritten as
$$\ln(hwage_i)=\beta_1+\beta_2\,edu_i+\beta_3\,ex_i+u_i,\qquad\text{where } u_i=\beta_4\,ab_i+\varepsilon_i.$$
We have discussed that $edu_i$ is an endogenous variable, i.e., correlated with the error term (or ability $ab_i$). Also, we have argued that we can utilize birth month as an instrument for education. Card (1995, Aspects of Labour Market Behavior) suggests the vicinity of a four-year college as an instrument for education. Prof. Card argues that high school students who are close to a four-year college expect lower expenditures for college life because they can commute from their homes. As a result, high school students who live near four-year colleges are more likely to obtain opportunities for college education. On the other hand, the vicinity of a college has no relation to individual ability. (Does an individual who was born close to a college have high ability? I don't think so.) Therefore, we can use the vicinity of a four-year college as an instrument. Denote the distance between individual $i$ and the nearest four-year college as $dist_i$. Then the vector of instruments and the stacked-up matrix are
$$z_i=\begin{bmatrix}1\\ birthm_i\\ dist_i\\ ex_i\end{bmatrix}\qquad\text{and}\qquad \underbrace{Z}_{N\times4}=\begin{bmatrix}z_1'\\ \vdots\\ z_N'\end{bmatrix}=\begin{bmatrix}1 & birthm_1 & dist_1 & ex_1\\ \vdots & \vdots & \vdots & \vdots\\ 1 & birthm_N & dist_N & ex_N\end{bmatrix},$$
and the 2SLS estimate is obtained by
$$\underbrace{\hat\beta_{2SLS}}_{3\times1}=\Big(\underbrace{X'Z}_{3\times4}\underbrace{(Z'Z)^{-1}}_{4\times4}\underbrace{Z'X}_{4\times3}\Big)^{-1}\underbrace{X'Z}_{3\times4}\underbrace{(Z'Z)^{-1}}_{4\times4}\underbrace{Z'Y}_{4\times1}.$$

3.3 Interpretations of 2SLS
There are two interpretations of 2SLS. The first interpretation is straightforward: implementing least squares regression twice, in first and second stages.
First Stage:
In the first stage, we project $x_i$ on $z_i$ (or equivalently, project $X$ on $Z$):
$$\underbrace{x_i}_{K\times1}=\underbrace{\Gamma'}_{K\times L}\underbrace{z_i}_{L\times1}+\underbrace{v_i}_{K\times1},$$
where, transposing,
$$\underbrace{x_i'}_{1\times K}=\underbrace{z_i'}_{1\times L}\underbrace{\Gamma}_{L\times K}+\underbrace{v_i'}_{1\times K}.$$
In matrix notation,
$$\underbrace{X}_{N\times K}=\underbrace{Z}_{N\times L}\underbrace{\Gamma}_{L\times K}+\underbrace{V}_{N\times K},$$
i.e.,
$$\underbrace{\begin{bmatrix}x_1'\\ \vdots\\ x_N'\end{bmatrix}}_{N\times K}=\underbrace{\begin{bmatrix}z_1'\\ \vdots\\ z_N'\end{bmatrix}}_{N\times L}\underbrace{\Gamma}_{L\times K}+\underbrace{\begin{bmatrix}v_1'\\ \vdots\\ v_N'\end{bmatrix}}_{N\times K}.$$
By regressing the above equation by least squares (least squares in the matrix sense), we obtain the OLS estimator of $\Gamma$:
$$\underbrace{\hat\Gamma}_{L\times K}=(Z'Z)^{-1}Z'X.$$
The fitted matrix is
$$\underbrace{\hat X}_{N\times K}=Z\hat\Gamma=Z(Z'Z)^{-1}Z'X=\underbrace{P_Z}_{N\times N}\underbrace{X}_{N\times K}\qquad\left(\text{where } P_Z=Z(Z'Z)^{-1}Z'\right),$$
where $P_Z$ is the projection matrix of $Z$. Note that $\hat X$ is the projection of $X$ on $Z$.
Second Stage:
In the second stage, we regress $Y$ on the projected matrix $\hat X$:
$$Y=\hat X\beta+e.$$
Then the least squares estimator of $\beta$ is
$$\hat\beta=(\hat X'\hat X)^{-1}\hat X'Y$$
$$=\left((P_ZX)'P_ZX\right)^{-1}(P_ZX)'Y\qquad(\text{substituting } \hat X=P_ZX)$$
$$=(X'P_ZX)^{-1}X'P_ZY\qquad(\text{since } P_Z \text{ is idempotent, } P_ZP_Z=P_Z)$$
$$=\left(X'Z(Z'Z)^{-1}Z'X\right)^{-1}X'Z(Z'Z)^{-1}Z'Y\qquad(\text{substituting } P_Z=Z(Z'Z)^{-1}Z')$$
$$=\hat\beta_{2SLS}.$$
The name "two stage least squares" comes from this two-stage procedure: projection in the first stage and regression on the projected matrix in the second stage.
Formally, the above two-stage procedure is described as follows. To the model equation
$$y_i=x_i'\beta+u_i,\qquad Y=X\beta+U,\qquad(2)$$
we premultiply the projection matrix $P_Z$:
$$P_ZY=P_ZX\beta+P_ZU.$$
Let's call this operation "exogenizing," since we project the endogenous variable matrix $X$ on the exogenous variable matrix $Z$.
OLS on the exogenized equation gives
$$\hat\beta_{OLS}=\left((P_ZX)'P_ZX\right)^{-1}(P_ZX)'P_ZY=(X'P_ZP_ZX)^{-1}X'P_ZP_ZY=(X'P_ZX)^{-1}X'P_ZY=\hat\beta_{2SLS}$$
(since $P_Z$ is idempotent, $P_ZP_Z=P_Z$). Thus, the two stage least squares procedure is nothing more than regressing the "exogenized" model equation.
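This equivalence between the one-shot formula and the explicit two-stage procedure is easy to check numerically; a sketch with a toy over-identified design ($L=3$ instruments, $K=2$ regressors; the DGP is illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # N x 3 instruments
x = Z @ np.array([0.0, 1.0, 0.5]) + rng.normal(size=n)       # regressor built from Z
X = np.column_stack([np.ones(n), x])                         # N x 2
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)                       # projection matrix P_Z

# One-shot formula: (X'PzX)^{-1} X'PzY
b_formula = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)

# Two explicit stages: project X on Z, then regress y on the fitted X_hat
Gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)                # first-stage coefficients
X_hat = Z @ Gamma_hat                                        # = Pz X
b_twostage = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

print(np.allclose(b_formula, b_twostage))                    # True
```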
3.4 GLS interpretation of 2SLS
The second interpretation comes from premultiplying the model
$$\underbrace{y_i}_{1\times1}=\underbrace{x_i'}_{1\times K}\underbrace{\beta}_{K\times1}+\underbrace{u_i}_{1\times1},\qquad \underbrace{Y}_{N\times1}=\underbrace{X}_{N\times K}\underbrace{\beta}_{K\times1}+\underbrace{u}_{N\times1}$$
by $Z'$:
$$Z'Y=Z'X\beta+Z'u,\qquad(3)$$
and regressing this equation with the GLS method. The error vector $Z'u$ has conditional expectation and variance
$$E[\,Z'u\,|\,Z\,]=Z'\underbrace{E[\,u\,|\,Z\,]}_{=0_{N\times1}}=0_{L\times1}$$
and
$$\operatorname{Var}[\,Z'u\,|\,Z\,]=E\Big[\big(Z'u-\underbrace{E[Z'u|Z]}_{=0}\big)\big(Z'u-\underbrace{E[Z'u|Z]}_{=0}\big)'\,\Big|\,Z\Big]=Z'\left(\sigma_u^2I_N\right)Z=\sigma_u^2\,Z'Z.$$
Here we assume homoskedastic errors (you can extend to the heteroskedastic case easily). Therefore, the variance matrix is $\sigma_u^2Z'Z$, and the GLS estimator of equation (3) is
$$\hat\beta_{GLS}=\left(X'Z\left(\sigma_u^2Z'Z\right)^{-1}Z'X\right)^{-1}X'Z\left(\sigma_u^2Z'Z\right)^{-1}Z'Y$$
$$=\left(X'Z(Z'Z)^{-1}Z'X\right)^{-1}X'Z(Z'Z)^{-1}Z'Y\qquad(\text{the } \sigma_u^2\text{'s cancel})$$
$$=\hat\beta_{2SLS}.$$
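The cancellation of $\sigma_u^2$ can be confirmed numerically: the GLS estimator of equation (3) is unchanged by any positive scaling of the variance matrix. A quick sketch (toy data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # instruments
x = Z @ np.array([0.0, 1.0, 0.5]) + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

def gls_on_zy(sigma2):
    # GLS applied to Z'Y = Z'X beta + Z'u with variance sigma2 * Z'Z
    Omega_inv = np.linalg.inv(sigma2 * (Z.T @ Z))
    A = X.T @ Z @ Omega_inv @ Z.T @ X
    b = X.T @ Z @ Omega_inv @ Z.T @ y
    return np.linalg.solve(A, b)

print(np.allclose(gls_on_zy(1.0), gls_on_zy(7.3)))  # True: sigma_u^2 cancels
```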
3.5 Asymptotic distribution of 2SLS
We will discuss the asymptotic distribution of the 2SLS estimator in the following question (Final Review: Question 8), so let me omit this subsection.
Consider the model
$$y_i=x_i'\beta+u_i,\qquad\text{where } E[\,u_i\,|\,x_i\,]\neq0$$
for $i=1,\ldots,n$, where $x_i$ is a $K\times1$ vector of regressors. Suppose that there exists a vector of random variables $z_i$ such that
$$E[\,u_i\,|\,z_i\,]=0.$$
The OLS estimator is inconsistent:
$$\hat\beta_{OLS}=(X'X)^{-1}X'Y=\left(\frac{1}{N}\sum_{i=1}^{N}x_ix_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}x_iy_i=\left(\frac{1}{N}\sum_{i=1}^{N}x_ix_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}x_i(x_i'\beta+u_i)$$
$$=\beta+\left(\frac{1}{N}\sum_{i=1}^{N}x_ix_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^{N}x_iu_i\xrightarrow{p}\beta+E[x_ix_i']^{-1}E\Big[x_i\underbrace{E[\,u_i\,|\,x_i\,]}_{\neq0}\Big]\neq\beta.$$
The 2SLS estimator is
$$\hat\beta_{2SLS}=(X'P_ZX)^{-1}X'P_ZY=\left(X'Z(Z'Z)^{-1}Z'X\right)^{-1}X'Z(Z'Z)^{-1}Z'Y.$$
(3) Show that the estimator suggested in (2) can be viewed as a GMM estimator.
Answer:
We will discuss this question when we study GMM.
(4) Using the fact established in (3), provide the asymptotic distribution of the estimator for $\beta$.
Answer:
Actually, we do not need to answer (3) to solve this question. Rearranging the 2SLS estimator,
$$\hat\beta_{2SLS}=\beta+\left(X'Z(Z'Z)^{-1}Z'X\right)^{-1}X'Z(Z'Z)^{-1}Z'U.$$
Moving $\beta$ to the left-hand side and multiplying by $\sqrt{n}$ (the $\frac{1}{n}$'s are created),
$$\sqrt{n}\left(\hat\beta_{2SLS}-\beta\right)=\left(\frac{1}{n}X'Z\left(\frac{1}{n}Z'Z\right)^{-1}\frac{1}{n}Z'X\right)^{-1}\frac{1}{n}X'Z\left(\frac{1}{n}Z'Z\right)^{-1}\frac{1}{\sqrt{n}}Z'U.$$
By the WLLN and CLT,
$$\frac{1}{n}X'Z=\frac{1}{n}\sum_{i=1}^{n}x_iz_i'\xrightarrow{p}E[x_iz_i'],\qquad \frac{1}{n}Z'Z=\frac{1}{n}\sum_{i=1}^{n}z_iz_i'\xrightarrow{p}E[z_iz_i'],\qquad \frac{1}{\sqrt{n}}Z'U\xrightarrow{d}N\!\left(0,\,E\!\left[u_i^2z_iz_i'\right]\right).$$
Therefore,
$$\sqrt{n}\left(\hat\beta_{2SLS}-\beta\right)\xrightarrow{d}N\!\left(0_{K\times1},\,A\right),$$
where $A$ is defined as
$$A=\left(E[x_iz_i']\,E[z_iz_i']^{-1}\,E[z_ix_i']\right)^{-1}E[x_iz_i']\,E[z_iz_i']^{-1}\,E\!\left[u_i^2z_iz_i'\right]E[z_iz_i']^{-1}\,E[z_ix_i']\left(E[x_iz_i']\,E[z_iz_i']^{-1}\,E[z_ix_i']\right)^{-1}.$$
Under homoskedasticity, $E\!\left[u_i^2z_iz_i'\right]=\sigma_u^2E[z_iz_i']$, and $A$ simplifies to
$$B=\sigma_u^2\left(E[x_iz_i']\,E[z_iz_i']^{-1}\,E[z_ix_i']\right)^{-1}.$$
Or equivalently,
$$\hat\beta_{2SLS}\overset{a}{\sim}N\!\left(\beta,\,\frac{1}{n}A\right),\qquad \hat\beta_{2SLS}\overset{a}{\sim}N\!\left(\beta,\,\frac{1}{n}B\right).$$
(5) Provide a consistent estimator for the asymptotic covariance matrix established in (4). Justify your answer.
Answer:
By the WLLN we have
$$\frac{1}{n}\sum_{i=1}^{n}x_iz_i'\xrightarrow{p}E[x_iz_i'],\qquad \frac{1}{n}\sum_{i=1}^{n}z_iz_i'\xrightarrow{p}E[z_iz_i'],\qquad \frac{1}{n}\sum_{i=1}^{n}\hat u_i^2z_iz_i'\xrightarrow{p}E\!\left[u_i^2z_iz_i'\right],$$
where
$$\hat u_i=y_i-x_i'\hat\beta_{2SLS}.$$
Therefore, a consistent estimator of $A$ is
$$\hat A=\left(\frac{1}{n}\sum x_iz_i'\Big(\frac{1}{n}\sum z_iz_i'\Big)^{-1}\frac{1}{n}\sum z_ix_i'\right)^{-1}\frac{1}{n}\sum x_iz_i'\Big(\frac{1}{n}\sum z_iz_i'\Big)^{-1}\frac{1}{n}\sum\hat u_i^2z_iz_i'\Big(\frac{1}{n}\sum z_iz_i'\Big)^{-1}\frac{1}{n}\sum z_ix_i'\left(\frac{1}{n}\sum x_iz_i'\Big(\frac{1}{n}\sum z_iz_i'\Big)^{-1}\frac{1}{n}\sum z_ix_i'\right)^{-1}.$$
Under homoskedasticity we have
$$E\!\left[u_i^2z_iz_i'\right]=E_{z_i}\Big[\underbrace{E[\,u_i^2\,|\,z_i\,]}_{=\sigma_u^2}z_iz_i'\Big]=\sigma_u^2E[z_iz_i'],$$
so a consistent estimator of $B$ is
$$\hat B=\hat\sigma_u^2\left(\frac{1}{n}\sum x_iz_i'\Big(\frac{1}{n}\sum z_iz_i'\Big)^{-1}\frac{1}{n}\sum z_ix_i'\right)^{-1},\qquad\text{where }\hat\sigma_u^2=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-x_i'\hat\beta_{2SLS}\right)^2.$$
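Both covariance estimators are straightforward to compute from sample moments. A sketch on simulated over-identified data (homoskedastic errors, so the diagonals of $\hat A$ and $\hat B$ should agree up to sampling noise; the DGP is illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # L = 3 instruments
x = Z @ np.array([0.0, 1.0, 0.5]) + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])                        # K = 2 regressors
u = rng.normal(size=n)                                      # homoskedastic errors
y = X @ np.array([1.0, 0.5]) + u

Szz = Z.T @ Z / n
Sxz = X.T @ Z / n
b2sls = np.linalg.solve(Sxz @ np.linalg.inv(Szz) @ Sxz.T,
                        Sxz @ np.linalg.inv(Szz) @ (Z.T @ y / n))
u_hat = y - X @ b2sls

# Heteroskedasticity-robust A_hat (sandwich form)
S_uzz = (Z * (u_hat**2)[:, None]).T @ Z / n                 # (1/n) sum u_hat^2 z z'
H = Sxz @ np.linalg.inv(Szz) @ Sxz.T
A_hat = np.linalg.inv(H) @ Sxz @ np.linalg.inv(Szz) @ S_uzz \
        @ np.linalg.inv(Szz) @ Sxz.T @ np.linalg.inv(H)

# Homoskedastic B_hat
sigma2_hat = np.mean(u_hat**2)
B_hat = sigma2_hat * np.linalg.inv(H)

print(np.sqrt(np.diag(A_hat) / n), np.sqrt(np.diag(B_hat) / n))  # standard errors
```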
In this section, we review the partial regression results that you learned in Prof. Kyriazidou's Note 2. We need to utilize partial regression to solve the Comp questions (Comp 2003S Part III) in the next two sections. For a formal derivation of partial regression, please refer to the appendix of Prof. Kyriazidou's Note #2. Here, we just review the formulas and discuss their interpretations.
If the regressor matrix $X$ is partitioned into two groups $X_1$ and $X_2$,
$$Y=X_1\beta_1+X_2\beta_2+u,$$
the OLS estimators of $\beta_1$ and $\beta_2$ are given by
$$\hat\beta_1=(X_1'M_{X_2}X_1)^{-1}X_1'M_{X_2}Y,\qquad \hat\beta_2=(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}Y,$$
where
$$\underbrace{M_{X_1}}_{N\times N}=\underbrace{I_N}_{N\times N}-X_1(X_1'X_1)^{-1}X_1',\qquad M_{X_2}=I_N-X_2(X_2'X_2)^{-1}X_2',$$
and both are idempotent:
$$M_{X_1}M_{X_1}=M_{X_1},\qquad M_{X_2}M_{X_2}=M_{X_2}.$$
Intuitively, the residual operator MX1 extracts components that X1 cannot explain. Similarly, the residual operator
MX2 extracts components that X2 cannot explain.
Denote the residuals by
$$\tilde X_1=M_{X_2}X_1,\qquad \tilde X_2=M_{X_1}X_2.$$
The OLS estimators can be written as (by using the idempotent property of $M_{X_1}$ and $M_{X_2}$)
$$\hat\beta_1=(\tilde X_1'\tilde X_1)^{-1}\tilde X_1'Y=(\tilde X_1'\tilde X_1)^{-1}\tilde X_1'\tilde Y,\qquad \hat\beta_2=(\tilde X_2'\tilde X_2)^{-1}\tilde X_2'Y=(\tilde X_2'\tilde X_2)^{-1}\tilde X_2'\tilde Y.$$
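The partial regression formula can be verified numerically: the coefficients on $X_1$ from the joint OLS regression coincide with those from regressing $Y$ on the residualized $\tilde X_1=M_{X_2}X_1$. A short sketch with made-up coefficients:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])   # first block (with constant)
X2 = rng.normal(size=(n, 2))                             # second block
y = X1 @ np.array([1.0, 2.0]) + X2 @ np.array([-0.5, 0.3]) + rng.normal(size=n)

X = np.column_stack([X1, X2])
b_full = np.linalg.solve(X.T @ X, X.T @ y)               # joint OLS: [beta1, beta2]

M_X2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T) # residual maker of X2
X1_t = M_X2 @ X1                                         # X1 residualized on X2
b1_partial = np.linalg.solve(X1_t.T @ X1_t, X1_t.T @ y)

print(np.allclose(b_full[:2], b1_partial))               # True
```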
Consider the model
$$y_i=\beta'x_i+\gamma'w_i+u_i\qquad(i=1,\ldots,n),$$
where $\beta$ is $k\times1$ and $\gamma$ is $p\times1$. In matrix notation,
$$\underbrace{\begin{bmatrix}y_1\\ \vdots\\ y_n\end{bmatrix}}_{n\times1}=\underbrace{\begin{bmatrix}x_1'\\ \vdots\\ x_n'\end{bmatrix}}_{n\times k}\underbrace{\beta}_{k\times1}+\underbrace{\begin{bmatrix}w_1'\\ \vdots\\ w_n'\end{bmatrix}}_{n\times p}\underbrace{\gamma}_{p\times1}+\underbrace{\begin{bmatrix}u_1\\ \vdots\\ u_n\end{bmatrix}}_{n\times1}.$$
By partial regression, the OLS estimator of $\beta$ can be written as
$$\hat\beta_{OLS}=(X'M_WX)^{-1}X'M_WY,\qquad\text{where } M_W=I_n-W(W'W)^{-1}W'.$$
Substituting $Y=X\beta+W\gamma+U$,
$$\hat\beta_{OLS}=(X'M_WX)^{-1}X'M_WY=(X'M_WX)^{-1}X'M_W(X\beta+W\gamma+U)$$
$$=\beta+(X'M_WX)^{-1}\underbrace{X'M_WW}_{=O}\gamma+(X'M_WX)^{-1}X'M_WU=\beta+\left(\frac{1}{n}X'M_WX\right)^{-1}\frac{1}{n}X'M_WU$$
(since $M_WW=\left(I_n-W(W'W)^{-1}W'\right)W=W-W=O_{n\times p}$). Notice that
$$\frac{1}{n}X'M_WU=\frac{1}{n}X'\left(I_n-W(W'W)^{-1}W'\right)U=\frac{1}{n}X'U-\frac{1}{n}X'W\left(\frac{1}{n}W'W\right)^{-1}\frac{1}{n}W'U$$
$$=\frac{1}{n}\sum_{i=1}^{n}x_iu_i-\frac{1}{n}\sum_{i=1}^{n}x_iw_i'\left(\frac{1}{n}\sum_{i=1}^{n}w_iw_i'\right)^{-1}\frac{1}{n}\sum_{i=1}^{n}w_iu_i$$
$$\xrightarrow{p}0_{k\times1}-E[x_iw_i']\,E[w_iw_i']^{-1}\underbrace{E[w_iu_i]}_{\neq0_{p\times1}}\neq0_{k\times1}.$$
Therefore,
$$\operatorname{plim}\hat\beta_{OLS}\neq\beta,$$
i.e., $\hat\beta_{OLS}$ is not consistent.
Premultiplying the model by $P_Z$, we obtain the exogenized equation
$$P_ZY=P_ZX\beta+P_ZW\gamma+P_ZU=\hat X\beta+\hat W\gamma+\hat U,\qquad(4)$$
where
$$\underbrace{P_Z}_{n\times n}=Z(Z'Z)^{-1}Z',\qquad \hat Y=P_ZY,\quad \hat X=P_ZX,\quad \hat W=P_ZW,\quad \hat U=P_ZU.$$
By partial regression on the exogenized equation, the estimator of $\gamma$ is
$$\hat\gamma_{ExogenizedOLS}=\left(\hat W'M_{\hat X}\hat W\right)^{-1}\hat W'M_{\hat X}\hat Y=\left(\hat W'M_{\hat X}\hat W\right)^{-1}\hat W'M_{\hat X}\left(\hat X\beta+\hat W\gamma+\hat U\right)$$
$$=\left(\hat W'M_{\hat X}\hat W\right)^{-1}\underbrace{\hat W'M_{\hat X}\hat X}_{=O_{p\times k}}\beta+\gamma+\left(\hat W'M_{\hat X}\hat W\right)^{-1}\hat W'M_{\hat X}\hat U$$
$$=\gamma+\left(\hat W'M_{\hat X}\hat W\right)^{-1}\hat W'M_{\hat X}\hat U\qquad(\text{since } M_{\hat X}\hat X=O_{n\times k}).$$
Consider the last term:
$$\hat W'M_{\hat X}\hat U=(P_ZW)'\left(I_n-\hat X(\hat X'\hat X)^{-1}\hat X'\right)Z(Z'Z)^{-1}Z'U,$$
and note that
$$\frac{1}{n}Z'U=\frac{1}{n}\sum_{i=1}^{n}z_iu_i\xrightarrow{p}E[z_iu_i]=0\qquad(\text{by the WLLN}),$$
so that $\frac{1}{n}\hat W'M_{\hat X}\hat U\xrightarrow{p}0_{p\times1}$. Therefore
$$\hat\gamma_{ExogenizedOLS}\xrightarrow{p}\gamma,$$
i.e., $\hat\gamma_{ExogenizedOLS}$ is consistent.
(c) Under the condition in (b), consider the following estimation procedure: (i) estimate $\beta$ from a regression of $Y$ on $X$; and (ii) compute $\tilde Y=M_XY$ (where $M_X=I-X(X'X)^{-1}X'$) and estimate $\gamma$ by computing the instrumental variable estimator from a regression of $\tilde Y$ on $w$, using $z$ as an instrumental variable for $w$.
Answer:
We follow the suggestions given in the question. We regress $Y$ on $X$ and obtain the residual
$$\tilde Y=M_XY.$$
Next, we project $W$ on the instrument matrix $Z$ and obtain the projected matrix
$$\hat W=P_ZW.$$
Then, regressing $\tilde Y$ on $\hat W$ yields the alternative estimator
$$\hat\gamma_{Alternative}=\left(\hat W'\hat W\right)^{-1}\hat W'\tilde Y.$$
(d) Compare the estimators for $\gamma$ from (b) and (c). Explain the difference and/or the similarity.
Answer:
In (b) and (c), we have derived the two estimators
$$\hat\gamma_{ExogenizedOLS}=\left(\hat W'M_{\hat X}\hat W\right)^{-1}\hat W'M_{\hat X}\hat Y=\left(\frac{1}{n}\hat W'M_{\hat X}\hat W\right)^{-1}\frac{1}{n}\hat W'M_{\hat X}\hat Y,$$
$$\hat\gamma_{Alternative}=\left(\hat W'\hat W\right)^{-1}\hat W'\tilde Y=\left(\frac{1}{n}\hat W'\hat W\right)^{-1}\frac{1}{n}\hat W'\tilde Y.$$
To compare them, write
$$\hat Y=P_ZY=P_ZX\beta+P_ZW\gamma+P_ZU$$
and
$$M_X\hat Y=M_XP_ZX\beta+M_XP_ZW\gamma+M_XP_ZU.$$
Since the exogenous regressors $X$ are included in the instrument matrix $Z$, we have $P_ZX=X$, so
$$M_XP_Z=P_Z-X(X'X)^{-1}\underbrace{X'P_Z}_{=(P_ZX)'=X'}=P_Z-X(X'X)^{-1}X',$$
which is symmetric; hence $M_XP_Z=(M_XP_Z)'=P_ZM_X$, and
$$M_XP_ZX=P_Z\underbrace{M_XX}_{=O_{n\times k}}=O_{n\times k}.$$
Therefore
$$M_X\hat Y=M_XP_ZW\gamma+M_XP_ZU=\tilde W\gamma+M_XP_ZU,\qquad\text{where } \tilde W=M_X\hat W=M_XP_ZW,$$
and, using $M_{\hat X}=M_X$ (because $\hat X=P_ZX=X$),
$$\hat\gamma_{ExogenizedOLS}=\left((P_ZW)'M_XP_ZW\right)^{-1}(P_ZW)'M_XP_ZY=\left(\tilde W'\tilde W\right)^{-1}\tilde W'\tilde Y.$$
Thus the two estimators share the same "numerator," $\hat W'\tilde Y=\tilde W'\tilde Y$, and differ only in the weighting matrix: $\hat\gamma_{ExogenizedOLS}$ uses $\tilde W'\tilde W$ (the projected $W$ with $X$ partialled out), while $\hat\gamma_{Alternative}$ uses $\hat W'\hat W$. Both are consistent for $\gamma$.
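The two estimators can be compared on simulated data. In the sketch below (hypothetical DGP; all coefficient values are illustrative), $x$ is exogenous and included among the instruments, while $w$ is endogenous and instrumented by $z$; in this design both estimators recover $\gamma=0.4$:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

x = rng.normal(size=n)                       # exogenous regressor
z = rng.normal(size=n)                       # instrument for w
e = rng.normal(size=n)
w = 0.8 * z + e                              # endogenous regressor
u = 0.5 * e + rng.normal(size=n)             # error correlated with w
y = 1.0 + 0.7 * x + 0.4 * w + u              # true gamma = 0.4

X = np.column_stack([np.ones(n), x])
W = w[:, None]
Z = np.column_stack([X, z])                  # instruments include the exogenous X

def proj(A, B):                              # projection of B on the columns of A
    return A @ np.linalg.solve(A.T @ A, A.T @ B)

M = lambda A, B: B - proj(A, B)              # residual maker M_A applied to B

W_hat = proj(Z, W)                           # P_Z W
Y_hat = proj(Z, y)                           # P_Z Y

# (b) exogenized OLS: partial X out of the exogenized equation
MW, MY = M(X, W_hat), M(X, Y_hat)
g_exog = np.linalg.solve(MW.T @ MW, MW.T @ MY)

# (c) alternative: regress Y_tilde = M_X Y on W_hat
Y_til = M(X, y)
g_alt = np.linalg.solve(W_hat.T @ W_hat, W_hat.T @ Y_til)

print(g_exog.item(), g_alt.item())           # both close to 0.4
```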