
ECON 203C: System Models

TA Note 1: Version 1

Instrumental Variable (IV) and Two Stage Least Square (2SLS) Estimators
Hisayuki Yoshimoto
Last Modified: April 08, 2008

Abstract: In section 1, we discuss the inconsistency of the ordinary least square estimator when the underlying OLS
assumptions are violated. Throughout this TA note, I use the classic labor economics example, the return to
education. In sections 2 and 3, we review the instrumental variable (IV) and two stage least square (2SLS) estimators
and their interpretations. In section 4, we discuss the asymptotic properties of the 2SLS estimator. In section 5, we review
partial (residual) regression. Finally, in section 6, we solve Comp 2003S Part III Question 1, an application of partial
regression with instruments.

1 Inconsistency of OLS Estimator

1.1 Bias and Inconsistency with Endogenous Regressors

Roughly speaking,¹ a regressor is called exogenous if it is uncorrelated with the error term, and endogenous if it is
correlated with the error term. Here, we consider the OLS model in which the regressors are endogenous:

$\underbrace{y_i}_{1 \times 1} = \underbrace{x_i'}_{1 \times K} \underbrace{\beta}_{K \times 1} + \underbrace{u_i}_{1 \times 1}.$

In matrix notation,

$\underbrace{Y}_{N \times 1} = \underbrace{X}_{N \times K} \underbrace{\beta}_{K \times 1} + \underbrace{U}_{N \times 1}.$

However, unlike the usual OLS assumptions, here we assume

$E_{u|x_i}[u_i \mid x_i] \neq 0_{1 \times 1} \qquad \text{and} \qquad E_{u|X}[u \mid X] \neq 0_{N \times 1}$

and

$\frac{1}{N}\sum_{i=1}^{N} x_i u_i \overset{p}{\to} E[x_i u_i] \neq 0_{K \times 1} \qquad \text{and} \qquad \frac{1}{N}\sum_{i=1}^{N} x_i x_i' \overset{p}{\to} E[x_i x_i'].$

Intuitively, the regressor $x_i$ and the error term $u_i$ are correlated.


The OLS estimator is

$\underbrace{\hat{\beta}_{OLS}}_{K \times 1} = (X'X)^{-1} X'Y = (X'X)^{-1} X'(X\beta + U) = \beta + (X'X)^{-1} X'U.$

¹These definitions are absolutely not formal. Here, I just present intuitions for exogenous and endogenous regressors.

Consider the expectation of $\hat{\beta}_{OLS}$:

$E_{X,u}[\hat{\beta}_{OLS}] = \beta + E_{X,u}\left[(X'X)^{-1} X'U\right]$
$\quad = \beta + E_X\left[E_{u|X}\left[(X'X)^{-1} X'U \mid X\right]\right] \qquad \text{(Law of Iterated Expectation)}$
$\quad = \beta + E_X\Big[(X'X)^{-1} X' \underbrace{E_{u|X}[U \mid X]}_{\neq 0_{N \times 1}}\Big]$
$\quad \neq \beta.$

Thus, the OLS estimator is biased (finite sample property).


Furthermore,

$\hat{\beta}_{OLS} = (X'X)^{-1} X'Y = \left(\frac{1}{N}\sum_{i=1}^{N} x_i x_i'\right)^{-1} \frac{1}{N}\sum_{i=1}^{N} x_i y_i = \left(\frac{1}{N}\sum_{i=1}^{N} x_i x_i'\right)^{-1} \frac{1}{N}\sum_{i=1}^{N} x_i (x_i'\beta + u_i)$
$\quad = \beta + \left(\frac{1}{N}\sum_{i=1}^{N} x_i x_i'\right)^{-1} \frac{1}{N}\sum_{i=1}^{N} x_i u_i,$

and

$\operatorname{plim} \hat{\beta}_{OLS} = \beta + \operatorname{plim}\left(\frac{1}{N}\sum_{i=1}^{N} x_i x_i'\right)^{-1} \operatorname{plim}\frac{1}{N}\sum_{i=1}^{N} x_i u_i = \beta + E[x_i x_i']^{-1} \underbrace{E[x_i u_i]}_{\neq 0_{K \times 1}} \neq \beta \qquad \text{(by WLLN, Slutsky, and continuity theorems)}.$

Therefore, the OLS estimator is also inconsistent (large sample property).


In summary, the OLS estimator with endogenous regressor(s) is not only biased in finite samples, but also inconsistent
in large samples. This is a serious problem.
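The inconsistency above is easy to see in a small simulation. The sketch below is purely illustrative (the data-generating process and all numbers are hypothetical): the regressor is built to be correlated with the error term, so the OLS slope converges to the biased probability limit rather than the true coefficient.

```python
import numpy as np

# Endogenous regressor: x_i and u_i share the common shock v_i,
# so E[x_i u_i] = Var(v_i) = 1 != 0 and OLS is inconsistent.
rng = np.random.default_rng(0)
N = 200_000
beta = 2.0

v = rng.normal(size=N)             # common shock driving the endogeneity
x = 1.0 + v + rng.normal(size=N)   # regressor, correlated with the error
u = v + rng.normal(size=N)         # error term, Cov(x_i, u_i) = 1
y = beta * x + u

X = np.column_stack([np.ones(N), x])
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# plim of the slope is beta + Cov(x,u)/Var(x) = 2 + 1/2 = 2.5, not 2
print(beta_ols[1])
```

Even with a very large sample, the slope estimate stays near 2.5: more data does not repair endogeneity.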

1.2 Example

Consider the classic example in labor economics, estimating the return to education.
We want to regress

$\ln(hwage_i) = \beta_1 + \beta_2\, edu_i + \beta_3\, ex_i + \beta_4\, ab_i + \varepsilon_i$

where $hwage_i$ is the hourly wage, $edu_i$ is education length, $ex_i$ is experience on the current job, and $ab_i$ is
(unobserved) ability, and we want to estimate the return to education parameter $\beta_2$.


The problem is that the explanatory variable ability² ($ab_i$) is usually unobserved, and researchers inevitably omit the
ability term ($ab_i$). Therefore, the regression equation becomes

$\ln(hwage_i) = \beta_1 + \beta_2\, edu_i + \beta_3\, ex_i + u_i \qquad (1)$

where

$u_i = \beta_4\, ab_i + \varepsilon_i.$

²If you do not like the abstract terminology "ability", replace it by IQ.

Denote

$y_i = \ln(hwage_i) \qquad \text{and} \qquad x_i = \begin{bmatrix} 1 \\ edu_i \\ ex_i \end{bmatrix}.$

The OLS estimator is

$\hat{\beta}_{OLS} = (X'X)^{-1} X'Y = \left(\frac{1}{N}\sum_{i=1}^{N} x_i x_i'\right)^{-1} \frac{1}{N}\sum_{i=1}^{N} x_i y_i = \beta + \left(\frac{1}{N}\sum_{i=1}^{N} x_i x_i'\right)^{-1} \frac{1}{N}\sum_{i=1}^{N} x_i u_i$
$\quad = \beta + \left(\frac{1}{N}\sum_{i=1}^{N} x_i x_i'\right)^{-1} \frac{1}{N}\sum_{i=1}^{N} \begin{bmatrix} 1 \\ edu_i \\ ex_i \end{bmatrix} (\beta_4\, ab_i + \varepsilon_i).$

The omission of ability ($ab_i$) causes the so-called omitted variable bias. The labor economics literature states that
there is a strong correlation between education ($edu_i$) and unobserved ability ($ab_i$): an individual with a long education
length is expected to have high ability, i.e. education is an endogenous variable (education is correlated with the error
term). Therefore, running the above regression causes a serious problem, a biased and inconsistent estimator. As a
consequence, we cannot correctly estimate the return to education parameter $\beta_2$.³

2 Instrumental Variable (IV) Estimator

2.1 IV Estimator

Keep considering the model

$y_i = x_i'\beta + u_i, \qquad Y = X\beta + U.$

Now, assume that we have a K×1 instrumental vector $z_i$ (containing the instrumental variables $z_{i1}, \ldots, z_{iK}$) that has
the following properties:
(1) $z_i$ is uncorrelated with $u_i$;
(2) $z_i$ is correlated with $x_i$.
Mathematically, we need the condition

$\frac{1}{N}\sum_{i=1}^{N} z_i u_i \overset{p}{\to} 0_{K \times 1}.$

Then, the instrumental variable (IV) estimator is defined as

$\hat{\beta}_{IV} = (Z'X)^{-1} Z'Y = \left(\frac{1}{N}\sum_{i=1}^{N} z_i x_i'\right)^{-1} \frac{1}{N}\sum_{i=1}^{N} z_i y_i.$
Consider the consistency of $\hat{\beta}_{IV}$. We can rewrite the IV estimator and check its asymptotic behavior:

$\hat{\beta}_{IV} = \left(\frac{1}{N}\sum_{i=1}^{N} z_i x_i'\right)^{-1} \frac{1}{N}\sum_{i=1}^{N} z_i y_i = \left(\frac{1}{N}\sum_{i=1}^{N} z_i x_i'\right)^{-1} \frac{1}{N}\sum_{i=1}^{N} z_i (x_i'\beta + u_i)$
$\quad = \beta + \left(\frac{1}{N}\sum_{i=1}^{N} z_i x_i'\right)^{-1} \frac{1}{N}\sum_{i=1}^{N} z_i u_i \overset{p}{\to} \beta + E[z_i x_i']^{-1} \underbrace{E[z_i u_i]}_{= 0_{K \times 1}} = \beta.$

Thus, the IV estimator is consistent (large sample property).
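A minimal simulation of this consistency result (with a hypothetical data-generating process): the instrument $z_i$ is correlated with $x_i$ but not with $u_i$, and the IV estimator $(Z'X)^{-1}Z'Y$ recovers the true coefficient where OLS would not.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
beta = 2.0

z = rng.normal(size=N)                  # instrument
v = rng.normal(size=N)                  # shock shared by x and u
x = z + v + rng.normal(size=N)          # correlated with z and with u
u = v + rng.normal(size=N)              # uncorrelated with z
y = beta * x + u

Z = np.column_stack([np.ones(N), z])
X = np.column_stack([np.ones(N), x])
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)   # (Z'X)^{-1} Z'Y
print(beta_iv[1])                             # close to beta = 2.0
```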


³Here, we assume experience ($ex_i$) is an exogenous variable (experience on the current job is uncorrelated with the error term or ability ($ab_i$)).

2.2 Example of IV Estimator

Continuing the example of the return to education, the true model is

$\ln(hwage_i) = \beta_1 + \beta_2\, edu_i + \beta_3\, ex_i + \underbrace{\beta_4\, ab_i + \varepsilon_i}_{= u_i}.$

However, due to the unavailability of ability ($ab_i$), we regress the equation

$\ln(hwage_i) = \beta_1 + \beta_2\, edu_i + \beta_3\, ex_i + u_i$

where

$u_i = \beta_4\, ab_i + \varepsilon_i.$

As we discussed before, education is an endogenous variable (education is correlated with ability). Therefore, we need to
employ the IV estimation method to consistently estimate the return to education parameter $\beta_2$. So, what instrumental
variable is available for this regression? We need to find a variable that is correlated with education and uncorrelated
with ability (equivalently, uncorrelated with the error term).
Using the last digit of the social security number as an instrument is a bad idea: it is not only uncorrelated with
individual ability, but also uncorrelated with education.
Angrist and Krueger (1991, Quarterly Journal of Economics) suggest birth month as an instrument for education. The
U.S. school system categorizes students into school years, and as a consequence of this education system, there
are first and last school categorization months. It is reported that students who were born in earlier months have higher
school grades and SAT scores compared to students who were born later. As a consequence, students who are born
earlier are more likely to go to college, so birth month is correlated with the length of education.
Define the birth month variable as $birthm_i$ (assigning the first month to 1 and the last month to 12). Then the IV vector is

$z_i = \begin{bmatrix} 1 \\ birthm_i \\ ex_i \end{bmatrix}.$

Here, the constant is by definition uncorrelated with the error term (no matter what value the error takes, it is always
constant). As we discussed above, $birthm_i$ is uncorrelated with the error term (i.e. ability). Also, experience on the
current job is uncorrelated with ability. Recall that we denote the dependent variable and the vector of regressors as

$y_i = \ln(hwage_i), \qquad x_i = \begin{bmatrix} 1 \\ edu_i \\ ex_i \end{bmatrix}, \qquad u_i = \beta_4\, ab_i + \varepsilon_i.$
Then, the IV estimator is

$\underbrace{\hat{\beta}_{IV}}_{3 \times 1} = (Z'X)^{-1} Z'Y = \beta + \left(\frac{1}{N}\sum_{i=1}^{N} z_i x_i'\right)^{-1} \frac{1}{N}\sum_{i=1}^{N} z_i u_i = \beta + \left(\frac{1}{N}\sum_{i=1}^{N} z_i x_i'\right)^{-1} \frac{1}{N}\sum_{i=1}^{N} \begin{bmatrix} 1 \\ birthm_i \\ ex_i \end{bmatrix} (\beta_4\, ab_i + \varepsilon_i).$

The second term of the above equation is expected to converge to $0_{3 \times 1}$.

2.3 Asymptotic Distribution of IV Estimator

We derive the asymptotic distribution of the IV estimator. As we discussed,

$\hat{\beta}_{IV} = \beta + \left(\frac{1}{N}\sum_{i=1}^{N} z_i x_i'\right)^{-1} \frac{1}{N}\sum_{i=1}^{N} z_i u_i.$

Transforming the above equation into

$\sqrt{N}\left(\hat{\beta}_{IV} - \beta\right) = \left(\frac{1}{N}\sum_{i=1}^{N} z_i x_i'\right)^{-1} \frac{1}{\sqrt{N}}\sum_{i=1}^{N} z_i u_i,$

we have

$\frac{1}{N}\sum_{i=1}^{N} z_i x_i' \overset{p}{\to} E[z_i x_i'] \qquad \text{and} \qquad \frac{1}{\sqrt{N}}\sum_{i=1}^{N} z_i u_i \overset{d}{\to} N\left(0,\, E[u_i^2 z_i z_i']\right).$
N i=1

Then, by the WLLN, Slutsky, and continuity theorems, the limiting distribution of $\hat{\beta}_{IV}$ is

$\sqrt{N}\left(\hat{\beta}_{IV} - \beta\right) \overset{d}{\to} N\left(0,\; E[z_i x_i']^{-1}\, E[u_i^2 z_i z_i']\, \left(E[z_i x_i']^{-1}\right)'\right).$

A consistent estimator of the variance is obtained by

$\left(\frac{1}{n}\sum_{i=1}^{N} z_i x_i'\right)^{-1} \left(\frac{1}{n}\sum_{i=1}^{N} \hat{u}_i^2 z_i z_i'\right) \left(\left(\frac{1}{n}\sum_{i=1}^{N} z_i x_i'\right)^{-1}\right)'$

where

$\hat{u}_i = y_i - x_i' \hat{\beta}_{IV}.$
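The variance estimator above can be sketched in a few lines on simulated (hypothetical) data: compute $\hat{\beta}_{IV}$, form the residuals $\hat{u}_i$, and assemble the sandwich $(\frac{1}{n}\sum z_i x_i')^{-1} (\frac{1}{n}\sum \hat{u}_i^2 z_i z_i') ((\frac{1}{n}\sum z_i x_i')^{-1})'$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
z = rng.normal(size=n)
v = rng.normal(size=n)
x = z + v + rng.normal(size=n)
u = v + rng.normal(size=n)
y = 2.0 * x + u

Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), x])
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
uhat = y - X @ beta_iv                      # residuals uhat_i

Szx = Z.T @ X / n                           # 1/n sum z_i x_i'
meat = (Z * uhat[:, None] ** 2).T @ Z / n   # 1/n sum uhat_i^2 z_i z_i'
bread = np.linalg.inv(Szx)
avar = bread @ meat @ bread.T               # estimated asymptotic variance
se = np.sqrt(np.diag(avar) / n)             # standard errors
print(se)
```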

3 Two Stage Least Square (2SLS) Estimator

3.1 2SLS Estimator

Assume that there are more instrumental variables than regressors. We might not want to discard any of the
available instruments. What estimation methodology enables us to use all instruments?
Rewriting the model,

$\underbrace{y_i}_{1 \times 1} = \underbrace{x_i'}_{1 \times K} \underbrace{\beta}_{K \times 1} + \underbrace{u_i}_{1 \times 1}, \qquad \underbrace{Y}_{N \times 1} = \underbrace{X}_{N \times K} \underbrace{\beta}_{K \times 1} + \underbrace{U}_{N \times 1}.$

Now, the vector of available instruments is $\underbrace{z_i}_{L \times 1}$, where $L \geq K$, with the condition

$\frac{1}{N}\sum_{i=1}^{N} z_i u_i \overset{p}{\to} E[z_i u_i] = 0_{L \times 1}.$

We stack up the instrument vectors and denote the matrix of instruments as

$\underbrace{Z}_{N \times L} = \begin{bmatrix} z_1' \\ \vdots \\ z_N' \end{bmatrix}.$

The two stage least square (2SLS) estimator is defined as (you should check the dimensions of this estimator)

$\underbrace{\hat{\beta}_{2SLS}}_{K \times 1} = \left(\underbrace{X'Z}_{K \times L} (Z'Z)^{-1} \underbrace{Z'X}_{L \times K}\right)^{-1} \underbrace{X'Z}_{K \times L} (Z'Z)^{-1} \underbrace{Z'Y}_{L \times 1} = (X' P_Z X)^{-1} X' P_Z Y,$

where we use the notation of the projection matrix

$\underbrace{P_Z}_{N \times N} = \underbrace{Z}_{N \times L} \Big(\underbrace{Z'Z}_{L \times L}\Big)^{-1} \underbrace{Z'}_{L \times N}.$

3.2 Example of 2SLS Estimator

Re-discussing the example of the return to education, rewrite the model

$\ln(hwage_i) = \beta_1 + \beta_2\, edu_i + \beta_3\, ex_i + \underbrace{\beta_4\, ab_i + \varepsilon_i}_{= u_i} \qquad \text{(where } ab_i \text{ is an unavailable variable)}$

$\ln(hwage_i) = \beta_1 + \beta_2\, edu_i + \beta_3\, ex_i + u_i, \qquad \text{where } u_i = \beta_4\, ab_i + \varepsilon_i.$

We have discussed that $edu_i$ is an endogenous variable, i.e. correlated with the error term (or ability $ab_i$). Also, we have
argued that we can utilize birth month as an instrument for education. Card (1995, Aspects of Labour Market Behavior)
suggests the vicinity of a four-year college as an instrument for education. Prof. Card argues that high school students who
are close to a four-year college expect lower expenditure for college life because they can commute from their homes. As a
result, high school students who live near four-year colleges are more likely to obtain opportunities for college education.
On the other hand, the vicinity of a college has no relation to individual ability. (Does an individual who was born close
to a college have high ability? I don't think so.) Therefore, we can use the vicinity of a four-year college as an instrument.
Denote the distance between individual i and the nearest four-year college as $dist_i$. Then, the vector of instruments and
the stacked-up matrix are
$\underbrace{z_i}_{4 \times 1} = \begin{bmatrix} 1 \\ birthm_i \\ dist_i \\ ex_i \end{bmatrix} \qquad \text{and} \qquad \underbrace{Z}_{N \times 4} = \begin{bmatrix} z_1' \\ \vdots \\ z_N' \end{bmatrix} = \begin{bmatrix} 1 & birthm_1 & dist_1 & ex_1 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & birthm_N & dist_N & ex_N \end{bmatrix},$
and the 2SLS estimate is obtained by

$\underbrace{\hat{\beta}_{2SLS}}_{3 \times 1} = \left(\underbrace{X'Z}_{3 \times 4} \Big(\underbrace{Z'Z}_{4 \times 4}\Big)^{-1} \underbrace{Z'X}_{4 \times 3}\right)^{-1} \underbrace{X'Z}_{3 \times 4} \Big(\underbrace{Z'Z}_{4 \times 4}\Big)^{-1} \underbrace{Z'Y}_{4 \times 1}.$

3.3 Two Stage Interpretation of 2SLS Estimator

There are two interpretations of 2SLS. The first interpretation is straightforward: implementing least square regression
twice, in first and second stages.
First Stage:
In the first stage, we project $x_i$ on $z_i$ (or equivalently, project X on Z):

$\underbrace{x_i}_{K \times 1} = \underbrace{\Pi'}_{K \times L} \underbrace{z_i}_{L \times 1} + \underbrace{v_i}_{K \times 1}$

where $v_i$ is an error vector. We take the transpose of the above equation:

$\underbrace{x_i'}_{1 \times K} = \underbrace{z_i'}_{1 \times L} \underbrace{\Pi}_{L \times K} + \underbrace{v_i'}_{1 \times K}.$

By stacking up the transposed vectors, we obtain the matrix notation

$\underbrace{X}_{N \times K} = \underbrace{Z}_{N \times L} \underbrace{\Pi}_{L \times K} + \underbrace{V}_{N \times K}, \qquad \begin{bmatrix} x_1' \\ \vdots \\ x_N' \end{bmatrix} = \begin{bmatrix} z_1' \\ \vdots \\ z_N' \end{bmatrix} \Pi + \begin{bmatrix} v_1' \\ \vdots \\ v_N' \end{bmatrix}.$

By applying the least square estimator to the above equation (least square in the matrix sense), we obtain the OLS
estimator of $\Pi$:

$\underbrace{\hat{\Pi}}_{L \times K} = (Z'Z)^{-1} Z'X.$

Thus we can obtain the projection of X on Z (the projection of a matrix on another matrix):

$\underbrace{\hat{X}}_{N \times K} = Z\hat{\Pi} = Z(Z'Z)^{-1} Z'X = \underbrace{P_Z}_{N \times N} \underbrace{X}_{N \times K} \qquad \left(\text{where } P_Z = Z(Z'Z)^{-1} Z'\right),$

where $P_Z$ is the projection matrix of Z. Note that we denote $\hat{X}$ as the projection of X on Z.
Second Stage:
In the second stage, we regress Y on the projected matrix $\hat{X}$:

$Y = \hat{X}\beta + \eta.$

Then, the least square estimator of $\beta$ is equal to the 2SLS estimator:

$\hat{\beta} = (\hat{X}'\hat{X})^{-1} \hat{X}'Y$
$\quad = \left((P_Z X)' P_Z X\right)^{-1} (P_Z X)' Y \qquad \text{(substituting } \hat{X} = P_Z X)$
$\quad = (X' P_Z P_Z X)^{-1} X' P_Z Y \qquad \text{(since } P_Z \text{ is symmetric, } P_Z' = P_Z)$
$\quad = (X' P_Z X)^{-1} X' P_Z Y \qquad \text{(since } P_Z \text{ is idempotent, } P_Z P_Z = P_Z)$
$\quad = \left(X'Z(Z'Z)^{-1} Z'X\right)^{-1} X'Z(Z'Z)^{-1} Z'Y \qquad \text{(substituting } P_Z = Z(Z'Z)^{-1} Z')$
$\quad = \hat{\beta}_{2SLS}.$

The name "two stage least square" comes from this two stage procedure: projection in the first stage and regression
on the projected matrix in the second stage.
Formally, the above two stage procedure can be described as follows. To the model equation

$y_i = x_i'\beta + u_i, \qquad Y = X\beta + U,$

we multiply the projection matrix $P_Z$ from the left:

$P_Z Y = P_Z X\beta + P_Z U. \qquad (2)$

Let us call this operation "exogenizing", since we project the endogenous variable matrix X on the exogenous variable
matrix Z.

Applying OLS to equation (2),

$\hat{\beta}_{OLS} = \left((P_Z X)' P_Z X\right)^{-1} (P_Z X)' P_Z Y = (X' P_Z P_Z X)^{-1} X' P_Z P_Z Y = (X' P_Z X)^{-1} X' P_Z Y \qquad \text{(since } P_Z \text{ is idempotent, } P_Z P_Z = P_Z)$
$\quad = \hat{\beta}_{2SLS}.$

Thus, the two stage least square procedure is nothing more than regressing the "exogenized" model equation.
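The two-stage interpretation can be verified numerically (illustrative data, two instruments so L > K): running the two explicit OLS stages reproduces the one-shot formula $(X'P_Z X)^{-1} X'P_Z Y$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # two instruments (L > K)
v = rng.normal(size=n)
x = z1 + 0.5 * z2 + v + rng.normal(size=n)        # endogenous regressor
u = v + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])

# one-shot 2SLS: (X' Pz X)^{-1} X' Pz Y
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
b_2sls = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)

# two explicit stages: project X on Z, then regress y on the fitted Xhat
Xhat = Pz @ X
b_two_stage = np.linalg.solve(Xhat.T @ Xhat, Xhat.T @ y)

print(np.allclose(b_2sls, b_two_stage))           # True
```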

3.4 GLS Interpretation of 2SLS

Another interpretation is generalized least square (GLS).


Note that we have the model
+ ui
yi
= x0i
|{z}
|{z}|{z} |{z}
1 1

1 KK 1

Y
=
|{z}

N KK 1

X
+ u :
|{z}
|{z} |{z}

N 1

Multiplying Z 0 to above equation from left

1 1

N 1

Z 0 Y = Z 0 X + Z 0 u:

(3)

and regressing this equitation with GLS method. The error vector Zu has conditional expectation and variance are
E ujZ [ Zuj Z]

= ZE ujZ [ uj Z] = 0N
{z
}
|
=0N

V ar ujZ [ Zuj Z]

20

6B
= E ujZ 4 @Zu
0

10

CB
E ujZ [ Zuj Z]A @Zu
|
{z
}
=0N

10

7
C
E ujZ [ Zuj Z]A Z 5
|
{z
}
=0N

= E ujZ [ Zuu Z j Z] = ZE ujZ [ uu j Z] Z = Z

2
u IN

Z0 =

2
0
u ZZ

Here we assume homoskedastic error. (you can extend heteroskedastic case easily)
Therefore, the variance matrix = 2u ZZ 0 and GLS estimator of equation (3) is
^

GLS

(ZX)

ZX

(Z 0 X)

X 0 Z (ZZ 0 )

2
0
u ZZ
1

Z 0X

(ZX)

Z 0X
1

ZY
0

(Z 0 X)

X 0 Z (ZZ 0 )

ZY

2
0
u ZZ

2
u s

Z 0Y

(substituting

2
0
u ZZ

are cancelled out)

2SLS

Therefore, 2SLS is GLS estimator of equation (3).
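A quick numerical check of this equivalence (hypothetical data): computing the GLS estimator of equation (3) with weight matrix $(Z'Z)^{-1}$ (the $\sigma_u^2$ factor cancels, so it is dropped) reproduces the 2SLS estimator exactly.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
v = rng.normal(size=n)
x = z1 + z2 + v + rng.normal(size=n)
u = v + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])

# GLS on Z'Y = Z'X b + Z'u with weight (Z'Z)^{-1} (sigma_u^2 cancels)
ZZinv = np.linalg.inv(Z.T @ Z)
b_gls = np.linalg.solve(X.T @ Z @ ZZinv @ Z.T @ X,
                        X.T @ Z @ ZZinv @ Z.T @ y)

# 2SLS via the projection matrix
Pz = Z @ ZZinv @ Z.T
b_2sls = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)

print(np.allclose(b_gls, b_2sls))   # True
```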

3.5 Asymptotic Distribution of 2SLS Estimator

We discuss the asymptotic distribution of the 2SLS estimator in the following question (Final Review: Question 8), so
let me omit the details in this subsection.

4 Final Review: Question 8 - 2SLS Estimator and Its Asymptotic Distribution

Consider the linear model

$y_i = x_i'\beta + u_i$

where

$E[u_i \mid x_i] \neq 0$

for $i = 1, \ldots, n$, where $x_i$ is a K×1 vector of regressors. Suppose that there exists a vector of random variables $z_i$ such
that

$E[u_i \mid z_i] = 0,$

where $z_i$ is an M×1 vector with M > K.


(1) Show that a least square regression of yi on xi will yield an inconsistent estimator for
Answer:
We have
^

OLS

(X 0 X)

+
p

+ Exi ;ui [xi x0i ]

=
6=

N
1 P
xi x0i
N i=1

+ Exi [xi x0i ]

N
1 P
xi x0i
N i=1

X 0Y =

N
1 P
xi ui
N i=1

Exi ;ui [xi ui ]


2

N
1 P
xi x0i
N i=1

N
1 P
xi (x0i + ui )
N i=1

(by WLLN, Slutzky, and Continuity Theorem)


3

7
6
Exi 4xi E ui j [ ui j xi ]5
{z
}
|
6=01

N
1 P
xi yi =
N i=1

Therefore, OSL estimator is inconsistent.


(2) Suggest an instrumental variable estimator for $\beta$ using the entire vector of instruments $z_i$.
Answer:
Since we have the condition M > K, we suggest the 2SLS estimator:

$\hat{\beta}_{2SLS} = (X' P_Z X)^{-1} X' P_Z Y = \left(X'Z(Z'Z)^{-1} Z'X\right)^{-1} X'Z(Z'Z)^{-1} Z'Y.$

(3) Show that the estimator suggested in (2) can be viewed as a GMM estimator.
Answer:
We will discuss this question when we study GMM.
(4) Using the fact established in (3), provide the asymptotic distribution of the estimator for $\beta$.
Answer:
Actually, we do not need to answer (3) to solve this question. Arranging the 2SLS estimator,

$\hat{\beta}_{2SLS} = \beta + \left(\frac{1}{n} X'Z \left(\frac{1}{n} Z'Z\right)^{-1} \frac{1}{n} Z'X\right)^{-1} \frac{1}{n} X'Z \left(\frac{1}{n} Z'Z\right)^{-1} \frac{1}{n} Z'U \qquad (\tfrac{1}{n}\text{'s are created}).$

Moving $\beta$ from the RHS to the LHS and multiplying by $\sqrt{n}$,

$\sqrt{n}\left(\hat{\beta}_{2SLS} - \beta\right) = \left(\frac{1}{n} X'Z \left(\frac{1}{n} Z'Z\right)^{-1} \frac{1}{n} Z'X\right)^{-1} \frac{1}{n} X'Z \left(\frac{1}{n} Z'Z\right)^{-1} \frac{1}{\sqrt{n}} Z'U.$

Now, by the WLLN we have

$\frac{1}{n} X'Z = \frac{1}{n}\sum_{i=1}^{n} x_i z_i' \overset{p}{\to} E[x_i z_i'] \qquad \text{and} \qquad \frac{1}{n} Z'Z = \frac{1}{n}\sum_{i=1}^{n} z_i z_i' \overset{p}{\to} E[z_i z_i'],$

and by the CLT we have

$\frac{1}{\sqrt{n}} Z'U = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} z_i u_i \overset{d}{\to} N\left(0_{M \times 1},\, Var[z_i u_i]\right) = N\left(0_{M \times 1},\, E[u_i^2 z_i z_i']\right).$

Therefore, the limiting distribution is

$\sqrt{n}\left(\hat{\beta}_{2SLS} - \beta\right) = \left(\frac{1}{n} X'Z \left(\frac{1}{n} Z'Z\right)^{-1} \frac{1}{n} Z'X\right)^{-1} \frac{1}{n} X'Z \left(\frac{1}{n} Z'Z\right)^{-1} \underbrace{\frac{1}{\sqrt{n}} Z'U}_{\overset{d}{\to}\, N(0_{M \times 1},\, E[u_i^2 z_i z_i'])} \overset{d}{\to} N\left(0_{K \times 1},\, A\right),$

where $\frac{1}{n} X'Z \overset{p}{\to} E[x_i z_i']$, $\frac{1}{n} Z'Z \overset{p}{\to} E[z_i z_i']$, and $\frac{1}{n} Z'X \overset{p}{\to} E[x_i z_i']'$,

and A is defined as

$A = \left(E[x_i z_i'] (E[z_i z_i'])^{-1} E[x_i z_i']'\right)^{-1} E[x_i z_i'] (E[z_i z_i'])^{-1}\, E[u_i^2 z_i z_i']\, (E[z_i z_i'])^{-1} E[x_i z_i']' \left(E[x_i z_i'] (E[z_i z_i'])^{-1} E[x_i z_i']'\right)^{-1}.$

In the special case of homoskedastic errors, $E_{u_i|z_i}[u_i^2 \mid z_i] = \sigma_u^2$, we have

$E[u_i^2 z_i z_i'] = E_{z_i}\left[E_{u_i|z_i}[u_i^2 z_i z_i' \mid z_i]\right] = E_{z_i}\Big[\underbrace{E_{u_i|z_i}[u_i^2 \mid z_i]}_{= \sigma_u^2}\, z_i z_i'\Big] = \sigma_u^2\, E_{z_i}[z_i z_i'].$

Then, the variance part collapses to

$B = \sigma_u^2 \left(E[x_i z_i'] (E[z_i z_i'])^{-1} E[x_i z_i']'\right)^{-1}.$

Therefore, the asymptotic distribution of $\hat{\beta}_{2SLS}$ is

$\sqrt{n}\left(\hat{\beta}_{2SLS} - \beta\right) \overset{d}{\to} N(0_{K \times 1},\, A)$ in the case of heteroskedastic errors,
$\sqrt{n}\left(\hat{\beta}_{2SLS} - \beta\right) \overset{d}{\to} N(0_{K \times 1},\, B)$ in the case of homoskedastic errors.

Or equivalently,

$\hat{\beta}_{2SLS} \overset{a}{\sim} N\left(\beta,\, \tfrac{1}{n} A\right)$ in the case of heteroskedastic errors,
$\hat{\beta}_{2SLS} \overset{a}{\sim} N\left(\beta,\, \tfrac{1}{n} B\right)$ in the case of homoskedastic errors.

(5) Provide a consistent estimator for the asymptotic covariance matrix established in (4). Justify your answer.
Answer:
By the WLLN we have

$\frac{1}{n}\sum_{i=1}^{n} x_i z_i' \overset{p}{\to} E[x_i z_i'], \qquad \frac{1}{n}\sum_{i=1}^{n} z_i z_i' \overset{p}{\to} E[z_i z_i'], \qquad \frac{1}{n}\sum_{i=1}^{n} \hat{u}_i^2 z_i z_i' \overset{p}{\to} E[u_i^2 z_i z_i'],$

where

$\hat{u}_i = y_i - x_i' \hat{\beta}_{2SLS}.$

Therefore, the consistent estimator for the asymptotic variance is (writing $S_{xz} = \frac{1}{n}\sum_{i=1}^{n} x_i z_i'$ and $S_{zz} = \frac{1}{n}\sum_{i=1}^{n} z_i z_i'$)

$\hat{A} = \left(S_{xz} S_{zz}^{-1} S_{xz}'\right)^{-1} S_{xz} S_{zz}^{-1} \left(\frac{1}{n}\sum_{i=1}^{n} \hat{u}_i^2 z_i z_i'\right) S_{zz}^{-1} S_{xz}' \left(S_{xz} S_{zz}^{-1} S_{xz}'\right)^{-1}.$

In the special case of homoskedastic errors, the asymptotic variance estimator collapses to

$\hat{B} = \hat{\sigma}_u^2 \left(\left(\frac{1}{n}\sum_{i=1}^{n} x_i z_i'\right) \left(\frac{1}{n}\sum_{i=1}^{n} z_i z_i'\right)^{-1} \left(\frac{1}{n}\sum_{i=1}^{n} x_i z_i'\right)'\right)^{-1}$

where

$\hat{\sigma}_u^2 = \frac{1}{n}\sum_{i=1}^{n} \hat{u}_i^2 = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - x_i' \hat{\beta}_{2SLS}\right)^2.$

Consistency is established by the LLN, Slutsky, and continuity theorems.
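The robust estimator $\hat{A}$ and its homoskedastic counterpart $\hat{B}$ can be sketched as follows on simulated (hypothetical) data; `Sxz` and `Szz_inv` are the sample moments $\frac{1}{n}\sum x_i z_i'$ and $(\frac{1}{n}\sum z_i z_i')^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
z = rng.normal(size=(n, 3))
z[:, 0] = 1.0                                     # constant as an instrument
v = rng.normal(size=n)
xe = z[:, 1] + z[:, 2] + v + rng.normal(size=n)   # endogenous regressor
u = v + rng.normal(size=n)
y = 1.0 + 2.0 * xe + u
X = np.column_stack([np.ones(n), xe])

Sxz = X.T @ z / n                                 # 1/n sum x_i z_i'
Szz_inv = np.linalg.inv(z.T @ z / n)              # (1/n sum z_i z_i')^{-1}
H_inv = np.linalg.inv(Sxz @ Szz_inv @ Sxz.T)

b = H_inv @ Sxz @ Szz_inv @ (z.T @ y / n)         # 2SLS estimate
uhat = y - X @ b

S_uzz = (z * uhat[:, None] ** 2).T @ z / n        # 1/n sum uhat_i^2 z_i z_i'
A_hat = H_inv @ Sxz @ Szz_inv @ S_uzz @ Szz_inv @ Sxz.T @ H_inv
B_hat = (uhat @ uhat / n) * H_inv                 # homoskedastic version

print(np.sqrt(np.diag(A_hat / n)))                # robust standard errors
```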

5 Review of Partial (Residual) Regression

In this section, we review partial regression, which you learned in Prof. Kyriazidou's Note 2. We need to utilize
partial regression for solving Comp questions (Comp 2003S Part III) in the next two sections. For a formal derivation of
partial regression, please refer to the appendix of Prof. Kyriazidou's Note #2. Here, we just review the formulas and
discuss interpretations.
If the regressor X is partitioned into two groups $X_1$ and $X_2$,

$Y = X_1 \beta_1 + X_2 \beta_2 + u,$

the OLS estimators of $\beta_1$ and $\beta_2$ are given by

$\hat{\beta}_1 = (X_1' M_{X_2} X_1)^{-1} X_1' M_{X_2} Y, \qquad \hat{\beta}_2 = (X_2' M_{X_1} X_2)^{-1} X_2' M_{X_1} Y,$

where $M_{X_1}$ and $M_{X_2}$ are residual operators:

$\underbrace{M_{X_1}}_{N \times N} = I_N - X_1 (X_1' X_1)^{-1} X_1', \qquad \underbrace{M_{X_2}}_{N \times N} = I_N - X_2 (X_2' X_2)^{-1} X_2'.$

Notice that $M_{X_1}$ and $M_{X_2}$ are idempotent matrices:

$M_{X_1} M_{X_1} = M_{X_1}, \qquad M_{X_2} M_{X_2} = M_{X_2}.$

Intuitively, the residual operator $M_{X_1}$ extracts the components that $X_1$ cannot explain. Similarly, the residual operator
$M_{X_2}$ extracts the components that $X_2$ cannot explain.
Denote the residuals by

$\tilde{X}_1 = M_{X_2} X_1, \qquad \tilde{X}_2 = M_{X_1} X_2.$

The OLS estimators can be written as (by using the idempotent property of $M_{X_1}$ and $M_{X_2}$)

$\hat{\beta}_1 = (\tilde{X}_1' \tilde{X}_1)^{-1} \tilde{X}_1' Y, \qquad \hat{\beta}_2 = (\tilde{X}_2' \tilde{X}_2)^{-1} \tilde{X}_2' Y.$

Here, for $\hat{\beta}_1$, we first regress $X_1$ on $X_2$ and obtain the residual $\tilde{X}_1 = M_{X_2} X_1$ (extracting the components that $X_2$ cannot
explain); then we regress Y on $\tilde{X}_1$. Similarly, for $\hat{\beta}_2$, we first regress $X_2$ on $X_1$ and obtain the residual $\tilde{X}_2 = M_{X_1} X_2$
(extracting the components that $X_1$ cannot explain); then we regress Y on $\tilde{X}_2$.
Alternatively, the OLS estimators can be written as (again, by the idempotent property of $M_{X_1}$ and $M_{X_2}$)

$\hat{\beta}_1 = (\tilde{X}_1' \tilde{X}_1)^{-1} \tilde{X}_1' \tilde{Y}, \qquad \hat{\beta}_2 = (\tilde{X}_2' \tilde{X}_2)^{-1} \tilde{X}_2' \tilde{Y}.$

6 Comp 2003S Part III (Buchinsky): Question 1

Consider the Neo-Classical regression model

$y_i = \beta' x_i + \gamma' w_i + u_i \qquad (i = 1, \ldots, n)$

where $\beta$ is a k×1 vector of parameters and $\gamma$ is a p×1 vector of parameters. Also, for $x_i$ we have

$E[x_i u_i] = 0$

and for $w_i$ we have

$E[w_i u_i] \neq 0.$

(a) Can the coefficient vector $\beta$ be consistently estimated by a least-square regression? Demonstrate your answer as
precisely as possible.
Answer:
The matrix notation of the model is

$\underbrace{Y}_{n \times 1} = \underbrace{X}_{n \times k} \underbrace{\beta}_{k \times 1} + \underbrace{W}_{n \times p} \underbrace{\gamma}_{p \times 1} + \underbrace{U}_{n \times 1}, \qquad \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1' \\ \vdots \\ x_n' \end{bmatrix} \beta + \begin{bmatrix} w_1' \\ \vdots \\ w_n' \end{bmatrix} \gamma + \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}.$

Consider the partial OLS estimator

$\hat{\beta}_{OLS} = (X' M_W X)^{-1} X' M_W Y, \qquad \text{where } M_W = I_n - W(W'W)^{-1} W'.$

Checking the consistency of $\hat{\beta}_{OLS}$:

$\hat{\beta}_{OLS} = (X' M_W X)^{-1} X' M_W Y = (X' M_W X)^{-1} X' M_W (X\beta + W\gamma + U) \qquad \text{(substituting } Y = X\beta + W\gamma + U)$
$\quad = \beta + (X' M_W X)^{-1} \underbrace{X' M_W W}_{= O} \gamma + (X' M_W X)^{-1} X' M_W U \qquad \text{(since } M_W W = W - W(W'W)^{-1} W'W = W - W = O_{n \times p})$
$\quad = \beta + \left(\frac{1}{n} X' M_W X\right)^{-1} \underbrace{\frac{1}{n} X' M_W U}_{\text{discussed below}}.$

Notice that

$\frac{1}{n} X' M_W U = \frac{1}{n} X' \left(I_n - W(W'W)^{-1} W'\right) U = \frac{1}{n} X'U - \frac{1}{n} X'W \left(\frac{1}{n} W'W\right)^{-1} \frac{1}{n} W'U \qquad (\tfrac{1}{n}\text{'s are created})$
$\quad = \frac{1}{n}\sum_{i=1}^{n} x_i u_i - \left(\frac{1}{n}\sum_{i=1}^{n} x_i w_i'\right) \left(\frac{1}{n}\sum_{i=1}^{n} w_i w_i'\right)^{-1} \frac{1}{n}\sum_{i=1}^{n} w_i u_i$
$\quad \overset{p}{\to} \underbrace{E[x_i u_i]}_{= 0_{k \times 1}} - E[x_i w_i']\, E[w_i w_i']^{-1} \underbrace{E[w_i u_i]}_{\neq 0_{p \times 1}} \neq 0_{k \times 1}.$

Therefore,

$\hat{\beta}_{OLS} \overset{p}{\nrightarrow} \beta,$

i.e. the OLS estimator is inconsistent.
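A simulation of part (a) (all numbers hypothetical): $x_i$ satisfies $E[x_i u_i] = 0$, but because $x_i$ is correlated with the endogenous $w_i$, partialling out w still leaves the estimate of $\beta$ inconsistent. With the moments chosen below, the probability limit works out to $\beta - 1/3$.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400_000
beta, gamma = 2.0, 1.0

c = rng.normal(size=n)              # common factor linking x and w
d = rng.normal(size=n)              # endogenous part of w
x = c + rng.normal(size=n)          # E[x_i u_i] = 0
w = c + d                           # E[w_i u_i] = Var(d) = 1 != 0
u = d + rng.normal(size=n)
y = beta * x + gamma * w + u

# partial out w, then regress: beta_hat = (xt'xt)^{-1} xt'y with xt = M_W x
coef = (w @ x) / (w @ w)            # OLS of x on w
xt = x - coef * w                   # residual
b_partial = (xt @ y) / (xt @ xt)

# plim here is beta - 1/3 (from E[xt*u] = -1/2 and E[xt^2] = 3/2), not beta
print(b_partial)
```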


(b) Suppose that $Cov[x_i, w_i'] = 0$ and $X'W = 0$, where $X = (x_1, \ldots, x_n)'$ and $W = (w_1, \ldots, w_n)'$. Suppose also that
the vector $z_i = (z_{1i}, \ldots, z_{li})'$ (with l > p) is a proper instrument for $w_i = (w_{1i}, \ldots, w_{pi})'$, and let $Z = (z_1, \ldots, z_n)'$. None
of the elements in $z_i$ is equal to any of the elements in $x_i$. Compute the instrumental variable estimator for $\gamma$ in the
regression that includes both x and w.
Answer:
Transforming the model by multiplying by $P_Z$ from the left and exogenizing the model,

$P_Z Y = P_Z X \beta + P_Z W \gamma + P_Z U$
$\hat{Y} = \hat{X} \beta + \hat{W} \gamma + \hat{U}, \qquad (4)$

where

$\underbrace{P_Z}_{n \times n} = Z(Z'Z)^{-1} Z', \qquad \hat{Y} = P_Z Y, \quad \hat{X} = P_Z X, \quad \hat{W} = P_Z W, \quad \hat{U} = P_Z U.$

Note that in equation (4), W is endogenous, i.e. correlated with U; therefore we need to exogenize the model. Then
we can apply partial regression to equation (4).
The OLS partial regression estimator of $\gamma$ is (call it exogenized OLS)

$\hat{\gamma}_{ExogenizedOLS} = \left(\hat{W}' M_{\hat{X}} \hat{W}\right)^{-1} \hat{W}' M_{\hat{X}} \hat{Y}.$
Checking the consistency of this estimator:

$\hat{\gamma}_{ExogenizedOLS} = \left(\hat{W}' M_{\hat{X}} \hat{W}\right)^{-1} \hat{W}' M_{\hat{X}} \hat{Y} = \left(\hat{W}' M_{\hat{X}} \hat{W}\right)^{-1} \hat{W}' M_{\hat{X}} \left(\hat{X}\beta + \hat{W}\gamma + \hat{U}\right)$
$\quad = \gamma + \left(\hat{W}' M_{\hat{X}} \hat{W}\right)^{-1} \underbrace{\hat{W}' M_{\hat{X}} \hat{U}}_{\text{discussed below}} \qquad \left(\text{since } M_{\hat{X}} \hat{X} = O_{n \times k}\right).$

Discussing the second term of the above equation,

$\frac{1}{n}\hat{W}' M_{\hat{X}} \hat{U} = \frac{1}{n}(P_Z W)' \left(I_n - \hat{X}(\hat{X}'\hat{X})^{-1} \hat{X}'\right) P_Z U.$

Every piece of this expression involves

$\frac{1}{n} Z'U \overset{p}{\to} E[z_i u_i] = 0_{l \times 1} \qquad \text{(by WLLN)},$

so $\frac{1}{n}\hat{W}' M_{\hat{X}} \hat{U} \overset{p}{\to} 0_{p \times 1}$. Therefore, by the WLLN, Slutsky, and continuity theorems,

$\hat{\gamma}_{ExogenizedOLS} \overset{p}{\to} \gamma,$

i.e. $\hat{\gamma}_{ExogenizedOLS}$ is consistent.
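A sketch of the exogenized-OLS estimator of part (b) on simulated data (hypothetical numbers: one endogenous w with two instruments, so l = 2 > p = 1): project everything on Z, partial out $\hat{X}$, and regress.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 2_000
beta, gamma = 2.0, -1.0

x = rng.normal(size=n)                    # exogenous regressor
z = rng.normal(size=(n, 2))               # instruments for w
d = rng.normal(size=n)
w = z @ np.array([1.0, 0.5]) + d          # endogenous: shares d with u
u = d + rng.normal(size=n)
y = beta * x + gamma * w + u

X, W = x[:, None], w[:, None]
Pz = z @ np.linalg.solve(z.T @ z, z.T)    # projection matrix P_Z
Xh, Wh, Yh = Pz @ X, Pz @ W, Pz @ y       # exogenized variables
Mxh = np.eye(n) - Xh @ np.linalg.solve(Xh.T @ Xh, Xh.T)

g_hat = np.linalg.solve(Wh.T @ Mxh @ Wh, Wh.T @ Mxh @ Yh)
print(g_hat[0])   # close to gamma = -1.0
```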
(c) Under the conditions in (b), consider the following estimation procedure: (i) estimate $\beta$ from a regression of Y on
X; and (ii) compute $\tilde{Y} = M_X Y$ (where $M_X = I - X(X'X)^{-1}X'$) and estimate $\gamma$ by computing the instrumental variable
estimator from a regression of $\tilde{Y}$ on w, using z as an instrumental variable for w.
Answer:
Following the suggestions given in the question:
We regress Y on X and obtain the residual

$\tilde{Y} = M_X Y.$

Next, we project W on the instrument matrix Z and obtain the projected matrix

$\hat{W} = P_Z W.$

Then, regressing $\tilde{Y}$ on $\hat{W}$, we derive the alternative estimator

$\hat{\gamma}_{Alternative} = \left(\hat{W}'\hat{W}\right)^{-1} \hat{W}' \tilde{Y}.$
(d) Compare the estimators for $\gamma$ from (b) and (c). Explain the difference and/or the similarity.
Answer:
In (b) and (c), we have derived the two estimators

$\hat{\gamma}_{ExogenizedOLS} = \left(\hat{W}' M_{\hat{X}} \hat{W}\right)^{-1} \hat{W}' M_{\hat{X}} \hat{Y} = \left(\frac{1}{n} \hat{W}' M_{\hat{X}} \hat{W}\right)^{-1} \frac{1}{n} \hat{W}' M_{\hat{X}} \hat{Y}$

$\hat{\gamma}_{Alternative} = \left(\hat{W}'\hat{W}\right)^{-1} \hat{W}' \tilde{Y} = \left(\frac{1}{n} \hat{W}'\hat{W}\right)^{-1} \frac{1}{n} \hat{W}' \tilde{Y}.$

Are these estimators equivalent?

The answer is yes, but we need a very strong assumption, the orthogonality condition between $x_i$ and $z_i$, i.e.
$E[x_i z_i'] = O_{k \times l}.$
Intuitively, we multiply $M_X$ into the exogenized equation:

$P_Z Y = P_Z X \beta + P_Z W \gamma + P_Z U$
$M_X P_Z Y = M_X P_Z X \beta + M_X P_Z W \gamma + M_X P_Z U.$

Since $M_X$ and $P_Z$ are symmetric matrices and (given the orthogonality of X and Z shown below) their product is also
symmetric,

$M_X P_Z = (M_X P_Z)' = P_Z' M_X' = P_Z M_X,$

we can transform the equation into

$P_Z M_X Y = \underbrace{P_Z M_X X}_{= O_{n \times k}} \beta + M_X P_Z W \gamma + M_X P_Z U$
$P_Z \tilde{Y} = M_X P_Z W \gamma + M_X P_Z U.$

Also, since X and Z are orthogonal, we have⁴

$M_X P_Z = \left(I_n - X(X'X)^{-1} X'\right) Z(Z'Z)^{-1} Z' = Z(Z'Z)^{-1} Z' - X(X'X)^{-1} \underbrace{X'Z}_{= O_{k \times l}} (Z'Z)^{-1} Z' = Z(Z'Z)^{-1} Z' = P_Z.$

Thus, the equation becomes

$P_Z \tilde{Y} = P_Z W \gamma + P_Z U.$

We obtain the OLS estimator of this equation:

$\hat{\gamma}_{OLS} = \left((P_Z W)' P_Z W\right)^{-1} (P_Z W)' P_Z \tilde{Y} = \left(\hat{W}'\hat{W}\right)^{-1} \hat{W}' \tilde{Y} = \hat{\gamma}_{Alternative}.$

⁴Here, my discussion is very sloppy. Formally, we need $\frac{1}{n}\sum_{i=1}^{n} x_i z_i' \overset{p}{\to} E[x_i z_i'] = O_{k \times l}.$
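A numerical illustration of the key identity behind part (d) (hypothetical data): z is residualized on x so that $X'Z = 0$ holds exactly in the sample, which makes $M_X P_Z = P_Z$; then regressing $P_Z y$ on $\hat{W}$ and regressing $\tilde{Y} = M_X y$ on $\hat{W}$ give the same estimate up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 1_500
x = rng.normal(size=n)
X = x[:, None]
z_raw = rng.normal(size=(n, 2))
# enforce exact in-sample orthogonality X'Z = 0 by residualizing z on x
Z = z_raw - X @ np.linalg.solve(X.T @ X, X.T @ z_raw)

d = rng.normal(size=n)
w = Z @ np.array([1.0, 0.5]) + d        # endogenous regressor
u = d + rng.normal(size=n)
y = 2.0 * x - 1.0 * w + u
W = w[:, None]

Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)              # projection on Z
Mx = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # residual maker of X
Wh = Pz @ W                                          # What = Pz W

g_a = np.linalg.solve(Wh.T @ Wh, Wh.T @ (Pz @ y))   # OLS on exogenized eq.
g_b = np.linalg.solve(Wh.T @ Wh, Wh.T @ (Mx @ y))   # alternative (uses Ytilde)
print(np.allclose(g_a, g_b))                         # True
```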