
Split-Sample Instrumental Variables Estimates of the Return to Schooling
Author(s): Joshua D. Angrist and Alan B. Krueger
Source: Journal of Business & Economic Statistics, Vol. 13, No. 2, JBES Symposium on Program and Policy Evaluation (Apr., 1995), pp. 225-235
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/1392377
Accessed: 24/09/2009 16:43

© 1995 American Statistical Association
Journal of Business & Economic Statistics, April 1995, Vol. 13, No. 2

Joshua D. ANGRIST
Department of Economics, Hebrew University, Mount Scopus, Jerusalem 91905, Israel

Alan B. KRUEGER
Department of Economics, Princeton University, Princeton, NJ 08544
This article reevaluates recent instrumental variables (IV) estimates of the returns to schooling in light of the fact that two-stage least squares is biased in the same direction as ordinary least squares (OLS) even in very large samples. We propose a split-sample instrumental variables (SSIV) estimator that is not biased toward OLS. SSIV uses one-half of a sample to estimate parameters of the first-stage equation. Estimated first-stage parameters are then used to construct fitted values and second-stage parameter estimates in the other half sample. SSIV is biased toward 0, but this bias can be corrected. The split-sample estimators confirm and reinforce some previous findings on the returns to schooling but fail to confirm others.

KEY WORDS: Finite-sample bias; Human capital and wages; Two-stage least squares.

There has been longstanding interest in the finite-sample properties of instrumental variables (IV) estimators. In an influential early article, Nagar (1959) used an approximation argument to show that two-stage least squares (2SLS) estimates are biased toward the probability limit of ordinary least squares (OLS) estimates in finite samples with normal disturbances. Buse (1992) generalized this result to cases with nonnormal disturbances. Other things equal, the bias of 2SLS is greater if the excluded instruments explain a smaller share of the variation in the endogenous variable. Nelson and Startz (1990) and Maddala and Jeong (1992) showed that, in samples of the size typically used in time series analyses, IV estimates and their t ratios have highly nonnormal distributions if the first-stage R-square is low and the correlation between reduced-form and structural errors is large. Recently, Bound, Jaeger, and Baker (in press) (henceforth BJB), Staiger and Stock (1994), and Bekker (1994) argued that finite-sample bias may also be a problem in cross-sectional studies that use very large samples with many excluded instruments.
BJB and Staiger and Stock (1994) presented replication studies of Angrist and Krueger (1991) (henceforth AK-91), who reported the results of using quarter of birth to construct instruments for years of schooling in log-wage equations estimated with Census data. Quarter of birth is correlated with schooling because of a mechanical interaction between compulsory attendance laws and age at school entry. Both BJB and Staiger and Stock explored the possibility that the empirical finding that IV estimates of schooling coefficients are similar to OLS estimates is largely attributable to bias in the IV estimates.

In a second application (Angrist and Krueger 1992a; henceforth AK-92), we used draft lottery numbers to construct IV estimates of the returns to schooling for men at risk of being drafted during the Vietnam era. The idea behind AK-92 was to exploit the possibility that draft avoidance via college deferment generated a relationship between randomly assigned draft-lottery numbers and the educational attainment of men at risk of being drafted. As in AK-91, IV estimates of schooling coefficients in AK-92 were also similar to the OLS estimates.

The case for finite-sample bias in these two applications begins by noting that the first-stage equations explain little of the variance of the endogenous regressor. To see how a weak first stage can lead to bias, we experimented with randomly drawn fictitious instruments, which naturally generate a very weak first-stage relationship (similar experiments were reported by BJB). It turns out that by drawing instruments from a uniform random-number generator it is possible to generate 2SLS estimates that are quite close to the OLS estimates arising from specifications reported by AK-92, as well as some of those reported by AK-91. More importantly, the reported second-stage standard errors give the impression of a "statistically significant" structural coefficient estimate using the conventional normal approximation to the sampling distribution of 2SLS estimates.
The possibility of such misleading inferences highlights the importance of developing IV estimators that are not biased toward OLS. In this article, we propose a new estimator that we call split-sample instrumental variables (SSIV). SSIV works by randomly splitting the sample in half and using one half of the sample to estimate parameters of the first-stage equation. These estimated first-stage parameters are then used to construct fitted values and second-stage parameter estimates from data in the other half of the sample. This estimator is a special case of the two-sample instrumental variables (TSIV) estimator presented by Angrist and Krueger (1992b). (Altonji and Segal [1994] also discussed

sample splitting to reduce the bias of generalized method of moments estimators.)

Unlike conventional IV estimates, SSIV estimates are biased toward 0 regardless of the degree of covariance between reduced-form and structural errors or the first-stage $R^2$. An unbiased estimate of the attenuation bias of SSIV is given by the coefficient from a regression of the endogenous regressor on its predicted value (using data from one half of the sample but first-stage parameters from the other). The estimator formed from the product of SSIV and the estimated inverse attenuation bias, called USSIV, is consistent as the number of instruments grows, holding the number of observations per instrument constant. Bekker (1994) showed that this sort of asymptotic argument gives a good account of the finite-sample properties of simultaneous-equations estimators, improving considerably on the conventional asymptotic approximation.

In Section 1, we review the literature on finite-sample bias in IV estimators. In Section 2, we develop the basic SSIV and USSIV approach. In Section 3, we discuss "group asymptotics" applied to SSIV and USSIV, as well as to 2SLS. In Section 4, we present a replication study of our two articles using 2SLS to estimate the returns to schooling. SSIV and USSIV coefficient estimates generated from the AK-91 data are similar to conventional IV estimates. A reexamination of the results of AK-92, however, strongly suggests that the 2SLS results reported in that article are almost solely attributable to the finite-sample bias of IV toward OLS.

1. SMALL-SAMPLE BIAS IN 2SLS ESTIMATES

Consider the following two-equation model with one endogenous regressor:

$$y_i = \beta_0' w_{0i} + \beta_1 s_i + \epsilon_i = \beta' x_i + \epsilon_i \tag{1}$$

and

$$x_i = \pi_0' w_{0i} + \pi_1' w_{1i} + \eta_i = \pi' z_i + \eta_i, \tag{2}$$

for $i = 1, \ldots, n$ observations, where $y_i$ is the dependent variable (e.g., log wages) and $s_i$ is the endogenous regressor (e.g., years of schooling). $z_i$ is a $(k + p) \times 1$ vector of instrumental variables that includes the $p$ exogenous variables appearing in Equation (1), $w_{0i}$, plus $k$ additional variables, $w_{1i}$ (e.g., quarter-of-birth dummies). Thus there are $k$ excluded instruments and $k - 1$ overidentifying restrictions. $x_i$ is a $(p + 1) \times 1$ vector that includes the exogenous regressors along with the endogenous regressor. The data are more compactly denoted by an $n \times 1$ vector $Y$, an $n \times (p + 1)$ matrix $X$, and an $n \times (k + p)$ matrix $Z$. From (1) and (2), we have $Y = X\beta + \epsilon$ and $X = Z\pi + \eta$. The coefficient $\beta_1$ is the scalar parameter of interest, assumed to be the last element in the $(p + 1) \times 1$ vector $\beta$, and $\pi$ is the $(k + p) \times (p + 1)$ matrix of reduced-form parameters. We assume that observations in the sample are iid and that the disturbances satisfy $E(\epsilon_i \mid z_i) = E(\eta_i \mid z_i) = 0$. The residual variance of $\epsilon_i$ is denoted $\sigma_\epsilon^2$. The residual vector in Equation (2), $\eta_i$, consists of $p$ zeros for the exogenous covariates, plus the last element corresponding to $s_i$. The variance of this element is $\sigma_\eta^2$, and its covariance with $\epsilon_i$ is $\sigma_{\epsilon\eta}$.

BJB's adaptation of the Buse (1992) approximate bias formula for a simple case with no exogenous regressors (where all variables have mean 0) is

$$\frac{\sigma_{\epsilon\eta}}{\sigma_\eta^2} \times \frac{\sigma_\eta^2}{\pi' Z'Z \pi}\,(k - 2). \tag{3}$$

BJB pointed out that $\sigma_\eta^2 k / (\pi' Z'Z \pi)$ is the inverse of the population analog of the F statistic for a test of $\pi = 0$ in the first-stage equation (i.e., substituting $\pi$ and $\sigma_\eta^2$ for OLS estimates in the usual F statistic formula) and that the approximate bias of IV estimates is proportional to the OLS bias, $\sigma_{\epsilon\eta}/\sigma_\eta^2$. It is also clear that a lower first-stage $R^2$, keeping constant the number of instruments, leads to more bias unless there is no need to instrument (i.e., $\sigma_{\epsilon\eta} = 0$).

A simple explanation for this sort of bias in IV estimates is that the estimated coefficients used to construct the first-stage fitted values are correlated with the structural-equation error. Let $P_Z = Z(Z'Z)^{-1}Z'$. The first-stage fitted values can then be written $P_Z X = Z\pi + P_Z \eta$. The average covariance between $\epsilon$ and the last column of $P_Z \eta$ is asymptotically negligible but has an expectation equal to $\sigma_{\epsilon\eta}[k + p]/n$ in any finite sample.

To see how serious this sort of bias could be, we experimented with the specification reported by AK-91 using 3 quarter-of-birth dummies × 10 year-of-birth dummies plus 3 quarter-of-birth dummies × 50 state-of-birth dummies to form a set of 180 excluded instruments. Conventional IV estimates from this specification (using a sample of over 329,000 observations) generate a schooling coefficient of .093 with a standard error of .009. OLS estimates of the same specification generate a schooling coefficient of .067 with a standard error of .0003. Replacing actual quarter of birth with a random draw from a four-point discrete uniform distribution and repeating the IV estimation generated a coefficient of .057 with a reported standard error of .014. Most researchers would probably believe (not knowing that the instruments were fictitious) that they had learned something about the returns to schooling from this estimate.
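The mechanism described in this section can be illustrated with a small simulation. The sketch below is not the Census experiment: it uses simulated data, a single endogenous regressor with no exogenous covariates, and instruments that are pure noise, so the reduced-form coefficients are exactly 0. The function name and all parameter values (`n`, `k`, `beta`, `rho`) are assumptions chosen for illustration.

```python
import numpy as np

def simulate_irrelevant_instruments(n=2000, k=20, beta=0.1, rho=0.5,
                                    reps=200, seed=0):
    """Average OLS and 2SLS estimates when the k instruments are pure noise."""
    rng = np.random.default_rng(seed)
    ols = np.empty(reps)
    tsls = np.empty(reps)
    for r in range(reps):
        eta = rng.standard_normal(n)                      # reduced-form error
        eps = rho * eta + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        s = eta                                           # pi = 0: no true first stage
        y = beta * s + eps                                # structural equation
        Z = rng.uniform(size=(n, k)) - 0.5                # fictitious mean-zero instruments
        ols[r] = (s @ y) / (s @ s)
        # First-stage fitted values P_Z s, then the second-stage slope.
        s_hat = Z @ np.linalg.lstsq(Z, s, rcond=None)[0]
        tsls[r] = (s_hat @ y) / (s_hat @ s)
    return ols.mean(), tsls.mean()

mean_ols, mean_2sls = simulate_irrelevant_instruments()
print(f"true beta = 0.1, mean OLS = {mean_ols:.3f}, mean 2SLS = {mean_2sls:.3f}")
```

With unit error variances, the OLS probability limit in this design is `beta + rho`, and the 2SLS estimates built from irrelevant instruments should be centered near the same value rather than near `beta`, which is the pattern the fictitious-instrument experiment above points to.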

2. SPLIT-SAMPLE INSTRUMENTAL VARIABLES (SSIV)


SSIV solves the problem of spurious inferences in IV estimation by breaking the link between $\epsilon$ and $\eta$ in Equations (1) and (2). The SSIV estimate is constructed by randomly dividing a single sample into two half samples, denoted 1 and 2. Each sample consists of data matrices $\{Y_j, X_j, Z_j\}$ for $j = 1, 2$. Sample 2 is used to estimate the first-stage equation. The first-stage parameters are then combined with observations on $Z_1$ to form fitted values for $X_1$ in sample 1. Finally, $Y_1$ is regressed on these fitted values and the exogenous regressors in sample 1. The estimator is

$$\hat\beta_s = [X_2'Z_2(Z_2'Z_2)^{-1}Z_1'Z_1(Z_2'Z_2)^{-1}Z_2'X_2]^{-1} \times [X_2'Z_2(Z_2'Z_2)^{-1}Z_1'Y_1] = [\hat X_{21}'\hat X_{21}]^{-1}[\hat X_{21}'Y_1], \tag{4}$$

where $\hat X_{21} = Z_1(Z_2'Z_2)^{-1}Z_2'X_2$ is the cross-sample fitted value. Note that the cross-sample fitted value for exogenous regressors is the value of the exogenous regressors in sample 1.

To develop the properties of SSIV, we begin with the observation that by virtue of independent sampling we have

Assumption 1. The data matrices $\{Y_1, X_1, Z_1\}$ and $\{Y_2, X_2, Z_2\}$ are jointly independent.

Assumption 1 implies that $\{Y_1, X_1\}$ is jointly independent of $\{Y_2, X_2, Z_2\}$ given $Z_1$. This is used to prove the following proposition.

Proposition 1. (a) Provided that the expectation exists, $E(\hat\beta_s) = E(\hat\theta)\beta \equiv \theta\beta$, where $\hat\theta$ is a $(p + 1) \times (p + 1)$ matrix,

$$\hat\theta = [X_2'Z_2(Z_2'Z_2)^{-1}Z_1'Z_1(Z_2'Z_2)^{-1}Z_2'X_2]^{-1} \times [X_2'Z_2(Z_2'Z_2)^{-1}Z_1'X_1]. \tag{5}$$

(b) Let $\hat s_{21}$ represent the cross-sample fitted value of the vector of $s_i$, and let $s_1$ represent the endogenous regressor in sample 1. The lower right element of $\hat\theta$ is equivalent to the coefficient on $\hat s_{21}$ from a regression of $s_1$ on $\hat s_{21}$ and all the exogenous regressors. This regression coefficient provides an unbiased estimate of the proportional bias in SSIV estimates of $\beta_1$.

Proof. Substitute $X_1\beta + \epsilon_1$ for $Y_1$ in (4). Then we can write

$$\hat\beta_s = \hat\theta\beta + [X_2'Z_2(Z_2'Z_2)^{-1}Z_1'Z_1(Z_2'Z_2)^{-1}Z_2'X_2]^{-1}[X_2'Z_2(Z_2'Z_2)^{-1}Z_1'\epsilon_1].$$

Iterating expectations over $Z_1$ and using Assumption 1, the expectation of the second term is

$$E\{[X_2'Z_2(Z_2'Z_2)^{-1}Z_1'Z_1(Z_2'Z_2)^{-1}Z_2'X_2]^{-1}[X_2'Z_2(Z_2'Z_2)^{-1}Z_1'] \times E[\epsilon_1 \mid Z_1]\},$$

which is 0 because $E[\epsilon_1 \mid Z_1] = 0$. Note that $\hat\theta$ is the matrix of coefficients from a regression of the columns of $X_1$ on $Z_1(Z_2'Z_2)^{-1}Z_2'X_2$. We can write $X_1 = [W_{01}\ s_1]$, and $Z_1(Z_2'Z_2)^{-1}Z_2'X_2 = [W_{01}\ \hat s_{21}]$. Regressing $[W_{01}\ s_1]$ on $[W_{01}\ \hat s_{21}]$ gives the matrix

$$\hat\theta = \begin{bmatrix} I_p & \hat\theta_p \\ 0 & \hat\theta_{p+1} \end{bmatrix},$$

where $I_p$ is a $p \times p$ identity matrix, $\hat\theta_p$ is $p \times 1$, and $\hat\theta_{p+1}$ is a scalar equal to the coefficient on $\hat s_{21}$ in a regression of $s_1$ on $\hat s_{21}$ and $W_{01}$.

Proposition 1 starts with the assumption that the expectation of $\hat\theta$ exists. We have not been able to provide general conditions for the existence of this expectation. Instead, we first consider a special case in which the expectation clearly exists. Then in Section 3 we use an improved asymptotic argument to make the same point in a more general way. The special case we consider is one in which $E[X_1 \mid \hat X_{21}]$ is linear. [This will be approximately true if $X$ and $Z$ are normally distributed and if the sampling variance of $(Z_2'Z_2)^{-1}Z_2'X_2$ is negligible.] We have the following corollary.

Corollary 1.1. Suppose that $E[X_1 \mid \hat X_{21}]$ is linear. Then

$$E[\hat\theta] = (E[\hat X_{21}'\hat X_{21}])^{-1}E[\hat X_{21}'X_1] \tag{5a}$$

$$= \{\pi'E(z_i z_i')\pi + c\sigma_\eta^2 L_1\}^{-1}\{\pi'E(z_i z_i')\pi\}, \tag{5b}$$

where $c \equiv \mathrm{tr}\{E[(Z_2'Z_2)^{-1}(Z_1'Z_1)]/n_1\}$ and $L_1$ is a square matrix, conformable with $\pi'E(z_i z_i')\pi$ and hence of dimension $(p + 1)$, consisting of all zeros except for a 1 in the lower right corner. If $(Z'Z)$ is the same in the two samples, then $c = (k + p)/n_1$. $L_1$ reflects the fact that $x_i$ includes only one endogenous regressor, $s_i$.

Proof. If $E[X_1 \mid \hat X_{21}]$ is linear, then $E[X_1 \mid \hat X_{21}]$ is $\hat X_{21}\{E[\hat X_{21}'\hat X_{21}]^{-1}E[\hat X_{21}'X_1]\}$. Since $\hat\theta = [\hat X_{21}'\hat X_{21}]^{-1}[\hat X_{21}'X_1]$, we can substitute for $X_1$ to show that $E[\hat\theta] = E[\hat X_{21}'\hat X_{21}/n_1]^{-1} \cdot E[\hat X_{21}'X_1/n_1]$. In the Appendix, the moments in the numerator and denominator are simplified to give (5b).

The corollary shows that $\theta$ represents a kind of attenuation bias arising from the use of reduced-form coefficients from a separate sample. If there are no exogenous variables, then the proportional bias of SSIV is between 0 and 1; that is, the SSIV coefficient will be biased toward 0 in absolute value. More generally, (5b) implies a matrix attenuation bias. As in multivariate measurement-error models with a single mismeasured regressor (e.g., Fuller 1975), matrix attenuation in this case implies attenuation of the coefficient on the single endogenous regressor, $\beta_1$. The Appendix shows that

$$E[\hat\beta_s] = \begin{bmatrix} \beta_0 + [c\sigma_\eta^2/(\phi + c\sigma_\eta^2)]P^{-1}R\beta_1 \\ [\phi/(\phi + c\sigma_\eta^2)]\beta_1 \end{bmatrix}, \tag{6}$$

where $P$ and $R$ are submatrices in a partitioned matrix and $\phi$ is a positive scalar so that $\phi/(\phi + c\sigma_\eta^2)$ is necessarily between 0 and 1 (because $c$ is also positive).

A consequence of Equation (6) is that, under the conditions of Corollary 1.1, the SSIV estimate is asymptotically unbiased as $n$ gets large with the number of instruments fixed (because $c$ then goes to 0). Another case of interest is when the vector of reduced-form coefficients, $\pi$, is near 0. In this case, it is apparent from Equation (5) that the SSIV estimate of $\beta_1$ has expectation near 0. Moreover, increasing the number of instruments with the explained sum of squares fixed also tends to pull SSIV estimates toward 0 (because $c$ then increases). This property contrasts sharply with the tendency of conventional IV estimates to be biased toward OLS.

2.1 Conventional Asymptotic Results for SSIV

SSIV is consistent because plim $\hat\theta$ equals the identity matrix. The following proposition characterizes the asymptotic distribution of $\hat\beta_s$.

Proposition 2. Define $g_n(\beta) \equiv [Z_1'Y_1/n_1 - (Z_2'X_2/n_2)\beta]$, where $n_1 = \alpha n_2$ for some positive number $\alpha$. Under standard conditions, $n_1^{1/2} g_n(\beta) \overset{a}{\sim} N(0, \Omega)$, where $\Omega$ is a $(p + k) \times (p + k)$ asymptotic covariance matrix. Then $n_1^{1/2}(\hat\beta_s - \beta) \overset{a}{\sim} N(0, \Gamma)$, where

$$\Gamma = (\Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx})^{-1}\,\Sigma_{xz}\Sigma_{zz}^{-1}\,\Omega\,\Sigma_{zz}^{-1}\Sigma_{zx}\,(\Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx})^{-1}$$

and $\Sigma_{zx}$ and $\Sigma_{zz}$ denote population average cross-product matrices.

Proof. This proposition is established by showing that SSIV is asymptotically equivalent to the general TSIV

estimator discussed by Angrist and Krueger (1992b) and then using the asymptotic covariance matrix given by them. The general TSIV estimator is

$$[(X_2'Z_2/n_2)\,\Psi\,(Z_2'X_2/n_2)]^{-1}[(X_2'Z_2/n_2)\,\Psi\,(Z_1'Y_1/n_1)],$$

where $\Psi$ is any positive-definite weighting matrix. To see that SSIV is asymptotically equivalent to TSIV, set $\Psi = (Z_2'Z_2)^{-1}(Z_1'Z_1)(Z_2'Z_2)^{-1}$. In general, setting $\Psi = \Omega^{-1}$ gives the optimal TSIV estimator. If $\Omega = \Sigma_{zz}\sigma_\epsilon^2$ (say because $\beta = 0$), then SSIV is the asymptotically efficient TSIV estimator constructed from $Z_1'Y_1/n_1$ and $Z_2'X_2/n_2$. In this case, the asymptotic covariance matrix of SSIV simplifies to $(\Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx})^{-1}\sigma_\epsilon^2$ under an $(n_1)^{1/2}$ normalization, which is the usual form of the 2SLS asymptotic covariance matrix. Since $n_1 = n/2$ in the SSIV case, however, the asymptotic covariance matrix of SSIV is at least twice the asymptotic covariance matrix of 2SLS. This is not surprising because SSIV uses half as much data as 2SLS to compute the second-moment matrices.

Finally, note that SSIV has a practical advantage over other TSIV estimators in that it is easy to compute using standard OLS regression software. Moreover, like TSIV, the SSIV estimate can be calculated using one sample with information on $Z$ and $Y$ but no information on $X$ and a second sample with information on $Z$ and $X$ but no information on $Y$. This property sometimes motivates the use of TSIV instead of 2SLS (e.g., Angrist 1990; Angrist and Krueger 1992b).

2.2 Unbiased Split-Sample Estimation

It seems reasonable to try to improve on SSIV by inflating $\hat\beta_s$ by the inverse of the estimated proportional bias, $\hat\theta$. The resulting estimator, $\hat\theta^{-1}\hat\beta_s$, is not unbiased, however, because it involves a nonlinear function of the (correlated) random variables $\hat\theta$ and $\hat\beta_s$. Nevertheless, the inflated estimator is unbiased under the group-asymptotic argument outlined later. We therefore label the inflated estimator unbiased split-sample instrumental variables (USSIV).

Recall that $\hat\theta = [\hat X_{21}'\hat X_{21}]^{-1}[\hat X_{21}'X_1]$. Then the USSIV estimator is $\hat\beta_u = \hat\theta^{-1}\hat\beta_s = [\hat X_{21}'X_1]^{-1}[\hat X_{21}'Y_1]$. Note that $\hat\beta_u$ can be constructed by using $\hat X_{21}$ as an instrument for $X_1$ in the regression $Y_1 = X_1\beta + \epsilon_1$. Using $\hat X_{21}$ as an instrument for $X_1$ instead of including it directly as a regressor eliminates the attenuation bias that arises from estimation of the first-stage reduced form. An important difference between USSIV and SSIV is that USSIV requires data on $X_1$ but SSIV does not. This means that USSIV cannot be used in applications such as that of Angrist and Krueger (1992b), where one sample includes only observations on $(Z_1, Y_1)$ and the other includes only observations on $(Z_2, X_2)$.

3. GROUP ASYMPTOTICS

In this section, we develop an asymptotic argument that appears to capture important features of the finite-sample behavior of $\hat\beta_s$ and $\hat\beta_u$. The group-asymptotics approach derives the limiting characteristics of these estimators as the number of instruments grows, but the number of observations per instrument is held fixed. In the context of AK-91, this can be thought of as obtaining additional instruments by adding new cross-sections for new years of data, or by adding additional cross-sections from new states, regions, or cohorts. This is the same type of argument used by Deaton (1985) in his study of panel data created from an asymptotically lengthening time series of cross-sections. The group-asymptotics approach is also similar to the parameter sequence used in Bekker's (1994) study of simultaneous-equations estimators. As noted by Bekker, the rationale for this approach is not really important; what matters is that it gives a good account of finite-sample properties.

Under group asymptotics, each cross-section replication provides $m$ additional observations. In particular, the $t$th cross-section replication is assumed to contain iid data matrices of length $m$ with observations on $\{Y_t, X_t, Z_t\}$ for $t = 1, \ldots, T$. We split these observations into data matrices for half-samples of size $m_1$ and $m_2$, denoted by $\{Y_{jt}, X_{jt}, Z_{jt}\}$, for $j = 1, 2$. An important feature of this replication sequence is that there is assumed to be a different matrix of reduced-form coefficients associated with each replication. In particular, we imagine that at each replication a reduced-form coefficient matrix, $\pi_t$, is also drawn. The $\pi_t$ are themselves iid random matrices satisfying $E[X_{jt} - Z_{jt}\pi_t \mid Z_{jt}] = 0$, with the conditional covariance matrix of each row of $X_{jt} - Z_{jt}\pi_t$ given $Z_{jt}$ having one nonzero element equal to $\sigma_\eta^2$. Each $\pi_t$ is independent of the data in each half sample.

The fact that $\pi_t$ varies with $t$ motivates the use of interaction terms in the instrument list. The matrix of fitted values is therefore

$$\hat X_{21} = [\hat X_{21,1}' \ \cdots \ \hat X_{21,T}']',$$

where $\hat X_{21,t} = Z_{1t}(Z_{2t}'Z_{2t})^{-1}Z_{2t}'X_{2t}$. Similarly, the data matrices from each replication are stacked:

$$X_j = [X_{j1}' \ \cdots \ X_{jT}']', \qquad Z_j = [Z_{j1}' \ \cdots \ Z_{jT}']'$$

for $j = 1, 2$.

Consider the SSIV estimator constructed by pooling all replications and allowing a separate reduced form for each replication. We define the group-asymptotic probability limit of $\hat\beta_s$ as the probability limit of this estimator when the number of groups ($T$) becomes infinite while the group size ($m$) is fixed. This probability limit turns out to be similar to the expectation derived in Proposition 1, in which a linear conditional expectation, $E[X_1 \mid \hat X_{21}]$, was assumed. Proposition 3 uses group asymptotics to compare the bias of SSIV and conventional 2SLS.

Proposition 3. (a) For SSIV, it is useful to write

$$\hat X_{21,t} = Z_{1t}\pi_t + Z_{1t}(Z_{2t}'Z_{2t})^{-1}Z_{2t}'\eta_{2t},$$

$$Y_{1t} = (Z_{1t}\pi_t + \eta_{1t})\beta + \epsilon_{1t}.$$

The group-asymptotic probability limit of SSIV is

$$\operatorname*{plim}_{T\to\infty}\hat\beta_s = \{E[\pi_t'(Z_{1t}'Z_{1t}/m_1)\pi_t] + c^*\sigma_\eta^2 L_1\}^{-1} \times E[\pi_t'(Z_{1t}'Z_{1t}/m_1)\pi_t]\beta,$$

where $c^* \equiv \mathrm{tr}\{E[(Z_{2t}'Z_{2t})^{-1}(Z_{1t}'Z_{1t})]/m_1\}$.

(b) For 2SLS, it is useful to write

$$\hat X_t = Z_t(Z_t'Z_t)^{-1}Z_t'X_t = Z_t\pi_t + Z_t(Z_t'Z_t)^{-1}Z_t'\eta_t,$$

$$Y_t = (Z_t\pi_t + \eta_t)\beta + \epsilon_t.$$

The group-asymptotic probability limit of 2SLS is

$$\operatorname*{plim}_{T\to\infty}\Big[(1/T)\sum_t \hat X_t'\hat X_t/m\Big]^{-1}\Big[(1/T)\sum_t \hat X_t'Y_t/m\Big] = \beta + [(k + p)/m]\{E[\pi_t'(Z_t'Z_t/m)\pi_t] + [(k + p)/m]\sigma_\eta^2 L_1\}^{-1} e_1\sigma_{\epsilon\eta},$$

where $e_1$ is a $(p + 1)$ column vector consisting of zeros in the first $p$ rows and a 1 in the last row.

Proof. For Part (a), the first step is to note that

$$\operatorname*{plim}_{T\to\infty}\Big[(1/T)\sum_t \hat X_{21,t}'\hat X_{21,t}/m_1\Big]^{-1}\Big[(1/T)\sum_t \hat X_{21,t}'Y_{1t}/m_1\Big] = E[\hat X_{21,t}'\hat X_{21,t}/m_1]^{-1} \cdot E[\hat X_{21,t}'Y_{1t}/m_1].$$

The proof is completed by using the definitions of $\hat X_{21,t}$ and $Y_{1t}$ and the independence assumption to evaluate population moments, as in the proof of Corollary 1.1 (given in the Appendix). As in Corollary 1.1, the matrix attenuation in Proposition 3 implies scalar attenuation of the coefficient $\beta_1$. Similarly, the proof of Part (b) begins with the observation that the desired probability limit is $E[\hat X_t'\hat X_t/m]^{-1} \cdot E[\hat X_t'Y_t/m]$. In this case, however, $\epsilon_t$ and $\eta_t$ are in the same sample and have covariance $\sigma_{\epsilon\eta}$ for the element of $\eta_t$ corresponding to $s_i$.

This proposition shows that neither SSIV nor 2SLS is consistent under group asymptotics, although the two estimators are biased in different ways. The group-asymptotic probability limit of SSIV is the same as that presented in Corollary 1.1 for a special case and reflects a bias toward 0. The group-asymptotic probability limit of 2SLS is the same as Bekker's (1994) formula for the bias of 2SLS. Like the Nagar (1959) and Buse (1992) approximation results, this formula reflects a bias toward OLS. (Because SSIV and 2SLS are not consistent under group asymptotics, the standard errors for the SSIV and 2SLS estimates reported in Section 4 are based on the usual asymptotic approximation.)

The group-asymptotic properties of USSIV are summarized in the following proposition.

Proposition 4. (a)

$$\operatorname*{plim}_{T\to\infty}\Big[(1/T)\sum_t \hat X_{21,t}'X_{1t}/m_1\Big]^{-1}\Big[(1/T)\sum_t \hat X_{21,t}'Y_{1t}/m_1\Big] = \beta.$$

(b) $T^{1/2}(\hat\beta_u - \beta) \overset{a}{\sim} N(0, \Lambda)$, where

$$\Lambda = (1/m_1)\,E[\hat X_{21,t}'X_{1t}/m_1]^{-1}\,E[\hat X_{21,t}'\hat X_{21,t}/m_1]\,E[X_{1t}'\hat X_{21,t}/m_1]^{-1}\sigma_\epsilon^2.$$

Proof. To prove (a), we only need to show that

$$\operatorname*{plim}_{T\to\infty}\Big[(1/T)\sum_t \hat X_{21,t}'X_{1t}/m_1\Big] = E[\pi_t'Z_{1t}'Z_{1t}\pi_t/m_1].$$

Writing $X_{1t} = Z_{1t}\pi_t + \eta_{1t}$ and using the definition of $\hat X_{21,t}$ gives $E[\hat X_{21,t}'X_{1t}/m_1] = E[\pi_t'Z_{1t}'Z_{1t}\pi_t/m_1]$, as in the Appendix proof of Corollary 1.1. To derive the variance formula in (b), substitute for $Y_1$ in $\hat\beta_u$:

$$\hat\beta_u = [\hat X_{21}'X_1]^{-1}[\hat X_{21}'Y_1] = \beta + [\hat X_{21}'X_1]^{-1}[\hat X_{21}'\epsilon_1], \tag{7}$$

so that

$$T^{1/2}(\hat\beta_u - \beta) = \Big[(1/T)\sum_t \hat X_{21,t}'X_{1t}/m_1\Big]^{-1} \cdot T^{1/2}\Big[(1/T)\sum_t \hat X_{21,t}'\epsilon_{1t}/m_1\Big] \overset{a}{\sim} E[\hat X_{21,t}'X_{1t}/m_1]^{-1} \cdot T^{1/2}\Big[(1/T)\sum_t \hat X_{21,t}'\epsilon_{1t}/m_1\Big].$$

Using the fact that $\epsilon_{1t}$ is mean independent of $\hat X_{21,t}$ with a scalar covariance matrix completes the proof.

Thus USSIV is consistent under group asymptotics. The USSIV coefficient estimates and group-asymptotic standard errors are also easy to compute. Note that $\hat\beta_u$ is a just-identified 2SLS estimator in sample 1, so it can be written

$$\hat\beta_u = [X_1'\hat X_{21}(\hat X_{21}'\hat X_{21})^{-1}\hat X_{21}'X_1]^{-1}X_1'\hat X_{21}(\hat X_{21}'\hat X_{21})^{-1}\hat X_{21}'Y_1.$$

The conventional 2SLS covariance matrix estimator for an estimate of this form is $[X_1'\hat X_{21}(\hat X_{21}'\hat X_{21})^{-1}\hat X_{21}'X_1]^{-1}\hat\sigma_\epsilon^2$. Software-reported 2SLS standard errors therefore provide a consistent estimate of the sampling variance of $\hat\beta_u$ under group asymptotics.

We also have the following corollary, which gives conventional asymptotic results for USSIV for the case in which $m_1 = m/2$.

Corollary 4.1. $([m/2]T)^{1/2}(\hat\beta_u - \beta)$ has the same conventional asymptotic covariance matrix (i.e., letting $m$ get large with fixed $T$) as 2SLS.

This can be proved directly or by taking the limit of $\Lambda$ as $m$ becomes infinite. An implication of the corollary is that the average of USSIV and its complement (reversing the roles of samples 1 and 2) has the same limiting distribution under conventional asymptotics as does 2SLS. Analysis of the group-asymptotic distribution of the combined estimator is more complicated, however, because the group-asymptotic covariance of the two possible USSIV estimators is not 0. We therefore leave an investigation of the combined USSIV estimator for future work.

4. IV ESTIMATES OF THE RETURN TO SCHOOLING

4.1 Angrist and Krueger (1991)

AK-91 argued that quarter of birth provides a legitimate instrumental variable for years of schooling because children born earlier in the year enter school at an older age and are therefore allowed to drop out (on their 16th or 17th birthday) after having completed less schooling than children born later in the year. In particular, men born in earlier quarters get less schooling, are less likely to graduate from high school, and earn less than men born in later quarters. These relationships are statistically significant in data on single-year birth cohorts from 1920-1959 and in both the 1970 and 1980 Census.

2SLS estimates of the return to education based on quarter-of-birth instruments are close to OLS estimates, suggesting that omitted variables do not bias the OLS estimates. Here we focus on estimates for men in their 40s (i.e., men born 1930-1939 in the 1980 Census and men born 1920-1929 in the 1970 Census) because the age-earnings profile is fairly flat for this age group. This minimizes potential problems due to correlation between age and quarter of birth.

The first two columns of Table 1 report OLS and 2SLS estimates of the education coefficient from log-wage equations estimated using the Census samples. The sample sizes range from close to 250,000 in the 1970 Census to close to 330,000 in the 1980 Census, and the specifications are the same as those reported by AK-91 (tables IV and V). The 2SLS model uses 30 quarter-of-birth × year-of-birth interaction terms as excluded instruments, including year-of-birth main effects as exogenous covariates.

As noted previously, OLS and 2SLS estimates are remarkably similar in both data sets. This naturally raises the question of whether such similarity is a real finding or a spurious result attributable to the finite-sample bias of IV. For comparison, SSIV and USSIV results are presented in columns 3 and 4. Each of the SSIV estimates is somewhat smaller than the corresponding IV estimate, as one would expect because SSIV is biased toward 0. The SSIV estimate is above the OLS estimate for the 1980 sample, whereas it is below it for the 1970 sample. But in each case the SSIV and OLS estimates are not statistically different. The SSIV estimates are significantly different from 0, with standard errors 50% larger than the standard errors of 2SLS estimates.

The proportional attenuation bias ($\theta$) of SSIV is estimated to be 78% in the 1980 sample and 93% in the 1970 sample, with a standard error of about 12% in each case. Column (4) reports USSIV estimates, which inflate the SSIV estimates by the inverse of $\hat\theta$. The USSIV estimates tend to be above the OLS estimates and are also remarkably close to 2SLS estimates.

Table 1. Quarter-of-Birth Estimates With 30 Instruments

                              Type of estimate
Parameter              OLS        2SLS       SSIV       USSIV
                       (1)        (2)        (3)        (4)

A. 1980 Census, men born 1930-1939
β                      .063       .081       .070       .089
                       (.0003)    (.016)     (.023)     (.030)
θ                      -          -          .780       -
                                             (.118)
First-stage F          -          4.75       2.41       2.41
(df = 30)

B. 1970 Census, men born 1920-1929
β                      .070       .069       .059       .063
                       (.0004)    (.015)     (.023)     (.024)
θ                      -          -          .934       -
                                             (.127)
First-stage F          -          4.54       2.03       2.03
(df = 30)

NOTE: Models include 9 year-of-birth dummies, marital status, region dummies, SMSA dummy, and a race dummy as exogenous regressors. Sample size for the 1980 sample for OLS and 2SLS is 329,509; for SSIV and USSIV, the first-stage equation was estimated with 164,474 observations and the second-stage with 165,035 observations. Sample size for the 1970 sample for OLS and 2SLS is 244,099; for SSIV and USSIV the first-stage equation was estimated with 121,956 observations and the second-stage with 122,143 observations.

The split-sample approach can also be used to produce a graphical impression of the SSIV slope estimate. To do this, we randomly split the sample in half and then graphed average earnings by quarter of birth in one sample against average education by quarter of birth in the other sample (after removing year effects). Figures 1 and 2 show these graphs for the 1970 and 1980 samples. The plots clearly show upward-sloping relationships. The slope of the regression line drawn in the figures can be shown to be an SSIV estimator for this example because $Z_1'Z_1$ is roughly proportional to $I_{k+p}$ in this case. For both the 1970 and 1980 data, the slope is roughly .069.

Figure 1. Split-Sample Graph of Average Wages by Quarter of Birth Against Average Schooling by Quarter of Birth in the 1970 Census. The scatter shows averages computed from two half samples, one for earnings and one for schooling, both drawn from the 1970 Census for men born 1920-1929. Points plotted in the figure are residuals from a regression on year-of-birth effects. The OLS regression line through the averages is also shown. [Figure not reproduced; horizontal axis: Education Residual.]

Figure 2. Split-Sample Graph of Average Wages by Quarter of Birth Against Average Schooling by Quarter of Birth in the 1980 Census. The scatter shows averages computed from two half samples, one for earnings and one for schooling, both drawn from the 1980 Census for men born 1930-1939. Points plotted in the figure are residuals from a regression on year-of-birth effects. The OLS regression line through the averages is also shown. [Figure not reproduced; horizontal axis: Education Residual.]

Table 2 reports a set of OLS, 2SLS, SSIV, and USSIV results for models estimated using 150 quarter-of-birth × state-of-birth interactions plus 30 quarter-of-birth × year-of-birth interactions as the excluded instruments, with data from the


Table 2. Quarter-of-Birth Estimates With 180 Instruments

                                  Type of estimate
Parameter                     OLS       2SLS      SSIV      USSIV
                              (1)       (2)       (3)       (4)

1980 Census, men born 1930-1939
  β                           .063      .083      .031      .076
                              (.0003)   (.009)    (.011)    (.028)
  θ                                               .408
                                                  (.057)
  First-stage F (df = 180)              2.43      1.70      1.70

NOTE: Models include 9 year-of-birth dummies, marital status, 48 state-of-birth dummies, region dummies, SMSA dummy, and a race dummy as exogenous regressors. Sample size for the 1980 sample for OLS and 2SLS is 329,509; for SSIV and USSIV the first-stage equation was estimated with 164,474 observations and the second stage with 165,035 observations.

Table 3. Quarter-of-Birth Estimates: Results of 31 Monte Carlo Replications of Split (180-instrument specification)

                                  Actual instruments      Random instruments
                                  SSIV      USSIV         SSIV      USSIV
Statistic                         (1)       (2)           (3)       (4)

Summary statistics for education coefficients
  Mean                            .048      .112          .002      .021
  Median                          .050      .114          .004      .034
  Standard deviation
    of coefficients               .010      .024          .014      .187
  25th percentile                 .042      .099          -.006     -.080
  75th percentile                 .055      .129          .014      .133

NOTE: Models include 9 year-of-birth dummies and 50 state-of-birth dummies as exogenous regressors. The conventional 2SLS estimate and standard error using random instruments is .057 (.014).
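As a rough arithmetic check on the tables, the reported USSIV coefficients can be compared with the SSIV coefficients inflated by the estimated attenuation factor θ. This is only a sketch: it assumes θ acts as a scalar on the schooling coefficient, so agreement is expected up to rounding and up to the influence of the other regressors.

```latex
\hat{\beta}_{\mathrm{USSIV}} \approx \frac{\hat{\beta}_{\mathrm{SSIV}}}{\hat{\theta}}
\qquad
\text{Table 2: } \frac{.031}{.408} \approx .076,
\qquad
\text{Table 1, panel B: } \frac{.059}{.934} \approx .063 .
```

Both ratios match the USSIV entries reported in column (4) of the respective tables.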

1980 Census sample. This model has a first-stage F statistic for the excluded instruments of 2.4 (compared to 4.8 in the 30-instrument model) and corresponds to the models reported in table VII of AK-91. BJB and Staiger and Stock (1994) argued that the low first-stage F statistic means that IV estimates of these models are likely to be seriously biased. (On the other hand, Hall, Rudebusch, and Wilcox [1994] noted that pretesting for instrument relevance using first-stage F tests or other criteria can exacerbate the poor finite-sample properties of 2SLS.) For the 180-excluded-instruments specification, the SSIV education coefficient is .031, about 40% as large as the IV estimate, though still significantly different from 0. The estimated proportional attenuation bias of SSIV, however, is also on the order of 40%. Consequently, the USSIV estimate is .076, only slightly less than the 2SLS estimate and above the OLS estimate.

Would SSIV and USSIV provide misleading results in the extreme case of fictitious, randomly assigned instruments? To investigate this, as well as the sensitivity of SSIV and USSIV estimates to alternative splits, we conducted a small-scale Monte Carlo exercise in which we randomly divided the sample and calculated SSIV and USSIV estimates 31 times. For each replication, we divided the data using a different (randomly generated) seed number. The specifications estimated here use the 180 quarter-of-birth interactions as excluded instruments, as in Table 2.

The Monte Carlo results for the actual instruments are reported in columns (1) and (2) of Table 3. The average SSIV estimate is .048, with a Monte Carlo standard deviation of .010 in 31 replications. This is somewhat higher than the SSIV estimate reported in Table 2 for a similar specification. (We omitted region dummies, marital status, the SMSA dummy, and the race dummy from the first and second stages of the models used for the Monte Carlo replications. The estimates in Table 3 and Table 2 are therefore not strictly comparable.) The median SSIV estimate is .05, with upper and lower quartiles of .055 and .042. The average estimate of the proportional attenuation bias in SSIV in these 31 replications (not shown in the table) is .433 with a Monte Carlo standard deviation of .05. The average USSIV estimate is .112 with a standard deviation of .024. Lower and upper quartiles for USSIV estimates are .099 and .129, giving an interquartile range of .03. This suggests that the SSIV estimates are less sensitive than USSIV estimates to the sample split.

Results of the same experiment using randomly assigned fictitious instruments are reported in columns (3) and (4) of Table 3. The SSIV coefficient estimates are centered on 0, with a Monte Carlo standard deviation of .014. The average estimate of θ in this experiment is .086, suggesting substantial bias downward, and this estimate is not significantly different from 0. An insignificant estimate of θ means that the researcher cannot reject the hypothesis that the true vector of reduced-form coefficients is 0. In that case, the SSIV estimate is 0 regardless of the correlation between ε and η in Equations (1) and (2), and the (group-asymptotic) moments of the USSIV coefficient do not exist.

The USSIV coefficient estimates with randomly generated instruments are highly variable, with a Monte Carlo standard deviation of .187. Their individual standard errors are also high, on the order of .13, which is about double the size of the OLS coefficient estimate. Although the USSIV coefficients are centered near 0, the key result here is that they have very large sampling variance and would be unlikely to lead to an apparently credible inference. In contrast, using the same randomly generated instruments in 2SLS estimation yields a coefficient estimate of .057 with a reported standard error of .014. Thus, unlike SSIV and USSIV, 2SLS results with fictitious instruments look remarkably like the OLS estimates.
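The repeated-split exercise can be sketched on simulated data. The code below is an illustration under stated assumptions rather than the program used for Table 3: there is one endogenous regressor, the only exogenous regressor is a constant, and the instruments are mutually exclusive group dummies, so the first-stage fitted value is simply a group mean. The simulated design (40 groups, a true coefficient of 0.1, the error scales) and all function names are hypothetical.

```python
import random

def cov(a, b):
    """Covariance of two equal-length lists (divisor n); partials out a constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (w - mb) for u, w in zip(a, b)) / len(a)

def ssiv_ussiv(groups, x, y, rng):
    """One random split; returns (SSIV, USSIV, theta_hat)."""
    idx = list(range(len(x)))
    rng.shuffle(idx)
    half = len(idx) // 2
    sample1, sample2 = idx[:half], idx[half:]
    # First stage on sample 2: with group dummies as instruments the
    # fitted value is the sample-2 group mean of the endogenous regressor.
    total, count = {}, {}
    for i in sample2:
        total[groups[i]] = total.get(groups[i], 0.0) + x[i]
        count[groups[i]] = count.get(groups[i], 0) + 1
    # Cross-sample fitted values: sample-2 coefficients applied to sample 1.
    keep = [i for i in sample1 if groups[i] in count]
    xhat = [total[groups[i]] / count[groups[i]] for i in keep]
    x1 = [x[i] for i in keep]
    y1 = [y[i] for i in keep]
    c_hy, c_hx, v_h = cov(xhat, y1), cov(xhat, x1), cov(xhat, xhat)
    # SSIV regresses y1 on xhat; USSIV uses xhat as an instrument for x1;
    # theta_hat is the slope from regressing x1 on xhat.
    return c_hy / v_h, c_hy / c_hx, c_hx / v_h

def simulate(n, n_groups, beta, rng):
    """Hypothetical data in which OLS is biased upward by the shared shock v."""
    groups, x, y = [], [], []
    for _ in range(n):
        g = rng.randrange(n_groups)
        v = rng.gauss(0.0, 1.0)
        xi = 0.05 * g + v                      # invented group effects
        groups.append(g)
        x.append(xi)
        y.append(beta * xi + 0.5 * v + rng.gauss(0.0, 1.0))
    return groups, x, y

def monte_carlo(groups, x, y, reps, rng):
    """Average (SSIV, USSIV, theta_hat) over repeated random splits."""
    draws = [ssiv_ussiv(groups, x, y, rng) for _ in range(reps)]
    return [sum(col) / reps for col in zip(*draws)]

rng = random.Random(1995)
groups, x, y = simulate(20_000, 40, beta=0.1, rng=rng)
actual = monte_carlo(groups, x, y, 31, rng)
fake_groups = [rng.randrange(40) for _ in groups]   # fictitious instruments
fake = monte_carlo(fake_groups, x, y, 31, rng)
ols = cov(x, y) / cov(x, x)
print("OLS:", round(ols, 3))
print("actual instruments (SSIV, USSIV, theta):", [round(v, 3) for v in actual])
print("random instruments (SSIV, USSIV, theta):", [round(v, 3) for v in fake])
```

In this design the informative instruments give an estimated θ close to 1, while the fictitious instruments give a θ near 0, so the USSIV ratio for the fictitious case is unstable from split to split. That is the qualitative pattern described for columns (3) and (4) of Table 3.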

4.2 Angrist and Krueger (1992a)


In AK-92, we used the 1970-1972 draft lotteries to construct instruments for the education of men at risk of induction during the Vietnam era. The lotteries worked by assigning a random sequence number (RSN) to dates of birth in cohorts at risk of being drafted. The lowest numbers were called first, up to an administratively determined ceiling. Men with numbers above the ceiling were not drafted. In certain years, men could be deferred or exempted from military service by remaining in school and thereby obtaining an educational deferment. Thus draft-lottery numbers affected both the likelihood of serving in the military and the incentive to seek additional schooling.

Angrist (1990) showed that low lottery numbers are associated with an increased probability of military service and


reduced Social Security earnings. If this link represents the causal effect of veteran status, then the impact of military service on earnings must be accounted for if the draft lottery is also to be used to identify the effect of schooling on earnings. We therefore proposed the following model:

$$y_i = \beta_0' w_{0i} + \beta_1 s_i + \gamma v_i + \varepsilon_i, \tag{8}$$

$$s_i = \pi_0' w_{0i} + \pi_1' w_{1i} + \eta_i, \tag{9}$$

and

$$v_i = \pi_0^{*\prime} w_{0i} + \pi_1^{*\prime} w_{1i} + u_i, \tag{10}$$

where $v_i$ is a dummy for veteran status and $w_{1i}$ is a vector of excluded instruments. The first equation captures the partial effects of the two endogenous regressors $s_i$ and $v_i$ on the outcome $y_i$. The latter two equations are reduced forms. The excluded instruments, $w_{1i}$, are dummies that indicate groups of consecutive RSN's interacted with dummies for years of birth from 1944-1953.

The data set used to estimate (8)-(10) consists of a sample of over 25,000 observations from six March Current Population Surveys (CPS's) that were specially prepared for us and includes information on draft-lottery numbers. The CPS extracts contain labor-market information for the years 1979 and 1981-1985. These data show that men born from 1950-1953 with low lottery numbers were indeed significantly and substantially more likely to have served in the military than men with high numbers.

Table 4 reports CPS estimates of Equation (9), which relates education to dummies for lottery numbers. Results from

two models are reported, one where $v_i$ is treated as an exogenous covariate and one where $v_i$ is treated as endogenous in (8). The instruments are three dummies for coarse lottery-number groups (RSN 1-75, 76-150, and 151-225), interacted with 10 years of birth. The first-stage estimates do not show a consistent pattern and only a few of the individual coefficient estimates are positive. But the joint test of significance has a marginal significance level under 10% for both sets of estimates.

Table 5 reports OLS, 2SLS, SSIV, and USSIV estimates of schooling coefficients from Equation (8) corresponding to the first-stage estimates in Table 4. The estimates are for models in which veteran status is treated as an exogenous covariate (treating veteran status as endogenous has little impact on the estimated schooling coefficients). For comparison, the OLS estimate of .059 is reported in column (1). The 2SLS estimate is .021 with a standard error of .029. The SSIV estimate is essentially 0, with a somewhat larger standard error than the 2SLS estimate. The attenuation bias in the SSIV estimate is .176, but the standard error associated with this parameter is .167. Thus the null hypothesis that the true reduced-form coefficients are 0 cannot be rejected. The USSIV estimate, although inflated by the inverse attenuation bias, is also virtually 0.

Table 4. Lottery Number and Educational Attainment

                 Veteran status exogenous (a, c)     Veteran status endogenous (b, c)
                 RSN 1-75   76-150    151-225        RSN 1-75   76-150    151-225
Year of birth    (1)        (2)       (3)            (4)        (5)       (6)

1944             -.194      -.563     -.471          -.197      -.556     -.485
                 (.171)     (.174)    (.177)         (.171)     (.174)    (.177)
1945             .375       .230      .457           .378       .235      .460
                 (.175)     (.170)    (.175)         (.175)     (.170)    (.175)
1946             .301       .332      .298           .288       .328      .303
                 (.157)     (.156)    (.163)         (.157)     (.156)    (.163)
1947             .003       -.228     .008           -.001      -.232     -.012
                 (.151)     (.151)    (.147)         (.151)     (.151)    (.148)
1948             -.003      .068      -.004          -.012      .050      -.018
                 (.156)     (.152)    (.153)         (.156)     (.152)    (.153)
1949             .057       .091      .079           .031       .076      .082
                 (.155)     (.153)    (.150)         (.155)     (.153)    (.150)
1950             -.078      .097      -.131          -.134      .056      -.158
                 (.151)     (.152)    (.152)         (.151)     (.152)    (.153)
1951             .358       .160      .139           .122       .312      .133
                 (.147)     (.151)    (.146)         (.150)     (.151)    (.146)
1952             -.023      .025      .080           -.084      .011      .081
                 (.147)     (.149)    (.149)         (.150)     (.149)    (.149)
1953             -.026      .104      .151           -.035      .102      .149
                 (.149)     (.148)    (.150)         (.149)     (.148)    (.150)

P value for joint F test (df = 30):   .071 (columns 1-3)   .080 (columns 4-6)

a Dependent variable is years of schooling.
b Dependent variable is years of schooling after removing the effect of predicted veteran status and covariates.
c Covariates are two race dummies, central city dummy, balance of SMSA dummy, marriage dummy, five year dummies, nine year-of-birth dummies, and eight region dummies. Veteran status is also a covariate when it is treated as exogenous. Sample size is 25,781. Standard errors are shown below the coefficients.

Table 5. Lottery Estimates With 30 Instruments

                                  Type of estimate
Parameter                     OLS       2SLS      SSIV      USSIV
                              (1)       (2)       (3)       (4)

Actual instruments (3 lottery dummies * 10 years of birth)
  β                           .059      .021      .0002     .0014
                              (.001)    (.029)    (.032)    (.184)
  θ                                               .176
                                                  (.167)
  First-stage F (df = 30)               1.40      1.19      1.19

NOTE: The sample includes 25,781 observations on men born 1944-1953 in the March 1979 and 1981-1985 CPS Special Extracts. The table reports OLS estimates and 2SLS estimates of regressions in which years of schooling is the sole endogenous regressor. Other covariates include veteran status, a dummy for Blacks, a dummy for Hispanic and other races, dummies for residence in central city, other SMSA, and married with spouse present, five year dummies, nine year-of-birth dummies, and eight region dummies. The SSIV and USSIV estimates in both panels are based on a single sample split with 12,967 observations used for the cross-sample fitted values and 12,814 observations used for the second stage. The instruments include 3 lottery-number dummies (indicating RSN 1-75, 76-150 and 151-225) interacted with 10 year-of-birth dummies.

The main specification reported by AK-92 is replicated in Panel A of Table 6. Column (2) shows the 2SLS estimate generated by using 13 lottery-number dummies (indicating groups of 25 consecutive RSN's) interacted with 10 year-of-birth dummies. Coefficients on the additional covariates are not reported (for these see table 3 of AK-92). Although the results in Table 5, using relatively few instruments, suggest that lottery-based estimation is not very informative, the 2SLS estimate in Table 6 using 130 instruments is .066 with a standard error of .015, a finding close to the OLS estimate of .059. The conventional asymptotic standard error of this estimate, based on the usual normal approximation, does not provide a warning of weak instruments.

In contrast with the 2SLS estimates, SSIV estimates in column (3) of Table 6 are .005 with a standard error of .016. Thus, unlike for most of the specifications reported by AK-91, the SSIV estimate in this case does not confirm the conventional 2SLS findings. The implied attenuation of SSIV is .088 with a standard error of .084. This is consistent with the null hypothesis that the lottery instruments are actually worthless for estimating schooling coefficients. Inflating SSIV by the attenuation bias generates a USSIV estimate of .062, but the standard error of this estimate is .177, again suggesting that little is learned from lottery-based instruments about the returns to schooling.

As a final check on these models, estimates from the same specification using 13 fictitious randomly generated lottery-number dummies as instruments are reported in Panel B of Table 6. The 2SLS estimate here is .049 with a standard error of .018. This is smaller than the estimate using the actual instruments but does not lead to a dramatically different inference. As with the actual instruments, however, SSIV and USSIV provide strong evidence that the 2SLS result is spurious. The SSIV estimate is .025 with a standard error of .018 and the USSIV estimate is 1.65 with a standard error of 10.1.

Table 6. Lottery Estimates With 130 Instruments

                                  Type of estimate
Parameter                     OLS       2SLS      SSIV      USSIV
                              (1)       (2)       (3)       (4)

A. Actual instruments (13 lottery dummies * 10 years of birth)
  β                           .059      .066      .005      .062
                              (.001)    (.015)    (.016)    (.177)
  θ                                               .088
                                                  (.084)
  First-stage F (df = 130)              1.11      1.12      1.12

B. Random instruments (13 multinomial dummies * 10 years of birth)
  β                           .059      .049      .025      1.65
                              (.001)    (.018)    (.018)    (10.1)
  θ                                               .015
                                                  (.094)
  First-stage F (df = 130)              .82       .84       .84

NOTE: The sample includes 25,781 observations on men born 1944-1953 in the March 1979 and 1981-1985 CPS Special Extracts. The table reports OLS estimates and 2SLS estimates of regressions in which years of schooling is the sole endogenous regressor. Other covariates include veteran status, a dummy for Blacks, a dummy for Hispanic and other races, dummies for residence in central city, other SMSA, and married with spouse present, five year dummies, nine year-of-birth dummies, and eight region dummies. The SSIV and USSIV estimates in both panels are based on a single sample split with 12,967 observations used for the cross-sample fitted values and 12,814 observations used for the second stage. The instruments include 13 lottery-number dummies (indicating groups of 25 consecutive RSN's), interacted with 10 year-of-birth dummies. The same sample split was used to compute estimates in both panels and in Tables 5 and 6.

5. CONCLUSIONS

SSIV and USSIV provide valuable complements to conventional 2SLS. SSIV estimates are biased toward 0 rather than toward the OLS probability limit. Thus with SSIV there is little risk of spurious or misleading inferences generated solely as a consequence of finite-sample bias. Moreover, the estimated SSIV attenuation bias can be used to inflate SSIV estimates and provide an asymptotically unbiased (under group asymptotics) USSIV estimate.

Our reinvestigation of Angrist and Krueger (1991) shows that SSIV and USSIV produce relatively precise parameter estimates that are close to the conventional 2SLS and OLS estimates. All of the IV estimators used here (2SLS, SSIV, and USSIV) lead to similar results for the 30-instrument specifications reported in that article. There is evidence of a problem for 2SLS estimates in the 180-instrument specifications, as well as for the SSIV estimates, which are biased toward 0. But the bias-corrected USSIV estimator generates statistically significant estimates close to 2SLS and OLS estimates for the 180-instrument specification.

In contrast, our reexamination of results from Angrist and Krueger (1992a) fails to support the findings reported in that article. 2SLS estimates are close to OLS estimates in a 130-instrument specification, but SSIV estimates are essentially 0, and both SSIV and USSIV estimates are statistically insignificant. These findings therefore suggest that draft-lottery numbers are not useful for estimating schooling coefficients.

A natural extension of the research agenda begun here is to develop more efficient estimators that use sample splitting to reduce bias with a minimal increase in sampling variance. For example, an estimator based on combining the two SSIV and USSIV estimators that could be produced from any single split will have lower sampling variance than the SSIV and USSIV estimators introduced here. With Guido Imbens, we are also working on a jackknifed "leave-one-out" version of USSIV based on a separate first stage and fitted value for each observation. This estimator has the same asymptotic distribution as 2SLS with the desirable bias properties of USSIV.
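The idea of a separate first stage and fitted value for each observation can be sketched as follows. This is one illustrative reading of the sentence above, not the estimator the authors went on to develop: it assumes a single endogenous regressor, a constant as the only exogenous regressor, and mutually exclusive group dummies as instruments, in which case the first-stage fit that omits observation i reduces to the mean of the other observations in i's group. The simulated design and the names `leave_one_out_fits` and `jackknife_iv` are hypothetical.

```python
import random

def cov(a, b):
    """Covariance of two equal-length lists (divisor n)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (w - mb) for u, w in zip(a, b)) / len(a)

def leave_one_out_fits(groups, x):
    """Fitted value for each i from a first stage that omits observation i.

    With group dummies as instruments this is the group mean of x
    computed without observation i (each group needs at least 2 members).
    """
    total, count = {}, {}
    for g, xi in zip(groups, x):
        total[g] = total.get(g, 0.0) + xi
        count[g] = count.get(g, 0) + 1
    return [(total[g] - xi) / (count[g] - 1) for g, xi in zip(groups, x)]

def jackknife_iv(groups, x, y):
    """IV slope using the leave-one-out fitted values as the instrument."""
    fits = leave_one_out_fits(groups, x)
    return cov(fits, y) / cov(fits, x)

# Hypothetical data: the shock v enters both x and y, so OLS is biased upward.
rng = random.Random(7)
true_beta = 0.1
groups, x, y = [], [], []
for _ in range(20_000):
    g = rng.randrange(40)
    v = rng.gauss(0.0, 1.0)
    xi = 0.05 * g + v                  # invented group effects
    groups.append(g)
    x.append(xi)
    y.append(true_beta * xi + 0.5 * v + rng.gauss(0.0, 1.0))

ols = cov(x, y) / cov(x, x)
jive = jackknife_iv(groups, x, y)
print("OLS:", round(ols, 3), "leave-one-out IV:", round(jive, 3))
```

Because observation i never contributes to its own fitted value, the fitted value is not mechanically correlated with that observation's first-stage error, which is the property the split-sample estimators obtain by using two separate samples.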

ACKNOWLEDGMENTS

Thanks go to Kevin McCormick, Ronald Tucker, and Greg Weyland at the Census Bureau for creating a special Current Population Extract for us. Thanks also go to David Card, Gary Chamberlain, Guido Imbens, Jim Powell, George Tauchen, and seminar participants at the National Bureau of Economic Research Labor Studies meeting for helpful discussions and comments, and to three anonymous referees for detailed written comments. We will make an SAS computer program available to anyone interested in using the estimators presented in this article. This research was supported by National Science Foundation Grant SES-9012149.

APPENDIX: PROOFS

Proof of Corollary 1.1. We need to show that

$$E[\hat{X}_{21}'\hat{X}_{21}/n_1] = \{\pi' E(z_i z_i')\pi + c\sigma_\eta^2 L_1\} \tag{A.1}$$

and

$$E[\hat{X}_{21}'X_1/n_1] = \{\pi' E(z_i z_i')\pi\}, \tag{A.2}$$

where $c = \mathrm{tr}\{E[(Z_2'Z_2)^{-1}(Z_1'Z_1)]/n_1\}$ and $L_1$ is a $(k + p)$ square matrix consisting of zeros except for a 1 in the lower right corner. Note that

$$\hat{X}_{21} = Z_1(Z_2'Z_2)^{-1}Z_2'X_2 = Z_1\pi + Z_1(Z_2'Z_2)^{-1}Z_2'\eta_2 \tag{A.3}$$

and

$$X_1 = Z_1\pi + \eta_1. \tag{A.4}$$

Using the independence of the two samples and the fact that $E[\eta_2 \mid Z_2] = 0$, $E[\hat{X}_{21}'\hat{X}_{21}/n_1]$ simplifies to

$$E[\pi'Z_1'Z_1\pi/n_1] + E[\eta_2'Z_2(Z_2'Z_2)^{-1}Z_1'Z_1(Z_2'Z_2)^{-1}Z_2'\eta_2/n_1].$$

We have $E[\pi'Z_1'Z_1\pi/n_1] = \{\pi'E(z_i z_i')\pi\}$ by virtue of iid sampling. To simplify $E[\eta_2'Z_2(Z_2'Z_2)^{-1}Z_1'Z_1(Z_2'Z_2)^{-1}Z_2'\eta_2/n_1]$, let $\tilde{\eta}_2$ be the column of $\eta_2$ corresponding to $s_i$. Then,

$$E[\eta_2'Z_2(Z_2'Z_2)^{-1}Z_1'Z_1(Z_2'Z_2)^{-1}Z_2'\eta_2/n_1] = E[\tilde{\eta}_2'Z_2(Z_2'Z_2)^{-1}Z_1'Z_1(Z_2'Z_2)^{-1}Z_2'\tilde{\eta}_2]L_1/n_1.$$

Using properties of the trace operator, we have

$$E[\tilde{\eta}_2'Z_2(Z_2'Z_2)^{-1}Z_1'Z_1(Z_2'Z_2)^{-1}Z_2'\tilde{\eta}_2] = E[\mathrm{tr}\{Z_2'\tilde{\eta}_2\tilde{\eta}_2'Z_2(Z_2'Z_2)^{-1}Z_1'Z_1(Z_2'Z_2)^{-1}\}].$$

Iterating expectations, passing the expectation through the trace, and using the fact that $E[\tilde{\eta}_2\tilde{\eta}_2' \mid Z_2] = \sigma_\eta^2 I$, gives

$$E[\mathrm{tr}\{Z_2'\tilde{\eta}_2\tilde{\eta}_2'Z_2(Z_2'Z_2)^{-1}Z_1'Z_1(Z_2'Z_2)^{-1}\}] = E[\mathrm{tr}\{Z_1'Z_1(Z_2'Z_2)^{-1}\}]\sigma_\eta^2.$$

This establishes (A.1). To simplify (A.2), use (A.3) and (A.4) to write

$$E[\hat{X}_{21}'X_1] = E[\pi'Z_1'Z_1\pi] + E[\pi'Z_1'\eta_1] + E[\eta_2'Z_2(Z_2'Z_2)^{-1}Z_1'Z_1\pi] + E[\eta_2'Z_2(Z_2'Z_2)^{-1}Z_1'\eta_1]. \tag{A.5}$$

Because the two samples are independent and $\eta$ is mean-independent of $Z$, only the first term on the right side of (A.5) is nonzero.

Derivation of Equation (6). Recall that $\beta = [\beta_0' \; \beta_1]'$. Write the $(p + 1) \times (p + 1)$ matrix, $\pi'E(z_i z_i')\pi$, as a conformably partitioned matrix:

$$\pi'E(z_i z_i')\pi = \begin{bmatrix} P & R \\ R' & Q \end{bmatrix},$$

where $P$ is $p \times p$, $Q$ is a scalar, and $R$ is $p \times 1$. Moreover, let $q = c\sigma_\eta^2$. Using the partitioned inversion formula (Theil 1971, p. 18), we have

$$[\pi'E(z_i z_i')\pi]^{-1} = \begin{bmatrix} P^{-1} + P^{-1}RR'P^{-1}(1/\phi) & -P^{-1}R(1/\phi) \\ -R'P^{-1}(1/\phi) & (1/\phi) \end{bmatrix}, \tag{A.6}$$

where $\phi = Q - R'P^{-1}R$ is a scalar. We can use (A.6) to write

$$\{\pi'E(z_i z_i')\pi + c\sigma_\eta^2 L_1\}^{-1} = \left\{ \begin{bmatrix} P^{-1} + P^{-1}RR'P^{-1}(1/\phi) & -P^{-1}R(1/\phi) \\ -R'P^{-1}(1/\phi) & (1/\phi) \end{bmatrix} + \left[\frac{1}{\phi + q} - \frac{1}{\phi}\right] \begin{bmatrix} P^{-1}RR'P^{-1} & -P^{-1}R \\ -R'P^{-1} & 1 \end{bmatrix} \right\}.$$

The first term in curly brackets equals $[\pi'E(z_i z_i')\pi]^{-1}$. Therefore,

$$\{\pi'E(z_i z_i')\pi + c\sigma_\eta^2 L_1\}^{-1}\{\pi'E(z_i z_i')\pi\} = [\phi/(\phi + q)]\, I_{p+1} + [q/(\phi + q)] \begin{bmatrix} I_p & P^{-1}R \\ 0 & 0 \end{bmatrix}.$$

Multiplying this times $\beta$ gives Equation (6) in the text. Because $\{\pi'E(z_i z_i')\pi\}$ is positive definite, $1/\phi = 1/[Q - R'P^{-1}R]$, which is the lower right diagonal element of $[\pi'E(z_i z_i')\pi]^{-1}$, must be positive. Finally, note that $c > 0$ because $(Z_2'Z_2)^{-1}(Z_1'Z_1)$ is positive definite. Thus the proportional bias in estimates of $\beta_1$, $[\phi/(\phi + q)] = [\phi/(\phi + c\sigma_\eta^2)]$, is between 0 and 1.

[Received March 1993. Revised September 1994.]

REFERENCES

Altonji, J., and Segal, L. M. (1994), "Small-Sample Bias in GMM Estimation of Covariance Structures," Technical Working Paper 156, National Bureau of Economic Research, Cambridge, MA.

Angrist, J. D. (1990), "Lifetime Earnings and the Vietnam-Era Draft Lottery: Evidence From Social Security Administrative Records," American Economic Review, 80, 313-336.

Angrist, J., and Krueger, A. B. (1991), "Does Compulsory School Attendance Affect Schooling and Earnings?" Quarterly Journal of Economics, 106, 979-1014.

Angrist, J., and Krueger, A. B. (1992a), "Estimating the Payoff to Schooling Using the Vietnam-Era Draft Lottery," Working Paper 4067, National Bureau of Economic Research, Cambridge, MA.

Angrist, J., and Krueger, A. B. (1992b), "The Effect of Age at School Entry on Educational Attainment: An Application of Instrumental Variables With Moments From Two Samples," Journal of the American Statistical Association, 87, 328-336.

Bekker, P. A. (1994), "Alternative Approximations to the Distributions of Instrumental Variables Estimators," Econometrica, 62, 657-682.

Bound, J., Jaeger, D., and Baker, R. (in press), "Problems With Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable Is Weak," Journal of the American Statistical Association, 90.

Buse, A. (1992), "The Bias of Instrumental Variables Estimators," Econometrica, 60, 173-180.

Deaton, A. (1985), "Panel Data From a Time Series of Cross-Sections," Journal of Econometrics, 30, 109-126.

Fuller, W. A. (1975), "Regression Analysis for Sample Surveys," Sankhyā, Ser. C, 37, 312-326.

Hall, R. A., Rudebusch, G. D., and Wilcox, D. W. (1994), "Judging Instrument Relevance in Instrumental Variables Estimation," Discussion Paper 94-3, Federal Reserve Board Division of Monetary Affairs, Division of Research and Statistics, Finance and Economics, Washington, DC.

Maddala, G. S., and Jeong, J. (1992), "On the Exact Small Sample Distribution of the Instrumental Variables Estimator," Econometrica, 60, 181-183.

Nagar, A. L. (1959), "The Bias and Moment Matrix of the General k-class Estimators of the Parameters in Simultaneous Equations," Econometrica, 27, 575-595.

Nelson, C., and Startz, R. (1990), "The Distribution of the Instrumental Variables Estimator and its t-ratio When the Instrument is a Poor One," Journal of Business, 63, part 2, S125-S140.

Staiger, D., and Stock, J. H. (1994), "Instrumental Variables Regressions With Weak Instruments," Technical Working Paper 151, National Bureau of Economic Research, Cambridge, MA.

Theil, H. (1971), Principles of Econometrics, Chicago: University of Chicago Press.
