Solutions to Odd-Numbered End-of-Chapter Exercises
Chapter 2
Review of Probability
(b) Cumulative probability distribution function for Y
Using Key Concept 2.3, var(Y) = E(Y²) − [E(Y)]², so that
var(Y) = E(Y²) − [E(Y)]² = 1.50 − (1.00)² = 0.50.
©2011 Pearson Education, Inc. Publishing as Addison Wesley
2.5. Let X denote temperature in °F and Y denote temperature in °C. Recall that Y = 0 when X = 32 and Y = 100 when X = 212; this implies Y = (100/180) × (X − 32), or Y = −17.78 + (5/9) × X. Using Key Concept 2.3, μ_X = 70°F implies that μ_Y = −17.78 + (5/9) × 70 = 21.11°C, and σ_X = 7°F implies σ_Y = (5/9) × 7 = 3.89°C.
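Because the conversion is a linear transformation Y = a + bX, the mean shifts and scales while the standard deviation only scales. A quick numeric check (Python is illustrative here, not part of the original solution):

```python
# If Y = a + b*X, then mu_Y = a + b*mu_X and sigma_Y = |b|*sigma_X.
a, b = -17.78, 5 / 9        # Y in Celsius, X in Fahrenheit

mu_X, sigma_X = 70.0, 7.0   # degrees Fahrenheit
mu_Y = a + b * mu_X         # approx 21.11 degrees Celsius
sigma_Y = abs(b) * sigma_X  # approx 3.89 degrees Celsius
```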
Joint distribution of X and Y:

                          Value of Y
               14     22     30     40     65    Distribution of X
Value    1    0.02   0.05   0.10   0.03   0.01        0.21
of X     5    0.17   0.15   0.05   0.02   0.01        0.40
         8    0.02   0.03   0.15   0.10   0.09        0.39
Distribution
of Y          0.21   0.23   0.30   0.15   0.11        1.00
(a) The probability distribution is given in the table above.
E(Y) = 14 × 0.21 + 22 × 0.23 + 30 × 0.30 + 40 × 0.15 + 65 × 0.11 = 30.15
E(Y²) = 14² × 0.21 + 22² × 0.23 + 30² × 0.30 + 40² × 0.15 + 65² × 0.11 = 1127.23
var(Y) = E(Y²) − [E(Y)]² = 218.21
σ_Y = 14.77
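As a numeric check of these moments (a minimal sketch, not part of the original solution):

```python
# Marginal distribution of Y from the table above.
ys = [14, 22, 30, 40, 65]
ps = [0.21, 0.23, 0.30, 0.15, 0.11]

EY = sum(y * p for y, p in zip(ys, ps))       # expected value, 30.15
EY2 = sum(y * y * p for y, p in zip(ys, ps))  # second moment, 1127.23
varY = EY2 - EY**2                            # approx 218.21
sdY = varY**0.5                               # approx 14.77
```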
4 Stock/Watson • Introduction to Econometrics, Third Edition
(b) The conditional probability distribution of Y given X = 8 is shown in the table below.

Value of Y            14          22          30          40          65
Pr(Y | X = 8)     0.02/0.39   0.03/0.39   0.15/0.39   0.10/0.39   0.09/0.39
2.13.
(b) Y and W are symmetric around 0, thus skewness is equal to 0; because their mean is zero, this
means that the third moment is zero.
A similar calculation yields the results for W.
(d) First, condition on X = 0, so that S = W:
E(S | X = 0) = 0; E(S² | X = 0) = 100; E(S³ | X = 0) = 0; E(S⁴ | X = 0) = 3 × 100².
Similarly,
E(S | X = 1) = 0; E(S² | X = 1) = 1; E(S³ | X = 1) = 0; E(S⁴ | X = 1) = 3.
From the law of iterated expectations,
E(S) = E(S | X = 0) × Pr(X = 0) + E(S | X = 1) × Pr(X = 1) = 0
E(S²) = E(S² | X = 0) × Pr(X = 0) + E(S² | X = 1) × Pr(X = 1) = 100 × 0.01 + 1 × 0.99 = 1.99
E(S³) = E(S³ | X = 0) × Pr(X = 0) + E(S³ | X = 1) × Pr(X = 1) = 0
E(S⁴) = E(S⁴ | X = 0) × Pr(X = 0) + E(S⁴ | X = 1) × Pr(X = 1) = 3 × 100² × 0.01 + 3 × 1 × 0.99 = 302.97
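These mixture moments can be checked directly, using Pr(X = 0) = 0.01 and Pr(X = 1) = 0.99 as in the computation above (an illustrative check only):

```python
# Law of iterated expectations: E(S^k) = E(S^k | X=0)*Pr(X=0) + E(S^k | X=1)*Pr(X=1).
p0, p1 = 0.01, 0.99
ES = 0 * p0 + 0 * p1            # 0
ES2 = 100 * p0 + 1 * p1         # 1.99
ES3 = 0 * p0 + 0 * p1           # 0
ES4 = 3 * 100**2 * p0 + 3 * p1  # 302.97
```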
2.15. (a) Pr(9.6 ≤ Ȳ ≤ 10.4) = Pr( (9.6 − 10)/√(4/n) ≤ (Ȳ − 10)/√(4/n) ≤ (10.4 − 10)/√(4/n) )
 = Pr( (9.6 − 10)/√(4/n) ≤ Z ≤ (10.4 − 10)/√(4/n) ),
where Z ~ N(0, 1). Thus,
(i) for n = 20, Pr(−0.89 ≤ Z ≤ 0.89) = 0.63;
(ii) for n = 100, Pr(−2.00 ≤ Z ≤ 2.00) = 0.954;
(iii) for n = 1000, Pr(−6.32 ≤ Z ≤ 6.32) = 1.000.

(b) Pr(10 − c ≤ Ȳ ≤ 10 + c) = Pr( −c/√(4/n) ≤ (Ȳ − 10)/√(4/n) ≤ c/√(4/n) )
 = Pr( −c/√(4/n) ≤ Z ≤ c/√(4/n) ).
As n gets large, c/√(4/n) gets large, and the probability converges to 1.
(c) This follows from (b) and the definition of convergence in probability given in Key Concept 2.6.
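A sketch of the normal-approximation calculation in (a), with the standard normal CDF built from `math.erf` (illustrative, not part of the original solution):

```python
import math

def Phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_band(n, lo=9.6, hi=10.4, mu=10.0, var=4.0):
    # CLT approximation: Ybar is approximately N(mu, var/n).
    se = math.sqrt(var / n)
    return Phi((hi - mu) / se) - Phi((lo - mu) / se)

p20, p100, p1000 = prob_band(20), prob_band(100), prob_band(1000)
```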
2.17.
(a) (i) With n = 100, Pr(Ȳ ≥ 0.43) = Pr( (Ȳ − 0.4)/√(0.24/n) ≥ (0.43 − 0.4)/√(0.24/n) ) = Pr( (Ȳ − 0.4)/√(0.24/n) ≥ 0.6124 ) ≈ 0.27.
(ii) With n = 400, Pr(Ȳ ≤ 0.37) = Pr( (Ȳ − 0.4)/√(0.24/n) ≤ (0.37 − 0.4)/√(0.24/n) ) = Pr( (Ȳ − 0.4)/√(0.24/n) ≤ −1.22 ) ≈ 0.11.

(b) We know Pr(−1.96 ≤ Z ≤ 1.96) = 0.95; thus we want n to satisfy (0.41 − 0.40)/√(0.24/n) > 1.96 and (0.39 − 0.40)/√(0.24/n) < −1.96. Solving these inequalities yields n ≥ 9220.
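The inequality in (b) can be solved for n numerically (an illustrative check):

```python
import math

# Require 0.01 / sqrt(0.24/n) >= 1.96, i.e. n >= 0.24 * (1.96/0.01)**2.
n_min = math.ceil(0.24 * (1.96 / 0.01) ** 2)  # 9220
```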
2.19. (a) Pr(Y = y_j) = Σ_{i=1}^{l} Pr(X = x_i, Y = y_j) = Σ_{i=1}^{l} Pr(Y = y_j | X = x_i) Pr(X = x_i)

(b) E(Y) = Σ_{j=1}^{k} y_j Pr(Y = y_j) = Σ_{j=1}^{k} y_j Σ_{i=1}^{l} Pr(Y = y_j | X = x_i) Pr(X = x_i)
 = Σ_{i=1}^{l} [ Σ_{j=1}^{k} y_j Pr(Y = y_j | X = x_i) ] Pr(X = x_i)
 = Σ_{i=1}^{l} E(Y | X = x_i) Pr(X = x_i).
(c) When X and Y are independent, Pr(X = x_i, Y = y_j) = Pr(X = x_i) Pr(Y = y_j), so
σ_XY = E[(X − μ_X)(Y − μ_Y)]
 = Σ_{i=1}^{l} Σ_{j=1}^{k} (x_i − μ_X)(y_j − μ_Y) Pr(X = x_i, Y = y_j)
 = Σ_{i=1}^{l} Σ_{j=1}^{k} (x_i − μ_X)(y_j − μ_Y) Pr(X = x_i) Pr(Y = y_j)
 = [ Σ_{i=1}^{l} (x_i − μ_X) Pr(X = x_i) ] × [ Σ_{j=1}^{k} (y_j − μ_Y) Pr(Y = y_j) ]
 = E(X − μ_X) × E(Y − μ_Y) = 0 × 0 = 0,
so that
corr(X, Y) = σ_XY/(σ_X σ_Y) = 0/(σ_X σ_Y) = 0.
2.21. (a) E(X − μ)³ = E[(X − μ)²(X − μ)] = E[X³ − 2X²μ + Xμ² − X²μ + 2Xμ² − μ³]
 = E(X³) − 3E(X²)μ + 3E(X)μ² − μ³
 = E(X³) − 3E(X²)E(X) + 3E(X)[E(X)]² − [E(X)]³
 = E(X³) − 3E(X²)E(X) + 2[E(X)]³

(b) E(X − μ)⁴ = E[(X³ − 3X²μ + 3Xμ² − μ³)(X − μ)]
 = E[X⁴ − 3X³μ + 3X²μ² − Xμ³ − X³μ + 3X²μ² − 3Xμ³ + μ⁴]
 = E(X⁴) − 4E(X³)E(X) + 6E(X²)[E(X)]² − 4E(X)[E(X)]³ + [E(X)]⁴
2.23. X and Z are two independently distributed standard normal random variables, so
μ_X = μ_Z = 0, σ_X² = σ_Z² = 1, σ_XZ = 0.
(b) E(X²) = σ_X² + μ_X² = 1, and μ_Y = E(X² + Z) = E(X²) + μ_Z = 1 + 0 = 1.
(c) E(XY) = E(X³ + ZX) = E(X³) + E(ZX). Using the fact that the odd moments of a standard normal random variable are all zero, we have E(X³) = 0. Using the independence between X and Z, we have E(ZX) = E(Z)E(X) = 0, so that E(XY) = 0.
(b) Σ_{i=1}^{n} (x_i + y_i) = (x_1 + y_1 + x_2 + y_2 + ⋯ + x_n + y_n)
 = (x_1 + x_2 + ⋯ + x_n) + (y_1 + y_2 + ⋯ + y_n)
 = Σ_{i=1}^{n} x_i + Σ_{i=1}^{n} y_i

(c) Σ_{i=1}^{n} a = (a + a + a + ⋯ + a) = na
Chapter 3
Review of Statistics
(b) For n = 64, σ_Ȳ² = σ_Y²/n = 43/64 = 0.6719, and
Pr(101 ≤ Ȳ ≤ 103) = Pr( (101 − 100)/√0.6719 ≤ (Ȳ − 100)/√0.6719 ≤ (103 − 100)/√0.6719 )
 ≈ Φ(3.6599) − Φ(1.2200) = 0.9999 − 0.8888 = 0.1111.

(c) For n = 165, σ_Ȳ² = σ_Y²/n = 43/165 = 0.2606, and
Pr(Ȳ > 98) = 1 − Pr(Ȳ ≤ 98) = 1 − Pr( (Ȳ − 100)/√0.2606 ≤ (98 − 100)/√0.2606 )
 ≈ 1 − Φ(−3.9178) = Φ(3.9178) = 1.0000 (rounded to four decimal places).
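A numeric check of (b) and (c) (illustrative only):

```python
import math

def Phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# (b): n = 64, so var(Ybar) = 43/64.
se_b = math.sqrt(43.0 / 64)
p_b = Phi((103 - 100) / se_b) - Phi((101 - 100) / se_b)  # approx 0.1111

# (c): n = 165, so var(Ybar) = 43/165.
se_c = math.sqrt(43.0 / 165)
p_c = 1.0 - Phi((98 - 100) / se_c)                        # approx 1.0000
```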
(a) p̂ = 215/400 = 0.5375.
(c) The computed t-statistic is
t^act = (p̂ − p₀)/SE(p̂) = (0.5375 − 0.5)/0.0249 = 1.506.
Pr(|p̂ − 0.5| > 0.02) = 1 − Pr(−0.02 ≤ p̂ − 0.5 ≤ 0.02)
 = 1 − Pr(−0.05 ≤ p̂ − 0.53 ≤ −0.01)
 = 1 − Pr( −0.05/√(0.53 × 0.47/1055) ≤ (p̂ − 0.53)/√(0.53 × 0.47/1055) ≤ −0.01/√(0.53 × 0.47/1055) )
 = 1 − Pr( −3.25 ≤ (p̂ − 0.53)/√(0.53 × 0.47/1055) ≤ −0.65 )
 ≈ 0.74,
where the final equality uses the central limit theorem approximation.
(b) (i) t = (0.54 − 0.50)/√((0.54 × 0.46)/1055) = 2.61, and Pr(|t| > 2.61) = 0.01, so that the null is rejected at the 5% level.
(ii) Pr(t > 2.61) = 0.004, so that the null is rejected at the 5% level.
(iii) 0.54 ± 1.96 × √((0.54 × 0.46)/1055) = 0.54 ± 0.03, or 0.51 to 0.57.
(iv) 0.54 ± 2.58 × √((0.54 × 0.46)/1055) = 0.54 ± 0.04, or 0.50 to 0.58.
(v) 0.54 ± 0.67 × √((0.54 × 0.46)/1055) = 0.54 ± 0.01, or 0.53 to 0.55.

(c) (i) The probability is 0.95 in any single survey; there are 20 independent surveys, so the probability that all 20 intervals contain the true value is 0.95²⁰ = 0.36.
(ii) 95% of the 20 confidence intervals, or 19.
(d) The relevant equation is 1.96 × SE(p̂) ≤ 0.01, or 1.96 × √(p(1 − p)/n) ≤ 0.01. Thus n must be chosen so that
n > 1.96² p(1 − p)/0.01²,
so that the answer depends on the value of p. Note that the largest value that p(1 − p) can take on is 0.25 (that is, p = 0.5 makes p(1 − p) as large as possible). Thus if
n > 1.96² × 0.25/0.01² = 9604,
then the margin of error is less than 0.01 for all values of p.
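The worst-case sample size can be verified numerically (illustrative):

```python
# n must satisfy n > 1.96^2 * p(1-p) / 0.01^2; p(1-p) is largest at p = 0.5.
def n_required(p, moe=0.01, z=1.96):
    return z**2 * p * (1 - p) / moe**2

n_worst = n_required(0.5)  # approx 9604
```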
3.7. The null hypothesis is that the survey is a random draw from a population with p = 0.11. The t-statistic is t = (p̂ − 0.11)/SE(p̂), where SE(p̂) = √(p̂(1 − p̂)/n). (An alternative formula for SE(p̂) is √(0.11 × (1 − 0.11)/n), which is valid under the null hypothesis that p = 0.11.) The value of the t-statistic is 2.71, which has a p-value that is less than 0.01. Thus the null hypothesis p = 0.11 (the survey is unbiased) can be rejected at the 1% level.
The standard deviation of the sampling distribution of Ȳ is σ_Ȳ = σ_Y/√n = 200/√100 = 20 hours. The hypothesis test is H₀: μ = 2000 vs. H₁: μ > 2000. The manager will accept the alternative hypothesis if Ȳ > 2100 hours.
(a) The size of a test is the probability of erroneously rejecting the null hypothesis when it is valid. The size of the manager's test is
Pr(Ȳ > 2100 | μ = 2000) = Pr( (Ȳ − 2000)/20 > (2100 − 2000)/20 ) = Pr(Z > 5) = 1 − Φ(5) ≈ 2.9 × 10⁻⁷.
(b) The power of a test is the probability of correctly rejecting the null hypothesis when it is invalid. We first calculate the probability that the manager erroneously accepts the null hypothesis when it is invalid:
E(Ỹ) = (1/n) × [ (1/2) × (n/2) × μ_Y + (3/2) × (n/2) × μ_Y ] = μ_Y
var(Ỹ) = (1/n²) × [ (1/4)var(Y₁) + (9/4)var(Y₂) + ⋯ + (1/4)var(Y_{n−1}) + (9/4)var(Y_n) ]
 = (1/n²) × [ (1/4) × (n/2) × σ_Y² + (9/4) × (n/2) × σ_Y² ] = 1.25 σ_Y²/n.
The p-value for the one-sided test is
p-value = 1 − Φ(t^act) = 1 − Φ(4.0479) = 1 − 0.999974147 = 2.5853 × 10⁻⁵.
With this small p-value, the null hypothesis can be rejected with a high degree of confidence. There is statistically significant evidence that the districts with smaller classes have higher average test scores.
The 95% CI is (0.859 − 0.374) ± 1.96 × √( 0.859(1 − 0.859)/5801 + 0.374(1 − 0.374)/4249 ), or 0.485 ± 0.017.
this is an example of the “continuous mapping theorem” discussed in Chapter 17.)
[SE(Ȳ_m − Ȳ_w)]² = (1/n) × [ (1/(n − 1)) Σ_{i=1}^{n} (Y_{mi} − Ȳ_m)² ] + (1/n) × [ (1/(n − 1)) Σ_{i=1}^{n} (Y_{wi} − Ȳ_w)² ]
Chapter 4
Linear Regression with One Regressor
(d) Use the formula for the standard error of the regression (SER) in Equation (4.19) to get the
sum of squared residuals:
SSR = (n − 2) × SER² = (100 − 2) × 11.5² = 12961.
Use the formula for R² in Equation (4.16) to get the total sum of squares:
TSS = SSR/(1 − R²) = 12961/(1 − 0.08²) = 13044.
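A numeric check, assuming (as in the computation above) n = 100, SER = 11.5, and R² = 0.08² (illustrative only):

```python
n, SER, R2 = 100, 11.5, 0.08**2

SSR = (n - 2) * SER**2  # approx 12961 (12960.5 before rounding)
TSS = SSR / (1 - R2)    # approx 13044
```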
(b) SER is in the same units as the dependent variable (Y, or AWE in this example). Thus SER is
measured in dollars per week.
(c) R2 is unit free.
(d) (i) 696.7 + 9.6 × 25 = $936.70;
(ii) 696.7 + 9.6 × 45 = $1,128.70.
(e) No. The oldest worker in the sample is 65 years old. 99 years is far outside the range of the sample data.
(f) No. The distribution of earnings is positively skewed and has kurtosis larger than the normal.
(g) β̂₀ = Ȳ − β̂₁X̄, so that Ȳ = β̂₀ + β̂₁X̄. Thus the sample mean of AWE is 696.7 + 9.6 × 41.6 = $1,096.06.
4.9. (a) With β̂₁ = 0, β̂₀ = Ȳ, and Ŷᵢ = β̂₀ = Ȳ. Thus ESS = 0 and R² = 0.
(b) If R² = 0, then ESS = 0, so that Ŷᵢ = Ȳ for all i. But Ŷᵢ = β̂₀ + β̂₁Xᵢ, so that Ŷᵢ = Ȳ for all i, which implies that β̂₁ = 0, or that Xᵢ is constant for all i. If Xᵢ is constant for all i, then Σ_{i=1}^{n} (Xᵢ − X̄)² = 0 and β̂₁ is undefined (see Equation (4.7)).
4.11. (a) The least squares objective function is Σ_{i=1}^{n} (Yᵢ − b₁Xᵢ)². Differentiating with respect to b₁ yields
∂Σ_{i=1}^{n} (Yᵢ − b₁Xᵢ)²/∂b₁ = −2Σ_{i=1}^{n} Xᵢ(Yᵢ − b₁Xᵢ).
Setting this to zero and solving for the least squares estimator yields
β̂₁ = Σ_{i=1}^{n} XᵢYᵢ / Σ_{i=1}^{n} Xᵢ².

(b) Following the same steps in (a) yields
β̂₁ = Σ_{i=1}^{n} Xᵢ(Yᵢ − 4) / Σ_{i=1}^{n} Xᵢ².
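The closed-form estimator can be checked against its first-order condition on made-up data (illustrative; the data here are hypothetical):

```python
# OLS through the origin: b1_hat = sum(x*y) / sum(x^2).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

b1_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Part (b): the same derivation applied to (Y - 4).
b1_hat_b = sum(xi * (yi - 4) for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# First-order condition from the derivation: sum x*(y - b1*x) = 0 at b1_hat.
foc = sum(xi * (yi - b1_hat * xi) for xi, yi in zip(x, y))
```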
4.13. The answer follows the derivations in Appendix 4.3 in "Large-Sample Normal Distribution of the OLS Estimator." In particular, the expression for νᵢ is now νᵢ = (Xᵢ − μ_X)uᵢ, so that var(νᵢ) = var[(Xᵢ − μ_X)uᵢ], and this term carries through the rest of the calculations.
Chapter 5
Regression with a Single Regressor: Hypothesis
Tests and Confidence Intervals
5.3. The 99% confidence interval is 1.5 × (3.94 ± 2.58 × 0.31), or 4.71 lbs ≤ WeightGain ≤ 7.11 lbs.
(b) The t-statistic is t^act = 13.9/2.5 = 5.56, which has a p-value of 0.00. Thus the null hypothesis is rejected at the 5% (and 1%) level.
(c) 13.9 ± 2.58 × 2.5 = 13.9 ± 6.45.
5.7. (a) The t-statistic is 3.2/1.5 = 2.13, with a p-value of 0.03; since the p-value is less than 0.05, the null hypothesis is rejected at the 5% level.
(b) 3.2 ± 1.96 × 1.5 = 3.2 ± 2.94.
(c) Yes. If Y and X are independent, then β₁ = 0; but this null hypothesis was rejected at the 5% level in part (a).
(d) β₁ = 0 would be rejected at the 5% level in 5% of the samples; 95% of the confidence intervals would contain the value β₁ = 0.
(1/n)(Y₁ + Y₂ + ⋯ + Y_n)

SE(β̂₁) = 7.65.
[SE(β̂_{m,1})]² + [SE(β̂_{w,1})]², and the result follows by noting that the SE is the square root of the estimated variance.
Chapter 6
Linear Regression with
Multiple Regressors
6.1. By Equation (6.15) in the text, we know
R̄² = 1 − [(n − 1)/(n − k − 1)] × (1 − R²).
Thus, the values of R̄² are 0.175, 0.189, and 0.193 for columns (1)–(3).
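Equation (6.15) can be wrapped in a small helper; the example inputs below are hypothetical, not the values behind columns (1)–(3):

```python
# Adjusted R-squared from Equation (6.15).
def adj_r2(r2, n, k):
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)

example = adj_r2(0.20, 420, 3)  # hypothetical R^2 = 0.20, n = 420, k = 3
```

With k = 0 the adjustment vanishes, and adding regressors (larger k) with R² held fixed always lowers R̄².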
6.3. (a) On average, a worker earns $0.29/hour more for each year he ages.
(b) Sally's earnings prediction is 4.40 + 5.48 × 1 − 2.62 × 1 + 0.29 × 29 = 15.67 dollars per hour.
Betsy's earnings prediction is 4.40 + 5.48 × 1 − 2.62 × 1 + 0.29 × 34 = 17.12 dollars per hour.
The difference is $1.45 per hour.
6.5. (a) $23,400 (recall that Price is measured in $1000s).
(b) In this case ΔBDR = 1 and ΔHsize = 100. The resulting expected change in price is 23.4 + 0.156 × 100 = 39.0 thousand dollars, or $39,000.
(c) The loss is $48,800.
(d) From the text, R̄² = 1 − [(n − 1)/(n − k − 1)] × (1 − R²), so that R² = 1 − [(n − k − 1)/(n − 1)] × (1 − R̄²); thus R² = 0.727.
(b) The description suggests that the research goes a long way towards controlling for
potential omitted variable bias. Yet, there still may be problems. Omitted from the
analysis are characteristics associated with behavior that led to incarceration
(excessive drug or alcohol use, gang activity, and so forth), that might be correlated
with future earnings. Ideally, data on these variables should be included in the
analysis as additional control variables.
6.9. For omitted variable bias to occur, two conditions must be true: X1 (the included
regressor) is correlated with the omitted variable, and the omitted variable is a
determinant of the dependent variable. Since X1 and X2 are uncorrelated, the estimator of
b1 does not suffer from omitted variable bias.
(b) ∂Σ(Yᵢ − b₁X₁ᵢ − b₂X₂ᵢ)²/∂b₁ = −2ΣX₁ᵢ(Yᵢ − b₁X₁ᵢ − b₂X₂ᵢ)
 ∂Σ(Yᵢ − b₁X₁ᵢ − b₂X₂ᵢ)²/∂b₂ = −2ΣX₂ᵢ(Yᵢ − b₁X₁ᵢ − b₂X₂ᵢ)

(c) From (b), β̂₁ satisfies
ΣX₁ᵢ(Yᵢ − β̂₁X₁ᵢ − β̂₂X₂ᵢ) = 0,
or
β̂₁ = (ΣX₁ᵢYᵢ − β̂₂ΣX₁ᵢX₂ᵢ) / ΣX₁ᵢ²,
and the result follows immediately.

(d) Following the same analysis as in (c),
β̂₂ = (ΣX₂ᵢYᵢ − β̂₁ΣX₁ᵢX₂ᵢ) / ΣX₂ᵢ²,
and substituting this into the expression for β̂₁ in (c) yields
β̂₁ = [ ΣX₁ᵢYᵢ − ((ΣX₂ᵢYᵢ − β̂₁ΣX₁ᵢX₂ᵢ)/ΣX₂ᵢ²) × ΣX₁ᵢX₂ᵢ ] / ΣX₁ᵢ².
(e) ∂Σ(Yᵢ − b₀ − b₁X₁ᵢ − b₂X₂ᵢ)²/∂b₀ = −2Σ(Yᵢ − b₀ − b₁X₁ᵢ − b₂X₂ᵢ).
Setting this to zero and solving for β̂₀ yields β̂₀ = Ȳ − β̂₁X̄₁ − β̂₂X̄₂.

(f) Substituting β̂₀ = Ȳ − β̂₁X̄₁ − β̂₂X̄₂ into the least squares objective function yields
Σ(Yᵢ − β̂₀ − b₁X₁ᵢ − b₂X₂ᵢ)² = Σ[ (Yᵢ − Ȳ) − b₁(X₁ᵢ − X̄₁) − b₂(X₂ᵢ − X̄₂) ]²,
which is identical to the least squares objective function in part (a), except that all variables have been replaced with deviations from sample means. The result then follows as in (c).

Notice that the estimator for β₁ is identical to the OLS estimator from the regression of Y onto X₁, omitting X₂. Said differently, when Σ(X₁ᵢ − X̄₁)(X₂ᵢ − X̄₂) = 0, the estimated coefficient on X₁ in the OLS regression of Y onto both X₁ and X₂ is the same as the estimated coefficient in the OLS regression of Y onto X₁.
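The normal equations in (c) and (d) can be solved jointly as a 2×2 linear system; a small numeric check on made-up, no-intercept data (illustrative):

```python
# Regressors and an exactly linear outcome: Y = 2*X1 + 3*X2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [2 * a + 3 * b for a, b in zip(x1, x2)]

s11 = sum(a * a for a in x1)
s22 = sum(b * b for b in x2)
s12 = sum(a * b for a, b in zip(x1, x2))
s1y = sum(a * c for a, c in zip(x1, y))
s2y = sum(b * c for b, c in zip(x2, y))

# Solve s11*b1 + s12*b2 = s1y and s12*b1 + s22*b2 = s2y by Cramer's rule.
det = s11 * s22 - s12**2
b1_hat = (s1y * s22 - s2y * s12) / det
b2_hat = (s2y * s11 - s1y * s12) / det
```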
Chapter 7
Hypothesis Tests and Confidence
Intervals in Multiple Regression
7.1 and 7.2
Regressor (1) (2) (3)
College (X1) 5.46** 5.48** 5.44**
(0.21) (0.21) (0.21)
Female (X2) - 2.64** - 2.62** - 2.62**
(0.20) (0.20) (0.20)
Age (X3) 0.29** 0.29**
(0.04) (0.04)
Northeast (X4) 0.69*
(0.30)
Midwest (X5) 0.60*
(0.28)
South (X6) - 0.27
(0.26)
Intercept 12.69** 4.40** 3.75**
(0.14) (1.05) (1.06)
(a) The t-statistic is 5.46/0.21 = 26.0, which exceeds 1.96 in absolute value. Thus, the coefficient is statistically significant at the 5% level. The 95% confidence interval is 5.46 ± 1.96 × 0.21.
(b) The t-statistic is −2.64/0.20 = −13.2, and 13.2 > 1.96, so the coefficient is statistically significant at the 5% level. The 95% confidence interval is −2.64 ± 1.96 × 0.20.
7.5. The t-statistic for the difference in the college coefficients is
t = (β̂_college,1998 − β̂_college,1992) / SE(β̂_college,1998 − β̂_college,1992).
Because β̂_college,1998 and β̂_college,1992 are computed from independent samples, they are independent, which means that cov(β̂_college,1998, β̂_college,1992) = 0. Thus, var(β̂_college,1998 − β̂_college,1992) = var(β̂_college,1998) + var(β̂_college,1992).
and test whether γ = 0.
(b) Estimate
Yᵢ = β₀ + γX₁ᵢ + β₂(X₂ᵢ − aX₁ᵢ) + uᵢ
and test whether γ = 0.
(c) Estimate
Yᵢ − X₁ᵢ = β₀ + γX₁ᵢ + β₂(X₂ᵢ − X₁ᵢ) + uᵢ
and test whether γ = 0.
(b) Because treatment was randomly assigned conditional on enrollment status (continuing or newly enrolled), E(u | X₁, X₂) will not depend on X₁. This means that the assumption of conditional mean independence is satisfied, and β̂₁ is unbiased and consistent. However, because X₂ was not randomly assigned (newly enrolled students may, on average, have attributes other than being newly enrolled that affect test scores), E(u | X₁, X₂) may depend on X₂, so that β̂₂ may be biased and inconsistent.
Chapter 8
Nonlinear Regression Functions
8.1. (a) The percentage increase in sales is 100 × (198 − 196)/196 = 1.0204%. The approximation is 100 × [ln(198) − ln(196)] = 1.0152%.
(b) When Sales₂₀₁₀ = 205, the percentage increase is 100 × (205 − 196)/196 = 4.5918% and the approximation is 100 × [ln(205) − ln(196)] = 4.4895%. When Sales₂₀₁₀ = 250, the percentage increase is 100 × (250 − 196)/196 = 27.551% and the approximation is 100 × [ln(250) − ln(196)] = 24.335%. When Sales₂₀₁₀ = 500, the percentage increase is 100 × (500 − 196)/196 = 155.1% and the approximation is 100 × [ln(500) − ln(196)] = 93.649%.
(c) The approximation works well when the change is small. The quality of the approximation deteriorates as the percentage change increases.
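The exact percentage change and its log approximation can be compared directly (illustrative check):

```python
import math

def pct_change(s0, s1):
    # Exact percentage change.
    return 100.0 * (s1 - s0) / s0

def log_approx(s0, s1):
    # Logarithmic approximation to the percentage change.
    return 100.0 * (math.log(s1) - math.log(s0))

exact_198 = pct_change(196, 198)   # approx 1.0204
approx_198 = log_approx(196, 198)  # approx 1.0152
exact_500 = pct_change(196, 500)   # approx 155.1
approx_500 = log_approx(196, 500)  # approx 93.649
```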
(ii) As described in Equation (8.8) and the footnote on page 261, the standard error can be found by dividing 0.28, the absolute value of the estimate, by the square root of the F-statistic testing β_ln(Price per citation) + ln(80) × β_ln(Age)×ln(Price per citation) = 0.
(c) ln(Characters/a) = ln(Characters) − ln(a) for any constant a. Thus the estimated parameter on ln(Characters) will not change, and the constant (intercept) will change.
(ii) Female is correlated with the two new included variables and at least one of the
variables is important for explaining ln(Earnings). Thus the regression in part (a)
suffered from omitted variable bias.
(c) Setting aside the effect of Return, whose effect seems small and statistically insignificant, the omitted variable bias formula (see Equation (6.1)) suggests that Female is negatively correlated with ln(MarketValue).
8.9. Note that
Y = β₀ + β₁X + β₂X² = β₀ + (β₁ + 21β₂)X + β₂(X² − 21X).
Define a new regressor Z = X² − 21X, and estimate
Y = β₀ + γX + β₂Z + uᵢ.
The confidence interval is γ̂ ± 1.96 × SE(γ̂).
8.11. Linear model: E(Y | X) = β₀ + β₁X, so that dE(Y | X)/dX = β₁ and the elasticity is
[dE(Y | X)/dX] × X/E(Y | X) = β₁X/(β₀ + β₁X).
Log-log model: E(Y | X) = E(e^(β₀ + β₁ ln(X) + u) | X) = e^(β₀ + β₁ ln(X)) × E(e^u | X) = c × e^(β₀ + β₁ ln(X)), where c = E(e^u | X), which does not depend on X because u and X are assumed to be independent. Thus
dE(Y | X)/dX = c × e^(β₀ + β₁ ln(X)) × (β₁/X) = β₁ × E(Y | X)/X,
and the elasticity is β₁.
Chapter 9
Assessing Studies Based on
Multiple Regression
9.1. As explained in the text, potential threats to external validity arise from differences between the
population and setting studied and the population and setting of interest. The statistical results
based on New York in the 1970s are likely to apply to Boston in the 1970s but not to Los Angeles
in the 1970s. In 1970, New York and Boston had large and widely used public transportation
systems. Attitudes about smoking were roughly the same in New York and Boston in the 1970s. In
contrast, Los Angeles had a considerably smaller public transportation system in 1970. Most
residents of Los Angeles relied on their cars to commute to work, school, and so forth. The results
from New York in the 1970s are unlikely to apply to New York in 2010. Attitudes towards
smoking changed significantly from 1970 to 2010.
9.3. The key is that the selected sample contains only employed women. Consider two women, Beth
and Julie. Beth has no children; Julie has one child. Beth and Julie are otherwise identical. Both
can earn $25,000 per year in the labor market. Each must compare the $25,000 benefit to the costs
of working. For Beth, the cost of working is forgone leisure. For Julie, it is forgone leisure and the
costs (pecuniary and other) of child care. If Beth is just on the margin between working in the
labor market or not, then Julie, who has a higher opportunity cost, will decide not to work in the
labor market. Instead, Julie will work in “home production,” caring for children, and so forth.
Thus, on average, women with children who decide to work are women who earn higher wages in
the labor market.
9.5. (a) Q = (γ₁β₀ − γ₀β₁)/(γ₁ − β₁) + (γ₁u − β₁v)/(γ₁ − β₁)
and
P = (β₀ − γ₀)/(γ₁ − β₁) + (u − v)/(γ₁ − β₁).

(b) E(Q) = (γ₁β₀ − γ₀β₁)/(γ₁ − β₁), E(P) = (β₀ − γ₀)/(γ₁ − β₁)

(c) var(Q) = [1/(γ₁ − β₁)]² × (γ₁²σ_u² + β₁²σ_v²), var(P) = [1/(γ₁ − β₁)]² × (σ_u² + σ_v²),
and
cov(P, Q) = [1/(γ₁ − β₁)]² × (γ₁σ_u² + β₁σ_v²)

(d) (i) β̂₁ →p cov(Q, P)/var(P) = (γ₁σ_u² + β₁σ_v²)/(σ_u² + σ_v²), and β̂₀ →p E(Q) − [cov(P, Q)/var(P)] × E(P).
(ii) β̂₁ − β₁ →p σ_u²(γ₁ − β₁)/(σ_u² + σ_v²) > 0, using the fact that γ₁ > 0 (supply curves slope up) and β₁ < 0 (demand curves slope down).
9.9. Both regressions suffer from omitted variable bias so that they will not provide reliable estimates
of the causal effect of income on test scores. However, the nonlinear regression in (8.18) fits the
data well, so that it could be used for forecasting.
9.11. Again, there are reasons for concern. Here are a few.
Internal validity: To the extent that price is affected by demand, there may be simultaneous equations bias.
External validity: The internet and the introduction of "e-journals" may induce important changes in the market for academic journals, so that the results for 2000 may not be relevant for today's market.
9.13. (a) β̂₁ = Σ_{i=1}^{300} (X̃ᵢ − X̄)(Yᵢ − Ȳ) / Σ_{i=1}^{300} (X̃ᵢ − X̄)². Because all of the Xᵢ's are used (although some are paired with the wrong observation's Y), X̃̄ = X̄ and Σ(X̃ᵢ − X̄)² = Σ(Xᵢ − X̄)², so that

β̂₁ = β₁ × [Σ_{i=1}^{0.8n} (Xᵢ − X̄)²] / [Σ_{i=1}^{n} (Xᵢ − X̄)²] + β₁ × [Σ_{i=0.8n+1}^{n} (X̃ᵢ − X̄)(Xᵢ − X̄)] / [Σ_{i=1}^{n} (Xᵢ − X̄)²] + [Σ_{i=1}^{n} (X̃ᵢ − X̄)(uᵢ − ū)] / [Σ_{i=1}^{n} (Xᵢ − X̄)²]

 = β₁ × [(1/n)Σ_{i=1}^{0.8n} (Xᵢ − X̄)²] / [(1/n)Σ_{i=1}^{n} (Xᵢ − X̄)²] + β₁ × [(1/n)Σ_{i=0.8n+1}^{n} (X̃ᵢ − X̄)(Xᵢ − X̄)] / [(1/n)Σ_{i=1}^{n} (Xᵢ − X̄)²] + [(1/n)Σ_{i=1}^{n} (X̃ᵢ − X̄)(uᵢ − ū)] / [(1/n)Σ_{i=1}^{n} (Xᵢ − X̄)²],

where n = 300, and the last equality uses an ordering of the observations so that the first 240 observations (= 0.8n) correspond to the correctly measured observations (X̃ᵢ = Xᵢ).

As is done elsewhere in the book, we interpret n = 300 as a large sample, so we use the approximation of n tending to infinity. The solution provided here thus shows that these expressions are approximately true for n large and hold in the limit that n tends to infinity.

Each of the averages in the expression for β̂₁ has the following probability limit:
(1/n)Σ_{i=1}^{n} (Xᵢ − X̄)² →p σ_X²,
(1/n)Σ_{i=1}^{0.8n} (Xᵢ − X̄)² →p 0.8σ_X²,
(1/n)Σ_{i=1}^{n} (X̃ᵢ − X̄)(uᵢ − ū) →p 0, and
(1/n)Σ_{i=0.8n+1}^{n} (X̃ᵢ − X̄)(Xᵢ − X̄) →p 0,
where the last result follows because X̃ᵢ = Xⱼ for the scrambled observations and Xⱼ is independent of Xᵢ for i ≠ j. Taken together, these results imply β̂₁ →p 0.8β₁.

(b) Because β̂₁ →p 0.8β₁, β̂₁/0.8 →p β₁, so a consistent estimator of β₁ is the OLS estimator divided by 0.8.
(c) Yes, the estimator based on the first 240 observations is better than the adjusted estimator
from part (b). Equation (4.21) in Key Concept 4.4 (page 129) implies that the estimator based
on the first 240 observations has a variance that is
var(β̂₁(240 obs)) = (1/240) × var[(Xᵢ − μ_X)uᵢ] / [var(Xᵢ)]².
From part (a), the OLS estimator based on all of the observations has two sources of sampling error. The first is Σ_{i=1}^{300} (X̃ᵢ − X̄)(uᵢ − ū) / Σ_{i=1}^{300} (Xᵢ − X̄)², which is the usual source that comes from the omitted factors (u). The second is β₁ × Σ_{i=241}^{300} (X̃ᵢ − X̄)(Xᵢ − X̄) / Σ_{i=1}^{300} (Xᵢ − X̄)², which is the source that comes from scrambling the data. These two terms are uncorrelated in large samples, and their respective large-sample variances are
var( Σ_{i=1}^{300} (X̃ᵢ − X̄)(uᵢ − ū) / Σ_{i=1}^{300} (Xᵢ − X̄)² ) = (1/300) × var[(Xᵢ − μ_X)uᵢ] / [var(Xᵢ)]²
and
Thus
Chapter 10
Regression with Panel Data
10.3. The five potential threats to the internal validity of a regression study are: omitted variables,
misspecification of the functional form, imprecise measurement of the independent variables,
sample selection, and simultaneous causality. You should think about these threats onebyone.
Are there important omitted variables that affect traffic fatalities and that may be correlated with
the other variables included in the regression? The most obvious candidates are the safety of roads,
weather, and so forth. These variables are essentially constant over the sample period, so their
effect is captured by the state fixed effects. You may think of something that we missed. Since
most of the variables are binary variables, the largest functional form choice involves the Beer Tax variable. A linear specification is used in the text, which seems generally consistent with the data in Figure 8.2. To check the reliability of the linear specification, it would be useful to consider a log specification or a quadratic. Measurement error does not appear to be a problem, as variables like traffic fatalities and taxes are accurately measured. Similarly, sample selection is not a problem because data were used from all of the states. Simultaneous causality could be a potential problem.
That is, states with high fatality rates might decide to increase taxes to reduce consumption. Expert
knowledge is required to determine if this is a problem.
10.5. Let D2ᵢ = 1 if i = 2 and 0 otherwise; D3ᵢ = 1 if i = 3 and 0 otherwise; …; Dnᵢ = 1 if i = n and 0 otherwise. Let B2ₜ = 1 if t = 2 and 0 otherwise; B3ₜ = 1 if t = 3 and 0 otherwise; …; BTₜ = 1 if t = T and 0 otherwise. Let β₀ = α₁ + λ₁, γᵢ = αᵢ − α₁, and δₜ = λₜ − λ₁.
10.9. (a) α̂ᵢ = (1/T) Σ_{t=1}^{T} Yᵢₜ, which has variance σ_u²/T. Because T is not growing, the variance does not get small; α̂ᵢ is not consistent.
(b) The average in (a) is computed over T observations. In this case T is small (T = 4), so the normal approximation from the CLT is not likely to be very good.
10.11. Using the hint, Equation (10.22) can be written as
β̂₁^DM = Σ_{i=1}^{n} [ (1/4)(Xᵢ₂ − Xᵢ₁)(Yᵢ₂ − Yᵢ₁) + (1/4)(Xᵢ₂ − Xᵢ₁)(Yᵢ₂ − Yᵢ₁) ] / Σ_{i=1}^{n} [ (1/4)(Xᵢ₂ − Xᵢ₁)² + (1/4)(Xᵢ₂ − Xᵢ₁)² ]
 = Σ_{i=1}^{n} (Xᵢ₂ − Xᵢ₁)(Yᵢ₂ − Yᵢ₁) / Σ_{i=1}^{n} (Xᵢ₂ − Xᵢ₁)²
 = β̂₁^BA
Chapter 11
Regression with a Binary
Dependent Variable
The probabilities are similar except when experience is large (> 40 years). In this case the LPM produces nonsensical results (probabilities greater than 1.0).
(b) With the P/I ratio reduced to 0.30, the probability of being denied is
F(−4.13 + 5.37 × 0.30 + 1.27) = 1/(1 + e^1.249) = 22.29%.
The difference in denial probabilities compared to (a) is 4.99 percentage points lower.
(c) For a white applicant having a P/I ratio of 0.35, the probability that the application will be denied is F(−4.13 + 5.37 × 0.35) = 1/(1 + e^2.2505) = 9.53%. If the P/I ratio is reduced to 0.30, the probability of being denied is F(−4.13 + 5.37 × 0.30) = 1/(1 + e^2.519) = 7.45%. The difference in denial probabilities is 2.08 percentage points lower.
(d) From the results in parts (a)–(c), we can see that the marginal effect of the P/I ratio on the
probability of mortgage denial depends on race. In the logit regression functional form,
the marginal effect depends on the level of probability which in turn depends on the race
of the applicant. The coefficient on black is statistically significant at the 1% level. The logit
and probit results are similar.
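The logit probabilities in (b) and (c) come from the logistic CDF, F(z) = 1/(1 + e^(−z)); a numeric check using the estimated coefficients above (illustrative):

```python
import math

def logistic_cdf(z):
    # Logistic CDF: F(z) = 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + math.exp(-z))

p_black_30 = logistic_cdf(-4.13 + 5.37 * 0.30 + 1.27)  # approx 0.2229
p_white_35 = logistic_cdf(-4.13 + 5.37 * 0.35)         # approx 0.0953
p_white_30 = logistic_cdf(-4.13 + 5.37 * 0.30)         # approx 0.0745
```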
Chapter 12
Instrumental Variables Regression
12.3. (a) The estimator σ̂_a² = (1/(n − 2)) Σ_{i=1}^{n} (Yᵢ − β̂₀^TSLS − β̂₁^TSLS X̂ᵢ)² is not consistent. Write this as
σ̂_a² = (1/(n − 2)) Σ_{i=1}^{n} (ûᵢ − β̂₁^TSLS(X̂ᵢ − Xᵢ))²,
where ûᵢ = Yᵢ − β̂₀^TSLS − β̂₁^TSLS Xᵢ. Replacing β̂₁^TSLS with β₁, as suggested in the question, write this as
σ̂_a² ≈ (1/n) Σ_{i=1}^{n} (uᵢ − β₁(X̂ᵢ − Xᵢ))² = (1/n) Σ_{i=1}^{n} uᵢ² + (1/n) Σ_{i=1}^{n} [ β₁²(X̂ᵢ − Xᵢ)² − 2uᵢβ₁(X̂ᵢ − Xᵢ) ].
The first term on the right-hand side of the equation converges to σ_u², but the second term converges to something that is nonzero. Thus σ̂_a² is not consistent.

(b) The estimator σ̂_b² = (1/(n − 2)) Σ_{i=1}^{n} (Yᵢ − β̂₀^TSLS − β̂₁^TSLS Xᵢ)² is consistent. Using the same notation as in (a), we can write σ̂_b² ≈ (1/n) Σ_{i=1}^{n} uᵢ², and this estimator converges in probability to σ_u².
and instrument exogeneity, E(uᵢ | Z₁ᵢ, Z₂ᵢ) = 0, is rejected.
(b) The J-test suggests that E(uᵢ | Z₁ᵢ, Z₂ᵢ) ≠ 0, but it doesn't provide evidence about whether the problem is with Z₁ or Z₂ or both.
(b) The draft was determined by a national lottery, so the choice of serving in the military was random. Because it was randomly selected, the lottery number is uncorrelated with individual characteristics that may affect earnings, and hence the instrument is exogenous. Because it affected the probability of serving in the military, the lottery number is relevant.
Chapter 13
Experiments and QuasiExperiments
13.1. For students in kindergarten, the estimated small class treatment effect relative to being in a regular class is an increase of 13.90 points on the test with a standard error of 2.45. The 95% confidence interval is 13.90 ± 1.96 × 2.45 = [9.098, 18.702].
For students in grade 1, the estimated small class treatment effect relative to being in a regular class is an increase of 29.78 points on the test with a standard error of 2.83. The 95% confidence interval is 29.78 ± 1.96 × 2.83 = [24.233, 35.327].
For students in grade 2, the estimated small class treatment effect relative to being in a regular class is an increase of 19.39 points on the test with a standard error of 2.71. The 95% confidence interval is 19.39 ± 1.96 × 2.71 = [14.078, 24.702].
For students in grade 3, the estimated small class treatment effect relative to being in a regular class is an increase of 15.59 points on the test with a standard error of 2.40. The 95% confidence interval is 15.59 ± 1.96 × 2.40 = [10.886, 20.294].
13.3. (a) The estimated average treatment effect is $\bar{X}_{Treatment} - \bar{X}_{Control} = 1241 - 1201 = 40$ points.
(b) There would be nonrandom assignment if men (or women) had different probabilities of being assigned to the treatment and control groups. Let $p_{Men}$ denote the probability that a male is assigned to the treatment group. Random assignment means $p_{Men} = 0.5$. Testing this null hypothesis results in a t-statistic of
$$t_{Men} = \frac{\hat{p}_{Men} - 0.5}{\sqrt{\frac{1}{n_{Men}}\hat{p}_{Men}(1 - \hat{p}_{Men})}} = \frac{0.55 - 0.50}{\sqrt{\frac{1}{100}\,0.55(1 - 0.55)}} = 1.00,$$
so that the null of random assignment cannot be rejected at the 10% level. A similar result is found for women.
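The t-statistic above can be checked with a short sketch; the numbers are taken from the text, and the helper name is ours.

```python
import math

# Check of the random-assignment t-statistic above (values from the text).
def t_proportion(p_hat, p0, n):
    """t-statistic for H0: p = p0, using the estimated variance p_hat(1-p_hat)/n."""
    return (p_hat - p0) / math.sqrt(p_hat * (1 - p_hat) / n)

t_men = t_proportion(0.55, 0.50, 100)
print(t_men)  # about 1.0, well below the 10% two-sided critical value of 1.64
```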
13.7. From the population regression
$$Y_{it} = \alpha_i + \beta_1 X_{it} + \beta_2 (D_t \times W_i) + \beta_0 D_t + v_{it},$$
we have
$$Y_{i2} - Y_{i1} = \beta_1(X_{i2} - X_{i1}) + \beta_2[(D_2 - D_1)\times W_i] + \beta_0(D_2 - D_1) + (v_{i2} - v_{i1}).$$
By defining $\Delta Y_i = Y_{i2} - Y_{i1}$, $\Delta X_i = X_{i2} - X_{i1}$ (a binary treatment variable), and $u_i = v_{i2} - v_{i1}$, and using $D_1 = 0$ and $D_2 = 1$, we can rewrite this equation as
$$\Delta Y_i = \beta_0 + \beta_1 \Delta X_i + \beta_2 W_i + u_i,$$
which is Equation (13.5) in the case of a single W regressor.
46 Stock/Watson Introduction to Econometrics Third Edition
$$\begin{aligned}
\operatorname{cov}(\beta_{1i}X_i, X_i) &= E\{[\beta_{1i}X_i - E(\beta_{1i}X_i)][X_i - E(X_i)]\}\\
&= E\{\beta_{1i}X_i^2 - E(\beta_{1i}X_i)X_i - \beta_{1i}X_iE(X_i) + E(\beta_{1i}X_i)E(X_i)\}\\
&= E(\beta_{1i}X_i^2) - E(\beta_{1i}X_i)E(X_i).
\end{aligned}$$
Because $X_i$ is randomly assigned, $X_i$ is distributed independently of $\beta_{1i}$. The independence means
$$E(\beta_{1i}X_i) = E(\beta_{1i})E(X_i) \quad\text{and}\quad E(\beta_{1i}X_i^2) = E(\beta_{1i})E(X_i^2),$$
so
$$\operatorname{cov}(\beta_{1i}X_i, X_i) = E(\beta_{1i})E(X_i^2) - E(\beta_{1i})[E(X_i)]^2 = E(\beta_{1i})\sigma_X^2,$$
and therefore
$$\frac{\operatorname{cov}(\beta_{1i}X_i, X_i)}{\sigma_X^2} = E(\beta_{1i}).$$
13.11. Following the notation used in Chapter 13, let $\pi_{1i}$ denote the coefficient on state sales tax in the "first stage" IV regression, and let $\beta_{1i}$ denote the cigarette demand elasticity. (In both cases, suppose that income has been controlled for in the analysis.) From (13.11),
$$\hat{\beta}^{TSLS} \xrightarrow{p} \frac{E(\beta_{1i}\pi_{1i})}{E(\pi_{1i})} = E(\beta_{1i}) + \frac{\operatorname{cov}(\beta_{1i}, \pi_{1i})}{E(\pi_{1i})} = \text{Average Treatment Effect} + \frac{\operatorname{cov}(\beta_{1i}, \pi_{1i})}{E(\pi_{1i})},$$
where the first equality uses properties of covariances (Equation (2.34)), and the second equality uses the definition of the average treatment effect. Evidently, the local average treatment effect will deviate from the average treatment effect when $\operatorname{cov}(\beta_{1i}, \pi_{1i}) \neq 0$. As discussed in Section 13.6, this covariance is zero when $\beta_{1i}$ or $\pi_{1i}$ is constant. This seems likely. But, for the sake of argument, suppose that they are not constant; that is, suppose the demand elasticity differs from state to state ($\beta_{1i}$ is not constant), as does the effect of sales taxes on cigarette prices ($\pi_{1i}$ is not constant). Are $\beta_{1i}$ and $\pi_{1i}$ related? Microeconomics suggests that they might be. Recall from your microeconomics class that the lower the demand elasticity, the larger the fraction of a sales tax that is passed along to consumers in terms of higher prices. This suggests that $\beta_{1i}$ and $\pi_{1i}$ are positively related, so that $\operatorname{cov}(\beta_{1i}, \pi_{1i}) > 0$. Because $E(\pi_{1i}) > 0$, this suggests that the local average treatment effect is greater than the average treatment effect when $\beta_{1i}$ varies from state to state.
Chapter 14
Introduction to Time Series Regression
and Forecasting
with the conditional variance $\sigma^2_{t|t-1} = E[(Y_t - Y_{t|t-1})^2]$. This expression is minimized when the second term equals zero, that is, when $f_{t-1} = Y_{t|t-1}$. (An alternative is to use the hint and notice that the result follows immediately from Exercise 2.27.)
(c) Applying Equation (2.27), we know the error $u_t$ is uncorrelated with $u_{t-1}$ if $E(u_t\,|\,u_{t-1}) = 0$. From Equation (14.14) for the AR(p) process,
$$u_{t-1} = Y_{t-1} - \beta_0 - \beta_1 Y_{t-2} - \beta_2 Y_{t-3} - \cdots - \beta_p Y_{t-p-1} = f(Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p-1}),$$
so $u_{t-1}$ is a function of $(Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p-1})$. Because $E(u_t\,|\,Y_{t-1}, Y_{t-2}, \ldots) = 0$, it follows that $E(u_t\,|\,u_{t-1}) = 0$. Thus $u_t$ and $u_{t-1}$ are uncorrelated. A similar argument shows that $u_t$ and $u_{t-j}$ are uncorrelated for all $j \geq 1$. Thus $u_t$ is serially uncorrelated.
(b) The first autocovariance is
$$\operatorname{cov}(Y_t, Y_{t-1}) = \operatorname{cov}(2.5 + 0.7Y_{t-1} + u_t,\; Y_{t-1}) = 0.7\operatorname{var}(Y_{t-1}) + \operatorname{cov}(u_t, Y_{t-1}) = 0.7\sigma_Y^2 = 0.7 \times 17.647 = 12.353.$$
The second autocovariance is
$$\operatorname{cov}(Y_t, Y_{t-2}) = \operatorname{cov}[(1 + 0.7)\times 2.5 + 0.7^2 Y_{t-2} + u_t + 0.7u_{t-1},\; Y_{t-2}] = 0.7^2\operatorname{var}(Y_{t-2}) + \operatorname{cov}(u_t + 0.7u_{t-1},\, Y_{t-2}) = 0.7^2\sigma_Y^2 = 0.49 \times 17.647 = 8.6471.$$
(c) The first autocorrelation is
$$\operatorname{corr}(Y_t, Y_{t-1}) = \frac{\operatorname{cov}(Y_t, Y_{t-1})}{\sqrt{\operatorname{var}(Y_t)\operatorname{var}(Y_{t-1})}} = \frac{0.7\sigma_Y^2}{\sigma_Y^2} = 0.7.$$
The second autocorrelation is
$$\operatorname{corr}(Y_t, Y_{t-2}) = \frac{\operatorname{cov}(Y_t, Y_{t-2})}{\sqrt{\operatorname{var}(Y_t)\operatorname{var}(Y_{t-2})}} = \frac{0.7^2\sigma_Y^2}{\sigma_Y^2} = 0.49.$$
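These moments can be checked with a small sketch; it assumes $\operatorname{var}(u_t) = 9$, which delivers $\operatorname{var}(Y_t) = 9/(1 - 0.7^2) \approx 17.647$, the value used in the text.

```python
# Check of the AR(1) second moments above, assuming var(u_t) = 9 so that
# var(Y_t) = 9 / (1 - 0.7**2) = 17.647, matching the value used in the text.
beta1, var_u = 0.7, 9.0

var_y = var_u / (1 - beta1 ** 2)   # stationary variance of Y_t

def autocov(j):
    return beta1 ** j * var_y      # j-th autocovariance of a stationary AR(1)

def autocorr(j):
    return beta1 ** j              # j-th autocorrelation

print(round(var_y, 3), round(autocov(1), 3), round(autocov(2), 4))
print(autocorr(1), round(autocorr(2), 2))
```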
where the final equality follows from $\operatorname{var}(e_t) = \sigma_e^2$ for all $t$ and $\operatorname{cov}(e_t, e_i) = 0$ for $i \neq t$.
(c) Write $Y_t = \beta_0 + e_t + b_1 e_{t-1} + b_2 e_{t-2} + \cdots + b_q e_{t-q}$ and $Y_{t-j} = \beta_0 + e_{t-j} + b_1 e_{t-1-j} + b_2 e_{t-2-j} + \cdots + b_q e_{t-q-j}$, so that
$$\operatorname{cov}(Y_t, Y_{t-j}) = \sum_{k=0}^{q}\sum_{m=0}^{q} b_k b_m \operatorname{cov}(e_{t-k}, e_{t-j-m}),$$
where $b_0 = 1$. Notice that for $j > q$, $\operatorname{cov}(e_{t-k}, e_{t-j-m}) = 0$ for all terms in the sum, so $\operatorname{cov}(Y_t, Y_{t-j}) = 0$.
(d) $\operatorname{var}(Y_t) = \sigma_e^2(1 + b_1^2)$, $\operatorname{cov}(Y_t, Y_{t-1}) = \sigma_e^2 b_1$, and $\operatorname{cov}(Y_t, Y_{t-j}) = 0$ for $j > 1$.
14.11. Write the model as $Y_t - Y_{t-1} = \beta_0 + \beta_1(Y_{t-1} - Y_{t-2}) + u_t$. Rearranging yields
$$Y_t = \beta_0 + (1 + \beta_1)Y_{t-1} - \beta_1 Y_{t-2} + u_t.$$
Chapter 15
Estimation of Dynamic Causal Effects
Period ahead (i) | Dynamic multiplier (β_i) | Predicted effect on output growth (25 β_i) | 95% confidence interval 25 × [β_i ± 1.96 SE(β_i)]
0                | −0.055                   | −1.375                                     | [−4.021, 1.271]
(c) The predicted cumulative change in GDP growth over eight quarters is
$$25 \times (-0.055 - 0.026 - 0.031 - 0.109 - 0.128 + 0.008 + 0.025 - 0.019) = -8.375\%.$$
(d) The 1% critical value for the F-test is 2.407. Since the HAC F-statistic of 3.49 is larger than the critical value, we reject the null hypothesis that all the coefficients are zero at the 1% level.
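The part (c) arithmetic can be sketched directly; the eight dynamic multipliers are copied from the text.

```python
# The part (c) arithmetic; the eight dynamic multipliers are from the text.
multipliers = [-0.055, -0.026, -0.031, -0.109, -0.128, 0.008, 0.025, -0.019]

# A 25% oil-price increase times the cumulative eight-quarter multiplier:
cumulative_effect = 25 * sum(multipliers)
print(round(cumulative_effect, 3))  # -8.375 (percent)
```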
15.3. The dynamic causal effects are for experiment A. The regression in exercise 15.1 does not control
for interest rates, so that interest rates are assumed to evolve in their “normal pattern” given changes
in oil prices.
15.5. Substituting
$$X_t = \Delta X_t + X_{t-1} = \Delta X_t + \Delta X_{t-1} + X_{t-2} = \cdots = \Delta X_t + \Delta X_{t-1} + \cdots + \Delta X_{t-r+1} + X_{t-r}$$
into Equation (15.4), we have
$$\begin{aligned}
Y_t &= \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + \beta_3 X_{t-2} + \cdots + \beta_{r+1} X_{t-r} + u_t\\
&= \beta_0 + \beta_1(\Delta X_t + \Delta X_{t-1} + \cdots + \Delta X_{t-r+1} + X_{t-r}) + \beta_2(\Delta X_{t-1} + \cdots + \Delta X_{t-r+1} + X_{t-r})\\
&\quad + \cdots + \beta_r(\Delta X_{t-r+1} + X_{t-r}) + \beta_{r+1}X_{t-r} + u_t\\
&= \beta_0 + \beta_1\Delta X_t + (\beta_1 + \beta_2)\Delta X_{t-1} + (\beta_1 + \beta_2 + \beta_3)\Delta X_{t-2}\\
&\quad + \cdots + (\beta_1 + \beta_2 + \cdots + \beta_r)\Delta X_{t-r+1} + (\beta_1 + \beta_2 + \cdots + \beta_r + \beta_{r+1})X_{t-r} + u_t.
\end{aligned}$$
Comparing the above equation to Equation (15.7), we see $\delta_0 = \beta_0$, $\delta_1 = \beta_1$, $\delta_2 = \beta_1 + \beta_2$, $\delta_3 = \beta_1 + \beta_2 + \beta_3$, ..., and $\delta_{r+1} = \beta_1 + \beta_2 + \cdots + \beta_r + \beta_{r+1}$.
15.7. Write $u_t = \sum_{i=0}^{\infty} \phi_1^i \tilde{u}_{t-i}$.
(b) Because $E(u_{t-j}\,|\,\tilde{u}_{t+1}) = 0$ for $j \geq 0$, $X_t$ is exogenous. However, $E(u_{t+1}\,|\,\tilde{u}_{t+1}) = \tilde{u}_{t+1} \neq 0$, so that $X_t$ is not strictly exogenous.
(b) $(1 - \phi_1)\beta_0$ is the mean of $Y_t - \phi_1 Y_{t-1}$, which is estimated by $\frac{1}{T-1}\sum_{t=2}^{T}(Y_t - \phi_1 Y_{t-1})$. Dividing by $(1 - \phi_1)$ yields the GLS estimator of $\beta_0$.
(c) This is a rearrangement of the result in (b).
(d) Write
$$\hat{\beta}_0 = \frac{1}{T}\sum_{t=1}^{T} Y_t = \frac{1}{T}(Y_T + Y_1) + \frac{T-1}{T}\cdot\frac{1}{T-1}\sum_{t=2}^{T-1} Y_t,$$
so that
$$\hat{\beta}_0 - \hat{\beta}_0^{GLS} = \frac{1}{T}(Y_T + Y_1) - \frac{1}{T}\cdot\frac{1}{T-1}\sum_{t=2}^{T-1} Y_t - \frac{Y_T - \phi_1 Y_1}{(1-\phi_1)(T-1)},$$
and the variance of this difference is seen to be proportional to $1/T^2$.
Chapter 16
Additional Topics in Time
Series Regression
16.1. $Y_t$ follows a stationary AR(1) model, $Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$. The mean of $Y_t$ is
$$\mu_Y = E(Y_t) = \frac{\beta_0}{1 - \beta_1},$$
so the model can be written in deviations from the mean as $Y_t - \mu_Y = \beta_1(Y_{t-1} - \mu_Y) + u_t$.
The unconditional mean of $u_t$ is $E(u_t) = 0$, and the unconditional variance of $u_t$ is $\operatorname{var}(u_t) = E(\sigma_t^2) = 1.0 + 0.5\operatorname{var}(u_{t-1})$ because $E(u_t) = 0$. Because of the stationarity, $\operatorname{var}(u_{t-1}) = \operatorname{var}(u_t)$. Thus $\operatorname{var}(u_t) = 1.0 + 0.5\operatorname{var}(u_t)$, which implies $\operatorname{var}(u_t) = 1.0/0.5 = 2$.
$$\Pr(-3 \leq u_t \leq 3) = \Pr\left(\frac{-3}{1.01} \leq \frac{u_t}{\sigma_t} \leq \frac{3}{1.01}\right) = \Phi(2.9703) - \Phi(-2.9703) = 0.9985 - 0.0015 = 0.9970.$$
$$\Pr(-3 \leq u_t \leq 3) = \Pr\left(\frac{-3}{1.732} \leq \frac{u_t}{\sigma_t} \leq \frac{3}{1.732}\right) = \Phi(1.732) - \Phi(-1.732) = 0.9584 - 0.0416 = 0.9168.$$
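Both probabilities can be checked with a short sketch, writing the standard normal CDF in terms of the error function.

```python
import math

# Check of the two probability calculations above, using the standard
# normal CDF expressed through the error function.
def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

for sigma in (1.01, 1.732):
    z = 3 / sigma
    print(round(norm_cdf(z) - norm_cdf(-z), 4))
```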
So
$$\frac{1}{T}\sum_{t=1}^{T} Y_{t-1}\Delta Y_t = \frac{1}{2}\left[\frac{1}{T}\left(\sum_{t=1}^{T} Y_t^2 - \sum_{t=1}^{T} Y_{t-1}^2\right) - \frac{1}{T}\sum_{t=1}^{T}(\Delta Y_t)^2\right] = \frac{1}{2}\left[\frac{Y_T^2}{T} - \frac{1}{T}\sum_{t=1}^{T}(\Delta Y_t)^2\right] = \frac{1}{2}\left[\left(\frac{Y_T}{\sqrt{T}}\right)^2 - \frac{1}{T}\sum_{t=1}^{T}(\Delta Y_t)^2\right],$$
where the telescoping sum collapses to $Y_T^2$ because $Y_0 = 0$.
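The identity above is algebraic and holds for any sequence with $Y_0 = 0$; the sketch below verifies it on an arbitrary simulated path.

```python
import random

# Numeric sanity check of the identity above: with Y_0 = 0, for any sequence,
# (1/T) sum Y_{t-1} dY_t = (1/2) [ (Y_T / sqrt(T))**2 - (1/T) sum (dY_t)**2 ].
random.seed(0)
T = 500
dY = [random.gauss(0, 1) for _ in range(T)]
Y = [0.0]
for d in dY:
    Y.append(Y[-1] + d)  # Y_t = Y_{t-1} + dY_t with Y_0 = 0

lhs = sum(Y[t - 1] * dY[t - 1] for t in range(1, T + 1)) / T
rhs = 0.5 * ((Y[T] / T ** 0.5) ** 2 - sum(d ** 2 for d in dY) / T)
assert abs(lhs - rhs) < 1e-6
print("identity holds")
```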
16.7. The estimator is
$$\hat{\beta} = \frac{\frac{1}{T}\sum_{t=1}^{T} Y_t \Delta Y_{t+1}}{\frac{1}{T}\sum_{t=1}^{T} (\Delta Y_{t+1})^2}.$$
Following the hint, the numerator is the same expression as (16.21) (shifted forward in time one period), so that
$$\frac{1}{T}\sum_{t=1}^{T} Y_t \Delta Y_{t+1} \xrightarrow{d} \frac{\sigma_u^2}{2}(\chi_1^2 - 1).$$
The denominator satisfies
$$\frac{1}{T}\sum_{t=1}^{T}(\Delta Y_{t+1})^2 = \frac{1}{T}\sum_{t=1}^{T} u_{t+1}^2 \xrightarrow{p} \sigma_u^2$$
by the law of large numbers. The result follows directly.
$$E(u_t^2) = E(\sigma_t^2) = E(\alpha_0 + \alpha_1 u_{t-1}^2) = \alpha_0 + \alpha_1 E(u_{t-1}^2) = \alpha_0 + \alpha_1 E(u_t^2),$$
where the last line uses stationarity of $u$. Solving for $E(u_t^2)$ gives the required result, $E(u_t^2) = \alpha_0/(1 - \alpha_1)$.
(b) As in (a),
$$E(u_t^2) = E(\sigma_t^2) = E(\alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_p u_{t-p}^2) = \alpha_0 + \alpha_1 E(u_{t-1}^2) + \alpha_2 E(u_{t-2}^2) + \cdots + \alpha_p E(u_{t-p}^2) = \alpha_0 + (\alpha_1 + \alpha_2 + \cdots + \alpha_p)E(u_t^2),$$
so that
$$E(u_t^2) = \frac{\alpha_0}{1 - \sum_{i=1}^{p}\alpha_i}.$$
(c) This follows from (b) and the restriction that $E(u_t^2) > 0$.
(d) As in (a),
$$E(u_t^2) = E(\sigma_t^2) = \alpha_0 + \alpha_1 E(u_{t-1}^2) + \phi_1 E(\sigma_{t-1}^2) = \alpha_0 + (\alpha_1 + \phi_1)E(u_{t-1}^2) = \alpha_0 + (\alpha_1 + \phi_1)E(u_t^2),$$
so that
$$E(u_t^2) = \frac{\alpha_0}{1 - \alpha_1 - \phi_1}.$$
(e) This follows from (d) and the restriction that $E(u_t^2) > 0$.
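The part (a) result can be illustrated by simulation. The values $\alpha_0 = 1.0$ and $\alpha_1 = 0.5$ below are illustrative; they give $\operatorname{var}(u_t) = 2$, matching the calculation earlier in this chapter's solutions.

```python
import math
import random

# Simulation sketch of part (a): for an ARCH(1) process with
# sigma_t^2 = a0 + a1 * u_{t-1}^2, the unconditional variance is a0/(1 - a1).
# a0 = 1.0 and a1 = 0.5 are illustrative values giving var(u_t) = 2.
random.seed(1)
a0, a1, T = 1.0, 0.5, 200_000

u_prev, total = 0.0, 0.0
for _ in range(T):
    sigma = math.sqrt(a0 + a1 * u_prev ** 2)
    u = sigma * random.gauss(0, 1)
    total += u ** 2
    u_prev = u

sample_var = total / T
print(sample_var)  # close to a0 / (1 - a1) = 2.0
```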
Chapter 17
The Theory of Linear Regression
with One Regressor
17.1. (a) The least squares objective function is
$$\sum_{i=1}^{n}(Y_i - b_1 X_i)^2.$$
The first order condition is
$$\sum_{i=1}^{n} 2(Y_i - b_1 X_i)(-X_i) = 0.$$
Solving for $b_1$ from the first order condition leads to the restricted least squares estimator
$$\hat{\beta}_1^{RLS} = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}.$$
Thus
$$\hat{\beta}_1^{RLS} = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2} = \frac{\sum_{i=1}^{n} X_i(\beta_1 X_i + u_i)}{\sum_{i=1}^{n} X_i^2} = \beta_1 + \frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}.$$
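As a minimal sketch, the restricted (through-the-origin) formula can be checked on illustrative noiseless data with $\beta_1 = 2$, so it recovers the coefficient exactly.

```python
# The restricted least squares estimator sum(X*Y)/sum(X**2) on illustrative
# noiseless data generated with beta1 = 2, so the formula is exact here.
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0 * x for x in X]  # Y_i = beta1 * X_i with beta1 = 2 and u_i = 0

beta1_rls = sum(x * y for x, y in zip(X, Y)) / sum(x ** 2 for x in X)
print(beta1_rls)  # 2.0
```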
(b) Under assumptions 1-3 of Key Concept 17.1, $\hat{\beta}_1^{RLS}$ is asymptotically normally distributed. The large sample normal approximation to the limiting distribution of $\hat{\beta}_1^{RLS}$ follows from considering
$$\hat{\beta}_1^{RLS} - \beta_1 = \frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2} = \frac{\frac{1}{n}\sum_{i=1}^{n} X_i u_i}{\frac{1}{n}\sum_{i=1}^{n} X_i^2}.$$
Consider first the numerator, which is the sample average of $v_i = X_i u_i$. By assumption 1 of Key Concept 17.1, $v_i$ has mean zero: $E(X_i u_i) = E[X_i E(u_i|X_i)] = 0$. By assumption 2, $v_i$ is i.i.d. By assumption 3, the variance of $v_i$ is finite. Applying the central limit theorem,
$$\frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} v_i}{\sigma_v} \xrightarrow{d} N(0, 1),$$
or
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i \xrightarrow{d} N(0, \sigma_v^2).$$
For the denominator, $X_i^2$ is i.i.d. with finite second moment (because $X_i$ has a finite fourth moment), so that by the law of large numbers
$$\frac{1}{n}\sum_{i=1}^{n} X_i^2 \xrightarrow{p} E(X^2).$$
Combining the results on the numerator and the denominator and applying Slutsky's theorem leads to
$$\sqrt{n}\,(\hat{\beta}_1^{RLS} - \beta_1) = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i}{\frac{1}{n}\sum_{i=1}^{n} X_i^2} \xrightarrow{d} N\!\left(0, \frac{\operatorname{var}(X_i u_i)}{[E(X_i^2)]^2}\right).$$
(c) $\hat{\beta}_1^{RLS}$ is a linear estimator:
$$\hat{\beta}_1^{RLS} = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2} = \sum_{i=1}^{n} a_i Y_i, \quad\text{where } a_i = \frac{X_i}{\sum_{j=1}^{n} X_j^2}.$$
The weight $a_i$ ($i = 1, \ldots, n$) depends on $X_1, \ldots, X_n$ but not on $Y_1, \ldots, Y_n$.
Thus
$$\hat{\beta}_1^{RLS} = \beta_1 + \frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}.$$
$\hat{\beta}_1^{RLS}$ is conditionally unbiased because
$$E(\hat{\beta}_1^{RLS}\,|\,X_1, \ldots, X_n) = E\!\left(\beta_1 + \frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}\,\Big|\,X_1, \ldots, X_n\right) = \beta_1 + E\!\left(\frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}\,\Big|\,X_1, \ldots, X_n\right) = \beta_1.$$
The final equality used the fact that
$$E\!\left(\frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}\,\Big|\,X_1, \ldots, X_n\right) = \frac{\sum_{i=1}^{n} X_i E(u_i\,|\,X_1, \ldots, X_n)}{\sum_{i=1}^{n} X_i^2} = 0$$
because the observations are i.i.d. and $E(u_i\,|\,X_i) = 0$.
(d) The conditional variance of $\hat{\beta}_1^{RLS}$, given $X_1, \ldots, X_n$, is
$$\operatorname{var}(\hat{\beta}_1^{RLS}\,|\,X_1, \ldots, X_n) = \operatorname{var}\!\left(\beta_1 + \frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}\,\Big|\,X_1, \ldots, X_n\right) = \frac{\sum_{i=1}^{n} X_i^2 \operatorname{var}(u_i\,|\,X_1, \ldots, X_n)}{\left(\sum_{i=1}^{n} X_i^2\right)^2} = \frac{\sum_{i=1}^{n} X_i^2\,\sigma_u^2}{\left(\sum_{i=1}^{n} X_i^2\right)^2} = \frac{\sigma_u^2}{\sum_{i=1}^{n} X_i^2}.$$
(e) The conditional variance of the OLS estimator $\hat{\beta}_1$ is
$$\operatorname{var}(\hat{\beta}_1\,|\,X_1, \ldots, X_n) = \frac{\sigma_u^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}.$$
Since
$$\sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \leq \sum_{i=1}^{n} X_i^2,$$
the OLS estimator has a larger conditional variance: $\operatorname{var}(\hat{\beta}_1\,|\,X_1, \ldots, X_n) \geq \operatorname{var}(\hat{\beta}_1^{RLS}\,|\,X_1, \ldots, X_n)$. The restricted least squares estimator $\hat{\beta}_1^{RLS}$ is more efficient.
(f) Under assumption 5 of Key Concept 17.1, conditional on $X_1, \ldots, X_n$, $\hat{\beta}_1^{RLS}$ is normally distributed since it is a weighted average of normally distributed variables $u_i$:
$$\hat{\beta}_1^{RLS} = \beta_1 + \frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}.$$
Using the conditional mean and conditional variance of $\hat{\beta}_1^{RLS}$ derived in parts (c) and (d) respectively, the sampling distribution of $\hat{\beta}_1^{RLS}$, conditional on $X_1, \ldots, X_n$, is
$$\hat{\beta}_1^{RLS} \sim N\!\left(\beta_1, \frac{\sigma_u^2}{\sum_{i=1}^{n} X_i^2}\right).$$
(g) The estimator $\tilde{\beta}_1 = \frac{\sum_{i=1}^{n} Y_i}{\sum_{i=1}^{n} X_i} = \beta_1 + \frac{\sum_{i=1}^{n} u_i}{\sum_{i=1}^{n} X_i}$ has conditional variance
$$\operatorname{var}(\tilde{\beta}_1\,|\,X_1, \ldots, X_n) = \operatorname{var}\!\left(\beta_1 + \frac{\sum_{i=1}^{n} u_i}{\sum_{i=1}^{n} X_i}\,\Big|\,X_1, \ldots, X_n\right) = \frac{\sum_{i=1}^{n}\operatorname{var}(u_i\,|\,X_1, \ldots, X_n)}{\left(\sum_{i=1}^{n} X_i\right)^2} = \frac{n\sigma_u^2}{\left(\sum_{i=1}^{n} X_i\right)^2}.$$
The difference in the conditional variances of $\tilde{\beta}_1$ and $\hat{\beta}_1^{RLS}$ is
$$\operatorname{var}(\tilde{\beta}_1\,|\,X_1, \ldots, X_n) - \operatorname{var}(\hat{\beta}_1^{RLS}\,|\,X_1, \ldots, X_n) = \frac{n\sigma_u^2}{\left(\sum_{i=1}^{n} X_i\right)^2} - \frac{\sigma_u^2}{\sum_{i=1}^{n} X_i^2}.$$
In order to prove $\operatorname{var}(\tilde{\beta}_1\,|\,X_1, \ldots, X_n) \geq \operatorname{var}(\hat{\beta}_1^{RLS}\,|\,X_1, \ldots, X_n)$, we need to show
$$\frac{n}{\left(\sum_{i=1}^{n} X_i\right)^2} \geq \frac{1}{\sum_{i=1}^{n} X_i^2},$$
or equivalently
$$n\sum_{i=1}^{n} X_i^2 \geq \left(\sum_{i=1}^{n} X_i\right)^2.$$
This inequality comes directly by applying the Cauchy-Schwarz inequality
$$\left(\sum_{i=1}^{n} a_i b_i\right)^2 \leq \left(\sum_{i=1}^{n} a_i^2\right)\left(\sum_{i=1}^{n} b_i^2\right),$$
which implies
$$\left(\sum_{i=1}^{n} X_i\right)^2 = \left(\sum_{i=1}^{n} 1 \times X_i\right)^2 \leq \left(\sum_{i=1}^{n} 1^2\right)\left(\sum_{i=1}^{n} X_i^2\right) = n\sum_{i=1}^{n} X_i^2.$$
That is, $n\sum_{i=1}^{n} X_i^2 \geq \left(\sum_{i=1}^{n} X_i\right)^2$, or $\operatorname{var}(\tilde{\beta}_1\,|\,X_1, \ldots, X_n) \geq \operatorname{var}(\hat{\beta}_1^{RLS}\,|\,X_1, \ldots, X_n)$.
Note: because $\tilde{\beta}_1$ is linear and conditionally unbiased, the result $\operatorname{var}(\tilde{\beta}_1\,|\,X_1, \ldots, X_n) \geq \operatorname{var}(\hat{\beta}_1^{RLS}\,|\,X_1, \ldots, X_n)$ follows directly from the Gauss-Markov theorem.
17.3. (a) Using Equation (17.19), we have
$$\begin{aligned}
\sqrt{n}\,(\hat{\beta}_1 - \beta_1) &= \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}
= \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}[(X_i - \mu_X) - (\bar{X} - \mu_X)]u_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}\\
&= \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(X_i - \mu_X)u_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2} - \frac{(\bar{X} - \mu_X)\frac{1}{\sqrt{n}}\sum_{i=1}^{n} u_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}
= \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} v_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2} - \frac{(\bar{X} - \mu_X)\frac{1}{\sqrt{n}}\sum_{i=1}^{n} u_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2},
\end{aligned}$$
by defining $v_i = (X_i - \mu_X)u_i$.
(b) By the central limit theorem,
$$\frac{\sqrt{n}\,(\bar{u} - \mu_u)}{\sigma_u} = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} u_i}{\sigma_u} \xrightarrow{d} N(0, 1).$$
The law of large numbers implies $\bar{X} \xrightarrow{p} \mu_X$, or $\bar{X} - \mu_X \xrightarrow{p} 0$. By the consistency of the sample variance, $\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ converges in probability to the population variance, $\operatorname{var}(X_i)$, which is finite and nonzero. The result then follows from Slutsky's theorem.
(c) The random variable $v_i = (X_i - \mu_X)u_i$ has finite variance:
$$\operatorname{var}(v_i) = \operatorname{var}[(X_i - \mu_X)u_i] \leq E[(X_i - \mu_X)^2 u_i^2] \leq \sqrt{E[(X_i - \mu_X)^4]\,E(u_i^4)} < \infty.$$
The second inequality follows by applying the Cauchy-Schwarz inequality, and the bound is finite because of the finite fourth moments for $(X_i, u_i)$. The finite variance, along with the fact that $v_i$ has mean zero (by assumption 1 of Key Concept 17.1) and that $v_i$ is i.i.d. (by assumption 2), implies that the sample average $\bar{v}$ satisfies the requirements of the central limit theorem. Thus,
$$\frac{\sqrt{n}\,\bar{v}}{\sigma_v} = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} v_i}{\sigma_v}$$
satisfies the central limit theorem.
(d) Applying the central limit theorem, we have
$$\frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} v_i}{\sigma_v} \xrightarrow{d} N(0, 1).$$
Because the sample variance is a consistent estimator of the population variance, we have
$$\frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}{\operatorname{var}(X_i)} \xrightarrow{p} 1.$$
Using Slutsky's theorem,
$$\frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} v_i\,/\,\sigma_v}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\,/\,\sigma_X^2} \xrightarrow{d} N(0, 1),$$
or equivalently
$$\frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} v_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2} \xrightarrow{d} N\!\left(0, \frac{\operatorname{var}(v_i)}{[\operatorname{var}(X_i)]^2}\right).$$
Thus
$$\sqrt{n}\,(\hat{\beta}_1 - \beta_1) = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} v_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2} - \frac{(\bar{X} - \mu_X)\frac{1}{\sqrt{n}}\sum_{i=1}^{n} u_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2} \xrightarrow{d} N\!\left(0, \frac{\operatorname{var}(v_i)}{[\operatorname{var}(X_i)]^2}\right),$$
since the second term in the expression for $\sqrt{n}\,(\hat{\beta}_1 - \beta_1)$ converges in probability to zero, as shown in part (b).
$$f(u_i, u_j, X_i, X_j) = f(u_i, X_i\,|\,u_j, X_j)\,f(u_j, X_j) = f(u_i, X_i)\,f(u_j, X_j).$$
(b) The conditional probability distribution function of $u_i$ and $u_j$ given $X_i$ and $X_j$ equals
$$f(u_i, u_j\,|\,X_i, X_j) = \frac{f(u_i, u_j, X_i, X_j)}{f(X_i, X_j)} = \frac{f(u_i, X_i)\,f(u_j, X_j)}{f(X_i)\,f(X_j)} = f(u_i\,|\,X_i)\,f(u_j\,|\,X_j).$$
The first and third equalities used the definition of the conditional probability distribution function. The second equality used the conclusion from part (a) and the independence between $X_i$ and $X_j$. Substituting
$$f(u_i, u_j\,|\,X_i, X_j) = f(u_i\,|\,X_i)\,f(u_j\,|\,X_j)$$
into the definition of the conditional expectation, we have
$$E(u_i u_j\,|\,X_i, X_j) = \iint u_i u_j f(u_i, u_j\,|\,X_i, X_j)\,du_i\,du_j = \iint u_i u_j f(u_i\,|\,X_i)f(u_j\,|\,X_j)\,du_i\,du_j = \int u_i f(u_i\,|\,X_i)\,du_i \int u_j f(u_j\,|\,X_j)\,du_j = E(u_i\,|\,X_i)\,E(u_j\,|\,X_j).$$
(c)
$$f(u_i\,|\,X_i, Q) = \frac{f(u_i, X_i, Q)}{f(X_i, Q)} = \frac{f(u_i, X_i)\,f(Q)}{f(X_i)\,f(Q)} = \frac{f(u_i, X_i)}{f(X_i)} = f(u_i\,|\,X_i),$$
where the first equality uses the definition of the conditional density, the second uses the fact that $(u_i, X_i)$ and $Q$ are independent, and the final equality uses the definition of the conditional density. The result then follows directly.
(d) An argument like that used in (c) implies
$$f(u_i, u_j\,|\,X_1, \ldots, X_n) = f(u_i, u_j\,|\,X_i, X_j),$$
and the result then follows from part (b).
17.9. We need to prove
$$\frac{1}{n}\sum_{i=1}^{n}\big[(X_i - \bar{X})^2\hat{u}_i^2 - (X_i - \mu_X)^2 u_i^2\big] \xrightarrow{p} 0.$$
Using the identity $\bar{X} = \mu_X + (\bar{X} - \mu_X)$,
$$\frac{1}{n}\sum_{i=1}^{n}\big[(X_i - \bar{X})^2\hat{u}_i^2 - (X_i - \mu_X)^2 u_i^2\big] = (\bar{X} - \mu_X)^2\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2 - 2(\bar{X} - \mu_X)\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)\hat{u}_i^2 + \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)^2(\hat{u}_i^2 - u_i^2).$$
Substituting this into the expression for $\frac{1}{n}\sum_{i=1}^{n}[(X_i - \bar{X})^2\hat{u}_i^2 - (X_i - \mu_X)^2 u_i^2]$ yields a series of terms, each of which can be written as $a_n b_n$, where $a_n \xrightarrow{p} 0$ and $b_n = \frac{1}{n}\sum_{i=1}^{n} X_i^r u_i^s$, where $r$ and $s$ are integers. For example, $a_n = (\bar{X} - \mu_X)$, $a_n = (\hat{\beta}_1 - \beta_1)$, and so forth. The result then follows from Slutsky's theorem if $\frac{1}{n}\sum_{i=1}^{n} X_i^r u_i^s \xrightarrow{p} d$, where $d$ is a finite constant. Let $w_i = X_i^r u_i^s$ and note that $w_i$ is i.i.d. The law of large numbers can then be used for the desired result if $E(w_i^2) < \infty$. There are two cases that need to be addressed. In the first, both $r$ and $s$ are nonzero. In this case write
$$E(w_i^2) = E(X_i^{2r}u_i^{2s}) \leq \sqrt{E(X_i^{4r})\,E(u_i^{4s})},$$
and this term is finite if $r$ and $s$ are less than 2. Inspection of the terms shows that this is true. In the second case, either $r = 0$ or $s = 0$. In this case the result follows directly if the nonzero exponent ($r$ or $s$) is less than 4. Inspection of the terms shows that this is true.
$$\exp\left\{-\frac{1}{2(1 - \rho_{XY}^2)}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho_{XY}\left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right]\right\}.$$
Simplifying yields the desired expression.
(b) The result follows by noting that $f_{Y|X=x}(y)$ is a normal density (see Equation (17.36)) with mean $\mu = \mu_{Y|X}$ and variance $\sigma^2 = \sigma_{Y|X}^2$.
This regression has two regressors, $1/\sigma_i$ and $X_i/\sigma_i$. Because these regressors depend only on $X_i$, $E(v_i\,|\,X_i) = 0$ implies that $E(v_i/\sigma_i\,|\,1/\sigma_i, X_i/\sigma_i) = 0$. Thus, weighted least squares provides a consistent estimator of $\beta_0 = E(\beta_{0i})$ and $\beta_1 = E(\beta_{1i})$.
Chapter 18
The Theory of Multiple Regression
18.3. (a)
$$\operatorname{var}(Q) = E[(Q - \mu_Q)^2] = E\{[\mathbf{c}'(\mathbf{W} - \mu_W)]^2\} = E[\mathbf{c}'(\mathbf{W} - \mu_W)(\mathbf{W} - \mu_W)'\mathbf{c}] = \mathbf{c}'\Sigma_W\mathbf{c},$$
where the second equality uses the fact that $\mu_Q = \mathbf{c}'\mu_W$, and the third equality uses the fact that $\mathbf{c}'(\mathbf{W} - \mu_W)$ is a scalar.
(b) Because the covariance matrix $\Sigma_W$ is positive definite, we have $\mathbf{c}'\Sigma_W\mathbf{c} > 0$ for every nonzero vector $\mathbf{c}$, from the definition. Thus, $\operatorname{var}(Q) > 0$. Both the vector $\mathbf{c}$ and the matrix $\Sigma_W$ are finite, so $\operatorname{var}(Q) = \mathbf{c}'\Sigma_W\mathbf{c}$ is also finite. Thus, $0 < \operatorname{var}(Q) < \infty$.
18.5. $P_X = X(X'X)^{-1}X'$, $M_X = I_n - P_X$.
(a) $P_X$ is idempotent because
$$P_X P_X = X(X'X)^{-1}X'X(X'X)^{-1}X' = X(X'X)^{-1}X' = P_X.$$
$M_X$ is idempotent because
$$M_X M_X = (I_n - P_X)(I_n - P_X) = I_n - P_X - P_X + P_X P_X = I_n - 2P_X + P_X = I_n - P_X = M_X.$$
$P_X M_X = 0_{n\times n}$ because
$$P_X M_X = P_X(I_n - P_X) = P_X - P_X P_X = P_X - P_X = 0_{n\times n}.$$
(b) Because $\hat{\beta} = (X'X)^{-1}X'Y$, we have
$$\hat{Y} = X\hat{\beta} = X(X'X)^{-1}X'Y = P_X Y,$$
which is Equation (18.27). The residual vector is
$$\hat{U} = Y - \hat{Y} = Y - P_X Y = (I_n - P_X)Y = M_X Y.$$
We know that $M_X X$ is orthogonal to the columns of $X$:
$$M_X X = (I_n - P_X)X = X - P_X X = X - X(X'X)^{-1}X'X = X - X = 0,$$
so the residual vector can be further written as
$$\hat{U} = M_X Y = M_X(X\beta + U) = M_X X\beta + M_X U = M_X U,$$
which is Equation (18.28).
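The projection-matrix identities in 18.5 can be verified numerically on an arbitrary full-rank design matrix; the sketch below uses NumPy.

```python
import numpy as np

# Numeric check of the P_X and M_X identities in 18.5 on an arbitrary X.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))

P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(8) - P

assert np.allclose(P @ P, P)                  # P_X is idempotent
assert np.allclose(M @ M, M)                  # M_X is idempotent
assert np.allclose(P @ M, np.zeros((8, 8)))   # P_X M_X = 0
assert np.allclose(M @ X, np.zeros((8, 3)))   # M_X X = 0
print("projection identities hold")
```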
The OLS estimator is
$$\begin{pmatrix}\hat{\beta}_1\\ \hat{\beta}_2\end{pmatrix} = \begin{pmatrix}X'X & X'W\\ W'X & W'W\end{pmatrix}^{-1}\begin{pmatrix}X'Y\\ W'Y\end{pmatrix} = \begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix} + \begin{pmatrix}\frac{1}{n}X'X & \frac{1}{n}X'W\\ \frac{1}{n}W'X & \frac{1}{n}W'W\end{pmatrix}^{-1}\begin{pmatrix}\frac{1}{n}X'U\\ \frac{1}{n}W'U\end{pmatrix} = \begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix} + \begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n} X_i^2 & \frac{1}{n}\sum_{i=1}^{n} X_iW_i\\ \frac{1}{n}\sum_{i=1}^{n} W_iX_i & \frac{1}{n}\sum_{i=1}^{n} W_i^2\end{pmatrix}^{-1}\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n} X_iu_i\\ \frac{1}{n}\sum_{i=1}^{n} W_iu_i\end{pmatrix}.$$
By the law of large numbers, $\frac{1}{n}\sum_{i=1}^{n} X_i^2 \xrightarrow{p} E(X^2)$; $\frac{1}{n}\sum_{i=1}^{n} W_i^2 \xrightarrow{p} E(W^2)$; $\frac{1}{n}\sum_{i=1}^{n} X_iW_i \xrightarrow{p} E(XW) = 0$ (because $X$ and $W$ are independent with means of zero); $\frac{1}{n}\sum_{i=1}^{n} X_iu_i \xrightarrow{p} E(Xu) = 0$ (because $X$ and $u$ are independent with means of zero); and $\frac{1}{n}\sum_{i=1}^{n} W_iu_i \xrightarrow{p} E(Wu)$. Thus
$$\begin{pmatrix}\hat{\beta}_1\\ \hat{\beta}_2\end{pmatrix} \xrightarrow{p} \begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix} + \begin{pmatrix}E(X^2) & 0\\ 0 & E(W^2)\end{pmatrix}^{-1}\begin{pmatrix}0\\ E(Wu)\end{pmatrix} = \begin{pmatrix}\beta_1\\ \beta_2 + \frac{E(Wu)}{E(W^2)}\end{pmatrix}.$$
(b) From the answer to (a), $\hat{\beta}_2 \xrightarrow{p} \beta_2 + \frac{E(Wu)}{E(W^2)} \neq \beta_2$ if $E(Wu)$ is nonzero.
(c) Consider the population linear regression of $u_i$ onto $W_i$:
$$u_i = \lambda W_i + a_i,$$
where $\lambda = E(Wu)/E(W^2)$. In this population regression, by construction, $E(aW) = 0$. Using this equation for $u_i$, rewrite the equation to be estimated as
$$Y_i = X_i\beta_1 + W_i\beta_2 + u_i = X_i\beta_1 + W_i(\beta_2 + \lambda) + a_i = X_i\beta_1 + W_i\theta + a_i,$$
where $\theta = \beta_2 + \lambda$. A calculation like that used in part (a) can be used to show that
$$\begin{pmatrix}\sqrt{n}\,(\hat{\beta}_1 - \beta_1)\\ \sqrt{n}\,(\hat{\beta}_2 - \theta)\end{pmatrix} = \begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n} X_i^2 & \frac{1}{n}\sum_{i=1}^{n} X_iW_i\\ \frac{1}{n}\sum_{i=1}^{n} W_iX_i & \frac{1}{n}\sum_{i=1}^{n} W_i^2\end{pmatrix}^{-1}\begin{pmatrix}\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_ia_i\\ \frac{1}{\sqrt{n}}\sum_{i=1}^{n} W_ia_i\end{pmatrix} \xrightarrow{d} \begin{pmatrix}E(X^2) & 0\\ 0 & E(W^2)\end{pmatrix}^{-1}\begin{pmatrix}S_1\\ S_2\end{pmatrix},$$
so that
$$\sqrt{n}\,(\hat{\beta}_1 - \beta_1) \xrightarrow{d} N\!\left(0, \frac{\sigma_a^2}{E(X^2)}\right).$$
Now consider the regression that omits $W$, which can be written as
$$Y_i = X_i\beta_1 + d_i,$$
where $d_i = W_i\theta + a_i$. Calculations like those used above imply that
$$\sqrt{n}\,(\hat{\beta}_1^r - \beta_1) \xrightarrow{d} N\!\left(0, \frac{\sigma_d^2}{E(X^2)}\right).$$
Since $\sigma_d^2 = \sigma_a^2 + \theta^2 E(W^2)$, the asymptotic variance of $\hat{\beta}_1^r$ is never smaller than the asymptotic variance of $\hat{\beta}_1$.
18.9. (a)
$$\hat{\beta} = (X'M_WX)^{-1}X'M_WY = (X'M_WX)^{-1}X'M_W(X\beta + W\gamma + U) = \beta + (X'M_WX)^{-1}X'M_WU.$$
The last equality has used the orthogonality $M_WW = 0$. Thus
$$\hat{\beta} - \beta = (X'M_WX)^{-1}X'M_WU = (n^{-1}X'M_WX)^{-1}(n^{-1}X'M_WU).$$
(b) Using $M_W = I_n - P_W$ and $P_W = W(W'W)^{-1}W'$, we can write
$$n^{-1}X'M_WX = n^{-1}X'X - n^{-1}X'P_WX = n^{-1}X'X - (n^{-1}X'W)(n^{-1}W'W)^{-1}(n^{-1}W'X).$$
First consider $n^{-1}X'X = \frac{1}{n}\sum_{i=1}^{n} X_iX_i'$. The $(j, l)$ element of this matrix is $\frac{1}{n}\sum_{i=1}^{n} X_{ji}X_{li}$. By Assumption (ii), $X_i$ is i.i.d., so $X_{ji}X_{li}$ is i.i.d. By Assumption (iii) each element of $X_i$ has four moments, so by the Cauchy-Schwarz inequality $X_{ji}X_{li}$ has two moments:
$$E(X_{ji}^2X_{li}^2) \leq \sqrt{E(X_{ji}^4)\,E(X_{li}^4)} < \infty.$$
Because $X_{ji}X_{li}$ is i.i.d. with two moments, $\frac{1}{n}\sum_{i=1}^{n} X_{ji}X_{li}$ obeys the law of large numbers, so
$$\frac{1}{n}\sum_{i=1}^{n} X_{ji}X_{li} \xrightarrow{p} E(X_{ji}X_{li}).$$
This is true for all the elements of $n^{-1}X'X$, so
$$n^{-1}X'X = \frac{1}{n}\sum_{i=1}^{n} X_iX_i' \xrightarrow{p} E(X_iX_i') = \Sigma_{XX}.$$
Applying the same reasoning and using Assumption (ii) that $(X_i, W_i, Y_i)$ are i.i.d. and Assumption (iii) that $(X_i, W_i, u_i)$ have four moments, we have
$$n^{-1}W'W = \frac{1}{n}\sum_{i=1}^{n} W_iW_i' \xrightarrow{p} E(W_iW_i') = \Sigma_{WW},$$
$$n^{-1}X'W = \frac{1}{n}\sum_{i=1}^{n} X_iW_i' \xrightarrow{p} E(X_iW_i') = \Sigma_{XW},$$
and
$$n^{-1}W'X = \frac{1}{n}\sum_{i=1}^{n} W_iX_i' \xrightarrow{p} E(W_iX_i') = \Sigma_{WX}.$$
(c) The conditional expectation
$$E(U\,|\,X, W) = \begin{pmatrix}E(u_1\,|\,X, W)\\ E(u_2\,|\,X, W)\\ \vdots\\ E(u_n\,|\,X, W)\end{pmatrix} = \begin{pmatrix}E(u_1\,|\,X_1, W_1)\\ E(u_2\,|\,X_2, W_2)\\ \vdots\\ E(u_n\,|\,X_n, W_n)\end{pmatrix} = \begin{pmatrix}W_1'\delta\\ W_2'\delta\\ \vdots\\ W_n'\delta\end{pmatrix} = W\delta.$$
The second equality used Assumption (ii) that $(X_i, W_i, Y_i)$ are i.i.d., and the third equality applied the conditional mean independence assumption (i).
(d) In the limit,
$$n^{-1}X'M_WU \xrightarrow{p} 0_{k_1 \times 1},$$
since $E(X'M_WU\,|\,X, W) = X'M_W E(U\,|\,X, W) = X'M_WW\delta = 0$ because $M_WW = 0$.
(e) $n^{-1}X'M_WX$ converges in probability to a finite invertible matrix, and $n^{-1}X'M_WU$ converges in probability to a zero vector. Applying Slutsky's theorem,
$$\hat{\beta} - \beta = (n^{-1}X'M_WX)^{-1}(n^{-1}X'M_WU) \xrightarrow{p} 0.$$
This implies $\hat{\beta} \xrightarrow{p} \beta$.
(b) $\tilde{Y}_i = \tilde{X}_i\beta + \tilde{u}_i$, so that
$$\hat{\beta} - \beta = \left(\sum_{i=1}^{n}\tilde{X}_i'\tilde{X}_i\right)^{-1}\sum_{i=1}^{n}\tilde{X}_i'\tilde{u}_i = \left(\sum_{i=1}^{n}\tilde{X}_i'\tilde{X}_i\right)^{-1}\sum_{i=1}^{n}X_i'M'Mu_i = \left(\sum_{i=1}^{n}\tilde{X}_i'\tilde{X}_i\right)^{-1}\sum_{i=1}^{n}X_i'M'u_i = \left(\sum_{i=1}^{n}\tilde{X}_i'\tilde{X}_i\right)^{-1}\sum_{i=1}^{n}\tilde{X}_i'u_i.$$
(c) $\hat{Q}_{\tilde{X}} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{T}\sum_{t=1}^{T}(X_{it} - \bar{X}_i)^2\right)$, where the terms $\frac{1}{T}\sum_{t=1}^{T}(X_{it} - \bar{X}_i)^2$ are i.i.d. with mean $Q_{\tilde{X}}$ and finite variance (because $X_{it}$ has finite fourth moments). The result then follows from the law of large numbers.
(d) This follows from the central limit theorem.
(e) This follows from Slutsky's theorem.
(f) $h_i$ are i.i.d., and the result follows from the law of large numbers.
(g) Let $\hat{h}_i = T^{-1/2}\tilde{X}_i'\hat{\tilde{u}}_i = h_i - T^{-1/2}(\hat{\beta} - \beta)\tilde{X}_i'\tilde{X}_i$. Then
$$\hat{h}_i^2 = h_i^2 + T^{-1}(\hat{\beta} - \beta)^2(\tilde{X}_i'\tilde{X}_i)^2 - 2T^{-1/2}(\hat{\beta} - \beta)h_i\tilde{X}_i'\tilde{X}_i.$$
18.17 The results follow from the hints and matrix multiplication and addition.