
Homework 3 by Grygorenko, Koppel, Mosisa

a) What is the fraction of the observations that have x>5?

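. tab z

 =1 if x >= |
          5 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,000       50.00       50.00
          1 |      1,000       50.00      100.00
------------+-----------------------------------
      Total |      2,000      100.00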

The above result shows that 50% of the observations (i.e., 1,000 of 2,000) have x >= 5, the condition that defines z.
b) What is the fraction of units that are in the treated group (that is, have w= 1)?

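. tab w z

     =1 if |     =1 if x >= 5
   treated |         0          1 |     Total
-----------+----------------------+----------
         0 |       727        111 |       838
         1 |       273        889 |     1,162
-----------+----------------------+----------
     Total |     1,000      1,000 |     2,000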

This table shows that 58.10% of units are in the treated group, that is, 1,162 out of a total of 2,000
observations.
How does the proportion of people treated differ below and above the cut-off point?
88.9% of the group with x >= 5 received treatment (889 of 1,000), compared to 27.3% of the group
with x below 5 (273 of 1,000).
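The shares quoted above follow directly from the four cells of the cross-tabulation. As a cross-check outside Stata, here is a minimal Python sketch; the dictionary layout and variable names are illustrative assumptions, and only the cell counts come from the table.

```python
# Cell counts copied from the `tab w z` cross-tabulation.
# Keys are (w, z): w = 1 if treated, z = 1 if x >= 5.
counts = {(0, 0): 727, (0, 1): 111, (1, 0): 273, (1, 1): 889}

total = sum(counts.values())
treated = counts[(1, 0)] + counts[(1, 1)]

# Overall treated share, then treated share on each side of the cut-off.
share_treated = treated / total
share_treated_below = counts[(1, 0)] / (counts[(0, 0)] + counts[(1, 0)])
share_treated_above = counts[(1, 1)] / (counts[(0, 1)] + counts[(1, 1)])

print(f"treated overall:     {share_treated:.2%}")
print(f"treated with x < 5:  {share_treated_below:.1%}")
print(f"treated with x >= 5: {share_treated_above:.1%}")
```

Because treatment is neither 0% below the cut-off nor 100% above it, the design is a fuzzy rather than a sharp discontinuity, which is why the later parts divide by the jump in treatment probability.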
c) Estimate separate linear probability models for x < 5 and x > 5, obtain the predicted values
at the cut-off point, and graph the functions.

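. reg w x if z==0

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  1,   998) =   96.61
       Model |  17.5177744     1  17.5177744           Prob > F      =  0.0000
    Residual |  180.953226   998  .181315857           R-squared     =  0.0883
-------------+------------------------------           Adj R-squared =  0.0874
       Total |     198.471   999   .19866967           Root MSE      =  .42581

------------------------------------------------------------------------------
           w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0916522   .0093244     9.83   0.000     .0733545    .1099499
       _cons |    .043984   .0269105     1.63   0.102    -.0088237    .0967917
------------------------------------------------------------------------------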

. predict what0
(option xb assumed; fitted values)
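. reg w x if z==1

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  1,   998) =   47.59
       Model |   4.4914493     1   4.4914493           Prob > F      =  0.0000
    Residual |  94.1875507   998  .094376303           R-squared     =  0.0455
-------------+------------------------------           Adj R-squared =  0.0446
       Total |      98.679   999  .098777778           Root MSE      =  .30721

------------------------------------------------------------------------------
           w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0464084   .0067272     6.90   0.000     .0332073    .0596095
       _cons |   .5408787   .0513891    10.53   0.000     .4400356    .6417218
------------------------------------------------------------------------------

. predict what1
(option xb assumed; fitted values)

. gen what= what0 if ~z
(1000 missing values generated)

. replace what = what1 if z
(1000 real changes made)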

. label var what "linear prob"

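. twoway (line what x)

[Figure: fitted treatment probability ("linear prob", vertical axis) against the forcing variable x (horizontal axis), from the two linear probability models.]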
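Part (c) also asks for the predicted values at the cut-off. They can be read off the two fitted lines by plugging in x = 5. The following Python sketch does that by hand; the helper `predict` and the dictionary names are hypothetical, and the coefficients are copied from the two `reg w x` outputs.

```python
# Coefficients copied from the two linear probability regressions of w on x.
lpm_below = {"cons": 0.043984, "slope": 0.0916522}   # sample with z == 0 (x < 5)
lpm_above = {"cons": 0.5408787, "slope": 0.0464084}  # sample with z == 1 (x >= 5)
CUTOFF = 5.0

def predict(coef, x):
    """Fitted value of a simple linear regression at x."""
    return coef["cons"] + coef["slope"] * x

p_below = predict(lpm_below, CUTOFF)  # fit approaching the cut-off from below
p_above = predict(lpm_above, CUTOFF)  # fit approaching the cut-off from above
jump_lpm = p_above - p_below          # discontinuity in treatment probability

print(f"P(w=1 | x=5) from below: {p_below:.4f}")
print(f"P(w=1 | x=5) from above: {p_above:.4f}")
print(f"jump at the cut-off:     {jump_lpm:.4f}")
```

The two predictions agree with the `_cons` estimates from the regressions on the centered variable x_5 reported under (e) and (f), as they should, since centering at 5 turns the intercept into the fitted value at the cut-off.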

d) Do the same thing using probit models


. probit w x if ~z

Iteration 0:   log likelihood = -586.21993
Iteration 1:   log likelihood =  -540.8212
Iteration 2:   log likelihood = -540.49983
Iteration 3:   log likelihood = -540.49975
Iteration 4:   log likelihood = -540.49975

Probit regression                                 Number of obs   =       1000
                                                  LR chi2(1)      =      91.44
                                                  Prob > chi2     =     0.0000
Log likelihood = -540.49975                       Pseudo R2       =     0.0780

------------------------------------------------------------------------------
           w |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .292854   .0315516     9.28   0.000     .2310139     .354694
       _cons |   -1.38944   .0981431   -14.16   0.000    -1.581797   -1.197083
------------------------------------------------------------------------------

. predict phat0
(option pr assumed; Pr(w))

. probit w x if z

Iteration 0:   log likelihood = -348.60098
Iteration 1:   log likelihood = -324.97189
Iteration 2:   log likelihood = -324.44263
Iteration 3:   log likelihood = -324.44168
Iteration 4:   log likelihood = -324.44168

Probit regression                                 Number of obs   =       1000
                                                  LR chi2(1)      =      48.32
                                                  Prob > chi2     =     0.0000
Log likelihood = -324.44168                       Pseudo R2       =     0.0693

------------------------------------------------------------------------------
           w |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .2706787    .040865     6.62   0.000     .1905849    .3507726
       _cons |  -.7171781   .2895344    -2.48   0.013    -1.284655   -.1497012
------------------------------------------------------------------------------

. predict phat1
(option pr assumed; Pr(w))

. gen pshat = phat0 if ~z
(1000 missing values generated)

. replace pshat = phat1 if z
(1000 real changes made)

. label var pshat "probit"

. twoway (line what pshat x)

[Figure: fitted treatment probabilities against the forcing variable x (horizontal axis); two lines, "linear prob" and "probit".]

Difference in probabilities at the cut-off (linear probability fits): 0.77 - 0.50 = 0.27
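The same difference can be computed from the probit fits by evaluating the standard normal CDF at the linear index for x = 5. This is a minimal Python sketch under that reading of part (d); `normal_cdf` and the dictionary names are illustrative, and the coefficients are copied from the two probit outputs.

```python
import math

def normal_cdf(v):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

# Coefficients copied from the two probit regressions of w on x.
probit_below = {"cons": -1.38944, "slope": 0.292854}     # sample with z == 0
probit_above = {"cons": -0.7171781, "slope": 0.2706787}  # sample with z == 1
CUTOFF = 5.0

# A probit predicts Pr(w = 1 | x) = Phi(cons + slope * x).
pr_below = normal_cdf(probit_below["cons"] + probit_below["slope"] * CUTOFF)
pr_above = normal_cdf(probit_above["cons"] + probit_above["slope"] * CUTOFF)
jump_probit = pr_above - pr_below

print(f"probit Pr(w=1 | x=5) from below: {pr_below:.3f}")
print(f"probit Pr(w=1 | x=5) from above: {pr_above:.3f}")
print(f"probit jump at the cut-off:      {jump_probit:.3f}")
```

With these coefficients the probit jump comes out near 0.21, somewhat smaller than the jump implied by the two linear probability fits.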


e) Estimate separate linear outcome models for x < 5 and x > 5 and find predicted values of
outcomes at the cut-off point from both models. Find also the difference and
f) Compute the estimate of the treatment effect at point x = 5 using the ratio of the calculated
differences from (e) and (c) (Wooldridge equation (21.107))

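. reg y x_5 if z

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  1,   998) =  310.37
       Model |  230.759468     1  230.759468           Prob > F      =  0.0000
    Residual |  742.020178   998  .743507193           R-squared     =  0.2372
-------------+------------------------------           Adj R-squared =  0.2365
       Total |  972.779646   999  .973753399           Root MSE      =  .86227

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         x_5 |   .3326468   .0188819    17.62   0.000      .295594    .3696997
       _cons |   3.814271   .0545347    69.94   0.000     3.707255    3.921287
------------------------------------------------------------------------------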

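. reg y x_5 if ~z

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  1,   998) =  368.55
       Model |  409.736839     1  409.736839           Prob > F      =  0.0000
    Residual |  1109.52773   998  1.11175123           R-squared     =  0.2697
-------------+------------------------------           Adj R-squared =  0.2690
       Total |  1519.26457   999  1.52078535           Root MSE      =  1.0544

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         x_5 |   .4432575   .0230891    19.20   0.000     .3979488    .4885663
       _cons |   3.282887   .0666859    49.23   0.000     3.152026    3.413747
------------------------------------------------------------------------------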

The coefficient on _cons is 3.81 in the first regression (x >= 5) and 3.28 in the second (x < 5), so
the difference in predicted outcomes at the cut-off is 0.53.

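. reg w x_5 if z

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  1,   998) =   47.59
       Model |   4.4914493     1   4.4914493           Prob > F      =  0.0000
    Residual |  94.1875507   998  .094376303           R-squared     =  0.0455
-------------+------------------------------           Adj R-squared =  0.0446
       Total |      98.679   999  .098777778           Root MSE      =  .30721

------------------------------------------------------------------------------
           w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         x_5 |   .0464084   .0067272     6.90   0.000     .0332073    .0596095
       _cons |   .7729209   .0194295    39.78   0.000     .7347935    .8110482
------------------------------------------------------------------------------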

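. reg w x_5 if ~z

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  1,   998) =   96.61
       Model |  17.5177745     1  17.5177745           Prob > F      =  0.0000
    Residual |  180.953226   998  .181315857           R-squared     =  0.0883
-------------+------------------------------           Adj R-squared =  0.0874
       Total |     198.471   999   .19866967           Root MSE      =  .42581

------------------------------------------------------------------------------
           w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         x_5 |   .0916522   .0093244     9.83   0.000     .0733545    .1099499
       _cons |   .5022452   .0269307    18.65   0.000     .4493979    .5550926
------------------------------------------------------------------------------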

. di ( 3.814271 - 3.282887)/( .7729209 - .5022452)


1.9631759
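The `di` line above is the ratio of the jump in the outcome to the jump in the treatment probability, both measured at the cut-off. A short Python sketch of the same calculation, with each piece named, may make the construction easier to follow; the variable names are illustrative and the four inputs are the `_cons` estimates from the regressions on x_5.

```python
# Intercepts of the regressions on the centered forcing variable x_5,
# i.e. fitted values at the cut-off on each side.
y_above, y_below = 3.814271, 3.282887    # outcome y, samples z == 1 and z == 0
w_above, w_below = 0.7729209, 0.5022452  # treatment w, samples z == 1 and z == 0

jump_y = y_above - y_below  # discontinuity in the outcome
jump_w = w_above - w_below  # discontinuity in the treatment probability

# Ratio of the two jumps: the estimated treatment effect at the cut-off.
wald = jump_y / jump_w

print(f"jump in y:     {jump_y:.6f}")
print(f"jump in w:     {jump_w:.7f}")
print(f"ratio (f):     {wald:.7f}")
```

Working from the rounded figures 0.53 and 0.27 gives about 1.96 as well, so the rounding in the text does not change the conclusion.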
g) Use the IV estimator described in Wooldridge equation (21.108) using all the data. (This
should give you the same point estimate as (f).) What is its standard error?

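. ivreg y x zx_5 (w=z), robust

Instrumental variables (2SLS) regression               Number of obs =    2000
                                                       F(  3,  1996) = 3588.42
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8722
                                                       Root MSE      =   .5959

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           w |   1.963177   .2046892     9.59   0.000      1.56175    2.364604
           x |    .263328   .0295197     8.92   0.000     .2054354    .3212206
        zx_5 |  -.0217891   .0214587    -1.02   0.310    -.0638729    .0202947
       _cons |   .9802505   .0363406    26.97   0.000      .908981     1.05152
------------------------------------------------------------------------------
Instrumented:  w
Instruments:   x zx_5 z
------------------------------------------------------------------------------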

The coefficient on w, 1.96, matches the estimate found in (f).
Its robust standard error is 0.20.
h) Now use a "local" version of the IV method, restricting to data with x > 3 and x < 7. What has
happened to point estimate and standard error? Present the results.
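. ivreg y x zx_5 (w=z) if x >3 & x < 7, robust

Instrumental variables (2SLS) regression               Number of obs =     800
                                                       F(  3,   796) =  351.50
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.7662
                                                       Root MSE      =  .61919

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           w |   1.775465   .3267695     5.43   0.000     1.134033    2.416897
           x |   .3471895   .0726118     4.78   0.000     .2046563    .4897226
        zx_5 |  -.0991082   .0772654    -1.28   0.200    -.2507762    .0525599
       _cons |   .7060606   .1912344     3.69   0.000     .3306773    1.081444
------------------------------------------------------------------------------
Instrumented:  w
Instruments:   x zx_5 z
------------------------------------------------------------------------------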

Note that the number of observations is now 800 instead of the 2,000 used before.
The coefficient on w is 1.78, lower than the 1.96 obtained before, although the gap (0.19) is smaller than either standard error.
The standard error increases by a factor of about 1.6, from 0.20 to 0.33.
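To put the change between (g) and (h) in numbers, the sketch below compares the two coefficients on w and their robust standard errors. It is a descriptive comparison only, not a formal test, because the local sample is a subset of the full sample and the two estimates are therefore not independent. The dictionary names are illustrative; the figures are copied from the two `ivreg` outputs.

```python
# Coefficient on w and its robust standard error from the two IV regressions.
global_iv = {"coef": 1.963177, "se": 0.2046892, "n": 2000}  # all data
local_iv = {"coef": 1.775465, "se": 0.3267695, "n": 800}    # 3 < x < 7 only

coef_gap = global_iv["coef"] - local_iv["coef"]
se_ratio = local_iv["se"] / global_iv["se"]
gap_in_local_se = coef_gap / local_iv["se"]  # gap measured in local standard errors
share_of_sample = local_iv["n"] / global_iv["n"]

print(f"drop in point estimate:      {coef_gap:.3f}")
print(f"standard error ratio:        {se_ratio:.2f}")
print(f"gap / local standard error:  {gap_in_local_se:.2f}")
print(f"share of observations kept:  {share_of_sample:.0%}")
```

The pattern is the usual trade-off in a local estimate: fewer observations near the cut-off reduce reliance on the linear functional form far from x = 5, at the cost of precision.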
[Figure: rd graphs of local polynomial fits against the centered forcing variable, at three bandwidths. Each bandwidth has one panel for the response variable (y) and one for the treatment indicator w ("=1 if treated"):
  Bandwidth 1.68758681936383: panels "y = 50" and "w = 50"
  Bandwidth 3.375173638727659: panels "y = 100 (standard)" and "w = 100 (standard)"
  Bandwidth 6.750347277455318: panels "y = 200" and "w = 200"]

. rd y w x_5, gr
Three variables specified; jump in treatment
at Z=0 will be estimated. Local Wald Estimate
is the ratio of jump in outcome to jump in treatment.
Assignment variable Z is x_5
Treatment variable X_T is w
Outcome variable y is y
Command used for graph: lpoly; Kernel used: triangle (default)
Bandwidth: 3.3751736; loc Wald Estimate: 1.877531
Bandwidth: 1.6875868; loc Wald Estimate: 1.6518071
Bandwidth: 6.7503473; loc Wald Estimate: 1.9487988
Estimating for bandwidth 3.375173638727659
Estimating for bandwidth 1.68758681936383
Estimating for bandwidth 6.750347277455318

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       numer |   .4655039   .1319549     3.53   0.000     .2068771    .7241307
       denom |   .2479341   .0539299     4.60   0.000     .1422334    .3536347
       lwald |   1.877531   .3075553     6.10   0.000     1.274734    2.480328
     numer50 |   .4717397   .1853561     2.55   0.011     .1084484    .8350309
     denom50 |     .28559   .0764943     3.73   0.000      .135664    .4355161
     lwald50 |   1.651807   .3825489     4.32   0.000     .9020252    2.401589
    numer200 |   .5050617    .100234     5.04   0.000     .3086066    .7015168
    denom200 |   .2591657   .0407388     6.36   0.000      .179319    .3390123
    lwald200 |   1.948799   .2229953     8.74   0.000     1.511736    2.385862
------------------------------------------------------------------------------

These estimates use the centered forcing variable x_5.

Different bandwidths give different results:
o Including a wider range of observations (lwald200 = 1.95, standard error 0.22) gives results
similar to those obtained before, with a similar standard error.
o The optimal bandwidth (lwald) puts the policy effect on the outcome variable at 1.88,
but with a standard error of 0.31.
o lwald50 gives the most different result, a coefficient of 1.65 with a standard
error of 0.38.
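As the rd output states, each local Wald estimate is the ratio of the jump in the outcome (numer) to the jump in treatment (denom) at the same bandwidth. The Python sketch below recomputes the three ratios from the reported rows as a consistency check; the dictionary layout is an illustrative assumption, and all figures are copied from the rd table.

```python
# Rows copied from the rd output, keyed by the bandwidth label used there
# (50, 100 and 200 percent of the bandwidth 3.3751736).
rd_rows = {
    "50": {"bandwidth": 1.6875868, "numer": 0.4717397, "denom": 0.28559, "lwald": 1.651807},
    "100": {"bandwidth": 3.3751736, "numer": 0.4655039, "denom": 0.2479341, "lwald": 1.877531},
    "200": {"bandwidth": 6.7503473, "numer": 0.5050617, "denom": 0.2591657, "lwald": 1.948799},
}

ratios = {}
for label, row in rd_rows.items():
    # Local Wald estimate = jump in outcome / jump in treatment.
    ratios[label] = row["numer"] / row["denom"]
    print(
        f"bandwidth {row['bandwidth']:.4f}: "
        f"numer/denom = {ratios[label]:.4f}, reported lwald = {row['lwald']:.4f}"
    )
```

Across the three bandwidths the numerator moves only a little (roughly 0.47 to 0.51), so most of the variation in the local Wald estimate comes from the estimated jump in treatment probability, which is largest at the narrowest bandwidth.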
