Introduction to Econometrics
Professor Woodbury
Fall Semester 2015
Experiments and Basic Statistics
1. The probability framework for statistical inference
2. Experiments and estimation
3. Hypothesis testing
4. Confidence intervals
2. Estimation
Almost everything we do in econometrics involves averages or
weighted averages of a sample of data
So we need to understand the distributions of sample averages
Again, we will assume simple random sampling
Choose n individuals (districts, entities) at random from the
population
The n observations in the sample are denoted Y1, …, Yn, where Y1 is the first observation, Y2 is the second, and so on
So the data set is (Y1, Y2, …, Yn), where Yi = value of Y for the ith individual (district, entity, unit) sampled
Ȳ = (Y1 + … + Yi + … + Yn)/n = (1/n) Σ_{i=1}^n Yi
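As a minimal sketch, the sample average can be computed directly; the data values below are hypothetical:

```python
# A minimal sketch of the sample average: Ybar = (1/n) * sum of Y_i.
# The data values here are hypothetical.
Y = [2.0, 4.0, 6.0, 8.0, 10.0]  # Y_1, ..., Y_n
n = len(Y)
Y_bar = sum(Y) / n
print(Y_bar)  # 6.0
```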
The mean and variance of Ȳ are the mean and variance of its sampling distribution, E(Ȳ) and var(Ȳ)
Mean and variance of the sampling distribution of Ȳ, for Yi i.i.d. from any distribution, not just Bernoulli:

Mean:
E(Ȳ) = E[(1/n) Σ_{i=1}^n Yi] = (1/n) Σ_{i=1}^n E(Yi) = (1/n) Σ_{i=1}^n μY = μY

Variance:
var(Ȳ) = E[Ȳ − E(Ȳ)]² = E[Ȳ − μY]² = E[(1/n) Σ_{i=1}^n (Yi − μY)]²
Expanding the square gives variance and covariance terms. For two random variables,
var(aY1 + bY2) = a²σ²Y1 + 2ab·cov(Y1, Y2) + b²σ²Y2
Under i.i.d. sampling, cov(Y1, Y2) = 0 and σ²Y1 = σ²Y2 = σ²Y,
so var(Y1 + Y2) = σ²Y + σ²Y = 2σ²Y
and for n = 2, var(Ȳ) = var[(Y1 + Y2)/2] = (1/4)(2σ²Y) = σ²Y/2
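The variance-of-a-sum rule can be checked by simulation. A sketch, assuming two independent standard normal draws (so cov(Y1, Y2) = 0 and σ²Y = 1, both hypothetical choices):

```python
import random

random.seed(0)

# Check var(Y1 + Y2) = 2 * sigma^2 by simulation: with independent draws,
# the covariance term 2ab*cov(Y1, Y2) drops out. Here sigma^2 = 1.
sums = [random.gauss(0, 1) + random.gauss(0, 1) for _ in range(100_000)]
m = sum(sums) / len(sums)
var_sum = sum((s - m) ** 2 for s in sums) / len(sums)
# var_sum should be close to 2 * 1 = 2
```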
Mean and variance of sampling distribution of Ȳ, ctd.
So in general,
E(Ȳ) = μY
var(Ȳ) = σ²Y/n
Implications:
mean: Ȳ is an unbiased estimator of μY, since E(Ȳ) = μY
variance: var(Ȳ) = σ²Y/n shrinks toward zero as n grows
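These two results can be verified by simulation. A sketch, assuming a population with μY = 0 and σ²Y = 1 (both hypothetical choices):

```python
import random

random.seed(1)

# Simulate the sampling distribution of Ybar: draw many samples of size n
# from a population with mu_Y = 0 and sigma_Y^2 = 1 (hypothetical choices),
# then check E(Ybar) ~ mu_Y and var(Ybar) ~ sigma_Y^2 / n.
n, reps = 10, 50_000
ybars = []
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]
    ybars.append(sum(sample) / n)

mean_ybar = sum(ybars) / reps
var_ybar = sum((y - mean_ybar) ** 2 for y in ybars) / reps
# mean_ybar should be near mu_Y = 0, var_ybar near sigma_Y^2 / n = 0.1
```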
The LLN says that when n is large, Ȳ will be near μY with very high probability
(The LLN is sometimes called the law of averages, but this can be misleading)
An estimator is consistent if the probability that it falls within an interval of the true population value tends to one as the sample size increases.
If (Y1, …, Yn) are i.i.d. and σ²Y < ∞, then Ȳ is a consistent estimator of μY, that is,
Pr[|Ȳ − μY| < ε] → 1 as n → ∞
This can also be written Ȳ →p μY
(Ȳ →p μY means Ȳ converges in probability to μY)
(Here is the math: as n → ∞, var(Ȳ) = σ²Y/n → 0, which implies that Pr[|Ȳ − μY| < ε] → 1. For any ε I pick, no matter how small, this will be true.)
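The convergence in probability can be seen numerically. A sketch, assuming Bernoulli draws with p = 0.5 (a hypothetical choice, so μY = 0.5):

```python
import random

random.seed(2)

# LLN sketch: for Bernoulli draws with p = 0.5 (a hypothetical choice),
# the sample average Ybar should get close to mu_Y = p as n increases.
def ybar(n, p=0.5):
    return sum(random.random() < p for _ in range(n)) / n

errors = {n: abs(ybar(n) - 0.5) for n in (10, 1_000, 100_000)}
# errors[100_000] should be very small
```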
LLN illustrated
The figures show sampling distributions of Ȳ, when Ȳ is the sample average of n Bernoulli random variables with p = Pr(Yi = 1) = 0.78
The variance of the sampling distribution of Ȳ decreases as n gets larger, so the sampling distribution becomes more tightly concentrated around its true mean μY = 0.78 as n increases
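The tightening of the sampling distribution can be reproduced by simulation; a sketch using the slide's p = 0.78 and two hypothetical sample sizes:

```python
import random

random.seed(3)

# Sketch of the illustration: the sampling distribution of Ybar for
# Bernoulli(p = 0.78) draws concentrates around 0.78 as n grows.
def var_of_ybar(n, p=0.78, reps=20_000):
    ybars = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]
    m = sum(ybars) / reps
    return sum((y - m) ** 2 for y in ybars) / reps

v_small, v_large = var_of_ybar(5), var_of_ybar(100)
# Theory says var(Ybar) = p(1 - p)/n, so v_large should be about v_small / 20
```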
The Central Limit Theorem (CLT)
So if (Y1, …, Yn) are i.i.d. and 0 < σ²Y < ∞, then when n is large the distribution of Ȳ is well approximated by a normal distribution: Ȳ is approximately distributed N(μY, σ²Y/n)
Standardizing Ȳ just means
(1) subtracting the mean (μY) so the mean is 0
(2) dividing by the standard deviation so the variance is 1
The standardized Ȳ, (Ȳ − E(Ȳ))/√var(Ȳ) = (Ȳ − μY)/(σY/√n), is approximately distributed as N(0,1) (standard normal)
The larger is n, the better is the approximation
(Figure: sampling distribution of Ȳ when Y is Bernoulli, p = 0.78)
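The standardization step can be checked by simulation; a sketch using Bernoulli(p = 0.78) draws with a hypothetical sample size n = 50:

```python
import math
import random

random.seed(4)

# CLT sketch: standardize Ybar from Bernoulli(p = 0.78) samples and check
# the result looks like N(0, 1): mean near 0, variance near 1.
p, n, reps = 0.78, 50, 20_000
mu, sigma = p, math.sqrt(p * (1 - p))  # population mean and sd of Y_i
z = []
for _ in range(reps):
    ybar = sum(random.random() < p for _ in range(n)) / n
    z.append((ybar - mu) / (sigma / math.sqrt(n)))  # standardized Ybar

z_mean = sum(z) / reps
z_var = sum((x - z_mean) ** 2 for x in z) / reps
# z_mean should be near 0 and z_var near 1
```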
Summary: sampling distribution of Ȳ
If (Y1, …, Yn) are i.i.d. and 0 < σ²Y < ∞, then
E(Ȳ) = μY, so Ȳ is an unbiased estimator of μY
var(Ȳ) = σ²Y/n
When n is large, the sampling distribution simplifies:
Ȳ →p μY (by the LLN, Ȳ is consistent)
(Ȳ − E(Ȳ))/√var(Ȳ) is approximately N(0,1) (CLT)
(b) Why use Ȳ to estimate μY?
Ȳ is unbiased: E(Ȳ) = μY
Ȳ is consistent: Ȳ →p μY
Footnote: Later on we will see that Ȳ is the least squares estimator of μY; that is, Ȳ solves
min_m Σ_{i=1}^n (Yi − m)²
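The least squares property can be illustrated with a small grid search; the data values below are hypothetical:

```python
# Sketch of the least squares property: among candidate values m, the
# sample average minimizes sum_i (Y_i - m)^2. The data are hypothetical.
Y = [1.0, 3.0, 4.0, 8.0]
Y_bar = sum(Y) / len(Y)  # 4.0

def ssq(m):
    return sum((y - m) ** 2 for y in Y)

# Grid search over candidates around Y_bar; the minimizer is Y_bar itself.
candidates = [Y_bar + d / 100 for d in range(-200, 201)]
best = min(candidates, key=ssq)
print(best)  # 4.0
```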
Why use Ȳ to estimate μY, ctd.
Ȳ has a smaller variance than all other linear unbiased estimators: for example, consider the estimator
μ̂Y = (1/n) Σ_{i=1}^n ai·Yi, where {ai} are such that μ̂Y is unbiased
Then it can be shown that var(Ȳ) ≤ var(μ̂Y)
Ȳ isn't the only estimator of μY. Can you think of a case where you might want to use the median instead?
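One classic case is data with large outliers. A sketch, using hypothetical income figures:

```python
import statistics

# Sketch of a case for the median: with a large outlier, the sample average
# moves a lot while the median barely moves. The income figures are hypothetical.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000]
with_outlier = incomes + [10_000_000]

mean_shift = statistics.mean(with_outlier) - statistics.mean(incomes)
median_shift = statistics.median(with_outlier) - statistics.median(incomes)
print(mean_shift, median_shift)  # mean jumps by 1,660,000; median by only 2,500
```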