
Economics 420

Introduction to Econometrics
Professor Woodbury
Fall Semester 2015
Experiments and Basic Statistics
1. The probability framework for statistical inference
2. Experiments and estimation
3. Hypothesis testing
4. Confidence intervals

2. Estimation

Almost everything we do in econometrics involves averages or
weighted averages of a sample of data
So we need to understand the distributions of sample averages
Again, we will assume simple random sampling
Choose n individuals (districts, entities) at random from the
population
The n observations in the sample are denoted Y1, …, Yn, where
Y1 is the first observation, Y2 is the second, and so on
So the data set is (Y1, Y2, …, Yn), where Yi = value of Y for the
ith individual (district, entity, unit) sampled

Distribution of Y1, …, Yn under simple random sampling

Because individuals #1 and #2 are selected at random, the
value of Y1 has no information content for Y2
It follows that:
First, Y1 and Y2 are independently distributed
Second, Y1 and Y2 come from the same distribution; that is,
Y1 and Y2 are identically distributed
So under simple random sampling, Y1 and Y2 are independently
and identically distributed (i.i.d. for short)
More generally, under simple random sampling, {Yi}, i = 1, …, n,
are i.i.d.

Sample average (or mean)

The sample average Ȳ is the natural estimator of the mean μ_Y:

Ȳ = (Y1 + Y2 + … + Yn)/n = (1/n) Σi Yi

(the sum runs over i = 1, …, n)

But

What are the properties of Ȳ?

Why should we use Ȳ rather than some other estimator?
For example, Y1 (the first observation), or maybe use unequal
weights (not a simple average), or the median
The starting point for answering these questions is the sampling
distribution of Ȳ
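To make this concrete, here is a minimal simulation sketch in Python (assuming numpy is available; the Bernoulli population with p = 0.78, the sample size, and the seed are illustrative choices that anticipate the example used later in these notes). Each hypothetical sample of size n yields its own value of Ȳ:

import numpy as np

rng = np.random.default_rng(0)
n = 100          # sample size
reps = 10_000    # number of hypothetical samples drawn from the population

# Population: Bernoulli with p = 0.78
samples = rng.binomial(1, 0.78, size=(reps, n))
ybars = samples.mean(axis=1)            # one value of Y-bar per sample

print(ybars[:5])                        # Y-bar differs from sample to sample
print(ybars.mean(), ybars.std())        # center and spread of the sampling distribution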

a. The sampling distribution of Ȳ

Ȳ is a random variable, and its properties are determined by the
sampling distribution of Ȳ

The individuals in the sample are drawn at random
So the values of (Y1, …, Yn) are random
And functions of (Y1, …, Yn), such as Ȳ, are random
If a different sample had been drawn, Ȳ would have taken a
different value

The distribution of Ȳ over different possible samples of size n
is called the sampling distribution of Ȳ

The mean and variance of Ȳ are the mean and variance of its
sampling distribution, E(Ȳ) and var(Ȳ)

The concept of the sampling distribution underpins all of
econometrics, so you need to understand it

Mean and variance of the sampling distribution of Ȳ

First, the mean: If n = 2 (so we have two observations), we
want to know the mean of the average of Y1 and Y2, which is
E[(1/2)(Y1 + Y2)], or (1/2) E(Y1 + Y2)
It turns out that E(Y1 + Y2)
= E(Y1) + E(Y2)
= μ_Y + μ_Y
= 2 μ_Y
So the mean of the sample average is:
(1/2) E(Y1 + Y2) = (1/2)(2 μ_Y) = μ_Y

The mean and variance of the sampling distribution of Ȳ

In general, that is, for Yi i.i.d. from any distribution (not
just Bernoulli):

Mean: E(Ȳ) = E[(1/n) Σi Yi] = (1/n) Σi E(Yi) = (1/n)(n μ_Y) = μ_Y

Variance:

var(Ȳ) = E[Ȳ − E(Ȳ)]²
= E[Ȳ − μ_Y]²
= E[{(1/n) Σi Yi} − μ_Y]²
= E[(1/n) Σi (Yi − μ_Y)]²

(all sums run over i = 1, …, n)

What about the variance? Go back to n = 2

It turns out that var(Y1 + Y2) = var(Y1) + var(Y2)
= σ²_Y + σ²_Y
as long as Y1 and Y2 are independent (which they are in this
case)

So var(Y1 + Y2) = 2 σ²_Y

It turns out that var(aY1 + bY2) = a² σ²_Y1 + 2ab cov(Y1, Y2) + b² σ²_Y2

And if you apply that to our case, a = b = 1/2, and cov(Y1, Y2) = 0, so

var(Ȳ) = (1/4)(2 σ²_Y) = (1/2) σ²_Y
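As a quick numerical sanity check, here is a sketch in Python (numpy assumed; the normal population, σ_Y = 2, sample length, and seed are arbitrary illustrative choices) confirming var(aY1 + bY2) = a² var(Y1) + 2ab cov(Y1, Y2) + b² var(Y2):

import numpy as np

rng = np.random.default_rng(1)
a, b = 0.5, 0.5                        # the weights that turn Y1, Y2 into Y-bar
y1 = rng.normal(0.0, 2.0, 500_000)     # sigma_Y = 2, so var(Y) = 4
y2 = rng.normal(0.0, 2.0, 500_000)     # independent draws, so cov(Y1, Y2) ~ 0

lhs = np.var(a * y1 + b * y2)
rhs = a**2 * np.var(y1) + 2 * a * b * np.cov(y1, y2)[0, 1] + b**2 * np.var(y2)
print(lhs, rhs)                        # both close to var(Y)/2 = 2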

Mean and variance of sampling distribution of Ȳ, ctd.

So in general,

E(Ȳ) = μ_Y

var(Ȳ) = σ²_Y / n

and the standard deviation of Ȳ is the square root of the
variance:

SD(Ȳ) = σ_Ȳ = σ_Y / √n

Implications:

1. Ȳ is an unbiased estimator of μ_Y (that is, E(Ȳ) = μ_Y)
2. var(Ȳ) is inversely proportional to n
the spread of the sampling distribution is
proportional to 1/√n
Thus the sampling uncertainty associated with Ȳ is
proportional to 1/√n (larger samples, less
uncertainty, but square-root law); the sketch below checks
both implications by simulation
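A minimal Monte Carlo sketch of both implications (Python with numpy assumed; the population mean, standard deviation, replication count, and seed are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 5.0, 3.0                   # population mean and standard deviation
for n in (10, 100, 1000):
    # 20,000 simulated samples of size n, one Y-bar each
    ybars = rng.normal(mu, sigma, size=(20_000, n)).mean(axis=1)
    # mean of Y-bar ~ mu (unbiased); SD of Y-bar ~ sigma/sqrt(n) (square-root law)
    print(n, ybars.mean(), ybars.std(), sigma / np.sqrt(n))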

What does all this imply? (And why does it matter?)

The first statement

E(Ȳ) = μ_Y

says that Ȳ is an unbiased estimator of μ_Y

The second statement

var(Ȳ) = σ²_Y / n

says Ȳ is a consistent estimator of μ_Y

That is, var(Ȳ) is inversely proportional to n; it gets smaller
as n gets larger (makes sense)
That means the spread of the sampling distribution is
proportional to 1/√n ... OR

The sampling uncertainty associated with Ȳ is proportional to
1/√n; larger samples, less uncertainty

How do we know? The sampling distribution of Ȳ when n is large

For small sample sizes, the distribution of Ȳ is complicated, but if
n is large, the sampling distribution is pretty simple

As n increases, the distribution of Ȳ becomes more tightly
centered around μ_Y:
Law of Large Numbers and consistency

Moreover, the distribution of (Ȳ − μ_Y) becomes normal:
Central Limit Theorem and approximate normality

The Law of Large Numbers (LLN) and consistency

The LLN says that Ȳ will be near μ_Y when n is large, with very
high probability
(LLN is sometimes called the law of averages, but this can be
misleading)

This implies that Ȳ is a consistent estimator of μ_Y

What is consistency?
An estimator is consistent if the probability that it falls within
an interval of the true population value tends to 1 (unity) as
the sample size increases

So, if (Y1, …, Yn) are i.i.d. and σ²_Y < ∞, then Ȳ is a consistent
estimator of μ_Y, that is,

Pr[|Ȳ − μ_Y| < ε] → 1 as n → ∞

For any ε I pick, no matter how small, this will be true

This can also be written, Ȳ →p μ_Y, which means "Ȳ converges in
probability to μ_Y"

(Here is the math: as n → ∞, var(Ȳ) = σ²_Y/n → 0, which implies
that Pr[|Ȳ − μ_Y| < ε] → 1.)


LLN illustrated

The figures show sampling distributions of Ȳ, when Ȳ is the
sample average of n Bernoulli random variables with
p = P(Yi = 1) = 0.78
The variance of the sampling distribution of Ȳ decreases as n gets
larger, so the sampling distribution becomes more tightly
concentrated around the true mean μ_Y = 0.78 as n increases
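The same point can be reproduced numerically. Here is a minimal sketch (Python, numpy assumed; ε = 0.02, the replication count, and the seed are arbitrary choices) showing Pr[|Ȳ − 0.78| < ε] climbing toward 1 as n grows:

import numpy as np

rng = np.random.default_rng(3)
p, eps, reps = 0.78, 0.02, 20_000
for n in (25, 100, 1000, 10_000):
    # The sum of n Bernoulli(p) draws is Binomial(n, p), so sample counts directly
    ybars = rng.binomial(n, p, size=reps) / n
    print(n, np.mean(np.abs(ybars - p) < eps))   # estimated Pr[|Y-bar - p| < eps]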

The Central Limit Theorem (CLT) and approximate normality

The CLT says that the distribution of Ȳ is well approximated
by the normal distribution when n is large

So if (Y1, …, Yn) are i.i.d. and 0 < σ²_Y < ∞, then when n is large
the distribution of Ȳ is approximately normal

That is, Ȳ is approximately distributed N(μ_Y, σ²_Y/n),
which means normally distributed with mean μ_Y and variance
σ²_Y/n

CLT illustrated

It is easier to see the CLT in action if we first standardize Ȳ

Standardizing just means
(1) subtracting the mean (μ_Y) so the mean is 0
(2) dividing by the standard deviation so the variance is 1

So the standardized Ȳ,

(Ȳ − E(Ȳ)) / √var(Ȳ) = (Ȳ − μ_Y) / (σ_Y/√n),

is approximately distributed as N(0,1) (standard normal)

The larger is n, the better is the approximation
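A sketch of this standardization in Python (numpy assumed; the Bernoulli example with p = 0.78, n = 1,000, the replication count, and the seed are illustrative choices): the empirical quantiles of the standardized Ȳ line up with standard normal quantiles.

import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 0.78, 1000, 50_000
sigma = np.sqrt(p * (1 - p))                  # sigma_Y for a Bernoulli(p) variable
ybars = rng.binomial(n, p, size=reps) / n
z = (ybars - p) / (sigma / np.sqrt(n))        # the standardized Y-bar

# Empirical quantiles of z should be close to standard normal quantiles
print(np.quantile(z, [0.025, 0.5, 0.975]))    # roughly -1.96, 0, 1.96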

These figures again show sampling distributions of Ȳ (the
sample average of n Bernoulli random variables with p = 0.78),
but this time after standardizing Ȳ

Standardizing magnifies the distributions (by a factor of √n) so
you can see what is going on better
When the sample size is large, the sampling distributions are
increasingly well approximated by the normal distribution
(shown by the solid line), as the CLT predicts
The normal distribution is scaled so that the height of the
distribution is about the same in all the figures

[Figures: sampling distributions of the standardized sample
average, (Ȳ − E(Ȳ)) / √var(Ȳ), for the same Bernoulli example]

Summary: The Sampling Distribution of Ȳ

If Y1, …, Yn are i.i.d. with 0 < σ²_Y < ∞, then

The exact (finite sample) sampling distribution of Ȳ has mean
μ_Y, so Ȳ is an unbiased estimator of μ_Y

Also, the exact sampling distribution of Ȳ has variance σ²_Y/n,
which shrinks to zero as n grows, so Ȳ is a consistent estimator
of μ_Y

Other than its mean and variance, the exact distribution of Ȳ
is complicated and depends on the distribution of Y (the
population distribution)

When n is large, the sampling distribution simplifies:

Ȳ →p μ_Y (by the LLN, Ȳ is consistent)

(Ȳ − E(Ȳ)) / √var(Ȳ) is approximately N(0,1) (by the CLT, Ȳ is
approximately normal)

(b) Why use Ȳ to estimate μ_Y?

Ȳ is unbiased: E(Ȳ) = μ_Y

Ȳ is consistent: Ȳ →p μ_Y (Ȳ converges in probability to μ_Y)

Footnote: Later on we will see that Ȳ is the least squares
estimator of μ_Y
That is, Ȳ solves

min over m of Σi (Yi − m)²

so Ȳ minimizes the sum of squared residuals

Optional derivation (also see App. 3.2):

d/dm Σi (Yi − m)² = Σi d/dm (Yi − m)² = −2 Σi (Yi − m)

Set the derivative to zero and denote the optimal value of m by m̂:
Σi (Yi − m̂) = 0, so Σi Yi = n m̂, which gives m̂ = (1/n) Σi Yi = Ȳ
(the sketch below confirms this numerically)
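Here is a minimal numerical confirmation (Python, numpy assumed; the skewed exponential sample, grid resolution, and seed are arbitrary): a grid search shows the sum of squared residuals is minimized at the sample mean, while the sum of absolute residuals is minimized at the median, anticipating the question on the next slide.

import numpy as np

rng = np.random.default_rng(5)
y = rng.exponential(2.0, size=200)       # a skewed sample, so mean != median

grid = np.linspace(y.min(), y.max(), 10_001)     # candidate values of m
ssr = ((y[:, None] - grid) ** 2).sum(axis=0)     # sum of squared residuals at each m
sar = np.abs(y[:, None] - grid).sum(axis=0)      # sum of absolute residuals at each m

print(grid[ssr.argmin()], y.mean())       # SSR minimizer ~ the sample mean
print(grid[sar.argmin()], np.median(y))   # SAR minimizer ~ the sample median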

Why use Ȳ to estimate μ_Y? (continued)

Ȳ has a smaller variance than all other linear unbiased estimators

For example, consider the estimator

Ỹ = (1/n) Σi aiYi, where the weights {ai} are such that Ỹ is unbiased

Then it can be shown that var(Ȳ) ≤ var(Ỹ)
(the sketch below illustrates this variance ranking)

Ȳ isn't the only estimator of μ_Y; can you think of a case
where you might want to use the median instead?
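A simulation sketch of that variance ranking (Python, numpy assumed; the weights, normal population, replication count, and seed are arbitrary illustrative choices): both estimators are unbiased, but the unequal-weights estimator is noisier.

import numpy as np

rng = np.random.default_rng(6)
n, reps = 50, 100_000
a = np.linspace(0.5, 1.5, n)            # unequal weights with mean 1, so still unbiased
samples = rng.normal(10.0, 2.0, size=(reps, n))

ybar = samples.mean(axis=1)             # equal weights: the sample average
alt = (samples * a).mean(axis=1)        # (1/n) * sum of a_i * Y_i

print(ybar.mean(), alt.mean())          # both ~ 10: both estimators are unbiased
print(ybar.var(), alt.var())            # the unequal-weights estimator has larger variance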

What have we learned?

The sampling distribution of Ȳ is the distribution of Ȳ over
different possible samples of size n
The mean of the sampling distribution of Ȳ, E(Ȳ), is μ_Y
The variance of the sampling distribution of Ȳ, var(Ȳ), is
σ²_Y/n

Details on the sampling distribution of Ȳ (extra stuff)

Example: Suppose Y can be 0 or 1 (a Bernoulli random
variable) with the probability distribution,
Pr[Y = 0] = 0.22, and Pr[Y = 1] = 0.78
Then
E(Y) = p × 1 + (1 − p) × 0 = p = 0.78
and
σ²_Y = E[Y − E(Y)]² = p(1 − p)
= 0.78 × (1 − 0.78) = 0.1716

[from your stats course]

The sampling distribution of Ȳ depends on n

Consider a sample of size 2 (n = 2)
The sampling distribution of Ȳ is
Pr(Ȳ = 0) = 0.22² = 0.0484 (only one way this can happen)
Pr(Ȳ = 1/2) = (0.22 × 0.78) + (0.78 × 0.22) = 0.3432
(two ways this can happen)
Pr(Ȳ = 1) = 0.78² = 0.6084 (again, one way this can happen)
(the enumeration below reproduces these probabilities)
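This small distribution can be enumerated directly; here is a sketch in Python (standard library only):

from itertools import product

p = 0.78
probs = {}
for y1, y2 in product([0, 1], repeat=2):        # all four possible samples of size 2
    pr = (p if y1 else 1 - p) * (p if y2 else 1 - p)
    ybar = (y1 + y2) / 2
    probs[ybar] = probs.get(ybar, 0.0) + pr

print(probs)   # approximately {0.0: 0.0484, 0.5: 0.3432, 1.0: 0.6084}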

[Figures: the sampling distribution of Ȳ when Y is Bernoulli
(p = 0.78), for different sample sizes n]

Things we want to know about the sampling distribution

What is the mean of Ȳ?
If E(Ȳ) = μ_Y = 0.78 (the true mean), then Ȳ is an unbiased
estimator of μ_Y

What is the variance of Ȳ?
Does var(Ȳ) depend on n? (yes: the σ²_Y/n formula)
Does Ȳ get close to μ_Y when n is large?
Law of large numbers: Ȳ is a consistent estimator of μ_Y

Ȳ appears bell shaped for n large
Is this generally true?
Yes, Ȳ is approximately normally distributed for large n
(Central Limit Theorem)
