
Economics 420

Introduction to Econometrics
Professor Woodbury
Fall Semester 2015
Experiments and Basic Statistics
1. The probability framework for statistical inference
2. Experiments and estimation
3. Hypothesis testing
4. Confidence intervals

2. Estimation

Almost everything we do in econometrics involves averages or
weighted averages of a sample of data
So we need to understand the distributions of sample averages
Again, we will assume simple random sampling
Choose n individuals (districts, entities) at random from the
population
The n observations in the sample are denoted Y1, …, Yn, where
Y1 is the first observation, Y2 is the second, and so on
So the data set is (Y1, Y2, …, Yn), where Yi = value of Y for the
ith individual (district, entity, unit) sampled

Distribution of Y1, …, Yn under simple random sampling

Because individuals #1 and #2 are selected at random, the
value of Y1 has no information content for Y2
It follows that:
First, Y1 and Y2 are independently distributed
Second, Y1 and Y2 come from the same distribution; that is,
Y1 and Y2 are identically distributed
So under simple random sampling, Y1 and Y2 are independently
and identically distributed (i.i.d. for short)
More generally, under simple random sampling, {Yi}, i = 1, …, n,
are i.i.d.

Sample average (or mean)

The sample average Ȳ is the natural estimator of the mean μ_Y:

Ȳ = (Y1 + Y2 + … + Yn)/n = (1/n) Σi Yi

(the sum runs over i = 1, …, n)

But

What are the properties of Ȳ?

Why should we use Ȳ rather than some other estimator?
For example, Y1 (the first observation), or maybe use unequal
weights (not a simple average), or the median
The starting point for answering these questions is the sampling
distribution of Ȳ
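To make this concrete, here is a minimal simulation sketch in Python (assuming numpy is available; the Bernoulli population with p = 0.78, the sample size, and the seed are illustrative choices that anticipate the example used later in these notes). Each hypothetical sample of size n yields its own value of Ȳ:

import numpy as np

rng = np.random.default_rng(0)
n = 100          # sample size
reps = 10_000    # number of hypothetical samples drawn from the population

# Population: Bernoulli with p = 0.78
samples = rng.binomial(1, 0.78, size=(reps, n))
ybars = samples.mean(axis=1)            # one value of Y-bar per sample

print(ybars[:5])                        # Y-bar differs from sample to sample
print(ybars.mean(), ybars.std())        # center and spread of the sampling distribution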

a. The sampling distribution of Ȳ

Ȳ is a random variable, and its properties are determined by the
sampling distribution of Ȳ

The individuals in the sample are drawn at random
So the values of (Y1, …, Yn) are random
And functions of (Y1, …, Yn), such as Ȳ, are random
If a different sample had been drawn, Ȳ would have taken a
different value

The distribution of Ȳ over different possible samples of size n
is called the sampling distribution of Ȳ

The mean and variance of Ȳ are the mean and variance of its
sampling distribution, E(Ȳ) and var(Ȳ)

The concept of the sampling distribution underpins all of
econometrics, so you need to understand it

Mean and variance of the sampling distribution of Ȳ

First, the mean: If n = 2 (so we have two observations), we
want to know the mean of the average of Y1 and Y2, which is
E[(1/2)(Y1 + Y2)], or (1/2) E(Y1 + Y2)
It turns out that E(Y1 + Y2)
= E(Y1) + E(Y2)
= μ_Y + μ_Y
= 2 μ_Y
So the mean of the sample average is:
(1/2) E(Y1 + Y2) = (1/2)(2 μ_Y) = μ_Y

The mean and variance of the sampling distribution of Ȳ

In general, that is, for Yi i.i.d. from any distribution (not
just Bernoulli):

Mean: E(Ȳ) = E[(1/n) Σi Yi] = (1/n) Σi E(Yi) = (1/n)(n μ_Y) = μ_Y

Variance:

var(Ȳ) = E[Ȳ − E(Ȳ)]²
= E[Ȳ − μ_Y]²
= E[{(1/n) Σi Yi} − μ_Y]²
= E[(1/n) Σi (Yi − μ_Y)]²

(all sums run over i = 1, …, n)

What about the variance? Go back to n = 2

It turns out that var(Y1 + Y2) = var(Y1) + var(Y2)
= σ²_Y + σ²_Y
as long as Y1 and Y2 are independent (which they are in this
case)

So var(Y1 + Y2) = 2 σ²_Y

It turns out that var(aY1 + bY2) = a² σ²_Y1 + 2ab cov(Y1, Y2) + b² σ²_Y2

And if you apply that to our case, a = b = 1/2, and cov(Y1, Y2) = 0, so

var(Ȳ) = (1/4)(2 σ²_Y) = (1/2) σ²_Y
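As a quick numerical sanity check, here is a sketch in Python (numpy assumed; the normal population, σ_Y = 2, sample length, and seed are arbitrary illustrative choices) confirming var(aY1 + bY2) = a² var(Y1) + 2ab cov(Y1, Y2) + b² var(Y2):

import numpy as np

rng = np.random.default_rng(1)
a, b = 0.5, 0.5                        # the weights that turn Y1, Y2 into Y-bar
y1 = rng.normal(0.0, 2.0, 500_000)     # sigma_Y = 2, so var(Y) = 4
y2 = rng.normal(0.0, 2.0, 500_000)     # independent draws, so cov(Y1, Y2) ~ 0

lhs = np.var(a * y1 + b * y2)
rhs = a**2 * np.var(y1) + 2 * a * b * np.cov(y1, y2)[0, 1] + b**2 * np.var(y2)
print(lhs, rhs)                        # both close to var(Y)/2 = 2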

Mean and variance of sampling distribution of Ȳ, ctd.

So in general,

E(Ȳ) = μ_Y

var(Ȳ) = σ²_Y / n

and the standard deviation of Ȳ is the square root of the
variance:

SD(Ȳ) = σ_Ȳ = σ_Y / √n

Implications:

1. Ȳ is an unbiased estimator of μ_Y (that is, E(Ȳ) = μ_Y)
2. var(Ȳ) is inversely proportional to n
the spread of the sampling distribution is
proportional to 1/√n
Thus the sampling uncertainty associated with Ȳ is
proportional to 1/√n (larger samples, less
uncertainty, but square-root law); the sketch below checks
both implications by simulation
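A minimal Monte Carlo sketch of both implications (Python with numpy assumed; the population mean, standard deviation, replication count, and seed are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 5.0, 3.0                   # population mean and standard deviation
for n in (10, 100, 1000):
    # 20,000 simulated samples of size n, one Y-bar each
    ybars = rng.normal(mu, sigma, size=(20_000, n)).mean(axis=1)
    # mean of Y-bar ~ mu (unbiased); SD of Y-bar ~ sigma/sqrt(n) (square-root law)
    print(n, ybars.mean(), ybars.std(), sigma / np.sqrt(n))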

What does all this imply? (And why does it matter?)

The first statement

E(Ȳ) = μ_Y

says that Ȳ is an unbiased estimator of μ_Y

The second statement

var(Ȳ) = σ²_Y / n

says Ȳ is a consistent estimator of μ_Y

That is, var(Ȳ) is inversely proportional to n; it gets smaller
as n gets larger (makes sense)
That means the spread of the sampling distribution is
proportional to 1/√n ... OR

The sampling uncertainty associated with Ȳ is proportional to
1/√n; larger samples, less uncertainty

How do we know? The sampling distribution of Ȳ when n is large

For small sample sizes, the distribution of Ȳ is complicated, but if
n is large, the sampling distribution is pretty simple

As n increases, the distribution of Ȳ becomes more tightly
centered around μ_Y:
Law of Large Numbers and consistency

Moreover, the distribution of (Ȳ − μ_Y) becomes normal:
Central Limit Theorem and approximate normality

The Law of Large Numbers (LLN) and consistency

The LLN says that Ȳ will be near μ_Y when n is large, with very
high probability
(LLN is sometimes called the law of averages, but this can be
misleading)

This implies that Ȳ is a consistent estimator of μ_Y

What is consistency?
An estimator is consistent if the probability that it falls within
an interval of the true population value tends to 1 (unity) as
the sample size increases

So, if (Y1, …, Yn) are i.i.d. and σ²_Y < ∞, then Ȳ is a consistent
estimator of μ_Y, that is,

Pr[|Ȳ − μ_Y| < ε] → 1 as n → ∞

For any ε I pick, no matter how small, this will be true

This can also be written, Ȳ →p μ_Y, which means "Ȳ converges in
probability to μ_Y"

(Here is the math: as n → ∞, var(Ȳ) = σ²_Y/n → 0, which implies
that Pr[|Ȳ − μ_Y| < ε] → 1.)


LLN illustrated

The figures show sampling distributions of Ȳ, when Ȳ is the
sample average of n Bernoulli random variables with
p = P(Yi = 1) = 0.78
The variance of the sampling distribution of Ȳ decreases as n gets
larger, so the sampling distribution becomes more tightly
concentrated around the true mean μ_Y = 0.78 as n increases
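The same point can be reproduced numerically. Here is a minimal sketch (Python, numpy assumed; ε = 0.02, the replication count, and the seed are arbitrary choices) showing Pr[|Ȳ − 0.78| < ε] climbing toward 1 as n grows:

import numpy as np

rng = np.random.default_rng(3)
p, eps, reps = 0.78, 0.02, 20_000
for n in (25, 100, 1000, 10_000):
    # The sum of n Bernoulli(p) draws is Binomial(n, p), so sample counts directly
    ybars = rng.binomial(n, p, size=reps) / n
    print(n, np.mean(np.abs(ybars - p) < eps))   # estimated Pr[|Y-bar - p| < eps]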

The Central Limit Theorem (CLT) and approximate normality

The CLT says that the distribution of Ȳ is well approximated
by the normal distribution when n is large

So if (Y1, …, Yn) are i.i.d. and 0 < σ²_Y < ∞, then when n is large
the distribution of Ȳ is approximately normal

That is, Ȳ is approximately distributed N(μ_Y, σ²_Y/n),
which means normally distributed with mean μ_Y and variance
σ²_Y/n

CLT illustrated

It is easier to see the CLT in action if we first standardize Ȳ

Standardizing just means
(1) subtracting the mean (μ_Y) so the mean is 0
(2) dividing by the standard deviation so the variance is 1

So the standardized Ȳ,

(Ȳ − E(Ȳ)) / √var(Ȳ) = (Ȳ − μ_Y) / (σ_Y/√n),

is approximately distributed as N(0,1) (standard normal)

The larger is n, the better is the approximation
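A sketch of this standardization in Python (numpy assumed; the Bernoulli example with p = 0.78, n = 1,000, the replication count, and the seed are illustrative choices): the empirical quantiles of the standardized Ȳ line up with standard normal quantiles.

import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 0.78, 1000, 50_000
sigma = np.sqrt(p * (1 - p))                  # sigma_Y for a Bernoulli(p) variable
ybars = rng.binomial(n, p, size=reps) / n
z = (ybars - p) / (sigma / np.sqrt(n))        # the standardized Y-bar

# Empirical quantiles of z should be close to standard normal quantiles
print(np.quantile(z, [0.025, 0.5, 0.975]))    # roughly -1.96, 0, 1.96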

These figures again show sampling distributions of Ȳ (the
sample average of n Bernoulli random variables with p = 0.78),
but this time after standardizing Ȳ

Standardizing magnifies the distributions (by a factor of √n) so
you can see what is going on better
When the sample size is large, the sampling distributions are
increasingly well approximated by the normal distribution
(shown by the solid line), as the CLT predicts
The normal distribution is scaled so that the height of the
distribution is about the same in all the figures

[Figures: sampling distributions of the standardized sample
average, (Ȳ − E(Ȳ)) / √var(Ȳ), for the same Bernoulli example]

Summary: The Sampling Distribution of Ȳ

If Y1, …, Yn are i.i.d. with 0 < σ²_Y < ∞, then

The exact (finite sample) sampling distribution of Ȳ has mean
μ_Y, so Ȳ is an unbiased estimator of μ_Y

Also, the exact sampling distribution of Ȳ has variance σ²_Y/n,
which shrinks to zero as n grows, so Ȳ is a consistent estimator
of μ_Y

Other than its mean and variance, the exact distribution of Ȳ
is complicated and depends on the distribution of Y (the
population distribution)

When n is large, the sampling distribution simplifies:

Ȳ →p μ_Y (by the LLN, Ȳ is consistent)

(Ȳ − E(Ȳ)) / √var(Ȳ) is approximately N(0,1) (by the CLT, Ȳ is
approximately normal)

(b) Why use Ȳ to estimate μ_Y?

Ȳ is unbiased: E(Ȳ) = μ_Y

Ȳ is consistent: Ȳ →p μ_Y (Ȳ converges in probability to μ_Y)

Footnote: Later on we will see that Ȳ is the least squares
estimator of μ_Y
That is, Ȳ solves

min over m of Σi (Yi − m)²

so Ȳ minimizes the sum of squared residuals

Optional derivation (also see App. 3.2):

d/dm Σi (Yi − m)² = Σi d/dm (Yi − m)² = −2 Σi (Yi − m)

Set the derivative to zero and denote the optimal value of m by m̂:
Σi (Yi − m̂) = 0, so Σi Yi = n m̂, which gives m̂ = (1/n) Σi Yi = Ȳ
(the sketch below confirms this numerically)
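Here is a minimal numerical confirmation (Python, numpy assumed; the skewed exponential sample, grid resolution, and seed are arbitrary): a grid search shows the sum of squared residuals is minimized at the sample mean, while the sum of absolute residuals is minimized at the median, anticipating the question on the next slide.

import numpy as np

rng = np.random.default_rng(5)
y = rng.exponential(2.0, size=200)       # a skewed sample, so mean != median

grid = np.linspace(y.min(), y.max(), 10_001)     # candidate values of m
ssr = ((y[:, None] - grid) ** 2).sum(axis=0)     # sum of squared residuals at each m
sar = np.abs(y[:, None] - grid).sum(axis=0)      # sum of absolute residuals at each m

print(grid[ssr.argmin()], y.mean())       # SSR minimizer ~ the sample mean
print(grid[sar.argmin()], np.median(y))   # SAR minimizer ~ the sample median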

Why use Ȳ to estimate μ_Y? (continued)

Ȳ has a smaller variance than all other linear unbiased estimators

For example, consider the estimator

Ỹ = (1/n) Σi aiYi, where the weights {ai} are such that Ỹ is unbiased

Then it can be shown that var(Ȳ) ≤ var(Ỹ)
(the sketch below illustrates this variance ranking)

Ȳ isn't the only estimator of μ_Y; can you think of a case
where you might want to use the median instead?
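A simulation sketch of that variance ranking (Python, numpy assumed; the weights, normal population, replication count, and seed are arbitrary illustrative choices): both estimators are unbiased, but the unequal-weights estimator is noisier.

import numpy as np

rng = np.random.default_rng(6)
n, reps = 50, 100_000
a = np.linspace(0.5, 1.5, n)            # unequal weights with mean 1, so still unbiased
samples = rng.normal(10.0, 2.0, size=(reps, n))

ybar = samples.mean(axis=1)             # equal weights: the sample average
alt = (samples * a).mean(axis=1)        # (1/n) * sum of a_i * Y_i

print(ybar.mean(), alt.mean())          # both ~ 10: both estimators are unbiased
print(ybar.var(), alt.var())            # the unequal-weights estimator has larger variance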

What have we learned?

The sampling distribution of Ȳ is the distribution of Ȳ over
different possible samples of size n
The mean of the sampling distribution of Ȳ, E(Ȳ), is μ_Y
The variance of the sampling distribution of Ȳ, var(Ȳ), is
σ²_Y/n

Details on the sampling distribution of Ȳ (extra stuff)

Example: Suppose Y can be 0 or 1 (a Bernoulli random
variable) with the probability distribution,
Pr[Y = 0] = 0.22, and Pr[Y = 1] = 0.78
Then
E(Y) = p × 1 + (1 − p) × 0 = p = 0.78
and
σ²_Y = E[Y − E(Y)]² = p(1 − p)
= 0.78 × (1 − 0.78) = 0.1716

[from your stats course]

The sampling distribution of Ȳ depends on n

Consider a sample of size 2 (n = 2)
The sampling distribution of Ȳ is
Pr(Ȳ = 0) = 0.22² = 0.0484 (only one way this can happen)
Pr(Ȳ = 1/2) = (0.22 × 0.78) + (0.78 × 0.22) = 0.3432
(two ways this can happen)
Pr(Ȳ = 1) = 0.78² = 0.6084 (again, one way this can happen)
(the enumeration below reproduces these probabilities)
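This small distribution can be enumerated directly; here is a sketch in Python (standard library only):

from itertools import product

p = 0.78
probs = {}
for y1, y2 in product([0, 1], repeat=2):        # all four possible samples of size 2
    pr = (p if y1 else 1 - p) * (p if y2 else 1 - p)
    ybar = (y1 + y2) / 2
    probs[ybar] = probs.get(ybar, 0.0) + pr

print(probs)   # approximately {0.0: 0.0484, 0.5: 0.3432, 1.0: 0.6084}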

[Figures: the sampling distribution of Ȳ when Y is Bernoulli
(p = 0.78), for different sample sizes n]

Things we want to know about the sampling distribution

What is the mean of Ȳ?
If E(Ȳ) = μ_Y = 0.78 (the true mean), then Ȳ is an unbiased
estimator of μ_Y

What is the variance of Ȳ?
Does var(Ȳ) depend on n? (yes: the σ²_Y/n formula)
Does Ȳ get close to μ_Y when n is large?
Law of large numbers: Ȳ is a consistent estimator of μ_Y

Ȳ appears bell shaped for n large
Is this generally true?
Yes, Ȳ is approximately normally distributed for large n
(Central Limit Theorem)
