Estimation and Testing of Population Parameters

EC114 Introduction to Quantitative Economics

6. Estimation and Testing of Population Parameters
Department of Economics
University of Essex
15/17 November 2011
Outline
1. Motivation
2. Point Estimates
3. Confidence Intervals
4. Testing Hypotheses About a Population Mean
Reference: R. L. Thomas, Using Statistics in Economics,
McGraw-Hill, 2005, sections 3.3 and 4.1.
Motivation
We have used the Central Limit Theorem (CLT) to calculate
probabilities involving the sample mean, $\bar{X}$, assuming that
the population mean and variance, $\mu$ and $\sigma^2$, are known.
However, usually the values of the population parameters
are unknown.
This is a common situation in statistics (and econometrics),
and so we try to estimate the unknown parameters from
the sample data.
We shall consider point estimates (a single number) as
well as confidence intervals (a range of values) for the
population mean, $\mu$.
Point Estimates
The obvious estimator of $\mu$ is the sample mean, $\bar{X}$.
It is actually a good estimator: we know that $E(\bar{X}) = \mu$.
This means that, in repeated samples, $\bar{X}$ will on average
be equal to $\mu$.
Important: we are not saying that $\bar{X}$ is equal to $\mu$, and the
statement that $\bar{X} = \mu$ is INCORRECT; it is $E(\bar{X})$ that
equals $\mu$, i.e. the mean of all $\bar{X}$ values obtained from many
samples.
There is no systematic tendency for there to be an error in
estimating $\mu$ by $\bar{X}$, i.e. no systematic tendency to
overestimate or underestimate $\mu$.
When $E(\bar{X}) = \mu$ we say that there is no bias in our
estimator and $\bar{X}$ is an unbiased estimator of $\mu$.
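The repeated-sampling meaning of $E(\bar{X}) = \mu$ can be checked numerically. The following is a minimal simulation sketch, not part of the original lecture; the normal population and the values of `mu`, `sigma`, `n` and `replications` are illustrative assumptions.

```python
import random
import statistics


def average_of_sample_means(mu, sigma, n, replications, seed=0):
    """Draw many samples of size n and average the resulting sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    sample_means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(replications)
    ]
    return statistics.fmean(sample_means)


# Hypothetical population: mean 800, standard deviation 600.
avg = average_of_sample_means(mu=800.0, sigma=600.0, n=8, replications=20000)
print(round(avg, 1))  # close to 800, although individual sample means vary widely
```

Any single sample mean can be far from `mu`; it is only the average over many samples that settles near it.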
We shall also need to estimate the population variance, $\sigma^2$.
An obvious estimator is
$$v^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}.$$
But $v^2$ is not unbiased because $E(v^2) \neq \sigma^2$.
In fact, it can be shown that
$$E(v^2) = \frac{n-1}{n}\,\sigma^2 < \sigma^2$$
so that, on average, $v^2$ underestimates $\sigma^2$.
This is depicted below:
[Figure: $E(v^2)$ lies below $\sigma^2$]
i.e. $v^2$ is a biased estimator of $\sigma^2$.
Is it possible to construct an unbiased estimator of $\sigma^2$?
An alternative estimator replaces $n$ in the denominator of $v^2$
with $n-1$:
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}.$$
The factor $n-1$ compensates for the downward bias and
we find that $s^2 > v^2$.
In fact,
$$s^2 = \frac{n}{n-1}\,v^2,$$
and so
$$E(s^2) = E\left(\frac{n}{n-1}\,v^2\right) = \frac{n}{n-1}\,E(v^2) = \frac{n}{n-1}\cdot\frac{n-1}{n}\,\sigma^2 = \sigma^2.$$
Hence $s^2$ is an unbiased estimator of $\sigma^2$.
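The bias of $v^2$ and the unbiasedness of $s^2$ can also be seen in a small simulation. This is a sketch under assumed values (normal data with `sigma = 2`, samples of size `n = 5`), not a calculation from the lecture; `statistics.pvariance` divides by $n$ and `statistics.variance` divides by $n-1$.

```python
import random
import statistics


def average_variance_estimates(sigma, n, replications, seed=1):
    """Average v^2 (divide by n) and s^2 (divide by n - 1) over many samples."""
    rng = random.Random(seed)
    v2_values, s2_values = [], []
    for _ in range(replications):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        v2_values.append(statistics.pvariance(sample))  # denominator n
        s2_values.append(statistics.variance(sample))   # denominator n - 1
    return statistics.fmean(v2_values), statistics.fmean(s2_values)


# Assumed setting: sigma^2 = 4 and n = 5, so (n - 1)/n * sigma^2 = 3.2.
mean_v2, mean_s2 = average_variance_estimates(sigma=2.0, n=5, replications=20000)
print(round(mean_v2, 2), round(mean_s2, 2))
```

With these settings the average of $v^2$ should sit near 3.2 and the average of $s^2$ near 4, in line with the two expectations derived above.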
Example. (Thomas, Example 3.8) During May, a random
sample of packages leaving a large wholesale store have
weights (in kg) as follows:
250, 2000, 720, 1200, 310, 280, 1460, 180.
Find unbiased estimates of the mean and variance of all
packages leaving the store in May.
Solution. The table on the next slide shows the calculation
of the relevant sums. We obtain:
$$\bar{X} = \frac{\sum_i X_i}{n} = \frac{6400}{8} = 800;$$
$$s^2 = \frac{\sum_i (X_i - \bar{X})^2}{n-1} = \frac{3{,}239{,}400}{7} = 462{,}771.43.$$
       $X_i$   $X_i - \bar{X}$   $(X_i - \bar{X})^2$       $X_i^2$
        250         -550               302,500              62,500
       2000         1200             1,440,000           4,000,000
        720          -80                 6,400             518,400
       1200          400               160,000           1,440,000
        310         -490               240,100              96,100
        280         -520               270,400              78,400
       1460          660               435,600           2,131,600
        180         -620               384,400              32,400
Sum    6400            0             3,239,400           8,359,400
Note that another (computationally easier) way to calculate
$s^2$ is to use
$$s^2 = \frac{\sum_i X_i^2}{n-1} - \frac{n}{n-1}\,\bar{X}^2,$$
which only involves computing $\sum_i X_i^2$ rather than
$\sum_i (X_i - \bar{X})^2$.
For the previous example,
$$s^2 = \frac{8{,}359{,}400}{7} - \frac{8}{7}\times 800^2 = 1{,}194{,}200 - 731{,}428.57 = 462{,}771.43$$
as required.
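Both routes to $s^2$ for the package weights can be reproduced in a few lines. This is only a sketch; the variable names are illustrative.

```python
# Package weights from the example (Thomas, Example 3.8).
weights = [250, 2000, 720, 1200, 310, 280, 1460, 180]
n = len(weights)

# Sample mean.
x_bar = sum(weights) / n

# Route 1: sum of squared deviations divided by n - 1.
sum_sq_dev = sum((x - x_bar) ** 2 for x in weights)
s2_definition = sum_sq_dev / (n - 1)

# Route 2: shortcut using the sum of squares of the raw data.
sum_sq = sum(x * x for x in weights)
s2_shortcut = sum_sq / (n - 1) - n / (n - 1) * x_bar ** 2

print(x_bar, round(s2_definition, 2), round(s2_shortcut, 2))
```

Both routes should agree with the values 800 and 462,771.43 obtained above, up to floating-point rounding.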
Confidence Intervals
Sometimes we wish to specify a degree of confidence in
our estimator.
One way of doing this is to indicate a range of values within
which we are 95% confident that the true parameter value
lies.
Suppose we wish to construct a confidence interval for the
population mean, $\mu$.
The aim is to find two values, $\bar{X} - E$ and $\bar{X} + E$, such that
there is a 95% probability that the range $\bar{X} - E$ to $\bar{X} + E$ will
contain $\mu$, i.e. we need to find an $E$ such that
$$\Pr(\bar{X} - E < \mu < \bar{X} + E) = 0.95.$$
Note that the centre of the range is $\bar{X}$ so we can express
the range as $\bar{X} \pm E$.
We shall make use of the Central Limit Theorem:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \;\Rightarrow\; Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$
We need to find a value, $k$, such that
$$\Pr(-k < Z < k) = 0.95$$
i.e. the value $k$ puts 2.5% of the distribution in each tail.
The tables show that $k = 1.96$; the relevant row is
z     0.00    ...  0.03    0.04    0.05    0.06    0.07    0.08    0.09
1.9   0.4713  ...  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
In terms of the $N(0, 1)$ distribution:
[Figure: $N(0, 1)$ density with 2.5% of the area in each tail beyond $\pm 1.96$]
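We have
$$\Pr(-1.96 < Z < 1.96) = 0.95;$$
substituting the expression for $Z$:
$$\Pr\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95.$$
Multiply by $\sigma/\sqrt{n}$:
$$\Pr\left(-1.96\,\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95.$$
Subtract $\bar{X}$:
$$\Pr\left(-\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95.$$
Multiply by $-1$ (remember to change the inequality signs):
$$\Pr\left(\bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}} > \mu > \bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
or (reversing the order)
$$\Pr\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95.$$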
We have therefore shown that $E = 1.96\,\sigma/\sqrt{n}$ and have
found a 95% large-sample confidence interval for $\mu$.
We can write the confidence interval as
$$\bar{X} \pm 1.96\,\frac{\sigma}{\sqrt{n}}.$$
Problem: $\sigma$ is unknown, therefore we replace it with $s$ so
that for practical purposes the confidence interval is
$$\bar{X} \pm 1.96\,\frac{s}{\sqrt{n}}.$$
Remember that we are saying there is a 95% probability
that the true (unknown) value lies in the range
$$\bar{X} - 1.96\,\frac{s}{\sqrt{n}} \quad\text{to}\quad \bar{X} + 1.96\,\frac{s}{\sqrt{n}}.$$
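The practical interval $\bar{X} \pm k\,s/\sqrt{n}$ is straightforward to compute once the sample summary is available. The helper below is a hypothetical sketch (the function name and the sample figures are assumptions, not taken from the lecture), with the multiplier `k` defaulting to 1.96 for a 95% interval.

```python
import math


def large_sample_ci(x_bar, s, n, k=1.96):
    """Return (lower, upper) for the interval x_bar +/- k * s / sqrt(n)."""
    half_width = k * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical sample summary: mean 50, standard deviation 10, n = 100.
lower, upper = large_sample_ci(x_bar=50.0, s=10.0, n=100)
print(round(lower, 2), round(upper, 2))  # half-width is 1.96 * 10 / 10 = 1.96
```

Passing a different `k` gives an interval at another confidence level, and a larger `n` shrinks the half-width in proportion to $1/\sqrt{n}$.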
If we wish to increase the probability level, say to 99%, we
need to find a $k$ such that $\Pr(-k < Z < k) = 0.99$.
The relevant row of the table is:
z     0.00    ...  0.03    0.04    0.05    0.06    0.07    0.08    0.09
2.5   0.4938  ...  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
Hence $k = 2.58$ and the 99% confidence interval for $\mu$ is
$$\bar{X} \pm 2.58\,\frac{s}{\sqrt{n}}.$$
As we increase the level of confidence the interval becomes
wider (less precise); a 100% confidence interval is
$\bar{X} \pm \infty$!
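The table look-ups used above can be reproduced with Python's standard library. This is a sketch, assuming (as the quoted rows suggest) that the table reports the area between 0 and $z$; the helper names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # the N(0, 1) distribution


def table_area(z):
    """Area under N(0, 1) between 0 and z, as tabulated in the rows above."""
    return std_normal.cdf(z) - 0.5


def critical_value(confidence):
    """Value k such that Pr(-k < Z < k) equals the confidence level."""
    return std_normal.inv_cdf(0.5 + confidence / 2)


for z in (1.96, 2.58):
    print(z, round(table_area(z), 4))

for level in (0.90, 0.95, 0.99):
    print(level, round(critical_value(level), 3))
```

The computed critical values grow with the confidence level, which is why the interval widens; the exact 99% value is slightly below the table-based 2.58 because the table is read to two decimal places.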
Example. (Thomas, Example 3.11) In 2002, the mean
weekly food expenditure of couples with two children was
164 with a standard deviation of 28. To estimate the
weekly food expenditure of such families in 2004, a sample
is to be taken. How large should the sample be if the
sample mean is to be within 2 of the true mean
expenditure with 90% confidence?
Solution. We want to find the value of $n$ such that
$$\Pr(-2 < \bar{X} - \mu < 2) = 0.90.$$
The value of $k$ such that $\Pr(-k < Z < k) = 0.90$ is
$k = 1.645$; the relevant row from the table is
z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525
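Hence $\Pr(-1.645 < Z < 1.645) = 0.90$ or
$$\Pr\left(-1.645 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.645\right) = 0.90.$$
Multiplying through by $\sigma/\sqrt{n}$ we find that
$$\Pr\left(-1.645\,\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.645\,\frac{\sigma}{\sqrt{n}}\right) = 0.90.$$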
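We require $\Pr(-2 < \bar{X} - \mu < 2) = 0.90$ so that we need to
find $n$ that satisfies
$$1.645\,\frac{\sigma}{\sqrt{n}} = 2.$$
We are told that $\sigma = 28$; hence
$$\sqrt{n} = \frac{1.645 \times 28}{2} = 23.03 \;\Rightarrow\; n = 23.03^2 = 530.3809$$
i.e. we need $n = 531$.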
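The sample-size calculation generalises to any margin and confidence level by solving $k\,\sigma/\sqrt{n} = E$ for $n$ and rounding up. The function below is an illustrative sketch (its name and signature are assumptions); it uses the exact normal quantile rather than the table value 1.645.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence):
    """Smallest n for which k * sigma / sqrt(n) is at most the margin."""
    k = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((k * sigma / margin) ** 2)


# Figures from the example: sigma = 28, margin = 2, 90% confidence.
n_required = required_sample_size(sigma=28, margin=2, confidence=0.90)
print(n_required)
```

Rounding up matters here: a sample of 530 would leave the margin slightly wider than required, so the next whole number is taken.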
Testing Hypotheses About a Population Mean
Let's return to the example of the mean income in a large
city (from Lecture 5).
In week 45 of 2006 it was found that $\mu = 17{,}670$
(population mean).
Surveying the entire population in week 45 of 2011 is too
expensive, so a random sample of size n = 400 is taken.
We want to ask: has $\mu$ increased since 2006?
We need to define a null hypothesis
$$H_0: \mu = 17{,}670 \quad (\mu \text{ is unchanged})$$
and an alternative hypothesis:
$$H_A: \mu > 17{,}670 \quad (\mu \text{ has grown}).$$
Important: the null and alternative hypotheses are always
in terms of the unknown population parameter.
The objective is to choose between $H_0$ and $H_A$ on the basis
of our sample of $n = 400$ and the sample mean $\bar{X}$.
If $\bar{X}$ is much larger than 17,670 it would make sense to
reject $H_0$ and accept $H_A$.
But how large does $\bar{X}$ need to be?
From the Central Limit Theorem we have that
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$
If $H_0$ is true, and $\mu = 17{,}670$, then consider the test statistic
$$TS = \frac{\bar{X} - 17{,}670}{\sigma/\sqrt{n}} \sim N(0, 1).$$
Important: $TS$ is only $N(0, 1)$ when $H_0$ is true.
Under $H_0$ we would expect $TS$ to be close to zero.
However, if $TS$ is sufficiently larger than zero we would
reject $H_0$.
Under $H_0$ there is only a small probability of obtaining a
value of $TS$ much larger than zero.
Suppose that $TS > 1.64$; this puts $TS$ into the top 5% of
the $N(0, 1)$ distribution.
If this occurs, we regard it as evidence against $H_0$,
because there is only a small probability of it happening if
$H_0$ were really true.
The rejection region is depicted below:
[Figure: $N(0, 1)$ density with the top 5% ($TS > 1.64$) as the rejection region]
If $TS > 1.64$ we reject $H_0$ at the 0.05 (5%) level of
significance.
The level of significance $\alpha = \Pr(\text{reject } H_0 \mid H_0 \text{ is true})$.
If $\alpha = 0.01$ we reject $H_0$ if $TS > 2.33$.
A rejection of $H_0$ at the 1% significance level is stronger
than rejection at the 5% level, but if $\alpha$ is too small we would
almost never reject $H_0$!
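Our decision rule or test criterion is:
$$\text{reject } H_0 \text{ if } TS = \frac{\bar{X} - 17{,}670}{\sigma/\sqrt{n}} > 1.64;$$
otherwise we accept $H_0$.
Another way of writing this is:
$$\text{reject } H_0 \text{ if } \bar{X} > 17{,}670 + 1.64\,\frac{\sigma}{\sqrt{n}}.$$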
Suppose, from our sample of $n = 400$ residents, we find
that $\bar{X} = 17{,}890$ and $s = 2048$.
Using $s$ in place of $\sigma$ (recall that $s^2$ is an unbiased estimator
of $\sigma^2$) we obtain
$$TS = \frac{17{,}890 - 17{,}670}{2048/\sqrt{400}} = 2.15.$$
Hence $TS > 1.64$ and we reject the null $H_0: \mu = 17{,}670$ at
the 5% significance level in favour of $H_A: \mu > 17{,}670$.
Note, however, that $TS < 2.33$ so we would not reject $H_0$ at
the 1% significance level.
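The one-tail test above can be sketched in code as follows. The function name is hypothetical, and the critical value comes from the exact normal quantile (about 1.645 and 2.326) rather than the rounded table values 1.64 and 2.33; the decisions are the same for these data.

```python
import math
from statistics import NormalDist


def one_tail_z_test(x_bar, mu_0, s, n, alpha):
    """Upper-tail test of H0: mu = mu_0 against HA: mu > mu_0."""
    ts = (x_bar - mu_0) / (s / math.sqrt(n))    # test statistic
    critical = NormalDist().inv_cdf(1 - alpha)  # puts area alpha in the upper tail
    return ts, critical, ts > critical


# Sample summary from the income example.
ts, crit_5, reject_5 = one_tail_z_test(17890, 17670, 2048, 400, alpha=0.05)
_, crit_1, reject_1 = one_tail_z_test(17890, 17670, 2048, 400, alpha=0.01)
print(round(ts, 2), reject_5, reject_1)
```

The statistic falls between the two critical values, so $H_0$ is rejected at the 5% level but not at the 1% level.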
The previous test is an example of a one-tail test because,
under $H_A$, we were only interested in values of $\mu$ greater
than 17,670.
Suppose, instead, that we write the alternative hypothesis
as:
$$H_A: \mu \neq 17{,}670.$$
Now, under $H_A$, values of $\mu$ both greater and less than
17,670 are included; this becomes a two-tail test.
There are now two critical values, one at each end of the
distribution: for a 5% level of significance these values are
$\pm 1.96$; for a 1% significance level they are $\pm 2.58$.
Note that, for a significance level $\alpha$, the critical values put
an area of $\alpha/2$ into each end of the distribution, e.g. if
$\alpha = 0.05$ then 0.025 (2.5%) goes into each end.
The two-tail rejection region is depicted below:
[Figure: $N(0, 1)$ density with an area of $\alpha/2$ in each tail as the rejection region]
Our decision rule at the 5% significance level is:
reject $H_0$ if $TS > 1.96$ or $TS < -1.96$;
otherwise we accept $H_0$.
Another way of writing this is in terms of the absolute value
of $TS$, denoted $|TS|$: reject $H_0$ if $|TS| > 1.96$.
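The two-tail rule can be sketched in the same way. The lecture does not carry out a two-tail calculation; applying the rule to the earlier sample figures ($\bar{X} = 17{,}890$, $s = 2048$, $n = 400$) is an illustration only, and the function name is an assumption.

```python
import math
from statistics import NormalDist


def two_tail_z_test(x_bar, mu_0, s, n, alpha):
    """Two-tail test of H0: mu = mu_0 against HA: mu != mu_0."""
    ts = (x_bar - mu_0) / (s / math.sqrt(n))        # test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # area alpha/2 in each tail
    return ts, critical, abs(ts) > critical


ts_two, crit_two_5, reject_two_5 = two_tail_z_test(17890, 17670, 2048, 400, alpha=0.05)
_, crit_two_1, reject_two_1 = two_tail_z_test(17890, 17670, 2048, 400, alpha=0.01)
print(round(ts_two, 2), round(crit_two_5, 2), reject_two_5, reject_two_1)
```

Because $\alpha$ is split across both tails, the two-tail critical value at a given level is larger than the one-tail value, so a result must be more extreme to lead to rejection.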
Summary
Point estimates
Confidence intervals
Testing hypotheses about a population mean
Next week:
Hypothesis testing
