CVEN2002/2702
2015 - Week 7
This lecture
CVEN2002/2702 (Statistics)
Dr Heng Lian
2015- Week 7
7.1 Introduction
Point estimation
Recall that we wish to estimate an unknown parameter $\theta$ of a
population, for instance the population mean $\mu$, from a random
sample of size $n$, say $X_1, X_2, \ldots, X_n$.
To do so, we select an estimator, which must be a statistic (i.e. a value
computable from the sample), say
$\hat\Theta = h(X_1, X_2, \ldots, X_n)$
An estimator is a random variable, which has its mean, its
variance and its probability distribution, known as the sampling
distribution.
For instance, to estimate the population mean $\mu$, we suggested using the
sample mean $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$, for which
$E(\bar X) = \mu$ and $\mathrm{Var}(\bar X) = \frac{\sigma^2}{n}$,
where $\sigma^2$ is the population variance.
Properties of estimators
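The choice of the sample mean to estimate the population mean $\mu$
seems quite natural. However, there are many other estimators that
can be used to calculate an estimate. Why not:
$\hat\Theta_1 = X_1$, the first observed value;
$\hat\Theta_2 = (X_1 + X_n)/2$;
$\hat\Theta_3 = (aX_1 + bX_n)/(a + b)$, for two constants $a$ and $b$ with $a + b \neq 0$?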
Definition
An estimator $\hat\Theta$ of $\theta$ is said to be unbiased if and only if its mean is
equal to $\theta$, whatever the value of $\theta$, i.e.
$E(\hat\Theta) = \theta$
An estimator is unbiased if, on average, its values equal the
parameter it is supposed to estimate.
If an estimator is not unbiased, then the difference
$E(\hat\Theta) - \theta$
is called the bias of the estimator (a systematic error).
For instance, we showed that $E(\bar X) = \mu$, so
the sample mean $\bar X$ is an unbiased estimator for $\mu$.
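$E(\hat\Theta_2) = E\left(\frac{X_1 + X_n}{2}\right) = \frac{1}{2}\left(E(X_1) + E(X_n)\right) = \frac{1}{2}(\mu + \mu) = \mu$
$E(\hat\Theta_3) = E\left(\frac{aX_1 + bX_n}{a + b}\right) = \frac{1}{a + b}\left(aE(X_1) + bE(X_n)\right) = \frac{1}{a + b}(a + b)\mu = \mu$
Since $E(\hat\Theta_1) = E(X_1) = \mu$ as well,
$\hat\Theta_1$, $\hat\Theta_2$ and $\hat\Theta_3$ are also unbiased estimators for $\mu$.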
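The unbiasedness computations above can be checked numerically. The following is a minimal sketch, assuming a normal population and purely illustrative values for `mu`, `sigma`, `n`, `a` and `b` (none of these come from the lecture): it averages each estimator over many simulated samples, and each average should land close to $\mu$.

```python
import random
import statistics


def simulate_estimator_means(mu=10.0, sigma=2.0, n=20, reps=20000,
                             a=1.0, b=3.0, seed=0):
    """Average each estimator of mu over `reps` simulated samples of size n.

    All default values are illustrative assumptions, not lecture data.
    """
    rng = random.Random(seed)
    totals = {"xbar": 0.0, "theta1": 0.0, "theta2": 0.0, "theta3": 0.0}
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        totals["xbar"] += statistics.fmean(x)                 # sample mean
        totals["theta1"] += x[0]                              # X_1
        totals["theta2"] += (x[0] + x[-1]) / 2                # (X_1 + X_n)/2
        totals["theta3"] += (a * x[0] + b * x[-1]) / (a + b)  # weighted version
    return {name: total / reps for name, total in totals.items()}


if __name__ == "__main__":
    for name, avg in simulate_estimator_means().items():
        print(f"{name}: average estimate = {avg:.3f}")
```

All four averages being close to `mu` illustrates unbiasedness only; it says nothing yet about how spread out each estimator is.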
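$\mathrm{Var}(\bar X) = \frac{\sigma^2}{n}$,
while we have $\mathrm{Var}(\hat\Theta_1) = \mathrm{Var}(X_1) = \sigma^2$.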
Fact
Estimators with smaller variances are more likely to produce estimates
close to the true value $\theta$.
A logical principle of estimation is therefore to choose the unbiased estimator
that has minimum variance.
Such an estimator is said to be efficient among the unbiased
estimators.
Properties of estimators
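$\mathrm{Var}(\hat\Theta_2) = \frac{\sigma^2}{2} \nrightarrow 0$,
$\mathrm{Var}(\hat\Theta_3) = \sigma^2\,\frac{a^2 + b^2}{(a + b)^2} \nrightarrow 0$
as $n \to \infty$.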
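To see the contrast between these variances and that of the sample mean, here is a small simulation sketch. The normal population, the value `sigma=2.0` and the sample sizes used in the example are illustrative assumptions. It estimates $\mathrm{Var}(\bar X)$ and $\mathrm{Var}(\hat\Theta_2)$ for a given $n$: the first should be near $\sigma^2/n$, the second near $\sigma^2/2$ regardless of $n$.

```python
import random
import statistics


def estimator_variances(n, sigma=2.0, mu=0.0, reps=5000, seed=1):
    """Return simulated (Var(sample mean), Var((X_1 + X_n)/2)) for size n.

    The population is assumed normal here purely for illustration.
    """
    rng = random.Random(seed)
    xbars, theta2s = [], []
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbars.append(statistics.fmean(x))
        theta2s.append((x[0] + x[-1]) / 2)
    return statistics.pvariance(xbars), statistics.pvariance(theta2s)


if __name__ == "__main__":
    for n in (5, 50, 100):
        var_xbar, var_theta2 = estimator_variances(n)
        print(f"n={n}: Var(xbar) ~ {var_xbar:.4f}, Var(theta2) ~ {var_theta2:.4f}")
```

The variance of the sample mean shrinks as `n` grows while the other stays roughly constant, which is the behaviour the formulas above describe.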
Sample mean
We have seen thus far that the sample mean $\bar X$ is unbiased and
consistent as an estimator of the population mean $\mu$.
It can also be shown that in most practical situations where we
estimate the population mean $\mu$, the variance of no other estimator is
less than the variance of the sample mean.
Fact
In most practical situations, the sample mean is a very good estimator
for the population mean $\mu$.
Note: there exist several other criteria for assessing the goodness of
point estimation methods, but we shall not discuss them in this course.
Here, we will always use the sample mean $\bar X$ when we have to
estimate the population mean $\mu$.
Definition
The standard error of an estimator $\hat\Theta$ is its standard deviation $\mathrm{sd}(\hat\Theta)$.
We know that $E(\bar X) = \mu$ and $\mathrm{Var}(\bar X) = \frac{\sigma^2}{n}$, so the standard error of $\bar X$
as an estimator of $\mu$ is
$\mathrm{sd}(\bar X) = \frac{\sigma}{\sqrt{n}}$
However, we cannot report a numerical value for this standard error as
it depends in turn on the population standard deviation $\sigma$, which is
usually unknown.
We have a natural estimate of the population standard deviation
given by the observed sample standard deviation
$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2}$
so we estimate the standard error $\mathrm{sd}(\bar X)$ with
$\widehat{\mathrm{sd}}(\bar X) = \frac{s}{\sqrt{n}}$
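As a quick illustration of the estimated standard error $s/\sqrt{n}$, here is a minimal sketch. The function name `estimated_standard_error` and the data values are hypothetical, chosen only for the example.

```python
import math
import statistics


def estimated_standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n).

    `statistics.stdev` uses the (n - 1) divisor, matching the formula for s.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to compute s")
    s = statistics.stdev(sample)
    return s / math.sqrt(n)


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up observations
    print(f"x-bar = {statistics.fmean(data):.3f}")
    print(f"estimated standard error = {estimated_standard_error(data):.4f}")
```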
$\widehat{\mathrm{sd}}(\bar X) = \frac{s}{\sqrt{n}} = \frac{s}{\sqrt{10}} = 0.0898$ BTU/hr-ft-°F
Confidence intervals
The preceding interval is called a confidence interval.
Definition
A confidence interval is an interval for which we can assert with a
reasonable degree of certainty (or confidence) that it will contain the
true value of the population parameter under consideration.
A confidence interval is always calculated by first selecting a
confidence level, which measures its degree of reliability.
A confidence interval of level $100 \times (1 - \alpha)\%$ means that we are
$100 \times (1 - \alpha)\%$ confident that the true value of the parameter is
included in the interval ($\alpha$ is a real number in $[0, 1]$).
The most frequently used confidence levels are 90%, 95% and 99%.
The higher the confidence level, the more strongly we believe that
the value of the parameter being estimated lies within the interval.
Fact
The $100 \times (1 - \alpha)\%$ refers to the percentage of all samples of the
same size possibly drawn from the same population which would
produce an interval containing the true $\theta$.
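This repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes a normal population with known $\sigma$ and uses the interval $\bar x \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$ for the mean. The parameter values and the function name `coverage_rate` are illustrative assumptions. It counts how often the interval computed from a fresh sample contains the true mean.

```python
import math
import random
import statistics


def coverage_rate(mu=5.0, sigma=2.0, n=25, alpha=0.05, reps=10000, seed=2):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # quantile z_{1-alpha/2}
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print(f"observed coverage at nominal 95%: {coverage_rate():.3f}")
```

The observed fraction should be close to $1 - \alpha$. Any single interval either contains $\mu$ or does not; the level describes the procedure over many samples.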
[Figure: intervals plotted against sample number (1 to 100); vertical axis from -0.4 to 0.4]
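$\bar X = \frac{1}{n}\sum_{i=1}^n X_i \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ (normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$)
$Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} = \sqrt{n}\,\frac{\bar X - \mu}{\sigma} \sim N(0, 1)$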
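$P(-z_{1-\alpha/2} \le Z \le z_{1-\alpha/2}) = 1 - \alpha$
[Figure: standard normal density $\phi(x)$ with the quantiles $\pm z_{1-\alpha/2}$ marked]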
$\left[\bar x - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar x + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$
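The two-sided interval above can be computed directly with the Python standard library. This is a minimal sketch: the function name `z_confidence_interval`, the data and the value of `sigma` are hypothetical, and the population standard deviation is assumed known, as in the formula.

```python
import math
from statistics import NormalDist, fmean


def z_confidence_interval(sample, sigma, alpha=0.05):
    """Two-sided interval of level 100(1 - alpha)% for the mean, sigma known."""
    n = len(sample)
    xbar = fmean(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # quantile z_{1-alpha/2}
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


if __name__ == "__main__":
    data = [9.8, 10.2, 10.0, 10.4, 9.6]  # made-up measurements
    low, high = z_confidence_interval(data, sigma=0.5, alpha=0.05)
    print(f"95% interval for the mean: [{low:.3f}, {high:.3f}]")
```

A smaller `alpha` (higher confidence level) gives a larger quantile and therefore a wider interval, while a larger sample size narrows it.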
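$\left[\bar x - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar x + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$ ($\star$)
The interval
$\left[\bar x - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar x + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$
is the shortest one among all confidence intervals of level $1 - \alpha$.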
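$\left(-\infty,\ \bar x + z_{1-\alpha}\frac{\sigma}{\sqrt{n}}\right]$
and
$\left[\bar x - z_{1-\alpha}\frac{\sigma}{\sqrt{n}},\ +\infty\right)$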
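The one-sided bounds use the quantile $z_{1-\alpha}$ rather than $z_{1-\alpha/2}$. A short sketch, with a hypothetical function name and made-up data, and again assuming $\sigma$ is known:

```python
import math
from statistics import NormalDist, fmean


def one_sided_bounds(sample, sigma, alpha=0.05):
    """Return (upper, lower): the finite endpoints of the two one-sided
    intervals (-inf, upper] and [lower, +inf) of level 100(1 - alpha)%."""
    n = len(sample)
    xbar = fmean(sample)
    z = NormalDist().inv_cdf(1 - alpha)  # quantile z_{1-alpha}
    margin = z * sigma / math.sqrt(n)
    return xbar + margin, xbar - margin


if __name__ == "__main__":
    data = [9.8, 10.2, 10.0, 10.4, 9.6]  # made-up measurements
    upper, lower = one_sided_bounds(data, sigma=0.5)
    print(f"upper confidence bound: {upper:.3f}")
    print(f"lower confidence bound: {lower:.3f}")
```

Each bound is reported on its own: either the interval $(-\infty, \text{upper}]$ or the interval $[\text{lower}, +\infty)$ has the stated level, not their intersection.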
$\sqrt{n}\,\frac{\bar X - \mu}{\sigma} \sim N(0, 1)$
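What the CLT states is that, when the $X_i$'s are not normal (whatever
they are!), $\sqrt{n}\,\frac{\bar X - \mu}{\sigma} \sim N(0, 1)$ when $n$ is infinitely large.
The standard normal distribution therefore provides a reasonable
approximation to the distribution of $\sqrt{n}\,\frac{\bar X - \mu}{\sigma}$ when $n$ is large.
This is usually denoted
$\sqrt{n}\,\frac{\bar X - \mu}{\sigma} \overset{a}{\sim} N(0, 1)$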
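$X_i \sim \mathrm{Exp}(\lambda)$ ($\mu = \frac{1}{\lambda}$, $\sigma = \frac{1}{\lambda}$) $\Longrightarrow$ $\sqrt{n}\,\frac{\bar X - 1/\lambda}{1/\lambda} \overset{a}{\sim} N(0, 1)$
$X_i \sim U_{[a,b]}$ ($\mu = \frac{a+b}{2}$, $\sigma = \frac{b-a}{\sqrt{12}}$) $\Longrightarrow$ $\sqrt{n}\,\frac{\bar X - \frac{a+b}{2}}{\frac{b-a}{\sqrt{12}}} \overset{a}{\sim} N(0, 1)$
$X_i \sim \mathrm{Bern}(\pi)$ ($\mu = \pi$, $\sigma = \sqrt{\pi(1-\pi)}$) $\Longrightarrow$ $\sqrt{n}\,\frac{\bar X - \pi}{\sqrt{\pi(1-\pi)}} \overset{a}{\sim} N(0, 1)$
Facts:
the larger $n$, the better the normal approximation
the closer the population distribution is to being normal, the more
rapidly the distribution of $\sqrt{n}\,\frac{\bar X - \mu}{\sigma}$ approaches normality as $n$ gets
large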
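These facts can be explored with a small simulation. The sketch below makes an illustrative choice of population, $X_i \sim \mathrm{Exp}(1)$ (so $\mu = \sigma = 1$), and the function names are hypothetical. It standardizes simulated sample means and reports the fraction falling below zero. For a symmetric $N(0, 1)$ variable that fraction is 0.5, so values well above 0.5 reveal the skewness that remains for small $n$.

```python
import math
import random


def standardized_means(n, reps=5000, seed=3):
    """Simulate sqrt(n) * (xbar - mu) / sigma for samples from Exp(1)."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exp(1)
    values = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        values.append(math.sqrt(n) * (xbar - mu) / sigma)
    return values


def fraction_below_zero(values):
    """Share of standardized means that are negative (0.5 under N(0, 1))."""
    return sum(v < 0 for v in values) / len(values)


if __name__ == "__main__":
    for n in (2, 10, 50, 200):
        frac = fraction_below_zero(standardized_means(n))
        print(f"n={n}: fraction below zero = {frac:.3f}")
```

The fraction moves towards 0.5 as `n` increases, in line with the first fact above; a population closer to normal would get there with smaller `n`.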
[Figure: two panels, each labelled "distribution" with ($\mu = 0$, $\sigma = 1$); one panel is for $X_i \sim \mathrm{Exp}(1) - 1$]
[Figure: $X_i \sim \mathrm{Bern}(\pi)$; probability mass function $p_X(x)$ of $X = \sum_{i=1}^n X_i$ plotted against $x$, for $\pi = 0.1$ and $\pi = 0.5$, with panels $n = 5$, $n = 50$, $n = 100$ and $n = 500$]
Go to
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
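$\sqrt{n}\,\frac{\bar X - \mu}{\sigma} \overset{a}{\sim} N(0, 1)$,
$P\left(-z_{1-\alpha/2} \le \sqrt{n}\,\frac{\bar X - \mu}{\sigma} \le z_{1-\alpha/2}\right) \simeq 1 - \alpha$,
$P\left(\bar X - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \simeq 1 - \alpha$,
so that if $\bar x$ is the sample mean of an observed random sample of size
$n$ from any distribution with known variance $\sigma^2$, an approximate
confidence interval of level $100 \times (1 - \alpha)\%$ for $\mu$ is given by
$\left[\bar x - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar x + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$
Note: because this result requires $n$ large enough to be reliable, this
type of interval, based on the CLT, is often called a large-sample
confidence interval.
One could also define large-sample one-sided confidence intervals of
level $100 \times (1 - \alpha)\%$: $\left(-\infty,\ \bar x + z_{1-\alpha}\frac{\sigma}{\sqrt{n}}\right]$ and $\left[\bar x - z_{1-\alpha}\frac{\sigma}{\sqrt{n}},\ +\infty\right)$.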
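To get a feel for how well the large-sample interval behaves for a non-normal population, here is a sketch under illustrative assumptions: $X_i \sim U_{[0,1]}$, so $\mu = 1/2$ and $\sigma = 1/\sqrt{12}$ are known, and the sample size and function name are made up for the example. It estimates how often the approximate interval contains $\mu$.

```python
import math
import random
from statistics import NormalDist


def approx_coverage(n=40, alpha=0.05, reps=10000, seed=4):
    """Observed coverage of the large-sample interval for U[0, 1] data."""
    rng = random.Random(seed)
    mu = 0.5
    sigma = 1 / math.sqrt(12)  # standard deviation of U[0, 1]
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - mu) <= half_width:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print(f"observed coverage at nominal 95%: {approx_coverage():.3f}")
```

The level is only approximate here, so the observed coverage should be near, but need not equal, $1 - \alpha$; more skewed populations generally need a larger `n` for the approximation to be as good.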
Objectives
Now you should be able to:
Explain important properties of point estimators, including bias,
variance, efficiency and consistency
Know how to compute and explain the precision with which a
parameter is estimated
Understand the basics of interval estimation and explain what a
confidence interval of level $100 \times (1 - \alpha)\%$ for a parameter is
Construct exact confidence intervals on the mean of a normal
distribution with known variance
Understand the Central Limit Theorem and explain the important
role of the normal distribution in inference
Construct large sample confidence intervals on a mean of an
arbitrary distribution
Recommended exercises: Q46+Q49, Q50 p.237, Q69 p.239, Q1, Q3
p.293, Q5, Q6 p.294, Q10, Q11 p.301