
11.

CONFIDENCE INTERVALS FOR THE MEAN; KNOWN VARIANCE

We assume here that the population variance $\sigma^2$ is known. This is an unrealistic assumption, but it allows us to give a simplified presentation which reveals many of the important issues, and prepares us to solve the real problem, where $\sigma^2$ is unknown. (Next handout).

We want a confidence interval for the population mean, $\mu$, based on $n$ observations.

There is some small probability ($\alpha$) that the method of constructing confidence intervals will fail. We can control this probability, that is, we can select any value for $\alpha$ we want. Traditionally, $\alpha$ is taken to be either 0.05 or 0.01. We refer to $1 - \alpha$ as the confidence level. This is the proportion of the time that the confidence interval contains the parameter, in repeated sampling.

Let $z_{\alpha/2}$ denote the z value such that the area to its right under the standard normal curve is $\alpha/2$.

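Eg: For $\alpha = 0.05$ we get $z_{\alpha/2} = z_{0.025} = 1.96$. So, 95% of the time a normal random variable with mean $\mu$ and standard deviation $\sigma$ will be in the range ($\mu - 1.96\sigma$, $\mu + 1.96\sigma$).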

In the Empirical Rule, we approximated the 1.96 by 2.

For $\alpha = 0.01$ we get $z_{\alpha/2} = z_{0.005} = 2.576$.
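
The two critical values quoted above can be reproduced numerically. The following is a minimal sketch using the standard normal distribution from Python's standard library; the helper name `z_crit` is our own and does not appear in the handout.

```python
from statistics import NormalDist

def z_crit(alpha: float) -> float:
    """Return z_{alpha/2}: the point with area alpha/2 to its right
    under the standard normal curve (hypothetical helper name)."""
    # Area alpha/2 to the right means area 1 - alpha/2 to the left.
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: z = {z_crit(alpha):.3f}")
```

The printed values should agree with the 1.96 and 2.576 used in the text.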

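The interval $\bar{X} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$ is a CI for $\mu$ with confidence level $1 - \alpha$.

Note that the interval extends from $\bar{X} - z_{\alpha/2}\,\sigma_{\bar{x}}$ to $\bar{X} + z_{\alpha/2}\,\sigma_{\bar{x}}$.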
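Here's why the formula above works. First, we suppose that the Central Limit Theorem allows us to assume that $\bar{X}$ is normal.

Let's convert $\bar{X}$ into its own z-score,

$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{x}}}$$

We have standardized $\bar{X}$ by subtracting its own mean, $\mu_{\bar{x}} = \mu$, and dividing by its own standard error, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

Since $\bar{X}$ is normal, $Z$ is standard normal.

$Z$ measures how many standard errors $\bar{X}$ falls from $\mu$.

The probability that $\bar{X}$ falls more than $z_{\alpha/2}$ standard errors from $\mu$ is

$$\text{Prob}\{Z < -z_{\alpha/2}\} + \text{Prob}\{Z > z_{\alpha/2}\} = \frac{\alpha}{2} + \frac{\alpha}{2} = \alpha$$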
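
As a numerical check on the standardization argument, the sketch below adds up the two tail areas of a standard normal $Z$ beyond $\pm z_{\alpha/2}$ and confirms that they total $\alpha$. The function name `two_tail_prob` is an illustrative assumption, not something from the handout.

```python
from statistics import NormalDist

def two_tail_prob(z: float) -> float:
    """Prob{Z < -z} + Prob{Z > z} for a standard normal Z
    (hypothetical helper name)."""
    std = NormalDist()  # mean 0, standard deviation 1
    return std.cdf(-z) + (1 - std.cdf(z))

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
print(f"z = {z:.3f}, total tail area = {two_tail_prob(z):.4f}")
```

With the rounded value 1.96 in place of the exact critical value, the total tail area is very slightly below 0.05.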

Therefore,

$1 - \alpha$ = Prob{ $\bar{X}$ falls within $z_{\alpha/2}$ standard errors of $\mu$ }
= Prob{ $\mu$ is within $z_{\alpha/2}\,\sigma_{\bar{x}}$ of $\bar{X}$ }
= Prob{ $\mu$ is between $\bar{X} - z_{\alpha/2}\,\sigma_{\bar{x}}$ and $\bar{X} + z_{\alpha/2}\,\sigma_{\bar{x}}$ }.

So $\bar{X} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$ is a CI for $\mu$ with confidence level $1 - \alpha$.

Interpretation of Confidence Intervals

As stated earlier, the confidence interval $\bar{X} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$ will cover $\mu$ with probability $1 - \alpha$. There is a subtle difficulty in the practical interpretation of the results, however, as demonstrated in the following example.

Eg 1: In the Pepsi example, we had $n = 100$, $\sigma = 0.05$ and $\bar{x} = 1.985$. The 95% confidence interval estimator for $\mu$ is $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$, that is, $\bar{X} \pm 0.01$.

Plugging in $\bar{x} = 1.985$, we obtain the 95% confidence interval estimate (1.975, 1.995), rounded to three decimal places.
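
The arithmetic of the Pepsi example can be sketched in a few lines. This assumes that the value 0.05 given in the example is the known population standard deviation $\sigma$ (the symbol is lost in this copy, but this reading is consistent with the stated margin of 0.01); the function name `z_interval` is our own.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha=0.05):
    """Known-variance CI for the mean: xbar +/- z_{alpha/2} * sigma / sqrt(n).
    Hypothetical helper; sigma is assumed known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)  # z_{alpha/2} times the standard error
    return xbar - margin, xbar + margin

# Assumed inputs from the example: n = 100, sigma = 0.05, xbar = 1.985.
lo, hi = z_interval(xbar=1.985, sigma=0.05, n=100)
print(f"({lo:.3f}, {hi:.3f})")  # prints (1.975, 1.995)
```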
Which of the following statements is true?

a) There is a 95% chance that $\mu$ is between 1.975 and 1.995.

b) $\mu$ will be between 1.975 and 1.995 95% of the time.

c) In 95% of all future samples, $\bar{x}$ will be between 1.975 and 1.995.

d) $\mu$ is between 1.975 and 1.995.

e) None of the above.

Warning: The practical interpretation of confidence intervals is extremely tricky. The difficulty has to do with the distinction between an estimator and an estimate.

An estimator is a random variable whose value depends on a sample not yet taken (eg: $\bar{X}$).

An estimate is the value actually taken by the estimator for a given sample (eg: $\bar{x} = 1.985$).

The CI is an interval estimator. It has random endpoints. After the sample is taken, its endpoints take on specific values, yielding an interval estimate.

Since $\mu$ is an (unknown) constant, and since the endpoints of the CI estimate are fixed numbers (eg: 1.975, 1.995), it makes no sense to talk about the probability that the CI estimate contains $\mu$. Either it does or it doesn't, and we may never find out which of these events has occurred.

Instead, it is the CI estimator which contains $\mu$ with probability $1 - \alpha$.

The estimator has random endpoints, ($\bar{X} - z_{\alpha/2}\,\sigma_{\bar{x}}$, $\bar{X} + z_{\alpha/2}\,\sigma_{\bar{x}}$).

The word probability refers to the long-run proportion of the time that these random endpoints will contain the true mean $\mu$, assuming a large number of repetitions of the experiment of collecting a random sample and constructing the CI.

Thus the confidence level $1 - \alpha$ refers to the process of constructing confidence intervals, not to the particular CI estimate obtained from the given sample.

A correct (but not very satisfying) answer to the multiple choice problem is

f) We can't really say anything about the particular interval estimate we got for this sample (1.975, 1.995), but the confidence interval estimator $\bar{X} \pm 0.01$ will cover $\mu$ in 95% of all random samples which can be collected.
Unfortunately, in practice we have only one sample. So what good is a probability statement referring to all samples which might have been taken?

1) You can think of $1 - \alpha$ as an overall success rate. If you compute many 95% confidence intervals over your lifetime, and if the required assumptions are satisfied for each one, then approximately 95% of these confidence intervals will contain their respective population means. Unfortunately, you may never know which ones were wrong.

2) Even though we can't talk about the probability that the given CI estimate contains $\mu$, if we had to bet on it, we would say that the odds are 19 to 1 that our given 95% CI estimate covers $\mu$. In the long run, in a lifetime of gambling on confidence intervals, you would win 19 out of 20 bets that the CI covered $\mu$.

See Web demo on confidence intervals based on repeated samples at:
http://www.amstat.org/publications/jse/v6n3/applets/ConfidenceInterval.html

Try changing the value of alpha and watch the CIs widen or narrow.

Remember: In real life, we have only one sample, so only one CI. It might be one of the 5% of all CIs which will fail to contain the mean. We don't know. But we have 95% confidence in the statistical procedure, since only 1 out of 20 such intervals will fail in the long run.
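
The long-run reading of the confidence level can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be exactly normal, the true mean `mu = 2.0` is a made-up value chosen for illustration, and the function name `coverage` is our own. Each repetition draws a fresh sample, builds the interval, and records whether it covers the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, alpha, reps, seed=0):
    """Fraction of simulated CIs that contain the true mean mu.
    Hypothetical helper; assumes a normal population with known sigma."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        # One repetition of the experiment: new sample, new interval.
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

# mu = 2.0 is an illustrative assumption, not a value from the handout.
print(coverage(mu=2.0, sigma=0.05, n=100, alpha=0.05, reps=2000))
```

The printed proportion should be close to 0.95, although any single interval in the run either covers the mean or does not.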
Why Don't We Use 100% Confidence Intervals?

Wouldn't it be better to be right 100% of the time rather than 95% of the time? Not necessarily, when it comes to confidence intervals. The problem is that (for given $n$ and $\sigma$), the smaller we make $\alpha$ the wider the CI becomes.

It is possible to construct a 100% confidence interval, but it is infinitely wide, and therefore tells us nothing.
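
The trade-off between confidence and width can be seen by tabulating the half-width $z_{\alpha/2}\,\sigma/\sqrt{n}$ for shrinking values of $\alpha$. The sketch below reuses $\sigma = 0.05$ and $n = 100$ purely as illustrative inputs; the function name `half_width` is our own.

```python
from math import sqrt
from statistics import NormalDist

def half_width(alpha, sigma, n):
    """Half-width of the known-variance CI: z_{alpha/2} * sigma / sqrt(n).
    Hypothetical helper name."""
    return NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)

# Illustrative inputs: sigma = 0.05, n = 100.
for alpha in (0.10, 0.05, 0.01, 0.001, 1e-6):
    print(f"alpha = {alpha:<8g} half-width = {half_width(alpha, 0.05, 100):.4f}")
```

The half-width grows without bound as $\alpha$ approaches 0. No finite z value has zero area to its right, which is why a 100% interval must be infinitely wide.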
