
Review Lecture 6

Normal Distribution
Normal Approximation to the Binomial and Poisson Distributions

MTH410-S16- Lecture 07


Lecture 7
Data Collection and Sampling
The Central Limit Theorem
Introduction to Estimation
Point Estimation (For Mean)
Interval Estimation: Confidence Intervals
(For mean, standard deviation known)
Determining Sample Sizes for given
confidence levels


Introduction & Recap


Statistics is a way to get information from data
Data -> Statistics -> Information

But where do data come from? How do we ensure they are accurate?
Are the data reliable? Are they representative of the population
from which they were drawn? This lecture explores some of these
issues.


Introduction & Recap (Cont'd)

Parameter
A descriptive measure of a population.

Statistic
A descriptive measure of a sample.
Because populations tend to be very large, most population
parameters are not only unknown but also unknowable.
We can only use statistical inference to estimate them, and only if
we are willing to accept less than 100% accuracy. Instead of
investigating the entire population, we choose to study a
sample.

Methods of Collecting Data


There are many methods used to collect or obtain data for
statistical analysis. Three of the most popular methods are:
Direct Observation
Experiments
Surveys.


Observational and experimental studies


An experimental study is one in which measurements
representing a variable of interest are observed and
recorded while controlling factors that might influence
their values.
An observational study is one in which measurements
representing a variable of interest are observed and
recorded without controlling any factor that might
influence their values.


Surveys
A survey solicits information from people; e.g. Gallup polls;
pre-election polls; marketing surveys.
The Response Rate (i.e. the proportion of all people selected
who complete the survey) is a key survey parameter. A low
response rate can destroy the validity of any conclusion
resulting from this survey.


Surveys (Cont'd)
Surveys may be administered in a variety of ways, e.g.
Personal Interview: higher response rate and fewer incorrect
responses due to misunderstanding, but expensive;
Telephone Interview: less expensive, but less personal
and with a lower expected response rate;
Self-Administered Survey (usually mailed to a sample of
people): inexpensive, but with a lower response rate
and relatively frequent misunderstanding.


Questionnaire Design
Over the years, a lot of thought has been put into the science
of the design of survey questions. Key design principle:
KISS: Keep It Simple, Stupid
E.g. Keep the questionnaire as short as possible,
Ask short, simple, and clearly worded questions,
Start with simple questions,
Use Yes-No or multiple choice questions,
Avoid leading questions,
Make it easy to analyze & present the collected data

Sampling
Recall: statistical inference permits us to draw conclusions
about a population based on a sample.
Rationale: a) cost (less expensive to sample 1,000 television
viewers than 100 million TV viewers), and
b) practicality (e.g. performing a crash test on every vehicle
produced is impractical).

The sampled population and the target population should
always be similar to each other.


Self-selected Samples
Self-selected samples are almost always biased, because the
individuals who participate in them are most likely more
interested in the issue than other members of the population.
E.g. radio and television stations often ask people to call
and give their opinion on an issue of interest.
However, only listeners who are concerned about the topic
and have enough patience to get through to the station will
be included in the sample.
That means the sampled population differs from the
target population, so the conclusions drawn from such
surveys are frequently wrong.

Sampling Plans
A sampling plan is just a method or procedure for
specifying how a sample will be taken from a population.
We will focus our attention on these three methods:
Simple Random Sampling,
Stratified Random Sampling, and
Cluster Sampling.


Simple Random Sampling


A simple random sample is a sample selected in such a way
that every possible sample of the same size is equally likely
to be chosen.

E.g. drawing three names from a hat containing the names of
all the students in the class yields a simple random sample:
every group of three names is equally likely to be picked.
In practice, we assign each element of the population a
unique number and then select sample numbers at random,
using a random number table or software such as Excel.
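
The numbering-and-drawing procedure can also be sketched in Python instead of Excel. This is only an illustration: the class size of 30 and the helper name `simple_random_sample` are assumptions, not part of the lecture.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical class of 30 students, numbered 1..30 as described above.
students = list(range(1, 31))
print(simple_random_sample(students, 3, seed=42))
```

Passing a seed is optional; it is used here only so that the same three numbers come out on every run.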

Stratified Random Sampling

A stratified random sample is obtained by separating the
population into mutually exclusive sets, or strata, and then
drawing simple random samples from each stratum.
Examples of criteria for defining strata:
Strata 1 (Gender): male, female
Strata 2 (Age): < 20, 20-30, 31-40, 41-50, 51-60, > 60
Strata 3 (Occupation): professional, white collar, blue collar, other

Besides acquiring information about the total population, we can
also make inferences within a stratum, or make comparisons
across strata.

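Stratified Random Sampling (Cont'd)

After the population has been stratified, we can use simple
random sampling within each stratum to generate the complete
sample, drawing from each stratum in proportion to its share of
the population:

If we plan to sample 400 people in total and the low-income group
makes up 25% of the population, we'd draw 100 (= 400 * .25) of
them from the low-income group.
If we are sampling 1000 people and the high-income group makes up
5% of the population, we'd draw 50 (= 1000 * .05) of them from
the high-income group.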
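
The proportional allocation above is simple arithmetic, sketched below in Python. Only the 25% (low income) and 5% (high income) shares come from the slide; the two middle strata and their shares are made-up values so that the fractions sum to 1, and the function name is illustrative.

```python
def proportional_allocation(total_n, shares):
    """Number of units to draw from each stratum.

    shares maps a stratum name to its fraction of the population.
    """
    return {name: round(total_n * share) for name, share in shares.items()}

# Assumed shares: only "low" (25%) and "high" (5%) are taken from the slide.
shares = {"low": 0.25, "lower-middle": 0.40, "upper-middle": 0.30, "high": 0.05}

print(proportional_allocation(400, shares))
print(proportional_allocation(1000, shares))
```

With shares that do not divide the total evenly, rounding can make the stratum counts add up to slightly more or less than the planned total, so the result may need a small manual adjustment.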

Cluster Sampling
A cluster sample is a simple random sample of groups or
clusters of elements (vs. a simple random sample of
individual objects).
This method is useful when it is difficult or costly to
develop a complete list of the population members or when
the population elements are widely dispersed geographically.

Cluster sampling may increase sampling error due to
similarities among cluster members.


Sample Size
Numerical techniques for determining sample sizes will
be described later, but at least we can say that the larger
the sample size, the more accurate we can expect the
sample estimates to be.


Sampling and Non-Sampling Errors


Two major types of errors can arise when a sample of
observations is taken from a population:
Sampling error refers to differences between the sample and
the population that exist only because of the observations
that happened to be selected for the sample.
Nonsampling errors are more serious and are due to
mistakes made in the acquisition of data, to non-response,
or to the sample observations being selected improperly.


Sampling Error
Sampling error refers to differences between the sample
and the population, because of the specific observations that
happen to be selected.
Sampling error is expected to occur when making a
statement about the population based on the sample taken.
Increasing the sample size will reduce this type of error.


Nonsampling Error
Nonsampling errors are more serious and are due to
mistakes made in the acquisition of data, to non-response,
or to the sample observations being selected improperly.

Three types of nonsampling errors:
Errors in data acquisition,
Nonresponse errors, and
Selection bias.

Note: increasing the sample size will not reduce this type of
error.

Sampling Distributions


Agenda
Sampling Distribution of the Mean
Sampling Distribution of a Proportion
Sampling Distribution of the Difference Between
Two Means


Introduction
In real life, calculating population parameters is usually
prohibitive because populations are very large.
Rather than investigating the whole population, we take a
sample, calculate a statistic related to the parameter of
interest, and make an inference.
The sampling distribution of the statistic is the tool that
tells us how close the statistic is to the parameter.


Sampling Distributions
A sampling distribution is created by, as the name suggests,
sampling.
The method we will employ relies on the rules of probability
and the laws of expected value and variance to derive the
sampling distribution.


Sampling Distribution of the Mean


An Example
A fair die is thrown infinitely many times,
with the random variable X = # of spots on any throw.
The probability distribution of X is:

x      1    2    3    4    5    6
P(x)   1/6  1/6  1/6  1/6  1/6  1/6

and the mean and variance are calculated as well:
\[ \mu = \sum x\,P(x) = 3.5, \qquad \sigma^2 = \sum (x - \mu)^2 P(x) = 2.92 \]


Throwing a Die Twice: Sample Mean


Suppose we want to estimate \(\mu\) from the mean \(\bar{x}\) of a
sample of size n = 2.
What is the distribution of \(\bar{x}\)?


Sampling Distribution of Two Dice


A sampling distribution is created by looking at
all samples of size n = 2 (i.e. two dice) and their means.

While there are 36 possible samples of size 2, there are only
11 values for \(\bar{x}\), and some (e.g. \(\bar{x}\) = 3.5) occur more
frequently than others (e.g. \(\bar{x}\) = 1).

Sample  Dice  Mean    Sample  Dice  Mean    Sample  Dice  Mean
 1      1,1   1.0     13      3,1   2.0     25      5,1   3.0
 2      1,2   1.5     14      3,2   2.5     26      5,2   3.5
 3      1,3   2.0     15      3,3   3.0     27      5,3   4.0
 4      1,4   2.5     16      3,4   3.5     28      5,4   4.5
 5      1,5   3.0     17      3,5   4.0     29      5,5   5.0
 6      1,6   3.5     18      3,6   4.5     30      5,6   5.5
 7      2,1   1.5     19      4,1   2.5     31      6,1   3.5
 8      2,2   2.0     20      4,2   3.0     32      6,2   4.0
 9      2,3   2.5     21      4,3   3.5     33      6,3   4.5
10      2,4   3.0     22      4,4   4.0     34      6,4   5.0
11      2,5   3.5     23      4,5   4.5     35      6,5   5.5
12      2,6   4.0     24      4,6   5.0     36      6,6   6.0

The distribution of \(\bar{x}\) when n = 2

Mean         1.0   1.5   2.0   2.5   3.0   3.5   4.0   4.5   5.0   5.5   6.0
Probability  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

E(\(\bar{x}\)) = 1.0(1/36) + 1.5(2/36) + ... = 3.5
V(\(\bar{x}\)) = (1.0 - 3.5)^2 (1/36) + (1.5 - 3.5)^2 (2/36) + ... = 1.46

Note: \(\mu_{\bar{x}} = \mu_x\) and \(\sigma^2_{\bar{x}} = \sigma^2_x / 2\).
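
The table of 36 samples can be checked by brute-force enumeration. The sketch below is one way to do it in Python with exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# All 36 equally likely ordered samples of size 2, reduced to their means.
means = [Fraction(a + b, 2) for a, b in product(faces, repeat=2)]

# Sampling distribution: each distinct mean with its probability.
dist = {m: Fraction(c, len(means)) for m, c in sorted(Counter(means).items())}

mean_xbar = sum(m * p for m, p in dist.items())
var_xbar = sum((m - mean_xbar) ** 2 * p for m, p in dist.items())

for m, p in dist.items():
    print(float(m), p)          # probabilities print in lowest terms, e.g. 1/18
print(float(mean_xbar), round(float(var_xbar), 2))
```

The enumeration gives 11 distinct means with expected value 7/2 and variance 35/24, which is the 1.46 quoted above.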

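Sampling Distribution of the Mean (Cont'd)

n = 5:  \(\mu_{\bar{x}} = 3.5\), \(\sigma^2_{\bar{x}} = .5833\ (= \sigma^2_x / 5)\)
n = 10: \(\mu_{\bar{x}} = 3.5\), \(\sigma^2_{\bar{x}} = .2917\ (= \sigma^2_x / 10)\)
n = 25: \(\mu_{\bar{x}} = 3.5\), \(\sigma^2_{\bar{x}} = .1167\ (= \sigma^2_x / 25)\)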

Sampling Distribution of the Mean (Cont'd)

Notice that \(\sigma^2_{\bar{x}}\) is smaller than \(\sigma^2_x\): it is equal to
\(\sigma^2_x / n\) (.5833, .2917 and .1167 for n = 5, 10 and 25).
The larger the sample size, the smaller \(\sigma^2_{\bar{x}}\).
Therefore, \(\bar{x}\) tends to fall closer to \(\mu\) as the sample
size increases.
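
A short check of the values .5833, .2917 and .1167, assuming the fair-die population from the earlier slides (the helper name `var_of_mean` is illustrative):

```python
from fractions import Fraction

faces = range(1, 7)
mu = Fraction(sum(faces), 6)                       # population mean, 7/2
sigma2 = sum((x - mu) ** 2 for x in faces) / 6     # population variance, 35/12

def var_of_mean(n):
    """Variance of the mean of n independent throws: sigma^2 / n."""
    return sigma2 / n

for n in (2, 5, 10, 25):
    print(n, round(float(var_of_mean(n)), 4))
```

The mean of the sampling distribution stays at 3.5 for every n; only the variance shrinks.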

Sampling Distribution of the Mean (Cont'd)


Demonstration: The variance of the sample mean is smaller
than the variance of the population.
Population: 1, 2, 3

Let us take samples of two observations:
{1, 2}: Mean = 1.5    {1, 3}: Mean = 2    {2, 3}: Mean = 2.5

Compare the variability of the population to the variability
of the sample mean.

Sampling Distribution of the Mean (Cont'd)


Also, notice
Expected value of the population=(1+2+3)/3 = 2
Expected value of the sample mean=(1.5+2+2.5)/3 = 2


Sampling Distribution of the Mean (Summary)

1. \(\mu_{\bar{x}} = \mu_x\)
2. \(\sigma^2_{\bar{x}} = \sigma^2_x / n\)
3. If X is normal, \(\bar{x}\) is normal. If X is nonnormal, \(\bar{x}\) is
approximately normal for sufficiently large sample sizes.
Note: the definition of "sufficiently large" depends on the
extent of nonnormality of X (e.g. heavily skewed;
multimodal).


Opening Example
Example: the Dean's claim. The average weekly income of
B.B.A. graduates one year after graduation is $800. Suppose
the distribution of weekly income has a standard
deviation of $100.
What is the probability that 25 randomly selected graduates
have an average weekly income of less than $750?
Solution
\[ P(\bar{x} < 750) = P\left(\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{750 - 800}{100/\sqrt{25}}\right) = P(z < -2.5) = 0.0062 \]
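
The same probability can be obtained without a z table. The sketch below uses `statistics.NormalDist` from the Python standard library and treats the sample mean as normal with standard error sigma / sqrt(n); the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_below(x, mu, sigma, n):
    """P(sample mean < x) when the sample mean is N(mu, sigma / sqrt(n))."""
    standard_error = sigma / sqrt(n)
    return NormalDist(mu, standard_error).cdf(x)

p = prob_mean_below(750, mu=800, sigma=100, n=25)
print(round(p, 4))
```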

Opening Example (Cont'd)
Example continued
If a random sample of 25 graduates actually had an average weekly
income of $750, what would you conclude about the validity of the
claim that the average weekly income is 800?

Solution
With \(\mu\) = 800, the probability of observing a sample
mean as low as 750 is very small (0.0062). The claim
that the mean weekly income is $800 is probably
unjustified.
It is more reasonable to assume that \(\mu\) is smaller
than $800, because then a sample mean of $750 becomes
more probable.

Standardizing the Sample Mean


The sampling distribution can be used to make inferences
about population parameters. In order to do so, the sample
mean can be standardized to the standard normal distribution
using the following formulation:
\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]


Using Sampling Distributions for Inference


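To make inferences about population parameters we use sampling
distributions.
The symmetry of the normal distribution, along with the sampling
distribution of the mean, leads to (with \(z_{.025} = 1.96\)):
\[ P(-1.96 \le z \le 1.96) = .95, \quad\text{or}\quad P\left(-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = .95 \]
This can be written as
\[ P\left(-1.96\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le 1.96\frac{\sigma}{\sqrt{n}}\right) = .95 \]
which becomes
\[ P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = .95 \]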


Using Sampling Distributions for Inference


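[Figure: the standard normal distribution of Z, with area .95
between -1.96 and 1.96 and area .025 in each tail, next to the
normal distribution of \(\bar{x}\) centered at \(\mu\) = 800 with the
same areas.]

With \(\mu\) = 800, \(\sigma\) = 100 and n = 25:
\[ P\left(800 - 1.96\frac{100}{\sqrt{25}} \le \bar{x} \le 800 + 1.96\frac{100}{\sqrt{25}}\right) = .95 \]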

Using Sampling Distributions for Inference


\[ P\left(800 - 1.96\frac{100}{\sqrt{25}} \le \bar{x} \le 800 + 1.96\frac{100}{\sqrt{25}}\right) = .95 \]
which reduces to \( P(760.8 \le \bar{x} \le 839.2) = .95 \).

Conclusion
There is a 95% chance that the sample mean falls within the
interval [760.8, 839.2] if the population mean is 800.
Since the sample mean was 750, the population mean is
probably not 800.


Generally

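\[ P\left(\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]

These are all probability statements about \(\bar{X}\), which we'll use in
statistical inference.
In this formula, \(\alpha\) (Greek letter alpha) is the probability that
\(\bar{X}\) does not fall into the interval.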
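
As a sketch of the general statement, the function below returns the interval that contains the sample mean with probability 1 - alpha. It uses `statistics.NormalDist` from the Python standard library; the function name and the choice to compute the critical value with `inv_cdf` instead of a table are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def central_interval(mu, sigma, n, alpha):
    """Interval around mu that contains the sample mean with probability 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return mu - half_width, mu + half_width

# Dean's claim: mu = 800, sigma = 100, n = 25, alpha = .05
lo, hi = central_interval(800, 100, 25, 0.05)
print(round(lo, 1), round(hi, 1))
```

For the dean's claim this gives the interval from 760.8 to 839.2, so an observed sample mean of 750 lies outside it.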


Return to the Opening Example


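Substituting \(\mu\) = 800, \(\sigma\) = 100, n = 25, and \(\alpha\) = .05, we get

\[ P\left(\mu - z_{.025}\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + z_{.025}\frac{\sigma}{\sqrt{n}}\right) = 1 - .05 \]
\[ P\left(800 - 1.96\frac{100}{\sqrt{25}} \le \bar{X} \le 800 + 1.96\frac{100}{\sqrt{25}}\right) = .95 \]
\[ P(760.8 \le \bar{X} \le 839.2) = .95 \]

This is another way of checking the dean's claim. The probability that
\(\bar{X}\) falls between 760.8 and 839.2 is 95%. It is unlikely that we would
observe a sample mean as low as $750 when the population mean is
$800.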


Using the Sampling Distribution for Inference


Changing the probability from .95 to .90 changes the probability
statement to
\[ P\left(\mu - 1.645\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + 1.645\frac{\sigma}{\sqrt{n}}\right) = .90 \]


Agenda
Sampling Distribution of the Mean
Sampling Distribution of a Proportion


Sampling Distribution of a Proportion


The estimator of a population proportion of successes is the
sample proportion. That is, we count the number of
successes in a sample and compute:
\[ \hat{p} = \frac{X}{n} \]
(read this as "p-hat").
X is the number of successes and n is the sample size.
Recall that X is binomially distributed.


Mean, Variance & Standard Deviation of \(\hat{p}\)

By the laws of expected value and variance, we can
determine the mean, variance and standard deviation of \(\hat{p}\):
\[ E(\hat{p}) = p, \qquad V(\hat{p}) = \sigma^2_{\hat{p}} = \frac{p(1-p)}{n}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]
Then the variable
\[ Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \]
is approximately standard normally distributed, provided
that the sample size is large.

Example
In the last election an MP received 52% of the votes cast. One
year later, the MP organized a survey that asked a random
sample of 300 people whether they would vote for him in the
next election. If we assume his popularity has not changed,
what is the probability that more than half of the sample
would vote for him?
Solution: Here n = 300 and p = .52. We want to determine the
probability that the sample proportion is greater than 50%,
that is, we want to find \(P(\hat{p} > .50)\):
\[ P(\hat{p} > .50) = P\left(\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} > \frac{.50 - .52}{\sqrt{(.52)(.48)/300}}\right) = P(z > -.69) = .7549 \]

Example (Cont'd)

The number of respondents who would vote for the representative
is a binomial random variable with n = 300 and p = .52.
We want to determine the probability that the sample proportion
is greater than 50%. That is, we want to find \(P(\hat{P} > .50)\).
We now know that the sample proportion \(\hat{P}\) is approximately
normally distributed with mean p = .52 and standard deviation
\[ \sqrt{p(1-p)/n} = \sqrt{(.52)(1 - .52)/300} = .0288 \]


Example (Cont'd)

Thus, we calculate
\[ P(\hat{P} > .50) = P\left(\frac{\hat{P} - p}{\sqrt{p(1-p)/n}} > \frac{.50 - .52}{.0288}\right) = P(Z > -.69) = .7549 \]
If we assume that the level of support remains at 52%, the
probability that more than half the sample of 300 people
would vote for the representative is 75.49%.
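
A sketch of the same calculation in Python, using the normal approximation to the sample proportion. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def prob_phat_above(threshold, p, n):
    """P(sample proportion > threshold) under the normal approximation."""
    standard_error = sqrt(p * (1 - p) / n)
    return 1 - NormalDist(p, standard_error).cdf(threshold)

prob = prob_phat_above(0.50, p=0.52, n=300)
print(round(prob, 3))
```

Because the code does not round z to two decimals, the result is about 0.756 and so differs slightly from the table-based .7549.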

Agenda
Sampling Distribution of the Mean
Sampling Distribution of a Proportion
Sampling Distribution of the Difference Between
Two Means


Difference Between Two Means


Independent samples are drawn from each of two normal
populations.
We're interested in the sampling distribution of the
difference between the two sample means, \(\bar{x}_1 - \bar{x}_2\).


Difference Between Two Means (Cont'd)

The distribution of \(\bar{x}_1 - \bar{x}_2\) is normal if
the two samples are independent, and
the parent populations are normally distributed.
If the two populations are not both normally distributed,
but the sample sizes are 30 or more, the distribution of
\(\bar{x}_1 - \bar{x}_2\) is approximately normal.


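Difference Between Two Means (Cont'd)

Applying the laws of expected value and variance we have:
\[ \mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2, \qquad \sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \]
We can define:
\[ Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]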


Example
Starting salaries for MBA grads at two universities are
normally distributed with the following means and standard
deviations. Samples from each school are taken.

              University 1   University 2
Mean          62,000 $/yr    60,000 $/yr
Std. Dev.     14,500 $/yr    18,300 $/yr
Sample size   50             60

What is the probability that the sample mean starting salary
of University #1 graduates will exceed that of the #2 grads?

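Example (Cont'd)

What is the probability that the sample mean starting
salary of University #1 graduates will exceed that of the #2
grads?
We are interested in determining \(P(\bar{X}_1 > \bar{X}_2)\). Converting this
to a difference of means, what is \(P(\bar{X}_1 - \bar{X}_2 > 0)\)?
\[ P(\bar{X}_1 - \bar{X}_2 > 0) = P\left(Z > \frac{0 - (62{,}000 - 60{,}000)}{\sqrt{\dfrac{14{,}500^2}{50} + \dfrac{18{,}300^2}{60}}}\right) = P(Z > -.64) = .7389 \]
There is about a 74% chance that the sample mean
starting salary of U. #1 will exceed that of U. #2.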
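
The 74% figure can be reproduced from the mean and variance of the difference between the two sample means. A minimal sketch, with an illustrative function name:

```python
from math import sqrt
from statistics import NormalDist

def prob_diff_positive(mu1, sigma1, n1, mu2, sigma2, n2):
    """P(sample mean 1 - sample mean 2 > 0) for independent samples."""
    mean_diff = mu1 - mu2
    se_diff = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return 1 - NormalDist(mean_diff, se_diff).cdf(0)

p = prob_diff_positive(62_000, 14_500, 50, 60_000, 18_300, 60)
print(round(p, 3))
```

The standard error of the difference is about 3,128 $/yr, so z is about -0.64 and the probability is about 0.739.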

The Central Limit Theorem (Cont'd)

If the population is normal, then \(\bar{X}\) is normally distributed
for ALL values of n.
If the population is non-normal, then \(\bar{X}\) is approximately
normal only for larger values of n.
In many practical situations, a sample size of 30 may be
sufficiently large to allow us to use the normal distribution
as an approximation for the sampling distribution of \(\bar{X}\).


The Central Limit Theorem


If the underlying distribution is close to a normal density curve,
then the approximation will be good even for a small n, whereas
if it is far from being normal, then a large n will be required.
Rule of Thumb
If n > 30, the Central Limit Theorem can be used.
There are population distributions for which even an n of 40 or
50 does not suffice, but such distributions are rarely encountered
in practice.
On the other hand, the rule of thumb is often conservative; for
many population distributions, an n much less than 30 would
suffice.
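
A small simulation can illustrate the rule of thumb. The choice of an Exponential(1) population (strongly skewed, mean 1, standard deviation 1), the number of repetitions and the seed are all assumptions made for this sketch.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_sample_means(n, reps, seed=0):
    """Means of `reps` samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

n = 30
means = simulate_sample_means(n=n, reps=5000)
se = 1 / sqrt(n)                       # theoretical sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean.
within = sum(abs(m - 1.0) <= 1.96 * se for m in means) / len(means)

print(round(mean(means), 3), round(stdev(means), 3), round(se, 3), round(within, 3))
```

If the normal approximation is adequate, the average of the sample means should be close to 1, their standard deviation close to 1/sqrt(30), and roughly 95% of them should fall within 1.96 standard errors of the population mean.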


Example (a)
The amount of soda pop in each bottle is normally
distributed with a mean of 32.2 ounces and a standard
deviation of .3 ounces.
Find the probability that a bottle bought by a customer
will contain more than 32 ounces.
Solution
The random variable X is the amount of soda in a bottle.
\[ P(x > 32) = P\left(\frac{x - \mu}{\sigma_x} > \frac{32 - 32.2}{.3}\right) = P(z > -.67) = 0.7486 \]
[Figure: normal curve with \(\mu\) = 32.2; the area to the right of
x = 32 is 0.7486.]

Example (b)
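Find the probability that a carton of four bottles will have
a mean of more than 32 ounces of soda per bottle.
Solution
Define the random variable as the mean amount of soda
per bottle.
\[ P(\bar{x} > 32) = P\left(\frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{32 - 32.2}{.3/\sqrt{4}}\right) = P(z > -1.33) = 0.9082 \]
[Figure: the narrower normal curve of \(\bar{x}\) with \(\mu_{\bar{x}}\) = 32.2;
the area to the right of \(\bar{x}\) = 32 is 0.9082, compared with 0.7486
for a single bottle.]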
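
Parts (a) and (b) differ only in the standard deviation used: .3 for a single bottle and .3 / sqrt(4) for the mean of four. A sketch with an illustrative function name:

```python
from math import sqrt
from statistics import NormalDist

def prob_above(x, mu, sigma, n=1):
    """P(mean of n observations > x) for a normal population."""
    return 1 - NormalDist(mu, sigma / sqrt(n)).cdf(x)

p_single = prob_above(32, mu=32.2, sigma=0.3)        # part (a), one bottle
p_carton = prob_above(32, mu=32.2, sigma=0.3, n=4)   # part (b), mean of four
print(round(p_single, 3), round(p_carton, 3))
```

The results are about 0.748 and 0.909; they differ in the third decimal from the table-based .7486 and .9082 because z is not rounded to two decimals here.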

Example
The amount of a particular impurity in a batch of a certain
chemical product is a random variable with mean value 4.0 g and
standard deviation 1.5 g.
If 50 batches are independently prepared, what is the
(approximate) probability that the sample average amount of
impurity is between 3.5 and 3.8 g?

According to the rule of thumb stated earlier (n > 30), n = 50 is
large enough for the CLT to be applicable.


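Example (Cont'd)

\(\bar{X}\) then has approximately a normal distribution with mean
value \(\mu_{\bar{X}}\) = 4.0 and \(\sigma_{\bar{X}} = 1.5/\sqrt{50} = .2121\),
so, with \(\Phi\) denoting the standard normal cumulative distribution function,
\[ P(3.5 \le \bar{X} \le 3.8) \approx P\left(\frac{3.5 - 4.0}{.2121} \le Z \le \frac{3.8 - 4.0}{.2121}\right) = \Phi(-.94) - \Phi(-2.36) = .1645 \]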
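
A sketch of the impurity calculation in Python, assuming (as the CLT suggests for n = 50) that the sample mean is approximately normal. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(a, b, mu, sigma, n):
    """Approximate P(a <= sample mean <= b) using the CLT."""
    dist = NormalDist(mu, sigma / sqrt(n))
    return dist.cdf(b) - dist.cdf(a)

p = prob_mean_between(3.5, 3.8, mu=4.0, sigma=1.5, n=50)
print(round(p, 3))
```

The result is about 0.164, in line with a z-table calculation that rounds z to two decimals.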


Statistical Inference
Statistical inference is the process by which we acquire information
and draw conclusions about populations from samples.
[Diagram: Data -> Statistics -> Information. A sample is drawn
from the population; the sample statistic is used to make an
inference about the population parameter.]

In order to do inference, we require the skills and knowledge of
descriptive statistics, probability distributions, and sampling
distributions.

Estimation
There are two types of inference:
estimation and
hypothesis testing.
Estimation is introduced first.
The objective of estimation is to determine the approximate
value of a population parameter on the basis of a sample
statistic.
E.g., the sample mean (\(\bar{x}\)) is employed to estimate the
population mean (\(\mu\)).
We refer to the sample mean as the estimator of the population
mean. The computed value of the sample mean is called the estimate.

Estimation
The objective of estimation is to determine the
approximate value of a population parameter on the basis of
a sample statistic.
There are two types of estimators:

Point Estimator
Interval Estimator


Point Estimator
A point estimator draws inferences about a population by
estimating the value of an unknown parameter using a single
value or point.

We saw earlier that point probabilities in continuous
distributions are virtually zero, so a point estimate will almost
never equal the parameter exactly. Likewise, we'd expect
the point estimator to get closer to the parameter value as the
sample size increases, but a point estimate by itself doesn't
reflect the effect of a larger sample size. Hence we will employ
the interval estimator to estimate population parameters.

Interval Estimator
An interval estimator draws inferences about a population
by estimating the value of an unknown parameter using an
interval.

That is, we say (with some ___% certainty) that some
interval will include the population parameter of interest.


Point & Interval Estimation


For example, suppose we want to estimate the mean summer
income of a class of business students. For n = 25 students,
\(\bar{x}\) is calculated to be 400 $/week.

Point estimate: the mean income is 400 $/week.
Interval estimate: an alternative statement is that
the mean income is between 380 and 420 $/week.

Qualities of Estimators
Qualities desirable in estimators include unbiasedness,
consistency, and relative efficiency:
An unbiased estimator of a population parameter is an
estimator whose expected value is equal to that parameter.
An unbiased estimator is said to be consistent if the
difference between the estimator and the parameter grows
smaller as the sample size grows larger.
If there are two unbiased estimators of a parameter, the one
whose variance is smaller is said to be relatively efficient.


Unbiased Estimators
An unbiased estimator of a population parameter is an
estimator whose expected value is equal to that parameter.

E.g. the sample mean \(\bar{X}\) is an unbiased estimator of the
population mean \(\mu\), since:
\[ E(\bar{X}) = \mu \]


Consistency
An unbiased estimator is said to be consistent if the
difference between the estimator and the parameter grows
smaller as the sample size grows larger.
E.g. \(\bar{X}\) is a consistent estimator of \(\mu\) because:
\[ V(\bar{X}) = \sigma^2 / n \]
That is, as n grows larger, the variance of \(\bar{X}\) grows smaller.


Relative Efficiency
If there are two unbiased estimators of a parameter, the one
whose variance is smaller is said to be relatively efficient.

E.g. both the sample median and the sample mean are
unbiased estimators of the population mean. However, the
sample median has a greater variance than the sample mean,
so we choose the sample mean \(\bar{X}\), since it is relatively
efficient when compared to the sample median.


Estimating \(\mu\) when \(\sigma\) is known

We can calculate an interval estimator from a sampling
distribution by:
Drawing a sample of size n from the population
Calculating its mean, \(\bar{X}\)
And, by the central limit theorem, we know that \(\bar{X}\) is
normally (or approximately normally) distributed, so
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
will have a standard normal (or approximately standard normal)
distribution.

Estimating \(\mu\) when \(\sigma\) is known

Looking at \( Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \) in more detail:
Z: known, i.e. standard normal distribution
\(\bar{x}\): known, i.e. the sample mean
\(\mu\): unknown, i.e. we want to estimate the population mean
\(\sigma\): known, i.e. it's assumed we know the population
standard deviation
n: known, i.e. the number of items sampled


Estimating \(\mu\) when \(\sigma\) is known

The sample mean is in the center of the confidence interval.
Thus, the probability that the interval
\[ \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right] \]
contains the population mean \(\mu\) is \(1 - \alpha\). This is a
confidence interval estimator for \(\mu\).

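Confidence Interval Estimator for \(\mu\)

The probability \(1 - \alpha\) is called the confidence level.
The confidence interval estimator is usually represented
with a plus/minus (\(\pm\)) sign:
\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
Lower confidence limit (LCL): \( \bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n} \)
Upper confidence limit (UCL): \( \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n} \)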


Graphically
[Figure: the confidence interval for \(\mu\), centered at \(\bar{x}\), with
its width marked.]

Graphically
[Figure: the actual location of the population mean \(\mu\) may be
anywhere inside the interval, or possibly even outside it.]

The population mean is a fixed but unknown quantity. It is incorrect
to interpret the confidence interval estimate as a probability
statement about \(\mu\). The interval acts as the lower and upper
limits of the interval estimate of the population mean.

Table: Four Commonly Used Confidence Levels and \(z_{\alpha/2}\)

1 - alpha   alpha   alpha/2   z_(alpha/2)
.90         .10     .05       z.05  = 1.645
.95         .05     .025      z.025 = 1.96
.98         .02     .01       z.01  = 2.33
.99         .01     .005      z.005 = 2.575
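
The critical values in the table can be recomputed from the inverse of the standard normal cumulative distribution function. A sketch using `statistics.NormalDist` (the function name is illustrative):

```python
from statistics import NormalDist

def z_critical(confidence):
    """z_{alpha/2} for a given confidence level 1 - alpha."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(level, round(z_critical(level), 3))
```

The computed values are 1.645, 1.960, 2.326 and 2.576; the table's 2.33 and 2.575 are the usual rounded z-table look-ups.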


Example
A computer company samples demand during lead time over
25 time periods:

235  374  309  499  253
421  361  514  462  369
394  439  348  344  330
261  374  302  466  535
386  316  296  332  334

It is known that the standard deviation of demand over lead
time is 75 computers. We want to estimate the mean demand
over lead time with 95% confidence in order to set inventory
levels.


Example (Cont'd)

We want to estimate the mean demand over lead time with
95% confidence in order to set inventory levels.

IDENTIFY
Thus, the parameter to be estimated is the population mean, \(\mu\),
and so our confidence interval estimator will be:
\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]


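Example (Cont'd)

CALCULATE
In order to use our confidence interval estimator, we need the
following pieces of data:
\(\bar{x}\) = 370.16 (calculated from the data)
\(z_{\alpha/2}\) = 1.96, \(\sigma\) = 75, n = 25 (given)
therefore:
\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96\,\frac{75}{\sqrt{25}} = 370.16 \pm 29.40 \]
The lower and upper confidence limits are 340.76 and 399.56.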
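
The CALCULATE step can be sketched in Python as follows. The 25 demand values are the ones listed in the example; the function name `z_interval` is illustrative, and the critical value is computed with `inv_cdf` instead of being read from the table.

```python
from math import sqrt
from statistics import NormalDist, mean

demand = [
    235, 374, 309, 499, 253,
    421, 361, 514, 462, 369,
    394, 439, 348, 344, 330,
    261, 374, 302, 466, 535,
    386, 316, 296, 332, 334,
]

def z_interval(data, sigma, confidence):
    """Confidence interval for the mean when sigma is known."""
    n = len(data)
    xbar = mean(data)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

lcl, ucl = z_interval(demand, sigma=75, confidence=0.95)
print(mean(demand), round(lcl, 2), round(ucl, 2))
```

The sample mean is 370.16 and the limits round to 340.76 and 399.56, matching the hand calculation.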

Example (Cont'd)

INTERPRET

The estimate of the mean demand during lead time lies
between 340.76 and 399.56. We can use this as input in
developing an inventory policy.

That is, we estimated that the mean demand during lead
time falls between 340.76 and 399.56, and this type of
estimator is correct 95% of the time. That also means that
5% of the time the estimator will be incorrect.
Incidentally, the media often refer to the 95% figure as "19
times out of 20," which emphasizes the long-run aspect of
the confidence level.

Interval Width
A wide interval provides little information.
For example, suppose we estimate with 95% confidence that
an accountant's average starting salary is between $15,000
and $100,000.
Contrast this with: a 95% confidence interval estimate of
starting salaries between $42,000 and $45,000.
The second estimate is much narrower, providing accounting
students more precise information about starting salaries.



Interval Width
The width of the confidence interval estimate is a function of
the confidence level, the population standard deviation, and
the sample size

A larger confidence level produces a wider
confidence interval.

Interval Width
The width of the confidence interval estimate is a function of
the confidence level, the population standard deviation, and
the sample size

Larger values of the standard deviation produce wider
confidence intervals.

Interval Width
The width of the confidence interval estimate is a function of
the confidence level, the population standard deviation,
and the sample size

Increasing the sample size decreases the width of the
confidence interval while the confidence level can
remain unchanged.
Note: this also increases the cost of obtaining additional data.

Selecting the Sample Size


Earlier we pointed out that sampling error is the difference
between an estimator and a parameter.
We can also define this difference as the error of
estimation.
In this chapter this can be expressed as the difference
between \(\bar{x}\) and \(\mu\).


Selecting the Sample Size


The bound on the error of estimation is
\[ B = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
With a little algebra we find the sample size needed to
estimate a mean:
\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 \]


Selecting the Sample Size


To illustrate, suppose that the manager had decided
that he needed to estimate the mean demand during lead time
to within 16 units, which is the bound on the error of
estimation.
We also have \(1 - \alpha\) = .95 and \(\sigma\) = 75. We calculate
\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 = \left(\frac{(1.96)(75)}{16}\right)^2 = 84.41 \]


Selecting the Sample Size


Because n must be an integer and because we want the
bound on the error of estimation to be no more than 16, any
non-integer value must be rounded up.
Thus, the value of n is rounded up to 85, which means that to be
95% confident that the error of estimation will be no larger
than 16, we need to randomly sample 85 lead time intervals.


Example
A lumber company must estimate the mean diameter of
trees to determine whether or not there is sufficient lumber
to harvest an area of forest. They need to estimate this to
within 1 inch at a confidence level of 99%. The tree
diameters are normally distributed with a standard
deviation of 6 inches.
How many trees need to be sampled?


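Example (Cont'd)

B = 1, \(\sigma\) = 6
\(1 - \alpha\) = .99, \(\alpha\) = 0.01, \(\alpha/2\) = 0.005. From the table,
\(z_{.005}\) = 2.575.
We compute
\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 = \left(\frac{(2.575)(6)}{1}\right)^2 = 238.7, \text{ which is rounded up to } 239 \]
That is, we will need to sample at least 239 trees to have a
99% confidence interval of \(\bar{x} \pm 1\).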
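
The sample-size formula, including the rounding-up step, can be sketched as a small Python function. The function name is illustrative, and the critical value is computed with `inv_cdf` instead of being read from the table.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, bound, confidence):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) <= bound."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / bound) ** 2)   # always round up

print(required_sample_size(sigma=6, bound=1, confidence=0.99))     # tree diameters
print(required_sample_size(sigma=75, bound=16, confidence=0.95))   # lead-time demand
```

This gives 239 trees for the lumber example and 85 lead time intervals for the earlier inventory example.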
