MTH410-S16- Lecture 07
Lecture 7
Data Collection and Sampling
The Central Limit Theorem
Introduction to Estimation
Point Estimation (for the mean)
Interval Estimation: Confidence Intervals (for the mean, standard deviation known)
Determining Sample Sizes for Given Confidence Levels
Surveys
A survey solicits information from people; e.g. Gallup polls;
pre-election polls; marketing surveys.
The response rate (i.e. the proportion of all people selected who complete the survey) is a key survey parameter. A low response rate can destroy the validity of any conclusion drawn from the survey.
Surveys (Contd)
Surveys may be administered in a variety of ways, e.g.
Personal Interview: higher response rate and fewer incorrect responses due to misunderstanding, but expensive;
Telephone Interview: less expensive, but less personal and with a lower expected response rate;
Self-Administered Survey (usually mailed to a sample of people): inexpensive, but with a lower response rate and a relatively high rate of misunderstanding.
Questionnaire Design
Over the years, a great deal of thought has gone into the science of designing survey questions. Key design principle:
KISS: Keep It Simple, Stupid
E.g. Keep the questionnaire as short as possible,
Ask short, simple, and clearly worded questions,
Start with simple questions,
Use Yes-No or multiple choice questions,
Avoid leading questions,
Make it easy to analyze & present the collected data
Sampling
Recall: statistical inference permits us to draw conclusions
about a population based on a sample.
Rationale: a) cost (less expensive to sample 1,000 television
viewers than 100 million TV viewers), and
b) practicality (e.g. performing a crash test on every vehicle
produced is impractical).
Self-selected Samples
Self-selected samples are almost always biased, because the individuals who participate in them are most likely more interested in the issue than other members of the population.
E.g. radio and television stations often ask people to call in and give their opinion on an issue of interest.
However, only listeners who are concerned about the topic and have enough patience to get through to the station will be included in the sample.
That is, the sampled population differs from the target population, so the conclusions drawn from such surveys are frequently wrong.
Sampling Plans
A sampling plan is just a method or procedure for
specifying how a sample will be taken from a population.
We will focus our attention on these three methods:
Simple Random Sampling,
Stratified Random Sampling, and
Cluster Sampling.
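Simple random sampling, the first of these plans, can be sketched in a few lines of Python. This is a minimal illustration that assumes a complete list (a sampling frame) of the population is available; the ID numbers below are hypothetical.

```python
import random

# Hypothetical sampling frame: ID numbers 1..10,000 standing in for population members.
population = list(range(1, 10_001))

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Simple random sampling: every subset of 100 members is equally likely to be chosen.
srs = rng.sample(population, k=100)

print(len(srs), len(set(srs)))  # 100 distinct members, drawn without replacement
```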
Strata 2: Age: < 20 | 20-30 | 31-40 | 41-50 | 51-60 | > 60
Strata 3: Occupation: professional | white collar | blue collar | other
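Stratified random sampling divides the population into strata, such as the age groups above, and then draws a simple random sample within each stratum. The sketch below assumes a hypothetical sampling frame whose ages are randomly generated, and it uses proportional allocation (each stratum's share of the sample matches its share of the population), which is one common choice rather than the only one.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical frame of (person_id, age) pairs; ages are generated only for illustration.
frame = [(i, rng.randint(15, 80)) for i in range(5_000)]

def age_stratum(age):
    """Map an age to one of the age strata listed above."""
    if age < 20:
        return "< 20"
    if age <= 30:
        return "20-30"
    if age <= 40:
        return "31-40"
    if age <= 50:
        return "41-50"
    if age <= 60:
        return "51-60"
    return "> 60"

# Group the frame into strata.
strata = {}
for person in frame:
    strata.setdefault(age_stratum(person[1]), []).append(person)

# Proportional allocation, then a simple random sample inside each stratum.
total_sample = 500
sample = []
for name, members in strata.items():
    k = round(total_sample * len(members) / len(frame))
    sample.extend(rng.sample(members, k))

print({name: len(members) for name, members in strata.items()}, len(sample))
```

Rounding the per-stratum sizes means the total can differ from 500 by a few people.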
Cluster Sampling
A cluster sample is a simple random sample of groups or
clusters of elements (vs. a simple random sample of
individual objects).
This method is useful when it is difficult or costly to
develop a complete list of the population members or when
the population elements are widely dispersed geographically.
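The two-step logic of cluster sampling (randomly choose whole clusters, then keep every element in them) can be sketched as follows. The population of 200 city blocks with 25 households each is a hypothetical example, not data from the lecture.

```python
import random

rng = random.Random(7)  # fixed seed so the example is reproducible

# Hypothetical population: 200 city blocks (clusters), each with 25 households.
clusters = {block: [f"block{block}-hh{h}" for h in range(25)] for block in range(200)}

# Step 1: draw a simple random sample of clusters (here, 10 blocks).
chosen_blocks = rng.sample(sorted(clusters), k=10)

# Step 2: every household in a chosen block enters the sample.
sample = [hh for block in chosen_blocks for hh in clusters[block]]

print(sorted(chosen_blocks), len(sample))  # 10 blocks x 25 households = 250
```

Only the 10 chosen blocks need a household list, which is why the method suits populations without a complete frame.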
Sample Size
Numerical techniques for determining sample sizes will be described later, but we can already say that the larger the sample size, the more accurate we can expect the sample estimates to be.
Sampling Error
Sampling error refers to differences between the sample and the population that arise because of the specific observations that happen to be selected.
Sampling error is expected to occur when making a
statement about the population based on the sample taken.
Increasing the sample size will reduce this type of error.
Nonsampling Error
Nonsampling errors are more serious. They arise from mistakes made in acquiring the data, from nonresponse, or from sample observations being selected improperly.
Note: increasing the sample size will not reduce this type of
error.
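A small simulation can make the contrast between the two kinds of error concrete. This is a sketch under made-up assumptions: the population of incomes and the nonresponse mechanism (people with above-average incomes respond only 30% of the time) are hypothetical and chosen only for illustration.

```python
import random
import statistics

rng = random.Random(2016)  # fixed seed for reproducibility

# Hypothetical population of 100,000 incomes (normal, mean 800, s.d. 100).
population = [rng.gauss(800, 100) for _ in range(100_000)]
mu = statistics.fmean(population)

# Hypothetical nonresponse: people above mu respond only 30% of the time,
# so only these responders can ever end up in the sample.
responders = [x for x in population if x <= mu or rng.random() < 0.3]

for n in (25, 400, 6_400):
    random_err = statistics.fmean(rng.sample(population, n)) - mu  # sampling error only
    biased_err = statistics.fmean(rng.sample(responders, n)) - mu  # plus nonresponse bias
    print(f"n={n:>5}  random-sample error={random_err:7.2f}  responders-only error={biased_err:7.2f}")
```

As n grows, the random-sample error shrinks toward zero, while the responders-only error settles near a fixed negative bias that no sample size removes.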
Sampling Distributions
Agenda
Sampling Distribution of the Mean
Sampling Distribution of a Proportion
Sampling Distribution of the Difference Between Two Means
Introduction
In practice, computing the parameters of a population directly is usually prohibitive because populations are very large.
Sampling Distributions
A sampling distribution is created by, as the name suggests,
sampling.
The method we will employ relies on the rules of probability and the laws of expected value and variance to derive the sampling distribution.
Population: a fair die
x       1     2     3     4     5     6
P(x)   1/6   1/6   1/6   1/6   1/6   1/6
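All 36 possible samples of size n = 2 and their means:

Sample  Pair  Mean | Sample  Pair  Mean | Sample  Pair  Mean
  1     1,1   1.0  |  13     3,1   2.0  |  25     5,1   3.0
  2     1,2   1.5  |  14     3,2   2.5  |  26     5,2   3.5
  3     1,3   2.0  |  15     3,3   3.0  |  27     5,3   4.0
  4     1,4   2.5  |  16     3,4   3.5  |  28     5,4   4.5
  5     1,5   3.0  |  17     3,5   4.0  |  29     5,5   5.0
  6     1,6   3.5  |  18     3,6   4.5  |  30     5,6   5.5
  7     2,1   1.5  |  19     4,1   2.5  |  31     6,1   3.5
  8     2,2   2.0  |  20     4,2   3.0  |  32     6,2   4.0
  9     2,3   2.5  |  21     4,3   3.5  |  33     6,3   4.5
 10     2,4   3.0  |  22     4,4   4.0  |  34     6,4   5.0
 11     2,5   3.5  |  23     4,5   4.5  |  35     6,5   5.5
 12     2,6   4.0  |  24     4,6   5.0  |  36     6,6   6.0

Sampling distribution of the sample mean:

x̄       1.0   1.5   2.0   2.5   3.0   3.5   4.0   4.5   5.0   5.5   6.0
P(x̄)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

\[ E(\bar{x}) = 1.0\left(\tfrac{1}{36}\right) + 1.5\left(\tfrac{2}{36}\right) + \cdots + 6.0\left(\tfrac{1}{36}\right) = 3.5 \]
\[ V(\bar{x}) = (1.0 - 3.5)^2\left(\tfrac{1}{36}\right) + (1.5 - 3.5)^2\left(\tfrac{2}{36}\right) + \cdots = 1.46 \]
Note: \(\mu_{\bar{x}} = \mu_x\) and \(\sigma^2_{\bar{x}} = \sigma^2_x / 2\).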
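The enumeration of all 36 samples can be reproduced exactly with Python's `fractions` module, which avoids rounding. This sketch recomputes the population mean and variance of a fair die, the distribution of \(\bar{x}\) for n = 2, and its expected value and variance.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p_face = Fraction(1, 6)

# Population: a single roll of a fair die.
mu = sum(x * p_face for x in faces)                  # 7/2 = 3.5
sigma2 = sum((x - mu) ** 2 * p_face for x in faces)  # 35/12, about 2.92

# All 36 equally likely samples of size n = 2 and their means.
means = [Fraction(a + b, 2) for a, b in product(faces, repeat=2)]
dist = {m: Fraction(c, 36) for m, c in sorted(Counter(means).items())}

e_xbar = sum(m * p for m, p in dist.items())
v_xbar = sum((m - e_xbar) ** 2 * p for m, p in dist.items())

print(float(e_xbar), float(v_xbar))  # 3.5 and 35/24 (about 1.46), i.e. sigma2 / 2
```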
For larger samples:
\(n = 5\): \(\mu_{\bar{x}} = 3.5\), \(\sigma^2_{\bar{x}} = \sigma^2_x / 5 = .5833\)
\(n = 10\): \(\mu_{\bar{x}} = 3.5\), \(\sigma^2_{\bar{x}} = \sigma^2_x / 10 = .2917\)
\(n = 25\): \(\mu_{\bar{x}} = 3.5\), \(\sigma^2_{\bar{x}} = \sigma^2_x / 25 = .1167\)
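A quick Monte Carlo check of these values: simulate many samples of n die rolls and compare the variance of the simulated sample means with \(\sigma^2_x / n\). The number of repetitions (20,000) and the seed are arbitrary choices for this sketch.

```python
import random
import statistics

rng = random.Random(410)  # fixed seed for reproducibility
sigma2 = 35 / 12          # variance of one roll of a fair die
results = {}

for n in (5, 10, 25):
    # Simulate 20,000 samples of n rolls and record each sample mean.
    means = [statistics.fmean(rng.choices(range(1, 7), k=n)) for _ in range(20_000)]
    results[n] = (statistics.fmean(means), statistics.pvariance(means))
    print(f"n={n:2}  mean of xbar={results[n][0]:.3f}  "
          f"var of xbar={results[n][1]:.4f}  sigma^2/n={sigma2 / n:.4f}")
```

The simulated variances should land close to .5833, .2917 and .1167, while the mean of \(\bar{x}\) stays near 3.5 for every n.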
[Figure: the population distribution shown alongside the sampling distributions of the sample mean]
Compare the variability of the population to the variability of the sample mean.
Opening Example
Example: A dean claims that the average weekly income of B.B.A. graduates one year after graduation is $800. Suppose the distribution of weekly income has a standard deviation of $100.
What is the probability that 25 randomly selected graduates have an average weekly income of less than $750?
Solution
\[ P(\bar{x} < 750) = P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < \frac{750 - 800}{100/\sqrt{25}}\right) = P(z < -2.5) = 0.0062 \]
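The same calculation can be checked with the standard library's `statistics.NormalDist`, which provides the standard normal CDF. The sketch standardizes 750 using \(\sigma/\sqrt{n}\) and evaluates the lower-tail probability.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 100, 25  # dean's claim, population s.d., sample size
se = sigma / sqrt(n)         # standard deviation of x-bar: 100 / 5 = 20

z = (750 - mu) / se          # -2.5
p = NormalDist().cdf(z)      # P(Z < -2.5)
print(z, round(p, 4))        # -2.5 0.0062
```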
Opening Example (Contd)
Example continued
If a random sample of 25 graduates actually had an average weekly income of $750, what would you conclude about the validity of the claim that the average weekly income is $800?
Solution
With \(\mu = 800\), the probability of observing a sample mean as low as 750 is very small (0.0062). The claim that the mean weekly income is $800 is probably unjustified.
It is more reasonable to assume that \(\mu\) is smaller than $800, because then a sample mean of $750 becomes more probable.
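Using \(z_{.025} = 1.96\):
\[ P\left(-1.96 < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = .95 \]
which becomes
\[ P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = .95 \]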
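Normal distribution of \(\bar{x}\)
For the dean's claim (\(\mu = 800\), \(\sigma = 100\), \(n = 25\)):
\[ P\left(800 - 1.96\frac{100}{\sqrt{25}} < \bar{x} < 800 + 1.96\frac{100}{\sqrt{25}}\right) = .95 \]
[Figure: normal curve with area .95 between \(z = -1.96\) and \(z = 1.96\) and area .025 in each tail]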
Conclusion
There is a 95% chance that the sample mean falls within the interval [760.8, 839.2] if the population mean is 800.
Since the observed sample mean was 750, the population mean is probably not 800.
Generally
\[ P\left(\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]
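With \(\alpha = .05\), we get \(1 - \alpha = .95\) and
\[ P\left(\mu - z_{.025}\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + z_{.025}\frac{\sigma}{\sqrt{n}}\right) = P\left(800 - 1.96\frac{100}{\sqrt{25}} < \bar{X} < 800 + 1.96\frac{100}{\sqrt{25}}\right) = .95 \]
This is another way of checking the dean's claim. The probability that \(\bar{X}\) falls between 760.8 and 839.2 is 95%, so it is unlikely that we would observe a sample mean as low as 750 when the population mean is 800.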
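Similarly, with \(\alpha = .10\):
\[ P\left(\mu - 1.645\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.645\frac{\sigma}{\sqrt{n}}\right) = .90 \]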
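The general interval \(\mu \pm z_{\alpha/2}\sigma/\sqrt{n}\) can be wrapped in a small helper. The function name `xbar_interval` is hypothetical; it looks up \(z_{\alpha/2}\) with `NormalDist().inv_cdf` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def xbar_interval(mu, sigma, n, alpha):
    """Central interval that contains x-bar with probability 1 - alpha (sigma known)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half = z * sigma / sqrt(n)
    return mu - half, mu + half

lo95, hi95 = xbar_interval(800, 100, 25, 0.05)
lo90, hi90 = xbar_interval(800, 100, 25, 0.10)
print(round(lo95, 1), round(hi95, 1))  # 760.8 839.2
print(round(lo90, 1), round(hi90, 1))  # 767.1 832.9
print(750 < lo95)                      # True: 750 lies outside the 95% range
```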
Agenda
Sampling Distribution of the Mean
Sampling Distribution of a Proportion
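\[ E(\hat{p}) = \mu_{\hat{p}} = p, \qquad V(\hat{p}) = \sigma^2_{\hat{p}} = \frac{p(1-p)}{n}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]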
Example
In the last election an MP received 52% of the votes cast. One year later, the MP organized a survey that asked a random sample of 300 people whether they would vote for him in the next election. If we assume his popularity has not changed, what is the probability that more than half of the sample would vote for him?
Solution: Here \(n = 300\) and \(p = .52\). We want the probability that the sample proportion is greater than 50%, that is, \(P(\hat{p} > .50)\):
\[ P(\hat{p} > .50) = P\left(\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} > \frac{.50 - .52}{\sqrt{(.52)(.48)/300}}\right) = P(z > -.69) = .7549 \]
Example (Contd)
To find \(P(\hat{p} > .50)\), we use the fact that the sample proportion \(\hat{p}\) is approximately normally distributed with mean \(p = .52\) and standard deviation \(\sqrt{p(1-p)/n} = \sqrt{(.52)(.48)/300} = .0288\).
Thus, we calculate
\[ P(\hat{p} > .50) = P\left(\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} > \frac{.50 - .52}{.0288}\right) = P(Z > -.69) = .7549 \]
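A short check of the proportion example with `statistics.NormalDist`. Because the code does not round z to two decimals before using the CDF, the result comes out near .756 rather than the table-based .7549.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.52, 300
se = sqrt(p * (1 - p) / n)      # standard deviation of p-hat, about .0288
z = (0.50 - p) / se             # about -0.69
prob = 1 - NormalDist().cdf(z)  # P(p-hat > .50)
print(round(se, 4), round(z, 2), round(prob, 4))
```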
Agenda
Sampling Distribution of the Mean
Sampling Distribution of a Proportion
Sampling Distribution of the Difference Between Two Means
We can define:
\[ Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
Example
Starting salaries for MBA grads at two universities are normally distributed with the following means and standard deviations. Samples from each school are taken.

               University 1    University 2
Mean           62,000 $/yr     60,000 $/yr
Std. Dev.      14,500 $/yr     18,300 $/yr
Sample size    50              60
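The parameters of the sampling distribution of \(\bar{x}_1 - \bar{x}_2\) follow directly from the table. The sketch computes its mean and standard deviation and a helper `z_score` (a hypothetical name) that applies the Z formula. The final probability \(P(\bar{x}_1 - \bar{x}_2 > 0)\) is only an illustrative question chosen for this sketch, since the example's own question is not preserved here.

```python
from math import sqrt
from statistics import NormalDist

mu1, sigma1, n1 = 62_000, 14_500, 50  # University 1
mu2, sigma2, n2 = 60_000, 18_300, 60  # University 2

# Sampling distribution of x1bar - x2bar: normal with this mean and standard deviation.
mean_diff = mu1 - mu2                            # 2,000
sd_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # about 3,128

def z_score(diff):
    """Standardize an observed difference x1bar - x2bar with the Z formula."""
    return (diff - mean_diff) / sd_diff

# Illustration only: probability that University 1's sample mean exceeds University 2's.
p_positive = 1 - NormalDist().cdf(z_score(0))
print(mean_diff, round(sd_diff, 1), round(p_positive, 4))
```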
Example (Contd)
Example (a)
The amount of soda pop in each bottle is normally
distributed with a mean of 32.2 ounces and a standard
deviation of .3 ounces.
Find the probability that a bottle bought by a customer
will contain more than 32 ounces.
Solution
\[ P(x > 32) = P\left(\frac{x - \mu}{\sigma} > \frac{32 - 32.2}{.3}\right) = P(z > -.67) = 0.7486 \]
Example (b)
Find the probability that a carton of four bottles will have
a mean of more than 32 ounces of soda per bottle.
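Solution
Define the random variable as the mean amount of soda per bottle, \(\bar{x}\).
\[ P(\bar{x} > 32) = P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} > \frac{32 - 32.2}{.3/\sqrt{4}}\right) = P(z > -1.33) = 0.9082 \]
[Figure: distributions of \(x\) and \(\bar{x}\), both centred at 32.2; the area to the right of 32 is 0.7486 for \(x\) and 0.9082 for \(\bar{x}\)]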
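Both soda examples differ only in the standard deviation used: \(\sigma\) for one bottle and \(\sigma/\sqrt{4}\) for the mean of a carton. The sketch uses exact z values, so the answers (about 0.7475 and 0.9088) differ slightly from the table values, which round z to two decimals.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3  # ounces per bottle

# (a) a single bottle: X ~ N(32.2, 0.3)
p_bottle = 1 - NormalDist(mu, sigma).cdf(32)

# (b) mean of a carton of n = 4 bottles: x-bar ~ N(32.2, 0.3 / sqrt(4))
p_carton = 1 - NormalDist(mu, sigma / sqrt(4)).cdf(32)

print(round(p_bottle, 4), round(p_carton, 4))
```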
Example
The amount of a particular impurity in a batch of a certain
chemical product is a random variable with mean value 4.0 g and
standard deviation 1.5 g.
If 50 batches are independently prepared, what is the
(approximate) probability that the sample average amount of
impurity is between 3.5 and 3.8 g?
Example (Contd)
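A minimal sketch of the impurity calculation, assuming (as the question's word "approximate" suggests) that the Central Limit Theorem makes \(\bar{x}\) approximately normal with mean 4.0 and standard deviation \(1.5/\sqrt{50}\).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 4.0, 1.5, 50

# CLT approximation: x-bar is roughly N(mu, sigma / sqrt(n)) for n = 50 batches.
xbar = NormalDist(mu, sigma / sqrt(n))
prob = xbar.cdf(3.8) - xbar.cdf(3.5)  # P(3.5 < x-bar < 3.8)
print(round(prob, 4))                 # about .164
```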
Statistical Inference
Statistical inference is the process by which we acquire information
and draw conclusions about populations from samples.
[Figure: a sample is drawn from the population; a statistic computed from the sample data is used to make an inference about the population parameter]
Estimation
There are two types of inference: estimation and hypothesis testing. Estimation is introduced first.
The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic.
E.g., the sample mean (\(\bar{x}\)) is employed to estimate the population mean (\(\mu\)).
We refer to the sample mean as the estimator of the population mean. The computed value of the sample mean is called the estimate.
Estimation
The objective of estimation is to determine the
approximate value of a population parameter on the basis of
a sample statistic.
There are two types of estimators:
Point Estimator
Interval Estimator
Point Estimator
A point estimator draws inferences about a population by
estimating the value of an unknown parameter using a single
value or point.
Interval Estimator
An interval estimator draws inferences about a population
by estimating the value of an unknown parameter using an
interval.
[Figure: point estimate vs. interval estimate]
Qualities of Estimators
Qualities desirable in estimators include unbiasedness,
consistency, and relative efficiency:
An unbiased estimator of a population parameter is an
estimator whose expected value is equal to that parameter.
An unbiased estimator is said to be consistent if the
difference between the estimator and the parameter grows
smaller as the sample size grows larger.
If there are two unbiased estimators of a parameter, the one
whose variance is smaller is said to be relatively efficient.
Unbiased Estimators
An unbiased estimator of a population parameter is an
estimator whose expected value is equal to that parameter.
Consistency
An unbiased estimator is said to be consistent if the
difference between the estimator and the parameter grows
smaller as the sample size grows larger.
E.g. \(\bar{X}\) is a consistent estimator of \(\mu\) because
\[ V(\bar{X}) = \frac{\sigma^2}{n} \]
That is, as \(n\) grows larger, the variance of \(\bar{X}\) grows smaller.
Relative Efficiency
If there are two unbiased estimators of a parameter, the one
whose variance is smaller is said to be relatively efficient.
E.g. for a normal population, both the sample median and the sample mean are unbiased estimators of the population mean; however, the sample median has a greater variance than the sample mean, so we choose \(\bar{x}\), since it is relatively efficient when compared to the sample median.
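Relative efficiency can be seen in a simulation: draw many samples from a normal population, compute both the mean and the median of each, and compare how much the two estimators vary. The population (\(\mu = 50\), \(\sigma = 10\)), the sample size n = 25 and the number of repetitions are hypothetical choices for this sketch.

```python
import random
import statistics

rng = random.Random(81)                   # fixed seed for reproducibility
mu, sigma, n, reps = 50, 10, 25, 10_000  # hypothetical normal population and settings

means, medians = [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Both estimators are centred on mu...
print(round(statistics.fmean(means), 2), round(statistics.fmean(medians), 2))
# ...but the median varies more, so the mean is relatively efficient.
var_mean, var_median = statistics.pvariance(means), statistics.pvariance(medians)
print(round(var_mean, 2), round(var_median, 2), round(var_median / var_mean, 2))
```

The variance of the means should be close to \(\sigma^2/n = 4\), and the median's variance should be noticeably larger.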
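Estimating \(\mu\) when \(\sigma\) is known
Rearranging the probability statement for \(\bar{x}\) so that \(\mu\) is in the middle gives
\[ P\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]
Here \(\bar{x}\) is known (it is the sample mean), while \(\mu\) is unknown (it is the population mean we want to estimate).
Before the sample is drawn, the probability that this interval contains \(\mu\) is \(1 - \alpha\), the confidence level. This is a confidence interval estimator for \(\mu\), usually represented with a plus/minus (\(\pm\)) sign:
\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
lower confidence limit (LCL) \(= \bar{x} - z_{\alpha/2}\sigma/\sqrt{n}\); upper confidence limit (UCL) \(= \bar{x} + z_{\alpha/2}\sigma/\sqrt{n}\)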
Graphically
[Figure: the confidence interval for \(\mu\), of width \(2z_{\alpha/2}\sigma/\sqrt{n}\), drawn around \(\bar{x}\); the actual location of the population mean may be inside the interval or outside it]
The population mean is a fixed but unknown quantity. It is incorrect to interpret the confidence interval estimate as a probability statement about \(\mu\). The interval acts as the lower and upper limits of the interval estimate of the population mean.
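The correct interpretation can be illustrated by simulation: \(\mu\) stays fixed, each new sample produces a different interval, and about 95% of those intervals contain \(\mu\). The "true" values \(\mu = 370\), \(\sigma = 75\) and n = 25 below are hypothetical choices for the sketch.

```python
import random
import statistics
from math import sqrt

rng = random.Random(87)     # fixed seed for reproducibility
mu, sigma, n = 370, 75, 25  # hypothetical true values for the simulation
z = 1.96                    # z_.025 for 95% confidence
half = z * sigma / sqrt(n)

trials, covered = 10_000, 0
for _ in range(trials):
    xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    # Each sample gives a different interval; mu itself never moves.
    if xbar - half < mu < xbar + half:
        covered += 1

coverage = covered / trials
print(coverage)  # close to 0.95
```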
1 − α    α      α/2     z_{α/2}
.90     .10    .05     z.05 = 1.645
.95     .05    .025    z.025 = 1.96
.98     .02    .01     z.01 = 2.33
.99     .01    .005    z.005 = 2.575
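The table values can be regenerated from the inverse of the standard normal CDF, since \(z_{\alpha/2}\) is the point with upper-tail area \(\alpha/2\). The exact values (1.645, 1.960, 2.326, 2.576) agree with the table's rounded entries.

```python
from statistics import NormalDist

z_table = {}
for conf in (0.90, 0.95, 0.98, 0.99):
    alpha = 1 - conf
    # z_{alpha/2} leaves area alpha/2 in the upper tail of the standard normal curve.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    z_table[conf] = z
    print(f"{conf:.2f}  alpha={alpha:.2f}  alpha/2={alpha / 2:.3f}  z={z:.3f}")
```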
Example
A computer company samples demand during lead time over 25 time periods:
235  421  394  261  386
374  361  439  374  316
309  514  348  302  296
499  462  344  466  332
253  369  330  535  334
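Example (Contd)
CALCULATE
The sample mean of the 25 observations is \(\bar{x} = 370.16\). Given \(\sigma = 75\) and \(n = 25\), and with \(z_{.025} = 1.96\) for 95% confidence:
\[ \bar{x} \pm z_{.025}\frac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96\frac{75}{\sqrt{25}} = 370.16 \pm 29.40 \]
The lower and upper confidence limits are 340.76 and 399.56.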
Example (Contd)
INTERPRET
We estimate, with 95% confidence, that the mean demand during lead time lies between 340.76 and 399.56. We can use this estimate as input in developing an inventory policy.
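The whole CALCULATE step can be reproduced from the raw data. This sketch computes \(\bar{x}\) from the 25 observations and builds the 95% interval with the given \(\sigma = 75\).

```python
from math import sqrt
from statistics import NormalDist, fmean

demand = [235, 421, 394, 261, 386, 374, 361, 439, 374, 316,
          309, 514, 348, 302, 296, 499, 462, 344, 466, 332,
          253, 369, 330, 535, 334]
sigma = 75    # population standard deviation, given
alpha = 0.05  # 95% confidence

n = len(demand)
xbar = fmean(demand)                     # 370.16
z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
half_width = z * sigma / sqrt(n)         # about 29.40
lcl, ucl = xbar - half_width, xbar + half_width
print(round(xbar, 2), round(lcl, 2), round(ucl, 2))  # 370.16 340.76 399.56
```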
Interval Width
A wide interval provides little information.
For example, suppose we estimate with 95% confidence that an accountant's average starting salary is between $15,000 and $100,000.
Contrast this with: a 95% confidence interval estimate of
starting salaries between $42,000 and $45,000.
The second estimate is much narrower, providing accounting
students more precise information about starting salaries.
Interval Width
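The width of the confidence interval estimate, \(2z_{\alpha/2}\sigma/\sqrt{n}\), is a function of the confidence level, the population standard deviation, and the sample size: it increases with the confidence level and with \(\sigma\), and decreases as \(n\) increases.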
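The effect of each factor on the width can be checked numerically. The helper `ci_width` is a hypothetical name, and the baseline values (\(\sigma = 75\), n = 25, 95%) are borrowed from the lead-time demand example for illustration.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(conf, sigma, n):
    """Width 2 * z_{alpha/2} * sigma / sqrt(n) of the sigma-known interval for mu."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z * sigma / sqrt(n)

base = ci_width(0.95, 75, 25)              # about 58.8
print(round(base, 1))
print(round(ci_width(0.99, 75, 25), 1))    # higher confidence -> wider
print(round(ci_width(0.95, 150, 25), 1))   # larger sigma -> wider (doubling sigma doubles width)
print(round(ci_width(0.95, 75, 100), 1))   # larger n -> narrower (4x n halves the width)
```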
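Setting the half-width of the 95% interval for the lead-time demand (\(\sigma = 75\)) equal to 16:
\[ z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \frac{(1.96)(75)}{\sqrt{n}} = 16 \quad\Rightarrow\quad n = \left(\frac{(1.96)(75)}{16}\right)^2 = 84.41 \]
so at least 85 observations are needed.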
Example
A lumber company must estimate the mean diameter of
trees to determine whether or not there is sufficient lumber
to harvest an area of forest. They need to estimate this to
within 1 inch at a confidence level of 99%. The tree
diameters are normally distributed with a standard
deviation of 6 inches.
How many trees need to be sampled?
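Example (Contd)
\(B = 1\), \(\sigma = 6\), \(1 - \alpha = .99\), so \(\alpha = .01\) and \(\alpha/2 = .005\). From the table, \(z_{.005} = 2.575\).
We compute
\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 = \left(\frac{(2.575)(6)}{1}\right)^2 = 238.7, \text{ rounded up to } 239 \]
That is, we will need to sample at least 239 trees to have a 99% confidence interval of \(\bar{x} \pm 1\).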
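The sample-size formula \(n = (z_{\alpha/2}\sigma/B)^2\), rounded up to the next whole number, fits in one line. The function name `sample_size` is hypothetical; the z values are passed in from the table so the results match the worked examples.

```python
from math import ceil

def sample_size(z, sigma, bound):
    """Smallest n with z * sigma / sqrt(n) <= bound, i.e. ceil((z * sigma / bound) ** 2)."""
    return ceil((z * sigma / bound) ** 2)

print(sample_size(1.96, 75, 16))  # lead-time demand: (1.96*75/16)^2 = 84.41 -> 85
print(sample_size(2.575, 6, 1))   # trees: (2.575*6/1)^2 = 238.7 -> 239
```

Rounding up (rather than to the nearest integer) guarantees the interval's half-width does not exceed B.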