
THE DESIGN OF RESEARCH

McGraw-Hill/Irwin © 2003 The McGraw-Hill Companies, Inc.,All Rights Reserved.


SAMPLING DESIGN
Selection of Elements

• Population

• Population Element - Unit

• Sampling

• Sample

• Census

• Sampling Frame

• Variable – characteristic of a unit


What is a Good Sample?

• Accurate: absence of bias

• Precise estimate: sampling error


Sample Mean and the
Central Limit Theorem

• The sample mean is an unbiased estimator of \(\mu\); therefore,

\[ E(\bar{X}) = E(X) = \mu \]

• The standard error of the mean is the standard deviation of the
sampling error of \(\bar{x}\):

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
Sample Mean and the
Central Limit Theorem

• If the population is exactly normal, then
the sample mean follows a normal
distribution.
Sample Mean and the Central Limit Theorem

• Example: the average price, \(\mu\), of a 5 GB MP3
player is $80.00 with a standard deviation, \(\sigma\), equal
to $10.00. What will be the mean and standard
error from a sample of 20 players?

\[ E(\bar{X}) = E(X) = \mu = \$80.00 \]

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{20}} = \$2.236 \]

• If the distribution of prices for these players is a
normal distribution, then the sampling distribution of
\(\bar{x}\) is N(80.00, 2.236), where 2.236 is the standard error.
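The arithmetic in this example can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and do not come from the slides.

```python
import math

mu = 80.00     # population mean price, from the example
sigma = 10.00  # population standard deviation, from the example
n = 20         # sample size, from the example

expected_mean = mu                     # E(X-bar) equals mu
standard_error = sigma / math.sqrt(n)  # sigma / sqrt(n)

print(f"E(X-bar) = {expected_mean:.2f}")
print(f"standard error = {standard_error:.3f}")
```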
Sample Mean and the Central Limit Theorem


• Central Limit Theorem (CLT) for a Mean
• If a random sample of size n is drawn from a
population with mean \(\mu\) and standard deviation
\(\sigma\), the distribution of the sample mean \(\bar{x}\)
approaches a normal distribution with mean \(\mu\)
and standard deviation \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\) as the
sample size increases.
• If the population is normal, the distribution of
the sample mean is normal regardless of
sample size.
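A small simulation can make the CLT statement concrete. The sketch below assumes a skewed exponential population with mean 1 and standard deviation 1; that choice, the seed, and the helper name `simulate_sample_means` are illustrative assumptions, not part of the slides. It checks only that the mean and standard deviation of the sample means track \(\mu\) and \(\sigma/\sqrt{n}\); it does not test the shape of the distribution.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, rng):
    """Return `reps` sample means, each from n draws of an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable
results = {}
for n in (4, 16, 64):
    means = simulate_sample_means(n, 5000, rng)
    # Store (mean of sample means, standard deviation of sample means).
    results[n] = (statistics.fmean(means), statistics.pstdev(means))
    predicted = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1
    print(f"n = {n:2d}: mean = {results[n][0]:.3f}, "
          f"sd = {results[n][1]:.3f}, predicted sd = {predicted:.3f}")
```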
Sample Mean and the Central Limit Theorem

• Symmetric Population: Uniform Distribution
• Rule of thumb: to obtain a normal distribution
for the sample mean, n > 30.
• A much smaller n will suffice if the population is
symmetric.
• For example, consider a uniform population U(500, 1500).
Sample Mean and the Central Limit Theorem

• Symmetric Population: Uniform Distribution

• This population has mean \(\mu = 1000\) and standard deviation
\(\sigma = 288.7\). The central limit theorem predicts that the means of
samples drawn from it will be centered at 1000, with a standard error of the
mean of:

Predicted S.E. for \(\bar{x}\): \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\)

n = 1: \(288.7/\sqrt{1} = 288.7\)
n = 2: \(288.7/\sqrt{2} = 204.1\)
n = 4: \(288.7/\sqrt{4} = 144.3\)
n = 16: \(288.7/\sqrt{16} = 72.2\)
Sample Mean and the
Central Limit Theorem

• Histograms of Sample Means from Uniform Population


Sample Mean and the Central Limit Theorem

• Range of Sample Means

• The CLT permits a range or interval within
which the sample means are expected to fall:

\[ \mu \pm z \frac{\sigma}{\sqrt{n}} \]

where z is from the standard normal table.
• If we know \(\mu\) and \(\sigma\), the range of sample means for
samples of size n is predicted to be:

90% Interval: \(\mu \pm 1.645\,\sigma/\sqrt{n}\)
95% Interval: \(\mu \pm 1.960\,\sigma/\sqrt{n}\)
99% Interval: \(\mu \pm 2.576\,\sigma/\sqrt{n}\)
Sample Mean and the Central Limit Theorem

• Illustration: GMAT Scores

• For samples of size n = 5 applicants, within what
range would GMAT means be expected to fall?
• The parameters are \(\mu = 520.78\) and \(\sigma = 86.8\). The
predicted range for 95% of the sample means is:

\[ \mu \pm 1.960 \frac{\sigma}{\sqrt{n}} = 520.78 \pm 1.960 \frac{86.8}{\sqrt{5}} = 520.78 \pm 76.08 \]

that is, from 444.70 to 596.86.
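The GMAT calculation can be verified directly. This is a minimal sketch using the parameters given in the illustration; the variable names are illustrative.

```python
import math

mu, sigma = 520.78, 86.8  # population parameters from the illustration
n = 5                     # sample size from the illustration
z = 1.960                 # multiplier for a 95% range

margin = z * sigma / math.sqrt(n)
lower, upper = mu - margin, mu + margin

print(f"margin = {margin:.2f}")
print(f"95% of sample means expected between {lower:.2f} and {upper:.2f}")
```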
Sample Mean and the
Central Limit Theorem

• Sample Size and Standard Error

• The standard error declines as n
increases, but at a decreasing rate.

Make the interval \(\mu \pm z\,\sigma/\sqrt{n}\) small by increasing n.

The distribution of sample means collapses
at the true population mean \(\mu\) as n increases.
Sample Mean and the
Central Limit Theorem

• Illustration: All Possible Samples from a Uniform Population

• Consider a discrete uniform population
consisting of the integers {0, 1, 2, 3}.

• The population parameters are:
\(\mu = 1.5\), \(\sigma = 1.118\)
CONFIDENCE STATEMENTS

• A CONFIDENCE STATEMENT (CS) HAS TWO PARTS
– MARGIN OF ERROR, WHICH SAYS
HOW CLOSE THE SAMPLE STATISTIC
LIES TO THE POPULATION PARAMETER
– LEVEL OF CONFIDENCE, WHICH SAYS
WHAT PERCENT OF ALL POSSIBLE
SAMPLES SATISFY THE MARGIN OF
ERROR
INTERPRETING CONFIDENCE STATEMENTS

• THE CONCLUSION OF A CS ALWAYS APPLIES TO THE
POPULATION AND NOT TO THE SAMPLE
• OUR CONCLUSION ABOUT THE POPULATION IS NEVER
COMPLETELY CERTAIN, I.E., WE CANNOT REACH A 100%
CONFIDENCE LEVEL
• TO BE 99% CONFIDENT WE MUST ACCEPT A LARGER
MARGIN OF ERROR THAN FOR 95%, I.E., THERE IS A TRADE-OFF
BETWEEN HOW CLOSELY WE CAN PIN DOWN THE
POPULATION PARAMETER (MARGIN OF ERROR) AND HOW
CONFIDENT WE ARE THAT A SAMPLE MEETS THE
MARGIN OF ERROR
• IT IS USUAL TO REPORT THE MARGIN OF ERROR (ME) FOR 95% CONFIDENCE
• IF WE WANT A SMALLER MARGIN OF ERROR WITH THE
SAME CONFIDENCE, THEN TAKE A LARGER SAMPLE:
SAMPLE SIZE CONTROLS THE PRECISION OF THE
RESULT
• DIFFERENT MARGINS OF ERROR WILL BE APPLICABLE TO DIFFERENT
SITUATIONS, DEPENDING ON THE PRECISION WARRANTED
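The link between sample size and margin of error can be sketched numerically. The code below assumes we are estimating a mean with a known population standard deviation, so the margin of error is \(z\,\sigma/\sqrt{n}\) and the required sample size is \((z\,\sigma/E)^2\) rounded up. The function name and the numbers (\(\sigma = 10\), margins of 2 and 1) are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, level=0.95):
    """Smallest n whose margin of error z * sigma / sqrt(n) is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical inputs: sigma = 10, desired margins of error 2 and 1.
print(required_sample_size(10, 2, 0.95))  # same confidence, wider margin
print(required_sample_size(10, 1, 0.95))  # halving the margin needs about 4x the sample
print(required_sample_size(10, 2, 0.99))  # higher confidence needs a larger sample
```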
Confidence Interval for a Mean (\(\mu\)) with Known \(\sigma\)

• What is a Confidence Interval?

• A sample mean \(\bar{x}\) is a point estimate of the population
mean \(\mu\).
• A confidence interval for the mean is a range
lower < \(\mu\) < upper
• The confidence level is the probability that the procedure produces a
confidence interval containing the true population mean.
• The confidence level (usually expressed as a %) is the
area under the curve of the sampling distribution.
Confidence Interval for a Mean (\(\mu\)) with Known \(\sigma\)

• What is a Confidence Interval?

• The confidence interval for \(\mu\) with known \(\sigma\) is:

\[ \bar{x} \pm z \frac{\sigma}{\sqrt{n}} \]
Confidence Interval for a Mean (\(\mu\)) with Known \(\sigma\)

• Choosing a Confidence Level

• A higher confidence level leads to a wider
confidence interval.
• Greater confidence implies loss of precision.
• 95% confidence is most often used.
Confidence Interval for a
Mean (\(\mu\)) with Known \(\sigma\)

• Interpretation
• A confidence interval either does or does
not contain \(\mu\).
• The confidence level quantifies the risk.
• Out of 100 confidence intervals at the 95% level,
approximately 95 would contain \(\mu\),
while approximately 5 would not.
Confidence Interval for a Mean (\(\mu\)) with Unknown \(\sigma\)

• Confidence Interval Width


• Confidence interval width reflects
- the sample size,
- the confidence level and
- the standard deviation.
• To obtain a narrower interval and more
precision
- increase the sample size or
- lower the confidence level (e.g., from 90% to
80% confidence)
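The effect of each factor on interval width can be sketched as follows. For simplicity the code uses the known-\(\sigma\) formula, width = \(2z\,\sigma/\sqrt{n}\); with unknown \(\sigma\), a Student's t critical value and the sample standard deviation would take the place of z and \(\sigma\). The function name and input values are hypothetical.

```python
import math
from statistics import NormalDist

def interval_width(sigma, n, level):
    """Full width of a z-based confidence interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * sigma / math.sqrt(n)

base = interval_width(10, 25, 0.90)               # hypothetical starting point
larger_sample = interval_width(10, 100, 0.90)     # four times the sample size
lower_confidence = interval_width(10, 25, 0.80)   # 90% lowered to 80%

print(f"baseline width:        {base:.2f}")
print(f"with n = 100:          {larger_sample:.2f}")
print(f"with 80% confidence:   {lower_confidence:.2f}")
```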
Types of Sampling Designs

• Probability

• Nonprobability
Steps in Sampling Design

• What is the relevant population?


• What are the parameters of interest?
• What is the sampling frame?
• What is the type of sample?
• What size sample is needed?
• How much will it cost?
Concepts to Help Understand
Probability Sampling

• Standard error

• Confidence interval

• Central limit theorem


• Sampling distribution
• Sampling Variation - Variance
BIAS & LACK OF PRECISION

• LARGE SAMPLES ARE MORE TRUSTWORTHY
THAN SMALLER ONES
• THE RESULTS OF SMALLER SAMPLES
ARE STILL CENTRED AT THE TRUTH FOR THE
POPULATION, BUT THEY ARE MUCH MORE
SPREAD OUT
• WE SELECT A SAMPLE TO GAIN INFORMATION ABOUT THE
POPULATION. AN ERROR COULD OCCUR IN
WHICH THE STATISTIC MISSES THE TRUE VALUE
OF THE PARAMETER
• BULL'S-EYE AND BULLET-ON-TARGET ANALOGY
BIAS & LACK OF PRECISION

• BIAS: CONSISTENTLY OFF THE BULL'S-EYE
• LACK OF PRECISION: SCATTER
• RANDOM SAMPLING (RS) REDUCES BIAS
• LARGE SAMPLES INCREASE PRECISION
• A GOOD SAMPLING SCHEME MUST HAVE
LOW BIAS AND HIGH PRECISION
• SIZE OF THE POPULATION HAS LITTLE
INFLUENCE ON THE BEHAVIOR OF
STATISTICS FROM RANDOM SAMPLES
Probability Sampling Designs

• Simple random sampling


• Systematic sampling
• Stratified sampling
– Proportionate
– Disproportionate
• Cluster sampling
• Double sampling
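Three of these designs can be sketched in a few lines each. The code below is only an illustration under simple assumptions: the sampling frame is a hypothetical list of 100 unit IDs, the strata are made up, and the function names are not from the slides. The systematic version assumes the frame length is a multiple of the sample size, and the stratified version uses rounding, which may not sum exactly to n for other inputs.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n distinct units, each subset equally likely."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Take every k-th unit after a random start, with k = len(frame) // n."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]

def proportionate_stratified_sample(strata, n, rng):
    """Allocate n in proportion to stratum size, then sample randomly within each."""
    total = sum(len(units) for units in strata.values())
    return {
        name: rng.sample(units, round(n * len(units) / total))
        for name, units in strata.items()
    }

rng = random.Random(1)
frame = list(range(100))  # hypothetical sampling frame of unit IDs
strata = {"A": frame[:60], "B": frame[60:90], "C": frame[90:]}  # hypothetical strata

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = proportionate_stratified_sample(strata, 10, rng)

print(sorted(srs))
print(systematic)
print({name: len(units) for name, units in stratified.items()})
```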
Designing Cluster Samples

• How homogeneous are the clusters?


• Shall we seek equal or unequal
clusters?
• How large a cluster shall we take?
• Shall we use a single-stage or
multistage cluster?
• How large a sample is needed?
Nonprobability Sampling

Reasons to use
• Procedure satisfactorily meets the sampling
objectives
• Lower Cost
• Limited Time
• Not as much human error as selecting a
completely random sample
• Complete list of the population not available
Nonprobability Sampling

• Convenience Sampling
• Purposive Sampling
– Judgment Sampling
– Quota Sampling
• Snowball Sampling
