
Slides Prepared by

JOHN S. LOUCKS
St. Edwards University

2002 South-Western/Thomson Learning

Chapter 7
Sampling and Sampling Distributions

Simple Random Sampling
Point Estimation
Introduction to Sampling Distributions
Sampling Distribution of $\bar{x}$
Sampling Distribution of $\bar{p}$
Properties of Point Estimators
Other Sampling Methods

Statistical Inference

The purpose of statistical inference is to obtain
information about a population from
information contained in a sample.
A population is the set of all the elements of
interest.
A sample is a subset of the population.
The sample results provide only estimates of
the values of the population characteristics.
A parameter is a numerical characteristic of a
population.
With proper sampling methods, the sample
results will provide good estimates of the
population characteristics.

Simple Random Sampling

Finite Population
A simple random sample from a finite
population of size N is a sample selected
such that each possible sample of size n has
the same probability of being selected.
Replacing each sampled element before
selecting subsequent elements is called
sampling with replacement.
Sampling without replacement is the
procedure used most often.
In large sampling projects, computer-generated random numbers are often used
to automate the sample selection process.
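
A minimal Python sketch of automated selection without replacement follows. The function name `simple_random_sample`, the seed, and the sizes N = 900 and n = 30 are illustrative assumptions, not part of the slides.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Return element numbers (1..N) drawn without replacement."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so every possible
    # sample of the requested size is equally likely.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Hypothetical usage: 30 elements out of a population of 900.
sample = simple_random_sample(900, 30, seed=1)
print(sample)
```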

Simple Random Sampling

Infinite Population
A simple random sample from an infinite
population is a sample selected such that
the following conditions are satisfied.
Each element selected comes from the
same population.
Each element is selected independently.
The population is usually considered infinite
if it involves an ongoing process that makes
listing or counting every element
impossible.
The random number selection procedure
cannot be used for infinite populations.

Point Estimation

In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.
We refer to $\bar{x}$ as the point estimator of the population mean $\mu$.
$s$ is the point estimator of the population standard deviation $\sigma$.
$\bar{p}$ is the point estimator of the population proportion $p$.

Sampling Error

The absolute difference between an unbiased point estimate and the corresponding population parameter is called the sampling error.
Sampling error is the result of using a subset of the population (the sample), and not the entire population, to develop estimates.
The sampling errors are:
$|\bar{x} - \mu|$ for sample mean
$|s - \sigma|$ for sample standard deviation
$|\bar{p} - p|$ for sample proportion

Example: St. Andrews


St. Andrews University receives 900 applications annually from prospective students. The application forms contain a variety of information including the individual's scholastic aptitude test (SAT) score and whether or not the individual desires on-campus housing.

Example: St. Andrews


The director of admissions would like to know the following information:
the average SAT score for the applicants, and
the proportion of applicants that want to live on campus.
We will now look at three alternatives for obtaining the desired information.
Conducting a census of the entire 900 applicants
Selecting a sample of 30 applicants, using a random number table
Selecting a sample of 30 applicants, using computer-generated random numbers

Example: St. Andrews

Taking a Census of the 900 Applicants

SAT Scores
Population Mean
$\mu = \frac{\sum x_i}{900} = 990$
Population Standard Deviation
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 80$

Applicants Wanting On-Campus Housing
Population Proportion
$p = \frac{648}{900} = .72$

Example: St. Andrews

Take a Sample of 30 Applicants Using a Random Number Table
Since the finite population has 900 elements, we will need 3-digit random numbers to randomly select applicants numbered from 1 to 900.
We will use the last three digits of the 5-digit random numbers in the third column of a random number table. The numbers we draw will be the numbers of the applicants we will sample unless
the random number is greater than 900, or
the random number has already been used.
We will continue to draw random numbers until we have selected 30 applicants.
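
The skip rules above can be sketched in Python. This is an illustration under stated assumptions: a pseudo-random 5-digit integer stands in for each entry of a printed random number table, and the function name `draw_with_table_rule` is hypothetical. The sketch also skips 000, since applicants are numbered from 1.

```python
import random

def draw_with_table_rule(population_size, sample_size, seed=None):
    """Select applicant numbers using the last three digits of 5-digit numbers."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        five_digit = rng.randrange(100000)  # stand-in for one table entry
        number = five_digit % 1000          # keep the last three digits
        if number == 0 or number > population_size:
            continue                        # number exceeds 900 (or is 000)
        if number in chosen:
            continue                        # number already used
        chosen.append(number)
    return chosen

picked = draw_with_table_rule(900, 30, seed=7)
print(picked)
```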

Example: St. Andrews

Use of Random Numbers for Sampling

3-Digit Random Number    Applicant Included in Sample
744                      No. 744
436                      No. 436
865                      No. 865
790                      No. 790
835                      No. 835
902                      Number exceeds 900
190                      No. 190
436                      Number already used
etc.                     etc.

Example: St. Andrews

Sample Data

No.   Random Number   Applicant        SAT Score   On-Campus Housing
1     744             Connie Reyman    1025        Yes
2     436             William Fox      950         Yes
3     865             Fabian Avante    1090        No
4     790             Eric Paxton      1120        Yes
5     835             Winona Wheeler   1015        No
.     .               .                .           .
30    685             Kevin Cossack    965         No

Example: St. Andrews

Take a Sample of 30 Applicants Using Computer-Generated Random Numbers
Excel provides a function for generating random numbers in its worksheet.
900 random numbers are generated, one for each applicant in the population.
Then we choose the 30 applicants corresponding to the 30 smallest random numbers as our sample.
Each of the 900 applicants has the same probability of being included.
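
The same idea can be sketched outside Excel. In this Python illustration, `random.random()` plays the role of the worksheet's `=RAND()`; the function name and the seed are assumptions made for the example.

```python
import random

def sample_by_smallest_random_numbers(population_size, sample_size, seed=None):
    """Attach one random number to each applicant and keep the smallest ones."""
    rng = random.Random(seed)
    # One (random number, applicant number) pair per applicant.
    keyed = [(rng.random(), applicant)
             for applicant in range(1, population_size + 1)]
    keyed.sort()  # ascending by random number, like sorting the worksheet
    return [applicant for _, applicant in keyed[:sample_size]]

excel_style_sample = sample_by_smallest_random_numbers(900, 30, seed=3)
print(excel_style_sample)
```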

Using Excel to Select a Simple Random Sample

Formula Worksheet

      A                  B           C                   D
1     Applicant Number   SAT Score   On-Campus Housing   Random Number
2     1                  1008        Yes                 =RAND()
3     2                  1025        No                  =RAND()
4     3                  952         Yes                 =RAND()
5     4                  1090        Yes                 =RAND()
6     5                  1127        Yes                 =RAND()
7     6                  1015        No                  =RAND()
8     7                  965         Yes                 =RAND()
9     8                  1161        No                  =RAND()

Note: Rows 10-901 are not shown.

Using Excel to Select a Simple Random Sample

Value Worksheet

      A                  B           C                   D
1     Applicant Number   SAT Score   On-Campus Housing   Random Number
2     1                  1008        Yes                 0.41327
3     2                  1025        No                  0.79514
4     3                  952         Yes                 0.66237
5     4                  1090        Yes                 0.00234
6     5                  1127        Yes                 0.71205
7     6                  1015        No                  0.18037
8     7                  965         Yes                 0.71607
9     8                  1161        No                  0.90512

Note: Rows 10-901 are not shown.

Using Excel to Select a Simple Random Sample

Value Worksheet (Sorted)

      A                  B           C                   D
1     Applicant Number   SAT Score   On-Campus Housing   Random Number
2     12                 1107        No                  0.00027
3     773                1043        Yes                 0.00192
4     408                991         Yes                 0.00303
5     58                 1008        No                  0.00481
6     116                1127        Yes                 0.00538
7     185                982         Yes                 0.00583
8     510                1163        Yes                 0.00649
9     394                1008        No                  0.00667

Note: Rows 10-901 are not shown.

Example: St. Andrews

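Point Estimates

$\bar{x}$ as Point Estimator of $\mu$
$\bar{x} = \frac{\sum x_i}{30} = \frac{29,910}{30} = 997$

$s$ as Point Estimator of $\sigma$
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{29}} = \sqrt{\frac{163,996}{29}} = 75.2$

$\bar{p}$ as Point Estimator of $p$
$\bar{p} = \frac{20}{30} = .67$

Note: Different random numbers would have identified a different sample, which would have resulted in different point estimates.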
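
As a check on the arithmetic, here is a short Python sketch that recomputes the three point estimates from the totals quoted on this slide (29,910; 163,996; 20 of 30) and compares them with the census values (990, 80, .72). The variable names are illustrative.

```python
import math

n = 30
sum_x = 29910          # sum of the 30 sampled SAT scores
sum_sq_dev = 163996    # sum of squared deviations from the sample mean
yes_count = 20         # sampled applicants wanting on-campus housing

x_bar = sum_x / n                      # point estimate of the mean
s = math.sqrt(sum_sq_dev / (n - 1))    # point estimate of the std. deviation
p_bar = yes_count / n                  # point estimate of the proportion

# Census values, used to compute the sampling errors.
mu, sigma, p = 990, 80, 0.72
sampling_errors = {
    "mean": abs(x_bar - mu),
    "std_dev": abs(s - sigma),
    "proportion": abs(p_bar - p),
}
print(x_bar, round(s, 1), round(p_bar, 3))
print(sampling_errors)
```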

Sampling Distribution of $\bar{x}$

Process of Statistical Inference

Population with mean $\mu$ = ?
A simple random sample of n elements is selected from the population.
The sample data provide a value for the sample mean $\bar{x}$.
The value of $\bar{x}$ is used to make inferences about the value of $\mu$.

Sampling Distribution of $\bar{x}$

The sampling distribution of $\bar{x}$ is the probability distribution of all possible values of the sample mean $\bar{x}$.
Expected Value of $\bar{x}$
$E(\bar{x}) = \mu$
where:
$\mu$ = the population mean

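Sampling Distribution of $\bar{x}$

Standard Deviation of $\bar{x}$
Finite Population
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
Infinite Population
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

A finite population is treated as being infinite if n/N < .05.
$\sqrt{(N - n)/(N - 1)}$ is the finite correction factor.
$\sigma_{\bar{x}}$ is referred to as the standard error of the mean.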
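
A minimal Python sketch of the two standard error formulas, applying the finite correction factor only when n/N is not below .05. The function name `standard_error_mean` and the second call (n = 90) are hypothetical illustrations.

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    N=None means an infinite population. For a finite population the
    correction factor is applied only when n/N >= .05.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N >= 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# n/N = 30/900 is below .05, so the population is treated as infinite.
se_small_fraction = standard_error_mean(80, 30, N=900)
# Hypothetical larger sample: n/N = 90/900 = .10, so the factor applies.
se_large_fraction = standard_error_mean(80, 90, N=900)
print(round(se_small_fraction, 1), round(se_large_fraction, 2))
```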

Sampling Distribution of $\bar{x}$

If we use a large (n ≥ 30) simple random sample, the central limit theorem enables us to conclude that the sampling distribution of $\bar{x}$ can be approximated by a normal probability distribution.
When the simple random sample is small (n < 30), the sampling distribution of $\bar{x}$ can be considered normal only if we assume the population has a normal probability distribution.
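
The central limit theorem can be illustrated with a small simulation. The sketch below assumes a skewed (exponential) population with mean 1 and standard deviation 1, which is not from the slides, and draws 5,000 samples of size 30. The sample means should center near 1 with a spread near 1/√30 ≈ 0.18, and roughly two-thirds should fall within one standard error, as a normal curve would suggest.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable
n = 30                    # sample size
replications = 5000       # number of simulated samples

# Each entry is the mean of one sample of size n from an exponential population.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
# Share of sample means within one standard deviation of their center.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / replications

print(round(mean_of_means, 3), round(sd_of_means, 3), round(within_one_sd, 3))
```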

Example: St. Andrews

Sampling Distribution of $\bar{x}$ for the SAT Scores

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{30}} = 14.6$
$E(\bar{x}) = 990$

Example: St. Andrews

Sampling Distribution of $\bar{x}$ for the SAT Scores
What is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within plus or minus 10 of the actual population mean $\mu$?
In other words, what is the probability that $\bar{x}$ will be between 980 and 1000?

Example: St. Andrews

Sampling Distribution of $\bar{x}$ for the SAT Scores

[Figure: sampling distribution of $\bar{x}$ centered at 990; area = .2518 between 980 and 990, and area = .2518 between 990 and 1000]

Using the standard normal probability table with z = 10/14.6 = .68, we have area = (.2518)(2) = .5036.
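
The table lookup can be reproduced with Python's standard library. This is a sketch, not the slides' method: `statistics.NormalDist` replaces the printed table, so the unrounded answer (about .506) differs slightly from the table-based .5036, which rounds z to .68 first.

```python
import math
from statistics import NormalDist

mu, sigma, n = 990, 80, 30
se = sigma / math.sqrt(n)                 # standard error of the mean

# Probability that the sample mean falls between 980 and 1000.
sampling_dist = NormalDist(mu=mu, sigma=se)
prob = sampling_dist.cdf(1000) - sampling_dist.cdf(980)

# Table-style calculation: round z to two decimals before the lookup.
z_rounded = round(10 / 14.6, 2)
table_prob = 2 * (NormalDist().cdf(z_rounded) - 0.5)

print(round(se, 1), round(prob, 4), z_rounded, round(table_prob, 4))
```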

Sampling Distribution of $\bar{p}$

The sampling distribution of $\bar{p}$ is the probability distribution of all possible values of the sample proportion $\bar{p}$.
Expected Value of $\bar{p}$
$E(\bar{p}) = p$
where:
p = the population proportion

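Sampling Distribution of $\bar{p}$

Standard Deviation of $\bar{p}$
Finite Population
$\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}}$
Infinite Population
$\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}}$

$\sigma_{\bar{p}}$ is referred to as the standard error of the proportion.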

Example: St. Andrews

Sampling Distribution of $\bar{p}$ for Applicants Wanting On-Campus Housing

$\sigma_{\bar{p}} = \sqrt{\frac{.72(1 - .72)}{30}} = .082$
$E(\bar{p}) = .72$

The normal probability distribution is an acceptable approximation since np = 30(.72) = 21.6 > 5 and n(1 - p) = 30(.28) = 8.4 > 5.

Example: St. Andrews

Sampling Distribution of $\bar{p}$ for Applicants Wanting On-Campus Housing
What is the probability that a simple random sample of 30 applicants will provide an estimate of the population proportion of applicants desiring on-campus housing that is within plus or minus .05 of the actual population proportion?
In other words, what is the probability that $\bar{p}$ will be between .67 and .77?

Example: St. Andrews

Sampling Distribution of $\bar{p}$ for Applicants Wanting On-Campus Housing

[Figure: sampling distribution of $\bar{p}$ centered at 0.72; area = .2291 between 0.67 and 0.72, and area = .2291 between 0.72 and 0.77]

For z = .05/.082 = .61, the area = (.2291)(2) = .4582.
The probability is .4582 that the sample proportion will be within +/-.05 of the actual population proportion.
Properties of Point Estimators

Before using a sample statistic as a point estimator, statisticians check to see whether the sample statistic has the following properties associated with good point estimators.
Unbiasedness
Efficiency
Consistency

Properties of Point Estimators

Unbiasedness
If the expected value of the sample statistic
is equal to the population parameter being
estimated, the sample statistic is said to be an
unbiased estimator of the population
parameter.
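
Unbiasedness can be verified exactly on a toy case. The sketch below assumes a hypothetical three-element population and independent draws of size 2 (sampling with replacement), lists every equally likely sample, and averages each statistic. The sample mean and the sample variance with divisor n - 1 reproduce the population values, while the divisor-n version does not.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 9]   # hypothetical toy population
n = 2                    # sample size

mu = Fraction(sum(population), len(population))
sigma_sq = sum((x - mu) ** 2 for x in population) / len(population)

# Every equally likely ordered sample of size n drawn independently.
samples = list(product(population, repeat=n))

def mean(values):
    return Fraction(sum(values), len(values))

def sum_sq_dev(sample):
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample)

expected_x_bar = mean([mean(s) for s in samples])
expected_s_sq = mean([sum_sq_dev(s) / (n - 1) for s in samples])
expected_biased = mean([sum_sq_dev(s) / n for s in samples])

print(expected_x_bar, mu)           # equal: sample mean is unbiased
print(expected_s_sq, sigma_sq)      # equal: divisor n - 1 is unbiased
print(expected_biased)              # smaller than sigma_sq: divisor n is biased
```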


Properties of Point Estimators

Efficiency
Given the choice of two unbiased
estimators of the same population parameter,
we would prefer to use the point estimator
with the smaller standard deviation, since it
tends to provide estimates closer to the
population parameter.
The point estimator with the smaller
standard deviation is said to have greater
relative efficiency than the other.


Properties of Point Estimators

Consistency
A point estimator is consistent if the values
of the point estimator tend to become closer to
the population parameter as the sample size
becomes larger.


Other Sampling Methods

Stratified Random Sampling
Cluster Sampling
Systematic Sampling
Convenience Sampling
Judgment Sampling

Stratified Random Sampling

The population is first divided into groups of elements called strata.
Each element in the population belongs to one and only one stratum.
Best results are obtained when the elements within each stratum are as much alike as possible (i.e., a homogeneous group).
A simple random sample is taken from each stratum.
Formulas are available for combining the stratum sample results into one population parameter estimate.

Stratified Random Sampling

Advantage: If strata are homogeneous, this method is as precise as simple random sampling but with a smaller total sample size.
Example: The basis for forming the strata might be department, location, age, industry type, etc.
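
The selection step of stratified random sampling can be sketched in Python. The strata names, their sizes, and the per-stratum sample sizes below are hypothetical, and the sketch covers only drawing the samples, not the formulas for combining stratum results.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take a simple random sample (without replacement) from each stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, per_stratum[name])
        for name, members in strata.items()
    }

# Hypothetical strata: each element belongs to exactly one stratum.
strata = {
    "location_a": list(range(1, 601)),     # elements 1..600
    "location_b": list(range(601, 901)),   # elements 601..900
}
per_stratum = {"location_a": 20, "location_b": 10}

stratified = stratified_sample(strata, per_stratum, seed=5)
print(stratified)
```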

Cluster Sampling

The population is first divided into separate groups of elements called clusters.
Ideally, each cluster is a representative small-scale version of the population (i.e., a heterogeneous group).
A simple random sample of the clusters is then taken.
All elements within each sampled (chosen) cluster form the sample.

Cluster Sampling

Advantage: The close proximity of elements can be cost effective (i.e., many sample observations can be obtained in a short time).
Disadvantage: This method generally requires a larger total sample size than simple or stratified random sampling.
Example: A primary application is area sampling, where clusters are city blocks or other well-defined areas.

Systematic Sampling

If a sample size of n is desired from a population containing N elements, we might sample one element for every N/n elements in the population.
We randomly select one of the first N/n elements from the population list.
We then select every (N/n)th element that follows in the population list.
This method has the properties of a simple random sample, especially if the list of the population elements is a random ordering.

Systematic Sampling

Advantage: The sample usually will be easier to identify than it would be if simple random sampling were used.
Example: Selecting every 100th listing in a telephone book after the first randomly selected listing.
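
A minimal Python sketch of systematic sampling, assuming for simplicity that N is a multiple of n so that the interval N/n is a whole number. The function name and the sizes N = 900, n = 30 are illustrative.

```python
import random

def systematic_sample(N, n, seed=None):
    """Pick a random start in the first N/n elements, then every (N/n)th one."""
    rng = random.Random(seed)
    k = N // n                    # sampling interval (assumes N divisible by n)
    start = rng.randint(1, k)     # random position among the first k elements
    return [start + i * k for i in range(n)]

positions = systematic_sample(900, 30, seed=11)
print(positions)
```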

Convenience Sampling

It is a nonprobability sampling technique.


Items are included in the sample without
known probabilities of being selected.
The sample is identified primarily by
convenience.
Advantage: Sample selection and data
collection are relatively easy.
Disadvantage: It is impossible to determine
how representative of the population the
sample is.
Example: A professor conducting research
might use student volunteers to constitute a
sample.

Judgment Sampling

The person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population.
It is a nonprobability sampling technique.
Advantage: It is a relatively easy way of
selecting a sample.
Disadvantage: The quality of the sample
results depends on the judgment of the person
selecting the sample.
Example: A reporter might sample three or
four senators, judging them as reflecting the
general opinion of the senate.


End of Chapter 7
