
STA2, iST2

FS 2017
Chapter 13: Samples and Surveys

Chapter_13_14.tns

Car of the Year

In any particular year millions of new cars are bought.

All the people who bought a new car in 2011
Results from the people who completed a survey

Question: How trustworthy is this result?

Populations and Samples

• The Population is the entire collection of interest.

• A Sample is a subset of the population.

Population: all the people who bought a new car.
Sample: the people who bought a car and completed the survey.

Question: How representative is the sample of the population?


Representative and Bias

• A sample is Representative if it reflects the same mix as in the entire population.
• Samples that distort the mix of the population are said to have a Bias.
• A Literary Bias: In 1936 The Literary Digest magazine sent out 10 million questionnaires to readers and potential readers; 2 million were returned. The magazine predicted that Landon would defeat Roosevelt.
• BUT Roosevelt defeated Landon in a landslide!
• Soon after, the magazine went out of existence.
Gallup Poll
Question: What went wrong?
• 1936 was the middle of the Great Depression.
• Not everyone was able to afford a telephone and a magazine subscription.
• Therefore the sample was biased towards wealthy people.
• Wealthy people tended to vote Republican.
Question: What went right?
• George Gallup correctly predicted the election even with a much smaller sample.
• He also correctly predicted the results of the Digest's poll!
• Gallup is still going strong today.
An Experiment in Randomness

• Representative samples are difficult to construct.

• The best method is to use a Random selection method.

• Choose randomly six numbers from the given 42, in any order, without replacement. Then choose randomly one lucky number out of 6.

Humans versus Machines!

Question: Did you choose at least two consecutive numbers?

The probability that at least two numbers are consecutive is 56%.

randSamp(seq(i,i,1,42),6,1)
randSamp(seq(i,i,1,42),6,1)
randSamp(seq(i,i,1,42),6,1)
randSamp(seq(i,i,1,42),6,1)

Conclusion: Humans are not good at being random!
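
The 56% figure can be checked directly. The sketch below is an illustration in Python rather than the calculator commands used on the slide; the function names are hypothetical. It counts the selections of 6 numbers from 42 that contain no two adjacent numbers (a standard counting result gives C(42 - 6 + 1, 6) of them) and then confirms the answer by simulation.

```python
import random
from math import comb


def prob_consecutive_exact(pool: int = 42, picks: int = 6) -> float:
    """Exact probability that a draw contains at least two consecutive numbers."""
    # Number of ways to choose `picks` values from 1..pool with no two adjacent.
    no_adjacent = comb(pool - picks + 1, picks)
    return 1 - no_adjacent / comb(pool, picks)


def prob_consecutive_simulated(pool: int = 42, picks: int = 6,
                               trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw without replacement, then look for neighbours differing by 1.
        draw = sorted(rng.sample(range(1, pool + 1), picks))
        if any(b - a == 1 for a, b in zip(draw, draw[1:])):
            hits += 1
    return hits / trials


print(f"exact:     {prob_consecutive_exact():.4f}")
print(f"simulated: {prob_consecutive_simulated():.4f}")
```

A machine-generated draw contains a consecutive pair more often than not, which is exactly what people tend to avoid when asked to pick "randomly".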

Randomization

Population → (Randomization) → Sample

Question: Do you trust that this sample is representative of the population?

Answer: It depends on the randomization.

Inferential Statistics

Given [image], what is inside [image]?

Inferential Statistics is the use of sample statistics to make inferences about population parameters.

Sample Sizes

• Let N be the population size: either unknown or else too large for each member to be reachable.

• Let n be the sample size; we assume that it is much smaller than N.

• A Surprising Property: Larger populations do not require larger samples!
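
One way to see why is the finite population correction, a standard result that the slide does not state: for an SRS drawn without replacement, the standard error of the sample mean is (σ/√n)·√((N − n)/(N − 1)). The sketch below uses hypothetical values for σ and n and shows that, once N is much larger than n, changing N hardly changes the precision.

```python
from math import sqrt


def standard_error(sigma: float, n: int, population_size: int) -> float:
    """Standard error of the mean for an SRS drawn without replacement."""
    # Finite population correction: close to 1 whenever n is small relative to N.
    fpc = sqrt((population_size - n) / (population_size - 1))
    return sigma / sqrt(n) * fpc


sigma, n = 4.0, 1000  # hypothetical values, chosen only for illustration
for N in (100_000, 1_000_000, 100_000_000):
    print(f"N = {N:>11,}: SE = {standard_error(sigma, n, N):.4f}")
```

The precision is driven by n, not by the fraction n/N of the population that is sampled.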

Simple Random Sample

• A Simple Random Sample (SRS) is obtained by a procedure that makes every sample of size n from the population equally likely.

• A Sampling Frame is a list of items from which to draw the sample. Ideally this list would be the population.

• Given a sampling frame, we can create an SRS using random numbers generated from a computer. We demonstrate this with Volunteers.
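
As a rough Python counterpart to the calculator demonstration, the sketch below draws an SRS from a sampling frame. The frame is a made-up list of labels (the contents of the Volunteers file are not reproduced here), and `simple_random_sample` is a hypothetical helper built on the standard library's `random.sample`, which selects without replacement.

```python
import random

# Hypothetical sampling frame: 30 volunteer labels (illustrative only).
frame = [f"volunteer_{i:02d}" for i in range(1, 31)]


def simple_random_sample(frame: list, n: int, seed=None) -> list:
    """Draw n distinct items so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


sample = simple_random_sample(frame, 5, seed=2017)
print(sample)
```

Fixing the seed makes the draw reproducible; omitting it gives a fresh sample on each run.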

Identifying the Sample Frame

Obtaining a suitable sample frame is difficult. Consider an election poll:

What we HAVE: [image]
What we WANT: A list of all the people who WILL vote

The people who do vote tend not to form an SRS of the voters' list.
Hypothetical Populations

Some populations do not exist! Consider a farming experiment:

[Diagram: image + image ⇒ 10% BIGGER]

Question: If 300 such oranges exist, is this the population?

No! The population is the collection of all potential oranges to come.

Estimating Parameters

Population parameters: $\mu$, $\sigma^2$, $p$

Sample statistics: $\bar{x}$, $s^2$, $\hat{p}$

• Given $\bar{x}$, what can we say about $\mu$?

• Given $\hat{p}$, what can we say about $p$?

• In General: Given a sample statistic, what can we say about the population parameter?
Sampling Variation

• We want to know about some population parameter.

• We take a sample and measure the appropriate statistic.

• We select SRSs of size 5 from Similar Diamonds:

sample1:=randSamp(price,5,1)
mean(sample1)
sample2:=randSamp(price,5,1)
mean(sample2)

• Different samples lead to different values of this statistic.
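
The same experiment can be sketched in Python. The `price` list below is a made-up stand-in (the Similar Diamonds data is not reproduced here), so the printed numbers carry no meaning beyond showing that each SRS of size 5 gives a different mean.

```python
import random
from statistics import mean

# Hypothetical stand-in for the `price` list; not the Similar Diamonds data.
rng = random.Random(13)
price = [round(rng.uniform(500, 5000), 2) for _ in range(200)]

# Five SRSs of size 5, each drawn without replacement, each with its own mean.
sample_means = [mean(rng.sample(price, 5)) for _ in range(5)]
for k, m in enumerate(sample_means, start=1):
    print(f"sample {k}: mean = {m:.2f}")
print(f"population mean = {mean(price):.2f}")
```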

Sampling Distribution

• Hence the statistic can be viewed as a random variable.

• The Sampling Distribution is the probability distribution of this random variable.

• Understanding sampling distributions for particular statistics will help us make inferences about the appropriate population parameters.

We Know Nothing!
BUY: Data is stored: how much, what etc.

NO BUY: We know nothing!
How many did not buy?
Why did they not buy?

• Nothing suitable

• Wrong size/colour

• Prices too high

In such circumstances a shop can organise an exit survey.


Exit Surveys

• Every survey needs a clear objective.

• Identify the population and the parameters of interest.

Population: Customers who do not buy
Parameters: Proportions for each of the reasons

• The sampling frame doesn't exist, yet someone will need to interview a random selection of shoppers who do not buy.

• To be reliable, the nonresponses also need to be recorded!

Alternative Sampling Methods

• Simple Random Sample: Selects individuals randomly.
• Stratified Random Sample: Selects individuals randomly within subsets of similar items (strata).
• Cluster Sampling: Randomly selects whole groups (clusters), often defined geographically, rather than individuals.
• Census: Selects all individuals of the population.
• Voluntary Response: Selects individuals who volunteer to participate.
• Convenience Samples: Selects individuals who are readily available.
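
To make the stratified method concrete, here is a minimal Python sketch. The population of (customer id, region) pairs and the helper `stratified_sample` are hypothetical; the idea is simply to split the frame into strata and draw an SRS inside each one.

```python
import random
from collections import defaultdict


def stratified_sample(items, stratum_of, per_stratum, seed=None):
    """Draw an SRS of size per_stratum inside each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    # Sample without replacement within every stratum separately.
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}


# Hypothetical population: (customer id, region) pairs.
regions = ("north", "south", "west")
population = [(i, regions[i % 3]) for i in range(300)]

chosen = stratified_sample(population, stratum_of=lambda c: c[1],
                           per_stratum=4, seed=7)
for region, members in chosen.items():
    print(region, [cid for cid, _ in members])
```

Every region is guaranteed to appear in the sample, which a plain SRS of 12 customers would not guarantee.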
Checklist for Surveys

• Does the sampling frame match the population?

• What is the rate of nonresponse?

• How was the question worded?

• Did the interviewer affect the results?

• Does survivor bias affect the survey?

Sampling Distribution of the Mean.

GPS Chips

Random sampling
to test the process

HALT Testing

To test chips Highly Accelerated Life Tests are set up.

A HALT test has 15 stages. When a chip fails a test, the stage
of the test is noted down. If a chip survives all tests it gets a
score of 16.
HALT Scores

• HALT testing monitors the manufacturing process.

• If every chip were exactly the same, it would be easy to see when the process is malfunctioning.

BUT

There is variation among the HALT scores for the chips.

• Instead of testing single chips, random samples are tested.

• If the sample fails the test, then we want to conclude that there is something wrong with the process.

HALT Data, X

• When the process is known to be operating correctly, we have mean µ = 7 and standard deviation σ = 4.

• Data is collected over 21 days and stored in HALT.

• Let X be the random variable representing the HALT score for a chip.

• Is it enough to know that some chips are performing badly to conclude that the process is malfunctioning?
Sample Mean Data, $\bar{X}$

• The data shows that each day a sample of size 20 was HALT tested.
• Let $\bar{X}$ be the random variable representing the mean HALT score for these daily samples.

xbar:=seq(mean(iffn(day=i,halt,_)),i,1,21)

• It's expected that a single chip might score low, but it is unlikely that a whole sample scores badly.
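
For readers without the calculator, the per-day averaging can be sketched in Python. The `records` list below is invented purely to show the grouping step (three chips per day rather than 20, and not the HALT file's values); `xbar` plays the same role as the calculator list of that name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (day, HALT score) records; the real HALT file is not reproduced here.
records = [(1, 7), (1, 3), (1, 16), (2, 5), (2, 9), (2, 10), (3, 1), (3, 8), (3, 12)]

# Group the scores by the day on which the chip was tested.
scores_by_day = defaultdict(list)
for day, score in records:
    scores_by_day[day].append(score)

# One sample mean per day, in day order.
xbar = [mean(scores_by_day[day]) for day in sorted(scores_by_day)]
print(xbar)
```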
The Benefits of Averaging

1. mean(halt) = 6.94
   mean(xbar) = 6.94

The means of the two distributions are the same.

2. stdevsamp(halt) = 4.24
   stdevsamp(xbar) = 1.19

The standard deviation of $\bar{X}$ is smaller than the standard deviation of $X$.

3. Although the shape for $X$ was non-distinct, the shape of $\bar{X}$ is bell-shaped.
Normality

$X$ is Normal ⇒ $\bar{X}$ is normal

$X$ is Non-normal ⇒ (Central Limit Theorem, samples of a large enough size) ⇒ $\bar{X}$ is asymptotically normal
Sample Sizes

We will keep this simple:

• For symmetric distributions a sample size of 20-25 is sufficient.

• For skewed distributions sample sizes need to be somewhat larger.

Normal Models

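Let $X$ be a random variable with mean $\mu$ and standard deviation $\sigma$. A sample of size $n$ can be considered thus:

$$\bar{X} = \frac{1}{n}X_1 + \cdots + \frac{1}{n}X_n$$

If $n$ is large enough, then $\bar{X}$ will be normal such that:

• $E(\bar{X}) = \frac{1}{n}E(X_1) + \cdots + \frac{1}{n}E(X_n) = \frac{n \cdot \mu}{n} = \mu$

• $Var(\bar{X}) = \frac{1}{n^2}Var(X_1) + \cdots + \frac{1}{n^2}Var(X_n) = \frac{n \cdot \sigma^2}{n^2} = \frac{\sigma^2}{n}$

• $SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}$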
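
These results can be checked by simulation. The sketch below is a Python illustration under stated assumptions: the population is a hypothetical set of scores 1 to 16, each equally likely (not the HALT data), and the draws are made with replacement so that the observations are independent, which the variance step relies on.

```python
import random
from math import sqrt
from statistics import mean, pstdev, stdev

rng = random.Random(30)

# Hypothetical population of scores 1..16, each equally likely (not the HALT data).
population = list(range(1, 17))
mu, sigma = mean(population), pstdev(population)

n, repetitions = 20, 5000
# Independent draws (with replacement); one sample mean per repetition.
xbars = [mean(rng.choices(population, k=n)) for _ in range(repetitions)]

print(f"mu = {mu:.3f},  mean of sample means = {mean(xbars):.3f}")
print(f"sigma/sqrt(n) = {sigma / sqrt(n):.3f},  SD of sample means = {stdev(xbars):.3f}")
```

Both pairs of numbers should agree closely, even though the population itself is flat rather than bell-shaped.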

Standard Error of the Mean

$SD(\bar{X})$ measures the variation of the value of the sample mean from sample to sample. It is often called the Standard Error of the Mean and is also denoted $\sigma_{\bar{X}}$.

Hence $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.

1. If $\sigma$ can be decreased, then $\sigma_{\bar{X}}$ decreases.

2. If $n$ is increased, then $\sigma_{\bar{X}}$ decreases.
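
A short numeric sketch shows how slowly the second effect works. It uses σ = 4, the in-control HALT value quoted earlier; the sample sizes are arbitrary illustrations, and `standard_error` is a hypothetical helper name.

```python
from math import sqrt


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


sigma = 4.0  # the in-control HALT standard deviation quoted earlier
for n in (5, 20, 80, 320):
    print(f"n = {n:>3}: standard error = {standard_error(sigma, n):.3f}")
```

Because of the square root, the sample size has to be quadrupled to halve the standard error.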

Sampling Distribution

If $X \sim (\mu, \sigma^2)$, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

In file Sampling we have some raw data.

mean(data)   6.99
stDevPop(data)   4.04
samp1:=randSamp(data,20)   {13.2, 4.2, 1.9...
mean(samp1)   6.96
stDevSamp(samp1)   5.08
xbar:=seq(mean(randSamp(data,20)),i,1,500)   {7.3, 8.3,...
mean(xbar)   6.79
stDevSamp(xbar)   0.881
stDevPop(data)/√(20)   0.883