
Introduction to Probability and Statistics
Tenth Edition

Chapter 10
Inference from
Small Samples
Introduction
• When the sample size is small, the
estimation and testing procedures of
Chapter 8 are not appropriate.
• There are equivalent small sample test and
estimation procedures for
   • µ, the mean of a normal population
   • µ1 − µ2, the difference between two population means
   • σ², the variance of a normal population
   • σ1²/σ2², the ratio of two population variances
The Sampling Distribution
of the Sample Mean
• When we take a sample from a normal population, the sample mean x̄
has a normal distribution for any sample size n, and
$$ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$
has a standard normal distribution.
• But if σ is unknown, and we must use s to estimate it, the resulting
statistic
$$ \frac{\bar{x} - \mu}{s / \sqrt{n}} $$
is not normal.
Student’s t Distribution
• Fortunately, this statistic does have a sampling distribution that is
well known to statisticians, called the Student’s t distribution, with
n-1 degrees of freedom.

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$
•We can use this distribution to create
estimation and testing procedures for the population
mean µ.
Properties of Student’s t
•Mound-shaped and
symmetric about 0.
•More variable than z,
with “heavier tails”

• Shape depends on the sample size n or the
degrees of freedom, n − 1.
• As n increases the shapes of the t and z
distributions become almost identical.
Using the t-Table
• Table 4 gives the values of t that cut off certain
areas in the tail of the t distribution.
• Index df and the appropriate tail area a to find
t_a, the value of t with area a to its right.
For a random sample of size n =
10, find a value of t that cuts
off .025 in the right tail.
Row = df = n –1 = 9
Column subscript = a = .025
t.025 = 2.262
Small Sample Inference
for a Population Mean µ
• The basic procedures are the same as those used
for large samples. For a test of hypothesis:

Test H0: µ = µ0 versus Ha: one- or two-tailed alternative
using the test statistic
$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$
using p-values or a rejection region based on
a t distribution with df = n − 1.
Small Sample Inference
for a Population Mean µ
• For a 100(1−α)% confidence interval
for the population mean µ:
$$ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} $$
where t_{α/2} is the value of t that cuts off area α/2
in the tail of a t distribution with df = n − 1.
Example
A sprinkler system is designed so that the average
time for the sprinklers to activate after being
turned on is no more than 15 seconds. A test of 6
systems gave the following times:
17, 31, 12, 17, 13, 25
Is the system working as specified? Test using
α = .05.
H 0 : µ = 15 (working as specified)
H a : µ > 15 (not working as specified)
Example
First, calculate the sample mean and standard deviation,
using your calculator or the formulas in Chapter 2.

$$ \bar{x} = \frac{\sum x_i}{n} = \frac{115}{6} = 19.167 $$
$$ s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{2477 - \frac{115^2}{6}}{5}} = 7.387 $$
Example
Calculate the test statistic and find the rejection region for
α =.05.

Test statistic:
$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{19.167 - 15}{7.387 / \sqrt{6}} = 1.38 $$
Degrees of freedom: df = n − 1 = 6 − 1 = 5

Rejection Region: Reject H0 if t > 2.015. If the test statistic falls in
the rejection region, its p-value will be less than α = .05.
Conclusion
Compare the observed test statistic to the rejection region,
and draw conclusions.

H0: µ = 15
Ha: µ > 15
Test statistic: t = 1.38
Rejection Region: Reject H0 if t > 2.015.

Conclusion: For our example, t = 1.38 does not fall in the
rejection region and H0 is not rejected. There is insufficient
evidence to indicate that the average activation time is greater
than 15.
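The sprinkler calculation can be checked numerically. The following is a minimal Python sketch using only the standard library; the function name `one_sample_t` is hypothetical (not from the text), and the critical value 2.015 is copied from the slides rather than computed.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (mean, s, t) for a one-sample t test of H0: mu = mu0."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)               # sample std dev (divides by n - 1)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return x_bar, s, t

times = [17, 31, 12, 17, 13, 25]             # activation times from the example
x_bar, s, t = one_sample_t(times, mu0=15)
t_crit = 2.015                               # t.05 with df = 5, as quoted in the text

print(f"mean = {x_bar:.3f}, s = {s:.3f}, t = {t:.2f}")
print("reject H0" if t > t_crit else "do not reject H0")
```

Because the alternative is one-sided (µ > 15), only the upper critical value is compared.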
Approximating the
p-value
• You can only approximate the p-value
for the test using Table 4.

Since the observed value of t = 1.38 is smaller
than t.10 = 1.476,
p-value > .10.
The exact p-value
• You can get the exact p-value
using some calculators or a computer.
p-value = .113 which
is greater than .10 as
we approximated
using Table 4.
One-Sample T: Times
Test of mu = 15 vs > 15

                                           95%
                                         Lower
Variable  N     Mean   StDev  SE Mean    Bound     T      P
Times     6  19.1667  7.3869   3.0157  13.0899  1.38  0.113
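If no statistical package is at hand, the upper-tail area can be approximated by integrating the t density directly. The sketch below is one possible way to do this in plain Python, assuming Simpson's rule is accurate enough here; `t_pdf` and `t_upper_tail` are illustrative names, not functions from any library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0: Simpson's rule on [0, t], then symmetry."""
    h = t / steps                            # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3               # half the area lies to the right of 0

# Test statistic rebuilt from the printed mean and standard deviation
t_obs = (19.1667 - 15) / (7.3869 / math.sqrt(6))
p_value = t_upper_tail(t_obs, df=5)
print(f"t = {t_obs:.2f}, p-value = {p_value:.3f}")
```

The result should agree with the printed P value to about three decimals; a dedicated statistics library would normally be preferred over hand-rolled integration.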
Testing the Difference
between Two Means
As in Chapter 9, independent random samples of size n1 and n2 are drawn
from populations 1 and 2 with means µ1 and µ2 and variances σ1² and σ2².

Since the sample sizes are small, the two populations must be normal.

•To test:
•H0: µ1 − µ2 = D0 versus Ha: one of three alternatives
where D0 is some hypothesized difference,
usually 0.
Testing the Difference
between Two Means
•The test statistic used in Chapter 9
$$ z \approx \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$
•does not have either a z or a t distribution, and
cannot be used for small-sample inference.
•We need to make one more assumption, that
the population variances, although unknown,
are equal.
Testing the Difference
between Two Means
•Instead of estimating each population variance
separately, we estimate the common variance
with
$$ s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$
•And the resulting test statistic,
$$ t = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$
has a t distribution with n1 + n2 − 2 degrees of freedom.
Estimating the Difference
between Two Means
•You can also create a 100(1−α)% confidence
interval for µ1 − µ2:
$$ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} $$
with
$$ s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$
Remember the three assumptions:
1. Original populations normal
2. Samples random and independent
3. Equal population variances.
Example
• Two training procedures are compared by
measuring the time that it takes trainees to
assemble a device. A different group of trainees is
taught using each method. Is there a difference in the
two methods? Use α = .01.

Time to Assemble   Method 1   Method 2
Sample size           10         12
Sample mean           35         31
Sample Std Dev        4.9        4.5

H0: µ1 − µ2 = 0
Ha: µ1 − µ2 ≠ 0
Test statistic:
$$ t = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$
Example
• Solve this problem by approximating the p-value using Table 4.

Time to Assemble   Method 1   Method 2
Sample size           10         12
Sample mean           35         31
Sample Std Dev        4.9        4.5

Calculate:
$$ s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{9(4.9^2) + 11(4.5^2)}{20} = 21.942 $$
Test statistic:
$$ t = \frac{35 - 31}{\sqrt{21.942\left(\frac{1}{10} + \frac{1}{12}\right)}} = 1.99 $$
Example
p-value: P(t > 1.99) + P(t < −1.99)
P(t > 1.99) = ½(p-value)
df = n1 + n2 − 2 = 10 + 12 − 2 = 20
.025 < ½(p-value) < .05
.05 < p-value < .10
Since the p-value is greater than α = .01, H0
is not rejected. There is insufficient evidence to
indicate a difference in the population means.
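The pooled-variance calculation for the training example can be reproduced in a few lines. This is a sketch under stated assumptions: `pooled_t` is a hypothetical helper, and the two bracketing values 1.725 and 2.086 are the usual t.05 and t.025 table entries for df = 20, which the slides imply but do not print.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2, d0=0.0):
    """Return (pooled variance, t, df) for the equal-variance two-sample t test."""
    df = n1 + n2 - 2
    s2_pooled = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    t = (mean1 - mean2 - d0) / math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
    return s2_pooled, t, df

s2_pooled, t, df = pooled_t(10, 35, 4.9, 12, 31, 4.5)

# Assumed table values for df = 20 (not printed in the slides)
t_05, t_025 = 1.725, 2.086
between = t_05 < t < t_025                   # True means .05 < two-tailed p-value < .10

print(f"s^2 = {s2_pooled:.3f}, t = {t:.2f}, df = {df}")
print(".05 < p-value < .10" if between else "p-value outside (.05, .10)")
```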
Testing the Difference
between Two Means
•How can you tell if the equal variance
assumption is reasonable?
Rule of Thumb:
If the ratio
$$ \frac{\text{larger } s^2}{\text{smaller } s^2} \le 3, $$
the equal variance assumption is reasonable.
If the ratio
$$ \frac{\text{larger } s^2}{\text{smaller } s^2} > 3, $$
use an alternative test statistic.
Testing the Difference
between Two Means
•If the population variances cannot be assumed
equal, the test statistic
$$ t \approx \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$
•has an approximate t distribution with degrees
of freedom
$$ df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} $$
This is most easily
done by computer.
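Since the unequal-variance statistic is "most easily done by computer", a short sketch may help. The helper names `variance_ratio` and `welch_t` are hypothetical, and the summary statistics are borrowed from the training-procedure example purely for illustration (that example passes the rule of thumb, so the pooled test is the one the slides actually use for it).

```python
import math

def variance_ratio(sd1, sd2):
    """Larger sample variance divided by the smaller one (rule-of-thumb check)."""
    big, small = max(sd1, sd2) ** 2, min(sd1, sd2) ** 2
    return big / small

def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Return (t, approximate df) for the unequal-variance two-sample statistic."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2        # s1^2/n1 and s2^2/n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

ratio = variance_ratio(4.9, 4.5)
t, df = welch_t(10, 35, 4.9, 12, 31, 4.5)
print(f"variance ratio = {ratio:.2f} (equal variances reasonable: {ratio <= 3})")
print(f"t = {t:.3f}, approximate df = {df:.1f}")
```

The approximate df is generally not an integer; when using a printed table it is commonly rounded down to the next whole number.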
The Paired-Difference
Test
•Sometimes the assumption of independent
samples is intentionally violated, resulting in a
matched-pairs or paired-difference test.
•By designing the experiment in this way, we can
eliminate unwanted variability in the experiment
by analyzing only the differences,
$$ d_i = x_{1i} - x_{2i} $$
•to see if there is a difference in the two
population means, µ 1−µ 2.
Example
Car 1 2 3 4 5
Type A 10.6 9.8 12.3 9.7 8.8
Type B 10.2 9.4 11.8 9.1 8.3

• One Type A and one Type B tire are randomly assigned
to each of the rear wheels of five cars. Compare the
average tire wear for types A and B using a test of
hypothesis.
H0: µ1 − µ2 = 0
Ha: µ1 − µ2 ≠ 0
• But the samples are not independent. The pairs of
responses are linked because measurements are taken on the
same car.
The Paired-Difference
Test
To test H0: µ1 − µ2 = 0 we test H0: µd = 0
using the test statistic
$$ t = \frac{\bar{d} - 0}{s_d / \sqrt{n}} $$
where n = number of pairs, and d̄ and s_d are the
mean and standard deviation of the differences, d_i.
Use the p-value or a rejection region based on
a t distribution with df = n − 1.
Example
Car 1 2 3 4 5
Type A 10.6 9.8 12.3 9.7 8.8
Type B 10.2 9.4 11.8 9.1 8.3
Difference .4 .4 .5 .6 .5

H0: µ1 − µ2 = 0
Ha: µ1 − µ2 ≠ 0
Calculate:
$$ \bar{d} = \frac{\sum d_i}{n} = .48 $$
$$ s_d = \sqrt{\frac{\sum d_i^2 - \frac{(\sum d_i)^2}{n}}{n - 1}} = .0837 $$
Test statistic:
$$ t = \frac{\bar{d} - 0}{s_d / \sqrt{n}} = \frac{.48 - 0}{.0837 / \sqrt{5}} = 12.8 $$
Example
Car 1 2 3 4 5
Type A 10.6 9.8 12.3 9.7 8.8
Type B 10.2 9.4 11.8 9.1 8.3
Difference .4 .4 .5 .6 .5

Rejection region: Reject H0 if t
> 2.776 or t < −2.776.
Conclusion: Since t = 12.8, H0
is rejected. There is a
difference in the average tire
wear for the two types of tires.
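The paired analysis reduces to a one-sample t test on the differences, which the following minimal sketch illustrates with the tire data. Variable names are illustrative, and the critical value 2.776 is taken from the slides rather than computed.

```python
import math
import statistics

type_a = [10.6, 9.8, 12.3, 9.7, 8.8]
type_b = [10.2, 9.4, 11.8, 9.1, 8.3]

# Work only with the within-car differences d_i = x_1i - x_2i
diffs = [a - b for a, b in zip(type_a, type_b)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)                # sample std dev of the differences
t = (d_bar - 0) / (s_d / math.sqrt(n))

t_crit = 2.776                               # t.025 with df = 4, as quoted in the text
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.4f}, t = {t:.1f}")
print("reject H0" if abs(t) > t_crit else "do not reject H0")
```

Note how small s_d is compared with the spread of the raw wear measurements: pairing removes the car-to-car variability, which is why the test statistic is so large.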
Some Notes
•You can construct a 100(1−α)% confidence
interval for a paired experiment using
$$ \bar{d} \pm t_{\alpha/2} \frac{s_d}{\sqrt{n}} $$
•Once you have designed the experiment by
pairing, you MUST analyze it as a paired
experiment. If the experiment is not designed as a
paired experiment in advance, do not use this
procedure.
Inference Concerning
a Population Variance
•Sometimes the primary parameter of interest
is not the population mean µ but rather the
population variance σ 2. We choose a random
sample of size n from a normal distribution.
•The sample variance s² can be used in its
standardized form:
$$ \chi^2 = \frac{(n - 1)s^2}{\sigma^2} $$
•which has a chi-square distribution with n − 1
degrees of freedom.
Inference Concerning
a Population Variance
•Table 5 gives both upper and lower critical
values of the chi-square statistic for a given df.

For example, the value of
chi-square that cuts off .05
in the upper tail of the
distribution with df = 5 is
χ² = 11.07.
Inference Concerning
a Population Variance
To test H0: σ² = σ0² versus Ha: one- or two-tailed alternative
we use the test statistic
$$ \chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} $$
with a rejection region based on
a chi-square distribution with df = n − 1.

Confidence interval:
$$ \frac{(n - 1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{(1-\alpha/2)}} $$
Example
•A cement manufacturer claims that his cement
has a compressive strength with a standard
deviation of 10 kg/cm² or less. A sample of n =
10 measurements produced a mean and standard
deviation of 312 and 13.96, respectively.
A test of hypothesis:
H0: σ² = 100, that is, σ = 10 (claim is correct)
Ha: σ² > 100, that is, σ > 10 (claim is wrong)
uses the test statistic:
$$ \chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{9(13.96^2)}{10^2} = 17.5 $$
Example
•Do these data produce sufficient evidence to
reject the manufacturer’s claim? Use α = .05.
Rejection region: Reject H0 if
χ² > 16.919 (α = .05).
Conclusion: Since χ² = 17.5, H0 is rejected. The
standard deviation of the
cement strengths is more
than 10.
Approximating the
p-value
p-value: P(χ² > 17.5) with df = n − 1 = 9
.025 < p-value < .05
Since the p-value is less than α = .05, H0 is
rejected. There is sufficient evidence to
reject the manufacturer's claim.
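The cement calculation is short enough to verify directly. In the sketch below, `chi_square_statistic` is a hypothetical helper; 16.919 is the critical value quoted in the slides, while 19.023 is the usual table entry for area .025 with df = 9, assumed here because the slides imply it without printing it.

```python
def chi_square_statistic(n, s, sigma0):
    """Chi-square statistic (n - 1) s^2 / sigma0^2 for a test about a variance."""
    return (n - 1) * s**2 / sigma0**2

chi2 = chi_square_statistic(n=10, s=13.96, sigma0=10)

chi2_05 = 16.919                             # upper .05 point, df = 9 (from the text)
chi2_025 = 19.023                            # upper .025 point, df = 9 (assumed table value)

print(f"chi-square = {chi2:.1f}")
print("reject H0" if chi2 > chi2_05 else "do not reject H0")
if chi2_05 < chi2 < chi2_025:
    print(".025 < p-value < .05")
```

The hypothesized value enters as σ0² = 10² = 100, since the claim is stated in terms of the standard deviation.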
Inference Concerning
Two Population Variances
•We can make inferences about two population
variances in the form of a ratio, σ1²/σ2².
We choose two independent random samples
of size n1 and n2 from normal distributions.
•If the two population variances are equal, the
statistic
$$ F = \frac{s_1^2}{s_2^2} $$
•has an F distribution with df1 = n1 − 1 and df2 =
n2 − 1 degrees of freedom.
Inference Concerning
Two Population Variances
•Table 6 gives only upper critical values of the
F statistic for a given pair of df1 and df2.

For example, the value of
F that cuts off .05 in the
upper tail of the
distribution with df1 = 5
and df2 = 8 is F = 3.69.
Inference Concerning
Two Population Variances
To test H0: σ1² = σ2² versus Ha: one- or two-tailed alternative
we use the test statistic
$$ F = \frac{s_1^2}{s_2^2} $$
where s1² is the larger of the two sample variances,
with a rejection region based on an F distribution with
df1 = n1 − 1 and df2 = n2 − 1.
Confidence interval:
$$ \frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{df_1, df_2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2} \, F_{df_2, df_1} $$
Example
•An experimenter has performed a lab
experiment using two groups of rats. He wants
to test H0: µ1 = µ2, but first he wants to
make sure that the population variances are
equal.

                 Standard (2)   Experimental (1)
Sample size          10               11
Sample mean          13.64            12.42
Sample Std Dev       2.3              5.8

Preliminary test:
H0: σ1² = σ2² versus Ha: σ1² ≠ σ2²
Example
                 Standard (2)   Experimental (1)
Sample size          10               11
Sample Std Dev       2.3              5.8

H0: σ1² = σ2²
Ha: σ1² ≠ σ2²
Test statistic:
$$ F = \frac{s_1^2}{s_2^2} = \frac{5.8^2}{2.3^2} = 6.36 $$

We designate the sample with the larger standard
deviation as sample 1, to force the test statistic
into the upper tail of the F distribution.
Example
H0: σ1² = σ2²
Ha: σ1² ≠ σ2²
Test statistic:
$$ F = \frac{s_1^2}{s_2^2} = \frac{5.8^2}{2.3^2} = 6.36 $$

The rejection region is two-tailed, with α = .05, but we only
need to find the upper critical value, which has α/2 = .025 to
its right.
From Table 6, with df1 = 10 and df2 = 9, we reject H0 if F >
3.96.
CONCLUSION: Reject H0. There is sufficient evidence to
indicate that the variances are unequal. Do not rely on the
assumption of equal variances for your t test!
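The F calculation for the rat experiment can be sketched as follows. `f_statistic` is a hypothetical helper that places the larger sample variance in the numerator, as the slides prescribe; the critical value 3.96 is copied from the text rather than computed.

```python
def f_statistic(sd_a, n_a, sd_b, n_b):
    """Return (F, df1, df2) with the larger sample variance in the numerator."""
    if sd_a >= sd_b:
        return sd_a**2 / sd_b**2, n_a - 1, n_b - 1
    return sd_b**2 / sd_a**2, n_b - 1, n_a - 1

# Standard group: n = 10, s = 2.3; experimental group: n = 11, s = 5.8
F, df1, df2 = f_statistic(2.3, 10, 5.8, 11)

f_crit = 3.96                                # F.025 with df1 = 10, df2 = 9 (from the text)
print(f"F = {F:.2f} with df1 = {df1}, df2 = {df2}")
print("reject H0: variances differ" if F > f_crit else "do not reject H0")
```

Swapping the argument order gives the same result, because the helper always relabels the more variable sample as sample 1.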
Key Concepts
I. Experimental Designs for Small Samples
1. Single random sample: The sampled population
must be normal.
2. Two independent random samples: Both sampled
populations must be normal.
a. Populations have a common variance σ 2.
b. Populations have different variances
3. Paired-difference or matched-pairs design: The
samples are not independent.
Key Concepts
II. Statistical Tests of Significance
1. Based on the t, F, and χ 2 distributions
2. Use the same procedure as in Chapter 9
3. Rejection region — critical values and significance levels:
based on the t, F, and χ 2 distributions with the appropriate
degrees of freedom
4. Tests of population parameters: a single mean, the
difference between two means, a single variance, and the
ratio of two variances
III. Small Sample Test Statistics
To test one of the population parameters when the sample sizes
are small, use the following test statistics: