10 Hypothesis Testing
Hypothesis testing is used to investigate whether or not data are consistent with some theory, when that theory can be quantified through a particular value of a (population) parameter.
10.1 Four Basic Elements of a Hypothesis Test
1. Null hypothesis $H_0$
2. Alternative hypothesis $H_a$
3. Test statistic
4. Rejection region, RR (also called critical region)
1. The Null Hypothesis: denoted by $H_0$.
The null hypothesis $H_0$ is an assertion about the population (parameter). The purpose of hypothesis testing is to test the viability of the null hypothesis in the light of experimental data.
E.g. An experiment on a new antidepressant drug:
- Ten people suffering from depression were sampled and treated with the new drug, and the level of depression of all subjects was measured after 12 weeks, denoted by $Y_1, \ldots, Y_n$.
- We want to compare the mean depression level $\mu$ of the patients taking the drug with that of patients not taking any drug (known to be $\mu_0 = 6$, say).
The null hypothesis would be designated by the following symbols:
$H_0: \mu = 6$.
In this course, we will usually consider the simple null hypothesis; that is, there is only one possible value of the parameter under $H_0$.
2. The Alternative Hypothesis: denoted by $H_a$.
The alternative hypothesis $H_a$ describes values of the parameter other than those specified in $H_0$. It is usually the hypothesis that we seek to support based on the information contained in the sample data set.
In the previous example, if the researcher believes that the mean depression level for patients taking the new drug is
- smaller than $\mu_0 = 6$, i.e. the drug is effective in reducing the depression level, the researcher will use $H_a: \mu < 6$ (left-tailed);
- larger than $\mu_0 = 6$, i.e. the drug is not effective, the researcher will use $H_a: \mu > 6$ (right-tailed);
- different from $\mu_0 = 6$, the researcher will use $H_a: \mu \neq 6$ (two-tailed).
Remark: Notice that $H_0$ being false doesn't necessarily mean that $H_a$ is true (unless the union of $H_0$ and $H_a$ constitutes the entire parameter space).
In general, the alternative is chosen to reflect the researcher's belief about the parameter. Therefore, $H_a$ is sometimes called the researcher's hypothesis. The aim is to see if the researcher's hypothesis is supported by the data set.
3. The Test Statistic: the test statistic is a statistic that is used to test $H_0$ versus $H_a$. We make our decision by comparing the observed value of the test statistic to its sampling distribution under $H_0$. Recall that a statistic is a function of the observed data $Y_1, \ldots, Y_n$ used for inference about (population) parameters.
For the antidepressant drug example, a test statistic would be based on the unbiased estimator of $\mu$:
$T = \bar{Y}$
Most often a test statistic is based on an MVUE or MLE of the parameter of interest that describes $H_0$.
- If the observed value of the test statistic is consistent with its sampling distribution under $H_0$, then there is not enough evidence for $H_a$.
- If the observed value of the test statistic is not consistent with its sampling distribution under $H_0$, and lies in the direction specified under $H_a$, then there is enough evidence to reject $H_0$ (and support $H_a$).
4. The Rejection Region (RR): The rejection region (RR) specifies the values of the test statistic for which $H_0$ is rejected.
- If $t_{obs} \in RR$, reject $H_0$.
- If $t_{obs} \notin RR$, do not reject $H_0$.
The RR is usually located in the tails of the sampling distribution of $T$ derived under $H_0$.
In the depression example, suppose $RR = \{\bar{Y} \le 4\}$ and that $\bar{y} = 3$ is observed. Then we reject $H_0$ and conclude that there is evidence that the drug, on average, reduces depression levels. (The sample mean, $\bar{y} = 3$, is significantly lower than the hypothetical mean, $\mu = 6$.)
The above four elements ($H_0$, $H_a$, test statistic $T$, and RR) are the building blocks of a statistical test. Any statistical test should include all four.
Another Hypothesis Testing Example
Assessing ESP (extra-sensory perception) ability (fabricated data)
Hypotheses:
$H_0$: Rachael does not have ESP (= random guessing)
$H_a$: Rachael has ESP
Experiment:
A deck of 52 cards has 26 red and 26 black. The cards are shuffled and one is selected at random. Rachael guesses the color of the card.
Data from experiment:
Carry out $n = 20$ repetitions, shuffling each time, and record whether each guess is correct or not.
Test statistic:
$T$ = number of correct responses out of $n$
Distribution of Test Statistic under $H_0$:
$T \sim \mathrm{Bin}(n = 20, p)$,
where $p$ is the success rate, with $H_0: p = 0.5$. (Trials assumed independent.)
Restate hypotheses in terms of parameters:
$H_0: p = 0.5$ [pure guessing]
$H_a: p > 0.5$ [responses informed by ESP]
Rejection region: $RR = \{t_{obs} \ge 15\}$.
Outcome of the experiment: $t_{obs} = 16$
Conclusion:
Since $t_{obs}$ falls in RR, i.e. $t_{obs} = 16 \in \{t_{obs} \ge 15\}$, the data contradict the null hypothesis $H_0$. We reject $H_0$ as implausible and conclude that the observed rate of success ($\hat{p} = 0.8$) is significantly higher than $p_0 = 0.5$. There is evidence of ESP.
Hypothesis Testing: Errors
Because we are choosing between $H_0$ and $H_a$ based on the sample data, there is a chance that we make an error.

               t_obs in RR           t_obs not in RR
               (Reject H_0)          (Do not reject H_0)
H_0 true       Type I error          (correct decision)
H_0 false      (correct decision)    Type II error

Two types of errors can be made in reaching a decision.
Type I error: Reject $H_0$ when $H_0$ is true.
Type II error: Fail to reject $H_0$ when $H_0$ is false.
Error Probabilities
Probability of making a Type I error:
$\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})$
$\alpha$ is called the level of significance or the level of the test.
Probability of making a Type II error (at a particular/given value $\theta = \theta_a$ in $H_a$):
$\beta = \beta(\theta_a) = P(\text{fail to reject } H_0 \mid \theta = \theta_a)$.
Example 10.1.1 In the ESP example, calculate $\alpha$, and $\beta$ when $p = 0.7$.
The Type I error probability:
$\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})$
$= P(T \ge 15 \mid p = 0.5)$
$= 1 - B(14; n = 20, p = 0.5)$
$\approx 0.021$.
The Type II error probability for $p = 0.7$:
$\beta(0.7) = P(\text{fail to reject } H_0 \mid p = 0.7)$
$= P(T < 15 \mid p = 0.7)$
$= P(T \le 14 \mid p = 0.7)$
$= B(14; n = 20, p = 0.7)$
$= 0.584$.
Practice: calculate $\beta(0.9)$.
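The two error probabilities in the ESP example are binomial tail sums, so they can be checked directly. The following is a minimal sketch using only the Python standard library; the helper name `binom_cdf` is introduced here for illustration and plays the role of the tabled $B(k; n, p)$.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(T <= k) for T ~ Bin(n, p), summed directly from the pmf."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

n, cutoff = 20, 15  # rejection region: T >= 15

# Type I error: reject H0 although p = 0.5
alpha = 1 - binom_cdf(cutoff - 1, n, 0.5)

# Type II error at p = 0.7: fail to reject although p = 0.7
beta_07 = binom_cdf(cutoff - 1, n, 0.7)

print(f"alpha     = {alpha:.4f}")
print(f"beta(0.7) = {beta_07:.3f}")
```

Replacing 0.7 by 0.9 in the last call gives the practice quantity.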
Error Probabilities
- Type I error probability (also referred to as the level of significance or, less formally, the false positive rate):
$\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})$
- Type II error probability (also referred to as the false negative rate):
$\beta = \beta(\theta_a) = P(\text{fail to reject } H_0 \mid \theta = \theta_a)$.
- Power:
$\mathrm{Power}(\theta_a) = 1 - \beta(\theta_a) = P(\text{reject } H_0 \mid \theta = \theta_a)$.
Remark: Ideally we would like to reduce both $\alpha$ and $\beta$. However, with a fixed sample size $n$, we cannot reduce both of them. So we generally fix $\alpha$ at a small value (say 0.05 or 0.01) and construct an RR so that $\beta$ is minimized.
Example 10.1.2 Let $Y \sim \mathrm{Uniform}(\theta, \theta + 1)$. Consider $H_0: \theta = 0$ versus $H_a: \theta > 0$. The test procedure: reject $H_0$ if $Y > 0.95$.
1. Calculate the level of significance of this test.
2. Calculate $\beta$ when $\theta = 0.5$.
10.2 Large Sample Tests
Suppose $Y_1, \ldots, Y_n$ is a random sample (rs) with $n$ large, from a distribution with parameter of interest $\theta$. Let $\hat{\theta}$ denote an estimator with large sample distribution $N(\theta, \sigma_{\hat{\theta}}^2)$.
For example, if $\hat{\theta}$ is a consistent and unbiased estimator of $\theta$, then by the CLT we often have, for large $n$,
$\hat{\theta} \xrightarrow{d} N(\theta, \sigma_{\hat{\theta}}^2)$.
We assume that $\sigma_{\hat{\theta}}^2$ is known or can be estimated by a consistent estimator $\hat{\sigma}_{\hat{\theta}}^2$.
Large Sample Tests
Notational convention: $SD(\hat{\theta}) = \sigma_{\hat{\theta}}$, $\widehat{SD}(\hat{\theta}) = \hat{\sigma}_{\hat{\theta}}$, and $V(\hat{\theta}) = \sigma_{\hat{\theta}}^2$.
Null and alternative hypotheses:
$H_0: \theta = \theta_0$ versus $H_a: \theta > \theta_0$,
where the value $\theta_0$ is specified.
Test statistic: $T = \hat{\theta}$
$RR = \{\hat{\theta} : \hat{\theta} > k\}$ (value of $k$ to be determined)
Specify the level of significance $\alpha$ (say 0.05 or 0.01), i.e., the largest Type I error probability that can be tolerated.
We determine the value of $k$ so that the corresponding Type I error probability equals the pre-specified level $\alpha$. That is, we determine $k$ by solving the equation
$\alpha = P(\hat{\theta} > k \mid \theta = \theta_0)$
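$\alpha \overset{set}{=} P(\hat{\theta} > k \mid \theta = \theta_0)$
$= P\left(\frac{\hat{\theta} - \theta_0}{\sigma_{\hat{\theta}}} > \frac{k - \theta_0}{\sigma_{\hat{\theta}}} \mid \theta = \theta_0\right)$
$\approx P\left(Z > \frac{k - \theta_0}{\sigma_{\hat{\theta}}}\right)$
$= P(Z > z_\alpha)$.
Therefore, to solve this equation for $k$, we need to set
$\frac{k - \theta_0}{\sigma_{\hat{\theta}}} = z_\alpha \implies k = \theta_0 + z_\alpha \sigma_{\hat{\theta}}$
Thus,
$RR = \{\hat{\theta} : \hat{\theta} > \theta_0 + z_\alpha \sigma_{\hat{\theta}}\} = \left\{\frac{\hat{\theta} - \theta_0}{\sigma_{\hat{\theta}}} > z_\alpha\right\}$.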
Large Sample Test: Summary Procedure
State hypotheses (right-tailed alternative):
$H_0: \theta = \theta_0$ versus $H_a: \theta > \theta_0$
Test statistic:
$Z = \frac{\hat{\theta} - \theta_0}{\sigma_{\hat{\theta}}}$, [$Z \overset{approx}{\sim} N(0, 1)$ when $\theta = \theta_0$]
Rejection region:
$RR = \{z_{obs} : z_{obs} > z_\alpha\}$.
If left-tailed test: $H_0: \theta = \theta_0$ versus $H_a: \theta < \theta_0$, then
$RR = \{z_{obs} : z_{obs} < -z_\alpha\}$.
If two-tailed test: $H_0: \theta = \theta_0$ versus $H_a: \theta \neq \theta_0$, then
$RR = \{z_{obs} : |z_{obs}| > z_{\alpha/2}\}$.
Note that the alternatives are paired with the corresponding RRs.
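Large Sample Test: Unknown Variance
Suppose that $\sigma_{\hat{\theta}}$ is unknown and that $\hat{\sigma}_{\hat{\theta}} / \sigma_{\hat{\theta}} \xrightarrow{p} 1$, i.e. $\hat{\sigma}_{\hat{\theta}}$ is a consistent estimator of $\sigma_{\hat{\theta}}$. Then
$Z = \frac{\hat{\theta} - \theta_0}{\hat{\sigma}_{\hat{\theta}}} = \frac{\hat{\theta} - \theta_0}{\sigma_{\hat{\theta}}} \cdot \frac{\sigma_{\hat{\theta}}}{\hat{\sigma}_{\hat{\theta}}}$.
Since $\frac{\sigma_{\hat{\theta}}}{\hat{\sigma}_{\hat{\theta}}} \xrightarrow{p} 1$ and $\frac{\hat{\theta} - \theta_0}{\sigma_{\hat{\theta}}} \xrightarrow{d} N(0, 1)$, it follows by Slutsky's Theorem that
$Z = \frac{\hat{\theta} - \theta_0}{\hat{\sigma}_{\hat{\theta}}} \xrightarrow{d} N(0, 1)$.
Therefore, the same test procedure follows.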
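Large Sample Test: Unknown Variance
Use a consistent estimator of $\sigma_{\hat{\theta}}$, namely $\hat{\sigma}_{\hat{\theta}} = \widehat{SD}(\hat{\theta})$, in the Z-test statistic.
Test statistic:
$Z = \frac{\hat{\theta} - \theta_0}{\widehat{SD}(\hat{\theta})}$, [$Z \overset{approx}{\sim} N(0, 1)$ when $\theta = \theta_0$]
Right-tailed test: $H_0: \theta = \theta_0$ vs $H_a: \theta > \theta_0$.
$RR = \{z_{obs} : z_{obs} > z_\alpha\}$.
Left-tailed test: $H_0: \theta = \theta_0$ vs $H_a: \theta < \theta_0$.
$RR = \{z_{obs} : z_{obs} < -z_\alpha\}$.
Two-tailed test: $H_0: \theta = \theta_0$ vs $H_a: \theta \neq \theta_0$.
$RR = \{z_{obs} : |z_{obs}| > z_{\alpha/2}\}$.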
Large Sample Test: Examples
Example 10.2.1 A company wishes to test its claim that the average lifetime of the tires it sells is 20,000 miles. The experiment yields $n = 36$ observations with sample mean $\bar{y} = 19{,}375$ miles. Carry out the hypothesis test and conclude at significance level $\alpha = 0.01$.
Suggested procedure:
- Parameter of interest: $\theta = \mu$ = mean lifetime for the population
- Construct null and alternative hypotheses: $H_0: \mu = 20{,}000$ vs $H_a: \mu < 20{,}000$
- Compute the test statistic:
- Construct the rejection region at level 0.01:
- Conclude:
Q: What if $H_a: \mu \neq 20{,}000$?
Example 10.2.2 (10.19 of WMS) A sample of 40 independent
readings on the voltage for this circuit gave a sample mean of 128.6,
and standard deviation 2.1. Test the hypothesis that the mean output
voltage is 130 against the alternative that it is less than 130. Use a
test with level 0.05.
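The large-sample procedure can be traced numerically on the summary statistics of Example 10.2.2. This is only a sketch: it uses the sample standard deviation in place of the unknown $\sigma$, as in the unknown-variance version of the test, and gets $z_\alpha$ from the standard library's `statistics.NormalDist` instead of a table. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from Example 10.2.2
n, ybar, s = 40, 128.6, 2.1
mu0, alpha = 130.0, 0.05

# Large-sample Z statistic, with s standing in for the unknown sigma
z_obs = (ybar - mu0) / (s / sqrt(n))

# Left-tailed alternative: reject H0 when z_obs < -z_alpha
z_alpha = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z_obs < -z_alpha

print(f"z_obs = {z_obs:.3f}, -z_alpha = {-z_alpha:.3f}, reject H0: {reject_h0}")
```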
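Large Sample Test: Proportions
Data: $Y \sim \mathrm{Bin}(n, p)$
Hypotheses: $H_0: p = p_0$ vs $H_a: p > p_0$ for a given value of $p_0 \in (0, 1)$. The level $\alpha$ is specified.
Parameter of interest: $\theta = p$
Unbiased and consistent estimator: $\hat{\theta} = \hat{p} = Y/n$
$\sigma_{\hat{\theta}}^2 = V(Y/n) = p(1 - p)/n$. So under $H_0$, $\sigma_{\hat{\theta}}^2 = p_0(1 - p_0)/n$.
Test statistic:
$Z = \frac{\hat{\theta} - \theta_0}{\sigma_{\hat{\theta}}} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$ [by the CLT, $Z \xrightarrow{d} N(0, 1)$ under $H_0$]
$RR = \{z_{obs} : z_{obs} > z_\alpha\}$.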
Large Sample Test: Proportion
Example 10.2.3 Each member of a panel of 100 tasters was presented with three glasses of beer in random order, one of which was different from the other two (e.g. AAB). Each taster was asked to identify which beer was different. Let $p = P(\text{B is correctly identified})$. If the tasters are unable to distinguish between the beers we would expect $p = 1/3$. If they are able to distinguish we expect $p > 1/3$. Suppose among 100 tasters, 40 answered correctly. Are tasters able to distinguish? Carry out the hypothesis test at level 0.05.
Construct the appropriate null and alternative hypotheses:
Calculate the test statistic value:
Construct the rejection region at level 0.05:
Conclude:
(BTW, what is the population here?)
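For the beer-tasting data, the test statistic and decision can be computed in a few lines. The sketch below assumes the large-sample proportion test with the standard error evaluated under $H_0$; the function name `one_proportion_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(y: int, n: int, p0: float) -> float:
    """Z statistic for H0: p = p0, using the standard error under H0."""
    p_hat = y / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z_obs = one_proportion_z(y=40, n=100, p0=1 / 3)
z_alpha = NormalDist().inv_cdf(1 - 0.05)  # about 1.645
reject_h0 = z_obs > z_alpha               # right-tailed rejection region

print(f"z_obs = {z_obs:.3f}, z_alpha = {z_alpha:.3f}, reject H0: {reject_h0}")
```

Here $z_{obs} = \sqrt{2} \approx 1.414$, which does not exceed 1.645, so at level 0.05 the data do not provide enough evidence that the tasters can distinguish the beers.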
Large Sample Test: Other
Table 8.1 on page 397:
Example 10.2.4 Let $X_1, \ldots, X_n$ be a random sample from the exponential distribution with pdf
$f(x) = \frac{1}{\theta} e^{-x/\theta}, \quad 0 < x < +\infty, \quad 0 < \theta < +\infty.$
1. Find the large sample distribution of the method of moments estimator of $\theta$, $\hat{\theta}$.
2. Set up a large sample test of $H_0: \theta = \theta_0$ versus $H_a: \theta < \theta_0$ using level $\alpha$. Specify the rejection region.
3. Using a random sample of size $n = 64$ and level $\alpha = 0.05$, test the hypothesis $H_0: \theta = 10$ versus $H_a: \theta < 10$ when the sample mean is $\bar{x} = 7.7$. State your conclusion.
Large Sample Test
Example: Two-sample Z-test
Example 10.2.5 Samples of 36 males and 40 females were tested to determine their temperature preference. Assume that the variances are known.
Samples: $Y_{1,1}, Y_{1,2}, \ldots, Y_{1,n_m}$ and $Y_{2,1}, Y_{2,2}, \ldots, Y_{2,n_f}$.
Males: $n_m = 36$, $\sigma_m^2 = 4.0$, $\bar{y}_m = 74.6$
Females: $n_f = 40$, $\sigma_f^2 = 2.5$, $\bar{y}_f = 76.5$
Do females and males differ with respect to their temperature preferences? Conduct the test at level $\alpha = 0.01$.
Let $\mu_m$ and $\mu_f$ denote the mean temperature preference of males and females, respectively.
Solution:
Note that if $\sigma_m^2$ and $\sigma_f^2$ are unknown, since both sample sizes are large, we would substitute their estimators from each sample, $s_m^2$ and $s_f^2$, where, for example, $S_m^2 = \frac{1}{n_m - 1} \sum_{j=1}^{n_m} (Y_{1,j} - \bar{Y}_m)^2$. We may do this because $S_m^2$ (and $S_f^2$) are consistent estimators of $\sigma_m^2$ (and $\sigma_f^2$).
Difference between two population proportions
Example: 10.33 of WMS. A political researcher believes that the fraction $p_1$ of Republicans strongly in favor of the death penalty is greater than the fraction $p_2$ of Democrats strongly in favor of the death penalty. He acquired independent random samples of 200 Republicans and 200 Democrats and found 46 Republicans and 34 Democrats strongly favoring the death penalty. Does this evidence provide statistical support for the researcher's belief? Use $\alpha = 0.05$.
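One common way to set up this comparison is a large-sample Z-test of $H_0: p_1 - p_2 = 0$ versus $H_a: p_1 - p_2 > 0$ with a pooled estimate of the common proportion under $H_0$. The pooling choice and the function name `two_proportion_z` are assumptions of this sketch, not something the example states.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(y1: int, n1: int, y2: int, n2: int) -> float:
    """Z statistic for H0: p1 = p2, pooling the two samples to estimate p."""
    p1_hat, p2_hat = y1 / n1, y2 / n2
    pooled = (y1 + y2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

z_obs = two_proportion_z(46, 200, 34, 200)
z_alpha = NormalDist().inv_cdf(1 - 0.05)
reject_h0 = z_obs > z_alpha  # right-tailed rejection region

print(f"z_obs = {z_obs:.2f}, z_alpha = {z_alpha:.3f}, reject H0: {reject_h0}")
```

Under these assumptions $z_{obs} = 1.5 < 1.645$, so the sample does not support the researcher's belief at level 0.05.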
10.3 Sample Size and Power
Suppose $Y_1, Y_2, \ldots, Y_n$ is a rs from $N(\mu, \sigma^2)$. We wish to test
$H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$.
Question: can we find a sample size $n$ which guarantees that the Type I and Type II error probabilities will not exceed $\alpha$ and $\beta$, respectively? Here $\alpha$ and $\beta$ are prespecified values.
Recall the level $\alpha$ test procedure for this hypothesis testing problem. Note that under $H_0: \mu = \mu_0$,
$Z = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$.
Therefore, from the large sample test constructed earlier, we reject $H_0$ when
$z_{obs} > z_\alpha$.
That is, the rejection region (RR) is:
$RR = \{\bar{Y} : \bar{Y} > k\}$, where $k = \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$.
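Therefore, the Type II error probability is:
$\beta(\mu_a) = P(\text{Do not reject } H_0 \mid \mu = \mu_a)$
$= P(\bar{Y} \le k \mid \mu = \mu_a)$
$= P(\bar{Y} \le k \mid \bar{Y} \sim N(\mu_a, \sigma^2/n))$
$= P\left(\frac{\bar{Y} - \mu_a}{\sigma/\sqrt{n}} < \frac{k - \mu_a}{\sigma/\sqrt{n}}\right)$
$= P\left(Z < \frac{k - \mu_a}{\sigma/\sqrt{n}}\right)$,
where $k = \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$.
We need $P\left(Z < \frac{k - \mu_a}{\sigma/\sqrt{n}}\right) = \beta$. Note that $P(Z < -z_\beta) = \beta$. Therefore, we need to set
$-z_\beta = \frac{k - \mu_a}{\sigma/\sqrt{n}} = \frac{\mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}} - \mu_a}{\sigma/\sqrt{n}} = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} + z_\alpha$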
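Solving
$-z_\beta = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} + z_\alpha$
for $n$:
$\sqrt{n}\, \frac{\mu_a - \mu_0}{\sigma} = z_\alpha + z_\beta$
That is,
$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_0 - \mu_a)^2} = \left[\frac{(z_\alpha + z_\beta)\sigma}{\mu_0 - \mu_a}\right]^2$
Remark: the value of $n$ depends on $\mu_a$. So we cannot find a sample size $n$ irrespective of the value of $\mu_a$ ($> \mu_0$ for the right-tailed test).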
Sample Size Determination: Summary
For upper- or lower-tailed level $\alpha$ tests, to control the Type II error probability at $\beta$ when $\mu = \mu_a$ under $H_a$, the required sample size is
$n = \left[\frac{(z_\alpha + z_\beta)\sigma}{\mu_0 - \mu_a}\right]^2$
For two-tailed level $\alpha$ tests, the required sample size is:
$n = \left[\frac{(z_{\alpha/2} + z_\beta)\sigma}{\mu_0 - \mu_a}\right]^2$
Remark: the above conclusions hold for z-type tests, which apply when the test statistic is normally distributed (or approximately so for large samples).
Sample Size Determination: Example
Example 10.3.1 An SAT prep course claims to increase average Verbal SAT scores.
To test the claim, $n$ candidates will be selected at random to receive the training and take the test.
It is known that in the population under study, SAT-V scores are distributed $N(\mu = 565, \sigma^2 = 40^2)$.
We want the probability of making a Type II error, when there is a 15 point mean increase, to be 0.1 (or less), when the level is $\alpha = 0.05$. Determine how many candidates should be selected.
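The one-tailed sample size formula $n = [(z_\alpha + z_\beta)\sigma / (\mu_0 - \mu_a)]^2$ is easy to evaluate once the two normal quantiles are available. A minimal sketch, with a hypothetical helper `sample_size_one_tailed` and the result rounded up to the next integer:

```python
from math import ceil
from statistics import NormalDist

def sample_size_one_tailed(alpha: float, beta: float, sigma: float, delta: float) -> int:
    """Smallest n with Type II error at most beta at a mean shift of delta."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(1 - beta)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# SAT-V example: sigma = 40, shift of 15 points, alpha = 0.05, beta = 0.10
n_required = sample_size_one_tailed(alpha=0.05, beta=0.10, sigma=40, delta=15)
print(n_required)
```

Only the absolute size of the shift matters, since $(\mu_0 - \mu_a)^2 = (\mu_a - \mu_0)^2$.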
Power of the Test
The power of a test at a particular value in the alternative, $\theta = \theta_a$, is defined as
$\mathrm{Power}(\theta_a) = P\{\text{Reject } H_0 \mid \theta = \theta_a\} = 1 - \beta(\theta_a)$
In the SAT example, we find that if $n$ is 61 and the true mean SAT-V score of people taking the SAT prep course is 580 (an increase of 15 points), then the power of the test is
$\mathrm{Power}(580) = 1 - 0.1 = 0.9$
[Figure: Power curve for SAT-V example (for $n = 61$)]
[Figure: Power curve for SAT-V example (for $\mu_a = 580$)]
Another example: Beer tasting
Refer to Example 10.2.3: $H_0: p = 1/3$ versus $H_a: p > 1/3$.
RR for a level $\alpha = 0.05$ test is:
$RR: z_{obs} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} > 1.645$,
where $p_0 = 1/3$, $\hat{p} = Y/n$, and $Y$ is the number of tasters that answered correctly. That is,
$RR: \hat{p} > k$, where $k = p_0 + z_\alpha \sqrt{\frac{p_0(1 - p_0)}{n}}$.
Calculate $\beta(0.5)$, the Type II error rate when $p_a = 0.5$. Recall $n = 100$.
Solution:
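Q: Calculate the minimum sample size $n$ to control $\beta(0.5)$ at 0.01.
We need to solve for $n$ such that
$\frac{k - p_a}{\sqrt{p_a(1 - p_a)/n}} = -z_\beta$, (1)
where $k = p_0 + z_\alpha \sqrt{\frac{p_0(1 - p_0)}{n}}$ also depends on $n$. Solving
$\frac{p_0 + z_\alpha \sqrt{\frac{p_0(1 - p_0)}{n}} - p_a}{\sqrt{p_a(1 - p_a)/n}} = -z_\beta$
gives
$n = \left[\frac{z_\beta \sqrt{p_a(1 - p_a)} + z_\alpha \sqrt{p_0(1 - p_0)}}{p_0 - p_a}\right]^2$
Thus for this example, $p_0 = 1/3$, $p_a = 0.5$, $z_\alpha = 1.645$, $z_\beta = 2.33$, so $n \approx 135.6$; rounding up to guarantee $\beta(0.5) \le 0.01$ gives $n = 136$.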
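Both beer-tasting calculations (the Type II error rate at $p_a = 0.5$ for $n = 100$, and the sample size needed to bring it down to 0.01) follow from the normal approximation used above. The sketch below uses exact normal quantiles from `statistics.NormalDist` rather than the rounded table values, and rounds the sample size up to the next integer; the function name `beta_at` is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
p0, pa, alpha = 1 / 3, 0.5, 0.05
z_alpha = STD_NORMAL.inv_cdf(1 - alpha)

def beta_at(p_alt: float, n: int) -> float:
    """Approximate P(p_hat <= k | p = p_alt), where RR is p_hat > k."""
    k = p0 + z_alpha * sqrt(p0 * (1 - p0) / n)
    return STD_NORMAL.cdf((k - p_alt) / sqrt(p_alt * (1 - p_alt) / n))

# Type II error rate for the panel of 100 tasters
beta_100 = beta_at(pa, 100)

# Sample size needed so that beta(0.5) is at most 0.01
z_beta = STD_NORMAL.inv_cdf(1 - 0.01)
n_exact = ((z_beta * sqrt(pa * (1 - pa)) + z_alpha * sqrt(p0 * (1 - p0))) / (p0 - pa)) ** 2
n_required = ceil(n_exact)

print(f"beta(0.5) at n = 100: {beta_100:.4f}")
print(f"n_exact = {n_exact:.1f}, n_required = {n_required}")
```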
[Figure: Power curve for Beer example (for $n = 100$)]
[Figure: Type II error rate for Beer example (for $p_a = 0.5$)]
10.4 Test/confidence interval relationship
Hypothesis Testing and Confidence Intervals
Hypothesis testing has a close connection to confidence intervals (CI), in the sense that confidence intervals are often the complement of rejection regions.
The complement of the RR is sometimes called the acceptance region.
Consider the problem of two-tailed alternatives:
$H_0: \theta = \theta_0$ versus $H_a: \theta \neq \theta_0$
Test statistic: $Z = \frac{\hat{\theta} - \theta_0}{\sigma_{\hat{\theta}}}$
Rejection region:
$RR = \{Z : |Z| > z_{\alpha/2}\}$
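From the above, the acceptance region is:
$\overline{RR} = \{|Z| \le z_{\alpha/2}\} = \{\hat{\theta} - z_{\alpha/2}\sigma_{\hat{\theta}} \le \theta_0 \le \hat{\theta} + z_{\alpha/2}\sigma_{\hat{\theta}}\}$.
Notice that a $100(1 - \alpha)\%$ CI for $\theta$ is:
$\hat{\theta} \pm z_{\alpha/2}\sigma_{\hat{\theta}}$.
Therefore,
Reject $H_0$ if and only if $\theta_0 \notin$ CI.
In other words, testing at level $\alpha$ against a two-tailed alternative is equivalent to checking whether the hypothesized value of $\theta$ ($= \theta_0$) lies in the $100(1 - \alpha)\%$ CI for $\theta$.
A similar relationship exists between one-sided alternative hypotheses and one-sided confidence intervals.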
Test-confidence interval relationship: Example
Refer to Example 10.2.5 (Male and Female Temperature Preference).
Males: $n_m = 36$, $\sigma_m^2 = 4.0$, $\bar{y}_m = 74.6$
Females: $n_f = 40$, $\sigma_f^2 = 2.5$, $\bar{y}_f = 76.5$
Let $\mu_m$ and $\mu_f$ denote the mean temperature preference of males and females, respectively.
Construct a 99% confidence interval for $\mu_m - \mu_f$. Is the value $\mu_m - \mu_f = 0$ contained in the confidence interval? Based on the interval, should we reject $H_0: \mu_m - \mu_f = 0$?
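The equivalence between the two-tailed test and the confidence interval can be checked on these numbers. The sketch below takes $\hat{\theta} = \bar{y}_m - \bar{y}_f$ with standard error $\sqrt{\sigma_m^2/n_m + \sigma_f^2/n_f}$ (the variances are known in this example) and makes the decision both ways; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n_m, var_m, ybar_m = 36, 4.0, 74.6
n_f, var_f, ybar_f = 40, 2.5, 76.5
alpha = 0.01

diff = ybar_m - ybar_f                      # estimate of mu_m - mu_f
se = sqrt(var_m / n_m + var_f / n_f)        # known-variance standard error
z_half = NormalDist().inv_cdf(1 - alpha / 2)

# 99% confidence interval for mu_m - mu_f
ci = (diff - z_half * se, diff + z_half * se)

# Two-tailed Z-test of H0: mu_m - mu_f = 0
z_obs = diff / se
reject_by_test = abs(z_obs) > z_half
reject_by_ci = not (ci[0] <= 0 <= ci[1])

print(f"z_obs = {z_obs:.2f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"reject by test: {reject_by_test}, reject by CI: {reject_by_ci}")
```

The two decisions always agree, because $|z_{obs}| > z_{\alpha/2}$ holds exactly when 0 lies outside $\hat{\theta} \pm z_{\alpha/2}\sigma_{\hat{\theta}}$.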
10.5 The p-value
- Observed significance level
- Often misunderstood
- $P(\text{these data or more extreme}; H_0 \text{ is true})$
- Reject $H_0$ at level $\alpha$ $\iff$ p-value $< \alpha$
- Definition: the smallest level of significance $\alpha$ at which $H_0$ can be rejected.
Refer to Example 10.2.4.
$H_0: \theta = 10$ versus $H_a: \theta < 10$; $z_{obs} = -1.84$
If $\alpha = 0.1$, $z_\alpha = 1.28$, RR: $z_{obs} < -1.28$, conclusion: Reject $H_0$
If $\alpha = 0.05$, $z_\alpha = 1.65$, RR: $z_{obs} < -1.65$, conclusion: Reject $H_0$
...
If $\alpha = 0.025$, $z_\alpha = 1.96$, RR: $z_{obs} < -1.96$, conclusion: Do not reject $H_0$
Questions:
- Calculate $P(Z < -1.28)$ and $P(Z < -1.84)$.
- What is the smallest $\alpha$ so that $H_0$ will be rejected?
- For what $\alpha$ values can we reject $H_0$? For any $\alpha > 0.03$.
Normal Calculator:
http://www.stat.tamu.edu/~west/applets/normaldemo.html
Calculation of p-values
The p-value is the probability of obtaining a test statistic value at least as extreme as the observed value, calculated assuming $H_0$ is true.
Consider testing $H_0: \theta = \theta_0$. Suppose the test statistic $Z \sim N(0, 1)$ (or approximately) under $H_0$.
- Left-tailed test $H_a: \theta < \theta_0$: p-value $= P(Z < z_{obs})$
- Right-tailed test $H_a: \theta > \theta_0$: p-value $= P(Z > z_{obs})$
- Two-tailed test $H_a: \theta \neq \theta_0$: p-value $= P(Z < -|z_{obs}| \text{ or } Z > |z_{obs}|) = 2P(Z > |z_{obs}|)$
The more extreme the observed test statistic value $\implies$ the smaller the p-value $\implies$ the more evidence to reject $H_0$.
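The three p-value rules for a Z statistic translate directly into code. This is a small sketch with a hypothetical helper `z_p_value`, applied to $z_{obs} = -1.84$ from the exponential example (left-tailed alternative).

```python
from statistics import NormalDist

def z_p_value(z_obs: float, tail: str) -> float:
    """p-value of a Z statistic; tail is 'left', 'right' or 'two'."""
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z_obs)
    if tail == "right":
        return 1 - std_normal.cdf(z_obs)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z_obs)))
    raise ValueError("tail must be 'left', 'right' or 'two'")

p_left = z_p_value(-1.84, "left")
print(f"p-value = {p_left:.4f}")

# H0 is rejected at every level alpha larger than the p-value
for alpha in (0.10, 0.05, 0.025):
    print(alpha, "reject" if p_left < alpha else "do not reject")
```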
Calculation of p-value
Example 10.5.1 Refer to the Beer Tasting Example 10.2.3.
$H_0: p = 1/3$, $H_a: p > 1/3$, $\alpha = 0.05$
$y = 40$ out of $n = 100$ correct identifications observed. Calculate the p-value and conclude.
Example 10.5.2 (10.57 of WMS)
A publisher of a newsmagazine has found through past experience that 60% of subscribers renew their subscriptions. In a recent random sample of $n = 200$ subscribers, 108 indicated that they planned to renew. What is the p-value associated with the test that the current rate of renewal differs from the rate previously experienced? State your conclusion using $\alpha = 0.05$. How about $\alpha = 0.1$?
BTW, does the total number of subscribers, $N$, matter?
Example 10.5.3 Refer to Example 10.2.5 (Male and Female Temperature Preference).
Test $H_0: \mu_m - \mu_f = 0$ versus $H_a: \mu_m - \mu_f \neq 0$. Calculate the p-value and conclude using $\alpha = 0.01$.
Males: $n_m = 36$, $\sigma_m^2 = 4.0$, $\bar{y}_m = 74.6$
Females: $n_f = 40$, $\sigma_f^2 = 2.5$, $\bar{y}_f = 76.5$
10.6 Testing means in small samples (normal)
Recall: The z-test
Testing Means in Normal Samples with Known Variances
Assumption: $Y_1, \ldots, Y_n$ a rs from $N(\mu, \sigma^2)$, $\sigma$ is known
$H_0: \mu = \mu_0$ versus
$H_a: \mu \neq \mu_0$ (two-tailed test)
$H_a: \mu < \mu_0$ (lower-tailed test)
$H_a: \mu > \mu_0$ (upper-tailed test)
Test statistic:
$Z = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}$
Under $H_0$: $Z \sim N(0, 1)$
Two-tailed test:
$H_a: \mu \neq \mu_0$
$RR = \{z_{obs} : |z_{obs}| > z_{\alpha/2}\}$
p-value $= P(|Z| > |z_{obs}|) = 2P(Z > |z_{obs}|)$
Right-tailed test:
$H_a: \mu > \mu_0$
$RR = \{z_{obs} : z_{obs} > z_\alpha\}$
p-value $= P(Z > z_{obs})$
Left-tailed test:
$H_a: \mu < \mu_0$
$RR = \{z_{obs} : z_{obs} < -z_\alpha\}$
p-value $= P(Z < z_{obs})$
Recall: The large-sample z-test
Testing Means in Large Samples with Unknown Variances
Assumption: $Y_1, \ldots, Y_n$ a rs with common mean $\mu$ and unknown variance $\sigma^2$, where $n$ is large.
$H_0: \mu = \mu_0$ versus
$H_a: \mu \neq \mu_0$ (two-tailed test)
$H_a: \mu < \mu_0$ (left-tailed test)
$H_a: \mu > \mu_0$ (right-tailed test)
Test statistic:
$Z = \frac{\bar{Y} - \mu_0}{S/\sqrt{n}}$, where $S^2$ is the sample variance.
Under $H_0$: $Z \overset{approx}{\sim} N(0, 1)$
RR and p-value calculations are the same as for the previous z-test.
The t-test
Testing Means in Small Normal Samples
$Y_1, \ldots, Y_n$ a rs from $N(\mu, \sigma^2)$, $\sigma^2$ unknown and $n$ small
$H_0: \mu = \mu_0$ versus
$H_a: \mu \neq \mu_0$ (two-tailed test)
$H_a: \mu < \mu_0$ (lower-tailed test)
$H_a: \mu > \mu_0$ (upper-tailed test)
Significance level: $\alpha$
Test statistic:
$T = \frac{\bar{Y} - \mu_0}{S/\sqrt{n}}$,
where $\bar{Y}$ and $S^2$ are the sample mean and variance.
Under $H_0$, $T \sim t_{n-1}$
Two-tailed test:
$H_a: \mu \neq \mu_0$
$RR = \{t_{obs} : |t_{obs}| > t_{n-1,\alpha/2}\}$
p-value $= P(|T_{n-1}| > |t_{obs}|) = 2P(T_{n-1} > |t_{obs}|)$,
where $T_{n-1}$ is a rv following the $t_{n-1}$ distribution.
Right-tailed test:
$H_a: \mu > \mu_0$
$RR = \{t_{obs} : t_{obs} > t_{n-1,\alpha}\}$
p-value $= P(T_{n-1} > t_{obs})$
Left-tailed test:
$H_a: \mu < \mu_0$
$RR = \{t_{obs} : t_{obs} < -t_{n-1,\alpha}\}$
p-value $= P(T_{n-1} < t_{obs})$
T distribution calculator:
http://www.stat.tamu.edu/~west/applets/tdemo.html
Example: IQ Test
Example 10.6.1 Ten sampled students aged 18-21 years received special training. They are given an IQ test that is $N(100, 10^2)$ in the general population. Let $\mu$ be the mean IQ of these students who received special training. The observed IQ scores:
121, 98, 95, 94, 102, 106, 112, 120, 108, 109
Test if the special training improves the IQ score using significance level $\alpha = 0.05$.
Solution, continued
. . . $p = .029$, so that at level $\alpha = .05$, $H_0$ is rejected, and the observed sample mean $\bar{y}$ is significantly greater than $\mu = 100$.
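The observed t statistic for the IQ data can be reproduced with the standard library. Since the standard library has no t distribution, this sketch compares $t_{obs}$ with the tabled critical value $t_{9, 0.05} = 1.833$, which is hard-coded as `T_CRIT` here, rather than computing the p-value.

```python
from math import sqrt
from statistics import mean, stdev

scores = [121, 98, 95, 94, 102, 106, 112, 120, 108, 109]
mu0 = 100
n = len(scores)

ybar = mean(scores)   # sample mean
s = stdev(scores)     # sample standard deviation (divisor n - 1)
t_obs = (ybar - mu0) / (s / sqrt(n))

T_CRIT = 1.833        # tabled t_{9, 0.05}; taken from a t table, not computed
reject_h0 = t_obs > T_CRIT  # right-tailed alternative mu > 100

print(f"ybar = {ybar}, s = {s:.2f}, t_obs = {t_obs:.3f}, reject H0: {reject_h0}")
```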
Small Sample Tests: Two-Sample t-test
Independent random samples of size $n_1$ and $n_2$ from populations $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, where $\sigma$ is unknown.
$H_0: \mu_1 - \mu_2 = D_0$ (e.g. $D_0 = 0$) versus
$H_a: \mu_1 - \mu_2 \neq D_0$ (or $\mu_1 - \mu_2 < D_0$ or $\mu_1 - \mu_2 > D_0$)
Significance level $\alpha$
Test statistic:
$T = \frac{\bar{Y}_1 - \bar{Y}_2 - D_0}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
where $\bar{Y}_1$ and $\bar{Y}_2$ are the sample means and $S_1^2$ and $S_2^2$ are the sample variances from the two groups, and the pooled estimator of the common variance $\sigma^2$ is
$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$.
Under $H_0$: $T \sim t_{n_1 + n_2 - 2}$
Two-tailed test:
$H_a: \mu_1 - \mu_2 \neq D_0$
$RR = \{t_{obs} : |t_{obs}| > t_{\alpha/2,\, n_1 + n_2 - 2}\}$
p-value $= P(|T_{n_1 + n_2 - 2}| > |t_{obs}|) = 2P(T_{n_1 + n_2 - 2} > |t_{obs}|)$,
where $T_{n_1 + n_2 - 2}$ is a rv following the $t_{n_1 + n_2 - 2}$ distribution.
Example: Recovery time for new drug
Example 10.6.2 Twenty subjects were randomized to two groups, $n = 10$ each. The recovery time for patients taking a new drug (or placebo) is measured in days. The data follow:
with drug (1): 15 10 13 7 9 8 21 9 14 8
placebo (2): 15 14 12 8 14 7 16 10 15 12
Assume that the data are normally distributed and that $\sigma_1 = \sigma_2$. Use $\alpha = 0.05$ to test $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 < 0$.
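The pooled two-sample t statistic for the recovery-time data can be computed as follows. As a sketch, the decision uses the tabled critical value $t_{18, 0.05} = 1.734$, hard-coded as `T_CRIT`, because the standard library does not provide t quantiles.

```python
from math import sqrt
from statistics import mean, variance

drug = [15, 10, 13, 7, 9, 8, 21, 9, 14, 8]
placebo = [15, 14, 12, 8, 14, 7, 16, 10, 15, 12]
n1, n2 = len(drug), len(placebo)

# Pooled estimator of the common variance
sp2 = ((n1 - 1) * variance(drug) + (n2 - 1) * variance(placebo)) / (n1 + n2 - 2)

# Test statistic with D0 = 0
t_obs = (mean(drug) - mean(placebo)) / sqrt(sp2 * (1 / n1 + 1 / n2))

T_CRIT = 1.734                # tabled t_{18, 0.05}; taken from a t table
reject_h0 = t_obs < -T_CRIT   # left-tailed alternative mu1 - mu2 < 0

print(f"sp2 = {sp2:.2f}, t_obs = {t_obs:.3f}, reject H0: {reject_h0}")
```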
Two-Sample t-test (Unequal variances)
Basic Assumptions
1. $X_1, \ldots, X_m$ is a random sample from $N(\mu_1, \sigma_1^2)$, and $\sigma_1$ is unknown.
2. $Y_1, \ldots, Y_n$ is a random sample from $N(\mu_2, \sigma_2^2)$, and $\sigma_2$ is unknown.
3. The X and Y samples are independent of each other.
The standardized variable
$T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{m} + \frac{S_2^2}{n}}}$
has approximately a t distribution with degrees of freedom $\nu$ estimated from the data by
$\nu = \frac{\left(\frac{s_1^2}{m} + \frac{s_2^2}{n}\right)^2}{\frac{(s_1^2/m)^2}{m - 1} + \frac{(s_2^2/n)^2}{n - 1}}$ (round down)
Null hypothesis: $H_0: \mu_1 - \mu_2 = D_0$.
Test statistic value:
$t_{obs} = \frac{(\bar{x} - \bar{y}) - D_0}{\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}}$.

Alternative Hypothesis             Rejection Region for Level $\alpha$ Test
$H_a: \mu_1 - \mu_2 > D_0$         $t_{obs} \ge t_{\alpha,\nu}$
$H_a: \mu_1 - \mu_2 < D_0$         $t_{obs} \le -t_{\alpha,\nu}$
$H_a: \mu_1 - \mu_2 \neq D_0$      $t_{obs} \ge t_{\alpha/2,\nu}$ or $t_{obs} \le -t_{\alpha/2,\nu}$

The p-values can be calculated as in the two-sample t-test with equal variances.
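The estimated degrees of freedom are the only new ingredient of the unequal-variance test, so a short numerical sketch may help. Purely for illustration, it reuses the recovery-time data of Example 10.6.2 (which that example analyses under equal variances) with $D_0 = 0$.

```python
from math import floor, sqrt
from statistics import mean, variance

x = [15, 10, 13, 7, 9, 8, 21, 9, 14, 8]      # drug group
y = [15, 14, 12, 8, 14, 7, 16, 10, 15, 12]   # placebo group
m, n = len(x), len(y)

vx = variance(x) / m   # s1^2 / m
vy = variance(y) / n   # s2^2 / n

# Estimated degrees of freedom, rounded down
nu_raw = (vx + vy) ** 2 / (vx**2 / (m - 1) + vy**2 / (n - 1))
nu = floor(nu_raw)

# Unpooled test statistic with D0 = 0
t_obs = (mean(x) - mean(y)) / sqrt(vx + vy)

print(f"nu_raw = {nu_raw:.2f}, nu = {nu}, t_obs = {t_obs:.3f}")
```

With equal group sizes the unpooled statistic coincides with the pooled one; only the degrees of freedom change (16 here instead of 18).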
Sign Test - Nonparametric Test
Random sample of size $n$ from a continuous distribution with median $\tilde{\mu}$.
$H_0: \tilde{\mu} = \tilde{\mu}_0$ versus $H_a: \tilde{\mu} > \tilde{\mu}_0$
Significance level $\alpha$
Test statistic: $S$ = number of observations $> \tilde{\mu}_0$
Under $H_0$: $S \sim \mathrm{Bin}(n, 0.5)$
Form of $RR = \{s_{obs} \ge k\}$
p-value $= P(S \ge s_{obs})$
Sign Test
Example: IQ test
Refer to Example 10.6.1:
The observed IQ scores:
121, 98, 95, 94, 102, 106, 112, 120, 108, 109
$H_0: \tilde{\mu} = 100$ versus $H_a: \tilde{\mu} > 100$
$\alpha = 0.05$
Observed test statistic value: $s_{obs} = 7$
Under $H_0$: $S \sim \mathrm{Bin}(10, 0.5)$
p-value $= P(S \ge 7) = 1 - P(S \le 6) = 1 - 0.828 = 0.172$.
Conclusion: do not reject $H_0$
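The sign test needs only a count and a binomial tail probability, so it can be reproduced exactly. A minimal sketch using the IQ scores (variable names are illustrative):

```python
from math import comb

scores = [121, 98, 95, 94, 102, 106, 112, 120, 108, 109]
median0, alpha = 100, 0.05
n = len(scores)

# Test statistic: number of observations above the hypothesized median
s_obs = sum(1 for y in scores if y > median0)

# Under H0, S ~ Bin(n, 0.5), so P(S >= s_obs) is a sum of binomial coefficients
p_value = sum(comb(n, k) for k in range(s_obs, n + 1)) / 2**n
reject_h0 = p_value < alpha

print(f"s_obs = {s_obs}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```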
10.7 Testing Variances in Small Normal Samples
Random sample from $N(\mu, \sigma^2)$. Consider
$H_0: \sigma^2 = \sigma_0^2$ versus
$H_a: \sigma^2 \neq \sigma_0^2$ (or $\sigma^2 < \sigma_0^2$ or $\sigma^2 > \sigma_0^2$)
Significance level $\alpha$
Test statistic:
$\chi^2 = \frac{(n - 1)S^2}{\sigma_0^2}$ ($\sim \chi^2_{n-1}$ under $H_0$)
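Two-tailed test:
$H_a: \sigma^2 \neq \sigma_0^2$
$RR: \chi^2_{obs} > \chi^2_{\alpha/2,\, n-1}$ or $\chi^2_{obs} < \chi^2_{1-\alpha/2,\, n-1}$
p-value $= 2\min\{P(\chi^2 > \chi^2_{obs}),\ P(\chi^2 < \chi^2_{obs})\}$
Right-tailed test:
$H_a: \sigma^2 > \sigma_0^2$
$RR: \chi^2_{obs} > \chi^2_{\alpha,\, n-1}$
p-value $= P(\chi^2 > \chi^2_{obs})$
Left-tailed test:
$H_a: \sigma^2 < \sigma_0^2$
$RR: \chi^2_{obs} < \chi^2_{1-\alpha,\, n-1}$
p-value $= P(\chi^2 < \chi^2_{obs})$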
Example: IQ test
Example 10.7.1 IQ Example 10.6.1: $H_0: \sigma^2 = 100$ versus $H_a: \sigma^2 > 100$. Use significance level $\alpha = 0.05$.
Recall: $\bar{y} = 106.5$, $s = 9.5$.
Solution:
Observed test statistic value:
Rejection region:
p-value =
Two Sample Variance Tests
Suppose that $S_1^2$ and $S_2^2$ are the sample variances for two independent random samples of size $n_1$ and $n_2$ from distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. All parameters are unknown.
The quantities
$U_j = \frac{(n_j - 1)S_j^2}{\sigma_j^2}, \quad j = 1, 2$
are independent $\chi^2$ random variables with $(n_1 - 1)$ and $(n_2 - 1)$ degrees of freedom, respectively, so that
$F = \frac{U_1/(n_1 - 1)}{U_2/(n_2 - 1)} = \frac{S_1^2 \sigma_2^2}{S_2^2 \sigma_1^2} \sim F_{n_1 - 1,\, n_2 - 1}$.
Thus, when $\sigma_1^2 = \sigma_2^2$,
$F = \frac{S_1^2}{S_2^2} \sim F_{n_1 - 1,\, n_2 - 1}$
To test the equality of the two population variances:
$H_0: \sigma_1^2 = \sigma_2^2$ versus
$H_a: \sigma_1^2 \neq \sigma_2^2$ (or $\sigma_1^2 > \sigma_2^2$ or $\sigma_1^2 < \sigma_2^2$)
Significance level $\alpha$
Test statistic: $F = S_1^2 / S_2^2$
RR (two-tailed test): $F > F_{n_1 - 1,\, n_2 - 1,\, \alpha/2}$ or $F < F_{n_1 - 1,\, n_2 - 1,\, 1 - \alpha/2} = (F_{n_2 - 1,\, n_1 - 1,\, \alpha/2})^{-1}$
RR (right-tailed test): $F > F_{n_1 - 1,\, n_2 - 1,\, \alpha} = (F_{n_2 - 1,\, n_1 - 1,\, 1 - \alpha})^{-1}$
RR (left-tailed test): $F < F_{n_1 - 1,\, n_2 - 1,\, 1 - \alpha} = (F_{n_2 - 1,\, n_1 - 1,\, \alpha})^{-1}$
Two Sample Variance Tests: Example
Example 10.7.2 Compare the variances of the amount of active ingredients in generic and brand-name drugs. Random samples of size 20 (generic) and 30 (brand-name). Data: $s_g^2 = 0.00109\ \mathrm{mg}^2$, $s_b^2 = 0.000384\ \mathrm{mg}^2$. Use level $\alpha = 0.05$ to test
$H_0: \sigma_g^2 = \sigma_b^2$ versus $H_a: \sigma_g^2 > \sigma_b^2$
10.8 Neyman-Pearson - MP $\alpha$-level test
Consider a test involving a parameter $\theta$ with test statistic $W$ and rejection region RR. The power of the test:
$\mathrm{Power}(\theta) = P(\text{reject } H_0 \text{ when the parameter value is } \theta)$
$= P(W \in RR, \text{ when the parameter value is } \theta)$
Relationship between power and $\alpha$, $\beta$:
Suppose $H_0: \theta = \theta_0$, and $\theta_a$ is a parameter value under $H_a$. Then
$\mathrm{Power}(\theta_0) = \alpha = P(\text{Reject } H_0 \text{ when } H_0 \text{ is true})$
$\mathrm{Power}(\theta_a) = 1 - \beta(\theta_a)$
We would like to choose a level $\alpha$ (Type I error) RR to maximize $\mathrm{Power}(\theta)$ for $\theta$ in $H_a$, i.e. find the Most Powerful (MP) $\alpha$-level test.
Neyman-Pearson Lemma
MP $\alpha$-level Tests
$Y_1, \ldots, Y_n$ is a rs from a distribution with parameter $\theta$ and likelihood $L(\theta)$. We wish to test:
$H_0: \theta = \theta_0$ versus $H_a: \theta = \theta_a$,
using level of significance $\alpha$, where $\theta_0$ and $\theta_a$ are given.
Theorem 1 (The Neyman-Pearson Lemma) For the given level of significance $\alpha$, the test that maximizes $\mathrm{Power}(\theta_a)$ has a RR of the form:
$RR: \frac{L(\theta_0)}{L(\theta_a)} < k$,
where $k$ is chosen to ensure that the level (Type I error probability) is $\alpha$. The Most Powerful (MP) $\alpha$-level test is sometimes called the best test.
Remark: The Neyman-Pearson Lemma gives the rejection region, RR, that maximizes the power
$\mathrm{Power}(\theta_a) = P(RR \mid \theta = \theta_a)$
given that
$P(RR \mid \theta = \theta_0) = \alpha$
MP $\alpha$-level Test
Example: Beta($\theta$, 1) ($n = 1$)
Example 10.8.1 One observation ($n = 1$), $Y$, from Beta($\theta$, 1) with pdf:
$f(y \mid \theta) = \theta y^{\theta - 1}, \quad 0 < y < 1$
(a) Use the N-P lemma to find the $\alpha = .05$ MP test of
$H_0: \theta = 2$ versus $H_a: \theta = 1$
(b) For the MP 0.05-level test derived above, calculate Power(1)
Solution:
Likelihood $L(\theta) = f(y \mid \theta) = \theta y^{\theta - 1}$
Therefore, in this case
$\frac{L(\theta_0)}{L(\theta_a)} = \frac{L(2)}{L(1)} = \frac{2y^1}{1 \cdot y^0} = 2y, \quad 0 < y < 1$
By the N-P Lemma, the rejection region for the MP test is:
$RR = \{2Y < k\}$ or $\{Y < k/2\}$
Determining $k$:
$\alpha = P(RR \mid \theta = \theta_0 = 2)$
$= P(Y < k/2 \mid \theta = 2)$
$= \int_0^{k/2} 2y\, dy = y^2 \Big|_0^{k/2} = (k/2)^2$
$\implies k/2 = \sqrt{\alpha} \implies k = 2\sqrt{\alpha}$
Therefore, for this problem, the MP 0.05-level test has the rejection region
$RR: Y < \sqrt{0.05} = 0.2236$
(b) Solution (compute Power(1)):
$\mathrm{Power}(1) = P(RR \mid \theta = \theta_a = 1)$
$= P(Y < 0.2236 \mid \theta = 1)$
$= \int_0^{0.2236} 1 \cdot y^0\, dy = \int_0^{0.2236} 1\, dy$
$= y \Big|_0^{0.2236} = 0.2236$.
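The size and power of this MP test can be double-checked with the closed-form CDF of Beta($\theta$, 1), which is $y^\theta$ on $0 < y < 1$. A tiny sketch; the helper name `beta_theta_1_cdf` is made up for illustration.

```python
from math import sqrt

def beta_theta_1_cdf(y: float, theta: float) -> float:
    """CDF of Beta(theta, 1) on 0 < y < 1: the integral of theta*t**(theta-1) is y**theta."""
    return y**theta

alpha = 0.05
cutoff = sqrt(alpha)                  # rejection region: Y < sqrt(alpha)

size = beta_theta_1_cdf(cutoff, 2)    # P(Y < cutoff | theta = 2), should equal alpha
power = beta_theta_1_cdf(cutoff, 1)   # P(Y < cutoff | theta = 1)

print(f"cutoff = {cutoff:.4f}, size = {size:.4f}, power = {power:.4f}")
```

The power (about 0.22) is low because a single observation carries little information for separating $\theta = 2$ from $\theta = 1$.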
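MP $\alpha$-level Test
Normal Sample, $\sigma^2$ known
$Y_1, \ldots, Y_n$ rs from $N(\mu, \sigma^2)$, $\sigma^2$ known. Test
$H_0: \mu = \mu_0$ versus $H_a: \mu = \mu_a$, where $\mu_a > \mu_0$
The pdf of $Y_i$ is
$f(y \mid \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(y - \mu)^2}{2\sigma^2}\right\}, \quad -\infty < y < +\infty.$
Use the N-P lemma to find the MP $\alpha$-level test procedure.
Solution:
From the pdf, we obtain the likelihood for $\mu$:
$L(\mu) = f(y_1 \mid \mu) f(y_2 \mid \mu) \cdots f(y_n \mid \mu)$
$= \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left\{-\frac{\sum_{i=1}^n (y_i - \mu)^2}{2\sigma^2}\right\}$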
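By the N-P Lemma, the form of the MP $\alpha$-level test rejection region is:
$RR: \frac{L(\mu_0)}{L(\mu_a)} < k.$
$\frac{L(\mu_0)}{L(\mu_a)} = \frac{\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left\{-\frac{\sum_{i=1}^n (y_i - \mu_0)^2}{2\sigma^2}\right\}}{\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left\{-\frac{\sum_{i=1}^n (y_i - \mu_a)^2}{2\sigma^2}\right\}} < k$
$\iff \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^n (y_i - \mu_0)^2 - \sum_{i=1}^n (y_i - \mu_a)^2\right]\right\} < k$
$\iff -\frac{1}{2\sigma^2}\left[\sum_{i=1}^n (y_i - \mu_0)^2 - \sum_{i=1}^n (y_i - \mu_a)^2\right] < \ln(k)$
$\iff \sum_{i=1}^n (y_i - \mu_0)^2 - \sum_{i=1}^n (y_i - \mu_a)^2 > -2\sigma^2 \ln(k)$
$\iff \left[\sum_{i=1}^n y_i^2 - 2n\bar{y}\mu_0 + n\mu_0^2\right] - \left[\sum_{i=1}^n y_i^2 - 2n\bar{y}\mu_a + n\mu_a^2\right] > -2\sigma^2 \ln(k)$
$\iff 2n\bar{y}(\mu_a - \mu_0) + n\mu_0^2 - n\mu_a^2 > -2\sigma^2 \ln(k)$
$\iff 2n\bar{y}(\mu_a - \mu_0) > -2\sigma^2 \ln(k) - n\mu_0^2 + n\mu_a^2$
Since $\mu_a - \mu_0 > 0$,
$RR: \bar{y} > \frac{-2\sigma^2 \ln(k) - n\mu_0^2 + n\mu_a^2}{2n(\mu_a - \mu_0)}.$
Note that the right hand side of the above does not involve the data, so the inequality is equivalent to $\bar{y} > k'$ for some constant $k'$.
So the form of the RR becomes:
$RR: \bar{y} > k'$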
10 Hypothesis Testing 87
RR : y > k

To determine k

, we set
= P(

Y RR| =
0
)
= P(

Y > k

| =
0
)
= P
_

Y
0
/

n
>
k

0
/

n
| =
0
_
= P
_
Z >
k

0
/

n
_
Therefore,
k

0
/

n
= z

=
0
+z

n
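Thus, the MP $\alpha$-level test of
$$H_0: \mu = \mu_0 \quad \text{versus} \quad H_a: \mu = \mu_a, \quad \text{where } \mu_a > \mu_0$$
has the rejection region:
$$RR: \bar{y} > \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$$
or equivalently:
$$RR: Z = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} > z_\alpha.$$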
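As a rough numerical illustration of this rejection region, the sketch below computes the cutoff $\mu_0 + z_\alpha \sigma/\sqrt{n}$ and the power at a given $\mu_a$ using only the Python standard library. The numbers ($\mu_0 = 6$, $\sigma = 2$, $n = 25$) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mp_normal_cutoff(mu0: float, sigma: float, n: int, alpha: float) -> float:
    """Cutoff k* = mu0 + z_alpha * sigma / sqrt(n) for the sample mean."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return mu0 + z_alpha * sigma / sqrt(n)

def power(mu_a: float, mu0: float, sigma: float, n: int, alpha: float) -> float:
    """P(Ybar > k* | mu = mu_a) for the right-tailed test."""
    k_star = mp_normal_cutoff(mu0, sigma, n, alpha)
    return 1 - NormalDist(mu_a, sigma / sqrt(n)).cdf(k_star)

# Hypothetical values, not taken from the slides.
mu0, sigma, n, alpha = 6.0, 2.0, 25, 0.05
k_star = mp_normal_cutoff(mu0, sigma, n, alpha)

print(f"reject H0 when ybar > {k_star:.3f}")
print(f"power at mu_a = 7: {power(7.0, mu0, sigma, n, alpha):.3f}")
```

Evaluating `power` at `mu_a = mu0` returns the level $\alpha$, and the cutoff never uses `mu_a`, which is the property the UMP argument relies on.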
Uniformly Most Powerful (UMP) $\alpha$-level test
In the Normal sample example with known $\sigma$, note that the test does not depend on $\mu_a$ (except that we need the assumption $\mu_a > \mu_0$).
Because of this, the same test is the most powerful $\alpha$-level test for every $H_a: \mu = \mu_a$ with $\mu_a > \mu_0$. We call such a test the Uniformly Most Powerful (UMP) $\alpha$-level test of
$$H_0: \mu = \mu_0 \quad \text{versus} \quad H_a: \mu > \mu_0.$$
Simple hypothesis: a hypothesis that uniquely specifies the distribution of the population from which the sample is taken.
Composite hypothesis: not a simple hypothesis.
E.g. for the above normal sample example with known $\sigma$, $H: \mu = \mu_0$ is simple, while $H: \mu > \mu_0$ is composite.
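MP $\alpha$-level test
Exponential Sample
Example 10.8.2 Let $Y_1, \ldots, Y_n$ be a random sample from Exp($\theta$) with pdf:
$$f(y) = \frac{1}{\theta} e^{-y/\theta}, \quad 0 < y < +\infty$$
(1) Use the N-P Lemma to construct the MP $\alpha$-level test for
$$H_0: \theta = \theta_0 \quad \text{versus} \quad H_a: \theta = \theta_a, \quad \text{where } \theta_a > \theta_0$$
Hint: $\sum_{i=1}^n Y_i \sim \text{Gamma}(n, \theta)$.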
(2) Construct the UMP $\alpha$-level test for
$$H_0: \theta = \theta_0 \quad \text{versus} \quad H_a: \theta > \theta_0.$$
(3) For $n = 36$, we wish to test
$$H_0: \theta = 1 \quad \text{versus} \quad H_a: \theta = 2 \ (\text{or } \theta > 1)$$
using level $\alpha = 0.01$. Use StaTable to find the critical value at
http://www.cytel.com/Products/StaTable/ or
http://mcsp.wartburg.edu/nmb/fall10/math313/seeingstats/Chpt4/gammaProb.html
In Gamma($\alpha$, $\beta$), $\alpha$ is the shape parameter and $\beta$ is the scale parameter.
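As an alternative to the online calculators, the critical value in (3) can be sketched with the standard library alone. The sketch assumes the N-P ratio for the exponential sample reduces to rejecting for large $\sum Y_i$ (which holds when the alternative mean exceeds the null mean), and uses the hint that $\sum Y_i$ is Gamma with integer shape $n$, whose upper tail has a closed-form finite sum. The helper names are illustrative only.

```python
import math

def erlang_sf(x: float, n: int, scale: float) -> float:
    """P(S > x) for S ~ Gamma(shape=n, scale) with integer n (finite Poisson-type sum)."""
    if x <= 0:
        return 1.0
    t = x / scale
    term, total = 1.0, 1.0            # j = 0 term of sum_{j < n} t**j / j!
    for j in range(1, n):
        term *= t / j
        total += term
    return math.exp(-t) * total

def critical_value(alpha: float, n: int, scale: float, tol: float = 1e-10) -> float:
    """Bisection for c such that P(S > c) = alpha under the null scale."""
    lo, hi = 0.0, n * scale
    while erlang_sf(hi, n, scale) > alpha:   # expand until the tail is small enough
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if erlang_sf(mid, n, scale) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, alpha, theta0, theta_a = 36, 0.01, 1.0, 2.0
c = critical_value(alpha, n, theta0)     # reject H0 when sum(y) > c
power = erlang_sf(c, n, theta_a)         # power at the alternative mean 2

print(f"critical value for sum(y): {c:.3f}")
print(f"power at the alternative:  {power:.3f}")
```

Because the cutoff depends only on the null value and the level, the same rejection region serves every alternative with a larger mean.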
(4) (Large Sample Approach) Use the CLT to obtain an approximate test for the hypotheses in (3).
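One way to sketch the large-sample approach: under $H_0$ an exponential population with mean 1 also has standard deviation 1, so by the CLT the sample mean is approximately normal with mean 1 and standard deviation $1/\sqrt{n}$. The code below computes the resulting approximate cutoff for $n = 36$ and level 0.01; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def clt_cutoff(mean0: float, n: int, alpha: float) -> float:
    """Approximate cutoff for ybar: mean0 + z_alpha * mean0 / sqrt(n).

    Uses the fact that an exponential population with mean mean0
    has standard deviation mean0.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return mean0 + z_alpha * mean0 / sqrt(n)

n, alpha, mean0 = 36, 0.01, 1.0
ybar_cut = clt_cutoff(mean0, n, alpha)   # reject H0 when ybar > ybar_cut
sum_cut = n * ybar_cut                   # same cutoff on the scale of sum(y)

print(f"approximate cutoff for ybar:   {ybar_cut:.4f}")
print(f"approximate cutoff for sum(y): {sum_cut:.2f}")
```

The normal approximation ignores the right skew of the Gamma distribution, so this cutoff can be expected to sit somewhat below the exact Gamma-based critical value.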
Summary: Neyman-Pearson Lemma
MP $\alpha$-level Tests
The N-P Lemma provides the test statistic and the form of the RR for the MP test. The constant (critical value) must be determined to ensure that the test has level $\alpha$.
It is not always possible to find a UMP test.
The N-P Lemma cannot be applied if there are unknown parameters other than $\theta$.
If the rvs in the random sample are discrete, then it is usually not possible to achieve a given level of significance, $\alpha$, exactly.
10.9 Likelihood Ratio Test
Likelihood Ratio Test (LRT)
An approach for developing tests when either or both of the hypotheses are composite.
Can be used when the model for the data has more than one parameter:
$$\theta_1, \theta_2, \ldots, \theta_k$$
We will refer to the vector of parameters:
$$\Theta = (\theta_1, \theta_2, \ldots, \theta_k)$$
The likelihood of the random sample:
$$L(\theta_1, \theta_2, \ldots, \theta_k) = L(\Theta)$$
LRT: Normal Example
Let $Y_1, \ldots, Y_n$ be a random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is unknown. Then
$$\Theta = (\mu, \sigma^2).$$
We wish to test
$$H_0: \mu = \mu_0 \quad \text{versus} \quad H_a: \mu \neq \mu_0$$
Note that the null hypothesis is actually:
$$H_0: \mu = \mu_0,\ \sigma^2 > 0,$$
i.e. $H_0$ is not simple (it is composite).
In this situation, $H_0$ states that the parameters fall in a particular set, i.e. $\Theta$ is in the set $\Omega_0$. We write $\Omega_0$ for the parameter space under the null hypothesis.
For our example,
$$H_0: \Theta \in \Omega_0 = \{(\mu, \sigma^2) : \mu = \mu_0,\ \sigma^2 > 0\}$$
The alternative states:
$$H_a: \Theta \in \{(\mu, \sigma^2) : \mu \neq \mu_0,\ \sigma^2 > 0\} = \Omega_a,$$
where $\Omega_a$ denotes the parameter space under the alternative hypothesis.
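We denote the union of the sets $\Omega_0$ and $\Omega_a$ by $\Omega$, i.e.
$$\Omega_0 \cup \Omega_a = \Omega$$
In our example:
$$\Omega = \Omega_0 \cup \Omega_a = \{(\mu, \sigma^2) : \mu = \mu_0,\ \sigma^2 > 0\} \cup \{(\mu, \sigma^2) : \mu \neq \mu_0,\ \sigma^2 > 0\}$$
$$= \{(\mu, \sigma^2) : -\infty < \mu < +\infty,\ \sigma^2 > 0\}$$
That is, $\Omega$ = set of all possible values of the parameters, without regard to the hypotheses.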
Likelihood Ratio Test (LRT)
Notations:
$$L(\hat{\Omega}_0) = \max_{\Theta \in \Omega_0} L(\Theta)$$
denotes the maximum of the likelihood over the parameter values in $\Omega_0$ (under $H_0$).
$$L(\hat{\Omega}) = \max_{\Theta \in \Omega} L(\Theta)$$
denotes the maximum of the likelihood over the parameter values in $\Omega$.
Likelihood Ratio Test (LRT)
This is always true:
$$L(\hat{\Omega}_0) \le L(\hat{\Omega}),$$
since the space $\Omega$ contains $\Omega_0$. If the maximum over $\Omega$ falls in $\Omega_0$ then
$$L(\hat{\Omega}_0) = L(\hat{\Omega}_a)$$
Evidence that $H_0$ is false (and $H_a$ is true) is that
$$L(\hat{\Omega}_0) \ll L(\hat{\Omega}_a)$$
Likelihood ratio test (LRT) statistic:
$$\lambda = \frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})}$$
Likelihood Ratio Test (LRT)
This is always true: $0 \le \lambda \le 1$
Rejection region:
$$RR: \lambda = \frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})} \le k,$$
where $k$ is chosen to achieve a preselected level, $\alpha$.
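LRT: Normal Example
Let $Y_1, \ldots, Y_n$ be a random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is unknown. Then
$$\Theta = (\mu, \sigma^2).$$
We wish to test
$$H_0: \mu = \mu_0 \quad \text{versus} \quad H_a: \mu \neq \mu_0$$
$$\Omega_0 = \{(\mu, \sigma^2) : \mu = \mu_0,\ \sigma^2 > 0\}$$
$$\Omega = \{(\mu, \sigma^2) : -\infty < \mu < +\infty,\ \sigma^2 > 0\}$$
Use the LRT method to find the RR for a level $\alpha$ test.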
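The slides leave this derivation to be worked out. A sketch of the standard argument follows; it uses the usual maximum likelihood estimates for a normal sample, and the symbols $\hat{\sigma}^2$, $\hat{\sigma}_0^2$, $s$ and $t$ are notation introduced here rather than taken from the slides.

```latex
\begin{align*}
\text{Over } \Omega:\quad
  & \hat{\mu} = \bar{y}, \qquad
    \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^2, \qquad
    L(\hat{\Omega}) = (2\pi\hat{\sigma}^2)^{-n/2} e^{-n/2} \\
\text{Over } \Omega_0:\quad
  & \mu = \mu_0, \qquad
    \hat{\sigma}_0^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \mu_0)^2, \qquad
    L(\hat{\Omega}_0) = (2\pi\hat{\sigma}_0^2)^{-n/2} e^{-n/2} \\
\lambda
  &= \frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})}
   = \left( \frac{\hat{\sigma}^2}{\hat{\sigma}_0^2} \right)^{n/2}
   = \left( 1 + \frac{t^2}{n-1} \right)^{-n/2},
  \qquad t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}, \quad
  s^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i - \bar{y})^2
\end{align*}
```

The last equality uses $\sum (y_i - \mu_0)^2 = \sum (y_i - \bar{y})^2 + n(\bar{y} - \mu_0)^2$. Since $\lambda$ is decreasing in $t^2$, the region $\lambda \le k$ is equivalent to $|t| \ge c$ for some constant $c$, and under $H_0$ the statistic $t$ has a $t$ distribution with $n - 1$ degrees of freedom, so the level $\alpha$ rejection region is $|t| > t_{\alpha/2,\, n-1}$, the two-sided one-sample t-test.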
LRT: Large Sample RR
Theorem 2 Let $Y_1, \ldots, Y_n$ have a joint likelihood $L(\Theta)$. Let
$r_0$ = number of free parameters in $\Omega_0$
$r$ = number of free parameters in $\Omega$
Assuming that certain regularity conditions hold, then under $H_0$ and for large $n$, $-2\ln(\lambda)$ has approximately a $\chi^2$ distribution with $r - r_0$ degrees of freedom.
LRT: Normal Example
Use Theorem 2 to derive the level $\alpha$ large sample RR.
LRT Example 2: Poisson Dispersion Test
Let $X_1, X_2, \ldots, X_n$ be independent rvs with $X_i \sim \text{Poisson}(\theta_i)$, $i = 1, 2, \ldots, n$, with pmf:
$$P(X_i = x_i) = \frac{e^{-\theta_i} \theta_i^{x_i}}{x_i!}, \quad x_i = 0, 1, \ldots;\ \theta_i > 0$$
We wish to test
$$H_0: \theta_i = \theta, \quad i = 1, 2, \ldots, n$$
versus
$$H_a: \theta_i \text{ are not all equal.}$$
Assume that $n$ is large. Construct an approximate level $\alpha$ LRT.
Example 2: Poisson Dispersion Test
NIST test of asbestos fibers; number of fibers on 23 squares of a grid:
31 29 19 18 31 28 34 27 34 30 16 18 26 27 27 18 24 22 28 24 21 17 24
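A sketch of the approximate LRT on these counts. It assumes the standard result that the Poisson MLE is the sample mean under $H_0$ and the individual count $x_i$ under the unrestricted model, which gives $-2\ln(\lambda) = 2\sum x_i \ln(x_i/\bar{x})$ with $n - 1 = 22$ degrees of freedom. The chi-square tail helper below is valid only for even degrees of freedom and is an illustrative stand-in for a statistics library.

```python
import math

def poisson_dispersion_stat(x: list) -> float:
    """-2 ln(lambda) = 2 * sum x_i * ln(x_i / xbar); zero counts contribute 0."""
    xbar = sum(x) / len(x)
    return 2.0 * sum(v * math.log(v / xbar) for v in x if v > 0)

def chi2_sf_even_df(stat: float, df: int) -> float:
    """P(chi-square_df > stat) via the closed-form finite sum, even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this helper supports positive even df only")
    half = stat / 2.0
    term, total = 1.0, 1.0            # j = 0 term of sum_{j < df/2} half**j / j!
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

counts = [31, 29, 19, 18, 31, 28, 34, 27, 34, 30, 16, 18,
          26, 27, 27, 18, 24, 22, 28, 24, 21, 17, 24]
stat = poisson_dispersion_stat(counts)
df = len(counts) - 1                  # r - r0 = n - 1 = 22
p_value = chi2_sf_even_df(stat, df)

print(f"-2 ln(lambda) = {stat:.2f} on {df} df, approximate p-value = {p_value:.3f}")
```

At level 0.05 the statistic is compared with the upper 5% point of $\chi^2_{22}$ (about 33.92); equivalently, $H_0$ is rejected when the p-value falls below 0.05.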
Example 3: Binomials
Let $X_i \sim \text{Binomial}(n_i, p_i)$, $i = 1, 2$, be independent. We wish to test
$$H_0: p_1 = p_2 \quad \text{versus} \quad H_a: p_1 \neq p_2$$
$$\Omega_0 = \{(p_1, p_2) : 0 < p_1 = p_2 < 1\}$$
$$\Omega_a = \{(p_1, p_2) : 0 < p_1 < 1,\ 0 < p_2 < 1,\ p_1 \neq p_2\}$$
Both hypotheses are composite. Suppose the $n_i$ are large. Carry out an approximate level $\alpha$ LRT.
Example 3: Binomials
Clinical Trial: Allergy Medicine versus Placebo
Randomization of 3774 subjects:
Allergy medicine group: $n_1 = 2103$, $x_1 = 547$ reported headaches
Placebo group: $n_2 = 1671$, $x_2 = 368$ reported headaches
Test whether the proportion of those reporting headaches is different in the two groups, using significance level $\alpha = 0.05$.
Ref: Michael Sullivan, III. (2004) Statistics: Informed Decisions Using Data.
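A sketch of the approximate LRT for the clinical trial data. It assumes the usual MLEs: the pooled proportion under $H_0$ and the separate sample proportions under the unrestricted model, so $r - r_0 = 2 - 1 = 1$. The $\chi^2_1$ tail probability is obtained from the standard normal, since $P(\chi^2_1 > s) = 2\,(1 - \Phi(\sqrt{s}))$. Function names are illustrative.

```python
from math import log, sqrt
from statistics import NormalDist

def binom_loglik(x: int, n: int, p: float) -> float:
    """Binomial log-likelihood without the log C(n, x) term, which cancels in the ratio."""
    return x * log(p) + (n - x) * log(1 - p)

def two_binomial_lrt(x1: int, n1: int, x2: int, n2: int):
    """Return (-2 ln(lambda), approximate p-value); requires 0 < x_i < n_i."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)                 # MLE under H0: p1 = p2
    loglik_full = binom_loglik(x1, n1, p1_hat) + binom_loglik(x2, n2, p2_hat)
    loglik_null = binom_loglik(x1, n1, p_pooled) + binom_loglik(x2, n2, p_pooled)
    stat = max(2.0 * (loglik_full - loglik_null), 0.0)   # guard against rounding below 0
    p_value = 2.0 * (1.0 - NormalDist().cdf(sqrt(stat))) # chi-square(1) upper tail
    return stat, p_value

stat, p_value = two_binomial_lrt(x1=547, n1=2103, x2=368, n2=1671)
reject = p_value < 0.05
print(f"-2 ln(lambda) = {stat:.2f}, approximate p-value = {p_value:.4f}, reject H0: {reject}")
```

The statistic is close to the square of the usual two-sample z statistic for proportions, so the two procedures lead to the same conclusion here.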
Summary: LRT
The likelihood ratio approach does not guarantee an optimum test (unlike the N-P Lemma).
Using the likelihood ratio approach will customarily provide an acceptable test.
Unlike the N-P Lemma, the likelihood ratio approach can be applied where the underlying model has nuisance parameters (parameters not of particular interest).
Summary of Chapter 10
Four Basic Elements of a Statistical Test: (1) $H_0$; (2) $H_a$; (3) test statistic; (4) rejection region
Error probabilities:
  Type I error probability (level): $\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})$ (sending an innocent person to jail)
  Type II error probability: $\beta(\theta_a) = P(\text{fail to reject } H_0 \mid H_a \text{ is true with } \theta = \theta_a)$ (setting a guilty person free)
  Power($\theta$) = $P(\text{reject } H_0 \text{ with parameter value } \theta)$
Large sample tests
  Suppose $\hat{\theta} \sim N(\theta, \sigma^2_{\hat{\theta}})$ exactly or approximately. For instance, apply the CLT when $\hat{\theta}$ is consistent and unbiased for $\theta$, and $n$ is large.
  $\sigma^2_{\hat{\theta}}$ is known or can be estimated consistently by $\hat{\sigma}^2_{\hat{\theta}}$
  $H_0: \theta = \theta_0$ versus $H_a: \theta > \theta_0$
  Test statistic: $T = \hat{\theta}$ or $Z = (\hat{\theta} - \theta_0)/\sigma_{\hat{\theta}}$
  RR: $Z > z_\alpha$, or equivalently $\hat{\theta} > \theta_0 + z_\alpha \sigma_{\hat{\theta}}$
  P-value $= P(Z > z_{obs})$ (for right-tailed z-test)
  Typical examples:
    Test on population means (one-sample z-test)
    Test on two population mean difference (two-sample z-test)
    Test on population proportions (one-sample z-test)
    Test on two proportion difference (two-sample z-test)
  Calculation of $\beta(\theta_a)$ and Power for a level $\alpha$ test procedure
  Determination of sample size to control the Type II error probability at level $\beta$ for a level $\alpha$ test procedure. Examples discussed include
    Test on means with one-sample z test (SAT prep course)
    Test on means with two-sample z test (WMS 10.44)
    Test on population proportions with one-sample z test (Beer-tasting)
Test-CI relationship: CIs correspond to the complements of RRs; these complements are also called acceptance regions
p-value
  Observed significance level
  Smallest level of significance, $\alpha$, for which the observed value indicates that $H_0$ should be rejected
  Tail area captured by the observed test statistic
  Reject $H_0$ when p-value < $\alpha$
Testing mean in small normal samples
  One sample t-test
  Two sample t-test (assuming equal variance)
Testing variance in small normal samples
  $\chi^2$-test for one population variance
  F-test for testing the equality of two population variances
Neyman-Pearson Lemma
  Use the N-P Lemma to construct the MP $\alpha$-level test for testing simple hypotheses $H_0$ and $H_a$
  Based on the MP test, construct the uniformly MP $\alpha$-level test for composite hypotheses
  Only one unknown parameter is involved
Likelihood-ratio test
  Test statistic: $\lambda = \dfrac{L(\hat{\Omega}_0)}{L(\hat{\Omega})}$
  $L(\hat{\Omega}_0)$ (restricted likelihood): the maximum value of the likelihood when the parameters are restricted (and reduced in number) based on the assumption of $H_0$
  $L(\hat{\Omega})$ (unrestricted likelihood): the maximum value of the likelihood when the parameters are unrestricted, i.e. obtained under the entire parameter space
  Construct the RR for a level $\alpha$ test based on the sampling distribution of $\lambda$
  Large sample procedure: for large $n$, when the distribution satisfies some regularity conditions, $-2\ln(\lambda) \sim \chi^2_{r - r_0}$ approximately, where
    $r$: number of free parameters in the whole parameter space (unrestricted)
    $r_0$: number of free parameters in the restricted parameter space (under $H_0$)
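The summary items on the Type II error probability and sample size determination for the right-tailed one-sample z-test can be sketched as follows. The formulas used are the standard ones, $\beta(\mu_a) = \Phi\left(z_\alpha - (\mu_a - \mu_0)\sqrt{n}/\sigma\right)$ and $n = (z_\alpha + z_\beta)^2 \sigma^2 / (\mu_a - \mu_0)^2$; the numerical values are hypothetical and are not the SAT or WMS examples mentioned above.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def beta_right_tailed(mu_a: float, mu0: float, sigma: float, n: int, alpha: float) -> float:
    """Type II error probability at mu_a for the level-alpha right-tailed z-test."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(z_alpha - (mu_a - mu0) * sqrt(n) / sigma)

def sample_size(mu_a: float, mu0: float, sigma: float, alpha: float, beta: float) -> int:
    """Smallest n with Type II error at most beta at mu_a, rounded up."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(1 - beta)
    return ceil((z_alpha + z_beta) ** 2 * sigma ** 2 / (mu_a - mu0) ** 2)

# Hypothetical design values.
n_needed = sample_size(mu_a=16.0, mu0=15.0, sigma=3.0, alpha=0.05, beta=0.05)
achieved = beta_right_tailed(16.0, 15.0, 3.0, n_needed, 0.05)
print(f"n = {n_needed}, achieved beta = {achieved:.4f}, power = {1 - achieved:.4f}")
```

Rounding up guarantees the achieved Type II error probability does not exceed the target; one fewer observation would leave it slightly above.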