10 Hypothesis Testing
Hypothesis testing is used to investigate whether or not data are
consistent with some theory, when that theory can be quantified
through a particular value of a (population) parameter.
10.1 Four Basic Elements of a Hypothesis Test
Null hypothesis $H_0$
Alternative hypothesis $H_a$
Test statistic
Rejection region, RR (also called critical region)
1. The Null Hypothesis: denoted by $H_0$.
The null hypothesis $H_0$ is an assertion about the population
(parameter). The purpose of hypothesis testing is to test the
viability of the null hypothesis in the light of experimental data.
E.g. An experiment on a new antidepressant drug:
Ten people suffering from depression were sampled and treated
with the new drug, and the level of depression of all subjects
was measured after 12 weeks, denoted by $Y_1, \dots, Y_n$.
We want to compare the mean depression level $\mu$ of the
patients taking the drug with that of patients not taking any drug
(known to be $\mu_0 = 6$, say).
The null hypothesis would be designated by the following symbols:
$H_0: \mu = 6$.
In this course, we will usually consider the simple null
hypothesis; that is, there is only one possible value of $\theta$ under $H_0$.
2. The Alternative Hypothesis: denoted by $H_a$.
The alternative hypothesis $H_a$ describes values of $\theta$ other than
those specified in $H_0$. It is usually the hypothesis that we seek to
support based on the information contained in the sample data set.
In the previous example, if the researcher believes that the mean
depression level $\mu$ for patients taking the new drug is
smaller than $\mu_0 = 6$, i.e. the drug is effective in reducing the
depression level, the researcher will use
$H_a: \mu < 6$ (left-tailed)
larger than $\mu_0 = 6$, i.e. the drug is not effective, the researcher
will use
$H_a: \mu > 6$ (right-tailed)
different from $\mu_0 = 6$, the researcher will use
$H_a: \mu \neq 6$ (two-tailed)
Remark: Notice that $H_0$ false doesn't necessarily mean that $H_a$ is
true (unless the union of $H_0$ and $H_a$ constitutes the entire
parameter space).
In general, the alternative is chosen to reflect the researcher's
belief about the parameter. Therefore, $H_a$ is sometimes called the
researcher's hypothesis. The aim is to see if the researcher's
hypothesis is supported by the data set.
3. The Test Statistic: the test statistic is a statistic that is used
to test $H_0$ versus $H_a$. We make our decision by comparing the
observed value of the test statistic to its sampling distribution
under $H_0$. Recall that a statistic is a function of the observed data
$Y_1, \dots, Y_n$ used for inference about (population) parameters.
For the antidepressant drug example, a test statistic would be
based on the unbiased estimator of $\mu$:
$T = \bar{Y}$
Most often a test statistic is based on a MVUE or MLE of the
parameter of interest that describes $H_0$.
If the observed value of the test statistic is consistent with its
sampling distribution under $H_0$, then there is not enough
evidence for $H_a$.
If the observed value of the test statistic is not consistent with
its sampling distribution under $H_0$, and is in the direction of the
sampling distribution specified under $H_a$, then there is enough
evidence to reject $H_0$ (and support $H_a$).
4. The Rejection Region (RR): The rejection region (RR)
specifies the values of the test statistic for which $H_0$ is rejected.
If $t_{obs} \in RR$, reject $H_0$
If $t_{obs} \notin RR$, do not reject $H_0$
The RR is usually located in the tails of the sampling distribution
of $T$ derived under $H_0$.
In the depression example, suppose $RR = \{\bar{Y} \le 4\}$ and that
$\bar{y} = 3$ is observed. Then we reject $H_0$ and conclude that there
is evidence that the drug, on average, reduces depression
levels. (The sample mean, $\bar{y} = 3$, is significantly lower than
the hypothetical mean, $\mu = 6$.)
The above four elements: $H_0$, $H_a$, test statistic $T$, and RR are the
building blocks of a statistical test. Any statistical test should include
all of the above four elements.
Another Hypothesis Testing Example
Assessing ESP (Extra-sensory perception) ability
(fabricated data)
Hypotheses:
$H_0$: Rachael does not have ESP (= random guessing)
$H_a$: Rachael has ESP
Experiment:
A deck of 52 cards has 26 red and 26 black. The cards are shuffled
and one selected at random. Rachael guesses the color of the card.
Data from experiment:
Carry out n = 20 repetitions, shuffling each time, observing correct or
not correct.
Test statistic:
T = Number of correct responses out of n
Distribution of Test Statistic under $H_0$:
$T \sim \mathrm{Bin}(n = 20, p)$,
where p is the success rate, with $H_0: p = 0.5$. (Trials assumed
independent)
Restate hypotheses in terms of parameters:
$H_0: p = 0.5$ [pure guessing]
$H_a: p > 0.5$ [responses informed by ESP]
Rejection Region $RR = \{t_{obs} \ge 15\}$.
Outcome of the experiment: $t_{obs} = 16$
Conclusion:
Since $t_{obs}$ falls in RR, i.e. $t_{obs} = 16 \in \{t_{obs} \ge 15\}$, the data
contradict the null hypothesis $H_0$. We reject $H_0$ as implausible and
conclude that the observed rate of success ($\hat{p} = .8$) is significantly
higher than $p_0 = 0.5$.
There is evidence of ESP.
Hypothesis Testing: Errors
Because we are choosing between $H_0$ and $H_a$ based on the sample
data, there is a chance that we make an error.
                 $t_{obs} \in RR$: Reject $H_0$      $t_{obs} \notin RR$: Do not reject $H_0$
$H_0$ true       Type I error                        (correct decision)
$H_0$ false      (correct decision)                  Type II error
Two types of errors can be made in reaching a decision.
Type I error: Reject $H_0$ when $H_0$ is true.
Type II error: Fail to reject $H_0$ when $H_0$ is false.
Error Probabilities
Probability of making a Type I error:
$\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})$
$\alpha$ is called the Level of significance or the Level of the test.
Probability of making a Type II error (at a particular/given value
$\theta = \theta_a$ in $H_a$):
$\beta = \beta(\theta_a) = P(\text{fail to reject } H_0 \mid \theta = \theta_a)$.
Error Probabilities
Example 10.1.1 In the ESP example, calculate $\alpha$, and $\beta$ when p = .7
The Type I error probability:
$\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})$
The Type II error probability for p = 0.7:
$\beta(0.7) = P(\text{fail to reject } H_0 \mid p = 0.7)$
$= P(T < 15 \mid p = 0.7)$
$= P(T \le 14 \mid p = 0.7)$
$= B(14; n = 20, p = 0.7)$
$= 0.584$.
Practice: calculate $\beta(0.9)$.
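The binomial calculations in Example 10.1.1 can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `binom_cdf` is an assumption introduced here for illustration and plays the role of the table function B(k; n, p).

```python
from math import comb

def binom_cdf(k, n, p):
    """P(T <= k) for T ~ Bin(n, p), summed directly from the pmf."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

n, cutoff = 20, 15  # rejection region RR = {T >= 15}

# Type I error: P(T >= 15 | p = 0.5) = 1 - P(T <= 14 | p = 0.5)
alpha = 1 - binom_cdf(cutoff - 1, n, 0.5)

# Type II error at p = 0.7: P(T <= 14 | p = 0.7)
beta_07 = binom_cdf(cutoff - 1, n, 0.7)

print(f"alpha     = {alpha:.4f}")
print(f"beta(0.7) = {beta_07:.3f}")
```

Calling `binom_cdf(cutoff - 1, n, 0.9)` in the same way gives the practice quantity.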
Error Probabilities
Type I error probability (also referred to as level of
significance, or less formally, false positive rate)
$\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})$
Type II error probability (also referred to as false negative rate):
$\beta = \beta(\theta_a) = P(\text{fail to reject } H_0 \mid \theta = \theta_a)$.
$\mathrm{Power}(\theta_a) = 1 - \beta(\theta_a)$:
$1 - \beta(\theta_a) = P(\text{reject } H_0 \mid \theta = \theta_a)$.
Remark: Ideally we would like to reduce both $\alpha$ and $\beta$. However, with
a fixed sample size n, we cannot reduce both of them. So we generally
fix $\alpha$ to a small value (say 0.05 or 0.01), and construct a RR (in this
way, $\beta$ will be minimized).
Example 10.1.2 Let $Y \sim \mathrm{Uniform}(\theta, \theta + 1)$. Consider $H_0: \theta = 0$
versus $H_a: \theta > 0$. The test procedure: reject $H_0$ if $Y > 0.95$.
1. Calculate the level of significance of this test.
2. Calculate $\beta$ when $\theta = 0.5$.
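10.2 Large Sample Tests
Suppose $Y_1, \dots, Y_n$ is a rs with n large and from a distribution with
parameter of interest $\theta$. Let $\hat\theta$ denote an estimator with large sample
distribution $N(\theta, \sigma_{\hat\theta}^2)$.
For example, if $\hat\theta$ is a consistent and unbiased estimator of $\theta$, then by the CLT
we often have, for large n,
$\hat\theta \overset{d}{\approx} N(\theta, \sigma_{\hat\theta}^2)$.
We assume that $\sigma_{\hat\theta}^2$ is known or can be estimated by a consistent
estimator $\hat\sigma_{\hat\theta}^2$.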
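Large Sample Tests
Notational convention: $SD(\hat\theta) = \sigma_{\hat\theta}$ and $\widehat{SD}(\hat\theta) = \hat\sigma_{\hat\theta}$, $V(\hat\theta) = \sigma_{\hat\theta}^2$.
Null and alternative Hypotheses:
$H_0: \theta = \theta_0$ versus $H_a: \theta > \theta_0$,
where the value $\theta_0$ is specified.
Test statistic: $T = \hat\theta$
$RR = \{\hat\theta : \hat\theta > k\}$ (value of k to be determined)
Specify the level of significance, $\alpha$ (say 0.05 or 0.01), i.e., the
largest Type I error probability that can be tolerated.
We determine the value of k so that the corresponding Type I error
probability equals the pre-specified level $\alpha$. That is, we determine
k by solving the equation:
$\alpha \overset{set}{=} P(\hat\theta > k \mid \theta = \theta_0)$
$= P\left( \frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}} > \frac{k - \theta_0}{\sigma_{\hat\theta}} \,\Big|\, \theta = \theta_0 \right)$
$\approx P\left( Z > \frac{k - \theta_0}{\sigma_{\hat\theta}} \,\Big|\, \theta = \theta_0 \right)$
$= P(Z > z_\alpha)$.
Therefore, to solve this equation for k, we need to set
$\frac{k - \theta_0}{\sigma_{\hat\theta}} = z_\alpha \;\Rightarrow\; k = \theta_0 + z_\alpha \sigma_{\hat\theta}$
Thus,
$RR = \{\hat\theta : \hat\theta > \theta_0 + z_\alpha \sigma_{\hat\theta}\} = \left\{ \frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}} > z_\alpha \right\}$.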
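Large Sample Test: Summary Procedure
State hypotheses (right-tailed alternative):
$H_0: \theta = \theta_0$ versus $H_a: \theta > \theta_0$
Test Statistic:
$Z = \frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}}$, [$Z \overset{approx}{\sim} N(0, 1)$ when $\theta = \theta_0$]
Rejection region:
$RR = \{z_{obs} : z_{obs} > z_\alpha\}$.
If left-tailed test: $H_0: \theta = \theta_0$ versus $H_a: \theta < \theta_0$, then
$RR = \{z_{obs} : z_{obs} < -z_\alpha\}$.
If two-tailed test: $H_0: \theta = \theta_0$ versus $H_a: \theta \neq \theta_0$, then
$RR = \{z_{obs} : |z_{obs}| > z_{\alpha/2}\}$.
Note that the alternatives are paired with the corresponding RRs.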
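Large Sample Test: Unknown Variance
Suppose that $\sigma_{\hat\theta}$ is unknown and that $\hat\sigma_{\hat\theta} / \sigma_{\hat\theta} \overset{p}{\to} 1$,
i.e. $\hat\sigma_{\hat\theta}$ is a consistent estimator of $\sigma_{\hat\theta}$. Then
$Z = \frac{\hat\theta - \theta_0}{\hat\sigma_{\hat\theta}} = \frac{(\hat\theta - \theta_0)/\sigma_{\hat\theta}}{\hat\sigma_{\hat\theta}/\sigma_{\hat\theta}}$
Since $\hat\sigma_{\hat\theta}/\sigma_{\hat\theta} \overset{p}{\to} 1$, and $\frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}} \overset{d}{\to} N(0, 1)$ under $H_0$,
it follows by Slutsky's Theorem that
$Z = \frac{\hat\theta - \theta_0}{\hat\sigma_{\hat\theta}} \overset{d}{\to} N(0, 1)$.
Therefore, the same test procedure follows.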
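Large Sample Test: Unknown Variance
Use a consistent estimator of $\sigma_{\hat\theta}$ in the Z-test statistic.
Test Statistic:
$Z = \frac{\hat\theta - \theta_0}{\widehat{SD}(\hat\theta)}$, [$Z \overset{approx}{\sim} N(0, 1)$ when $\theta = \theta_0$]
Right-tailed test: $H_0: \theta = \theta_0$ vs $H_a: \theta > \theta_0$.
$RR = \{z_{obs} : z_{obs} > z_\alpha\}$.
Left-tailed test: $H_0: \theta = \theta_0$ vs $H_a: \theta < \theta_0$.
$RR = \{z_{obs} : z_{obs} < -z_\alpha\}$.
Two-tailed test: $H_0: \theta = \theta_0$ vs $H_a: \theta \neq \theta_0$.
$RR = \{z_{obs} : |z_{obs}| > z_{\alpha/2}\}$.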
Large Sample Test: Examples
Example 10.2.1 A company wishes to test its claim that the average
lifetime of the tire they sell is 20,000 miles. The experiment yields
n = 36 observations with the sample mean $\bar{y} = 19{,}375$ miles. Carry
out the hypothesis test and conclude at significance level $\alpha = 0.01$.
Suggested procedures:
Parameter of interest: $\theta = \mu$ = mean life time for the population
Construct null and alternative hypotheses:
$H_0: \mu = 20{,}000$ vs $H_a: \mu < 20{,}000$
Compute the test statistic:
Construct the rejection region at level 0.01
Conclude:
Q: What if $H_a: \mu \neq 20{,}000$?
Example 10.2.2 (10.19 of WMS) A sample of 40 independent
readings on the voltage for this circuit gave a sample mean of 128.6,
and standard deviation 2.1. Test the hypothesis that the mean output
voltage is 130 against the alternative that it is less than 130. Use a
test with level 0.05.
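The large-sample procedure can be sketched for Example 10.2.2 as follows. This is a minimal illustration with the Python standard library; the helper name `z_statistic` is hypothetical, and the sample standard deviation is plugged in for the unknown $\sigma$, as the unknown-variance slides permit for large n.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(estimate, null_value, std_error):
    """Large-sample Z statistic: (theta_hat - theta_0) / SD(theta_hat)."""
    return (estimate - null_value) / std_error

# Example 10.2.2: voltage readings
n, ybar, s = 40, 128.6, 2.1
mu0, alpha = 130.0, 0.05

z_obs = z_statistic(ybar, mu0, s / sqrt(n))   # estimated SD of the mean is s / sqrt(n)
z_alpha = NormalDist().inv_cdf(1 - alpha)      # upper-alpha quantile of N(0, 1)

# Left-tailed test: reject H0 when z_obs < -z_alpha
reject = z_obs < -z_alpha
print(f"z_obs = {z_obs:.3f}, critical value = {-z_alpha:.3f}, reject H0: {reject}")
```

The observed statistic lies far below the left-tail critical value, so under these assumptions $H_0: \mu = 130$ is rejected at level 0.05.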
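Large Sample Test: Proportions
Data: $Y \sim \mathrm{Bin}(n, p)$
Hypotheses: $H_0: p = p_0$ vs $H_a: p > p_0$ for a given value of
$p_0 \in (0, 1)$. The level $\alpha$ is specified.
Parameter of interest: $\theta = p$
Unbiased and consistent estimator $\hat\theta = \hat{p} = Y/n$
$\sigma_{\hat{p}}^2 = V(Y/n) = p(1 - p)/n$. So under $H_0$, $\sigma_{\hat{p}}^2 = p_0(1 - p_0)/n$.
Test statistic:
$Z = \frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
[by CLT, $Z \overset{d}{\to} N(0, 1)$ under $H_0$]
$RR = \{z_{obs} : z_{obs} > z_\alpha\}$.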
Large Sample Test: Proportion
Example 10.2.3 Each member of a panel of 100 tasters was
presented with three glasses of beer in random order, one of which was
different from the other two (e.g. AAB). Each taster was asked to
identify which beer was different. Let p = P(B is correctly identified).
If the tasters are unable to distinguish between the beers we would
expect p = 1/3. If they are able to distinguish we expect p > 1/3.
Suppose among 100 tasters, 40 answered correctly. Are tasters able to
distinguish? Carry out the hypothesis test at level 0.05.
Construct the appropriate null and alternative hypotheses:
Calculate the test statistic value:
Construct the rejection region at level 0.05
Conclude:
(BTW, what is the population here?)
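One way to carry out the steps of Example 10.2.3 numerically is sketched below, assuming the right-tailed large-sample test for a proportion with the standard error computed under $H_0$. Variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Example 10.2.3: beer tasting, H0: p = 1/3 versus Ha: p > 1/3
n, y = 100, 40
p0, alpha = 1 / 3, 0.05

p_hat = y / n
se_null = sqrt(p0 * (1 - p0) / n)          # SD of p_hat when H0 is true
z_obs = (p_hat - p0) / se_null
z_alpha = NormalDist().inv_cdf(1 - alpha)

reject = z_obs > z_alpha                    # right-tailed rejection region
print(f"z_obs = {z_obs:.3f}, z_alpha = {z_alpha:.3f}, reject H0: {reject}")
```

Here the observed statistic (about 1.41) does not exceed the critical value (about 1.645), so at level 0.05 the data do not provide enough evidence that the tasters can distinguish the beers.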
Large Sample Test: Other
Table 8.1 on page 397:
Example 10.2.4 Let $X_1, \dots, X_n$ be a random sample from the
exponential distribution with pdf
$f(x) = \frac{1}{\theta} e^{-x/\theta}$, $0 < x < +\infty$, $0 < \theta < +\infty$.
1. Find the large sample distribution of the method of moments
estimator of $\theta$, $\hat\theta$.
2. Set up a large sample test of $H_0: \theta = \theta_0$ versus $H_a: \theta < \theta_0$
using level $\alpha$. Specify the rejection region.
3. Using a random sample of size n = 64 and level $\alpha = 0.05$, test the
hypothesis $H_0: \theta = 10$ versus $H_a: \theta < 10$ when the sample mean
is $\bar{x} = 7.7$. State your conclusion.
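Part 3 of Example 10.2.4 can be sketched numerically. The sketch assumes the method of moments estimator is the sample mean, $\hat\theta = \bar{X}$, whose variance is $\theta^2/n$ for the exponential distribution, so that the standard error under $H_0$ is $\theta_0/\sqrt{n}$.

```python
from math import sqrt
from statistics import NormalDist

# Example 10.2.4, part 3: H0: theta = 10 versus Ha: theta < 10
n, xbar = 64, 7.7
theta0, alpha = 10.0, 0.05

se_null = theta0 / sqrt(n)                 # SD of X-bar under H0 (exponential: sd = theta)
z_obs = (xbar - theta0) / se_null
z_alpha = NormalDist().inv_cdf(1 - alpha)

reject = z_obs < -z_alpha                   # left-tailed rejection region
print(f"z_obs = {z_obs:.2f}, critical value = {-z_alpha:.3f}, reject H0: {reject}")
```

The value $z_{obs} = -1.84$ agrees with the one quoted later in the p-value section for this example.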
Large Sample Test
Example: Two-sample Z-test
Example 10.2.5 Samples of 36 males and 40 females were tested to
determine their temperature preference. Assume that variances are
known.
Samples:
$Y_{1,1}, Y_{1,2}, \dots, Y_{1,n_m}$, and $Y_{2,1}, Y_{2,2}, \dots, Y_{2,n_f}$.
Males: $n_m = 36$, $\sigma_m^2 = 4.0$, $\bar{y}_m = 74.6$
Females: $n_f = 40$, $\sigma_f^2 = 2.5$, $\bar{y}_f = 76.5$
Do females and males differ with respect to their temperature
preferences? Conduct the test at level $\alpha = 0.01$.
Let $\mu_m$ and $\mu_f$ denote the mean temperature preference of males and
females, respectively.
Solution:
Solution, continued.
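A possible numerical solution to Example 10.2.5 is sketched below. It assumes the two-tailed test of $H_0: \mu_m - \mu_f = 0$ with known variances, so that the variance of the difference of the sample means is $\sigma_m^2/n_m + \sigma_f^2/n_f$.

```python
from math import sqrt
from statistics import NormalDist

# Example 10.2.5: temperature preference, H0: mu_m - mu_f = 0 (two-tailed)
n_m, var_m, ybar_m = 36, 4.0, 74.6
n_f, var_f, ybar_f = 40, 2.5, 76.5
alpha = 0.01

se_diff = sqrt(var_m / n_m + var_f / n_f)       # SD of (Ybar_m - Ybar_f)
z_obs = (ybar_m - ybar_f - 0.0) / se_diff
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # z_{alpha/2}

reject = abs(z_obs) > z_crit
print(f"z_obs = {z_obs:.2f}, z_crit = {z_crit:.3f}, reject H0: {reject}")
```

With $|z_{obs}| = 4.56$ well above $z_{0.005} \approx 2.576$, the sketch rejects $H_0$ at level 0.01.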
Note that if $\sigma_m^2$ and $\sigma_f^2$ are unknown, since both sample sizes are
large, we would substitute their estimators from each sample, $s_m^2$ and
$s_f^2$, where, for example, $S_m^2 = \frac{1}{n_m - 1} \sum_{j=1}^{n_m} (Y_{1,j} - \bar{Y}_m)^2$. We may do
this because $S_m^2$ (and $S_f^2$) are consistent estimators of $\sigma_m^2$ (and $\sigma_f^2$).
Difference between two population proportions
Example: 10.33 on WMS. A political researcher believes that the
fraction $p_1$ of Republicans strongly in favor of the death penalty is
greater than the fraction $p_2$ of Democrats strongly in favor of the
death penalty. He acquired independent random samples of 200
Republicans and 200 Democrats and found 46 Republicans and 34
Democrats strongly favoring the death penalty. Does this evidence
provide statistical support for the researcher's belief? Use $\alpha = 0.05$.
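The death-penalty example can be approached with a large-sample Z-test for $p_1 - p_2$. The sketch below assumes the common choice of a pooled proportion estimate for the standard error under $H_0: p_1 = p_2$; the slides do not state which standard error is intended, so treat the pooling as an assumption.

```python
from math import sqrt
from statistics import NormalDist

# H0: p1 - p2 = 0 versus Ha: p1 - p2 > 0
n1, y1 = 200, 46    # Republicans
n2, y2 = 200, 34    # Democrats
alpha = 0.05

p1_hat, p2_hat = y1 / n1, y2 / n2
p_pooled = (y1 + y2) / (n1 + n2)            # common proportion estimate under H0 (assumption)
se_null = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

z_obs = (p1_hat - p2_hat) / se_null
z_alpha = NormalDist().inv_cdf(1 - alpha)
reject = z_obs > z_alpha
print(f"z_obs = {z_obs:.2f}, z_alpha = {z_alpha:.3f}, reject H0: {reject}")
```

Under this assumption $z_{obs} = 1.5 < 1.645$, so the evidence would not support the researcher's belief at level 0.05.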
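10.3 Sample Size and Power
Suppose $Y_1, Y_2, \dots, Y_n$ is a rs from $N(\mu, \sigma^2)$. We wish to test
$H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$.
Question: can we find a sample size n which guarantees that the
Type I & Type II error probabilities will not exceed $\alpha$ and $\beta$
respectively? Here $\alpha$ and $\beta$ are prespecified values.
Recall the level $\alpha$ test procedure for this hypothesis testing problem.
Note that under $H_0: \mu = \mu_0$,
$Z = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$.
Therefore from the large sample test constructed earlier, we reject $H_0$
when
$z_{obs} > z_\alpha$.
That is, the Rejection Region (RR) is:
$RR = \{\bar{Y} : \bar{Y} > k\}$, where $k = \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$.
Therefore, the Type II error probability is:
$\beta(\mu_a) = P(\text{Do not reject } H_0 \mid \mu = \mu_a)$
$= P(\bar{Y} \le k \mid \mu = \mu_a)$
$= P\left( \bar{Y} \le k \mid \bar{Y} \sim N(\mu_a, \sigma^2/n) \right)$
$= P\left( \frac{\bar{Y} - \mu_a}{\sigma/\sqrt{n}} < \frac{k - \mu_a}{\sigma/\sqrt{n}} \right)$
$= P\left( Z < \frac{k - \mu_a}{\sigma/\sqrt{n}} \right)$,
where $k = \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$.
We need $P\left( Z < \frac{k - \mu_a}{\sigma/\sqrt{n}} \right) = \beta$. Note that $P(Z < -z_\beta) = \beta$. Therefore,
we need to set
$-z_\beta = \frac{k - \mu_a}{\sigma/\sqrt{n}} = \frac{\mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}} - \mu_a}{\sigma/\sqrt{n}} = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} + z_\alpha$
Solving
$-z_\beta = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} + z_\alpha$
for n:
$\sqrt{n}\, \frac{\mu_0 - \mu_a}{\sigma} = -z_\beta - z_\alpha$
That is
$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_0 - \mu_a)^2} = \left[ \frac{(z_\alpha + z_\beta)\sigma}{\mu_0 - \mu_a} \right]^2$
Remark: the value of n depends on $\mu_a$. So we cannot find a sample
size n irrespective of the value of $\mu_a$ ($> \mu_0$ for a right-tailed test).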
Sample Size Determination: Summary
For upper- or lower-tailed level $\alpha$ tests, to control the Type II error
probability at $\beta$ when $\mu = \mu_a$ under $H_a$, the required sample size is
$n = \left[ \frac{(z_\alpha + z_\beta)\sigma}{\mu_0 - \mu_a} \right]^2$
For two-tailed level $\alpha$ tests, the required sample size is:
$n = \left[ \frac{(z_{\alpha/2} + z_\beta)\sigma}{\mu_0 - \mu_a} \right]^2$
Remark: the above conclusions hold for the z-type tests based on
the normal distribution, which apply when the test statistic is
normally distributed (or approximately so for large samples).
Sample Size Determination: Example
Example 10.3.1 An SAT prep course claims to increase average
Verbal SAT scores.
To test the claim, n candidates will be selected at random to receive
the training and take the test.
It is known that in the population under study, SAT-V scores are
distributed
$N(\mu = 565, \sigma^2 = 40^2)$
We want the probability of making a Type II error, when there is a 15
point mean increase, to be 0.1 (or less), when the level is $\alpha = 0.05$.
Determine how many candidates should be selected.
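The sample size formula for a one-tailed z-test can be evaluated directly for Example 10.3.1. The following is a minimal sketch; the function name `sample_size_one_tailed` is an assumption made for illustration, and the result is rounded up so that the Type II error does not exceed the target.

```python
from math import ceil
from statistics import NormalDist

def sample_size_one_tailed(mu0, mu_a, sigma, alpha, beta):
    """n = [(z_alpha + z_beta) * sigma / (mu0 - mu_a)]^2, rounded up."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(1 - beta)
    return ceil(((z_alpha + z_beta) * sigma / (mu0 - mu_a)) ** 2)

# Example 10.3.1: SAT-V scores, 15-point increase, alpha = 0.05, beta = 0.1
n_needed = sample_size_one_tailed(mu0=565, mu_a=580, sigma=40, alpha=0.05, beta=0.10)
print(n_needed)
```

The unrounded value is a little under 61, so 61 candidates suffice under these assumptions.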
Power of the Test
The power of a test at a particular value in the alternative, $\theta = \theta_a$, is
defined as
$\mathrm{Power}(\theta_a) = P\{\text{Reject } H_0 \mid \theta = \theta_a\} = 1 - \beta(\theta_a)$
In the SAT example, we find that if n is 61 and the true mean SAT-V
score of people taking the SAT prep course is 580 (an increase of 15
points), then the power of the test is
$\mathrm{Power}(580) = 1 - 0.1 = 0.9$
[Figure: power curve for the SAT-V example, for n = 61]
[Figure: power curve for the SAT-V example, for $\mu_a = 580$]
Another example: Beer tasting
Refer to Example 10.2.3: $H_0: p = 1/3$ versus $H_a: p > 1/3$.
RR for a level $\alpha = 0.05$ test is:
$RR: z_{obs} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} > 1.645$,
where $p_0 = 1/3$, $\hat{p} = Y/n$, Y is the number of tasters that answered
correctly. That is,
$RR: \hat{p} > k$, where $k = p_0 + \sqrt{\frac{p_0(1 - p_0)}{n}}\, z_\alpha$
Calculate $\beta(0.5)$, the Type II error rate when $p_a = 0.5$.
Recall n = 100.
Solution:
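Q: Calculate the minimum sample size n to control $\beta(0.5)$ at 0.01.
We need to solve for n such that
$\frac{k - p_a}{\sqrt{p_a(1 - p_a)/n}} = -z_\beta$, (1)
where $k = p_0 + \sqrt{\frac{p_0(1 - p_0)}{n}}\, z_\alpha$ also depends on n. Solving
$\frac{p_0 + \sqrt{\frac{p_0(1 - p_0)}{n}}\, z_\alpha - p_a}{\sqrt{p_a(1 - p_a)/n}} = -z_\beta$
for n gives
$n = \left[ \frac{z_\beta \sqrt{p_a(1 - p_a)} + z_\alpha \sqrt{p_0(1 - p_0)}}{p_0 - p_a} \right]^2$
Thus for this example, $p_0 = 1/3$, $p_a = 0.5$, $z_\alpha = 1.645$, $z_\beta = 2.33$, so
the formula gives $n \approx 135.5$; rounding up so that $\beta(0.5)$ does not exceed 0.01, n = 136.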
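Both beer-tasting calculations, $\beta(0.5)$ at n = 100 and the sample size needed for $\beta(0.5) \le 0.01$, can be sketched with the normal approximation used above. The function name `beta` is illustrative. Exact normal quantiles are used instead of the rounded table values 1.645 and 2.33, so the unrounded sample size comes out near 135.3; rounding up gives 136.

```python
from math import sqrt, ceil
from statistics import NormalDist

Z = NormalDist()
p0, pa, alpha = 1 / 3, 0.5, 0.05
z_alpha = Z.inv_cdf(1 - alpha)

def beta(n):
    """Approximate Type II error at p = pa for the level-alpha right-tailed test."""
    k = p0 + z_alpha * sqrt(p0 * (1 - p0) / n)         # reject H0 when p_hat > k
    return Z.cdf((k - pa) / sqrt(pa * (1 - pa) / n))   # P(p_hat <= k | p = pa)

beta_100 = beta(100)

# Sample size so that beta(0.5) is at most 0.01
target_beta = 0.01
z_beta = Z.inv_cdf(1 - target_beta)
n_exact = ((z_beta * sqrt(pa * (1 - pa)) + z_alpha * sqrt(p0 * (1 - p0))) / (p0 - pa)) ** 2
n_required = ceil(n_exact)

print(f"beta(0.5) at n = 100: {beta_100:.4f}")
print(f"n (unrounded) = {n_exact:.2f}, n required = {n_required}")
```

Rounding up rather than down is the conservative choice, since the Type II error decreases as n grows.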
[Figure: power curve for the Beer example, for n = 100]
[Figure: Type II error rate for the Beer example, for $p_a = 0.5$]
10.4 Test/confidence interval relationship
Hypothesis Testing and Confidence Intervals
Hypothesis testing has a close connection to confidence intervals (CI)
in the sense that confidence intervals are often the
complement of rejection regions.
The complement of the RR is sometimes called the acceptance
region.
Consider the problem of two-tailed alternatives:
$H_0: \theta = \theta_0$
$H_a: \theta \neq \theta_0$
Test statistic: $Z = \frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}}$
Rejection region:
$RR = \{Z : |Z| > z_{\alpha/2}\}$
From the above, the Acceptance Region is:
$\overline{RR} = \{|Z| \le z_{\alpha/2}\} = \{\hat\theta - z_{\alpha/2}\sigma_{\hat\theta} \le \theta_0 \le \hat\theta + z_{\alpha/2}\sigma_{\hat\theta}\}$.
Notice that a $100(1 - \alpha)\%$ CI for $\theta$ is:
$\hat\theta \pm z_{\alpha/2}\sigma_{\hat\theta}$.
Therefore,
Reject $H_0$ if and only if $\theta_0 \notin$ CI
In other words, testing for a level $\alpha$ two-tailed alternative is equivalent
to checking if the hypothesized value of $\theta$ ($= \theta_0$) lies in the
$100(1 - \alpha)\%$ CI for $\theta$.
A similar relationship exists between one-sided alternative hypotheses
and one-sided confidence intervals.
Test-confidence interval relationship: Example
Refer to Example 10.2.5 (Male and Female Temperature Preference).
Males: $n_m = 36$, $\sigma_m^2 = 4.0$, $\bar{y}_m = 74.6$
Females: $n_f = 40$, $\sigma_f^2 = 2.5$, $\bar{y}_f = 76.5$
Let $\mu_m$ and $\mu_f$ denote the mean temperature preference of males and
females, respectively.
Construct a 99% confidence interval for $\mu_m - \mu_f$. Is the value
$\mu_m - \mu_f = 0$ contained in the confidence interval? Based on the
interval, should we reject $H_0: \mu_m - \mu_f = 0$?
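The interval asked for here can be sketched numerically, assuming the known-variance interval $(\bar{y}_m - \bar{y}_f) \pm z_{\alpha/2}\sqrt{\sigma_m^2/n_m + \sigma_f^2/n_f}$ with $\alpha = 0.01$.

```python
from math import sqrt
from statistics import NormalDist

n_m, var_m, ybar_m = 36, 4.0, 74.6
n_f, var_f, ybar_f = 40, 2.5, 76.5
conf_level = 0.99

diff = ybar_m - ybar_f
se_diff = sqrt(var_m / n_m + var_f / n_f)
z_half = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # z_{alpha/2}

ci = (diff - z_half * se_diff, diff + z_half * se_diff)
contains_zero = ci[0] <= 0.0 <= ci[1]
print(f"99% CI for mu_m - mu_f: ({ci[0]:.3f}, {ci[1]:.3f}); contains 0: {contains_zero}")
```

Since 0 lies outside the interval, the two-tailed level 0.01 test rejects $H_0: \mu_m - \mu_f = 0$, in line with the test/CI equivalence.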
10.5 The p-value
Observed significance level
Often misunderstood
P(these data or more extreme; $H_0$ is true)
Reject $H_0$ at level $\alpha$ $\iff$ p-value $< \alpha$
Definition: the smallest level of significance at which $H_0$ can be
rejected.
Refer to Example 10.2.4.
$H_0: \theta = 10$ versus $H_a: \theta < 10$; $z_{obs} = -1.84$
If $\alpha = 0.1$, $z_\alpha = 1.28$, RR: $z_{obs} < -1.28$, conclusion: Reject $H_0$
If $\alpha = 0.05$, $z_\alpha = 1.65$, RR: $z_{obs} < -1.65$, conclusion: Reject $H_0$
...
If $\alpha = 0.025$, $z_\alpha = 1.96$, RR: $z_{obs} < -1.96$, so Do not reject $H_0$
Questions:
Calculate $P(Z < -1.28)$ and $P(Z < -1.84)$.
What is the smallest $\alpha$ so that $H_0$ will be rejected?
For what values of $\alpha$ can we reject $H_0$? For any $\alpha > 0.03$ (more precisely, $\alpha > P(Z < -1.84) \approx 0.033$).
Normal Calculator:
http://www.stat.tamu.edu/~west/applets/normaldemo.html
Calculation of p-values
The p-value is the probability of obtaining a test statistic value at least as
extreme as the observed value, calculated assuming $H_0$ is true.
Consider testing $H_0: \theta = \theta_0$. Suppose the test statistic $Z \sim N(0, 1)$
(or approximately) under $H_0$.
Left-tailed test $H_a: \theta < \theta_0$, p-value $= P(Z < z_{obs})$
Right-tailed test $H_a: \theta > \theta_0$, p-value $= P(Z > z_{obs})$
Two-tailed test $H_a: \theta \neq \theta_0$,
p-value $= P(Z < -|z_{obs}| \text{ OR } Z > |z_{obs}|) = 2P(Z > |z_{obs}|)$
The more extreme the observed test statistic value
$\Rightarrow$ smaller p-value
$\Rightarrow$ more evidence to reject $H_0$
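The three p-value formulas for a z-type statistic can be collected in one small helper. This is a sketch using the standard library normal distribution; the function name `z_p_value` and its `tail` argument are assumptions made for illustration.

```python
from statistics import NormalDist

def z_p_value(z_obs, tail):
    """p-value for a statistic that is (approximately) N(0, 1) under H0.

    tail: "left"  for Ha: theta < theta0,
          "right" for Ha: theta > theta0,
          "two"   for Ha: theta != theta0.
    """
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z_obs)
    if tail == "right":
        return 1 - std_normal.cdf(z_obs)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z_obs)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Example 10.2.4 (left-tailed), z_obs = -1.84
p_left = z_p_value(-1.84, "left")
print(f"p-value = {p_left:.4f}")
```

For Example 10.2.4 the p-value is about 0.033, which is why $H_0$ is rejected at $\alpha = 0.05$ but not at $\alpha = 0.025$.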
Calculation of p-value
Example 10.5.1 Refer to the Beer Tasting example 10.2.3.
$H_0: p = 1/3$, $H_a: p > 1/3$, $\alpha = 0.05$
y = 40 out of n = 100 correct ids observed. Calculate the p-value and
conclude.
Example 10.5.2 (10.57 of WMS)
A publisher of a newsmagazine has found through past experience that
60% of subscribers renew their subscription. In a recent random
sample of n = 200 subscribers, 108 indicated that they planned to
renew. What is the p-value associated with the test that the current
rate of renewal differs from the rate previously experienced? State your
conclusion using $\alpha = 0.05$. How about $\alpha = 0.1$?
BTW, does the total number of subscribers, N, matter?
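Example 10.5.2 can be sketched as a two-tailed large-sample test of $H_0: p = 0.6$, with the standard error evaluated under $H_0$. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Example 10.5.2: H0: p = 0.6 versus Ha: p != 0.6
n, y, p0 = 200, 108, 0.6

p_hat = y / n
z_obs = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-tailed p-value: 2 * P(Z > |z_obs|)
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))

print(f"z_obs = {z_obs:.3f}, p-value = {p_value:.4f}")
for alpha in (0.05, 0.10):
    print(f"alpha = {alpha}: reject H0: {p_value < alpha}")
```

The p-value is about 0.083, so under these assumptions $H_0$ is rejected at $\alpha = 0.1$ but not at $\alpha = 0.05$.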
Example 10.5.3 Refer to Example 10.2.5 (Male and Female
Temperature Preference).
Test $H_0: \mu_m - \mu_f = 0$ versus $H_a: \mu_m - \mu_f \neq 0$. Calculate the
p-value and conclude using $\alpha = 0.01$.
Males: $n_m = 36$, $\sigma_m^2 = 4.0$, $\bar{y}_m = 74.6$
Females: $n_f = 40$, $\sigma_f^2 = 2.5$, $\bar{y}_f = 76.5$
10.6 Testing means in small samples (normal)
Recall: The z-test
Testing Means in Normal Samples with Known Variances
Assumption: $Y_1, \dots, Y_n$ a rs from $N(\mu, \sigma^2)$, $\sigma$ is known
$H_0: \mu = \mu_0$ versus
$H_a: \mu \neq \mu_0$ (two-tailed test)
$H_a: \mu < \mu_0$ (lower-tailed test)
$H_a: \mu > \mu_0$ (upper-tailed test)
Test statistic:
$Z = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}$
Under $H_0$: $Z \sim N(0, 1)$
Two-tailed test:
$H_a: \mu \neq \mu_0$
$RR = \{z_{obs} : |z_{obs}| > z_{\alpha/2}\}$
p-value $= P(|Z| > |z_{obs}|) = 2P(Z > |z_{obs}|)$
Right-tailed test:
$H_a: \mu > \mu_0$
$RR = \{z_{obs} : z_{obs} > z_\alpha\}$
p-value $= P(Z > z_{obs})$
Left-tailed test:
$H_a: \mu < \mu_0$
$RR = \{z_{obs} : z_{obs} < -z_\alpha\}$
p-value $= P(Z < z_{obs})$
Recall: The large-sample z-test
Testing Means in Large Samples with Unknown Variances
Assumption: $Y_1, \dots, Y_n$ a rs with common mean $\mu$ and
unknown variance $\sigma^2$, where n is large.
$H_0: \mu = \mu_0$ versus
$H_a: \mu \neq \mu_0$ (two-tailed test)
$H_a: \mu < \mu_0$ (left-tailed test)
$H_a: \mu > \mu_0$ (right-tailed test)
Test statistic:
$Z = \frac{\bar{Y} - \mu_0}{S/\sqrt{n}}$,
where $S^2$ is the sample variance.
Under $H_0$: $Z \overset{approx}{\sim} N(0, 1)$
RR and p-value calculations are the same as the previous z-test
Two-tailed test:
$H_a: \mu \neq \mu_0$
$RR = \{z_{obs} : |z_{obs}| > z_{\alpha/2}\}$
p-value $= P(|Z| > |z_{obs}|) = 2P(Z > |z_{obs}|)$
Right-tailed test:
$H_a: \mu > \mu_0$
$RR = \{z_{obs} : z_{obs} > z_\alpha\}$
p-value $= P(Z > z_{obs})$
Left-tailed test:
$H_a: \mu < \mu_0$
$RR = \{z_{obs} : z_{obs} < -z_\alpha\}$
p-value $= P(Z < z_{obs})$
The t-test
Testing Means in Small Normal Samples
$Y_1, \dots, Y_n$ a rs from $N(\mu, \sigma^2)$, $\sigma^2$ unknown and n is small
$H_0: \mu = \mu_0$ versus
$H_a: \mu \neq \mu_0$ (two-tailed test)
$H_a: \mu < \mu_0$ (lower-tailed test)
$H_a: \mu > \mu_0$ (upper-tailed test)
Significance level: $\alpha$
Test statistic:
$T = \frac{\bar{Y} - \mu_0}{S/\sqrt{n}}$,
where $\bar{Y}$ and $S^2$ are the sample mean and variance.
Under $H_0$, $T \sim t_{n-1}$
Two-tailed test:
$H_a: \mu \neq \mu_0$
$RR = \{t_{obs} : |t_{obs}| > t_{n-1,\alpha/2}\}$
p-value $= P(|T_{n-1}| > |t_{obs}|) = 2P(T_{n-1} > |t_{obs}|)$,
where $T_{n-1}$ is a rv following the $t_{n-1}$ distribution.
Right-tailed test:
$H_a: \mu > \mu_0$
$RR = \{t_{obs} : t_{obs} > t_{n-1,\alpha}\}$
p-value $= P(T_{n-1} > t_{obs})$
Left-tailed test:
$H_a: \mu < \mu_0$
$RR = \{t_{obs} : t_{obs} < -t_{n-1,\alpha}\}$
p-value $= P(T_{n-1} < t_{obs})$
T distribution calculator:
http://www.stat.tamu.edu/~west/applets/tdemo.html
Example: IQ Test
Example 10.6.1 Ten sampled students aged 18-21 years received
special training. They are given an IQ test that is $N(100, 10^2)$ in the
general population. Let $\mu$ be the mean IQ of these students who
received special training. The observed IQ scores:
121, 98, 95, 94, 102, 106, 112, 120, 108, 109
Test if the special training improves the IQ score using significance
level $\alpha = 0.05$.
Solution, continued
. . . p = .029, so that at level $\alpha = .05$, $H_0$ is rejected, and the observed
sample mean, $\bar{Y}$, is significantly greater than $\mu = 100$.
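The t statistic for Example 10.6.1 can be reproduced from the listed scores. The Python standard library has no t distribution, so this sketch compares the statistic with the tabulated critical value $t_{9,0.05} = 1.833$, entered by hand as a constant, rather than computing the p-value.

```python
from math import sqrt
from statistics import mean, stdev

scores = [121, 98, 95, 94, 102, 106, 112, 120, 108, 109]
mu0 = 100
T_CRIT_9_05 = 1.833          # upper 5% point of t with 9 df, from a t table

n = len(scores)
ybar = mean(scores)          # sample mean
s = stdev(scores)            # sample standard deviation (divisor n - 1)
t_obs = (ybar - mu0) / (s / sqrt(n))

reject = t_obs > T_CRIT_9_05  # right-tailed test, Ha: mu > 100
print(f"ybar = {ybar}, s = {s:.2f}, t_obs = {t_obs:.3f}, reject H0: {reject}")
```

The statistic is about 2.16, which exceeds 1.833, consistent with the rejection reported above.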
Small Sample Tests: Two-Sample t-test
Independent random samples of size $n_1$ and $n_2$ from populations
$N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, where $\sigma$ is unknown.
$H_0: \mu_1 - \mu_2 = D_0$ (e.g. $D_0 = 0$) versus
$H_a: \mu_1 - \mu_2 \neq D_0$ (or $\mu_1 - \mu_2 < D_0$ or $\mu_1 - \mu_2 > D_0$)
Significance level $\alpha$
Test statistic:
$T = \frac{\bar{Y}_1 - \bar{Y}_2 - D_0}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
where $\bar{Y}_1$ and $\bar{Y}_2$ are the sample means and $S_1^2$ and $S_2^2$ are the
sample variances from the two groups,
and the pooled estimator of the common variance $\sigma^2$ is
$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$.
Under $H_0$: $T \sim t_{n_1 + n_2 - 2}$
Two-tailed test:
$H_a: \mu_1 - \mu_2 \neq D_0$
$RR = \{t_{obs} : |t_{obs}| > t_{n_1 + n_2 - 2,\,\alpha/2}\}$
p-value $= P(|T_{n_1 + n_2 - 2}| > |t_{obs}|) = 2P(T_{n_1 + n_2 - 2} > |t_{obs}|)$,
where $T_{n_1 + n_2 - 2}$ is a rv following the $t_{n_1 + n_2 - 2}$ distribution.
Example: Recovery time for new drug
Example 10.6.2 Twenty subjects were randomized to two groups, n = 10
each. The recovery time for patients taking a new drug (or placebo)
is measured in days. Data follow
with drug (1): 15 10 13 7 9 8 21 9 14 8
placebo (2): 15 14 12 8 14 7 16 10 15 12
Assume that the data are normally distributed and that $\sigma_1 = \sigma_2$. Use
$\alpha = 0.05$ to test $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 < 0$
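The pooled two-sample t statistic for Example 10.6.2 can be sketched as follows. Because the standard library lacks a t distribution, the tabulated critical value $t_{18,0.05} = 1.734$ is entered by hand.

```python
from math import sqrt
from statistics import mean, variance

drug    = [15, 10, 13, 7, 9, 8, 21, 9, 14, 8]
placebo = [15, 14, 12, 8, 14, 7, 16, 10, 15, 12]
D0 = 0
T_CRIT_18_05 = 1.734          # upper 5% point of t with 18 df, from a t table

n1, n2 = len(drug), len(placebo)
s1_sq, s2_sq = variance(drug), variance(placebo)   # sample variances (divisor n - 1)

# Pooled estimator of the common variance
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

t_obs = (mean(drug) - mean(placebo) - D0) / (sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2))
reject = t_obs < -T_CRIT_18_05                      # left-tailed test
print(f"sp^2 = {sp_sq:.2f}, t_obs = {t_obs:.3f}, reject H0: {reject}")
```

The statistic is about -0.53, well inside the acceptance region, so under these assumptions the data do not show that the drug shortens recovery time.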
Two-Sample t-test (Unequal variances)
Basic Assumptions
1. $X_1, \dots, X_m$ is a random sample from $N(\mu_1, \sigma_1^2)$, and $\sigma_1$ is
unknown.
2. $Y_1, \dots, Y_n$ is a random sample from $N(\mu_2, \sigma_2^2)$, and $\sigma_2$ is
unknown.
3. The X and Y samples are independent of each other.
The standardized variable
$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{m} + \frac{S_2^2}{n}}}$
has approximately a t distribution with degrees of freedom $\nu$ estimated
from the data by
$\nu = \frac{\left( \frac{s_1^2}{m} + \frac{s_2^2}{n} \right)^2}{\frac{(s_1^2/m)^2}{m - 1} + \frac{(s_2^2/n)^2}{n - 1}}$ (round down)
Null hypothesis: $H_0: \mu_1 - \mu_2 = D_0$.
Test statistic value:
$t_{obs} = \frac{(\bar{x} - \bar{y}) - D_0}{\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}}$.
Alternative Hypothesis          Rejection Region for Level $\alpha$ Test
$H_a: \mu_1 - \mu_2 > D_0$      $t_{obs} \ge t_{\nu,\alpha}$
$H_a: \mu_1 - \mu_2 < D_0$      $t_{obs} \le -t_{\nu,\alpha}$
$H_a: \mu_1 - \mu_2 \neq D_0$   $t_{obs} \ge t_{\nu,\alpha/2}$ OR $t_{obs} \le -t_{\nu,\alpha/2}$
The p-values can be calculated as in the two-sample t-test with equal
variance.
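The degrees-of-freedom formula is the step most prone to arithmetic slips, so a small sketch may help. The function name `welch_t` is hypothetical, and reusing the recovery-time data from Example 10.6.2 here is purely an illustration (that example itself assumes equal variances).

```python
from math import sqrt, floor
from statistics import mean, variance

def welch_t(x, y, d0=0.0):
    """Return (t_obs, nu) for the unequal-variance two-sample t-test.

    nu is the estimated degrees of freedom, rounded down.
    """
    m, n = len(x), len(y)
    a = variance(x) / m          # s1^2 / m
    b = variance(y) / n          # s2^2 / n
    t_obs = (mean(x) - mean(y) - d0) / sqrt(a + b)
    nu = floor((a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1)))
    return t_obs, nu

# Illustration only: recovery-time data from Example 10.6.2
drug    = [15, 10, 13, 7, 9, 8, 21, 9, 14, 8]
placebo = [15, 14, 12, 8, 14, 7, 16, 10, 15, 12]
t_obs, nu = welch_t(drug, placebo)
print(f"t_obs = {t_obs:.3f}, estimated df = {nu}")
```

With equal sample sizes the statistic coincides with the pooled one, but the estimated degrees of freedom (16 here) are smaller than the pooled 18.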
Sign Test - Nonparametric Test
Random sample of size n from a continuous distribution with median $\tilde\mu$.
$H_0: \tilde\mu = \tilde\mu_0$ versus $H_a: \tilde\mu > \tilde\mu_0$
Significance level $\alpha$
Test statistic: S = No. of observations $> \tilde\mu_0$
Under $H_0$: $S \sim \mathrm{Bin}(n, 0.5)$
Form of $RR = \{s_{obs} \ge k\}$
p-value $= P(S \ge s_{obs})$
Sign Test
Example: IQ test
Refer to Example 10.6.1:
The observed IQ scores:
121, 98, 95, 94, 102, 106, 112, 120, 108, 109
$H_0: \tilde\mu = 100$ versus $H_a: \tilde\mu > 100$
$\alpha = 0.05$
Observed test statistic value: $s_{obs} = 7$
Under $H_0$: $S \sim \mathrm{Bin}(10, 0.5)$
p-value $= P(S \ge 7) = 1 - P(S \le 6) = 1 - 0.828 = 0.172$.
Conclusion: do not reject $H_0$
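The sign test p-value is an exact binomial tail probability and can be checked with a few lines of standard-library Python. Variable names are illustrative.

```python
from math import comb

scores = [121, 98, 95, 94, 102, 106, 112, 120, 108, 109]
median0 = 100
alpha = 0.05

n = len(scores)
s_obs = sum(1 for y in scores if y > median0)   # number of observations above 100

# Under H0, S ~ Bin(n, 0.5), so P(S >= s_obs) = sum_{j >= s_obs} C(n, j) / 2^n
p_value = sum(comb(n, j) for j in range(s_obs, n + 1)) / 2 ** n

print(f"s_obs = {s_obs}, p-value = {p_value:.3f}, reject H0: {p_value < alpha}")
```

The exact value 176/1024 rounds to the 0.172 quoted above.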
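10.7 Testing Variances in Small Normal Samples
Random sample from $N(\mu, \sigma^2)$. Consider
$H_0: \sigma^2 = \sigma_0^2$ versus
$H_a: \sigma^2 \neq \sigma_0^2$ (or $\sigma^2 < \sigma_0^2$ or $\sigma^2 > \sigma_0^2$)
Significance level $\alpha$
Test statistic: $\chi^2 = \frac{(n - 1)S^2}{\sigma_0^2}$ ($\sim \chi^2_{n-1}$ under $H_0$)
Two-tailed test:
$H_a: \sigma^2 \neq \sigma_0^2$
RR: $\chi^2_{obs} > \chi^2_{\alpha/2,\,n-1}$ or $\chi^2_{obs} < \chi^2_{1-\alpha/2,\,n-1}$
p-value $= 2 \min\{P(\chi^2 > \chi^2_{obs}),\ P(\chi^2 < \chi^2_{obs})\}$
Right-tailed test:
$H_a: \sigma^2 > \sigma_0^2$
RR: $\chi^2_{obs} > \chi^2_{\alpha,\,n-1}$
p-value $= P(\chi^2 > \chi^2_{obs})$
Left-tailed test:
$H_a: \sigma^2 < \sigma_0^2$
RR: $\chi^2_{obs} < \chi^2_{1-\alpha,\,n-1}$
p-value $= P(\chi^2 < \chi^2_{obs})$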
Example: IQ test
Example 10.7.1 IQ Example 10.6.1: $H_0: \sigma^2 = 100$ versus
$H_a: \sigma^2 > 100$. Use significance level $\alpha = 0.05$.
Recall: $\bar{y} = 106.5$, $s = 9.5$.
Solution:
Observed test statistic value:
Rejection region:
p-value=
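The first two solution steps of Example 10.7.1 can be sketched as below. The standard library has no chi-square distribution, so the tabulated critical value $\chi^2_{0.05,9} = 16.919$ is entered by hand and the p-value is left to a table or calculator.

```python
from statistics import variance

scores = [121, 98, 95, 94, 102, 106, 112, 120, 108, 109]
sigma0_sq = 100
CHI2_CRIT_05_9 = 16.919       # upper 5% point of chi-square with 9 df, from a table

n = len(scores)
s_sq = variance(scores)                      # sample variance (divisor n - 1)
chi2_obs = (n - 1) * s_sq / sigma0_sq

reject = chi2_obs > CHI2_CRIT_05_9           # right-tailed test, Ha: sigma^2 > 100
print(f"s^2 = {s_sq:.2f}, chi2_obs = {chi2_obs:.3f}, reject H0: {reject}")
```

The observed statistic (about 8.1) is far below 16.919, so there is no evidence at level 0.05 that the variance exceeds 100.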
Two Sample Variance Tests
Suppose that $S_1^2$ and $S_2^2$ are the sample variances for two independent
random samples of size $n_1$ and $n_2$ from distributions $N(\mu_1, \sigma_1^2)$ and
$N(\mu_2, \sigma_2^2)$. All parameters are unknown.
The forms
$U_j = \frac{(n_j - 1)S_j^2}{\sigma_j^2}$, j = 1, 2
are independent $\chi^2$ random variables with $(n_1 - 1)$ and $(n_2 - 1)$
degrees of freedom, respectively, so that
$F = \frac{U_1/(n_1 - 1)}{U_2/(n_2 - 1)} = \frac{S_1^2 \sigma_2^2}{S_2^2 \sigma_1^2} \sim F_{n_1 - 1,\,n_2 - 1}$.
Thus, when $\sigma_1^2 = \sigma_2^2$,
$F = \frac{S_1^2}{S_2^2} \sim F_{n_1 - 1,\,n_2 - 1}$
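To test the equality of the two population variances:
$H_0: \sigma_1^2 = \sigma_2^2$ versus
$H_a: \sigma_1^2 \neq \sigma_2^2$ (or $\sigma_1^2 > \sigma_2^2$ or $\sigma_1^2 < \sigma_2^2$)
Significance level $\alpha$
Test statistic: $F = S_1^2 / S_2^2$
RR (two-tailed test): $F > F_{n_1 - 1,\,n_2 - 1,\,\alpha/2}$ or
$F < F_{n_1 - 1,\,n_2 - 1,\,1 - \alpha/2} = (F_{n_2 - 1,\,n_1 - 1,\,\alpha/2})^{-1}$
RR (right-tailed test): $F > F_{n_1 - 1,\,n_2 - 1,\,\alpha} = (F_{n_2 - 1,\,n_1 - 1,\,1 - \alpha})^{-1}$
RR (left-tailed test): $F < F_{n_1 - 1,\,n_2 - 1,\,1 - \alpha} = (F_{n_2 - 1,\,n_1 - 1,\,\alpha})^{-1}$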
Two Sample Variance Tests: Example
Example 10.7.2 Compare the variances of the amount of active
ingredients in generic and brand-name drugs. Random samples of
size 20 (generic) and 30 (brand-name). Data: $s_g^2 = 0.00109\ \mathrm{mg}^2$,
$s_b^2 = 0.000384\ \mathrm{mg}^2$. Use level $\alpha = 0.05$ to test
$H_0: \sigma_g^2 = \sigma_b^2$ versus $H_a: \sigma_g^2 > \sigma_b^2$
10.8 Neyman-Pearson - MP $\alpha$-level test
Consider a test involving a parameter $\theta$ with test statistic W and
rejection region RR. The power of the test:
$\mathrm{Power}(\theta) = P(\text{reject } H_0 \text{ when the parameter value is } \theta)$
$= P(W \in RR, \text{ when the parameter value is } \theta)$
Relationship between power and $\alpha$, $\beta$
Suppose $H_0: \theta = \theta_0$, and $\theta_a$ is a parameter value under $H_a$. Then
$\mathrm{Power}(\theta_0) = \alpha = P(\text{Reject } H_0 \text{ when } H_0 \text{ is true})$
$\mathrm{Power}(\theta_a) = 1 - \beta(\theta_a)$
We would like to choose a level $\alpha$ (Type I error) RR to maximize
$\mathrm{Power}(\theta)$ for $\theta$ in $H_a$, i.e. find the Most Powerful (MP) $\alpha$-level
test.
Neyman-Pearson Lemma
MP $\alpha$-level Tests
$Y_1, \dots, Y_n$ is a rs from a distribution with parameter $\theta$ and likelihood
$L(\theta)$. We wish to test:
$H_0: \theta = \theta_0$ versus $H_a: \theta = \theta_a$,
using level of significance $\alpha$, where $\theta_0$ and $\theta_a$ are given.
Theorem 1 The Neyman-Pearson Lemma For the given level
of significance, $\alpha$, the test that maximizes $\mathrm{Power}(\theta_a)$ has a RR with
the form:
$RR: \frac{L(\theta_0)}{L(\theta_a)} < k$,
where k is chosen to ensure that the level (Type I error probability) is
$\alpha$. The Most Powerful (MP) $\alpha$-level test is sometimes called the best
test.
Remark: The Neyman-Pearson Lemma gives the Rejection Region,
RR, that maximizes the power:
$\mathrm{Power}(\theta_a) = P(RR \mid \theta = \theta_a)$
given that
$P(RR \mid \theta = \theta_0) = \alpha$
MP $\alpha$-level Test
Example: Beta($\theta$, 1) (n = 1)
Example 10.8.1 One observation (n = 1), Y from Beta($\theta$, 1) with
pdf:
$f(y \mid \theta) = \theta y^{\theta - 1}$, $0 < y < 1$
(a) Use the N-P lemma to find the $\alpha = .05$ MP test of
$H_0: \theta = 2$ versus $H_a: \theta = 1$
(b) For the MP 0.05-level test derived above, calculate Power(1)
Solution:
Likelihood $L(\theta) = f(y \mid \theta) = \theta y^{\theta - 1}$
Therefore, in this case
$\frac{L(\theta_0)}{L(\theta_a)} = \frac{L(2)}{L(1)} = \frac{2y^1}{1 \cdot y^0} = 2y$, $0 < y < 1$
By N-P Lemma, the rejection region for the MP test:
$RR = \{2Y < k\}$ or $\{Y < k/2\}$
Determining k:
$\alpha = P(RR \mid \theta = \theta_0 = 2)$
$= P(Y < k/2 \mid \theta = 2)$
$= \int_0^{k/2} 2y\,dy = y^2 \Big|_0^{k/2} = (k/2)^2$
$\Rightarrow k/2 = \sqrt{\alpha} \Rightarrow k = 2\sqrt{\alpha}$
Therefore, for this problem, the MP 0.05-level test has the rejection
region
$RR: Y < \sqrt{0.05} = 0.2236$
(b) Solution (compute Power(1)):
$\mathrm{Power}(1) = P(RR \mid \theta = \theta_a = 1)$
$= P(Y < 0.2236 \mid \theta = 1)$
$= \int_0^{0.2236} 1 \cdot y^0\,dy = \int_0^{0.2236} 1\,dy$
$= y \Big|_0^{0.2236} = 0.2236$.
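The numbers in Example 10.8.1 follow from the Beta($\theta$, 1) cdf, $F(y \mid \theta) = y^\theta$ on (0, 1). A minimal sketch, with the helper name `beta_theta_1_cdf` introduced only for illustration:

```python
from math import sqrt

def beta_theta_1_cdf(y, theta):
    """cdf of Beta(theta, 1) on 0 < y < 1: integral of theta * t^(theta - 1) is y^theta."""
    return y ** theta

alpha = 0.05
theta0, theta_a = 2, 1

# RR = {Y < c}; the level condition P(Y < c | theta = 2) = c^2 = alpha gives c = sqrt(alpha)
c = sqrt(alpha)

level = beta_theta_1_cdf(c, theta0)    # should equal alpha
power = beta_theta_1_cdf(c, theta_a)   # P(Y < c | theta = 1) = c

print(f"critical value c = {c:.4f}, level = {level:.3f}, Power(1) = {power:.4f}")
```

The power of about 0.22 is low, which is unsurprising with a single observation.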
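MP $\alpha$-level Test
Normal Sample, $\sigma^2$ known
$Y_1, \dots, Y_n$ rs $N(\mu, \sigma^2)$, $\sigma^2$ known. Test
$H_0: \mu = \mu_0$ versus $H_a: \mu = \mu_a$, where $\mu_a > \mu_0$
The pdf of $Y_i$ is
$f(y \mid \mu) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(y - \mu)^2}{2\sigma^2} \right\}$, $-\infty < y < +\infty$.
Use the N-P lemma to find the MP $\alpha$-level test procedure.
Solution:
From the pdf, we obtain the likelihood for $\mu$:
$L(\mu) = f(y_1 \mid \mu) f(y_2 \mid \mu) \cdots f(y_n \mid \mu)$
$= \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^n \exp\left\{ -\frac{\sum_{i=1}^n (y_i - \mu)^2}{2\sigma^2} \right\}$
By the N-P Lemma, the FORM of the MP $\alpha$-level test rejection region
is:
$RR: \frac{L(\mu_0)}{L(\mu_a)} < k$.
$\frac{L(\mu_0)}{L(\mu_a)} = \frac{\left( \frac{1}{\sigma\sqrt{2\pi}} \right)^n \exp\left\{ -\frac{\sum_{i=1}^n (y_i - \mu_0)^2}{2\sigma^2} \right\}}{\left( \frac{1}{\sigma\sqrt{2\pi}} \right)^n \exp\left\{ -\frac{\sum_{i=1}^n (y_i - \mu_a)^2}{2\sigma^2} \right\}} < k$
$\Rightarrow \exp\left\{ -\frac{1}{2\sigma^2} \left[ \sum_{i=1}^n (y_i - \mu_0)^2 - \sum_{i=1}^n (y_i - \mu_a)^2 \right] \right\} < k$
$\Rightarrow -\frac{1}{2\sigma^2} \left[ \sum_{i=1}^n (y_i - \mu_0)^2 - \sum_{i=1}^n (y_i - \mu_a)^2 \right] < \ln(k)$
$\Rightarrow \left[ \sum_{i=1}^n (y_i - \mu_0)^2 - \sum_{i=1}^n (y_i - \mu_a)^2 \right] > -2\sigma^2 \ln(k)$
$RR: \left[ \sum_{i=1}^n (y_i - \mu_0)^2 - \sum_{i=1}^n (y_i - \mu_a)^2 \right] > -2\sigma^2 \ln(k)$
$\Rightarrow \left[ \sum_{i=1}^n y_i^2 - 2n\bar{y}\mu_0 + n\mu_0^2 \right] - \left[ \sum_{i=1}^n y_i^2 - 2n\bar{y}\mu_a + n\mu_a^2 \right] > -2\sigma^2 \ln(k)$
$\Rightarrow 2n\bar{y}(\mu_a - \mu_0) + n\mu_0^2 - n\mu_a^2 > -2\sigma^2 \ln(k)$
$\Rightarrow 2n\bar{y}(\mu_a - \mu_0) > -2\sigma^2 \ln(k) - n\mu_0^2 + n\mu_a^2$
Since $\mu_a - \mu_0 > 0$,
$RR: \bar{y} > \frac{-2\sigma^2 \ln(k) - n\mu_0^2 + n\mu_a^2}{2n(\mu_a - \mu_0)}$.
Note that the right hand side of the above does not involve the data,
so the inequality is equivalent to $\bar{y} > k^*$ for some constant $k^*$.
So the form of the RR becomes:
$RR: \bar{y} > k^*$
To determine $k^*$, we set
$\alpha = P(\bar{Y} \in RR \mid \mu = \mu_0)$
$= P(\bar{Y} > k^* \mid \mu = \mu_0)$
$= P\left( \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} > \frac{k^* - \mu_0}{\sigma/\sqrt{n}} \,\Big|\, \mu = \mu_0 \right)$
$= P\left( Z > \frac{k^* - \mu_0}{\sigma/\sqrt{n}} \right)$
Therefore,
$\frac{k^* - \mu_0}{\sigma/\sqrt{n}} = z_\alpha \Rightarrow k^* = \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$
Thus, the MP $\alpha$-level test of
$H_0: \mu = \mu_0$ versus $H_a: \mu = \mu_a$, where $\mu_a > \mu_0$
has the rejection region:
$RR: \bar{y} > \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$
or equivalently:
$RR: Z = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} > z_\alpha$.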
Uniformly Most Powerful (UMP) $\alpha$-level test
In the Normal sample example with known $\sigma$, note that the test does
not depend on $\mu_a$ (except that we need the assumption $\mu_a > \mu_0$).
Because of this, the test is the most powerful level $\alpha$ test for
$H_a: \mu = \mu_a$, for every $\mu_a > \mu_0$. We call such a test the Uniformly
Most Powerful (UMP) $\alpha$-level test of
$H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$.
Simple hypothesis: a hypothesis that uniquely specifies the
distribution of the population from which the sample is taken.
Composite hypothesis: not a simple hypothesis.
E.g. for the above normal sample example with known $\sigma$, $H: \mu = \mu_0$ is
simple, while $H: \mu > \mu_0$ is composite.
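MP $\alpha$-level test
Exponential Sample
Example 10.8.2 $Y_1, \dots, Y_n$ rs from Exp($\theta$) with pdf:
$f(y) = \frac{1}{\theta} e^{-y/\theta}$, $0 < y < +\infty$
(1) Use the N-P Lemma to construct the MP $\alpha$-level test for
$H_0: \theta = \theta_0$ versus $H_a: \theta = \theta_a$, where $\theta_a > \theta_0$
Hint: $\sum_{i=1}^n Y_i \sim \mathrm{Gamma}(n, \theta)$.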
(2) Construct the UMP $\alpha$-level test for
$H_0: \theta = \theta_0$ versus $H_a: \theta > \theta_0$.
(3) For n = 36, we wish to test
$H_0: \theta = 1$ versus $H_a: \theta = 2$ (or $\theta > 1$)
using level $\alpha = 0.01$. Use StaTable to find the critical value at
http://www.cytel.com/Products/StaTable/ or
http://mcsp.wartburg.edu/nmb/fall10/math313/seeingstats/Chpt4/gammaProb.html
Gamma($\alpha$, $\beta$): $\alpha$ is the shape parameter and $\beta$ is the scale
parameter.
(4) (Large Sample Approach). Use the CLT to obtain an
approximate test for the hypotheses in (3).
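Part (4) can be sketched with the normal approximation $\bar{Y} \overset{approx}{\sim} N(\theta, \theta^2/n)$, which follows from the CLT because an Exp($\theta$) variable has mean $\theta$ and standard deviation $\theta$. The critical value below is therefore approximate and will differ somewhat from the exact Gamma-based value in part (3).

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()
n, theta0, theta_a, alpha = 36, 1.0, 2.0, 0.01

# Approximate RR: reject H0 when ybar > theta0 + z_alpha * theta0 / sqrt(n)
z_alpha = Z.inv_cdf(1 - alpha)
ybar_crit = theta0 + z_alpha * theta0 / sqrt(n)

# Approximate power at theta_a: P(Ybar > ybar_crit | theta = theta_a)
approx_power = 1 - Z.cdf((ybar_crit - theta_a) / (theta_a / sqrt(n)))

print(f"reject H0 when ybar > {ybar_crit:.4f}; approximate power at theta = 2: {approx_power:.3f}")
```

Because the rejection region has the form $\bar{y} > c$ regardless of which $\theta_a > \theta_0$ is chosen, the same region serves for the composite alternative $\theta > 1$.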
Summary: Neyman-Pearson Lemma
MP $\alpha$-level Tests
The N-P Lemma provides the test statistic and form of the RR for the
MP test. The constant (critical value) must be determined to
ensure that the test is level $\alpha$.
It is not always possible to find a UMP test.
The N-P Lemma cannot be applied if there are unknown
parameters other than $\theta$.
If the rvs in the random sample are discrete, then it is usually not
possible to achieve a given level of significance, $\alpha$, exactly.
10.9 Likelihood Ratio Test
Likelihood Ratio Test (LRT)
An approach for developing tests when either or both of the
hypotheses are composite.
Can be used when the model for the data has more than one
parameter: $\theta_1, \theta_2, \dots, \theta_k$
We will refer to the vector of parameters:
$\Theta = (\theta_1, \theta_2, \dots, \theta_k)$
The likelihood of the random sample:
$L(\theta_1, \theta_2, \dots, \theta_k) = L(\Theta)$
LRT: Normal Example
Let $Y_1, \ldots, Y_n$ be a random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is unknown. Then
$\Theta = (\mu, \sigma^2)$.
We wish to test
$H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$
Note that the null hypothesis is actually:
$H_0: \mu = \mu_0, \ \sigma^2 > 0$,
i.e. $H_0$ is not simple (it is composite).
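In this situation, $H_0$ states that the parameters fall in a particular set, i.e. $\Theta$ is in the set $\Omega_0$. We write $\Omega_0$ for the parameter space under the null hypothesis.
For our example,
$H_0: \Theta \in \Omega_0 = \{(\mu, \sigma^2) : \mu = \mu_0, \ \sigma^2 > 0\}$
The alternative states:
$H_a: \Theta \in \{(\mu, \sigma^2) : \mu \neq \mu_0, \ \sigma^2 > 0\} = \Omega_a$,
where $\Omega_a$ denotes the parameter space under the alternative hypothesis.
We denote the union of the sets $\Omega_0$ and $\Omega_a$ by $\Omega$, i.e.
$\Omega_0 \cup \Omega_a = \Omega$
In our example:
$\Omega = \Omega_0 \cup \Omega_a = \{(\mu, \sigma^2) : \mu = \mu_0, \ \sigma^2 > 0\} \cup \{(\mu, \sigma^2) : \mu \neq \mu_0, \ \sigma^2 > 0\}$
$= \{(\mu, \sigma^2) : -\infty < \mu < +\infty, \ \sigma^2 > 0\}$
That is, $\Omega$ = set of all possible values of the parameters, without regard to the hypotheses.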
Notations:
$L(\hat\Omega_0) = \max_{\Theta \in \Omega_0} L(\Theta)$
denotes the maximum of the likelihood over the parameter values in $\Omega_0$ (under $H_0$).
$L(\hat\Omega) = \max_{\Theta \in \Omega} L(\Theta)$
denotes the maximum of the likelihood over the parameter values in $\Omega$.
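This is always true:
$L(\hat\Omega_0) \le L(\hat\Omega)$,
since the space $\Omega$ contains $\Omega_0$. If the maximum over $\Omega$ falls in $\Omega_0$, then
$L(\hat\Omega_0) = L(\hat\Omega)$
Evidence that $H_0$ is false (and $H_a$ is true) is that
$L(\hat\Omega_0) \ll L(\hat\Omega)$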
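Likelihood ratio test (LRT) statistic:
$\lambda = \frac{L(\hat\Omega_0)}{L(\hat\Omega)}$
This is always true: $0 \le \lambda \le 1$
Rejection region:
RR: $\lambda = \frac{L(\hat\Omega_0)}{L(\hat\Omega)} \le k$,
where $k$ is chosen to achieve a preselected level $\alpha$.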
LRT: Normal Example
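Let $Y_1, \ldots, Y_n$ be a random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is unknown. Then
$\Theta = (\mu, \sigma^2)$.
We wish to test
$H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$
$\Omega_0 = \{(\mu, \sigma^2) : \mu = \mu_0, \ \sigma^2 > 0\}$
$\Omega = \{(\mu, \sigma^2) : -\infty < \mu < +\infty, \ \sigma^2 > 0\}$
Use the LRT method to find the RR for a level $\alpha$ test.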
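A numerical check of where this exercise leads. Maximizing the normal likelihood over $\Omega_0$ and over $\Omega$ gives $\lambda = (\hat\sigma^2 / \hat\sigma_0^2)^{n/2}$, with $\hat\sigma^2 = \frac{1}{n}\sum (y_i - \bar y)^2$ and $\hat\sigma_0^2 = \frac{1}{n}\sum (y_i - \mu_0)^2$, and this equals $(1 + t^2/(n-1))^{-n/2}$ for the usual one-sample $t$ statistic. So $\lambda \le k$ is equivalent to $|t| \ge c$, and the level $\alpha$ RR is $|t| > t_{\alpha/2,\, n-1}$. The sketch below only verifies the identity; the data and $\mu_0$ are hypothetical, made up for illustration.

```python
import math


def lrt_normal_mean(y, mu0):
    """Return (lambda, t) for H0: mu = mu0 vs Ha: mu != mu0, sigma^2 unknown."""
    n = len(y)
    ybar = sum(y) / n
    ss_ybar = sum((yi - ybar) ** 2 for yi in y)   # n * MLE of sigma^2 over Omega
    ss_mu0 = sum((yi - mu0) ** 2 for yi in y)     # n * MLE of sigma^2 over Omega_0
    lam = (ss_ybar / ss_mu0) ** (n / 2)           # L(Omega_0 hat) / L(Omega hat)
    s = math.sqrt(ss_ybar / (n - 1))              # sample standard deviation
    t = math.sqrt(n) * (ybar - mu0) / s
    return lam, t


# Hypothetical data, invented only to exercise the formulas.
y = [5.1, 6.3, 4.8, 5.9, 6.6, 5.4, 4.9, 6.1, 5.7, 5.2]
mu0 = 6.0
lam, t = lrt_normal_mean(y, mu0)
n = len(y)
lam_from_t = (1.0 + t ** 2 / (n - 1)) ** (-n / 2)

print(f"lambda = {lam:.6f}, t = {t:.4f}, lambda from t = {lam_from_t:.6f}")
```

Because $\lambda$ is a decreasing function of $t^2$, small values of $\lambda$ correspond exactly to large values of $|t|$.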
LRT: Large Sample RR
Theorem 2 Let $Y_1, \ldots, Y_n$ have a joint likelihood $L(\Theta)$. Let
$r_0$ = # free parameters in $\Omega_0$
$r$ = # free parameters in $\Omega$
Assuming that certain regularity conditions hold, then under $H_0$ and for large $n$, $-2\ln(\lambda)$ has approximately a $\chi^2$ distribution with $r - r_0$ degrees of freedom.
LRT: Normal Example
Use Theorem 2 to derive the level $\alpha$ large-sample RR.
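A sketch of how Theorem 2 applies to the normal example, assuming the exact form $\lambda = (\hat\sigma^2/\hat\sigma_0^2)^{n/2}$ obtained by maximizing the likelihood over $\Omega_0$ and $\Omega$. Here $\hat\sigma^2 = \frac{1}{n}\sum (Y_i - \bar Y)^2$, $\hat\sigma_0^2 = \frac{1}{n}\sum (Y_i - \mu_0)^2$, and $S$ is the sample standard deviation.

```latex
\begin{align*}
\lambda &= \left(\frac{\hat\sigma^2}{\hat\sigma_0^2}\right)^{n/2}
         = \left(1 + \frac{t^2}{n-1}\right)^{-n/2},
\qquad t = \frac{\sqrt{n}\,(\bar Y - \mu_0)}{S}, \\
-2\ln\lambda &= n \ln\!\left(1 + \frac{t^2}{n-1}\right), \\
r &= 2 \ (\mu \text{ and } \sigma^2 \text{ free}), \qquad
r_0 = 1 \ (\text{only } \sigma^2 \text{ free}), \qquad r - r_0 = 1, \\
\text{RR} &: \ n \ln\!\left(1 + \frac{t^2}{n-1}\right) > \chi^2_{1,\alpha}.
\end{align*}
```

For large $n$, $n \ln(1 + t^2/(n-1)) \approx t^2$ and $\chi^2_{1,\alpha} = z_{\alpha/2}^2$, so this RR is approximately $|t| > z_{\alpha/2}$.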
LRT Example 2: Poisson Dispersion Test
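Let $X_1, X_2, \ldots, X_n$ be independent rvs with $X_i \sim \text{Poisson}(\lambda_i)$, $i = 1, 2, \ldots, n$, with pmf:
$P(X_i = x_i) = \frac{e^{-\lambda_i} \lambda_i^{x_i}}{x_i!}, \quad x_i = 0, 1, \ldots; \ \lambda_i > 0$
We wish to test
$H_0: \lambda_i = \lambda$, $i = 1, 2, \ldots, n$
versus
$H_a$: the $\lambda_i$ are not all equal.
Assume that $n$ is large. Construct an approximate level $\alpha$ LRT.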
Example 2: Poisson Dispersion Test
NIST test of asbestos fibers, # fibers on 23 squares of a grid:
31 29 19 18 31 28 34 27 34 30 16 18 26 27 27 18 24 22 28 24 21 17 24
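A possible way to carry out the dispersion test on these counts. Under $H_0$ the MLE of the common rate is $\bar x$; over the full space the MLE of each $\lambda_i$ is $x_i$, so $-2\ln(\text{LRT statistic}) = 2\sum x_i \ln(x_i/\bar x)$ with $r - r_0 = n - 1 = 22$ degrees of freedom. The sketch uses only the standard library, so the $\chi^2$ tail probability is computed with a closed form that holds only for even degrees of freedom; the level 0.05 is an assumption, since the slide does not fix $\alpha$.

```python
import math

counts = [31, 29, 19, 18, 31, 28, 34, 27, 34, 30, 16, 18,
          26, 27, 27, 18, 24, 22, 28, 24, 21, 17, 24]


def poisson_dispersion_lrt(x):
    """Return (-2 ln of the LRT statistic, degrees of freedom)."""
    n = len(x)
    xbar = sum(x) / n
    # Terms with x_i = 0 contribute 0 (limit of x ln x as x -> 0).
    stat = 2.0 * sum(xi * math.log(xi / xbar) for xi in x if xi > 0)
    return stat, n - 1


def chi2_sf_even_df(x, df):
    """P(chi-square_df > x), valid only for even df (closed-form Poisson sum)."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    term = math.exp(-half)
    total = term
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return total


alpha = 0.05  # assumed level; not specified on the slide
stat, df = poisson_dispersion_lrt(counts)
p_value = chi2_sf_even_df(stat, df)
reject = p_value < alpha

print(f"-2 ln(LRT statistic) = {stat:.2f} on {df} df, p-value = {p_value:.3f}, reject H0: {reject}")
```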
Example 3: Binomials
Let $X_i \sim \text{Binomial}(n_i, p_i)$, $i = 1, 2$, be independent. We wish to test
$H_0: p_1 = p_2$ versus $H_a: p_1 \neq p_2$
$\Omega_0 = \{(p_1, p_2) : 0 < p_1 = p_2 < 1\}$
$\Omega_a = \{(p_1, p_2) : 0 < p_1 < 1, \ 0 < p_2 < 1, \ p_1 \neq p_2\}$
Both hypotheses are composite. Suppose the $n_i$ are large. Carry out an approximate level $\alpha$ LRT.
Example 3: Binomials
Clinical Trial: Allergy Medicine versus Placebo
Randomization of 3774 subjects:
Allergy medicine group: $n_1 = 2103$, $x_1 = 547$ reported headaches
Placebo group: $n_2 = 1671$, $x_2 = 368$ reported headaches
Test whether the proportion of those reporting headaches is different in the two groups, using significance level $\alpha = 0.05$.
Ref: Michael Sullivan, III. (2004) Statistics: Informed Decisions Using Data.
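One way to compute the large-sample LRT for this trial. The MLEs are $\hat p_i = x_i / n_i$ over the full space and the pooled $\hat p = (x_1 + x_2)/(n_1 + n_2)$ under $H_0$; the full space has 2 free parameters and $\Omega_0$ has 1, so $-2\ln(\text{LRT statistic})$ is compared with $\chi^2_1$. The sketch assumes $0 < x_i < n_i$ (otherwise a logarithm of zero appears), and the helper names are illustrative.

```python
import math


def binomial_loglik(x, n, p):
    """Binomial log-likelihood without the constant log C(n, x) term."""
    return x * math.log(p) + (n - x) * math.log(1.0 - p)


def two_binomial_lrt(x1, n1, x2, n2):
    """Return -2 ln of the LRT statistic for H0: p1 = p2 (1 degree of freedom)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2          # MLEs over the full space
    p_pooled = (x1 + x2) / (n1 + n2)           # MLE under H0
    loglik_full = binomial_loglik(x1, n1, p1_hat) + binomial_loglik(x2, n2, p2_hat)
    loglik_null = binomial_loglik(x1, n1, p_pooled) + binomial_loglik(x2, n2, p_pooled)
    return 2.0 * (loglik_full - loglik_null)


def chi2_sf_1df(x):
    """P(chi-square_1 > x) = P(|Z| > sqrt(x)) for a standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))


alpha = 0.05
stat = two_binomial_lrt(x1=547, n1=2103, x2=368, n2=1671)
p_value = chi2_sf_1df(stat)
reject = p_value < alpha

print(f"-2 ln(LRT statistic) = {stat:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```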
Summary: LRT
The likelihood ratio approach does not guarantee an optimum test
(unlike the N-P Lemma).
Using the likelihood ratio approach will customarily provide an
acceptable test.
Unlike the N-P Lemma, the likelihood ratio approach can be applied where the underlying model has nuisance parameters (parameters not of particular interest).
Summary of Chapter 10
Four Basic Elements of a Statistical Test: (1) $H_0$; (2) $H_a$; (3) test statistic; (4) rejection region
Error probabilities:
Type I error probability (level): $\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})$ (sending an innocent person to jail)
Type II error probability: $\beta(\theta_a) = P(\text{do not reject } H_0 \mid H_a \text{ is true with } \theta = \theta_a)$ (setting a guilty person free)
Power($\theta$) = $P(\text{reject } H_0 \text{ with parameter value } \theta)$
Large sample tests
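Suppose $\hat\theta \sim N(\theta, \sigma^2_{\hat\theta})$, exactly or approximately. For instance, apply the CLT when $\hat\theta$ is consistent and unbiased for $\theta$, and $n$ is large.
$\sigma^2_{\hat\theta}$ is known or can be estimated consistently by $\hat\sigma^2_{\hat\theta}$
$H_0: \theta = \theta_0$ versus $H_a: \theta > \theta_0$
Test statistic: $T = \hat\theta$ or $Z = (\hat\theta - \theta_0)/\sigma_{\hat\theta}$
RR: $\{Z > z_\alpha\}$ or equivalently $\{\hat\theta > \theta_0 + z_\alpha \sigma_{\hat\theta}\}$
P-value = $P(Z > z_{obs})$ (for right-tailed z-test)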
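The right-tailed recipe above can be sketched in a few lines. This is a minimal illustration using the standard library; the function name and the numbers in the example call are hypothetical, not taken from the course examples.

```python
from statistics import NormalDist


def right_tailed_z_test(theta_hat, theta0, se, alpha):
    """Large-sample test of H0: theta = theta0 vs Ha: theta > theta0."""
    std_normal = NormalDist()
    z_obs = (theta_hat - theta0) / se
    z_alpha = std_normal.inv_cdf(1.0 - alpha)   # upper-alpha quantile of N(0, 1)
    return {
        "z_obs": z_obs,
        "z_alpha": z_alpha,
        "cutoff_for_theta_hat": theta0 + z_alpha * se,
        "p_value": 1.0 - std_normal.cdf(z_obs),
        "reject": z_obs > z_alpha,
    }


# Hypothetical numbers: sample mean 52.1, null value 50, sigma 6, n = 36.
result = right_tailed_z_test(theta_hat=52.1, theta0=50.0, se=6.0 / 36 ** 0.5, alpha=0.05)
print(result)
```

Rejecting when `z_obs > z_alpha` and rejecting when `p_value < alpha` give the same decision, which is the p-value rule restated later in this summary.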
Typical examples:
Test on population means (one-sample z-test)
Test on the difference between two population means (two-sample z-test)
Test on population proportions (one-sample z-test)
Test on the difference between two proportions (two-sample z-test)
Calculation of $\beta(\theta_a)$ and Power for a level $\alpha$ test procedure
Determination of sample size to control the Type II error probability at level $\beta$ for a level $\alpha$ test procedure. Examples discussed include
Test on means with one-sample z test (SAT prep course)
Test on means with two-sample z test (WMS 10.44)
Test on population proportions with one-sample z test
(Beer-tasting)
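For the right-tailed one-sample z-test on a mean, both calculations have closed forms: $\beta(\mu_a) = \Phi\big(z_\alpha - (\mu_a - \mu_0)\sqrt{n}/\sigma\big)$ and $n = (z_\alpha + z_\beta)^2 \sigma^2 / (\mu_a - \mu_0)^2$, rounded up. The sketch below assumes $\sigma$ is known and $\mu_a > \mu_0$; the numbers in the example call are hypothetical and are not the SAT, WMS 10.44 or beer-tasting data.

```python
import math
from statistics import NormalDist


def type2_error(mu0, mu_a, sigma, n, alpha):
    """beta(mu_a) for the right-tailed one-sample z-test of H0: mu = mu0."""
    se = sigma / math.sqrt(n)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    cutoff = mu0 + z_alpha * se                 # reject H0 when ybar > cutoff
    return NormalDist().cdf((cutoff - mu_a) / se)


def sample_size(mu0, mu_a, sigma, alpha, beta):
    """Smallest n with level alpha and Type II error at most beta at mu_a."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    return math.ceil((z_alpha + z_beta) ** 2 * sigma ** 2 / (mu_a - mu0) ** 2)


# Hypothetical setting: mu0 = 15, mu_a = 16, sigma = 3, alpha = beta = 0.05.
n_req = sample_size(mu0=15.0, mu_a=16.0, sigma=3.0, alpha=0.05, beta=0.05)
beta_at_n = type2_error(mu0=15.0, mu_a=16.0, sigma=3.0, n=n_req, alpha=0.05)
power_at_n = 1.0 - beta_at_n

print(f"required n = {n_req}, beta = {beta_at_n:.4f}, power = {power_at_n:.4f}")
```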
Test-CI relationship: CIs correspond to the complements of RRs; these complements are also called acceptance regions
p-value
Observed significance level
Smallest level of significance, $\alpha$, for which the observed value indicates that $H_0$ should be rejected
Tail area captured by the observed test statistic
Reject $H_0$ when p-value < $\alpha$
Testing mean in small normal samples
One sample t-test
Two sample t-test (assuming equal variance)
Testing variance in small normal samples
$\chi^2$-test for one population variance
F-test for testing the equality of two population variances
Neyman-Pearson Lemma
Use the N-P Lemma to construct the MP $\alpha$-level test for testing simple hypotheses $H_0$ and $H_a$
Based on the MP test, construct the uniformly MP $\alpha$-level test for composite hypotheses
Only one unknown parameter is involved
Likelihood-ratio test
Test statistic: $\lambda = \frac{L(\hat\Omega_0)}{L(\hat\Omega)}$
$L(\hat\Omega_0)$ (restricted likelihood): the maximum value of the likelihood when the parameters are restricted (and reduced in number) based on the assumption of $H_0$
$L(\hat\Omega)$ (unrestricted likelihood): the maximum value of the likelihood when the parameters are unrestricted, i.e. obtained under the entire parameter space
Construct the RR for a level $\alpha$ test based on the sampling distribution of $\lambda$
Large sample procedure: for large $n$, when the distribution satisfies some regularity conditions, $-2\ln(\lambda) \approx \chi^2_{r - r_0}$,
$r$: number of free parameters in the whole parameter space (unrestricted)
$r_0$: number of free parameters in the restricted parameter space (under $H_0$)
