
The word 'belief' is a difficult thing for me. I don't believe. I must have a
reason for a certain hypothesis. Either I know a thing, and then I know it - I
don't need to believe it.

[Statistics|101]
Chapter Fourteen
Motivation
There are two possible outcomes: if the result confirms the hypothesis, then you've made a measurement. If the result is contrary to the hypothesis, then you've made a discovery.

It is said that we can prove anything by statistics except the truth. Well, I think that should be: we can prove anything by misused statistics except the truth. Because truly, we can prove anything by merely using statistics responsibly.

We already know that it can serve as a powerful tool in discovering and understanding the truth based on collected data.

Hypothesis testing opens the gate to making decisions on whether or not to reject an assertion made about the population parameter by using the information obtained from the sample.

Our goal extends beyond estimating parameters to determining whether the sample at hand provides sufficient information to support the conjecture we have made about the parameter.

[Statistics|101] Hypothesis Testing


Learning Objectives
By the end of this module, each student should be able:

To know the basic concepts of hypothesis testing
To test hypotheses about the mean and proportion
To test hypotheses about independence
To follow the proper statistical tests of hypothesis
To torture the data long enough that it confesses


STATISTICAL HYPOTHESIS
A statistical hypothesis is a conjecture
concerning one or more populations whose
veracity can be established using sample
data.



NULL VS. ALTERNATIVE HYPOTHESIS
The null hypothesis, denoted by Ho, is a statistical hypothesis which the researcher doubts to be true.

The alternative hypothesis, denoted by Ha, is the operational statement of the theory that the researcher believes to be true and wishes to prove.

The null and alternative hypotheses are nonoverlapping statements. One and only one of the two is true since one is a contradiction of the other.


NULL VS. ALTERNATIVE HYPOTHESIS
Null Hypothesis            Alternative Hypothesis
$\theta = \theta_0$        $\theta > \theta_0$
                           $\theta < \theta_0$
                           $\theta \neq \theta_0$

where $\theta$ denotes the population parameter of interest and $\theta_0$ its hypothesized value.


ONE-TAILED VS. TWO-TAILED TEST
A one-tailed test of hypothesis is a test where the alternative hypothesis specifies a one-directional difference for the parameter of interest.

A two-tailed test of hypothesis is a test where the alternative hypothesis does not specify a directional difference for the parameter of interest.


TEST STATISTIC
A test statistic is a statistic whose value is calculated from sample data, which will be the basis for deciding whether to reject Ho or not in a test of hypothesis.

Using the sampling distribution of the test statistic, we can compute the probability of selecting a sample where the realized value of the test statistic belongs in a specified region, when Ho is true. We can then use this probability as a basis for the decision we take in a test of hypothesis.

It is logical to reject Ho if, when Ho is true, it is unlikely to select a sample whose value for the test statistic is as extreme as what we have observed in our sample.


CRITICAL REGION
The critical region or region of rejection is the set of values of the test statistic for which we reject the null hypothesis.

We may think of the region of rejection as the set of values that the test statistic is unlikely to take on when the null hypothesis is true.

The region of rejection has a small probability under the null hypothesis: if the null hypothesis were true, there is only a small chance of selecting a sample where the value of the test statistic belongs in the region of rejection.

This is why we reject Ho whenever the realized value of the test statistic belongs in the region of rejection.


ACCEPTANCE REGION
The acceptance region or region of
nonrejection is the set of values of the test
statistic for which we do not reject the null
hypothesis.
If the realized value of the test statistic belongs in the acceptance region,
we decide not to reject the null hypothesis. On the other hand, if its
value falls in the region of rejection, we decide to reject the null
hypothesis.

The region of rejection is always located at the tail end of the distribution
of the test statistic when Ho is true. For a two-tailed test, the region of
rejection is at the two tail ends of the distribution. As expected, the
region of rejection of a one-tailed test is at one tail end of the
distribution, depending on the direction stated in the alternative
hypothesis.



REGIONS
[Figure: the acceptance region and the region of rejection, separated by the critical value]


TYPE I & TYPE II ERROR
Type I error is the error committed when we decide to reject the null hypothesis when in reality it is true.

Type II error is the error committed when we decide not to reject the null hypothesis when in reality it is false.

We cannot commit these two errors at the same time. When we reject Ho, it is possible to commit a Type I error. When we decide not to reject Ho, it is possible to commit a Type II error.

The probabilities of the two errors are inversely related: for a fixed sample size, lowering the probability of one raises the probability of the other.


TYPE I & TYPE II ERROR
                 Null Hypothesis
Decision         True             False
Reject Ho        Type I error     Correct!
Accept Ho        Correct!         Type II error


TYPE II ERROR
Ho: Marcos is a hero.
Ha: Marcos is not a hero.

Accept that Marcos is a hero when in fact he is NOT!!!


LEVEL OF SIGNIFICANCE
The level of significance, denoted by $\alpha$, is the maximum probability of committing a Type I error that a researcher is willing to accept.

The smaller the value of $\alpha$, the lower the risk of committing a Type I error. Hence, we choose a level of significance depending on the consequence of committing a Type I error.

Common values for $\alpha$ are 0.05, 0.10, and 0.01.

The level of significance affects the size of the region of rejection.

If the null hypothesis is rejected at a level of significance and we use the same data set to perform the test at a higher level of significance, then the null hypothesis will once again be rejected.


P-VALUE
The p-value is the probability of selecting a sample whose computed value for the test statistic is equal to or more extreme (in the direction stated in Ha) than the realized value computed from the sample data, given that the null hypothesis is true.

As a rule, if the p-value is greater than the level of significance $\alpha$, then we do not reject Ho.

If the p-value is less than or equal to the level of significance, then we reject the null hypothesis.


P-VALUE
p-value $\leq \alpha$: reject Ho
p-value $> \alpha$: do not reject Ho


STEPS IN HYPOTHESIS TESTING (USING THE CRITICAL VALUE)
1. State the null and alternative hypotheses.
2. Choose the level of significance.
3. Set up the decision rule. Select the appropriate test statistic and establish the critical region.
4. Collect the data and compute the value of the test statistic from the sample data.
5. Make the decision and write your conclusion.


STEPS IN HYPOTHESIS TESTING (USING THE P-VALUE)
1. State the null and alternative hypotheses.
2. Choose the level of significance.
3. Set up the decision rule. Select the appropriate test statistic and establish the critical region.
4. Collect the data and compute the value of the test statistic from the sample data.
5. Compute the p-value and make the decision. Write your conclusion.


EXERCISES
True or False.

a. If a test indicates that Ho is rejected at 0.05 level of significance, then the test will also reject Ho at 0.10 level of significance.

b. A hypothesis test for which the Type I error occurs with probability 0.02 has probability of Type II error equal to 0.98.

c. If the decision is to reject the null hypothesis, then it is impossible to commit a Type II error.
EXERCISES
Suppose it is desired to test the following hypotheses:

Ho: Smoking is not harmful to your health.
Ha: Smoking is harmful to your health.

In terms of the null hypothesis, state in words what is represented by:

a. a Type I error
b. a Type II error

Which type of error do you think is more serious? Why?


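HYPOTHESIS TESTS FOR THE MEAN
Assume that we have a random sample $(X_1, X_2, \ldots, X_n)$ from a normal distribution with mean $\mu$ and variance $\sigma^2$.

Hypothesis Tests for the Population Mean
Null Hypothesis (Ho): $\mu = \mu_0$

Case 1: $\sigma^2$ is known
Test Statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
Alternative Hypothesis (Ha)    Region of Rejection
$\mu < \mu_0$                  $z < -z_{\alpha}$
$\mu > \mu_0$                  $z > z_{\alpha}$
$\mu \neq \mu_0$               $|z| > z_{\alpha/2}$

Case 2: $\sigma^2$ is unknown and $n \leq 30$
Test Statistic: $T = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$
Alternative Hypothesis (Ha)    Region of Rejection
$\mu < \mu_0$                  $t < -t_{\alpha,\,n-1}$
$\mu > \mu_0$                  $t > t_{\alpha,\,n-1}$
$\mu \neq \mu_0$               $|t| > t_{\alpha/2,\,n-1}$

Case 3: $\sigma^2$ is unknown and $n > 30$
Test Statistic: $Z = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$
Alternative Hypothesis (Ha)    Region of Rejection
$\mu < \mu_0$                  $z < -z_{\alpha}$
$\mu > \mu_0$                  $z > z_{\alpha}$
$\mu \neq \mu_0$               $|z| > z_{\alpha/2}$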
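The three cases above share one form of test statistic and one pattern of rejection regions. As a minimal sketch (the function names `one_sample_statistic` and `reject_h0` are illustrative and not part of the module), the decision rule could be coded as follows, assuming the critical value is looked up from a z or t table by the user:

```python
from math import sqrt


def one_sample_statistic(xbar, mu0, sd, n):
    """Standardized distance of the sample mean from mu0.

    sd is sigma in Case 1 and the sample standard deviation S in Cases 2 and 3.
    """
    return (xbar - mu0) / (sd / sqrt(n))


def reject_h0(stat, critical, alternative):
    """Apply the region of rejection for the stated alternative.

    critical is the positive table value: z_alpha or t_alpha for a
    one-tailed test, z_{alpha/2} or t_{alpha/2} for a two-tailed test.
    """
    if alternative == "less":
        return stat < -critical
    if alternative == "greater":
        return stat > critical
    if alternative == "two-sided":
        return abs(stat) > critical
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# Hypothetical illustration: xbar = 52, mu0 = 50, S = 10, n = 100, alpha = 0.05
z = one_sample_statistic(52, 50, 10, 100)   # (52 - 50) / (10 / 10) = 2.0
print(z, reject_h0(z, 1.645, "greater"))
```

The numbers in the illustration are made up; only the formulas come from the table.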


EXAMPLE
A certain restaurant advertises that it puts 0.25 pound of beef in
its burgers. A customer who frequents the restaurant thinks the
burgers actually contain less than 0.25 pound of beef. With
permission from the owner, the customer selected a random
sample of 60 burgers and found the mean and standard
deviation to be 0.22 and 0.07, respectively.

a. Test the customer's assertion at 0.01 level of significance using the critical value approach.

b. Compute the p-value. Will you reject Ho in (a) at 0.01 level of significance?


EXAMPLE (CONT.)
Let $\mu$ be the mean amount of beef in the burgers made by the restaurant.

Ho: $\mu = 0.25$
Ha: $\mu < 0.25$
$\alpha = 0.01$

Decision Rule: Reject Ho if $z < -z_{0.01} = -2.326$.

The test statistic is
$$Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{0.22 - 0.25}{0.07/\sqrt{60}} = -3.3197$$

Decision: Since z = -3.3197 < -2.326, we reject Ho.

Conclusion: At 1% level of significance, the customer has sufficient evidence to claim that the mean amount of beef in burgers the restaurant makes is less than 0.25 pound.


EXAMPLE (CONT.)
Using the sample data, the observed value of the test statistic is -3.3197.

Since the alternative hypothesis is $\mu < 0.25$, then

p-value = $P(Z \leq -3.3197 \mid \text{Ho is true}) = P(Z \leq -3.3197 \mid \mu = 0.25) = 0.0005$

Since 0.0005 < 0.01, we reject Ho.

Note: A p-value of 0.0005 means that the probability of selecting a sample whose sample mean is as low as what we have observed in our sample, or even lower than that, is very small if the null hypothesis were true. This is the reason why we were inclined to reject the null hypothesis, in favor of the alternative hypothesis.

If Ha had been $\mu > 0.25$, then p-value = $P(Z \geq -3.3197 \mid \text{Ho is true})$.

If Ha had been $\mu \neq 0.25$, then p-value = $P(|Z| \geq 3.3197 \mid \text{Ho is true})$.
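The p-value computations for the burger example can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` for the standard normal distribution; the helper name `z_p_value` is an illustrative choice, not something defined in the module.

```python
from math import sqrt
from statistics import NormalDist


def z_p_value(z, alternative):
    """p-value of an observed z statistic under the standard normal."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "greater":
        return 1.0 - std_normal.cdf(z)
    if alternative == "two-sided":
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# Burger example: xbar = 0.22, mu0 = 0.25, S = 0.07, n = 60
z = (0.22 - 0.25) / (0.07 / sqrt(60))
p_less = z_p_value(z, "less")            # Ha: mu < 0.25
p_greater = z_p_value(z, "greater")      # Ha: mu > 0.25
p_two_sided = z_p_value(z, "two-sided")  # Ha: mu != 0.25

print(f"z = {z:.4f}")
print(f"p-value (less) = {p_less:.4f}, reject Ho at 0.01: {p_less <= 0.01}")
```

With the left-tailed alternative the p-value is far below 0.01, while the right-tailed alternative gives a p-value close to 1, which shows how strongly the direction stated in Ha matters.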


EXAMPLE
A test can be conducted to determine the length of time required for a student to
read a specified amount of material. In this test, students were instructed to read
at the maximum speed at which they could still comprehend the material. A
random sample of sixteen students took the test, with the following results (in
minutes). Assume that the results of the test are normally distributed.

18 27 29 20 19 25 24 21
24 19 23 28 31 22 27 21

a. Estimate $\mu$ using a 95% confidence interval.

b. Test the null hypothesis Ho: $\mu = 25$ against the alternative hypothesis Ha: $\mu \neq 25$ at $\alpha = 0.05$.


EXAMPLE (CONT.)
Let $\mu$ = mean length of time (in minutes) required to read the material.

We use Case 2 since the population variance is unknown and the sample size is small.

Using the sample data, we can compute the following:

$\bar{X} = 23.625$, $S^2 = 15.45$, $n = 16$

Thus, the 95% confidence interval estimate is

$$\left(\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}\right) = \left(23.625 - t_{0.025,\,16-1}\frac{3.9306}{\sqrt{16}},\ 23.625 + t_{0.025,\,16-1}\frac{3.9306}{\sqrt{16}}\right)$$

$$= \left(23.625 - 2.131\,\frac{3.9306}{\sqrt{16}},\ 23.625 + 2.131\,\frac{3.9306}{\sqrt{16}}\right) = (21.53,\ 25.72)$$


EXAMPLE (CONT.)
Ho: $\mu = 25$
Ha: $\mu \neq 25$
$\alpha = 0.05$

Decision Rule: Reject Ho if $|t| > t_{0.025,\,15} = 2.131$.

The test statistic is
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{23.625 - 25}{3.93/\sqrt{16}} = -1.3993$$

Decision: Since |t| = |-1.3993| = 1.3993 < 2.131, we do not reject Ho.

Conclusion: At 5% level of significance, we do not have sufficient evidence to claim that $\mu \neq 25$.
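The reading-time computations can be reproduced with the standard library alone. This is a sketch under one assumption: the critical value $t_{0.025,\,15} = 2.131$ is taken from the slides rather than computed, since the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Reading times (in minutes) of the sixteen students
times = [18, 27, 29, 20, 19, 25, 24, 21,
         24, 19, 23, 28, 31, 22, 27, 21]

n = len(times)
xbar = mean(times)
s = stdev(times)      # sample standard deviation (divisor n - 1)
t_crit = 2.131        # t_{0.025,15}, copied from the slides

# (a) 95% confidence interval for the mean
margin = t_crit * s / sqrt(n)
ci = (xbar - margin, xbar + margin)

# (b) two-tailed test of Ho: mu = 25
mu0 = 25
t_stat = (xbar - mu0) / (s / sqrt(n))
reject = abs(t_stat) > t_crit

print(f"mean = {xbar}, variance = {s**2:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"t = {t_stat:.4f}, reject Ho: {reject}")
```

The hypothesized value 25 falls inside the interval and the test does not reject Ho, so the two results agree.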


CONFIDENCE INTERVAL & HYPOTHESIS TESTING
As we have mentioned earlier, confidence interval estimation and hypothesis testing are directly related. The result of a $(1-\alpha)100\%$ interval estimation for $\mu$ is consistent with the result of the corresponding 2-tailed test at level of significance $\alpha$.

If the hypothesized value, $\mu_0$, belongs in the computed $(1-\alpha)100\%$ interval estimate for $\mu$, then the value of the test statistic of the corresponding test will belong in the acceptance region. As a result, the test will fail to reject the null hypothesis.


EXAMPLE (CONT.)
In the reading-time example, the hypothesized value $\mu_0 = 25$ lies inside the 95% confidence interval (21.53, 25.72). Consistently, the two-tailed test at $\alpha = 0.05$ gave |t| = 1.3993 < $t_{0.025,\,15} = 2.131$, so Ho was not rejected.


HYPOTHESIS TESTS FOR THE PROPORTION
Assume that the population proportion is not expected to be too close to 0 or 1 and n is large.

Hypothesis Tests for the Population Proportion
Null Hypothesis (Ho): $p = p_0$
Test Statistic: $Z = \dfrac{Y - np_0}{\sqrt{np_0(1 - p_0)}}$, where $Y$ is the number of successes in a random sample of size $n$

Alternative Hypothesis (Ha)    Region of Rejection
$p < p_0$                      $z < -z_{\alpha}$
$p > p_0$                      $z > z_{\alpha}$
$p \neq p_0$                   $|z| > z_{\alpha/2}$
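The test statistic for a proportion is short enough to code directly. The following sketch (the name `proportion_z` is illustrative) evaluates it for the two worked examples in this module, the new medicine (93 of 110, $p_0 = 0.75$) and the shampoo commercial (388 of 500, $p_0 = 0.8$):

```python
from math import sqrt


def proportion_z(y, n, p0):
    """Z = (Y - n*p0) / sqrt(n*p0*(1 - p0)) for Y successes in n trials."""
    return (y - n * p0) / sqrt(n * p0 * (1 - p0))


# New medicine: Ha: p > 0.75, reject Ho if z > 1.645
z_medicine = proportion_z(93, 110, 0.75)
print(f"z = {z_medicine:.3f}, reject Ho: {z_medicine > 1.645}")

# Shampoo commercial: Ha: p < 0.8, reject Ho if z < -1.645
z_commercial = proportion_z(388, 500, 0.8)
print(f"z = {z_commercial:.3f}, reject Ho: {z_commercial < -1.645}")
```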


EXAMPLE
A commonly prescribed medicine in the market for relieving nervous
tension is believed to be only 75% effective. Results of an experiment
with a new medicine administered to a random sample of 110 adults
who were suffering from nervous tension showed that 93 received relief.
Is this sufficient evidence to conclude that the new medicine is superior
to the one commonly prescribed? Use a 0.05 level of significance.

Let p be the population proportion of adults suffering from nervous tension who will be relieved by the new medicine.

Ho: $p = 0.75$
Ha: $p > 0.75$
$\alpha = 0.05$

Decision Rule: Reject Ho if $z > z_{0.05} = 1.645$.


EXAMPLE (CONT.)
The test statistic is
$$Z = \frac{93 - (110)(0.75)}{\sqrt{(110)(0.75)(0.25)}} = 2.312$$

Decision: Since z = 2.312 > 1.645, we reject Ho. At 0.05 level of significance, we have sufficient evidence to conclude that $p > 0.75$.

Thus, the new medicine is superior to the one commonly prescribed, with a greater proportion of adults suffering from nervous tension who will be relieved by it.


EXAMPLE
The brand executive of a company claims that they have failed to meet
their goal because less than 80% of all target consumers are familiar with
the shampoo commercial that they had broadcast on radio and
television during the past month. A random sample of 500 respondents
indicated that 388 were familiar with the said commercial. Is this claim
valid? Use the 0.05 level of significance.

Let p be the population proportion of all target consumers who are familiar with
the shampoo commercial.

Ho: p = 0.8
Ha: p < 0.8

$\alpha = 0.05$

Decision Rule: Reject Ho if $z < -z_{0.05} = -1.645$.


EXAMPLE (CONT.)
The test statistic is
$$Z = \frac{388 - (500)(0.8)}{\sqrt{(500)(0.8)(0.2)}} = -1.342$$

Decision: Since z = -1.342 > -1.645, we do not reject Ho.

At 5% level of significance, we do not have sufficient evidence to validate the executive's claim.


EXERCISES
A mortgage is a type of loan that is secured by a designated piece of property. If the borrower defaults on the loan, the lender can sell the property to recover the outstanding debt. The following data are the outstanding principal balances of home mortgages foreclosed by the bank due to default by the borrower during the last 3 years, obtained from a random sample of 12 foreclosed mortgages:

95,982    81,422    39,888     46,836    66,899    69,110
59,200    62,331    105,812    55,545    56,635    72,123

Test the claim that the average outstanding balance of home mortgages is less than 80,000 using a:

a. 0.05 level of significance.
b. 0.10 level of significance.
c. 0.01 level of significance.
EXERCISES
The manager of the credit department for an oil company would
like to determine whether the average monthly balance of credit
card holders is higher than Php 3,000.00. An auditor randomly
samples 150 accounts and finds that the average owed is Php
4,170.00 with a standard deviation of Php 1,182.50.

Using the 0.05 level of significance, can the auditor conclude that
there is evidence that the average monthly balance is really higher
than Php 3,000.00?
EXERCISES
A television manufacturer claims in its warranty that in the past,
less than 15% of its television sets needed any repair during their
first two years of operation. In order to test the validity of this
claim, a government testing agency selects a sample of 100 sets
and finds that 12 sets required some repair within their first two
years of operation.

Is the manufacturer's claim valid? Test at 0.0


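HYPOTHESIS TESTS FOR TWO MEANS
Let $(X_1, X_2, \ldots, X_{n_1})$ be a random sample with mean $\mu_X$ and variance $\sigma_X^2$.
Also, let $(Y_1, Y_2, \ldots, Y_{n_2})$ be an independent random sample with mean $\mu_Y$ and variance $\sigma_Y^2$.
Let $\bar{X}$ and $\bar{Y}$ denote the sample means and $S_X^2$ and $S_Y^2$ denote the sample variances of the two independent random samples, respectively.

Hypothesis Tests for the Difference of Means (Independent Samples)
Null Hypothesis (Ho): $\mu_X - \mu_Y = d_0$

Case 1: $\sigma_X^2$ and $\sigma_Y^2$ are known
Test Statistic: $Z = \dfrac{(\bar{X} - \bar{Y}) - d_0}{\sqrt{\dfrac{\sigma_X^2}{n_1} + \dfrac{\sigma_Y^2}{n_2}}}$
Alternative Hypothesis (Ha)    Region of Rejection
$\mu_X - \mu_Y < d_0$          $z < -z_{\alpha}$
$\mu_X - \mu_Y > d_0$          $z > z_{\alpha}$
$\mu_X - \mu_Y \neq d_0$       $|z| > z_{\alpha/2}$

Case 2: $\sigma_X^2$ and $\sigma_Y^2$ are unknown but $\sigma_X^2 = \sigma_Y^2$
Test Statistic: $T = \dfrac{(\bar{X} - \bar{Y}) - d_0}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $S_p = \sqrt{\dfrac{(n_1 - 1)S_X^2 + (n_2 - 1)S_Y^2}{n_1 + n_2 - 2}}$
Alternative Hypothesis (Ha)    Region of Rejection
$\mu_X - \mu_Y < d_0$          $t < -t_{\alpha,\,n_1+n_2-2}$
$\mu_X - \mu_Y > d_0$          $t > t_{\alpha,\,n_1+n_2-2}$
$\mu_X - \mu_Y \neq d_0$       $|t| > t_{\alpha/2,\,n_1+n_2-2}$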


HYPOTHESIS TESTS FOR TWO MEANS (CONT.)
Hypothesis Tests for the Difference of Means (Independent Samples)
Null Hypothesis (Ho): $\mu_X - \mu_Y = d_0$

Case 3: $\sigma_X^2$ and $\sigma_Y^2$ are unknown but $\sigma_X^2 \neq \sigma_Y^2$
Test Statistic: $T = \dfrac{(\bar{X} - \bar{Y}) - d_0}{\sqrt{\dfrac{S_X^2}{n_1} + \dfrac{S_Y^2}{n_2}}}$, with degrees of freedom
$$v = \frac{\left(\dfrac{S_X^2}{n_1} + \dfrac{S_Y^2}{n_2}\right)^2}{\dfrac{\left(S_X^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(S_Y^2/n_2\right)^2}{n_2 - 1}}$$
Alternative Hypothesis (Ha)    Region of Rejection
$\mu_X - \mu_Y < d_0$          $t < -t_{\alpha,\,v}$
$\mu_X - \mu_Y > d_0$          $t > t_{\alpha,\,v}$
$\mu_X - \mu_Y \neq d_0$       $|t| > t_{\alpha/2,\,v}$

Case 4: $\sigma_X^2$ and $\sigma_Y^2$ are unknown but $n_1 > 30$ and $n_2 > 30$
Test Statistic: $Z = \dfrac{(\bar{X} - \bar{Y}) - d_0}{\sqrt{\dfrac{S_X^2}{n_1} + \dfrac{S_Y^2}{n_2}}}$
Alternative Hypothesis (Ha)    Region of Rejection
$\mu_X - \mu_Y < d_0$          $z < -z_{\alpha}$
$\mu_X - \mu_Y > d_0$          $z > z_{\alpha}$
$\mu_X - \mu_Y \neq d_0$       $|z| > z_{\alpha/2}$
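The pooled standard deviation of Case 2 and the degrees of freedom of Case 3 are the two quantities most prone to arithmetic slips. A minimal sketch of both follows; the function names and the input values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pooled_sd(s2x, s2y, n1, n2):
    """Case 2: pooled standard deviation S_p from two sample variances."""
    return sqrt(((n1 - 1) * s2x + (n2 - 1) * s2y) / (n1 + n2 - 2))


def unequal_variance_df(s2x, s2y, n1, n2):
    """Case 3: degrees of freedom v for unequal population variances."""
    a = s2x / n1
    b = s2y / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Hypothetical inputs: equal sample variances and equal sample sizes
print(pooled_sd(4.0, 4.0, 12, 12))            # pooled SD equals the common SD
print(unequal_variance_df(4.0, 4.0, 12, 12))  # v coincides with n1 + n2 - 2

# Hypothetical inputs: very different variances give a smaller v
print(unequal_variance_df(1.0, 25.0, 12, 12))
```

When the two sample variances and sample sizes are equal, v equals $n_1 + n_2 - 2$, the degrees of freedom used in Case 2; otherwise v is smaller.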


EXAMPLE
Let us once again consider the yields of pechay, in kilograms, from the two types
of plots presented previously.

Type I:   10.1   7.85   4.9   5.705   5.625   7.45   8     5.4   7.55   7.25   5.75   4.575
Type II:  8.7    9.5    9.2   6.45    10.35   8.1    7.7   3.3   8.3    7.8    6.15   5

Suppose the yields of pechay in both types are normally distributed with equal population standard deviations of 2. Is there reason to believe that the second type of plot produces a higher yield than the first type of plot? Test at 0.05 level of significance.


EXAMPLE (CONT.)
Let $\mu_X$ = mean yield of pechay in Type I plots and
$\mu_Y$ = mean yield of pechay in Type II plots

Ho: $\mu_X - \mu_Y = 0$
Ha: $\mu_X - \mu_Y < 0$
$\alpha = 0.05$

We use Case 1!

Decision Rule: Reject Ho if $z < -z_{0.05} = -1.645$.

The test statistic is
$$Z = \frac{(\bar{X} - \bar{Y}) - 0}{\sqrt{\dfrac{4}{12} + \dfrac{4}{12}}} = -1.0609$$

Decision: Since z = -1.0609 > -1.645, we do not reject Ho.

Conclusion: At 5% level of significance, we do not have sufficient evidence to conclude that Type II plots, on the average, produce higher yields than Type I plots.
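The slide reports the value of the test statistic without showing the two sample means. The sketch below recomputes it from the pechay yields, assuming (as the example states) a known population standard deviation of 2 for both plot types; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean

type_1 = [10.1, 7.85, 4.9, 5.705, 5.625, 7.45,
          8, 5.4, 7.55, 7.25, 5.75, 4.575]
type_2 = [8.7, 9.5, 9.2, 6.45, 10.35, 8.1,
          7.7, 3.3, 8.3, 7.8, 6.15, 5]

sigma2 = 2 ** 2  # known population variance for both types
n1, n2 = len(type_1), len(type_2)

xbar, ybar = mean(type_1), mean(type_2)
z = ((xbar - ybar) - 0) / sqrt(sigma2 / n1 + sigma2 / n2)

print(f"xbar = {xbar:.4f}, ybar = {ybar:.4f}")
print(f"z = {z:.4f}, reject Ho: {z < -1.645}")
```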


HYPOTHESIS TESTS FOR TWO MEANS
Let $\{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$ be your sample data.

Define $D_i = X_i - Y_i$ for $i = 1, 2, \ldots, n$, and $\bar{D} = \bar{X} - \bar{Y}$.

Hypothesis Tests for the Difference of Means (Related Samples)
Null Hypothesis (Ho): $\mu_D = d_0$
Test Statistic: $T = \dfrac{\bar{D} - d_0}{S_D/\sqrt{n}}$

Alternative Hypothesis (Ha)    Region of Rejection
$\mu_D < d_0$                  $t < -t_{\alpha,\,n-1}$
$\mu_D > d_0$                  $t > t_{\alpha,\,n-1}$
$\mu_D \neq d_0$               $|t| > t_{\alpha/2,\,n-1}$


EXAMPLE
An ornithologist working at a south coast reed swamp wishes to know if the
habitat is used by migrating reed warblers for fattening up before taking off on
migration. Birds arrive in numbers during August and stay at least until the end of
September.

A sample of reed warblers were weighed in August and the same set of birds were
weighed in September. Following are the weights of the reed warblers (in grams) in
the sample.
Reed Warbler   1      2      3      4      5      6      7      8      9      10
August         10.3   11.4   10.9   12.0   10.0   11.9   12.2   12.3   11.7   12.0
September      12.2   12.1   13.1   11.9   12.0   12.9   11.4   12.1   13.5   12.3

Is there evidence here to suggest that the mean weight of the reed warblers tends to be heavier in September than in August? Use a 0.05 level of significance. Assume that the weights are normally distributed.


EXAMPLE (CONT.)
Let $\mu_X$ = mean weight of the reed warblers in September and
$\mu_Y$ = mean weight of the reed warblers in August, so that $\mu_D = \mu_X - \mu_Y$.

Ho: $\mu_D = 0$
Ha: $\mu_D > 0$
$\alpha = 0.05$

Decision Rule: Reject Ho if $t > t_{0.05,\,10-1} = t_{0.05,\,9} = 1.833$.

i     September   August   $d_i$
1     12.2        10.3      1.9
2     12.1        11.4      0.7
3     13.1        10.9      2.2
4     11.9        12.0     -0.1
5     12.0        10.0      2.0
6     12.9        11.9      1.0
7     11.4        12.2     -0.8
8     12.1        12.3     -0.2
9     13.5        11.7      1.8
10    12.3        12.0      0.3


EXAMPLE (CONT.)
Computing the mean and standard deviation of the $d_i$'s, we get:

$$\bar{d} = \frac{\sum_{i=1}^{10} d_i}{10} = 0.88 \qquad S_D = \sqrt{\frac{\sum_{i=1}^{10} d_i^2 - 10\bar{d}^2}{10 - 1}} = 1.0654$$

The test statistic is
$$t = \frac{0.88 - 0}{1.0654/\sqrt{10}} = 2.612$$

Decision: Since t = 2.612 > 1.833, we reject Ho.

Conclusion: At 5% level of significance, there is sufficient evidence to conclude that the mean weight of the birds tends to be heavier in September than in August.
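The paired computation can be verified in a few lines. This sketch takes the critical value $t_{0.05,\,9} = 1.833$ from the slides, since the standard library does not provide a t distribution; the differences are rounded to one decimal place only to remove floating-point noise.

```python
from math import sqrt
from statistics import mean, stdev

august = [10.3, 11.4, 10.9, 12.0, 10.0, 11.9, 12.2, 12.3, 11.7, 12.0]
september = [12.2, 12.1, 13.1, 11.9, 12.0, 12.9, 11.4, 12.1, 13.5, 12.3]

# d_i = September weight minus August weight for the same bird
d = [round(s - a, 1) for s, a in zip(september, august)]

n = len(d)
d_bar = mean(d)
s_d = stdev(d)                       # sample standard deviation of the d_i
t_stat = (d_bar - 0) / (s_d / sqrt(n))
t_crit = 1.833                       # t_{0.05,9}, copied from the slides

print(d)
print(f"d_bar = {d_bar:.2f}, S_D = {s_d:.4f}")
print(f"t = {t_stat:.3f}, reject Ho: {t_stat > t_crit}")
```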


HYPOTHESIS TESTS FOR THE PROPORTION
We require the sample sizes $n_1 \geq 30$ and $n_2 \geq 30$ (or the sample sizes are large).
Let X = number of elements in the 1st sample possessing the characteristic of interest.
Let Y = number of elements in the 2nd sample possessing the characteristic of interest.

Hypothesis Tests for the Difference of Proportions
Null Hypothesis (Ho): $p_1 - p_2 = 0$
Test Statistic: $Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $\hat{p}_1 = \dfrac{X}{n_1}$, $\hat{p}_2 = \dfrac{Y}{n_2}$, and $\hat{p} = \dfrac{X + Y}{n_1 + n_2}$

Alternative Hypothesis (Ha)    Region of Rejection
$p_1 - p_2 < 0$                $z < -z_{\alpha}$
$p_1 - p_2 > 0$                $z > z_{\alpha}$
$p_1 - p_2 \neq 0$             $|z| > z_{\alpha/2}$


EXAMPLE
Consider again our Male-Female unpleasant shopping example. Suppose
two samples were taken in the Philippines. The first sample consists of
2,015 adult males while the second sample consists of 2,085 adult females.
Each respondent was asked about their opinion on the pleasantness of
shopping. The results of the survey were as follows:

                                                       Males    Females
Sample Size                                            2,015    2,085
Number who think shopping is an unpleasant experience  850      570

We want to test the hypothesis that males dislike shopping more than females at 0.05 level of significance.


EXAMPLE (CONT.)
Let $p_1$ = proportion of males who think shopping is an unpleasant experience and
$p_2$ = proportion of females who think shopping is an unpleasant experience

Ho: $p_1 - p_2 = 0$
Ha: $p_1 - p_2 > 0$
$\alpha = 0.05$

Decision Rule: Reject Ho if $z > z_{0.05} = 1.645$.

The test statistic is
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{\dfrac{850}{2015} - \dfrac{570}{2085}}{\sqrt{\dfrac{71}{205}\left(1 - \dfrac{71}{205}\right)\left(\dfrac{1}{2015} + \dfrac{1}{2085}\right)}} = 9.9877$$

where
$$\hat{p} = \frac{X + Y}{n_1 + n_2} = \frac{850 + 570}{2015 + 2085} = \frac{1420}{4100} = \frac{71}{205}$$


EXAMPLE (CONT.)
Decision: Since 9.9877 > 1.645, we reject Ho.

Conclusion: At 5% level of significance, we have sufficient evidence to say that the proportion among males who dislike shopping is higher than the proportion among females.
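The test statistic for the shopping example can be recomputed as a check. In this sketch the function name `two_proportion_z` is an illustrative choice; it pools the two samples exactly as in the formula for the difference of proportions.

```python
from math import sqrt


def two_proportion_z(x, n1, y, n2):
    """Z statistic for Ho: p1 - p2 = 0 using the pooled proportion."""
    p1_hat = x / n1
    p2_hat = y / n2
    p_hat = (x + y) / (n1 + n2)  # pooled estimate under Ho
    se = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Males: 850 of 2,015; females: 570 of 2,085
z = two_proportion_z(850, 2015, 570, 2085)
print(f"z = {z:.4f}, reject Ho: {z > 1.645}")
```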


EXERCISES
An experiment was conducted to determine whether different baking times produce different rises of chocolate chip muffins. Twenty-four muffins were baked for 20 minutes and the rise of each muffin was recorded. Another set of 20 muffins was baked for 25 minutes and the rise of each muffin was also recorded. The data, in centimeters, are given.

20 minutes      25 minutes
2.8   3.0       2.8   3.1
3.0   3.1       2.7   3.1
3.1   3.0       2.9   3.0
2.9   3.1       2.9   3.1
2.7   3.0       3.1   3.1
2.6   3.1       3.0   3.0
2.6   3.0       2.6   3.0
2.8   3.2       2.7   3.1
2.7   3.1       2.8
2.6   3.0       2.7
2.8   3.0       2.8
2.9   3.1       2.8

Test whether the mean rise of muffins baked for 20 minutes differs from that of muffins baked for 25 minutes. Use the 0.01 level of significance.
EXERCISES
In a sample of 160 students enrolled in private
schools, 60 were found to be smokers. In a sample of
650 students enrolled in public schools, 115 were
found to be smokers.

Is there sufficient evidence to conclude that there is a higher proportion of student smokers in private schools than in public schools? Test at 0.01 level of significance.
EXERCISES
In 2001, a sample of 1,980 illiterate individuals from
Country A showed that 1,236 of these individuals are
females. In the same year, a sample of 2,108 illiterate
individuals from Country B showed that 1,209 of
these individuals are females.

Can we conclude that the proportions of females among illiterate individuals are different for the two countries? Test at 0.05 level of significance.
CHI-SQUARE TESTS
Goodness-of-Fit Test
Test for Independence
Test for Homogeneity


TEST FOR INDEPENDENCE
The test for independence involves two categorical variables, each of which may be nominal or ordinal.


CONTINGENCY TABLE
                     Y
X               0        1        Row Total
0               a        b        a+b
1               c        d        c+d
Column Total    a+c      b+d      a+b+c+d = n

where a, b, c, and d are the frequencies in each cell.

These are your observed frequencies.


CONTINGENCY TABLE
                     Y
X               0                  1                  Row Total
0               (a+b)(a+c)/n       (a+b)(b+d)/n       a+b
1               (c+d)(a+c)/n       (c+d)(b+d)/n       c+d
Column Total    a+c                b+d                a+b+c+d = n

where a, b, c, and d are the same frequencies.

These are your expected frequencies.


STEPS IN TEST FOR INDEPENDENCE
1. State the null and alternative hypotheses.
2. Choose the level of significance, $\alpha$.
3. Collect the data.
4. Construct the r x c contingency table.
5. Compute the row and column totals.
6. Compute the expected frequencies using the formula.
7. Establish the Decision Rule.
8. Compute the value of the test statistic.
9. Make the statistical decision and conclude.


NULL & ALTERNATIVE HYPOTHESIS
Ho: X and Y are independent.
Ha: X and Y are not independent.
R X C CONTINGENCY TABLE
                          Y
X               1            2            ...    c            Row Total
1               $O_{1,1}$    $O_{1,2}$    ...    $O_{1,c}$    $R_1$
2               $O_{2,1}$    $O_{2,2}$    ...    $O_{2,c}$    $R_2$
...
r               $O_{r,1}$    $O_{r,2}$    ...    $O_{r,c}$    $R_r$
Column Total    $C_1$        $C_2$        ...    $C_c$        n

Note that $O_{i,j}$ is the observed number of elements whose realized value for X is the ith category and whose realized value for Y is the jth category, where $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, c$.


EXPECTED FREQUENCIES
$$E_{ij} = \frac{R_i C_j}{n} \quad \text{for } i = 1, 2, \ldots, r \text{ and } j = 1, 2, \ldots, c$$


DECISION RULE
Reject Ho if $X^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$.


TEST STATISTIC
$$X^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{O_{ij}^2}{E_{ij}} - n$$


EXAMPLE
A study was conducted to determine whether the leader-follower
tendency of a person is associated with his height. In this study, a sample
of 95 people were selected. Based on the information collected, each one
in the sample was classified according to their leader-follower tendency
and height.

The categories of leader-follower tendency are:
(i) follower: a person who tends to follow
(ii) in-between: a person who sometimes tends to follow but other times tends to lead
(iii) leader: a person who tends to lead

The categories of height are:
(i) short
(ii) tall


EXAMPLE (CONT.)
Test the hypothesis that there is a relationship between leader-follower tendency and height at 0.01 level of significance by using the cross-classification table below:

Leader-Follower      Height of Person
Tendency             Short     Tall      Total
Follower             22        14        36
In-between           9         6         15
Leader               12        32        44
Total                43        52        95


EXAMPLE (CONT.)
Ho: Leader-follower tendency and height are independent.
Ha: Leader-follower tendency and height are not independent.
$\alpha = 0.01$

Decision Rule: Reject Ho if $X^2 > \chi^2_{0.01,\,(3-1)(2-1)} = \chi^2_{0.01,\,2} = 9.21$.

We get first the following values:

$O_{1,1} = 22$    $O_{1,2} = 14$
$O_{2,1} = 9$     $O_{2,2} = 6$
$O_{3,1} = 12$    $O_{3,2} = 32$


EXAMPLE (CONT.)
$E_{1,1} = \dfrac{(36)(43)}{95} = 16.3$    $E_{1,2} = \dfrac{(36)(52)}{95} = 19.7$

$E_{2,1} = \dfrac{(15)(43)}{95} = 6.8$     $E_{2,2} = \dfrac{(15)(52)}{95} = 8.2$

$E_{3,1} = \dfrac{(44)(43)}{95} = 19.9$    $E_{3,2} = \dfrac{(44)(52)}{95} = 24.1$

$$X^2 = \frac{(22 - 16.3)^2}{16.3} + \frac{(14 - 19.7)^2}{19.7} + \frac{(9 - 6.8)^2}{6.8} + \frac{(6 - 8.2)^2}{8.2} + \frac{(12 - 19.9)^2}{19.9} + \frac{(32 - 24.1)^2}{24.1} = 10.67$$


EXAMPLE (CONT.)
Decision: Since $X^2$ = 10.67 > 9.21, we reject Ho.

Conclusion: At 1% level of significance, we have sufficient evidence to conclude that the leader-follower tendency is associated with height.

The association suggested by the data is that short people tend to be followers, while tall people tend to be leaders.

Naturally, this conclusion applies to the population from where the sample was taken.
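The steps of the test for independence translate directly into code. The sketch below works from the leader-follower table and keeps the expected frequencies unrounded, so it gives a test statistic of about 10.71 rather than the 10.67 obtained from expected frequencies rounded to one decimal place; the decision is the same. The critical value 9.21 is copied from the slides, and the variable names are illustrative.

```python
# Observed frequencies: rows = follower, in-between, leader; columns = short, tall
observed = [
    [22, 14],
    [9, 6],
    [12, 32],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# E_ij = R_i * C_j / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# X^2 = sum over all cells of (O - E)^2 / E
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Equivalent shortcut form: sum of O^2 / E, minus n
chi2_shortcut = sum(
    o ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
) - n

critical = 9.21  # chi-square critical value at alpha = 0.01 with 2 degrees of freedom
print([[round(e, 1) for e in row] for row in expected])
print(f"X^2 = {chi2:.2f}, reject Ho: {chi2 > critical}")
```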


EXERCISES
In an experiment to study the dependence of
hypertension on smoking habits, the following data
were taken on 180 individuals.
                       Smoking Habits
                       Non-smokers    Moderate smokers    Heavy smokers
Hypertension           21             36                  30
No hypertension        48             26                  19

Test the hypothesis that the presence or absence of hypertension is independent of smoking habits. Use a 0.05 level of significance.
EXERCISES
A random sample of 200 married men, all retired, was classified according to education and number of children.

Educational       Number of Children
Attainment        0-1      2-3      Over 3
Elementary        14       37       32
Secondary         19       42       17
College           12       17       10
EXERCISES
The following table was part of the results of a pilot
project conducted by the Nutrition Center of the
Philippines in Batangas, on the development of an
anemia control program.

Perform a test for independence on the summarized data using the 0.05 level of significance.

Classification     Nutritional Status
of Subjects        Normal    1° Malnourished    2° Malnourished    3° Malnourished
Normal             332       531                122                11
Anemic             198       404                217                23
MG ACTIVITY!!

Summation: Top 3 Learning Points
$$\sum_{i=1}^{3} (i\text{th learning point})$$


Questions??