
Chi-Square Test

There are two different types of chi-square test; both involve categorical data:

• The chi-square test for goodness of fit (or one-sample chi-square)
• The chi-square test for independence
One-sample chi-square
• Cases are categorized on only one
dimension or variable
Chi-square test for independence
• Determine if two categorical variables are
related
Example A
See a doctor   Did not see the doctor
50             50

Example B
            See a doctor   Did not see the doctor
Sick        15             15
Not sick    15             15
Which example calls for the one-sample chi-square test, and which calls for the chi-square test for independence?
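• Example A uses the one-sample chi-square: cases are categorized on only one variable (whether they saw a doctor)
• Example B uses the chi-square test for independence: cases are categorized on two variables (whether they were sick and whether they saw a doctor)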
Example of one-sample chi-square
• It is hypothesized that abnormal behavior is more likely to occur during a full moon.
• Admissions of new patients into a mental health unit were recorded by lunar phase over a one-year period, giving the following distribution of admissions:
Admission of new patients into a mental health unit, by lunar phase

Lunar phase   New Moon (1)   First Quarter (2)   Third Quarter (3)   Full Moon (4)
Admissions    2              10                  10                  18
Do admissions vary with the phase
of the moon?
• The chi-square goodness-of-fit test can be used to answer this question
• To use a chi-square test, we first need to find the expected frequency of admissions for each lunar phase
Basis for obtaining expected frequencies
• If admission into the mental health unit is unrelated to the phase of the moon, then the frequency of new admissions should be equal in each phase
The hypotheses
• H0: New admissions to the mental health unit and phase of the moon are independent in the population sampled
• H1: New admissions to the mental health unit and phase of the moon are related in the population sampled
• If the null hypothesis is rejected, new admissions are not spread equally across the lunar phases
• If the null hypothesis is not rejected, the data are consistent with one fourth of the new admissions occurring in each lunar phase
• Suppose there are 40 new patients. Because there are 4 lunar phases, the expected frequency is 40/4 = 10 new patients for each lunar phase
The observed and expected values
           New Moon   First Quarter   Third Quarter   Full Moon
Observed   2          10              10              18
Expected   10         10              10              10
What is the difference between observed and expected values?
• An observed value is a frequency actually collected by the researcher
• An expected value is the theoretical frequency derived from the null hypothesis
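The observed and expected frequencies are all the goodness-of-fit statistic needs: χ² = Σ (fo - fe)² / fe, with df = k - 1 for k categories. The minimal sketch below computes it for the lunar-phase data in plain Python, assuming equal expected frequencies. The p-value comes from a power series for the regularized incomplete gamma function, which is adequate for small tables like this one; the helper names `chi2_sf` and `goodness_of_fit` are hypothetical, and a statistics package or SPSS would normally do this step.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square distribution.

    Sums the power series for the regularized lower incomplete gamma
    function P(a, z), with a = df/2 and z = x/2, and returns 1 - P.
    Adequate for the moderate statistics used in these examples.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def goodness_of_fit(observed):
    """One-sample chi-square test assuming all categories are equally likely."""
    n = sum(observed)
    k = len(observed)
    expected = [n / k] * k                      # 40 / 4 = 10 per lunar phase
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = k - 1
    return stat, df, chi2_sf(stat, df)


# Observed admissions: New Moon, First Quarter, Third Quarter, Full Moon
stat, df, p = goodness_of_fit([2, 10, 10, 18])
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.4f}")
```

For these data the statistic is (2 - 10)²/10 + 0 + 0 + (18 - 10)²/10 = 12.8 with df = 3, and p comes out at roughly .005, below .05, so under these assumptions the hypothesis of equal admissions across phases would be rejected.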
Using SPSS to run the one-sample chi-square test
*Refer to data file
Test using SPSS
• Analyze
• Nonparametric tests
• Chi-square
• Move the variable into the Test Variable List box
• Under Expected Values, choose All categories equal
• Options → Descriptives → Continue → OK
Another example?
Example 1

Ill enough to go to a doctor
Response               Yes   No
Frequency (observed)   67    55
n = 122
• This test answers the question: how well does the observed distribution fit the theoretical distribution?

• The expected frequency is the same for each response category (122/2 = 61)
Example 1
Ill enough to go to a doctor
Scale                  Yes   No
Frequency (observed)   67    55
Frequency (expected)   61    61
Hypotheses
• Null hypothesis: the proportion of ill people who see the doctor is the same as the proportion of ill people who do not see the doctor
• Alternative hypothesis: the proportion of ill people who see the doctor is different from the proportion of ill people who do not see the doctor
• The test statistic is χ²
• The value of χ² is small when the difference between the observed frequency, fo, and the expected frequency, fe, is small, i.e. (fo – fe) approaches 0
• The larger the χ² value, the more likely it is that the null hypothesis will be rejected
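Written out, the statistic described above sums over the k response categories, with df = k - 1. Substituting the Example 1 frequencies (fo = 67 and 55, fe = 61 for each) gives the value that SPSS reports:

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(f_{o,i} - f_{e,i})^2}{f_{e,i}}, \qquad df = k - 1

\chi^2 = \frac{(67 - 61)^2}{61} + \frac{(55 - 61)^2}{61}
       = \frac{36 + 36}{61} = \frac{72}{61} \approx 1.180, \qquad df = 2 - 1 = 1
```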
Ill Enough to Go to a Doctor

        Observed N   Expected N   Residual
Yes     67           61.0          6.0
No      55           61.0         -6.0
Total   122
(Residual = fo – fe)

Test Statistics

                Ill Enough to Go to a Doctor
Chi-Square(a)   1.180
df              1
Asymp. Sig.     .277   (p > .05)
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 61.0.
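As a check, the SPSS output for Example 1 can be reproduced by hand. This sketch assumes equal expected frequencies (the "All categories equal" option) and uses the fact that, with df = 1, a chi-square variable is the square of a standard normal one, so the p-value is `math.erfc(math.sqrt(chi_square / 2))`. The variable names are illustrative only.

```python
import math

# Example 1: "Ill enough to go to a doctor"; all categories assumed equally likely
observed = {"Yes": 67, "No": 55}
n = sum(observed.values())
expected = n / len(observed)                 # 122 / 2 = 61.0

# Residual column of the SPSS table: fo - fe
residuals = {k: fo - expected for k, fo in observed.items()}

chi_square = sum((fo - expected) ** 2 / expected for fo in observed.values())
df = len(observed) - 1

# df = 1: chi-square is a squared standard normal, so the upper tail is erfc(sqrt(x / 2))
p = math.erfc(math.sqrt(chi_square / 2))

print(residuals)                             # {'Yes': 6.0, 'No': -6.0}
print(f"Chi-Square = {chi_square:.3f}, df = {df}, Asymp. Sig. = {p:.3f}")
```

This gives χ² = 72/61 ≈ 1.180 and p ≈ .277, matching the Test Statistics table.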
Decision
• The results show that the test is not significant, χ²(1, N = 122) = 1.18, p = .277
• Fail to reject the null hypothesis (p > .05)
• There is no significant difference between the proportion of ill people who see the doctor and the proportion of ill people who do not see the doctor
Chi-Square Test for Independence
The purpose of crosstabulation
• To show in tabular format the relationship
between two or more categorical variables
Example of categorical variables
• Gender (male, female)
• Ethnicity (Asian, White, Hispanic)
• Place of residence (rural, urban)
• Response (Yes, No)
• Grade (A, B, C, D, F)
Crosstabulation
• Suppose there are 5 Americans and 20
Asians (and that there are 15 females and
10 males)
• Based on this information alone, how many female Asians or male Americans are there?
• Use the Crosstabs command to “cross” the two variables and answer the question

            Male   Female
Americans   2      3
Asians      8      12
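The idea behind the Crosstabs command can be sketched in plain Python: given one record per person, count every (ethnicity, gender) combination. The individual records below are hypothetical, constructed only to reproduce the table above, and `crosstab` is an illustrative helper, not the SPSS command itself.

```python
from collections import Counter

# Hypothetical raw records (one per person) that reproduce the table:
# 2 male Americans, 3 female Americans, 8 male Asians, 12 female Asians
records = (
    [("Americans", "Male")] * 2
    + [("Americans", "Female")] * 3
    + [("Asians", "Male")] * 8
    + [("Asians", "Female")] * 12
)


def crosstab(pairs, rows, cols):
    """Count how often each (row, column) combination occurs."""
    counts = Counter(pairs)
    return [[counts[(r, c)] for c in cols] for r in rows]


rows, cols = ["Americans", "Asians"], ["Male", "Female"]
table = crosstab(records, rows, cols)
for name, row in zip(rows, table):
    print(f"{name:<10} {row[0]:>5} {row[1]:>7}")

# The marginal totals recover the one-way counts given in the text
print("Row totals:", [sum(r) for r in table])            # [5, 20]
print("Column totals:", [sum(c) for c in zip(*table)])   # [10, 15]
```

The one-way counts (5 Americans, 20 Asians; 10 males, 15 females) are only the margins; the cell counts need the crossed data.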
Observed value
• The observed value is the frequency within each cell
• Example: 2 male Americans, 3 female Americans, 8 male Asians, 12 female Asians
Question: Is there a relationship
between gender and ethnicity?
            Male   Female
Americans   5      10
Asians      8      8
• To answer the question, we examine the expected values
• Expected values are computed on the assumption that the two variables are independent of each other
*Refer to data file
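Under independence, each expected cell count is (row total × column total) / N. The sketch below applies that rule to the Male/Female by Americans/Asians table above, using exact fractions so the arithmetic is easy to check; `expected_counts` is a hypothetical helper name.

```python
from fractions import Fraction


def expected_counts(table):
    """Expected cell counts if the row and column variables are independent:
    E[i][j] = (row total i) * (column total j) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[Fraction(r * c, n) for c in col_totals] for r in row_totals]


# Rows: Americans, Asians; columns: Male, Female
observed = [[5, 10], [8, 8]]
expected = expected_counts(observed)
for row in expected:
    print([round(float(e), 2) for e in row])
# [6.29, 8.71]
# [6.71, 9.29]
```

The expected counts keep the same row and column totals as the observed table; only the way the counts are spread across the cells changes.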
Example
• In Malaysian society, most football players are male, so the two variables (gender and being a football player) are expected to be dependent on each other
• Under this assumption, if there are 10 football players, we would expect all 10 of them to be male
Example
• If the observed values are 9 male football players, 1 male non-football player, 1 female football player and 9 female non-football players

DO THE EXPECTED VALUES DIFFER SIGNIFICANTLY FROM THE OBSERVED VALUES?

What do you think?


Example
• If the observed values are 9 male non-football players, 1 male football player, 9 female football players and 1 female non-football player

DO THE EXPECTED VALUES DIFFER SIGNIFICANTLY FROM THE OBSERVED VALUES?

What do you think?


• To determine whether the observed values in the cells deviate significantly from the expected values, we use the CHI-SQUARE TEST OF INDEPENDENCE
• A large χ² statistic suggests a significant difference between the observed and expected values
• If p < 0.05, the observed values differ significantly from the expected values; thus, the two variables are not independent of each other
• Refer to the phi value, which measures the strength of association between two categorical variables (in a 2 × 2 table)
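Applying the test to the two football scenarios makes the questions above concrete. The sketch assumes the totals implied by the observed values (10 males, 10 females, 10 football players, 10 non-players), so under independence every expected count is 10 × 10 / 20 = 5. It computes the Pearson chi-square without a continuity correction, the df = 1 p-value, and the phi coefficient; `chi_square_2x2` is a hypothetical helper name.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction), p-value and phi for a
    2 x 2 table [[a, b], [c, d]]; expected counts assume independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    # df = 1, so the upper-tail p-value is erfc(sqrt(stat / 2))
    p = math.erfc(math.sqrt(stat / 2))
    # Phi keeps the sign of the association; its square equals stat / n
    phi = (a * d - b * c) / math.sqrt(rows[0] * rows[1] * cols[0] * cols[1])
    return stat, p, phi


# Rows: male, female; columns: football player, non-football player
scenario_1 = [[9, 1], [1, 9]]   # 9 male players, 9 female non-players
scenario_2 = [[1, 9], [9, 1]]   # 9 male non-players, 9 female players

for name, table in [("Scenario 1", scenario_1), ("Scenario 2", scenario_2)]:
    stat, p, phi = chi_square_2x2(table)
    print(f"{name}: chi-square = {stat:.1f}, p = {p:.4f}, phi = {phi:+.2f}")
```

Both scenarios give χ² = 4 × (9 - 5)² / 5 = 12.8 with p < .001, so in both cases the observed values differ significantly from the expected values. The sign of phi (+0.80 versus -0.80) shows that the association runs in opposite directions.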
What do you do when the expected frequencies in the cells are less than 5?
• A key assumption of the chi-square test is that at least 80 percent of the cells have expected frequencies of 5 or more
• When the expected frequencies are small (less than 5), Fisher’s exact test can be used instead, most commonly for 2 × 2 tables
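Both the 80 percent rule and Fisher’s exact test can be sketched with exact fractions from the Python standard library. With the row and column totals fixed, the top-left count of a 2 × 2 table follows a hypergeometric distribution; the two-sided p-value below sums the probabilities of all tables no more likely than the observed one, which is one common definition (packages may differ in details). The 2 × 2 table is hypothetical, chosen so that every expected count is 2, and both helper names are illustrative.

```python
from fractions import Fraction
from math import comb


def share_of_small_expected(table, threshold=5):
    """Fraction of cells whose expected count (under independence) is below threshold."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    cells = [Fraction(r * c, n) for r in rows for c in cols]
    return Fraction(sum(e < threshold for e in cells), len(cells))


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return Fraction(comb(col1, x) * comb(n - col1, row1 - x), comb(n, row1))

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs)


table = [[3, 1], [1, 3]]                  # hypothetical small table, N = 8
print(share_of_small_expected(table))     # 1 (every expected count is 2 < 5)
print(fisher_exact_2x2(table))            # 17/35
```

Here all cells violate the rule of thumb, so the exact test is the safer choice; its p-value of 17/35 (about .49) is computed directly from the hypergeometric probabilities rather than from the χ² approximation.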
Another example of chi-square test of independence
Example
             Male   Female
Motorcycle   36     48
Car          24     72
How should the result of a chi-square test be reported according to the Publication Manual of the American Psychological Association (APA)?
*Refer to data file
• Table 1 presents the observed and expected frequencies for the gender of factory workers and the type of vehicle they use to travel to the factory. With alpha equal to 0.05, a chi-square test on these frequencies was statistically significant, χ²(1, N = 180) = 6.43, p < .05. Male workers were more likely than female workers to ride a motorcycle to the factory.
In this report you should:
• Provide the observed and expected frequencies of the responses
• Indicate the significance level selected for the test
• χ²(1, N = 180) identifies the test statistic as chi-square. The 1 in parentheses indicates that the test was based on 1 df, and N = 180 gives the total sample size
• = 6.43 gives the value of χ²
• p < .05 indicates that H0 (“The gender of the factory worker is independent of the type of vehicle used by the factory worker”) is rejected
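The reporting elements listed above can be assembled from the table itself. This minimal sketch recomputes χ² for the vehicle data (expected counts 28, 56, 32 and 64) and formats it in the χ²(df, N = n) = value, p style, dropping the leading zero of p as APA style does. `apa_chi_square` is a hypothetical helper, and it only handles 2 × 2 tables (df = 1) so that the p-value can come from `math.erfc`.

```python
import math


def apa_chi_square(table):
    """Return an APA-style string such as 'χ²(1, N = 180) = 6.43, p = .011'."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(len(rows))
        for j in range(len(cols))
    )
    df = (len(rows) - 1) * (len(cols) - 1)
    if df != 1:
        raise ValueError("this sketch only computes p for df = 1")
    p = math.erfc(math.sqrt(stat / 2))          # upper tail for df = 1
    p_text = "p < .001" if p < 0.001 else f"p = {p:.3f}".replace("0.", ".", 1)
    return f"χ²({df}, N = {n}) = {stat:.2f}, {p_text}"


# Rows: motorcycle, car; columns: male, female
print(apa_chi_square([[36, 48], [24, 72]]))
```

The statistic is 64/28 + 64/56 + 64/32 + 64/64 ≈ 6.43, and the exact p (about .011) is consistent with the p < .05 reported above.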
Other examples
Hypotheses
• H0: Children and types of toys are independent in the population sampled
• H1: Children and types of toys are related in the population sampled

*Refer to data file


Hypotheses
• H0: Teachers and views about teaching Science and Math in English are independent in the population sampled
• H1: Teachers and views about teaching Science and Math in English are related in the population sampled
*Refer to data file
Hypotheses
• H0: Students’ background and types of books are independent in the population sampled
• H1: Students’ background and types of books are related in the population sampled
*Refer to data file
Hypotheses
• H0: Children’s gender and types of toys are independent in the population sampled
• H1: Children’s gender and types of toys are related in the population sampled
*Refer to data file
