
Chapter 8: Chi-Square


Upon completion of this chapter, you should be able to:

• Explain the concept of the chi-square
• Discuss the procedure for using the chi-square
• Interpret SPSS outputs on chi-square tests

CHAPTER OVERVIEW

• Introduction
• Assumptions
• Goodness of fit test
• χ2 test for independence
• Summary
• Key Terms

Chapter 1: Introduction
Chapter 2: Descriptive Statistics
Chapter 3: The Normal Distribution
Chapter 4: Hypothesis Testing
Chapter 5: T-test
Chapter 6: Oneway Analysis of Variance
Chapter 7: Correlation
Chapter 8: Chi-Square

This chapter introduces the chi-square, a non-parametric statistical tool that can be used
without assuming normality. The goodness of fit test enables us to determine whether the
observed frequencies differ from the expected frequencies. If the observed frequencies
differ a great deal from the expected frequencies, then it is likely that there are
significant differences between the groups.

Introduction

So far we have discussed inferential statistical tools such as the t-test and ANOVA,
which demand strict adherence to certain assumptions such as normality of the
population. When the assumptions of a parametric test are seriously violated, you can
use non-parametric techniques. These tests tend to be less powerful than their
parametric counterparts.
Also, in some situations you need to use non-parametric statistics because the
variables measured are not interval or ratio but categorical, such as religion,
ethnic origin, socioeconomic class, political preference and so forth. To examine
hypotheses using such variables, the chi-square test has been widely used. In this
chapter, we will discuss this popular non-parametric test, called the CHI-SQUARE
(pronounced "kai-square") and denoted by the symbol χ2.

Assumptions

Even though certain assumptions are not critical for using the chi-square, you
need to address a number of generic assumptions:

• Random Sampling: Observations should be randomly sampled from the
population of all possible observations.

• Independence of Observations: Each observation should be generated by a
different subject, and no subject is counted twice. In other words, each subject
should appear in only one group, and the groups are not related in any way.

• Size of Expected Frequencies: When the number of cells is less than 10, and
particularly when the total sample size is small, the lowest expected frequency
required for a chi-square test is 5. However, the observed frequencies can be
any value, including zero.

In this chapter, we will discuss the use of the chi-square for:

a) One-variable χ2 (goodness-of-fit test): used when we have one variable
only.
b) χ2 test for independence (2 x 2): used when we are looking for an
association between two variables, each with two levels.

a) One-Variable χ2 or Goodness-Of-Fit Test

This test enables us to find out whether a set of Obtained (or Observed)
Frequencies differs from a set of Expected Frequencies. Usually the Expected
Frequencies are the ones that we expect to find if the null hypothesis is true. We
compare our Observed Frequencies with the Expected Frequencies and see how good
the fit is.

EXAMPLE:
Working through the computations for this example will enable you to
understand how the One-Variable χ2 or Goodness-of-Fit Test is used.

A sample of 110 teenagers was asked which of four handphone brands they
preferred. The number of teenagers choosing each brand is recorded in Table
8.1.

Brand A         Brand B         Brand C         Brand D
20 teenagers    60 teenagers    10 teenagers    20 teenagers

Table 8.1 Preferences for Brands of Handphones

We want to find out if one or more brands are preferred over others. If they are
not, then we should expect roughly the same number of people in each category.
There will not be exactly the same number of people in each category, but the numbers
should be nearly equal.
Another way of saying this is: if the null hypothesis is TRUE, and no brand is
preferred over the others, then all brands should be equally represented.
We expect roughly EQUAL NUMBERS IN EACH CATEGORY if the NULL
HYPOTHESIS is TRUE.

Expected Frequencies

There are 110 people and four categories. If the null hypothesis is true, then
we should expect 110 / 4 = 27.5 teenagers to be in each category. This is because, if
all brands of handphones are equally popular, we would expect roughly equal
numbers of people in each category. In other words, the number of teenagers should
be evenly distributed among the four brands.
• The numbers that we would find in the four categories if the null hypothesis is
true (i.e. all brands are equally popular) are called the EXPECTED FREQUENCIES.
• The numbers that we actually find in the four categories (i.e. based on the data
we collected) are called the OBSERVED FREQUENCIES.

See Table 8.2. What χ2 does is compare the Observed Frequencies with the
Expected Frequencies.

• If all brands of handphones are equally popular, the Observed Frequencies will
not differ much from the Expected Frequencies.
• If the Observed Frequencies differ a great deal from the Expected Frequencies,
then it is likely that the four brands of handphones are not all equally popular.

Table 8.2 shows the observed and expected frequencies for the four brands of
handphones. It is often difficult to tell just by looking at the data whether the
differences are large enough, which is why you have to use the χ2 test.

Column 1    Column 2    Column 3    Column 4      Column 5    Column 6
            Observed    Expected    Difference    Square      Square divided
                                                              by Expected
Brand A     20          27.5        -7.5          56.25       2.05
Brand B     60          27.5        32.5          1056.25     38.41
Brand C     10          27.5        -17.5         306.25      11.14
Brand D     20          27.5        -7.5          56.25       2.05

TOTAL                                                         53.65

Table 8.2 Expected and Observed Frequencies and the Differences

HOW DO YOU DETERMINE IF THE OBSERVED AND EXPECTED FREQUENCIES ARE SIMILAR?

Step 1:
Calculate the differences between the Observed Frequencies and the Expected
Frequencies (see Column 4). Do not worry about the minus and plus signs!

Step 2:
Square the differences (see Column 5) so that all the values become positive.

Step 3:
Divide each squared difference by the measure of variance (see Column 6). The
'measure of variance' is the Expected Frequency (i.e. 27.5). For Brand A it is
56.25 / 27.5 = 2.05; do the same for the other brands.

Step 4:
Add up the figures you obtained in Column 6 and you get 53.65. So the χ2 is 53.65.
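
Steps 1 to 4 can also be sketched in a few lines of Python. This is only an illustration, assuming equal expected frequencies as in the example; the variable names are hypothetical, and the fourth brand is labelled "Brand D" here simply to tell it apart from the third.

```python
# Observed counts from Table 8.1 (fourth brand labelled "Brand D" here).
observed = {"Brand A": 20, "Brand B": 60, "Brand C": 10, "Brand D": 20}

# Under the null hypothesis every brand is equally popular.
total = sum(observed.values())        # 110 teenagers
expected = total / len(observed)      # 110 / 4 = 27.5 per brand

chi_square = 0.0
for brand, count in observed.items():
    difference = count - expected     # Step 1: observed minus expected
    squared = difference ** 2         # Step 2: square the difference
    contribution = squared / expected # Step 3: divide by the expected frequency
    chi_square += contribution        # Step 4: add up the contributions
    print(f"{brand}: {difference:+.1f}  {squared:.2f}  {contribution:.2f}")

print(f"chi-square = {chi_square:.2f}")
```

Summing the unrounded contributions gives 53.64; Table 8.2 reports 53.65 because it adds the already rounded Column 6 entries.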

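The FORMULA for the χ2, which you applied above, is as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the Observed Frequency and E is the Expected Frequency of each category.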

Step 5:
The degrees of freedom (DF) is one less than the number of categories. In this case
DF is 4 categories – 1 = 3. We need to know this, for it is usual to report the DF
along with the χ2 and the associated probability level.

SPSS Output

HANDPHONES
Chi-Square     53.636a
Df             3
Asymp. Sig.    .000

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell
frequency is 27.5.

The χ2 value of 53.65 (rounded to 53.6) is compared with the value that would be
expected for a χ2 with 3 DF if the null hypothesis were true (i.e. all brands of
handphones are preferred equally). [SPSS will compute this comparison.] The SPSS
Output shows that with a χ2 value of 53.6 the associated probability value is less
than 0.001. This means that the probability of obtaining a difference this large by
chance alone, if the null hypothesis were true, is very small.
We can conclude that there is a significant difference between the Observed and
Expected Frequencies; i.e. the four brands of handphones are not all equally popular.
More people prefer Brand B (60) than any of the other handphone brands.
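
For readers without SPSS, the probability associated with the χ2 value can be approximated directly. The sketch below assumes 3 degrees of freedom, for which the upper-tail probability of the chi-square distribution has a closed form based on the complementary error function. The function name is hypothetical and it is not how SPSS computes its output.

```python
import math

def chi_square_p_value_df3(chi_square):
    """Upper-tail probability of a chi-square value with 3 degrees of freedom."""
    half = chi_square / 2.0
    tail = math.erfc(math.sqrt(half))
    correction = math.sqrt(2.0 * chi_square / math.pi) * math.exp(-half)
    return tail + correction

# Statistic for the handphone data: sum of (O - E)^2 / E with E = 27.5.
chi_square = sum((o - 27.5) ** 2 / 27.5 for o in (20, 60, 10, 20))
p_value = chi_square_p_value_df3(chi_square)

print(f"chi-square = {chi_square:.3f}, p = {p_value:.2e}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

The probability is far below 0.001, which is why SPSS displays it as a string of zeros.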

SPSS PROCEDURES FOR THE CHI-SQUARE TEST FOR GOODNESS OF FIT

• Select the Data menu.
• Click on Weight Cases… to open the Weight Cases dialogue box.
• Click on the Weight cases by radio button.
• Select the variable you require and click on the > button to move
the variable into the Frequency Variable: box.
• Click on OK. The message Weight On should appear on the status
bar at the bottom of the application window.
• Select the Analyze menu.
• Click on Nonparametric Tests and then Chi-Square… to open the
Chi-Square Test dialogue box.
• Select the variable you require and click on the > button to move
the variable into the Test Variable List: box.
• Click on OK.

Chi-square (χ2) enables you to discover whether there is a relationship or association
between two categorical variables. For example, is there an association between
whether students smoke or do not smoke and whether they are active or not active in
sports? This is categorical data, because we are asking whether they smoke or do not
smoke (not how many cigarettes they smoke) and whether they are active or not active in
sports. The design of the study is shown in Table 8.3, which is called a contingency
table; it is 2 x 2 because there are two rows and two columns.

                        Smoke    Do not Smoke
Not Active in Sports    50       15
Active in Sports        20       25

Table 8.3 2 x 2 Contingency Table

EXAMPLE:
Say, for example, you ask 110 students the following questions:
• How many of you smoke and are active in sports?
• How many of you smoke and are not active in sports?
• How many of you do not smoke and are active in sports?
• How many of you do not smoke and are not active in sports?

b) χ2 Test for Independence: 2 x 2

The other primary use of the chi-square test is to examine whether two
variables are independent or not. What does it mean to be independent? It means that
the two factors are not related. Typically in educational research, we are interested in
finding factors that are related, for example education and income, occupation and
prestige, or age and job satisfaction. In such cases, the chi-square can be used to
assess whether two variables are independent or not.
More generally, we say that variable Y is "not correlated with" or
"independent of" variable X if more of one is not associated with more of the other. If
two categorical variables are correlated, their values tend to move together, either in
the same direction or in opposite directions.

Example

A researcher is interested in finding out whether students from low income families
get into trouble in school more often than students from high income families. Table
8.4 documents the number of low income and high income students who have
discipline problems in school:

               Discipline    No Discipline    Total
               Problems      Problems
Low Income     46            71               117
High Income    37            83               120
Total          83            154              237

Table 8.4 Observed Frequencies

To examine statistically whether low income students get into trouble in school more
often, we need to frame the question in terms of hypotheses.

Step 1: Establish Hypotheses

The first step of the chi-square test for independence is to establish hypotheses. The
null hypothesis is that the two variables are independent or, in this particular case,
that the likelihood of having discipline problems is the same for high income and
low income students. The alternative hypothesis is that the likelihood of having
discipline problems is not the same for high income and low income students.

It is important to keep in mind that the chi-square test only tests whether two
variables are independent. It cannot address questions of which is greater or less.
Using the chi-square test, we cannot evaluate directly the hypothesis that low income
students get in trouble more than high income students; rather, the test (strictly
speaking) can only test whether the two variables are independent or not.

Step 2: Calculate the expected value for each cell of the table

As with the goodness-of-fit example described earlier, the key idea of the chi-square
test for independence is a comparison of observed and expected values. How many of
something were expected and how many were observed in some process? In the case
of tabular data, however, we usually do not know what the distribution should look
like. Rather, in this use of the chi-square test, expected values are calculated based on
the row and column totals from the table.

The expected value for each cell of the table can be calculated using the following
formula:

$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$

For example, in the table comparing the number of high income and low income
students involved in discipline problems, the expected count for the number of low
income students with discipline problems (Cell A) is:

Expected Frequency (E1) = (117 x 83) / 237 = 40.97

Similarly, the expected count for high income students with no discipline problems is:

Expected Frequency (E4) = (120 x 154) / 237 = 77.97

Use the formula to compute the Expected Frequencies for E2 and E3. Table 8.5
shows the completed expected frequencies for all four cells.

               Discipline     No Discipline    Total
               Problems       Problems
Low Income     O = 46         O = 71           117
               E1 = 40.97     E2 = 76.03
High Income    O = 37         O = 83           120
               E3 = 42.03     E4 = 77.97
Total          83             154              237

Table 8.5 Observed and Expected Frequencies
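
The expected frequencies can be checked with a short Python sketch. It assumes that each expected frequency is the row total times the column total divided by the grand total, which reproduces the values E1 = 40.97 and E4 = 77.97 worked out in the text; the variable names are hypothetical.

```python
# Observed counts from Table 8.4: each row is an income group, columns are
# (discipline problems, no discipline problems).
observed = {
    "Low Income": [46, 71],
    "High Income": [37, 83],
}

row_totals = {group: sum(counts) for group, counts in observed.items()}
column_totals = [sum(column) for column in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected frequency = row total x column total / grand total.
expected = {
    group: [row_totals[group] * col_total / grand_total for col_total in column_totals]
    for group in observed
}

for group, values in expected.items():
    print(group, [round(value, 2) for value in values])
```

Note that the expected frequencies in each row and column add up to the same totals as the observed frequencies.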



Step 3: Calculate Chi-square statistic

With these sets of figures, we calculate the chi-square statistic as follows:

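$$\chi^2 = \sum \frac{(\text{Observed Frequency} - \text{Expected Frequency})^2}{\text{Expected Frequency}}$$

In the example above, we get a chi-square statistic equal to:

$$\chi^2 = \frac{(46 - 40.97)^2}{40.97} + \frac{(71 - 76.03)^2}{76.03} + \frac{(37 - 42.03)^2}{42.03} + \frac{(83 - 77.97)^2}{77.97} \approx 1.87$$

(The value 1.87 is obtained when the expected frequencies are not rounded before the
calculation.)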
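
The same calculation can be sketched in Python, working from the observed counts in Table 8.4 and using unrounded expected frequencies. The variable names are hypothetical and the sketch is meant only as a check on the hand calculation.

```python
# Observed counts from Table 8.4 (rows: low income, high income;
# columns: discipline problems, no discipline problems).
observed = [[46, 71], [37, 83]]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected frequency of the cell from its row and column totals.
        expected = row_totals[i] * column_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

print(f"chi-square = {chi_square:.2f}")
```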

Step 4: Assess significance level

a) Degrees of Freedom

Before we can proceed we need to know how many degrees of freedom we have.
When a comparison is made between one sample and another, a simple rule is that the
degrees of freedom equal (number of columns minus one) x (number of rows minus
one) not counting the totals for rows or columns.

For our data this gives (2-1) x (2-1) = 1.

b) Statistical Significance

• We now have our chi-square statistic (χ2 = 1.87), our predetermined alpha
level of significance (0.05), and our degrees of freedom (df = 1). Entering the
chi-square distribution table with 1 degree of freedom and reading along the
row, we find that our value of χ2 = 1.87 is below 3.841 (see Table 8.6).
• When the computed χ2 statistic is less than the critical value in the table for a
0.05 probability level, we DO NOT reject the null hypothesis of equal
distributions.
• Since our χ2 statistic (1.87) is less than the critical value for the 0.05 probability
level (3.841), we DO NOT reject the null hypothesis and conclude that
students from low income families are NOT SIGNIFICANTLY more likely to
have discipline problems than students from high income families.
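
The decision rule can be sketched in Python as follows. The critical value is the one quoted from Table 8.6. The p-value line is an addition not found in the chapter: it relies on the fact that, for 1 degree of freedom, the upper-tail probability of the chi-square distribution equals erfc(sqrt(χ2 / 2)). The variable names are hypothetical.

```python
import math

ALPHA = 0.05
CRITICAL_VALUE = 3.841   # Table 8.6, df = 1, probability level 0.05
chi_square = 1.87        # statistic obtained in Step 3

# Degrees of freedom for a 2 x 2 table: (rows - 1) x (columns - 1).
df = (2 - 1) * (2 - 1)

# Upper-tail probability; this closed form is valid only for df = 1.
p_value = math.erfc(math.sqrt(chi_square / 2.0))

reject_null = chi_square > CRITICAL_VALUE
print(f"df = {df}, p = {p_value:.3f}")
print("reject the null hypothesis" if reject_null else "do not reject the null hypothesis")
```

Comparing the statistic with the critical value and comparing the p-value with alpha lead to the same decision.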

        Probability level (alpha)

Df      0.50     0.10     0.05      0.02      0.01      0.001
1       0.455    2.706    3.841     5.412     6.635     10.827
2       1.386    4.605    5.991     7.824     9.210     13.815
3       2.366    6.251    7.815     9.837     11.345    16.268
4       3.357    7.779    9.488     11.668    13.277    18.465
5       4.351    9.236    11.070    13.388    15.086    20.517

Table 8.6 Extract from the Table of χ2 Critical Values
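
Reading the table can be mimicked with a small lookup in Python. The values below are copied from Table 8.6; the names `ALPHAS`, `CRITICAL_VALUES`, `critical_value` and `is_significant` are hypothetical helpers introduced only for this sketch.

```python
# Probability levels (columns) and critical values (rows by df) from Table 8.6.
ALPHAS = (0.50, 0.10, 0.05, 0.02, 0.01, 0.001)
CRITICAL_VALUES = {
    1: (0.455, 2.706, 3.841, 5.412, 6.635, 10.827),
    2: (1.386, 4.605, 5.991, 7.824, 9.210, 13.815),
    3: (2.366, 6.251, 7.815, 9.837, 11.345, 16.268),
    4: (3.357, 7.779, 9.488, 11.668, 13.277, 18.465),
    5: (4.351, 9.236, 11.070, 13.388, 15.086, 20.517),
}

def critical_value(df, alpha):
    """Return the tabled critical value for the given df and probability level."""
    return CRITICAL_VALUES[df][ALPHAS.index(alpha)]

def is_significant(chi_square, df, alpha=0.05):
    """True when the statistic exceeds the tabled critical value."""
    return chi_square > critical_value(df, alpha)

print(is_significant(53.65, df=3))  # handphone example
print(is_significant(1.87, df=1))   # discipline example
```

The lookup only covers the degrees of freedom and probability levels listed in the extract; any other combination raises an error rather than guessing a value.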



SPSS PROCEDURES FOR THE CHI-SQUARE TEST FOR RELATEDNESS OR INDEPENDENCE

• Select the Analyze menu.

• Click on Descriptive Statistics and then on Crosstabs… to open the
Crosstabs dialogue box.

• Select a row variable and click on the > button to move the variable into the
Row(s): box.

• Select a column variable and click on the > button to move the variable into
the Column(s): box.

• Click on the Statistics… command pushbutton to open the Crosstabs:
Statistics sub-dialogue box.

• Click on the Chi-square box.

• Click on Continue.

• Click on the Cells… command pushbutton to open the Crosstabs: Cell
Display sub-dialogue box.

• In the Counts box, click on the Observed and Expected check boxes.

• In the Percentages box, click on the Row, Column and Total check boxes.

• Click on Continue and then OK.



LEARNING ACTIVITY

            10-14 years    15-19 years    20-24 years    25-29 years
Observed    72             31             15             50

Look at the table above:

• What is the value of the expected frequencies?

LEARNING ACTIVITY

A study was conducted to determine if science and mathematics should be taught
in English. A total of 105 parents were asked to respond 'yes' or 'no'. The data
were categorised according to whether the parents were from an urban or rural
area and are shown in the table below:

         Yes    No    Total
Urban    36     14    50
Rural    30     25    55
Total    66     39    105

• What is the null hypothesis? What is the alternative hypothesis?
• How many degrees of freedom are there?
• What is the value of the chi-square statistic for this table?
• What is the p-value of this statistic?

SUMMARY

• In some situations one needs to use non-parametric statistics because the
variables measured are not interval or ratio but categorical.

• The goodness of fit test enables us to find out whether a set of Obtained (or
Observed) Frequencies differs from a set of Expected Frequencies.

• Chi-square (χ2) enables you to discover whether there is a relationship or
association between two categorical variables.

• Chi-square (χ2) compares the Observed Frequencies with the Expected
Frequencies.

• If the Observed Frequencies differ a great deal from the Expected Frequencies,
then it is likely that there are significant differences.

• For the goodness of fit test, the degrees of freedom (DF) is one less than the
number of categories. For the test for independence, it is (number of rows minus
one) x (number of columns minus one).

• The chi-square test for independence is used to examine whether two variables
are independent or not; i.e. whether or not the two factors are related.

KEY WORDS:

• Goodness of fit
• Chi-square
• Test of independence
• Observed frequencies
• Expected frequencies
• Row total
• Column total
• Degrees of freedom

--------000--------
