Chi-Square Analysis (Ch. 8)

Chi-square test of association (contingency)
  2x2 tables
  rxc tables
  Post-hoc
  Interpretation
  Running SPSS Windows CROSSTABS
Chi-square test of goodness of fit

Purpose
Chi-square test of association
2X2 associations (i.e., relation between two dichotomous variables)

Examples
  Gender (m/f) x Experience of physical aggression in past year (yes/no)
  First Language (English / Not English) x Getting Question on Test Correct (Correct/Incorrect)

Purpose
Chi-square test of association
RxC associations (i.e., more categories than 2x2)

Examples
  Socioeconomic Status x Vehicle Brand
  Age Group x Preferred Music Genre

Example of a 2X2
Testing the association between Beer Consumption and Gender
Null hypothesis: No association
  Proportion of cases in one cell to the marginal (e.g., 44/70) = proportion of the marginal on the other variable to the total (e.g., 54/104)

                 Drink Beer
Gender        Yes      No    Total
M              44      10       54
F              26      24       50
Total          70      34      104

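Example of a 2X2: Calculating Expected Values

That is, under H0:

$$\frac{E_{11}}{C_1} = \frac{R_1}{T} \quad\Rightarrow\quad \frac{E_{11}}{70} = \frac{54}{104} \quad\Rightarrow\quad E_{11} = 36.3$$

Once we have calculated one expected value, the others follow by subtraction from the marginal totals. For example, the expected count for the female/yes cell is $E_{21} = 70 - 36.3 = 33.7$.

Observed counts, with expected counts in parentheses:

                     Drink Beer
Gender        Yes           No          Total
M          44 (36.3)    10 (17.7)         54
F          26 (33.7)    24 (16.3)         50
Total      70           34               104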
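The expected-value step can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `expected_counts` and the variable names are assumptions, and the rule used is the usual one for independence, row total times column total divided by the grand total.

```python
# Expected counts for a two-way table under the null hypothesis of no
# association: E[i][j] = (row i total * column j total) / grand total.
# The function and variable names are illustrative, not from the slides.

def expected_counts(observed):
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

# Rows: male, female. Columns: drinks beer yes, no (counts from the slide).
beer = [[44, 10],
        [26, 24]]

expected = expected_counts(beer)
for row in expected:
    print([round(e, 1) for e in row])
# prints [36.3, 17.7] then [33.7, 16.3]
```

Computing every cell from the marginals gives the same values as computing one cell and subtracting, because the expected counts must add up to the same row and column totals as the observed counts.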

Example of a 2X2: Calculating Chi-Square value

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

Using the observed and expected counts from the Gender x Drink Beer table:

$$\chi^2 = \frac{(44 - 36.3)^2}{36.3} + \dots + \frac{(24 - 16.3)^2}{16.3} = 10.3$$

df = (r - 1)(c - 1) = 1
Critical value at .01 = 6.64 (see Table E in book)
Report: $\chi^2(1) = 10.3$, p < .01
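As a rough check on the hand calculation, the same statistic can be computed in Python. This is a sketch under stated assumptions: `pearson_chi_square` is a hypothetical helper name, and the p-value line relies on the identity that, for df = 1 only, the upper-tail probability of a chi-square value x equals erfc(sqrt(x / 2)).

```python
import math

def pearson_chi_square(observed):
    """Return (chi-square, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total  # expected count
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Rows: male, female. Columns: drinks beer yes, no (counts from the slide).
beer = [[44, 10],
        [26, 24]]

chi2, df = pearson_chi_square(beer)

# Valid for df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2({df}) = {chi2:.3f}, p < .01: {p_value < 0.01}")
```

With unrounded expected counts the statistic comes out slightly below the 10.3 obtained by hand, in line with the 10.255 that SPSS reports for the same table.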

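Example of a 2X2: Test of Proportion

In the case of a 2x2, instead of a chi-square test, you could use a test of proportions. For example, we could compare the proportion of male beer drinkers (44/54 = .815) with the proportion of female beer drinkers (26/50 = .520).

$$Z = \frac{p_1 - p_2}{\sqrt{pq\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}}, \qquad p = \frac{n_1 + n_2}{N_1 + N_2}, \qquad q = 1 - p$$

$$p = \frac{44 + 26}{54 + 50} = \frac{70}{104} = .673, \qquad q = .327$$

$$Z = \frac{.815 - .520}{\sqrt{(.673)(.327)\left(\frac{1}{54} + \frac{1}{50}\right)}} = 3.204, \quad p < .01$$

Zcrit @ .01 = 2.58

                 Drink Beer
Gender        Yes      No    Total
M              44      10       54
F              26      24       50
Total          70      34      104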
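The test of proportions can be sketched the same way. The helper name `two_proportion_z` is an assumption made for illustration; the formula is the pooled one given on the slide. Squaring Z gives the Pearson chi-square for the same 2x2 table, which is why the two tests lead to the same conclusion.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z statistic (x = successes, n = group size)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)   # pooled proportion
    q = 1 - p
    se = math.sqrt(p * q * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 44 of 54 men and 26 of 50 women drink beer (counts from the slide).
z = two_proportion_z(44, 54, 26, 50)

# Two-sided p-value from the standard normal distribution.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"Z = {z:.2f}, Z squared = {z ** 2:.2f}, p < .01: {p_two_sided < 0.01}")
```

Because this sketch uses unrounded proportions, Z differs in the third decimal from the 3.204 obtained on the slide with rounded inputs.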

Assumptions of Chi-Square Test

The sampling distribution of the O - E deviations is normal
  Potential problem if expected values are really small
Data points must be independent of each other
  A subject contributes only once to the frequency count

What to do with small expected frequencies?

Yates' correction (not recommended)
Cochran's rule
  All expected frequencies greater than 1
  No more than 20% should be less than 5
  For example, in a 2X2, if you have one cell with an expected frequency smaller than five (1/4 = 25%), you have violated Cochran's rule
Collapse cells when possible (i.e., combine categories)
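Cochran's rule as stated above can be written as a small check. This is only a sketch: the function name, the parameter names, and the second table (made-up expected counts) are assumptions for illustration.

```python
def satisfies_cochran(expected, min_count=1.0, small=5.0, max_share=0.20):
    """True if all expected counts exceed min_count and no more than
    max_share of them fall below `small`."""
    cells = [e for row in expected for e in row]
    all_above_min = all(e > min_count for e in cells)
    share_small = sum(e < small for e in cells) / len(cells)
    return all_above_min and share_small <= max_share

# Expected counts from the beer example.
beer_expected = [[36.3, 17.7],
                 [33.7, 16.3]]

# Hypothetical 2x2 with one expected count below 5 (1/4 = 25% of cells).
hypothetical = [[4.0, 6.0],
                [16.0, 24.0]]

print(satisfies_cochran(beer_expected))  # True
print(satisfies_cochran(hypothetical))   # False
```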

[Figure: χ² distribution for different dfs]

Using SPSS
The data can be in 2 forms:
By category

gender    beer    frequency
  1         1        44
  1         2        10
  2         1        26
  2         2        24

Gender: male = 1, female = 2
Beer: yes = 1, no = 2

Using SPSS
or, by subject (would be 104 rows)

gender    beer
  1         1
  1         2
  1         2
  .         .
  .         .
  .         .
  2         2
  2         1

Gender: male = 1, female = 2
Beer: yes = 1, no = 2

Using SPSS
Note: if you input the data by category (with a frequency variable), you must do the following in the Data window:
Data > Weight Cases > Weight cases by (freq var)
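The relation between the two data layouts can be illustrated outside SPSS. In this sketch (variable names are assumptions), each by-category row is repeated as many times as its frequency, which yields the by-subject layout; counting the rows again recovers the frequencies. Weighting cases by the frequency variable has the same effect on the analysis as this expansion.

```python
from collections import Counter

# By-category layout from the slide: (gender, beer, frequency).
# gender: 1 = male, 2 = female; beer: 1 = yes, 2 = no.
by_category = [(1, 1, 44), (1, 2, 10), (2, 1, 26), (2, 2, 24)]

# By-subject layout: one (gender, beer) row per person.
by_subject = [(gender, beer)
              for gender, beer, freq in by_category
              for _ in range(freq)]

# Tallying the subject rows gives back the original frequencies.
counts = Counter(by_subject)

print(len(by_subject))   # 104
print(counts[(1, 1)])    # 44
```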

Using SPSS
Analyze > Descriptive Statistics > Crosstabs
In Crosstabs, click on
  Chi-square under Statistics
  Observed, Expected, and Unstandardized Residuals under Cells

Using SPSS
Gender * Beer Crosstabulation

                                      Beer
Gender                          yes       no     Total
male      Count                  44       10        54
          Expected Count       36.3     17.7      54.0
          Residual              7.7     -7.7
female    Count                  26       24        50
          Expected Count       33.7     16.3      50.0
          Residual             -7.7      7.7
Total     Count                  70       34       104
          Expected Count       70.0     34.0     104.0

Using SPSS
Chi-Square Tests

                                  Value        df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                    (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square                10.255 (b)    1   .001
Continuity Correction (a)          8.959        1   .003
Likelihood Ratio                  10.467        1   .001
Fisher's Exact Test                                               .002         .001
Linear-by-Linear Association      10.156        1   .001
N of Valid Cases                  104

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 16.35.

R x C Example
χ² with variables having > 2 levels
  First step is the same
  Might want to do post hoc tests to further understand the association
    Look at table and describe the association (focus on large residuals)
    Or pick out specific cells (2 x 2) and test
    Or collapse cells to make a 2 x 2 and test

R x C Example: Adjusting the Type I Error Rate

Make adjustment for increased chance of Type I error in post hoc tests
Can use Bonferroni adjustment (when constructing a 2 x 2 table from existing cells)
  k = # of 2 x 2 tables that can be made from an r x c table

$$k = \frac{r!}{2!\,(r-2)!} \times \frac{c!}{2!\,(c-2)!}$$

  Use $\alpha = .05/k$
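The count k is the number of ways to pick 2 of the r rows times the number of ways to pick 2 of the c columns, so it can be computed with binomial coefficients. The function names below are assumptions made for this sketch; `math.comb` requires Python 3.8 or later.

```python
import math

def count_2x2_subtables(r, c):
    """Number of 2 x 2 tables that can be formed from an r x c table."""
    return math.comb(r, 2) * math.comb(c, 2)

def bonferroni_alpha(r, c, alpha=0.05):
    """Per-test Type I error rate after the Bonferroni adjustment."""
    return alpha / count_2x2_subtables(r, c)

# A 3 x 3 table gives 3 * 3 = 9 possible 2 x 2 tables.
print(count_2x2_subtables(3, 3))          # 9
print(round(bonferroni_alpha(3, 3), 4))   # 0.0056
```

The adjusted rate shrinks quickly as the table grows, which is one reason to plan a small number of specific contrasts rather than test every possible 2 x 2.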

R x C Example: Obtained and Expected Frequencies

residenc * yr_study Crosstabulation

                                                  yr_study
residenc                             first   second    third   fourth    Total
res               Count                421      182       56       23      682
                  Expected Count     181.5    177.7    179.7    143.1    682.0
                  Residual           239.5      4.3   -123.7   -120.1
myself            Count                 20       42       67       56      185
                  Expected Count      49.2     48.2     48.8     38.8    185.0
                  Residual           -29.2     -6.2     18.2     17.2
home-parents      Count                130      143      133      103      509
                  Expected Count     135.4    132.7    134.2    106.8    509.0
                  Residual            -5.4     10.3     -1.2     -3.8
roommate          Count                 45      229      319      258      851
                  Expected Count     226.4    221.8    224.3    178.5    851.0
                  Residual          -181.4      7.2     94.7     79.5
spouse-partner    Count                 17       24       52       59      152
                  Expected Count      40.4     39.6     40.1     31.9    152.0
                  Residual           -23.4    -15.6     11.9     27.1
Total             Count                633      620      627      499     2379
                  Expected Count     633.0    620.0    627.0    499.0   2379.0

R x C Example: Chi-Square Test

Chi-Square Tests

                                  Value          df   Asymp. Sig. (2-sided)
Pearson Chi-Square                803.377 (a)    12   .000
Likelihood Ratio                  870.058        12   .000
Linear-by-Linear Association      587.664         1   .000
N of Valid Cases                  2379

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 31.88.

R x C Example: Examine the Cells

In the residenc * yr_study crosstabulation, the first year / res cell stands out: Count = 421, Expected Count = 181.5, Residual = 239.5.

Contribution of cell (first year-residence) to the chi-square value:

$$\frac{(O - E)^2}{E} = \frac{239.5^2}{181.5} = 316.03$$

$$\frac{316.03}{803.38} = 39.3\%$$

Conclusion: A large proportion of the chi-square can be explained by the fact that there is a very large proportion of first-year students who live in residence.
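The cell-by-cell inspection can be sketched in Python by computing each cell's contribution (O - E)^2 / E and its share of the total. The counts are those of the residenc * yr_study crosstabulation; the function and variable names are assumptions made for illustration.

```python
def cell_contributions(observed):
    """Matrix of (O - E)^2 / E, one entry per cell of a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[(o - r * c / total) ** 2 / (r * c / total)
             for o, c in zip(row, col_totals)]
            for row, r in zip(observed, row_totals)]

# Rows: res, myself, home-parents, roommate, spouse-partner.
# Columns: first, second, third, fourth year of study.
residence = [[421, 182,  56,  23],
             [ 20,  42,  67,  56],
             [130, 143, 133, 103],
             [ 45, 229, 319, 258],
             [ 17,  24,  52,  59]]

contrib = cell_contributions(residence)
chi2 = sum(sum(row) for row in contrib)
share = contrib[0][0] / chi2   # first year / res cell

print(f"chi2 = {chi2:.1f}; first year / res share = {share:.0%}")
```

Sorting the contributions from largest to smallest is a quick way to see which cells drive the association; here the roommate / first year cell is the next largest after first year / res.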

2 x 2 Posthoc: Examine Specific Contrast

Extract a 2 x 2 table of interest, or
Collapse categories to form a 2 x 2 table (the example follows this second approach)
  In SPSS you can use the command RECODE to form new categories
  I did a 2 x 2 analysis in which I collapsed all non-first-year students into one category and all students not living in residence into another
  I did this in the syntax menu using the following commands:

    recode yr_study (1=1) (2 thru hi = 2) into year.
    recode residenc (1=1) (2 thru hi = 2) into resid.
    execute.
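For readers working outside SPSS, the effect of the two RECODE commands on the frequency table can be sketched in Python: the first row and first column are kept, and all other rows and columns are summed together. The function name and its arguments are assumptions made for this illustration.

```python
def collapse_to_2x2(table, keep_row=0, keep_col=0):
    """Collapse an r x c table to 2 x 2: the kept row/column versus
    all remaining rows/columns added together."""
    collapsed = [[0, 0], [0, 0]]
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            new_i = 0 if i == keep_row else 1
            new_j = 0 if j == keep_col else 1
            collapsed[new_i][new_j] += count
    return collapsed

# Rows: res, myself, home-parents, roommate, spouse-partner.
# Columns: first, second, third, fourth year of study.
residence = [[421, 182,  56,  23],
             [ 20,  42,  67,  56],
             [130, 143, 133, 103],
             [ 45, 229, 319, 258],
             [ 17,  24,  52,  59]]

print(collapse_to_2x2(residence))
# prints [[421, 261], [212, 1485]]
```

The result is res versus other housing in the rows and first year versus years 2-4 in the columns.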

2 x 2 Posthoc: Expected and Obtained Frequencies

resid * year Crosstabulation

                                                year
resid                             first year   2-4 years     Total
res              Count                   421         261       682
                 Expected Count        181.5       500.5     682.0
                 Residual              239.5      -239.5
other housing    Count                   212        1485      1697
                 Expected Count        451.5      1245.5    1697.0
                 Residual             -239.5       239.5
Total            Count                   633        1746      2379
                 Expected Count        633.0      1746.0    2379.0

2 x 2 Posthoc Bonferroni Adjustment

$$k = \frac{r!}{2!\,(r-2)!} \times \frac{c!}{2!\,(c-2)!}$$

$$k = \frac{5 \times 4 \times 3 \times 2 \times 1}{(2 \times 1)(3 \times 2 \times 1)} \times \frac{4 \times 3 \times 2 \times 1}{(2 \times 1)(2 \times 1)} = 10 \times 6 = 60$$

$$\alpha = \frac{.05}{60} = .0008$$

In our example, we collapsed a number of categories. Therefore, we would not use the above adjustment. Gardner indicates that there is no specific meaningful Bonferroni adjustment when categories are collapsed and suggests, at a minimum, using a Type I error rate of .01.

2 x 2 Posthoc: Chi-Square Test

Chi-Square Tests

                                  Value          df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                      (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square                603.957 (b)     1   .000
Continuity Correction (a)         601.438         1   .000
Likelihood Ratio                  570.564         1   .000
Fisher's Exact Test                                                 .000         .000
Linear-by-Linear Association      603.703         1   .000
N of Valid Cases                  2379

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 181.47.

Chi Square Test of Goodness of Fit

How closely a set of obtained frequencies compares to expected frequencies (based on theory or previous information)
A significant test indicates badness of fit
Use the same formula:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

Example (Goodness of fit)

You conduct a study to evaluate the frequency of alcohol consumption of university students. You want to determine whether your distribution differs from previous findings suggesting the following distribution:

Category                        %
Never                          5.0
< Once per month              22.1
1-3 times per month           31.6
Once per week                 18.7
More than once per week*      22.6

*I collapsed three categories (2-3 times per week, 4-6 times per week, and every day)

Example (Goodness of fit)

Category                        %     Obtained   Expected
Never                          5.0       15        10
< Once per month              22.1       40        44.2
1-3 times per month           31.6       50        63.2
Once per week                 18.7       35        37.4
More than once per week*      22.6       60        45.2
Total                                   200       200

Example (Goodness of fit)

Obtained    15    40      50      35      60      Total 200
Expected    10    44.2    63.2    37.4    45.2    Total 200

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

$$\chi^2 = \frac{(15 - 10)^2}{10} + \dots + \frac{(60 - 45.2)^2}{45.2} = 10.66$$

df = number of categories - 1 = 4
Gardner recommends Type I error rate of .20
Critical value at .20 = 5.99
Reject null of good fit: $\chi^2(4) = 10.66$, p < .20
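The goodness-of-fit calculation can be sketched in Python as follows. Variable names are assumptions made for illustration. The p-value line uses a closed form that holds for df = 4 only: the upper-tail probability of a chi-square value x is exp(-x / 2) * (1 + x / 2).

```python
import math

# Reference distribution (percent) and obtained counts from the slides.
percent = [5.0, 22.1, 31.6, 18.7, 22.6]
obtained = [15, 40, 50, 35, 60]

n = sum(obtained)                          # 200 students
expected = [p / 100 * n for p in percent]  # 10, 44.2, 63.2, 37.4, 45.2

chi2 = sum((o - e) ** 2 / e for o, e in zip(obtained, expected))
df = len(obtained) - 1

# Valid for df = 4 only: P(chi-square > x) = exp(-x / 2) * (1 + x / 2).
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

critical_20 = 5.99  # critical value at .20 for df = 4, as given on the slide
print(f"chi2({df}) = {chi2:.2f}, exceeds critical value: {chi2 > critical_20}")
```

Note that the expected counts come from the reference percentages, not from the marginals of a table, which is the main difference from the test of association.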
