
Clarifying the Concepts

15-1. Distinguish nominal, ordinal, and scale data.

Answer: Nominal data are categorical; they cannot be ordered in any meaningful way, and they are often thought of as simply named. Ordinal data can be ordered, but we cannot assume equal distances between points of equal separation. For example, the difference between the second and third scores may not be the same as the difference between the seventh and eighth. Scale data are measured at either the interval or ratio level; we can assume equal intervals between points along these measures.

15-3. What is the difference between the chi-square test for goodness-of-fit and the chi-square test for independence?

Answer: The chi-square test for goodness-of-fit is a nonparametric hypothesis test used with one nominal variable. The chi-square test for independence is a nonparametric test used with two nominal variables.

15-5. List two ways in which statisticians use the word independence or independent with respect to concepts introduced earlier in this book. Then describe how independence is used by statisticians with respect to chi square.

Answer: Throughout the book, we have referred to independent variables, those variables that we hypothesize to have an effect on the dependent variable. We also described how statisticians refer to observations that are independent of one another, such as a between-groups research design requiring that observations be taken from independent samples. With regard to chi square, independence takes on a similar meaning. We are testing whether the effect of one variable is independent of the other: that the proportion of cases across the levels of one variable does not depend on the levels of the other variable.

15-7. How are the degrees of freedom for the chi-square hypothesis tests different from those of most other hypothesis tests?

Answer: In most previous hypothesis tests, the degrees of freedom have been based on sample size. For the chi-square hypothesis tests, however, the degrees of freedom are based on the numbers of categories, or cells, in which participants can be counted. For example, the degrees of freedom for the chi-square test for goodness-of-fit is the number of categories minus 1: $df_{\chi^2} = k - 1$, where k is the symbol for the number of categories.

15-9. What information is presented in a contingency table in the chi-square test for independence?

Answer: The contingency table presents the observed frequencies for each cell in the study.
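The two degrees-of-freedom rules above, together with the expected-frequency rule used for the test for independence (row total times column total, divided by N), can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the 2 x 2 counts in the usage lines are made up rather than taken from any exercise.

```python
# Illustrative sketch: degrees of freedom for the two chi-square tests and
# expected frequencies for a contingency table. Function names are hypothetical.


def df_goodness_of_fit(k: int) -> int:
    """df for the goodness-of-fit test: number of categories minus 1."""
    return k - 1


def df_independence(k_row: int, k_column: int) -> int:
    """df for the test for independence: (k_row - 1)(k_column - 1)."""
    return (k_row - 1) * (k_column - 1)


def expected_frequencies(observed: list[list[float]]) -> list[list[float]]:
    """Expected count for each cell = (row total)(column total) / N."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]


if __name__ == "__main__":
    print(df_goodness_of_fit(3))   # three categories
    print(df_independence(2, 2))   # a 2 x 2 contingency table
    # Made-up 2 x 2 observed counts, for illustration only.
    print(expected_frequencies([[10, 20], [30, 40]]))
```

Multiplying (rather than adding) the row and column degrees of freedom is what the test for independence requires, which is the point of Exercise 15-19b.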

15-11. Define the symbols in the following formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$

Answer: This is the formula for the chi-square statistic. O is the observed frequency in a cell, E is the expected frequency for that cell, and Σ tells us to sum across all cells. The statistic is the sum, across cells, of the squared difference between each observed frequency and its matching expected frequency, divided by the expected frequency for that cell.

15-13. Why do we sometimes convert scale data to ordinal data?

Answer: When we are concerned about meeting the assumptions of a parametric test, we can convert scale data to ordinal data and use a nonparametric test.

15-15. When do we use the Mann-Whitney U test?

Answer: We use the Mann-Whitney U test when there are two groups, a between-groups design, and an ordinal dependent variable.

15-17. Explain how the relation between ranks is the core of the Spearman rank-order correlation.

Answer: In all correlations, we assess the relative position of a score on one variable with its position on the other variable. In the case of the Spearman rank-order correlation, we examine how ranks on one variable relate to ranks on the other variable. For example, with a positive correlation, scores that rank low on one variable tend to rank low on the other, and scores that rank high on one variable tend to rank high on the other. For a negative correlation, low ranks on one variable tend to be associated with high ranks on the other.

Calculating the Statistics

15-19. In order to compute statistics, we need to have working formulas. For each of the following, (i) identify the incorrect symbol, (ii) state what the correct symbol should be, and (iii) explain why the initial symbol was incorrect.

a. For the chi-square test for goodness-of-fit: $df_{\chi^2} = N - 1$

b. For the chi-square test for independence: $df_{\chi^2} = (k_{row} - 1) + (k_{column} - 1)$

c.

d.

e.

f.

g.

Answer: a. (i) N is incorrect. (ii) k is the correct symbol. (iii) Degrees of freedom for the chi-square test for goodness-of-fit are based on the number of categories, symbolized by k, not on the sample size. b. (i) + is incorrect. (ii) The multiplication symbol is the correct symbol. (iii) When obtaining the degrees of freedom for the chi-square test for independence, we multiply the degrees of freedom associated with each variable. c. (i) M is incorrect. (ii) O is the correct symbol. (iii) Calculation of chi square involves calculating the difference between observed (O) and expected (E) frequencies.

d. (i) k is incorrect. (ii) df is the correct symbol. (iii) Calculation of Cramér's V involves dividing by the degrees of freedom, not the number of groups. e. (i) Both ks are incorrect. (ii) Total is the correct symbol. (iii) Calculation of the expected values is based on the total counts for the rows and the columns, not the numbers of categories. f. (i) The r is incorrect. (ii) $r_S$ is the correct symbol. (iii) This is the formula for the Spearman rank-order correlation, which requires the subscript S. g. (i) $R_1^2$ is incorrect. (ii) $R_1$ is the correct symbol. (iii) In the Mann-Whitney U test, we do not square the ranks before we sum them; we just sum the ranks.

15-21. Use this calculation table for the chi-square test for goodness-of-fit to complete this exercise.

a. Calculate the degrees of freedom for this chi-square test for goodness-of-fit. b. Perform all of the calculations to complete this table. c. Compute the chi-square statistic.

Answer: a. $df_{\chi^2} = k - 1 = 3 - 1 = 2$ b.

15-23. Below are some data to use in a chi-square test for independence. Calculate the degrees of freedom for this test.

Answer: $df_{\chi^2} = (k_{row} - 1)(k_{column} - 1) = (2 - 1)(2 - 1) = 1$

15-25. Using the data presented in Exercise 15-23 and the work you did in Exercise 15-24, calculate the test statistic.

Answer:

15-27. Convert the following scale data to ordinal or ranked data, starting with a rank of 1 for the smallest data point.

Answer:

15-29. Compute the Spearman correlation for the data listed in Exercise 15-27. Answer:

15-31. Compute the Mann-Whitney U test on the following data:

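Answer: $\Sigma R_{group1} = 1 + 2.5 + 8 + 4 + 6 + 10 = 31.5$ and $\Sigma R_{group2} = 11 + 9 + 2.5 + 5 + 7 + 12 = 46.5$. The formula for the first group is:

$U_1 = (n_1)(n_2) + \frac{n_1(n_1 + 1)}{2} - \Sigma R_1 = (6)(6) + \frac{6(6 + 1)}{2} - 31.5 = 25.5$

The formula for the second group is:

$U_2 = (n_1)(n_2) + \frac{n_2(n_2 + 1)}{2} - \Sigma R_2 = (6)(6) + \frac{6(6 + 1)}{2} - 46.5 = 10.5$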
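The same rank-sum arithmetic can be sketched in Python. The helper name `mann_whitney_u` is hypothetical; the ranks are the ones listed in this answer, with the tie already resolved to 2.5 for both tied scores.

```python
def mann_whitney_u(n1, n2, rank_sum_1, rank_sum_2):
    """Return (U1, U2), where U_i = n1*n2 + n_i(n_i + 1)/2 - (sum of ranks in group i)."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - rank_sum_1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - rank_sum_2
    return u1, u2


group1_ranks = [1, 2.5, 8, 4, 6, 10]
group2_ranks = [11, 9, 2.5, 5, 7, 12]

u1, u2 = mann_whitney_u(
    len(group1_ranks), len(group2_ranks), sum(group1_ranks), sum(group2_ranks)
)

# Consistency check: the two U values always add up to n1 * n2.
assert u1 + u2 == len(group1_ranks) * len(group2_ranks)

# The test statistic is the smaller of the two U values.
test_statistic = min(u1, u2)
print(u1, u2, test_statistic)
```

The U1 + U2 = n1n2 check is a cheap way to catch an arithmetic slip in either rank sum before comparing the smaller U with the critical value.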

Our test statistic would be 10.5, because it is the smaller of the two.

Applying the Concepts

15-33. For each of the following research questions, state whether a parametric or nonparametric hypothesis test is more appropriate. Explain your answers. a. Are women more or less likely than men to be economics majors? b. At a small company with 15 staff and 1 top boss, do those with a college education tend to make a different amount of money from those without one? c. At your high school, did athletes or nonathletes tend to have higher grade point averages? d. At your high school, did athletes or nonathletes tend to have higher class ranks? e. Compare car accidents in which the occupants were wearing seat belts with accidents in which the occupants were not wearing seat belts. Do seat belts seem to make a difference in the numbers of accidents that lead to no injuries, nonfatal injuries, and fatal injuries? f. Compare car accidents in which the occupants were wearing seat belts with accidents in which the occupants were not wearing seat belts. Were those wearing seat belts driving at slower speeds, on average, than those not wearing seat belts?

Answer: a. A nonparametric test would be appropriate because both of the variables are nominal: gender and major. b. A nonparametric test is more appropriate for this question because the sample size is small and the data are unlikely to be normal; the top boss is likely to have a much higher income than the other employees. This outlier would lead to a nonnormal distribution. c. A parametric test would be appropriate because the independent variable (type of student: athlete versus nonathlete) is nominal and the dependent variable (grade point average) is scale. d. A nonparametric test would be appropriate because the independent variable (athlete versus nonathlete) is nominal and the dependent variable (class rank) is ordinal. e. A nonparametric test would be appropriate because the research question is about the relation between two nominal variables: seat-belt wearing and degree of injuries. f. A parametric test would be appropriate because the independent variable (seat-belt use: no seat belt versus seat belt) is nominal and the dependent variable (speed) is scale.

15-35. A New York Times article on grade inflation reported several findings related to a tendency for average grades to rise over the years and a tendency for the top-ranked institutions to give the highest average grades (Archibold, 1998). For each of the findings outlined below, state (i) the independent variable or variables, and their levels where appropriate; (ii) the dependent variable(s); and (iii) what category of research design is being used:

I: scale independent variable(s) and scale dependent variable
II: nominal independent variable(s) and scale dependent variable
III: only nominal variables

Explain your answer to part (iii). a. In 1969, 7% of all grades were As; in 1994, 25% of all grades were As. b. The average GPA for the graduating students of elite schools is 3.2; the average GPA for graduating students at selective schools (the level below elite schools) is 3.04; and the average GPA for graduating students at state colleges is 2.95. c. At Dartmouth College, an elite university, SAT scores of incoming students have increased along with their subsequent college GPAs (perhaps an explanation for grade inflation).

Answer: a. (i) Year. (ii) Grades received. (iii) This is a category III research design because the independent variable, year, is nominal and the dependent variable, grade (A or not), could also be considered nominal. b. (i) Type of school. (ii) Average GPA of graduating students. (iii) This is a category II research design because the independent variable, type of school, is nominal and the dependent variable, GPA, is scale. c. (i) SAT scores of incoming students. (ii) College GPA. (iii) This is a category I research design because both the independent variable and the dependent variable are scale.

15-37. CNN.com reported on a 2005 study that ranked the world's cities in terms of how livable they are using a range of criteria related to stability, health care, culture and environment, education, and infrastructure. Vancouver came out on top. For each of the following research questions, state which nonparametric hypothesis test is most appropriate: Spearman rank-order correlation coefficient or Mann-Whitney U test. Explain your answers. a. Which cities tend to receive higher rankings: those north or south of the equator? b. Are the livability rankings related to a city's economic status (assessed by rank)?

Answer: a. The Mann-Whitney U test would be most appropriate because it is a nonparametric equivalent to the independent-samples t test. It is used when we have a nominal independent variable with two levels (here, north and south of the equator), a between-groups research design, and an ordinal dependent variable (here, the ranking of the city). b. The Spearman rank-order correlation would be most appropriate because we are asking a question about the relation between two ordinal variables.

15-39. Across all of India, there are only 933 girls for every 1000 boys (Lloyd, 2006), evidence of a bias that leads many parents to illegally select for boys or to kill their infant girls. (Note that this translates into a proportion of girls of 0.483.) In Punjab, a region of India in which residents tend to be more educated than in other regions, there are only 798 girls for every 1000 boys. Assume that you are a researcher interested in whether sex selection is more or less prevalent in educated regions of India and that 1798 children from Punjab constitute the entire sample. (Hint: You will use the proportions from the national database for comparison.) a. How many variables are there in this study? What are the levels of any variable you identified? b. What hypothesis test would be used to analyze these data? Justify your answer. c. Conduct the six steps of hypothesis testing for this example. (Note: Be sure to use the correct proportions for the expected values, not the actual numbers for the population.) d. Report the statistics as you would in a journal article.

Answer: a. There is one variable, gender of the children. Its levels are girls and boys. b. A chi-square test for goodness-of-fit would be used because we have one sample, the children from Punjab, and we are comparing proportions of children that fall within each level of gender (a nominal variable) to expectations based on national proportions. c. Step 1: Population 1 is children with gender proportions like those that we observed in Punjab. Population 2 is children with gender proportions similar to those in India as a whole. The comparison distribution is a chi-square distribution. The hypothesis test will be a chi-square test for goodness-of-fit because we have only one nominal variable. This study meets three of the four assumptions. (1) The variable under study is nominal. (2) Each observation is independent of the others. (3) There are more than five times as many participants as there are cells (there are 1798 children in the sample and only 2 cells). (4) We do not know, however, whether this is a randomly selected sample of the more educated people, so we must generalize with caution. Step 2: Null hypothesis: The proportions of boys and girls in Punjab are the same as those in India as a whole. Research hypothesis: The proportions of boys and girls in Punjab are different from those in India as a whole. Step 3: The comparison distribution is a chi-square distribution that has 1 degree of freedom: $df_{\chi^2} = 2 - 1 = 1$. Step 4: Our critical $\chi^2$, based on a p level of 0.05 and 1 degree of freedom, is 3.841. Step 5: The expected counts are (0.483)(1798) = 868.43 girls and (0.517)(1798) = 929.57 boys, so

$\chi^2 = \frac{(798 - 868.43)^2}{868.43} + \frac{(1000 - 929.57)^2}{929.57} = 5.71 + 5.34 = 11.05$
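A minimal Python sketch of this Step 5 arithmetic, assuming expected proportions of 0.483 for girls (given in the exercise) and 0.517 for boys (its complement). The function name `chi_square` is illustrative, not something defined in the text.

```python
def chi_square(observed: list[float], expected: list[float]) -> float:
    """Sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


n = 1798                       # 798 girls + 1000 boys in the Punjab sample
proportions = [0.483, 0.517]   # national proportions: girls, boys
observed = [798, 1000]
expected = [p * n for p in proportions]

stat = chi_square(observed, expected)
critical = 3.841               # cutoff for p = 0.05 with df = 1 (Step 4)

print([round(e, 2) for e in expected])
print(round(stat, 2), stat > critical)
```

Working from proportions rather than raw national counts keeps the expected values on the same scale as the Punjab sample, which is what the exercise's note about expected values asks for.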

Step 6: Reject the null hypothesis. Our calculated chi-square value exceeds our critical value. It appears that the proportion of girls in Punjab is lower than that in the general population of India. d. $\chi^2(1, N = 1798) = 11.05, p < 0.05$

15-41. In a classic prisoner's dilemma game with money for prizes, players who cooperate with each other both earn good prizes. If, however, your opposing player cooperates but you do not (the term used is defect), you receive an even bigger payout and your opponent receives nothing. If you cooperate but your opposing player defects, he or she receives that bigger payout and you receive nothing. If you both defect, you each get a small prize. Because of this, most players of such games choose to defect, knowing that if they cooperate but their partners don't, they won't win anything. The strategies of U.S. and Chinese students were compared. The researchers hypothesized that those from the market economy (United States) would cooperate less (i.e., would defect more often) than would those from the nonmarket economy (China).

a. How many variables are there in this study? What are the levels of any variables you identified? b. What hypothesis test would be used to analyze these data? Justify your answer. c. Conduct the six steps of hypothesis testing for this example, using the above data. d. Calculate the appropriate measure of effect size. According to Cohen's conventions, what size effect is this? e. Report the statistics as you would in a journal article.

Answer:

a. There are two variables in this study. The independent variable is the country the student is from (United States, China). The dependent variable is the choice the student made (defect, cooperate). b. A chi-square test for independence would be used because we have data on two nominal variables. c. Step 1: Population 1 contains students like those in this sample. Population 2 contains students from a population in which country of origin and choice to defect or cooperate are independent. The comparison distribution is a chi-square distribution. The hypothesis test will be a chi-square test for independence because we have two nominal variables. This study meets three of the four assumptions. (1) The two variables are nominal; (2) every participant is in only one cell; and (3) there are more than five times as many participants as there are cells (there are 122 participants and 4 cells). (4) The students were not randomly selected, however, so we should use caution when generalizing beyond this sample. Step 2: Null hypothesis: The proportion of Chinese students who choose to defect as opposed to cooperate is similar to the proportion for U.S. students. Research hypothesis: The proportion of Chinese students who choose to defect as opposed to cooperate is different from the proportion for U.S. students. Step 3: The comparison distribution is a chi-square distribution that has 1 degree of freedom: $df_{\chi^2} = (k_{row} - 1)(k_{column} - 1) = (2 - 1)(2 - 1) = 1$. Step 4: Our cutoff $\chi^2$, based on a p level of 0.05 and 1 degree of freedom, is 3.841. Step 5:

Step 6: Reject the null hypothesis. Our calculated chi-square value exceeds our critical value. It appears that the proportion of participants who choose to defect is higher among U.S. students than among Chinese students. d. Cramér's $V = \sqrt{\frac{\chi^2}{(N)(df_{row/column})}} = \sqrt{\frac{9.99}{(122)(1)}} = 0.29$

According to Cohen's conventions, this is a medium effect. e. $\chi^2(1, N = 122) = 9.99, p < 0.05$, Cramér's V = 0.29

15-43. Refer to the prisoner's dilemma example in Exercise 15-41. a. Draw a table that includes the conditional proportions for participants from China and from the United States. (The conditional proportions are the proportions of Chinese who defect or cooperate and the proportions of Americans who defect or cooperate.) b. Create a graph with bars showing the proportions for all four conditions. c. Create a graph with two bars showing just the proportions for the defections for each country.

Answer: a. The accompanying table shows the conditional proportions.

b. The accompanying graph shows the conditional proportions for all four conditions.

c. The accompanying graph shows only the bars for defections.
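Returning to Exercise 15-41, the effect-size step and the conditional proportions asked for in Exercise 15-43 can be sketched in Python. The chi-square value (9.99) and N (122) are the ones reported in the 15-41 answer; because the observed table is not reproduced here, the counts passed to `conditional_proportions` are made up purely to show the calculation. Both function names are hypothetical.

```python
import math


def cramers_v(chi2: float, n: int, df_row_column: int) -> float:
    """Cramér's V = sqrt(chi^2 / (N * df_row/column)).

    df_row/column is the smaller of (number of rows - 1) and (number of columns - 1).
    """
    return math.sqrt(chi2 / (n * df_row_column))


def conditional_proportions(table: list[list[float]]) -> list[list[float]]:
    """Divide each cell by its row total (e.g., proportion of each country that defects)."""
    return [[cell / sum(row) for cell in row] for row in table]


v = cramers_v(9.99, 122, 1)   # values reported for Exercise 15-41
print(round(v, 2))

# Made-up counts (rows: countries; columns: defect, cooperate), not the study's data.
print(conditional_proportions([[30, 10], [10, 30]]))
```

For a 2 x 2 table, df_row/column is 1, so V reduces to the square root of chi square divided by N.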

15-45. Here are some monthly cell phone bills, in dollars, for college students:

a. Convert these data from scale to ordinal. (Don't forget to put them in order first.) What happens to an outlier when you convert these data to ordinal? b. Roughly, what shape would the distribution of these data take? Would they likely be normally distributed? Explain why the distribution of ordinal data is never normal.

c. Why does it not matter if the ordinal variable is normally distributed? (Hint: Think about what kind of hypothesis test you would conduct.)

Answer: a. The accompanying table shows the ordered data and corresponding ranks. When converted to ordinal data, the outlier is still at the top of the distribution but is no longer very different from the rest of the scores in the distribution. Prior to converting to ordinal data, the outlier, 500, was well above the next-highest observation, 200. Now the scores of 500 and 200 are ranked 29 and 28, respectively, just one rank apart.

b. The distribution is likely to be somewhat rectangular and not normal. More generally, the distribution of ordinal data is never normal because each score is assigned a rank, which means that each raw score usually has a different rank from the others. In most cases (unless there are ties), every rank has a frequency of 1, so the distribution is flat. c. It does not matter that the ordinal data are not normally distributed because we would be using nonparametric statistics to analyze the data. Nonparametric statistics do not require the assumption that the underlying distribution is normal.

15-47. Does speed in completing a test correlate with one's grade? Here are test scores for eight students in one of our statistics classes. They are arranged in order from the student who turned in the test first to the student who turned in the test last.

98, 74, 87, 92, 88, 93, 62, 67

a. What are the two variables of interest? For each variable, state whether it is scale or ordinal.

b. Calculate the Spearman correlation coefficient for these two variables. Remember to convert any scale variables to ranks. c. What does the coefficient tell us about the relation between these two variables? d. Why couldn't we calculate a Pearson correlation coefficient for these data?

Answer: a. The first variable of interest is test grade, which is a scale variable. The second variable of interest is the order in which students completed the test, which is an ordinal variable. b. The accompanying table shows test grade converted to ranks, difference scores, and squared differences.

We calculate the Spearman correlation coefficient as: $r_S = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)}$
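Because the ranked table is not reproduced here, the following Python sketch recomputes $r_S$ from the eight grades under one stated assumption: grades are ranked with 1 = highest grade, so that rank 1 means "first" or "best" on both variables. With that direction, the sketch gives $r_S$ of about 0.52, a large positive value consistent with part c; ranking grades in the opposite direction (1 = lowest, as in Exercise 15-27) would flip the sign. The helper names are hypothetical.

```python
def average_ranks(values, descending=False):
    """Rank values 1..N; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(rank_x, rank_y):
    """r_S = 1 - 6*sum(D^2) / (N(N^2 - 1)); exact when there are no ties."""
    n = len(rank_x)
    d_squared = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


grades = [98, 74, 87, 92, 88, 93, 62, 67]             # listed in turn-in order
order_rank = list(range(1, len(grades) + 1))          # 1 = first to turn in
grade_rank = average_ranks(grades, descending=True)   # assumption: 1 = highest grade
r_s = spearman(order_rank, grade_rank)

print(grade_rank)
print(round(r_s, 2))
```

Averaging tied positions is the same convention that produces the 2.5 ranks in Exercise 15-31.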

c. The coefficient tells us that there is a rather large positive relation between the two variables. Students who completed the test more quickly also tended to score higher. d. We could not have calculated a Pearson correlation coefficient because one of our variables, order in which students turned in the test, is ordinal.

15-49. Exercise 15-47 presented data to enable you to calculate the Spearman correlation coefficient that quantifies the relation between the speed of taking the test and the test grade. a. Does this correlation coefficient suggest that students should take their tests as quickly as possible? That is, does it indicate that taking the test quickly causes a good grade? Explain your answer. b. What third variables might be responsible for this correlation? That is, what third variables might cause both speedy test-taking and a good test grade?

Answer: a. This correlation does not indicate that students should attempt to take their tests as quickly as possible. Correlation does not provide evidence for a particular causal relation. A number of underlying causal relations could produce this observed correlation. b. A third variable that might cause both speedy test-taking and a good test grade is knowledge of the material. Students with better knowledge of and more practice with the material would be able to get through the test more quickly and get a better grade.

15-51. Do red states (U.S. states whose residents tend to vote Republican) have different voter turnouts from blue states (U.S. states whose residents tend to vote Democratic)? The accompanying table shows voter turnouts (in percentages) for the 2004 presidential election for eight randomly selected red states and eight randomly selected blue states.

a. What is the independent variable, and what are its levels? What is the dependent variable? b. Is this a between-groups or within-groups design? Explain. c. Conduct all six steps of hypothesis testing for a Mann-Whitney U test. d. How would you present these statistics in a journal article?

Answer: a. The independent variable is type of state, and its levels are red and blue. The dependent variable is the percentage of registered voters who voted. b. This is a between-groups design because each state is either a red state or a blue state but cannot be both. c. Step 1: We need to convert our data to an ordinal measure. The states were randomly selected, so we can assume that they are representative of their populations. Finally, there are no tied ranks. Step 2: Null hypothesis: There is no difference between the voter turnout in red and blue states. Research hypothesis: There is a difference between the voter turnout in red and blue states. Step 3: There are eight red and eight blue states. Step 4: The critical value for a Mann-Whitney U test with two groups of eight, a p level of 0.05, and a two-tailed test is 13. The smaller calculated statistic needs to be less than or equal to this critical value to be considered statistically significant. Step 5:

$\Sigma R_{red} = 5 + 7 + 9 + 10 + 11 + 14 + 15 + 16 = 87$
$\Sigma R_{blue} = 1 + 2 + 3 + 4 + 6 + 8 + 12 + 13 = 49$

$U_{red} = (8)(8) + \frac{8(8 + 1)}{2} - 87 = 13$
$U_{blue} = (8)(8) + \frac{8(8 + 1)}{2} - 49 = 51$

Step 6: The smaller calculated U, 13, is equal to the critical value of 13, so we reject the null hypothesis. There is a statistically significant difference between voter turnout in red and blue states. Voter turnout tends to be higher in blue states than in red states. d. U = 13, p < 0.05
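As a check on the critical-value lookup in Step 4, the exact null distribution of U for two groups of eight can be counted directly. This is a sketch, not the textbook's table: it assumes no tied scores and defines U as the number of cross-group pairs in which the group 1 score is higher. Under those assumptions, U = 13 gives a two-tailed p just under 0.05 (about 0.0499), while U = 14 does not, which suggests 13 is the two-tailed cutoff for two groups of eight. The function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_arrangements(n1: int, n2: int, u: int) -> int:
    """Number of orderings of n1 + n2 untied scores that give U = u for group 1.

    Recurrence on the largest score: if it belongs to group 1, it beats all n2
    scores in group 2 (adding n2 to U); if it belongs to group 2, it adds nothing.
    """
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    return count_arrangements(n1 - 1, n2, u - n2) + count_arrangements(n1, n2 - 1, u)


def two_tailed_p(n1: int, n2: int, u: int) -> float:
    """Exact two-tailed p for an observed (smaller) U, assuming no ties."""
    total = comb(n1 + n2, n1)
    lower_tail = sum(count_arrangements(n1, n2, k) for k in range(u + 1))
    return min(1.0, 2 * lower_tail / total)


for u in (13, 14, 15):
    print(u, round(two_tailed_p(8, 8, u), 4))
```

Printed tables of Mann-Whitney critical values are built from counts like these: each entry is the largest U whose two-tailed p stays at or below the chosen p level.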
