Chapter 11 Solutions
Develop Your Skills 11.1
1. These data are collected on a random sample of days. They should be independent, unless the locations are close enough to each other that the foot traffic at each would be affected by the same factors. We will assume this is not the case. Histograms show approximate normality.
[Histograms of foot traffic at each location; axes: Number of People, Number of Days]
The histogram for foot traffic at location 1 shows some right-skewness, but sample sizes are reasonable, and close to the same, so we will assume the population data are normally distributed. The largest variance is 478.7 (for location 2), and the smallest is 257.2 (location 1). The largest variance is less than twice as large as the smallest. So, following our rule, we will assume the population variances are approximately equal. Therefore, these data meet the required conditions for one-way ANOVA.
2.
Because we don't know the details of how the cashiers made their sample selection, we cannot know if the sample was truly random or independent. We will assume that the sample data were properly collected. Histograms suggest normality.
[Histograms of Value of Purchase (vertical axis: Number of Purchases) for each age group, including "Winery Purchases for Customers Aged 30-50" and "Winery Purchases for Customers Over 50 Years of Age"]
The largest variance is 652.9, and the smallest is 555.1, so clearly the sample variances are fairly close in value. We will assume that the population variances are approximately equal. These data appear to meet the requirements for one-way ANOVA. 3. We will presume that the college collected the sample data appropriately, so the data are independent and truly random. The histograms suggest normality.
[Histograms of Annual Salary (vertical axis: Number of Graduates): Annual Salaries of Marketing Graduates, Annual Salaries of Accounting Graduates, Annual Salaries of Human Resources Graduates, Annual Salaries of General Business Graduates]
The largest variance is 159,729,974, and the smallest is 70,826,421. The ratio of the largest to the smallest is about 2.3, which meets the requirement (less than four). These data appear to meet the requirements for one-way ANOVA.
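The equal-variance check used in these solutions can be sketched in a few lines. This is a minimal illustration, not part of the text's Excel workflow: the function name `variance_ratio` is hypothetical, and the cutoff of four is the rule of thumb quoted above. The four variances are the sample values reported for the graduate salary data.

```python
def variance_ratio(variances):
    """Return the largest sample variance divided by the smallest."""
    return max(variances) / min(variances)


# Sample variances for the four groups of graduates (Exercise 3).
salary_variances = [159_729_973.68, 70_826_421.05, 116_576_842.11, 76_859_236.84]

ratio = variance_ratio(salary_variances)
# Rule of thumb quoted in the solution: the ratio should be less than four.
meets_rule = ratio < 4

print(f"largest/smallest variance ratio = {ratio:.2f}; meets rule: {meets_rule}")
```

The same function can be applied to the variances in any of the other exercises to repeat the check.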
4.
It appears the data are randomly selected, and independent. The data sets are too small for histograms, but stem-and-leaf displays suggest normality.
Route1 3 3 4 0 5 1 6 0
6 5 4
6 7
Route2 2 2 3 2 4 6
8 3 9
8 5
Route3 3 1 4 3 5 3 6 1
6 6 5
9 7
The largest variance is 94, the smallest is 67, for a ratio of largest-to-smallest of about 1.4. This is within the accepted range, so we will assume the population variances are approximately equal. These data appear to meet the requirements for one-way ANOVA. 5. The histograms appear approximately normal. We have to be a bit cautious about assuming these are random samples. For example, one class may be mostly Accounting students, one may be mostly Marketing students, etc. The students who have selected these programs may have different levels of interest and aptitudes for statistics. We will assume that the classes are approximately randomly selected, in the absence of other information, but should note the caution. The largest variance is not much larger than the smallest variance, so we will assume the population variances are approximately equal.
Develop Your Skills 11.2
6. H0: μ₁ = μ₂ = μ₃
H1: At least one μ differs from the others.
α = 0.05
nT = 85, n₁ = 27, n₂ = 30, n₃ = 28, k = 3
x̄₁ = 50.5556, x̄₂ = 56.6, x̄₃ = 74.3214
s₁² = 257.1795, s₂² = 478.7310, s₃² = 333.5595
SSbetween = 8475.2497, SSwithin = 29,575.9738
F = MSbetween / MSwithin = 4237.6249 / 360.6826 = 11.75
The F-distribution has 2, 82 degrees of freedom. The closest we can come in the table is 2, 80. We see that the p-value is < 1% (Excel provides a p-value of 0.00003). Reject H0. There is sufficient evidence to conclude that at least one of the locations has a different average number of daily passersby than the others. The Excel output for this data set is shown below.
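As a cross-check on the Excel output, the sums of squares and the F statistic can be recomputed from the group sizes, sample means, and sample variances alone. The sketch below is an illustration under that assumption; the function name `anova_from_summary` and the dictionary keys are hypothetical, and it does not compute the p-value, which would still come from the F table or Excel.

```python
def anova_from_summary(ns, means, variances):
    """One-way ANOVA quantities from per-group n, mean, and sample variance."""
    n_total = sum(ns)
    k = len(ns)
    # Grand mean is the size-weighted average of the group means.
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Each sample variance is SS within that group divided by (n - 1).
    ss_within = sum((n - 1) * v for n, v in zip(ns, variances))
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Summary statistics for the three locations (Exercise 6).
result = anova_from_summary(
    ns=[27, 30, 28],
    means=[50.5556, 56.6, 74.3214],
    variances=[257.1795, 478.7310, 333.5595],
)
print(f"F = {result['F']:.2f} with {result['df_between']}, {result['df_within']} df")
```

Because the inputs are rounded, the results agree with the Excel values only to within rounding.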
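SUMMARY
Groups        Count
Location 1    27
Location 2    30
Location 3    28

ANOVA
Source of Variation    df    MS
Between Groups         2     4237.6249
Within Groups          82    360.6826074
Total                  84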
7.
H0: μ₁ = μ₂ = μ₃
H1: At least one μ differs from the others.
α = 0.05
nT = 150, n₁ = 50, n₂ = 50, n₃ = 50, k = 3
x̄₁ = 77.5684, x̄₂ = 119.6708, x̄₃ = 132.4674
s₁² = 652.9145, s₂² = 555.0899, s₃² = 625.7846
SSbetween = 82504.4210, SSwithin = 89855.6606
We have already checked for normality and equality of variances.
F = 67.5
The F-distribution has 2, 147 degrees of freedom. Excel provides a p-value of approximately zero. Reject H0. There is sufficient evidence to conclude that customers in different age groups make different average purchases.

8. H0: μ₁ = μ₂ = μ₃ = μ₄
H1: At least one μ differs from the others.
α = 0.025
nT = 80, n₁ = 20, n₂ = 20, n₃ = 20, n₄ = 20, k = 4
x̄₁ = 51,395, x̄₂ = 71,170, x̄₃ = 56,100, x̄₄ = 53,885
s₁² = 159,729,973.68, s₂² = 70,826,421.05, s₃² = 116,576,842.11, s₄² = 76,859,236.84
SSbetween = 4,750,850,500, SSwithin = 8,055,857,000
We have already checked for normality and equality of variances. F = 14.9 The F-distribution has 3, 76 degrees of freedom. Excel provides a p-value of approximately zero. Reject H0. There is sufficient evidence to conclude that at least one of the program streams had an average salary for graduates that differs from that of the other program streams.
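SUMMARY
Groups              Count    Sum        Average
Marketing           20       1027900    51395
Accounting          20       1423400    71170
Human Resources     20       1122000    56100
General Business    20       1077700    53885

ANOVA
Source of Variation    df    MS
Between Groups         3     1.58E+09
Within Groups          76    1.06E+08
Total                  79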
9.
H0: μ₁ = μ₂ = μ₃
H1: At least one μ differs from the others.
α = 0.05
nT = 30, n₁ = 10, n₂ = 10, n₃ = 10, k = 3
x̄₁ = 47, x̄₂ = 34.6, x̄₃ = 48.7
s₁² = 78.4444, s₂² = 67.1556, s₃² = 94.0111
SSbetween = 1184.8667, SSwithin = 2156.5
We have already checked for normality and equality of variances. F = 7.4 The F-distribution has 2, 27 degrees of freedom. Excel provides a p-value of 0.0027. Reject H0. There is sufficient evidence to conclude that the average commuting time for at least one of the routes is different from the others. The Excel output is shown below.
[Excel ANOVA output: Count = 10 for each of the three routes]
10. H0: μ₁ = μ₂ = μ₃
H1: At least one μ differs from the others.
α = 0.05
nT = 135, n₁ = 45, n₂ = 45, n₃ = 45, k = 3
x̄₁ = 70.1111, x̄₂ = 56.6889, x̄₃ = 54.0667
s₁² = 212.1010, s₂² = 226.5828, s₃² = 218.0182
SSbetween = 6666.8444, SSwithin = 28894.8889
We have already checked for normality and equality of variances. F = 15.2 The F-distribution has 2, 132 degrees of freedom. Excel provides a p-value of approximately zero. Reject H0. There is sufficient evidence to conclude that differences in the use of the online software are associated with differences in final grades. We should be cautious about interpreting the results, because although there is evidence of a difference in the average grades, we cannot necessarily attribute the differences in the use of the online software as the cause. There are many potential confounding factors, that is, other factors which could have an effect on the final grades.
Develop Your Skills 11.3 11. Completed Excel templates are shown below. For locations 1 and 3:
[Completed Tukey-Kramer Confidence Interval templates for the three pairs of locations]
The first two confidence intervals do not contain zero, so it appears that the average number of people passing by location 3 is greater than at the other two locations. 12. Completed Excel templates are shown below (to save space, the row checking for rejection of the null hypothesis in ANOVA is not shown). For under 30 and over 50:
Tukey-Kramer Confidence Interval
x̄i                        77.568
x̄j                        132.467
ni                        50
nj                        50
q (from Appendix 7)       3.36
MSwithin                  611.2629973
Upper Confidence Limit    -43.15088123
Lower Confidence Limit    -66.64711877
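The limits in the template above can be reproduced with the Tukey-Kramer formula, (x̄i - x̄j) ± q × √((MSwithin / 2) × (1/ni + 1/nj)). The following is a minimal sketch, assuming the q-value has already been looked up in Appendix 7; the function name `tukey_kramer_interval` is hypothetical and is not the Excel template itself.

```python
import math


def tukey_kramer_interval(mean_i, mean_j, n_i, n_j, q, ms_within):
    """Return (lower, upper) limits for the difference mean_i - mean_j."""
    difference = mean_i - mean_j
    margin = q * math.sqrt((ms_within / 2) * (1 / n_i + 1 / n_j))
    return difference - margin, difference + margin


# Under 30 versus over 50 (Exercise 12), using the template inputs.
lower, upper = tukey_kramer_interval(
    mean_i=77.568, mean_j=132.467, n_i=50, n_j=50, q=3.36, ms_within=611.2629973
)
print(f"({lower:.2f}, {upper:.2f})")

# The interval excludes zero when both limits have the same sign.
significant = lower > 0 or upper < 0
```

With equal sample sizes the margin reduces to q × √(MSwithin / n), which is the simplification used for balanced designs.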
[Completed Tukey-Kramer Confidence Interval templates for the remaining pairs of age groups]
None of these confidence intervals contains zero. Certainly the highest average purchase is with those over 50. 13. Completed Excel templates are shown below (to save space, the row checking for rejection of the null hypothesis in ANOVA is not shown). Marketing and Accounting:
[Completed Tukey-Kramer Confidence Interval templates for Marketing and Accounting and for the subsequent pairs of programs]
At this point, no further comparisons are necessary. Since this interval contains zero, there does not appear to be a significant difference between the average salaries of Marketing graduates and Human Resources graduates. The differences between the sample means for all remaining pairs are smaller than for this pair, and so we know there will not be a significant difference for those pairs either.

To summarize: We have 95% confidence that the interval
(-$28,385.05, -$11,164.95) contains the average difference in the salaries of Marketing graduates, compared to Accounting graduates (in other words, the average salary of Accounting graduates is likely at least $11,164.95 higher);
($8,674.95, $25,895.05) contains the average difference in the salaries of Accounting graduates, compared to General Business graduates;
($6,459.95, $23,680.05) contains the average difference in the salaries of Accounting graduates, compared to Human Resources graduates.
The differences between the average salaries of Human Resources, General Business, and Marketing graduates are not significant.

14. Because of the balanced design, these calculations simplify to:
(x̄i - x̄j) ± q-score × √(MSwithin / n) = (x̄i - x̄j) ± 3.49 × √(79.8703704 / 10) = (x̄i - x̄j) ± 9.86321
For route 2 and route 3: (34.6 - 48.7) ± 9.86321 = -14.1 ± 9.86321 = (-23.96, -4.24)
For route 1 and route 2: (47 - 34.6) ± 9.86321 = 12.4 ± 9.86321 = (2.54, 22.26)
For route 1 and route 3: (47 - 48.7) ± 9.86321 = -1.7 ± 9.86321 = (-11.56, 8.16)
Route 2 would be the recommended route. 15. We have to be careful NOT to answer this question merely by inspection! First we recall that the F-test for ANOVA indicated a rejection of the null hypothesis. We have sample evidence that the population means are not all the same. The completed Excel templates are shown below. For assigned quizzes and sample tests only:
[Completed Tukey-Kramer Confidence Interval template: assigned quizzes and sample tests only]
We have 95% confidence that the interval (8.6, 23.5) contains the amount by which the average mark for all those who used the online software for assigned quizzes exceeds the average mark for all those who used sample tests only. Thus it appears that the average mark is at least 8.6 percent higher for those who use the online software for assigned quizzes.
[Completed Tukey-Kramer Confidence Interval template: quizzes for marks and quizzes for no marks]
Once again, it appears that the average marks are higher when the online software is used for assigned quizzes for marks, compared with quizzes for no marks. We have 95% confidence that the interval (6.0, 20.8) contains the amount by which the average marks are higher when the online software is used for assigned quizzes for marks. We cannot conclude that there is a difference in the average marks when the online software is used for quizzes (no marks) or sample tests only. The confidence interval shown below contains zero.
[Completed Tukey-Kramer Confidence Interval template: quizzes (no marks) and sample tests only]
We have evidence that assigning quizzes for marks results in the best average marks for students. However, as we cautioned before, we cannot be certain of the cause-and-effect relationship here, because there are many potentially confounding variables.
Chapter Review Exercises 1. The histograms appear approximately normal, although there is some skewness in each one. However, with the large sample sizes, it is not unreasonable to assume the normality requirements are met.
2.
The largest variance is 590.65, and the smallest is 370.02. The ratio of the largest to the smallest is not above 4, so it is reasonable to assume that population variances are approximately equal. The missing values are shown below in bold type.
3.
[Excel ANOVA output: SUMMARY for Class #1, Class #2, and Class #3, and ANOVA table with rows Between Groups, Within Groups, and Total]
4.
The appropriate F-distribution has 2, 282 degrees of freedom. We refer to the area in the F table for 2, 120 degrees of freedom and see that an F-score of 6.1 has a p-value less than 0.010. Excel provides a more accurate value of 0.0026.
5.
(x̄i - x̄j) ± q × √(458.7512505 / 95) = (x̄i - x̄j) ± 7.273691

For Class 2 and Class 3: (53.5579 - 63.9474) ± 7.273691 = (-17.7, -3.1)
We have 95% confidence that the interval (-17.7, -3.1) contains the difference between the average marks of Class 2 and Class 3. In other words, it appears that the average marks of those with the Class 3 professor are at least 3 percentage points higher than the average mark for those with the Class 2 professor.

For Class 1 and Class 2: (61.4737 - 53.5579) ± 7.273691 = (0.6, 15.2)
We have 95% confidence that the interval (0.6, 15.2) contains the difference between the average marks of Class 1 and Class 2. In other words, it appears that the average marks of those with the Class 1 professor are at least 0.6 percentage points higher than the average mark for those with the Class 2 professor.

For Class 1 and Class 3: (61.4737 - 63.9474) ± 7.273691 = (-9.7, 4.8)
In this case, the interval contains zero, and so there does not appear to be a significant difference between the average marks of those with the Class 1 professor and those with the Class 3 professor.

From these comparisons, it appears that the average marks are lower for the Class 2 professor's classes, and so this class should be avoided. There is no significant difference between the average marks for Class 1 and Class 3. The choice should then be: any professor but the one who led Class 2.
However, this is not a valid method of choosing classes, because there could be many explanations for why the Class 2 marks were significantly lower. It could have to do with the teacher's expertise and evaluation methods. But it could also have arisen because of other factors: the students in Class 2 might have been less well-prepared, they may have worked more or had family responsibilities that prevented them from studying, the class times might have been inconvenient, etc.

6. The conditions for ANOVA are not met, given the information in these three samples. The distribution of monthly balances for Mastercard owners is quite skewed to the left. The distribution of monthly balances for American Express owners is quite skewed to the right. As well, the variance of the American Express data is more than four times as large as the variance for the Mastercard data. It would not be appropriate to use ANOVA techniques in this case. The Kruskal-Wallis test could be used to compare these samples and draw conclusions about the populations (this technique is not covered in this text).

The requirement for equal variances is met. The largest variance is 14.757, which is only 2.3 times as large as the smallest variance, which is 6.314. The missing values are shown below, in bold type.
SUMMARY
Groups        Count    Sum    Average     Variance
Employee 1    35       404    11.54286    6.314286
Employee 2    37       462    12.48649    14.75676
Employee 3    32       357    11.15625    10.32964
Employee 4    42       377    8.97619     13.536

ANOVA
Source of Variation    SS          df     MS          F
Between Groups         264.6295    3      88.20984    7.726613
Within Groups          1621.124    142    11.41637
Total                  1885.753    145
7.
8.
The F-distribution will have 3, 142 degrees of freedom. The closest we can come in the table is 3, 120. The closest entry in the table is 3.95, and so we know that the p-value is < 0.01. At the 5% level of significance, the data do suggest that there are differences in the average number of minutes each employee spends with a customer before making a sale.
9.
The completed Excel templates are shown below. Employee 4 and Employee 2:
[Completed Tukey-Kramer Confidence Interval template: Employee 4 and Employee 2]
We have 95% confidence that the interval (-5.5, -1.5) contains the number of minutes by which the average time spent with customers before making a sale for Employee 4 differs from the average time spent by Employee 2. In other words, we expect the average time spent by Employee 4 is at least 1.5 minutes less than Employee 2. Employee 4 and Employee 1:
[Completed Tukey-Kramer Confidence Interval template: Employee 4 and Employee 1]
We have 95% confidence that the interval (-4.5, -0.5) contains the number of minutes by which the average time spent with customers before making a sale for Employee 4 differs from the average time spent by Employee 1. In other words, we expect the average time spent by Employee 4 is at least 0.5 minutes less than Employee 1. Employee 4 and Employee 3:
[Completed Tukey-Kramer Confidence Interval template: Employee 4 and Employee 3]
We have 95% confidence that the interval (-4.2, -0.1) contains the number of minutes by which the average time spent with customers before making a sale for Employee 4 differs from the average time spent by Employee 3. In other words, we expect the average time spent by Employee 4 is at least 0.1 minutes less than Employee 3.
[Completed Tukey-Kramer Confidence Interval template: Employee 2 and Employee 3]
Since this interval contains zero, we conclude there is no significant difference between the average number of minutes Employees 2 and 3 spend with customers before making a sale. At this point, we can conclude that there are no significant differences between the average number of minutes Employees 1, 2 and 3 spend with customers before making a sale (the differences in the sample means are all less than the difference for Employees 2 and 3). This means that the average amount of time spent by Employee 4 is less than the average amount of time spent by the other employees.
10. Without further information, we cannot comment on whether the data are independent random samples. In practice, we should never take this on faith. We will assume this condition is met, with a caution that if it isn't, the results may not be reliable. Histograms of the sample data reassure us that the population data are probably normally distributed.
[Histograms of Number of Accidents for each training method]
The largest variance is 16.5, which is less than twice as large as the smallest variance of 8.3, so we will assume the population variances are approximately equal. It appears that the conditions for one-way ANOVA are met. 11. The Excel output is shown below.
Anova: Single Factor
SUMMARY
Groups                                      Count
Number of Accidents, Training Method #1     30
Number of Accidents, Training Method #2     30
Number of Accidents, Training Method #3     30
H0: μ₁ = μ₂ = μ₃
H1: At least one μ differs from the others.
α = 0.025
nT = 90, n₁ = 30, n₂ = 30, n₃ = 30
x̄₁ = 9.3667, x̄₂ = 11.0333, x̄₃ = 12.0667
s₁² = 8.3092, s₂² = 9.7575, s₃² = 16.4782
MSbetween = 55.6778, MSwithin = 11.5149
We have already checked for normality and equality of variances.
F = 4.835
Excel provides a p-value of 0.010205. Reject H0. There is sufficient evidence to conclude that the average number of factory accidents is different, according to the training method. However, we cannot be certain that it is the training method that caused these differences. There may be other factors involved.

12. Comparing training method #1 and #3:
[Completed Tukey-Kramer Confidence Interval template: training methods #1 and #3]
We have 95% confidence that the interval (-4.8, -0.6) contains the amount by which the average number of factory accidents for training method #1 differs from the average number of factory accidents for training method #3. In other words, it appears that training method #1 is associated with at least 0.6 fewer accidents, on average.
[Completed Tukey-Kramer Confidence Interval template: training methods #2 and #3]
Since this confidence interval contains zero, there is not a significant difference in the average number of factory accidents associated with training methods #2 and #3. Comparing training method #1 and #2:
[Completed Tukey-Kramer Confidence Interval template: training methods #1 and #2]
Since this confidence interval contains zero, there is no significant difference between the average number of accidents that are associated with training methods #1 and #2.
Training method #1 compares favourably to training method #3, but otherwise the differences are not significant. This suggests that training method #3 is the "worst". Again, we should be cautious, because there may be other explanatory factors. 13. Histograms of the sample data show significant skewness for some of the connection times. The data for early morning and late afternoon connection times appear skewed to the right, and the connection times for the evening are skewed to the left. Sample sizes are also relatively small. As a result, it would probably not be wise to proceed with ANOVA here, as the required conditions do not appear to be met.
[Histograms: Connection Times to Online Mutual Fund Account (Times in Seconds against Frequency) for Late Afternoon, Evening, Early Afternoon, Early Morning, and Mid-Day]
14. We are told the data were collected on a random sample of days. Histograms are shown below.
[Histograms of commuting times; vertical axis: Frequency]
The histograms appear approximately normal. The Excel ANOVA output is shown below.
Copyright 2011 Pearson Canada Inc.
Anova: Single Factor
SUMMARY
Groups                                          Count
Commuting Time in Minutes, 6 a.m. Departure     24
Commuting Time in Minutes, 7 a.m. Departure     22
Commuting Time in Minutes, 8 a.m. Departure     27
We see from the output that the variances are fairly close in value, and certainly the largest is less than four times as large as the smallest. It appears that the conditions for ANOVA are met.
H0: μ₁ = μ₂ = μ₃
H1: At least one μ differs from the others.
α = 0.05
nT = 73, n₁ = 24, n₂ = 22, n₃ = 27, k = 3
x̄₁ = 45.7, x̄₂ = 45.5, x̄₃ = 39.4
s₁² = 172.4, s₂² = 175.4, s₃² = 197.5
SSbetween = 667.0, SSwithin = 12784.7
We have already checked for normality and equality of variances.
F = (667.0 / 2) / (12784.7 / 70) = 1.8
Excel provides a p-value of 0.16. Fail to reject H0. There is not enough evidence to conclude that the mean commuting times are not all equal.
15. First, check conditions. The data are not actually random samples, but could perhaps be considered to be (see the explanation in the exercise). Histograms of the data are shown below.
[Histograms of Final Grade for each class; vertical axis: Frequency]
The histograms appear reasonably normal. The Excel ANOVA output is shown below.
Anova: Single Factor
SUMMARY
Groups                                            Count    Sum     Average    Variance
Marks of Class Scheduled for 8 a.m. Thursdays     20       1257    62.85      268.0289
Marks of Class Scheduled for 4 p.m. Fridays       23
Marks of Class Scheduled for 2 p.m. Wednesday     25       1691    67.64      263.99
We can see from the output that the variances are sufficiently similar to allow us to assume the requirements for ANOVA are met (population variances approximately equal).
H0: μ₁ = μ₂ = μ₃
H1: At least one μ differs from the others.
α = 0.01
nT = 68, n₁ = 20, n₂ = 23, n₃ = 25, k = 3
x̄₁ = 62.85, x̄₂ = 71.74, x̄₃ = 67.64
s₁² = 268.03, s₂² = 305.20, s₃² = 263.99
SSbetween = 845.31, SSwithin = 18142.74
Excel provides a p-value of 0.23. Fail to reject H0. There is not enough evidence to conclude that the mean grades for the students in classes for all three schedules are not equal. It does not appear that the scheduled time for classes affects the marks. However, we should be cautious, because there are many other factors that could be affecting marks. If we could control for them, we would be in a better position to investigate the effects of class schedule on student grades. 16. The first thing to note is that the data are not completely randomly selected. The information is provided by those who enter the contest. These customers may not represent all drugstore customers. Therefore, we must be cautious in interpreting the results. We would need more information about whether most customers entered the contest, before we could apply the results to all customers. As well, we have no way to be sure that the data are correct. Some people may have misrepresented their age or the value of their most recent purchase. With these caveats, we will proceed, but mostly for the practice! Histograms of the data appear approximately normal, and sample sizes, at 45, are fairly large.
[Histograms of Amount of Purchase for each of the six age groups]
ANOVA
Source of Variation    SS          df    MS    F    P-value    F crit
Between Groups         7179.96
Within Groups          26175.49
Total                  33355.46
The largest variance is 147.8, and the smallest is 57.8, so the largest variance is less than four times the smallest variance. We will assume that the population variances are sufficiently equal to proceed with ANOVA.
17. H0: μ₁ = μ₂ = μ₃ = μ₄ = μ₅ = μ₆
H1: At least one μ differs from the others.
α = 0.05
nT = 270, n₁ = 45, n₂ = 45, n₃ = 45, n₄ = 45, n₅ = 45, n₆ = 45, k = 6
x̄₁ = 23.46, x̄₂ = 27.50, x̄₃ = 34.84, x̄₄ = 35.65, x̄₅ = 36.60, x̄₆ = 26.05
s₁² = 106.43, s₂² = 83.10, s₃² = 57.77, s₄² = 147.78, s₅² = 121.01, s₆² = 78.81
SSbetween = 7179.961, SSwithin = 26175.49
We have already checked for normality and equality of variances.
F = 14.5
Excel provides a p-value of approximately zero. Reject H0. There is enough evidence to conclude that the mean purchases of customers in different age groups are not all equal, when we consider the most recent purchases of those who entered the contest.

18. Because there are so many age groups in this data set, it is not as easy to see where the greatest differences in sample means are, simply by inspection. The easiest way to proceed is to create a table showing the differences in sample means. This is fairly easily constructed in Excel. See an example of such a table, below. Notice that the table shows the absolute value of the differences.

               Under 18    18-25    26-34    35-49    50-74    75 and over
Under 18       0
18-25          4.237       0.000
26-34          11.380      7.144    0.000
35-49          12.190      7.953    0.810    0.000
50-74          13.141      8.904    1.760    0.951    0.000
75 and over    2.587       1.650    8.794    9.603    10.554
By inspection of the table, we can see that we should start by comparing the differences of purchases for customers under 18 and 50-74, then under 18 and 35-49, then under 18 and 26-34, and so on. We need the q-value for 6, 264 degrees of freedom. We will use the value for 6, 120 degrees of freedom, as the closest entry in Appendix 7.
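The table of absolute differences, and the order in which to make the comparisons, can also be produced programmatically. The sketch below is one possible approach rather than the Excel method described in the text: the function name `ranked_differences` is hypothetical, the age-group labels are assumed to match the order of the six sample means reported above, and because it starts from those rounded means, some entries may differ slightly from the Excel table.

```python
from itertools import combinations

# Sample means by age group, in the order reported for Exercise 17
# (the label-to-mean mapping is an assumption based on that order).
sample_means = {
    "Under 18": 23.46,
    "18-25": 27.50,
    "26-34": 34.84,
    "35-49": 35.65,
    "50-74": 36.60,
    "75 and over": 26.05,
}


def ranked_differences(means):
    """Return (group_a, group_b, |difference|) tuples, largest difference first."""
    pairs = [
        (a, b, abs(means[a] - means[b]))
        for a, b in combinations(means, 2)
    ]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


ranking = ranked_differences(sample_means)
for group_a, group_b, diff in ranking[:3]:
    print(f"{group_a} vs {group_b}: {diff:.2f}")
```

Working down this list and stopping at the first interval that contains zero mirrors the sequence of comparisons used in the solution.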
We have 95% confidence that the interval (-$19.23, -$7.06) contains the amount by which the average most recent purchase of customers under 18 differs from those aged 50-74 (for those who entered the contest). Under 18 and 35-49:
Tukey-Kramer Confidence Interval
Was the null hypothesis rejected in the ANOVA test?    yes
x̄i                        23.46
x̄j                        35.6502222
ni                        45
nj                        45
q (from Appendix 7)       4.1
MSwithin                  99.1496022
Upper Confidence Limit    -6.10434638
Lower Confidence Limit    -18.2760981
We have 95% confidence that the interval (-$18.27, -$6.10) contains the amount by which the average most recent purchase of customers under 18 differs from those aged 35-49 (for those who entered the contest).
We have 95% confidence that the interval (-$17.47, -$5.29) contains the amount by which the average most recent purchase of customers under 18 differs from those aged 26-34 (for those who entered the contest). 75 and over and 50-74:
[Completed Tukey-Kramer Confidence Interval template: 75 and over and 50-74]
We have 95% confidence that the interval (-$16.64, -$4.47) contains the amount by which the average most recent purchase of customers 75 and over differs from those aged 50-74 (for those who entered the contest).
We have 95% confidence that the interval (-$15.69, -$3.52) contains the amount by which the average most recent purchase of customers 75 and over differs from those aged 35-49 (for those who entered the contest). 75 and over and 26-34:
[Completed Tukey-Kramer Confidence Interval template: 75 and over and 26-34]
At this point, we see the confidence interval contains zero. For this and all the remaining comparisons, there is not a significant difference in the average purchases (for those who entered the contest).
19. This question has already been answered, in the discussion of exercise 16. We proceeded, for practice, but these data do not represent a random sample of data about the drugstore customers. 20. Generally speaking, these data do not meet the requirements for ANOVA. The data sets are non-normal, and quite significantly skewed. The histograms for Canada-wide data are shown below.
[Histograms of Wages and Salaries; vertical axis: Frequency]
21. The professor has selected random samples, from large classes, and there is no immediately obvious reason why the observations would not be independent. The sample data appear to be approximately normally distributed, as the histograms below illustrate.
[Histograms of Final Mark in Microeconomics for each of the five groups; vertical axis: Frequency]
The ANOVA output shows the largest variance as 355.40, and the smallest as 251.44, and so the largest variance is less than four times as large as the smallest. We will presume that the population variances are approximately equal.
H0: μ₁ = μ₂ = μ₃ = μ₄ = μ₅
H1: At least one μ differs from the others.
α = 0.05
nT = 153, n₁ = 32, n₂ = 34, n₃ = 36, n₄ = 27, n₅ = 24, k = 5
x̄₁ = 64.88, x̄₂ = 65.21, x̄₃ = 55.14, x̄₄ = 57.67, x̄₅ = 52.54
s₁² = 355.40, s₂² = 251.44, s₃² = 305.32, s₄² = 284.00, s₅² = 256.43
SSbetween = 3968.56, SSwithin = 43283.32
We have already checked for normality and equality of variances. F = 3.39 Excel provides a p-value of 0.010. Reject H0. There is enough evidence to suggest that the mean marks are not all equal. Again, because there are so many possible comparisons, it is useful to calculate all differences in sample means, so we can see which is largest, second-largest, and so on. Such a summary table is shown below (absolute values of differences are shown).
                              Less Than 5    5-<10       10-<15      15-<20    20 or More
                              Hours Per      Hours Per   Hours Per   Hours Per Hours Per
                              Week           Week        Week        Week      Week
Less Than 5 Hours Per Week    0
5-<10 Hours Per Week          0.330882       0
10-<15 Hours Per Week         9.736111       10.06699    0
15-<20 Hours Per Week         7.208333       7.539216    2.527778    0
20 or More Hours Per Week     12.33333       12.66422    2.597222    5.125     0
So, the first comparison will be the marks of students who work 20 or more hours a week and those who work 5 - <10 hours a week, then students who work 20 or more hours a week and those who work less than 5 hours a week, and so on. We need the q-value from Appendix 7 for 5, 148 degrees of freedom. Note that if we use the table value for 5, 120 degrees of freedom, we get the following result. For the marks of those who work 20 or more hours a week, and those who work 5 <10 hours a week:
[Completed Tukey-Kramer Confidence Interval template: 20 or more hours a week and 5-<10 hours a week]
We have 95% confidence that the interval (-25.30, -0.03) contains the amount by which the average mark of students who work 20 hours or more per week differs from that of those who work 5 to <10 hours per week. Note that although there appears to be a significant difference between the marks of those who work 20 or more hours a week, and those who work 5 - <10 hours a week, the size of the difference may be quite small.
For the marks of those who work 20 or more hours a week, and those who work <5 hours a week:
[Completed Tukey-Kramer Confidence Interval template: 20 or more hours a week and <5 hours a week]
This confidence interval contains zero. For this and all remaining comparisons, there is not a significant difference in the average marks.