Hypothesis Testing
continuous outcomes: z- or t-test
    one sample
    two samples
    paired samples (matched samples)
discrete outcomes: χ²
    one sample (χ² goodness-of-fit test)
    two samples (χ² test of independence)
Hypothesis Testing
continuous outcomes: ANOVA
    more than two samples/groups
    several types of ANOVAs
        one-way (one-factor): extension of two-sample t-test
        randomized block (no interaction effects)
        multi-factor (possible interaction effects)
        repeated measures: extension of paired-samples t-test
What is ANOVA?
One-way ANOVA allows us to compare the means of two or more groups or categories (the independent variable, IV) on one dependent variable (DV) to determine whether the groups differ significantly from one another on the DV. To use ANOVA, you must have a categorical (nominal) variable with at least two independent groups (e.g., treatment vs. control, fuel 1 vs. fuel 2) as the IV and a continuous variable (interval or ratio) as the DV. ANOVA is very similar to a t-test, particularly when comparing only two groups. With three or more groups, a single ANOVA tests all the group means at once, which avoids the inflated Type I error rate that comes from running a separate t-test on every pair of groups.
H0: There is no difference in MPG between fuels. HA: There is a difference in MPG between fuels. (What is the IV? What is the DV?)
Data Set 1
          Fuel 1     Fuel 2     Fuel 3
          40         50         56
          44         54         56
          42         52         54
          44         52         58
          40         52         56
Mean      M1 = 42    M2 = 52    M3 = 56
Grand M = 50

Data Set 2
          Fuel 1     Fuel 2     Fuel 3
          36         54         34
          48         40         74
          34         58         58
          44         62         42
          48         46         72
Mean      M1 = 42    M2 = 52    M3 = 56
Grand M = 50
$$ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} $$
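As a quick check of the variance formula, the sketch below applies it to the Fuel 1 readings from Data Set 1. The helper name `sample_variance` is illustrative and not from the slides; the standard library's `statistics.variance` is used only as a cross-check.

```python
import statistics

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(xs) / len(xs)
    ss = sum((x - mean) ** 2 for x in xs)   # sum of squares
    return ss / (len(xs) - 1)

fuel1 = [40, 44, 42, 44, 40]    # Fuel 1, Data Set 1 (M1 = 42)
s2 = sample_variance(fuel1)
print(s2)                        # 4.0  (sum of squares 16, divided by n - 1 = 4)

# Cross-check against the standard library implementation.
assert statistics.variance(fuel1) == s2
```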
One-Way ANOVA: Compare the between-group (between-factor) variance to the within-group (within-factor) variance.
In ANOVA, a variance is referred to as a mean square.
The F statistic is the ratio of these two variances.
$$ F = \frac{\sum_j n_j (\bar{X}_j - \bar{X})^2 / (k - 1)}{\sum_j \sum_i (X_{ij} - \bar{X}_j)^2 / (N - k)} $$
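The F ratio can be sketched in a few lines of plain Python. This is a minimal illustration under the usual one-way layout, not a replacement for a statistics package; the function name `one_way_anova` is an assumption made for this example, and the data are the Data Set 1 fuel readings.

```python
def one_way_anova(groups):
    """Return (SSB, SSW, F) for a list of groups of observations."""
    k = len(groups)                                # number of groups
    n_total = sum(len(g) for g in groups)          # N, total observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: n_j * (group mean - grand mean)^2
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: (observation - its group mean)^2
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ssb / (k - 1)
    ms_within = ssw / (n_total - k)
    return ssb, ssw, ms_between / ms_within

data_set_1 = [[40, 44, 42, 44, 40], [50, 54, 52, 52, 52], [56, 56, 54, 58, 56]]
ssb, ssw, f = one_way_anova(data_set_1)
print(ssb, ssw, round(f, 2))   # 520.0 32.0 97.5
```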
$$ SS_{TOTAL} = SS_{BETWEEN} + SS_{WITHIN} $$
$SS_T = (40 - 50)^2 + (44 - 50)^2 + \dots + (58 - 50)^2 + (56 - 50)^2 = 552$ units of variation
$SS_B = 5[(42 - 50)^2 + (52 - 50)^2 + (56 - 50)^2] = 5[64 + 4 + 36] = 520$ units of variation
$SS_{W1} = (40 - 42)^2 + \dots + (40 - 42)^2 = 16$
$SS_{W2} = (50 - 52)^2 + \dots + (52 - 52)^2 = 8$
$SS_{W3} = (56 - 56)^2 + \dots + (56 - 56)^2 = 8$
$SS_W = 16 + 8 + 8 = 32$ units of variation
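Turning these sums of squares into an F ratio takes one more step: each is divided by its degrees of freedom (k - 1 = 2 between groups, N - k = 12 within groups) to give a mean square. For Data Set 1 that works out as:

```latex
MS_B = \frac{SS_B}{k - 1} = \frac{520}{2} = 260, \qquad
MS_W = \frac{SS_W}{N - k} = \frac{32}{12} \approx 2.667, \qquad
F = \frac{MS_B}{MS_W} = \frac{260}{2.667} \approx 97.5
```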
Reject H0 because F = 97.5 exceeds the critical value F = 3.89 (α = .05, df = 2, 12). Conclude that there is a significant difference between fuels in MPG.
Data Set 2 (M1 = 42, M2 = 52, M3 = 56, Grand M = 50)
$SS_T = (36 - 50)^2 + (48 - 50)^2 + \dots + (42 - 50)^2 + (72 - 50)^2 = 2280$ units of variation
$SS_B = 5[(42 - 50)^2 + (52 - 50)^2 + (56 - 50)^2] = 5[64 + 4 + 36] = 520$ units of variation (NOTE: Same as for Data Set 1)
$SS_{W1} = (36 - 42)^2 + \dots + (48 - 42)^2 = 176$
$SS_{W2} = (54 - 52)^2 + \dots + (46 - 52)^2 = 320$
$SS_{W3} = (34 - 56)^2 + \dots + (72 - 56)^2 = 1264$
$SS_W = 176 + 320 + 1264 = 1760$ units of variation
Fail to reject H0 because F = 1.77 is below the critical value F = 3.89 (α = .05, df = 2, 12). Conclude that there is not a significant difference between fuels in MPG.
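The contrast between the two data sets can be reproduced with a short sketch: the group means (and so the between-group sum of squares) are identical, and only the within-group spread changes. The helper `f_ratio` and the `results` dictionary are names made up for this example; the critical value 3.89 is taken from the slides, not computed.

```python
F_CRITICAL = 3.89   # from the slides: critical F at alpha = .05 with df = (2, 12)

def f_ratio(groups):
    """Return (SSB, SSW, F) for equal- or unequal-sized groups."""
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f = (ssb / (len(groups) - 1)) / (ssw / (n_total - len(groups)))
    return ssb, ssw, f

data_sets = {
    "Data Set 1": [[40, 44, 42, 44, 40], [50, 54, 52, 52, 52], [56, 56, 54, 58, 56]],
    "Data Set 2": [[36, 48, 34, 44, 48], [54, 40, 58, 62, 46], [34, 74, 58, 42, 72]],
}

results = {}
for name, groups in data_sets.items():
    ssb, ssw, f = f_ratio(groups)
    decision = "reject H0" if f > F_CRITICAL else "fail to reject H0"
    results[name] = (ssb, ssw, f, decision)
    print(f"{name}: SSB={ssb:.0f} SSW={ssw:.0f} F={f:.2f} -> {decision}")

# Data Set 1: SSB=520 SSW=32 F=97.50 -> reject H0
# Data Set 2: SSB=520 SSW=1760 F=1.77 -> fail to reject H0
```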
Between-group variance   Within-group variance   Conclusion
small                    large                   factor has little or no effect; fail to reject H0
large                    small                   factor has a large effect; reject H0
large                    large                   hard to say
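$$ \text{Tukey}_{i\text{-}j} = \frac{|M_i - M_j|}{s_{\bar{X}}}, \qquad s_{\bar{X}} = \sqrt{\frac{MS_W}{n}} = \sqrt{\frac{2.667}{5}} \approx .73 $$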
The critical value of the Tukey statistic (see Table D) is based on the number of groups (3 here) and the df of the error term (12 here): 3.77 for α = .05 and 5.05 for α = .01.
Tukey1-2 = |42 - 52| / .73 = 13.7
Tukey1-3 = |42 - 56| / .73 = 19.2
Tukey2-3 = |52 - 56| / .73 = 5.48
Each of the 3 means is significantly different from the other two at the .01 level of significance: mileage for Fuel 3 > mileage for Fuel 2 > mileage for Fuel 1.
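The pairwise Tukey statistics can be checked with the sketch below. It assumes the usual equal-group-size form, in which the standard error of a mean is the square root of the within-groups mean square divided by the group size; that assumption reproduces the .73 divisor used above. The critical values are the ones quoted from Table D, not computed here, and the variable names are illustrative.

```python
import math
from itertools import combinations

ms_within = 32 / 12                  # within-groups mean square (about 2.667)
n_per_group = 5
means = {"Fuel 1": 42, "Fuel 2": 52, "Fuel 3": 56}
Q_CRIT_05, Q_CRIT_01 = 3.77, 5.05    # quoted critical values (3 groups, 12 error df)

# Standard error of a group mean (assumed form: sqrt(MS_within / n)).
se_mean = math.sqrt(ms_within / n_per_group)

q_stats = {}
for (a, mean_a), (b, mean_b) in combinations(means.items(), 2):
    q = abs(mean_a - mean_b) / se_mean
    q_stats[(a, b)] = q
    print(f"{a} vs {b}: q = {q:.2f}, exceeds .01 critical value: {q > Q_CRIT_01}")
```

Because the sketch keeps the unrounded standard error, its values can differ from the hand-computed ones in the second decimal place; the conclusions are the same.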
Tests the H0 that the error variance of the dependent variable is equal across groups.
ANOVA (Data Set 1)
Source            Sum of Squares   df   Mean Square   F        Sig.
Between Groups    520.000          2    260.000       97.500   .000
Within Groups     32.000           12   2.667
Total             552.000          14
[Table: Multiple Comparisons. Columns: Mean Difference (I-J), Std. Error, Sig., 95% Confidence Interval (Lower Bound, Upper Bound). * The mean difference is significant at the .05 level.]
Data Set 3
22 21 22 20
M1 = 21 M2 = 26 M3 = 28 Grand M = 25
Randomized block design
The randomized block design is used to identify an extraneous source of variance and to remove it from the error term, thus increasing the between-groups F value.
In effect, the randomized block design removes unexplained variance from the error term by associating it with an extraneous factor that is affecting the results.
Fisher (from whom we get our F value) developed the block design to account for extraneous variance in crop yield associated with farm location (e.g., northern vs. central vs. southern locales in England) in order to test whether there were real differences in his main experimental factor, fertilizer type.
Unblocked Data Set
Fertilizer 1: 38 42 29 32 18 22 (M1 = 30.17)
Fertilizer 2: 50 52 38 41 27 28 (M2 = 39.33)
Grand M = 34.75
$SS_T = (38 - 34.75)^2 + (42 - 34.75)^2 + \dots + (27 - 34.75)^2 + (28 - 34.75)^2 = 1232.25$ units of variation
$SS_{W1} = (38 - 30.17)^2 + \dots + (22 - 30.17)^2$
$SS_{W2} = (50 - 39.33)^2 + \dots + (28 - 39.33)^2$
$SS_W = SS_{W1} + SS_{W2} = 980.17$ units of variation
Sources of Variation    Sum of Squares   df   Mean Square   F      p
Between Groups          252.1            1    252.1         2.57   .140
Within Groups/Error     980.2            10   98.02
Total                   1232.3           11   112.03
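The unblocked table can be reproduced from the raw yields with the sketch below. Variable names are illustrative; the p value (.140 in the table) is not recomputed, since that would need an F distribution function from a statistics library.

```python
fert1 = [38, 42, 29, 32, 18, 22]   # Fertilizer 1 (M1 = 30.17)
fert2 = [50, 52, 38, 41, 27, 28]   # Fertilizer 2 (M2 = 39.33)
groups = [fert1, fert2]

values = fert1 + fert2
grand = sum(values) / len(values)                       # 34.75

sst = sum((x - grand) ** 2 for x in values)             # total variation
ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
ssw = sst - ssb                                         # within-groups (error) variation

df_between = len(groups) - 1                            # 1
df_within = len(values) - len(groups)                   # 10
f = (ssb / df_between) / (ssw / df_within)

print(round(sst, 2), round(ssb, 1), round(ssw, 1), round(f, 2))
# 1232.25 252.1 980.2 2.57
```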
$SS_T = (38 - 34.75)^2 + (42 - 34.75)^2 + \dots + (27 - 34.75)^2 + (28 - 34.75)^2 = 1232.25$ units of variation
$SS_B = 6[(30.17 - 34.75)^2 + (39.33 - 34.75)^2] = 252.1$ units of variation (NOTE: Same as for Unblocked Data Set)
$SS_{BL} = 4[(45.5 - 34.75)^2 + (35 - 34.75)^2 + (23.75 - 34.75)^2] = 946.5$ units of variation
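To see how blocking shrinks the error term, the sketch below removes the block sum of squares from the unexplained variation. Two things here are assumptions rather than quoted results: the assignment of observations to blocks (inferred from the block means 45.5, 35 and 23.75) and the error degrees of freedom, which follow the additive, no-interaction model. The resulting F is the sketch's own output and should be read in that light.

```python
# Assumed layout: three blocks, two observations per fertilizer in each block.
blocks = [
    ([38, 42], [50, 52]),   # block 1: (Fertilizer 1, Fertilizer 2), mean 45.5
    ([29, 32], [38, 41]),   # block 2, mean 35
    ([18, 22], [27, 28]),   # block 3, mean 23.75
]

def mean(xs):
    return sum(xs) / len(xs)

values = [x for f1, f2 in blocks for x in f1 + f2]
grand = mean(values)                                    # 34.75

fert1 = [x for f1, _ in blocks for x in f1]
fert2 = [x for _, f2 in blocks for x in f2]

sst = sum((x - grand) ** 2 for x in values)
ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in (fert1, fert2))
ssbl = sum(len(f1 + f2) * (mean(f1 + f2) - grand) ** 2 for f1, f2 in blocks)
sse = sst - ssb - ssbl            # error left after the block effect is removed

# Additive (no-interaction) model: error df = (N - 1) - (k - 1) - (b - 1).
df_error = (len(values) - 1) - (2 - 1) - (len(blocks) - 1)
f_blocked = (ssb / 1) / (sse / df_error)

print(round(ssbl, 1), round(sse, 2), df_error, round(f_blocked, 1))
# 946.5 33.67 8 59.9
```

The fertilizer sum of squares is unchanged at 252.1; the much larger F comes entirely from moving 946.5 units of variation out of the error term and into the block factor.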