
GMS MS 700/GMS AN 704 Elementary Biostatistics March 23, 2011

Hypothesis Testing: Analysis of Variance (ANOVA)

Hypothesis Testing
continuous outcomes: z- or t-test
  one sample
  two samples
  paired samples (matched samples)
discrete outcomes: χ²
  one sample (χ² goodness-of-fit test)
  two samples (χ² test of independence)

Hypothesis Testing
continuous outcomes: ANOVA
  more than two samples/groups
  several types of ANOVAs
    one-way (one-factor): extension of two-sample t-test
    randomized block (no interaction effects)
    multi-factor (possible interaction effects)
    repeated measures: extension of paired-samples t-test

What is ANOVA?
One-Way ANOVA allows us to compare the means of 2 or more groups or categories (the independent variable, IV) on one dependent variable (DV) to determine whether the groups differ significantly from one another on the DV.
To use ANOVA, you must have a categorical (nominal) independent variable with at least two independent groups (e.g., treatment vs. control, fuel 1 vs. fuel 2) and a continuous (interval or ratio) dependent variable.
ANOVA is very similar to a t-test, particularly when comparing only 2 groups. With 3 or more groups, a single ANOVA tests all group means at once, whereas repeated pairwise t-tests inflate the chance of a false positive.

t-Tests vs. ANOVA

t-tests allow us to decide whether the observed difference between two group means is too large to be attributed to chance (i.e., statistically significant).
But the more t-tests we run, the greater the chance of rejecting at least one null hypothesis that is actually true (Type I error).
ANOVA takes into account the number of groups being compared and tests them in a single procedure, which gives us more certainty in concluding significance when looking at 3 or more groups.
Rather than finding a simple difference between 2 means as in a t-test, ANOVA summarizes how far the means of multiple independent groups lie from the grand mean, using the squared differences.

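The growth in Type I error with repeated t-tests can be illustrated with a rough calculation. This is a minimal sketch that assumes the pairwise tests are independent (only an approximation when comparisons share groups); the function name `familywise_error_rate` is a hypothetical helper, not something from the slides.

```python
from math import comb


def familywise_error_rate(n_groups: int, alpha: float = 0.05) -> float:
    """Chance of at least one false rejection across all pairwise t-tests.

    Assumes independent tests, which is an approximation for
    pairwise comparisons that share groups.
    """
    n_tests = comb(n_groups, 2)  # number of distinct pairs of groups
    return 1 - (1 - alpha) ** n_tests


for n_groups in (2, 3, 4, 5):
    rate = familywise_error_rate(n_groups)
    print(f"{n_groups} groups, {comb(n_groups, 2)} t-tests: {rate:.3f}")
```

With 3 groups (3 pairwise tests), the approximate chance of at least one false rejection is already about .14 rather than .05.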
H0: There is no difference in MPG between fuels.
HA: There is a difference in MPG between fuels.
(What is the IV? What is the DV?)

Data Set 1
Fuel 1     Fuel 2     Fuel 3
  40         50         56
  44         54         56
  42         52         54
  44         52         58
  40         52         56
M1 = 42    M2 = 52    M3 = 56
Grand M = 50

Data Set 2
Fuel 1     Fuel 2     Fuel 3
  36         54         34
  48         40         74
  34         58         58
  44         62         42
  48         46         72
M1 = 42    M2 = 52    M3 = 56
Grand M = 50

One-Way (One-Factor) ANOVA (one IV): An Intuitive Decomposition of Sum of Squares/Variance

Variance: the near average of the squared differences of a set of observations around its mean ("near" because the sum is divided by n - 1 rather than n)

$$ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} $$

One-Way ANOVA: Compare the between-group (between-factor) variance to the within-group (within-factor) variance
In ANOVA, a variance is referred to as a mean square
The F statistic is the ratio of these two variances (between-group mean square divided by within-group mean square)
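As a small worked illustration of the variance formula, the sketch below computes the sample variance of the Fuel 1 mileages from Data Set 1 by hand and compares it with the standard library's `statistics.variance`. The variable names are illustrative only.

```python
from statistics import variance

fuel_1 = [40, 44, 42, 44, 40]  # Fuel 1 mileages, Data Set 1

mean_1 = sum(fuel_1) / len(fuel_1)                # 42.0
sum_sq = sum((x - mean_1) ** 2 for x in fuel_1)   # sum of squared deviations
s2 = sum_sq / (len(fuel_1) - 1)                   # divide by n - 1, not n

print(mean_1, sum_sq, s2)
print(variance(fuel_1) == s2)
```

The sum of squared deviations here (16) is the same quantity that appears later as the within-group sum of squares for Fuel 1.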

Hypothesis Testing for More than 2 Means: ANOVA

Continuous outcome
k independent samples, k > 2
H0: μ1 = μ2 = ... = μk
H1: Means are not all equal

Test Statistic

$$ F = \frac{\sum_j n_j (\bar{X}_j - \bar{X})^2 / (k - 1)}{\sum_j \sum_i (X_{ij} - \bar{X}_j)^2 / (N - k)} $$

Find the critical value in Table 4 (F distribution), df = (k - 1), (N - k)
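The test statistic can be sketched in plain Python as follows. This is an illustrative implementation under the equal-variance one-way layout described above; `one_way_anova` is a hypothetical helper name, and the example data are the Data Set 1 mileages from the slides.

```python
from statistics import mean


def one_way_anova(groups):
    """Return sums of squares, df, mean squares and F for a list of samples."""
    k = len(groups)                             # number of groups
    n_total = sum(len(g) for g in groups)       # N
    grand_mean = mean(x for g in groups for x in g)

    # Numerator: group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Denominator: observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df": (k - 1, n_total - k),
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


data_set_1 = [[40, 44, 42, 44, 40], [50, 54, 52, 52, 52], [56, 56, 54, 58, 56]]
result = one_way_anova(data_set_1)
print(result["df"], round(result["F"], 2))
```

The sketch stops at F; the p-value or critical value still has to come from an F table or a statistics package.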

An Intuitive Decomposition of Sum of Squares Data Set 1: Decision Rule

SSTOTAL = SSBETWEEN + SSWITHIN

$$ F = \frac{\sum_j n_j (\bar{X}_j - \bar{X})^2 / (k - 1)}{\sum_j \sum_i (X_{ij} - \bar{X}_j)^2 / (N - k)} $$

k - 1 = 3 - 1 = 2; N - k = 15 - 3 = 12
Critical value: F(2, 12) = 3.89 (α = .05; Table 4)

An Intuitive Decomposition of Sum of Squares Data Set 1: Total Sum of Squares

Each observation is compared with the grand mean (Grand M = 50):

SST = (40 - 50)^2 + (44 - 50)^2 + ... + (58 - 50)^2 + (56 - 50)^2 = 552 units of variation

An Intuitive Decomposition of Sum of Squares Data Set 1: Between-Groups Sum of Squares

Each group mean (M1 = 42, M2 = 52, M3 = 56) is compared with the grand mean (50) and weighted by the group size (n = 5):

SSB = 5 [(42 - 50)^2 + (52 - 50)^2 + (56 - 50)^2] = 5 [64 + 4 + 36] = 520 units of variation

An Intuitive Decomposition of Sum of Squares Data Set 1: Within-Groups Sum of Squares

Each observation is compared with its own group mean:

SSW1 = (40 - 42)^2 + ... + (40 - 42)^2 = 16   for Fuel 1
SSW2 = (50 - 52)^2 + ... + (52 - 52)^2 = 8    for Fuel 2
SSW3 = (56 - 56)^2 + ... + (56 - 56)^2 = 8    for Fuel 3
SSW = 16 + 8 + 8 = 32 units of variation
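The partition SSTOTAL = SSBETWEEN + SSWITHIN for Data Set 1 can be checked numerically. The following is a minimal sketch using only the standard library; the dictionary layout and variable names are illustrative choices.

```python
from statistics import mean

data_set_1 = {
    "Fuel 1": [40, 44, 42, 44, 40],
    "Fuel 2": [50, 54, 52, 52, 52],
    "Fuel 3": [56, 56, 54, 58, 56],
}

grand_mean = mean(x for xs in data_set_1.values() for x in xs)

# Total: every observation around the grand mean
ss_total = sum((x - grand_mean) ** 2 for xs in data_set_1.values() for x in xs)

# Between: group means around the grand mean, weighted by group size
ss_between = sum(
    len(xs) * (mean(xs) - grand_mean) ** 2 for xs in data_set_1.values()
)

# Within: observations around their own group mean, kept per group
ss_within_by_group = {
    name: sum((x - mean(xs)) ** 2 for x in xs) for name, xs in data_set_1.items()
}
ss_within = sum(ss_within_by_group.values())

print(ss_total, ss_between, ss_within_by_group, ss_within)
```

Because the three sums are computed independently, the equality of `ss_total` and `ss_between + ss_within` is a genuine check rather than a definition.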

An Intuitive Decomposition of Sum of Squares Data Set 1: Conclusion

Sources of Variation    Sum of Squares   df   Mean Square   F      p
Between Groups               520          2     260         97.5   < .001
Within Groups/Error           32         12       2.67
Total                        552         14

Reject H0 because F = 97.5 > critical F = 3.89 (α = .05). Conclude that there is a significant difference between fuels in MPG.

An Intuitive Decomposition of Sum of Squares Data Set 2: Decision Rule

SSTOTAL = SSBETWEEN + SSWITHIN

k - 1 = 3 - 1 = 2; N - k = 15 - 3 = 12
Critical value: F(2, 12) = 3.89 (α = .05; Table 4)

An Intuitive Decomposition of Sum of Squares Data Set 2: Total Sum of Squares

SST = (36 - 50)^2 + (48 - 50)^2 + ... + (42 - 50)^2 + (72 - 50)^2 = 2280 units of variation

An Intuitive Decomposition of Sum of Squares Data Set 2: Between-Groups Sum of Squares

SSB = 5 [(42 - 50)^2 + (52 - 50)^2 + (56 - 50)^2] = 5 [64 + 4 + 36] = 520 units of variation (NOTE: Same as for Data Set 1)

An Intuitive Decomposition of Sum of Squares Data Set 2: Within-Groups Sum of Squares

SSW1 = (36 - 42)^2 + ... + (48 - 42)^2 = 176    for Fuel 1
SSW2 = (54 - 52)^2 + ... + (46 - 52)^2 = 320    for Fuel 2
SSW3 = (34 - 56)^2 + ... + (72 - 56)^2 = 1264   for Fuel 3
SSW = 176 + 320 + 1264 = 1760 units of variation

An Intuitive Decomposition of Sum of Squares Data Set 2: Conclusion

Sources of Variation    Sum of Squares   df   Mean Square   F      p
Between Groups               520          2     260         1.77   .212
Within Groups/Error         1760         12     146.7
Total                       2280         14

Fail to reject H0 because F = 1.77 < critical F = 3.89 (α = .05). Conclude that there is not a significant difference between fuels in MPG.
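The contrast between the two data sets (identical group means, very different spread within groups) can be reproduced with a short script. This is a sketch only: the critical value 3.89 is copied from the slides' Table 4 rather than computed, and the helper names `sums_of_squares` and `f_ratio` are hypothetical.

```python
from statistics import mean

F_CRITICAL = 3.89  # F(2, 12) at alpha = .05, taken from the slides' Table 4


def sums_of_squares(groups):
    """Return (between-groups SS, within-groups SS)."""
    grand = mean(x for g in groups for x in g)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ssb, ssw


def f_ratio(groups):
    """Between-groups mean square divided by within-groups mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    ssb, ssw = sums_of_squares(groups)
    return (ssb / (k - 1)) / (ssw / (n_total - k))


data_sets = {
    "Data Set 1": [[40, 44, 42, 44, 40], [50, 54, 52, 52, 52], [56, 56, 54, 58, 56]],
    "Data Set 2": [[36, 48, 34, 44, 48], [54, 40, 58, 62, 46], [34, 74, 58, 42, 72]],
}

for name, groups in data_sets.items():
    ssb, ssw = sums_of_squares(groups)
    f = f_ratio(groups)
    decision = "reject H0" if f > F_CRITICAL else "fail to reject H0"
    print(f"{name}: SSB = {ssb}, SSW = {ssw}, F = {f:.2f} -> {decision}")
```

Both data sets share SSB = 520; only the within-groups term changes, and that alone moves F from well above to below the critical value.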

One-Way (One-Factor) ANOVA: An Intuitive Decomposition of Sum of Squares/Variance

Between-Group Variance   Within-Group Variance   Likely Statistical Outcome
small                    small                   hard to say
small                    large                   factor has little or no effect; fail to reject H0
large                    small                   factor has a large effect; reject H0
large                    large                   hard to say

Post-Hoc Tukey HSD Test between Means

$$ \text{Tukey HSD} = \frac{|\bar{X}_1 - \bar{X}_2|}{s_{\bar{X}}} $$

$$ s_{\bar{X}} = \sqrt{\frac{MS_e}{n_g}} = \sqrt{\frac{2.67}{5}} = .73 $$

where n_g = the number of cases in each group and MS_e = the within-groups (error) mean square

The critical value of the Tukey statistic (see Table D) is based on the number of groups/factors (3 here) and the df of the error term (12 here): 3.77 for α = .05 and 5.05 for α = .01

Tukey1-2 = |42 - 52| / .73 = 13.7    p < .01
Tukey1-3 = |42 - 56| / .73 = 19.2    p < .01
Tukey2-3 = |52 - 56| / .73 = 5.48    p < .01

Each of the 3 means is significantly different from the others at the .01 level of significance: mileage for Fuel 3 > mileage for Fuel 2 > mileage for Fuel 1
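The pairwise Tukey statistics can be recomputed with a few lines of Python. This sketch assumes equal group sizes, takes the critical values 3.77 and 5.05 from the slides' Table D rather than computing them, and uses the hypothetical helper name `tukey_q`. It uses the unrounded standard error, so results can differ slightly in the last digit from the hand calculation with .73.

```python
from itertools import combinations
from math import sqrt

means = {"Fuel 1": 42, "Fuel 2": 52, "Fuel 3": 56}
ms_error = 32 / 12      # within-groups mean square for Data Set 1 (about 2.67)
n_per_group = 5
Q_CRIT_05, Q_CRIT_01 = 3.77, 5.05   # 3 groups, 12 error df (slides' Table D)

se_mean = sqrt(ms_error / n_per_group)  # standard error of a group mean


def tukey_q(mean_a, mean_b, se):
    """Absolute difference between two means in standard-error units."""
    return abs(mean_a - mean_b) / se


q_values = {}
for a, b in combinations(means, 2):
    q = tukey_q(means[a], means[b], se_mean)
    q_values[(a, b)] = q
    if q > Q_CRIT_01:
        verdict = "p < .01"
    elif q > Q_CRIT_05:
        verdict = "p < .05"
    else:
        verdict = "not significant"
    print(f"{a} vs {b}: q = {q:.2f} ({verdict})")
```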

SPSS Input for Data Set 1

Fuel   Mileage
 1       40
 1       44
 1       42
 1       44
 1       40
 2       50
 2       54
 2       52
 2       52
 2       52
 3       56
 3       56
 3       54
 3       58
 3       56

SPSS Output for Data Set 1

Test of Homogeneity of Variances (Mileage)
Levene Statistic   df1   df2   Sig.
     1.000          2     12   .397

Tests the H0 that the error variance of the dependent variable is equal across groups.

ANOVA (Mileage)
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups      520.000        2    260.000      97.500   .000
Within Groups        32.000       12      2.667
Total               552.000       14
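The Levene statistic of 1.000 in the output can be reproduced by hand, assuming the mean-centered form of Levene's test: a one-way ANOVA F computed on the absolute deviations of each observation from its own group mean. The helper names `one_way_f` and `levene_f` below are hypothetical, and the sketch does not compute the significance value.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ssb / (k - 1)) / (ssw / (n_total - k))


def levene_f(groups):
    """Mean-centered Levene statistic: ANOVA F on absolute deviations."""
    abs_deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    return one_way_f(abs_deviations)


data_set_1 = [[40, 44, 42, 44, 40], [50, 54, 52, 52, 52], [56, 56, 54, 58, 56]]
print(round(levene_f(data_set_1), 3))
```

A small Levene statistic, as here, is consistent with the equal-variance assumption behind the ANOVA table.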

SPSS Output for Data Set 1 (continued)

[Table: Multiple Comparisons, Tukey HSD. Columns: (I) Fuel, (J) Fuel, Mean Difference (I-J), Std. Error, Sig., 95% Confidence Interval (Lower Bound, Upper Bound).]

*. The mean difference is significant at the .05 level.

An Intuitive Decomposition of SS: Practice

Data Set 3
Fuel 1     Fuel 2     Fuel 3
  20         25         28
  22         27         28
  21         26         27
  22         26         29
  20         26         28
M1 = 21    M2 = 26    M3 = 28
Grand M = 25

Practice steps:
  Decision Rule
  Between-Groups Variance
  Within-Groups Variance
  Complete the ANOVA table

Sources of Variation    Sum of Squares   df   Mean Square   F   p
Between Groups
Within Groups/Error
Total

One-Way (One-Factor) ANOVA: Fisher's Randomized Block Design

In some cases, an extraneous factor is a systematic source of variance that increases the error term
The goal of a randomized block design is to block the extraneous source of variance and to remove it from the error term, thus increasing the between-groups F value
In effect, the randomized block design removes unexplained variance from the error term by associating it with an extraneous factor that is affecting the results
Fisher (from whom we get our F value) developed the block design to account for extraneous variance in crop yield associated with farm location (e.g., northern vs. central vs. southern locales in England) in order to test whether there were real differences in his main experimental factor, fertilizer type

One-Factor Randomized Block Design

SSTOTAL = SSBETWEEN + SSWITHIN

Data Set Unblocked
Fertilizer 1    Fertilizer 2
     38              50
     42              52
     29              38
     32              41
     18              27
     22              28
M1 = 30.17      M2 = 39.33
Grand M = 34.75

SST = (38 - 34.75)^2 + (42 - 34.75)^2 + ... + (27 - 34.75)^2 + (28 - 34.75)^2 = 1232.25 units of variation

One-Factor Randomized Block Design: Between-Groups Sum of Squares (Unblocked)

SSB = 6 [(30.17 - 34.75)^2 + (39.33 - 34.75)^2] = 252.1 units of variation

One-Factor Randomized Block Design: Within-Groups Sum of Squares (Unblocked)

SSW1 = (38 - 30.17)^2 + ... + (22 - 30.17)^2 = 420.83   for Fertilizer 1
SSW2 = (50 - 39.33)^2 + ... + (28 - 39.33)^2 = 559.33   for Fertilizer 2
SSW = 420.83 + 559.33 = 980.17 units of variation

One-Factor Randomized Block Design: ANOVA Table (Unblocked)

Data Set Unblocked: M1 = 30.17, M2 = 39.33, Grand M = 34.75

Sources of Variation    Sum of Squares   df   Mean Square   F      p
Between Groups               252.1        1     252.1       2.57   .140
Within Groups/Error          980.2       10      98.02
Total                       1232.3       11     112.03

One-Factor Randomized Block Design

SSTOTAL = SSBETWEEN + SSBLOCK + SSWITHIN

Data Set Blocked
Blocked Variable    Fertilizer 1   Fertilizer 2   Sector Mean
Northern Sector       38  42         50  52       MN = 45.5
Central Sector        29  32         38  41       MC = 35
Southern Sector       18  22         27  28       MS = 23.75
                    M1 = 30.17     M2 = 39.33     Grand M = 34.75

SST = (38 - 34.75)^2 + (42 - 34.75)^2 + ... + (27 - 34.75)^2 + (28 - 34.75)^2 = 1232.25 units of variation

One-Factor Randomized Block Design: Between-Groups Sum of Squares (Blocked)

SSB = 6 [(30.17 - 34.75)^2 + (39.33 - 34.75)^2] = 252.1 units of variation (NOTE: Same as for Unblocked Data Set)

One-Factor Randomized Block Design: Block Sum of Squares

Each sector mean (MN = 45.5, MC = 35, MS = 23.75) is compared with the grand mean (34.75) and weighted by the number of plots per sector (4):

SSBL = 4 [(45.5 - 34.75)^2 + (35 - 34.75)^2 + (23.75 - 34.75)^2] = 946.5 units of variation

One-Factor Randomized Block Design: ANOVA Table (Blocked)

Sources of Variation         Sum of Squares   df   Mean Square   F       p
Blocked/Extraneous Factor        946.5         2     473.25      112.4   < .001
Between Groups                   252.1         1     252.1        59.9   < .001
Within Groups/Error*              33.7         8       4.21
Total                           1232.3        11     112.03

*Was 980.2 Unblocked. 980.2 - 946.5 = 33.7
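The blocked decomposition can be checked with a short script. This is a sketch under the additive (no interaction) model used in the slides, so any fertilizer-by-sector interaction is pooled into the error term; the dictionary layout and the helper name `factor_ss` are illustrative assumptions. F values are computed from unrounded mean squares, so they may differ slightly from the table, which uses 4.21.

```python
from statistics import mean

# (fertilizer, sector) -> bushels for the two plots in that cell
cells = {
    ("Fertilizer 1", "Northern"): [38, 42],
    ("Fertilizer 1", "Central"): [29, 32],
    ("Fertilizer 1", "Southern"): [18, 22],
    ("Fertilizer 2", "Northern"): [50, 52],
    ("Fertilizer 2", "Central"): [38, 41],
    ("Fertilizer 2", "Southern"): [27, 28],
}

all_values = [x for xs in cells.values() for x in xs]
grand = mean(all_values)


def factor_ss(position):
    """Sum of squares and level count for the factor at key position
    0 (fertilizer) or 1 (sector)."""
    levels = {}
    for key, xs in cells.items():
        levels.setdefault(key[position], []).extend(xs)
    ss = sum(len(xs) * (mean(xs) - grand) ** 2 for xs in levels.values())
    return ss, len(levels)


ss_total = sum((x - grand) ** 2 for x in all_values)
ss_fert, n_fert = factor_ss(0)
ss_block, n_block = factor_ss(1)

# Unblocked: everything not explained by fertilizer is error
f_unblocked = (ss_fert / (n_fert - 1)) / ((ss_total - ss_fert) / (len(all_values) - n_fert))

# Blocked: the sector variation is removed from the error term
ss_error = ss_total - ss_fert - ss_block
df_error = len(all_values) - 1 - (n_fert - 1) - (n_block - 1)
ms_error = ss_error / df_error
f_fert = (ss_fert / (n_fert - 1)) / ms_error
f_block = (ss_block / (n_block - 1)) / ms_error

print(f"unblocked F = {f_unblocked:.2f}")
print(f"blocked: SS error = {ss_error:.1f}, df = {df_error}, "
      f"F fertilizer = {f_fert:.1f}, F block = {f_block:.1f}")
```

The fertilizer sum of squares is identical in both analyses; only the error term shrinks, which is what raises the fertilizer F.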

SPSS Input for Blocked Data Set

Fertilizer   Plot   Bushels
    1          1       38
    1          1       42
    1          2       29
    1          2       32
    1          3       18
    1          3       22
    2          1       50
    2          1       52
    2          2       38
    2          2       41
    2          3       27
    2          3       28
