Professional Documents
Culture Documents
ANOVA
ANalysis Of VAriance
An ANOVA is a way to compare multiple sample means to see if they are significantly different. The term comes from a term that describes what the experiment does: ANalysis Of VAriance = ANOVA. An ANOVA looks at the variance between the sample means, and decides if they are significant or not. This can be done to compare two or more samples.
Assumptions of ANOVA
each group is approximately normal
check this by looking at histograms and/or normal quantile plots, or use assumptions can handle some nonnormality, but not severe outliers
At least one population mean is different H1 : Not all of the population means are the same
At least one population mean is different i.e., there is a treatment effect Does not mean that all population means are different (some pairs may be the same)
Normality Check
We should check for normality using: assumptions about population histograms for each group normal quantile plot for each group With such small data sets, there really isnt a really good way to check normality from data, but we make the common assumption that physical measurements of people tend to be normally distributed.
Compare largest and smallest standard deviations: largest: 1.764 smallest: 1.458 1.458 x 2 = 2.916 > 1.764
Note: variance ratio of 4:1 is equivalent.
(x i x )2
ij
variation WITHIN groups for each data value we look at the difference between that value and the mean of its group
(x
xi )
The F-distribution
A ratio of variances follows an F-distribution:
2 between ~ Fn, m 2 within
F=
A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.
The F-test tests the hypothesis that two variances are equal. F will be close to 1 if sample variances are equal.
2 2 H 0 : between = within 2 2 H a : between within
Treatment 2 y21 y22 y23 y24 y25 y26 y27 y28 y29 y210
10
Treatment 3 y31 y32 y33 y34 y35 y36 y37 y38 y39 y310
10
Treatment 4 y41 y42 y43 y44 y45 y46 y47 y48 y49 y410
10
( x ij x i )
y
y1 =
10
1j
j =1
y
y 2 =
10
j =1
2j
y
y 3 =
10
3j
y
y 4 =
10
j =1
4j
j =1
10
y1 ) 2
10
10
y 3 ) 2
10
(y
j =1
1j
(y
j =1
2j
y 2 ) 2
(y
j =1
3j
(y
j =1
4j
y 4 ) 2
10 1
10 1
10 1
10 1
10
1j
(y
j=1
y1)2
(y
j=1
2j
y2 )2
(y
j=1
10
3j
y3 )2
(y
j=1
10
4j
y4 )2
10 ij
101
101
101
101
y
y =
i =1 j =1
40
10
(y
j =1
1j
y1 ) 2
+ ( y 2 j y 2 ) +
2 j =1
10
10
(y
j =3
3j
y 3 ) +
2
10
(y
j =1
4j
y 4 )
10 ij
( y
i =1 j =1
yi ) 2
10 x
Sum of Squares Within (SSW) (or SSE, for chance error)
i =1
( y i y ) 2
Sum of Squares Between (SSB). Variability of the group means compared to the grand mean (the variability due to the treatment).
Partitioning of Variance
10
i =1 j =1
( y ij y )
Total sum of squares(TSS). Squared difference of every observation from the overall mean. (numerator of variance of Y!)
10 ij
( y
i =1 j =1
y i ) 2
(y
i =1
y ) 2
10 ij
( y
i =1 j =1
y ) 2
ANOVA Table
Source of variation Between (k groups) d.f. k-1 Sum of squares SSB
(sum of squared deviations of group means from grand mean)
ANOVA = t-test
Mean Sum of Squares F-statistic SSB/k-1
SSB SSW
SSB = n
n
(X
i =1
n X n + Yn 2 X + Yn 2 )) + n (Yn ( n )) = 2 2 i =1
(
i =1 2
n X n Yn 2 Y X ) + n ( n n )2 2 2 2 2 i =1
n((
X n 2 Yn 2 X *Y Y X X *Y ) + ( ) 2 n n + ( n )2 + ( n )2 2 n n ) = 2 2 2 2 2 2
p-value Go to
d.f. 1
Sum of squares
n( X n 2 X n * Yn + Yn ) = n ( X n Yn ) 2
F-statistic
n( X Y ) 2 sp
2
p-value Go to
k 1
nk k
Fk-1,nk-k chart
Within
(n individuals per group)
nk-k
SSW
(sum of squared deviations of observations from their group mean)
SSB Squared (squared difference difference in means in means times n multiplied by n) SSW
equivalent to numerator of pooled variance
=(
(X Y ) sp
2
sp
) = (t 2 n 2 )
2
Pooled variance
Total variation
nk-1
TSS=SSB + SSW
TSS
Example 1
Treatment 1 60 inches 67 42 67 56 62 64 59 72 71 Treatment 2 50 52 43 67 67 59 67 64 63 65 Treatment 3 48 49 50 55 56 61 61 60 59 64 Treatment 4 47 67 54 67 68 65 65 56 60 65
Example 1
Step 1) calculate the sum of squares between groups:
Treatment 1 60 inches 67 42 67 56 62 64 59 72 71 Treatment 2 50 52 43 67 67 59 67 64 63 65 Treatment 3 48 49 50 55 56 61 61 60 59 64 Treatment 4 47 67 54 67 68 65 65 56 60 65
Mean for group 1 = 62.0 Mean for group 2 = 59.7 Mean for group 3 = 56.3 Mean for group 4 = 61.4
Example 1
Step 2) calculate the sum of squares within groups:
Treatment 1 60 inches 67 42 67 56 62 64 59 72 71 Treatment 2 50 52 43 67 67 59 67 64 63 65 Treatment 3 48 49 50 55 56 61 61 60 59 64 Treatment 4 47 67 54 67 68 65 65 56 60 65
(60-62) 2+(67-62) 2+ (42-62) 2+ (67-62) 2+ (56-62) 2+ (6262) 2+ (64-62) 2+ (59-62) 2+ (72-62) 2+ (71-62) 2+ (5059.7) 2+ (52-59.7) 2+ (4359.7) 2+67-59.7) 2+ (6759.7) 2+ (69-59.7) 2+.(sum of 40 squared deviations) = 2060.6
Between
196.5
65.5
1.14
.344
Within
36
2060.6
57.2
Total
39
2257.1
Interpretation of ANOVA
Source of variation d.f. Sum of squares Mean Sum of Squares F-statistic p-value
Coefficient of Determination
Between
196.5
65.5
1.14
.344
R2 =
Within
36
2060.6
57.2
Total
39
2257.1
The amount of variation in the outcome variable (dependent variable) that is explained by the predictor (independent variable).
INTERPRETATION of ANOVA: How much of the variance in height is explained by treatment group? R2=Coefficient of Determination = SSB/TSS = 196.5/2275.1 = 9%
Example 2
Table 6. Mean micronutrient intake from the school lunch by school
Calcium (mg) Iron (mg) Folate (g) Zinc (mg)
a
Example 2
S1a, n=25 117.8 62.4 2.0 0.6 26.6 13.1 1.9 1.0 S2b, n=25 158.7 70.5 2.0 0.6 38.7 14.5 1.5 1.2 S3c, n=25 206.5 86.2 2.0 0.6 42.6 15.1 1.3 0.4 P-valued 0.000 0.854 0.000 0.055
Step 1) calculate the sum of squares between groups: Mean for School 1 = 117.8 Mean for School 2 = 158.7 Mean for School 3 = 206.5 Grand mean: 161 SSB = [(117.8-161)2 + (158.7-161)2 + (206.5-161)2] x 25 per group = 98,113
School 1 (most deprived; 40% subsidized lunches). School 2 (medium deprived; <10% subsidized). School 3 (least deprived; no subsidization, private school). d ANOVA; significant differences are highlighted in bold (P<0.05).
b c
FROM: Gould R, Russell J, Barker ME. School lunch menus and 11 to 12 year old children's food choice in three secondary schools in England-are the nutritional standards being met? Appetite. 2006 Jan;46(1):86-92.
Example 2
Step 2) calculate the sum of squares within groups: S.D. for S1 = 62.4 S.D. for S2 = 70.5 S.D. for S3 = 86.2 Therefore, sum of squares within is: (24)[ 62.42 + 70.5 2+ 86.22]=391,066
Example 2
Step 3) Fill in your ANOVA table
Source of variation d.f. 2 Sum of squares 98,113 Mean Sum of Squares 49056 F-statistic 9 p-value <.05
Between
Within Total
72 74
391,066 489,179
5431
**R2=98113/489179=20% School explains 20% of the variance in lunchtime calcium intake in these kids.
df
Count
Sum Average Variance 3 18 6 0.49 4 23.8 5.95 0.176667 3 22.6 7.533333 0.123333
data group 5.3 1 6.0 1 6.7 1 5.5 2 6.2 2 6.4 2 5.7 2 7.5 3 7.2 3 7.9 3 TOTAL TOTAL/df
# of data values - # of groups 1 less than # of groups (equals df for each group added together)
F = 2.5528/0.25025 = 10.21575
(x
obs
ij
xi ) 2
( x
obs
ij
x )2
( x
obs
x )2
So How big is F?
Since F is
So SST = (n -1) s2, and MST = s2. That is, SST and MST measure the TOTAL variation in the data set.
(x
ij
xi )
ni 1
So SS[Within Group i] = (si2) (dfi ) This means that we can compute SSE from the standard deviations and sizes (df) of each group:
s2 p =
2 (n1 1)s12 + ( n 2 1) s2 + ... + (n I 1) sI2 nI 2 + ... + ( df I ) sI2 (df1 ) s12 + (df 2 )s2 2 sp = df1 + df 2 + ... + df I
2 sp =
In Summary
SST = ( x ij x ) 2 = s2 ( DFT )
obs
SSE = ( x ij x i ) 2 = si (df i )
2 obs groups
SSG = ( x i x ) 2 =
obs
n (x
i groups
x)2
SS MSG ; F= DF MSE