
What is ANOVA?

ANOVA
ANalysis Of VAriance

An ANOVA is a way to compare multiple sample means to see if they are significantly different. The name describes what the test does: ANalysis Of VAriance = ANOVA. An ANOVA looks at the variance between the sample means and decides whether it is significant. This can be done to compare two or more samples.

When are ANOVAs useful?


An ANOVA can be used when one wants to compare any number of samples. The test can be done to see if several samples could have come from the same population. It can also tell you about differences between two or more areas. For example, if a survey is conducted in many different towns, you can see if the average responses differ significantly. Similarly, you can take samples of plant growth in different climates, different soils, or with different treatments. In all cases, an ANOVA can be used to see if the means vary significantly.

When are ANOVAs useful?


An ANOVA can be used to test whether any number of means differ. It can be used to look at the interacting effects of two or more variables. It compares variability within and between experimental groups to test differences between means.

How does one carry out an ANOVA?


An ANOVA is conducted by first pooling all the samples into one large sample. The standard deviation of this pooled sample is then found and called σ. Next, the value for the range of variation in sample means is found: σ/√N, where N is the number of observations in each sample. If the variation between the means is greater than this range of variation, the null hypothesis is rejected. The difference between each pair of sample means is then found; this is the variation of the means. If any one of these differences is greater than the range of variation, then those two means are significantly different from each other. Depending on your goal, this may cause you to reject your null hypothesis.
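As a sketch, the informal procedure above can be written in a few lines of stdlib Python. The function name `rough_anova` is ours, and this is the slides' heuristic comparison, not the formal F-test developed later in the deck:

```python
import math

def rough_anova(groups):
    """Sketch of the slides' informal procedure: pool all samples,
    estimate the spread of sample means as sigma/sqrt(N), then flag
    any pair of group means whose difference exceeds that range.
    (A heuristic only, not the formal F-test.)"""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    mean = sum(pooled) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in pooled) / (n - 1))
    N = len(groups[0])                      # assumes equal group sizes
    range_of_variation = sigma / math.sqrt(N)
    means = [sum(g) / len(g) for g in groups]
    return [(i, j)
            for i in range(len(means))
            for j in range(i + 1, len(means))
            if abs(means[i] - means[j]) > range_of_variation]

# Two clearly separated groups are flagged; identical groups are not.
print(rough_anova([[1, 2, 3], [11, 12, 13]]))  # [(0, 1)]
```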

What does ANOVA do?


ANOVA tests the following hypotheses:
H0: The means of all the groups are equal.
Ha: Not all the means are equal.
(Ha doesn't say how or which ones differ.)

Note: we usually refer to the sub-populations as groups when doing ANOVA.

Hypotheses of One-Way ANOVA


H0: μ_1 = μ_2 = μ_3 = … = μ_c
All population means are equal, i.e., no treatment effect (no variation in means among groups).

H1: Not all of the population means are the same
At least one population mean is different, i.e., there is a treatment effect. This does not mean that all population means are different (some pairs may be the same).

Assumptions of ANOVA

each group is approximately normal
check this by looking at histograms and/or normal quantile plots, or rely on assumptions; ANOVA can handle some non-normality, but not severe outliers

standard deviations of each group are approximately equal
rule of thumb: ratio of largest to smallest sample st. dev. must be less than 2:1

Normality Check
We should check for normality using: assumptions about the populations, histograms for each group, and a normal quantile plot for each group. With such small data sets, there isn't a good way to check normality from the data, but we make the common assumption that physical measurements of people tend to be normally distributed.

Standard Deviation Check


Variable: days

Treatment | N | Mean   | Median | StDev
A         | 8 | 7.250  | 7.000  | 1.669
B         | 8 | 8.875  | 9.000  | 1.458
P         | 9 | 10.111 | 10.000 | 1.764

Compare the largest and smallest standard deviations: largest = 1.764, smallest = 1.458; 1.458 × 2 = 2.916 > 1.764, so the rule of thumb is satisfied.
Note: a variance ratio of 4:1 is equivalent.
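The 2:1 rule of thumb is easy to automate; a minimal sketch (the function name is ours, not from the slides):

```python
def sd_ratio_ok(stdevs):
    """Rule of thumb for the equal-spread assumption:
    the largest sample SD must be less than twice the smallest."""
    return max(stdevs) < 2 * min(stdevs)

# Treatments A, B, P from the slide: 1.764 < 2 * 1.458 = 2.916, so OK.
print(sd_ratio_ok([1.669, 1.458, 1.764]))  # True
```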

Notation for ANOVA


n = number of individuals all together
I = number of groups
x̄ = mean for the entire data set (grand mean)

Group i has:
n_i = # of individuals in group i
x_ij = value for individual j in group i
x̄_i = mean for group i
s_i = standard deviation for group i

How ANOVA works


ANOVA measures two sources of variation in the data and compares their relative sizes.

variation BETWEEN groups: for each data value, look at the difference between its group mean and the overall mean: (x̄_i - x̄)²

variation WITHIN groups: for each data value, look at the difference between that value and the mean of its group: (x_ij - x̄_i)²

How ANOVA works


The ANOVA F-statistic is a ratio of the Between Group Variation divided by the Within Group Variation:

F = Between / Within = MSG / MSE

The F-distribution
A ratio of variances follows an F-distribution:

s²_between / s²_within ~ F_{n,m}

A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.

The F-test tests the hypothesis that two variances are equal. F will be close to 1 if the sample variances are equal.

H0: σ²_between = σ²_within
Ha: σ²_between > σ²_within
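The ratio can be computed directly from raw data; a minimal one-way sketch of F = MSG/MSE (the function name is ours), tried on the small three-group data set used later in these slides:

```python
def anova_f(groups):
    """Minimal one-way ANOVA sketch: F = MSG / MSE, following the
    between/within decomposition described above."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ssg = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    msg = ssg / (k - 1)   # between-group mean square
    mse = sse / (n - k)   # within-group mean square
    return msg / mse

f = anova_f([[5.3, 6.0, 6.7], [5.5, 6.2, 6.4, 5.7], [7.5, 7.2, 7.9]])
print(round(f, 2))  # 10.22
```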

How are these computations made?


We want to measure the amount of variation due to BETWEEN group variation and WITHIN group variation. For each data value, we calculate its contribution to:
BETWEEN group variation: (x̄_i - x̄)²
WITHIN group variation: (x_ij - x̄_i)²

How to calculate ANOVAs


Treatment 1: y_11, y_12, y_13, …, y_1,10
Treatment 2: y_21, y_22, y_23, …, y_2,10
Treatment 3: y_31, y_32, y_33, …, y_3,10
Treatment 4: y_41, y_42, y_43, …, y_4,10

n = 10 obs./group; k = 4 groups

The group means:

ȳ_1 = Σ_{j=1}^{10} y_1j / 10
ȳ_2 = Σ_{j=1}^{10} y_2j / 10
ȳ_3 = Σ_{j=1}^{10} y_3j / 10
ȳ_4 = Σ_{j=1}^{10} y_4j / 10

The (within) group variances:

s_1² = Σ_{j=1}^{10} (y_1j - ȳ_1)² / (10 - 1)
s_2² = Σ_{j=1}^{10} (y_2j - ȳ_2)² / (10 - 1)
s_3² = Σ_{j=1}^{10} (y_3j - ȳ_3)² / (10 - 1)
s_4² = Σ_{j=1}^{10} (y_4j - ȳ_4)² / (10 - 1)

Sum of Squares Within (SSW), or Sum of Squares Error (SSE):

SSW = (10 - 1)s_1² + (10 - 1)s_2² + (10 - 1)s_3² + (10 - 1)s_4² = Σ_{i=1}^{4} Σ_{j=1}^{10} (y_ij - ȳ_i)²

the sum of the squared deviations of each observation from its own group mean (chance error).

Overall mean of all 40 observations (grand mean):

ȳ = Σ_{i=1}^{4} Σ_{j=1}^{10} y_ij / 40

Sum of Squares Between (SSB), or Sum of Squares Regression (SSR):

SSB = 10 × Σ_{i=1}^{4} (ȳ_i - ȳ)²

the variability of the group means compared to the grand mean (the variability due to the treatment).

Partitioning of Variance

Total sum of squares (TSS): the squared difference of every observation from the overall mean (the numerator of the variance of Y!):

TSS = Σ_{i=1}^{4} Σ_{j=1}^{10} (y_ij - ȳ)²

Together with

SSW = Σ_{i=1}^{4} Σ_{j=1}^{10} (y_ij - ȳ_i)²  and  SSB = 10 × Σ_{i=1}^{4} (ȳ_i - ȳ)²,

SSW + SSB = TSS
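The partition can be verified numerically on any data set; a small sketch (the function name is ours):

```python
def sum_of_squares(groups):
    """Compute SSW, SSB, and TSS as defined above and return all three,
    so the partition SSW + SSB = TSS can be checked on any data set."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ssw = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    tss = sum((x - grand) ** 2 for g in groups for x in g)
    return ssw, ssb, tss

ssw, ssb, tss = sum_of_squares([[1, 2], [3, 5]])
print(abs(ssw + ssb - tss) < 1e-9)  # True
```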

ANOVA Table

Source of variation              | d.f.   | Sum of squares | Mean Sum of Squares | F-statistic | p-value
Between (k groups)               | k - 1  | SSB (sum of squared deviations of group means from grand mean) | SSB/(k - 1) | [SSB/(k - 1)] / [SSW/(nk - k)] | Go to F_{k-1, nk-k} chart
Within (n individuals per group) | nk - k | SSW (sum of squared deviations of observations from their group mean) | s² = SSW/(nk - k) | |
Total variation                  | nk - 1 | TSS (sum of squared deviations of observations from grand mean); TSS = SSB + SSW | | |

ANOVA = t-test

With two equal-sized groups (k = 2, n per group), the grand mean is (X̄_n + Ȳ_n)/2, so

SSB = n(X̄_n - (X̄_n + Ȳ_n)/2)² + n(Ȳ_n - (X̄_n + Ȳ_n)/2)²
    = n((X̄_n - Ȳ_n)/2)² + n((Ȳ_n - X̄_n)/2)²
    = n(X̄_n²/2 - X̄_n·Ȳ_n + Ȳ_n²/2) = (n/2)(X̄_n - Ȳ_n)²

Source of variation | d.f.   | Sum of squares | Mean Sum of Squares | F-statistic | p-value
Between (2 groups)  | 1      | SSB (n/2 times the squared difference in means) | SSB/1 | F = SSB/s_p² = n(X̄_n - Ȳ_n)²/(2s_p²) = ((X̄_n - Ȳ_n)/(s_p·√(2/n)))² = t²_{2n-2} | Go to F_{1, 2n-2} chart; notice the values are just (t_{2n-2})²
Within              | 2n - 2 | SSW (equivalent to the numerator of the pooled variance) | s_p² = SSW/(2n - 2), the pooled variance | |
Total variation     | 2n - 1 | TSS | | |

Example 1
Treatment 1 | Treatment 2 | Treatment 3 | Treatment 4
60          | 50          | 48          | 47
67          | 52          | 49          | 67
42          | 43          | 50          | 54
67          | 67          | 55          | 67
56          | 67          | 56          | 68
62          | 59          | 61          | 65
64          | 67          | 61          | 65
59          | 64          | 60          | 56
72          | 63          | 59          | 60
71          | 65          | 64          | 65

(values are heights in inches)

Example 1
Step 1) calculate the sum of squares between groups:
(same height data as above)

Mean for group 1 = 62.0 Mean for group 2 = 59.7 Mean for group 3 = 56.3 Mean for group 4 = 61.4

Grand mean = 59.85

SSB = [(62 - 59.85)² + (59.7 - 59.85)² + (56.3 - 59.85)² + (61.4 - 59.85)²] × n per group = 19.65 × 10 = 196.5
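Step 1 can be redone in a few lines of Python, using the group means from the slide (variable names are ours):

```python
# Step 1 as code: SSB from the four treatment-group means.
means = [62.0, 59.7, 56.3, 61.4]    # group means from the slide
n_per_group = 10
grand = sum(means) / len(means)      # 59.85 (groups are equal-sized)
ssb = n_per_group * sum((m - grand) ** 2 for m in means)
print(round(ssb, 1))  # 196.5
```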

Example 1
Step 2) calculate the sum of squares within groups:
(same height data as above)

The ANOVA table


Step 3) Fill in the ANOVA table
Sum of squares within (computed in Step 2):
(60-62)² + (67-62)² + (42-62)² + (67-62)² + (56-62)² + (62-62)² + (64-62)² + (59-62)² + (72-62)² + (71-62)² + (50-59.7)² + (52-59.7)² + (43-59.7)² + (67-59.7)² + (67-59.7)² + (59-59.7)² + … (sum of 40 squared deviations) = 2060.6

Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value
Between             | 3    | 196.5          | 65.5                | 1.14        | .344
Within              | 36   | 2060.6         | 57.2                |             |
Total               | 39   | 2257.1         |                     |             |
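As a cross-check of the whole table, here is the full computation from the raw heights (a sketch; variable names are ours):

```python
# Re-check of the Example 1 ANOVA table from the raw data.
t1 = [60, 67, 42, 67, 56, 62, 64, 59, 72, 71]
t2 = [50, 52, 43, 67, 67, 59, 67, 64, 63, 65]
t3 = [48, 49, 50, 55, 56, 61, 61, 60, 59, 64]
t4 = [47, 67, 54, 67, 68, 65, 65, 56, 60, 65]
groups = [t1, t2, t3, t4]

grand = sum(sum(g) for g in groups) / 40
ssb = sum(10 * (sum(g) / 10 - grand) ** 2 for g in groups)
ssw = sum(sum((x - sum(g) / 10) ** 2 for x in g) for g in groups)
f = (ssb / 3) / (ssw / 36)           # MSB / MSW with df 3 and 36
print(round(ssb, 1), round(ssw, 1), round(f, 2))  # 196.5 2060.6 1.14
```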

Interpretation of ANOVA
Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value
Between             | 3    | 196.5          | 65.5                | 1.14        | .344
Within              | 36   | 2060.6         | 57.2                |             |
Total               | 39   | 2257.1         |                     |             |

Coefficient of Determination:

R² = SSB/(SSB + SSE) = SSB/SST

The amount of variation in the outcome variable (dependent variable) that is explained by the predictor (independent variable).

INTERPRETATION of ANOVA: How much of the variance in height is explained by treatment group? R² = Coefficient of Determination = SSB/TSS = 196.5/2257.1 ≈ 9%

Example 2
Table 6. Mean micronutrient intake from the school lunch by school (Mean ± SD)

             | S1ᵃ, n=25    | S2ᵇ, n=25    | S3ᶜ, n=25    | P-valueᵈ
Calcium (mg) | 117.8 ± 62.4 | 158.7 ± 70.5 | 206.5 ± 86.2 | 0.000
Iron (mg)    | 2.0 ± 0.6    | 2.0 ± 0.6    | 2.0 ± 0.6    | 0.854
Folate (μg)  | 26.6 ± 13.1  | 38.7 ± 14.5  | 42.6 ± 15.1  | 0.000
Zinc (mg)    | 1.9 ± 1.0    | 1.5 ± 1.2    | 1.3 ± 0.4    | 0.055

a School 1 (most deprived; 40% subsidized lunches). b School 2 (medium deprived; <10% subsidized). c School 3 (least deprived; no subsidization, private school). d ANOVA; significant differences are highlighted in bold (P<0.05).

FROM: Gould R, Russell J, Barker ME. School lunch menus and 11 to 12 year old children's food choice in three secondary schools in England-are the nutritional standards being met? Appetite. 2006 Jan;46(1):86-92.

Step 1) calculate the sum of squares between groups (calcium):
Mean for School 1 = 117.8; Mean for School 2 = 158.7; Mean for School 3 = 206.5; Grand mean = 161
SSB = [(117.8 - 161)² + (158.7 - 161)² + (206.5 - 161)²] × 25 per group = 98,113

Example 2
Step 2) calculate the sum of squares within groups: S.D. for S1 = 62.4; S.D. for S2 = 70.5; S.D. for S3 = 86.2. Therefore, the sum of squares within is: (24)[62.4² + 70.5² + 86.2²] = 391,066
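Step 2 as code, using the identity SSW = Σ (n_i - 1)s_i² derived later in these slides (variable names are ours):

```python
# Step 2 as code: SSW built from each school's SD, with df = n - 1 = 24.
sds = [62.4, 70.5, 86.2]   # calcium SDs for schools 1-3
df = 25 - 1
ssw = sum(df * s ** 2 for s in sds)
print(round(ssw, 1))  # 391066.8 (the slide rounds this to 391,066)
```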

Example 2
Step 3) Fill in your ANOVA table
Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value
Between             | 2    | 98,113         | 49,056              | 9           | <.05
Within              | 72   | 391,066        | 5,431               |             |
Total               | 74   | 489,179        |                     |             |

R² = 98,113/489,179 = 20%. School explains 20% of the variance in lunchtime calcium intake in these kids.

An even smaller example


Suppose we have three groups
Group 1: 5.3, 6.0, 6.7
Group 2: 5.5, 6.2, 6.4, 5.7
Group 3: 7.5, 7.2, 7.9

Excel ANOVA Output


We get the following statistics:

SUMMARY
Groups   | Count | Sum  | Average  | Variance
Column 1 | 3     | 18   | 6        | 0.49
Column 2 | 4     | 23.8 | 5.95     | 0.176667
Column 3 | 3     | 22.6 | 7.533333 | 0.123333

ANOVA
Source of Variation | SS       | df | MS       | F        | P-value  | F crit
Between Groups      | 5.127333 | 2  | 2.563667 | 10.21575 | 0.008394 | 4.737416
Within Groups       | 1.756667 | 7  | 0.250952 |          |          |
Total               | 6.884    | 9  |          |          |          |

Degrees of freedom: Between = 1 less than the number of groups; Within = number of data values minus number of groups (equals the df for each group added together); Total = 1 less than the number of individuals (just like other situations).

Computing ANOVA F statistic


data | group | group mean | WITHIN difference (data - group mean) | squared | BETWEEN difference (group mean - overall mean) | squared
5.3  | 1     | 6.00       | -0.70 | 0.490 | -0.44 | 0.194
6.0  | 1     | 6.00       |  0.00 | 0.000 | -0.44 | 0.194
6.7  | 1     | 6.00       |  0.70 | 0.490 | -0.44 | 0.194
5.5  | 2     | 5.95       | -0.45 | 0.203 | -0.49 | 0.240
6.2  | 2     | 5.95       |  0.25 | 0.063 | -0.49 | 0.240
6.4  | 2     | 5.95       |  0.45 | 0.203 | -0.49 | 0.240
5.7  | 2     | 5.95       | -0.25 | 0.063 | -0.49 | 0.240
7.5  | 3     | 7.53       | -0.03 | 0.001 |  1.09 | 1.188
7.2  | 3     | 7.53       | -0.33 | 0.109 |  1.09 | 1.188
7.9  | 3     | 7.53       |  0.37 | 0.137 |  1.09 | 1.188

overall mean: 6.44
TOTAL (within): 1.757; TOTAL/df = 1.757/7 ≈ 0.2510
TOTAL (between): 5.106; TOTAL/df = 5.106/2 = 2.553

F = MSG / MSE = 2.563667 / 0.250952 ≈ 10.21575 (Excel's unrounded computation; the hand-rounded totals above give ≈ 10.2)

Example of Minitab ANOVA Output

Analysis of Variance for days
Source      DF     SS     MS     F      P
treatment    2  34.74  17.37  6.45  0.006
Error       22  59.26   2.69
Total       24  94.00

Degrees of freedom: treatment = 1 less than # of groups; Error = # of data values - # of groups (equals df for each group added together); Total = 1 less than # of individuals (just like other situations).


SS stands for sum of squares; ANOVA splits this into 3 parts:

SSE = Σ_obs (x_ij - x̄_i)²  (within groups)
SSG = Σ_obs (x̄_i - x̄)²   (between groups)
SST = Σ_obs (x_ij - x̄)²  (total)

MSG = SSG / DFG
MSE = SSE / DFE

F = MSG / MSE (P-values for the F statistic are in Table E)

The P-value comes from F(DFG, DFE).

So How big is F?

Since F is Mean Square Between / Mean Square Within = MSG / MSE, a large value of F indicates relatively more difference between groups than within groups (evidence against H0).

To get the P-value, we compare to the F(I-1, n-I) distribution:
I - 1 degrees of freedom in the numerator (# groups - 1)
n - I degrees of freedom in the denominator (the rest of the df)

Connections between SST, MST, and standard deviation

If we ignore the groups for a moment and just compute the standard deviation of the entire data set, we see

s² = Σ (x_ij - x̄)² / (n - 1) = SST/DFT = MST

So SST = (n - 1)s², and MST = s². That is, SST and MST measure the TOTAL variation in the data set.

Connections between SSE, MSE, and standard deviation

Remember: s_i² = Σ_j (x_ij - x̄_i)² / (n_i - 1) = SS[Within Group i] / df_i

So SS[Within Group i] = s_i² (df_i). This means that we can compute SSE from the standard deviations and sizes (df) of each group:

SSE = SS[Within] = Σ SS[Within Group i] = Σ s_i²(n_i - 1) = Σ s_i²(df_i)

Pooled estimate for standard deviation

One of the ANOVA assumptions is that all groups have the same standard deviation. We can estimate this with a weighted average:

s_p² = [(n_1 - 1)s_1² + (n_2 - 1)s_2² + … + (n_I - 1)s_I²] / (n - I)
     = [(df_1)s_1² + (df_2)s_2² + … + (df_I)s_I²] / (df_1 + df_2 + … + df_I)
     = SSE / DFE = MSE

so MSE is the pooled estimate of variance.
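The weighted-average formula can be sketched directly (the function name is ours); on the three-group example from earlier, it reproduces Excel's Within-Groups MS (0.250952):

```python
def pooled_variance(groups):
    """Weighted average of the group variances, as in the formula above;
    equals SSE / DFE, i.e. the MSE."""
    def var(g):
        m = sum(g) / len(g)
        return sum((x - m) ** 2 for x in g) / (len(g) - 1)
    num = sum((len(g) - 1) * var(g) for g in groups)
    den = sum(len(g) - 1 for g in groups)
    return num / den

sp2 = pooled_variance([[5.3, 6.0, 6.7], [5.5, 6.2, 6.4, 5.7], [7.5, 7.2, 7.9]])
print(round(sp2, 6))  # 0.250952
```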

In Summary
SST = Σ_obs (x_ij - x̄)² = s²(DFT)
SSE = Σ_obs (x_ij - x̄_i)² = Σ_groups s_i²(df_i)
SSG = Σ_obs (x̄_i - x̄)² = Σ_groups n_i(x̄_i - x̄)²

SSE + SSG = SST;  MS = SS/DF;  F = MSG/MSE
