and Chi-square
p < .01
One-way ANOVA
ANOVA
(ANalysis Of VAriance)
Idea: For two or more groups, test
difference between means, for
quantitative normally distributed
variables.
Just an extension of the t-test (an
ANOVA with only two groups is
mathematically equivalent to a t-test).
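The equivalence with the t-test can be checked numerically. The sketch below is a minimal illustration, assuming a pooled-variance two-sample t-test and two small made-up groups (the data and the helper names `pooled_t` and `one_way_f` are hypothetical, not from the slides); it compares the squared t statistic with the ANOVA F statistic.

```python
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (sp2 * (1 / nx + 1 / ny)) ** 0.5

def one_way_f(groups):
    """One-way ANOVA F statistic: mean square between / mean square within."""
    observations = [v for g in groups for v in g]
    grand_mean = mean(observations)
    k, n_total = len(groups), len(observations)
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ssb / (k - 1)) / (ssw / (n_total - k))

# Hypothetical measurements for two groups (not from the slides).
group_a = [5.1, 4.9, 6.2, 5.8, 5.5]
group_b = [6.0, 6.4, 5.9, 7.1, 6.6]

t_stat = pooled_t(group_a, group_b)
f_stat = one_way_f([group_a, group_b])
print(f"t^2 = {t_stat ** 2:.6f}, F = {f_stat:.6f}")
```

With two groups the two printed numbers should agree up to floating-point error, which is the sense in which the two tests are equivalent.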
One-Way Analysis of Variance
$$H_0: \mu_1 = \mu_2 = \mu_3$$
Under $H_0$:
$$\frac{\sigma^2_{between}}{\sigma^2_{within}} \sim F_{n,m}$$
$$H_a: \sigma^2_{between} > \sigma^2_{within}$$
ANOVA example
[Figure: vertical axis "SPINE" (0.7 to 1.1) for the PLACEBO, 800 mg CALCIUM, and 1500 mg CALCIUM groups, annotated to contrast within-group variability with between-group variation.]
Group means and standard
deviations
Placebo group (n=11):
Mean spine BMD = .92 g/cm2
standard deviation = .10 g/cm2
800 mg calcium supplement group (n=11)
Mean spine BMD = .94 g/cm2
standard deviation = .08 g/cm2
1500 mg calcium supplement group (n=11)
Mean spine BMD =1.06 g/cm2
standard deviation = .11 g/cm2
The F-Test
$$s^2_{between} = n\,s^2_{\bar{x}} = 11 \times \frac{(.92-.97)^2 + (.94-.97)^2 + (1.06-.97)^2}{3-1} = .063$$
Between-group variation. Here n = 11 is the size of the groups, and each squared term is the difference of each group’s mean from the overall mean.
$$s^2_{within} = \text{avg } s^2 = \frac{1}{3}\left(.10^2 + .08^2 + .11^2\right) = .0095$$
The average amount of variation within groups. Each term is one group’s variance.
$$F_{2,30} = \frac{s^2_{between}}{s^2_{within}} = \frac{.063}{.0095} = 6.6$$
A large F value indicates that the between-group variation exceeds the within-group variation (= the background noise).
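The F statistic for the calcium example can be reproduced from the summary statistics alone. The following is a minimal sketch using the group means, standard deviations, and n = 11 quoted in the slides; the variable names are illustrative, and the shortcut of averaging the group variances assumes equal group sizes, as in this example.

```python
from statistics import mean

# Summary statistics from the slides (spine BMD, g/cm^2).
group_means = [0.92, 0.94, 1.06]
group_sds = [0.10, 0.08, 0.11]
n = 11                      # observations per group (equal group sizes)
k = len(group_means)        # number of groups

# With equal n, the overall mean is the mean of the group means.
grand_mean = mean(group_means)

# Between-group variance: n times the sample variance of the group means.
var_of_means = sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
s2_between = n * var_of_means

# Within-group variance: the average of the group variances.
s2_within = mean([sd ** 2 for sd in group_sds])

f_stat = s2_between / s2_within
df = (k - 1, k * (n - 1))   # numerator and denominator degrees of freedom

print(f"s2_between = {s2_between:.4f}")
print(f"s2_within  = {s2_within:.4f}")
print(f"F{df} = {f_stat:.2f}")
```

The code uses the unrounded overall mean (about .973) rather than .97, so its intermediate values can differ from the hand calculation in the third decimal place.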
How to calculate ANOVA’s by
hand…
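Treatment 1   Treatment 2   Treatment 3   Treatment 4
y_{1,1}       y_{2,1}       y_{3,1}       y_{4,1}
y_{1,2}       y_{2,2}       y_{3,2}       y_{4,2}
y_{1,3}       y_{2,3}       y_{3,3}       y_{4,3}
y_{1,4}       y_{2,4}       y_{3,4}       y_{4,4}
y_{1,5}       y_{2,5}       y_{3,5}       y_{4,5}
y_{1,6}       y_{2,6}       y_{3,6}       y_{4,6}
y_{1,7}       y_{2,7}       y_{3,7}       y_{4,7}
y_{1,8}       y_{2,8}       y_{3,8}       y_{4,8}
y_{1,9}       y_{2,9}       y_{3,9}       y_{4,9}
y_{1,10}      y_{2,10}      y_{3,10}      y_{4,10}
n = 10 obs./group
k = 4 groups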
The group means:
$$\bar{y}_{i\cdot} = \frac{\sum_{j=1}^{10} y_{ij}}{10}, \quad i = 1, 2, 3, 4$$
The (within) group variances:
$$s_i^2 = \frac{\sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2}{10 - 1}, \quad i = 1, 2, 3, 4$$
Sum of Squares Within (SSW),
or Sum of Squares Error (SSE)
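The (within) group variances:
$$\frac{\sum_{j=1}^{10} (y_{1j} - \bar{y}_{1\cdot})^2}{10 - 1}, \quad \frac{\sum_{j=1}^{10} (y_{2j} - \bar{y}_{2\cdot})^2}{10 - 1}, \quad \frac{\sum_{j=1}^{10} (y_{3j} - \bar{y}_{3\cdot})^2}{10 - 1}, \quad \frac{\sum_{j=1}^{10} (y_{4j} - \bar{y}_{4\cdot})^2}{10 - 1}$$
Adding the four numerators:
$$\sum_{j=1}^{10} (y_{1j} - \bar{y}_{1\cdot})^2 + \sum_{j=1}^{10} (y_{2j} - \bar{y}_{2\cdot})^2 + \sum_{j=1}^{10} (y_{3j} - \bar{y}_{3\cdot})^2 + \sum_{j=1}^{10} (y_{4j} - \bar{y}_{4\cdot})^2 = \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2$$
Sum of Squares Within (SSW)
(or SSE, for chance error)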
Sum of Squares Between (SSB), or
Sum of Squares Regression (SSR)
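Overall mean of all 40 observations (“grand mean”):
$$\bar{y}_{\cdot\cdot} = \frac{\sum_{i=1}^{4} \sum_{j=1}^{10} y_{ij}}{40}$$
Sum of squares between: the squared difference of each group mean from the overall mean, counted once per observation in the group.
$$SSB = 10 \times \sum_{i=1}^{4} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$$
Total sum of squares (TSS): the squared difference of every observation from the overall mean (the numerator of the variance of Y!).
$$TSS = \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{\cdot\cdot})^2$$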
Partitioning of Variance
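$$\sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2 + 10 \times \sum_{i=1}^{4} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{\cdot\cdot})^2$$
SSW + SSB = TSS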
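The partition can be verified numerically on any balanced data set. The sketch below is illustrative only: the data are simulated (hypothetical, seeded for reproducibility) with k = 4 groups of n = 10 to match the layout of the hand calculation, and the three sums of squares are computed directly from their definitions.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical data: 4 treatment groups with 10 observations each.
groups = [[random.gauss(mu, 5.0) for _ in range(10)] for mu in (60, 62, 64, 66)]

observations = [y for g in groups for y in g]
grand_mean = mean(observations)
group_means = [mean(g) for g in groups]

# Sum of squares within: deviations of observations from their own group mean.
ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

# Sum of squares between: deviations of group means from the grand mean,
# weighted by the number of observations in each group.
ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

# Total sum of squares: deviations of observations from the grand mean.
tss = sum((y - grand_mean) ** 2 for y in observations)

print(f"SSW = {ssw:.3f}, SSB = {ssb:.3f}, SSW + SSB = {ssw + ssb:.3f}, TSS = {tss:.3f}")
```

Changing the seed or the group means changes the individual values, but SSW + SSB should always match TSS up to floating-point error.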
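For two groups of n observations each, with means $\bar{X}_n$ and $\bar{Y}_n$, the overall mean is $(\bar{X}_n + \bar{Y}_n)/2$, so the between-group sum of squares simplifies as follows:
$$\begin{aligned}
SSB &= \sum_{i=1}^{n} \left( \frac{\bar{X}_n}{2} - \frac{\bar{Y}_n}{2} \right)^2 + \sum_{i=1}^{n} \left( \frac{\bar{Y}_n}{2} - \frac{\bar{X}_n}{2} \right)^2 \\
&= n \left( \left(\frac{\bar{X}_n}{2}\right)^2 + \left(\frac{\bar{Y}_n}{2}\right)^2 - \frac{\bar{X}_n \bar{Y}_n}{2} + \left(\frac{\bar{Y}_n}{2}\right)^2 + \left(\frac{\bar{X}_n}{2}\right)^2 - \frac{\bar{X}_n \bar{Y}_n}{2} \right) \\
&= \frac{n \left( \bar{X}_n^2 - 2 \bar{X}_n \bar{Y}_n + \bar{Y}_n^2 \right)}{2} = \frac{n (\bar{X}_n - \bar{Y}_n)^2}{2}
\end{aligned}$$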
(… − 59.7)² + (69 − 59.7)² + … (sum of 40 squared deviations) = 2060.6
Step 3) Fill in the ANOVA table
Source of variation   d.f.   Sum of squares   Mean sum of squares   F-statistic   p-value
Total                 39     2257.1
INTERPRETATION of ANOVA:
How much of the variance in height is explained by treatment group?
R² = “Coefficient of Determination” = SSB/TSS = 196.5/2257.1 = 9%
Coefficient of Determination
$$R^2 = \frac{SSB}{SSB + SSE} = \frac{SSB}{SST}$$
The amount of variation in the outcome variable (dependent
variable) that is explained by the predictor (independent variable).
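The remaining ANOVA table entries follow mechanically from the sums of squares. The sketch below uses the values quoted in the height example (between 196.5, within 2060.6) and assumes k = 4 groups and 40 observations, which is consistent with the 39 total degrees of freedom shown; the p-value is left out because it needs an F distribution function that the standard library does not provide.

```python
# Sums of squares quoted in the slides for the height example.
ssb = 196.5      # between groups
ssw = 2060.6     # within groups

# Assumed design: 4 groups, 40 observations in total (39 total d.f.).
k, n_total = 4, 40

df_between = k - 1
df_within = n_total - k
df_total = n_total - 1

tss = ssb + ssw                 # total sum of squares
msb = ssb / df_between          # mean square between
msw = ssw / df_within           # mean square within
f_stat = msb / msw
r_squared = ssb / tss           # coefficient of determination

print(f"{'Source':<10}{'d.f.':>6}{'SS':>10}{'MS':>10}{'F':>8}")
print(f"{'Between':<10}{df_between:>6}{ssb:>10.1f}{msb:>10.1f}{f_stat:>8.2f}")
print(f"{'Within':<10}{df_within:>6}{ssw:>10.1f}{msw:>10.1f}")
print(f"{'Total':<10}{df_total:>6}{tss:>10.1f}")
print(f"R^2 = {r_squared:.1%}")
```

Because R² is SSB divided by the total, a small F and a small R² go together here: treatment group accounts for only a small share of the variation in height.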
Beyond one-way ANOVA
Often, you may want to test more than 1
treatment. ANOVA can accommodate
more than 1 treatment or factor, so long
as they are independent. Again, the
variation partitions beautifully!
Total: d.f. = 74, sum of squares = 489,179
R² = 98,113/489,179 = 20%
School explains 20% of the variance in lunchtime calcium
intake in these kids.
ANOVA summary
A statistically significant ANOVA (F-test)
only tells you that at least two of the
groups differ, but not which ones differ.
• Scheffe (adjusts p)
Arrange p-values:
6 9 7 10 5 2 8 4 3
Observed:
Conformed?     2     4     6     8    10
Yes           20    50    75    60    30
No            80    50    25    40    70
Expected:
Conformed?     2     4     6     8    10
Yes           47    47    47    47    47
No            53    53    53    53    53
Do observed and expected differ more
than expected due to chance?
Chi-Square test
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
The expected value and variance of a chi-square:
E(x) = df
Var(x) = 2(df)
Rule of thumb: if the chi-square statistic is much greater than its degrees of freedom, this indicates statistical significance. Here the statistic for the conformity table is about 79.5 on 4 degrees of freedom, so 79.5 >> 4.
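The chi-square statistic for the conformity table can be computed directly from the observed counts. This is a minimal sketch assuming the usual test of independence, where each expected count is (row total × column total) / grand total; the variable names are illustrative.

```python
# Observed counts from the conformity table: rows are Yes / No,
# columns are the five group conditions (2, 4, 6, 8, 10).
observed = [
    [20, 50, 75, 60, 30],
    [80, 50, 25, 40, 70],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence for every cell.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum of (observed - expected)^2 / expected over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"expected Yes row: {expected[0]}")
print(f"chi-square = {chi2:.2f} on {df} d.f.")
```

Comparing the printed statistic with its degrees of freedom applies the rule of thumb above without needing a chi-square table.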
Caveat
When the sample size is very small in
any cell (<5), Fisher’s exact test is
used as an alternative to the chi-square
test.
Chi-square example: recall data…
A cell size of 3 tells us we should opt for the Fisher’s exact result in SAS, but it doesn’t turn out very different in this case.
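For readers without SAS, Fisher’s exact test for a 2×2 table can be sketched with the standard library. The function below is an illustrative implementation, not the SAS procedure: it fixes the table margins, uses the hypergeometric distribution for the top-left cell, and sums the probabilities of all tables that are no more likely than the observed one (one common definition of the two-sided p-value). The counts are those of the cell phone / brain tumor table used in this example.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties are not lost to rounding error.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

# Cell phone example: own / don't own by brain tumor / no brain tumor.
p_value = fisher_exact_two_sided(5, 347, 3, 88)
print(f"Fisher's exact two-sided p = {p_value:.3f}")
```

Other software may define the two-sided p-value slightly differently, so small discrepancies with a packaged routine are possible.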
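$$\hat{p}_{tumor|cellphone} = \frac{5}{352} = .014; \quad \hat{p}_{tumor|nophone} = \frac{3}{91} = .033$$
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_1} + \dfrac{\bar{p}(1 - \bar{p})}{n_2}}}; \quad \bar{p} = \frac{8}{443} = .018$$
$$Z = \frac{.014 - .033}{\sqrt{\dfrac{(.018)(.982)}{352} + \dfrac{(.018)(.982)}{91}}} = \frac{-.019}{.0156} = -1.22$$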
Same data, but use Chi-square test
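             Brain tumor   No brain tumor   Total
Own               5             347          352
Don’t own         3              88           91
Total             8             435          443
$$p_{tumor} = \frac{8}{443} = .018; \quad p_{cellphone} = \frac{352}{443} = .795$$
$$p_{tumor} \times p_{cellphone} = \frac{8}{443} \times \frac{352}{443} = .01435$$
Expected in cell a = .01435 × 443 = 6.36; 1.64 in cell c;
345.64 in cell b; 89.36 in cell d
(R−1) × (C−1) = 1 × 1 = 1 df
$$\chi^2_1 = \frac{(5 - 6.36)^2}{6.36} + \frac{(3 - 1.64)^2}{1.64} + \frac{(88 - 89.36)^2}{89.36} + \frac{(347 - 345.64)^2}{345.64} = 1.44$$
NS
Note: $\chi^2_1 = Z^2$. The Z of −1.22 from the two-proportion test was computed from rounded proportions; without rounding, Z = −1.20 and Z² = 1.44.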
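The agreement between the chi-square statistic and the squared Z statistic can be confirmed numerically. The sketch below recomputes both from the four cell counts of the cell phone table without rounding any intermediate values; the variable names are illustrative.

```python
from math import sqrt

# Cell counts: a, b = owners with / without tumor; c, d = non-owners.
a, b, c, d = 5, 347, 3, 88
n1, n2 = a + b, c + d
total = n1 + n2

# Two-proportion Z test with a pooled proportion.
p1, p2 = a / n1, c / n2
p_pooled = (a + c) / total
z = (p1 - p2) / sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Chi-square test of independence on the same table.
observed = [[a, b], [c, d]]
row_totals = [n1, n2]
col_totals = [a + c, b + d]
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / total) ** 2
    / (row_totals[i] * col_totals[j] / total)
    for i in range(2)
    for j in range(2)
)

print(f"Z = {z:.3f}, Z^2 = {z ** 2:.4f}, chi-square = {chi2:.4f}")
```

For any 2×2 table the two statistics are algebraically identical, so differences seen in hand calculations come only from rounding.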
Same data, but use Odds Ratio
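             Brain tumor   No brain tumor   Total
Own               5             347          352
Don’t own         3              88           91
Total             8             435          443
$$OR = \frac{5 \times 88}{3 \times 347} = .423$$
$$Z = \frac{\ln OR - 0}{\sqrt{\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}}} = \frac{\ln(.423)}{\sqrt{\dfrac{1}{5} + \dfrac{1}{347} + \dfrac{1}{3} + \dfrac{1}{88}}} = \frac{-.86}{.74} = -1.16; \quad p > .05$$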
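The odds ratio calculation can be reproduced in a few lines. The sketch below follows the same steps (odds ratio, log odds ratio, standard error from the four reciprocal cell counts, Z statistic); the 95% confidence interval at the end is an addition that the slides do not show, using the standard normal multiplier 1.96.

```python
from math import exp, log, sqrt

# Cell counts: a, b = owners with / without tumor; c, d = non-owners.
a, b, c, d = 5, 347, 3, 88

odds_ratio = (a * d) / (b * c)
log_or = log(odds_ratio)

# Standard error of the log odds ratio.
se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Z statistic for the null hypothesis ln(OR) = 0, i.e. OR = 1.
z = (log_or - 0) / se_log_or

# Addition (not in the slides): 95% interval on the odds-ratio scale.
ci_low = exp(log_or - 1.96 * se_log_or)
ci_high = exp(log_or + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.3f}, ln(OR) = {log_or:.2f}, SE = {se_log_or:.2f}")
print(f"Z = {z:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that contains 1 corresponds to a Z statistic that is not significant at the .05 level, which matches the conclusion above.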