
Biostatistics

By: Zelalem M. (M.Sc. in Biostatistics)


E-mail: zelalem2011@gmail.com
College of Medicine and Health
Sciences
Bahir Dar University
12/16/16

Analysis of Variance

1. Introduction

Definition 1: The response variable is the variable of interest to be measured in the experiment. We also refer to the response as the dependent variable.

Definition 2: An experimental unit is the object on which the response and factors are observed or measured.

Definition 3: Replication refers to the number of experimental units that receive the same treatment.
Sampling Study
A random sample of subjects who belong to different, already existing groups.
[Diagram: a random sample is drawn from each of three groups of eaters (Healthy Eaters, Vegetarians, Meat & Potato Eaters) and their cholesterol levels are measured; the responses are denoted y11, y12, ..., y1n1 for group 1, y21, ..., y2n2 for group 2, and y31, ..., y3n3 for group 3.]

Experimental Study
An experiment in which the subjects are randomly assigned to one of several groups.
[Diagram: male college undergraduate students are randomly assigned to one of three sets of experimental units (Vegetarian Diet, Healthy Diet, Meat & Potato Diet); the responses are the cholesterol levels at 1 year, denoted y11, ..., y1n1 for group 1, y21, ..., y2n2 for group 2, and y31, ..., y3n3 for group 3.]

Analysis of Variance
A division of the overall variability in data
values in order to compare means.
Overall (or total) variability is divided into
two components:
the variability between groups, and
the variability within groups
Summarized in an ANOVA table.

ANOVA Notation

Group | Data                  | Group mean
1     | X11, X12, ..., X1n1   | X̄1
2     | X21, X22, ..., X2n2   | X̄2
...   | ...                   | ...
m     | Xm1, Xm2, ..., Xmnm   | X̄m

The mean of all the observations is the Grand Mean, X̄.
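For instance, a tiny Python sketch of this notation (my own illustration, not from the slides; made-up data, numpy assumed) computing the group means and the grand mean for m = 3 groups:

import numpy as np

# Unequal group sizes n1, n2, n3 are allowed
groups = [np.array([4.0, 6.0, 5.0]),        # X11, X12, ..., X1n1
          np.array([7.0, 9.0, 8.0, 8.0]),   # X21, ..., X2n2
          np.array([3.0, 2.0])]             # X31, ..., X3n3

group_means = [g.mean() for g in groups]        # X-bar_1, ..., X-bar_m
grand_mean = np.concatenate(groups).mean()      # X-bar, the mean of all n observations

print(group_means, grand_mean)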

The Model
Yij = μ + τj + eij, or,
Yij - μ = τj + eij.
The difference between the grand mean (μ) and the value of subject number i in group number j is equal to the effect of being in treatment group number j, τj, plus error, eij.
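As a concrete illustration, here is a minimal Python sketch (my own, not from the slides; numpy assumed, and the values of μ, τj, and σ are arbitrary) that simulates data from this one-way model with three treatment groups:

import numpy as np

rng = np.random.default_rng(0)

mu = 20.0                    # grand mean
tau = [-3.0, 0.0, 3.0]       # treatment effects, one per group
n_per_group = 10             # replications per group
sigma = 2.0                  # error standard deviation

# Y_ij = mu + tau_j + e_ij
data = {
    j: mu + tau_j + rng.normal(0.0, sigma, size=n_per_group)
    for j, tau_j in enumerate(tau, start=1)
}

for j, y in data.items():
    print(f"group {j}: mean = {y.mean():.2f}")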

General ANOVA Table


One-way Analysis of Variance

Source    | DF    | SS          | MS  | F
Treatment | m - 1 | SS(Between) | MSB | MSB/MSE
Error     | n - m | SS(Error)   | MSE |
Total     | n - 1 | SS(Total)   |     |

The F statistic is referred to the F-distribution with m - 1 numerator and n - m denominator d.f.

n - 1 = (m - 1) + (n - m)
MSB = SS(Between)/(m - 1)
MSE = SS(Error)/(n - m)
SS(Total) = SS(Between) + SS(Error)
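To make these relations concrete, the following Python sketch (my own illustration, not part of the slides; the data are made up and numpy/scipy are assumed to be installed) builds the one-way ANOVA quantities for three small groups and cross-checks the F value against scipy.stats.f_oneway:

import numpy as np
from scipy import stats

groups = [np.array([22.0, 25.0, 30.0, 28.0]),
          np.array([18.0, 20.0, 17.0, 23.0]),
          np.array([26.0, 31.0, 29.0, 33.0])]

n = sum(len(g) for g in groups)          # total number of observations
m = len(groups)                          # number of groups
grand_mean = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)

msb = ss_between / (m - 1)               # MSB = SS(Between)/(m-1)
mse = ss_error / (n - m)                 # MSE = SS(Error)/(n-m)
f_stat = msb / mse

print(f"F = {f_stat:.4f}")
print("scipy f_oneway:", stats.f_oneway(*groups))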

The hypothesis to be tested for k means (k > 2) is
H0: μ1 = μ2 = μ3 = ... = μk
H1: at least one population mean does not equal another population mean.
(We cannot carry out a one-tailed alternative hypothesis such as H1: μ1 < μ2 < μ3 < ... < μk; instead the alternative is written as μi ≠ μj for at least one pair i ≠ j.)
If we reject the null hypothesis, we conclude that at least one population mean does not equal another population mean, but the test does not tell us which.

Assumptions

The observed data constitute independent random samples from the respective populations.
Each of the populations from which the samples come is normally distributed (not markedly skewed).
Each of the populations has the same variance.
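One way to check these assumptions in practice is sketched below (my own illustration, not from the slides; the data are made up and scipy is assumed): the Shapiro-Wilk test checks normality within each group and Levene's test compares the group variances.

from scipy import stats

groups = [
    [22.0, 25.0, 30.0, 28.0, 24.0],
    [18.0, 20.0, 17.0, 23.0, 19.0],
    [26.0, 31.0, 29.0, 33.0, 27.0],
]

# Normality within each group (Shapiro-Wilk)
for i, g in enumerate(groups, start=1):
    w, p = stats.shapiro(g)
    print(f"group {i}: Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")

# Equality of variances across groups (Levene's test)
stat, p = stats.levene(*groups)
print(f"Levene: W = {stat:.3f}, p = {p:.3f}")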

2. One Way ANOVA

If there is only one factor we call it one-way ANOVA.
For one factor with k groups, we use the following definitions:
Xij is the ith observation in the jth group.
X̄j is the mean of all observations in the jth group.
X̄ is the grand mean of all the observations.
Refer to the following example.

Experimental Study
[Diagram, repeated from above: male college undergraduate students are randomly assigned to one of three diet groups (Vegetarian, Healthy, Meat & Potato); the responses are the cholesterol levels at 1 year, denoted x11, ..., x1n1 for group 1, x21, ..., x2n2 for group 2, and x31, ..., x3n3 for group 3.]

SST = SStt + SSE

$$\sum_{i=1}^{m}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}\right)^2=\sum_{i=1}^{m}n_i\left(\bar{X}_i-\bar{X}\right)^2+\sum_{i=1}^{m}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}_i\right)^2$$

(Writing $X_{ij}-\bar{X}=(X_{ij}-\bar{X}_i)+(\bar{X}_i-\bar{X})$ and squaring, the cross-product term sums to zero.)

We've broken down the TOTAL variation into a component due to TREATMENT and a component due to random ERROR.
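A quick numerical check of this identity (my own sketch, not from the slides; made-up data, numpy assumed):

import numpy as np

groups = [np.array([22.0, 25.0, 30.0, 28.0]),
          np.array([18.0, 20.0, 17.0, 23.0]),
          np.array([26.0, 31.0, 29.0, 33.0])]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

sst = ((all_obs - grand_mean) ** 2).sum()                            # total SS
ss_tt = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # treatment SS
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)               # error SS

print(np.isclose(sst, ss_tt + sse))   # True: SST = SStt + SSE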

A one-way ANOVA test of the null hypothesis requires calculation of the following quantities:

Between-groups sum of squares (SSB):
$$SSB=\sum_{j=1}^{k}n_j\left(\bar{X}_j-\bar{X}\right)^2$$

Within-groups sum of squares (SSW):
$$SSW=\sum_{j=1}^{k}\left(n_j-1\right)S_j^2$$

Between-groups degrees of freedom = k - 1
Within-groups degrees of freedom = n - k, where n is the total number of observations in all groups.

Between-groups mean square (MSB): MSB = SSB/(k - 1)
Within-groups mean square (MSW): MSW = SSW/(n - k)

The F-statistic is used to test the null hypothesis:
F = MSB/MSW, with d.f. = (k - 1, n - k).

The within-groups mean square is also called the error mean square (MSE); its degrees of freedom are the error degrees of freedom and its SS is the error sum of squares (SSE).
The between-groups variance measures how the group means vary about the grand mean.
The within-groups variance measures how the scores in each group vary about the group mean.

If calculated F > F tabulated, reject H0.

F tabulated depends on α, the degrees of freedom for the numerator, and the degrees of freedom for the denominator.
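For example, the tabulated value can be obtained directly from the F distribution; a minimal sketch of the decision rule (my own, not from the slides; scipy assumed, and the values of alpha, the d.f., and the calculated F are arbitrary):

from scipy import stats

alpha = 0.05
df_num, df_den = 2, 27          # k - 1 and n - k for k = 3 groups, n = 30

f_tab = stats.f.ppf(1 - alpha, df_num, df_den)   # critical (tabulated) value
f_cal = 4.10                                     # example calculated F

print(f"F tabulated = {f_tab:.2f}")
print("reject H0" if f_cal > f_tab else "do not reject H0")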

Follow-up Procedures
A significant F only tells us that there are differences, not where the specific differences lie.

Let's call the between-groups variation the between-groups mean square (MSB).
Let's call the within-groups variation the within-groups mean square (MSW).
ANOVA compares the between-groups mean square (MSB) to the within-groups mean square (MSW) through the ratio F = MSB/MSW.

So what are these MSB and MSW?
Remember the meaning of variance!
What we are doing is separating the dependent variable's overall variance across all groups into the part attributable to deviations of the group means from the overall mean (between, B) and the part attributable to deviations of individuals' scores from their group's mean (within, W):
F = MSB/MSW

As MSW gets larger, F gets smaller.
As MSW gets smaller, F gets larger.
[Diagram: group distributions with their means (Y-bar) drawn close together versus far apart.]
So, as F gets smaller, the groups are less distinct. As F gets larger, the groups are more distinct.

Sir Ronald Fisher (1890-1962)

Founder of population genetics
Analysis of variance (in the 1920s)
Likelihood
P-value
Randomized experiments
Multiple regression
etc., etc., etc.

Summary of the ANOVA Technique
[Diagram: the total variability is split into the between-group sum of squares (SSB), which gives the between-group mean square estimate (MSB), and the within-groups sum of squares (SSW), which gives the within-groups mean square estimate (MSW); F calculated = MSB/MSW.]

ANOVA Summary Table: The F Ratio

Source  | SS  | df    | MS  | F
Between | SSB | k - 1 | MSB | MSB/MSW
Within  | SSW | n - k | MSW |
Total   | SST | n - 1 |     |

Where k = number of groups
n = total number of observations (subjects) in all groups
H0: μ1 = μ2 = ... = μk
H1: some μ's are unequal

ANOVA Summary Table: The F Ratio

Source  | SS  | df    | MS  | F
Between | SSB | k - 1 | MSB | MSB/MSW
Within  | SSW | n - k | MSW |
Total   | SST | n - 1 |     |

Here the between-groups variation reflects the effect of eating habit, while the within-groups variation reflects individual differences (measurement error, random fluctuation).

Where k = number of groups
n = total number of observations (subjects) in all groups
H0: μ1 = μ2 = μ3 = ...
H1: some μ's are unequal

ANOVA Summary Table: The F Ratio

Source  | SS  | df    | MS  | F
Between | SSB | k - 1 | MSB | MSB/MSW
Within  | SSW | n - k | MSW |
Total   | SST | n - 1 |     |

Each SS is a sum of squared deviations, Σ(x - x̄)²; each MS is a mean of the squares, the SS divided by its degrees of freedom.

Where k = number of groups
n = total number of observations (subjects) in all groups
H0: μ1 = μ2 = μ3 = ...
H1: some μ's are unequal

Examples of the F distribution. The shape of the distribution depends on the sample size and the number of groups we are comparing:

Critical values for the F-statistic


Relationship Between t and F

F = t²
F and t are based on the same mathematical model, and the t-test is just a special case of ANOVA.
It is OK to use the F test when comparing 2 means.
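A small demonstration of this relationship for two groups (my own sketch, not from the slides; made-up data, scipy assumed):

from scipy import stats

a = [22.0, 25.0, 30.0, 28.0, 24.0]
b = [18.0, 20.0, 17.0, 23.0, 19.0]

t_stat, _ = stats.ttest_ind(a, b, equal_var=True)   # pooled-variance t-test
f_stat, _ = stats.f_oneway(a, b)                     # one-way ANOVA with 2 groups

print(f"t^2 = {t_stat**2:.4f}")
print(f"F   = {f_stat:.4f}")   # equal to t^2 (up to rounding)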

Example
The following table shows the natural killer
cell activity measured for three groups of
subjects: those who had low, medium, and
high scores on the social readjustment rating
scale.

Natural Killer Cell Activity (lytic units)

     | Low score | Moderate score | High score
     | 22.2      | 15.1           | 10.2
     | 91.8      | 23.2           | 11.3
     | 29.1      | 10.5           | 11.4
     | 37.0      | 13.9           | 5.3
     | 35.8      | 9.7            | 14.5
     | 44.2      | 19.0           | 11.0
     | 88.0      | 19.8           | 13.6
     | 56.0      | 9.1            | 33.4
     | 9.3       | 30.1           | 25.0
     | 19.9      | 15.5           | 27.0
     | 39.5      | 10.3           | 36.3
     | 12.8      | 11.0           | 17.7
     | 37.4      |                |
Mean | 40.23     | 15.60          | 18.06
SD   | 25.71     | 6.42           | 9.97

H0: μ1 = μ2 = μ3 (the mean natural killer cell activity is equal in the three groups)
H1: μ1 ≠ μ2 or μ2 ≠ μ3 or μ1 ≠ μ3 (the means are not all equal)
The data are collected.
The test statistic is the F-statistic.
α = 0.01
The critical value with d.f. k - 1 = 2 and n - k = 34 is F(0.01; 2, 34) = 5.31.

The grand mean:
$$\bar{X}=\frac{13(40.23)+12(15.6)+12(18.06)}{13+12+12}=25.05$$

$$SSB=\sum_{j=1}^{3}n_j\left(\bar{X}_j-\bar{X}\right)^2=13(40.23-25.05)^2+12(15.60-25.05)^2+12(18.06-25.05)^2=4653.57$$

$$SSW=\sum_{j=1}^{3}\left(n_j-1\right)S_j^2=12(25.71)^2+11(6.42)^2+11(9.97)^2=9478.84$$

Between-groups d.f. = k - 1 = 3 - 1 = 2
Within-groups d.f. = n - k = 37 - 3 = 34
MSB = SSB/(k - 1) = 4653.57/2 = 2326.79
MSW = SSW/(n - k) = 9478.84/34 = 278.79
F = MSB/MSW = 2326.79/278.79 = 8.35

Since Fcal = 8.35 > Ftab = 5.31 we reject


H0. There is a difference in mean natural
killer cell activity among patients with low,
moderate and high scores on the social
readjustment rating scale.
Note that rejecting H0 does not tell us which group means differ.

Source of Variation | d.f. | SS       | MS      | F-statistic
Between-groups      | 2    | 4653.57  | 2326.79 | 8.35
Within-groups       | 34   | 9478.84  | 278.79  |
Total               | 36   | 14132.41 |         |
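The following Python sketch (my own, not from the slides; numpy/scipy assumed) reproduces this table from the group sizes, means, and standard deviations given above:

import numpy as np
from scipy import stats

n = np.array([13, 12, 12])               # group sizes
xbar = np.array([40.23, 15.60, 18.06])   # group means
sd = np.array([25.71, 6.42, 9.97])       # group standard deviations

k = len(n)
N = n.sum()
grand_mean = (n * xbar).sum() / N        # 25.05

ssb = (n * (xbar - grand_mean) ** 2).sum()   # 4653.57
ssw = ((n - 1) * sd ** 2).sum()              # 9478.84

msb = ssb / (k - 1)                      # 2326.79
msw = ssw / (N - k)                      # 278.79
f_cal = msb / msw                        # about 8.35

p_value = stats.f.sf(f_cal, k - 1, N - k)
print(f"F = {f_cal:.2f}, p = {p_value:.4f}")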

Question: We know that at least one mean is different from another. We don't know which one. Now what?

Answer: We do a post-hoc test to see which means are different from each other.
A post-hoc test (an "after the fact" test) is a series of independent-samples t-tests comparing each group's mean to each of the other groups' means (an alpha-level adjustment is needed).

3. Multiple-comparison

The analysis of variance tests the global hypothesis that all the population means are equal against the alternative that at least one population mean does not equal another population mean.
It does not provide any information on which population or populations differed from the others.
There are a variety of methods, called multiple-comparison procedures, that can be used to provide information on this point.
All are essentially based on the t-test, but include appropriate corrections for the fact that many comparisons are being made.

Conducting the t-tests, you will have k(k-1)/2 pairs of t-tests.
In our example, we will have 3(3-1)/2 = 3 comparisons.
Because of the likelihood of multiple-comparison errors, statisticians have created ways to reduce the multiple-comparison error rate.
One of these is the Bonferroni correction, which adjusts the α-level for each comparison by dividing by the number of comparisons. This lowers the likelihood of rejection in each test, so that the joint α-level does not exceed the original α-level.
In our example, 0.05/3 = 0.017, so the α-level for each comparison test is 0.017.
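A sketch of such Bonferroni-adjusted pairwise t-tests, using the natural killer cell data from the worked example above (my own illustration, not from the slides; scipy assumed):

from itertools import combinations
from scipy import stats

groups = {
    "low":      [22.2, 91.8, 29.1, 37.0, 35.8, 44.2, 88.0, 56.0, 9.3, 19.9, 39.5, 12.8, 37.4],
    "moderate": [15.1, 23.2, 10.5, 13.9, 9.7, 19.0, 19.8, 9.1, 30.1, 15.5, 10.3, 11.0],
    "high":     [10.2, 11.3, 11.4, 5.3, 14.5, 11.0, 13.6, 33.4, 25.0, 27.0, 36.3, 17.7],
}

alpha = 0.05
pairs = list(combinations(groups, 2))    # k(k-1)/2 = 3 pairs
alpha_adj = alpha / len(pairs)           # Bonferroni-adjusted level, about 0.017

for a, b in pairs:
    t, p = stats.ttest_ind(groups[a], groups[b])
    verdict = "significant" if p < alpha_adj else "not significant"
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.4f} -> {verdict} at {alpha_adj:.3f}")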
The frequently used tests, with their suitability to different situations, are:
- t-test for paired or independent groups (with or without adjusting the α level downward);
- Bonferroni's t-method;
- Dunn's multiple-comparison procedure;
- Tukey's HSD (Honestly Significant Difference);
- Scheffé's procedure;
- Newman-Keuls procedure;
- Dunnett's procedure;
- Duncan's new multiple-range test;
- Least Significant Difference (LSD).

Choice of Multiple Comparisons of Means

The choice of a multiple-comparisons method in ANOVA will depend on the type of experimental design used and the comparisons of interest to the analyst.
For example, Tukey (1949) developed his procedure specifically for pairwise comparisons when the sample sizes of the treatments are equal.
The Bonferroni method, like the Tukey procedure, can be applied when pairwise comparisons are of interest; however, Bonferroni's method does not require equal sample sizes.
Scheffé (1953) developed a more general procedure for comparing all possible linear combinations of treatment means (called contrasts).
Consequently, when making pairwise comparisons, the confidence intervals produced by Scheffé's method will generally be wider than the Tukey or Bonferroni confidence intervals.
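As an illustration of a pairwise procedure, here is a minimal sketch of Tukey's HSD applied to the worked-example data (my own code, not from the slides; it assumes a recent SciPy, roughly 1.8 or later, which provides scipy.stats.tukey_hsd; note that the group sizes here are slightly unequal):

from scipy import stats

low = [22.2, 91.8, 29.1, 37.0, 35.8, 44.2, 88.0, 56.0, 9.3, 19.9, 39.5, 12.8, 37.4]
moderate = [15.1, 23.2, 10.5, 13.9, 9.7, 19.0, 19.8, 9.1, 30.1, 15.5, 10.3, 11.0]
high = [10.2, 11.3, 11.4, 5.3, 14.5, 11.0, 13.6, 33.4, 25.0, 27.0, 36.3, 17.7]

# All pairwise comparisons of group means with family-wise error control
result = stats.tukey_hsd(low, moderate, high)
print(result)   # pairwise differences, confidence intervals, and p-values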

Scheffé test:
Suitable for comparisons among all possible combinations (contrasts) of group means, not simply pairwise comparisons.
Corrects for the increased risk of a Type I error (the most conservative post-hoc test).

Dunnett test:
Useful for planned comparisons, e.g. comparing different groups against a single control group.
Less stringent than Scheffé.
