Desci2 Lecture
Learning Objectives
In this chapter, you learn:
The basic concepts of experimental design
How to use one-way analysis of variance to test for differences among the means of several populations (also referred to as groups in this chapter)
When to use a randomized block design
How to use two-way analysis of variance and the concept of interaction
Chapter Overview
Analysis of Variance (ANOVA)
One-Way ANOVA: F-test, Tukey-Kramer test
Randomized Block Design: Multiple Comparisons
Two-Way ANOVA: Interaction Effects
Example
A financial analyst studied the annual returns of three different categories of unit trust.
She wanted to know whether the average annual returns varied across unit trust categories.
A random sample of unit trusts from each of the three categories (labelled A, B and C) was selected and their annual returns were recorded.
The data are shown in the table below.
Management Question
Can the financial analyst conclude that the
average annual returns from the three
categories of unit trusts are the same?
General ANOVA Setting
Investigator controls one or more independent
variables
Called factors (or treatment variables)
Each factor contains two or more levels (or groups or
categories/classifications)
Observe effects on the dependent variable
Response to levels of independent variable
Experimental design: the plan used to collect
the data
Completely Randomized Design
Experimental units (subjects) are assigned
randomly to treatments
Subjects are assumed homogeneous
Only one factor or independent variable
With two or more treatment levels
Analyzed by one-way analysis of variance (ANOVA)
One-Way Analysis of Variance
Evaluate the difference among the means of three or more groups
Examples: Accident rates for 1st, 2nd, and 3rd shift
Expected mileage for five brands of tires
Assumptions
Populations are normally distributed
Populations have equal variances
Samples are randomly and independently drawn
Hypotheses of One-Way ANOVA
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c$
All population means are equal
i.e., no treatment effect (no variation in means among groups)
$H_1$: Not all of the population means are the same
At least one population mean is different
i.e., there is a treatment effect
Does not mean that all population means are different (some pairs may be the same)
One-Factor ANOVA
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c$
$H_1$: Not all $\mu_j$ are the same
All means are the same: the null hypothesis is true (no treatment effect)
Example with 3 groups: $\mu_1 = \mu_2 = \mu_3$
At least one mean is different: the null hypothesis is NOT true (a treatment effect is present)
Example with 3 groups: $\mu_1 = \mu_2 \neq \mu_3$ or $\mu_1 \neq \mu_2 \neq \mu_3$
Partitioning the Variation
Total variation can be split into two parts:
SST = SSA + SSW
SST = Total Sum of Squares (total variation)
SSA = Sum of Squares Among Groups (among-group variation)
SSW = Sum of Squares Within Groups (within-group variation)
Total Variation = the aggregate dispersion of the individual data values across the various factor levels (SST)
Within-Group Variation = dispersion that exists among the data values within a particular factor level (SSW)
Among-Group Variation = dispersion between the factor sample means (SSA)
Partition of Total Variation
Total Variation (SST) = Variation Due to Factor (SSA) + Variation Due to Random Sampling (SSW)
Degrees of freedom: n - 1 = (c - 1) + (n - c)
SSA (among-groups variation) is commonly referred to as:
Sum of Squares Between
Sum of Squares Among
Sum of Squares Explained
SSW (within-group variation) is commonly referred to as:
Sum of Squares Within
Sum of Squares Error
Sum of Squares Unexplained
Total Sum of Squares
SST = SSA + SSW
$$SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2$$
Where:
SST = Total sum of squares
c = number of groups (levels or treatments)
$n_j$ = number of observations in group j
$X_{ij}$ = i-th observation from group j
$\bar{\bar{X}}$ = grand mean (mean of all data values)
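Total Variation
$$SST = (X_{11} - \bar{\bar{X}})^2 + (X_{21} - \bar{\bar{X}})^2 + \cdots + (X_{n_c c} - \bar{\bar{X}})^2$$
[Figure: response X plotted for Group 1, Group 2 and Group 3, with deviations measured from the grand mean $\bar{\bar{X}}$]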
Among-Group Variation
SST = SSA + SSW
$$SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2$$
Where:
SSA = Sum of squares among groups
c = number of groups
$n_j$ = sample size from group j
$\bar{X}_j$ = sample mean from group j
$\bar{\bar{X}}$ = grand mean (mean of all data values)
Among-Group Variation
$$SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2$$
Variation due to differences among groups
Mean Square Among = SSA / degrees of freedom
$$MSA = \frac{SSA}{c - 1}$$
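Among-Group Variation
$$SSA = n_1 (\bar{X}_1 - \bar{\bar{X}})^2 + n_2 (\bar{X}_2 - \bar{\bar{X}})^2 + \cdots + n_c (\bar{X}_c - \bar{\bar{X}})^2$$
[Figure: response X plotted for Group 1, Group 2 and Group 3, showing the group means $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$ relative to the grand mean $\bar{\bar{X}}$]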
Within-Group Variation
SST = SSA + SSW
$$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$$
Where:
SSW = Sum of squares within groups
c = number of groups
$n_j$ = sample size from group j
$\bar{X}_j$ = sample mean from group j
$X_{ij}$ = i-th observation in group j
Within-Group Variation
$$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$$
Summing the variation within each group and then adding over all groups
Mean Square Within = SSW / degrees of freedom
$$MSW = \frac{SSW}{n - c}$$
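Within-Group Variation
$$SSW = (X_{11} - \bar{X}_1)^2 + (X_{21} - \bar{X}_1)^2 + \cdots + (X_{n_c c} - \bar{X}_c)^2$$
[Figure: response X plotted for Group 1, Group 2 and Group 3, with deviations measured from each group mean $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$]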
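Why the partition SST = SSA + SSW holds can be seen by splitting each deviation from the grand mean into a within-group part and an among-group part. The short derivation below is an added sketch rather than part of the slides; it uses the lecture's symbols, with $\bar{\bar{X}}$ standing for the grand mean and $\bar{X}_j$ for the mean of group j.

```latex
\begin{align*}
X_{ij} - \bar{\bar{X}} &= (X_{ij} - \bar{X}_j) + (\bar{X}_j - \bar{\bar{X}}) \\
\sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2
  &= \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2
   + \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2
   + 2 \sum_{j=1}^{c} (\bar{X}_j - \bar{\bar{X}}) \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j) \\
  &= SSW + SSA + 0
\end{align*}
```

The cross term is zero because, within each group, the deviations from that group's own mean sum to zero.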
Obtaining the Mean Squares
$$MSA = \frac{SSA}{c - 1}$$
$$MSW = \frac{SSW}{n - c}$$
$$MST = \frac{SST}{n - 1}$$
One-Way ANOVA Table
Source of Variation   df      SS                MS (Variance)        F ratio
Among Groups          c - 1   SSA               MSA = SSA/(c - 1)    F = MSA/MSW
Within Groups         n - c   SSW               MSW = SSW/(n - c)
Total                 n - 1   SST = SSA + SSW
c = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom
One-Way ANOVA F Test Statistic
$H_0: \mu_1 = \mu_2 = \cdots = \mu_c$
$H_1$: At least two population means are different
Test statistic
$$F = \frac{MSA}{MSW}$$
MSA is mean squares among groups
MSW is mean squares within groups
Degrees of freedom
$df_1 = c - 1$ (c = number of groups)
$df_2 = n - c$ (n = sum of sample sizes from all populations)
Interpreting the One-Way ANOVA F Statistic
The F statistic is the ratio of the among-groups estimate of variance to the within-groups estimate of variance
The ratio must always be positive
$df_1 = c - 1$ will typically be small
$df_2 = n - c$ will typically be large
Decision Rule:
Reject $H_0$ if $F > F_U$, otherwise do not reject $H_0$
[Figure: F distribution starting at 0, with the "do not reject $H_0$" region to the left of the critical value $F_U$ and the "reject $H_0$" region of area $\alpha = .05$ to the right]
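The test statistic and decision rule can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `one_way_f_statistic` and `reject_h0` are hypothetical, the sums of squares are made-up numbers, and the critical value is assumed to be read from an F table rather than computed.

```python
def one_way_f_statistic(ssa, ssw, n, c):
    """Return F = MSA / MSW, with df1 = c - 1 and df2 = n - c."""
    msa = ssa / (c - 1)  # mean square among groups
    msw = ssw / (n - c)  # mean square within groups
    return msa / msw


def reject_h0(f_stat, f_upper):
    """Decision rule: reject H0 only when F exceeds the upper-tail critical value."""
    return f_stat > f_upper


# Made-up illustration values (not from the lecture): 3 groups, 12 observations.
f_stat = one_way_f_statistic(ssa=40.0, ssw=90.0, n=12, c=3)
f_upper = 4.26  # tabulated F critical value for alpha = 0.05, df1 = 2, df2 = 9
print(f_stat, reject_h0(f_stat, f_upper))
```

With these made-up inputs F is 2.0, which does not exceed 4.26, so the null hypothesis of equal means would not be rejected.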
One-Way ANOVA F Test Example
You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the 0.05 significance level, is there a difference in mean distance?
Club 1 Club 2 Club 3
254 234 200
263 218 222
241 235 197
237 227 206
251 216 204
One-Way ANOVA Example: Scatter Diagram
[Figure: scatter diagram of Distance (vertical axis, 190 to 270) against Club (1, 2, 3), marking the three club means and the grand mean]
$\bar{X}_1 = 249.2$, $\bar{X}_2 = 226.0$, $\bar{X}_3 = 205.8$, $\bar{\bar{X}} = 227.0$
One-Way ANOVA Example Computations
Club 1 Club 2 Club 3
254 234 200
263 218 222
241 235 197
237 227 206
251 216 204
$\bar{X}_1 = 249.2$, $n_1 = 5$
$\bar{X}_2 = 226.0$, $n_2 = 5$
$\bar{X}_3 = 205.8$, $n_3 = 5$
$\bar{\bar{X}} = 227.0$, n = 15, c = 3
$SSA = 5(249.2 - 227)^2 + 5(226 - 227)^2 + 5(205.8 - 227)^2 = 4716.4$
$SSW = (254 - 249.2)^2 + (263 - 249.2)^2 + \cdots + (204 - 205.8)^2 = 1119.6$
$MSA = 4716.4 / (3 - 1) = 2358.2$
$MSW = 1119.6 / (15 - 3) = 93.3$
$$F = \frac{2358.2}{93.3} = 25.275$$
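The computations above can be reproduced directly from the fifteen recorded distances. The following is a minimal Python sketch using only the standard library; the variable names are illustrative choices, not anything prescribed by the lecture.

```python
from statistics import fmean

# Distances from the golf-club example, one list per club.
clubs = [
    [254, 263, 241, 237, 251],  # Club 1
    [234, 218, 235, 227, 216],  # Club 2
    [200, 222, 197, 206, 204],  # Club 3
]

n = sum(len(group) for group in clubs)  # total sample size
c = len(clubs)                          # number of groups
grand_mean = fmean([x for group in clubs for x in group])

# Among-group and within-group sums of squares.
ssa = sum(len(group) * (fmean(group) - grand_mean) ** 2 for group in clubs)
ssw = sum((x - fmean(group)) ** 2 for group in clubs for x in group)

msa = ssa / (c - 1)
msw = ssw / (n - c)
f_stat = msa / msw

print(round(ssa, 1), round(ssw, 1), round(msa, 1), round(msw, 1), round(f_stat, 3))
```

Rounded to the precision used on the slide, the script gives SSA = 4716.4, SSW = 1119.6, MSA = 2358.2, MSW = 93.3 and F = 25.275.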
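One-Way ANOVA Example Solution
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: $\mu_j$ not all equal
$\alpha = 0.05$
$df_1 = 2$, $df_2 = 12$
Critical Value: $F_U = 3.89$
Test Statistic:
$$F = \frac{MSA}{MSW} = \frac{2358.2}{93.3} = 25.275$$
Decision:
Reject $H_0$ at $\alpha = 0.05$
Conclusion:
There is evidence that at least one $\mu_j$ differs from the rest
[Figure: F distribution with the rejection region of area $\alpha = .05$ to the right of $F_U = 3.89$; the test statistic F = 25.275 lies in the rejection region]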
One-Way ANOVA Excel Output
EXCEL: Tools | Data Analysis | ANOVA: Single Factor
SUMMARY
Groups   Count   Sum    Average   Variance
Club 1   5       1246   249.2     108.2
Club 2   5       1130   226       77.5
Club 3   5       1029   205.8     94.2
ANOVA
Source of Variation   SS       df   MS       F        P-value    F crit
Between Groups        4716.4   2    2358.2   25.275   4.99E-05   3.89
Within Groups         1119.6   12   93.3
Total                 5836.0   14
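Two entries of the Excel output can be cross-checked by hand. The sketch below is an added illustration with hypothetical helper names: it recovers MSW by pooling the group variances from the SUMMARY block, and it recomputes the P-value with a closed form that is valid only when the numerator degrees of freedom equal 2, as they do here. For any other $df_1$ a statistical table or library would be needed.

```python
from statistics import variance

clubs = {
    "Club 1": [254, 263, 241, 237, 251],
    "Club 2": [234, 218, 235, 227, 216],
    "Club 3": [200, 222, 197, 206, 204],
}
n = sum(len(d) for d in clubs.values())
c = len(clubs)

# Sample variances, as reported in the Variance column of the SUMMARY block.
group_variances = {name: variance(d) for name, d in clubs.items()}

# MSW is the pooled variance: each group variance weighted by its n_j - 1.
msw = sum((len(d) - 1) * variance(d) for d in clubs.values()) / (n - c)


def p_value_df1_equals_2(f_stat, df2):
    """Upper-tail F probability; this closed form holds only for df1 = 2."""
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


p_value = p_value_df1_equals_2(2358.2 / 93.3, df2=n - c)
print({k: round(v, 1) for k, v in group_variances.items()}, round(msw, 1), p_value)
```

The pooled variance matches the MS entry for Within Groups, and the P-value agrees with the 4.99E-05 shown in the output once rounded to three significant figures.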
The Tukey-Kramer Procedure
Tells which population means are significantly different
e.g.: $\mu_1 = \mu_2 \neq \mu_3$
Done after rejection of equal means in ANOVA
Allows pairwise comparisons
Compare absolute mean differences with critical range
[Figure: distributions along the x axis with $\mu_1 = \mu_2$ coinciding and $\mu_3$ located apart from them]
Tukey-Kramer Critical Range
$$\text{Critical Range} = Q_U \sqrt{\frac{MSW}{2} \left( \frac{1}{n_j} + \frac{1}{n_{j'}} \right)}$$
where:
$Q_U$ = Value from Studentized Range Distribution with c and n - c degrees of freedom for the desired level of $\alpha$ (see appendix E.9 table)
MSW = Mean Square Within
$n_j$ and $n_{j'}$ = Sample sizes from groups j and j'
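The critical range formula translates into a one-line function. This is a minimal sketch with a hypothetical function name; $Q_U$ is assumed to be looked up in a Studentized range table and passed in, not computed. The call uses the values from the golf-club example in this lecture (Q_U = 3.77, MSW = 93.3, five observations per club).

```python
import math


def tukey_kramer_critical_range(q_u, msw, n_j, n_j_prime):
    """Critical range for comparing groups j and j'; sample sizes may differ."""
    return q_u * math.sqrt((msw / 2) * (1 / n_j + 1 / n_j_prime))


critical_range = tukey_kramer_critical_range(q_u=3.77, msw=93.3, n_j=5, n_j_prime=5)
print(round(critical_range, 3))
```

Because the two sample sizes enter separately, the same function covers unequal group sizes, in which case each pair of groups gets its own critical range.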
The Tukey-Kramer Procedure: Example
Club 1 Club 2 Club 3
254 234 200
263 218 222
241 235 197
237 227 206
251 216 204
1. Compute absolute mean differences:
$|\bar{X}_1 - \bar{X}_2| = |249.2 - 226.0| = 23.2$
$|\bar{X}_1 - \bar{X}_3| = |249.2 - 205.8| = 43.4$
$|\bar{X}_2 - \bar{X}_3| = |226.0 - 205.8| = 20.2$
2. Find the $Q_U$ value from the table in appendix E.10 with c = 3 and (n - c) = (15 - 3) = 12 degrees of freedom for the desired level of $\alpha$ ($\alpha$ = 0.05 used here):
$Q_U = 3.77$
The Tukey-Kramer Procedure: Example
3. Compute Critical Range:
$$\text{Critical Range} = Q_U \sqrt{\frac{MSW}{2} \left( \frac{1}{n_j} + \frac{1}{n_{j'}} \right)} = 3.77 \sqrt{\frac{93.3}{2} \left( \frac{1}{5} + \frac{1}{5} \right)} = 16.285$$
4. Compare:
$|\bar{X}_1 - \bar{X}_2| = 23.2$
$|\bar{X}_1 - \bar{X}_3| = 43.4$
$|\bar{X}_2 - \bar{X}_3| = 20.2$
5. All of the absolute mean differences are greater than the critical range. Therefore there is a significant difference between each pair of means at the 5% level of significance.
Thus, with 95% confidence we can conclude that the mean distance for club 1 is greater than for clubs 2 and 3, and that the mean distance for club 2 is greater than for club 3.
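The comparison step can be written as a loop over all pairs of clubs. The sketch below is illustrative only; it takes the three club means and the critical range of 16.285 from the example as given inputs, and the dictionary name `significant` is a hypothetical choice.

```python
from itertools import combinations

# Sample means and critical range taken from the worked example.
means = {"Club 1": 249.2, "Club 2": 226.0, "Club 3": 205.8}
critical_range = 16.285

significant = {}
for a, b in combinations(means, 2):
    difference = abs(means[a] - means[b])
    # A pair differs significantly when its absolute difference exceeds the range.
    significant[(a, b)] = difference > critical_range
    print(a, "vs", b, round(difference, 1), significant[(a, b)])
```

All three pairs exceed the critical range, which is the same conclusion reached in step 5.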
The Randomized Block Design
Like One-Way ANOVA, we test for equal population means (for different factor levels, for example)...
...but we want to control for possible variation from a second factor (with two or more levels)
Levels of the secondary factor are called blocks
Partitioning the Variation
Total variation can now be split into three parts:
SST = SSA + SSBL + SSE
SST = Total variation
SSA = Among-Group variation
SSBL = Among-Block variation
SSE = Random variation
Sum of Squares for Blocking
SST = SSA + SSBL + SSE
$$SSBL = c \sum_{i=1}^{r} (\bar{X}_{i.} - \bar{\bar{X}})^2$$
Where:
c = number of groups
r = number of blocks
$\bar{X}_{i.}$ = mean of all values in block i
$\bar{\bar{X}}$ = grand mean (mean of all data values)
Partitioning the Variation
Total variation can now be split into three parts:
SST = SSA + SSBL + SSE
SST and SSA are computed as they were in One-Way ANOVA
SSE = SST - (SSA + SSBL)
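The three-part partition can be illustrated with a small table that has one observation per block and treatment. This is a sketch under stated assumptions: the function name is hypothetical and the 3-block by 2-treatment data are made up purely so the arithmetic is easy to follow; they do not come from the lecture.

```python
def randomized_block_sums_of_squares(table):
    """table[i][j] is the single observation for block i and treatment (group) j."""
    r, c = len(table), len(table[0])  # r blocks, c groups
    grand = sum(sum(row) for row in table) / (r * c)
    block_means = [sum(row) / c for row in table]
    group_means = [sum(row[j] for row in table) / r for j in range(c)]

    sst = sum((x - grand) ** 2 for row in table for x in row)
    ssa = r * sum((m - grand) ** 2 for m in group_means)   # each group has r values
    ssbl = c * sum((m - grand) ** 2 for m in block_means)  # each block has c values
    sse = sst - (ssa + ssbl)
    return sst, ssa, ssbl, sse


# Made-up data: rows are blocks, columns are treatments.
table = [
    [1, 3],
    [2, 6],
    [3, 9],
]
sst, ssa, ssbl, sse = randomized_block_sums_of_squares(table)
print(sst, ssa, ssbl, sse)
```

For this made-up table the grand mean is 4, so SST = 44, SSA = 24, SSBL = 16 and the remaining random variation is SSE = 4.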
Mean Squares
$$MSBL = \text{Mean square blocking} = \frac{SSBL}{r - 1}$$
$$MSA = \text{Mean square among groups} = \frac{SSA}{c - 1}$$
$$MSE = \text{Mean square error} = \frac{SSE}{(r - 1)(c - 1)}$$
Randomized Block ANOVA Table
Source of Variation   df               SS     MS     F ratio
Among Treatments      c - 1            SSA    MSA    MSA/MSE
Among Blocks          r - 1            SSBL   MSBL   MSBL/MSE
Error                 (r - 1)(c - 1)   SSE    MSE
Total                 rc - 1           SST
c = number of populations
r = number of blocks
rc = sum of the sample sizes from all populations
df = degrees of freedom
Blocking Test
$H_0: \mu_{1.} = \mu_{2.} = \mu_{3.} = \cdots$
$H_1$: Not all block means are equal
$$F = \frac{MSBL}{MSE}$$
Blocking test: $df_1 = r - 1$, $df_2 = (r - 1)(c - 1)$
Reject $H_0$ if $F > F_U$
Main Factor Test
$H_0: \mu_{.1} = \mu_{.2} = \mu_{.3} = \cdots = \mu_{.c}$
$H_1$: Not all population means are equal
$$F = \frac{MSA}{MSE}$$
Main Factor test: $df_1 = c - 1$, $df_2 = (r - 1)(c - 1)$
Reject $H_0$ if $F > F_U$
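Both F ratios share the same error mean square in the denominator, which a short sketch makes explicit. The function name and the input sums of squares are hypothetical illustration values (3 blocks, 2 treatments), not results from the lecture.

```python
def randomized_block_f_ratios(ssa, ssbl, sse, r, c):
    """Mean squares and F ratios for a randomized block design (r blocks, c groups)."""
    msa = ssa / (c - 1)
    msbl = ssbl / (r - 1)
    mse = sse / ((r - 1) * (c - 1))
    return {
        "MSA": msa,
        "MSBL": msbl,
        "MSE": mse,
        "F_treatments": msa / mse,  # main factor test, df1 = c - 1
        "F_blocks": msbl / mse,     # blocking test, df1 = r - 1
    }


# Made-up sums of squares for illustration only.
result = randomized_block_f_ratios(ssa=24.0, ssbl=16.0, sse=4.0, r=3, c=2)
print(result)
```

Each F ratio would then be compared with its own critical value $F_U$, since the numerator degrees of freedom differ between the two tests.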
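The Tukey Procedure
To test which population means are significantly different
e.g.: $\mu_1 = \mu_2 \neq \mu_3$
Done after rejection of equal means in randomized block ANOVA design
Allows pairwise comparisons
Compare absolute mean differences with critical range:
$|\bar{X}_{.1} - \bar{X}_{.2}|$, $|\bar{X}_{.1} - \bar{X}_{.3}|$, $|\bar{X}_{.2} - \bar{X}_{.3}|$, etc.
[Figure: distributions along the x axis with $\mu_1 = \mu_2$ coinciding and $\mu_3$ located apart from them]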
Two Factor ANOVA Equations
Total Variation:
$$SST = \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{n'} (X_{ijk} - \bar{\bar{X}})^2$$
Factor A Variation:
$$SSA = c\,n' \sum_{i=1}^{r} (\bar{X}_{i..} - \bar{\bar{X}})^2$$
Factor B Variation:
$$SSB = r\,n' \sum_{j=1}^{c} (\bar{X}_{.j.} - \bar{\bar{X}})^2$$
Interaction Variation:
$$SSAB = n' \sum_{i=1}^{r} \sum_{j=1}^{c} (\bar{X}_{ij.} - \bar{X}_{i..} - \bar{X}_{.j.} + \bar{\bar{X}})^2$$
Sum of Squares Error:
$$SSE = \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{n'} (X_{ijk} - \bar{X}_{ij.})^2$$
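The five two-factor sums of squares can be computed from a nested list of cell replicates. The following Python sketch is an added illustration: the function name is hypothetical, and the 2 x 2 layout with two replicates per cell uses made-up numbers chosen so that every mean is easy to verify by hand.

```python
def mean(values):
    return sum(values) / len(values)


def two_way_sums_of_squares(cells):
    """cells[i][j] lists the n' replicates for level i of factor A and level j of B."""
    r, c, n_rep = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean([x for row in cells for cell in row for x in cell])
    a_means = [mean([x for cell in row for x in cell]) for row in cells]
    b_means = [mean([x for row in cells for x in row[j]]) for j in range(c)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    sst = sum((x - grand) ** 2 for row in cells for cell in row for x in cell)
    ssa = c * n_rep * sum((m - grand) ** 2 for m in a_means)
    ssb = r * n_rep * sum((m - grand) ** 2 for m in b_means)
    ssab = n_rep * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(r) for j in range(c)
    )
    sse = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(r) for j in range(c) for x in cells[i][j]
    )
    return {"SST": sst, "SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse}


# Made-up data: 2 levels of A (rows), 2 levels of B (columns), 2 replicates per cell.
cells = [
    [[1, 3], [4, 6]],
    [[5, 7], [12, 14]],
]
ss = two_way_sums_of_squares(cells)
print(ss)
```

For these made-up values SSA = 72, SSB = 50, SSAB = 8 and SSE = 8, which add up to SST = 138.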
Two Factor ANOVA Equations
where:
$$\bar{\bar{X}} = \frac{\sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{n'} X_{ijk}}{r c n'}$$
= Grand Mean
$$\bar{X}_{i..} = \frac{\sum_{j=1}^{c} \sum_{k=1}^{n'} X_{ijk}}{c n'}$$
= Mean of i-th level of factor A (i = 1, 2, ..., r)
$$\bar{X}_{.j.} = \frac{\sum_{i=1}^{r} \sum_{k=1}^{n'} X_{ijk}}{r n'}$$
= Mean of j-th level of factor B (j = 1, 2, ..., c)
$$\bar{X}_{ij.} = \frac{\sum_{k=1}^{n'} X_{ijk}}{n'}$$
= Mean of cell ij
r = number of levels of factor A
c = number of levels of factor B
n' = number of replications in each cell
Mean Square Calculations
$$MSA = \text{Mean square factor A} = \frac{SSA}{r - 1}$$
$$MSB = \text{Mean square factor B} = \frac{SSB}{c - 1}$$
$$MSAB = \text{Mean square interaction} = \frac{SSAB}{(r - 1)(c - 1)}$$
$$MSE = \text{Mean square error} = \frac{SSE}{rc(n' - 1)}$$
Two-Way ANOVA: The F Test Statistic
F Test for Factor A Effect
$H_0: \mu_{1..} = \mu_{2..} = \mu_{3..} = \cdots$
$H_1$: Not all $\mu_{i..}$ are equal
$$F = \frac{MSA}{MSE}$$
Reject $H_0$ if $F > F_U$
F Test for Factor B Effect
$H_0: \mu_{.1.} = \mu_{.2.} = \mu_{.3.} = \cdots$
$H_1$: Not all $\mu_{.j.}$ are equal
$$F = \frac{MSB}{MSE}$$
Reject $H_0$ if $F > F_U$
F Test for Interaction Effect
$H_0$: the interaction of A and B is equal to zero
$H_1$: interaction of A and B is not zero
$$F = \frac{MSAB}{MSE}$$
Reject $H_0$ if $F > F_U$
Two-Way ANOVA Summary Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares                    F Statistic
Factor A              SSA              r - 1                MSA = SSA/(r - 1)               MSA/MSE
Factor B              SSB              c - 1                MSB = SSB/(c - 1)               MSB/MSE
AB (Interaction)      SSAB             (r - 1)(c - 1)       MSAB = SSAB/[(r - 1)(c - 1)]    MSAB/MSE
Error                 SSE              rc(n' - 1)           MSE = SSE/[rc(n' - 1)]
Total                 SST              n - 1
n' = number of replications in each cell; n = rcn' = total number of observations
Features of Two-Way ANOVA F Test
Degrees of freedom always add up
n - 1 = rc(n' - 1) + (r - 1) + (c - 1) + (r - 1)(c - 1), where n = rcn' is the total number of observations
Total = error + factor A + factor B + interaction
The denominator of the F Test is always the same (MSE) but the numerator is different
The sums of squares always add up
SST = SSE + SSA + SSB + SSAB
Total = error + factor A + factor B + interaction
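These features can be checked mechanically when building the summary table. The sketch below is illustrative only: the function name `two_way_anova_table` is hypothetical and the sums of squares passed in are made-up numbers for a 2 x 2 design with two replications per cell.

```python
def two_way_anova_table(ssa, ssb, ssab, sse, r, c, n_rep):
    """Return degrees of freedom, mean squares, F ratios and totals."""
    df = {"A": r - 1, "B": c - 1, "AB": (r - 1) * (c - 1), "Error": r * c * (n_rep - 1)}
    ss = {"A": ssa, "B": ssb, "AB": ssab, "Error": sse}
    ms = {source: ss[source] / df[source] for source in df}
    # Every F ratio uses the same denominator, the error mean square.
    f = {source: ms[source] / ms["Error"] for source in ("A", "B", "AB")}
    total = {"SS": sum(ss.values()), "df": sum(df.values())}
    return df, ms, f, total


# Made-up inputs for illustration only.
df, ms, f, total = two_way_anova_table(72.0, 50.0, 8.0, 8.0, r=2, c=2, n_rep=2)
print(df, ms, f, total)
```

Here the component degrees of freedom sum to 7, which equals rcn' - 1 = 8 - 1, and the component sums of squares sum to the total of 138.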
Examples: Interaction vs. No Interaction
[Figure, no interaction: Mean Response (vertical axis) against Factor A Levels (horizontal axis), with one line each for Factor B Level 1, Level 2 and Level 3]
[Figure, interaction is present: Mean Response (vertical axis) against Factor A Levels (horizontal axis), with one line each for Factor B Level 1, Level 2 and Level 3]
Multiple Comparisons: The Tukey Procedure
Unless there is a significant interaction, you can determine the levels that are significantly different using the Tukey procedure
Consider all absolute mean differences and compare to the calculated critical range
Example: Absolute differences for factor A, assuming three levels:
$|\bar{X}_{1..} - \bar{X}_{2..}|$
$|\bar{X}_{1..} - \bar{X}_{3..}|$
$|\bar{X}_{2..} - \bar{X}_{3..}|$
Multiple Comparisons: The Tukey Procedure
Critical Range for Factor A:
$$\text{Critical Range} = Q_U \sqrt{\frac{MSE}{c\,n'}}$$
(where $Q_U$ is from Table E.10 with r and rc(n' - 1) d.f.)
Critical Range for Factor B:
$$\text{Critical Range} = Q_U \sqrt{\frac{MSE}{r\,n'}}$$
(where $Q_U$ is from Table E.10 with c and rc(n' - 1) d.f.)
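The two critical ranges differ only in which factor's level count appears in the denominator. The sketch below uses hypothetical function names and made-up inputs; in particular, the `q_u` value of 3.0 is a placeholder and not a value read from the Studentized range table.

```python
import math


def critical_range_factor_a(q_u, mse, c, n_rep):
    """Each factor A level mean averages c * n' observations."""
    return q_u * math.sqrt(mse / (c * n_rep))


def critical_range_factor_b(q_u, mse, r, n_rep):
    """Each factor B level mean averages r * n' observations."""
    return q_u * math.sqrt(mse / (r * n_rep))


# Made-up inputs for illustration: q_u is a placeholder, not a table value.
q_u, mse = 3.0, 16.0
range_a = critical_range_factor_a(q_u, mse, c=2, n_rep=2)
range_b = critical_range_factor_b(q_u, mse, r=4, n_rep=2)
print(range_a, range_b)
```

In practice $Q_U$ would be looked up separately for each factor, because the first degrees-of-freedom argument is r for factor A and c for factor B.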
Chapter Summary
Described one-way analysis of variance
The logic of ANOVA
ANOVA assumptions
F test for difference in c means
The Tukey-Kramer procedure for multiple comparisons
Considered the Randomized Block Design
Treatment and Block Effects
Multiple Comparisons: Tukey Procedure
Described two-way analysis of variance
Examined effects of multiple factors
Examined interaction between factors