Analysis of Variance

with two factors


(Session 13)
SADC Course in Statistics

Learning Objectives
At the end of this session, you will be able to:
- understand and interpret the components of a linear model with two categorical factors
- fit a model involving two factors, interpret the output and present the results
- understand the difference between raw means and adjusted means
- appreciate that a residual analysis is the same with more complex models
To put your footer here go to View > Header and Footer

Using Paddy again!


In the paddy example, there were two
categorical factors, variety and village.
Here we will look at a model including both
factors and the corresponding output.
We will also discuss assumptions associated
with anova models with categorical factors and
procedures to check these assumptions.


A model using two factors


The objective here is to compare paddy yields
across the 3 varieties and also across villages.
A linear model for this takes the form:

$y_{ij} = \beta_0 + v_i + g_j + \varepsilon_{ij}$

Here $\beta_0$ represents a constant, and the $g_j$ ($j = 1, 2, 3$)
represent the variety effect as before.
We also have the term $v_i$ ($i = 1, 2, 3, 4$) to represent the
village effect.

Anova results
Source      d.f.     S.S.      M.S.       F     Prob.
Village        3    13.91      4.64    14.0     0.000
Variety        2    25.68     12.84    38.7     0.000
Residual      30     9.95    0.3318
Total         35    49.55

The above is a two-way anova, since there are
two factors explaining the variability in
paddy yields.
Again, the Residual M.S. ($s^2$) = 0.3318
describes the variation not explained by
village and variety.
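
The entries of the anova table are linked by simple arithmetic, which a short Python sketch can make explicit. It assumes 3 d.f. for village and 2 d.f. for variety (one fewer than the 4 villages and 3 varieties); the variable names are illustrative only.

```python
# Sequential sums of squares and degrees of freedom from the anova table.
# Assumption: d.f. of 3 (village) and 2 (variety), from 4 villages and 3 varieties.
anova = {
    "Village": {"df": 3, "ss": 13.91},
    "Variety": {"df": 2, "ss": 25.68},
    "Residual": {"df": 30, "ss": 9.95},
}

# Mean square = sum of squares / degrees of freedom.
mean_squares = {name: row["ss"] / row["df"] for name, row in anova.items()}

# F ratio = mean square of the term / residual mean square.
residual_ms = mean_squares["Residual"]
f_ratios = {
    name: mean_squares[name] / residual_ms for name in ("Village", "Variety")
}

for name in ("Village", "Variety"):
    print(f"{name}: M.S. = {mean_squares[name]:.2f}, F = {f_ratios[name]:.1f}")
print(f"Residual M.S. = {residual_ms:.4f}")
```

Because the inputs are rounded, the residual M.S. agrees with 0.3318 only to about three decimal places.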

Sample sizes
--------+-----------------------+------
        |        Variety        |
Village |   New    Old    Trad  | Total
--------+-----------------------+------
KESEN   |     0      3      4   |     7
NANDA   |     2      7      5   |    14
NIKO    |     0      2      3   |     5
SABEY   |     2      5      3   |    10
--------+-----------------------+------
Total   |     4     17     15   |    36
--------+-----------------------+------

The table above shows that the data are not balanced. Hence
we need to worry about the order of fitting
terms. How then should we interpret the
sequential S.S.s shown in the anova on slide 5?
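
The lack of balance can also be checked mechanically from the cell counts. The following is a minimal sketch using the counts in the sample-size table; the variable names are illustrative, and "balanced" is taken here to mean that every village by variety cell has the same number of observations.

```python
# Cell counts from the sample-size table: rows are villages, columns are varieties.
varieties = ["New", "Old", "Trad"]
counts = {
    "KESEN": [0, 3, 4],
    "NANDA": [2, 7, 5],
    "NIKO": [0, 2, 3],
    "SABEY": [2, 5, 3],
}

# Marginal totals per village and per variety.
village_totals = {village: sum(row) for village, row in counts.items()}
variety_totals = {
    variety: sum(row[k] for row in counts.values())
    for k, variety in enumerate(varieties)
}

# Balanced means every cell has the same number of observations.
cell_sizes = {n for row in counts.values() for n in row}
is_balanced = len(cell_sizes) == 1

# Cells with no observations at all.
empty_cells = [
    (village, varieties[k])
    for village, row in counts.items()
    for k, n in enumerate(row)
    if n == 0
]

print("Balanced:", is_balanced)
print("Empty cells:", empty_cells)
```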

Anova with adjusted SS and MS


Source      d.f.   Adj.S.S.   Adj.M.S.      F     Prob.
Village        3       4.32       1.44    4.34    0.012
Variety        2      25.68      12.84    38.7    0.000
Residual      30       9.95     0.3318
Total         35      49.55

How may the above results be interpreted?


What are your conclusions?
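
One way to see the effect of fitting order is through the arithmetic of the sums of squares. The sketch below assumes the usual decomposition in which the sequential S.S. of the two terms add up to the model S.S. (total minus residual), whichever term is fitted first. The variable names are illustrative.

```python
# Values taken from the two anova tables (rounded as shown there).
total_ss = 49.55
residual_ss = 9.95
village_first_ss = 13.91      # village fitted first (sequential)
village_adjusted_ss = 4.32    # village fitted after variety (adjusted)
variety_adjusted_ss = 25.68   # variety fitted after village

# S.S. explained by the model containing both factors.
model_ss = total_ss - residual_ss

# If variety were fitted first, its sequential S.S. would be what is left
# of the model S.S. after removing the adjusted S.S. for village.
variety_first_ss = model_ss - village_adjusted_ss

print(f"Model S.S.                  = {model_ss:.2f}")
print(f"Village first + variety adj = {village_first_ss + variety_adjusted_ss:.2f}")
print(f"Variety fitted first (S.S.) = {variety_first_ss:.2f}")
```

The village S.S. drops from 13.91 to 4.32 once variety is already in the model, which suggests that part of the apparent village variation reflects which varieties were grown in which villages.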


Model estimates
Parameter             Coeff.   Std.error       t     prob
$\beta_0$: constant    5.284       0.386    13.7    0.000
v2 (Nanda)             0.718       0.272    2.63    0.013
v3 (Niko)             -0.179       0.337   -0.53    0.599
v4 (Sabey)             0.633       0.294    2.16    0.039
g2 (old)              -1.201       0.327   -3.67    0.001
g3 (trad)             -2.614       0.340   -7.68    0.000

What do these results tell us?
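
As a rough illustration of how the estimates combine, the sketch below computes the fitted mean yield for each village and variety combination. It assumes that Kesen and the new variety are the base levels (effects set to zero), as the table layout suggests; the function name `fitted_yield` is hypothetical.

```python
# Estimates from the model-estimates table.
# Assumption: the base levels (Kesen, New) have effect zero.
constant = 5.284
village_effect = {"Kesen": 0.0, "Nanda": 0.718, "Niko": -0.179, "Sabey": 0.633}
variety_effect = {"New": 0.0, "Old": -1.201, "Trad": -2.614}


def fitted_yield(village: str, variety: str) -> float:
    """Fitted mean yield for one village and variety combination."""
    return constant + village_effect[village] + variety_effect[variety]


# Print one row per village, one column per variety (New, Old, Trad).
for village in village_effect:
    row = "  ".join(f"{fitted_yield(village, v):6.3f}" for v in variety_effect)
    print(f"{village:<6} {row}")
```

Within any one village, the fitted difference between two varieties is the same, because the model has no village by variety interaction term.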


Relating estimates to means
Again:

Old - New  = -1.201 = Estimate of $g_2$
Trad - New = -2.614 = Estimate of $g_3$

This is similar to the case with one categorical
factor: comparisons with the base level can be made
easily using model estimates.
But when sample sizes are unequal across the
two categorical factors, results should be
reported in terms of adjusted means!


Raw means and adjusted means

Variety         Sample size (n)   Raw means   Std.error (s.d./√n)
New improved                  4        5.96                 0.128
Old improved                 17        4.54                 0.173
Traditional                  15        3.00                 0.168

Model based summaries (adjusted means):

Variety         Adjusted means   Std.error (s/√n)
New improved              5.58              0.308
Old improved              4.38              0.148
Traditional               2.96              0.150


Computing adjusted means


The model equation
$y_{ij} = \beta_0 + v_i + g_j + \varepsilon_{ij}$
can be used to find the variety adjusted means,
e.g. the adjusted mean for the traditional variety is:

$\beta_0 + \frac{v_1 + v_2 + v_3 + v_4}{4} + g_3$
$= 5.284 + 0.25\,[0 + 0.718 - 0.179 + 0.633] - 2.614$
$= 2.963$
Thus the variety adjusted mean is an average
over the 4 villages.
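
The same calculation can be repeated for all three varieties. The following is a minimal sketch using the estimates from the fitted model, with village 1 and the new variety taken as base levels (effect zero); the variable names are illustrative.

```python
# Estimates from the fitted model; base levels (village 1, new variety) are zero.
constant = 5.284
village_effects = [0.0, 0.718, -0.179, 0.633]   # v1..v4
variety_effects = {"New": 0.0, "Old": -1.201, "Trad": -2.614}

# Average village effect, giving each of the 4 villages equal weight.
mean_village_effect = sum(village_effects) / len(village_effects)

# Adjusted mean for a variety = constant + average village effect + variety effect.
adjusted_means = {
    variety: constant + mean_village_effect + effect
    for variety, effect in variety_effects.items()
}

for variety, mean in adjusted_means.items():
    print(f"{variety:<5} {mean:.3f}")
```

Each village gets equal weight in the average, regardless of how many plots it contributed, which is why the adjusted means differ from the raw means when the data are unbalanced.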

Checking model assumptions
The anova model with two categorical factors is:
$y_{ij} = \beta_0 + v_i + g_j + \varepsilon_{ij}$
Model assumptions are associated with the $\varepsilon_{ij}$.
These are checked in exactly the same way as
before.
A residual analysis is done, looking at plots of
residuals in various ways.
We give below a residual analysis for the model
fitted above.

Histogram to check normality

Histogram of standardised residuals after
fitting a model of yield on village and variety.

[Figure: histogram; x-axis: Standardized residuals, y-axis: Density]


A normal probability plot


Another check on the normality assumption

[Figure: normal probability plot; x-axis: Inverse Normal, y-axis: Standardized residuals]

Do you think the points follow a straight line?
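
For readers unsure what the "Inverse Normal" axis represents, the sketch below computes the coordinates of a normal probability plot using only the Python standard library. The residual values are hypothetical, made up purely for illustration (they are not the paddy residuals), and the plotting positions (i - 0.5)/n are one common convention among several.

```python
from statistics import NormalDist

# Hypothetical standardized residuals, made up for illustration only
# (these are NOT the residuals from the paddy model).
residuals = [-1.8, -0.9, -0.4, -0.1, 0.2, 0.5, 0.8, 1.3]

n = len(residuals)
ordered = sorted(residuals)

# Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
positions = [(i - 0.5) / n for i in range(1, n + 1)]

# Inverse normal: the standard normal quantile at each plotting position.
inverse_normal = [NormalDist().inv_cdf(p) for p in positions]

# Each pair is one point on the plot: (inverse normal, ordered residual).
for q, r in zip(inverse_normal, ordered):
    print(f"{q:7.3f}  {r:5.2f}")
```

If the residuals are roughly normal, these points should lie close to a straight line.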


Std. residuals versus fitted values


Checking the assumption of variance
homogeneity, and identification of outliers:

[Figure: scatter plot; x-axis: Fitted values, y-axis: Standardized residuals]

What can you say here about the variance homogeneity assumption?


Finally, know your software


Different software packages impose different
constraints on model parameters, so you need to
be aware of which constraint is in use.
For example, Stata and Genstat set the first
level of the factor to zero. SPSS and SAS set
the last level to zero. Minitab imposes a
constraint that sets the sum of the parameter
estimates to zero!
Check also whether the software produces
sequential or adjusted or some other form of
sums of squares. The correct interpretation of
anova results would depend on this.
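
To illustrate why the constraint matters for reading parameter estimates but not for the fitted model, the sketch below re-expresses the first-level-zero paddy estimates under a sum-to-zero constraint. This is a hand calculation under stated assumptions (centring each set of effects and moving the averages into the constant); it is not claimed to reproduce the exact output of any particular package.

```python
# First-level-zero estimates from the fitted paddy model.
constant = 5.284
village = [0.0, 0.718, -0.179, 0.633]   # v1..v4
variety = [0.0, -1.201, -2.614]         # g1..g3

# Re-express under a sum-to-zero constraint: centre each set of effects
# and absorb the two averages into the constant.
village_mean = sum(village) / len(village)
variety_mean = sum(variety) / len(variety)
constant_stz = constant + village_mean + variety_mean
village_stz = [v - village_mean for v in village]
variety_stz = [g - variety_mean for g in variety]

# The fitted value for every village/variety cell is the same either way.
for i in range(len(village)):
    for j in range(len(variety)):
        first_zero = constant + village[i] + variety[j]
        sum_zero = constant_stz + village_stz[i] + variety_stz[j]
        assert abs(first_zero - sum_zero) < 1e-9

print(f"Sum-to-zero constant: {constant_stz:.3f}")
print("Village effects:", [round(v, 3) for v in village_stz])
print("Variety effects:", [round(g, 3) for g in variety_stz])
```

The individual estimates change with the constraint, but differences between levels and all fitted values stay the same.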

Practical work follows to ensure that the
learning objectives are achieved
