
Chapter 9

Advanced Topics in ANOVA

Page
Unbalanced ANOVA designs
1. Why is the design unbalanced? 9-2
2. What happens with unbalanced designs? 9-3
3. An introduction to the problem 9-5
4. Types of sums of squares 9-10
5. An example 9-15

ANOVA designs with random effects


6. Fixed effects vs. random effects 9-22
7. Model II: One-factor random effects model 9-24
8. Model II: Two-factor random effects model 9-30
9. Model III: Two-factor mixed effects model 9-35
10. Contrasts and post-hoc tests 9-41
11. Effect sizes 9-41
12. Final considerations about random effects 9-42

ANOVA designs with nested effects


13. An introduction to nested designs 9-43
14. Structural models for nested designs 9-45
15. Testing nested effects 9-46
16. Final considerations about nested designs 9-52

ANOVA designs with randomized blocks


17. The logic of blocked designs 9-53
18. Examples of randomized block designs 9-55
19. Final consideration about blocked designs 9-69

9-1 2006 A. Karpinski


Advanced Topics in ANOVA:
Unbalanced ANOVA designs

1. Why is the design unbalanced?

Random factors
o The unequal cell sizes are randomly unequal
o The process leading to the missingness is independent of the levels of the
independent variable
Scheduling problems
Computer errors

                     IV A
IV B        Level 1    Level 2    Level 3
Level 1     n11 = 15   n21 = 10   n31 = 20     45
Level 2     n12 = 20   n22 = 20   n32 = 15     55
            35         30         35          100

                     IV A
IV B        Level 1    Level 2    Level 3
Level 1     n11 = 4    n21 = 7    n31 = 3      14
Level 2     n12 = 4    n22 = 3    n32 = 6      13
Level 3     n13 = 5    n23 = 4    n33 = 5      14
            13         14         14           41

Systematic factors
o The unequal cell sizes are directly or indirectly related to the levels of the
independent variables
A treatment is painful/ineffective
High prejudice individuals refuse to answer questions regarding
attitudes toward ethnic groups

                     IV A
IV B        Level 1    Level 2    Level 3
Level 1     n11 = 40   n21 = 40   n31 = 50    130
Level 2     n12 = 20   n22 = 20   n32 = 30     70
            60         60         80          200

                     IV A
IV B        Level 1    Level 2    Level 3
Level 1     n11 = 3    n21 = 6    n31 = 9      18
Level 2     n12 = 2    n22 = 6    n32 = 9      17
Level 3     n13 = 4    n23 = 8    n33 = 13     25
            9          20         31           60



Missing observations due to systematic factors are a serious problem.
Analyzing such data can lead to very biased results.

All of the methods we discuss for analyzing unbalanced designs assume the
cell sizes are either a result of:
o Random factors
o Real differences in the population

2. What happens with unbalanced designs?

Recall that two contrasts

ψ1 = (a1, a2, a3, ..., aa)
ψ2 = (b1, b2, b3, ..., ba)

are orthogonal for unequal n if

   Σ (aj·bj)/nj = 0     or     a1b1/n1 + a2b2/n2 + ... + aaba/na = 0
  j=1..a

In general the tests for main effects and interactions are no longer orthogonal
for unbalanced designs.
Because of this non-orthogonality, the sums of squares will not nicely
partition.
SSA + SSB + SSAB ≠ SSModel
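The orthogonality criterion above is easy to check directly. A minimal Python sketch (the contrasts and cell sizes here are illustrative, not from the examples in these notes):

```python
def orthogonal(a, b, n, tol=1e-12):
    """Check whether contrasts a and b are orthogonal given cell sizes n.

    With equal n this reduces to sum(a_i * b_i) == 0; with unequal n the
    criterion is sum(a_i * b_i / n_i) == 0.
    """
    return abs(sum(ai * bi / ni for ai, bi, ni in zip(a, b, n))) < tol

a = [1, -1, 0]
b = [1, 1, -2]
print(orthogonal(a, b, [10, 10, 10]))  # True: orthogonal under equal n
print(orthogonal(a, b, [15, 10, 20]))  # False: unequal n breaks orthogonality
```

The same pair of contrasts passes with equal cell sizes and fails once the n's differ, which is exactly why the sums of squares stop partitioning cleanly.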

As a result:
o The tests for the main effects and interactions are not independent of each
other.
o Single degree of freedom contrasts may not be combined into a
simultaneous test.

The most popular method for dealing with these issues is to use different
methods of computing the sums of squares for each effect.

These different methods of computing sums of squares DO NOT affect:


i. The error term (MSW)
ii. The test of the highest order interaction



Three possible approaches to unequal cell sizes (assuming data are missing
completely at random)

o Add observations to make the design balanced


This solution may not be pragmatic
It may also present problems regarding random assignment in a true
experiment

o Delete observations to make design balanced


While an unbalanced design is less powerful than a balanced design,
you ALWAYS lose power by tossing observations
There is not a good method for deciding whom to toss. (If you use a
random process, then a different person using the same algorithm may
come to different conclusions. If you use a systematic process, then
you may bias your results.)
I recommend that you NEVER delete an observation to make a design
balanced.

o Impute the missing data


A topic too advanced for this course!

o Conduct analysis on an unbalanced design



3. An introduction to the problem of unbalanced designs

Balanced, orthogonal designs


o For balanced designs, the SS partition is complete and each component's
contribution to the total SS is unique.

[Venn diagram: SSA, SSB, and SSAB shown as three non-overlapping circles]

Unbalanced, non-orthogonal designs


o For unbalanced designs, the SS are not necessarily unique to each
component
o These figures are just heuristics. With real data, it is possible to have
negative "overlapping" areas.

[Venn diagram: SSA, SSB, and SSAB shown as three overlapping circles]



Approach #1: Only count the unique contribution of each factor
o This approach is known as the Unique SS or Type III SS approach

[Venn diagram: only the unique, non-overlapping portion of each of SSA, SSB, and SSAB is counted]

Approach #2: Start with only the main effects. Use a unique SS approach to
divide the main effect sums of squares. Then, add the next highest order
effects. For the remaining SS, use the unique approach to divide the SS.
Continue until all effects have been added.
o This approach is known as using Type II SS

[Venn diagram: SSA and SSB split their overlap first; SSAB then receives only its remaining, non-redundant SS]



Approach #3: Start with only the main effects. Determine an order of
importance. Give the most important effect all its SS. For next effect, give
the effect its entire remaining SS. Continue until all main effects are used.
Next consider the two-way interactions, and determine an order of
importance and repeat the process. Continue until all effects have been
considered.
o This approach is known as the hierarchical or Type I SS approach.

[Venn diagrams: when Factor A is entered first, the SSA/SSB overlap is credited to SSA; when Factor B is entered first, it is credited to SSB]



The problem of unequal sample sizes occurs when we collapse across cells
to look at the marginal means. There are different ways to collapse to obtain
the marginal means for the main effects, and each gives a different answer.

(The MSW and the highest order interaction are unaffected by these
different methods because they do not average across any cells; they say
something about individual cells.)

An example: Salary data for female and male employees

                    Female                          Male
              College      No College        College      No College
              Degree       Degree            Degree       Degree
                24           15                25           19
                26           17                29           18
                25           20                27           21
                24           16                              20
                27                                           21
                24                                           22
                27                                           19
                23
Mean            25           17                27           20
Sample Size      8            4                 3            7

                                Gender
                          Female          Male
Education
  College Degree          n11 = 8         n21 = 3
                          X̄11 = 25        X̄21 = 27
  No College Degree       n12 = 4         n22 = 7
                          X̄12 = 17        X̄22 = 20



Question: Is there a difference in the salaries of men and women?

o Approach #1: Let's run a contrast comparing women's salary to men's
salary

Gender
Women Men
Education College Degree -1 1
No College Degree -1 1

Based on this approach, we conclude that men earn more than women!

Women earn $21,000: (25 + 17)/2 = 21
Men earn $23,500: (27 + 20)/2 = 23.5

o Approach #2: Ignore education level and compute marginal gender
means.

                     Gender
              Women            Men
              nF = 12          nM = 10
              X̄F = 22.33       X̄M = 22.10

Based on this approach, we look at the marginal means for gender and
conclude that women earn slightly more than men.

o Which answer is correct?



o It depends: each method answers a different question.
Method #2 asks: Are men paid a higher salary than women?
Method #1 asks: Within an education status, are men paid a higher
salary than women?

This discrepancy is known as Simpson's Paradox
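The two ways of collapsing can be computed side by side. A minimal Python sketch, using the cell means and cell sizes from the salary table above:

```python
# Cell means and ns from the salary example.
means = {('F', 'college'): 25, ('F', 'none'): 17,
         ('M', 'college'): 27, ('M', 'none'): 20}
ns    = {('F', 'college'): 8,  ('F', 'none'): 4,
         ('M', 'college'): 3,  ('M', 'none'): 7}

def unweighted(g):
    """Approach #1: each cell mean counts equally (Type III style)."""
    return (means[(g, 'college')] + means[(g, 'none')]) / 2

def weighted(g):
    """Approach #2: cells weighted by their sample sizes (Type I style)."""
    total_n = ns[(g, 'college')] + ns[(g, 'none')]
    return (means[(g, 'college')] * ns[(g, 'college')]
            + means[(g, 'none')] * ns[(g, 'none')]) / total_n

print(unweighted('F'), unweighted('M'))                    # 21.0 23.5 -> men earn more
print(round(weighted('F'), 2), round(weighted('M'), 2))    # 22.33 22.1 -> women earn slightly more
```

The same data yield opposite conclusions depending on how the cells are collapsed, which is the paradox in miniature.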

4. Types of Sums of Squares


I am going to focus on the use and interpretation of each type of sums of
squares and will ignore how to compute these SS. SPSS (or any statistical
software) can calculate each of the SS; if you want to see the computational
details, consult an advanced ANOVA book.

Type III / Unique SS or Regression SS


o In general, this is the best and most common approach to analysis
o For Type III SS, each cell mean is weighted equally when computing
marginal means. These cell means are called unweighted (because they are
considered equally, independent of the sample sizes).
o This approach leads to the identical results as converting the design to a
one-factor arrangement and using contrasts to test the main effects and
interactions.
o When the design is not orthogonal, the SS of each effect may sum to a
number greater than the total SS because of redundancy/overlap in SS.
For Type III SS, we only use the part of the SS that is unique to the factor
of interest.
(For those of you familiar with regression, Type III SS is equivalent to testing for
each effect after having previously controlled for/entered all other effects OR by
entering all effects simultaneously.)



o In our example, using Type III SS is equivalent to taking approach #1 to
the analysis.

Testing the main effect for gender using a Type III SS approach:

                                Gender
                          Women            Men
Education
  College Degree          X̄11 = 25         X̄21 = 27
                            -1                1
  No College Degree       X̄12 = 17         X̄22 = 20
                            -1                1

Main effect for gender:
Women earn $21,000: (25 + 17)/2 = 21
Men earn $23,500: (27 + 20)/2 = 23.5

How is the main effect for education tested?

In SPSS:
UNIANOVA dv BY gender edu
/METHOD = SSTYPE(3).
Tests of Between-Subjects Effects

Dependent Variable: DV
Type III Sum
Source of Squares df Mean Square F Sig.
Corrected Model 273.864a 3 91.288 32.864 .000
Intercept 9305.790 1 9305.790 3350.084 .000
GENDER 29.371 1 29.371 10.573 .004
EDU 264.336 1 264.336 95.161 .000
GENDER * EDU 1.175 1 1.175 .423 .524
Error 50.000 18 2.778
Total 11193.000 22
Corrected Total 323.864 21
a. R Squared = .846 (Adjusted R Squared = .820)

Main effect for gender such that men earn more than women,
F(1, 18) = 10.57, p = .004
Main effect for education such that college educated individuals earn
more than non-college educated individuals,
F(1, 18) = 95.16, p < .001
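The Type III SS for gender can be recovered by hand from the unweighted-means contrast. A Python sketch, using the cell means and cell sizes from the salary example (this reproduces the GENDER line of the SPSS table above):

```python
# Cells in order: F-college, F-none, M-college, M-none.
cell_means = [25, 17, 27, 20]
cell_ns    = [8, 4, 3, 7]
c          = [0.5, 0.5, -0.5, -0.5]   # women (unweighted) minus men (unweighted)

# Contrast value: psi-hat = sum(c_j * mean_j).
psi_hat = sum(ci * mi for ci, mi in zip(c, cell_means))

# SS for a contrast: psi-hat^2 / sum(c_j^2 / n_j).
ss_psi = psi_hat ** 2 / sum(ci ** 2 / ni for ci, ni in zip(c, cell_ns))
print(psi_hat, round(ss_psi, 3))  # -2.5 29.371
```

The contrast SS matches the Type III GENDER sum of squares (29.371) in the output, confirming that Type III SS weights each cell mean equally.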



Type I / Hierarchical SS
o For Type I SS, each cell mean is weighted by its cell size when
computing marginal means.
o The order in which the factors are entered into SPSS makes a difference in
how the SS are computed.
o When the design is not orthogonal, the SS of each effect may sum to a
number greater than the total SS because of redundancy/overlap in SS.
For Type I SS:
For the first factor listed, we use all the SS for that factor (unique and
redundant)
For the next factors, we use the entire SS that is not redundant with
the previous factors
(For those of you familiar with regression, Type I SS is equivalent to testing for
each effect by entering each effect one after the other)

o In our example, Type I SS (with gender listed first) is equivalent to
ignoring education level and using weighted marginal means:

                     Gender
              Women            Men
              nF = 12          nM = 10
              X̄F = 22.33       X̄M = 22.10

In SPSS:
UNIANOVA dv BY gender edu
/METHOD = SSTYPE(1).

Tests of Between-Subjects Effects

Dependent Variable: DV
Type I Sum
Source of Squares df Mean Square F Sig.
Corrected Model 273.864a 3 91.288 32.864 .000
Intercept 10869.136 1 10869.136 3912.889 .000
GENDER .297 1 .297 .107 .747
EDU 272.392 1 272.392 98.061 .000
GENDER * EDU 1.175 1 1.175 .423 .524
Error 50.000 18 2.778
Total 11193.000 22
Corrected Total 323.864 21
a. R Squared = .846 (Adjusted R Squared = .820)



UNIANOVA dv BY edu gender
/METHOD = SSTYPE(1).
Tests of Between-Subjects Effects

Dependent Variable: DV
Type I Sum
Source of Squares df Mean Square F Sig.
Corrected Model 273.864a 3 91.288 32.864 .000
Intercept 10869.136 1 10869.136 3912.889 .000
EDU 242.227 1 242.227 87.202 .000
GENDER 30.462 1 30.462 10.966 .004
EDU * GENDER 1.175 1 1.175 .423 .524
Error 50.000 18 2.778
Total 11193.000 22
Corrected Total 323.864 21
a. R Squared = .846 (Adjusted R Squared = .820)

                              Gender listed first          Edu listed first
Main effect for gender        F(1,18) = 0.11, p = .75      F(1,18) = 10.97, p = .004
Main effect for education     F(1,18) = 98.06, p < .001    F(1,18) = 87.20, p < .001
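The Type I SS for gender entered first is just the SS of the weighted marginal means. A Python sketch, using the raw salary scores from the table above:

```python
# Raw scores: female (college then no college), male (college then no college).
female = [24, 26, 25, 24, 27, 24, 27, 23] + [15, 17, 20, 16]
male   = [25, 29, 27] + [19, 18, 21, 20, 21, 22, 19]

grand = (sum(female) + sum(male)) / (len(female) + len(male))

# SS for gender ignoring education: sum of n_g * (weighted mean_g - grand)^2.
ss_gender_first = (len(female) * (sum(female) / len(female) - grand) ** 2
                   + len(male) * (sum(male) / len(male) - grand) ** 2)
print(round(ss_gender_first, 3))  # 0.297, the Type I GENDER SS when entered first
```

Because the weighted marginal means (22.33 vs. 22.10) are nearly equal, the gender-first Type I SS is tiny (.297), even though the Type III analysis finds a clear gender effect within education levels.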

Not surprisingly, there are additional types of sums of squares


o Type II SS
A compromise between Type I and Type III SS
o Type IV SS
Use when there are missing cells in the design of the experiment

Which SS are better?

o In general, you ran the design because you wanted to compare the cell
means. In this case, the unequal cell sizes are irrelevant and you should
use Type III SS
If we have an experimental design and the data are missing at random,
then there is no defensible reason for allowing cells with larger
numbers of observations to exert a greater influence on the analysis
For men and women with equal levels of education, do men and
women receive equal pay?
Type III SS also have the advantage of being the simplest to convert
to contrast coefficients



o If your design intentionally has unequal cell sizes (perhaps to reflect
differences in the composition of the population) and you want your
analyses to reflect this feature, then Type I SS may be more appropriate
Do men and women receive equal pay?

o This issue of which type of SS to use for unbalanced designs is still


controversial. Different texts and different authors offer different
recommendations. The important point is for you to think about what
question you are asking and which type of SS best answers that question.
You must decide this issue before you analyze your data, not after
examining the p-values!

Important points to remember


o Regardless of the type of SS used, the error term remains unchanged
o Any analysis that does not involve marginal means remains unchanged
The test of the highest order interaction is unchanged
Tests of cell mean contrasts are unchanged
o In most cases Type III SS seem to be the best because they take into
account information about all the factors
If important factors are omitted from the design, you may arrive at
erroneous conclusions. (In regression, this is known as the omitted
variable problem.)



5. An Example: Level of Management and Support of Affirmative Action

                                Management Level
          Middle-Management,  Upper-Management,  Middle-Management,  Upper-Management,
Gender    Minor Division      Minor Division     Major Division      Major Division      CEO
Female 21 25 29 31 30 25 22 35 25 27 36
26 24 23 28 31 30 30 27
Male 25 18 31 28 33 31 35 35 43 36 44 43
26 22 31 40 36 37 40 45 42
DV = Scores on an Affirmative Action Attitude Scale

Note that this design is rather odd: it is a 2*2*2 design with an extra 2 cells

                        Management Level
            Middle Management        Upper Management
            Minor        Major       Minor        Major
Gender      Division     Division    Division     Division        Gender    CEO
Female                                                            Female
Male                                                              Male

Rather than trying to analyze it as a 2*2*3 with two missing cells, it is much
easier to consider this design to be a 2*5 design. Using appropriate contrasts,
we can test
o Main effect of management level
o Main effect of division
o Management by division interaction
o Interactions between all these terms and gender

But we can also make comparisons between these groups and CEOs.

Using this approach, we can avoid designs with empty cells and the need to
learn about Type IV SS.



Your specific research questions were:
i. Do middle and upper management from minor divisions differ in their
support for AA?
ii. Do minor division managers differ from major division managers in
their support for AA?
iii. Do CEOs differ from other management in their support for AA?
iv. Do questions i. through iii. differ by gender?

First, let's look at the data:


[Boxplots of DV by management group (MM-Minor, UM-Minor, MM-Major, UM-Major, CEO), separately for females and males; N = 5, 3, 4, 4, 4, 4, 3, 5, 3, 5]

EXAMINE VARIABLES=dv BY group
  /PLOT NPPLOT.

Tests of Normality

Shapiro-Wilk
GROUP Statistic df Sig.
DV 1.00 .989 5 .977
2.00 .895 4 .405
3.00 .912 4 .492
4.00 1.000 3 1.000
5.00 .750 3 .000
6.00 .842 3 .220 Test of Homogeneity of Variances
7.00 .827 4 .161 DV
8.00 .971 4 .850 Levene
9.00 .887 5 .341 Statistic df1 df2 Sig.
10.00 .836 5 .154 .348 9 30 .950



Rather than running a traditional main effects and interaction analysis, let's
skip the omnibus tests and do a contrast-based test of the hypotheses.

o We should adopt a Type III SS approach: the variations in the cell sizes
appear to be random, and we are interested in the cell means.

o To conduct contrasts with a Type III SS approach, we need to consider
each cell mean equally, regardless of its sample size, but that is exactly
what we do when we use our standard tests for contrasts.

o However, remember that we cannot combine single degree of freedom
contrasts into a simultaneous omnibus test of a hypothesis.

Hypothesis 1
o Do middle and upper management in the minor divisions differ in their
support for AA?
o Does this level of support differ by gender?

Management Level
Gender MM, UM, MM, UM,
Minor Minor Major Major CEO
Hyp1: Female -1 1 0 0 0
Male -1 1 0 0 0
Hyp 1B: Female -1 1 0 0 0
Male 1 -1 0 0 0

ONEWAY dv by group
/cont = -1 1 0 0 0 -1 1 0 0 0
/cont = -1 1 0 0 0 1 -1 0 0 0.
Contrast Tests

Value of
Contrast Contrast Std. Error t df Sig. (2-tailed)
DV Hyp 1 8.0000 4.00638 1.997 30 .055
Hyp 1 * Gender -2.0000 4.00638 -.499 30 .621



In the minor divisions, we find that upper management is more
supportive of AA than middle management,
t(30) = 2.00, p = .06, ω² = .07.
This difference in support of AA does not vary by gender,
t(30) = -0.50, p = .62, ω² < .01

As an example of the effect size calculation, here are the omega
squared calculations for the test of Hypothesis 1:

Hypothesis 1: ψ̂1 = 8

SSψ1 = ψ̂1² / Σ(cj²/nj)
     = (8)² / [(-1)²/5 + (1)²/4 + 0 + 0 + 0 + (-1)²/3 + (1)²/4 + 0 + 0 + 0]
     = 64 / 1.033
     = 61.935

ω² = (SSψ - MSWithin) / (SSψ + (N - 1)·MSWithin)
   = (61.935 - 15.53) / (61.935 + (39)(15.53))
   = .0695

Hypothesis 2
o Do minor division managers differ from major division managers in their
support for AA?
o Does this level of support differ by gender?

Management Level
Gender MM, UM, MM, UM,
Minor Minor Major Major CEO
Hyp 2: Female -1 -1 1 1 0
Male -1 -1 1 1 0
Hyp 2B: Female -1 -1 1 1 0
Male 1 1 -1 -1 0

ONEWAY dv by group
/cont = -1 -1 1 1 0 -1 -1 1 1 0
/cont = -1 -1 1 1 0 1 1 -1 -1 0.
Contrast Tests

Value of
Contrast Contrast Std. Error t df Sig. (2-tailed)
DV Hyp 2 26.0000 5.66588 4.589 30 .000
Hyp 2 * Gender -18.0000 5.66588 -3.177 30 .003



We find a significant division by gender interaction,
t(30) = -3.18, p < .01, ω² = .19.

To understand this interaction, we must conduct simple effects tests:


ONEWAY dv by group
/cont = -1 -1 1 1 0 0 0 0 0 0
/cont = 0 0 0 0 0 -1 -1 1 1 0.
Contrast Tests

Value of
Contrast Contrast Std. Error t df Sig. (2-tailed)
DV Hyp 2 - Women only 4.0000 4.00638 .998 30 .326
Hyp 2 - Men only 22.0000 4.00638 5.491 30 .000

For women, we find no significant difference between major and
minor division management in their support for AA,
t(30) = 1.00, ns, ω² < .01.
For men, we find that managers in major divisions express more
support for AA than managers in minor divisions,
t(30) = 5.49, p < .05, ω² = .42.
(Use the Scheffé correction, t_crit = √(4·F(.05; 4, 30)) = 3.28, as the critical value)

Hypothesis 3
o Do CEOs differ from other management in their support for AA?
o Does this level of support differ by gender?

Management Level
Gender MM, UM, MM, UM, CEO
Minor Minor Major Major
Hyp 3: Female -1 -1 -1 -1 4
Male -1 -1 -1 -1 4
Hyp 3B: Female -1 -1 -1 -1 4
Male 1 1 1 1 -4

ONEWAY dv by group
/cont = -1 -1 -1 -1 4 -1 -1 -1 -1 4
/cont = -1 -1 -1 -1 4 1 1 1 1 -4.
Contrast Tests

Value of
Contrast Contrast Std. Error t df Sig. (2-tailed)
DV Hyp 3 54.0000 12.83173 4.208 30 .000
Hyp 3 * Gender -34.0000 12.83173 -2.650 30 .013



We find a significant level of management by gender interaction,
t(30) = -2.65, p = .01, ω² = .13.

To understand this interaction, we must conduct simple effects tests:


ONEWAY dv by group
/cont = -1 -1 -1 -1 4 0 0 0 0 0
/cont = 0 0 0 0 0 -1 -1 -1 -1 4.
Contrast Tests

Value of
Contrast Contrast Std. Error t df Sig. (2-tailed)
DV Hyp 3 - Women only 10.0000 9.94462 1.006 30 .323
Hyp 3 - Men only 44.0000 8.10912 5.426 30 .000

For women, we find no significant difference between management
and CEOs in their support for AA, t(30) = 1.01, ns, ω² < .01.
For men, we find that CEOs express more support for AA than other
managers, t(30) = 5.43, p < .05, ω² = .42

(Use the Scheffé correction, t_crit = √(4·F(.05; 4, 30)) = 3.28, as the critical value)

Note that for a contrast-based analysis, we are implicitly adopting a Type III
SS approach by weighting each cell mean equally. Single degree of
freedom tests of cell means are not affected by an unbalanced design
(However, we would not be able to combine single df tests into a
simultaneous test).



If we had taken a traditional approach, we would have used Type III SS for
our analysis because we assume that the data are missing at random and we
want to know if attitudes toward AA differ by gender within each
management position.

UNIANOVA dv BY gender manage


/METHOD = SSTYPE(3)
/PRINT = DESC.

Tests of Between-Subjects Effects

Dependent Variable: DV
Type III Sum
Source of Squares df Mean Square F Sig.
Corrected Model 1427.100a 9 158.567 10.208 .000
Intercept 36013.846 1 36013.846 2318.488 .000
GENDER 260.000 1 260.000 16.738 .000
MANAGE 687.429 4 171.857 11.064 .000
GENDER * MANAGE 268.351 4 67.088 4.319 .007
Error 466.000 30 15.533
Total 40706.000 40
Corrected Total 1893.100 39
a. R Squared = .754 (Adjusted R Squared = .680)

o We find a significant gender by management position interaction,
F(4, 30) = 4.32, p = .007

o We would be required to perform follow-up tests before interpreting the
main effects for gender and management.

[Line plot, "Attitude Toward Affirmative Action": mean Attitude by Gender (Female, Male) for each management position (MM-Minor, MM-Major, UM-Minor, UM-Major, CEO)]


ANOVA designs with random effects

6. Fixed effects vs. random effects

Model I: The fixed effects model

o A fixed effect is one in which the experimenter is only interested in the


levels of the IV that are included in the study
o In advance of the study, the experimenter decides to examine a relatively
small set of treatments. Each treatment of interest is included in the
study. The experimenter wishes to make inferences about those
treatments and no others.
o The effect is fixed in that if someone were to replicate the study, the
identical treatments would be used

o Example of a fixed effects model: An advertising company wants to


examine the effectiveness of five different billboards in both men and
women, and in White-Americans, Black-Americans, Asian-Americans,
and Hispanic Americans.

This design is a 5*2*4 between subjects, fixed effects ANOVA

Factor 1: Advertisement (5 different billboards)


Factor 2: Gender (Men and Women)
Factor 3: Ethnicity (4 ethnic groups)

Each of these factors is fixed. If the design were to be replicated, the


exact same ads, genders, and ethnicities would be used. The
experimenter wants to make inferences regarding only these ads,
genders, and ethnicities.

(The exact same participants would not be used; participants are
always a random effect)

Yijkl = μ + αj + βk + γl + (αβ)jk + (αγ)jl + (βγ)kl + (αβγ)jkl + εijkl



Model II: The random effects model

o A random effect is one in which the factor levels are randomly sampled
from a population. Inferences are made not only for the factor levels
included in the study, but to the entire population of factor levels.
o The effect is random in that if someone were to replicate the study, a
different set of treatments would be sampled from the population.

o Example of a random effects model: A company owns several hundred


retail stores throughout the country, and it wants to examine the
effectiveness of a new sales promotion. Five stores are randomly
sampled. The sales promotion is implemented in each store for a trial
period and then evaluated.

This design is a 1-factor between-subjects, random effect ANOVA

Factor 1: Store (5 stores)

The store factor is a random factor. If the design were to be


replicated, five different stores would be randomly sampled from the
population. The experimenter wants to make inferences regarding the
effectiveness of the sales promotion in all stores, not just the five
included in the study.

Model III: Mixed model

o A mixed model is a model containing at least one fixed effect and at least
one random effect
In psychology, many people refer to a design with at least one between-subjects
factor and at least one within-subjects factor as a "mixed design." Although this
terminology is common in psychology, it is inconsistent with the statistical usage
of the term. Consistent with the statistical usage, we will reserve the term "mixed
model" for a model with fixed and random factors



o Example of a mixed model: To investigate the effect of mental activity
on blood flow to the brain (BF), participants completed a math test, a
reading comprehension test, or a history task. The experimenter wanted
to generalize the results to a classroom setting, and reasoned that
different classrooms might have different effects on baseline BF. Thus,
six fifth grade classrooms were selected at random from the Philadelphia
public school system. The students in each class were randomly assigned
to the math test, the reading comprehension test, or the history test. Post-
test BF readings were taken on all participants.

This design is a 2-factor between-subjects, mixed model ANOVA

Factor 1: Test (Math, Reading Comprehension, or History)


Factor 2: Classroom (6 classrooms)

The test factor is a fixed factor. These three kinds of tasks are the
only tasks of interest to the experimenter. The classroom factor is a
random factor. If the design were to be replicated, six different
classrooms would be randomly sampled from the population.

The key idea of the random effects model is that you not only take into
account random noise, σε², you also take into account the variability due to
the sampling of the factor levels, σα²

7. Model II: One-factor random effects model

Lets consider the sales effectiveness example in more detail

Store
1 2 3 4 5
5.80 6.00 6.30 6.40 5.70
5.10 6.10 5.50 6.40 5.90
5.70 6.60 5.70 6.50 6.50
5.90 6.50 6.00 6.10 6.30
5.60 5.90 6.10 6.60 6.20
5.40 5.90 6.20 5.90 6.40
5.30 6.40 5.80 6.70 6.00
5.20 6.30 5.60 6.00 6.30
X̄1 = 5.50    X̄2 = 6.22    X̄3 = 5.90    X̄4 = 6.33    X̄5 = 6.16



For a random effects model, we need to check some additional assumptions,
compared to the fixed-effects model

o Fixed effects assumptions:


All observations are drawn from normally distributed populations
All observations have a common variance
All observations are independent and are randomly sampled from the
population

o Random effects assumptions:


All treatment effects are drawn from normally distributed populations
All treatment effects are independent and are randomly sampled from
the population

o In general, we cannot check these random effects assumptions in the


data. We must infer them from the design.

EXAMINE VARIABLES=dv BY store
  /PLOT BOXPLOT NPPLOT SPREADLEVEL.

[Boxplots of DV by store (1-5); n = 8 per store]

Tests of Normality

Shapiro-Wilk
STORE Statistic df Sig.
DV 1.00 .950 8 .716
2.00 .913 8 .373 Test of Homogeneity of Variance
3.00 .950 8 .716
Levene
4.00 .930 8 .516 Statistic df1 df2 Sig.
5.00 .946 8 .667 DV .073 4 35 .990



The structural model for a oneway random effects model looks similar to a
fixed effects model

o Fixed effects model:

   Yij = μ + αj + εij        εij ~ N(0, σε)

o Random effects model:

   Yij = μ + αj + εij        εij ~ N(0, σε)        αj ~ N(0, σα)

   So that σY² = σα² + σε²

   The αj's are now random effects: they are not fixed at a level, but
   have a distribution.
   In general, we are not interested in estimating the individual αj's
   because they vary from study to study. It is much more informative to
   estimate the distribution of the αj's: αj ~ N(0, σα)
   When we estimate effects, we will want to estimate σα²

ANOVA table for a random-effects model

o Recall the ANOVA table for the fixed-effects model

   H0: α1 = α2 = ... = αa = 0

   Source           SS      df     MS            E(MS)                     F
   Between          SSBet   a-1    SSBet/dfBet   σε² + Σ(ni·αi²)/(a-1)     MSBet/MSW
   Within (Error)   SSW     N-a    SSW/dfW       σε²
   Total            SST     N-1

o A valid F-test for a factor is constructed so that:

   When the null hypothesis is true, the expected F-value is 1
   If H0 is true: Σ(ni·αi²)/(a-1) = 0

   Then F = MSBet/MSW = [σε² + Σ(ni·αi²)/(a-1)] / σε² = σε²/σε² = 1


When the alternative hypothesis is true, the expected F-value is
greater than 1, and this increase is only due to the factor of interest
If H1 is true: Σ(ni·αi²)/(a-1) > 0

Then F = MSBet/MSW = [σε² + Σ(ni·αi²)/(a-1)] / σε² > 1

o Now the ANOVA table for the random-effects model

   H0: σα² = 0

   Source           SS      df     MS            E(MS)          F
   Between          SSBet   a-1    SSBet/dfBet   σε² + n·σα²    MSBet/MSW
   Within (Error)   SSW     N-a    SSW/dfW       σε²
   Total            SST     N-1

o Although the F-tests are constructed in the same manner as a fixed effects
model, under the hood different components are being estimated

   When the null hypothesis is true, the expected F-value is 1
   If H0 is true: σα² = 0
   Then F = MSBet/MSW = (σε² + n·σα²)/σε² = σε²/σε² = 1

   When the alternative hypothesis is true, the expected F-value is
   greater than 1, and this increase is only due to the factor of interest
   If H1 is true: σα² > 0
   Then F = MSBet/MSW = (σε² + n·σα²)/σε² > 1



Random Effects in SPSS
UNIANOVA dv BY store
/RANDOM = store.

Tests of Between-Subjects Effects

Dependent Variable: DV
Type III Sum
Source of Squares df Mean Square F Sig.
Intercept Hypothesis 1449.616 1 1449.616 1665.507 .000
Error 3.482 4 .870a
STORE Hypothesis 3.482 4 .870 10.717 .000
Error 2.843 35 8.121E-02b
a. MS(STORE)
b. MS(Error)

o To test the effect of store: F(4, 35) = 10.72, p < .01

o We reject the null hypothesis of no store effect and conclude that the
effectiveness of the sales campaign varies by store

If store had been a fixed effect, we would conduct post-hoc tests to


determine how the stores differed.
But when store is a random effect, we are not interested in differences
between specific stores used in the study. We only want to know if
the store variable adds any variance to the DV (or accounts for any
variance in the DV). In general, we are not interested in post-hoc tests
on the levels of a random variable.



o For any random effects model, SPSS also provides us with the E(MS) so
that we can see how the F-test was constructed:
Expected Mean Squaresa

Variance Component
Quadratic
Source Var(STORE) Var(Error) Term
Intercept 8.000 1.000 Intercept
STORE 8.000 1.000
Error .000 1.000
a. For each source, the expected mean square
equals the sum of the coefficients in the cells
times the variance components, plus a quadratic
term involving effects in the Quadratic Term cell.

E(MS_STORE) = 8*VAR(STORE) + VAR(ERROR)

VAR(STORE) = σα²  and  VAR(ERROR) = σε²

E(MS_STORE) = 8σα² + σε²

We can use this information to estimate the variance components

To estimate the error variance:

   σ̂ε² = MSW = .08

To estimate the variance of the store effect:

   E(MS_STORE) = 8σα² + σε²    From the table of expected mean squares
   MS_STORE = 8σ̂α² + σ̂ε²      Substitute sample values for population
                               values/parameters

So that with a little algebra, we obtain:

   MS_STORE = 8σ̂α² + MSW
   8σ̂α² = MS_STORE - MSW
   σ̂α² = (MS_STORE - MSW)/8 = (.87 - .08)/8 = .10

To estimate the total variance:

   σ̂Y² = σ̂ε² + σ̂α² = .08 + .10 = .18
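The variance-component algebra above is mechanical enough to script. A Python sketch, using the mean squares from the SPSS output (MS_store = .87, MSW = .08, n = 8 observations per store):

```python
ms_store, msw, n = 0.87, 0.08, 8

# Error variance estimate: sigma-hat_epsilon^2 = MSW.
sigma2_error = msw

# Store variance estimate, from E(MS_store) = n*sigma_alpha^2 + sigma_epsilon^2.
sigma2_store = (ms_store - msw) / n

# Total variance estimate.
sigma2_total = sigma2_error + sigma2_store
print(sigma2_error, round(sigma2_store, 2), round(sigma2_total, 2))  # 0.08 0.1 0.18
```

With the unrounded mean squares from the output the estimates would differ slightly in the third decimal, but the structure of the calculation is the same.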



8. Model II: Two-factor random effects model

An Example: Suppose a projective test involves 10 cards administered to a


patient, and the number of responses to each card is recorded. The
developer of the test suspects that the order of the cards might influence the
number of responses. Furthermore, the developer has created a standardized
set of instructions in hopes that the effect of the administrator will be
negligible.
To test these assumptions about the test, the developer randomly
selects four possible orders of the ten cards. Four administrators are
recruited to give each order of the test to two patients

Administrator
Order 1 2 3 4
1 26 15 30 33 25 23 28 30
2 26 24 25 33 27 17 27 26
3 33 27 26 32 30 24 31 26
4 36 28 37 42 37 33 39 25

[Boxplots of DV by ADMIN (1.00-4.00), grouped by ORDER (1.00-4.00); N = 2 per cell]

With 2 observations/cell, this example is obviously for pedagogical purposes


only. Due to the limited number of observations per cell, we will assume
that the assumptions are satisfied.



The structural model for this design:

Y_ijk = μ + α_j + β_k + (αβ)_jk + ε_ijk

ε_ijk ~ N(0, σ²_ε)
α_j ~ N(0, σ²_α)
β_k ~ N(0, σ²_β)
(αβ)_jk ~ N(0, σ²_αβ)

So that σ²_Y = σ²_α + σ²_β + σ²_αβ + σ²_ε

ANOVA table for a random-effects model

o The test of each factor examines a different variance component

  Main effect for Administrator:        H0: σ²_α = 0
  Main effect for Order:                H0: σ²_β = 0
  Administrator by Order interaction:   H0: σ²_αβ = 0

o In the two-factor random effects model, we need to be much more careful
  about examining the E(MS) and constructing appropriate tests of each
  effect.

Source          SS     df          MS         E(MS)                     F
Factor A        SSA    a-1         SSA/dfA    σ²_ε + nσ²_αβ + nbσ²_α    MSA/MSAB
Factor B        SSB    b-1         SSB/dfB    σ²_ε + nσ²_αβ + naσ²_β    MSB/MSAB
A*B             SSAB   (a-1)(b-1)  SSAB/dfAB  σ²_ε + nσ²_αβ             MSAB/MSW
Within (Error)  SSW    N-ab        SSW/dfW    σ²_ε
Total           SST    N-1

o For multi-factor random effects ANOVA, you must always examine the
expected MS to make sure you are using the correct error term!



To construct a test for Factor A or Factor B, we must use the MS from
the interaction as the error term

For example, let's consider Factor A

If H0 is true: σ²_α = 0

Then F = MSA/MSAB = (σ²_ε + nσ²_αβ + nbσ²_α)/(σ²_ε + nσ²_αβ)
       = (σ²_ε + nσ²_αβ)/(σ²_ε + nσ²_αβ) = 1

If H1 is true: σ²_α > 0

Then F = MSA/MSAB = (σ²_ε + nσ²_αβ + nbσ²_α)/(σ²_ε + nσ²_αβ) > 1

Suppose we tried to construct an F-test using the MSW

If H0 is true: σ²_α = 0

Then F = MSA/MSW = (σ²_ε + nσ²_αβ + nbσ²_α)/σ²_ε = (σ²_ε + nσ²_αβ)/σ²_ε > 1

F would be greater than 1, even when the null hypothesis was
true! This test is not a test for the effect of factor A!!!

To construct a test for the AB interaction, we must use the MSW as
the error term

If H0 is true: σ²_αβ = 0

Then F = MSAB/MSW = (σ²_ε + nσ²_αβ)/σ²_ε = σ²_ε/σ²_ε = 1

If H1 is true: σ²_αβ > 0

Then F = MSAB/MSW = (σ²_ε + nσ²_αβ)/σ²_ε > 1



Using SPSS to analyze a two-factor random effects design

UNIANOVA dv BY admin order


/RANDOM = admin order.
Tests of Between-Subjects Effects
Dependent Variable: DV

                               Type III Sum
Source                         of Squares    df       Mean Square    F          Sig.
Intercept       Hypothesis      26507.531    1        26507.531      155.441    .000
                Error             716.173    4.200      170.531(a)
ADMIN           Hypothesis        151.094    3           50.365        3.446    .065
                Error             131.531    9           14.615(b)
ORDER           Hypothesis        404.344    3          134.781        9.222    .004
                Error             131.531    9           14.615(b)
ADMIN * ORDER   Hypothesis        131.531    9           14.615         .631    .755
                Error             370.500    16          23.156(c)

a. MS(ADMIN) + MS(ORDER) - MS(ADMIN * ORDER)
b. MS(ADMIN * ORDER)
c. MS(Error)

o SPSS highlights the fact that it is using different error terms to test each
factor

o We conclude:
  There is a significant effect of order of the test on the number of
  responses, F(3, 9) = 9.22, p < .01.
  There is also a marginally significant effect of administrator on the
  number of responses, F(3, 9) = 3.45, p = .07.
  But there is no order by administrator interaction effect on the
  number of responses, F(9, 16) = 0.63, p = .76.
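The F-ratios in that output follow directly from the mean squares; a sketch in Python, with the values hard-coded from the table above:

```python
# Mean squares from the SPSS output above
ms_admin = 50.365
ms_order = 134.781
ms_interaction = 14.615   # ADMIN * ORDER
ms_within = 23.156        # MS(Error)

# Both main effects are tested against the interaction MS;
# the interaction itself is tested against MSW.
f_admin = ms_admin / ms_interaction
f_order = ms_order / ms_interaction
f_interaction = ms_interaction / ms_within

print(round(f_admin, 2), round(f_order, 2), round(f_interaction, 2))
# → 3.45 9.22 0.63
```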



o SPSS also gives us information on the E(MS) so that we can calculate the
  variance components

  Expected Mean Squares(a,b)

                   Variance Component
  Source          Var(ADMIN)  Var(ORDER)  Var(ADMIN * ORDER)  Var(Error)  Quadratic Term
  Intercept          8.000       8.000          2.000            1.000    Intercept
  ADMIN              8.000        .000          2.000            1.000
  ORDER               .000       8.000          2.000            1.000
  ADMIN * ORDER       .000        .000          2.000            1.000
  Error               .000        .000           .000            1.000

  a. For each source, the expected mean square equals the sum of the coefficients in
     the cells times the variance components, plus a quadratic term involving effects in
     the Quadratic Term cell.
  b. Expected Mean Squares are based on the Type III Sums of Squares.

To estimate the error variance


σ̂²_ε = MSW = 23.16

To estimate the variance of the interaction effect


E(MS_Admin*Order) = 2σ²_αβ + σ²_ε     From the table of expected mean squares
MS_Admin*Order = 2σ̂²_αβ + σ̂²_ε      Substitute sample values for population
                                      values/parameters
So that with a little algebra, we obtain:
MS_Admin*Order = 2σ̂²_αβ + MSW
2σ̂²_αβ = MS_Admin*Order - MSW
σ̂²_αβ = (MS_Admin*Order - MSW)/2 = (14.615 - 23.156)/2 = -4.27,
which is negative, so we set σ̂²_αβ = 0

To estimate the variance of the administrator effect


E(MS_Admin) = 8σ²_α + 2σ²_αβ + σ²_ε
MS_Admin = 8σ̂²_α + MS_Admin*Order
So that with a little algebra, we obtain:
σ̂²_α = (MS_Admin - MS_Admin*Order)/8 = (50.365 - 14.615)/8 = 4.47

To estimate the variance of the order effect


E(MS_Order) = 8σ²_β + 2σ²_αβ + σ²_ε
MS_Order = 8σ̂²_β + MS_Admin*Order
So that with a little algebra, we obtain:
σ̂²_β = (MS_Order - MS_Admin*Order)/8 = (134.781 - 14.615)/8 = 15.02

To estimate total variance


σ̂²_Y = σ̂²_ε + σ̂²_α + σ̂²_β + σ̂²_αβ = 23.16 + 4.47 + 15.02 + 0 = 42.65



Note that any component that is estimated to be less than zero is
assumed to have a value of zero
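That truncation rule is easy to fold into code; a sketch using the mean squares and E(MS) coefficients from this example (values hard-coded from the output above):

```python
# Mean squares from the two-factor random effects example above
MS = {"admin": 50.365, "order": 134.781, "admin_x_order": 14.615, "within": 23.156}

def trunc(x):
    """Negative variance-component estimates are set to zero."""
    return max(0.0, x)

var_error = MS["within"]
# Coefficients (2 and 8) come from the Expected Mean Squares table.
var_interaction = trunc((MS["admin_x_order"] - MS["within"]) / 2)
var_admin = trunc((MS["admin"] - MS["admin_x_order"]) / 8)
var_order = trunc((MS["order"] - MS["admin_x_order"]) / 8)
total = var_error + var_interaction + var_admin + var_order

print(round(var_admin, 2), round(var_order, 2), round(var_interaction, 2))
print(total)
```

The estimates agree with the hand calculations: about 4.47, 15.02, 0, and a total near 42.65.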

o SPSS can also compute variance components directly


VARCOMP dv BY order admin
/RANDOM = order admin.

Variance Estimates

Component Estimate
Var(ORDER) 15.021
Var(ADMIN) 4.469
Var(ORDER * ADMIN) -4.271a
Var(Error) 23.156
Dependent Variable: DV
Method: Minimum Norm Quadratic Unbiased Estimation
(Weight = 1 for Random Effects and Residual)
a. For the ANOVA and MINQUE methods, negative
variance component estimates may occur. Some
possible reasons for their occurrence are: (a) the
specified model is not the correct model, or (b)
the true value of the variance equals zero.

9. Model III: Two-factor mixed effects model

Multi-factor experiments involving only random effects are relatively rare in
behavioral research. It is much more common to encounter mixed models
(containing both fixed and random effects) than to encounter a multi-factor
random effects model.

A return to the study on the effect of mental activity on blood flow (BF);
see p. 9-24. This design is a 2-factor between-subjects mixed model
ANOVA
Factor 1: Test (Math, Reading Comprehension, or History)
Factor 2: Classroom (6 classrooms)

Task (fixed)
Classroom
(random) Math Reading Comp History
1 7.8 8.7 11.1 12.0 11.7 10.0
2 8.0 9.2 11.3 10.6 9.8 11.9
3 4.0 6.9 9.8 10.1 11.7 12.6
4 10.3 9.4 11.4 10.5 7.9 8.1
5 9.3 10.6 13.0 11.7 8.3 7.9
6 9.5 9.8 12.2 12.3 8.6 10.5



As with the previous example, due to the limited number of observations per
cell, we will assume that the assumptions are satisfied.

[Boxplots of DV by CLASS (1.00-6.00), grouped by TASK (Math, Reading, History); N = 2 per cell]

When considering mixed models, interactions between fixed effects and


random effects are considered to be random effects.

The structural model for a mixed design (A fixed; B random):

Y_ijk = μ + α_j + β_k + (αβ)_jk + ε_ijk
ε_ijk ~ N(0, σ²_ε)
β_k ~ N(0, σ²_β)
(αβ)_jk ~ N(0, σ²_αβ)

So that σ²_Y = σ²_β + σ²_αβ + σ²_ε

ANOVA table for a mixed-effects model

o The test of each effect:

  Main effect for task:        H0: α1 = α2 = α3 = 0
  Main effect for class:       H0: σ²_β = 0
  Task by class interaction:   H0: σ²_αβ = 0



o Again, we need to consider the E(MS)s so that we construct valid F-tests.

  Source             SS    df          MS         E(MS)                           F
  Factor A (Fixed)   SSA   a-1         SSA/dfA    σ²_ε + nσ²_αβ + nbΣα²_j/(a-1)   MSA/MSAB
  Factor B (Random)  SSB   b-1         SSB/dfB    σ²_ε + naσ²_β                   MSB/MSW
  A*B                SSAB  (a-1)(b-1)  SSAB/dfAB  σ²_ε + nσ²_αβ                   MSAB/MSW
  Within (Error)     SSW   N-ab        SSW/dfW    σ²_ε
  Total              SST   N-1

To construct a test for Factor A (the fixed effect):


We must use the MS from the interaction as the error term
To construct a test for Factor B (a random effect):
We must use the MSW as the error term
To construct a test for the Factor AB interaction (a random effect):
We must use the MSW as the error term

Why does having a random effect change the error term of the fixed effect,
but not of the random effect?

o Consider a design with therapy (3 fixed levels) and clinical trainee (3


random levels)

o We assume that the three trainees used in the study were drawn from a
  population of trainees. Imagine that we can put on our magic glasses and
  see population means for the therapy modes for the entire population of
  trainees (and for simplicity, we will assume that the population is small,
  consisting of 18 trainees)

Clinical Trainee
Therapy a b c d e f g h i j k l m n o p q r Mean
A 7 6 5 7 6 5 4 4 4 1 2 3 4 4 4 1 2 3 4
B 4 4 4 1 2 3 7 6 5 7 6 5 1 2 3 4 4 4 4
C 1 2 3 4 4 4 1 2 3 4 4 4 7 6 5 7 6 5 4
Mean 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4



o In our study, we randomly sample 3 of the trainees. So let's consider a
  random sample of three trainees

Clinical Trainee
Therapy g k r Mean
A 4 2 3 3.0
B 7 6 4 5.67
C 1 4 5 3.33
Mean 4 4 4 4

o The random trainee factor does not affect our estimation of the effect of
  trainee
o The random trainee factor does affect our estimation of the effect of
  therapy (the fixed factor)
Trainee and Therapy interact, which can cause variability among
means for the fixed factor to increase
MS(A) must be measuring something other than just error and the
effect of Therapy. When we look at the EMS for factor A, we see that
it captures variability due to the A*B interaction

Using SPSS to analyze a two-factor mixed effects design


UNIANOVA dv BY task class
/RANDOM = class.
Tests of Between-Subjects Effects
Dependent Variable: DV

                              Type III Sum
Source                        of Squares    df     Mean Square   F           Sig.
Intercept       Hypothesis     3570.062     1      3570.062      2626.655    .000
                Error             6.796     5         1.359(a)
TASK            Hypothesis       44.042     2        22.021         3.784    .060
                Error            58.195     10        5.820(b)
CLASS           Hypothesis        6.796     5         1.359          .234    .939
                Error            58.195     10        5.820(b)
TASK * CLASS    Hypothesis       58.195     10        5.820         7.207    .000
                Error            14.535     18         .808(c)

a. MS(CLASS)
b. MS(TASK * CLASS)
c. MS(Error)

o But wait!! SPSS is using the wrong error term for the test of the main
  effect of classroom!!!
  Classroom is a random effect. To test the random effect, we need to
  use MSW as the error term. SPSS is using MSAB.



o We will have to do the correct test by hand

  Main Effect for Class: F(5,18) = MS_CLASS / MSW = 1.36 / 0.81 = 1.68, p = .19
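All three tests with the correct error terms can be recomputed from the SS and df in the output above; a sketch (the p-values would require an F distribution, which is omitted here):

```python
# Mean squares recomputed from the SS and df in the SPSS output above
ms_task = 44.042 / 2           # 22.021
ms_class = 6.796 / 5           # 1.359
ms_interaction = 58.195 / 10   # TASK * CLASS
ms_within = 14.535 / 18        # MS(Error)

# The fixed effect (task) is tested against the interaction MS;
# the random effect (class) and the interaction are tested against MSW.
f_task = ms_task / ms_interaction            # F(2, 10)
f_class = ms_class / ms_within               # F(5, 18) -- the corrected test
f_interaction = ms_interaction / ms_within   # F(10, 18)

print(round(f_task, 2), round(f_class, 2), round(f_interaction, 2))
```

With these inputs the ratios come out to roughly 3.78, 1.68, and 7.21, so only the test of class changes relative to the SPSS output.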

o We can also use the TEST subcommand and ask SPSS to compute the F-
test. We need to enter the effect (class), the SS of the denominator
(14.54) and the df of the denominator (18)

UNIANOVA dv BY task class


/RANDOM = class
/TEST = class vs 14.54 df(18).
Test Results

Dependent Variable: DV
Sum of
Source Squares df Mean Square F Sig.
Contrast 6.796 5 1.359 1.683 .190
Error 14.540a 18a .808
a. User specified.

o BEWARE! SPSS may contain other errors. If you are going to be


analyzing balanced random or mixed designs, it is worth your time and
effort to look up or calculate the E(MS)s for your design (For an
algorithm see Neter, Appendix D)

o Note: SPSS does not consider this to be an error. They state that
statisticians differ in how they approach this problem.
http://spss.com/tech/answer/details.cfm?tech_tan_id=100000073

As indicated in this tech note, SAS makes the same error. Thus,
even if you run the analysis in SAS, you will still have to rerun the
analysis

I cannot find any recent texts that agree with the SPSS approach.
Neter et al (1996, p 981), Kirk (1995, p 374) and Maxwell & Delaney
(1990, p 429/431) all give the E(MS) I list on the previous page. For
balanced designs, SPSS does the wrong analysis. For unbalanced
designs, SPSS's approach may be appropriate.



SPSS's Incorrect Output:

Expected Mean Squares(a,b)
                  Variance Component
Source         Var(class)  Var(task * class)  Var(Error)  Quadratic Term
Intercept         6.000          2.000           1.000    Intercept, task
task               .000          2.000           1.000    task
class             6.000          2.000           1.000
task * class       .000          2.000           1.000
Error              .000           .000           1.000
a. For each source, the expected mean square equals the sum of the coefficients
   in the cells times the variance components, plus a quadratic term involving
   effects in the Quadratic Term cell.
b. Expected Mean Squares are based on the Type III Sums of Squares.

What the output should look like:

Expected Mean Squares(a)
                  Variance Component
Source         Var(CLASS)  Var(TASK * CLASS)  Var(Error)  Quadratic Term
Intercept         6.000          2.000           1.000    Intercept, TASK
TASK               .000          2.000           1.000    TASK
CLASS             6.000           .000           1.000
TASK * CLASS       .000          2.000           1.000
Error              .000           .000           1.000
a. Andy's Hand-Corrected Table

o The hand-corrected table is a variance components table based on the
  correct E(MS) values listed on page 9-37

To estimate the error variance


σ̂²_ε = MSW = 0.81

To estimate the variance of the interaction effect


E(MS_Task*Class) = 2σ²_αβ + σ²_ε
So that with a little algebra, we obtain:
σ̂²_αβ = (MS_Task*Class - MSW)/2 = (5.82 - 0.81)/2 = 2.51

Task is a fixed effect; there is no variance component to estimate

To estimate the variance of the class effect


E(MS_Class) = 6σ²_β + σ²_ε
So that with a little algebra, we obtain:
σ̂²_β = (MS_Class - MSW)/6 = (1.36 - 0.81)/6 = 0.09

To estimate total variance


σ̂²_Y = σ̂²_ε + σ̂²_β + σ̂²_αβ = 0.81 + 0.09 + 2.51 = 3.41
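The mixed-model variance components follow the same recipe, with the coefficients (2 and 6) taken from the corrected E(MS) table above; a sketch:

```python
# Mean squares from the mixed-model example (task fixed, class random)
ms_class = 1.36
ms_interaction = 5.82   # TASK * CLASS
ms_within = 0.81

# Coefficients come from the corrected Expected Mean Squares table:
# 2 on Var(TASK * CLASS), 6 on Var(CLASS).
var_error = ms_within
var_interaction = max(0.0, (ms_interaction - ms_within) / 2)
var_class = max(0.0, (ms_class - ms_within) / 6)
total = var_error + var_interaction + var_class

print(round(var_interaction, 2), round(var_class, 2))
```

This reproduces the hand estimates of roughly 2.51 and 0.09, with a total variance near 3.41.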

o SPSS's VARCOMP command also errs on the variance estimate for the
  class effect (SPSS output not shown here)



10. Contrasts and post-hoc tests

To perform contrasts or post-hoc tests, you can use the same formulas
previously discussed for ANOVA with one exception. You must use the
correct error term in place of MSW, and the degrees of freedom associated
with that error term

o If you perform contrasts/post-hoc tests on the marginal means for factor
  A, you need to use the error term used to test factor A
o If you perform contrasts/post-hoc tests on the marginal means for factor B,
  you need to use the error term used to test factor B
o If you perform contrasts/post-hoc tests on the individual cell means, you
  need to use the error term used to test the AB interaction

11. Effect sizes for random effects designs

The random effects equivalent of eta squared is rho, ρ̂

Rho is interpreted, just as eta squared, as the proportion of the variance in
the DV accounted for by the factor in the sample

ρ̂_A = σ̂²_A / σ̂²_Y

Omega squared must still be used for fixed effects in a mixed model. In
general, for a fixed factor A:

ω̂²_A = (SSA - (dfA)MS[error term]) / (SSA - (dfA)MS[error term] + (N)MSW)

o For example, in a two-factor mixed model, with A fixed and B random,
  we used MSAB as the error term to test Factor A. Thus, our equation for
  omega squared would be:

  ω̂²_A = (SSA - (dfA)MSAB) / (SSA - (dfA)MSAB + (N)MSW)

  ω̂²_Task = (44.04 - (2)5.82) / (44.04 - (2)5.82 + (36)(.808)) = .53
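That calculation can be checked directly; a sketch with the values hard-coded from the mixed-model example:

```python
# Omega squared for the fixed Task effect in the mixed model above
ss_task, df_task = 44.04, 2
ms_error_term = 5.82    # MSAB, the error term used to test Task
n_total = 36
ms_within = 0.808

numerator = ss_task - df_task * ms_error_term
omega_sq = numerator / (numerator + n_total * ms_within)
print(round(omega_sq, 2))   # → 0.53
```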



12. Final considerations about random effects

The distinction between fixed and random effects is not always as clear as
presented here. For example, Clark (1973) argued convincingly that
when a list of words is used in a study, the words should be treated as a
random effect. The key is what type of inference you want to make

We consider the random effects as being sampled from an infinite


population. If the population is finite but large, we are OK. However, when
the population to be sampled from is small, adjustments are necessary

We estimate the distribution of the random effects based on the means (and
the variability of those means) of the random factor. If you only have 2-3
levels of your random factor, you will not get a good estimate of the
distribution. It is desirable to have a relatively large number of levels of any
random factor. In addition, it is important that the levels of the random
factor be randomly sampled from the population of interest

In designs with three or more factors that include two or more random
effects, it is common to encounter situations where no exact F-test can be
constructed. In this case, quasi-F ratios (linear combinations of MSs) are
used to approximate an F-ratio.

All of our calculations assume that cell sizes are equal. Things get very
wacky with unequal cell sizes, and it is no longer possible to construct exact
F-tests (the ratios of expected MSs no longer satisfy the requirements for a
valid F-test). Approximate tests are available and are calculated in SPSS.

It is a good idea to calculate or look-up E(MS)s for balanced designs and/or


to replicate the analysis using another statistical package.



ANOVA designs with nested effects

13. An introduction to nested designs

Nested designs are also known as hierarchical designs


The factorial designs studied thus far are considered to be crossed designs.
That is, every level of a factor appears in (or is crossed with) every level of
all other factors. If you display the design in a grid, there are no empty cells
in a crossed design.

Example 1: The effect of therapist's sex on treatment outcome. You observed
three male and three female therapists. Each therapist sees four patients, and
you record a general measure of psychological health.

Sex of therapist Male Female

Therapist 1 2 3 4 5 6

o Sex is the main variable of interest and is a fixed effect

o Therapist is nested within sex (it cannot be crossed because a therapist
  cannot be both male and female). Therapist will also be considered a
  random effect
o Each therapist sees four patients. Thus, patients are nested within
  therapist (and are a random effect)

Example #2: The effect of race of defendant on jury decision making

Race of Defendant Black White

Jury 1 2 3 4 5 6 7 8 9 10 11 12

o Race is the main variable of interest and is a fixed effect


o Jury is nested within race. Jury will most likely be considered a random
effect
o Each jury is composed of 12 participants. The participants are nested
within jury (and are also a random effect)



Example #3: A new intervention is developed to reduce drug use in inner-
city middle-school students. Six inner-city schools are selected at random;
three receive the new intervention and three receive the old intervention.
Within each school, four classrooms are selected at random for observation.

Old intervention

School School A School B School C

Classroom 1 2 3 4 5 6 7 8 9 10 11 12

New intervention

School School D School E School F

Classroom 1 2 3 4 5 6 7 8 9 10 11 12

o Type of intervention is a fixed effect


o School is a random effect nested within treatment
o Classroom is a random effect nested within school
o The participants are a random effect nested within classroom

General comments about nested designs


o In behavioral research, nested factors are usually random effects
o In factorial between subjects designs, participants are nested within cell

Because I am presenting only an introduction to nested designs, I will


consider only designs with random effects nested within a fixed effect (like
these examples). I can provide references for the analysis of more advanced
designs.



14. Structural models for nested designs

Example #1: Therapist's sex and treatment outcome

o Factor A: Therapist's sex (Male vs. Female) - Fixed effect
o Factor B: Therapist - Random effect

Y_ijk = μ + α_j + β_k(j) + ε_ijk

α_j      The fixed effect of therapist's sex
β_k(j)   The random effect of therapist within sex
ε_ijk    The errors/residuals
         AKA the random effect of participant within therapist
         Sometimes notated ε_i(jk) to emphasize the nesting

Example #3: Drug use intervention

o Factor A: Intervention - Fixed effect
o Factor B: School within intervention - Random effect
o Factor C: Classroom within school - Random effect

Y_ijkl = μ + α_j + β_k(j) + γ_l(jk) + ε_ijkl

α_j       The fixed effect of intervention
β_k(j)    The random effect of school within intervention
γ_l(jk)   The random effect of class within school
ε_ijkl    The errors/residuals
          AKA the random effect of participant within class
          Sometimes notated ε_i(jkl) to emphasize the nesting

Note that because these designs are nested, not crossed, there is no way to
estimate an interaction effect.



15. Testing nested effects

With nested effects, we again need to make sure we use the correct error
term when constructing F-tests.

Design                      Effect   Error Term
Two-factor                  A        MS(B/A)
(B/A; B random, A fixed)    B/A      MSW

Three-factor                A        MS(B/A)
(C/B/A; C, B random,        B/A      MS(C/B)
A fixed)                    C/B      MSW

o Just as for the random effects designs, the SS are calculated in the same
  manner as before. The only difference is the construction of the F-test
o For more complex designs, you'll have to look up the error term, or trust
  SPSS

Example #1: Therapist's sex and treatment outcome

Sex of Therapist
Male Female
1 2 3 4 5 6
49 42 42 54 44 57
40 48 46 60 54 62
31 52 50 64 54 66
40 58 54 70 64 71

o To test the effect of sex of therapist, we treat each therapist as one


observation (collapsing across participants)

Sex of Therapist
Male Female
40 50 48 62 54 64

A one-factor ANOVA on these six observations would have:


1 df in the numerator
4 df in the denominator
This is essentially how the effect of sex of therapist is analyzed in a
nested design



o SPSS syntax:
UNIANOVA dv BY sex thera
/RANDOM = thera
/DESIGN = sex thera within sex .

Tests of Between-Subjects Effects

Dependent Variable: DV
Type III Sum
Source of Squares df Mean Square F Sig.
Intercept Hypothesis 67416.000 1 67416.000 601.929 .000
Error 448.000 4 112.000a
SEX Hypothesis 1176.000 1 1176.000 10.500 .032
Error 448.000 4 112.000a
THERA(SEX) Hypothesis 448.000 4 112.000 2.459 .083
Error 820.000 18 45.556b
a. MS(THERA(SEX))
b. MS(Error)

Effect for sex of therapist: F(1,4) = 10.50, p = .03


Effect of therapist: F(4, 18) = 2.46, p = .08

o Let's do the one-factor ANOVA on the collapsed data to examine the
  effect of sex of therapist

Sex of Therapist
Male Female
40 50 48 62 54 64

Descriptives
DV
        N   Mean
1.00    3   46.0000
2.00    3   60.0000
Total   6   53.0000

ANOVA
DV
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups    294.000         1     294.000      10.500   .032
Within Groups     112.000         4      28.000
Total             406.000         5

This analysis produces the same results only the SS are different.
This analysis was tricked into thinking each observation was one
participant, but in the actual analysis, we know that each observation
was based on data from four participants. If you multiply the SS in
this oneway analysis by 4, you will get the same results as the nested
analysis. (This trick only works for balanced designs)
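A quick check of that trick, computing the one-way SS from the six collapsed therapist means by hand:

```python
# Collapsed therapist means: males vs. females
male = [40, 50, 48]
female = [62, 54, 64]

grand = sum(male + female) / 6
m_male, m_female = sum(male) / 3, sum(female) / 3

ss_between = 3 * (m_male - grand) ** 2 + 3 * (m_female - grand) ** 2
ss_within = sum((y - m_male) ** 2 for y in male) + \
            sum((y - m_female) ** 2 for y in female)
f = (ss_between / 1) / (ss_within / 4)

print(ss_between, ss_within, f)        # → 294.0 112.0 10.5
# Each "observation" is really a mean of 4 patients, so the nested
# analysis scales both SS by 4 while F is unchanged:
print(ss_between * 4, ss_within * 4)   # → 1176.0 448.0
```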



o To calculate the effect sizes:
  Sex is a fixed effect, so we need to calculate omega squared

  ω̂²_A = (SSA - (dfA)MS[error term]) / (SSA + (dfA)MS[error term] + (N)MSW)

  ω̂²_Sex = (1176 - (1)112) / (1176 + (1)112 + (24)45.56) = .45

  Therapist within sex is a random effect, so we need to calculate rho

  ρ̂_Thera(sex) = σ̂²_Thera(sex) / σ̂²_Y

Expected Mean Squares
                 Variance Component
Source        Var(THERA(SEX))   Var(Error)   Quadratic Term
Intercept         4.000            1.000     Intercept, SEX
SEX               4.000            1.000     SEX
THERA(SEX)        4.000            1.000
Error              .000            1.000

E(MS_Thera(sex)) = 4σ²_Thera(sex) + σ²_ε

σ̂²_Thera(sex) = (MS_Thera(sex) - MSW) / 4 = (112 - 45.56) / 4 = 16.61

σ̂²_Y = σ̂²_Thera(sex) + σ̂²_ε = 16.61 + 45.56 = 62.17

ρ̂_Thera(sex) = σ̂²_Thera(sex) / σ̂²_Y = 16.61 / 62.17 = .27
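Both effect sizes can be reproduced from the nested ANOVA table; a sketch with the values hard-coded from the output above:

```python
# Values from the nested ANOVA table (therapist's sex example)
ss_sex, df_sex = 1176.0, 1
ms_thera_in_sex = 112.0   # MS(THERA(SEX)), the error term for the sex effect
ms_within = 45.556
n_total = 24

# Omega squared for the fixed sex effect
omega_sq = (ss_sex - df_sex * ms_thera_in_sex) / \
           (ss_sex + df_sex * ms_thera_in_sex + n_total * ms_within)

# Rho for the random therapist-within-sex effect
var_thera = (ms_thera_in_sex - ms_within) / 4   # coefficient 4 from the E(MS) table
var_total = var_thera + ms_within
rho = var_thera / var_total

print(round(omega_sq, 2), round(rho, 2))   # → 0.45 0.27
```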



Example #3: Drug use intervention
(Let's assume that there were three students in each class)

Old Intervention
School 1 School 2 School 3
1 2 3 4 1 2 3 4 1 2 3 4
11.2 16.5 18.3 19 7.3 11.9 11.3 8.9 15.3 19.5 14.1 16.5
11.6 16.8 18.7 18.5 7.8 12.4 10.9 9.4 15.9 20.1 13.8 17.2
12.0 16.1 19.0 18.2 7.0 12.0 10.5 9.3 16.0 19.3 14.2 16.9

New Intervention
School 1 School 2 School 3
1 2 3 4 1 2 3 4 1 2 3 4
13.2 17.25 20.3 20.5 9.3 12.9 10.3 10.9 17.55 20.75 15.1 18.75
12.35 18.8 18.45 17.5 7.05 14.65 12.15 8.15 14.9 22.1 14.55 17.2
13.25 15.85 21.0 19.2 8.5 14.25 10.0 11.55 17.75 21.3 13.7 16.9

o To gain an intuitive understanding of how nested effects are tested, it is


beneficial to examine each effect separately

o To test the effect of the intervention, we essentially treat each school as


one observation (collapsing across classrooms and participants)

Intervention
Old New
16.33 9.89 16.57 17.30 10.81 17.55

A one-factor ANOVA on these six observations has:


1 df in the numerator (a-1) = (2-1) = 1
4 df in the denominator a(b-1) = 2(3-1) = 2*2 = 4

ONEWAY dv by treat
/STAT = DESC.

Descriptives
DV
        N   Mean      Std. Deviation
1.00    3   14.2613   3.78589
2.00    3   15.2200   3.82122
Total   6   14.7407   3.44232

ANOVA
DV
                 Sum of Squares   df   Mean Square   F      Sig.
Between Groups     1.379          1      1.379       .095   .773
Within Groups     57.869          4     14.467
Total             59.248          5

F(1,4) = 0.10, p = .77



o To test the effect of school (within intervention), we treat each class as
one observation (collapsing across participants)

School (Treatment)
1(Old) 2(Old) 3(Old) 1(New) 2(New) 3(New)
11.60 7.37 15.73 12.93 8.28 16.73
16.47 12.10 19.63 17.30 13.93 21.38
18.67 10.90 14.03 19.92 10.81 14.45
18.57 9.20 16.86 19.07 10.20 17.61

A school within treatment ANOVA on these 24 observations has:


4 df in the numerator a(b-1) = 2(3-1) = 2*2 = 4
18 df in the denominator ab(c-1) = 2*3*(4-1) = 2*3*3 = 18

UNIANOVA dv BY treat school


/DESIGN = treat, school within treat.
Tests of Between-Subjects Effects

Dependent Variable: DV
Type III Sum
Source of Squares df Mean Square F Sig.
Corrected Model 237.029 5 47.406 6.427 .001
Intercept 5213.833 1 5213.833 706.816 .000
TREAT 5.491a 1 5.491 .744 .400
SCHOOL(TREAT) 231.538 4 57.885 7.847 .001
Error 132.777 18 7.377
Total 5583.639 24
Corrected Total 369.807 23
a. Ignore this test for the effect of treatment in this setup

F(4,18) = 7.85, p = .001

o Finally, to test the effect of class (within school within intervention), we


examine the individual observations

This analysis has:


18 df in the numerator ab(c-1) = 2*3*(4-1) = 2*3*3 = 18
48 df in the denominator abc(n-1) = 2*3*4*(3-1) = 48



o To analyze all the effects in one command:

UNIANOVA dv BY treat school class


/RANDOM = school class
/PRINT = DESC
/DESIGN = treat, school within treat,
class within school within treat.

Tests of Between-Subjects Effects

Dependent Variable: DV
Type III Sum
Source of Squares df Mean Square F Sig.
Intercept Hypothesis 15643.857 1 15643.857 90.088 .001
Error 694.600 4 173.650a
TREAT Hypothesis 16.531 1 16.531 .095 .773
Error 694.600 4 173.650a
SCHOOL(TREAT) Hypothesis 694.600 4 173.650 7.850 .001
Error 398.194 18 22.122b
CLASS(SCHOOL Hypothesis 398.194 18 22.122 27.682 .000
(TREAT)) Error 38.358 48 .799c
a. MS(SCHOOL(TREAT))
b. MS(CLASS(SCHOOL(TREAT)))
c. MS(Error)

Effect of treatment: F(1,4) = 0.10, p = .77


Effect of school(treatment): F(4,18) = 7.85, p = .001
Effect of class(school(treatment)): F(18,48) = 27.68, p < .001
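The three nested F-tests can be reproduced from the SS and df in the table above; a sketch:

```python
# SS and df from the nested ANOVA table (drug intervention example)
ms_treat = 16.531 / 1
ms_school = 694.600 / 4    # SCHOOL(TREAT)
ms_class = 398.194 / 18    # CLASS(SCHOOL(TREAT))
ms_within = 38.358 / 48    # MS(Error)

# Each effect is tested against the MS of the effect nested just below it
f_treat = ms_treat / ms_school    # F(1, 4)
f_school = ms_school / ms_class   # F(4, 18)
f_class = ms_class / ms_within    # F(18, 48)

print(round(f_treat, 2), round(f_school, 2), round(f_class, 2))
```

These come out to roughly 0.10, 7.85, and 27.68, matching the SPSS output.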

o SPSS also provides the variance components so that effect sizes can be
calculated for the random effects
Expected Mean Squares(a,b)

                        Variance Component
Source                 Var(SCHOOL(TREAT))  Var(CLASS(SCHOOL(TREAT)))  Var(Error)  Quadratic Term
Intercept                   12.000                 3.000                1.000     Intercept, TREAT
TREAT                       12.000                 3.000                1.000     TREAT
SCHOOL(TREAT)               12.000                 3.000                1.000
CLASS(SCHOOL(TREAT))          .000                 3.000                1.000
Error                         .000                  .000                1.000

a. For each source, the expected mean square equals the sum of the
   coefficients in the cells times the variance components, plus a
   quadratic term involving effects in the Quadratic Term cell.
b. Expected Mean Squares are based on the Type III Sums of Squares.



16. Final considerations about nested designs

In these examples, we did not test the assumptions for the model because of
small cell sizes. However, the ANOVA assumptions must be satisfied for the
results to be valid. The assumptions for a nested model are the same as the
assumptions for a fixed or random effects model (depending on if there are
fixed or random effects in the model).

Pay attention to the small degrees of freedom in the tests for some of the
nested effects. In both examples, the test of the fixed effect (the effect of
most interest in these designs) is based on six observations! Nested designs
can have very low power unless you have a large number of levels of the
nested effects.

We have focused on balanced complete nested designs with random effects


nested within a fixed effect. Many other nested designs are possible
including partially nested designs. Before you run a more complicated
nested design, make sure that you know how to analyze it. Kirk (1995) is a
good reference.

As in the random effects case, contrasts and post-hoc tests can be conducted
by using the appropriate error term in previously developed equations.

We have discussed nested designs in an ANOVA framework where all the


independent variables are categorical variables. In a regression framework,
these models are usually called hierarchical linear models (HLM) and are
very popular at the moment. In an HLM analysis, different terminology and
different methods of estimation are used, but the interpretation is the same.



ANOVA designs with randomized blocks

17. The logic of blocking

When we test the effect of a factor on a dependent variable, there are always
many other factors that lead to variability in the DV. When these variables
are not of interest to us, they are called nuisance variables.
For example, if we are interested in the relationship between type of therapy
and psychological wellness, there are many other factors that influence
wellness other than the type of therapy.

What can we do about nuisance variables?

o The typical approach is to use random assignment of participants to


treatment conditions.
The nuisance variables are distributed equally over the experimental
factors so that they do not affect just one treatment level.
However, all the variation in the DV caused by the nuisance variable
is accumulated in the MSW. A large MSW (relative to the MS of the
factor of interest) will decrease our power to detect the effect of
interest.

o An alternative approach is to hold the nuisance variables constant.


For example, to examine the effectiveness of several types of therapy,
we can use only 18-year-old white females who have the same
severity of the disorder. By creating a homogenous sample, we will
decrease the MSW and increase our power.
This approach limits the generalizability of the conclusions. In
addition, if you attempt to hold several variables constant, it may be
difficult to find participants for the study.

o You can also include the nuisance variable(s) as factors in the study.
This approach is known as blocking.



Any variable that is related to the DV may be used as a blocking variable.
There are two categories of common blocking variables:

o Characteristics associated with the participant:


Gender     Education
Age        Attitudes
Income     Previous experience with the task
IQ

o Characteristics associated with the experimental setting:


Time of day          Week
Batch of material    Measuring instrument
Location             The participant (!)

When we include a blocking factor in the design, we can capture the
variability it causes in the DV in a SS(Blocks). This process will reduce the
SS Within, compared to a non-blocked design

SS Total
(SS Corrected Total)

Unblocked:  SS A (df = a-1)  +  SS Error (df = N-a)

Blocked:    SS A (df = a-1)  +  SS Blocks (df = bl-1)  +  SS Residual (df = N-a-bl+1)



18. Examples of blocked designs
Example #1: Methods of quantifying risk. Managers were exposed to one of
three methods of quantifying risk. After learning about the method,
participants were asked to rate their degree of confidence in their risk
assessments.
Fifteen participants were grouped into five blocks, according to their age.
Within each block, participants were randomly assigned to one of the three
experimental conditions

o Layout for a randomized block design

Participant
1 2 3
Block 1 (Oldest participants) C W U
2 C U W
3 U W C
4 W U C
5 (Youngest participants) W C U

o Data from the quantifying risk example:

                        Method
Block          Utility   Worry   Comparison   Average
1 (oldest)        1        5         8          4.7
2                 2        8        14          8.0
3                 7        9        16         10.7
4                 6       13        18         12.3
5 (youngest)     12       14        17         14.3
Average          5.6      9.8      14.6
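The marginal means in that table can be recomputed directly from the raw scores; a quick sketch:

```python
# Rows = blocks (oldest to youngest); columns = Utility, Worry, Comparison
data = [
    [1, 5, 8],
    [2, 8, 14],
    [7, 9, 16],
    [6, 13, 18],
    [12, 14, 17],
]

block_means = [sum(row) / 3 for row in data]
method_means = [sum(col) / 5 for col in zip(*data)]

print([round(m, 1) for m in block_means])    # → [4.7, 8.0, 10.7, 12.3, 14.3]
print([round(m, 1) for m in method_means])   # → [5.6, 9.8, 14.6]
```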

Note that a randomized block design looks like a factorial design, but
there is only one participant per cell. If there were two or more
participants per cell, we would call this design a two-way ANOVA.

Because there is one participant per cell, we do not have any


information to test the block by factor interaction.



o Assumptions for a randomized block design:
Because we only have one observation/cell, we cannot check
assumptions on a cell-by-cell basis as we would for a factorial design.

We require the standard assumptions:
  Independently and randomly sampled observations
  Homogeneity of variances
    (Checked on the marginal means for the factor AND for the blocks)
  Normality
    (By block and by treatment)
We assume that there is no treatment by block interaction (i.e., additivity of treatment and blocks)
  Plot observed values by block and look for parallel lines

Additional assumptions are required if the blocking factor is a random effect.

o Checking assumptions in the quantifying risk example


EXAMINE VARIABLES=dv BY block treat
/PLOT BOXPLOT SPREADLEVEL NPPLOT.

By treatment: [boxplots of DV for each of the three treatment levels (n = 5 per group); the spreads are comparable]

Test of Homogeneity of Variance
          Levene
          Statistic   df1   df2   Sig.
  DV      .048        2     12    .953

Tests of Normality

Shapiro-Wilk
TREAT Statistic df Sig.
DV 1.00 .940 5 .665
2.00 .943 5 .687
3.00 .860 5 .227



By block: [boxplots of DV for each of the five blocks (n = 3 per block)]

Test of Homogeneity of Variances
          Levene
          Statistic   df1   df2   Sig.
  DV      .552        4     10    .702

Tests of Normality

Shapiro-Wilk
BLOCK Statistic df Sig.
DV 1.00 .993 3 .843
2.00 1.000 3 1.000
3.00 .907 3 .407
4.00 .991 3 .817
5.00 .987 3 .780

But with three observations per block, these tests are essentially
worthless!

No treatment by block interaction:

  [Plot of DV against treatment (Utility, Worry, Comparison), one line per block (Blocks 1-5); the lines are roughly parallel]

It may be difficult to judge the difference between random error and a true block * factor interaction. You are looking for an extreme pattern in the data.

o All the assumptions appear to be satisfied in this case



o What to do if assumptions are not satisfied?

  Non-normality and/or moderate heterogeneity of variances:
    Rank data and perform the analysis on the ranked data

  Heterogeneity of variances and/or treatment by block interaction:
    Transform the data

o Structural model for a randomized block design with one factor and one
block:
Y_ij = μ + τ_j + β_i + ε_ij

  μ = Grand population mean
      estimated by Ȳ..

  τ_j = The treatment effect: the effect of being in level j of factor A
      Στ_j = 0 (fixed treatments) or τ_j ~ N(0, σ²_τ) (random treatments)
      estimated by Ȳ.j - Ȳ..

  β_i = The block effect: the effect of being in level i of the blocking variable
      Σβ_i = 0
      estimated by Ȳi. - Ȳ..

  ε_ij = The unexplained error associated with Y_ij
      estimated by Y_ij - Ȳi. - Ȳ.j + Ȳ..

The randomized block design is identical to a two-factor ANOVA with no interaction term.

In this case, the blocking variable is considered to be a fixed variable. Special accommodations are necessary for a random blocking factor.
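These estimators are easy to verify numerically. A minimal Python sketch (illustration only; the data are the quantifying-risk scores from Example #1) computes the treatment and block effect estimates and confirms that each set sums to zero:

```python
# Quantifying-risk data: rows = blocks 1-5, columns = Utility, Worry, Comparison.
Y = [
    [1, 5, 8],
    [2, 8, 14],
    [7, 9, 16],
    [6, 13, 18],
    [12, 14, 17],
]
bl, a = len(Y), len(Y[0])
grand = sum(map(sum, Y)) / (bl * a)                  # Y-bar..

tau = [sum(Y[i][j] for i in range(bl)) / bl - grand  # treatment effects
       for j in range(a)]
beta = [sum(row) / a - grand for row in Y]           # block effects

# Residuals: e_ij = Y_ij - Ybar_i. - Ybar_.j + Ybar..
resid = [[Y[i][j] - (beta[i] + grand) - (tau[j] + grand) + grand
          for j in range(a)] for i in range(bl)]

# Both sets of effect estimates sum to zero, as the model requires.
assert abs(sum(tau)) < 1e-9 and abs(sum(beta)) < 1e-9
```

For these data the grand mean is 10.0 and the Comparison treatment effect is 14.6 - 10 = 4.6, matching the marginal means in the data table.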



o Sums of squares decomposition and ANOVA table for a randomized
block design:

                                              E(MS)
Source      SS        df            MS        Treatments Fixed           Treatments Random
Treatment   SSA       a-1           MSA       σ² + bl·Στ²_j / (a-1)      σ² + bl·σ²_τ
Blocks      SSBL      bl-1          MSBL      σ² + a·Σβ²_i / (bl-1)      σ² + a·Σβ²_i / (bl-1)
Error       SSError   (a-1)(bl-1)   MSE       σ²                         σ²
Total       SST       N-1

To construct a significance test:

  For fixed treatment effects:     H0: τ1 = τ2 = ... = τa = 0
  For random treatment effects:    H0: σ²_τ = 0

But for either fixed or random effects, we construct the F-test in the same manner:

  F[a-1, (a-1)(bl-1)] = MSA / MSE

To test for the block effect:

  F[bl-1, (a-1)(bl-1)] = MSBL / MSE

However, we are usually not so interested in the test of the blocking variable. We included this variable to reduce the error variability.
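As a check on these formulas, the sketch below recomputes both F-ratios for the quantifying-risk data in plain Python (illustration only); the results agree with the SPSS values reported for this example, F = 33.99 for the treatment and F = 14.36 for the blocks:

```python
# Quantifying-risk data: rows = blocks 1-5, columns = Utility, Worry, Comparison.
Y = [[1, 5, 8], [2, 8, 14], [7, 9, 16], [6, 13, 18], [12, 14, 17]]
bl, a = len(Y), len(Y[0])
N = bl * a
grand = sum(map(sum, Y)) / N

ss_a = bl * sum((sum(Y[i][j] for i in range(bl)) / bl - grand) ** 2
                for j in range(a))
ss_bl = a * sum((sum(row) / a - grand) ** 2 for row in Y)
ss_total = sum((y - grand) ** 2 for row in Y for y in row)
ss_err = ss_total - ss_a - ss_bl

msa = ss_a / (a - 1)
msbl = ss_bl / (bl - 1)
mse = ss_err / ((a - 1) * (bl - 1))

F_treat = msa / mse   # F[2, 8] for the treatment
F_block = msbl / mse  # F[4, 8] for the blocks
```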



o Using SPSS to analyze a randomized block design
UNIANOVA dv BY block treat
/DESIGN = treat block.

Note that a factorial design (treatment, block, and treatment*block) is assumed unless otherwise stated with the DESIGN subcommand.
Tests of Between-Subjects Effects

Dependent Variable: DV
Type III Sum
Source of Squares df Mean Square F Sig.
Corrected Model 374.133a 6 62.356 20.901 .000
Intercept 1500.000 1 1500.000 502.793 .000
TREAT 202.800 2 101.400 33.989 .000
BLOCK 171.333 4 42.833 14.358 .001
Error 23.867 8 2.983
Total 1898.000 15
Corrected Total 398.000 14
a. R Squared = .940 (Adjusted R Squared = .895)

We find a significant treatment effect, F(2, 8) = 33.99, p < .001

  ω̂²_A = (SSA - dfA·MSE) / (SSA + (N - dfA)·MSE)
        = (202.8 - (2)(2.983)) / (202.8 + (15 - 2)(2.983)) = .814

Note that post-hoc tests on the marginal treatment means are required to identify the effect.
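The effect-size formula used in this example, (SSA - dfA·MSE) / (SSA + (N - dfA)·MSE), can be scripted once and reused for both the blocked and the non-blocked analyses (a minimal Python sketch; the numeric inputs come from the SPSS tables in this example):

```python
def effect_size(ss_effect, df_effect, ms_error, n_total):
    """(SS - df*MSE) / (SS + (N - df)*MSE), the formula used in these notes."""
    return ((ss_effect - df_effect * ms_error)
            / (ss_effect + (n_total - df_effect) * ms_error))

# Blocked analysis: MSE = 2.983 on 8 df.
blocked = effect_size(202.8, 2, 2.983, 15)      # about .814
# One-way analysis ignoring blocks: MSWithin = 16.267 on 12 df.
unblocked = effect_size(202.8, 2, 16.267, 15)   # about .41
```

The same treatment SS yields a much larger effect size once the block variability is removed from the error term.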

o What if we had neglected to block by age of participant?


ONEWAY dv BY treat.
ANOVA

DV
Sum of
Squares df Mean Square F Sig.
Between Groups 202.800 2 101.400 6.234 .014
Within Groups 195.200 12 16.267
Total 398.000 14

  ω̂²_A = (SSA - dfA·MSWithin) / (SSA + (N - dfA)·MSWithin)
        = (202.8 - (2)(16.267)) / (202.8 + (15 - 2)(16.267)) = .41

Although inclusion of the blocking effect did not change the conclusion of the statistical test, blocking greatly increased the size of the effect of treatment.



Example #2: Fat in the diet. A researcher studies three low-fat diets. Participants were blocked on the basis of age. DV = post-diet reduction in blood plasma lipid levels.
Fat content of diet
Extremely Fairly Moderately
Block Low Low Low
15-24 .73 .67 .35
25-34 .86 .75 .41
35-44 .94 .81 .46
45-54 1.40 1.32 .95
55-64 1.62 1.41 .98

o First, let's check the assumptions:
EXAMINE VARIABLES=dv BY block fat
/PLOT BOXPLOT NPPLOT.

By block: [boxplots of DV for each of the five age blocks (n = 3 per block)]
By treatment level: [boxplots of DV for each of the three fat levels (n = 5 per group)]

Tests of Normality (Shapiro-Wilk)

  BLOCK   Statistic   df   Sig.        FAT    Statistic   df   Sig.
  1.00    .865        3    .281        1.00   .898        5    .401
  2.00    .920        3    .452        2.00   .829        5    .138
  3.00    .935        3    .506        3.00   .792        5    .070
  4.00    .878        3    .320
  5.00    .962        3    .626

Test of Homogeneity of Variance
                                     Levene
                                     Statistic   df1   df2      Sig.
  DV  Based on Mean                  .336        2     12       .721
      Based on Median                .047        2     12       .954
      Based on Median, adjusted df   .047        2     11.893   .954
      Based on trimmed mean          .302        2     12       .745



Check for treatment by block interaction:

  [Plot of DV against fat level (Extreme, Fair, Moderate), one line per age block; the lines are roughly parallel]

All assumptions seem fine

o To examine the effect of fat in the diet on plasma lipid levels, let's conduct a randomized block ANOVA:

UNIANOVA dv BY block fat
  /DESIGN = fat block.
Tests of Between-Subjects Effects

Dependent Variable: DV
Type III Sum
Source of Squares df Mean Square F Sig.
Corrected Model 2.045a 6 .341 141.102 .000
Intercept 12.440 1 12.440 5151.017 .000
FAT .626 2 .313 129.527 .000
BLOCK 1.419 4 .355 146.890 .000
Error 1.932E-02 8 2.415E-03
Total 14.504 15
Corrected Total 2.064 14
a. R Squared = .991 (Adjusted R Squared = .984)

We find a significant effect of fat in the diet on plasma lipid levels, F(2, 8) = 129.53, p < .001

Let's conduct Tukey HSD post-hoc tests on the marginal treatment means. We can have SPSS do the test for us:
UNIANOVA dv BY fat block
/POSTHOC = fat ( TUKEY )
/DESIGN = fat block .



Multiple Comparisons

Dependent Variable: DV
Tukey HSD

Mean
Difference 95% Confidence Interval
(I) FAT (J) FAT (I-J) Std. Error Sig. Lower Bound Upper Bound
1.00 2.00 .1180* .03108 .013 .0292 .2068
3.00 .4800* .03108 .000 .3912 .5688
2.00 1.00 -.1180* .03108 .013 -.2068 -.0292
3.00 .3620* .03108 .000 .2732 .4508
3.00 1.00 -.4800* .03108 .000 -.5688 -.3912
2.00 -.3620* .03108 .000 -.4508 -.2732
Based on observed means.
*. The mean difference is significant at the .050 level.

Extremely low vs. fairly low fat:      t(8) = 3.80,  p = .013
Extremely low vs. moderately low fat:  t(8) = 15.44, p < .001
Fairly low vs. moderately low fat:     t(8) = 11.65, p < .001
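Each t value above is simply the Tukey mean difference divided by its standard error; a quick Python check (illustration only; the values are taken from the SPSS table):

```python
pairs = {  # (mean difference, std. error) from the SPSS Tukey output
    "extreme_vs_fair": (0.1180, 0.03108),
    "extreme_vs_moderate": (0.4800, 0.03108),
    "fair_vs_moderate": (0.3620, 0.03108),
}
# t = mean difference / standard error, on the 8 error df
t_vals = {name: d / se for name, (d, se) in pairs.items()}
```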

o Note that if we had neglected to block on age, we would have failed to find a significant treatment effect!
ONEWAY dv BY fat.
ANOVA

DV
Sum of
Squares df Mean Square F Sig.
Between Groups .626 2 .313 2.610 .115
Within Groups 1.438 12 .120
Total 2.064 14
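The reason blocking mattered is visible in the sums of squares: the one-way SS Within (1.438) is, up to rounding, the blocked analysis's SS(Block) plus its SS(Error). A quick check in Python (illustration only; values from the two SPSS tables):

```python
ss_block, ss_error_blocked = 1.419, 0.01932  # from the blocked SPSS table
ss_within_oneway = 1.438                     # from the ONEWAY table

# Nearly all of the one-way "error" was block-to-block variability.
assert abs(ss_within_oneway - (ss_block + ss_error_blocked)) < 0.001
share = ss_block / ss_within_oneway          # fraction explained by blocks
```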

o What would happen if we forgot this was a randomized block design, and
attempted to analyze it as a factorial design?
UNIANOVA dv BY fat block
/DESIGN = fat block fat*block.
Tests of Between-Subjects Effects

Dependent Variable: DV
Type III Sum
Source of Squares df Mean Square F Sig.
Corrected Model 2.064a 14 .147 . .
Intercept 12.440 1 12.440 . .
FAT .626 2 .313 . .
BLOCK 1.419 4 .355 . .
FAT * BLOCK 1.932E-02 8 2.415E-03 . .
Error .000 0 .
Total 14.504 15
Corrected Total 2.064 14
a. R Squared = 1.000 (Adjusted R Squared = .)

Why did this happen? With only one observation per cell, the full factorial model is saturated: the FAT*BLOCK interaction absorbs the final 8 degrees of freedom, leaving zero df for error, so no F-tests can be computed.



A final example: A researcher studied how children solved a variety of
puzzles. Sixty children were blocked into groups of 6 on the basis of age,
gender, and IQ. Within each block, children were randomly assigned to
work on a specific type of puzzle. The number of puzzles (out of a possible
20) solved by each child was recorded.

Puzzle Type
Block P1 P2 P3 P4 P5 P6
1 5 14 8 10 11 6
2 7 10 7 9 12 5
3 11 9 10 11 14 6
4 9 10 6 13 15 7
5 13 12 7 14 16 11
6 7 9 8 6 11 5
7 10 11 8 12 13 8
8 4 8 5 7 9 4
9 14 13 11 15 17 12
10 9 9 8 10 14 9

o First, let's check assumptions:
EXAMINE VARIABLES=dv by block puzzle
/PLOT BOXPLOT NPPLOT SPREADLEVEL.

By factor: [boxplots of DV by puzzle type (n = 10 per puzzle); a few cases flagged as potential outliers]

Tests of Normality (Shapiro-Wilk)
  PUZZLE   Statistic   df   Sig.
  1.00     .970        10   .891
  2.00     .924        10   .394
  3.00     .941        10   .560
  4.00     .974        10   .925
  5.00     .979        10   .959
  6.00     .927        10   .415

Test of Homogeneity of Variance
                       Levene
                       Statistic   df1   df2   Sig.
  DV  Based on Mean    1.110       5     54    .366



By block: [boxplots of DV by block (n = 6 per block); a few cases flagged as potential outliers]

Tests of Normality (Shapiro-Wilk)
  BLOCK    Statistic   df   Sig.
  1.00     .969        6    .886
  2.00     .972        6    .907
  3.00     .964        6    .847
  4.00     .952        6    .759
  5.00     .963        6    .846
  6.00     .983        6    .964
  7.00     .918        6    .493
  8.00     .892        6    .331
  9.00     .983        6    .964
  10.00    .750        6    .020

Test of Homogeneity of Variance
                       Levene
                       Statistic   df1   df2   Sig.
  DV  Based on Mean    .521        9     50    .852

Block by factor interaction:

  [Plot of DV against puzzle type (P1-P6), one line per block; no extreme non-parallel pattern]

All appears OK.



Let's start with a general ANOVA approach:
UNIANOVA dv BY puzzle block
/DESIGN = puzzle block.
Tests of Between-Subjects Effects

Dependent Variable: DV
Type III Sum
Source of Squares df Mean Square F Sig.
Corrected Model 488.000a 14 34.857 15.121 .000
Intercept 5684.267 1 5684.267 2465.861 .000
PUZZLE 238.933 5 47.787 20.730 .000
BLOCK 249.067 9 27.674 12.005 .000
Error 103.733 45 2.305
Total 6276.000 60
Corrected Total 591.733 59
a. R Squared = .825 (Adjusted R Squared = .770)

o We find a significant puzzle effect, F(5, 45) = 20.73, p < .001

o To describe specific differences, we conduct pairwise post-hoc tests:
UNIANOVA dv BY puzzle block
/POSTHOC = puzzle ( TUKEY )
/DESIGN = puzzle block.
Multiple Comparisons

Dependent Variable: DV
Tukey HSD

Mean
Difference 95% Confidence Interval
(I) PUZZLE (J) PUZZLE (I-J) Std. Error Sig. Lower Bound Upper Bound
1.00 2.00 -1.6000 .67900 .194 -3.6207 .4207
3.00 1.1000 .67900 .590 -.9207 3.1207
4.00 -1.8000 .67900 .106 -3.8207 .2207
5.00 -4.3000 .67900 .000 -6.3207 -2.2793
6.00 1.6000 .67900 .194 -.4207 3.6207
2.00 3.00 2.7000 .67900 .003 .6793 4.7207
4.00 -.2000 .67900 1.000 -2.2207 1.8207
5.00 -2.7000 .67900 .003 -4.7207 -.6793
6.00 3.2000 .67900 .000 1.1793 5.2207
3.00 4.00 -2.9000 .67900 .001 -4.9207 -.8793
5.00 -5.4000 .67900 .000 -7.4207 -3.3793
6.00 .5000 .67900 .976 -1.5207 2.5207
4.00 5.00 -2.5000 .67900 .008 -4.5207 -.4793
6.00 3.4000 .67900 .000 1.3793 5.4207
5.00 6.00 5.9000 .67900 .000 3.8793 7.9207
Based on observed means.

Puzzle 5 is solved more frequently than all other puzzles
Puzzles 2 and 4 are solved more frequently than puzzles 3 and 6



Alternatively, imagine that you had the following a priori hypotheses:
  o μ_P2 = μ_P4
  o μ_P3 = μ_P6
  o μ_P5 > (μ_P2 + μ_P4)/2 > (μ_P3 + μ_P6)/2

o We cannot enter these contrasts directly into SPSS, so we'll have to do the contrasts by hand.

o Computing and testing a Main Effect Contrast (see 7-39):

  ψ̂ = Σ c_j·Ȳ.j = c1·Ȳ.1 + ... + ca·Ȳ.a

  StdError(ψ̂) = sqrt( MSE · Σ (c²_j / n_j) )

  Where c²_j is the squared weight for each marginal mean,
        n_j is the sample size for each marginal mean, and
        MSE is the MSE from the omnibus ANOVA (with the effects of the blocks removed).

  t_observed = ψ̂ / StdError(ψ̂) = Σ c_j·Ȳ.j / sqrt( MSE · Σ (c²_j / n_j) ) ~ t(dfw)

  SS(ψ̂) = ψ̂² / Σ (c²_j / n_j)

  F(1, dfw) = (SSC / dfc) / (SSE / dfw) = SSC / MSE
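These formulas translate directly into code. A minimal Python sketch (illustration only) applies them to the marginal puzzle means with the MSE from the blocked ANOVA:

```python
def contrast_test(means, n, coefs, mse):
    """Return psi-hat, SS(psi-hat), and F(1, dfw) for a main-effect contrast."""
    psi = sum(c * m for c, m in zip(coefs, means))
    ss = psi ** 2 / sum(c ** 2 / n for c in coefs)
    return psi, ss, ss / mse

# Marginal puzzle means P1-P6 (n = 10 each); MSE = 2.305 on 45 df.
means = [8.9, 10.5, 7.8, 10.7, 13.2, 7.3]
psi3, ss3, F3 = contrast_test(means, 10, [0, -1, 0, -1, 2, 0], mse=2.305)
# psi3 is about 5.2, SS about 45.07, F about 19.55
```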



o Create contrast coefficients:
    μ_P2 = μ_P4                                   (0  1  0  -1  0  0)
    μ_P3 = μ_P6                                   (0  0  1  0  0  -1)
    μ_P5 > (μ_P2 + μ_P4)/2 > (μ_P3 + μ_P6)/2      (0  -1  0  -1  2  0) and (0  1  -1  1  0  -1)

o Compute the value of each contrast:


Descriptive Statistics

Dependent Variable: DV
PUZZLE Mean Std. Deviation N
1.00 8.9000 3.24722 10
2.00 10.5000 1.95789 10
3.00 7.8000 1.75119 10
4.00 10.7000 2.90784 10
5.00 13.2000 2.48551 10
6.00 7.3000 2.66875 10
Total 9.7333 3.16692 60

  (0  1  0  -1  0  0)    ψ̂1 = 10.5 - 10.7 = -0.2                 SS(ψ̂1) = 0.2
  (0  0  1  0  0  -1)    ψ̂2 = 7.8 - 7.3 = 0.5                    SS(ψ̂2) = 1.25
  (0  -1  0  -1  2  0)   ψ̂3 = -10.5 - 10.7 + 2(13.2) = 5.2       SS(ψ̂3) = 45.067
  (0  1  -1  1  0  -1)   ψ̂4 = 10.5 - 7.8 + 10.7 - 7.3 = 6.1      SS(ψ̂4) = 93.025

o Test the contrasts:

  ψ̂1: F(1, 45) = 0.2 / 2.305 = 0.09,      p = .77
  ψ̂2: F(1, 45) = 1.25 / 2.305 = 0.54,     p = .47
  ψ̂3: F(1, 45) = 45.067 / 2.305 = 19.55,  p < .01
  ψ̂4: F(1, 45) = 93.025 / 2.305 = 40.36,  p < .01
2.305

o Note that if these were post-hoc tests, then we would need to apply the Tukey HSD or Scheffé correction.



19. Final considerations about blocking

As shown in the last SPSS output, when there is one participant per cell, the
SS for the interaction is the error term. Some authors create ANOVA tables
with no error term, and use the SS(BL*A) to test the effect of A. The only
difference in these approaches is the labeling of the error term.

If the blocking variable is not related to the DV, then you actually lose
power by including it in the design.

Blocked Design
Source      SS        df            MS      F
Treatment   SSA       a-1           MSA     F[(a-1), (N-a-bl+1)] = MSA / MSE
Blocks      0         bl-1          MSBL
Error       SSError   (a-1)(bl-1)   MSE
Total       SST       N-1

Standard Design
Source      SS        df     MS     F
Treatment   SSA       a-1    MSA    F[(a-1), (N-a)] = MSA / MSE
Within      SSError   N-a    MSE
Total       SST       N-1

o When SSBL = 0, then MSE (in blocked design) = MSW (in the standard
design), so that the F-ratios in the two cases are identical
o But there are fewer degrees of freedom in the error term for the blocked design (N-a-bl+1) than in the standard design (N-a). The loss of these bl-1 dfs results in lower power for the blocked design.

o In reality, the SSBL will never be exactly zero, but when SSBL is small
and the number of blocks is large, you will lose power.



The blocking variable must be a discrete variable. Oftentimes in behavioral research (and in both of our examples) the blocking variable is a continuous variable that must be artificially grouped for the purpose of analysis. When you treat a continuous variable as discrete, you lose information and power. An analysis of covariance (ANCOVA) is similar to a randomized block design, except that the nuisance variables may be continuous.

Testing for non-additivity of treatment effects and blocks:

o If looking at the plot of the DV by blocks makes you feel uneasy (it shouldn't!), a statistical test is available: Tukey's test for non-additivity.

o If you have more than 1 observation per cell, then you have a factorial
design. You can calculate a SS(Bl*A) and test the interaction.

If you want to block on two factors, you can use the same procedure outlined
here. Simply combine the two factors into one block. For example, to block
on age and education:
Young and no education
Young and education
Old and no education
Old and education
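In practice, the combined blocking variable can be built by pasting the two factor labels together. A hypothetical Python sketch (the labels are made up for illustration):

```python
from itertools import product

ages = ["young", "old"]
education = ["no_edu", "edu"]

# Combine the two nuisance factors into one blocking variable,
# one combined label per (age, education) cell.
blocks = [f"{a}_{e}" for a, e in product(ages, education)]
```

With 2 levels of each factor this yields the four combined blocks listed above; the blocked analysis then proceeds exactly as before, with bl = 4.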

