
ANOVA

Model and Matrix Computations


Notation
The following notation is used throughout this chapter unless otherwise stated:
$N$: Number of cases
$F$: Number of factors
$CN$: Number of covariates
$k_i$: Number of levels of factor $i$
$Y_k$: Value of the dependent variable for case $k$
$Z_{jk}$: Value of the $j$th covariate for case $k$
$w_k$: Weight for case $k$
$W$: Sum of weights of all cases

The Model
A linear model with covariates can be written in matrix notation as

$$Y = Xb + ZC + e \qquad (1)$$

where

$Y$: $N \times 1$ vector of values of the dependent variable
$X$: $N \times p$ design matrix of rank $q < p$
$b$: $p \times 1$ vector of parameters
$Z$: $N \times CN$ matrix of covariates
$C$: $CN \times 1$ vector of covariate coefficients
$e$: $N \times 1$ vector of error terms


Constraints
To reparametrize equation (1) to a full-rank model, a set of non-estimable conditions is needed. The constraint imposed on non-regression models is that all parameters involving level 1 of any factor are set to zero. For the regression model, the constraints are that the analysis-of-variance parameter estimates for each main effect and each order of interaction sum to zero. The interactions must also sum to zero over each level of their subscripts. For a standard two-way ANOVA model with main effects $\alpha_i$ and $\beta_j$ and interaction parameters $\gamma_{ij}$, the constraints can be expressed as

$$\alpha_1 = \beta_1 = \gamma_{1j} = \gamma_{i1} = 0 \qquad \text{(non-regression)}$$

$$\alpha_\cdot = \beta_\cdot = \gamma_{\cdot j} = \gamma_{i\cdot} = 0 \qquad \text{(regression)}$$

where $\cdot$ indicates summation over the replaced subscript.

Computation of Matrices
X'X
Non-regression Model

The $X'X$ matrix contains the sums of weights of the cases that contain a particular combination of parameters. All parameters that involve level 1 of any of the factors are excluded from the matrix. For a two-way design with $k_1 = 2$ and $k_2 = 3$, the symmetric $X'X$ matrix would look like the following:

$$\begin{matrix}
 & \alpha_2 & \beta_2 & \beta_3 & \gamma_{22} & \gamma_{23} \\
\alpha_2 & N_{\alpha_2} & N_{\gamma_{22}} & N_{\gamma_{23}} & N_{\gamma_{22}} & N_{\gamma_{23}} \\
\beta_2 & & N_{\beta_2} & 0 & N_{\gamma_{22}} & 0 \\
\beta_3 & & & N_{\beta_3} & 0 & N_{\gamma_{23}} \\
\gamma_{22} & & & & N_{\gamma_{22}} & 0 \\
\gamma_{23} & & & & & N_{\gamma_{23}}
\end{matrix}$$

The elements $N_{\alpha_i}$ or $N_{\beta_j}$ on the diagonal are the sums of weights of cases that have level $i$ of $\alpha$ or level $j$ of $\beta$. Off-diagonal elements are sums of weights of cases cross-classified by parameter combinations. Thus, $N_{\beta_3}$ is the sum of weights of cases in level 3 of main effect $\beta$, while $N_{\gamma_{22}}$ is the sum of weights of cases with $\alpha_2$ and $\beta_2$.
Regression Model

A row of the design matrix $X$ is formed for each case. The row is generated as follows: If a case belongs to one of levels 2 to $k_i$ of factor $i$, a code of 1 is placed in the column corresponding to that level and 0 in all other $k_i - 1$ columns associated with factor $i$. If the case belongs to the first level of factor $i$, $-1$ is placed in all $k_i - 1$ columns associated with factor $i$. This is repeated for each factor. The entries for the interaction terms are obtained as products of the entries in the corresponding main-effect columns. This vector of dummy variables for a case will be denoted $d_i$, $i = 1, \ldots, NC$, where $NC$ is the number of columns in the reparametrized design matrix. After the vector $d$ is generated for case $k$, the $(i, j)$th cell of $X'X$ is incremented by $d_i d_j w_k$, where $i = 1, \ldots, NC$ and $j \le i$.
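The coding rule can be illustrated with a short sketch. The following is a minimal illustration for a two-factor design, not SPSS's actual routine; the function names and the example cases are invented for the illustration.

```python
import numpy as np

def effect_codes(level, n_levels):
    """Design codes for one factor of one case: level 1 -> all -1;
    level l > 1 -> 1 in position l-2, 0 elsewhere (k_i - 1 columns)."""
    d = np.zeros(n_levels - 1)
    if level == 1:
        d[:] = -1.0
    else:
        d[level - 2] = 1.0
    return d

def design_row(levels, n_levels):
    """Main-effect codes for each factor, then the two-way interaction
    entries as products of the corresponding main-effect columns."""
    mains = [effect_codes(l, k) for l, k in zip(levels, n_levels)]
    inter = np.array([a * b for a in mains[0] for b in mains[1]])
    return np.concatenate(mains + [inter])

# Accumulate X'X over cases; the text increments cell (i, j), j <= i, by
# d_i * d_j * w_k -- the full symmetric outer product is used here for brevity.
n_levels = (2, 3)
cases = [((1, 2), 1.0), ((2, 3), 2.0), ((2, 1), 1.0)]   # (factor levels, weight)
NC = sum(k - 1 for k in n_levels) + (n_levels[0] - 1) * (n_levels[1] - 1)
XtX = np.zeros((NC, NC))
for levels, w in cases:
    d = design_row(levels, n_levels)
    XtX += w * np.outer(d, d)
print(XtX)
```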


Checking and Adjustment for the Mean

After all cases have been processed, the diagonal entries of $X'X$ are examined. Rows and columns corresponding to zero diagonals are deleted, and the number of levels of a factor is reduced accordingly. If a factor has only one level, the analysis is terminated with a message. If the first specified level of a factor is missing, the first non-empty level is deleted from the matrix for non-regression models. For regression designs, the first level cannot be missing. All entries of $X'X$ are subsequently adjusted for the means.

The highest order of interactions in the model can be selected; this affects the generation of $X'X$. If none of these options is chosen, the program generates the highest order of interactions allowed by the number of factors. If submatrices corresponding to main effects or interactions in the reparametrized model are not of full rank, a message is printed and the order of the model is reduced accordingly.
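The checking step can be sketched as follows. This assumes that "adjusted for the means" is the usual weighted correction $S - tt'/W$, with $t$ the vector of weighted column totals and $W$ the total weight; the text does not spell the formula out, so this is an assumption, and the helper name is hypothetical.

```python
import numpy as np

def check_and_adjust(XtX, totals, W):
    """Drop rows/columns with zero diagonals, then correct for the means
    (assumed correction: S - t t' / W, with t the weighted column totals)."""
    keep = np.diag(XtX) != 0                  # zero diagonal => empty level
    XtX = XtX[np.ix_(keep, keep)]
    t = totals[keep]
    return XtX - np.outer(t, t) / W
```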


Cross-Product Matrices for Continuous Variables

The provisional means algorithm is used to compute the adjusted-for-the-means cross-product matrices.

Matrix of Covariates Z'Z

The covariance of covariates $m$ and $l$ after case $k$ has been processed is

$$\left(Z'Z_{ml}\right)^{(k)} = \left(Z'Z_{ml}\right)^{(k-1)} + \frac{w_k \left(W_k Z_{lk} - \sum_{j=1}^{k} w_j Z_{lj}\right)\left(W_k Z_{mk} - \sum_{j=1}^{k} w_j Z_{mj}\right)}{W_k W_{k-1}}$$

where $W_k$ is the sum of weights of the first $k$ cases.

The Vector Z'Y

The covariance between the $m$th covariate and the dependent variable after case $k$ has been processed is

$$\left(Z'Y_{m}\right)^{(k)} = \left(Z'Y_{m}\right)^{(k-1)} + \frac{w_k \left(W_k Y_k - \sum_{j=1}^{k} w_j Y_j\right)\left(W_k Z_{mk} - \sum_{j=1}^{k} w_j Z_{mj}\right)}{W_k W_{k-1}}$$

The Scalar Y'Y

The corrected sum of squares for the dependent variable after case $k$ has been processed is

$$\left(Y'Y\right)^{(k)} = \left(Y'Y\right)^{(k-1)} + \frac{w_k \left(W_k Y_k - \sum_{j=1}^{k} w_j Y_j\right)^2}{W_k W_{k-1}}$$
The Vector X'Y

$X'Y$ is a vector with $NC$ rows. The $i$th element is

$$\left(X'Y\right)_i = \sum_{k=1}^{N} \delta_k w_k Y_k$$

where, for the non-regression model, $\delta_k = 1$ if case $k$ has the factor combination in column $i$ of $X'X$, and $\delta_k = 0$ otherwise. For the regression model, $\delta_k = d_i$, where $d_i$ is the dummy variable for column $i$ of case $k$. The final entries are adjusted for the mean.

Matrix X'Z

The $(i, m)$th entry is

$$\left(X'Z\right)_{im} = \sum_{k=1}^{N} \delta_k w_k Z_{mk}$$

where $\delta_k$ has been defined previously. The final entries are adjusted for the mean.
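For the regression model the indicator is just the case's design row, so both accumulations reduce to weighted matrix products. A compact sketch (names mine):

```python
import numpy as np

def accumulate_XtY_XtZ(D, Y, Z, w):
    """D: N x NC design rows, Y: N, Z: N x CN covariates, w: N weights.
    Returns the raw X'Y and X'Z; both still need the adjustment for the
    means described in the text."""
    XtY = D.T @ (w * Y)             # element i: sum_k d_ik * w_k * Y_k
    XtZ = D.T @ (Z * w[:, None])    # entry (i, m): sum_k d_ik * w_k * Z_mk
    return XtY, XtZ
```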

Computation of ANOVA Sum of Squares


The full-rank model with covariates

$$Y = Xb + ZC + e$$

can also be expressed as

$$Y = X_k b_k + X_m b_m + ZC + e$$

where $X$ and $b$ are partitioned as

$$X = \left( X_k \mid X_m \right) \qquad \text{and} \qquad b = \begin{pmatrix} b_k \\ b_m \end{pmatrix}$$


The normal equations are then

$$\begin{pmatrix} Z'Z & Z'X_k & Z'X_m \\ X_k'Z & X_k'X_k & X_k'X_m \\ X_m'Z & X_m'X_k & X_m'X_m \end{pmatrix} \begin{pmatrix} C \\ b_k \\ b_m \end{pmatrix} = \begin{pmatrix} Z'Y \\ X_k'Y \\ X_m'Y \end{pmatrix} \qquad (2)$$

The normal equations for any reduced model can be obtained by excluding the entries in equation (2) that correspond to terms not appearing in the reduced model. Thus, for the model excluding $b_m$,

$$Y = X_k b_k + ZC + e$$

the solution to the normal equations is

$$\begin{pmatrix} \tilde{C} \\ \tilde{b}_k \end{pmatrix} = \begin{pmatrix} Z'Z & Z'X_k \\ X_k'Z & X_k'X_k \end{pmatrix}^{-1} \begin{pmatrix} Z'Y \\ X_k'Y \end{pmatrix} \qquad (3)$$

The sum of squares due to fitting the complete model (the explained SS) is

$$R\left(C, b_k, b_m\right) = \begin{pmatrix} \hat{C}', \hat{b}_k', \hat{b}_m' \end{pmatrix} \begin{pmatrix} Z'Y \\ X_k'Y \\ X_m'Y \end{pmatrix} = \hat{C}'Z'Y + \hat{b}_k'X_k'Y + \hat{b}_m'X_m'Y$$

For the reduced model, it is

$$R\left(C, b_k\right) = \begin{pmatrix} \tilde{C}', \tilde{b}_k' \end{pmatrix} \begin{pmatrix} Z'Y \\ X_k'Y \end{pmatrix} = \tilde{C}'Z'Y + \tilde{b}_k'X_k'Y$$

The residual (unexplained) sum of squares for the complete model is $RSS = Y'Y - R\left(C, b_k, b_m\right)$, and similarly for the reduced model. The total sum of squares is $Y'Y$. The reduction in the sum of squares due to including $b_m$ in a model that already includes $b_k$ and $C$ will be denoted $R\left(b_m \mid C, b_k\right)$. This can also be expressed as

$$R\left(b_m \mid C, b_k\right) = R\left(C, b_k, b_m\right) - R\left(C, b_k\right)$$


There are several ways to compute $R\left(b_m \mid C, b_k\right)$. The sum of squares due to the full model and the sum of squares due to the reduced model can each be calculated, and the difference obtained (Method 1):

$$R\left(b_m \mid C, b_k\right) = \hat{C}'Z'Y + \hat{b}_k'X_k'Y + \hat{b}_m'X_m'Y - \tilde{C}'Z'Y - \tilde{b}_k'X_k'Y$$

A sometimes computationally more efficient procedure is to calculate

$$R\left(b_m \mid C, b_k\right) = \hat{b}_m' T_m^{-1} \hat{b}_m$$

where $\hat{b}_m$ are the estimates obtained from fitting the full model and $T_m$ is the partition of the inverse matrix corresponding to $b_m$ (Method 2):

$$\begin{pmatrix} Z'Z & Z'X_k & Z'X_m \\ X_k'Z & X_k'X_k & X_k'X_m \\ X_m'Z & X_m'X_k & X_m'X_m \end{pmatrix}^{-1} = \begin{pmatrix} T_c & T_{ck} & T_{cm} \\ T_{kc} & T_k & T_{km} \\ T_{mc} & T_{mk} & T_m \end{pmatrix}$$

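The equivalence of the two methods is easy to verify numerically. The following sketch uses arbitrary random data, centered so that all cross-products are adjusted for the mean; the partition sizes are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 40
Zc = rng.normal(size=(N, 2))             # covariates Z
Xk = rng.normal(size=(N, 3))             # partition X_k
Xm = rng.normal(size=(N, 2))             # partition X_m
Y = rng.normal(size=N)
Zc, Xk, Xm, Y = (a - a.mean(0) for a in (Zc, Xk, Xm, Y))  # adjust for means

G = np.column_stack([Zc, Xk, Xm])        # [Z | X_k | X_m]
A = G.T @ G                              # coefficient matrix of equation (2)
rhs = G.T @ Y                            # (Z'Y, X_k'Y, X_m'Y)
sol = np.linalg.solve(A, rhs)
R_full = sol @ rhs                       # R(C, b_k, b_m)

Gr = np.column_stack([Zc, Xk])           # reduced model, equation (3)
rhs_r = Gr.T @ Y
R_red = np.linalg.solve(Gr.T @ Gr, rhs_r) @ rhs_r   # R(C, b_k)

method1 = R_full - R_red                 # R(b_m | C, b_k) by subtraction

T = np.linalg.inv(A)
Tm = T[-2:, -2:]                         # partition of the inverse for b_m
bm = sol[-2:]                            # b_m estimates from the full fit
method2 = bm @ np.linalg.solve(Tm, bm)   # b_m' T_m^{-1} b_m
assert np.isclose(method1, method2)
```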
Model and Options


Notation
Let b be partitioned as

$$b = \begin{pmatrix} M \\ D \end{pmatrix} = \begin{pmatrix} m_1 \\ \vdots \\ m_F \\ d_1 \\ \vdots \\ d_{F-1} \end{pmatrix}$$

where

$M$: Vector of main-effect coefficients
$m_i$: Vector of coefficients for main effect $i$
$M^{(i)}$: $M$ excluding $m_i$
$\tilde{M}_i$: $M$ including only $m_1$ through $m_{i-1}$
$D$: Vector of interaction coefficients
$d_k$: Vector of $k$th-order interaction coefficients
$d_{ki}$: Vector of coefficients for the $i$th of the $k$th-order interactions
$D^{(k)}$: $D$ excluding $d_k$
$\tilde{D}_k$: $D$ including only $d_1$ through $d_{k-1}$
$d_k^{(i)}$: $d_k$ excluding $d_{ki}$
$C$: Vector of covariate coefficients
$c_i$: The $i$th covariate coefficient
$C^{(i)}$: $C$ excluding $c_i$
$\tilde{C}_i$: $C$ including only $c_1$ through $c_{i-1}$

Models
Different types of sums of squares can be calculated in ANOVA.

Sum of Squares for Type of Effects


Experimental and Hierarchical
  Covariates: $R(C)$
  Main effects: $R(M \mid C)$
  Interactions: $R(d_k \mid C, M, \tilde{D}_k)$

Covariates with Main Effects
  Covariates: $R(C, M)$
  Main effects: $R(C, M)$
  Interactions: $R(d_k \mid C, M, \tilde{D}_k)$

Covariates after Main Effects
  Covariates: $R(C \mid M)$
  Main effects: $R(M)$
  Interactions: $R(d_k \mid C, M, \tilde{D}_k)$

Regression
  Covariates: $R(C \mid M, D)$
  Main effects: $R(M \mid C, D)$
  Interactions: $R(d_k \mid C, M, \tilde{D}_k)$


All sums of squares are calculated as described in the introduction. Reductions in sums of squares, $R(A \mid B)$, are computed using Method 1. Since all cross-product matrices have been corrected for the mean, all sums of squares are adjusted for the mean.

Sums of Squares Within Effects


Default (experimental)
  Covariates: $R(c_i \mid C^{(i)})$
  Main effects: $R(m_i \mid C, M^{(i)})$
  Interactions: $R(d_{ki} \mid C, M, \tilde{D}_k, d_k^{(i)})$

Covariates with Main Effects
  Covariates: $R(c_i \mid M, C^{(i)})$
  Main effects: $R(m_i \mid C, M^{(i)})$
  Interactions: same as default

Covariates after Main Effects
  Covariates: $R(c_i \mid M, C^{(i)})$
  Main effects: $R(m_i \mid M^{(i)})$
  Interactions: same as default

Regression
  Covariates: $R(c_i \mid M, C^{(i)}, D)$
  Main effects: $R(m_i \mid M^{(i)}, C, D)$
  Interactions: same as default

Hierarchical
  Covariates: $R(c_i \mid \tilde{C}_i)$
  Main effects: $R(m_i \mid C, \tilde{M}_i)$
  Interactions: $R(d_{ki} \mid C, M, \tilde{D}_k, \tilde{d}_{ki})$, where $\tilde{d}_{ki}$ is $d_k$ including only $d_{k1}$ through $d_{k,i-1}$

Hierarchical and Covariates with Main Effects, or Hierarchical and Covariates after Main Effects
  Covariates: $R(c_i \mid \tilde{C}_i, M)$
  Main effects: $R(m_i \mid \tilde{M}_i)$
  Interactions: same as default

Reductions in sums of squares are calculated using Method 2, except for specifications involving the Hierarchical approach. For these, Method 1 is used. All sums of squares are adjusted for the mean.

Degrees of Freedom
Main Effects

$$df_M = \sum_{i=1}^{F} \left(k_i - 1\right)$$


Main Effect i

$$k_i - 1$$

Covariates

$$df_C = CN$$

Covariate $i$

$$df_{c_i} = 1$$

Interactions $d_r$

$df_r$ = number of linearly independent columns corresponding to interaction $d_r$ in $X'X$

Interactions $d_{ri}$

$df_{ri}$ = number of independent columns corresponding to interaction $d_{ri}$ in $X'X$

Model

$$df_{\text{Model}} = df_M + df_C + \sum_{r=1}^{F-1} df_r$$


Residual

$$W - 1 - df_{\text{Model}}$$

Total

$$W - 1$$
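As a toy check of the bookkeeping, the following computes the model degrees of freedom for the two-factor example used earlier ($k_1 = 2$, $k_2 = 3$), assuming one covariate and a full-rank two-way interaction; the numbers are invented for the illustration.

```python
# Degrees of freedom for a two-factor design with one covariate.
k = [2, 3]                              # levels per factor
CN = 1                                  # number of covariates
df_M = sum(ki - 1 for ki in k)          # main effects: 1 + 2 = 3
df_C = CN                               # covariates: 1
df_r = (k[0] - 1) * (k[1] - 1)          # interaction columns, 2 if full rank
df_model = df_M + df_C + df_r           # 6
print(df_model)
```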

Multiple Classification Analysis


Notation
$Y_{ijk}$: Value of the dependent variable for the $k$th case in level $j$ of main effect $i$
$n_{ij}$: Sum of weights of observations in level $j$ of main effect $i$
$k_i$: Number of nonempty levels in the $i$th main effect
$W$: Sum of weights of all observations

Basic Computations
Mean of Dependent Variable in Level j of Main Effect i
$$\bar{Y}_{ij} = \frac{\sum_{k=1}^{n_{ij}} Y_{ijk}}{n_{ij}}$$

Grand Mean

$$\bar{Y} = \frac{\sum_{i} \sum_{j} \sum_{k} Y_{ijk}}{W}$$


Coefficient Estimates

The coefficients for the main-effects-only model, $b_{ij}$, and the coefficients for the main-effects-and-covariates-only model, $\tilde{b}_{ij}$, are obtained as previously described.

Calculation of the MCA Statistics (Andrews et al., 1973)


Deviations

For each level of each main effect, the following are computed:

Unadjusted Deviations

The unadjusted deviation from the grand mean for the $j$th level of the $i$th factor is

$$m_{ij} = \bar{Y}_{ij} - \bar{Y}$$

Deviations Adjusted for the Main Effects

$$m_{ij}^{1} = b_{ij} - \frac{\sum_{j=2}^{k_i} b_{ij} n_{ij}}{W}, \qquad \text{where } b_{i1} = 0$$

Deviations Adjusted for Main Effects and Covariates (Only for Models with Covariates)

$$m_{ij}^{2} = \tilde{b}_{ij} - \frac{\sum_{j=2}^{k_i} \tilde{b}_{ij} n_{ij}}{W}, \qquad \text{where } \tilde{b}_{i1} = 0$$


ETA and Beta Coefficients

For each main effect $i$, the following are computed:

$$\text{ETA}_i = \sqrt{\frac{\sum_{j=2}^{k_i} n_{ij} \left(\bar{Y}_{ij} - \bar{Y}\right)^2}{Y'Y}}$$

Beta Adjusted for Main Effects

$$\text{Beta}_i = \sqrt{\frac{\sum_{j=2}^{k_i} n_{ij} \left(m_{ij}^{1}\right)^2}{Y'Y}}$$

Beta Adjusted for Main Effects and Covariates

$$\text{Beta}_i = \sqrt{\frac{\sum_{j=2}^{k_i} n_{ij} \left(m_{ij}^{2}\right)^2}{Y'Y}}$$
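A toy illustration of the deviation formulas for one main effect, assuming unit weights so that the $n_{ij}$ are simple counts; the data and the adjusted coefficients $b_{ij}$ are invented (in practice they come from the fitted main-effects model described above).

```python
import numpy as np

levels = np.array([1, 1, 2, 2, 3, 3, 3])            # level of main effect i per case
Y = np.array([3.0, 4.0, 6.0, 5.0, 9.0, 8.0, 10.0])
W = len(Y)                                           # unit weights
grand = Y.mean()

# Unadjusted deviations m_ij = Ybar_ij - Ybar
for j in (1, 2, 3):
    print(j, Y[levels == j].mean() - grand)

# Deviations adjusted for the main effects:
# m1_ij = b_ij - sum_{j=2}^{k_i} b_ij n_ij / W, with b_i1 = 0
b = {1: 0.0, 2: -1.2, 3: 2.1}                        # illustrative coefficients
offset = sum(b[j] * np.sum(levels == j) for j in (2, 3)) / W
m1 = {j: b[j] - offset for j in (1, 2, 3)}
print(m1)
```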

Squared Multiple Correlation Coefficients

Main effects model:

$$R_m^2 = \frac{R(M)}{Y'Y}$$

Main effects and covariates model:

$$R_{mc}^2 = \frac{R(M, C)}{Y'Y}$$

The computations of $R(M)$, $R(M, C)$, and $Y'Y$ are outlined previously.


Unstandardized Regression Coefficients for Covariates


Estimates for the C vector, which are obtained the first time covariates are entered into the model, are printed.

Cell Means and Sample Sizes


Cell means and sample sizes for each combination of factor levels are obtained from the $X'Y$ and $X'X$ matrices prior to correction for the mean:

$$\bar{Y}_i = \frac{\left(X'Y\right)_i}{\left(X'X\right)_{ii}}, \qquad i = 1, \ldots, NC$$

Means for combinations involving the first level of a factor are obtained by subtraction from marginal totals.

Matrix Inversion
The Cholesky decomposition (Stewart, 1973) is used to triangularize the matrix. If the tolerance is less than $10^{-5}$, the matrix is considered singular.
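A hedged sketch of the triangularization with a singularity check follows. The text does not define the tolerance precisely, so this assumes a common convention, the remaining pivot divided by the original diagonal entry; the function name is hypothetical.

```python
import numpy as np

def cholesky_with_tolerance(A, tol=1e-5):
    """Triangularize A = L L'; flag the matrix as singular when the remaining
    pivot, relative to the original diagonal entry, falls below tol."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros_like(A)
    for i in range(n):
        pivot = A[i, i] - L[i, :i] @ L[i, :i]
        if A[i, i] <= 0 or pivot / A[i, i] < tol:   # assumed tolerance criterion
            raise np.linalg.LinAlgError("matrix considered singular")
        L[i, i] = np.sqrt(pivot)
        for j in range(i + 1, n):
            L[j, i] = (A[j, i] - L[j, :i] @ L[i, :i]) / L[i, i]
    return L

A = np.array([[4.0, 2.0], [2.0, 3.0]])
L = cholesky_with_tolerance(A)
assert np.allclose(L @ L.T, A)
```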

References
Andrews, F., Morgan, J., Sonquist, J., and Klem, L. 1973. Multiple classification analysis. Ann Arbor, Mich.: University of Michigan.

Searle, S. R. 1966. Matrix algebra for the biological sciences. New York: John Wiley & Sons.

Searle, S. R. 1971. Linear models. New York: John Wiley & Sons.

Stewart, G. W. 1973. Introduction to matrix computations. New York: Academic Press.
