Notation

The following notation is used throughout this chapter:

k_i    Number of levels of factor i
Y_k    Value of the dependent variable for case k
Z_jk   Value of the jth covariate for case k
w_k    Weight for case k
W      Sum of weights of all cases
The Model
A linear model with covariates can be written in matrix notation as

Y = Xb + ZC + e    (1)

where Y is the vector of values of the dependent variable, X is the design matrix of dummy variables, b is the vector of analysis-of-variance parameters, Z is the matrix of covariate values, C is the vector of covariate coefficients, and e is the vector of errors.
2 ANOVA
Constraints
To reparametrize equation (1) to a full-rank model, a set of non-estimable conditions is needed. The constraint imposed on non-regression models is that all parameters involving level 1 of any factor are set to zero. For regression models, the constraints are that the analysis-of-variance parameter estimates for each main effect and each order of interaction sum to zero. The interaction parameters must also sum to zero over each level of the subscripts. For a standard two-way ANOVA model with main effects αᵢ and βⱼ and interaction parameters γᵢⱼ, the constraints can be expressed as

α₁ = β₁ = γ₁ⱼ = γᵢ₁ = 0    (non-regression)

Σᵢ αᵢ = Σⱼ βⱼ = Σᵢ γᵢⱼ = Σⱼ γᵢⱼ = 0    (regression)
Computation of Matrices
X'X
Non-regression Model
The X'X matrix contains the sums of weights of the cases that contain a particular combination of parameters. All parameters that involve level 1 of any of the factors are excluded from the matrix. For a two-way design with k₁ = 2 and k₂ = 3, the symmetric matrix would look like the following:

        α₂     β₂     β₃     γ₂₂    γ₂₃
α₂     Nα₂    N₂₂    N₂₃    N₂₂    N₂₃
β₂     N₂₂    Nβ₂    0      N₂₂    0
β₃     N₂₃    0      Nβ₃    0      N₂₃
γ₂₂    N₂₂    N₂₂    0      N₂₂    0
γ₂₃    N₂₃    0      N₂₃    0      N₂₃
The elements Nαᵢ or Nβⱼ on the diagonal are the sums of weights of cases that have level i of α or level j of β. Off-diagonal elements are sums of weights of cases cross-classified by particular parameter combinations. Thus, Nβ₃ is the sum of weights of cases in level 3 of main effect β, while N₂₂ is the sum of weights of cases with α₂ and β₂.
Regression Model
A row of the design matrix X is formed for each case. The row is generated as follows: if a case belongs to one of levels 2 to kᵢ of factor i, a code of 1 is placed in the column corresponding to that level and 0 in all other kᵢ − 1 columns associated with factor i. If the case belongs to the first level of factor i, −1 is placed in all the kᵢ − 1 columns associated with factor i. This is repeated for each factor. The entries for the interaction terms are obtained as products of the entries in the corresponding main-effect columns. This vector of dummy variables for a case will be denoted d_i, i = 1, ..., NC, where NC is the number of columns in the reparametrized design matrix. After the vector d is generated for case k, the ijth cell of X'X is incremented by d_i d_j w_k, where i = 1, ..., NC and j ≤ i.
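As a concrete illustration, the row generation and X'X accumulation just described can be sketched in Python (the function names and data layout here are mine, not part of the original program):

```python
import numpy as np

def design_row(levels, n_levels):
    """Build one case's main-effect dummy row using the coding above:
    level j > 1 of factor i puts a 1 in that level's column and 0 in the
    rest; level 1 puts -1 in all k_i - 1 columns of that factor."""
    row = []
    for lev, k in zip(levels, n_levels):
        cols = [-1.0] * (k - 1) if lev == 1 else [0.0] * (k - 1)
        if lev > 1:
            cols[lev - 2] = 1.0
        row.extend(cols)
    return np.array(row)

def weighted_xtx(cases, weights, n_levels):
    """Accumulate X'X one case at a time: add d_i * d_j * w_k for case k."""
    nc = sum(k - 1 for k in n_levels)
    xtx = np.zeros((nc, nc))
    for levels, w in zip(cases, weights):
        d = design_row(levels, n_levels)
        xtx += w * np.outer(d, d)
    return xtx
```

Interaction columns, omitted here for brevity, would be formed as products of the corresponding main-effect entries before the outer product is accumulated.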
After all cases have been processed, the diagonal entries of X'X are examined. Rows and columns corresponding to zero diagonals are deleted, and the number of levels of a factor is reduced accordingly. If a factor has only one level, the analysis is terminated with a message. If the first specified level of a factor is missing, the first non-empty level is deleted from the matrix for non-regression models. For regression designs, the first level cannot be missing. All entries of X'X are subsequently adjusted for means.

The highest order of interactions in the model can be selected. This will affect the generation of X'X. If none of these options is chosen, the program generates the highest order of interactions allowed by the number of factors. If submatrices corresponding to main effects or interactions in the reparametrized model are not of full rank, a message is printed and the order of the model is reduced accordingly.

Cross-Product Matrices for Continuous Variables

A provisional means algorithm is used to compute the adjusted-for-the-means cross-product matrices.
Matrix of Covariates Z'Z

The covariance of covariates m and l after case k has been processed is

Z'Z_{ml}^{(k)} = Z'Z_{ml}^{(k-1)} + \frac{w_k \left( W_{k-1} Z_{mk} - \sum_{j=1}^{k-1} w_j Z_{mj} \right) \left( W_{k-1} Z_{lk} - \sum_{j=1}^{k-1} w_j Z_{lj} \right)}{W_k W_{k-1}}
where W_k is the sum of weights of the first k cases.

The Vector Z'Y

The covariance between the mth covariate and the dependent variable after case k has been processed is

Z'Y_{m}^{(k)} = Z'Y_{m}^{(k-1)} + \frac{w_k \left( W_{k-1} Y_k - \sum_{j=1}^{k-1} w_j Y_j \right) \left( W_{k-1} Z_{mk} - \sum_{j=1}^{k-1} w_j Z_{mj} \right)}{W_k W_{k-1}}
The Scalar Y'Y

The corrected sum of squares for the dependent variable after case k has been processed is

Y'Y^{(k)} = Y'Y^{(k-1)} + \frac{w_k \left( W_{k-1} Y_k - \sum_{j=1}^{k-1} w_j Y_j \right)^2}{W_k W_{k-1}}
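The three provisional-means updates above share one pass over the cases; a compact pure-Python sketch (names of my own choosing) is:

```python
def corrected_crossproducts(y, z, w):
    """One-pass provisional-means accumulation of the corrected sum of
    squares Y'Y, the covariate/dependent crossproducts Z'Y, and the
    covariate crossproducts Z'Z, following the updates above."""
    n_cov = len(z[0])
    W = 0.0               # running sum of weights W_k
    ty = 0.0              # running total of w_j * Y_j
    tz = [0.0] * n_cov    # running totals of w_j * Z_mj
    yy = 0.0
    zy = [0.0] * n_cov
    zz = [[0.0] * n_cov for _ in range(n_cov)]
    for yk, zk, wk in zip(y, z, w):
        W_prev = W
        W = W + wk
        if W_prev > 0.0:  # the first case contributes nothing (W_0 = 0)
            dy = W_prev * yk - ty
            dz = [W_prev * zm - t for zm, t in zip(zk, tz)]
            f = wk / (W * W_prev)
            yy += f * dy * dy
            for m in range(n_cov):
                zy[m] += f * dz[m] * dy
                for l in range(n_cov):
                    zz[m][l] += f * dz[m] * dz[l]
        ty += wk * yk
        tz = [t + wk * zm for t, zm in zip(tz, zk)]
    return yy, zy, zz
```

Because each update uses the running totals over the first k − 1 cases, the contributions must be added before the totals are advanced.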
The Vector X'Y

X'Y_i = \sum_k \delta_k Y_k w_k
where, for non-regression models, δ_k = 1 if case k has the factor combination in column i of X'X and δ_k = 0 otherwise. For regression models, δ_k = d_i, where d_i is the dummy variable for column i of case k. The final entries are adjusted for the means.

The Matrix X'Z

The (i, m)th entry is
X'Z_{im} = \sum_k \delta_k Z_{mk} w_k

where δ_k is defined as above. The final entries are adjusted for the means.
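For the non-regression case, the δ_k-based accumulation of X'Y and X'Z can be sketched as follows (the `columns` encoding of factor-level combinations is a hypothetical representation of my own):

```python
def xty_xtz(cases, y, z, w, columns):
    """Accumulate X'Y and X'Z for a non-regression model. Each entry of
    `columns` lists the (factor, level) pairs a case must carry for
    delta_k to be 1 for that column."""
    nc = len(columns)
    n_cov = len(z[0])
    xty = [0.0] * nc
    xtz = [[0.0] * n_cov for _ in range(nc)]
    for levels, yk, zk, wk in zip(cases, y, z, w):
        for i, combo in enumerate(columns):
            # delta_k: case matches every (factor, level) pair of column i
            if all(levels[f] == lev for f, lev in combo):
                xty[i] += yk * wk
                for m in range(n_cov):
                    xtz[i][m] += zk[m] * wk
    return xty, xtz
```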
The model

Y = Xb + ZC + e

can also be expressed as

Y = X_k b_k + X_m b_m + ZC + e

where X and b are partitioned as

X = \left( X_k \mid X_m \right) \quad \text{and} \quad b = \begin{pmatrix} b_k \\ b_m \end{pmatrix}
The normal equations are then

\begin{pmatrix} Z'Z & Z'X_k & Z'X_m \\ X_k'Z & X_k'X_k & X_k'X_m \\ X_m'Z & X_m'X_k & X_m'X_m \end{pmatrix} \begin{pmatrix} \hat{C} \\ \hat{b}_k \\ \hat{b}_m \end{pmatrix} = \begin{pmatrix} Z'Y \\ X_k'Y \\ X_m'Y \end{pmatrix}    (2)
The normal equations for any reduced model can be obtained by excluding those entries from equation (2) that correspond to terms not appearing in the reduced model. Thus, for the model excluding b_m,

Y = X_k b_k + ZC + e

the solution to the normal equations is
\begin{pmatrix} \tilde{C} \\ \tilde{b}_k \end{pmatrix} = \begin{pmatrix} Z'Z & Z'X_k \\ X_k'Z & X_k'X_k \end{pmatrix}^{-1} \begin{pmatrix} Z'Y \\ X_k'Y \end{pmatrix}    (3)
The sum of squares due to fitting the complete model (explained SS) is
R(C, b_k, b_m) = \left( \hat{C}', \hat{b}_k', \hat{b}_m' \right) \begin{pmatrix} Z'Y \\ X_k'Y \\ X_m'Y \end{pmatrix} = \hat{C}' Z'Y + \hat{b}_k' X_k'Y + \hat{b}_m' X_m'Y
The sum of squares due to fitting the reduced model is

R(C, b_k) = \left( \tilde{C}', \tilde{b}_k' \right) \begin{pmatrix} Z'Y \\ X_k'Y \end{pmatrix} = \tilde{C}' Z'Y + \tilde{b}_k' X_k'Y
The residual (unexplained) sum of squares for the complete model is RSS = Y'Y − R(C, b_k, b_m), and similarly for the reduced model. The total sum of squares is Y'Y. The reduction in the sum of squares due to including b_m in a model that already includes b_k and C will be denoted R(b_m | C, b_k). It can also be expressed as

R(b_m | C, b_k) = R(C, b_k, b_m) − R(C, b_k)
There are several ways to compute R(b_m | C, b_k). The sum of squares due to the full model and the sum of squares due to the reduced model can each be calculated, and the difference obtained (Method 1):
R(b_m | C, b_k) = \hat{C}' Z'Y + \hat{b}_k' X_k'Y + \hat{b}_m' X_m'Y - \tilde{C}' Z'Y - \tilde{b}_k' X_k'Y
A sometimes computationally more efficient procedure is to calculate
R(b_m | C, b_k) = \hat{b}_m' T_m^{-1} \hat{b}_m

where \hat{b}_m contains the estimates obtained from fitting the full model and T_m is the partition of the inverse matrix corresponding to b_m (Method 2):
\begin{pmatrix} Z'Z & Z'X_k & Z'X_m \\ X_k'Z & X_k'X_k & X_k'X_m \\ X_m'Z & X_m'X_k & X_m'X_m \end{pmatrix}^{-1} = \begin{pmatrix} T_c & T_{ck} & T_{cm} \\ T_{kc} & T_k & T_{km} \\ T_{mc} & T_{mk} & T_m \end{pmatrix}
To describe the individual sums of squares, partition b into main-effect and interaction coefficients,

b = \begin{pmatrix} M \\ D \end{pmatrix}

where

M       Vector of main-effect coefficients
m_i     Vector of coefficients for main effect i
M_(i)   M excluding m_i
M_i     M including only m_1 through m_{i-1}
D       Vector of interaction coefficients
d_k     Vector of kth-order interaction coefficients
d_ki    Vector of coefficients for the ith of the kth-order interactions
D_(k)   D excluding d_k
D_k     D including only d_1 through d_{k-1}
d_k(i)  d_k excluding d_ki
C       Vector of covariate coefficients
c_i     Covariate coefficient i
C_(i)   C excluding c_i
C_i     C including only c_1 through c_{i-1}
Models
Different types of sums of squares can be calculated in ANOVA.
Covariates      R(C)
Main Effects    R(M | C)
Interactions    R(D | C, M)
All sums of squares are calculated as described in the introduction. Reductions in sums of squares, R(A | B), are computed using Method 1. Since all cross-product matrices have been corrected for the mean, all sums of squares are adjusted for the mean.
Hierarchical, Hierarchical and Covariates with Main Effects, or Hierarchical and Covariates after Main Effects

Covariates      R(c_i | c_1, ..., c_{i-1})
Main Effects    R(m_i | C, M_i)
Interactions    R(d_ki | C, M, D_k, d_k(i))
Reductions in sums of squares are calculated using Method 2, except for specifications involving the Hierarchical approach. For these, Method 1 is used. All sums of squares are adjusted for the mean.
Degrees of Freedom
Main Effects
df_M = \sum_{i=1}^{F} (k_i - 1)

where F is the number of factors.
Main Effect i

k_i − 1
Covariates
df_C = CN

where CN is the number of covariates.

Covariate i

1

Interactions d_r

The number of dummy-variable columns generated for the rth-order interactions d_r
Model
df_{Model} = df_M + df_C + \sum_{r=1}^{F-1} df_{d_r}
Residual
W − 1 − df_{Model}

Total

W − 1
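Putting the counts together, a small helper can compute every degree-of-freedom entry at once (the function name and argument layout are mine; `df_interactions` holds the per-order interaction dfs):

```python
def anova_dfs(n_levels, n_covariates, df_interactions, W):
    """Degrees of freedom as defined above: main effects contribute
    k_i - 1 each, covariates contribute 1 each, and the model, residual,
    and total dfs follow from the sum of weights W."""
    df_main = sum(k - 1 for k in n_levels)
    df_cov = n_covariates
    df_model = df_main + df_cov + sum(df_interactions)
    return df_main, df_cov, df_model, W - 1 - df_model, W - 1
```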
Basic Computations
Mean of Dependent Variable in Level j of Main Effect i
\bar{Y}_{ij} = \frac{\sum_{k=1}^{n_{ij}} Y_{ijk}}{n_{ij}}
Grand Mean
\bar{Y} = \frac{\sum_i \sum_j \sum_k Y_{ijk}}{W}
Coefficient Estimates

The coefficients for the main-effects-only model, b_{ij}, and the coefficients for the main-effects-and-covariates-only model, \tilde{b}_{ij}, are obtained as previously described.
Unadjusted Deviations

The deviation of each level's mean from the grand mean is

m_{ij} = \bar{Y}_{ij} - \bar{Y}

Deviations Adjusted for Main Effects

m_{ij}^1 = b_{ij} - \sum_{j=2}^{k_i} b_{ij} n_{ij} / W, \quad \text{where } b_{i1} = 0
Deviations Adjusted for Main Effects and Covariates (Only for Models with Covariates)
m_{ij}^2 = \tilde{b}_{ij} - \sum_{j=2}^{k_i} \tilde{b}_{ij} n_{ij} / W, \quad \text{where } \tilde{b}_{i1} = 0
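The adjusted-deviation formulas amount to centering one factor's coefficients by their weighted mean; a sketch (function name and argument layout are mine):

```python
def adjusted_deviations(b_ij, n_ij, W):
    """Deviations adjusted for main effects for one factor: subtract the
    weighted mean of the coefficients, where the sum runs over levels
    j >= 2 because the level-1 coefficient is zero. b_ij[0] is the
    (zero) coefficient for level 1."""
    shift = sum(b * n for b, n in zip(b_ij[1:], n_ij[1:])) / W
    return [b - shift for b in b_ij]
```

With b_ij = [0.0, 2.0], n_ij = [3.0, 1.0], and W = 4.0 this yields [-0.5, 1.5], whose weighted mean over the two levels is zero.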
ETA and Beta Coefficients

For each main effect i, the following are computed:

ETA_i = \sqrt{ \sum_{j=2}^{k_i} n_{ij} \left( \bar{Y}_{ij} - \bar{Y} \right)^2 / Y'Y }

Beta adjusted for main effects:

Beta_i = \sqrt{ \sum_{j=2}^{k_i} n_{ij} \left( m_{ij}^1 \right)^2 / Y'Y }

Beta adjusted for main effects and covariates:

Beta_i = \sqrt{ \sum_{j=2}^{k_i} n_{ij} \left( m_{ij}^2 \right)^2 / Y'Y }
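Each of these coefficients is the square root of a weighted sum of squared deviations divided by the corrected total sum of squares; a sketch (the helper name is mine, and the sum runs over whichever levels the caller supplies):

```python
import math

def eta(n_ij, ybar_ij, grand_mean, yy):
    """ETA for one main effect: sqrt of the weighted sum of squared
    level-mean deviations from the grand mean, over the corrected total
    sum of squares Y'Y. The Beta coefficients follow the same pattern
    with the adjusted deviations m_ij in place of (Ybar_ij - Ybar)."""
    between = sum(n * (yb - grand_mean) ** 2 for n, yb in zip(n_ij, ybar_ij))
    return math.sqrt(between / yy)
```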
Squared Multiple Correlation Coefficients

For the main-effects model and the main-effects-and-covariates model,

R_m^2 = R(M) / Y'Y

R_{mc}^2 = R(M, C) / Y'Y
Cell Means

\bar{Y}_i = (X'Y)_i / (X'X)_{ii}, \quad i = 1, ..., CN
Means for combinations involving the first level of a factor are obtained by subtraction from marginal totals.
Matrix Inversion
The Cholesky decomposition (Stewart, 1973) is used to triangularize the matrix. If the tolerance is less than 10⁻⁵, the matrix is considered singular.
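A sketch of the triangularization with a singularity test in the spirit of the text above (the exact scaling of the tolerance check in the original program is an assumption on my part):

```python
import numpy as np

def cholesky_with_tolerance(A, tol=1e-5):
    """Cholesky triangularization A = L L'. If a pivot, scaled by its
    diagonal entry, falls below the tolerance (10**-5 by default), the
    matrix is declared computationally singular."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros_like(A)
    for j in range(n):
        piv = A[j, j] - L[j, :j] @ L[j, :j]
        if A[j, j] <= 0 or piv / A[j, j] < tol:
            raise np.linalg.LinAlgError("matrix is computationally singular")
        L[j, j] = np.sqrt(piv)
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L
```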
References
Andrews, F., J. Morgan, J. Sonquist, and L. Klem. 1973. Multiple classification analysis. Ann Arbor: University of Michigan.

Searle, S. R. 1966. Matrix algebra for the biological sciences. New York: John Wiley & Sons.

Searle, S. R. 1971. Linear models. New York: John Wiley & Sons.

Stewart, G. W. 1973. Introduction to matrix computations. New York: Academic Press.