
Chapter 2

General Linear Hypothesis and Analysis of Variance


Regression model for the general linear hypothesis
Let $Y_1, Y_2, \ldots, Y_n$ be a sequence of $n$ independent random variables associated with responses. Then we can write it as

$$E(Y_i) = \sum_{j=1}^{p} \beta_j x_{ij}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, p,$$
$$Var(Y_i) = \sigma^2.$$

This is the linear model in the expectation form, where $\beta_1, \beta_2, \ldots, \beta_p$ are the unknown parameters and the $x_{ij}$'s are the known values of the independent covariates $X_1, X_2, \ldots, X_p$.

Alternatively, the linear model can be expressed as

$$Y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, p,$$

where the $\varepsilon_i$'s are identically and independently distributed random error components with mean 0 and variance $\sigma^2$, i.e., $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$ and $Cov(\varepsilon_i, \varepsilon_j) = 0 \; (i \neq j)$.

In matrix notation, the linear model can be expressed as

$$Y = X\beta + \varepsilon$$

where

$Y = (Y_1, Y_2, \ldots, Y_n)'$ is an $n \times 1$ vector of observations on the response variable,

$$X = \begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1p} \\ X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{np} \end{pmatrix}$$

is an $n \times p$ matrix of $n$ observations on the $p$ independent covariates $X_1, X_2, \ldots, X_p$,

$\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$ is a $p \times 1$ vector of unknown regression parameters (or regression coefficients) $\beta_1, \beta_2, \ldots, \beta_p$ associated with $X_1, X_2, \ldots, X_p$, respectively, and

$\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is an $n \times 1$ vector of random errors or disturbances.

We assume that $E(\varepsilon) = 0$, the covariance matrix $V(\varepsilon) = E(\varepsilon\varepsilon') = \sigma^2 I_n$, and $\text{rank}(X) = p$.


In the context of analysis of variance and design of experiments,


the matrix $X$ is termed the design matrix;
the unknown $\beta_1, \beta_2, \ldots, \beta_p$ are termed effects;
the covariates $X_1, X_2, \ldots, X_p$ are counter variables or indicator variables, where $x_{ij}$ counts the number of times the effect $\beta_j$ occurs in the $i$th observation $x_i$.

$x_{ij}$ mostly takes the value 1 or 0, but not always.

The value $x_{ij} = 1$ indicates the presence of effect $\beta_j$ in $x_i$ and $x_{ij} = 0$ indicates the absence of effect $\beta_j$ in $x_i$.

Note that in the linear regression model, the covariates are usually continuous variables.

When some of the covariates are counter variables and the rest are continuous variables, the model is called a mixed model and is used in the analysis of covariance.

Relationship between the regression model and analysis of variance model


The same linear model is used in the linear regression analysis as well as in the analysis of variance.
So it is important to understand the role of linear model in the context of linear regression analysis
and analysis of variance.

Consider the multiple linear regression model

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p + \varepsilon.$$

In the case of the analysis of variance model,

the one-way classification considers only one covariate,
the two-way classification model considers two covariates,
the three-way classification model considers three covariates, and so on.

If $\beta$, $\gamma$ and $\delta$ denote the effects associated with the covariates $X$, $Z$ and $W$, which are the counter variables, then

One-way model: $Y = \mu + \beta X + \varepsilon$

Two-way model: $Y = \mu + \beta X + \gamma Z + \varepsilon$

Three-way model: $Y = \mu + \beta X + \gamma Z + \delta W + \varepsilon$, and so on.

Consider an example of agricultural yield. The study variable Y denotes the yield which depends on
various covariates X 1 , X 2 ,..., X p . In case of regression analysis, the covariates X 1 , X 2 ,..., X p are the
different variables like temperature, quantity of fertilizer, amount of irrigation etc.

Now consider the case of the one-way model and try to understand its interpretation in terms of the multiple regression model. The covariate $X$ is now measured at different levels, e.g., if $X$ is the quantity of fertilizer, then suppose there are $p$ possible values, say 1 Kg., 2 Kg., ..., p Kg. Then $X_1, X_2, \ldots, X_p$ denote these $p$ values in the following way. The linear model can now be expressed as

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p + \varepsilon$$

by defining

$$X_1 = \begin{cases} 1 & \text{if effect of 1 Kg. fertilizer is present} \\ 0 & \text{if effect of 1 Kg. fertilizer is absent} \end{cases}$$

$$X_2 = \begin{cases} 1 & \text{if effect of 2 Kg. fertilizer is present} \\ 0 & \text{if effect of 2 Kg. fertilizer is absent} \end{cases}$$

$$\vdots$$

$$X_p = \begin{cases} 1 & \text{if effect of p Kg. fertilizer is present} \\ 0 & \text{if effect of p Kg. fertilizer is absent.} \end{cases}$$

If the effect of 1 Kg. of fertilizer is present, then the other effects will obviously be absent and the linear model is expressible as

$$Y = \beta_0 + \beta_1 (X_1 = 1) + \beta_2 (X_2 = 0) + \ldots + \beta_p (X_p = 0) + \varepsilon = \beta_0 + \beta_1 + \varepsilon.$$

If the effect of 2 Kg. of fertilizer is present, then

$$Y = \beta_0 + \beta_1 (X_1 = 0) + \beta_2 (X_2 = 1) + \ldots + \beta_p (X_p = 0) + \varepsilon = \beta_0 + \beta_2 + \varepsilon.$$

If the effect of p Kg. of fertilizer is present, then

$$Y = \beta_0 + \beta_1 (X_1 = 0) + \beta_2 (X_2 = 0) + \ldots + \beta_p (X_p = 1) + \varepsilon = \beta_0 + \beta_p + \varepsilon,$$

and so on.


If the experiment with 1 Kg. of fertilizer is repeated $n_1$ times, then $n_1$ observations on the response variable are recorded, which can be represented as

$$Y_{11} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 0 + \varepsilon_{11}$$
$$Y_{12} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 0 + \varepsilon_{12}$$
$$\vdots$$
$$Y_{1n_1} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 0 + \varepsilon_{1n_1}.$$

If $X_2 = 1$ is repeated $n_2$ times, then on the same lines $n_2$ observations on the response variable are recorded, which can be represented as

$$Y_{21} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \ldots + \beta_p \cdot 0 + \varepsilon_{21}$$
$$Y_{22} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \ldots + \beta_p \cdot 0 + \varepsilon_{22}$$
$$\vdots$$
$$Y_{2n_2} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \ldots + \beta_p \cdot 0 + \varepsilon_{2n_2}.$$

The experiment is continued, and if $X_p = 1$ is repeated $n_p$ times, then on the same lines

$$Y_{p1} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 1 + \varepsilon_{p1}$$
$$Y_{p2} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 1 + \varepsilon_{p2}$$
$$\vdots$$
$$Y_{pn_p} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 1 + \varepsilon_{pn_p}.$$

All these $n_1, n_2, \ldots, n_p$ observations can be represented as

$$\begin{pmatrix} y_{11}\\ y_{12}\\ \vdots\\ y_{1n_1}\\ y_{21}\\ y_{22}\\ \vdots\\ y_{2n_2}\\ \vdots\\ y_{p1}\\ y_{p2}\\ \vdots\\ y_{pn_p} \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots\\
1 & 1 & 0 & \cdots & 0\\
1 & 0 & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots\\
1 & 0 & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots\\
1 & 0 & 0 & \cdots & 1\\
\vdots & \vdots & \vdots & & \vdots\\
1 & 0 & 0 & \cdots & 1
\end{pmatrix}
\begin{pmatrix} \beta_0\\ \beta_1\\ \vdots\\ \beta_p \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \vdots\\ \varepsilon_{pn_p} \end{pmatrix}$$

(the first $n_1$ rows of the matrix carry the indicator for the first level, the next $n_2$ rows the indicator for the second level, and so on), or

$$Y = X\beta + \varepsilon.$$
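To make the indicator-variable construction concrete, the following Python sketch (using NumPy; the group sizes are hypothetical and not from the text) builds such a design matrix with an intercept column followed by one 0/1 column per fertilizer level.

```python
import numpy as np

# Hypothetical group sizes n1, n2, n3 for p = 3 fertilizer levels (illustration only)
group_sizes = [3, 2, 4]
p = len(group_sizes)
n = sum(group_sizes)

X = np.zeros((n, p + 1))
X[:, 0] = 1.0                          # intercept column (beta_0)
row = 0
for j, n_j in enumerate(group_sizes):
    X[row:row + n_j, j + 1] = 1.0      # indicator column for level j+1
    row += n_j

print(X)   # first n1 rows: (1,1,0,0), next n2 rows: (1,0,1,0), last n3 rows: (1,0,0,1)
```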


In the two-way analysis of variance model, there are two covariates and the linear model is expressible as

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p + \gamma_1 Z_1 + \gamma_2 Z_2 + \ldots + \gamma_q Z_q + \varepsilon,$$

where $X_1, X_2, \ldots, X_p$ denote, e.g., the $p$ levels of the quantity of fertilizer, say 1 Kg., 2 Kg., ..., p Kg., and $Z_1, Z_2, \ldots, Z_q$ denote, e.g., the $q$ levels of irrigation, say 10 Cms., 20 Cms., ..., 10q Cms. The levels $X_1, X_2, \ldots, X_p, Z_1, Z_2, \ldots, Z_q$ are the counter variables indicating the presence or absence of the effect as in the earlier case. If the effects of $X_1$ and $Z_1$ are present, i.e., 1 Kg. of fertilizer and 10 Cms. of irrigation are used, then the linear model is written as

$$Y = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 0 + \gamma_1 \cdot 1 + \gamma_2 \cdot 0 + \ldots + \gamma_q \cdot 0 + \varepsilon = \beta_0 + \beta_1 + \gamma_1 + \varepsilon.$$

If $X_2 = 1$ and $Z_2 = 1$ are used, then the model is

$$Y = \beta_0 + \beta_2 + \gamma_2 + \varepsilon.$$

The design matrix can be written accordingly as in the one-way analysis of variance case.

In the three-way analysis of variance model,

$$Y = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p + \gamma_1 Z_1 + \ldots + \gamma_q Z_q + \delta_1 W_1 + \ldots + \delta_r W_r + \varepsilon.$$

The regression parameters $\beta$'s can be fixed or random.

If all $\beta$'s are unknown constants, they are called the parameters of the model and the model is called a fixed effect model or model I. The objective in this case is to make inferences about the parameters and the error variance $\sigma^2$.

If for some $j$, $x_{ij} = 1$ for all $i = 1, 2, \ldots, n$, then $\beta_j$ is termed an additive constant. In this case, $\beta_j$ occurs with every observation and so it is also called the general mean effect.

If all $\beta$'s are observable random variables except the additive constant, then the linear model is termed a random effect model, model II or variance components model. The objective in this case is to make inferences about the variances of the $\beta$'s, i.e., $\sigma_{\beta_1}^2, \sigma_{\beta_2}^2, \ldots, \sigma_{\beta_p}^2$, and the error variance $\sigma^2$, and/or certain functions of them.

If some parameters are fixed and some are random variables, then the model is called a mixed effect model or model III. In a mixed effect model, at least one $\beta_j$ is constant and at least one $\beta_j$ is a random variable. The objective is to make inferences about the fixed effect parameters, the variances of the random effects and the error variance $\sigma^2$.



Analysis of variance
Analysis of variance is a body of statistical methods of analyzing the measurements assumed to be
structured as
$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$

where the $x_{ij}$'s are integers, generally 0 or 1, indicating usually the absence or presence of effect $\beta_j$, and the $\varepsilon_i$'s are assumed to be identically and independently distributed with mean 0 and variance $\sigma^2$. It may be noted that the $\varepsilon_i$'s can additionally be assumed to follow a normal distribution $N(0, \sigma^2)$. This is needed for maximum likelihood estimation of the parameters from the beginning of the analysis, but in least squares estimation it is needed only when conducting tests of hypothesis and constructing confidence intervals for the parameters. The least squares method does not require any distributional assumption, such as normality, up to the stage of estimation of the parameters.

We need some basic concepts to develop the tools.

Least squares estimate of $\beta$:

Let $y_1, y_2, \ldots, y_n$ be a sample of observations on $Y_1, Y_2, \ldots, Y_n$. The least squares estimate of $\beta$ is the value $\hat{\beta}$ of $\beta$ for which the sum of squares due to errors, i.e.,

$$S^2 = \sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta,$$

is minimum, where $y = (y_1, y_2, \ldots, y_n)'$. Differentiating $S^2$ with respect to $\beta$ and setting the derivative equal to zero, the normal equations are obtained as

$$\frac{dS^2}{d\beta} = 2X'X\beta - 2X'y = 0$$

or $X'X\beta = X'y$.

If $X$ has full rank $p$, then $X'X$ has a unique inverse and the unique least squares estimate of $\beta$ is

$$\hat{\beta} = (X'X)^{-1}X'y,$$

which is the best linear unbiased estimator of $\beta$ in the sense of having minimum variance in the class of linear and unbiased estimators. If the rank of $X$ is not full, then a generalized inverse is used for finding the inverse of $X'X$.
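As a minimal numerical sketch of the normal equations, the following Python snippet (NumPy assumed; the data are made up for illustration) computes $\hat{\beta} = (X'X)^{-1}X'y$ and checks it against a standard least squares solver.

```python
import numpy as np

# Hypothetical data: n = 6 observations on p = 2 covariates (illustration only)
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [2.0, 1.0],
              [2.0, 4.0],
              [3.0, 2.0],
              [3.0, 5.0]])
y = np.array([3.1, 4.2, 3.9, 6.8, 6.1, 9.0])

# Normal equations: X'X beta = X'y  =>  beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with the built-in least squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat, beta_lstsq)   # the two should agree up to rounding
```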

If $L'\beta$ is a linear parametric function, where $L = (\ell_1, \ell_2, \ldots, \ell_p)'$ is a non-null vector, then the least squares estimate of $L'\beta$ is $L'\hat{\beta}$.

A question arises: what are the conditions under which a linear parametric function $L'\beta$ admits a unique least squares estimate in the general case?

The concept of an estimable function is needed to find such conditions.

Estimable functions:
A linear function $\lambda'\beta$ of the parameters, with $\lambda$ known, is said to be an estimable parametric function (or estimable) if there exists a linear function $L'Y$ of $Y$ such that

$$E(L'Y) = \lambda'\beta \quad \text{for all } \beta.$$

Note that not all parametric functions are estimable.

The following results will be useful in understanding the further topics.

Theorem 1: A linear parametric function $L'\beta$ admits a unique least squares estimate if and only if $L'\beta$ is estimable.

Theorem 2 (Gauss-Markov theorem):
If the linear parametric function $L'\beta$ is estimable, then the linear estimator $L'\hat{\beta}$, where $\hat{\beta}$ is a solution of

$$X'X\hat{\beta} = X'Y,$$

is the best linear unbiased estimator of $L'\beta$ in the sense of having minimum variance in the class of all linear and unbiased estimators of $L'\beta$.

Theorem 3: If the linear parametric functions $\lambda_1 = \ell_1'\beta, \; \lambda_2 = \ell_2'\beta, \ldots, \lambda_k = \ell_k'\beta$ are estimable, then any linear combination of $\lambda_1, \lambda_2, \ldots, \lambda_k$ is also estimable.

Theorem 4: All linear parametric functions in $\beta$ are estimable if and only if $X$ has full rank.


If $X$ is not of full rank, then some linear parametric functions do not admit unbiased linear estimators and nothing can be inferred about them. The linear parametric functions which are not estimable are said to be confounded. A possible solution to this problem is to add linear restrictions on $\beta$ so as to reduce the linear model to full rank.

Theorem 5: Let $L_1'\beta$ and $L_2'\beta$ be two estimable parametric functions and let $L_1'\hat{\beta}$ and $L_2'\hat{\beta}$ be their least squares estimators. Then

$$Var(L_1'\hat{\beta}) = \sigma^2 L_1'(X'X)^{-1}L_1,$$
$$Cov(L_1'\hat{\beta}, L_2'\hat{\beta}) = \sigma^2 L_1'(X'X)^{-1}L_2,$$

assuming that $X$ is a full rank matrix. If not, the generalized inverse of $X'X$ can be used in place of the unique inverse.

Estimator of $\sigma^2$ based on least squares estimation:

Consider an estimator of $\sigma^2$ as

$$\hat{\sigma}^2 = \frac{1}{n-p}(y - X\hat{\beta})'(y - X\hat{\beta})$$
$$= \frac{1}{n-p}[y - X(X'X)^{-1}X'y]'[y - X(X'X)^{-1}X'y]$$
$$= \frac{1}{n-p}\, y'[I - X(X'X)^{-1}X'][I - X(X'X)^{-1}X']y$$
$$= \frac{1}{n-p}\, y'[I - X(X'X)^{-1}X']y,$$

where $[I - X(X'X)^{-1}X']$ (the identity minus the hat matrix $X(X'X)^{-1}X'$) is an idempotent matrix with trace

$$tr[I - X(X'X)^{-1}X'] = tr\,I - tr\,X(X'X)^{-1}X' = n - tr\,(X'X)^{-1}X'X \quad (\text{using } tr(AB) = tr(BA))$$
$$= n - tr\,I_p = n - p.$$

Note that using $E(y'Ay) = \mu'A\mu + \sigma^2\,tr(A)$ with $\mu = X\beta$ (for which $\mu'A\mu = 0$ here), we have

$$E(\hat{\sigma}^2) = \frac{\sigma^2}{n-p}\, tr[I - X(X'X)^{-1}X'] = \sigma^2,$$

and so $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$.
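A small numerical sketch (hypothetical simulated data, NumPy assumed) of this estimator; it also checks that the trace of the residual projector $I - X(X'X)^{-1}X'$ equals $n - p$.

```python
import numpy as np

# Hypothetical data (for illustration only)
rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix X(X'X)^{-1}X'
M = np.eye(n) - H                       # residual projector I - H
print(np.trace(M))                      # equals n - p = 17 (up to rounding)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)    # unbiased estimator of sigma^2
print(sigma2_hat)
```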


Maximum Likelihood Estimation


The least squares method does not use any distributional assumption about the random variables for the estimation of the parameters. We need the distributional assumption in the case of least squares only while constructing the tests of hypothesis and the confidence intervals. For maximum likelihood estimation, we need the distributional assumption from the beginning.

Suppose $y_1, y_2, \ldots, y_n$ are independently distributed following a normal distribution with mean $E(y_i) = \sum_{j=1}^{p} \beta_j x_{ij}$ and variance $Var(y_i) = \sigma^2$ $(i = 1, 2, \ldots, n)$. Then the likelihood function of $y_1, y_2, \ldots, y_n$ is

$$L(y \mid \beta, \sigma^2) = \frac{1}{(2\pi)^{n/2}(\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right],$$

where $y = (y_1, y_2, \ldots, y_n)'$. Then

$$\mathcal{L} = \ln L(y \mid \beta, \sigma^2) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$$

Differentiating the log-likelihood with respect to $\beta$ and $\sigma^2$, we have

$$\frac{\partial \mathcal{L}}{\partial \beta} = 0 \;\Rightarrow\; X'X\beta = X'y,$$

$$\frac{\partial \mathcal{L}}{\partial \sigma^2} = 0 \;\Rightarrow\; \sigma^2 = \frac{1}{n}(y - X\beta)'(y - X\beta).$$

Assuming the full rank of $X$, the normal equations are solved and the maximum likelihood estimators are obtained as

$$\hat{\beta} = (X'X)^{-1}X'y,$$
$$\tilde{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}) = \frac{1}{n}\, y'[I - X(X'X)^{-1}X']y.$$

The second order differentiation conditions can be checked, and they are satisfied for $\hat{\beta}$ and $\tilde{\sigma}^2$ to be the maximum likelihood estimators.


Note that the maximum likelihood estimator $\hat{\beta}$ is the same as the least squares estimator and, like the least squares estimator, it is an unbiased estimator of $\beta$, i.e., $E(\hat{\beta}) = \beta$. However, $\tilde{\sigma}^2$ is not an unbiased estimator of $\sigma^2$, since $E(\tilde{\sigma}^2) = \frac{n-p}{n}\sigma^2 \neq \sigma^2$, unlike the least squares estimator $\hat{\sigma}^2$.
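A short simulation sketch (hypothetical data, NumPy assumed) illustrating this bias: the ML estimator $\tilde{\sigma}^2 = \text{RSS}/n$ averages close to $\frac{n-p}{n}\sigma^2$, while $\text{RSS}/(n-p)$ averages close to $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma2 = 15, 4, 4.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)

mle, unbiased = [], []
for _ in range(5000):                        # repeat over many simulated samples
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    rss = np.sum((y - X @ b) ** 2)
    mle.append(rss / n)                      # ML estimator of sigma^2
    unbiased.append(rss / (n - p))           # least squares (unbiased) estimator

print(np.mean(mle), sigma2 * (n - p) / n)    # close: E(MLE) = (n-p)/n * sigma^2
print(np.mean(unbiased), sigma2)             # close: unbiased
```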

Now we use the following theorems for developing the tests of hypothesis.

Theorem 6: Let $Y = (Y_1, Y_2, \ldots, Y_n)'$ follow a multivariate normal distribution $N(\mu, \Sigma)$ with mean vector $\mu$ and positive definite covariance matrix $\Sigma$. Then $Y'AY$ follows a noncentral chi-square distribution with $p$ degrees of freedom and noncentrality parameter $\mu'A\mu$, i.e., $\chi^2(p, \mu'A\mu)$, if and only if $A\Sigma$ is an idempotent matrix of rank $p$.

Theorem 7: Let $Y = (Y_1, Y_2, \ldots, Y_n)'$ follow a multivariate normal distribution $N(\mu, \Sigma)$ with mean vector $\mu$ and positive definite covariance matrix $\Sigma$. Let $Y'A_1Y$ follow $\chi^2(p_1, \mu'A_1\mu)$ and $Y'A_2Y$ follow $\chi^2(p_2, \mu'A_2\mu)$. Then $Y'A_1Y$ and $Y'A_2Y$ are independently distributed if $A_1\Sigma A_2 = 0$.

Theorem 8: Let $Y = (Y_1, Y_2, \ldots, Y_n)'$ follow a multivariate normal distribution $N(X\beta, \sigma^2 I)$. Then the maximum likelihood (or least squares) estimator $L'\hat{\beta}$ of the estimable linear parametric function $L'\beta$ is independently distributed of $\tilde{\sigma}^2$; $L'\hat{\beta}$ follows $N\left(L'\beta, \sigma^2 L'(X'X)^{-1}L\right)$ and $\frac{n\tilde{\sigma}^2}{\sigma^2}$ follows $\chi^2(n-p)$, where $\text{rank}(X) = p$.

Proof: Consider $\hat{\beta} = (X'X)^{-1}X'Y$. Then

$$E(L'\hat{\beta}) = L'(X'X)^{-1}X'E(Y) = L'(X'X)^{-1}X'X\beta = L'\beta,$$

$$Var(L'\hat{\beta}) = L'Var(\hat{\beta})L = L'E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)']L = \sigma^2 L'(X'X)^{-1}L.$$

Since $\hat{\beta}$ is a linear function of $y$ and $L'\hat{\beta}$ is a linear function of $\hat{\beta}$, $L'\hat{\beta}$ follows a normal distribution $N\left(L'\beta, \sigma^2 L'(X'X)^{-1}L\right)$. Let

$$A = I - X(X'X)^{-1}X' \quad \text{and} \quad B = L'(X'X)^{-1}X',$$

so that

$$L'\hat{\beta} = L'(X'X)^{-1}X'Y = BY$$

and

$$n\tilde{\sigma}^2 = (Y - X\beta)'[I - X(X'X)^{-1}X'](Y - X\beta) = Y'AY.$$

So, using Theorem 6 with rank$(A) = n - p$, $\frac{n\tilde{\sigma}^2}{\sigma^2}$ follows $\chi^2(n-p)$. Also

$$BA = L'(X'X)^{-1}X' - L'(X'X)^{-1}X'X(X'X)^{-1}X' = 0.$$

So, using Theorem 7, $Y'AY$ and $BY$ are independently distributed.

Tests of Hypothesis in the Linear Regression Model


First we discuss the development of the tests of hypothesis concerning the parameters of a linear
regression model. These tests of hypothesis will be used later in the development of tests based on
the analysis of variance.

Analysis of Variance
The technique of the analysis of variance involves the breaking down of the total variation into orthogonal components. Each orthogonal component represents the variation due to a particular factor contributing to the total variation.

Model
Let $Y_1, Y_2, \ldots, Y_n$ be independently distributed following a normal distribution with mean $E(Y_i) = \sum_{j=1}^{p} \beta_j x_{ij}$ and variance $\sigma^2$. Denoting by $Y = (Y_1, Y_2, \ldots, Y_n)'$ the $n \times 1$ column vector of responses, this assumption can be expressed in the form of the linear regression model

$$Y = X\beta + \varepsilon,$$

where $X$ is an $n \times p$ matrix, $\beta$ is a $p \times 1$ vector and $\varepsilon$ is an $n \times 1$ vector of disturbances with

$$E(\varepsilon) = 0, \quad Cov(\varepsilon) = \sigma^2 I,$$

and $\varepsilon$ follows a normal distribution.

This implies that

$$E(Y) = X\beta$$


$$E[(Y - X\beta)(Y - X\beta)'] = \sigma^2 I.$$

Now we consider four different types of tests of hypothesis.
In the first two cases, we develop the likelihood ratio test for the null hypothesis related to the
analysis of variance. Note that, later we will derive the same test on the basis of least squares
principle also. An important idea behind the development of this test is to demonstrate that the test
used in the analysis of variance can be derived using least squares principle as well as likelihood
ratio test.

Case 1: Consider the null hypothesis for testing

$$H_0: \beta = \beta_0,$$

where $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$, $\beta_0 = (\beta_{10}, \beta_{20}, \ldots, \beta_{p0})'$ is specified and $\sigma^2$ is unknown. This null hypothesis is equivalent to

$$H_0: \beta_1 = \beta_{10}, \; \beta_2 = \beta_{20}, \ldots, \beta_p = \beta_{p0}.$$

Assume that all $\beta_i$'s are estimable, i.e., $\text{rank}(X) = p$ (full column rank). We now develop the likelihood ratio test.

The $(p+1)$-dimensional parametric space $\Omega$ is the collection of points

$$\Omega = \{(\beta, \sigma^2): -\infty < \beta_i < \infty, \; \sigma^2 > 0, \; i = 1, 2, \ldots, p\}.$$

Under $H_0$, all $\beta_i$'s are known (equal to the specified values), so $\Omega$ reduces to the one-dimensional space

$$\omega = \{(\beta_0, \sigma^2): \sigma^2 > 0\}.$$

The likelihood function of $y_1, y_2, \ldots, y_n$ is

$$L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right].$$

The likelihood function is maximum over $\Omega$ when $\beta$ and $\sigma^2$ are substituted with their maximum likelihood estimators, i.e.,

$$\hat{\beta} = (X'X)^{-1}X'y, \quad \tilde{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}).$$

Substituting $\hat{\beta}$ and $\tilde{\sigma}^2$ in $L(y \mid \beta, \sigma^2)$ gives


$$\max_{\Omega} L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\tilde{\sigma}^2}\right)^{n/2} \exp\left[-\frac{1}{2\tilde{\sigma}^2}(y - X\hat{\beta})'(y - X\hat{\beta})\right] = \left[\frac{n}{2\pi (y - X\hat{\beta})'(y - X\hat{\beta})}\right]^{n/2} \exp\left(-\frac{n}{2}\right).$$

Under $H_0$, the maximum likelihood estimator of $\sigma^2$ is $\tilde{\sigma}_0^2 = \frac{1}{n}(y - X\beta_0)'(y - X\beta_0)$.

The maximum value of the likelihood function under $H_0$ is

$$\max_{\omega} L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\tilde{\sigma}_0^2}\right)^{n/2} \exp\left[-\frac{1}{2\tilde{\sigma}_0^2}(y - X\beta_0)'(y - X\beta_0)\right] = \left[\frac{n}{2\pi (y - X\beta_0)'(y - X\beta_0)}\right]^{n/2} \exp\left(-\frac{n}{2}\right).$$

The likelihood ratio test statistic is

$$\lambda = \frac{\max_{\omega} L(y \mid \beta, \sigma^2)}{\max_{\Omega} L(y \mid \beta, \sigma^2)} = \left[\frac{(y - X\hat{\beta})'(y - X\hat{\beta})}{(y - X\beta_0)'(y - X\beta_0)}\right]^{n/2}$$

$$= \left[\frac{(y - X\hat{\beta})'(y - X\hat{\beta})}{\{(y - X\hat{\beta}) + X(\hat{\beta} - \beta_0)\}'\{(y - X\hat{\beta}) + X(\hat{\beta} - \beta_0)\}}\right]^{n/2}$$

$$= \left[1 + \frac{(\hat{\beta} - \beta_0)'X'X(\hat{\beta} - \beta_0)}{(y - X\hat{\beta})'(y - X\hat{\beta})}\right]^{-n/2} \quad (\text{the cross terms vanish because } X'(y - X\hat{\beta}) = 0)$$

$$= \left(1 + \frac{q_1}{q_2}\right)^{-n/2},$$

where $q_1 = (\hat{\beta} - \beta_0)'X'X(\hat{\beta} - \beta_0)$ and $q_2 = (y - X\hat{\beta})'(y - X\hat{\beta})$.
The expressions for $q_1$ and $q_2$ can be simplified further as follows. Consider


$$q_1 = (\hat{\beta} - \beta_0)'X'X(\hat{\beta} - \beta_0)$$
$$= [(X'X)^{-1}X'y - \beta_0]'X'X[(X'X)^{-1}X'y - \beta_0]$$
$$= [(X'X)^{-1}X'(y - X\beta_0)]'X'X[(X'X)^{-1}X'(y - X\beta_0)]$$
$$= (y - X\beta_0)'X(X'X)^{-1}X'X(X'X)^{-1}X'(y - X\beta_0)$$
$$= (y - X\beta_0)'X(X'X)^{-1}X'(y - X\beta_0),$$

$$q_2 = (y - X\hat{\beta})'(y - X\hat{\beta})$$
$$= [y - X(X'X)^{-1}X'y]'[y - X(X'X)^{-1}X'y]$$
$$= y'[I - X(X'X)^{-1}X']y$$
$$= [(y - X\beta_0) + X\beta_0]'[I - X(X'X)^{-1}X'][(y - X\beta_0) + X\beta_0]$$
$$= (y - X\beta_0)'[I - X(X'X)^{-1}X'](y - X\beta_0).$$

The other two terms become zero using $[I - X(X'X)^{-1}X']X = 0$.

In order to find the decision rule for $H_0$ based on $\lambda$, we first need to determine whether $\lambda$ is a monotonically increasing or decreasing function of $\frac{q_1}{q_2}$. So we proceed as follows. Let

$$g = \frac{q_1}{q_2}, \quad \text{so that} \quad \lambda = \left(1 + \frac{q_1}{q_2}\right)^{-n/2} = (1 + g)^{-n/2}.$$

Then

$$\frac{d\lambda}{dg} = -\frac{n}{2}(1 + g)^{-\left(\frac{n}{2} + 1\right)},$$

so as $g$ increases, $\lambda$ decreases. Thus $\lambda$ is a monotonically decreasing function of $\frac{q_1}{q_2}$.

The decision rule is to reject $H_0$ if $\lambda \leq \lambda_0$, where $\lambda_0$ is a constant to be determined on the basis of the size $\alpha$ of the test. Let us simplify this in our context.

$$\lambda \leq \lambda_0$$
$$\text{or } \left(1 + \frac{q_1}{q_2}\right)^{-n/2} \leq \lambda_0$$
$$\text{or } (1 + g)^{-n/2} \leq \lambda_0$$
$$\text{or } (1 + g) \geq \lambda_0^{-2/n}$$
$$\text{or } g \geq \lambda_0^{-2/n} - 1$$
$$\text{or } g \geq C,$$

where $C$ is a constant to be determined by the size condition of the test.

So reject $H_0$ whenever

$$\frac{q_1}{q_2} \geq C.$$

Note that the statistic $\frac{q_1}{q_2}$ can also be obtained by the least squares method as follows (the least squares methodology will also be discussed in further lectures):

$$q_1 = (\hat{\beta} - \beta_0)'X'X(\hat{\beta} - \beta_0) = \underbrace{\min_{\omega}(y - X\beta)'(y - X\beta)}_{\substack{\text{sum of squares due to } H_0 \\ \text{(total sum of squares)}}} - \underbrace{\min_{\Omega}(y - X\beta)'(y - X\beta)}_{\text{sum of squares due to error}},$$

so $q_1$ is the sum of squares due to deviation from $H_0$, or the sum of squares due to $\beta$.

It will be seen later that the test statistic will be based on the ratio $\frac{q_1}{q_2}$. In order to find an appropriate distribution of $\frac{q_1}{q_2}$, we use the following theorem.


Theorem 9: Let

$$Z = Y - X\beta_0,$$
$$Q_1 = Z'X(X'X)^{-1}X'Z,$$
$$Q_2 = Z'[I - X(X'X)^{-1}X']Z.$$

Then $Q_1$ and $Q_2$ are independently distributed. Further, when $H_0$ is true,

$$\frac{Q_1}{\sigma^2} \sim \chi^2(p) \quad \text{and} \quad \frac{Q_2}{\sigma^2} \sim \chi^2(n-p),$$

where $\chi^2(m)$ denotes the chi-square distribution with $m$ degrees of freedom.

Proof: Under $H_0$,

$$E(Z) = X\beta_0 - X\beta_0 = 0, \quad Var(Z) = Var(Y) = \sigma^2 I.$$

Further, $Z$ is a linear function of $Y$ and $Y$ follows a normal distribution, so

$$Z \sim N(0, \sigma^2 I).$$

The matrices $X(X'X)^{-1}X'$ and $[I - X(X'X)^{-1}X']$ are idempotent matrices, so

$$tr[X(X'X)^{-1}X'] = tr[(X'X)^{-1}X'X] = tr(I_p) = p,$$
$$tr[I - X(X'X)^{-1}X'] = tr\,I_n - tr[X(X'X)^{-1}X'] = n - p.$$

So, using Theorem 6, we can write that under $H_0$

$$\frac{Q_1}{\sigma^2} \sim \chi^2(p) \quad \text{and} \quad \frac{Q_2}{\sigma^2} \sim \chi^2(n-p),$$

where the degrees of freedom $p$ and $(n-p)$ are obtained from the trace of $X(X'X)^{-1}X'$ and the trace of $I - X(X'X)^{-1}X'$, respectively.

Since

$$[I - X(X'X)^{-1}X']\,X(X'X)^{-1}X' = 0,$$

using Theorem 7 the quadratic forms $Q_1$ and $Q_2$ are independent under $H_0$. Hence the theorem is proved.

Since $Q_1$ and $Q_2$ are independently distributed, under $H_0$

$$\frac{Q_1/p}{Q_2/(n-p)} = \frac{n-p}{p}\cdot\frac{Q_1}{Q_2}$$

follows a central $F$-distribution, i.e., $F(p, n-p)$.

Hence the constant $C$ in the likelihood ratio test statistic is given by

$$C = F_{1-\alpha}(p, n-p),$$

where $F_{1-\alpha}(n_1, n_2)$ denotes the upper $100\alpha\%$ point of the $F$-distribution with $n_1$ and $n_2$ degrees of freedom.

The computations of this test of hypothesis can be represented in the form of an analysis of variance table.

ANOVA for testing $H_0: \beta = \beta_0$
______________________________________________________________________________
Source of      Degrees of     Sum of                             Mean           F-value
variation      freedom        squares                            squares
______________________________________________________________________________
Due to β       p              q1                                 q1/p           [(n-p)/p](q1/q2)
Error          n - p          q2                                 q2/(n-p)
Total          n              (y - Xβ0)'(y - Xβ0)
______________________________________________________________________________
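As a hedged numerical sketch of this test (hypothetical simulated data; NumPy and SciPy assumed), the snippet below computes $q_1$, $q_2$ and the $F$-statistic for $H_0: \beta = \beta_0$ and compares it with the $F$ critical value.

```python
import numpy as np
from scipy import stats

# Hypothetical data (illustration only)
rng = np.random.default_rng(2)
n, p = 30, 3
X = rng.normal(size=(n, p))
y = X @ np.array([0.5, 1.0, -0.3]) + rng.normal(size=n)

beta0 = np.zeros(p)                        # hypothesized value under H0: beta = beta0
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

q1 = (beta_hat - beta0) @ (X.T @ X) @ (beta_hat - beta0)   # deviation from H0
q2 = (y - X @ beta_hat) @ (y - X @ beta_hat)               # residual (error) sum of squares

F = ((n - p) / p) * (q1 / q2)
F_crit = stats.f.ppf(0.95, p, n - p)
print(F, F_crit, F > F_crit)               # reject H0 when F exceeds the critical value
```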

Case 2: Test of a subset of parameters

$$H_0: \beta_k = \beta_{k0}, \quad k = 1, 2, \ldots, r < p,$$

when $\beta_{r+1}, \beta_{r+2}, \ldots, \beta_p$ and $\sigma^2$ are unknown.

In Case 1, the test of hypothesis was developed when all $\beta$'s are considered, in the sense that we test $\beta_i = \beta_{i0}$ for each $i = 1, 2, \ldots, p$. Now consider another situation, in which the interest is to test only a subset of $\beta_1, \beta_2, \ldots, \beta_p$, i.e., not all but only a few of the parameters. This type of test of hypothesis can be used, e.g., in the following situation. Suppose five levels of voltage are applied to check the rotations per minute (rpm) of a fan at 160 volts, 180 volts, 200 volts, 220 volts and 240 volts. It can be realized in practice that when the voltage is low, the differences in rpm at 160, 180 and 200 volts can be observed easily. At 220 and 240 volts, the fan rotates at full speed and there is not much difference in the rotations per minute at these voltages. So the interest of the experimenter lies in testing the hypothesis related to only the first three effects, viz., $\beta_1$ for 160 volts, $\beta_2$ for 180 volts and $\beta_3$ for 200 volts. The null hypothesis in this case can be written as:

$$H_0: \beta_1 = \beta_{10}, \; \beta_2 = \beta_{20}, \; \beta_3 = \beta_{30},$$

when $\beta_4$, $\beta_5$ and $\sigma^2$ are unknown.

Note that under Case 1, the null hypothesis would be

$$H_0: \beta_1 = \beta_{10}, \; \beta_2 = \beta_{20}, \; \beta_3 = \beta_{30}, \; \beta_4 = \beta_{40}, \; \beta_5 = \beta_{50}.$$

Let $\beta_1, \beta_2, \ldots, \beta_p$ be the $p$ parameters. We can divide them into two parts, $\beta_1, \beta_2, \ldots, \beta_r$ and $\beta_{r+1}, \ldots, \beta_p$, and we are interested in testing a hypothesis about a subset of them.

Suppose we want to test the null hypothesis

$$H_0: \beta_k = \beta_{k0}, \quad k = 1, 2, \ldots, r < p,$$

when $\beta_{r+1}, \beta_{r+2}, \ldots, \beta_p$ and $\sigma^2$ are unknown.

The alternative hypothesis under consideration is $H_1: \beta_k \neq \beta_{k0}$ for at least one $k = 1, 2, \ldots, r$.

In order to develop a test for such a hypothesis, the linear model

$$Y = X\beta + \varepsilon,$$

under the usual assumptions, can be rewritten as follows. Partition

$$X = (X_1 \; X_2), \quad \beta = \begin{pmatrix} \beta_{(1)} \\ \beta_{(2)} \end{pmatrix},$$

where $\beta_{(1)} = (\beta_1, \beta_2, \ldots, \beta_r)'$ and $\beta_{(2)} = (\beta_{r+1}, \beta_{r+2}, \ldots, \beta_p)'$, with orders $X_1: n \times r$, $X_2: n \times (p-r)$, $\beta_{(1)}: r \times 1$ and $\beta_{(2)}: (p-r) \times 1$.

The model can be rewritten as

$$Y = X\beta + \varepsilon = (X_1 \; X_2)\begin{pmatrix} \beta_{(1)} \\ \beta_{(2)} \end{pmatrix} + \varepsilon = X_1\beta_{(1)} + X_2\beta_{(2)} + \varepsilon.$$

The null hypothesis of interest is now

$$H_0: \beta_{(1)} = \beta_{(1)}^0 = (\beta_{10}, \beta_{20}, \ldots, \beta_{r0})',$$

where $\beta_{(2)}$ and $\sigma^2$ are unknown.

The complete parametric space is

$$\Omega = \{(\beta, \sigma^2): -\infty < \beta_i < \infty, \; \sigma^2 > 0, \; i = 1, 2, \ldots, p\},$$

and the parameter space under $H_0$ is

$$\omega = \{(\beta_{(1)}^0, \beta_{(2)}, \sigma^2): -\infty < \beta_i < \infty, \; \sigma^2 > 0, \; i = r+1, r+2, \ldots, p\}.$$

The likelihood function is

$$L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right].$$

The maximum value of the likelihood function under $\Omega$ is obtained by substituting the maximum likelihood estimates of $\beta$ and $\sigma^2$, i.e.,

$$\hat{\beta} = (X'X)^{-1}X'y, \quad \tilde{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}),$$

as

$$\max_{\Omega} L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\tilde{\sigma}^2}\right)^{n/2} \exp\left[-\frac{1}{2\tilde{\sigma}^2}(y - X\hat{\beta})'(y - X\hat{\beta})\right] = \left[\frac{n}{2\pi (y - X\hat{\beta})'(y - X\hat{\beta})}\right]^{n/2} \exp\left(-\frac{n}{2}\right).$$

Now we find the maximum value of the likelihood function under $H_0$. The model under $H_0$ becomes

$$Y = X_1\beta_{(1)}^0 + X_2\beta_{(2)} + \varepsilon.$$

The likelihood function under $H_0$ is

$$L(y \mid \beta_{(2)}, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma^2}(y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})'(y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})\right]$$
$$= \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma^2}(y^* - X_2\beta_{(2)})'(y^* - X_2\beta_{(2)})\right],$$

where $y^* = y - X_1\beta_{(1)}^0$. Note that $\beta_{(2)}$ and $\sigma^2$ are the unknown parameters. This likelihood function looks as if it were written for $y^* \sim N(X_2\beta_{(2)}, \sigma^2 I)$.

This helps in writing the maximum likelihood estimators of $\beta_{(2)}$ and $\sigma^2$ directly as

$$\hat{\beta}_{(2)} = (X_2'X_2)^{-1}X_2'y^*,$$
$$\tilde{\sigma}_\omega^2 = \frac{1}{n}(y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)}).$$

Note that $X_2'X_2$ is a principal submatrix of $X'X$. Since $X'X$ is a positive definite matrix, $X_2'X_2$ is also positive definite. Thus $(X_2'X_2)^{-1}$ exists and is unique.

Thus the maximum value of the likelihood function under $H_0$ is obtained as

$$\max_{\omega} L(y \mid \beta_{(2)}, \sigma^2) = \left(\frac{1}{2\pi\tilde{\sigma}_\omega^2}\right)^{n/2} \exp\left[-\frac{1}{2\tilde{\sigma}_\omega^2}(y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)})\right]$$
$$= \left[\frac{n}{2\pi (y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)})}\right]^{n/2} \exp\left(-\frac{n}{2}\right).$$

The likelihood ratio test statistic for $H_0: \beta_{(1)} = \beta_{(1)}^0$ is

$$\lambda = \frac{\max_{\omega} L(y \mid \beta_{(2)}, \sigma^2)}{\max_{\Omega} L(y \mid \beta, \sigma^2)} = \left[\frac{(y - X\hat{\beta})'(y - X\hat{\beta})}{(y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)})}\right]^{n/2}$$

$$= \left[\frac{(y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)}) - (y - X\hat{\beta})'(y - X\hat{\beta}) + (y - X\hat{\beta})'(y - X\hat{\beta})}{(y - X\hat{\beta})'(y - X\hat{\beta})}\right]^{-n/2}$$

$$= \left[1 + \frac{(y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)}) - (y - X\hat{\beta})'(y - X\hat{\beta})}{(y - X\hat{\beta})'(y - X\hat{\beta})}\right]^{-n/2}$$

$$= \left(1 + \frac{q_1}{q_2}\right)^{-n/2},$$

where

$$q_1 = (y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)}) - (y - X\hat{\beta})'(y - X\hat{\beta}) \quad \text{and} \quad q_2 = (y - X\hat{\beta})'(y - X\hat{\beta}).$$

Now we simplify $q_1$ and $q_2$.

Consider

$$(y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)})$$
$$= [y^* - X_2(X_2'X_2)^{-1}X_2'y^*]'[y^* - X_2(X_2'X_2)^{-1}X_2'y^*]$$
$$= y^{*\prime}[I - X_2(X_2'X_2)^{-1}X_2']y^*$$
$$= [(y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}) + X_2\beta_{(2)}]'[I - X_2(X_2'X_2)^{-1}X_2'][(y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}) + X_2\beta_{(2)}]$$
$$= (y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})'[I - X_2(X_2'X_2)^{-1}X_2'](y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}).$$

The other terms become zero using the result $X_2'[I - X_2(X_2'X_2)^{-1}X_2'] = 0$. Note that under $H_0$, $X_1\beta_{(1)}^0 + X_2\beta_{(2)}$ can be expressed as $(X_1 \; X_2)\begin{pmatrix}\beta_{(1)}^0\\ \beta_{(2)}\end{pmatrix}$.

Consider

$$(y - X\hat{\beta})'(y - X\hat{\beta})$$
$$= [y - X(X'X)^{-1}X'y]'[y - X(X'X)^{-1}X'y]$$
$$= y'[I - X(X'X)^{-1}X']y$$
$$= [(y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}) + (X_1\beta_{(1)}^0 + X_2\beta_{(2)})]'[I - X(X'X)^{-1}X'][(y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}) + (X_1\beta_{(1)}^0 + X_2\beta_{(2)})]$$
$$= (y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})'[I - X(X'X)^{-1}X'](y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}),$$

and the other terms become zero using the result $X'[I - X(X'X)^{-1}X'] = 0$.

Thus

$$q_1 = (y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)}) - (y - X\hat{\beta})'(y - X\hat{\beta})$$
$$= y^{*\prime}[I - X_2(X_2'X_2)^{-1}X_2']y^* - y'[I - X(X'X)^{-1}X']y$$
$$= (y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})'[X(X'X)^{-1}X' - X_2(X_2'X_2)^{-1}X_2'](y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}),$$

$$q_2 = (y - X\hat{\beta})'(y - X\hat{\beta}) = y'[I - X(X'X)^{-1}X']y = (y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})'[I - X(X'X)^{-1}X'](y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}),$$

where the other terms become zero. Note that in simplifying the terms $q_1$ and $q_2$, we have tried to write them as quadratic forms in the same variable $(y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})$.

Using the same argument as in Case 1, we can say that since $\lambda$ is a monotonically decreasing function of $\frac{q_1}{q_2}$, the likelihood ratio test rejects $H_0$ whenever

$$\frac{q_1}{q_2} > C,$$

where $C$ is a constant to be determined by the size of the test.

The likelihood ratio test statistic can also be obtained through the least squares method as follows:

$(q_1 + q_2)$: minimum value of $(y - X\beta)'(y - X\beta)$ when $H_0: \beta_{(1)} = \beta_{(1)}^0$ holds true, i.e., the sum of squares due to $H_0$;

$q_2$: sum of squares due to error;

$q_1$: sum of squares due to the deviation from $H_0$, or the sum of squares due to $\beta_{(1)}$ adjusted for $\beta_{(2)}$.

If $\beta_{(1)}^0 = 0$, then

$$q_1 = (y - X_2\hat{\beta}_{(2)})'(y - X_2\hat{\beta}_{(2)}) - (y - X\hat{\beta})'(y - X\hat{\beta})$$
$$= (y'y - \hat{\beta}_{(2)}'X_2'y) - (y'y - \hat{\beta}'X'y)$$
$$= \underbrace{\hat{\beta}'X'y}_{\substack{\text{reduction sum of squares, or}\\ \text{sum of squares due to } \beta}} - \underbrace{\hat{\beta}_{(2)}'X_2'y}_{\substack{\text{sum of squares due to } \beta_{(2)},\\ \text{ignoring } \beta_{(1)}}}.$$

Now we have the following theorem based on Theorems 6 and 7.
Now we have the following theorem based on the Theorems 6 and 7.

Theorem 10: Let

$$Z = Y - X_1\beta_{(1)}^0 - X_2\beta_{(2)},$$
$$Q_1 = Z'AZ, \quad Q_2 = Z'BZ,$$

where

$$A = X(X'X)^{-1}X' - X_2(X_2'X_2)^{-1}X_2', \quad B = I - X(X'X)^{-1}X'.$$

Then $\frac{Q_1}{\sigma^2}$ and $\frac{Q_2}{\sigma^2}$ are independently distributed. Further,

$$\frac{Q_1}{\sigma^2} \sim \chi^2(r) \quad \text{and} \quad \frac{Q_2}{\sigma^2} \sim \chi^2(n-p).$$

Thus under $H_0$,

$$\frac{Q_1/r}{Q_2/(n-p)} = \frac{n-p}{r}\cdot\frac{Q_1}{Q_2}$$

follows the $F$-distribution $F(r, n-p)$. Hence the constant $C$ in the likelihood ratio test is

$$C = F_{1-\alpha}(r, n-p),$$

where $F_{1-\alpha}(r, n-p)$ denotes the upper $100\alpha\%$ point of the $F$-distribution with $r$ and $(n-p)$ degrees of freedom.

The analysis of variance table for this null hypothesis is as follows:

ANOVA for testing $H_0: \beta_{(1)} = \beta_{(1)}^0$
______________________________________________________________________________
Source of        Degrees of     Sum of      Mean           F-value
variation        freedom        squares     squares
______________________________________________________________________________
Due to β(1)      r              q1          q1/r           [(n-p)/r](q1/q2)
Error            n - p          q2          q2/(n-p)
Total            n - (p - r)    q1 + q2
______________________________________________________________________________
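A hedged sketch of this subset test (hypothetical simulated data; NumPy and SciPy assumed): the residual sums of squares of the full and the reduced models give $q_2$ and $q_1$, and the $F$-statistic is compared with $F_{1-\alpha}(r, n-p)$.

```python
import numpy as np
from scipy import stats

# Hypothetical data (illustration only): p = 4 covariates, test H0: first r = 2 coefficients = 0
rng = np.random.default_rng(3)
n, p, r = 40, 4, 2
X = rng.normal(size=(n, p))
y = X @ np.array([0.0, 0.0, 1.0, -1.0]) + rng.normal(size=n)

X1, X2 = X[:, :r], X[:, r:]                     # partition X = (X1 X2), with beta(1)^0 = 0
y_star = y                                      # y* = y - X1 beta(1)^0, here beta(1)^0 = 0

resid_full = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
rss_full = resid_full @ resid_full              # q2: residual SS of the full model
resid_red = y_star - X2 @ np.linalg.lstsq(X2, y_star, rcond=None)[0]
rss_red = resid_red @ resid_red                 # residual SS of the reduced model under H0

q1, q2 = rss_red - rss_full, rss_full
F = ((n - p) / r) * (q1 / q2)
print(F, stats.f.ppf(0.95, r, n - p))           # reject H0 if F exceeds the critical value
```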

Case 3: Test of $H_0: L'\beta = \delta$

Let us consider the test of hypothesis related to a linear parametric function. Assume that the linear parametric function $L'\beta$ is estimable, where $L = (\ell_1, \ell_2, \ldots, \ell_p)'$ is a $p \times 1$ vector of known constants and $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$. The null hypothesis of interest is

$$H_0: L'\beta = \delta,$$

where $\delta$ is some specified constant.

Consider the set up of the linear model $Y = X\beta + \varepsilon$, where $Y = (Y_1, Y_2, \ldots, Y_n)'$ follows $N(X\beta, \sigma^2 I)$. The maximum likelihood estimators of $\beta$ and $\sigma^2$ are

$$\hat{\beta} = (X'X)^{-1}X'y \quad \text{and} \quad \tilde{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}),$$

respectively. The maximum likelihood estimate of the estimable function $L'\beta$ is $L'\hat{\beta}$, with

$$E(L'\hat{\beta}) = L'\beta, \quad Cov(L'\hat{\beta}) = \sigma^2 L'(X'X)^{-1}L,$$
$$L'\hat{\beta} \sim N\left(L'\beta, \sigma^2 L'(X'X)^{-1}L\right)$$

and

$$\frac{n\tilde{\sigma}^2}{\sigma^2} \sim \chi^2(n-p),$$

assuming $X$ to be a full column rank matrix. Further, $L'\hat{\beta}$ and $\frac{n\tilde{\sigma}^2}{\sigma^2}$ are also independently distributed.

Under $H_0: L'\beta = \delta$, the statistic

$$t = \frac{\sqrt{n-p}\;(L'\hat{\beta} - \delta)}{\sqrt{n\tilde{\sigma}^2\, L'(X'X)^{-1}L}}$$

follows a $t$-distribution with $(n-p)$ degrees of freedom. So the test for $H_0: L'\beta = \delta$ against $H_1: L'\beta \neq \delta$ rejects $H_0$ whenever

$$|t| \geq t_{1-\frac{\alpha}{2}}(n-p),$$

where $t_{1-\alpha}(n_1)$ denotes the upper $\alpha$ point of the $t$-distribution with $n_1$ degrees of freedom.
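A minimal sketch of this $t$-test (hypothetical data; NumPy and SciPy assumed), testing the contrast $H_0: \beta_1 - \beta_2 = 0$, i.e., $L = (1, -1, 0)'$ and $\delta = 0$, using the equivalent form with the unbiased variance estimate $s^2 = q_2/(n-p)$.

```python
import numpy as np
from scipy import stats

# Hypothetical data (illustration only)
rng = np.random.default_rng(4)
n, p = 25, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n)

L = np.array([1.0, -1.0, 0.0])
delta = 0.0

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
rss = np.sum((y - X @ beta_hat) ** 2)
s2 = rss / (n - p)                                  # unbiased estimate of sigma^2

t = (L @ beta_hat - delta) / np.sqrt(s2 * L @ XtX_inv @ L)
t_crit = stats.t.ppf(1 - 0.05 / 2, n - p)
print(t, t_crit, abs(t) >= t_crit)                  # two-sided test at the 5% level
```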

Case 4: Test of $H_0: \phi_1 = \delta_1, \; \phi_2 = \delta_2, \ldots, \phi_k = \delta_k$

Now we develop the test of hypothesis related to more than one linear parametric function. Let the $i$th estimable linear parametric function be $\phi_i = L_i'\beta$ and let there be $k$ such functions, with $L_i$ and $\beta$ both being $p \times 1$ vectors as in Case 3.

Our interest is to test the hypothesis

$$H_0: \phi_1 = \delta_1, \; \phi_2 = \delta_2, \ldots, \phi_k = \delta_k,$$

where $\delta_1, \delta_2, \ldots, \delta_k$ are known constants.

Let $\phi = (\phi_1, \phi_2, \ldots, \phi_k)'$ and $\delta = (\delta_1, \delta_2, \ldots, \delta_k)'$. Then $H_0$ is expressible as

$$H_0: L\beta = \delta,$$

where $L$ is a $k \times p$ matrix of constants whose rows are $L_1', L_2', \ldots, L_k'$. The maximum likelihood estimator of $\phi_i$ is $\hat{\phi}_i = L_i'\hat{\beta}$, where $\hat{\beta} = (X'X)^{-1}X'y$. Then $\hat{\phi} = (\hat{\phi}_1, \hat{\phi}_2, \ldots, \hat{\phi}_k)' = L\hat{\beta}$.

Also, $E(\hat{\phi}) = \phi$ and

$$Cov(\hat{\phi}) = \sigma^2 V, \quad \text{where } V = \left(\left(L_i'(X'X)^{-1}L_j\right)\right),$$

with $L_i'(X'X)^{-1}L_j$ being the $(i,j)$th element of $V$. Thus

$$\frac{(\hat{\phi} - \phi)'V^{-1}(\hat{\phi} - \phi)}{\sigma^2}$$

follows a $\chi^2$ distribution with $k$ degrees of freedom, and

$$\frac{n\tilde{\sigma}^2}{\sigma^2}$$

follows a $\chi^2$ distribution with $(n-p)$ degrees of freedom, where $\tilde{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta})$ is the maximum likelihood estimator of $\sigma^2$.

Further, $(\hat{\phi} - \phi)'V^{-1}(\hat{\phi} - \phi)$ and $n\tilde{\sigma}^2$ are also independently distributed.

Thus, under $H_0: \phi = \delta$,

$$F = \frac{n-p}{k}\cdot\frac{(\hat{\phi} - \delta)'V^{-1}(\hat{\phi} - \delta)}{n\tilde{\sigma}^2}$$

follows the $F$-distribution with $k$ and $(n-p)$ degrees of freedom. So the hypothesis $H_0: \phi = \delta$ is rejected against $H_1$: at least one $\phi_i \neq \delta_i$ for $i = 1, 2, \ldots, k$, whenever $F \geq F_{1-\alpha}(k, n-p)$, where $F_{1-\alpha}(k, n-p)$ denotes the upper $100\alpha\%$ point of the $F$-distribution with $k$ and $(n-p)$ degrees of freedom.

One-way classification with fixed effect linear models of full rank:


The objective in the one-way classification is to test the hypothesis about the equality of means on
the basis of several samples which have been drawn from univariate normal populations with
different means but the same variances.


Let there be $p$ univariate normal populations and let samples of different sizes be drawn from each of the populations. Let $y_{ij}$ $(j = 1, 2, \ldots, n_i)$ be a random sample from the $i$th normal population with mean $\beta_i$ and variance $\sigma^2$, $i = 1, 2, \ldots, p$, i.e.,

$$Y_{ij} \sim N(\beta_i, \sigma^2), \quad j = 1, 2, \ldots, n_i; \; i = 1, 2, \ldots, p.$$

The random samples from the different populations are assumed to be independent of each other.

These observations follow the set up of the linear model

$$Y = X\beta + \varepsilon,$$

where

$$Y = (Y_{11}, Y_{12}, \ldots, Y_{1n_1}, Y_{21}, \ldots, Y_{2n_2}, \ldots, Y_{p1}, Y_{p2}, \ldots, Y_{pn_p})',$$
$$y = (y_{11}, y_{12}, \ldots, y_{1n_1}, y_{21}, \ldots, y_{2n_2}, \ldots, y_{p1}, y_{p2}, \ldots, y_{pn_p})',$$
$$\beta = (\beta_1, \beta_2, \ldots, \beta_p)',$$
$$\varepsilon = (\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1n_1}, \varepsilon_{21}, \ldots, \varepsilon_{2n_2}, \ldots, \varepsilon_{p1}, \varepsilon_{p2}, \ldots, \varepsilon_{pn_p})',$$

$$X = \begin{pmatrix}
1 & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 1\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 1
\end{pmatrix}$$

(the first $n_1$ rows carry a 1 in the first column, the next $n_2$ rows a 1 in the second column, and so on, the last $n_p$ rows a 1 in the $p$th column), i.e.,

$$x_{ij} = \begin{cases} 1 & \text{if } \beta_i \text{ occurs in the } j\text{th observation } x_j, \text{ i.e., effect } \beta_i \text{ is present in } x_j,\\ 0 & \text{if effect } \beta_i \text{ is absent in } x_j, \end{cases}$$

and $n = \sum_{i=1}^{p} n_i$.

So $X$ is a matrix of order $n \times p$, $\beta$ is fixed, the first $n_1$ rows of $X$ are $(1, 0, 0, \ldots, 0)$, the next $n_2$ rows of $X$ are $(0, 1, 0, \ldots, 0)$,

and similarly the last $n_p$ rows of $X$ are $(0, 0, \ldots, 0, 1)$.

Obviously, $\text{rank}(X) = p$, $E(Y) = X\beta$ and $Cov(Y) = \sigma^2 I$.

This completes the representation of a fixed effect linear model of full rank.

The null hypothesis of interest is $H_0: \beta_1 = \beta_2 = \ldots = \beta_p = \beta$ (say), against $H_1$: at least one $\beta_i \neq \beta_j$ $(i \neq j)$, where $\beta$ and $\sigma^2$ are unknown.

We develop here the likelihood ratio test. It may be noted that the same test can also be derived through the least squares method; this will be demonstrated in the next module, so that the readers understand both methods. We have already developed the likelihood ratio test for the hypothesis $H_0: \beta_1 = \beta_2 = \ldots = \beta_p$ along the lines of Case 1.

The whole parametric space $\Omega$ is the $(p+1)$-dimensional space

$$\Omega = \{(\beta, \sigma^2): -\infty < \beta_i < \infty, \; \sigma^2 > 0, \; i = 1, 2, \ldots, p\}.$$

Note that there are $(p+1)$ parameters: $\beta_1, \beta_2, \ldots, \beta_p$ and $\sigma^2$.

Under $H_0$, $\Omega$ reduces to the two-dimensional space

$$\omega = \{(\beta, \sigma^2): -\infty < \beta < \infty, \; \sigma^2 > 0\}.$$

The likelihood function under $\Omega$ is

$$L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \beta_i)^2\right],$$

$$\mathcal{L} = \ln L(y \mid \beta, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \beta_i)^2,$$

$$\frac{\partial \mathcal{L}}{\partial \beta_i} = 0 \;\Rightarrow\; \hat{\beta}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij} = \bar{y}_{io},$$

$$\frac{\partial \mathcal{L}}{\partial \sigma^2} = 0 \;\Rightarrow\; \tilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2.$$

The dot sign (o) in $\bar{y}_{io}$ indicates that the average has been taken over the second subscript $j$. The Hessian matrix of second order partial derivatives of $\ln L$ with respect to $\beta_i$ and $\sigma^2$ is negative definite at $\beta_i = \bar{y}_{io}$ and $\sigma^2 = \tilde{\sigma}^2$, which ensures that the likelihood function is maximized at these values.

Thus the maximum value of $L(y \mid \beta, \sigma^2)$ over $\Omega$ is

$$\max_{\Omega} L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\tilde{\sigma}^2}\right)^{n/2} \exp\left[-\frac{1}{2\tilde{\sigma}^2}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \hat{\beta}_i)^2\right]$$
$$= \left[\frac{n}{2\pi\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2}\right]^{n/2} \exp\left(-\frac{n}{2}\right).$$

The likelihood function under $\omega$ is

$$L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \beta)^2\right],$$

$$\ln L(y \mid \beta, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \beta)^2.$$

The normal equations and the maximum likelihood estimates are obtained as follows:

$$\frac{\partial \ln L(y \mid \beta, \sigma^2)}{\partial \beta} = 0 \;\Rightarrow\; \hat{\beta} = \frac{1}{n}\sum_{i=1}^{p}\sum_{j=1}^{n_i} y_{ij} = \bar{y}_{oo},$$

$$\frac{\partial \ln L(y \mid \beta, \sigma^2)}{\partial \sigma^2} = 0 \;\Rightarrow\; \hat{\sigma}_\omega^2 = \frac{1}{n}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{oo})^2.$$

The maximum value of the likelihood function under $\omega$ is

$$\max_{\omega} L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\hat{\sigma}_\omega^2}\right)^{n/2} \exp\left[-\frac{1}{2\hat{\sigma}_\omega^2}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \hat{\beta})^2\right]$$
$$= \left[\frac{n}{2\pi\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{oo})^2}\right]^{n/2} \exp\left(-\frac{n}{2}\right).$$

The likelihood ratio test statistic is


$$\lambda = \frac{\max_{\omega} L(y \mid \beta, \sigma^2)}{\max_{\Omega} L(y \mid \beta, \sigma^2)} = \left[\frac{\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2}{\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{oo})^2}\right]^{n/2}.$$

We have

$$\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{oo})^2 = \sum_{i=1}^{p}\sum_{j=1}^{n_i}\left[(y_{ij} - \bar{y}_{io}) + (\bar{y}_{io} - \bar{y}_{oo})\right]^2 = \sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2 + \sum_{i=1}^{p} n_i(\bar{y}_{io} - \bar{y}_{oo})^2.$$

Thus

$$\lambda^{-2/n} = \frac{\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2 + \sum_{i=1}^{p} n_i(\bar{y}_{io} - \bar{y}_{oo})^2}{\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2} = 1 + \frac{q_1}{q_2},$$

where

$$q_1 = \sum_{i=1}^{p} n_i(\bar{y}_{io} - \bar{y}_{oo})^2 \quad \text{and} \quad q_2 = \sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2.$$

Note that if the least squares principle is used, then

$q_1$: sum of squares due to deviations from $H_0$, or the between-population sum of squares,
$q_2$: sum of squares due to error, or the within-population sum of squares,
$q_1 + q_2$: sum of squares due to $H_0$, or the total sum of squares.

Using Theorems 6 and 7, let

$$Q_1 = \sum_{i=1}^{p} n_i(\bar{Y}_{io} - \bar{Y}_{oo})^2, \quad Q_2 = \sum_{i=1}^{p} S_i^2,$$

where

$$S_i^2 = \sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{io})^2, \quad \bar{Y}_{oo} = \frac{1}{n}\sum_{i=1}^{p}\sum_{j=1}^{n_i} Y_{ij}, \quad \bar{Y}_{io} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}.$$

Then, under $H_0$,

$$\frac{Q_1}{\sigma^2} \sim \chi^2(p-1), \quad \frac{Q_2}{\sigma^2} \sim \chi^2(n-p),$$

and $Q_1$ and $Q_2$ are independently distributed.

Thus, under $H_0$,

$$\frac{Q_1/[\sigma^2(p-1)]}{Q_2/[\sigma^2(n-p)]} = \frac{n-p}{p-1}\cdot\frac{Q_1}{Q_2} \sim F(p-1, n-p).$$

The likelihood ratio test rejects $H_0$ whenever

$$\frac{q_1}{q_2} > C,$$

where the constant $C = F_{1-\alpha}(p-1, n-p)$.

The analysis of variance table for the one-way classification in the fixed effect model is

______________________________________________________________________________
Source of             Degrees of     Sum of      Mean sum        F-value
variation             freedom        squares     of squares
______________________________________________________________________________
Between populations   p - 1          q1          q1/(p-1)        [(n-p)/(p-1)](q1/q2)
Within populations    n - p          q2          q2/(n-p)
Total                 n - 1          q1 + q2
______________________________________________________________________________
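A small numerical sketch of this one-way ANOVA (hypothetical samples; NumPy and SciPy assumed): it computes the between-population and within-population sums of squares, forms the $F$-statistic, and cross-checks it against SciPy's one-way ANOVA routine.

```python
import numpy as np
from scipy import stats

# Hypothetical samples from p = 3 populations (illustration only)
samples = [np.array([23.0, 25.1, 24.3, 26.0]),
           np.array([27.2, 28.1, 26.5, 27.9, 28.4]),
           np.array([22.0, 21.5, 23.1, 22.7])]

p = len(samples)
n = sum(len(s) for s in samples)
grand_mean = np.concatenate(samples).mean()

q1 = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)   # between-population SS
q2 = sum(((s - s.mean()) ** 2).sum() for s in samples)             # within-population SS

F = (q1 / (p - 1)) / (q2 / (n - p))
print(F, stats.f.ppf(0.95, p - 1, n - p))
# SciPy's one-way ANOVA gives the same F statistic
print(stats.f_oneway(*samples).statistic)
```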


Note that

$$E\left(\frac{Q_2}{n-p}\right) = \sigma^2, \qquad E\left(\frac{Q_1}{p-1}\right) = \sigma^2 + \frac{\sum_{i=1}^{p}(\beta_i - \bar{\beta})^2}{p-1}, \quad \text{where } \bar{\beta} = \frac{1}{p}\sum_{i=1}^{p}\beta_i.$$

Case of rejection of $H_0$

If $F > F_{1-\alpha}(p-1, n-p)$, then $H_0: \beta_1 = \beta_2 = \ldots = \beta_p$ is rejected. This means that at least one $\beta_i$ is different from the others and is responsible for the rejection. So the objective is to investigate and find such $\beta_i$'s and divide the populations into groups such that the means of the populations within a group are the same. This can be done by pairwise testing of the $\beta$'s.

Test $H_0: \beta_i = \beta_k \; (i \neq k)$ against $H_1: \beta_i \neq \beta_k$.

This can be tested using the following $t$-statistic:

$$t = \frac{\bar{Y}_{io} - \bar{Y}_{ko}}{\sqrt{s^2\left(\dfrac{1}{n_i} + \dfrac{1}{n_k}\right)}},$$

which follows the $t$-distribution with $(n-p)$ degrees of freedom under $H_0$, where $s^2 = \dfrac{q_2}{n-p}$. Thus the decision rule is to reject $H_0$ at level $\alpha$ if the observed difference satisfies

$$|\bar{y}_{io} - \bar{y}_{ko}| > t_{1-\frac{\alpha}{2}, n-p}\sqrt{s^2\left(\frac{1}{n_i} + \frac{1}{n_k}\right)}.$$

The quantity $t_{1-\frac{\alpha}{2}, n-p}\sqrt{s^2\left(\dfrac{1}{n_i} + \dfrac{1}{n_k}\right)}$ is called the critical difference.

Thus the following steps are followed:

1. Compute all possible critical differences arising out of all possible pairs $(\beta_i, \beta_k)$, $i \neq k = 1, 2, \ldots, p$.
2. Compare them with the corresponding observed differences.
3. Divide the $p$ populations into different groups such that the populations in the same group have the same means.

The computations are simplified if $n_i = n$ for all $i$. In such a case, the common critical difference (CCD) is

$$CCD = t_{1-\frac{\alpha}{2}, n-p}\sqrt{\frac{2s^2}{n}},$$

and the observed differences $|\bar{y}_{io} - \bar{y}_{ko}|$, $i \neq k$, are compared with the CCD.

If $|\bar{y}_{io} - \bar{y}_{ko}| > CCD$, then the corresponding effects/means come from populations with different means.
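Continuing the hypothetical one-way ANOVA example above (NumPy and SciPy assumed), this sketch computes the critical difference for each pair of groups and flags the pairs whose observed mean difference exceeds it.

```python
import numpy as np
from scipy import stats

samples = [np.array([23.0, 25.1, 24.3, 26.0]),
           np.array([27.2, 28.1, 26.5, 27.9, 28.4]),
           np.array([22.0, 21.5, 23.1, 22.7])]
p = len(samples)
n_total = sum(len(s) for s in samples)
s2 = sum(((s - s.mean()) ** 2).sum() for s in samples) / (n_total - p)   # q2/(n-p)

t_crit = stats.t.ppf(1 - 0.05 / 2, n_total - p)
for i in range(p):
    for k in range(i + 1, p):
        cd = t_crit * np.sqrt(s2 * (1 / len(samples[i]) + 1 / len(samples[k])))
        diff = abs(samples[i].mean() - samples[k].mean())
        print(i, k, round(diff, 3), round(cd, 3), diff > cd)   # True => significantly different
```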

Note: In general, if there are three effects $\beta_1, \beta_2, \beta_3$, and

if $H_{01}: \beta_1 = \beta_2$ (denote this as event $A$) is accepted,
and if $H_{02}: \beta_2 = \beta_3$ (denote this as event $B$) is accepted,

then $H_{03}: \beta_1 = \beta_3$ (denote this as event $C$) will be accepted. The question arises in what sense we can conclude such a statement about the acceptance of $H_{03}$. The reason is as follows. Since the event $A \cap B \subset C$, we have

$$P(A \cap B) \leq P(C).$$

In this sense, the probability of an event is higher than that of the intersection of the events, i.e., the probability that $H_{03}$ is accepted is higher than the probability of acceptance of both $H_{01}$ and $H_{02}$. So we conclude, in general, that the acceptance of $H_{01}$ and $H_{02}$ implies the acceptance of $H_{03}$.

Multiple comparison test:


One interest in the analysis of variance is to decide whether the population means are equal or not. If the hypothesis of equal means is rejected, then one would like to divide the populations into subgroups such that all populations with the same means come into the same subgroup. This can be achieved by multiple comparison tests.

A multiple comparison test procedure conducts the test of hypothesis for all the pairs of effects and compares them at a significance level $\alpha$, i.e., it works on a per-comparison basis.


This is based mainly on the $t$-statistic. If we want to ensure that the significance level $\alpha$ holds simultaneously for all group comparisons of interest, the appropriate multiple test procedure is one that controls the error rate on a per-experiment basis.

There are various multiple comparison tests available. We will discuss some of them in the context of one-way classification. In two-way or higher classifications, they can be used on similar lines.

1. Studentized range test:

It is assumed in the Studentized range test that the $p$ samples, each of size $n$, have been drawn from $p$ normal populations. Let their sample means be $\bar{y}_{1o}, \bar{y}_{2o}, \ldots, \bar{y}_{po}$. These means are ranked and arranged in ascending order as $y_1^*, y_2^*, \ldots, y_p^*$, where $y_1^* = \min_i \bar{y}_{io}$ and $y_p^* = \max_i \bar{y}_{io}$, $i = 1, 2, \ldots, p$.

Find the range $R = y_p^* - y_1^*$.

The Studentized range is defined as

$$q_{p, n-p} = \frac{R\sqrt{n}}{s},$$

where $q_{\alpha, p, \gamma}$ is the upper $\alpha\%$ point of the Studentized range when $\gamma = n - p$. Tables for $q_{\alpha, p, \gamma}$ are available.

The testing procedure involves the comparison of $q_{p, n-p}$ with $q_{\alpha, p, n-p}$ in the usual way:

if $q_{p, n-p} < q_{\alpha, p, n-p}$, then conclude that $\beta_1 = \beta_2 = \ldots = \beta_p$;

if $q_{p, n-p} > q_{\alpha, p, n-p}$, then all the $\beta$'s in the group are not the same.

2. Student-Newman-Keuls test:
The Student-Newman-Keuls test is similar to the Studentized range test in the sense that the range is compared with the $\alpha\%$ point of the critical Studentized range $W_p$ given by

$$W_p = q_{\alpha, p, \gamma}\sqrt{\frac{s^2}{n}}.$$

The observed range $R = y_p^* - y_1^*$ is now compared with $W_p$.

If $R < W_p$, then stop the process of comparison and conclude that $\beta_1 = \beta_2 = \ldots = \beta_p$.

If $R > W_p$, then

(i) divide the ranked means $y_1^*, y_2^*, \ldots, y_p^*$ into two subgroups containing $(y_p^*, y_{p-1}^*, \ldots, y_2^*)$ and $(y_{p-1}^*, y_{p-2}^*, \ldots, y_1^*)$;

(ii) compute the ranges $R_1 = y_p^* - y_2^*$ and $R_2 = y_{p-1}^* - y_1^*$, and compare the ranges $R_1$ and $R_2$ with $W_{p-1}$.

If either range ($R_1$ or $R_2$) is smaller than $W_{p-1}$, then the means (or $\beta_i$'s) in the corresponding group are equal.

If $R_1$ and/or $R_2$ are greater than $W_{p-1}$, then the $(p-1)$ means (or $\beta_i$'s) in the group concerned are divided into two groups of $(p-2)$ means (or $\beta_i$'s) each, and the ranges of the two groups are compared with $W_{p-2}$.

Continue with this procedure until a group of $i$ means (or $\beta_i$'s) is found whose range does not exceed $W_i$.

By this method, the difference between any two means under test is significant when the range of the observed means of each and every subgroup containing the two means under test is significant according to the Studentized critical range.

This procedure can be easily understood from the following scheme (originally presented as a flow chart):

Step 1: Arrange the $\bar{y}_{io}$'s in increasing order: $y_1^* \leq y_2^* \leq \ldots \leq y_p^*$.

Step 2: Compute $R = y_p^* - y_1^*$ and compare it with $W_p = q_{\alpha, p, \gamma}\sqrt{s^2/n}$.
If $R < W_p$: stop and conclude $\beta_1 = \beta_2 = \ldots = \beta_p$.
If $R > W_p$: continue.

Step 3: Divide the ranked means into the two groups $(y_p^*, \ldots, y_2^*)$ and $(y_{p-1}^*, \ldots, y_1^*)$. Compute $R_1 = y_p^* - y_2^*$ and $R_2 = y_{p-1}^* - y_1^*$, and compare $R_1$ and $R_2$ with $W_{p-1}$. There are four possibilities:

(a) $R_1 < W_{p-1}$ and $R_2 < W_{p-1}$: then $\beta_2 = \beta_3 = \ldots = \beta_p$ and $\beta_1 = \beta_2 = \ldots = \beta_{p-1}$, so $\beta_1 = \beta_2 = \ldots = \beta_p$.

(b) $R_1 < W_{p-1}$ and $R_2 > W_{p-1}$: then $\beta_2 = \beta_3 = \ldots = \beta_p$, and at least one $\beta_i \neq \beta_j$, $i \neq j = 1, 2, \ldots, p-1$, which is $\beta_1$; one subgroup is $(\beta_2, \beta_3, \ldots, \beta_p)$ and the other group has only $\beta_1$.

(c) $R_1 > W_{p-1}$ and $R_2 < W_{p-1}$: then $\beta_1 = \beta_2 = \ldots = \beta_{p-1}$, and at least one $\beta_i \neq \beta_j$, $i \neq j = 2, 3, \ldots, p$, which is $\beta_p$; one subgroup has $(\beta_1, \beta_2, \ldots, \beta_{p-1})$ and the other has only $\beta_p$.

(d) $R_1 > W_{p-1}$ and $R_2 > W_{p-1}$: divide the ranked means into four groups:
1. $y_p^*, \ldots, y_3^*$
2. $y_{p-1}^*, \ldots, y_2^*$
3. $y_{p-1}^*, \ldots, y_2^*$ (same as in 2)
4. $y_{p-2}^*, \ldots, y_1^*$
Compute $R_3 = y_p^* - y_3^*$, $R_4 = y_{p-1}^* - y_2^*$ and $R_5 = y_{p-2}^* - y_1^*$, and compare them with $W_{p-2}$.

Continue until subgroups with the same $\beta$'s are obtained.


3. Duncan's multiple comparison test:

The test procedure in Duncan's multiple comparison test is the same as in the Student-Newman-Keuls test, except that the observed ranges are compared with Duncan's $\alpha_p\%$ critical range

$$D_p = q_{\alpha_p, p, \gamma}^{*}\sqrt{\frac{s^2}{n}},$$

where $\alpha_p = 1 - (1-\alpha)^{p-1}$ and $q_{\alpha_p, p, \gamma}^{*}$ denotes the upper $100\alpha_p\%$ point of the Studentized range based on Duncan's range. Tables for Duncan's range are available.

Duncan felt that this test is better than the Student-Newman-Keuls test for comparing the differences between any two ranked means. Duncan regarded the Student-Newman-Keuls method as too stringent, in the sense that the true differences between the means will tend to be missed too often. Duncan notes that in testing the equality of a subset of $k$ $(2 \leq k \leq p)$ means through a null hypothesis, we are in fact testing whether $(p-1)$ orthogonal contrasts between the $\beta$'s differ from zero or not. If these contrasts were tested in separate independent experiments, each at level $\alpha$, the probability of incorrectly rejecting the null hypothesis would be $1 - (1-\alpha)^{p-1}$. So Duncan proposed to use $\alpha_p = 1 - (1-\alpha)^{p-1}$ in place of $\alpha$ in the Student-Newman-Keuls test.

[Reference: Contributions to Order Statistics, Wiley, 1962, Chapter 9 (Multiple decision and multiple comparisons, H. A. David), pages 147-148.]

Case of unequal sample sizes:

When the sample means are not based on the same number of observations, the procedures based on the Studentized range, the Student-Newman-Keuls test and Duncan's test are not applicable. Kramer proposed that in Duncan's method, if a set of $p$ means is to be tested for equality, then replace

$$q_{\alpha_p, p, \gamma}^{*}\,\frac{s}{\sqrt{n}} \quad \text{by} \quad q_{\alpha_p, p, \gamma}^{*}\, s\,\sqrt{\frac{1}{2}\left(\frac{1}{n_U} + \frac{1}{n_L}\right)},$$

where $n_U$ and $n_L$ are the numbers of observations corresponding to the largest and smallest means in the data. This procedure is only approximate but will tend to be conservative, since means based on a small number of observations will tend to be over-represented in the extreme groups of means.


Another option is to replace $n$ by the harmonic mean of $n_1, n_2, \ldots, n_p$, i.e., by $\dfrac{p}{\sum_{i=1}^{p} \frac{1}{n_i}}$.

4. The Least Significant Difference (LSD):

In the usual testing of $H_0: \beta_i = \beta_k$ against $H_1: \beta_i \neq \beta_k$, the $t$-statistic

$$t = \frac{\bar{y}_{io} - \bar{y}_{ko}}{\sqrt{\widehat{Var}(\bar{y}_{io} - \bar{y}_{ko})}}$$

is used, which follows a $t$-distribution, say with $df$ degrees of freedom. Thus $H_0$ is rejected whenever

$$|t| > t_{1-\frac{\alpha}{2}, df},$$

and it is concluded that $\beta_i$ and $\beta_k$ are significantly different. This inequality can equivalently be written as

$$|\bar{y}_{io} - \bar{y}_{ko}| > t_{1-\frac{\alpha}{2}, df}\sqrt{\widehat{Var}(\bar{y}_{io} - \bar{y}_{ko})}.$$

If, for a pair of samples, $|\bar{y}_{io} - \bar{y}_{ko}|$ exceeds $t_{1-\frac{\alpha}{2}, df}\sqrt{\widehat{Var}(\bar{y}_{io} - \bar{y}_{ko})}$, this indicates that the difference between $\beta_i$ and $\beta_k$ is significant. So, according to this, the quantity $t_{1-\frac{\alpha}{2}, df}\sqrt{\widehat{Var}(\bar{y}_{io} - \bar{y}_{ko})}$ is the least difference of $\bar{y}_{io}$ and $\bar{y}_{ko}$ for which the difference between $\beta_i$ and $\beta_k$ will be declared significant. Based on this idea, we use the pooled variance $s^2$ of the two samples in $\widehat{Var}(\bar{y}_{io} - \bar{y}_{ko})$, and the Least Significant Difference (LSD) is defined as

$$LSD = t_{1-\frac{\alpha}{2}, df}\sqrt{s^2\left(\frac{1}{n_i} + \frac{1}{n_k}\right)}.$$

If $n_i = n_k = n$, then

$$LSD = t_{1-\frac{\alpha}{2}, df}\sqrt{\frac{2s^2}{n}}.$$

Now all $\frac{p(p-1)}{2}$ pairs of $\bar{y}_{io}$ and $\bar{y}_{ko}$ $(i \neq k = 1, 2, \ldots, p)$ are compared with the LSD. Use of the LSD criterion may not lead to good results if it is used for comparisons suggested by the data (largest/smallest sample means) or if all pairwise comparisons are done without correction of the test level. If the LSD is used for all the pairwise comparisons, then these tests are not independent. Such correction for test levels was incorporated in Duncan's test.
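A minimal sketch of LSD-based pairwise screening (hypothetical equal-size groups; NumPy and SciPy assumed), using the pooled error variance from the one-way ANOVA.

```python
import numpy as np
from scipy import stats

# Hypothetical equal-size groups (illustration only)
samples = [np.array([5.1, 4.9, 5.4, 5.0]),
           np.array([6.2, 6.0, 6.4, 6.1]),
           np.array([5.2, 5.3, 5.1, 5.5])]
p, n = len(samples), len(samples[0])
df = p * n - p
s2 = sum(((s - s.mean()) ** 2).sum() for s in samples) / df    # pooled error variance

lsd = stats.t.ppf(1 - 0.05 / 2, df) * np.sqrt(2 * s2 / n)
for i in range(p):
    for k in range(i + 1, p):
        diff = abs(samples[i].mean() - samples[k].mean())
        print(i, k, round(diff, 3), round(lsd, 3), diff > lsd)  # True => significant pair
```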

5. Tukey's Honestly Significant Difference (HSD):

In this procedure, the Studentized range values $q_{\alpha, p, \gamma}$ are used in place of $t$-quantiles, and the standard error of the difference of pooled means is used in place of the standard error of the mean in the common critical difference for testing $H_0: \beta_i = \beta_k$ against $H_1: \beta_i \neq \beta_k$. Tukey's Honestly Significant Difference is computed as

$$HSD = q_{\alpha, p, \gamma}\sqrt{\frac{MS_{error}}{n}},$$

assuming all samples are of the same size $n$. All $\frac{p(p-1)}{2}$ pairwise differences $|\bar{y}_{io} - \bar{y}_{ko}|$ are compared with the HSD.

If $|\bar{y}_{io} - \bar{y}_{ko}| > HSD$, then $\beta_i$ and $\beta_k$ are significantly different.
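A hedged sketch of the HSD computation (hypothetical equal-size groups; NumPy assumed, plus scipy.stats.studentized_range, which is available in recent SciPy versions) comparing each pairwise mean difference against the HSD.

```python
import numpy as np
from scipy import stats

# Hypothetical equal-size groups (illustration only)
samples = [np.array([5.1, 4.9, 5.4, 5.0]),
           np.array([6.2, 6.0, 6.4, 6.1]),
           np.array([5.2, 5.3, 5.1, 5.5])]
p, n = len(samples), len(samples[0])
df_error = p * n - p
ms_error = sum(((s - s.mean()) ** 2).sum() for s in samples) / df_error

# Upper 5% point of the studentized range with p groups and df_error error df
q_crit = stats.studentized_range.ppf(0.95, p, df_error)
hsd = q_crit * np.sqrt(ms_error / n)

for i in range(p):
    for k in range(i + 1, p):
        diff = abs(samples[i].mean() - samples[k].mean())
        print(i, k, round(diff, 3), round(hsd, 3), diff > hsd)   # True => significant pair
```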


We notice that all the multiple comparison test procedures discussed up to now are based on the testing of hypothesis. There is a one-to-one relationship between the testing of hypothesis and confidence interval estimation, so a confidence interval can also be used for such comparisons. Since $H_0: \beta_i = \beta_k$ is the same as $H_0: \beta_i - \beta_k = 0$, we first establish this relationship and then describe Tukey's and Scheffé's procedures for multiple comparison tests, which are based on the confidence interval. We need the following concepts.

Contrast:
A linear parametric function $L = \ell'\beta = \sum_{i=1}^{p}\ell_i\beta_i$, where $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$ and $\ell = (\ell_1, \ell_2, \ldots, \ell_p)'$ are $p \times 1$ vectors of parameters and constants respectively, is said to be a contrast when $\sum_{i=1}^{p}\ell_i = 0$.

For example, $\beta_1 - \beta_2 = 0$, $\beta_1 + \beta_2 - 2\beta_3 = 0$, $\beta_1 + 2\beta_2 - 3\beta_3 = 0$, etc. are contrasts, whereas $\beta_1 + \beta_2 = 0$, $\beta_1 + \beta_2 + \beta_3 + \beta_4 = 0$, $\beta_1 - 2\beta_2 - 3\beta_3 = 0$, etc. are not contrasts.


Orthogonal contrast:
If $L_1 = \ell'\beta = \sum_{i=1}^{p}\ell_i\beta_i$ and $L_2 = m'\beta = \sum_{i=1}^{p} m_i\beta_i$ are contrasts such that $\ell'm = \sum_{i=1}^{p}\ell_i m_i = 0$, then $L_1$ and $L_2$ are called orthogonal contrasts.

For example, $L_1 = \beta_1 + \beta_2 - \beta_3 - \beta_4$ and $L_2 = \beta_1 - \beta_2 + \beta_3 - \beta_4$ are contrasts. They are also orthogonal contrasts.

The condition $\sum_{i=1}^{p}\ell_i m_i = 0$ ensures that $L_1$ and $L_2$ are independent in the sense that

$$Cov(\hat{L}_1, \hat{L}_2) = \sigma^2\sum_{i=1}^{p}\ell_i m_i = 0.$$

Mutually orthogonal contrasts:

If there are more than two contrasts, then they are said to be mutually orthogonal if they are pairwise orthogonal.

It may be noted that the maximum number of mutually orthogonal contrasts is the number of degrees of freedom.
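A tiny check of the contrast and orthogonality conditions for the example above (NumPy assumed; the coefficient vectors are those of $L_1$ and $L_2$).

```python
import numpy as np

l1 = np.array([1, 1, -1, -1])    # coefficients of L1 = b1 + b2 - b3 - b4
l2 = np.array([1, -1, 1, -1])    # coefficients of L2 = b1 - b2 + b3 - b4

print(l1.sum() == 0, l2.sum() == 0)   # both are contrasts (coefficients sum to zero)
print(l1 @ l2 == 0)                   # orthogonal: sum of elementwise products is zero
```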

Coming back to the multiple comparison test, if the null hypothesis of equality of all effects is
rejected then it is reasonable to look for the contrasts which are responsible for the rejection. In terms
of contrasts, it is desirable to have a procedure
(i)

that permits the selection of the contrasts after the data is available.

(ii)

with which a known level of significance is associated.

Such procedures are Tukey's and Scheffé's procedures. Before discussing these procedures, let us consider the following example, which illustrates the relationship between the testing of hypothesis and confidence intervals.


Example: Consider the test of hypothesis for

$$H_0: \beta_i = \beta_j \; (i \neq j = 1, 2, \ldots, p),$$
$$\text{or } H_0: \beta_i - \beta_j = 0,$$
$$\text{or } H_0: \text{contrast} = 0,$$
$$\text{or } H_0: L'\beta = 0.$$

The test statistic for $H_0: \beta_i = \beta_j$ is

$$t = \frac{(\hat{\beta}_i - \hat{\beta}_j) - (\beta_i - \beta_j)}{\sqrt{\widehat{Var}(\hat{\beta}_i - \hat{\beta}_j)}} = \frac{L'\hat{\beta} - L'\beta}{\sqrt{\widehat{Var}(L'\hat{\beta})}},$$

where $\hat{\beta}$ denotes the maximum likelihood (or least squares) estimator of $\beta$, and $t$ follows a $t$-distribution with $df$ degrees of freedom. This statistic, in fact, can be extended to any linear contrast, e.g., $L'\beta = \beta_1 + \beta_2 - \beta_3 - \beta_4$ with $L'\hat{\beta} = \hat{\beta}_1 + \hat{\beta}_2 - \hat{\beta}_3 - \hat{\beta}_4$.

The decision rule is: reject $H_0: L'\beta = 0$ against $H_1: L'\beta \neq 0$ if

$$|L'\hat{\beta}| > t_{df}\sqrt{\widehat{Var}(L'\hat{\beta})}.$$

The $100(1-\alpha)\%$ confidence interval of $L'\beta$ is obtained from

$$P\left[-t_{df} \leq \frac{L'\hat{\beta} - L'\beta}{\sqrt{\widehat{Var}(L'\hat{\beta})}} \leq t_{df}\right] = 1 - \alpha$$

or

$$P\left[L'\hat{\beta} - t_{df}\sqrt{\widehat{Var}(L'\hat{\beta})} \leq L'\beta \leq L'\hat{\beta} + t_{df}\sqrt{\widehat{Var}(L'\hat{\beta})}\right] = 1 - \alpha,$$

so that the $100(1-\alpha)\%$ confidence interval of $L'\beta$ is

$$\left[L'\hat{\beta} - t_{df}\sqrt{\widehat{Var}(L'\hat{\beta})}, \; L'\hat{\beta} + t_{df}\sqrt{\widehat{Var}(L'\hat{\beta})}\right].$$

If this interval includes $L'\beta = 0$ between the lower and upper confidence limits, then $H_0: L'\beta = 0$ is accepted. Our objective is thus to know whether the confidence interval contains zero or not.

Suppose for some given data the confidence intervals for $\beta_1 - \beta_2$ and $\beta_1 - \beta_3$ are obtained as

$$-3 \leq \beta_1 - \beta_2 \leq 2 \quad \text{and} \quad 2 \leq \beta_1 - \beta_3 \leq 4.$$

Thus we find that the interval for $\beta_1 - \beta_2$ includes zero, which implies that $H_0: \beta_1 - \beta_2 = 0$ is accepted. Thus $\beta_1 = \beta_2$.

On the other hand, the interval for $\beta_1 - \beta_3$ does not include zero, so $H_0: \beta_1 - \beta_3 = 0$ is not accepted. Thus $\beta_1 \neq \beta_3$.

If the interval for $\beta_1 - \beta_3$ had been, say, $-1 \leq \beta_1 - \beta_3 \leq 1$, then $H_0: \beta_1 = \beta_3$ would be accepted. If both $H_0: \beta_1 = \beta_2$ and $H_0: \beta_1 = \beta_3$ are accepted, then we can conclude that $\beta_1 = \beta_2 = \beta_3$.

Tukey's procedure for multiple comparison (T-method)

The T-method uses the distribution of the Studentized range statistic. (The S-method, discussed next, utilizes the F-distribution.) The T-method can be used to make simultaneous confidence statements about contrasts ($\beta_i - \beta_j$) among a set of parameters $\{\beta_1, \beta_2, \ldots, \beta_p\}$, together with an estimate $s^2$ of the error variance, provided certain restrictions are satisfied.

These restrictions have to be viewed according to the given conditions. For example, one of the restrictions is that all the $\hat{\beta}_i$'s have equal variances. In the set up of one-way classification, $\hat{\beta}_i$ is the sample mean $\bar{Y}_{io}$ and its variance is $\frac{\sigma^2}{n_i}$. This reduces to the simple condition that all the $n_i$'s are the same, i.e., $n_i = n$ for all $i$, so that all the variances are the same.

Another assumption is that $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$ are statistically independent and the only contrasts considered are the $\frac{p(p-1)}{2}$ differences $\beta_i - \beta_j$, $i \neq j = 1, 2, \ldots, p$.

We make the following assumptions:

(i) The $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$ are statistically independent.

(ii) $\hat{\beta}_i \sim N(\beta_i, a^2\sigma^2)$, $i = 1, 2, \ldots, p$, where $a > 0$ is a known constant.

(iii) $s^2$ is an independent estimate of $\sigma^2$ with $\gamma$ degrees of freedom (here $\gamma = n - p$), i.e.,

$$\frac{\gamma s^2}{\sigma^2} \sim \chi^2(\gamma),$$

and

(iv) $s^2$ is statistically independent of $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$.


The statement of the T-method is as follows:

Under the assumptions (i)-(iv), the probability is $(1-\alpha)$ that the values of all contrasts $L = \sum_{i=1}^{p} C_i\beta_i$ $\left(\sum_{i=1}^{p} C_i = 0\right)$ simultaneously satisfy

$$\hat{L} - Ts\left(\frac{1}{2}\sum_{i=1}^{p}|C_i|\right) \leq L \leq \hat{L} + Ts\left(\frac{1}{2}\sum_{i=1}^{p}|C_i|\right),$$

where $\hat{L} = \sum_{i=1}^{p} C_i\hat{\beta}_i$, $\hat{\beta}_i$ is the maximum likelihood (or least squares) estimate of $\beta_i$, and $T = a\,q_{\alpha, p, \gamma}$, with $q_{\alpha, p, \gamma}$ being the upper $\alpha$ point of the distribution of the Studentized range.

Note that if $L$ is a contrast like $\beta_i - \beta_j$ $(i \neq j)$, then $\frac{1}{2}\sum_{i=1}^{p}|C_i| = 1$ and the variance of $\hat{\beta}_i$ is $\frac{\sigma^2}{n}$, so that $a = \frac{1}{\sqrt{n}}$ and the interval simplifies to

$$(\hat{\beta}_i - \hat{\beta}_j) - Ts \leq \beta_i - \beta_j \leq (\hat{\beta}_i - \hat{\beta}_j) + Ts,$$

where $T = \frac{q_{\alpha, p, \gamma}}{\sqrt{n}}$. Thus the maximum likelihood (or least squares) estimate $\hat{L} = \hat{\beta}_i - \hat{\beta}_j$ of $L = \beta_i - \beta_j$ is said to be significantly different from zero according to the T-criterion if the interval $(\hat{\beta}_i - \hat{\beta}_j - Ts, \; \hat{\beta}_i - \hat{\beta}_j + Ts)$ does not cover $\beta_i - \beta_j = 0$, i.e., if

$$|\hat{\beta}_i - \hat{\beta}_j| > Ts,$$

or, more generally, if $|\hat{L}| > Ts\left(\frac{1}{2}\sum_{i=1}^{p}|C_i|\right)$.

The steps involved in the testing are as follows:

Compute $\hat{L}$ or $\hat{\beta}_i - \hat{\beta}_j$, i.e., all possible pairwise differences.

Compare all the differences with $\dfrac{q_{\alpha, p, \gamma}\, s}{\sqrt{n}}\left(\dfrac{1}{2}\sum_{i=1}^{p}|C_i|\right)$.

If $|\hat{L}|$ or $|\hat{\beta}_i - \hat{\beta}_j| > Ts\left(\frac{1}{2}\sum_{i=1}^{p}|C_i|\right)$, then $\beta_i$ and $\beta_j$ are significantly different, where $T = \frac{q_{\alpha, p, \gamma}}{\sqrt{n}}$.

Tables for $T$ are available.


When the sample sizes are not equal, the Tukey-Kramer procedure suggests comparing $\hat{L}$ with

$$q_{\alpha, p, \gamma}\, s\,\sqrt{\frac{1}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}\left(\frac{1}{2}\sum_{i=1}^{p}|C_i|\right).$$

The Scheffé's method (S-method) of multiple comparison

The S-method generally gives shorter confidence intervals than the T-method. It can be used in a number of situations where the T-method is not applicable, e.g., when the sample sizes are not equal.

A set $\mathcal{L}$ of estimable functions $\{\psi\}$ is called a $p$-dimensional space of estimable functions if there exist $p$ linearly independent estimable functions $(\psi_1, \psi_2, \ldots, \psi_p)$ such that every $\psi$ in $\mathcal{L}$ is of the form $\psi = \sum_{i=1}^{p} C_i\psi_i$, where $C_1, C_2, \ldots, C_p$ are known constants. In other words, $\mathcal{L}$ is the set of all linear combinations of $\psi_1, \psi_2, \ldots, \psi_p$.

Under the assumption that in the parametric space $\Omega$, $Y \sim N(X\beta, \sigma^2 I)$ with rank$(X) = p$, $\beta = (\beta_1, \ldots, \beta_p)'$ and $X$ an $n \times p$ matrix, consider a $p$-dimensional space $\mathcal{L}$ of estimable functions generated by a set of $p$ linearly independent estimable functions $\{\psi_1, \psi_2, \ldots, \psi_p\}$.

For any $\psi \in \mathcal{L}$, let $\hat{\psi} = \sum_{i=1}^{n} C_i y_i$ be its least squares (or maximum likelihood) estimator, with

$$Var(\hat{\psi}) = \sigma^2\sum_{i=1}^{n} C_i^2 = \sigma_{\psi}^2 \; (\text{say})$$

and

$$\hat{\sigma}_{\psi}^2 = s^2\sum_{i=1}^{n} C_i^2,$$

where $s^2$ is the mean square due to error with $(n-p)$ degrees of freedom.

The statement of the S-method is as follows:

Under the parametric space $\Omega$, the probability is $(1-\alpha)$ that simultaneously for all $\psi \in \mathcal{L}$,

$$\hat{\psi} - S\hat{\sigma}_{\psi} \leq \psi \leq \hat{\psi} + S\hat{\sigma}_{\psi}, \quad \text{where the constant } S = \sqrt{p\,F_{1-\alpha}(p, n-p)}.$$


Method: For a given space $\mathcal{L}$ of estimable functions and confidence coefficient $(1-\alpha)$, the least squares (or maximum likelihood) estimate $\hat{\psi}$ of $\psi \in \mathcal{L}$ will be said to be significantly different from zero according to the S-criterion if the confidence interval

$$(\hat{\psi} - S\hat{\sigma}_{\psi}, \; \hat{\psi} + S\hat{\sigma}_{\psi})$$

does not cover $\psi = 0$, i.e., if $|\hat{\psi}| > S\hat{\sigma}_{\psi}$.

The S-method is less sensitive to violations of the assumptions of normality and homogeneity of variances.

Comparison of Tukey's and Scheffé's methods:

1. Tukey's method can be used only with equal sample sizes for all factor levels, but the S-method is applicable whether the sample sizes are equal or not.
2. Although Tukey's method is applicable for any general contrast, the procedure is more powerful when comparing simple pairwise differences rather than making more complex comparisons.
3. If only pairwise comparisons are of interest, and all factor levels have equal sample sizes, Tukey's method gives shorter confidence intervals and thus is more powerful.
4. In the case of comparisons involving general contrasts, Scheffé's method tends to give narrower confidence intervals and provides a more powerful test.
5. Scheffé's method is less sensitive to violations of the assumptions of normal distribution and homogeneity of variances.
