
ANOVA/ANCOVA Review

YIK LUN, KEI


2016/11/12

One-Way ANOVA (one categorical variable)


Reduced model: empty, because there is no relationship between Y and C
Full model: Y = β0 + β1·C

F-test: F = (SS(regression)/df_regression) / (SS(residual)/df_residual)

H0: β1 = 0 (there is no relationship between Y and C)
Ha: β1 ≠ 0 (there is some relationship between Y and C)
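To make the decomposition concrete, here is a minimal pure-Python sketch (mine, not from the original notes) that computes the one-way ANOVA F statistic from raw group data. Applied to the migraine pain data used in the three-level example below, it reproduces the F value R reports (≈ 11.906).

```python
# One-way ANOVA F statistic computed from first principles (illustrative sketch).
def one_way_anova_f(groups):
    all_y = [y for g in groups for y in g]
    n, k = len(all_y), len(groups)
    grand = sum(all_y) / n                      # grand mean
    means = [sum(g) / len(g) for g in groups]   # group means
    # SS(regression): between-group sum of squares, df = k - 1
    ss_reg = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # SS(residual): within-group sum of squares, df = n - k
    ss_res = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_reg / (k - 1)) / (ss_res / (n - k))

# Migraine pain scores by drug, taken from the example below
A = [4, 5, 4, 3, 2, 4, 3, 4, 4]
B = [6, 8, 4, 5, 4, 6, 5, 8, 6]
C = [6, 7, 6, 6, 7, 5, 6, 5, 5]
print(round(one_way_anova_f([A, B, C]), 3))  # 11.906
```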
Example (two-level categorical variable)
nyc <- read.csv("http://www.stat.tamu.edu/~sheather/book/docs/datasets/nyc.csv",header=T)
attach(nyc);East<-factor(East)
Full <- lm(Price ~ East)
anova(Full)
## Analysis of Variance Table
## 
## Response: Price
##            Df  Sum Sq Mean Sq F value  Pr(>F)  
## East        1   502.3  502.31  5.9906 0.01542 *
## Residuals 166 13919.2   83.85                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value is 0.01542, which is less than 0.05, we reject H0: β1 = 0. Therefore, there is a relationship between Price and East: the mean price when East = 0 differs from the mean price when East = 1.

Example (three-level categorical variable)


pain = c(4, 5, 4, 3, 2, 4, 3, 4, 4, 6, 8, 4, 5, 4, 6, 5, 8, 6, 6, 7, 6, 6, 7, 5, 6, 5, 5)
drug = c(rep("A",9), rep("B",9), rep("C",9));migraine = data.frame(pain,drug)
Full <- lm(pain ~ drug, data = migraine);anova(Full)

## Analysis of Variance Table
## 
## Response: pain
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## drug       2 28.222 14.1111  11.906 0.0002559 ***
## Residuals 24 28.444  1.1852                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

pairwise.t.test(pain, drug, p.adjust="bonferroni")


## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  pain and drug 
## 
##   A       B      
## B 0.00119 -      
## C 0.00068 1.00000
## 
## P value adjustment method: bonferroni

results = aov(pain ~ drug, data=migraine);TukeyHSD(results, conf.level = 0.95)


##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = pain ~ drug, data = migraine)
## 
## $drug
##          diff        lwr      upr     p adj
## B-A 2.1111111  0.8295028 3.392719 0.0011107
## C-A 2.2222222  0.9406139 3.503831 0.0006453
## C-B 0.1111111 -1.1704972 1.392719 0.9745173

print(model.tables(results, "means"), digits = 3)


## Tables of means
## Grand mean
##          
## 5.111111 
## 
##  drug 
## drug
##    A    B    C 
## 3.67 5.78 5.89

Since the p-value is 0.0003, which is less than 0.05, we reject H0: β1 = β2 = 0. Therefore, there is a relationship between pain and drug: at least one of the three population means differs from the others. In the pairwise t-test, the adjusted p-values for A-B and A-C are 0.00119 and 0.00068 respectively, so the population mean of A differs from that of B, and likewise from that of C. On the contrary, the p-value for B-C is 1, so we fail to reject the null that there is no difference in population means between B and C. The Tukey intervals tell the same story: with 95% family-wise confidence, the difference in population means between B and A lies in (0.83, 3.39), between C and A in (0.94, 3.50), and between C and B in (-1.17, 1.39); only the last interval contains 0.
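The Bonferroni adjustment that pairwise.t.test applies is easy to sketch: each raw p-value is multiplied by the number of comparisons m and capped at 1, which is why the B-C comparison reports exactly 1.00000. A minimal Python illustration (the raw p-values here are hypothetical, not the ones R computed):

```python
# Bonferroni adjustment: multiply each raw p-value by the number of
# comparisons m and cap at 1 (what p.adjust = "bonferroni" does in R).
def bonferroni(pvals):
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]

# Hypothetical raw p-values for three pairwise comparisons:
raw = [0.0004, 0.0002, 0.6]
print(bonferroni(raw))  # approximately [0.0012, 0.0006, 1.0]
```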

Two-Way ANOVA (two categorical variables C1 and C2)


Reduced model: Y = β0 + β1·X
Full model: Y = β0 + β1·X + β2·C + β3·X·C

H0: β2 = β3 = 0
Ha: β2 ≠ 0 or β3 ≠ 0

partial F-test: F = ((RSS(reduced) − RSS(full)) / (df_reduced − df_full)) / (RSS(full) / df_full)

df_full = n − (p + 1) and df_reduced = n − (p − k + 1),
where p = number of predictors in the full model and k = number of predictors omitted.
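As a sanity check on the formula, this short Python sketch (mine, not the author's) computes the partial F statistic directly from the residual sums of squares and residual degrees of freedom that anova() reports. Plugging in the values from the births comparison below reproduces F ≈ 43.04.

```python
# Partial F-test for nested linear models, computed from the RSS and
# residual df of each fit (the quantities anova() prints in R).
def partial_f(rss_reduced, df_reduced, rss_full, df_full):
    extra_ss = rss_reduced - rss_full   # improvement from the added terms
    extra_df = df_reduced - df_full     # number of added parameters
    return (extra_ss / extra_df) / (rss_full / df_full)

# Values from the births example: reduced = pounds ~ smoke,
# full = pounds ~ smoke + premie + smoke*premie
print(round(partial_f(394.30, 198, 273.97, 196), 2))  # 43.04
```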
Example (two categorical variables)
births <- read.delim("/Users/air/Desktop/ncbirth.txt",header=T);attach(births)
Full <- lm(pounds ~ smoke + premie + smoke * premie,data = births)
Reduced <- lm(pounds ~ smoke,data = births)
anova(Reduced,Full)
## Analysis of Variance Table
## 
## Model 1: pounds ~ smoke
## Model 2: pounds ~ smoke + premie + smoke * premie
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    198 394.30                                  
## 2    196 273.97  2    120.33 43.043 3.189e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model <- aov(pounds ~ smoke + premie + smoke * premie,data = births)


print(model.tables(model, "means"), digits = 3)
## Tables of means
## Grand mean
##         
## 7.21615 
## 
##  smoke 
## smoke
##    0    1 
## 7.32 6.63 
## 
##  premie 
## premie
##    0    1 
## 7.54 5.33 
## 
##  smoke:premie 
##      premie
## smoke 0    1   
##     0 7.64 5.44
##     1 6.94 4.69

summary(Full)
## 
## Call:
## lm(formula = pounds ~ smoke + premie + smoke * premie, data = births)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.810 -0.756  0.174  0.744  2.364 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7.63596    0.09785  78.040  < 2e-16 ***
## smoke        -0.69236    0.25590  -2.706  0.00742 ** 
## premie       -2.19476    0.25590  -8.577 2.94e-15 ***
## smoke:premie -0.05884    0.68618  -0.086  0.93175    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.182 on 196 degrees of freedom
## Multiple R-squared:  0.3249, Adjusted R-squared:  0.3146 
## F-statistic: 31.45 on 3 and 196 DF,  p-value: < 2.2e-16

interaction.plot(premie, smoke, pounds, type = "b", col = c(1:2),
                 main = "Interaction plot", xlab = "premie", ylab = "Mean of birthweight")

[Figure: "Interaction plot" — mean of birthweight (roughly 5.0 to 7.5) plotted against premie (0, 1), with separate lines for smoke = 0 and smoke = 1; the two lines are nearly parallel.]

Since the p-value for the partial F-test is very small, we reject H0: β2 = β3 = 0, i.e. that premie and the interaction term are jointly insignificant. In other words, the full model is preferred, and at least one category has a different population mean. From the table of means, the means for smoke = 0 vs. smoke = 1 and for premie = 0 vs. premie = 1 are clearly different, but the cell means differ little from what the additive effects alone would predict, so the interaction is not significant. This is confirmed by the regression summary of the full model, where smoke:premie has p = 0.93. The interaction plot shows the same thing: the means for smoke and premie do differ, but the two lines are roughly parallel and do not cross, again indicating that the interaction term is not significant.

The final model should therefore be pounds = 7.64 − 0.70·smoke − 2.20·premie.
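A quick way to see why the additive model suffices: its fitted values nearly reproduce the observed cell means. A small Python sketch of mine, using the coefficients reported in the summary above; the point is the close agreement with the cell-mean table (e.g. 4.73 fitted vs. 4.69 observed for smoking mothers of premature babies):

```python
# Fitted mean birthweight from the additive final model,
# pounds = 7.6372 - 0.7005*smoke - 2.2029*premie (coefficients from the summary above)
def predict_pounds(smoke, premie):
    return 7.6372 - 0.7005 * smoke - 2.2029 * premie

for smoke in (0, 1):
    for premie in (0, 1):
        print(smoke, premie, round(predict_pounds(smoke, premie), 2))
# Compare with the observed cell means 7.64, 5.44, 6.94, 4.69: the additive
# fit is close in every cell, consistent with the insignificant interaction.
```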


Final <- lm(pounds ~ smoke + premie,data = births)
summary(Final)
## 
## Call:
## lm(formula = pounds ~ smoke + premie, data = births)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8537 -0.7572  0.1728  0.7428  2.3628 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7.6372     0.0966  79.058  < 2e-16 ***
## smoke        -0.7005     0.2368  -2.958  0.00348 ** 
## premie       -2.2029     0.2368  -9.301  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.179 on 197 degrees of freedom
## Multiple R-squared:  0.3249, Adjusted R-squared:  0.3181 
## F-statistic: 47.41 on 2 and 197 DF,  p-value: < 2.2e-16

anova(Reduced,Final)
## Analysis of Variance Table
## 
## Model 1: pounds ~ smoke
## Model 2: pounds ~ smoke + premie
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    198 394.30                                  
## 2    197 273.98  1    120.32 86.515 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANCOVA (one continuous variable and one categorical variable)


Example (two-level categorical variable)
Full <- lm(Price ~ Food + East + Food:East, data = nyc)
Reduced <- lm(Price ~ Food, data = nyc)
anova(Reduced,Full)
## Analysis of Variance Table
## 
## Model 1: Price ~ Food
## Model 2: Price ~ Food + East + Food:East
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    166 8751.2                           
## 2    164 8620.9  2    130.36 1.2399 0.2921

Since the p-value is 0.2921, which is greater than 0.05, we fail to reject H0: β2 = β3 = 0. Therefore, the reduced model is preferred over the full model: East and the Food:East interaction add nothing once Food is in the model.

Example (three-level categorical variable with a continuous covariate)


cracker <- read.table("/Users/air/Desktop/cracker.txt",header=TRUE)
attach(cracker);treat <- factor(treat)
Full <- lm(sales ~ treat + x + treat:x)
Reduced1 <- lm(sales ~ x)
Reduced2 <- lm(sales ~ treat + x)
anova(Reduced1,Reduced2,Full)

## Analysis of Variance Table
## 
## Model 1: sales ~ x
## Model 2: sales ~ treat + x
## Model 3: sales ~ treat + x + treat:x
##   Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
## 1     13 455.72                                   
## 2     11  38.57  2    417.15 59.5536 6.457e-06 ***
## 3      9  31.52  2      7.05  1.0065    0.4032    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(Reduced2)
## 
## Call:
## lm(formula = sales ~ treat + x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4348 -1.2739 -0.3362  1.6710  2.4869 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  17.3534     2.5230   6.878 2.66e-05 ***
## treat2       -5.0754     1.2290  -4.130  0.00167 ** 
## treat3      -12.9768     1.2056 -10.764 3.53e-07 ***
## x             0.8986     0.1026   8.759 2.73e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.873 on 11 degrees of freedom
## Multiple R-squared:  0.9403, Adjusted R-squared:  0.9241 
## F-statistic: 57.78 on 3 and 11 DF,  p-value: 5.082e-07

According to the ANOVA table comparing the three nested models, model 2 is preferred to models 1 and 3: x and treat are significant, while the interaction is not.

Final Model: sales = 17.35 + 0.90·x − 5.08·treat2 − 12.98·treat3
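Because treat has three levels, R encodes it with two dummy variables (treat2 and treat3), with level 1 as the baseline. A small Python sketch of mine of the final fitted equation, using the coefficients from the summary above, makes the coding explicit:

```python
# Final ANCOVA fit: sales = 17.3534 + 0.8986*x - 5.0754*treat2 - 12.9768*treat3.
# treat is dummy-coded with level "1" as the baseline category.
COEF = {"1": 0.0, "2": -5.0754, "3": -12.9768}

def predict_sales(treat, x):
    return 17.3534 + 0.8986 * x + COEF[treat]

# Three stores with the same covariate value x = 10 under each treatment:
for t in ("1", "2", "3"):
    print(t, round(predict_sales(t, 10), 2))
```

At any fixed x, the fitted difference between two treatments is simply the difference of their dummy coefficients; e.g. treat 3 is predicted to sell 12.98 fewer units than treat 1 at the same x.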


results = aov(sales ~ treat + x);TukeyHSD(results, conf.level = 0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = sales ~ treat + x)
## 
## $treat
##      diff        lwr        upr     p adj
## 2-1  -2.2  -5.398655  0.9986549 0.1968712
## 3-1 -11.0 -14.198655 -7.8013451 0.0000042
## 3-2  -8.8 -11.998655 -5.6013451 0.0000358

pairwise.t.test(sales, treat, p.adjust="bonferroni")


## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  sales and treat 
## 
##   1     2    
## 2 1.000 -    
## 3 0.015 0.053
## 
## P value adjustment method: bonferroni

According to the pairwise t-test and TukeyHSD, the difference between treat 1 and treat 3, as well as the difference between treat 2 and treat 3, is significant. However, the difference between treat 1 and treat 2 is not significant, since its p-value is greater than 0.05.
