Statistics 504
10 February 2014
Homework 1
Anatomical Abnormalities Associated with Schizophrenia
Part 1:
Answers:
$\hat\beta_0 = 0.1986667$, the intercept of the intercept-only model for the paired differences (Unaffected minus Affected), i.e. the estimated mean difference in volume.
We reject the null hypothesis that there is no difference, on average, in the volume of the left hippocampus between schizophrenics and non-schizophrenics at $\alpha = 0.05$, with a t-value of 3.2289 and a corresponding p-value of 0.00606.
The 95% confidence interval for $\beta_0$ is (0.0667041, 0.3306292). Because the interval does not contain 0, it supports our rejection, at $\alpha = 0.05$, of the null hypothesis that there is no difference, on average, in the volume of the left hippocampus between schizophrenics and non-schizophrenics.
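As a check on the numbers above, the paired analysis can be reproduced by hand. This is a minimal sketch, assuming the 15 pairs of volumes copied from the case0202 listing printed in Part 2a; it computes the mean difference, its standard error, the t statistic and the 95% interval, and compares the statistic with R's paired t.test().

```r
# Left-hippocampus volumes for the 15 pairs, copied from the case0202
# listing printed in Part 2a
unaffected <- c(1.94, 1.44, 1.56, 1.58, 2.06, 1.66, 1.75, 1.77,
                1.78, 1.92, 1.25, 1.93, 2.04, 1.62, 2.08)
affected   <- c(1.27, 1.63, 1.47, 1.39, 1.93, 1.26, 1.71, 1.67,
                1.28, 1.85, 1.02, 1.34, 2.02, 1.59, 1.97)

d  <- unaffected - affected              # within-pair differences
n  <- length(d)
se <- sd(d) / sqrt(n)                    # standard error of the mean difference
t_stat <- mean(d) / se                   # one-sample t statistic on n - 1 df
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)
ci     <- mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * se

print(c(estimate = mean(d), se = se, t = t_stat, p = p_val))
print(ci)

# R's paired t-test performs the same computation
tt <- t.test(unaffected, affected, paired = TRUE)
print(tt$statistic)
```

The intercept-only lm() in Part 1 and the paired t-test are the same model, which is why their estimates, t-values and p-values coincide.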
R Input:
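install.packages("Sleuth3")   # only needed once
library(Sleuth3)
# Paired differences: unaffected minus affected in each pair
case0202$diff=case0202$Unaffected - case0202$Affected
# Intercept-only model: the intercept estimates the mean difference
modela=lm(diff~1, data=case0202)
summary(modela)
confint(modela)
# The same test written as a one-sample t-test on the differences
t.test(case0202$diff)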
R Output:
> case0202$diff=case0202$Unaffected - case0202$Affected
> modela=lm(diff~1, data=case0202)
> summary(modela)
Call:
lm(formula = diff ~ 1, data = case0202)
Residuals:
     Min       1Q   Median       3Q      Max
-0.38867 -0.14367 -0.08867  0.11633  0.47133
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.19867    0.06153   3.229  0.00606 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2383 on 14 degrees of freedom
> confint(modela)
                2.5 %    97.5 %
(Intercept) 0.0667041 0.3306292
> t.test(case0202$diff)
One Sample t-test
data: case0202$diff
t = 3.2289, df = 14, p-value = 0.006062
Aaron Vincent
Statistics 504
10 February 2014
Part 2a:
Answers:
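$\hat\beta_0 = 1.56$ (estimated mean volume of the affected group)
$\hat\beta_1 = 0.19867$ (estimated difference in mean volume, unaffected minus affected)
R codes the indicator variable statusunaffected as 1 for an unaffected subject and 0 for an affected subject: the levels of status are sorted alphabetically, so "affected" is the reference level.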
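The indicator coding can be inspected directly. This is a minimal sketch, assuming the same 30 volumes copied from the case0202 listing below; it shows that status is sorted alphabetically, that model.matrix() puts a 1 in statusunaffected for the unaffected rows, and that the intercept and slope equal the affected mean and the difference in means. The model fit2 and its relevel() call are an illustrative addition, not part of the assignment.

```r
# Same 30 volumes as in Part 2a (copied from the case0202 listing)
unaffected <- c(1.94, 1.44, 1.56, 1.58, 2.06, 1.66, 1.75, 1.77,
                1.78, 1.92, 1.25, 1.93, 2.04, 1.62, 2.08)
affected   <- c(1.27, 1.63, 1.47, 1.39, 1.93, 1.26, 1.71, 1.67,
                1.28, 1.85, 1.02, 1.34, 2.02, 1.59, 1.97)
volume <- c(unaffected, affected)
status <- factor(rep(c("unaffected", "affected"), each = 15))

print(levels(status))          # "affected" comes first, so it is the reference level

fit <- lm(volume ~ status)
X   <- model.matrix(fit)       # columns: (Intercept), statusunaffected
print(X[c(1, 15, 16, 30), ])   # rows 1-15 unaffected (1), rows 16-30 affected (0)

# Intercept = affected mean; statusunaffected = unaffected mean minus affected mean
print(coef(fit))
print(tapply(volume, status, mean))

# Illustrative variation: making "unaffected" the reference flips the sign
fit2 <- lm(volume ~ relevel(status, ref = "unaffected"))
print(coef(fit2))
```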
R Input:
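volume <- c(case0202$Unaffected, case0202$Affected)
status <- rep(c("unaffected", "affected"), each = 15)
newdata <- data.frame(volume, status)
# lm() already assumes equal variances; var.equal is a t.test() argument and lm() disregards it
model1 = lm(volume ~ status, data = newdata)
summary(model1)
confint(model1)
# Design matrix; the 15-row case0202 is recycled to fill the 30 rows
cbind(case0202, model.matrix(model1))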
R Output:
> volume<- c(case0202$Unaffected, case0202$Affected)
> status <- rep(c("unaffected", "affected"), each =15)
> newdata <- data.frame(volume,status)
> model1 = lm(volume ~ status, data=newdata, var.equal = TRUE)
Warning message:
In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
extra argument var.equal is disregarded.
> summary(model1)
Call:
lm(formula = volume ~ status, data = newdata, var.equal = TRUE)
Residuals:
     Min       1Q   Median       3Q      Max
-0.54000 -0.19367  0.01633  0.17883  0.46000
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       1.56000    0.07060   22.10   <2e-16 ***
statusunaffected  0.19867    0.09984    1.99   0.0565 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2734 on 28 degrees of freedom
Multiple R-squared:  0.1239, Adjusted R-squared:  0.0926
F-statistic: 3.959 on 1 and 28 DF, p-value: 0.05646
> confint(model1)
                        2.5 %    97.5 %
(Intercept)       1.415384455 1.7046155
statusunaffected -0.005850599 0.4031839
> cbind(case0202,model.matrix(model1))
   Unaffected Affected  diff (Intercept) statusunaffected
1        1.94     1.27  0.67           1                1
2        1.44     1.63 -0.19           1                1
3        1.56     1.47  0.09           1                1
4        1.58     1.39  0.19           1                1
5        2.06     1.93  0.13           1                1
6        1.66     1.26  0.40           1                1
7        1.75     1.71  0.04           1                1
8        1.77     1.67  0.10           1                1
9        1.78     1.28  0.50           1                1
10       1.92     1.85  0.07           1                1
11       1.25     1.02  0.23           1                1
12       1.93     1.34  0.59           1                1
13       2.04     2.02  0.02           1                1
14       1.62     1.59  0.03           1                1
15       2.08     1.97  0.11           1                1
16       1.94     1.27  0.67           1                0
17       1.44     1.63 -0.19           1                0
18       1.56     1.47  0.09           1                0
19       1.58     1.39  0.19           1                0
20       2.06     1.93  0.13           1                0
21       1.66     1.26  0.40           1                0
22       1.75     1.71  0.04           1                0
23       1.77     1.67  0.10           1                0
24       1.78     1.28  0.50           1                0
25       1.92     1.85  0.07           1                0
26       1.25     1.02  0.23           1                0
27       1.93     1.34  0.59           1                0
28       2.04     2.02  0.02           1                0
29       1.62     1.59  0.03           1                0
30       2.08     1.97  0.11           1                0
Part 2b:
Answers:
$\mu_a = \beta_0$
$\mu_u = \beta_0 + \beta_1$
$\mu_a - \mu_u = \beta_0 - (\beta_0 + \beta_1) = -\beta_1$
$\mu_u - \mu_a = (\beta_0 + \beta_1) - \beta_0 = \beta_1$
R Input:
N/A
R Output:
N/A
Part 2c:
Answers:
$\hat\mu_a = 1.56$
$\hat\mu_u = 1.758667$
$\hat\mu_a - \hat\mu_u = -0.1986667$
$\hat\mu_u - \hat\mu_a = 0.1986667$
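Each quantity reported by contrast() is a linear combination $L\beta$ of the two coefficients, with standard error $\sqrt{L V L^\top}$ where $V$ is the estimated covariance matrix of the coefficients. The sketch below computes these directly from coef() and vcov(), assuming the same case0202 volumes; the contrast matrix L and its row names are illustrative, and the check is only that the numbers agree with the contrast() output shown in this part.

```r
unaffected <- c(1.94, 1.44, 1.56, 1.58, 2.06, 1.66, 1.75, 1.77,
                1.78, 1.92, 1.25, 1.93, 2.04, 1.62, 2.08)
affected   <- c(1.27, 1.63, 1.47, 1.39, 1.93, 1.26, 1.71, 1.67,
                1.28, 1.85, 1.02, 1.34, 2.02, 1.59, 1.97)
volume <- c(unaffected, affected)
status <- factor(rep(c("unaffected", "affected"), each = 15))
fit <- lm(volume ~ status)

b <- coef(fit)     # (Intercept), statusunaffected
V <- vcov(fit)     # estimated covariance matrix of b

# Illustrative contrast matrix: each row is a linear combination of b
L <- rbind(mu_a      = c(1, 0),    # beta0
           mu_u      = c(1, 1),    # beta0 + beta1
           a_minus_u = c(0, -1),   # -beta1
           u_minus_a = c(0, 1))    # beta1

est <- drop(L %*% b)
se  <- setNames(sqrt(diag(L %*% V %*% t(L))), rownames(L))
tq  <- qt(0.975, df = df.residual(fit))   # 28 residual df
lower <- est - tq * se
upper <- est + tq * se
print(cbind(estimate = est, se = se, lower = lower, upper = upper))
```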
R Input:
install.packages("contrast")
library(contrast)
contrast(model1, list(status=c("affected", "unaffected")))
contrast(model1, list(status=c("affected", "unaffected")), list(status=c("unaffected","affected")))
aggregate(volume ~ status, newdata, mean)
R Output:
> contrast(model1, list(status=c("affected", "unaffected")))
lm model parameter contrast
 Contrast       S.E.    Lower    Upper     t df Pr(>|t|)
 1.560000 0.07059902 1.415384 1.704616 22.10 28        0
 1.758667 0.07059902 1.614051 1.903282 24.91 28        0
> contrast(model1, list(status=c("affected", "unaffected")), list(status=c("unaffected","affected")))
lm model parameter contrast
    Contrast      S.E.        Lower       Upper     t df Pr(>|t|)
1 -0.1986667 0.0998421 -0.403183932 0.005850599 -1.99 28   0.0565
2  0.1986667 0.0998421 -0.005850599 0.403183932  1.99 28   0.0565
> aggregate(volume ~ status, newdata, mean)
status volume
1 affected 1.560000
2 unaffected 1.758667
Part 2d:
Answers:
Summary Function:
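Here the p-value for the statusunaffected coefficient (0.0565) is greater than $\alpha = 0.05$, so we fail to reject the null hypothesis: this model does not find a statistically significant difference, on average, in the volume of the left hippocampus between schizophrenics and non-schizophrenics.
Contrast Function:
The p-values for $(\mu_a - \mu_u)$ and $(\mu_u - \mu_a)$ are also 0.0565, greater than $\alpha = 0.05$, so the contrasts lead to the same conclusion. Failing to reject does not show that there is no difference; it means this model does not provide enough evidence to detect one.
This conclusion differs from Part 1 because the two-sample model treats the 30 volumes as independent and ignores the pairing, so its standard error for the difference (0.0998) is larger than the paired standard error (0.0615).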
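The gap between Part 1 and Part 2 can be seen by computing both standard errors from the same data. This is a minimal sketch, assuming the case0202 volumes copied from the listing in Part 2a; the variable names are illustrative. The pooled two-sample t-test with equal variances is the same test as the lm() fit above.

```r
unaffected <- c(1.94, 1.44, 1.56, 1.58, 2.06, 1.66, 1.75, 1.77,
                1.78, 1.92, 1.25, 1.93, 2.04, 1.62, 2.08)
affected   <- c(1.27, 1.63, 1.47, 1.39, 1.93, 1.26, 1.71, 1.67,
                1.28, 1.85, 1.02, 1.34, 2.02, 1.59, 1.97)
n <- length(unaffected)
d <- unaffected - affected

# Pooled two-sample SE: treats the 30 volumes as independent
sp2 <- ((n - 1) * var(unaffected) + (n - 1) * var(affected)) / (2 * n - 2)
se_pooled <- sqrt(sp2 * (1 / n + 1 / n))
# Paired SE: uses the within-pair differences
se_paired <- sd(d) / sqrt(n)

pooled <- t.test(unaffected, affected, var.equal = TRUE)
paired <- t.test(unaffected, affected, paired = TRUE)
print(rbind(pooled = c(se = se_pooled, t = unname(pooled$statistic), p = pooled$p.value),
            paired = c(se = se_paired, t = unname(paired$statistic), p = paired$p.value)))

# var(d) = var(u) + var(a) - 2 cov(u, a): a positive within-pair covariance
# is what makes the paired SE smaller
print(cov(unaffected, affected))
```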
R Input:
volume <- c(case0202$Unaffected, case0202$Affected)
status <- rep(c("unaffected", "affected"), each = 15)
newdata <- data.frame(volume, status)
# lm() already assumes equal variances; var.equal is a t.test() argument and lm() disregards it
model1 = lm(volume ~ status, data = newdata)
summary(model1)
install.packages("contrast")   # only needed once
library(contrast)
contrast(model1, list(status=c("affected", "unaffected")), list(status=c("unaffected","affected")))
R Output:
> model1 = lm(volume ~ status, data=newdata, var.equal = TRUE)
Warning message:
In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
extra argument var.equal is disregarded.
> summary(model1)
Call:
lm(formula = volume ~ status, data = newdata, var.equal = TRUE)
Residuals:
     Min       1Q   Median       3Q      Max
-0.54000 -0.19367  0.01633  0.17883  0.46000
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       1.56000    0.07060   22.10   <2e-16 ***
statusunaffected  0.19867    0.09984    1.99   0.0565 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2734 on 28 degrees of freedom
Multiple R-squared:  0.1239, Adjusted R-squared:  0.0926
F-statistic: 3.959 on 1 and 28 DF, p-value: 0.05646
> contrast(model1, list(status=c("affected", "unaffected")), list(status=c("unaffected","affected")))
lm model parameter contrast
    Contrast      S.E.        Lower       Upper     t df Pr(>|t|)
1 -0.1986667 0.0998421 -0.403183932 0.005850599 -1.99 28   0.0565
2  0.1986667 0.0998421 -0.005850599 0.403183932  1.99 28   0.0565
R Output:
> model1=lm(Metabol~Sex + Gastric + Sex:Gastric, data=case1101)
> summary(model1)
Call:
lm(formula = Metabol ~ Sex + Gastric + Sex:Gastric, data = case1101)
Residuals:
    Min      1Q  Median      3Q     Max
-2.4427 -0.6111 -0.0326  0.5436  3.8759
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      -0.1973     0.8022  -0.246   0.8075
SexMale          -0.9885     1.0724  -0.922   0.3645
Gastric           0.8369     0.4839   1.730   0.0947 .
SexMale:Gastric   1.5069     0.5591   2.695   0.0118 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.207 on 28 degrees of freedom
Multiple R-squared:  0.8137, Adjusted R-squared:  0.7938
F-statistic: 40.77 on 3 and 28 DF, p-value: 2.386e-10
   SexMale:Gastric
1              0.0
2              0.0
3              0.0
4              0.0
5              0.0
6              0.0
7              0.0
8              0.0
9              0.0
10             0.0
11             0.0
12             0.0
13             0.0
14             0.0
15             0.0
16             0.0
17             0.0
18             0.0
19             1.3
20             1.2
21             1.4
22             1.3
23             2.7
24             1.1
25             2.3
26             2.7
27             1.4
28             2.2
29             2.0
30             2.8
31             5.2
32             4.1
> library(visreg)
> visreg(model1, "Sex", "Gastric")
> visreg(model1, "Gastric", "Sex")
> library(contrast)
> contrast(model1, list(Gastric = 1, Sex = c("Male","Female")),list(Gastric = 0, Sex = c("Male","Female")),cnames = c("Male","Female"))
lm model parameter contrast
        Contrast      S.E.     Lower    Upper    t df Pr(>|t|)
Male   2.3438714 0.2801480  1.770014 2.917729 8.37 28   0.0000
Female 0.8369478 0.4838925 -0.154261 1.828157 1.73 28   0.0947
Part 2:
Answers:
Estimated difference in mean Metabol between males and females (male minus female) at each level of the gastric alcohol measurement:
Gastric = 0: -0.9884969 (the SexMale coefficient, i.e. the difference between the male and female intercepts)
Gastric = 1:  0.5184267
Gastric = 2:  2.0253503
Gastric = 3:  3.5322739
Gastric = 4:  5.0391975
Gastric = 5:  6.5461211
Each one-unit increase in Gastric increases the difference by the SexMale:Gastric coefficient (about 1.5069). The difference is statistically significant (p < 0.01) at Gastric levels 2 through 5, but not at levels 0 and 1.
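Because the model is linear in Gastric, each male-minus-female difference is the SexMale coefficient plus Gastric times the SexMale:Gastric coefficient. A quick check, assuming the rounded (4-decimal) coefficients from summary(model1), so agreement with the contrast() output is only to about three decimals:

```r
# SexMale and SexMale:Gastric estimates from summary(model1), 4 decimals
b_male  <- -0.9885   # male intercept minus female intercept
b_inter <-  1.5069   # male slope minus female slope
gastric <- 0:5
diff_mf <- b_male + b_inter * gastric   # male minus female mean Metabol
print(data.frame(Gastric = gastric, male_minus_female = diff_mf))
```

The standard errors in the contrast() output grow with distance from the centre of the Gastric data, which is why the widest intervals are at Gastric = 0 and Gastric = 5.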
R Input:
contrast(model1, list(Sex = "Male", Gastric = c(0,1,2,3,4,5)),list(Sex = "Female", Gastric = c(0,1,2,3,4,5)), cnames = c("0.0", "1", "2","3","4","5"))
R Output:
> contrast(model1, list(Sex = "Male", Gastric = c(0,1,2,3,4,5)),list(Sex = "Female", Gastric = c(0,1,2,3,4,5)), cnames = c("0.0", "1", "2","3","4","5"))
lm model parameter contrast
      Contrast      S.E.     Lower     Upper     t df Pr(>|t|)
0.0 -0.9884969 1.0723910 -3.185190  1.208197 -0.92 28   0.3645
1    0.5184267 0.6175524 -0.746572  1.783425  0.84 28   0.4083
2    2.0253503 0.4878412  1.026053  3.024648  4.15 28   0.0003
3    3.5322739 0.8484555  1.794292  5.270256  4.16 28   0.0003
4    5.0391975 1.3516783  2.270410  7.807985  3.73 28   0.0009
5    6.5461211 1.8866534  2.681487 10.410755  3.47 28   0.0017
Part 3:
Answers:
95% prediction intervals for Metabol at each combination of Sex and Gastric level:
Sex     Gastric        fit        lwr       upr
Male          0 -1.1857659 -4.0565270  1.684995
Female        0 -0.1972691 -3.1664867  2.771949
Male          1  1.1581055 -1.5025839  3.818795
Female        1  0.6396787 -1.9589775  3.238335
Male          2  3.5019768  0.9376268  6.066327
Female        2  1.4766265 -1.1030566  4.056310
Male          3  5.8458482  3.2514046  8.440292
Female        3  2.3135743 -0.6055874  5.232736
Male          4  8.1897196  5.4429016 10.936538
Female        4  3.1505221 -0.3641948  6.665239
Male          5 10.5335910  7.5306750 13.536507
Female        5  3.9874699 -0.2728064  8.247746
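The fit column is just the fitted regression line evaluated at each grid point, and each interval is fit ± t·sqrt(s² + SE(fit)²). The sketch below reproduces the fits from the rounded coefficients of summary(model1) and the two Gastric = 0 intervals. It assumes s = 1.207 (the residual standard error of model1) and that SE(fit) at Gastric = 0 equals the standard error of the SexMale and SexFemale coefficients in the no-intercept model dat2 (0.7117 and 0.8022), since those coefficients are exactly the fitted means at Gastric = 0. Because of rounding, agreement is only to a few decimals.

```r
# Coefficients of model1 as printed by summary(model1) (4 decimals)
b0 <- -0.1973; b_male <- -0.9885; b_gas <- 0.8369; b_int <- 1.5069

grid <- expand.grid(Sex = c("Male", "Female"), Gastric = 0:5)
male <- as.numeric(grid$Sex == "Male")
grid$fit <- b0 + b_male * male + b_gas * grid$Gastric + b_int * male * grid$Gastric
print(grid)

# Prediction intervals at Gastric = 0 (assumed inputs described above)
s <- 1.207
se_fit0 <- c(Male = 0.7117, Female = 0.8022)
fit0 <- c(Male = b0 + b_male, Female = b0)
half <- qt(0.975, df = 28) * sqrt(s^2 + se_fit0^2)
pred0 <- cbind(fit = fit0, lwr = fit0 - half, upr = fit0 + half)
print(pred0)
```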
R Input:
case1101.new=expand.grid(Sex=c("Male","Female"),Gastric=c(0,1,2,3,4,5))
case1101.predict=predict(model1, case1101.new, interval="pred")
cbind(case1101.new, case1101.predict)
R Output:
> case1101.new=expand.grid(Sex=c("Male","Female"),Gastric=c(0,1,2,3,4,5))
> case1101.predict=predict(model1, case1101.new, interval="pred")
> cbind(case1101.new, case1101.predict)
      Sex Gastric        fit        lwr       upr
1    Male       0 -1.1857659 -4.0565270  1.684995
2  Female       0 -0.1972691 -3.1664867  2.771949
3    Male       1  1.1581055 -1.5025839  3.818795
4  Female       1  0.6396787 -1.9589775  3.238335
5    Male       2  3.5019768  0.9376268  6.066327
6  Female       2  1.4766265 -1.1030566  4.056310
7    Male       3  5.8458482  3.2514046  8.440292
8  Female       3  2.3135743 -0.6055874  5.232736
9    Male       4  8.1897196  5.4429016 10.936538
10 Female       4  3.1505221 -0.3641948  6.665239
11   Male       5 10.5335910  7.5306750 13.536507
12 Female       5  3.9874699 -0.2728064  8.247746
Part 4:
Answers:
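The test of the interaction between sex and the gastric alcohol measurement (SexMale:Gastric) is the test of the null hypothesis that the regression lines for men and women are parallel.
Model.full: interaction t-value = 2.695, p-value = 0.0118
ANOVA (model.null vs. model.full): F-statistic = 7.2635, p-value = 0.01176
t-value squared = 2.695^2 = 7.263, which matches the F-statistic up to rounding, and the two p-values agree.
The Model.null values (t-value = 7.352, p-value = 4.24e-08) belong to the Gastric slope in the model without the interaction; they are not a test of the interaction.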
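For a single added term, the extra-sum-of-squares F-statistic equals the square of that term's t-statistic. The sketch below recomputes F from the two residual sums of squares in the ANOVA table (copied from the output, so rounded) and checks that the two-sided t p-value and the F p-value are the same quantity.

```r
# Values copied from the anova() table and summary(model.full) output (rounded)
rss_null <- 51.400; df_null <- 29   # Metabol ~ Sex + Gastric
rss_full <- 40.813; df_full <- 28   # adds Sex:Gastric
F_stat <- ((rss_null - rss_full) / (df_null - df_full)) / (rss_full / df_full)
t_inter <- 2.695                    # t value for SexMale:Gastric

print(c(F = F_stat, t_squared = t_inter^2))
p_F <- pf(F_stat, df_null - df_full, df_full, lower.tail = FALSE)
p_t <- 2 * pt(-abs(t_inter), df = df_full)
print(c(p_F = p_F, p_t = p_t))
```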
R Input:
model.full=lm(Metabol~Sex + Gastric + Sex:Gastric, data=case1101)
model.null=lm(Metabol~Sex+Gastric, data=case1101)
anova(model.null, model.full)
summary(model.null)
summary(model.full)
R Output:
> model.full=lm(Metabol~Sex + Gastric + Sex:Gastric, data=case1101)
> model.null=lm(Metabol~Sex+Gastric, data=case1101)
> anova(model.null, model.full)
Analysis of Variance Table
Model 1: Metabol ~ Sex + Gastric
Model 2: Metabol ~ Sex + Gastric + Sex:Gastric
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)
1     29 51.400
2     28 40.813  1    10.587 7.2635 0.01176 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> summary(model.null)
Call:
lm(formula = Metabol ~ Sex + Gastric, data = case1101)
Residuals:
    Min      1Q  Median      3Q     Max
-2.2779 -0.6328 -0.0966  0.5783  4.5703
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.9466     0.5198  -3.745 0.000796 ***
SexMale       1.6174     0.5114   3.163 0.003649 **
Gastric       1.9656     0.2674   7.352 4.24e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.331 on 29 degrees of freedom
Multiple R-squared:  0.7654, Adjusted R-squared:  0.7492
F-statistic: 47.31 on 2 and 29 DF, p-value: 7.41e-10
> summary(model.full)
Call:
lm(formula = Metabol ~ Sex + Gastric + Sex:Gastric, data = case1101)
Residuals:
    Min      1Q  Median      3Q     Max
-2.4427 -0.6111 -0.0326  0.5436  3.8759
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      -0.1973     0.8022  -0.246   0.8075
SexMale          -0.9885     1.0724  -0.922   0.3645
Gastric           0.8369     0.4839   1.730   0.0947 .
SexMale:Gastric   1.5069     0.5591   2.695   0.0118 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.207 on 28 degrees of freedom
Part 5:
Answers:
The model built from the hand-coded variables reproduces model1 exactly: x1, x2 and x3 take the places of SexMale, Gastric and SexMale:Gastric, and the estimates, standard errors and fit statistics are identical.
> summary(model5)
Call:
lm(formula = Metabol ~ x1 + x2 + x3, data = case1101)
Residuals:
    Min      1Q  Median      3Q     Max
-2.4427 -0.6111 -0.0326  0.5436  3.8759
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.1973     0.8022  -0.246   0.8075
x1           -0.9885     1.0724  -0.922   0.3645
x2            0.8369     0.4839   1.730   0.0947 .
x3            1.5069     0.5591   2.695   0.0118 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.207 on 28 degrees of freedom
Multiple R-squared:  0.8137, Adjusted R-squared:  0.7938
F-statistic: 40.77 on 3 and 28 DF,  p-value: 2.386e-10
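The equivalence holds for any response, because the hand-coded columns are exactly the columns R builds for Sex + Gastric + Sex:Gastric. A sketch that checks this, assuming the Sex and Gastric values printed in this part's R output (18 females followed by 14 males); the response y is simulated and is not the case1101 Metabol data.

```r
# Sex and Gastric for the 32 subjects in row order, read from the R output of
# this part: 18 females followed by 14 males
sex <- factor(rep(c("Female", "Male"), times = c(18, 14)))
gastric <- c(1.0, 1.6, 1.5, 2.2, 1.1, 1.2, 0.9, 0.8, 1.5, 0.9, 1.6, 1.7,
             1.7, 2.2, 0.8, 2.0, 3.0, 2.2, 1.3, 1.2, 1.4, 1.3, 2.7, 1.1,
             2.3, 2.7, 1.4, 2.2, 2.0, 2.8, 5.2, 4.1)

# Hypothetical response: simulated for illustration, NOT case1101$Metabol
set.seed(1)
male <- as.numeric(sex == "Male")
y <- -0.2 - 1.0 * male + 0.8 * gastric + 1.5 * male * gastric +
     rnorm(32, sd = 1.2)

# Hand-coded columns, as in model5
x1 <- male                 # male indicator
x2 <- gastric              # gastric alcohol measurement
x3 <- x1 * x2              # interaction column

fit_factor <- lm(y ~ sex + gastric + sex:gastric)   # R builds the columns
fit_manual <- lm(y ~ x1 + x2 + x3)                  # columns built by hand

print(cbind(factor = coef(fit_factor), manual = coef(fit_manual)))
```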
R Input:
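# Integer factor codes: 1 = Female, 2 = Male. as.integer() is used because,
# since R 4.1.0, c() applied to a factor returns a factor rather than its codes.
x=as.integer(case1101$Sex)
x
x1=ifelse(x==1, 0, 1)   # male indicator: 0 = Female, 1 = Male
x1
x2=case1101$Gastric     # gastric alcohol measurement
x2
x3=x1*x2                # interaction of the male indicator with Gastric
x3
model5=lm(Metabol~x1+x2+x3, data=case1101)
summary(model5)
summary(model1)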
R Output:
> x=c(case1101$Sex)
> x
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> x1=ifelse(x=="1", 0, 1)
> x1
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> x2=c(case1101$Gastric)
> x2
 [1] 1.0 1.6 1.5 2.2 1.1 1.2 0.9 0.8 1.5 0.9 1.6 1.7 1.7 2.2 0.8 2.0 3.0 2.2 1.3 1.2
[21] 1.4 1.3 2.7 1.1 2.3 2.7 1.4 2.2 2.0 2.8 5.2 4.1
> x3=c(x1*x2)
> x3
 [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.3 1.2
[21] 1.4 1.3 2.7 1.1 2.3 2.7 1.4 2.2 2.0 2.8 5.2 4.1
> model5=lm(Metabol~x1+x2+x3, data=case1101)
> summary(model5)
Call:
lm(formula = Metabol ~ x1 + x2 + x3, data = case1101)
Residuals:
    Min      1Q  Median      3Q     Max
-2.4427 -0.6111 -0.0326  0.5436  3.8759
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.1973     0.8022  -0.246   0.8075
x1           -0.9885     1.0724  -0.922   0.3645
x2            0.8369     0.4839   1.730   0.0947 .
x3            1.5069     0.5591   2.695   0.0118 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.207 on 28 degrees of freedom
Multiple R-squared: 0.8137, Adjusted R-squared: 0.7938
F-statistic: 40.77 on 3 and 28 DF, p-value: 2.386e-10
> summary(model1)
Call:
lm(formula = Metabol ~ Sex + Gastric + Sex:Gastric, data = case1101)
Residuals:
    Min      1Q  Median      3Q     Max
-2.4427 -0.6111 -0.0326  0.5436  3.8759
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      -0.1973     0.8022  -0.246   0.8075
SexMale          -0.9885     1.0724  -0.922   0.3645
Gastric           0.8369     0.4839   1.730   0.0947 .
SexMale:Gastric   1.5069     0.5591   2.695   0.0118 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.207 on 28 degrees of freedom
Multiple R-squared: 0.8137, Adjusted R-squared: 0.7938
F-statistic: 40.77 on 3 and 28 DF, p-value: 2.386e-10
Part 6:
Answers:
Model dat1: $x_{i1}$ = indicator that the ith subject is male
$x_{i2}$ = interaction of the gastric alcohol measurement with the indicator that the ith subject is female
$x_{i3}$ = interaction of the gastric alcohol measurement with the indicator that the ith subject is male
Males: slope = 2.3439, intercept = -1.1858
Case-wise males: $E(\text{Metabol}) = -0.1973 + (-0.9885) + 2.3439 \times \text{Gastric} = -1.1858 + 2.3439 \times \text{Gastric}$
Females: slope = 0.8369, intercept = -0.1973
Case-wise females: $E(\text{Metabol}) = -0.1973 + 0.8369 \times \text{Gastric}$
Model dat2: $x_{i1}$ = indicator that the ith subject is female
$x_{i2}$ = indicator that the ith subject is male
$x_{i3}$ = interaction of the gastric alcohol measurement with the indicator that the ith subject is female
$x_{i4}$ = interaction of the gastric alcohol measurement with the indicator that the ith subject is male
Males: slope = 2.3439, intercept = -1.1858
Case-wise males: $E(\text{Metabol}) = -1.1858 + 2.3439 \times \text{Gastric}$
Females: slope = 0.8369, intercept = -0.1973
Case-wise females: $E(\text{Metabol}) = -0.1973 + 0.8369 \times \text{Gastric}$
Both models describe the same two fitted lines. In dat2 the SexMale coefficient (-1.1858) is already the male intercept, so -0.1973 is not added to it.
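The dat2 coefficients are fixed linear combinations of the model1 coefficients. A small sketch, assuming the rounded model1 estimates from summary(model1); the mapping matrix A is illustrative, and the result should match the dat2 output to rounding.

```r
# model1 estimates (Intercept, SexMale, Gastric, SexMale:Gastric), 4 decimals
b <- c(-0.1973, -0.9885, 0.8369, 1.5069)

# Illustrative mapping matrix: row k gives dat2's k-th coefficient
A <- rbind(SexFemale           = c(1, 0, 0, 0),  # female intercept
           SexMale             = c(1, 1, 0, 0),  # male intercept
           `SexFemale:Gastric` = c(0, 0, 1, 0),  # female slope
           `SexMale:Gastric`   = c(0, 0, 1, 1))  # male slope
dat2_coef <- drop(A %*% b)
print(round(dat2_coef, 4))
```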
R Input:
dat1=lm(Metabol ~ Sex + Sex:Gastric, data=case1101)
dat2=lm(Metabol ~ Sex + Sex:Gastric - 1, data=case1101)
summary(dat1)
summary(dat2)
R Output:
Call:
lm(formula = Metabol ~ Sex + Sex:Gastric - 1, data = case1101)
Residuals:
    Min      1Q  Median      3Q     Max
-2.4427 -0.6111 -0.0326  0.5436  3.8759
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
SexFemale          -0.1973     0.8022  -0.246   0.8075
SexMale            -1.1858     0.7117  -1.666   0.1068
SexFemale:Gastric   0.8369     0.4839   1.730   0.0947 .
SexMale:Gastric     2.3439     0.2801   8.367 4.22e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.207 on 28 degrees of freedom
Multiple R-squared:  0.8997, Adjusted R-squared:  0.8853
F-statistic: 62.77 on 4 and 28 DF, p-value: 1.423e-13
Part 7:
Answers:
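dat2=lm(Metabol ~ Sex + Gastric: -1, data=case1101)
As the output below shows, the fitted model contains no Gastric term: it has only the two sex indicators and no intercept, so its coefficients are the mean Metabol for females (1.1000) and males (4.1214).
R Input:
dat2=lm(Metabol ~ Sex + Gastric: -1, data=case1101)
summary(dat2)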
R Output:
> summary(dat2)
Call:
lm(formula = Metabol ~ Sex + Gastric:-1, data = case1101)
Residuals:
    Min      1Q  Median      3Q     Max
-3.8214 -1.0304 -0.4607  0.4250  8.1786
Coefficients:
          Estimate Std. Error t value Pr(>|t|)
SexFemale   1.1000     0.5221   2.107   0.0436 *
SexMale     4.1214     0.5920   6.962 9.84e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.215 on 30 degrees of freedom
Multiple R-squared:  0.6381, Adjusted R-squared:  0.614
F-statistic: 26.45 on 2 and 30 DF, p-value: 2.389e-07