
Fall 2013

STATISTICS 479
Assignment #6 (40 points)
Instructions: Turn in the programs, the output, the plots and the written answers (when required to
do so) for each part of both questions.
1. A person's muscle mass is expected to decrease with age. To explore this relationship in women, a
nutritionist selected a random sample of women between the ages of 40 and 80. In the data table
below, x is the age in years and y is a measure of muscle mass. Use a SAS program to fit a simple
linear regression model to these data. Use SAS statements to produce the output needed to
answer each part. Attach your SAS output, but write your answer to each question in the form
discussed in the text, using information from the output.
x 71 64 43 67 56 73 68 56 76 65 45 58 45 53 49 78
y 82 91 100 68 87 73 78 80 65 84 116 76 97 100 105 77
Use the data and regression model $y = \beta_0 + \beta_1 x + \epsilon$ from Problem 1 in proc reg step(s) to produce
a Normal quantile plot of residuals, a residuals versus predicted values plot, a plot containing a
regression line, confidence limits, and prediction limits overlaid on a scatter plot of data, and a plot
of residuals versus the explanatory variable. Use the plots= option to select plots to be output.
Also use the anova table, the estimates table, and the output statistics table as necessary to answer
all questions given below.
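The quantities in the anova and estimates tables can be cross-checked by hand. The following is a minimal sketch in Python (standard library only), not the SAS program the assignment asks for; the function names `fit_line` and `anova_table` are hypothetical and chosen only for illustration. It computes the least squares estimates and the sums of squares that make up the analysis of variance table for the muscle mass data.

```python
# Age (x) and muscle mass (y) for the 16 women in Problem 1.
x = [71, 64, 43, 67, 56, 73, 68, 56, 76, 65, 45, 58, 45, 53, 49, 78]
y = [82, 91, 100, 68, 87, 73, 78, 80, 65, 84, 116, 76, 97, 100, 105, 77]


def fit_line(x, y):
    """Return the least squares intercept b0 and slope b1."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def anova_table(x, y):
    """Return the pieces of the ANOVA table for simple linear regression."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    ybar = sum(y) / n
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - ybar) ** 2 for fi in fitted)           # model sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
    sst = sum((yi - ybar) ** 2 for yi in y)                 # corrected total
    msr = ssr / 1
    mse = sse / (n - 2)
    return {
        "df_model": 1, "df_error": n - 2, "df_total": n - 1,
        "ssr": ssr, "sse": sse, "sst": sst,
        "msr": msr, "mse": mse, "F": msr / mse,
    }


b0, b1 = fit_line(x, y)
table = anova_table(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
for key, value in table.items():
    print(f"{key:>8}: {value:.4f}" if isinstance(value, float) else f"{key:>8}: {value}")
```

The p-value for the F-statistic is not computed here because the standard library has no F distribution; it should be read from the SAS output.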
(a) Using numbers from the SAS output, construct an analysis of variance table including
a column for the F-statistic to test $H_0: \beta_1 = 0$ against $H_a: \beta_1 \neq 0$ at $\alpha = .05$. State your
decision using the p-value.
(b) Give the least squares estimates of $\beta_0$ and $\beta_1$ and their standard errors, respectively. What is
your prediction equation?
(c) Use your prediction equation to estimate the expected loss in mean muscle mass associated
with a 5-year increase in age. Show your work clearly.
(d) What is the coefficient of determination for your regression equation? In your own words,
explain what this number means in terms of variability in muscle mass.
(e) Construct a 95% confidence interval for $\beta_1$. State in words what this interval says about the
expected decrease in muscle mass.
(f) Test the hypothesis $H_0: \beta_1 = 0$ against $H_a: \beta_1 \neq 0$ at $\alpha = .05$ using the p-value from the
estimates table. State your decision.
(g) Find the point estimate of the mean muscle mass for all women aged 60 years. Obtain a
95% confidence interval for the mean muscle mass for all women aged 60 years. Remember to
modify the data to include a case to make SAS compute these.
(h) The following plots must be produced as part of the graphical output from your SAS program
as described at the beginning of this question. Attach these plots to your solution and answer
the questions relating to them, if any.
i. Obtain a graph with plots of the 95% confidence interval and 95% prediction interval
curves for the fitted regression line overlaid on a scatter plot of the original data.
ii. Obtain a scatter plot of residuals against the x variable. Does this plot indicate whether
the straight line model is adequate or not? Explain.
iii. Obtain a normal probability plot of the residuals and a plot of residuals against the predicted
values. Do these plots indicate that any of the model assumptions are not plausible?
In particular, is the assumption of normal errors reasonable? Explain.
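The intervals requested in parts (e) and (g) and the limits plotted in part (h) follow the usual simple linear regression formulas, and can be checked against the SAS output. The sketch below is a Python cross-check under stated assumptions, not the required SAS program: the function name `intervals` is hypothetical, and the critical value 2.145 is the tabled $t_{0.975}$ quantile for $n - 2 = 14$ degrees of freedom, entered by hand because the standard library has no t distribution.

```python
import math

# Age (x) and muscle mass (y) for the 16 women in Problem 1.
x = [71, 64, 43, 67, 56, 73, 68, 56, 76, 65, 45, 58, 45, 53, 49, 78]
y = [82, 91, 100, 68, 87, 73, 78, 80, 65, 84, 116, 76, 97, 100, 105, 77]

# Assumption: two-sided 95% critical value of t with 14 df, taken from a t table.
T_CRIT = 2.145


def intervals(x, y, x0, t_crit):
    """Slope CI, mean-response CI, and prediction interval at x0."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                 # root mean squared error

    se_b1 = s / math.sqrt(sxx)                   # standard error of the slope
    fit = b0 + b1 * x0                           # point estimate at x0
    leverage = 1 / n + (x0 - xbar) ** 2 / sxx
    se_mean = s * math.sqrt(leverage)            # for the mean response
    se_pred = s * math.sqrt(1 + leverage)        # for a new observation

    return {
        "b1": b1,
        "fit": fit,
        "slope_ci": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
        "mean_ci": (fit - t_crit * se_mean, fit + t_crit * se_mean),
        "pred_int": (fit - t_crit * se_pred, fit + t_crit * se_pred),
    }


result = intervals(x, y, 60, T_CRIT)
for name, value in result.items():
    print(name, value)
```

The prediction interval is always wider than the confidence interval for the mean at the same x, which is why the prediction limits lie outside the confidence limits in the plot for part (h) i.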
2. It is reasonable to expect that the heavier an automobile is, the less efficient it will be, as reflected by the
miles per gallon (MPG) rating of the vehicle. The following data give MPG ratings (y) under city
driving conditions and the weight (x) of a random sample of 16 new vehicles.
Automobile ID   Weight (lbs.), x   MPG, y
A               2620               16.0
B               2875               21.0
C               2320               22.8
D               3215               21.4
E               3440               18.7
F               3510               19.1
G               3570               14.3
H               2790               24.4
I               3150               22.8
J               3240               19.2
K               3670               16.4
L               3730               17.3
M               2200               30.4
N               2465               25.5
O               1835               31.9
P               2045               26.3
Use a SAS program to fit a single variable regression model and obtain all residual case statistics
and diagnostic plots discussed in class. Also use proc sgplot to obtain a scatterplot of the data with
each point labeled using the vehicle ID.
(a) Are there any cases that are x-outliers? Explain.
(b) Are there any cases that are y-outliers? Explain.
(c) The Cook's D statistics for some of these cases are large. Explain the reasons for this by using
the fact that Cook's D is a product of functions of studentized residuals and hat diagonals.
(d) Suppose that the model is refitted after the vehicles labeled A and O are deleted from the data
set one at a time. Discuss the model fit for each of these compared with the fit of the original
model.
(e) Explain the effect of these cases on the model fit by using the appropriate case statistics output
from the model fit to the original data.
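The decomposition of Cook's D mentioned in part (c) can be made concrete. For a model with $p$ parameters (here $p = 2$), $D_i = \frac{r_i^2}{p} \cdot \frac{h_{ii}}{1 - h_{ii}}$, where $r_i$ is the internally studentized residual and $h_{ii}$ is the hat diagonal. The sketch below is a Python cross-check of the SAS case statistics, not the required SAS program; the function name `case_statistics` is hypothetical.

```python
import math

# Vehicle ID, weight in lbs (x), and city MPG (y) from Problem 2.
ids = list("ABCDEFGHIJKLMNOP")
weight = [2620, 2875, 2320, 3215, 3440, 3510, 3570, 2790,
          3150, 3240, 3670, 3730, 2200, 2465, 1835, 2045]
mpg = [16.0, 21.0, 22.8, 21.4, 18.7, 19.1, 14.3, 24.4,
       22.8, 19.2, 16.4, 17.3, 30.4, 25.5, 31.9, 26.3]


def case_statistics(x, y):
    """Return (residual, hat diagonal, studentized residual, Cook's D) per case."""
    n = len(x)
    p = 2  # intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - p))

    stats = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx       # hat diagonal (leverage)
        r = e / (s * math.sqrt(1 - h))           # internally studentized residual
        d = (r ** 2 / p) * (h / (1 - h))         # Cook's D as a product of two factors
        stats.append((e, h, r, d))
    return stats


stats = case_statistics(weight, mpg)
print(f"{'ID':<3}{'resid':>9}{'hat':>8}{'student':>9}{'CooksD':>8}")
for vid, (e, h, r, d) in zip(ids, stats):
    print(f"{vid:<3}{e:>9.3f}{h:>8.3f}{r:>9.3f}{d:>8.3f}")
```

A case can have a large Cook's D because its studentized residual is large, because its hat diagonal is large, or both, so comparing the two factors case by case shows which source dominates.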
Due Thursday, November 7, 2013