
USU 7610 – Research Design and Analysis II Fall 2008

EXAM I

Name: ________________________ Oct 20 4:30PM, 2008


SSN (last 3 digits): _____________

Please attach this sheet to your answer sheets.

1. You will receive a data set called “sesame_7610.sav.” This data set stems from a study
evaluating the effects of the TV series “Sesame Street” on the cognitive development of
children. For the following analyses, please use these predictor variables:

• preknow_body = Knowledge of body parts before exposure to the series
• preknow_letters = Pre-knowledge of letters
• preknow_forms = Pre-knowledge of forms
• preknow_numbers = Pre-knowledge of numbers
• MA_Peabody = Mental age assessed with the Peabody intelligence test
• MA_times_prebody = Index combining pre-knowledge of body parts with the
Peabody intelligence test score

The dependent variable is “postknow_body”; that is, we want to predict knowledge
of body parts, assessed in the same children after repeated exposure to the Sesame
Street series.

Your task: Explore which combination of variables provides the best and most
parsimonious prediction of the dependent variable “postknow_body.”
For comparison purposes, perform three types of regression, print the SUMMARY
TABLE for each, and answer the following questions:
5 points (a) Which variables are retained when you perform a forward regression? Why
are none of the remaining variables retained (i.e., what criterion would they
need to meet, and fail to meet, in order to be entered into the regression
equation)?
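One way to see the entry criterion at work is the sketch below. It assumes Python with NumPy and SciPy, uses synthetic data rather than sesame_7610.sav, and the helper names (`rss`, `forward_select`) are hypothetical. It mimics the SPSS default for forward selection, which enters a predictor only if the probability of its F-to-enter is below .05; a variable that is never selected is one whose partial F, given the predictors already in the equation, never reaches that level.

```python
import numpy as np
from scipy import stats


def rss(X, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def forward_select(X, y, names, p_enter=0.05):
    """Add, one at a time, the predictor with the smallest p of F-to-enter;
    stop when no remaining predictor has p < p_enter."""
    n = len(y)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        rss_old = rss(X[:, selected], y)
        best_j, best_p = None, 1.0
        for j in remaining:
            cols = selected + [j]
            rss_new = rss(X[:, cols], y)
            df_resid = n - len(cols) - 1
            # Partial F for adding predictor j to the current equation
            f_enter = (rss_old - rss_new) / (rss_new / df_resid)
            p = stats.f.sf(f_enter, 1, df_resid)
            if p < best_p:
                best_j, best_p = j, p
        if best_j is None or best_p >= p_enter:
            break  # no candidate meets the entry criterion
        selected.append(best_j)
        remaining.remove(best_j)
    return [names[j] for j in selected]


# Hypothetical data: y depends on x0 and x1; x2 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=200)
chosen = forward_select(X, y, ["x0", "x1", "x2"])
print(chosen)
```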
5 points (b) - Which variables are retained if you perform a backward regression?
- Do we get the same results?
- If not, why do we get a different selection of variables with “backward” than with
“forward”?
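A matching sketch of backward elimination, under the same assumptions (Python with NumPy and SciPy, synthetic data, hypothetical helper names). It starts from the full model and removes the weakest predictor as long as its probability of F-to-remove is at least .10, the SPSS default. Because each predictor is judged in the context of all others still in the model, and because the removal criterion is more lenient than the entry criterion, backward and forward selection can retain different sets of variables.

```python
import numpy as np
from scipy import stats


def rss(X, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def backward_eliminate(X, y, names, p_remove=0.10):
    """Start with all predictors; repeatedly drop the one with the largest
    p of F-to-remove while that p is >= p_remove."""
    n = len(y)
    selected = list(range(X.shape[1]))
    while selected:
        rss_full = rss(X[:, selected], y)
        df_resid = n - len(selected) - 1
        worst_j, worst_p = None, -1.0
        for j in selected:
            reduced = [k for k in selected if k != j]
            # Partial F for removing predictor j from the current equation
            f_remove = (rss(X[:, reduced], y) - rss_full) / (rss_full / df_resid)
            p = stats.f.sf(f_remove, 1, df_resid)
            if p > worst_p:
                worst_j, worst_p = j, p
        if worst_p < p_remove:
            break  # every remaining predictor still meets the criterion
        selected.remove(worst_j)
    return [names[j] for j in selected]


# Hypothetical data: y depends on x0 and x1; x2 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=200)
kept = backward_eliminate(X, y, ["x0", "x1", "x2"])
print(kept)
```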
5 points (c) Perform the same regression with the “enter” method, entering all predictors in
a single block. Compare the R² of this analysis with the R² of both (a) and (b).
How much additional variance do the predictors that were NOT retained in the
forward and backward regressions account for in the results of (c)?
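The comparison in (c) comes down to a difference of two R² values. A minimal sketch, assuming NumPy and hypothetical synthetic data in place of the Sesame Street file: ΔR² = R²(enter, all predictors) minus R²(retained subset) is the share of variance the excluded predictors add. Because the models are nested, this difference cannot be negative.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))


rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))  # hypothetical predictors
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(size=150)

r2_enter = r_squared(X, y)            # all four predictors in one block
r2_retained = r_squared(X[:, :2], y)  # only the subset a selection method kept
delta = r2_enter - r2_retained        # variance added by the excluded predictors
print(round(r2_enter, 3), round(r2_retained, 3), round(delta, 4))
```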
10 points (d) For the analysis performed in (c), show that the multiple correlation you
obtained in this analysis corresponds to the bivariate correlation between Y and Ŷ.
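A quick numerical check of the identity in (d), assuming NumPy and synthetic data: since OLS residuals are uncorrelated with Ŷ, cov(Y, Ŷ) = var(Ŷ), so r(Y, Ŷ) = SD(Ŷ)/SD(Y) = √R², which is the multiple R. The sketch computes both quantities.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 3))  # hypothetical predictors
y = X @ np.array([1.5, -0.8, 0.3]) + rng.normal(size=120)

A = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta
resid = y - y_hat

r_squared = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
multiple_r = np.sqrt(r_squared)            # multiple R from the model summary
r_y_yhat = np.corrcoef(y, y_hat)[0, 1]     # bivariate r between Y and Y-hat
print(round(multiple_r, 6), round(r_y_yhat, 6))
```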
2. The following output was produced from a study on psychological outcomes of loss of
a parent during childhood. The dependent variable is called “Depress_T,” indicating
the level of depression on a standardized depression scale. Predictors of depression are:
• PV_Loss = Perceived vulnerability to additional loss in the future
• Ageatloss = Age at loss
• Gender (1 = male, 2 = female)
• Supp_tot = social support

Here is the partial output:

Descriptive Statistics

                                                    Mean   Std. Deviation     N
depress_T    Depression                          61.5333          9.12827   135
pv_loss      Perceived vulnerability to loss     22.1407          6.66934   135
ageatlosa    Age at loss                         11.9333          4.49411   135
Gender                                            1.6370           .48265   135
supp_total   Social Support score                65.5037         13.11082   135

Model Summary (15 points)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .567(a)   ____       ____                ____
a Predictors: (Constant), supp_total Social Support score, pv_loss Perceived vulnerability to loss,
ageatlosa Age at loss, Gender

ANOVA(b) (10 points)

Model               Sum of Squares    df    Mean Square   F      Sig.
1   Regression            3591.343     4    ____          ____   ____
    Residual              7574.257   130    ____
    Total                11165.600   134
a Predictors: (Constant), supp_total Social Support score, pv_loss Perceived vulnerability to loss,
ageatlosa Age at loss, Gender
b Dependent Variable: depress_T Depression

Coefficients(a) (15 points)

                                    Unstandardized         Standardized
                                    Coefficients           Coefficients                  Correlations
Model 1                             B         Std. Error   Beta           t      Sig.    Zero-Order   Partial   Part (semipartial)
(Constant)                          65.764    4.370        ____           ____   ____    -----        -----     -----
pv_loss Perceived vulnerability     .729      .104         ____           ____   ____    .429         .523      .506
  to loss
ageatlosa Age at loss               -.172     .149         ____           ____   ____    -.015        -.100     -.083
Gender                              -5.545    1.440        ____           ____   ____    -.164        -.320     -.278
supp_total Social Support score     -.141     .051         ____           ____   ____    -.215        -.237     -.200
a Dependent Variable: depress_T Depression
(a) Complete the empty cells above, including those in the ANOVA table, the Model
Summary table, and the Coefficients table. (Total for (a): 55 points)
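The missing cells follow from standard identities: R² = SS_regression / SS_total, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), MS = SS/df, F = MS_regression / MS_residual, standard error of the estimate = √MS_residual, t = B / SE(B), and Beta = B × SD(x) / SD(y). A sketch in Python (standard library only) that applies them to the values printed above; only the pv_loss row of the Coefficients table is shown, and the Sig. columns are left out because they require F and t distribution functions.

```python
import math

# Values copied from the partial output above.
ss_reg, ss_res, ss_tot = 3591.343, 7574.257, 11165.600
n, k = 135, 4        # cases, predictors
sd_y = 9.12827       # SD of depress_T

# Model Summary and ANOVA cells
r2 = ss_reg / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
ms_reg = ss_reg / k
ms_res = ss_res / (n - k - 1)
f_value = ms_reg / ms_res
se_est = math.sqrt(ms_res)   # standard error of the estimate

# Coefficients cells for one row: t = B / SE(B); Beta = B * SD(x) / SD(y)
b, se_b, sd_x = 0.729, 0.104, 6.66934   # pv_loss row
t_pv = b / se_b
beta_pv = b * sd_x / sd_y

print(round(math.sqrt(r2), 3), round(r2, 3), round(adj_r2, 3), round(se_est, 3))
print(round(ms_reg, 3), round(ms_res, 3), round(f_value, 2))
print(round(t_pv, 2), round(beta_pv, 3))
```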

10 points (b) Based on the correlations shown in the Coefficients table, calculate the
UNIQUENESS of each predictor when all other predictors are controlled for (i.e.,
the amount of variance explained by each single predictor if this predictor were
entered last into the regression equation).
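Uniqueness in this sense is the squared part (semipartial) correlation, sr², which equals the increase in R² when the predictor is entered last. A minimal sketch in plain Python using the part correlations printed in the Coefficients table:

```python
# Part (semipartial) correlations copied from the Coefficients table above.
part_r = {
    "pv_loss": 0.506,
    "ageatlosa": -0.083,
    "Gender": -0.278,
    "supp_total": -0.200,
}

# Uniqueness = squared part correlation = R^2 increase when entered last.
uniqueness = {name: r ** 2 for name, r in part_r.items()}
for name, u in uniqueness.items():
    print(f"{name:12s} {u:.4f}")
```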

3. The following table shows the scores of the first six participants in a lung cancer study
on optimism, support, tumor status, and quality of life.
It also shows the residuals of optimism (called “res_opti”), support (called “res_supp”), and
quality of life (called “res_qol”) after controlling for the effects of tumor status.

Case Summaries

ID   optimism   support   tumorstat   quallife   res_opti    res_supp     res_qol
1    26.00      90.00     4.00        114.00       .37019     1.67207   -33.86181
2    24.00      86.00     4.00        131.00     -1.62981    -2.32793   -16.86181
3    27.00      93.00     1.00        157.00      1.26630     6.45545    -4.28174
4    29.00      90.00     2.00        128.00      3.30093     2.86099   -28.80843
5    26.00      48.00     1.00        135.00       .26630   -38.54455   -26.28174
6    22.00      89.00     1.00        148.63     -3.73370     2.45545   -12.65674
a Limited to first 6 cases.

Question (2 points each, (a) to (e)): Which columns / variables would you have to correlate to obtain:
(a) The bivariate correlation between optimism and quality of life
________________________________________________
(b) The partial correlation between support and quality of life, controlling for
tumorstat in both _________________________________________
(c) The semipartial (part) correlation between support and quality of life,
controlling for tumorstat in support ______________________________
(d) The partial correlation between optimism and support
__________________________________________________
(e) The semipartial (part) correlation between optimism and support, controlling for
tumorstat in optimism _________________________________
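The logic behind these column choices can be checked on simulated data. The sketch below (NumPy; synthetic stand-ins for a predictor, an outcome, and a control variable such as tumor status; `residualize` is a hypothetical helper) correlates a residual with a residual for the partial correlation and a residual with a raw score for the semipartial correlation, then compares both with the textbook formulas based on zero-order correlations.

```python
import numpy as np


def residualize(v, z):
    """Residuals of v after an OLS regression on z (with intercept)."""
    A = np.column_stack([np.ones(len(z)), z])
    beta, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ beta


rng = np.random.default_rng(3)
z = rng.normal(size=300)              # hypothetical control (e.g., tumor status)
x = 0.6 * z + rng.normal(size=300)    # hypothetical predictor (e.g., support)
y = 0.5 * x + 0.4 * z + rng.normal(size=300)  # hypothetical outcome

res_x, res_y = residualize(x, z), residualize(y, z)
partial = np.corrcoef(res_x, res_y)[0, 1]   # z removed from both
semipartial = np.corrcoef(res_x, y)[0, 1]   # z removed from x only

# Check against the formulas based on zero-order correlations.
r = np.corrcoef([x, y, z])
rxy, rxz, ryz = r[0, 1], r[0, 2], r[1, 2]
partial_formula = (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))
semipartial_formula = (rxy - rxz * ryz) / np.sqrt(1 - rxz**2)
print(round(partial, 6), round(partial_formula, 6))
print(round(semipartial, 6), round(semipartial_formula, 6))
```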

4. Let us assume you are working on a literature review in preparation for your dissertation or
for a meta-analysis. You want to find out from the published literature how much variance
in GRE scores is explained by IQ and by hours of training for the test, respectively, and
what the uniqueness of these two predictors is. Here is a correlation table that was
published in a statistical report:
Pearson Correlation

                  IQ        Training Hours   GRE
IQ                1.000     .272**           .403**
Training Hours    .272**    1.000            .380**
GRE               .403**    .380**           1.000

5 points (a) How much variance do IQ and training hours each explain when entered
as the only predictor?
10 points (b) How much variance does training explain beyond the variance explained by IQ?
10 points (c) How much variance does IQ explain beyond the impact of training?
10 points (d) How much variance do the two variables explain in combination?
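With only two predictors, every quantity in (a) to (d) can be obtained from the correlation table alone. A sketch in plain Python, using the standard formula R² = (r₁² + r₂² − 2r₁r₂r₁₂) / (1 − r₁₂²), where r₁ and r₂ are the predictor-criterion correlations and r₁₂ is the correlation between the two predictors:

```python
# Zero-order correlations from the published table above.
r_iq_gre, r_tr_gre, r_iq_tr = 0.403, 0.380, 0.272

r2_iq = r_iq_gre ** 2   # IQ as the only predictor
r2_tr = r_tr_gre ** 2   # training hours as the only predictor

# Two-predictor R^2 computed from the correlation matrix alone.
r2_both = (r_iq_gre**2 + r_tr_gre**2 - 2 * r_iq_gre * r_tr_gre * r_iq_tr) / (1 - r_iq_tr**2)

unique_tr = r2_both - r2_iq   # training beyond IQ
unique_iq = r2_both - r2_tr   # IQ beyond training
print(round(r2_iq, 4), round(r2_tr, 4), round(r2_both, 4))
print(round(unique_tr, 4), round(unique_iq, 4))
```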

5. You will receive a data set called Exam1_no5.sav. This data set stems from a study on
recovery from heart surgery and contains the following variables:

- Age = age of participant
- Gender = gender (1 = male, 2 = female)
- Op = type of surgery (0 = heart valve replacement, 1 = bypass surgery)
- symptom_p = post-surgical symptom stress
- worry = worry about symptoms / disease
- pom_sad = depression
- heartfunc = a NEGATIVE indicator of heart output / efficiency
- selfcontrol = presurgical assessment of self-control skills
- active_p = physical activity following surgery (dependent variable)

10 points (a) Compute the bivariate correlations between all quantitative predictor variables and the
dependent variable (active_p). Evaluate whether the correlations make sense to
you, specifically in terms of their direction (sign).

20 points (b) Perform a visual inspection of the data for a subsequent regression analysis in which
the first . Check for:
- violations of linear relationships in the partial plots
- correlations between each quantitative predictor and the residual of the regression
(assumption: r_xe = 0)
- deviations from the assumption that the residual does not correlate with the
predicted value (r_ŷe = 0) and that the residuals show homoscedasticity
- normality of the residuals.
For each of the above criteria, state whether your visual inspection
indicates any potential problems. (Evaluate each individual plot.)
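One reason the second and third checks rely on plots rather than on computed correlations: within the sample, OLS residuals have exactly zero linear correlation with every predictor in the model and with Ŷ, so r_xe and r_ŷe come out as zero regardless of whether the model fits. The sketch below (NumPy, synthetic curvilinear data, a deliberately misspecified straight-line model) shows this, and shows that the unmodeled curvature is still present in the residuals.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-2, 2, size=200)              # hypothetical predictor
y = x ** 2 + rng.normal(scale=0.3, size=200)  # truly curvilinear relation

A = np.column_stack([np.ones_like(x), x])     # misspecified straight-line model
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta
e = y - y_hat

r_xe = np.corrcoef(x, e)[0, 1]          # ~0 by construction of OLS
r_yhat_e = np.corrcoef(y_hat, e)[0, 1]  # ~0 by construction of OLS
r_x2_e = np.corrcoef(x ** 2, e)[0, 1]   # the curvature is still in e
print(f"{r_xe:.2e} {r_yhat_e:.2e} {r_x2_e:.3f}")
```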

15 points (c) Run a regression with ALL PREDICTORS in the data set, including gender and op.
Check for multicollinearity:
- Request and evaluate ALL multicollinearity diagnostics covered in class (starting
with the bivariate correlations among predictors).
- Explain what you find. What are the symptoms of multicollinearity in this
regression analysis?
- Which variable(s) indicate the greatest “trouble”?
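For reference, a sketch of two common diagnostics, assuming NumPy and synthetic data with one near-duplicate predictor: tolerance (1 − R² of a predictor regressed on all others) with VIF = 1/tolerance, and condition indices computed from the design matrix including the constant, scaled to unit column length as in Belsley's approach. That this scaling matches the SPSS collinearity output exactly is an assumption, the helper names are hypothetical, and the rule-of-thumb cutoffs used in class may differ.

```python
import numpy as np


def vif_and_tolerance(X):
    """Regress each predictor on all the others; return (VIF, tolerance)."""
    n, p = X.shape
    vif = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2_j = 1 - (resid @ resid) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        vif[j] = 1 / (1 - r2_j)
    return vif, 1 / vif


def condition_indices(X):
    """Condition indices of the design matrix (with constant), each column
    scaled to unit length before the singular value decomposition."""
    A = np.column_stack([np.ones(len(X)), X])
    A = A / np.linalg.norm(A, axis=0)
    s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
    return s[0] / s


rng = np.random.default_rng(5)
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
x2 = x0 + 0.05 * rng.normal(size=200)   # nearly a copy of x0
X = np.column_stack([x0, x1, x2])

vif, tol = vif_and_tolerance(X)
ci = condition_indices(X)
print(np.round(vif, 1), np.round(tol, 3), np.round(ci, 1))
```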

5 points (d) Eliminate the single variable that causes the greatest trouble in terms of multicollinearity.
Perform the regression again, excluding this predictor, and check whether the problem
with multicollinearity is now resolved. NOTE: I expect an answer, not just an output!!
NOTE: If there is still multicollinearity, eliminate the next predictor that still causes
trouble.

6. Use the data from #5 again.

20 points (a) Now, after having resolved the problems with multicollinearity, check for
multivariate outliers: for the same regression analysis, under “SAVE,” save the
multivariate outlier diagnostics indicating leverage, distance, and impact. Evaluate
the results. Are there any outliers in terms of leverage, distance, and/or
impact?
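The three kinds of diagnostics can be computed directly from the hat matrix. A minimal sketch, assuming NumPy and synthetic data with one planted extreme case: leverage is the hat value hᵢ (the centered variant hᵢ − 1/n is also shown), the Mahalanobis distance follows from D²ᵢ = (n − 1)(hᵢ − 1/n), and Cook's distance serves as the influence ("impact") measure. How these map onto the individual SPSS SAVE options is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 60
X = rng.normal(size=(n, 2))                       # hypothetical predictors
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.5, size=n)
X[0] = [5.0, -5.0]                                # plant an extreme case...
y[0] = -10.0                                      # ...that also misfits

A = np.column_stack([np.ones(n), X])
p = A.shape[1]                                    # parameters incl. constant
H = A @ np.linalg.inv(A.T @ A) @ A.T
h = np.diag(H)                                    # leverage (hat values)
e = y - H @ y                                     # raw residuals
mse = (e @ e) / (n - p)

centered_leverage = h - 1 / n                     # leverage without the 1/n term
mahalanobis = (n - 1) * (h - 1 / n)               # distance from predictor centroid
cooks_d = (e ** 2 / (p * mse)) * h / (1 - h) ** 2 # influence of each case

worst = np.argsort(cooks_d)[::-1][:2]
print("largest Cook's D at rows:", worst)
```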
15 points (b) From your results in (a), which two cases do you identify as the worst
multivariate outliers (report the ID numbers)?
(c) Perform the final analysis, first including ALL cases, in a stepwise
(hierarchical) approach, excluding the variable(s) that caused
multicollinearity in #5. Use the following regression model:
1. In the first step (block), enter age, gender, and type of surgery (OP).
(This is what you typically do to “partial out” the influence of these
variables before testing the impact of your target predictors.)
2. In the second step (block), enter the physiological symptoms and heart
functioning into the regression (unless one of them was excluded because of
multicollinearity).
3. In the third step (block), enter all remaining predictors except (a) the
variable “selfcontrol” and (b) any predictor(s) identified as causing
multicollinearity.
4. In the last step, enter self-control.

20 points Explain the output.
- How much additional variance is explained in each step (i.e., what is
delta R² in each step)?
- What are your conclusions regarding the predictors and their relative
contributions to the prediction of the participants' post-surgical activity
level?
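Each ΔR² can be tested with the F-change statistic F = (ΔR² / k_new) / ((1 − R²) / (n − k_total − 1)), where k_new is the number of predictors added in the step and k_total the number in the equation after it. A sketch of block-wise entry, assuming NumPy and SciPy; the blocks are random stand-ins for the four steps above, not the Exam1_no5.sav variables, and `hierarchical` is a hypothetical helper.

```python
import numpy as np
from scipy import stats


def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))


def hierarchical(blocks, y):
    """Enter predictor blocks in order; return (R2, delta R2, F change, p) per step."""
    n = len(y)
    entered = np.empty((n, 0))
    r2_prev, steps = 0.0, []
    for block in blocks:
        block = np.asarray(block, dtype=float).reshape(n, -1)
        entered = np.column_stack([entered, block])
        k_total, k_new = entered.shape[1], block.shape[1]
        r2 = r_squared(entered, y)
        delta = r2 - r2_prev
        df2 = n - k_total - 1
        f_change = (delta / k_new) / ((1 - r2) / df2)
        steps.append((r2, delta, f_change, stats.f.sf(f_change, k_new, df2)))
        r2_prev = r2
    return steps


rng = np.random.default_rng(7)
n = 150
controls = rng.normal(size=(n, 3))  # stand-ins for age, gender, OP
physio = rng.normal(size=(n, 2))    # stand-ins for symptoms, heart functioning
others = rng.normal(size=(n, 2))    # stand-ins for the remaining predictors
selfctl = rng.normal(size=(n, 1))   # stand-in for selfcontrol
y = controls[:, 0] + physio[:, 0] + 0.5 * others[:, 0] + 0.3 * selfctl[:, 0] + rng.normal(size=n)

results = hierarchical([controls, physio, others, selfctl], y)
for step, (r2, d, f, p) in enumerate(results, start=1):
    print(f"step {step}: R2={r2:.3f}  dR2={d:.3f}  F change={f:.2f}  p={p:.4f}")
```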
10 points (d) How do the results change if you exclude the two worst outlier cases identified
in (b)? Which of the coefficients change the most?
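A sketch of the comparison in (d), assuming NumPy and synthetic data in which two planted cases are extreme on one predictor: fit the model with and without the flagged cases and compare the coefficients. `ols_coefficients` is a hypothetical helper.

```python
import numpy as np


def ols_coefficients(X, y):
    """Intercept followed by slopes of an OLS fit."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


rng = np.random.default_rng(8)
n = 80
X = rng.normal(size=(n, 3))                         # hypothetical predictors
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(scale=0.5, size=n)
X[0], X[1] = [0.0, 0.0, 6.0], [0.0, 0.0, 5.5]       # planted cases, extreme on x2...
y[0], y[1] = 8.0, 7.5                               # ...and far above the true line

flagged = [0, 1]                                    # rows flagged as worst outliers
keep = np.setdiff1d(np.arange(n), flagged)
b_all = ols_coefficients(X, y)
b_trim = ols_coefficients(X[keep], y[keep])
change = b_trim - b_all
names = ["constant", "x0", "x1", "x2"]
largest = names[int(np.argmax(np.abs(change)))]
print(dict(zip(names, np.round(change, 3))), "largest change:", largest)
```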

Total points possible: 250
