
DSUR5104 Statistical Methods Exercises 3

University of Leeds Centre for Epidemiology and Biostatistics Semester 1 2011

Useful R commands
prop.test(x=,n=,p=0.5,alternative="two.sided",conf.level=0.95)
#produces a one-sample test of proportions and confidence interval
#for proportion x/n with alternative hypothesis proportion not equal 0.5

prop.test(x=c(x1,x2),n=c(n1,n2),alternative="two.sided",conf.level=0.95)
#produces a two-sample test of proportions and confidence interval
#for difference x1/n1-x2/n2, alternative hypothesis x1/n1 not equal x2/n2

xtabs(count~factor1+factor2)
#produce a two-way contingency table for counts within the different
#levels of factor1 and factor2

xtabs(count~factor1+factor2,subset=factor3=="level1")
#produce a two-way contingency table for counts within the different
#levels of factor1 and factor2 for just those observations for which
#factor3 takes the value "level1"

chisq.test(xtabs(count~factor1+factor2))
#produces a chi-square test of independence between factor1 and factor2

chisq.test(xtabs(count~factor1+factor2))$expected
#extract the expected frequencies under independence

fisher.test(xtabs(count~factor1+factor2),alternative="two.sided")
#conduct a two-sided Fisher's exact test for the contingency
#table of factor1 against factor2

mcnemar.test(xtabs(count~factor1+factor2))
#conduct McNemar's test for the contingency table of factor1 against factor2

cor.test(data1,data2,alternative="two.sided",method="pearson")
#calculate Pearson's correlation coefficient and perform a two-sided test
#of whether the correlation is significantly different from zero

cor.test(data1,data2,alternative="two.sided",method="spearman")
#calculate Spearman's correlation coefficient and perform a two-sided test
#of whether the correlation is significantly different from zero

summary(lm(y~x))
#fit a regression line of y against x and summarise the results

abline(lm(y~x),col="blue")
#add a blue regression line to an existing scatter plot

plot(fitted(lm(y~x)),residuals(lm(y~x)))
#plot regression model residuals against fitted values

1. A study (N. S. Levitt et al. J. Epi. and Commun. Health 53 264-268 (1999)) aimed to identify the environmental factors related to the emergence of cardiovascular disease in children living in an urban environment in South Africa. A birth cohort was formed, and when the children were five years old, the children and care givers were traced to attend interviews. The five-year group was compared to those who could not be traced on a number of factors. One of the factors was a variable that determined whether the mother had medical aid (which is similar to health insurance) at the time of the birth of the child. Consider the table of data given below.

Medical aid / Interviews   Had medical aid   No medical aid   Total
Not traced                             195              979    1174
Five-year group                         46              370     416
Total                                  241             1349    1590
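
The table above can be typed into R directly instead of being read from a file. The following is a minimal sketch rather than a model answer: the object names `aid` and `overall` are assumptions made for illustration and do not appear in the exercise sheet.

```r
# Counts from the table above: rows = follow-up status, columns = medical aid.
# The object name `aid` is an assumption made for this illustration.
aid <- matrix(c(195, 979,
                 46, 370),
              nrow = 2, byrow = TRUE,
              dimnames = list(Interview  = c("Not traced", "Five-year group"),
                              MedicalAid = c("Had medical aid", "No medical aid")))

addmargins(aid)   # reproduces the row and column totals shown in the table

# One-sample test and interval for the overall proportion with medical aid
overall <- prop.test(x = sum(aid[, "Had medical aid"]), n = sum(aid),
                     conf.level = 0.95)
overall$estimate  # sample proportion x/n
overall$conf.int  # 95% confidence interval
```

A two-column matrix of successes and failures such as `aid` can also be passed to `prop.test` or `chisq.test` as it stands, which avoids retyping the counts.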

(a) Use the data to construct a 95% confidence interval for the proportion of urban South African children that have medical aid.
3 marks

(b) Construct a 99% confidence interval for the difference in the proportions of urban South African children that have medical aid in the group traced for five-year interview, compared to the group that could not be traced.
3 marks

(c) The study also considered the race of the mother as shown in the tables below (and represented in the data file cardio.csv).

White
Medical aid / Interviews   Had medical aid   No medical aid   Total
Five-year group                         10                2      12
Not traced                             104               22     126
Total                                  114               24     138

Black
Medical aid / Interviews   Had medical aid   No medical aid   Total
Not traced                              91              957    1048
Five-year group                         36              368     404
Total                                  127             1325    1452

i. Conduct a chi-square test of independence with null hypothesis that there is no association between medical aid and participation in follow up. Compare the result with your test in (b) above.
ii. Repeat the analysis of (i) just for black mothers and then just for white mothers.
iii. Comment on your results and the validity of any assumptions made in the test procedures.
3 marks

2. A study (B. Kristensen et al. J. Intern. Med. 232 237-245 (1992)) considered the effect of prednisolone on severe hypercalcaemia in women with metastatic breast cancer. Of 30 patients, 15 were randomly selected to receive prednisolone, and the other 15 formed a control group. Seven of the 15 prednisolone-treated patients achieved normalisation in their level of serum ionised calcium. This happened for none of the 15 patients in the control group. The data are stored in the file hypercal.csv.

(a) Use Fisher's exact test to determine whether the results were significantly better for treatment than control. Comment on your results.
3 marks

3. A study (J. L. Coulehan et al. Amer. J. Public Health 76 412-414 (1986)) of acute myocardial infarction (MI) among Navajo Indians matched 144 victims of MI according to age and gender with 144 individuals free of heart disease. Subjects were then asked whether they had ever been diagnosed as having diabetes. The data are stored in the file diabetes.csv and shown in the contingency table below.

MI cases / MI controls   Diabetes   No diabetes   Total
Diabetes                        9            16      25
No diabetes                    37            82     119
Total                          46            98     144
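
As with any matched-pairs table, each cell above counts pairs rather than individuals, and only the discordant pairs (16 and 37) inform the comparison. The sketch below enters the table by hand. The object names are assumptions made for illustration, and the labelling of rows as controls and columns as cases is also an assumption, made by analogy with the layout of the first table in this sheet. The test statistic is unchanged if the two are swapped.

```r
# Paired counts from the table above. Rows are taken to be the MI controls
# and columns the MI cases (an assumption about the table's orientation).
# The object name `pairs_tab` is hypothetical.
pairs_tab <- matrix(c( 9, 16,
                      37, 82),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(Controls = c("Diabetes", "No diabetes"),
                                    Cases    = c("Diabetes", "No diabetes")))

# Discordant pairs: exactly one member of the pair has diabetes
discordant <- c(pairs_tab["Diabetes", "No diabetes"],
                pairs_tab["No diabetes", "Diabetes"])

result <- mcnemar.test(pairs_tab)  # continuity correction is on by default
result$statistic                   # (|16 - 37| - 1)^2 / (16 + 37)
result$p.value
```

Passing `correct = FALSE` to `mcnemar.test` gives the uncorrected statistic, which can be useful for checking a hand calculation.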

(a) Use McNemar's test to determine if the proportion of individuals with diabetes is the same amongst the MI cases and the MI controls in this matched-pairs study. Comment on your results.
3 marks

4. The data file mortality.csv contains information on the percentage of births attended by a physician and the maternal mortality rate (per 100,000 live births) for a range of developed and developing countries.

(a) Produce a scatter plot of maternal mortality rate against percentage physician attendance and comment on the results.
3 marks

(b) Calculate Pearson's correlation coefficient for these data. Test whether the correlation coefficient is significantly different from zero.
3 marks

(c) Calculate Spearman's rank correlation coefficient for these data. Test whether the correlation coefficient is significantly different from zero. Which of the two correlation coefficients do you think is a more appropriate summary of the relationship between the variables and why?
3 marks

5. A study (A.J. Lea British Medical Journal 1 488-490 (1965)) discussed the relationship between mean annual temperature (in °F) and the mortality index for a type of breast cancer in women. The subjects were residents of 16 regions of Great Britain, Norway, and Sweden. The objective of the study was to determine whether a linear relationship between the variables was appropriate. The data are given in the file cancer.csv.

(a) Construct a scatter plot of breast cancer mortality index against mean annual temperature. Fit a regression line with temperature as the independent variable and mortality index as the dependent variable. Add the regression line to your plot and interpret your results.
3 marks

(b) Plot the residuals from the model against the fitted values and produce a QQ plot of the residuals. Comment on your results.
3 marks
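
The list of useful commands in this sheet does not include a QQ plot, so the following sketch shows one possible sequence of calls for a fitted line, a residual plot and a normal QQ plot. Because the column names in cancer.csv are not given here, it uses `cars`, a data set shipped with base R, purely as a stand-in. The object name `fit` is likewise an assumption.

```r
# `cars` (speed and stopping distance) is built into R and is used here only
# as a stand-in; it is NOT the data for this exercise.
fit <- lm(dist ~ speed, data = cars)

# Scatter plot with the fitted regression line added
plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
abline(fit, col = "blue")

# Residuals against fitted values, with a reference line at zero
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal QQ plot of the residuals
qqnorm(residuals(fit))
qqline(residuals(fit))   # line through the first and third quartiles
```

For the exercise itself the same calls would apply after reading the file with `read.csv` and substituting its own column names into the model formula.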
