University of North Carolina at Chapel Hill EXAM

University of North Carolina at Chapel Hill
School of Public Health

Department of Epidemiology
Fundamentals of Epidemiology (EPID 168)
Midterm Examination, Fall 1999
Instructions:
o Write the last 4 digits of your ID number in the space provided on each page (top right).
o Write clearly and legibly; avoid writing on the back of these pages.
o Show all your work and include units where appropriate.
o Write all answers and computations on these pages.
1. Which of the following best describes the retrospective design in which subjects are
sampled by disease status, and which is often used when the investigator is interested in rare
diseases? (4 pts)
A. intervention trial
B. case control study
C. retrospective cohort
D. ecologic study
E. none of the above
2. Which of the following best describes the study design that can be either retrospective or
prospective and is often used when the investigators are interested in rare exposures? (4
pts)
A. intervention trials
B. cohort studies
C. prevalence studies
D. case control study
3. The strength of an association is one of the criteria for evaluating the cause and effect
relationship between an exposure and outcome. Which of the following is a measure of
the strength of association? (Choose one best answer). (4 pts)
A. incidence rate among the exposed
B. cumulative incidence among the exposed
C. the ratio of the odds of exposure among cases to the odds of exposure among non-cases
D. odds of disease among exposed relative to the prevalence of exposure in the
source population
4. Incidence rates of a disease are often referred to as direct measures of risk. Can incidence
rates be calculated from case-control studies? Briefly explain in 1-2 sentences why they
can or cannot be calculated. (4 pts)
5. For each of the following epidemiological measures, indicate whether it is a rate, a
proportion, or neither a rate nor a proportion. Circle the best answer. (1 pt each)
a. Population attributable risk RATE PROPORTION NEITHER
b. Incidence density (ID) RATE PROPORTION NEITHER
c. Prevalence RATE PROPORTION NEITHER
d. Relative risk RATE PROPORTION NEITHER
6. Indicate true or false next to each of the following. (2 pts each)
____ ____ a. A "J" or "U" shaped relationship of a continuous risk factor and a continuous
measure of disease suggests a Pearson product-moment correlation coefficient near
plus one or minus one.
____ ____ b. A risk ratio measure and a correlation coefficient are both measures of
association.
____ ____ c. A population attributable risk proportion depends on the prevalence of exposure
and is not directly related to the strength of an association.
____ ____ d. The study base for a case-control study consists of those people who, if they
developed the disease, could have been counted as cases.
____ ____ e. The Bradford Hill criterion "coherence" means that the association has been
observed repeatedly in different places, by different observers, and at different times.
____ ____ f. If an exposure is a cause of a disease, then "temporality" is the Bradford Hill
criterion for causal inference that must hold true between exposure and disease.
7. The death rates from various conditions are often compared across geographic areas.
These comparisons are usually based on directly age-standardized mortality rates. Which
of the following best describes what is meant by an age-standardized rate created by the
direct method? (Choose one best answer). (4 pts)
A. The number of events in each age stratum of a standard population is used to
create a weighted average rate.
B. The event rates in each age stratum in the standard population are used to create a
weighted average rate.
C. The event rates in the geographic area of interest are applied to the age-stratum
sizes of a standard population to create a rate that is a weighted average.
D. The event rates in the geographic area of interest are compared to the event rates
of a standard population to create a summary rate that is a weighted average.
8. In order to estimate counts and rates of work-related fatalities, the National Traumatic
Occupational Fatality system has introduced a tick-box on the death certificate to indicate
"injury at work." Kraus et al. (Am J Epidemiol 1995; 141: 973-9) attempted to validate
this "injury at work" classification system against a gold standard [International
Classification of Diseases (ICD) death certificate codes designating deaths that occurred
during work-related activities]. After reviewing a sample of 100,000 death certificates,
the authors reported the following: 1,195 true positives; 788 false positives; 97,672 true
negatives; 345 false negatives. ("positive" indicates that the tick-box was checked;
"negative" indicates that it was not checked; "true" indicates agreement between the tick-
box and the ICD code).
a. Using the counts provided above, complete the 2x2 table below: (2 pts)
                                 ICD Classification
Death Certificate       Work-related    Not work-related    TOTAL
Work-related                1195              788
Not work-related             345            97672
TOTAL
b. What are the sensitivity and specificity of the "injury at work" classification
system? (4 pts)
c. What is the positive predictive value? In your own words, how would you interpret
this value? (3 pts)
d. Based on these data, is the death certificate "injury at work" classification system
likely to underestimate or overestimate the true number of work-related fatal
injuries? (2 pts)
e. The use of data from the "tick-box" on the death certificates to track work-related
mortality trends is an example of which kind of surveillance system? (choose one
best answer). (4 pts)
A. Active surveillance
B. Passive surveillance
C. Retrospective cohort surveillance
D. Cross-sectional survey surveillance
f. The sensitivity and specificity computed above are quantitative measures of which of
the following aspects of death certificate classification of work-related fatalities?
(choose one best answer). (4 pts)
A. Reliability of death certificate classification
B. Repeatability of death certificate classification
C. Validity of death certificate classification
D. Attributable risk of work-related classification on death certificates
E. None of the above
9. Age-related maculopathy is a leading cause of blindness among people 65 and older in the
United States, and is estimated to affect between 16 and 26% of people in this age group. In
a recent study by Klein, residents aged 43 to 86 years in the town of Beaver Dam, Wisconsin,
were asked to participate in a study to determine whether cigarette smoking was related to
age-related maculopathy. At a baseline examination, participants were asked to report their
lifetime smoking habits. After 5 years, participants had an examination to determine whether
they had developed age-related maculopathy. The following table presents the number of
cases of age-related maculopathy measured at the follow-up examination among the 1,232
male participants ages 43-86 who did not have age-related maculopathy (ARM) at the
baseline examination:
Smoking status N Cases of ARM
Never smokers 368 26
Ever smokers 864 79
a. Which of the following best describes the research design used in this study?
(choose one best answer) (3 pts)
A. Population based cross-sectional study
B. Case cohort study
C. Nested case control study
D. Prospective cohort study
b. Create a 2 x 2 table where one axis is smoking status and the other is age-related
maculopathy status. (4 pts)
c. Calculate the 5-year cumulative incidence of age-related maculopathy in ever
smokers, and in never smokers. Show your work. (4 pts)
d. Calculate the cumulative incidence ratio comparing the incidence of age-related
maculopathy in ever smokers with that in never smokers. Show your work. (4 pts)
e. Assuming causality, what is the proportion of cases of age-related maculopathy that
could have been prevented in the population of males ages 43-86 in Beaver Dam if
the smokers had never smoked? Show your work. (4 pts)
10. The following data come from a national survey of the occurrence of back pain. A case of
low back pain was defined as having at least one episode of severe back pain occurring over
a period of 6 months. The number of cases was obtained from surveys of different
occupation groups as well as a national random sample.
          Cell phone manufacturing   Textile manufacturing   National random sample
Age       Persons   Cases   Rate     Persons   Cases   Rate   Persons   Cases   Rate
25-39       1,000       2   .002         100       2   .020    10,000      30   .003
40-55         700      25   .036         500      30   .060    15,000     900   .060
55+            50      15   .300       1,500     150   .100    15,000   1,200   .080
Total       1,750      42   .024       2,100     182   .087    40,000   2,130   .053

a. Compute a standardized event ratio (similar to a standardized mortality ratio (SMR)
except the episodes of back pain aren’t mortal events) of back pain for the cell
phone-manufacturing employees. Briefly state in one sentence the interpretation of
this measure in this case. (3 pts)
b. Compute a standardized event ratio (similar to a standardized mortality ratio (SMR)
except the episodes of back pain aren’t mortal events) of back pain for the
textile-manufacturing employees. Briefly state in one sentence the interpretation of this
measure in this case. (3 pts)
c. Can these two ratios in part (a) and (b) be compared? Briefly explain why or why not.
(3 pts)
11. The evidence supporting obesity as a risk factor for colon cancer remains inconclusive,
especially among women. A recent study (Am J Epidemiol 1999;150:390-398) reported the
association between obesity (measured at baseline) and colon cancer morbidity as
determined from review of medical records and death certificates in a nationally
representative cohort of men and women age 25-74 years who participated in the First
National Health and Nutrition Examination Survey from 1971 to 1975 and were
subsequently followed up through 1992. The following table is from this study for men and
women combined.
Baseline body    Number of incident       Person-years    Crude incidence
mass index*      cases of colon cancer    of follow-up    rate/100,000 PY
<22              28                       53,475
22 - <24         41                       38,919
24 - <26         36                       36,610
26 - <28         40                       32,635
28 - <30         35                       21,122
30+              42                       34,904
* kg body weight per height in meters squared
a. Which of the following best describes the research design used in this study? (choose
one best answer). (2 pts)
A. Cross-sectional survey
B. Ecological study
C. Population based case control study
D. Cohort study
b. Complete the table by calculating the crude body mass index-specific incidence rates.
(3 pts)
c. Calculate the relative risk (RR) of colon cancer associated with a BMI of 28-<30. Use
the lowest BMI category as referent. In one sentence interpret your answer. (2 pts)
d. Calculate the attributable risk proportion of those in the 28-<30 BMI category. In
one sentence interpret your answer. (The attributable risk formulas provided in class
can be used even though the data provided are rates.) (2 pts)
12. Analyses of data from cohort studies often have to deal with the reality that participants have
unequal lengths of follow-up. Given the data below, calculate (a) the total person-time
(months) of follow-up, (b) the overall incidence density rate, (c) the 13-month cumulative
incidence, and (d) the product-limit estimate of failure. Each horizontal line represents a
cohort participant. Each vertical line represents one month. Arrows indicate time of loss to
follow-up. Black boxes indicate onset of disease (failure). (2 pts each)
a. ______________
b. ______________
c. ______________
d. ______________
(Unfortunately the diagram is not yet available)

Answer Guide
1. B. Case-control studies are said to use sampling by disease and are suited for studying rare
diseases.
2. B. Cohort studies can be either retrospective or prospective and are often used to study rare
exposures.
3. C. The ratio of the odds of exposure among cases to the odds of exposure among noncases is
the odds ratio, which is a measure of association.
4. Incidence rates cannot be estimated from case-control studies without additional
information. In the case-control design, selection of subjects is based on disease status, so the
numbers of cases and controls are under the control of the investigator. If the investigator has
access to all cases and knows the size of the population from which they arise, s/he can
estimate incidence, but the population size is not available from the case-control design
itself.
5.
a. Population attributable risk (PARP)
Both "proportion" and "neither" received credit, since this is a subtle distinction.
According to Regina Elandt-Johnson (Am J Epidemiol 1975;102:267-271), a
proportion is a type of ratio in which the numerator is included in the denominator
[p=a/(a+b)]. Since PARP can be expressed as ("attributable" cases / all cases), it is
indeed a proportion. However, it is usually computed from a difference of two
proportions [(I-I0)/I] or from the product of a proportion (prevalence) and the difference
of two proportions [p(I1-I0)/I], so it is easy to be misled about its mathematical form
(indeed, the "official" answer to this question could not explain why it is a
proportion!).
b. Incidence density (ID) is a RATE.
c. Prevalence is a PROPORTION.
d. Relative risk is NEITHER a rate nor a proportion.
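6. Indicate true or false next to each of the following. (2 pts each)
a. FALSE – A Pearson product-moment correlation coefficient measures the extent to
which a relationship is linear, so a value of plus one or minus one corresponds to a
straight line. A "J" or "U" shaped relationship is not linear, so it does not imply a
coefficient near plus or minus one; a symmetric "U" shape can even yield a coefficient
near zero.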
b. TRUE – A risk ratio measure and a correlation coefficient are both measures of
association.
c. FALSE – A population attributable risk proportion depends on the prevalence of
exposure and is ALSO directly related to the strength of an association.
d. TRUE – The study base for a case-control study consists of those people who if they
developed the disease could have been counted as cases.
e. FALSE – The Bradford Hill criterion "coherence" means that all of the known facts
about the relationship fit into place; the criterion of "consistency" means that the
association has been observed repeatedly in different places, by different observers,
and at different times.
f. TRUE – "Temporality" is the one Bradford Hill criterion for causal inference that
must hold true between exposure and disease.
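The point of item 6a can be checked numerically. The Python sketch below computes a Pearson coefficient by hand (the helper name `pearson_r` is illustrative, not from any library) for hypothetical exposure values symmetric around zero: a straight line gives a coefficient of 1, while a perfectly predictable but U-shaped relationship gives a coefficient of 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical exposure values, symmetric around zero.
xs = list(range(-5, 6))
line = [2 * x + 1 for x in xs]   # perfectly linear relationship
u_shape = [x ** 2 for x in xs]   # perfectly predictable, but U-shaped

print(round(pearson_r(xs, line), 3))     # 1.0
print(round(pearson_r(xs, u_shape), 3))  # 0.0
```

A "J" shape (for example, only the right half of the U) would give a coefficient somewhere in between, which is why the shape alone does not tell you the value is near plus or minus one.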
7. C. "The event rates in the geographic area of interest are applied to the age-stratum sizes of a
standard population to create a rate that is a weighted average" describes a directly
standardized rate.
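The direct method described in answer 7 amounts to a weighted average of the study population's stratum-specific rates, with weights taken from the standard population's stratum sizes. The sketch below is only an illustration: it reuses the age-specific back pain rates from Question 10 and treats the national random sample as the standard population, which is an assumption made here and not part of either exam question. The function name `direct_standardized_rate` is hypothetical.

```python
def direct_standardized_rate(stratum_rates, standard_sizes):
    """Weighted average of study-population stratum rates, weighted by the
    stratum sizes of the standard population (direct standardization)."""
    total = sum(standard_sizes)
    return sum(r * n for r, n in zip(stratum_rates, standard_sizes)) / total

# Age strata 25-39, 40-55, 55+ from Question 10.
national_sizes = [10_000, 15_000, 15_000]        # assumed standard population
cell_phone_rates = [2 / 1000, 25 / 700, 15 / 50]
textile_rates = [2 / 100, 30 / 500, 150 / 1500]

print(round(direct_standardized_rate(cell_phone_rates, national_sizes), 3))  # 0.126
print(round(direct_standardized_rate(textile_rates, national_sizes), 3))     # 0.065
```

Because both rates use the same weights, they can be compared with each other; in this illustration, however, the cell phone figure leans heavily on a rate estimated from only 50 persons aged 55+.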
8. a.
                                 ICD Classification
Death Certificate       Work-related    Not work-related      TOTAL
Work-related                1,195              788            1,983
Not work-related              345           97,672           98,017
TOTAL                       1,540           98,460          100,000
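b. Sensitivity = 1,195/1,540 = 78%; Specificity = 97,672/98,460 = 99%
c. Positive predictive value = 1,195/1,983 = 60%. Of the deaths whose certificates had the
"injury at work" box checked, 60% were work-related according to the ICD codes (the gold
standard).
d. Based on these data the death certificate "injury at work" classification system will
overestimate the true number of work-related fatal injuries, since more non-work-related
injuries are classified as work-related (788 false positives) than vice versa (345 false
negatives): 1,983 deaths are flagged, versus 1,540 truly work-related deaths.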
e. B. Passive surveillance – the reports are submitted by health care workers in
conformance with a general obligation rather than in response to a specific request
from the surveillance organization.
f. C. Sensitivity and specificity are measures of validity, since there is a standard for
"truth".
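The arithmetic in answers 8b through 8d can be checked with a short Python sketch. The function name `validity_measures` is hypothetical; the counts are those reported by Kraus et al. in the question.

```python
def validity_measures(tp, fp, fn, tn):
    """Return sensitivity, specificity and positive predictive value
    from the four cells of a 2x2 validation table."""
    sensitivity = tp / (tp + fn)   # checked box among truly work-related deaths
    specificity = tn / (tn + fp)   # unchecked box among non-work-related deaths
    ppv = tp / (tp + fp)           # truly work-related among checked boxes
    return sensitivity, specificity, ppv

# Tick-box classification vs. the ICD gold standard
tp, fp, fn, tn = 1195, 788, 345, 97672
sens, spec, ppv = validity_measures(tp, fp, fn, tn)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} PPV={ppv:.3f}")
# sensitivity=0.776 specificity=0.992 PPV=0.603

# Part (d): deaths flagged by the tick-box vs. deaths that are truly work-related
print(tp + fp, tp + fn)  # 1983 1540
```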
9.
a. D. Prospective cohort study, since the investigators monitored people without the
condition over time to detect its development.
b.
                        Cigarette smoking status
ARM status       Ever smokers    Never smokers    Total
Cases                      79               26      105
Non-cases                 785              342    1,127
Total                     864              368    1,232
c. CI in ever smokers = # new cases / population at risk = 79/864 = 0.091 in 5 years
CI in never smokers = # new cases / population at risk = 26/368 = 0.071 in 5 years
d. Cumulative incidence ratio (CIR) = CI in ever smokers / CI in never smokers
= (79/864) / (26/368) = 1.29
e. PARP = (overall incidence – incidence in never smokers) / overall incidence of ARM,
where the overall 5-year incidence of ARM = 105/1,232 = 0.0852
= (0.0852 – 0.0707) / 0.0852 = 17%
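The Question 9 calculations can be reproduced in a few lines of Python. As an additional check, the sketch also computes the PARP from the equivalent prevalence-based form p(CIR - 1) / [1 + p(CIR - 1)], where p is the proportion of the cohort who ever smoked; the two forms agree algebraically, which relates to the discussion of the mathematical form of PARP in the midterm answer to Question 5a. Variable names are illustrative.

```python
cases_ever, n_ever = 79, 864
cases_never, n_never = 26, 368

ci_ever = cases_ever / n_ever        # 5-year cumulative incidence, ever smokers
ci_never = cases_never / n_never     # 5-year cumulative incidence, never smokers
cir = ci_ever / ci_never             # cumulative incidence ratio

# PARP from overall vs. unexposed incidence
ci_total = (cases_ever + cases_never) / (n_ever + n_never)
parp = (ci_total - ci_never) / ci_total

# Same quantity from the prevalence of exposure and the CIR
p_exposed = n_ever / (n_ever + n_never)
parp_alt = p_exposed * (cir - 1) / (1 + p_exposed * (cir - 1))

print(f"{ci_ever:.3f} {ci_never:.3f} {cir:.2f} {parp:.3f} {parp_alt:.3f}")
# 0.091 0.071 1.29 0.171 0.171
```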
10.
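a. Standardized event ratio (for cell phones) = SMR (cell phone) = observed/expected
= 42/{(.003)(1000) + (.06)(700) + (.08)(50)} = 42/49 = 0.86
Cell phone manufacturing employees had about 14% fewer episodes of back pain than would be
expected if they had experienced the age-specific rates of the national random sample.
b. Standardized event ratio (textiles) = SMR (textile) = observed/expected
= 182/{(.003)(100) + (.06)(500) + (.08)(1500)} = 182/150.3 = 1.21
Textile manufacturing employees had about 21% more episodes of back pain than would be
expected if they had experienced the age-specific rates of the national random sample.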
c. These two ratios cannot be compared directly. An SMR is a weighted average where
the weights (e.g., age structure) come from the population for which indirect
standardization is being carried out. So SMRs for two populations use different
weights. Unless the populations have identical age structures, the stratum-specific
rates are the same for all strata, or the stratum-specific rates for one population are a
constant multiple of those for the second population, the comparison is invalid. With
indirect standardization, it is actually the "standard population" rates that are being
"standardized" to the age distribution of the study population.
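Indirect standardization, as used in answers 10a and 10b, applies the standard (national sample) rates to each study population's own age structure to obtain the expected count. A minimal sketch, with the hypothetical helper name `standardized_event_ratio`:

```python
# National random sample rates for age strata 25-39, 40-55, 55+
standard_rates = [0.003, 0.06, 0.08]

def standardized_event_ratio(observed, persons, std_rates):
    """Observed events divided by the events expected if the study population
    had experienced the standard population's stratum-specific rates."""
    expected = sum(n * r for n, r in zip(persons, std_rates))
    return observed / expected, expected

ser_cell, exp_cell = standardized_event_ratio(42, [1000, 700, 50], standard_rates)
ser_text, exp_text = standardized_event_ratio(182, [100, 500, 1500], standard_rates)
print(f"{exp_cell:.1f} {ser_cell:.2f}")  # 49.0 0.86
print(f"{exp_text:.1f} {ser_text:.2f}")  # 150.3 1.21
```

Note that the expected counts are built from each workforce's own age distribution (`persons`), which is exactly why the two ratios use different weights, as explained in answer 10c.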
11.
a. D. Cohort study
b.
Baseline body    Number of incident       Person-years    Incidence
mass index*      cases of colon cancer    of follow-up    rate/100,000 PY
<22              28                       53,475           52.4
22 - <24         41                       38,919          105.3
24 - <26         36                       36,610           98.3
26 - <28         40                       32,635          122.6
28 - <30         35                       21,122          165.7
30+              42                       34,904          120.3
* kg body weight per height in meters squared
c. RR of colon cancer for BMI 28-<30 kg/m2 vs. lowest = 165.7/52.4 = 3.16
Persons with a baseline BMI of 28-<30 kg/m2 had about 3.2 times the colon cancer incidence
rate of persons with a BMI below 22 kg/m2.
d. ARP for BMI 28-<30 kg/m2 vs. lowest = (3.16 – 1) / 3.16 = 68%
Assuming causality, the ARP of 68% means that 68% of the incidence in the 28-<30 kg/m2
group is attributable to having a BMI of 28-<30 rather than below 22 kg/m2.
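The incidence rates, rate ratio, and attributable risk proportion in answer 11 can be recomputed from the counts and person-years in the table. A minimal sketch; the data structure and names are illustrative:

```python
# (baseline BMI category, incident cases, person-years of follow-up)
strata = [
    ("<22", 28, 53_475),
    ("22-<24", 41, 38_919),
    ("24-<26", 36, 36_610),
    ("26-<28", 40, 32_635),
    ("28-<30", 35, 21_122),
    ("30+", 42, 34_904),
]

# Crude incidence rate per 100,000 person-years in each BMI category
rates = {bmi: cases / py * 100_000 for bmi, cases, py in strata}
for bmi, rate in rates.items():
    print(f"{bmi:>7} {rate:6.1f}")

rr = rates["28-<30"] / rates["<22"]   # rate ratio vs. the lowest BMI category
arp = (rr - 1) / rr                   # attributable risk proportion among the exposed
print(f"RR = {rr:.2f}, ARP = {arp:.0%}")  # RR = 3.16, ARP = 68%
```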
12.
a. 43 person-months
b. 3 cases/43 person-months = 7.0 cases per 100 person-months
c. 13-month CI = 3/7 = 0.43
d. Product-limit estimate of failure = 1 - [(6-1)/6 x (5-1)/5 x (3-1)/3] = 1 - 0.444 = 0.556
(the bracketed product, 0.444, is the product-limit estimate of survival)
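Because the follow-up diagram is not available, the sketch below simply takes the risk sets implied by answer 12d as given: one failure each when 6, 5, and 3 participants were at risk. It also reproduces the incidence density from answer 12b using the 43 person-months stated there. The function name `product_limit_failure` is hypothetical.

```python
def product_limit_failure(risk_sets):
    """Product-limit (Kaplan-Meier) cumulative probability of failure.

    risk_sets: (number at risk just before the failure time, number failing then)
    for each distinct failure time, in time order.
    """
    survival = 1.0
    for at_risk, failures in risk_sets:
        survival *= (at_risk - failures) / at_risk
    return 1.0 - survival

risk_sets = [(6, 1), (5, 1), (3, 1)]   # taken from answer 12d, not from the diagram
print(f"{product_limit_failure(risk_sets):.3f}")  # 0.556

person_months, cases = 43, 3
print(f"{cases / person_months * 100:.1f} per 100 person-months")  # 7.0 per 100 person-months
```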
Revised and formatted 8/4/2000, 8/8/2000 by Victor_Schoenbach@unc.edu

Final Examination, Fall 1999
The questions on this examination are largely based on Cantor KP, Lynch CF, Hildesheim ME,
Dosemeci M, Lubin J, Alavanja M, Craun G. Drinking water source and chlorination byproducts in
Iowa. III. Risk of brain cancer. Am J Epidemiol 1999;150:552-60. You may refer to an unannotated
copy of this article during the examination.
1. Briefly discuss two reasons why a case-control study is (or is not) well suited to examine risk
factors for brain cancer. (3 pts)
2. The authors describe the study design they used as a "population-based case-control study".
Briefly explain how this is different from a non-population-based case-control study. Include
in your answer issues regarding the selection of cases, selection of controls, and validity. (3
pts)
3. Cases were identified by the State Health Registry of Iowa. Which of the following
categories of study design best describes this method of case finding? Choose one best
answer. (3 pts)
A. Prospective follow-up
B. Passive surveillance
C. Cross-sectional survey
D. Community-based screening
E. Hospital-based surveillance
4. The authors state that cases had to be newly diagnosed with histologically confirmed glioma
without previous diagnosis of a malignant neoplasm. Which of the following best describes an
advantage of using incident cases instead of prevalent cases? Choose one best answer. (3 pts)
A. Using incident cases allows the investigators to directly compute relative risks.
B. Using incident cases reduces the non-systematic error of case-control studies.
C. Estimates of exposure from incident cases may be less influenced by disease status.
D. Using incident cases allows for the investigation of effects on risk versus those
affecting duration.
E. Incident cases are less likely to be lost to follow up than prevalent cases.
5. Even if the investigators are careful in the selection of cases and controls, selection bias can
make interpretation of results difficult. Which of the following is NOT a situation that can
produce selection bias? Choose one best answer. (3 pts)
A. The exposure has some influence on the process by which controls are selected.
B. The exposure has some influence on the process of case ascertainment.
C. The disease status has some influence on the recall of exposures.
D. The exposed cases are reported to registries more often than unexposed cases.
E. All of the above will produce selection bias.
6. In this study, exposure information for many of the brain cancer cases was provided by proxy
respondents. The authors did not have information from independent sources that could be
used to directly verify information provided by these surrogates. However, suppose a follow-up
questionnaire was administered to cases, and for 85 of the cases, the investigators were
able to obtain information about whether or not they used a private well directly from the
cases (self report). Assuming that self report is the best available assessment of whether they
used a private well or not, complete the table below so that it reflects a sensitivity, specificity,
and positive predictive value of a proxy response of 77%, 75%, and 57%, respectively.
Assume that 26 of the cases reported that they used private wells. Show your calculation. (6 pts)
Proxy report Self Report = YES Self report = NO
YES
NO
7. Cases in this study were histologically confirmed. This is an example of which of the
following disease classification criteria? Choose one best answer. (3 pts)
A. Causal criteria
B. Ecologic criteria
C. Manifestational criteria
D. Etiologic criteria
8. Consider the data presented in Table 1 of this article. Which of the following best represents
the proportion of the risk of brain cancer in the population that is attributable to working on
a farm (farm occupation)? Assume that a farm occupation is causally related to brain cancer
risk. Choose one best answer. (4 pts)
A. 33%
B. 57%
C. 10%
D. 29%
E. Cannot be calculated from case-control studies
9. A case-control study like the one described in this paper is most useful when it helps us
understand what is happening in the study base (underlying population). Which of the
following best describes the study base in this article? Choose one best answer. (3 pts)
A. The study base is those who if they developed brain cancer could have been selected
as a case.
B. The study base is those who have an equal probability to be selected as a case or
control.
C. The study base is those who are identified as cases or controls after excluding non-
responders.
D. The study base is those who if exposed would have been identified as exposed.
E. None of the above.
10. In Table 3 the odds ratios for incident brain cancer by duration of chlorinated surface water
exposure are given. The odds ratio (95% confidence interval) in men estimating the risk of
brain cancer with 1-19 years of exposure is 1.3 (0.8, 2.1) and 2.5 (1.2, 5.0) for 40 years or
more of exposure. Which of the following best describes the role of chance in observing
these two estimates? Choose one best answer. (3 pts).
A. The odds ratio for ≥40 years exposure is more likely due to chance because it is
based on fewer cases and controls.
B. The odds ratio for 1-19 years of exposure is more likely due to chance because the point
estimate is closer to the null value (1.0).
C. The odds ratio for ≥40 years exposure is more likely due to chance because the
confidence interval is so wide.
D. The odds ratio for 1-19 years of exposure is less likely due to chance because the
confidence interval is narrower.
E. The odds ratio for ≥40 years exposure is less likely due to chance because the
confidence interval does not include 1.0.
11. Table 3 presents odds ratios for the association of incident brain cancer with various levels
of lifetime average THM exposure. The odds ratio (95% confidence interval) for lifetime
average THM concentration of 0.8-2.2 µg/liter for men was 0.9 (0.6, 1.6). The odds ratio
(95% confidence interval) for lifetime average THM concentration of ≥32.6 µg/liter for
women was 0.9 (0.4, 1.8). Which of the following best describes the precision of these two
estimates of risk? Choose one best answer. (3 pts)
A. The estimate is equal because the point estimates are the same.
B. The estimate is equal because neither confidence interval excludes 1.0.
C. The estimate in men is slightly more precise because the confidence interval is
narrower.
D. The estimate in women is slightly more precise because the exposure level is much
higher.
E. The precision of the estimates cannot be compared because they are from different
exposure groups.
12. Using the data in Table 4, which of the following best describes the crude (unadjusted) odds
ratio estimating the risk of brain cancer associated with ≥40 years of exposure to chlorinated surface
water in men with above median tap water intake? Use the category of 0 years exposure to
chlorinated surface water as the reference group. Choose one best answer. (4 pts)
A. 4.0
B. 1.5
C. 3.6
D. 2.6
E. Cannot be computed from data in Table 4.
13. Table 1 shows the adjusted odds ratio estimating the risk of brain cancer by population size.
Using the ≤25,000 population size as a reference, calculate the crude (unadjusted) odds ratio
associated with the >50,100 population. In 2 sentences or less, explain why the two estimates
agree or disagree. (4 pts)
14. The authors state that they "found a dose-response relationship among men between brain
cancer and duration of consuming drinking water from chlorinated surface water…". Using
3 Bradford Hill criteria, in 3-4 sentences, address causality (or the lack of causality) of the
relationship of drinking water to brain cancer. (4 pts)
15. An early study of drinking water and brain cancer was an ecological study conducted by the
lead author of the present article. In this study, brain cancer mortality rates in 923 U.S.
counties were compared with average levels of THM measured in the drinking water
supplies of those counties. For counties in which the sampled water supply served at least
85% of the residents of that county, the correlation coefficient between county-specific
mortality rates from brain cancer and trihalomethane levels was 0.24 in White men and 0.19
in White women. After reviewing this paper, your colleague concluded that THM in drinking
water are causally related to brain cancer. However, you are more cautious in your
interpretation, citing the "ecological fallacy." Please define the ecologic fallacy (2 pts) and
describe why it limits the causal inferences that can be made from the ecological study
described above (2 pts).
16. The authors used information provided by cases and controls on place of residence, primary
source of drinking water, and tap water and total fluid consumption to create an index of
cumulative lifetime exposure. However, the natural history of cancer (initiation, promotion,
conversion, and progression) may encompass many years. If drinking water is involved at the
earliest stages of brain cancer (initiation), then drinking water exposures in the recent past
may be more important than present exposures or those in the distant past (e.g., in
childhood). As defined in class, which of the following periods would be important in
defining the minimal and maximal length of time expected between drinking water exposure
and diagnosis with histologically confirmed glioma? Choose one best answer. (3 pts)
A. Induction period
B. One year case fatality
C. Latent period
D. Both A and C
E. None of the above
17. The authors included all cases of histologically confirmed malignant brain cancers, including
glioblastoma, fibrillary and gemistocytic astrocytoma, and mixed glioma. If the authors suspected
that drinking water exposure was associated with only certain subtypes of brain cancer (i.e.,
disease heterogeneity), which of the following strategies could they employ at the analysis
stage? (3 pts)
A. Adjustment for cancer type using mathematical modeling (e.g., logistic regression)
B. Stratification of cases by brain cancer type
C. Direct standardization by brain cancer type
D. Indirect standardization by brain cancer type
E. Matching cases and controls by brain cancer type
18. The authors restricted their analysis to those cases and controls with at least 70 percent of
their lifetime years with a known source of drinking water. This approach was used to reduce
which type of bias? Choose one best answer (3 pts)
A. Confounding bias
B. Selection bias
C. Information bias
D. Random error
19. (question was not asked)
20.
a. Using the data in Table 3, label and complete a 2x2 table for the association between
brain cancer and >=40 years’ residence with a chlorinated surface water source
(versus 0 years), collapsing over sex (i.e., combine the data for men and women). (4
pts)
b. Calculate the odds ratio for your 2x2 table in part a. Show your work. (3 pts)
c. Suppose that the sex-adjusted OR for the relationship between brain cancer and
>=40 years’ residence with a chlorinated surface water source is 1.1. Is sex a
confounder of this relationship? Justify your answer. (3 pts)
d. Is sex an effect modifier (assuming a multiplicative model for joint effects) of the
relationship between brain cancer and >=40 years’ residence with a chlorinated
surface water source? Justify your answer. (3 pts)
e. According to Table 1, having a farming occupation (ever vs. never) is a risk factor
for brain cancer (OR=1.5). Assume that among the controls, farming occupation is
associated with duration of residence with a chlorinated surface water source. Could
farming occupation be a confounder of the associations reported under the Total
column in Table 3? Explain your answer. (3 pts)
21. Characteristics of cases and controls included in this study are shown in Table 1. Using this
information answer the following questions.
a. Calculate the appropriate crude (unadjusted) measure of association between farm
occupation and brain cancer. Consider those ever working on a farm as sufficient to be
classified as having a farm occupation. In 2 sentences or less, interpret what this odds ratio
means. (4 pts)
Farm Occupation CASE CONTROL
YES
NO
b. Assume that 10% of the cases that were labeled as never having worked on a farm truly had
worked in such an environment. Furthermore assume that 15% of the controls that were
labeled as having ever worked on a farm, in fact never really did work on a farm. What
would the true association be between farm occupation and brain cancer? Assume that the
classification of disease status is valid. (4 pts)
c. Which of the following best describes a comparison of the odds ratios you computed in
parts (a) and (b)? Choose one best answer. (3 pts)
A. The odds ratios are different as a result of differential misclassification of exposure.
B. The odds ratios are different as a result of nondifferential misclassification of
exposure.
C. The odds ratios are different as a result of differential misclassification of disease
status.
D. The odds ratios are different as a result of nondifferential misclassification of disease
status.
E. The odds ratios are different as a result of random variation in the exposure
assessment.
22. Which of the following is a measure of the validity of methods used to classify exposures
such as having worked on a farm? Choose one best answer. (3 pts)
A. interclass correlation coefficient
B. kappa statistic
C. standard error
D. sensitivity
23.
a. Using data in Table 1, assess whether the crude OR of brain cancer associated with
farm occupation is confounded by age and/or sex. Support your answer with
relevant calculations. Table 1 shows the adjusted odds ratios estimating the risk of
brain cancer due to having farm occupation. (2 pts)
b. What feature of the study design could have contributed to the crude ORs in Table
1 being confounded by age and/or sex? (2 pts)
Answer Guide
1. Case-control studies are well-suited for studying risk factors for brain cancer because the
disease is rare (hence difficult to study in a cohort design). Also, the case-control design
facilitates examining many risk factors of current interest, a substantial advantage when so
few risk factors have been identified. A retrospective cohort study can examine only
exposures for which historical data are available.
2. A "population-based case-control study" is a case-control study for which the study base is a
defined population. A representative sample from this defined population yields a control group
that permits valid estimation of odds ratios. With a hospital-based case-control study, it is
difficult to specify the study base, since which cases come to a given hospital is influenced by
such factors as seriousness and treatability of the disease, type of hospital, and health care
financing ability and arrangements. As a result, the validity of measures of association estimated
using a control group selected from among hospitalized persons is always somewhat uncertain, since
it is generally impossible to know how well such controls represent the study base.
3. B. The method of finding cases was passive surveillance.
4. D. Using incident cases allows the odds ratio to estimate the incidence density ratio or risk
ratio. In contrast, the exposure distribution among prevalent cases will reflect differential
survival in relation to exposures as well as differential incidence.
5. C. Selective recall (the disease status has an influence on the recall of exposures) is a form of
information bias, not selection bias.
6. Since 26 of the cases reported using a private well, 85-26=59 cases did not. Sensitivity=0.77
means that the proxy respondents correctly classified as "exposed" 0.77x26 approx.=20
brain cancer cases. Specificity=0.75 means that the proxy respondents correctly classified as
"unexposed" 0.75x59 approx.=44 brain cancer cases. The rest of the table can be completed
by subtraction and addition. As a check on the arithmetic, the positive predictive value is
20/35 approx.=0.57.
Validation of proxy reports of use of water from a private well
                              Case's self report
Report of proxy               Yes      No    Total
Yes, private well              20      15       35
No private well                 6      44       50
Total                          26      59       85
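The back-calculation in answer 6 can be sketched in Python. This is a minimal illustration, assuming (as the answer does) that the case's self-report is the reference standard and that cell counts are rounded to whole people; the function name `reconstruct_validation_table` is hypothetical.

```python
# Back-calculate the proxy-vs-self-report validation table in answer 6.
# Self-reported use of a private well is treated as the reference standard.

def reconstruct_validation_table(n_exposed, n_unexposed, sensitivity, specificity):
    """Return the four cells of the 2x2 table and the positive predictive value.

    n_exposed / n_unexposed: cases who self-reported yes / no (column totals).
    Cell counts are rounded to whole people, as in the hand calculation.
    """
    true_pos = round(sensitivity * n_exposed)    # proxy yes, self yes
    true_neg = round(specificity * n_unexposed)  # proxy no, self no
    false_neg = n_exposed - true_pos             # proxy no, self yes
    false_pos = n_unexposed - true_neg           # proxy yes, self no
    ppv = true_pos / (true_pos + false_pos)      # based on the proxy "yes" row
    return {"tp": true_pos, "fp": false_pos, "fn": false_neg,
            "tn": true_neg, "ppv": ppv}


table = reconstruct_validation_table(n_exposed=26, n_unexposed=85 - 26,
                                     sensitivity=0.77, specificity=0.75)
print(table)
```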
7. C. Manifestational criteria – histological criteria are observable characteristics of tumor cells
on microscopic examination.
8. C. 10% – the proportion of cases who are exposed is 85/291 approx.=0.29, and the OR
approx.=1.5. Substituting into the formula for PARP in a case-control study gives 0.29x(1.5–
1)/1.5 approx.=0.097.
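A short sketch of the case-control form of the population attributable risk proportion used in answer 8, PARP = p_c x (OR – 1)/OR, where p_c is the proportion of cases who are exposed. The function name is hypothetical, and OR = 1.5 is the approximate value quoted in the answer rather than a recomputed estimate.

```python
def parp_case_control(exposed_cases, total_cases, odds_ratio):
    """Population attributable risk proportion from case-control data.

    Uses PARP = p_c * (OR - 1) / OR, where p_c is the proportion of cases
    who are exposed and the OR is taken as an estimate of the relative risk.
    """
    p_c = exposed_cases / total_cases
    return p_c * (odds_ratio - 1) / odds_ratio


parp = parp_case_control(exposed_cases=85, total_cases=291, odds_ratio=1.5)
print(f"PARP = {parp:.3f}")
```

Using the unrounded proportion 85/291 instead of 0.29 changes the result only in the fourth decimal place.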
9. A. The study base consisted of those people who if they developed brain cancer could have
been selected as a case.
10. E. The OR for the oldest group is less likely to be due to chance because the confidence
interval does not include 1.0 (although not without problems, this response was the best).
11. C. The narrower confidence interval indicates that the estimate for men is slightly more
precise.
12. D. 2.6 = (7x423)/(30x38) for men with above median tap water intake
               Exposure to chlorinated water
               40+ years      < 40 years
Cases               7              30
Controls           38             423
13.
               Average population
               ≥50,010        ≤2,500
Cases              32            112
Controls          246            780
Crude OR = (32x780)/(112x246) = 0.91 versus 0.7 adjusted. The estimates differ because
the OR in the table has been adjusted for age and sex (according to the footnote to Table 1).
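Answers 12 and 13 both use the cross-product odds ratio (a x d)/(b x c), with a and b the exposed and unexposed cases and c and d the exposed and unexposed controls. A minimal Python sketch (the helper name is hypothetical) reproduces both values; the result for answer 13 is the crude OR, not the age- and sex-adjusted OR in Table 1.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Cross-product odds ratio (a*d)/(b*c) for an unmatched 2x2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Answer 12: 40+ years vs. < 40 years of exposure to chlorinated water
or_12 = odds_ratio(7, 30, 38, 423)
# Answer 13: average population >=50,010 vs. <=2,500 (crude, unadjusted)
or_13 = odds_ratio(32, 112, 246, 780)
print(f"{or_12:.2f} {or_13:.2f}")
```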
14. Applying the Bradford Hill criteria:
The associations observed were of medium strength (1.7 for 20-39 years of exposure to
chlorinated surface water, 2.5 for >=40 years). The authors measured lifetime exposure (through
recall), so in spite of the prolonged induction and latent periods for brain cancer, the
criterion of temporality is satisfied to some extent. Some of the exposure history in Table 3
must have occurred after the brain cancer had begun and is therefore not relevant. However, it
seems unlikely that if the association were causal it could go in the opposite direction (i.e.,
brain cancer causes exposure to chlorinated water). There is little evidence to support either
the plausibility of the association or its being found for men but not for women. Studies of the
association have not yielded consistent results. (The remaining criteria – coherence,
experiment, and analogy – are not applicable to the information in the article.)
15. The "ecologic fallacy" is the inference from aggregate data that a relationship exists at the
level of the individual. The flaw in this inference is that the prevalences of a characteristic
(e.g., exposure to trihalomethanes in drinking water) and a condition (e.g., brain cancer) can
both be elevated in a population even if the individuals who possess the characteristic are
not those with the condition. In the study described in the question, people who developed
brain cancer may not themselves have ingested large amounts of THM despite living in
counties with high THM levels in the county water supplies. A related analytic problem is
that the absence of individual-level data precludes individual-level control for potential
confounders, such as farming occupation.
16. D (both A and C). "Induction period" refers to the time between exposure and the onset of
the disease; "latent period" refers to the time from disease onset to diagnosis. For exposure
to be causal in early stages of tumor development, the exposure must be present prior to the
latent period. In principle, exposure prior to the sum of the longest possible induction
period and the longest possible latent period would not be relevant, either.
17. B. Stratification of cases by brain cancer type would permit examination of the relationship
for the individual subtypes.
18. B. "We selected cases and controls with at least 70 percent of their lifetime years with a
known source of drinking water in order to …minimize misclassification of exposure …"
(end of p 554).
19. (question was not asked)
20. a.
Risk of brain cancer by number of years resided in a dwelling supplied with chlorinated surface water
               >=40 years          None                  Total
Cases          13 + 7 = 20         92 + 78 = 170           190
Controls       81 + 60 = 141       875 + 400 = 1275       1416
Total          161                 1445                   1606
b. OR = (20x1275) / (170x141) approx.= 1.1
c. The presence of confounding is usually determined on the basis of a meaningful difference
between the crude and adjusted OR's, and there is no such difference here. Since the OR's for
men (2.5) and women (0.7) are quite different, an unambiguous indication of confounding would
require the crude OR to be above 2.5 or below 0.7.
d. Yes, there is modification of the OR by gender, in that the gender-specific OR's differ
meaningfully. Although the two confidence intervals overlap substantially, neither point
estimate is contained within the confidence interval for the other gender's estimate, so besides
giving opposite "messages", the two OR's are likely to differ in fact (not necessarily for
biological reasons).
e. The ORs shown in Table 1 are (according to the footnote) controlled for farming
occupation, so that should not be a source of confounding, except to the extent that
the crudeness of the measure ("yes" versus "no") prevents the control from being
fully effective.
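The crude table in answer 20 is obtained by adding the stratum-specific tables cell by cell. A minimal sketch of that collapsing step, assuming the cell order (exposed cases, unexposed cases, exposed controls, unexposed controls) and the pairing of addends shown in part a; the helper names are hypothetical. Because the crude table is a simple cell-by-cell sum, it does not depend on which stratum is men and which is women.

```python
def collapse(strata):
    """Sum stratum-specific 2x2 tables cell by cell into one crude table."""
    return [sum(cells) for cells in zip(*strata)]


def odds_ratio(a, b, c, d):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)


# Cells ordered (a, b, c, d), using the addends listed in part a
stratum_1 = (13, 92, 81, 875)
stratum_2 = (7, 78, 60, 400)
crude = collapse([stratum_1, stratum_2])
print(crude, round(odds_ratio(*crude), 2))
```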
21. a.
Farming occupation and brain cancer risk
                 Farming occupation
                 Yes       No      Total
Cases             85      206        291
Controls         628     1355       1983
Total            713     1561       2274
OR = (85x1355) / (206x628) approx.= 0.89
The OR of 0.89 indicates no (or possibly a slight inverse) crude association between
brain cancer risk and having had a farming occupation.
b. If 10% of "unexposed" cases in fact had had a farming occupation, then
0.10x206 approx.= 21 cases should be reclassified as exposed; if 15% of "exposed" controls
in fact had not had a farming occupation, then 0.15x628 approx.= 94 controls should be
reclassified as unexposed. The resulting table and OR would be:
Farming occupation and brain cancer risk
                 Farming occupation
                 Yes                 No                   Total
Cases            85 + 21 = 106       206 – 21 = 185          291
Controls         628 – 94 = 534      1355 + 94 = 1449       1983
Total            640                 1634                   2274
OR = (106x1449) / (185x534) approx.= 1.6
Correcting for the misclassification produces a table with a moderate positive
association between odds of farming occupation and brain cancer.
c. A. The odds ratios are different due to differential misclassification of exposure.
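The reclassification in answer 21b can be written as a small function. This sketch assumes, as part b does, that only "unexposed" cases and "exposed" controls are misclassified, and it rounds the moved counts to whole people as the hand calculation does; the function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)


def correct_exposure(cases, controls, p_cases_unexp_truly_exp, p_controls_exp_truly_unexp):
    """Reclassify exposure counts under the assumptions of part b.

    cases, controls: (exposed, unexposed) counts as observed.
    A fraction of "unexposed" cases is moved to exposed, and a fraction of
    "exposed" controls is moved to unexposed; moved counts are rounded.
    """
    moved_cases = round(p_cases_unexp_truly_exp * cases[1])
    moved_controls = round(p_controls_exp_truly_unexp * controls[0])
    new_cases = (cases[0] + moved_cases, cases[1] - moved_cases)
    new_controls = (controls[0] - moved_controls, controls[1] + moved_controls)
    return new_cases, new_controls


observed_cases, observed_controls = (85, 206), (628, 1355)
cases, controls = correct_exposure(observed_cases, observed_controls, 0.10, 0.15)
crude_or = odds_ratio(*observed_cases, *observed_controls)
corrected_or = odds_ratio(*cases, *controls)
print(cases, controls, round(crude_or, 2), round(corrected_or, 2))
```

The corrected OR is about 1.55, which rounds to the 1.6 reported in part b; because the misclassification rates differ between cases and controls, the correction moves the OR away from the crude 0.89 rather than simply away from or toward 1.0 as nondifferential error would.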
22. D. Sensitivity is a measure of validity (kappa is a measure of agreement that gives equal
weight to both classifications; standard error measures the variability of an estimate).
23.
a. The crude OR = (85 x 1,355) / (206 x 628) = 0.89. This value is substantially
different from the adjusted value of 1.5, indicating that confounding by age and sex
is present.
b. Controls were matched by age and sex to cancer cases for five cancer sites. Thus, the
control group is not a simple random sample from the study base, so that analyses
must control for the matching variables.
12/20/1999, Wayne D. Rosamond and Victor J. Schoenbach

1. a. Briefly summarize two criteria on which disease classifications are based. Discuss a reason
why these two criteria do not always correspond with one another. (3 pts)
1. b. List two examples of each of the two types of criteria you mentioned in 1a. (2 pts)
2. Cohort studies can form the framework for efficient substudies, using nested case-control
and case-cohort study designs. Which of the following best compares and contrasts nested
case-control studies and case-cohort studies? (3 pts)
A. Both nested case control and case-cohort studies select controls that are matched on
time of case development but only case-cohort studies allow for multiple
comparisons with different case groups.
B. Both nested case control and case-cohort studies select controls from the entire
baseline cohort, but in case-cohort studies the selection is done at random.
C. In case-cohort studies a single group of controls can be used for comparison with
several case groups.
D. In nested case control studies, cases are selected entirely from the non-exposed
cohort group.
E. both C and D
3. Name the three component parts of any kind of incidence measure. (3 pts)
4. Over a ten-year period the number of bicycle injury events in a population increases even as
the age adjusted bicycle injury rate decreases in the population. Describe two conditions that
could cause this outcome (assume the definition of a bicycle injury and the quality of the
data remain constant over the 10 year period) (3 pts)
5. Which of the following best describes the condition(s) that are required for the odds ratio
(OR) to estimate the risk ratio (RR) in a case-control study. (choose one best answer) (3pts)
A. Incident cases are identified for a defined population at risk.
B. The controls represent the base population that gave rise to the cases.
C. The disease outcome is rare in the base population at risk.
D. All of the above.
6. The association between induced abortion and breast cancer has been the subject of
previous epidemiological studies. Cohort studies have found no association, while at least
one case-control study has found a positive association. Possible explanations for the
different results in case-control and cohort studies of this topic include (choose single best
answer). (3pts)
A. Case-control studies are prone to selection bias, whereas cohort studies are not
vulnerable to selection bias.
B. Recall bias might explain the association observed in a case-control study, but this
would not be a problem in prospective cohort studies.
C. The method of disease classification is different in case-control and cohort studies.
D. All of the above
7. Swaen et al (1998) conducted a study of 6,803 males who worked for at least six months
before 1/1/80 at one of nine chemical plants in the Netherlands. The workers were
followed for mortality from 1/1/56 until 1/1/96. Before 1/1/80, 2,842 of the workers were
occupationally exposed to acrylonitrile and the other 3,961 workers were not exposed to
acrylonitrile. After 1/1/80, there was no exposure to acrylonitrile. To measure the
association between occupational exposure to acrylonitrile and several outcomes, the
investigators calculated standardized mortality ratios (SMRs) for both the exposed and the
unexposed workers. Age-interval-specific person-years were generated for specific exposure
groups and were multiplied by the mortality rates for the total male population of the
Netherlands to generate expected numbers of cause specific deaths.
a. What study design did the investigators use? (2 pts)
b. What was the (crude) cumulative incidence ratio (CIR) for mortality comparing the
exposed to the unexposed men? What are two reasons why this measure is
problematic with these data?
c. For brain cancer, the SMR for the exposed workers (SMR=173.9) was more than
twice the SMR for the unexposed workers (SMR=85.7). Why are these two SMRs
not strictly comparable? (3 pts)
d. There were 290 deaths due to all causes among the exposed group and 983 deaths
due to all causes among the unexposed group. What measure of effect could be
calculated to strictly compare all-cause mortality between the exposed and the
unexposed group. (2 pts)
8. The issue of classification of disease is fundamental to epidemiological investigations. The
degree to which we correctly separate cases of disease from non-cases can be quantified in terms
of specificity and sensitivity. The issue of correct classification is important in research
involving cerebrovascular disease (stroke). Generally speaking there are two kinds of strokes,
ischemic (blood flow to brain tissue is restricted because of a blocked artery in or leading to
the brain) and hemorrhagic (a vessel in the brain ruptures, causing bleeding in the brain).
These two pathologic processes are quite different.
Background information:
A panel of experts reviewed the medical records of 525 patients discharged from the hospital
with diagnosis codes indicative of a stroke (ICD 430-438). The panel classified strokes as
either ischemic or not ischemic. Assume the diagnosis reached by the panel is the most
accurate classification possible. Of the 525 cases, 325 had a discharge diagnosis code for
ischemic stroke (ICD code 434). Of these 325 patients, 85 were determined by the panel not
to be ischemic strokes. All but 20 of the patients with discharge diagnosis codes other than
434 were determined by the panel to have non-ischemic strokes.
Given the background information, compute the sensitivity, specificity, and positive
predictive value of a hospital discharge code for ischemic stroke (ICD code 434) in
classifying a patient as truly having an ischemic stroke.
a. sensitivity of a 434 code: (2 pts)
b. specificity of a 434 code: (2 pts)
c. positive predictive value of a 434 code: (2 pts)
d. Constructing a receiver/response operating characteristic (ROC) curve may be useful
in understanding the implications of using different case definitions. Briefly explain
what an ROC curve is and what information it provides. (2 pts)
e. If you were to use a 434 discharge code to identify a group of cases with ischemic
stroke and the sensitivity was 99% but the specificity was 40%, which of the following
would best describe your resulting case group? (choose one best answer). (2 pts)
A. The case group would be highly homogeneous with respect to
pathophysiology of stroke.
B. The case group would be highly heterogeneous with respect to
pathophysiology of stroke.
C. The case group would have many false negative ischemic strokes.
D. The case group would represent the source population of cases.
f. What two factors influence the positive predictive value of a screening test in
most situations? (2 pts)
9. Suppose that a study was conducted to compare the rates of automobile collisions in
two cities. The researchers were impressed with studies that suggest that the use of
cell phones and pagers contributes to auto collisions. They wanted to adjust
(standardize) the rates of auto collisions in the two cities for cell phone and pager
use. Data on cell phone use and auto collisions in the two cities were collected and
are presented in the table below.
Cell phone and   Corona del Mar, California         Boulder, Colorado
pager use        # persons  # accidents  Rate*      # persons  # accidents  Rate*
Heavy               4479        293                    100           2
Moderate             974         27                    300           6
Never               1106         15                   8293         145
Total               6559        335                   8693         153
* per 1000 persons
a. Calculate the crude total and cell phone/pager use specific rates for Corona
del Mar and Boulder. How do these two cities compare in crude prevalence
of auto accidents? (2 pts)
b. Using the combined number of persons in both areas as a standard, calculate
a standardized rate (standardized for cell phone/pager use) for each of the
cities. Use the direct standardization method. Briefly describe how these
standardized rates compare with each other and with the crude rates. Briefly
describe any meaningful differences. (4 pts)
c. In general, which of the following best describes a major weakness of both
crude and adjusted rates? (2 pts)
A. Both measures hide or obscure the heterogeneity in the population.
B. Both measures are only estimates of the true population rate.
C. Neither measure can be used to determine the magnitude of disease
burden in the population.
D. None of the above.
10. In a community intervention study, like the Minnesota Heart Health Program, the
effectiveness of an educational intervention program was evaluated. Which of the
following best describes the unit of assignment, the unit of observation, and the unit
of analysis in these types of studies (in this order)? (2 pts)
A. group, person, group
B. person, group, group
C. group, group, group
D. none of the above
11. Indicate next to each statement below whether you consider it to be TRUE, FALSE,
or if you are NOT SURE. A correct answer receives 2 points, an incorrect one
zero.
a. An advantage of cohort designs compared to the pure case control designs is
that cohort studies can directly estimate risks.
b. The temporal sequence of exposure and disease can be directly addressed in
a cohort design as well as in a case control study.
c. A disadvantage of the cohort design compared to a case control study design
is that in a cohort study one cannot address multiple outcomes.
d. As described in class, a randomized clinical trial is an example of a prospective
dynamic cohort study.
e. A disadvantage of the cohort design compared to a case control study is that
in a cohort study one needs to follow a large number of participants if the
disease is rare.
f. Ecological studies cannot directly assess causal inference because they
measure disease and exposure in a person at the same point in time.
g. Correlation studies can be quick, inexpensive, and allow for multinational
comparisons.
h. A case report is a type of descriptive study that is commonly conducted,
partially because an appropriate control group is easily defined.
i. Cross-sectional studies are limited by their lack of generalizability, but are
powerful in that they directly measure risk.
j. The study of person, place, and time helps to understand the natural history
of a disease.
k. A risk difference is determined by the absolute difference in two incidence
rates, whereas the relative difference is considered an attributable risk.
l. A correlation coefficient measures the degree of linear or monotonic
relationship between two variables and is therefore suitable for determining
the epidemiologic strength of association between them.
m. As an estimate of a relative risk, an odds ratio is a measure of association that
can be used to determine the magnitude of an association between exposure
and an outcome.
n. An attributable risk proportion is a measure of the impact assessing how
much risk results from exposure levels. Attributable risks that adjust for the
prevalence of the causal factor in a population are called population
attributable risks.
o. Case control studies have several crucial advantages that relate to their
efficiency for studying rare conditions and those with prolonged induction
and their efficiency in examining many exposures and outcomes.
p. Incidence density is a proportion where the units of time are specified.
q. The decision to use an incidence density measure or a cumulative incidence
as a measure of the strength of association may depend on the objectives of
the study. Cumulative incidence is preferred if estimating individual risk is
the main objective.
r. A standardized mortality ratio (SMR) can be determined using indirect
adjustment. Because rates from a standard population are used, SMR’s from
two study populations can be compared as long as the rates in the standard
population are stable.
s. Comparability between cases and controls is an important step in constructing
a case-control study. It should be possible to detect exposure in controls to
the same extent as in cases. It is also critical that controls have similar
motivation and availability as cases. These two conditions are best met when
controls are selected from the general population.
12. Attributable measures are used by researchers to assess the public health impact of a
detrimental exposure, assuming causality. Given data from a cohort study on the
incidence of stroke (see below), estimate the attributable risk proportion among the
exposed (physically inactive). Explain your answer in one sentence. Assume that
physical activity is causally related to stroke risk.
Physical activity   Did develop   Did not develop   Person-years   Incidence per
level               a stroke      a stroke          (PY)           1,000 PY
ACTIVE                  45            5,955            43,200
INACTIVE               135           13,865           100,800
Total                  180           19,820           144,000
a. Attributable risk proportion (INACTIVITY) (3 pts)
Explain:
b. Additional data from the National Health and Nutrition Examination Survey
(NHANES) suggest the prevalence of a physically active lifestyle (at least 30 minutes
of moderate activity 3 days per week) is 27%. Using this information and your
answer to part (a), estimate what we can hope to accomplish with programs to get
people to be physically active in the total population. In one sentence explain your
answer. (3 pts)
Explain:
13. Suppose that in 1998 researchers hypothesized that communication ability and skill
in young adulthood were related to Alzheimer’s Disease. To test this they evaluated
hand written essays completed by a group of 350 nuns joining a single religious sect
in 1930. By careful review of these writing samples, the researchers categorized all
350 as either having a high error profile (N=150) or a low error profile (N=200).
Using surveillance of death certificates and other methods the researchers verified
vital status of each nun through 1998. An accounting of all deaths produced the table
below.
Cause of Death and Year by Handwriting Profile Status
             High error profile                                 Low error profile
Cause of Death        # of Deaths  Year of Death     Cause of Death        # of Deaths  Year of Death
Alzheimer’s Disease        2           1980          Alzheimer’s Disease        1           1985
Alzheimer’s Disease        5           1995          Heart Disease              8           1980
Heart Disease             10           1980          Heart Disease             10           1990
Heart Disease             15           1995          Other                     20           1960
Other                     25           1960          Other                     10           1970
Other                     30           1970
a. Describe the type of study design used in this example. (2 pts)
b. Compute the incidence density rate of Alzheimer’s disease death for those with a
high error profile and for those with a low error profile. (3 pts) Show your work.
c. Compute the incidence density ratio for the risk of Alzheimer’s disease death
associated with a high error communication profile. Explain, in two sentences or
less, what this value means. (3 pts)
d. Using data from this study compute an odds ratio for the association of a high error
communication profile with death from Alzheimer’s disease. Show a clearly labeled
2x2 table. (2 pts)
e. Compare the odds ratio with the incidence density ratio computed in part c and
explain why they are similar or different.
8/4/2000vs from questions.sav

School of Public Health, Department of Epidemiology
Epidemiology 168, Fall 1998
Midterm Exam Answer Guide
1. a. Manifestational criteria: disease definition and classification based on observable
characteristics, such as symptoms, signs, history, laboratory findings, response to treatment,
prognosis.
Causal criteria: disease definition and classification based on the cause of the condition.
b. Manifestational criteria: Examples are cancers, arthritis, cholecystitis, schizophrenia,
depression, addiction, insomnia, . . .
Causal criteria: microbial diseases for which the pathogen has been identified (syphilis, TB,
malaria, yellow fever, influenza, etc.), lead poisoning, birth trauma.
2. (C)- Other choices are incorrect because controls in case-cohort studies are not matched to
cases (A), controls are selected at random with both designs (B), and cases must be selected
without regard to exposure (D).
3. New cases or events, population at risk or source population, passage of time
4. The size of the population may have grown (number increases even though rate does not);
the age distribution of the population may have changed (e.g., influx of families with small
children, outmigration of families with older children), so that age-standardized rate may not
change but a greater proportion of the population may be in the higher risk age range
(assuming that younger children have higher injury rates).
5. (D)- All of the above - use of prevalent cases requires that duration is not related to
exposure, controls should provide estimate of exposure in study base, and rare disease
assumption is required for OR to estimate RR (though not for OR to estimate IDR).
6. (B)- In a prospective cohort study, information on exposure is obtained before the outcome
(breast cancer, in this case) has occurred. Therefore recall bias - different recall by cases and
noncases - is not an issue. In a case-control study, cases and noncases may recall and report
exposure with different degrees of accuracy.
7. a. A (retrospective) cohort study.
b. CIR = (290/2,842) / (983/3,961) = 0.411
A cumulative measure ignores possible differences in length of follow-up between groups
being compared. A crude measure ignores possible differences in the age distributions
between men who have been exposed and men who have not.
c. SMRs are an indirect method of standardization, since they are based on weighted
averages for which the weights are taken from the population whose SMR is being
computed rather than from a "standard" population. Unless the age (and in this case,
age-calendar year interval) distributions for the populations whose SMR's are being computed
are the same, the weighted averages that make up the SMR's are based on different sets
of weights and are not strictly comparable. Since age-interval distributions of exposed and
unexposed workers may differ, their SMR's are not strictly comparable.
d. Mortality rates computed with person-time denominators can be compared between
exposed and unexposed person-time. These will take into account the varying amounts of
follow-up for workers in different categories. Unless the person-years at risk for exposed and
unexposed workers have the same age distribution, which we do not know, adjustment
for age is needed. Since there are ample numbers of deaths from any cause, mortality rates
can be directly standardized using any reasonable set of weights. Since directly standardized
rates are "strictly comparable", a ratio or difference of directly standardized rates would be a
suitable measure of association.
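To see why the SMRs in part c are not strictly comparable, consider a purely hypothetical example (the rates and person-years below are invented for illustration and are not from Swaen et al.): two groups have identical age-specific death rates but different age distributions of person-years. Because each SMR weights by its own group's person-years, the two SMRs differ even though the groups' rates do not.

```python
# HYPOTHETICAL numbers for illustration only (not from the acrylonitrile study).
STANDARD_RATES = {"young": 0.001, "old": 0.010}  # reference population rates
GROUP_RATES = {"young": 0.002, "old": 0.010}     # identical in both groups


def smr(person_years, group_rates, standard_rates):
    """Observed deaths / expected deaths (indirect standardization)."""
    observed = sum(py * group_rates[age] for age, py in person_years.items())
    expected = sum(py * standard_rates[age] for age, py in person_years.items())
    return observed / expected


# Group A is mostly young person-time; group B is mostly old person-time.
smr_a = smr({"young": 9000, "old": 1000}, GROUP_RATES, STANDARD_RATES)
smr_b = smr({"young": 1000, "old": 9000}, GROUP_RATES, STANDARD_RATES)
print(f"SMR A = {100 * smr_a:.1f}, SMR B = {100 * smr_b:.1f}")
```

This prints `SMR A = 147.4, SMR B = 101.1` (scaled by 100, as in the SMRs quoted in the question), even though the two groups' age-specific rates are identical.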
8. All but 85 of the 325 code 434's were correct classifications, so there were 240 (=325-85)
ischemic stroke patients correctly classified by discharge code. All but 20 of the 200 patients
without code 434 were judged to have had a non-ischemic stroke, meaning that 20 were judged
to have had an ischemic stroke. Thus, there were 260 (240+20) ischemic stroke patients, of
whom 240 were identified by discharge code (sensitivity=240/260). The remaining 265
(=525-260) patients did not have an ischemic stroke, and 180 of them were in fact not given
a code 434 (specificity=180/265). Of the 325 code 434's, 240 had had an ischemic stroke
(PPV=240/325). These data are summarized in the following table:
Comparison of discharge code 434 and classification by expert panel
                             Expert panel
Discharge code         Ischemic    Not ischemic    Total
Code 434                  240           85           325
Other                      20          180           200
Total                     260          265           525
a. Sensitivity = (325-85) / [(325-85)+20] = 240 / 260 = 92.3%
b. Specificity = (200-20) / (525-260) = 180 / 265 = 67.9%
c. Positive predictive value of a 434 code = (325-85) / 325 = 73.8%
d. An ROC curve plots sensitivity (the true positive rate) against 1 – specificity (the false
positive rate) for each case definition or cutpoint. Examining the ROC curve shows the trade-off
between sensitivity and specificity that is available for the diagnostic test or measurement
method. [The area between the identity diagonal (slope = 1.0) and the ROC curve serves as a
measure of accuracy that takes into account both sensitivity and specificity, with the
assumption that the costs of false negatives and false positives are the same.]
e. (B) - Due to the low specificity (40%), 60% of non-ischemic (e.g., hemorrhagic) strokes in the
patient group will be classified as ischemic strokes, so the case group will mix quite different
pathologic processes.
f. Specificity and prevalence of the condition
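The computations in answer 8 can be checked with a few lines of Python. The sketch below also computes PPV from sensitivity, specificity, and prevalence via Bayes' theorem, to illustrate part f; the helper name `ppv_from` is hypothetical, and the prevalence of 0.10 in the loop is a hypothetical value chosen only to show how PPV falls when the condition is less common.

```python
# Cells of the discharge-code vs. expert-panel table in answer 8
tp, fp = 240, 85   # code 434: panel ischemic / panel not ischemic
fn, tn = 20, 180   # other codes: panel ischemic / panel not ischemic

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)


def ppv_from(sens, spec, prevalence):
    """PPV via Bayes' theorem, showing its dependence on prevalence."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


prevalence = (tp + fn) / (tp + fp + fn + tn)  # 260 / 525 in this patient group
print(f"Se={sensitivity:.3f} Sp={specificity:.3f} PPV={ppv:.3f}")
for p in (prevalence, 0.10):  # 0.10 is a hypothetical lower prevalence
    print(f"prevalence={p:.2f} -> PPV={ppv_from(sensitivity, specificity, p):.3f}")
```

At the observed prevalence the Bayes formula returns exactly 240/325, matching the direct calculation.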
9. a. Corona del Mar has a 2.9 times higher crude accident rate than Boulder.
Corona del Mar = 51.1/1000 and Boulder = 17.6/1000. Ratio = 2.9
b. Adjusted rates -
Corona del Mar: [(4579 x .0654) + (1274 x .0277) + (9399 x .0136)] / 15,252 = 30.3/1000
Boulder: [(4579 x .0200) + (1274 x .0200) + (9399 x .0175)] / 15,252 = 18.5/1000
The cell phone/pager adjusted auto accident rate for Corona del Mar was 1.6 times that of
Boulder. A portion of the difference seen in the crude rates was due to differences in the
distribution of use of cell phones and pagers between the two cities.
The standard weights are the combined population sizes of the two cities in each cell
phone/pager use category. The weighted rates are the rates for each city, weighted (multiplied)
by the standard weights. The total of the weighted rates, divided by the total standard
population (15,252), is the directly standardized rate. A problem in using the directly
standardized rates is that there are small numbers of cellular phone and pager users in
Boulder.
The higher crude rate in Corona del Mar reflects the much higher use of cellular phones and
pagers, which is associated with a much higher accident rate. The difference is reduced for
the standardized rates, since these control for the different distributions of cellular phones
and pagers between the two cities. However, this is a situation where it is essential to
examine the specific rates, since Boulder has lower accident rates among cellular phone and
pager users but a higher rate among never-users.
Since the rates in never users are quite similar, Corona del Mar is likely to make its greatest
impact on accident rates by getting motorists to reduce cellular phone and pager use while
driving or finding some way to make such use safer (promote the use of "designated drivers"!?).
c.(A) Both measures obscure heterogeneity (variation) in rates across subgroups.
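A sketch of the direct standardization in answer 9, using the question's stratum counts and the combined persons in both cities as the standard population. The names are hypothetical. Working from unrounded stratum-specific rates gives about 30.3 per 1000 for Corona del Mar and about 18.45 per 1000 for Boulder, so the last digit of a hand calculation depends on how the stratum rates are rounded.

```python
# Stratum data from question 9: (persons, accidents) by cell phone/pager use
CORONA = {"heavy": (4479, 293), "moderate": (974, 27), "never": (1106, 15)}
BOULDER = {"heavy": (100, 2), "moderate": (300, 6), "never": (8293, 145)}


def crude_rate(city):
    """Total accidents per 1000 total persons."""
    persons = sum(n for n, _ in city.values())
    accidents = sum(x for _, x in city.values())
    return 1000 * accidents / persons


def direct_standardized_rate(city, standard):
    """Weight each stratum-specific rate by the standard population size."""
    total = sum(standard.values())
    weighted = sum(standard[s] * x / n for s, (n, x) in city.items())
    return 1000 * weighted / total


# Standard population: persons in both cities combined, stratum by stratum
standard = {s: CORONA[s][0] + BOULDER[s][0] for s in CORONA}
for name, city in (("Corona del Mar", CORONA), ("Boulder", BOULDER)):
    print(name, round(crude_rate(city), 1),
          round(direct_standardized_rate(city, standard), 1))
```

The crude ratio (about 2.9) shrinks to about 1.6 after standardization, as in part b.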
10. (A) Community intervention trials of this type assign groups to treatments and collect
measurements from individuals. The unit of analysis must be the unit of assignment (GROUP),
or the analysis must take both levels into account (e.g., using mixed models).
11. a. T – a cohort study enrolls people who are free of the outcome and monitors them for the
development of the outcome, so the cohort design can be used to estimate risk of the event;
b. Not sure – the temporal sequence of exposure and disease typically cannot be addressed
in a case-control study, though it can be in some cases (e.g., a genetic characteristic or other
"exposure" that can be definitively assigned to a time prior to disease onset);
c. F – a cohort design can readily be used to study multiple outcomes; a case-control design
can readily be used to study multiple exposures;
d. T – a randomized clinical trial often enrolls participants over a period of time, with
follow-up time measured from the time of randomization;
e. T – a cohort study begins with disease-free subjects and monitors them for development
of the outcome; if the outcome is rare, many subjects must be followed to obtain an
adequate number of cases;
f. F – ecological studies use group-level variables (e.g., per capita meat consumption) and
relate them to disease rates; direct assessment at the individual level is NOT made, which is
the basis for the ecological fallacy (where the group data are used to infer a link at the
individual level);
g. T – correlational studies (another term for ecological studies) are often used to compare
disease rates across geopolitical entities using available data;
h. F – a case report does not involve a control group;
i. F – cross-sectional studies measure prevalence, not risk (of a future event); they are the
most statistically generalizable type of study when, as is often the case, the study population
is obtained through population-sampling;
j. F – the natural history of a disease is the process by which it develops over time;
descriptive information relating to person, place, and time can at best provide only indirect
information;
k. F – as used in class, the term "attributable risk" refers to the risk difference;
l. F – strength of association as used in epidemiology refers to the degree of change in
one variable with respect to changes in the other variable; two variables can be very strongly
correlated (vary linearly or monotonically) yet a large change in one may be associated with only
a small change in the other (e.g., points lying close to a straight line with a modest slope have a
high correlation but show only a small change in the ordinate variable for a given change in the
variable on the abscissa);
m. T – for a rare outcome, the odds ratio (OR) closely approximates the cumulative
incidence ratio (CIR) and incidence density ratio (IDR), so it indicates strength of association
in the epidemiologic sense; when the outcome is not rare, the OR does not approximate them but
does vary with the CIR and IDR, so the OR still gives an indication of strength of association;
n. T – an attributable risk proportion estimates the proportion of risk that is associated with
an exposure in people who are exposed; attributable risk (as used in this course) is the risk
difference, which indicates the amount of risk associated with an exposure in people who are
exposed; attributable risk must be adjusted for the prevalence of the exposure in order to
estimate the amount of risk associated with exposure in the population as a whole;
o. F – since case-control studies begin with people who are already cases, they avoid having
to study a large number of people for a long time in order to accumulate enough cases; they
can also compare cases and controls with respect to many exposures; HOWEVER, they
cannot readily study many outcomes, since to do so requires enrolling cases for each of the
outcomes to be studied (i.e., equivalent to conducting several case-control studies that share
the same control group);
p. F – incidence density is a (relative) rate; cumulative incidence is a proportion;
q. F – incidence density and cumulative incidence are measures of frequency of occurrence,
not of strength of association;
r. F – comparability of standardized rates and ratios across study populations requires that
the standardized measures be constructed using the same set of weights; indirect
standardization (e.g., via an SMR) employs the weights (the number of people in each
stratum) from the study population, so measures standardized using this method are, strictly
speaking, useful only for comparing a study population with the standard population used in
the standardization;
s. F – typically, general population controls will be less motivated than cases and sources of
medical information for them will not be comparable to those for cases.
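The point made in item m can be checked numerically. The short Python sketch below is illustrative only: the risks are hypothetical values chosen so that the cumulative incidence ratio (CIR) is 2.0 in both scenarios, and the function names are our own. For a rare outcome the odds ratio (OR) is essentially equal to the CIR; for a common outcome the OR lies further from 1 than the CIR but still moves in the same direction.

```python
# Illustrative sketch: how closely the odds ratio (OR) tracks the
# cumulative incidence ratio (CIR) as the outcome becomes common.
# The risks below are hypothetical, chosen only for illustration.

def cumulative_incidence_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    """Ratio of risks (cumulative incidences) in exposed vs. unexposed."""
    return risk_exposed / risk_unexposed


def odds_ratio_from_risks(risk_exposed: float, risk_unexposed: float) -> float:
    """Ratio of the odds of disease, where odds = risk / (1 - risk)."""
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed


if __name__ == "__main__":
    # Same CIR of 2.0 in both scenarios; only the baseline risk differs.
    for label, r1, r0 in [("rare outcome", 0.002, 0.001),
                          ("common outcome", 0.40, 0.20)]:
        cir = cumulative_incidence_ratio(r1, r0)
        or_ = odds_ratio_from_risks(r1, r0)
        print(f"{label}: CIR = {cir:.2f}, OR = {or_:.2f}")
```

With these inputs the OR is about 2.00 for the rare outcome but about 2.67 for the common one.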
12. a. ARP = (I1 - I0) / I1 = (1.34 - 1.04) / 1.34 = 0.30 / 1.34 = 22% (after rounding);
equivalently, ARP = (RR - 1) / RR, with RR = I1 / I0 (about 1.29 from the rounded rates)
The "I can't remember formulas" method:

ARP = attributable cases / all exposed cases = attributable cases / 135
Attributable cases = attributable risk * exposed PY = (1.34 - 1.04)/1,000 PY * 100,800 PY = 30.24
ARP = 30/135 = 22% (after rounding)
Interpretation: Based on these data, 22% (about one in five) strokes in people who are
physically inactive can be attributed to their physical inactivity; in other words, if physically
inactive people became active early enough in their lives, their stroke incidence would
decrease by 22%.
b. A key point here is that 27% is the prevalence of physically active people, whereas the
exposure is physical inactivity, whose prevalence is therefore 100% - 27% = 73%
PARP = p1(RR-1) / [1 + p1(RR-1)] = 0.73(1.286-1) / [1 + 0.73(1.286-1)]
= (0.73 x 0.286) / (1 + 0.73 x 0.286) = 0.209 / 1.209 = 17%
(The formula PARP = (I - I0) / I can also be used by first estimating the crude population
incidence, I, as a weighted average of the incidences in exposed and unexposed, weighting by
the prevalence of exposure, e.g.: I = (0.73)(1.34) + (0.27)(1.04) = 1.259, so PARP = (1.259 -
1.04) / 1.259 = 17%.)
The "I can't remember formulas" method:
PARP = Attributable cases / All cases
Attributable cases are (1.34 - 1.04) x number of exposed person-years. Since we do not know
the population size, represent it by n. Based on the NHANES data, 27% of people are
physically active, so there are 0.73n physically inactive people (0.73n person-years over one
year). So: Attributable cases = (1.34 - 1.04)(0.73n) = 0.219n (per 1,000).
All cases are exposed cases + unexposed cases. There are 0.73n physically inactive and
0.27n physically active people (or person-years, if we assume a one-year period), so the
total number of cases = exposed cases + unexposed cases =
[0.73(1.34) + 0.27(1.04)]n = 1.259n (per 1,000)
Therefore, PARP = 0.219n / 1.259n = 17%
Note that these measures can be computed more precisely by using the original number of
cases and person-years and not rounding intermediate results, but two significant figures is
adequate for the actual result, and in this case the answer does not change.
Interpretation: Seventeen percent of all strokes in the population are attributable to physical
inactivity; if everyone were physically active, there would be 17% fewer strokes.
c. Attributable risk measures assume that the relationship is causal (i.e., that physical
inactivity does in fact cause an increased stroke risk). Some of the above interpretations may
also require that the process be reversible, so that changing to a physically active lifestyle
brings risk down to the level of someone who was not inactive. Another assumption is that
the rates and rate ratio observed in the cohort study hold for the entire population. Also, we
have ignored the effects of other factors, most notably age.
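The arithmetic for parts a and b can be reproduced with the Python sketch below. It assumes, as the answers above do, incidences of 1.34 and 1.04 strokes per 1,000 person-years in inactive and active people, 100,800 exposed person-years, a 73% prevalence of inactivity, and RR = 1.286 from the unrounded data; the function names are hypothetical. Both PARP formulas round to 17%.

```python
# Sketch of the Question 12 arithmetic (hypothetical helper names).
# Incidences are strokes per 1,000 person-years (PY), as quoted above.

def attributable_risk_proportion(i_exposed: float, i_unexposed: float) -> float:
    """ARP = (I1 - I0) / I1: share of exposed cases due to the exposure."""
    return (i_exposed - i_unexposed) / i_exposed


def parp_levin(p_exposed: float, rr: float) -> float:
    """PARP = p1(RR - 1) / [1 + p1(RR - 1)] (Levin's formula)."""
    excess = p_exposed * (rr - 1)
    return excess / (1 + excess)


def parp_weighted(p_exposed: float, i_exposed: float, i_unexposed: float) -> float:
    """PARP = (I - I0) / I, with I the prevalence-weighted crude incidence."""
    i_total = p_exposed * i_exposed + (1 - p_exposed) * i_unexposed
    return (i_total - i_unexposed) / i_total


I1, I0 = 1.34, 1.04      # inactive (exposed) vs. active, per 1,000 PY
EXPOSED_PY = 100_800     # person-years among the physically inactive
P1 = 1 - 0.27            # prevalence of inactivity (the exposure)

arp = attributable_risk_proportion(I1, I0)
attributable_cases = (I1 - I0) / 1000 * EXPOSED_PY   # excess rate x exposed PY

print(f"ARP = {arp:.0%}, attributable cases = {attributable_cases:.2f}")
print(f"PARP (Levin, RR = 1.286) = {parp_levin(P1, 1.286):.0%}")
print(f"PARP (weighted incidence) = {parp_weighted(P1, I1, I0):.0%}")
```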
13. a. This is a retrospective cohort study (researchers developed the hypothesis in 1998).
b. High error profile: (2 + 5 + 6 + 5)/8021 = 2.24 per 1,000 women-years.
Low error profile: (1+3+4) / 12,287 = 0.651 per 1,000 wy
Women-years (WY) are computed as follows:
End    Start   Years   Women     WY
1980   1930     50        2      100
1985   1930     55        5      275
1990   1930     60        6      360
1995   1930     65        5      325
1980   1930     50       10      500
1995   1930     65       15      975
1960   1930     30       25      750
1970   1930     40       30    1,200
1998   1930     68       52    3,536
Totals                  150    8,021
c. IDR = ID high / ID low = 2.24/0.651 = 3.4. Nuns with a high-error communications
profile die from Alzheimer's disease at 3.4 times the rate of nuns with a low-error
profile.
d.
Alzheimer’s Disease
Handwriting Profile   AD Yes   AD No
High error              18      132
Low error                8      192
odds ratio = [(18)(192)]/[(8)(132)] = 3.27
e. The two are similar because the condition is fairly rare.
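The Question 13 calculations can be checked with the Python sketch below. It assumes that the follow-up rows are those of the women-years table above (everyone starting in 1930), that the first four rows are the 18 Alzheimer's deaths in the high-error group, and that the low-error totals (8 deaths, 12,287 WY) are as given; the helper names are hypothetical.

```python
# Sketch of the Question 13 calculations (hypothetical helper names).
HIGH_ERROR_ROWS = [  # (end year, number of women), from the WY table
    (1980, 2), (1985, 5), (1990, 6), (1995, 5),
    (1980, 10), (1995, 15), (1960, 25), (1970, 30), (1998, 52),
]
START_YEAR = 1930


def women_years(rows, start=START_YEAR):
    """Sum over rows of (end - start) years of follow-up x number of women."""
    return sum((end - start) * n for end, n in rows)


def rate_per_1000(cases, person_years):
    """Incidence density per 1,000 person-years."""
    return 1000 * cases / person_years


def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: exposed cases a, exposed non-cases b,
    unexposed cases c, unexposed non-cases d."""
    return (a * d) / (b * c)


wy_high = women_years(HIGH_ERROR_ROWS)
id_high = rate_per_1000(2 + 5 + 6 + 5, wy_high)
id_low = rate_per_1000(1 + 3 + 4, 12_287)

print(f"WY = {wy_high:,}; ID high = {id_high:.2f}, ID low = {id_low:.3f} per 1,000 WY")
print(f"IDR = {id_high / id_low:.1f}")
print(f"OR  = {odds_ratio(18, 132, 8, 192):.2f}")
```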

Most of the questions on this examination relate to the article "Individual risk factors for hip
osteoarthritis: obesity, hip injury, and physical activity" (Cyrus Cooper, Hazel Inskip, Peter Croft,
Lesley Campbell, Gillian Smith, Magnus McLaren, and David Coggon. Am J Epidemiol 1998;
147:516-22). You may refer to this article during the examination.
1. Briefly list two reasons why a case control study is (or is not) appropriate to examine
individual risk factors for hip osteoarthritis. (2 pts)
2. The authors state that their cases come from a defined population. List four features of
the population or the study design that support this statement or helped the authors to
achieve it. (4 pts)
3. Considering the study population, study design, and other information in the article,
which of the following statements is (are) TRUE and which is (are) FALSE. (2 pts each)
a. In these two health districts, the incidence density of symptomatic hip
osteoarthritis of sufficient severity to warrant hip arthroplasty exceeds 40 per
100,000 person-years.
b. If about 12% of the population was age 65 years or older, then about 12,000
people age 65 years or older in the two districts have radiographic evidence of hip
osteoarthritis.
c. The data in Table 1 demonstrate that women are 1.9 times as likely to develop
severe symptomatic hip osteoarthritis as are men.
d. The data in Table 2 indicate that female gender is not a risk factor for hip
osteoarthritis.
e. In this study, matching the control group to the cases on age, as opposed to a
random sample of the general adult population, probably resulted in greater
statistical power and precision.
4. The case identification process was based on a register in each district made up of
persons on a waiting list for a total hip arthroplasty (surgical reformation of the hip joint).
Waiting lists for procedures are common in societies with a national or social medicine
system. In the United States, a region-wide waiting list for a hip arthroplasty is unlikely, as
access to this procedure would be more related to insurance status or
ability to afford such a procedure. Explain how using the register system in the United
Kingdom to select cases either increases or decreases the possibility of selection bias as
compared to a study conducted in the United States. (4 pts)
5. How was the diagnosis of hip osteoarthritis made in this study? Was this based on
manifestional or causal criteria? Explain your answer. (3 pts)
6. According to the authors: "For each case, a control of the same sex and age was
selected from the list of the same general practice held by the county Family Health
Service Association". State in one sentence the rationale for using a list from general
practitioners. (3 pts)
7. Eighty-four percent of the patients listed for total hip arthroplasty fulfilled the criteria
for entry into the study as cases. Which of the following best describes the criteria: (3 pts)
a. age > 45 years, being on the waiting list for hip arthroplasty, and the presence
of Heberden’s nodes.
b. age > 45 years, pain duration at least for 36 months, and presence of
Heberden’s nodes.
c. history of hip fracture within the past year, being on the waiting list for hip
arthroplasty and reside in the study area.
d. presence of Heberden’s nodes, history of hip fracture within the past year, and
reside in the study area.
e. being on the waiting list for hip arthroplasty, reside in the study area, and age
> 45 years
8. The authors report that 89% of the eligible cases agreed to participate and 60% of the
1060 controls approached agreed to participate. Which of the following best states a
condition regarding the non-responders that could lead to an odds ratio reported for the
risk of osteoarthritis associated with previous hip injury that is biased away from the null
(>1). Choose one best answer. (3 pts)
a. control non-responders are more likely to have a history of hip injury compared
to case non-responders.
b. control non-responders are less likely to have a history of hip injury compared
to case non-responders.
c. being a non-respondent is not related to previous hip injury.
d. none of the above
9. What was accomplished by replacing controls who refused to participate? (Choose one
best answer) (3 pts)
If controls who refused had not been replaced:
a. selection bias would have been greater;
b. the control group would have been less representative of the study base;
c. probability of a Type I error would have been greater;
d. probability of a Type II error would have been greater;
e. nondifferential misclassification bias would have been greater.
f. it would have been necessary to control for age and sex in the analysis.
10. The authors selected controls who were individually matched to cases by age, gender,
and family practitioner. Matching in the design stage is usually considered only for those
variables that are known to be confounders. Under which of the following circumstances
could gender be a confounder of the association between a risk factor (obesity) and the
outcome (hip osteoarthritis)? Circle all that apply. (4 pts)
a. the prevalence of obesity and the prevalence of hip osteoarthritis are both
higher in men than in women
b. the prevalence of obesity is lower in men than women, but the prevalence of
hip osteoarthritis is higher in men than women.
c. the prevalence of obesity is higher in men than women, but the prevalence of
hip osteoarthritis is the same in men and women.
d. the prevalence of obesity is the same in men and women, but the prevalence of
hip osteoarthritis is higher in men than women.
11. The odds ratios in Table 2 are "mutually adjusted for the other two variables" by
logistic regression. The following questions concern the models used to estimate the odds
ratios in the table (ignore the fact that it was "conditional" logistic regression and ignore
the middle categories for body mass index and presence of Heberden’s nodes) (2 pts
each):
a. How many logistic models were necessary to estimate the odds ratios for body
mass index >28.0, definite Heberden’s nodes, and previous hip injury among
women?
b. The odds ratio estimate for hip injury in women was 2.8. What must the logistic
coefficient have been?
c. From this table, estimate the odds ratio for women who had both definite
Heberden’s nodes and previous hip injury compared to women who had neither.
12. In this study, information on medical history, life style, and leisure time physical
activities was obtained through a "structured interviewer-administered questionnaire".
(page 517). It is possible that persons on a waiting list for a hip arthroplasty would be
more keenly aware of hip injuries they may have had in the past than controls. If true, this
is an example of which of the following? Choose one best answer. (3 pts)
a. differential case ascertainment bias
b. differential misclassification bias
c. differential selection bias
d. differential precision bias
e. none of the above

13. Among women, the odds of previous hip injury are higher among cases than controls
(Table 2; OR=2.8). As indicated in the footnotes for Table 2, the odds ratio for previous
hip injury is adjusted or controlled for the other two variables in the Table (body mass
index and Heberden’s nodes). Using the counts shown in Table 2, calculate an unadjusted
(crude) odds ratio for previous hip injury in women. (3 pts)
Unadjusted (crude) odds ratio = _________
14. Which of the following conclusions can be made from the above results? (choose one best answer) (3 pts)
a. the unadjusted (crude) association between hip injury and hip osteoarthritis in
women is completely confounded by body mass index and Heberden’s nodes.
b. since the unadjusted and adjusted odds ratios are similar, the risk factor (hip
injury) must not be associated with the adjustment variables (body mass index and
Heberden’s nodes)
c. since the unadjusted and adjusted odds ratios are similar, there is no effect-
measure modification of the association between hip injury and hip osteoarthritis.
d. none of the above
15. The odds ratios presented in Table 5 are adjusted for previous hip injury. Why might
they still be confounded by hip injury? (3 pts)
16. In Table 6, is the crude association between previous hip injury and risk of unilateral
hip osteoarthritis biased towards the null or away from the null? (2 pts)
17. Based on the data in Table 3, what is the odds ratio for Heberden's nodes (definite
versus none) for persons in the Upper tertile of body mass index? (3 pts)
18. Rothman has proposed that "public health synergism" is present when an observed
joint effect exceeds that expected under the additive model. Do the odds ratios in Table 3
indicate the presence of "public health synergism" for the effect of Heberden's nodes and
elevated body mass index on hip osteoarthritis? If not, do the odds ratios conform to a
multiplicative model? Include in your answer a 1-2 sentence assessment of whether these
data indicate "public health synergism". (For this question, ignore the row for "Possible"
Heberden's nodes and the column for the middle tertile of body mass index, and assume
that both Heberden’s nodes and elevated BMI reflect causal risk factors for hip
osteoarthritis. Note: do not necessarily rely on the authors' description of this table.)
(6 pts)
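The two reference models in Question 18 can be written out as in the Python sketch below. It assumes that odds ratios stand in for risk ratios (reasonable for an outcome this uncommon), that each OR is relative to the group with neither factor, and it uses, purely for illustration, the women's ORs of 1.5 and 2.8 quoted in the answer guide for Question 11c rather than the Table 3 data; the function names are hypothetical.

```python
# Sketch of additive vs. multiplicative expectations for a joint effect.
# ORs are used as stand-ins for risk ratios; each is relative to the
# group with neither factor. Function names are hypothetical.

def expected_additive(or_a: float, or_b: float) -> float:
    """Joint OR expected if excess risks add: OR_A + OR_B - 1."""
    return or_a + or_b - 1


def expected_multiplicative(or_a: float, or_b: float) -> float:
    """Joint OR expected if the ORs multiply: OR_A x OR_B."""
    return or_a * or_b


def public_health_synergism(or_joint: float, or_a: float, or_b: float) -> bool:
    """Criterion as stated in Question 18: the observed joint effect
    exceeds the expectation under the additive model."""
    return or_joint > expected_additive(or_a, or_b)


if __name__ == "__main__":
    a, b = 1.5, 2.8   # illustrative ORs from the answer to Question 11c
    print(f"additive: {expected_additive(a, b):.1f}, "
          f"multiplicative: {expected_multiplicative(a, b):.1f}")
    # When both ORs exceed 1, a joint OR that fits the multiplicative
    # model also exceeds the additive expectation, because
    # a*b - (a + b - 1) = (a - 1)(b - 1) > 0.
    print(public_health_synergism(expected_multiplicative(a, b), a, b))
```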
19. The authors investigated the association of specific sporting activities with risk of hip
osteoarthritis. Their data are presented in Table 5. Using their data, compute separately
the unadjusted (crude) odds ratios for osteoarthritis associated with playing golf and with
swimming in men and women combined. Consider those who do not participate in any
sport as the reference group and assume no missing data. Show two appropriate 2x2 tables
and your calculations. (4 pts)
19a. Compare these unadjusted (crude) odds ratios with the ones presented in Table 3.
Briefly describe and explain the comparison. (3 pts)
19b. Consider the possibility that golfers who have hip osteoarthritis are reluctant to seek
medical attention for their condition for fear it will mean the end of their ability to play
golf. Therefore, cases who golf are less likely to be selected for this study than cases
who do not golf. If the true OR associated with golf is 2.0, then which of the following
best describes the selection bias and its impact on the odds ratio you computed. (3 pts)
a. non-differential selection bias resulting in an odds ratio biased toward the null.
b. non-differential selection bias resulting in an odds ratio biased away from the
null.
c. differential selection bias resulting in an odds ratio biased away from the null.
d. differential selection bias resulting in an odds ratio biased toward the null.
19c. The authors state that "...the association with swimming may have arisen because
patients with hip osteoarthritis were advised to swim..." (page 521). Suppose that 25% of
the cases had been incorrectly classified as swimmers and assume that the misclassified
cases had not participated in any other sporting activity, either. Re-compute the odds ratio
for the association of hip osteoarthritis and swimming, after re-classifying these
individuals, using the numbers from the 2x2 table in question 19 above. Briefly discuss
how your conclusion about the role of swimming does (or does not) change. In what
direction did misclassification bias the study OR? (3 pts)
20. The odds ratio (95% confidence interval) estimating the risk of osteoarthritis
associated with a previous hip injury was 24.8 (3.1-199.3) in men and 2.8 (1.4-5.8) in
women (see Table 2).
a. Which estimate indicates a stronger association? (2 pts)
b. Which estimate is more precise? (2 pts)
c. Which estimate is more compatible with a population odds ratio of 4.0? (2 pts)
21. Which one of the statements best interprets the following passage? (3 pts)
"In a previous case-control study (17) of men aged 60-76 years, we observed a
doubling of risk for hip osteoarthritis among those in the highest third of body
mass index distribution, as compared with those in the lowest third, although the
increased risk was not statistically significant." (p519 bottom of right column)
a. Hip osteoarthritis is not as significant when it occurs in obese older patients,
because it is expected that overweight that lasts for many years will lead to
damage to the joints.
b. A doubling of risk is not significant from a statistical perspective, because it
represents only a moderate association.
c. The doubling of risk was not statistically significant because a p-value was not
computed, so it is not possible for the authors to know whether the increased risk
was due to chance.
d. If 1,000 independent random samples the same size as that study population
were drawn from a population with no increased risk of hip osteoarthritis, fewer
than 950 would have an OR between 0.5 and 2.0.
e. If 1,000 independent random samples the same size as that study population
were drawn from a population with a doubling of risk of hip osteoarthritis for the
highest third of the body mass distribution, as compared with the lowest third,
more than 5% of the samples would display no elevation in risk.
f. If 1,000 independent random samples the same size as that study population
were drawn from a population with a doubling of risk of hip osteoarthritis for the
highest third of the body mass distribution, as compared with the lowest third,
fewer than 80% would display an association of that magnitude.
22. A medical journalist, confused by the thrust of this article, comes to you and says:
"I've read this article several times, but I can't figure out what it shows about the
relationship of body mass index, Heberden's nodes, and hip osteoarthritis. The authors
explain that 'two broad mechanisms are believed to underlie the pathogenesis of
osteoarthritis at any joint site: mechanical stress and a generalized predisposition to the
disorder' as indexed by Heberden’s nodes [p519 right column]. That seems
straightforward enough, and they later conclude that the analysis 'supports the notion that
this condition arises through an interaction between a generalized predisposition to the
disorder and specific mechanical insults to the hip' [p521]. Y et on page 518 [right
column], the authors state that there was 'no statistically significant interaction' between
body mass index and Heberden's nodes, and on page 519 [left column] they refer to
obesity and a tendency to polyarticular involvement as 'independent risk factors for hip
osteoarthritis'. Would you please assess for me what this article shows about the
relationship among body mass index, Heberden's nodes, and hip osteoarthritis? I have
room for 40-60 words. Thanks!" (6 pts)
23. Write a brief statement for or against a causal relationship between hip injury and risk
of osteoarthritis. Comment specifically on at least two of Bradford Hill’s criteria for
causal inference. Support your conclusion with data or statements from the article. (4 pts)
Last changed 4/10/1999 by Victor_Schoenbach@unc.edu, links 8/4/2000vs, reworded #19c on 12/14/2000vs

School of Public Health, Department of Epidemiology
Final Examination, Fall 1998 - Answer Guide
1. Briefly list two reasons why a case control study is (or is not) appropriate to examine individual
risk factors for hip osteoarthritis. (2 pts)
The condition is relatively rare; a case-control study is faster to complete than a cohort study; and a wide range of exposures is of interest.
2. The authors state that their cases come from a defined population. List four features of the
population or the study design that support this statement or helped the authors to achieve it. (4 pts)
1. The two health districts had a centralized orthopedic facility for assessment and treatment of hip
osteoarthritis;
2. Local orthopedic surgeons were willing to enter all patients into the study;
3. All men and women 45 years and older who were placed on the waiting list for primary total hip
arthroplasty were considered for the study;
4. The authors included patients who consulted orthopedic surgeons privately.
5. The study excluded patients who lived outside the two districts.
The diverse socioeconomic profile was an advantage for generalizability but does not make this a defined
population.
3. Considering the study population, study design, and other information in the article, which of
the following statements are TRUE and which are FALSE. (2 pts each)
a. In these two health districts, the incidence density of symptomatic hip osteoarthritis of
sufficient severity to warrant hip arthroplasty exceeds 40 per 100,000 person-years.
[TRUE - 726 eligible cases / (1 million population x 1.5 years) = 726 / 1,500,000 person-years = 48.4 per 100,000 person-years]
b. If about 12% of the population was age 65 years or older, then about 12,000 people age
65 years or older in the two districts have radiographic evidence of hip osteoarthritis.
[TRUE - 10% population prevalence (in those aged 65 years and older) x 12% of one million (120,000 people) = 12,000]
c. The data in Table 1 demonstrate that women are 1.9 times as likely to develop severe
symptomatic hip osteoarthritis as are men.
[FALSE - the data in Table 1 cannot demonstrate this female excess, since there is no information
about the sex ratio in the older population; this ratio may well reflect a greater incidence of severe
symptomatic hip osteoarthritis in women, but some of the excess presumably derives from greater
mortality among men.]
d. The data in Table 2 indicate that female gender is not a risk factor for hip osteoarthritis.
[FALSE - controls were matched to cases on gender (and age), so the sex ratio in the controls must
match that in the cases]
e. In this study, matching the control group to the cases on age, as opposed to a random
sample of the general adult population, probably resulted in greater statistical power and
precision.
[TRUE - the mean age of the cases is 70 years old, with the majority older than 60; thus, the use of
general population controls without regard to age would result in relatively little overlap between the
age distributions of cases and controls on this very important variable.]
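The arithmetic behind answers 3a and 3b can be reproduced with the short Python sketch below. It uses the figures quoted in those answers (726 eligible cases, about 1 million residents, 18 months, 12% aged 65 or older, 10% radiographic prevalence in that age group) and, like answer 3a, assumes the whole population was at risk for the full 18 months; the variable names are our own.

```python
# Sketch of the arithmetic behind answers 3a and 3b.
# Assumes the whole population (~1 million) was at risk for all 18 months.

POPULATION = 1_000_000
CASES = 726                 # eligible cases over the study period
YEARS = 18 / 12             # 18 months of case ascertainment

person_years = POPULATION * YEARS
incidence_per_100k = 100_000 * CASES / person_years
print(f"Incidence density = {incidence_per_100k:.1f} per 100,000 PY")

# 10% radiographic prevalence among the 12% of residents aged 65+.
prevalent_65plus = 0.10 * 0.12 * POPULATION
print(f"Radiographic hip OA, age 65+: about {prevalent_65plus:,.0f} people")
```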
4. The case identification process was based on a register in each district made up of persons on a
waiting list for a total hip arthroplasty (surgical reformation of the hip joint). Waiting lists for
procedures are common in societies with a national or social medicine system. In the United States,
a region-wide waiting list for a hip arthroplasty is unlikely, as access to this
procedure would be more related to insurance status or ability to afford such a procedure. Explain
how using the register system in the United Kingdom to select cases either increases or decreases the
possibility of selection bias as compared to a study conducted in the United States. (4 pts)
Using the registry may reduce selection bias if affluence or ability to pay for a hip replacement is associated
with exposures like BMI, physical activity, Heberden’s nodes. Cases selected from surgery lists in the United
States system may have a differential association with a risk factor as compared with cases not receiving this
procedure, so measures of association may be more biased in a U.S. study.
5. How was the diagnosis of hip osteoarthritis made in this study? Was this based on manifestional
or causal criteria? Explain your answer. (3 pts)
(page 517, left column, 2nd paragraph): Diagnosis of hip osteoarthritis in this study was based on pelvic
radiographs. This is based on manifestional criteria.
6. According to the authors: "For each case, a control of the same sex and age was selected from the
list of the same general practice held by the county Family Health Service Association". State in one
sentence the rationale for using a list from general practitioners. (3 pts)
(page 517, left column, 3rd paragraph): In England and Wales, almost everyone is registered with a general
practitioner so that these lists essentially provide an enumeration of the general population.
7. Eighty-four percent of the patients listed for total hip arthroplasty fulfilled the criteria for entry
into the study as cases. Which of the following best describes the criteria: (3 pts)
a. age > 45 years, being on the waiting list for hip arthroplasty, and the presence of
Heberden’s nodes.
b. age > 45 years, pain duration at least for 36 months, and presence of Heberden’s nodes.
c. history of hip fracture within the past year, being on the waiting list for hip arthroplasty
and reside in the study area.
d. presence of Heberden’s nodes, history of hip fracture within the past year, and reside in
the study area.
e. being on the waiting list for hip arthroplasty, reside in the study area, and age > 45 years (answer)
8. The authors report that 89% of the eligible cases agreed to participate and 60% of the 1060
controls approached agreed to participate. Which of the following best states a condition regarding
the non-responders that could lead to an odds ratio reported for the risk of osteoarthritis associated
with previous hip injury that is biased away from the null (>1). Choose one best answer. (3 pts)
a. control non-responders are more likely to have a history of hip injury compared to case non-responders.
(answer)
b. control non-responders are less likely to have a history of hip injury compared to case
non-responders.
c. being a non-respondent is not related to previous hip injury.
9. What was accomplished by replacing controls who refused to participate? (Choose one best
answer) (3 pts) If controls who refused had not been replaced:
a. selection bias would have been greater;
b. the control group would have been less representative of the study base;
c. probability of a Type I error would have been greater;
d. probability of a Type II error would have been greater; (answer)
e. nondifferential misclassification bias would have been greater.
f. it would have been necessary to control for age and sex in the analysis.
Answer: d. Failure to replace controls who refused would have reduced both the number of controls and of
cases (due to the matching), with a loss of statistical power and increase in the probability of a type II error.
10. The authors selected controls who were individually matched to cases by age, gender, and family
practitioner. Matching in the design stage is usually considered only for those variables that are
known to be confounders. Under which of the following circumstances could gender be a
confounder of the association between a risk factor (obesity) and the outcome (hip osteoarthritis)?
Circle all that apply. (4 pts)
a. the prevalence of obesity and the prevalence of hip osteoarthritis are both higher in men than in women
(true)
b. the prevalence of obesity is lower in men than women, but the prevalence of hip osteoarthritis is higher in
men than women. (true)
c. the prevalence of obesity is higher in men than women, but the prevalence of hip
osteoarthritis is the same in men and women.
d. the prevalence of obesity is the same in men and women, but the prevalence of hip
osteoarthritis is higher in men than women.
11. The odds ratios in Table 2 are "mutually adjusted for the other two variables" by logistic
regression. The following questions concern the models used to estimate the odds ratios in the table
(ignore the fact that it was "conditional" logistic regression and ignore the middle categories for body
mass index and presence of Heberden’s nodes) (2 pts each):
a. How many logistic models were necessary to estimate the odds ratios for body mass index
>28.0, definite Heberden’s nodes, and previous hip injury among women.
"Mutually adjusted" means that each odds ratio comes from a model that includes the other two
factors, which means that all three factors are included in the same model. That single model
yields an adjusted odds ratio for each variable, so only one model was needed.
b. The odds ratio estimate for hip injury in women was 2.8. What must the logistic
coefficient have been?
The OR for a dichotomous or indicator variable is exp(beta), where beta is the logistic
coefficient. Therefore the coefficient was ln(2.8) = 1.0296.
c. From this table, estimate the odds ratio for women who had both definite
Heberden’s nodes and previous hip injury compared to women who had
neither.
The logistic model is based on additivity of the logit or multiplicativity of the odds.
Therefore, in a model with no interaction term, the odds ratio for the double exposure is the
product of the odds ratios for each of the risk factors: 1.5*2.8 = 4.2.
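Answers 11b and 11c can be checked with the Python sketch below. It assumes, as those answers do, a logistic model with 0/1 indicator terms for each factor and no product (interaction) term, and takes the women's adjusted ORs of 2.8 (previous hip injury) and 1.5 (definite Heberden's nodes) from the answers above; the variable names are our own.

```python
import math

# Sketch of answers 11b and 11c: logistic model with 0/1 indicator
# terms and no interaction term (an assumption stated in the text).

or_hip_injury = 2.8   # adjusted OR, previous hip injury, women
or_heberden = 1.5     # adjusted OR, definite Heberden's nodes, women

# OR = exp(beta), so beta = ln(OR).
beta_hip_injury = math.log(or_hip_injury)
beta_heberden = math.log(or_heberden)

# Without an interaction term, the log odds for "both" vs. "neither" is the
# sum of the two coefficients, so the joint OR is the product of the ORs.
joint_or = math.exp(beta_hip_injury + beta_heberden)

print(f"beta(hip injury) = {beta_hip_injury:.4f}")
print(f"joint OR = {joint_or:.1f}")
```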
12. In this study, information on medical history, lifestyle, and leisure-time physical
activities was obtained through a "structured interviewer-administered
questionnaire" (page 517). It is possible that persons on a waiting list for a hip
arthroplasty would be more keenly aware of hip injuries they may have had in the past
than controls. If true, this is an example of which of the following? Choose one best
answer. (3 pts)
a. differential case ascertainment bias
b. differential misclassification bias (answer)
c. differential selection bias
d. differential precision bias
13. Among women, the odds of previous hip injury are higher among cases than
controls (Table 2; OR = 2.8). As indicated in the footnotes for Table 2, the odds ratio
for previous hip injury is adjusted (controlled) for the other two variables in the
Table (body mass index and Heberden’s nodes). Using the counts shown in Table 2,
calculate an unadjusted (crude) odds ratio for previous hip injury in women. (3 pts)
Unadjusted (crude) odds ratio = __________ 2.9
14. Which of the following conclusions can be made from the above results? (Choose
one best answer) (3 pts)
a. the unadjusted (crude) association between hip injury and hip osteoarthritis
in women is completely confounded by body mass index and Heberden’s
nodes.
b. since the unadjusted and adjusted odds ratios are similar, the risk factor
(hip injury) must not be associated with the adjustment variables (body mass
index and Heberden’s nodes)
c. since the unadjusted and adjusted odds ratios are similar, there is no effect-
measure modification of the association between hip injury and hip
osteoarthritis.
d. none of the above (answer)

15. The odds ratios presented in Table 5 are adjusted for previous hip injury. Why
might they still be confounded by hip injury? (3 pts)
There may be residual confounding by type of hip injury or by how long ago the hip injury
occurred, or by imperfect recall of hip injury (non-differential misclassification of the confounder).
16. In Table 6, is the crude association between previous hip injury and risk of
unilateral hip osteoarthritis biased towards the null or away from the null? (2 pts)
Towards the null (crude OR = 7.6 vs. adjusted OR = 10.6)
17. Based on the data in Table 3, what is the odds ratio for Heberden's nodes
(definite versus none) for persons in the upper tertile of body mass index? (3 pts)
Within the upper tertile, OR for definite Heberden's nodes vs. none = 3.2 / 1.6 = 2.0
18. Rothman has proposed that "public health synergism" is present when an
observed joint effect exceeds that expected under the additive model. Do the odds
ratios in Table 3 indicate the presence of "public health synergism" for the effect of
Heberden's nodes and elevated body mass index on hip osteoarthritis? If not, do the
odds ratios conform to a multiplicative model? Include in your answer a 1-2 sentence
assessment of whether these data indicate "public health synergism". (For this
question, ignore the row for "Possible" Heberden's nodes and the column for the
middle tertile of body mass index, and assume that both Heberden’s nodes and
elevated BMI reflect causal risk factors for hip osteoarthritis. Note: do not
necessarily rely on the authors' description of this table.) (6 pts)
Odds ratios for hip osteoarthritis, by Heberden's nodes and body mass index

                      Body mass index
Heberden's nodes      Lowest third     Middle third       Highest third
None                  1.0              1.1 (0.7-1.8)*     1.6 (1.0-2.7)
Possible              1.5 (0.8-2.7)    1.5 (0.8-2.6)      2.0 (1.1-3.6)
Definite              1.4 (0.9-2.3)    2.2 (1.4-3.7)      3.2 (1.9-5.4)
* Numbers in parentheses, 95% confidence interval.
Ignoring the intermediate categories for Heberden's nodes and body mass
index gives the following expression for the additive model:
Expected joint excess risk = excess risk for factor 1 + excess risk for factor 2
= excess risk for Heberden's nodes + excess risk for body mass index
Since hip osteoarthritis of this severity is rare, odds ratios approximate risk
ratios, and each excess risk can be expressed on the relative scale as (OR - 1):
Expected joint excess risk = (OR for Heberden's nodes - 1) + (OR for body mass index - 1)
= (1.4 - 1) + (1.6 - 1) = 1.0
Observed joint excess risk = (3.2 - 1) = 2.2
The substantial difference between 2.2 and 1.0 indicates that the odds ratios
in this table do not conform to an additive model for expected joint effect.
The odds ratios do not conform to a multiplicative model, either:
Expected joint OR = (OR for Heberden's nodes) * (OR for body mass index)
= 1.4 * 1.6 = 2.24, vs. 3.2 observed
Thus, the relationship is "supramultiplicative", though not greatly so.
Since these odds ratios indicate a joint effect greater than that expected under
an additive model, "public health synergism" is present, to a moderate degree
(we expect a 100% increase in risk but observe a 220% increase).
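The arithmetic for questions 17 and 18 can be laid out in a few lines. This is a sketch, not the authors' analysis: it takes the four odds ratios from the table, treats each OR as a risk ratio (the rare-disease approximation the answer relies on), and compares the observed joint effect with the additive and multiplicative expectations. The names are illustrative.

```python
# Odds ratios from the table above; reference = no nodes, lowest BMI third
OR_NODES_ONLY = 1.4   # definite Heberden's nodes, lowest BMI third
OR_BMI_ONLY = 1.6     # no Heberden's nodes, highest BMI third
OR_BOTH = 3.2         # definite nodes and highest BMI third


def excess(odds_ratio):
    """Excess relative risk, treating the OR as a risk ratio (rare outcome)."""
    return odds_ratio - 1.0


expected_additive = excess(OR_NODES_ONLY) + excess(OR_BMI_ONLY)
observed_excess = excess(OR_BOTH)
expected_multiplicative = OR_NODES_ONLY * OR_BMI_ONLY

# Question 17: OR for definite nodes vs. none within the highest BMI third
or_nodes_high_bmi = OR_BOTH / OR_BMI_ONLY

print(f"expected joint excess (additive): {expected_additive:.2f}")
print(f"observed joint excess: {observed_excess:.2f}")
print(f"expected joint OR (multiplicative): {expected_multiplicative:.2f} vs. {OR_BOTH}")
print(f"OR for nodes within the highest BMI third: {or_nodes_high_bmi:.1f}")
```

The observed joint excess (2.2) exceeds the additive expectation (1.0), and the observed joint OR (3.2) exceeds the multiplicative expectation (2.24), matching the conclusion above.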
19. The authors investigated the association of specific sporting activities with risk of
hip osteoarthritis. Their data are presented in Table 5. Using their data, compute
separately the unadjusted (crude) odds ratio for hip osteoarthritis associated with playing golf
and with swimming, in men and women combined. Consider those who do not
participate in any sport as the reference group and assume no missing data. Show
the two appropriate 2x2 tables and your calculations. (4 pts)
Golf        Cases   Controls
YES            51         34
NO            140        162
OR = (51 x 162) / (34 x 140) = 1.7
Swimming    Cases   Controls
YES           156        110
NO            140        162
OR = (156 x 162) / (110 x 140) = 1.6
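Both crude odds ratios come from the same cross-product formula. A minimal Python sketch, with a hypothetical helper `odds_ratio` and the no-sport group (140 cases, 162 controls) as the shared reference row:

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table:
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


# Reference group (no sport): 140 cases, 162 controls
NO_SPORT_CASES, NO_SPORT_CONTROLS = 140, 162

or_golf = odds_ratio(51, 34, NO_SPORT_CASES, NO_SPORT_CONTROLS)
or_swim = odds_ratio(156, 110, NO_SPORT_CASES, NO_SPORT_CONTROLS)

print(f"crude OR, golf:     {or_golf:.2f}")
print(f"crude OR, swimming: {or_swim:.2f}")
```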
19a. Compare these unadjusted (crude) odds ratios with the ones presented in Table
3. Briefly describe and explain the comparison. (3 pts)
The table shows adjusted odds ratios of 1.4 and 1.5, respectively, only slightly lower than the
crude values (1.7 and 1.6). This suggests that BMI, Heberden's nodes, and hip injury
explain very little of the association of these two sports with hip osteoarthritis.
19b. Consider the possibility that golfers who have hip osteoarthritis are reluctant to
seek medical attention for their condition for fear it will mean the end of their ability
to play golf. Therefore, cases who golf are less likely to be selected for this study than
cases who do not golf. If the true OR associated with golf is 2.0, then which of the
following best describes the selection bias and its impact on the odds ratio you
computed? (3 pts)
a. non-differential selection bias resulting in an odds ratio biased toward the
null.
b. non-differential selection bias resulting in an odds ratio biased away from
the null.
c. differential selection bias resulting in an odds ratio biased away from the
null.
d. differential selection bias resulting in an odds ratio biased toward the null. (answer)
19c. The authors state that "...the association with swimming may have arisen
because patients with hip osteoarthritis were advised to swim..." (page 521). Suppose
that 25% of the cases classified as swimmers had been misclassified, and assume that
the misclassified cases had not participated in any other sporting activity, either.
Recompute the odds ratio for the association of hip osteoarthritis and swimming, after
re-classifying these individuals, using the numbers from the 2x2 table in question 19
above. Briefly discuss how your conclusion about the role of swimming does (or
does not) change. In what direction did misclassification bias the study OR? (3 pts)
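Swimming    Cases                     Controls
YES         156 - 39 (25%) = 117      110
NO          140 + 39       = 179      162
OR = (117 x 162) / (110 x 179) = 0.96. After reclassification there is essentially no
association between swimming and hip osteoarthritis, so the crude OR of 1.6 could be
explained by this misclassification alone. The misclassification was differential
(it affected only cases) and biased the odds ratio upward, away from the null.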
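The reclassification in 19c amounts to moving 39 cases from the swimming row to the no-sport row and recomputing the cross-product ratio. A short sketch under the question's stated assumption (the misclassified cases did no other sport); the helper `odds_ratio` is hypothetical:

```python
def odds_ratio(a, b, c, d):
    """Cross-product OR: (exposed cases * unexposed controls) /
    (exposed controls * unexposed cases)."""
    return (a * d) / (b * c)


swim_cases, swim_controls = 156, 110
none_cases, none_controls = 140, 162

# Scenario from 19c: 25% of the case "swimmers" did not swim (or play any
# other sport), so they move from the swimming row to the no-sport row
moved = round(0.25 * swim_cases)

or_observed = odds_ratio(swim_cases, swim_controls, none_cases, none_controls)
or_corrected = odds_ratio(swim_cases - moved, swim_controls,
                          none_cases + moved, none_controls)

print(f"cases reclassified: {moved}")
print(f"OR as observed: {or_observed:.2f}; after reclassification: {or_corrected:.2f}")
```

Only the case column changes, which is what makes the misclassification differential.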
20. The odds ratio (95% confidence interval) estimating the risk of osteoarthritis
associated with a previous hip injury was 24.8 (3.1-199.3) in men and 2.8 (1.4-5.8) in
women (see Table 2).
a. Which estimate indicates a stronger association? (2 pts) 24.8 (3.1-199.3)
b. Which estimate is more precise? (2 pts) 2.8 (1.4-5.8)
c. Which estimate is more compatible with a population odds ratio of 4.0? (2 pts) 2.8 (1.4-5.8)
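One way to make parts b and c concrete is to recover an approximate standard error of ln(OR) from each confidence interval and ask how many standard errors separate 4.0 from each estimate. This sketch assumes (the article does not say) that both intervals are Wald-type and symmetric on the log scale; the function names are illustrative.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def log_se_from_ci(lower, upper):
    """Approximate SE of ln(OR), assuming the 95% CI is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def compare(or_hat, lower, upper, reference=4.0):
    """Return (upper/lower CI ratio, |ln(OR) - ln(reference)| in SE units)."""
    se = log_se_from_ci(lower, upper)
    distance = abs(math.log(or_hat) - math.log(reference)) / se
    return upper / lower, distance


men = compare(24.8, 3.1, 199.3)
women = compare(2.8, 1.4, 5.8)

for label, (ci_ratio, distance) in (("men", men), ("women", women)):
    print(f"{label}: CI ratio {ci_ratio:.1f}; 4.0 lies {distance:.2f} SEs from the estimate")
```

Both intervals include 4.0, but the women's interval is far narrower (upper/lower ratio of about 4 vs. about 64), and 4.0 sits about one standard error from the women's estimate versus roughly 1.7 from the men's.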
21. Which one of the statements best interprets the following passage? (3 pts)
"In a previous case-control study (17) of men aged 60-76 years, we observed
a doubling of risk for hip osteoarthritis among those in the highest third of
body mass index distribution, as compared with those in the lowest third,
although the increased risk was not statistically significant." (p519 bottom of
right column)
a. Hip osteoarthritis is not as significant when it occurs in obese older
patients, because it is expected that overweight that lasts for many years will
lead to damage to the joints.
b. A doubling of risk is not significant from a statistical perspective, because
it represents only a moderate association.
c. The doubling of risk was not statistically significant because a p-value was
not computed, so it is not possible for the authors to know whether the
increased risk was due to chance.
d. If 1,000 independent random samples the same size as that study population were
drawn from a population with no increased risk of hip osteoarthritis, fewer than 950 would
have an OR between 0.5 and 2.0. (answer)
e. If 1,000 independent random samples the same size as that study
population were drawn from a population with a doubling of risk of hip
osteoarthritis for the highest third of the body mass distribution, as
compared with the lowest third, more than 5% of the samples would display
no elevation in risk.
f. If 1,000 independent random samples the same size as that study
population were drawn from a population with a doubling of risk of hip
osteoarthritis for the highest third of the body mass distribution, as
compared with the lowest third, fewer than 80% would display an association
of that magnitude.
Answer: d. "Statistically significant", as conventionally used, means that in the absence of
any true association a model based on chance would yield an association as strong or
stronger than the observed value less than 5% of the time.
22. A medical journalist, confused by the thrust of this article, comes to you and says:
"I've read this article several times, but I can't figure out what it shows about the
relationship of body mass index, Heberden's nodes, and hip osteoarthritis. The
authors explain that 'two broad mechanisms are believed to underlie the
pathogenesis of osteoarthritis at any joint site: mechanical stress and a generalized
predisposition to the disorder' as indexed by Heberden’s nodes [p519 right column].
That seems straightforward enough, and they later conclude that the analysis
'supports the notion that this condition arises through an interaction between a
generalized predisposition to the disorder and specific mechanical insults to the hip'
[p521]. Yet on page 518 [right column], the authors state that there was 'no
statistically significant interaction' between body mass index and Heberden's nodes,
and on page 519 [left column] they refer to obesity and a tendency to polyarticular
involvement as 'independent risk factors for hip osteoarthritis'. Would you please
assess for me what this article shows about the relationship among body mass index,
Heberden's nodes, and hip osteoarthritis? I have room for 40-60 words. Thanks!" (6
pts)
Points to include:
1. Both body mass index and presence of Heberden's nodes were associated with greater
risk of hip osteoarthritis, even when the other is absent.
2. People with both elevated BMI and Heberden's nodes have a greater risk for hip
osteoarthritis than people with only one of these risk factors, and even greater than would be
expected from adding or multiplying their individual effects (i.e., greater than expected under
both additive and multiplicative models).
3. The authors seem to believe, and the study does not show otherwise, that most cases of hip
osteoarthritis in their study result from a combination of mechanical stress (which could be
something other than obesity) and biologic predisposition (which might not yet have
manifested in other joints).
4. The paper presents no biological theory or other information suggesting a mechanistic
interaction between obesity and osteoarthritis at other sites in regard to hip osteoarthritis,
but rather discusses a possible etiologic role for each individually.
Grading: 6 points for 3 of these, 5 points for two of them, 3 points for one. If none was
mentioned then 1-2 points awarded depending upon the relevance and accuracy of what was
written.
23. Write a brief statement for or against a causal relationship between hip injury and
risk of osteoarthritis. Comment specifically on at least two of Bradford Hill’s criteria
for causal inference. Support your conclusion with data or statements from the
article. (4 pts)
(You're on your own here!)
4/10/1999, 8/4/2000, changes to 19c 12/14/2000 vs Victor_Schoenbach@unc.edu


NOTE: Adjust margins and/or pagination before printing.

NOTE: This exam is illustrative only. It proved to be somewhat on the easy side, and a number of the
questions were problematic.
1. Match the term from column A with the most appropriate topic or
concept from column B (use each term only once and each topic only
once). (1 pt each = 12 pts)
Column A - Terms Column B - Topics
____ cumulative incidence 1. Case-control studies
____ incidence density 2. Causal inference
____ prevalence 3. Confounds cross-sectional data
____ dose response 4. Death certificate
____ induction period 5. Descriptive epidemiology
____ odds ratio 6. Diagnostic tests
____ preventive fraction in the exposed 7. Estimates risk
____ underlying cause of death 8. Measures impact
____ positive predictive value 9. Natural history of disease
____ detectable, pre-clinical phase 10. Population screening
____ migrant studies 11. Proportion
____ cohort effect 12. Relative rate
2. Which of the following best describes the basis of the diagnosis of
myocardial infarction? (Choose one best answer) (4 pts)
____ a. manifestational criteria
____ b. Bradford criteria
____ c. causal criteria
____ d. etiologic criteria
3. In the Minnesota Heart Health Program (as described in class) and many
other community intervention studies, the effectiveness of an
educational intervention program is evaluated. Which of the following
selections best describes the unit of assignment, the unit of
observation, and the unit of analysis (in this order) in studies of
these types? (Choose one best answer) (4 pts)
____ a. community, person, community
____ b. person, community, community
____ c. community, community, community
____ d. none of the above
-2- ID Number __-__ __ __ __
4. In a hypothetical clinical trial, a new drug was compared with
"standard therapy" treatment. The endpoint was myocardial infarction.
Which of the following best describes the primary reason to randomize
patients to treatments? (Choose one best answer) (4 pts)
____ a. to create two treatment groups that are similar at baseline on
both known and unknown factors associated with myocardial
infarction.
____ b. prevent bias introduced when the patients know what type of
treatment they are receiving
____ c. prevent bias introduced when the investigators know what type of
treatment the patients are receiving
____ d. b and c
5. Indicate TRUE or FALSE next to each of the following statements.
(2 pts each)
____ a. The indirect method of age standardization applies stratum-specific
rates from an external population to the age distribution
of the study population.
____ b. A standardized mortality ratio is an example of a stratum-specific
crude rate.
____ c. Standardized mortality ratios are preferred for making comparisons
among multiple populations.
____ d. Direct age standardization can be characterized as applying the
same set of weights to the age-specific rates of populations to be
compared.
6. 200 women with a history of chest pain were assessed by an exercise
tolerance test (ETT). Compared with coronary angiography (the "gold
standard"), ETT had a sensitivity of 68% for detecting coronary artery
disease, with specificity 61%. The predictive value of a negative ETT
was higher in younger women (less than 52 years old) and in women with
no more than one risk factor (i.e., family history, hypertension, high
cholesterol, smoking, or diabetes). If sensitivity and specificity do
not vary by age or risk factor status, why is the higher negative
predictive value expected? (3 pts)
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
7. A randomized trial studied 242 HIV-seropositive, 2nd-trimester
pregnant women to assess the efficacy of zidovudine (AZT) in
preventing perinatal HIV transmission. Results were:
Results from a randomized trial of the efficacy of
zidovudine in preventing perinatal HIV transmission
___________________________________________________________________
                                 Zidovudine    Placebo     All
Births (no.)                        121          121       242
Infection status of infant
  Non-infected                      112           90       202
  HIV-infected                        9           31        40
Transmission rate (%)               7.4         25.6      16.5
___________________________________________________________________
7A. Which one answer best describes the transmission rate in the table?
(4 pts)
____ a. proportion
____ b. relative rate
____ c. absolute rate
____ d. odds
7B. Using the data in the table, estimate the relative risk of HIV
infection for infants whose mothers took zidovudine relative to
infants of mothers who took placebo. Show formula and calculations.
(4 pts)
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
7C. Based on the data in the above table, estimate the proportion of
potential cases of perinatal HIV transmission that could be prevented
by providing zidovudine to HIV-positive, 2nd trimester pregnant women
who would otherwise not receive the drug. (Assume all women take the
medication and consider only singleton births.) Show formula or
diagram and calculations. (4 pts)
7D. Zidovudine is now routinely offered in association with all
pregnancies to known HIV-seropositive mothers in the United States.
However, growth of resistant strains will reduce the drug's
effectiveness in preventing perinatal HIV transmission. Observational
studies for assessing zidovudine's effectiveness have serious
methodologic problems, but which of the following case-control designs
would be the most nearly valid? (Choose one best answer.) (4 pts)
____ a. Cases are HIV-infected infants; controls are uninfected infants.
____ b. Cases are HIV-infected infants; controls are uninfected infants of
HIV-seropositive mothers.
____ c. Cases are HIV-infected infants; controls are infants whose mothers
should have received zidovudine but did not.
____ d. Cases are HIV-infected infants whose mothers received zidovudine;
controls are uninfected infants whose mothers received zidovudine.
8. The following is background information for questions 8A-8E.
Objective: To determine the prevalence of sexually transmitted
diseases (STD) and high-risk sexual behavior for STD among adolescent
males admitted to a juvenile detention facility.
Methods: Data were obtained from interview, exam, and lab tests.
Results:
Table 1. Behavioral variables in 966 subjects
___________________________________________________________________
Variable                       Mean (SD)      Range     Median
Age at first coitus            12.3 (2.0)     5-17      13
No. lifetime partners          13.7 (16.8)    1-100     8
No. partners past 4 months     2.9 (3.4)      0-30      2
No. weeks since last sex       5.8 (15.1)     1-260     2
___________________________________________________________________
SD = standard deviation
8A. Which of the descriptive statistics in Table 1 (mean, SD, range,
median) is most susceptible to being influenced by a single extreme
value? (Choose one best answer.) (4 pts)
a. mean
b. SD
c. range
d. median
8B. Of the four variables in Table 1, which has the most symmetrical
(normal-like) distribution? (Choose one best answer.) (4 pts)
a. age at first coitus
b. number of lifetime partners
c. number of partners in the past 4 months
d. number of weeks since last sex
Table 2. Sexually transmitted diseases in adolescent males
admitted to a juvenile detention facility.
______________________________________________________
Disease               No. positive / tested
Syphilis              7/930
Gonorrhea             42/940
Chlamydia             66/957
Any of the above      109/908
_______________________________________________________
8C. Based on the above data and assuming that gonorrhea and chlamydia have
the same average duration, how do their incidence rates compare in
this population? (Choose the one correct answer.) (3 pts)
a. Incidence of gonorrhea is lower than that of chlamydia.
b. Incidence of gonorrhea is the same as that of chlamydia.
c. Incidence of gonorrhea is higher than that of chlamydia.
8D. Based on the above data but this time assuming that the two diseases
have the same incidence, how do their average durations compare in
this population? (Choose the one correct answer.) (2 pts)
a. Duration of gonorrhea is shorter than that of chlamydia.
b. Duration of gonorrhea is longer than that of chlamydia.
8E. Elaborate on your answer to the preceding question by deriving an
estimate of the duration of gonorrhea relative to that of chlamydia.
Show the basis for your answer. (3 pts)
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
9. The following is background information for questions 9A-9D.
In a large urban school district, among 8,000 middle-school
youth who were well at the beginning of the school year, 400 were
absent for 10 days or longer due to acute asthma ("AA-10") during the
first nine-week quarter. Based on a survey believed accurate for the
period, 15% of middle-school youth in the county middle schools smoke
cigarettes. Interviews with the youth who were absent for 10 days or
longer revealed that 100 of them were cigarette smokers. Assume that
the school enrollment does not change during the quarter.
9A. Show these data in the form of a 2 x 2 table. Include an appropriate
title, labels that identify each row and column, and row and column
totals. (4 pts)
9B. What is the cumulative incidence (CI) of AA-10 (10+ absent days due to
acute asthma), in:
a. the cohort of 8,000 youth? (1 pt)
b. youth who smoke cigarettes? (1 pt)
c. youth who do NOT smoke cigarettes? (1 pt)
9C. What measure would you use to quantify the strength of association
between cigarette smoking and AA-10? Show the formula for this
measure, substitute the appropriate numbers for that formula, compute
the result, and state its meaning in one sentence. (4 pts)
a. Formula
b. Substitution
c. Result
d. Meaning ____________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
9D. Assuming that cigarette smoking is responsible for the observed excess
in AA-10, how many cases of AA-10 during the quarter are attributable
to cigarette smoking? Show a relevant formula or diagram,
intermediate computation, and result, and give a sentence stating the
meaning of the result. (4 pts)
a. Formula or diagram
b. Substitution
c. Result
d. Meaning ____________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
10. Suppose that 900 of the subjects in question #8 consent to regular STD
screening following release from detention. Subjects are counseled
about preventive measures and screened every three months for two
years. All cases are treated and cured.
Table 3. Numbers of cases of three sexually transmitted diseases
in adolescent males discharged from a juvenile detention facility
____________________________________________________________________
                           Follow-up time (months)
                            3     6     9    12    15    18    21    24
Syphilis                    0     1     0     3     1     2     3     4
Gonorrhea                  10     8    15    21    11    12    19    24
Chlamydia                  15    23     8    18    17    17    14    11
Dropouts (cumulative)      10    30    50    90   120   140   190   270
Number tested             890   870   850   810   780   760   710   630
____________________________________________________________________
(Subjects can become infected with the same organism more than once
and/or become co-infected with more than one organism.)
10A. What is the prevalence of chlamydia at the 12-month follow-up? (3 pts)
10B. What is the average incidence density (per 100 person-months or per
100 person-years) of chlamydia for the two years of follow-up? Assume
that dropouts contribute no time to follow-up after the last time
they are tested and that subjects remain at risk even while infected. (3 pts)
10C. Give two reasons for preferring incidence density over cumulative
incidence for assessing frequency of infection in this cohort. (6 pts)
i. ___________________________________________________________
_______________________________________________________________
ii. ___________________________________________________________
_______________________________________________________________
11. A study of alcoholism and major depressive disorder recruited 100
consecutive patients in a Veterans Administration hospital in Urbana,
Illinois. All patients had been diagnosed as being alcohol abusers.
An equal number of non-abusers were selected randomly from the same VA
hospital. 76 of the participants identified as being abusers
fulfilled criteria for major depression, as did 20 of the non-abusers.
Evaluate the evidence provided by this study for the inference that
alcohol abuse causes depression in relation to the following aspects:
11A. What is an inherent weakness in this design that makes it susceptible
to obtaining inaccurate data? (3 pts)
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
11B. Many of the criteria for causal inference pertain to the evaluation of
evidence from multiple studies, but several can also apply to a single
study. Name two (2) such criteria and use them to evaluate
(quantitatively where possible) the evidence from the above study.
(6 pts)
i. ___________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
ii. ___________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
Congratulations!
10/12/97, 10/13/97 EPID 168 Midterm, Fall 1997

Last changed 4/10/1999 by Victor_Schoenbach@unc.edu
University of North Carolina School of Public Health
EPID 168 - Fundamentals of Epidemiology
Copyright, 1997, Victor Schoenbach and Wayne Rosamond

Note: The scores on this examination were on the high side, and some of the questions on this
exam were problematic.
MIDTERM EXAMINATION, Fall 1997 -- Answer Guide
1. Matching (1 pt each):
Column A - Terms Column B - Topics
7 cumulative incidence (11 is ok) 1. Case-control studies
12 incidence density 2. Causal inference
11 prevalence (7 is ok) 3. Confounds cross-sectional data
2 dose response 4. Death certificate
9 induction period 5. Descriptive epidemiology
1 odds ratio 6. Diagnostic tests
8 preventive fraction in the exposed 7. Estimates risk
4 underlying cause of death 8. Measures impact
6 positive predictive value 9. Natural history of disease
10 detectable, pre-clinical phase 10. Population screening
5 migrant studies 11. Proportion
3 cohort effect 12. Relative rate
(Credit was also given for some other pairings.)
2. Diagnosis of myocardial infarction is based on manifestational
criteria. (4 pts)
3. a. community, person, community (units of assignment, observation, and
analysis, respectively, in the Minnesota Heart Health Program). (4 pts)
4. a. to create two treatment groups that are similar at baseline on both
known and unknown factors associated with myocardial infarction. (4 pts)
5. Age standardization, True or False (2 pts each):
T a. The indirect method of age standardization applies stratum-specific
rates from an external population to the age distribution of the
study population.
F b. A standardized mortality ratio is an example of a stratum-specific
crude rate.
F c. Standardized mortality ratios are useful when the number of events
is small and multiple comparisons among populations are to be
made.
T d. Direct age standardization can be characterized as applying the
same set of weights to the age-specific rates of the populations
to be compared.
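Statements a and d describe two different computations, which a small sketch can make concrete. All numbers below are hypothetical (two age strata invented for illustration, not taken from the exam), and the function names are illustrative.

```python
# Hypothetical two-stratum data, for illustration only (not from the exam)
study_deaths = [10, 40]               # deaths in the study population, young / old
study_person_years = [10_000, 8_000]  # person-years in the study population
external_rates = [0.0008, 0.004]      # age-specific rates in an external population
standard_weights = [60_000, 40_000]   # standard population used for direct adjustment


def direct_adjusted_rate(rates, weights):
    """Direct method: weighted average of a population's own age-specific
    rates, using the same (standard) weights for every population compared."""
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)


def standardized_mortality_ratio(observed, person_years, reference_rates):
    """Indirect method: apply external age-specific rates to the study
    population's age distribution to get expected deaths; SMR = O / E."""
    expected = sum(n * r for n, r in zip(person_years, reference_rates))
    return sum(observed) / expected


study_rates = [d / n for d, n in zip(study_deaths, study_person_years)]
adjusted = direct_adjusted_rate(study_rates, standard_weights)
smr = standardized_mortality_ratio(study_deaths, study_person_years, external_rates)
print(f"directly adjusted rate: {adjusted * 1000:.1f} per 1,000 person-years")
print(f"SMR: {smr:.2f}")
```

Because each SMR is weighted by its own study population's age distribution, SMRs for several populations are not, in general, directly comparable with one another, whereas directly adjusted rates share one set of weights.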
6. Predictive value depends on sensitivity, specificity, and prevalence. For
a given sensitivity and specificity, higher prevalence means higher positive
predictive value, and lower prevalence means higher negative predictive value.
Prevalence of coronary artery disease is lower in women who are
younger and have few risk factors, so negative predictive value is
higher in this group. (3 pts)
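The dependence of negative predictive value on prevalence can be seen by holding the ETT's sensitivity (68%) and specificity (61%) fixed and varying prevalence. A minimal sketch; the prevalences in the loop are hypothetical, chosen only to show the trend.

```python
SENSITIVITY = 0.68  # ETT vs. angiography, from the question
SPECIFICITY = 0.61


def negative_predictive_value(prevalence, sens=SENSITIVITY, spec=SPECIFICITY):
    """NPV = true negatives / all negatives, per unit of population."""
    true_negatives = spec * (1 - prevalence)
    false_negatives = (1 - sens) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# Hypothetical prevalences of coronary artery disease (not from the study)
for prevalence in (0.1, 0.3, 0.5):
    print(f"prevalence {prevalence:.0%}: NPV = {negative_predictive_value(prevalence):.2f}")
```

NPV falls as prevalence rises, so groups with lower prevalence (younger women, fewer risk factors) show a higher NPV even though the test itself is unchanged.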
7A. a. proportion -- The "transmission rate" is the number of HIV-infected
infants divided by the total number of births in that group. The
proportion estimates the prevalence of HIV infection in these infants.
The proportion also estimates the cumulative incidence of HIV infection
among infants of HIV-infected women enrolled in the 2nd trimester of
pregnancy. Cumulative incidence measures for birth outcomes are a complex matter, because of
the great opportunity for selection bias due to impaired fecundity and
fertility, and unrecognized pregnancy loss. In this case, however,
the exposure occurs after the pregnancy has been recognized. (4 pts)
7B. Relative risk of HIV infection for zidovudine vs. placebo:
Relative risk (RR) = CI1 / CI0 = 7.4% / 25.6% = 0.29
The transmission rates serve as estimates of CI1 and CI0 (the
incidences can be estimated from the transmission rates even if the
transmission rates are regarded as prevalences, since there is a restricted risk
period and duration is not a factor). (4 pts)
7C. Proportion of potential cases of perinatal HIV transmission that could
be prevented by zidovudine, i.e., the preventive fraction in the
exposed, PF1 (all women take zidovudine, so all are exposed) (4 pts):
PF1 = 1 - RR = 1 - 0.29 = 0.71 or 71%
By diagram:
HIV transmission rate
  25.6% |_______________________ transmission rate in women who do not
        |    ^                   take zidovudine (based on the placebo group)
        |    |
        |    |  amount of the transmission rate that is
        |    |  prevented by zidovudine
        |    v
   7.4% |_______________________ transmission rate in women who took
        |                        zidovudine
        |
      0 |_______________________
(25.6% - 7.4%) / 25.6% = 1 - 7.4% / 25.6% = 0.71 (= 1 - RR)
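The relative risk in 7B and the preventive fraction in 7C follow directly from the trial counts. A short sketch using the counts in the table (9/121 and 31/121); it computes PF1 as (CI0 - CI1) / CI0, which equals 1 - RR as in the diagram.

```python
# Counts from the zidovudine trial table
infected_azt, births_azt = 9, 121
infected_placebo, births_placebo = 31, 121

ci1 = infected_azt / births_azt          # cumulative incidence, exposed (zidovudine)
ci0 = infected_placebo / births_placebo  # cumulative incidence, unexposed (placebo)

rr = ci1 / ci0
pf1 = (ci0 - ci1) / ci0  # preventive fraction in the exposed, equal to 1 - RR

print(f"CI1 = {ci1:.1%}, CI0 = {ci0:.1%}")
print(f"RR = {rr:.2f}, PF1 = {pf1:.0%}")
```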
7D. b. Cases are HIV-infected infants; controls are uninfected infants of
HIV-seropositive mothers. Using all uninfected infants as controls
would make zidovudine appear to be a risk factor for HIV transmission:
most mothers do not have HIV, so most uninfected infants have mothers
who had no reason to receive zidovudine, whereas every case mother is
HIV-seropositive. Choices c. and d. choose the control and/or case group
partly on the basis of exposure, which completely undermines a
case-control design. (4 pts)
8A. c. Range -- the range is completely determined by the highest
and lowest values. (4 pts)
8B. a. Age at first coitus -- its mean and median are close together
and not very far from the middle of the range. The mean and
median are also close together for the number of partners in the past
4 months, but they are nowhere near the middle of the range. (4 pts)
8C. a. Incidence of gonorrhea is lower than that of chlamydia -- if
duration is the same for both diseases, the prevalence odds are
proportional to the incidence density, so gonorrhea's smaller
prevalence (42/940 vs. 66/957) implies a lower incidence. (3 pts)
8D. a. Duration of gonorrhea is shorter than that of chlamydia -- if
incidence rates are the same, chlamydia must last longer in order for
its prevalence to be higher. (2 pts)
8E. (3 pts) Prevalence odds = duration x incidence density. Therefore:

$$\frac{\text{prevalence odds (gonorrhea)}}{\text{prevalence odds (chlamydia)}} = \frac{\text{duration}(G) \times \text{incidence density}}{\text{duration}(C) \times \text{incidence density}}$$

Since both diseases have the same incidence, the ratio of their
durations equals the ratio of their prevalence odds:

$$\frac{\text{prev. odds for gonorrhea}}{\text{prev. odds for chlamydia}} = \frac{42/898}{66/891} = \frac{0.0468}{0.0741} = 0.63$$
(Credit was also given for "prevalence = incidence x duration", though
this is true only approximately.)
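A quick numerical check of 8E, assuming (as the answer does) a steady state and equal incidence density for both infections, so that the ratio of prevalence odds estimates the ratio of durations. The function name is illustrative.

```python
def prevalence_odds(cases: int, noncases: int) -> float:
    """Prevalence odds = cases / non-cases."""
    return cases / noncases


# 42 of 940 youth had gonorrhea; 66 of 957 had chlamydia (from 8C)
odds_gonorrhea = prevalence_odds(42, 940 - 42)   # 42 / 898
odds_chlamydia = prevalence_odds(66, 957 - 66)   # 66 / 891

# With equal incidence density, duration(G) / duration(C) = ratio of prevalence odds
duration_ratio = odds_gonorrhea / odds_chlamydia
print(f"{odds_gonorrhea:.4f} / {odds_chlamydia:.4f} = {duration_ratio:.2f}")
# 0.0468 / 0.0741 = 0.63
```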
9A. School absence from acute asthma and cigarette smoking (4 pts):

School absence due to acute asthma in middle school,
by cigarette smoking status

                               Smokers   Nonsmokers   Total
                               -------   ----------   -----
AA-10*                             100          300     400
Absent fewer than 10 days        1,100        6,500   7,600
                               -------   ----------   -----
Total                          1,200**        6,800   8,000

* AA-10 refers to absence of 10+ days due to acute asthma.
** Based on 15% smoking prevalence
9B. Cumulative incidence of AA-10:
a. Crude CI = 400 / 8,000 = 50 per 1,000 or 5%
b. CI in smokers = 100 / 1,200 = 83 per 1,000 or 8.3%
c. CI in nonsmokers = 300 / 6,800 = 44 per 1,000 or 4.4%
9C. Strength of association (4 pts):

$$\text{Cumulative incidence ratio} = \frac{\text{CI in smokers}}{\text{CI in nonsmokers}} = \frac{8.3\%}{4.4\%} = 1.89$$

d. The cumulative incidence ratio (CIR) of 1.9 indicates a moderate
association between cigarette smoking and extended school absence.
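The 9B and 9C calculations, sketched in Python from the counts in the 9A table (the variable names are illustrative):

```python
# Counts from the 9A table
cases = {"smokers": 100, "nonsmokers": 300}
students = {"smokers": 1_200, "nonsmokers": 6_800}

ci_crude = sum(cases.values()) / sum(students.values())
ci_smokers = cases["smokers"] / students["smokers"]
ci_nonsmokers = cases["nonsmokers"] / students["nonsmokers"]
cir = ci_smokers / ci_nonsmokers  # cumulative incidence ratio

print(f"crude {ci_crude:.1%}, smokers {ci_smokers:.1%}, "
      f"nonsmokers {ci_nonsmokers:.1%}, CIR {cir:.2f}")
# crude 5.0%, smokers 8.3%, nonsmokers 4.4%, CIR 1.89
```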
9D. Number of cases of excessive absence due to acute asthma (AA-10) that
(assuming causation) are attributable to smoking.
This question asks for the size of the shaded box in the diagram in
the "evolving text". That diagram, with numbers instead of variables,
is:

Incidence
   8.3% |                 +-----------------+  8.3% = incidence in
        |                 |XXXXXXXXXXXXXXXXX|       exposed persons
        |                 |  3.9% x 1,200   |
        |                 |      = 47       |  3.9% = "attributable risk"
   4.4% +-----------------+-----------------+
        |                 |\\\\\\\\\\\\\\\\\|
        |       300       |  4.4% x 1,200   |  4.4% = incidence in
        |                 |      = 53       |       unexposed persons
      0 +-----------------+-----------------+
               6,800         1,200 (15%)
             Nonsmokers        Smokers
So the number of cases attributable is 47 (after rounding). This
number can be obtained in various ways:
o Number of cases in smokers - "expected" cases in smokers
    = 100 - 1,200 x 4.4%
o Attributable risk x Number of smokers
    = (I1 - I0) x 1,200
    = (8.3% - 4.4%) x 1,200
o Number of cases in smokers x Attributable risk proportion (ARP)
    = 100 x (1.89 - 1) / 1.89
o Overall number of cases x Population attributable risk proportion (PARP)
    = 400 x (I - I0) / I
    = 400 x (5% - 4.4%) / 5%
    = 400 x 12%
All these methods come up with approximately the same answer, the
differences being due to the rounding of intermediate results in
obtaining some of the incidences and the CIR. When the numbers
from the table are used and intermediate results are not rounded, the
number of cases attributable to smoking is 47.0588.
Assuming causation, cigarette smoking is responsible for heavy absence
(10 days or more during the fall quarter) due to acute asthma in about
47 middle schoolers in the district, or 12% of all students with heavy
absence due to acute asthma.
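A sketch that computes the attributable number in 9D by each of the four routes listed, carrying full precision throughout. Without intermediate rounding every route gives 47.0588 (exactly 800/17), which is why the rounded versions differ only slightly.

```python
cases_smokers, n_smokers = 100, 1_200
cases_nonsmokers, n_nonsmokers = 300, 6_800
total_cases = cases_smokers + cases_nonsmokers

i1 = cases_smokers / n_smokers                     # incidence in smokers
i0 = cases_nonsmokers / n_nonsmokers               # incidence in nonsmokers
i_all = total_cases / (n_smokers + n_nonsmokers)   # overall incidence
cir = i1 / i0

estimates = {
    "observed - expected in smokers": cases_smokers - n_smokers * i0,
    "attributable risk x smokers": (i1 - i0) * n_smokers,
    "cases in smokers x ARP": cases_smokers * (cir - 1) / cir,
    "all cases x PARP": total_cases * (i_all - i0) / i_all,
}
for method, value in estimates.items():
    print(f"{method}: {value:.4f}")  # each prints 47.0588
```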
10A Prevalence of chlamydia at the 12-month follow-up (3 pts):

$$\text{Prevalence} = \frac{\text{Cases}}{\text{Population (PAR)}} = \frac{18 \text{ cases found at 12-month follow-up}}{810 \text{ youth tested at 12-month follow-up}} = 2.2\%$$
10B Average incidence density of chlamydia (average simply means one
number that applies to the entire two-year interval, rather than one
rate for each three-month interval; if you compute the latter rates,
however, and take their average, you should obtain approximately the
same result as the overall incidence density) (3 pts):

$$\text{Incidence density} = \frac{\text{(Total) cases}}{\text{(Total) person-time}}$$

$$= \frac{(15 + 23 + 8 + 18 + 17 + 17 + 14 + 11) \text{ cases}}{(890 + 870 + 850 + 820 + 780 + 760 + 710 + 630) \times 3 \text{ months}}$$

$$= \frac{123 \text{ cases}}{18{,}930 \text{ person-months}} = 0.65/100 \text{ person-months} = 7.8/100 \text{ person-years}$$
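The 10B computation as a short Python sketch. Like the answer, it approximates person-time by multiplying the number of youth followed in each three-month interval by 3 months; the list names are illustrative.

```python
# New chlamydia cases and youth followed in each 3-month interval (10B)
cases = [15, 23, 8, 18, 17, 17, 14, 11]
followed = [890, 870, 850, 820, 780, 760, 710, 630]
months_per_interval = 3

total_cases = sum(cases)
person_months = sum(followed) * months_per_interval
id_per_month = total_cases / person_months

print(f"{total_cases} cases / {person_months:,} person-months")
print(f"{100 * id_per_month:.2f}/100 person-months = "
      f"{1200 * id_per_month:.1f}/100 person-years")
# 123 cases / 18,930 person-months
# 0.65/100 person-months = 7.8/100 person-years
```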
10C Reasons for preferring incidence density in this case (6 pts):
o These diseases have an extended risk period (i.e., one longer than the
  period of observation).
o People can acquire these diseases more than once.
o Subjects have different lengths of follow-up time.
11A. Inherent weaknesses in this design that make it susceptible to
obtaining inaccurate data are the potential for problems of recall,
reporting, and recording in medical records; also, there is
considerable opportunity for alcohol abuse status to influence
diagnosis of depression. (3 pts)
11B. Criteria for causal inference (6 pts)
Strength of association -- in this regard the study provides strong
evidence of causation due to its very high odds ratio
([(76)(80)]/[(20)(24)] = 12.7 -- assuming for this discussion that the
OR is not biased by design problems)
Temporality (antecedent-consequent) -- there is no indication here
that alcohol abuse preceded major depression, and the reverse seems
just as possible.
Other criteria (e.g., dose-response, biological plausibility,
experiment, analogy, consistency, coherence) either do not apply to a
single study or cannot be evaluated with the information provided.
10/19/97 EPID 168 Midterm, Fall 1997, Answer guide

The following exam questions relate to the article: Freudenheim J et al. Exposure to
breastmilk in infancy and the risk of breast cancer. Epidemiology 1994;5:324-331. You
may refer to this article during the examination.
NOTE:
o Write all answers on the answer sheets provided.
o You may keep the examination questions.
o Write the last five digits of your student id number in the upper right-hand
corner of each page of your answer sheets.
o This examination is closed book. However, you may use a calculator and an
English, foreign-language, or medical dictionary.
o When you finish please sign your name on the sign-out sheet under the
pledge:
"I have neither given nor received help from others in completing this examination."
o Good luck and happy holidays.
______________________________________________________________________________________
1. Which of the following best characterizes the present study as presented in the article
(2 pts):
A. Analytic study to investigate the hypothesized relationship in an available
dataset
B. Descriptive study using available data
C. Analytic study of data collected to investigate the hypothesized relationship
D. A post-hoc analysis of data collected primarily for another study (i.e., of
secondary data)
2. Find an example from the paper for each of the following (give the page number and
quote enough of the words to identify the point or passage; the same point or phrase
cannot be used more than once) (2 pts each)
a. A finding from a migrant study or studies;
b. A finding from descriptive epidemiology;
c. An association from an ecologic study.
3. Several previous studies of exposure to breastmilk and risk of breast cancer in
adulthood reported little association in crude analyses (p. 324). The authors suggest
that the absence of an association could have resulted from a failure to adjust for
age. Which of the following best explains why failure to adjust for age could have
obscured an underlying true association? (Choose one best answer.) (2 pts)
A. Age is causally related to breast cancer risk and an infant’s age is related to
her exposure to breastmilk.
B. Age is causally related to breast cancer risk and infant feeding practices have
changed over time.
C. Age is causally related to breast cancer risk but not associated with breast
feeding practices.
D. Age is causally related to breast cancer risk but is causally related to breast
feeding practices.
4. The authors describe their study as a case-control study of dietary and reproductive
factors for breast cancer (p. 324). Which of the following best describes the type of
situation for which case-control studies are most advantageous compared to other
designs? (Choose one best answer.) (2 pts)
A. rare exposure, common endemic disease.
B. rare exposure, rare endemic disease.
C. common exposure, common endemic disease
D. common exposure, rare endemic disease
5. The authors used the term "cohort effects" in regard to results from previously
reported studies. Which of the following best describes what is meant by cohort
effects in this context? (choose one best answer). (2 pts)
A. Breast cancer cases are heterogeneous with respect to known factors.
B. Secular changes in infant feeding practices result in an association between
age and exposure to breastmilk.
C. Breast cancer and control subjects come from nonoverlapping birth cohorts.
D. Recall accuracy of breastmilk exposure may differ by birth cohort.
6. Cases in this study were incident cases of confirmed cancer of the breast (p. 325).
Which of the following best describes the advantage of selecting incident cases over
prevalent cases? (Choose one best answer.) (2 pts)
A. selecting from a pool of prevalent cases would make separation of factors
associated with risk and those with survival more difficult.
B. selecting from a pool of prevalent cases would make exposure assessment
more difficult because of pre-existing disease status.
C. selecting from a pool of incident cases creates a more homogenous case
group with regard to unknown confounding factors.
D. selecting from a pool of incident cases reduces misclassification bias.
7. The authors characterize this study as a case-control study of primary and
histologically confirmed cancer of the breast in women. For each of the two key
terms in this phrase, briefly explain its meaning and significance for the study: (2 pts
each)
a. primary
b. histologically-confirmed
8. In this study, controls were selected by a random process from residents of the two
counties and were frequency age matched to cases (p. 325). Which of the following
best describes a reason for preferring community controls over hospital-based
controls for this study? (Choose one best answer.) (2 pts)
A. the random selection of controls from the community usually produces
groups of cases and controls that are similar in known and unknown
confounding variables.
B. the random selection of controls from the community provides a better
estimate of breastmilk exposure among the source population.
C. the random selection of controls from the community ensures that the
subsequent odds ratio is not an overestimation of the association of breast
feeding and adult breast cancer.
D. The random selection of controls from the community reduces the likelihood
of differential misclassification of exposure in cases and controls.
9. Information on breastmilk exposure was based on subjects' self-report (p. 325). If
exposure information could also be obtained from an independent source (such as
physician records, or reports from parents), then the agreement between these two
methods could be compared. Which of the following measures would be most
appropriate to quantify the reliability between the two methods? (Choose one best
answer.) (2 pts)
A. kappa coefficient
B. correlation coefficient of reproducibility
C. intraclass correlation coefficient
D. product-moment correlation
E. A or B
F. A, B, or C
10. In a hypothetical validation study of self-report of being breastfed as an infant, the
presence of a newly discovered antibody that could serve as a "gold standard"
indicator of being breastfed as an infant was compared to self-report. Testing for
the presence of this new antibody is very expensive and was done only on the 204
cases age 40-50 (see table 1). The following data from the validation study were
compiled. Calculate the (a) sensitivity, (b) specificity, (c) positive predictive value,
and (d) predictive value of a negative test. Construct an appropriate 2x2 table and
show your work (6 pts)
Data from validation study:
1. the breastfed antibody was found in 73.5% of cases.
2. 80 self-reports were false negatives
11. From the data presented in Table 1 answer the following:
a. For premenopausal women with greater than a high school education,
compute and interpret the odds ratio for having been breastfed as an infant and
breast cancer as an adult. (2 pts)
b. Referring to your analysis in part (a), assume now that 20% of controls who
gave a positive history of having been breastfed had not in fact been
breastfed, but that all other data were correct. Compute and interpret the odds
ratio for having been breastfed as an infant and breast cancer as an adult under this
assumption. (2 pts)
c. Which of the following best describes the type of misclassification illustrated
in part (b) above? (2 pts)
A. differential misclassification of disease and exposure status
B. differential misclassification of exposure
C. nondifferential misclassification of exposure
D. nondifferential misclassification of disease and exposure status
12. For each of the following statements, indicate if it is TRUE OR FALSE: (1 pt each)
a. By matching the controls to the cases on age, the authors have ensured that
age will not be a confounder.
b. The procedure for identifying cases is essentially one of active surveillance.
c. The difference between the proportion of cases interviewed and the
proportion of controls interviewed will cause selection bias.
d. The fact that premenopausal controls who had been breastfed were somewhat
older than controls who had not (page 325, bottom of col. 2) indicates
frequency matching by age did not "work."
e. The absence of an association between age and breast cancer in tables 1 and 2
is likely to be a reflection of selection bias from the low response rates for
cases and controls.
f. In postmenopausal women there appears to be a "dose response"
relationship between body mass index and the association between having
been breastfed.
g. A case-control study design is often the design of choice in outbreak
investigations.
h. For a factor under study to be considered an effect modifier, it must be an
independent risk factor for the outcome of interest.
13. A list of control variables for use in the logistic regression models appears on page
325, middle of column 2. These variables have been chosen because they (choose one
best answer): (2 pts)
A. are likely to be associated with breast cancer risk in the bottle-fed women.
B. are known or suspected risk factors for breast cancer, or at least proxies for
such factors
C. are likely to be associated with infant feeding history in the controls
D. are likely to be associated with infant feeding history in the cases
14. The presentation of data in Table 2 can be used to examine a number of
relationships. Using these data give a numerical example of each of the following
(show your work and in one sentence explain what the number means): (2 pts each)
a. An association between breast cancer risk and having zero pregnancies. Use
>= 3 pregnancies as a reference.
b. An association between having been breastfed and being over 165 cm in
height. Use <160 cm as a reference.
c. An association between breast cancer and having been breastfed, overall.
15. On page 326, 2nd column, the authors state "As shown in Table 3, the risk of breast
cancer associated with having been breastfed, was about 0.7 for both pre- and
postmenopausal women." In this context, to which of the following epidemiologic
measures does the term "risk" refer? Choose one best answer. (2 pts)
A. Cumulative incidence
B. Incidence density
C. Attributable risk
D. Odds ratio
16. Using the data in Table 3, estimate AND state the meaning of the following
measures (for this question you may ignore the possibility of selection bias in cases
and controls):
a. Attributable Risk Proportion (ARP) for NOT having been breastfed for all
breast cancer (both premenopausal and postmenopausal breast cancer,
combined). Note that an ARP is also known as the etiologic fraction in the
exposed. (3 pts)
b. Population Attributable Risk Proportion (PARP) for NOT having been
breastfed for premenopausal and for postmenopausal breast cancer,
separately (i.e., 2 PARP's). Note that the PARP is also known as the etiologic
fraction. (4 pts)
c. Why would you or would you not expect the PARP to be different for
premenopausal breast cancer compared to the PARP for postmenopausal
breast cancer in this investigation (part b)? (2 pts)
17. In the multiple logistic model referred to as Model 2 in Table 3, what was the
coefficient for the variable not-having-been-breastfed among all breast cancer cases?
(2 pts)
Which of the following assumptions is involved in that model? Indicate True or False
for each assumption. (1 pt each)
a. The odds of breast cancer vary as the product of the odds for age and the
odds for education.
b. The odds of breast cancer vary as the sum of the odds for age and the odds for
education.
c. Age, education, and not having been breastfed were independent of (i.e.,
uncorrelated with) each other.
d. Breast cancer is a rare disease.
18. Suppose that cases who refused to participate in this study were less likely to have
been breastfed as infants than those who participated in the study. Which of the
following best describes what this fact would imply for the observed relative risk
associated with being breastfed compared with what would have been observed had
all persons participated in the study? (Choose one best answer.) (2 pts)
A. the observed relative risk would be biased away from the null.
B. the observed relative risk would be subject to selection bias and the direction
of the bias can not be estimated.
C. the observed relative risk would be biased toward the null.
D. the observed relative risk would be subject to misclassification bias and the
direction of the bias can not be estimated.
19. In table 3, the confidence intervals for the OR's for all women do not include the
value 1.0, whereas the confidence intervals for all but one of the OR's for
premenopausal breast cancer and postmenopausal breast cancer do. Mathematically,
what does this pattern reflect? (2 pts)
20. On page 324, 2nd column, the authors offer a possible explanation of why two
previous studies of breastfeeding and breast cancer found little crude association,
observing that the result may have been "confounded by a failure to adjust for age,
because of cohort effects with regard to breastfeeding frequency". The following
stratified analysis has been constructed to illustrate a situation where cohort effects
with regard to breastfeeding completely obscure a true protective association seen
when age is controlled.

                Age < 60              Age > 60                Total
           Breastfed Bottlefed   Breastfed Bottlefed   Breastfed Bottlefed
Cases          24        40         256       100         280       140
Controls       79        86         204        54         280       140
OR                0.653                 0.678                  1.0
Based on these hypothetical data:
a. demonstrate that there is a cohort effect for breastfeeding, (2 pts)
b. briefly explain (1-2 sentences referring to specific numbers or calculations for
these tables) how failure to adjust for age interferes with finding a protective effect of
breastfeeding. (2 pts)
21. An epidemiology graduate student finds evidence in the literature that childhood
sunlight exposure may affect adult breast cancer risk. To explore this hypothesis, she
obtains from the authors the place of birth for all of the subjects in the present
study and constructs a sunlight exposure variable ("high" or "low") based on
geologic and meteorologic data for the years of the subjects' childhood. Her data
show that 56.2% of the 219 premenopausal women who were not breastfed as infants
grew up with "high" sunlight exposure. Based on this fact and the partially
completed tables below, (a) calculate the odds ratio of breast cancer with respect to
breastmilk exposure within each of the two sunlight exposure strata, and (b) briefly
describe the relationship of the sunlight exposure variable to the association between
breast cancer and breastmilk exposure (i.e., in relation to confounding and effect
modification). (4 pts)
High sunlight Low sunlight
Cases Controls Total Cases Controls Total
Breastfed 24 67
Bottlefed 81 36
Total 191 284
22. Use the data from Table 2 (Distribution of Characteristics of Postmenopausal Cases
and Controls) to draw separate 2 x 2 tables for women who have had 0 pregnancies,
1-2 pregnancies, and >=3 pregnancies. (5 pts)
a. calculate odds ratios for each of these three categories.
b. Assuming no effects of confounding, interpret your findings in part (a).
23. A hypothetical cross-sectional ancillary study to this report was conducted. In that
study a survey of breast cancer annual incidence rates in geographically distinct
areas was completed: Region A, in the upper Midwest, where breast cancer mortality is
high, and Region B, the Southeast, where mortality from breast cancer is low. The
following data were obtained.

                                 Region A                     Region B
                         No. of  Population   Rate/   No. of  Population   Rate/
Education         Age     cases               1,000    cases               1,000
< High school     40-50      10       7,000     1.4       10      15,000     0.7
                  51-60      15      10,000     1.5       20       5,000     4.0
                  61-65      30       3,000      10      600      55,000    10.9
                  Total      55      20,000              630      75,000
>= High school    40-50       5       1,000     5.0        6       2,000     3.0
                  51-60       5       2,000     2.5       10      15,000     0.7
                  61-65       4         500     8.0        4       1,000     4.0
                  Total      14       3,500               20      18,000
Grand total                  69      23,500              650      93,000
Crude rate                                      2.9

Compute the following (for adjusted rates use the direct method and the total
population as a standard):
a. the overall Region B crude event rate. (1 pt)
b. Age and educational achievement adjusted rate for Region A: (2 pts)
c. Age and educational achievement adjusted rate for Region B: (2 pts)
d. Compare the overall crude rates with the age and educational achievement
adjusted rates. Briefly explain your findings. (2 pts)
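One way to carry out the direct method described in question 23 is sketched below. It assumes that "the total population" means Regions A and B combined within each of the six age-by-education strata; the dictionary layout and the function names are illustrative only.

```python
# (education, age) -> (cases, population), from the table in question 23
region_a = {
    ("< HS", "40-50"): (10, 7_000),
    ("< HS", "51-60"): (15, 10_000),
    ("< HS", "61-65"): (30, 3_000),
    (">= HS", "40-50"): (5, 1_000),
    (">= HS", "51-60"): (5, 2_000),
    (">= HS", "61-65"): (4, 500),
}
region_b = {
    ("< HS", "40-50"): (10, 15_000),
    ("< HS", "51-60"): (20, 5_000),
    ("< HS", "61-65"): (600, 55_000),
    (">= HS", "40-50"): (6, 2_000),
    (">= HS", "51-60"): (10, 15_000),
    (">= HS", "61-65"): (4, 1_000),
}

# Assumed standard: Regions A and B combined, stratum by stratum
standard = {k: region_a[k][1] + region_b[k][1] for k in region_a}


def crude_rate(region):
    """Total cases / total population, per 1,000."""
    cases = sum(c for c, _ in region.values())
    pop = sum(p for _, p in region.values())
    return 1_000 * cases / pop


def direct_adjusted_rate(region, std):
    """Apply each stratum-specific rate to the standard population (per 1,000)."""
    expected = sum(region[k][0] / region[k][1] * std[k] for k in std)
    return 1_000 * expected / sum(std.values())


for name, region in [("A", region_a), ("B", region_b)]:
    print(f"Region {name}: crude {crude_rate(region):.1f}, "
          f"adjusted {direct_adjusted_rate(region, standard):.1f} per 1,000")
# Region A: crude 2.9, adjusted 6.0 per 1,000
# Region B: crude 7.0, adjusted 6.3 per 1,000
```

Under that assumption the adjusted rates are close to each other even though the crude rates differ greatly, because Region B's population is concentrated in the oldest, highest-rate stratum.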
24. Write a brief statement for or against a causal relationship between breastfeeding in
infancy and risk of breast cancer as an adult. Comment specifically on at least two of
Bradford Hill's criteria for causal inference. Include in your comments data or
statements from the article. (5 pts)
25. Assuming that this relationship is causal, why might a similar study, 50 years from
now, fail to find as strong a relationship? (2 pts)
Format 8/4/2000 vs


Answer Guide
1. C. Analytic study of data collected to investigate the
hypothesized relationship
2. a. A finding from a migrant study or studies: "Studies of
migrants provide some evidence; for example, migrants to the
United States from Japan experienced a rate of breast cancer
intermediate between the lower rate in Japan and the higher
rate in the U.S."
b. A finding from descriptive epidemiology: Many possibilities,
including either of these sentences:
"This finding implies a possible connection between the
trend toward increasing bottlefeeding in the postwar
period and current trends toward increasing incidence of
breast cancer. Furthermore, it offers a partial
explanation of the international variation in breast
cancer rates, with rates considerably lower in less
developed than in developed nations."
c. An association from an ecologic study: "Micozzi found mean
adult height and breast cancer incidence in 30 countries to be
highly correlated (r=0.8)."
3. B. Age is causally related to breast cancer risk and infant feeding
practices have changed over time.
4. D. Common exposure, rare endemic disease.
5. B. Secular changes in infant feeding practices result in an association
between age and exposure to breastmilk.
6. A. selecting from a pool of prevalent cases would make separation of
factors associated with risk and those with survival more difficult.
7. a. Primary -- Primary breast cancer is a tumor that originates in the
breast, rather than a tumor in the breast that is the result of
metastasis from a tumor that originated in another location or
tissue. In general, tumors originating in the same organ and
tissue are more likely to have similar etiologies than are
tumors that originate in different organs.
b. Histologically-confirmed -- histological confirmation
refers to the verification of the diagnosis (of breast cancer)
through laboratory examination of tumor tissue. Microscopic
examination of tumor cells establishes the existence and type
of tumor with a greater degree of certainty than does a
clinical diagnosis alone. Counting only histologically confirmed
cases reduces the potential for false-positive breast cancer
diagnoses and the misclassification bias they would cause.
8. B. The random selection of controls from the community provides a
better estimate of breastmilk exposure among the source population.
9. A. Kappa coefficient
10. Table:
Biomarker validation of women's self-report of having been breastfed

                              Breastfeeding biomarker found
                                  Yes      No     Total
                             --------------------------------
Self-report   Breastfed            70      26        96
              Not breastfed        80      28       108
                             --------------------------------
              Total               150      54       204
Derivation: 204 cases tested (overall total), 73.5% (=150) have
the marker (so 54=204-150 do not), 80 are false negatives by
self-report (so 80 = "yes" biomarker, "no" self-report), and the
remaining cells and marginals are obtained from these numbers.
a. Sensitivity = 70 / 150 = 47% (Answers the question, "Of women
who truly were breastfed, as demonstrated by the presence of the
biomarker for having been breastfed, what % were correctly
classified by self-report?")
b. Specificity = 28 / 54 = 52% (Answers the question, "Of women
who were not breastfed, as demonstrated by the absence of the
biomarker, what % were correctly classified by self-report?")
c. Positive predictive value (PPV) = 70 / 96 = 73% (Answers the
question, "Of women classified, on the basis of their self-report,
as 'having been breastfed', what % were correctly classified?")
d. Negative predictive value (NPV) = 28 / 108 = 26% (Answers the
question, "Of women classified, on the basis of their self-report,
as 'not having been breastfed', what % were correctly classified?")
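The reconstruction of the 2x2 table and the four measures in answer 10 can be reproduced as follows. The row total of 96 women whose self-report says "breastfed" is taken from the guide's table, since it cannot be derived from the two stated facts alone; the variable names are illustrative.

```python
n_tested = 204
marker_pos = round(0.735 * n_tested)   # 149.94 -> 150 have the biomarker
marker_neg = n_tested - marker_pos     # 54 do not
fn = 80                                # biomarker present, self-report "not breastfed"
tp = marker_pos - fn                   # 70: biomarker present, self-report "breastfed"
reported_breastfed = 96                # row total taken from the guide's table
fp = reported_breastfed - tp           # 26
tn = marker_neg - fp                   # 28

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
for name, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {value:.0%}")  # 47%, 52%, 73%, 26%
```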
11. a. Table:
Adult breast cancer by having been breastfed as an infant,
among premenopausal women with education beyond high school

                   Case   Control   Total
                 ------------------------
Breastfed           61        93     154
Not breastfed       69        61     130
                 ------------------------
Total              130       154     284

OR = (61 x 61) / (93 x 69) = 0.58.
Interpretation: having been breastfed appears to be protective
against female adult breast cancer, with a reduction in risk of
approximately 40%.
b. Table:
Adult breast cancer by having been breastfed as an infant,
among premenopausal women with education beyond high school,
assuming that 20% of controls who reported having been
breastfed had in fact not been

                  Cases   Controls   Total
                 -------------------------
Breastfed           61        74      135
Not breastfed       69        80      149
                 -------------------------
Total              130       154      284

Derivation: 20% of the 93 controls who reported having been
breastfed had not been, so 20% of 93 (=18.6->19) are switched from
"Breastfed" to "Not breastfed", being added to the 61 who reported
not having been breastfed. The remaining 80% of 93 (=74.4->74)
remain in the upper row.
OR = (61 x 80) / (74 x 69) = 0.96 (about 1.0), i.e., essentially no association.
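A sketch of answers 11a and 11b: it recomputes the OR from the reported table, then moves 20% of the "breastfed" controls (rounded to 19, as in the derivation) into the "not breastfed" row. The function name `odds_ratio` is illustrative.

```python
def odds_ratio(exp_cases, exp_controls, unexp_cases, unexp_controls):
    """Cross-product ratio for a case-control 2x2 table."""
    return (exp_cases * unexp_controls) / (exp_controls * unexp_cases)


# 11a: premenopausal women with education beyond high school
bf_cases, bf_controls = 61, 93      # reported breastfed
nbf_cases, nbf_controls = 69, 61    # reported not breastfed
or_reported = odds_ratio(bf_cases, bf_controls, nbf_cases, nbf_controls)

# 11b: 20% of "breastfed" controls were not breastfed; move them (rounded) down a row
moved = round(0.20 * bf_controls)   # 18.6 -> 19
or_corrected = odds_ratio(bf_cases, bf_controls - moved,
                          nbf_cases, nbf_controls + moved)

print(f"reported OR = {or_reported:.2f}, corrected OR = {or_corrected:.2f}")
# reported OR = 0.58, corrected OR = 0.96
```

Because only the controls are misclassified, the error is differential, and here it happens to pull the OR up to about 1.0 (0.96 to two decimals).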
c. B. differential misclassification of exposure
12. TRUE or FALSE
a. False - matching controls to cases does not prevent the
matching variable (age) from being associated with the exposure
(having been breastfed), so the matching cannot prevent
confounding. (See also d. and e.)
b. True - The nurse telephoned hospitals on a frequent, regular
basis, to identify all breast cancer cases.
c. False - The difference in the proportions interviewed among
cases and among controls provides a great deal of potential for
selection bias, but if nonparticipation was not related to having
been breastfed then selection bias will not occur.
d. False - The matching caused cases and controls to have the same
age distribution, so it did "work"; matching would not be expected
to eliminate an association between age and the exposure, since
exposure status was not known when controls were being selected and
in any case would not have been used in the matching procedure.
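e. False - Frequency matching on age gave cases and controls the same
age distribution, so it prevented an association between age and breast
cancer from appearing in these data.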
f. False - The association between body mass index and breast
cancer can be assessed by estimating odds ratios from Table 2. To
avoid confounding by infant feeding history, we should preferably
assess the association separately in breastfed women and in women
who have not been breastfed (omitting the complexities from
considering body mass to be an intervening variable in the effect
of infant feeding history). To avoid being misled by a possible
"synergism" involving infant feeding and body mass, ideally we
would look in the "unexposed" group. However, although this study
focuses on breastfeeding, one can also consider "formula feeding"
as an exposure that might be "synergistic" with body mass. So we
can choose either exposure group (or both).
Here are the computations:
From Table 2:
                          Cases                      Controls
                 -------------------------   -------------------------
Body mass index  Breastfed   Not breastfed   Breastfed   Not breastfed
(kg/m sq)
16-22                   48              15          89              19
23-27                  103              26         125              16
>27                     90              17          91              16

To show the details, here is a table for estimating OR's for body mass index and breast
cancer:
                     Breastfed       Not breastfed           Total
Body mass         ---------------   ---------------   ---------------
index (kg/m sq)   Cases  Controls   Cases  Controls   Cases  Controls
16-22                48        89      15        19      63       108
23-27               103       125      26        16     129       141
>27                  90        91      17        16     107       107

and the resulting OR's are [e.g., (90 * 89) / (48 * 91) = 1.83]:
                      Breastfed   Not breastfed   Total
Body mass             ---------   -------------   -----
index (kg/m sq)
16-22 (ref. level)       1.0           1.0         1.0
23-27                    1.53          2.06        1.57
>27                      1.83          1.35        1.71

The OR's in the total column are shown for comparison with the
stratum-specific OR's. At each body mass index level the crude OR lies
between the OR's for breastfed and not breastfed women, so any
confounding by breastfeeding history is modest. The OR's rise with
body mass index among breastfed women (1.53, 1.83) but not among women
who were not breastfed (2.06, then 1.35), so there is no consistent
"dose-response" relationship.
g. True - Generally, an outbreak investigation begins
after the outbreak has begun and the investigation seeks to
determine what characteristics of cases might have been responsible
for their disease. If the cases happened to be part of an existing
cohort for which the requisite exposure information was already
available in some form, then a retrospective cohort study would be
another possibility. If cases are still occurring, a prospective
cohort study might be initiated, but the better an idea the
investigators have about which exposures to assess, the more they
should intervene to minimize the occurrence of additional cases.
h. False - for a factor to be considered a confounder, it must be
an independent risk factor for the outcome, but this requirement
does not pertain to effect modification. For example, genital
ulcers cannot cause HIV by themselves, but in conjunction with a
sex partner who is HIV infected, genital ulcers can increase
(modify) the risk of HIV infection.
13. B. Potential confounders are factors that are known or suspected risk
factors for breast cancer or its detection, or at least proxies for
such factors.
14. a. Breast cancer risk and no previous pregnancies
                      Cases   Controls   Total
                   -------------------------------
No pregnancies          50         38       88
>= 3 pregnancies       167        216      383
                   -------------------------------
Total                  217        254      471
OR = (50 x 216) / (38 x 167) = 1.7 (for zero vs. >= 3 pregnancies)
Interpretation: having never been pregnant was associated with an
increased breast cancer rate, with an apparent 70% greater rate
among nulligravidae (women who have never been pregnant).
Other choices of a reference level produce a similar result, e.g.,
1-2 pregnancies as the reference level:
OR = (50 x 102) / (38 x 82) = 1.6.
If both groups, 1-2 pregnancies and 3+ pregnancies, are combined
and used as the reference group, then:
OR = (50 x 318) / (38 x 249) = 1.7
b. Height above 165 centimeters and having been breastfed
Height              > 165 cm   < 160 cm   Total
                  -----------------------------------
Breastfed               148        183      331
Not breastfed            41         25       66
                  -----------------------------------
Total                   189        208      397
OR = (148 x 25) / (183 x 41) = 0.49.
Interpretation: Women who were breastfed were less likely
to be over 165 cm tall.
Other possible ORs:
> 165 vs. 160-165: OR = (148 x 43) / (213 x 41) = 0.73
> 165 vs. all others: OR = (148 x 68) / (396 x 41) = 0.62
c. Breast cancer and having been breastfed (crude)
                 Cases  Controls  Total
----------------------------------
Breastfed          241       305    546
Not breastfed       58        51    109
----------------------------------
Total              299       356    655
OR = (241 x 51) / (305 x 58) = 0.69
Interpretation: having been breastfed was associated with lower
risk of breast cancer.
15. D. The statement refers to the (relative) risk of breast cancer
in women who were versus were not breastfed, estimated using
the odds ratio.
16. a. Estimate the RR for not having been breastfed as 1/OR for having been breastfed: 1 / 0.69 = 1.45
ARP = (RR - 1) / RR = (1.45 - 1) / 1.45 = 0.45/1.45 = 0.31
Interpretation: Some 31% of breast cancer in women who were not
breastfed was attributable to their having not been breastfed.
b. If you know the formula (or can derive it from the diagram and the
"grand synthesis"):
PARP = P(E|D) x (RR - 1) / RR, and since breast cancer is rare, use the OR in place of the RR.
Premenopausal: [117 / (117 + 112)] x (1.47 - 1) / 1.47 = (0.51)(0.47) / 1.47 = 0.16
Postmenopausal: [58 / (58 + 241)] x (1.45 - 1) / 1.45 = (0.19)(0.45) / 1.45 = 0.06
Meaning: Some 16% of all premenopausal breast cancer and some 6% of all
postmenopausal breast cancer were attributable to not having been breastfed.
Or, reason as follows:
(1) The proportion of exposed (not breastfed) cases that are attributable to not having been
breastfed is:
ARP = (RR-1)/RR
Since breast cancer is rare, we can estimate it with
(OR-1)/OR = (1.47-1) / 1.47 = 0.3197 for premenopausal cases.
(2) However, this proportion applies only to cases who are exposed
(because ARP is "proportion of exposed cases . . ."). So estimate the
proportion of all cases who are exposed:
Pr(Exposed|Case) = 117 / (117+112) = 0.51 for premenopausal cases.
Multiplying (1) by (2): 0.51 x 0.3197 = 0.16, i.e., 16% for premenopausal cases.
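Both routes to the PARP amount to multiplying the proportion of cases who are exposed by the ARP. A minimal Python sketch, assuming the case counts and odds ratios used above, with not having been breastfed as the exposure and the OR standing in for the RR; `arp` and `parp_from_cases` are hypothetical helper names:

```python
def arp(rr):
    """Attributable risk proportion among the exposed: (RR - 1) / RR."""
    return (rr - 1) / rr


def parp_from_cases(exposed_cases, unexposed_cases, rr):
    """PARP = Pr(E | D) * (RR - 1) / RR, with Pr(E | D) taken from case counts."""
    p_exposed_given_case = exposed_cases / (exposed_cases + unexposed_cases)
    return p_exposed_given_case * arp(rr)


# Exposure = NOT having been breastfed; RR approximated by the OR (rare disease).
rr_post = 1 / 0.69            # part (a): invert the OR for having been breastfed
premenopausal = parp_from_cases(117, 112, 1.47)
postmenopausal = parp_from_cases(58, 241, 1.45)

print(f"RR (postmenopausal) = {rr_post:.2f}, ARP = {arp(1.45):.2f}")
print(f"PARP: premenopausal {premenopausal:.2f}, postmenopausal {postmenopausal:.2f}")
```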
c. The PARP for premenopausal breast cancer is expected to be
greater because of the secular decrease in breastfeeding during the
decades when these (younger) women were infants. Thus, the proportion
of cases exposed to not having been breastfed is substantially greater
among premenopausal breast cancer cases (0.51 vs. 0.19), and hence their
PARP is greater.
17. Logistic model coefficients for risk factor variables are natural
logarithms of odds ratios per one unit change in the variable.
So the coefficient was ln(0.70) = -0.3567
Assumptions:
a. True - The odds of breast cancer vary as the product of the odds
for age and the odds for education.
b. False - Only in a few special cases will the product of two odds
equal their sum (e.g., both odds equal zero or both odds equal two).
The logistic model is additive in the logit (logarithm of odds),
multiplicative in the odds.
c. False - One of the reasons for using mathematical modeling is
that the risk factors (exposures and potential confounders) ARE
associated (i.e., not independently distributed).
d. True - Breast cancer is a rare disease.
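The point behind (a) and (b), that a logistic model is additive on the logit scale and therefore multiplicative in the odds, can be checked numerically. A small Python sketch; the odds ratios of 1.5 for age and 0.8 for education are hypothetical values chosen for illustration, not results from the study:

```python
import math

# Coefficient for a variable whose odds ratio per one-unit change is 0.70.
beta = math.log(0.70)
print(f"beta = ln(0.70) = {beta:.4f}")

# Hypothetical coefficients (NOT from the study) for two binary terms.
beta_age = math.log(1.5)
beta_edu = math.log(0.8)

# The logit is additive, so exponentiating the sum multiplies the odds ratios.
joint_or = math.exp(beta_age + beta_edu)
print(f"joint OR = {joint_or:.2f}, product = {1.5 * 0.8:.2f}, sum = {1.5 + 0.8:.2f}")
```

The joint odds ratio equals the product (1.2), not the sum (2.3), of the two odds ratios.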
18. C. The observed relative risk would be biased toward the null.
19. Smaller sample sizes produce wider confidence intervals. Because each
stratum contains only part of the total sample, if the point estimates for
the crude and stratum-specific measures are about the same, then the
confidence intervals for the latter will be wider.
20.
            AGE < 60          AGE > 60          TOTAL
----------------------------------------------------------
          Breast  Bottle    Breast  Bottle    Breast  Bottle
          ------  ------    ------  ------    ------  ------
Cases         24      40       256     100       280     140
Controls      79      86       204      54       280     140
----------------------------------------------------------
OR           0.653             0.678             1.0
a. Control women in the older stratum are more likely to have been
breastfed than control women in the younger stratum: the odds of
having been breastfed are 0.9 (79/86) among younger control women and
3.8 (204/54) among control women aged > 60.
b. Age is a strong risk factor for breast cancer, so if breastfed
women were older than bottle-fed women, then a possible protective
effect of breastfeeding could have been offset in the crude analysis
by the greater risk associated with older age.
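The stratum-specific and crude odds ratios in the table can be recomputed in a few lines. A minimal Python sketch, assuming the cell counts exactly as printed; the tuple layout and the `odds_ratio` helper are illustrative choices:

```python
# Cells per stratum: (cases breastfed, cases bottle-fed,
#                     controls breastfed, controls bottle-fed), from the table.
strata = {
    "age < 60": (24, 40, 79, 86),
    "age > 60": (256, 100, 204, 54),
}
total_column = (280, 140, 280, 140)  # TOTAL column as printed


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """OR for breastfeeding (exposed) vs. bottle feeding (unexposed)."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


for name, cells in strata.items():
    control_odds = cells[2] / cells[3]   # odds of having been breastfed, controls
    print(f"{name}: OR = {odds_ratio(*cells):.3f}, control odds = {control_odds:.1f}")

print(f"crude (TOTAL column): OR = {odds_ratio(*total_column):.1f}")
```

Summing the two strata gives 283 breastfed controls rather than the 280 printed in the TOTAL column; the crude OR is then about 0.99, which leads to the same conclusion.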
21. An epidemiology graduate student finds evidence in the literature
that childhood sunlight exposure may affect adult breast cancer risk.
To explore this hypothesis, she obtains from the authors the place of
birth for all of the subjects in the present study and constructs a
sunlight exposure variable ("high" or "low") based on geologic and
meteorologic data for the years of the subject's childhood. Her data
show that 56.2% of the 219 premenopausal women who were NOT breastfed
as infants grew up with "high" sunlight exposure. Based on this fact
and the partially-completed tables below, (a) calculate the odds ratio
of breast cancer with respect to breastmilk exposure within each of the
two sunlight exposure strata, and (b) briefly describe the relationship
of the sunlight exposure variable to the association between breast
cancer and breastmilk exposure (i.e., in relation to confounding and
effect modification). (4 pts)
High Sunlight        Cases  Controls  Total
Breastfed Yes           44        24     68
Breastfed No            81       *42    123
Total                  125        66    191

Low Sunlight         Cases  Controls  Total
Breastfed Yes           67      *120    187
Breastfed No            36       *61     97
Total                  103       181    284
* crude OR from Table 1 or Table 3 = 0.68

High sunlight OR = (44x42)/(24x81) = 0.95
Low sunlight OR = (67x61)/(120x36) = 0.95.
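Sunlight is a confounder of the apparent protective effect of having been
breastfed as an infant: both stratum-specific ORs (0.95) are close to 1.0
and differ from the crude OR (0.68). It is not an effect modifier, because
the two stratum-specific ORs are the same.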
22. Use the data from Table 2 (Distribution of Characteristics of
Postmenopausal Cases and Controls) to draw separate 2 x 2 tables for
women who have had: a. 0 pregnancies, b. 1-2 pregnancies, c. >=3
pregnancies. Be sure to include appropriate labels. (5 pts)
          0 pregnancies     1-2 pregnancies    >=3 pregnancies
          Cases  Controls   Cases  Controls    Cases  Controls
Breast       34        35      71        90      136       180
Bottle       16         3      11        12       31        36
Total        50        38      82       102      167       216
a) Calculate odds ratios for each of these three categories.
0 pregnancies: OR = (34 x 3) / (16 x 35) = 0.18
1-2 pregnancies: OR = (71 x 12) / (11 x 90) = 0.86
>=3 pregnancies: OR = (136 x 36) / (31 x 180) = 0.88
b) Assuming no effects of confounding, interpret your findings in
part (a).
There is effect modification. The magnitude of the protective
effect of having been breast-fed on development of breast cancer
is dependent on pregnancy history. Having been breast-fed is a
stronger protective factor for those women who never had a pregnancy
(OR = 0.18, vs. 0.86 and 0.88 for women with 1-2 and >=3 pregnancies).
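Computing the three stratum-specific odds ratios alongside the crude one makes the effect modification visible. A minimal Python sketch, assuming the 2 x 2 counts in the table for part 22; the helper name `odds_ratio` is illustrative:

```python
# Cells per stratum: (cases breastfed, cases bottle-fed,
#                     controls breastfed, controls bottle-fed), from the table.
strata = {
    "0 pregnancies": (34, 16, 35, 3),
    "1-2 pregnancies": (71, 11, 90, 12),
    ">=3 pregnancies": (136, 31, 180, 36),
}


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """OR for having been breastfed (exposed) vs. bottle-fed (unexposed)."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}
for name, value in stratum_ors.items():
    print(f"{name}: OR = {value:.2f}")

# Collapsing over pregnancy history gives back the crude table.
crude_cells = tuple(sum(cells[i] for cells in strata.values()) for i in range(4))
crude_or = odds_ratio(*crude_cells)
print(f"crude: cells = {crude_cells}, OR = {crude_or:.2f}")
```

The collapsed cells (241, 58, 305, 51) match the crude table in answer 14c, and the crude OR of 0.69 hides the much stronger association among women with no pregnancies.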
23. A hypothetical cross-sectional ancillary study to this report was
conducted. In that study, a survey of annual breast cancer incidence
rates was completed in two geographically distinct areas: Region A in the
upper Midwest, where breast cancer mortality is high, and Region B in the
Southeast, where mortality from breast cancer is low. The following
data were obtained.
                         Region A                       Region B
              Cases  Population  Rate/1000    Cases  Population  Rate/1000
< High School Education
Age
40-50            10       7,000        1.4       10      15,000        0.7
51-60            15      10,000        1.5       20       5,000        4.0
61-65            30       3,000       10.0      600      55,000       10.9
Total            55      20,000                 630      75,000
High School Education
Age
40-50             5       1,000        5.0        6       2,000        3.0
51-60             5       2,000        2.5       10      15,000        0.7
61-65             4         500        8.0        4       1,000        4.0
Total            14       3,500                  20      18,000
Grand Total      69      23,500                 650      93,000
Crude                                  2.9
a. Compute the overall Region B crude event rate: (1 pt) = 7.0/1000
Using the total population as a standard, compute the following by the
direct method of adjustment:
b. Age and educational achievement adjusted rate for Region A (2 pts)
= 6.0/1000
c. Age and educational achievement adjusted rate for Region B (2 pts)
= 6.3/1000
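d. Comparison of the overall crude rates with the age and educational
achievement adjusted rates.
Briefly explain your findings. (2 pts): Most of the difference
between the crude rates of the two regions (2.9 vs. 7.0 per 1000) is due
to their different distributions of age and educational achievement;
after adjustment the rates are similar (6.0 vs. 6.3 per 1000). Region B's
crude rate is high mainly because most of its population (55,000 of
93,000) is in the high-rate stratum of women aged 61-65 with less than a
high school education.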
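Direct adjustment applies each region's stratum-specific rates to a common standard population. A minimal Python sketch, assuming the counts in the table above and, as the question specifies, the combined population of both regions (116,500) as the standard; `crude_rate` and `direct_adjusted_rate` are illustrative names:

```python
# (education, age) -> (cases, population), from the table above.
region_a = {
    ("<HS", "40-50"): (10, 7_000),
    ("<HS", "51-60"): (15, 10_000),
    ("<HS", "61-65"): (30, 3_000),
    ("HS", "40-50"): (5, 1_000),
    ("HS", "51-60"): (5, 2_000),
    ("HS", "61-65"): (4, 500),
}
region_b = {
    ("<HS", "40-50"): (10, 15_000),
    ("<HS", "51-60"): (20, 5_000),
    ("<HS", "61-65"): (600, 55_000),
    ("HS", "40-50"): (6, 2_000),
    ("HS", "51-60"): (10, 15_000),
    ("HS", "61-65"): (4, 1_000),
}


def crude_rate(region):
    """Total cases / total population, per 1000."""
    cases = sum(c for c, _ in region.values())
    pop = sum(n for _, n in region.values())
    return 1000 * cases / pop


def direct_adjusted_rate(region, standard):
    """Apply each stratum-specific rate to the standard population, per 1000."""
    expected = sum(standard[s] * cases / pop for s, (cases, pop) in region.items())
    return 1000 * expected / sum(standard.values())


# Standard = combined population of both regions, stratum by stratum.
standard = {s: region_a[s][1] + region_b[s][1] for s in region_a}

for name, region in [("A", region_a), ("B", region_b)]:
    print(f"Region {name}: crude {crude_rate(region):.1f}, "
          f"adjusted {direct_adjusted_rate(region, standard):.1f} per 1000")
```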
24. Causal relationship - Comment specifically on at least two of Bradford
Hill's criteria for causal inference. Include in your comments data or
statements from the article. (5 pts)
25. Assuming that this relationship is causal, why might a similar study,
50 years from now, fail to find as strong a relationship? (2 pts)
Infant formula has changed (e.g., less fat) and overfeeding has been reduced,
reflecting recent trends, so bottle-fed infants may differ less from breastfed infants.
_____________________________________________
Schoenbach, \ epid168 \ exams 1997 Final exam - answer guide;

12/10/1998, 12/12/1998

Midterm Exam, Fall 1996

EPID 168
Most of the questions in this examination are based on the article:
Garry VM, Schreinemachers D, Harkins ME, Griffith J. Pesticide appliers,
biocides, and birth defects in rural Minnesota. Environ Health Perspect
1996;104:394-399.
A copy of this article was provided to you before this examination and can be
used in answering the following questions.
1. Briefly state the primary study question of this report. Identify the
main exposure and outcome of interest. (3 pts)
2. Briefly explain the difference between disease classification based on
manifestational criteria and disease classification based on causal
criteria. What is the logic for analyzing the data in relation to
categories of anomalies grouped by organ system? (4 pts)
___________________________________________________________
3. As discussed in class, epidemiologic studies often have both
descriptive and analytic characteristics. State one way in which this
study is descriptive and one way in which it is analytic. (4 pts)
4. The reporting of birth defects was provided in accord with state
statutes, and grouping of birth defect categories followed the
National Center for Health Statistics guidelines (page 394, second
paragraph - methods). This reporting of birth defects is an example
of which of the following types of data collection methods? Choose
one best answer. (4 pts)
A. Active surveillance
B. Ongoing cross-sectional survey
C. Passive surveillance
D. Follow-up study of dynamic population
5. This study determined exposure and outcomes using data from "a list of
all members of the agricultural community who were certified to apply
restricted-use pesticides in 1991" (p. 394-methods) and from "all in-
wedlock live births recorded in the state for the years 1989 through
1992" (p. 394-methods). Briefly assess the strength of these data
sources in establishing the temporal sequence of pesticide exposure
and birth defects and provide support for your assessment. (4 pts)
6. For each of the following epidemiologic measures, indicate whether it
is a rate, a proportion, or a ratio that is neither a rate nor a
proportion, or none of these. Circle the best answer. (4 pts)
A. Population attributable risk (PAR) rate proportion ratio neither
B. Incidence density (ID) rate proportion ratio neither
C. Prevalence rate proportion ratio neither
D. Relative risk rate proportion ratio neither
7. The use of the term "rate" is not an infallible guide to the specific
epidemiologic measure being presented. Which one of the following
epidemiologic measures best characterizes the measure that the authors
refer to as the "rate of anomalies per 1000 live births" (Table 2 -
footnote)? Choose one best answer. (4 pts)
A. cumulative incidence (CI)
B. incidence density (ID)
C. prevalence
D. attributable risk proportion
8. The authors indicate that Table 1 supports their statement
"pesticide appliers had significantly more children with an anomaly
than did nonappliers" (p. 395, results, first paragraph). This
statement is readily understood but not literally correct. Which one
of the following states the finding more precisely? Choose one best
answer. (4 pts)
A. pesticide appliers had 1.37 times more births with anomalies than
did the general population.
B. pesticide appliers had more children with birth anomalies than did
the general population.
C. pesticide appliers had a greater proportion of births with
anomalies as compared to the general population.
D. Pesticide appliers accounted for more births with anomalies than
did the general population.
9. Table 1 presents both crude and age-adjusted odds ratios. In the
table, the age adjusted odds ratio for gastrointestinal anomalies is
slightly larger than the crude estimate, as is the case for most of
the odds ratios presented. If the difference between the crude and
age-adjusted odds ratios had been large, explain in general terms what
this would mean regarding the respective ages of the pesticide
appliers and the general population. Assume the maternal age
structure of the combined population was used as the standard. (3 pts)
___________________________________________________________________
10. Using data in Table 1:
a. Compute an estimate of the potential impact of pesticides on birth
anomalies (in wedlock, all types together) among births to fathers who are
certified pesticide appliers. State the assumption required to
interpret this estimate. (4 pts)
b. Compute an estimate of the potential impact of pesticides on birth
anomalies (in wedlock, all types together) in the Minnesota
population as a whole. (3 pts)
11. Using the data presented in Table 1, recalculate the crude odds ratio
for all births with anomalies assuming that all musculoskeletal birth
anomalies occurring among those with maternal age greater than 30 and
the "other" anomalies among maternal age > 35 were later found to
actually have occurred among persons incorrectly classified as
appliers. Explain what implications this new calculation would have
on the conclusions of the study. (3 pts)
___________________________________________________________________
12. It is possible that the pesticides examined in this study might have
reduced fecundity or increased the proportion of conceptions not
resulting in live births. Assume that both of these effects (lower
fecundity; more spontaneous abortions and stillbirths) have in
fact occurred in the pesticide applier population studied here, so
that the number of live births to pesticide applier fathers is smaller
than it would have been in the absence of pesticide exposure. Which of
the following statements is (are) TRUE and which is (are) FALSE? (2
pts each)
TRUE FALSE
____ ____ A. Since all births would be affected equally, effects on
fecundity and spontaneous abortion WOULD NOT have influenced
the size of the odds ratio presented in this study. [This
question is problematic.]
____ ____ B. If pesticides were equally likely to cause fetal loss and birth
anomalies, then the odds ratios would strongly understate the
harmful effects of pesticides.
13. Table 4 shows the frequency per 1000 births of major anomalies for the
general population by region. Which of the following best describes
the study design from which these data were obtained? (4 pts)
A. ecologic study
B. prospective cohort study
C. retrospective cohort study
D. region-specific case control study
14. The authors begin their discussion section by stating that this report
"is an initial step in the evaluation of the possible relationships
between the frequency of birth anomalies and pesticide use". They
conclude, however, by saying that these data "signify a clear-cut need
for comprehensive examination of the health issues involved". This
latter statement seems to indicate that the authors suspect a causal
relationship. Identify and describe three criteria for causal
inference for which at least some information is present in the
article. Give specific examples from the article to support your
selection. (9 pts)
___________________________________________________________________
15. Suppose that after this publication came out, another study was
conducted in Illinois to investigate the hypothesis that birth defects
occurred more often in Illinois as compared to Minnesota. However,
in this new study the authors thought that the type of water consumed
could be related to birth defects. They wanted to adjust
(standardize) the rates of defects in the two states for water type.
Data from the two studies are compared below.
Births by state and water type
                     Minnesota Pesticide Appliers     Illinois Pesticide Appliers
                     Normal  With anomalies           Normal  With anomalies
Water Type              (#)             (#)  rate*       (#)             (#)  rate*
Well water only        3379              93   26.8       100               2  ____
City water only         874              27   30.0       200               6  ____
Bottled water only      206               5   23.7      7293             145  ____
Total                  4456             125   28.0      7593             153  ____
* per 1000 live births
a. calculate the crude rate and the water-type specific rates for
Illinois. Briefly describe how these two states compare in crude
rates of birth anomalies. (4 pts)
b. Using the combined number of live births as a standard, calculate
a standardized rate (standardized for water type) for each of the
states. Briefly describe how these standardized rates compare
with each other and reasons why they may or may not agree with the
crude rates. (6 pts)
16. Would an inference of causality based on the data in Table 4 be
subject to criticism based on the ecologic fallacy concept? Briefly
explain your answer. (2 pts)
17. Which of the following statements about the present study are (is)
TRUE and which are (is) FALSE. Indicate TRUE or FALSE for each
statement. (2 pts each)
TRUE FALSE
____ ____ A. Subjects used in the analyses for Table 1 of this study were
selected on the basis of their exposure status.
____ ____ B. Table 4 in this study supplied dose response evidence to
support an inference of a causal relationship between
pesticides and birth defects.
____ ____ C. The age-adjusted odds ratio for all birth anomalies of 1.41 is
considered a modest association.
____ ____ D. Since birth defects of these types are rare in the general
population, a cohort study could be designed to efficiently
examine further the relationship of pesticides and birth
anomalies.
____ ____ E. Exposure status in this study was randomized, resulting in an
equal distribution of known and unknown confounding variables
between pesticide appliers and the general population.
____ ____ F. A correlation coefficient is a measure of association but is
not useful in assessing the dichotomous outcomes measured in
this study.
____ ____ G. Table 1 used stratified analyses to adjust for a confounding
effect of maternal age on the association between
musculoskeletal/integumental anomalies and pesticide exposure.
[question #18 has been removed, 10/7/97]
19. Succinctly evaluate whether or not, on the basis of the information in
the article (including information that the authors cite to other
work), further measures are warranted now to prevent birth defects
caused by chlorophenoxy herbicides. (5 pts)
1/22/97, 10/7/97 - wr:vs \ mepid168 \ exams 1996 Midterm exam

Midterm Exam, Fall 1996

Answer Key - REVISED
Note: this answer guide is especially detailed in order to provide thorough
explanations of the many concepts that the exam touched on (including a few it
touched on unintentionally!).
1. The primary study question for this investigation concerns the
relationship, suggested by previous studies, between exposure to
pesticides and risk of birth anomalies in offspring. The main exposure
is pesticides (assessed by the surrogate measure of being licensed to
apply certain pesticides). The main outcome is birth anomalies in
offspring, as recorded in birth records.
2. Classification of disease using manifestational criteria means grouping
disorders on the basis of their having similar observable
characteristics, e.g., symptoms, signs, behavior, laboratory findings,
onset, course, prognosis, response to treatment. Classification using
causal criteria means grouping disorders on the basis of their having the
same primary etiologic agent, which, of course, must have been previously
identified. The logic for analyzing the data in terms of organ systems
(a manifestational criterion) is that anomalies occurring in the same
organ system may be more likely to have the same (or closely related)
etiology and therefore should exhibit stronger associations with the
relevant exposure than would the more general category of all birth
anomalies.
3. The presentation of data concerning the occurrence of birth defects with
regard to place (crop region) and time (seasons) is basic descriptive
epidemiology. The fact that the study was designed with a view to
examining specific relationships of interest, which were then assessed
with measures of association and statistical tests, derives from an
analytic perspective.
4. C. Passive surveillance
5. This study cannot really establish the temporal sequence of pesticide
exposure and birth defects because a) about half of the births (those in
1989 and 1990) occurred before the date of the pesticide certification list
(1991); and b) the time of actual exposure cannot be determined, since
exposure is measured so indirectly and without the ability to establish
when it occurred.
6. A. Any answer can be defended - the population attributable risk (PAR) is
equal to the attributable risk multiplied by exposure prevalence or,
equivalently, the crude incidence minus the incidence in unexposed
persons. When incidence is measured as a rate (i.e., ID), then the PAR
is the difference of two rates. When incidence is measured as a
proportion (i.e., CI), then PAR is the difference of two proportions and
therefore cannot exceed 1.0. The resulting value is typically expressed
as a rate or a proportion. So this question is ambiguous -- apologies!
B. Rate - by the definition of ID
C. Proportion - by the definition of prevalence
D. Ratio - relative risk is a ratio of independently-derived risks (or
rates, if "relative risk" is interpreted as applying to the concept,
rather than specifically to the risk ratio).
7. C. prevalence - Although a birth with an anomaly is an "event", there is
no way to establish the population at risk (denominator) for these
events. For example, would the denominator population be couples, fecund
couples, fecund couples trying to conceive, embryos, recognized
pregnancies? Birth anomalies do not arise out of "live births", since
the anomalies already exist in the fetus. Therefore the "rate of
anomalies per 1000 live births" is simply the proportion of live births
in which a birth defect is present.
8. C. Pesticide appliers had a greater proportion of births with anomalies
as compared to the general population.
9. Assuming that prevalence of birth anomalies increases with increasing
maternal age, an increase in the odds ratio due to age-adjustment
indicates that the maternal age distribution in the general population is
shifted toward older ages relative to that distribution in pesticide
applier spouses. The basis for this conclusion is the following. Birth
defect prevalence was greater for pesticide applier couples. If some of
that excess were due to greater age among pesticide applier mothers, then
age-adjustment would diminish the excess, thereby decreasing the odds
ratios. Since instead, age-adjustment increased the odds ratios, then
the older ages of general population mothers must have offset some of the
excess risk associated with pesticide exposure.
10A. Since the question does not specify absolute or relative impact, either
attributable risk (AR) or attributable risk proportion (ARP) is correct
(actually, attributable prevalence, but the term attributable risk is
typically applied to rates and prevalences as well as risks).
AR = P1 - P0 = [125 / (125 + 4456)] - [3666 / (3666 + 179,265)]
= 0.02728 - 0.02004 = 0.0072466 = 0.0072, or
7.2 per 1000 total live births
Meaning: 7.2 births with anomalies per 1000 live births fathered by
pesticide appliers are attributable to pesticide exposure.
Attributable Risk proportion (ARP) = (RR-1) / RR (using OR for RR)
= (OR - 1) / OR = (1.37 - 1) / 1.37 = 0.270 = 27%
or
ARP = AR / P1 = (0.027283 - 0.02004)/0.027283 = 0.26548 = 27%
Meaning: 27% of the prevalence of births with anomalies among all
live births fathered by pesticide appliers is attributable to
pesticide exposure.
To attribute cases to exposure requires the assumption of a causal
relationship between pesticide exposure and birth defects.
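The AR and the two ARP calculations above can be reproduced from the four Table 1 counts. A minimal Python sketch, assuming those counts and the reported crude OR of 1.37; the variable names are illustrative:

```python
# In-wedlock live births, from Table 1 as used in the calculation above.
exposed_anomalies, exposed_normal = 125, 4456            # fathers certified appliers
unexposed_anomalies, unexposed_normal = 3666, 179_265    # general population
reported_or = 1.37                                       # crude OR as reported

p1 = exposed_anomalies / (exposed_anomalies + exposed_normal)
p0 = unexposed_anomalies / (unexposed_anomalies + unexposed_normal)

ar = p1 - p0                                    # attributable "risk" (prevalence difference)
arp_from_prevalences = ar / p1                  # ARP = AR / P1
arp_from_or = (reported_or - 1) / reported_or   # ARP = (OR - 1) / OR

print(f"P1 = {p1:.5f}, P0 = {p0:.5f}")
print(f"AR = {1000 * ar:.1f} per 1000 live births")
print(f"ARP = {arp_from_prevalences:.2f} (prevalences), {arp_from_or:.2f} (OR)")
```

The two ARP estimates differ only in the third decimal place because the OR of 1.37 is rounded.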
10B. Again, either population attributable risk (PAR) or population
attributable risk proportion (PARP) provides an answer.
Prevalence of paternal exposure among all live births is:
Pe = 4456 / (4456 + 179,265) = 0.02425 = 2.4% of live births
So PAR = AR x Pe = 0.0072466 x 0.02425 = 0.000176
= 1.8 per 10,000 live births.
or PCrude - P0 = 0.020217 - 0.02004 = 0.000177 = 1.8 / 10,000
Meaning: 1.8 births with anomalies per 10,000 live births to the general
(married) population are attributable to pesticide exposure in pesticide
appliers.
PARP = [Pe (RR-1)] / [1 + Pe (RR-1)] (using OR for RR)
= [(0.02425) (1.37-1)] / [1 + 0.02425 (1.37-1)] = 0.0089
= 1% (approximately)
Or, using the case-control formulation,
Pe|d = 125 / ( 125 + 3666 ) = .032973
PARP = Pe|d (OR-1) / OR = (.032973) (1.37-1) / 1.37 = 0.008905
= 1% (approximately)
Or, PARP = Pe|d x ARP = 0.032973 x 0.26548 = 0.0088, using the ARP
from part a.
Meaning: Approximately 1% of all Minnesota live births with anomalies
are attributable to pesticide exposure in pesticide appliers.
(Note: small differences among the results from the various methods are
primarily due to the fact that the OR of 1.37 has been rounded to fewer
significant digits than are the prevalences computed above.)
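The PAR and PARP routes above can be run side by side to confirm that they agree up to rounding. A minimal Python sketch, assuming the Table 1 counts, the reported OR of 1.37, and the answer key's definition of exposure prevalence (normal live births only); the population-based PARP formula used here is commonly called Levin's formula:

```python
exposed_anomalies, exposed_normal = 125, 4456
unexposed_anomalies, unexposed_normal = 3666, 179_265
reported_or = 1.37

p1 = exposed_anomalies / (exposed_anomalies + exposed_normal)
p0 = unexposed_anomalies / (unexposed_anomalies + unexposed_normal)
p_crude = (exposed_anomalies + unexposed_anomalies) / (
    exposed_anomalies + exposed_normal + unexposed_anomalies + unexposed_normal)

# Exposure prevalence as defined in the text (normal live births only).
pe = exposed_normal / (exposed_normal + unexposed_normal)

par_from_ar = (p1 - p0) * pe        # PAR = AR x Pe
par_from_crude = p_crude - p0       # PAR = P(crude) - P0

# PARP from exposure prevalence (Levin's formula) and from exposed cases.
parp_levin = pe * (reported_or - 1) / (1 + pe * (reported_or - 1))
pe_given_d = exposed_anomalies / (exposed_anomalies + unexposed_anomalies)
parp_cases = pe_given_d * (reported_or - 1) / reported_or

print(f"PAR = {10_000 * par_from_ar:.1f} or {10_000 * par_from_crude:.1f} per 10,000")
print(f"PARP = {parp_levin:.4f} (Levin) or {parp_cases:.4f} (cases)")
```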
11. OR = 1.04 (Derivation:
"Corrected" cases in exposed = 127 - (19 + 12) = 96
Proportion in exposed = 96 / (4456 + 96) = 0.0211
"Corrected" cases in control = 3666 + 31 = 3697;
Proportion in control = 3697 / (3697 + 179,265) = 0.0202
0.0211 / 0.0202 = 1.04 = new odds ratio; because anomalies are rare, this
ratio of proportions approximates the odds ratio.)
Thus, incorrectly classifying those anomalies into the exposed group
overestimates the strength of association; with these anomalies correctly
classified, there is essentially no association (OR close to 1.0).
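The reclassification can be redone exactly, computing both the ratio of proportions and the cross-product odds ratio. A minimal Python sketch, assuming the counts in the derivation above (including its starting count of 127 exposed anomalies); the variable names are illustrative:

```python
# Counts from the derivation above.
moved = 19 + 12                      # anomalies reassigned away from appliers
exposed_anomalies = 127 - moved      # 96
exposed_normal = 4456
unexposed_anomalies = 3666 + moved   # 3697
unexposed_normal = 179_265

p_exposed = exposed_anomalies / (exposed_anomalies + exposed_normal)
p_unexposed = unexposed_anomalies / (unexposed_anomalies + unexposed_normal)
prevalence_ratio = p_exposed / p_unexposed
odds_ratio = (exposed_anomalies * unexposed_normal) / (exposed_normal * unexposed_anomalies)

print(f"prevalence ratio = {prevalence_ratio:.2f}, odds ratio = {odds_ratio:.2f}")
```

Because anomalies are rare, both measures round to 1.04.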
12. A. False - there is no basis for assuming that all births would be
affected equally.
B. True - The total proportion of harm, including fetal loss, is:
(lost fetuses + birth anomalies) / (lost fetuses + birth anomalies + normal live births)
This proportion exceeds the prevalence of birth anomalies among live
births, potentially by a substantial amount.
13. A. ecologic study - exposure is assessed at the community (region) level,
and exposure of persons is inferred based on residence in a geographic
region where pesticides are heavily used.
14. 1) Strength of association, estimated using odds ratios, is modest, and
therefore does not provide strong evidence on which to infer causal
relationships.
2) Biological plausibility - various laboratory studies and a clinical
epidemiologic study show that active ingredients and contaminants in
pesticides can be teratogenic and/or spermatotoxic. Also, several
compounds in the pesticides are endocrine disrupters.
3) Consistency (the authors cite epidemiologic studies [in Iowa,
Nebraska, Colorado] that have found similar relationships).
15. This question underwent a revision to simplify it, but unfortunately some
parts of the previous version remained. The columns labelled
"# live births" should have included the qualifier "Normal", and the
rates for Minnesota needed to be re-computed accordingly. Due to this
problem, two alternate solutions are completely acceptable, one in which
the denominators are the numbers in the "# live births" column and one in
which the denominators equal the sum of these numbers plus the numbers of
births with anomalies. In addition, full credit is given if the rates
for Minnesota were recomputed. Here is the version in which the stated
rates were used and the # of live births column was treated as if it
meant "Total live births":
Birth anomaly prevalences for Illinois, by water type:
Well water: 2/100 = 20.0 per 1000 live births
City water: 6/200 = 30.0 per 1000 live births
Bottled water: 145/7293 = 19.9 per 1000 live births
Overall (crude): 153/7593 = 20.2 per 1000 live births
Thus, the crude prevalence is higher in Minnesota (28.0 per 1000) than in
Illinois (20.2 per 1000).
Number of live births (both states combined)
--------------------------------------------
Well water        3479
City water        1074
Bottled water     7499
Total           12,052
Standardized prevalence for MN:
(3479 x 26.8 + 1074 x 30.0 + 7499 x 23.7) / 12,052 = 25.2 per 1,000
Standardized prevalence for IL:
(3479 x 20.0 + 1074 x 30.0 + 7499 x 19.9) / 12,052 = 20.8 per 1,000
The standardized prevalence for Minnesota also exceeds that for
Illinois, though by a smaller amount than the difference in the crude
prevalences (4.4 vs. 7.8 per 1000). The difference has been reduced because
the standardized prevalence for Minnesota gives much greater weight to
the prevalence for bottled water (23.7/1000), the water type of most
Illinois births, and less to the prevalence for well water (26.8/1000)
than did the crude prevalence.
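The same direct standardization can be scripted to check the arithmetic. A minimal Python sketch, assuming the answer key's convention of using the first Illinois column as the denominator and the Minnesota rates as printed; `standardized` is an illustrative helper name:

```python
# Water type -> (normal births, births with anomalies) for Illinois.
illinois = {"well": (100, 2), "city": (200, 6), "bottled": (7293, 145)}
minnesota_normal = {"well": 3379, "city": 874, "bottled": 206}
minnesota_rates = {"well": 26.8, "city": 30.0, "bottled": 23.7}  # per 1000, as printed

# Answer-key convention: the first column is treated as the denominator.
illinois_rates = {w: 1000 * a / n for w, (n, a) in illinois.items()}
illinois_crude = (1000 * sum(a for _, a in illinois.values())
                  / sum(n for n, _ in illinois.values()))

# Standard population: births by water type, both states combined.
standard = {w: minnesota_normal[w] + illinois[w][0] for w in illinois}
total = sum(standard.values())


def standardized(rates):
    """Directly standardized prevalence per 1000, weighting by the standard."""
    return sum(standard[w] * rates[w] for w in standard) / total


print({w: round(r, 1) for w, r in illinois_rates.items()}, round(illinois_crude, 1))
print(f"standardized MN = {standardized(minnesota_rates):.1f} per 1000")
print(f"standardized IL = {standardized(illinois_rates):.1f} per 1000")
```

Using the unrounded Illinois bottled-water rate (19.88 rather than 19.9) changes the standardized Illinois value only in the second decimal place.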
16. Yes - it is not clear from these data whether birth anomalies occurred
in people with or without exposure because exposure information was
based on group data.
17. A. False - subjects were selected from birth records for live births.
B. False
C. True
D. False
E. False
F. True - (however, a correlation coefficient indicates the extent of
association in the sense of two variables moving in tandem; it does
not indicate the strength of association in the epidemiologic sense
of how great a change occurs in the response variable for a change
of a given size in the exposure variable)
G. True
18. [Question removed, 10/7/97]
19. Points in favor of action at this time are the evidence that the
relationship is causal (biological plausibility, consistency between
results of ecologic [by crop-region] and individual-based [pesticide
applier] analyses, pattern of findings [season of conception], and
consistency across several epidemiologic studies) and the high
attributable risk percent (27%) among babies with birth anomalies born
to pesticide applier couples. In addition, the substantially
increased prevalences of birth anomalies among all live births in
county clusters with high use of chlorophenoxy herbicides/fungicides
(Table 4), consistent across the four regions, suggest that anomalies
due to pesticides (assuming that the relationship is causal) occur
throughout areas where these pesticides are used. Even though the
population attributable risk proportion is very small (about 1%) for
exposure due to being a pesticide applier, the proportion of all
Minnesota birth anomalies potentially attributable to residence in a
county cluster with high pesticide use is 27% [overall prevalence of
birth anomalies for all Minnesota in-wedlock births was 3791 / 183,721
= 20.63 per 1000 live births (Table 1), prevalence of birth anomalies
in low-pesticide county clusters ("unexposed") was 15 per 1000 (Table
4), so PARP = (PCrude - P0) / PCrude = (20.63 - 15) / 20.63 = 0.27].
The effects seem to be strongest for chlorophenoxy pesticides,
suggesting that at least this category should be restricted.
Moreover, there are powerful arguments for reducing pesticide use for
environmental reasons as well.
Points against taking action other than continuing research are that the
evidence is still not very strong (biological mechanisms not yet
elucidated, relationship is not highly specific, epidemiologic studies
limited and not entirely consistent, experimental evidence not
available), the potential impact on agriculture and therefore food
prices is considerable, and the costs to industry and commerce from
restrictions on a major product are substantial. Moreover, the
relative weakness of the odds ratios (below 2.0) indicates a
significant possibility that other factors could be responsible for
the increase in birth anomaly prevalence seen in association with
pesticide exposure, a possibility whose investigation requires better
data on exposure and other factors that may lead to birth anomalies.
Grading of this question is based on the clarity and support for your
evaluation and recommendation.
10/21/96, 10/7/97 - wr:eml/vs \ mepid168\ exams 1996 Midterm exam - answers rev.
University of North Carolina
Victor J. Schoenbach and Wayne D. Rosamond
Fall 1996 Final Exam (Tuesday 10 Dec 1996)
This examination is based on Per-Gunnar Persson, Anders Ahlbom, Goran
Hellers. Diet and inflammatory bowel disease: a case-control study.
Epidemiology 1992;3:47-52.
NOTE: For simplicity, ignore the requirement that this study was
restricted to those persons with a telephone number.
1. Which of the following best describes the primary objective of this
study? (Choose one best answer) (3 pts)
A. To test the hypothesis that persons with inflammatory bowel disease
are more likely to have been exposed to certain dietary factors than
those without inflammatory bowel disease.
B. To test the hypothesis that the risk of having inflammatory bowel
disease given that you have certain dietary exposures is greater than
the risk of not having inflammatory bowel disease.
C. To test the hypothesis that the increase in inflammatory bowel
disease in the population is attributed to certain dietary exposures.
D. To test the hypothesis that the average consumption of certain
dietary factors increases as the proportion of a group of people with
inflammatory bowel disease increases.
2. Designation as a case of ulcerative colitis was based on which of the
following classification models? (Choose one best answer) (3 pts)
A. Manifestational criteria
B. Causal criteria
C. Both manifestational and causal criteria
D. Neither
3. Medical records were used to validate the hospital diagnoses of Crohn's
disease and ulcerative colitis. By using this validation process
instead of relying on hospital discharge coding alone, the authors are
reducing which of the following sources of error? (Choose one best
answer) (3 pts)
A. Selection bias
B. Prevalence-incidence bias
C. Information bias
D. Surveillance bias
4. Controls were selected as a random sample using the population register
of Stockholm County Council. Which of the following best describes the
primary purpose of using a random sample in this study? (Choose one
best answer)
A. Maximize generalizability by obtaining a statistically representative
sample.
B. Select a control group that was as similar as possible to the case
group except for dietary exposures.
C. Provide an estimate of the dietary exposure in the source population
from which the cases arose.
D. Select a control group with dietary habits similar to those in the
population of cases.
5. Dietary exposures were assessed using a questionnaire with
"retrospective questions aimed at a period of time 5 years in the past"
(page 48). Which of the following situations of misclassification would
make sucrose appear more harmful than it really was? (Choose one best
answer) (3 pts)
A. Controls underreported sucrose intake but cases did not.
B. Cases underreported sucrose intake but controls did not.
C. Both cases and controls underreported sucrose intake.
D. Both cases and controls overreported sucrose intake.
6. Suppose that the cases excluded because of administrative delays were
more likely to have daily than less-than-daily soft drink consumption.
Which of the following best describes the impact this would have on the
odds ratio presented in Table 3? (Choose one best answer) (3 pts)
A. Without the exclusion the odds ratio would be closer to the null.
B. Without the exclusion the odds ratio would be larger.
C. The exclusion did not affect the odds ratio.
D. Cannot determine on the basis of this information.

7. Diagnoses of disease were verified in this study. Define validity and
compare and contrast this concept with reliability. (4 pts)
8. This study uses a case-control design with a population-based control
group. Which of the following, in general, is a strength of this
design? (Choose one best answer) (3 pts)
A. Allows examination of rare diseases.
B. Allows examination of rare exposures.
C. Good for establishing temporality.
D. Good for equalizing on known and unknown confounders.

9. Items on the food frequency questionnaire were mostly in a format with
six response options that ranged from twice per day or more often to
less frequently than once every 2 weeks (p. 48). In deriving values for
daily energy intake, the authors treated the food frequency responses as
which level of measurement? (Choose one best answer) (3 pts)
A. Nominal
B. Ordinal
C. Interval
D. Ratio
10. Control for age in the analyses presented in Table 2 was accomplished
through which of the following methods? (Choose one best answer)
(3 pts)
A. Stratified analysis plus matching.
B. Matching plus mathematical modeling.
C. Restriction without stratification
D. Mathematical modeling and stratification.
11. Based on the data presented in Table 2, is ulcerative colitis associated
with fat intake among men? Give a brief statement to support your
answer. (4 pts)
12. The authors state on page 49 that after controlling for smoking, the
relative risk for Crohn's disease among men was 1.9 for a high
consumption of sucrose and 0.7 for a high consumption of fiber. Briefly
explain why based on these data the authors state that smoking did not
confound these associations. (3 pts)
13. The data presented in Table 3 indicate that Crohn's disease is
associated with the consumption of fast foods. Suppose that when
stratified by educational attainment, the resulting data were as
follows:
                      Educational attainment
                      High                 Low
Fast foods            Controls   Cases     Controls   Cases
1+ times/wk           12         10        8          14
None                  150        100       135        28
a. Calculate the crude and stratum-specific odds ratios. (3 pts)
b. Is this association between fast food and Crohn's disease confounded
by education level? Quantify and briefly explain your answer. (3 pts)
c. Briefly explain in 2 sentences or a diagram how education might fit
into a conceptual model consisting of fast food, education, and risk
of Crohn's disease. (3 pts)
14. In the discussion (page 50), the authors state that "if the change in
diet is the same in cases as in controls, then the relative risk
estimates would be biased toward unity". This is an example of which of
the following? (Choose one best answer) (3 pts)
A. Nondifferential misclassification bias
B. Nondifferential selection bias
C. Differential information bias
D. Differential misclassification bias
15. This article does not present p-values but reports 95% confidence
intervals for all odds ratios. Which of the following best describes
what information a confidence interval conveys that a p-value does not?
(Choose one best answer) (3 pts)
A. A confidence interval puts the observed point estimate in the context
of randomness.
B. A confidence interval provides information on the precision of the
point estimate.
C. A confidence interval includes an estimate of the statistical power
of the study.
D. A confidence interval reflects the clinical significance of the point
estimate.
16. The study describes the association of consumption of Muesli-type
breakfast cereal and Crohn's disease (Table 3). Briefly state and
evaluate the strength of the numerical evidence for the association
between Muesli-type breakfast cereals and Crohn's disease. (3 pts)
17. Briefly present the evidence for or against the role of fiber as a
confounder of the association of sucrose intake and Crohn's disease. (3
pts)
18. Suppose a follow-up to this study was done to estimate the rate (per
10,000 person years) of ulcerative colitis among a large sample in the
Swedish population. The table below summarizes the results.
                        Fast food intake
Soft drink intake       ≥2/week    None
Daily                   18.0       9.1
Less frequently         6.8        3.7

a. Which model for the joint effect of these two food items, the
additive model or the multiplicative model, better fits the data?
Your answer should give the formula for each model and show how to
evaluate it with the above data. (5 pts)
b. Do these data, assuming that they accurately reflect causal effects,
indicate a synergistic effect from a public health perspective?
Justify your answer and state an appropriate public health
implication if any. (2 pts)
19. This study did not differentiate between caffeinated and decaffeinated
coffee. Using the data presented in Table 4 and applying the
assumptions below, calculate the odds ratio (heavy versus no use)
associated with caffeinated coffee consumption and determine if it is
protective against ulcerative colitis. Describe in 2 sentences or less
the interpretation of this new odds ratio, ignoring issues of random
error. (4 pts)
Assumptions:
1. 20% of the heavy coffee drinkers (≥3 cups per day) among cases drink
only decaffeinated coffee.
2. 90% of heavy coffee drinkers among controls drink only decaffeinated
coffee.
20. Which of the following variables was NOT in the multiple logistic model
that was used to estimate the relative risk for sucrose intake in
relation to ulcerative colitis in women? (Choose one best answer) (3 pts)
A. Age
B. Gender
C. Total energy intake
D. Ulcerative colitis
21. In the multiple logistic model that yielded the relative risk estimate
of 0.7 for ulcerative colitis in relation to daily vegetable consumption
(Table 4), what was the value of the coefficient for the vegetable
consumption variable assuming that it was coded as 1=daily, 0=less
frequently? Write the conversion equation of coefficient to relative
risk estimate. (3 pts)
22. Assume that the population of Stockholm County in the age range covered
by this study was 1,000,000 in 1980 and remained constant throughout the
decade. What was the average annual incidence of hospital-diagnosed
Crohn's disease during that period regardless of when their medical
record became available? (3 pts)
23. Using the data in Table 2, for which of the following two associations
is there more of an indication of confounding by age and total energy
intake in WOMEN? Support your answer with relevant data and/or
computations. (3 pts)
a. Crohn's disease and sucrose intake (highest versus lowest level)
b. Crohn's disease and disaccharide intake (highest versus lowest level)
24. Briefly state one major strength and one major limitation of this study
(2 pts)
25. List two Bradford Hill criteria for evaluating whether dietary sucrose
intake is causally related to inflammatory bowel disease. Evaluate each
using specific facts from the article. (4 pts)
26. Which of the following statements about the data in Tables 1 and 2 are
TRUE and which are FALSE (answer TRUE or FALSE for each statement). (2
pts each)
a. In women, the rate of (hospitalized) ulcerative colitis was higher
than that of (hospitalized) Crohn's disease.
b. The similarity in age distribution between the case groups and
controls indicates that the rates of these diseases are fairly uniform
between the ages of 15 and 79 years.
c. Reporting of dietary intake by the Crohn's disease cases involved
recall over longer periods of time, on the average, than was the case
for the ulcerative colitis cases.
d. The proportion of controls with high dietary fat intake was higher
for men than for women.
27. A Swedish friend of yours who lives in Stockholm has an identical twin
sister who is anything but identical in terms of her diet. Your friend,
like other health-conscious Swedes, avoids fast foods and soft drinks, and
eats whole grain bread and muesli-type cereals daily. Her twin sister,
like many Swedes, often consumes fast foods and soft drinks, but never
touches whole grain bread or muesli.
Your friend comes to visit with you over the holidays, and while you are
sleeping late one morning she comes across your class notes from EPID
168. At breakfast, where she has been busily scribbling on her napkin,
she asks you this question.
"Suppose that fast foods, soft drinks, whole grain bread, and muesli-
type cereal affect Crohn's disease risk independently, and that I can
ignore other risk factors. Suppose also that the excess risks are
additive. Is my twin sister's risk of Crohn's disease 10 times my own?"
She shows you how she used the information in Table 3 to obtain that
estimate:
(3.4 - 1) + (2.8 - 1) + ((1/0.4) - 1) + ((1/0.2) - 1) + 1 = 10.7
She goes on to explain "(3.4 -1) is the excess risk from fast foods, and
((1/0.4) - 1) is the excess risk from eating bread that is not whole
grain."
Even though you're not quite fully awake, you feel justifiable pride in
your command of epidemiologic concepts and explain to her the one big
mistake she has made. You say, " . . . ". Write a brief statement of
what you would say. (4 pts)
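Her napkin arithmetic is the additive model for excess relative risks, joint RR = 1 + sum of (RR_i - 1), with each protective food inverted (1/OR) so that *not* eating it counts as the exposure. The minimal Python sketch below reproduces her 10.7 so the numbers can be checked. Like her, it treats the Table 3 odds ratios quoted in the question as relative risks; the dictionary names are illustrative. Getting the arithmetic right does not make the estimate valid; the question asks what is conceptually wrong with combining these particular estimates.

```python
# Table 3 odds ratios as quoted in the question, treated as relative risks
harmful = {"fast foods": 3.4, "soft drinks": 2.8}
protective = {"whole grain bread": 0.4, "muesli-type cereal": 0.2}

# Excess relative risk of each exposure; for a protective food, not eating it
# is treated as the exposure, so its relative risk is inverted (1 / OR)
excess = [rr - 1 for rr in harmful.values()]
excess += [1 / rr - 1 for rr in protective.values()]

# Additive model: joint RR = 1 + sum of the separate excess relative risks
joint_rr = 1 + sum(excess)
print(f"joint RR = {joint_rr:.1f}")  # joint RR = 10.7
```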
Rosamond/Schoenbach, 12/96, edited 11/11/97 epid168 \ exams 1996 Final exam



Answer Guide
1. A. To test the hypothesis that persons with inflammatory bowel disease
are more likely to have been exposed to certain dietary factors than
those without inflammatory bowel disease.
2. A. Manifestational criteria
3. C. Information bias
4. C. Provide an estimate of the dietary exposure in the source population
from which the cases arose.
5. A. Controls underreported sucrose intake but cases did not.
6. B. Without the exclusion the odds ratio would be larger; this
differential selection bias made the reported odds ratio an underestimate.
7. Validity refers to accuracy, or how well an instrument or method measures
what it purports to measure. Reliability refers to repeatability: does
an instrument or method give the same result consistently, regardless
of whether that result is correct? A measure can be reliable without
being valid (consistently wrong), but poor reliability limits validity.
8. A. Allows examination of rare diseases.
9. D. Ratio (The response scale for each item was ordinal, but in order to
create the total energy variable the authors had to convert each
response into calories.)
10. B. Matching plus mathematical modeling.
11. The odds ratio for 80 to 104 grams per day was 1.4, and for intakes of
greater than 105 grams per day the odds ratio was 1.3. This suggests a
tendency for cases to have a greater proportion of high fat eaters than
controls. However, the confidence intervals are broad, extending as low
as 0.4 and 0.6. Furthermore there is no suggestion of a dose response.
This is at most weak evidence of a relationship between fat intake and
ulcerative colitis.
12. The crude (with respect to smoking) and smoking-adjusted odds ratios are
the same. If smoking had been a confounder in the relationship between
sucrose and Crohn's disease or between fiber and Crohn's disease, the
smoking-adjusted odds ratios would have been meaningfully different from
the values in Table 2.
13. a. Odds ratios: Crude = (24 x 285) / (20 x 128) = 6840 / 2560 = 2.7
among High education = (10 x 150) / (12 x 100) = 1.3
among Low education = (14 x 135) / (28 x 8) = 8.4
b. The stratum-specific odds ratios are quite different from each other
(1.3 versus 8.4), suggesting some degree of effect modification. The
crude odds ratio (2.7) lies within the range of the two stratum-specific
odds ratios, which suggests that education is not so much a confounder
as an effect modifier.
c. Three conceptual models of the relationship among fast food, education,
and Crohn's disease could be:
education → lower fast food → lower Crohn's disease (i.e., higher
educational status could lead to lower fast food consumption, which
could then lead to a lower risk of Crohn's disease)
education → (lower fast food + education) → lower Crohn's disease
(i.e., education also interacts with fast food consumption to affect
the risk of Crohn's disease)
[education → lower Crohn's disease] AND [lower fast food → lower
Crohn's disease risk] (i.e., lower fast food intake and education act
as independent main effects to influence Crohn's disease risk).
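The part (a) arithmetic can be reproduced with a small Python sketch. Cell counts come from the hypothetical question 13 table, stored here as (cases, controls) pairs; the helper names `odds_ratio` and `collapse` are illustrative. Summing corresponding cells across the two strata gives the crude table.

```python
# Cell counts as (cases, controls), from the question 13 table
high = {"exposed": (10, 12), "unexposed": (100, 150)}
low = {"exposed": (14, 8), "unexposed": (28, 135)}

def odds_ratio(table: dict) -> float:
    """OR = (a*d)/(b*c): a, b = exposed cases, controls; c, d = unexposed cases, controls."""
    a, b = table["exposed"]
    c, d = table["unexposed"]
    return (a * d) / (b * c)

def collapse(*tables: dict) -> dict:
    """Crude table: add corresponding cells across strata."""
    return {
        level: tuple(sum(t[level][i] for t in tables) for i in range(2))
        for level in ("exposed", "unexposed")
    }

crude = collapse(high, low)
print(f"Crude OR = {odds_ratio(crude):.2f}")          # Crude OR = 2.67
print(f"High education OR = {odds_ratio(high):.2f}")  # High education OR = 1.25
print(f"Low education OR = {odds_ratio(low):.2f}")    # Low education OR = 8.44
```

Printing two decimals avoids ambiguous rounding of 1.25; the answer guide rounds these to 2.7, 1.3, and 8.4.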
14. A. Nondifferential misclassification bias
15. B. A confidence interval provides information on the precision of the
point estimate.
16. There appears to be a strong protective effect of daily consumption of
Muesli-type breakfast cereals on Crohn's disease (odds ratio = 0.2
[0.1-0.7]). The association is considerably weaker for weekly consumption
of these cereals (odds ratio = 0.8). There is evidence of a dose-response
relationship, even though the OR for weekly consumption was not
statistically significant. One should also consider that the absolute
number of cases with daily consumption of Muesli-type cereals is small
(n=4).
17. The authors state that sucrose and fiber intake could be associated with
one another as well as with Crohn's disease and thus each factor might be
a confounder of the associations between Crohn's disease and the other
("mutual confounding"). The odds ratio was 2.6 for a high sucrose intake
(bottom of page 48). When adjusted for fiber, the sucrose odds ratio changed
only slightly, to 2.5. Therefore, fiber was at most a slight confounder of
the relationship between sucrose and Crohn's disease.
18. a. Under the additive model, we expect the joint excess rate of the two
factors to be equal to the sum of the excess rates from each factor
separately. The additive model can also be written in terms of rates:
expected rate of ulcerative colitis with both daily soft drinks and ≥2
fast foods per week = rate (daily soft drinks, no fast food) +
rate (less frequent soft drinks, ≥2 fast foods per week) - rate (neither).
If R(i,j) is the rate for soft drink exposure i and fast food exposure j,
each coded 1 (present) or 0 (absent), then the additive model is:
R(1,1) = R(1,0) + R(0,1) - R(0,0). This equation expressed with numbers
from the table is: expected joint rate = 9.1 + 6.8 - 3.7 = 12.2. The
observed rate with both factors was 18.0. Therefore, the additive model
does not explain the full amount of the observed joint risk.
Under the multiplicative model, we expect the joint rate ratio of the
two factors to be equal to the product of the rate ratios for each
factor separately. In the above notation, the model can be expressed
as: R(1,1) = (R(1,0) x R(0,1)) / R(0,0). This equation expressed with
numbers from the table is: (9.1 x 6.8) / 3.7 = 16.7. The observed rate
is 18.0. The close agreement between the observed joint rate and that
expected under the multiplicative model suggests that the relationship
among daily soft drink consumption, frequent fast food exposure, and
ulcerative colitis is closer to multiplicative than to additive.
b. Generally, synergism from a public health perspective is equated with
a joint effect that is greater than expected under an additive model.
Therefore, the relationship between fast foods and soft drinks is
synergistic, implying that the exposure group to target for maximum
reduction in ulcerative colitis rates per person-year is people who
consume both fast foods and soft drinks. One could propose posting
warning signs in fast food establishments, on soft drink vending
machines and beverage containers, etc.
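Both model predictions in answer 18a can be reproduced in a few lines of Python. The dictionary `R` indexes the question 18 rates (per 10,000 person-years) by (soft drink, fast food) exposure, each coded 1/0 as in the answer; the variable names are illustrative.

```python
# Rates of ulcerative colitis per 10,000 person-years (question 18 table).
# Keys are (i, j): i = daily soft drinks (1) vs less often (0),
#                  j = fast food >=2/week (1) vs none (0)
R = {(1, 1): 18.0, (1, 0): 9.1, (0, 1): 6.8, (0, 0): 3.7}

# Additive model: joint excess rate = sum of the separate excess rates
additive = R[(1, 0)] + R[(0, 1)] - R[(0, 0)]
# Multiplicative model: joint rate ratio = product of the separate rate ratios
multiplicative = R[(1, 0)] * R[(0, 1)] / R[(0, 0)]

observed = R[(1, 1)]
closer = ("multiplicative"
          if abs(observed - multiplicative) < abs(observed - additive)
          else "additive")

print(f"observed = {observed:.1f}, additive = {additive:.1f}, "
      f"multiplicative = {multiplicative:.1f}, closer: {closer}")
```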
19. Odds ratio for ≥3 cups caffeinated coffee = (56 x 36) / (18 x 36) = 3.1
Heavy caffeinated coffee drinking now appears to be a risk factor for
ulcerative colitis, whereas before, coffee drinking appeared to be
protective. An alternative approach would be to include the
decaffeinated coffee drinkers in the "No" (caffeinated) coffee group.
Under this approach the odds ratio for ≥3 cups caffeinated coffee, relative
to none or only decaffeinated = (56 x 201) / (50 x 18) = 12.5
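Both odds ratios in answer 19 follow from the cross-product formula. This sketch takes the cell counts exactly as written in the answer guide, which derived them from Table 4 under the two assumptions; Table 4 itself is not reproduced here, and the keyword labels a, b, c, d are illustrative.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """(a*d)/(b*c): a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)

# Approach 1: >=3 cups caffeinated vs no coffee (decaf-only drinkers set aside)
or_vs_none = odds_ratio(a=56, b=18, c=36, d=36)
# Approach 2: decaf-only drinkers pooled into the "no caffeinated coffee" group
or_vs_none_or_decaf = odds_ratio(a=56, b=18, c=50, d=201)

print(f"{or_vs_none:.1f}, {or_vs_none_or_decaf:.1f}")  # 3.1, 12.5
```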
20. B. Gender -- all subjects in this analysis are women.
21. Regression coefficient b = ln(OR) = ln(0.7) = -0.36; conversion
equation: OR = exp(b) = exp(-0.36) = 0.7.
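Assuming natural logarithms (the usual convention in logistic regression), the conversion in answer 21 runs in both directions. In Python, `math.log` with one argument is the natural log and `math.exp` inverts it:

```python
import math

odds_ratio = 0.7                      # daily vegetable consumption (Table 4)
coefficient = math.log(odds_ratio)    # b = ln(OR)
recovered = math.exp(coefficient)     # OR = exp(b)

print(f"b = {coefficient:.2f}")       # b = -0.36
print(f"OR = {recovered:.1f}")        # OR = 0.7
```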
22. 236 cases / 5,000,000 person-years = 4.72 cases per 100,000 person-years.
Full credit was given for 236 cases / 4,000,000 person-years = 5.9 cases
per 100,000 person-years. Note that the incidence is obtained from all
cases (or at least all confirmed cases), rather than from only consenting
cases.
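The two acceptable answers differ only in the person-time denominator. The sketch below recomputes both, taking the 5- and 4-year spans from the answer guide's 5,000,000 and 4,000,000 person-year denominators; the function name is illustrative.

```python
cases = 236                 # hospital-diagnosed Crohn's disease cases
population = 1_000_000      # population assumed constant (question 22)

def incidence_per_100k(cases: int, person_years: float) -> float:
    """Average incidence rate per 100,000 person-years."""
    return cases / person_years * 100_000

# Years of follow-up implied by the answer guide's two denominators
rates = {years: incidence_per_100k(cases, population * years) for years in (5, 4)}
for years, rate in rates.items():
    print(f"{years} years: {rate:.2f} per 100,000 person-years")
```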
23. There is more confounding for sucrose:
For sucrose:
Crude OR = (34 x 67) / (27 x 38) = 2.22, versus adjusted OR of 3.6
For disaccharides:
Crude OR = (30 x 66) / (35 x 45) = 1.26, versus adjusted OR of 1.2
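One way to quantify the contrast in answer 23 is the relative change from the crude to the adjusted odds ratio. This is a common convention, not one the answer guide states. The sketch reuses the guide's cross-product arithmetic and the adjusted ORs it cites.

```python
# Crude ORs computed as in answer 23, paired with the adjusted ORs it cites
crude_and_adjusted = {
    "sucrose": ((34 * 67) / (27 * 38), 3.6),
    "disaccharides": ((30 * 66) / (35 * 45), 1.2),
}

def percent_change(crude: float, adjusted: float) -> float:
    """Relative change from crude to adjusted OR, in percent (a convention, assumed here)."""
    return (adjusted - crude) / crude * 100

changes = {}
for exposure, (crude, adjusted) in crude_and_adjusted.items():
    changes[exposure] = percent_change(crude, adjusted)
    print(f"{exposure}: crude {crude:.2f}, adjusted {adjusted}, "
          f"change {changes[exposure]:+.0f}%")
```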
24. Strengths could include attempts to evaluate dose response, population-based
case and control selection, validation of case status, and a large study
population. Weaknesses include the potential for recall bias and other
information bias in diet assessment.
25. Strength of association (This study assessed the strength of association
by calculating odds ratios. These measures of strength were also put in
context by providing confidence intervals. Some stratum-specific odds
ratios were strong while others were very weak.), dose response,
consistency across studies (limited).
26. a. F
b. F
c. T
d. F
27. Models of joint effects combine effects of "pure" exposures, i.e., in the
absence of other exposures. But the excess risk for each food item in
Table 3 is estimated without controlling for the effects of the others. For
example, since people who eat fast foods are also likely to drink soft
drinks and not to eat whole grain bread, the relative risk estimates for
fast food 2+ times/week probably already reflect frequent soft drink
consumption and low whole grain bread consumption. In order to add up
the excess risk for each food item, we need to know the excess risk for
exposure to that item in the absence of the others.
12/31/96, 1/16/97 \ epid168 \ exams 1996 Final exam - answer guide

