
Research Full Report

Examining the Reproducibility of 6 Published Studies in Public Health Services and Systems Research
Jenine K. Harris, PhD; Sarah B. Wondmeneh, HBSc; Yiqiang Zhao, MPH; Jonathon P. Leider, PhD

ABSTRACT
Objective: Research replication, or repeating a study de novo, is the scientific standard for building evidence and identifying
spurious results. While replication is ideal, it is often expensive and time consuming. Reproducibility, or reanalysis of data
to verify published findings, is one proposed minimum alternative standard. While a lack of research reproducibility has
been identified as a serious and prevalent problem in biomedical research and a few other fields, little work has been done
to examine the reproducibility of public health research. We examined reproducibility in 6 studies from the public health
services and systems research subfield of public health research.
Design: Following the methods described in each of the 6 papers, we computed the descriptive and inferential statistics for
each study. We compared our results with the original study results and examined the percentage differences in descriptive
statistics and differences in effect size, significance, and precision of inferential statistics. All project work was completed
in 2017.
Results: We found consistency between original and reproduced results for each paper in at least 1 of the 4 areas examined.
However, we also found some inconsistency. We identified incorrect transcription of results and omitting detail about data
management and analyses as the primary contributors to the inconsistencies.
Recommendations: Increasing reproducibility, or reanalysis of data to verify published results, can improve the quality of
science. Researchers, journals, employers, and funders can all play a role in improving the reproducibility of science through
several strategies including publishing data and statistical code, using guidelines to write clear and complete methods
sections, conducting reproducibility reviews, and incentivizing reproducible science.

KEY WORDS: open science, public health services and systems research, reproducibility

Author Affiliations: Brown School, Washington University in St Louis, St Louis, Missouri (Dr Harris, Ms Wondmeneh, and Mr Zhao); and Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland (Dr Leider).

The authors acknowledge Todd Combs for his helpful comments on drafts of this manuscript and for his assistance in setting up the coding2share GitHub account as a repository for their code. The research presented in this paper is that of the authors and does not reflect the official policy of the Robert Wood Johnson Foundation.

J.K.H. received a grant from the Robert Wood Johnson Foundation in March 2017 that supported a portion of her time completing the writing and editing of this manuscript. S.W. and Y.Z. were paid hourly as part-time graduate research assistants by Washington University in St Louis for their work on this project.

The authors declare no conflicts of interest.

Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal's Web site (http://www.JPHMP.com).

Correspondence: Jenine K. Harris, PhD, Brown School, Washington University in St Louis, One Brookings Dr, Campus Box 1196, St Louis, MO 63130 (harrisj@wustl.edu).

Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved.
DOI: 10.1097/PHH.0000000000000694

In recent decades, the academic culture of publish or perish has collided with technological advances in publishing, helping to spur the exponential growth of published research.1,2 Whether by correlation or causation, challenges currently facing science coincide with this growth, including increasing rates of retraction,3 a lack of replication and reproducibility,4 and the proliferation of predatory journals.5

Research replication, or the process of repeating a study de novo, is the scientific standard for building evidence and identifying spurious results.6,7 While replication is ideal, the process can be time consuming and expensive.7 Reproducibility, or reanalysis of data to verify published findings, is one proposed minimum alternative standard.7,8 Reproducibility requires, at a minimum, accessible data and clear instructions for data management and analysis.8 A recent review of 10 papers from each of 10 top scientific journals found that 20% to 80% of papers per journal (median = 50%) included an unclear or unknown sample size, and up to 40% of papers (median = 20%) included unclear or unknown statistical tests.9 Incorrect reporting of research results can also hinder reproducibility. Two recent studies found approximately 6% of P values were reported incorrectly in a sample of psychology papers,10 and 11% of P values were incorrect in a sample of medical papers.11

Lack of reproducibility is serious and prevalent, with just 21% of 67 drug studies and 40% to 60% of 100 psychological studies being successfully reproduced.4,6,12 While most of the work on reproducibility in science has been in the biomedical or psychology fields, 1 recent paper in the journal Science examined economics papers and found that 61% (11 of 18) of papers were reproducible.13 Less has been done to examine the reproducibility of public health research.

Public health services and systems research (PHSSR) is a subfield of public health research concerned with the "organization, financing, and delivery of public health services, and the impact of these services on public health."14 Governmental public health practitioners use research evidence, including PHSSR, to select programs and justify program selection to their partners, evaluate programs, prepare funding applications, and conduct needs assessments.15,16

One of the major sources of data for PHSSR is the National Profile of Local Health Departments Study (Profile), a comprehensive survey of all local health departments (LHDs) nationwide conducted regularly by the National Association of County and City Health Officials (NACCHO). Survey questions cover LHD funding, workforce, programs, and partnerships. Profile data are available for public use for free through the Inter-university Consortium for Political and Social Research or can be requested directly by completing a data use agreement form and e-mailing it to NACCHO. Given that it is publicly available at no cost and widely used in PHSSR, we sought to examine the reproducibility of published studies that used the NACCHO Profile study data. We had 3 goals: (1) examine reproducibility in a sample of published PHSSR studies using publicly available data; (2) demonstrate one strategy for increasing research reproducibility; and (3) encourage public health researchers to begin adopting reproducible research strategies.

Methods

Goal 1: Examine reproducibility in PHSSR studies

Sample selection

In early 2017, we conducted a search to identify published journal articles using the 2013 NACCHO Profile data. We searched 10 sources: PubMed, Google Scholar, Ebsco, Web of Knowledge, Scopus, EMBASE, CINAHL plus, Academic Complete, Web of Science, and CINAHL. Our search terms were as follows: "National Association of County & City Health Officials," "National Association of County and City Health Officials," and "NACCHO." We reviewed the resulting papers and restricted the results to those using 2013 NACCHO Profile data where all or most of the analyses were conducted without combining it with other years of Profile data or other data sources (n = 15). We reviewed all 15 papers and recorded the following: variables examined, type of statistical analyses conducted, inclusion of detail on missing data handling, inclusion of detail on variable recoding and combining, whether analyses used the NACCHO core data and/or 1 or more of the modules, and detail on inclusion of another data set, if any. We used this summary information to select 5 of the 15 articles (33.3%) that (1) included clear descriptions of data management and analyses, and (2) applied commonly used statistical methods for PHSSR.17 Harris has also published PHSSR using NACCHO data but has no publications using 2013 NACCHO data alone for analyses; we reproduced the results of a study using 2008 Profile data led by Harris to determine whether we would be more successful in reproducing our own work.

Reproducing descriptive and inferential statistics

The first step in reproducing the research was to mimic the work described in each manuscript to reproduce descriptive statistics. We examined tables and methods sections in each paper to identify variables used, recoding, whether weights were used, and types of descriptive analyses conducted. All authors worked together to manage and analyze the data in order to reproduce the reported descriptive statistics. Once we completed data management and were able to reproduce the descriptive statistics as closely as possible to the original published manuscript, we reproduced bivariate and multivariate inferential model(s) on the same managed data. We reproduced statistics reported in all tables in each manuscript that used only NACCHO 2013 Profile data (or 2008 Profile data for Paper 6); we did not reproduce tables that included other data sources nor statistics displayed in figures.

In Paper 1, we were not able to closely reproduce the tables of descriptive statistics without additional information from the authors. For this paper, rurality was coded on the basis of Rural-Urban Commuting Area (RUCA) codes, which allow classification of health department jurisdictions by how rural or urban they are. The NACCHO data can be requested with RUCA codes, but we were unsure of the exact categories used in Paper 1, which lacked detail about the codes. We e-mailed the lead author and were sent the RUCA codes used. The RUCA codes for Paper 1 are included on GitHub with the SPSS, Stata, and R command files for all results reported here (https://github.com/coding2share/phssr-reproducibility/).


Comparing reproduced results with original results

We examined how similar or different the reproduced statistics were compared with the published results. For descriptive statistics reported alone or as part of a table of bivariate analyses, we determined the percentage difference between the original and reproduced value. In computing the percentage difference, we subtracted the original values from the reproduced values and divided by the originals. The resulting percentages were positive for reproduced values greater than the originals, and negative for reproduced values less than the originals. For each paper, we determined the mean and standard deviation for the percentage difference across all descriptive statistics reported in each table. For multivariate inferential statistics, we compared original and reproduced statistics in 3 areas: effect size, statistical significance, and precision. For effect size comparisons, we used a similar strategy to what we used for descriptive statistics, computing the percentage difference between original and reproduced odds ratios. For significance, we used the cutoff for significance reported (or implied) in the published manuscript and noted if reproduced statistical tests matched the significance of original test results. For those not matching, we recorded whether the result became significant or nonsignificant compared with the original. Statistical significance was also compared for bivariate inferential analyses. Precision was examined by comparing the mean width of a confidence interval in the original and reproduced results.
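To make the computation concrete, here is a minimal sketch in R (the language used for the summary figures); the values are illustrative, not drawn from any of the 6 papers, and this is not the authors' published command file:

# Percentage difference between an original and a reproduced statistic,
# computed as (reproduced - original) / original
percent_diff <- function(original, reproduced) {
  100 * (reproduced - original) / original
}

original   <- c(12.5, 47.8, 1.8)    # illustrative published values
reproduced <- c(12.5, 56.2, 3.8)    # illustrative reproduced values

diffs <- percent_diff(original, reproduced)
round(diffs, 1)   # positive when the reproduced value exceeds the original
mean(diffs)       # mean percentage difference for the table
sd(diffs)         # standard deviation across the table's statistics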
tics were accurately reproduced with the exception of
Examining manuscript features

Finally, after reproducing results from the 6 papers, we reviewed each paper and noted whether the paper included detail we felt would have aided us in more accurately reproducing results: (1) the type of bivariate analysis conducted; (2) test statistic values for bivariate analyses (eg, χ2, F); (3) the name or a detailed description of the weight variable used (if any); (4) detail on how missing data were handled; (5) detail on variable combining and recoding; and (6) sample sizes and raw frequencies for groups analyzed.

Goal 2: Demonstrate a strategy for research reproducibility

During the process of reproducing the 6 studies, we organized and annotated all of the SPSS and Stata commands we used to get from the raw NACCHO Profile data to our reproduced results. Summary figures were produced in R. The SPSS, Stata, and R command files are available on GitHub: https://github.com/coding2share/phssr-reproducibility/. Each of our analyses reproducing the manuscript in question used the statistical program noted by that paper. Data and codebooks for the Profile study data are available from NACCHO (http://nacchoprofilestudy.org/).

Results

Sample characteristics

The 5 studies using 2013 NACCHO Profile study data were published between 2015 and 2017 in 3 distinct journals with 4 distinct lead authors. The additional study by Harris was published in 2013 and used 2008 NACCHO Profile study data. Descriptive tables reported frequencies, percentages, means, and standard deviations. Bivariate inferential statistics, such as χ2 or correlation analyses, were included in 4 of the 6 papers. The 5 papers using 2013 data used weights in analysis and conducted multivariate logistic regression for the primary analysis; the Harris paper neither weighted the data nor reported logistic regression.

Reproducing descriptive statistics

Following the methods and results sections of the 6 papers, we computed descriptive statistics from NACCHO Profile Study data sets obtained directly from NACCHO. Figure 1 shows the mean percentage difference between the originally reported and reproduced statistics. Most tables of descriptive statistics were accurately reproduced, with the exceptions of Paper 1 Table 1 and Paper 6 Table 1 (Figure 1; see Supplemental Digital Content Appendix A, available at http://links.lww.com/JPHMP/A396). Tables showing the original and reproduced values are shown in Appendix A.

Most differences between original and reproduced values in Paper 1 Table 1 were small, with a few exceptions (see Supplemental Digital Content Appendix A, available at http://links.lww.com/JPHMP/A396). One exception was the administrator highest degree variable, where the category "Associates degree" had 1.8% of the original study participants but 3.8% of the reproduced participants, more than doubling the percentage. Likewise, there was a large difference in the percentage of LHDs with a community health improvement plan (CHIP) that developed an agency-wide strategic plan in the original (47.8%) compared with the reproduced (56.2%).

[Figure: for each manuscript and table number, the mean percent difference from original to reproduced statistics, with standard deviation, shown separately for descriptive statistics/correlations and for odds ratios; x-axis from −50% to 50%.]

FIGURE 1 Difference Between Original and Reproduced Descriptive and Inferential Statistics Reported in 5 Published Public Health Services and Systems Research (PHSSR) Studies Using National Association of County & City Health Officials (NACCHO) 2013 Profile Data (Papers 1-5) and 1 PHSSR Study by Harris Using NACCHO 2008 Profile Data (Paper 6)

These variables were created by combining variables (degree variable) or recoding a multicategory variable into a binary variable (CHIP variable). The manuscript included very clear instructions for combining and recoding; we reviewed our syntax in light of the instructions in the manuscript but were not able to identify any obvious differences in how we constructed the variables and how the authors described variable construction. No sample sizes or unweighted frequencies were reported, so we did not have a way to compare the number of LHDs recoded inconsistently from the original to the reproduced.

The differences between original and reproduced in Paper 6 Table 1 (see Supplemental Digital Content Appendix A, available at http://links.lww.com/JPHMP/A396) were due to an error in transcription by the paper authors or the journal staff. Specifically, percentages in the land use planning, tobacco prevention and control, influenza, and obesity rows in Paper 6 Table 1 appear to have been swapped between rows, suggesting the row labels were reordered but the values were not. Other than this, the reproduced table matched the original nearly exactly.

Reproducing inferential models

Comparing effect size

On average, the reproduced effect sizes were accurate for most papers, with mean differences between original and reproduced being less than 5% for all but 1 paper. For Paper 1 Table 4, the mean difference between original and reproduced effect sizes was 11.1%. We examined the differences between original and reproduced odds ratios and found 4 odds ratios in Paper 1 that were very different from the original, including both categories of rurality, whether or not the health department had a board of health, and whether the health department was in the state governance category (Figure 2). We thought that perhaps the large differences seen between original and reproduced descriptive statistics for this paper may have resulted in the low accuracy of the regression model, but the variables with the least accuracy in the descriptive analyses were not used in the regression model. We were unable to identify the source of the sizeable differences between observed and reproduced regression results for this model.


[Figure: forest plot of original and reproduced odds ratios with 95% confidence intervals on a log scale, by local health department characteristic: urban, state governance, public health physician (ref = no), per capita revenue, local governance, local board(s) of health (ref = no), information systems specialist (ref = no), agency-wide strategic plan (ref = no), micropolitan, and epidemiologist (ref = no).]

FIGURE 2 Original and Reproduced Odds Ratios and Confidence Intervals From Paper 1 Table 4 Model 1
Abbreviation: CI indicates confidence interval.

Comparing statistical significance

All but 1 of the bivariate tables reproduced were completely accurate with respect to statistical significance status (Figure 3). The multivariate models each had 1 or more changes to statistical significance status, with the exception of Paper 4. Paper 6 had by far the most changes in statistical significance; 10 of the 40 correlations changed from statistically significant to nonsignificant when reproduced. In an examination of documents submitted by Harris for publication of Paper 6, we found that 4 of the 10 correlations that went from significant in the original to nonsignificant in the reproduced results were not marked as significant in the submitted table but were marked as such in the published table. At some point in the process from submission to publication, 4 correlations were transcribed incorrectly as significant and not caught during the proof review or any other stage before publication.

Comparing precision

In addition to relatively high reproducibility in statistical significance, the precision of effect size estimates was relatively stable across all of the studies except for Paper 1. The reproduced results of Paper 1 were more precise than the original results (Figure 2), which suggests that smaller standard deviations and/or larger sample sizes were used to compute the reproduced confidence intervals. Figure 4 shows the mean difference in width between original and reproduced confidence intervals for odds ratios across studies. Supplemental Digital Content Appendix B, available at http://links.lww.com/JPHMP/A397, shows original and reproduced odds ratios for Papers 1 through 5.
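A brief sketch in R of the precision comparison using confidence interval width; the first row uses the Paper 5 interval quoted in the Discussion, and the second row is illustrative:

# Width of original and reproduced 95% confidence intervals for odds ratios
ci <- data.frame(
  term       = c("Board approves budget", "Illustrative covariate"),
  orig_lower = c(0.50, 1.10), orig_upper = c(0.98, 3.40),
  repr_lower = c(0.52, 1.20), repr_upper = c(1.03, 3.20)
)
ci$orig_width <- ci$orig_upper - ci$orig_lower
ci$repr_width <- ci$repr_upper - ci$repr_lower
mean(ci$orig_width)   # mean CI width, original results
mean(ci$repr_width)   # mean CI width, reproduced results (wider = less precise)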
Examining manuscript features

Three of the 4 papers that included bivariate analysis included the type of bivariate analysis conducted, but none of the tables of bivariate analyses included test statistic value(s) such as χ2 or t statistics. Of the 5 papers using weights, 2 of them were specific about the weighting variable used. The other 3 papers stated that "appropriate" or "proper" weights were used. All papers gave overall response rates of the study, and 1 paper (Paper 5) stated that all included variables had less than 5% missing data; no other detail on handling of missing data was included. Papers 2 and 4 included unweighted frequencies, and Papers 5 and 6 included sample sizes for analyses; other papers included neither of these.


[Figure: for each manuscript and table number, grouped into bivariate and multivariate analyses, the percent (0%-100%) of inferential statistics that were nonsignificant in the original but significant when reproduced, significant in the original but nonsignificant when reproduced, or stayed the same.]

FIGURE 3 Percentage of Bivariate and Multivariate Inferential Results That Maintained or Changed Statistical Significance Status When Reproduced for 6 Public Health Services and Systems Research Studies

[Figure: mean confidence interval width for original and reproduced logistic regression results, by paper and table number.]

FIGURE 4 Comparison of Confidence Interval Width (Precision) for Odds Ratios From Original and Reproduced Logistic Regression Models in 5 Published Public Health Services and Systems Research Studies

Discussion

We reproduced the tables of descriptive and inferential statistics for 5 PHSSR studies using 2013 NACCHO Profile data and 1 study using 2008 Profile data. All 6 used common statistical methods for PHSSR.17 Overall, the accuracy of the reproduced results was high, with a few exceptions. Consistent with prior research in psychology and medicine,9-11 we identified 2 primary issues that hindered reproducibility: (1) typos and other inaccuracies in transcription; and (2) a lack of detail about steps taken and choices made by the authors throughout the research process.


First, typos and other errors when transferring text during the writing and publishing process accounted for numerous differences between original and reproduced results. The switching of 4 rows of data in Paper 6 Table 1 without editing the labels was the biggest instance we found but not the only one. Paper 2 Table 1 had 4 frequency and percentage values switched (see Supplemental Digital Content Appendix A, available at http://links.lww.com/JPHMP/A396). In addition, all of the significant P values in Paper 1 Table 2 were shown as P = .001, while reproduced results for this paper indicated that these P values were all P < .001. Although it seems unlikely that all the P values would be exactly equal to .001, without the test statistics for this table, it is difficult to check whether this difference is real or a set of typos. In Paper 6, 4 superscripts indicating significance were added mistakenly to nonsignificant correlations at some point between submission and publication and were not discovered during the publishing process. Altogether, 3 of the 6 papers examined included transcription errors in reporting values. An additional paper, Paper 5, included a labeling error that did not result in misreported values but caused some initial confusion during data management. Specifically, a less than symbol (<) was printed in Tables 3 and 4 where there should have been a greater than or equal to symbol (≥).

The omission of detail throughout the research process also hindered accuracy in reproducing these papers. For example, our initial attempt to reproduce the unweighted frequencies in Paper 2 Table 1 resulted in lower values in the "No" category for 5 variables. To reproduce the frequencies exactly, we tried assigning missing data into the category for "No." So, for example, an LHD missing data on the variable for strategic plan developed would be placed into the "No" category, indicating it was not developing a strategic plan. This strategy was not intuitive to us and was not described in the methods of the paper, but it did result in accurate frequencies. We found a similar recoding of missing values to "No" in Paper 3 for several variables, with no mention of this strategy in the article text (see command file at https://github.com/coding2share/phssr-reproducibility). In Paper 3, the authors included unweighted percentages, which were useful in determining how to treat the missing values.
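A minimal sketch in R of the undocumented recoding that reproduced these frequencies; the variable name and values are hypothetical, not taken from the NACCHO codebook:

# Hypothetical responses to one Profile item, with NA for nonresponse
strategic_plan <- c("Yes", NA, "No", NA, "Yes", "No")
table(strategic_plan, useNA = "ifany")         # frequencies before recoding

strategic_plan[is.na(strategic_plan)] <- "No"  # assign missing to "No"
table(strategic_plan)                          # frequencies after recoding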
Likewise, while 3 of the 4 papers including bivariate statistics named the test they were using, none of the 4 papers reporting bivariate analyses included the values of test statistics for the bivariate analyses. Although the values of test statistics, like χ2 or U, are often not inherently meaningful, they do allow the determination or verification of the P value for a statistical test. As an example of how this may have improved reproducibility, the percentages reported in the original Paper 1 Table 2 were accurately reproduced, so we expected to also find accurately reproduced P values. However, the P values differed slightly between original and reproduced results. We believe that this was due to a transcription error; however, including bivariate test statistics would have verified our belief or alerted us to some analytic or reporting choice the authors made that we did not include in our process.
The differences found in effect size, precision, and statistical significance in the reproduced analyses likely resulted from not knowing data management details, like the handling of missing data during recoding in Papers 2 and 3. If a small change in data management results in significance changes or less precise results, it suggests that findings from these studies may not be very robust and should be interpreted with caution. For example, in Paper 5, the differences between observed and reproduced descriptive statistics were minor; however, in the inferential results, the variable indicating whether the local board of health had authority to approve the LHD budget was a significant protective factor against LHD budget cuts in the original analyses but nonsignificant in the reproduced analyses. The confidence intervals in the original (0.50-0.98) and reproduced (0.52-1.03) were close but just different enough to change the statistical significance status. The authors concluded that this significant protective factor was one of a few indicators that board of health actions influence LHD financial stability, which may or may not be the case depending on how the data were managed. Instead of focusing on P values,18 a focus on effect sizes and practically relevant differences is one possible strategy for addressing this concern.

Finally, as mentioned, we reproduced analyses in the software used in the original manuscript, at the recommendation of peer reviewers. Initially, we conducted all analyses in R. However, when comparing R results to the original results obtained from Stata and SPSS, we noticed a number of differences in statistical significance that gave us pause. In examining possible reasons for these differences, we found slight differences in algorithms19,20 and more substantial differences in how weights are applied across different statistical packages for some statistical tests. Understanding and describing how the program we use applies weights in our analyses could increase the ability of others to reproduce our work.
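One way to make the weighting explicit is to declare it in code. A sketch using the R survey package follows; the data frame and variable names are hypothetical and are not drawn from the NACCHO codebook or the authors' command files:

library(survey)  # install.packages("survey") if needed

# Hypothetical LHD-level data with an explicitly named Profile weight
lhd_data <- data.frame(
  profile_weight = c(1.2, 0.8, 1.5, 1.0, 0.9, 1.1, 1.3, 0.7),
  budget_cut     = c(1, 0, 1, 0, 0, 1, 0, 1),
  board_approves = c(0, 1, 0, 1, 1, 0, 1, 0)
)

# Naming the design and weight variable in print is easier to reproduce
# than reporting only that "appropriate" or "proper" weights were used
des <- svydesign(ids = ~1, weights = ~profile_weight, data = lhd_data)
svymean(~budget_cut, des)  # weighted proportion of LHDs reporting a cut

# Weighted logistic regression; quasibinomial avoids the warning that
# noninteger weighted counts trigger with family = binomial
fit <- svyglm(budget_cut ~ board_approves, design = des,
              family = quasibinomial())
exp(cbind(OR = coef(fit), confint(fit)))  # odds ratio with 95% CI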
Study limitations include the small number of studies reproduced, the small selection of studies based on data availability, and the use of traditional statistical methods. These studies were not likely to be representative of the larger population of PHSSR studies or public health research. However, given data availability and the use of clear, widely used methods, we anticipate that these studies were more reproducible than studies using nonpublic data and/or more complex analytic methods.


Implications for Policy & Practice

■ Increasing reproducibility, or reanalysis of data to verify published results, can improve the quality of science.
■ To increase reproducibility, include detail in publications such as specific data management decisions, sample sizes for all analyses, unweighted frequencies for weighted analyses, and the name and test statistic for all statistical tests.
■ Checking proofs of publications closely, including all labels and numbers in tables, can identify errors to correct in reporting or typesetting and increase reproducibility.
■ Researchers, journals, employers, and funders can improve reproducibility by using existing publishing guidelines; publishing statistical code, output, and data when possible; conducting reproducibility reviews; and incentivizing reproducible science.

Public health practitioners working in local and state health departments nationwide use PHSSR to select evidence-based programs15,16 that have been demonstrated effective in publications like those we examined. Classification of a program or strategy as effective or evidence-based is often based on P values that reach a certain threshold and/or effect sizes, like odds ratios, that appear large and precise. Error-laden research wastes resources and puts people at risk,21 and 73.4% of retractions are not fraud-related but instead are due to error or something else.22 Although a reproducible study is not necessarily a perfect study,7 reproducing research can aid in identifying and correcting errors and improving the overall quality of research. Increasing reproducible research will require a shift in the current scientific culture, which has researchers competing for scarce human and financial resources and often rewards publication quantity, novel findings, and statistical significance over scientific rigor and quality.4,23

To begin to shift the culture toward reproducibility, there are many strategies researchers, journals, employers of researchers, funders, and others can use.23,24 Specifically, reproducibility increases when researchers (1) describe the methods used in publications in enough detail that they can be reproduced;25 (2) include a methodologist or biostatistician in all stages of research;24 (3) publish data and statistical source code;7,11,13,23 (4) follow existing guidelines for reporting study results;23,25-27 and (5) take advantage of new tools available for improving research reproducibility.28 In addition, journals improve reproducibility by (1) requiring authors to follow existing guidelines for reporting study results;23,25-27 (2) requiring authors to submit data and statistical source code;7,11,13,23 (3) using reviewers with statistical expertise;11 and (4) offering reproducibility reviews.7 Funders and employers of researchers could consider incentivizing or funding reproducible research, reproduction efforts, and strategies for increasing reproducibility such as sharing data and statistical code.6,29 While not all of these suggestions are appropriate for all projects, especially for those with restricted-use data or data with private or confidential information, all projects can employ some of these strategies. Increasing research reproducibility in public health can help ensure that researchers are providing practitioners with the best possible science to use in determining how to spend scarce resources and improve public health.

References
1. Bornmann L, Mutz R. Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references. J Am Soc Inf Sci Technol. 2015;66(11):2215-2222.
2. MacRoberts MH, MacRoberts BR. Problems of citation analysis: a study of uncited and seldom-cited influences. J Am Soc Inf Sci Technol. 2010;61(1):1-12.
3. Fang FC, Steen RG, Casadevall A. Misconduct accounts for the majority of retracted scientific publications. Proc Natl Acad Sci U S A. 2012;109(42):17028-17033.
4. Prinz F, Schlange T, Asadullah K. Believe it or not: how much can we rely on published data on potential drug targets? Nat Rev Drug Discov. 2011;10(9):712.
5. Xia J, Harmon JL, Connolly KG, Donnelly RM, Anderson MR, Howard HA. Who publishes in "predatory" journals? J Am Soc Inf Sci Technol. 2015;66(7):1406-1417.
6. Open Science Collaboration. Estimating the reproducibility of psychological science. Science. 2015;349(6251):aac4716.
7. Peng RD. Reproducible research in computational science. Science. 2011;334(6060):1226-1227.
8. Peng RD, Dominici F, Zeger SL. Reproducible epidemiologic research. Am J Epidemiol. 2006;163(9):783-789.
9. Gosselin RD. Open letter to scientific journals. http://www.biotelligences.com/uploads/2/1/8/0/21806970/open_letter_to_journals.pdf. Updated 2017. Accessed April 22, 2017.
10. Nuijten MB, Hartgerink CH, Assen MA, Epskamp S, Wicherts JM. The prevalence of statistical reporting errors in psychology (1985-2013). Behav Res Methods. 2016;48(4):1205-1226.
11. García-Berthou E, Alcaraz C. Incongruence between test statistics and P values in medical papers. BMC Med Res Methodol. 2004;4(1):1.
12. Anderson CJ, Bahnik S, Barnett-Cowan M, et al. Response to comment on "Estimating the reproducibility of psychological science". Science. 2016;351(6277):1037.
13. Camerer CF, Dreber A, Forsell E, et al. Evaluating replicability of laboratory experiments in economics. Science. 2016;351(6280):1433-1436.
14. Mays GP, Halverson PK, Scutchfield FD. Making public health improvement real: the vital role of systems research. J Public Health Manag Pract. 2004;10(3):183-185.
15. Jacob RR, Allen PM, Ahrendt LJ, Brownson RC. Learning about and using research evidence among public health practitioners. Am J Prev Med. 2017;52(3):S304-S308.
16. Brownson RC, Allen P, Duggan K, Stamatakis KA, Erwin PC. Fostering more-effective public health by identifying administrative evidence-based practices: a review of the literature. Am J Prev Med. 2012;43(3):309-319.


17. Harris JK, Beatty KE, Barbero C, et al. Methods in public health services and systems research. Am J Prev Med. 2012;42(5):S42-S57.
18. Nuzzo R. Statistical errors. Nature. 2014;506(7487):150.
19. Park HM. Comparing group means: t-tests and one-way ANOVA using Stata, SAS, R, and SPSS. http://hdl.handle.net/2022/19735. Published 2009. Accessed October 11, 2017.
20. Park HM. Linear regression models for panel data using SAS, Stata, LIMDEP, and SPSS. http://www.indiana.edu/~statmath/stat/all/panel/panel.pdf. Published 2015. Accessed October 11, 2017.
21. Steen RG. Retractions in the medical literature: how many patients are put at risk by flawed research? J Med Ethics. 2011;37(11):688-692.
22. Steen RG. Retractions in the scientific literature: is the incidence of research fraud increasing? J Med Ethics. 2011;37(4):249-253.
23. Begley CG, Ellis LM. Raise standards for preclinical cancer research. Nature. 2012;483(7391):531-533.
24. Ioannidis JP, Greenland S, Hlatky MA, et al. Increasing value and reducing waste in research design, conduct, and analysis. Lancet. 2014;383(9912):166-175.
25. International Committee of Medical Journal Editors (ICMJE). Uniform requirements for manuscripts submitted to biomedical journals: writing and editing for biomedical publication. Haematologica. 2004;89(3):264.
26. Santori G. Journals should drive data reproducibility. Nature. 2016;535(7612):355.
27. EQUATOR Network. Enhancing the quality and transparency of health research. www.equator-network.org. Published October 4, 2015. Accessed October 11, 2017.
28. Lowndes JSS, Best BD, Scarborough C, et al. Our path to better science in less time using open data science tools. Nat Ecol Evol. 2017;1:0160.
29. Ioannidis JP. How to make more published research true. PLoS Med. 2014;11(10):e1001747.
