Design In this meta-research study, PubMed was searched for diagnostic accuracy meta-analyses using hierarchical methods, published in imaging journals between January 2005 and April 2016. Data were extracted for each meta-analysis and its included primary studies, including study demographic information, journal Impact Factor, journal cited half-life, Standards for Reporting Diagnostic accuracy studies (STARD) endorsement, citation rate, publication date, sample size, sensitivity, and specificity. Meta-analyses were excluded for failing to report both primary and summary accuracy estimates. Primary studies were divided into 2 groups for each variable assessed; groups were defined based on first publication vs subsequent publications on a topic, publication before vs after STARD introduction, presence vs absence of STARD endorsement, or by median split. The mean absolute deviation of primary study estimates from the corresponding summary estimates for sensitivity and specificity was compared between groups for each variable.

[Table: comparison of mean absolute deviation from summary estimates between groups, by variable; values are given for sensitivity and specificity with P values.]
Impact Factor (above median vs below median): sensitivity 0.018, P = .09; specificity 0.013, P = .11
STARD endorsement (endorsement vs no endorsement): sensitivity 0.0057, P = .60; specificity 0.0019, P = .83
Cited half-life (above median vs below median): sensitivity 0.0063, P = .55; specificity 0.0097, P = .24
Citation rate (above median vs below median): sensitivity 0.018, P = .08; specificity 0.0045, P = .58
Publication timing relative to STARD 2003 (post-STARD vs pre-STARD): sensitivity 0.0059, P = .55; specificity 0.0082, P = .38
Publication timing, first published (first published vs later published): sensitivity 0.025, P = .005; specificity 0.0077, P = .48
Abbreviation: STARD, Standards for Reporting Diagnostic accuracy studies.
a Statistical significance defined as P < .004 (after Bonferroni correction).
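As a rough illustration (not the authors' code), the mean absolute deviation comparison described in the Design could be computed along the following lines; the DataFrame, its column names, and the grouping variable are hypothetical.

import pandas as pd

# Hypothetical table: one row per primary study, with its sensitivity, the
# summary sensitivity of its meta-analysis, and a grouping variable
# (eg, journal Impact Factor above vs below the median).
studies = pd.DataFrame({
    "sensitivity":         [0.90, 0.82, 0.75, 0.88, 0.70, 0.95],
    "summary_sensitivity": [0.85, 0.85, 0.80, 0.80, 0.80, 0.90],
    "above_median_if":     [True, True, False, False, False, True],
})

# Absolute deviation of each primary estimate from its summary estimate
studies["abs_dev"] = (studies["sensitivity"] - studies["summary_sensitivity"]).abs()

# Mean absolute deviation per group and the between-group difference
mad = studies.groupby("above_median_if")["abs_dev"].mean()
print(mad)
print("difference:", mad[True] - mad[False])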
McMaster University, Hamilton, Ontario, Canada, mbuagblc@mcmaster.ca; 2Toronto General Hospital-University Health Network, Toronto, Ontario, Canada; 3Princess Margaret Hospital-University Health Network, Toronto, Ontario, Canada; 4Bachelor of Health Sciences Program, McMaster University, Hamilton, Ontario, Canada; 5Oncology Biostatistics, Genentech, South San Francisco, CA, USA

Conflict of Interest Disclosures: None reported.

registry and the manuscript.

Design We searched PubMed for all randomized clinical trials (RCTs) published between 2012 and 2015 in the top 5 general medicine journals (based on the 2014 impact factor as published by Thomson Reuters), which all required registration of the RCT for publication; 200 full-text publications (50 from each year) were randomly selected for data extraction. Key study conduct items were extracted by 2 independent reviewers for each year. When an item was reported differently or not reported at all in either source, this was considered a discrepancy in reporting between the registry and the full-text publication. Descriptive statistics were calculated to summarize the percentage of studies with discrepancies between the registry and the published manuscript in reporting of key study conduct items. The items of interest were design (ie, randomized controlled trial, cohort study, case control study, case series), type (ie, retrospective, prospective), intervention, arms, start and end dates (based on month and year where available), use of data monitoring committee, and sponsor, as well as primary and secondary outcome measures.

Results In the sample of 200 RCTs, there were relatively few studies with discrepancies in study design (n=6 [3%]), study type (n=6 [3%]), intervention (n=10 [5%]), and study arm (n=24 [12%]) (Figure 2). Only 30 studies (15%) had discrepancies in their primary outcomes. However, there were often discrepancies in study start date (n=86 [43%]), study sponsor (n=108 [54%]), and secondary outcome measures (n=116 [58%]). Almost 70% of studies had

Methodological and Reporting Quality of Systematic Reviews Underpinning Clinical Practice Guidelines
Cole Wayant,1 Matt Vassar1

Objective This study summarizes the findings of 3 separate studies conducted simultaneously to determine the methodological and reporting quality of systematic reviews (SRs) underpinning clinical practice guidelines (CPGs) in pediatric obesity, opioid use disorder, and ST-elevated myocardial infarction.

Design A search of guideline clearinghouse and professional organization websites was conducted for guidelines published by national or professional organizations. We included all reviews cited by authors of CPGs, including Cochrane reviews, and removed duplicates prior to data extraction. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) and AMSTAR (A Measurement Tool to Assess Systematic Reviews) instruments were used to score SRs and meta-analyses cited in CPGs. PRISMA and AMSTAR are validated tools for measuring reporting quality and methodological quality, respectively.
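As a rough sketch of the scoring step described in the Design above (not the authors' code), checklist item scores can be tallied per review and screened for consistently underreported items; the item columns and values here are hypothetical.

import pandas as pd

# Hypothetical scoring matrix: one row per systematic review, one column per
# checklist item (1 = reported/fulfilled, 0 = not); only 4 PRISMA items shown.
scores = pd.DataFrame(
    [[1, 0, 1, 0],
     [1, 0, 1, 1],
     [0, 0, 1, 0]],
    columns=["item_5", "item_8", "item_15", "item_22"],
)

totals = scores.sum(axis=1)
print("mean total score:", totals.mean())

# Items reported by fewer than half of the reviews, ie, candidates for
# "consistently underreported" items
item_rates = scores.mean()
print(item_rates[item_rates < 0.5])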
Results The mean PRISMA total scores for the pediatric obesity, opioid use disorder, and ST-elevated myocardial infarction SRs across all CPGs were 16.9, 20.8, and 20.8, respectively. The mean AMSTAR total scores were 4.4, 8.8, and 6.1, respectively. Consistently underreported items on the PRISMA checklist were items 5 (protocol registration), 8 (search strategy), 15 (risk of bias for cumulative evidence), and 22 (risk of bias across studies). Consistently underreported items on the AMSTAR checklist were items 4 (duplicate extraction/validation), 5 (list of included/excluded studies), 8 (quality of evidence assessments), 10 (publication bias assessments), and 11 (conflict of interest disclosure). Altogether, our study included 150 SRs and 29 CPGs, with only 9 CPGs assigning grades to their recommendations. The 150 SRs were cited a total of 308 times: 95 times as direct evidence for graded recommendations, 21 times as direct evidence for nongraded recommendations, 189 times as supporting evidence, and 3 times for unclear reasons.

Conclusions These investigations into CPGs in pediatric obesity, opioid use disorder, and ST-elevated myocardial infarction revealed a consistent lack of overall methodological and reporting quality in the included SRs as well as heterogeneity in the use of grading scales, or lack thereof. Because SRs are considered by most to be level 1A evidence, an apparent lack of quality may impair clinical decision making and hinder the practice of evidence-based medicine. Items such as PRISMA items 15 and 22 and AMSTAR items 10 and 11 are of particular concern because these items ensure that bias assessments are performed and conflicts of interest are disclosed.

1Oklahoma State University Center for Health Sciences, Department of Analytical and Institutional Research, Tulsa, OK, USA, cole.wayant@okstate.edu

identified 185 published trials. Trials with missing protocols (n = 56), equivalence or noninferiority trials (n = 5), trials that accrued less than 40% of their intended sample size (n = 14), and trials that pooled their data with other studies (n = 2) were excluded. For trials reporting time-to-event outcomes with hazard ratios (HRs) (n = 81), we compared the proposed effect size from the sample size calculation in the research protocol with the observed effect size in the published article to calculate the ratio of observed-to-proposed HRs, overall and for trials that did or did not report a statistically significant effect on primary end points. All HRs were standardized for a reduction in adverse events such that HRs less than 1 indicated a benefit to therapy. We also compared findings with those previously reported for NCI trials conducted from 1955 to 2006 and tabulated studies that provided a reference, direct evidence, or other specific rationale for their proposed effect size in the research protocol.

Results Data on 98,200 patients from 108 clinical trials were evaluated. The most common cancers were breast, gynecologic, gastrointestinal, brain, and genitourinary malignant neoplasms. The most common primary end point was overall survival (40.7%). The median ratio of observed-to-proposed HRs was 1.26 (range: 0.33-2.34). The median ratio of observed-to-proposed HRs among trials that observed a statistically significant effect on the primary end point was 1.09 (range: 0.33-1.29) vs 1.30 (range: 0.86-2.34) for trials that did not, compared with 1.34 and 1.86, respectively, for NCI trials conducted from 1955 to 2006. Twenty-four trials (22.2%) observed a statistically significant effect on the primary end point favoring the experimental treatment, compared with 24.6% previously reported. The majority of trials (76.9%) provided no rationale for the magnitude of the proposed treatment effect.
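The observed-to-proposed hazard ratio comparison described in this abstract can be sketched as follows; the trial table and column names are hypothetical, and HRs are assumed to already be standardized so that values less than 1 indicate benefit.

import pandas as pd

# Hypothetical trial table: the HR assumed in the protocol's sample size
# calculation and the HR observed in the publication.
trials = pd.DataFrame({
    "proposed_hr": [0.70, 0.75, 0.67, 0.80, 0.72],
    "observed_hr": [0.85, 0.74, 1.05, 0.78, 0.95],
    "significant_primary_end_point": [False, True, False, True, False],
})

# A ratio above 1 means the observed effect was weaker than the effect the
# sample size calculation assumed.
trials["ratio"] = trials["observed_hr"] / trials["proposed_hr"]

print("median ratio, all trials:", trials["ratio"].median())
print(trials.groupby("significant_primary_end_point")["ratio"].median())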
for Journalology, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada; 4School of Epidemiology, Public Health, and Preventive Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; 5Knowledge Synthesis Group, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada; 6Department of Medicine, University of Valencia/INCLIVA Health Research Institute and CIBERSAM, Valencia, Spain; 7Knowledge Translation Program, Li Ka Shing Knowledge Institute, St Michael's Hospital, Toronto, Ontario, Canada; 8Epidemiology Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada

Moher are Peer Review Congress Advisory Board Members but were not involved in the review or decision for this abstract.

Introducing Reporting Guidelines and Checklists for Contributors to Radiology: Results of an Author and Reviewer Survey
Marc Dewey,1 Deborah Levine,2,3 Patrick M. Bossuyt,4 Herbert Y. Kressel5,6

Objective Numerous reporting guidelines have been

Design Cohort study of authors of original research submissions to Radiology between July 5, 2016, and June 1, 2017, and of reviewers who had performed reviews since January 2016. Authors were asked to complete an anonymized online survey within 2 weeks of manuscript submission but before the editorial decision was made. Reviewers were surveyed with similar questions from May 17, 2017, until June 1, 2017.

Results A total of 831 of 1391 authors (59.7%) completed the survey within a mean (SD) of 1.5 (2.7) days (range, 0-17 days) of the request. Consistent with the types of studies submitted to Radiology, most authors used STROBE (447 of 829 authors [53.9%]) or STARD (313 authors [37.8%]) and only a small minority used CONSORT (40 authors [4.8%]) or PRISMA (29 authors [3.5%]). Only 120 of 821 authors (14.6%) used the guideline and checklist when designing the study, more so for PRISMA users (16 of 29 [55%]), less so for STARD users (52 of 310 [16.8%]; P < .001) and STROBE users (46 of 443 [10.4%]; P < .001). The guidelines were used by 189 of 821 authors (23.0%) when writing the manuscript; these authors more often reported an impact on the final manuscript (107 of 189 [56.6%]) compared with those who used the guideline when submitting the manuscript (95 of 272 [34.9%]; P < .001) or when the checklist was requested by the editorial office (41 of 240 [17.1%]; P < .001) (Figure 3). Filling out the checklist was considered very useful by 256 of 819 authors (31.3%), somewhat useful by 390 (47.6%), not very useful by 122 (14.9%), and not at all useful by 51 (6.2%). The response rate of reviewers was 32.1% (259 of 808 reviewers). The checklist was used by 200 of 259

Figure 3. Impact on Manuscripts Depending on When Guideline and Checklist Were Used. [Bar chart of Authors, % reporting an impact, by when the guideline or checklist was used: Guideline Used When Writing, Guideline Used When Submitting, Checklist Requested by Editorial Office.] Significantly more authors who used the guidelines and checklist when writing the manuscript reported an impact on the final manuscript (107 of 189 authors [56.6%]) compared with those who used the guideline when submitting the manuscript (95 of 272 authors [34.9%]; P < .001) or when the checklist was requested by the editorial office (41 of 240 authors [17.1%]; P < .001). Error bars show 95% CIs.

Conclusions Almost 4 of 5 authors and half of the reviewers judged the guideline checklists to be useful or very useful. Using the guidelines while writing the manuscript was associated with greater impact on the final manuscript.

1Department of Radiology, Charité-Universitätsmedizin Berlin, Berlin, Germany, marc.dewey@charite.de; 2Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA; 3Senior Deputy Editor, Radiology, Boston, MA, USA; 4Amsterdam Public Health Research Institute, Department of Clinical Epidemiology, Biostatistics, and Bioinformatics, University of Amsterdam, Amsterdam, the Netherlands; 5Department of Radiology, Harvard Medical School, Boston, MA, USA; 6Editor, Radiology, Boston, MA, USA

Conflict of Interest Disclosures: Marc Dewey was an Associate Editor of Radiology at the time of initiation of this study and is a consultant to the editor now. Debbie Levine is Senior Deputy Editor of Radiology. Patrick Bossuyt is the lead senior author of the STARD guidelines and checklist. Herbert Y. Kressel is the Editor in Chief of Radiology and an author of the STARD 2015 guidelines.

Funding/Support: This study was supported by the Young Leaders Club program of the International Society of Strategic Studies in Radiology and the Heisenberg Program of the German Research Foundation.

Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the abstract.
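The abstract does not state which test produced the quoted P values; as one plausible reading, the comparison of 107 of 189 vs 95 of 272 authors reporting an impact can be checked with a chi-square test on a 2x2 table, as sketched below.

from scipy.stats import chi2_contingency

# 2x2 table: impact vs no impact on the final manuscript, for authors who
# used the guideline when writing vs only when submitting.
table = [[107, 189 - 107],   # used when writing
         [95, 272 - 95]]     # used when submitting
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.1f}, P = {p:.1e}")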
Table 11. Reported Use of Standard Reporting Guidelines Among JNCI Authors, Editorial Outcomes, and Reviewer Ratings for Adherence to Guidelines and Clarity of Presentation for Articles, Reviews, Mini-Reviews, Systematic Reviews, Meta-analysis, and Brief Communicationsa
All submissions: reported using a SRG, 2209 (100), 1065 (48.2), 1144 (51.8), 0.52 (0.5); adherence to reporting guidelines, 1033 (100), 552 (53.4), 481 (46.6), 3.0 (1.6); clarity of presentation, 1036 (100), 487 (47.0), 549 (53.0), 3.3 (1.1)
Rejected after peer review: reported using a SRG, 255 (11.5), 88 (4.0), 167 (7.6), 0.53 (0.5), P = .68; adherence to reporting guidelines, 609 (58.9), 343 (33.2), 266 (25.7), 2.9 (1.6); clarity of presentation, 608 (58.7), 340 (32.8), 268 (27.6), 3.1 (1.1)
Not rejected after peer review: reported using a SRG, 141 (6.4), 102 (4.6), 39 (1.8), 0.49 (0.5), P = .47; adherence to reporting guidelines, 424 (41.0), 209 (20.2), 215 (20.8), 3.2 (1.6), P = .004; clarity of presentation, 428 (41.3), 219 (21.1), 209 (20.2), 3.6 (1.0), P < .001
Impact of an Intervention to Improve Compliance With the ARRIVE Guidelines for the Reporting of In Vivo Animal Research
Emily Sena,1 for the Intervention to Improve Compliance With the ARRIVE Guidelines (IICARus) Collaborative Group

Objective To conduct a randomized controlled trial to determine whether journal-mandated completion of an ARRIVE checklist (requiring authors to state on which page of their manuscript each checklist item is met) improves full compliance with the ARRIVE guidelines.

Design Manuscripts submitted to PLOS One between March 2015 and June 2015 determined in the initial screening process to describe in vivo animal research were randomized to either mandatory completion and submission of an ARRIVE checklist or the normal editorial processes, which do not require any checklist submission. The primary outcome was between-group differences in the proportion of studies that comply with the ARRIVE guidelines. We used online randomization with minimization (weighted at 0.75) according to country of origin; this was performed by the journal during technical checks after submission. Authors, academic editors, and peer reviewers were blinded to the study and the allocation. Accepted manuscripts were redacted for information relating to the ARRIVE checklist by an investigator who played no further role in the study to ensure outcome adjudicators were blinded to group allocation. We performed outcome adjudication in duplicate by assessing manuscripts against an operationalized version of the ARRIVE guidelines that consists of 108 items. Discrepancies are being resolved by a third independent reviewer.
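A generic sketch of minimization randomization weighted at 0.75 by country of origin (not the IICARus implementation) could look like this; the arm labels and helper name are illustrative.

import random
from collections import defaultdict

def allocate(country, counts, weight=0.75, arms=("control", "checklist")):
    """Assign one manuscript to an arm: with probability `weight`, pick the
    arm that best balances the running totals within the manuscript's
    country of origin; otherwise (or on a tie) pick an arm at random."""
    totals = counts[country]
    balancing_arm = min(arms, key=lambda arm: totals[arm])
    if totals[arms[0]] == totals[arms[1]] or random.random() > weight:
        chosen = random.choice(arms)
    else:
        chosen = balancing_arm
    totals[chosen] += 1
    return chosen

counts = defaultdict(lambda: {"control": 0, "checklist": 0})
for country in ["UK", "UK", "US", "DE", "UK", "US"]:
    print(country, allocate(country, counts))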
Results We randomly assigned 1689 manuscripts, with 844 manuscripts assigned to the control arm and 845 assigned to the intervention arm. Of these, 1299 (76.9%) were sent for review, and of these, 688 (53.0%) were accepted for publication. All 688 manuscripts were dual assessed, and reconciliation of discrepancies is ongoing. Agreement between reviewers was high in relation to questions of the species reported (93%) and measures to reduce the risk of bias (73%-91% for 6 questions) and lowest for reporting the unit of analysis (50%). Data analysis is ongoing. We will present data for between-group differences in the proportion of studies that comply with the ARRIVE guidelines, each of the 38 subcomponents of the ARRIVE checklist, each of the 108 items, and the proportion of submitted manuscripts accepted for publication.

Conclusions Our study will determine the effect of an alteration of editorial policy to include a completed ARRIVE checklist with submissions on compliance with the ARRIVE guidelines in the work when published. These results will inform the future development and further implementation of the ARRIVE guidelines.

Conflict of Interest Disclosures: The study management committee included a representative from the Public Library of Science (Catriona MacCallum), but other than providing general advice during the design of the study and organizing the provision of PDFs of included manuscripts, they had no role.

Funding/Support: The Medical Research Council, National Centre for the Replacement Refinement and Reduction of Animals in Research, Biotechnology and Biological Sciences Research Council, and Wellcome Trust pooled resources without a normal grant cycle to fund this project.

Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the abstract. The funders used their social media streams to publicize the study and recruit outcome assessors. National Centre for the Replacement, Refinement, and Reduction of Animals in Research employees were not allowed to enroll as outcome assessors because of their possible conflict of interest as sponsors of the ARRIVE guidelines.

Group Members: The IICARus Collaborative group includes the following members: University of Edinburgh, Edinburgh, UK: Emily Sena, Cadi Irvine, Kaitlyn Hair, Fala Cramond, Paula Grill, Gillian Currie, Alexandra Bannach-Brown, Zsanett Bahor, Daniel-Cosmin Marcu, Monica Dingwall, Victoria Hohendorf, Klara Zsofia Gerlei, Victor Jones, Anthony Shek, David Henshall, Emily Wheater, Edward Christopher, and Malcolm Macleod; University of Tasmania, Hobart, Tasmania: David Howells; University of Nottingham, Nottingham, UK: Ian Devonshire and Philip Bath; Public Library of Science, Cambridge, UK: Catriona MacCallum; Imperial College London, London, UK: Rosie Moreland; Mansoura University, Mansoura, Egypt: Sarah Antar, Mona Hosh, and Ahmed Nazzal; University of New South Wales, Kensington, NSW, Australia: Katrina Blazek; Animal Sciences Unit, Animal and Plant Health Agency, Addlestone, UK: Timm Konold; University of Glasgow, Glasgow, UK: Terry Quinn and Teja Gregorc; AstraZeneca, Wilmington, Delaware, USA: Natasha Karp; Nuffield Research Placement Student, London, UK: Privjyot Jheeta and Ryan Cheyne; GlaxoSmithKline, Middlesex, UK: Joanne Storey; University College London, London, UK, and École Normale Supérieure, Paris, France: Julija Baginskaite; University Medical Center Utrecht, Utrecht, the Netherlands: Kamil Laban; University of Rome Sapienza, Rome, Italy: Arianna Rinaldi; Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands: Kimberley Wever; University of Southampton, Southampton, UK: Savannah Lynn; Federal University of Rio de Janeiro, Rio de Janeiro, Brazil: Evandro Araújo De-Souza; University of Birmingham, Birmingham, UK: Leigh O'Connor; Hospital Research Center of the Sacred Heart of Montreal, Montreal, QC, Canada: Emmanuel Charbonney; National Cancer Institute, Milano, Italy: Marco Cascella; Federal University of Santa Catarina, Florianópolis, Brazil: Cilene Lino de Oliveira; University of Geneva, Geneva, Switzerland: Zeinab Ammar; British American Tobacco, London, UK: Sarah Corke; Ministry of Health, Cairo, Egypt: Mahmoud Warda; Vita-Salute San Raffaele University, Milan, Italy: Paolo Roncon; University of Hertfordshire, Hertfordshire, UK: Daniel Baker; University of Veterinary Medicine Hanover, Hanover, Germany: Jennifer Freymann.
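The primary outcome of the IICARus trial above is a between-arm difference in the proportion of fully compliant studies; a minimal sketch of one way such a comparison could be run, with purely illustrative counts (the trial's analysis is still ongoing), follows.

from statsmodels.stats.proportion import proportions_ztest

compliant = [40, 25]    # illustrative: fully compliant manuscripts per arm
assessed = [345, 343]   # illustrative: manuscripts assessed per arm
stat, pval = proportions_ztest(compliant, assessed)
print(f"z = {stat:.2f}, P = {pval:.3f}")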
pre-FDAAA trials, but post-FDAAA trials were not significantly more likely to have been published (100% vs 90%; P = .06) nor to have been published with findings in agreement with the FDA's interpretation (98% vs 93%; P = .28) (Figure 4). Subgroup analyses suggest that the changes in overall publication rate were primarily the consequence of publishing negative trials, as all pre-FDAAA and post-FDAAA positive trials were published (72 of 72 and 35 of 35, respectively), whereas 38% (5 of 13) of pre-FDAAA negative trials were published vs 100% (5 of 5) of post-FDAAA negative trials.

Figure 4. Pre- and Post-FDAAA Efficacy Trials Supporting Neuropsychiatric Drugs First Approved Between 2005 and 2014: Publication Status and Published Conclusion Concordance With FDA Decision. [Horizontal bar chart of No. of Trials (0-80) for positive, equivocal, and negative trials, pre-FDAAA and post-FDAAA, categorized as published and in agreement with the FDA decision, published but conflicting with the FDA decision, or not published.] FDA indicates the US Food and Drug Administration; FDAAA, the 2007 Food and Drug Administration Amendments Act.

Conclusions After FDAAA was enacted, all efficacy trials reviewed by the FDA as part of new drug applications for neuropsychiatric drugs were registered, with the results reported and published. Moreover, nearly all were published with interpretations that agreed with the FDA's interpretation. While our study was limited by searching for registration status only on ClinicalTrials.gov, our findings suggest that by mitigating selective publication and reporting of clinical trial results, FDAAA improved the availability of evidence for physicians and patients to make informed decisions regarding the care of neuropsychiatric illnesses.

1Yale School of Medicine, New Haven, CT, USA, constance.zou@yale.edu; 2Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA; 3McLean Hospital, Belmont, MA, USA; 4Harvard Medical School, Boston, MA, USA; 5Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA; 6Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA; 7Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, CT, USA; 8Robert Wood Johnson Foundation Clinical Scholars Program, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA; 9Department of Health Policy and Management, Yale School of Public Health, New Haven, CT, USA; 10Division of Medical Ethics, Department of Population Health, NYU School of Medicine, Bioethics International, New York, NY, USA

Conflict of Interest Disclosures: Constance Zou has received a fellowship through the Yale School of Medicine from the National Heart, Lung, and Blood Institute. Jennifer Miller has received research support through New York University from the Laura and John Arnold Foundation to support the Good Pharma Scorecard. Harlan Krumholz and Joseph Ross have received research support through Yale University from Johnson and Johnson to develop methods of clinical trial data sharing, from Medtronic and the US Food and Drug Administration (FDA) to develop methods for postmarket surveillance of medical devices, and from the US Centers for Medicare and Medicaid Services to develop and maintain performance measures that are used for public reporting. Harlan Krumholz also has received compensation as a member of the Scientific Advisory Board for United Healthcare. Joseph Ross has received research support through Yale University from the FDA to establish a Center for Excellence in Regulatory Science and Innovation at Yale University and the Mayo Clinic, from the Blue Cross Blue Shield Association to better understand medical technology evaluation, and from the Laura and John Arnold Foundation to support the Collaboration on Research Integrity and Transparency at Yale University.

Evaluation of the ClinicalTrials.gov Results Database and Its Relationship to the Peer-Reviewed Literature
Deborah A. Zarin,1 Tony Tse,1 Rebecca J. Williams,1 Thiyagu Rajakannan,1 Kevin M. Fain1

Objective As of February 22, 2017, ClinicalTrials.gov contained summary results for 24,377 studies and received 160 new submissions weekly. We estimate that US academic medical centers are required to report more than half of their sponsored trials to ClinicalTrials.gov under federal policies. We previously estimated that one-half of registered studies with results posted on ClinicalTrials.gov lacked results publications. It is critical to continue assessing the degree to which this database meets its intended goals. The objective of this study was to assess the potential scientific impact of the ClinicalTrials.gov results database using our 2013 evaluation framework.

Design We analyzed 2 samples of ClinicalTrials.gov results data to assess the impact on the available evidence base.

Results On February 10, 2017, 10,464 of 24,251 posted results (43%) had links to PubMed. Because not all publications are automatically linked and not all linked publications report results, we manually examined a random sample of 100 sets of posted results listing study completion dates in 2014. Of these, 28 had at least 1 results publication prior to results posting, 15 had a results publication after results posting, and we could not identify results publications for 57 studies. We also identified examples of how publications leveraged the information on ClinicalTrials.gov. To further examine the potential impact on selective publication, we evaluated drug-condition-sponsor families. alendronate for osteoporosis). Ideally, summary results for all trials in all families would be publicly available. As of December 1, 2014, of 329 trials, 109 (33%) had results posted on ClinicalTrials.gov only, 42 (13%) available from PubMed only, 81 (25%) available from both, and 97 (29%) in neither (Table 12). Overall, 45 of the 96 drug-condition-sponsor families had results available for all 144 trials from at least 1

Table 12. ClinicalTrials.gov Results Disclosure [column groups: ClinicalTrials.gov only (n = 109); PubMed only (n = 42); both ClinicalTrials.gov and PubMed (n = 81); neither ClinicalTrials.gov nor PubMed (n = 97); total (n = 232); row characteristics: interventional model, masking, allocation, No. of sites]
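The drug-condition-sponsor family tabulation described in the Results can be sketched as follows; the family labels and availability flags are hypothetical, not ClinicalTrials.gov data.

import pandas as pd

# Hypothetical trials grouped into drug-condition-sponsor families, with
# flags for whether results are posted on ClinicalTrials.gov or in PubMed.
trials = pd.DataFrame({
    "family": ["A", "A", "B", "B", "C"],
    "ctgov":  [True, False, True, True, False],
    "pubmed": [False, False, True, True, False],
})

def category(row):
    if row["ctgov"] and row["pubmed"]:
        return "both"
    if row["ctgov"]:
        return "ClinicalTrials.gov only"
    if row["pubmed"]:
        return "PubMed only"
    return "neither"

print(trials.apply(category, axis=1).value_counts())

# Families in which every trial has results from at least 1 source
covered = (trials["ctgov"] | trials["pubmed"]).groupby(trials["family"]).all()
print("families with full coverage:", int(covered.sum()), "of", len(covered))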
reviewers (applicant vs SNSF), country of affiliation of reviewers (Switzerland vs international), and gender of applicants and reviewers. We fit a multivariable linear regression model adjusting for all of these variables plus calendar year of submission, discipline (21 disciplines), and applicant's age (5 age classes) and affiliation (4 institution types).

Results Between 2009 and 2015, 36,993 reviewers assessed 12,132 applications for the SNSF. The mean (SD) score of reviewers proposed by applicants (n=8308) was 5.12 (1.01) vs 4.47 (1.25) for reviewers proposed by the SNSF (n=26,594). Mean (SD) scores were 4.19 (1.27) for Swiss experts (n=8399) vs 4.76 (1.19) for international experts (n=26,503); 4.44 (1.25) for female (n=7121) vs 4.67 (1.22) for male (n=27,781) principal applicants; and 4.48 (1.26) for reviews from female (n=6933) vs 4.66 (1.22) from male (n=27,969) reviewers. In adjusted analyses, the gender differences were attenuated, whereas the other differences changed little (Table 13). All differences were statistically significant.

Conclusions Applications received higher scores from applicant-proposed reviewers and lower scores from Swiss-based experts. Scores were lower for applications submitted by female applicants. Our results are compatible with a positive bias of reviewers chosen by the applicant, or a negative bias of experts based in Switzerland, and cannot exclude bias against female applicants. Interestingly, female reviewers consistently scored applications lower than male reviewers, independent of the applicant's gender. Panels making funding decisions should be aware of these potential biases. Given the association between scores and source of reviewer, the SNSF no longer accepts reviewers proposed by the applicants.

1Swiss National Science Foundation, Bern, Switzerland, joao.martins@snf.ch

Conflict of Interest Disclosures: The authors are employees of the Swiss National Science Foundation.

Table 13. Unadjusted and Adjusted Differences in Scores Assigned by Reviewers of Grant Applications Submitted to the Swiss National Science Foundation
Values are differences (95% CI),a unadjusted and adjusted.
Source of reviewer, applicant vs SNSF: unadjusted 0.65 (0.62 to 0.68); adjusted 0.52 (0.49 to 0.55)
Affiliation of reviewer, Switzerland vs international: unadjusted 0.56 (0.59 to 0.53); adjusted 0.50 (0.53 to 0.47)
Gender of applicant, female vs male: unadjusted 0.23 (0.19 to 0.26); adjusted 0.09 (0.06 to 0.12)
Gender of reviewer, female vs male: unadjusted 0.17 (0.13 to 0.21); adjusted 0.07 (0.03 to 0.10)
a All unadjusted P values from t tests <.001. Adjusted results from a linear regression model adjusted for calendar year of submission, discipline, and applicant's age and affiliation; all adjusted P values <.001.
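A minimal sketch of the kind of adjusted comparison reported in Table 13 (not the SNSF analysis code), fitted on synthetic data with illustrative column names; the published model additionally adjusts for discipline and for applicant age class and institution type.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
reviews = pd.DataFrame({
    "source": rng.choice(["applicant", "snsf"], n),
    "reviewer_country": rng.choice(["CH", "international"], n),
    "applicant_gender": rng.choice(["female", "male"], n),
    "reviewer_gender": rng.choice(["female", "male"], n),
    "year": rng.choice(np.arange(2009, 2016), n),
})
reviews["score"] = rng.normal(4.6, 1.2, n).clip(1, 6)

# Discipline, age class, and institution type are omitted here for brevity.
model = smf.ols(
    "score ~ C(source) + C(reviewer_country) + C(applicant_gender)"
    " + C(reviewer_gender) + C(year)",
    data=reviews,
).fit()
print(model.params)  # each coefficient is an adjusted difference in score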
Stakeholder Perceptions of Peer Review at the National Institutes of Health Center for Scientific Review
Mary Ann Guadagno,1 Richard K. Nakamura1

Objective To identify best practices for the successful peer review of grant applications and areas for improvement at the National Institutes of Health (NIH) Center for Scientific Review (CSR), the following questions guided an evaluation study: (1) to what extent are current CSR practices for peer review optimal for achieving its mission? and (2) what are the areas of success and improvement in the quality of peer review?

Design Pilot assessments were conducted to develop a short Quick Feedback survey instrument with four 7-point Likert-type scale statements ranging from strongly agree to strongly disagree, measuring key features of peer review, and an open text box for comments. During 1 grant cycle between 2015-2016 and 2016-2017, 2 surveys were sent to 10,262 and 10,228 reviewers, respectively, in all CSR study sections. In 2015, a survey was sent to 916 NIH Program Officers (POs), and a replication survey was sent in 2016 to 905 POs. During 2015, 27 focus groups were conducted with 4 stakeholder groups, and 10 personal interviews were completed with NIH Institute Directors. Focus group participants were selected from NIH databases to ensure diversity. Interrater reliability between coders was 95.8%.

Results The 2015-2016 reviewer survey yielded a response rate of 47.1% (4832 of 10,262), and the 2016-2017 reviewer survey yielded a response rate of 47.0% (4807 of 10,228). The 2015 PO survey had a response rate of 38.0% (348 of 916), and the 2016 replication PO survey yielded a response rate of 37.0% (335 of 905). Nonrespondents were not substantially different from respondents. Quick Feedback surveys with reviewers in both years reported a high level of satisfaction with the peer review process. More than 80% of reviewers indicated they either strongly agreed or agreed that panels were doing a good job in terms of scoring and discussion and CSR did a good job relative to the quality of the rosters and assignments (Figure 5). Program Officers were less favorable than reviewers in both years, with only 43% to 57% of POs responding favorably. Program Officers' dissatisfaction with review meetings focused on insufficient reviewer expertise in general and technical and logistical challenges at meetings more specifically. Focus group results supported these findings. Areas for improvement included reducing the burden of peer review for all stakeholders, technical and logistical issues during meetings, need for clearer communication, and more guidance on preparing applications.

Conclusions A comprehensive evaluation using systematic surveys, focus groups, and interviews has resulted in useful suggestions for improving best practices for peer review by stakeholders in real time. Areas of success and suggestions for
improvements by stakeholders are being addressed by leadership.

Figure 5. [Bar chart of the percentage of respondents answering strongly agree or agree,a by survey (2015-2016 reviewer survey,b 2016-2017 reviewer survey, 2015 PO survey, 2016 PO survey) for quality of collective prioritization, assignments expertise, quality of discussion, review of program applications using IAM, and review of program applications using VAM.] The 2015-2016 reviewer survey yielded a response rate of 47.1% (4832 of 10,262), and the 2016-2017 reviewer survey yielded a response rate of 47.0% (4807 of 10,228). The 2015 Program Officer (PO) survey had a response rate of 38.0% (348 of 916), and the 2016 replication PO survey had a response rate of 37.0% (335 of 905). IAM indicates Internet-assisted meeting; VAM, video-assisted meeting.
a Strongly agree or agree refers to a 1 or 2, respectively, as assessed on a 7-point Likert-type scale.
b IAM reviewers not included in 2016.

alternative methods of prioritizing applications could reduce the number of tied scores or increase ranking dispersal.
[Figure: A, distribution of priority scores (10-90) as a percentage of all scores, by review cycle; B, distribution of original vs half-point scores.] A, Distribution of final scores for grant applications as a percent of all scores (of 32,586 applications). Each application received scores from many reviewers that were multiplied by 10 and averaged to the nearest unit. Possible final scores for each application ranged from 10 to 90. Dates refer to the cycle of review and the number is the quantity of applications with scores. In January 2016, 10,571 applications received scores; in May 2016, 11,350 applications received scores; and in October 2016, 10,665 applications received scores. B, Comparison of distribution of original average scores (red) with scores for which reviewers were allowed to add or subtract a half point (blue). Average scores are rounded to the nearest digit to establish ranking. Original refers to the proportion of scores at each possible score level under normal whole digit scoring. Half point refers to the proportion of scores at each possible score level when reviewers used whole digits plus or minus 1 half point.