Medical Statistics: Clinical Trials
NRJF, University of Sheffield, 2011/12 Semester

Contents
Preliminaries
0: Introduction
1: Background & Basic Concepts
2: Basic Trial analysis
3: Randomization
4: Protocol Deviations
5: Size of the Trial
6: Multiplicity & Interim Analysis
7: Crossover Trials
8: Combining Trials
9: Binary Response Data
10: Comparing Methods of Measurement

Multiplicity &c.
Multiplicity & Interim Analysis
Books
  Andersen, B. (1990) Methodological Errors in Medical Research. Blackwell
Papers
  ICH E9 Expert Working Group. (1999) Statistics in Medicine, 18, 1905-42.
  Philips, Alan & Haudiquet, Vincent (2003) Statistics in Medicine, 22, 1-11.
if many separate significance tests then difficult to control overall risk of declaring at least one false positive somewhere

useful device to try if pressed to perform post-hoc subgroup analysis (c.f. Richard Peto)
Example:
  25 tests are to be performed
  overall level of 1% intended
  so each should be run at a nominal level of 1%/25 = 0.04%
  i.e. a result should not be claimed unless p < 0.0004 in any one of them

Example:
  12 tests have been performed
  smallest p-value is 0.019
  What is the overall level of significance?
  Bonferroni method says overall level is 12 × 0.019 = 0.228
  This is the Signs of the Zodiac example
  i.e. no worthwhile evidence of any birth sign being particularly suited to dieting
  (see again later)
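
The two Bonferroni calculations in these examples (25 planned tests at an overall level of 1%, and 12 completed tests with smallest p-value 0.019) can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative and do not come from the notes.

```python
def bonferroni_nominal_level(overall_level, n_tests):
    """Nominal level for each test so that the overall level is at most overall_level."""
    return overall_level / n_tests


def bonferroni_overall_p(smallest_p, n_tests):
    """Bonferroni bound on the overall p-value, capped at 1."""
    return min(1.0, n_tests * smallest_p)


# 25 tests planned, overall level of 1% intended: compare with 0.0004 in the example
print(round(bonferroni_nominal_level(0.01, 25), 6))

# 12 tests performed, smallest p-value 0.019: compare with 0.228 in the example
print(round(bonferroni_overall_p(0.019, 12), 3))
```

The cap at 1 matters only when the number of tests times the smallest p-value exceeds 1, which does not happen in either example.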
Remedies:
  Bonferroni correction
  choose primary outcome measure
  multivariate analysis

NB: Bonferroni very conservative
  multiple outcome measures likely to be highly correlated
  standing systolic BP will give similar evidence to sitting BP

Very frustrating if you had considered 20 highly correlated measures
  each gives nominal p-value of 0.01
  Bonferroni says can only claim an overall p-value of 0.2
  Would have been better not to have measured the other 19
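
How conservative Bonferroni becomes for highly correlated measures can be illustrated by simulation. The sketch below makes assumptions that are not in the notes: 20 test statistics that are jointly normal with a common correlation (0.9 is a hypothetical value), two-sided tests, and no true effect. It estimates the chance of at least one false positive when each test is run at the nominal level 0.05/20.

```python
import random
from statistics import NormalDist


def bonferroni_overall_risk(n_tests=20, rho=0.9, alpha=0.05, n_sims=20000, seed=1):
    """Estimate P(at least one false positive) with each test at level alpha / n_tests.

    Assumes equicorrelated standard normal test statistics under the null.
    """
    rng = random.Random(seed)
    # Two-sided critical value for the Bonferroni nominal level
    z_crit = NormalDist().inv_cdf(1 - (alpha / n_tests) / 2)
    shared_weight = rho ** 0.5
    own_weight = (1 - rho) ** 0.5
    hits = 0
    for _ in range(n_sims):
        shared = rng.gauss(0, 1)  # component common to all measures
        if any(
            abs(shared_weight * shared + own_weight * rng.gauss(0, 1)) > z_crit
            for _ in range(n_tests)
        ):
            hits += 1
    return hits / n_sims


risk_independent = bonferroni_overall_risk(rho=0.0)
risk_correlated = bonferroni_overall_risk(rho=0.9)
print("independent measures:", risk_independent)
print("highly correlated measures:", risk_correlated)
```

Under these assumptions the overall risk for independent measures sits close to the intended 5%, while for the correlated measures it falls well below it, which is the sense in which the correction is conservative.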
Cautionary Examples
ref: Basic Clin Med 1981, 15: 445
  double-blind controlled clinical trial to treat rheumatoid arthritis
  several end-points repeated at various timepoints and various subdivisions
  850 pairwise comparisons were made
  t-tests and Fisher's exact test
  48 of these gave p-values < 0.05
  But expect 5% of 850 = 850/20 = 42.5
  so finding 48 is not very impressive

Andersen quotes The Lancet (1984, ii: 1457)
  "Moreover, submitting a larger number of factors to statistical examination not only improves your chances of a positive result but also enhances your reputation for diligence"
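
The arithmetic for the 850 comparisons can be taken one step further by asking how likely 48 or more nominally significant results would be if no real effects existed. The sketch below assumes the 850 tests are independent, which is an assumption made here for illustration only (the comparisons in the trial share end-points and patients, so it will not hold exactly). The function name is illustrative.

```python
from math import comb

n_tests = 850
alpha = 0.05
observed = 48

# Expected number of p-values below 0.05 if there are no real effects
expected = n_tests * alpha


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    lower = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))
    return 1.0 - lower


tail = binomial_upper_tail(observed, n_tests, alpha)
print("expected false positives:", expected)
print("P(48 or more by chance):", round(tail, 3))
```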
Example: Signs of Zodiac (see notes)
  p-value for differences in weight loss between Zodiac signs is 0.405
  [Figure: Boxplots of Weight loss by Zodiac sign (means are indicated by solid circles); signs shown: Aries, Pisces, Taurus, Leo, Cancer, Aquarius, Capricorn, Libra, Virgo, Scorpio, Sagittarius, Gemini]
  No evidence of difference so follow-up tests not really appropriate

Can also look at 12 separate p-values
  [Figure: Dotplot of p-values; axis: p-value, 0.0 to 0.9]
  If any evidence that some groups were shewing an effect then some of them would be clustered towards 0
            response rates
Analysis    CP       CVP      statistic & p-value
1           3/14     5/11     1.63 (p>0.20)
2           11/27    13/24    0.92 (p>0.30)
3           18/40    17/36    0.04 (p>0.80)
4           18/54    24/48    3.25 (0.05<p<0.1)
5           23/67    31/59    4.25 (0.025<p<0.05)

Conclusion:
  Not significant at end of trial (overall p>0.05) since p>0.016, the required nominal value for 5 repeat tests
  If NO interim analyses had been done then conclusion would have been different
  CVP declared significantly better at 5% level
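
The final analysis (23/67 on CP against 31/59 on CVP) can be reproduced as a check on the conclusion. The sketch below assumes the statistic is a Pearson chi-squared test on 1 degree of freedom without continuity correction; the notes do not name the test, and the function name is illustrative. The p-value is then compared with the unadjusted 5% level and with the nominal 0.016 quoted for 5 repeat tests.

```python
from math import erfc, sqrt


def chi2_two_proportions(r1, n1, r2, n2):
    """Pearson chi-squared (1 d.f., no continuity correction) comparing r1/n1 with r2/n2."""
    pooled = (r1 + r2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    stat = (r1 / n1 - r2 / n2) ** 2 / variance
    # Upper tail of chi-squared on 1 d.f., via the normal distribution
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


stat, p_value = chi2_two_proportions(23, 67, 31, 59)
print("statistic:", round(stat, 2))
print("p-value:", round(p_value, 3))
print("significant at unadjusted 5% level:", p_value < 0.05)
print("significant at nominal 0.016 for 5 looks:", p_value < 0.016)
```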
Cautionary Example:
ref: Br J Surg, (1974), 61: 177
  No significant difference with 49 patients
  The trial was therefore continued
  After 100 patients gave result χ² = 4.675, d.f. = 1, p < 0.05
  (and the trial was published)
  Actual nominal p-value is 0.031 > 0.029
  so cannot claim overall 5% significance

Continuing to collect data until a significant result is obtained is clearly dishonest
  eventually an apparently significant result will be obtained
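
The nominal p-value of 0.031 and the role of the 0.029 threshold can both be checked numerically. The sketch below first converts χ² = 4.675 on 1 d.f. to a p-value, then simulates a trial with no true effect that is analysed twice. The simulation assumes two equally sized stages and a normally distributed test statistic, which are simplifications introduced here (the trial itself looked at 49 and then 100 patients); the function name is illustrative.

```python
import random
from math import erfc, sqrt
from statistics import NormalDist

# Nominal p-value for chi-squared = 4.675 on 1 d.f.
nominal_p = erfc(sqrt(4.675 / 2))
print("nominal p-value:", round(nominal_p, 3))


def overall_level_two_looks(nominal, n_sims=100_000, seed=1):
    """Estimate the overall false positive rate when testing at both of two looks."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - nominal / 2)
    rejections = 0
    for _ in range(n_sims):
        z_first = rng.gauss(0, 1)    # statistic from the first half of the data
        z_second = rng.gauss(0, 1)   # statistic from the second half alone
        z_final = (z_first + z_second) / sqrt(2)  # statistic from all the data
        if abs(z_first) > z_crit or abs(z_final) > z_crit:
            rejections += 1
    return rejections / n_sims


naive = overall_level_two_looks(0.05)
adjusted = overall_level_two_looks(0.029)
print("overall level, each look at 0.05:", naive)
print("overall level, each look at 0.029:", adjusted)
```

Under these assumptions, testing at 0.05 on both occasions gives an overall level clearly above 5%, while using 0.029 at each look brings it back to about 5%.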
Miscellany
Post-hoc re-grouping
  Dangerous to combine small subgroups together after the data have been collected
Example:
  Death or survival in 90 days after heart attack
  65-69 age group combined with older or younger groups

             placebo           metoprolol
deaths       62/697 (8.9%)     40/698 (5.7%)     p<0.02
age 40-64    26/453 (5.7%)     21/464 (4.5%)     p>0.2
age 65-74    36/244 (14.8%)    19/234 (8.1%)     p=0.03
  Metoprolol better for elderly?
age 40-69    51/627 (8.1%)     32/629 (5.1%)     p=0.04
age 70-74    11/70 (15.7%)     8/69 (11.6%)      p>0.2
  Metoprolol better for younger?
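
The figures for the 65-69 age group are not given directly, but they can be recovered by subtraction from the groupings in the table. The sketch below does this in two ways (40-69 minus 40-64, and 65-74 minus 70-74) as a consistency check; the variable and function names are illustrative.

```python
# (deaths, patients) in each arm, as given in the table
age_40_64 = {"placebo": (26, 453), "metoprolol": (21, 464)}
age_65_74 = {"placebo": (36, 244), "metoprolol": (19, 234)}
age_40_69 = {"placebo": (51, 627), "metoprolol": (32, 629)}
age_70_74 = {"placebo": (11, 70), "metoprolol": (8, 69)}


def subtract(larger, smaller):
    """Deaths and patients left in each arm after removing a nested subgroup."""
    return {
        arm: (larger[arm][0] - smaller[arm][0], larger[arm][1] - smaller[arm][1])
        for arm in larger
    }


age_65_69_from_younger = subtract(age_40_69, age_40_64)
age_65_69_from_older = subtract(age_65_74, age_70_74)

for arm, (deaths, patients) in age_65_69_from_younger.items():
    print(f"age 65-69, {arm}: {deaths}/{patients} ({100 * deaths / patients:.1f}%)")
print("both derivations agree:", age_65_69_from_younger == age_65_69_from_older)
```

The same 65-69 patients appear in whichever combined group reaches nominal significance, which is why the two re-groupings point to opposite conclusions.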
Summary and Conclusions
Multiplicity can arise in
  testing several different responses
  subgroup analyses
  interim analyses
  repeated measures
  &c.
The effect of multiplicity is to increase the overall risk of a false positive (i.e. the overall significance level)

Problems of multiplicity can be overcome by
  Bonferroni corrections
    Bonferroni typically very conservative
  other adjustments in special cases
    e.g. for accumulating data in interim analyses where adjusting for multiplicity can have counter-intuitive effects
  more sophisticated analyses
    e.g. ANOVA or multivariate methods

If you torture the data often enough it will eventually confess
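
The increase in the overall significance level can be made concrete for the simplest case. The sketch below assumes the tests are independent and each is run at a nominal 5% level, in which case the chance of at least one false positive among k tests is 1 - (1 - 0.05)^k. Independence is an assumption made here for illustration, and the function name is not from the notes.

```python
def overall_false_positive_risk(nominal, n_tests):
    """P(at least one false positive) for n_tests independent tests at the nominal level."""
    return 1 - (1 - nominal) ** n_tests


for k in (1, 5, 12, 20, 25):
    risk = overall_false_positive_risk(0.05, k)
    print(f"{k:2d} tests at nominal 5%: overall risk {risk:.3f}")
```

For correlated tests the overall risk is smaller than this, but it still exceeds the nominal level whenever more than one test is run.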