Clinical Slides 6

MAS6012/MAS461/MAS361 Medical Statistics: Clinical Trials
Contents
  Preliminaries
  0: Introduction
  1: Background & Basic Concepts
  2: Basic Trial analysis
  3: Randomization
  4: Protocol Deviations
  5: Size of the Trial
  6: Multiplicity & Interim Analysis
  7: Crossover Trials
  8: Combining Trials
  9: Binary Response Data
  10: Comparing Methods of Measurement

Multiplicity &c.
  Multiplicity & Interim Analysis
  Books
    Andersen, B. (1990) Methodological Errors in Medical Research. Blackwell
  Papers
    ICH E9 Expert Working Group. (1999) Statistics in Medicine, 18, 1905-42.
    Philips, Alan & Haudiquet, Vincent (2003) Statistics in Medicine, 22, 1-11

NRJF, University of Sheffield, 2011/12 Semester

Multiplicity arises in
  Multiple end-points
  Subgroup analyses
  Interim testing
  Repeated Measures
  &c.
Problem of repeated significance tests
  May inflate risk of false positive
  i.e. overall significance level

Example: Effect of new dietary control regime.
  Data: 250 subjects:
  Weight loss at end of week.
  Data in kg.
  Paired t-test gives p-value of 0.067
  Not quite significant at the 5% level!

Can anything be done to squeeze a significant result out of this expensive study ?????
  We've been told we cannot change our mind and use a one-sided test instead!
  Subgroup data by Sign of the Zodiac:
    p-value for Aries is 0.019
      With mean weight loss of 0.5kg
    p-value for Taurus is 0.099
      With mean weight gain of 0.3kg
Conclusions:
  Diet successful for those under Aries
  Taurus subjects are perverse

Clearly a False Positive Result
  Fallacy arises because of selecting most significant result.
  (data are artificial, but not very)
  useful device to try if pressed to perform post-hoc subgroup analysis (c.f. Richard Peto)

Statistical tests make mistakes
  Declare a real difference exists but in fact the observed difference is due to natural chance variation
  Risk controlled for each individual single test
    significance level of the test or the p-value
  if many separate significance tests then difficult to control overall risk of declaring at least one false positive somewhere

5% test then 95% chance of no mistake
Two 5% tests then 95% × 95% (= 90.25%) chance of no mistake on either
So 10% risk of one or other (or both) giving a false positive
i.e. overall significance level is ~10%

c.f. Normal Ranges in clinicochemical tests
  "A normal person is one who has not been sufficiently investigated."
  A normal range comprises 95% of values
  100 normal persons evaluated then only 95 of them will be normal
  If then subjected to another independent test only 90 will remain as normal

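The arithmetic for one and two 5% tests generalises: with k independent tests, each at nominal level α and every null hypothesis true, the chance of at least one false positive is 1 − (1 − α)^k. A minimal Python sketch, assuming independence between the tests (the function name is illustrative and not from the notes):

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false positive among k independent tests,
    each run at nominal level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


# One, two and ten tests at the nominal 5% level
for k in (1, 2, 10):
    rate = familywise_error_rate(0.05, k)
    print(f"{k:2d} tests at 5%: overall level = {rate:.4f}")
```

Two tests give 0.0975 (the "~10%" above) and ten tests give about 0.40.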
Multiplicity: Bonferroni correction
  10 independent tests at [nominal] 5%
  H0 true in all (i.e. no difference)
  Chance of rejecting at least 1 is 40%
  SO reduce nominal level in each to control overall significance level

  k tests, want overall level to be α
  Take nominal level on each test as α/k
  Example:
    5 separate tests
    Overall 5% level of significance wanted
    Declare a result if any test nominally significant at the 5%/5 = 1% significance level

Example:
  25 tests are to be performed
  overall level of 1% intended
  so each should be run at a nominal level of 1%/25 = 0.04%
  i.e. a result should not be claimed unless p < 0.0004 in any one of them

Example:
  12 tests have been performed
  smallest p-value is 0.019
  What is the overall level of significance?
  Bonferroni method says overall level is 12 × 0.019 = 0.228
  This is the Signs of the Zodiac example
  i.e. no worthwhile evidence of any birth sign being particularly suited to dieting
  (see again later)

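The Bonferroni examples can be checked with a few lines of Python. This is a sketch only; the function names are illustrative, and the last function assumes the tests are independent, which the Bonferroni bound itself does not need.

```python
def bonferroni_nominal_level(overall_alpha: float, k: int) -> float:
    """Nominal level for each of k tests so that the overall level is at most overall_alpha."""
    return overall_alpha / k


def bonferroni_overall_p(smallest_p: float, k: int) -> float:
    """Bonferroni overall p-value from the smallest of k nominal p-values (capped at 1)."""
    return min(1.0, k * smallest_p)


def independent_overall_p(smallest_p: float, k: int) -> float:
    """Overall p-value for the smallest of k p-values, ASSUMING independent tests."""
    return 1.0 - (1.0 - smallest_p) ** k


print(f"5 tests, overall 5%:  nominal level {bonferroni_nominal_level(0.05, 5):.4f}")
print(f"25 tests, overall 1%: nominal level {bonferroni_nominal_level(0.01, 25):.4f}")
print(f"Zodiac, Bonferroni:   overall p = {bonferroni_overall_p(0.019, 12):.3f}")
print(f"Zodiac, independent:  overall p = {independent_overall_p(0.019, 12):.3f}")
```

For the Zodiac example the independence calculation gives about 0.21, slightly below the Bonferroni figure of 0.228; either way there is no worthwhile evidence.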
Bonferroni method typically very conservative
  i.e. less likely to be able to declare a real difference exists even if there is one
But is safe
  i.e. you preserve your scientific reputation by avoiding making mistakes
  but at expense of failing to discover something scientifically interesting

Multiple End-points
  e.g. pulse rate, systolic & diastolic blood pressure
    sitting, standing & supine
    before & after exercise
  Separate tests: high risk of false positives

Remedies:
  Bonferroni correction
  choose primary outcome measure
  multivariate analysis
NB: Bonferroni very conservative
  multiple outcome measures likely to be highly correlated
  standing systolic BP will give similar evidence to sitting BP

Very frustrating if you had considered 20 highly correlated measures
  each gives nominal p-value of 0.01
  Bonferroni says can only claim an overall p-value of 0.2
  Would have been better not to have measured the other 19


Better is to define primary outcome
  perhaps 2 or 3 secondary measures
Must be stated in the protocol
  medical expertise
  initial results from a pilot study
Other measures (e.g. lab results)
  should be scrutinised
  report causes for concern

Multivariate Analysis
  Makes proper allowance in the analysis for correlated observations
  There are multivariate equivalents of standard univariate statistical analyses
    Student's t-test → Hotelling's T²-test
    ANOVA → MANOVA (Multivariate Analysis of Variance)
      Wilks' test or Lawley-Hotelling test

Advantage of multivariate analysis
  handle all measures simultaneously
  return a single p-value
Disadvantage
  difficulty of interpreting the nature of the difference detected
Many MV procedures in stats packages
  Advice must be to use them with caution unless experienced help is to hand

Cautionary Examples
  ref: Br J Clin Pharmacol [Suppl.], 1983, 16: 103
  effect of midazolam on sleep
  table of 29 tests of significance on measures of platform balance made at various times
  repeated measures analysis (see later)

Cautionary Examples
  ref: Basic Clin Med 1981, 15: 445
  double-blind controlled clinical trial to treat rheumatoid arthritis
  several end-points repeated at various timepoints and various subdivisions
  850 pairwise comparisons were made
    t-tests and Fisher's exact test
  48 of these gave p-values < 0.05
  But expect 5% of 850 = 850/20 = 42.5
  so finding 48 is not very impressive

Andersen quotes The Lancet (1984, ii: 1457)
  "Moreover, submitting a larger number of factors to statistical examination not only improves your chances of a positive result but also enhances your reputation for diligence"

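The "not very impressive" verdict on 48 significant results out of 850 can be put on a rough numerical footing. The sketch below treats the 850 tests as independent with every null hypothesis true, an assumption that certainly fails for repeated, correlated end-points, so it is only a guide. The function name is illustrative.

```python
from math import comb


def binomial_upper_tail(n: int, p: float, x: int) -> float:
    """P(X >= x) for X ~ Binomial(n, p), computed as 1 - P(X <= x - 1)."""
    lower = sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(x))
    return 1.0 - lower


n_tests, alpha, observed = 850, 0.05, 48
expected = n_tests * alpha  # expected number of false positives if all nulls are true
tail = binomial_upper_tail(n_tests, alpha, observed)

print(f"expected false positives: {expected:.1f}")
print(f"P(at least {observed} nominally significant results by chance alone) = {tail:.2f}")
```

Under these assumptions a count of 48 or more is quite an ordinary outcome when no real effects exist at all.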

Subgroup analyses
  Similar problems with subgroups
  Need to specify which subgroups of particular interest in protocol
  If none in particular then
    Bonferroni adjustment
    Analysis of Variance
    Follow-up tests for multiple comparisons
      Tukey's / Dunnett's / Newman-Keuls / ...

One-way ANOVA
  generalisation to several samples of a two-sample t-test
  tests differences between subgroups
  tests null hypothesis that all subgroups have the same mean vs one or more is different
  If effect exhibited in only one of several subgroups then one (or more) of the subgroups is different from the rest
    so test this with ANOVA
  Follow-up tests to identify which is of interest

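As a reminder of what the one-way ANOVA computes, here is a minimal sketch of the F statistic (between-group mean square over within-group mean square). The three small groups are made-up numbers for illustration only, not data from the notes. The p-value needs the F distribution, which the Python standard library does not supply, so only the statistic and its degrees of freedom are returned.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means about the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations about their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical weight losses (kg) in three subgroups
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f"F = {f_stat:.2f} on {df_b} and {df_w} d.f.")
```

A single F test like this replaces the twelve separate subgroup tests of the Zodiac example; follow-up comparisons are only considered if it shows evidence of a difference.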
Example: Signs of Zodiac (see notes)
  p-value for differences in weight loss between Zodiac signs is 0.405
  [Figure: Boxplots of Weight loss by Zodiac sign (means are indicated by solid circles); signs shown: Aries, Pisces, Taurus, Leo, Cancer, Aquarius, Capricorn, Libra, Virgo, Scorpio, Sagittarius, Gemini]
  No evidence of difference so follow-up tests not really appropriate
  (but see example in notes)

Can also look at 12 separate p-values
  [Figure: Dotplot of p-values; axis: p-value, 0.0 to 0.9]
  If any evidence that some groups were shewing an effect then some of them would be clustered near 0.0 and not evenly spread out

Example (Lee et al, Circulation, 1980)
  1073 subjects randomized in two groups
  No overall significance
  6 [post-hoc] subgroups defined
  One of these produced significance at nominal 2.5% level (p=0.023)
  Medical reason for expecting this subgroup to be different

But:
  In fact, no difference between treatments
  All patients were treated in the SAME way
  Groups were just random allocations
  i.e. a false positive effect


Cautionary Example (see notes)
  ref: N Engl J Med 1978, 298: 647
  Complex study on age at presentation of European, black and Latino men & women in an anaemia study
  Needs a 3-way ANOVA to investigate interactions between gender and race and age.

Interim analyses
  Desirable in long trial
    Check protocol compliance
    Side effects?
    Feedback (maintains interest)
    Detect big effects quickly

However, multiplicity problems:
  Specify in protocol
  Adjust nominal significance levels
  Bonferroni too conservative (accumulating data)

Repeated significance tests on accumulating data
  Number of repeated tests     overall significance
  at the 5% level              level
      1                        0.05
      5                        0.14
     10                        0.19
    100                        0.37

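The inflation from repeated testing on accumulating data can be reproduced approximately by simulation. The sketch below makes several assumptions that are not stated in the notes: data arrive in equal-sized blocks, the test at each look is a two-sided z-test with known variance on all data so far, and there is no true difference. The function name, seed and number of simulations are arbitrary choices.

```python
import random
from math import sqrt


def overall_level(n_looks: int, n_sims: int = 20000,
                  z_crit: float = 1.959964, seed: int = 1) -> float:
    """Monte Carlo estimate of the chance that at least one of n_looks
    nominal 5% tests on accumulating data is significant under H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        running_sum = 0.0
        for look in range(1, n_looks + 1):
            # Each block contributes one standardised N(0, 1) increment
            running_sum += rng.gauss(0.0, 1.0)
            # z-statistic based on all data accumulated so far
            if abs(running_sum / sqrt(look)) > z_crit:
                rejections += 1
                break  # trial would stop at the first "significant" look
    return rejections / n_sims


for looks in (1, 5, 10):
    print(f"{looks:3d} looks: estimated overall level = {overall_level(looks):.3f}")
```

The estimates should land close to the tabulated 0.05, 0.14 and 0.19, up to simulation error. They are well below what independent tests would give because successive looks share most of their data.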
Nominal significance levels required to achieve overall level α
(N = number of repeated tests)
     N     α = 0.05    α = 0.01
     2     0.029       0.0056
     5     0.016       0.0028
    10     0.0106      0.0018

Comparison of drug combinations CP and CVP in non-Hodgkin's lymphoma.
  Measure: tumour shrinkage
  Trial: over 2 years, about 120 patients.
  Five interim analyses planned, roughly after every 25th result.
  Table gives numbers of successes and nominal p-values using a χ² test at each stage.


                response rates
  Analysis    CP       CVP      statistic & p-value
  1           3/14     5/11     1.63 (p>0.20)
  2           11/27    13/24    0.92 (p>0.30)
  3           18/40    17/36    0.04 (p>0.80)
  4           18/54    24/48    3.25 (0.05<p<0.1)
  5           23/67    31/59    4.25 (0.025<p<0.05)

Conclusion:
  Not significant at end of trial (overall p>0.05) since p>0.016, the required nominal value for 5 repeat tests
  If NO interim analyses had been done then conclusion would have been different
    CVP declared significantly better at 5% level

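The stage-by-stage statistics can be recomputed from the response counts. The sketch below assumes the statistic is the ordinary Pearson χ² for a 2×2 table without continuity correction (the notes do not say which form was used) and obtains the 1 d.f. p-value from the normal distribution via `math.erfc`. The 0.016 threshold is the nominal level for 5 repeated tests from the table of nominal significance levels; the function names are illustrative.

```python
from math import erfc, sqrt


def chi2_2x2(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """Pearson chi-squared statistic (no continuity correction) for a 2x2 table."""
    a, b = success_a, n_a - success_a
    c, d = success_b, n_b - success_b
    n = n_a + n_b
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_1df_pvalue(stat: float) -> float:
    """Upper tail of chi-squared on 1 d.f., i.e. P(|Z| > sqrt(stat)) for standard normal Z."""
    return erfc(sqrt(stat / 2.0))


# (CP successes, CP total, CVP successes, CVP total) at each interim analysis
stages = [(3, 14, 5, 11), (11, 27, 13, 24), (18, 40, 17, 36),
          (18, 54, 24, 48), (23, 67, 31, 59)]
NOMINAL_LEVEL_5_LOOKS = 0.016  # required nominal level for overall 5% with 5 tests

results = []
for stage, (s_cp, n_cp, s_cvp, n_cvp) in enumerate(stages, start=1):
    stat = chi2_2x2(s_cp, n_cp, s_cvp, n_cvp)
    p = chi2_1df_pvalue(stat)
    results.append((stat, p))
    verdict = "significant" if p < NOMINAL_LEVEL_5_LOOKS else "not significant"
    print(f"analysis {stage}: chi2 = {stat:.2f}, nominal p = {p:.3f} -> {verdict}")
```

With this formula the statistics at analyses 1, 2, 3 and 5 agree with the table. At analysis 4 it gives roughly 2.9 rather than 3.25, which still falls in the bracket 0.05<p<0.1 and leaves the conclusion unchanged.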
Cautionary Example:
  ref: Br J Surg, (1974), 61: 177
  No significant difference with 49 patients
  The trial was therefore continued
  After 100 patients gave result χ² = 4.675, d.f. = 1, p < 0.05
  (and the trial was published)
  Actual nominal p-value is 0.031 > 0.029
  so cannot claim overall 5% significance

Continuing to collect data until a significant result is obtained is clearly dishonest
  eventually an apparently significant result will be obtained

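The nominal p-value quoted for the published trial can be checked directly. A χ² variable on 1 d.f. is the square of a standard normal, so its upper tail follows from `math.erfc`. In this short sketch, 0.029 is the nominal level for two repeated tests at an overall 5% level, and the variable names are illustrative.

```python
from math import erfc, sqrt

chi2_stat = 4.675              # reported statistic after 100 patients, 1 d.f.
nominal_level_2_looks = 0.029  # required nominal level for overall 5% with 2 tests

# P(chi-squared on 1 d.f. > chi2_stat) = P(|Z| > sqrt(chi2_stat))
nominal_p = erfc(sqrt(chi2_stat / 2.0))

print(f"nominal p-value = {nominal_p:.3f}")
print("overall 5% significance can be claimed:", nominal_p < nominal_level_2_looks)
```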
Repeated Measures
  same feature on a patient measured at several time points
    blood concentration at baseline and at 1, 3, 6, 12 and 24 hours after drug
  Must not do t-tests at each time point
    diagrams with mean values of the two treatment groups plotted against time with error bars for each mean invite the eye to do exactly that

Remedies
  Bonferroni adjustments
    Very conservative (high correlation)
  Multivariate analysis
    Special techniques for this
  Construction of summary measures
    Area under curve
    Change from base line
    Mean change

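A summary measure reduces each patient's series to a single number, so the treatment groups can then be compared with one test instead of one per time point. A minimal sketch of the area-under-curve summary by the trapezoidal rule, using the sampling times from the slide and made-up concentrations for one hypothetical patient:

```python
def area_under_curve(times, values):
    """Trapezoidal-rule area under a patient's response curve."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        mean_height = (values[i] + values[i - 1]) / 2.0
        area += width * mean_height
    return area


hours = [0, 1, 3, 6, 12, 24]                      # baseline and hours after drug
concentration = [0.0, 4.0, 6.0, 5.0, 3.0, 1.0]    # hypothetical values, one patient

auc = area_under_curve(hours, concentration)
change_from_baseline = concentration[-1] - concentration[0]
print(f"AUC = {auc:.1f}, change from baseline at 24h = {change_from_baseline:.1f}")
```

Computing `auc` for every patient and then comparing the two groups' values with a single two-sample test avoids the multiplicity of testing at each time point.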

Miscellany
  Post-hoc re-grouping
    Dangerous to combine small subgroups together after the data have been collected
  Example:
    Death or survival in 90 days after heart attack
    65-69 age group combined with older or younger groups

               placebo           metoprolol
  deaths       62/697 (8.9%)     40/698 (5.7%)    p<0.02
  age 40-64    26/453 (5.7%)     21/464 (4.5%)    p>0.2
  age 65-74    36/244 (14.8%)    19/234 (8.1%)    p=0.03
    Metoprolol better for elderly?
  age 40-69    51/627 (8.1%)     32/629 (5.1%)    p=0.04
  age 70-74    11/70 (15.7%)     8/69 (11.6%)     p>0.2
    Metoprolol better for younger?

Multiple Regression
  large regression analyses
  many explanatory variables
    ordinary regression
    logistic regression for success/failure data
    Cox regression for survival data
  Need to ensure that effects are not selected just because they are the most significant coefficients

Example:
  "men who did not shave regularly were 70% more likely to suffer a stroke and 30% more likely to suffer heart disease"
  according to study at the University of Bristol
  Perhaps from a logistic regression analysis
  Is diligence in shaving a medically plausible feature to be investigated???
  How many other variables were included in the study???

Summary and Conclusions
  Multiplicity can arise in
    testing several different responses
    subgroup analyses
    interim analyses
    repeated measures
    &c.
  The effect of multiplicity is to increase the overall risk of a false positive
    (i.e. the overall significance level)

Problems of multiplicity can be overcome by
  Bonferroni corrections
    Bonferroni typically very conservative
  other adjustments in special cases
    e.g. for accumulating data in interim analyses where adjusting for multiplicity can have counter-intuitive effects
  more sophisticated analyses
    e.g. ANOVA or multivariate methods

"If you torture the data often enough it will eventually confess"
