Russell and Burch Revisited
Michael FW Festing
michaelfesting@aol.com
Workshop on The missing R: Reproducibility in a
Changing Research Landscape,
ILAR, Washington DC, June 2014
1
Terms of reference
Has the earnest effort to address the 3Rs actually
contributed to the issue of reproducibility in scientific studies?
Has, for example, the goal to reduce the number of mice to the
minimum necessary to attain statistical significance actually left
experiments with insufficient numbers per treatment group for
reproducibility?
2
Why do animal experiments have
fewer subjects than clinical trials?
Animal experiments
Aim to detect only large effects
Laboratory animals are uniform:
Age and weight
Diet
Environment
Genotype (particularly if inbred strains used)
Health
More reliable induced disease models
Clinical trials are large because:
Aim is to detect small, clinically important outcomes
Human patients quite variable
Any well-designed experiment
should give repeatable results.
Repeatability doesn't depend on sample size
but is subject to specified levels of
sampling variation (Type I and Type
II errors).
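The contrast above between large effects in uniform animals and small effects in variable patients can be illustrated with a standard sample-size approximation. This is a minimal sketch, not a method taken from the slides: the function name `n_per_group`, the 90% power target and the example effect sizes are assumptions chosen for illustration, and the normal approximation used here understates the numbers needed when groups are very small.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.9):
    """Approximate subjects per group for a two-sided, two-group comparison.

    effect_size is the standardised difference (difference in means / SD).
    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Illustrative (assumed) standardised effect sizes: large, medium, small.
for d in (2.0, 1.0, 0.2):
    print(f"standardised effect {d}: about {n_per_group(d)} per group")
```

Because the required number scales with 1/d², reducing variability (which raises the standardised effect) or targeting only large effects cuts the group size sharply.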
3
4
Replacement
In-vitro methods, less sentient animals
Refinement
Free of infectious disease
Minimise pain and distress. Anaesthesia and analgesia,
environmental enrichment
Reduction, e.g.
Research strategy
Experimental design and statistics
Principles of Humane Experimental
Technique
(Russell and Burch 1959)
Commissioned by Universities Federation for Animal Welfare (UFAW)
Reduction means better experimental
design and statistics
Obtaining the same amount of information from
fewer animals
e.g. Better control of variation using randomised block designs
Use of inbred strains
Obtaining more information from the same
number of animals
e.g. factorial designs
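A randomised block design, as mentioned above, can be set up along these lines. This is a hedged sketch only: the animal identifiers, block names and treatment labels are hypothetical, and the blocking factor (litter, weight band, day of testing) is left to the experimenter.

```python
import random


def randomised_block_allocation(blocks, treatments, seed=None):
    """Within each block, give every treatment to exactly one animal, in random order."""
    rng = random.Random(seed)
    allocation = {}
    for block_name, animals in blocks.items():
        if len(animals) != len(treatments):
            raise ValueError("each block needs exactly one animal per treatment")
        shuffled = list(treatments)
        rng.shuffle(shuffled)  # independent randomisation inside each block
        for animal, treatment in zip(animals, shuffled):
            allocation[animal] = (block_name, treatment)
    return allocation


# Hypothetical animals grouped into blocks of similar animals.
blocks = {
    "block1": ["a1", "a2", "a3"],
    "block2": ["a4", "a5", "a6"],
    "block3": ["a7", "a8", "a9"],
}
treatments = ["control", "low dose", "high dose"]

allocation = randomised_block_allocation(blocks, treatments, seed=1)
for animal, (block, treatment) in allocation.items():
    print(animal, block, treatment)
```

Every treatment appears once in every block, so differences between blocks can be removed in the analysis rather than inflating the error term.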
5
Janine A. Clayton & Francis
Collins
Policy: NIH to balance sex in cell and animal studies
As part of its initiative to enhance rigour, the NIH plans to
disseminate training on experimental design for NIH staff, trainees
and grantees. Evaluation of sex differences will be included in these
modules.
Nature 14 May 2014
6
7
Incorporating both sexes into
one experiment
All male design
Treated Control
Males & females in two expts.
Treated Control Treated Control
Factorial design
Half of each sex
Treated Control
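To make the factorial layout above concrete, the sketch below estimates the treatment effect, and whether it differs between the sexes, from a single experiment with half of the animals of each sex. The response values are invented purely for illustration; a real analysis would use a two-way analysis of variance rather than these raw differences of means.

```python
from statistics import mean

# Hypothetical responses, two animals per cell (invented for illustration only).
data = {
    ("male", "control"): [10.0, 12.0],
    ("male", "treated"): [15.0, 17.0],
    ("female", "control"): [11.0, 13.0],
    ("female", "treated"): [20.0, 22.0],
}
cell = {key: mean(values) for key, values in data.items()}

# Treatment effect within each sex.
effect_male = cell[("male", "treated")] - cell[("male", "control")]
effect_female = cell[("female", "treated")] - cell[("female", "control")]

# Main effect of treatment: averaged over both sexes, so every animal contributes.
treatment_effect = (effect_male + effect_female) / 2

# Interaction: does the response to treatment differ between the sexes?
interaction = effect_female - effect_male

print("treatment effect:", treatment_effect)
print("sex x treatment interaction:", interaction)
```

The same number of animals as the all-male design yields both the overall treatment effect and a check on whether the sexes respond differently.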
What is the scope for reduction?
Experiments often poorly designed and
incorrectly analysed
Result: Too many false positive and false negative results and a waste
of animals and scientific resources.
Festing MFW (1992). The scope for improving the design of laboratory animal
experiments. Lab Animals 26:256-267.*
Festing MFW (1994). Reduction of animal use: experimental design and quality
of experiments. Lab Animals 28:212-221.
* 1st prize by GV-SOLAS for best published or unpublished manuscript on any aspect of
laboratory animal science
8
Survey of a random sample of 271
published papers using laboratory animals
Of the papers studied:
87% did not report random allocation of subjects to treatments
86% did not report blinding where it seemed to be appropriate
100% failed to justify the sample sizes used
5% did not clearly state the purpose of the study
6% did not indicate how many separate experiments were done
13% did not identify the experimental unit correctly
26% failed to state the sex of the animals
24% reported neither age nor weight of animals
4% did not mention the number of animals used
In 35% of those which reported the numbers used, these differed between the materials
and methods and the results sections
etc.
9
Kilkenny et al (2009), PLoS One Vol. 4, e7824
Experiments don't have to be
large
Muriel claims that she can tell whether the milk is put in the cup before or
after the tea. Eight cups of tea are prepared, with four TM and four MT.
She is told that they will be presented to her in random order and she
should indicate which type they are.
Number of ways of choosing four cups out of eight cups =
$\binom{8}{4} = \frac{8!}{4!\,4!} = \frac{1680}{24} = 70$.
Only 1 of the 70 is right, so if she does it correctly p = 1/70 = 0.014
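The counting argument above can be checked in a few lines. This is a minimal sketch using only the Python standard library; the variable names are illustrative. The last calculation goes one step beyond the slide and is included only to show how quickly significance is lost if she makes a single swap.

```python
from math import comb

cups, milk_first = 8, 4

# Ways of choosing which 4 of the 8 cups are the milk-first ones.
arrangements = comb(cups, milk_first)

# Under the null hypothesis (pure guessing) every choice is equally likely.
p_all_correct = 1 / arrangements

# Ways of getting exactly 3 of the 4 milk-first cups right (and so 1 wrong).
ways_three_right = comb(4, 3) * comb(4, 1)
p_three_or_more = (ways_three_right + 1) / arrangements

print(arrangements, round(p_all_correct, 3), round(p_three_or_more, 3))
```

With only eight cups, nothing short of a perfect score reaches the 0.05 level, yet a perfect score is convincing evidence.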
10
After RA Fisher
Decision rule: If the p-value is less than 0.05, we reject the null
hypothesis that she can't detect TM/MT and accept the alternative that
she can. The result is said to be statistically significant.
Statistical errors in a well
designed experiment
Chance of false positive results (Type I error)
Depends on:
1) Significance level (usually set at α = 0.05)
Chance of false negative results (Type II error)
Depends on:
1) Sample size
2) Significance level
3) Effect size
4) Alternative hypothesis
5) Variability of the experimental material
The current crisis involves too many false positive results. In a well-designed
experiment these don't depend on sample size.
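The claim that the false positive rate does not depend on sample size can be checked by simulation. The sketch below is an illustration under stated assumptions, not the author's analysis: both groups are drawn from the same normal distribution (so the null hypothesis is true), the standard deviation is treated as known so that a simple z-test can be used, and the group sizes, number of simulated experiments and seed are arbitrary choices.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_per_group, alpha=0.05, n_experiments=4000, seed=0):
    """Fraction of null experiments declared significant by a two-sided z-test."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference in means when SD = 1 is known.
    se = (2 / n_per_group) ** 0.5
    rejections = 0
    for _ in range(n_experiments):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]  # no true effect
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        p_value = 2 * (1 - std_normal.cdf(abs(diff / se)))
        if p_value < alpha:
            rejections += 1
    return rejections / n_experiments


rates = {n: false_positive_rate(n) for n in (3, 10, 50)}
for n, rate in rates.items():
    print(f"n = {n:2d} per group: false positive rate about {rate:.3f}")
```

Each rate should sit close to the 5% significance level whether the groups contain 3 animals or 50; what changes with sample size is the false negative rate, not the false positive rate.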
11
False positive results in badly
designed/analysed experiments
Selective publication of positive results
Incorrect randomisation (e.g. groups kept separate with different
environments and terminated at different times)
Failure to blind where it is possible
Pseudo-replication & incorrect identification of the experimental unit
Failure in quality control of experimental material (e.g. animals and
reagents)
Inadequate external validity (cannot be generalised to other situations)
Inadequate description of methods (e.g. strain nomenclature)
Incorrect statistical analysis:
No statistical analysis
Multiple testing without adjustment
Wrong statistical model
Incorrect treatment of outliers: cherry-picking the data
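The effect of multiple testing without adjustment, listed above, can be quantified with a short calculation. This sketch assumes the tests are independent, which is a simplification, and uses the Bonferroni correction as one common adjustment; the slides do not say which adjustment the author has in mind, and the function names are illustrative.

```python
def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance level that keeps the overall rate at or below alpha."""
    return alpha / n_tests


for k in (1, 5, 10, 20):
    unadjusted = familywise_error(k)
    adjusted = familywise_error(k, bonferroni_alpha(k))
    print(f"{k:2d} tests: unadjusted {unadjusted:.3f}, Bonferroni-adjusted {adjusted:.3f}")
```

With ten unadjusted tests at the 0.05 level, the chance of at least one false positive is about 40%, which is why unadjusted multiple testing inflates the number of spurious findings.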
12
Positive results in studies of endocrine disruption by bisphenol A.
94/104 = 90% Government funded
0/11 = 0% Industry funded
Frederick S. vom Saal and Claude Hughes.
Environ Health Perspect 113:926-933 (2005)
Clear evidence of conflicts of
interest impacting results
13 (10 Govt. funded, 3 Industry) studies used SD rats
from Charles River. All were negative. This strain is resistant
to DES.
13
Percent responders to a synthetic
polypeptide in outbred CD rats
[Figure: percent responders (y-axis, 0 to 100) plotted against sample number (x-axis, 1 to 25)]
Simonian et al 1968, J. Immunol. 101:730. Note that 7 colonies of
inbred rats were either 100% responders or non-responders. N~30
14
Annual Statistics of Scientific Procedures
on Living Animals Great Britain 2012
4 million animals/70
million people/yr.
~4 animals/person in a
70 year lifespan
15
Training needed
A basic understanding of experimental design and statistics is
necessary for all scientists. For investigators with no previous
training in statistics, this level of expertise can probably be
obtained from an introductory course. There are many texts on
statistical methods, which can be used for both learning purposes
and as reference books. Biomedical research workers should
have more detailed training in biometrics and statistics so that
they can act as consultants to other investigators in their own
institutes.
The Three R's: The Way Forward
Joanne Zurlo, Deborah Rudacille, and Alan M. Goldberg
Article reprinted from "Environmental Health Perspectives,"
August 1996, vol. 104, no. 8
18
Conclusions
The 3Rs provide a strategy for every
research project
Replace animals with in-vitro methods wherever possible
Refine experiments to minimise pain and distress of animals
that must be used
Use the minimum number of animals consistent with achieving
the objective
In a well-designed experiment false positive results
depend only on the significance level.
Current problems are due to excessive numbers of false
positive results. These are due to faulty experimental design.
Training is needed!!
21
