Cunningham JB, McCrum-Gardner E. (2007) Power, effect and sample size using GPower: practical
issues for researchers and members of research ethics committees. Evidence Based Midwifery 5(4): 132-6
Abstract
Background. The issue of sample size has become a dominant concern for UK research ethics committees since their
reform in 2004. Sample size estimation is now a major, but often misunderstood concern for researchers, academic
supervisors and members of research ethics committees.
Aim. To enable researchers and research ethics committee members with non-statistical backgrounds to use freely
available statistical software to explore and address issues relating to sample size, effect size and power.
Method. Basic concepts are examined before utilising the statistical software package GPower to illustrate the use of
alpha level, beta level and effect size in sample size calculation. Examples involving t-tests, analysis of variance (ANOVA)
and chi-square tests are used.
Results. The examples illustrate the importance of effect and sample size in optimising the probability that a study will detect
treatment effects, without requiring these effects to be massive.
Conclusions. Researchers and research ethics committee members need to be familiar with the technicalities of sample
size estimation in order to make informed judgements on sample size, power of tests and associated ethical issues. Alpha
and power levels can be pre-specified, but effect size is more problematic. GPower may be used to replicate the examples
in this paper, which may be generalised to more complex study designs.
Key words: Power, sample size, beta level, effect size, research ethics committees, GPower
Introduction
Since the introduction of a new UK Ethics Committee
Authority (UKECA) in 2004 and the setting up of the
Central Office for Research Ethics Committees (COREC),
research proposals have come under greater scrutiny than
ever before. The era of self-regulation in UK research ethics
has ended (Kerrison and Pollock, 2005). The UKECA
recognise various committees throughout the UK that can
approve proposals for research in NHS facilities (National
Patient Safety Agency, 2007), and the scope of research for
which approval must be sought is defined by the National
Research Ethics Service, which has superseded COREC.
Guidance on sample size (Central Office for Research
Ethics Committees, 2007: 23) requires that the number
should be sufficient to achieve worthwhile results, but
should not be so high as to involve unnecessary recruitment
and burdens for participants. It also suggests that formal
sample size estimation should be based on the primary
outcome, and that if there is more than one outcome then
the largest sample size should be chosen. Sample size is a
function of three factors: the alpha level, the beta level and the
magnitude of the difference (the effect size) hypothesised.
Referring to the expected size of effect, COREC (2007: 23)
guidance states that it is important that the difference is
not unrealistically high, as this could lead to an underestimate of the required sample size.
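To make the relationship between these three factors concrete, the sample size per group for comparing two independent means can be sketched from just alpha, power (1 − beta) and the standardised effect size. The function below is illustrative only — a normal approximation of my own, not the COREC guidance or GPower's exact t-based routine:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided comparison of two
    independent means (normal approximation; exact t-based tools
    such as GPower give slightly larger answers)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # alpha level
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - beta
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

print(n_per_group(0.5))  # medium effect: 63 per group
```

For a medium effect at 80% power this gives 63 per group; GPower's exact t-based calculation gives 64 per group (128 in total).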
In this paper, issues of alpha, beta and effect size will be
considered from a practical perspective. A freely-available
statistical software package called GPower (Buchner et al,
1997) will be used to illustrate concepts and provide
practical assistance to novice researchers and members of
research ethics committees. There is a wide range of
freely available statistical software packages, such as PS
(Dupont and Plummer, 1997) and STPLAN (Brown et al,
2000). Each has features worth exploring, but GPower
was chosen because of its ease of use and the wide range
of study designs for which it caters. Using GPower, sample
size and power can be estimated or checked by those with
relatively little technical knowledge of statistics.
Alpha and beta errors and power
Researchers begin with a research hypothesis: a hunch
about the way that the world might be. For example, that
treatment A is better than treatment B. There are logical
reasons why this can never be demonstrated as absolutely
true, but evidence that it may or may not be true can be
obtained by endeavouring to show that there is no difference in the outcomes of treatments A and B. The statement that can then be tested is there is no difference in
the outcomes of A and B, and this is called the null
hypothesis. The researcher wants to be able to reject the
null hypothesis and to show that differences in the treatments outcomes are not due to chance.
When a hypothesis is specified, researchers also state the
level of significance at which they will reject the null
hypothesis. The minimum level for rejection is usually
p=0.05, and this is the probability of making a Type 1
error: rejecting the null hypothesis when it is in fact true.
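This interpretation of alpha can be demonstrated by simulation: when the null hypothesis is true, a test at the 0.05 level should wrongly reject it in roughly 5% of repeated studies. The sketch below uses a simple z-test on simulated data; all numbers and names are illustrative, not drawn from the paper:

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(1)
crit = NormalDist().inv_cdf(0.975)  # two-sided critical value, alpha = 0.05
n, trials, rejections = 30, 2000, 0

for _ in range(trials):
    # both groups drawn from the same population, so H0 is true
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    z = (mean(a) - mean(b)) / sqrt(stdev(a) ** 2 / n + stdev(b) ** 2 / n)
    if abs(z) > crit:
        rejections += 1  # a Type 1 error

rate = rejections / trials
print(rate)  # close to 0.05
```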
© 2007 The Royal College of Midwives. Evidence Based Midwifery 5(4): 132-6
Effect size
Effect size is a way of quantifying the difference between
two or more groups, or a measure of the difference in the
outcomes of the experimental and control groups. For
example, if one group has a new treatment and the other
has not (control group), then the effect size is a measure of
the effectiveness of the treatment.
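For two groups measured on a continuous outcome, a common effect size measure is Cohen's d: the difference in means divided by the pooled standard deviation. A short helper illustrates the idea; the example scores below are hypothetical, not data from the paper:

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s_pooled = sqrt(((n1 - 1) * stdev(group1) ** 2 +
                     (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / s_pooled

# hypothetical outcome scores for treatment and control groups
treatment = [14, 15, 17, 18, 16, 15, 19, 17]
control = [12, 13, 15, 14, 13, 12, 16, 14]
print(round(cohens_d(treatment, control), 2))
```

A d of about 1.8, as here, would count as a very large effect by conventional standards.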
Just because a result is statistically significant does not mean it is substantive in effect.
Statistical test (symbol for effect size)    Small effect    Medium effect    Large effect
Independent sample t-test (d)                0.20            0.50             0.80
ANOVA (f)                                    0.10            0.25             0.40
Chi-square goodness of fit (w)               0.10            0.30             0.50
Using standardised effect size simplifies sample size estimation, but it should not replace the need for sound
judgements on why a specific effect size is chosen.
At the risk of reductionism, sample size can be estimated from specified alpha and beta levels, and from the
specification of whether a small, moderate or large effect
size is anticipated.
Total sample size required (alpha = 0.05) by effect size and power:

Test                                      Power    Small    Moderate    Large
Independent sample t-test                 0.80       788      128         52
                                          0.90      1054      172         68
ANOVA (four group means)                  0.80      1096      180         76
                                          0.90      1424      232         96
Chi-square (4 df, a two-by-five table)    0.80      1194      133         48
                                          0.90      1541      172        162
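The t-test rows of this table can be approximately checked without GPower using a normal approximation (my own sketch, not the authors' method); it yields totals slightly below GPower's exact t-based values — 786, 126 and 50 versus 788, 128 and 52 at power 0.80:

```python
from math import ceil
from statistics import NormalDist

def total_n(d, power, alpha=0.05):
    """Approximate total N (both groups combined) for a two-sided
    independent sample t-test, via the normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return 2 * ceil(2 * z ** 2 / d ** 2)

for power in (0.80, 0.90):
    print(power, [total_n(d, power) for d in (0.2, 0.5, 0.8)])
# 0.8 [786, 126, 50]
# 0.9 [1052, 170, 66]
```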
variance1 = 9.2, variance2 = 8.8

Pooled variance s² = (9.2 + 8.8)/2 = 9.0 (with equal group sizes the pooled variance is simply the mean of the two variances), giving a pooled standard deviation s = 3.0.
Figure 1. Effect sizes (d), power and total sample size for an independent sample t-test, alpha=0.05, equal sample sizes (from graphic drawn by GPower). [The chart plots total sample size (200 to 1200) against power (0.6 to 0.9) for two-tailed t-tests of the difference between two independent means (allocation ratio N2/N1 = 1, α err prob = 0.05), with separate curves for effect sizes d = 0.2, 0.25 and 0.3.]
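The curves in Figure 1 can be approximated in the reverse direction, computing power from a chosen total sample size. Again this is a normal-approximation sketch of my own, not GPower's exact algorithm:

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(d, n_total, alpha=0.05):
    """Approximate power of a two-sided independent sample t-test
    with equal group sizes (normal approximation)."""
    n = n_total / 2                        # per group
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(d * sqrt(n / 2) - z_alpha)

print(round(power_two_sample(0.3, 400), 2))  # 0.85
```

For d = 0.3 and a total of 400 participants this gives a power of about 0.85, in line with the figure.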
Outcome                  Control     Experiment     Marginal total
Positively beneficial    0.175       0.325          0.5
No benefit               0.325       0.175          0.5
Marginal total           0.50        0.50           1.0
f = √[ Σⱼ₌₁ᵏ (μⱼ − μ̄)² / (k σ²error) ] = √[ Σⱼ (μⱼ − μ̄)² / (3 × 75²) ] = 0.54
This represents a large effect size, and when sample size is
estimated using GPower, a total sample size of 39 cases is
found to be required (13 in each group). GPower can also
calculate the effect size directly from the means.
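The f computation can be scripted as below. The group means (100, 150, 200) are hypothetical values chosen only so that, with σerror = 75, they reproduce an f of about 0.54; the study's actual means are not reproduced here:

```python
from math import sqrt
from statistics import mean

def cohens_f(group_means, sigma_error):
    """ANOVA effect size f: spread of the group means around the
    grand mean, standardised by the within-group (error) SD."""
    grand = mean(group_means)
    sigma_m = sqrt(sum((m - grand) ** 2 for m in group_means)
                   / len(group_means))
    return sigma_m / sigma_error

print(round(cohens_f([100, 150, 200], 75), 2))  # 0.54
```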
w = √[ Σᵢ₌₁ᵏ (P1i − P0i)² / P0i ]
Where k is the number of cells, P0i is the population
proportion in cell i under the null hypothesis and P1i is
the population proportion in cell i under the alternative
hypothesis. P0i for each cell is calculated by multiplying
the row marginal for the cell by the column marginal for
the respective cell, and dividing by the sum of probabilities for all cells, which will always be one. Thus, for
cell 1,1: P0i = 0.50 × 0.50/1 = 0.25. P0i is 0.25 for all cells.
w = √[ (0.175 − 0.25)²/0.25 + (0.325 − 0.25)²/0.25 + (0.325 − 0.25)²/0.25 + (0.175 − 0.25)²/0.25 ]

w = 0.30
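The arithmetic above is easy to script; the cell proportions come straight from the outcome table in this section:

```python
from math import sqrt

def cohens_w(p_alt, p_null):
    """Chi-square effect size w from alternative (p_alt) and null
    (p_null) cell proportions (each list sums to one)."""
    return sqrt(sum((p1 - p0) ** 2 / p0
                    for p1, p0 in zip(p_alt, p_null)))

p_alt = [0.175, 0.325, 0.325, 0.175]   # hypothesised cell proportions
p_null = [0.25, 0.25, 0.25, 0.25]      # expected under the null hypothesis
print(round(cohens_w(p_alt, p_null), 2))  # 0.3
```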
References
Altman DG. (1981) Statistics and ethics in medical research: how large a
sample? Br Med J 281(6251): 1336-8.
Bacchetti P, Leslie E, Wolf LE, Segal MR, McCulloch CE. (2005) Ethics and
sample size. Am J Epidemiol 161(2): 105-10.
Brown BW, Brauner C, Chan A, Gutierrez D, Herson J, Lovato J, Polsley J,
Russell K, Venier J. (2000) BAM software download site: STPLAN software. The University of Texas MD Anderson Cancer Center: Houston, Texas. See:
http://biostatistics.mdanderson.org/SoftwareDownload/SingleSoftware.asp
x?Software_Id=41 (accessed 18 September 2007).
Buchner A, Erdfelder E, Faul F. (1997) How to use GPower. Heinrich-Heine-Universität: Düsseldorf. See: www.psycho.uni-duesseldorf.de/aap/projects/
gpower/how_to_use_gpower.html (accessed 20 November 2007).
Burns R. (1785) To a mouse: In: Noble A, Hogg P. (Eds.). (2003) The
Canongate Burns: the complete poems and songs of Robert Burns.
Canongate: Edinburgh.
Camacho-Sandoval J. (2007) GPower tutorial. Heinrich-Heine-Universität:
www.nres.npsa.nhs.uk/applicants/help/contacts/recsrecognised.htm