doi:10.1136/jech-2013-203104
Theory and methods
Meta-analysis of prevalence
Jan J Barendregt,1 Suhail A Doi,1 Yong Yi Lee,1 Rosana E Norman,2 Theo Vos3
1 University of Queensland, School of Population Health
2 University of Queensland, School of Population Health and Queensland Children's Medical Research Institute
3 University of Queensland, School of Population Health and University of Washington, Institute of Health Metrics and Evaluation
Correspondence to
Dr Jan J Barendregt, School of
Population Health, University of
Queensland, Herston Rd,
Herston, QLD 4006, Australia;
j.barendregt@sph.uq.edu.au
Received 12 July 2013
Revised 24 July 2013
Accepted 27 July 2013
ABSTRACT
Meta-analysis is a method to obtain a weighted average
of results from various studies. In addition to pooling
effect sizes, meta-analysis can also be used to estimate
disease frequencies, such as incidence and prevalence. In
this article we present methods for the meta-analysis of
prevalence. We discuss the logit and double arcsine
transformations to stabilise the variance. We note the
special situation of multiple category prevalence, and
propose solutions to the problems that arise. We
describe the implementation of these methods in the
MetaXL software, and present a simulation study and
the example of multiple sclerosis from the Global Burden
of Disease 2010 project. We conclude that the double
arcsine transformation is preferred over the logit, and
that the MetaXL implementation of multiple category
prevalence is an improvement in the methodology of the
meta-analysis of prevalence.
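$$P = \frac{\sum_i \frac{p_i}{\mathrm{Var}(p_i)}}{\sum_i \frac{1}{\mathrm{Var}(p_i)}}$$
with SE:
$$\mathrm{SE}(P) = \sqrt{\frac{1}{\sum_i \frac{1}{\mathrm{Var}(p_i)}}}$$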
INTRODUCTION
The large majority of meta-analyses are devoted to
establishing the effects of interventions, and therefore aim to get a pooled estimate of effect size, be
that relative risk, odds ratio (OR), risk difference,
or weighted or standardised mean difference.
However, meta-analysis methods can be useful as
well to get a more precise estimate of disease frequency, such as disease incidence rates and prevalence proportions. For example, the Global Burden
of Disease 2010 project aimed to obtain disease frequency estimates of a large number of diseases and
conditions, often based on a limited number of
studies, of usually varying quality.
This article looks at the meta-analysis of prevalence, using a concrete example from the Global
Burden of Disease 2010 study: the distribution of
severity in the prevalence of multiple sclerosis
(MS). We first discuss the specific properties of
prevalence as a variable and the options to deal
with these in the meta-analysis, next look at the
multiple category case, then discuss the implementation of these methods in the MetaXL software,
present the simulation study and the example, and
conclude.
PREVALENCE AS A VARIABLE
Barendregt JJ, et al. J Epidemiol Community Health 2013;0:1-5. doi:10.1136/jech-2013-203104
TRANSFORMATIONS
While this works fine for prevalence proportions
around 0.5, increasing problems arise when the
proportions get closer to the limits of the 0..1
range. The first problem is mostly cosmetic: the
equation for the CI does not preclude confidence
limits outside the 0..1 range. While this is annoying, the second problem is much more substantial:
when the proportion becomes small or large, the
variance of the study is squeezed towards 0 (see
equation (1)). As a consequence, in the inverse variance method, the study gets a large weight. A
meta-analysis of prevalence according to the
method described above therefore puts undue
weight on the studies at the extremes of the 0..1
range.
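The squeezing of variance can be illustrated with a minimal Python sketch. It assumes that equation (1) is the usual binomial variance p(1 - p)/N (an assumption, since the equation is not reproduced here), and the three studies are made up for illustration.

```python
import math

def binomial_variance(p, n):
    """Variance of a prevalence proportion, ASSUMED to be p(1 - p)/N."""
    return p * (1.0 - p) / n

def pool_inverse_variance(props, sizes):
    """Inverse variance pooled prevalence, its SE and a 95% CI (no transformation)."""
    weights = [1.0 / binomial_variance(p, n) for p, n in zip(props, sizes)]
    pooled = sum(w * p for w, p in zip(weights, props)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se, (pooled - 1.96 * se, pooled + 1.96 * se)

# Hypothetical studies of equal size; one happens to observe a very low prevalence.
props = [0.01, 0.05, 0.06]
sizes = [100, 100, 100]

weights = [1.0 / binomial_variance(p, n) for p, n in zip(props, sizes)]
shares = [w / sum(weights) for w in weights]
pooled, se, ci = pool_inverse_variance(props, sizes)

print("weight shares:", [round(s, 3) for s in shares])
print("pooled prevalence:", round(pooled, 4), "95% CI:", ci)
```

Although the three studies are equally large, the study with the lowest observed prevalence receives by far the largest weight, which pulls the pooled estimate below the simple average of the three proportions.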
The way to deal with these problems is to transform the prevalence to a variable that is not constrained to the 0..1 range, has an approximately
normal distribution, and by being unconstrained
avoids the squeezing of variance effect. The
meta-analysis can be carried out on the transformed proportions, using the inverse of the variance of the transformed proportion as study
weight. For final presentation, the pooled transformed proportion and its CI are back transformed to a proportion.
Logit
The logit transformation is given by [1]:
$$l = \ln\left(\frac{p}{1-p}\right)$$
with variance:
$$\mathrm{Var}(l) = \frac{1}{Np} + \frac{1}{N(1-p)}$$
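As a rough illustration of the logit approach, the Python sketch below computes the transformed value, its variance, and a back transformed 95% CI for a single study. The back transformation used here is the standard inverse logit, which is an addition not shown above, and the study numbers are hypothetical.

```python
import math

def logit(p):
    """Logit transformation of a proportion p (0 < p < 1)."""
    return math.log(p / (1.0 - p))

def logit_variance(p, n):
    """Variance of the logit: 1/(Np) + 1/(N(1 - p))."""
    return 1.0 / (n * p) + 1.0 / (n * (1.0 - p))

def inverse_logit(l):
    """Standard inverse of the logit (assumed back transformation)."""
    return 1.0 / (1.0 + math.exp(-l))

# Hypothetical study: 1 case among 50 people.
p, n = 0.02, 50
l = logit(p)
se = math.sqrt(logit_variance(p, n))
lower = inverse_logit(l - 1.96 * se)
upper = inverse_logit(l + 1.96 * se)
print("95% CI on the proportion scale:", lower, upper)

# The logit variance grows, rather than shrinks, towards the limits of the range.
print(logit_variance(0.01, 100), logit_variance(0.5, 100))
```

The confidence limits stay inside the 0..1 range, but the variance of the logit becomes large for proportions near 0 or 1, so such studies receive small rather than large weights.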
Double arcsine
The double arcsine transformation is given by [2]:
$$t = \sin^{-1}\sqrt{\frac{n}{N+1}} + \sin^{-1}\sqrt{\frac{n+1}{N+1}} \qquad (8)$$
with n the number of people in the category. The variance of t
is given by:
$$\mathrm{Var}(t) = \frac{1}{N + 0.5}$$
The back transformation to a proportion is done using [3]:
$$p = 0.5\left\{1 - \mathrm{sgn}(\cos t)\left[1 - \left(\sin t + \frac{\sin t - \frac{1}{\sin t}}{N}\right)^{2}\right]^{0.5}\right\} \qquad (10)$$
with sgn being the sign operator. A much simpler but less
accurate alternative back transformation is:
$$p = \left(\sin\frac{t}{2}\right)^{2} \qquad (11)$$
The double arcsine transformation addresses both the problem
of confidence limits outside the 0..1 range and that of variance
instability, and it is therefore preferred over the logit
transformation.
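The double arcsine transformation and its two back transformations can be sketched in Python as follows. This is a minimal single-study illustration with hypothetical numbers, not the MetaXL implementation; the function names are made up, and the guard against a slightly negative square root argument is an added safety measure.

```python
import math

def double_arcsine(n_cases, n_total):
    """Double arcsine transformation t for n cases among N people."""
    return (math.asin(math.sqrt(n_cases / (n_total + 1)))
            + math.asin(math.sqrt((n_cases + 1) / (n_total + 1))))

def double_arcsine_variance(n_total):
    """Variance of t: 1/(N + 0.5)."""
    return 1.0 / (n_total + 0.5)

def sgn(x):
    """Sign operator: -1, 0 or 1."""
    return (x > 0) - (x < 0)

def back_transform(t, n_total):
    """Back transformation of t to a proportion (the longer formula)."""
    s = math.sin(t)
    inner = s + (s - 1.0 / s) / n_total
    root = math.sqrt(max(0.0, 1.0 - inner ** 2))  # guard against rounding error
    return 0.5 * (1.0 - sgn(math.cos(t)) * root)

def back_transform_simple(t):
    """Simpler but less accurate back transformation: sin(t/2)^2."""
    return math.sin(t / 2.0) ** 2

# Hypothetical study: 5 cases among 100 people (prevalence 0.05).
t = double_arcsine(5, 100)
print("t:", t, "Var(t):", double_arcsine_variance(100))
print("back transformed:", back_transform(t, 100))
print("simple back transformation:", back_transform_simple(t))
```

Note that the variance depends only on N and not on the proportion itself, which is why studies with extreme proportions no longer attract undue weight.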
MULTI-CATEGORY PREVALENCE
The discussion so far has implicitly been about single category
prevalence (those with the disease present, which of course implies
a second category of those without the disease).
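$$\bar{P} = \left(\sin\frac{\bar{t}}{2}\right)^{2} \qquad (12)$$

$$p = \begin{cases} \ldots \\ 0.5\left\{1 - \mathrm{sgn}(\cos \bar{t})\left[1 - \left(\sin \bar{t} + \frac{\sin \bar{t} - \frac{1}{\sin \bar{t}}}{1/\bar{v}}\right)^{2}\right]^{0.5}\right\} & \text{otherwise} \end{cases} \qquad (13)$$

$$\mathrm{LCL} = \begin{cases} 0 & \text{if } \frac{\bar{P}}{\bar{v}} < 2 \\ 0.5\left\{1 - \mathrm{sgn}(\cos t_{LCL})\left[1 - \left(\sin t_{LCL} + \frac{\sin t_{LCL} - \frac{1}{\sin t_{LCL}}}{1/\bar{v}}\right)^{2}\right]^{0.5}\right\} & \text{otherwise} \end{cases} \qquad (14)$$
and
$$\mathrm{UCL} = \begin{cases} 1 & \text{if } \frac{1 - \bar{P}}{\bar{v}} < 2 \\ 0.5\left\{1 - \mathrm{sgn}(\cos t_{UCL})\left[1 - \left(\sin t_{UCL} + \frac{\sin t_{UCL} - \frac{1}{\sin t_{UCL}}}{1/\bar{v}}\right)^{2}\right]^{0.5}\right\} & \text{otherwise} \end{cases} \qquad (15)$$
where $t_{LCL} = \bar{t} - Z_{\alpha/2}\sqrt{\bar{v}}$ and $t_{UCL} = \bar{t} + Z_{\alpha/2}\sqrt{\bar{v}}$.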
The three issues caused by generalising to more than two categories are dealt with as follows. To obtain a single study weight
across categories for the untransformed and logit transformed
prevalences, MetaXL uses the inverse of the average of the category variances.
For the second issue, to obtain a single estimate for Cochran's
Q, we have chosen to take the maximum category Q. This is
because if we believe that the effect size variations across studies
in one category of proportions are not independent of variations
in the other categories, the maximum category Q value would
be the best and most conservative estimate of a common Q.
For the random effects model, a single τ² is calculated in the
usual way from this common Q, and this τ² is added to the previously computed average variance across categories. Inverting
this inflated variance gives the random effects weight for each
study. The pooled variance for each category is computed by
first adding τ² to the category variances, inverting each to a weight, and summing
the category weights over studies. The pooled category variance
is then obtained as the reciprocal of the summed category
weights.
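The weighting steps described above can be sketched in Python for untransformed proportions. Several details are assumptions rather than the MetaXL implementation: each category Q is computed with the common study weights, τ² is obtained with the usual DerSimonian-Laird moment estimator, the category variances are taken as p(1 - p)/N, and the data and function name are hypothetical.

```python
def multicategory_random_effects(estimates, variances):
    """estimates[i][k] and variances[i][k] refer to study i, category k."""
    n_studies = len(estimates)
    n_cats = len(estimates[0])

    # Single study weight: inverse of the average of the category variances.
    avg_var = [sum(study_vars) / n_cats for study_vars in variances]
    fixed_w = [1.0 / v for v in avg_var]
    sum_w = sum(fixed_w)

    # Cochran's Q per category (ASSUMPTION: common study weights), then the maximum.
    q_per_category = []
    for k in range(n_cats):
        pooled_k = sum(w * est[k] for w, est in zip(fixed_w, estimates)) / sum_w
        q_per_category.append(
            sum(w * (est[k] - pooled_k) ** 2 for w, est in zip(fixed_w, estimates)))
    q_common = max(q_per_category)

    # tau^2 from the common Q (ASSUMPTION: DerSimonian-Laird estimator).
    c = sum_w - sum(w * w for w in fixed_w) / sum_w
    tau2 = max(0.0, (q_common - (n_studies - 1)) / c)

    # Random effects study weights: inverse of the inflated average variance.
    re_w = [1.0 / (v + tau2) for v in avg_var]
    sum_re = sum(re_w)
    pooled = [sum(w * est[k] for w, est in zip(re_w, estimates)) / sum_re
              for k in range(n_cats)]

    # Pooled category variance: reciprocal of the summed category weights.
    pooled_var = [1.0 / sum(1.0 / (study_vars[k] + tau2) for study_vars in variances)
                  for k in range(n_cats)]
    return pooled, pooled_var, tau2, q_common

# Hypothetical data: three studies, three categories (mild, moderate, severe).
props = [[0.50, 0.30, 0.20], [0.60, 0.25, 0.15], [0.30, 0.40, 0.30]]
sizes = [100, 200, 150]
variances = [[p * (1 - p) / n for p in row] for row, n in zip(props, sizes)]

pooled, pooled_var, tau2, q_common = multicategory_random_effects(props, variances)
print(pooled, pooled_var, tau2, q_common)
```

Because every category of a study shares the same weight, the pooled untransformed proportions in this sketch still sum to 1.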
In the quality effects model, the redistribution of inverse variance weights is done using a quality parameter between 0
(lowest quality) and 1 (highest quality), individualised to each
study, rather than the common τ² parameter across studies as is
done with the random effects model [5, 6]. In addition to this, the
quality effects model applies an overdispersion correction based
on Cochran's Q χ² statistic, resulting in a more conservative CI
in the presence of heterogeneity [4]. Note that, unlike in the
random effects model, this increase does not affect the study
weights. For the multi-category prevalence analysis, the overdispersion correction is applied to each of the categories separately,
thus widening the CIs in case of heterogeneity.
The third and final issue is that, due to the transformations,
the back transformed proportions may not sum to 1. The difference is usually small, and MetaXL deals with it by offering the
option to normalise the prevalences in each category after
pooling and back transformation. The CIs, however, are not
adjusted.
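A minimal Python sketch of such a normalisation step is given below. It assumes that normalising means dividing each pooled category prevalence by their sum, which is one natural reading but is not confirmed above; the input values are hypothetical.

```python
def normalise(prevalences):
    """Rescale category prevalences so they sum to 1 (ASSUMED: divide by the total)."""
    total = sum(prevalences)
    return [p / total for p in prevalences]

# Hypothetical back transformed pooled prevalences that do not quite sum to 1.
pooled = [0.530, 0.292, 0.187]
normalised = normalise(pooled)
print(sum(pooled), normalised, sum(normalised))
```

The relative sizes of the categories are unchanged; only the common scale is adjusted.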
SIMULATION
We carried out a simulation study to compare bias and coverage
between the transformation methods, based on a hypothetical
dataset. We defined nine studies of increasing size, with the
smallest comprising 20 subjects, and each subsequent study
having 20 more subjects, resulting in 180 subjects for the largest
study. We assumed the number of cases in each study to have a
binomial distribution, with study size and prevalence proportion
as the parameters.
For the fixed effects model, we assumed a prevalence of 0.05,
and for the random effects model we assumed the prevalence to
have a Normal distribution with parameters 0.05 and 0.005, for
mean and SD, respectively. For the quality effects model, we
assumed that quality associated deviations from the true prevalence would also follow a Normal distribution with parameters
0.05 and 0.005. From the deviation we calculated a quality
score between 0 and 1 using a quadratic function. (Note that
the quality effects model does not require that the quality score
is a function of the difference between the true effect and the
study estimate, but for this simulation this provides a convenient
way to calculate a quality score.)
We implemented the numbers and functions in Excel, using
the Ersatz Monte Carlo simulation add-in, and drew randomly
from the distributions 1000 times [9]. Each time we recalculated
the meta-analysis for the three transformation options for each
model. We calculated the mean of the central estimates, the bias
in this mean and the mean squared error, and determined the
coverage proportion, given the pooled CIs.
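The fixed effects part of this set-up can be approximated with a short Python sketch using only the standard library (the original analysis used Excel with the Ersatz add-in). Several choices here are assumptions made for illustration: studies with zero cases get 0.5 cases in the untransformed analysis to avoid a zero variance, the pooled double arcsine value is back transformed with the simple sin(t/2)² formula, and only the mean pooled estimate is tracked.

```python
import math
import random

TRUE_PREVALENCE = 0.05
STUDY_SIZES = [20 * k for k in range(1, 10)]  # 20, 40, ..., 180 subjects

def draw_cases(rng, size, prevalence):
    """Binomial draw as a sum of Bernoulli trials."""
    return sum(1 for _ in range(size) if rng.random() < prevalence)

def pool_untransformed(cases, sizes):
    """Inverse variance pooling of raw proportions."""
    num = den = 0.0
    for x, n in zip(cases, sizes):
        # ASSUMPTION: keep p away from 0 and 1 so the variance is never zero.
        p = min(max(x, 0.5), n - 0.5) / n
        w = n / (p * (1.0 - p))
        num += w * p
        den += w
    return num / den

def pool_double_arcsine(cases, sizes):
    """Inverse variance pooling of double arcsine values, simple back transformation."""
    num = den = 0.0
    for x, n in zip(cases, sizes):
        t = (math.asin(math.sqrt(x / (n + 1)))
             + math.asin(math.sqrt((x + 1) / (n + 1))))
        w = n + 0.5  # inverse of Var(t) = 1/(N + 0.5)
        num += w * t
        den += w
    return math.sin(num / den / 2.0) ** 2

def simulate(runs=1000, seed=1):
    """Return the mean pooled prevalence without and with the transformation."""
    rng = random.Random(seed)
    total_none = total_da = 0.0
    for _ in range(runs):
        cases = [draw_cases(rng, n, TRUE_PREVALENCE) for n in STUDY_SIZES]
        total_none += pool_untransformed(cases, STUDY_SIZES)
        total_da += pool_double_arcsine(cases, STUDY_SIZES)
    return total_none / runs, total_da / runs

mean_none, mean_da = simulate(runs=300, seed=1)
print("no transformation:", mean_none, "double arcsine:", mean_da)
```

The sketch is meant to show the direction of the effect only; its numbers will not match the published simulation, which differs in the handling of zero counts, the back transformation, and the number of draws.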
Results of the simulation study are presented in table 1.
Without transformation, the effect of the squeezed variance for
a prevalence of this size can clearly be seen: the mean pooled
prevalences are much lower than the true prevalence of 0.05,
dragged down by the large weights applied to the samples with
low prevalence. For the logit transformation, a reverse (but
much smaller) effect can be seen: the lowest sampled
Table 1  Results of the simulation study

Model            Variable   None      Logit     Double arcsine
Fixed effects    Mean       0.02810   0.05428   0.05206
                 Bias       0.02190   0.00428   0.00206
                 MSE        0.00077   0.00007   0.00006
                 Coverage   0.39400   0.92500   0.94300
Random effects   Mean       0.04215   0.05373   0.05162
                 Bias       0.00785   0.00373   0.00162
                 MSE        0.00014   0.00007   0.00006
                 Coverage   0.82200   0.94700   0.96100
Quality effects  Mean       0.03111   0.05263   0.05145
                 Bias       0.01889   0.00263   0.00145
                 MSE        0.00064   0.00006   0.00006
                 Coverage   0.63200   0.95500   0.95800
Random effects
Quality effects
Studies in the multiple sclerosis severity example

Study name               Country      N    Mild   Moderate  Severe  Quality
Casetta 1998 [10]        Italy        394  0.561  0.175     0.264   1.00
Balasa 2007 [11]         Romania      152  0.441  0.250     0.309   1.00
Tsai 2004 [12]           Taiwan       43   0.651  0.233     0.116   0.60
Houzen 2003 [13]         Japan        31   0.645  0.129     0.226   1.00
Chancellor 2003 [14]     New Zealand  86   0.512  0.337     0.151   1.00
Cabre 2001 [15]          Caribbean    62   0.645  0.306     0.048   1.00
Bencsik 1998 [16]        Hungary      130  0.785  0.162     0.054   1.00
Lobinska 2004 [17]       Poland       204  0.632  0.260     0.108   1.00
Bencsik 2001 [18]        Hungary      248  0.149  0.540     0.310   0.60
Benedikz 2002 [19]       Iceland      290  0.710  0.179     0.110   1.00
Alvarez 1992 [20]        Chile        100  0.550  0.330     0.120   0.60
Arruda 2001 [21]         Brazil       200  0.610  0.225     0.165   1.00
Al-Araji 2005 [22]       Iraq         300  0.400  0.373     0.227   1.00
Pittock 2004 (1) [23]    USA          162  0.525  0.296     0.179   1.00
Pittock 2004 (2) [23]    USA          201  0.627  0.224     0.149   1.00
Modrego Pardo 1997 [24]  Spain        46   0.609  0.217     0.174   1.00
McDonnell 1998 [25]      Ireland      258  0.322  0.477     0.202   1.00
Tola 1999 [26]           Spain        54   0.574  0.296     0.130   1.00

Pooled prevalence by model, transformation and severity category

Model             Transformation  Category  Prevalence  LCI    HCI    Range
Inverse variance  None            Mild      0.539       0.523  0.556  0.033
                                  Moderate  0.281       0.266  0.297  0.031
                                  Severe    0.179       0.167  0.192  0.026
                  Logit           Mild      0.511       0.477  0.515  0.038
                                  Moderate  0.297       0.271  0.305  0.034
                                  Severe    0.192       0.173  0.201  0.029
                  Arcsine         Mild      0.525       0.502  0.537  0.036
                                  Moderate  0.289       0.270  0.302  0.033
                                  Severe    0.185       0.170  0.198  0.028
Random effects    None            Mild      0.551       0.470  0.632  0.162
                                  Moderate  0.280       0.199  0.360  0.161
                                  Severe    0.169       0.090  0.249  0.160
                  Logit           Mild      0.560       0.458  0.630  0.172
                                  Moderate  0.278       0.207  0.346  0.139
                                  Severe    0.162       0.115  0.212  0.097
                  Arcsine         Mild      0.555       0.464  0.632  0.168
                                  Moderate  0.279       0.204  0.355  0.151
                                  Severe    0.166       0.106  0.232  0.126
Quality effects   None            Mild      0.552       0.441  0.663  0.223
                                  Moderate  0.272       0.192  0.352  0.160
                                  Severe    0.176       0.129  0.223  0.094
                  Logit           Mild      0.527       0.464  0.562  0.097
                                  Moderate  0.286       0.244  0.317  0.073
                                  Severe    0.187       0.156  0.212  0.056
                  Arcsine         Mild      0.540       0.461  0.607  0.145
                                  Moderate  0.279       0.224  0.331  0.107
                                  Severe    0.181       0.143  0.219  0.076
Contributors JJB and SAD developed the methods, YYL, RN and TV did the
meta-analyses, all authors contributed to writing.
Competing interests JJB owns Epigear International Pty Ltd, which sells the
Ersatz software used in the analysis.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The data are available from the Epigear website, and
are part of the MetaXL download.