Copyright 1999 by The Johns Hopkins University School of Hygiene and Public Health
All rights reserved
association; bias (epidemiology); case-control studies; chi-square statistic; genes; significance tests
Association studies of genetic markers or candidate genes with disease are often conducted using the
traditional case-control design. Cases and controls are sampled from genetically unrelated subjects, and allele
frequencies compared between cases and controls using Pearson's chi-square statistic. An assumption of this
analysis method is that the two alleles within each subject are statistically independent, at least when no
association exists. This is equivalent to assuming that the frequencies of the genotypes in the general population
comply with Hardy-Weinberg Equilibrium proportions, which may not always be the case. However, deviations
from Hardy-Weinberg Equilibrium can inflate the chance of a false-positive association. These results
demonstrate that when comparing the frequencies of two alleles between cases and controls, the chance of a
false-positive association can be substantially increased if homozygotes for the putative high-risk allele are more
common in the general population than predicted by Hardy-Weinberg Equilibrium. In contrast, Pearson's chi-square
statistic can be conservative if the frequency of homozygotes for the high-risk allele is less than that
predicted. A statistically valid method that corrects for deviations from Hardy-Weinberg Equilibrium is presented,
so that the chance of a false-positive association is not greater than the acceptable level. Am J Epidemiol
1999;149:706-11.
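$$z = \frac{p_d - p_c}{\sqrt{V}} \qquad (1)$$
where V is the variance of (p_d - p_c). If V is correctly specified, z has an approximate standard normal distribution. When HWE exists, V can be written as
$$V_{HWE} = p(1-p)\left[\frac{1}{2N_d} + \frac{1}{2N_c}\right] \qquad (2)$$
$$V_{NonHWE} = \left(p + P_{AA} - 2p^2\right)\left[\frac{1}{2N_d} + \frac{1}{2N_c}\right] \qquad (3)$$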
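As a rough illustration of the test that assumes HWE, the sketch below computes the z statistic from genotype counts and compares it with Pearson's chi-square on the 2 x 2 table of allele counts. The function names, the use of a pooled estimate of p under the null, and the example counts are assumptions made for illustration; they are not taken from the original text.

```python
import math

def allele_freq(n_AA, n_Aa, n_aa):
    """Frequency of allele A estimated from genotype counts."""
    n = n_AA + n_Aa + n_aa
    return (2 * n_AA + n_Aa) / (2 * n)

def z_hwe(cases, controls):
    """z statistic for p_d - p_c using the variance that assumes HWE.

    cases and controls are (n_AA, n_Aa, n_aa) tuples. The common allele
    frequency p is estimated by pooling both groups (an assumption here).
    """
    n_d, n_c = sum(cases), sum(controls)
    p_d, p_c = allele_freq(*cases), allele_freq(*controls)
    p = (n_d * p_d + n_c * p_c) / (n_d + n_c)
    v = p * (1 - p) * (1 / (2 * n_d) + 1 / (2 * n_c))
    return (p_d - p_c) / math.sqrt(v)

def pearson_chi2_alleles(cases, controls):
    """Pearson chi-square on the 2 x 2 table of allele counts.

    Assumes both alleles are observed at least once, so that no
    expected count is zero.
    """
    rows = []
    for g in (cases, controls):
        a = 2 * g[0] + g[1]                 # copies of allele A
        rows.append((a, 2 * sum(g) - a))    # (A, a) allele counts
    total = sum(sum(r) for r in rows)
    col = [rows[0][j] + rows[1][j] for j in range(2)]
    chi2 = 0.0
    for r in rows:
        for j in range(2):
            expected = sum(r) * col[j] / total
            chi2 += (r[j] - expected) ** 2 / expected
    return chi2
```

With the hypothetical counts cases = (10, 40, 50) and controls = (5, 30, 65), the squared z statistic and the allele-count chi-square coincide (both about 5.33), which shows that the allele-count chi-square test treats the two alleles within a subject as independent.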
by using only cases to estimate $p$ and $P_{AA}$, and a similar method for controls to estimate $V_{NonHWE,c}$, and then
add these to compute
$$V_{NonHWE} = V_{NonHWE,d} + V_{NonHWE,c}.$$
So,
$$z_{NonHWE} = \frac{p_d - p_c}{\sqrt{V_{NonHWE}}}$$
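The corrected statistic can be sketched as follows. The per-group variance used here, (p + P_AA - 2p^2)/(2N), is the standard variance of an allele-frequency estimate when the two alleles within a subject may be correlated; treating it as the form of each group's term is an assumption of this sketch, as are the function names.

```python
import math

def group_variance(n_AA, n_Aa, n_aa):
    """Return (p, V) for one group without assuming HWE.

    p is the frequency of allele A and V = (p + P_AA - 2 p^2) / (2 N),
    with P_AA estimated by the observed proportion of AA homozygotes.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    p_AA = n_AA / n
    return p, (p + p_AA - 2 * p * p) / (2 * n)

def z_non_hwe(cases, controls):
    """z statistic whose variance is the sum of the two group variances.

    cases and controls are (n_AA, n_Aa, n_aa) tuples. Assumes at least
    one group is polymorphic, so the summed variance is positive.
    """
    p_d, v_d = group_variance(*cases)
    p_c, v_c = group_variance(*controls)
    return (p_d - p_c) / math.sqrt(v_d + v_c)
```

When a group's genotype proportions match HWE exactly, for example counts (25, 50, 25), the group variance reduces to p(1 - p)/(2N), so the corrected statistic costs nothing in that case beyond the extra estimated parameter.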
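[Figure 1 plot: true Type-I error rate (vertical axis) against fractional maximum discrepancy from 0.0 to 1.0 (horizontal axis), one curve per nominal error rate: 0.05 (max = 0.17) and 0.01 (max = 0.07).]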
FIGURE 1. True Type-I error rate as a function of fractional maximum discrepancy from Hardy-Weinberg Equilibrium (HWE) when there is an
excess of AA homozygotes and the assumed Type-I error rate is either 0.01 or 0.05.
$$\frac{V_{HWE}}{V_{NonHWE}} = \frac{1}{1+f} \qquad (5)$$
$$\frac{V_{HWE}}{V_{NonHWE}} = \frac{1}{1 - fp/(1-p)} \qquad (6)$$
RESULTS
When the frequency of AA homozygotes is greater than that predicted by HWE, the Type-I error rate can be
inflated, as illustrated in figure 1. Here, the true Type-I error rate is plotted as a function of the fractional
maximum discrepancy, for an assumed Type-I error
rate of either 5 percent or 1 percent. When discrepancy
is at its maximum, the true Type-I error rate can be as
high as 17 percent for an assumed rate of 5 percent,
and as high as 7 percent for an assumed rate of 1 percent.
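The inflation figures quoted above can be checked with a short large-sample calculation. Expression 4 is not legible in this copy of the text, so the form used here is an assumption: the true error rate is taken as 2[1 - Phi(z_crit * sqrt(V_HWE / V_NonHWE))], with V_HWE / V_NonHWE = 1/(1 + f) for an excess of AA homozygotes. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def true_type1_excess(alpha, f):
    """Large-sample true Type-I error rate of the HWE-based test.

    alpha is the assumed (nominal) two-sided error rate and f is the
    fractional maximum discrepancy (0 = HWE, 1 = maximum excess of AA).
    Assumes V_HWE / V_NonHWE = 1 / (1 + f).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ratio = 1 / (1 + f)
    # erfc(x / sqrt(2)) equals 2 * (1 - Phi(x)) for the standard normal
    return math.erfc(z_crit * math.sqrt(ratio) / math.sqrt(2))
```

Under these assumptions, true_type1_excess(0.05, 1.0) is about 0.166 and true_type1_excess(0.01, 1.0) is about 0.069, in line with the 17 percent and 7 percent reported in the text.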
When the frequency of AA homozygotes is less than
that predicted by HWE, the Type-I error rate can be
deflated, as illustrated in figure 2 for both a common
allele (p = 0.25) and a rare allele (p = 0.05). For a common allele (p = 0.25), the Type-I error rate can be quite
conservative, especially if the assumed error rate is 5
percent. The amount of conservatism is less when the
assumed Type-I error rate is small, as for the assumed
error rate of 1 percent in figure 2. As the allele frequency gets smaller, the amount of negative disequilibrium is also reduced, resulting in a less conservative
Type-I error rate (e.g., when p = 0.05 in figure 2).
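The conservative direction can be sketched the same way. This block assumes that the maximum deficiency of AA homozygotes is -p^2 (valid for p <= 0.5), so that P_AA = p^2 - f p^2, and that the true error rate has the large-sample form 2[1 - Phi(z_crit * sqrt(V_HWE / V_NonHWE))]. Both are assumptions of this sketch, chosen because they reproduce the large-sample values reported in table 1; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def true_type1_deficit(alpha, f, p):
    """Large-sample true Type-I error rate with a deficiency of AA.

    alpha: assumed two-sided error rate; f: fractional maximum
    discrepancy; p: frequency of allele A, assumed to be at most 0.5
    so that the largest possible deficiency in P_AA is p**2.
    """
    delta = -f * p * p                      # P_AA - p^2, negative here
    ratio = p * (1 - p) / (p * (1 - p) + delta)   # V_HWE / V_NonHWE
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.erfc(z_crit * math.sqrt(ratio) / math.sqrt(2))
```

At the maximum deficiency and an assumed rate of 5 percent, this gives roughly 0.016 for p = 0.25 and 0.044 for p = 0.05, matching the pattern described above: the rarer the allele, the less room there is for negative disequilibrium and the less conservative the test.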
The results in figures 1 and 2 are based on expression 4, which assumes that the sample size is large
enough for the normal approximation to be adequate.
To validate the adequacy of this approximation, simulations were performed. The genotypes for an equal
number of cases and controls (Nd = Nc = 50 or 100)
were sampled according to the probabilities $P_{AA} = p^2 + f\delta$, $P_{Aa} = 2p(1-p) - 2f\delta$, and $P_{aa} = (1-p)^2 + f\delta$,
where p = 0.10 and f = 0, 0.5, or 1.0. The maximum
discrepancy, $\delta$, was $p(1-p)$ for excess AA homozygotes.
[Figure 2 plot: true Type-I error rate (vertical axis) against fractional maximum discrepancy from 0.0 to 1.0 (horizontal axis), with curves labeled p = .05 and p = .25.]
FIGURE 2. True Type-I error rate as a function of fractional maximum discrepancy from Hardy-Weinberg Equilibrium (HWE) when there is a
deficiency of AA homozygotes, the allele A is either common (p = 0.25) or rare (p = 0.05), and the assumed Type-I error rate is either 0.01 or
0.05.
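The simulation check described in the text can be approximated with the sketch below. It is not the authors' code: the number of replicates, the seed, the pooled estimate of p in the HWE-based statistic, the per-group variance (p + P_AA - 2p^2)/(2N), and the maximum deficiency of -p^2 are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

def genotype_probs(p, f, delta_max):
    """(P_AA, P_Aa, P_aa) with P_AA = p^2 + f * delta_max, allele frequency fixed at p."""
    d = f * delta_max
    probs = (p * p + d, 2 * p * (1 - p) - 2 * d, (1 - p) ** 2 + d)
    return tuple(max(0.0, q) for q in probs)   # guard against rounding below 0

def sample_counts(rng, n, probs):
    """Draw n genotypes and return counts in the order (AA, Aa, aa)."""
    draws = rng.choices((0, 1, 2), weights=probs, k=n)
    return [draws.count(g) for g in (0, 1, 2)]

def z_stats(cases, controls):
    """Return (z_HWE, z_NonHWE) for two lists of genotype counts."""
    def summarize(g):
        n = sum(g)
        return n, (2 * g[0] + g[1]) / (2 * n), g[0] / n
    n_d, p_d, paa_d = summarize(cases)
    n_c, p_c, paa_c = summarize(controls)
    p = (n_d * p_d + n_c * p_c) / (n_d + n_c)          # pooled under the null
    v_hwe = p * (1 - p) * (1 / (2 * n_d) + 1 / (2 * n_c))
    v_non = ((p_d + paa_d - 2 * p_d ** 2) / (2 * n_d)
             + (p_c + paa_c - 2 * p_c ** 2) / (2 * n_c))
    diff = p_d - p_c
    # A zero variance (no A alleles sampled) is counted as no rejection.
    z_h = diff / math.sqrt(v_hwe) if v_hwe > 0 else 0.0
    z_n = diff / math.sqrt(v_non) if v_non > 0 else 0.0
    return z_h, z_n

def simulate(p=0.10, f=1.0, n=100, excess=True, reps=2000, alpha=0.05, seed=1):
    """Estimate the Type-I error rates of both statistics under the null."""
    rng = random.Random(seed)
    delta_max = p * (1 - p) if excess else -p * p
    probs = genotype_probs(p, f, delta_max)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rej_h = rej_n = 0
    for _ in range(reps):
        z_h, z_n = z_stats(sample_counts(rng, n, probs),
                           sample_counts(rng, n, probs))
        rej_h += abs(z_h) > z_crit
        rej_n += abs(z_n) > z_crit
    return rej_h / reps, rej_n / reps
```

Calling simulate() returns the estimated rejection rates for the HWE-based and corrected statistics at the maximum excess of AA homozygotes; simulate(excess=False) gives the deficient case. Because cases and controls are drawn from the same genotype distribution, every rejection is a false positive.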
TABLE 1. Type-I error rates for statistical methods with and without assumptions of Hardy-Weinberg
Equilibrium (HWE)

Frequency of     Sample size           f† = 0.0            f = 0.5             f = 1.0
AA homozygotes   (Nd = Nc)*          zHWE‡  zNonHWE‡     zHWE   zNonHWE      zHWE   zNonHWE

Excess           50                  0.033  0.041        0.091  0.050        0.166  0.054
                 100                 0.047  0.057        0.104  0.062        0.147  0.049
                 Large-sample
                 approximation       0.050  0.050        0.110  0.050        0.166  0.050

Deficient        50                  0.040  0.050        0.054  0.052        0.028  0.050
                 100                 0.057  0.061        0.038  0.056        0.035  0.061
                 Large-sample
                 approximation       0.050  0.050        0.044  0.050        0.038  0.050

* Nd, number of cases; Nc, number of controls. Type-I error rates for sample sizes of 50 and 100 are based on
simulations; large-sample approximation is based on expression 4 in the text.
† f = 0 implies HWE, and f = 1 is the maximum departure from HWE.
‡ zHWE, statistic assuming HWE; zNonHWE, statistic with variance corrections for departure from HWE.