Legal Medicine 3 (2001) 29–33

www.elsevier.nl/locate/legalmed

Population variation at the CODIS core short tandem repeat loci in Europeans
Bruce Budowle a,*, Ranajit Chakraborty b
a FBI, Laboratory Division, 935 Pennsylvania Ave, NW, Washington, D.C. 20535, USA
b Human Genetics Center, University of Texas School of Public Health, P.O. Box 20334, Houston, TX, USA
Received 9 December 2000; accepted 24 December 2000

Abstract
Substantial STR population data exist to estimate FST (or θ) values across Europeans. Eleven populations across Europe were analyzed, and the estimate over all 13 CODIS core STR loci is 0.0028. This value is much less than the conservative estimate of 0.01 advocated by the second National Research Council Report in 1996. Because of the low value for θ, whether independence is assumed or an adjustment for substructure is employed, there is little practical consequence for forensic purposes for estimating the frequency of a multiple locus DNA profile. If θ is used, a value of 0.01 is very conservative for Europeans. The same STR population data can be used for evolutionary studies on Europeans, and the calculated genetic distances are consistent with the ethnohistory of the populations. © 2001 Elsevier Science Ireland Ltd. All rights reserved.
Keywords: Europeans; Population databases; Short tandem repeat (STR); FST; θ; Genetic distance
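
The comparison summarized in the abstract (independence versus an adjustment for substructure) can be illustrated with a minimal sketch. It is not the authors' computation: it assumes the conditional match probability formulas commonly attributed to Balding and Nichols [20], and the allele frequencies, the three-locus profile, and the function names are hypothetical, chosen only for illustration.

```python
from math import prod
from typing import List, Optional, Tuple

# One genotype per locus as (p, q): frequencies of the two alleles.
# q is None for a homozygote. All values below are hypothetical.
Genotype = Tuple[float, Optional[float]]


def genotype_probability(p: float, q: Optional[float], theta: float) -> float:
    """Conditional genotype probability with a theta (FST) adjustment.

    With theta = 0 this reduces to the Hardy-Weinberg product rule
    (p*p for a homozygote, 2*p*q for a heterozygote).
    """
    denom = (1.0 + theta) * (1.0 + 2.0 * theta)
    if q is None:  # homozygote
        return (2.0 * theta + (1.0 - theta) * p) * (3.0 * theta + (1.0 - theta) * p) / denom
    return 2.0 * (theta + (1.0 - theta) * p) * (theta + (1.0 - theta) * q) / denom


def profile_frequency(profile: List[Genotype], theta: float = 0.0) -> float:
    """Multiply per-locus probabilities, assuming independence across loci."""
    return prod(genotype_probability(p, q, theta) for p, q in profile)


# Hypothetical three-locus profile (allele frequencies are not from the paper).
PROFILE: List[Genotype] = [(0.20, 0.15), (0.10, None), (0.25, 0.30)]

if __name__ == "__main__":
    for theta in (0.0, 0.0028, 0.01):
        freq = profile_frequency(PROFILE, theta)
        print(f"theta = {theta:<6}  frequency = {freq:.3e}  (about 1 in {1 / freq:,.0f})")
```

For these hypothetical frequencies, a larger θ makes the profile appear more common, which is the conservative direction, and the estimate at θ = 0.0028 lies between the product rule estimate and the estimate at θ = 0.01.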

* Corresponding author.
E-mail address: bbudowle@fbiacademy.edu (B. Budowle).

1344-6223/01/$ - see front matter © 2001 Elsevier Science Ireland Ltd. All rights reserved.
PII: S1344-6223(01)00008-6

1. Introduction

The short tandem repeat (STR) loci are highly polymorphic loci found in the human genome, are relatively small in size, and can be analyzed in a multiplex PCR fashion. Thus, for forensic purposes, analysis of STR loci is practical and highly informative for addressing the source of an evidence sample. The STR loci typically used in the United States for forensic analyses are CSF1PO, FGA, TH01, TPOX, vWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, and D21S11. These 13 STR loci are the core loci for the U.S. national DNA databank CODIS (Combined DNA Index System) [1].

When a match occurs between the DNA profiles of the evidentiary sample and the reference sample from a suspect (or a victim, depending on the circumstances of the case), an estimate of the rarity of the evidentiary DNA profile is made. The computation is based on using the product rule with adjustments for population substructure. This statistical approach is known as the match probability. The National Research Council (NRC) II Report [2] recommends Wright's FST statistic [3,4] (or θ) to estimate the degree of relatedness in a population and to correct for departures from Hardy–Weinberg expectations due to population substructure. The NRC II Report recommended that a conservative value for θ is 0.01, although the NRC II Report suggested that 0.03 may be used until more data were collected or for more genetically isolated populations, such as Native Americans.

Data support that the value of θ for most population groups is much lower than 0.01. Budowle et al. [5]
found that θ estimates over all thirteen core CODIS STR loci are 0.0006 for African Americans, -0.0005 for U.S. Caucasians, 0.0021 for Hispanics, and 0.0039 for Asians. Foreman et al. [6] and Foreman and Lambert [7] have found the value for θ to be below 0.01 between population samples within England and between England and the Netherlands or New Zealand Caucasians. Their studies only looked at four STR loci: FGA, TH01, vWA, and D21S11, and the diversity of European populations was small. However, substantial European population data are available to address the application of statistical inferences and the determination of realistic values for θ.

In the current paper, data from 11 European population groups were analyzed for the 13 CODIS STR loci. The results show that there is little genetic variation among Europeans, that a θ value of 0.01 is conservative for European subpopulations, and that estimates of the rarity of a DNA profile are not affected to any consequence whether substructure effects are considered or ignored.

2. Materials and methods

Samples: A total of 11 sample populations across Europe were analyzed. Czech data were reported by Vanek et al. [8]; Finnish data were kindly provided by Matti Karjalainen, Crime Laboratory, Helsinki, Finland; Greek Cypriot data were reported by Bashiardes et al. [9]; two Italian population studies were used, one reported by Biondo et al. [10] (Italy 1) and the other reported by Garofano et al. [11] (Italy 2); Slovene data were reported by Drobnic et al. [12]; two Spanish population studies were used, one as reported by Arce et al. [13] (Spain 1) and the other reported by Entrala et al. [14] (Spain 2); Swiss data were reported by Gehrig et al. [15]; Turkish data were kindly provided by Bulbin Akbasak, Inonu University, Malatya, Turkey, and Dennis Reeder, Applied Biosystems, Foster City, CA; the United States Caucasian data were reported by Budowle et al. [5]. The STR loci typed in each sample population and the number of individuals typed are listed in Table 1. The minimum sample size at a locus is at least N = 150; thus, sufficient allelic data are available in these 11 population subgroups to make inferences for forensic analyses. These populations were used because raw genotype data were available.

STR amplification and typing: The DNA samples were amplified using kits from either PE Biosystems (Foster City, CA) or the Promega Corporation (Madison, WI) and following the manufacturers' recommendations. The amplified products were analyzed using either an ABI Prism 310 Genetic Analyzer, ABI Prism 373 or 377 DNA Sequencer (PE Biosystems, Foster City, CA), or an FMBIO II (MiraiBio/Hitachi Genetic Systems, Alameda, CA) according to the manufacturer's recommended protocol. Details about the analytical process can be obtained by contacting the authors of each population study.

Statistical analysis: The values for θ were determined as described by Weir and Cockerham [16]. Genetic distance, with bias correction, was estimated according to Nei [17,18]. The UPGMA algorithm was used to provide a graphical representation of genetic distance data so that inferences of relationships may be made. All analyses were performed using the TFPGA program kindly provided by M. Miller (Northern Arizona University at Flagstaff).

3. Results and discussion

To compare sample populations for allele frequency differences at a locus, one might employ a standard contingency table analysis. Finding no significant differences between two allele distributions can be meaningful for inferring that there is little difference between the data sets and thus, there will be little difference on multiple locus profile frequency estimates using either data set. However, an observation of a significant difference may not be meaningful. For even moderately large sample sizes, such as those used in this study, contingency table tests exhibit extreme sensitivity to small perturbations, such that the null hypothesis is rejected even if the difference is of little consequence [19]. Traditional population genetic approaches that describe the amount of heterogeneity among populations are much more informative than are contingency table tests. Parameters, such as FST, can be a better estimate of population diversity. The main purpose of this study is to calculate realistic values of θ for the CODIS core STR
loci in Europeans, and thereby determine if the NRC II recommendations are conservative.

Table 1
The loci typed and number of individuals typed per locus in each European population sample

Population D3S1358 vWA FGA D8S1179 D21S11 D18S51 D5S818 D13S317 D7S820 CSF1PO TPOX TH01 D16S539

Czech Republic 201 201 201 201 201 201 201 201
Finland 469 469 469 469 469 469 469 469
Cyprus (Greek) 152 152 152 152 152 152 152 152 152
Italy 1 618 618 618 618 618 618 618 618 618 618 618 618 618
Italy 2 223 223 223 223 223 223 223 223 223 223 223 223 223
Slovenia 321 321 321 321 321 321 321 321
Spain 1 401 401 401 401 401 401 401 401 401
Spain 2 171 171 171 171 171 171 171 171 171
Switzerland 206 206 206 206 206 206 206 206 206 206 206 206 206
Turkey 198 198 198 198 197 198 198 198 198 198 198 198 198
US Caucasian 203 196 196 196 196 196 195 196 203 203 203 203 202

The 11 European population groups represent sampling from a wide geographic area of Europe. The average heterozygosity ranges from 0.768 (Italy 1) to 0.817 (Spain 2) and is shown in Table 2.

Table 2
Average heterozygosity for each European population sample

Population        Average heterozygosity   Average heterozygosity
                  (Direct count)           (Unbiased estimate)
Czech Republic    0.8165                   0.8139
Finland           0.7980                   0.8032
Cyprus (Greek)    0.8136                   0.8145
Italy 1           0.7699                   0.7684
Italy 2           0.7685                   0.7848
Slovenia          0.8244                   0.8168
Spain 1           0.8132                   0.8117
Spain 2           0.8129                   0.8173
Switzerland       0.7724                   0.7868
Turkey            0.7451                   0.7795
US Caucasian      0.7852                   0.7831

All common alleles are observed in all European samples in this study and the degree of polymorphism is similar (data not shown). Thus, it can be anticipated that the value of θ will be low among the 11 European sample populations. Table 3 displays the θ values for the 13 STR loci in the 11 European population groups. The estimate over all 13 STR loci (θ = 0.0028) is much smaller than the value of 0.01 recommended by the National Research Council [2].

Table 3
FST estimates (a) for the 11 European population samples

Locus                     FST
D3S1358                   0.0049
vWA                       0.0022
FGA                       0.0029
D8S1179                   0.0042
D21S11                    0.0017
D18S51                    0.0021
D5S818                    0.0025
D13S317                   0.0001
D7S820                    0.0027
CSF1PO                    -0.0004
TPOX                      0.0011
TH01                      0.0074
D16S539                   0.0040
FST over all loci         0.0028
Jackknife over all loci   0.0028 ± 0.0006

(a) FST estimated according to Weir and Cockerham [16].

The estimate for the 11 European sample populations is higher than that reported for nine North American Caucasian data sets (θ = -0.0005), but is expected. The greater amalgamation of European populations in the United States should reduce the value. Regardless, a θ value of 0.01 is conservative for forensic applications. Furthermore, the effect of θ is of little consequence for forensically relevant populations and DNA profile frequency estimates. Even so, some may argue that the population substructure effect should be evaluated with adjustments based on computations of the conditional probability given that the profile has been observed in the suspect [20,21]. The need for a conditional probability logically applies
only when the true contributor of the profile belongs to the same subpopulation as the suspect. However, when employing a conservative value for θ of 0.01, there still is little impact on the estimate of the rarity of a profile. For example, Chakraborty et al. [22] showed that even when considering the upper 95% confidence limit of the most common 13 STR locus profile using the Swiss database, the rarity of the profile changes from 1 in 127 × 10^9 to 1 in 89 × 10^9 if the conditional probability is used. Therefore, the assumption of independence has little practical consequence on such estimates. The nominal difference on profile frequency estimates logically would decrease more so if a more realistic θ of 0.0028 was used for Europeans, and there is no noticeable difference in U.S. Caucasians. One should not infer from the use of a θ of 0.01 that the degree of inbreeding in Europeans is high. The value is an upper bound conservative estimate for forensic applications only.

Foreman et al. [6] reported similar estimates (as in the current study) for θ in English Caucasians for four STR loci: FGA, TH01, vWA, and D21S11. Interestingly, they found the locus D21S11 the least differentiated and the vWA locus to be the most differentiated of the loci among the populations studied. In the current European study, among the same four STR loci, the D21S11 locus also had the lowest θ value (θ = 0.0017). In contrast, the TH01 locus, not vWA, has the highest θ value in our 11 European population study (θ = 0.0074). The difference in these results between our current study and that of the Foreman et al. [6] study could be due to the fact that they merged the TH01 9.3 and 10 alleles in their analysis. This may reduce population differences at the TH01 locus which may be notable especially in light of the finding of Biondo et al. [10] who observed a cline in the frequency of the TH01 10 allele from north to south Italy.

Fig. 1. UPGMA tree for European subpopulation affinities based on the 13 CODIS core STR loci.

Table 4
Nei's [18] unbiased distance measures for the 11 European population samples

                Czech       Finland     Cyprus      Italy 1     Italy 2     Slovenia    Spain 1     Spain 2     Switzerland Turkey      US
                Republic                (Greek)                                                                                         Caucasian
Czech Republic  *****
Finland         0.0202      *****
Cyprus (Greek)  0.0277      0.0388      *****
Italy 1         0.0183      0.0408      0.0110      *****
Italy 2         0.0177      0.0304      0.0105      0.0000      *****
Slovenia        -0.0006     0.0243      0.0127      0.0115      0.0102      *****
Spain 1         0.0143      0.0319      0.0166      0.0044      0.0042      0.0093      *****
Spain 2         0.0169      0.0295      0.0170      0.0019      0.0074      0.0122      -0.0009     *****
Switzerland     0.0114      0.0222      0.0170      0.0058      0.0048      0.0074      0.0022      -0.0003     *****
Turkey          0.0324      0.0298      0.0098      0.0075      0.0060      0.0245      0.0153      0.0134      0.0143      *****
US Caucasian    -0.0010     0.0111      0.0247      0.0033      0.0034      0.0012      0.0035      0.0038      0.0022      0.0145      *****

The allele frequency data also can show the evolutionary relationship among European Caucasians. A tabulation of distance measures described by Nei [17,18] is shown in Table 4. All distance measures are small. The UPGMA tree shows a graphical representation of genetic distance data (Fig. 1). Although the distances among the Europeans are small, certain groups still can be seen to cluster, consistent with their
ethnohistory (and geographic location). The two Italian groups fall into one clade, as do the Turks and Greek Cypriots, the Czechs and Slovenes, and the two Spanish sample populations. Furthermore, the Finns are the most distant.

In conclusion, there is little differentiation among European populations for the 13 CODIS core STR loci. Whether independence is assumed or an adjustment for substructure using θ is employed is of little practical consequence for estimating a multiple locus DNA profile frequency. If θ is used, a value of 0.01 is very conservative for Europeans and is even more conservative for US Caucasians. These results are consistent with the theory of Li and Chakravarti [23] showing that, for realistic models of population heterogeneity, the use of the product rule for calculating DNA profile frequencies is conservative when population substructure is present but ignored.

Acknowledgements

This is publication number 01-07 of the Laboratory Division of the Federal Bureau of Investigation. Names of commercial manufacturers are provided for identification only, and inclusion does not imply endorsement by the Federal Bureau of Investigation.

References

[1] Budowle B, Moretti TR, Niezgoda SJ, Brown BL. CODIS and PCR-based short tandem repeat loci: law enforcement tools. Second European Symposium on Human Identification 1998, Madison, Wisconsin: Promega Corporation, 1998. pp. 73–88.
[2] National Research Council II Report. The evaluation of forensic evidence. Washington, DC: National Academy Press, 1996.
[3] Wright S. Coefficients of inbreeding and relationship. Am Natural 1922;56:330–338.
[4] Wright S. The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution 1965;19:395–420.
[5] Budowle B, Shea B, Niezgoda S, Chakraborty C. CODIS STR Loci Data from 41 Sample Populations. J Forens Sci 2001;46:29–65.
[6] Foreman LA, Lambert JA, Evett IW. Regional genetic variation in Caucasians. Forens Sci Int 1998;95:27–37.
[7] Foreman LA, Lambert JA. Genetic differentiation within and between four UK ethnic groups. Forensic Science Service Report No. RR 803. Birmingham, England: British Crown Copyright, 1999.
[8] Vanek D, Roman H, Budowle B. Czech population data on ten short tandem repeat loci of SGM Plus STR system kit using DNA purified in FTA cards. Forens Sci Int 2001 (in press).
[9] Bashiardes E, Manoli P, Budowle B, Cariolou MA. Data on nine STR loci used for forensic and paternity testing in the Greek Cypriot population of Cyprus. Forens Sci Int 2001 (in press).
[10] Biondo R, Spinella A, Montagna P, Walsh S, Holt C, Budowle B. Regional Italian allele at nine short tandem repeat loci. Forens Sci Int 2001;115:95–98.
[11] Garofano L, Pizzamiglio M, Vecchio C, Lago G, Floris T, D'Errico G, Brembilla G, Romano A, Budowle B. Italian population data on thirteen short tandem repeat loci: THO1, D21S11, D18S51, VWA, FGA, D8S1179, TPOX, CSF1PO, D16S539, D7S820, D13S317, D5S818, D3S1358. Forens Sci Int 1998;97:53–60.
[12] Drobnic K, Regent A, Budowle B. STR data for the AmpFlSTR SGM Plus from Slovenia. Forens Sci Int 2001;115:107–109.
[13] Arce B, Heinrichs B, Armenteros MF, Carrasco F, Lorente JA, Budowle B. Spanish population data on nine STR loci. J Forens Sci 2001 (in press).
[14] Entrala C, Lorente M, Lorente JA, Alvarez JC, Moretti T, Budowle B, Villanueva E. Fluorescent multiplex analysis of nine STR loci and the amelogenin locus: Spanish population data. Forens Sci Int 1998;98:179–183.
[15] Gehrig C, Hochmeister M, Borer UV, Budowle B. Swiss Caucasian population DNA data for 13 STR loci using AmpFlSTR Profiler Plus and Cofiler PCR amplification kits. J Forens Sci 1999;44:1035–1038.
[16] Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution 1984;38:1358–1370.
[17] Nei M. Genetic distance between populations. Am Natural 1972;106:283–292.
[18] Nei M. Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 1978;89:583–590.
[19] Rudas T, Clogg CC, Lindsey BG. A new index of fit based on mixture methods for the analysis of contingency tables. J R Stat Soc Series B 1994;56:623–639.
[20] Balding DJ, Nichols R. DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forens Sci Int 1994;64:125–140.
[21] Weir BS. DNA match and profile probabilities. Forens Sci Comm 2001; January Volume 3. http://www.fbi-gov/programs/1ab/fsc.
[22] Chakraborty R, Stivers DN, Su B, Zhong Y, Budowle B. The utility of STR loci beyond human identification: implications for the development of new DNA typing systems. Electrophoresis 1999;20:1682–1696.
[23] Li CC, Chakravarti A. DNA profile similarity in a subdivided population. Hum Hered 1994;44:100–109.
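
As a supplementary illustration of the quantities reported in Tables 2 and 4, the unbiased heterozygosity and unbiased genetic distance of Nei [18] can be sketched as follows. This is an illustrative reimplementation, not the TFPGA code used in the study. It assumes a single sample size per population (in the study, sample sizes can differ by locus), and the allele frequencies and function names are hypothetical. Because the within-population identities are bias-corrected, the distance estimate can be slightly negative for very similar samples, which is consistent with the small negative entries in Table 4.

```python
from math import log, sqrt
from typing import List, Sequence


def unbiased_heterozygosity(freqs: Sequence[float], n: int) -> float:
    """Unbiased heterozygosity at one locus; n is the number of individuals."""
    return 2 * n * (1.0 - sum(x * x for x in freqs)) / (2 * n - 1)


def unbiased_identity(freqs: Sequence[float], n: int) -> float:
    """Bias-corrected probability that two sampled alleles are identical."""
    return (2 * n * sum(x * x for x in freqs) - 1.0) / (2 * n - 1)


def nei_unbiased_distance(
    pop_x: List[Sequence[float]],
    pop_y: List[Sequence[float]],
    n_x: int,
    n_y: int,
) -> float:
    """Unbiased distance between two populations.

    pop_x and pop_y hold one allele-frequency vector per locus, with alleles
    listed in the same order in both populations.
    """
    loci = len(pop_x)
    g_x = sum(unbiased_identity(f, n_x) for f in pop_x) / loci
    g_y = sum(unbiased_identity(f, n_y) for f in pop_y) / loci
    g_xy = sum(
        sum(a * b for a, b in zip(fx, fy)) for fx, fy in zip(pop_x, pop_y)
    ) / loci
    return -log(g_xy / sqrt(g_x * g_y))


# Hypothetical two-locus allele frequencies for two samples of 200 individuals.
POP_A = [[0.50, 0.30, 0.20], [0.60, 0.40]]
POP_B = [[0.45, 0.35, 0.20], [0.55, 0.45]]

if __name__ == "__main__":
    print(unbiased_heterozygosity(POP_A[0], 200))
    print(nei_unbiased_distance(POP_A, POP_B, 200, 200))
```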
