
Molecular Ecology Resources (2016) 16, 694–700 doi: 10.1111/1755-0998.12496

POLYPATEX: an R package for paternity exclusion in autopolyploids
ALEXANDER B. ZWART,* CAROLE ELLIOTT, TARA HOPLEY, DAVID LOVELL and
ANDREW YOUNG
*CSIRO Data61, GPO Box 664, Canberra, ACT 2601, Australia, CSIRO National Facilities and Collections, GPO Box 1600,
Canberra, ACT 2601, Australia, Queensland University of Technology (QUT), GPO Box 2434, Brisbane, QLD 4001, Australia

Abstract
Microsatellite markers have demonstrated their value for performing paternity exclusion and hence exploring mating patterns in plants and animals. Methodology is well established for diploid species, and several software packages exist for elucidating paternity in diploids; however, these issues are not so readily addressed in polyploids due to the increased complexity of the exclusion problem and a lack of available software. We introduce POLYPATEX, an R package for paternity exclusion analysis using microsatellite data in autopolyploid, monoecious or dioecious/bisexual species with a ploidy of 4n, 6n or 8n. Given marker data for a set of offspring, their mothers and a set of candidate fathers, POLYPATEX uses allele matching to exclude candidates whose marker alleles are incompatible with the alleles in each offspring–mother pair. POLYPATEX can analyse marker data sets in which allele copy numbers are known (genotype data) or unknown (allelic phenotype data); for data sets in which allele copy numbers are unknown, comparisons are made taking into account all possible genotypes that could arise from the compared allele sets. POLYPATEX is a software tool that provides population geneticists with the ability to investigate the mating patterns of autopolyploids using paternity exclusion analysis on data from codominant markers having multiple alleles per locus.
Keywords: allele matching, microsatellite, pollen dispersal, polyploid
Received 22 July 2014; revision received 13 November 2015; accepted 19 November 2015

Correspondence: Alexander Zwart, Fax: +61 2 62167007; E-mail: Alec.Zwart@csiro.au

2015 John Wiley & Sons Ltd

Introduction

Population genetics has utilized microsatellite molecular markers to understand the patterns of pollen and seed dispersal and recruitment in plants as well as mating patterns in animals, to provide information to effectively manage populations, either from a conservation or eradication perspective (Ouborg et al. 1999). Microsatellites are a suitable genetic marker system for studying population genetics as they are mostly neutral and highly variable (Selkoe & Toonen 2006). These markers have proven to be ideal for determining parentage within and among populations (Ashley 2010); however, assigning parentage requires knowledge of a candidate parent's genotype, including knowledge of the exact number of allelic copies present across all loci. For diploid species, this is straightforward and much of the theory and software has been developed for this ploidy level (Jones et al. 2010).

Determining polyploid genotypes when allele copy numbers are unknown is more difficult, as multiple genotypes are available for a given set of unique alleles. For example, a hexaploid (6n) with two alleles (AB) can have five different genotypes (AAAAAB, AAAABB, AAABBB, AABBBB, ABBBBB), and this ambiguity makes the search for parents of offspring extremely complex. The incidence of polyploidy in plant populations is more common and widespread than initially estimated (Otto & Whitton 2000; Wood et al. 2009); however, few population genetic studies have been conducted on these organisms due to the complexity of analysing this type of data.

Several software packages can deal with autopolyploid data in terms of data manipulation, determining genetic distance and population allele frequencies (e.g. POLYSAT, Clark & Jasieniuk 2011; SPAGEDI, Hardy & Vekemans 2002; GENODIVE, Meirmans & Van Tienderen 2004; POPDIST, Guldbrandtsen et al. 2000 and AUTOTET, Thrall & Young 2000). While programs are available to assign paternity or parentage to individuals of a ploidy level greater than diploid, they have limitations. ORCHARD (Spielmann et al. 2015) is limited to tetraploids. COLONY (Jones & Wang 2002), FAMOZ (Gerber et al. 2003) and the SAS scripts of Riday et al. (2013, 2015) convert data to pseudodiploid-dominant data before analysis. This is because of the



ambiguity in determining allele frequencies when dealing with polyploid allelic phenotype data (Jones et al. 2010), unless the data sets are treated as dominant instead of codominant markers (e.g. Riday et al. 2013, 2015; Wang & Scribner 2014), as allele frequencies are the foundation for all likelihood-based methods. Wang & Scribner (2014) transform polyploid codominant genotypes from microsatellite data to pseudodiploid-dominant genotypes (presence/absence only) and demonstrate how useful this conversion can be, when coupled with the program COLONY (Jones & Wang 2002), for determining parentage and sibship in polyploids. However, this transformation results in the loss of valuable information, and with increasing polyploidy levels, there is a decrease in parentage assignment and exclusion accuracies (Wang & Scribner 2014).

POLYPATEX is an R (R Core Team 2015) package for conducting paternity exclusion analysis in autopolyploid monoecious, dioecious or bisexual species having ploidy level p of 4n, 6n or 8n. Developed in the context of microsatellite data, POLYPATEX can also be used for other codominant markers having multiple alleles per locus, for example allozymes. POLYPATEX is not optimized for data sets having very large numbers of loci, such as SNP data. For plants, self-compatible (i.e. mother included as a candidate father) and self-incompatible breeding systems are options in the algorithm. POLYPATEX applies allele matching at each locus to determine whether a candidate father's allele set (the (up to p) alleles observed at the given locus) is compatible with the corresponding allele sets in an offspring–mother pair. POLYPATEX can analyse marker data sets in which allele copy numbers are known (genotype data) or unknown (allelic phenotype data). For data sets in which allele copy numbers are unknown, POLYPATEX considers all possible genotypes for each locus arising from the observed allele sets in candidate father, offspring and mother, to determine whether a match is possible. Per-locus comparisons are then summarized across all loci, resulting in tables giving the identification (ID) and total counts of nonexcluded candidate fathers (which we refer to as potential fathers) for each offspring.

The advantage of the POLYPATEX algorithm over a simpler presence-/absence-based algorithm is that all available information is utilized to exclude candidates. For example, consider a tetraploid where the offspring has the genotype ABCE and the mother ABCD, with one candidate father having the genotype EFGG, and another CEGG. In this situation, the simpler presence/absence algorithm would exclude neither candidate because both candidates contain the allele not found in the mother (E). However, using the POLYPATEX algorithm, the first candidate would be excluded, because it could not have contributed a full gamete (two alleles) to the offspring. This advantage also applies to similar phenotypic situations (i.e. tetraploid offspring BC and mother AB; candidate excluded with POLYPATEX and included with presence/absence method: CEFG; candidates not excluded with either method: CE or CFG). Therefore, the genotype and gamete simulations implemented in POLYPATEX provide a robust basis for refining the level of detail to which candidates can be excluded.

The genetic model

POLYPATEX assumes the following genetic model, which is based on Mendelian rules of inheritance, to exclude candidates as potential fathers using incompatibilities between parent and offspring pairs. In an autopolyploid species of ploidy p, at a given locus,

• p/2 of the p alleles in the offspring are assumed to have been selected, without replacement, from the mother's p alleles (the maternal gamete).
• The remaining p/2 alleles in the offspring are assumed to have been selected, without replacement, from the father's p alleles (the paternal gamete).
• The relationship between mother and offspring is known, and the aim of the analysis is to determine which of one or more candidate fathers is capable of producing a paternal gamete compatible with the alleles in the offspring–mother pair.
• Comparisons are made on the basis of allele presence/absence only; POLYPATEX does not use population allele frequencies to compute likelihoods for paternity.

To allow for the possibility of phenomena such as double reduction that violate the above model (as well as the inevitable genotyping errors), POLYPATEX functions potentialFatherIDs and potentialFatherCounts include an argument, mismatches, that can be used to specify the maximum number of mismatching (and nonmissing) loci between candidate father and offspring that are allowed before the candidate is excluded as a potential father. The default is to allow no mismatching loci in a nonexcluded candidate.

Data input format, loading and preprocessing

POLYPATEX functions require data to be presented as a table with one row per individual (mother, candidate father or offspring). Initial columns contain individual IDs, population identifier, ID of each offspring's mother and, for dioecious species, adult gender. For ploidy p and k observed loci, k further blocks of p columns each contain the allele labels. Cells in these k blocks should be left blank as necessary when fewer than p alleles were observed.
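As a rough illustration of the table layout just described, the base-R sketch below builds a tiny tetraploid data set with two loci. The column names (id, popn, mother, gender, Loc1_1 and so on), the individual IDs and the allele labels are hypothetical placeholders chosen for this example, not names taken from the package documentation; only the overall shape (initial identifier columns followed by k blocks of p allele columns, with blanks for unobserved alleles) follows the text.

```r
# Hypothetical example: ploidy p = 4, k = 2 loci, dioecious species.
p <- 4
k <- 2

# Initial columns: individual ID, population, mother's ID, adult gender.
toy <- data.frame(
  id     = c("M1", "F1", "M1-001"),
  popn   = c("A", "A", "A"),
  mother = c("", "", "M1"),     # only the offspring row names a mother
  gender = c("F", "M", ""),     # adults only
  stringsAsFactors = FALSE
)

# One block of p allele columns per locus; "" marks an unobserved allele.
locus1 <- rbind(c("A", "B", "C", "D"),
                c("C", "E", "G", ""),
                c("A", "B", "C", "E"))
locus2 <- rbind(c("P", "Q", "", ""),
                c("Q", "R", "", ""),
                c("P", "Q", "R", ""))
colnames(locus1) <- paste0("Loc1_", seq_len(p))
colnames(locus2) <- paste0("Loc2_", seq_len(p))

toy <- cbind(toy, locus1, locus2, stringsAsFactors = FALSE)

# The allele part of the table should have exactly k * p columns.
stopifnot(ncol(toy) - 4 == k * p)

# Number of alleles observed per individual at locus 1.
observed1 <- rowSums(toy[, paste0("Loc1_", seq_len(p))] != "")
print(toy)
print(observed1)
```

A table of this shape, saved with write.csv, is the kind of file the text describes as input; the exact header names the package expects should be checked against its documentation.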


POLYPATEX provides the function inputData to load the data from a comma-separated value (CSV) formatted file into R, storing the data set as an R data frame. inputData passes this data frame to function preprocessData, which performs a number of checks and preprocessing steps to help ensure the validity of the data set for analysis by other POLYPATEX functions. In particular, preprocessData checks for mismatches at each locus between mother and offspring. These arise when the mother's allele set cannot (in the absence of, say, a mutation) generate a gamete compatible with the offspring's allele set for that locus. The user may specify the maximum number of such mismatches (0 to k − 1) that are allowed before the offspring is removed from the data set. When an offspring is not removed, loci that mismatch with its mother are set to contain no alleles (i.e. they are set to be missing) in the offspring. The default is to remove offspring containing any mismatches with their mothers. Mother–offspring mismatch details are reported to the user so that these cases can be investigated for genotyping errors, including whether the problem lies in a mother's allele sets rather than in those of her offspring.

For genotype data in a species of ploidy p, POLYPATEX requires every allele set to contain exactly p alleles, or none. When preprocessData finds nonmissing allele sets with fewer than p alleles in genotype data sets, they are reported to the user, then are set to be missing. In allelic phenotype data sets, there is no requirement to observe exactly p unique alleles, so this adjustment is only relevant to genotypic data.

After these and other checks, preprocessData removes individuals from the data set having fewer than a user-specified minimum number (lociMin) of nonmissing allele sets. When a mother is so removed, all of her offspring are also removed from the data frame.

Paternity exclusion

Two functions are provided by POLYPATEX for performing paternity exclusion, one for analysing genotype data (genotPPE) and one for allelic phenotype data (phenotPPE).

For comparison of genotypes, genotPPE partitions each offspring allele set into alleles that also appear in the mother, and alleles that do not (and hence must be provided by the father). For a candidate not to be excluded as a potential father, it must account for all of the latter alleles, plus as many of the shared alleles as are needed to make a complete gamete of p/2 alleles. Function genotPPE takes proper account of allele multiplicities in making these comparisons.

In the more common case of allelic phenotype data, a nonmissing allele set at a given locus may consist of 1 to p observed alleles. The locus genotype is known when p alleles are observed, and when only one allele is observed, the locus genotype can be inferred as comprising p copies of that allele. But for 2 to p − 1 observed alleles, the phenotype gives rise to more than one possible genotype. In the allele-matching process, therefore, the possible combinations of genotypes arising from offspring, mother and candidate phenotypes must be searched for genotype combinations that allow a match, before a candidate can be claimed as a possible match for the offspring. For efficiency reasons, this search is implemented in phenotPPE using lookup tables, for each ploidy covered by POLYPATEX (4n, 6n and 8n at time of publication).

In either routine, when one or more of the allele sets in offspring, mother and candidate father are missing, comparisons cannot be made at that locus, so the affected locus is subsequently ignored for that trio of individuals.

In the situation where more than one candidate father is the potential donor for an offspring, the user needs to consider the method best suited to their research question and species when handling an exclusion analysis that is unresolved. Some options include the following: (i) considering the offspring as derived from that group of candidate fathers without a definitive level of exclusion; (ii) fractionally assigning males an equal proportion of an offspring (Goto et al. 2004); (iii) choosing the closest male (plants or other sessile organisms only) (Hardesty et al. 2006); or (iv) fractionally assigning to a male (sessile organisms only) based on their location/distance from the mother. For a general review of methods of parentage analysis, see Jones et al. (2010).

Exclusion analysis results summaries and export

Results from either exclusion routine are returned in an R list structure, whose contents are explained further in the POLYPATEX documentation. Two further POLYPATEX functions provide more convenient summarizations of the per-locus exclusion results across loci. Function potentialFatherCounts returns an R data frame with columns containing offspring ID, corresponding mother ID and the number of candidates flagged as potential fathers for each offspring. Function potentialFatherIDs provides a similar data frame listing the IDs of these potential fathers. Both functions include the argument mismatches, which sets the maximum number of mismatching loci allowed between candidate father and offspring before the candidate is excluded as a potential father. Results from the two summary functions can be exported to file using R functions such as write.csv.
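The summarization across loci can be pictured with a small base-R sketch. It does not call the package and makes no claim about the structure of the list returned by the exclusion routines: the matrix locusMatch and the helper summariseCandidates are hypothetical, and only the argument names mismatches and VLTMin and the column names FLCount and VLTotal echo names that appear elsewhere in this article.

```r
# Hypothetical per-locus results for ONE offspring:
# rows = candidate fathers, columns = loci.
# TRUE = compatible, FALSE = mismatch, NA = comparison not possible.
locusMatch <- rbind(
  C1 = c(TRUE,  TRUE,  TRUE, TRUE),
  C2 = c(TRUE,  FALSE, TRUE, NA),
  C3 = c(FALSE, FALSE, TRUE, TRUE),
  C4 = c(TRUE,  NA,    NA,   NA)
)

# Illustrative helper (not a package function): keep candidates with enough
# valid loci and no more than the allowed number of mismatching loci.
summariseCandidates <- function(m, mismatches = 0, VLTMin = 1) {
  validLoci   <- rowSums(!is.na(m))        # loci with a valid comparison
  matchedLoci <- rowSums(m, na.rm = TRUE)  # loci at which the candidate matches
  nMismatch   <- validLoci - matchedLoci
  keep <- validLoci >= VLTMin & nMismatch <= mismatches
  data.frame(candidate = rownames(m)[keep],
             FLCount   = matchedLoci[keep],
             VLTotal   = validLoci[keep],
             row.names = NULL,
             stringsAsFactors = FALSE)
}

strict  <- summariseCandidates(locusMatch, mismatches = 0, VLTMin = 2)
lenient <- summariseCandidates(locusMatch, mismatches = 1, VLTMin = 2)
print(strict)   # only C1 survives
print(lenient)  # C1 and C2 survive; C4 has too few valid loci
```

Allowing one mismatch retains C2, which matches at two of its three valid loci, and this is the kind of allowance for genotyping error that the mismatches argument is described as providing.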


Example

The POLYPATEX R package contains two example microsatellite data sets in the required input file format. Once POLYPATEX is installed, the following R command will print out the location of the files:
> system.file("extdata", package = "PolyPatEx")
File GF_Phenotype.csv contains data from seven loci in a hexaploid, monoecious species, Eremophila glabra ssp. glabra (Elliott 2010). File FR_Genotype.csv contains data from seven loci in a tetraploid, dioecious species, Salix cinerea (Hopley 2011). Appendix 1 shows example R code to load the allelic phenotype data set into R, perform the exclusion analysis, and output results from the summary functions potentialFatherCounts and potentialFatherIDs to CSV files for scrutiny in a spreadsheet application. The code in Appendix 1 assumes that the data file has been copied to a suitable working directory, and that the working directory of the R session has been set to this directory prior to running the code.

The first 12 lines of the table produced by potentialFatherIDs are shown in Table 1.

Table 1 The first twelve lines of the output table produced by potentialFatherIDs. For each progeny, each potential father identified by the algorithm is listed, along with the number of loci at which a match was made (FLCount: Father Locus Count), and the total number of loci at which a valid comparison was possible (VLTotal: Valid Loci Total). NA is R's code for a missing datum, and appears in columns FLCount and VLTotal when no potential father has been identified for a given offspring

Progeny    Mother  PotentialFather  FLCount  VLTotal
GF1-2310   GF1     None             NA       NA
GF1-2311   GF1     None             NA       NA
GF1-2315   GF1     GF21             6        7
GF1-2315   GF1     GF6              7        7
GF1-2316   GF1     GF14             7        7
GF1-2316   GF1     GF23             6        7
GF1-2317   GF1     GF21             6        6
GF1-2317   GF1     GF6              6        6
GF3-2337   GF3     GF2              7        7
GF3-2338   GF3     GF14             7        7
GF3-2339   GF3     None             NA       NA
GF3-2341   GF3     GF13             7        7

Performance testing and simulations

We compared the performance of POLYPATEX (i.e. exclusion based) against COLONY (i.e. likelihood based) using the phenotypic, autohexaploid example data set of E. glabra. We converted the data to pseudodiploid-dominant genotypes following Wang & Scribner (2014) and analysed it in COLONY (version 2.0.5.9; Jones & Wang 2002) using several different parameter settings. The type of analysis method was full-likelihood with polygamous, inbreeding and monoecious as the core parameters. The data set contained 95 loci, and we defined the known maternal sibships. POLYPATEX was run as per Appendix 1, except that the number of mismatches allowed in the father was set to zero.

Table 2 The number of offspring from the Eremophila glabra example data set (n = 40) that fall into the following paternity assignment groups: (i) selfing, maternal plant included in the paternity list; (ii) single father, only one father listed; (iii) multiple fathers, more than one candidate available to sire the offspring; and (iv) no fathers, no candidate fathers assigned from the list of potential fathers, based on the outcome of POLYPATEX (PPE) and COLONY analysis. COLONY was run at four probabilities that the father was included in the candidature file (pr = 0.20, pr = 0.50, pr = 0.80 and pr = 0.95). The results of COLONY are split into individuals that had the same outcome as PPE and those that were different to the PPE outcome within each probability class. The ranges in likelihoods from COLONY for each group (parent pair or maternal only) are in parentheses (pr = 1.00 if none presented)

Program  Male probability  Outcome compared to PPE  Selfing  Single father  Multiple fathers  No fathers

POLYPATEX 1 6 7 26
COLONY 0.20 Same 1 3 (0.99–1.00) 4* 23 (0.99–1.00)
Different 4 5
0.50 Same 1 3 (0.99–1.00) 4* 23 (0.91–1.00)
Different 4 5
0.80 Same 1 3 (0.99–1.00) 4* 23 (0.64–1.00)
Different 4 (0.99–1.00) 5 (0.96–1.00)
0.95 Same 1 4 (0.57–1.00) 6* 21 (0.55–1.00)
Different 1 5 (0.86–1.00) 2 (0.90–1.00)

*COLONY listed only most likely father (pr = 1.00), but it was one of the fathers listed in PPE.

Fig. 1 Simulation results showing the effect of varying the number of loci observed on the levels of exclusion of candidate fathers for allelic phenotype (solid line) and genotype (dashed lines) data sets. Tetraploid (4n) data set presented on the left and hexaploid (6n) data set presented on the right. The number of possible alleles at each locus was fixed at five for these simulations.

Fig. 2 Simulation results showing the effect of varying the numbers of alleles per locus on the levels of exclusion of candidate fathers, for allelic phenotype (solid line) and genotype (dashed lines) data sets. Tetraploid (4n) data set presented on the left and hexaploid (6n) data set presented on the right. The number of loci observed was fixed at five for these simulations.

We found that both programs produced similar results for paternity assignments of the majority of the offspring tested (Table 2). As expected, POLYPATEX was more accurate in certain situations (described above), whereas COLONY inferred father candidates that could not be candidates based on the original autohexaploid phenotype or did not infer a potential paternal relationship when POLYPATEX did. In addition, POLYPATEX was more inclusive when there were multiple candidate fathers available for an offspring, despite the user needing to decide how to handle these unresolved exclusions. In


this situation, COLONY provides a likelihood measure to infer the most likely father, but the majority of these were listed as a single parent pair (pr > 0.99) and there was no indication of the occurrence of alternate fathers with lower likelihood measures. The differences between the two programs arise because of (i) the transformation of codominant polyploid data to pseudodiploid-dominant data required by COLONY, (ii) the probabilistic inference of relationships (parentage or sibling) by COLONY on this type of data, and (iii) the full compatibility of phenotypes or genotypes between offspring and parent by POLYPATEX.

We conducted a set of simulations to examine the effect of loci number and number of alleles per locus on the generation of accurate paternity exclusion. As noted by Wang & Scribner (2014), many combinations of parameters can be explored in such simulation studies and it is impractical to consider all of them. Drawing on Wang & Scribner (2014), we simulated allele data sets for a dioecious species, containing 5 mothers, 10 progeny per mother and 30 candidate fathers. Alleles at each locus in mothers and candidate fathers were simulated following a triangular distribution, where the probability of occurrence at a locus of allele i is p_i = i/(k(k + 1)/2), k being the total number of alleles that can occur at the locus (and assumed to be the same for all loci). Genotypic allele data for each offspring were generated by mating a randomly chosen candidate father with the relevant mother, according to the Mendelian genetic model described above. Allelic phenotype data were formed by reducing the data to unique alleles only, using the POLYPATEX function convertToPhenot. For each value of the parameter being investigated (number of loci, or number of alleles per locus), 100 simulated data sets were analysed by POLYPATEX, and summary statistics on the number of candidate fathers detected were obtained.

In the absence of genotyping errors or violations of the assumed genetic model, POLYPATEX does not exclude the true father of the offspring. By its nature, allelic phenotype data contain less information than genotype data; hence, a greater number of loci (Fig. 1) or more polymorphic loci (Fig. 2) must be used to exclude all candidates except the true father. In turn, higher ploidy levels (e.g. 6n) required a greater number of loci (Fig. 1), and to a lesser extent polymorphic loci (Fig. 2), to exclude all but the true father in comparison with lower ploidy levels (e.g. 4n). This is also due to the greater amount of genetic information required for higher ploidy levels compared to lower ploidy levels to isolate the true father. Interestingly, these simulation curves descend to the true father (i.e. 1) more rapidly in Fig. 2 than Fig. 1, suggesting a greater power of exclusion can be obtained from observing more informative polymorphic loci, rather than simply a greater number of loci.

Obtaining POLYPATEX

POLYPATEX is available from CRAN, the Comprehensive R Archive Network (https://cran.r-project.org/) or one of its regional mirrors. Assuming connection to the internet, POLYPATEX and its dependency, package GTOOLS (Warnes et al. 2015), can be installed by entering the following in the R console (> indicates the R console prompt and should not be typed):
> install.packages("PolyPatEx")
POLYPATEX comes with a detailed user manual, which can be accessed from the R HTML help system, or via the following R command:
> vignette("PolyPatEx-vignette", package="PolyPatEx")

References

Ashley MV (2010) Plant parentage, pollination, and dispersal: how DNA microsatellites have altered the landscape. Critical Reviews in Plant Sciences, 29, 148–161.
Clark LV, Jasieniuk M (2011) POLYSAT: an R package for polyploid microsatellite analysis. Molecular Ecology Resources, 11, 562–566.
Elliott CP (2010) Patterns and processes: ecological and genetic function of fragmented Emu bush (Eremophila glabra ssp. glabra) populations. Ph.D. Thesis, The Australian National University, Canberra.
Gerber S, Chabrier P, Kremer A (2003) FAMOZ: a software for parentage analysis using dominant, codominant and uniparentally inherited markers. Molecular Ecology Notes, 3, 479–481.
Goto S, Tsuda Y, Nagafuji K et al. (2004) Genetic make-up and diversity of regenerated Betula maximowicziana Regel. sapling populations in scarified patches as revealed by microsatellite analysis. Forest Ecology and Management, 203, 273–282.
Guldbrandtsen B, Tomiuk J, Loeschcke V (2000) POPDIST, Version 1.1.1: a program to calculate population genetic distance and identity measures. Journal of Heredity, 91, 178–179.
Hardesty BD, Hubbell SP, Bermingham E (2006) Genetic evidence of frequent long-distance recruitment in a vertebrate-dispersed tree. Ecology Letters, 9, 516–525.
Hardy OJ, Vekemans X (2002) SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Molecular Ecology Notes, 2, 618–620.
Hopley T (2011) Reproductive ecology and dispersal dynamics of the invasive willow, Salix cinerea, in south-eastern Australia. Ph.D. Thesis, The Australian National University, Canberra.
Jones OR, Wang J (2002) COLONY: a program for parentage and sibship inference from multilocus genotype data. Molecular Ecology Resources, 10, 551–555.
Jones AG, Small CM, Paczolt KA, Ratterman NL (2010) A practical guide to methods of parentage analysis. Molecular Ecology Resources, 10, 6–30.
Meirmans PG, Van Tienderen PH (2004) GENOTYPE and GENODIVE: two programs for the analysis of genetic diversity of asexual organisms. Molecular Ecology Notes, 4, 792–794.
Otto SP, Whitton J (2000) Polyploid incidence and evolution. Annual Review of Genetics, 34, 401–437.
Ouborg NJ, Piquot Y, Van Groenendael JM (1999) Population genetics, molecular markers and the study of dispersal in plants. Journal of Ecology, 87, 551–568.
R Core Team (2015) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.


Riday H, Johnson DW, Heyduk K, Raasch JA, Darling ME, Sandman JM (2013) Paternity testing in an autotetraploid alfalfa breeding polycross. Euphytica, 194, 335–349.
Riday H, Smith MA, Peel MD (2015) A simple model for pollen-parent fecundity distributions in bee-pollinated forage legume polycrosses. Theoretical and Applied Genetics, 128, 1865–1879.
Selkoe KA, Toonen RJ (2006) Microsatellites for ecologists: a practical guide to using and evaluating microsatellite markers. Ecology Letters, 9, 615–629.
Spielmann A, Harris SA, Boshier DH, Vinson CC (2015) ORCHARD: paternity program for autotetraploid species. Molecular Ecology Resources, 15, 915–920.
Thrall PH, Young A (2000) AUTOTET: a program for analysis of autotetraploid genotypic data. Journal of Heredity, 91, 348–349.
Wang J, Scribner KT (2014) Parentage and sibship inference from markers in polyploids. Molecular Ecology Resources, 14, 541–553.
Warnes GR, Bolker B, Lumley B (2015) gtools: Various R Programming Tools. R package version 3.5.0. http://CRAN.R-project.org/package=gtools.
Wood TE, Takebayashi N, Barker MS, Mayrose I, Greenspoon PB, Rieseberg LH (2009) The frequency of polyploid speciation in vascular plants. Proceedings of the National Academy of Sciences, 106, 13875–13879.

A.Y. suggested the concept. A.Z. developed the R package. C.E. and T.H. contributed data and aided in testing and suggesting improvements to the package and documentation. D.L. developed theory used in further testing/confirmation of the validity of the package. All authors contributed to the discussions around the development of the problem and the code.

Data Accessibility

The POLYPATEX R package (which includes documentation and example data sets) is available from the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/PolyPatEx) or one of its regional mirrors. The example data sets FR_Genotype.csv and GF_Phenotype.csv are included with the package, and are also archived at the Dryad Digital Repository (http://datadryad.org/), via doi:10.5061/dryad.64482. Also available via this DOI are an R script to implement the example in this article, R scripts to implement the simulations that produced Figs 1 and 2, and the Eremophila glabra ssp. glabra data set (GF_Phenotype) recoded for use with the COLONY software.

Appendix 1

Example R code for paternity analysis of a hexaploid, monoecious species. (> denotes the R command prompt, + is R's prompt indicating continuation of a command). Arguments to potentialFatherCounts and potentialFatherIDs specify that a comparison must involve at least two nonmissing loci in mother, offspring and candidate father for it to be included in the summary (VLTMin = 2), and that at most one mismatching locus is allowed in a candidate that is flagged as a potential father (mismatches = 1). The latter argument provides some allowance for genotyping errors or mutations in the data set. In this example, rather than storing the output from potentialFatherCounts and potentialFatherIDs as R objects, R function write.csv is used to immediately export these tables as CSV files in the current working directory.

> require(PolyPatEx)
> adata <- inputData("GF_Phenotype.csv",
+                    numLoci=7,
+                    ploidy=6,
+                    dataType="phenotype",
+                    dioecious=FALSE,
+                    selfCompatible=TRUE)
> pe1 <- phenotPPE(adata)
> write.csv(potentialFatherCounts(pe1,mismatches=1,VLTMin=2),
+           "potentialFatherCounts.csv")
> write.csv(potentialFatherIDs(pe1,mismatches=1,VLTMin=2),
+           "potentialFatherIDs.csv")
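As a companion to the appendix, the genotype comparison described under Paternity exclusion can be illustrated with a short base-R sketch applied to the tetraploid example from the text (offspring ABCE, mother ABCD, candidates EFGG and CEGG). This is a brute-force check of the stated genetic model, not the algorithm used inside genotPPE; the helpers gametes, compatible and split1 are hypothetical names, and single-character allele labels are assumed for convenience.

```r
# All distinct gametes (p/2 alleles drawn without replacement) of a genotype,
# each returned as a sorted string such as "CE".
gametes <- function(genotype) {
  g <- combn(genotype, length(genotype) / 2)   # one gamete per column
  unique(apply(g, 2, function(x) paste(sort(x), collapse = "")))
}

# A candidate is compatible at a locus if some maternal gamete plus some
# paternal gamete reproduces the offspring genotype, multiplicities included.
compatible <- function(offspring, mother, candidate) {
  target <- paste(sort(offspring), collapse = "")
  for (mg in gametes(mother)) {
    for (pg in gametes(candidate)) {
      combined <- sort(c(strsplit(mg, "")[[1]], strsplit(pg, "")[[1]]))
      if (paste(combined, collapse = "") == target) return(TRUE)
    }
  }
  FALSE
}

split1 <- function(s) strsplit(s, "")[[1]]   # "ABCE" -> c("A", "B", "C", "E")

offspring <- split1("ABCE")
mother    <- split1("ABCD")

resEFGG <- compatible(offspring, mother, split1("EFGG"))  # FALSE: excluded
resCEGG <- compatible(offspring, mother, split1("CEGG"))  # TRUE: not excluded
print(c(EFGG = resEFGG, CEGG = resCEGG))
```

The first candidate is excluded because none of its gametes (EF, EG, FG, GG) can be combined with a maternal gamete to give ABCE, whereas the second can supply CE alongside the maternal gamete AB.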

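The ambiguity of allelic phenotype data noted in the Introduction (a hexaploid showing alleles A and B has five possible genotypes) can be reproduced with a few lines of base R. The function genotypesFromPhenotype is a hypothetical helper written for this illustration and is unrelated to the lookup tables used by phenotPPE; it assumes single-character allele labels.

```r
# Enumerate every genotype of ploidy p consistent with an allelic phenotype,
# i.e. a set of distinct alleles each present in at least one copy.
genotypesFromPhenotype <- function(alleles, p) {
  alleles <- sort(unique(alleles))
  n <- length(alleles)
  stopifnot(n >= 1, n <= p)
  extra <- p - n                      # copies left to distribute among alleles
  if (extra == 0) return(paste(alleles, collapse = ""))
  # Stars and bars: each column of 'picks' is a nondecreasing index vector,
  # i.e. one multiset of 'extra' additional copies chosen from the n alleles.
  picks <- combn(n + extra - 1, extra) - (seq_len(extra) - 1)
  apply(picks, 2, function(i) {
    paste(sort(c(alleles, alleles[i])), collapse = "")
  })
}

hexAB <- genotypesFromPhenotype(c("A", "B"), p = 6)
tetBC <- genotypesFromPhenotype(c("B", "C"), p = 4)
print(hexAB)   # AAAAAB AAAABB AAABBB AABBBB ABBBBB
print(tetBC)   # BBBC BBCC BCCC
```

Under this construction the number of genotypes for n observed alleles at ploidy p is choose(p - 1, n - 1), which gives 5 for the hexaploid AB case.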

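The triangular allele distribution used in the simulations, p_i = i/(k(k + 1)/2), and the Mendelian mating step can be sketched in base R as follows. This is only an illustration under stated assumptions: alleles within a parent are drawn independently with replacement, the seed and the helper names simulateGenotype and gamete are arbitrary, and unique() stands in for the package function convertToPhenot. It is not the simulation script archived with the article.

```r
set.seed(1)                               # arbitrary seed, for reproducibility

k <- 5                                    # number of possible alleles per locus
p_i <- seq_len(k) / (k * (k + 1) / 2)     # triangular allele probabilities

# Assumption: a parent's genotype is 'ploidy' independent draws from p_i.
simulateGenotype <- function(ploidy, probs) {
  sort(sample(seq_along(probs), size = ploidy, replace = TRUE, prob = probs))
}

# A gamete is p/2 alleles drawn without replacement from the parent's p alleles.
gamete <- function(genotype) {
  genotype[sample.int(length(genotype), length(genotype) / 2)]
}

ploidy <- 4
mother <- simulateGenotype(ploidy, p_i)
father <- simulateGenotype(ploidy, p_i)

offspring <- sort(c(gamete(mother), gamete(father)))  # genotype data
phenotype <- unique(offspring)                        # allelic phenotype data

print(round(p_i, 3))
print(list(mother = mother, father = father,
           offspring = offspring, phenotype = phenotype))
```

With k = 5 the probabilities are 1/15, 2/15, 3/15, 4/15 and 5/15, so the highest-numbered allele is five times as likely to be drawn as the lowest.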