
Stoch Environ Res Risk Assess

DOI 10.1007/s00477-013-0821-z

REVIEW PAPER

An interactive biplot implementation in R for modeling genotype-by-environment interaction

Elisa Frutos • M. Purificación Galindo • Víctor Leiva

© Springer-Verlag Berlin Heidelberg 2013

Abstract Classical and GGE biplot methods are graphical procedures that allow multivariate data to be analyzed. In particular, the GGE biplot displays the genotype main effect (G) and the genotype by environment interaction (GE) in two-way data. The GGE biplot originates from the graphical analysis of multi-environment trial (MET) data. Thus, agronomists, crop scientists and geneticists are potential users of this method. However, it can also be used to visualize and analyze other types of data. In this paper, we propose a new interactive computational implementation in R language to perform the main functions of the classical and GGE biplot methods, so it is also useful for MET data visual analysis. This implementation is organized in an R package named GGEBiplotGUI. This package is the only interactive, noncommercial and open source software that currently exists, offering a free alternative to available commercial software. In addition, it can be used with practically no knowledge of the R programming language. Here, we present and discuss the capabilities and features of the GGEBiplotGUI package and illustrate them by using real data. The GGEBiplotGUI package graphically addresses the questions that a researcher is likely to ask. This R package is not only a tool for visual data analysis of multi-environment trials, useful for plant breeders and geneticists who study yields from genotypes and interactions between genotype and environment; data from other areas can also be analyzed with the GGEBiplotGUI package.

Keywords Graphical display · Least squares method · Multivariate data analysis · Singular value decomposition · Statistical models · Statistical software

Mathematics Subject Classification 62H99 · 62-04

E. Frutos · M. P. Galindo
Departamento de Estadística, Universidad de Salamanca, Salamanca, Spain

V. Leiva (✉)
Departamento de Estadística, Universidad de Valparaíso, Avda. Gran Bretaña 1111, Playa Ancha, Valparaiso, Chile
e-mail: victor.leiva@uv.cl
URL: http://www.deuv.cl/leiva

1 Introduction and bibliographical review

The biplot method is a graphical display of multivariate data. This method was introduced by Gabriel (1971) in the context of principal component analysis (PCA). Specifically, the biplot is a joint graphical representation, in a low dimensional Euclidean space (usually a plane), of a multivariate data matrix by markers for its rows and columns, chosen in such a way that the inner (or scalar) product represents the elements of the data matrix. Due to the properties of the inner product, the biplot is a powerful data visualization tool, which can be viewed as a multivariate extension of the scatterplot, because the biplot representation is usually performed in the two-dimensional real space. This is the classical focus of the biplot method (classical biplot), which has two parts. First, it carries out the approximation of the data matrix by a singular value decomposition (SVD) and, second, this matrix is factorized in row and column markers; see Eckart and Young (1939) and Golub and Reinsch (1970).

Gower and Harding (1988), Gower (1992), and Gower and Hand (1996) provided a different focus of the biplot to


that originally proposed by Gabriel (1971). They first ordered the individuals using scaling, and then superposed the variables in such a way that a joint graphical interpretation is possible, as usual in biplot methods. Both of these representations are just descriptive, so that no parametric assumptions are considered. For more details about the biplot methods, see Gower and Hand (1996), which is recognized as the first book on this method, and Greenacre (2010) and Gower et al. (2011), for more recent books.

Another focus different from the classical one is based on the relation between the least squares (LS) approximation of matrices and the SVD. This third focus permits the biplots to be visualized as models fitted by bilinearity. These bilinear models are known as regression biplots and can be interpreted as a multiplicative bilinear model, allowing us to demonstrate in graphical form the association, linear or not, between subjects and/or variables; see Gabriel (1998) and Vicente-Villardón et al. (2006). In this line, a number of authors have studied the case when the biplots are used to describe interaction between row and column effects in multiplicative and additive bilinear models; see, e.g., Denis (1991), Falguerolles (1995), Van Eeuwijk (1995), and Choulakian (1966). In order to approximate the biplots in the case of these last models, the mentioned authors used estimation methods employed in generalized bilinear models, which are considered as an extension of the generalized linear models; see Nelder and Wedderburn (1972). An exhaustive study on all of these alternative ways of the biplot method can be seen in Cárdenas and Galindo (2003).

Since Gabriel (1971)'s paper, the biplot method has increasingly been used in data visualization and analysis in diverse disciplines, and it is particularly useful in exploring data from areas such as agricultural, ecological and environmental sciences, and genetics. In these areas, the data sets are often large, with complicated interactions and interconnections. These sets are typically genotype-by-environment-by-trait three-way data, which provide rich information. Such data can be organized into several types of two-way tables, for example: (i) genotype × environment tables for each trait; (ii) genotype × trait tables for each environment, across subsets of environments, and across all environments; (iii) environment × trait tables for each genotype, across subsets of genotypes, and across all genotypes; (iv) phenotype (genotype–environment combination) × trait tables; (v) genotype × environment-trait tables; and (vi) combined two-way tables of genotype × explanatory variables plus response variables. A full understanding of this type of data requires knowledge of all these two-way tables. From the viewpoint of genotype evaluation, genotype by environment and genotype by trait tables are the most relevant; see Yan et al. (2000).

We focus our attention on genotype by environment data, which provide rich information for different purposes, but we want to stress that the methodology discussed in the paper can also be applied to several other types of data, for example in geosciences; see Bacon-Shone (2009). Plant geneticists often study the performance of many genotypes in diverse environments. Such studies are conducted in order to select the best genotypes for crop improvement. The data collected for these studies correspond to one or more attributes, for each genotype in each environment. This type of data can be analyzed in a straightforward way by means of analysis of variance (ANOVA) models. However, other models can also be used. Particularly, the additive main effects and multiplicative interaction (AMMI) model and the genotype main effects (G) and genotype × environment interaction effects (GE) model, known as GGE, are the most used to describe the kind of data mentioned above; see Gauch (1988). The AMMI model (AMMI biplot) has been extensively applied in the statistical analysis of multiple-environment trials (MET); see Kempton (1984), Gauch and Zobel (1996, 1997), and Ebdon and Gauch (2002a, b). The GGE model (GGE biplot) was proposed by Yan et al. (2000), allowing genotype by environment interaction (GEI) of MET data to be visually examined. Several recent papers have exhaustively compared and contrasted AMMI and GGE models, with respect to their suitability for genotype by environment interaction analysis; see Gauch (2006), Yan and Tinker (2006), Yan et al. (2007) and Gauch et al. (2008). In AMMI models, as many multiplicative terms as needed are incorporated in order to explain the variability of the second order interaction. Such models are based on the decomposition in singular values and vectors of the residual matrix of the associated linear model. The GGE model applies the SVD to the data after subtracting the environment effects, because these biplots display both G and GE, which are the two sources of variation relevant to cultivar evaluation; see Kang (1988), Gauch and Zobel (1996) and Yan and Kang (2003). This model is based on multiplicative linear-bilinear site regression; see Cornelius et al. (1996).

The bibliography on biplots is extensive, with many papers published in various scientific journals, and about 50,000 websites containing the word biplot are available on the internet. Macros for biplot analysis have been implemented in the main commercial statistical software packages; see, e.g., Yan and Tinker (2006). Currently, most commercial statistical software packages include a procedure or macro for generating biplots. Specifically, the GGEbiplot program, dedicated to the GGE biplot (see http://www.ggebiplot.com), can also generate the classical biplot. This program is commercial software and is widely used by agronomists, crop scientists and geneticists; see Yan and Kang (2003, 2006). However, today the scientific community has at its disposal a non-commercial and open


source software for statistics and graphs, named R, which can be obtained at no cost from http://www.r-project.org. The statistical software R is currently very popular in the international scientific community; see R Team (2013), and Leiva et al. (2008) and Barros et al. (2009), for some R packages that have been developed. Thus, an R package is needed that offers a non-commercial and open source alternative to the existing commercial software on biplots. This R package should address the main questions that a researcher could ask and be interactive, because in this way it can be more attractive for practitioners who do not know the R programming language.

The aims of this work are (i) to propose a new interactive computational implementation in R language to perform the main functions of the GGE and classical biplot methods; and (ii) to illustrate it by using real data related to genotype-by-environment interaction. This implementation is provided as an R package named GGEBiplotGUI, which is a free alternative to the commercial software created by Yan and Tinker (2006) and can be downloaded from http://cran.r-project.org. The GGEBiplotGUI package is a graphical interface that can be used without knowledge of the R language, which can make this package very attractive for diverse practitioners. The different functions of this package are organized by means of menu options that allow them to be used in a friendly and easy way. Although the main objective of the GGEBiplotGUI package is to construct and manipulate GGE biplots, it is also possible to construct AMMI biplots with this package, just by selecting the correct model, as well as the classical biplot. In addition, the GGEBiplotGUI package can be used to analyze genotype-by-environment data because it includes: (i) mega-environment analysis; (ii) test-environment evaluation; and (iii) genotype evaluation. In order to illustrate the capabilities and features of the GGEBiplotGUI package, we use the proposed implementation to analyze real agricultural and genomic data.

The paper is organized as follows. In Sect. 2, we provide the theoretical background of this work. In Sect. 3, we propose and discuss the new computational implementation in R language for biplot methods. In Sect. 4, we perform the empirical application of the proposed computational implementation by using real data. Finally, in Sect. 5, we provide some conclusions.

2 Background

In this section, we provide an overview on classical and GGE biplot methods.

2.1 Classical biplots

Any $n \times m$ matrix $Y = (y_{ij})$ of rank $r$ can be factorized into an $n \times r$ matrix $G$ and an $m \times r$ matrix $H$, both necessarily of rank $r$, that is, as

$$Y = G H^\top. \tag{1}$$

The factorization given in Eq. (1) can be written in a scalar way as

$$y_{ij} = g_i^\top h_j, \tag{2}$$

for each $i$ and $j$, where $y_{ij}$ is the element in the $i$th row and $j$th column of $Y$, $g_i^\top$ is the $i$th row of $G$ and $h_j^\top$ is the $j$th row of $H$. The factorization given in Eq. (2) assigns the vectors $g_1, \ldots, g_n$ (row effects) to the $n$ rows of $Y$, and the vectors $h_1, \ldots, h_m$ (column effects) to the $m$ columns of $Y$. Thus, Eq. (2) provides a representation of $Y$ by means of $n + m$ vectors in the $r$-dimensional space. For a matrix of rank equal to two, these $n + m$ vectors may be plotted in the plane, giving a representation of the $nm$ elements of $Y$ by means of the inner product of the corresponding row effect and column effect vectors. Such a plot is referred to as a biplot, since it allows row effects and column effects to be jointly plotted; for more details, see Gabriel (1971).

Matrices with ranks greater than two cannot be represented exactly by a biplot. However, if a matrix $Y$ can be approximated by a matrix of rank equal to two, say $Y_{(2)}$, the inner products of the row and column effects can approximate the elements of $Y$.

To approximate any $n \times m$ matrix $Y$ of rank $r$ by an $n \times m$ matrix of smaller rank, one may use the SVD, which can, in this case, be written as

$$Y = U \Lambda V^\top = \sum_{k=1}^{r} \lambda_k u_k v_k^\top, \tag{3}$$

where $U = (u_1, \ldots, u_r)$, $V = (v_1, \ldots, v_r)$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_r)$, with vectors $u_k$ and $v_k$ being called left-singular and right-singular vectors, respectively, and $\lambda_k$ being the singular values. For each $k = 1, \ldots, r$, the singular value $\lambda_k$ and the singular vectors $u_k$ and $v_k$ are chosen to satisfy

$$u_k^\top Y = \lambda_k v_k^\top, \qquad Y v_k = \lambda_k u_k, \qquad Y Y^\top u_k = \lambda_k^2 u_k, \qquad Y^\top Y v_k = \lambda_k^2 v_k,$$

with $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r$, such that $u_k^\top u_l = v_k^\top v_l = \delta_{kl}$, where $\delta_{kl}$ denotes the Kronecker delta. Note that, by using the LS method, the expression given in Eq. (3) can be written as

$$Y_{(s)} = U_{(s)} \Lambda_{(s)} V_{(s)}^\top = \sum_{k=1}^{s} \lambda_k u_k v_k^\top, \tag{4}$$

where $Y_{(s)}$ is $n \times m$, $U_{(s)}$ is $n \times s$, $\Lambda_{(s)}$ is $s \times s$ and $V_{(s)}^\top$ is $s \times m$, which corresponds to the best approximation for $Y$ of rank equal to $s$ (see Householder and Young 1938), that is,


expression in Eq. (4) provides an $n \times m$ matrix, say $M = (m_{ij})$, of rank $s$ that minimizes $\| Y - M \|^2 = \sum_{i=1}^{n} \sum_{j=1}^{m} (y_{ij} - m_{ij})^2$. Because $\| Y \|^2 = \lambda_1^2 + \cdots + \lambda_r^2$, the goodness of fit of the model to the data can be measured by $\left( \sum_{k=1}^{s} \lambda_k^2 / \sum_{k=1}^{r} \lambda_k^2 \right) 100\,\%$.

It is also possible to obtain the marker matrices $G$ and $H$ by means of alternating regressions. Specifically, if we consider the row markers $G$ as fixed, the column markers can be computed by regression as (i) $H^\top = (G^\top G)^{-1} G^\top Y$. In the same way, fixing $H$, the row markers can be obtained as (ii) $G^\top = (H^\top H)^{-1} H^\top Y^\top$. Thus, by alternating steps (i) and (ii), the product $G H^\top$ converges to the approximation given by the SVD; for details about alternating regressions, see Ukkelberg and Borgen (1993).

By choosing factors $G$ and $H$ of $Y_{(2)}$, as in Eq. (1), for biplotting, one may use the factorization provided by the SVD given by $G = U \Lambda^{c}$ and $H = V \Lambda^{1-c}$, where the scalar $c$ can take any value between zero and one. When $c = 1$, the singular values are entirely partitioned into the row eigenvectors, which is referred to as row-metric preserving or JK biplot; see Gabriel (1971). Because in this case $G = U \Lambda$, we have $Y Y^\top = G G^\top$. Therefore, this partitioning recovers the Euclidean distances among row factors and is, consequently, appropriate for visualizing the similarity/dissimilarity among row factors. When $c = 0$, the singular values are entirely partitioned into the column eigenvectors, referred to as column-metric preserving or GH biplot; see Gabriel (1971). Because now $H = V \Lambda$, we have $Y^\top Y = H H^\top$, which is the sum of squares and cross products matrix of $Y$. If $Y$ is column-centered, then $Y^\top Y$ is $(n - 1)$ times the covariance matrix of the columns. Thus, this partitioning is appropriate for studying the relationships among column factors. Two important rules applying to such a partitioning are (i) the correlation between two columns is approximated by the cosine of the angle between their vectors, if the data are column-centered before subjecting to SVD; and (ii) the vector length of a column equals $\sqrt{n - 1}$ times the standard deviation (SD) of the column factor, across the rows. When $c = 0.5$, symmetric roles are assigned to rows and columns, referred to as SQRT biplot or symmetric biplot; see Cárdenas and Galindo (2003).

2.2 The HJ biplot

Galindo (1986) proposed a new form of representation, known as HJ biplot, in which the coordinates for columns coincide with the column markers in the GH biplot, whereas the coordinates for the rows coincide with the row markers in the JK biplot. Note that these coordinates may be represented in the same Cartesian system.

The HJ biplot is conceptually similar to correspondence analysis, but it applies to continuous data rather than to categorical data. The rules for the interpretation of the HJ biplot are a combination of the rules used in classical biplots, correspondence analysis, factor analysis and multidimensional scaling techniques. Specifically, we have that:

(i) the distances among row markers are interpreted as an inverse function of similarities, in such a way that closer markers (individuals) are more similar; this property allows the clusters of individuals with similar profiles to be identified;
(ii) the lengths of the column markers (vectors) approximate the SD of the variables;
(iii) the cosines of the angles among the column vectors approximate the correlations among variables in such a way that small acute angles are associated with variables that have high positive correlations, obtuse angles close to the straight angle are associated with variables that have high negative correlations, and right angles are associated with non-correlated variables; in the same way, the cosines of the angles among the variable markers and the axes (principal components) approximate the correlations between them; whereas for standardized data, they approximate the factor loadings in factor analysis; and
(iv) the order of the orthogonal projections of the row markers (points) onto a column marker (vector) approximates the order of the row elements (values) in that column; so, the farther the projection of a point (individual) lies from the center of gravity (average coordinate point), the farther the value that this individual takes on the variable is from its mean.

2.3 The GGE biplot

GEI is commonly observed by crop producers and breeders as the differential ranking of cultivar yields among locations or years; see Samonte et al. (2005). Plant breeders conduct MET primarily to identify the superior cultivar for a target region and secondarily to determine whether the target region can be subdivided into different mega-environments; see Yan et al. (2000). Because the main objective of a breeding program is to select genotypes that are consistent and highly productive across different environments, the existence of GEI presents a problem for breeders.

In practice, standard statistical methods that have been applied for analysis of GEI include:

(i) ANOVA model;
(ii) completely multiplicative model (COMM);
(iii) shifted multiplicative model (SHMM);
(iv) genotypes (cultivars) regression model (GREG);
(v) sites (environments) regression model (SREG); and


(vi) AMMI model.

On the one hand, ANOVA is an additive model that describes the main effects and determines whether GEI is a significant source of variation or not. However, it does not provide insight into the genotypes or environments that give rise to the interaction; see details in Samonte et al. (2005). Specifically, for the data matrix $Y = (y_{ij})$, with response variables $Y_{ij}$, the ANOVA model is

$$Y_{ij} = \mu + \alpha_i + \beta_j + \phi_{ij} + \varepsilon_{ij}, \tag{5}$$

where $Y_{ij}$ is the yield of the genotype $i$ in the environment $j$, $\mu$ is the overall mean, $\alpha_i$ is the genotype (row) main effect, $\beta_j$ is the environment (column) main effect, $\phi_{ij}$ is the specific genotype $i$ (row) by environment $j$ (column) interaction, and $\varepsilon_{ij}$ is the error term of the model, where $\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)$. On the other hand, COMM, SHMM, GREG, SREG and AMMI are multiplicative model forms; see Cornelius et al. (1996). Specifically, the COMM is

$$Y_{ij} = \sum_{k=1}^{t} \lambda_k \xi_{ik} \eta_{jk} + \varepsilon_{ij}, \tag{6}$$

the SHMM is

$$Y_{ij} = \mu + \sum_{k=1}^{t} \lambda_k \xi_{ik} \eta_{jk} + \varepsilon_{ij}, \tag{7}$$

the GREG is

$$Y_{ij} = \alpha_i + \sum_{k=1}^{t} \lambda_k \xi_{ik} \eta_{jk} + \varepsilon_{ij}, \tag{8}$$

the SREG is

$$Y_{ij} = \beta_j + \sum_{k=1}^{t} \lambda_k \xi_{ik} \eta_{jk} + \varepsilon_{ij}, \tag{9}$$

and the AMMI model is

$$Y_{ij} = \mu + \alpha_i + \beta_j + \sum_{k=1}^{t} \lambda_k \xi_{ik} \eta_{jk} + \varepsilon_{ij}, \tag{10}$$

where $t$ is the number of SVD axes retained in the model, $\lambda_k$ is the singular value for the SVD axis $k$, $\xi_{ik}$ is the singular vector value of the genotype $i$ for the SVD axis $k$, $\eta_{jk}$ is the singular vector value of the environment $j$ for the SVD axis $k$, and $\varepsilon_{ij}$ is the error term of the models, where $\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)$. In all of the models in Eqs. (6), (7), (8), (9) and (10), the scale parameters $\lambda_k$ are ordered as $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_t \geq 0$, and the $\xi_{ik}$ and $\eta_{jk}$ satisfy the normalization and orthogonality constraints, given by $\sum_i \xi_{ik}^2 = \sum_j \eta_{jk}^2 = 1$ and $\sum_i \xi_{ik} \xi_{im} = \sum_j \eta_{jk} \eta_{jm} = 0$, respectively, for $k \neq m$.

The COMM was used by Fisher and Mackenzie (1923), whereas the SHMM was the first linear-bilinear model used for identifying subsets of genotypes or environments; see Cornelius et al. (1992) and Crossa and Cornelius (1997). The GREG is a reparameterization of the stability analysis model of Finlay and Wilkinson (1963) and Eberhart and Russell (1969). This model expresses the performance of each genotype as its multiple regression on $t$ unknown functions $\eta_{jk}$ of unidentified characteristics of the sites; see Cornelius et al. (1996). In this context, we use the name "genotype regression" instead of "row regression", mentioned in Bradu and Gabriel (1978). Similarly, we use the name "site regression" instead of "column regression".

The multiplicative terms in the models given in Eqs. (6)–(10) are estimated by the LS method through the SVD of the two-way array of data (matrix $Z$), obtained by subtracting the LS estimates of the additive effects of the model from the cell means; see Cornelius et al. (1996). For these five models, the LS estimates of the additive effects and expressions for the elements of the appropriate $Z = (z_{ij})$ matrix are:

[COMM] $z_{ij} = \bar{y}_{ij}$;
[SHMM] $z_{ij} = \bar{y}_{ij} - \hat{\mu}$, with $\hat{\mu} = \bar{y}_{\cdot\cdot}$;
[GREG] $z_{ij} = \bar{y}_{ij} - \hat{\alpha}_i$, with $\hat{\alpha}_i = \bar{y}_{i\cdot}$;
[SREG] $z_{ij} = \bar{y}_{ij} - \hat{\beta}_j$, with $\hat{\beta}_j = \bar{y}_{\cdot j}$; and
[AMMI] $z_{ij} = \bar{y}_{ij} - \hat{\alpha}_i - \hat{\beta}_j - \hat{\mu}$, with $\hat{\alpha}_i = \bar{y}_{i\cdot} - \hat{\mu}$, $\hat{\beta}_j = \bar{y}_{\cdot j} - \hat{\mu}$, and $\hat{\mu} = \bar{y}_{\cdot\cdot}$.

The AMMI model first applies the ANOVA model and then applies the PCA model to the interaction; see Gauch (1988). Then, the genotypic and environmental scores can be used to construct biplots that help to interpret the GEI. GGE applies the SVD to the environment-centered two-way data containing G and GE, but the AMMI model applies the SVD to the doubly-centered two-way data, containing GE only. Hence, from Eq. (5), the GGE model is

$$Y_{ij} - \mu - \beta_j = \alpha_i + \phi_{ij} + \varepsilon_{ij}.$$

Thus, the GGE method mixes G and GE and partitions this mixture into $t$ multiplicative terms by

$$Y_{ij} - \mu - \beta_j = \sum_{k=1}^{t} \lambda_k \xi_{ik} \eta_{jk}.$$

Note that the GGE biplot is based on the linear-bilinear SREG; see Yan and Kang (2003). As mentioned, a biplot is a scatterplot that graphically displays a point or score for each genotype and each environment. To distinguish the members of a model family, the number of components must be considered; for example, a GGE model having two principal components (GGE2) and an AMMI model having two principal components (AMMI2). Two kinds of biplots are common in the yield trial literature: (i) a GGE2 biplot has the first component (PC1) for its abscissa and the second component (PC2) for its ordinate; and (ii) an AMMI2 biplot also has PC1 for its abscissa and PC2 for its ordinate. The interpretation of GGE2 and AMMI2 biplots is similar. Genotypes that are more similar are closer in the plot than genotypes that are less similar. The same is true for environments. Genotypes/environments that are alike


tend to cluster together. The angle between environmental axes is related to the correlation between environments. An acute angle indicates positive correlation, a right angle indicates no correlation, and an obtuse angle indicates negative correlation. The projection of a genotype onto an environmental axis reflects the performance of that genotype in that environment.

3 The GGEBiplotGUI package

In this section, we describe the R functions (commands) of the GGEBiplotGUI package. In this package, three-dimensional biplots are incorporated via the rgl package. We recall that the user requires almost no knowledge of the R programming language for using our package. The mentioned commands are organized under the following menu entries:

File ⇒ View ⇒ Biplot tools ⇒ Format ⇒ Models ⇒ Biplot.

Below, we illustrate these entries, describing only those that are not directly visible.

After the R program has been downloaded from http://www.r-project.org and installed, it is also necessary to install the GGEBiplotGUI package, which depends on the rgl, tcltk and tkrplot packages, which are automatically installed. Then, to load the GGEBiplotGUI package into the R software, the command

library("GGEBiplotGUI")

must be entered at the R prompt or in any editor program that the user is using. Once all these instructions have been carried out, the data, say, for example, data1, must be loaded; see Sect. 4. Hence, one produces the corresponding biplot by the command

GGEBiplot(data1)

A pop-up window, entitled "Model selection", appears, from which one must select the model by choosing the type of SVD among:

1. "JK" (row metric preserving);
2. "GH" (column metric preserving);
3. "HJ" (dual metric preserving); and
4. "SQ" (symmetrical);

the type of centering among:

1. "0" (no centering);
2. "1" (global centering: E + G + GE);
3. "2" (tester centering: G + GE); and
4. "3" (double centering: GE);

and, finally, the type of scaling between:

1. "0" (no scaling); and
2. "1" (std deviation).

After selecting the model, one must click on the OK button and thus the GGE biplot of data1 will be visualized in an interactive window; for more details about these commands, see Sect. 4 for an example with real data. In this window, a menu with the following options is available:

File

• Open log file.
• Copy image: it copies the current image to the clipboard.
• Save image: it saves the current image in pdf, postscript, metafile, bmp, png, jpg/jpeg or eps/ps (useful for processing LaTeX files) formats.
• Print image: it prints the current image to any printer connected to the user's computer.
• Exit: it exits the program.

View

• Show both: it shows both entries and testers.
• Show genotypes: it shows genotypes only.
• Show environments: it shows environments only.
• Show/hide title.
• Show/hide guidelines.
• Add/remove symbols: it adds a dot at the location where genotypes are placed.

Biplot tools

• Examine a genotype: it allows the user to select a genotype. When a genotype is selected and the OK button is clicked, the performance of this genotype in the different environments is displayed.
• Examine an environment: it allows the user to select an environment. When an environment is selected and the OK button is clicked, the performance of the different genotypes in this environment is displayed.
• Relation among environments: it allows us to analyze variability and similarity/dissimilarity among environments in relation to the yield provided by the genotypes.
• Compare two genotypes: it allows a visual comparison of two genotypes with regard to each of the environments. When this function is chosen, a selection panel appears, which allows one to select two genotypes to be compared.
• Which won where/what: it allows an identification of the best genotypes in each environment.
• Discriminativeness against representativeness: it defines an average environment and draws an average-environment-axis (AEA). It allows the representativeness and discriminating power of environments to be visualized.
• Mean against stability: it defines an average environment and draws an AEA. It facilitates visualization of the mean performance and stability of a cultivar.
• Rank environments with ref. to the ideal environment: it defines an ideal environment and compares all


environments to it. The ideal environment is defined as the most discriminating and absolutely representative. It generates a ranking of the test environments in terms of both criteria.
• Rank genotypes with ref. to the ideal genotype: it defines an ideal genotype and compares all genotypes to it. The ideal genotype is defined as one that has the highest performance in all environments and is, therefore, absolutely stable. It generates a ranking of the cultivars in terms of both mean performance and stability.
• Back to original data: it resets the biplot based on the original parameters.

Format

• Biplot title: it allows the biplot title to be modified.
• Change color with the options: background, genotype labels, environment labels, and biplot title.
• Change font with the options: default, larger, and smaller.

Models

This option allows us to modify the entries initially chosen in "Model selection", which we now detail:

• Scaled (divided by):
– no scaling;
– std deviation;
• Centered by:
– no centering (it corresponds to the model "COMM");
– global-centered E + G + GE (it corresponds to the model "SHMM");
– tester-centered G + GE (it corresponds to the model "GGE");
– double-centered GE (it corresponds to the model "AMMI");
• SVP (singular value partitioning):
– JK (the singular value partitioning method is "row metric preserving");
– GH (the singular value partitioning method is "column metric preserving");
– HJ (the singular value partitioning method is "dual metric preserving");
– SQ (the singular value partitioning method is "symmetrical");

Biplot

• PC1 against PC2 (default): it shows the first principal component (PC1) against the second principal component (PC2).
• PC3 against PC4: it shows the third principal component (PC3) against the fourth principal component (PC4).
• PC5 against PC6: it shows the fifth principal component (PC5) against the sixth principal component (PC6).
• PC1 against PC3: it shows the first principal component (PC1) against the third principal component (PC3).
• PC2 against PC3: it shows the second principal component (PC2) against the third principal component (PC3).
• Biplot3D: it shows the three dimensional biplot.

4 Data analysis

In this section, we illustrate the GGEBiplotGUI package by using real agricultural data. These data correspond to the yield from the 1993 Ontario winter wheat (Triticum aestivum L.) performance trials, in which 18 cultivars were tested at nine locations; see Yan and Kang (2003). The data, which we name 'ontario' data (or simply ontario), are displayed in Table 1.

Data may be entered using R commander or be imported into R and saved as a matrix or a data frame. Ontario data have been included in the package as a data frame.

To initialize the GUI with ontario data, the command:

GGEBiplot(Data = Ontario)

must be entered, which opens the "Model selection" window; see Fig. 1.

After selecting the model considering the options "GH", "tester-centered G + GE" and "no scaling", one clicks on the OK button and the main window appears; see Fig. 2 (left). The GGE biplot contains markers for each of the 18 cultivars in lower case letters, as distinguished from markers for each of the nine environments in upper case letters. The percentages of GGE explained by the two axes are indicated in Fig. 2 (left). The GGEBiplotGUI package provides options to view the GGE biplot in numerous ways, in order to address most questions a breeder or researcher is likely to ask. This is exemplified below.

4.1 The performance of different cultivars in a given environment

From the Biplot tools menu bar, select "Examine an environment". Choose any environment of interest (BH93, in this example) from the list in the combo-box, and then the following features will appear (see Fig. 2, right):

• A line that passes through the plot origin and the selected environment (BH93, in this example), which is referred to as the environment axis.


Table 1 Mean yield of 18 winter wheat varieties (genotypes, rows) tested at nine Ontario locations (environments, columns) for ontario data
Genotype BH93 EA93 HW93 ID93 KE93 NN93 OA93 RN93 WP93

Ann 4.46 4.15 2.85 3.08 5.94 4.45 4.35 4.04 2.67
Ari 4.42 4.77 2.91 3.51 5.70 5.15 4.96 4.39 2.94
Aug 4.67 4.58 3.10 3.46 6.07 5.03 4.73 3.90 2.62
Cas 4.73 4.75 3.38 3.90 6.22 5.34 4.23 4.89 3.45
Del 4.39 4.60 3.51 3.85 5.77 5.42 5.15 4.10 2.83
Dia 5.18 4.48 2.99 3.77 6.58 5.05 3.99 4.27 2.78
Ena 3.38 4.18 2.74 3.16 5.34 4.27 4.16 4.06 2.03
Fun 4.85 4.66 4.43 3.95 5.54 5.83 4.17 5.06 3.57
Ham 5.04 4.74 3.51 3.44 5.96 4.86 4.98 4.51 2.86
Har 5.20 4.66 3.60 3.76 5.94 5.35 3.90 4.45 3.30
Kar 4.29 4.53 2.76 3.42 6.14 5.25 4.86 4.14 3.15
Kat 3.15 3.04 2.39 2.35 4.23 4.26 3.38 4.07 2.10
Luc 4.10 3.88 2.30 3.72 4.56 5.15 2.60 4.96 2.89
M12 3.34 3.85 2.42 2.78 4.63 5.09 3.28 3.92 2.56
Reb 4.38 4.70 3.66 3.59 6.19 5.14 3.93 4.21 2.93
Ron 4.94 4.70 2.95 3.90 6.06 5.33 4.30 4.30 3.03
Rub 3.79 4.97 3.38 3.35 4.77 5.30 4.32 4.86 3.38
Zav 4.24 4.65 3.61 3.91 6.64 4.83 5.01 4.36 3.11
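The "Model selection" choices used in this example (tester-centered G + GE, no scaling) can also be reproduced outside the GUI. The following is a minimal sketch in base R, not the package's internal code: it enters Table 1 as a matrix, subtracts each environment mean and applies `svd()`. The object names (`yield`, `centered`, `dec`, `explained`) are illustrative assumptions.

```r
# Table 1: mean yield of 18 cultivars (rows) at nine locations (columns).
env <- c("BH93", "EA93", "HW93", "ID93", "KE93", "NN93", "OA93", "RN93", "WP93")
gen <- c("Ann", "Ari", "Aug", "Cas", "Del", "Dia", "Ena", "Fun", "Ham",
         "Har", "Kar", "Kat", "Luc", "M12", "Reb", "Ron", "Rub", "Zav")
yield <- matrix(c(
  4.46, 4.15, 2.85, 3.08, 5.94, 4.45, 4.35, 4.04, 2.67,
  4.42, 4.77, 2.91, 3.51, 5.70, 5.15, 4.96, 4.39, 2.94,
  4.67, 4.58, 3.10, 3.46, 6.07, 5.03, 4.73, 3.90, 2.62,
  4.73, 4.75, 3.38, 3.90, 6.22, 5.34, 4.23, 4.89, 3.45,
  4.39, 4.60, 3.51, 3.85, 5.77, 5.42, 5.15, 4.10, 2.83,
  5.18, 4.48, 2.99, 3.77, 6.58, 5.05, 3.99, 4.27, 2.78,
  3.38, 4.18, 2.74, 3.16, 5.34, 4.27, 4.16, 4.06, 2.03,
  4.85, 4.66, 4.43, 3.95, 5.54, 5.83, 4.17, 5.06, 3.57,
  5.04, 4.74, 3.51, 3.44, 5.96, 4.86, 4.98, 4.51, 2.86,
  5.20, 4.66, 3.60, 3.76, 5.94, 5.35, 3.90, 4.45, 3.30,
  4.29, 4.53, 2.76, 3.42, 6.14, 5.25, 4.86, 4.14, 3.15,
  3.15, 3.04, 2.39, 2.35, 4.23, 4.26, 3.38, 4.07, 2.10,
  4.10, 3.88, 2.30, 3.72, 4.56, 5.15, 2.60, 4.96, 2.89,
  3.34, 3.85, 2.42, 2.78, 4.63, 5.09, 3.28, 3.92, 2.56,
  4.38, 4.70, 3.66, 3.59, 6.19, 5.14, 3.93, 4.21, 2.93,
  4.94, 4.70, 2.95, 3.90, 6.06, 5.33, 4.30, 4.30, 3.03,
  3.79, 4.97, 3.38, 3.35, 4.77, 5.30, 4.32, 4.86, 3.38,
  4.24, 4.65, 3.61, 3.91, 6.64, 4.83, 5.01, 4.36, 3.11
), nrow = 18, byrow = TRUE, dimnames = list(gen, env))

# Tester-centering: subtract each environment (column) mean; no scaling.
centered <- sweep(yield, 2, colMeans(yield))

# Singular value decomposition of the G + GE table.
dec <- svd(centered)

# Percentage of G + GE variation captured by each principal component.
explained <- 100 * dec$d^2 / sum(dec$d^2)
round(explained[1:2], 1)
```

The first two entries of `explained` play the role of the axis percentages shown in the main window; small differences from the GUI output are possible if the package applies additional scaling constants.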

Fig. 1 Window to select the model

• A line that passes through the plot origin and is perpendicular to the environment axis, referred to as the perpendicular line.
• The projections of the cultivars to the environment axis.

Figure 2(right) is based on a "Tester-centered (G + GE)" table, without any scaling and it is row metric preserving (JK biplot). The cultivars are ranked in the direction of the environment axis. In this example cultivar ‘fun’ was the best, followed by ‘cas’ and ‘har’, whereas ‘kat’ was the poorest in the selected environment BH93. The perpendicular line separates cultivars that performed below average from those performing above average in BH93. Cultivars ‘kat’, ‘m12’, ‘ena’, ‘luc’, and ‘ann’ performed below average, whereas other cultivars, on the same side of the perpendicular line as BH93, performed above average.

4.2 The relative adaptation of a given cultivar in different environments

From the Biplot tools menu bar, select "Examine a genotype". Choose any genotype of interest (fun, in this example) from the list in the combo-box and then the following features will appear (see Fig. 3, left):

• A line that passes through the plot origin and the selected cultivar, which is referred to as the genotype axis.
• A line that passes through the plot origin and is perpendicular to the genotype axis, referred to as the perpendicular line.
• The projections of the environments to the genotype axis.

Figure 3(left) is based on a "Tester-centered (G + GE)" table, without any scaling and it is row metric preserving. The environments are ranked in the direction of the genotype axis in terms of the relative performance of ‘fun’. Thus, ‘fun’ performed the best in WP93, followed by NN93, BH93, RN93, ID93, HW93, EA93, KE93 and OA93. The perpendicular line separates environments in which ‘fun’ performed above average from those in which ‘fun’ performed below average.
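The projections used by "Examine an environment" and "Examine a genotype" can be sketched in base R as follows. This is an illustrative reconstruction under stated assumptions, not the package's own code: the data are those of Table 1, the table is tester-centered without scaling, and the JK partitioning is taken to assign the singular values entirely to the genotype markers (constant scaling factors that the GUI may apply are omitted). All object names are hypothetical, and cultivar names are capitalized as in Table 1.

```r
env <- c("BH93", "EA93", "HW93", "ID93", "KE93", "NN93", "OA93", "RN93", "WP93")
gen <- c("Ann", "Ari", "Aug", "Cas", "Del", "Dia", "Ena", "Fun", "Ham",
         "Har", "Kar", "Kat", "Luc", "M12", "Reb", "Ron", "Rub", "Zav")
yield <- matrix(c(
  4.46, 4.15, 2.85, 3.08, 5.94, 4.45, 4.35, 4.04, 2.67,
  4.42, 4.77, 2.91, 3.51, 5.70, 5.15, 4.96, 4.39, 2.94,
  4.67, 4.58, 3.10, 3.46, 6.07, 5.03, 4.73, 3.90, 2.62,
  4.73, 4.75, 3.38, 3.90, 6.22, 5.34, 4.23, 4.89, 3.45,
  4.39, 4.60, 3.51, 3.85, 5.77, 5.42, 5.15, 4.10, 2.83,
  5.18, 4.48, 2.99, 3.77, 6.58, 5.05, 3.99, 4.27, 2.78,
  3.38, 4.18, 2.74, 3.16, 5.34, 4.27, 4.16, 4.06, 2.03,
  4.85, 4.66, 4.43, 3.95, 5.54, 5.83, 4.17, 5.06, 3.57,
  5.04, 4.74, 3.51, 3.44, 5.96, 4.86, 4.98, 4.51, 2.86,
  5.20, 4.66, 3.60, 3.76, 5.94, 5.35, 3.90, 4.45, 3.30,
  4.29, 4.53, 2.76, 3.42, 6.14, 5.25, 4.86, 4.14, 3.15,
  3.15, 3.04, 2.39, 2.35, 4.23, 4.26, 3.38, 4.07, 2.10,
  4.10, 3.88, 2.30, 3.72, 4.56, 5.15, 2.60, 4.96, 2.89,
  3.34, 3.85, 2.42, 2.78, 4.63, 5.09, 3.28, 3.92, 2.56,
  4.38, 4.70, 3.66, 3.59, 6.19, 5.14, 3.93, 4.21, 2.93,
  4.94, 4.70, 2.95, 3.90, 6.06, 5.33, 4.30, 4.30, 3.03,
  3.79, 4.97, 3.38, 3.35, 4.77, 5.30, 4.32, 4.86, 3.38,
  4.24, 4.65, 3.61, 3.91, 6.64, 4.83, 5.01, 4.36, 3.11
), nrow = 18, byrow = TRUE, dimnames = list(gen, env))

# Tester-centered (G + GE) table, no scaling, then SVD.
centered <- sweep(yield, 2, colMeans(yield))
dec <- svd(centered)

# JK (row metric preserving): singular values go to the genotype markers.
k <- 1:2
geno <- dec$u[, k] %*% diag(dec$d[k])
envi <- dec$v[, k]
rownames(geno) <- rownames(yield)
rownames(envi) <- colnames(yield)

# "Examine an environment": project the cultivars onto the BH93 axis.
axis_env <- envi["BH93", ] / sqrt(sum(envi["BH93", ]^2))
proj_gen <- drop(geno %*% axis_env)
sort(proj_gen, decreasing = TRUE)   # approximate ranking of cultivars in BH93
names(proj_gen)[proj_gen < 0]       # cultivars on the below-average side

# "Examine a genotype": project the environments onto the 'Fun' axis.
axis_gen <- geno["Fun", ] / sqrt(sum(geno["Fun", ]^2))
proj_env <- drop(envi %*% axis_gen)
sort(proj_env, decreasing = TRUE)   # approximate ranking of environments for 'Fun'
```

Because only two principal components are retained, these rankings approximate the centered data rather than reproduce them exactly; for instance, the raw BH93 column of Table 1 has its largest value for ‘Har’, while the two-dimensional biplot described above ranks ‘fun’ first.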


Fig. 2 Main window for ontario data (left) and ranking cultivars based on performance in the environment BH93 for ontario data (right)

Fig. 3 Ranking environments based on relative performance of a cultivar—fun—for ontario data (left) and comparing ‘fun’ and ‘zav’ cultivars
for ontario data (right)

4.3 Comparison of two cultivars

From the Biplot tools menu bar, select "Compare two genotypes". Choose any cultivar from Genotype 1 (fun), then choose a different cultivar from Genotype 2 (zav), and click the OK button. After clicking, the following features will appear (see Fig. 3, right):

• Two ovals that circle the two selected cultivars.
• A line that connects the two cultivars (jointer line).
• A line that is perpendicular to the jointer line and passes through the plot origin (equality line).

Figure 3(right) is based on a "Tester-centered (G + GE)" table, without any scaling and it is row metric preserving. A cultivar has higher values in environments that are located at its side of the equality line; see Yan and Tinker (2006). In Fig. 3(right), cultivars ‘fun’ and ‘zav’ were compared. We see that the environments OA93 and KE93 were on the ‘zav’ side of the equality line. Thus, ‘zav’ was better in these environments, but ‘fun’ was better than ‘zav’ in the other seven environments.

4.4 Which-won-where

From the Biplot tools menu bar, select "Which won where/what". The GGE biplot becomes like Fig. 4(left), which is based on a "Tester-centered (G + GE)" table, without any scaling and it is row metric preserving. The polygon is formed by connecting the markers of the genotypes that are farthest away from the biplot origin, such that all other genotypes are contained in the polygon. Figure 4(left) also contains a set of lines perpendicular to each side of the polygon. These perpendicular lines divide the biplot into several sectors. The winning genotype for each sector is the one located at the respective vertex. Genotypes located at the vertices of the polygon are the best or the poorest in one or more environments; see Yan and Tinker (2006). There are five sectors with cultivars ‘fun’, ‘zav’, ‘ena’, ‘kat’ and ‘luc’ as the corner or vertex cultivars. Environments OA93 and KE93 fell in the sector in which ‘zav’ was the vertex cultivar. This means that ‘zav’ was the best cultivar for OA93 and KE93. The other seven environments fell in the sector in which ‘fun’ was the vertex cultivar, meaning that ‘fun’ was the best cultivar for these seven environments. No environments fell into sectors with ‘luc’, ‘ena’, and ‘kat’ as the vertices, indicating that these cultivars were not the best in any of the environments.

4.5 Mean performance and stability of the genotypes

From the Biplot tools menu bar, select "Mean against stability". Figure 4(right) is based on a "Tester-centered (G + GE)" table, without any scaling and it is row metric preserving. This figure is the average-environment coordination (AEC) view of the GGE biplot, which has the following interpretation (see Yan and Tinker 2006):

• The single-arrowed line is the AEC abscissa (or AEA) and points to higher mean yield across environments. Thus, ‘fun’ had the highest mean yield, followed by ‘cas’, ‘har’, etc., whereas ‘kat’ had the lowest mean yield.
• The AEC ordinate passes through the plot origin, is perpendicular to the AEC abscissa, and points to greater variability (poorer stability) in either direction. Thus, ‘rub’ was highly unstable, whereas ‘cas’ was highly stable.

Fig. 4 The which-won-where view of the GGE biplot for ontario data (left) and AEC view for ontario data (right)

4.6 Ranking genotypes relative to the ideal genotype

From the Biplot tools menu bar, select "Rank genotypes with ref. to the ideal genotype". The arrow is where an ideal cultivar should be. Its projection on the AEA was designed to be equal to the longest vector of all cultivars, and its projection on the AEC ordinate was obviously zero, meaning that it is absolutely stable. Therefore, genotypes located closer to the ideal genotype are more desirable than others. Thus, ‘fun’ was more desirable than ‘cas’, where this last was, of course, the poorest genotype; see Fig. 5(left). This figure is based on a "Tester-centered (G + GE)" table, without any scaling and it is row metric preserving.

Fig. 5 Ranking cultivars based on both mean performance and stability for ontario data (left) and discriminativeness against representativeness of test environments for ontario data (right)

4.7 The representativeness and discriminating ability of the environments

Figure 5(right) shows the representativeness and discriminating ability of environments and is based on a "Tester-centered (G + GE)" table, without any scaling and it is column metric preserving (GH biplot). The vector length, that is, the absolute distance between the marker of an environment and the plot origin, is a measure of the discriminating ability: the longer the vector, the greater the discriminating ability of the environment. Therefore, among the nine environments studied, KE93 and OA93 were most discriminating (informative) and RN93 least discriminating. The average environment (represented by the small circle at the end of the arrow) has the average coordinates of all test environments. AEA is the line that passes through the average environment and the biplot origin. A test environment that has a smaller angle with the AEA is more representative of other test environments; see Yan and Tinker (2006). Thus, BH93 is most representative, whereas OA93 is least representative.
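The two quantities discussed in this subsection, vector length and angle with the AEA, can be computed directly from the decomposition. The sketch below is an assumption-laden illustration in base R, not the package's implementation: it uses the Table 1 data, tester-centering without scaling, and a GH partitioning in which the singular values are assigned entirely to the environment markers (constant scaling factors are omitted). Object names are hypothetical.

```r
env <- c("BH93", "EA93", "HW93", "ID93", "KE93", "NN93", "OA93", "RN93", "WP93")
gen <- c("Ann", "Ari", "Aug", "Cas", "Del", "Dia", "Ena", "Fun", "Ham",
         "Har", "Kar", "Kat", "Luc", "M12", "Reb", "Ron", "Rub", "Zav")
yield <- matrix(c(
  4.46, 4.15, 2.85, 3.08, 5.94, 4.45, 4.35, 4.04, 2.67,
  4.42, 4.77, 2.91, 3.51, 5.70, 5.15, 4.96, 4.39, 2.94,
  4.67, 4.58, 3.10, 3.46, 6.07, 5.03, 4.73, 3.90, 2.62,
  4.73, 4.75, 3.38, 3.90, 6.22, 5.34, 4.23, 4.89, 3.45,
  4.39, 4.60, 3.51, 3.85, 5.77, 5.42, 5.15, 4.10, 2.83,
  5.18, 4.48, 2.99, 3.77, 6.58, 5.05, 3.99, 4.27, 2.78,
  3.38, 4.18, 2.74, 3.16, 5.34, 4.27, 4.16, 4.06, 2.03,
  4.85, 4.66, 4.43, 3.95, 5.54, 5.83, 4.17, 5.06, 3.57,
  5.04, 4.74, 3.51, 3.44, 5.96, 4.86, 4.98, 4.51, 2.86,
  5.20, 4.66, 3.60, 3.76, 5.94, 5.35, 3.90, 4.45, 3.30,
  4.29, 4.53, 2.76, 3.42, 6.14, 5.25, 4.86, 4.14, 3.15,
  3.15, 3.04, 2.39, 2.35, 4.23, 4.26, 3.38, 4.07, 2.10,
  4.10, 3.88, 2.30, 3.72, 4.56, 5.15, 2.60, 4.96, 2.89,
  3.34, 3.85, 2.42, 2.78, 4.63, 5.09, 3.28, 3.92, 2.56,
  4.38, 4.70, 3.66, 3.59, 6.19, 5.14, 3.93, 4.21, 2.93,
  4.94, 4.70, 2.95, 3.90, 6.06, 5.33, 4.30, 4.30, 3.03,
  3.79, 4.97, 3.38, 3.35, 4.77, 5.30, 4.32, 4.86, 3.38,
  4.24, 4.65, 3.61, 3.91, 6.64, 4.83, 5.01, 4.36, 3.11
), nrow = 18, byrow = TRUE, dimnames = list(gen, env))

# Tester-centered (G + GE) table, no scaling, then SVD.
centered <- sweep(yield, 2, colMeans(yield))
dec <- svd(centered)

# GH (column metric preserving): singular values go to the environment markers.
k <- 1:2
envi <- dec$v[, k] %*% diag(dec$d[k])
rownames(envi) <- colnames(yield)

# Discriminating ability: distance of each environment marker from the origin.
vec_len <- sqrt(rowSums(envi^2))

# Average environment (mean coordinates) defines the direction of the AEA.
avg_env <- colMeans(envi)

# Representativeness: angle between each environment vector and the AEA.
cosine <- drop(envi %*% avg_env) / (vec_len * sqrt(sum(avg_env^2)))
angle_deg <- acos(pmin(pmax(cosine, -1), 1)) * 180 / pi

# Environments sorted from most to least discriminating.
data.frame(length = round(vec_len, 2),
           angle = round(angle_deg, 1))[order(-vec_len), ]
```

With all principal components retained, the vector length of an environment equals the square root of the sum of squares of its centered column, which is why a longer vector signals larger differences among cultivars; the two-dimensional lengths computed here are lower bounds on that quantity.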

4.8 Environment ranking

The ideal test environment should be most discriminating (informative) and also most representative of the target environment. Figure 6 defines an ideal test environment, which is the center of the concentric circles. This is a point on the AEA in the positive direction (most representative), with a distance to the biplot origin equal to the longest vector of all environments (most informative); see Yan and Tinker (2006). BH93 is closest to this point and is, therefore, best, whereas KE93 and OA93 were poorest for selecting cultivars adapted to the whole region. Figure 6 is based on a "Tester-centered (G + GE)" table, without any scaling and it is column metric preserving.

Fig. 6 Ranking environments based on discriminating ability and representativeness for ontario data

5 Conclusions

In this paper, we have proposed a new interactive R package called GGEBiplotGUI, which has implemented the main functions of the biplot methods. This package is a tool for multi-environment trial data visual analysis and offers a free alternative to the existing commercial software. In addition, the package proposed in this paper is the only open source and interactive software on biplot methods, with a graphical interface that allows us to use it without requiring knowledge of the R programming language, which can make this package very attractive for diverse practitioners. The GGEBiplotGUI package graphically addresses the main questions that a researcher is likely to ask. Although the biplot method for genotype main effect and genotype by environment interaction was developed originally for analyzing data of multi-environment trials, it can also be used to visualize other types of two-way data. For example, it was satisfactorily employed to visualize diallel cross data (see Yan and Hunt 2002), genotype × trait data (see Yan and Rajcan 2002), and genotype × genetic marker data (see Yan and Falk 2002). The biplot method for genotype main effect and genotype by environment interaction is equally applicable to all types of two-way data that assume an entry × tester structure. Thus, the GGEBiplotGUI package can provide a response to several types of problems, but it is mainly intended for the analysis of data generated by plant breeders and geneticists, in order to visually study yields from genotypes and interactions between genotype and environment. Another interesting feature of this freely distributed package is that it allows us to produce the classical biplots, which can be another attractive aspect for different practitioners.

Acknowledgments The authors wish to thank the Editor-in-Chief, Professor George Christakos, an Associate Editor, and anonymous referees for their comments on an earlier version of this manuscript, which resulted in this improved version. The research of Victor Leiva was partially supported by FONDECYT 1120879 grant from the Chilean government.

References

Bacon-Shone J (2008) Compositional data analysis in the geosciences: from theory to practice (by A. Buccianti, G. Mateu-Figueras and V. Pawlowsky-Glahn, eds). Stoch Environ Res Risk Assess 22:139–141
Barros M, Paula GA, Leiva V (2009) An R implementation for generalized Birnbaum–Saunders distributions. Comput Stat Data Anal 53:1511–1528
Bradu D, Gabriel KR (1978) The biplot as a diagnostic tool for models of two-way tables. Technometrics 20:47–68
Cárdenas O, Galindo MP (2003) Biplot with external information based on generalized bilinear models. Printed by Council of Scientific and Humanistic Development of the Central University of Venezuela, Caracas. Spanish version can be downloaded from bit.ly/14BARON
Choulakian V (1996) Generalized bilinear models. Psychometrika 61:271–283
Cornelius PL, Seyedsadr MS, Crossa J (1992) Using the shifted multiplicative model to search for separability in crop cultivar trials. Theor Appl Genet 84:161–172
Cornelius PL, Crossa J, Seyedsadr MS (1996) Statistical tests and estimators for multiplicative models for genotype-by-environment interaction. In: Kang MS, Gauch HG Jr (eds) Genotype-by-environment interaction. CRC Press, Boca Raton
Crossa J, Cornelius JL (1997) Sites regression and shifted multiplicative model clustering of cultivar trial sites under heterogeneity of error variances. Crop Sci 37:405–415
Denis JB (1991) Ajustements de modelles lineaires et bilineaires sous constraintes lineaires avec donnes manquantes. Stat Appl 39:5–24
Ebdon JS, Gauch HG (2002a) Additive main effect and multiplicative interaction analysis of national turfgrass performance trials I: interpretation of genotype × environment interaction. Crop Sci 42:489–496
Ebdon JS, Gauch HG (2002b) Additive main effect and multiplicative interaction analysis of national turfgrass performance trials II: cultivar recommendations. Crop Sci 42:497–506
Eberhart SA, Russell WA (1969) Yield stability for a 10-line diallel of single-cross and double-cross maize hybrids. Crop Sci 9:357–361
Eckart C, Young G (1939) A principal axis transformation for non-Hermitian matrices. Bull Am Math Soc 45:118–121
Falguerolles A (1995) Generalized bilinear models and generalized biplots: some examples. Publications du Laboratoire de Statistique et Probabilites. Universite Paul Sabatier, Toulouse
Finlay KW, Wilkinson GN (1963) The analysis of adaptation in a plant-breeding programme. Aust J Agric Res 14:742–754
Fisher RA, Mackenzie WA (1923) The manurial response of different potato varieties. J Agric Sci 23:311–320
Gabriel KR (1971) The biplot graphic display of matrices with application to principal component analysis. Biometrika 58:453–467
Gabriel KR (1998) Generalised bilinear regression. Biometrika 85:689–700
Galindo MP (1986) An alternative for simultaneous representation: HJ-biplot. Qüestiió 10:12–23
Gauch HG (1988) Model selection and validation for yield trials with interaction. Biometrics 44:705–715
Gauch HG (2006) Statistical analysis of yield trials by AMMI and GGE. Crop Sci 46:1488–1500
Gauch HG, Zobel RW (1996) AMMI analysis of yield trials. In: Gauch HG, Kang MS (eds) Genotype-by-environment interaction. CRC Press, Boca Raton, pp 1–40
Gauch HG, Zobel RW (1997) Interpreting mega-environments and targeting genotypes. Crop Sci 37:311–326
Gauch HG, Piepho HP, Annicchiarico P (2008) Statistical analysis of yield trials by AMMI and GGE: further considerations. Crop Sci 48:866–889
Golub GH, Reinsch CH (1970) Singular value decomposition and least squares solution. Numer Math 14:403–420
Gower JC (1992) Generalized biplots. Biometrika 79:475–493
Gower JC, Hand D (1996) Biplots. Chapman & Hall/CRC, London
Gower JC, Harding SA (1988) Nonlinear biplots. Biometrika 75:445–455
Gower JC, Gardner-Lubbe S, Le Roux N (2011) Understanding biplots. Wiley, New York
Greenacre M (2010) Biplots in practice. BBVA Foundation, Madrid
Householder AS, Young G (1938) Matrix approximation and latent roots. Am Math Mon 45:165–171
Kang MS (1988) Using genotype-by-environment interaction for crop cultivar development. Adv Agron 62:199–252
Kempton RA (1984) The use of biplots in interpreting variety by environment interactions. J Agric Sci 103:123–135
Leiva V, Hernandez H, Sanhueza A (2008) An R package for a general class of inverse Gaussian distributions. J Stat Softw 26:1–21
Nelder JA, Wedderburn RWM (1972) Generalized linear models. J R Stat Soc A 135:370–384
R Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. Available at http://www.R-project.org


Samonte SOPB, Wilson LT, McClung AM, Medley JC (2005) Targeting cultivars onto rice growing environments using AMMI and SREG GGE biplot analysis. Crop Sci 45:2414–2424
Ukkelberg A, Borgen O (1993) Outlier detection by robust alternating regressions. Anal Chim Acta 277:489–494
Van Eeuwijk F (1995) Multiplicative interaction in generalized linear models. Biometrics 51:1017–1032
Vicente-Villardón JL, Galindo MP, Blázquez A (2006) Logistic biplots. In: Greenacre M, Blasius J (eds) Multiple correspondence analysis and related methods. Chapman & Hall, New York
Yan W, Falk DE (2002) Biplot analysis of host by pathogen interaction. Plant Dis 86:1396–1401
Yan W, Hunt LA (2002) Biplot analysis of diallel data. Crop Sci 42:21–30
Yan W, Kang MS (2003) GGE biplot analysis: a graphical tool for breeders, geneticists, and agronomists. CRC Press, Boca Raton
Yan W, Kang MS (2006) GGEbiplot. Available at http://www.ggebiplot.com
Yan W, Rajcan I (2002) Biplot evaluation of test sites and trait relations of soybean in Ontario. Crop Sci 42:11–20
Yan W, Tinker NA (2006) Biplot analysis of multi-environment trial data: principles and applications. Can J Plant Sci 86:623–645
Yan W, Hunt LA, Sheng Q, Szlavnics Z (2000) Cultivar evaluation and mega-environment investigation based on GGE biplot. Crop Sci 40:597–605
Yan W, Kang MS, Ma B, Woods S, Cornelius PL (2007) GGE biplot vs. AMMI analysis of genotype-by-environment data. Crop Sci 47:643–655
