
Visualizing Correlation

Michael W. Trosset
August 18, 2006

Abstract
The well-known fact that Pearson's product-moment correlation coefficient between two variables is the cosine of the angle between the centered variable profiles suggests a way to visualize correlation. This angular representation of product-moment correlation is automatically
displayed in an h-plot. Using ideas from multidimensional scaling, an alternative angular representation of correlation is proposed. The proposed method can be applied to an arbitrary
matrix of correlation coefficients. Implications for cluster analysis are considered. The proposed
method is applied to correlations of intelligence tests, pit props, and gene expression profiles.

Key words: angular separation, h-plots, multidimensional scaling, cluster analysis, gene expression
profiles, microarray experiments.

Contents

1 Introduction
2 Preliminaries
  2.1 h-Plots
  2.2 Multidimensional Scaling
3 Fitting Angles to Correlation Coefficients
  3.1 Formulation
  3.2 Computation
  3.3 Illustrative Example
4 Clustering by Correlation
5 Three Case Studies
  5.1 Intelligence Tests
  5.2 Pit Props
  5.3 Gene Expression Profiles
6 Discussion

Department of Mathematics, College of William & Mary, P.O. Box 8795, Williamsburg, VA 23185-8795, USA.
E-mail: trosset@math.wm.edu. This research was supported in part by a grant from the Virginia Department
of Planning and Budget, Commonwealth Technology Fund, Industry Inducement Program, Bringing the Future of
Bioinformatics to Virginia.

1 Introduction

Suppose that p variables are measured on each of n cases. We organize the measurements into an
n × p data matrix X, so that the columns of X are the variable profiles. The distinction between
variables and cases is not intrinsic, but instead depends on which profiles we want to correlate.
For example, in a DNA microarray experiment, one might measure the expression (typically
the logarithm of the ratio of two fluorescence intensities; see Hamadeh and Afshari (2000) for an
elementary introduction to gene chips and functional genomics) of a number of genes for a number
of hybridizations. If we are interested in correlating the expression profiles of various genes across a
number of conditions, then variables are genes, p is the number of genes, and each gene's expression
profile is a vector of length n. If we are interested in correlating the expression profiles of various
subjects across a number of genes, then variables are subjects, p is the number of subjects, and
each subject's expression profile is a vector of length n.
Given an n × p data matrix X, let R = (r_jk) denote the p × p matrix of correlation coefficients
computed from the variable profiles of X. Ultimately, we would like to cluster and/or classify
variables on the basis of their correlation. For example, consider the problem (from proteomics)
of clustering or classifying p subjects on the basis of each subject's protein signature, a profile
obtained by mass spectrometry. The locations of peaks in such a signature reveal the presence
of specific proteins; however, the absolute intensities of the peaks may not be as meaningful as
the relative intensities. Thus, it might be desirable to cluster subjects whose profiles are highly
correlated, rather than cluster subjects whose profiles are close in Euclidean distance.
One of the statistical advantages to working with Euclidean distance is the existence of a variety
of multivariate techniques that operate directly on X. If we replace Euclidean distance with some
measure of dissimilarity, then there are two natural ways to proceed. First, we might restrict
attention to methods (e.g., complete linkage cluster analysis and/or nearest neighbor classification)
that operate directly on dissimilarities. Second, we might use multidimensional scaling to embed the
objects to be clustered/classified in a Euclidean space. Similarity is easily converted to dissimilarity,
but correlation, which varies in [−1, 1], does not transparently measure either.
Multidimensional scaling is a collection of techniques for visualizing dissimilarity as distance.
We seek analogous techniques for visualizing correlation. The key fact on which our development
is based is well-known. Given a variable profile x_j ∈ R^n, the centered profile is

x̄_j = x_j − (x_j^T e / n) e,

where e = (1, . . . , 1)^T ∈ R^n. Then Pearson's product-moment correlation coefficient between variables j and k is

r_jk = x̄_j^T x̄_k / ( ‖x̄_j‖ ‖x̄_k‖ ),

the cosine of the angle between the centered variable profiles. Thus, correlation is best understood as a measure of angular separation.
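As a quick numerical check of this identity (our own illustration, assuming NumPy; none of this code appears in the paper), centering two profiles and taking the cosine of the angle between the centered profiles reproduces the usual sample correlation coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=25), rng.normal(size=25)   # two variable profiles of length n = 25

# Center each profile, then take the cosine of the angle between the centered profiles.
xc, yc = x - x.mean(), y - y.mean()
cosine = (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

# Agrees with the product-moment correlation coefficient.
assert np.isclose(cosine, np.corrcoef(x, y)[0, 1])
```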
Our investigation is especially concerned with correlation coefficients other than Pearson's
product-moment correlation coefficient. In the case of product-moment correlation, the correlations between the variables can be visualized as angles by constructing an h-plot directly from X.
One purpose of this investigation is to extend this ability to other types of correlation. Section 2
provides relevant summaries of h-plots and multidimensional scaling, including an interesting connection between them. Section 3 proposes a new way of visualizing correlation. Section 4 describes
some issues that arise when clustering variables on the basis of correlation. Section 5 applies

the proposed method to correlations of intelligence tests, pit props, and gene expression profiles.
Section 6 concludes.

2 Preliminaries

2.1 h-Plots

The standard method for visualizing product-moment correlations as angles is the h-plot, introduced
by Corsten and Gabriel (1976). Our summary of h-plots was excerpted from Seber (1984), who
developed h-plots in the more general framework of Gabriel's (1971) biplots. In the more general
framework, an h-plot is a fragment of a biplot, which displays information about both variables
and cases.
Let X̄ denote an n × p centered data matrix of rank p. Let

X̄ = L Λ M^T = Σ_{i=1}^{p} λ_i ℓ_i m_i^T

denote the singular value decomposition of X̄, where λ_1 ≥ · · · ≥ λ_p > 0 are the singular values, the ℓ_i are the columns of L, and the m_i are the columns of M. The λ_i² are the eigenvalues of

X̄^T X̄ = (n − 1) S,

where S is the unbiased sample covariance matrix, and the m_i are the corresponding eigenvectors. Now let H = M Λ / √(n − 1), so that H H^T = S. Let h_i^T denote row i of H. Then h_i ∈ R^p and

(a) The sample covariance between variables j and k is s_jk = h_j^T h_k, and the sample standard deviation of variable j is ‖h_j‖.
(b) The sample correlation between variables j and k is

r_jk = s_jk / √(s_jj s_kk) = h_j^T h_k / ( ‖h_j‖ ‖h_k‖ ),

the cosine of the angle between h_j and h_k.


(c) The squared distance between h_j and h_k is

‖h_j − h_k‖² = s_jj − 2 s_jk + s_kk,

the sample variance of the difference between variables j and k.
An h-plot attempts to preserve the preceding properties in R². To construct an h-plot, one approximates X̄ with

X̄_(2) = Σ_{i=1}^{2} λ_i ℓ_i m_i^T

and H with

H_(2) = ( λ_1 m_1 / √(n − 1), λ_2 m_2 / √(n − 1) ),

then plots the rows of H_(2) as vectors in R². These vectors approximate the h_i ∈ R^p, with the interpretations described above. However, it must be emphasized that, in general, there will be unavoidable errors in the 2-dimensional representation. Notice that, if the data matrix X̄ has been standardized, then S is the sample correlation matrix. In this case, each ‖h_j‖ = 1 and the nonunit lengths of the approximating vectors indicate the quality of the 2-dimensional representation.
The approximation of H with H_(2) is optimal in the sense that X̄_(2) is the n × p matrix of rank 2 that is nearest X̄ in Frobenius norm, but it is not optimal in any angle-specific sense. Thus, h-plots are not specifically optimized for visualizing correlation. Nevertheless, if one wants to visualize product-moment correlations, then h-plots not only accomplish the task but provide a great deal of additional information. But what if one wants to visualize another type of correlation?
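For readers who want to reproduce the construction, the following sketch (our own, in Python with NumPy; the function name h_plot_coords is ours, not from the paper) computes the rows of H_(2) from the singular value decomposition of the centered data matrix:

```python
import numpy as np

def h_plot_coords(X):
    """Rows of H_(2): one 2-vector per variable (per column of X)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                        # center each variable profile
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    M = Vt.T                                       # right singular vectors m_i as columns
    return M[:, :2] * (s[:2] / np.sqrt(n - 1))     # p x 2 matrix H_(2)

# The cosine of the angle between rows j and k of H_(2) approximates r_jk, and the
# length of row j approximates the standard deviation of variable j.
```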

2.2 Multidimensional Scaling

Multidimensional scaling (MDS) is a collection of techniques for constructing configurations of


points (typically in a low-dimensional Euclidean space) from dissimilarity data. The basic idea is
to find a configuration for which the interpoint distances approximate the specified dissimilarities.
If one starts with similarity data, then one begins by transforming the similarities to dissimilarities.
Different authors use different definitions of dissimilarity and similarity. Usually, a dissimilarity matrix Δ = (δ_jk) is required to satisfy δ_jk = δ_kj ≥ 0 and δ_jj = 0. Mardia, Kent, and Bibby (1979) only required a similarity matrix C = (c_jk) to satisfy c_jk = c_kj and c_jk ≤ c_jj, but they required a reasonable measure of similarity c to satisfy c(A, B) = c(B, A), c(A, B) > 0, and c(A, B) increases as the similarity between A and B increases. Seber (1984) required c_jk = c_kj and 0 ≤ c_jk ≤ c_jj = 1. In both books, the standard transformation for converting similarities to dissimilarities is

δ_jk = (c_jj − 2 c_jk + c_kk)^{1/2}.
In our view, correlation is not a reasonable measure of similarity. Suppose that X_2 is uncorrelated with X_3 = −X_1. In what sense is X_1 more similar to X_2 (r_12 = 0) than it is to X_3 (r_13 = −1)? Nevertheless, if all r_jk ≥ 0, then it may be tempting to treat the r_jk as similarities and try to visualize the correlational structure of the variables by MDS.
To see where this leads, let us begin with a sample covariance matrix S = (s_jk). Treating S as a similarity matrix, we apply the standard transformation and obtain a dissimilarity matrix Δ = (δ_jk) that satisfies

δ_jk² = s_jj − 2 s_jk + s_kk = ‖h_j − h_k‖².

Thus, Δ is the matrix of Euclidean distances formed by the h_j ∈ R^p of Section 2.1. A configuration of points with these interpoint distances can be constructed in R^p (or approximated in a lower-dimensional R^q) by MDS. There exists an isometric transformation of this p-dimensional configuration that equals the h_j, but how do we identify it? The origin of the space in which MDS constructs a configuration is arbitrary, typically chosen to equal the centroid of the configuration. Hence, the angles formed by vectors emanating from this origin are also arbitrary. We conclude that it is difficult to use MDS to visualize correlation as angular separation: we do not know how to locate the origin that produces the correct angles.
So far we have emphasized product-moment correlation. Now suppose that R = (r_jk) is just a matrix of correlation coefficients, without a corresponding covariance matrix. If we insist on treating R as a similarity matrix, then the standard transformation produces dissimilarities

δ_jk = (1 − 2 r_jk + 1)^{1/2} = √( 2 (1 − r_jk) ).

This will lead to a configuration in which short distances correspond to strong positive correlation, intermediate distances correspond to a lack of correlation, and long distances correspond to strong negative correlation. This may serve, but the approach is somewhat indirect. Our goal is the direct approximation of angular separation.
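To make the indirect route concrete, here is a minimal sketch (again our own, assuming NumPy) of classical MDS applied to the dissimilarities δ_jk = √(2(1 − r_jk)); short interpoint distances then indicate strong positive correlation and long distances indicate strong negative correlation:

```python
import numpy as np

def classical_mds_from_correlations(R, dim=2):
    """Embed variables so that interpoint distances approximate sqrt(2(1 - r_jk))."""
    p = R.shape[0]
    D2 = 2.0 * (1.0 - R)                      # squared dissimilarities
    J = np.eye(p) - np.ones((p, p)) / p       # centering matrix
    B = -0.5 * J @ D2 @ J                     # double-centered inner-product matrix
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1]               # largest eigenvalues first
    w, V = w[order], V[:, order]
    return V[:, :dim] * np.sqrt(np.maximum(w[:dim], 0.0))
```

As discussed above, the origin of such a configuration is arbitrary, so the angles it displays carry no direct information about correlation.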
Hills (1969) used Gower's (1966) method of principal coordinate analysis (classical MDS) with δ_jk = √(2(1 − r_jk)). Later, the MDS community began to appreciate that constructing configurations
of variables may differ from constructing configurations of cases. For example, in Section 1.3.3 of
their monograph, Cox and Cox (1994) stated:
Sometimes it is not the objects that are to be subjected to multidimensional scaling
but the variables. One possibility for defining dissimilarities for variables is simply to
reverse the roles of objects and variables and to proceed regardless, using one of the
dissimilarity measures. Another possibility is to choose a dissimilarity more appropriate
to variables than objects.
The sample correlation coefficient r_ij is often used as the basis for dissimilarity between variables. For instance δ_ij = 1 − r_ij could be used. This measure has its
critics. A similar dissimilarity can be based on the angular separation of the vectors of
observations associated with the ith and jth variables. . .
The authors proceeded to summarize research by Zegers and ten Berge (1985), Zegers (1986), and
Fagot and Mazo (1989) on developing more general measures of angular separation. Yet no one, so
far as we know, has attempted to develop MDS techniques that are customized for such measures.
Product-moment correlation coefficients are, literally, the cosines of angles. Other correlation
coefficients are not, but we see considerable virtue in visualizing them as though they were. In the
next section, we synthesize ideas from h-plots and MDS, proposing a way to construct plots that
are optimized for the specific purpose of visualizing correlations as angles.

3 Fitting Angles to Correlation Coefficients

3.1 Formulation

Corresponding to a set of p variables, let R = (r_jk) denote a p × p matrix of correlation coefficients, i.e., |r_jk| ≤ 1 and r_jj = 1. We do not assume that the r_jk are product-moment correlation coefficients, so R may not be positive semidefinite. Given such a matrix, we would like to visualize the corresponding variables as vectors, and we want to do so in such a way that the angles between the vectors convey information about the correlation between the variables.

Because our interest is in correlation, we will avoid distractions and construct vectors of unit length. Initially, we follow the lead of Corsten and Gabriel (1976) and restrict attention to R². This means that we seek p points on the unit circle, identified with scalar angles θ_1, . . . , θ_p. To remove rotational indeterminacy, we set θ_1 = 0.

The angle between vectors j and k is θ_j − θ_k. We seek θ = (0, θ_2, . . . , θ_p) for which

r_jk ≈ cos(θ_j − θ_k) = cos(θ_k − θ_j).     (1)

Thus, we seek to solve an unconstrained optimization problem of the form

minimize f(θ) = Δ(R, C(θ))     (2)

for some measure of discrepancy Δ between the matrices R and C(θ) = (cos(θ_j − θ_k)). An obvious measure of discrepancy is squared error, resulting in the objective function

f(θ) = ‖R − C(θ)‖² = 2 Σ_{j<k} [ r_jk − cos(θ_j − θ_k) ]².     (3)

Because errors of magnitude ε have different implications depending on the magnitude of |r_jk|, one might prefer to transform the r_jk and cos(θ_j − θ_k) before comparing them. A natural choice of transformation is Fisher's z-transformation,

z(r) = (1/2) log( (1 + r) / (1 − r) ),

which approximately normalizes the product-moment correlation coefficient. Ignoring the constant multiplier, this leads to the alternative objective function

f_z(θ) = 2 Σ_{j<k} [ log( (1 + r_jk) / (1 − r_jk) ) − log( (1 + cos(θ_j − θ_k)) / (1 − cos(θ_j − θ_k)) ) ]².     (4)

Because z(r) → ±∞ as r → ±1, (4) places greater weight on approximating pairs of highly correlated variables than does (3). Notice that the objective function f_z can also be derived by supposing that the correlation coefficient is Kendall's τ_b. Then π(r) = (1 + r)/2 is the probability of concordance, and applying the logit transformation results in

log( π(r) / (1 − π(r)) ) = log( ((1 + r)/2) / (1 − (1 + r)/2) ) = log( (1 + r) / (1 − r) ) = 2 z(r).

Finally, notice that z(r) is not defined for r = ±1. However, if some r_jk = 1, then a natural way to avoid difficulty is to set θ_k = θ_j and proceed with a reduced set of variables.
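For concreteness, a sketch of the z-transformed criterion, up to the constant multiplier mentioned above (our own illustration, assuming NumPy):

```python
import numpy as np

def z(r):
    """Fisher's z-transformation; diverges as r -> +/-1."""
    return 0.5 * np.log((1.0 + r) / (1.0 - r))

def f_z(theta, R):
    """Criterion proportional to (4): squared error on the z-transformed scale."""
    C = np.cos(theta[:, None] - theta[None, :])
    off = ~np.eye(len(theta), dtype=bool)      # skip the diagonal, where z is infinite
    return np.sum((z(R[off]) - z(C[off])) ** 2)
```

Note that f_z blows up whenever two fitted angles coincide, which is the source of the multiple basins discussed in Section 3.2.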
Thus far, we have emphasized the display of scalar angles in two dimensions. Such displays
are visually appealing, but they convey information inefficiently. To obtain 2-dimensional angular
representations, we resort to spherical coordinates, parametrizing x ∈ R³ with ‖x‖ = 1 by x_1 = cos φ sin θ, x_2 = sin φ sin θ, and x_3 = cos θ, where φ is longitude and θ ∈ [0, π] is colatitude. When 3-dimensional display of the x vectors is impractical, we display the corresponding (φ, θ) in two dimensions.

The angle between (φ_j, θ_j) and (φ_k, θ_k) has cosine

cos φ_j sin θ_j cos φ_k sin θ_k + sin φ_j sin θ_j sin φ_k sin θ_k + cos θ_j cos θ_k.

After the judicious application of various trigonometric identities, this leads to the objective function

g(φ, θ) = 2 Σ_{j<k} [ r_jk − cos(θ_j − θ_k) cos²((φ_j − φ_k)/2) − cos(θ_j + θ_k) sin²((φ_j − φ_k)/2) ]²

and the optimization problem

minimize g(φ, θ) subject to θ_i ∈ [0, π].     (5)

To remove rotational indeterminacy, we require φ_1 = 0 and θ_1 = θ_2 = π/2. If one is so inclined, then one can define the alternative objective function g_z in analogy with f_z.
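The spherical parametrization is easy to check numerically. The sketch below (ours, in Python) converts two (longitude, colatitude) pairs to unit vectors and returns the cosine of the angle between them, which is the quantity that g(φ, θ) fits to r_jk:

```python
import numpy as np

def sphere_cos(phi_j, theta_j, phi_k, theta_k):
    """Cosine of the angle between unit vectors with longitude phi and colatitude theta."""
    def unit(phi, theta):
        return np.array([np.cos(phi) * np.sin(theta),
                         np.sin(phi) * np.sin(theta),
                         np.cos(theta)])
    return float(unit(phi_j, theta_j) @ unit(phi_k, theta_k))
```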

3.2 Computation

Unlike h-plots, which can be computed by matrix factorization, solutions to (2) must be computed
by numerical optimization. The situation is precisely analogous to MDS, where the classical solution
can be computed by matrix factorization but directly fitting distances to dissimilarities entails
minimizing an objective function, e.g., Kruskal's (1964) raw stress criterion, by an iterative method.

        1       2       3       4       5       6       7       8       9      10
 1   1.000   0.821   0.768   0.842   0.274  -0.821  -0.768  -0.463  -0.726  -0.579
 2   0.821   1.000   0.695   0.789   0.368  -0.789  -0.716  -0.558  -0.737  -0.463
 3   0.768   0.695   1.000   0.758   0.168  -0.674  -0.600  -0.400  -0.600  -0.495
 4   0.842   0.789   0.758   1.000   0.221  -0.768  -0.674  -0.453  -0.674  -0.526
 5   0.274   0.368   0.168   0.221   1.000  -0.158  -0.295  -0.621  -0.358  -0.358
 6  -0.821  -0.789  -0.674  -0.768  -0.158   1.000   0.653   0.411   0.653   0.484
 7  -0.768  -0.716  -0.600  -0.674  -0.295   0.653   1.000   0.421   0.684   0.411
 8  -0.463  -0.558  -0.400  -0.453  -0.621   0.411   0.421   1.000   0.484   0.568
 9  -0.726  -0.737  -0.600  -0.674  -0.358   0.653   0.684   0.484   1.000   0.432
10  -0.579  -0.463  -0.495  -0.526  -0.358   0.484   0.411   0.568   0.432   1.000

Table 1: A matrix of τ_b correlation coefficients, generated by drawing five points from each of two 20-variate normal distributions with different mean vectors and the same covariance matrix.
Our experience suggests that (3) can be reliably minimized by the S-Plus¹ function nlminb, a quasi-Newton algorithm developed by Gay (1983, 1984). We start nlminb by randomly generating θ_i ∈ (0, 2π), provide analytic gradients, and do not impose bound constraints.² Because cos(θ + 2π) = cos(θ) and cos(−θ) = cos(θ), each distinct representation of the given correlation coefficients corresponds to infinitely many values of θ. We could address this redundancy by requiring θ_i ∈ [0, 2π] and θ_2 ∈ [0, π], but we are loath to solve a constrained optimization problem when it suffices to solve an unconstrained problem. For small problems, our experience suggests that starting from several randomly generated values of θ usually suffices to find an element of an equivalence class of θ that globally minimizes (3).
In contrast, nlminb is not likely to find a global minimizer of (4) unless it is provided with an excellent starting value of θ. The reason is that f_z(θ) → ∞ if any |θ_j − θ_k| → 0, which means that each ordering of the components of θ corresponds to a different basin of f_z. Finding the basin that contains the global minimizer is a combinatorial problem, and a randomly generated starting value of θ is not likely to lie in the correct basin. One possible heuristic approach is to begin minimizing (4) from a global minimizer of (3).
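The following sketch illustrates this strategy in Python (the paper itself uses the S-Plus function nlminb with analytic gradients; here a quasi-Newton method with numerical gradients stands in, and the function names are ours):

```python
import numpy as np
from scipy.optimize import minimize

def f(theta_free, R):
    """Criterion (3); theta_1 is fixed at 0 to remove rotational indeterminacy."""
    theta = np.concatenate(([0.0], theta_free))
    C = np.cos(theta[:, None] - theta[None, :])
    return np.sum((R - C) ** 2)               # summing both triangles supplies the factor of 2

def fit_angles(R, n_starts=20, seed=0):
    """Minimize (3) from several random starts and keep the best local minimizer."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        theta0 = rng.uniform(0.0, 2.0 * np.pi, size=R.shape[0] - 1)
        res = minimize(f, theta0, args=(R,), method="BFGS")
        if best is None or res.fun < best.fun:
            best = res
    return np.concatenate(([0.0], best.x)), best.fun
```

Running fit_angles repeatedly with different seeds mimics the multiple random starts described above.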

3.3 Illustrative Example

The matrix, R, of correlation coefficients in Table 1 was generated by drawing ten points in R^20 (five points from each of two multivariate normal distributions with different mean vectors and the same covariance matrix), then computing Kendall's τ_b for each pair of points. The correlation structure of R is readily discerned by direct inspection. There are two clusters of variables (1–5 and 6–10). Within each cluster, variables are positively correlated; between the clusters, variables are negatively correlated. Variable 5 is not highly correlated with variables 1–4, but has a fairly strong negative correlation with variable 8. We should hope that a correlation diagram will reveal these features.
Twenty attempts (from different random starting values of θ) to minimize (3) yielded three equivalence classes of minimizers, with corresponding objective function values equal to 4.610516 (ten attempts), 5.914012 (five attempts), and 7.588137 (five attempts). Figure 1 displays the correlation diagram that corresponds to 4.610516, the putative global minimum value of (3). In this correlation diagram, each radius represents a variable.

¹ S-Plus is a commercial implementation and extension of the statistical programming language S, developed at Bell Labs (now Lucent Technologies). See http://www.insightful.com.
² S-Plus functions for minimizing (3) and constructing the corresponding correlation diagrams are available at http://www.math.wm.edu/trosset/r.mds.html.

Figure 1: A correlation diagram. Each radius represents one variable. The angles between the radii
approximate the correlation coefficients in Table 1.
Figure 1 clearly reveals several key features of the correlation structure of R. Variables 1–4 have strong positive correlation and are tightly clustered in the correlation diagram. Variables 6–10 have positive correlation, but are not so strongly correlated as variables 1–4. They are weakly clustered in the correlation diagram. Variables 1–4 and variables 6–10 are negatively correlated and are diametrically opposed in the correlation diagram. Variable 5 is not strongly correlated with variables 1–4, but has a stronger negative correlation with variable 8. In the correlation diagram, variable 5 is distinct from the cluster of variables 1–4, and is diametrically opposed to variable 8.

4 Clustering by Correlation

Given a matrix, R, of correlation coefficients, suppose that we want to cluster the correlated
variables. For the sake of specificity, suppose that we will do so by an agglomerative hierarchical
clustering algorithm that is invariant under monotonic transformations of the (dis)similarities, e.g.,
complete linkage. There are two ways to proceed, depending on whether one interprets r_jk = −1 as meaning that variables j and k are maximally similar (equivalent to r_jk = 1) or maximally dissimilar. It may be unclear which interpretation should prevail. For example, suppose that j and k correspond to two genes in a microarray experiment and r_jk measures the correlation between their expression profiles over a number of hybridizations. To the statistician, r_jk = −1 means that the two genes are conveying identical information; to the biologist, r_jk = −1 suggests that they have different biological responsibilities.
If r_jk = −1 indicates maximal dissimilarity, then it is natural to cluster on the basis of the angles between the variables, i.e., using the dissimilarity matrix A = (acos(r_jk)). Because the function acos : [−1, 1] → [0, π] is decreasing, this is monotonically equivalent to treating R as a similarity matrix. Clusters can be visualized using correlation diagrams such as Figure 1, in which each variable is represented by a radius and clusters of variables correspond to clusters of radii.

If r_jk = −1 indicates maximal similarity, then it is natural to cluster on the basis of the smaller of the angles between variable j and variable k, i.e., the smaller of the angle between the variables and π minus that angle. This is equivalent to using the dissimilarity matrix A = (a_jk), where

a_jk = min( acos(r_jk), acos(−r_jk) ) = acos(|r_jk|).

Because acos is decreasing, clustering on the basis of the dissimilarities a_jk = acos(|r_jk|) is monotonically equivalent to treating abs(R) = (|r_jk|) as a similarity matrix. Clusters can be visualized using modified correlation diagrams in which each variable and its negative is represented by a radius, i.e., each variable is represented by a diameter. Thus, clusters of variables correspond to clusters of diameters. Figure 2's modified correlation diagram replaces each radius in Figure 1's correlation diagram with a diameter. The impact on the clusters that the eye discerns is striking. While one might discern as many as five distinct clusters in Figure 1, it is difficult to discern more than two clusters in Figure 2. In particular, variables 1, 2, 3, 4 and variables 6, 7 form two diametrically opposed clusters in Figure 1 and (along with variable 9) one homogeneous cluster in Figure 2. The same remark applies to variable 5 and variables 8, 10.
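The two interpretations lead to two clustering recipes; the sketch below (our own, using SciPy's hierarchical clustering, which the paper does not prescribe) implements both:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_by_correlation(R, r_minus_one_is_dissimilar=True, n_clusters=2):
    """Complete-linkage clustering of variables from a correlation matrix R.

    r_minus_one_is_dissimilar=True  uses the dissimilarities acos(r_jk);
    r_minus_one_is_dissimilar=False uses the dissimilarities acos(|r_jk|).
    """
    r = R if r_minus_one_is_dissimilar else np.abs(R)
    A = np.arccos(np.clip(r, -1.0, 1.0))       # angles in [0, pi]
    np.fill_diagonal(A, 0.0)
    Z = linkage(squareform(A, checks=False), method="complete")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

Applied to Table 1, the two options can give quite different partitions, in keeping with the contrast between Figures 1 and 2.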
Although clustering from the dissimilarities acos(|r_jk|) is monotonically equivalent to clustering from the similarities |r_jk|, visualizing clusters via diameters in a correlation diagram formed from R is not equivalent to visualizing clusters via radii in a correlation diagram formed from abs(R). To illustrate the difference, consider the correlation matrix

R = [   1      √2/2     0    −√2/2
      √2/2      1     √2/2     0
        0      √2/2     1     √2/2
     −√2/2      0     √2/2     1   ],

which has eigenvalues (2, 2, 0, 0). Because rank(R) = 2, the r_jk can be represented exactly in a correlation diagram with θ = (0, π/4, π/2, 3π/4). However, the eigenvalues of abs(R) are (1 + √2, 1, 1, 1 − √2), so abs(R) is not a correlation matrix and an exact representation of the |r_jk| is impossible. In fact, if we minimize (3) for abs(R), we obtain solutions of the form θ = (0, a, a, 0), in which variables 1 and 4 are represented as identical, as are variables 2 and 3.

5 Three Case Studies

5.1 Intelligence Tests

The matrix R = (r_jk) of correlation coefficients displayed in Table 2 was analyzed by Guttman (1965) and by Borg and Groenen (1997). Because each r_jk > 0, the techniques proposed here do
not yield dramatically different conclusions than traditional MDS methods. They do, however,
provide new insights into how to interpret the results of traditional analyses.

Figure 2: A modified correlation diagram, corresponding to the correlation diagram in Figure 1. Here, each diameter represents one variable.

Figure 3 displays two correlation diagrams of R, corresponding to the following values of θ:
i        1      2      3      4      5      6      7      8     f(θ)
θ*_i   0.000  0.167  1.221  1.460  1.555  1.226  0.144  0.004  8.281745
θ**_i  0.000  0.092  0.427  1.415  1.609  1.357  1.410  0.145  8.406409
These local minimizers were found by minimizing (3) from random starting values; we believe that θ* is a global minimizer. These solutions are roughly comparable with respect to how well they approximate R, but they cluster the variables differently. Both suggest two clusters of tests, but the first suggests clustering test 3 with tests 4-5-6 and test 7 with tests 1-2-8, whereas the second suggests clustering test 3 with tests 1-2-8 and test 7 with tests 4-5-6. From these observations, we deduce that a single dimension of angular separation is not enough to adequately display the correlations between these tests. In fact, f(θ*) is quite large, corresponding to a root mean squared error of more than 0.38 per correlation coefficient.
In contrast, Figure 4 displays two traditional MDS representations. Given eight points in R², we store their coordinates in the rows of the 8 × 2 configuration matrix X and minimize an objective function of the form

σ_s(X) = 2 Σ_{j<k} [ d_jk^s(X) − δ_jk^s ]²,

where d_jk(X) is the Euclidean distance between points j and k and δ_jk = (2 − 2 r_jk)^{1/2} is the dissimilarity between variables j and k.

N/G  A/I  Test     1     2     3     4     5     6     7     8
 N    A    1    1.00  0.67  0.40  0.19  0.12  0.25  0.26  0.39
 N    A    2    0.67  1.00  0.50  0.26  0.20  0.28  0.26  0.38
 N    I    3    0.40  0.50  1.00  0.52  0.39  0.31  0.18  0.24
 G    I    4    0.19  0.26  0.52  1.00  0.55  0.49  0.25  0.22
 G    I    5    0.12  0.20  0.39  0.55  1.00  0.46  0.29  0.14
 G    A    6    0.25  0.28  0.31  0.49  0.46  1.00  0.42  0.38
 G    A    7    0.26  0.26  0.18  0.25  0.29  0.42  1.00  0.40
 G    A    8    0.39  0.38  0.24  0.22  0.14  0.38  0.40  1.00

Table 2: Correlation coefficients of eight intelligence tests, coded for language (N = numerical, G = geometrical) and requirement (A = application, I = inference).

Figure 3: Two correlation diagrams of the eight intelligence tests described in Table 2. Each
diagram corresponds to a different local minimizer of the objective function in (3).
Choosing s = 1 gives the raw stress criterion; choosing
s = 2 gives the raw sstress criterion. The former is more popular in the MDS community, perhaps in
part because most people process Euclidean distance more easily than squared Euclidean distance.
Notice, however, that

σ_2(X) = 2 Σ_{j<k} 4 [ r_jk − 1 + d_jk²(X)/2 ]²

measures the quality of the approximation of R, so that values of σ_2/4 are directly comparable to values of f. In this sense, sstress is more natural than stress for visualizing correlation.
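As a concrete rendering of this comparison (a sketch of ours, assuming NumPy), σ_2(X)/4 can be computed directly from a configuration X and the correlation matrix R, with 1 − d_jk²(X)/2 playing the role that cos(θ_j − θ_k) plays in (3):

```python
import numpy as np

def sstress_quarter(X, R):
    """sigma_2(X)/4, directly comparable to the angular criterion f in (3)."""
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # squared interpoint distances
    approx = 1.0 - 0.5 * D2                                      # stands in for cos(theta_j - theta_k)
    off = ~np.eye(len(R), dtype=bool)
    return np.sum((R[off] - approx[off]) ** 2)
```

With angle-based fits, the analogous quantity is f(θ) itself, so the two displays can be compared on a common root-mean-squared-error scale.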
The quality of the 2-dimensional MDS approximations to R is substantially better than the quality of the 1-dimensional angular approximations: σ_2(X_1)/4 = 3.48787 and σ_2(X_2)/4 = 2.781102, corresponding to respective root mean squared errors of approximately 0.25 and 0.22 per correlation coefficient.

Figure 4: Two 2-dimensional representations of the eight intelligence tests described in Table 2, obtained by multidimensional scaling. X_1 (left) minimizes the raw stress criterion; X_2 (right) minimizes the raw sstress criterion.

Each of these roughly circular configurations represents tests 3 and 7 in diametric opposition, suggesting that they are quite dissimilar. This suggestion is correct: in fact, r_37 = 0.18
is one of the smallest correlations in a matrix of positive correlation coefficients. Thus, a clustering
that places tests 3 and 7 in the same cluster would be unsatisfying. The two different clusterings
suggested by the correlation diagrams in Figure 3 resist placing tests 3 and 7 in the same cluster.
The first clustering, 3-4-5-6 versus 1-2-7-8, corresponds to a linear cut (roughly, from 1:00 to 7:00)
in Figure 4. The second clustering, 1-2-3-8 versus 4-5-6-7, also corresponds to a linear cut (roughly,
from 2:00 to 8:00) in Figure 4.
It is easy to misconstrue the information conveyed by the configurations in Figure 4, which
should not be interpreted as h-plots. Recall that we are unable to identify a point of origin that
would permit such an interpretation. Should we thoughtlessly position it at the center of this
roughly circular configuration, then we would be misled into thinking that many pairs of tests have
negative correlation. In contrast, the correlation diagrams in Figure 3, both of which position the
eight tests within arcs of less than 1.61 radians, correctly indicate that all of the tests are positively
correlated. Instead, the configurations in Figure 4 should be regarded as planar approximations of a 2-dimensional angular representation in which the tests are represented as unit vectors in R³ and the angles
between these vectors approximate the correlations between the tests. Evidently, the quality of
such planar approximations will depend on how much of the sphere is used for representation.
To assess the quality of the planar approximation in Figure 4, we started searches from various randomly generated values of (φ, θ) and identified several local solutions of problem (5). The
best of these solutions is displayed in Figure 5. The quality of this optimal 2-dimensional angular approximation of R is only minutely better than the 2-dimensional sstress approximation
(2.747227 < 2.781102), while the evident similarities between Figures 4 and 5 validate our interpretation of the former as a planar approximation of the latter. We conclude that MDS can be
used to visualize correlation if the correlation coefficients are sufficiently positive.


Figure 5: A 2-dimensional angular representation of the eight intelligence tests described in Table 2, obtained by solving the optimization problem (5). Each test is identified with a point on a sphere of constant radius; the longitude and colatitude of each point are displayed.

5.2 Pit Props

If the correlation coefficients are both positive and negative, then one should anticipate that angular
representations of correlation will be superior to distance (MDS) representations of correlation. A
well-known example of such a matrix was reported by Jeffers (1967).
Jeffers (1967) analyzed data from a study to determine whether or not pitprops cut from
home-grown timber are sufficiently strong for use in the mines. In this study, each of 180 pit props
made of Corsican pine from East Anglia was measured for maximum compressive strength and a
number of other variables, 13 of which were reported by Jeffers. Jeffers was primarily interested
in using the method of orthogonalized regression to predict maximum compressive strength from
(principal components of) the 13 specified variables. This concern is beyond the scope of our efforts
to illustrate the use of correlation diagrams. However, R = (r_jk), the correlation matrix of the 13 predictor variables, is noteworthy for containing correlations as positive as r_12 = 0.954 and as negative as r_{7,13} = −0.424.³
Three of five attempts to minimize f from random starting values of θ resulted in the correlation diagram in Figure 6. Again, the quality of this 1-dimensional angular approximation of R is rather poor: f(θ*) = 31.60544, corresponding to a root mean squared error of approximately 0.45 per correlation coefficient. Again, the fit of the 1-dimensional angular approximation is inferior to the fit of the 2-dimensional MDS solution obtained by minimizing the raw sstress criterion: σ_2(X_2)/4 = 15.43195, corresponding to a root mean squared error of slightly more than 0.31 per correlation coefficient. And again, the fit of the 2-dimensional angular approximation is superior to the fit of the 2-dimensional MDS solution: g(φ*, θ*) = 12.22801 (best of ten attempts), corresponding to a root mean squared error of 0.28 per correlation coefficient.
³ This correlation matrix also appears as Data Set 148 in Hand et al. (1994).


Figure 6: A correlation diagram of 13 variables measured on each of 180 pine pit props.
Jeffers (1967) noted two interesting observations which may be made by inspection of these
correlations. Let us re-examine his observations in light of Figure 6. First, Jeffers noted that all
variables except j = 5, 11, 12, 13 were significantly correlated with the size of the props, i.e., with
variables k = 1 (top diameter of prop) and k = 2 (length of prop). The strong positive correlation
between top diameter and length (r12 = 0.954) is accurately represented in the correlation diagram.
Furthermore, although we have not encoded information about statistical significance in Figure 6,
the correlation diagram represents the four variables j = 5, 11, 12, 13 as those least correlated with
variables k = 1, 2.
Jeffers further noted that, besides being significantly correlated with variables k = 1, 2, variable
j = 3 was significantly correlated with variables k = 4, 6, 11, 12. In fact, the six variables most
correlated with variable j = 3 were k = 4 (0.882), k = 1 (0.364), k = 2 (0.297), k = 12 (0.220),
k = 11 (0.162), and k = 6 (0.153). Except for variable k = 9 (0.125), these are the six variables
that the correlation diagram represents as most correlated with variable j = 3. Such fidelity is
remarkably satisfying for a 1-dimensional representation of the data.


5.3 Gene Expression Profiles

Several studies have used MDS to visualize relationships between gene expression profiles.⁴ Luo et al.'s (2001) study of prostate cancer, in which profile dissimilarity was measured by a weighted Euclidean distance, is not germane to our investigation. In Khan et al.'s (1998) study of alveolar rhabdomyosarcoma (ARMS), Bittner et al.'s (2000) study of cutaneous malignant melanoma, and Desai et al.'s (2002) study of mouse mammary cancer, profile dissimilarity was measured by subtracting Pearson correlation from one. In Jazaeri et al.'s (2002) study of ovarian cancer, profile similarity was also measured by Pearson correlation, although the transformation to dissimilarity was not reported.
Khan et al. (1998) investigated the problem of expression profiling in ARMS because these
tumors are known to be relatively uniform genetically. They used cDNA microarrays to obtain
gene expression profiles for seven ARMS cell lines, characterized by the presence of the PAX3-FKHR fusion gene, and six unrelated cancer cell lines, concluding that the correlations among the ARMS cell lines are greater than the correlations between the ARMS cell lines and the other cell
lines. Using the reported correlation coefficients, this pattern is easily discerned in Figure 7.
Figure 7: A correlation diagram of Khan et al.'s gene expression profiles for seven ARMS cell lines (A) and six unrelated cancer cell lines.

⁴ Vijay Dondeti searched the literature on microarray experiments for studies that have used MDS.
Jazaeri et al. (2002) used cDNA microarrays to compare gene expression patterns in ovarian
cancers associated with BRCA1 or BRCA2 mutations with gene expression patterns in sporadic
epithelial ovarian cancers. . . , obtaining profiles for 18 patients with BRCA1 mutations, 16 patients
with BRCA2 mutations, and 27 patients with sporadic ovarian cancers. These p = 61 profiles
were published electronically, in a supplemental table that contains one row for each of 6445 genes.
Within this table, there are 4703 rows/genes for which no patient is missing data.
To illustrate our method of visualizing correlation, we restricted attention to the one percent
of the complete rows (n = 47) with the largest variances across patients. We then computed the
61 × 61 matrix R = (r_jk) of Kendall τ_b correlation coefficients between patient profiles, obtaining off-diagonal entries that ranged between r_jk = 0.051 and r_jk = 0.804. A 1-dimensional angular
representation of these correlations, for which the root mean squared error is 0.373, is displayed in
Figure 8. In this correlation diagram, we have displayed only the endpoints of the radii and used
three concentric circles to separate the three groups of patients. The correlation diagram reveals a
tentative clustering of patients, summarized in Table 3.

Figure 8: A correlation diagram of Jazaeri et al.'s (2002) gene expression profiles for 61 ovarian
cancer patients. For clarity, only one point (instead of the entire radius) per patient was plotted.
The 16 patients with BRCA2 founder mutations appear on the inner circle, the 18 patients with
BRCA1 founder mutations appear on the middle circle, and 27 other patients appear on the outer
circle.

                   BRCA2   BRCA1   Sporadic   Total
(−1.800, −1.070)       8       4        16      28
(−0.798, −0.579)       2       1         2       5
(−0.421, +0.108)       6      13         9      28

Table 3: Numbers of patients in three ad hoc clusters of Jazaeri et al.'s (2002) 61 ovarian cancer patients revealed by the correlation diagram in Figure 8.

6 Discussion

We have considered several methods of visualizing correlation. Our fundamental message is that
correlation is a measure of angular separation, not distance. Attempts to visualize correlation
using distances, e.g., by treating variables as objects and applying multidimensional scaling as in
Hills (1969), are subject to misinterpretation. The multivariate technique of h-plots represents
correlation using angles, but is specific to Pearson product-moment correlation. We have proposed
a technique that can be used with other correlation coefficients.
The correlation diagrams proposed herein might be improved by various graphical enhancements. Each vector might subsequently be multiplied by the standard deviation of the variable to
which it corresponds, as in an h-plot. Alternatively, a referee suggested that one might use vector
length to impart information about goodness of fit. For example, let θ* denote a minimizer of (3), let E = (e_jk) = R − C(θ*) denote the matrix of residuals, and let

e_j = ( Σ_{k=1}^{p} e_jk² )^{1/2}.

Then vector j might subsequently be multiplied by (say) 1 − e_j, so that short vectors indicate
variables that compromise model (1). Rather than choose between correlation diagrams like Figure 1, in which variables are represented by radii, and modified correlation diagrams like Figure 2,
in which variables are represented by diameters, Cheryl Jenkins suggested the possibility of representing the original radii in one color and the opposing radii in another color, thereby allowing one
to discern features of both diagrams in a single figure.
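A minimal sketch of the referee's suggestion (ours, assuming NumPy and a fitted angle vector theta from the minimization of (3)):

```python
import numpy as np

def residual_lengths(R, theta):
    """Vector lengths 1 - e_j, where e_j is the root sum of squared residuals for variable j."""
    E = R - np.cos(theta[:, None] - theta[None, :])   # residual matrix R - C(theta)
    e = np.sqrt(np.sum(E ** 2, axis=1))               # e_j for each variable
    return 1.0 - e                                    # short vectors flag poorly fitted variables
```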
From a computational perspective, a major challenge is global optimization. The existence
of nonglobal minimizers necessitates multiple random starts when minimizing (3) and virtually
precludes minimizing (4). In future work, we hope to develop deterministic methods for generating
good starting values of θ, in the spirit of Malone, Tarazaga, and Trosset (2002).

Acknowledgments
I thank Vijay Dondeti, Cheryl Jenkins, three anonymous referees, and an anonymous associate
editor for suggestions that resulted in a better manuscript.

References
[1] Bittner, M., Meltzer, P., Chen, Y., Jiang, Y., Seftor, E., Hendrix, M., Radmacher, M., Simon, R., Yakhini, Z., Ben-Dor, A., Sampas, N., Dougherty, E., Wang, E., Marincola, F., Gooden, C., Lueders, J., Glatfelter, A., Pollock, P., Carpten, J., Gillanders, E., Leja, D., Dietrich, K., Beaudry, C., Berens, M., Alberts, D., Sondak, V., Hayward, N., and Trent, J. (2000). Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature, 406:536–540.

[2] Borg, I. and Groenen, P. (1997). Modern Multidimensional Scaling: Theory and Applications. Springer-Verlag, New York.

[3] Corsten, L. C. A. and Gabriel, K. R. (1976). Graphical exploration in comparing variance matrices. Biometrics, 32:851–863.

[4] Cox, T. F. and Cox, M. A. A. (1994). Multidimensional Scaling. Chapman & Hall, London.

[5] Desai, K. V., Xiao, N., Wang, W., Gangi, L., Greene, J., Powell, J. I., Dickson, R., Furth, P., Hunter, K., Kucherlapati, R., Simon, R., Liu, E. T., and Green, J. E. (2002). Initiating oncogenic event determines gene-expression patterns of human breast cancer models. Proceedings of the National Academy of Sciences, 99(10):6967–6972. A subsequent correction appeared on July 23, 2002, page 10227.

[6] Fagot, R. F. and Mazo, R. M. (1989). Association coefficients of identity and proportionality for metric scales. Psychometrika, 54:93–104.

[7] Gabriel, K. R. (1971). The biplot graphic display of matrices with application to principal component analysis. Biometrika, 58:453–467.

[8] Gay, D. M. (1983). Algorithm 611. Subroutines for unconstrained minimization using a model/trust-region approach. ACM Transactions on Mathematical Software, 9:503–524.

[9] Gay, D. M. (1984). A trust region approach to linearly constrained optimization. In Lootsma, F. A., editor, Numerical Analysis. Proceedings, Dundee 1983, pages 171–189, Berlin. Springer.

[10] Gower, J. C. (1966). Some distance properties of latent root and vector methods in multivariate analysis. Biometrika, 53:325–338.

[11] Guttman, L. (1965). A faceted definition of intelligence. Scripta Hierosolymitana, 14:166–181.

[12] Hamadeh, H. and Afshari, C. A. (2000). Gene chips and functional genomics. American Scientist, 88:508–515.

[13] Hand, D. J., Daly, F., Lunn, A. D., McConway, K. J., and Ostrowski, E. (1994). A Handbook of Small Data Sets. Chapman & Hall, New York.

[14] Hills, M. (1969). On looking at large correlation matrices. Biometrika, 56:249–253.

[15] Jazaeri, A. A., Yee, C. J., Sotiriou, C., Brantley, K. R., Boyd, J., and Liu, E. T. (2002). Gene expression profiles of BRCA1-linked, BRCA2-linked, and sporadic ovarian cancers. Journal of the National Cancer Institute, 94(13):990–1000. The gene expression profiles appear in a supplemental table at http://jncicancerspectrum.oupjournals.org/jnci/content/vol94/issue13/.

[16] Jeffers, J. N. R. (1967). Two case studies in the application of principal components analysis. Applied Statistics, 16:225–236.

[17] Khan, J., Simon, R., Bittner, M., Chen, Y., Leighton, S. B., Pohida, T., Smith, P. D., Jiang, Y., Gooden, G. C., Trent, J. M., and Meltzer, P. S. (1998). Gene expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays. Cancer Research, 58(22):5009–5013.

[18] Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29:1–27.

[19] Luo, J., Duggan, D. J., Chen, Y., Sauvageot, J., Ewing, C. M., Bittner, M. L., Trent, J. M., and Isaacs, W. B. (2001). Human prostate cancer and benign prostatic hyperplasia: Molecular dissection by gene expression profiling. Cancer Research, 61:4683–4688.

[20] Malone, S. W., Tarazaga, P., and Trosset, M. W. (2002). Better initial configurations for metric multidimensional scaling. Computational Statistics and Data Analysis, 41:143–156.

[21] Mardia, K. V., Kent, J. T., and Bibby, J. M. (1979). Multivariate Analysis. Academic Press, Orlando.

[22] Seber, G. A. F. (1984). Multivariate Observations. John Wiley & Sons, New York.

[23] Zegers, F. E. (1986). A family of chance-corrected association coefficients for metric scales. Psychometrika, 51:559–562.

[24] Zegers, F. E. and ten Berge, J. M. F. (1985). A family of association coefficients for metric scales. Psychometrika, 50:17–24.
