
Data Analysis

2018/2019
Frederico Cruz Jesus
fjesus@novaims.unl.pt
Chapter 2 – Principal Components Analysis

1. Introduction
2. Geometry of PCA
3. Analytical Approach
4. PCA application
5. Issues relating to the use of PCA
Introduction

Objectives of PCA
Imagine the following hypothetical situation:
• A financial analyst is interested in determining the financial health of firms in a
specific market sector. To do so, he/she has a data set consisting of 1,000 firms,
with information on 50 ratios for each. The analyst would be dealing with 50,000
pieces of information. However, the task would be made simpler if he/she could
reduce the number of ratios from 50 to, say, three.

Principal components analysis is the appropriate technique for achieving this kind of
objective. PCA is a technique for forming new variables which are linear composites of
the original variables. The maximum number of new variables that can be formed is
equal to the number of original variables, and the new variables are uncorrelated
among themselves.
Introduction

Objectives of PCA
Principal Components Analysis can be considered as:

• A technique used to generate uncorrelated variables that are linear combinations of
the original variables;

• A technique which allows us to generate k new variables, where k is less than or equal
to the number of original variables (p);

• A technique often confused with Factor Analysis (Chapter 3) but conceptually
different from it. This confusion probably stems from the fact that some statistical
software packages offer PCA as an option within FA.
Geometry of PCA

Geometric view of PCA


In order to introduce PCA, let's consider that we are interested in analyzing 12 customers
with respect to two variables (say, income - X1 - and generated profit - X2).

The variances of the two variables are, respectively, 23.091 and 21.091, so the total
variance is 44.182. Note that the variance of X1 represents 52.26% of the total variance,
whereas the variance of X2 represents 47.74%.

The correlation coefficient between the two variables is 0.746.

The original and mean-corrected data, as well as their projection in the variable space,
can be seen in the next slides.
Geometry of PCA

Identification of alternative axes and forming new variables

[Figure: plot of the original and mean-corrected data]

Identification of alternative axes and forming new variables


Although we are in the presence of a relatively small data set, which can easily be
analyzed using uni- and bivariate statistics as well as graphical representations of the
data, let's assume we are interested in increasing its interpretability – in this case by
reducing the number of dimensions, i.e., variables.

One could create a new dimension, or axis - X*1 - making an angle of θ degrees with X1
and, naturally, 90-θ degrees with X2. Observations could then also be projected with
respect to X*1, which can be considered a new variable.
Geometry of PCA

Identification of alternative axes and forming new variables


As discussed in Chapter 1, the coordinates of the points with respect to X*1 are given by a
linear combination of the coordinates in the original axes X1 and X2:

x1* = cos(θ) * x1 + sin(θ) * x2

In which x1* is the coordinate of the observation on X*1, and x1 and x2 are the observation's
coordinates on the axes X1 and X2
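
As an illustration (an addition, not part of the original slides), the following Python/NumPy
sketch projects mean-corrected observations onto a rotated axis. The twelve data values are
those of the worked example in Sharma (1996), which this chapter follows; they reproduce every
summary statistic reported here (means 8 and 3, variances 23.091 and 21.091, correlation 0.746).

# Illustrative sketch: project mean-corrected points onto a rotated axis X*1
import numpy as np

x1 = np.array([16, 12, 13, 11, 10, 9, 8, 7, 5, 3, 2, 0], dtype=float)
x2 = np.array([8, 10, 6, 2, 8, -1, 4, 6, -3, -1, -3, 0], dtype=float)

# Mean-correct the data (the means are 8 and 3)
x1c, x2c = x1 - x1.mean(), x2 - x2.mean()

theta = np.deg2rad(10)                                   # angle between X*1 and X1
x1_star = np.cos(theta) * x1c + np.sin(theta) * x2c      # new variable

print(round(np.cos(theta), 3), round(np.sin(theta), 3))  # 0.985 0.174, as on the next slide
print(x1_star.var(ddof=1))                               # variance of the new variable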
Geometry of PCA

Identification of alternative axes and forming new variables

For a θ of, say, 10 degrees, the equation is:

x1* = 0.985 * x1 + 0.174 * x2


Geometry of PCA

Identification of alternative axes and forming new variables

[Figure: data projected onto the new axis X*1 for θ = 10 degrees]
Geometry of PCA

Identification of alternative axes and forming new variables

For a θ of, say, 35 degrees, the equation is:

x1* = 0.819 * x1 + 0.574 * x2


Geometry of PCA

Identification of alternative axes and forming new variables

[Figure: data projected onto the new axis X*1 for θ = 35 degrees]
Geometry of PCA

Identification of alternative axes and forming new variables

It becomes clear that the percentage of variance accounted for by X*1 increases as the angle
θ between X*1 and X1 increases and then, after a certain maximum value, the variance
accounted for by X*1 starts to decrease. Hence, there is one and only one new axis that
results in a new variable accounting for the maximum variance in the data.
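
A short sketch (an addition, using the same example data as before) makes this behaviour
concrete: sweeping θ and computing the variance of the projected variable shows a single
maximum, at roughly 43.26 degrees.

# Illustrative sketch: variance of the projection as a function of the angle
import numpy as np

x1c = np.array([16, 12, 13, 11, 10, 9, 8, 7, 5, 3, 2, 0], float) - 8.0
x2c = np.array([8, 10, 6, 2, 8, -1, 4, 6, -3, -1, -3, 0], float) - 3.0

angles = np.arange(0.0, 180.0, 0.001)                    # degrees
variances = np.array([
    (np.cos(t) * x1c + np.sin(t) * x2c).var(ddof=1)
    for t in np.deg2rad(angles)
])

print(angles[variances.argmax()])                        # ~43.261 degrees
print(variances.max())                                   # ~38.576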
Geometry of PCA

Identification of alternative axes and forming new variables

[Figure: variance accounted for by X*1 as a function of the angle θ]
Geometry of PCA

Identification of alternative axes and forming new variables


Note that X*1 does not account for all of the variance in the data. Therefore, it is
possible to identify a second axis such that the corresponding new variable accounts for
the maximum of the variance that is not accounted for by X*1. Let X*2 be the second new
axis, orthogonal to X*1. If the angle between X1 and X*1 is θ, then the angle
between X2 and X*2 will also be θ.

For θ = 43.261 degrees, the two new variables are defined by the following equations:

x1* = cos(43.261°) * x1 + sin(43.261°) * x2 = 0.728 * x1 + 0.685 * x2

x2* = −sin(43.261°) * x1 + cos(43.261°) * x2 = −0.685 * x1 + 0.728 * x2


Geometry of PCA

Identification of alternative axes and forming new variables


Below we have the mean, SS, and variance for the two new variables, as well as the SSCP,
S, and R matrices. [Table not extracted; the sketch below reproduces the key values.]
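
Since the table itself did not survive extraction, here is an added sketch of how the new
variables' variances can be reproduced: rotating the covariance matrix S by θ = 43.261°
diagonalizes it. The covariance 16.455 is reconstructed from the example data.

# Illustrative sketch: rotate S to obtain the variances of the new variables
import numpy as np

S = np.array([[23.091, 16.455],
              [16.455, 21.091]])          # covariance matrix of X1 and X2

t = np.deg2rad(43.261)
R = np.array([[ np.cos(t), np.sin(t)],
              [-np.sin(t), np.cos(t)]])   # rows = the two new axes

print(R @ S @ R.T)                        # ~diag(38.576, 5.606); off-diagonals ~0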
Geometry of PCA

[Figure: data plotted with respect to the new axes X*1 and X*2]
Geometry of PCA

Identification of alternative axes and forming new variables


The previous geometrical illustration of PCA can be easily extended to more than two
variables. A data set consisting of p variables can be represented graphically in a p-
dimensional space with respect to the original p axes or p new axes.

The first new axis X*1 results in a new variable x*1, such that this new variable accounts
for the maximum of the total variance. After this, a second axis, orthogonal to the first
axis, is identified such that the corresponding new variable, x*2, accounts for the
maximum of the variance that has not been accounted for by the first new variable x*1
and, x*1 and x*2 are uncorrelated. This procedure is carried out until all p new axes have
been identified, such that the new variables x*1, x*2, …, x*p account for successive
maximum variances and are uncorrelated.

Note that once p-1 axes have been identified, the identification of the pth axis is fixed due to
the condition that all the axes must be orthogonal. Note also that the maximum number of new
variables, i.e., principal components, is equal to the number of original variables.
Geometry of PCA

PCA as a dimensional reduction technique


In the previous sub-section it was demonstrated that PCA essentially reduces to identifying
a new set of orthogonal axes. The principal components scores, or the new variables, are
the projections of the points onto these axes.

Let's consider the case where, instead of using both original variables, we use only
X*1 to represent most of the information in the data. Geometrically, we would be
representing the data in a one-dimensional space. In the case of p variables one may
want to represent the data in a lower k-dimensional space, where k << p.

Representing data in a lower-dimensional space is referred to as dimensional reduction.

Hence, PCA can be considered a dimensional reduction technique.
Geometry of PCA

PCA as a dimensional reduction technique


One of the first questions we should ask ourselves is: “How well can the k new
variables represent the information in the data?”, or, geometrically, how well can we
capture the configuration of the data in the reduced-dimensional space? Consider the
plot below. [Figure not extracted]
Geometry of PCA

PCA as a dimensional reduction technique


From another perspective: how well does the new variable, by itself, represent the
information in the data?
Geometry of PCA

PCA as a dimensional reduction technique


There is no direct and universally correct answer to the previous question. However, in
most cases, the sum of the variances of the new variables not used to represent the data
is used as the measure of the loss of information resulting from representing the data in
a lower-dimensional space.

For each situation where PCA is used, whether the loss of information is substantial or
not depends on the purpose or objective of the study.
Geometry of PCA

Objectives of PCA
From a geometric point of view, PCA aims to identify a new set of orthogonal axes such
that:
• The coordinates of the observations with respect to each of the axes give the values of
the new variables. As mentioned previously, the new axes, or variables, are called
principal components and the values of the new variables are called principal
components scores;
• Each new variable is a linear combination of the original variables;
• The first new variable accounts for the maximum variance in the data;
• The second new variable accounts for the maximum variance that has not been accounted
for by the first variable;
• The pth new variable accounts for the variance that has not been accounted for by the
first p-1 variables;
• The p new variables are uncorrelated.
Analytical Approach

Algebraic approach to PCA

        | x11  x12  ...  x1k |                 | ξ11  ξ12  ...  ξ1p |
        | x21  x22  ...  x2k |      PCA        | ξ21  ξ22  ...  ξ2p |
  X  =  |  .    .         .  |   ─────────►    |  .    .         .  |
        |  .    .         .  |    (p << k)     |  .    .         .  |
        | xn1  xn2  ...  xnk |                 | ξn1  ξn2  ...  ξnp |

The n × k data matrix X is transformed into an n × p matrix of principal components
scores, with p << k.
Analytical Approach

Algebraic approach to PCA


Assuming that there are p variables and we are interested in forming p new indices that
are linear combinations of them, from an analytic point of view PCA may be expressed as:

ξ1 = w11·x1 + w12·x2 + ... + w1p·xp

ξ2 = w21·x1 + w22·x2 + ... + w2p·xp

ξp = wp1·x1 + wp2·x2 + ... + wpp·xp

Where ξ1, ξ2, …, ξp are the p principal components; wij is the weight of the jth variable in the
ith principal component; and x1, …, xp are the mean-corrected original variables.
Analytical Approach

Algebraic approach to PCA


The weights, wij, are estimated such that:

• The first principal component, ξ1, accounts for the maximum variance in the data; the
second principal component, ξ2, accounts for the maximum variance that has not been
accounted for by the first principal component; and so on;

• w²i1 + w²i2 + ... + w²ip = 1, for i = 1, 2, ..., p

• wi1·wj1 + wi2·wj2 + ... + wip·wjp = 0, for all i ≠ j (both conditions are checked in the
sketch below)
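
A quick check of the last two conditions for the example's 2×2 weight matrix (an added
sketch; the weights are those reported later in this chapter):

# Illustrative sketch: verify the unit-norm and orthogonality conditions
import numpy as np

W = np.array([[ 0.728, 0.685],    # weights of the first principal component
              [-0.685, 0.728]])   # weights of the second principal component

print((W ** 2).sum(axis=1))       # ~[1, 1]: squared weights of each PC sum to one
print(W[0] @ W[1])                # 0: weight vectors of different PCs are orthogonal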
Analytical Approach

Algebraic approach to PCA


The second condition means that the squares of the weights sum to one and is, to some
extent, arbitrary. It fixes the scale of the new variable and is necessary because the
variance of a linear combination can be increased arbitrarily by rescaling the weights.
The third condition ensures that the new axes are orthogonal to each other.

The mathematical problem is then: how do we obtain the weights such that these
conditions are satisfied? It can be shown that PCA reduces to finding the eigenstructure
of the S matrix of the original data.

Alternatively, PCA can also be done by finding the singular value decomposition (SVD) of
the data matrix or a spectral decomposition of the S matrix.
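
As an added sketch of both routes (using the reconstructed example data from above), the
eigendecomposition of S and the SVD of the mean-corrected data matrix give the same axes:

# Illustrative sketch: PCA via the eigenstructure of S and via the SVD
import numpy as np

X = np.array([[16, 8], [12, 10], [13, 6], [11, 2], [10, 8], [9, -1],
              [8, 4], [7, 6], [5, -3], [3, -1], [2, -3], [0, 0]], float)
Xc = X - X.mean(axis=0)                 # mean-corrected data
S = np.cov(Xc, rowvar=False)            # covariance matrix

eigval, eigvec = np.linalg.eigh(S)      # ascending order, so reverse it
print(eigval[::-1])                     # ~[38.576, 5.606]
print(eigvec[:, ::-1])                  # columns ~ (0.728, 0.685), (-0.685, 0.728), up to sign

# SVD of the mean-corrected data: squared singular values / (n - 1) = eigenvalues
U, sv, Vt = np.linalg.svd(Xc, full_matrices=False)
print(sv ** 2 / (len(X) - 1))           # same eigenvalues
print(Vt)                               # rows: same axes (up to sign)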
Interpreting PCA

How to perform PCA


A number of computer programs are available for performing PCA. Among the most
popular ones are SAS®, IBM SPSS®, and R Project for Statistical Computing (R).

In the classes we will use SAS, although other packages may be used for the Course
Project. The next slides are dedicated to the most common outputs given by most of the
programs, and will be used as the basis for interpreting PCA.

The data used for generating the outputs is the one provided in the beginning of this
chapter.
Interpreting PCA

Interpreting PCA output


We can see that the total variance is 44.182, of which X1 accounts for 52.26% (23.091 /
44.182).
The covariance between the two variables can be converted into the correlation
coefficient by dividing it by the product of their standard deviations. As the
standard deviations of X1 and X2 are, respectively, 4.805 and 4.592, the
coefficient of correlation between X1 and X2 is 0.746.
The eigendecomposition of the covariance matrix gives us a vector of eigenvalues,
each associated with an eigenvector.
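
As a one-line check (an added sketch; 16.455 is the covariance reconstructed from the
example data):

# Illustrative sketch: covariance to correlation
s12 = 16.455                 # covariance between X1 and X2
s1, s2 = 4.805, 4.592        # standard deviations of X1 and X2
print(s12 / (s1 * s2))       # ~0.746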
Interpreting PCA

Descriptive Statistics
This part of the output gives basic descriptive statistics such as the mean and the
standard deviation. As can be seen, the means of the variables are 8 and 3, and the
standard deviations are 4.805 and 4.592.
Interpreting PCA

Eigenvalues
As we have two variables, the Var-Cov matrix is a 2×2 matrix. Hence, it will be possible
to form two principal components, which is the same as saying that two eigenvectors,
each associated with one eigenvalue, will be formed.

The following part of the SAS output contains the eigenvalues of the Var-Cov matrix.

Note that the sum of the eigenvalues equals the total variance in the data. This means
that the first principal component represents 87.3% of the total variance of the data,
whereas the second principal component “only” represents 12.7%.
Interpreting PCA

Eigenvectors
Each eigenvalue represents the variance of the corresponding principal component.

The following part of the SAS output shows the two eigenvectors extracted:

Accordingly, the two principal components may be written as:

ξ1 = 0.728·x1 + 0.685·x2
ξ2 = −0.685·x1 + 0.728·x2

Where x1 and x2 are the mean-corrected variables
Interpreting PCA

Principal components scores


The principal components scores are the values of the principal components for each
observation. For example, the scores of the first observation (mean-corrected values 8
and 5) can be computed using the previous equations:

ξ1 = 0.728 · 8 + 0.685 · 5 = 9.25

ξ2 = −0.685 · 8 + 0.728 · 5 = −1.84
Very commonly, the principal components scores are standardized to have a standard
deviation of one (note that their average is zero, as the data is mean-corrected). This can
be accomplished by dividing each principal component score by the standard deviation of
the corresponding principal component, i.e., the square root of the eigenvalue.
Additionally, note that the sum of the squared weights of each principal component is
one and the sum of the cross products of the weights is zero.
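
An added sketch of both computations, using the weights and eigenvalues reported in
these slides:

# Illustrative sketch: raw and standardized scores for the first observation
import numpy as np

w1 = np.array([0.728, 0.685])        # weights of the first component
w2 = np.array([-0.685, 0.728])       # weights of the second component
obs = np.array([8.0, 5.0])           # first observation, mean-corrected

xi1, xi2 = w1 @ obs, w2 @ obs
print(xi1, xi2)                      # ~9.25 and ~-1.84

lam = np.array([38.576, 5.606])      # eigenvalues = component variances
print(xi1 / np.sqrt(lam[0]),         # standardized scores: divide by the
      xi2 / np.sqrt(lam[1]))         # component's standard deviation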
Interpreting PCA

Loadings
As already mentioned, the linear correlation between any pair of principal
components is zero: the principal components are uncorrelated, i.e., they are
orthogonal.

The correlations between each variable and the extracted principal components are called
loadings. The loadings show the extent to which an original variable contributes to the
formation of a principal component: higher absolute values of a loading mean that the
variable is more important in the principal component's formation. Loadings are one of
the most important tools for understanding the principal components' formation.
Interpreting PCA

Loadings
Loadings can be obtained from the following equation:

lij = wij · √λi / ŝj

Where lij is the loading of the jth variable on the ith principal component; wij is the weight of the jth
variable in the ith principal component; λi is the eigenvalue (i.e., the variance) of the ith principal
component; and ŝj is the standard deviation of the jth variable

In the specific case of our example data, the loading between X1 and the first principal
component is given by:

l11 = 0.728 · √38.576 / 4.805 = 0.941
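
The full 2×2 loading matrix can be computed in one vectorized step (an added sketch
with the values reported in these slides):

# Illustrative sketch: loading matrix l_ij = w_ij * sqrt(lambda_i) / s_j
import numpy as np

W = np.array([[ 0.728, 0.685],
              [-0.685, 0.728]])        # row i = weights of the ith component
lam = np.array([38.576, 5.606])        # eigenvalues (component variances)
s = np.array([4.805, 4.592])           # standard deviations of X1 and X2

L = W * np.sqrt(lam)[:, None] / s[None, :]
print(L)                               # L[0, 0] ~0.941, as above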
Interpreting PCA

Interpreting PCA output


As we can see, the results given by SAS are similar to those obtained by the geometric
approach. The first principal component represents more than 87% of the total variance in
the data, making it the most relevant*.


* - This statement should be read loosely, as there are many cases where the first principal
component (the one with the highest explained variance) may not be the most
important or interesting one.
Issues relating PCA

Issues relating to principal components analysis


Until this point it was demonstrated that PCA is the formation of new variables that are
linear combinations of the original variables. However, as a data analytic technique, the
use of PCA raises a number of issues that need to be addressed:
1. What effect does the type of data (i.e., mean-corrected vs. standardized) have on
principal components analysis?
2. Is principal components analysis the appropriate technique for forming the new
variables? That is, what additional insight or parsimony is achieved by subjecting
the data to a PCA?
3. How many principal components should be retained, i.e., how many new variables
should be used for further analysis or interpretation?
4. How do we interpret the generated principal components?
5. How can principal components scores be used in further analysis?
Issues relating PCA

Effect of type of data on PCA


Principal components analysis can be done on either mean-corrected or standardized
data. Each could give a different solution, depending on the extent to which the
variances of the variables differ. In other words, the variances of the variables can have
an effect on PCA.

In general, the weight assigned to a variable is affected by its relative variance. If we do
not want the relative variances to affect the weights, then the data should be
standardized so that the variance of each variable is the same (i.e., one).

The choice between the analyses obtained from mean-corrected and standardized data
also depends on other factors. When there is reason to believe that the variances of the
variables do indicate the importance of each variable, mean-corrected data should be used.
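
A small simulated illustration (hypothetical data, not from the slides) of how the two
choices can diverge when one variable has a much larger variance:

# Illustrative sketch: covariance-based vs. correlation-based PCA
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(0, 1, 200)                    # small-variance variable
b = 0.5 * a + rng.normal(0, 10, 200)         # large-variance variable
X = np.column_stack([a, b])

for M in (np.cov(X, rowvar=False), np.corrcoef(X, rowvar=False)):
    val, vec = np.linalg.eigh(M)
    print(vec[:, -1])   # first PC: dominated by b under cov, balanced under corr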
Issues relating PCA

Is PCA the appropriate technique?


Whether the data should or should not be used for principal components analysis
primarily depends on the objective of the study. If the objective is to form uncorrelated
linear combinations then the decision will depend on the interpretability of the resulting
principal components. If the principal components cannot be interpreted then their
subsequent use in other statistical techniques may not be very meaningful. In such a case
one should avoid PCA for forming uncorrelated variables.

On the other hand, if the objective is to reduce the number of variables in the data set to a
few variables (principal components) that are linear combinations of the original
variables, then it is imperative that the number of principal components be less than the
number of original variables. In such a case, PCA should only be performed if the data can
be represented by a smaller number of principal components without a substantial loss of
information, a notion that depends on the context of the study.
Issues relating PCA

Is PCA the appropriate technique?


Consider the case where scientists have 100 variables, or pieces of information, available
for making a launch decision for the space shuttle. Suppose five principal components
account for 99% of all the variation in the 100 variables. Even so, the scientists may
consider the 1% of unaccounted variation (i.e., loss of information) substantial, leading
them to use all 100 original variables.

On the other hand, if the variables are answers to a customer survey questionnaire, five
principal components explaining 99% of the variation in the 100 questions would probably
be considered excellent. Hence, whether PCA is the appropriate technique or not depends
to a great extent on the context of the problem.

In any case, we know that if the variables are perfectly correlated, one principal
component will be enough to explain all the variation in the data. Some statistical tests
may be used to assess the level of correlation in the data; however, these have some
significant drawbacks.
Issues relating PCA

Number of principal components to extract


Once it has been decided that performing PCA is appropriate, the next obvious issue is
determining the number of principal components that should be retained. Note that, as
discussed earlier, the decision depends on how much information (unaccounted variance)
we are willing to sacrifice, which is, obviously, a judgment call.

Some of the most popular decision rules are the following (a sketch applying them
follows the list):
1. Kaiser's criterion – retain the principal components with eigenvalues greater than
one;
2. Pearson's criterion – retain principal components until 80% of the variance is
explained;
3. Scree plot method – plot the percentage of variance accounted for by each principal
component and look for an elbow.
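
# Illustrative sketch: the three rules applied to hypothetical eigenvalues from a
# PCA on standardized data (p = 5, so the eigenvalues sum to 5)
import numpy as np

eigvals = np.array([2.6, 1.4, 0.5, 0.3, 0.2])

# 1. Kaiser's criterion: eigenvalues greater than one
print(int((eigvals > 1).sum()))                    # 2 components

# 2. Pearson's criterion: smallest k reaching 80% of cumulative variance
cum = np.cumsum(eigvals) / eigvals.sum()
print(int(np.argmax(cum >= 0.80)) + 1)             # 2 components

# 3. Scree plot: plot eigvals vs. component number and look for the elbow,
#    e.g. with matplotlib: plt.plot(range(1, 6), eigvals, "o-")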
Issues relating PCA

Interpreting the principal components


Since the principal components are linear combinations of the original variables, it is often
necessary to interpret, or provide a meaning to, the linear combination. As mentioned
earlier, one can use the loadings for interpreting the principal components.

The higher (in absolute terms) the loading of a variable, the more influence it has in the
formation of the principal component score and vice versa. Therefore, one can use the
loadings to determine which variables are influential in the formation of principal
components, and one can then assign a meaning or label to the principal component.
Issues relating PCA

Interpreting the principal components


The question is then: what is influential? How high should a loading be, i.e., what is the
threshold? Unfortunately, there is no “magic number” above which a loading can be
considered “influential”. Traditionally, the statistical literature has used a loading of 0.5
or above as the threshold. In many cases, axis rotations are used to improve
interpretation; these will be addressed in the next chapter, as the approach was
developed initially for factor analysis.
Issues relating PCA

Use of Principal Components Scores


Principal components scores can be plotted to further interpret the results. The scores
can also be used as input variables for further analysis of the data with other
multivariate techniques, such as cluster analysis and regression, among many others.

The advantage of using principal components scores is that the new variables are
uncorrelated, so the problem of multicollinearity is avoided. It should be noted,
nevertheless, that although multicollinearity is avoided, a new problem (the difficulty
of interpreting the principal components) arises from this approach.
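
A hypothetical sketch of this use of the scores (simulated data, not from the slides): two
nearly collinear predictors are replaced by their uncorrelated PC scores before a regression.

# Illustrative sketch: PC scores as uncorrelated regressors
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)       # nearly collinear with x1
y = 2 * x1 + rng.normal(size=300)

X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)
val, vec = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ vec[:, ::-1]                      # PC scores, first PC first

print(np.corrcoef(scores, rowvar=False)[0, 1])  # ~0: no multicollinearity
beta, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
print(beta)                                     # coefficients on the scores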
Issues relating PCA

Use of Principal Components Scores

[Figure: plot of the principal components scores]
How to perform PCA in SAS®

PCA in SAS Base


PROC PRINCOMP
    DATA=PCA.EX_26_DATA            /* input data set */
    OUT=PCA.PRINCOMPSCORES         /* output data set with the scores */
    OUTSTAT=PCA.PRINCOMPSTATISTICS /* output data set with the statistics */
    COV                            /* use the covariance matrix */
    STD                            /* standardize the scores to unit variance */
    N=2                            /* number of components to extract */
    PREFIX=PRIN                    /* name prefix for the components */
    SINGULAR=1E-08
    VARDEF=DF
    PLOTS(ONLY)=(SCREE MATRIX PATTERNPROFILE PATTERN(VECTOR));
    VAR DEN GDP MED HOS BIB;
RUN;

PROC CORR DATA=PCA.PRINCOMPSCORES;
    VAR DEN GDP MED HOS BIB PRIN1 PRIN2;
RUN;
How to perform PCA in SAS®

PCA in SAS Enterprise Guide


In SAS Enterprise Guide® one can use a “wizard” approach to conduct PCA.
[Screenshots of the successive Enterprise Guide wizard steps are omitted here.]
Summary

Summary
Students should read Sharma, S. (1996), Applied Multivariate Techniques, Wiley, pp. 58-89,
if they want to extend their knowledge of these subjects.
Thank you!

Address: Campus de Campolide, 1070-312 Lisboa, Portugal


Phone: +351 213 828 610 Fax: +351 213 828 611
