
Advanced Research Methods

STAT-6: Factor Analysis


(Course 2013)
Peter de Waal
© Peter de Waal STAT-6: Factor Analysis : 1 / 36
Exploring measurements
In empirical studies, a multitude of variables is often measured:
are these variables really measuring different dimensions?
can related variables be grouped in terms of common themes, or factors?
which variables are best used for constructing a predictive model?
To answer these questions, statistical exploration of the measurements
is required.
Exploring measurements, continued
Some multivariate statistical techniques aim at prediction:
correlation analysis;
linear regression;
logistic regression.
Other techniques aim at exploration:
factor analysis;
cluster analysis;
multi-dimensional scaling.
The difference lies in the presence or absence of a dependent (outcome) variable that the analysis tries to predict.
When to use
Factor analysis and Principal Component Analysis (PCA) are used to:
1. Understand the structure of a set of variables.
2. Construct a questionnaire to measure an underlying variable.
3. Reduce the data set to a manageable size while retaining as much of the original information as possible.
Factor analysis:
Achieves reduction by explaining the maximum amount of common variance in a correlation matrix.
The explanatory constructs are known as factors (latent variables).
PCA:
Achieves reduction by explaining the maximum amount of total variance in the correlation matrix.
Transforms the original variables into linear components.
The example study:
Success factors for on-line community platforms
Context: Some literature is available on guidelines and design principles for constructing on-line community platforms.
Problem: These guidelines and principles often do not translate directly into website features.
Question: Can groups of website features be discerned which constitute success factors for an on-line community platform?
The dance community: an example platform
The set-up of the example study
The community-platform study was set up as an on-line survey in
which members of a sample of on-line dance communities were asked
to rate the importance of each of 39 website features on a 7-point Likert scale;
to express their appreciation, on a 10-point scale, of the dance community sites from the sample that they were familiar with.
Results were obtained from a total of 284 respondents.
Some results from the study
Feature                       Mean    SD
Forum                         6.30  1.14
Private messaging             5.76  1.40
Clear netiquette              5.75  1.54
Moderation                    5.74  1.47
Quote function                5.53  1.60
Member search                 5.37  1.52
User guidance / FAQ           5.35  1.40
Add parties                   5.28  1.48
Reaction to articles          5.25  1.46
Reaction to news items        5.22  1.58
Reaction to parties           5.18  1.68
Suggestion box                5.17  1.44
Prize contest                 5.15  1.64
Exclusive articles            5.15  1.48
Exclusive interviews          5.14  1.51
Exclusive news items          5.14  1.47
Personal avatar               5.08  1.66
Exclusive reviews             5.04  1.50
Message board                 4.98  1.64
Poll                          4.88  1.60
Report-to-moderator           4.88  1.70
Reaction to photos            4.81  1.77
Profile, personal photo       4.76  1.66
Profile, personal agenda      4.75  1.75
Emoticons in posts            4.70  1.86
Add news items                4.66  1.64
Indication who on-line        4.62  1.59
Ubb-code in posts             4.56  1.79
Personal signature            4.43  1.68
Profile, favourite artists    4.39  1.69
Personalised                  4.38  1.59
. . .
Reduction of dimensions
The most common use of factor analysis is to reduce the dimensionality of the problem:
the number of variables is reduced, making subsequent analysis easier;
redundancy in the available data is reduced.
Typically, only a few factors suffice to describe the bulk of the data.
The basic idea of factor analysis
Consider ten random variables and their interrelationships:
The variables are clustered in groups, called factors, such that
there are strong correlations between the variables within the same factor;
there are weak or zero correlations between variables belonging to different factors.
The factors are then relatively independent of one another.
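A small simulation can make this picture concrete. The sketch below is hypothetical: the sample size, the loading of 0.8 and the split into two groups of five variables are arbitrary choices, not taken from the study. Two independent latent factors each drive five of the ten variables, and the resulting correlation matrix shows high correlations within each group and near-zero correlations across the groups.

```python
import numpy as np

# Hypothetical simulation: two independent latent factors, each driving five
# of the ten observed variables, plus independent noise.
rng = np.random.default_rng(0)
n_obs = 2000
factors = rng.standard_normal((n_obs, 2))    # columns: F1, F2
noise = rng.standard_normal((n_obs, 10))

# x1..x5 depend on F1, x6..x10 on F2 (weight 0.8, noise weight 0.6, so each
# variable has variance 0.64 + 0.36 = 1).
x = np.empty((n_obs, 10))
x[:, :5] = 0.8 * factors[:, [0]] + 0.6 * noise[:, :5]
x[:, 5:] = 0.8 * factors[:, [1]] + 0.6 * noise[:, 5:]

r = np.corrcoef(x, rowvar=False)   # 10 x 10 correlation matrix

# Average correlation within a group versus across the two groups.
within = np.mean([r[i, j] for g in (range(5), range(5, 10))
                  for i in g for j in g if i < j])
across = np.mean(r[:5, 5:])
print(f"within: {within:.2f}, across: {across:.2f}")
```

In theory the within-group correlation here is 0.8 × 0.8 = 0.64 and the across-group correlation is 0; the simulated values scatter around these.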
Factors
A factor/component is a derived, latent variable that can be defined in terms of the variables of the study, for example
$$ F_1 = x_1 + 3\,x_2 + 0.25\,x_3 - 0.37\,x_4 $$
and that serves to provide deeper insight into the concepts underlying the measurements.
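Evaluating such a factor for individual respondents is simply a weighted sum of their scores. A minimal sketch, assuming the illustrative weights of $F_1$ above and two made-up respondents:

```python
import numpy as np

# Weights of the illustrative factor F1 = x1 + 3 x2 + 0.25 x3 - 0.37 x4.
weights = np.array([1.0, 3.0, 0.25, -0.37])

# Hypothetical measurements: one row per respondent, columns x1..x4.
x = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [2.0, 1.0, 4.0, 1.0],
])

f1 = x @ weights   # value of F1 for each respondent
print(f1)
```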
The example revisited
An analysis of the variables in the community-platform study revealed
that the variables
Clear netiquette
Moderation
User guidance / FAQ
Report-to-moderator
constitute a factor: they show strong mutual correlations, but only small correlations with the other variables.
This factor captures features that allow for regulating, monitoring and
sanctioning member behaviour.
Factors and variables
Some remarks:
factors are not measured directly, but are hidden in the available
data;
factors are extracted or inferred from the data by analysing the
correlations between the variables which were directly measured;
factors need to be interpreted and assigned a meaning.
Outline of factor analysis
A factor analysis starts with the measurements on the variables of the
study:
in the first step, a correlation matrix is computed for the variables;
from the correlation matrix, a factor matrix is established;
from the factor matrix, the number of factors that constitute the factor solution is determined;
the factors are defined and assigned a meaning; if necessary, the factor solution is rotated.
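The first steps of this outline can be sketched in a few lines of Python with NumPy. This is an illustration under assumptions, not the procedure used in the study: it establishes the factor matrix by principal-component extraction (eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues), which is only one of several extraction methods, it uses Kaiser's rule for the number of factors, and it runs on simulated data.

```python
import numpy as np

def principal_component_loadings(data):
    """Correlation matrix, eigenvalues (descending) and factor matrix.

    data: array of shape (n_observations, n_variables).
    Uses principal-component extraction, one of several possible methods.
    """
    r = np.corrcoef(data, rowvar=False)           # step 1: correlation matrix
    eigvals, eigvecs = np.linalg.eigh(r)          # eigendecomposition of symmetric r
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # step 2: factor matrix, column j = eigenvector j * sqrt(eigenvalue j)
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0, None))
    return r, eigvals, loadings

# Hypothetical data: four variables driven by one common latent variable.
rng = np.random.default_rng(1)
latent = rng.standard_normal((500, 1))
data = latent + 0.5 * rng.standard_normal((500, 4))
r, eigvals, loadings = principal_component_loadings(data)

n_factors = int(np.sum(eigvals > 1))   # step 3, here via Kaiser's rule
print(n_factors)
print(loadings[:, :n_factors].round(2))
```

Rotation and interpretation (the last step) are left out of this sketch.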
Correlation revisited
Correlation matrix
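A correlation matrix on the random variables $x_1, \ldots, x_n$, $n \geq 2$, is a symmetrical matrix of the following form:
$$
\begin{array}{c|cccc}
 & x_1 & x_2 & \cdots & x_n \\ \hline
x_1 & 1.00 & r_{12} & \cdots & r_{1n} \\
x_2 & r_{21} & 1.00 & \cdots & r_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_n & r_{n1} & r_{n2} & \cdots & 1.00
\end{array}
$$
where $r_{ij} = r_{ji}$ is the correlation coefficient of the variables $x_i$ and $x_j$.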
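In practice the correlation matrix is computed by software. With NumPy, for instance, `np.corrcoef` with `rowvar=False` treats each column as a variable. The small data set below is made up for illustration only.

```python
import numpy as np

# Hypothetical data: 6 observations (rows) of 3 variables x1, x2, x3 (columns).
data = np.array([
    [2.0, 1.0, 7.0],
    [4.0, 3.0, 5.0],
    [5.0, 4.0, 6.0],
    [6.0, 6.0, 2.0],
    [8.0, 7.0, 3.0],
    [9.0, 9.0, 1.0],
])

# rowvar=False: each column is a variable, so the result is 3 x 3.
r = np.corrcoef(data, rowvar=False)
print(r.round(2))

# The defining properties: unit diagonal and r_ij == r_ji.
assert np.allclose(np.diag(r), 1.0)
assert np.allclose(r, r.T)
```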
Correlation revisited
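Recall that for two random variables $x$ and $y$, observed as $n$ pairs $(x_i, y_i)$, the correlation coefficient $r$ of $x$ and $y$ equals
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x\, s_y} $$
where $\bar{x}$ and $\bar{y}$ are the sample means and $s_x$ and $s_y$ the sample standard deviations of $x$ and $y$. Furthermore,
larger values of $|r|$ denote a strong correlation between the variables $x$ and $y$;
smaller values of $|r|$ indicate a weak correlation between $x$ and $y$;
$r = 0$ indicates a lack of (linear) correlation between $x$ and $y$.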
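The formula can be transcribed directly. A minimal sketch, with made-up inputs, that computes $r$ from the sample means and sample standard deviations (with $n - 1$ in their denominators, matching the formula):

```python
import numpy as np

def pearson_r(x, y):
    """Correlation coefficient r computed directly from the formula above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    s_x = x.std(ddof=1)   # sample standard deviations (n - 1 in the denominator)
    s_y = y.std(ddof=1)
    return np.sum((x - x.mean()) * (y - y.mean())) / ((n - 1) * s_x * s_y)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # perfectly increasing relation
print(pearson_r(x, [5, 4, 3, 2, 1]))    # perfectly decreasing relation
```

The first call gives $r = 1$ and the second $r = -1$, up to floating-point rounding; for other data the result agrees with `np.corrcoef`.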
Studying the correlation matrix
Consider the following correlation matrix on the random variables $x_1, \ldots, x_5$:
         x1     x2     x3     x4     x5
x1     1.00   0.72   0.63   0.54   0.45
x2     0.72   1.00   0.56   0.48   0.40
x3     0.63   0.56   1.00   0.42   0.35
x4     0.54   0.48   0.42   1.00   0.30
x5     0.45   0.40   0.35   0.30   1.00
Is there an underlying factor $F$ that explains the correlations between the variables $x_1, \ldots, x_5$?
Partial correlations
Partial correlation coefficient
Consider the three random variables $x_i$, $x_j$, $x_k$. The partial correlation coefficient $r_{ij \cdot k}$ of $x_i$ and $x_j$ given $x_k$ is
$$ r_{ij \cdot k} = \frac{r_{ij} - r_{ik}\, r_{jk}}{\sqrt{(1 - r_{ik}^2)(1 - r_{jk}^2)}} $$
where $r_{ij}$ is the correlation coefficient of $x_i$ and $x_j$.
Partial correlations continued
The partial correlation coefficient
$$ r_{ij \cdot k} = \frac{r_{ij} - r_{ik}\, r_{jk}}{\sqrt{(1 - r_{ik}^2)(1 - r_{jk}^2)}} $$
equals zero if the correlation between the variables $x_i$ and $x_j$ is fully described by their separate correlations with the variable $x_k$, that is, if $r_{ij} = r_{ik}\, r_{jk}$.
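The partial correlation formula translates directly into code. The sketch below uses hypothetical correlation values: in the first call $r_{ij} = r_{ik}\, r_{jk}$ holds, so the partial correlation vanishes (up to floating-point rounding); in the second call it does not.

```python
import math

def partial_corr(r_ij, r_ik, r_jk):
    """Partial correlation r_ij.k of x_i and x_j given x_k."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))

# Hypothetical case in which x_k fully explains the x_i-x_j correlation
# (0.72 = 0.9 * 0.8), so the result is zero up to rounding.
print(partial_corr(0.72, 0.9, 0.8))

# Hypothetical case in which x_k does not explain it (0.50 != 0.3 * 0.4).
print(partial_corr(0.50, 0.3, 0.4))
```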
Studying correlation matrices, continued
Consider again the correlation matrix on $x_1, \ldots, x_5$ given above. There is a common factor that fully explains the correlations among $x_1, \ldots, x_5$ if a factor variable $F$ can be constructed such that
$$ r_{ij \cdot F} = 0 $$
for all pairs of variables $x_i$, $x_j$.
Studying correlation matrices, continued
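Consider again the correlation matrix on $x_1, \ldots, x_5$ given above, and let $F$ be a factor variable with
$$ r_{1F} = 0.9, \quad r_{2F} = 0.8, \quad r_{3F} = 0.7, \quad r_{4F} = 0.6, \quad r_{5F} = 0.5 $$
Then,
$$
\begin{aligned}
r_{12} &= r_{1F}\, r_{2F} = 0.9 \times 0.8 = 0.72 \\
r_{13} &= r_{1F}\, r_{3F} = 0.9 \times 0.7 = 0.63 \\
r_{14} &= r_{1F}\, r_{4F} = 0.9 \times 0.6 = 0.54 \\
&\;\;\vdots
\end{aligned}
$$
and likewise for all other pairs, so every partial correlation $r_{ij \cdot F}$ equals zero: $F$ fully explains the correlations among $x_1, \ldots, x_5$.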
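The check $r_{ij} = r_{iF}\, r_{jF}$ can be carried out for all ten pairs at once. In the NumPy sketch below, the outer product of the vector of correlations with $F$ gives every product $r_{iF}\, r_{jF}$; the partial correlations $r_{ij \cdot F}$ are then computed with the formula given earlier, restricted to the off-diagonal pairs.

```python
import numpy as np

# Correlation matrix of x1..x5 and the correlations r_iF of x1..x5 with F.
R = np.array([
    [1.00, 0.72, 0.63, 0.54, 0.45],
    [0.72, 1.00, 0.56, 0.48, 0.40],
    [0.63, 0.56, 1.00, 0.42, 0.35],
    [0.54, 0.48, 0.42, 1.00, 0.30],
    [0.45, 0.40, 0.35, 0.30, 1.00],
])
l = np.array([0.9, 0.8, 0.7, 0.6, 0.5])

implied = np.outer(l, l)              # r_iF * r_jF for every pair (i, j)
off_diag = ~np.eye(5, dtype=bool)     # mask selecting the pairs i != j
print(np.allclose(R[off_diag], implied[off_diag]))   # True: F reproduces every r_ij

# Partial correlations r_ij.F for all pairs, using the formula for r_ij.k.
partial = (R - implied) / np.sqrt(np.outer(1 - l**2, 1 - l**2))
print(np.abs(partial[off_diag]).max())   # zero up to floating-point rounding
```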
Studying correlation matrices, continued
Consider again the correlation matrix on $x_1, \ldots, x_5$ given above. After excluding the effect of the factor $F$, that is, after subtracting $r_{iF}\, r_{jF}$ from every entry $r_{ij}$, a residual matrix results:
         x1     x2     x3     x4     x5
x1     0.19      0      0      0      0
x2        0   0.36      0      0      0
x3        0      0   0.51      0      0
x4        0      0      0   0.64      0
x5        0      0      0      0   0.75
All off-diagonal residuals are zero; each diagonal entry equals $1 - r_{iF}^2$, the part of the variance of $x_i$ that $F$ does not explain.
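The residual matrix is the correlation matrix minus the part that $F$ accounts for. A minimal NumPy sketch of that subtraction, using the correlations and the values $r_{iF}$ of this example:

```python
import numpy as np

# Correlation matrix of x1..x5 and the correlations r_iF of x1..x5 with F.
R = np.array([
    [1.00, 0.72, 0.63, 0.54, 0.45],
    [0.72, 1.00, 0.56, 0.48, 0.40],
    [0.63, 0.56, 1.00, 0.42, 0.35],
    [0.54, 0.48, 0.42, 1.00, 0.30],
    [0.45, 0.40, 0.35, 0.30, 1.00],
])
l = np.array([0.9, 0.8, 0.7, 0.6, 0.5])

# Subtract r_iF * r_jF from every entry, including the diagonal.
residual = R - np.outer(l, l)

# Diagonal: 1 - r_iF**2, i.e. 0.19, 0.36, 0.51, 0.64, 0.75; off-diagonal: 0.
print(np.diag(residual).round(2))
```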
The factor matrix
Factor matrix
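A factor matrix on the random variables $x_1, \ldots, x_n$, $n \geq 2$, and the factors $F_1, \ldots, F_m$, $m \geq 1$, is a matrix of the following form:
$$
\begin{array}{c|cccc}
 & F_1 & F_2 & \cdots & F_m \\ \hline
x_1 & l_{11} & l_{12} & \cdots & l_{1m} \\
x_2 & l_{21} & l_{22} & \cdots & l_{2m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_n & l_{n1} & l_{n2} & \cdots & l_{nm}
\end{array}
$$
where $l_{ij}$ is the correlation coefficient between the variable $x_i$ and the factor $F_j$; $l_{ij}$ is called the loading of $x_i$ on $F_j$.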
Factor matrix: an example
         F1     F2     F3     F4     F5
x1     0.86  -0.29   0.15  -0.13  -0.37
x2     0.82  -0.38   0.14   0.25   0.32
x3     0.84  -0.19  -0.14  -0.48   0.06
x4     0.47   0.77   0.42  -0.08   0.03
x5     0.68   0.53   0.48   0.19   0.01
The example revisited
For the community-platform study, part of the computed factor matrix is:
                                 Factor 1:   Factor 3:
Feature                           Identity  Governance
Profile, personal agenda              0.76        0.17
Profile, favourite artists            0.73        0.06
Profile, favourite parties            0.81        0.09
Buddy system                          0.77        0.10
Profile, personal photo               0.73        0.04
. . .
Clear netiquette                      0.06        0.84
Moderation                            0.01        0.89
User guidance / FAQ                   0.13        0.72
Report-to-moderator                   0.16        0.77
. . .
Loadings
The loading l of a variable x on a factor F is interpreted as follows:
a large value of |l| indicates that the variable x contributes to the
meaning of the factor F;
a small or zero value of |l| indicates that x does not contribute
much to the meaning of F, but rather contributes to that of another
factor.
A loading ranges between -1.00 and +1.00.
Communalities
Communality, uniqueness
Consider the random variables $x_1, \ldots, x_n$, $n \geq 2$, and the factors $F_1, \ldots, F_m$, $m \geq 1$. Let $l_{ij}$ be the loading of the variable $x_i$ on the factor $F_j$. The communality $c_i$ of $x_i$ equals
$$ c_i = \sum_{j=1}^{m} l_{ij}^2 $$
The uniqueness $u_i$ of $x_i$ equals
$$ u_i = 1 - c_i $$
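As a sketch, communalities and uniquenesses are row sums of squared loadings and their complements. Below, this is applied to the five-variable example factor matrix shown earlier, once with all five factors retained (each communality is then close to 1, up to rounding of the printed loadings) and once with only $F_1$ and $F_2$ retained; the choice of two factors is purely for illustration.

```python
import numpy as np

def communalities(loadings):
    """c_i = sum over the retained factors j of l_ij**2 (row sums of squares)."""
    return np.sum(np.asarray(loadings) ** 2, axis=1)

# Five-variable example factor matrix (rows x1..x5, columns F1..F5).
L = np.array([
    [0.86, -0.29,  0.15, -0.13, -0.37],
    [0.82, -0.38,  0.14,  0.25,  0.32],
    [0.84, -0.19, -0.14, -0.48,  0.06],
    [0.47,  0.77,  0.42, -0.08,  0.03],
    [0.68,  0.53,  0.48,  0.19,  0.01],
])

c_all = communalities(L)          # all five factors retained
c_two = communalities(L[:, :2])   # only F1 and F2 retained
u_two = 1 - c_two                 # uniqueness u_i = 1 - c_i
print(c_all.round(2))
print(c_two.round(2), u_two.round(2))
```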
The example revisited
Consider part of the factor matrix of the community-platform study and the communalities of the original variables:
                                 Factor 1:   Factor 3:
Feature                           Identity  Governance  Communality
Clear netiquette                      0.06        0.84         0.71
Moderation                            0.01        0.89         0.79
User guidance / FAQ                   0.13        0.72         0.54
Report-to-moderator                   0.16        0.77         0.62
The communality $c_M$ of the Moderation variable equals
$$ c_M = l_{MF_1}^2 + l_{MF_3}^2 = 0.01^2 + 0.89^2 \approx 0.79 $$
which expresses that 79% of the variance of the variable Moderation is explained by the two factors $F_1$ and $F_3$.
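The communality column of the table can be recomputed from the two columns of loadings shown. A minimal sketch, restricted (as the table is) to Factor 1 and Factor 3:

```python
import numpy as np

# Loadings on Factor 1 (Identity) and Factor 3 (Governance), from the table.
features = ["Clear netiquette", "Moderation", "User guidance / FAQ", "Report-to-moderator"]
loadings = np.array([
    [0.06, 0.84],
    [0.01, 0.89],
    [0.13, 0.72],
    [0.16, 0.77],
])

# Communality over these two factors: sum of squared loadings per row.
comm = (loadings ** 2).sum(axis=1)
for name, c in zip(features, comm):
    print(f"{name:22s} {c:.2f}")
```

The printed values are 0.71, 0.79, 0.54 and 0.62, matching the Communality column.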
The number of factors
The number of factors selected for further processing is based upon:
prior knowledge:
  domain knowledge;
  knowledge of related studies;
the amount of variation explained by the factors:
  Cattell's scree test of eigenvalues;
  Kaiser's rule of eigenvalues;
the comprehensibility of the extracted factors.
The selected factors constitute the factor solution.
Eigenvalues
Eigenvalue
Consider the random variables $x_1, \ldots, x_n$, $n \geq 2$, and the factors $F_1, \ldots, F_m$, $m \geq 1$. Let $l_{ij}$ be the loading of the variable $x_i$ on the factor $F_j$. The eigenvalue $e_j$ of the factor $F_j$ equals
$$ e_j = \sum_{i=1}^{n} l_{ij}^2 $$
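Applied to the five-variable example factor matrix shown earlier, the definition amounts to summing the squared loadings in each column. In this minimal NumPy sketch the results agree with the eigenvalues listed for that matrix (2.80, 1.14, 0.46, 0.35, 0.25) up to the rounding of the printed loadings, and together they add up to roughly 5, the number of variables.

```python
import numpy as np

# Five-variable example factor matrix (rows x1..x5, columns F1..F5).
L = np.array([
    [0.86, -0.29,  0.15, -0.13, -0.37],
    [0.82, -0.38,  0.14,  0.25,  0.32],
    [0.84, -0.19, -0.14, -0.48,  0.06],
    [0.47,  0.77,  0.42, -0.08,  0.03],
    [0.68,  0.53,  0.48,  0.19,  0.01],
])

# e_j = sum over the variables i of l_ij**2 (column sums of squares).
eigenvalues = np.sum(L ** 2, axis=0)
print(eigenvalues.round(2))
print(eigenvalues.sum().round(2))   # total variance, about n = 5
```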
Eigenvalues, continued
Suppose that for n random variables, n factors are extracted from the
data:
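each (standardised) variable has variance 1, so the total variance in the data equals n and each variable accounts for (100/n)% of it;
a factor with an eigenvalue e accounts for as much variance as e of the original variables together, that is, for (100 e/n)% of the total variance.
The eigenvalue is therefore sometimes called the amount of variance explained by the factor.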
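Dividing each eigenvalue by the number of variables gives the proportion of the total variance that a factor explains. A small sketch, using the eigenvalues listed for the five-variable example (so $n = 5$):

```python
import numpy as np

# Eigenvalues listed for the five-variable example (n = 5 variables).
eigenvalues = np.array([2.80, 1.14, 0.46, 0.35, 0.25])
n = 5

share = eigenvalues / n          # proportion of total variance per factor
cumulative = np.cumsum(share)    # proportion explained by the first j factors
for j, (s, c) in enumerate(zip(share, cumulative), start=1):
    print(f"F{j}: {100 * s:5.1f}%   cumulative {100 * c:5.1f}%")
```

Here $F_1$ alone explains 56% of the total variance and $F_1$ and $F_2$ together 78.8%.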
Eigenvalues: an example
                F1     F2     F3     F4     F5
x1            0.86  -0.29   0.15  -0.13  -0.37
x2            0.82  -0.38   0.14   0.25   0.32
x3            0.84  -0.19  -0.14  -0.48   0.06
x4            0.47   0.77   0.42  -0.08   0.03
x5            0.68   0.53   0.48   0.19   0.01
eigenvalues   2.80   1.14   0.46   0.35   0.25
Tests of eigenvalues
For the factor solution, several tests are in use for selecting the factors
to be included:
Cattell's scree curve shows the eigenvalue of each subsequent factor: the factors just prior to the levelling-off of the curve are included in the factor solution;
Kaiser's rule of eigenvalues is to include only factors with an eigenvalue e > 1.
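Both rules can be applied to the eigenvalues of the five-variable example. The sketch below is illustrative only: the scree test is normally a visual judgement on a plot, and the printed drops merely stand in for that plot. Here the drops are 1.66, 0.68, 0.11 and 0.10, so the curve levels off from $F_3$ onwards; arguably both the scree test and Kaiser's rule then point to retaining $F_1$ and $F_2$.

```python
import numpy as np

# Eigenvalues of the five-variable example, in decreasing order.
eigenvalues = np.array([2.80, 1.14, 0.46, 0.35, 0.25])

# Kaiser's rule: retain the factors with an eigenvalue greater than 1.
kaiser = np.flatnonzero(eigenvalues > 1) + 1    # 1-based factor numbers
print("Kaiser's rule retains factors:", kaiser.tolist())

# Scree: the drop in eigenvalue from one factor to the next. Large drops
# followed by small ones mark where the curve levels off.
drops = -np.diff(eigenvalues)
for j, d in enumerate(drops, start=1):
    print(f"F{j} -> F{j + 1}: drop {d:.2f}")
```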
Rotation
To achieve markedly different loadings on the various factors, a rotation of the factor solution is often performed:
the original variables and their values remain unchanged;
the factors of the solution are redefined in terms of the original variables, while (more or less) maintaining their mutual independence.
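The slides do not name a rotation method; a common orthogonal choice is Kaiser's varimax, which maximises the variance of the squared loadings within each factor. The sketch below is an assumption rather than the method used in the study: it implements the widely used SVD-based varimax iteration (without Kaiser normalisation) and applies it to the $F_1$ and $F_2$ columns of the five-variable example. Because the rotation matrix is orthogonal, the communalities of the variables do not change; only the distribution of the loadings over the factors does.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Varimax rotation of a (variables x factors) loading matrix.

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Target matrix derived from the varimax criterion (gamma = 1).
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt                  # closest orthogonal matrix
        new_criterion = np.sum(s)
        if new_criterion < criterion * (1 + tol):
            break                          # no meaningful improvement left
        criterion = new_criterion
    return L @ rotation, rotation

# F1 and F2 columns of the five-variable example factor matrix.
L2 = np.array([
    [0.86, -0.29],
    [0.82, -0.38],
    [0.84, -0.19],
    [0.47,  0.77],
    [0.68,  0.53],
])
rotated, rot = varimax(L2)
print(rotated.round(2))
```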
Lessons learned
The overall lessons learned from this lecture are:
factor analysis is a statistical technique for exploration;
factors are extracted from the data by analysing the correlations
between the measured variables;
factor analysis allows for data reduction.
