Topic 10: Factor Analysis and Reliability
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Describe the requirements for factor analysis for a given data set;
2. Use the appropriate method to determine the principal components
underpinning the responses on a set of variables; and
3. Compute the reliability index.
INTRODUCTION
Factor analysis is used to uncover the latent structures (dimensions) of a set of variables. It belongs to a family of data-reduction methods; other members of the family are Latent Class Analysis, Latent Profile Analysis, Latent Trait Analysis, and Principal Component Analysis (PCA). The focus of Principal Component Analysis is to reduce a large number of variables to a smaller set of principal components (dimensions). It allows researchers to use a small number of "factors" to explain what a long list of "variables" actually measures. Prior to multiple regression analysis, factor analysis can be used to create a set of factors to be treated as uncorrelated variables, which is one approach to handling multicollinearity. Factor analysis is an interdependency technique; it aims to find the latent factors that account for the patterns of collinearity among multiple metric variables. Note that some statisticians do not consider PCA to be factor analysis.
Consider a set of nine observed variables, X1 to X9.
Basic Principle:
Variables that significantly correlate with each other do so because they are
measuring the same "construct".
The Problem:
What is the "construct" that brings the variables together?
To find the answer, we need to carry out factor analysis or, to be more precise, Principal Component Analysis.
Step 1:
Collect data on the two variables (X and Y) from a group of respondents (let's say we use 10 respondents).
Table 10.2 shows an example of the scores collected for the two variables:
Step 2:
Determine the variance-covariance matrix for the two variables. The formulas
below define the variance and covariance.
Variance [Cov(X, X)] = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Xᵢ − X̄) / (n − 1); using the standardised values, Cov(X, X) = 1.00
Variance [Cov(Y, Y)] = Σᵢ₌₁ⁿ (Yᵢ − Ȳ)(Yᵢ − Ȳ) / (n − 1); using the standardised values, Cov(Y, Y) = 1.00
Covariance [Cov(X, Y)] = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / (n − 1) = 0.77
The variance-covariance matrix is

    ⎛ Cov(x, x)  Cov(x, y) ⎞
    ⎝ Cov(y, x)  Cov(y, y) ⎠

If the X and Y scores are transformed into standardised scores, the variance-covariance matrix gives us the correlation matrix. Thus, for the above example, the variance-covariance matrix for the standardised values of X and Y is

    ⎛ 1.00  0.77 ⎞
    ⎝ 0.77  1.00 ⎠
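Step 2 can be sketched numerically. The scores below are hypothetical stand-ins for the Table 10.2 data (which is not reproduced here); the point is that, once the scores are standardised, the variance-covariance matrix becomes the correlation matrix.

```python
import numpy as np

# Hypothetical scores for 10 respondents on X and Y (illustrative
# only; the module's Table 10.2 data is not reproduced here).
x = np.array([3, 5, 2, 8, 7, 4, 6, 5, 9, 1], dtype=float)
y = np.array([2, 6, 3, 7, 8, 3, 5, 6, 9, 2], dtype=float)

# Standardise: subtract the mean and divide by the sample standard
# deviation (ddof=1 matches the n - 1 in the formulas above).
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

# Variance-covariance matrix of the standardised scores: the
# diagonal entries are 1.00 and the off-diagonal entry is the
# correlation between X and Y.
vcov = np.cov(zx, zy, ddof=1)
print(np.round(vcov, 2))
```

With the module's actual Table 10.2 scores, the off-diagonal entry would be the 0.77 used in the example.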
Step 3
Calculate the eigenvalues and eigenvectors of the covariance matrix. The eigenvalues λ are the solutions of the characteristic equation

    (1 − λ)² − r12² = 0

where r12 is the correlation between the two variables; in this case, r12 = 0.77. Thus, the eigenvalues are λ1 = 1 + 0.77 = 1.77 and λ2 = 1 − 0.77 = 0.23. Substituting each eigenvalue back into the matrix equation gives the corresponding eigenvector.
When the eigenvalue is 1.77, the eigenvector satisfies −0.77v1 + 0.77v2 = 0, i.e. v1 = v2, so the eigenvector is (1, 1)ᵀ. When the eigenvalue is 0.23, the eigenvector satisfies 0.77v1 + 0.77v2 = 0, so the eigenvector is (1, −1)ᵀ. (An eigenvector is determined only up to a scaling constant.)
Step 4
Plot the standardised values on a two-dimensional plane and overlay the eigenvectors.
[Scatter plot: the standardised X and Y scores plotted on axes running from −2.00 to 2.00, with the two eigenvectors overlaid on the point cloud.]
Figure 10.1: Plotting on a two-dimensional plane and overlaying the eigenvectors
From the plot shown in Figure 10.1, it can be concluded that the data set is fairly
well represented by the eigenvector derived when the eigenvalue is 1.77.
The above discussion is just for illustrative purposes. In real situations, there will be more than two "observed" variables, and thus a visual (e.g. graphical) representation is not possible.
Having run the correlation analysis, the researcher found that some of the items have high correlations with one another while others do not. Table 10.4 shows an example of the correlation analysis.
      X1    X2    X3    X4    X5    X6    …    Xk
X1   1.00  0.76  0.84   …
X2         1.00  0.76   …
X3               1.00   …
X4                     1.00  0.76  0.77   …
X5                           1.00  0.81
X6                                 1.00
…
Xk                                            1.00
The next logical thing to do is to cluster the variables with high inter-correlations together and define them as belonging to the same family. This is what factor analysis (or, to be precise, Principal Component Analysis) is all about. Table 10.5 displays an example of the factor analysis. The values in the cells are the factor loadings (refer to Subsection 10.3.1 for further explanation of factor loadings).
Example
Note the factor loadings for variable X1.
As such, the initial communality for each variable, before the factors are extracted, is always 1.00. In the above example, emotional intelligence is operationalised using 23 specific situations, so there will be 23 initial factors, with some having greater dominance than others (this is reflected in the eigenvalues). Once the dominant factors are identified (e.g. those with an eigenvalue greater than 1.00), the communality value for each variable will be less than 1.00. This is because, in factor analysis, the factors that have a negligible effect on the variables are dropped.
Step 1
Compute a k by k inter-correlation matrix. According to Hair et al., inter-correlation values must be at least 0.3 for the items to be considered for factor analysis.
Step 2
Extract an initial solution.
Step 3
Determine the appropriate number of factors to be extracted in the final solution.
Step 4
Rotate the factors, if necessary, to clarify the factor pattern and better interpret the nature of the factors.
Step 5
Establish the measures of goodness-of-fit of the factor solution.
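Steps 1 to 3 above can be sketched in a few lines of code. This is a minimal illustration with hypothetical data, not the SPSS procedure used later in the topic; rotation (Step 4) and goodness-of-fit checks (Step 5) are omitted.

```python
import numpy as np

def pca_extract(data):
    """Steps 1-3 sketched: build the k x k inter-correlation matrix,
    extract the initial eigen-solution, and keep components whose
    eigenvalue exceeds 1 (Kaiser's criterion)."""
    # Step 1: k x k inter-correlation matrix.
    R = np.corrcoef(data, rowvar=False)
    # Step 2: initial solution via eigendecomposition of R.
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    order = np.argsort(eigenvalues)[::-1]          # largest first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Step 3: retain components with eigenvalue > 1; loadings are
    # eigenvectors scaled by the square root of their eigenvalue.
    keep = eigenvalues > 1.0
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    return eigenvalues, loadings

# Hypothetical data: 200 respondents on 6 items generated from two
# underlying dimensions plus noise (illustrative only).
rng = np.random.default_rng(0)
f = rng.normal(size=(200, 2))
items = [f[:, 0] + 0.4 * rng.normal(size=200) for _ in range(3)]
items += [f[:, 1] + 0.4 * rng.normal(size=200) for _ in range(3)]
eigenvalues, loadings = pca_extract(np.column_stack(items))
print((eigenvalues > 1.0).sum())   # number of components retained
```

Because the six items were built from two dimensions, the eigenvalue > 1 rule retains two components here.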
[Diagram: two common factors, C1 and C2, each linked to all ten observed variables X1 to X10.]
All the observed variables will have some influence on all the factors extracted; however, different sets of variables will have different degrees of influence on the different common factors.
The following Figure 10.2 summarises the requirements and assumptions for
principal component analysis.
Correlation Matrixa
rq1 rq2 rq3 rq4 rq5 rq6 rq7 rq8 rq9 rq10
Correlation rq1 1.000 .604 .578 .419 .514 .580 .497 .555 .554 .481
rq2 .604 1.000 .615 .518 .488 .545 .543 .402 .402 .401
rq3 .578 .615 1.000 .519 .567 .536 .572 .481 .484 .496
rq4 .419 .518 .519 1.000 .581 .430 .450 .336 .174 .357
rq5 .514 .488 .567 .581 1.000 .577 .577 .466 .382 .574
rq6 .580 .545 .536 .430 .577 1.000 .575 .510 .417 .437
rq7 .497 .543 .572 .450 .577 .575 1.000 .459 .442 .521
rq8 .555 .402 .481 .336 .466 .510 .459 1.000 .585 .602
rq9 .554 .402 .484 .174 .382 .417 .442 .585 1.000 .529
rq10 .481 .401 .496 .357 .574 .437 .521 .602 .529 1.000
a. Determinant =0.005
Test Results
χ2 = 887.955 ; df = 45 ; p < 0.0001
Statistical Decision
The inter-correlation matrix of the variables is significantly different from
an identity matrix. In other words, the sample inter-correlation matrix did
not come from a population in which the inter-correlation matrix is an
identity matrix.
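Bartlett's test statistic can be computed from the determinant of the inter-correlation matrix: χ² = −(n − 1 − (2k + 5)/6) ln|R|, with df = k(k − 1)/2. A sketch follows; note that the sample size n is not given in the text, so the value used below is hypothetical.

```python
import math

def bartlett_sphericity(det_R, n, k):
    """Bartlett's test of sphericity from the determinant of the
    k x k inter-correlation matrix R and the sample size n."""
    chi2 = -(n - 1 - (2 * k + 5) / 6) * math.log(det_R)
    df = k * (k - 1) // 2
    return chi2, df

# k = 10 items and determinant = 0.005, as in the SPSS output above.
# The sample size is NOT stated in the text; n = 173 is a
# hypothetical value chosen only to illustrate the calculation.
chi2, df = bartlett_sphericity(det_R=0.005, n=173, k=10)
print(df)                 # 45 degrees of freedom, as reported above
print(round(chi2, 1))
```

The degrees of freedom depend only on k, which is why df = 45 matches the SPSS output regardless of the assumed n.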
If the anti-image (partial) correlations aij ≅ 0.0, the variables are measuring a common factor, and KMO ≅ 1.0.
If aij ≅ 1.0, the variables are not measuring a common factor, and KMO ≅ 0.0.
Interpretation
The degree of common variance among the ten variables is "marvellous" (in Kaiser's verbal labels for KMO values).
Note: In this module, we will only focus on the Principal Component Method.
Each initial factor (item) now belongs to the "new factors", and the new factors explain a certain proportion of the variance in each variable. Thus, the proportion of variance of each variable (item) explained by the new factors is less than 1.00 (refer to Table 10.12).
[Table 10.12: Communalities, showing the Initial and Extraction values for each item.]
The variance of each standardised variable is 1.0, so the total variance to be explained is 10 (10 variables, each with a variance of 1.0). Since a single variable can account for 1.0 unit of variance, a useful "new factor" must account for more than 1.0 unit of variance, that is, have an eigenvalue (λ) greater than 1.0. Otherwise, the extracted factor (new factor) explains less variance than a single variable. Table 10.7 shows the results of the factor analysis of the 10 items.
Referring to Table 10.13 above, the results of the initial solution are as follows:
Interpretation
10 factors (components) were extracted, the same as the number of variables
factored:
(a) Factor I
The 1st factor has an eigenvalue = 5.489. Since this value is greater than 1.0, the factor explains more variance than a single variable; in fact, 5.489 times as much.
(b) Factor II
The 2nd factor has an eigenvalue = 1.041. It is also a value greater than 1.0,
and therefore, explains more variance than a single variable.
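As a cross-check, these two eigenvalues can be recovered (up to the rounding of the printed correlations) directly from the inter-correlation matrix shown earlier:

```python
import numpy as np

# Inter-correlation matrix of rq1-rq10, transcribed from the SPSS
# output shown earlier (entries rounded to three decimals there).
R = np.array([
    [1.000, .604, .578, .419, .514, .580, .497, .555, .554, .481],
    [ .604, 1.000, .615, .518, .488, .545, .543, .402, .402, .401],
    [ .578, .615, 1.000, .519, .567, .536, .572, .481, .484, .496],
    [ .419, .518, .519, 1.000, .581, .430, .450, .336, .174, .357],
    [ .514, .488, .567, .581, 1.000, .577, .577, .466, .382, .574],
    [ .580, .545, .536, .430, .577, 1.000, .575, .510, .417, .437],
    [ .497, .543, .572, .450, .577, .575, 1.000, .459, .442, .521],
    [ .555, .402, .481, .336, .466, .510, .459, 1.000, .585, .602],
    [ .554, .402, .484, .174, .382, .417, .442, .585, 1.000, .529],
    [ .481, .401, .496, .357, .574, .437, .521, .602, .529, 1.000],
])

# Eigenvalues in descending order; the two largest should be close
# to the reported 5.489 and 1.041.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
print(np.round(eigenvalues[:2], 3))
```

Only these two eigenvalues exceed 1.0, which is why two components are extracted.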
The total proportion of the variance in rq1 explained by the two factors is:
(0.785² + 0.099²) = 0.626
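The communality is simply the sum of squared loadings across the retained components; a one-line check:

```python
# Loadings of rq1 on the two extracted components (from the
# Component Matrix).
loadings_rq1 = [0.785, 0.099]

# Communality = sum of squared loadings across retained components.
communality = sum(l ** 2 for l in loadings_rq1)
print(round(communality, 3))  # 0.626
```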
The key to determining what the factors measure is the factor loadings.
Component Matrixa
Component
1 2
rq1 .785 .099
rq2 .748 -.253
rq3 .795 -.127
rq4 .640 -.567
rq5 .776 -.216
rq6 .762 -.086
rq7 .765 -.096
rq8 .727 .406
rq9 .667 .563
rq10 .728 .289
Extraction Method: Principal Component
Analysis.
a. 2 components extracted.
Each factor loading is the correlation coefficient between a variable and a factor. Reading down the second column of the Component Matrix gives the loadings on Factor II:

Variable   Factor Loading on Factor II
rq1         .099
rq2        -.253
rq3        -.127
rq4        -.567
rq5        -.216
rq6        -.086
rq7        -.096
rq8         .406
rq9         .563
rq10        .289

For example, the correlation coefficient between rq1 and Factor II is 0.099, and the correlation coefficient between rq4 and Factor II is -0.567. The loadings on Factor I are read from the first column of the Component Matrix in the same way.
Reproduced Correlations
rq1 rq2 rq3 rq4 rq5 rq6 rq7 rq8 rq9 rq10
Reproduced rq1 .626a .562 .611 .446 .588 .590 .591 .611 .580 .600
Correlation rq2 .562 .623a .626 .622 .635 .591 .596 .441 .357 .471
rq3 .611 .626 .647a .580 .644 .616 .620 .526 .459 .542
rq4 .446 .622 .580 .732a .619 .536 .544 .235 .108 .302
rq5 .588 .635 .644 .619 .649a .610 .614 .477 .397 .503
rq6 .590 .591 .616 .536 .610 .588a .591 .519 .460 .530
rq7 .591 .596 .620 .544 .614 .591 .594a .517 .456 .529
rq8 .611 .441 .526 .235 .477 .519 .517 .694a .714 .647
rq9 .580 .357 .459 .108 .397 .460 .456 .714 .762a .649
rq10 .600 .471 .542 .302 .503 .530 .529 .647 .649 .614a
Residualb rq1 .042 -.033 -.027 -.074 -.009 -.094 -.056 -.026 -.119
rq2 .042 -.011 -.104 -.147 -.047 -.053 -.039 .046 -.070
rq3 -.033 -.011 -.061 -.077 -.080 -.048 -.045 .025 -.046
rq4 -.027 -.104 -.061 -.038 -.106 -.094 .101 .066 .055
rq5 -.074 -.147 -.077 -.038 -.033 -.037 -.011 -.014 .071
rq6 -.009 -.047 -.080 -.106 -.033 -.016 -.009 -.042 -.093
rq7 -.094 -.053 -.048 -.094 -.037 -.016 -.058 -.014 -.008
rq8 -.056 -.039 -.045 .101 -.011 -.009 -.058 -.129 -.045
rq9 -.026 .046 .025 .066 -.014 -.042 -.014 -.129 -.120
rq10 -.119 -.070 -.046 .055 .071 -.093 -.008 -.045 -.120
The upper half of Table 10.14 above presents the reproduced correlations, i.e. the correlations implied by the two-factor solution. Compare these with the lower half of the table, which presents the residuals (observed minus reproduced correlations). Less than half of the residuals (42%) are greater than 0.05.
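Each reproduced correlation is the sum, over the retained components, of the products of the two variables' loadings, and each residual is the observed correlation minus the reproduced one. A sketch for the rq1-rq2 pair, using the loadings from the Component Matrix:

```python
import numpy as np

# Loadings of rq1 and rq2 on the two components (Component Matrix).
L = np.array([[0.785,  0.099],   # rq1
              [0.748, -0.253]])  # rq2

# Reproduced correlation: sum over components of loading products.
reproduced = L[0] @ L[1]
print(round(reproduced, 3))      # 0.562, as in the reproduced matrix

# Residual = observed correlation - reproduced correlation.
observed = 0.604                 # rq1-rq2 correlation, SPSS output
print(round(observed - reproduced, 3))  # 0.042, as in the residuals
```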
10.5 RELIABILITY
In many areas of educational and psychological research, the precise
measurement of various variables or theoretical constructs poses a challenge. For
example, the precise measurement of personality variables or attitudes is usually
a necessary first step before any theories of personality or attitudes can be
considered. In general, unreliable measurements of people's beliefs or intentions
will obviously hamper efforts to predict their behaviour. Reliability analysis is often used to statistically check the reliability of an instrument. Reliability is a measure of the consistency of a particular instrument: the "capability" of the instrument to produce consistently similar results if it were administered to a homogeneous group of respondents. Generally, there are four classes of reliability estimates: inter-rater (or inter-observer) reliability, test-retest reliability, parallel-form reliability, and internal consistency. Inter-rater or inter-observer reliability is used to assess the degree to which two different observers agree in describing a phenomenon. It is widely used in establishing reliability
for open-ended questions. Test-retest, parallel-form and internal-consistency reliability are mainly used to assess the reliability of fixed-response items. Test-retest reliability measures the consistency of a measure from one time to another, while parallel-form reliability measures the consistency of two tests constructed from the same content domain.
α = [k / (k − 1)] × [1 − (Σᵢ₌₁ᵏ Sᵢ²) / S²sum]
where
Sᵢ² = variance of item i (for each of the k items)
S²sum = variance of the sum of all items
• If there is no true score but only random error in the items (uncorrelated across items), then Σ Sᵢ² = S²sum and α = 0
• If all items measure the same thing (true score), then α = 1
• Nunnally (1978) suggests an α > 0.7
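The formula can be sketched directly. The function below follows the definition above; the scores are hypothetical and serve only to illustrate the calculation.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n respondents x k items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)   # S_i^2, one per item
    total_variance = items.sum(axis=1).var(ddof=1)  # S^2_sum
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical scores: 6 respondents x 4 items (illustrative only).
scores = [[4, 5, 4, 4],
          [2, 3, 2, 3],
          [5, 5, 4, 5],
          [3, 3, 3, 2],
          [4, 4, 5, 4],
          [1, 2, 2, 1]]
print(round(cronbach_alpha(scores), 3))  # 0.959
```

Because these hypothetical items move together closely, alpha is well above Nunnally's 0.7 threshold.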
Example
A researcher gave a 10-item questionnaire on Emotional Intelligence to a sample of randomly selected secondary school students. The aim is to determine the internal consistency of the scale using Cronbach's alpha. Table 10.16 below shows the SPSS output.
Item-Total Statistics
        Scale Mean if   Scale Variance     Corrected Item-     Squared Multiple   Cronbach's Alpha
        Item Deleted    if Item Deleted    Total Correlation   Correlation        if Item Deleted
rq1 41.89 63.948 .718 .560 .895
rq2 41.78 64.915 .676 .533 .897
rq3 41.89 64.380 .731 .555 .894
rq4 42.24 65.499 .560 .458 .905
rq5 42.19 62.074 .713 .573 .895
rq6 42.14 63.800 .692 .516 .896
rq7 42.00 63.202 .696 .508 .896
rq8 41.83 64.745 .654 .521 .899
rq9 41.93 66.185 .583 .491 .903
rq10 41.97 64.849 .658 .517 .898
• Among the required assumptions for factor analysis are a large sample, normality (not for PCA), linear relationships among variables, absence of outliers, and no multicollinearity.
• Factor loading is the correlation between a variable and a factor that has been
extracted from the data.
• There are four classes of reliability estimates. They are inter-rater or inter-
observer reliability, test-retest reliability, parallel-form reliability, and internal
consistency.