65 Structural Equation Modelling (SEM)
[Figure: a multiple regression path diagram. X1, X2 and X3 (box = observed variable) point to Y1 via directed paths (single arrows); co-variation paths (double headed arrows) link the X variables, and Y1 carries an error term.]

The estimation process minimises the difference between the sample covariance matrix S and the model-implied covariance matrix: min f(S, Σ̂).

Paradoxically, while this model comparison approach might appear very different from the traditional hypothesis testing approach (here you are hoping for the least difference between the two), it can also incorporate the traditional approach (where you are hoping for as big a difference as possible between the observed and zero effect models). For details see the online chapter SEM equivalents to basic statistical procedures.

Schumacker and Lomax's excellent book, A Beginner's Guide to Structural Equation Modeling, 2010, provides details of what SEM is, from which I have compiled the following list:
1. Based on correlations (covariances)
2. A complex mathematical approach only made widely available with the use of suitable software
3. Allows the definition of complex relationships using models (mathematically using covariance matrices, which can be partially represented by diagrams)
4. Extends regression (path models)

[Figure: a factor analysis diagram in which a latent variable, PA1, points to the observed variables paragrap, sentence and wordmean; observed variables are also called manifest variables.]
If you consider only those variables which have one or more single arrowed paths pointing
towards them, each group can be considered to be a separate equation (actually a
regression equation). In the diagram below, we have one such group.
Let's look at the 'predictors of mortality' regression example below produced using the free
Ωnyx program. When you right click on the diagram you can view either the RAM or LISREL
matrices (menu option below). I have shown the RAM option.
Variables:
Education, Popden, Nonwhite, Mortality
Means (defaults to zero)
2. Is there any strategy for finding an improved fit between the two matrices each time
around?
By making use of one of the above computed discrepancy function values we can
compare these for different guesses of our parameters and hopefully gradually home
in until we get a sufficiently small value. There are special search algorithms built into
SEM computer programs to help them gradually 'home in', but because the search
may go wildly astray we can usually also set a maximum number of times (i.e. iterations)
the program can have a shot at finding a sufficiently close answer.
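The 'home in until the discrepancy is small enough, or give up after a maximum number of iterations' idea can be sketched in a few lines. This is a toy illustration only, not the actual search algorithm any SEM program uses; the discrepancy function and numbers are invented.

```python
# Toy sketch (not a real SEM estimator): repeatedly nudge a single
# parameter guess to shrink a discrepancy value, stopping either when
# the discrepancy is sufficiently small or when a maximum number of
# iterations is reached, mirroring the search described above.

def minimise(discrepancy, guess, step=0.1, tol=1e-6, max_iterations=200):
    value = discrepancy(guess)
    for i in range(max_iterations):
        if value < tol:                      # sufficiently close answer
            return guess, value, i
        # try a small move in each direction, keep whichever improves
        for candidate in (guess + step, guess - step):
            if discrepancy(candidate) < value:
                guess, value = candidate, discrepancy(candidate)
                break
        else:
            step /= 2                        # no improvement: refine the step
    return guess, value, max_iterations     # iteration limit reached

# e.g. a "sample" value of 2.5; discrepancy = squared distance of the guess from it
best, fit, iters = minimise(lambda p: (p - 2.5) ** 2, guess=0.0)
print(round(best, 3), fit < 1e-6)           # 2.5 True
```

Real SEM software minimises a discrepancy over many parameters at once, but the stopping logic (tolerance plus an iteration cap) is the same in spirit.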
We will now look at a typical SEM diagram showing a set of model path estimates.
[Figure: Medical model of quality of life for cardiac patients (Romney et al. 1992), n = 469, standardised parameters. Severity of Symptoms of Illness and Neurological dysfunction predict Diminished Socioeconomic status (E5 = 0.925), Low morale (E1 = 0.838) and Poor interpersonal relationships (E4 = 0.842); path values shown include 0.31, 0.48, 0.14, 0.18, 0.17, 0.54 and -0.13. Key: correlation = standardised covariance; beta weights = standardised partial regression coefficients; the E terms give the proportion of unexplained variance. Reference: Romney D M, Jenkins C D, Bynner J M. 1992. A structural analysis of health-related quality of life dimensions. Human Relations, 45, 165-176; discussed in Grimm & Yarnold 1995, Reading and Understanding Multivariate Statistics, p. 84.]

The results are taken from a correlational study concerning 469 cardiac patients where the investigators were keen to find out the associations between various measures.

SEM is usually carried out on large samples, and 469 is small compared to many studies. Also, usually the study is of a retrospective/correlational design, although often there are attempts in the discussion of such papers to interpret the relationships as causal. We will return to this issue later.

Title of Standardised Parameters – this indicates that the variables are standardised (i.e. z scores). This allows the various parameters in the diagram to be compared and interpreted in a specific way, described below.
Double arrow lines – These values indicate a correlation where the modelling process
estimates the value. The absence of a line between two variables specifies that the
correlation between them is set to zero in the model (i.e. they are independent).
Error terms (E1, E4, E5) – This is the unexplained variance: that which is either measurement
error (random) or not explained by the model. This diagram uses the 'e' style to show them
rather than the self-directed double arrows.
Proportion of unexplained variance values – these are also called the standardised residual variances (SRV). One minus these values gives the R2 value (proportion of variance explained), so the SRV x 100 gives the % of the variance that is not predicted by the specific inputs that have arrows pointing to that variable. In some diagrams and software (e.g. EQS) the square root value is given instead.

Direction of the single arrow – this does matter, as it defines which variable will have the error term (the variable that the arrow points to) and also how to interpret the path coefficient.

To check your understanding I have included some self-test questions opposite.

Self-test:
1. Please complete this sentence. A level of Neurological dysfunction one full standard deviation above the mean predicts an increase in poor interpersonal relationships by ____ Standard _____ below the mean, controlling for Symptoms of illness.
2. What does the value 0.54 represent between 'Low morale' and 'Poor relationships'?
3. What do the values from E5, E1 and E4 suggest about the model? How do they relate to R2?
4. What does the double pointed arrow ( <--> ) indicate?
5. There is no path between 'Neurological dysfunction' and 'Low morale' in the model. What does this say about the model specification?
6. Does the above model more closely represent a Regression or a Factor analysis? Give reasons for your answer.
7. How do you think you might calculate the correlation between Severity of illness and Poor interpersonal relationships (discussed further below)?

65.2.3 Interpreting a Standardised SEM model

Interpreting an SEM model containing both observed and latent variables (i.e. squares and circles, discussed later) is the same as for the above path model.
I hope that this strengthens your belief that the SRV is a measure of what
remains unexplained after the regression analysis, as well as the idea that the
proportion of variance explained can be calculated from the beta weights and
the simple correlations between each input and the output.
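Both of these claims are simple arithmetic, and can be checked directly. The SRV value below is the E1 term from the Romney model; the beta weights and correlations are invented purely for illustration.

```python
# Illustrating the arithmetic above: R-squared is one minus the
# standardised residual variance (SRV), and with standardised variables
# it also equals the sum of each beta weight times the simple
# correlation between that input and the output.

srv = 0.838                     # e.g. E1 for 'Low morale' in the Romney model
r_squared = 1 - srv             # proportion of variance explained
print(round(r_squared, 3))      # 0.162, i.e. 16.2% explained, 83.8% unexplained

betas = [0.40, 0.30]            # hypothetical standardised partial regression coefficients
simple_r = [0.50, 0.45]         # hypothetical input-output correlations
r_squared_from_betas = sum(b * r for b, r in zip(betas, simple_r))
print(round(r_squared_from_betas, 3))   # 0.335
```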
To display the standardised path estimates alongside the unstandardised ones in
Ωnyx you need to select each path and then select the menu option shown
opposite.
65.2.5 Calculating Direct, Indirect and Total Effects

In a structural equation model it is possible to calculate the various effects one or more variables have upon another in the model via other variables as well as directly. From the example opposite it might appear easy to calculate such effects; however this is not the case with complex models, as the indirect path may follow tortuous routes. To help simplify this problem Sewell Wright in the 1920s specified three rules:

[Figure: path diagram with Fitness, Stress and Illness; standardised estimates (although you can also use unstandardised ones to work out the various effects). Taken from Kline 2005 pp. 125-129, adapted.
Direct effect of Fitness upon Illness = -0.260
Indirect effect of Fitness upon Illness (via Stress, paths -0.109 and 0.291) = (-0.109)(0.291) = -0.031719
Total effects are the sum of all direct and indirect effects of one variable on another:
Total effect of Fitness upon Illness = -0.260 + -0.031719 = -0.291719]
[Figure: admissible tracing rule example with variables A, B and C.]

1. You can trace backward up an arrow and then forward along the next, or forwards from one variable to another, but never forward and then back.
2. You can pass through each variable only once in a given chain of paths.
Self test:
1. What is the total effect of Severity of Symptoms of Illness on Poor Interpersonal relationships?
2. What is the total effect of Neurological dysfunction upon Diminished Socioeconomic Status?

[Figure: fragment of the Romney model showing Low morale with error E1 (0.84) and paths 0.18 and 0.17.]
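The Fitness/Stress/Illness example above can be worked through in a couple of lines: an indirect effect is the product of the path coefficients along the route, and the total effect is the direct effect plus the sum of all indirect effects.

```python
# The Kline example worked through: indirect effect = product of the
# path coefficients along the route; total effect = direct + indirect.

direct = -0.260                       # Fitness -> Illness
indirect = -0.109 * 0.291             # Fitness -> Stress -> Illness
total = direct + indirect

print(round(indirect, 6))   # -0.031719
print(round(total, 6))      # -0.291719
```

With several routes between two variables you would sum one such product per admissible route, which is exactly where Wright's tracing rules earn their keep.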
From Ωnyx:
Education Popden Nonwhite Mortality error
Standardised scores Raw score
akronOH 0.4266666666666641 -623.0500000000002 -3.070000000000004 -18.481666666666456 9.637116969994196
albanyNY 0.02666666666666373 414.9499999999998 -8.370000000000005 57.518333333333544 88.37715695175669
allenPA -1.1733333333333356 393.9499999999998 -11.070000000000004 22.018333333333544 33.23561534231747
atlantGA 0.12666666666666337 -741.0500000000002 15.229999999999997 41.91833333333352 -9.874930492230668
baltimMD -1.3733333333333366 2574.95 12.529999999999994 130.61833333333357 24.987311233272635
birmhmAL -0.773333333333337 -541.0500000000002 26.629999999999995 89.61833333333357 -32.32122750565975
bostonMA 1.1266666666666634 812.9499999999998 -8.370000000000005 -5.681666666666388 50.06894986721164
bridgeCT -0.3733333333333366 -1726.0500000000002 -6.570000000000005 -40.88166666666643 -10.394939626850245
bufaloNY -0.47333333333333627 2715.95 -3.770000000000005 61.61833333333357 43.020748755494466
cantonOH -0.273333333333337 346.9499999999998 -5.170000000000004 -28.08166666666648 -17.13378429709057
We will now see how it is not always necessary to use the raw data to
carry out a SEM analysis.
65.4 Covariances rather than raw data are used in model development
From the above discussion it should now be clear that calculation of the various path values (coefficients) only makes use of summary statistics (i.e. covariances or correlations). You may remember in the chapter on factor analysis we saw how a matrix of correlations, along with a value specifying the sample size, could be used instead of the raw data; the same goes for SEM analysis.

[Figure: sample data with six variables V1 to V6, each with N observations (there may be thousands of values for these 6 variables), reduced to a sample covariance matrix. This dataset now only has V(V+1)/2 = (6x7)/2 = 21 data items (the shaded cells)!]
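The data reduction just described is simple counting: a covariance matrix is symmetric, so only the lower triangle (variances on the diagonal plus the covariances below it) carries information.

```python
# The counting above: for v variables the covariance matrix holds only
# v(v + 1)/2 unique values, however many raw observations there were.

def unique_elements(v):
    return v * (v + 1) // 2

print(unique_elements(6))    # 21
print(unique_elements(4))    # 10, e.g. the four-variable mortality example
```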
Just because the 'raw data' are not used in the calculation of the path values, it should not be
assumed that sample size is unimportant; on the contrary, it is extremely important. This is
because we always need to assess if the various estimated values are any different from
zero. This means it is necessary to either produce p-values or confidence intervals, both of
which rely upon the calculation of standard errors (unless we use bootstrapping techniques),
which itself is directly dependent upon the raw sample size. Furthermore, because SEM
models often have a substantial number of variables and paths, a large sample size (as a rule
of thumb, of at least 200) is needed to begin to make any sense of the data.
An SEM model has variables, paths and values ("parameters") (Cheung 2015 pp. 39-40):

Variables – either manifest ("observed/measured") or latent ("factor/construct", including error/residual and phantom variables).
Paths – either directed ("regression paths") or bi-directional ("covariances/correlations"); bi-directional paths are either between variables or within a variable, the latter being a variance (independent variables) or its equivalent, an error/residual variance (dependent variables).
Values ("parameters") – either fixed ("constant"), constrained (equality, inequality, cross-group, proportional or nonlinear constraints) or free.
Along with the identification problem comes the problem of assessing how well overall the
model fits the observed data.
Schumacker & Lomax then suggest a three stage strategy (2010 p. 74):
o Chi-squared - you are aiming for an insignificant result, i.e. a high p-value
(usually above .95) resulting from a small actual value. This statistic is
problematic both when df=0 and when you have a large initial sample (number of
subjects, not the number of covariances). Various versions of this statistic,
such as the GFI and AGFI, have been developed to take these problems into
account.
2. Inspect the significance of individual parameters. The critical ratio (CR) for each
parameter estimate is computed by dividing the estimate by its standard error. In
most SEM programs it is compared to a z or t value of 1.96 at the .05 level of
significance. Some software (e.g. EQS) indicates parameters that are statistically
significant by placing an asterisk (*) beside them. Often in the model development
process insignificant paths are removed from the model.
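The critical ratio calculation is a one-liner; the estimate and standard error below are invented purely for illustration.

```python
# The critical ratio (CR) described above: a parameter estimate divided
# by its standard error, compared with 1.96 for significance at .05.

def critical_ratio(estimate, std_error):
    return estimate / std_error

cr = critical_ratio(0.54, 0.11)       # hypothetical estimate and SE
print(round(cr, 2), abs(cr) > 1.96)   # 4.91 True -> flagged as significant (e.g. with *)
```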
Overall model fit metrics – be wary of indications of a good fit!
• Chi-squared p-value ≥ .95
• CFI ≥ 0.95
• SRMR ≤ .08
• RMSEA ≤ .10 "good" fit, ≤ .05 "very good" fit

Bagozzi, 2012, quoting various writers, recommends the following standards for assessing adequate fitness of SEM models: CFI ≥ 0.95 and SRMR ≤ .08; also see Schumacker & Lomax, 2010. Again, both the values reported opposite on the previous page indicate a good model fit. I'll discuss the RMSEA next.
2. How precisely have we determined population fit from our sample data?
3. Does the fit still appear good when we take into account the complexity of the model and
its number of free parameters?
By developing a confidence interval for the RMSEA we can take into account the three above
issues, testing a null hypothesis of poor, instead of perfect, fit. If the upper limit of the 90%
confidence interval lies below the desired cutoff, for example .10, we can then reject the
hypothesis that the fit of the model in the population is worse than our desired level. In
other words, if the CI does not extend beyond this cut-off we can conclude our model is of
adequate fit. The width of the confidence interval will also provide information about the
accuracy of the estimate which is always useful. Because the RMSEA is based upon the non-
central chi-squared distribution it can also form the basis of a power analysis discussed next.
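The confidence-interval logic above can be written out as a small decision helper. This is a sketch of the decision rule only: real SEM software derives the CI itself from the non-central chi-squared distribution, and the example CI bounds below are invented.

```python
# The RMSEA confidence-interval logic: if the whole 90% CI sits below
# the chosen cutoff we reject the hypothesis of poor fit; if it sits
# wholly above, the fit is poor even in the best case; otherwise the
# interval is inconclusive.

def rmsea_verdict(ci_lower, ci_upper, cutoff=0.10):
    if ci_upper < cutoff:
        return "adequate fit (poor fit rejected)"
    if ci_lower > cutoff:
        return "poor fit"
    return "inconclusive"

print(rmsea_verdict(0.02, 0.07))   # adequate fit (poor fit rejected)
print(rmsea_verdict(0.04, 0.15))   # inconclusive
```

A narrow interval also tells you the RMSEA point estimate is precise, which is useful in itself.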
The easiest approach is to consider the RMSEA for two models, from which you can readily
calculate the power. Schoemann, Preacher & Coffman, 2010 provide an online calculator
for this and several other comparable measures, along with R code. Similarly the R package
semTools contains a function plotRMSEApower() which allows the plotting of power across
a range of sample sizes, much like G*Power. The investigation of power during the planning
stages of a study that intends to use SEM techniques can be a long and arduous process, as it
requires, not least, that the investigators define their proposed models at the start of the
process and also make a guess (quite literally) at what the RMSEA value might be.
Therefore detailed power analysis is not for the faint-hearted, and I would recommend that
anyone planning to undertake it start by reviewing the literature. I would advise
MacCallum, Browne & Sugawara, 1996; Hancock & Freeman, 2001; Lee, Cai & MacCallum,
2012 and Miles, 2003. Professor David Kaplan, University of Wisconsin Madison, provides a
brief review of these at: http://www2.gsu.edu/~mkteer/power.html
Rex Kline, who has written a popular introduction to SEM (Kline, 2016), provides a set of
notes concerning power estimation for a variety of SEM models, including some R code, at
http://psychology.concordia.ca/fac/kline/SEM/qicss2/qicss2setA.pdf
Professor Timo Gnambs at the Leibniz Institute for Educational Trajectories in Bamberg
provides an online R code generator for SEM power analysis at:
http://timo.gnambs.at/en/scripts/powerforsem
To use Structural Equation Modelling successfully often requires time dedicated to the SEM
literature and to specific SEM software, with the result that those researchers who become
competent frequently tend to apply it to almost all problems. And because the diagrams for
a structural equation model are relatively easy to draw, while the mathematics and the
interpretation of the results are complex, it is often used inappropriately.
One common problem often encountered with developing SEM models is that of
identification, which is discussed next.
I'll now demonstrate how you can create a simple SEM model by repeating the analysis described in the factor analysis chapter. I'll also demonstrate a more complex model.

[Figure: modelling workflow – decide the type of model (relationships? variables/types; parameters fixed/free), evaluate the model, then report.]
65.11 A basic SEM model – the Holzinger & Swineford 1939 data analysed using Ωnyx
Details of the Holzinger & Swineford 1939 dataset were discussed in the factor analysis
chapter. In this section we will use a separate program called Ωnyx, which produces R code,
to carry out the SEM analysis. This allows us to specify the model using a set of
graphical symbols which you may recognise from the factor analysis chapter.
65.11.1 Preliminaries
Watch my youtube video demonstrating the steps described below:
https://youtu.be/GN-HOWd31Ao (this link is case sensitive, the last character is lower case 'o' as in 'on')
You first need to download the dataset (tab-delimited format) from the link below and save
it locally: http://www.robin-beaumont.co.uk/virtualclassroom/book2data/grnt_fem.dat You
can also find it at the book's website.
Ωnyx requires Java (http://www.java.com/en/); download Ωnyx itself from
http://onyx.brandmaier.de/download/
To run Ωnyx, double-click on the jar file. This brings up the main window of Ωnyx.
To load the dataset into Ωnyx, either select the following menu
option or simply drag the data file from the file browser window.
File->load data
To add one or more variables to your model window simply drag them
from the dataset panel to the model panel. You can select more than
one using either the shift (continuous blocks) or CTRL keys
(discontinuous blocks).
Ωnyx gives each graphical representation of the variable the same name
as the variable in the dataset.
You can move and reshape the elements by selecting and then clicking
and dragging.
Similarly, you can move the model panel by dragging its top or bottom border (a hand
appears) or resize it by dragging the lower right-hand corner.
We are going to create a type of SEM model known as a Confirmatory Factor Analysis (CFA),
and to interpret the findings of such an analysis it is best to have the variables standardised.
Luckily, you don't need to convert the variables, as Ωnyx can do it for you.
The error variance for each variable is often simply labelled E1 to En, where n is the number of variables in the model requiring error variances.

[Screenshot: the Ωnyx window, showing the model panel, dataset panel, selection indicators panel and resize area.]

To create a latent variable, use the menu option:
Create Variable->Latent
I repeat this several times to create the model shown on the left.
Important: note that the arrows go from the latent variables to the
observed ones.
The only other path to add is that for the correlation (i.e. covariance)
between the two latent variables.
Select one of the latent variables by left mouse clicking on it and hold
down the shift key + right mouse click then drag to target
variable.
Select the path (left mouse click) then right mouse click on
it to bring up the menu and select the following menu
option:
Free Parameter
You need to carry out this process for all the paths
directed from the observed variables to the latent ones as
well. Once again, you can check to see which ones are
fixed or allowed to be estimated by hovering over each
path.
Uniqueness (u2) – for each observed variable this is the portion of the variable that cannot be
predicted from the other variables (i.e. the latent variables). As the communality can be
interpreted as the % of the variability that is predicted by the model, we can say that
uniqueness is the % of variability in a specific observed variable that is NOT predicted by the
model. This means that we want this value for each observed variable to be as low as
possible: the lower the better.
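The relationship between a standardised loading, the communality and the uniqueness is simple arithmetic; the loading value below is hypothetical.

```python
# The relationship stated above: for an observed variable with a single
# loading, the communality h2 is the squared standardised loading, and
# the uniqueness u2 is whatever is left over, u2 = 1 - h2.

loading = 0.8                      # hypothetical standardised factor loading
h2 = loading ** 2                  # communality: variability predicted by the model
u2 = 1 - h2                        # uniqueness: variability NOT predicted
print(round(h2, 2), round(u2, 2))  # 0.64 0.36
```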
[Annotations: u2 = uniqueness (= 1 - h2); the standardised value is the factor loading; when there is a single line from an indicator, the squared value is the same as the communality h2; all variables are centred.]
The results are shown above. It is important to realise that Ωnyx calculates the results 200
times; the final part of the estimate summary provides details of these runs, from which
we can see that luckily the results stabilised.
Because the above model is very similar to that presented in the factor analysis chapter, mimicking most closely the Promax rotation result, we can interpret the paths between the latent variables and the observed variables as communality values and the error terms as being the measurement errors. The slight differences between these values and those in the factor analysis chapter, where u2i + h2i = 1, arise because here we have constrained several of the paths to zero (i.e. there is no path between lozenges and verbal). This is known as a Confirmatory Factor Analysis (CFA) approach, in contrast to the previous chapter where we followed what is known as an Exploratory Factor Analysis (EFA) approach.
Graphically, to explicitly specify the means in a SEM model we add a special graphical
symbol, a triangle, to the model and then draw lines from it pointing to those variables
for which you want the means to be included in the model. Adding this symbol implicitly
sets the menu option:
For further details see the online chapters at the book's website:
An Introduction to Regression Modelling using Structural Equation Modelling
(SEM)
and
65.14.1 Lavaan
I will demonstrate the use of Lavaan using the code generated by Ωnyx for the
Confirmatory Factor Analysis (CFA) described above. You first need to obtain
and load the data to go with the model into R. Copy the following code into
the R Console window.
hozdata<- read.table("http://www.robin-beaumont.co.uk/virtualclassroom/book2data/grnt_fem.dat",
sep="\t", header=TRUE); names(hozdata)
The Ωnyx-generated code, with my annotations added as comments, is shown below. The read.table() line expects you to specify the data filename (text file) you wish to use for the model by editing DATAFILENAME, or you can remove the line; I usually remove it (see below). The library(lavaan) line loads the lavaan package; we have already done that, so you can remove this line too.

library(lavaan);
dat <- read.table(DATAFILENAME, header = TRUE);

model <- "         # the double (or single) quote is important: it begins the lavaan model specification
! regressions      # a comment line; you could use the # symbol instead
visual=~visperc
visual=~cubes
visual=~lozenges
reading=~paragrap
reading=~sentence
reading=~wordmean
! residuals, variances and covariances
visperc ~~ visperc
cubes ~~ cubes
lozenges ~~ lozenges
paragrap ~~ paragrap
sentence ~~ sentence
wordmean ~~ wordmean
visual ~~ 1.0*visual
reading ~~ 1.0*reading
visual ~~ reading
";                 # the closing quote indicates we have finished specifying the model; follow it with a semicolon (;) or a blank line

result <- lavaan(model, data=dat, fixed.x=FALSE);   # send the model definition to the lavaan() function; specify the dataframe here by changing data=dat to data=hozdata
summary(result, fit.measures=TRUE);                 # produce a set of summary results for our model, including fit indices

The lavaan operators used above:

=~  Latent variable (continuous variables) definitions. The name of the latent variable is on the left of the "=~" operator, while the terms on the right, separated by "+" operators, are the indicators of the latent variable. Alternatively, as here, each indicator can be given on a separate line. Read "=~" as "is measured by".

~~  Variances and covariances. The "~~" ('double tilde') operator specifies the (residual) variances of an observed or latent variable, or a set of covariances between one variable and several other variables. Several variables, separated by "+" operators, can appear on the right; this way several pairwise (co)variances involving the same left-hand variable can be expressed in a single expression. The distinction between variances and residual variances is made automatically. There is only one covariance in this model (visual ~~ reading).

*   Terms before "*" are referred to as modifiers: 1* sets the variance to one, making it a fixed parameter.

~   (not used in the above model) Regression definitions. The output (dependent) variable is on the left of a "~" operator and the input (independent) variables, separated by "+" operators, are on the right.
To gain further information about the parameter estimates we use the parameterEstimates() function, which also provides confidence intervals, given below. I have shaded out two irrelevant columns.

parameterEstimates(result, standardized = TRUE)

[The standardized estimate columns show either a standardised path value, a covariance or a variance.]
1. Adding paths - Lagrange multiplier tests (LM) provide information about how much
better the model would fit (i.e. how much the χ2 statistic would be reduced) if a particular
parameter were added to the model; that is, if the parameter were allowed to be
estimated rather than being constrained/fixed.
2. Removing paths - Wald tests (backward search) indicate which parameters can be
dropped without affecting model fit.
The lavaan package takes the first approach with the modificationIndices() function; applied to
our model, modificationIndices(result), the output is shown below.
[Output annotations: the mi column gives the expected improvement in model fit if the parameter were added; the Expected Parameter Change (EPC) gives the estimated parameter value if it were added to the model; a standardized EPC value is also given.]

While blindly looking through the indices and deciding to 'improve the model' purely by adding one or two paths with the highest mi values might be tempting, this is not really the best strategy, as the modelling process should also be informed by knowledge of the research area.
change_results <- modificationIndices(result)
resid(result, type="raw")
resid(result, type="normalized")
resid(result, type="standardized")   # like z scores

effects(hozmodel1)                   # gives total/direct/indirect effects
residuals(hozmodel1)
standardizedResiduals(hozmodel1)
vcov(hozmodel1)                      # covariances amongst all parameters
I have not shown all the results for the above options as
they would produce results very similar to those shown previously.
Another useful option in the sem() function is objective=objectiveGLS, which instructs
the program to produce generalized least squares estimates instead of the default maximum
likelihood ones. This can be useful if there is a problem with the initial estimation process.
Secondly, you can get the sem() function to provide details of each iteration of the
estimation process by adding the option debug=TRUE. The use of these options is shown
below.
The modIndices() function in the sem package provides modification indices (the same as the mi measures in the lavaan package), here applied to hozmodel2. By setting n.largest=n you will obtain the n largest values; the default is 5, as shown opposite.

summary(hozmodel3)
I personally prefer to use the sem() function as you then know exactly what you are getting.
We have now considered creating SEM models directly in Ωnyx, specifying them initially
in Ωnyx and then repeating the analysis in R using the automatically generated code, and finally
specifying the model directly in R. I deliberately kept the example as simple as possible but
now it is time to see a more typical example to demonstrate some of the representative
characteristics of an SEM analysis.
The latest version of Ωnyx (v1.00) provides an export to the sem package, producing the first listing below; as with the previous export code you need to edit it by adding the dataframe you wish to use (the edited version is the second listing). Also, the first call to the sem() function seems to create an error for this particular model.

Ωnyx export:

require("sem");
paths <- c("Education <-> Education", "Popden <-> Popden",
  "Nonwhite <-> Nonwhite", "Mortality <-> Mortality",
  "Education -> Mortality", "Nonwhite -> Mortality",
  "Nonwhite <-> Popden", "Education <-> Popden",
  "Education <-> Nonwhite", "Popden -> Mortality")
parameter <- c("VAR_Education", "VAR_Popden", "VAR_Nonwhite",
  "VAR_Mortality", "Education_TO_Mortality", "Nonwhite_TO_Mortality",
  "COV_Nonwhite_Popden", "COV_Education_Popden",
  "COV_Education_Nonwhite", "Popden_TO_Mortality")
model <- array(c(paths, parameter, values), dim = c(10,3))
colnames(model) <- c("col1","col2","col3")
dat <- read.table(DATAFILENAME, header = TRUE)
result <- sem(model = model, data = dat, fixed.x = "Intercept", raw = TRUE)
result <- sem(model = model, data = dat)

Edited:

require("sem");
paths <- c("Education <-> Education", "Popden <-> Popden",
  "Nonwhite <-> Nonwhite", "Mortality <-> Mortality",
  "Education -> Mortality", "Nonwhite -> Mortality",
  "Nonwhite <-> Popden", "Education <-> Popden",
  "Education <-> Nonwhite", "Popden -> Mortality")
parameter <- c("VAR_Education", "VAR_Popden", "VAR_Nonwhite",
  "VAR_Mortality", "Education_TO_Mortality", "Nonwhite_TO_Mortality",
  "COV_Nonwhite_Popden", "COV_Education_Popden",
  "COV_Education_Nonwhite", "Popden_TO_Mortality")
model <- array(c(paths, parameter, values), dim = c(10,3))
colnames(model) <- c("col1","col2","col3")
dat <- hozdata
result <- sem(model = model, data = dat)
summary(result)
standardizedCoefficients(result)
There is a lot to take in concerning this complex model. Possible key points are:
1. Parental reports of tobacco, alcohol and cannabis use are correlated in contrast to self-
reported cannabis use, which does not correlate with parental or other self reported
measures.
2. Self-reported measures of tobacco, alcohol and cannabis use have higher standardised
factor loadings (.77 to .94) compared to parental values ( .46 to .79). Remember the
higher the loading the greater the degree to which the measure is reflected in the factor.
3. Relationship between tobacco, alcohol and cannabis use (latent variables) and
vulnerability - the standardised factor loading values here (.66 to .79) indicate a strong
relationship between the three specific latent variables and the general vulnerability
latent variable.
5. The asterisks (*) beside the parameter estimates indicate that the values are statistically
significant.
6. The overall model fit measures indicate a well-fitting model with CFI = 0.99 (good fit
CFI ≥ 0.95) and RMSEA = 0.06 (good fit RMSEA ≤ .10).
• Non-normal distributions
• Bayesian approaches
• Ordinal and binary data, mimicking factorial ANOVA and Logistic regression.
• Time series (also called latent growth models)
• Comparisons of multiple groups (means)
• moderation & mediation effects (see
https://www.youtube.com/watch?v=mirI5ETQRTA)
• Meta-analysis (Cheung 2015)
A good introduction to the above techniques, except moderation and mediation, can be
found in Schumacker & Lomax, 2010.
1. Data Preparation
1. Have you adequately described the population from which the random sample data was drawn?
2. Did you report the measurement level of your variables?
3. Did you report the descriptive statistics on your variables? [+ normality]
4. Did you test for multivariate normality?
5. Did you create a table with correlations, means, and standard deviations?
6. Did you have missing data? If so, why, and what strategy did you use to overcome the problem?
7. Did you have outliers and did you resolve this by using robust statistics or deletion methods?
8. Did you resolve non-normality of a variable by some form of transformation?
9. Did you have problems with multi-collinearity among variables? If so how did you resolve it?
10. Did you specify the input data used (raw, correlations, covariances etc.)?
11. Did you include the set of commands as an appendix so that others could carry out a similar analysis?
2. Model Specification
1. Did you provide a rationale for your study?
2. Did you explain why SEM rather than another approach was required?
3. Did you describe your latent variables?
4. Did you provide a theoretical foundation for your model?
5. Did you theoretically justify alternative models?
6. Did you justify your sample size?
7. Did you clearly state your statistical hypotheses?
8. Did you discuss the expected magnitude and direction of expected parameter estimates?
9. Did you include a figure or diagram of your proposed model(s)?
4. Model Estimation
1. How will you consider power and sample size?
2. What is the ratio of sample size to number of parameters?
3. What estimation technique is most suitable given your sample size and normality situation?
4. Did you encounter Heywood cases (negative variance) or other impossible values in the output?
5. Did the software use raw data or a matrix as input?
6. How did you scale the latent variable (reference variable, or fixing the variance of the latent variable)?
7. Which SEM program and version did you use?
8. Did you encounter any convergence problems?
9. Did you report the R2 values to indicate the total effect the independent variables had on each dependent variable?
5. Model Testing
1. Did you include a website providing more information?
2. Did you provide tables (American Psychological Association style, Schumacker p. 241) giving details for each factor (i.e. reliability rho) and, for each
indicator, loading and R2?
3. Did you use single items or composite scale scores?
Several other issues in the original list are not considered here.
6. Model Modification
1. Did you compare alternative or equivalent models?
2. Did you clearly indicate how you modified the initial model?
3. Did you provide a theoretical justification for the respecified model?
4. Did you add or delete one path/ parameter at a time? [recommended]
5. Which statistics helped guide you through the process (Wald, Lagrange [good] or simple t tests [bad])?
6. Did you provide parameter estimates and model fit indices for both the initial model and the respecified one(s)?
7. Did you report expected change statistics?
8. Your model is not the only model that fits the sample data, so did you check for equivalent models or theoretically justify your final model?
9. How did you evaluate and select the best model?
7. Model Validation
1. Did you replicate your SEM model analysis using another sample of data?
2. Did you cross-validate your SEM model by splitting your original sample of data?
3. Did you use bootstrapping [simulate the samples] to determine bias in your parameter estimates?
4. Did you report effect sizes and confidence intervals in addition to statistical significance testing?
5. Did you evaluate your results with regard to your original theoretical framework?
http://www.guilford.com/cgi-bin/cartscript.cgi?page=etc/kline.html
There are many websites devoted to SEM techniques. Professor Jason Newsom, at Portland
State University, runs an excellent course on SEM with online resources and links to several
SEM sites at: www.upa.pdx.edu/IOA/newsom . Another is David A. Kenny (Emeritus Professor,
University of Connecticut, Department of Psychology), who keeps a useful, succinct webpage
listing and explaining most of the model fit indices: http://davidakenny.net/cm/fit.htm
There are also the webpages associated with each of the R packages used in this chapter, which
provide further details and examples. The Ωnyx website provides some information which
hopefully will be developed in future.
Finally, I have augmented the material presented in this chapter with several online chapters
and YouTube videos, all available from the book's website (http://robin-
beaumont.co.uk/rbook/sem/index.html), two of which are: