R-PLS Path Modeling Example
Example: Index of Success

For this example the goal will be to obtain an Index of Success using data of Spanish
professional football teams. The data comes from the professional Spanish Football League, La
Liga, and it consists of 14 variables measured on 20 teams. The data was collected and
manually curated for the 2008-2009 season from different websites like LeagueDay.com,
BDFutbol, ceroacero.es, statto.com and LFP. The resulting dataset comes with the package
plspm under the name spainfoot. To get access to the data, you must first load the package
plspm and then use the function data() in the R console:
# load package plspm
library(plspm)

# load data spainfoot
data(spainfoot)

To get an idea of what the data looks like we can use the head() function, which will show
us the first n rows in spainfoot:
# first 5 rows of spainfoot
head(spainfoot, n = 5)
##            GSH GSA  SSH  SSA GCH GCA  CSH  CSA WMH WMA LWR LRWL
## Barcelona   61  44 0.95 0.95  14  21 0.47 0.32  14  13  10   22
## RealMadrid  49  34 1.00 0.84  29  23 0.37 0.37  14  11  10   18
## Sevilla     28  26 0.74 0.74  20  19 0.42 0.53  11  10   4    7
## AtleMadrid  47  33 0.95 0.84  23  34 0.37 0.16  13   7   6    9
## Villarreal  33  28 0.84 0.68  25  29 0.26 0.16  12   6   5   11
##             YC RC
## Barcelona   76  6
## RealMadrid 115  9
## Sevilla    100  8
## AtleMadrid 116  5
## Villarreal 102  5

The description of each variable is given in the following table:

Table 1: Description of variables in data spainfoot

Variable  Description
GSH       total number of goals scored at home
GSA       total number of goals scored away
SSH       percentage of matches with scored goals at home
SSA       percentage of matches with scored goals away
GCH       total number of goals conceded at home
GCA       total number of goals conceded away
CSH       percentage of matches with no conceded goals at home
CSA       percentage of matches with no conceded goals away
WMH       total number of won matches at home
WMA       total number of won matches away
LWR       longest run of won matches
LRWL      longest run of matches without losing
YC        total number of yellow cards
RC        total number of red cards

Example model

As is typical with structural models, we usually rely on some kind of theory to propose a
model. It can be a very complex theory or an extremely simple one. In this case we are not
going to reinvent the wheel, so let’s define a simple model based on a basic yet useful theory:

the better the quality of the Attack, as well as the quality of the Defense,
the more Success.

The simple theory involves two hypotheses. In one of them we are supposing that if a team
improves its attack, it should be more successful and hence win more matches. The other
hypothesis is that if a team improves its defense, it should also be more successful, or at least
it should avoid losing matches.
The theory could also be expressed in a more abstract form like this:

Success = f(Attack, Defense)

This is simply a conceptual means to say that Success is a function of Attack and Defense.
But we could go further by specifying a linear function and expressing the theory with an
equation like this one:
Success = b1 Attack + b2 Defense
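
To make the linear form concrete, here is a minimal sketch with simulated data (not the football data). The values chosen for b1 and b2 are hypothetical, and the variables are treated as if they were directly observed, which is not the case for latent variables in PLS-PM. It only illustrates that, under a linear model, the coefficients can be recovered by least squares:

```r
# Simulated illustration of Success = b1 Attack + b2 Defense (hypothetical values)
set.seed(123)                 # for reproducibility
n = 200                       # number of simulated "teams"
attack = rnorm(n)             # simulated attack quality
defense = rnorm(n)            # simulated defense quality
b1 = 0.7                      # hypothetical coefficient for Attack
b2 = 0.3                      # hypothetical coefficient for Defense

# success is a linear function of attack and defense plus a small error
success = b1 * attack + b2 * defense + rnorm(n, sd = 0.1)

# least squares recovers values close to b1 and b2
fit = lm(success ~ attack + defense)
round(coef(fit), 2)
```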

In addition to expressing the model in text and mathematical format, we can also display
the model in a graphical format using what is called a path diagram (this is why it is called
PLS path modeling). These diagrams help us represent in a visual way the relationships
stated in the model. In this example, the following diagram depicts Success
depending on the quality of the Attack as well as on the quality of the Defense:

Attack -> Success <- Defense

Figure 1: Diagram depicting the simple model

Measuring Success, Attack and Defense

A model was proposed in which the Overall Success depends on the Quality of the Attack as well
as on the Quality of the Defense. These are the three latent variables. Now we need
to establish a set of indicators for each of the three constructs.

Block of Attack. If you check the available variables in the data spainfoot, you will see
that the first four columns have to do with scored goals, which in turn can be considered to
reflect the Attack of a team. We are going to take those variables as indicators of Attack:

• GSH number of goals scored at home
• GSA number of goals scored away
• SSH percentage of matches with scored goals at home
• SSA percentage of matches with scored goals away

Block of Defense. The following four columns in the data (from 5 to 8) have to do with
the Defense construct:

• GCH number of goals conceded at home
• GCA number of goals conceded away
• CSH percentage of matches with no conceded goals at home
• CSA percentage of matches with no conceded goals away

Block of Success. Finally, columns 9 to 12 can be grouped in a third block of variables,
the block associated with Success:

• WMH number of won matches at home
• WMA number of won matches away
• LWR longest run of won matches
• LRWL longest run of matches without losing

Path Model

The outer model is the part of the model that has to do with the relationships between each latent
variable and its block of indicators.

Figure 2: Outer model: relationships between each construct and its indicators

Preparing the ingredients for plspm()


Inner model matrix

# rows of the inner model matrix
Attack = c(0, 0, 0)
Defense = c(0, 0, 0)
Success = c(1, 1, 0)

# path matrix created by row binding
foot_path = rbind(Attack, Defense, Success)

# add column names (optional)
colnames(foot_path) = rownames(foot_path)

To see what the inner matrix looks like, just type its name in the R console:
# let's see it
foot_path
##         Attack Defense Success
## Attack       0       0       0
## Defense      0       0       0
## Success      1       1       0

The way in which you should read this matrix is by “columns affecting rows”. A number one
in the cell i, j (i-th row and j-th column) means that column j affects row i. For instance,
the one in the cell 3,1 means that Attack affects Success. The zeros in the diagonal of the
matrix mean that a latent variable cannot affect itself. The zeros above the diagonal reflect
the fact that PLS-PM only works with recursive models (no loops in the inner model), so the
path matrix must be lower triangular.
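
The “columns affecting rows” reading can be checked with a few lines of base R. The following is a minimal sketch that rebuilds the same path matrix, lists every path it encodes, and verifies that nothing sits on or above the diagonal. The names idx, paths and is_lower_tri are introduced here only for illustration:

```r
# rebuild the path matrix defined in the text
Attack = c(0, 0, 0)
Defense = c(0, 0, 0)
Success = c(1, 1, 0)
foot_path = rbind(Attack, Defense, Success)
colnames(foot_path) = rownames(foot_path)

# locate the ones: each is a path from the column variable to the row variable
idx = which(foot_path == 1, arr.ind = TRUE)
paths = paste(colnames(foot_path)[idx[, "col"]], "->",
              rownames(foot_path)[idx[, "row"]])
paths

# recursive model: no nonzero entries on or above the diagonal
is_lower_tri = all(foot_path[upper.tri(foot_path, diag = TRUE)] == 0)
is_lower_tri
```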

A nice feature available in plspm is the function innerplot() that allows us to visualize the
inner matrix in a path diagram format. This is especially useful when we want to visually
inspect the model defined by the path matrix.
# plot the path matrix
innerplot(foot_path)

Outer model list

# define list of indicators: what variables are associated with
# what latent variables
foot_blocks = list(1:4, 5:8, 9:12)
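
Column positions are easy to get wrong, so it can help to write the blocks by variable name and derive the positions from them. The sketch below assumes the column order shown in the head() output; the names vars, foot_blocks_named and foot_blocks_idx are hypothetical. It uses base R only, so it does not depend on how plspm itself handles names:

```r
# column names of spainfoot, copied from the head() output in the text
vars = c("GSH", "GSA", "SSH", "SSA", "GCH", "GCA", "CSH", "CSA",
         "WMH", "WMA", "LWR", "LRWL", "YC", "RC")

# blocks written by variable name (hypothetical object name)
foot_blocks_named = list(
  Attack  = c("GSH", "GSA", "SSH", "SSA"),
  Defense = c("GCH", "GCA", "CSH", "CSA"),
  Success = c("WMH", "WMA", "LWR", "LRWL")
)

# translate each name into its column position
foot_blocks_idx = lapply(foot_blocks_named, match, table = vars)
foot_blocks_idx
```

The resulting positions should agree with the index-based list used in the text; a mismatch or an NA would point to a misspelled variable name.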

Vector of modes

# all latent variables are measured in a reflective way
foot_modes = c("A", "A", "A")

An alternative type of measurement is the formative mode, known as mode B. So, if you had
a latent variable in mode B, say Success, you would have to specify this as follows:
# Success in formative mode B
foot_modes2 = c("A", "A", "B")
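
The three ingredients must agree with each other: one block and one mode per latent variable, in the same order as the rows of the path matrix. As a minimal sketch, the helper below (check_ingredients() is a hypothetical function, not part of plspm) performs these consistency checks before the model is run. It only allows the two modes discussed here:

```r
# rebuild the three ingredients exactly as defined in the text
Attack = c(0, 0, 0)
Defense = c(0, 0, 0)
Success = c(1, 1, 0)
foot_path = rbind(Attack, Defense, Success)
colnames(foot_path) = rownames(foot_path)
foot_blocks = list(1:4, 5:8, 9:12)
foot_modes = c("A", "A", "A")

# hypothetical helper: stops with an error if the ingredients disagree
check_ingredients = function(path, blocks, modes) {
  stopifnot(
    nrow(path) == ncol(path),         # square path matrix
    length(blocks) == nrow(path),     # one block per latent variable
    length(modes) == nrow(path),      # one mode per latent variable
    all(modes %in% c("A", "B"))       # only the modes discussed in the text
  )
  invisible(TRUE)
}

ok = check_ingredients(foot_path, foot_blocks, foot_modes)
```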

Running plspm()

The default usage of the function is:
plspm(Data, path_matrix, blocks, modes = NULL)

# run plspm analysis
foot_pls = plspm(spainfoot, foot_path, foot_blocks, modes = foot_modes)

Here is how you would normally do all the previous steps in a single piece of code:
# rows of the path matrix
Attack = c(0, 0, 0)
Defense = c(0, 0, 0)
Success = c(1, 1, 0)

# path matrix (inner model)
foot_path = rbind(Attack, Defense, Success)

# add column names
colnames(foot_path) = rownames(foot_path)

# blocks of indicators (outer model)
foot_blocks = list(1:4, 5:8, 9:12)

# vector of modes (reflective)
foot_modes = c("A", "A", "A")

# run plspm analysis
foot_pls = plspm(spainfoot, foot_path, foot_blocks, modes = foot_modes)

plspm() output

# what's in foot_pls?
foot_pls
## Partial Least Squares Path Modeling (PLS-PM)
## ---------------------------------------------
## NAME DESCRIPTION
## 1 $outer_model outer model
## 2 $inner_model inner model
## 3 $path_coefs path coefficients matrix
## 4 $scores latent variable scores
## 5 $crossloadings cross-loadings
## 6 $inner_summary summary inner model
## 7 $effects total effects
## 8 $unidim unidimensionality
## 9 $gof goodness-of-fit
## 10 $boot bootstrap results
## 11 $data data matrix
## ---------------------------------------------
## You can also use the function 'summary'

How do you know what class of object foot_pls is? Use the function class() to get the
answer:
# what class of object is foot_pls?
class(foot_pls)

# path coefficients
foot_pls$path_coefs
##         Attack Defense Success
## Attack  0.0000  0.0000       0
## Defense 0.0000  0.0000       0
## Success 0.7573 -0.2836       0

In the same manner, if you want to check the inner model results contained in $inner_model,
just type:
# inner model
foot_pls$inner_model
## $Success
##             Estimate Std. Error    t value  Pr(>|t|)
## Intercept -1.998e-16    0.09218 -2.167e-15 1.000e+00
## Attack     7.573e-01    0.10440  7.253e+00 1.349e-06
## Defense   -2.836e-01    0.10440 -2.717e+00 1.466e-02

# summarized results
summary(foot_pls)

Plotting results
# plotting results (inner model)
plot(foot_pls)

In order to check the results of the outer model, let’s say the loadings, you need to use the
argument what (what = "loadings") of the plot() function:
# plotting loadings of the outer model
plot(foot_pls, what = "loadings", arr.width = 0.1)

Show the Index

# show me the first scores
head(foot_pls$scores, n = 5)
## Attack Defense Success
## Barcelona 2.6116 -1.74309 2.7891
## RealMadrid 1.7731 -1.13284 2.3246
## Sevilla -0.1123 -2.24651 0.5541
## AtleMadrid 1.5334 0.02392 0.7771
## Villarreal 0.2801 0.16761 0.6084
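
To see how these scores relate to the path coefficients reported earlier (0.7573 for Attack and -0.2836 for Defense), the following sketch computes the Success value implied by the linear model for the five teams above and compares it with the estimated Success score. The numbers are copied from the printed output, so small rounding differences are expected; the object names are introduced only for this illustration:

```r
# scores for the first five teams, copied from the output above
scores = rbind(
  Barcelona  = c( 2.6116, -1.74309, 2.7891),
  RealMadrid = c( 1.7731, -1.13284, 2.3246),
  Sevilla    = c(-0.1123, -2.24651, 0.5541),
  AtleMadrid = c( 1.5334,  0.02392, 0.7771),
  Villarreal = c( 0.2801,  0.16761, 0.6084)
)
colnames(scores) = c("Attack", "Defense", "Success")

# path coefficients as printed in foot_pls$path_coefs
b = c(Attack = 0.7573, Defense = -0.2836)

# Success implied by the inner model (the intercept is essentially zero)
fitted_success = drop(scores[, c("Attack", "Defense")] %*% b)
residual = scores[, "Success"] - fitted_success

round(cbind(observed = scores[, "Success"],
            fitted = fitted_success,
            residual = residual), 3)
```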

# show me the last scores
tail(foot_pls$scores, n = 5)
## Attack Defense Success
## Valladolid -0.7615 0.4655 -0.7458
## Getafe 0.4188 0.6726 -0.9608
## Betis -0.4853 0.5492 -0.5288
## Numancia -1.4872 1.2356 -1.1348
## Recreativo -1.2255 0.7037 -0.9807
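
Since the Success score plays the role of the index, a natural next step is to rank teams by it. The sketch below does so for the ten teams whose scores are printed above (the other ten teams are not shown in this example, so the ranking is partial). The vector success and the name ranking are introduced here for illustration:

```r
# Success scores copied from the head() and tail() output above
success = c(Barcelona = 2.7891, RealMadrid = 2.3246, Sevilla = 0.5541,
            AtleMadrid = 0.7771, Villarreal = 0.6084, Valladolid = -0.7458,
            Getafe = -0.9608, Betis = -0.5288, Numancia = -1.1348,
            Recreativo = -0.9807)

# order teams from highest to lowest Success score
ranking = sort(success, decreasing = TRUE)
ranking
```

With the fitted object at hand, the same idea should apply to all 20 teams by sorting the Success column of foot_pls$scores.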
