
The Unscrambler

Tutorials

By CAMO Process AS
This manual was produced using ComponentOne Doc-To-Help 2005 together with Microsoft
Word. Visio and Excel were used to make some of the illustrations. The screen captures were taken
with Paint Shop Pro.

Trademark Acknowledgments
Doc-To-Help is a trademark of ComponentOne LLC.
Microsoft is a registered trademark and Windows 95, Windows 98, Windows NT, Windows
2000, Windows ME, Windows XP, Excel and Word are trademarks of Microsoft Corporation.
Paint Shop Pro is a trademark of JASC, Inc.
Visio is a trademark of Shapeware Corporation.

Restrictions
Information in this manual is subject to change without notice. No part of the documents that build it
up may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any
purpose, without the express written permission of CAMO Process AS.

Software Version
This manual is up to date for version 9.5 of The Unscrambler.
Document last updated on March 17, 2006.

Copyright 1996-2006 CAMO Process AS. All rights reserved.


Contents
Tutorials
Tutorials: Read First
    What is in the Tutorials?
    How To Use the Tutorials
A Simple Example of Calibration (Tutorial A)
    Description of Tutorial A
    Starting The Unscrambler
    Tutorial A - Opening the Data File
    Tutorial A - Define Sets
    Tutorial A - Univariate Regression
    Tutorial A - Calibration
    Tutorial A - Interpret Results
    Tutorial A - Prediction
    Tutorial A - Evaluation of Predictions
    Tutorial A - Recalibration with Proper Validation
    Tutorial A - Plotting Options
Quality Analysis with PCA and PLS (Tutorial B)
    Description of Tutorial B
    Tutorial B - Insert Category Variables
    Tutorial B - Check Variable Sets
    Tutorial B - Define Sample Sets
    Tutorial B, Problem I: Find the Main Sensory Qualities
    Tutorial B - Make a PCA Model
    Tutorial B - Interpret the Residual Variance for PCA
    Tutorial B - Interpret the Variance Plot in the PCA Overview
    Tutorial B - Interpretation of the Score Plot
    Tutorial B - Interpretation of the Correlation Loadings Plot
    Tutorial B - Interpretation of Scores and Loadings
    Tutorial B - Interpretation of the Influence Plot
    Tutorial B, Problem II: Explore the Relationships between Instrumental/Chemical Data (X) and Sensory Data (Y)
    Tutorial B - Make a PLS Regression Model
    Tutorial B - Interpretation of the Warning List
    Tutorial B - Interpretation of the Variance Plot
    Tutorial B - Interpretation of the Score Plot
    Tutorial B - Interpretation of the Loadings and Loading Weights Plot
    Tutorial B - Interpretation of the Predicted vs Measured Plot
    Tutorial B, Problem III: Predict User Preference from Sensory Measurements
    Tutorial B - Make a PLS1 Regression Model
    Tutorial B - Interpretation of the Regression Overview
    Tutorial B - Interpretation of the Regression Coefficients
    Tutorial B - Open Result Matrices in the Editor
    Tutorial B - Export Unscrambler Models
    Tutorial B - Predict Preference for New Samples
    Tutorial B - Interpretation of Predicted with Deviation

The Unscrambler Tutorials Contents iii


    Tutorial B - Check The Error in Original Units (RMSEP)
Spectroscopy and Interference Problems (Tutorial C)
    Description of Tutorial C
    Tutorial C - Read Data File and Define Sets
    Tutorial C - Plot Raw Data
    Tutorial C - Univariate Regression
    Tutorial C - Calibration
    Tutorial C - Identify Outliers
    Tutorial C - Recalibration with Outliers Removed
    Tutorial C - Study the Residual Variance
    Tutorial C - Interpretation of the Calibration Model
    Tutorial C - Study the Predicted vs Measured Plot
    Tutorial C - Multiplicative Scatter Correction (MSC)
    Tutorial C - Check the Error in Original Units: Root Mean Square Error (RMSE)
    Tutorial C - Predict New MSCorrected Samples
    Tutorial C - Check List for Spectroscopy Calibration
Experimental Design: Screening and Optimization (Tutorial D)
    Description of Tutorial D
    Tutorial D - Build a Screening Design
    Tutorial D - Estimation of the Effects
    Tutorial D - Building an Optimization Design
    Tutorial D - Computation of the Response Surface
SIMCA Classification (Tutorial E)
    Description of Tutorial E
    Tutorial E - Re-format Data Table
    Tutorial E - Graphical Clustering Based on Score Plots
    Tutorial E - Make Class Models
    Tutorial E - Classify Unknown Samples
    Tutorial E - Interpretation of Classification Results
    Tutorial E - Diagnosing the Classification Model
Interacting with Other Programs (Tutorial F)
    Description of Tutorial F
    Tutorial F - Import Spectra from an ASCII File
    Tutorial F - Import Responses from Excel
    Tutorial F - Insert Category Variable
    Tutorial F - Define Sets
    Tutorial F - Make a PLS Model
    Tutorial F - Inserting Plots into Word
    Tutorial F - Export ASCII-MOD File
    Tutorial F - Export Data to ASCII File
Experimental Design: Mixture (Tutorial G)
    Description of Tutorial G
    Tutorial G - Build a Simplex Centroid Design
    Tutorial G - Import Response Values from Excel
    Tutorial G - Check Response Variations with Statistics
    Tutorial G - Model the Mixture Response Surface with PLS
Three-Way PLS Analysis of Fluorescence Spectra (Tutorial H)
    Description of Tutorial H
    Tutorial H - Toggle 3D Layouts in the 3D Editor
    Tutorial H - Plot 3D Data
    Tutorial H - Define a Primary Variable Set and a Secondary Variable Set
    Tutorial H - Build a Three-Way PLS Regression model
    Tutorial H - Find an Outlier and Recalculate
    Tutorial H - Interpret a Three-Way PLS Regression Model
Multivariate Curve Resolution of Dye Mixtures (Tutorial I)
    Description of Tutorial I



    Tutorial I - Plot Raw Data
    Tutorial I - Run MCR with Default Options
    Tutorial I - Plot MCR results
    Tutorial I - Interpret MCR results
    Tutorial I - Run MCR with Initial Guess
    Tutorial I - Validate the Estimated Results with Reference Information
    Tutorial I - Import an MCR Result Matrix
Constraint Settings in Multivariate Curve Resolution (Tutorial J)
    Description of Tutorial J
    Tutorial J - Estimate the Number of Pure Components and Detect Outliers with PCA
    Tutorial J - Run MCR With Default Settings
    Tutorial J - Tune the Model's Sensitivity to Pure Components
    Tutorial J - Run MCR with a Constraint of Closure
    Tutorial J - Remove Outliers and Noisy Wavelengths with "Recalculate"
Tutorial C - Illustrations
Tutorial D - Illustrations
Tutorial E - Illustrations
Tutorial F - Illustrations
Tutorial G - Illustrations
Tutorial H - Illustrations
Tutorial I - Illustrations
Tutorial J - Illustrations



Tutorials
Try out new methods in practice and be guided through the practical steps of experimental design, data
analysis and interpretation of results with The Unscrambler.

Tutorials: Read First


By running through the tutorials you will get a basic understanding of the capabilities of The Unscrambler, an
introduction to interpretation of results, and a feeling for the procedures of multivariate data analysis. Note,
however, that a real-world task is seldom this straightforward! Normally you must preprocess your data more
and run four, five, six or even more calibrations before you are satisfied.

Tutorial Files Delivered with The Unscrambler


The input data used in the tutorials were delivered together with the software. During installation they were
automatically stored in the directory Examples below the directory where the program has been installed.
Unless you specify otherwise when saving, all results files from the tutorials will also be stored in this
directory.

Note:
Some model names in these tutorials assume that your files are stored on a computer which runs Microsoft
Windows Operating Systems (Windows 9X, Windows 2K, Windows NT and Windows XP). Substitute the
long filenames with names that comply with the DOS 8.3 rule if The Unscrambler is installed on a file server
running Windows for Workgroups, or similar.

We suggest that you copy the data files to a safe place. This way you can always start from scratch with the
tutorials.

Read the details below to understand which tutorials are useful in your case, and get some practical advice for
running the tutorials.

What is in the Tutorials?


The tutorials present application examples and contain detailed instructions on how to use The Unscrambler.

Depending on your degree of experience in using The Unscrambler and your fields of interest, here are the
tutorials we recommend that you start with:



Summary of the Unscrambler Tutorials

Experience     Fields of Interest                                                      Tutorial

Beginner       Any                                                                     A (simple example)
Limited        Any                                                                     F (interact with other programs)
Limited        PCA, PLS; Sensory, consumer, chemical, instrumental measurements etc.   B (quality analysis)
Limited        PLS, Transformations; Spectroscopy etc.                                 C (spectroscopy and interference)
Limited        Three-way data, tri-PLS.                                                H (fluorescence emission-excitation spectra)
Limited        Experimental design, ANOVA, Response Surface; Chemistry etc.            D (screening and optimization)
Limited        Spectroscopy, analytical chemistry, curve resolution.                   I (MCR of dye mixtures)
Some practice  Classification; Biology etc.                                            E (SIMCA classification)
Some practice  Experimental design, mixtures, PLS; Food technology etc.                G (mixture design)
Some practice  Spectroscopy, analytical chemistry, curve resolution.                   J (constraint settings in MCR)

How To Use the Tutorials


Each tutorial starts with a presentation of the application example. Read carefully so as to understand the
context of the application and the nature of the data.
The next chapters of the tutorial are devoted to practical tasks. The Task section presents the task in a few
words; section How To Do It gives you detailed instructions:

Which commands to use;

How to select correct options in the dialogs;

How to interpret the results displayed on screen.



A Simple Example of Calibration (Tutorial A)
Description of Tutorial A
Context of Tutorial A
We want to measure the concentration Y of a chemical constituent a by use of conventional transmission
spectroscopy. But an interferent b is present in varying unknown quantities, and the instrument response of
b strongly overlaps that of a. (Lookup Image A001)

A001 Spectra of Constituent and Interferent
[Figure: light absorbance plotted against wavelength, showing the curves of "b" and "a" with the Blue and Red wavelengths marked]

What You Will Learn in Tutorial A


This tutorial contains the following parts:

Start The Unscrambler;

Open a data file;

Define Sets;

Univariate vs. multivariate regression;

Calibration;

Prediction;

Validation;

Regression coefficients;

Plotting options.

Tutorial A - Data Table


The data for this tutorial is stored in the file Tutor_a in the Examples directory on your PC.

Seven solutions, or samples, have known concentrations, Y, of a, and can be used as the calibration samples.
Three other samples have unknown concentrations, which should be predicted by the use of a regression
model.



The light absorbance is measured at two different wavelengths, namely Red and Blue. Red is variable 1, Blue
is variable 2, and variable 3 is the concentration of a.

Starting The Unscrambler


The first task is always to start the program and log in as a user.

Task
Start The Unscrambler and log in.

How to Do It
Start The Unscrambler by double-clicking on The Unscrambler icon or selecting The Unscrambler from the
Start menu in Windows. A list of the users that are registered in The Unscrambler is shown. (Lookup Image
A002)

A002 The Unscrambler Startup Dialog

Select yourself from the list of users and click OK. If your name does not appear, the system supervisor has to
add your name to the list of users. You are asked to enter your password before The Unscrambler is opened if
the system supervisor has set this option.

Tutorial A - Opening the Data File


You have to get data into The Unscrambler before you can start to analyze it.

Task
Read the Tutor_a data file into the Editor and view some basic statistics of the data table.

How to Do It
Use File - Open to select the file Tutor_a in the Examples directory. This directory should be below the
directory where you installed The Unscrambler. (Lookup Image A003)



A003 Open File Dialog
A004 Tutor_a data table displayed in the Editor

An Editor containing the data table is opened. (Lookup Image A004)

Some basic statistics like the Mean, Standard Deviation and Skewness of the samples and variables can be
calculated and shown in a new Editor. Select View - Sample Statistics or View - Variable Statistics. A
dialog pops up asking which part of the data table the statistics should be calculated on: (Lookup Image
A005)

A005 Sample Statistics Dialog
A006 Sample Statistics Results displayed in the Editor

Accept the default choice (All samples or All variables) and click OK. A new Editor is launched with means,
standard deviations, etc. (Lookup Image A006)
Close the Editor window with the statistics before you continue.
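
If you want to check what the statistics Editor reports, the same quantities are easy to reproduce outside The Unscrambler. The Python sketch below is an illustration only. The small table is made up, the function name describe is ours, and the estimators (standard deviation with n - 1 in the denominator, skewness as the third standardized moment) are assumptions, since this tutorial does not state which formulas The Unscrambler uses.

```python
import math

def describe(values):
    """Return (mean, standard deviation, skewness) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation, n - 1 in the denominator (assumed convention).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    # Skewness as the third standardized moment (assumed convention).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return mean, std, skew

# Made-up 3 x 2 table: rows are samples, columns are variables.
table = [[1.0, 4.0],
         [2.0, 5.0],
         [3.0, 9.0]]

# "Variable Statistics": one result per column.
variable_stats = [describe(list(column)) for column in zip(*table)]
# "Sample Statistics": one result per row.
sample_stats = [describe(row) for row in table]

for j, (mean, std, skew) in enumerate(variable_stats, start=1):
    print(f"variable {j}: mean={mean:.3f} sdev={std:.3f} skewness={skew:.3f}")
```

Statistics per variable run down the columns and statistics per sample run across the rows, which is the only difference between the two menu commands in this sketch.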

Tutorial A - Define Sets


Most of the time, you will want to work on subsets of your data table. To do this, you must define Sets for
variables and samples. One Sample Set and one Variable Set make up a virtual matrix which is used in the
analysis.



Task
Define the Variable Sets Light Absorbance and Constituent A, and the Sample Sets Calibration Samples and
Prediction Samples.

How to Do It
Choose Modify - Edit Set to launch the Set Editor. You see the list of already defined Variable Sets (which
in this case is empty). (Lookup Image A007)

A007 Set Editor Dialog (Variable Sets)
A008 New Variable Set Dialog

Press Add... to launch the New Variable Set dialog (Lookup Image A008), where you define the first
variable Set:

Name: Light Absorbance
Data Type: Non-Spectra
Interval: 1-2

You can enter the variable numbers directly in the Set Interval field, or click Select to launch an interactive
Editor where you mark the variables that belong to the Set. De-select variables you have marked by mistake by
pressing <Ctrl> while you click on the variable you want to remove from the Set.

Click OK. Back in the Set Editor, press Add... again to launch the New Variable Set dialog once more,
where you define the second variable Set:

Name: Constituent A
Data Type: Non-Spectra
Set Interval: 3

Click OK.
Change the Set type to Sample Sets by selecting Sample Sets from the drop-down list in the Set Editor.
(Lookup Image A009)



A009 Set Editor Dialog (Sample Sets)
A010 New Sample Set Dialog

Press Add... to launch the New Sample Set dialog (Lookup Image A010), where you define the following
Sample Sets in the same way as you defined the Variable Sets:

Name: Calibration Samples
Interval: 1-7

Name: Prediction Samples
Interval: 8-10
Click OK when you are finished with the Set Editor.

You will save a lot of energy in your own analyses later by defining the necessary Sets from the beginning. All
analyses and plotting will be much easier for you to set up.
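
The idea of a virtual matrix can be sketched in a few lines of Python. This is an illustration only, not a description of how The Unscrambler stores Sets internally. The helper names parse_interval and virtual_matrix and the placeholder table are hypothetical. Intervals are taken as 1-based and inclusive, as they are entered in the Set Editor.

```python
def parse_interval(spec):
    """Turn a 1-based inclusive interval such as "1-7" or "3" into 0-based indices."""
    first, _, last = spec.partition("-")
    start = int(first)
    stop = int(last) if last else start
    return list(range(start - 1, stop))

def virtual_matrix(table, sample_interval, variable_interval):
    """Pick the rows of a Sample Set and the columns of a Variable Set."""
    rows = parse_interval(sample_interval)
    cols = parse_interval(variable_interval)
    return [[table[r][c] for c in cols] for r in rows]

# Placeholder 10 x 3 table: each cell encodes its own (row, column) position,
# e.g. 703 is row 7, column 3. These are not the Tutor_a values.
table = [[100 * (r + 1) + (c + 1) for c in range(3)] for r in range(10)]

x_cal = virtual_matrix(table, "1-7", "1-2")   # Calibration Samples x Light Absorbance
y_cal = virtual_matrix(table, "1-7", "3")     # Calibration Samples x Constituent A
x_new = virtual_matrix(table, "8-10", "1-2")  # Prediction Samples x Light Absorbance

print(len(x_cal), "x", len(x_cal[0]))  # size of the calibration X matrix
```

The same four Sets are reused in every analysis that follows, which is why it pays to define them once at the start.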

Remember to save the data table before you proceed by selecting File - Save or pressing the Save button.

Tutorial A - Univariate Regression


Let us warm up to the multivariate analysis by doing some univariate regression. In other words, we will make
a 2D scatter plot.

Task
Find the regression of component a on the absorbance of red light (X1).

How to Do It
You do the regression by plotting the red light variable (X1) against component a:



Mark the two variables Red and Comp a by clicking on the column numbers (press <Ctrl> as you click to
mark the second column). Then, select Plot - 2D Scatter and choose the Set Calibration Samples in the 2D
Scatter Plot dialog. (Lookup Image A011)

We want to do the univariate regression on the calibration samples only, and the Y-values are missing in the
prediction Set.

A011: 2D Scatter Plot Dialog
A012: 2D Scatter Plot

The plot displayed here appears (Lookup Image A012), but without the trend lines. Toggle the regression
and/or target line on and off using View - Trend Lines - Regression Line/Target Line. The target line is
very useful in predicted vs. measured plots.

Statistics from the plot are shown in a special frame in the upper left corner. Toggle it on and off using View -
Plot Statistics.
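
The regression line drawn in the scatter plot is an ordinary least-squares line. The Python sketch below shows one way to compute such a line, together with the correlation coefficient. It is an illustration under stated assumptions: the function name fit_line is ours, and the seven samples are synthetic stand-ins, not the values stored in Tutor_a. In the synthetic data the interferent b is assumed to add 0.2 times its concentration to the Red absorbance.

```python
import math

def fit_line(x, y):
    """Least-squares line y = offset + slope * x, plus the correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy)
    return slope, offset, correlation

# Synthetic stand-in for the seven calibration samples (NOT the Tutor_a values).
conc_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]   # constituent a
conc_b = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]   # interferent b, varies freely
red = [a + 0.2 * b for a, b in zip(conc_a, conc_b)]  # assumed overlap at Red

slope, offset, correlation = fit_line(red, conc_a)
print(f"Comp a = {offset:.3f} + {slope:.3f} * Red   (r = {correlation:.3f})")
```

Because the interferent also absorbs at the Red wavelength, the points do not fall exactly on the line. This is the limitation of a univariate model that the multivariate calibration addresses.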

Tutorial A - Calibration
Now it is time to make the first multivariate model.

Task
Make a PLS regression model between the absorbance measurements and the concentration of a.

How to Do It
Activate the Tutor_a data table by clicking on it or selecting it from the menu Window - 1 Tutor_a. Unmark
the variables by pressing the <Esc> key.

Select Task - Regression. Use the following parameters to define the model in the Regression dialog:
(Lookup Image A013)
Method: PLS1
Samples: Calibration Samples [7]
X-variables: Light Absorbance [2]
Y-variables: Constituent A [1]
Weights: All 1.0
Validation method: Leverage Correction
Num PCs (number of components): 2
Model Size: Full



The options for the samples, X-variables, and Y-variables are placed on the different sheets in the dialog. Click
the tab to see which options apply for each sheet.

The Center Data and Issue Warnings tick-boxes should always be checked, and Add Start Noise un-
checked. This also applies to all models you make later.

A013 Regression Dialog (Leverage Correction)
A014 Information Dialog about validation methods

Leverage correction is a validation method that is quick, but may give too optimistic results. It is useful in the
first runs of modeling, and when the data table is small. Therefore we use it here and in most of the later
tutorials. You should use a more conservative validation method for your own data. When Leverage correction
is used in a PLS model, an information dialog pops up (Lookup Image A014).
Click OK to start the calibration.
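
To make the calibration settings above more concrete, the Python sketch below runs a textbook PLS1 (NIPALS) on centered data and reports the residual Y-variance for 0, 1 and 2 PCs, both as fitted and with leverage correction. It is an illustration under stated assumptions and may differ from what The Unscrambler computes internally. The leverage correction is taken in its common textbook form, where each residual f is divided by (1 - h) and h is the sample leverage. The data are synthetic and noise-free, not the Tutor_a values, and the function name is ours.

```python
import math

def pls1_residual_variance(X, y, n_components):
    """Textbook PLS1 (NIPALS). Returns a list of
    (calibration variance, leverage-corrected variance) for 0..n_components PCs."""
    n, k = len(X), len(X[0])
    # Centering: PC 0 corresponds to the mean of the data.
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(k)] for row in X]
    f = [yi - y_mean for yi in y]
    leverage = [1.0 / n] * n
    curve = []

    def record():
        cal = sum(r * r for r in f) / n
        val = sum((r / (1.0 - h)) ** 2 for r, h in zip(f, leverage)) / n
        curve.append((cal, val))

    record()
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(k)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:          # nothing left to model
            break
        w = [v / norm for v in w]                                       # loading weights
        t = [sum(E[i][j] * w[j] for j in range(k)) for i in range(n)]   # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]  # deflate X
        f = [f[i] - t[i] * q for i in range(n)]                            # deflate y
        leverage = [h + ti * ti / tt for h, ti in zip(leverage, t)]
        record()
    return curve

# Synthetic, noise-free stand-in for the calibration set (NOT the Tutor_a values).
conc_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
conc_b = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]
X = [[a + 0.2 * b, 0.2 * a + b] for a, b in zip(conc_a, conc_b)]  # Red, Blue

curve = pls1_residual_variance(X, conc_a, n_components=2)
for pcs, (cal, val) in enumerate(curve):
    print(f"PC {pcs}: calibration {cal:.4f}   leverage-corrected {val:.4f}")
```

With these noise-free synthetic data the residual variance falls to practically zero at two PCs. The leverage-corrected value is never smaller than the fitted one, because every residual is divided by a number below 1.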

Tutorial A - Interpret Results

Task
Interpret the residual variance curve
Display the modeling results
Study the Regression Coefficients plot



How to Do It
When you launch the calibration, the modeling starts at once. You see how the model develops in the PLS1
Regression Progress dialog (Lookup Image A015).

A015 PLS1 Regression Progress Dialog
A016 PLS1 (Leverage Correction) Regression Overview

Interpret the Y-Validation Variance Curve


The calibration screen output shows that the residual validation variance of Y is approximately zero after two
PCs. This means that the systematic variation of the calibration data is completely described by two PCs. PC 0
describes the average point in the data set.
You may notice that the residual variance increases slightly from PC 0 to PC 1. You should almost never
accept this in your daily work. It usually indicates the presence of outliers in the data, which should be
removed before going ahead. However, bear with this in this tutorial, as the main goal here is to teach you
the use of The Unscrambler. Furthermore, you will see later in this tutorial that when using a proper validation
method the residual variance curve is fine.

Display the Model Results


Press View to launch a result Viewer with the modeling results. A Viewer with four predefined plots appears,
which makes up the Regression Overview. (Lookup Image A016)
The plots most frequently used to view the model are available as predefined plots from the Plot menu as long
as the result Viewer is active.

You can also always display the modeling results of saved models from the Results menu. Select the kind of
model you want to look at and mark the model in the Results dialog.
Information about the model is available in the Information field. This is useful to answer questions like:
Which sample Set did we use to make the model? Did we remember to weight the X-variables? In the Results
dialog, the residual variance curve is displayed on a small screen to see the performance of the model.



Less frequently used results are available by launching a general Viewer from Results - General View, and then
selecting a plot type from the Plot menu. Many result matrices from your model are available and can be
plotted in the weirdest combinations of different plots.
The plots in the regression overview are discussed in later tutorials, as the results from this model are not so
interesting. Let us instead take a look at the regression coefficients.

Study the Regression Coefficients Plot


Choose Plot - Regression Coefficients. The plot in the preview screen in the dialog now fills the upper
left corner. Double-click the preview screen in the dialog to plot the regression coefficients using the whole
Viewer (Lookup Image A017). This illustrates how the Viewer is divided into several sub-views, which
hold different plots. Select the option Raw coefficients (B), as we would like to write the model equation. Then
change the number of components to two and click OK.
Note that, in our present case, the values of the regression coefficients remain unchanged when shifting from
Weighted coefficients (BW) to Raw coefficients (B). The reason is that we chose as weights All 1.0 (no
weighting) when we calibrated the model.

A017 Regression Coefficients Dialog
A018 Regression Coefficients Plot

Go to Edit - Options and select Bars in the Plot Layout field if the plot is displayed as curves. (Lookup
Image A018)
Let the mouse cursor rest over one of the bars to see which variable it is. Click once more to get the object
information window. The b-coefficient for the Red absorbance is 1.04, the b-coefficient for the Blue
absorbance is -0.208 and the offset (B0) is 1E-06, i.e. approximately zero.
The b-coefficients enable us to write the model equation relating the concentration of a to the Red and Blue
light absorbances:

Concentration of a = 1E-06 + 1.04 * Red - 0.208 * Blue
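
Once the b-coefficients are known, applying the model is simple arithmetic: multiply each absorbance by its coefficient and add the offset. Note that the Blue term is subtracted, since its coefficient is -0.208. The Python sketch below uses the coefficient values quoted above. The function name and the absorbance readings are hypothetical and serve only to show the calculation.

```python
# Coefficients as read from the Regression Coefficients plot in this tutorial.
B0 = 1e-06       # offset
B_RED = 1.04     # b-coefficient for the Red absorbance
B_BLUE = -0.208  # b-coefficient for the Blue absorbance

def predict_concentration(red, blue):
    """Apply the model equation to one sample."""
    return B0 + B_RED * red + B_BLUE * blue

# Hypothetical (Red, Blue) absorbance readings, not taken from Tutor_a.
new_samples = [(1.0, 0.0), (2.5, 1.5), (4.0, 3.0)]

for red, blue in new_samples:
    conc = predict_concentration(red, blue)
    print(f"Red={red:.2f}  Blue={blue:.2f}  ->  concentration of a = {conc:.4f}")
```

The negative Blue coefficient is how the model compensates for the interferent: the Blue reading is used to subtract the part of the Red absorbance that does not come from a.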



So far the result file has not been saved to disk. This only happens when you close the Viewer or save the
model deliberately. Select File - Save and give the result file the name Tutorial A.

Tutorial A - Prediction
The purpose of making a regression model is usually to be able to predict the response value of new
samples that are measured in the future.

Task
Use the calibration model to predict the concentration of a in the three unknown samples in the data table.

How to Do It
Use menu Window 1 Tutor_a to activate the data table, but do not close the Viewer with the regression
coefficients. Then, select Task - Predict. Use the parameters below to make the necessary specifications in
the Prediction dialog: (Lookup Image A019)
Samples: Prediction Samples [3]
X-variables: Light Absorbance [2]
Y-reference: no selection (do not include Y-reference values)
Model Name: Tutorial A
Number of Components: 2

A019 Prediction Dialog

Press Find to select the model if you do not remember its name or are not sure where it is. You may also enter
the model name directly into the field. Click OK to start the prediction.

Tutorial A - Evaluation of Predictions


At the stage of calibrating a regression model, the quality of the predictions is checked by looking at the
Predicted vs Measured plot. The predictions can be checked because some reference measurements are
available; otherwise it would not be possible to build a model.



This is not the case for the unknown samples we are now predicting. There are no reference measurements of
the concentration of a in these samples, with which to compare the predicted responses. Still, there is a way
to evaluate the quality of the predictions, because we have made a projection model.

Task
Evaluate the prediction results by looking at the plots Predicted vs. Measured from the PLS calibration stage
and Predicted with Deviation from the prediction stage.

How to Do It
First, let us look at the predictions you just made. Press View in the Prediction Progress dialog to open the
Viewer with the prediction results: (Lookup Image A020)
Save the results file under the name Tutorial A Prediction 1 before you proceed.

A020 Prediction result Viewer, first prediction
A021 Predicted vs Measured Dialog

Activate the Viewer with the regression coefficients (Window 2 Tutorial A). This is one way to go back
to the model results. Then select Plot - Predicted vs Measured and specify the following parameters in the
Predicted vs Measured dialog: (Lookup Image A021)
Plot type: Predicted vs. Measured
Y-variable: 1; Comp a
Components: 2
Samples: Calibration
Click OK.
The Predicted vs Measured plot appears. (Lookup Image A022)



A022 Trend lines Result

Use View - Trend Lines to toggle the regression and/or target line on and off.
Use View - Plot Statistics to toggle the statistics windows on and off.
You see that the prediction by the PLS model is extremely good in this case. Compare this multivariate
regression with the univariate regression result, which used variable Red only to predict Comp a (Lookup
Image A012). The correlation between predicted and measured is higher when the multivariate model, based
on both Red and Blue, is used for prediction.
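
The correlation shown in the plot statistics is the ordinary (Pearson) correlation between predicted and measured values. The sketch below computes it in plain Python to illustrate the comparison between the two models; the function name and all numbers are hypothetical and are not taken from the Tutor_a data.

```python
import math


def correlation(predicted, measured):
    """Pearson correlation between predicted and measured values."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(ss_p * ss_m)


# Hypothetical values for seven calibration samples (illustration only).
measured = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
predicted_multivariate = [1.02, 1.97, 3.01, 4.03, 4.98, 6.00, 7.01]
predicted_univariate = [1.4, 1.7, 3.5, 3.6, 5.4, 5.6, 7.3]

print(correlation(predicted_multivariate, measured))
print(correlation(predicted_univariate, measured))
```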

Tutorial A - Recalibration with Proper Validation


Validation of the models is an extremely important point in your data analysis. The validation tells you which
prediction error you can expect in future predictions made by your model. Validating with the wrong method
may give you overly optimistic error estimates. You may think the predictions are correct, while they are in
fact way off.

Task
Make a new calibration with the same parameters as last time, but change the validation method to cross
validation.

How to Do It
Activate the Editor (Window 1 Tutor_a) and select Task - Regression. Use the following parameters:
(Lookup Image A023)
Method: PLS1
Samples: Calibration Samples [7]
X-variables: Light Absorbance [2]
Y-variables: Constituent A [1]
Weights: All 1.0
Validation method: Cross Validation
Num PCs (number of components): 2

A023 Regression Dialog (Cross Validation)
A024 Cross Validation Setup Dialog



Press Setup in the Validation Method field to launch the Cross Validation Setup dialog, and choose
Random from the Method list. This assigns the samples at random to each segment. Keep seven segments
with one sample in each segment. (Lookup Image A024)
Press OK in this dialog and then OK in the regression dialog to start the calibration.
You see the residual variance results for each segment flying by in the PLS1 Regression Progress dialog
before the final results turn up. The shape of the residual variance curve is different this time. The reason is
that the residual variance is estimated in a different way. Note that the model itself is the same regardless of
which validation method you choose. The only difference is the estimate of the prediction error. This is the
reason why it is important to choose the correct validation method. The estimate of the prediction error
(validation variance) is more conservative with cross validation than with leverage correction.
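
The idea behind cross validation with one sample per segment can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses an ordinary least-squares line on one X-variable instead of PLS, the helper names are hypothetical, and the seven data points are invented. Each sample is left out in turn, the model is refitted on the remaining six, and the left-out sample is predicted.

```python
def fit_line(xs, ys):
    """Least-squares line y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def calibration_mse(xs, ys):
    """Mean squared residual when all samples are used for fitting."""
    b0, b1 = fit_line(xs, ys)
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_validation_mse(xs, ys):
    """Mean squared prediction error with one sample per segment."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]   # leave sample i out
        train_y = ys[:i] + ys[i + 1:]
        b0, b1 = fit_line(train_x, train_y)
        squared_errors.append((ys[i] - (b0 + b1 * xs[i])) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Hypothetical calibration data (seven samples), for illustration only.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
ys = [0.12, 0.19, 0.33, 0.38, 0.52, 0.61, 0.68]

print("calibration MSE:     ", calibration_mse(xs, ys))
print("cross validation MSE:", cross_validation_mse(xs, ys))
```

The final model is still fitted on all samples; only the error estimate changes, and the cross-validated estimate is never smaller than the calibration error for this kind of model.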

Tutorial A - Plotting Options


To extract information from the results we have produced, we need plots that reveal patterns which are
not obvious from the numbers alone.

Task
Use some of the plot options.

How to Do It
Press View in the PLS1 Regression Progress dialog to launch the regression overview of the latest model.
(Lookup Image A025)
Activate the Scores plot, which is the upper left plot, by clicking in it.



A025 PLS1 (Cross Validation) Regression Overview
A026 Options Dialog, Sample Grouping sheet - Groups by cross validation segments

Select Edit - Options and go to the Sample Grouping tab. Tick the box Enable Sample Grouping and
choose Cross Validation Segments in the Group By field. (Lookup Image A026)
Click OK. (Lookup Image A027)
You see that each segment from the cross validation has its own color in the score plot. In this example each
segment had only one sample, but in other cases this is a good way to see how the segments are distributed in
the whole population of samples.

A027 Scores Plot - Sample groups colored by cross validation segments
A028 Options Dialog, Sample Grouping sheet - Groups by Value of X-Variable 1

Activate the Predicted vs. Measured plot (lower right corner) and select Edit - Options. Enable sample
grouping the same way as you did before, but group by Value of Variable this time, choose X-variable 1 and
select to generate three groups. (Lookup Image A028)



Click OK. (Lookup Image A029)
Compare the plot with the data in the Editor and see that the values of X-variable 1 are used as labels for each
point. In this plot, colors are used to differentiate between Calibration results (in blue) and Validation results
(in red). Therefore the three groups of low, medium and high values of X-variable 1 are separated with
symbols.

A029 Predicted vs Measured Plot. Samples separated with symbols into three groups (1 PC)
A030 Predicted vs Measured Plot. Samples separated with symbols into three groups (2 PCs)

Use the Next Horizontal PC and Previous Horizontal PC buttons to display the active plot
Predicted vs Measured for one/two Principal Components. (Lookup Image A030)

Save this last model and give it a meaningful name, if you want to take a look at it later from Results -
Regression without remaking the whole model.



Quality Analysis with PCA and PLS (Tutorial B)
Description of Tutorial B
Context of Tutorial B
We want to analyze the quality of raspberry jam to determine which parameters are relevant for perceived
quality. We have made a sensory test with a trained taste panel and registered a number of different variables
using descriptive sensory analysis. Problem I is thus to find the main sensory quality properties for raspberry
jam.

We are also interested in finding a way to rationalize quality control, since the use of taste panels is very
costly. Therefore, we will try to find instrumental measurement variables to replace some of the sensory
testing. Problem II is thus to explore the relationships between sensory variables and chemical/instrumental
measurements.

Finally we would like to predict consumer preference for raspberry jam from descriptive sensory analysis. This
is Problem III.

What You Will Learn in Tutorial B


This tutorial contains the following parts:


Insert category variables;

Define Sets;

Decompose by PCA;

Interpret scores and loadings;

PLS regression;

Export models;

Predict response values from new samples;

Estimate regression coefficients;

Find optimal number of components;

Numerical results.

Tutorial B - Data Table


The data for this tutorial is stored in the file Tutor_b in the Examples directory on your PC. The analysis will
be based on 12 samples (objects) of jam, selected to span normal quality variations. Several observations and
measurements have been made on the samples.

Agronomic production variables
The samples are taken from four different cultivars, at three different harvesting times:
No  Name   Cultivar  Harvest time    No  Name   Cultivar  Harvest time
1   C2-H1  2         1               7   C2-H3  2         3
2   C4-H1  4         1               8   C4-H2  4         2
3   C3-H3  3         3               9   C1-H2  1         2
4   C3-H1  3         1               10  C3-H2  3         2
5   C1-H1  1         1               11  C1-H3  1         3
6   C4-H3  4         3               12  C2-H2  2         2

Note that the agronomic production variables are not used as input variables in any of the matrices, but they are
known information which is very valuable for the interpretation of the results of the data analysis. They will
be utilized as category variables later.

Variable Set Instrumental


We have measured three chemical and three instrumental variables (colorimetry):
No  Name  Method              No  Name      Method
1   L     Lightness           4   Absorban  Absorbance
2   A     green-red axis      5   Soluble   Soluble solids (%)
3   B     blue-yellow axis    6   Acidity   Titrable acidity (%)

Note that the variable numbers in this table are the numbers within the Instrumental Variable Set, not the
variable numbers in the original data table.

Variable Set Sensory


Trained panelists evaluated 12 different sensory properties, using a 1-9 point intensity scale. The entries in the
data matrix are the average ratings over all judges. The observed variables are listed hereafter:
No  Name      Type                No  Name      Type
1   Redness   Redness             7   Sourness  Sourness
2   Colour    Color intensity     8   Bitterne  Bitterness
3   Shinines  Shininess           9   Off-flav  Off-flavour
4   R.Smell   Raspberry smell     10  Juicines  Juiciness
5   R.Flav    Raspberry flavour   11  Thicknes  Viscosity/thickness
6   Sweetnes  Sweetness           12  Chew.res  Chewing resistance

Note that the variable numbers in that table are within the Sensory Variable set, and not the variable numbers
in the original data table.

Variable Set Preference
114 representative consumers tasted the 12 jam samples and gave them preference scores on a scale from 1-9.
The average over all consumers for each sample is given in the data table.

Sample Sets in Tutorial B


The data table consists of 20 samples. The first twelve samples will be used to make the model. We call them
calibration samples.
Eight other jams were tasted by the trained panelists and given a sensory rating. These samples are the eight
last samples in the table. You see that the preference and the instrumental values are set to missing (m) for
these samples, as these measurements were not done. You will use a calibration model to predict the preference
for these eight samples.

Tutorial B - Insert Category Variables


Category variables are useful to interpret patterns in the data. Here, the raspberries used to make the jam
samples come from different cultivars and were harvested at different times. These parameters are perfect
candidates for category variables.

Task
Insert two category variables Cultivar and Harvest Time.

How to Do It
Open the data file Tutor_b by selecting File - Open. (Lookup Image B001)

B001 The Tutor_b data table displayed in the Editor

Activate a cell in the first column of the table as we will insert our category variables at the beginning of the
table. Then, follow these five steps:

1. Select Edit - Insert - Category Variable. The dialog: Category Variable Wizard - Enter
Variable Name and Choosing Method pops up. (Lookup Image B002)
2. Enter Category Variable Name Cultivar and choose I want to specify the levels manually. Press
Next.
3. This launches the Specify Levels dialog, (Lookup Image B003) where you must specify the levels
of your new category variable. Use C1, C2, C3, and C4 as the level values for Cultivar. Type in the
name of the level in the Level name field and press Add to add the level; one for each cultivar.
4. Press Finish and the category variable is inserted into the Editor. (Lookup Image B004)
Note that category variable names appear in blue fonts in the Editor to distinguish them from ordinary
variables.
5. All cells are filled with m to denote missing. Enter the level for a sample by double-clicking the
category variable cell. The cell is highlighted and a drop-down list appears. Click to see the available
levels and click on the correct one. Use the arrow keys to move up and down in the list. The cultivar
(and harvest time) values can be read from the sample name.

B002 The Category Variable Wizard - Enter Variable Name and Choosing Method dialog
B003 The Category Variable Wizard - Specify Levels dialog

B004 The Tutor_b data table displayed in the Editor (with Cultivar)
B005 The Tutor_b data table displayed in the Editor (after insertion of Cultivar and Harvest Time)

Insert the category variable Harvest Time and fill in the correct Harvest Time levels by repeating the
five-step procedure above (Lookup Image B005).

Tutorial B - Check Variable Sets
In The Unscrambler, matrices are defined by Sample and Variable Sets. It is a good habit to define all Sets
before any analyses are performed.

Task
Check that the three Variable Sets Instrumental, Sensory and Preference are defined.

How to Do It
Select Modify - Edit Set to open the Set Editor. Check that the three following Variable Sets were defined:
(Lookup Image B006)
Set name: Instrumental

Data Type: Non-Spectra
Size: 6 variables
Interval: 3-8

Set name: Preference
Data Type: Non-Spectra
Size: 1 variable
Interval: 14

Set name: Sensory
Data Type: Non-Spectra
Size: 12 variables
Interval: 9-13, 15-21

B006 The Set Editor dialog with three User-defined Variable Sets

These sets were defined for you, and were saved automatically when data table Tutor_b was saved. When working
on your own data, press the Add button in the Set Editor dialog to create sets. This launches the New Variable

Set dialog. Enter the Name and Data type of your set. Press Select to launch an Editor where you can mark
the variables that belong to the Set you are defining. You may alternatively enter the set intervals directly in
the Set Interval field.

Note that the Set Sensory is not continuous, but consists of two ranges in the data table. Together, the
variables from these two ranges define variable set Sensory.
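
To make the interval notation concrete, the sketch below expands an interval string such as the one shown for the Sensory set into the individual variable numbers. It is plain Python with a hypothetical helper name and is not part of The Unscrambler; it only mirrors the notation used in the Set Editor.

```python
def parse_set_interval(text):
    """Expand a set interval such as "9-13, 15-21" into a list of variable numbers."""
    indices = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            indices.extend(range(int(start), int(end) + 1))  # ranges are inclusive
        else:
            indices.append(int(part))
    return indices


sensory = parse_set_interval("9-13, 15-21")
print(len(sensory), sensory)
```

The two ranges together give twelve variables, which matches the size listed for the Sensory set; variable 14 (Preference) is skipped.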

Tutorial B - Define Sample Sets


In The Unscrambler, matrices are defined by Sample and Variable Sets. It is a good habit to define all Sets
before any analyses are performed.

Task
Define two Sample Sets: Calibration Sam and Prediction Sam.

How to Do It
In the Set Editor dialog, change the set type to Sample Sets and define the following parameters:

Name: Calibration Sam
Set Interval: 1-12

Set Name: Prediction Sam
Set Interval: 13-20

To do this: Press the Add button in the Set Editor dialog. This launches the New Sample Set dialog. Enter
the Name of the set. Press Select to launch an Editor where you can mark the samples that belong to the Set
you are defining. You may alternatively enter the set intervals directly in the Set Interval field.

Save the data file in the Editor before you continue with the tutorial.

Tutorial B, Problem I: Find the Main Sensory Qualities


We find the main variations in the sensory measurements by decomposing the variables with Principal
Component Analysis (PCA). The data matrix is decomposed into scores, loadings and residuals. We will
interpret these results to see whether we can say something about the sensory measurements made on the jam
samples.
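
The decomposition into scores, loadings and residuals can be illustrated with a small sketch. The code below extracts the first principal component of a centered data matrix with a NIPALS-style iteration in plain Python. It is an illustration under assumptions, not The Unscrambler's implementation: the helper names and the small data matrix are hypothetical, and no weighting or validation is performed.

```python
def center_columns(X):
    """Subtract the column mean from every column."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(means))] for row in X]


def sum_of_squares(X):
    return sum(v * v for row in X for v in row)


def first_principal_component(X, max_iter=500, tol=1e-12):
    """Return (scores t, loadings p) for the first PC of a centered matrix X."""
    n, m = len(X), len(X[0])
    # Start from the column with the largest sum of squares.
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(m)]
    start = col_ss.index(max(col_ss))
    t = [X[i][start] for i in range(n)]
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        norm = sum(v * v for v in p) ** 0.5
        p = [v / norm for v in p]                      # loadings have unit length
        t_new = [sum(X[i][j] * p[j] for j in range(m)) for i in range(n)]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change < tol * sum(v * v for v in t):
            break
    return t, p


def residuals(X, t, p):
    """E = X - t p', the part of X not described by this PC."""
    return [[X[i][j] - t[i] * p[j] for j in range(len(p))] for i in range(len(t))]


# Hypothetical ratings for four samples and three variables (illustration only).
data = [[6.0, 5.5, 3.0], [4.0, 4.5, 5.0], [7.0, 6.5, 2.5], [3.0, 3.5, 6.0]]
Xc = center_columns(data)
t, p = first_principal_component(Xc)
E = residuals(Xc, t, p)
explained = 1.0 - sum_of_squares(E) / sum_of_squares(Xc)
print(f"PC1 explains {100 * explained:.1f}% of the centered variation")
```

Further components would be obtained by repeating the same step on the residual matrix E.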

Tutorial B - Make a PCA Model

Task
Make a PCA model using the Set Sensory (i.e. one data matrix is decomposed by PCA).

How to Do It
Select Task - PCA. Specify the following parameters in the Principal Component Analysis dialog:
(Lookup Image B007)

Samples: Calibration Sam [12]

Variables: Sensory [12]

Weights: All 1.0

Validation method: Cross Validation

Num PCs: 8

B007 The Principal Component Analysis dialog
B008 The Cross Validation Setup dialog

Press Setup in the Validation Method field to specify in the Cross Validation Setup dialog that Full
Cross Validation is to be used (Lookup Image B008). This validation method is more time consuming than
leverage correction, but the estimate of the residual variance is more reliable.

No weighting is used in this model, i.e. all weights are set to 1.0, to see which variables do actually vary the
most. However, sensory variables are often weighted when you investigate relationships with other variables.
The most common weighting to use is 1/SDev.

Click OK to start the PCA. You see how The Unscrambler makes a PCA model for each segment, twelve in
all. Finally, the global model is made and the residual variance curve is shown for this model.

Tutorial B - Interpret the Residual Variance for PCA

Task
Interpret the residual variance curve in the PCA Progress dialog. This displays the progress of the modeling.
The residual variance should decrease as the number of PCs in the model increases, and should be as small as
possible.

How to Do It
The residual variance decreases until PC 5 is reached. Then the residual variance increases again due to
overfitting. The important decision we face now is to select the optimal number of PCs in this model.

The lowest residual variance is found with 5 PCs, but the residual variance in a model using 3 PCs is not much
worse. A simple model is more robust than a complex one, and easier to interpret. We therefore choose to work
with a model consisting of 3 PCs.
Note that the residual variance shown here is the residual variance for X, while the regression models made in
Tutorial A and later in this tutorial show the residual variance for Y. This reflects the difference between PCA
and regression models. PCA focuses on one matrix, X, containing variables which describe the samples;
regression models like PLS focus on a second matrix, Y, which contains variables to be predicted.

Press View to take a closer look at the other model results. The residual variance turns up again in the PCA
Overview, (Lookup Image B009) which consists of four plots that reveal a lot of information.

B009 The PCA Overview

The Viewer which you are now looking at has the most common model results available for you as predefined
plots in the Plot menu. You can always get this display of your model back via the Results menu. Let us look
at the different plots in the PCA overview.

Tutorial B - Interpret the Variance Plot in the PCA Overview


The Residual Variance curve (in the lower right corner) is similar to the one displayed in the progress dialog.
The residual variance plot is excellent for the selection of the optimal number of components in the model, but
the explained variance plot is easier to understand.

Task
Change the residual variance plot to an explained variance plot.

How to Do It
Activate the lower right plot by clicking in it. Select View - Source and change this option from Residual
Variance to Explained Variance. You can also reach this menu option by right-clicking in the plot or by
using the corresponding toolbar button for explained variance. A more elaborate way of doing this is to
make the plot once again using Plot - Variances and RMSEP, but the other ways to change the plot are
faster and therefore preferred.

The residual variance is now converted to explained variance (Lookup Image B010). The information is the
same, but presented in another way. The residual variance is well suited to find the optimal number of PCs to
use in a model, while the explained variance is a better measure to tell how much of the variation in the data
the model describes.
You see that a model with 3 PCs describes almost 92% of the validated variation in the data; for calibration it
is 97%. You can get the value by clicking on the data point in the plot. Use the corresponding toolbar buttons
to switch between plotting only the calibrated variance curve, only the validated variance curve, or both.
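
The relationship between the two views can be written out explicitly. The sketch below assumes that explained variance is expressed as a percentage of the total variance, taken here as the residual variance with zero PCs. The function name and the residual variance values are hypothetical, chosen only so that three PCs give about 92% as in the text.

```python
def explained_variance_percent(residual_variances):
    """Convert residual variances (index = number of PCs) to explained variance in %.

    residual_variances[0] is the residual variance with 0 PCs, i.e. the total variance.
    """
    total = residual_variances[0]
    return [100.0 * (1.0 - r / total) for r in residual_variances]


# Hypothetical validated residual variances for 0 to 5 PCs (illustration only).
residual = [1.00, 0.45, 0.20, 0.08, 0.06, 0.05]
for n_pcs, percent in enumerate(explained_variance_percent(residual)):
    print(n_pcs, round(percent, 1))
```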

B010 The PCA Explained Variance plot B011 The PCA Scores plot

Tutorial B - Interpretation of the Score Plot


The score plot, also called a map of samples, displays information about the samples in the PCA model.

Task
Interpret the Scores plot. Use different plot options to ease interpretation.

How to Do It
The score plot shows the projected locations of the objects onto the PCs, and by studying patterns you may
find the meaning of the PCs. (Lookup Image B011) There are many patterns to be detected from score (and
loading) plots.
On the score plot, you will notice that the 12 samples are not arranged in a random way on the map. When you
move from the left to the right part of the plot, you first encounter samples harvested on time H1, then H2 and
finally H3. Moreover, if you now move from the top to the bottom, you see several C4 samples first, then C3,
then C2, and finally C1.

The category variables that were inserted into the data table will make things even clearer. Select Edit -
Options. Select the Sample Grouping tab and tick Enable Sample Grouping. Choose the following
options: (Lookup Image B012)

Separate with: Colors

Group By: Value of Variable; Levelled Variable

Markers Layout: Name
You may press Select in the Group By field to select the levelled variable that you want to use as a marker. It
launches an Editor where you can mark the category variable of your choice, for example variable 1: Cultivar.

B012 The Sample Grouping sheet in the Options dialog
B013 The Options dialog, Markers Layout Before and After adjusting the Markers Layout

Now, we are going to alter the Markers Layout a little. The sample names are entered with an underscore in
the data table. We are going to remove this underscore from the markers in the plot.
Click once in the fifth box in the Name sequence. All boxes that are ticked correspond to letters in the sample
names that will be displayed. Press the <Ctrl> key and click the third box to remove the third character (i.e. the
underscore). (Lookup Image B013)
Note:
The first click marks the beginning; the second click marks the end of a range. Make it a habit to click twice
whenever you want to mark a range of marker characters; once to mark the beginning of the range and once
again to mark the end of the range. Press the <Ctrl> key at the same time as you click a box to (de-)select a box
in the marker.
Press OK. The Scores plot is updated with the Sample grouping options. Each level of the category variable is
assigned a unique color, and the markers in the plot are displayed without the underscore. (Lookup Image
B014)

B014 PCA, Scores plot with samples grouped by colors

Try to perform a new sample grouping, this time upon category variable Harvest Time.

Tutorial B - Interpretation of the Correlation Loadings Plot


The loading plot, also called map of variables, displays information about the variables in the PCA model.
Variable correlations are even better observed when displaying the Correlation Loadings plot.

Task
Interpret variable relationships in the correlation loadings plot.

How to Do It
Activate the X-Loadings plot by clicking on it, then use menu View - Correlation Loadings or the
corresponding shortcut button. The Correlation Loadings plot is the most suitable plot for studying variable
correlations. (Lookup Image B015)
B015 PCA, Correlation Loadings (X) plot (PC1 vs PC2)

The plot shows that two variables (REDNESS and COLOUR) have an extreme position to the right of the plot
along PC1. They are close to each other, and far from the center, very close to the 100% explained variance

circle; they correlate positively. This also means that objects lying to the right of the score plot have higher
values for those two variables.

Along the vertical axis (PC2), you notice two variables lying at the top (R.SMELL and R.FLAV), opposed to
variable OFF FLAV which lies at the bottom. So we see that raspberry smell and flavor correlate positively
with each other, and negatively with off-flavor. Thus, the more you move up on the score plot, the more the
smell and flavor of the samples will be characteristic of raspberries.

Tutorial B - Interpretation of Scores and Loadings

Task
Relate Scores (samples) information to Loadings (variables) information.

How to Do It
The Scores plot and Correlation Loadings plot show that samples C2H3 and C1H3 have strong color and
redness intensities, while sample C1H2 has a strong off-flavour. Samples located in one region of the
2-vector score plot generally have high values for the variables pointing in the same direction in the loading
plot, provided that the plotted PCs describe a large portion of the variance.

PC 3 describes the variation in sweetness, bitterness and chewing resistance. Confirm this by activating the
loading plot (upper right quadrant) and selecting Plot - Loadings. Display PC 1 vs. PC 3 by changing
Vector 2 in the Components field in the Loadings dialog to 3. (Lookup Image B016)
B016 The Loadings dialog

On this new plot, the horizontal axis is unchanged (PC1) and the vertical axis is PC3. Use View -
Correlation Loadings to better interpret variable correlations along PC3.

Tutorial B - Interpretation of the Influence Plot

Task
Interpret the influence plot, which is used to look for outliers.

How to Do It
The influence plot is displayed in the lower left quadrant. The strongest outliers are placed in the upper right
corner of the plot, and have a large leverage and a high residual variance. In this particular case, we do not see
any outliers. (Lookup Image B017)
B017 PCA, Influence Plot
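
The horizontal axis of the influence plot is the sample leverage, which measures how far a sample lies from the model center along the PCs. The sketch below uses one common definition of leverage for a centered model, computed from the scores; whether The Unscrambler uses exactly this formula is an assumption here, and the helper name and score values are hypothetical.

```python
def leverages(scores):
    """Leverage of each sample from its scores (rows = samples, columns = PCs).

    Assumed definition: h_i = 1/n + sum over PCs of t_ia^2 / (t_a' t_a).
    """
    n = len(scores)
    n_pcs = len(scores[0])
    col_ss = [sum(row[a] ** 2 for row in scores) for a in range(n_pcs)]
    return [
        1.0 / n + sum(row[a] ** 2 / col_ss[a] for a in range(n_pcs))
        for row in scores
    ]


# Hypothetical scores for five samples on two PCs (illustration only).
scores = [[2.0, 0.5], [-1.0, 1.0], [-0.5, -1.5], [0.5, 0.2], [-1.0, -0.2]]
for sample_number, h in enumerate(leverages(scores), start=1):
    print(sample_number, round(h, 3))
```

A sample with both a high leverage and a high residual variance would appear in the upper right corner of the plot and deserves a closer look.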

Close the PCA overview and save the results file with the name Tutorial B PCA. Close all other Viewers you
may have open at the same time.

Tutorial B, Problem II: Explore the Relationships between Instrumental/Chemical Data (X) and Sensory Data (Y)
Would it be possible to predict the quality variations by using the instrumental measurements only? Training
and using a sensory panel is costly and time consuming. Producers of jam would be happy if they could predict
quality variations by measuring some properties by instrumental means. Our task is therefore to make a
regression model and interpret this model to see whether this goal can be achieved.

Tutorial B - Make a PLS Regression Model


In The Unscrambler, the regression between two matrices can be found with the help of several methods. Here,
we choose to use PLS, because this method utilizes the information both in X and Y.

Task
Make a PLS2 regression model that predicts the variations in sensory variables from instrumental and chemical
variables.

How to Do It
Select Task - Regression. Specify the following parameters in the Regression dialog: (Lookup Image
B018)

Method: PLS2

Samples: Calibration Sam [12]

X-variables: Instrumental [6]

Y-variables: Sensory [12]

Weights: All 1/SDev in X and Y

Validation Method: Cross Validation

Number of components: 6

B018 The Regression dialog
B019 PLS2 Regression Set Weights dialog

Press Weights to launch the Set Weights dialog. (Lookup Image B019) Press All to change the weighting
of all variables at the same time. You can also select the variables by clicking on them in the list. Remember to
hold <Ctrl> down while you select several variables. Choose the A / (Sdev +B) radio button. Use constants A
= 1 and B = 0.
We are weighting all variables by dividing them with their own standard deviations. This allows all variables
to contribute to the model, regardless of whether they have a small or large standard deviation from the outset;
what really counts is the systematic variation.
Press Update and see the weights change in the list, then click OK.
Remember to adjust the weights for both X-variables and Y-variables.
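
The effect of the A / (SDev + B) weighting can be illustrated with a short sketch. It assumes the sample standard deviation (n - 1 in the denominator), which may differ from the definition used by The Unscrambler; the helper names and the two example columns are hypothetical.

```python
import statistics


def sdev_weight(column, a=1.0, b=0.0):
    """Weight A / (SDev + B) for one variable; A = 1 and B = 0 gives 1/SDev."""
    return a / (statistics.stdev(column) + b)


def weight_column(column, a=1.0, b=0.0):
    """Multiply every value of the variable by its weight."""
    w = sdev_weight(column, a, b)
    return [w * value for value in column]


# Two hypothetical variables measured on very different scales (illustration only).
small_scale = [0.10, 0.12, 0.09, 0.15, 0.11]
large_scale = [120.0, 340.0, 210.0, 560.0, 430.0]

for name, column in (("small", small_scale), ("large", large_scale)):
    weighted = weight_column(column)
    print(name, round(statistics.stdev(column), 3), round(statistics.stdev(weighted), 3))
```

After weighting, both variables have a standard deviation of one, so neither dominates the model merely because of its measurement scale.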

Press Setup to launch the Cross Validation Setup dialog and choose Full Cross Validation as the cross
validation method. Normally it is more practical to use leverage correction in the first calibration runs to detect
outliers etc., and re-calibrate with a proper validation method (e.g. cross validation) as the last step.
Click OK in the regression dialog when you have set all parameters. The PLS2 Regression Progress
dialog shows how the different segments are being made before the final model is calibrated. The prediction
error is minimized after five PCs, but the first local minimum is 0.84 after two PCs, which we must choose to
avoid overfitting.
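
The rule of picking the first local minimum of the validation curve, rather than the global minimum, can be sketched as follows. The function name is hypothetical, and the residual variance values are invented for illustration, except for the value 0.84 at two PCs quoted above.

```python
def first_local_minimum(residual_variances):
    """Return the number of PCs at the first local minimum of the curve.

    residual_variances[k] is the validation residual variance with k PCs.
    Falls back to the global minimum if the curve has no interior local minimum.
    """
    for k in range(1, len(residual_variances) - 1):
        lower_than_previous = residual_variances[k] < residual_variances[k - 1]
        not_higher_than_next = residual_variances[k] <= residual_variances[k + 1]
        if lower_than_previous and not_higher_than_next:
            return k
    return min(range(len(residual_variances)), key=residual_variances.__getitem__)


# Hypothetical validation residual variances for 0 to 6 PCs (illustration only).
curve = [1.00, 0.90, 0.84, 0.86, 0.85, 0.80, 0.82]
print("first local minimum at", first_local_minimum(curve), "PCs")
print("global minimum at", curve.index(min(curve)), "PCs")
```

With this curve the global minimum lies at five PCs, but the first local minimum at two PCs gives the simpler and more robust model.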

Click View to look at the Regression Overview. (Lookup Image B020)


B020 The PLS2 Regression Overview

This Viewer is your gateway to your model. You can choose the most useful and common predefined result
plots, e.g. loading weights and residuals, from the Plot menu. At later stages you can always review this model
by using Results - Regression and selecting this results file.
Before we continue with the interpretation, let us take a look at the warnings that were issued during the
calibration.

Tutorial B - Interpretation of the Warning List


Many statistical tests are performed by The Unscrambler during the calibration of the model. The warnings
given in the Warning List are derived from a great number of statistical tests.

Task
Interpret the warnings given for this model in the Warning List.

How to Do It
Use Window - Warning List to display the warnings at the bottom of the screen. In this case, the warnings
relate to the variance curves and do not indicate any outliers in the data set.
You may also want to take a look at the actual tests that lead to these warnings by looking at the outliers list.
Click the Outliers button in the warning window to see the outlier tests displayed in the Outlier List dialog.
For details on how to find and identify outliers, see Tutorial C.

Tutorial B - Interpretation of the Variance Plot

Task
Interpret the variance curve, which can be shown either as residual variance, as it was in the PLS
Regression Progress dialog, or as explained variance. The two different views are useful for different
tasks.

How to Do It
The residual variance plot in the lower left corner is the same as you saw in the PLS Regression Progress
dialog. We saw that a local minimum was reached with two PCs. Now we want to look at how well each of
the first six Y-variables is described by the model. We do this by looking at the explained variance.

Activate the lower left window. Select Plot - Variances and RMSEP and use the X- or Y-variance tab,
where you specify the following parameters: (Lookup Image B021)

Variables: Y; 1-6

Samples: Validation
And check the Total box. Press OK.

B021 Variances and RMSEP dialog, X- or Y-variance sheet
B022 PLS2, Explained Validation Variance Plot displayed for the Total model and for the six individual Y-variables

Make sure that the plot shows the Explained Variance. If not, change it by selecting View - Source -
Explained Variance. (Lookup Image B022)

We concluded from the residual variance curve that two PCs were optimal. Here, we see that the variables that
are well described by the model are already well described with two PCs. About 85% of the color variation
(variables 1 and 2), and 80% of the variation in sweetness (variable 6) can be explained by a combination of
the chemical and instrumental variables.
Note that only 23% of the total Y-variance is explained by the model using two PCs.
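
Residual and explained variance are two views of the same numbers. A minimal sketch of the conversion follows, assuming the usual definition (explained variance is the share of the variance at zero PCs that is no longer left in the residuals). The function name and the residual values are hypothetical; they were chosen only so that two PCs give the 23% quoted above.

```python
def explained_variance_percent(residual_variance):
    """Convert residual variances (index = number of PCs) to explained %."""
    total = residual_variance[0]  # variance left before any PC is used
    return [100.0 * (1.0 - value / total) for value in residual_variance]


# Hypothetical residual total Y-variance after 0, 1, 2 and 3 PCs (assumption).
residual = [1.00, 0.90, 0.77, 0.78]
explained = explained_variance_percent(residual)
```

Note that a validation residual variance that rises again (as after the third PC in this made-up curve) shows up as a drop in explained variance.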

Tutorial B - Interpretation of the Score Plot


The score plot shows how the samples are related to each other.

Task
Interpret the score plot.

How to Do It
The score plot shows patterns in the samples. This is often difficult to see without some help. Use the category
variables as markers the same way you did in Tutorial B - Interpretation of the Score Plot for the PCA
model, using Edit - Options from the Scores plot, and selecting the relevant options in the Sample
Grouping tab of the Options dialog.

You see that PC 1 describes the harvesting time. Harvest time 1 is placed to the left in the plot and harvest time
3 to the right. The score plot does not reveal information about the cultivars.
A comparison with the loading plot gives more information. Try to interpret the two plots (Scores and
Loadings) together.

Tutorial B - Interpretation of the Loadings and Loading Weights Plot

The loadings plot is used together with the score plot to interpret it. Study the loading weights plot to find
correlating variables.

Task
Interpret the loadings plot.
Interpret the loading weights plot.

How to Do It
The loadings plot is located in the upper right quadrant. Activate it and select Plot - Loading Weights.
(Lookup Image B023) On the General sheet, make sure you plot both X and Y, which gives you the
loading weights for X and the loadings for Y. Plot PC1 vs. PC2 in the upper right corner.

B023 Loading Weights dialog, General sheet
B024 PLS2, X-Loading Weights and Y-Loadings Plot

Draw straight lines between the variables through the origin. Variables along the same line, far from the origin,
may be correlated. (Negatively correlated when situated on opposite sides of the origin.) (Lookup Image
B024)
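
The "straight line through the origin" rule can be expressed numerically: two variables lie on the same line when the angle between their (PC1, PC2) loading vectors is close to 0 or 180 degrees, and the distance from the origin tells how strongly the variable takes part in the model. The sketch below uses hypothetical loading values and function names; it is meant as a reading aid, not as a reproduction of the plot.

```python
import math


def loading_cosine(a, b):
    """Cosine of the angle between two (PC1, PC2) loading vectors."""
    dot = a[0] * b[0] + a[1] * b[1]
    return dot / (math.hypot(*a) * math.hypot(*b))


def distance_from_origin(a):
    """Length of a (PC1, PC2) loading vector."""
    return math.hypot(*a)


# Hypothetical loadings (assumption), not read from the tutorial plot.
loadings = {
    "Sweetness": (0.40, 0.30),
    "Acidity": (-0.38, -0.27),
    "Flavor": (0.03, -0.02),
}

cos_sweet_acid = loading_cosine(loadings["Sweetness"], loadings["Acidity"])
flavor_length = distance_from_origin(loadings["Flavor"])
```

A cosine near -1 corresponds to variables on opposite sides of the origin (possible negative correlation), a cosine near +1 to variables on the same side, and a short vector to a variable that is poorly described by these two PCs.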
It seems that the spectrophotometric color measurements (L, A, and B) are strongly negatively correlated with
color intensity and redness. Sweetness is, as expected, rather strongly negatively correlated with measured
Acidity. But the R. Flavor shows weak correlation to the PLS-factors (near origin = low PLS loadings).

We learned in Problem I that the jam quality varied with respect to color, flavor, and sweetness. But the
results so far in Problem II show that the chemical and instrumental variables mainly predict variations in color
and sweetness (which is indicated by the low explained Y-variance of Flavor). This means that we cannot
replace the Y-variable Flavor with the present set of X-variables. There is no information in the chemical and
instrumental measurements we have made that is related to the Flavor content in the jam samples.
Use of other instrumental X-variables, e.g. gas chromatographic data, could probably have increased the flavor
prediction ability of the raspberry jam data.

Tutorial B - Interpretation of the Predicted vs Measured Plot


The predicted vs. measured plot displays the predictions done during calibration.

Task
Interpret the predicted vs. measured plot.

How to Do It
The predicted vs. measured plot in the regression overview currently displays the results for the first Y-
variable. (Lookup Image B025) Use Plot - Predicted vs Measured to see how the predictions are for
other variables. Make sure to display these plots for two PCs, as this is the right number of PCs for our model.

B025 PLS2, Predicted vs Measured Plot for variable Redness, model with two PCs

Close the results Viewer and save it with the name Tutorial B Inst-Sens.

Tutorial B, Problem III: Predict User Preference from Sensory Measurements

Is it possible to use this model to predict new preference data from new sensory data, so that expensive
consumer tests can be replaced by the cheaper sensory tests? The PLS model we made previously was used for
interpretation purposes. Now we want to focus on prediction.

Tutorial B - Make a PLS1 Regression Model


First, we have to make a model which we can interpret. We still use PLS as the regression method, but we have
to change from PLS2 to PLS1 because we have only one Y-variable.

Task
Make a PLS1 regression model of the relationships between sensory data and preference.

How to Do It
From the Editor, select Task - Regression, and specify the following parameters in the Regression dialog:

Method: PLS1

Samples: Calibration Sam [12]

X-variables: Sensory [12]

Y-variables: Preference [1]

Weights: All 1/SDev in X and Y

Validation method: Full Cross Validation

Uncertainty test: on

Number of components: 6

Press Weights to launch the Set Weights dialog, and weight all variables with 1/Sdev to get them in the
same range and let them contribute equally in the modeling.
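
What the 1/SDev weighting does to the data can be sketched as follows: each variable is divided by its own standard deviation, so that every weighted variable ends up with a standard deviation of 1. The example assumes the sample standard deviation (n - 1 in the denominator); the column values and the function name are hypothetical.

```python
import statistics


def weight_by_sdev(column):
    """Scale one variable by 1/SDev (sample standard deviation assumed)."""
    sdev = statistics.stdev(column)
    return [value / sdev for value in column]


# Two hypothetical variables measured on very different scales (assumption).
small_scale = [2.0, 4.0, 6.0, 8.0]
large_scale = [200.0, 250.0, 300.0, 350.0]

weighted_small = weight_by_sdev(small_scale)
weighted_large = weight_by_sdev(large_scale)
```

After weighting, neither variable dominates the model simply because it was recorded in larger units.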
Press OK. In the PLS1 Regression Progress dialog, we see that the residual variance seems to decrease all
the time, which may lead us to think that we should use five or six PCs for predictions. Let us look at the
residual variance plot in the regression overview before we decide upon the number of PCs to use. Click View
to open the regression overview. (Lookup Image B026)

B026 PLS1 Regression Overview

Tutorial B - Interpretation of the Regression Overview

Task
Interpret the regression overview plots, which display the necessary plots to diagnose the model quickly.

How to Do It
We are mostly interested in how well the model can do the predictions. We therefore only comment on the
residual variance and the Predicted vs Measured plots.

The Residual Variance


Activate the residual variance plot by clicking on it. Use Window - Copy To - 1 to display the plot in Full
window mode. (Lookup Image B027) You see that the prediction error stops decreasing significantly after
two PCs (0.143). Therefore we will use a model with two PCs.

B027 PLS1, Residual Validation Variance Plot
B028 The Predicted vs Measured dialog

Predicted vs Measured
Activate the predicted vs. measured plot and select Plot - Predicted vs Measured. Specify the following
parameters in the Predicted vs Measured dialog: (Lookup Image B028)

Y-variable: 1

Components: 2

Samples: Validation
Press OK.
Turn on the regression line and the target line with View - Trend Lines. (Lookup Image B029)
We see that the predictions are fairly good. Some samples are not so well predicted, but the overall correlation
coefficient is good. The warnings issued are of no real consequence for this model.

B029 PLS1, Predicted vs Measured Plot with trend lines
B030 PLS1, Regression Coefficients Plot

Predicted vs Measured plot (Predicted Y against Measured Y; Model, (Y-var, PC): (PREFEREN,2)), statistics:
Elements: 12
Slope: 0.838829
Offset: 0.669368
Correlation: 0.921301
RMSEP: 0.830774
SEP: 0.855452
Bias: -0.139174
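
The statistics reported with the Predicted vs Measured plot are linked to each other. A minimal sketch follows, assuming the common definitions: Bias is the mean residual, RMSEP the root mean square residual, and SEP the standard deviation of the residuals around the bias (n - 1 in the denominator). Under these assumptions RMSEP² = Bias² + SEP²·(n - 1)/n. The function name and the sign convention (predicted minus measured) are assumptions, not taken from the manual.

```python
import math


def prediction_stats(predicted, measured):
    """Return (bias, rmsep, sep) for paired predicted and measured values."""
    n = len(measured)
    residuals = [p - m for p, m in zip(predicted, measured)]
    bias = sum(residuals) / n
    rmsep = math.sqrt(sum(r * r for r in residuals) / n)
    sep = math.sqrt(sum((r - bias) ** 2 for r in residuals) / (n - 1))
    return bias, rmsep, sep


# Check the relation with the values shown on the plot (12 elements).
n_elements = 12
reported_bias = -0.139174
reported_sep = 0.855452
rmsep_from_plot = math.sqrt(
    reported_bias ** 2 + reported_sep ** 2 * (n_elements - 1) / n_elements
)
```

Evaluating the last expression gives a value that agrees with the reported RMSEP of 0.830774 to within rounding, which suggests the plot uses these definitions.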

Tutorial B - Interpretation of the Regression Coefficients
The regression coefficients are used to calculate the response value from the X-measurements. The size of the
coefficients gives an indication of which variables have an important impact on the response variables.
There are two kinds of regression coefficients, Bw and B. The Bw coefficients are calculated from the
weighted data table and are used for interpretation. The B coefficients are calculated from the raw data table
and are used for predictions.
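
The link between the two kinds of coefficients can be illustrated with a short sketch. It assumes a centered model and 1/SDev weights on both X and Y, in which case each raw coefficient is the weighted coefficient rescaled by SDev(y)/SDev(x), and the offset B0 follows from the means. All numbers and names below are hypothetical, and whether The Unscrambler computes B exactly this way is an assumption.

```python
def raw_coefficients(bw, x_mean, x_sdev, y_mean, y_sdev):
    """Convert weighted coefficients Bw to raw coefficients (B0, B)."""
    b = [bw_j * y_sdev / s_j for bw_j, s_j in zip(bw, x_sdev)]
    b0 = y_mean - sum(b_j * m_j for b_j, m_j in zip(b, x_mean))
    return b0, b


def predict_weighted(x, bw, x_mean, x_sdev, y_mean, y_sdev):
    """Predict from weighted, centered data, then unscale the response."""
    y_weighted = sum(
        bw_j * (x_j - m_j) / s_j
        for bw_j, x_j, m_j, s_j in zip(bw, x, x_mean, x_sdev)
    )
    return y_mean + y_sdev * y_weighted


def predict_raw(x, b0, b):
    """Predict directly from raw data with B0 and B."""
    return b0 + sum(b_j * x_j for b_j, x_j in zip(b, x))


# Hypothetical two-variable model (assumption).
bw = [0.6, -0.3]
x_mean, x_sdev = [5.0, 40.0], [2.0, 10.0]
y_mean, y_sdev = 6.0, 1.5

b0, b = raw_coefficients(bw, x_mean, x_sdev, y_mean, y_sdev)
new_sample = [7.0, 35.0]
y_from_bw = predict_weighted(new_sample, bw, x_mean, x_sdev, y_mean, y_sdev)
y_from_b = predict_raw(new_sample, b0, b)
```

Both routes give the same prediction. Bw is comparable across variables because the scale has been removed, while B can be applied directly to new raw measurements.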

Task
Find which variables are important for predicting Y-variable Preference.

How to Do It
The estimated regression coefficients tell us the cumulative importance of each of the sensory variables to the
consumer preference.

Select Plot - Regression Coefficients. Double-click on the preview screen to make the plot fill the whole
Viewer. Choose the Weighted coefficients (BW) option. Specify 2 Components before you click OK.
(Lookup Image B030)

Use Edit - Options to change the layout of the plot to Bars. Then, select Edit - Mark - Significant X-
Variables Only. (Lookup Image B031)
B031 PLS1, Regression Coefficients Plot after automatic marking of significant X-
variables

Redness, Color and Sweetness are statistically significant in predicting Preference. Raspberry Smell is also
significant, but contributes negatively to the Preference. Thickness also seems to be of importance, as it has a
large (negative) coefficient; however, it is not shown as significant in this model.

Save the results file with the name Tutorial B Sens-Pref.

Tutorial B - Open Result Matrices in the Editor


You may want to take a look at the result matrices numerically. Comparison of results may be easier in tables
and the Editor is a good starting point if you want to get data into other programs like Microsoft Excel or
Microsoft Word.
The Raw regression Coefficients (B) are available as a predefined plot from the Plot menu in the Regression
results Viewer. However, for the exercise you will learn how to import the B coefficients from the list of
numerous available matrices.

Task
Import regression coefficients into an Editor.

How to Do It
Select File - Import - Unscrambler Results and select Import data into New data table in the Import
Target dialog. (Lookup Image B032) Select the file Tutorial B Sens-Pref in the Import dialog. You
will find the file when you use File of Type: Regression.
B032 The Import Target dialog
B033 The Import from Regression Result dialog

In the Import from Regression Result dialog, mark the matrix B and select PCs: 2 in the field below the
matrix list, and then select B0 as well in the matrix list. (Lookup Image B033)

Note that B may be used for prediction of new, un-weighted data, while Bw (studied above in the Regression
Viewer) should be used with new, weighted data. Always identify important variables by studying Bw when
the data used in the model have been weighted.

Click OK. An Editor with the regression coefficients is launched. (Lookup Image B034) The B-coefficients
can then be treated like any other data in an Editor. You may plot the coefficients from the Plot menu, etc.
B034 Editor with the imported B coefficients from the PLS1 model relating
Preference to sensory properties

Close the Editor with the imported B-coefficients before you proceed.

Tutorial B - Export Unscrambler Models


Unscrambler models are often used in instruments to make predictions on the fly. A model format has been
developed to facilitate the easy reading of results in instruments or other software that do not read Unscrambler
models directly.

Task
Export the regression model used to predict Preference from Sensory Data.

How to Do It
Select Results - Regression and find the result file Tutorial B Sens-Pref. Mark it and look at the
information given in the lower part of the dialog. Here you see which Sample and Variable Sets were used in
the modeling, whether you used weighting, etc. The information given here is very useful when you want to
find a particular model at a later stage.
Click on the Export button. This launches the Export Model dialog. (Lookup Image B035) Select Ascii-
Mod to launch the dialog Export ASCII-MOD.

B035 The Export Model dialog
B036 The Export ASCII-MOD dialog

The optimal number of components should be used in the export. Therefore, change the number of PCs to 2
before you click OK. (Lookup Image B036)

Full ASCII-MOD export includes all results that are necessary to do outlier detection, etc. You may want to
use this format if you need to use Unscrambler models outside The Unscrambler, for example in a program you
wrote yourself. The ASCII-MOD file is readable by any ASCII editor.

Close all open Viewers.

Tutorial B - Predict Preference for New Samples


Regression models are mainly used to predict the response value for new samples. It is more efficient to predict
these values than to make the reference measurements, which often are time consuming and expensive.
The purpose of the model you previously made was to predict the jam preference for some consumers.

Task
Predict the Preference for the jam samples.
Interpret the prediction results to see whether the predictions can be trusted.

How to Do It
Activate the Tutor_b Editor. Select Task - Predict and specify the following parameters in the Prediction
dialog: (Lookup Image B037)

Samples: Prediction Sam [8]

X-variables: Sensory [12]

Y-reference: Not included

Model: Tutorial B Sens-Pref

Number of Components: 2
Click OK to perform the prediction.

B037 The Prediction dialog
B038 Prediction Results, Predicted with Deviation Plot

Tutorial B - Interpretation of Predicted with Deviation
No reference measurements were made for the samples in the Prediction Sam Set. This makes it impossible
to check predicted vs. measured values. Because we have made a model based on projection, we have an
option left: To check the reliability of the predictions from the deviations.

Task
Interpret the Predicted with Deviation plot.

How to Do It
Click View in the Progress dialog to see the predicted with deviation plot. (Lookup Image B038)
The predicted preferences for the unknown new jams have rather wide uncertainty limits, i.e. the accuracy of new
predictions is not so good. Still, this model can be used to predict the preference of new jam samples to give an
indication of which ones will be accepted or not by customers.
Save the results file under the name Tutorial B Predict 1.

Tutorial B - Check The Error in Original Units (RMSEP)


Finally, let us see how large an error we should expect when predicting preference, i.e. what the Root
Mean Square Error of Prediction (RMSEP) is.

Task
Plot the RMSEP.

How to Do It
The information you need is stored with the PLS model Tutorial B Sens-Pref. Therefore, we have to find a
way to look at those old results. This is done by opening a results Viewer again. Select Results -
Regression. Mark the model and click View. The regression overview appears.
Select Plot - Variances and RMSEP and go to the RMSE sheet. Double-click on the preview screen to fill
the whole Viewer with the RMSE plot. (Lookup Image B039)

B039 Variances and RMSEP dialog, RMSE sheet
B040 PLS1, Root Mean Square Error Plot

Select only the RMSEP in the Samples section. Click OK. (Lookup Image B040)

Now you can study the RMSEP for Preference for all PCs. RMSEP (using two PCs) is 0.83. This means that
any predicted new sample on the scale from 1 to 9 will have a prediction error around 0.8. This is an acceptable
error level in sensory analysis, which has much uncertainty in all measurements.

Spectroscopy and Interference Problems (Tutorial C)
Description of Tutorial C
Context of Tutorial C
We need an easy way to determine the concentration of dye (a brightly red-colored heme protein, Cytochrome-
C), predicted variable Dye, in water solutions. Dye absorbs light in the visible range, and we want to base the
concentration determination on this light absorbance.

In the solutions to be analyzed there are varying, unknown amounts of milk, which absorbs some light in the
same wavelength range as dye and therefore causes chemical interference in the measurements. In addition,
milk contains particles that give serious light scattering.
Another effect that will influence the absorbance spectra is the varying sample thickness.

The Light Absorbance Spectrum figure shows the light absorbance spectrum of one sample of the
dye/milk/water solution (Lookup Image C001). The vertical lines represent the 16 different wavelength
channels selected as predicting variables (x1, x2, ..., x16) for this sample.

This example is constructed so that it can be duplicated in a lab. It illustrates well the interference effects and
other effects that make spectroscopy difficult. Similar problems occur in many industrial
applications, e.g. when measuring the concentration of different chemical species in sewer water, which contains
many other chemical agents, as well as physical interferences like slurries and particles.

The two major peaks (channels x4 and x6) represent the absorbance of dye, while the first peak (x2)
represents absorbance due to an absorbing component in the milk. The broad peak to the right (x12, x13 and
x14) is due to light absorption by water itself.

What You Will Learn in Tutorial C


Tutorial C contains the following parts:

Handling of interference problems, MSCorrection

PLS regression

Outlier warnings

A problem similar to this tutorial is described extensively in chapter 8 in the book Multivariate Calibration,
by Martens & Naes.



Tutorial C Data Table
A data table has been prepared consisting of 28 samples (solutions) that span the two most important
types of variation: the dye and milk variations. The composition of dye/milk/water in each calibration sample
is shown below. The values are given in ml, making a total of 20 ml in each solution (sample).

Sample Dye Milk Water Sample Dye Milk Water


1 0.0 0.5 19.5 15 4.0 0.5 15.5
2 0.0 1.0 19.0 16 4.0 1.0 15.0
3 0.0 2.0 18.0 17 4.0 1.5 14.5
4 0.0 6.0 14.0 18 4.0 6.0 10.0
5 0.0 8.0 12.0 19 4.0 10.0 6.0
6 0.0 10.0 10.0 20 6.0 1.0 13.0
7 2.0 0.5 17.5 21 6.0 2.0 12.0
8 2.0 1.0 17.0 22 6.0 6.0 8.0
9 2.0 1.5 16.5 23 6.0 10.0 4.0
10 2.0 2.0 16.0 24 8.0 0.5 11.5
11 2.0 4.0 14.0 25 8.0 1.0 11.0
12 2.0 6.0 12.0 26 8.0 1.5 10.5
13 2.0 8.0 10.0 27 8.0 2.0 10.0
14 2.0 10.0 8.0 28 8.0 6.0 6.0

Note that the known Milk and Water quantities will not be used to make the model, only as descriptors in
result plots. The sample names are coded with these quantities as well.

Note: You will find the illustrations for this tutorial (Image C001, etc) at the end of the document.

Tutorial C - Read Data File and Define Sets


The first step in all modeling is to get the data into The Unscrambler and define the Sets. The data which we
feed to the different analyses are organized as Sets. Cleverly defined Sets make our modeling and plotting
work much easier.

Task
Open the data table and take a look at the properties of the data. Then define Sets to be used in the analyses.

How to Do It
Select File - Open and the file Tutor_c from the Examples directory. An Editor with the data table is
launched.

Go to Modify - Edit Set to define the necessary Variable and Sample Sets for later analyses in the Set
Editor. Define the Variable Sets and Sample Sets by clicking Add and entering the intervals given here:



Variable Sets:

Name: Absorbance
Data Type: Spectra
Interval: 4-19

Name: Dye Level
Data Type: Non-spectra
Interval: 3

Name: Description
Data type: Non-spectra
Interval: 1-2

Name: Statistical
Data type: Spectra
Interval: 3-19

Sample Sets:

Name: Calibration
Interval: 1-28

Name: Prediction
Interval: 29-42

Click OK when you have finished defining the Variable Sets and Sample Sets, and save the Editor before you
continue.
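
If you later work with the exported data table outside The Unscrambler, the Set definitions above translate into row and column ranges. The sketch below assumes that the intervals are 1-based and inclusive, as they appear in the Set Editor, and converts them to Python slices (0-based, end-exclusive). The helper names are hypothetical.

```python
def interval_to_slice(interval):
    """Convert a 1-based inclusive interval such as "4-19" or "3" to a slice."""
    first, _, last = interval.partition("-")
    start = int(first)
    stop = int(last) if last else start
    return slice(start - 1, stop)


def set_size(interval):
    """Number of rows or columns covered by an interval."""
    s = interval_to_slice(interval)
    return s.stop - s.start


# Set definitions copied from the lists above.
variable_sets = {
    "Absorbance": "4-19",
    "Dye Level": "3",
    "Description": "1-2",
    "Statistical": "3-19",
}
sample_sets = {"Calibration": "1-28", "Prediction": "29-42"}

variable_sizes = {name: set_size(iv) for name, iv in variable_sets.items()}
sample_sizes = {name: set_size(iv) for name, iv in sample_sets.items()}
```

The sizes obtained this way (for example 16 variables in Absorbance and 28 samples in Calibration) are the numbers shown in brackets next to the Set names in the analysis dialogs.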

Tutorial C - Plot Raw Data


You should always start by plotting the raw data to get an impression of what you have. It will be of
tremendous help when you want to assess which pretreatments are necessary and what kind of model (e.g. how
many PCs) to expect.

Task
Plot some calibration samples in order to see how the spectra vary with varying amount of dye and milk.

How to Do It
We want to plot samples that have the same amount of milk, 10 ml. Do this by marking the samples in the
Editor (samples 6, 14, 19, and 23). Use Edit - Select Samples and specify the sample numbers in the dialog
(Lookup Image C002). The Selection method should be Select. Click OK and you see that the four
samples are marked in the Editor. You could do the same by clicking the sample numbers while holding down
the <Ctrl> key.
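
The sample numbers used here can be checked against the composition table. The following sketch copies the table from the Tutorial C Data Table section and filters it; the helper name is hypothetical, and the same filter works for a fixed dye level.

```python
# (sample, dye, milk, water) in ml, copied from the Tutorial C data table.
SOLUTIONS = [
    (1, 0.0, 0.5, 19.5), (2, 0.0, 1.0, 19.0), (3, 0.0, 2.0, 18.0),
    (4, 0.0, 6.0, 14.0), (5, 0.0, 8.0, 12.0), (6, 0.0, 10.0, 10.0),
    (7, 2.0, 0.5, 17.5), (8, 2.0, 1.0, 17.0), (9, 2.0, 1.5, 16.5),
    (10, 2.0, 2.0, 16.0), (11, 2.0, 4.0, 14.0), (12, 2.0, 6.0, 12.0),
    (13, 2.0, 8.0, 10.0), (14, 2.0, 10.0, 8.0), (15, 4.0, 0.5, 15.5),
    (16, 4.0, 1.0, 15.0), (17, 4.0, 1.5, 14.5), (18, 4.0, 6.0, 10.0),
    (19, 4.0, 10.0, 6.0), (20, 6.0, 1.0, 13.0), (21, 6.0, 2.0, 12.0),
    (22, 6.0, 6.0, 8.0), (23, 6.0, 10.0, 4.0), (24, 8.0, 0.5, 11.5),
    (25, 8.0, 1.0, 11.0), (26, 8.0, 1.5, 10.5), (27, 8.0, 2.0, 10.0),
    (28, 8.0, 6.0, 6.0),
]


def samples_where(column, value):
    """Return the sample numbers whose dye, milk or water amount equals value."""
    index = {"dye": 1, "milk": 2, "water": 3}[column]
    return [row[0] for row in SOLUTIONS if row[index] == value]


same_milk = samples_where("milk", 10.0)

# Every solution should add up to 20 ml.
totals_ok = all(
    abs(dye + milk + water - 20.0) < 1e-9 for _, dye, milk, water in SOLUTIONS
)
```

Filtering on 10 ml of milk returns samples 6, 14, 19 and 23, the four samples marked in the Editor.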

Select Plot - Line and specify that you wish to use the Variable set Absorbance in the Line Plot dialog
(Lookup Image C003).



Use Edit - Options and change Plot Layout to Curve, if your plot shows bars instead of curves. The plot
should look like this: (Lookup Image C004)

These four samples have the same milk level and the plot shows that the dye level has influence on the
absorbance of wavelengths number 2 - 8 only.

Plot samples 20, 21, 22, and 23 the same way. These samples have the same dye level, 6 ml.

The plot shows that increasing milk level will increase the absorbance of light of all wavelengths from number
1 to number 16. There seems to be a great deal of interference or scattering to deal with, over the whole
spectrum. This indicates that we may have to do some transformations of our data to get an optimal model.

Close the Viewer so that the Editor with the data is active.

Tutorial C - Univariate Regression


Is it possible to predict the dye level from the absorbance of one single wavelength? Before we enter the
multivariate world we want to see what can be done by univariate regression.

Task
Find the best wavelength on which to make a univariate regression model.

How to Do It
You find the best wavelength by looking at the correlation between each absorbance variable and the Dye level
variable. Activate the Tutor_c Editor. Select Task - Statistics and specify the following parameters in the
Statistics dialog (Lookup Image C005).

Samples: Calibration [28]

Variables: Statistical [17]

Click Close instead of View and save the result file with the name Tutorial C Statistics. We are going to
import the correlation matrix from the result file into an Editor instead.

Select File - Import - Unscrambler Results. Specify New data table in the Import Target dialog to
avoid overwriting the data table in the Editor.

In the Import dialog, change the Files of type to Statistics and select Tutorial C Statistics before you click
Import to launch the Import from Statistics Result dialog (Lookup Image C006). The matrix where the
correlation results are stored is called StatCorr and you should import Group 1.



What you look for in the new Editor is the highest correlation between Dye Level and some X-variable. You
may mark the first variable and plot it to see the highest correlation (after the correlation between Dye level
and Dye level, which of course is 1).

The values in the Editor are the correlation coefficients between the variables. The variable with the highest
correlation coefficient to Dye Level is Xvar6, with a correlation coefficient of 0.49. Close the Editor with the
correlation matrix; you do not need to save it.
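
The search for the best single wavelength amounts to computing the correlation coefficient between each absorbance variable and the dye level, and keeping the strongest one. Here is a minimal sketch with made-up numbers; the data, the variable names in the dictionary and the function names are hypothetical. Ranking on the absolute value of the correlation is a choice made in this sketch, since a strong negative correlation would be just as useful for univariate regression.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_single_predictor(columns, response):
    """Return (name, r) for the column with the largest |correlation|."""
    scored = {name: pearson(values, response) for name, values in columns.items()}
    best = max(scored, key=lambda name: abs(scored[name]))
    return best, scored[best]


# Hypothetical absorbance readings for five solutions (assumption).
dye_level = [0.0, 2.0, 4.0, 6.0, 8.0]
absorbance = {
    "Xvar2": [0.9, 0.3, 0.8, 0.2, 0.7],
    "Xvar6": [0.2, 0.5, 0.4, 0.9, 0.8],
}

best_name, best_r = best_single_predictor(absorbance, dye_level)
```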

Now we should illustrate the regression in a plot. To get the right plot we have to copy Xvar6 to a variable left
of Dye Level. Mark the Xvar6 variable in the Tutor_c Editor. Then, click and hold the <Ctrl> key as you click
inside the marked column and drag the Xvar6 until the Dye Level variable is framed. Release the mouse button
and the Xvar6 is copied (Lookup Image C007).

Mark the two variables and select Plot - 2D Scatter. Remember to plot only the calibration samples
(Lookup Image C008).

Turn on the Regression Line and Target Line with View - Trend Lines, if they are not turned on by
default. With a correlation of only 0.49, the univariate fit is poor; hopefully we can do better with
multivariate regression models. Close the Viewer after you have studied the plot. Mark the copied variable in
the Editor (column 3) and delete it.

Tutorial C - Calibration
We choose to make a PLS regression model because PLS takes the variation in Y into consideration when the
model is calibrated.
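
To make the idea concrete, here is a compact sketch of PLS1 in its common textbook form (the NIPALS recursion on mean-centered data). It is not The Unscrambler's implementation, uses no weighting or validation, and all names and data are hypothetical. The point to notice is the first step inside the loop: the loading weights w are computed from the covariance between the X residuals and the y residuals, which is how the variation in Y enters the model.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model; returns (x_mean, y_mean, components)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # X residuals
    f = [value - y_mean for value in y]                        # y residuals
    components = []
    for _ in range(n_components):
        # Loading weights: X direction with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # Scores, X-loadings and y-loading for this component.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove what this component has explained.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict y for one new sample x by replaying the deflation steps."""
    x_mean, y_mean, components = model
    e = [value - mean for value, mean in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return y_hat


# Hypothetical noise-free data: y = 2*x1 - x2 + 3 (assumption).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0], [6.0, 4.0]]
y = [2.0 * a - b + 3.0 for a, b in X]
model = pls1_fit(X, y, n_components=2)
fitted = [pls1_predict(model, row) for row in X]
```

With two X-variables and two components, this toy model reproduces the noise-free response exactly; with real spectra the number of components has to be chosen from the validation results.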

Task
Make a PLS regression model between the variable set Absorbance (X) and the variable set Dye Level (Y).

How to Do It
Activate the Tutor_c Editor and select Task - Regression. In the Regression dialog, specify the following
parameters:

Method: PLS1

Samples: Calibration [28]

X-variables: Absorbance [16]

Y-variables: Dye Level [1]

Weights: All 1.0 in X and Y

Validation method: Leverage Correction

Num PCs: 10
Start the calibration by clicking OK.



The calibration output screen shows outlier warnings in almost every PC (two in PC 1, eight in PC 2, etc.). (The
bar showing the residual y-variance (prediction error) is very relevant, and we will come back to it later in the
tutorial.)

Tutorial C - Identify Outliers


We are a little concerned about the warnings given from the calibration.

Task
Find an outlier by looking at warnings and plots.

How to Do It
Click View to enter the Regression Overview plot. This shows the most important regression results, but
we are more interested in the warning list. Select Scores plot by clicking on it. (Lookup Image C009)

Select Window - Warning List if the warning list is not visible. A dockable view appears with all warnings
listed (Lookup Image C010). The first warnings indicate that some samples are outliers. Look for further
information in the outlier list by clicking the Outliers button.

Sample 8 is listed frequently. Investigate that sample further by plotting the raw data table.

Activate the Tutor_c Editor, mark samples 7, 8, 9, and 10. Select Plot - Line and use the Variable set
Absorbance (Lookup Image C011).

It is obvious that the pattern in sample 8 is atypical. Samples that are very different from others may distort the
model so much that it becomes useless for future use. So this sample should not be included in the calibration
samples used to make the model.

The detection of outliers and the way you should treat them is an important, but difficult task. It makes no
sense to interpret the model as long as outliers are present. Close the Viewer with the line plot and save the
result file with the name Tutorial C. Now you should make a new model without the known outliers.

Tutorial C - Recalibration with Outliers Removed


Once outliers have been identified, we should remake the model without these.

Task
Make a new PLS1 model with the same parameters as before, but with sample 8 kept out of the calculations.

How to Do It
Activate the Tutor_c Editor and select Task - Regression. In the Regression dialog, specify the following
parameters:

Method: PLS1




Samples: Calibration [28]

X-variables: Absorbance [16]

Y-variables: Dye Level [1]

Weights: All 1.0 in X and Y

Validation method: Leverage Correction

Num PCs: 10

Go to the Samples sheet and click Select next to the Keep Out of Calculation field. An Editor pops up
where you can mark the samples that should not be used in the calibration. Mark sample 8. Several samples are
marked by holding down <Ctrl> while clicking on the samples to be marked. If you mark some other samples,
you may deselect them by holding down <Ctrl> while you click on the undesired samples. Click OK and the
sample 8 is inserted in the Keep Out of Calculation field.

Press OK to start the calibration (Lookup Image C012).

There are still some warnings issued, but they do not do any real harm to the model. We go on
to look at other modeling results. Do not click View yet!

Tutorial C - Study the Residual Variance


The residual variance tells us a lot about how the model performs. In PLS models, it should decrease
continuously. An increase indicates that there is a problem which should be identified and removed.

Task
Study the residual variance in the model.

How to Do It
We want to study the prediction error in the screen output.
(Lookup Image C012). The horizontal bars in the PLS1 Regression Progress dialog indicate the residual
variance after each PC. The first bar, in PC 0, represents the total variance. The second bar is about 10%
smaller, meaning that PC no 1 explains about 10% of the total variance. After PC no 2 about 2/3 of the total
variance has been explained. The numerical value of the residual Y-variance is shown, too.

The variation in the calibration samples cannot be described significantly better with any new PC after the first
five PCs. Very little more variance is explained by PC 8, but we still have not explained all of the variance.
After 8 PCs, observe how the prediction variance increases slightly again, due to overfitting and noise
modeling.

The minimum estimated residual variance is less in this run than in the previous run: now 1.1 compared to 2.2
in the first model. It seems that seven PCs will give the optimal model. Eight PCs give a smaller variance, but
the difference is too small to motivate the use of more PCs.
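
The reasoning above (prefer fewer PCs when the gain from one more PC is negligible) can be written as a simple rule: take the smallest number of PCs whose residual variance lies within a small tolerance of the minimum. The sketch below is an illustration only. The curve is hypothetical, shaped to follow the description in this section (about 10% explained by PC 1, about 2/3 after PC 2, minimum 1.1 at eight PCs), and the 5% tolerance is an assumption, not a value used by The Unscrambler.

```python
def parsimonious_pcs(residual_variance, tolerance=0.05):
    """Smallest number of PCs with residual variance within tolerance of the minimum."""
    best = min(residual_variance)
    for k, value in enumerate(residual_variance):
        if value <= best * (1.0 + tolerance):
            return k
    return len(residual_variance) - 1  # not reached: the minimum always qualifies


# Hypothetical residual Y-variance after 0..10 PCs (assumption).
curve = [12.0, 10.8, 4.0, 2.6, 1.9, 1.5, 1.3, 1.14, 1.10, 1.15, 1.20]

chosen = parsimonious_pcs(curve)
global_minimum = min(range(len(curve)), key=curve.__getitem__)
```

On this made-up curve the global minimum is at eight PCs, but seven PCs are already within 5% of it, so the rule settles on seven.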



Now you can click View.

Tutorial C - Interpretation of the Calibration Model


The interpretation of a calibration model involves several steps. First, we check whether the model has
captured any systematic variation. This was done by looking at the residual variance in the last section.

If the model has successfully described systematic variation, we start to interpret different additional modeling
results. The most important model results to study then are the Scores, the Loadings, and the Predicted vs
Measured plot.

Task
Interpret the plots in the regression overview.

How to Do It
The regression overview was launched when you clicked View (Lookup Image C013). It consists of four
plots of the most important modeling results from the regression model. Save the results file under the name
Tutorial C No Outliers before you continue.

The plot in the lower left corner is the residual variance. This is the same result as you saw in the regression
progress dialog while the model was being calibrated. We do not comment further on this plot.

Score Plot
The plot in the upper left corner is the Scores plot. From the Scores plot we can see that the combination
of the two main PCs, PC 1 and PC 2, reflects the variations in the milk and water levels. The milk level increases
from upper left to lower right in the plot, while the water level increases from right to left.

Select Edit - Options and go to the Sample Grouping sheet, where you check Enable Sample Grouping.
Go to Markers Layout and select Value of Variable where you specify Y-variables 1. The score plot now
reveals a clear pattern from lower left to upper right in the plot. You would see the same information in a 2D
scatter loading plot.

Regression Coefficients
The plot in the upper right corner displays a line plot of the regression coefficients instead of the 2D scatter
plot of loadings and loading weights, which is the default when models are made from non-spectral data. This
happens because we changed the Data Type to Spectra (see section Read Data File and Define Sets). It is
easier to interpret regression coefficients plots than loading and loading weights plots when the variables
are functions of another implicit variable, such as wavelength or time.
Use Edit - Options to change the plot layout to bars instead of a curve.

The regression coefficients plot summarizes the relationship between all predictors and a given response.



The regression coefficients plot indicates that the wavelength numbers (X-variables) 4 and 6 are the most
important for the prediction of Y (concentration) in the first PC. The pattern is clearer here than in the loading
plot.
Compare the regression coefficients spectrum to the raw Absorbance data. You see that high loading values,
which indicate important variables, are present in the region where we know that milk and dye absorb light.
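Reading the most important wavelengths off the regression coefficients plot amounts to ranking the variables by the absolute size of their coefficients. A minimal Python sketch, assuming all X-variables are on the same scale (as here, with all weights set to 1.0); the function name and the coefficient values are hypothetical:

```python
def rank_variables(coefficients, top=2):
    """Return the 1-based variable numbers with the largest absolute
    regression coefficients, largest first."""
    order = sorted(range(len(coefficients)),
                   key=lambda i: abs(coefficients[i]),
                   reverse=True)
    return [i + 1 for i in order[:top]]


# Hypothetical coefficients for eight wavelengths (not the tutorial's values)
b = [0.1, -0.2, 0.3, 1.4, 0.5, -1.1, 0.4, 0.2]
print(rank_variables(b))
```

Note that a large negative coefficient is just as important as a large positive one, which is why the sketch sorts on the absolute value.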

Tutorial C - Study the Predicted vs Measured Plot


This plot shows how the model is able to predict the response value for the calibration samples. This gives an
indication of how well the model will perform in the future when new samples are collected and we want to
calculate the dye level for these samples, without actually measuring it.

Task
Take a closer look at the residual variances in the error measures plots.

How to Do It
Activate the Predicted vs Measured plot and select Plot - Variances and RMSEP and go to the X- and
Y-variance sheet, where you specify the following parameters: (Lookup Image C014)

Variables: Remove the number in the X: and Y: boxes. Only the total variances should be plotted

Samples: Both Calibration and Validation

Change the variance from residual to explained by selecting View - Source - Explained Variance
(Lookup Image C015). The upper plot shows that the model describes much of the variance in the
X-variables in the first PCs, while the lower plot shows that more PCs are needed to describe the variance in
Y (dye level). We are interested in describing Y, therefore we have to include enough PCs in our model to get
a high explained variance for the Y-variable.
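Residual and explained variance carry the same information. A short Python sketch of the conversion, assuming that the first value in the list is the total variance (zero PCs) and that explained variance is expressed in percent of that total; the function name is hypothetical:

```python
def explained_variance(residual_variance):
    """Convert residual variances (index = number of PCs, index 0 = total
    variance) into explained variance in percent."""
    total = residual_variance[0]
    return [100.0 * (1.0 - r / total) for r in residual_variance]


# Made-up residual variances for 0, 1 and 2 PCs
print(explained_variance([8.0, 4.0, 2.0]))
```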

Note that the model results are available to you as predefined plots in the Plot menu when you have a result
Viewer active. Activate the Tutor_c Editor and see that the Plot menu changes to general plot options.
Sometimes you close the result Viewer by accident. You can then get the predefined plots back by selecting
Results - Regression and opening the Tutorial C No Outliers result file with the View button. The result
Viewer with the regression overview is launched.

Tutorial C - Multiplicative Scatter Correction (MSC)


Since we suspect that the light scattering and sample thickness have multiplicative effects on the data, and that
the chemical absorptions have additive effects, we decide to try MSCorrection on the X-variables in order to
separate these effects from each other.

Perform a Multiplicative Scatter Correction

Task
Correct the data for multiplicative scatter effects. Omit variables 1 to 8 in the Set Absorbance as important
variables.



How to Do It
Activate the Tutor_c Editor and save it with another name using File - Save as.

First, we verify the need for MSC by looking at the Scatter Effects plot. This plot is available from a
Statistics model. Select Task - Statistics and specify the following parameters in the Statistics dialog:
Samples: Calibration [28]


Keep out of calculation (samples): 8
Variables: Absorbance [16]

Click OK to make the model and click View in the Progress dialog when it is finished. Then use Plot -
Statistics and select the Scatter sheet. Click the All button and then OK to make the plot.

The plot has to be scaled so the origin is shown. Do this with View - Scaling - Min/Max and enter 0 in both
From fields (Lookup Image C016). Click OK.

The regression lines intersect roughly at the origin, which indicates no need to correct for offset (Lookup
Image C017). But the regression lines have different slopes, which calls for MSC using common
amplification.
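The idea behind the Scatter Effects plot can be sketched in Python: each sample spectrum is regressed against the mean spectrum, and the intercepts and slopes of the fitted lines are compared. This is an illustration under that assumption, not the program's own code; the function names and data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope of y against x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def scatter_effects(spectra):
    """Return one (intercept, slope) pair per sample, obtained by
    regressing the sample spectrum on the mean spectrum."""
    n_var = len(spectra[0])
    mean = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_var)]
    return [fit_line(mean, s) for s in spectra]


# Two made-up spectra that differ only by a multiplicative factor
spectra = [[0.5, 1.0, 1.5, 2.0], [1.5, 3.0, 4.5, 6.0]]
for intercept, slope in scatter_effects(spectra):
    print(round(intercept, 3), round(slope, 3))
```

Intercepts near zero combined with clearly different slopes correspond to the situation described above: no offset correction is needed, but an amplification correction is.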
Close the Viewer before you continue.

Select Modify - Transform - MSC. Specify the following parameters in the Multiplicative Scatter
Correction dialog: (Lookup Image C018)

Samples: Calibration [28]

Variables: Absorbance [16]

Function: Common Amplification

Test Samples: 8

Omit Important Variables: 1-8

Test samples are not used to find the correction factors that MSC applies. Sample 8 is an outlier and would
give slightly inferior results if it were used. We therefore include it in the Test Samples to keep it out of the
calculation.

Variables 1-8 are omitted as important because the light absorption at these wavelengths varies with the dye
level, while the absorption at wavelengths 9 to 16 (the water absorption peak) is independent of the
concentration of dye. The differences at these wavelengths are instead caused by the general light scatter due
to milk addition. It is important that only wavelengths with no chemical information are used to find the
correction factors.
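To make the role of the omitted variables concrete, here is a minimal Python sketch of a scatter correction with a common amplification. It assumes that each spectrum is modeled as a factor b times the mean spectrum (no offset), that b is estimated only from the wavelengths listed in fit_vars, and that the whole spectrum is then divided by b. The function name and data are hypothetical, and the sketch does not handle test samples; pass only the samples that should define the mean spectrum.

```python
def msc_common_amplification(spectra, fit_vars):
    """Return (mean_spectrum, corrected_spectra).

    fit_vars holds the 0-based indices of the variables used to estimate
    the amplification factor, i.e. those without chemical information.
    """
    n_var = len(spectra[0])
    mean = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_var)]
    den = sum(mean[j] ** 2 for j in fit_vars)
    corrected = []
    for s in spectra:
        # Slope of a least-squares line through the origin: s ~ b * mean
        b = sum(mean[j] * s[j] for j in fit_vars) / den
        corrected.append([v / b for v in s])
    return mean, corrected


# Made-up data: variables 0-1 carry chemical information, 2-3 only scatter
spectra = [[0.8, 0.9, 1.0, 2.0], [3.3, 3.0, 3.0, 6.0]]
mean, corrected = msc_common_amplification(spectra, fit_vars=[2, 3])
print(corrected)
```

After the correction both samples coincide in the scatter-only region, while the differences in the first two variables remain available for the regression.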

Save the MSC model with the name Tutorial C MSC Model. Save the corrected data now displayed in the
Editor with the name Tutorial C MSCorrected.
Look at the corrected data by launching a general Viewer (Results - General View) and selecting Plot -
Line. Select the data file you just saved with the corrected data in the Line Plot dialog. Plot Samples 20 - 23
using the Variables Set Absorbance. (Lookup Image C019)



We want to compare the corrected data with the original data. Select Window - Copy To - 2. The Viewer is
split in two and the current plot is the upper one (Lookup Image C020). Click on the blank lower half in the
Viewer to activate it.

Now we are going to plot the original data, but this time not from the original data file. Select Plot - Line and
find the result file Tutorial C No Outliers (Lookup Image C021). You see that the raw data from which the
model is made is saved together with the model matrices. This time you do not have to specify a Variable Set
because the raw data used for the model is only the X-variables from that Set.

Plot samples 20 - 23 from Xraw (Lookup Image C022). You see that the MSCorrected data are different
from the original. The interference and light scatter effects have successfully been corrected for.

Calibrate with MSCorrected Data


So far we have only corrected the data. Now we have to make a new PLS model using the MSCorrected data.

Task
Make a PLS1 model with the same model parameters as the model Tutorial C No Outliers.

How to Do It
Activate the Editor with the corrected data. Select Task - Regression and specify the following parameters
in the Regression dialog:

Method: PLS1

Samples: Calibration [28]

X-variables: Absorbance [16]

Y-variables: Dye Level [1]

Weights: All 1.0 in X and Y

Validation Method: Leverage Correction

Num PCs: 10

Remember to keep sample 8 out of calculation. It is still an outlier.

Click OK to make the model. See how the residual variance decreases faster for each PC in this model
compared to the previous models. The MSCorrection has improved the model.
Click Close and save the model under the name Tutorial C MSCorrected.

Tutorial C - Comparison of All Models


We are now interested in seeing how the models perform with regard to prediction ability. The residual
variance is therefore the yardstick against which we compare the different models.



Task
Look at the residual variance for all models in Tutorial C.

How to Do It
Select Results - General View and then Plot - Line. Click the Browse button against Source and find the
result file Tutorial C. The matrix we are interested in is called ResYValTot (Lookup Image C023).

Select Edit - Add Plot and plot the same matrix for the model Tutorial C No Outliers and Tutorial C
MSCorrected.

The plot shows the validated residual Y-variance for the three models (Lookup Image C024). From this plot
we find that the minimum square error is approximately:

for Tutorial C MSCorrected using 6 PCs,

for Tutorial C No Outliers using 8 PCs

for Tutorial C using 3 PCs

Tutorial C MSCorrected with six PCs gives the lowest estimate for the residual Y-variance. This model, used
with six PCs, can therefore be expected to give the lowest prediction error.
Note again how the Results menu is your way to look at results from older models.

Tutorial C - Check the Error in Original Units: Root Mean Square Error (RMSE)

The numerical residual variance values we used to find the best model and determine the optimal
number of PCs are not directly related to the predictions. We cannot use the residual variance to
tell how large we can expect the deviations in future predictions to be. We have to use the RMSEP - Root Mean
Square Error of Prediction - for that purpose.

Task
Finally, let us see how large an error in ml dye we have to expect in future predictions: the Root Mean Square
Error of Prediction.

How to Do It
Activate the regression overview Viewer. Select Plot - Variance and RMSEP and go to the RMSE sheet.
Double-click the screen preview to display the plot in the whole Viewer. De-select the calibration samples box
and tick the validation samples (RMSEP) instead (Lookup Image C025).

You see that the shape of the curve is exactly that of the residual variance, but the values have changed. The
plot says that predictions done with this model and using six PCs will have an average prediction error of 0.98.
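As a reminder of what the RMSE expresses, here is the textbook formula as a small Python sketch: the square root of the mean squared difference between predicted and measured values, which is therefore in the same units as Y (ml dye). The function name and numbers are hypothetical, and the sketch does not reproduce how the program obtains its validated predictions.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between predicted and measured values."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up dye levels: every prediction is off by 1 ml or not at all
print(rmse([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 2.0]))
```

Because the square root is a monotonic function, the minimum of the RMSE curve occurs at the same number of PCs as the minimum of the corresponding mean squared error.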

Close the Viewer before you continue.



Tutorial C - Predict New MSCorrected Samples
The best model (with MSC) is the one we will use for the prediction of new samples.

MSCorrect the Prediction Samples


All new samples to be predicted by the model Tutorial C MSCorrected must be transformed the same way as
the samples that were used to make this model.

Task
MSCorrect the prediction samples.

How to Do It
Go back to your data table Tutorial C MSCorrected containing the MSCorrected training samples. In order to
correct the prediction samples, use Modify - Transform - MSC. Specify the following parameters in the
Multiplicative Scatter Correction dialog: (Lookup Image C026)

Samples: Prediction [14]

Variables: Absorbance [16]

Use Existing MSC Model: Tutorial C MSC Model

Click OK and save the MSC coefficients. The prediction samples are then changed according to the
MSCorrection you found previously.
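The important point when correcting new samples is that the reference spectrum comes from the saved MSC model and is not recomputed from the new samples. A minimal Python sketch, assuming a common-amplification correction in which the stored reference is the mean spectrum of the calibration samples; the function name, the reference values and the sample are hypothetical:

```python
def apply_msc(sample, reference, fit_vars):
    """Correct one new spectrum against a stored reference spectrum.

    The amplification factor b (sample ~ b * reference) is estimated on
    the variables in fit_vars only, then the whole spectrum is divided
    by b.
    """
    num = sum(reference[j] * sample[j] for j in fit_vars)
    den = sum(reference[j] ** 2 for j in fit_vars)
    b = num / den
    return [v / b for v in sample]


# Hypothetical reference stored with the MSC model, and one new sample
reference = [5.0, 5.0, 2.0, 4.0]
new_sample = [7.0, 8.0, 4.0, 8.0]
print(apply_msc(new_sample, reference, fit_vars=[2, 3]))
```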

Run a Prediction on the MSCorrected Prediction Samples


The prediction samples are now ready for prediction.

Task
Predict the dye level of these samples.

How to Do It
Select Task - Predict. Specify the following parameters in the Prediction dialog: (Lookup Image C027)

Samples: Prediction [14]

X-variables: Absorbance [16]

Y-reference: None

Model name: Tutorial C MSCorrected

Number of Components: 6

Click View after the prediction is done. The prediction overview plot appears, where the predicted values are
shown together with the deviations (Lookup Image C028). Large deviations indicate that the predictions
cannot be trusted.
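Once a model has been reduced to its regression coefficients for a given number of PCs, a predicted value is a weighted sum of the (preprocessed) spectrum. A minimal Python sketch, assuming an intercept b0 and one coefficient per wavelength; the names and numbers are hypothetical, and the deviations shown in the prediction overview are not computed here.

```python
def predict(spectrum, b0, b):
    """Predicted Y = b0 + sum of coefficient * absorbance."""
    if len(spectrum) != len(b):
        raise ValueError("spectrum and coefficient vector differ in length")
    return b0 + sum(bk * xk for bk, xk in zip(b, spectrum))


# Hypothetical coefficients and MSCorrected spectrum (three wavelengths)
print(predict([2.0, 1.0, 4.0], b0=0.5, b=[1.0, -2.0, 0.25]))
```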



Tutorial C - Check List for Spectroscopy Calibration
Now that you have learnt the basics of calibration, let us suggest steps and useful functions for the
development of calibration models.


Read Data: File - Open or File - Import. You can import data from many instruments - directly or via
e.g. JCAMP-DX or ASCII. Many instruments also write U5 data files or Unsc-ASCII data files.

View and Prepare Data: Look at the Editor, define sets. Select some samples and Plot - Line or Matrix
to get an overview of the spectra (data plot). Histograms of Y-variables are useful too, as well as 3D
scatter plots of constituents if there are several.

Pre-process: Modify - Transform allows you to do spectroscopic transformations, derivation,
smoothing, etc. Modify - Reduce (Average) may be useful too. If you have a data plot of your spectra
open, you will see how the spectra change on the fly.

Statistics: Task - Statistics may be useful. The Statistics plot Scatter reveals scatter problems.

Select Samples: If you need to throw away data to get a more balanced data set you may make a PCA of
the spectra or the constituents. From the Score plot, use Edit - Mark and mark samples that span all the
important components (samples far away from the origin, but not extremes.) Select Task - Extract
Marked and save as a new file.

Reduce Spectra: If you need to use fewer wavelengths, or perhaps only a range of the spectra, select
Modify - Edit Set - Add - Special intervals - Select Every n variables - Update. You can now
change the starting point in the Interval field. Click OK twice to save the Set, e.g. under the Name New
Set. Then, choose Edit - Select Variables - Set and select New Set. The marked variables can now
be deleted, and you can save the new data file under a new name.

Make First Calibration Model and Look for Outliers: Task - Regression - PLS2 gives a nice
overview if you have several constituents. Otherwise use PLS1. View the results, especially Variance,
Scores and Predicted vs Measured. When plotting results, use Edit - Mark (also available under right mouse
button) to mark suspicious samples in the score plots. Plot - Sample outliers and XY Relation outliers
are useful to investigate them. You will see that the samples are marked in those plots too (and all other
sample based plots).

View - Raw data produces a link to the raw data table, high-lighting the marked samples - or vice versa!
Mark in the raw data table and see them marked in the corresponding plots.

Refine the Model: Task - Recalculate without Marked, gives a new model with the marked samples
removed. Compare results, and look for more outliers. Repeat if necessary.

Study the Model in Detail: Plot - Variances and RMSEP - RMSE/Important variables/Predicted
versus measured are useful tools. View - Trend lines - Regression line and View - Plot
statistics are useful too. Scores plots using Edit - Options - Sample grouping (also under right
mouse button) are excellent for investigating patterns.

Delete Wavelengths: From the Important variables plot you can Edit - Mark ranges in spectra that are
not important (potentially noisy). Task - Recalculate without Marked gives you a new model based
on fewer wavelengths, that is possibly more rugged and with a smaller prediction error.

Validation: Before you finish, make sure the model is properly validated using a suitable cross validation
or test set. Always keep replicates of the same samples in the same segment.

Additional Tools: Statistics on the B-vector is helpful to determine the number of PCs. Use Results -
General view - Plot - Line to plot the B-diagnosis for the model (Statistics, vector 1-6). Vector no 4 and
5 are especially useful (Bsum and SquSum B).

File - Import - Unscrambler results lets you see the numerical values of all results, e.g. B (the
regression coefficients) or ExtraVal, which contains information about the need for slope and bias
adjustment. Use Help for details.




Access to Results: From the Results menu you have access to all models that you have saved. The
extracted information field summarizes all important model parameters.

Predict New Samples: Task - Predict is used to predict Y-values from spectra in new unknown samples.
If you have new samples with known reference values you can use Predict to make an additional
validation, to follow up the model, or to see how it performs with spectra, e.g. scanned with another
instrument. Remember to preprocess new samples in the same way as you did for the calibration samples.

Check the Model Performance for Other Instruments: By using Modify - Transform - Noise you can
add noise to new samples to see how sensitive the model is to small changes in the spectra. With File -
Import - Unscrambler results you can check the numerical values of all results, e.g. ExtraVal. If Bias
is not close to 0 and Slope not close to 1, there may be a need to Slope and Bias Adjust the predicted
Y-values (e.g. because the spectra come from another instrument, or reference values from another lab).
SEPcorr shows which SEP you will get if you correct the Y-values in this way. Use Help for details.

Log: File - Properties - Log shows all manipulations you have made to the data set. If you Close and
Save models straight after calculation (and before View) their names will appear in the log too.

Export Model: When you have finally found a model that predicts new samples satisfactorily, you may
transfer it to your instrument. There are several possibilities:

Export The Unscrambler Model: Some instruments (e.g. from Perstorp Tecator, UOP GW,
Bran+Luebbe) can use U5 models; select File - Export U8.0 Results. There are also export options for
NIRSystems NSAS, Vision and Foss Tracker.

Some instrument software can read the B-vector; File - Export to ASCII or JCAMP-DX. In some types
of software, especially for filter instruments, you can manually tune or edit the B-coefficients, e.g. in APC
or IDAS for Infralyzers.

File - Export - ASCII-MOD is a simple file format containing all information necessary to make
predictions, either using the full PLS or PCR models or only the B-vector. It can be used if you write your
own conversion routines.

Rerun the Model in the Instrument Software: If your software cannot take any of the above files, rerun
the model. By now you know exactly which calibration samples to use, how many PCs to use, and how to
preprocess the data. The prediction error and interpretation are known from the Unscrambler analysis.
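The slope and bias adjustment mentioned in the check list can be illustrated with a generic Python sketch. It assumes that the adjustment is a straight line fitted by least squares between the reference values and the predicted values of samples with known Y, and that new predictions are then passed through that line. This is a common approach, not necessarily the exact definition behind ExtraVal or SEPcorr; the function names and data are hypothetical.

```python
def slope_bias_fit(predicted, reference):
    """Fit reference ~ intercept + slope * predicted by least squares."""
    n = len(predicted)
    mp = sum(predicted) / n
    mr = sum(reference) / n
    spp = sum((p - mp) ** 2 for p in predicted)
    spr = sum((p - mp) * (r - mr) for p, r in zip(predicted, reference))
    slope = spr / spp
    return mr - slope * mp, slope


def adjust(predicted, intercept, slope):
    """Apply a previously fitted slope and bias adjustment."""
    return [intercept + slope * p for p in predicted]


# Made-up example: predictions from another instrument are too low
predicted = [1.0, 2.0, 3.0]
reference = [3.0, 5.0, 7.0]
intercept, slope = slope_bias_fit(predicted, reference)
print(adjust(predicted, intercept, slope))
```

A fitted slope close to 1 and an intercept close to 0 would mean that no adjustment is needed.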



Experimental Design: Screening and Optimization (Tutorial D)
Description of Tutorial D
Context of Tutorial D
This tutorial was built from the enamine synthesis example published by R. Carlsson in his book Design and
Optimization in Organic Synthesis, Elsevier, 1992.

A standard method for the synthesis of enamine from a ketone gave some problems, and a modified procedure
was investigated. A first series of experiments gave two important results:

A new procedure was built up, which shortened reaction time considerably;

It was shown that the optimal operational conditions were highly dependent on the structure of the original
ketone.

Thus, a new investigation had to be conducted to study the specific case of the formation of morpholine
enamine from methyl isobutyl ketone. It was decided to adopt a 2-step strategy:

First, at a screening stage, study the main effects of 4 factors (relative amounts of the reagents, stirring rate
and reaction temperature) and their possible interactions;

Then, conduct an optimization investigation with a reduced number of factors.

What You Will Learn in Tutorial D


Tutorial D contains the following parts:

Build suitable designs for screening and optimization purposes;

Analysis of Effects;

Response Surface Modeling.

Tutorial D - Data Table


From the previous experiments, reasonable ranges of variation were selected for the 4 design variables:

Variable                                      Low    High

A: amount of TiCl4 / Ketone (mol/mol)         0.57   0.93
B: amount of Morpholine / Ketone (mol/mol)    3.7    7.3
C: reaction temperature (°C)                  25     40
D: stirring rate                              none   high



Note: You will find the illustrations for this tutorial (Image D001, etc) at the end of the document.

Tutorial D - Build a Screening Design


Screening designs are used to identify which design variables influence the responses significantly.

Task
Select a screening design which requires a maximum of 11 experiments that will make it possible to estimate
all main effects and detect the existence of 2-factor interactions.
Note: With 4 design variables, you need a fractional factorial design to keep the number of experiments lower
than 16 (2^4).

How to Do It
Choose File - New Design to launch the Design Wizard, where you can generate a designed data table.
In the Design Wizard - Select Method to Use dialog, choose to build the design From Scratch and click
Next.
This launches the Design Wizard - Select Design Type dialog, where you select Create Fractional
Factorial Design and proceed by clicking Next.

Define Design Variables


In the following dialog, Design Wizard - Define Design Variables, you specify the variables as shown in
the table hereafter:

ID   Name          Data Type    Levels

A    TiCl4         Continuous   2 (0.6; 0.9)
B    Morpholine    Continuous   2 (3.7; 7.3)
C    Temperature   Continuous   2 (25.0; 40.0)
D    Stirring      Continuous   2 (-1.0; 1.0)

Do this by clicking the New button. This launches the Add Design Variable dialog (Lookup Image
D001), where you must enter the name of the new variable (e.g. TiCl4, Morpholine, Temperature and
Stirring), select Continuous, and enter the low and high levels as stated above. Validate by clicking OK and
enter the next variable by clicking New again.

Note: In order to be allowed to specify center samples, you will have to define Stirring rate as a continuous
variable; you can give it the arbitrary levels -1 and 1, where -1 stands for no stirring and 1 stands for high
stirring.

Define Non-design Variables (Responses)


After all design variables have been defined, click Next to enter the Define Non-design Variables dialog,
where you click New to define a response variable. This starts the Add Non-designed Variable dialog
where you can type in the name of the only response we will measure: Yield.



Define Design Type and Design Details
Clicking Next launches the Design Type dialog, where you will notice that the default choice is set to a
Fractional Factorial Resolution IV design, which consists of 8 experiments. Try other choices by toggling
Number of Experiments to run up or down. (Actually, there is only one possible fractional factorial design
with 4 variables; if you go up to 16 samples, then you have a full factorial design.)
Study the confounding pattern of the suggested design. You can see that all main effects are confounded with
3-variable interactions, which is acceptable if we assume that those interactions are unlikely to be significant.
The 2-variable interactions are confounded two by two.

Click Next to launch the Design Details dialog. Keep Number of Replicates to 1, and add 3 Center
Samples (Lookup Image D002).
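The structure of this design (8 cube samples plus 3 center samples) and its confounding pattern can be checked with a short Python sketch. It assumes the textbook generator D = ABC for the half fraction, in coded levels -1 and +1; the generator and sign actually used by the Design Wizard may differ, and the function name is hypothetical.

```python
from itertools import product


def fractional_factorial_4_vars(n_center=3):
    """Half fraction of a 2-level design in 4 variables (coded units),
    built with the assumed generator D = A*B*C, plus center samples."""
    runs = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]
    runs += [(0, 0, 0, 0)] * n_center
    return runs


design = fractional_factorial_4_vars()
cube = design[:8]
print(len(design))

# The AB and CD columns are identical in every cube sample, so the two
# interactions cannot be separated (they are confounded).
print(all(a * b == c * d for a, b, c, d in cube))

# Each main effect is confounded with a 3-variable interaction, e.g. A = BCD.
print(all(a == b * c * d for a, b, c, d in cube))
```

The same check shows that AC is confounded with BD and AD with BC, which is what "confounded two by two" means.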

Randomization and Last Checks


You need not make any further specification in the next dialog, Randomization Details. Click Next again
to launch the Last Checks dialog, where you make sure that all your design parameters have the correct
values. Otherwise, use Back to go to the appropriate dialog and make the necessary corrections.

Once you are satisfied with your design specifications, click Finish to exit. The generated design is
automatically displayed on screen (Lookup Image D003).
You can use the View menu to toggle between display options. Try Sample Names and Point Names,
Standard Sample Sequence and Experiment Sample Sequence (randomized order).

It should now be safe to store your new data table into a file, using File - Save As; give it a name, e.g. Enam
FRD. Note that you should not overwrite the existing file Enam_frd. You will need this file later in the
tutorial.

Tutorial D - Estimation of the Effects


After the experiments have been performed and the responses have been measured, you have to analyze the
results using a suitable method. Study the main effects of the four design variables, and check whether there
are any significant interactions. The simplest way to do this is to run an Analysis of Effects. Then, interpret the
results.

Run an Analysis of Effects on Enam_frd

Task
Run an Analysis of Effects.

How to Do It
First, you should enter the response values. Since this has already been done, you just need to read the
complete file. Use File - Open, and select from the Designed Data list in the Open File dialog the file
named Enam_frd, which already contains the response values.



To start the analysis, choose Task - Analysis of Effects.... In the Analysis of Effects dialog (Lookup
Image D004), use the Samples, X-variables and Y-variables sheets to specify the following parameters:

Sample Set: Cube & Center Sampl (11)

X-variable Set: Design vars + Int (4+3)

Y-variable Set: Cont. Non-Design Vars (1)

Validate your final choices by clicking OK.


After the calibration has been successfully completed, click View to get an overview of the model results.
Before doing anything else, use File - Save to save the results file with a name such as Enam FRD AoE-a.

Interpret the Results of the Analysis of Effects

Task
Interpret the results of the Analysis of Effects that you have just run.

How to Do It
The Effects Overview plot shows which effects are significant (Lookup Image D005). By default, the
Significance Testing Method is Center.
Select Plot - Effects and choose COSCIND as Significance Testing Method on the Overview sheet in the
Effects dialog. Click OK to display the new plot. (Lookup Image D006)
You can see that three effects are considered to be significant: Main effect TiCl4 (A), Interaction AB or CD,
and Main effect Morpholine (B).
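For a two-level design, the effect of a variable is usually estimated as the mean response at its high level minus the mean response at its low level. The Python sketch below shows that standard calculation only; it does not reproduce the Center or COSCIND significance tests. The function names and the yields are hypothetical, not the values in Enam_frd.

```python
def effect(levels, response):
    """Mean response at the high level minus mean response at the low
    level. Center samples (coded level 0) are ignored."""
    high = [y for x, y in zip(levels, response) if x > 0]
    low = [y for x, y in zip(levels, response) if x < 0]
    return sum(high) / len(high) - sum(low) / len(low)


def interaction(col1, col2):
    """Coded column of a 2-variable interaction (element-wise product)."""
    return [u * v for u, v in zip(col1, col2)]


# Hypothetical coded design columns and yields
A = [-1, 1, -1, 1]
B = [-1, -1, 1, 1]
y = [10.0, 20.0, 12.0, 22.0]
print(effect(A, y), effect(B, y), effect(interaction(A, B), y))
```

In the fractional design used here, the column for AB is identical to the column for CD, so the same number is obtained for both; the data alone cannot tell which interaction is responsible.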

Select Window - Copy to - 2 : this copies the Effects Overview plot into sub-view 2 (the upper sub-view in
a system of two). Activate the lower sub-view (which is currently empty), and use Plot - Effects. On the
Response Details sheet (Lookup Image D007), select Normal Probability in the Plot type field, and
remove the option Include Table.

The normal probability plot of the effects (Lookup Image D008) confirms the results of the Effects
Overview: the effect of Morpholine (B) is clearly very significant, and AB=CD and TiCl4 (A) are also likely
to be significant.
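A normal probability plot of effects places the sorted effects against the quantiles expected from a normal distribution; effects that are only noise fall roughly on a straight line, while real effects stand out at the ends. A minimal Python sketch, assuming the plotting position (i - 0.5)/n, which is one common convention and may not be the one used by the program. The function name and effect values are hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(effects):
    """Return (name, effect, normal quantile) tuples sorted by effect.

    effects: dict mapping effect name to its estimated value.
    """
    ordered = sorted(effects.items(), key=lambda item: item[1])
    n = len(ordered)
    standard_normal = NormalDist()
    return [(name, value, standard_normal.inv_cdf((i + 0.5) / n))
            for i, (name, value) in enumerate(ordered)]


# Hypothetical effects for the 7 terms of the screening model
effects = {"A": 6.0, "B": 14.0, "C": 0.4, "D": -0.3,
           "AB=CD": -7.0, "AC=BD": 0.8, "AD=BC": -0.6}
for name, value, z in normal_plot_points(effects):
    print(name, value, round(z, 2))
```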

Check the Enam_frd Data


Since there are as many terms in the model as the number of cube samples in the design, studying the residuals
is not relevant. Thus, we should just check the data for any non-linearities that would limit the validity of the
linear model with interactions.

Task
Check the data for non-linearities.



How to Do It
Go back to the Editor window containing the Enam_frd data table, and select Task - Statistics. Choose
Cube & Center Samples (11) on the Samples sheet, check that Cont. Non-Design Vars (1) is selected on
the Variables sheet, and click OK.
View the results. Two plots appear in the Viewer (Lookup Image D009).
The upper plot shows the percentiles (min, max, median, etc.) of the response (Yield). The lower plot shows its
mean and standard deviation over all design samples.

Click in the lower plot and use Plot - Statistics. On the Compressed sheet (Lookup Image D010), go to
the Sample Groups field, where you specify that you wish to plot groups containing Design and Center
samples. Validate your choices with OK.

The lower plot (Lookup Image D011) now displays the mean and standard deviation of all Design samples
compared to that of the Center samples only.
You can see that the standard deviation for the center samples is about half the overall standard deviation. This
indicates some lack of reproducibility in the center samples; this is why most of the effects observed in the
Analysis of Effects were not found significant according to the Center Significance testing method. If you go
back to the Editor and study the Yield values, you will notice that center sample Cent-c has a very different
value from Cent-a and -b; maybe that experiment was not performed correctly.

The other important information conveyed by the plot is that there is a strong non-linearity in the actual
relationship between Yield and the design variables: The mean value for the center samples is much higher
than for the overall design.
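The comparison made in this plot can be sketched in Python: if the relationship were linear, the mean of the center samples would be close to the mean of the cube samples, so a large difference suggests curvature. The function name and yields below are hypothetical, not the values stored in Enam_frd.

```python
from statistics import mean, stdev


def center_vs_cube(cube_yield, center_yield):
    """Return (curvature, center_sd): the mean of the center samples
    minus the mean of the cube samples, and the standard deviation of
    the center samples."""
    curvature = mean(center_yield) - mean(cube_yield)
    return curvature, stdev(center_yield)


# Hypothetical yields for 8 cube samples and 3 center samples
cube = [40.0, 55.0, 48.0, 62.0, 45.0, 58.0, 50.0, 66.0]
center = [74.0, 76.0, 69.0]
curvature, center_sd = center_vs_cube(cube, center)
print(round(curvature, 1), round(center_sd, 1))
```

The curvature should be judged against the spread of the center samples: a difference that is several times larger than their standard deviation is unlikely to be due to experimental noise alone.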

Drawing a Conclusion from the Screening Design


The final conclusions of the screening experiments are the following:

Three effects were found likely to be significant. One of them is a confounded interaction, but since the
main effects of A and B are the only significant ones, we can try an educated guess and assume that the
significant interaction is AB;

There was some lack of reproducibility in the center samples, although the remaining part of the design
showed a clear structure (according to the COSCIND and Normal Probability results). If new experiments
are performed, it will be useful to replicate the center samples a few more times;

There seems to be a strong non-linearity in the relationship between Yield and (TiCl4, Morpholine).
Furthermore, since the center samples have a higher yield than the majority of the design samples, the
optimum is likely to be somewhere inside the investigated region;

Thus, the next sensible step would be to perform an optimization, using only variables TiCl4 and
Morpholine.

Tutorial D - Building an Optimization Design


After finding the important variables from a screening design, it is natural to proceed to the next step: Find the
optimal levels of those variables. This is achieved by an optimization design.



Task
Build a Central Composite Design to study the effects of the two important variables (TiCl4 and Morpholine)
in more detail.
Note: The other two variables investigated in the screening design have been set to their most convenient
values: No stirring, and Temperature = 40 °C.

How to Do It
Choose File - New Design to launch the Design Wizard, where you will be able to generate a designed
data table. In the Select Method to Use dialog, choose to build the design from scratch and Click Next.
This launches the Select Design Type dialog, where you select Optimization designs: Central
Composite and validate by Clicking Next.

In the Define Design Variables dialog, you will specify the variables TiCl4 and Morpholine with the same
ranges of variation as before (0.6 to 0.9 and 3.7 to 7.3, respectively), as follows:
Click New to launch the Add Design Variable dialog, where you must enter the name of the new variable,
select Continuous, and enter the Low and High levels. Validate by clicking OK.
Enter the next variable by clicking New again.

When both variables have been defined, check that the Define Design Variables dialog indicates the correct
Star Points Distance from Center, namely 1.41.

After all design variables have been defined, click Next to enter the Define Non-design Variables dialog,
where you click New to define the non-designed response variable Yield in the Add Non-designed
Variable dialog.
Once you are satisfied with your variable definitions, use Next to get into the Design Details dialog, where
you set the Number of Replicates to 1 and the Number of Center Samples to 5.

You need not make any further specification in the next dialog, Randomization Details. Click Next again
to launch the Last Checks dialog, where you make sure that all your design parameters have the correct
values. The design should include a total of 13 experiments. Otherwise, use Back to go to the appropriate
dialog and make the necessary corrections.

Once you are satisfied with your design specifications, use Finish to exit. The generated design is
automatically displayed on screen. Save your design for further use, e.g. with the name Enam CCD.
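
The design built above can be sketched outside the program as well. The following Python lines are only an illustration, assuming that the Low and High levels are the cube levels (-1 and +1) and that the star distance is the rotatable value (2^k)^(1/4), which gives 1.41 for two variables; the function name is hypothetical and this is not The Unscrambler's own code.

```python
from itertools import product


def central_composite_design(ranges, n_center=5):
    """Return (coded, real) design points for a central composite design.

    ranges: list of (low, high) pairs, assumed to be the cube levels -1/+1.
    """
    k = len(ranges)
    alpha = (2 ** k) ** 0.25  # star distance from center; 1.41 when k = 2
    cube = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    star = []
    for i in range(k):
        for level in (-alpha, alpha):
            point = [0.0] * k
            point[i] = level
            star.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    coded = cube + star + center

    def to_real(point):
        # Map coded levels back to real units around the range midpoint.
        return [
            (lo + hi) / 2 + c * (hi - lo) / 2
            for c, (lo, hi) in zip(point, ranges)
        ]

    return coded, [to_real(p) for p in coded]


# TiCl4 and Morpholine ranges from the tutorial, 5 center samples.
coded, real = central_composite_design([(0.6, 0.9), (3.7, 7.3)], n_center=5)
print(len(coded), "experiments")
for c, r in zip(coded, real):
    print([round(v, 2) for v in c], [round(v, 3) for v in r])
```

With two variables this gives 4 cube samples, 4 star samples and 5 center samples, which is the total of 13 experiments to check in the Last Checks dialog.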

Tutorial D - Computation of the Response Surface


After the new experiments have been performed and their results collected, you are ready to analyze them so as
to find the optimum. You do this by finding the levels of TiCl4 and Morpholine that give the best possible
yield. You will need to use a Response Surface Analysis.



Running a Response Surface Analysis for Enam_ccd

Task
Run a Response Surface Analysis.

How to Do It
Normally, you would first have to enter the response values, but this has already been done. From the
Designed Data list in the Open File dialog, open the file named Enam_ccd, which already contains the
response values.

Choose Task - Response Surface. In the Response Surface dialog (Lookup Image D012), make the
following selections:

Samples: Default

X-var: Design Vars + Int + Squ (2+3)

Y-variables: Default

Click OK to start the analysis.

When the computations are done, click View to study the results. Do not forget to save the file before you start
interpreting the results!
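
The X-var selection "Design Vars + Int + Squ (2+3)" means that the model contains the 2 design variables plus 3 additional terms (1 interaction and 2 squares). As a rough sketch, with hypothetical function names, the model columns can be listed and computed like this:

```python
from itertools import combinations


def quadratic_terms(names):
    """List the effect names of a full quadratic model."""
    main = list(names)
    interactions = [f"{a}*{b}" for a, b in combinations(names, 2)]
    squares = [f"{a}^2" for a in names]
    return main, interactions, squares


def expand_row(values):
    """Expand one row of design values into quadratic model columns."""
    main = list(values)
    interactions = [a * b for a, b in combinations(values, 2)]
    squares = [a * a for a in values]
    return main + interactions + squares


main, inter, squ = quadratic_terms(["TiCl4", "Morpholine"])
print(main, inter, squ)
print(len(main), "+", len(inter) + len(squ))
print(expand_row([1.0, -1.0]))
```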

Interpret Analysis of Variance Results for the Enam_ccd Response Surface

Task
Interpret the results from the Response Surface Analysis.

How to Do It
The viewer displays a Response Surface Overview, which consists of 4 plots (Lookup Image D013):
Analysis of Variance, Residuals, Response Surface visualized as a contour plot, and Response Surface
visualized as a landscape plot.

First, study the ANOVA results. Use Window - Copy To - 1 to copy the upper left plot to sub-view 1
(which covers the whole Viewer window). You can adjust the width of the various columns of the table if
necessary (Lookup Image D014). Study in turn: Summary, Model Check, Variables, and Lack of Fit.

The Summary shows that the model is globally significant, so we can go on with the interpretation.

The Model Check indicates that the quadratic part of the model is significant, which shows that the
interaction and square effects included in the model are useful.

The ANOVA table for variables displays the values of the b-coefficients, and their significance. You see
that the most significant coefficients are for the linear and quadratic effects of Morpholine; the quadratic
effect of TiCl4 is close to the 0.05 significance level. That section of the table also tells you that the
maximum point is reached for TiCl4=0.835 and Morpholine=6.504; the information displayed on top of
the table shows a Predicted Max Point Value of 96.747.
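
For a quadratic model in two variables, the maximum point follows directly from the b-coefficients: it is the point where both partial derivatives are zero. The sketch below shows that calculation. The coefficient values are made up for illustration; they are not the Enam_ccd coefficients, and the function name is an assumption.

```python
def stationary_point(b0, b1, b2, b12, b11, b22):
    """Locate the stationary point of the quadratic surface
    y = b0 + b1*x1 + b2*x2 + b12*x1*x2 + b11*x1**2 + b22*x2**2.

    Returns (x1, x2, y, kind), where kind is 'maximum', 'minimum' or 'saddle'.
    """
    # Setting both partial derivatives to zero gives a 2x2 linear system.
    det = 4.0 * b11 * b22 - b12 ** 2
    if det == 0:
        raise ValueError("the surface has no unique stationary point")
    x1 = (-2.0 * b22 * b1 + b12 * b2) / det
    x2 = (-2.0 * b11 * b2 + b12 * b1) / det
    y = b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2 + b11 * x1 ** 2 + b22 * x2 ** 2
    if det > 0 and b11 < 0:
        kind = "maximum"
    elif det > 0 and b11 > 0:
        kind = "minimum"
    else:
        kind = "saddle"
    return x1, x2, y, kind


# Made-up coefficients describing a hill with its top at (1, 2).
x1, x2, y, kind = stationary_point(b0=1.0, b1=2.0, b2=8.0, b12=0.0, b11=-1.0, b22=-2.0)
print(x1, x2, y, kind)
```

The coordinates come out in the same units (coded or real) as the variables used to estimate the coefficients.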




The Lack of Fit section tells you that, with a p-value around 0.19, there is no significant lack of fit in the
model. Thus we can trust the model to describe the response surface adequately.

Checking the Residuals for the Enam_ccd Response Surface

Task
Check the residuals from the Response Surface Analysis.

How to Do It
The upper right sub-view (if necessary, use Window - Go To - 5) in the Response Surface Overview plot
shows a Normal Probability plot of the residuals. This plot can be used to detect any outliers. Here, you see
that the residuals form two groups (positive residuals and negative ones). Apart from that, they lie roughly
along a straight line, and no extreme residual is to be found outside that line. This means that there is no
apparent outlier.

From that window, go to Plot - Residuals and select Y-Residuals vs Predicted Y on the General sheet
(Lookup Image D015). Try alternatively the two options Residuals (which shows the raw residuals) and
Studentized (which shows transformed residuals that can be compared to a Student distribution).
In the Studentized residuals plot (Lookup Image D016), all values are within the (-2;+2) range, which
confirms that there are no outliers. Furthermore, there is no clear pattern in the residuals, so nothing seems to
be wrong with the model.
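
The (-2;+2) check can be illustrated with a simplified calculation. Note the assumption: the sketch below only divides each residual by the overall standard deviation of the residuals (standardized residuals). True Studentized residuals also correct for the leverage of each sample, which is not done here. The data values are invented.

```python
import statistics


def standardized_residuals(measured, predicted):
    """Residuals divided by their overall standard deviation.

    Simplification: no leverage correction, so these are standardized
    rather than truly Studentized residuals.
    """
    residuals = [m - p for m, p in zip(measured, predicted)]
    sd = statistics.stdev(residuals)
    return [r / sd for r in residuals]


def flag_outliers(values, limit=2.0):
    """Return the indices of values outside the (-limit, +limit) range."""
    return [i for i, v in enumerate(values) if abs(v) > limit]


# Invented data: the last sample is badly predicted.
measured = [10.0, 12.0, 11.0, 13.0, 12.0, 30.0]
predicted = [10.5, 11.5, 11.2, 12.8, 12.1, 20.0]
scaled = standardized_residuals(measured, predicted)
print([round(v, 2) for v in scaled])
print("suspect samples:", flag_outliers(scaled))
```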

Select Plot - Predicted vs Measured and choose Predicted vs Measured. If necessary, use View -
Trend Lines - Regression Line to display the regression line (blue), and View - Trend Lines - Target
Line to visualize the y=x line (black) (Lookup Image D017).
You can see how the design samples are spread around the regression line; in particular, the Center samples to
the right of the plot show an important spread. This is why so few effects in the model are very significant:
There is quite a large amount of experimental variability.

Interpret the Enam_ccd Response Surface Plots


Now that the model has been thoroughly checked, you can use it for final interpretation. This is most easily
done by studying the two plots which visualize the response surface.

Task
Interpret the response surface plots.

How to Do It
The landscape plot displayed in the lower right quadrant shows you the shape of the response surface: a kind of
round hill with a maximum somewhere between the center and maximum values of the design variables.
That plot is not precise enough to spot the coordinates of the maximum; the contour plot displayed left
(Lookup Image D018) is better suited for that purpose. For instance, you can change the scaling to zoom
around the optimum, so as to locate its coordinates more accurately. Check that they match what is displayed
in the ANOVA table.



You can also click at various points in the neighborhood of the optimum, to see how fast the predicted values
decrease. You will notice that the top of the surface is rather flat, but that the further away you go, the more
steeply the Yield decreases.

Finally, you may also have noticed that the Predicted Max Point Value is smaller than several of the actually
observed Yield values (sample Cube004a for instance has a Yield of 98.7). This is not paradoxical, since the
model smoothes the observed values; those high observed values might not be reproduced if you performed the
same experiments again.

Drawing a Conclusion from the Enam_ccd Optimization Design


The Response Surface Analysis gave a significant model, in which the quadratic part in particular was
significant, thus justifying the optimization experiments.

Since there was no apparent lack of fit, no outliers, and the residuals showed no clear pattern, the model could
be considered valid and its results interpreted more thoroughly.

The response surface showed an optimum predicted Yield of 96.747 for TiCl4=0.835 and Morpholine= 6.504;
the predicted Yield is larger than 95 in the neighboring area, so that even small deviations from the optimal
settings of the two variables will give quite acceptable results.



SIMCA Classification (Tutorial E)
Description of Tutorial E
Context of Tutorial E
The data to be classified in this tutorial is taken from the classical paper by Fisher (see Bibliographical
References). The task is to see whether three different types of iris flowers can be classified by four
measurements made on them: the length and width of the Sepal and Petal.

What You Will Learn in Tutorial E


Tutorial E contains the following parts:

Make models of different classes

Classify new data

Diagnose the classification model

Tutorial E Data Table


The data table is stored in the file Tutor_e and contains 75 training (calibration) samples and 75 testing
samples.

The training samples are divided into three Sample Sets, each containing 25 samples. The three Sets are:
Setosa, Versicolor, and Virginica. The Sample Set Testing will later be used to test the classification.

Four variables are measured: Sepal length, Sepal width, Petal length, and Petal width. The measurements
are given in centimeters.

Note: You will find the illustrations for this tutorial (Image E001, etc) at the end of the document.

Tutorial E - Re-format Data Table


Whenever working with classification, it is very useful to identify samples belonging to the same class under
all circumstances in the raw data table and on PCA or classification plots.
In order to do this, we need to create a category variable stating class membership for all samples.

Task
Insert a category variable into the Tutor_e data table.



How to Do It
Open the file Tutor_e from the Examples folder.
Select Edit - Insert - Category Variable. This starts the Category Variable Wizard where you should
type in a name for the new Category Variable, e.g. Iris, and choose option I want my levels to be based on
a collection of sample sets.
Click Next and in the next dialog, move the three sets Setosa, Versicolor, and Virginica from right to left.
Click Finish to complete the operation.
There is now a new variable in your data table (Lookup Image E001) - as you can see, the type of iris is
written for each of the 75 training samples.

Enter the right type for each of the 75 test samples. A simple way to do this is as follows:
Click on the first cell containing "m". From the keyboard, type in "m" (which activates the entry mode on the
cell) then "v" (initial of Versicolor), followed by <Enter>. You are now positioned in the next cell; apply the
same procedure, until you reach the first Setosa sample. There, type in "m" and "s" followed by <Enter>. Go
on like this, until you reach the first Virginica sample. There, type in "m", "v" and "v" (we need to type in
"v" twice to activate the second level which has "v" as initial).
Save the data table once you have completed this task.

Tutorial E - Graphical Clustering Based on Score Plots


It is always a good idea to start a classification with a PCA model of all samples. If you do not know the
classes in advance, this is a way of doing the clustering. The calibration samples must be assigned to the
different classes.

Task
Make a PCA model of all calibration samples.

How to Do It
Use Task - PCA and select the following parameters:

Samples: Training

Variables: Measurements

Weights: 1/SDev

Validation Method: Leverage correction

Number of PCs: 4

We assume that you are familiar with making models by now. Refer to one of the previous tutorials if you have
trouble finding your way in the PCA dialog.
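
The weighting option 1/SDev, combined with the centering done in PCA, amounts to standardizing each variable. The sketch below shows that operation on a few illustrative iris-like rows (not read from Tutor_e). It assumes the sample standard deviation with n-1 in the denominator; the exact formula used by the program is not stated here.

```python
import statistics


def autoscale(table):
    """Center each column and weight it by 1/SDev."""
    columns = list(zip(*table))
    means = [statistics.mean(c) for c in columns]
    sdevs = [statistics.stdev(c) for c in columns]
    return [
        [(v - m) / s for v, m, s in zip(row, means, sdevs)]
        for row in table
    ]


# Illustrative rows: Sepal length, Sepal width, Petal length, Petal width (cm).
rows = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
]
scaled = autoscale(rows)
for row in scaled:
    print([round(v, 2) for v in row])
```

After this transformation every variable has the same variance, so that no measurement dominates the model only because of its scale.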

You see that there are few outlier warnings and most of the variance is explained by three PCs. Click View to
look at the modeling results.



Activate the explained variance plot and click on the Cal button so that only Validation variance remains on
the plot. (Lookup Image E002)
We see that the Explained Validation Variance is 91% with 2 PCs.

Activate the score plot and select Edit - Options. Enable sample grouping and select Value of Variable in
the Group By field. Make sure Leveled Variable 1 is selected. Click OK. (Lookup Image E003) You can
see the three groups in different colors; one very distinct (Setosa) and two that are not so well separated
(Versicolor and Virginica). This indicates that it may be difficult to differentiate Versicolor from Virginica.

Tutorial E - Make Class Models


Before we classify new samples, each class must be described by a PCA model. These models should be made
independently of each other. This means that the number of components must be determined for each model,
outliers found and removed separately, etc.

Task
Make PCA models for the three classes Setosa, Versicolor, and Virginica.

How to Do It
Go back to the Editor window containing your re-formatted data table. Select Task - PCA and make a model
with the following parameters:

Samples: Setosa

Variables: Measurements

Weights: 1/SDev

Validation: Leverage correction

Number of PCs: 4

When the model is computed, close the PCA Progress dialog and save the class model with name Setosa.
Repeat the procedure successively on Sample Sets Versicolor and Virginica, also saving each new PCA
model.

Tutorial E - Classify Unknown Samples


When the different class models have been made and new samples are collected, it is time to assign them to the
known classes. In our case the test samples are already in the data table, ready to use.

Task
Assign the Sample Set Testing to the classes Setosa, Versicolor, and Virginica.

How to Do It
Select Task - Classify. Use the following parameters: (Lookup Image E004)




Samples: Testing

Variables: Measurements

Make sure that Centered Models is checked. Add the three models Setosa, Versicolor, and Virginica.
The suggested number of PCs to use is 3 for all models; keep that default (it is based on the variance curve for
each model). If you are curious, you may select a model in the list and click Variance to display the
calibration and validation variances for that model.
Click OK to start the classification.

Tutorial E - Interpretation of Classification Results


The classification results are displayed directly in a table, but you may also investigate the classification model
closer in some plots.

Interpret the Classification Table

Task
Interpret the classification results displayed in a table plot.

How to Do It
Click View when the classification is finished. (Lookup Image E005)
A table plot is displayed, called Classification Table. There are three columns: one for each class model.
Samples recognized as members of a class (they are within the limits on sample-to-model distance and
leverage) have a star * in the corresponding column.

The significance level can be toggled with the Significance option, which is available as a drop-down menu
from the menu bar.

At the 5% significance level, we can see that all but three samples (false negatives: virg1, virg36, virg42) are
recognized by their rightful class model.
However, some samples are classified as belonging to two classes (false positives): 12 Versicolor samples are
also classified as Virginica, while 6 Virginica samples are also classified as Versicolor. Only the Setosa
samples are 100% correctly classified (no false positives, no false negatives).

If you raise the significance limit to 25%, this reduces the number of false positives but also increases the
number of false negatives (vers41 and Virg35 are added).
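
The counting of false negatives and false positives in a classification table can be sketched as follows. The sample names and memberships are hypothetical and only show the principle: a false negative is a sample rejected by its own class model, and a false positive is a sample accepted by a model of another class.

```python
def tally(true_class, memberships):
    """Count false negatives and false positives in a classification table.

    true_class:  dict mapping sample name to its actual class
    memberships: dict mapping sample name to the set of class models
                 that accepted the sample (the stars in the table)
    """
    false_neg = [s for s, c in true_class.items() if c not in memberships[s]]
    false_pos = [
        (s, m)
        for s, c in true_class.items()
        for m in sorted(memberships[s])
        if m != c
    ]
    return false_neg, false_pos


# Hypothetical extract of a classification table.
true_class = {"vers01": "Versicolor", "virg01": "Virginica", "seto01": "Setosa"}
memberships = {
    "vers01": {"Versicolor", "Virginica"},  # doubly classified
    "virg01": set(),                        # rejected by every model
    "seto01": {"Setosa"},                   # correctly classified
}
false_neg, false_pos = tally(true_class, memberships)
print("false negatives:", false_neg)
print("false positives:", false_pos)
```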

Interpret the Coomans Plot


If a sample is doubly classified, you should study both Si (sample-to-model distance) and Hi (leverage) to find
the best fit; at similar Si levels, the sample is probably closest to the model to which it has the smallest Hi. The
classification results are well displayed in the Coomans plot.
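
The rule of thumb above for doubly classified samples can be written as a small decision function. This is only a sketch: the tolerance that decides when two Si values count as "similar" is an assumption introduced for the example, and the numerical values are invented.

```python
def best_class(fits, si_tolerance=0.1):
    """Pick the most likely class for a doubly classified sample.

    fits: dict mapping class name to (si, hi), i.e. sample-to-model
          distance and leverage for that class model.
    si_tolerance: relative difference below which two Si values are
          treated as similar (hypothetical setting).
    """
    smallest_si = min(si for si, _ in fits.values())
    # Keep the models whose Si is close to the smallest one.
    close = {
        name: (si, hi)
        for name, (si, hi) in fits.items()
        if si <= smallest_si * (1 + si_tolerance)
    }
    # Among those, the smallest leverage Hi decides.
    return min(close, key=lambda name: close[name][1])


# Similar Si values: the leverage decides.
print(best_class({"Versicolor": (1.00, 0.30), "Virginica": (1.05, 0.10)}))
# Clearly different Si values: the distance decides.
print(best_class({"Versicolor": (1.00, 0.30), "Virginica": (2.00, 0.10)}))
```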



Task
Look at the Coomans plot.

How to Do It
Select Plot - Classification and choose the Coomans plot for models Virginica and Versicolor. (Lookup
Image E006)

This plot displays the sample-to-model distance for each sample to two models. The newly classified samples
(from sample set Testing) are displayed in green color, while the calibration samples for the two models are
displayed in blue and red. (Lookup Image E007)

The Coomans plot for the classes Virginica and Versicolor shows that all Setosa samples are far away from
the Virginica model (they appear far to the right). However, we can see that many Virginica and Versicolor
samples are within the distance limits for both models. This suggests some classification problems.

Interpret the Si vs Hi Plot


We also have to look at the distance from the model center to the projected location of the sample, i.e. the
leverage. This is done in the Si vs. Hi plot.

Task
Look at the Si vs. Hi plots.

How to Do It
Select Plot - Classification and choose Si vs. Hi for model Versicolor. Before you start interpreting the plot,
turn on Sample Grouping in the Options dialog and choose Name as Markers Layout, with length 2 (tick
only the first two boxes in the Name field). (Lookup Image E008) The plot is much easier to interpret: iris
type appears clearly with the initials Se, Ve, Vi in three different colors.

Some Virginica samples are classified as belonging to the class Versicolor, but most samples that are not
Versicolor are outside the lower left quadrant. The reason for the difficult classification between Versicolor
and Virginica is that the samples are overlapping in the score plot. They are very similar with respect to the
width and length of the sepal and petal.

Tutorial E - Diagnosing the Classification Model


In addition to the Coomans and Si vs Hi plots, there are three more plots that give us information regarding
the classification.

Interpret Model-to-Model Distance

Task
Look at the Model Distance plots.



How to Do It
Select Plot - Classification and choose the Model Distance plot for the Versicolor model. (Lookup
Image E009)

This plot allows you to compare different models. A distance larger than three indicates good class separation.
The models are different.

It is clear from this plot that the Setosa model is different from the Versicolor, while the distance to Virginica
is smaller.

Interpret Discrimination Power

Task
Look at the Discrimination Power plots.

How to Do It
Select Plot - Classification and choose the Discrimination Power for Versicolor projected onto the
Setosa model.

This plot tells which of the variables are most useful in describing the difference between the two types of
iris. (Lookup Image E010) We can see that variables Sepal Length and Sepal Width have high
discrimination powers (7.5 to 8) while it is lower for Petal Length and Petal Width (4.5 to 5).

Do the same for Versicolor onto Virginica: all variables have discrimination powers lower than 5. This is
obviously not enough.

Interpret Modeling Power

Task
Look at the Modeling Power plots.

How to Do It
Select Plot - Classification and choose the Modeling Power for Versicolor.

Variables with a modeling power near one are important for the model. A rule of thumb says that variables
with modeling power less than 0.3 are of little importance for the model.

The plot tells us that all variables have a modeling power larger than 0.3, which means that all variables are
important for describing the model. None of the variables should be deleted from the modeling. The only
chance to improve on the classification between Versicolor and Virginica is to measure some additional
variables.



Interacting with Other Programs (Tutorial F)
Description of Tutorial F
Context of Tutorial F
You probably want to use The Unscrambler together with other programs in your daily work. This could be a
word processor you use to document your latest work, or instrument software.

In this tutorial we show you some of the capabilities The Unscrambler has to interact with other programs
under the Windows operating system. The main focus here is how The Unscrambler is used in conjunction
with other software.

What You Will Learn in Tutorial F


Tutorial F contains the following parts:

Import data file;

Drag'n drop from other programs;

Insert category variable;

Insert and edit plots from within another program;

Write an ASCII-MOD file.

Tutorial F Data Table


The data you work on in this tutorial are NIR spectra of wheat samples collected at a mill. Fifty-five samples
were collected and the NIR spectra were taken with an instrument using 20 channels.

The water content of wheat samples was measured and is the response variable in the data.

Note: You will find the illustrations for this tutorial (Image F001, etc) at the end of the document.

Tutorial F - Import Spectra from an ASCII File


Data are stored in many different ways. The simplest and most flexible way is to store data in ASCII files.

Task
Import the ASCII file Tutor_F.txt.



How to Do It
Start The Unscrambler and go to File - Import - ASCII. Select the file Tutor_F.txt in the Import dialog
and Click Import.

This launches the Import ASCII dialog, where you specify what the ASCII file looks like (Lookup Image
F001). Use the options displayed in the dialog. Note that the first row in the data file contains variable names
and the first column contains sample names.

Click OK to import the file and the data are read into an Editor.
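
To see what such a file layout means in practice, here is a small sketch that reads a text table whose first row holds variable names and whose first column holds sample names. The layout of Tutor_F.txt itself is not reproduced here: the sample names, variable names and values below are invented, and whitespace-separated fields without spaces inside names are assumed.

```python
import io


def read_named_table(stream):
    """Read a whitespace-separated table with names in row 1 and column 1.

    Returns (sample_names, variable_names, data).
    """
    rows = [line.split() for line in stream if line.strip()]
    header, body = rows[0], rows[1:]
    n_vars = len(body[0]) - 1
    # Works whether or not the header row starts with a corner label.
    variable_names = header[-n_vars:]
    sample_names = [row[0] for row in body]
    data = [[float(v) for v in row[1:]] for row in body]
    return sample_names, variable_names, data


# Invented example content, standing in for a real ASCII file.
text = """Sample 1445 1680 1722
W01 0.412 0.355 0.298
W02 0.430 0.361 0.301
"""
samples, variables, data = read_named_table(io.StringIO(text))
print(samples)
print(variables)
print(data)
```

To read a real file, pass an opened file object instead of the io.StringIO stand-in.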

Tutorial F - Import Responses from Excel


Spreadsheet programs are becoming more and more used. It is easy to transfer data between such a program
and The Unscrambler. The water content of the wheat samples is stored in an Excel file together with the
sample names.

Task
Import the data file Tutor_F.xls from Excel.

How to Do It
There are two procedures. Use Procedure I if you have Excel or Lotus installed on your Personal Computer or
Procedure II if you do not have a spreadsheet program that can read the file Tutor_F.xls. You only need to
follow one of the procedures.

Procedure I: Drag'n Drop from Excel


Start Excel and read the file Tutor_F.xls. Tile The Unscrambler and Excel so that you can see both programs
at the same time. Mark the area that contains the data, i.e. cells A2:A56.

You are now going to drag the selected data area to the first variable in the Editor in The Unscrambler. Hold
down <Ctrl> and click on one of the sides of the marked area; the cursor changes and you see a + sign on top
of the cursor. Drag the data from Excel to the Editor in The Unscrambler that contains the wheat data. Note
how a frame marks the data area that is covered by the data you copy. Let go of the left mouse button when
you see that the frame covers the first variable completely, i.e. from sample 1 and down.

The dialog Select Drop Method appears (Lookup Image F002). Select Insert as 1 new column. Import
the sample and variable names from Excel the same way.

Procedure II: Import Excel file


Place the cursor in the leftmost column and go to Edit - Insert - Variable. Select File - Import - Excel
(Lookup Image F003). First you have to specify in the Import Target dialog how you want the data from
the Excel file to be imported. Select Import data into - Current data table (from origin) in the Import
Target dialog and click OK. This will take you to the Import dialog.

Find the file Tutor_F.xls in the Import dialog and Click Import. This launches the Import Worksheet
dialog, where you specify the options (Lookup Image F004). The Excel file is prepared by defining Range
Names and Sheet range. This makes it easy to choose the data area. You may also enter the cell ranges
manually, if you remember them in your own files.

Select Water Content for Range names against Data and specify A2:A56 in the Sheet range field. Delete the
entries A1:A1 in the sheet range for Sample names when you import data without names. In the Sheet
range field against Variable names specify A1:A1. Then click OK.

Tutorial F - Insert Category Variable


Category variables are useful to calculate statistics and to use in plot interpretation.

Task
Insert a category variable to group the samples into three categories, depending on the water content level.

How to Do It
Place the cursor in the first column and select Edit - Insert - Category variable. This launches the
Category Variable Wizard - Enter Variable Name and Choosing Method dialog (Lookup Image
F005). Enter a name for the variable in the first dialog, select I want to specify the levels manually under
Method and Click Next to enter the next dialog, where you specify the levels. Add three levels: Low (Water <
13.0), Medium (13.0 < Water < 15.0), and High (Water > 15.0).

Enter the category values according to the distribution above. Double click the category variable cell and select
the drop-down list. A list of the valid levels is displayed. A faster way to enter the value is to double-click the
cell and type the first character of the desired level. Type the character repeatedly if many levels begin with
the same character.
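
The assignment of levels from the water content can also be expressed as a simple rule. In this sketch, values of exactly 13.0 or 15.0 are placed in the Medium level; that boundary choice is an assumption made for the example, and the water values are invented.

```python
def water_level(water):
    """Return the category level for a water content value (in %)."""
    if water < 13.0:
        return "Low"
    if water <= 15.0:  # assumption: boundary values count as Medium
        return "Medium"
    return "High"


# Invented water content values.
for value in (12.4, 13.7, 15.6):
    print(value, water_level(value))
```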

The name of the category variable is written in blue text to distinguish this kind of variable from the ordinary
ones.

Tutorial F - Define Sets


Sets are the cornerstone in The Unscrambler's analyses. We therefore have to define the necessary Sets. We
also have to set the properties of the data table correctly in order to get the optimal predefined plots after the
analysis.

Task
Define the Variable Sets NIR Spectra and Water Content. Change the data table properties to Spectra.

How to Do It
In the Editor, mark variable number two which now contains the water content of the wheat samples. Select
Modify - Edit Set and make sure that Variable Sets is selected. Click Add to define the Set Water
Content from current Editor. Define another Variable Set NIR Spectra using variables 3 to 22. Change the
Data Type to Spectra for both. We do not need to define a Sample Set because All Samples is automatically
defined as a Set.



The predefined plots change a little depending on the status of the data table. Models based on spectra are
easier to interpret with line loading plots than 2D scatter plots, for instance.

Tutorial F - Make a PLS Model


The NIR spectra should contain information which makes it possible to predict the water content from them.
Let us make a model and find out.

Task
Make a PLS1 model from NIR spectra to the Water Content.

How to Do It
Select Task - Regression and specify the following parameters in the Regression dialog:

Method: PLS1

Samples: All Samples [55]

X-variables: NIR Spectra [20]

Y-variables: Water Content [1]

Weights: All 1.0

Validation method: Leverage Correction

Number of components: 5

You see how the model describes more and more of the water content.

Interpretation of the Regression Overview


The most important PLS analysis results are given in the regression overview plot.

Task
Look at the model results.

How to Do It
Click View in the dialog when the model is made. The following plot appears: (Lookup Image F006)

The residual Y-variance goes down nicely and is close to 0 after two PCs. The Predicted vs Measured plot
looks OK. The fit is quite good. From the regression Coefficients we see that there is a distinct peak around
1940.

Save the results file under the name Tutor F.



Tutorial F - Inserting Plots into Word
Much of the work we do requires documentation of the results. It may therefore be necessary to transfer plots
from The Unscrambler into a word processor.

Task
Transfer plots from The Unscrambler into Word using Copy and Paste.

How to Do It
Open Word. Select the score plot in the regression overview plot and select Edit - Copy. Go to Word and
place the cursor where you want the plot to appear. Select Edit - Paste. The score plot is now inserted as a
graphical object in your Word document.

The plot can be transferred either as a bitmap or a picture file. The picture file option will usually give better
quality of the plot, but also larger Word files. You may want to use the bitmap option if you transfer plots with
many plot objects.

You choose between the two options from File - System Setup and the Viewer tab.

Tutorial F - Export ASCII-MOD File

Task
Export an ASCII-MOD file.

How to Do It
Open Results - Regression and select the PLS model you made. The ordinary thing to do would be to open
the regression overview plot and look at the different predefined plots for this model. But now we take a look
at the numerical results in the model that is available.

Click Variance to see the variances for different PCs in the model (Lookup Image F007). Scroll through
the information field to look at properties of your model.

Click Export to write the ASCII-MOD file to disk.

Take a look at the ASCII file that is generated. The format of the file is described in chapter Technical
References, available as .PDF file from CAMO's web site www.camo.com/TheUnscrambler/Appendices .

Tutorial F - Export Data to ASCII File


A common file format that most programs read is the simple ASCII file. There are different ways of writing
the ASCII file. You have to decide what it should look like based upon your knowledge of how your other
programs prefer to read the ASCII files.



Task
Write the Wheat data table to an ASCII file.

How to Do It
Activate the Wheat Editor and select File - Export. Make sure that the File Format is set to Flat ASCII /
Wide ASCII before you Click OK. Specify the ASCII file as suggested in the Export ASCII dialog (Lookup
Image F008).

Wide ASCII means that each sample is written as a row in the ASCII file with a paragraph mark to tell the end
of the row. The sample and variable names are written as the first column and first row in the ASCII file.

Open the file in an ASCII editor and look at the file. All names are enclosed in double quotes.
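
A file with the same general layout (one row per sample, names in the first row and first column, names in double quotes) can be produced with a few lines of Python. This is a sketch of the layout only: the tab separator and the "Sample" corner label are assumptions, and the names and values are invented.

```python
import csv
import io


def write_wide_ascii(stream, sample_names, variable_names, data):
    """Write one row per sample, with quoted names and unquoted numbers."""
    writer = csv.writer(
        stream,
        delimiter="\t",                 # assumed separator
        quoting=csv.QUOTE_NONNUMERIC,   # quote text fields only
        lineterminator="\n",
    )
    writer.writerow(["Sample"] + list(variable_names))
    for name, row in zip(sample_names, data):
        writer.writerow([name] + list(row))


# Invented values standing in for the wheat data.
buffer = io.StringIO()
write_wide_ascii(
    buffer,
    ["W01", "W02"],
    ["Water", "1445"],
    [[12.8, 0.412], [14.1, 0.43]],
)
print(buffer.getvalue())
```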



Experimental Design: Mixture (Tutorial G)
Description of Tutorial G
Context of Tutorial G
This application, inspired by an example in John A. Cornell's reference book Experiments With Mixtures,
illustrates the basic principles and specific features of mixture designs.

A fruit punch is to be prepared by blending three types of fruit juice: watermelon, pineapple and orange. The
purpose of the manufacturer is to use their large supplies of watermelons by introducing watermelon juice, of
little value by itself, into a blend of fruit juices. Therefore, the fruit punch has to contain a substantial amount
of watermelon - at least 30% of the total. Pineapple and orange have been selected as the other components of
the mixture, since juices from these fruits are easy to get and relatively inexpensive.

The manufacturer decides to use experimental design to find out which combination of those three ingredients
maximizes consumer acceptance of the taste of the punch.

What You Will Learn in Tutorial G


This tutorial contains the following parts:

Build a suitable design for a mixture optimization;

Import response values from Excel;

Check response variations with Statistics;

Analyze the results with PLS and Martens' Uncertainty Test.

Tutorial G - Design Variables and Responses


The ranges of variation selected for the experiment are as follows:

Ranges of variation for the fruit punch design

Ingredient Low High


Watermelon 30% 100%
Pineapple 0% 70%
Orange 0% 70%

The responses of interest for the manufacturer are detailed in the table below.

Responses for the fruit punch design

Variable               Type of Measurement                                        Target
Consumer acceptance    Average of 63 individual ratings on a 0-5 scale            Maximum
Production cost        Computed from mixture composition and raw material cost    Minimum
Sweetness              Average ratings by sensory panel on a 0-9 scale            Descriptive only
Bitterness             Average ratings by sensory panel on a 0-9 scale            Descriptive only
Fruitiness             Average ratings by sensory panel on a 0-9 scale            Descriptive only

Consumer acceptance is the most important response, but if the analysis of the results should reveal two areas
with equally high consumer acceptance, the mixture with lower production cost will be preferred. The sensory
descriptors are here to provide an explanation for consumer acceptance and directions for further improvement
(for instance by adding sugar or sweetener if the consumers seem to prefer sweeter mixtures).

Note: You will find the illustrations for this tutorial (Image G001, etc) at the end of the document.

Tutorial G - Build a Simplex Centroid Design


Since we have only three design variables, we can build an optimization design right away. We will see during
our progress in the Design Wizard that a simplex centroid design is the most suitable.

Task
Build a simplex centroid design with the help of the Design Wizard.

How to Do It
Use File - New Design to start the Design Wizard. The first dialog is Design Wizard - Select Method to
Use, where you select option From Scratch and click Next to proceed.
You enter dialog Design Wizard - Select Design Type, where you select option Mixture Design
(Lookup Image G001); you can see that the contents of the Information field at the bottom of the dialog
box are updated and give you some advice about the selected type of design. For instance, the last sentence
states that for optimization purposes we should add interactions and squares to our model. We will remember
that!
Click Next; this starts the Design Wizard - Define Mixture Variables dialog where we will create a new
variable for each of our fruit juices. Click New to access the Add Design Variable dialog. Type in the
details of the first fruit juice (Lookup Image G002):
Name: Watermelon
Lower Bound: 30 %
Upper Bound: 100 %
Click OK to accept your choices and go back to the Design Wizard - Define Mixture Variables dialog.
Apply the same procedure to specify the other fruit juices (Pineapple and Orange, varying from 0% to 70%).

Once the three mixture variables have been defined, check that their details are correct (Lookup Image
G003) before proceeding. You may notice that the field Mix Sum is set to 100%; this is correct in our case,
and could be tuned down (e.g. to 90%) if we had some additional mixture ingredient with a constant amount
(e.g. sugar 10%). Click Next to proceed.

In the next dialog, you have the possibility to define Process Variables (i.e. other design variables which are
not part of the mixture). As we do not need any of those, just click Next.
You are now in the Design Wizard - Define Non-design Variables dialog, where you should specify
your responses. Click New to access the Add Non-design Variable dialog; type in the name of the
response variable. Do that for each response: Accept, Cost, Sweet, Bitter, Fruity. Click Next when all five
responses are specified.

The next dialog, called Design Wizard - Define Model, allows you to add terms to a default linear model.
As you can see (Lookup Image G004), the only available choice in our case is Mixture Interactions and
Squares. Tick that box and proceed with Next.
This leads you to the Design Wizard - Define Design Purpose dialog, where the system detects that your
purpose must be Optimization since you have added interactions and squares. Click Next to proceed.
The next dialog is Design Wizard - Design Type (Mixture). It recommends a Simplex-Centroid Design
with Interior Points (Lookup Image G005), and we accept that choice. Click Next to proceed.

In the next dialog called Design Wizard - Design Details, we accept the default choice of 1 Replicate and
3 Center Samples and click Next. In the Design Wizard - Randomization Details (General) dialog,
just click Next to proceed.
In the Design Wizard - Last Checks dialog, check that all details of the design are correct (Lookup
Image G006). Should anything be different from what you were supposed to have chosen, go back as many
dialogs as necessary with the Back button, then move forward again.
Click Preview to have a look at the randomized list of experiments. If you are not happy with the
randomization, click OK to go back to the main dialog then Re-randomize to start a new randomization (then
click Preview again to check the result). If you wish to print out the randomized list of experiments, click Lab
Report then OK.

Once you have made all necessary checks and corrections, click Finish; this displays an information dialog
(Lookup Image G007) (click OK after reading its contents) and opens the new designed data table into the
Editor (Lookup Image G008). Be aware that if you need to do any further corrections after that, you will
have to use command File - Duplicate - As Modified Design to access the Design Wizard once again.
Save the new table with File - Save (you may call it Fruit Punch empty for instance).
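
To see what kind of experiments such a design contains, here is a minimal Python sketch that generates simplex-centroid points with interior points for lower-bounded mixture variables. It is an illustration only, not the algorithm used by the Design Wizard: the function name, the labels and the choice of placing interior points halfway between the centroid and each vertex are our own assumptions. With 3 center samples it yields 12 runs, which is consistent with the 12 samples used in the rest of this tutorial.

```python
from itertools import combinations

def simplex_centroid_with_interior(lower, mix_sum=100.0, n_center=3):
    """Return (label, mixture) pairs in real units (percent).

    Hypothetical helper: points are built in pseudo-components
    (fractions summing to 1) and rescaled above the lower bounds.
    """
    q = len(lower)
    free = mix_sum - sum(lower)  # amount left once all lower bounds are met

    def to_real(pseudo):
        return tuple(lo + free * p for lo, p in zip(lower, pseudo))

    names = {1: "vertex", 2: "edge"}
    points = []
    # Equal-share blends of every subset smaller than the full set
    for size in range(1, q):
        for subset in combinations(range(q), size):
            pseudo = [1.0 / size if i in subset else 0.0 for i in range(q)]
            points.append((names.get(size, f"blend-{size}"), to_real(pseudo)))
    centroid = [1.0 / q] * q
    # Interior points: assumed halfway between the centroid and each vertex
    for i in range(q):
        pseudo = [(c + (1.0 if j == i else 0.0)) / 2 for j, c in enumerate(centroid)]
        points.append(("interior", to_real(pseudo)))
    # Replicated center samples
    points += [("center", to_real(centroid))] * n_center
    return points

# Watermelon >= 30%; Pineapple and Orange >= 0% (their 70% maximum follows)
design = simplex_centroid_with_interior(lower=[30.0, 0.0, 0.0])
for label, (w, p, o) in design:
    print(f"{label:8s} Watermelon={w:6.2f} Pineapple={p:6.2f} Orange={o:6.2f}")
```

The list is not randomized; in practice the run order should be shuffled, as the Design Wizard does.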

Tutorial G - Import Response Values from Excel


To save you the trouble of manually entering all response values, you will just need to import them from an
Excel spreadsheet.

Task
Import response values from Excel into your designed data table.

How to Do It
Once your designed data table is displayed in the Editor, place the cursor on the first empty cell (sample 1,
variable Accept) and select File - Import - Excel.
In the Import Target dialog (Lookup Image G009), choose option Current data table (from cursor
position) and click OK.
In the Import dialog, make sure that you are accessing files from the Examples folder within your
Unscrambler Data directory, and select the Excel Spreadsheet File called Fruit Punch Original. Click
Import; this launches the Import Target dialog (Lookup Image G010) where you should specify the
spreadsheet cell ranges that contain the data you need to import. Select as follows:

Sheet name       Responses
Data             Response_data
Sample names     Sample_names
Variable names   Response_names

Click OK after double-checking your choices: you are now back in your data table, with the response values
filled in.
Select File - Save As and give the table a new name (for instance Fruit Punch).

Tutorial G - Check Response Variations with Statistics


You will now run a first analysis, Statistics, and interpret the results with the following questions in mind:

How much does each response vary?

Is there more variation over the whole design than over the replicated Center samples?

Is there any response value outside the expected range?

Task
Run Statistics, display the results as plots, check response variations and look for abnormal values.

How to Do It
With your Fruit Punch data table displayed in the Editor, select Task - Statistics.
Choose the following settings in the Statistics dialog:
Sample Set: All Samples (12)
Variable Set: Cont Non-Design Vars (5)
Calculate Cross-Correlation: not selected
then click OK to start the computations.

Click View in the Statistics Progress dialog: the Statistics results are displayed as two plots (Lookup
Image G011). The upper plot is Percentiles, the lower Mean and SDev.
Save the results file as Fruit Punch Stats.

Let us have a look at the upper plot: Percentiles.
If you have never interpreted a box-plot (or Percentiles plot) before, hit <F1> and read the description that is
displayed in the Help window; use the Back button on top to go back to the Tutorial instructions if you are
reading them on line.
Click on the leftmost blue box on the Percentiles plot to display the min, max, median etc. for the first response
(Accept). Check that the ranges of variation are within the expected range for that response (0-5): with a
minimum of 1.6 and a maximum of 4.1, it looks fine.
Do the same for the other responses. NB- Cost is supposed to take values of 0 (pure Watermelon) or more
(mixtures of Watermelon with purchased juices). The three sensory responses should take values from 0-9.

Now we are going to display the same two plots for Design samples and Center samples, in order to compare
variation over the whole design to variation over the replicated Center samples. If the experiments have been
performed correctly, there should be much more variation among design points than among the three replicates
of the Center sample.
Select Plot - Statistics; in the Statistics dialog (Lookup Image G012), look at the Compressed sheet
and focus on the Sample Groups field. Design should already be selected; select Center as well (you can
see that the plot preview is updated as a result, now showing several groups in different colors) and click OK.
The Percentiles and Mean and SDev plots are now displayed for two groups (Lookup Image G013). The
bars or boxes for Design samples appear in blue and for Center samples, in red (unless you are using your own
color scheme).
On the Percentiles plot, you can see that there is much more variation among design points than among the
Center samples. This also appears clearly on the Mean and SDev plot: for instance, if you click successively
on the blue and red bars for variable Accept, you will see that SDev is 0.75 for Design samples and only 0.25
for Center samples.

Conclusions:

The ranges of variation of the 5 responses are as expected.

There is no abnormal value for any response.

There is much more variation over the whole design than among the Center samples, which suggests that
the experiments were performed correctly.
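
The same three checks can be sketched outside The Unscrambler in a few lines of Python. This is only an illustration under stated assumptions: the helper names are ours, and the ratings below are invented values, not the tutorial data.

```python
from statistics import mean, stdev

def summarize(values):
    """Minimum, maximum, mean and standard deviation of one group."""
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "sdev": stdev(values)}

def check_response(design_values, center_values, low, high):
    """Compare spread over design points and center replicates,
    and list any value outside the expected [low, high] range."""
    everything = list(design_values) + list(center_values)
    return {
        "design": summarize(design_values),
        "center": summarize(center_values),
        "out_of_range": [v for v in everything if not low <= v <= high],
    }

# Hypothetical acceptance ratings on a 0-5 scale (NOT the tutorial data)
design_accept = [1.6, 2.1, 2.4, 2.9, 3.0, 3.3, 3.6, 3.8, 4.1]
center_accept = [3.1, 3.3, 3.5]

report = check_response(design_accept, center_accept, low=0, high=5)
print("SDev design:", round(report["design"]["sdev"], 2))
print("SDev center:", round(report["center"]["sdev"], 2))
print("Out of range:", report["out_of_range"])
```

A center SDev that is clearly smaller than the design SDev, together with an empty out-of-range list, corresponds to the conclusions drawn above.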

Tutorial G - Model the Mixture Response Surface with PLS


We are now ready to start modeling, that is to say study the quantitative relationships between fruit punch
composition and consumer acceptance, production cost and sensory properties of the mixtures.
The easiest way to do so is with PLS2, selecting cross validation and Uncertainty testing so as to diagnose the
stability of the model.
Once the model has been diagnosed (and errors, if any, corrected), the results will be interpreted by plotting a
Response Surface for each response variable.

Task
Build a PLS model of the response variations, validate it with cross validation and uncertainty testing. View
the results and check the model.

How to Do It
Switch back to the Editor window containing the Fruit Punch data table, and run Task - Regression. In the
Regression dialog, make the following choices:

Method               PLS2
Sample Set           All Samples (12)
X-Variables          Design Def Model (3+6)
Weights for X-vars   All 1/SDev
Y-Variables          Cont Non-Design Vars (5)
Weights for Y-vars   All 1/SDev
Validation Method    Cross Validation
Uncertainty test     Selected
Model Size           Full
Num PCs              5
Issue Warnings       Selected

Click OK, then have a look at the PLS2 Regression Progress dialog (Lookup Image G014). The
model needs 4 PCs, and even then the residual Y-validation variance is quite high (0.50). We can also see that several
warnings have been issued, especially for PC 0 (that is to say, at the Centering stage of the computations) and
PC 1.
This suggests some problems in the data; maybe an outlier? We will have to investigate.
Click View to access the Viewer where the regression results are displayed.
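
The X-variable set Design Def Model (3+6) holds the 3 mixture variables plus 6 additional model terms. The Python sketch below shows one plausible reading of those 6 terms (3 two-factor interactions and 3 squares) and of the 1/SDev weighting. The function names are our own, and the exact terms and scaling used internally by The Unscrambler may differ.

```python
from itertools import combinations
from statistics import stdev

def expand_mixture_terms(names, rows):
    """Append two-factor interactions and squares to the linear terms."""
    pairs = list(combinations(range(len(names)), 2))
    labels = (list(names)
              + [f"{names[i]}*{names[j]}" for i, j in pairs]
              + [f"{n}^2" for n in names])
    table = [list(r)
             + [r[i] * r[j] for i, j in pairs]
             + [v * v for v in r]
             for r in rows]
    return labels, table

def scale_by_sdev(table):
    """Weight every column by 1/SDev (constant columns are left unchanged)."""
    sdevs = [stdev(col) for col in zip(*table)]
    return [[v / s if s > 0 else v for v, s in zip(row, sdevs)] for row in table]

# Vertices and edge centers of the constrained design region (percent)
mixtures = [(100, 0, 0), (30, 70, 0), (30, 0, 70),
            (65, 35, 0), (65, 0, 35), (30, 35, 35)]

labels, table = expand_mixture_terms(["Watermelon", "Pineapple", "Orange"], mixtures)
scaled = scale_by_sdev(table)
print(len(labels), "model terms:", labels)
```

After weighting, every column has a standard deviation of 1, so that terms on very different scales (percentages and their products) get the same chance to influence the model.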

Diagnosing the Model


We disregard the four plots of the regression overview and focus on finding out what causes the high residual
Y-variance and the warnings. Select Plot - Residuals (Lookup Image G015) and choose an Influence
Plot for Y-variables; double-click on the plot preview to get a full-subview plot, then click OK to produce the
plot.
Study the Influence plot (Lookup Image G016). One sample is far from the others, at the top of the plot:
sample 08 -Axis2 has a high residual Y-variance.
What is wrong with that sample? Select View - Outlier List and study the Outlier List (Lookup Image
G017). At the bottom of the list, we can read that Calibration sample 8 is badly described for Y-variable
4.

Correcting the Data and Making New Model


Go back to the Editor window containing the Fruit Punch data table, and look at the value of the 4th response
(Bitter) for sample 8 (08 -Axis2). It shows 7.30.
It turns out that this is an error, and the true value is 4.30. Type in the correct value, save the table (as Fruit
Punch corrected) and re-run Task - Regression with almost the same choices as before (remember to
select Uncertainty test) but choose 6 PCs this time. View the results.

Interpreting Variances
Look at the Variance plot. Display Explained variances (Cal and Val) for Y (Lookup Image G018). The
optimal number of PCs is 5, with a Calibration variance of 98% and a Validation variance of 85%. This is a
very good result.

Interpreting Correlation Loadings


Click on the plot of X- and Y-Loadings, and use the corresponding toolbar button to display Correlation Loadings (Lookup
Image G019).
In blue, the X-variables are located on the corners of a triangle, which reflects the 3 axes of the design
(Watermelon, Pineapple, Orange).
In red, the response variables are all located between the two ellipses, which means that they are well
explained by the model.
Responses Cost, Sweet and Fruity point in the direction opposite to Watermelon: these three responses
increase when Watermelon decreases. This is logical, since Watermelon is the cheapest, but also the least
sweet and fruity ingredient of the fruit punch.
Response Accept lies closer to design variable Orange. This means that a relatively high proportion of
Orange juice in the mixture is necessary to achieve high consumer acceptance.
In an opposite direction lies response Bitter, not very far from design variable Pineapple. This suggests that
mixtures with a high proportion of Pineapple have been found bitter by the sensory panel, and that bitterness
might explain low consumer acceptance.
NB- Is pineapple juice really bitter? The sensory panel may have reacted to astringency and called it
bitterness.

Interpreting Regression Coefficients Plots


Plot Regression Coefficients (based on 5 PCs) for variables Accept, Sweet, Bitter, Fruity (one in each sub-
view). For each plot, turn on View - Uncertainty Test - Uncertainty Limits to display the uncertainty
limits of the regression coefficients. Study the plots (Lookup Image G020).
The model is not very stable for response Accept: all uncertainty limits but one cross the zero line. This is
not surprising since consumer acceptance, resulting from many individual tastes, is likely to be a noisy
variable.
For the three sensory responses, however, the results seem quite stable, with generally smaller uncertainty
limits.

Note: Since this is a mixture model, all terms of the model are linked. Therefore it would be meaningless to
remove the non-significant effects from the model. This is why we do not mark the non-significant
coefficients nor recalculate the model without the marked variables, as we would have done in another context.

Interpreting Response Surface Plots


You are now going to plot the Response Surface for each response. Follow the instructions for the first
response.
Click on the upper left sub-view. Select Plot - Response Surface. From the General sheet of the
Response Surface dialog, choose the following:
Layout        Contour
Y-variable    1 Accept
Components    5

From the X-variables sheet (Lookup Image G021), choose the following:
Axis 1 Watermelon(A)
Axis 2 Pineapple(B)
Axis 3 Orange(C)

Double-check your choices then click OK. The Response Surface plot for variable Accept is now displayed in
the upper left sub-view.
Do the same in the other three sub-views with responses Sweet, Bitter and Fruity (Lookup Image G022).
Have a look at the four response surfaces and interpret them.

You may copy one of the plots to sub-view 1 (with Window - Copy To - 1) so as to study it in more detail.
Let us do so with response Accept (Lookup Image G023). We can see that consumer acceptance is low
(blue curves) for mixtures with high Watermelon or high Pineapple contents.
Maximum acceptance is reached for a fruit punch with relatively high Orange and low Pineapple. By
clicking on that point we can display its coordinates (A= 38.75, B= 16.04, C= 45.21) and the Accept value
(3.76).
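
Any point read off a mixture response surface should respect the design constraints. As a quick sanity check, the following Python sketch verifies that the coordinates above stay within the ranges of variation and add up to 100%. The function name and the tolerance are our own choices.

```python
def is_feasible(mixture, bounds, mix_sum=100.0, tol=0.01):
    """True if every ingredient is within its bounds and the total equals mix_sum."""
    within = all(lo - tol <= mixture[name] <= hi + tol
                 for name, (lo, hi) in bounds.items())
    return within and abs(sum(mixture.values()) - mix_sum) <= tol

# Ranges of variation of the fruit punch design (percent)
bounds = {"Watermelon": (30, 100), "Pineapple": (0, 70), "Orange": (0, 70)}

# Coordinates read from the response surface for Accept
optimum = {"Watermelon": 38.75, "Pineapple": 16.04, "Orange": 45.21}
print(is_feasible(optimum, bounds))
```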

Conclusions:

With the help of the Y-variance curve, the Influence plot and the Outlier List, we have found an error in
the data.

Once the data entry error has been corrected, the PLS2 model has good quality (high explained Calibration
and Validation Y-variance).

The Correlation Loadings show the underlying logic in response variations.

The Regression Coefficients have large uncertainties for response Accept, but are better for the sensory
responses.

The Response Surface plots show maximum consumer acceptance for a fruit punch with about 39%
Watermelon, 16% Pineapple and 45% Orange.

Three-Way PLS Analysis of Fluorescence Spectra (Tutorial H)

Description of Tutorial H

General Context of Tutorial H

In this tutorial we will utilize Fluorescence Excitation-Emission spectra to study the process of refining wood
into fibers for the production of fiberboard by steam treatment of various severities.
The original data are from the Institute of Applied Research (Prof. Kessler), Reutlingen University, Germany.

Detailed Context of Tutorial H


Steam treatment of wood at different temperatures results in a softening of the fiber composite structure as well
as in a separation of the wood into the main products cellulose, hemicelluloses and lignin. The flexibility of
this process allows producing a broad range of products ranging from fiberboard up to pulp. Due to the
complexity of this process it is important to know in detail the kinetics of degradation of the wood composite.
There are numerous investigations to characterize the raw material and reaction products by means of FTIR or
NIR spectroscopy, but little work has been done on Fluorescence spectroscopy, although Fluorescence is an
extraordinarily sensitive tool.

Fluorescence spectroscopy is able to distinguish similar molecules and can discriminate identical molecules in
different chemical environments. This is due to the possibility to scan excitation spectra at specified emission
wavelengths and to scan emission spectra at specified excitation wavelengths (EEM scans). This procedure
results in 3-D graphs of the fluorescence intensity with respect to different excitation and emission
wavelengths. But the EEM data are strongly intercorrelated and difficult to interpret. Standard unfolding
methods often give unsatisfactory results. We will use a three-way analysis approach to overcome this
problem.

What You Will Learn in Tutorial H


This tutorial contains the following parts:

Toggle 3D layouts in the 3D Editor;

Plot 3D data;

Define a Primary Variable set and a Secondary Variable set;

Build a three-way PLS regression model;

Find an outlier and recalculate;

Interpret a three-way PLS regression model

Tutorial H - Data Tables


The data for this tutorial are stored in files Tutor_h_X3D and Tutor_h_Y2D in the Examples directory on
your computer.

Wood Samples (X and Y Data)
The samples (objects) are common for the X and Y data tables. They consist of 32 fibre samples of steam
treated and refined woodchips. Two types of wood are studied: beech (B) is a hard wood and spruce (S) is a
soft wood. The wood samples were either fresh (F, 3 months) or old (O, 6 months). Two plate gaps of grinding
were used: fine (Fi) and coarse (C).
The sample names indicate this information. For example:

BFFi means Beech, Fresh and Fine

SOC means Spruce, Old and Coarse
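
The naming convention can be decoded mechanically, which is handy when grouping samples later on. Here is a small Python sketch; the function and dictionary names are ours, and it assumes that every sample name follows exactly the pattern described above, with no extra suffix.

```python
WOOD = {"B": "Beech", "S": "Spruce"}
AGE = {"F": "Fresh", "O": "Old"}
GRINDING = {"Fi": "Fine", "C": "Coarse"}

def decode_sample_name(name):
    """Split a name such as 'BFFi' into (wood type, age, grinding)."""
    wood, age, grinding = name[0], name[1], name[2:]
    return WOOD[wood], AGE[age], GRINDING[grinding]

print(decode_sample_name("BFFi"))
print(decode_sample_name("SOC"))
```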

Fluorescence Excitation-Emission Spectra (X Data)


The X-variables are fluorescence excitation-emission spectra. They are saved in a 3D data table
(Tutor_h_X3D) with 32 rows for the 32 woodchip samples, and 2046 columns corresponding to 66 Primary
Variables (Excitation) x 31 Secondary Variables (Emission). This is a so-called OV² table as it contains one
Object mode and two Variable modes.
The fluorescence spectra were measured in the following ranges: Excitation 250 - 575 nm with a step of 5 nm,
Emission 300 - 600 nm with a step of 10 nm.

Severity (Y Data)
The Y data is found in table Tutor_h_Y2D, consisting of 32 rows for the 32 woodchip samples and one
column, Severity.
Severity of steaming is a measure reflecting the duration and temperature of steam treatment. The spruce and
beech samples were treated with steam at temperatures from 160°C to 220°C. The Severity values range from
1.7 to 3.5.

Note: You will find the illustrations for this tutorial (Image H001, etc) at the end of the document.

Tutorial H - Toggle 3D Layouts in the 3D Editor


3D tables can be displayed in 12 different layouts. By easily changing the layout of a table, you will be able to
organize your data set as best suits your analysis needs.

Task
Toggle 3D data layouts.

How to Do It
Open the data file Tutor_h_X3D by selecting File - Open. It is a file of type 3D Data. (Lookup Image
H001)
The table opens in the 3D Editor. It is a table of OV² layout (1 object mode, 2 variable modes), therefore its
column numbers are two-fold. For example, column 1:6 corresponds to primary variable number 1 (Excitation
wavelength 250 nm) and secondary variable number 6 (Emission wavelength 350 nm). (Lookup Image
H002)

Use menu Modify - Toggle 3-D Layouts or its corresponding shortcut Ctrl+3. Using this menu once will
exchange Primary (now Emission spectra) and Secondary variables (now Excitation spectra). For example,
column 1:6 will now correspond to Emission wavelength 300 nm and Excitation wavelength 275 nm.
Several sub-menus of the Modify menu allow you to change the layout of a 3-D table, for example by
exchanging Primary and Secondary variables, or swapping layout from OV² to O²V (2 object modes, 1
variable mode). You may freely try some of these menus and observe how the table is transposed in 3
dimensions.

Toggle the layout several times (Ctrl+3) until you are back to an OV² table of size 32 x (66 x 31), that is to say
32 samples, 66 Primary Variables and 31 Secondary variables. The size of the table is shown at the bottom
right corner of the Editor. (Lookup Image H003)
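
The two-fold column numbers can be translated into wavelengths with simple arithmetic. The Python sketch below rebuilds both wavelength axes from the measurement ranges given in the data description and looks up a column label before and after exchanging Primary and Secondary variables. The function name is our own.

```python
EXCITATION = list(range(250, 576, 5))   # 250-575 nm, step 5 nm: 66 wavelengths
EMISSION = list(range(300, 601, 10))    # 300-600 nm, step 10 nm: 31 wavelengths

def column_wavelengths(label, primary=EXCITATION, secondary=EMISSION):
    """Translate a 'primary:secondary' column label into wavelengths (nm)."""
    p, s = (int(part) for part in label.split(":"))
    return primary[p - 1], secondary[s - 1]

# Excitation as Primary variable, Emission as Secondary variable
print(column_wavelengths("1:6"))                        # (250, 350)
# After exchanging Primary (now Emission) and Secondary (now Excitation)
print(column_wavelengths("1:6", EMISSION, EXCITATION))  # (300, 275)
```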

Tutorial H - Plot 3D Data


It is always recommended to study your raw data before engaging into modelling. Let us plot the raw spectra
of a few wood samples of Beech and Spruce and compare these. We will use a Matrix 3-D plot to display the
fluorescence spectra.

Task
Study the raw data by plotting the fluorescence spectra of a few wood samples.

How to Do It
Go to menu Plot - Matrix 3-D and select sample 13, BFFi (Beech, Fresh wood, Fine grinding). The
excitation-emission spectrum for this sample is displayed in the Viewer. (Lookup Image H004)

You may use the Rotate option (toolbar button or View - Rotate) to view the spectral landscape from various angles.
Use either the mouse or the arrow keys on your keyboard to rotate the plot. Holding your finger on an arrow
key will allow a continuous rotation of the plot; pressing the AltGr key at the same time will slow down the
rotation.

Menu Edit - Options (or the corresponding toolbar button) allows you to change the Plot Layout from a 3-dimensional Landscape
view into Contour or Map. (Lookup Image H005)
Go back to the 3D Editor and use menu Plot - Matrix 3-D to plot sample 29, SFFi (Spruce, Fresh wood, Fine
grinding). (Lookup Image H006)

Tutorial H: Interpretation of the Raw Fluorescence Spectra Plots


Both sample 13 and sample 29 were submitted to a high severity treatment (Severity values are indicated in the
Y-data table, Tutor_h_Y2D). Yet, they have very different spectra, showing that the degradation process of
the wood (softening of lignin with destruction of the hemicellulose lignin complex) is very different in soft-
and hardwood.
We can also notice that excitation wavelengths number 1-14 and 60-66, and emission wavelengths number 1-7,
do not contain information. We will therefore define a Primary Variable set with variables 15-59 (Excitation
320-540 nm) and a Secondary Variable set with variables 8-31 (Emission 370-600 nm) only.
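
The correspondence between wavelength ranges and variable numbers follows directly from the start and step of each axis. A minimal Python sketch (the function name is ours) that reproduces the two intervals:

```python
def index_interval(start_nm, step_nm, first_nm, last_nm):
    """1-based variable numbers covering first_nm..last_nm on a regular axis."""
    first = (first_nm - start_nm) // step_nm + 1
    last = (last_nm - start_nm) // step_nm + 1
    return first, last

# Excitation axis starts at 250 nm with a step of 5 nm
excitation_set = index_interval(250, 5, 320, 540)
# Emission axis starts at 300 nm with a step of 10 nm
emission_set = index_interval(300, 10, 370, 600)

print(excitation_set)  # (15, 59)
print(emission_set)    # (8, 31)
```

The two sets contain 45 and 24 variables respectively.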

Close your various matrix plots before proceeding with the tutorial.

Tutorial H - Define a Primary Variable Set and a Secondary
Variable Set
Defining variable sets allows you to keep the full spectra in the table, yet to work on just the relevant part of
the data.

Task
Define a Primary Variables set and a Secondary Variables set.

How to Do It
Go to menu Modify - Edit Set or use the corresponding shortcut Ctrl+E. This opens up the Set Editor
dialog. (Lookup Image H007)

Click on the Add button to open the New Primary Variable Set dialog. Use the following settings:

Name: Excitation 320-540 nm

Data type: Spectra

Interval: 15-59
Alternatively, click the Select button and select wavelengths 320 to 540 nm in the Select Variables
dialog.
(Lookup Image H008)

Click OK; you are back in the Set Editor dialog where you can see your Primary Variable Set.
Use the drop-down list and select option Secondary Variable Set. (Lookup Image H009). Click on the
Add button to open the New Secondary Variable Set dialog, and define a set as follows:
Name: Emission 370-600 nm


Data type: Spectra
Interval: 8-31

Alternatively, click the Select button and select wavelengths 370 to 600 nm in the Select Variables
dialog.
(Lookup Image H010)

Click OK; you are back in the Set Editor dialog where you can see your Secondary Variable Set. (Lookup
Image H011)

Note!
If you made any mistake in defining the variable sets, use the Properties button to return to the New
Primary/Secondary Variable Set dialog and make corrections accordingly.

Click OK; you are back in the 3D Editor. Use menu File-Save As to save the data sets information. You
may call your new table Tutor_h_X3D with sets. (Lookup Image H012)

Tutorial H - Build a Three-Way PLS Regression model
A Three-Way PLS regression relates a three-way X-data array (here: fluorescence intensity) to a one- or two-
way Y-data array (here: Severity of steam treatment).

Task
Set up the options for a Three-Way PLS Regression and launch the model calculations.

How to Do It
Make sure that your 3D data table Tutor_h_X3D with sets is on screen. Select Task - Regression to
open the Regression (Three-Way PLS) dialog. Choose the following options:

Sample Set: All Samples [32]

Match samples in X and Y Data Tables By row numbers

Pri. X-Vars: Excitation 320-540 [45]
Weights: All 1.0

Sec. X-Vars: Emission 370-600 [24]
Weights: All 1.0

Y-Variable File: Tutor_h_Y2D
Variable Set: Severity [1]
Weights: All 1.0

Validation Method: Cross Validation. Use the Setup button to choose Full Cross Validation

Num PCs: 10

Center Data: selected
(Lookup Image H013)

Note!
In the Y-variables sheet, you may have to Browse to find the Y-Variable File Tutor_h_Y2D.

Click OK to launch the calculations. The Three-Way PLS Regression Progress dialog appears. As the
calculations run, the Y-Validation Residual Variance curve per cross validation segment is shown. When the
calculations are over the Residual Y-Validation Variance curve for the global model is displayed. (Lookup
Image H014)
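
Full Cross Validation leaves out one sample at a time, so with 32 samples the model is recalculated on 32 segments. The Python sketch below only illustrates how such segments are formed; the function name is our own and no model is fitted.

```python
def full_cross_validation_segments(n_samples):
    """Yield (calibration_rows, validation_row) pairs, one per sample."""
    for left_out in range(n_samples):
        calibration = [i for i in range(n_samples) if i != left_out]
        yield calibration, left_out

segments = list(full_cross_validation_segments(32))
print(len(segments), "segments,", len(segments[0][0]), "calibration samples each")
```

Each sample is predicted exactly once by a model that has never seen it, which is what the per-segment validation curves in the progress dialog reflect.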

Hit the View button. The Regression Overview opens, showing four default plots. These are (clockwise):
Scores, X1-Loading Weights and Y-Loadings, Predicted vs. Measured, Residual Y-Validation Variance.
(Lookup Image H015)

Go to menu File - Save and save your model as Wood Severity_model 1

Tutorial H - Find an Outlier and Recalculate


Before interpreting a model, one should always check the model for potential outliers.

Task
Detect an outlier and recalculate the model without it.

How to Do It
Go to menu Plot - Sample Outliers. Keep the default settings and click OK. Four plots appear in the
Viewer: Scores, Influence, Y-Residual Sample Variance and X-Residual Sample Variance. (Lookup Image
H016)

Click on the Influence plot so that it is active, then use the X and Y toolbar buttons to display only X
information, or only Y information, or both. Sample 18 (SOFi) is an outlier with a high Residual Y-Variance.

Go to menu Edit - Mark - One By One or use the corresponding toolbar shortcut, then click on sample 18 in
the Influence plot. This sample is now marked by a circle on all plots. (Lookup Image H017)

Go to menu Task - Recalculate Without Marked. This brings up the Regression (Three-Way PLS)
dialog, and you can observe that sample 18 is shown in the Keep Out of Calculation field.
Check that the Cross Validation setup is still Full Cross Validation, and that the number of components
(Num PCs) is 10. (Lookup Image H018)

Click OK to compute a new model without sample 18 (Lookup Image H019). Click View to display the
Regression Overview. Go to menu Plot - Sample Outliers and check that no sample is outlying in this
new model. (Lookup Image H020)

Go to menu File - Save and save the new model as Wood Severity_model 2

Tutorial H - Interpret a Three-Way PLS Regression Model


Three-Way PLS regression models are interpreted in very much the same way as (2-way) PLS-R models. We
will focus our interpretation on some of the available pre-defined plots.

Tasks
Interpret the Y-Residual Validation Variance plot, the Scores, the Loading Weights,
the regression coefficients and the Predicted vs. Measured plot.

Tutorial H: Determine the Optimal Number of PCs for the Model

Task
Interpret the Y-Residual Validation Variance plot and determine optimal number of components (PCs).

How to Do It
Go to Plot - Regression Overview. This opens the Regression Overview dialog. In the last section of
this dialog, you can observe that the Suggested number of Components is 8. Keep the default settings and
click OK.
In the Regression Overview, study the bottom left plot: Y-Residual Validation Variance. (Lookup Image
H021)

Note!
If your plot differs from the picture, you may adjust it using the toolbar buttons which control Calibration
and/or Validation results, X or Y variables display, and Explained or Residual variance. To determine the
number of PCs for the model, you should look at the Y-Validation Variance (Residual or Explained).

The Y-residual validation variance shows a plateau from PCs 7-8, in agreement with the suggested number of
components given by the software. We decide to be conservative and use 7 PCs for this model.
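
One simple way to formalize "the curve shows a plateau" is to stop at the first number of PCs after which one more component no longer reduces the residual validation variance by a meaningful fraction. The Python sketch below applies such a rule. It is not the criterion The Unscrambler uses to suggest a number of components; the 5% threshold, the function name and the variance values are all invented for illustration.

```python
def first_plateau(residual_variance, rel_gain=0.05):
    """Smallest number of PCs after which adding one more PC lowers the
    residual variance by less than rel_gain (as a fraction of its value).

    residual_variance[k] is the residual validation variance with k PCs.
    """
    for k in range(len(residual_variance) - 1):
        current, nxt = residual_variance[k], residual_variance[k + 1]
        if current <= 0 or (current - nxt) / current < rel_gain:
            return k
    return len(residual_variance) - 1

# Hypothetical residual Y-validation variances for PC 0..10 (NOT tutorial results)
curve = [1.00, 0.70, 0.52, 0.40, 0.31, 0.24, 0.19, 0.15, 0.147, 0.146, 0.148]
print(first_plateau(curve))
```

Such a rule errs on the side of fewer components, in the same conservative spirit as the choice made above.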

Tutorial H: Interpret the Scores

Task
Interpret the Scores plot and find out if there are any clear groups of samples.

How to Do It
Activate the Scores plot (map of samples) by clicking on it; it is the plot situated in the upper left sub-view. The
sample names contain a lot of information. Let us focus on Wood type.

Go to Edit - Options or click on the corresponding toolbar shortcut. This opens the Options dialog. In the Markers Layout
field, choose option Name, then click on the first box. This will disable the following boxes, so that only the
first character in the sample name will be kept (Lookup Image H022). Click OK. The Sample names only
indicate S for Spruce wood (soft) or B for Beech wood (hard).

Click on the Next Vertical PC toolbar button, or use the Up arrow key on your keyboard to display the Scores
for PC1 vs. PC3. We can observe that PC3 separates the Spruce samples (to the bottom) from the Beech
samples (to the top). (Lookup Image H023)

Tutorial H: Interpret the Loading Weights


Let us find out which fluorescence information is carried by PC3, which separates Spruce from Beech.

Task
Interpret the X-Loading Weights and find out which information is carried by PC3.

How to Do It
Go to Plot - Loading Weights. In the Loading Weights dialog, select the following settings:

Plot type: Line

Vector 1: 1-3

Variables: X
(Lookup Image H024)

Click OK. The Loading Weights for excitation spectra (Primary variables, X1) appear in the top window and
the Loading Weights for emission spectra (Secondary variables, X2) appear in the bottom window. (Lookup
Image H025)

PC3 is represented in green on the plots. On the top plot, it shows a peak for excitation 355 nm. On the bottom
plot, it shows a peak for emission 400 nm.
These peaks describe the CH3O functional groups of hardwood and softwood. The CH3O functional groups
are higher in hardwood lignin than in softwood. This information is shown with PC3. The beech samples have
higher scores than the spruce samples for this PC.

Tutorial H: Interpret the Regression Coefficients

Task
Interpret the Regression Coefficients and find important absorption/emission bands.

How to Do It
Go to Plot - Regression Coefficients, and in the Regression Coefficients dialog choose the following
settings:

Plot type: Matrix

X-variables: Primary X Vs Secondary X

Y-variable: 1, Severity

Components: 7
Double click on the preview screen at the top of the dialog to enlarge the plot: the plot will be displayed in Full
Window (Lookup Image H026)

Click OK to display the regression coefficients plot. The plot is shown in landscape layout. (Lookup Image
H027) We can observe four major areas presenting high regression coefficients (three positive, one negative).
To better study the plot, use the rotate function (toolbar button or View - Rotate). Use either the mouse or the arrow
keys on your keyboard to rotate the plot. Holding down an arrow key rotates the plot
continuously; pressing the AltGr key at the same time slows down the rotation.

Menu Edit - Options (or the corresponding toolbar button) allows you to change the Plot Layout from a 3-dimensional Landscape
view into Map. Move your mouse over the Map plot to get the coordinates for excitation and emission
wavelengths. (Lookup Image H028)



Use the navigation buttons or the corresponding keyboard arrows to navigate the regression coefficients plot for
models of various numbers of components. This is available both in landscape and in map layouts.
As we navigate from PC1 through PC2 to PC3, we can see that the regression coefficients change in the region corresponding
to the absorption/emission band of the CH3O functional groups (excitation 360 nm, emission 400 nm) when
we include the third component. (Lookup Image H029)

Tutorial H: Interpret the Predicted and Measured Plot

Task
Interpret the Predicted and Measured plot and find out which samples are best predicted.

How to Do It
Go to Plot - Predicted vs Measured. In the dialog, choose the following settings:

Plot type: Predicted and Measured

Y-variable: 1, Severity

Components: 7

Samples: Calibration
(Lookup Image H030)

Click OK to display the plot. The blue curve corresponds to our model, while the red curve corresponds to the
measured values. The model fits well, yet several samples are not as well
predicted as the others. Moving the mouse over these samples to identify them shows that
fresh wood samples (F) are generally better predicted than old wood samples (O). (Lookup Image H031)

The RMSEC for the model is accessible from Plot - Predicted vs Measured. Choose settings:

Plot type: Predicted vs Measured

Y-variable: 1, Severity

Components: 7
Samples: Calibration

The RMSEC is 0.11, for steam treatment severity values that ranged from 1.7 to 3.5. This is about the same size as
the reproducibility of the severity measurement.
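
For reference, the RMSEC is the root mean square of the calibration residuals. The short Python sketch below assumes the usual definition, the square root of the mean squared difference between predicted and measured values. The function name rmsec and the severity values are hypothetical and are not taken from the tutorial data.

```python
import math

def rmsec(measured, predicted):
    """Root mean square error of calibration: sqrt(mean((predicted - measured)^2))."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - m) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical severity values (assumption: not the Tutorial H data).
measured = [1.7, 2.3, 2.9, 3.5]
predicted = [1.8, 2.2, 3.0, 3.4]
error = rmsec(measured, predicted)
print(round(error, 3))
```

Because the RMSEC is expressed in the same unit as the response, it can be compared directly with the range of the severity values and with the reproducibility of the reference measurement.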



Multivariate Curve Resolution of Dye Mixtures (Tutorial I)
Description of Tutorial I
Multivariate Curve Resolution (MCR) attempts recovery of response profiles (spectra, pH profiles, time
profiles, elution profiles, etc) of the components in an unresolved mixture of at least two constituents. This is
especially useful for mixtures obtained in evolutionary processes and when no prior information is available
about the nature and composition of these mixtures.
The Unscrambler MCR algorithm is based on pure-variable selection from PCA loadings to find the initial
estimation of spectral profiles, and then Alternating Least Squares (ALS) to optimize resolved spectral and
concentration profiles.
The algorithm can apply a constraint of Non-negativity in either spectral or concentration profiles or both.
It can also apply a constraint of Unimodality in concentration profiles that have only one maximum, and/or a
constraint of Closure in concentration profiles where the sum of the mixture constituents is constant.
The Unscrambler MCR functionality does not require any initial guess input. A mixture dataset suitable for
MCR analysis should have at least four samples and four variables. If no initial guess is typed in, the
maximum number of variables is 5000.
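
The ALS (alternating least squares) step can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not The Unscrambler's implementation: the initial spectra are passed in directly instead of being derived from PCA loadings, non-negativity is imposed by simply clipping negative values to zero, and the function name mcr_als and the synthetic data are hypothetical.

```python
import numpy as np

def mcr_als(D, S0, n_iter=50):
    """Resolve D (samples x variables) into C @ S.T by alternating least squares.

    S0 holds the initial spectral estimates, one column per component.
    Non-negativity is imposed by clipping negative values to zero after
    each least-squares step (a simple choice, assumed for this sketch).
    """
    S = np.asarray(S0, dtype=float).copy()
    for _ in range(n_iter):
        # Concentrations given spectra: least-squares solution of D = C S^T.
        C = np.clip(D @ np.linalg.pinv(S.T), 0.0, None)
        # Spectra given concentrations: least-squares solution of D = C S^T.
        S = np.clip((np.linalg.pinv(C) @ D).T, 0.0, None)
    return C, S

# Synthetic three-component mixture (hypothetical data, not the tutorial file).
rng = np.random.default_rng(0)
wavelengths = np.arange(250, 801, 10)            # 56 points, 250-800 nm
centers = [350.0, 500.0, 650.0]
S_true = np.column_stack(
    [np.exp(-((wavelengths - c) / 40.0) ** 2) for c in centers]
)
C_true = rng.uniform(0.2, 1.0, size=(36, 3))     # 36 mixture samples
D = C_true @ S_true.T

C_est, S_est = mcr_als(D, S_true)                # pure spectra used as initial guess
relative_residual = np.linalg.norm(D - C_est @ S_est.T) / np.linalg.norm(D)
print(relative_residual)
```

In this noise-free example the pure spectra are supplied as the initial guess, so the data are reproduced almost exactly. With real data and a rougher starting point, the residual would level off at the noise level instead.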

General Context of Tutorial I


In this tutorial we will utilize UV/Vis spectra of dye mixtures to extract pure dye spectra and their relative
concentrations. The data are from the Institute of Applied Research (Prof. Kessler), Reutlingen University,
Germany.

What You Will Learn in Tutorial I


This tutorial contains the following parts:

Run a basic MCR analysis

Plot MCR results

Interpret MCR results

Run an MCR analysis with initial guess

Validate MCR results with reference information

Import an MCR result matrix and convert estimated concentrations into real scale.

Note: You will find the illustrations for this tutorial (Image I001, etc) at the end of the document.

Tutorial I - Data Tables


The data for this tutorial are stored in file Tutor_i in the Examples folder of Unscrambler data. Use menu
File - Open to open the data set. (Lookup Image I001)



Samples
The samples (objects) consist of 39 spectra of dye mixture samples. Samples 1 to 3 are pure dyes of blue, green
and orange, respectively. Samples 4 to 39 are 36 mixture samples of those 3 dyes. Click Modify - Edit Set
to open the Set Editor dialog box, and select the Sample Sets option to see the list of existing sample sets.
(Lookup Image I002)

Variables
The first three variables are concentration measurements of blue, green and orange dyes. Variables 4 to 59 are
UV/Vis spectra measured over the range 250-800 nm in steps of 10 nm. In the Set Editor dialog box, select the
Variable Sets option to see the list of existing variable sets. (Lookup Image I003)
When you have seen the sets, click OK to leave this box and return to the data table.

Tutorial I - Plot Raw Data


Before starting the analysis, it is a good idea to have a look at the data.

Task
Plot the spectra of all mixture samples together:

How to Do It
1. Select the mixture samples 4-39 (either directly from the Editor, or with Edit - Select Samples; the set
you are interested in is called Mixture).

2. Use Plot - Line (or the corresponding toolbar button) and choose Variable set 250-800nm as scope for
the plot. (Lookup Image I004)

To plot the reference spectra of the three pure components, select samples 1-3 and make a Line plot of
Variable set 250-800nm. (Lookup Image I005)
To plot the reference concentrations of the three dyes, select columns 1-3 and make a Line plot of Sample set
Mixture. (Lookup Image I006)

Note:
Reference measurements of spectra and concentrations of pure components are not necessary to make your
data set suitable for MCR!

Tutorial I - Run MCR with Default Options

Task
Set up the options for an MCR analysis, launch the calculations and plot results.



How to Do It
When dataset Tutor_i is active on screen, click Task - MCR. The MCR dialog box with default settings
will open up. Select Mixture [36] under the Samples tab, and 250-800nm [56] under the Variables tab.
(Lookup Image I007)
Keep all other settings as default, then click OK. After the calculation is done, click the View button in the
MCR Progress dialog box to see the results. (Lookup Image I008)
The MCR results overview includes four plots, from upper-left to lower-right: Estimated Concentrations,
Estimated Spectra, Sample Residuals, MCR Fitting and Total Residuals, MCR Fitting. The results
overview plots are displayed at the optimum number of pure components, which the system estimates to be 3 in
this case. This optimal number of components (3) is displayed at the bottom of each plot. (Lookup Image
I009)
Click File - Save As... and save your first MCR model as Dye_Result1.

Tutorial I - Plot MCR results

Task
Plot MCR results for various numbers of pure components.

How to Do It
Actually, the Unscrambler MCR procedure generates several sets of results, covering a number of estimated
pure components from 2 to <optimum +1>. By default, the results are plotted for the optimal number of
components.
You may view the results for varying numbers of pure components. Let us plot the spectral profiles for a 2-component
solution. Click on the Estimated Concentrations plot to make it active (blue frame), then click Plot
- Estimated Spectra, select Number of Components as 2, and Profiles 1-2 as shown. (Lookup Image
I010)
Click OK: the plot of estimated spectra for a resolution with two pure components is displayed.
In a similar manner, click on the bottom left subview to make that plot active, then use Plot - Estimated
Spectra, to plot the 4-component solution.
MCR fitting and Principal Component Analysis (PCA) fitting results are also available for varying numbers of
pure components from 2 to <optimum +1>. Each fitting includes Variable Residuals, Sample Residuals and
Total Residuals plots. The plot of Total Residuals for MCR fitting is shown by default in the lower-right
subview. Like any other plot, it can also be accessed from the Plot menu. Click and activate the lower-right
subview, then click Plot - Residuals. In the MCR fitting tab, select Total Residuals. (Lookup Image
I011)
Click OK.
Here are the four plots which should now be displayed in your Viewer: (Lookup Image I012)

If the lower-right plot appears as a curve instead of bars, use Edit - Options (or the toolbar shortcut, or Ctrl+L) and select
Bars as Plot Layout.

Tutorial I - Interpret MCR results

Task
Determine the optimum number of pure components.

How to Do It
In the Total Residuals, MCR Fitting plot, residuals are high for 2 components, low for 3 components, and not
significantly decreasing for 4 components. (Lookup Image I012) This suggests that 3 components is the
optimum solution.
Click and activate the Estimated Spectra plot with 3 components, and enlarge it by clicking Window - Copy
To - 1. The toolbar contains a set of buttons which are used to navigate between results at
different numbers of components. Use the buttons to increase and decrease the number of components, and
watch the impact on the profiles.
As you can see, the 4-component solution contains two almost identical spectral profiles. This also suggests
that 4 components may not be the optimum number, and that the mixtures contain three pure components only.

Tutorial I - Run MCR with Initial Guess

Task
Run an MCR calculation with Initial Guess.

How to Do It
If prior knowledge such as spectra of pure components or concentrations of mixture samples exists, you may
include this information in the MCR calculation to help the algorithm converge towards the right solution of
curve resolution.
Go back to the Tutor_i data table by using menu Window - Tutor_i. Click Task - MCR. The MCR
dialog box with default settings will open up. In the dialog box, click Enable Initial Guess and select option
Spectra (Samples). (Lookup Image I013)
Click the Select button and pick rows 1 to 3 as initial guess for spectra (Lookup Image I014), then click
OK to return to the MCR dialog box.
Click OK to launch the calculations, then View to open the model results. (Lookup Image I015)
Save the result file as Dye_Result2.

Notes
1. When using the initial guess option, The Unscrambler requires all pure components to be included as
initial guess inputs. Partial reference will generate erroneous results. It is recommended to run MCR without
initial guess if only partial reference is available.
2. The Unscrambler only requires either the spectra or the concentrations of the pure components as initial guess input.

Tutorial I - Validate the Estimated Results with Reference Information

Task
We are going to compare the model's Estimated Concentrations for a 3-component solution to the existing
reference concentrations found in the data table and plotted earlier. As a first step, we are going to compare the
concentration profiles visually.

How to Do It
Select the Estimated Concentrations plot, then use menu Window - Copy To - 1. Reduce the window size of
the plot on your screen. Then go back to the data table (Window - Tutor_i) and build a line plot of the three
concentrations (first 3 columns of the table). Resize the windows of the two plots in order to compare them on
screen. (Lookup Image I016)
You can observe that the 1st estimated concentration profile is similar to the reference profile of the blue dye
(blue curves on the plots), the 2nd estimated concentration profile is similar to the reference profile of the green
dye (red curves on the plots), and the 3rd estimated concentration profile is very close to the reference
concentration of the orange dye (green curves on the plots).

Note!
Estimated concentrations are relative values within an individual component itself. Estimated concentrations of
a sample are NOT its real composition.

The estimated spectral profiles can be compared to the reference spectral profiles in the same way as for the
concentrations. Because we used the spectra as initial guess inputs in this example, the comparison shows a
perfect match. However, estimated spectra are unit-vector normalized; they are not the real spectral profiles
of the samples. (Lookup Image I017)
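
The reason why estimated concentrations and spectra are only relative can be illustrated numerically. The sketch below assumes that "unit-vector normalized" means each spectrum is scaled to Euclidean length 1; the small matrices are hypothetical. It shows that rescaling a spectrum and compensating in the concentrations leaves the mixture data unchanged, so the data alone cannot determine the absolute scale.

```python
import numpy as np

# Hypothetical concentrations (2 samples x 2 components)
# and spectra (3 variables x 2 components).
C = np.array([[1.0, 2.0],
              [3.0, 0.5]])
S = np.array([[0.2, 0.0],
              [0.4, 0.3],
              [0.1, 0.9]])
D = C @ S.T

# Normalize each spectrum to unit length and move the scale into the concentrations.
norms = np.linalg.norm(S, axis=0)
S_unit = S / norms
C_rel = C * norms

# The rescaled pair reproduces exactly the same data.
same_data = np.allclose(D, C_rel @ S_unit.T)
print(same_data, np.linalg.norm(S_unit, axis=0))
```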

Tutorial I - Import an MCR Result Matrix

Tasks

Import the MCR result matrix of estimated concentrations,

Compare the estimated concentrations to the reference concentrations in 2D scatter plots,

Convert the estimated concentrations into real scale.

How to Do It
Use menu File - Import - Unscrambler Results, and select your MCR result file Dye_Result2. Click
Import. The Import from MCR Result dialog box will open up. Select matrix Estimated Conc and type in 3
in the PCs box, to import the concentration profiles for a 3-component mixture system. (Lookup Image
I018)
Click OK to perform the importation. A new data table Dye_Result2_Estimated Conc is generated.
(Lookup Image I019)
Insert three empty rows at the top of this table, so that the table has a total of 39 rows. (Lookup Image I020)

Go to table Tutor_i, select the first three columns (blue, green and orange), copy them and paste them at the
beginning of the new data table. We now have a table of six columns, containing the three measured
concentrations of the pure dyes followed by the three estimated concentrations. (Lookup Image I021)

Select columns Blue and 1 (press the Ctrl key on your keyboard to select several columns at a time). Click
Plot - 2D Scatter to display a 2D Scatter plot of these columns. The correlation between estimated and
reference concentrations for the blue dye is 0.994. If the box containing plot statistics (including the
correlation) is not displayed in the upper left corner of your plot, use View - Plot Statistics to display it.
For the green dye (columns Green and 2 in the table), the correlation between estimated and reference
concentrations is 0.997.
As for the orange dye (columns Orange and 3), the correlation is 0.998. These very high correlations
indicate that the MCR calculations have determined concentration profiles accurately in this case. (Lookup
Image I022)
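
The correlation shown in the plot statistics is assumed here to be the ordinary Pearson correlation coefficient. The Python sketch below computes it with NumPy on hypothetical values (not the tutorial data) and also checks that it is unaffected by the scale of the estimated column, which is why it is a suitable comparison for relative concentrations.

```python
import numpy as np

# Hypothetical reference and estimated concentrations for one dye.
reference = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
estimated = np.array([0.01, 0.13, 0.25, 0.39, 0.50, 0.64])

# Pearson correlation coefficient between the two columns.
r = np.corrcoef(reference, estimated)[0, 1]

# Multiplying the estimates by a constant does not change the correlation.
r_scaled = np.corrcoef(reference, 15.0 * estimated)[0, 1]
print(round(r, 3), round(r_scaled, 3))
```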

Now let us convert the estimated Orange concentrations to real scale. In order to do this, at least one reference
measurement is needed. The estimated concentrations (in relative scale) of all samples can be converted into
real concentration scale by multiplying by a factor <real concentration / estimated concentration>.
In the present case, we can use for example sample PROBE_11, which has a reference concentration of Orange
dye of 7 and an estimated concentration of 0.4443.
Use menu Edit - Append - Variables to append a new column at the end of the table, and name it MCR
Orange real scale. Go to Modify - Compute General, and type in the expression: V7=V6*(7/0.4443)
in the Expression space. (Lookup Image I023)
Click OK to perform the calculation. The new column fills up with the values of estimated Orange dye
concentrations converted to real scale. (Lookup Image I024)
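
The same conversion can be written out as plain arithmetic. In the Python sketch below, the reference values for PROBE_11 (7 and 0.4443) come from the text above; the function name to_real_scale and the other estimated values are hypothetical.

```python
# Reference sample PROBE_11: known Orange concentration and its MCR estimate.
reference_conc = 7.0
estimated_conc = 0.4443
factor = reference_conc / estimated_conc

def to_real_scale(values, factor):
    """Multiply relative MCR concentrations by the conversion factor."""
    return [v * factor for v in values]

# Hypothetical estimated Orange values; the second one is PROBE_11 itself.
estimated_orange = [0.2000, 0.4443, 0.6100]
real_orange = to_real_scale(estimated_orange, factor)
print(round(factor, 3), [round(v, 2) for v in real_orange])
```

By construction, the reference sample maps back to its known concentration of 7. How well the other samples are converted depends on the quality of that single reference measurement.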

Constraint Settings in Multivariate Curve Resolution (Tutorial J)

Description of Tutorial J
Context of Tutorial J
In this tutorial we will utilize FTIR spectra of an esterification reaction to extract pure spectra and their relative
concentrations. The original data are from University of Rhode Island (Prof. Chris Brown), USA.
The esterification reaction of iso-propanol and acetic anhydride using pyridine as a catalyst in carbon
tetrachloride solution was monitored by FTIR. The initial concentrations of these three chemicals were 15%,
10% and 5% in volume, respectively. Iso-propyl acetate was one of the products in this typical esterification
reaction. The reaction was carried out in a ZnSe cell, and mixture spectra were measured at 4 cm-1 resolution.
The data set consisted of 25 spectra, covering approximately 75 minutes of the reaction. To shift the
equilibrium of the esterification, one-tenth of the volume was removed from the cell at 24, 45 and 60 minutes.
An equal amount of a single reactant was added to the cell in the sequence of acetic anhydride, pyridine and
iso-propanol.

What You Will Learn in Tutorial J


This tutorial contains the following parts:

Estimate the number of pure components and detect outliers with PCA

Run MCR with default settings

Tune the sensitivity to pure components setting

Run MCR with a constraint of closure

Use the Recalculate functionality in MCR

Tutorial J - Data Table


The data for this tutorial are stored in file Tutor_j in the Examples folder of Unscrambler data. Use menu File
- Open to open the data table. (Lookup Image J001)

Note: You will find the illustrations for this tutorial (Image J001, etc) at the end of the document.

Tutorial J - Estimate the Number of Pure Components and Detect Outliers with PCA
Principal Component Analysis (PCA) is recommended before running an MCR calculation. It provides some
information on the number of pure components and on sample outliers.

Task
Run a PCA on the raw data.

How to Do It
Click Task - PCA to run a Principal Component Analysis and choose the following settings:

Sample set: All Samples

Variable set: All Variables

Validation Method: Full cross-validation

Num PCs: 10
(Lookup Image J002)

Once the PCA calculations are done, click View to open the result viewer. (Lookup Image J003)

Click Plot - Loadings, select a plot of type Line, and type in value 1-3 in field Vector 1, so that the first
three principal components will be represented in the same line plot. (Lookup Image J004)
Click OK to display the plot.
Select another plotting area by clicking on it with the mouse, for example the upper-right subview. Click Plot -
Loadings, select a plot of type Line, and type in value 4-6 in field Vector 1. Click OK to display the plot.
(Lookup Image J005)

You can see that the loadings along the 6th principal component are quite noisy. The program recommends four
components as the optimal number of PCs in this model. Select the Explained Variance plot by clicking on it
with the mouse, then click View - Numerical. (Lookup Image J006) As you can see, the explained
variance globally reaches a plateau from the 4th principal component. The 5th and 6th PCs still show some slight
increase; at that stage, it is difficult to know whether they represent noise or real information.
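
The plateau in explained variance can be reproduced on synthetic data. The Python sketch below computes the cumulative calibration explained variance from a singular value decomposition of the mean-centered table. It is only an illustration: it does not perform the full cross-validation used in this tutorial, and the function name and the simulated data (25 samples, 4 underlying components plus noise) are hypothetical.

```python
import numpy as np

def explained_variance_percent(X, max_pcs=10):
    """Cumulative percentage of variance explained by the first PCs of mean-centered X."""
    Xc = X - X.mean(axis=0)
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    variance = singular_values ** 2
    cumulative = 100.0 * np.cumsum(variance) / variance.sum()
    return cumulative[:max_pcs]

# Hypothetical data: 25 samples, 60 variables, 4 underlying components plus small noise.
rng = np.random.default_rng(1)
scores = rng.normal(size=(25, 4))
loadings = rng.normal(size=(4, 60))
X = scores @ loadings + 0.01 * rng.normal(size=(25, 60))

cumulative = explained_variance_percent(X)
print(np.round(cumulative, 2))
```

In this simulation the curve flattens after the 4th component, and the remaining components only describe noise. With real spectra the transition is rarely this sharp, which is why the 5th and 6th PCs are hard to judge.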

Now, study the Influence plot at the bottom-left corner of the Viewer. You may observe that sample 1 sticks
out from the group of samples, with a high leverage and a high residual variance. Go to menu Plot - Sample
Outliers to display a combination of four useful plots for outlier detection. The plot of Residual Sample
Variance at the bottom-left corner indicates a high validation residual for sample 1. (Lookup Image J007)
As there is no validation check in MCR, we may use the outlier information obtained from PCA in our MCR
modelling later on.

Save the model file as PCA Tutorial J.

Tutorial J - Run MCR With Default Settings

Task
Build a first MCR model with default settings.

How to Do It

Using menu Window - Tutor_j, go back to the data table. Click Task - MCR and keep the default
settings:

Sample set: All Samples

Variable set: All Variables

Non-negative concentrations: selected

Non-negative spectra: selected

Closure: not selected

Unimodality: not selected

Sensitivity to pure components: 100

(Lookup Image J008)

Click OK to launch the calculations.

Note: MCR computations are demanding. Building the model can easily take several minutes depending on the
size of the data set, the selected options and the capacity of your machine.

Click View when the calculations are finished; the MCR result viewer opens. Notice that the program suggests
4 as the optimal number of pure components, by indicating (4) at the bottom of each plot. (Lookup Image
J009)

Save the model file as mIR Result1.

Tutorial J - Tune the Model's Sensitivity to Pure Components

Task
Read the MCR Message List and follow the system's recommendation for the Sensitivity to pure
components setting.

How to Do It
Click on menu View - MCR Message List in model mIR Result1 to check the recommendations given
by the system. There are four types of recommendations:

Type 1: Increase sensitivity to pure components

Type 2: Decrease sensitivity to pure components

Type 3: Change sensitivity to pure components (increase or decrease)

Type 4: Baseline offset or normalization is recommended.
In the present case, the system recommends changing the setting for sensitivity to pure components. (Lookup
Image J010)

The default setting (100) that was used for Sensitivity to pure components is usually a good starting point.
After interpreting the results and reading the system recommendations, you can tune it up or down between 10
and 190. The higher the Sensitivity, the more pure components will be extracted. Therefore, if too many
components are extracted, it is recommended to reduce the setting. Conversely, if you would like to see
more components at an almost undetectable level, or even some noise profiles, it is recommended to increase
the setting.

Let us build a model with an increased setting.

Go back to the data table and re-do the MCR calculation with a Sensitivity to pure components setting of
150. (Lookup Image J011)
The plot of Estimated spectra is now shown by default for 5 components instead of 4 in the previous model.
(Lookup Image J012)
One can compare those profiles with FTIR spectra of known constituents, and identify the 5 estimated spectra
(curves 1-5) as pyridine, iso-propanol, a possible intermediate, propyl acetate and acetic anhydride,
respectively.

Save the model file as mIR Result2.

Tutorial J - Run MCR with a Constraint of Closure

Task
Run MCR with a closure constraint. Compare two MCR models on the same data, with and without closure.

How to Do It
Among the MCR settings we have used so far, two types of constraints were not selected.
A constraint of Unimodality can be applied to restrict the resolution to concentration profiles that have only
one maximum.
With a constraint of Closure, the resolution will yield concentration profiles whose sum is constant.
In the present case, acetic anhydride was added at 24 minutes (between the 8th and the 9th samples), which
means that the first 8 samples can be treated in closure conditions.
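
To make the unimodality constraint more concrete, the Python sketch below shows one simple way of forcing a concentration profile to have a single maximum. It is an illustration only: the source does not describe how The Unscrambler implements this constraint, and the function name and the example profile are hypothetical.

```python
def enforce_unimodality(profile):
    """Return a copy of profile with a single maximum.

    Moving away from the largest value in either direction, any value that
    rises above its inner neighbour is lowered to that neighbour's level.
    """
    result = list(profile)
    peak = max(range(len(result)), key=result.__getitem__)
    # Left of the peak: values must not increase when moving away from the peak.
    for i in range(peak - 1, -1, -1):
        result[i] = min(result[i], result[i + 1])
    # Right of the peak: same rule.
    for i in range(peak + 1, len(result)):
        result[i] = min(result[i], result[i - 1])
    return result

# Hypothetical concentration profile with a spurious second bump at index 5.
profile = [0.1, 0.4, 0.9, 0.6, 0.3, 0.5, 0.2]
print(enforce_unimodality(profile))
```

A constraint of this kind only makes sense for profiles that are expected to rise and fall once, such as elution or reaction intermediates, which is why it is left unselected in this tutorial.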
Go back to the data table and run a new MCR model with the following settings:

Sample set: Closure [8]
(contains the first 8 samples of the data table)

Variable set: All Variables

Non-negative concentrations: selected

Non-negative spectra: selected

Closure: selected

Unimodality: not selected

Sensitivity to pure components: 100
(Lookup Image J013)

Once the computations are finished, save the model file as mIR Result3.

You may compare the resolved concentration and spectral profiles of pure components with and without the
closure setting. To do that, compute a new MCR model on sample set Closure without checking the Closure
constraint option. Save the new model file as mIR Result4 and compare the results to mIR Result3.
The spectral profiles under the constraint of closure present higher peaks for pure component 1 (blue) for
wavelengths around 110 and 1250 cm-1. (Lookup Image J014)
You can also observe that under constraint of closure, the concentrations of the pure components always add
up to 1. (Lookup Image J015)
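
The effect of the closure constraint on the concentrations can be sketched as a row normalization. The Python example below is a simplified illustration and not The Unscrambler's algorithm: in an ALS procedure such a step would typically be applied inside each iteration, whereas here it is shown once on hypothetical values. The function name apply_closure is also hypothetical.

```python
def apply_closure(concentrations, total=1.0):
    """Rescale each sample's concentrations so that they add up to `total`."""
    closed = []
    for row in concentrations:
        row_sum = sum(row)
        if row_sum <= 0:
            raise ValueError("each sample needs a positive concentration sum")
        closed.append([total * value / row_sum for value in row])
    return closed

# Hypothetical relative concentrations for three samples and three components.
estimated = [[0.2, 0.5, 0.1],
             [0.3, 0.3, 0.3],
             [0.6, 0.1, 0.1]]
closed = apply_closure(estimated)
print([round(sum(row), 6) for row in closed])
```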

Notes on MCR result interpretation


1. The spectral profiles obtained may be compared to a library of FTIR spectra in order to identify the nature of
the pure components that were resolved.
2. Estimated concentrations are relative values within an individual component itself. Estimated concentrations
of a sample are NOT its real composition.

Tutorial J - Remove Outliers and Noisy Wavelengths with "Recalculate"

Task
Use the interactive Recalculate functionality to remove samples or variables with high residuals.

How to Do It
Click menu Window - mIR Result1 to bring back your first MCR model on screen.

The Validation calculations of the PCA model that we built earlier indicated that Sample 1 was an outlier. We
can check this again in the MCR model by looking at the PCA fitting residuals. Click on the bottom-right
subview to highlight it, then use Plot - Residuals, choose sheet PCA Fitting and option Sample Residuals.
You may notice a high residual showing for Sample 1, compared to the other samples. Let us build a model
without this sample.

Use the marking tools to highlight sample 1 on one of the plots, for example the Sample Residuals,
PCA Fitting plot. (Lookup Image J016)
Click menu Task - Recalculate Without Marked to specify a new MCR calculation without sample 1.
(Lookup Image J017)
This brings you back to the MCR dialog, where Sample 1 is now included in the Keep Out Of Calculation
field. You may launch the calculations to get the new MCR results.

Note that similarly, you may want to keep out of the model non-targeted wavelength regions, or highly
overlapped wavelength regions.
Click Plot - Residuals and choose Variable Residuals. (Lookup Image J018)

Mark any unwanted variables on the plot using the marking tools, for example variables around 1100-1140
cm-1 which present very high residuals (Lookup Image J019), then use Task - Recalculate Without
Marked to specify a new MCR calculation.

General notes on MCR settings and interpretation


1. To have reliable results on the number of pure components, one should cross-check with a PCA result,
change the sensitivity to pure components setting, and use the navigation bar to study the MCR results for
various numbers of pure components.
2. Weak components (either low concentration or noise) are usually listed first.
3. One can utilize estimated concentration profiles and other experimental information to analyze a chemical/
biochemical reaction mechanism.
4. One can utilize estimated spectral profiles to study the mixture composition or even intermediates during a
chemical/biochemical process.

Tutorial C - Illustrations
C001 The Light Absorbance Spectrum (axes: Absorbance, log(1/T), versus wavelength in nm)

C002 The Select Samples dialog

C003 The Line Plot dialog



C004 Line plot of raw data (4 samples selected)

C005 The Statistics dialog

C006 The Import from Statistics Result dialog



C007 The Tutor_c data table (Xvar6 and Dye Level marked)

C008 The 2D Scatter plot of Dye Level against Xvar6

C009 The Scores plot for the first PLS1 model



C010 The Warning List for the first PLS1 model

C011 The Line Plot of Absorbance for Samples 7,8,9,10

C012 The PLS1 Regression Progress dialog



C013 The PLS1 Regression Overview (Tutorial C No Outliers)

C014 The Variances and RMSEP dialog



C015 The X-variance and Y-variance plots

C016 The Scaling dialog

C017 The Scatter Effects result



C018 The Multiplicative Scatter Correction dialog

C019 The Line Plot dialog

C020 The General - Line Plot of raw data (MSCorrected data)

C021 The Line Plot dialog with Source: Tutorial C No Outliers



C022 The General Line Plot of raw data (MSCorrected data and Original data)

C023 The Line Plot dialog with Source: Tutorial C and Matrix: ResYValTot

C024 Residual Variance plotted as General - Line Plot (Tutorial C, Tutorial C No Outliers and Tutorial C MSCorrected)



C025 The Variances and RMSEP dialog with RMSE sheet active

C026 The Multiplicative Scatter Correction dialog



C027 The Prediction dialog (Tutorial C MSCorrected)

C028 The Prediction result (MSCorrected Predicted samples)

Tutorial D - Illustrations

D001 The Add Design Variable dialog



D002 The Design Wizard - Design Details dialog

D003 The Enam FRD designed data table

D004 The Analysis of Effects dialog



D005 The Enam FRD Analysis of Effects results displayed with Significance Testing
Method: Center

D006 The Enam FRD Analysis of Effects results displayed with Significance Testing
Method: COSCIND

D007 The Effects dialog



D008 The effects displayed as Normal Probability Plot in the lower sub-view

D009 The Statistics results plotted as Percentiles and Mean and SDev (Design
Samples)



D010 The Statistics dialog with the Compressed sheet active

D011 The Statistics results plotted as Percentiles and Mean and SDev (Design
samples and Center samples)

D012 The Response Surface dialog with the X-var sheet active



D013 The Enam_ccd Response Surface Analysis results

D014 The ANOVA Table for the Response Surface model



D015 The Residuals dialog

D016 The Studentized Residual plot

D017 The Predicted vs Measured plot



D018 The Enam CCD Response Surface displayed as a Contour Plot

Tutorial E - Illustrations
E001 Data Table with category variable Iris

E002 Tutor_e Variances and RMSEP Plot

E003 Score plot of the Training samples with Sample Grouping

E004 Classification Dialog

E005 Classification Table for the Testing samples

E006 Classification Dialog (Coomans Plot)

E007 Coomans plot Versicolor vs. Virginica

E008 Si vs Hi plot for model Versicolor

E009 Model distance to Versicolor model

E010 Discrimination Power: Versicolor onto Setosa

Tutorial F - Illustrations
F001 The Import ASCII dialog

F002 The Select Drop Method dialog

F003 The Import Target dialog

F004 The Import Worksheet dialog

F005 The Category Variable Wizard Enter Variable Name and Choosing Method
dialog

F006 The PLS1 results displayed as Regression Overview

F007 The Variance dialog

F008 The Export ASCII dialog

Tutorial G - Illustrations

G001 The Design Wizard - Select Design Type dialog

G002 The Add Design Variable dialog

G003 The Design Wizard - Define Mixture Variables dialog with three defined
variables

G004 The Design Wizard - Define Model dialog

G005 The Design Wizard - Design Type (Mixture) dialog

G006 The Design Wizard - Last Checks dialog

G007 The Information dialog displayed upon exiting the Design Wizard

G008 The mixture design table displayed in the Editor (not yet saved)

G009 The Import Target dialog

G010 The Import Worksheet dialog - Selecting ranges for Data, Sample names and
Variable names

G011 The Statistics results displayed after clicking View

G012 The Statistics dialog with the Compressed sheet active

G013 The Statistics results for Design samples and Center samples

G014 The PLS2 Regression Progress dialog showing high residual variance and
several warnings

G015 The Residuals dialog - General sheet, Influence plot for Y selected (4 PCs)

G016 The Influence plot for the first PLS2 model

G017 The Outlier List for the first PLS2 model

G018 The Calibration and Validation variances for the second PLS2 model (after
correcting the data)

G019 The Correlation Loadings for X and Y (PC1, PC2)

G020 The Regression Coefficients for Accept, Sweet, Bitter and Fruity displayed
with Uncertainty Limits

G021 The Response Surface dialog with the X-variables sheet active

G022 The Response Surface for Accept, Sweet, Bitter and Fruity

G023 The Response Surface for Accept with the optimum coordinates and value

Tutorial H - Illustrations
H001 Open File dialog, opening the data file of type 3D Data for tutorial H

H002 Tutor_h_X3D data table displayed in the 3D Editor

H003 Tutor_h_X3D data table in OV2 layout, size 32x(66x31)

H004 Fluorescence spectra of sample 13 (Beech, Fresh, Fine, high Severity
treatment) in Landscape layout. Variables are in their series, not in real wavelength

H005 Fluorescence spectra of sample 13 (Beech, Fresh, Fine, high Severity
treatment) in Map layout. Variables are in their series, not in real wavelength

H006 Fluorescence spectra of sample 29 (Spruce, Fresh, Fine, high Severity
treatment) in Landscape layout. Variables are in their series, not in real wavelength

H007 Set Editor dialog for an OV2 data table. Primary Variable Sets, Secondary
Variable sets and Sample sets can be defined

H008 New Primary Variable Set dialog

H009 Set Editor dialog for an OV2 data table. A Primary Variable Set was defined, now
the Secondary Variable Sets option is selected to define a new set

H010 New Secondary Variable Set dialog

H011 Set Editor dialog for an OV2 data table. A Secondary Variable Set was defined

H012 Save As dialog

H013 Regression (Three-Way PLS) dialog

H014 Three-Way PLS Regression Progress dialog, Model 1 calculations

H015 Three-Way PLS Regression Overview, Wood Severity_model 1

H016 Sample Outliers plots, Wood Severity_model 1

H017 Sample Outliers plots, Wood Severity_model 1, with sample 18 marked with a
circle

H018 Regression (Three-Way PLS) dialog with sample 18 kept out of calculations

H019 Three-Way PLS Regression Progress dialog, Model 2 calculations

H020 Sample Outliers plots, Wood Severity_model 2

H021 Y Residual Validation Variance plot, Wood Severity_model 2

H022 Options dialog

H023 Scores Plot, PC1 vs. PC3, Spruce (S) and Beech (B) samples

H024 Loading Weights dialog

H025 X1-Loading Weights (Excitation spectra) and X2-Loading Weights (Emission
spectra) for PC1, PC2 and PC3

H026 Regression Coefficients dialog

H027 Regression Coefficients of Excitation Emission spectra for modelling Severity;
7 PCs, Landscape layout

H028 Regression Coefficients of Excitation Emission spectra for modelling Severity;
7 PCs, Map layout

H029 Regression Coefficients of Excitation Emission spectra for modelling Severity;
2 PCs (left) and 3 PCs (right), Map layout

H030 Predicted vs Measured dialog

H031 Predicted and Measured plot, Calibration results, 7 PCs

H032 Predicted vs Measured plot, Calibration results, 7 PCs

Tutorial I - Illustrations
I001 Tutor_i data table, size 39x59

I002 Tutor_i data table - Sample Sets

I003 Tutor_i data table - Variable Sets

I004 UV/Vis spectra of 36 mixtures at 250-800 nm
(Line plot; x-axis: Variables, 200 to 800; legend: PROBE_01, PROBE_1B, PROBE_02,
PROBE_2B, PROBE_03, PROBE_3B, PROBE_04)

I005 Spectra of original 3 pure dyes (blue, green and orange)
(Line plot; x-axis: Variables, 200 to 800; legend: BB_50, GR_50, OR_50)

I006 Concentrations of 3 pure dyes (blue, green and orange) in 36 mixtures
(Line plot; x-axis: Samples, 10 to 40; legend: Blue, Green, Orange)

I007 MCR dialog

I008 MCR Progress dialog

I009 MCR Overview, Dye_Result1 model

I010 Estimated Spectra dialog, plotting estimated spectra for a 2-component solution

I011 Residuals dialog, plotting total residuals at all components

I012 MCR plots, clockwise from top left: estimated spectra at 2 components,
estimated spectra at 3 components, total residuals and estimated spectra at 4
components.

I013 MCR calculation with initial guess

I014 Select Samples dialog, selection of the three pure dye spectra for the initial
guess

I015 MCR Overview, Dye_Result2 model with initial guess of spectra

I016 Comparing the estimated concentrations to the reference concentrations
Estimated concentrations: from model Dye_Result2 (left)
Reference: from data table Tutor_i (right)

I017 Comparing the estimated spectra to the reference spectra of the three pure dyes
Estimated spectra: from model Dye_Result2 (left)
Reference: from data table Tutor_i (right)

I018 Import from MCR Result dialog

I019 Imported matrix of estimated concentrations for a 3-component mixture

I020 Imported matrix after insertion of three empty rows at the top

I021 Reference and estimated concentrations of the three dyes.

I022 2D Scatter Plot of original and estimated Orange concentrations from the MCR
model Dye_Result2

I023 Compute dialog

I024 Editor with a column presenting the estimated Orange concentrations converted
to real scale.

Tutorial J - Illustrations
J001 Data Table Tutor_j

J002 PCA dialog box

J003 PCA Overview

J004 Loadings dialog

J005 PCA viewer
Top left: Loadings on PC1, PC2 and PC3.
Top right: Loadings on PC4, PC5 and PC6.

J006 Explained calibration (blue) and validation (red) variances

J007 Residual Sample Variance plot in PCA
Sample 1 has a high residual validation variance.

J008 Multivariate Curve Resolution (MCR) dialog

J009 MCR Overview for model mIR Result1 with default settings

J010 View - MCR Message List...

J011 Increase Sensitivity to pure components in the MCR dialog box

J012 Estimated spectra with Sensitivity to pure components set at 150 (model mIR
Result 2)

J013 Check the Closure option in the MCR dialog box

J014 Estimated spectra with and without closure constraint
Top: model mIR Result 3, with closure constraint
Bottom: model mIR Result4, without closure constraint

J015 Estimated concentrations with and without closure constraint
Top: model mIR Result 3, with closure constraint
Bottom: model mIR Result4, without closure constraint

J016 Marking of sample 1 on a Sample Residuals plot

J017 Recalculate by keeping out sample 1

J018 Residuals dialog

J019 Marking of variables around 1124 cm⁻¹
