
Chemoface

User guide






Cleiton A. Nunes
Universidade Federal de Lavras
Lavras MG Brasil
Feb/2013

Contents

Installing
Installation on Microsoft Windows
Installation on Unix-like systems
Updating
Running
Experimental Design
Designing the experiment
Inserting the responses
Analyzing the design (not applicable to Mixture Design)
Building model and response surfaces (not applicable to Fractional and Plackett-Burman designs)
Additional features
Pattern Recognition
Inserting data set
Data preprocessing
Performing PCA
Performing HCA
Additional features
Multivariate calibration
Inserting data set
Data preprocessing
Selecting test set (optional)
Performing leave-one-out cross-validation
Performing the calibration (obtaining the model)
Performing external validation using test set (optional)
Predicting y for new samples
Performing Discriminant Analysis (PLS-DA, PCR-DA, MLR-DA)
Additional features
Data Plot
Inserting data set
Data preprocessing
Plotting data
Additional features
Data Organization
Importing files (only .txt, .dat, .csv, .bmp)
Exporting the dataset
Additional features
Figure handling
Saving figure with high resolution
Tools on the figure window
References


Installing

Chemoface [1] was developed in the MATLAB [2] environment. It is a stand-alone application and does not
require a MATLAB license to run. Only the MATLAB Compiler Runtime (MCR) needs to be installed; it is
freely available along with Chemoface. MCR is a set of shared libraries that provides complete support for
all the features of the MATLAB language.

Installation on Microsoft Windows
Download and install MCR.
Download and install Chemoface.

Installation on Unix-like systems
Install MCR and Chemoface using Wine (www.winehq.org).
For better text rendering, install the corefonts and enable font smoothing.
Installing corefonts using Winetricks:
Select the default wineprefix > Install a font > check corefonts > OK
Enabling fontsmooth using Winetricks:
Select the default wineprefix > Change settings > check fontsmooth=rgb > OK

Updating
On the Chemoface home screen, access menu Help > Check for update. If there is a new version, download and
install Chemoface as described above. MCR does not need to be reinstalled for updates.

Running
Access the modules from Chemoface home screen (Figure 1).
A screen resolution lower than 1024x768 is not recommended.
Chemoface uses dot (.) as decimal mark.

Figure 1. Chemoface home screen

Experimental Design


Figure 2. Designing experiments

Designing the experiment
Select the design type (1).
Select the number of factors (2).
Select the number of replications for the response (dependent variable) (3).
If appropriate, add central points (4) and select their number of replications (5).
If Fractional Factorial is used, select the size of the fraction (6).
If Mixture Design is used, select the Simplex type (7).
Set the label of the response and the labels and levels for the factors (8).
Note: If a Full Factorial, Fractional Factorial, Central Composite or Plackett-Burman design is used, provide
only the values for the -1 and +1 levels; additional levels (e.g., axial and central points) are calculated
automatically. If a Mixture Design is used, provide values ranging from 0 to 1; constraints can be applied by
providing lower limits >0 and/or upper limits <1. If Plackett-Burman is used, dummy factors are simply left empty.
Press Design button (9).
Note: It is possible to switch between coded and uncoded factor levels using menu Edit > Alternate
coded-uncoded variables. The model is fitted according to the levels provided (coded or uncoded).
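The coded levels of the design map linearly onto the real (uncoded) levels typed in (8). The Python sketch below builds a two-level full factorial with central points in coded units and converts between coded and uncoded levels. The function names and the example factor ranges are hypothetical, and the run order is not necessarily the one Chemoface displays.

```python
from itertools import product


def full_factorial(k, n_center=0):
    """Two-level full factorial design in coded units (-1/+1),
    followed by n_center replicated central points (all factors at 0)."""
    runs = [list(levels) for levels in product((-1, 1), repeat=k)]
    runs += [[0] * k for _ in range(n_center)]
    return runs


def decode(coded, low, high):
    """Coded level -> real (uncoded) level, given the real values at -1 and +1."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return center + coded * half_range


def encode(value, low, high):
    """Real (uncoded) level -> coded level; the inverse of decode()."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range


# Hypothetical example: temperature 40-60 degC and time 10-30 min, 3 central points
limits = [(40, 60), (10, 30)]
for run in full_factorial(2, n_center=3):
    print(run, [decode(c, lo, hi) for c, (lo, hi) in zip(run, limits)])
```

Because the conversion is linear, decode() also maps axial levels such as ±1.414 (the value for a rotatable two-factor Central Composite design) to real units.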

Inserting the responses
Insert the responses (dependent variable) by typing them in the table (10) or by pasting them from a
spreadsheet using menu File > Paste responses.

Analyzing the design (not applicable to Mixture Design)
Select the significance level (11).
Select how the error is calculated (12). If this option is selected, the error is calculated from the
replicates of the central points (if available). If it is unselected, the error is calculated from the
replicates of each assay (if available). All errors are calculated as pure error. If Plackett-Burman is used
and this option is unselected, the error is calculated from the dummy factors.
Press Effects button (13) to obtain the effects table (Figure 3A).
Press Pareto's Chart button (14) to obtain a Pareto chart of the effects (Figure 3B).
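For two-level designs, an effect is the mean response at the +1 level minus the mean response at the -1 level, and it can be tested against the pure error. The sketch below assumes the error comes from replicated central points, so SE(effect) = 2s/√N for N factorial runs, and ranks the effects by |t| as a Pareto chart does. This is a generic textbook calculation, not necessarily Chemoface's exact code; the data and function names are hypothetical.

```python
import math
from itertools import combinations, product
from statistics import mean, stdev


def effect_columns(design):
    """Contrast columns for the main effects and two-factor interactions."""
    k = len(design[0])
    cols = {f"x{j + 1}": [run[j] for run in design] for j in range(k)}
    for a, b in combinations(range(k), 2):
        cols[f"x{a + 1}*x{b + 1}"] = [run[a] * run[b] for run in design]
    return cols


def effects(design, y):
    """Effect = mean response at +1 minus mean response at -1."""
    out = {}
    for name, col in effect_columns(design).items():
        high = [yi for c, yi in zip(col, y) if c == 1]
        low = [yi for c, yi in zip(col, y) if c == -1]
        out[name] = mean(high) - mean(low)
    return out


def t_values(eff, n_runs, center_y):
    """t = effect / SE, with SE = 2 s / sqrt(N) and s the pure-error standard
    deviation from replicated central points (n_c - 1 degrees of freedom)."""
    se = 2 * stdev(center_y) / math.sqrt(n_runs)
    return {name: e / se for name, e in eff.items()}


# Hypothetical 2^2 design (one run per factorial point) plus 3 central points
design = [list(r) for r in product((-1, 1), repeat=2)]
y = [10, 14, 18, 26]
t = t_values(effects(design, y), len(y), center_y=[17, 18, 19])
pareto_order = sorted(t, key=lambda name: abs(t[name]), reverse=True)
print(pareto_order, t)  # ['x1', 'x2', 'x1*x2'] {'x1': 10.0, 'x2': 6.0, 'x1*x2': 2.0}
```

Each t value would then be compared with the critical t for the chosen significance level (11) and n_c - 1 degrees of freedom (2 in this example).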


Figure 3. Table (A) and graph (B) for the effects

Building model and response surfaces (not applicable to Fractional and Plackett-Burman designs)
Select the model type (15).
Select the significance level (16).
Press Coefficients Stats button (17) to obtain a table with the model coefficients (Figure 4A).
Choose if experimental points should be plotted (18) and if only significant coefficients (19) should be
used for model fit.
Select the graph type (20).
If the design has more than two factors, select the factors to be plotted (21) and set the level(s) of the
factor(s) not plotted (Figure 4B).
Press Surface Plot button (22) to obtain the graph and the statistics for the model (Figure 4C-D).
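Response surface models are fitted by least squares. A minimal sketch, assuming a full second-order (quadratic) model in two coded factors and a rotatable Central Composite design with hypothetical noise-free responses, solves the normal equations in plain Python; the model types offered in (15) may differ, and all names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def quadratic_terms(x1, x2):
    """Model terms for b0, b1 x1, b2 x2, b11 x1^2, b22 x2^2, b12 x1 x2."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def fit_quadratic(points, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    X = [quadratic_terms(*pt) for pt in points]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# Hypothetical rotatable CCD: 4 factorial, 4 axial (alpha = sqrt(2)), 3 central points
a = math.sqrt(2)
points = ([(-1, -1), (-1, 1), (1, -1), (1, 1)]
          + [(-a, 0), (a, 0), (0, -a), (0, a)] + [(0, 0)] * 3)
true_b = [50, 5, -3, -4, -2, 1.5]
y = [sum(b * term for b, term in zip(true_b, quadratic_terms(*pt))) for pt in points]
print([round(b, 6) for b in fit_quadratic(points, y)])  # noise-free data: recovers true_b
```

Using only significant coefficients (19) would correspond to removing the non-significant terms from quadratic_terms() and refitting; this is one straightforward approach, not confirmed as Chemoface's.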


Figure 4. Tables and graph for the model (panels A-D)

Additional features
All obtained tables can be copied using menu Edit > Copy.
A designed experiment can be saved using menu File > Save data; the file can later be reopened using
menu File > Open data.
Experimental assays for a Mixture design can be plotted using menu +plot > Experimental points.



Pattern Recognition


Figure 5. Pattern recognition by PCA and HCA

Inserting data set
Type data in the table (1) or paste it from a spreadsheet (or from the Data Organization module) using
menu File > Paste data. Only numeric data should be inserted. Use samples in rows and variables in
columns. In the example, samples are propolis from different seasons and variables are specific m/z
signals from their mass spectra [3].

Note: see the Data Organization chapter for details on how to import data sets such as spectra, bitmaps, etc.
If appropriate, insert sample labels using menu Edit > Sample labels or classes. If there are sample
classes, they can be inserted similarly (Figure 6A).
If appropriate, insert variable labels using menu Edit > Variable labels. If the data set is spectroscopic, use
> Spectroscopic data (Figure 6C) and insert the spectral range. For all other data sets, use > Generic
data (Figure 6B).
Note: if labels are not provided, these are numbered.




Figure 6. Inserting sample and variable labels (panels A-D)

Data preprocessing
Select the preprocessing method (2).
Press Apply button (3).
Note: Multiple preprocessing methods can be applied in sequence by selecting each one and pressing Apply.
To restore the original data after preprocessing, select No preprocess and press Apply.
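Two common column-wise preprocessing methods, mean centering and autoscaling, are sketched below as examples (the full list of methods in (2) is not reproduced here). Applying methods in sequence while keeping the raw matrix untouched mirrors the behavior described above; the function names are hypothetical.

```python
from statistics import mean, stdev


def mean_center(X):
    """Subtract each column (variable) mean."""
    means = [mean(col) for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def autoscale(X):
    """Mean-center each column and divide it by its standard deviation (n - 1)."""
    means = [mean(col) for col in zip(*X)]
    sds = [stdev(col) for col in zip(*X)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def apply_sequence(X, steps):
    """Apply several preprocessing steps in order. New lists are returned, so the
    original data remain available (the equivalent of "No preprocess")."""
    for step in steps:
        X = step(X)
    return X


X_raw = [[1, 10], [2, 20], [3, 30]]  # hypothetical: 3 samples x 2 variables
X_pre = apply_sequence(X_raw, [mean_center, autoscale])
print(X_pre)  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
print(X_raw)  # unchanged
```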

Performing PCA
Press Run PCA button (4).
Select the PCs to be plotted on the X and Y axes (5). For a 3D plot (Figure 7D), also select the PC to be
plotted on the Z axis; for a 2D plot, set it to none.
Note: If sample classes are provided, they can be colored on the scores plot: select Colored by sample
classes (6). Optionally, sample labels can also be inserted: just provide the labels when prompted (Figure
6D).
For the scores graph, press Scores plot button (7) (Figure 7A).
For the loadings graph (Figure 7B), select the plot type: points or vectors (8).
Press Loadings plot button (9).
For a scores and loadings graph (Figure 7C), press Biplot button (10).
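PCA decomposes the preprocessed data into scores (one per sample) and loadings (one per variable). The sketch below uses the NIPALS algorithm on mean-centered data; whether Chemoface uses NIPALS or a singular value decomposition is not stated in this guide, and both give the same scores and loadings up to sign. The data and function name are hypothetical.

```python
import math
from statistics import mean


def nipals_pca(X, n_comp, tol=1e-12, max_iter=500):
    """PCA of the column-centred matrix X by NIPALS.
    Returns scores (n samples x A), loadings (A x m variables) and the
    fraction of the total variance explained by each PC."""
    means = [mean(col) for col in zip(*X)]
    E = [[v - m for v, m in zip(row, means)] for row in X]
    n, m = len(E), len(E[0])
    total = sum(v * v for row in E for v in row)
    T, P, explained = [], [], []
    for _ in range(n_comp):
        if sum(v * v for row in E for v in row) <= 1e-20 * total:
            break  # nothing left to explain
        # initial score vector: residual column with the largest variance
        j = max(range(m), key=lambda c: sum(row[c] ** 2 for row in E))
        t = [row[j] for row in E]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            p = [sum(t[i] * E[i][c] for i in range(n)) / tt for c in range(m)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            t_new = [sum(row[c] * p[c] for c in range(m)) for row in E]
            converged = sum((a - b) ** 2 for a, b in zip(t_new, t)) < tol
            t = t_new
            if converged:
                break
        # deflation: remove this component before extracting the next one
        E = [[E[i][c] - t[i] * p[c] for c in range(m)] for i in range(n)]
        T.append(t)
        P.append(p)
        explained.append(sum(v * v for v in t) / total)
    scores = [list(r) for r in zip(*T)]
    return scores, P, explained


# Hypothetical 4 samples x 2 variables: PC1 follows variable 1, PC2 variable 2
scores, loadings, explained = nipals_pca([[2, 0], [0, 1], [-2, 0], [0, -1]], n_comp=2)
print(explained)  # [0.8, 0.2]
```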


Figure 7. PCA plots (panels A-D)


Performing HCA
Select the distance type (11).
Select the linkage type (12).
Note: Optionally, PCA can be applied to the data before HCA: select Apply PCA to data (13) and enter the
number of PCs to be used when prompted.
Press Dendrogram button (14) to run HCA and obtain the dendrogram (Figure 8B).
Insert the threshold used to color similar groups (Figure 8A). If not applicable, press Cancel.
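HCA repeatedly merges the two closest clusters, and the dendrogram records the distance at each merge. The sketch below assumes Euclidean distance with single, complete or average linkage as examples of the options in (11) and (12), and reads groups off the dendrogram with a distance threshold; Chemoface may express its threshold on another scale (e.g., similarity). Running it on PCA scores instead of the raw data corresponds to option (13). Names and data are hypothetical.

```python
import math
from itertools import combinations


def linkage(c1, c2, X, method):
    """Distance between two clusters (lists of sample indices)."""
    d = [math.dist(X[i], X[j]) for i in c1 for j in c2]
    if method == "single":
        return min(d)
    if method == "complete":
        return max(d)
    return sum(d) / len(d)  # "average" linkage


def hca(X, method="average"):
    """Naive agglomerative clustering with Euclidean distance.
    Returns the merge history [(cluster_a, cluster_b, distance), ...],
    which is what a dendrogram draws."""
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]], X, method))
        d = linkage(clusters[a], clusters[b], X, method)
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


def groups_below(n_samples, merges, threshold):
    """Groups formed by the merges whose distance does not exceed the threshold
    (merge distances never decrease for single, complete or average linkage)."""
    groups = [[i] for i in range(n_samples)]
    for a, b, d in merges:
        if d > threshold:
            break
        groups = [g for g in groups if g != a and g != b] + [a + b]
    return sorted(sorted(g) for g in groups)


X = [[0, 0], [0, 1], [10, 10], [10, 11]]  # hypothetical: two obvious groups
merges = hca(X, "average")
print(groups_below(len(X), merges, threshold=5))  # [[0, 1], [2, 3]]
```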




Figure 8. HCA plot (panels A and B)

Additional features
Additional columns and rows can be inserted on the table using + buttons (15).
Columns and rows can be deleted using menu Edit > Delete.
Data set can be transposed using menu Edit > Transpose.
Data set can be saved using menu File > Save data; the file can later be reopened using menu File >
Open data.
Numeric values used to build the graphs may be copied pressing Copy plot data button (16).
PC variance table can be copied pressing Copy table data button (17).


Multivariate calibration


Figure 9. Performing multivariate calibration

Inserting data set
To insert the independent variables (matrix X), paste them from a spreadsheet (or from the Data
Organization module) using menu File > Paste X. Use samples in rows and variables in columns. In the
example, matrix X is formed by the transmittances from 4000 to 600 cm⁻¹ of coffee samples (1) [4].

To insert the dependent variable (vector y) paste it from a spreadsheet using menu File > Paste Y. In
the example, y is the content of husk (%) in coffee samples (2).
Note: It is possible to insert multiple y and build the respective models; for this, provide Y as a matrix with
samples in rows and each y in a column.
Optionally, sample and variable labels for X (3), as well as a property label for y (4), can be inserted
using menu Edit, similarly to Figure 6. If labels are not provided, they are numbered.

Data preprocessing
Select the preprocessing method (5).
Press Apply button (6).
Note: Multiple preprocessing methods (7) can be applied in sequence by selecting each one and pressing Apply.
To restore the original data after preprocessing, select No preprocess and press Apply.

Selecting test set (optional)
Samples can be selected and used as a test set (external validation); for this, use menu Edit > Select
sample for test set and select the method (Figure 10A). Test set samples can be specified manually
(Figure 10B) or selected automatically by the Kennard-Stone algorithm [5] (Figure 10C). Test samples are
marked with a t in the tables (1,2).
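The Kennard-Stone algorithm starts from the two most distant samples and then repeatedly adds the sample whose nearest already-selected sample is farthest away, so the selection covers the X space evenly. This guide does not state whether Chemoface uses the picks as the test set or as the calibration set (with the rest becoming the test set), so the sketch only returns the selection order; Euclidean distance, the function name and the data are assumptions.

```python
import math


def kennard_stone(X, n_select):
    """Return the indices of n_select samples chosen by the Kennard-Stone
    algorithm (Euclidean distance), in the order they are picked."""
    n = len(X)
    # 1) start with the two samples that are farthest apart
    first = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                key=lambda ij: math.dist(X[ij[0]], X[ij[1]]))
    selected = list(first)
    remaining = [k for k in range(n) if k not in selected]
    # 2) repeatedly add the sample whose nearest selected neighbour is farthest
    while len(selected) < n_select and remaining:
        nxt = max(remaining, key=lambda k: min(math.dist(X[k], X[s]) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected[:n_select]


X = [[0.0], [1.0], [2.0], [3.0], [10.0]]  # hypothetical 1-variable data
print(kennard_stone(X, 3))  # [0, 4, 3]
```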


Figure 10. Selecting samples for test set (panels A-C)

Performing leave-one-out cross-validation
Select the regression method (PLS, PCR or MLR) (8).
Select the maximum number of latent variables (for PLS) or principal components (for PCR) to be tested
on the cross-validation (9).
Press Run CV button (10).
To build the graphs (such as RMSECV, R, and others) (Figure 11A), select the data to be plotted in X
and Y axis (11) and press Plot button (12).
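In leave-one-out cross-validation each sample is left out once, the model is rebuilt with the remaining samples, and the left-out sample is predicted; RMSECV = √(PRESS/n). In Chemoface this is repeated for each number of LVs or PCs up to the maximum chosen in (9). To keep the sketch self-contained, a hypothetical one-variable least-squares line stands in for PLS, PCR or MLR; any fitting function with the same interface could be passed as fit.

```python
import math
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares line (stand-in for PLS/PCR/MLR); returns a predictor."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return lambda xi: my + b * (xi - mx)


def rmsecv_loo(x, y, fit=fit_line):
    """Leave-one-out: refit without sample i, predict sample i, accumulate PRESS."""
    press = 0.0
    for i in range(len(y)):
        model = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - model(x[i])) ** 2
    return math.sqrt(press / len(y))


print(rmsecv_loo([1, 2, 3], [1, 2, 4]))  # sqrt(2.25 / 3), about 0.866
```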

Performing the calibration (obtaining the model)
If PLS or PCR is used, select the adequate number of LV or PC to be used in the model (13).
Press Fit button (14).
To build the graphs (such as measured x predicted and others), select the data to be plotted in X and Y
axis (15) and press Plot button (16).
To obtain the performance parameters of the model (such as RMSEC, R and others) (Figure 11B),
select Model stats in X axis or Y axis (15) and press Plot button (16).
Note: For details about the statistical parameters of model performance, please see references [6-7].
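The sketch below computes two of the usual performance parameters from measured and predicted values: a root mean square error and the correlation coefficient R. Definitions of RMSEC differ between texts (dividing by n or by n - A - 1 for A latent variables), so the denominator is left as a parameter; the same functions give RMSECV or RMSEP when fed cross-validated or test set predictions. Values and names are hypothetical.

```python
import math
from statistics import mean


def rmse(measured, predicted, n_params=0):
    """sqrt(SSE / (n - n_params)). With n_params = 0 this is the plain RMSE;
    some texts compute RMSEC with n - A - 1 in the denominator instead."""
    sse = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(sse / (len(measured) - n_params))


def pearson_r(a, b):
    """Correlation coefficient between measured and predicted values."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


measured = [1.0, 2.0, 3.0, 4.0]   # hypothetical reference values
predicted = [1.1, 1.9, 3.2, 3.8]  # hypothetical model predictions
print(rmse(measured, predicted), pearson_r(measured, predicted))  # about 0.158 and 0.991
```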




Figure 11. Plots for multivariate calibration (panels A-D)

Performing external validation using test set (optional)
If test set samples were previously selected (manually or by the Kennard-Stone algorithm [5]),
press Predict button (17).
If the test set was not selected as above, insert its X and y (or Y, for multiple y) for external validation
using menu File > Paste new X for prediction and menu File > Paste new Y for prediction test, respectively.
To build the graphs (such as measured x predicted and others), select the data to be plotted in X and Y
axis (18) and press Plot button (19).
To obtain the performance parameters of the prediction, select Pred stats in X axis or Y axis (18) and
press Plot button (19).

Predicting y for new samples
Insert the new X whose y is to be predicted using menu File > Paste new X for prediction.
Press Predict button (17).
View the predicted values by selecting Samples in X axis and Predicted in Y axis (18) and pressing Plot
button (19).

Performing Discriminant Analysis (PLS-DA, PCR-DA, MLR-DA)
If discriminant analysis is used, provide the classes as numbers in dependent variables (Y data, 2);
e.g.: Class A = 1, Class B = 2, Class C = 3, etc.
Enable Classes (For DA) (20).
Perform leave-one-out cross validation as previously described.
Perform calibration as previously described.
Perform external validation using test set as previously described (optional).
Predict the classes for new samples as described above in Predicting y for new samples.
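In discriminant analysis the model predicts a continuous value for the class code, which must then be turned into a class. The sketch assigns each prediction to the nearest class code and reports the fraction classified correctly; this nearest-code rule is an assumption, since the guide does not state the rule Chemoface uses. Function names and values are hypothetical.

```python
def assign_class(y_pred, class_codes):
    """Assign each continuous prediction to the nearest class code."""
    return [min(class_codes, key=lambda c: abs(c - y)) for y in y_pred]


def correct_rate(true_classes, assigned):
    """Fraction of samples classified correctly."""
    hits = sum(t == a for t, a in zip(true_classes, assigned))
    return hits / len(true_classes)


codes = [1, 2, 3]  # Class A = 1, Class B = 2, Class C = 3
pred = assign_class([0.8, 1.6, 2.4, 3.7], codes)
print(pred, correct_rate([1, 2, 3, 3], pred))  # [1, 2, 2, 3] 0.75
```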

Additional features
Outlier samples can be checked (Figure 11C): after performing the calibration (14), select Sample
leverages in X axis and Studentized residuals in Y axis (15) and press Plot button (16).
If multiple y is used, a property (22) should be selected before any plot.
Data plotted on the graphs can be copied and used out of Chemoface; use Copy plot data buttons (23).
Additional columns and rows can be inserted on the tables using + buttons (24).
Columns and rows can be deleted using menu Edit > Delete.
Data set can be transposed using menu Edit > Transpose.
Measured x Predicted values for cross-validation, calibration and test sets can be plotted in a single
graph (Figure 11D) using menu Multiplot > Measured x Predicted. Scores for PLS-DA or PCR-DA can
be plotted similarly.
Data set can be saved using menu File > Save data; the file can later be reopened using menu File >
Open data.
The obtained model can be saved using menu File > Save data; the file can later be opened using menu
File > Open model, and the calibrated model can be used for further predictions.
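Leverage measures how unusual a sample is in X, and studentized residuals scale each residual by its expected spread. The sketch below uses the one-predictor formulas for clarity; for PLS or PCR the leverage is commonly computed from the scores T as h_i = 1/n + t_iᵀ(TᵀT)⁻¹t_i. Whether Chemoface uses internally or externally studentized residuals is not stated, so this sketch assumes the internal version. As a rule of thumb, samples with high leverage or |r| above about 2 to 3 are inspected as possible outliers. Names and data are hypothetical.

```python
import math
from statistics import mean


def leverage_and_studentized(x, y):
    """Leverages and internally studentized residuals for y = a + b x:
    h_i = 1/n + (x_i - xbar)^2 / Sxx,  r_i = e_i / (s * sqrt(1 - h_i)),
    with s^2 = SSE / (n - 2)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    resid = [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    h = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    r = [e / (s * math.sqrt(1 - hi)) for e, hi in zip(resid, h)]
    return h, r


x = [1, 2, 3, 4, 5]
y = [2, 4, 12, 8, 10]  # hypothetical data; the third sample (index 2) is suspicious
h, r = leverage_and_studentized(x, y)
print([round(v, 2) for v in h])  # [0.6, 0.3, 0.2, 0.3, 0.6]
print([round(v, 2) for v in r])
```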


Data Plot


Figure 12. Plotting data

Inserting data set
Type data in the table (1) or paste it from a spreadsheet (or from the Data Organization module) using
menu File > Paste data. Only numeric data should be inserted. Use samples in rows and variables in
columns. In the example, the data set is formed by transmittances from 4000 to 600 cm⁻¹ (1) [4].

Note: see the Data Organization chapter for details on how to import data sets such as spectra, bitmaps, etc.
Optionally, sample and variable labels (2) can be inserted using menu Edit, similarly to Figure 6. If
labels are not provided, they are numbered.

Data preprocessing
Select the preprocessing method (3).
Press Apply button (4).
Note: Multiple preprocessing methods (5) can be applied in sequence by selecting each one and pressing Apply.
To restore the original data after preprocessing, select No preprocess and press Apply.

Plotting data
Insert the X axis label (6).
Insert the Y axis label (7).
If appropriate, insert the Y axis label for the preprocessed graph (8).

If appropriate, set reverse X axis (9).
To plot both the raw (non-preprocessed) and the preprocessed data (Figure 13), check Plot data without
preprocessing (10).
Press Plot data button (11).


Figure 13. Raw (non-preprocessed) and preprocessed spectra

Additional features
Additional columns and rows can be inserted on the tables using + buttons (12).
Columns and rows can be deleted using menu Edit > Delete.
Data set can be transposed using menu Edit > Transpose.
Data set can be saved using menu File > Save data; the file can later be reopened using menu File >
Open data.



Data Organization

The Data Organization module allows importing numerical data from .txt, .dat and .csv files, and images in
.bmp format. Multiple files, such as spectra files, can be imported simultaneously. Images (.bmp) are imported
by converting them into a three-way array containing the RGB values of each pixel. The R, G and B values of
each pixel are then summed, resulting in a two-way array (matrix). Finally, this matrix is unfolded into a
vector. This is particularly useful for importing molecular figures to be used as descriptors in MIA-QSAR
models [8].
In the example, files are MIR spectra of adulterated coffee [4].
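The image conversion described above can be sketched as follows, assuming the pixel values have already been read from the .bmp file (file reading is omitted). The unfolding goes column by column, which is MATLAB's default order for A(:); that Chemoface unfolds in this order is an assumption. All images must share the same size so that each one becomes a row of the same length. Function names are hypothetical.

```python
def image_to_vector(pixels):
    """pixels: image as a list of rows, each a list of (R, G, B) tuples.
    Sum R + G + B per pixel (two-way array), then unfold it column by column."""
    summed = [[r + g + b for (r, g, b) in row] for row in pixels]
    n_rows, n_cols = len(summed), len(summed[0])
    return [summed[i][j] for j in range(n_cols) for i in range(n_rows)]


def images_to_matrix(images):
    """One unfolded image per row; all images must have the same size."""
    rows = [image_to_vector(img) for img in images]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all images must have the same width and height")
    return rows


img = [[(255, 255, 255), (0, 0, 0)],
       [(10, 20, 30), (1, 2, 3)]]  # hypothetical 2 x 2 image
print(image_to_vector(img))  # [765, 60, 0, 6]
```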



Figure 14. Importing data set

Importing files (only .txt, .dat, .csv, .bmp)
Menu File > Import data.
Select the file type (only .txt, .dat, .csv or .bmp) (Figure 15A).
Select the files to be imported (use Shift or Ctrl keys to select multiple files) (Figure 15B).
Press Open.

Note:
File names are shown as sample labels for the respective rows (1).
If the data set is very large, the program asks whether to show the data in the table (Figure 15C). If No is
selected, the files are imported and a message is shown (2) (Figure 15D).
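A rough sketch of importing several text files as rows of one data matrix, with the file names kept as sample labels. The file layout is an assumption: values separated by spaces or commas, one point per line, and only the last field of each line kept (e.g., the absorbance in a "wavenumber absorbance" file); header lines would need to be skipped. The functions are hypothetical and do not reproduce Chemoface's importer.

```python
from pathlib import Path


def read_values(lines):
    """Keep the last numeric field of each non-empty line
    (e.g. the intensity column of a 'wavenumber value' file)."""
    values = []
    for line in lines:
        fields = line.replace(",", " ").split()
        if fields:
            values.append(float(fields[-1]))
    return values


def import_files(paths):
    """One row per file, with the file name kept as the sample label."""
    labels, rows = [], []
    for path in map(Path, paths):
        with path.open() as fh:
            rows.append(read_values(fh))
        labels.append(path.name)
    if len({len(r) for r in rows}) > 1:
        raise ValueError("files contain different numbers of points")
    return labels, rows
```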




Figure 15. Selecting files to be imported (panels A-D)


Exporting the dataset
To use the obtained data set in Chemoface, use menu File > Copy data set. Then paste it into the
Chemoface tables using menu File > Paste of the module being used, as previously described in Inserting
data set. The data set can also be pasted into other software that supports spreadsheets.
The data set can also be saved as .txt (ASCII with space as column separator) using menu File > Save
data. This saved file cannot be opened by the Chemoface modules using menu File > Open, but its
numerical data can be copied and pasted using menu File > Paste in the Chemoface modules, as
previously described.
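The .txt layout described above (one sample per line, columns separated by a space) can be written and read back as sketched below. The number formatting used by Chemoface is not documented here, so formatting each value with repr() of a float is an assumption, and the function names are hypothetical.

```python
def to_ascii(X):
    """One sample per line, a single space between columns."""
    return "".join(" ".join(repr(float(v)) for v in row) + "\n" for row in X)


def from_ascii(text):
    """Parse the same layout back into a list of rows."""
    return [[float(v) for v in line.split()] for line in text.splitlines() if line.strip()]


def save_ascii(path, X):
    with open(path, "w") as fh:
        fh.write(to_ascii(X))


X = [[1, 2.5], [3, 4]]  # hypothetical 2 x 2 data set
print(to_ascii(X), end="")  # lines "1.0 2.5" and "3.0 4.0"
```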

Additional features
Specific, odd or even columns and rows, as well as columns and rows with null (zero) standard deviation, can
be deleted using menu Edit > Delete.
Data set can be transposed using menu Edit > Transpose.
Columns and rows can be numbered using menu Edit > Numbered.
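The column deletions can be sketched as follows. Here "null standard deviation" is taken to mean constant columns, and "odd" refers to the 1-based column numbers shown in the table; both are assumptions, and the function names are hypothetical.

```python
from statistics import pstdev


def drop_zero_sd_columns(X):
    """Remove columns whose standard deviation is zero (constant variables)."""
    keep = [j for j, col in enumerate(zip(*X)) if pstdev(col) > 0]
    return [[row[j] for j in keep] for row in X]


def drop_odd_columns(X):
    """Remove columns 1, 3, 5, ... (1-based numbering, as in the table)."""
    return [[v for j, v in enumerate(row, start=1) if j % 2 == 0] for row in X]


X = [[1, 5, 2, 7],
     [1, 6, 3, 7]]  # hypothetical: columns 1 and 4 are constant
print(drop_zero_sd_columns(X))  # [[5, 2], [6, 3]]
print(drop_odd_columns(X))      # [[5, 7], [6, 7]]
```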



Figure handling

Saving figure with high resolution
Access menu File > Export setup in the figure window.
In the setup window (Figure 16A), access Rendering (1).
Set the color to be saved, e.g., RGB color, gray scale, etc. (2).
Set the resolution to be saved, e.g., 600 dpi (3).
Press Export button (4).
Set the file name (5) (Figure 16B).
Select the file type, e.g., .tif (6).
Press Save button (7).
Note: If .fig (MATLAB figure) is selected as the file type, the file can only be opened using menu File > Open
Figure on the Chemoface home screen or in MATLAB. Figures (.fig) can be edited (e.g., font size,
font color) only in MATLAB.


Figure 16. Export setup window (A) and file saving dialog (B)


Tools on the figure window






Save figure (low resolution)
Print figure
Zoom
Move graph
Rotate 3D graph
Read data (click over the graph to read values)
Enable/disable color bar
Enable/disable legend*
*To move the legend, click it with the left mouse button and drag.

References

1. NUNES, C. A.; FREITAS, M. P.; PINHEIRO, A. C. M.; BASTOS, S. C. Chemoface: a novel free user-
friendly interface for Chemometrics. Journal of the Brazilian Chemical Society, 23, 2003-2010, 2012.
2. MATLAB. The MathWorks, Inc.: Natick, MA, USA.
3. NUNES, C. A.; GUERREIRO, M. C. Characterization of Brazilian green propolis throughout the seasons
by Headspace-GC/MS and ESI-MS. Journal of the Science of Food and Agriculture, 92, 433-438, 2012.
4. TAVARES, K. M.; PEREIRA, R. G. F.; PINHEIRO, A. C. M.; NUNES, C. A.; GUERREIRO, M. C.;
RODARTE, M. P. Mid-infrared spectroscopy and sensory analysis applied to detection of adulteration in
roasted coffee by addition of coffee husks. Química Nova, 35, 1164-1168, 2012.
5. DE GROOT, P. J.; POSTMA, G. J.; MELSSEN, W. J.; BUYDENS, L. M. C. Selecting a representative
training set for the classification of demolition waste using remote NIR sensing. Analytica Chimica Acta,
392, 67-75, 1999.
6. KIRALJ, R.; FERREIRA, M. M. C. Basic validation procedures for regression models in QSAR and QSPR
studies: theory and application. Journal of the Brazilian Chemical Society, 20, 770-787, 2009.
7. ROY, P. P.; PAUL, S.; MITRA, I.; ROY, K. On Two Novel Parameters for Validation of Predictive QSAR
Models. Molecules, 14, 1660-1701, 2009.
8. NUNES, C. A.; FREITAS, M. P. Introducing new dimensions in MIA-QSAR: a case for chemokine receptor
inhibitors. European Journal of Medicinal Chemistry, 62, 297-300, 2013.
