Chemometrics in Food Chemistry
Ebook · 907 pages · 9 hours


About this ebook

The issues related to food science and authentication are of particular importance for researchers, consumers and regulatory entities. The need to guarantee quality foodstuff – where the word "quality" encompasses many different meanings, including e.g. nutritional value, safety of use, absence of alteration and adulterations, genuineness, typicalness, etc. – has led researchers to look for increasingly effective tools to investigate and deal with food chemistry problems. As even the simplest food is a complex matrix, the way to investigate its chemistry cannot be other than multivariate. Therefore, chemometrics is a necessary and powerful tool for the field of food analysis and control.

For food science in general and food analysis and control in particular, there are several problems for which chemometrics is of utmost importance. Traceability, i.e. the possibility of verifying the animal/botanical, geographical and/or productive origin of a foodstuff, is, for instance, one area where the use of chemometric techniques is not only recommended but essential: indeed, at present no specific chemical and/or physico-chemical markers have been identified that can be univocally linked to the origin of a foodstuff and the only way of obtaining reliable traceability is by means of multivariate classification applied to experimental fingerprinting results.

Another area where chemometrics is of particular importance is in building the bridge between consumer preferences, sensory attributes and molecular profiling of food: by identifying latent structures among the data tables, bilinear modeling techniques (such as PCA, MCR, PLS and its various evolutions) can provide an interpretable and reliable connection among these domains. Other problems include process control and monitoring, the possibility of using RGB or hyperspectral imaging techniques to nondestructively check food quality, calibration of multidimensional or hyphenated instruments etc.

Language: English
Release date: Jun 8, 2013
ISBN: 9780444595294



    Data Handling in Science and Technology

    Chemometrics in Food Chemistry

    Volume 28

    EDITED BY

    Federico Marini

    Department of Chemistry, University of Rome La Sapienza, Rome, Italy

    Table of Contents

    Cover image

    Title page

    Copyright

    Contributors

    Preface

    Chapter 1. Introduction

    1 Another Book on the Wall

    2 Organisation of the Book

    References

    Part I: Theory

    Chapter 2. Experimental Design

    1 Introduction

    2 Full Factorial Design 2k

    3 Plackett–Burman Designs

    4 Central Composite Design

    5 Doehlert Design

    6 D-Optimal Designs

    7 Qualitative Variables at More Than Two Levels

    8 Mixture Designs

    9 Conclusions

    References

    Chapter 3. Exploratory Data Analysis

    1 The Concept (Let Your Data Talk)

    2 Descriptive Statistics

    3 Projection Techniques

    4 Clustering Techniques

    5 Remarks

    References

    Chapter 4. Regression

    1 Introduction

    2 Multivariate Calibration

    3 Theory

    4 Validation

    5 Diagnostics and Error Measures

    6 Model Interpretation

    7 Variable Selection

    References

    Chapter 5. Classification and Class-Modelling

    1 Introduction

    2 Discriminant Classification Methods

    3 Class-Modelling Methods

    4 Conclusions

    References

    Chapter 6. Multivariate Curve Resolution Methods for Food Chemistry

    1 Introduction

    2 MCR: The Basics

    3 MCR Applied to Qualitative and Quantitative Analysis of Compounds in Food Samples

    4 MCR and Food Fingerprinting

    5 MCR for Food Processes

    6 Conclusions

    References

    Chapter 7. Multiway Methods

    1 Introduction: Why Multiway Data Analysis?

    2 Nomenclature and General Notation

    3 Parallel Factor Analysis

    4 Parallel Factor Analysis 2

    5 Tucker Models

    6 Multiway Regression

    7 Future Perspectives

    References

    Chapter 8. Robust Methods in Analysis of Multivariate Food Chemistry Data

    Notations and Abbreviations

    1 Introduction

    2 Basic Concepts in Robust Statistics

    3 Robust Modelling of Data Variance

    4 Classic and Robust Calibration

    5 Discrimination and Classification

    6 Dealing with Missing Elements in Data Containing Outliers

    7 Further Reading and Software

    References

    Part II: Applications

    Chapter 9. Hyperspectral Imaging and Chemometrics: A Perfect Combination for the Analysis of Food Structure, Composition and Quality

    Acronyms

    1 Introduction

    2 Structure of a Hyperspectral Image

    3 Hyperspectral Analysis and Chemometrics: Practical Examples

    4 Final Remarks

    References

    Chapter 10. The Impact of Chemometrics on Food Traceability

    1 Introduction

    2 Food Traceability Applications

    3 Food Authenticity Applications

    References

    Chapter 11. NMR-Based Metabolomics in Food Quality Control

    1 Introduction

    2 Methodology

    3 NMR-Based Metabolomics Applications

    References

    Chapter 12. Interval-Based Chemometric Methods in NMR Foodomics

    1 Introduction

    2 Interval-Based Methods

    3 Concluding Remarks

    References

    Index

    Copyright

    Elsevier

    The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, UK

    Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands

    First edition 2013

    Copyright © 2013 Elsevier B.V. All rights reserved

    No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher

    Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: permissions@elsevier.com. Alternatively you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material

    Notice

    No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made

    British Library Cataloguing in Publication Data

    A catalogue record for this book is available from the British Library

    Library of Congress Cataloging-in-Publication Data

    A catalog record for this book is available from the Library of Congress

    ISBN: 978-0-444-59528-7

    ISSN: 0922-3487

    For information on all Elsevier publications visit our web site at store.elsevier.com

    Printed and bound in Great Britain

    13  14  15  16    10  9  8  7  6  5  4  3  2  1

    Contributors

    Numbers in parentheses indicate the pages on which the authors’ contributions begin.

    José Manuel Amigo,     (265, 343), Department of Food Science, Quality and Technology, Faculty of Life Sciences, University of Copenhagen, Frederiksberg C, Denmark

    Lucia Bertacchini,     (371), Department of Chemical and Geochemical Sciences, University of Modena and Reggio Emilia, Modena, Italy

    Marta Bevilacqua,     (127, 171), Department of Chemistry, University of Rome La Sapienza, Rome, Italy

    Remo Bucci,     (171), Department of Chemistry, University of Rome La Sapienza, Rome, Italy

    Giorgio Capuani,     (411), Department of Chemistry, Sapienza University of Rome, Rome, Italy

    Marina Cocchi,     (55, 371), Department of Chemical and Geochemical Sciences, University of Modena and Reggio Emilia, Modena, Italy

    Michał Daszykowski,     (315), Department of Analytical Chemistry, Chemometric Research Group, Institute of Chemistry, The University of Silesia, Katowice, Poland

    Maurizio Delfini,     (411), Department of Chemistry, Sapienza University of Rome, Rome, Italy

    Caterina Durante,     (55, 371), Department of Chemical and Geochemical Sciences, University of Modena and Reggio Emilia, Modena, Italy

    Søren Balling Engelsen,     (449), Department of Food Science, Quality & Technology, Faculty of Science, University of Copenhagen, Frederiksberg C, Denmark

    Aoife Gowen,     (343), School of Biosystems Engineering, University College Dublin, Dublin 4, Ireland

    Anna de Juan,     (235), Department of Analytical Chemistry, Universitat de Barcelona, Martí i Franquès, Barcelona, Spain

    Riccardo Leardi,     (9), Department of Pharmacy, University of Genoa, Genoa, Italy

    Mario Li Vigni,     (55, 371), Department of Chemical and Geochemical Sciences, University of Modena and Reggio Emilia, Modena, Italy

    Andrea D. Magrì,     (171), Department of Chemistry, University of Rome La Sapienza, Rome, Italy

    Antonio L. Magrì,     (171), Department of Chemistry, University of Rome La Sapienza, Rome, Italy

    Andrea Marchetti,     (371), Department of Chemical and Geochemical Sciences, University of Modena and Reggio Emilia, Modena, Italy

    Federico Marini,     (1, 127, 171, 265), Department of Chemistry, University of Rome La Sapienza, Rome, Italy

    Idoia Martí,     (343), Analytical and Organic Chemistry Department, Universitat Rovira i Virgili, Tarragona, Spain

    Sílvia Mas,     (235), Department of Analytical Chemistry, Universitat de Barcelona, Martí i Franquès, Barcelona, Spain

    Alfredo Miccheli,     (411), Department of Chemistry, Sapienza University of Rome, Rome, Italy

    Riccardo Nescatelli,     (171), Department of Chemistry, University of Rome La Sapienza, Rome, Italy

    Morten Arendt Rasmussen,     (449), Department of Food Science, Quality & Technology, Faculty of Science, University of Copenhagen, Frederiksberg C, Denmark

    Åsmund Rinnan,     (449), Department of Food Science, Quality & Technology, Faculty of Science, University of Copenhagen, Frederiksberg C, Denmark

    Elisa Salvatore,     (371), Department of Chemical and Geochemical Sciences, University of Modena and Reggio Emilia, Modena, Italy

    Francesco Savorani,     (449), Department of Food Science, Quality & Technology, Faculty of Science, University of Copenhagen, Frederiksberg C, Denmark

    Simona Sighinolfi,     (371), Department of Chemical and Geochemical Sciences, University of Modena and Reggio Emilia, Modena, Italy

    Michele Silvestri,     (371), Department of Chemical and Geochemical Sciences, University of Modena and Reggio Emilia, Modena, Italy

    Ivana Stanimirova,     (315), Department of Analytical Chemistry, Chemometric Research Group, Institute of Chemistry, The University of Silesia, Katowice, Poland

    Alberta Tomassini,     (411), Department of Chemistry, Sapienza University of Rome, Rome, Italy

    Beata Walczak,     (315), Department of Analytical Chemistry, Chemometric Research Group, Institute of Chemistry, The University of Silesia, Katowice, Poland

    Frank Westad,     (127), CAMO Software AS, Oslo, Norway

    Preface

    For many years, food was not considered an important or even decent scientific subject. "Food belongs in the kitchen!" Those days are over and for good reasons. Food still belongs in the kitchen, but at the same time food science is an extremely challenging, interesting and rewarding area of research.

    Food is of fundamental importance and covers complicated, cross-disciplinary aspects spanning, for example, sensory perception, culture, nutrition, gastronomy, physics, chemistry and engineering.

    • What is the impact of seasonal variations in the raw material?

    • How will the long-term stability of cream cheese change when switching to another breed of cows?

    • How to evaluate the complex changes in aroma occurring over the course of a fermentation?

    • Is it possible to determine if an unknown toxic substance is present in a curry sauce?

    • In which way will shelf-life be affected if the ultrafiltration conditions are modified?

    • How can the Maillard reaction be controlled during cooking?

    • Can we have more timely and more accurate characterization of whether production is running smoothly?

    The above questions are difficult to answer without comprehensive and relevant information. Such information will almost invariably be multivariate in nature if the complex underlying problems are to be described comprehensively. Therefore, the need for advanced experimental planning and subsequent advanced data analysis is obvious. Chemometrics provides the necessary tools for digging into food-related problems. This book is a highly needed and relevant contribution to the food research area in this respect. It provides an impressive, very detailed and illustrative tour de force through the chemometric landscape.

    This book will prove useful to newcomers trying to understand the field of chemometrics, to food researchers wanting to use chemometric tools more actively in practice, and to teachers and students participating in chemometrics courses.

    A recurring motto in our Department of Food Science has been

    If you think rocket science is difficult—try food science

    With this book, you can actually seriously start to unravel the deep and intricate mysteries in food science and I would like to sincerely thank Federico Marini and the many competent researchers for taking time to write this book. Enjoy!

    Rasmus Bro

    Frederiksberg, Denmark, May 2013

    Chapter 1

    Introduction

    Federico Marini*,    Department of Chemistry, University of Rome La Sapienza, Rome, Italy, *Corresponding author: federico.marini@uniroma1.it

    Abstract

    The chapter describes the motivation behind the book and introduces the role of chemometrics in food quality control and authentication. A brief description of the structure of the monograph is also provided.

    Keywords

    Chemometrics; Food chemistry; Food authentication

    1 Another Book on the Wall

    Issues related to food science and authentication are of particular importance, not only for researchers but also for consumers and regulatory entities. The need to guarantee quality foodstuff—where the word quality encompasses many different meanings, including, for example, nutritional value, safety of use, absence of alteration and adulterations, genuineness, typicalness, and so on [1]—has led researchers to look for increasingly effective tools to investigate and deal with food chemistry problems. As even the simplest food is a complex matrix, the way to investigate its chemistry cannot be other than multivariate [2]. Therefore, chemometrics is a necessary and powerful tool in the field of food analysis and control [3–5].

    Indeed, since its very beginning, chemometrics has dealt with different problems related to food quality [6–8]. Today, considering food science in general and food analysis and control in particular, several problems can be listed for which chemometrics is of utmost importance and relevance. Traceability [9,10], that is, the possibility of verifying the animal/botanical, geographical and/or productive origin of a foodstuff, is, for instance, one issue where the use of chemometric techniques is not only recommended but essential [11]; indeed, to date, no specific chemical and/or physico-chemical markers have been identified that can be univocally linked to the origin of a foodstuff, and the only way of obtaining reliable traceability is by applying multivariate classification to experimental fingerprinting results [12,13]. Another area where chemometrics is of particular importance is in building the bridge between consumer preferences, sensory attributes and molecular profiling of food [14,15]; indeed, by identifying latent structures among the data tables, bilinear modelling techniques (such as PCA, MCR, PLS and its various evolutions) can provide an interpretable and reliable connection among these domains. Other problems that can be listed include process control and monitoring [16], the possibility of using RGB or hyperspectral imaging techniques to non-destructively check food quality [17,18], calibration of multidimensional or hyphenated instruments [19–21] and so on.
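    As a hedged illustration of the bilinear modelling idea, the Python sketch below projects a small synthetic fingerprint matrix (the numbers are invented, standing in for real spectral or compositional measurements) onto its first two principal components via the singular value decomposition:

```python
import numpy as np

# Synthetic "fingerprint" matrix: 6 food samples x 4 measured variables
# (invented values standing in for real spectral/compositional data).
X = np.array([
    [4.1, 2.0, 0.5, 7.2],
    [3.9, 2.1, 0.6, 7.0],
    [4.2, 1.9, 0.4, 7.3],
    [6.5, 0.8, 1.9, 3.1],
    [6.7, 0.9, 2.0, 3.0],
    [6.4, 0.7, 1.8, 3.2],
])

# Mean-centre, then decompose with the SVD: Xc = U S Vt.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Scores (sample coordinates) and loadings for the first two PCs.
scores = U[:, :2] * S[:2]
loadings = Vt[:2].T

# Fraction of total variance explained by each component.
explained = S**2 / np.sum(S**2)

print(scores.shape, loadings.shape)  # (6, 2) (4, 2)
```

    On real fingerprints, the score plot of such a decomposition is what reveals groupings by origin or variety, while the loadings show which variables drive the separation.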

    Despite these considerations, while a huge amount of literature deals with the design of chemometric techniques and their application to different areas of food science, a general monograph covering the main aspects of this topic as comprehensively as possible is lacking. This book aims to fill this gap, so that it can be used both by food chemists wanting to learn how chemometric techniques can help in many aspects of their work and by chemometricians having to deal with food-related problems.

    2 Organisation of the Book

    The twofold scope (and the corresponding prospective audience) of the book drives the way it is conceived and organised. Indeed, the monograph is organised in two parts: a first part (Chapters 2–8) covering the theory, and a second part (Chapters 9–12) presenting some selected applications of chemometrics to hot topics in food science. As it is hoped that this book will be read and used not just by professional chemometricians, all the topics, especially the ones in the theoretical part, are covered extensively, starting from a beginner level up to an intermediate or advanced one. In the same theoretical part, the description of the methods is accompanied by a wide variety of examples taken from food science to illustrate how the different techniques can be fruitfully applied to solve real-world food-related issues.

    In particular, the first chapters of this book can serve as an introductory textbook on chemometrics or as a self-study guide, as they cover most of the principal aspects of the topic; the reader who is more interested in specific topics and/or applications can simply pick the chapters that she/he prefers, as each chapter is self-contained. As already anticipated, the first part of the book covers the theory of the main chemometric methods, and each chapter is meant to be a tutorial on its specific topic. The aim of Chapter 2 is to review the rationale and strategies for the design of experiments, which constitute a fundamental step in the set-up of any kind of experimental procedure. The topics covered include screening and two-level factorial designs, multi-level designs for both qualitative and quantitative variables, and response surface methodologies. Chapter 3 presents an extensive description of the chemometric methods used for exploratory data analysis, with attention focused specifically on principal component analysis (PCA) and data preprocessing methods. Additional topics covered include descriptive statistics and other projection methods such as multidimensional scaling and nonlinear mapping. Chapter 4 is devoted to calibration, from univariate to multivariate, and discusses extensively the strategies for model validation and interpretation. The topics covered include ordinary least squares, principal component regression, partial least squares (PLS) regression, identification of outliers and variable selection. The aim of Chapter 5 is to provide the reader with a comprehensive description of chemometric pattern recognition tools. A distinction is drawn between discriminant and modelling approaches, and the most frequently used techniques (LDA, QDA, kNN, PLS-DA, SIMCA, UNEQ and density methods) are described in detail.
Taken together, Chapters 2–5 cover the theory behind the most fundamental chemometric methods; Chapters 6–8, on the other hand, describe some advanced topics that have gained increasing importance in recent years. Chapter 6 focuses on multivariate curve resolution (MCR) for single data matrices and for multi-set configurations. Basic MCR theory is reviewed together with a detailed discussion of all the different scenarios in food control where this approach can be of importance. Chapter 7 presents an overview of the chemometric techniques used for the analysis of multi-way arrays, that is, the data arrays resulting from experiments in which a signal is recorded as a function of more than two sources of variation. The topics covered include methods for deconvolution/resolution (PARAFAC and PARAFAC2), data description (Tucker) and calibration (N-PLS and multi-way covariate regression). Finally, Chapter 8 discusses robust methods, that is, methods that provide a reliable answer even when a relatively high percentage of anomalous observations is present. The topics covered include robust measures of location and scale, robust PCA and PLS, and robust classification methods.
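    Of the discriminant techniques listed for Chapter 5, k-nearest neighbours (kNN) is the simplest to sketch. The toy Python example below classifies a new sample by majority vote among its nearest training samples; the two-dimensional samples and class labels are invented for illustration, not data from the book:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                  # indices of k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Invented two-class training set (e.g. two geographic origins).
X_train = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
                    [3.0, 3.1], [2.9, 3.0], [3.1, 2.9]])
y_train = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

print(knn_predict(X_train, y_train, np.array([1.0, 1.0])))  # A
print(knn_predict(X_train, y_train, np.array([3.0, 3.0])))  # B
```

    Despite its simplicity, kNN often serves as a useful baseline against which LDA, PLS-DA or SIMCA models can be compared.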

    The second part of the book—Chapters 9–12—presents some selected applications of chemometrics to different topics of interest in the field of food authentication and control. Chapter 9 deals with the application of chemometric methods to the analysis of hyperspectral images, that is, of those images where a complete spectrum is recorded at each pixel. After a description of the peculiar characteristics of images as data, a detailed discussion on the use of exploratory data analysis tools, calibration and classification methods is presented. The aim of Chapter 10 is to present an overview of the role of chemometrics in food traceability, starting from the characterisation of soils up to the classification and authentication of the final product. The discussion is accompanied by examples taken from the different areas where chemometrics can be used for tracing and authenticating foodstuffs. Chapter 11 introduces NMR-based metabolomics as a potentially useful tool for food quality control. After a description of the bases of the metabolomics approach, examples of its application to authentication, identification of adulterations, control of safety of use, and processing are presented and discussed. Finally, Chapter 12 introduces the concept of interval methods in chemometrics, both for data pretreatment and data analysis. The topics covered are the alignment of signals using iCoshift, interval methods for exploration (iPCA), regression (iPLS) and classification (iPLS-DA, iECVA), and the important roles they play in the emerging discipline of foodomics.
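    The signal-alignment step mentioned for Chapter 12 can be illustrated with a deliberately simplified sketch: find the integer shift that best matches a signal to a reference. The real iCoshift algorithm works on user-defined intervals and uses FFT-based cross-correlation; this exhaustive-search toy version on a synthetic Gaussian peak only conveys the core idea:

```python
import numpy as np

def align_shift(reference, signal, max_shift=10):
    """Return the integer shift (within ±max_shift) that best aligns
    `signal` to `reference`, judged by the inner product."""
    best_shift, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        score = float(np.dot(reference, np.roll(signal, s)))
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift

# Toy "spectrum": a single Gaussian peak, plus a copy displaced by 4 points.
x = np.arange(100)
ref = np.exp(-0.5 * ((x - 50) / 3.0) ** 2)
sig = np.roll(ref, 4)  # peak now centred at position 54

print(align_shift(ref, sig))  # -4: shifting back by 4 points realigns the peak
```

    After such an alignment, interval methods like iPLS can be applied to the corrected signals without peak-position artefacts dominating the model.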

    Moreover, the book is multi-authored, collecting contributions from a selected number of well-known and active chemometric research groups across Europe, each covering one or more subjects in which the group’s expertise is recognised and appreciated. This combination of expertise adds further value to the monograph.

    References

    1. Trienekens J, Zuurbier P. Quality and safety standards in the food industry, developments and challenges. Int J Prod Econ. 2008;113:107–122.

    2. Gaonkar AG, ed. Characterization of food: emerging methods. Amsterdam, The Netherlands: Elsevier; 1995.

    3. Forina M, Lanteri S, Armanino C. Chemometrics in food chemistry. Topics Curr Chem. 1987;141:91–143.

    4. Munck L, Nørgaard L, Engelsen SB, Bro R, Andersson CA. Chemometrics in food science—a demonstration of the feasibility of a highly exploratory, inductive evaluation strategy of fundamental scientific significance. Chemometr Intell Lab Syst. 1998;44:31–60.

    5. Forina M, Casale M, Oliveri P. Application of chemometrics to food chemistry. In: Brown SD, Tauler R, Walczak B, eds. Comprehensive chemometrics. vol. 4. Oxford, UK: Elsevier; 2009:75–128.

    6. Saxsberg BEH, Duewer DL, Booker JL, Kowalski BR. Pattern recognition and blind assay techniques applied to forensic separation of whiskies. Anal Chim Acta. 1978;103:201–212.

    7. Kwan WO, Kowalski BR. Classification of wines by applying pattern recognition to chemical composition data. J Food Sci. 1978;43:1320–1323.

    8. Forina M, Armanino C. Eigenvector projection and simplified non-linear mapping of fatty acid content of Italian olive oils. Ann Chim. 1982;72:127–141.

    9. Brereton P. Preface to the special issue Food authenticity and traceability. Food Chem. 2010;118:887.

    10. Guillou C. Foreword to the special issue Food authenticity and traceability. Food Chem. 2010;118:888–889.

    11. Available from: http://www.trace.eu.org, last accessed 22.03.2013.

    12. Reid LM, O’Donnell CP, Downey G. Recent technological advances for the determination of food authenticity. Trends Food Sci Technol. 2006;17:344–353.

    13. Luykx DMAM, van Ruth SM. An overview of the analytical methods for determining the geographical origin of food products. Food Chem. 2008;107:897–911.

    14. Naes T, Risvik E, eds. Multivariate analysis of data in sensory science. Amsterdam, The Netherlands: Elsevier; 1996.

    15. Naes T, Brockhoff PM, Tomic O. Statistics for sensory and consumer science. New York, NY: John Wiley and Sons; 2010.

    16. Bro R, van den Berg F, Thybo A, Andersen CM, Jørgensen BM, Andersen H. Multivariate data analysis as a tool in advanced quality monitoring in the food production chain. Trends Food Sci Technol. 2002;13:235–244.

    17. Pereira AC, Reis MS, Saraiva PM. Quality control of food products using image analysis and multivariate statistical tools. Ind Eng Chem Res. 2009;48:988–998.

    18. Gowen AA, O’Donnell CP, Cullen PJ, Downey G, Frias JM. Hyperspectral imaging—an emerging process analytical tool for food quality and safety control. Trends Food Sci Technol. 2007;18:590–598.

    19. Amigo JM, Skov T, Bro R. ChroMATHography: solving chromatographic issues with mathematical models and intuitive graphics. Chem Rev. 2010;110:4582–4605.

    20. Pierce KM, Kehimkar B, Marney LC, Hoggard JC, Synovec RE. Review of chemometric analysis techniques for comprehensive two dimensional separations data. J Chromatogr A. 2012;1255:3–11.

    21. de Juan A, Tauler R. Factor analysis of hyphenated chromatographic data—exploration, resolution and quantification of multicomponent systems. J Chromatogr A. 2007;1158:184–195.

    Part I: Theory

    Chapter 2 Experimental Design

    Chapter 3 Exploratory Data Analysis

    Chapter 4 Regression

    Chapter 5 Classification and Class-Modelling

    Chapter 6 Multivariate Curve Resolution Methods for Food Chemistry

    Chapter 7 Multiway Methods

    Chapter 8 Robust Methods in Analysis of Multivariate Food Chemistry Data

    Chapter 2

    Experimental Design

    Riccardo Leardi¹,    Department of Pharmacy, University of Genoa, Genoa, Italy, ¹Corresponding author: riclea@difar.unige.it

    Abstract

    In this chapter, some of the most commonly used designs (e.g. Full Factorial, Plackett–Burman, Central Composite, Doehlert, D-Optimal, qualitative variables at more than two levels, mixture) will be presented. It will be shown how it is often possible to build them by hand, without using any software, and how to compute the coefficients of the model and their significance. The different designs will be illustrated and discussed by means of real examples.

    Keywords

    Experimental Design; Factorial Design; Plackett–Burman Design; Central Composite Design; Doehlert Design; D-Optimal Design; Qualitative variables; Mixture design

    1 Introduction

    The first paper about experimental design was published by Fisher almost 80 years ago [1]. Unfortunately, this huge time span has not been sufficient to make the approach as common as it should be (indeed, it should be the only accepted approach). The great majority of people still study and ‘optimize’ their problems one variable at a time (OVAT). This is evident in many papers, whose subsection titles proudly announce it: ‘3.1. Effect of pH’, ‘3.2. Effect of temperature’, ‘3.3. Effect of flow’ and so on. The widespread ignorance of experimental design allows papers like these to be published without any problem, despite the approach being completely wrong (they can be published simply because the referees reviewing them still believe that studying one variable at a time is the correct approach).

    Optimization performed OVAT does not guarantee at all that the real optimum will be found. This approach would be valid only if the variables to be optimized were totally independent of each other, a condition that very seldom happens to be true.

    By studying one variable at a time, the interactions among variables are totally missed.

    What is an interaction? Let us try to explain this concept with some examples taken from everyday life.

    If somebody asks you what the best gear is in which to ride a bike, your reply would surely be: ‘It depends.’

    ‘What is the best cooking time for a cake?’ ‘It depends’.

    ‘What is the best waxing for your skis?’ ‘It depends’.

    ‘What is the best setup for a racing car?’ ‘It depends’.

    This means that you do not have ‘the best’ gear, but the best gear depends on the levels of the other factors involved, such as the slope of the road, the direction and the speed of the wind, the quality of the cyclist, how tired the cyclist is and the speed he wants to maintain.

    Similarly, when baking a cake the best time depends on the temperature of the oven, the best waxing depends on the conditions of the weather and of the snow, the best setup for a racing car depends on the circuit and so on.

    Every time your reply is ‘it depends’ it means that you intuitively recognize that the effect of the factor you are talking about is not independent of the levels of the other factors; this means that an interaction among those factors is relevant and that not taking it into account can give terrible results.

    So, it is evident that the housewife knows very well that there is a strong interaction between cooking time and oven temperature, a cyclist knows very well that there is an interaction between the gear and the surrounding conditions and so on.

    Of course, you will never hear a housewife using the word ‘interaction’, but her behaviour demonstrates clearly that she intuitively understands what an interaction is.

    Could you imagine somebody looking for the best gear on a flat course (i.e. changing gear while keeping all the remaining variables constant) and then using it on any other course simply because the first set of experiments demonstrated that it was the best?

    Well, chemists optimizing their procedures OVAT behave in the very same way!

    Why do the very people who answer ‘it depends’ to so many questions about their everyday life never give the same answer when entering a lab and working as chemists?

    Why, when looking for the best pH, do chemists usually behave like the foolish cyclist described earlier, changing the pH and keeping constant all the remaining variables instead of thinking that the ‘best pH’ may depend on the setting of the other variables?

    In the OVAT approach, the only points about which something is known are those where the experiments have actually been performed. Experimental design, by exploring the whole experimental domain in a systematic way, also yields a mathematical model with which the value of the response can be predicted anywhere in that domain. Provided that the experimental variability is known, the precision of these predictions can be estimated even before the experiments of the design are performed, since it depends only on the arrangement of the points in space and on the postulated model (this will be explained in greater detail later on). This means going from local knowledge to global knowledge.
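    As a concrete, hypothetical illustration of such a postulated model, the Python sketch below builds a 2² full factorial design and fits y = b0 + b1·x1 + b2·x2 + b12·x1·x2 by least squares. The factor names and response values are invented; because the design matrix is orthogonal and saturated, the coefficients reduce to simple averages that could equally be computed by hand:

```python
import numpy as np

# Hypothetical 2^2 full factorial design in coded units (-1/+1);
# x1 = pH, x2 = temperature are illustrative names, not from the text.
x1 = np.array([-1, +1, -1, +1])
x2 = np.array([-1, -1, +1, +1])

# Hypothetical measured responses (e.g. yield) for the four runs.
y = np.array([60.0, 70.0, 65.0, 95.0])

# Model matrix for y = b0 + b1*x1 + b2*x2 + b12*x1*x2.
X = np.column_stack([np.ones(4), x1, x2, x1 * x2])

# With an orthogonal saturated design, least squares reduces to
# b = X^T y / 4, i.e. signed averages of the responses.
b = np.linalg.lstsq(X, y, rcond=None)[0]
b0, b1, b2, b12 = b
print(b)  # [72.5 10.  7.5  5. ]
```

    Here b12 ≠ 0 signals an interaction: the effect of x1 depends on the level of x2, which is exactly what an OVAT study would miss.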

    By comparing the information obtained by an OVAT approach with the information obtained by an experimental design we can say that:

    • The experimental design takes into account the interactions among the variables, while the OVAT does not;

    • The experimental design provides a global knowledge (in the whole experimental domain), while the OVAT gives a local knowledge (only where the experiments have been performed);

    • In each point of the experimental domain, the quality of the information obtained by the experimental design is higher than that of the information obtained by the OVAT;

    • The number of experiments required by an experimental design is smaller than the number of experiments performed with an OVAT approach.

    Summarizing, it should be clear that:

    • The quality of the results depends on the distribution of the experiments in the experimental domain;

    • The optimal distribution of the experiments depends on the postulated model;

    • Given the model, the experimental limitations and the budget available (= maximum number of experiments), the experimental design will identify the set of experiments providing the highest possible information.

    People should also be aware that building the experimental matrix (i.e. deciding which experiments must be performed) is the easiest part of the whole process, and that in the great majority of cases it can be done by hand, without any software.

    What is difficult is rather the definition of the problem: What are the factors to be studied? What is the domain of interest? Which model should be postulated? How many experiments can be performed?

    To perform an experimental design, the following five steps must be considered:

    1. Define the goal of the experiments. Though it can seem totally absurd, many people start doing experiments without having clear in their minds what the experiments are done for. This is a consequence of the general way of thinking according to which, once you have the results, you can anyway extract information from them (and the more experiments have been performed, the better).

    2. Detect all the factors that can have an effect. Particular attention must be given to the words ‘all’ and ‘can’. This means that it is not correct to consider a predefined number of factors (e.g. let us take into account only three factors), and saying that a factor ‘can’ have an effect is totally different from saying that we think that a factor has an effect. One of the most common errors is indeed that of performing what has been called a ‘sentimental screening’, often based only on personal feelings rather than on scientific facts.

    3. Plan the experiments. Once the factors have been selected, their ranges have been defined and the model to be applied has been postulated, this step requires only a few minutes.

    4. Perform the experiments. While in the classical way of thinking this is the most important part of the process, in the philosophy of experimental design doing the experiments is just something that cannot be avoided in order to get results that will be used to build the model.

    5. Analyse the data obtained by the experiments. This step transforms data into information and is the logical conclusion of the whole process.

    Very often one single experimental design does not lead to the solution of the problem. In those cases the information obtained at point 5 is used to reformulate the problem (removal of the non-significant variables, redefinition of the experimental domain, modification of the postulated model), after which one goes back to step 3.

    As the possibility of having to perform more than one single experimental design must always be taken into account, it is wise not to invest more than 40% of the available budget in the first set of experiments.

    2 Full Factorial Design 2ᵏ

    The 2ᵏ Factorial Designs are the simplest possible designs, requiring a number of experiments equal to 2ᵏ, where k is the number of variables under study. In these designs each variable has two levels, coded as −1 and +1, and the variables can be either quantitative (e.g. temperature, pressure, amount of an ingredient) or qualitative (e.g. type of catalyst, type of apparatus, sequence of operations).

    The experimental matrix for k = 3 is reported in Table 1, and it can be seen that it is quite easy to build even by hand. The matrix has eight rows (2³, each row corresponding to an experiment) and three columns (each column corresponding to a variable); in the first column −1 and +1 alternate at every row, in the second column they alternate every second row and in the third column they alternate every fourth row. The same procedure can be used to build any Factorial Design, whatever the number of variables.

    Table 1

    A 2³ Factorial Design (Experimental Matrix)
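    For readers who want to reproduce Table 1 programmatically, the sign-alternation rule just described can be sketched in a few lines of Python (NumPy is assumed here; as the chapter stresses, no software is actually required):

```python
import numpy as np

def full_factorial(k):
    """Build the experimental matrix of a 2^k Factorial Design.

    Column j (0-based) alternates between -1 and +1 every 2**j rows,
    exactly as described for Table 1."""
    rows = 2 ** k
    design = np.empty((rows, k), dtype=int)
    for j in range(k):
        block = 2 ** j                       # column j alternates every 2**j rows
        pattern = np.repeat([-1, 1], block)  # one full -1/+1 cycle
        design[:, j] = np.tile(pattern, rows // (2 * block))
    return design

X = full_factorial(3)
print(X)   # the eight rows of Table 1, in standard order
```

    The same function builds the experimental matrix for any k; for instance, `full_factorial(4)` returns the 16 rows of a 2⁴ design.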

    From a geometrical point of view, as shown in Figure 1, a Factorial Design explores the corners of a cube (if there are more than three variables, it will be a hypercube; our mind will no longer be able to visualize it, but from the mathematical point of view nothing changes).

    Figure 1 Geometrical representation of a 2³ Factorial Design.

    Contrary to what happens in the OVAT approach, in which variable 1 is changed while variables 2 and 3 are kept constant, in the Factorial Design variable 1 is changed at different combinations of the levels of variables 2 and 3 (and of course the same happens for all the variables).

    This means that the Factorial Design is suitable for estimating the interactions between variables (i.e. the difference between the effect of changing variable 1 when variable 2 is at its higher level and its effect when variable 2 is at its lower level, and so on).

    The mathematical model is therefore the following:

    y = b0 + b1X1 + b2X2 + b3X3 + b12X1X2 + b13X1X3 + b23X2X3 + b123X1X2X3

    As a consequence, with just eight experiments it is possible to estimate a constant term, the three linear terms, the three two-variable interactions and the three-variable interaction.

    To illustrate the application of a Factorial Design the following example is reported [2].

    A chemical company was producing a polymer whose viscosity had to be > 46.0 × 10³ mPa s. As a consequence of a variation in a raw material, they obtained a final product rather different from the ‘original’ product (which had been produced for several years), with a viscosity below the acceptable value. Of course, this was a very big problem for the company, as the product could not be sold anymore. The person in charge of the product started performing OVAT experiments, but after about 30 experiments he could not find any acceptable solution.

    It was then decided to try with an experimental design.

    At first, three potentially relevant variables were detected: they were the amounts of three reagents (let us call them A, B and C). The original formulation was 10 g of A, 4 g of B and 10 g of C.

    Therefore, it was decided to keep this experimental setting as a starting point and to explore its surroundings. As the number of possible experiments was quite limited, it was decided to apply a 2³ Factorial Design, requiring a total of eight experiments.

    The next step was to define the levels of the variables and to write down the experimental plan.

    As mentioned earlier, it had been decided to keep the original recipe as the centre point and to set the levels −1 and +1 of each variable symmetrically around the original value (9 and 11 for reagents A and C, 3.6 and 4.4 for reagent B), leading to the experimental plan reported in Table 2.

    Table 2

    The Experimental Plan for the Polymer Factorial Design

    As can be seen, while the experimental matrix contains the coded values (−1 and +1), the experimental plan reports the real values of the variables and can therefore be understood by anybody.

    A very important point is that the experiments must be performed in random order, in order to avoid the bias related to possible systematic effects.

    Let us suppose we are doing our experiments on a hot morning in July, starting at 8 a.m. and finishing at 2 p.m., following the standard order reported in Table 2. Let us also suppose that, for some unknown and unsuspected reason, the outcome of our experiments increases with the external temperature, while none of the variables under study has a significant effect. As a result, the responses of the eight experiments, instead of being the same (within the experimental error), will regularly increase. Just looking at the results, we would therefore conclude that reagent C has a very relevant positive effect (the four best experiments are the four experiments performed when it was at its higher level), reagent B has a moderate positive effect and reagent A has a smaller but constant positive effect. This happens because an uncontrolled and unsuspected systematic trend is confounded with the effect of the variables. Instead, if the experiments are performed in random order, any such systematic and uncontrolled variation will be ‘spread’ equally among all the variables under study.
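    Randomizing the run order is trivial to automate. A minimal sketch, assuming Python/NumPy and the levels of Table 2:

```python
import numpy as np

# Experimental plan in standard order (A, B, C in grams), as in Table 2.
plan = [(9, 3.6, 9), (11, 3.6, 9), (9, 4.4, 9), (11, 4.4, 9),
        (9, 3.6, 11), (11, 3.6, 11), (9, 4.4, 11), (11, 4.4, 11)]

rng = np.random.default_rng()           # pass a seed for a reproducible order
run_order = rng.permutation(len(plan))  # random sequence in which to run the experiments

for run, idx in enumerate(run_order, start=1):
    a, b, c = plan[idx]
    print(f"run {run}: standard-order experiment {idx + 1} -> A={a} g, B={b} g, C={c} g")
```

    Running the eight experiments in the printed order spreads any time trend (such as the July temperature drift imagined above) across all three reagents instead of confounding it with one of them.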

    After having performed the eight experiments and having recorded the responses (Table 3), it was immediately clear that in several cases the viscosity was much higher than the minimum acceptable value.

    Table 3

    Experimental Design, Experimental Plan and Responses of the Polymer Factorial Design

    How is it possible not to have found those solutions in more than 30 previous experiments?

    Before computing any coefficient, let us look at the results shown in Figure 2.

    Figure 2 Spatial representation of the results of the polymer Factorial Design.

    It can be clearly seen that all the experiments performed at a lower value of reagent A led to responses greater than the threshold value. It can therefore be said that by lowering the amount of A an increase of the response is obtained.

    As far as reagent B is concerned, it can be seen that its increase leads to a decrease of the response when reagent C is at its lower level and to an increase of the response when reagent C is at its higher level. This is a clear example of interaction between two variables. The same interaction is detected when taking into account reagent C: an increase of reagent C improves the response when reagent B is at its higher level, while a worsening occurs when reagent B is at its lower level.

    It should be clear now that the experiments performed by following an experimental design are usually very few but highly informative, and therefore some information can be obtained just by looking at the data.

    To compute the coefficients, we must go from the experimental matrix to the model matrix (Table 4). While the former has as many rows as experiments and as many columns as variables, the latter has as many rows as experiments and as many columns as coefficients and can be easily obtained in the following way: the first column (b0) is a column of + 1, the columns of the linear terms are the same as the experimental matrix, the columns of the interactions are obtained by a point-to-point product of the columns of the linear terms of the variables involved in the interaction (e.g. the column b12 of the interaction between variables 1 and 2 is obtained by multiplying point to point the column b1 by the column b2). If quadratic terms were also present, their columns would be obtained by computing the square of each element of the corresponding linear term.

    Table 4

    Model Matrix and Computation of the Coefficients of the Polymer Factorial Design

    Computing the coefficients is very simple (again, no software required!). For each of them, multiply point to point the column corresponding to the coefficient that has to be estimated by the column of the response, and then take the average of the results. For instance, for estimating b1 (the linear term of X1), just calculate (- 51.8 + 51.6 - 51.0 + 42.4 - 50.2 + 46.6 - 52.0 + 50.0)/8 = - 1.8.
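    The whole procedure — building the model matrix from the experimental matrix and averaging the signed responses — can be verified with a short Python/NumPy sketch (the responses are those of Table 3, in standard order; NumPy is an assumption, the arithmetic itself needs no software):

```python
import numpy as np
from itertools import combinations

# Experimental matrix of the 2^3 design, standard order (Table 1).
X = np.array([[i, j, k] for k in (-1, 1) for j in (-1, 1) for i in (-1, 1)])

# Responses (viscosity, 10^3 mPa s) in the same standard order (Table 3).
y = np.array([51.8, 51.6, 51.0, 42.4, 50.2, 46.6, 52.0, 50.0])

# Model matrix: constant column, linear terms, then all products of columns.
cols = [np.ones(len(y))] + [X[:, i] for i in range(3)]
for r in (2, 3):
    for combo in combinations(range(3), r):
        cols.append(np.prod(X[:, combo], axis=1))
M = np.column_stack(cols)  # columns: b0, b1, b2, b3, b12, b13, b23, b123

# Each coefficient is the average of the responses weighted by the +/-1 column.
b = (M * y[:, None]).mean(axis=0)
print(dict(zip(["b0", "b1", "b2", "b3", "b12", "b13", "b23", "b123"], b.round(2))))
```

    The printed dictionary reproduces, in particular, b1 = −1.8 as computed above.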

    An interesting thing to notice is that, as every column of the model matrix has four −1 and four +1, every coefficient will be computed as half the difference between the average of the four experiments with positive sign and the average of the four experiments with negative sign. This means that each coefficient is computed with the same precision, and that this precision, being based on the difference of two averages of four values, is much better than that of an OVAT experiment, where the difference between two single experiments (one performed at the higher level and one at the lower level) is usually computed. Once more, it can be seen how the experimental design can give much more information (the interaction terms) of much higher quality (higher precision of the coefficients).

    The following model has been obtained:

    y = 49.45 − 1.80X1 − 0.60X2 + 0.25X3 − 0.85X1X2 + 0.40X1X3 + 1.90X2X3 + 1.25X1X2X3

    As eight coefficients have been estimated with eight experiments (and therefore no degrees of freedom are available) and as the experimental variability is not known, it is impossible to define a statistical significance of the coefficients. Anyway, the linear term of X1 (reagent A) and the interaction X2–X3 (reagent B–reagent C) have larger absolute values than the other terms.

    The negative coefficient of X1 indicates that by increasing the amount of reagent A, a decrease of the viscosity is obtained, and therefore better results are obtained by reducing its amount. As X1 is not involved in any relevant interaction, we can conclude that this effect is present whatever the values of the other two reagents.

    As far as the reagent B–reagent C interaction is concerned, it can only be interpreted by looking at the isoresponse plot shown in Figure 3. As we are plotting the response on the plane defined by two variables (think of a slice of the cube depicted in Figure 1), we must define the level of the third variable (reagent A) at which we want to represent the response (i.e. where to cut the slice). The clear effect of reagent A (the lower, the better) leads us to set the value of X1 at its lower level (−1, corresponding to 9 g).

    Figure 3 Isoresponse plot of the polymer Factorial Design.

    The geometrical shape of a linear model without interactions is a plane (the isoresponse lines are parallel); if relevant interactions are present, it becomes a distorted plane (the isoresponse lines are not parallel). This is the case of the response surface on the plane of reagent B–reagent C. By looking at the plot, it can be seen that an increase of reagent B decreases viscosity when reagent C is at its lower level, while it has the opposite effect when reagent C is at its higher level. In the same way, an increase of reagent C decreases viscosity when reagent B is at its lower level, while it has the opposite effect when reagent B is at its higher level.

    Looking at the plot, it can also be understood why the OVAT approach did not produce any good result. If you go to the centre point (corresponding to the original formulation) and change the amount of either reagent B or reagent C (but not both at the same time), you will realize that, whatever experiment you do, nothing will change. Instead, owing to the strong interaction, relevant variations are obtained only when both variables are changed at the same time.
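    A small Python sketch makes the failure of OVAT at the centre point explicit; the coefficients below are those obtained by applying the sign-column averaging rule to the Table 3 responses, expressed in coded units:

```python
def predict(x1, x2, x3):
    """Model of the polymer design in coded units (coefficients computed
    from the Table 3 responses with the sign-column averaging rule)."""
    return (49.45 - 1.80 * x1 - 0.60 * x2 + 0.25 * x3
            - 0.85 * x1 * x2 + 0.40 * x1 * x3
            + 1.90 * x2 * x3 + 1.25 * x1 * x2 * x3)

centre = predict(0, 0, 0)   # original formulation
only_b = predict(0, 1, 0)   # raise reagent B alone (an OVAT move)
only_c = predict(0, 0, 1)   # raise reagent C alone (an OVAT move)
both   = predict(0, 1, 1)   # raise B and C together

print(centre, only_b, only_c, both)
```

    Changing X2 or X3 alone moves the predicted viscosity by at most 0.6 × 10³ mPa s, while changing both together gains about 1.55 × 10³ mPa s — exactly the behaviour produced by the strong X2–X3 interaction.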

    Two combinations produce the same response: 3.6 g of reagent B with 9 g of reagent C, and 4.4 g of reagent B with 11 g of reagent C. As a higher amount of reagents increases the speed of the reaction, and therefore the final throughput, the latter was selected; the best combination is therefore 9 g of reagent A, 4.4 g of reagent B and 11 g of reagent C. All the experiments had been performed at lab scale, and therefore this formulation had to be tested at the plant. When this was done, the results obtained in the lab were confirmed, with a viscosity in the range 50.0–52.0 × 10³ mPa s, well over the acceptability value.

    Happy but not totally satisfied, the person performing the experimental design tried one more experiment. The results of the experimental design showed that a decrease of reagent A was leading to better products, and that this variable was not involved in interactions with the other variables.

    Of course, this behaviour had been demonstrated only inside the experimental domain, but it was worthwhile to check whether the effect was the same outside it as well. The most logical development would have been a further experimental design centred on the new formulation, but she did not have enough time to do eight more experiments. So, she simply tried to further reduce reagent A, testing the formulation with 7 g of reagent A, 4.4 g of reagent B and 11 g of reagent C. This experiment was a total success, as the product obtained at the plant had a viscosity in the range 55.0–60.0 × 10³ mPa s, well above the acceptable value.

    Of course, everybody in the company was very happy with the result—everybody except one person. Can you guess who? It was the expert in charge of the problem, who could not accept that somebody else could succeed with just nine experiments where he totally failed, in spite of having performed a huge number of experiments.

    One more comment: the previous example is not an optimization.

    Probably, if more experiments had been performed with further experimental designs, even better results could have been obtained. Anyway, the immediate goal of the company was not to find the optimum, but rather to get out of an embarrassing situation and to find a commercially valid solution as fast as possible, and the Factorial Design, the simplest of all the experimental designs, allowed a substantial improvement to be obtained with a very limited experimental effort.

    The main problem with the previous design was that, as there were no degrees of freedom and no previous estimate of the experimental variability was available, it was not possible to determine which coefficients were statistically significant.

    Furthermore, as in a 2ᵏ Factorial Design each variable has two levels, only linear models (with interactions) can be estimated. In order to use them as predictive models they must be validated. To do that, an experiment (or, better, a set of experiments) is performed at the centre point. The experimental response is then compared with the predicted response (corresponding to the b0 coefficient). If the two values are not significantly different, then the model is said to be validated and can therefore be used to predict the outcome of the experiments in the whole experimental domain. It has to be well understood that validating a model does not mean demonstrating that it is true; instead, it means that it has not been possible to demonstrate that it is false. It is a subtle but very relevant difference (the same as between being acquitted because it has been demonstrated that you are not guilty and being acquitted because it was not possible to demonstrate that you are guilty).
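    The centre-point check can be written down as a simple significance test. The sketch below is a simplified version: it compares the replicate mean directly with b0 via a one-sample t-test, ignoring the uncertainty of b0 itself. The function name and the replicate values are hypothetical, and SciPy is assumed:

```python
from math import sqrt
from statistics import mean, stdev

from scipy.stats import t

def validate_centre(centre_responses, b0, alpha=0.05):
    """One-sample t-test of the centre-point replicates against the
    model prediction b0 (simplified: the uncertainty of b0 is ignored)."""
    n = len(centre_responses)
    ybar, s = mean(centre_responses), stdev(centre_responses)
    t_obs = abs(ybar - b0) / (s / sqrt(n))
    t_crit = t.ppf(1 - alpha / 2, df=n - 1)
    return bool(t_obs < t_crit)  # True -> model not proven false ("validated")

# Hypothetical centre-point replicates, for illustration only.
print(validate_centre([49.2, 49.8, 49.6], b0=49.45))  # True -> validated
```

    Note that a "True" outcome only means that no significant curvature or lack of fit could be detected — in the spirit of the text, the model has not been proven false.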

    A group of crystallographers at NASA was interested in studying the effect of three variables (amount of precipitant, degree of supersaturation, amount of impurities) on the growth of the crystals of a protein [3]. The goal of the study was to obtain the largest possible crystals, and the measured response (to be minimized) was the logarithm of the average number of crystals obtained in different wells (the lower the number, the larger the crystals). As a high variability was expected, each experiment was run in duplicate; this also allowed a better estimate of the experimental variance. In order to validate the model, a centre point had also been added. The total number of experiments was 18, far fewer than they were used to performing. Table 5 shows the experimental design, the experimental plan and the responses.

    Table 5

    Experimental Design, Experimental Plan and Responses of the NASA Factorial Design

    The resulting model was the following:

    For each experiment two replicates were available, and therefore the experimental standard deviation could be computed as pooled standard deviation from the nine pairs of replicates. This value was 0.125, with nine degrees of freedom (one from each pair).
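    The pooled standard deviation from duplicate experiments is easy to compute by hand or in a few lines of Python; each pair of replicates contributes one degree of freedom (the duplicate values below are hypothetical, for illustration only):

```python
from math import sqrt

def pooled_sd_from_pairs(pairs):
    """Pooled standard deviation from duplicate experiments.

    Each pair contributes (y1 - y2)^2 / 2 with one degree of freedom,
    so with n pairs: s = sqrt(sum((y1 - y2)^2 / 2) / n)."""
    n = len(pairs)
    ss = sum((y1 - y2) ** 2 / 2 for y1, y2 in pairs)
    return sqrt(ss / n), n  # n degrees of freedom, one per pair

# Hypothetical duplicate responses, for illustration only.
pairs = [(1.10, 1.30), (0.85, 0.95), (1.40, 1.20)]
s, dof = pooled_sd_from_pairs(pairs)
print(round(s, 3), dof)
```

    In the NASA design the same formula, applied to the nine pairs of Table 5, gives the value 0.125 with nine degrees of freedom quoted in the text.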

    The model matrix for this design is reported in Table 6 (it has to be noticed that it has only 16 rows, because the two experiments at the centre point are only used for validation, and are not taken into account for computing the coefficients).

    Table 6

    Model Matrix of the NASA Factorial Design

    The model matrix is commonly denoted as X. By premultiplying it by its transpose and then inverting this product, the dispersion matrix is obtained (D = (XᵀX)⁻¹). The dispersion matrix is a square matrix having as many rows and as many columns as coefficients in the model (eight in our case, see Table 7). When multiplied by the experimental variance, the diagonal terms give the variance of the coefficients, while the extradiagonal terms give the covariance of the coefficients.

    Table 7

    Dispersion Matrix of the NASA Factorial Design

    The fact that the dispersion matrix is diagonal means that there is no covariance among the coefficients, and therefore all of them can be computed independently of each other (it is an orthogonal design).

    It can also be seen that all the elements of the diagonal are the same, meaning that all the coefficients are estimated with the same precision. This is not a surprise because, as we have previously seen, the estimation of the coefficients of a Factorial Design is performed in the same way for all of them (it is always the average of the response vector multiplied point to point by the corresponding vector of the model matrix, having as many ‘+1’ as ‘−1’ terms). More specifically, their value is 0.0625, which is 1/16. Generally speaking, the 2ᵏ Factorial Designs in which all the experimental points have the same number of replicates are orthogonal designs producing a diagonal dispersion matrix with the diagonal terms equal to 1/(number of experiments). It is clear now how (and how much) performing replicates improves the quality of the design by decreasing the standard deviation (and therefore the confidence interval) of the coefficients. And once more it has to be noted that none of the calculations we have done till now requires any software.
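    The dispersion matrix of the duplicated 2³ design can be verified numerically. A minimal sketch, assuming Python/NumPy:

```python
import numpy as np
from itertools import combinations

# Model matrix of the 2^3 design with every point duplicated (16 rows),
# built exactly as in the polymer example.
base = np.array([[i, j, k] for k in (-1, 1) for j in (-1, 1) for i in (-1, 1)])
cols = [np.ones(8)] + [base[:, i] for i in range(3)]
for r in (2, 3):
    for combo in combinations(range(3), r):
        cols.append(np.prod(base[:, combo], axis=1))
X = np.tile(np.column_stack(cols), (2, 1))  # two replicates of each experiment

D = np.linalg.inv(X.T @ X)                  # dispersion matrix D = (X'X)^-1
print(np.round(D, 4))
```

    Because the ±1 columns of the duplicated model matrix are mutually orthogonal, XᵀX = 16·I, so D is diagonal with every diagonal term equal to 1/16 = 0.0625, as stated above.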

    As previously said, the variance of the coefficients can be computed by multiplying the experimental variance by the terms on the diagonal of the dispersion matrix. In our case, the standard deviation of the coefficients will be sqrt(0.125² ∗ 0.0625) = 0.031. As the experimental variance has been estimated with nine degrees of freedom, the corresponding values of t are 2.26, 3.25 and 4.78 for p = 0.05, 0.01 and 0.001, respectively. Therefore, the semi-amplitude of the confidence interval is 0.07, 0.10 and 0.15 for p = 0.05, 0.01 and 0.001. Each coefficient can now be given its significance level, and the model can be written accordingly:

    (the level of significance is indicated according to the usual convention: ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001).
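    The confidence-interval computation just described can be reproduced as follows (SciPy's Student's t quantile function is assumed):

```python
from math import sqrt

from scipy.stats import t

s_exp, dof = 0.125, 9  # pooled experimental SD and its degrees of freedom
d_ii = 0.0625          # diagonal element of the dispersion matrix (1/16)

sd_coeff = sqrt(s_exp ** 2 * d_ii)    # standard deviation of each coefficient
for p in (0.05, 0.01, 0.001):
    t_crit = t.ppf(1 - p / 2, df=dof)  # two-sided critical t value
    print(f"p = {p}: t = {t_crit:.2f}, semi-amplitude = {t_crit * sd_coeff:.2f}")
```

    The printed values reproduce the t values (2.26, 3.25, 4.78) and the semi-amplitudes (0.07, 0.10, 0.15) quoted in the text.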
