Chemometrics Applications in Biotech Processes: A Review

Anurag S. Rathore, Nitish Bhushan, and Sandip Hadpe


Dept. of Chemical Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India

DOI 10.1002/btpr.561
Published online February 28, 2011 in Wiley Online Library (wileyonlinelibrary.com).

Biotech unit operations are often characterized by a large number of inputs (operational parameters) and outputs (performance parameters) along with complex correlations amongst them. A typical biotech process starts with the vial of the cell bank, ends with the final product, and has anywhere from 15 to 30 such unit operations in series. The aforementioned parameters can impact process performance and product quality and also interact amongst each other. Chemometrics presents one effective approach to gather process understanding from such complex data sets. The increasing use of chemometrics is fuelled by the gradual acceptance of quality by design and process analytical technology amongst the regulators and the biotech industry, which require enhanced process and product understanding. In this article, we review the topic of chemometrics applications in biotech processes with a special focus on recent major developments. Case studies have been used to highlight some of the significant applications.

© 2011 American Institute of Chemical Engineers Biotechnol. Prog., 27: 307–315, 2011


Keywords: chemometrics, MVDA, multivariate data analysis, bioprocessing

Correspondence concerning this article should be addressed to A. S. Rathore at asrathore@biotechcmz.com.

Introduction

Chemometrics as a separate field has grown considerably in the past 4 decades. One of the popular definitions is that from Wold: "How to get chemically relevant information out of measured chemical data, how to represent and display this information, and how to get such information into data.1" Chemometrics is being increasingly used in both basic research and applied scientific fields and has enabled the diagnostic evaluation of parameter interactions that were previously undefined. For complex datasets, univariate or bivariate analysis is often inefficient and is likely to result in misinterpretation of data.2–4 Use of projection methods can effectively deal with challenges such as multidimensionality of the data set, multicollinearity, missing data, and variability from experimental error and noise.5

The chemical industry was early in recognizing and adopting chemometrics as a quick and economical method of extracting real-time information from data and, thus, leading to improved process monitoring and control. Visible spectroscopy, near-infrared (NIR) spectroscopy, mid-infrared (MIR) spectroscopy, nuclear magnetic resonance (NMR) spectroscopy, and Fourier transform infrared (FTIR) spectroscopy are some of the commonly used process analyzers that have been used in the chemical industry. Principal component analysis (PCA), partial least squares (PLS) regression, principal component regression (PCR), canonical variable analysis (CVA), and modified soft independent modeling of the class analogy (SIMCA) are some of the statistical tools that have been used to facilitate analysis and modeling of the abundant data that are provided by the aforementioned process analyzers. PCA has been used to transform 12 cross-correlated variables of an activated sludge wastewater treatment into three uncorrelated principal components, which were able to account for 78% of the system variability. This enabled easier analysis, monitoring, and diagnosis of the system.6 Combination of visible (Vis) and NIR spectroscopy and chemometrics has been used for discrimination between samples of Australian commercial white wines of different varietal origins, Chardonnay and Riesling.7 Models developed using PCA, PCR, and discriminant PLS regression gave an excellent discrimination between samples of the two varietal origins under consideration with an accuracy up to 98%. These models could be used by the wine industry for identification of white wine varieties or their blends. FTIR-PCA has been used to study oxidation of lubricating base oils.8 The principal components generated were able to explain 99.99% of the variance, with the second PC showing that iron favored formation of alcohols and esters and thus influenced the oxidation process. Applications of chemometrics in pattern recognition and multivariate calibration have been shown to result in greater profits for industry because of better process control, faster verification of raw material identification and quality, and faster analysis of wastewater.9 Chemometrics has also been applied for authentication of meat products, on-line estimation of the carcinogenic potential of lubricant base oil, and rapid analysis of a gaseous effluent from a heterogeneously catalyzed reaction.10–12 As demonstrated by the aforementioned applications, the combination of chemometrics and a suitable process analyzer has been proven beneficial for the chemical industry.

The use of chemometrics in the pharmaceutical industry has been relatively more recent with a lot of work being done in the last 2 decades. NIR spectroscopy has emerged as the analyzer of choice for a wide range of applications. NIR,

308 Biotechnol. Prog., 2011, Vol. 27, No. 2

in combination with PCA and SIMCA, has been used for routine testing of packed pharmaceutical substances directly in warehouses using a fiber optic probe.13 The proposed approach allowed for measurements through closed polyethylene bags, and a trichotomy classification procedure was proposed, which allowed an operator to identify raw materials of satisfactory quality. NIR, in combination with PCA and SIMCA, has also been used for the identification of counterfeit drugs.14 The tool was capable of identifying subtle alterations in drug composition with 100% accuracy and could be used as a tool for rapid, nondestructive testing. Other NIR-based applications include analysis of high-shear granulation, characterization and determination of azithromycin polymorphs, and determination of the crystalline form present in amorphous miokamycin.15–17 UV–vis spectroscopy, attenuated total reflectance-Fourier transform infrared (ATR-FTIR) spectroscopy, infrared imaging, and Raman spectroscopy are some of the other commonly used process analyzers in the pharmaceutical industry. NIR transmittance and Raman spectroscopy with PLS have been used for the prediction of active substance content in tablets.18 Successful calibrations were developed with prediction errors less than 3.7% and comparable to the error of the chromatographic reference method. PLS of absorbance spectra has been used for simultaneous determination of benzyl alcohol and diclofenac in pharmaceutical formulations.19 The proposed method was found to be simple, precise, and inexpensive, requiring no complex pretreatment, and thus an effective means for quality control of pharmaceutical products.

The last decade has seen a flurry of activities in the area of development of chemometrics applications for the biopharmaceutical processes.20,21 They include analysis of NIR spectral information for an antibiotic production process, multivariate statistical process monitoring for processing of pharmaceutical granules, the assessment of seed inoculum quality from a manufacturing process, and development of an integrated on-line multivariate statistical process monitoring, product attributes prediction, and fault diagnosis framework for a fed-batch penicillin fermentation.22–25 A flexible process monitoring method has been applied for analysis of pilot plant cell culture data for fault detection and diagnosis.26 A PCA model was constructed from 19 batches, and the model was shown to successfully detect abnormal process conditions and diagnose root causes. Feasibility of using chemometrics for supporting key activities required for successful manufacturing of biopharmaceutical products, including scale-up, process comparability, process characterization, and fault diagnosis, has been examined.27 Representative data from small-scale (2 L) and large-scale (2,000 L) cell culture batches were analyzed in this study. Scores plots, loadings plots, and variable importance for the projection (VIP) plots were utilized for assessing scale-up and comparability of the cell culture process. Batch control charts were also shown to be useful for fault diagnosis during routine manufacturing. A subsequent publication examined the usefulness of chemometrics for root cause analysis to identify scale-up differences and parameter interactions that adversely impacted cell culture process performance.28 We will discuss this application in more detail in the next section. In an even more recent publication, NIR-PCA was effectively used for screening of lots of basal medium powders based on their impact on process performance and product attributes for a cell culture process.29 These lots had identical composition as per the supplier and were manufactured at different scales using an identical process. A combined NIR-PCA approach made it possible to fingerprint the raw materials and to distinguish between the good and poor performing media lots.

In the following sections, we review chemometrics applications in upstream and downstream processing of biotech products. It will be evident that because of the complexity present in most biotech unit operations, implementation of quality by design (QbD) and process analytical technology (PAT) initiatives will require an increased use of chemometric tools and approaches.30–35 We wish to provide readers with a summary of major research that has been done on this topic recently.

Chemometrics Applications in Upstream Processing

Chemometric tools have been used in the cell culture operations in the last decade as summarized in Table 1. PCA has been used for detection and diagnosis of abnormal process conditions in an industrial fed-batch cell culture process. The model was successfully able to detect abnormal process conditions, which resulted from three known fault types, namely irregular thermal heating, elevated dissolved oxygen values, and large variation in agitation.26 PLS calibration models of NIR spectra have been utilized for the measurement of glucose, lactate, glutamine, and ammonia in undiluted serum-based cell culture media.37 Robust, analyte-specific models were generated, and the low values of standard errors of prediction for each analyte demonstrate that the models can be used to (off-line) determine the important nutrient and byproduct content in a serum-based cell culture medium. A novel PLS approach called evolving PLS has been compared with the traditional PLS using data from an industrial fed-batch mammalian cell culture process for prediction of intermediate and final quality variable values.40 Use of in situ 2D fluorometry in combination with chemometrics has been evaluated for monitoring the concentration of viable cells and the concentration of recombinant proteins in mammalian cell culture.42 PCA was used to filter the large volumes of redundant spectral data, while PLS correlated the reduced data with the target state variables. Both viable cell density and glycoprotein concentration were accurately estimated, which strongly suggests that the combination of 2D fluorometry with suitable chemometric techniques is a consistent technique for monitoring of a cell culture medium.

Table 2 presents a summary of chemometrics applications targeting the fermentation unit operation. FTIR combined with multiple linear regression has been demonstrated to be useful for measurement of fermentation components including proteins, polysaccharides, lipids, and microbes that are otherwise difficult to measure using sensors.51 The resulting calibration model could be used for measuring component concentrations. A combination of attenuated total reflectance-mid-infrared spectroscopy (ATR-MIR) and PLS has been used to monitor the concentrations of key analytes in a Streptomyces clavuligerus bioprocess for the synthesis of clavulanic acid.45 Quantitative, at-line models having low prediction error were developed and validated, demonstrating the utility of ATR-MIR along with chemometrics for real-time monitoring of complex fermentation processes. Utility of including seed-quality information into data-based models for estimation of final productivity in an industrial antibiotic fermentation process has been examined.24 Multiway PCA and PLS regressions were applied to assess the seed quality

Table 1. Chemometrics Applications in Mammalian Cell Culture Unit Operation of Biotech Processes
Application | Process Analyzer | Chemometric Tool | Observations
Development of rapid peptone screening method to optimize the efficiency and consistency of protein production | NMR spectroscopy | PCA and PLS | NMR-PLS was used to analyze spectral data for prediction of cell culture titer for a given peptone lot.36
Fault detection and diagnosis for an industrial fed-batch process |  | PCA | Model successfully detected abnormal process conditions and diagnosed root cause for process underperformance.26
Chemometric approach for root cause analysis |  | PLS | Model successful in identifying the root cause and designing experimental conditions to demonstrate and correct it.28
Measurement of multiple analytes in undiluted serum-based cell culture media | NIR spectroscopy | PLS | Robust, analyte-specific models were generated with low prediction errors.37
Measurement of glucose and glutamine in insect cell culture media | NIR spectroscopy | PLS | Accurate measurements were possible with low error values.38
Routine screening of cell culture media used in industrial biotechnology | Fluorescence spectroscopy | Multiway robust PCA, NPLS-DA, and NPLS | Robust methods could identify compositional changes and predict product yield.39
Use of chemometrics in data analysis |  | PLS | Chemometrics used for scale-up, comparability, process characterization, and fault diagnosis.27
Prediction of product titer in an industrial fed-batch cell culture process |  | PLS, "evolving" PLS | Method was able to accurately predict both intermediate and final quality measurements as well as detect process faults.40
Noninvasive glucose monitoring | NIR spectroscopy | PLS | A successful multivariate calibration model was built for glucose detection with very low error values.41
Monitoring concentration of viable cells and recombinant proteins | In situ 2D fluorometry | PCA, PLS | Accurate measurements over a wide range of reactor operating conditions.42
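
As a rough illustration of the PCA-based fault detection listed in Table 1, the sketch below fits a PCA model to a set of normal batches and flags a new batch whose Hotelling T² exceeds the largest value seen in the reference set. It is a minimal sketch under stated assumptions, not the method of the cited studies: the data are synthetic, the function names `fit_pca` and `hotelling_t2` are hypothetical, and the empirical limit stands in for the statistical control limits a real monitoring scheme would use.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA on autoscaled data via SVD and keep what is needed to score new batches."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                        # autoscale each variable
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return {
        "mean": mean,
        "std": std,
        "loadings": Vt[:n_components].T,        # variables x components
        "score_var": s[:n_components] ** 2 / (X.shape[0] - 1),
        "explained": float((s[:n_components] ** 2).sum() / (s ** 2).sum()),
    }

def hotelling_t2(model, X):
    """Hotelling T^2 of each row of X in the retained principal-component space."""
    Z = (X - model["mean"]) / model["std"]
    scores = Z @ model["loadings"]
    return (scores ** 2 / model["score_var"]).sum(axis=1)

# Synthetic stand-in for historical data: 19 normal batches, 12 cross-correlated
# variables driven by 3 hidden factors (all values are made up for illustration).
rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 12))
normal = rng.normal(size=(19, 3)) @ mixing + 0.1 * rng.normal(size=(19, 12))

model = fit_pca(normal, n_components=3)
t2_limit = hotelling_t2(model, normal).max()    # crude empirical limit (assumption)

# A hypothetical faulty batch: one normal batch pushed far along a hidden factor.
faulty = normal[:1] + 20.0 * mixing[0]
t2_fault = hotelling_t2(model, faulty)[0]
print(f"explained variance: {model['explained']:.2f}")
print(f"T2 limit: {t2_limit:.1f}, faulty batch T2: {t2_fault:.1f}")
```

In practice the limit would usually come from an F-distribution or from validation batches, and a residual (SPE or Q) statistic is typically monitored alongside T².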

Table 2. Chemometrics Applications in Microbial Fermentation Unit Operation of Biotech Processes
Application | Process Analyzer | Chemometric Tool | Observations
Yeast fermentations | Mid-infrared (MIR) spectroscopy | PLS | Prediction of concentrations of glucose and ethanol.43
Process and quality control of fermentative production of ethanol | FT-MIR and FT-Raman spectroscopy | PCR and PLS | High prediction errors with Raman spectroscopy. MIR could be used for rapid detection of analytes.44
Complex antibiotic fermentation | Attenuated total reflectance-MIR spectroscopy | PLS | At-line models with low prediction error values for monitoring of ammonium, glucose, methyl oleate, and biomass.45
Influence of seed quality on productivity of an industrial antibiotic fermentation |  | Multiway PCA and PLS | Model could distinguish between high and low productivity using seed fermentation data from both pilot and production scales.24
Automated production support in an industrial fermentation system |  | PCA, multiway PCA, "moving window" PCA | Algorithm provided concise indicators of process faults and assisted operators by taking corrective measures.46
Monitoring and control of fed-batch fermentation processes |  | PLS | Accurate inference of difficult-to-measure quality variables; detection and isolation of fault conditions.47
On-line batch fermentation process monitoring | NIR | PCA, PLS, MSPC charts | Easy and efficient identification of abnormal fermentation runs at an early stage of the fermentation.48
Monitoring of industrial fermentation processes |  | MPCA, MPLS | Detection of dissimilar batches and irregular quality variables, prediction of final product concentration.49
Monitoring of industrial fermentation processes | NMR | PCA | First two principal components accounted for over 99% of the sample variance. Loading plot predicts the analyte variation.50
Solid-state fermentations | FTIR | Multiple linear regression | Standard errors of prediction were less than 5% for measurement of proteins, polysaccharides, lipids, and microbes.51
Improvement of the fermentation process of docosahexaenoic acid production | Gas chromatography | Artificial neural network (ANN) | Prediction of accurate optimal conditions without any experimentation.52
Rapid analysis of the expression of heterologous proteins in Escherichia coli | Mass spectrometry, FTIR spectroscopy | ANN, PLS regression | Quantitation of expression of a heterologous protein directly in fermentation broths.53
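
Several entries in Table 2 rely on PLS calibration of spectra against an analyte concentration. The sketch below shows one way such a calibration could be set up, with the number of PLS factors chosen by leave-one-out cross-validation. It is illustrative only: the spectra are synthetic mixtures of three overlapping Gaussian bands, the PLS1 NIPALS routine is a compact textbook variant, and the names `pls1_fit`, `pls1_predict`, and `loo_rmsecv` are hypothetical rather than taken from any cited work.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """PLS1 (single response) via NIPALS; returns coefficients and centering terms."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f
        w = w / np.linalg.norm(w)               # weight vector
        t = E @ w                               # scores
        tt = t @ t
        p = E.T @ t / tt                        # X loadings
        c = f @ t / tt                          # y loading
        E = E - np.outer(t, p)                  # deflate X
        f = f - c * t                           # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)      # regression vector in X space
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

def loo_rmsecv(X, y, n_factors):
    """Root mean square error of leave-one-out cross-validation."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pls1_fit(X[keep], y[keep], n_factors)
        errors.append(pls1_predict(model, X[i:i + 1])[0] - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Synthetic calibration set (assumption): 30 mixtures of 3 overlapping bands.
rng = np.random.default_rng(1)
wavelengths = np.arange(50)
pure = np.array([np.exp(-0.5 * ((wavelengths - c) / 6.0) ** 2) for c in (20.0, 25.0, 30.0)])
conc = rng.uniform(0.0, 1.0, size=(30, 3))
X = conc @ pure + 0.01 * rng.normal(size=(30, 50))
y = conc[:, 0]                                  # analyte of interest

rmsecv = {a: loo_rmsecv(X, y, a) for a in range(1, 6)}
best = min(rmsecv, key=rmsecv.get)
print(rmsecv, "-> selected factors:", best)
```

Picking the factor count with the lowest cross-validated error is the simplest rule; a more conservative choice is the smallest number of factors whose error is not meaningfully worse than the minimum.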

Figure 1. Illustration of using chemometrics for root cause analysis for a mammalian cell culture step.
A: Variable importance for the projection plot shows which variables have the most influence on the process. B: The score scatter plot shows a correlation between the raw material type and product attributes. Adapted from Refs. 28 and 29.
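
For readers unfamiliar with the VIP plot in panel A, one commonly used definition of the VIP score for a PLS model is sketched below. The symbols are assumptions introduced here rather than notation from the cited studies: p is the number of X variables, A the number of PLS components, w_{ja} the weight of variable j in component a, and SS_a the variation in Y explained by component a.

```latex
\mathrm{VIP}_j
  = \sqrt{ \frac{ p \sum_{a=1}^{A} SS_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^2 }
                { \sum_{a=1}^{A} SS_a } },
\qquad
\sum_{j=1}^{p} \mathrm{VIP}_j^2 = p
```

Because the squared scores average to 1 across variables, a VIP above 1 is often read as above-average influence on the modeled response.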

and estimate the final productivity, respectively. Using these chemometric techniques, it was possible to extract seed fermentation features related to the final productivity at both pilot and production scales.

Case study illustrating use of chemometrics for root cause analysis

Figure 1 illustrates an application involving use of chemometrics for root cause analysis to identify scale-up differences and parameter interactions that adversely impacted cell culture process performance.29 Data from a total of 171 batches run at small scale (2 L), pilot scale (2,000 L), and commercial scale (15,000 L) were used in this analysis. Input parameters examined included continuous on-line measurements of operating parameters (e.g., pH, dissolved oxygen, and temperature), daily measurements of dissolved CO2, metabolic indicators, and cell growth parameters. The output parameters included product attributes such as product titer, viable cell density, cell viability, and osmolality. Time course performance variables (daily, initial, peak, and end point) were also evaluated. A total of 119 output variables from raw materials, product attributes, time course, and seed inocula trains were evaluated using chemometrics. The VIP plot in Figure 1A summarizes the observations made from the score and loading plots by showing the relative importance of each included variable in the analysis. It is observed that large-scale raw material type and culture metabolism evolution (e.g., elevated lactate and osmolality) exert the greatest influence on product attributes. These two parameters were further examined for their impact and interactions. Next, a score scatter plot (shown in Figure 1B) was generated based on data analysis. It provides a visual summary of the process behavior over time, with the score vectors for the first two principal components, t[1] and t[2], plotted against each other. It is seen in Figure 1B that a correlation exists between raw material type and product attributes. On the basis of the chemometric analysis, the authors were able to conclude that the root cause of process underperformance was the combination of large-scale media and high pCO2 conditions in the large-scale bioreactor. This hypothesis was later confirmed by experiments performed at pilot scale using the different types of raw materials and CO2 concentrations.29

As is evident from Tables 1 and 2, significant work has been done in the mammalian cell culture and microbial

Table 3. Summary of Downstream Applications
Application | Process Analyzer | Chemometric Tool | Observations
Use of on-line HPLC for making real-time decisions for pooling of process scale chromatography columns based on product quality attributes to get consistent product quality | HPLC | Linear regression | Scalability demonstrated of using on-line anion exchange HPLC for operations at pilot scale and eventually at manufacturing scale for pooling of a hydrophobic interaction chromatography (HIC) column.54
Use of on-line HPLC for making real-time decisions for pooling of process scale chromatography columns based on product quality attributes to get consistent product quality | UPLC | Linear regression | UPLC offered significantly faster separations (7 min) while retaining the separation efficiency.55
Use of on-line HPLC for making real-time decisions for pooling of process scale chromatography columns based on product quality attributes to get consistent product quality | HPLC | Linear regression | Feasibility of using a commercially available on-line high-performance liquid chromatography (HPLC) system for real-time pooling of process chromatography column.56
Competitive binding for partial filling affinity capillary electrophoresis (ACE) | Capillary electrophoresis | Response surface methodology (RSM) | Successful estimation of affinity binding constant and prediction of the significance of injection time, voltage, and ligand concentration on protein–neutral ligand binding.57
Competitive binding for partial filling affinity capillary electrophoresis (ACE) | Capillary electrophoresis | RSM-Box Behnken design | Identification of the greatest influential factor in reaching a targeted response out of capillary length, voltage, and injection time on protein–ligand binding.58
Separation of lysozyme from chicken egg white using ultrafiltration | UV–vis | Parameter scanning | Assessment of the suitability of the two membranes (Biomax 30-kDa polyethersulfone and Ultracel Amicon YM 30-kDa regenerated cellulose) and determination of significant process parameters.59
Thin layer chromatography for analysis of amino acids | NIR spectroscopy | PCA | Fast and easy method for classification of L-form amino acids either in dissolved or solid condition in blood samples.60
Peptide profiling of cheese extracts | HPLC | Fuzzy approach (FA), PCA, MDS, nonhierarchical cluster analysis | Fuzzy approach used for data reduction with results matching that from visual matching.61
Real-time control of antibody loading during protein A affinity chromatography | On-line HPLC, Off-line ELISA | Linear regression | Accurate control of antibody loading on a protein A column in a CHO-derived recombinant antibody purification process and assessment of chromatography column performance.62
Bioprocess monitoring and control of flocculation from yeast cell homogenate | NIR spectroscopy | PLS | Cost-effective and useful approach for monitoring of cell debris, protein, and ribonucleic acid (RNA) with a >90% correlation between the predictions and the actual data.63
Properties of benzoic acid derivative with polypeptide (benzoyl poly-L-lysine) materials | HPLC, NMR, circular dichroism (CD) | PCA | Model based on the data matrix of 18 conjugates described by 19 variables and the six principal components explained 85.7% of the total data variance.64
Determination of structural parameters from chromatography retention data | Chromatography | Quantitative structure retention relationships (QSRRs) | Structural parameters of a series of benzodiazepines (BDZs) were identified determining quantitative structure–enantiospecific retention relationships (QSERRs), and the topography of BDZ-binding sites on HSA was determined.65
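
The on-line HPLC and UPLC entries in Table 3 use a linear regression to support real-time pooling decisions. A minimal sketch of that idea follows: a straight line is fitted to the purity of the most recently analyzed fractions and extrapolated one fraction ahead. The window size, the purity threshold, the example values, and the function names are all assumptions made for illustration, not details reported in the cited studies.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_next_purity(purities, window=3):
    """Extrapolate the purity of fraction n + 1 from the last `window` analyzed fractions.

    Needs at least two analyzed fractions; fractions are numbered from 1.
    """
    recent = purities[-window:]
    first_index = len(purities) - len(recent) + 1
    xs = list(range(first_index, len(purities) + 1))
    slope, intercept = fit_line(xs, recent)
    return slope * (len(purities) + 1) + intercept


def collect_next_fraction(purities, min_purity=95.0, window=3):
    """Keep pooling only while the predicted purity stays at or above the threshold."""
    return predict_next_purity(purities, window) >= min_purity


# Hypothetical purity values (%) for the fractions analyzed so far.
measured = [99.0, 98.0, 97.0]
predicted = predict_next_purity(measured)
print(predicted, collect_next_fraction(measured))
```

A short window keeps the fit local to the tail of the elution peak, where purity changes fastest; a longer window smooths analytical noise at the cost of a slower response.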

fermentation unit operations involving the use of chemometrics. As upstream processing is a critical part of the biotech process, with the product of interest being expressed as well as most of the product-related and host-related impurities formed, this will be an area that will continue to see interest in use of chemometric tools and approaches for process modeling and optimization.

Chemometrics Applications in Downstream Processing

In comparison to upstream applications, chemometrics applications are somewhat sparse in the area of downstream processing. This could be due to the fact that a lot of the analyzers used in downstream processing, such as high-performance liquid chromatography (HPLC) or capillary electrophoresis, have

relatively longer times of analysis (compared to spectroscopy-based methods such as UV–vis) and also yield discrete data points. Table 3 presents a summary of these applications. Most of them are focused on chromatography and membrane separations. The chemometric tools used in these applications commonly include PCA, PLS, quantitative structure retention relationships, response surface methodology, and multidimensional scaling (MDS). PCA has been used to analyze data from inactivated polio vaccine (IPV) purification process for process optimization, enhancement of efficiency, and robustness.66 This application will be later discussed in this section in the form of a case study.

Chemometric analysis has also been used for optimizing and predicting protein–ligand binding conditions in affinity capillary electrophoresis.57,58 Response surface methodology (Box Behnken design) has been used to model the effect of injection time, voltage, and ligand concentration on protein–neutral ligand binding. The model was subsequently validated by a series of experimental runs. Parameter scanning technique, in combination with UV detection, was used for screening ultrafiltration membranes for the fractionation of proteins from real biological solutions.59 The authors successfully carried out separation of lysozyme directly from natural chicken egg white solutions with 30-kDa membranes. Fuzzy approach and PCA have been used to make inferential analysis easier by greatly reducing the number of dependent variables to enable chemometric analysis of peptide profiles from cheese extracts.61 Combination of PLS and NIR has been used for monitoring of a process that involves selectively removing contaminants such as cell debris, host cell proteins, and nucleic acids by flocculation of yeast cell homogenate.63 The approach was very cost effective and yielded a >90% correlation between the predictions and the actual data. PCA has been used to analyze data from high-performance size exclusion chromatography to study chromatographic behavior of conjugates of benzoic acid derivative with polypeptide (benzoyl poly-L-lysine) materials.64 Analysis was performed on a data matrix of 18 conjugates described by 19 variables, using six principal components. The resulting model could explain 85.7% of the total variance in the data and was able to describe the structural behavior of each conjugate compound.

Case study illustrating use of chemometrics as a PAT enabler

Use of on-line HPLC has been demonstrated as a feasible approach for analysis that can facilitate real-time decisions for pooling of a hydroxyapatite process chromatography column based on product quality attributes.55 Linear regression fit was used to make this predictive model. The bottleneck of this approach was the analysis time (13 min) of the fractions that impact the yield. The analysis time was lowered to 7 min in a subsequent publication by using ultra-performance liquid chromatography (UPLC) for pooling a diethylaminoethyl process chromatography column.56 In a more recent publication, a combination of the use of monolithic columns and UPLC was able to reduce the analysis time to 1.3 min.54 This study involved pooling of a hydrophobic interaction process chromatography column. The control scheme was demonstrated to work at pilot scale (a 2,294-fold scale-up from a 3.4-mL column in the lab to a 7.8-L column in the pilot plant) and eventually to manufacturing scale (a 45,930-fold scale-up from a 3.4-mL column in the lab to a 158-L column in the manufacturing plant). Although chemometrics is not used here in the same manner as in the upstream applications, process modeling is an integral part of these pooling applications: despite the faster analysis, the analysis lags elution by at least one fraction, so a model is needed to predict the purity of the (n + 1)th fraction from the purity data available up to the nth fraction.54–56

Figure 2. The fluorescence spectra of the permeate and retentate stream obtained using a Varian fiber optic probe (FOP) assembly connected to a Varian Cary Eclipse spectrofluorometer equipped with a xenon flash lamp as the light source.
Reprinted with permission from Elshereef et al., Biotechnol Prog., 2010, 26, 168-178, © AIChE.

Case study illustrating use of chemometrics for process modeling

A study involving monitoring of fractionation of a whey protein isolate (WPI) during dead-end membrane filtration using fluorescence and chemometric tools was published recently.67 The model system of WPI consisted of α-lactalbumin (α-LA), β-lactoglobulin (β-LG), and a small proportion of bovine serum albumin (BSA). The objective of the study was to quantify the concentrations of the three species in the permeate and the retentate streams during ultrafiltration. Fluorescence spectroscopy was used because of its

greater sensitivity in comparison to UV–vis spectroscopy and its ability to detect concentrations as low as 10⁻¹⁰ to 10⁻¹² M. The technique also provided multidimensional excitation emission spectra, which required chemometric tools for analysis. The composition of α-LA, β-LG, and BSA in feed, permeate, and retentate streams was analyzed using an HPLC system equipped with a size exclusion column. Figure 2 illustrates the fluorescence spectra of the permeate and retentate streams. The HPLC measurements of α-LA, β-LG, and BSA were treated as output matrix Y, whereas the fluorescence measurements were treated as input data matrix X for the purpose of calibrating the PLS model. Sixty-four synthetic ternary mixtures containing the individual whey proteins in different proportions were randomly designed and used to develop the calibration models. The number of factors in the PLS model was selected by a cross-validation method that consisted of leaving out one sample at a time. Nonlinear tools such as artificial neural networks were used in combination with PLS for prediction of samples that have higher protein content (i.e., relating fluorescence to protein concentration). The estimated correlation coefficients for α-LA, β-LG, and BSA were 0.99, 0.98, and 0.87, respectively. Good agreement was obtained between predicted and measured values, indicating the applicability of the proposed method for simultaneous determination of the three species under discussion.

Case study illustrating use of chemometrics for analysis of manufacturing data

A recently published study involved chemometric analysis of data obtained from historical manufacturing batches of the Salk-IPV production.66 The objective of this study was to extract useful information from the manufacturing data for potential use in process optimization and enhancement of efficiency and robustness. The dataset comprised both on-line (measured continuously and recorded digitally) and off-line (measured occasionally and recorded by paper trace) data of over 50 production runs of three polio viruses, Mahoney (Type 1), MEF-1 (Type 2), and Saukett (Type 3), on the Vero cell line at two different bioreactor scales, 2 × 350 L (38 batches) and 2 × 750 L (12 batches), and the ensuing unit operations that involved virus purification and inactivation. Individual unit operations were analyzed first before developing a model for the entire process. Random distribution of points for the cell culture step in the two bioreactors

Although the chemometrics applications in downstream processing are sparse compared to those in upstream processing, the growing push for process understanding from QbD and PAT implementation in the biotech industry will ensure that more and more is accomplished in using chemometric approaches toward understanding functioning of complex downstream unit operations such as chromatography.

Conclusions

This article reviews chemometric applications in biotech processes. We hope to have demonstrated that chemometric tools can be very useful and powerful in extracting useful process information through analysis of the readily available data in order to maximize process understanding. In view of the complexity of biotech processes and products, we expect chemometrics to continue to act as a significant enabler of the QbD and PAT initiatives.

Literature Cited

1. Wold S. Chemometrics; what do we mean with it, and what do we want from it? Chemom Intell Lab Syst. 1995;30:109–115.
2. Kourti T. Process analytical technology and multivariate statistical control, Part 1. Process Anal Technol. 2004;1:13–19.
3. Kourti T. Process analytical technology and multivariate statistical control, Part 2. Process Anal Technol. 2005;2:24–28.
4. Kourti T. Process analytical technology and multivariate statistical control, Part 3. Process Anal Technol. 2006;3:18–24.
5. Martin EB, Morris AJ. Enhanced bio-manufacturing through advanced multivariate statistical technologies. J Biotechnol. 2002;99:223–235.
6. Tomita RK, Park SW, Sotomayor OAZ. Analysis of activated sludge process using multivariate statistical tools: a PCA approach. Chem Eng J. 2002;90:283–290.
7. Cozzolino D, Smyth HE, Gishen M. Feasibility study on the use of visible and near-infrared spectroscopy together with chemometrics to discriminate between commercial white wines of different varietal origins. J Agric Food Chem. 2003;51:7703–7708.
8. Gracia N, Thomas S, Bazin P, Duponchel L, Thibault-Starzyk F, Lerasle O. Combination of mid-infrared spectroscopy and chemometric factorization tools to study the oxidation of lubricating base oils. Catal Today. 2010;155:255–260.
9. Seasholtz MB. Making money with chemometrics. Chemom Intell Lab Syst. 1999;45:55–63.
10. Al-Jowder O, Defernez M, Kemsley EK, Wilson RH. Mid-infrared spectroscopy and chemometrics for the authentication of meat products. J Agric Food Chem. 1999;47:3210–3218.
11. Lima FSG, Araújo MAS, Borges LEP. Determination of the carcinogenic potential of lubricant base oil using near infrared
in the score plot (350-L scale) indicated that both the bio- spectroscopy and chemometrics. Trib Int. 2003;36:691–696.
12. Wilkin OM, Maitlis PM, Haynes A, Turner ML. Mid-IR spec-
reactors operated in a similar way. PCA analysis of the viral
troscopy for rapid on-line analysis in heterogeneous catalyst
inactivation showed that the polio virus cultivation was inde- testing. Catal Today. 2003;81:309–317.
pendent of the virus strain type. PCA analysis of downstream 13. Rodionova OY, Sokovikov YV, Pomerantsev AL. Quality con-
data revealed two groups representing the two production trol of packed raw materials in pharmaceutical industry. Anal
scales. To study the operational behavior, the data were cor- Chim Acta. 2009;642:222–227.
rected for volume, and the analyzed data were found to be 14. Scafi SHF, Pasquini C. Identification of counterfeit drugs using
randomly distributed, implying similar operational behavior near-infrared spectroscopy. Analyst. 2001;126:2218–2224.
15. Rantanen J, Wikström H, Turner R, Taylor LS. Use of in-line
at both production scales, indicating consistent manufactur- near-infrared spectroscopy in combination with chemometrics
ing. A five factor PLS model was developed to predict the for improved understanding of pharmaceutical processes. Anal
specific activity (defined as the ratio of D-antigen and pro- Chem. 2005;77:556–563.
tein nitrogen content) at 700-L production scale, resulting in 16. Blanco M, Valdés D, Bayod MS, Férnandez-Marı́ F, Llorente I.
R2 of 93% and model error of about 0.6. Thus, using this Characterization and analysis of polymorphs by near-infrared
model, specific activity could be predicted reasonably well spectrometry. Anal Chim Acta. 2004;502:221–227.
17. Blanco M, Coello J, Itturiaga H, Maspoch S, Pérez-Maseda C.
from the process data. This example highlighted the useful- Determination of polymorphic purity by near infrared spectrom-
ness of chemometrics in extracting useful information from etry. Anal Chim Acta. 2000;407:247–254.
complex datasets and leading to increased robustness, better 18. Dyrby M. Chemometric quantitation of the active substance
troubleshooting, and possibilities for future improvements. (containing C¼N) in a pharmaceutical tablet using near-infrared

(NIR) transmittance and NIR FT-Raman spectra. Appl Spectrosc. 2002;56:579–585.
19. Ghasemi J, Niazi A, Ghobadi S. Simultaneous spectrophotometric determination of benzyl alcohol and diclofenac in pharmaceutical formulations by chemometrics method. J Chin Chem Soc. 2005;52:1049–1054.
20. Gabrielsson J, Lindberg NO, Lundstedt T. Multivariate methods in pharmaceutical applications. J Chemom. 2002;16:141–160.
21. Johnson R, Yu O, Kirdar AO, Annamalai A, Ahuja S, Ram K, Rathore AS. Applications of multivariate data analysis in biotech processing. Biopharm Int. 2007;20:130–134, 136–138, 140–144.
22. Vaidyanathan S, Arnold SA, Matheson L, Mohan P, McNeil B, Harvey LM. Assessment of near-infrared spectral information for rapid monitoring of bioprocess quality. Biotechnol Bioeng. 2001;74:376–388.
23. Ündey C, Cinar A. Statistical monitoring of multistage, multiphase batch processes. IEEE Control Syst Mag. 2002;22:40–52.
24. Cunha CCF, Glassey J, Montague GA, Albert S, Mohan P. An assessment of seed quality and its influence on productivity estimation in an industrial antibiotic fermentation. Biotechnol Bioeng. 2002;78:658–669.
25. Ündey C, Ertunç S, Çinar A. Online batch/fed-batch process performance monitoring, quality prediction, and variable-contribution analysis for diagnosis. Ind Eng Chem Res. 2003;42:4645–4658.
26. Gunther JC, Conner JS, Seborg DE. Fault detection and diagnosis in an industrial fed-batch cell culture process. Biotechnol Prog. 2007;23:851–857.
27. Kirdar AO, Conner JS, Baclaski J, Rathore AS. Application of multivariate analysis toward biotech processes: case study of a cell-culture unit operation. Biotechnol Prog. 2007;23:61–67.
28. Kirdar AO, Green KD, Rathore AS. Application of multivariate data analysis for identification and successful resolution of a root cause for a bioprocessing application. Biotechnol Prog. 2008;24:720–726.
29. Kirdar AO, Chen G, Weidner J, Rathore AS. Combining near-infrared (NIR) spectroscopy and multivariate data analysis (MVDA) for screening of raw materials used in the cell culture medium for the production of a recombinant therapeutic protein. Biotechnol Prog. 2010;26:527–531.
30. Kozlowski S, Swann P. Considerations for biotechnology product quality by design. In: Rathore AS, Mhatre R, editors. Quality by Design for Biopharmaceuticals: Perspectives and Case Studies. NJ: Wiley; 2009:9–30.
31. Read EK, Park JT, Shah RB, Riley BS, Brorson KA, Rathore AS. Process analytical technology (PAT) for biopharmaceutical products: concepts and applications, Part 1. Biotechnol Bioeng. 2010;105:276–284.
32. Read EK, Park JT, Shah RB, Riley BS, Brorson KA, Rathore AS. Process analytical technology (PAT) for biopharmaceutical products: concepts and applications, Part 2. Biotechnol Bioeng. 2010;105:285–295.
33. Rathore AS, Bhambure R, Ghare V. Process analytical technology (PAT) for biopharmaceutical products. Anal Bioanal Chem. 2010;398:137–154.
34. Rathore AS, Winkle H. Quality by design for pharmaceuticals: regulatory perspective and approach. Nat Biotechnol. 2009;27:26–34.
35. Rathore AS. A roadmap for implementation of quality by design (QbD) for biotechnology products. Trends Biotechnol. 2009;27:546–553.
36. Luo Y, Chen G. Combined approach of NMR and chemometrics for screening peptones used in the cell culture medium for the production of a recombinant therapeutic protein. Biotechnol Bioeng. 2007;97:1654–1659.
37. Rhiel M, Cohen MB, Murhammer DW, Arnold MA. Nondestructive near-infrared spectroscopic measurement of multiple analytes in undiluted samples of serum-based cell culture media. Biotechnol Bioeng. 2002;77:73–82.
38. Riley MR, Rhiel M, Zhou X, Arnold MA, Murhammer DW. Simultaneous measurement of glucose and glutamine in insect cell culture media by near infrared spectroscopy. Biotechnol Bioeng. 1997;55:11–15.
39. Ryan PW, Li B, Shanahan M, Leister KJ, Ryder AG. Prediction of cell culture media performance using fluorescence spectroscopy. Anal Chem. 2010;82:1311–1317.
40. Gunther JC, Conner JS, Seborg DE. Process monitoring and quality variable prediction utilizing PLS in industrial fed-batch cell culture. J Process Control. 2009;19:914–921.
41. Jung B, Lee S, Yang IH, Good T, Cote GL. Automated on-line noninvasive optical glucose monitoring in a cell culture system. Appl Spectrosc. 2002;56:51–57.
42. Teixeira AP, Portugal CAM, Carinhas N, Dias JML, Crespo JP, Alves PM, Carrondo MJT, Oliveira R. In situ 2D fluorometry and chemometric monitoring of mammalian cell cultures. Biotechnol Bioeng. 2009;102:1098–1106.
43. Mazarevica G, Diewok J, Baena JR, Rosenberg E, Lendl B. On-line fermentation monitoring by mid-infrared spectroscopy. Appl Spectrosc. 2004;58:804–810.
44. Sivakesava S, Irudayaraj J, Demirci A. Monitoring a bioprocess for ethanol production using FT-MIR and FT-Raman spectroscopy. J Ind Microbiol Biotechnol. 2001;26:185–190.
45. Roychoudhury P, Harvey LM, McNeil B. At-line monitoring of ammonium, glucose, methyl oleate and biomass in a complex antibiotic fermentation process using attenuated total reflectance-mid-infrared (ATR-MIR) spectroscopy. Anal Chim Acta. 2006;561:218–224.
46. Lennox B, Kipling K, Glassey J, Montague G, Willis M, Hiden H. Automated production support for the bioprocess industry. Biotechnol Prog. 2002;18:269–275.
47. Zhang H, Lennox B. Integrated condition monitoring and control of fed-batch fermentation processes. J Process Control. 2004;14:41–50.
48. Jørgensen P, Pederson JG, Jensen EP, Esbensen KH. On-line batch fermentation process monitoring (NIR): introducing ‘biological process time’. J Chemom. 2004;18:81–91.
49. Ferreira AP, Lopes JA, Menezes JC. Study of the application of multiway multivariate techniques to model data from an industrial fermentation process. Anal Chim Acta. 2007;595:120–127.
50. Clark S, Barnett NW, Adams M, Cook IB, Dyson GA, Johnston G. Monitoring a commercial fermentation with proton nuclear magnetic resonance spectroscopy with the aid of chemometrics. Anal Chim Acta. 2006;563:338–345.
51. Gordon SH, Green RV, Wheeler BC, James C. Multivariate FTIR analysis of substrates for protein, polysaccharide, lipid and microbe content: potential for solid state fermentations. Biotechnol Adv. 1993;11:665–675.
52. Rosa SM, Soria MA, Vélez CG, Galvagno MA. Improvement of a two-stage fermentation process for docosahexaenoic acid production by Aurantiochytrium limacinum SR21 applying statistical experimental designs and data analysis. Bioresour Technol. 2010;101:2367–2374.
53. McGovern AC, Ernill R, Kara BV, Kell DB, Goodacre R. Rapid analysis of the expression of heterologous proteins in Escherichia coli using pyrolysis mass spectrometry and Fourier transform infrared spectroscopy with chemometrics: application to α2-interferon production. J Biotechnol. 2009;72:157–167.
54. Rathore AS, Parr L, Dermawan S, Lawson K, Lu Y. Large scale demonstration of a process analytical technology application in bioprocessing: use of on-line high performance liquid chromatography for making real time pooling decisions for process chromatography. Biotechnol Prog. 2010;26:448–457.
55. Rathore AS, Wood R, Sharma A, Dermawan S. Case study and application of process analytical technology (PAT) towards bioprocessing. II. Use of ultra-performance liquid chromatography (UPLC) for making real-time pooling decisions for process chromatography. Biotechnol Bioeng. 2008;101:1366–1374.
56. Rathore AS, Yu M, Yeboah S, Sharma A. Case study and application of process analytical technology (PAT) towards bioprocessing: use of online high performance liquid chromatography (HPLC) for making real time pooling decisions for process chromatography. Biotechnol Bioeng. 2008;100:306–316.
57. Montes RE, Hanrahan G, Gomez FA. Use of chemometric methodology in optimizing conditions for competitive binding partial filling ACE. Electrophoresis. 2008;29:3325–3332.
58. Hanrahan G, Montes RE, Pao A, Johnson A, Gomez FA. Implementation of chemometric methodology in ACE: predictive

investigation of protein–ligand binding. Electrophoresis. 2007;28:2853–2860.
59. Wan Y, Lu J, Cui Z. Separation of lysozyme from chicken egg white using ultrafiltration. Sep Purif Technol. 2006;48:133–142.
60. Heigl N, Huck CW, Rainer M, Najam-ul-Haq M, Bonn GK. Near infrared spectroscopy, cluster and multivariate analysis hyphenated to thin layer chromatography for the analysis of amino acids. Amino Acids. 2006;31:45–53.
61. Piraino P, Parente E, McSweeney PLH. Processing of chromatographic data for chemometric analysis of peptide profiles from cheese extracts: a novel approach. J Agric Food Chem. 2004;52:6904–6911.
62. Fahrner RL, Blank GS. Real-time control of antibody loading during protein-A affinity chromatography using an on-line assay. J Chromatogr A. 1999;849:191–196.
63. Yeung KSY, Hoare M, Thornhill NF, Williams T, Vaghjiani JD. Near infrared spectroscopy for bioprocess monitoring and control. Biotechnol Bioeng. 1999;63:684–693.
64. Bolzacchini E, Consonni V, Lucini R, Orlandi M, Rindone B. HP-SEC behaviour of substituted benzoylpoly-L-lysines by principal component analysis and molecular dynamics simulation. J Chromatogr A. 1998;813:255–265.
65. Kaliszan R. Chemometric analysis of biochromatographic data: implications for molecular pharmacology. Chemom Intell Lab Syst. 1994;24:89–97.
66. Thomassen YE, van Sprang ENM, van der Pol LA, Bakker WAM. Multivariate data analysis on historical IPV production data for better process understanding and future improvements. Biotechnol Bioeng. 2010;107:96–104.
67. Elshereef R, Budman H, Moresoli C, Legge RL. Monitoring the fractionation of a whey protein isolate during dead-end membrane filtration using fluorescence and chemometric methods. Biotechnol Prog. 2010;26:168–178.

Manuscript received Sept. 29, 2010, and revision received Dec. 1, 2010.
