
Analytica Chimica Acta 552 (2005) 25–35

A procedure to assess linearity by ordinary least squares method


Scheilla V.C. de Souza, Roberto G. Junqueira ∗
Universidade Federal de Minas Gerais (UFMG), Faculdade de Farmácia, Departamento de Alimentos,
Av. Antônio Carlos 6627, Pampulha, Zip Code 31.210-010 Belo Horizonte, Minas Gerais, Brazil

Received 27 April 2005; received in revised form 5 July 2005; accepted 19 July 2005
Available online 24 August 2005

Abstract

A detailed procedure for testing linearity of calibration curves in method validation by the ordinary least squares method (OLSM),
including experimental design, estimation of the parameters, outlier treatment and evaluation of the assumptions, was proposed. The theoretical
background was discussed and the assumptions considered were: (i) normality, homoscedasticity and independency of residuals and (ii)
adjustment to the correct model. The procedure involved precise statistical techniques that were easily carried out by any commercial
spreadsheet software. The suitability of the procedure for assessing linearity was demonstrated by the application in food analysis.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Linearity; Ordinary least squares method; Chemometry; Method validation

1. Introduction

Reliable analytical methods are required for compliance with national and international regulations in all areas of analyses. It is internationally recognised that a laboratory must take appropriate measures to ensure that it is capable of providing data of the required quality [1]. Such measures include establishing traceability of the measurements, use of validated methods of analyses, use of defined internal quality control procedures, participation in proficiency testing schemes and becoming accredited to an international standard, normally ISO/IEC 17025 [2].

Calibration is a procedure that determines the systematic difference that may exist between a measurement system and a reference system represented by the reference materials and their accepted values [3]. Considering that the majority of the analytical methods use linear relationships in one way or another, examination of a calibration function for linearity is an important performance figure in validating an analytical method, as well as an everyday task in routine analytical operations.

There are several definitions concerning linearity in the literature. It “is the ability to elicit test results that are directly or by means of well-defined mathematical transformations, proportional to the concentration of analyte in samples within a given range” [4]. “Quantitation requires that one knows how the response measured depends on the analyte concentration” [5]. “Defines the ability of the method to obtain test results proportional to the concentration” [6]. “Linear data are data, where the relationship between analyte concentrations and test results can be fitted (in the least squares sense) as well by a straight line as by any other function” [7].

Different guidelines, protocols and papers provide recommendations for method validation, including the linearity assessment [1,3–12]. Most of them describe the ordinary least squares method (OLSM) as the statistical method to be used. However, the recommendations are sometimes flawed or controversial and do not detail the experimental designs, the statistical calculations and the respective assumptions that need to be checked to ensure that the principles of the statistical tests are not seriously affected.

The improper recommendation to establish linearity that is most frequently written into protocols and papers is the use of the correlation coefficient r or the determination coefficient R². There are several misconceptions about these statistics.

∗ Corresponding author. Tel.: +55 31 34996913; fax: +55 31 34996988.
E-mail address: junkeira@netuno.lcc.ufmg.br (R.G. Junqueira).

0003-2670/$ – see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.aca.2005.07.043
Even if y and x are related in a non-linear fashion, R² will often be large. A large R² does not necessarily imply that the regression model will provide accurate predictions of future observations [13]. Certainly, it is true that, if the calibration points are tightly clustered around a straight line, the experimental value of R² will be close to one. But a value of R² close to one is not necessarily the outcome of a linear relationship. It could, for example, result from points clustered around a slight curve [14]. Mulholland and Hibbert [9] showed that a very large percentage of errors at the lower end of the concentration range can coexist with an acceptable correlation and are grossly underestimated by confidence limits of the slope and intercept. Thompson et al. [1] discussed non-linearity and the lack-of-fit test by examining the residuals or by analysis of variance and stated that the correlation coefficient is inadequate in the context of linearity testing and should be avoided.

Fitting a calibration function by OLSM requires several assumptions related to the residuals (normality, homoscedasticity and independency) and to the model. The indiscriminate use of OLSM is a consequence of the omission of the assumptions tests and frequently is an important source of errors in analytical chemistry. Mulholland and Hibbert [9] discussed the OLSM limitations that result from heteroscedastic residuals and recommended the use of y-residual plots to provide distinctive visualisation of non-linearity and heteroscedasticity. They also suggested outlier tests, such as the Cook distance, to detect and remove values with large errors. Danzer and Currie [12], Eurachem [8], ISO 11095 [3] and Thompson et al. [1] also considered heteroscedasticity, suggesting that the calibration data are best treated by the weighted least squares method. ISO 11095 [3] and Thompson et al. [1] discussed the need to check non-linearity by the test for lack-of-fit with either a simple or a weighted regression. Danzer and Currie [12] discussed normality and the orthogonal least squares method in cases with error in both variables. ISO 11095 [3] mentioned normality and independency but did not discuss these assumptions. Mark [7] suggested a procedure for testing linearity of analytical methods for pharmaceutical analyses by regression analysis. Based on the theory that any function can be approximated by a polynomial, this author evaluated non-linearity by testing how many terms should be included in a fitting function.

Considering the need for a complete system to assess linearity by OLSM applied to method validation in analytical chemistry, this paper presents a procedure based on consistent acceptable statistics. This procedure basically includes three steps: (i) the experimental design; (ii) the OLSM including estimation of the parameters and outlier treatment; (iii) the verification of the assumptions related to the OLSM model and residuals. The ability of the procedure proposed here to test linearity was demonstrated by applying it to real data obtained from in-house validation of an analytical method to detect residues of pesticides in food.

2. Proposed procedure

2.1. Experimental design

The first step of the procedure suggested for linearity assessment is the experimental design that follows: (i) determination of the range of interest, considering that the expected sample concentration should lie near the centre of the range; (ii) preparation of calibration solutions (solvent or matrix matched, depending on the results of the matrix effects studies), in at least six concentration levels (evenly spaced, each level in three independent replicates) and a zero level (prepared as a control tool to adjust the instrumental zero); (iii) measurement of the response of the calibration solutions in a random order.

2.2. OLSM

The second step in linearity assessment is the OLSM including estimation of the parameters and outlier treatment: (i) acquisition of the experimental data; (ii) visual inspection of the x−y plot; (iii) estimation of the slope, intercept, residual, the respective variances and R²; (iv) visual inspection of the residual plot; (v) investigation and deletion of outliers by the Jackknife standardised residuals test. The schematic presentation of this step is shown in Fig. 1.

Fig. 1. Assessing linearity: OLSM calculation and outlier treatment.

The linear first-order model for OLSM is described in Eq. (A.1). For a given Xi, a corresponding observation Yi consists of the value α + βXi plus an amount εi, the increment by which any individual Yi may lie off the regression line. In such circumstances, the variable Yi is subject to error, while the magnitude of the error in variable Xi is considered negligible. This model is often applicable in calibration curves; however, when the uncertainty on the reference value is considerable, OLSM should not be used. Different approaches to fit a linear function to data with error on both variables have been detailed [12,15,16].

2.2.1. Estimation of the parameters

The regression parameters α and β are estimated by the least squares estimators a and b, respectively, namely the quantities that minimize the residual sum of squares $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where ŷi is the predicted dependent variable given by the estimated regression Eq. (A.2). The estimates of the slope b and the intercept a of the fitted straight line are given in Eqs. (A.3) and (A.4), respectively. For each value xi, at which a measured signal yi is available, the residual ei is given by Eq. (A.5). The variances of the slope s²b, intercept s²a and residual s²res are calculated by Eqs. (A.6)–(A.8), respectively. R² is defined by Eq. (A.9).

The values of the slope, intercept and the respective variances are used to construct the equation that will be used to predict the sample concentrations, with an associated uncertainty. Despite the known limitations of R², this statistic is evaluated as the proportion of total variation about the mean of measurements explained by the regression [17].

2.2.2. Outlier treatment

The OLSM has the inconvenience of being very sensitive to the presence of outliers and/or high-leverage points [18]. If only a few calibration points are available, a plot of the residuals would reveal a trend, if any is present. Initially, the residuals are plotted against each respective concentration level. Two horizontal dotted lines corresponding to ±t(0.975, n−2)·sres are used to indicate the accepted variation of each single point in the residual plot, cases with a dashed line being perceived as a trend. The characteristic shape of the residual plot is also compared with the shapes of standard sets of residual plots to identify deficiencies in the original model [19]. This qualitative evaluation is confirmed by the formal tests for outliers and homoscedasticity that follow.

The outliers are assessed by the Jackknife (externally studentised) residuals test, with Jei being described by Eq. (A.10). This test uses an estimate of the standard deviation independent of the point. Thus, the residuals are easily computed for every point without having to fit the n separate regressions, each excluding one point [20,21]. The Jackknife residuals are distributed as the t distribution on n−p−1 degrees of freedom (p being the number of model parameters). Jei values higher than the critical t value are considered outliers [20] and dropped unless the overall fraction of data dropped exceeds 2/9 [22]. For each drop of data, the OLSM should be performed again.

2.3. OLSM assumptions

The third step of the procedure recommended in this paper is illustrated by Fig. 2. The validation of the use of the OLSM through the corresponding assumptions is achieved using the following tests for the residuals assumptions: (i) normality (Ryan–Joiner test); (ii) homoscedasticity (Brown–Forsythe test); (iii) independency (Durbin–Watson test). For the model assumption, the lack-of-fit test (ANOVA) was used. If the data are not normally distributed or not homoscedastic, a transformed variable, such as the square root, logarithm or reciprocal, should be tested [23]. If the transformed variable is normally distributed and homoscedastic, it can be used in the analyses.

2.3.1. Test of normality

Normality of residuals is verified by the Ryan–Joiner test. The residuals are ordered and plotted against the corresponding percentage points from the standard normal distribution (normal quantiles). Denoting the ordered residual observations in a sample of size n by ei, a normal probability plot, also known as normal quantile–quantile (QQ) plot, is produced by plotting the ei against ci, where ci is the pi-th percentage point of the standard normal distribution. The normal quantiles are obtained by Eq. (A.11).

If the data come from a normal distribution, they will fall on an approximately straight line, whereas if they come from some alternative distribution, the plot will exhibit some degree of curvature. If the data fall nearly on a straight line, the correlation coefficient will be near unity, whereas if the plot is curved, the correlation coefficient will be smaller. If it falls below an appropriate critical value, doubt will be cast on the null hypothesis of normality. Approximate critical correlation coefficients for the probability plot for different significance levels are given by Eqs. (A.12)–(A.14) [24].
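The correlation underlying the normal probability plot can be sketched in a few lines. This is a minimal illustration, not the authors' spreadsheet implementation: Eq. (A.11) is not reproduced in this section, so the plotting positions p_i = (i − 3/8)/(n + 1/4) used below are an assumption, and the residuals are hypothetical values chosen only to exercise the function.

```python
from math import sqrt
from statistics import NormalDist, mean


def normal_scores(n):
    """Standard-normal quantiles c_i for the ordered positions i = 1..n.

    Assumption: plotting positions p_i = (i - 3/8) / (n + 1/4); the paper's
    Eq. (A.11) is not shown here and may use a different convention.
    """
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def ryan_joiner_r(residuals):
    """Correlation coefficient between ordered residuals e_i and normal scores c_i."""
    e = sorted(residuals)
    c = normal_scores(len(e))
    e_bar, c_bar = mean(e), mean(c)
    s_ec = sum((ei - e_bar) * (ci - c_bar) for ei, ci in zip(e, c))
    s_ee = sum((ei - e_bar) ** 2 for ei in e)
    s_cc = sum((ci - c_bar) ** 2 for ci in c)
    return s_ec / sqrt(s_ee * s_cc)


# Hypothetical residuals, for illustration only (not data from the paper).
example = [-1.8, 0.4, 1.1, -0.6, 0.2, -0.9, 1.5, 0.1, -0.3, 0.7]
r = ryan_joiner_r(example)
print(f"R = {r:.4f}")
```

The value of R is then compared with the critical correlation coefficient for the chosen significance level; a value below the critical one casts doubt on normality.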

Fig. 2. Assessing linearity: tests of assumptions. * The application of OLSM will be inappropriate for cases with non-normality after the transformation.
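The assumption tests outlined in Fig. 2 operate on the residuals that remain after the fitting and outlier screening of Section 2.2. A minimal sketch of that preceding step is given here. Eq. (A.10) is not reproduced in this section, so the standard closed form of the externally studentised residual is assumed; the data, the two-sided comparison and the critical value are illustrative assumptions, and the helper names are hypothetical.

```python
from math import sqrt
from statistics import mean


def ols_fit(x, y):
    """Least squares intercept a, slope b, residuals e_i and residual variance."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2_res = sum(ei ** 2 for ei in e) / (n - 2)
    return a, b, e, s2_res


def leverages(x):
    """Leverage h_i of each point in a straight-line fit."""
    n = len(x)
    x_bar = mean(x)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]


def jackknife_residuals(x, y, p=2):
    """Externally studentised residuals from a single fit (no n refits needed)."""
    n = len(x)
    _, _, e, s2_res = ols_fit(x, y)
    out = []
    for ei, hi in zip(e, leverages(x)):
        r = ei / sqrt(s2_res * (1 - hi))  # internally studentised residual
        out.append(r * sqrt((n - p - 1) / (n - p - r ** 2)))
    return out


def max_droppable(n):
    """Largest number of points whose removal keeps the dropped fraction within 2/9."""
    return (2 * n) // 9


# Hypothetical calibration data with one suspect point (not from the paper).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 15.0, 14.1, 15.9]
T_CRIT = 2.571  # t(0.975) for n - p - 1 = 5 d.f., from a standard table; level assumed
flagged = [i for i, j in enumerate(jackknife_residuals(x, y)) if abs(j) > T_CRIT]
print("suspect indices:", flagged, "| may drop at most", max_droppable(len(x)))
```

After each point is dropped the fit should be repeated, as the procedure requires. For n = 18 the helper `max_droppable` returns 4, the limit quoted in Section 4.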

2.3.2. Test of homoscedasticity

Linear regression by the OLSM assumes that each data point in the range has a constant absolute variation (homoscedasticity). However, many analytical methods produce data that are heteroscedastic because the errors have a constant relative value [25]. The Levene test [26] modified by Brown and Forsythe [27] is used to evaluate homoscedasticity. The residuals are split into two groups of size n1 and n2, with respect to the levels of xi. The residual medians ẽ1 and ẽ2 of the two groups are calculated. The absolute values of the differences between residuals and their group medians are obtained as di1 = |ẽ1 − ei1| and di2 = |ẽ2 − ei2|. The mean d̄j and the sum of squared deviations SSDj of the dij values for each group j = 1 and 2 are calculated. The Levene tL statistic is assessed by Eq. (A.15).

When the test statistic does not exceed the critical value of t(1 − α/2) for n1 + n2 − 2 degrees of freedom, there is no reason to reject the null hypothesis of constant residual variance [26,27].

The weighted least squares method, a particular case of the generalised least squares method, is usually recommended for heteroscedastic data [28].

2.3.3. Test of independency

A serial correlation of the residuals is called autocorrelation. Autocorrelation affects the variance of the least
Table 1
ANOVA table for OLSM
Source              d.f.   SS             MS                             F
Due to regression   1      SSRegr         SSRegr/d.f.Regr                MSRegr/s²res
Residual            n−2    SSres          SSres/d.f.res = s²res
Lack-of-fit         u−2    SSLack         SSLack/d.f.Lack                MSLack/MSPure error
Pure error          n−u    SSPure error   SSPure error/d.f.Pure error
Total               n−1    SSTotal        SSTotal/d.f.Total

Sums of squares:
SSRegr = $\left[\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\right]^2 \Big/ \sum_{i=1}^{n}(x_i-\bar{x})^2$
SSres = $\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$
SSLack = $\sum_{k=1}^{u} n_k(\hat{y}_k-\bar{y}_k)^2$
SSPure error = $\sum_{k=1}^{u}\sum_{j=1}^{n_k}(y_{kj}-\bar{y}_k)^2$
SSTotal = $\sum_{i=1}^{n}(y_i-\bar{y})^2$

d.f., degrees of freedom; SS, sum of squares; MS, mean square; F, variance ratio; u, number of concentration levels; nk, number of j-points in the k-concentration level; yi, measured signal; xi, known concentration; ŷi, predicted dependent variable; n, number of i-calibration points; ȳ, mean of the measured signals; x̄, mean of known concentrations.

squares estimates and may lead to an underestimation of σ² and the confidence intervals. In hypothesis testing, it could lead to erroneous inference, indicating false significance of the regressors [17]. Assuming that the residuals ei are independent variables, all serial correlations are ρs = 0. The null hypothesis that all ρs = 0 is tested via the Durbin–Watson test [29] against the alternative that ρs = ρ^s (ρ ≠ 0 and |ρ| < 1). The Durbin–Watson statistic d is defined by Eq. (A.16). For each data set, there are two limits for d (dL is the lower limit, dU is the upper limit). If d lies between these limits, the test is inconclusive. However, d < dL indicates autocorrelation and rejects the null hypothesis at the 2α significance level; d > dU indicates no autocorrelation and does not reject the null hypothesis. The Durbin–Watson value varies from 0 to 4 with a mean of 2. If the calculated value converges to 2, it means that there is no autocorrelation. When it moves away from 2 toward either 0 or 4, the autocorrelation increases. A Durbin–Watson value of 0 indicates a perfect positive autocorrelation and 4 indicates a perfect negative autocorrelation. Values of 1.5 and 2.5 can be used as lower and upper cut-offs in many cases [29]. Tables giving limit values of d dependent on the α significance level, the number of residual points and the number of predictor variables are available. To avoid the use of tables, the limit values of d can be estimated, for different significance levels and one predictor variable, as proposed in this paper by Eqs. (A.17)–(A.22). The independency can be graphically demonstrated by plotting each ei value against the immediately preceding value ei−1, independence being indicated by a random pattern of the regression residuals.

In the case of autocorrelated data, OLSM is not appropriate and the use of generalised least squares methods is recommended [28].

2.3.4. Lack-of-fit and regression significance

For some analytical techniques, the linear model cannot be applied and non-linear or polynomial models are better adapted. Many analytical methods are known, where the relationship between the raw measured data and the analyte concentrations is non-linear [17]. To perform the lack-of-fit test, the total variability of the responses is decomposed into the sum of squares due to regression and the residual (about regression) sum of squares. The residual sum of squares is separated into lack-of-fit (deviation from linearity) and pure error (from repeated points) sums of squares. The ANOVA table can be constructed from the equations shown in Table 1; however, the use of all the equations is not necessary. The residual sum of squares can be obtained by the difference between the total sum of squares and the sum of squares due to regression. Similarly, the lack-of-fit sum of squares can be calculated by subtracting the pure error sum of squares from the residual sum of squares.

A significant lack-of-fit indicates that the model appears to be inadequate. Attempts would be made to discover where and how the inadequacy occurs. A non-significant lack-of-fit indicates that there appears to be no reason to doubt the adequacy of the model and both the pure error and lack-of-fit mean squares can be used as estimates of σ² [17].

3. Experimental

To illustrate the suitability of the procedure described to assess linearity by OLSM, it was applied to the data from an in-house validation of a method for detection of organophosphorus compounds in fruits and vegetables by gas chromatography using a nitrogen phosphorus detector (GC/NPD). The organophosphorus compounds studied were chlorfenvinphos and parathion methyl.

3.1. Chemicals and standard solutions

Stock solutions of 204 and 202 µg ml⁻¹, respectively, of chlorfenvinphos and parathion methyl were prepared by dilution of chlorfenvinphos and parathion methyl (from Dr.
Ehrenstorfer GmbH, Bgm-Schlosser-Str. 6A, D-86199 Augsburg, Germany) in HPLC/GC grade ethyl acetate from EM Science (Gibbstown, NJ, USA). The intermediate solutions of 2.04 and 2.02 µg ml⁻¹, respectively, were prepared by dilutions of stock solutions in ethyl acetate. Calibration solutions at 10.2, 30.6, 51.0, 71.4, 91.8 and 112.2 ng ml⁻¹ for chlorfenvinphos and at 10.1, 30.3, 50.5, 70.7, 90.9 and 111.1 ng ml⁻¹ for parathion methyl were prepared by dilution of respective intermediate solutions with ethyl acetate, in three independent replicates and run in random order. The levels studied corresponded to concentrations of chlorfenvinphos and parathion methyl in food in the range from 0.007 to 0.075 mg kg⁻¹ and from 0.007 to 0.074 mg kg⁻¹, respectively.

3.2. Equipment and working conditions

The experiments were performed with a Varian CP3800 gas chromatograph equipped with a Varian CP8200 autosampler and a nitrogen phosphorus detector. The data were processed using a Varian Star Chromatography Workstation, Version 5.31 (Walnut Creek, CA, USA).

A DB-5 capillary column (length 30 m, i.d. 0.25 mm and particle size 0.25 µm) from J&W Scientific (Folsom, CA, USA) was used. The He carrier gas head pressure was 4.5 kg cm⁻² at a 2 ml min⁻¹ flow rate. The calibration solutions were analysed by GC/NPD under the following conditions: injection volume, 1 µl; injector temperature, 180 °C; detector temperature, 300 °C; bead current, 2950 A; total run time, 25 min. The mean retention time of chlorfenvinphos was 12.45 and that of parathion methyl was 10.36. The temperature program was 80, 200, 230, 250 and 280 °C at a rate of 0, 30, 15, 5 and 30 °C min⁻¹ for the total run time 1, 8, 16, 22 and 25 min, respectively.

4. Results and discussion

The calibration experiment requires the establishment of a preliminary working range that depends on the practice-related objective of the calibration. The procedure proposed is based on at least six evenly spaced concentration levels, each level in three independent replicates. The most frequently expected sample concentration is near the centre of the range of interest. An experimental design with three levels (two extremes and a central one), with a larger number of replicates in the lower and upper levels, has already been proposed [17]. However, references related to method validation suggested at least five or six concentration levels [1,4,6,10,11], equally spaced across the concentration range, at least in triplicate [1]. Replicates of each calibration point give information about the inherent variability of the response measurements (pure error). The calibration must be carried out with genuine replicates, measured in a random order to avoid the problem of confusing non-linearity with temporal effects, such as calibration drift [1]. If the replicates are just repetitions of the same reading or obtained by successive dilutions, the residual variance s²res will tend to underestimate the variance σ² and the lack-of-fit test will tend to wrongly detect non-existent lack-of-fit [17].

Another important aspect of the proposed approach is the y = a + bx model used rather than the single-parameter model, y = bx. The practice of constraining the least squares curve by forcing the curve through the origin is a controversial subject. Some authors consider that constraints are appropriate because the fitting curve in each case is known to pass through the theoretical fixed points [30,31]. However, while constraints might be justifiable from a purely scientific point of view, the practical consequence is to enlarge s²res [19,32]. The omission of a parameter from a model is a very strong assumption that is usually unjustified [17]. Constraining the least squares curve by forcing the curve through the origin based on blank correction is also not reliable, since the blank value is not exactly known, but only estimated. Although the zero level is used to adjust the instrumental zero, the inclusion of this point in regression analysis is only justified when it is definitely known that the linear model holds true. Moreover, it is well known that the points near the ends of the range will be subject to greater oscillations than those near the centre [13,17] and that lack-of-fit is common at the low end of an estimated calibration function. The use of an unreliable zero point in the regression analysis affects the estimation of the parameters, distorts the residual distribution and enlarges s²res.

Visual examination of the residual plots indicated possible outliers and revealed no other obvious deficiency. The points that were outside the accepted variation ±t(0.975, n−2)·sres were confirmed as outliers by the Jackknife residual test. The residual plots and outliers removed for each curve are shown in Fig. 3. Three outliers were detected in the chlorfenvinphos curve at the 71.4, 91.8 and 112.2 ng ml⁻¹ levels and two outliers in the parathion methyl curve at 50.5 and 70.7 ng ml⁻¹. Considering that this experiment was carried out with n = 18, the 2/9 limit to terminate the dropping of data [22] was four outliers. After discarding the third outlier in the chlorfenvinphos curve and the second outlier in the parathion methyl curve, no more outliers were detected and the use of the rule to terminate the test was not necessary. This step permitted the deletion of some points that could have an influence on the fitted regression equation.

The assumption that the residuals are normally distributed was confirmed. The QQ plots and the respective Ryan–Joiner correlation coefficients are illustrated in Fig. 4. The correlation coefficients were 0.9907 and 0.9791 for the chlorfenvinphos and parathion methyl curves, with critical values of 0.9506 and 0.9529, respectively, indicating no significant deviation from normality at α = 0.10. The validity of parametric statistical inference procedures in finite samples depends crucially on the underlying distributional assumptions. Consequently, there has been extensive focus on whether hypothesised distributions are compatible with the data. Non-normality, for example, is an impediment to t-test
and F-test inference procedures [33]. There are numerous examples of normality tests, including both graphical and statistical tests [34–36]. The Ryan–Joiner test [24] used to check the normal distribution of the regression residuals is easy to calculate and no special tables are required for its computation. It is conceptually simple in that it combines two fundamental concepts: the normal probability plot and the correlation coefficient. Numerically, this test is very similar to the Filliben test, whose power for small samples (n = 20) has been demonstrated [37] and it is also essentially equivalent to the powerful Shapiro–Wilk test [24].
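The critical values quoted for the normality test in this section can be checked with a short calculation. Eqs. (A.12)–(A.14) are not reproduced here, so the polynomial in 1/√n below uses coefficients commonly quoted for the Ryan–Joiner test at α = 0.10; treat them as an assumption rather than as the paper's own equation. The sample sizes follow from n = 18 minus the three and two outliers removed.

```python
from math import sqrt


def rj_critical_010(n):
    """Approximate Ryan-Joiner critical correlation at alpha = 0.10.

    Assumed coefficients; the appendix equations are not shown in this section.
    """
    return 1.0071 - 0.1371 / sqrt(n) - 0.3682 / n + 0.7780 / n ** 2


N_TOTAL = 18  # calibration points per curve before outlier removal
curves = {
    # name: (outliers removed, reported Ryan-Joiner R)
    "chlorfenvinphos": (3, 0.9907),
    "parathion methyl": (2, 0.9791),
}

results = {}
for name, (n_outliers, r_reported) in curves.items():
    n = N_TOTAL - n_outliers
    r_crit = rj_critical_010(n)
    # Normality is not rejected when the reported R exceeds the critical value.
    results[name] = (n, round(r_crit, 4), r_reported > r_crit)
    print(f"{name}: n = {n}, R = {r_reported}, R_crit = {r_crit:.4f}")
```

With n = 15 and n = 16 the assumed formula evaluates to about 0.9506 and 0.9529, matching the critical values reported for the two curves, and both reported correlation coefficients lie above them.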
Residuals were statistically independent and the Durbin–Watson statistic was 1.12 (p > 0.02) for the chlorfenvinphos curve and 2.05 (p > 0.10) for the parathion methyl curve, demonstrating that no autocorrelation was observed (Fig. 5). As originally designed, the Durbin–Watson test was

Fig. 3. Residual plots for outlier diagnosis by the Jackknife standardised residuals test: ei is the residual, filled points are outliers and dashed lines are ±t(0.975, n−2)·sres.

Fig. 4. Normal QQ plots of residuals: ei, residual and R, correlation coefficient of Ryan–Joiner test.

Fig. 5. Plots of residuals autocorrelation: ei, residual and d, Durbin–Watson statistic.
applied to study the structure or randomness of the regression residuals, detecting errors in first-order autocorrelation [17,29]. However, this statistic also fulfills an important role as a general test of model misspecification, being important for finding the optimal number of latent variables for the model [38]. It is very sensitive for verifying non-linearity, but it has a low statistical power to detect this condition [7,39]. On the other hand, the Durbin–Watson test is robust for detecting autocorrelation caused by situations where the regression residuals can be described by a first-order autoregressive process [40].

Homoscedasticity was assessed by the modification proposed by Brown and Forsythe for the Levene test [26,27]. Table 2 shows that the residual variability across all concentration levels was not significantly different (p > 0.05), indicating homoscedasticity. There are a number of possible tests for the homogeneity of variance assumption. The Cochran C-test that is used to treat outliers in collaborative studies [41] was reported to verify non-constant variances of residuals in order to select the most appropriate calibration curve fitting [42]. The Hartley Fmax and Bartlett tests were also proposed to evaluate the residual homoscedasticity [12], but they are very sensitive to the normal distribution assumption [43]. The Cochran C-test assumes that the number of replicates within the groups is the same or similar and that there are sufficient numbers of replicates to get a reasonable estimate of the group variances. The Bartlett test allows unequal replication, while the Hartley Fmax test requires equal replication among groups and does not consider all the points of the calibration curve, only those groups with s²max and s²min [12]. On the other hand, the Levene test [26] generalised to unequal sample sizes takes into account all the points of the calibration function. Working with the median in place of the mean and the modulus of the deviations to minimize possible problems caused by correlations between deviations in the same group of small samples, this test is robust even under conditions of non-normality [27].

Table 2
Residual homoscedasticity evaluation by modified Levene test
Statistic   Chlorfenvinphos                 Parathion methyl
            Group 1        Group 2          Group 1        Group 2
nj          9              6                8              8
ẽj          −3.07 × 10¹    4.48 × 10¹       4.23 × 10¹     −3.33 × 10¹
d̄j          5.52 × 10¹     7.52 × 10¹       1.01 × 10²     1.30 × 10²
SSDj        2.40 × 10⁴     1.89 × 10⁴       4.38 × 10⁴     3.29 × 10⁴
s²p         3.30 × 10³                      5.48 × 10³
tL          −6.62 × 10⁻¹                    −7.87 × 10⁻¹
t0.975      2.16                            2.15
p           p > 0.05                        p > 0.05
nj, number of observations per group; ẽj, median; d̄j = Σ|ẽj − eij|/nj, mean of differences between each residual and median; SSDj, sum of squared deviations; s²p, pooled variance; tL, Levene t statistic; t0.975, critical t value; p, significance.

The results obtained for the tests of normality, homoscedasticity and independency of residuals indicated that the use of OLSM was appropriate. Also, a high significance (p < 0.001) of the regression was observed, while the lack-of-fit was not significant (p > 0.05) for the chlorfenvinphos and parathion methyl curves (Table 3). Thus, the relationship between the areas of the measured signals and the organophosphorus concentrations was linear. The x−y plots and the respective OLSM statistics are presented in Fig. 6. These plots demonstrated linearity in the ranges from 10.2 to 112.2 ng ml⁻¹ for chlorfenvinphos and from 10.1 to 111.1 ng ml⁻¹ for parathion methyl.

The diagnosis and the drop of outliers were shown to be a critical step in the application of this procedure. The Jackknife residual was used as a formal test for the outlier treatment. Numerous tests for detection of influential observations in OLSM have been proposed in the literature [20]. Considering that each diagnostic test should be used to detect a specific phenomenon in the data, the selection of these tests must be careful. The Cook distance [9,44], for instance, is a common measurement available in statistical packages that combines residuals and leverages in a single measure of influence. However, this statistic was unable to diagnose the outliers detected

Table 3
ANOVA statistics for regression including lack-of-fit test

Source                 d.f.   SS            MS            F             p
Chlorfenvinphos
  Due to regression    1      8.07 × 10⁶    8.07 × 10⁶    1.13 × 10³    5.00 × 10⁻¹⁴
  Residual             13     9.27 × 10⁴    7.13 × 10³
  Lack-of-fit          4      4.46 × 10⁴    1.11 × 10⁴    2.08          1.66 × 10⁻¹
  Pure error           9      4.82 × 10⁴    5.35 × 10³
  Total                14     8.16 × 10⁶
Parathion methyl
  Due to regression    1      7.99 × 10⁷    7.99 × 10⁷    3.67 × 10³    2.39 × 10⁻¹⁸
  Residual             14     3.05 × 10⁵    2.18 × 10⁴
  Lack-of-fit          4      1.68 × 10⁵    4.19 × 10⁴    3.06          6.89 × 10⁻²
  Pure error           10     1.37 × 10⁵    1.37 × 10⁴
  Total                15     8.02 × 10⁷

d.f., degrees of freedom; SS, sum of squares; MS, mean square; F, variance ratio; p, significance.
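
The lack-of-fit rows of Table 3 partition the residual sum of squares into a pure-error part, obtained from the replicates at each concentration level, and a lack-of-fit part. The following Python sketch illustrates that partition under stated assumptions: the function name `lack_of_fit` and the small data set are hypothetical and are not the authors' data, a straight-line model with two parameters is assumed, and the p-value is left out because it needs an F distribution routine (for example `scipy.stats.f.sf`, if SciPy is available).

```python
from collections import defaultdict


def lack_of_fit(x, y):
    """Split the OLS residual sum of squares into lack-of-fit and pure error.

    x: concentration of each calibration point (replicates share the same x)
    y: measured response of each calibration point
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope estimate
    a = y_bar - b * x_bar      # intercept estimate

    # Residual sum of squares of the straight-line fit
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

    # Pure error: scatter of the replicates around their own level mean
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    ss_pe = 0.0
    for ys in groups.values():
        level_mean = sum(ys) / len(ys)
        ss_pe += sum((yi - level_mean) ** 2 for yi in ys)

    k = len(groups)            # number of distinct concentration levels
    df_lof = k - 2             # levels minus the two model parameters
    df_pe = n - k              # points minus levels
    ss_lof = ss_res - ss_pe
    f_ratio = (ss_lof / df_lof) / (ss_pe / df_pe)
    return {"SS_lof": ss_lof, "SS_pe": ss_pe,
            "df_lof": df_lof, "df_pe": df_pe, "F": f_ratio}


# Hypothetical example: three levels measured in duplicate
levels = [1, 1, 2, 2, 3, 3]
signals = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
result = lack_of_fit(levels, signals)
print(result)
```

The degrees of freedom follow the same pattern as Table 3: for chlorfenvinphos, 15 points at 6 levels give 4 and 9 degrees of freedom for lack-of-fit and pure error. The tabulated mean squares give 1.11 × 10⁴ / 5.35 × 10³ ≈ 2.07, which agrees with the reported F of 2.08 to within rounding.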

by the Jackknife residual test, since all the values of the Cook distance were smaller than 1, the reference value for this diagnostic test [13]. Indeed, the Cook distance expresses the influence of a specific point on the estimated parameters. When a point does not significantly affect the model parameters, the value of this measurement is low, but the point could heavily affect the residual variance [18]. If the assumption tests were carried out without the drop of the outliers, some problems would be observed in the application of OLSM. The adjustment to the model was clearly affected by these points, since heteroscedasticity (p < 0.05) and a significant lack-of-fit (p < 0.05) were observed for the chlorfenvinphos and parathion methyl curves, respectively, when they were included.

Fig. 6. Chlorfenvinphos and parathion methyl calibration curves and respective OLSM statistics.

The understanding of these findings should be based on the properties of the F lack-of-fit test. If a calibration line has a significant curvature, the null hypothesis of linearity will be rejected and attempts must be made to find a more appropriate model [13]. An obvious alternative would be a polynomial fitting, but the question of how complex a model would need to be is difficult and fundamental. On the other hand, if the null hypothesis is not rejected, it does not mean that the linear model is correct, only that the model is not contradicted by the data [17] or that insufficient data exist to detect the inadequacies of the model [19]. In addition, there are causes of lack-of-fit other than non-linearity that can arise in calibration curves [1], so the lack-of-fit test must be used in conjunction with the inspection of the residual plot and the formal tests of the residual assumptions.

The tests recommended by this procedure were chosen considering the theoretical background and ease of application. As demonstrated, it is fundamental that the assumptions be checked by appropriate tests before making any inference based on the OLSM application, since the assessment of linearity will affect the reliability of all the other parameters in method validation.

5. Conclusions

The experimental design proposed here was easy for an in-house validation application. All the calculations were performed using a commercial spreadsheet and no specific or expensive statistical package was necessary. Precise statistical techniques were applied to test the assumptions related to residuals and model. The application of this procedure indicated its suitability for linearity assessment by OLSM in method validation, considering theoretical and empirical aspects.

Acknowledgements

The authors would like to acknowledge the Instituto Mineiro de Agropecuária (IMA) for the experimental data and the financial support from the Brazilian agencies Capes and CNPq.

Appendix A

A.1. Estimation of the parameters

$$Y_i = \alpha + \beta X_i + \varepsilon_i \qquad \text{(A.1)}$$

where $Y_i$ is the dependent or response variable, $X_i$ the independent or predictor variable, $\alpha$ the intercept, $\beta$ the slope and $\varepsilon_i$ is the residual.

$$\hat{y}_i = a + b x_i \qquad \text{(A.2)}$$

where $\hat{y}_i$ is the predicted dependent variable, $x_i$ the known concentration, $a$ the estimate of intercept and $b$ is the estimate of slope.

$$b = \frac{S_{xy}}{S_{xx}} \qquad \text{(A.3)}$$

where

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n},$$

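$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n},$$

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n},$$

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n},$$

where $y_i$ is the measured signal at $x_i$ and $n$ is the number of calibration points.

$$a = \bar{y} - b\bar{x} \qquad \text{(A.4)}$$

$$e_i = y_i - \hat{y}_i \qquad \text{(A.5)}$$

where $e_i$ is the residual.

$$s_b^2 = \frac{s_{res}^2}{S_{xx}} \qquad \text{(A.6)}$$

where $s_b^2$ is the slope variance and $s_{res}^2$ is the residual variance.

$$s_a^2 = s_{res}^2 \frac{\sum_{i=1}^{n} x_i^2}{n S_{xx}} \qquad \text{(A.7)}$$

where $s_a^2$ is the intercept variance.

$$s_{res}^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2} \qquad \text{(A.8)}$$

$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad \text{(A.9)}$$

where $R^2$ is the determination coefficient.

A.2. Outlier treatment

$$J_{e_i} = r_i \sqrt{\frac{n - p - 1}{n - p - r_i^2}} \qquad \text{(A.10)}$$

where $p$ is the number of model parameters, $r_i = e_i / s_{e_i}$ the standardised residual, $s_{e_i} = s_{res}\sqrt{1 - h_i}$ the residual standard error and $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}$ is the leverage.

A.3. Test of normality

$$c_i = \phi^{-1}\left(\frac{i - 3/8}{n + 1/4}\right), \quad i = 1, \ldots, n \qquad \text{(A.11)}$$

where $c_i$ is the percentage point of the standard normal distribution and $\phi^{-1}$ is the inverse of the standard normal distribution function.

$$R_{crit}(n) \approx 1.007 - \frac{0.1371}{\sqrt{n}} - \frac{0.3682}{n} + \frac{0.7780}{n^2} \quad \text{for } \alpha = 0.10 \qquad \text{(A.12)}$$

$$R_{crit}(n) \approx 1.0063 - \frac{0.1288}{\sqrt{n}} - \frac{0.6118}{n} + \frac{1.3505}{n^2} \quad \text{for } \alpha = 0.05 \qquad \text{(A.13)}$$

$$R_{crit}(n) \approx 0.9963 - \frac{0.0211}{\sqrt{n}} - \frac{1.4106}{n} + \frac{3.1791}{n^2} \quad \text{for } \alpha = 0.01 \qquad \text{(A.14)}$$

A.4. Test of homoscedasticity

$$t_L = \frac{\bar{d}_1 - \bar{d}_2}{\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) s_p^2}} \qquad \text{(A.15)}$$

where $t_L$ is the Levene $t$ statistic, $\bar{d}_j = \sum_i |\tilde{e}_j - e_{ij}| / n_j$ the mean of the absolute differences between each residual and the median for each group $j = 1$ and $j = 2$, $n_j$ the number of observations for each group, $\tilde{e}_j$ the median of each group, $s_p^2 = \frac{SSD_1 + SSD_2}{n_1 + n_2 - 2}$ the pooled variance, $SSD_j$ the sum of squared deviations of $d_{ij}$ for each group and $d_{ij} = |\tilde{e}_j - e_{ij}|$ is the modulus of the difference between each residual and the group median.

A.5. Test of independency

$$d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} \qquad \text{(A.16)}$$

where $d$ is the Durbin–Watson statistic.

$$d_L \approx 1.9693 - \frac{2.8607}{\sqrt{n}} - \frac{3.4148}{n} + \frac{16.6400}{n^2} \quad \text{for } \alpha = 0.05 \qquad \text{(A.17)}$$

$$d_U \approx 1.9832 - \frac{3.0547}{\sqrt{n}} + \frac{1.3862}{n} + \frac{16.3662}{n^2} \quad \text{for } \alpha = 0.05 \qquad \text{(A.18)}$$

$$d_L \approx 1.9845 - \frac{3.6875}{\sqrt{n}} - \frac{2.6136}{n} + \frac{20.6393}{n^2} \quad \text{for } \alpha = 0.025 \qquad \text{(A.19)}$$

$$d_U \approx 1.9480 - \frac{3.1647}{\sqrt{n}} - \frac{0.6472}{n} + \frac{31.5772}{n^2} \quad \text{for } \alpha = 0.025 \qquad \text{(A.20)}$$

$$d_L \approx 1.9934 - \frac{4.5929}{\sqrt{n}} - \frac{1.3228}{n} + \frac{20.2288}{n^2} \quad \text{for } \alpha = 0.01 \qquad \text{(A.21)}$$

$$d_U \approx 1.9784 - \frac{4.2974}{\sqrt{n}} + \frac{1.0812}{n} + \frac{29.6862}{n^2} \quad \text{for } \alpha = 0.01 \qquad \text{(A.22)}$$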
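
As an illustration of how the Appendix formulas fit together, the Python sketch below computes the OLS estimates (Eqs. (A.3)–(A.8)), the Jackknife residuals (Eq. (A.10)) and the Durbin–Watson statistic with its approximate critical limits at α = 0.05 (Eqs. (A.16)–(A.18)). It is a minimal sketch under stated assumptions, not the authors' spreadsheet: the function names (`fit_ols`, `jackknife_residuals`, `durbin_watson`, `dw_limits_alpha05`) and the example data are hypothetical, a straight-line model with p = 2 parameters is assumed, and the residuals are assumed to be already sorted in the order in which autocorrelation is to be examined.

```python
import math


def fit_ols(x, y):
    """Straight-line OLS estimates following Eqs. (A.3)-(A.8)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                                        # slope, Eq. (A.3)
    a = y_bar - b * x_bar                                # intercept, Eq. (A.4)
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]      # residuals, Eq. (A.5)
    s2_res = sum(ei ** 2 for ei in e) / (n - 2)          # Eq. (A.8)
    return {"a": a, "b": b, "e": e, "s2_res": s2_res,
            "x_bar": x_bar, "sxx": sxx}


def jackknife_residuals(x, fit, p=2):
    """Jackknife residuals, Eq. (A.10); needs a non-zero residual variance."""
    n = len(x)
    out = []
    for xi, ei in zip(x, fit["e"]):
        h = 1.0 / n + (xi - fit["x_bar"]) ** 2 / fit["sxx"]   # leverage
        r = ei / math.sqrt(fit["s2_res"] * (1.0 - h))         # standardised residual
        out.append(r * math.sqrt((n - p - 1) / (n - p - r ** 2)))
    return out


def durbin_watson(e):
    """Durbin-Watson statistic, Eq. (A.16), for residuals in their given order."""
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(ei ** 2 for ei in e)
    return num / den


def dw_limits_alpha05(n):
    """Approximate lower and upper critical limits, Eqs. (A.17) and (A.18)."""
    root_n = math.sqrt(n)
    d_l = 1.9693 - 2.8607 / root_n - 3.4148 / n + 16.6400 / n ** 2
    d_u = 1.9832 - 3.0547 / root_n + 1.3862 / n + 16.3662 / n ** 2
    return d_l, d_u


# Hypothetical calibration data, for illustration only
conc = [1, 2, 3, 4, 5]
signal = [1.1, 1.9, 3.2, 3.8, 5.0]
fit = fit_ols(conc, signal)
print("slope:", fit["b"], "intercept:", fit["a"])
print("Jackknife residuals:", jackknife_residuals(conc, fit))
print("Durbin-Watson d:", durbin_watson(fit["e"]))
print("d_L, d_U for n = 15:", dw_limits_alpha05(15))
```

The Jackknife residual of Eq. (A.10) equals the residual divided by a standard error estimated without the point in question, so it can be obtained from a single fit rather than from n separate refits.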

In Eqs. (A.17)–(A.22), $d_L$ is the lower critical limit and $d_U$ is the upper critical limit.

References

[1] M. Thompson, S.L.R. Ellison, R. Wood, Pure Appl. Chem. 74 (2002) 835–855.
[2] International Standard Organization (ISO), ISO/IEC 17025, General requirements for the competence of testing and calibration laboratories, 1999, 26 pp.
[3] International Standard Organization (ISO), ISO 11095, Linear calibration using reference materials, 1996, 29 pp.
[4] L. Huber, LC/GC Int. (February 1998) 96–105.
[5] P. Bruce, P. Minkkinen, M.L. Riekkola, Mikrochim. Acta 128 (1998) 93–106.
[6] AOAC, Peer Verified Methods Program, Manual on Policies and Procedures, AOAC International, Arlington, 1998, 35 pp.
[7] H. Mark, J. Pharm. Biomed. Anal. 33 (2003) 7–20.
[8] Eurachem, The fitness for purpose of analytical methods, A Laboratory Guide to Method Validation and Related Topics, 1998, 61 pp.
[9] M. Mulholland, D.B. Hibbert, J. Chromatogr. A 762 (1997) 73–82.
[10] ICH, International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use, Validation of Analytical Procedures: Methodology, Geneva, 1997, pp. 27463–27467.
[11] European Commission (EC), Commission Decision of 12 August 2002 Implementing Council Directive 96/23/EC Concerning Performance of Analytical Methods and the Interpretation of Results, Offic. J. Eur. Communities, 2002, L 221/8.
[12] K. Danzer, L.A. Kurrie, Pure Appl. Chem. 70 (1998) 993–1014.
[13] D.C. Montgomery, G.C. Runger, Applied Statistics and Probability for Engineers, John Wiley & Sons Inc., NY, 1994, pp. 471–624.
[14] J.C. Miller, J.N. Miller, Statistics for Analytical Chemistry, Ellis Horwood Limited, London, 1993, pp. 101–141.
[15] B.D. Ripley, M. Thompson, Analyst 112 (1987) 337–383.
[16] J. Riu, F.X. Rius, J. Chemometry 9 (1995) 343–362.
[17] N.R. Draper, H. Smith, Applied Regression Analysis, Wiley, NY, 1998, p. 706.
[18] M. Meloun, J. Militký, Anal. Chim. Acta 439 (2001) 169–191.
[19] P.C. Meyer, R.E. Zund, Statistical Methods in Analytical Chemistry, John Wiley & Sons, NY, 1993, pp. 81–134.
[20] D.A. Belsley, E. Kuh, R.E. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, Wiley, NY, 1980, p. 292.
[21] S. Weisberg, Applied Linear Regression, Wiley, NY, 1985, p. 324.
[22] W. Horwitz, Pure Appl. Chem. 67 (1995) 331–343.
[23] G.E.P. Box, D.R. Cox, J. R. Stat. Soc. 26 (1964) 211–243.
[24] T.A. Ryan, B.L. Joiner, Normal Probability Plots and Tests for Normality, The Pennsylvania State University, State College, 1976, p. 15.
[25] G.E.P. Box, Biometrika 40 (1953) 318–335.
[26] H. Levene, Contributions to Probability and Statistics, Stanford University Press, Stanford, 1960, pp. 278–292.
[27] M.B. Brown, A.B. Forsythe, J. Am. Stat. Assoc. 69 (1974) 364–367.
[28] J. Johnston, Econometric Methods, McGraw-Hill Book Company, NY, 1984, pp. 287–342.
[29] J. Durbin, G.S. Watson, Biometrika 38 (1951) 159–178.
[30] F.C. Strong, Anal. Chem. 51 (1979) 288–299.
[31] J.J. Leary, E.B. Messick, Anal. Chem. 57 (1985) 956–957.
[32] R.R. Ellerton, Anal. Chem. 52 (1980) 1151–1152.
[33] J.M. Dufour, A. Farhat, L. Gardiol, L. Khalaf, Econom. J. 1 (1998) 154–173.
[34] K.V. Mardia, Tests of univariate and multivariate normality, in: P.R. Krishnaiah (Ed.), Handbook of Statistics: Analysis of Variance, vol. 1, North Holland, Amsterdam, 1980, pp. 279–320.
[35] R.B. D’Agostino, M.I.A. Stephens, Goodness-of-Fit Techniques, Marcel Dekker, NY, 1986, pp. 367–419.
[36] L. Baringhaus, R. Danschke, N. Henze, Commun. Stat. Simul. Comput. 18 (1989) 363–379.
[37] J.J. Filliben, Technometrics 17 (1975) 111–117.
[38] D.N. Rutledge, A.S. Barros, Anal. Chim. Acta 454 (2002) 277–295.
[39] H. Mark, J. Workman, Spectroscopy 20 (2005) 34–38.
[40] K. Albertson, J. Aylen, K.B. Lim, J. Stat. Comput. Simul. 72 (2002) 507–516.
[41] Association of Official Analytical Chemists (AOAC), Guidelines for Collaborative Study Procedures, AOAC International, Appendix D, 1997, p. 13.
[42] P. Chiap, P. Hubert, B. Boulanger, J. Crommen, Anal. Chim. Acta 391 (1999) 227–238.
[43] G.W. Snedecor, W.G. Cochran, Statistical Methods, Iowa State University, Ames, 1989, pp. 149–176.
[44] R.D. Cook, Technometrics 19 (1977) 15–18.
