
Recursive Calculation of the Standard Deviation with Increased Accuracy

H. R. Biesel
Hewlett-Packard GmbH, Ohmstraße 6, D-7500 Karlsruhe

Summary

When calculating the standard deviation of a series of measured data, the analyst often underestimates the influence of rounding errors, which can lead to wrong results. This paper presents a procedure that avoids this disadvantage and is especially suitable for use in minicomputers.

1. The Constraints in Conventional Procedures for Calculating the Standard Deviation

A very powerful and common method for examining and improving the accuracy of measured data is to compute mean values and standard deviations. Let

$$x_i, \quad i = 1, 2, \ldots, n \tag{1}$$

be the results of a series of n measurements. Their mean value will be

$$\bar{x}_n = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{2}$$

and the standard deviation is

$$s_n = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (\bar{x}_n - x_i)^2} \tag{3}$$

When using equation (3), we obviously have to make two passes: first we compute the mean value according to (2), then we have to cumulate the sum of the squared deviations $(\bar{x}_n - x_i)^2$. Consequently all values $x_i$ must be available for this calculation; this constraint may turn out to be a serious problem if electronic calculating machines or minicomputers with limited storage capacity are used. Therefore the following equation, which is equivalent to (3), is often preferred:

$$s_n = \sqrt{\frac{1}{n-1} \left( \sum_{i=1}^{n} x_i^2 - n \bar{x}_n^2 \right)} \tag{4}$$

When applied in conjunction with (2), this delivers both mean value and standard deviation and requires only two cumulations. Assume that two variables u, v have been initialized to zero. Whenever a new value $x_i$ has been obtained, we cumulate $x_i$ and $x_i^2$:

$$u + x_i \Rightarrow u \tag{5a}$$

$$v + x_i^2 \Rightarrow v \tag{5b}$$

When all $x_i$ values are summed, we calculate according to (2) and (4):

$$\bar{x}_n = \frac{u}{n} \tag{6a}$$

$$s_n = \sqrt{\frac{1}{n-1} \left( v - \frac{u^2}{n} \right)} \tag{6b}$$

Note that each value $x_i$ may be cleared in memory as soon as it has been processed by equations (5a, b).

Unfortunately, equation (4) contains the difference of two nearly equal magnitudes. Modern minicomputers usually have a precision of about 6 decimal digits in the floating point mode. While such an accuracy is adequate for most analytical applications, it may lead to extremely erroneous results when computing the standard deviation with equation (4). This fact, which is widely underestimated, will be demonstrated by a simple example:

  x_i      x_i^2
  1.998    3.992004
  2.000    4.000000
  2.002    4.008004

$$u = \sum_{i=1}^{3} x_i = 6, \qquad v = \sum_{i=1}^{3} x_i^2 = 12.000008$$

Obviously, equations (2) and (4) result in

$$\bar{x}_3 = \frac{6}{3} = 2$$

$$s_3 = \sqrt{\frac{1}{2} \left( 12.000008 - \frac{1}{3} \cdot 36 \right)} = \sqrt{\frac{1}{2} \cdot 0.000008} = 0.002$$

If the precision is limited to 6 decimal digits, equation (4) produces a wrong result:

$$s_3 = \sqrt{\frac{1}{2} \left( 12.0000 - \frac{1}{3} \cdot 36 \right)} = 0$$

In practical analytical applications the error may be less obvious, but it will always exist. On the other hand, equation (3) leads to the correct value in spite of the limited numerical precision, but this advantage must be paid for by a more complicated data handling procedure. The following section will show how the advantages of both methods can be combined.
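The loss of significance in equation (4) can be reproduced with a short sketch. The Python code below is not part of the original procedure: it assumes that a six-digit floating point machine can be imitated with the standard `decimal` module set to 6 significant digits, and the names `sum_of_squares_std`, `two_pass_std` and `DATA` are chosen here for illustration only.

```python
from decimal import Decimal, localcontext


def sum_of_squares_std(values, digits=6):
    """Mean and standard deviation from the cumulated sums u and v, equations (5a, b) and (4)."""
    with localcontext() as ctx:
        ctx.prec = digits              # assumption: each operation rounds to `digits` significant digits
        u = Decimal(0)
        v = Decimal(0)
        for x in values:
            u = u + x                  # (5a)
            v = v + x * x              # (5b)
        n = len(values)
        mean = u / n
        # Difference of two nearly equal magnitudes: this is where digits are lost.
        return mean, ((v - u * u / n) / (n - 1)).sqrt()


def two_pass_std(values, digits=6):
    """Mean and standard deviation by the two-pass method, equations (2) and (3)."""
    with localcontext() as ctx:
        ctx.prec = digits
        n = len(values)
        mean = sum(values, Decimal(0)) / n       # first pass
        squares = Decimal(0)
        for x in values:                         # second pass needs all values again
            d = mean - x
            squares = squares + d * d
        return mean, (squares / (n - 1)).sqrt()


# The three values of the example in section 1.
DATA = [Decimal("1.998"), Decimal("2.000"), Decimal("2.002")]

print("equation (4), 6 digits: ", sum_of_squares_std(DATA))
print("equation (3), 6 digits: ", two_pass_std(DATA))
print("equation (4), 12 digits:", sum_of_squares_std(DATA, digits=12))
```

With 6 digits the sum-of-squares form returns a standard deviation that is numerically zero, while the two-pass form returns 0.002; with 12 digits both agree. The sketch only illustrates the rounding behaviour and makes no claim about the arithmetic of any particular minicomputer.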

Chromatographia, Vol. 10, No. 4, April 1977 Originals 173


2. Recursive Calculation of the Standard Deviation Without Loss of Accuracy due to Rounding Errors

Assume that n - 1 results $x_i$ have already been processed, so we have according to (2, 4):

$$\bar{x}_{n-1} = \frac{1}{n-1} \sum_{i=1}^{n-1} x_i \tag{7a}$$

$$s_{n-1} = \sqrt{\frac{1}{n-2} \left( \sum_{i=1}^{n-1} x_i^2 - \frac{1}{n-1} \Big( \sum_{i=1}^{n-1} x_i \Big)^2 \right)} \tag{7b}$$

For convenience, we write

$$S_{n-1} = \sum_{i=1}^{n-1} x_i^2 - \frac{1}{n-1} \Big( \sum_{i=1}^{n-1} x_i \Big)^2 \tag{7c}$$

Now the value $x_n$ may become available as a result of the n-th analysis. We will calculate the updated mean value and standard deviation:

$$\bar{x}_n = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{8a}$$

$$s_n = \sqrt{\frac{1}{n-1} \left( \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \Big( \sum_{i=1}^{n} x_i \Big)^2 \right)} \tag{8b}$$

Again, we will use the abbreviation

$$S_n = \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \Big( \sum_{i=1}^{n} x_i \Big)^2 \tag{8c}$$

Let us see how the new values in (8a, b, c) can be derived from the former ones as contained in (7a, b, c). The new mean value can easily be calculated recursively:

$$\bar{x}_n = \frac{1}{n} \left( (n-1) \, \bar{x}_{n-1} + x_n \right) \tag{9a}$$

In order to determine the increment in the standard deviation, we write using (8c) and (7c):

$$S_n - S_{n-1} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \Big( \sum_{i=1}^{n} x_i \Big)^2 - \sum_{i=1}^{n-1} x_i^2 + \frac{1}{n-1} \Big( \sum_{i=1}^{n-1} x_i \Big)^2$$

With respect to (8a), we substitute

$$\sum_{i=1}^{n} x_i = n \bar{x}_n,$$

$$\sum_{i=1}^{n-1} x_i = \sum_{i=1}^{n} x_i - x_n = n \bar{x}_n - x_n,$$

and further

$$\sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n-1} x_i^2 = x_n^2.$$

So we obtain

$$S_n - S_{n-1} = x_n^2 - \frac{1}{n} (n \bar{x}_n)^2 + \frac{1}{n-1} (n \bar{x}_n - x_n)^2$$

$$= x_n^2 - n \bar{x}_n^2 + \frac{1}{n-1} \left( n^2 \bar{x}_n^2 - 2 n \bar{x}_n x_n + x_n^2 \right)$$

$$= \frac{1}{n-1} \left( (n-1) x_n^2 - n (n-1) \bar{x}_n^2 + n^2 \bar{x}_n^2 - 2 n \bar{x}_n x_n + x_n^2 \right)$$

$$= \frac{1}{n-1} \left( n x_n^2 + n \bar{x}_n^2 - 2 n \bar{x}_n x_n \right)$$

$$= \frac{n}{n-1} (\bar{x}_n - x_n)^2$$

Thus we have the recursive formula

$$S_n = S_{n-1} + \frac{n}{n-1} (\bar{x}_n - x_n)^2 \tag{9b}$$

and therefrom the standard deviation

$$s_n = \sqrt{\frac{S_n}{n-1}} \tag{9c}$$

Note that the procedure described here does not require all the values $x_i$ to be stored permanently in the memory, because the calculation is based upon a recursive process. Nevertheless, all terms in (9b) are of the same order of magnitude, and the influence of rounding errors is eliminated. The usual precision of six decimal digits is sufficient in practical applications, as long as floating point data format is provided.

An application of formulas (9a, b, c) may demonstrate their efficiency. We will handle the same example as in section 1. Obviously the recursive processing requires some variables to be substituted by appropriate starting values. Therefore let

$$\bar{x}_1 = x_1 \tag{10a}$$

$$S_1 = 0 \tag{10b}$$

  i    $x_i$    $\bar{x}_i$    $\frac{i}{i-1} (\bar{x}_i - x_i)^2$    $S_i$       $s_i$
  1    1.998    1.998          n.a.                                   0           n.a.
  2    2.000    1.999          0.000002                               0.000002    0.001414
  3    2.002    2.000          0.000006                               0.000008    0.002000

As indicated, the mean values and standard deviations result in the correct values.

3. Practical applications

The recursive calculation of the standard deviation as presented in this paper is especially suitable for repetitive chromatographic analyses, though not restricted to that purpose. Its full usefulness will become manifest in minicomputer-based laboratory data systems, where the computer acquires and stores the results of the analyses and, after they are completed, initiates appropriate routines that evaluate and administer the resulting data. The standard software package can be expanded so that statistical calculations are automatically performed over a series of runs. However, such an activity is reserved for experienced software specialists. This restriction does not apply where the laboratory data system provides for background programming capability, as does the HP3352B system with LAB BASIC. LAB BASIC is an easy but powerful programming language which enables any user to write his programs according to his individual requirements. The discussion in this paper takes into account the capabilities and limitations of LAB BASIC; however, there may be a wider field of other applications. Rather than discuss the details of a special program, we will therefore use the flow chart (Fig. 1) as a general guide to calculate mean values and standard deviations.

This flow chart assumes that K data values will be submitted to the statistical processing after each run, i.e. their mean values and standard deviations are to be computed. The BASIC program expects those values to be found in a data array named X(I), I = 1 to K. This array must be reserved by a DIM-statement, as well as two auxiliary arrays M(I) and S(I) that will contain the cumulated data for mean values and standard deviations. This routine will continuously cumulate the results of each run, whereby N indicates the number of processed runs. The user must tell the program (e.g. by setting the switch register, by special counters or by dialogue) if he wishes to start a new series of runs; the program will then clear N to zero, and will reset the data arrays during execution. A similar procedure may be used to obtain a printed report. The user can easily add those details that are important to him. The author has used the procedures described above in several analytical systems where they have proved superior to the conventional methods as mentioned in section 1.

Fig. 1. General flowchart (box contents in order of execution; the decision labels are only partly legible in the source):

  1. START SYSTEM; PREPARE METHOD.
  2. PERFORM ANALYSIS; after "REPORT", BASIC is called.
  3. GET RESULTS X[I], I = 1 TO K.
  4. Decision (YES/NO): when a new series of runs is started, N is cleared to zero.
  5. N = N + 1.
  6. PROCESS MEAN VALUE: M[I] = ((N-1)*M[I] + X[I])/N.
  7. Decision (YES/NO): for the first run of a series, INITIALIZE: S[I] = 0; otherwise PROCESS STANDARD DEVIATION: S[I] = S[I] + N*(M[I]-X[I])^2/(N-1).
  8. Decision (YES/NO): when a report is wanted, PRINT REPORT, I = 1 TO K: MEAN VALUE: M[I]; STANDARD DEVIATION: s = SQR(S[I]/(N-1)).
  9. EXECUTE NEXT ANALYSIS (continue with step 2).
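The routine outlined in Fig. 1 can be sketched in a few lines. The following Python code is an illustration only and not the author's LAB BASIC program: the class name `RunStatistics`, its method names, and the use of the `decimal` module to imitate six-digit arithmetic are assumptions made here. The arrays `m` and `s` play the role of M(I) and S(I), and `n` is the run counter N.

```python
from decimal import Decimal, localcontext


class RunStatistics:
    """Cumulates mean values and squared-deviation sums for K results per run."""

    def __init__(self, k, digits=6):
        self.k = k
        self.digits = digits           # assumed working precision in significant digits
        self.reset()

    def reset(self):
        """Start a new series of runs: clear N and the auxiliary arrays."""
        self.n = 0
        self.m = [Decimal(0)] * self.k
        self.s = [Decimal(0)] * self.k

    def add_run(self, x):
        """Process the K results X(I) of one run."""
        if len(x) != self.k:
            raise ValueError("expected %d values per run" % self.k)
        with localcontext() as ctx:
            ctx.prec = self.digits
            self.n += 1
            n = self.n
            for i, xi in enumerate(x):
                self.m[i] = ((n - 1) * self.m[i] + xi) / n        # (9a)
                if n == 1:
                    self.s[i] = Decimal(0)                        # (10b)
                else:
                    d = self.m[i] - xi
                    self.s[i] = self.s[i] + n * d * d / (n - 1)   # (9b)

    def report(self):
        """Return (mean, standard deviation) per result; the deviation is None while N < 2."""
        with localcontext() as ctx:
            ctx.prec = self.digits
            return [
                (m, (s / (self.n - 1)).sqrt() if self.n > 1 else None)   # (9c)
                for m, s in zip(self.m, self.s)
            ]


# The example of section 1, processed as three runs with K = 1.
stats = RunStatistics(k=1)
for value in ("1.998", "2.000", "2.002"):
    stats.add_run([Decimal(value)])
mean, deviation = stats.report()[0]
print("mean:", mean, "standard deviation:", deviation)
```

With six significant digits this reproduces the mean of 2 and the standard deviation of 0.002 from the table in section 2, without keeping the individual values in memory. Calling `reset()` corresponds to starting a new series of runs.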

Received: Aug. 4, 1976


Accepted: Dec. 15, 1976

