
INSTITUTE OF PHYSICS PUBLISHING MEASUREMENT SCIENCE AND TECHNOLOGY

Meas. Sci. Technol. 12 (2001) 1439-1444 PII: S0957-0233(01)21508-2


On the robust utilization of non-parametric tests for evaluation of combined cyclical and monotonic drift
Paolo Cappa¹, Sergio Silvestri¹ and Salvatore Andrea Sciuto²
¹ Department of Mechanics and Aeronautics, University of Rome La Sapienza, Via Eudossiana 18, 00184 Roma, Italy
² Department of Mechanical and Industrial Engineering, University of Rome Roma Tre, Via della Vasca Navale 79, 00146 Roma, Italy
E-mail: paolo.cappa@uniroma1.it, s.silvestri@dma.ing.uniroma1.it and sciuto@uniroma3.it
Received 30 January 2001, in final form 29 May 2001, accepted for publication 11 June 2001
Published 2 August 2001
Online at stacks.iop.org/MST/12/1439
Abstract
Non-parametric statistical tests are usually adopted to recognize drift phenomena in data series acquired by automatic measuring systems. Sometimes different tests provide different responses with regard to the presence of drift in the same data set. In particular, the application of two commonly used statistical tests (i.e. the Wald-Wolfowitz test, also known as the run test, and the Mann-Whitney test, also known as the reverse arrangement test) to data acquired during long-term trials showed that they behaved differently with regard to shift detection. Thus, a deeper examination of these statistical tests in real measurements appears of interest; more precisely, their effectiveness is verified (i) when cyclical and monotonic drift are present concurrently and (ii) when various levels of noise are superimposed on the data gathered. Finally, on the basis of the theoretical results obtained, the statistical tests examined here are applied to the same data set in order to validate their capability in on-line drift recognition.
Keywords: cyclical and monotonic drift, drift evaluation, shift detection,
non-parametric tests, run test, reverse arrangement test
Nomenclature
A  amplitude of cyclical component in data set
a, b  linear constants
B  total number of reverse arrangements in the Mann-Whitney reverse arrangement test, obtained by examining the whole data set
$B_i$  value of the generic reverse arrangement in the Mann-Whitney reverse arrangement test
E  end value of the linear component in the data set, where $E = a + bT$
$h_{ij}$  reverse arrangement obtained by examining $x_i$ and $x_j$
$\mu_B$  mean of arrangements in the Mann-Whitney reverse arrangement test
$\mu_r$  mean of runs in the Wald-Wolfowitz run test
n  number of cycles present in data set
N  total number of observations in the data set
$N_1$  number of (+) observations in the Wald-Wolfowitz run test
$N_2$  number of (−) observations in the Wald-Wolfowitz run test
r  random variable of the run distribution
R  the ratio A/E
RAT  Mann-Whitney reverse arrangement test
RT  Wald-Wolfowitz run test
$\sigma_B^2$  variance of arrangements in the Mann-Whitney reverse arrangement test
$\sigma_r^2$  variance of runs in the Wald-Wolfowitz run test
$\sigma_y$  standard deviation associated with y, where $y = a + bt + A\sin[2\pi(n/T)t]$
t  time
T  total duration of data acquisition
x  random variable
$x_m$  median value of a random variable
0957-0233/01/091439+06$30.00 2001 IOP Publishing Ltd Printed in the UK 1439
P Cappa et al
1. Introduction
As is well known, zero drift is a relevant problem in long-term
experiments when the stability of the measurement system
cannot be checked periodically. In fact, both monotonic and
cyclical variations in environmental parameters, for example
due to fluctuations in temperature, humidity, etc, cause
self-generated system drift that can affect the validity of data.
These unwanted variations in the signal can be clearly detected
only when their magnitude exceeds the noise floor of the system.
An on-line indication of noticeable instrumentation drift would
therefore be useful for an effective diagnosis of the metrological
properties of the system.
Statistical analysis is an effective means for on-line
analysis of time series in order to recognize drift phenomena
affecting the data collected [1, 2]. In particular, non-parametric
tests, i.e. the Wald-Wolfowitz run test (RT) [3-6] and the
Mann-Whitney reverse arrangement test (RAT) [3-5], are used
to process data in order to highlight the presence of a
systematic trend in the results observed as a function of time.
These statistical procedures are called distribution-free or
non-parametric procedures because they do not hypothesize a
specific distribution function for the original random variable
of interest (the acquired data set) [7-11]. However, the
efficiency of statistical analysis strongly depends on the spread
of the acquired data; consequently, the experimentalist must be
sure of the reliability and effectiveness of non-parametric tests
before using them.
In a previous paper [12], the effectiveness of the RT and
RAT was evaluated for the case of monotonic drift, and graphs
were drawn showing the reliability and unreliability zones of
these tests as a function of the slope of the linear trend and the
spread of the acquired data. Given a preliminary calculation of
the standard deviation of the data, those graphs allow the
minimum noticeable linear drift to be evaluated.
A further analysis of the case of combined periodic and
monotonic drift appeared worthwhile, in order to obtain a deeper
estimate of the reliability of the tests examined. The aim of
the present work, therefore, is to determine reference graphs
for the reliability of both statistical methods in detecting
drift phenomena, as a function of the frequency of fluctuation,
the linear drift and the amplitude of noise.
The results obtained will finally be applied to the data
from a long-term test of a novel electrical strain gauge
conditioning unit based on a direct resistance method (DRM)
scheme [13]. The conditioning unit was tested continuously
over a long period in a field situation, with the test area
temperature ranging from 6 to 60 °C; the test lasted for six
months, and data were stored and processed after being
acquired with an analogue-to-digital converter.
2. Description of the non-parametric tests
As mentioned previously, statistical procedures which do
not assume a specific distribution function for the original
random variable of interest are called distribution-free or
non-parametric procedures. One of the best-known
distribution-free procedures used for the evaluation of data is
the χ² goodness-of-fit test, but the RT and RAT are also widely
used to process experimental data in order to detect instability
of the system. Every statistical test gives its response of
acceptance or rejection of the starting (null) hypothesis at a
certain level of confidence or, equivalently, of significance α
(the α level). A confidence level of 95%, i.e. α = 5%, is
commonly accepted for experimental data processing; it means
that there is a 5% probability of rejecting the starting
hypothesis when it is actually true. To make the following
considerations clearer and to allow the use of the statistical
tests examined here, their application is briefly described below.

Figure 1. The data sequence used for the Wald-Wolfowitz run test calculation. [Plot of measured value against time (arbitrary units, 0 to 1000) with the median line; three successive runs are marked 1 (+ + +), 2 (− − −) and 3 (+ + +).]
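Both tests described below share the same final decision step, which can be summarized as a two-sided test on a standardized statistic. In this sketch, s is a generic placeholder of ours (not a symbol used in the paper) for the test statistic, i.e. the number of runs r or the number of reverse arrangements B, and μ_s, σ_s are its mean and standard deviation under the no-drift hypothesis; the large-sample Gaussian approximation is assumed:

```latex
z = \frac{s - \mu_s}{\sigma_s}, \qquad
\text{no drift accepted if } |z| \le z_{1-\alpha/2}, \qquad
z_{0.975} \approx 1.96 \quad (\alpha = 0.05)
```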
2.1. The Wald-Wolfowitz run test
Let us consider a sequence of N observed values of a random
variable x such that each observation can be classified into one
of two mutually exclusive categories, which may be identified
simply by a plus (+) or a minus (−). For example, figure 1
depicts, in arbitrary units, a sequence of measured values $x_i$,
i = 1, 2, 3, ..., N, with median value $x_m$; we count a (+) for
each $x_i \ge x_m$ and a (−) for each $x_i < x_m$. A run is
defined as a sequence of identical observations that is preceded
and followed by a different observation (or by the start or end
of the sequence); it is positive if it consists of (+) observations
and negative if it consists of (−) observations. The number of
runs occurring in the whole sequence of observations indicates
whether the data are independent observations of the same
random variable. More specifically, if the N observations of the
same random variable are independent, the probability of a (+)
or a (−) result does not change from one observation to the next,
so the number of runs occurring in the sequence is a random
variable r whose sampling distribution has mean and variance

$$\mu_r = \frac{2N_1N_2}{N} + 1 \qquad (1)$$

$$\sigma_r^2 = \frac{2N_1N_2\,(2N_1N_2 - N)}{N^2\,(N - 1)} \qquad (2)$$

where $N_1$ is the number of (+) observations and $N_2$ the
number of (−) observations, so that $N = N_1 + N_2$. A
normalized Gaussian curve is then obtained by means of
equations (1) and (2). If the observed number of runs r lies
within the confidence interval defined by the α level, the null
hypothesis is accepted: the two categories have the same
distribution, indicating an absence of drift. Limited tabulations
of percentage points for the distribution function of runs are
also available in the literature [10].
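The counting and standardization steps of equations (1) and (2) can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' LabVIEW software: observations equal to the median count as (+), as in the text; the large-sample Gaussian approximation is used with the two-sided bound 1.96 (α = 0.05); and the function name `run_test` is hypothetical.

```python
import math
from statistics import median


def run_test(x, z_crit=1.96):
    """Wald-Wolfowitz run test about the median, following eqs (1)-(2).

    Returns (runs, mu_r, var_r, z, drift_flagged).
    """
    x_m = median(x)
    # (+) for x_i >= x_m, (-) for x_i < x_m, as in the text
    signs = [xi >= x_m for xi in x]
    n1 = sum(signs)              # number of (+) observations
    n2 = len(signs) - n1         # number of (-) observations
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        raise ValueError("all observations fall in one category")
    # a new run starts at every change of sign
    runs = 1 + sum(1 for s, s_next in zip(signs, signs[1:]) if s != s_next)
    mu_r = 2 * n1 * n2 / n + 1                                    # eq. (1)
    var_r = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))   # eq. (2)
    z = (runs - mu_r) / math.sqrt(var_r)
    # two-sided decision: reject "no drift" when |z| exceeds the bound
    return runs, mu_r, var_r, z, abs(z) > z_crit
```

For example, a strictly increasing ramp of 80 samples gives only 2 runs against an expected 41, so the test flags a trend; a rapidly alternating signal (80 runs) is flagged as well, since the test is two-sided.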
Non-parametric tests for evaluation of drift
Figure 2. A flow chart of the software implemented for the determination of test reliability graphs. [Blocks: Gaussian noise, linear trend and cyclical trend summed into the data set; parameters calculation (R, σ_y, no. of cycles); Wald-Wolfowitz test and Mann-Whitney test, each producing a graph; iteration loops on R increase, no. of cycles increase and noise variance increase.]
2.2. The Mann-Whitney reverse arrangement test
Given a sequence of N observed values of a random variable
x, denoted by $x_i$, i = 1, 2, 3, ..., N, each occurrence of
$x_i > x_j$ with i < j is counted as a reverse arrangement.
Therefore, for a set of observations $x_1, x_2, \ldots, x_N$, the
total number of reverse arrangements, denoted by B, is defined as

$$B = \sum_{i=1}^{N-1} B_i \qquad (3)$$

where each term $B_i$ of the sum is defined by

$$B_i = \sum_{j=i+1}^{N} h_{ij} \qquad (4)$$

with

$$h_{ij} = \begin{cases} 1 & \text{if } x_i > x_j \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

for any i < j.
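As a small worked illustration of equations (3) to (5), using a hypothetical sequence of N = 4 observations chosen only for the arithmetic:

```latex
x = (3,\,1,\,4,\,2):\qquad
B_1 = h_{12} + h_{13} + h_{14} = 1 + 0 + 1 = 2,\quad
B_2 = h_{23} + h_{24} = 0 + 0 = 0,\quad
B_3 = h_{34} = 1,
\qquad B = B_1 + B_2 + B_3 = 3 .
```

Here B happens to equal its no-drift mean from equation (6), $\mu_B = 4 \cdot 3/4 = 3$.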
If the N observations are independent observations of the
same population, i.e. no drift is present, then the number of
reverse arrangements is a random variable B, with a mean
value and a variance calculated as follows:

$$\mu_B = \frac{N(N-1)}{4} \qquad (6)$$

$$\sigma_B^2 = \frac{N(2N+5)(N-1)}{72}. \qquad (7)$$
A normalized Gaussian curve is then built with the mean
value and variance calculated according to equations (6) and
(7), respectively. If B lies within the confidence interval
defined by the α level, the null hypothesis is accepted and no
drift is present. In this case too, limited tabulations of
percentage points for the distribution function are available in
the literature [10].
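A direct transcription of equations (3) to (7) is sketched below, again as an illustration under our own assumptions (Gaussian approximation, two-sided bound 1.96) rather than the authors' implementation; the function name `reverse_arrangement_test` is hypothetical. The double loop is O(N²), which is adequate for the window sizes considered in this paper (a few hundred samples).

```python
import math


def reverse_arrangement_test(x, z_crit=1.96):
    """Reverse arrangement test following eqs (3)-(7).

    Returns (B, mu_B, var_B, z, drift_flagged).
    """
    n = len(x)
    # eqs (3)-(5): count the pairs i < j with x_i > x_j
    b = sum(1 for i in range(n - 1) for j in range(i + 1, n) if x[i] > x[j])
    mu_b = n * (n - 1) / 4                      # eq. (6)
    var_b = n * (2 * n + 5) * (n - 1) / 72      # eq. (7)
    z = (b - mu_b) / math.sqrt(var_b)
    # two-sided decision: too few or too many reverse arrangements
    return b, mu_b, var_b, z, abs(z) > z_crit
```

For instance, the sequence (3, 1, 4, 2) gives B = 3, equal to its mean, so no trend is flagged, whereas an increasing ramp of 80 samples gives B = 0 against μ_B = 1580 and is flagged.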
3. Determination of reliability graphs and discussion
In order to determine the reliability of the two tests considered
for the detection of drift, simulation software was specifically
designed and implemented in LabVIEW™. A confidence level
of 95% was set for acceptance of the response of each of the
tests examined; this value is commonly accepted for data
processing in experimental mechanics.

Figure 3. An example of the data set generated by the simulation program with its main parameters. [Data against time, indicating the amplitude A of the cyclical component, the end value E of the linear component, the noise standard deviation σ_y, the number of cycles and the ratio R = A/E.]
With reference to figure 2, the software is composed of two
main modules. The first module generates data by summing a
linear function and a sinusoidal function, whose general
equation is

$$y = a + bt + A\sin\left[2\pi\,\frac{n}{T}\,t\right]. \qquad (8)$$

Then the software, as shown in figure 3, generates Gaussian
noise with a variable standard deviation ($\sigma_y$) that is
superimposed on the data set produced by equation (8). The
following variables are selected as parameters: $\sigma_y$; the
number of cycles n; and, finally, R, the ratio of A (the
amplitude of the sinusoidal component) to E (the end value of
the linear component).
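The data-generation module can be sketched with NumPy as below. The paper does not state the sampling scheme or the values of a and E, so uniform sampling over [0, T), a caller-chosen intercept a and end value E, and A = R·E are our assumptions; the function name `simulate_drift` is hypothetical.

```python
import numpy as np


def simulate_drift(n_samples, n_cycles, ratio_r, sigma_y,
                   a=0.0, end_value=1.0, duration=1.0, seed=None):
    """Noisy data set following eq. (8).

    Assumptions (not stated in the paper): uniform sampling over [0, T),
    intercept a and end value E chosen by the caller, amplitude A = R * E.
    Returns (t, noisy_y).
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) * duration / n_samples
    b = (end_value - a) / duration          # slope such that E = a + b*T
    amplitude = ratio_r * end_value         # R = A / E
    y = a + b * t + amplitude * np.sin(2 * np.pi * (n_cycles / duration) * t)
    noise = rng.normal(0.0, sigma_y, n_samples)   # Gaussian noise, std sigma_y
    return t, y + noise
```

With `sigma_y=0` the function returns the deterministic trend of equation (8) alone, which is convenient for checking the later processing steps.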
The second module performs the RT and RAT on the data
generated by the first one, after a preliminary evaluation of the
main parameters, namely: (i) the coefficients a and b of the
best-fit line found by the least-squares method; (ii) the end value
E of the linear component in the data set; and, by means of
spectral analysis, (iii) the number of cycles n and (iv) their
amplitude A. The software then summarizes graphically the
success or failure of each test as a function of $\sigma_y$ which,
as previously mentioned, is representative of the accuracy of the
system, and of R and n, which represent the instability of the
system.

Figure 4. Zones of reliability and unreliability for the RT as a function of $\sigma_y$, the number of cycles and the R ratio (the amplitude of the sinusoidal component/the end value of the linear component of the deterministic trend). [Three-axis plot of σ_y, R and no. of cycles; limit surfaces for 80, 120 and 240 samples separate the reliability zone from the unreliability zone.]

Figure 5. Zones of reliability and unreliability for the RAT as a function of $\sigma_y$, the number of cycles and the R ratio (the amplitude of the sinusoidal component/the end value of the linear component of the deterministic trend). [Same layout as figure 4, with limit surfaces for 80, 120 and 240 samples.]
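The preliminary parameter evaluation, items (i) to (iv), might be sketched as below. Two choices here are our assumptions, not the authors': the spectral analysis is a plain FFT of the line-detrended record, whose largest non-DC bin k is read as n = k cycles (valid for uniform sampling over exactly T); and, once n is known, a, b and A are refitted jointly by least squares, because a sinusoid with only a few cycles biases a straight-line fit (for a record of unit duration the line-only slope is off by roughly 6A/(πn), about 0.48 for A = 0.5 and n = 2). The function name `estimate_parameters` is hypothetical.

```python
import numpy as np


def estimate_parameters(t, y, duration):
    """Estimate a, b, E, n, A and R from a uniformly sampled record.

    Assumed procedure (a sketch, not necessarily the authors' software):
    1. straight-line least-squares fit, then FFT of the residual to find the
       dominant number of cycles n (FFT bin k = k cycles over the record);
    2. joint least-squares refit of a + b*t + c*sin + d*cos at that n.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)           # highest power first
    resid = y - (intercept + slope * t)
    spectrum = np.abs(np.fft.rfft(resid))
    n_cycles = int(np.argmax(spectrum[1:])) + 1      # skip the DC bin
    w = 2 * np.pi * n_cycles / duration
    design = np.column_stack([np.ones_like(t), t, np.sin(w * t), np.cos(w * t)])
    (a, b, c, d), *_ = np.linalg.lstsq(design, y, rcond=None)
    end_value = a + b * duration                     # E = a + bT
    amplitude = np.hypot(c, d)                       # A
    return {"a": a, "b": b, "E": end_value, "n": n_cycles,
            "A": amplitude, "R": amplitude / end_value}
```

In a noiseless check with a = 0.2, b = 1, A = 0.5, n = 2 and T = 1, the sketch recovers E = 1.2 and R ≈ 0.42.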
The simulation software provided as output the two
diagrams shown in figures 4 and 5, relative to an integer
number of cycles. From an overall analysis of these diagrams
it can be observed that, as expected, an increase in the
amplitude of noise increases the unreliability of the tests when
the drift component is small. The RT is more reliable than the
RAT when the sinusoidal component is predominant (high
values of R); on the other hand, the RAT is more reliable than
the RT when the linear component is predominant (low values
of R). The RT is substantially independent of the number of
cycles present in the data, its reliability being related only to
the amplitude of the drift component, whereas the RAT is
strongly dependent on the number of cycles. This is the main
reason for the common assumption that the RT is more
sensitive than the RAT in detecting periodic variation and, as a
consequence, for the wide adoption of the RT for the detection
of periodic drift. Furthermore, two main zones can be outlined:
(i) a success zone, where the test gives a reliable response; and
(ii) a failure zone, over which the result of the test is definitely
unreliable.
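The paper does not spell out how the reliability boundaries of figures 4 and 5 were computed. One plausible way to reproduce the idea, assumed here rather than taken from the text, is to estimate for each point (σ_y, n, R) the fraction of noisy realizations of equation (8) in which each test flags drift; a point could then be labelled reliable when that fraction exceeds a chosen threshold (for example 0.95, also an assumption). All function names are hypothetical.

```python
import numpy as np


def rt_flags_drift(x, z_crit=1.96):
    """Run test about the median, eqs (1)-(2); True if drift is flagged."""
    signs = x >= np.median(x)
    n1 = int(signs.sum())
    n2 = len(signs) - n1
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        return False  # constant record: no run information
    runs = 1 + int(np.count_nonzero(signs[1:] != signs[:-1]))
    mu = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return bool(abs(runs - mu) / np.sqrt(var) > z_crit)


def rat_flags_drift(x, z_crit=1.96):
    """Reverse arrangement test, eqs (3)-(7); True if drift is flagged."""
    n = len(x)
    # upper triangle (j > i) of the pairwise comparison x_i > x_j
    b = int(np.triu(x[:, None] > x[None, :], k=1).sum())
    mu = n * (n - 1) / 4
    var = n * (2 * n + 5) * (n - 1) / 72
    return bool(abs(b - mu) / np.sqrt(var) > z_crit)


def detection_rates(n_samples, n_cycles, ratio_r, sigma_y, trials=200, seed=0):
    """Fraction of noisy realizations of eq. (8) in which each test flags drift.

    Assumed set-up: a = 0, E = 1, T = 1 and uniform sampling, so A = R.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / n_samples
    clean = t + ratio_r * np.sin(2 * np.pi * n_cycles * t)
    hits_rt = hits_rat = 0
    for _ in range(trials):
        x = clean + rng.normal(0.0, sigma_y, n_samples)
        hits_rt += rt_flags_drift(x)
        hits_rat += rat_flags_drift(x)
    return hits_rt / trials, hits_rat / trials
```

Sweeping `sigma_y`, `n_cycles` and `ratio_r` over a grid for N = 80, 120 and 240 would give maps comparable in spirit to figures 4 and 5; we make no claim that this reproduces the published surfaces.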
Identifying the surfaces that mark the reliability limit of both
tests allows one to evaluate a priori, from the data set, whether
a test gives a reliable answer and, in a sense, whether it is
applicable. Thus, the statistical tests examined can be considered
effective means for the validation of data, because they are able
to provide reliable and prompt indications of the instability of
the system. Once the presence of drift has been recognized, its
level can be evaluated by common methods such as the
least-squares method, i.e. by fitting a best-fit curve.
Figure 6. The apparent strain acquired from the DRM conditioning unit in a long-term zero shift experiment. [Main panel: apparent strain (μm/m) against time (days) over about 180 days; insets: the first 2 days (80 samples), 3 days (120 samples) and 6 days (240 samples).]
4. Experimental verification of the reliability of the non-parametric tests
In order to validate the results of the drift test analysis, the
tests were applied to a data set previously acquired from a DRM
conditioning unit. The conditioning unit was assembled in a
sealed aluminium box to protect the electronics from
environmental agents, such as dust, wind, rain and
electromagnetic noise. The box was placed outdoors to be
exposed to environmental variations, whereas the voltage
supply, the off-the-shelf instrumentation and the personal
computer with A/D converter were placed indoors. The box was
intentionally not thermally insulated, in order to allow severe
temperature variations (from 6 to 60 °C) over a period of six
months and so examine environmental influences on the
stability of the DRM conditioning unit [13].
The apparent strain due only to the fluctuation in temperature
is reported in figure 6 as a function of time. Cyclical thermal
variations become apparent after 2, 3 and 6 days of acquisition,
so it was decided to process the data by calculating the
parameters of equation (8) and performing the RT and RAT on
the corresponding 80, 120 and 240 sample data sets. After
processing the first 80 samples, the RT gave a positive answer,
indicating the presence of a periodic drift, whereas the RAT
gave a negative answer, i.e. denying the presence of a monotonic
drift. However, from the analysis of the data set considered
and the comparison with the graphs determined here, only the
RT can be considered reliable, since the representative point of
the first 80 samples (green point), shown in figure 4, lies in
the zone of reliability. On the other hand, the same point in
figure 5 confirms that the RAT is unreliable for that data set,
so nothing can be stated about the presence of a monotonic
drift, and further analysis with a greater number of samples has
to be conducted. The examination of the data acquired after
3 days provides a positive answer for both tests, whose
reliability is confirmed by the position of the point
representative of the first 120 samples, which lies in the zone
of reliability for both tests; in particular, it can be affirmed
that, after 3 days, a combined cyclic and monotonic drift is
present. The monotonic component of drift can be attributed
to seasonal variation in temperature, as can be observed in the
graph of the whole 180-day data set. It is worth noting that a
long-term drift component could thus be detected early, by
analysing only a few samples, provided that the reliability of
the chosen tests had previously been validated by means of the
proposed graphs.
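For orientation, the acceptance intervals implied by equations (1), (2), (6) and (7) for the three window sizes used here can be worked out explicitly. This sketch assumes a median split with $N_1 = N_2 = N/2$, so that equations (1) and (2) reduce to $\mu_r = N/2 + 1$ and $\sigma_r^2 = N(N-2)/[4(N-1)]$, and uses the two-sided 95% bound 1.96σ (values rounded):

```latex
\begin{array}{ccc}
N & \mu_r \pm 1.96\,\sigma_r & \mu_B \pm 1.96\,\sigma_B \\ \hline
80  & 41 \pm 8.7   & 1580 \pm 236 \\
120 & 61 \pm 10.7  & 3570 \pm 432 \\
240 & 121 \pm 15.2 & 14340 \pm 1218
\end{array}
```

A computed statistic outside the corresponding interval leads the test to flag drift; whether that verdict can be trusted is what the reliability graphs of figures 4 and 5 are meant to decide.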
After six days, an apparent compensation of the monotonic
drift component seems to appear, and the RAT gives a negative
response regarding drift for this data set. However, since the
point representative of the first 240 samples (the red point in
figure 5) lies in the unreliability zone for the RAT, this result
cannot be accepted as reliable. In this case too, as shown by the
point relative to the first 240 samples in figure 4, the RT is able
to recognize the presence of drift.
5. Conclusions
A reliability analysis of the run test and the reverse arrangement
test for the evaluation of combined monotonic and cyclical
drift in an experimental data set was conducted. As a
result, graphs of reliability/unreliability at a 95% confidence
level were drawn for both the non-parametric tests examined.
Experimentalists have thus been provided with a useful method
able (i) to state whether the tests examined can be applied
to an acquired data set and (ii) to evaluate in advance
the minimum noticeable drift when the overall accuracy
of the measuring system and the number of periodic
variations induced by environmental effects are known.
Finally, the methodology proposed herein has been
validated with experimental data: the reliability graphs
determined here explained the discordance between the two
tests when they were applied to the same data set. On the basis
of these considerations, experimentalists will be able to
implement an automatic procedure that chooses in real time
which test is preferable for drift evaluation and, therefore, to
recognize the presence of drift reliably and immediately. Once
the instability of the system has been revealed by the statistical
analysis, the magnitude of drift can then be estimated by
common techniques such as the least-squares method.
References
[1] Hojstrup J 1993 A statistical data screening procedure Meas. Sci. Technol. 4 153-7
[2] Lawunmi D 1997 A theoretical analysis of exponentially decaying time series Meas. Sci. Technol. 8 703-6
[3] Taylor J R 1982 An Introduction to Error Analysis (Mill Valley: University Science Books)
[4] Draper N R and Smith H 1981 Applied Regression Analysis (New York: Wiley)
[5] Brownlee K A 1965 Statistical Theory and Methodology in Science and Engineering (New York: Wiley)
[6] Wald A and Wolfowitz J 1940 On a test whether two samples are from the same population Ann. Math. Statist. 11 147-62
[7] Conover W J 1980 Practical Nonparametric Statistics 2nd edn (New York: Wiley)
[8] Blalock H M 1979 Social Statistics 2nd edn (New York: McGraw-Hill)
[9] Lehmann E L 1975 Non Parametrics: Statistical Methods Based on Ranks (San Francisco: Holden Day)
[10] Wayne D W 1978 Applied Non-parametric Statistics (Boston: Houghton Mifflin)
[11] Hollander M and Wolfe D A 1973 Non-parametric Statistical Methods (New York: Wiley)
[12] Cappa P, Sciuto S A and Silvestri S 2001 Reliability analysis of non-parametric statistical tests for the evaluation of linear drift in experimental data Strain 37 69-74
[13] Cappa P, Del Prete Z and Marinozzi F 2001 Long term stability of a novel strain gage conditioner based on the direct resistance method Exp. Techn. 25 24-7