Extracting Colored Noise Statistics in Time Series via Negentropy
Jean-Philippe Montillet, Member, IEEE, Simon McClusky, and Kegen Yu, Senior Member, IEEE

Abstract: In the analysis of some specific time series (e.g., Global Positioning System coordinate time series, chaotic time series, human brain imaging), the noise is generally modeled as the sum of a power-law noise and white noise. Some existing software packages estimate the amplitude of the noise components using convex optimization (e.g., Levenberg-Marquardt) applied to a log-likelihood cost function. This work studies a novel cost function based on an approximation of the negentropy. Restricting the study to simulated time series with flicker noise plus white noise, we demonstrate that this cost function is convex. Then, we show through numerical approximations that it is possible to obtain an accurate estimate of the amplitude of the colored noise for various lengths of the time series, as long as the ratio between the colored noise amplitude and the white noise amplitude is smaller than 0.6. The results demonstrate that our proposed cost function improves the accuracy by around 5% compared with the log-likelihood cost function for simulated time series shorter than 1400 samples.

Index Terms: Colored noise, convex optimization, Levenberg-Marquardt, negentropy, time series.


In the last decade, there has been a concerted effort to develop methods to estimate the statistics of colored noise from time series in various research fields. The authors of [8] show the importance of modelling biological data sets (e.g., functional magnetic resonance imaging) with a power-law model in order to characterize the underlying dynamical behaviour of the time series. In the study of dynamical systems, [2] and [6] show the relationship between colored random noise and deterministic chaos (strange attractor). In signal processing, the colored noise needs to be modelled to analyse received signals for some specific applications [13], [18]. Furthermore, modelling the colored noise is also important in the study of geodetic time series [9], [14], [16]. For instance, current research focuses on investigating different models to best fit the noise in GPS receiver coordinates in order to accurately monitor geodetic phenomena such as earthquakes or tectonic movements. The general method is to fit a linear trend to the GPS coordinate time series and then model the residual (colored noise) as following a power-law with a spectral index $\kappa$ [16] such as:

$$P(f) \propto f^{-\kappa} \qquad (1)$$

where $P(f)$ is the power spectrum at frequency $f$. With this definition, flicker noise corresponds to $\kappa$ equal to 1, random-walk noise to $\kappa$ equal to 2, and white noise to $\kappa$ equal to 0. Flicker noise and random-walk noise are classified as long-term dependency phenomena [9]. An extensive scientific literature is available on this model [10], [14], [19].
In [11], the authors used a Monte Carlo Markov chain method to estimate the statistics of colored noise in GPS time series. The authors of [16] and [19] have extensively investigated how to use a maximum likelihood cost function combined with a downhill simplex algorithm to accurately estimate the amplitude of the colored noise in geodetic time series.
The study in [10] indicates that it is possible to accurately
estimate the white noise statistics in the colored noise model.
Using this information and applying negentropy theory to a
time series, we demonstrate that it is possible to calculate the
ratio between the power-law noise and white noise. Specifically,
a negentropy-based cost function is developed first. The ratio
estimate is then produced by minimizing this nonlinear cost
function using a nonlinear optimization or minimization method
such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) family
of algorithms [15]. Once the ratio estimate is obtained and the
white noise amplitude is preliminarily estimated (e.g., [10]), the
power-law noise amplitude can be determined.
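
As a worked illustration of this last step, the following sketch uses hypothetical values and an assumed notation that is not taken from the original text: $\hat{r}$ for the estimated amplitude ratio, $\hat{\sigma}_w$ for the estimated white noise amplitude, and $\hat{\sigma}_{pl}$ for the resulting power-law noise amplitude.

```latex
\hat{\sigma}_{pl} = \hat{r}\,\hat{\sigma}_{w},
\qquad \text{e.g. } \hat{r} = 0.6,\ \hat{\sigma}_{w} = 5\ \text{mm}
\;\Rightarrow\; \hat{\sigma}_{pl} = 0.6 \times 5\ \text{mm} = 3\ \text{mm}.
```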
The following section describes the data model and the derivation of the new cost function based on negentropy. The third part analyses the convexity of this new cost function when the power-law noise is fixed to flicker noise. The fourth section discusses the results of comparing the performance of this new cost function with that of the log-likelihood cost function. It also underlines the limits of validity of the numerical approximations used in deriving the new cost function.

Manuscript received January 09, 2013; revised April 07, 2013; accepted June
20, 2013. Date of publication June 26, 2013; date of current version June 28,
2013. This work was supported by the Australian Research Council under Grant
DP0877381. The associate editor coordinating the review of this manuscript and
approving it for publication was Prof. Alireza Seyedi.
J.-P. Montillet and S. McClusky are with the Research School of Earth
Sciences, The Australian National University, Canberra, ACT 0200, Australia
K. Yu is with the Satellite Navigation and Positioning Laboratory, School of
Surveying and Spatial Information Systems, The University of New South
Wales, Sydney, NSW 2052, Australia.
Color versions of one or more of the figures in this paper are available online
Digital Object Identifier 10.1109/LSP.2013.2271241

1070-9908/$31.00 © 2013 IEEE

A. Data Model
Consider the time series $\mathbf{y}$, which is an $N$-vector ($\mathbf{y} = [y_1, \ldots, y_N]^T$). If $\mathbf{y}$ is a time series of the coordinates of a GPS receiver, then we can model it as:

$$y_i = s_i + n_i \qquad (2)$$

where $n_i$ is the sum of white noise and power-law noise at the $i$-th epoch, and $s_i$ is the sum of signals, such as geodetic signals (e.g., the velocity rate and the seasonal variations in GPS time series [16]) or sinusoidal signals in noise [1]. From the previous works of [16], [19], the variance-covariance matrix of the white noise is equal to $\sigma_w^2 \mathbf{I}$ ($\mathbf{I}$ is the identity matrix), whereas the power-law variance-covariance matrix is equal to $\sigma_{pl}^2 \mathbf{J}(\kappa)$, where $\mathbf{J}(\kappa)$ is built from a lower triangular matrix $\mathbf{U}$ and its transpose $\mathbf{U}^T$, with $T$ the transpose operator, $\kappa$ the spectral index, and $\sigma_{pl}^2$ the variance of the colored noise. Each coefficient of $\mathbf{U}$ is defined such as:
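
The construction of such a covariance matrix can be sketched as follows. This is a minimal illustration only, assuming the fractional-differencing recursion commonly used for power-law noise in the geodetic literature (in the spirit of [16]), a unit sampling interval, and the product order `tri * tri^T`. The function names and these choices are assumptions, not the definitions of the original equations.

```python
def hosking_coefficients(kappa, n):
    """Return psi_0 .. psi_{n-1} for spectral index kappa (assumed recursion).

    kappa = 0 gives white noise, kappa = 1 flicker noise, kappa = 2 random walk.
    """
    psi = [1.0]
    for k in range(1, n):
        psi.append(psi[-1] * (k - 1 + kappa / 2.0) / k)
    return psi


def power_law_covariance(kappa, n):
    """Unit-amplitude power-law covariance built from a lower triangular matrix."""
    psi = hosking_coefficients(kappa, n)
    # tri[i][j] = psi[i - j] on and below the diagonal, zero above it.
    tri = [[psi[i - j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    # cov = tri * tri^T (the product order is an assumption of this sketch).
    cov = [[sum(tri[i][k] * tri[j][k] for k in range(n)) for j in range(n)]
           for i in range(n)]
    return tri, cov


def total_covariance(sigma_w, sigma_pl, kappa, n):
    """Sum of the white noise and power-law covariance matrices."""
    _, cov = power_law_covariance(kappa, n)
    return [[sigma_pl ** 2 * cov[i][j] + (sigma_w ** 2 if i == j else 0.0)
             for j in range(n)] for i in range(n)]
```

With this recursion, a spectral index of 0 reduces the triangular matrix to the identity, and a spectral index of 2 yields the cumulative-sum structure of a random walk, which matches the classification of noise types given in the introduction.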




Throughout this work, the covariance of the colored noise is equal to:

$$\mathbf{C} = \sigma_w^2 \mathbf{I} + \sigma_{pl}^2 \mathbf{J}(\kappa) \qquad (4)$$

that is, the sum of the white noise covariance matrix $\sigma_w^2 \mathbf{I}$ and the power-law covariance matrix $\sigma_{pl}^2 \mathbf{J}(\kappa)$.

B. Negentropy and the New Cost Function

In information theory, the entropy is an important concept, especially its maximality property for Gaussian random variables [3]. This property states that, among random variables of unit variance, a Gaussian random variable has the largest entropy [3]. Thus, the negentropy $H(\nu) - H(y)$, where $H$ is the differential entropy and $\nu$ is a Gaussian random variable with the same variance as $y$, is a measure of non-Gaussianity. It is equal to zero for a Gaussian random variable and positive otherwise. Furthermore, it has been shown in [5] that the negentropy can be approximated by cumulants, but [3] warns about the validity of such an approximation, as it suffers from the non-robustness encountered with high-order cumulants (e.g., kurtosis), and proposes the approximation:

$$H(\nu) - H(y) \approx k_1 \left( E\{ y \exp(-y^2/2) \} \right)^2 + k_2 \left( E\{ \exp(-y^2/2) \} - \sqrt{1/2} \right)^2 \qquad (5)$$

where $E$ is the expectation operator, and $k_1$ and $k_2$ are respectively equal to 7.413 and 33.699. $H$ is the entropy function as defined in [3]. Although this expression is more robust than the one with higher-order cumulants, its accuracy is relatively sensitive to the stochastic properties of the time series (e.g., Gaussian distributed, unit variance) [3].
One can apply the above equations to the special case of $\mathbf{y}$, an $N$-vector of the residual of the GPS time series of one coordinate after removing the linear trend following (2). Based on the definition of the negentropy, a zero-mean normally distributed random variable with a standard deviation equal to 1 has a null negentropy [3]. Let us apply the negentropy to $\mathbf{y}$, which is supposed to be zero-mean distributed, together with the approximation of the determinant described in [4].

Note that (8) holds because the operator involved is a linear map [7], provided that the series is converging. Moreover, $\mathbf{C}$ is a sum of covariance matrices, thus positive definite. It implies that its eigenvalues can be ordered such as $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_N > 0$. Remembering the eigendecomposition of a square matrix (9), in which $\mathbf{v}_i$ is the eigenvector associated with the eigenvalue $\lambda_i$, the series is diverging if the eigenvalues exceed 1. One possibility is to scale the matrix by its highest eigenvalue.
Property 01: With this scaling, the series is a converging series.
Proof: According to (9) and the linearity property of the operator, one can write:

The equation is then equal to:

The series above on the left-hand side is a harmonic series and convergent by definition [7].
Equation (7) becomes a polynomial of $n$-th degree, such as:

with $\mathbf{C}$ defined in (4). Moreover, it is possible to factorize the determinant of $\mathbf{C}$ as it is a non-singular matrix [4]. In addition, if the standard deviation of the white noise $\sigma_w$ is known (e.g., [10]), one can normalize the time series $\mathbf{y}$ with the standard deviation of the white noise. One can then write:

The resolution of (12) will depend on the degree $n$ of the polynomial. Note that the power-law noise amplitude is then defined relative to the standard deviation of the white noise. In the remainder, the covariance matrix of the power-law noise is normalized by its highest eigenvalue times the dimension of the covariance matrix, and $n$ is equal to 2 in (12).
Finally, we define $\mathbf{y}$, the $N$-vector of GPS receiver coordinates normalized such as in (7), and $M$ overlapping sub-vectors of $\mathbf{y}$, with the length of each sub-vector equal to $L$.
One can define the cost function based on (12):

The length of the sub-vector is an important parameter: it can increase the processing time, but the estimated amplitude can be biased when using short sub-time series. That is why the value of $L$ is chosen to be close to $N$. An additional parameter is introduced to compensate for the non-robustness of the negentropy approximation underlined in (5). Note that the covariance matrix is an $L$-by-$L$ matrix in (13).
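
Two ingredients of this construction can be illustrated with a minimal sketch: splitting a normalized series into overlapping sub-vectors, and evaluating a Hyvärinen-type negentropy approximation [3] on a sample. It is not the cost function (13) itself. The evenly spaced window layout, the function names, and the closed-form constants (which evaluate to about 7.413 and 33.67, close to the values quoted for the approximation) are assumptions of this sketch, and the input is assumed to be already zero-mean and normalized.

```python
import math
import random

# Closed-form constants of the assumed Hyvarinen-type approximation.
K1 = 36.0 / (8.0 * math.sqrt(3.0) - 9.0)    # about 7.413
K2 = 24.0 / (16.0 * math.sqrt(3.0) - 27.0)  # about 33.67


def overlapping_subvectors(y, length, count):
    """Split y into `count` sub-vectors of `length` samples, evenly spaced starts."""
    if count == 1:
        return [y[:length]]
    step = (len(y) - length) / (count - 1)
    starts = [int(round(m * step)) for m in range(count)]
    return [y[s:s + length] for s in starts]


def negentropy_approx(x):
    """Approximate negentropy of a sample assumed zero-mean and unit-variance."""
    n = len(x)
    # Sample estimates of E{x exp(-x^2/2)} and E{exp(-x^2/2)}.
    odd = sum(v * math.exp(-v * v / 2.0) for v in x) / n
    even = sum(math.exp(-v * v / 2.0) for v in x) / n
    return K1 * odd ** 2 + K2 * (even - math.sqrt(0.5)) ** 2
```

For a Gaussian sample of unit variance the approximation stays close to zero, whereas a heavier-tailed sample of the same variance (e.g., Laplace distributed) gives a clearly larger value, which is the behaviour a negentropy-based cost relies on.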
With the cost function defined in the previous part, we use the Davidon-Fletcher-Powell (DFP) optimization algorithm (see [15]) in order to calculate the amplitude of the power-law noise. Note that the power-law noise is generally estimated using the log-likelihood cost function [12], [17] such as:

$$\ln \mathcal{L}(\mathbf{x}, \mathbf{C}) = -\frac{1}{2} \left[ \ln \det \mathbf{C} + \mathbf{x}^T \mathbf{C}^{-1} \mathbf{x} + N \ln (2\pi) \right] \qquad (14)$$

where $\mathbf{x}$ is similar to the $N$-vector $\mathbf{y}$, but without the normalization with the standard deviation of the white noise, and $\mathbf{C}$ is the sum of covariance matrices as defined in (4). Note that the log-likelihood cost function is a concave function and has a unique global maximum (e.g., [12]). Before investigating the performance of our new cost function, one can look at its optimization properties, defined in the following property.
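
For comparison, the standard Gaussian log-likelihood used as the reference cost can be evaluated as in the sketch below. It assumes the usual form $-\frac{1}{2}[\ln \det \mathbf{C} + \mathbf{x}^T \mathbf{C}^{-1} \mathbf{x} + N \ln(2\pi)]$ and a symmetric positive definite covariance matrix. The function names are hypothetical, and the pure-Python Cholesky factorization is chosen only to keep the example self-contained.

```python
import math


def cholesky(a):
    """Lower triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def log_likelihood(x, c):
    """Gaussian log-likelihood of the residual vector x with covariance c."""
    n = len(x)
    low = cholesky(c)
    # Forward substitution: solve low * z = x, so that x^T C^-1 x = z^T z.
    z = []
    for i in range(n):
        z.append((x[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    # ln det C is twice the sum of the logs of the Cholesky diagonal.
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    quad = sum(v * v for v in z)
    return -0.5 * (log_det + quad + n * math.log(2.0 * math.pi))
```

The factorization costs on the order of $N^3$ operations, which is consistent with the long processing times reported for the log-likelihood approach on long time series.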
Property 02: One can define open convex sets on which the cost function possesses a global minimizer in the case of white noise plus flicker noise.
Proof: We want to find an optimizer of the cost function. Let us do a numerical estimation of the gradient and Hessian matrices, when restraining the first subset to [0, 100], which spans the ratio of the amplitude of the colored noise between [0, 0.1] with a white noise amplitude between [0.001, 0.1] mm. The second subset spans the interval [0, 100]. Note that, for symmetry reasons, we also draw the figure for the negative amplitude values when showing the cost function graphically.
The results of numerically calculating the Hessian matrix are shown on the left-hand side of Table I. From the results, we can conclude that the Hessian matrix is positive definite and hence the cost function is a convex function on the selected intervals [15]. Consider the global minimizer, which is a vector, and the corresponding global minimum of the cost function. The components of the minimizer are located in the same intervals as defined earlier. Then we calculate the gradient vector and the Hessian matrix at the global minimizer, and the results are shown on the right-hand side of Table I. It can be seen that the components of the gradient vector at the point of the global minimizer approach zero and the minors of the Hessian matrix are non-negative. Fig. 1 shows the numerically calculated cost function when the noise is the sum of flicker noise and white noise.

Fig. 1. Example of cost function for a time series of 700 samples with a two-noise model, with the amplitude of flicker noise equal to 3 mm and the white noise amplitude equal to 5 mm.
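
The numerical check described in this proof can be sketched as follows, using central finite differences for the gradient and Hessian and the leading principal minors as a positive-definiteness test. The function names and step sizes are assumptions, and any function used to exercise the sketch (for instance a simple quadratic) is only a stand-in, since the actual cost function is not reproduced here.

```python
def gradient(f, x, h=1e-5):
    """Central-difference gradient of f at the point x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def hessian(f, x, h=1e-4):
    """Central-difference Hessian of f at the point x."""
    n = len(x)

    def shifted(i, j, di, dj):
        p = list(x)
        p[i] += di
        p[j] += dj
        return f(p)

    return [[(shifted(i, j, h, h) - shifted(i, j, h, -h)
              - shifted(i, j, -h, h) + shifted(i, j, -h, -h)) / (4.0 * h * h)
             for j in range(n)] for i in range(n)]


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


def leading_principal_minors(m):
    """Determinants of the top-left k-by-k blocks, k = 1 .. n."""
    return [determinant([row[:k] for row in m[:k]]) for k in range(1, len(m) + 1)]
```

If all leading principal minors of the Hessian are positive at every sampled point of the intervals, the function is locally convex there, and a gradient close to zero at a candidate point is consistent with that point being the minimizer.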
The proposed cost function is applied to the estimation of the amplitude of colored noise in simulated geodetic time series, which contain three different components: white Gaussian noise, a sinusoidal signal, and power-law noise as described in [16]. A sinusoidal signal with small amplitude (typically 0.4 mm) models a residual signal found, for example, in geodetic time series (at annual and semi-annual frequencies) [14], or when estimating the parameters associated with sinusoidal signals in colored noise [1]. Furthermore, the power-law noise in time series can be simulated using either the equation of the colored noise defined in [16], [19], or the derivative method [10]. The results with our cost function are compared with the log-likelihood cost function (e.g., using the software CATS [17]). Before estimating the amplitude of the colored noise, the algorithm studied in [10] is used to get the amplitude of the white noise. All results are averaged over 50 simulated time series. Note that the number of sub-vectors is 3 and the approximation parameter is 5 when applying the cost function.

We simulate some time series with a fixed amplitude of the white noise equal to 5 mm/yr while varying the amplitude of the flicker noise between 1 and 4. The mean error between the estimated amplitude of the colored noise and the true one is then converted to a percentage. In Fig. 2, the results show that our cost function performs on average 5% better for short time series of less than 1400 samples, but the improvement is marginal for longer ones. Note that the performance of the log-likelihood cost function improves with longer time series, as it is asymptotic to the Cramér-Rao bound [10]. However, [10] and [17] warn about the long processing time for time series longer than 3000 samples (e.g., 10-year-long GPS coordinate time series).

Fig. 2. Mean error (in percentage) between the estimated and true amplitude of the flicker noise when varying the length of the simulated time series.

Now, let us vary the amplitude of the flicker noise with the same fixed amplitude value of the white noise, and with a fixed length of the simulated time series (equal to 1000 samples). Fig. 3 displays the results. The error increases drastically with the colored noise amplitude when the ratio is greater than 0.6. This underlines that our cost function is sensitive to the amplitude of the colored noise. In other words, the approximation of the negentropy in (5) holds as long as the measurements remain Gaussian distributed.

Fig. 3. Mean error (in percentage) when varying the amplitude of the flicker noise over the white noise and fixing the length of the simulated time series to 1000 samples.

To conclude, the presented study shows that this negentropy-based cost function can be used to estimate the colored noise amplitude for the case where the white noise amplitude is twice as large as the colored noise amplitude. The results demonstrate that on average the estimation based on our cost function is more accurate than the existing method for time series shorter than 1400 samples.

The authors acknowledge the useful comments of the anonymous reviewers.

[1] C. Chatterjee, R. L. Kashyap, and G. Boray, "Estimation of close sinusoids in colored noise and model discrimination," IEEE Trans. Acoust., Speech, Signal Process., vol. 35, no. 3, pp. 328-337, 1987.
[2] B. Goode, J. R. Cary, I. Doxas, and W. Horton, "Differentiating between colored random noise and deterministic chaos with the root mean squared deviation," J. Geophys. Res., vol. 106, no. A10, pp. 21277-21288, 2001.
[3] A. Hyvärinen, "New approximations of differential entropy for independent component analysis and projection pursuit," in Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 1998, vol. 10, pp. 273-279.
[4] I. Ipsen and D. Lee, "Determinant approximations," Numer. Lin. Alg. Applicat., May 2011 [Online]. Available: arXiv:1105.0437, to be published.
[5] M. C. Jones and R. Sibson, "What is projection pursuit?," J. Roy. Statist. Soc. A, vol. 150, no. 1, pp. 1-37, 1987.
[6] M. B. Kennel and S. Isabelle, "Method to distinguish possible chaos from colored noise and to determine embedding parameters," Phys. Rev. A, vol. 46, no. 6, pp. 3111-3118, 1992.
[7] E. Kreyszig, Advanced Engineering Mathematics, 8th ed. Hoboken, NJ, USA: Wiley, 2003.
[8] O. Miramontes and P. Rohani, "Estimating $1/f^{\alpha}$ scaling exponents from short time-series," Physica D: Nonlin. Phenomena, vol. 166, no. 3-4, pp. 147-154, 2002.
[9] J. P. Montillet and K. Yu, "Leaky LMS algorithm and fractional Brownian motion model for GNSS receiver position estimation," in Proc. IEEE Veh. Technol. Conf. (VTC11 Fall), 2011.
[10] J.-P. Montillet, P. Tregoning, S. McClusky, and K. Yu, "Extracting white noise statistics in GPS coordinate time series," IEEE Geosci. Remote Sens. Lett., vol. 10, no. 3, pp. 563-567, May 2013.
[11] G. Olivares, F. N. Teferle, and S. D. P. Williams, "An evaluation of a Monte Carlo Markov chain method for the statistical analysis of GPS time series," in IGS Workshop, Olsztyn, Poland, Jul. 2012 [Online].
[12] L. Paninski, J. W. Pillow, and E. P. Simoncelli, "Maximum likelihood estimation of a stochastic integrate-and-fire neural encoding model," Neural Comput., Nov. 2004.
[13] C. Röver, R. Meyer, and N. Christensen, "Modelling coloured residual noise in gravitational-wave signal processing," Ann. Appl. Statist., 2008 [Online]. Available: arXiv:0804.3853
[14] P. Tregoning and C. Watson, "Atmospheric effects and spurious signals in GPS analyses," J. Geophys. Res., vol. 114, p. B09403, 2009.
[15] W. Sun and Y.-X. Yuan, Optimization Theory and Methods: Nonlinear Programming. Berlin, Germany: Springer, 2006.
[16] S. D. P. Williams, Y. Bock, P. Fang, P. Jamason, R. M. Nikolaidis, L. Prawirodirdjo, M. Miller, and D. J. Johnson, "Error analysis of continuous GPS position time series," J. Geophys. Res., vol. 109, p. B03412, 2004, 10.1029/2003JB002741.
[17] S. D. P. Williams, "CATS: GPS coordinate time series analysis software," GPS Solutions, vol. 12, no. 2, pp. 147-153, 2008.
[18] P. C. Young, Recursive Estimation and Time Series Analysis, 1st ed. Berlin/Heidelberg, Germany: Springer-Verlag, 2011.
[19] J. Zhang, Y. Bock, H. Johnson, P. Fang, S. Williams, J. Genrich, S. Wdowinski, and J. Behr, "Southern California permanent GPS geodetic array: Error analysis of daily position estimates and site velocities," J. Geophys. Res., vol. 102, pp. 18035-18055, 1997.