
Non-particle filters

Fred Daum & Misha Krichman


Raytheon Company
225 Presidential Way
Woburn, MA 01801
ABSTRACT
We have developed a new nonlinear filter that is
superior to particle filters in five ways: (1) it
exploits smoothness; (2) it uses an exact
solution of the Fokker-Planck equation in
continuous time; (3) it uses a convolution to
compute the effect of process noise at discrete
times; (4) it uses the adjoint method to compute
the optimal density of points in state space to
represent the smooth conditional probability
density, and (5) it uses Bayes rule exactly by
exploiting the exponential family of probability
densities. In contrast to particle filters, which
do not exploit smoothness, the new filter avoids
importance sampling and Monte Carlo methods.
The new non-particle filter should be superior to
particle filters for a broad class of practical
problems. In particular, the new filter should
dramatically reduce the curse of dimensionality
for many (but not all) important real world
nonlinear filter problems.

KEY WORDS
Nonlinear filters, particle filter, adjoint method,
meshfree, Fokker-Planck equation, Kalman
filter

1.0 INTRODUCTION

We describe a new nonlinear filter that exploits
smoothness to reduce the curse of
dimensionality for a broad class of important
practical problems. We use a hybrid model of
nonlinear dynamics that allows us to solve the
Fokker-Planck equation exactly; in particular,
we use discrete time diffusion but continuous
time drift. With this hybrid model, the Fokker-Planck
equation is equivalent to an ODE for the
unnormalized conditional density of the d-dimensional
state vector. Therefore, we do not
need an extremely fine quantization in time (or
an implicit method or ADI method) to
compensate for fine quantization in state space;
that is, we do not need to worry about stability
of the numerical solution of the Fokker-Planck
equation, as defined by the Courant-Friedrichs-Lewy
stability criterion. Moreover, we
implement Bayes rule exactly for updates of
the unnormalized density with measurements
using the exponential family of probability
densities. The effect of diffusion (also called
process noise by engineers) in the Fokker-Planck
equation is computed using a fast
convolution of two probability densities. The
new filter is summarized in Tables 1, 2 & 3.
The derivation in Table 3 assumes that the
probability density is smooth and nowhere
vanishing; this formula was known to Liouville.
The new filter fully exploits the smoothness of
the Fokker-Planck equation, and therefore it
should be superior to particle filters, which do
not exploit any smoothness and which do not
exploit exact solutions or the exponential
family. However, if we use a uniform grid in
d-dimensional state space to represent the
conditional probability density, then we would
still suffer from the curse of dimensionality.
Therefore, we represent the density using a
sparse grid computed adaptively in real time
with the adjoint method [11]. The adjoint
method for solving PDEs numerically is
analogous to the adjoint (aka Lagrange
multipliers) used in optimal control (see Table
2). The adjoint method is an industrial strength
numerical algorithm that is widely used for
solving PDEs. Intuitively, the reason that we
can use this hybrid discrete-time/continuous-time
model for nonlinear filtering is that
engineers use the diffusion term (so-called
process noise) as a design parameter, unlike
physics and chemistry, where the diffusion
tensor is defined by Nature. In particular,
engineers typically tune the process noise
covariance matrix to get improved results with
extended Kalman filters [4], but Nature does not
allow such tuning of Avogadro's number or
other physical constants. Engineers commonly
increase or decrease process noise variance by a
factor of two or three without any significant
effect on filter performance, but changing the
drift term by one percent can wreak havoc with
performance in some applications. We exploit
this insensitivity to model variation in diffusion,
but we pay strict attention to the physics
which is encoded in the drift term in the
Fokker-Planck equation. This allows us to model
process noise in discrete time and use a
convolution to compute the effect of diffusion
on the conditional density; this greatly reduces
computational complexity. It would be a shame
to lavish Gflops of computer throughput on
carefully solving the Fokker-Planck equation
with non-zero diffusion tensor, considering that
an exact model of the diffusion tensor is both
unknown and of little importance in practical
engineering applications. In most practical
applications the process noise covariance matrix
is diagonal; if not, it can be diagonalized at the
cost of $d^3$ computations; this means that we can
use d one-dimensional convolutions of the two
probability densities.
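
The separable convolution described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a small uniform 2-D grid (the paper itself uses adaptively chosen sparse nodes), a Gaussian process-noise density with diagonal covariance, and hypothetical helper names (`gaussian_kernel`, `convolve_1d`, `diffuse_2d`). It shows only how a diagonal covariance lets one pass per state coordinate replace a full d-dimensional convolution.

```python
import math

def gaussian_kernel(variance, dx, half_width):
    """Zero-mean Gaussian sampled at spacing dx, normalized to sum to 1."""
    weights = [math.exp(-0.5 * (k * dx) ** 2 / variance)
               for k in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_1d(values, kernel):
    """Same-length 1-D convolution; mass that leaves the grid is dropped.

    The kernel is assumed symmetric, so no index flip is needed.
    """
    half = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:
                acc += w * values[j]
        out.append(acc)
    return out

def diffuse_2d(density, kernel_1, kernel_2):
    """Apply diagonal process noise as one 1-D pass per state coordinate."""
    n1, n2 = len(density), len(density[0])
    # Pass along the second coordinate (within each row).
    rows = [convolve_1d(row, kernel_2) for row in density]
    # Pass along the first coordinate (within each column).
    out = [[0.0] * n2 for _ in range(n1)]
    for j in range(n2):
        column = convolve_1d([rows[i][j] for i in range(n1)], kernel_1)
        for i in range(n1):
            out[i][j] = column[i]
    return out

# Illustrative values (assumed): 21 x 21 grid, unit spacing, point mass at center.
n = 21
prior = [[0.0] * n for _ in range(n)]
prior[10][10] = 1.0
k1 = gaussian_kernel(variance=1.0, dx=1.0, half_width=5)  # noise in coordinate 1
k2 = gaussian_kernel(variance=4.0, dx=1.0, half_width=5)  # noise in coordinate 2
predicted = diffuse_2d(prior, k1, k2)
total_mass = sum(sum(row) for row in predicted)
```

With a kernel of K taps per axis, the two passes cost on the order of 2K operations per node, whereas a full 2-D kernel would cost K^2; in d dimensions the same comparison is dK versus K^d.
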
The key issue in nonlinear filters is the curse of
dimensionality, which is a phrase coined by
Richard Bellman four decades ago. The curse
of dimensionality means that the computational
complexity of solving a problem increases
extremely fast with the dimension of the
problem. For nonlinear filters, the dimension
refers to the dimension of the state vector of the
dynamical system to be estimated. The term
"extremely fast" is usually taken to mean that
computer time increases exponentially with
dimension. It is easy to see why the curse of
dimensionality is relevant for nonlinear filters.
As explained below, we need to solve a partial
differential equation (PDE) in d-dimensional
state space in order to solve the nonlinear
filtering problem. Standard textbook methods
for solving PDEs numerically use a fixed grid in
d-dimensional space, and the computational
complexity grows as $N^d$, where N is the number
of grid points in each dimension. We can
conclude from this that using a fixed grid results
in computational complexity growing
exponentially with dimension. Hence, using a
fixed grid is an extremely bad idea, and a
non-uniform set of nodes computed adaptively
is required to have any hope of mitigating the
curse of dimensionality. That is the key idea of
this paper, as well as particle filters, as well as
all modern work on solving PDEs numerically.
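
The growth of a fixed grid with dimension is easy to tabulate. The short sketch below simply evaluates N^d; the resolution N = 100 and the function name `uniform_grid_size` are assumptions chosen for illustration and do not come from the paper.

```python
def uniform_grid_size(points_per_axis, dim):
    """Number of nodes in a full tensor-product grid: N**d."""
    return points_per_axis ** dim

N = 100  # assumed resolution per axis, for illustration only
for d in (1, 2, 3, 6, 10):
    print(f"d = {d:2d}: {uniform_grid_size(N, d):.3e} nodes")
```

Each added state dimension multiplies the node count by N, which is why a uniform grid becomes impractical after only a handful of dimensions.
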
We emphasize that hardboiled engineers are
only interested in good approximations rather
than exact solutions. The question of what is
good enough depends on the specific
application. There are many different
algorithms to solve the nonlinear filtering
problem, including: extended Kalman filters,
unscented Kalman filters, particle filters,
explicit numerical solution of the Fokker-Planck
equation, Daum filters, etc. A tutorial
introduction to a wide range of state-of-the-art
nonlinear filters is given in [4].
It has been asserted in engineering journals that
particle filters beat the curse of dimensionality,
but this is generally wrong. It turns out that
particle filters depend on a good proposal
density, and without such help the particle filters
also suffer from the curse of dimensionality [4].

Particle filters are extremely popular, owing to
the ease of coding and the simple theory
required. One can code a pretty good particle
filter in one or two pages of MATLAB, and one
does not need to understand the finer points of
stochastic calculus or any fancy methods for
solving partial differential equations. Also,
particle filters are popular due to their generality
and flexibility, as well as a certain amount of
hype associated with the claim that they beat the
curse of dimensionality. On the other hand,
particle filters do not exploit the smoothness of
the nonlinear filtering problem, and hence we
expect that the new filter described here should
be superior to particle filters for many practical
applications.

2.0 THE VALUE OF SMOOTHNESS


Smoothness can dramatically reduce
computational complexity for high dimensional
problems. In particular, for approximation of
smooth functions, a well known theoretical
bound [12] gives:

$$T = \frac{c(d)}{\varepsilon^{d/s}}$$

in which

T = computation time to achieve an
approximation error of $\varepsilon$
d = dimension of independent variable
s = smoothness of the functions being
approximated
c(d) = time for one function evaluation (e.g., d
for typical engineering problems)

We emphasize that the word "smoothness" in
this context does not mean, for example, twice
continuously differentiable (for s = 2), but
rather the word "smoothness" as used here
defines a class of functions with mixed partial
derivatives of order s that are bounded by unity
[12]. To quote Nemirovsky & Yudin [13]:
Smoothness does not, in itself, count for much;
what is important is the values of the numerical
parameters which characterize this smoothness
(the values of the corresponding derivatives, and
so on). This is intuitively obvious.
For example, for d = 20, if the conditional
density is in the class of functions with s = 2, we
have reduced the computational complexity by
an enormous factor, as if the dimension was
only d = 10. We might be tempted to say that
the effective dimension is d = 10 in this case.
If the theoretical bound given above applies to
our nonlinear filter problem, then we have not
beaten the curse of dimensionality, but we have
certainly improved the situation dramatically.
Unfortunately, the simple bound given above is
isotropic, whereas our problem might be much
smoother in certain directions than others, and
therefore it is difficult to quantify the reduction
in computational complexity using a simple
formula with just a few parameters.
Nevertheless, the simple back-of-the-envelope
formula above gives considerable insight into
the benefit of smoothness. There are other
bounds on computational complexity for
multivariate integration of smooth functions in
d-dimensions [12], as well as distinct formulas
that apply for estimation of smooth probability
densities in d-dimensions [10] and [14]-[15].
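
The d = 20, s = 2 example can be checked numerically with a few lines. This sketch assumes an error target of 0.1, treats c(d) as a constant equal to 1, and takes s = 1 as the baseline for comparison; none of these values comes from the paper, and `time_bound` is a hypothetical name.

```python
def time_bound(eps, d, s, cost_per_eval=1.0):
    """Evaluate T = c(d) / eps**(d/s), with c(d) passed in as a number."""
    return cost_per_eval / eps ** (d / s)

eps = 0.1  # assumed approximation error
t_rough = time_bound(eps, d=20, s=1)     # baseline smoothness
t_smooth = time_bound(eps, d=20, s=2)    # smoother class of functions
t_half_dim = time_bound(eps, d=10, s=1)  # half the dimension, baseline smoothness

print(f"s = 1, d = 20: {t_rough:.3e}")
print(f"s = 2, d = 20: {t_smooth:.3e}")
print(f"s = 1, d = 10: {t_half_dim:.3e}")
```

Only the ratio d/s enters the exponent, so the s = 2, d = 20 case costs the same as the s = 1, d = 10 case. That is the sense in which the effective dimension is 10.
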

References

(1) W. Bangerth and R. Rannacher, Adaptive
finite element methods for differential
equations, Birkhauser Inc., 2003.
(2) M. B. Giles and E. Suli, Adjoint methods
for PDEs, Acta Numerica, pages 145-236,
Cambridge University Press, 2002.
(3) R. Becker and R. Rannacher, An optimal
control approach to a posteriori error
estimation in finite element methods,
Acta Numerica, pages 1-102, Cambridge
University Press, 2001.
(4) F. E. Daum, Nonlinear filters: beyond the
Kalman filter, special tutorial issue of
IEEE AES Systems Magazine, August
2005.
(5) F. E. Daum, Industrial Strength
Nonlinear Filters, Proceedings of
Workshop in honor of Yaakov Bar-Shalom,
Monterey California, May 2002.
(6) F. E. Daum, New Exact Nonlinear
Filters, Chapter 8 in Bayesian Analysis
of Time Series and Dynamic Models,
edited by J. C. Spall, New York: Marcel
Dekker, Inc., 1988.
(7) F. E. Daum, Exact Finite Dimensional
Nonlinear Filters, IEEE Transactions on
Automatic Control, July 1986.
(8) M.-S. Oh, Monte Carlo integration via
importance sampling: dimensionality
effect and an adaptive algorithm,
Contemporary Mathematics, volume 115,
pages 165-187, 1991.
(9) K. Kastella, Finite Difference Methods
for Nonlinear Filtering and Automatic
Target Recognition, in
Multitarget/Multisensor Tracking Volume
III, edited by Y. Bar-Shalom & W. D.
Blair, Artech House, Inc., 2000.
(10) Luc Devroye and Gabor Lugosi,
Combinatorial Methods in Density
Estimation, Springer-Verlag, 2001.
(11) Fred Daum and Mikhail Krichman,
Meshfree Adjoint Methods for Nonlinear
Filtering, Proceedings of IEEE Aerospace
Conference, Big Sky Montana, March
2006.
(12) J. Traub and A. Werschulz, Complexity
and Information, Cambridge University
Press, 1998.
(13) A. S. Nemirovsky and D. B. Yudin,
Problem Complexity and Method
Efficiency in Optimization, translated by
E. R. Dawson, John Wiley & Sons, Inc.,
1983.
(14) F. Cucker and S. Smale, On the
mathematical foundations of learning,
Bulletin of American Math. Society,
volume 39, number 1, pages 1-49, 2001.
(15) M. Griebel, Sparse grids and related
approximation schemes for higher
dimensional problems, Univ. Bonn, 2006.

Table 1 New filter vs. Particle filter

| | NEW FILTER | PARTICLE FILTER |
|---|---|---|
| 1. Prediction of density (drift) | Exact solution of Fokker-Planck PDE | Monte Carlo |
| 2. Prediction of density (diffusion) | Convolution of two probability densities | Monte Carlo |
| 3. Adaptive method to avoid uniform grid | Adjoint method | Importance sampling from proposal density |
| 4. Representation of density | Hybrid of continuous & discrete in state-space | Particles |
| 5. Exploits smoothness | Yes | No |

Table 2 Adjoint method for PDEs vs. optimal control

| PDEs | Optimal control |
|---|---|
| Computes optimal density of points in state space: q(x,t) | Computes optimal control: u(x,t) |
| Uses feedback: residuals of both the primal & dual solutions | Uses feedback |
| Lp = f & L*v = g | Euler-Lagrange equations |
| Functional to be minimized: error in numerical approximation of conditional mean | Functional to be minimized: $J = \int L(x, u, t)\,dt$ |

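Table 3 Exact solution of Fokker-Planck equation for zero diffusion

$$
\begin{aligned}
\frac{\partial p}{\partial t} &= -\frac{\partial p}{\partial x} f - p\,\mathrm{Tr}\!\left(\frac{\partial f}{\partial x}\right) + \frac{1}{2}\,\mathrm{Tr}\!\left(Q\,\frac{\partial^2 p}{\partial x^2}\right) \\
\frac{\partial p}{\partial t} &= -\frac{\partial p}{\partial x} f - p\,\mathrm{Tr}\!\left(\frac{\partial f}{\partial x}\right) \quad \text{for } Q = 0 \\
\frac{dp}{dt} &= \frac{\partial p}{\partial t} + \frac{\partial p}{\partial x} f \\
\frac{dp}{dt} &= -p\,\mathrm{Tr}\!\left(\frac{\partial f}{\partial x}\right) \\
\frac{dp}{p} &= -\mathrm{Tr}\!\left(\frac{\partial f}{\partial x}\right) dt \quad \text{for } p > 0
\end{aligned}
$$

Hence,

$$p(x, t) = p(x, 0)\,\exp\!\left(-\int \mathrm{Tr}\!\left(\frac{\partial f}{\partial x}\right) dt\right)$$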
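
The zero-diffusion formula in Table 3 can be exercised numerically. The sketch below reads the total derivative dp/dt as taken along a trajectory of dx/dt = f(x), which is an interpretation assumed here, and integrates the node position together with log p using a classical Runge-Kutta step. The drift f(x) = -x^3, the step count, and the names `rk4_step` and `propagate` are hypothetical choices for illustration, not taken from the paper.

```python
import math

def rk4_step(x, log_p, drift, div_drift, dt):
    """One RK4 step for dx/dt = f(x) and d(log p)/dt = -Tr(df/dx)."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]

    k1x, k1p = drift(x), -div_drift(x)
    x2 = shifted(x, k1x, 0.5 * dt)
    k2x, k2p = drift(x2), -div_drift(x2)
    x3 = shifted(x, k2x, 0.5 * dt)
    k3x, k3p = drift(x3), -div_drift(x3)
    x4 = shifted(x, k3x, dt)
    k4x, k4p = drift(x4), -div_drift(x4)

    x_new = [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for xi, a, b, c, d in zip(x, k1x, k2x, k3x, k4x)]
    log_p_new = log_p + dt / 6.0 * (k1p + 2.0 * k2p + 2.0 * k3p + k4p)
    return x_new, log_p_new

def propagate(x0, log_p0, drift, div_drift, t_final, n_steps):
    """Carry one node and its log-density from time 0 to t_final."""
    x, log_p = list(x0), log_p0
    dt = t_final / n_steps
    for _ in range(n_steps):
        x, log_p = rk4_step(x, log_p, drift, div_drift, dt)
    return x, log_p

# Hypothetical scalar drift f(x) = -x^3, so Tr(df/dx) = -3 x^2.
def drift(x):
    return [-x[0] ** 3]

def div_drift(x):
    return -3.0 * x[0] ** 2

x_t, log_p_t = propagate([1.0], 0.0, drift, div_drift, t_final=1.0, n_steps=1000)

# Closed-form values for this drift with x(0) = 1, used only as a check.
x_exact = 1.0 / math.sqrt(3.0)
log_p_exact = 1.5 * math.log(3.0)
```

Working with log p rather than p keeps the update additive and preserves p > 0, which matches the assumption in the derivation. For this contracting drift the density along the trajectory grows, as expected when probability mass is squeezed toward the origin.
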
