
Randomized State Space Tracking

Bart Massey
Assoc. Prof. Computer Science
Portland State University
bart@cs.pdx.edu
First, Let's Talk About Me
● PSU Assoc. Prof. for 10 years; PhD (AI) and MS (PL implementation), U. Oregon; BA (Physics), Reed
● Interested in technical tools for cross-
disciplinary problem solving
● ISO bright, motivated partners
– Check out http://psas.pdx.edu
– Check out http://code.google.com/soc for
Google Summer of Code info
– State-space search course, Spring 2010
What I'll Talk About
● Brief history of navigation!
● The state space tracking problem
● Brief mention of Kalman Filters
● Intro to Bayesian Particle Filters
● The weighted resampling problem
● Efficient weighted resampling
● Sketch correctness (?) and optimality
of my weighted resampling method
● The Ziggurat Method
● Efficient variate generation for my weighted resampling
method
● 1M particle updates per second!
History of Navigation
In One Slide
● Whenever people traveled, they used
– “Dead reckoning”: Navigation using a
model of where you are
– Observation: Navigation using sensory
evidence of where you are
● Accurate clocks help with both
● Observation has come to dominate
– In particular, single “reliable” sensor
– Skilled navigators are better than this
– An “AI-complete” problem?
State Space Tracking
● Not just for navigation
● Sensors always lie (except clocks)
● Filtering vs. smoothing
“Formalizing” the problem
● Given
– g(t₀..ₙ₋₁) = estimated state values
– hᵢ(t₀..ₙ) = sensed state values
– Noise models for g and h
● Estimate
– g(tₙ) = new state estimate
● AKA “sensor fusion”
State Space Filtering
● Strategy: step the model forward in
time from last estimate, then
compare with the sensors
● Kalman Filtering
– Fine linear method
– Requires “only” O(s³) multiplies per step (s = state dimension)
– Various “fixes” exist
● Bayesian Particle Filtering (BPF)
– Nonlinear, but computationally expensive
BPF
● Recall Bayes' rule:
Pr(H|E) Pr(E) = Pr(H∧E) = Pr(E|H) Pr(H)
Pr(H|E) = Pr(E|H) Pr(H) / Pr(E)
● Here, the “evidence” comes from the
sensors, and the “hypothesis” from
the model
● But... we don't just want to know how likely it is that our model gave the right position; we want to know what the most likely position is.
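A tiny illustration (my numbers, not the talk's): with an assumed Gaussian sensor-noise model, Pr(E|H) scores each hypothesized position against one reading; under a uniform prior, Pr(H) and Pr(E) drop out of the comparison.

import math

def likelihood(sensed, hypothesized, sigma=1.0):
    # Pr(E|H): assumed Gaussian sensor-noise model
    d = sensed - hypothesized
    return math.exp(-d * d / (2 * sigma * sigma))

# Relative posterior support for three hypothesized positions
reading = 4.2
for h in (3.0, 4.0, 5.0):
    print(h, likelihood(reading, h))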
“Particles”!
[Figure: scatter of particles]
● Track lots of models!
● At each step, replicate likely models, discard unlikely ones
● Particles “map” likely position
● Curse of dimensionality: need lots of particles
● Remember: I want 1Mu/s
“Bootstrap” BPF
● for each particle i
– gᵢ(tₙ) = xᵢ ← (m + Δ)(gᵢ(tₙ₋₁))
– hᵢ(tₙ) = wᵢ = Pr(xᵢ|E₁..ₙ) = Pr(xᵢ|E₁) Pr(xᵢ|E₂) ...
● Estimate state from ensemble
● Resample (m → n, usually n = m)
– for i in {1..n}
– select p′ᵢ weighted-randomly (one full step sketched below)
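A minimal one-dimensional sketch of one bootstrap step, under assumed stand-ins: an identity motion model with additive Gaussian process noise, a Gaussian sensor likelihood, and Python's built-in weighted choice for resampling (the fast resamplers below replace it).

import math
import random

def bpf_step(particles, sensed, sigma_proc=0.1, sigma_obs=0.5):
    # 1. Step each model forward: x_i <- m(g_i) + noise (m = identity here)
    moved = [x + random.gauss(0.0, sigma_proc) for x in particles]
    # 2. Weight by sensor likelihood: w_i = Pr(sensed | x_i)
    weights = [math.exp(-(sensed - x) ** 2 / (2 * sigma_obs ** 2)) for x in moved]
    # 3. Estimate state from the ensemble (weighted mean)
    estimate = sum(w * x for w, x in zip(weights, moved)) / sum(weights)
    # 4. Resample n new particles, weighted-randomly (naive version)
    resampled = random.choices(moved, weights=weights, k=len(moved))
    return resampled, estimate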
Weighted Random Resampling
● The naïve single-sample algorithm (Python sketch below)
– normalize wᵢ ; μ ← u[0..1]
– w ← 0 ; i ← 0
– while w < μ
– i ← i + 1 ; w ← w + wᵢ
– return i
● Can iterate to get n samples
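The naïve sampler, directly in Python (a sketch; weights assumed normalized, indices 0-based):

import random

def naive_sample(w):
    # Return index i with probability w[i]; w must sum to 1
    mu = random.random()
    acc = 0.0
    for i, wi in enumerate(w):
        acc += wi
        if acc >= mu:
            return i
    return len(w) - 1  # guard against floating-point shortfall

# Iterating gives n samples in O(m n) total
samples = [naive_sample([0.1, 0.2, 0.3, 0.4]) for _ in range(5)]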
Faster Resampling
● Definitive method: O(mn)
● “Obvious” improvements:
– Binary search or treeify: O(m + n lg m)
– Sort variates and merge: O(n lg n + m)
● Could give up correctness
– Regular resampling: O(m)
– Regular with shuffle: O(m)
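A sketch of the binary-search variant (names here are illustrative, not from the paper): build the cumulative weight table once in O(m), then each draw is a single bisect, O(lg m).

import bisect
import itertools
import random

def make_sampler(w):
    # Cumulative weights: cum[k] = w[0] + ... + w[k], built once in O(m)
    cum = list(itertools.accumulate(w))
    def sample():
        # First index whose cumulative weight reaches the variate
        return bisect.bisect_left(cum, random.random() * cum[-1])
    return sample

sample = make_sampler([0.1, 0.2, 0.3, 0.4])
indices = [sample() for _ in range(5)]  # O(n lg m) for n samples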
Optimal Resampling
● Variate merge is O(n lg n + m) because it requires the variates in sorted order
● Just generate them that way?
● Distribution of the leftmost of n variates μ ← u[0..1] is p(μ₀) = n (1 − μ₀)^(n−1)
● Independence also implies recursion
● So we just generate variates in
increasing order and merge!
Sampling A Distribution
● Brute force: integrate, take ratio, invert function
● Leftmost variate at μ₀ = 1 − μ^(1/n)
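A quick sanity check of that inversion (illustration only): 1 − μ^(1/n) should behave like the minimum of n uniforms; both means come out near 1/(n+1).

import random

n, trials = 10, 100_000
by_inversion = sum(1.0 - random.random() ** (1.0 / n) for _ in range(trials)) / trials
by_minimum = sum(min(random.random() for _ in range(n)) for _ in range(trials)) / trials
print(by_inversion, by_minimum)  # both near 1/(n+1) ≈ 0.0909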
Optimal, But Not Fast Enough
● The optimal algorithm (Python sketch below)
– normalize wⱼ
– w ← w₁ ; j ← 1 ; μ₀ ← 0
– for i in 0..n−1
– μ₀ ← μ₀ + (1 − μ₀)(1 − μ^(1/(n−i))), fresh μ ← u[0..1]
– while w < μ₀ : j ← j + 1 ; w ← w + wⱼ
– select particle j
● O(m + n), but the variate-update line is expensive (one pow per sample)
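A sketch of the algorithm in Python (my variable names; weights assumed normalized): the order statistics are generated smallest-first, so one sweep over the cumulative weights suffices. The pow in the variate line is the cost the next slides attack.

import random

def merge_resample(w, n):
    # Draw n indices weighted by w in O(m + n); output comes out sorted
    out = []
    mu = 0.0
    acc, j = w[0], 0
    for i in range(n):
        # Next-smallest of the remaining n - i uniforms, given all lie above mu
        mu += (1.0 - mu) * (1.0 - random.random() ** (1.0 / (n - i)))
        while acc < mu and j < len(w) - 1:
            j += 1
            acc += w[j]
        out.append(j)
    return out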
The “Ziggurat Method”
● Marsaglia and Tsang 2000; accelerated “rejection method”
● Works directly on the PDF; no integration or inversion required
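For intuition only, a plain (unaccelerated) rejection sampler; the ziggurat gets its speed by covering the PDF with precomputed equal-area layers so most draws skip the PDF evaluation entirely. This sketch is not the ziggurat itself.

import random

def rejection_sample(pdf, lo, hi, pdf_max):
    # Draw uniformly from the bounding box until a point lands under the PDF
    while True:
        x = random.uniform(lo, hi)
        if random.uniform(0.0, pdf_max) <= pdf(x):
            return x

# Example target: the leftmost-variate density p(x) = n(1 - x)^(n-1) on [0, 1],
# whose maximum is n (at x = 0)
n = 10
x = rejection_sample(lambda t: n * (1.0 - t) ** (n - 1), 0.0, 1.0, float(n))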
Ziggurat for the Whole Family
● Problem: (1 − μ)^n is parameterized on n
– We don't want to build n ziggurats
● Solution: Squint
– For large enough n, the curves are similar
– In fact, they look a lot like the compounding curve: rescale μ and substitute to get (1 − μ/a)^(an)
● Notice: lim a→∞ (1 − μ/a)^(an) = e^(−μn)
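A quick numeric illustration of that limit (my check, not from the original slides):

import math

# (1 - mu/a)^(a n) approaches e^(-mu n) as a grows
mu, n = 0.3, 10
for a in (10, 100, 1000):
    print(a, (1.0 - mu / a) ** (a * n), math.exp(-mu * n))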
A Slight Expansion
● For small i, use the direct method
● For larger i, use the expanded Ziggurat
My Trick Doesn't Work
● Need equal area under the curve
● Two years, a published paper, technical talks: no one caught me
The Standard Trick
● Give up on simulating naïve sampling as too hard and unnecessary
● Just sample at regular intervals (sketch below)
– normalize wⱼ ; w ← w₁ ; j ← 1 ; μ₀ ← μ / n for one μ ← u[0..1]
– while μ₀ ≤ 1
– while w < μ₀ : j ← j + 1 ; w ← w + wⱼ
– select particle j
– μ₀ ← μ₀ + 1/n
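The same trick as a Python sketch (my names; weights assumed normalized): one uniform draw total, O(m + n), no pow.

import random

def systematic_resample(w, n):
    # Draw n indices weighted by w using one random offset and regular steps
    out = []
    mu = random.random() / n
    acc, j = w[0], 0
    for _ in range(n):
        while acc < mu and j < len(w) - 1:
            j += 1
            acc += w[j]
        out.append(j)
        mu += 1.0 / n
    return out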
Performance Results
● Fast general-purpose BPF: 2.5Mu/s!
● Good results on vehicles
Machine Learning??
● “It's a lovely talk, but why this class?”
● Clearly AI: Attempt to infer from
incomplete, incorrect data and models
● Opportunities to apply ML to BPF
– Infer sensor model from measurements
– Discover system failures
● Opportunities to apply BPF to ML
– Condition model data
– Find underlying probabilities
Acknowledgments and Availability
● Thanks to Prof. James McNames and Prof. Gerardo Lafferriere
● Thanks to Jules Kongslie, Jamey Sharp, and Josh Triplett
● Thanks to BSP and PSAS students
● Thanks for the chance to talk!
● Paper is available w/ GPL code at http://wiki.cs.pdx.edu/bartforge/bmpf