
Particle Filter Tracking Architecture

for use Onboard Unmanned Aerial Vehicles


A Thesis
Presented to
The Academic Faculty
by
Ben T. Ludington
In Partial Fulfillment
of the Requirements for the Degree
Doctor of Philosophy
School of Electrical and Computer Engineering
Georgia Institute of Technology
December 2006
Particle Filter Tracking Architecture
for use Onboard Unmanned Aerial Vehicles
Approved by:
Dr. George Vachtsevanos, Advisor
School of Electrical and Computer Engineering
Georgia Institute of Technology
Dr. Bonnie Heck Ferri
School of Electrical and Computer Engineering
Georgia Institute of Technology
Dr. Patricio Vela
School of Electrical and Computer Engineering
Georgia Institute of Technology
Dr. Anthony Yezzi
School of Electrical and Computer Engineering
Georgia Institute of Technology
Dr. Eric Johnson
Daniel Guggenheim School of Aerospace Engineering
Georgia Institute of Technology
Date Approved: November 10, 2006
ACKNOWLEDGEMENTS
This work was financially supported by both the School of Electrical and Computer Engineering at Georgia Tech and DARPA's HURT project.
This work would not have been possible without the advice, guidance, and mentoring
of my advisor, Professor George Vachtsevanos. The many conversations we have had about
this work over the past years have been invaluable. Professor Eric Johnson, director of
the Georgia Tech UAV Research Facility, provided additional guidance and support with
simulation and flight testing. I also thank the other members of my committee, Professor
Bonnie Heck, Professor Patricio Vela, and Professor Tony Yezzi.
Several students and co-workers were also instrumental in the completion of this work:
Johan Reimann, Dr. Graham Drozeski, Dr. Suresh Kannan, Nimrod Rooz, Phillip Jones,
Wayne Pickell, Henrik Christophersen, Dr. Liang Tang, Alan Wu, Sharon Lawrence and
others.
Finally, my fiancée, Leah, helped me in so many ways. I do not want to think about
doing this without her being there every step of the way.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
NOMENCLATURE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
SUMMARY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1 INTRODUCTION AND MOTIVATION . . . . . . . . . . . . . . . . . . . 1
1.1 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Particle Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Closing the Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 General Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Other Application Domains . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Organization Of This Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 PARTICLE FILTER BACKGROUND . . . . . . . . . . . . . . . . . . . . 8
2.1 Problem Description and Bayesian Tracking . . . . . . . . . . . . . . . . . 8
2.2 Other State Estimation Tools . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Particle Filter Fundamentals . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 Model Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.2 Measurement Model and Data Fusion . . . . . . . . . . . . . . . . . 14
2.5 Computational Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6 Performance Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.6.1 Measurement Residuals . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6.2 Kullback-Leibler Distance . . . . . . . . . . . . . . . . . . . . . . . 18
2.6.3 Change Detection Approaches . . . . . . . . . . . . . . . . . . . . . 19
3 BASELINE PARTICLE FILTER FOR VISUAL TRACKING . . . . . 20
3.1 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1.1 System Update Model . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1.2 Measurement Model . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4 PARTICLE FILTER ADAPTATION . . . . . . . . . . . . . . . . . . . . . 33
4.1 System Update Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Measurement Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Efficiency Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4 Performance Estimate Using a Multi-Layer Perceptron Neural Network . . 37
4.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5 TARGET TRACKING BACKGROUND . . . . . . . . . . . . . . . . . . 48
5.1 Visual Servoing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.1.1 Position-Based Visual Servoing . . . . . . . . . . . . . . . . . . . . 49
5.1.2 Image-Based Visual Servoing . . . . . . . . . . . . . . . . . . . . . 50
5.2 Visual Servoing and Unmanned Aerial Systems . . . . . . . . . . . . . . . 51
6 CLOSED LOOP TRACKING IMPLEMENTATION . . . . . . . . . . . 55
6.1 GTMax Unmanned Helicopter . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2 Coordinate System Transformation . . . . . . . . . . . . . . . . . . . . . . 58
6.3 Linear Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.4 Camera Command and Waypoint Generation . . . . . . . . . . . . . . . . 62
7 RESULTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.1 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.1.1 Tracking With No Obstructions . . . . . . . . . . . . . . . . . . . . 68
7.1.2 Tracking With Obstructions . . . . . . . . . . . . . . . . . . . . . . 70
7.2 Flight Test Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
8 CONCLUSIONS AND POSSIBLE FUTURE WORK . . . . . . . . . . 82
8.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
8.2 Possible Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
8.2.1 Further Comparisons to Other Techniques . . . . . . . . . . . . . . 84
8.2.2 Target Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.2.3 Multiple Aircraft . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.2.4 Multiple Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
8.2.5 Non-Steerable Cameras . . . . . . . . . . . . . . . . . . . . . . . . . 86
APPENDIX A PROOF OF MOTION METRIC . . . . . . . . . . . . . 87
APPENDIX B A NOTE ABOUT UPDATING A PORTION OF THE
PARTICLE SET AND CONVERGENCE . . . . . . . . . . . . . . . . . . 89
APPENDIX C PARTICLE FILTER RESULT MOVIES . . . . . . . . 92
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
RELATED PUBLICATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
LIST OF TABLES
1 Parameters used for the baseline particle filter. . . . . . . . . . . . . . . . . 27
2 Ranges of parameters used for the adaptive particle filter. . . . . . . . . . . 38
3 Summary of the case study comparisons. . . . . . . . . . . . . . . . . . . . . 43
4 Characteristics of the Yamaha RMax airframe. . . . . . . . . . . . . . . . . 56
5 GTMax Baseline Avionics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
LIST OF FIGURES
1 Frame from the GTMax unmanned helicopter. Due to clutter and the limited
resolution of the onboard camera, it can be difficult for both human operators
and traditional automated systems to estimate the state of the target, which
is a sports utility vehicle near the center of the frame. This work seeks to
provide an automated solution that will track such a target under similar
tracking conditions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Vehicle tracking movie taken from the GTMax unmanned helicopter. Like
Figure 1, this movie illustrates the difficulties encountered in tracking a vehi-
cle from an aerial platform. This is a movie; in electronic copies of the thesis,
click on the image to play the movie. . . . . . . . . . . . . . . . . . . . . . . 3
3 Graphical overview of the system. . . . . . . . . . . . . . . . . . . . . . . . . 4
4 Typical image captured from a camera mounted on a UAV. . . . . . . . . . 10
5 Measurement likelihood distribution. The likelihood distribution was gener-
ated using color and motion cues. . . . . . . . . . . . . . . . . . . . . . . . . 11
6 Desired particle filter output when attempting to track a soldier in an urban
environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
7 Typical frame from the GTMax unmanned helicopter. The sports utility
vehicle near the center of the frame is the target of interest. . . . . . . . . . 23
8 The right side shows the difference pixels of the left side image. Notice the
number of difference pixels in the vicinity of the target. . . . . . . . . . . . 24
9 System diagram of the automatic initialization routine. . . . . . . . . . . . . 26
10 Three typical outputs of the automatic initialization routine. In all cases,
the majority of the particles are placed in the vicinity of the target. . . . . 27
11 Frames 8, 23, 38, 53, 68, and 86 of the baseline particle filter output used
to track a soldier in an urban environment. The output movie is included in
Appendix C, Figure 55. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
12 Tracking error of the baseline particle filter used to track a soldier in an urban
environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
13 Processing time of the baseline particle filter used to track a soldier in an
urban environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
14 Frames 32, 95, 158, 222, 285, and 348 of the baseline particle filter output
used to track a SUV in a rural environment. The output movie is included
in Appendix C, Figure 56. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
15 Tracking error of the baseline particle filter used to track a SUV in a rural
environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
16 Frames 17, 50, 83, 116, 149, and 182 of the baseline particle filter output
used to track a van in an urban environment. The output movie is included
in Appendix C, Figure 57. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
17 Tracking error of the baseline particle filter used to track a van in an urban
environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
18 Frames 8, 23, 38, 53, 68, and 86 of the adaptive particle filter output used
to track a soldier in an urban environment. The output movie is included in
Appendix C, Figure 58. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
19 Tracking error of the adaptive particle filter used to track a soldier in an
urban environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
20 Processing time of the adaptive particle filter used to track a soldier in an
urban environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
21 Number of particles used by the adaptive particle filter to track a soldier in
an urban environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
22 Output of the neural network performance estimate used by the adaptive
particle filter to track a soldier in an urban environment. . . . . . . . . . . . 43
23 Frames 32, 95, 158, 222, 285, and 348 of the adaptive particle filter output used
to track a truck in a rural environment. The output movie is included in
Appendix C, Figure 59. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
24 Tracking error of the adaptive particle filter used to track a SUV in a rural
environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
25 Output of the neural network performance estimate used by the adaptive
particle filter to track a SUV in a rural environment. . . . . . . . . . . . . 45
26 Number of particles used by the adaptive particle filter to track a SUV in a
rural environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
27 Frames 17, 50, 83, 116, 149, and 182 of the adaptive particle filter output used
to track a van in an urban environment. The output movie is included in
Appendix C, Figure 60. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
28 Tracking error of the adaptive particle filter used to track a van in an urban
environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
29 Output of the neural network performance estimate used by the adaptive
particle filter to track a van in an urban environment. . . . . . . . . . . . . 47
30 Number of particles used by the adaptive particle filter to track a van in an
urban environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
31 System diagram of a position-based visual servoing system. . . . . . . . . . 49
32 System diagram of an image-based visual servoing system. . . . . . . . . . . 50
33 GTMax unmanned research helicopter . . . . . . . . . . . . . . . . . . . . . 55
34 Axis 213 camera. This camera is used onboard the GTMax. . . . . . . . . . 58
35 Screenshot of the GTMax simulation tools. The scene window on the right
shows the simulated camera view that can be used for tracking testing. . . . 59
36 Simplified geometry used for coordinate system transformation. The particle
filter yields e_h and e_v, which are used to find ψ_e and θ_e. . . . . . . . . . . . . 60
37 Waypoint generation, case one. No obstructions are known. Therefore, the
next waypoint will be in the direction of the target and the existing GTMax
software is used. The gray circle represents the reachable points. The same
result will occur even if the target is outside of the reachable points. . . . . 64
38 Waypoint generation, case two. A linear obstruction is known and the lines
of sight are not reachable. Therefore, the next waypoint is the nearest point
on one of the lines of sight. The dashed circle represents the reachable points. 65
39 Waypoint generation, case three. A linear obstruction is known and the lines
of sight are reachable. Therefore, the next waypoint is the reachable point
on one of the lines of sight that is nearest the target. The dashed circle
represents the reachable points, and the gray region represents the reachable
points that are not occluded. . . . . . . . . . . . . . . . . . . . . . . . . . . 66
40 Target used to test the particle filter in simulation. The left image was taken
near the target, while the right image was taken from an altitude of 150 feet. 68
41 Simulated paths of the target, helicopter, and commanded camera position. 69
42 Simulated path of the target, and particle filter output. . . . . . . . . . . . 70
43 Simulated paths of the target, helicopter, commanded camera position, and
particle filter output. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
44 Position of the target in the image. . . . . . . . . . . . . . . . . . . . . . . . 72
45 Moving average of the tracking error, performance estimator, and number of
particles. The window for the moving average is 50 frames. . . . . . . . . . 73
46 Simulated paths of the target, helicopter, and commanded camera position.
The obstruction is represented by the thick black line. . . . . . . . . . . . . 74
47 Simulated path of the target, and particle filter output. The obstruction is
represented by the thick black line. . . . . . . . . . . . . . . . . . . . . . . . 75
48 Simulated paths of the target, helicopter, commanded camera position, and
particle filter output. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
49 Position of the target in the image. . . . . . . . . . . . . . . . . . . . . . . . 77
50 Moving average of the tracking error, performance estimator, and number of
particles. The window for the moving average is 50 frames. . . . . . . . . . 78
51 Frames collected during particle filter flight testing. . . . . . . . . . . . . . . 79
52 Frames collected during adaptive particle filter flight testing. The movie
collected during this portion of the flight is included in Appendix C, Figure 61. 80
53 Flight test paths of the GTMax, commanded camera position, and particle
filter output. The turn of the target in the plot corresponds to the turn in
Figure 51. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
54 Output of the particle filter in the image. . . . . . . . . . . . . . . . . . . . 81
55 Output of the baseline particle filter used to track a soldier in an urban
environment. This is a movie; in electronic copies of the thesis, click on the
image to play the movie. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
56 Output of the baseline particle filter used to track a SUV in a rural environ-
ment. This is a movie; in electronic copies of the thesis, click on the image
to play the movie. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
57 Output of the baseline particle filter used to track a van in an urban environ-
ment. This is a movie; in electronic copies of the thesis, click on the image
to play the movie. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
58 Output of the adaptive particle filter used to track a soldier in an urban
environment. This is a movie; in electronic copies of the thesis, click on the
image to play the movie. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
59 Output of the adaptive particle filter used to track a SUV in a rural environ-
ment. This is a movie; in electronic copies of the thesis, click on the image
to play the movie. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
60 Output of the adaptive particle filter used to track a van in an urban en-
vironment. This is a movie; in electronic copies of the thesis, click on the
image to play the movie. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
61 Onboard imagery collected during flight testing of the adaptive particle filter.
This is a movie; in electronic copies of the thesis, click on the image to play
the movie. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
NOMENCLATURE
Mathematical Symbols
a_p    predictor coefficients
A_k^i    area of particle i at time k, in pixels²
DCM_{c,v}    direction cosine matrix for rotation from the camera frame to the vehicle frame
DCM_{v,i}    direction cosine matrix for rotation from the vehicle frame to the inertial frame
e_h    signed difference between the target's horizontal frame position and the frame center
e_v    signed difference between the target's vertical frame position and the frame center
FOV_h    horizontal angular field of view width
FOV_v    vertical angular field of view width
h_k^i    {h_{k,j}^i, j = 1, ..., N_B}, histogram of particle i at time k
h_ref    {h_{ref,j}, j = 1, ..., N_B}, reference histogram
J_i    image Jacobian
M_k^i    sum of the difference pixels within particle i at time k
N_{pix,h}    number of pixels in the horizontal dimension
N_{pix,v}    number of pixels in the vertical dimension
N(x, σ²)    Gaussian distribution with mean x and covariance σ²
N_S    number of particles (samples)
p    predictor order, manipulator position
v_k    measurement noise random variable at time k
w_k^i    weight of particle i at time k
w_k^{i,C}    color weight of particle i at time k
w_k^{i,M}    motion weight of particle i at time k
w_k^{i,S}    smoothness weight of particle i at time k
w_k    system noise random variable at time k
x_c    target position vector in the camera frame
x_k    state at time k
x_i    target position vector in the inertial frame
x_k^i    particle i at time k
x_{k,1}^i    horizontal position of particle i at time k
x_{k,2}^i    vertical position of particle i at time k
x_t    target position coordinate
x̂_t    predicted target ground position
y_k    measurement at time k
y_{1:k}    {y_i, i = 1, ..., k}, set of measurements from time 1 to time k
y_k^C    color measurement at time k
y_k^M    motion measurement at time k
y_k^S    smoothness measurement at time k
z_k^i    randomly selected high-motion pixel
α    time discount coefficient
β_RW    adaptable weighting term between 0 and 1
λ    image-based visual servoing gain
φ_v    vehicle roll
ψ_c    camera pan
ψ_e    pan error
ψ_v    vehicle heading
σ_C    adaptable term for the color likelihood model
σ_j    adaptable term for the jump portion of the system update model
σ_M    adaptable term for the motion likelihood model
σ_RW    adaptable term of the random walk portion of the system update model
σ_S    adaptable term for the smoothness likelihood model
θ_c    camera tilt
θ_e    tilt error
θ_v    vehicle pitch
ξ    feature vector
ξ_d    goal feature vector
Acronyms and Abbreviations
DARPA Defense Advanced Research Projects Agency
ELL Expected Log Likelihood
FOV Field Of View
fps frames per second
GCS Ground Control Station
GPS Global Positioning System
HSV Hue Saturation Value
ISR Intelligence, Surveillance, and Reconnaissance
KLD Kullback-Leibler Distance
pdf Probability Density Function
SUV Sports Utility Vehicle
UAV Unmanned Aerial Vehicle
UAVRF Unmanned Aerial Vehicle Research Facility
SUMMARY
Unmanned Aerial Vehicles (UAVs) have become powerful intelligence, surveillance, and
reconnaissance (ISR) tools for both military and civilian organizations. Providing real-time
imagery of a moving ground target is a typical ISR mission that UAVs are well suited for,
because they are capable of placing sensors at unique vantage points without endangering
a pilot. However, performing the mission can be burdensome for the operator. To track
a target, the operator must estimate the position of the target from the incoming video
stream, update the orientation of the camera, and move the vehicle to an appropriate
vantage point. The purpose of the research in this thesis is to provide a target tracking
system that performs these tasks automatically in real-time.
The first task, which receives the majority of the attention in this thesis, is estimating the position of the target within the incoming video stream. This is a state estimation problem, where the goal is to estimate the probability that an object of interest is at a particular point in the image. Because of the inherent clutter in the imagery, the probability distributions are typically non-Gaussian and multi-modal. Therefore, classical state estimation techniques, such as the Kalman filter and its variants, are unacceptable solutions. The particle filter has become a popular alternative since it is able to approximate the multi-modal distributions using a set of samples.
While the particle filter has been used previously for visual state estimation, there are two substantial challenges in using a particle filter onboard an autonomous vehicle. The first challenge is initializing the filter. Many previous research efforts have either initialized the particles according to a uniform distribution or relied on manual particle placement. To increase the autonomy of the filter without wasting computational resources, this thesis utilizes an automatic initialization routine that takes advantage of a priori knowledge. The second challenge is managing the computational burden of the particle filter to allow the filter to run in real-time using the modest computational resources available onboard a UAV. Previous research efforts have typically addressed this problem by changing the number of particles based on filter performance. A similar approach is taken here. However, instead of only changing the number of particles, the model parameters of the filter are adapted as well to improve tracking performance. Additionally, a neural network is used to fuse various filter performance measures instead of only using one. These changes to the particle filter are shown to improve the performance of the filter in two test cases. In a third test case, the adaptive particle filter performed approximately the same as a baseline filter. However, the adaptive filter did not have to be tuned.
Once the position of the target is estimated in the frame, a set of rotation matrices
and a linear predictor are used to estimate the future target location. The predicted target
location is used by the camera command and waypoint generator. The camera command
generator utilizes the vehicle attitude information to determine the angles at which the
camera should be pointed. The waypoint generator determines where the vehicle should
travel to observe the target, and uses three factors. First, the vehicle should move very
smoothly to reduce the apparent jumps of the target. Second, the vehicle should move
closer to the target to ensure the target is large enough in the image. Finally, the vehicle
should try to keep obstructions from coming between the target and the sensor.
This thesis has contributed to the field by providing the following:
• A closed-loop, visual target tracking architecture for use onboard an unmanned helicopter that operates in real-time.
• An adaptive particle filter for visual state estimation.
• A novel automatic initialization routine for the adaptive particle filter.
• A neural network performance estimator that fuses various performance measures together to estimate the overall performance of the filter.
• A methodology for adjusting the parameters of the particle filter using the output of the neural network performance estimator.
• A camera command and waypoint generator for target tracking applications.
Through offline, simulation, and flight testing, these contributions were shown to provide a powerful visual tracking system for use onboard the GTMax unmanned research helicopter. The flight test results were generated as part of the Defense Advanced Research Projects Agency's (DARPA) Heterogeneous Urban Reconnaissance, Surveillance, and Target Acquisition (RSTA) Team (HURT) program.
CHAPTER 1
INTRODUCTION AND MOTIVATION
Unmanned aerial vehicles (UAVs) have become valuable as surveillance tools since they are capable of placing sensors in unique locations without endangering a human operator [66, 67]. Small UAVs have been used extensively in Operation Enduring Freedom and Operation Iraqi Freedom to provide a real-time view of the battlespace [49]. UAVs are also beginning to be used in civil operations such as border enforcement. While it has been reported that small UAVs are effective tools to see around the corner, they currently do not possess enough autonomy to provide continuous imagery of a moving ground target without placing a heavy burden on the vehicle operator. In fact, larger UAVs, such as the General Atomics Predator, require a sensor operator in addition to the pilot to manually control the orientation of the sensing payload.¹

¹ While the work described in this thesis can be used to track targets in any video stream, it does not address the geolocation that would be required to track targets from the altitudes normally used by larger UAVs. Instead, it focuses on smaller UAVs that fly at lower altitudes where geolocation is not required for target localization.
When a vehicle operator attempts to track a moving ground target, he must perform
three tasks concurrently. First, he must interpret the incoming video stream to estimate
the position of the target. Second, he must update the orientation of the camera to keep
the moving target near the center of the sensor's field of view (FOV). Finally, he must
update the desired position of the vehicle to ensure it remains in the vicinity of the target.
This research seeks to develop an automated system that performs these three tasks with
minimal operator input.
The goal of the video interpretation portion of the tracking problem is to estimate the
position of the target within each frame of video. This portion of the problem can be
very challenging because of the inherent clutter in the imagery and possibly poor image
quality. An example image that was captured onboard the GTMax unmanned helicopter
is shown in Figure 1, and a movie showing the target moving across the field is shown in Figure 2. In these figures, the sports utility vehicle near the center is the target of interest. Because of the limited resolution of the image, the clutter in the background, and the vibrations and other movements of the camera, it can be difficult for both human operators and automated systems to estimate the state of the target. Automated systems have the additional challenge of processing the plethora of information in real-time. Because of these significant challenges, the automated video interpretation portion of the problem receives the majority of the attention in this research. A closed-loop tracking system that utilizes the target position estimate is included for completeness.
Figure 1. Frame from the GTMax unmanned helicopter. Due to clutter and the limited
resolution of the onboard camera, it can be difficult for both human operators and tradi-
tional automated systems to estimate the state of the target, which is a sports utility vehicle
near the center of the frame. This work seeks to provide an automated solution that will
track such a target under similar tracking conditions.
Once the position of the target is estimated, it can be used to close the loop in both
human and automated systems. The position is used to update the orientation of the
steerable camera and to generate new vehicle waypoints. The camera should be pointed at
the target, and the vehicle should move to an appropriate vantage point to keep the target
in view. However, to ease the difficulty of the video interpretation problem, any camera
and vehicle movements should be kept smooth.
Figure 2. Vehicle tracking movie taken from the GTMax unmanned helicopter. Like
Figure 1, this movie illustrates the difficulties encountered in tracking a vehicle from an
aerial platform. This is a movie; in electronic copies of the thesis, click on the image to play
the movie.
1.1 System Overview
A graphical representation of the automated visual tracking system architecture is shown
in Figure 3. In this architecture, video frames from the camera onboard the unmanned
aircraft, which is assumed to be a helicopter in this research, are sent to a particle filter
for state estimation. The particle filter uses visual cues to estimate the position, or state,
of the target within the image. This information can then be used to estimate the current
ground position of the target. The current target position is used along with past target
positions to predict the future target position. The future target position is used to close
the loop of the tracking problem. The prediction is sent to the camera controller, which
attempts to keep the camera pointed at the specied position. The prediction is also used
for waypoint generation, which updates the position of the helicopter.
Figure 3. Graphical overview of the system.

1.1.1 Particle Filter

As mentioned above, the particle filter is a state estimation tool. Like other Bayesian techniques, the particle filter recursively estimates the probability distribution over the state using information from the previous measurements. Unlike classical techniques such as the Kalman filter [48], the particle filter is adept at estimating non-Gaussian probability distributions that are governed by nonlinear models. It is used to estimate the position of the target within each video frame since the expected distributions are not only non-Gaussian, but also multi-modal because of the image clutter. The filter uses a weighted set of samples, or particles, to perform the estimation. At each time step, the samples are moved in the state space according to a system update model. Then, measurements are taken and the weight of each particle is updated using a measurement model. The weighted average of the particle set is the expected value of the system and is used as the estimated target position.
While the particle filter has been successfully used in many state estimation problems since its introduction in [36], several challenges exist in using the particle filter to visually track ground targets from onboard a UAV. The first challenge is initialization of the filter. Traditionally, the initial particle placement has either been done manually or according to a uniform distribution. Because of the constraints of the problem, neither of these approaches is an acceptable solution. An alternative initialization routine is discussed in Chapter 3. The second challenge is managing the large computational burden of the particle filter so that it can run in real-time using the limited resources onboard the UAV. An adaptive approach is discussed in Chapter 4 that improves the real-time capabilities of the filter while reducing the tracking error.
1.1.2 Closing the Loop
After the target state is estimated, the UAV must respond to keep the target within the
FOV. To do this, a linear predictor is used to estimate the future target position. The
camera onboard the UAV is commanded to point at the future location. The low-level
camera control problem is not addressed in this work.
The future target location is also used for waypoint generation. The goal of waypoint
generation is to determine the appropriate vantage point for the vehicle. While the vehicle
should remain in the vicinity of the target to ensure the target appears large enough in the
imagery, the vehicle must also move smoothly to reduce the apparent jumps of the target in
the incoming video stream. The waypoint generation presented in this thesis also accounts
for known linear obstructions such as a building wall or a line of trees. Both the camera command generator and the waypoint generator are discussed in Chapter 6.
1.2 General Assumptions
The particle filter state estimation framework is intended to be somewhat general. However, the filter adaptation is reliant on the existence of a dominant cue. In this case, color is assumed to be dominant since previous work shows that in some instances, color can be used as the only cue in a successful visual tracker [19, 24, 58, 64, 68]. This framework also assumes that only one target will be tracked. While the particle filter can be extended to track multiple targets [40, 50, 16, 90, 24], the multi-target tracking problem is not addressed
here.
It is assumed the target is inside the sensor FOV during initialization and is not occluded.
During trajectory generation, any obstructions must be known a priori and be correctly
and conservatively parameterized.
The closed loop tracking system is designed with the GTMax unmanned helicopter in
mind and cannot be directly applied to vehicles with different capabilities. Specifically, the design assumes the vehicle can have zero airspeed and that the camera onboard the vehicle is steerable. Since the camera controller is not addressed in the design, it is assumed that a camera control system exists that keeps the sensor FOV pointed at a specified location.
It is also assumed that the state data of the vehicle as well as the orientation of the camera
are both available.
Since neither a terrain database nor a range-finder is used during tracking, it is assumed that the terrain is flat. If the tracking architecture from this thesis is used in hilly terrain, the camera should still be oriented correctly. However, the trajectory generation will be subject to the inaccuracies of using a flat terrain model. If terrain information is known, it can be used when the particle filter output is projected onto the ground.
1.3 Other Application Domains
While this thesis focuses on using a particle filter for state estimation using visual cues, the adaptation framework is applicable to particle filters that are used for other state estimation problems, including navigation, robot localization, recognition, parameter estimation, etc.
The closed loop tracking system can be applied to other problems in the visual servoing
domain in addition to UAVs.
1.4 Organization Of This Thesis
The thesis is divided into two main parts. Part one, which begins with Chapter 2, discusses the particle filter as a visual state estimation tool, and part two, which begins with Chapter 5, discusses the development of the closed loop tracking system for use onboard an unmanned helicopter. Chapter 2 builds the particle filter from the Bayesian tracking framework and then provides background into other particle filter issues such as computational burden, adaptation, and performance estimation. Chapter 3 discusses the baseline particle filter used for visual tracking including the system update model and measurement model. The automatic initialization routine is also discussed in Chapter 3. Chapter 4 discusses the adaptation of the particle filter. This includes changes to the system model, measurement model, and the number of particles. Since a performance estimate is required for adaptation, it is also discussed in Chapter 4. Particle filter results are presented at the end of Chapter 4. Chapter 5 begins the closed loop tracking portion of the thesis by providing
background information. Chapter 6 discusses the linear predictor, camera orientation com-
mand generator, and waypoint generator after providing details of the GTMax unmanned
helicopter, which was used for testing. Chapter 7 presents simulation and flight test results
of the closed loop tracking system. Finally, Chapter 8 concludes the thesis and suggests
future extensions.
CHAPTER 2
PARTICLE FILTER BACKGROUND
The first and most difficult step in the visual tracking system is estimating the position or
state of the target using information from the incoming video stream. The work in this
thesis is focused on exploiting imagery from an unmanned helicopter. Therefore, the state
estimation tool must be able to track the target in non-ideal video. This chapter discusses
the state estimation problem in a general context and introduces the particle filter.
2.1 Problem Description and Bayesian Tracking
The goal of the generic (discrete-time) state estimation problem is to estimate the probability distribution function (pdf) of the state of a Markov process, X = {X_k, k ∈ N}, using all of the previous measurements, Y = {Y_k, k ∈ N\{0}},

\pi_{k|k}(dx_k) = \Pr(X_k \in dx_k \mid Y_{1:k} = y_{1:k}).   (1)

It is assumed that the Markov process is updated by the state transition kernel, K(x_k | x_{k-1}), and the measurements are generated using the measurement function g(y_k | x_k),

\Pr(X_k \in A \mid X_{k-1} = x_{k-1}) = \int_A K(dx_k \mid x_{k-1}), \quad A \subset \mathbb{R}^{n_x},   (2)

\Pr(Y_k \in B \mid X_k = x_k) = \int_B g(dy_k \mid x_k), \quad B \subset \mathbb{R}^{n_y},   (3)

where n_x is the dimension of the state and n_y is the dimension of the measurements.
The optimal, Bayesian solution to the estimation problem is given recursively in two steps, prediction and update. In the prediction step, the state transition kernel is applied to the previous distribution to yield the prior distribution,

\pi_{k|k-1}(dx_k) = \int_{\mathbb{R}^{n_x}} \pi_{k-1|k-1}(dx_{k-1}) \, K(dx_k \mid x_{k-1}).   (4)

Typically, the prediction step increases the variance of the estimate. In the update step, the measurements are incorporated into the prior using the measurement model and Bayes' Rule,

\pi_{k|k}(dx_k) = \frac{g(y_k \mid x_k) \, \pi_{k|k-1}(dx_k)}{\int_{\mathbb{R}^{n_x}} g(y_k \mid x_k) \, \pi_{k|k-1}(dx_k)}.   (5)

Typically, the measurement or innovation step decreases the variance of the estimate. If K(x_k | x_{k-1}) and g(y_k | x_k) are both linear functions and \pi_{k|k}(dx_k) can be completely described using only the mean and second moment (i.e., \pi_{k|k}(dx_k) is Gaussian), Equations 4 and 5 become the very well known Kalman filter [48] equations. However, in general cases, the solution to these equations cannot be given in closed form, and an approximation technique must be used. A brief survey of conventional approximation techniques is given in the next section.
2.2 Other State Estimation Tools
Two methods have become popular for cases when the models are nonlinear. The first method is the extended Kalman filter, which was introduced in [2]. In this method, the system and measurement models are approximated using a first order Taylor series expansion. While this technique can be effective when the nonlinearities are relatively small, large nonlinearities or large model errors can result in prohibitively large estimation errors or even divergence of the filter. The other popular approximation method is the unscented Kalman filter, which was introduced in [46] and [47]. This technique uses a set of sigma points to approximate the mean and covariance of \pi_{k|k}(dx_k). The nonlinear system model is then applied to the points at each time step, and measurements are used as in the traditional Kalman filter to update the distribution. The resulting points form an estimate of the new posterior distribution. While the unscented Kalman filter typically outperforms the extended Kalman filter, it is only able to accurately approximate distributions up to the fourth moment. Therefore, this method could yield large tracking errors when used for visual tracking since the distributions are typically multi-modal.
In cases when \pi_{k|k}(dx_k) is non-Gaussian, the Gaussian sum filter can be used [2]. In this approach, the pdf is decomposed into a sum of Gaussian distributions. Then, each Gaussian distribution is passed through a Kalman filter (or extended or unscented Kalman filter). This results in a bank of Kalman filters. The output of each filter is then weighted and summed. While this filter has been successful in cases when the noise is relatively small, it tends to require numerous reinitializations when the noise increases. These repeated reinitializations may result in filter divergence.
If real-time performance is not required, a grid-based approach can be used. In such an approach, the state space is decomposed into a finite number of discrete states. Then, the probability within each state is evaluated following the two-step process of predict and update using the equations in [4]. This method is typically used when the system is governed by a hidden Markov model as described in [73]. However, as the number of discrete states increases, the required computational burden becomes overwhelming. The burden can be reduced by truncating the state space, but this typically results in a higher estimation error.

While the above techniques work well in many application domains, they all lack robustness to the non-Gaussian, multi-modal distributions that are common in visual state estimation. An example of a typical likelihood distribution for the image in Figure 4 is shown in Figure 5. The target of interest is the sports utility vehicle that is traveling near the tree line. While this distribution only accounts for measurements at one time step, it does show how image clutter results in complicated, non-Gaussian, multi-modal distributions, which the above techniques are unable to accurately approximate. However, the particle filter is effective at estimating distributions of any shape. It will be described in the next section.
Figure 4. Typical image captured from a camera mounted on a UAV.
Figure 5. Measurement likelihood distribution. The likelihood distribution was generated
using color and motion cues.
2.3 Particle Filter Fundamentals
The particle filter uses a set of samples to approximate \pi_{k|k}(dx_k),

\pi_{k|k}(dx_k) \approx \sum_{i=1}^{N_S} w_k^i \, \delta\left(dx_k - x_k^i\right),   (6)

where x_k^i is a particle, or point in the state space, and w_k^i is its corresponding weight. The weights are scaled so their sum is unity. As with the other techniques, the particles are updated in two steps, prediction and update. In the prediction step, the particles are moved in the state space using the system model and sampling from \pi_{k|k-1}(dx_k). Then, after measurements are taken, the weights are updated using the measurement model,

w_k^i \propto g\left(y_k \mid x_k^i\right).   (7)

The particles are resampled at each time step according to their weights. Therefore, the previous weights, w_{k-1}^i, may be neglected.
As the number of particles increases, the sample-based approximation converges to the optimal, Bayesian filter pdf [22, 30]. By using a set of non-deterministic points, the particle filter can accurately approximate non-Gaussian distributions that evolve according to nonlinear models, and the particle filter typically yields less tracking error than Kalman filter based techniques [62].

The particle filter was first used as the bootstrap filter by Gordon, Salmond, and Smith, where it was compared to the extended Kalman filter in a four-dimensional bearings-only tracking problem [36]. The particle filter successfully estimated the state of the nonlinear system, while the extended Kalman filter diverged. Since the initial work, the particle filter has gained popularity in problems that the Kalman filter and its derivatives are not suited for. References [30, 74, 23] describe many radar tracking applications in which particle filters outperform extended Kalman filters because of the presence of nonlinear models and non-Gaussian distributions. Since the particle filter can easily handle multiple hypotheses, it has also been recently used as a robot localization tool [33, 54].
The particle filter became a popular tool for visual state estimation after Isard and Blake used it to track curves in highly cluttered images in what they called the CONDENSATION algorithm [42]. Objects with distinct edges were consistently tracked in images. For example, a particle filter was used to track the edge of a single leaf in a video taken of a bush blowing in the wind. This proved the particle filter's unique ability to process multi-modal, non-Gaussian distributions. This work was extended in 2002 with the implementation of a particle filter that was used to track objects in video using only color cues [68]. The particle set was a group of rectangles that was placed within the image. A histogram was populated for each particle, and these histograms were compared to a reference histogram to generate measurements. While the particle filter performed well, the filter was initialized manually and the filter was unable to process the frames in real-time with modest computational resources.
While the particle filter has been used in numerous application domains, it is by no means a panacea for the state estimation problem. In fact, the baseline particle filter described above is rarely implemented as-is. Usually, modifications to the algorithm are required based on the particular problem the filter is addressing. As Daum and Huang point out, "to say that we are using a particle filter is about as specific as saying that we are sailing a boat, which could mean anything from a dinghy to the Queen Mary" [26]. Some of the algorithmic changes and improvements that are used for tracking targets from a UAV are discussed in the remainder of this chapter.
2.4 Model Adaptation
As in any state estimation problem, the particle filter requires system and measurement models to accurately estimate the state. Since the target motion characteristics and tracking conditions change over time, the models should either change over time, or be robust to the changes.
2.4.1 System Model
As discussed above, the system model is generated from the transition kernel of the Markov process. In the case of the particle filter, the system model is also used to control the location of the particles in the state space. Typically, the system model takes one of four forms: fixed, generated, multiple hypothesis, or guided.

The fixed model case is the simplest. In this case, the transition kernel is known a priori and can be used to directly control the motion of the particles. While this can be a useful tool in simulation, it is typically not used in real applications, since the exact transition kernel is rarely known. If an inaccurate transition kernel is used, the particle filter will diverge since the particles will not be placed in the correct portion of the state space [82].

In the generated model case, the previous filter output is used to generate an online model. This can typically be done using a fixed order autoregressive model as in [68]. An algorithm to generate an autoregressive model can be found in [38]. While this technique is more applicable to real world applications than using a fixed model, it will not accurately estimate the system model of a maneuvering target since the future target dynamics may be statistically independent from the past target dynamics.
When estimating the state of a maneuvering target, a multiple hypothesis or guided model must be used. A multiple hypothesis system model utilizes the sample-based nature of the particle filter and augments the state space to include a system mode identifier. Then, particles in a particular mode are propagated forward using the system model for that mode. The particle filter then blends the hypotheses appropriately using the particle weights [10, 57, 91, 31]. Such an approach not only estimates the state of the system of interest, but it can also estimate the mode in which the system is operating.
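As a rough illustration of the multiple hypothesis idea, the sketch below augments each particle with a discrete mode index and propagates it with the noise level of its current mode. The two modes, their noise levels, and the switching probability are illustrative choices, not the models used in any of the cited trackers or in this thesis.

import numpy as np

def propagate_multiple_hypothesis(particles, modes, rng,
                                  mode_noise=(2.0, 15.0), switch_prob=0.05):
    """Propagate particles whose state is augmented with a discrete mode index.

    particles : (N_S, 2) pixel positions
    modes     : (N_S,) integer mode per particle (0 = small step, 1 = large jump)
    Each particle may switch modes with a small probability, then moves with the
    noise level of its current mode; the measurement weights later blend the
    competing hypotheses.
    """
    n_s = particles.shape[0]
    switch = rng.random(n_s) < switch_prob
    modes = np.where(switch, 1 - modes, modes)        # occasional mode change
    sigma = np.asarray(mode_noise)[modes]             # per-particle noise level
    particles = particles + rng.normal(0.0, 1.0, particles.shape) * sigma[:, None]
    return particles, modes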
When using a guided model, the particle locations are guided by measurements in addition to assumptions about the target dynamics. This method is typically used when complementary cues, such as audio and visual or motion and color, are present. When tracking a speaker, [34] uses sound measurements to guide the particles to locations where the speaker is more likely before taking visual measurements. Conversely, [76] uses the image cues to guide the particles in the system update step and then uses the audio cues to take measurements. Perez et al. use motion cues to guide the particles toward a target in video and then take color measurements [68]. A priori data is used in [51] to guide the particles toward the center of roads when tracking vehicles. Finally, the particle structure itself can be used to guide the motion of the particles, as was done in [77]. Here, the particles are moved toward the distribution peaks in a method similar to mean shift.

The choice of model type is governed by details of the intended application. However, due to the adaptive nature of the multiple hypothesis and guided models, they are typically better suited to handling the changing tracking conditions.
2.4.2 Measurement Model and Data Fusion
After the particles are moved forward in time using the system model, the particle weights are updated using the measurement model, which maps information from the sensors into a target likelihood. When tracking targets visually, the information can be generated from cues in the visual frames such as color, motion, shape, and size. When multiple cues are used, they are typically assumed to be statistically independent. Therefore, weights can be generated from the individual cues and simply multiplied together (and normalized) to yield the total weight. This was done in [16, 55, 69]. In [89], separate particle filters were run for each cue, and the output of the entire filter was a weighted average of the individual filters.
As the tracking conditions change over time, the measurement model should change accordingly. When more than one measurement cue is available, measurement model adaptations typically result in changes in the contributions of the individual cues to the total likelihood. This was first done in [81] for head tracking. The quality of each cue is measured using the residual between the expected value of the combined distribution and the expected value of the distribution made with each cue. Cues with higher qualities make larger contributions to the combined distribution. This approach is extended in [60] and [3] to use the Kullback-Leibler Distance (KLD), which will be discussed later in this chapter, instead of the residuals. A different approach is used in [18] to visually track targets; instead of using residuals to determine how well each cue performed, a heuristic is used. The approach assumes that the cues that measure a large difference between the estimated target position and the surrounding area provide more information to the combined distribution. While this approach blends the cues appropriately, it usually requires a larger number of particles.
When color is used as a tracking cue, its measurement model can also be updated as tracking conditions change. Typically, color measurements are taken by comparing the color histogram of a given particle to a reference histogram. As lighting, pose, and other conditions change, the reference histogram should also be updated. In the past, the reference histogram has been updated by blending the previous reference histogram with the histogram of the highest weighted particle. However, care must be taken to update the reference histogram using only the target histogram and not the histogram of the clutter in the background. An adaptive color model was first used in [85]. In this approach, motion and color measurements are used for visual tracking and the reference histogram changes over time. To ensure the color model is only updated using the target color, the reference histogram is only updated when motion has been detected. In [65], the reference histogram is updated when the highest weighted particle is above a predetermined threshold. This work is extended in [56] to allow the size of the histogram update to be determined by an auxiliary variable in the particle filter. In other words, the particle filter itself determines how much the reference histogram should change when the highest weighted particle is above a threshold.
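A minimal sketch of such a gated reference-histogram update is shown below. The blending factor and the weight threshold are illustrative placeholders, not values taken from the cited works or from the filter developed later in this thesis.

import numpy as np

def update_reference_histogram(h_ref, h_best, best_weight,
                               weight_threshold=0.8, alpha=0.1):
    """Blend the reference histogram toward the best particle's histogram.

    The update is applied only when the highest particle weight exceeds a
    threshold, to avoid adapting the color model toward background clutter.
    """
    if best_weight < weight_threshold:
        return h_ref                       # tracking is uncertain; keep the old model
    h_new = (1.0 - alpha) * np.asarray(h_ref) + alpha * np.asarray(h_best)
    return h_new / h_new.sum()             # keep the histogram normalized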
While a number of different approaches have been taken to update the models used with a particle filter, there seems to be a general consensus that particle filter adaptation leads to improved tracking performance and provides a way to decrease the number of particles. Other methods of managing the particle filter computational burden are discussed in the next section.
2.5 Computational Complexity
While the particle filter has been an effective tool for estimating non-Gaussian distributions, its sample-based structure can require a large computational burden to achieve the desired accuracy. This inherent complexity can make real-time performance of the particle filter a challenge. In the past, the computational burden has typically been addressed in problems when the number of state dimensions is relatively large since the rate of convergence of the filter is dependent on both the number of particles and the dimension of the state space [26]. However, large computational complexity can accompany visual estimation problems in cases where there are very few states because of the large amount of data included in each video frame.
The computational burden is typically managed by adjusting the number of particles used at each time step. KLD has become a popular tool for determining how to make this adjustment. Fox first used KLD to measure the similarity of the sample-based distribution to the true distribution in [32] and extended its use in [33]. Using the bounds derived in [8], Fox determined the number of particles that are required to guarantee a specified accuracy between the distributions. Since the true distribution is not available during actual implementation, the prior distribution is used instead. The KLD-derived bounds are then used to ensure the error between the prior and posterior distributions is within a specified bound. This approach was used in [32, 33, 53, 54] for robot localization problems. However, since calculating the KLD can be somewhat burdensome, it is unclear that its use improves the speed of the particle filter.
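The flavor of this KLD-based sample-size selection can be sketched as follows. The function below follows the general form of the bound used in KLD-sampling, in which the requested number of particles grows with the number of state-space bins that currently contain particles; the bin structure, error bound, and normal quantile are illustrative choices, and the exact derivation in [32, 33] should be consulted for details.

import numpy as np

def kld_sample_size(occupied_bins, epsilon=0.05, z_quantile=2.326,
                    n_min=50, n_max=5000):
    """Approximate number of particles needed for a KLD error bound.

    occupied_bins : k, the number of state-space histogram bins that currently
                    contain at least one particle.
    epsilon       : allowed KL divergence between the sample-based and the
                    true (binned) distribution.
    z_quantile    : upper-tail standard-normal quantile (about 2.326 for 99%).
    """
    k = max(int(occupied_bins), 2)
    # Wilson-Hilferty style approximation of the chi-square quantile,
    # as used in KLD-sampling.
    a = 2.0 / (9.0 * (k - 1))
    n = (k - 1) / (2.0 * epsilon) * (1.0 - a + np.sqrt(a) * z_quantile) ** 3
    return int(np.clip(np.ceil(n), n_min, n_max))

# Example: if the particles currently occupy 30 bins of a coarse grid over the
# image, the filter would request roughly this many particles at the next step.
n_particles = kld_sample_size(occupied_bins=30)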
In another attempt to manage the computational burden of the particle filter, [54] spread the particles out over time-adjacent measurements in addition to using KLD to control the number of particles. A similar approach was taken by [11] without the use of KLD. By spreading the particles over time-adjacent measurements, the particle filter can extract information from high data rate sensors instead of ignoring information that comes in too quickly. As will be discussed later in this thesis, spreading the particles out over time is, in effect, modifying the measurement model. If the modifications to the model are valid, then this change does not affect the convergence of the filter.
Other techniques have also been used to manage the computational burden. In [12], a set of rules is used to make changes to the number of particles. The rules use information from the particle distribution, such as the residuals and the values of the weights, to estimate
the shape of the distribution. When the rules determine that there appears to be a sharp peak in the distribution, the number of particles is decreased. When the peak begins to spread, the number of particles is drastically increased.
change the number of particles. The residuals are used to estimate the performance of two
lters that have a dierent number of particles. If the two lters have similar residuals, then
the lter with fewer particles is used at the next time step. Otherwise, the lter with more
particles is used. Finally, [16] uses the genetic algorithm nature of the particle lter to allow
the lter itself to estimate the number of particles. The number of particles is included as
a state, and the particle lter itself determines the proper value of the parameter.
2.6 Performance Estimation
As discussed above, a key part of managing the computational burden is determining how well the particle filter is estimating the state. Once a performance estimate is made, other parameters of the filter, such as system and measurement model parameters, can be adjusted. However, generating a particle filter performance estimate can be challenging. Unlike the Kalman filter and its offspring, there is no direct tool to determine the estimation error. While the covariance of the particle filter generated distribution can be easily found, it does not have the same meaning as the covariance of the Kalman filter generated distribution, since the particle filter distribution is typically non-Gaussian and can be multi-modal. There are many other measures that can be used to estimate particle filter performance. For example, the maximum value of the weights or the standard deviation of the weights can yield information about the magnitude of the largest peak in the particle filter distribution. The spread of the particles in the state space can also yield performance information. Three other techniques for evaluating the performance of the particle filter are discussed in this section. The first two are methods for comparing pdfs, and the third is an entropy measure of the distribution.
2.6.1 Measurement Residuals
In a carry-over from the Kalman filter, the measurement residuals can be used to estimate the particle filter's performance. The measurement residual is the difference between the expected measurement (from the prior distribution) and the actual measurement. When the filter is accurately estimating the state, the difference between the two is relatively small. In other words, the new measurements do not yield very much new information. In adaptive Kalman filters, the Kalman gain is adjusted to whiten the residual sequence. When the sequence is white, the filter is optimal [35]. As mentioned above, residuals were used for particle filters in [12] and [50]. However, since the particle filter is typically used to estimate non-Gaussian distributions, the residuals, which only examine the means of distributions, may compare distributions incorrectly.
2.6.2 Kullback-Leibler Distance
To compare the entire distribution, instead of just the means, the Kullback-Leibler distance (KLD) can be used. KLD is a common tool for comparing two pdfs, p(x) and q(x) [21]:
$$K\left(p(x), q(x)\right) = \int_x p(x)\,\log\frac{p(x)}{q(x)}\,dx \qquad (8)$$
KLD is never negative and is only zero when p(x) = q(x). However, since it does not satisfy the triangle inequality, it is not a metric. Like residuals, it is expected that the KLD between the prior and the posterior will be relatively small when the filter is accurately estimating the state.
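As an illustration, KLD can be evaluated directly when the two pdfs are represented on a common set of bins (for example, particle weights accumulated into a histogram). The following sketch assumes discrete, normalized histograms; the function name and the small regularization constant are illustrative, not part of the thesis implementation.

```python
import numpy as np

def kld(p, q, eps=1e-12):
    """Kullback-Leibler distance K(p, q) of Equation 8 for discrete pdfs.

    p and q are histograms defined on the same bins; eps guards against
    log(0) when a bin is empty."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# A small KLD between the prior and posterior histograms suggests the new
# measurements carried little surprise, i.e., the filter is tracking well.
print(kld([0.1, 0.4, 0.3, 0.2], [0.05, 0.35, 0.45, 0.15]))
```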
As mentioned above, Fox uses KLD to adapt the number of particles in a particle filter based upon performance, and others have followed suit. However, in these implementations, KLD is used to compare the prior distribution to the posterior. Therefore, there is an underlying assumption that the prior distribution is very accurate. For this assumption to hold, there must be enough a priori information to generate an acceptable system model. Also, Fox pointed out that calculating KLD at each step can be time consuming [33].
2.6.3 Change Detection Approaches
Recently, attention has been given to the change detection problem and its relation to tracking with a particle filter. In the change detection problem, the goal is to approximate when the system model changed. It has recently been used to detect abnormal activity [83]. While it is intended to approximate tracking errors that result from using incorrect system models, it is also useful for measuring the performance of the particle filter.
The idea behind change detection for particle filtering is to find the correct statistics that can be used to alert the system that the filter is losing lock on the target. Vaswani has found that the expected log likelihood (ELL) statistic can detect slow changes [84]. ELL is defined to be the posterior expectation of the negative log likelihood of the state [82]:
$$\mathrm{ELL} = \frac{1}{N_S}\sum_{i=1}^{N_S} -\log\left(p\left(x_k^i\right)\right), \qquad (9)$$
and it is equal to the Kerridge Inaccuracy between the posterior and prior distributions.
Other statistics have also been used for change detection [5], but ELL appears to be the
most successful in detecting slow changes.
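A minimal sketch of the ELL computation is given below. It assumes the particles have just been resampled (so the posterior expectation reduces to a simple average) and that p(x_k^i) denotes the prior/model likelihood evaluated at each particle; the function name and the clipping constant are illustrative.

```python
import numpy as np

def expected_log_likelihood(prior_likelihoods):
    """ELL statistic of Equation 9: average of -log p(x_k^i) over particles.

    prior_likelihoods holds p(x_k^i) for each particle. A sustained rise in
    ELL indicates a slow change, i.e., the filter may be losing lock."""
    p = np.clip(np.asarray(prior_likelihoods, dtype=float), 1e-12, None)
    return float(np.mean(-np.log(p)))
```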
Each of these performance estimation techniques has its strengths and weaknesses. The performance of the tools that compare the prior to the posterior can be degraded if the system update model is inaccurate. However, simpler techniques, such as measuring the distance between the highest weighted particles, may not be able to detect if the particles are very far away from the true state. Therefore, a handful of the estimation techniques are blended in Chapter 4 to take advantage of the strengths of each tool.
CHAPTER 3
BASELINE PARTICLE FILTER FOR VISUAL TRACKING
This chapter and the next discuss the use of an adaptive particle filter for visual tracking. This chapter focuses on the development of the tools required for any particle filter to estimate the state of an object in video, such as the system model, the measurement model, and initialization. The next chapter focuses on improvements to the particle filter that are made through the use of adaptation.
For the remainder of this work, the particle filter will be discussed in the context of estimating the state of a rectangle within an image. Therefore, the state is four dimensional:
$$x_k^i = \{\mathrm{center_{horizontal}},\ \mathrm{center_{vertical}},\ \mathrm{width},\ \mathrm{height}\}. \qquad (10)$$
The output of the particle filter should be the rectangle that describes the position of the target. For example, if the particle filter is used to track a soldier moving in urban terrain, then the desired particle filter output is shown in Figure 6.
Figure 6. Desired particle filter output when attempting to track a soldier in an urban environment
3.1 Models
The performance of the particle filter is heavily dependent on the models used. The system update model should not only describe the motion of the target but also describe the desired evolution of the search space (the term "search space" describes the portions of the state space where particles are located; it is used to emphasize that the filter can only look for the target in regions where particles are placed). The measurement model should gather the desired information from the frame and blend it appropriately. The development of both models is discussed below.
3.1.1 System Update Model
During every step of the particle filter, the system update model is used to move each particle from its previous location in the state space, or frame, to the next location. Since measurements are only taken at particle locations, an appropriate choice of system model is vital for effective tracking. Therefore, the assumptions used in generating the system model must be chosen carefully.
For this work, the system model is generated from two assumptions. First, it is assumed that the target moves smoothly from one time step to the next. Since this work is focused on tracking ground targets, the assumption will hold in most situations as long as the frame rate remains relatively high. This assumption can be put in mathematical terms through the use of a Gaussian random walk model:
through the use of a Gaussian random walk model:
x
i
k
= N
_
x
i
k1
,
RW
_
, (11)
where x
i
k
is the i
th
particle at time k, x
i
k1
is the i
th
particle at time k 1, N (x, ) is a
Gaussian distribution with mean x and covariance .
It is also assumed that there may be instances where the random walk model will not create a diverse search space. Therefore, a jump model is also used. This model draws the particles toward pixels where motion has been detected:
$$x_k^i = N\left(z_k^i,\ \Sigma_{J}\right), \qquad (12)$$
where z_k^i is a randomly selected pixel where motion has been detected. Motion detection is done through a preprocessing stage in which the value (in the hue, saturation, value sense) of each pixel is subtracted from the value of the corresponding pixel in the previous image. If the absolute value of the difference is greater than a predetermined threshold, the pixel is labeled as High Motion and is made available to be drawn when generating the jump model. Each particle uses a randomly selected pixel from the set of all High Motion pixels. The jump model becomes particularly important in cases when the target is emerging from an occlusion.
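A minimal sketch of this preprocessing stage is shown below. It assumes frames are available as HSV arrays and uses an illustrative threshold; the array names and the NumPy-style indexing are assumptions, not the onboard implementation.

```python
import numpy as np

def high_motion_pixels(frame_hsv, prev_hsv, threshold=0.1):
    """Label pixels whose value (V) channel changed by more than a threshold.

    frame_hsv and prev_hsv are HxWx3 arrays with channels (H, S, V) scaled to
    [0, 1]. Returns an (N, 2) array of (row, col) High Motion pixels that the
    jump model of Equation 12 can draw from."""
    diff = np.abs(frame_hsv[..., 2] - prev_hsv[..., 2])
    rows, cols = np.nonzero(diff > threshold)
    return np.stack([rows, cols], axis=1)
```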
The models generated from the two assumptions are blended together through the use
of a convex combination to form the entire system update model:
$$x_k^i = \alpha_{RW}\, N\left(x_{k-1}^i,\ \Sigma_{RW}\right) + \left(1 - \alpha_{RW}\right) N\left(z_k^i,\ \Sigma_{J}\right), \qquad (13)$$
where α_RW is an adaptable term that controls the size of each contribution. It will be discussed further in the next chapter. A similar but static system update model was used by Perez in [69].
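The blended model of Equation 13 can be realized by flipping a biased coin for each particle: with probability α_RW the particle follows the random walk, otherwise it jumps toward a detected-motion pixel. The sketch below assumes diagonal covariances given as per-state standard deviations and reuses the high_motion_pixels helper sketched above; the names, defaults, and the choice to keep the size states near their previous values in the jump branch are illustrative assumptions.

```python
import numpy as np

def propagate(particles, motion_pixels, alpha_rw, std_rw, std_j,
              rng=np.random.default_rng()):
    """Apply the system update model of Equation 13 to an (N, 4) particle array.

    Each particle is [center_h, center_v, width, height]. std_rw and std_j are
    4-vectors of standard deviations (diagonal Sigma_RW and Sigma_J)."""
    new = np.empty_like(particles, dtype=float)
    for i, p in enumerate(particles):
        if rng.random() < alpha_rw or len(motion_pixels) == 0:
            # Gaussian random walk about the particle's previous state.
            new[i] = rng.normal(p, std_rw)
        else:
            # Jump toward a randomly selected High Motion pixel; the size
            # states are perturbed about their previous values (assumption).
            z = motion_pixels[rng.integers(len(motion_pixels))]
            mean = np.array([z[1], z[0], p[2], p[3]], dtype=float)
            new[i] = rng.normal(mean, std_j)
    return new
```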
3.1.2 Measurement Model
Once the particles are moved in the state space, the next step is to take measurements to generate the particle weights. While any cue that is available from the imagery can be used, this work focuses on the use of color and motion along with a smoothness measurement. As can be seen in Figure 7, which was captured from the GTMax unmanned helicopter, shape and edge cues are too unpredictable to be used because of the limited resolution of the frame grabber and the typical operating altitude of the GTMax.
Color has become a popular cue for visually tracking targets. It was the only cue used in [19, 24, 58, 64] and [68]. Typically, color cues perform well when the target is operating in an area of low clutter and when the particles are tightly clustered around the target. When color is used, a histogram is populated for each particle in the set, and it is compared to a reference histogram to form a distance,
$$D_C\left(h_k^i, h_{ref}\right) = \left(1 - \sum_{j=1}^{N_B} \sqrt{h_{k,j}^i\, h_{ref,j}}\right)^{\!1/2}, \qquad (14)$$
Figure 7. Typical frame from the GTMax unmanned helicopter. The sports utility vehicle
near the center of the frame is the target of interest.
where h_k^i = {h_{k,j}^i, j = 1, ..., N_B} is the histogram for particle x_k^i and h_ref = {h_{ref,j}, j = 1, ..., N_B} is the reference histogram. This distance is a metric [19]. To keep the measurement robust to lighting changes, the hue, saturation, value (HSV) color space is used.
Once the distance from the reference is measured, it is common to use an exponential
to determine the corresponding likelihood:
$$p\left(y_k^C \mid x_k^i\right) \propto \exp\left(-\frac{D_C\left(h_k^i, h_{ref}\right)}{\sigma_C^2}\right), \qquad (15)$$
where σ_C^2 is an adaptable term that controls the slope of the exponential curve.
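A sketch of the color measurement of Equations 14 and 15 is given below. It assumes hue-only histograms with N_B bins over the particle's rectangle, which is a simplification of a full HSV histogram; the function name, binning scheme, and defaults are illustrative.

```python
import numpy as np

def color_likelihood(patch_hue, ref_hist, sigma_c, n_bins=16):
    """Color cue: the distance of Equation 14 fed through the exponential of
    Equation 15.

    patch_hue is the hue channel of the pixels inside a particle's rectangle,
    scaled to [0, 1]; ref_hist is the normalized reference histogram."""
    hist, _ = np.histogram(patch_hue, bins=n_bins, range=(0.0, 1.0))
    hist = hist / max(hist.sum(), 1)
    d_c = np.sqrt(max(1.0 - np.sum(np.sqrt(hist * ref_hist)), 0.0))
    return np.exp(-d_c / sigma_c ** 2)
```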
As in [69], motion cues are used in addition to color cues. Like color, motion cues are formed by first generating a distance:
$$D_M\left(x_k^i\right) = \left(1 - \frac{M_k^i}{A_k^i}\right)^{\!1/2}, \qquad (16)$$
where M_k^i is the sum of the absolute values of the difference pixels in particle x_k^i and A_k^i is the area of particle x_k^i. The difference pixels are generated by subtracting the values (in the HSV sense) of the pixels within the particle from the corresponding pixels in the previous image. The difference pixels for an entire image are shown in Figure 8. As can be seen, particles placed around the target will have lower distances than particles placed in other portions of the frame.
Figure 8. The right side shows the difference pixels of the left side image. Notice the number of difference pixels in the vicinity of the target.
The motion measurement distance is also a metric, as shown in Appendix A. As with the color measurements, the distance is used to form the motion likelihood,
$$p\left(y_k^M \mid x_k^i\right) \propto \exp\left(-\frac{D_M\left(x_k^i\right)}{\sigma_M^2}\right), \qquad (17)$$
where σ_M^2 is analogous to σ_C^2.
As pointed out in [4], when the particles are resampled at each time step, the filter becomes sensitive to outliers. To combat this unwanted sensitivity, a smoothness term is also included in the measurement model. This term rewards the particles that remain close to the previous filter output. As with the color and motion measurements, first a distance measure is created,
$$D_S\left(x_k^i\right) = \left[\left(x_{k,1}^i - x_{k-1,1}\right)^2 + \left(x_{k,2}^i - x_{k-1,2}\right)^2\right]^{1/2}, \qquad (18)$$
where x_{k,1}^i and x_{k,2}^i are the horizontal and vertical coordinates of particle x_k^i, respectively, and x_{k-1,1} and x_{k-1,2} are the horizontal and vertical coordinates of the filter output at time k-1, respectively. This is a Euclidean distance measure, and is clearly a metric. The distance is used to form the smoothing likelihood,
$$p\left(y_k^S \mid x_k^i\right) \propto \exp\left(-\frac{D_S\left(x_k^i\right)}{\sigma_S^2}\right), \qquad (19)$$
where σ_S^2 is analogous to σ_C^2 and σ_M^2.
The measurement cues are fused together assuming that they are statistically independent. Therefore, the combined likelihood is given by the product:
$$p\left(y_k \mid x_k^i\right) \propto p\left(y_k^C \mid x_k^i\right)\, p\left(y_k^M \mid x_k^i\right)\, p\left(y_k^S \mid x_k^i\right). \qquad (20)$$
The relative sensitivity of the likelihood equation to the cues depends upon the selection of σ_M, σ_C, σ_S and the operating point on the exponential curve. The sensitivity can be observed through the use of partial derivatives [13]:
$$\frac{\partial\, p\left(y_k \mid x_k^i\right)}{\partial\, D_C\left(h_k^i, h_{ref}\right)} \propto -\frac{1}{\sigma_C^2}\exp\left(-\frac{D_C\left(h_k^i, h_{ref}\right)}{\sigma_C^2}\right) \qquad (21)$$
$$\frac{\partial\, p\left(y_k \mid x_k^i\right)}{\partial\, D_M\left(x_k^i\right)} \propto -\frac{1}{\sigma_M^2}\exp\left(-\frac{D_M\left(x_k^i\right)}{\sigma_M^2}\right) \qquad (22)$$
$$\frac{\partial\, p\left(y_k \mid x_k^i\right)}{\partial\, D_S\left(x_k^i\right)} \propto -\frac{1}{\sigma_S^2}\exp\left(-\frac{D_S\left(x_k^i\right)}{\sigma_S^2}\right) \qquad (23)$$
Since the operating point can be approximated a priori, appropriate choices of σ_C, σ_M, and σ_S can be made based on the expected tracking conditions (the choice of these parameters will be discussed in the next chapter in the context of adaptation). In the past, the choice of these values has been made using trial and error [68, 69].
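Putting the three cues together, the weight of each particle is proportional to the product in Equation 20. The sketch below assumes the per-cue distances have already been computed (for example with the color_likelihood helper above) and simply applies Equations 15, 17, 19, and 20; the names are illustrative.

```python
import numpy as np

def particle_weight(d_color, d_motion, d_smooth, sigma_c, sigma_m, sigma_s):
    """Unnormalized particle weight from Equation 20.

    d_color, d_motion, d_smooth are the distances of Equations 14, 16, and 18
    for one particle; the sigmas control the sensitivity to each cue."""
    return (np.exp(-d_color / sigma_c ** 2) *
            np.exp(-d_motion / sigma_m ** 2) *
            np.exp(-d_smooth / sigma_s ** 2))

# Weights for the whole particle set are then normalized to sum to one
# before resampling.
```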
3.2 Initialization
As with other Bayesian filters, the particle filter must be initialized using some sort of a priori information. There are typically three choices of initialization technique: using a uniform distribution, using manual particle placement, or using measurements [64]. Placing the particles according to a uniform distribution is wasteful of the limited computational resources. Manually placing the particles can increase the operator workload, and can be difficult to do when the camera is located onboard a UAV. Therefore, this work uses an automatic initialization routine that relies on measurements to place the particles in the first frame.
The automatic initialization routine utilizes color, motion, and size information to place the particles. A diagram of the routine is shown in Figure 9. The routine begins by passing two time-adjacent images through color filters. The bands of the filter are derived from the reference histogram, as will be discussed in Chapter 6. The filtered images are then subtracted pixel-wise and the absolute value is taken to yield motion information. If the target is expected to be large (> 25% of the frame size), a coarse image is made to remove noise. The value of each pixel in the coarse image is the sum of the ten pixel by ten pixel region in the differenced image. The resulting image is then sent through a comparator to remove more noise. The final image is then grouped to form candidate regions. Particles are assigned to the candidate regions based on size using a Gaussian distribution. More particles are placed over candidate regions that are nearer the expected target size.
Figure 9. System diagram of the automatic initialization routine.
It should be noted that the goal of the automatic initialization routine is not to identify only one target region. Instead, the goal is to find any number of possible target locations in the first image. Because of the multi-hypothesis nature of the particle filter, it is acceptable for multiple candidate regions to receive particles. After the initialization process, the particles that were placed over incorrect regions will either be replaced by higher weighted particles in the subsequent time steps or be moved toward the true target due to the jump portion of the system model, Equation 13.
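The core of the routine can be summarized in a few steps: color-filter two adjacent frames, difference them, optionally coarsen, threshold, and group the surviving pixels into candidate regions. The sketch below is a simplified version of that pipeline; the SciPy labeling call, the thresholds, and the Gaussian size weighting are assumptions made for illustration.

```python
import numpy as np
from scipy import ndimage

def candidate_regions(frame, prev, color_mask_fn, thresh=0.1, expected_area=400.0):
    """Return candidate target regions and a weight favoring the expected size.

    frame and prev are HSV images in [0, 1]; color_mask_fn zeroes pixels whose
    color falls outside the bands derived from the reference histogram."""
    diff = np.abs(color_mask_fn(frame) - color_mask_fn(prev))[..., 2]
    binary = diff > thresh                     # comparator stage
    labels, _ = ndimage.label(binary)          # group into candidate regions
    regions = ndimage.find_objects(labels)
    weights = []
    for sl in regions:
        area = (sl[0].stop - sl[0].start) * (sl[1].stop - sl[1].start)
        # More particles go to regions whose area is near the expected size.
        weights.append(np.exp(-((area - expected_area) ** 2) /
                              (2 * expected_area ** 2)))
    return regions, np.array(weights)
```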
A sample of outputs from the automatic initialization routine is shown in Figure 10. The particles are represented by the red and blue rectangles. The ten highest weighted particles are shown using blue rectangles, while the remaining particles are shown in red. In all three cases, the majority of the particles are placed in the vicinity of the target. The incorrectly placed particles all move toward the target in subsequent frames.
Figure 10. Three typical outputs of the automatic initialization routine. In all cases, the
majority of the particles are placed in the vicinity of the target.
3.3 Results
For comparison purposes, the baseline particle filter described in this chapter was implemented and tested on multiple test videos. Results from three of the videos are described here as case studies. The first video shows a soldier maneuvering in an urban environment. The video was taken using a professional-quality, hand-held camcorder from a rooftop. The second video shows a sports utility vehicle (SUV) driving in a rural setting. The video was taken using the camera onboard the GTMax unmanned helicopter (the helicopter will be discussed further in Chapter 6). The third video shows a large van operating in an urban environment. The video was taken using the camera onboard the GTMax2 unmanned helicopter, which is very similar to the GTMax. All of the videos for the baseline filter were converted to a series of frames at 10 frames per second (fps). Therefore, real-time performance is achieved if the filter processing time is less than 0.1 seconds for all the frames (with the exception of initialization). The baseline particle filter used 75 particles. The model parameters were set using guess and check as described in [69]. The values of the parameters used in each case are shown in Table 1.
Table 1. Parameters used for the baseline particle filter.
                     Soldier Tracking   SUV Tracking      Van Tracking
  α_RW               0.85               0.85              0.85
  Σ_RW and Σ_J       [8 8 1.5 1.5]^T    [8 8 0.5 0.5]^T   [8 8 0.5 0.5]^T
  σ_C                0.1                0.1               0.2
  σ_M                4                  8                 4
  σ_S                10                 40                40
The particle filter output from the soldier tracking case is shown in Figure 11. As will be the case with the other displays of the output, the particles are represented with red and blue rectangles. The ten highest weighted particles are shown using blue rectangles, and the remaining particles are shown in red. The error plot is shown in Figure 12. As with the other error plots, the error was generated by comparing the output of the particle filter to a set of manually selected target positions.
Figure 11. Frames 8, 23, 38, 53, 68, and 86 of the baseline particle filter output used to track a soldier in an urban environment. The output movie is included in Appendix C, Figure 55.
The soldier movie tests the ability of the particle filter to recover from severe target occlusions. Near the midpoint of the image sequence, the soldier leaves the FOV. When executing the filter 50 times, the filter was able to reacquire the target on only 23 of the trials. While the particle filter can outperform other Bayesian techniques, this movie demonstrates that improvements must be made to the algorithm to improve its robustness to occlusions.
The processing time is shown in Figure 13. (When processing time is discussed, it is measured using a Dell Inspiron I6000 laptop computer with an Intel Pentium M processor that has a 1.60 GHz clock speed and 1 GB of RAM. The laptop runs the Microsoft Windows XP Professional operating system.) Near frame 20, the processing time climbs above 0.1 seconds per frame, and real-time performance is not achieved. This is caused by both the particle size and the number of particles. As can be seen in the error plot of Figure 12, the tracking error during this time is relatively low. Therefore, the number of particles could have been reduced and real-time performance could have been achieved. However, the decrease in particles would result in a lower probability that the filter would reacquire the target after the occlusion.
Figure 12. Tracking error of the baseline particle filter used to track a soldier in an urban environment.
The particle filter output from the SUV tracking case is shown in Figure 14, and the tracking error is shown in Figure 15.
The SUV movie tests the ability of the particle filter to track the target in clutter. While the baseline filter estimates the position of the target during most of the sequence, the errors would be unacceptable if the particle filter were used for closed-loop tracking. The error near frame 120 is particularly concerning, since the filter would not have reacquired the target if the target had not moved close to the incorrectly located particle cloud.
Since the target is so small, the baseline particle filter achieved real-time performance. Therefore, the number of particles could have been increased in the hope of reducing the tracking error. However, it was kept at 75 for comparison purposes. Also, the filter could have been tuned further using the guess and check method. However, this process is tedious, and there is not a structured way to adjust the parameters.
Figure 13. Processing time of the baseline particle filter used to track a soldier in an urban environment.
The particle filter output from the van tracking case is shown in Figure 16, and the tracking error is shown in Figure 17.
Like the SUV video, the van video tests the ability of the particle filter to track the target in clutter. Again, the baseline filter correctly estimates the target position during most of the sequence, which shows the ability of the particle filter to track targets in clutter. However, picking the correct model parameters is a difficult and time-consuming process. The next chapter discusses how adaptation can alleviate this problem.
The particle filter described in this section is a baseline particle filter. While the baseline is able to track the target under some circumstances, it does not manage the computational burden, the choice of model parameters must be made a priori, and the parameters cannot be updated. The next chapter discusses how adaptation can be used to improve the performance of the filter.
Figure 14. Frames 32, 95, 158, 222, 285, and 348 of the baseline particle filter output used to track a SUV in a rural environment. The output movie is included in Appendix C, Figure 56.
Figure 15. Tracking error of the baseline particle filter used to track a SUV in a rural environment.
Figure 16. Frames 17, 50, 83, 116, 149, and 182 of the baseline particle filter output used to track a van in an urban environment. The output movie is included in Appendix C, Figure 57.
Figure 17. Tracking error of the baseline particle filter used to track a van in an urban environment.
CHAPTER 4
PARTICLE FILTER ADAPTATION
As demonstrated in the last chapter, the particle filter is a powerful visual tracking tool, but under some circumstances it lacks robustness to occlusions and large amounts of clutter. When tracking a target in these difficult conditions, the filter performance can be improved by changing the filter parameters. However, these changes typically require more particles, which can slow down the filter even when the target is not occluded and is maneuvering away from clutter. In addition to the real-time difficulties, determining the correct filter parameters can be a difficult and time-consuming process, and the process must be redone when the tracking conditions change. Therefore, an automated, online method for filter parameter generation is needed. This chapter discusses a technique to change the filter parameters as the tracking conditions change to improve the particle filter in terms of both tracking error and computational efficiency. This technique is implemented on the baseline particle filter from the last chapter.
4.1 System Update Model
Since the system model controls the location of the particles in the state space, it should not only roughly estimate the motion of the target but also select which portions of the state space should be searched. When using a particle filter to track targets in video, incorrectly placed particles waste valuable processing power and unnecessarily expose the filter to clutter. Therefore, the system update model should be tuned to search the areas where the target is most likely to be.
As discussed in the previous chapter, the system model for the particle filter used in this thesis consists of a random walk portion and a jump portion, Equation 13. A model of this form has three design variables: α_RW, Σ_RW, and Σ_J.
The choice of α_RW controls how much of a contribution the random walk portion of the model will make to the entire model, and is typically made through experimentation [69].
As α_RW approaches unity, the particle motion from one time step to the next is restricted. This is acceptable if the particles are already located around the target. If the target is occluded or if the particles are placed near clutter, α_RW should be reduced to allow the particles to move more from one time step to the next. In other words, if the particles are already near the target, the search space can be reduced. However, if the particles are not already near the target, the multi-hypothesis nature of the particle filter should be exploited to allow the particle filter to search for the target. A method for determining whether the particles are near the target is discussed in a later section of this chapter.
The choice of Σ_RW and Σ_J, which are both four dimensional, also controls how far the particles move from one time step to the next. Therefore, when the particles are near the target, all of the values of both Σ_RW and Σ_J should be reduced. However, when the filter begins to lose lock on the target, they should be increased to increase the number of pixels that are examined in the frame.
Because the particle filter has the ability to find peaks in pdfs, the structure of the particle filter can be used to find appropriate values of the system model parameters [28, 59]. This results in an increase in the dimension of the particles. Thus, the particle from Equation 10 now becomes:
$$x_k^i = \{\mathrm{center_{horizontal}},\ \mathrm{center_{vertical}},\ \mathrm{width},\ \mathrm{height},\ \alpha_{RW},\ \Sigma_{RW},\ \Sigma_{J}\}. \qquad (24)$$
However, the increase in the dimensionality of the particles can result in an increase in the required number of particles [26].
For the particle filter used in this thesis, the system model parameters are included as auxiliary variables in the particle as shown in Equation 24. The value of each of the auxiliary variables is assigned according to a Gaussian distribution. However, to reduce the required number of particles, the mean of the distribution is adjusted based on the performance estimate (discussed in Section 4.4).
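One way to realize Equation 24 is simply to append the model parameters to each particle and redraw them at every step from Gaussians whose means are steered by the performance estimate. The sketch below represents the covariance parameters by a single spread value per particle for brevity; the field layout, adaptation gain, and defaults are illustrative assumptions.

```python
import numpy as np

def augment_and_adapt(particles, param_means, perf, gain=0.1, spread=0.05,
                      rng=np.random.default_rng()):
    """Append (alpha_RW, Sigma_RW, Sigma_J) values to each particle (Eq. 24).

    param_means is a 3-vector of current parameter means; perf is the
    performance estimate in [-3, 3]. A high estimate nudges alpha_RW up and
    the spreads down, so the search space contracts when tracking is good."""
    adapted = param_means + gain * perf * np.array([+1.0, -1.0, -1.0])
    aux = rng.normal(adapted, spread, size=(len(particles), 3))
    return np.hstack([particles, aux]), adapted
```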
4.2 Measurement Model
In addition to placing the particles appropriately, the measurements should be blended properly based on the tracking conditions. As discussed in the previous chapter, the measurement model is made up of three parts: a color likelihood model, a motion likelihood model, and a smoothness likelihood model. A model of this form has three design variables: σ_C, σ_M, and σ_S. The choice of these variables determines the sensitivity of the filter to the three types of measurements.
In [19, 24, 58, 64, 68] color was the only cue used for visual target tracking, and the respective trackers were successful in the cases reported in each. Color is particularly useful when the target is operating in regions where there is little clutter and the particles are placed in the vicinity of the target. Therefore, when the performance estimate (Section 4.4) determines that the filter has an acceptable lock on the target, the sensitivity of the measurement model to color measurements is increased by reducing σ_C. Also, to limit the motion of the particles when the particles are already located around the target, the sensitivity of the measurement model to smoothness measurements is increased by reducing σ_S. Conversely, when the particle filter is beginning to lose lock on the target, the sensitivity of the measurement model to motion measurements is increased by increasing σ_C and σ_S. Unlike the system update model, the measurement model parameters cannot be added to the particle filter as auxiliary variables, since the parameters of the measurement model have a direct effect on the particle weights.
As pointed out in Chapter 2, more robustness can be added to the filter through the use of an adaptive color model [85, 65, 56]. In other words, the tracking performance can be improved by updating the reference histogram when the filter has a strong lock on the target. For the particle filter used in this thesis, the reference histogram is changed through the use of a convex combination. The size of the contribution is determined a priori. The histogram is only updated when the output of the performance estimate of Section 4.4 is above a predetermined value.
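The histogram update amounts to a convex blend between the stored reference and the histogram of the current best particle. The sketch below assumes a fixed blending weight gamma chosen a priori and a performance threshold; both names and values are illustrative.

```python
import numpy as np

def update_reference(h_ref, h_best, perf, threshold=1.5, gamma=0.1):
    """Convex update of the reference histogram, applied only when the
    performance estimate is above a predetermined threshold."""
    if perf < threshold:
        return np.asarray(h_ref, dtype=float)
    blended = (1.0 - gamma) * np.asarray(h_ref) + gamma * np.asarray(h_best)
    return blended / blended.sum()
```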
4.3 Efficiency Improvements
In addition to the model parameters, the number of particles has a direct impact on the performance of the particle filter. Typically, an increase in the number of particles reduces the tracking error, and as the number of particles approaches infinity, the sample-based distribution converges to the distribution of the optimal Bayesian filter [22]. However, increasing the number of particles will also increase the computational burden. Additionally, if there is a sharp peak in the sample-based distribution, a substantial portion of the particle set will be at the same point in the state space, which can waste computational resources.
The challenge in changing the number of particles is determining when the number of particles can be reduced and when the number of particles should be increased. Several methods of making this determination were discussed in Section 2.5. All of the methods attempt to estimate how well the filter is performing. However, there are a number of different tools used to estimate the performance. In all of the cases, when the filter is performing well, the number of particles is decreased, and when the filter is losing lock, the number of particles is increased. For the particle filter used in this thesis, the determination will be made using a neural network, as discussed in Section 4.4.
When the neural network performance estimate determines that the particle filter has a strong lock on the target, the number of particles is decreased. However, when the performance estimate determines that the particle filter is beginning to lose the target, the number of particles is increased in the hope that computational efficiency can be traded for tracking accuracy. As the number of particles changes, the weights of the particles are scaled to ensure that they make the same size contribution to the sample-based distribution.
In addition to changing the number of particles based on the filter performance, the efficiency can also be improved by only updating a portion of the particle set at each time step, as in [11] and [54]. This is only valid if the measurement model can be rewritten as
$$p\left(y_k \mid x_{k-j}^i,\ j\right) \propto \beta^{\,j}\, p\left(y_k^C \mid x_{k-j}^i\right)\, p\left(y_k^M \mid x_{k-j}^i\right)\, p\left(y_k^S \mid x_{k-j}^i\right), \qquad (25)$$
where β is a constant that is less than unity. For the particle filter used in this thesis, it is assumed that the measurement model can be rewritten in this way. In other words, if a target is detected to be at a location during one time step, it is likely to still be at that location during the next time step. The effect of this change on convergence is examined in Appendix B. The number of particles updated at each time step is controlled by the output of the performance estimate, which will be described in the next section.
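The particle-count adjustment itself can be a simple rule driven by the performance estimate, with the weights rescaled so the surviving (or newly added) particles contribute consistently to the sample-based distribution. The sketch below uses illustrative bounds and step sizes; it is not the scheduling used onboard.

```python
import numpy as np

def adjust_particle_count(weights, perf, n_min=30, n_max=150, step=10):
    """Grow or shrink the particle set based on the performance estimate.

    When perf is high, the lowest weighted particles are dropped; when perf
    is low, copies of the highest weighted particles are appended. The
    returned weights are renormalized to sum to one."""
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(weights)          # ascending: weakest particles first
    n = len(weights)
    if perf > 1.0 and n - step >= n_min:
        keep = order[step:]                                  # drop weakest
    elif perf < -1.0 and n + step <= n_max:
        keep = np.concatenate([order, order[-step:]])        # duplicate strongest
    else:
        keep = order
    new_w = weights[keep]
    return keep, new_w / new_w.sum()
```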
4.4 Performance Estimate Using a Multi-Layer Perceptron Neural Network
The adaptations to the particle filter are made based on an estimate of the filter performance. As discussed in the previous chapter, there are numerous measures of filter performance, and they all have their unique strengths and weaknesses. A neural network is used to fuse the various measures together to form a performance estimate.
A neural network is a tool that can generate an input to output mapping using a set of weights, biases, and (possibly nonlinear) activation functions [70]. For the particle filter used here, a multi-layer perceptron neural network is used. The network contains four layers, including the input layer and the output layer. The network is trained offline using a conventional back propagation algorithm.
The input vector to the neural network consists of measurements that are intended to yield information about the shape of the sample-based distribution and the likelihood that the mean of the distribution corresponds to the target position. The following measurements are used: spread of the ten highest weighted particles, maximum value of the weights, maximum value of the weights generated using only the color cues, standard deviation of the weights, standard deviation of the weights generated using only the color cues, the measurement residual, the color measurement residual, and the expected log likelihood. Information from the color measurements is used because it is assumed that the color measurements are more reliable than the motion or smoothness measures.
The training data for the network is generated by manually labeling a large set of particle filter output frames that result from varying tracking conditions. The performance of the filter is estimated by an operator who observes the location of the ten highest weighted particles with respect to the target in a given frame. The operator then assigns a performance estimate to the frame, and it is used to train the network. The neural network is then tested on another set of data.
The output of the neural network is a value between positive and negative three. The range of output was chosen out of convenience, and it could be mapped onto any line segment. The output is then passed to a set of rules that either increase or decrease the parameters of interest. For example, if the output of the neural network is 1.7, the number of particles is decreased and the mean of α_RW is increased, while the means of Σ_RW and Σ_J, as well as the values of σ_C and σ_S, are decreased. If the output of the neural network is above a predetermined threshold, the reference histogram is also updated. The color measurement residual is also used when σ_C is changed. If the color residual is large, this means that the color measurements provided more information about the target location. Therefore, σ_C is decreased further to increase the sensitivity to color measurements.
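A compact way to view this stage is as a small feed-forward network mapping the eight statistics listed above to a scalar in [-3, 3], followed by threshold rules on that scalar. The sketch below builds such a four-layer perceptron with NumPy; the layer widths, tanh activations, and the final scaling are assumptions made for illustration, and the back propagation training on the hand-labeled frames is omitted.

```python
import numpy as np

class PerformanceMLP:
    """Four-layer perceptron: 8 inputs -> two hidden layers -> 1 output in [-3, 3]."""

    def __init__(self, sizes=(8, 12, 6, 1), rng=np.random.default_rng(0)):
        self.weights = [rng.normal(0, 0.1, (m, n))
                        for m, n in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(n) for n in sizes[1:]]

    def __call__(self, stats):
        """stats: the eight distribution statistics used as the input vector."""
        a = np.asarray(stats, dtype=float)
        for W, b in zip(self.weights, self.biases):
            a = np.tanh(a @ W + b)
        return 3.0 * float(a[0])       # scale the tanh output onto [-3, 3]

# The scalar output then drives the parameter rules, e.g.:
#   output > 1  -> fewer particles, larger alpha_RW, smaller sigmas;
#   output < -1 -> more particles, smaller alpha_RW, larger sigmas.
```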
4.5 Results
The improvements discussed in this chapter were implemented on the baseline particle filter from the previous chapter. The filter was then run on the same videos as the baseline to determine how much the adaptation improved the filter performance. The values of the parameters were determined from the results discussed in the baseline case. However, due to the adaptation, the same set of parameters could be used for all three cases. In other words, the adaptation automatically tuned the filter, which allows the filter to be used in cases where the tracking conditions are not known a priori. The range of values used is shown in Table 2.
Table 2. Ranges of parameters used for the adaptive particle filter.
                     All cases
  α_RW               0.6 - 1
  Σ_RW and Σ_J       [2 2 0 0]^T - [15 15 1 1]^T
  σ_C                0.02 - 0.1
  σ_M                4
  σ_S                10 - 40
The output of the particle filter used to track the soldier in an urban environment is shown in Figure 18, and the error plot is shown in Figure 19. The use of adaptation allowed the filter to reacquire the target when he reemerges from the edge of the frame much more consistently. When executing the filter 50 times, the filter was able to reacquire the target on 43 of the trials. This is an improvement of 85% over the baseline.
Figure 18. Frames 8, 23, 38, 53, 68, and 86 of the adaptive particle filter output used to track a soldier in an urban environment. The output movie is included in Appendix C, Figure 58.
The improved performance came along with a decrease in computation time. The computation time is shown in Figure 20, and the number of particles is shown in Figure 21. The decrease in computational burden is due to a decrease in the number of particles when the filter is tracking the target well. In the adaptive case the average number of particles was 71, which is marginally less than the baseline case. Additionally, the parameter adaptation kept the particles from becoming as large as they did in the baseline case. Adaptation decreased the computation time to the point that the adaptive particle filter operates in real-time, unlike the baseline filter.
Figure 19. Tracking error of the adaptive particle filter used to track a soldier in an urban environment.
The output of the neural network performance estimator is shown in Figure 22. When Figure 22 is compared to Figure 19, it is clear that the neural network correctly identified when the target left the field of view (near frame 45), since the output of the network decreased. This decrease caused an increase in the number of particles and more spread in the particles. Also, the importance of color and smoothness measurements was decreased. These changes gave the particle filter a better chance at finding the target again when he reemerged (near frame 60). At this time, the output of the neural network grew, signaling that the filter had a strong lock on the target. This caused the filter parameters to become more efficient without increasing the tracking error.
The output of the particle filter used to track the SUV in a rural setting is shown in Figure 23, and the error plot is shown in Figure 24. In this case, the adaptation reduced the tracking error without increasing the computation time. The adaptation to the reference histogram was particularly useful because of the heavy reliance on color cues.
Figure 20. Processing time of the adaptive particle filter used to track a soldier in an urban environment.
The output of the neural network performance estimator is shown along with its moving average in Figure 25, and the number of particles used is shown in Figure 26. The moving average is shown to make the sustained changes in the performance estimate more visible. For example, when the error increased near frame 50, the neural network output decreased. Like the soldier tracking case, the neural network identified times when the number of particles could be reduced without increasing the error. In this case, the average number of particles was only 61, which is a 19% improvement over the baseline. In both the baseline and the adaptive cases, real-time performance was achieved.
The end of this tracking case motivates the need for closed-loop tracking, which will be discussed in the next two chapters. The target moves very far from the camera. The output of the neural network decreases as the tracking error increases. However, the adaptation cannot generate a suitable set of parameters because the target does not take up enough pixels in the image. If the camera platform had moved closer to the target, the particle filter might have been more successful.
Figure 21. Number of particles used by the adaptive particle filter to track a soldier in an urban environment.
Finally, the output of the particle filter used to track the van in an urban setting is shown in Figure 27, and the error plot is shown in Figure 28.
The output of the neural network performance estimator is shown in Figure 29, and the number of particles used is shown in Figure 30. In this case, the adaptation did not improve the performance in terms of tracking error or computation time. The average tracking error and the average number of particles increased marginally. The average tracking error was 10.2 pixels in the adaptive case and 6.8 in the baseline case, and the average number of particles was 81. While the adaptive filter did not outperform the baseline tracker, it also did not lose the target. This means that the baseline filter was appropriately tuned for the tracking conditions and that the conditions remained relatively static. Additionally, clutter in the form of sidewalks, trees, and buildings kept the adaptive filter from trading off accuracy (in distribution) for efficiency near the end of the sequence. This is evident from examining the output of the neural network performance estimator, Figure 29. In the second half of the movie, the neural network determined that the particle-based distribution was being affected by the clutter. Thus, the number of particles was increased to ensure the error in distribution was kept as low as possible during these difficult tracking conditions.
Figure 22. Output of the neural network performance estimate used by the adaptive particle filter to track a soldier in an urban environment.
The comparison of the particle filter performance is summarized in Table 3. From the table, it is evident that the use of adaptation improved the performance of the filter in the soldier and SUV cases in terms of both tracking error and computational efficiency (compared using the number of particles). As pointed out before, adaptation did not improve the filter performance when the filter was used to track the van. However, the same filter with the same parameters was used in all three cases with the adaptive filter, while the baseline filter was tuned for each case. The baseline filter tuning was done a priori using a trial and error method, which cannot be used in real-world applications. Therefore, even if the performance of the adaptive filter was not superior to the baseline filter, it can be argued that the adaptive filter is better suited for real-world tracking applications.
Table 3. Summary of the case study comparisons.
  Case               Average Tracking Error [pixels]   Average Number of Particles
  Soldier Baseline   39*                               75
  Soldier Adaptive   18                                71
  SUV Baseline       10                                75
  SUV Adaptive       6.5                               61
  Van Baseline       6.8                               75
  Van Adaptive       10.2                              81
  * In the baseline soldier tracking case, the filter did not reacquire the target after it left the FOV.
The remaining chapters of this thesis are focused on using the output of the particle filter for closed-loop tracking onboard a specific unmanned aerial vehicle.
Figure 23. Frames 32, 95, 158, 222, 285, and 348 of the adaptive particle filter output used to track a SUV in a rural environment. The output movie is included in Appendix C, Figure 59.
Figure 24. Tracking error of the adaptive particle filter used to track a SUV in a rural environment.
Figure 25. Output of the neural network performance estimate used by the adaptive particle filter to track a SUV in a rural environment.
Figure 26. Number of particles used by the adaptive particle filter to track a SUV in a rural environment.
Figure 27. Frames 17, 50, 83, 116, 149, and 182 of the adaptive particle filter output used to track a van in an urban environment. The output movie is included in Appendix C, Figure 60.
Figure 28. Tracking error of the adaptive particle filter used to track a van in an urban environment.
Figure 29. Output of the neural network performance estimate used by the adaptive particle filter to track a van in an urban environment.
Figure 30. Number of particles used by the adaptive particle filter to track a van in an urban environment.
CHAPTER 5
TARGET TRACKING BACKGROUND
The target location within the video frame is used as feedback to close the loop on the tracking problem. The idea of using visual data as feedback for a control system is not a new one. In fact, it is becoming more popular in many robotics applications as visual sensors and processors become more available and more powerful. This chapter discusses ways in which visual information can be included as part of a control system, and then briefly discusses how these ideas can be applied to target tracking from aerial vehicles. However, the focus of this thesis is on the previously described image interpretation problem. Thus, the remainder of this chapter is only intended to give a basic introduction to visual servoing and to place the aerial, visual tracking problem in a visual servoing context.
5.1 Visual Servoing
Because of the large amount of information provided by imagery, vision is a valuable tool for guiding robots in uncertain environments. The use of visual feedback was first reported in 1973 [78]. Since then, the research has steadily grown along with the strides made in computer processing power. Presently, vision has been used in applications spanning from manufacturing to aircraft landing [37]. The field has grown so much that there are now several topics within the broad field of vision-based feedback control.
One such topic is known as visual servoing. Visual servoing combines topics from image processing, robot kinematics, mechanical dynamics, control theory, and real-time computing to control the motion of a robotic end-effector using information from a video stream. Visual servoing is popular in many manufacturing applications, such as grasping items on a conveyor belt or part mating, since vision allows the robots to perform the desired tasks under some uncertainty [20]. However, [37] reports that due to a lack of image processing techniques that are able to react to complex and unpredictable situations, moving visual servoing systems from the factory and into the real world is still an active research area.
Visual servoing systems are typically classified into two approaches, position-based and image-based [37, 41]. These two approaches are described in the remainder of this section.
5.1.1 Position-Based Visual Servoing
When a position-based visual servoing approach is used, the vision task is somewhat decoupled from the control task, as shown in Figure 31. The camera is used to estimate the position and pose of both the manipulator and the target in some common coordinate system. This estimation is referred to as image exploitation in Figure 31. A particle filter or some other image processing tool can be used to perform this estimation. Once the relative position between the manipulator and target is estimated, an error signal is generated by subtracting the reference signal from the estimate. This error is used as the input to the inner-loop controller.
Figure 31. System diagram of a position-based visual servoing system.
The position-based approach is well suited for cases in which vision is being added to a system whose inner-loop controller has already been designed and proven, since the vision is only used to generate a new reference signal for the controller. However, the approach does not have a direct method to estimate the range to the target, which is typically needed. While an estimate of target size can yield information about range, it typically requires an excellent computer vision algorithm or very limited circumstances to accurately estimate the target range. Therefore, stereo vision, lidar, and other range measurement techniques must typically be added to position-based systems, which can add to the complexity.
5.1.2 Image-Based Visual Servoing
Instead of estimating the position of the target relative to the manipulator, the image-based approach performs the entire control task in image or feature space, as shown in Figure 32. Instead of using the visual data to estimate the positional error between the target and manipulator, the visual data is used to form an error in feature space. A set of reference or goal features is used to form an error between this reference and the actual features. The interpreter and controller then command the plant to move in a manner that reduces the error.
Figure 32. System diagram of an image-based visual servoing system.
To generate the command to the plant, the image Jacobian must be calculated. If the feature vector is denoted by f and the manipulator state is denoted by p, then the image Jacobian is
$$J_i = \frac{\partial f}{\partial p}. \qquad (26)$$
If the feature vector is the same length as the p vector (i.e., J_i is square), and J_i is invertible, then a simple control law can be used,
$$\dot{p} = J_i^{-1}\,\lambda\left(f_d - f\right), \qquad (27)$$
where λ is the controller gain and f_d is the vector of desired (goal) features. If J_i is not square, it is possible to use the pseudo-inverse instead.
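A minimal sketch of this control law, under the assumption of a square, invertible image Jacobian (with a pseudo-inverse fallback), might look as follows; the names and the gain value are illustrative.

```python
import numpy as np

def image_based_command(J_i, f, f_d, gain=0.5):
    """Map the feature-space error onto a plant command per Equation 27.

    J_i is the image Jacobian, f the measured feature vector, f_d the goal
    features, and gain the scalar controller gain (lambda)."""
    error = np.asarray(f_d, dtype=float) - np.asarray(f, dtype=float)
    J = np.asarray(J_i, dtype=float)
    if J.shape[0] == J.shape[1]:
        try:
            return gain * np.linalg.solve(J, error)
        except np.linalg.LinAlgError:
            pass                    # singular: fall through to the pseudo-inverse
    return gain * (np.linalg.pinv(J) @ error)
```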
The image-based approach alleviates the need to generate range information, since the entire problem is formulated in feature space. Additionally, it is typically more efficient to generate the feature vector than the state estimate. Therefore, the approach can be more computationally efficient than the position-based approach. However, generation of the image Jacobian is not necessarily straightforward, and like other controllers, it is subject to model inaccuracies. Also, if J_i is singular, another technique must be used to map the error vector into a set of commands.
Since the GTMax unmanned helicopter is already equipped with a camera controller and a state-of-the-art vehicle controller, which will be briefly discussed in Chapter 6, a position-based approach was used as part of this research. This allowed the vision portion of the target tracking problem to be decoupled from the camera and vehicle controllers. Therefore, the tracking system could take advantage of the very capable controllers currently onboard the helicopter, and improvements to either controller would be automatically incorporated into the tracking system in the future. Also, the helicopter state data can be used to estimate the range to the target if the terrain is assumed to be flat.
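Under the flat-terrain assumption, the range follows from intersecting the camera's line of sight with the ground plane: given the camera's position and the unit pointing vector in the inertial frame, the target lies where that ray reaches zero height. The sketch below assumes a North-East-Down frame and that the pointing vector has already been rotated into it; it illustrates the geometry only and is not the onboard implementation.

```python
import numpy as np

def flat_terrain_target(cam_pos_ned, los_unit_ned):
    """Intersect the camera line of sight with a flat ground plane (D = 0).

    cam_pos_ned: camera position [north, east, down], with down negative when
    above the ground; los_unit_ned: unit line-of-sight vector in the same
    frame. Returns (target_ned, range) or None if the ray never reaches
    the ground."""
    if los_unit_ned[2] <= 1e-6:            # pointing level or upward
        return None
    rng = -cam_pos_ned[2] / los_unit_ned[2]   # altitude / downward component
    target = np.asarray(cam_pos_ned, dtype=float) + rng * np.asarray(los_unit_ned)
    return target, rng
```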
5.2 Visual Servoing and Unmanned Aerial Systems
The ideas behind visual servoing have been applied to various unmanned systems to improve performance. Typical applications include state estimation for navigation, state estimation for target tracking, and control law generation for tracking. In much of the previous work, it is assumed that some sort of image processing tool exists that is able to generate the desired information from visual measurements. Therefore, the focus has been on the generation of estimators and controllers to perform the desired tasks. These systems usually rely on position-based visual servoing ideas to decouple the estimation problem from the control problem.
In the position-based visual servoing problem, the state of the manipulator is estimated. If the manipulator is replaced by a camera-equipped unmanned vehicle, then similar approaches can be used for generating a navigation solution. This is also referred to as the structure from motion problem [1]. In conventional, non-vision systems the navigation solution is typically formed by integrating accelerometers and angular rate gyros. Therefore, some means is required to correct for the accumulated bias errors of the sensors. Because of the ubiquity of the global positioning system (GPS) and the availability of receivers, it is typically used to make the correction. However, in some circumstances, like low-altitude flight or flying in the urban canyon, GPS becomes unreliable and cannot be used in flight [88]. In these circumstances, position-based visual servoing has been used as an alternative to GPS. Position-based visual servoing has also been used instead of inertial sensors [72].
Vision is typically integrated into the navigation solution through the use of an extended Kalman filter [75, 7, 71, 87, 88]. In this approach, an approximated system model is used to propagate an estimate forward in time. This estimate is then used to predict the location of various features or landmarks in the visual data. If inertial data is available, it can be included to form the prediction. Then, measurements are made from the imagery to estimate the position of the landmarks. The difference between the prediction and the measurements is used to refine the navigation solution.
When estimating the state of the features or landmarks, [71] reports that it is common to use a Lucas-Kanade type tracker [61]. This type of tracker recursively searches a user-defined window to find selected features from the previous image. Therefore, it is assumed that the appearance of a feature in one frame does not change in subsequent frames. Also, it is assumed that the feature does not move outside the window in time-adjacent frames. The accuracy of the navigation solution can be diminished if either of these assumptions is invalid.
The other half of the estimation portion of the position-based visual servoing problem is estimating the position of the target object. When applied to unmanned aerial systems, this portion of the problem uses visual data to estimate the state of a ground or aerial target. Typically, algorithms from computer vision are used to detect features. Then, state data from both the camera and vehicle are used to map the estimate into the inertial frame. As mentioned above, the Lucas-Kanade tracker is a popular tool for tracking target features as they move smoothly from one frame to the next. The performance of the tracker can be improved through the use of a pyramid transform [14]. The transform generates representations of the image at various resolutions in the hope of finding the resolution that best matches the feature. Kumar et al. describe another popular technique that relies on change detection [52]. High-performance, specialty hardware is used to smooth and mosaic images, which removes much of the ego-motion of the camera. Then, time-adjacent images can be subtracted pixel-wise, and moving targets are detected. Other ideas from computer vision are also popular tools for estimating the position of features in video. For example, [25] uses a color-based segmentation routine to estimate the state of other robots within a formation, [79] uses a line detection algorithm to estimate the position of roads for blimp trajectory generation, and [44] and [39] use an active contour approach to track ground targets.
The final portion of the position-based visual servoing problem applied to unmanned systems is determining an appropriate guidance law that maps the vision-based target state estimate to a set of actuator commands. Since a position-based visual servoing approach is typically used, this amounts to determining desired accelerations or angular rates that will result in a specified distance and orientation to the target. In [72], a controller was developed using simulation techniques that was able to guide a glider into a window. In the approach described in [86] and [29], an asymptotically stable controller is developed for fixed-wing aircraft that drives the heading of the aircraft toward the target. The controller was demonstrated using a model aircraft and resulted in the vehicle flying circular patterns above the target, which was estimated using a commercially available image processing tool. Cao, Hovakimyan, and Evers develop a guidance law through the use of a pan/tilt camera mounted on a translational rod [15]. This work is extended in [80], where the class of allowable target maneuvers is limited to eliminate the translational movements of the camera.
The next chapter describes how the visual servoing ideas from this chapter are applied to the problem of tracking a maneuvering ground target from an unmanned helicopter equipped with a pan/tilt camera. Because of the capabilities of the GTMax, which is the desired tracking platform, many of the difficulties of generating a guidance law are reduced. In fact, the approach taken decouples the orientation of the camera from the movements of the helicopter, much like [29, 86]. Therefore, the movements of the helicopter were only used to aid the particle filter. The helicopter movements were kept smooth and in the direction of the target. The development of the camera orientation command generator and waypoint generator for ground target tracking is presented in the next chapter.
CHAPTER 6
CLOSED LOOP TRACKING IMPLEMENTATION
The particle filter described in Chapter 4 is implemented onboard the GTMax unmanned helicopter as part of a position-based visual servoing system. The adaptive particle filter acts as the target state estimator. Then, the helicopter and camera state data are used to rotate the estimated target state into the inertial frame. Once the inertial position of the target is estimated, a linear predictor is used to estimate the position of the target one time step into the future. This position is used to update the orientation of the camera and to generate a new waypoint for the helicopter. This chapter discusses each piece of the process after giving an overview of the GTMax and its capabilities.
6.1 GTMax Unmanned Helicopter
The GTMax is an unmanned research helicopter that is developed, maintained, and oper-
ated by the UAV Research Facility (UAVRF) at the Georgia Institute of Technology [45].
It is shown in Figure 33.
Figure 33. GTMax unmanned research helicopter
The vehicle utilizes Yamaha's RMax industrial helicopter airframe, which has the characteristics listed in Table 4.
Table 4. Characteristics of the Yamaha RMax airframe.
Rotor Diameter 10.2 feet
Length 11.9 feet (including rotor)
Engine 2 cylinder, water cooled, 256 cc, 21 horsepower
Max Weight 205 pounds
Payload (including avionics) > 66 pounds
Endurance 60 minutes
The airframe is equipped with an avionics package consisting of a variety of sensors and
general purpose computers. The avionics components are listed in Table 5. Most of the
avionics are packaged into modules that are mounted under the body of the helicopter in a
vibration-isolated rack. The camera is mounted beneath the nose of the aircraft.
Table 5. GTMax Baseline Avionics.
266 MHz Pentium II embedded PC, 2 Gb Flash Drive (primary flight computer)
850 MHz Pentium III embedded PC, 2 Gb Flash Drive (secondary flight computer)
Inertial Sciences ISIS-IMU Inertial Measurement Unit
NovAtel OEM-4 Differential GPS
Honeywell HMR-2300 3-Axis Magnetometer
Custom Made Ultra-Sonic Sonar Altimeter
Custom Made Optical RPM Sensor
Actuator Control Interface
11 Mbps Ethernet Data Link and Ethernet Switch
FreeWave 900MHz Spread Spectrum Serial Data Link
Axis 213 Pan, Tilt, Zoom Network Camera
The GTMax software is written in C/C++ and is operating system independent. The onboard software executes on either the primary or secondary flight computer. Usually the primary flight computer is reserved for low-level tasks like navigation, trajectory generation, and control, while the secondary computer is reserved for higher-level algorithms such as the particle filter tracker described in Chapter 4. An operator interfaces to the helicopter through the ground control station (GCS) software. Generic serial and Ethernet software is used to communicate between the GCS and the onboard computers.
The onboard navigation software uses a 17 state extended Kalman filter. The system is all-attitude capable and updates at 100 Hz [27]. The onboard controller is an adaptive neural network, trajectory following controller [43]. The controller commands the vehicle to follow trajectories that are generated using a set of waypoints. The onboard controller can also limit various dynamics of the GTMax. For example, the operator can specify a maximum allowable jerk, and the baseline controller will automatically limit the jerk of the vehicle. This attribute of the controller will be utilized to minimize the jerk and resulting attitude changes when the GTMax is pursuing a target.
For vision-based control applications, the GTMax utilizes the Axis 213 camera, which is a commercially available imaging solution that uses the Hypertext Transfer Protocol (HTTP) to send images and positional data as well as receive orientation commands. It is shown in Figure 34. (Other camera configurations have been used; however, the Axis 213 was used for tracking using a particle filter.) While the camera orientation is controlled to within very small tolerances (< 0.1 degrees), the camera suffers from a latency of 0.5 seconds between the issuing of a new orientation command and the execution of the command. The baseline onboard software includes functionality that will generate desired pan and tilt angles that will keep the camera pointing at a desired location. Due to the latency and the attitude changes of the helicopter, these commands will have inherent error. In any case, this functionality will be used to close the loop between the particle filter and the camera orientation. In addition to the camera controller, the GTMax is also equipped with software that will move the helicopter toward the point at which the camera is pointed.
The UAVRF also operates a variety of simulation tools for the GTMax. The simulation tools utilize the actual flight software as well as models for the sensors and actuators onboard the vehicle. These tools include a simulator for the onboard camera system (including the latency of the Axis 213) and a simulated camera that utilizes the OpenGL graphics library. Therefore, vision-based control systems can be implemented and tested in simulation on synthetic images. Simulation results for the particle filter-based tracker will be presented in the next chapter along with flight test results using the GTMax. A screenshot of the simulator is shown in Figure 35. In this scenario, a simulated target is shown in the camera view.
Figure 34. Axis 213 camera. This camera is used onboard the GTMax.
6.2 Coordinate System Transformation
The weighted average of the particles from the filter of Chapter 4 yields the expected target position in the image. As in other position-based visual servoing systems, the position in the image must be used to generate the target position in the inertial frame. A simplified version of the problem is depicted in Figure 36. The output of the particle filter directly gives $e_h$ and $e_v$, the estimated horizontal and vertical distances between the target (in the frame) and the center of the frame in pixels. Using the technique in [72], these measurements can be used to form a vector in the direction of the target:
$$
x_c =
\begin{bmatrix}
\tan\left(\dfrac{e_h \, FOV_h}{N_{pix,h}}\right) \\
\tan\left(\dfrac{e_v \, FOV_v}{N_{pix,v}}\right) \\
1
\end{bmatrix}, \qquad (28)
$$
where $FOV_h$ and $FOV_v$ are the horizontal and vertical fields of view, respectively, and $N_{pix,h}$ and $N_{pix,v}$ are the number of pixels in the horizontal and vertical dimensions, respectively. The distance between the camera's focal point and the helicopter's center of gravity is ignored. Therefore, $x_c$ can be rotated from the camera coordinate system to the inertial coordinate system through the use of rotation matrices:
$$
x_i = DCM_{v,i} \, DCM_{c,v} \, x_c, \qquad (29)
$$
Figure 35. Screenshot of the GTMax simulation tools. The scene window on the right
shows the simulated camera view that can be used for tracking testing.
where
$$
DCM_{v,i} =
\begin{bmatrix}
\cos\psi_v & -\sin\psi_v & 0 \\
\sin\psi_v & \cos\psi_v & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
\cos\theta_v & 0 & \sin\theta_v \\
0 & 1 & 0 \\
-\sin\theta_v & 0 & \cos\theta_v
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 \\
0 & \cos\phi_v & -\sin\phi_v \\
0 & \sin\phi_v & \cos\phi_v
\end{bmatrix} \qquad (30)
$$
$$
DCM_{c,v} =
\begin{bmatrix}
\cos\psi_c & -\sin\psi_c & 0 \\
\sin\psi_c & \cos\psi_c & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
\cos\theta_c & 0 & \sin\theta_c \\
0 & 1 & 0 \\
-\sin\theta_c & 0 & \cos\theta_c
\end{bmatrix}, \qquad (31)
$$
where $\theta_v$, $\phi_v$, and $\psi_v$ are the vehicle pitch, roll, and yaw, and $\psi_c$ and $\theta_c$ are the camera pan and tilt. (There is no roll axis available on the Axis 213. However, this work could be extended to include camera roll by adding the rotation to $DCM_{c,v}$.)
Figure 36. Simplified geometry used for coordinate system transformation. The particle filter yields $e_h$ and $e_v$, which are used to find the direction to the target.
Once the vector between the vehicle and the target in the inertial frame, $x_i$, is found, it is extended until it intersects the target altitude. Therefore, the target is assumed to stay at the same altitude, on the ground. The intersection point is the estimated, relative target position, which is then sent to a linear predictor. If a range-finder or terrain database is used, the vector could be extended until it reaches the ground.
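To make the transformation concrete, the following C++ fragment is a minimal sketch of how Equations 28 through 31 could be implemented. It assumes standard right-handed elementary rotations and a flat-ground extension of the ray; the names (imageToInertial, fovH, and so on) are illustrative and are not taken from the GTMax flight code.

```cpp
// Illustrative sketch of the image-to-inertial transformation of Section 6.2.
// Not the actual GTMax implementation; sign conventions are the standard ones.
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

static Mat3 mul(const Mat3& a, const Mat3& b) {
    Mat3 c{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

static Vec3 mul(const Mat3& a, const Vec3& v) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 3; ++k)
            r[i] += a[i][k] * v[k];
    return r;
}

// Elementary rotations about the z, y, and x axes.
static Mat3 rotZ(double a) { return {{{std::cos(a), -std::sin(a), 0}, {std::sin(a), std::cos(a), 0}, {0, 0, 1}}}; }
static Mat3 rotY(double a) { return {{{std::cos(a), 0, std::sin(a)}, {0, 1, 0}, {-std::sin(a), 0, std::cos(a)}}}; }
static Mat3 rotX(double a) { return {{{1, 0, 0}, {0, std::cos(a), -std::sin(a)}, {0, std::sin(a), std::cos(a)}}}; }

// Returns the estimated target position relative to the vehicle in the inertial
// frame, assuming the camera looks generally downward and the target sits on
// flat ground at the given height below the vehicle (Section 6.2 assumption).
Vec3 imageToInertial(double eH, double eV,                  // pixel offsets from the particle filter
                     double fovH, double fovV,              // fields of view [rad]
                     double nPixH, double nPixV,            // image size [pixels]
                     double yaw, double pitch, double roll, // vehicle attitude [rad]
                     double pan, double tilt,               // camera orientation [rad]
                     double heightAboveTarget)              // vehicle height above target [ft]
{
    // Equation 28: direction vector in the camera frame.
    Vec3 xc = { std::tan(eH * fovH / nPixH), std::tan(eV * fovV / nPixV), 1.0 };

    // Equations 29-31: camera -> vehicle -> inertial rotation.
    Mat3 dcmCV = mul(rotZ(pan), rotY(tilt));
    Mat3 dcmVI = mul(mul(rotZ(yaw), rotY(pitch)), rotX(roll));
    Vec3 xi = mul(mul(dcmVI, dcmCV), xc);

    // Extend the ray until it intersects the assumed target altitude
    // (the third component is taken as the vertical direction here).
    double scale = heightAboveTarget / xi[2];
    return { xi[0] * scale, xi[1] * scale, heightAboveTarget };
}
```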
6.3 Linear Prediction
Once the current position of the target is estimated in the inertial frame, it is sent to a predictor to account for inherent processing and communication delays. Since the GTMax operates at low altitudes (< 1000 ft.) and has a relatively wide FOV (> 30 degrees horizontal), the speed of the prediction is deemed to be more important than the accuracy. Therefore, a linear predictor is used for this task.
The goal of linear prediction is to estimate future values of a signal using a linear combination of past signal values, which are possibly noisy. The problem is to find the predictor coefficients, $a_k$, so that the estimate
$$
\hat{x}_t = \sum_{k=1}^{p} a_k x_{t-k} \qquad (32)
$$
is as close as possible, in a least squares sense, to the future value, $x_t$. The solution to the equation is given by the normal equations [38]
$$
\sum_{k=1}^{p} a_k \sum_{t} x_{t-k} x_{t-i} = \sum_{t} x_t x_{t-i}, \qquad 1 \le i \le p. \qquad (33)
$$
This equation can be written in matrix form as
$$
\begin{bmatrix}
r_0 & r_1 & r_2 & \cdots & r_{p-1} \\
r_1 & r_0 & r_1 & \cdots & r_{p-2} \\
r_2 & r_1 & r_0 & \cdots & r_{p-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_{p-1} & r_{p-2} & r_{p-3} & \cdots & r_0
\end{bmatrix}
\begin{bmatrix}
a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p
\end{bmatrix}
=
\begin{bmatrix}
r_1 \\ r_2 \\ r_3 \\ \vdots \\ r_p
\end{bmatrix}, \qquad (34)
$$
where
$$
r_i = \sum_{t} x_t x_{t-i}. \qquad (35)
$$
The range of summation in Equations 33 and 35 depends on the prediction method used [63]. In the correlation method, the range of summation is $-\infty < t < \infty$. However, since the value of $x$ is not known for all time, a rectangular window that begins at time $S$ and ends at time $T$ is typically used, and unknown values are set equal to zero. This causes the array of Equation 34 to be Toeplitz. Therefore, the normal equations can be solved efficiently using the Levinson recursion [38] even when large order predictors are used.
However, setting the unknown values to zero increases the prediction error. In the covariance method, only known values are used for the prediction. The range of summation in Equation 33 is $S + p \le t \le T$, and Equations 34 and 35 become
$$
\begin{bmatrix}
r_{1,1} & r_{1,2} & r_{1,3} & \cdots & r_{1,p} \\
r_{2,1} & r_{2,2} & r_{2,3} & \cdots & r_{2,p} \\
r_{3,1} & r_{3,2} & r_{3,3} & \cdots & r_{3,p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_{p,1} & r_{p,2} & r_{p,3} & \cdots & r_{p,p}
\end{bmatrix}
\begin{bmatrix}
a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p
\end{bmatrix}
=
\begin{bmatrix}
r_{1,0} \\ r_{2,0} \\ r_{3,0} \\ \vdots \\ r_{p,0}
\end{bmatrix}, \qquad (36)
$$
$$
r_{i,j} = \sum_{t=p}^{T} x_{t-j} x_{t-i}. \qquad (37)
$$
The matrix of Equation 36 is no longer Toeplitz. However, it is near Toeplitz, and it can be solved efficiently using the method described in [63].
Two covariance method linear predictors are used to predict the future north and east coordinates of the target based on the output of the particle filter rotated into the inertial frame. These future target coordinates are used to update the commanded orientation of the camera and for waypoint generation.
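As an illustration of the covariance method described above, the following C++ sketch builds the system of Equation 36 from a window of past samples and solves for the predictor coefficients. It is a minimal example: plain Gaussian elimination stands in for the efficient near-Toeplitz solver of [63], and the names (covariancePredictor, history length, and so on) are assumptions made for the example rather than part of the flight software.

```cpp
// Minimal covariance-method linear predictor (Equations 32, 36, 37).
#include <cmath>
#include <utility>
#include <vector>

// Given past samples x[0..T], fit an order-p predictor with the covariance
// method and return the one-step-ahead prediction of the next sample.
double covariancePredictor(const std::vector<double>& x, int p) {
    const int T = static_cast<int>(x.size()) - 1;

    // r[i][j] = sum_{t=p}^{T} x[t-j] * x[t-i]   (Equation 37), i, j = 0..p.
    std::vector<std::vector<double>> r(p + 1, std::vector<double>(p + 1, 0.0));
    for (int i = 0; i <= p; ++i)
        for (int j = 0; j <= p; ++j)
            for (int t = p; t <= T; ++t)
                r[i][j] += x[t - j] * x[t - i];

    // Assemble the p x p system of Equation 36 with right-hand side r[i][0].
    std::vector<std::vector<double>> A(p, std::vector<double>(p + 1, 0.0));
    for (int i = 1; i <= p; ++i) {
        for (int j = 1; j <= p; ++j) A[i - 1][j - 1] = r[i][j];
        A[i - 1][p] = r[i][0];
    }

    // Solve by Gaussian elimination with partial pivoting (sketch only).
    std::vector<double> a(p, 0.0);
    for (int col = 0; col < p; ++col) {
        int pivot = col;
        for (int row = col + 1; row < p; ++row)
            if (std::fabs(A[row][col]) > std::fabs(A[pivot][col])) pivot = row;
        std::swap(A[col], A[pivot]);
        for (int row = col + 1; row < p; ++row) {
            double f = A[row][col] / A[col][col];
            for (int k = col; k <= p; ++k) A[row][k] -= f * A[col][k];
        }
    }
    for (int row = p - 1; row >= 0; --row) {
        double s = A[row][p];
        for (int k = row + 1; k < p; ++k) s -= A[row][k] * a[k];
        a[row] = s / A[row][row];
    }

    // Equation 32: predict the next sample from the p most recent samples.
    double xHat = 0.0;
    for (int k = 1; k <= p; ++k) xHat += a[k - 1] * x[T + 1 - k];
    return xHat;
}
```

Two such predictors, one fed with the history of the north coordinate and one with the history of the east coordinate, would reproduce the arrangement described above.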
6.4 Camera Command and Waypoint Generation
As mentioned before, the GTMax is equipped with software that generates the pan and tilt angles required to keep the onboard camera pointed at a desired location. This software executes on the primary flight computer and accounts for the attitude changes of the GTMax. Therefore, this software is utilized to command the camera to point at the predicted target position. The output of the linear predictor is low-pass filtered and sent to the onboard computer, where the camera control functionality does the necessary rotations and sends the commands on to the camera. The filtering is done to ensure the camera movements are kept smooth enough for the particle filter to successfully estimate the target state. This method will be subject to errors that result from the camera orientation command latency. The only way to remove this error is to predict the future attitude of the helicopter, which depends on environmental factors, such as wind. This was not addressed as part of this thesis.
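The form of the low-pass filter is not specified here, so the short sketch below simply assumes a first-order (exponential) smoother applied independently to the predicted north and east coordinates; the struct name and the smoothing constant alpha are hypothetical.

```cpp
// Hypothetical first-order low-pass filter for the predicted target position.
// alpha near 0 gives heavy smoothing; alpha near 1 passes the input through.
struct LowPass2D {
    double alpha;
    double north = 0.0, east = 0.0;
    bool initialized = false;

    void update(double predictedNorth, double predictedEast) {
        if (!initialized) { north = predictedNorth; east = predictedEast; initialized = true; return; }
        north += alpha * (predictedNorth - north);
        east  += alpha * (predictedEast  - east);
    }
};
```

An instance with a small alpha, updated once per prediction cycle, would then feed the camera-pointing command.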
The output of the linear predictor is also sent to the waypoint generator. The goal of the
waypoint generator is to determine the desired vantage point for the GTMax. The vantage
point is generated by blending three factors. First, any movements of the vehicle should
not require large attitude changes. If the attitude changes are too large, the target will not
appear to move smoothly from one frame to the next, and the particle filter might lose the
target. Additionally, attitude changes result in increased error in the camera orientation
commands. Second, the GTMax should be moved to a position that is near the target.
Finally, known obstructions, such as a line of trees or a wall, should be accounted for, and
the GTMax should move to a point where it will have an unobstructed view of the target.
To reduce the attitude changes of the vehicle, the particle filter tracking architecture takes advantage of the ability of the GTMax baseline controller to automatically limit the jerk of any trajectory. By limiting the acceleration changes, the attitude changes are limited as well. This results in a steady camera platform that will not violate the smoothness assumptions used to generate the particle filter system update model.
Once the attitude changes have been reduced, the waypoint generation is done over a fixed, user-defined planning horizon. The first step in this planning is estimating the points that the vehicle can reach during this planning horizon. This estimation uses simple kinematics to determine the set of points that will result if the helicopter accelerates in a set of directions. Using linear interpolation, the space between these locations is filled in to form a boundary. Any point within the boundary is assumed to be reachable within the planning horizon.
Once the set of reachable points is approximated, the waypoint generator takes one of three cases. The simplest case assumes that there are no known obstructions to account for, as shown in Figure 37. Therefore, the existing software can be used that automatically moves the helicopter toward the point at which the camera is pointed. The existing software generates a waypoint that is in the direction of the target if the target is more than a predetermined threshold away. The waypoint also commands the vehicle to fly at the same velocity as the target. The heading of the GTMax is commanded to slowly rotate until its heading is parallel to the target's velocity vector. If the target is directly behind the GTMax, the vehicle is commanded to rotate so the avionics bay will not obstruct the view of the target.
The second and third cases for the waypoint generator involve known linear obstructions (the work in this thesis only accounts for known, parameterized linear obstructions such as a building wall or a row of trees; other types of obstructions and online obstruction parameterization are out of the scope of this work). It is assumed that the obstruction is conservatively parameterized. Therefore, the target is predicted to go behind the obstruction before it actually becomes occluded. When the target is predicted to be occluded, a waypoint is generated that is on one of the lines of sight. The lines of sight are generated using the predicted target location and the obstruction endpoints. The second waypoint generation case is used when neither line of sight intersects the locus of reachable points, as shown in Figure 38. In this case, the waypoint is simply the nearest point that is on one of the lines of sight. Since the controller automatically limits the jerk (and acceleration), the GTMax will move in the direction of the nearest line of sight and not actually reach the waypoint within the planning horizon. The heading of the vehicle is commanded to point toward the target as in [86] and [29].
Figure 37. Waypoint generation, case one. No obstructions are known. Therefore, the next waypoint will be in the direction of the target and the existing GTMax software is used. The gray circle represents the reachable points. The same result will occur even if the target is outside of the reachable points.
The third and final waypoint generation case is used when one of the lines of sight is within the reachable set, as shown in Figure 39. In this case a waypoint is generated that will place the GTMax on the line of sight and nearer the target. The intersection of the nearest line of sight with the reachable boundary is used as the next waypoint. The choice of this waypoint allows the GTMax to move closer to the target and move away from the obstruction.
The planning horizon is always longer than the update rate for the waypoint generator. Therefore, the generated waypoint is rarely reached. Instead, it is used to move the GTMax in the desired direction with the desired heading.
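The following C++ sketch illustrates the three-case selection just described under simplifying assumptions: the reachable set is approximated by a disk of radius reachRadius around the current vehicle position, the obstruction is a 2D segment, and all names (chooseWaypoint, Pt, and so on) are illustrative rather than taken from the GTMax software.

```cpp
// Illustrative three-case waypoint selection (horizontal plane only).
// The reachable set is approximated by a disk; not the GTMax implementation.
#include <algorithm>
#include <cmath>

struct Pt { double n, e; };  // north, east

static Pt sub(Pt a, Pt b) { return {a.n - b.n, a.e - b.e}; }
static Pt add(Pt a, Pt b) { return {a.n + b.n, a.e + b.e}; }
static Pt scale(Pt a, double s) { return {a.n * s, a.e * s}; }
static double dot(Pt a, Pt b) { return a.n * b.n + a.e * b.e; }
static double norm(Pt a) { return std::sqrt(dot(a, a)); }

// Does segment ab cross segment cd?  (Used to test whether the view is occluded.)
static bool segmentsCross(Pt a, Pt b, Pt c, Pt d) {
    auto cross = [](Pt u, Pt v) { return u.n * v.e - u.e * v.n; };
    double d1 = cross(sub(b, a), sub(c, a)), d2 = cross(sub(b, a), sub(d, a));
    double d3 = cross(sub(d, c), sub(a, c)), d4 = cross(sub(d, c), sub(b, c));
    return (d1 * d2 < 0.0) && (d3 * d4 < 0.0);
}

// Closest point to p on the ray from origin through another point.
static Pt closestOnRay(Pt origin, Pt through, Pt p) {
    Pt dir = sub(through, origin);
    double t = std::max(0.0, dot(sub(p, origin), dir) / dot(dir, dir));
    return add(origin, scale(dir, t));
}

Pt chooseWaypoint(Pt vehicle, Pt predictedTarget, Pt obsA, Pt obsB, double reachRadius) {
    // Case 1: the view of the predicted target is not blocked; move toward the target.
    if (!segmentsCross(vehicle, predictedTarget, obsA, obsB)) {
        Pt dir = sub(predictedTarget, vehicle);
        double d = norm(dir);
        if (d < 1e-6) return vehicle;
        return add(vehicle, scale(dir, std::min(reachRadius, d) / d));
    }

    // Lines of sight: rays from the predicted target through each obstruction endpoint.
    Pt losA = closestOnRay(predictedTarget, obsA, vehicle);
    Pt losB = closestOnRay(predictedTarget, obsB, vehicle);
    Pt nearest = (norm(sub(losA, vehicle)) < norm(sub(losB, vehicle))) ? losA : losB;

    // Case 2: neither line of sight is reachable; head for the nearest point on one of them.
    if (norm(sub(nearest, vehicle)) > reachRadius) return nearest;

    // Case 3: the nearest line of sight is reachable; take the point where it leaves
    // the reachable disk on the side closer to the target.
    Pt toTarget = sub(predictedTarget, nearest);
    double len = norm(toTarget);
    if (len < 1e-6) return nearest;
    Pt u = scale(toTarget, 1.0 / len);
    Pt w = sub(nearest, vehicle);
    double uw = dot(u, w);
    double s = -uw + std::sqrt(std::max(0.0, uw * uw - dot(w, w) + reachRadius * reachRadius));
    return add(nearest, scale(u, std::min(s, len)));
}
```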
The adaptive particle filter from Chapter 4 along with the camera and waypoint generator from this chapter were implemented and integrated with the GTMax secondary onboard computer and GCS. The next chapter presents simulation and flight test results when the system was used to track a moving ground target.
Figure 38. Waypoint generation, case two. A linear obstruction is known and the lines of sight are not reachable. Therefore, the next waypoint is the nearest point on one of the lines of sight. The dashed circle represents the reachable points.
Figure 39. Waypoint generation, case three. A linear obstruction is known and the lines
of sight are reachable. Therefore, the next waypoint is the reachable point on one of the
lines of sight that is nearest the target. The dashed circle represents the reachable points,
and the gray region represents the reachable points that are not occluded.
CHAPTER 7
RESULTS
The adaptive particle filter described in Chapter 4 was integrated with the secondary flight computer software onboard the GTMax, and an operator interface was integrated with the GTMax GCS software. The operator interface allowed the operator to select parameters and generate the initial reference histogram as well as monitor the performance of the particle filter. After starting the particle filter software, the operator selects an object of interest from the onboard imagery. This selection is used to generate the reference histogram, the expected target size, and the boundaries of the pass bands of the chromatic filter used for initialization. All of this information is sent to the secondary flight computer, where the particle filter initializes and tracks the target.
Since the GTMax utilizes the same software in simulation configuration as it does in flight configuration, the particle filter closed loop tracking system was tested and proven in the simulator. Those results will be presented first, followed by flight test results of parts of the target tracking system.
7.1 Simulation Results
The GTMax simulator was used to track the target shown in Figure 40 using an adaptive particle filter. The target follows a pattern at a speed of 20 feet per second (13.6 mph) and is allowed to turn at a rate of 20 degrees per second. The simulated GTMax operated at a constant altitude of 150 feet above ground level, and the jerk was limited to two feet per second cubed.
The tracking scenario begins with the helicopter hovering at an altitude of 150 feet above ground. The ground target is at a standstill 100 feet in front of the GTMax. The camera on the GTMax is pointed in the direction of the target. However, the target is not centered in the image initially.
Figure 40. Target used to test the particle lter in simulation. The left image was taken
near the target, while the right image was taken from an altitude of 150 feet.
When tracking commences, the operator pauses the simulated video stream and selects a rectangle around the target. A histogram is generated using the pixels within the rectangle, and the histogram is sent to the simulated secondary flight computer. Peaks in the histogram are found and used to generate the pass bands of the initialization color filter. These peaks are sent to the simulated secondary flight computer along with the size of the rectangle and the reference histogram. All of this information is used to initialize the particle filter using the technique described in Section 3.2.
Once the filter is initialized, the closed loop tracking system is turned on from a GCS command. The orientation of the camera is adjusted to center the target in the image, and the helicopter is moved closer to the target. The target is then put in motion to test the tracking system.
The tracking system was tested in two simulation scenarios. The first involved no obstructions to show the typical behavior of the system, and the second involved an obstruction to show the utility of the waypoint generation.
7.1.1 Tracking With No Obstructions
The scenario involving no obstructions will be presented first. The paths of the target, helicopter, and commanded camera position are shown in Figure 41. The commanded camera position and the path of the target are nearly identical, showing that the tracking system was successful. The output of the particle filter is shown along with the target position in Figure 42. For the most part, the output of the filter is identical to the actual target position. However, when the target turns, a small amount of error is introduced. This is a result of the latencies in the camera controller. The camera controller uses the current attitude of the helicopter to generate the angular orientation commands that are sent to the camera. These commands take 0.5 seconds to begin executing. Therefore, the ground position that corresponds to the center of the image does not move smoothly, and it takes a few frames for the particle filter to respond to the apparent jumps of the target. However, since the output of the filter is low-pass filtered, the camera still follows the target.
Figure 41. Simulated paths of the target, helicopter, and commanded camera position.
The information in Figures 41 and 42 is plotted against time in Figure 43. From these plots it is apparent that the helicopter smoothly followed the motion of the target. Since there are no obstacles, this is the expected result.
The goal of the system is to provide constant imagery of the target. The target location within the image is shown in Figure 44. As evident in the figure, the target stays well within the FOV. Therefore, the system performed as desired.
Figure 42. Simulated path of the target and particle filter output.
The moving average of the tracking error, performance estimator, and number of particles is shown in Figure 45. This plot shows how the neural network allows the particle filter to respond to the changing tracking conditions. Although the neural network is used to estimate the performance of the filter and not the tracking error, the network should identify instances when the tracking error is large. Near 100 seconds the tracking error increases, the neural network identifies that the performance has degraded, and the number of particles is increased along with the other changes that were discussed in Chapter 4. Near 120 seconds, the error decreases, and the neural network allows the particle filter to reduce the number of particles until the performance degrades again.
7.1.2 Tracking With Obstructions
To prove the ability of the waypoint generation system to negotiate obstructions, a linear obstruction is added to the previous test. The obstruction end points are located at (north, east) = (40, 440) feet and (north, east) = (425, 440) feet, and the obstruction is 50 feet tall. The target turns near this obstruction when it transitions from moving north to moving east. In a real-life scenario this could represent an evasive target attempting to hide behind a building.
Figure 43. Simulated paths of the target, helicopter, commanded camera position, and particle filter output.
The paths of the target, helicopter, and commanded camera position are shown in Figure 46 along with the obstruction. Again, the commanded camera position and the path of the target are nearly identical, showing that the tracking system was successful. Also, it is apparent that the helicopter motion kept the obstruction from coming between the camera and the target. The output of the particle filter is shown along with the target position in Figure 47. Again, the output matches the position of the target well, and a small amount of error was introduced when the helicopter turns.
The tracking output along with the target position are plotted against time in Figure 48. Like Figure 46, this plot shows the helicopter maneuvering away from the obstruction. Near 85 seconds, the target begins to make the turn to the east. Initially, the helicopter begins to turn as well. However, once the conservatively parameterized obstruction begins to come between the camera and the predicted target location, a waypoint is generated that is on the line of sight. Therefore, the helicopter turns back to the west briefly and continues to move north to get on the same side of the obstruction as the target. Once the simulated GTMax has negotiated the obstruction, it begins to follow the target again. Near 110 seconds, the helicopter must again adjust its path to keep the obstruction from blocking the view when the target turns back to the southwest. The helicopter continues on its path to the east to get past the obstruction and then turns back to the southwest.
Figure 44. Position of the target in the image.
The target location within the image is shown in Figure 49. As evident in the figure, the target stays well within the FOV even when the simulated GTMax was avoiding the obstructions. Again, the system performed as desired.
The moving average of the tracking error, performance estimator, and number of particles is shown in Figure 50. This plot shows how the neural network allows the particle filter to respond to the changing tracking conditions. Near 105 seconds the tracking error increases as the helicopter attitude changes, the neural network identifies that the performance has degraded, and the number of particles is increased along with the other changes that were discussed in Chapter 4. Near 115 seconds, the error decreases, and the neural network allows the particle filter to reduce the number of particles until the filter performance degrades again.
Figure 45. Moving average of the tracking error, performance estimator, and number of particles. The window for the moving average is 50 frames.
7.2 Flight Test Results
The adaptive particle filter was also used to track a moving target in flight as part of The Defense Advanced Research Projects Agency's (DARPA) Heterogeneous Urban Reconnaissance, Surveillance, and Target Acquisition (RSTA) Team (HURT) project. For this project the GTMax2 was used. This vehicle is very similar to the GTMax. However, there is no secondary flight computer. Therefore, the particle filter was implemented on a ground computer. The ground computer relied upon an IEEE 802.11b wireless link to capture the onboard imagery. Because of the slow speed of this link, the time between frames was on the order of 0.5 seconds. As a result, the particle filter was only able to track very slow moving targets (< 5 mph). Also, due to project constraints, only imagery was collected.
The purpose of the test was to prove that the closed loop tracking system would be successful. Therefore, the adaptive particle filter of Chapter 4 was modified to minimize project risk and to take advantage of the additional processing power available on a ground computer. First, the automatic initialization was not used. Instead the operator selected the region of the image where the initial particles should be placed. This selection also generated the reference histogram as it would do if the automatic initialization was used. The offline initialization results of Chapter 3 show that the automatic initialization approach would also have been successful. Second, a set of rules was used to map the various attributes of the particle filter to parameter changes instead of the neural network. These rules had the same effect as the neural network, but did not require any training data. The offline tracking tests of Chapter 4 show that the neural network is also able to estimate the performance of the filter. Unlike the rule-set, the neural network does not have to be retuned when the tracking conditions change.
Figure 46. Simulated paths of the target, helicopter, and commanded camera position. The obstruction is represented by the thick black line.
The flight test began with the helicopter hovering near the target of interest, a partially occluded white van. The filter was initialized when the operator selected a rectangle around the target. The target then emerged from the occlusion and began to slowly traverse a block in urban terrain. The terrain contained a large amount of clutter that is the same color as the target. This clutter included sidewalks, objects on nearby buildings, as well as the buildings themselves. The terrain also included a few trees that partially obstructed the view of the target. However, the filter maintained lock on the target and was used to guide the motion of the GTMax2.
Figure 47. Simulated path of the target and particle filter output. The obstruction is represented by the thick black line.
Some of the collected imagery is shown in Figure 51. From these images it is clear that the particle filter was successfully estimating the position of the target. The same imagery was used to prove the particle filter performance in other parts of this thesis. The output of the filter was used to generate waypoints for the GTMax2. This resulted in the helicopter following the target as it traveled around a block of urban terrain.
The adaptive particle filter described in this thesis was also implemented onboard the GTMax and tested in flight. Again, the test scenario began with the helicopter hovering near the target of interest, which was a red vehicle. The operator selected a rectangle around the target, and this rectangle is used to generate the reference histogram onboard the ground station. The ground station uploads this reference histogram to the secondary flight computer when the tracking begins. Upon a successful initialization, the commands to the camera controller begin to be updated. At this point, the target would begin moving. The target traveled along a grass runway in rural terrain at speeds between 10 and 30 feet per second and would execute turns when it reached the end of the runway. While there was relatively little clutter in the imagery, the target would become occluded by the helicopter landing gear when it turned around. However, the adaptive particle filter was still able to successfully track the target.
Figure 48. Simulated paths of the target, helicopter, commanded camera position, and particle filter output.
A sample of the resulting imagery is shown in Figure 52. The path of the GTMax, commanded camera position, and particle filter output is shown in Figure 53. The estimated position of the target in the frame is shown in Figure 54. From these plots it is evident that the particle filter successfully estimated the position of the target, which resulted in the target staying near the center of the frame and the helicopter smoothly following the target. These two flight tests along with the cases studied in Chapter 4 prove the ability of the visual tracking system presented in this thesis to provide constant imagery of a target of interest with minimal operator input.
Figure 49. Position of the target in the image.
Figure 50. Moving average of the tracking error, performance estimator, and number of particles. The window for the moving average is 50 frames.
Figure 51. Frames collected during particle filter flight testing.
Figure 52. Frames collected during adaptive particle filter flight testing. The movie collected during this portion of the flight is included in Appendix C, Figure 61.
Figure 53. Flight test paths of the GTMax, commanded camera position, and particle filter output. The turn of the target in the plot corresponds to the turn in Figure 51.
Figure 54. Output of the particle filter in the image.
CHAPTER 8
CONCLUSIONS AND POSSIBLE FUTURE WORK
As UAVs take on more of the intelligence, surveillance, and reconnaissance (ISR) burden
for both military and civilian organizations, there is a need to increase the autonomy of
the vehicles. The work described in this thesis does that by providing an architecture that
enables a camera-equipped, rotary-winged UAV to provide constant imagery of a moving
ground target with minimal operator input.
The architecture utilizes an adaptive particle filter to interpret the incoming image stream. The particle filter is an effective state estimation tool that is able to approximate the optimal, Bayesian solution even when non-Gaussian, multi-modal distributions are present. However, like other state estimation filters, finding the correct model parameters can be challenging. Previous particle filters typically use fixed model parameters and a fixed number of particles. When the parameters are changed, they are usually modified using auxiliary variables in the state space, which can increase the computational burden, or based on a single performance estimate, which can incorrectly classify the performance of the filter. The work in this thesis uses a performance estimate that relies upon several performance metrics, which allows the filter parameters to respond to the changing tracking conditions. When the filter is performing well, the parameters are changed to trade accuracy (in distribution) for efficiency. When the filter is losing the target, the parameters are changed to give the filter a better chance of regaining lock on the target. The performance estimate is made using a set of measurements that are taken from the particle distribution. The various measurements are fused together using a multi-layer perceptron neural network that is trained offline. The output of the neural network is then mapped to a set of parameter changes using a rule set. The adaptation allows the filter to have the best of both worlds by taking advantage of the multi-hypothesis nature of the particle filter when the target was occluded or operating in an extremely cluttered environment, and operating very efficiently when the target is clearly visible and away from clutter.
The adaptive particle filter is compared to a baseline filter. The adaptive filter outperforms the baseline in two of the three case studies in terms of both tracking error and computational efficiency. In the first case, the adaptation not only decreases the computational load to allow the filter to operate in real-time, but it also allows the filter to quickly reacquire the target after he comes back into the FOV. In the second case, the adaptation allows the filter to maintain lock on a target that is operating in a heavily cluttered environment without increasing the average number of particles. In the third case, the particle filter performs approximately the same as the baseline filter. This shows that the baseline filter is properly tuned for the tracking conditions. However, the adaptive filter does not need to be tuned. The same filter parameters are used for all three tracking cases. Therefore, the adaptive filter is better suited for real-world applications.
Once the position of the target is estimated in the image, it is transformed into the
inertial frame much like a position-based visual servoing system does. The target's inertial
position is used in a linear predictor to allow the camera command and waypoint generator
to stay a step ahead of the target. The predicted target location is sent to the camera
controller onboard the GTMax unmanned helicopter. The location is used to determine
the appropriate camera orientation commands, which are then sent to the camera. The
predicted target location is also sent to the waypoint generator. Here, a vantage point for
the GTMax is determined that will allow the helicopter to move smoothly, stay near the
target, and avoid known, linear obstructions. The vantage point is sent to the helicopter as
the next waypoint.
8.1 Contributions
This thesis is contributing to the field by providing the following:
A closed-loop, visual target tracking architecture for use onboard an unmanned helicopter that operates in real-time.
An adaptive particle filter for visual state estimation.
A novel automatic initialization routine for the adaptive particle filter.
A neural network performance estimator that fuses various performance measures together to estimate the overall performance of the filter.
A methodology for adjusting the parameters of the particle filter using the output of the neural network performance estimator.
A camera command and waypoint generator for target tracking applications.
Through offline, simulation, and flight testing, these contributions were shown to provide a powerful visual tracking system for use onboard the GTMax.
8.2 Possible Future Work
The work in this thesis could be extended in several areas. Five of those areas are discussed.
8.2.1 Further Comparisons to Other Techniques
As more capable processing power becomes available, other visual tracking techniques could be implemented in real-time, and could possibly outperform the particle filter discussed in this thesis (in terms of tracking error). For example, a template-matching search that utilizes two-dimensional correlation measurements could be used to find the probability distribution over the image. In such a routine the target template would be available a priori. The template would then be placed in several deterministically selected test regions within the image, and the correlation between the template and the test region in the image would be calculated. The region that yields the highest correlation value would be the estimated target position.
Such a routine would require substantial computational power, particularly if the correlation measurements are taken using color images. The computational burden would be directly proportional to the number of test regions. Therefore, it might be possible to manage the burden by changing the number of test regions or the test region density based on the tracking conditions. Additionally, the routine would require some means to change the template when the target size changes in the imagery. A multi-resolution search similar to the method used in a Lucas-Kanade tracker [14] could account for these changes. However, the method would require additional computational power. Comparing the performance of the adaptive particle filter of this thesis to a template-matching tracker could be a valuable extension to this work, particularly as faster processors become more prevalent.
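As a rough illustration of the template-matching idea sketched above, the following C++ fragment scores a grid of candidate positions with a normalized correlation measure on grayscale data; the image layout, function names, and the way test regions are chosen are assumptions made for this example, not part of the proposed routine.

```cpp
// Hypothetical normalized-correlation template search over a grid of test regions.
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

struct Gray {
    int w, h;
    std::vector<double> pix;
    double at(int x, int y) const { return pix[y * w + x]; }
};

// Normalized cross-correlation between the template and the image patch at (x0, y0).
static double ncc(const Gray& img, const Gray& tpl, int x0, int y0) {
    double sumI = 0, sumT = 0, sumII = 0, sumTT = 0, sumIT = 0;
    const int n = tpl.w * tpl.h;
    for (int y = 0; y < tpl.h; ++y)
        for (int x = 0; x < tpl.w; ++x) {
            double i = img.at(x0 + x, y0 + y), t = tpl.at(x, y);
            sumI += i; sumT += t; sumII += i * i; sumTT += t * t; sumIT += i * t;
        }
    double num = sumIT - sumI * sumT / n;
    double den = std::sqrt((sumII - sumI * sumI / n) * (sumTT - sumT * sumT / n));
    return den > 0 ? num / den : 0.0;
}

// Evaluate a coarse grid of test regions and return the best-matching corner.
std::pair<int, int> templateSearch(const Gray& img, const Gray& tpl, int step) {
    double best = -std::numeric_limits<double>::infinity();
    std::pair<int, int> bestPos{0, 0};
    for (int y = 0; y + tpl.h <= img.h; y += step)
        for (int x = 0; x + tpl.w <= img.w; x += step) {
            double score = ncc(img, tpl, x, y);
            if (score > best) { best = score; bestPos = {x, y}; }
        }
    return bestPos;
}
```

The step parameter plays the role of the test-region density mentioned above: a larger step reduces the computational burden at the cost of a coarser position estimate.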
8.2.2 Target Recognition
In this work, the particle filter was initialized by processing the entire image and placing particles in the regions where the target is most likely to be. Therefore, the target has to be present, and the a priori target description must be valid. An automated target recognition tool could be used instead of this technique to add more autonomy to the system.
The particle filter has been used for target recognition [91, 24, 9]. An auxiliary state is added that represents whether the target is in the frame. Therefore, the particle filter can directly estimate the probability that a target is present. The same ideas are also used to estimate the number of targets that are present in cases where multiple targets are tracked.
Because of the increase in state dimension, real-time performance of the filter might be a challenge, and the adaptations to the particle filter presented in this thesis could be used to reduce the number of particles required. However, if targets could enter the frame at any time, the particles must remain spread out. Therefore, the adaptation method would have to be modified.
8.2.3 Multiple Aircraft
Teaming of small UAVs has recently gained interest as evidenced by DARPA's HURT project, where a group of vehicles is assigned a set of ISR tasks that must be performed cooperatively. If multiple aircraft are available to track a single ground target, the state estimation could be improved by fusing information gathered from the multiple sensors.
When using multiple aircraft, the particles would represent points in the inertial frame and the filter must be distributed. When the filter is distributed, the resampling step becomes challenging since all of the sensor platforms must inform the other platforms of the weights of their particles. This problem was studied in [6, 17]. The performance of the distributed particle filter could be evaluated in a hierarchical manner, where the neural network operated on a higher level fusion node that communicated with all of the sensor nodes. However, this was not studied as part of this thesis.
8.2.4 Multiple Targets
The particle filter has also been used to track multiple targets [40, 50, 16, 90, 24], where the data association problem must be addressed along with the target localization problem. Typically, an auxiliary variable is included that represents the probability that each particle is in the vicinity of the target. This variable is used for data association.
The adaptation scheme developed in this thesis could be applicable since the required number of particles is typically high due to the dimension of the problem. However, using the neural network to measure the performance of the data association portion of the problem is an open issue. Therefore, directly applying the adaptation may be non-trivial.
8.2.5 Non-Steerable Cameras
The work in this thesis assumed the orientation of the camera could be separated from the movement of the camera platform. Therefore, the camera orientation problem was somewhat decoupled from the waypoint generation problem. If only a fixed camera is available onboard the UAV, the visual tracking problem becomes much more difficult since vehicle movements can result in large FOV movements. While the adaptive particle filter could still be used to estimate the state of the target, a different scheme would be required to close the loop. In fact, an image-based visual servoing approach might be more applicable than the position-based approach taken in this thesis.
APPENDIX A
PROOF OF MOTION METRIC
The distance generated by the motion measurements,
$$
D_M\left(x^i_k\right) = \left(1 - \frac{M^i_k}{A^i_k}\right)^{\frac{1}{2}}, \qquad (38)
$$
is a metric. The distance is a measure of how far the particle is from full motion. Full motion occurs when $M^i_k / A^i_k = 1$. The distance between the motion of two particles can be written as
$$
D_M\left(x^i_k, x^j_k\right) = \left|\frac{M^i_k}{A^i_k} - \frac{M^j_k}{A^j_k}\right|^{\frac{1}{2}}. \qquad (39)
$$
This is a metric.
Proof. For the distance to be a metric, the following three items must be valid.
1. $D_M\left(x^i_k, x^j_k\right) = D_M\left(x^j_k, x^i_k\right)$
This condition is trivially satisfied since $\left|\frac{M^i_k}{A^i_k} - \frac{M^j_k}{A^j_k}\right|^{\frac{1}{2}} = \left|\frac{M^j_k}{A^j_k} - \frac{M^i_k}{A^i_k}\right|^{\frac{1}{2}}$.
2. $D_M\left(x^i_k, x^j_k\right) \ge 0$
Trivially, $\left|\frac{M^i_k}{A^i_k} - \frac{M^j_k}{A^j_k}\right|^{\frac{1}{2}} > 0$ unless $\frac{M^i_k}{A^i_k} = \frac{M^j_k}{A^j_k}$, in which case $D_M\left(x^i_k, x^j_k\right) = 0$.
3. $D_M\left(x^i_k, x^j_k\right) + D_M\left(x^j_k, x^l_k\right) \ge D_M\left(x^i_k, x^l_k\right)$
There are six cases to study.
Case 1. $\dfrac{M^i_k}{A^i_k} \le \dfrac{M^j_k}{A^j_k} \le \dfrac{M^l_k}{A^l_k}$
$$
\left(D_M\left(x^i_k, x^j_k\right) + D_M\left(x^j_k, x^l_k\right)\right)^2 = \left(\frac{M^l_k}{A^l_k} - \frac{M^i_k}{A^i_k}\right) + 2 D_M\left(x^i_k, x^j_k\right) D_M\left(x^j_k, x^l_k\right)
$$
$$
D_M\left(x^i_k, x^j_k\right) + D_M\left(x^j_k, x^l_k\right) = \left(D_M\left(x^i_k, x^l_k\right)^2 + 2 D_M\left(x^i_k, x^j_k\right) D_M\left(x^j_k, x^l_k\right)\right)^{\frac{1}{2}} \ge D_M\left(x^i_k, x^l_k\right)
$$
Case 2. $\dfrac{M^i_k}{A^i_k} \le \dfrac{M^l_k}{A^l_k} \le \dfrac{M^j_k}{A^j_k}$
$$
D_M\left(x^i_k, x^j_k\right) \ge D_M\left(x^i_k, x^l_k\right)
$$
$$
D_M\left(x^i_k, x^j_k\right) + D_M\left(x^j_k, x^l_k\right) \ge D_M\left(x^i_k, x^l_k\right) + D_M\left(x^j_k, x^l_k\right) \ge D_M\left(x^i_k, x^l_k\right)
$$
Case 3. $\dfrac{M^j_k}{A^j_k} \le \dfrac{M^i_k}{A^i_k} \le \dfrac{M^l_k}{A^l_k}$
$$
D_M\left(x^j_k, x^l_k\right) \ge D_M\left(x^i_k, x^l_k\right)
$$
$$
D_M\left(x^i_k, x^j_k\right) + D_M\left(x^j_k, x^l_k\right) \ge D_M\left(x^i_k, x^j_k\right) + D_M\left(x^i_k, x^l_k\right) \ge D_M\left(x^i_k, x^l_k\right)
$$
Case 4. $\dfrac{M^j_k}{A^j_k} \le \dfrac{M^l_k}{A^l_k} \le \dfrac{M^i_k}{A^i_k}$
$$
D_M\left(x^i_k, x^j_k\right) \ge D_M\left(x^i_k, x^l_k\right)
$$
$$
D_M\left(x^i_k, x^j_k\right) + D_M\left(x^j_k, x^l_k\right) \ge D_M\left(x^i_k, x^l_k\right) + D_M\left(x^j_k, x^l_k\right) \ge D_M\left(x^i_k, x^l_k\right)
$$
Case 5. $\dfrac{M^l_k}{A^l_k} \le \dfrac{M^i_k}{A^i_k} \le \dfrac{M^j_k}{A^j_k}$
$$
D_M\left(x^j_k, x^l_k\right) \ge D_M\left(x^i_k, x^l_k\right)
$$
$$
D_M\left(x^i_k, x^j_k\right) + D_M\left(x^j_k, x^l_k\right) \ge D_M\left(x^i_k, x^j_k\right) + D_M\left(x^i_k, x^l_k\right) \ge D_M\left(x^i_k, x^l_k\right)
$$
Case 6. $\dfrac{M^l_k}{A^l_k} \le \dfrac{M^j_k}{A^j_k} \le \dfrac{M^i_k}{A^i_k}$
$$
\left(D_M\left(x^i_k, x^j_k\right) + D_M\left(x^j_k, x^l_k\right)\right)^2 = \left(\frac{M^i_k}{A^i_k} - \frac{M^l_k}{A^l_k}\right) + 2 D_M\left(x^i_k, x^j_k\right) D_M\left(x^j_k, x^l_k\right)
$$
$$
D_M\left(x^i_k, x^j_k\right) + D_M\left(x^j_k, x^l_k\right) = \left(D_M\left(x^i_k, x^l_k\right)^2 + 2 D_M\left(x^i_k, x^j_k\right) D_M\left(x^j_k, x^l_k\right)\right)^{\frac{1}{2}} \ge D_M\left(x^i_k, x^l_k\right)
$$
Since all three items are true, the distance is a metric.
APPENDIX B
A NOTE ABOUT UPDATING A PORTION OF THE
PARTICLE SET AND CONVERGENCE
As discussed in Chapter 4, the efficiency of the particle filter can be improved by not updating all the particles at each step. Instead, information that was collected during previous time steps can be used to generate the weights. This can only be done if the measurement model can be rewritten from
$$
p\left(y_k \mid x^i_k\right) \propto p\left(y^G_k \mid x^i_k\right), \qquad (40)
$$
to
$$
p\left(y_k \mid x^i_{k-j}, j\right) \propto \prod_{j} p\left(y^G_k \mid x^i_{k-j}\right), \qquad (41)
$$
where $y^G_k$ is a generic measurement. The effect on convergence is examined.
Reference [22] develops a convergence proof for the particle filter. The proof examines each of the three particle filtering steps, prediction, update, and resampling, and shows that the particle filter converges uniformly to the optimal, Bayesian filter with a rate of $1/N$. Since the efficiency improvement only affects the measurement, only that portion of the convergence proof will be examined.
B.1 Preliminaries and Notation
The normal notation is used:
$(u, v) = \int u v$, $\quad \|u\| = (u, u)$
$g$ is the measurement function.
$\pi^N_{k|k-1}$ is the sample-based prior pdf.
$\pi^N_{k|k}$ is the sample-based posterior pdf.
$\pi^{N,a}_{k|k}$ is the sample-based posterior pdf generated using only part of the particle set.
$\pi_{k|k-1}$ is the optimal prior pdf.
$\pi_{k|k}$ is the optimal posterior pdf.
$c_{k|k-1}$ and $c_{k|k}$ are constants.
$\epsilon_k$ is the error induced in the measurement by only updating part of the particle set.
$\phi$ is a generic, continuously differentiable function that maps from the state space, $\mathbb{R}^{n_x}$, to a probability. This function is intended to reflect the true pdf.
B.2 Discussion
In [22], Crisan shows that if
$$
E\left[\left(\left(\pi^N_{k|k-1}, \phi\right) - \left(\pi_{k|k-1}, \phi\right)\right)^2\right] \le c_{k|k-1} \frac{\|\phi\|}{N}, \qquad (42)
$$
then
$$
E\left[\left(\left(\pi^N_{k|k}, \phi\right) - \left(\pi_{k|k}, \phi\right)\right)^2\right] \le c_{k|k} \frac{\|\phi\|}{N}. \qquad (43)
$$
When only a portion of the particle set is updated, the sample-based posterior becomes $\pi^{N,a}_{k|k}$. This change in measurement model is represented by using $g + \epsilon$ as the measurement function instead of $g$. Clearly, if the optimal filter distribution is also generated using Equation 41, there is no effect on convergence. However, if the actual distribution is generated using Equation 40, the convergence rate will be affected. Those effects will be examined in the generic setting of using inaccurate measurement models.
When an incorrect measurement model is used, the convergence rate will be unaffected if
$$
\left(\pi^N_{k|k-1}, \epsilon\right) = 0. \qquad (44)
$$
In other words, if the error is uncorrelated with the prior, the convergence rate will be unaffected. While this is somewhat intuitive, it will be proven below.
Proof. Only the convergence of the measurement step will be examined. The goal is to show that if Equation 42 holds, then Equation 43 will result when $\left(\pi^N_{k|k-1}, \epsilon\right) = 0$.
One has:
$$
E\left[\left(\left(\pi^N_{k|k}, \phi\right) - \left(\pi_{k|k}, \phi\right)\right)^2\right]^{1/2}
= E\left[\left(\frac{\left(\pi^N_{k|k-1}, \phi (g + \epsilon)\right)}{\left(\pi^N_{k|k-1}, (g + \epsilon)\right)} - \frac{\left(\pi_{k|k-1}, \phi g\right)}{\left(\pi_{k|k-1}, g\right)}\right)^2\right]^{1/2}. \qquad (45)
$$
When the terms are expanded, the result is:
$$
E\left[\left(\left(\pi^N_{k|k}, \phi\right) - \left(\pi_{k|k}, \phi\right)\right)^2\right]^{1/2}
= E\left[\left(\frac{\left(\pi^N_{k|k-1}, \phi g\right) + \left(\pi^N_{k|k-1}, \phi \epsilon\right)}{\left(\pi^N_{k|k-1}, g\right) + \left(\pi^N_{k|k-1}, \epsilon\right)} - \frac{\left(\pi_{k|k-1}, \phi g\right)}{\left(\pi_{k|k-1}, g\right)}\right)^2\right]^{1/2}. \qquad (46)
$$
Since $\left(\pi^N_{k|k-1}, \epsilon\right) = 0$, $\left(\pi^N_{k|k-1}, \phi \epsilon\right) = 0$. This can be shown using integration by parts, $\int a b = a \int b - \int \left(a' \int b\right)$. If $\phi$ is substituted for $a$ and $\pi^N_{k|k-1} \epsilon$ is substituted for $b$, then
$$
\left(\pi^N_{k|k-1}, \phi \epsilon\right) = \phi \left(\pi^N_{k|k-1}, \epsilon\right) - \int \phi' \left(\pi^N_{k|k-1}, \epsilon\right) = 0 - 0. \qquad (47)
$$
Therefore, Equation 46 can be simplified to
$$
E\left[\left(\left(\pi^N_{k|k}, \phi\right) - \left(\pi_{k|k}, \phi\right)\right)^2\right]^{1/2}
= E\left[\left(\frac{\left(\pi^N_{k|k-1}, \phi g\right)}{\left(\pi^N_{k|k-1}, g\right)} - \frac{\left(\pi_{k|k-1}, \phi g\right)}{\left(\pi_{k|k-1}, g\right)}\right)^2\right]^{1/2}. \qquad (48)
$$
The proof in [22] picks up at this point to show that
$$
E\left[\left(\frac{\left(\pi^N_{k|k-1}, \phi g\right)}{\left(\pi^N_{k|k-1}, g\right)} - \frac{\left(\pi_{k|k-1}, \phi g\right)}{\left(\pi_{k|k-1}, g\right)}\right)^2\right]^{1/2} \le c_{k|k} \frac{\|\phi\|}{N}. \qquad (49)
$$
While this result shows that it is possible to use a different measurement model than the optimal filter without degrading the convergence rate, the change in measurement model used in this thesis is also intended to be made to the optimal filter. Therefore, there is no change in convergence rate since the particle filter is only able to converge to the optimal filter. Additionally, the error introduced by not updating the entire particle set would most likely be correlated to the prior. Therefore, this sufficient (but not necessary) condition does not apply.
APPENDIX C
PARTICLE FILTER RESULT MOVIES
Figure 55. Output of the baseline particle filter used to track a soldier in an urban environment. This is a movie; in electronic copies of the thesis, click on the image to play the movie.
Figure 56. Output of the baseline particle filter used to track an SUV in a rural environment. This is a movie; in electronic copies of the thesis, click on the image to play the movie.
Figure 57. Output of the baseline particle filter used to track a van in an urban environment. This is a movie; in electronic copies of the thesis, click on the image to play the movie.
Figure 58. Output of the adaptive particle filter used to track a soldier in an urban environment. This is a movie; in electronic copies of the thesis, click on the image to play the movie.
Figure 59. Output of the adaptive particle filter used to track an SUV in a rural environment. This is a movie; in electronic copies of the thesis, click on the image to play the movie.
Figure 60. Output of the adaptive particle filter used to track a van in an urban environment. This is a movie; in electronic copies of the thesis, click on the image to play the movie.
Figure 61. Onboard imagery collected during flight testing of the adaptive particle filter. This is a movie; in electronic copies of the thesis, click on the image to play the movie.
REFERENCES
[1] Aggarwal, J. and Nandhakumar, N., On the computation of motion from sequences of images - a review, Proceedings of the IEEE, vol. 76, no. 8, pp. 917-935, 1988.
[2] Anderson, B. D. O. and Moore, J. B., Optimal Filtering. Englewood Cliffs, NJ: Prentice Hall, 1979.
[3] Apostoloff, N. and Zelinsky, A., Robust vision based lane tracking using multiple cues and particle filtering, in Proceedings of the IEEE Intelligent Vehicle Symposium, pp. 558-563, 2003.
[4] Arulampalam, M. S., Maskell, S., Gordon, N., and Clapp, T., A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking, IEEE Transactions on Signal Processing, vol. 50, pp. 174-188, February 2002.
[5] Azimi-Sadjadi, B. and Krishnaprasad, P. S., Change detection for nonlinear systems; a particle filtering approach, in Proceedings of the 2002 American Control Conference, vol. 5, pp. 4074-4079, 2002.
[6] Bashi, A. S., Jilkov, V. P., Li, X. R., and Chen, H., Distributed implementations of particle filters, in Proceedings of the Sixth International Conference of Information Fusion, pp. 1164-1171, 2003.
[7] Bernatz, A. and Thielecke, F., Navigation of a low flying VTOL aircraft with the help of a downwards pointing camera, in Proceedings of the AIAA Guidance, Navigation, and Control Conference and Exhibit, 2004.
[8] Boers, Y., On the number of samples to be drawn in particle filtering, in IEE Colloquium on Target Tracking: Algorithms and Applications, pp. 5/1-5/6, November 1999.
[9] Boers, Y. and Driessen, H., A particle-filter-based detection scheme, IEEE Signal Processing Letters, vol. 10, no. 10, pp. 300-302, 2003.
[10] Boers, Y. and Driessen, J., Interacting multiple model particle filter, in IEE Proceedings - Radar, Sonar and Navigation, vol. 150, pp. 344-349, 2003.
[11] Bolic, M., Djuric, P. M., and Hong, S., New resampling algorithms for particle filters, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. II-589-II-592, 2003.
[12] Bolic, M., Hong, S., and Djuric, P. M., Performance and complexity analysis of adaptive particle filtering for tracking applications, in Proceedings of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 853-857, 2002.
[13] Brogan, W. L., Modern Control Theory. Upper Saddle River, NJ: Prentice Hall, third ed., 1991.
[14] Burt, P. and Adelson, E., The Laplacian pyramid as a compact image code, IEEE Transactions on Communications, vol. 31, pp. 532-540, 1983.
[15] Cao, C., Hovakimyan, N., and Evers, J., Active control of visual sensor for aerial tracking, in Proceedings of the AIAA Guidance, Navigation, and Control Conference and Exhibit, 2006.
[16] Checka, N., Wilson, K., Siracusa, M., and Darrell, T., Multiple person and speaker activity tracking with a particle filter, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, vol. 5, pp. V-881-V-884, 2004.
[17] Coates, M. J., Distributed particle filtering for sensor networks, in Proceedings of the International Symposium on Information Processing in Sensor Networks, pp. 99-107, 2004.
[18] Collins, R. T. and Liu, Y., On-line selection of discriminative tracking features, in Proceedings of the Ninth IEEE International Conference on Computer Vision, vol. 1, pp. 346-352, 2003.
[19] Comaniciu, D., Ramesh, V., and Meer, P., Real-time tracking of non-rigid objects using mean shift, in Proceedings of the Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 142-149, 2000.
[20] Corke, P. I., Visual control of robot manipulators - a review, in Visual Servoing (Hashimoto, K., ed.), (Singapore), pp. 1-32, World Scientific, 1993.
[21] Cover, T. and Thomas, J., Elements of Information Theory. New York, NY: John Wiley & Sons, Inc., 1991.
[22] Crisan, D. and Doucet, A., A survey of convergence results on particle filtering, IEEE Transactions on Signal Processing, vol. 50, pp. 736-746, January 2002.
[23] Cui, N., Hong, L., and Layne, J., A comparison of nonlinear filtering approaches with an application to ground target tracking, in Signal Processing, vol. 85, pp. 1469-1492, 2005.
[24] Czyz, J., Ristic, B., and Macq, B., A particle filter for joint detection and tracking of multiple objects in color video sequences, in Proceedings of the 7th International Conference on Information Fusion, pp. 176-182, 2005.
[25] Das, A., Fierro, R., Kumar, V., Ostrowski, J., Spletzer, J., and Taylor, C., A framework for vision based formation control, IEEE Transactions on Robotics, vol. 18, no. 5, pp. 813-825, 2002.
[26] Daum, F. and Huang, J., Curse of dimensionality and particle filters, in Proceedings of the 2003 IEEE Aerospace Conference, pp. 4-1979-4-1993, 2003.
[27] Dittrich, J. and Johnson, E., Multi-sensor navigation system for an autonomous helicopter, in Proceedings of the 21st Digital Avionics Systems Conference, 2002.
[28] Djuric, P. M., Bugallo, M. F., and Miguez, J., Density assisted particle filters for state and parameter estimation, in Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. II-701-II-704, 2004.
[29] Dobrokhodov, V. N., Kaminer, I. I., and Jones, K. D., Vision-based tracking and motion estimation for moving targets using small UAVs, in Proceedings of the AIAA Guidance, Navigation, and Control Conference and Exhibit, 2006.
[30] Doucet, A., de Freitas, N., and Gordon, N., eds., Sequential Monte Carlo Methods in Practice. New York, NY: Springer-Verlag, 2001.
[31] Driessen, H. and Boers, Y., Efficient particle filter for jump Markov nonlinear systems, in IEE Proceedings - Radar, Sonar and Navigation, vol. 152, pp. 323-326, 2005.
[32] Fox, D., KLD-sampling: Adaptive particle filters, in Advances in Neural Information Processing Systems 14, 2002.
[33] Fox, D., Adapting the sample size in particle filters through KLD-sampling, International Journal of Robotics Research, vol. 22, no. 12, pp. 985-1003, 2003.
[34] Gatica-Perez, D., Lathoud, G., McCowan, I., Odobez, J., and Moore, D., Audio-visual speaker tracking with importance particle filters, in Proceedings of the 2003 International Conference on Image Processing, vol. 3, pp. 25-28, 2003.
[35] Gelb, A., ed., Applied Optimal Estimation. Cambridge: The MIT Press, 1984.
[36] Gordon, N. J., Salmond, D. J., and Smith, A. F. M., Novel approach to nonlinear/non-Gaussian Bayesian state estimation, in IEE Proceedings of Radar and Signal Processing, vol. 140, pp. 107-113, April 1993.
[37] Hashimoto, K., A review on vision-based control of robot manipulators, Advanced Robotics, vol. 17, no. 10, pp. 969-991, 2003.
[38] Hayes, M. H., Statistical Digital Signal Processing and Modeling. Hoboken, NJ: John Wiley & Sons, Inc., 1996.
[39] Huang, C.-M., Jean, J.-H., Cheng, Y.-S., and Fu, L.-C., Visual tracking and servoing system design for circling a target of an air vehicle simulated in virtual reality, in Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2393-2398, 2005.
[40] Hue, C., Cadre, J. P. L., and Perez, P., A particle filter to track multiple objects, in Proceedings of the 2001 IEEE Workshop on Multi-Object Tracking, pp. 61-68, 2001.
[41] Hutchinson, S., Hager, G. D., and Corke, P. I., A tutorial on visual servo control, IEEE Transactions on Robotics and Automation, vol. 12, no. 5, pp. 651-670, 1996.
[42] Isard, M. and Blake, A., CONDENSATION - conditional density propagation for visual tracking, International Journal of Computer Vision, vol. 29, no. 1, pp. 5-28, 1998.
[43] Johnson, E. and Kannan, S., Adaptive ight control for an autonomous unmanned
helicopter, in Proceedings of the AIAA Guidance, Navigation, and Control Conference,
2002.
[44] Johnson, E. N., Proctor, A. A., Ha, J., and Tannenbaum, A. R., Visual
search automation for unmanned aerial vehicles, IEEE Transactions on Aerospace
and Electronic Systems, vol. 41, pp. 219–232, 2005.
[45] Johnson, E. N. and Schrage, D. P., System integration and operation of a research
unmanned aerial vehicle, Journal of Aerospace Computing, Information, and
Communication, vol. 1, pp. 5–18, January 2004.
[46] Julier, S. J., Skewed approach to filtering, in Proceedings of SPIE Vol. 3373, Signal
and Data Processing of Small Targets 1998 (Drummond, O. E., ed.), pp. 271–282,
September 1998.
[47] Julier, S., Uhlmann, J., and Durrant-Whyte, H. F., A new method for the
nonlinear transformation of means and covariances in filters and estimators, IEEE
Transactions on Automatic Control, vol. 45, pp. 477–482, March 2000.
[48] Kalman, R. E., A new approach to linear filtering and prediction problems, ASME
Journal of Basic Engineering, vol. 82, series D, pp. 35–45, March 1960.
[49] Kappenman, J., Unmanned aerial system in the U.S. Army, in Proceedings of the
IDGA UAV Summit, 2006.
[50] Karlsson, R. and Gustafsson, F., Monte Carlo data association for multiple target
tracking, in IEEE Target Tracking: Algorithms and Applications, vol. 1, pp. 13/1–13/5,
2001.
[51] Kravaritis, G. and Mulgrew, B., Ground tracking using a variable structure
multiple model particle filter with varying number of particles, in Proceedings of the
2005 IEEE International Radar Conference, pp. 837–841, 2005.
[52] Kumar, R., Sawhney, H., Samarasekera, S., Hsu, S., Tao, H., Guo, Y.,
Hanna, K., Pope, A., Wildes, R., Hirvonen, D., Hansen, M., and Burt, P.,
Aerial video surveillance and exploitation, Proceedings of the IEEE, vol. 89, no. 10,
pp. 1518–1539, 2001.
[53] Kwok, C., Fox, D., and Meila, M., Adaptive real-time particle filters for robot
localization, in Proceedings of the 2003 IEEE International Conference on Robotics
& Automation, vol. 2, pp. 2836–2841, 2003.
[54] Kwok, C., Fox, D., and Meila, M., Real-time particle filters, Proceedings of the
IEEE, vol. 92, pp. 469–484, March 2004.
[55] Li, P. and Chaumette, F., Image cues fusion for object tracking based on particle
filtering, in Proceedings of the Third International Workshop on Articulated Motion
and Deformable Objects, pp. 99–107, 2004.
[56] Li, X. and Zheng, N., Adaptive target color model updating for visual tracking using
particle filter, in IEEE International Conference on Systems, Man and Cybernetics,
pp. 3105–3109, 2004.
[57] Li, Y., Shen, Y., and Liu, Z., A new smoothing particle filter for tracking a maneuvering
target, in Proceedings of the Second International Conference on Machine
Learning and Cybernetics, pp. 1004–1008, 2003.
[58] Liu, F., Liu, Q., and Lu, H., Robust color-based tracking, in Proceedings of the
Third Conference on Image and Graphics, pp. 132–135, 2004.
[59] Liu, J. and West, M., Combined parameter and state estimation in simulation-based
filtering, in Sequential Monte Carlo Methods in Practice (Doucet, A., de Freitas,
N., and Gordon, N., eds.), pp. 197–223, Springer, 2001.
[60] Loy, G., Fletcher, L., Apostoloff, N., and Zelinsky, A., An adaptive fusion
architecture for target tracking, in Proceedings of the Fifth IEEE International
Conference on Automatic Face and Gesture Recognition, pp. 248–253, 2002.
[61] Lucas, B. and Kanade, T., An iterative image registration technique with an application
to stereo vision, in Proceedings of the Seventh International Joint Conference
on Artificial Intelligence, 1981.
[62] Morelande, M. and Challa, S., Manoeuvring target tracking in clutter using particle
filters, IEEE Transactions on Aerospace and Electronic Systems, vol. 41, pp. 252–270,
January 2005.
[63] Morf, M., Dickinson, B., Kailath, T., and Vieira, A., Efficient solution of
covariance equations for linear prediction, IEEE Transactions on Acoustics, Speech,
and Signal Processing, vol. 25, pp. 429–433, October 1977.
[64] Nummiaro, K., Koller-Meier, E., and Gool, L. V., A color-based particle filter,
in First International Workshop on Generative-Model-Based Vision, pp. 53–60, 2002.
[65] Nummiaro, K., Koller-Meier, E., and Gool, L. V., Object tracking with an
adaptive color-based particle filter, in 24th DAGM Symposium on Pattern Recognition,
pp. 353–360, September 2003.
[66] Unmanned aerial vehicles roadmap 2002-2027, tech. rep., Office of the Secretary of
Defense, Washington, DC 20301, December 2002.
[67] Unmanned aerial systems roadmap 2005-2030, tech. rep., Office of the Secretary of
Defense, Washington, DC 20301, August 2005.
[68] Perez, P., Hue, C., and Vermaak, J., Color-based probabilistic tracking, in
Proceedings of the European Conference on Computer Vision, pp. 661–675, May 2002.
[69] Perez, P., Vermaak, J., and Blake, A., Data fusion for visual tracking with
particles, Proceedings of the IEEE, vol. 92, pp. 495–513, March 2004.
[70] Pham, D. T. and Liu, X., Neural Networks for Identification, Prediction and Control.
London: Springer-Verlag, 1995.
[71] Prazenica, R. J., Watkins, A., Kurdila, A. J., Ke, Q. F., and Kanade,
T., Vision-based Kalman filtering for aircraft state estimation and structure from
motion, in Proceedings of the AIAA Guidance, Navigation, and Control Conference
and Exhibit, 2005.
[72] Proctor, A. A. and Johnson, E. N., Vision-only aircraft flight control methods
and test results, in Proceedings of the AIAA Guidance, Navigation, and Control Conference
and Exhibit, 2004.
[73] Rabiner, L. R. and Juang, B. H., An introduction to hidden Markov models,
IEEE Acoustic and Speech Signal Processing Magazine, pp. 4–16, January 1986.
[74] Ristic, B., Arulampalam, S., and Gordon, N., Beyond the Kalman Filter: Particle
Filters for Tracking Applications. Boston: Artech House Publishers, 2004.
[75] Roberts, B. A. and Vallot, L. C., Vision aided inertial navigation, in Record of
the IEEE Position, Location, and Navigation Symposium, pp. 347–352, 1990.
[76] Rui, Y. and Chen, Y., Better proposal distributions: object tracking using unscented
particle filter, in Proceedings of the 2001 IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, vol. 2, pp. II-786–II-793, 2001.
[77] Shan, C., Wei, Y., Tan, T., and Ojardias, F., Real time hand tracking by combining
particle filtering and mean shift, in Proceedings of the Sixth IEEE International
Conference on Automatic Face and Gesture Recognition, pp. 669–674, 2004.
[78] Shirai, Y. and Inoue, H., Guiding a robot by visual feedback in assembly tasks,
Pattern Recognition, vol. 5, pp. 99–108, 1973.
[79] Silveira, G. F., Carvalho, J. R. H., Madrid, M. K., Rives, P., and Bueno,
S. S., A fast vision-based road following strategy applied to the control of aerial
robots, in Proceedings of the XIV Brazilian Symposium on Computer Graphics and
Image Processing, pp. 226–231, 2001.
[80] Stepanyan, V. and Hovakimyan, N., Visual tracking of a maneuvering target, in
Proceedings of the AIAA Guidance, Navigation, and Control Conference and Exhibit,
2006.
[81] Triesch, J. and von der Malsburg, C., Self-organized integration of adaptive
visual cues for face tracking, in Proceedings of the Fourth IEEE International Conference
on Automatic Face and Gesture Recognition, pp. 102–107, March 2000.
[82] Vaswani, N., Bound on errors in particle filtering with incorrect model assumptions
and its implication for change detection, in Proceedings of the International Conference
on Acoustics, Speech, and Signal Processing, vol. 2, pp. II-729–II-732, 2004.
[83] Vaswani, N., Roy-Chowdhury, A. K., and Chellappa, R., Shape Activity: a
continuous-state HMM for moving/deforming shapes with application to abnormal
activity detection, IEEE Transactions on Image Processing, vol. 14, pp. 1603–1616,
October 2005.
[84] Vaswani, N., Change Detection in Stochastic Shape Dynamical Models with Applica-
tions in Activity Modeling and Abnormality Detection. PhD dissertation, University of
Maryland, College Park, 2004.
[85] Vermaak, J., Perez, P., Gangnet, M., and Blake, A., Towards improved obser-
vation models for visual tracking: selective adaptation, in Proceedings of the European
Conference on Computer Vision, May 2002.
[86] Wang, I. H., Dobrokhodov, V. N., Kaminer, I. I., and Jones, K. D., On
vision-based target tracking and range estimation for small UAVs, in Proceedings of
the AIAA Guidance, Navigation, and Control Conference and Exhibit, 2006.
[87] Webb, T. P., Prazenica, R. J., Kurdila, A. J., and Lind, R., Vision-based
state estimation for uninhabited aerial vehicles, in Proceedings of the AIAA Guidance,
Navigation, and Control Conference and Exhibit, 2005.
[88] Wu, A. D., Johnson, E. N., and Proctor, A. A., Vision-aided inertial navigation
for flight control, in Proceedings of the AIAA Guidance, Navigation, and Control
Conference and Exhibit, 2005.
[89] Wu, Y. and Huang, T. S., Robust visual tracking by integrating multiple cues
based on co-inference learning, International Journal of Computer Vision, vol. 58,
no. 1, pp. 55–71, 2004.
[90] Yang, C., Duraiswami, R., and Davis, L., Fast multiple object tracking via a
hierarchical particle filter, in Proceedings of the Tenth IEEE International Conference
on Computer Vision, 2005.
[91] Zhou, S., Chellappa, R., and Moghaddam, B., Visual tracking and recognition
using appearance-adaptive models in particle filters, IEEE Transactions on Image
Processing, vol. 13, pp. 1491–1506, November 2004.
RELATED PUBLICATIONS
LUDINGTON, B., REIMANN, J., and VACHTSEVANOS, G., Target tracking and adversarial
reasoning for unmanned aerial vehicles, in Proceedings of the IEEE Aerospace Conference, (Big
Sky, MT), March 2007, abstract accepted.
LUDINGTON, B., JOHNSON, E., and VACHTSEVANOS, G., Augmenting UAV autonomy:
vision-based navigation and target tracking for unmanned aerial vehicles, IEEE Robotics and
Automation Magazine, September 2006.
LUDINGTON, B., REIMANN, J., BARLAS, I., and VACHTSEVANOS, G., Target tracking with
unmanned aerial vehicles: from single to swarm vehicle autonomy and intelligence, in Proceedings
of the 14th Mediterranean Conference on Control and Automation, (Ancona, Italy), June 2006.
VACHTSEVANOS, G., LUDINGTON, B., REIMANN, J., ANTSAKLIS, P., and VALAVANIS, K.,
Modeling and control of unmanned aerial vehicles - current status and future directions, in
Proceedings of the Workshop on Modeling and Control of Complex Systems, (Los, Cyprus), 2005.
HEGAZY, T., LUDINGTON, B., and VACHTSEVANOS, G., Reconnaissance and surveillance in
urban terrain with unmanned aerial vehicles, in Proceedings of the 16th IFAC World Congress,
(Prague, Czech Republic), July 2005.
LUDINGTON, B., TANG, L., and VACHTSEVANOS, G., Target tracking in an urban warfare
environment using particle filters, in Proceedings of the IEEE Aerospace Conference, (Big Sky,
MT), March 2005.