We define a mapping $[\cdot]_\times$ from a three-dimensional vector to a 3 by 3 matrix:
$$\left[(x_1, x_2, x_3)^T\right]_\times = \begin{pmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{pmatrix}. \quad (1)$$
Using this mapping, we can express the cross product of two vectors as the multiplication of a 3 by 3 matrix and a column vector:
$$X \times Y = [X]_\times Y. \quad (2)$$
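As a quick sanity check of eqs. (1)-(2), the mapping can be sketched in a few lines of Python (NumPy assumed available; the name `skew` is ours, not the paper's):

```python
import numpy as np

def skew(x):
    """Return the 3x3 skew-symmetric matrix [x]_x of eq. (1)."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

X = np.array([1.0, 2.0, 3.0])
Y = np.array([-2.0, 0.5, 4.0])
# eq. (2): the cross product equals [X]_x applied to Y
assert np.allclose(skew(X) @ Y, np.cross(X, Y))
```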
The reference system $s$, in which the coordinates of the vector $x$ are expressed, is reported as a superscript on the upper-left corner of $x$, with the notation $^s x$.
The reference systems considered in the paper are the following:
- $i$: inertial system;
- $e$: the ECEF (Earth Centered Earth Fixed) system;
- $n$: the NED (North East Down) system, tangent to the earth ellipsoid at a reference latitude and longitude $(\mathrm{Lat}_0, \mathrm{Lon}_0)$;
- $b$: the body system as seen by the IMU (Inertial Measurement Unit);
- $c$: the camera system, with fixed orientation with respect to the IMU.
III. CAMERA EGO-MOTION
A. Coordinate System
The default coordinate system in which the UAV position, i.e. (Lat, Lon, Alt) of the vehicle center of mass, is given is the ECEF system. $C^e_n$ is the reference change matrix from NED to ECEF. The attitude and heading of the vehicle is given as $C^n_b(\phi, \theta, \psi)$, where $C^n_b$ is the reference change matrix associated with the roll ($\phi$), pitch ($\theta$), and yaw ($\psi$) angles.
In general, the camera standard reference system might be rotated with respect to the body reference system. Thus, the transformation from the camera reference to ECEF is given by
$$C^e_c = C^e_n(\mathrm{Lat}, \mathrm{Lon})\; C^n_b(\phi, \theta, \psi)\; C^b_c \quad (3)$$
where $C^b_c$ represents the constant reference change matrix from the camera system $c$ to the body system $b$.
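A minimal sketch of the reference-change chain in (3). The zyx Euler sequence used for $C^n_b$ below is an assumed (common aerospace) convention, and $C^e_n$, $C^b_c$ are identity placeholders, since their values depend on Lat/Lon and on the camera mounting:

```python
import numpy as np

def C_n_b(phi, theta, psi):
    """Body-to-NED direction cosine matrix for roll/pitch/yaw
    (zyx rotation sequence, an assumed convention)."""
    cf, sf = np.cos(phi), np.sin(phi)
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(psi), np.sin(psi)
    Rz = np.array([[cp, -sp, 0], [sp, cp, 0], [0, 0, 1.0]])
    Ry = np.array([[ct, 0, st], [0, 1.0, 0], [-st, 0, ct]])
    Rx = np.array([[1.0, 0, 0], [0, cf, -sf], [0, sf, cf]])
    return Rz @ Ry @ Rx

# eq. (3) is then a plain chain of matrix products
C_e_n, C_b_c = np.eye(3), np.eye(3)      # placeholders
C_e_c = C_e_n @ C_n_b(0.05, -0.02, 1.2) @ C_b_c
assert np.allclose(C_e_c @ C_e_c.T, np.eye(3))   # still a rotation
```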
B. Essential Matrix
Suppose we have a calibrated camera, i.e. with known
intrinsic parameters, moving in a static environment following
an unknown trajectory. Consider two images taken by the camera at different times. The equation linking two corresponding image points is
$$p'^T E\, p = 0, \quad (4)$$
where $p' = [u', v', 1]^T$ and $p = [u, v, 1]^T$ are the (normalized) homogeneous coordinates in the second and the first camera reference frame, respectively. The matrix $E = [r^{c_2}_{21}]_\times\, C^{c_2}_{c_1}$ is called the essential matrix and contains the information about the camera movement. The essential matrix depends on 3 parameters for the rotation and 2 for the translation (since eq. (4) does not depend on the modulus of $r^{c_2}_{21}$).
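A small numerical check of eq. (4): build $E$ from an assumed relative motion, generate synthetic correspondences, and verify that the epipolar constraint vanishes (all values below are invented for the test):

```python
import numpy as np
rng = np.random.default_rng(0)

def skew(x):
    return np.array([[0, -x[2], x[1]], [x[2], 0, -x[0]], [-x[1], x[0], 0.0]])

# assumed relative motion between the two camera poses
a = 0.1
R = np.array([[np.cos(a), -np.sin(a), 0],
              [np.sin(a),  np.cos(a), 0],
              [0, 0, 1.0]])            # small yaw rotation
t = np.array([0.3, -0.1, 0.05])        # translation r_21
E = skew(t) @ R                        # essential matrix, E = [r]_x C

# random 3D points in front of the first camera
P1 = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))
P2 = (R @ P1.T).T + t                  # same points in the second frame
p1 = np.c_[P1[:, 0]/P1[:, 2], P1[:, 1]/P1[:, 2], np.ones(len(P1))]
p2 = np.c_[P2[:, 0]/P2[:, 2], P2[:, 1]/P2[:, 2], np.ones(len(P2))]
residuals = np.einsum('ij,jk,ik->i', p2, E, p1)   # p'^T E p per pair
assert np.max(np.abs(residuals)) < 1e-9
```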
Here $C^{c_2}_{c_1} = R(d\phi, d\theta, d\psi)$ is the rotation between the two camera poses and $r^{c_2}_{21}$ the corresponding translation, with $d\phi = \phi_2 - \phi_1$, $d\theta = \theta_2 - \theta_1$ and $d\psi = \psi_2 - \psi_1$.
The expression of $R$ is given in (10) at the top of the next page. It can be approximated for small angle variations by
$$R(d\phi, d\theta, d\psi) \approx \begin{pmatrix} 1 & d\psi & -d\theta \\ -d\psi & 1 & d\phi \\ d\theta & -d\phi & 1 \end{pmatrix} = I - W\,dt \quad (11)$$
where $\omega = [d\phi, d\theta, d\psi]^T/dt$ is the angular velocity (in the body reference system) and $W = [\omega]_\times$.
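The small-angle approximation (11) can be verified numerically against the exact rotation matrix of (10); the zyx Euler form used below is our reading of (10), so adjust if the paper's convention differs:

```python
import numpy as np

def skew(x):
    return np.array([[0, -x[2], x[1]], [x[2], 0, -x[0]], [-x[1], x[0], 0.0]])

def R_exact(dphi, dtheta, dpsi):
    # rotation matrix of eq. (10) (zyx Euler sequence, as reconstructed)
    cf, sf = np.cos(dphi), np.sin(dphi)
    ct, st = np.cos(dtheta), np.sin(dtheta)
    cp, sp = np.cos(dpsi), np.sin(dpsi)
    return np.array([
        [ct*cp,            ct*sp,            -st],
        [sf*st*cp - cf*sp, sf*st*sp + cf*cp, sf*ct],
        [cf*st*cp + sf*sp, cf*st*sp - sf*cp, cf*ct]])

d = np.array([1e-3, -2e-3, 1.5e-3])      # small angle increments (rad)
approx = np.eye(3) - skew(d)             # I - W dt with omega*dt = d
# the two agree to second order in the angle increments
assert np.max(np.abs(R_exact(*d) - approx)) < 1e-5
```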
Once $r^{c_2}_{21}$ and $C^{c_2}_{c_1}$ are estimated from the essential matrix $E$, they give information on $v(t)$ and $\omega(t)$, respectively. Such information can be used either as a prediction or as a measurement for the Kalman Filter. Similarly, the IMU output can be used as a priori information for the vision algorithms.
IV. COMPUTATION OF E AND ERROR ANALYSIS
A. Fast computation of image point correspondences
To calculate the essential matrix by video analysis, the point correspondences between two successive frames can be calculated using a computer vision system based on the Lucas-Kanade method [30]. This method assumes that the displacements of the points belonging to the image plane are small and approximately constant. Thus the velocity of the points in the image plane must satisfy the Optical Flow equation
$$I_u(p)\,\dot{u} + I_v(p)\,\dot{v} = -\dot{I}(p) \quad (12)$$
where $p = (u, v)$ is a point in the image plane, $I_u$, $I_v$ are the components of the spatial gradient of the image intensity in the image plane, and $\dot{x}$ is the notation for the temporal derivative of $x$.
The Lucas-Kanade method [30] can be used only when the image flow vector between the two frames is small enough for the differential equation of the optical flow to hold. When this condition does not hold, a rough preliminary prediction of the optical flow must be performed with other methods, and the Lucas-Kanade method can be used for refinement only. When external sensors, like IMU or GNSS (Global Navigation Satellite System), are available, the camera motion is predictable, so that a rough estimate of the optical flow can be easily initialized starting from the sensor measurements [31]. To accomplish this task, the 3D geometry of the scene should be known, at least approximately. This is the case for the Earth surface, which can be modeled as an ellipsoid, a geoid or a digital elevation model. Using this prior information, the computational burden of the optical flow estimation can be strongly reduced, so that the delay of the video processing can be made compatible with the application of the proposed sensor fusion technology to the navigation scenario.
To further reduce the computational burden of the computer vision algorithms, an improvement of the Lucas-Kanade method, the Kanade-Lucas-Tomasi (KLT) feature matching algorithm [32], has been adopted. The KLT has been improved using the ellipsoid model of the Earth surface, increasing the computation speed of the Optical Flow.
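To make the single-window estimate concrete, here is a minimal NumPy sketch (not the pyramidal implementation of [30], [32]) that stacks eq. (12) over every pixel of a patch and solves the overdetermined system in least squares; the synthetic image and the name `lk_flow` are ours:

```python
import numpy as np

def lk_flow(I0, I1):
    """Lucas-Kanade over a single window: stack eq. (12) for each pixel
    and solve for one common flow vector (du, dv)."""
    Iv, Iu = np.gradient(I0)             # d/d(row) = v, d/d(col) = u
    It = I1 - I0
    # use interior pixels to avoid one-sided gradients at the borders
    A = np.c_[Iu[1:-1, 1:-1].ravel(), Iv[1:-1, 1:-1].ravel()]
    b = -It[1:-1, 1:-1].ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow                          # (du, dv)

# synthetic smooth image shifted by a known sub-pixel displacement
u, v = np.meshgrid(np.arange(64.0), np.arange(64.0))
shift = (0.2, 0.1)
I0 = np.sin(0.3*u) + np.cos(0.2*v)
I1 = np.sin(0.3*(u - shift[0])) + np.cos(0.2*(v - shift[1]))
du, dv = lk_flow(I0, I1)
assert abs(du - shift[0]) < 0.05 and abs(dv - shift[1]) < 0.05
```

For larger displacements this single-shot solve breaks down, which is exactly why the text initializes the flow from the IMU/GNSS prediction and uses Lucas-Kanade only for refinement.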
B. Essential matrix calculation
Given a set of correspondences of image points for a couple of successive frames, our objective is to find the essential matrix E. Each point correspondence gives one equation (4), which is linear and homogeneous in the elements of E; k point correspondences give k such equations. Let
$$E = \begin{pmatrix} e_1 & e_4 & e_7 \\ e_2 & e_5 & e_8 \\ e_3 & e_6 & e_9 \end{pmatrix}, \qquad e = (e_1, e_2, \ldots, e_9)^T. \quad (13)$$
Rewriting equation (4) in the elements of E using k point
correspondences, we have
Ae = 0. (14)
where A is
$$A = \begin{pmatrix}
u'_1 u_1 & v'_1 u_1 & u_1 & u'_1 v_1 & v'_1 v_1 & v_1 & u'_1 & v'_1 & 1 \\
u'_2 u_2 & v'_2 u_2 & u_2 & u'_2 v_2 & v'_2 v_2 & v_2 & u'_2 & v'_2 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u'_k u_k & v'_k u_k & u_k & u'_k v_k & v'_k v_k & v_k & u'_k & v'_k & 1
\end{pmatrix} \quad (15)$$
In the presence of noise, A is a full rank matrix. Therefore we solve for the unit vector h such that
$$\|Ah\| = \min. \quad (16)$$
$$R(d\phi, d\theta, d\psi) = \begin{pmatrix}
\cos d\theta \cos d\psi & \cos d\theta \sin d\psi & -\sin d\theta \\
\sin d\phi \sin d\theta \cos d\psi - \cos d\phi \sin d\psi & \sin d\phi \sin d\theta \sin d\psi + \cos d\phi \cos d\psi & \sin d\phi \cos d\theta \\
\cos d\phi \sin d\theta \cos d\psi + \sin d\phi \sin d\psi & \cos d\phi \sin d\theta \sin d\psi - \sin d\phi \cos d\psi & \cos d\phi \cos d\theta
\end{pmatrix} \quad (10)$$
The solution h is the unit eigenvector of $A^T A$ associated with the smallest eigenvalue. Then E is determined by
$$E = [E_1 \;\, E_2 \;\, E_3] = \sqrt{2} \begin{pmatrix} h_1 & h_4 & h_7 \\ h_2 & h_5 & h_8 \\ h_3 & h_6 & h_9 \end{pmatrix}. \quad (17)$$
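Eqs. (13)-(17) amount to a homogeneous least-squares problem; below is a NumPy sketch on noise-free synthetic data (the motion and points are invented for the test, and the row ordering of $A$ matches the column-wise stacking of $e$ in (13)):

```python
import numpy as np
rng = np.random.default_rng(1)

def skew(x):
    return np.array([[0, -x[2], x[1]], [x[2], 0, -x[0]], [-x[1], x[0], 0.0]])

# assumed ground-truth motion, used only to generate correspondences
a = 0.15
R = np.array([[np.cos(a), -np.sin(a), 0],
              [np.sin(a),  np.cos(a), 0],
              [0, 0, 1.0]])
t = np.array([0.2, 0.1, -0.05])
E_true = skew(t) @ R

P1 = rng.uniform([-1, -1, 4], [1, 1, 8], size=(30, 3))   # points, frame 1
P2 = (R @ P1.T).T + t                                    # same points, frame 2
u, v = P1[:, 0] / P1[:, 2], P1[:, 1] / P1[:, 2]
up, vp = P2[:, 0] / P2[:, 2], P2[:, 1] / P2[:, 2]

# one row per correspondence, ordered to match the stacking of e in (13)
A = np.c_[up*u, vp*u, u, up*v, vp*v, v, up, vp, np.ones_like(u)]
w, V = np.linalg.eigh(A.T @ A)        # eigh sorts eigenvalues ascending
h = V[:, 0]                           # eigenvector of the smallest eigenvalue
E_est = h.reshape(3, 3, order='F')    # refold column-wise as in (13)

# the estimate matches E up to the unavoidable global scale and sign
E_hat = E_est / np.linalg.norm(E_est)
E_ref = E_true / np.linalg.norm(E_true)
assert min(np.abs(E_hat - E_ref).max(), np.abs(E_hat + E_ref).max()) < 1e-6
```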
C. Getting motion parameters from E
Once the essential matrix is obtained, the Huang and Faugeras theorem [33] allows factorizing E into a rotation ($C^{c_2}_{c_1}$) and a translation ($r^{c_2}_{21}$). Let $E = UDV^T$ and define
$$S = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad R' = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (18)$$
Then, there are four possible factorizations for $E = [r^{c_2}_{21}]_\times C^{c_2}_{c_1}$, given by
$$[r^{c_2}_{21}]_\times = U(\pm S)U^T \quad (19)$$
$$C^{c_2}_{c_1} = \beta\, U R' V^T \quad \text{or} \quad C^{c_2}_{c_1} = \beta\, U R'^T V^T \quad (20)$$
where
$$\beta = \det\left(UV^T\right). \quad (21)$$
As observed in [34], the right factorization can be determined by imposing that the 3D points (which can be found by triangulation) must always lie in front of the camera, i.e., their z-coordinate must be positive.
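A sketch of the four-fold factorization (18)-(21) via SVD, written in the form popularized by Hartley and Zisserman, which should be equivalent to the Huang-Faugeras construction up to notation; the ground-truth motion is invented for the test:

```python
import numpy as np

def skew(x):
    return np.array([[0, -x[2], x[1]], [x[2], 0, -x[0]], [-x[1], x[0], 0.0]])

# assumed ground-truth motion, used only to generate a test E
a = 0.15
R_true = np.array([[np.cos(a), -np.sin(a), 0],
                   [np.sin(a),  np.cos(a), 0],
                   [0, 0, 1.0]])
t_true = np.array([0.2, 0.1, -0.05])
E = skew(t_true) @ R_true

# SVD-based factorization; W plays the role of the R' matrix in (18)
U, D, Vt = np.linalg.svd(E)
W = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1.0]])
cands = []
for Rc in (U @ W @ Vt, U @ W.T @ Vt):
    if np.linalg.det(Rc) < 0:          # enforce a proper rotation (det = +1)
        Rc = -Rc
    cands.append(Rc)
t_hat = U[:, 2]                        # translation direction, up to sign

# the true rotation is among the two candidates; the translation is
# recovered up to sign and scale (the cheirality check of [34] picks one)
assert any(np.allclose(Rc, R_true) for Rc in cands)
assert np.isclose(abs(t_hat @ t_true), np.linalg.norm(t_true))
```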
D. Error analysis
In order to perform an error analysis, we want to relate the errors in the correspondence matrix A to the solution of the minimization problem (16). The sources of errors in the image coordinates include spatial quantization, feature detector errors, point mismatching and camera distortion. In a system that is well calibrated, so that systematic errors are negligible, the errors in the image coordinates of a feature can be modeled by random variables. These errors result in errors in the estimates of the motion parameters.
The covariance matrix of e is
$$\Gamma_e = 2\Gamma_h = 2\, D_h \Gamma_{A^T} D_h^T, \quad (22)$$
where $\Gamma_{A^T}$ is the covariance matrix of $A^T$ and
$$D_h = G_h\, G_{A^T A} \quad (23)$$
$$G_{A^T A} = [F_{ij}] + [G_{ij}], \quad (24)$$
where $[F_{ij}]$ and $[G_{ij}]$ are block matrices with 9 by 9 submatrices $F_{ij}$ and $G_{ij}$, respectively. $F_{ij} = a_{ij} I_9$, and $G_{ij}$ is a 9 by 9 matrix with the $i$th column equal to the column vector $A_j$ and all other columns zero. The matrix $G_h$ is given by
$$G_h = S \Lambda S^T \left[ h_1 I_9 \;\; h_2 I_9 \;\; \ldots \;\; h_9 I_9 \right], \quad (25)$$
where the columns of $S$ are the eigenvectors of $A^T A$ (ordered in nondecreasing order of the eigenvalues), $\Lambda = \mathrm{diag}\{0, (\lambda_1 - \lambda_2)^{-1}, \ldots, (\lambda_1 - \lambda_9)^{-1}\}$, with $\lambda_i$ the eigenvalues of $A^T A$, and $[h_1, \ldots, h_9]^T$ is the solution of the minimization problem (16).
Assume that the errors between different points and different components in the image coordinates are uncorrelated and have the same variance $\sigma^2$. With this assumption we get
$$\Gamma_{A^T} = \sigma^2\, \mathrm{diag}\{\Gamma_1, \Gamma_2, \ldots, \Gamma_n\}, \quad (26)$$
where $\Gamma_i$, $1 \le i \le n$, is a 9 by 9 submatrix:
$$\Gamma_i = \begin{pmatrix} p_i p_i^T & 0 & 0 \\ 0 & p_i p_i^T & 0 \\ 0 & 0 & p_i p_i^T \end{pmatrix} + \begin{pmatrix} u_i^2 J & u_i v_i J & u_i J \\ u_i v_i J & v_i^2 J & v_i J \\ u_i J & v_i J & J \end{pmatrix}, \quad (27)$$
where
$$J = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \quad (28)$$
V. KALMAN FILTER
The final purpose of the proposed approach is the estimation of the position and attitude of the UAV. The data fusion algorithm is based on the Extended Kalman Filter (EKF), because the dynamic and observation equations are non-linear in their original form.
The state vector includes the position and speed of the UAV in the tangent reference frame, and the rotation matrix from the body to the tangent reference frame. The EKF is applied to estimate the state variables. The input measurements for the prediction of the state variables are the inertial accelerations and angular speeds processed by an IMU.
The fusion algorithm is composed of three EKFs, as sketched in Figure 1. With reference to Figure 1:
- the GPS-IMU EKF block uses linear accelerations provided by the IMU and position and speed provided by GPS in order to compute position and speed of the aircraft center of gravity;
- the VIDEO-IMU(1) block is based on angular speed measurements provided by the IMU and attitude differences measured by the video system in order to estimate aircraft attitude. Attitude estimates are provided to the other two EKF blocks;
- the VIDEO-IMU(2) block takes the linear accelerations provided by the IMU as in GPS-IMU, but it uses as observables the position differences provided by the video system, in order to estimate position and speed of the aircraft center of gravity. This filter is used only during GPS outages in order to propagate position and speed. When it is not activated, its output is set equal to zero.
Fig. 1. Proposed fusion algorithm block diagram.
For the sake of simplicity, we will consider the IMU installed at the center of gravity of the aircraft (in order to eliminate lever arm effects) with its axes aligned with the body axes. Euler angles are used to represent aircraft attitude. Therefore, we have
$$\dot{v}^n_{cg} = C^n_b f^b + g^n \quad (29)$$
$$\dot{r}^n_{cg} = v^n_{cg} \quad (30)$$
and
$$\left(\dot{\phi}, \dot{\theta}, \dot{\psi}\right)^T = J(\phi, \theta)\, \omega^b_{nb} \quad (31)$$
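A one-step numerical sketch of the prediction equations (29)-(30), with assumed values (level attitude, accelerometer sensing only the reaction to gravity); simple Euler integration stands in for whatever discretization is actually used:

```python
import numpy as np

dt = 0.01
g_n = np.array([0.0, 0.0, 9.81])     # gravity in NED (down positive)
C_nb = np.eye(3)                     # level attitude (assumed)
f_b = np.array([0.0, 0.0, -9.81])    # specific force at unaccelerated flight
v = np.array([20.0, 0.0, 0.0])       # 20 m/s north
r = np.zeros(3)
for _ in range(100):                 # integrate 1 s of flight
    v = v + (C_nb @ f_b + g_n) * dt  # eq. (29)
    r = r + v * dt                   # eq. (30)
# specific force cancels gravity, so speed is constant and the
# aircraft covers 20 m north in one second
assert np.allclose(v, [20.0, 0.0, 0.0])
assert np.allclose(r, [20.0, 0.0, 0.0])
```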
Starting from these non-linear equations, the perturbed form of the EKF prediction equations is derived. Output equations are based on the following predicted GPS measurements:
$$p^s_{GPS} = C^s_n \left( r^n_{cg} + C^n_b\, r^b_{cg-GPS} \right) \quad (32)$$
$$v^n_{GPS} = v^n_{cg} + C^n_b \left( \omega^b_{nb} \times r^b_{cg-GPS} \right). \quad (33)$$
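The lever-arm term of eq. (33) can be checked numerically; the numbers below (level attitude, a 2 m y-body lever arm, a 0.5 rad/s yaw rate) are assumptions for illustration:

```python
import numpy as np

def skew(x):
    return np.array([[0, -x[2], x[1]], [x[2], 0, -x[0]], [-x[1], x[0], 0.0]])

C_nb = np.eye(3)                          # level attitude (assumed)
v_cg = np.array([20.0, 0.0, 0.0])         # center-of-gravity speed, NED
omega_b = np.array([0.0, 0.0, 0.5])       # 0.5 rad/s yaw rate
lever = np.array([0.0, 2.0, 0.0])         # antenna 2 m along the y-body axis
# eq. (33): the antenna picks up the extra term omega x r
v_gps = v_cg + C_nb @ (skew(omega_b) @ lever)
assert np.allclose(v_gps, [19.0, 0.0, 0.0])
```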
For the video system, considering that it provides delta-attitudes (derived from the essential matrix), a video-based pseudo-attitude is built starting from an initial known value:
$$\begin{pmatrix} \phi \\ \theta \\ \psi \end{pmatrix}_{video(k)} = \begin{pmatrix} \phi \\ \theta \\ \psi \end{pmatrix}_{video(k-1)} + \begin{pmatrix} d\phi \\ d\theta \\ d\psi \end{pmatrix}_{video(k)} \quad (34)$$
The output equation for the estimation of speed based on video measurements is:
$$p^t\, |V| = v^n_{cg}, \quad (35)$$
where $p^t$ is provided by the video system and represents the unit vector in the direction of the movement, $|V|$ is the Euclidean norm of the speed, considered constant (it is the last valid value obtained from the GPS-IMU sensor fusion algorithm), and $\Delta t$ is the time difference between two measurements of the video system.
VI. NUMERICAL ANALYSIS
Two types of sensor configurations have been considered. The first one refers to a big lever arm between the GPS antenna and the aircraft center of gravity (10 meters on each axis), while the second one is a more typical configuration because it considers a lever arm only along the y-body axis, with a magnitude of two meters.
Figure 2 shows the UAV attitude estimation results relative to the first configuration. Attitude is reported in terms of roll, yaw and pitch angles. An improvement of the attitude estimation accuracy is clearly observable, since the accuracy of the video-based estimation is better than the GPS one. Indeed, the static accuracy of the Video System is of the order of only 1 degree, while the GPS accuracy is of the order of 5 degrees.
For the second configuration, results are reported in Figure 3. It can be observed that the $\psi$ angle estimation shows the same behavior as in the first configuration for both the GPS and the video-assisted estimation algorithms. This happens because observability is guaranteed by the lever arm. On the contrary, for the $\theta$ (not reported here for the sake of brevity) and $\phi$ angles, the estimation error with only the GPS is much greater with respect to the previous configuration. This is principally due to the absence of a significant lever arm, in which case the estimation error is that of simple INS measurement integration. However, the IMU+VIDEO algorithm is more robust to the lever arm effect, since it still performs as well as in the first configuration.
The distance between the GPS antenna and the aircraft center of gravity is generally minimized because of the aircraft configuration and the need to reduce cables and wiring. Moreover, a big lever arm introduces errors in the estimation of the aircraft center of gravity position and speed. Figures 4 and 5 show the results of the position and speed estimations with video measurements. At time instant 120 sec a GPS outage with a duration of 120 sec has been simulated. This situation could happen, for example, while traveling through a tunnel or in a wooded area, in indoor navigation and/or in hostile environments where the GPS signal is not available. The use of video measurements allows continuing the estimation of position and speed with good accuracy throughout the sequence. Without the fusion of these measurements, the solution would diverge after a short time, due to the integration of inertial measurement errors.
VII. CONCLUSION
In this paper, an innovative technique for the estimation of the position and attitude of a UAV has been proposed. The main
Fig. 2. Attitude estimation in terms of Euler angles for the big lever arm configuration. From top to bottom: roll ($\phi$), pitch ($\theta$) and yaw ($\psi$) angles. Groundtruth (Red), IMU+GPS estimation (Black) and Video-aided estimation (Blue).
innovative aspect of this technique concerns the introduction of
a vision-based system composed of a low-cost camera device.
The information carried by the camera is then integrated with
classical data coming from the IMU and GPS units in a
sensor fusion algorithm. A linearized Kalman Filter has been
Fig. 3. Attitude estimation in terms of Euler angles for the small lever arm configuration. Roll (up) and yaw (down) angles. Groundtruth (Red), IMU+GPS (Black) and IMU+Video estimation (Blue).
implemented, because the dynamic and observation equations are non-linear in their original form.
The simulation setup is composed of accurate models for realistic UAV flight behavior, sensor simulation and 3D photorealistic rendering. Preliminary results are very encouraging, since they show an improvement of up to one order of magnitude in the attitude accuracy, using an HD camera. Future work is the validation of the simulation results in a real flight test using a mini-UAV platform.
REFERENCES
[1] P. Corke, J. Lobo, and J. Dias, "An introduction to inertial and visual sensing," I. J. Robotic Res., vol. 26, no. 6, pp. 519-535, 2007.
[2] J. Lobo and J. Dias, "Vision and inertial sensor cooperation using gravity as a vertical reference," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1597-1608, Dec. 2003.
[3] J. Lobo and J. Dias, "Inertial sensed ego-motion for 3D vision," J. Robot. Syst., vol. 21, no. 1, pp. 3-12, Jan. 2004.
[4] S. You, U. Neumann, and R. Azuma, "Orientation tracking for outdoor augmented reality registration," IEEE Computer Graphics and Applications, vol. 19, pp. 36-42, 1999.
[5] S. You and U. Neumann, "Fusion of vision and gyro tracking for robust augmented reality registration," in Proceedings of the Virtual Reality 2001 Conference (VR'01). Washington, DC, USA: IEEE Computer Society, 2001, p. 71.
Fig. 4. Velocity estimation in the North-East-Down (NED) reference system. Groundtruth (Black), IMU+Video estimation (Blue).
[6] P. Lang, A. Kusej, A. Pinz, and G. Brasseur, "Inertial tracking for mobile augmented reality," in Proc. of the IMTC 2002, vol. 2, Anchorage, USA, 2002, pp. 1583-1587.
[7] U. Neumann, S. You, J. Hu, B. Jiang, and J. Lee, "Augmented virtual environments (AVE): Dynamic fusion of imagery and 3D models," in Proceedings of the IEEE Virtual Reality 2003 (VR'03). Washington, DC, USA: IEEE Computer Society, 2003, p. 61.
[8] B. Jiang, U. Neumann, and S. You, "A robust hybrid tracking system for outdoor augmented reality," in Proceedings of the IEEE Virtual Reality
2004 (VR'04). Washington, DC, USA: IEEE Computer Society, 2004, p. 3.
Fig. 5. Down position estimation. Top: entire simulation. Bottom: zoom during the GPS outage. Groundtruth (Black), IMU+Video estimation (Blue).
[9] E. D. Dickmanns, "Vehicles capable of dynamic vision: a new breed of technical beings?" Artif. Intell., vol. 103, no. 1-2, pp. 49-76, Aug. 1998.
[10] J. Goldbeck, B. Huertgen, S. Ernst, and L. Kelch, "Lane following combining vision and DGPS," Image and Vision Computing, vol. 18, no. 5, pp. 425-433, 2000.
[11] T. Hague, J. A. Marchant, and N. D. Tillett, "Ground based sensing systems for autonomous agricultural vehicles," Computers and Electronics in Agriculture, vol. 25, no. 1-2, pp. 11-28, 2000.
[12] T. Goedemé, M. Nuttin, T. Tuytelaars, and L. V. Gool, "Vision based intelligent wheel chair control: The role of vision and inertial sensing in topological navigation," Journal of Robotic Systems, vol. 21, no. 2, pp. 85-94, 2004.
[13] D. D. Diel, P. DeBitetto, and S. Teller, "Epipolar constraints for vision-aided inertial navigation," Proc. IEEE Motion and Video Computing, pp. 221-228, 2005.
[14] I. Stratmann and E. Solda, "Omnidirectional vision and inertial clues for robot navigation," Journal of Robotic Systems, vol. 21, no. 1, pp. 33-39, 2004.
[15] J. Kim and S. Sukkarieh, "Real-time implementation of airborne inertial-SLAM," Robot. Auton. Syst., vol. 55, no. 1, pp. 62-71, Jan. 2007.
[16] M. Bryson and S. Sukkarieh, "Building a robust implementation of bearing-only inertial SLAM for a UAV," J. Field Robotics, vol. 24, no. 1-2, pp. 113-143, 2007.
[17] J. Nygårds, P. Skoglar, M. Ulvklo, and T. Högström, "Navigation aided image processing in UAV surveillance: Preliminary results and design of an airborne experimental system," J. Robot. Syst., vol. 21, no. 2, pp. 63-72, Feb. 2004.
[18] S. Graovac, "Principles of fusion of inertial navigation and dynamic vision," J. Robot. Syst., vol. 21, no. 1, pp. 13-22, Jan. 2004.
[19] L. Muratet, S. Doncieux, Y. Briere, and J.-A. Meyer, "A contribution to vision-based autonomous helicopter flight in urban environments," Robotics and Autonomous Systems, vol. 50, no. 4, pp. 195-209, 2005.
[20] P. Corke, "An inertial and visual sensing system for a small autonomous helicopter," J. Field Robotics, vol. 21, no. 2, pp. 43-51, 2004.
[21] S. I. Roumeliotis, A. E. Johnson, and J. F. Montgomery, "Augmenting inertial navigation with image-based motion estimation," in ICRA. IEEE, 2002, pp. 4326-4333.
[22] R. Eustice, H. Singh, J. Leonard, M. Walter, and R. Ballard, "Visually navigating the RMS Titanic with SLAM information filters," in Proceedings of Robotics: Science and Systems, Cambridge, USA, June 2005.
[23] A. Huster and S. M. Rock, "Relative position sensing by fusing monocular vision and inertial rate sensors," Ph.D. dissertation, Stanford University, 2003.
[24] M. Dunbabin, K. Usher, and P. I. Corke, "Visual motion estimation for an autonomous underwater reef monitoring robot," in FSR, 2005, pp. 31-42.
[25] A. E. R. Shabayek, C. Demonceaux, O. Morel, and D. Fofi, "Vision based UAV attitude estimation: Progress and insights," Journal of Intelligent & Robotic Systems, vol. 65, pp. 295-308, 2012.
[26] D. Dusha, W. Boles, and R. Walker, "Attitude estimation for a fixed-wing aircraft using horizon detection and optical flow," in Proceedings of the 9th Biennial Conference of the Australian Pattern Recognition Society on Digital Image Computing Techniques and Applications (DICTA '07). Washington, DC, USA: IEEE Computer Society, 2007, pp. 485-492.
[27] S. Thurrowgood, D. Soccol, R. J. D. Moore, D. Bland, and M. V. Srinivasan, "A vision based system for attitude estimation of UAVs," in IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, October 2009, pp. 5725-5730.
[28] V. Sazdovski, P. M. G. Silson, and A. Tsourdos, "Attitude determination from single camera vector observations," in IEEE Conf. of Intelligent Systems. IEEE, July 2010, pp. 49-54.
[29] Z. Zhu, S. Bhattacharya, M. U. D. Haag, and W. Pelgrum, "Using single-camera geometry to perform gyro-free navigation and attitude determination," in IEEE/ION Position Location and Navigation Symposium (PLANS), vol. 24, no. 5. IEEE, May 2010, pp. 49-54.
[30] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI '81), vol. 2. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1981, pp. 674-679.
[31] M. Hwangbo, J.-S. Kim, and T. Kanade, "Inertial-aided KLT feature tracking for a moving camera," in IROS. IEEE, 2009, pp. 1909-1916.
[32] C. Tomasi and T. Kanade, "Detection and tracking of point features," International Journal of Computer Vision, Tech. Rep., 1991.
[33] T. S. Huang and O. D. Faugeras, "Some properties of the E matrix in two-view motion estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 12, pp. 1310-1312, Dec. 1989. [Online]. Available: http://dx.doi.org/10.1109/34.41368
[34] H. C. Longuet-Higgins, "A computer algorithm for reconstructing a scene from two projections," Nature, vol. 293, no. 5828, pp. 133-135, 1981.