We define a mapping $[\cdot]_\times$ from a three-dimensional vector to a 3 by 3 matrix:
$$\left[(x_1, x_2, x_3)^T\right]_\times = \begin{pmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{pmatrix}. \quad (1)$$
Using this mapping, we can express the cross product of two vectors as the multiplication of a 3 by 3 matrix and a column vector:
$$X \times Y = [X]_\times Y. \quad (2)$$
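As a quick sanity check of eqs. (1)-(2), the mapping can be sketched in a few lines of Python (NumPy assumed available; the name `skew` is ours, not the paper's):

```python
import numpy as np

def skew(x):
    """Return the 3x3 skew-symmetric matrix [x]_x of eq. (1)."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

X = np.array([1.0, 2.0, 3.0])
Y = np.array([-2.0, 0.5, 4.0])
# eq. (2): the cross product equals [X]_x applied to Y
assert np.allclose(skew(X) @ Y, np.cross(X, Y))
```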
The reference system $s$, in which the coordinates of the vector $x$ are expressed, is reported as a superscript on the upper-left corner of $x$, with the notation $^s x$.
The reference systems considered in the paper are the following:
- $i$: inertial system;
- $e$: the ECEF (Earth Centered Earth Fixed) system;
- $n$: the NED (North East Down) system, tangent to the earth ellipsoid at a reference latitude and longitude $(\mathrm{Lat}_0, \mathrm{Lon}_0)$;
- $b$: the body system as seen by the IMU (Inertial Measurement Unit);
- $c$: the camera system, with fixed orientation with respect to the IMU.
III. CAMERA EGO-MOTION
A. Coordinate System
The default coordinate system in which the UAV position, i.e. (Lat, Lon, Alt) of the vehicle center of mass, is given is the ECEF system. $C^e_n$ is the reference change matrix from NED to ECEF. The attitude and heading of the vehicle is given as $C^n_b(\phi, \theta, \psi)$, where $C^n_b$ is the reference change matrix associated with the roll ($\phi$), pitch ($\theta$), and yaw ($\psi$) angles.
In general, the camera standard reference system might be rotated with respect to the body reference system. Thus, the transformation from the camera reference to ECEF is given by
$$C^e_c = C^e_n(\mathrm{Lat}, \mathrm{Lon})\; C^n_b(\phi, \theta, \psi)\; C^b_c \quad (3)$$
where $C^b_c$ represents the constant reference change matrix from the camera system $c$ to the body system $b$.
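A minimal sketch of the reference-change chain in (3). The zyx Euler sequence used for $C^n_b$ below is an assumed (common aerospace) convention, and $C^e_n$, $C^b_c$ are identity placeholders, since their values depend on Lat/Lon and on the camera mounting:

```python
import numpy as np

def C_n_b(phi, theta, psi):
    """Body-to-NED direction cosine matrix for roll/pitch/yaw
    (zyx rotation sequence, an assumed convention)."""
    cf, sf = np.cos(phi), np.sin(phi)
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(psi), np.sin(psi)
    Rz = np.array([[cp, -sp, 0], [sp, cp, 0], [0, 0, 1.0]])
    Ry = np.array([[ct, 0, st], [0, 1.0, 0], [-st, 0, ct]])
    Rx = np.array([[1.0, 0, 0], [0, cf, -sf], [0, sf, cf]])
    return Rz @ Ry @ Rx

# eq. (3) is then a plain chain of matrix products
C_e_n, C_b_c = np.eye(3), np.eye(3)      # placeholders
C_e_c = C_e_n @ C_n_b(0.05, -0.02, 1.2) @ C_b_c
assert np.allclose(C_e_c @ C_e_c.T, np.eye(3))   # still a rotation
```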
B. Essential Matrix
Suppose we have a calibrated camera, i.e. with known
intrinsic parameters, moving in a static environment following
an unknown trajectory. Consider two images taken by the camera at different times. The equation linking two corresponding image points is
$$p'^T E\, p = 0, \quad (4)$$
where $p' = [u', v', 1]^T$ and $p = [u, v, 1]^T$ are the (normalized) homogeneous coordinates in the second and the first camera reference frame, respectively. The matrix $E = [r^{c_2}_{21}]_\times\, C^{c_2}_{c_1}$ is called the essential matrix and contains the information about the camera movement. The essential matrix depends on 3 parameters for the rotation and 2 for the translation (since eq. (4) does not depend on the modulus of $r^{c_2}_{21}$).
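A small numerical check of eq. (4): build $E$ from an assumed relative motion, generate synthetic correspondences, and verify that the epipolar constraint vanishes (all values below are invented for the test):

```python
import numpy as np
rng = np.random.default_rng(0)

def skew(x):
    return np.array([[0, -x[2], x[1]], [x[2], 0, -x[0]], [-x[1], x[0], 0.0]])

# assumed relative motion between the two camera poses
a = 0.1
R = np.array([[np.cos(a), -np.sin(a), 0],
              [np.sin(a),  np.cos(a), 0],
              [0, 0, 1.0]])            # small yaw rotation
t = np.array([0.3, -0.1, 0.05])        # translation r_21
E = skew(t) @ R                        # essential matrix, E = [r]_x C

# random 3D points in front of the first camera
P1 = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))
P2 = (R @ P1.T).T + t                  # same points in the second frame
p1 = np.c_[P1[:, 0]/P1[:, 2], P1[:, 1]/P1[:, 2], np.ones(len(P1))]
p2 = np.c_[P2[:, 0]/P2[:, 2], P2[:, 1]/P2[:, 2], np.ones(len(P2))]
residuals = np.einsum('ij,jk,ik->i', p2, E, p1)   # p'^T E p per pair
assert np.max(np.abs(residuals)) < 1e-9
```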
Here $C^{c_2}_{c_1} = R(d\phi, d\theta, d\psi)$ is the rotation between the two camera poses and $r^{c_2}_{21}$ the corresponding translation, with $d\phi = \phi_2 - \phi_1$, $d\theta = \theta_2 - \theta_1$ and $d\psi = \psi_2 - \psi_1$.
The expression of $R$ is given in (10) at the top of the next page. It can be approximated for small angle variations by
$$R(d\phi, d\theta, d\psi) \approx \begin{pmatrix} 1 & d\psi & -d\theta \\ -d\psi & 1 & d\phi \\ d\theta & -d\phi & 1 \end{pmatrix} = I - W\,dt \quad (11)$$
where $\omega = [d\phi, d\theta, d\psi]^T/dt$ is the angular velocity (in the body reference system) and $W = [\omega]_\times$.
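The small-angle approximation (11) can be verified numerically against the exact rotation matrix of (10); the zyx Euler form used below is our reading of (10), so adjust if the paper's convention differs:

```python
import numpy as np

def skew(x):
    return np.array([[0, -x[2], x[1]], [x[2], 0, -x[0]], [-x[1], x[0], 0.0]])

def R_exact(dphi, dtheta, dpsi):
    # rotation matrix of eq. (10) (zyx Euler sequence, as reconstructed)
    cf, sf = np.cos(dphi), np.sin(dphi)
    ct, st = np.cos(dtheta), np.sin(dtheta)
    cp, sp = np.cos(dpsi), np.sin(dpsi)
    return np.array([
        [ct*cp,            ct*sp,            -st],
        [sf*st*cp - cf*sp, sf*st*sp + cf*cp, sf*ct],
        [cf*st*cp + sf*sp, cf*st*sp - sf*cp, cf*ct]])

d = np.array([1e-3, -2e-3, 1.5e-3])      # small angle increments (rad)
approx = np.eye(3) - skew(d)             # I - W dt with omega*dt = d
# the two agree to second order in the angle increments
assert np.max(np.abs(R_exact(*d) - approx)) < 1e-5
```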
Once $r^{c_2}_{21}$ and $C^{c_2}_{c_1}$ are estimated from the essential matrix $E$, they give information on $v(t)$ and $\omega(t)$, respectively. Such information can be used either as a prediction or as a measurement for the Kalman Filter. Similarly, the IMU output can be used as a priori information for the vision algorithms.
IV. COMPUTATION OF E AND ERROR ANALYSIS
A. Fast computation of image point correspondences
To calculate the essential matrix by video analysis, the point correspondences between two successive frames can be calculated using a computer vision system based on the Lucas-Kanade method [30]. This method assumes that the displacements of the points belonging to the image plane are small and approximately constant. Thus the velocity of the points in the image plane must satisfy the Optical Flow equation
$$I_u(p)\,\dot{u} + I_v(p)\,\dot{v} = -\dot{I}(p) \quad (12)$$
where $p = (u, v)$ is a point in the image plane, $I_u$, $I_v$ are the components of the spatial gradient of the image intensity in the image plane, and $\dot{x}$ is the notation for the temporal derivative of $x$.
The Lucas-Kanade method [30] can be used only when the image flow vector between the two frames is small enough for the differential equation of the optical flow to hold. When this condition does not hold, a rough preliminary prediction of the optical flow must be performed with other methods, and the Lucas-Kanade method can be used for refinement only. When external sensors, like IMU or GNSS (Global Navigation Satellite System), are available, the camera motion is predictable, so that a rough estimate of the optical flow can be easily initialized starting from the sensor measurements [31]. To accomplish this task, the 3D geometry of the scene should be known, at least approximately. This is the case for the Earth surface, which can be modeled as an ellipsoid, a geoid or a digital elevation model. Using this prior information, the computational burden of the optical flow estimation can be strongly reduced, so that the delay of the video processing can be made compatible with the application of the proposed sensor fusion technology to the navigation scenario.
To further reduce the computational burden of the computer vision algorithms, an improvement of the Lucas-Kanade method, the Kanade-Lucas-Tomasi (KLT) feature matching algorithm [32], has been adopted. The KLT has been improved using the ellipsoid model of the Earth surface, increasing the computation speed of the Optical Flow.
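To make the single-window estimate concrete, here is a minimal NumPy sketch (not the pyramidal implementation of [30], [32]) that stacks eq. (12) over every pixel of a patch and solves the overdetermined system in least squares; the synthetic image and the name `lk_flow` are ours:

```python
import numpy as np

def lk_flow(I0, I1):
    """Lucas-Kanade over a single window: stack eq. (12) for each pixel
    and solve for one common flow vector (du, dv)."""
    Iv, Iu = np.gradient(I0)             # d/d(row) = v, d/d(col) = u
    It = I1 - I0
    # use interior pixels to avoid one-sided gradients at the borders
    A = np.c_[Iu[1:-1, 1:-1].ravel(), Iv[1:-1, 1:-1].ravel()]
    b = -It[1:-1, 1:-1].ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow                          # (du, dv)

# synthetic smooth image shifted by a known sub-pixel displacement
u, v = np.meshgrid(np.arange(64.0), np.arange(64.0))
shift = (0.2, 0.1)
I0 = np.sin(0.3*u) + np.cos(0.2*v)
I1 = np.sin(0.3*(u - shift[0])) + np.cos(0.2*(v - shift[1]))
du, dv = lk_flow(I0, I1)
assert abs(du - shift[0]) < 0.05 and abs(dv - shift[1]) < 0.05
```

For larger displacements this single-shot solve breaks down, which is exactly why the text initializes the flow from the IMU/GNSS prediction and uses Lucas-Kanade only for refinement.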
B. Essential matrix calculation
Given a set of correspondences of image points for a couple of successive frames, our objective is to find the essential matrix E. Each point correspondence gives one equation (4), which is linear and homogeneous in the elements of E; k point correspondences give k such equations. Let
$$E = \begin{pmatrix} e_1 & e_4 & e_7 \\ e_2 & e_5 & e_8 \\ e_3 & e_6 & e_9 \end{pmatrix}, \qquad e = (e_1, e_2, \ldots, e_9)^T. \quad (13)$$
Rewriting equation (4) in the elements of E using k point
correspondences, we have
Ae = 0. (14)
where A is
$$A = \begin{pmatrix}
u'_1 u_1 & v'_1 u_1 & u_1 & u'_1 v_1 & v'_1 v_1 & v_1 & u'_1 & v'_1 & 1 \\
u'_2 u_2 & v'_2 u_2 & u_2 & u'_2 v_2 & v'_2 v_2 & v_2 & u'_2 & v'_2 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u'_k u_k & v'_k u_k & u_k & u'_k v_k & v'_k v_k & v_k & u'_k & v'_k & 1
\end{pmatrix} \quad (15)$$
In the presence of noise, A is a full rank matrix. Therefore we solve for the unit vector h such that
$$\|Ah\| = \min. \quad (16)$$
$$R(d\phi, d\theta, d\psi) = \begin{pmatrix}
\cos d\theta \cos d\psi & \cos d\theta \sin d\psi & -\sin d\theta \\
\sin d\phi \sin d\theta \cos d\psi - \cos d\phi \sin d\psi & \sin d\phi \sin d\theta \sin d\psi + \cos d\phi \cos d\psi & \sin d\phi \cos d\theta \\
\cos d\phi \sin d\theta \cos d\psi + \sin d\phi \sin d\psi & \cos d\phi \sin d\theta \sin d\psi - \sin d\phi \cos d\psi & \cos d\phi \cos d\theta
\end{pmatrix} \quad (10)$$
The solution h is the unit eigenvector of $A^T A$ associated with the smallest eigenvalue. Then E is determined by
$$E = [E_1 \;\, E_2 \;\, E_3] = \sqrt{2} \begin{pmatrix} h_1 & h_4 & h_7 \\ h_2 & h_5 & h_8 \\ h_3 & h_6 & h_9 \end{pmatrix}. \quad (17)$$
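Eqs. (13)-(17) amount to a homogeneous least-squares problem; below is a NumPy sketch on noise-free synthetic data (the motion and points are invented for the test, and the row ordering of $A$ matches the column-wise stacking of $e$ in (13)):

```python
import numpy as np
rng = np.random.default_rng(1)

def skew(x):
    return np.array([[0, -x[2], x[1]], [x[2], 0, -x[0]], [-x[1], x[0], 0.0]])

# assumed ground-truth motion, used only to generate correspondences
a = 0.15
R = np.array([[np.cos(a), -np.sin(a), 0],
              [np.sin(a),  np.cos(a), 0],
              [0, 0, 1.0]])
t = np.array([0.2, 0.1, -0.05])
E_true = skew(t) @ R

P1 = rng.uniform([-1, -1, 4], [1, 1, 8], size=(30, 3))   # points, frame 1
P2 = (R @ P1.T).T + t                                    # same points, frame 2
u, v = P1[:, 0] / P1[:, 2], P1[:, 1] / P1[:, 2]
up, vp = P2[:, 0] / P2[:, 2], P2[:, 1] / P2[:, 2]

# one row per correspondence, ordered to match the stacking of e in (13)
A = np.c_[up*u, vp*u, u, up*v, vp*v, v, up, vp, np.ones_like(u)]
w, V = np.linalg.eigh(A.T @ A)        # eigh sorts eigenvalues ascending
h = V[:, 0]                           # eigenvector of the smallest eigenvalue
E_est = h.reshape(3, 3, order='F')    # refold column-wise as in (13)

# the estimate matches E up to the unavoidable global scale and sign
E_hat = E_est / np.linalg.norm(E_est)
E_ref = E_true / np.linalg.norm(E_true)
assert min(np.abs(E_hat - E_ref).max(), np.abs(E_hat + E_ref).max()) < 1e-6
```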
C. Getting motion parameters from E
Once the essential matrix is obtained, the Huang and Faugeras theorem [33] allows factorizing E into a rotation ($C^{c_2}_{c_1}$) and a translation ($r^{c_2}_{21}$). Let $E = UDV^T$ and define
$$S = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad R' = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (18)$$
Then, there are four possible factorizations for $E = [r^{c_2}_{21}]_\times C^{c_2}_{c_1}$, given by
$$[r^{c_2}_{21}]_\times = U(\pm S)U^T \quad (19)$$
$$C^{c_2}_{c_1} = \beta\, U R' V^T \quad \text{or} \quad C^{c_2}_{c_1} = \beta\, U R'^T V^T \quad (20)$$
where
$$\beta = \det\left(UV^T\right). \quad (21)$$
As observed in [34], the right factorization can be determined by imposing that the 3D points (which can be found by triangulation) must always lie in front of the camera, i.e., their z-coordinate must be positive.
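A sketch of the four-fold factorization (18)-(21) via SVD, written in the form popularized by Hartley and Zisserman, which should be equivalent to the Huang-Faugeras construction up to notation; the ground-truth motion is invented for the test:

```python
import numpy as np

def skew(x):
    return np.array([[0, -x[2], x[1]], [x[2], 0, -x[0]], [-x[1], x[0], 0.0]])

# assumed ground-truth motion, used only to generate a test E
a = 0.15
R_true = np.array([[np.cos(a), -np.sin(a), 0],
                   [np.sin(a),  np.cos(a), 0],
                   [0, 0, 1.0]])
t_true = np.array([0.2, 0.1, -0.05])
E = skew(t_true) @ R_true

# SVD-based factorization; W plays the role of the R' matrix in (18)
U, D, Vt = np.linalg.svd(E)
W = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1.0]])
cands = []
for Rc in (U @ W @ Vt, U @ W.T @ Vt):
    if np.linalg.det(Rc) < 0:          # enforce a proper rotation (det = +1)
        Rc = -Rc
    cands.append(Rc)
t_hat = U[:, 2]                        # translation direction, up to sign

# the true rotation is among the two candidates; the translation is
# recovered up to sign and scale (the cheirality check of [34] picks one)
assert any(np.allclose(Rc, R_true) for Rc in cands)
assert np.isclose(abs(t_hat @ t_true), np.linalg.norm(t_true))
```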
D. Error analysis
In order to perform an error analysis, we want to relate the errors in the correspondence matrix A to the solution of the minimization problem (16). The sources of errors in the image coordinates include spatial quantization, feature detector errors, point mismatching and camera distortion. In a system that is well calibrated, so that systematic errors are negligible, the errors in the image coordinates of a feature can be modeled by random variables. These errors result in errors in the estimates of the motion parameters.
The covariance matrix of e is
$$\Gamma_e = 2\Gamma_h = 2\, D_h \Gamma_{A^T} D_h^T, \quad (22)$$
where $\Gamma_{A^T}$ is the covariance matrix of $A^T$ and
$$D_h = G_h\, G_{A^T A} \quad (23)$$
$$G_{A^T A} = [F_{ij}] + [G_{ij}], \quad (24)$$
where $[F_{ij}]$ and $[G_{ij}]$ are block matrices with 9 by 9 submatrices $F_{ij}$ and $G_{ij}$, respectively. $F_{ij} = a_{ij} I_9$, and $G_{ij}$ is a 9 by 9 matrix with the $i$th column equal to the column vector $A_j$ and all other columns zero. The matrix $G_h$ is given by
$$G_h = S \Lambda S^T \left[ h_1 I_9 \;\; h_2 I_9 \;\; \ldots \;\; h_9 I_9 \right], \quad (25)$$
where the columns of $S$ are the eigenvectors of $A^T A$ (ordered in nondecreasing order of the eigenvalues), $\Lambda = \mathrm{diag}\{0, (\lambda_1 - \lambda_2)^{-1}, \ldots, (\lambda_1 - \lambda_9)^{-1}\}$, with $\lambda_i$ the eigenvalues of $A^T A$, and $[h_1, \ldots, h_9]^T$ is the solution of the minimization problem (16).
Assume that the errors between different points and different components in the image coordinates are uncorrelated and have the same variance $\sigma^2$. With this assumption we get
$$\Gamma_{A^T} = \sigma^2\, \mathrm{diag}\{\Gamma_1, \Gamma_2, \ldots, \Gamma_n\}, \quad (26)$$
where $\Gamma_i$, $1 \le i \le n$, is a 9 by 9 submatrix:
$$\Gamma_i = \begin{pmatrix} p_i p_i^T & 0 & 0 \\ 0 & p_i p_i^T & 0 \\ 0 & 0 & p_i p_i^T \end{pmatrix} + \begin{pmatrix} u_i^2 J & u_i v_i J & u_i J \\ u_i v_i J & v_i^2 J & v_i J \\ u_i J & v_i J & J \end{pmatrix}, \quad (27)$$
where
$$J = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \quad (28)$$
V. KALMAN FILTER
The final purpose of the proposed approach is the estimation of the position and attitude of the UAV. The data fusion algorithm is based on the Extended Kalman Filter (EKF), because the dynamic and observation equations are non-linear in their original form.
The state vector includes the position and speed of the UAV in the tangent reference frame, and the rotation matrix from the body to the tangent reference frame. The EKF is applied to estimate the state variables. The input measurements for the prediction of the state variables are the inertial accelerations and angular speeds processed by an IMU.
The fusion algorithm is composed of three EKFs, as sketched in Figure 1. With reference to Figure 1:
- the GPS-IMU EKF block uses linear accelerations provided by the IMU and position and speed provided by GPS in order to compute position and speed of the aircraft center of gravity;
- the VIDEO-IMU(1) block is based on angular speed measurements provided by the IMU and attitude differences measured by the video system in order to estimate aircraft attitude. Attitude estimates are provided to the other two EKF blocks;
- the VIDEO-IMU(2) block takes the linear accelerations provided by the IMU as in GPS-IMU, but it uses as observables the position differences provided by the video system, in order to estimate position and speed of the aircraft center of gravity. This filter is used only during GPS outages in order to propagate position and speed. When it is not activated, its output is set equal to zero.
Fig. 1. Proposed fusion algorithm block diagram.
For the sake of simplicity, we will consider the IMU installed at the center of gravity of the aircraft (in order to eliminate lever arm effects) with its axes aligned with the body axes. Euler angles are used to represent aircraft attitude. Therefore, we have
$$\dot{v}^n_{cg} = C^n_b f^b + g^n \quad (29)$$
$$\dot{r}^n_{cg} = v^n_{cg} \quad (30)$$
and
$$\left(\dot{\phi}, \dot{\theta}, \dot{\psi}\right)^T = J(\phi, \theta)\, \omega^b_{nb} \quad (31)$$
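A one-step numerical sketch of the prediction equations (29)-(30), with assumed values (level attitude, accelerometer sensing only the reaction to gravity); simple Euler integration stands in for whatever discretization is actually used:

```python
import numpy as np

dt = 0.01
g_n = np.array([0.0, 0.0, 9.81])     # gravity in NED (down positive)
C_nb = np.eye(3)                     # level attitude (assumed)
f_b = np.array([0.0, 0.0, -9.81])    # specific force at unaccelerated flight
v = np.array([20.0, 0.0, 0.0])       # 20 m/s north
r = np.zeros(3)
for _ in range(100):                 # integrate 1 s of flight
    v = v + (C_nb @ f_b + g_n) * dt  # eq. (29)
    r = r + v * dt                   # eq. (30)
# specific force cancels gravity, so speed is constant and the
# aircraft covers 20 m north in one second
assert np.allclose(v, [20.0, 0.0, 0.0])
assert np.allclose(r, [20.0, 0.0, 0.0])
```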
Starting from these non-linear equations, the perturbed form of the EKF prediction equations is derived. Output equations are based on the following predicted GPS measurements:
$$p^s_{GPS} = C^s_n \left( r^n_{cg} + C^n_b\, r^b_{cg-GPS} \right) \quad (32)$$
$$v^n_{GPS} = v^n_{cg} + C^n_b \left( \omega^b_{nb} \times r^b_{cg-GPS} \right). \quad (33)$$
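The lever-arm term of eq. (33) can be checked numerically; the numbers below (level attitude, a 2 m y-body lever arm, a 0.5 rad/s yaw rate) are assumptions for illustration:

```python
import numpy as np

def skew(x):
    return np.array([[0, -x[2], x[1]], [x[2], 0, -x[0]], [-x[1], x[0], 0.0]])

C_nb = np.eye(3)                          # level attitude (assumed)
v_cg = np.array([20.0, 0.0, 0.0])         # center-of-gravity speed, NED
omega_b = np.array([0.0, 0.0, 0.5])       # 0.5 rad/s yaw rate
lever = np.array([0.0, 2.0, 0.0])         # antenna 2 m along the y-body axis
# eq. (33): the antenna picks up the extra term omega x r
v_gps = v_cg + C_nb @ (skew(omega_b) @ lever)
assert np.allclose(v_gps, [19.0, 0.0, 0.0])
```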
For the video system, considering that it provides delta-attitudes (derived from the essential matrix), a video-based pseudo-attitude is built starting from an initial known value:
$$\begin{pmatrix} \phi \\ \theta \\ \psi \end{pmatrix}_{video(k)} = \begin{pmatrix} \phi \\ \theta \\ \psi \end{pmatrix}_{video(k-1)} + \begin{pmatrix} d\phi \\ d\theta \\ d\psi \end{pmatrix}_{video(k)} \quad (34)$$
The output equation for the estimation of speed based on video measurements is:
$$p^t\, |V| = v^n_{cg}, \quad (35)$$
where $p^t$ is provided by the video system and represents the unit vector in the direction of the movement, $|V|$ is the Euclidean norm of the speed, considered constant (it is the last valid value obtained from the GPS-IMU sensor fusion algorithm), and $\Delta t$ is the time difference between two measurements of the video system.
VI. NUMERICAL ANALYSIS
Two types of sensor configurations have been considered. The first one refers to a big lever arm between the GPS antenna and the aircraft center of gravity (10 meters on each axis), while the second one is a more typical configuration because it considers a lever arm only along the y-body axis, with a magnitude of two meters.
Figure 2 shows the UAV attitude estimation results relative to the first configuration. Attitude is reported in terms of roll, yaw and pitch angles. An improvement of the attitude estimation accuracy is clearly observable, since the accuracy of the video-based estimation is better than the GPS one. Indeed, the static accuracy of the Video System is of the order of only 1 degree, while the GPS accuracy is of the order of 5 degrees.
For the second configuration, results are reported in Figure 3. It can be observed that the $\psi$ angle estimation shows the same behavior as in the first configuration for both the GPS and the video-assisted estimation algorithms. This happens because observability is guaranteed by the lever arm. On the contrary, for the $\theta$ (not reported here for the sake of brevity) and $\phi$ angles, the estimation error with only the GPS is much greater with respect to the previous configuration. This is principally due to the absence of a significant lever arm, in which case the estimation error is that of simple INS measurement integration. However, the IMU+VIDEO algorithm is more robust to the lever arm effect, since it still performs as well as in the first configuration.
The distance between the GPS antenna and the aircraft center of gravity is generally minimized because of the aircraft configuration and the need to reduce cables and wiring. Moreover, a big lever arm introduces errors in the estimation of the aircraft center of gravity position and speed. Figures 4 and 5 show the results of the position and speed estimations with video measurements. At time instant 120 sec a GPS outage with a duration of 120 sec has been simulated. This situation could happen, for example, while traveling through a tunnel or in a wooded area, in indoor navigation and/or in hostile environments where the GPS signal is not available. The use of video measurements allows continuing the estimation of position and speed with good accuracy throughout the sequence. Without the fusion of these measurements, the solution would diverge after a short time, due to the integration of inertial measurement errors.
VII. CONCLUSION
In this paper, an innovative technique for the estimation of the position and attitude of a UAV has been proposed. The main
Fig. 2. Attitude estimation in terms of Euler angles for the big lever arm configuration. From top to bottom: roll ($\phi$), pitch ($\theta$) and yaw ($\psi$) angles. Groundtruth (Red), IMU+GPS estimation (Black) and Video-aided estimation (Blue).
innovative aspect of this technique concerns the introduction of
a vision-based system composed of a low-cost camera device.
The information carried by the camera is then integrated with
classical data coming from the IMU and GPS units in a
sensor fusion algorithm. A linearized Kalman Filter has been
Fig. 3. Attitude estimation in terms of Euler angles for the small lever arm configuration. Roll (up) and yaw (down) angles. Groundtruth (Red), IMU+GPS (Black) and IMU+Video estimation (Blue).
implemented, because the dynamic and observation equations are non-linear in their original form.
The simulation setup is composed of accurate models for realistic UAV flight behavior, sensor simulation and 3D photorealistic rendering. Preliminary results are very encouraging, since they show an improvement of up to one order of magnitude in the attitude accuracy, using an HD camera. Future work is the validation of the simulation results in a real flight test using a mini-UAV platform.
REFERENCES
[1] P. Corke, J. Lobo, and J. Dias, "An introduction to inertial and visual sensing," I. J. Robotic Res., vol. 26, no. 6, pp. 519-535, 2007.
[2] J. Lobo and J. Dias, "Vision and inertial sensor cooperation using gravity as a vertical reference," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1597-1608, Dec. 2003.
[3] J. Lobo and J. Dias, "Inertial sensed ego-motion for 3D vision," J. Robot. Syst., vol. 21, no. 1, pp. 3-12, Jan. 2004.
[4] S. You, U. Neumann, and R. Azuma, "Orientation tracking for outdoor augmented reality registration," IEEE Computer Graphics and Applications, vol. 19, pp. 36-42, 1999.
[5] S. You and U. Neumann, "Fusion of vision and gyro tracking for robust augmented reality registration," in Proceedings of the Virtual Reality 2001 Conference (VR'01). Washington, DC, USA: IEEE Computer Society, 2001, p. 71.
Fig. 4. Velocity estimation in the North-East-Down (NED) reference system. Groundtruth (Black), IMU+Video estimation (Blue).
[6] P. Lang, A. Kusej, A. Pinz, and G. Brasseur, "Inertial tracking for mobile augmented reality," in Proc. of the IMTC 2002, vol. 2, Anchorage, USA, 2002, pp. 1583-1587.
[7] U. Neumann, S. You, J. Hu, B. Jiang, and J. Lee, "Augmented virtual environments (AVE): Dynamic fusion of imagery and 3D models," in Proceedings of the IEEE Virtual Reality 2003 (VR'03). Washington, DC, USA: IEEE Computer Society, 2003, p. 61.
[8] B. Jiang, U. Neumann, and S. You, "A robust hybrid tracking system for outdoor augmented reality," in Proceedings of the IEEE Virtual Reality
2004 (VR'04). Washington, DC, USA: IEEE Computer Society, 2004, p. 3.
Fig. 5. Down position estimation. Top: entire simulation. Bottom: zoom during the GPS outage. Groundtruth (Black), IMU+Video estimation (Blue).
[9] E. D. Dickmanns, "Vehicles capable of dynamic vision: a new breed of technical beings?" Artif. Intell., vol. 103, no. 1-2, pp. 49-76, Aug. 1998.
[10] J. Goldbeck, B. Huertgen, S. Ernst, and L. Kelch, "Lane following combining vision and DGPS," Image and Vision Computing, vol. 18, no. 5, pp. 425-433, 2000.
[11] T. Hague, J. A. Marchant, and N. D. Tillett, "Ground based sensing systems for autonomous agricultural vehicles," Computers and Electronics in Agriculture, vol. 25, no. 1-2, pp. 11-28, 2000.
[12] T. Goedemé, M. Nuttin, T. Tuytelaars, and L. V. Gool, "Vision based intelligent wheel chair control: The role of vision and inertial sensing in topological navigation," Journal of Robotic Systems, vol. 21, no. 2, pp. 85-94, 2004.
[13] D. D. Diel, P. DeBitetto, and S. Teller, "Epipolar constraints for vision-aided inertial navigation," Proc. IEEE Motion and Video Computing, pp. 221-228, 2005.
[14] I. Stratmann and E. Solda, "Omnidirectional vision and inertial clues for robot navigation," Journal of Robotic Systems, vol. 21, no. 1, pp. 33-39, 2004.
[15] J. Kim and S. Sukkarieh, "Real-time implementation of airborne inertial-SLAM," Robot. Auton. Syst., vol. 55, no. 1, pp. 62-71, Jan. 2007.
[16] M. Bryson and S. Sukkarieh, "Building a robust implementation of bearing-only inertial SLAM for a UAV," J. Field Robotics, vol. 24, no. 1-2, pp. 113-143, 2007.
[17] J. Nygårds, P. Skoglar, M. Ulvklo, and T. Högström, "Navigation aided image processing in UAV surveillance: Preliminary results and design of an airborne experimental system," J. Robot. Syst., vol. 21, no. 2, pp. 63-72, Feb. 2004.
[18] S. Graovac, "Principles of fusion of inertial navigation and dynamic vision," J. Robot. Syst., vol. 21, no. 1, pp. 13-22, Jan. 2004.
[19] L. Muratet, S. Doncieux, Y. Briere, and J.-A. Meyer, "A contribution to vision-based autonomous helicopter flight in urban environments," Robotics and Autonomous Systems, vol. 50, no. 4, pp. 195-209, 2005.
[20] P. Corke, "An inertial and visual sensing system for a small autonomous helicopter," J. Field Robotics, vol. 21, no. 2, pp. 43-51, 2004.
[21] S. I. Roumeliotis, A. E. Johnson, and J. F. Montgomery, "Augmenting inertial navigation with image-based motion estimation," in ICRA. IEEE, 2002, pp. 4326-4333.
[22] R. Eustice, H. Singh, J. Leonard, M. Walter, and R. Ballard, "Visually navigating the RMS Titanic with SLAM information filters," in Proceedings of Robotics: Science and Systems, Cambridge, USA, June 2005.
[23] A. Huster and S. M. Rock, "Relative position sensing by fusing monocular vision and inertial rate sensors," Ph.D. dissertation, Stanford University, 2003.
[24] M. Dunbabin, K. Usher, and P. I. Corke, "Visual motion estimation for an autonomous underwater reef monitoring robot," in FSR, 2005, pp. 31-42.
[25] A. E. R. Shabayek, C. Demonceaux, O. Morel, and D. Fofi, "Vision based UAV attitude estimation: Progress and insights," Journal of Intelligent & Robotic Systems, vol. 65, pp. 295-308, 2012.
[26] D. Dusha, W. Boles, and R. Walker, "Attitude estimation for a fixed-wing aircraft using horizon detection and optical flow," in Proceedings of the 9th Biennial Conference of the Australian Pattern Recognition Society on Digital Image Computing Techniques and Applications (DICTA '07). Washington, DC, USA: IEEE Computer Society, 2007, pp. 485-492.
[27] S. Thurrowgood, D. Soccol, R. J. D. Moore, D. Bland, and M. V. Srinivasan, "A vision based system for attitude estimation of UAVs," in IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, October 2009, pp. 5725-5730.
[28] V. Sazdovski, P. M. G. Silson, and A. Tsourdos, "Attitude determination from single camera vector observations," in IEEE Conf. of Intelligent Systems. IEEE, July 2010, pp. 49-54.
[29] Z. Zhu, S. Bhattacharya, M. U. D. Haag, and W. Pelgrum, "Using single-camera geometry to perform gyro-free navigation and attitude determination," in IEEE/ION Position Location and Navigation Symposium (PLANS), vol. 24, no. 5. IEEE, May 2010, pp. 49-54.
[30] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI '81), vol. 2. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1981, pp. 674-679.
[31] M. Hwangbo, J.-S. Kim, and T. Kanade, "Inertial-aided KLT feature tracking for a moving camera," in IROS. IEEE, 2009, pp. 1909-1916.
[32] C. Tomasi and T. Kanade, "Detection and tracking of point features," International Journal of Computer Vision, Tech. Rep., 1991.
[33] T. S. Huang and O. D. Faugeras, "Some properties of the E matrix in two-view motion estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 12, pp. 1310-1312, Dec. 1989. [Online]. Available: http://dx.doi.org/10.1109/34.41368
[34] H. C. Longuet-Higgins, "A computer algorithm for reconstructing a scene from two projections," Nature, vol. 293, no. 5828, pp. 133-135, 1981.