Wide-Area Egomotion Estimation from Known 3D Structure
Olivier Koch, Seth Teller
Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory
Robotics, Vision and Sensor Networks Lab
Initialization Algorithm

The three-line alignment algorithm: aligning three 2D image edges (i, j, k) onto three 3D model line segments (r, s, t).

Real Datasets

                           lab     corridor   handheld
Number of frames           1,500   1,900      7,800
Frame rate (fps)           5       15         5
Excursion duration (min)   5       2          26
Excursion length (m)       120     33         936
Total # of 3D segments     3,000   3,000      7,400
Total surface area (m²)    450     450        7,000
System Overview

Input: an omnidirectional video sequence and a coarse 3D model of the environment.
Output: 6-DOF camera localization (translation, rotation).

Sensor: PointGrey Research Ladybug camera
- 75% of full sphere
- 6 x 1024 x 768 8-bit JPG images @ 15Hz

Key properties:
- Accurate 3D localization
- Scales to large environments (several buildings)
- Robust to clutter, lighting variations and modest changes in scene

At startup the system runs Initialization until it acquires a lock on the model, then switches to Maintenance; on loss of lock it returns to Initialization.
[Histograms: translation error (cm) and rotation error (deg).]
Initialization produces an estimate of the camera position from a single omnidirectional image by matching image edges to 3D model segments.

For each pair of image edges (i, j) and 3D model lines (r, s), compute a score by comparing angles. Process all pairs and update the scoring table.
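This pairwise angle voting can be sketched as follows. It is a minimal illustration, assuming image edges are back-projected to unit 3D direction vectors and model lines are given by unit directions; the tolerance and voting scheme are illustrative, not the authors' exact formulation.

```python
import numpy as np

def angle(u, v):
    """Angle in radians between two undirected unit 3D directions."""
    # abs() treats directions up to sign, since lines are undirected.
    return np.arccos(np.clip(abs(np.dot(u, v)), 0.0, 1.0))

def score_pairs(edges, lines, tol=np.radians(2.0)):
    """Vote for (edge, line) correspondences by comparing pairwise angles.

    edges: (E, 3) unit directions of image edges back-projected to 3D
    lines: (L, 3) unit directions of 3D model line segments
    Returns an E x L scoring table; high entries mark likely matches.
    """
    E, L = len(edges), len(lines)
    table = np.zeros((E, L))
    for a in range(E):
        for b in range(a + 1, E):
            th_img = angle(edges[a], edges[b])
            for p in range(L):
                for q in range(L):
                    if p == q:
                        continue
                    # If the pairwise angles agree, vote for both pairings.
                    if abs(th_img - angle(lines[p], lines[q])) < tol:
                        table[a, p] += 1
                        table[b, q] += 1
    return table
```

With distinct pairwise angles, the correct matches accumulate the most votes, so the per-row argmax of the table recovers the edge-to-line assignment.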
[Diagram: the three-line alignment removes the pose degrees of freedom step by step.]

- Start: R: 3-DOF, T: 3-DOF
- 1 arbitrary rotation (closed form): R: 2-DOF, T: 3-DOF
- 2 rotations (2-DOF minimization): R: 0-DOF, T: 3-DOF
- 3 translations (closed form): R: 0-DOF, T: 0-DOF
Error histograms: rotation error µ = 0.96 deg, σ = 2.16 deg; translation error µ = 0.04 cm, σ = 13.2 cm.
[Views: 3D localization from omnidirectional video on the lab and hand-held datasets; frames with loss of lock are marked.]

Example: candidate matches on the image (right) for a given 3D model line segment (left).
Offline pre-computed Visibility Set
Maintenance: robust edge-line matching

Matching 2D image edges with 3D model line segments.

[Figure: an image edge e observed at frames t and t+1 is matched to a 3D model line l under camera pose (R, T), expressed in the world coordinate frame (X, Y, Z).]

Notation: R: camera rotation; T: camera translation; e_i: image edge; l_j: model line segment; α: error angle.
Correspondence update

Each correspondence is updated using a hue-based and an angle-based matching function. The camera pose is then refined by minimizing the error angles over all correspondences:

Minimize over (R, T):   Σ_i α(e_i, R, T, l_i)
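A sketch of the angle-based term and the summed cost. The world-to-camera convention x_c = R (x_w - T) and the endpoint-based definition of α (deviation of each reprojected segment endpoint from the plane through the camera center and the image edge) are assumptions for illustration, not necessarily the authors' exact formulation; the hue-based term is omitted.

```python
import numpy as np

def error_angle(n, R, T, p0, p1):
    """Error angle alpha between an image edge and a reprojected model segment.

    n      : unit normal of the plane through the camera center and the edge
    R, T   : camera rotation (3x3) and translation (3,); x_c = R @ (x_w - T)
    p0, p1 : 3D endpoints of the model line segment in the world frame
    """
    a = 0.0
    for p in (p0, p1):
        v = R @ (np.asarray(p, float) - T)
        v /= np.linalg.norm(v)
        # An endpoint ray lying exactly in the edge plane gives alpha = 0.
        a = max(a, np.arcsin(abs(np.dot(n, v))))
    return a

def total_cost(correspondences, R, T):
    """Sum of error angles over all (n, p0, p1) correspondences."""
    return sum(error_angle(n, R, T, p0, p1) for n, p0, p1 in correspondences)
```

Minimizing `total_cost` over (R, T), e.g. with a nonlinear least-squares solver, yields the refined pose.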
[Plot: localization error (translation in cm, rotation in deg) vs. number of correspondences, under four conditions: zero noise, noise only, clutter only, noise and clutter.]
Random sample consensus

A correspondence pairs 1 model line with 1 image edge. We draw random correspondence sets, score them, and keep the best scoring set.
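The draw-score-keep loop above can be sketched as a generic random-sample-consensus selection; the set size, iteration count, and scoring interface here are illustrative assumptions.

```python
import random

def ransac_select(candidates, score_fn, set_size=3, iters=100, seed=0):
    """Select a correspondence set by random sample consensus.

    candidates : list of candidate (image edge, model line) correspondences
    score_fn   : maps a candidate subset to a consensus score (higher = better)

    Draws random subsets, scores each one, and keeps the best scoring set.
    """
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(iters):
        sample = rng.sample(candidates, set_size)
        s = score_fn(sample)
        if s > best_score:
            best, best_score = sample, s
    return best, best_score
```

In the system, `score_fn` would combine the angular and color tests described on this poster; members of the winning set are then treated as inliers.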
Acknowledgments

We gratefully acknowledge the support of Draper Laboratory's University Independent Research and Development Program.
Offline visibility: hexagonal model subdivision (step = 1.5 m); a virtual camera computes visibility using OpenGL at each subdivision node.

Correspondence state transitions:
- UNKNOWN → PENDING: {observed}
- PENDING → ACCEPTED: {observed for > k consecutive frames}
- PENDING → UNKNOWN: {not observed}
- ACCEPTED demoted: {not observed for > k frames}

The number of correspondences needed varies with conditions.
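A minimal sketch of the hexagonal subdivision used for the precomputed visibility set: nodes on a hexagonal lattice with the stated 1.5 m step. The rectangular floor-plan bounds and row-offset construction are assumptions for illustration.

```python
import math

def hex_grid(width, height, step=1.5):
    """Hexagonal lattice of (x, y) nodes covering a width x height floor plan.

    Alternate rows are offset by half a step and spaced step * sqrt(3) / 2
    apart, so every node's nearest neighbours lie at distance `step`.
    """
    dy = step * math.sqrt(3) / 2  # vertical spacing between rows
    nodes = []
    row = 0
    y = 0.0
    while y <= height:
        x = (step / 2) if row % 2 else 0.0  # offset every other row
        while x <= width:
            nodes.append((x, y))
            x += step
        y += dy
        row += 1
    return nodes
```

A visibility set would then be rendered and stored once per node, and looked up at runtime from the nearest node to the current pose estimate.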
Extract a set of correspondences using random sample consensus: draw a number of random correspondence sets, score them (angular test, color test), and keep the best one. Inliers are promoted in the state machine; outliers are demoted.

Future Work
- Better accuracy (sensor fusion)
- Online update of the 3D model
- Signature-based initialization
- Better performance (> 1 Hz)
[Plot: results per frame number (0-400).]

Correspondence state machine: each correspondence is assigned a state that changes over time.
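The per-correspondence state machine can be sketched as follows. The state names and the promotion/demotion conditions come from this poster; the demotion target and the consecutive-frame counters are assumptions for illustration.

```python
from enum import Enum

class State(Enum):
    UNKNOWN = 0
    PENDING = 1
    ACCEPTED = 2

class Correspondence:
    """Tracks one (model line, image edge) pairing across frames."""

    def __init__(self, k=5):
        self.k = k                 # consecutive-frame threshold
        self.state = State.UNKNOWN
        self.seen = 0              # consecutive frames observed
        self.missed = 0            # consecutive frames not observed

    def update(self, observed):
        """Advance the state machine by one frame; returns the new state."""
        if observed:
            self.seen += 1
            self.missed = 0
            if self.state is State.UNKNOWN:
                self.state = State.PENDING
            elif self.state is State.PENDING and self.seen > self.k:
                self.state = State.ACCEPTED
        else:
            self.missed += 1
            self.seen = 0
            if self.state is State.PENDING:
                self.state = State.UNKNOWN
            elif self.state is State.ACCEPTED and self.missed > self.k:
                self.state = State.PENDING  # demotion target is an assumption
        return self.state
```

Only ACCEPTED correspondences would feed the pose minimization, which keeps transient clutter out of the estimate.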
Face visibility computation using OpenGL: each face in the model is rendered using a unique color, so the set of visible faces can be read back from the rendered image.
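The core of this item-buffer technique is the mapping between face ids and unique colors. The sketch below shows only that packing and the read-back step, in NumPy rather than OpenGL; in a real renderer the image would come from the color buffer, and one color (here id 0) would be reserved for the background.

```python
import numpy as np

def face_to_color(face_id):
    """Pack a face index into a unique 24-bit RGB color."""
    return ((face_id >> 16) & 0xFF, (face_id >> 8) & 0xFF, face_id & 0xFF)

def visible_faces(pixels):
    """Recover the set of visible face ids from a rendered H x W x 3 image."""
    flat = pixels.reshape(-1, 3).astype(np.uint32)
    ids = (flat[:, 0] << 16) | (flat[:, 1] << 8) | flat[:, 2]
    return set(int(i) for i in np.unique(ids))
```

Since every face gets a distinct color and rendering resolves occlusion, the unique colors in the frame are exactly the visible faces.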