
Single Axis Relative Rotation from Orthogonal Lines
Ali Elqursh
Ahmed Elgammal
Rutgers University
{elqursh,elgammal}@cs.rutgers.edu
Abstract
We present an efficient algorithm that computes the relative pose between two calibrated views given that the rotation is around a single axis. The algorithm is suited for indoor and urban environments that have an abundance of orthogonal lines. We also show how this algorithm can be used within a hypothesize-and-test framework to simultaneously detect orthogonal lines and compute the relative rotation without explicit structure computation. We study the performance of the algorithm using synthetic and real datasets.
1 Introduction
Structure from motion research in computer vision has reached the point where fully automated reconstruction of the trajectory of a video camera moving through an unknown scene is becoming routine. The ability to accurately localize a moving camera is an essential building block for robotic navigation and simultaneous localization and mapping (SLAM).
Existing approaches were designed with the assumption that many point features can be accurately tracked across many frames. However, it is often the case in indoor environments that few point features can be accurately localized. Walls, for example, have little texture, and the visible features that do exist are lines on the boundaries with other walls and objects. This suggests that lines may be more promising features for indoor environments. Additionally, indoor environments exhibit structures that can be exploited to achieve robust motion estimation even with few features. Existing approaches try to enforce structural constraints after computing the motion of the camera and thus do not benefit from the extra information available.
Our goal is twofold. First, we want to automatically recognize pairs of orthogonal lines without explicitly reconstructing the scene. Second, we want to robustly estimate the motion of the camera from lines under constrained motion, where the camera rotates between two views around a single axis.
The single-axis rotation may seem very restrictive at first, but as it turns out, many applications satisfy this constraint. For example, robots usually have cameras mounted such that they are looking in the horizontal direction. Another application is pan-tilt-zoom (PTZ) cameras.
In this paper we present an algorithm that computes
the relative rotation between two views under the constraint that the rotation is around a single arbitrary axis.
One important characteristic of our algorithm is that
it decouples the computation of the relative rotation
and translation. Another advantage is that the rotation
computation is not susceptible to degeneracies due to
a small baseline. In fact, we can effectively compute
the relative rotation with zero baseline without having
to handle it as a special case.
Lines from two images do not give any constraints on
the relative motion [5]. The trifocal tensor can be used
to compute the structure and motion from lines in three
images. The trifocal tensor has 18 degrees of freedom
and at least 13 lines are needed to compute it [5]. Besides the requirement of three images, the large number of line correspondences required discourages the use of the trifocal tensor in hypothesize-and-test frameworks. In
contrast, we need only two line correspondences in two
images to compute the relative rotation and two line intersections to compute the relative translation.
Another category of SfM from lines exploits common assumptions such as the existence of a vertical vanishing point and a world homography [2], three orthogonal dominant directions [11, 6], turntable motion [4], or the nonholonomic motion of a wheeled vehicle [9]. Our approach also uses a scene constraint, but with important differences. We assume only that there exist pairs of lines that are orthogonal to each other. These lines do not necessarily have to be aligned to dominant directions and do not have to be coplanar.
Recently, large progress has been made in visual odometry (e.g., [9, 8]). Almost all such systems use stereo cameras or camera rigs. These systems rely on solutions to the problem of relative pose estimation from a minimal set of 5 points [7, 10] inside a RANSAC framework. Using lines is complementary to using points, and thus our approach can be used in any of these systems to compute the relative pose. Similar to the state of the art for points [7], which produces up to 10 solutions, we produce up to 4 solutions.
2 Single Axis Relative Rotation from Orthogonal Lines
Notation 3D points are denoted as homogeneous vectors X ∈ R^4, and 2D points and lines are denoted as homogeneous vectors x, l ∈ R^3. The perspective camera matrix is represented by a 3 × 4 matrix P = K[R | t], where K is the 3 × 3 calibration matrix and (R, t) are the exterior rotation and translation. A direction in 3D is represented by a vector d ∈ R^3, where the scale is arbitrary.
Algorithm Given that there exist two orthogonal lines, say L1, L2, that are imaged in two views, the goal is to compute the relative rotation around a given axis from the images of these lines in the two views. Let l1, l2 be the images of the two orthogonal lines in the first image and l′1, l′2 be the corresponding lines in the second image. Additionally, let d1, d2 be the directions of the two lines in the first camera coordinate frame. Without loss of generality, we assume the world coordinate frame coincides with the first camera, and thus P = K[I | 0] and P′ = K′[R | t]. Let us denote the point at infinity of line Li, where i ∈ {1, 2}, as Xi = [di^T 0]^T. For a camera P = K[R | t], its image can thus be computed as xi = P Xi = K[R | t][di^T 0]^T = K R di. Since Xi lies on Li, its image xi lies on li; algebraically, this means li · xi = 0. Applying this to the two cameras gives
\[
\begin{aligned}
l_1 \cdot K d_1 &= 0, &\quad(1)\\
l_2 \cdot K d_2 &= 0, &\quad(2)\\
l'_1 \cdot K' R\, d_1 &= 0, &\quad(3)\\
l'_2 \cdot K' R\, d_2 &= 0. &\quad(4)
\end{aligned}
\]
Additionally, since the two lines in 3D are orthogonal, we have
\[ d_1 \cdot d_2 = 0. \qquad (5) \]
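Constraints (1)–(5) are easy to sanity-check numerically before introducing the single-axis parametrization. The following Python sketch (every numeric value, and the helper names `project` and `image_line`, are our own illustrative choices, not from the paper) synthesizes two orthogonal 3D lines, images them in two calibrated views, and verifies that all five constraints vanish:

```python
# A quick numpy check of constraints (1)-(5) on a synthetic configuration;
# all values (K, K2, angle, t, A, B, d1, d2) are made-up examples.
import numpy as np

def rot_y(theta):
    a, b = np.cos(theta), np.sin(theta)
    return np.array([[a, 0.0, b], [0.0, 1.0, 0.0], [-b, 0.0, a]])

def project(Kmat, Rmat, tvec, X):
    x = Kmat @ (Rmat @ X + tvec)
    return x / x[2]

def image_line(Kmat, Rmat, tvec, P0, d):
    # Image of a 3D line = cross product of two projected points on it.
    return np.cross(project(Kmat, Rmat, tvec, P0),
                    project(Kmat, Rmat, tvec, P0 + d))

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = K.copy()
R, t = rot_y(np.deg2rad(15.0)), np.array([0.1, 0.0, 0.05])

d1 = np.array([1.0, 0.3, 0.2])                # direction of line 1
d2 = np.cross(d1, np.array([0.0, 1.0, 0.0]))  # any direction orthogonal to d1
A = np.array([0.0, 0.0, 5.0])                 # a point on line 1
B = np.array([0.5, -0.2, 6.0])                # a point on line 2

I3, z3 = np.eye(3), np.zeros(3)
l1, l2 = image_line(K, I3, z3, A, d1), image_line(K, I3, z3, B, d2)
l1p, l2p = image_line(K2, R, t, A, d1), image_line(K2, R, t, B, d2)

print(l1 @ (K @ d1), l2 @ (K @ d2))              # (1), (2): ~0
print(l1p @ (K2 @ R @ d1), l2p @ (K2 @ R @ d2))  # (3), (4): ~0
print(d1 @ d2)                                   # (5): exactly 0
```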
Without loss of generality, assume that the rotation R is around the y-axis and thus has the form
\[
R = \begin{bmatrix} a & 0 & b \\ 0 & 1 & 0 \\ -b & 0 & a \end{bmatrix}, \qquad (6)
\]
under the constraint that
\[ a^2 + b^2 = 1. \qquad (7) \]
We are free to select any scale for d1, d2 and, therefore, we assume that di has the form [d_ix d_iy 1]^T for i ∈ {1, 2}. We have six equations in the six unknowns d_1x, d_2x, d_1y, d_2y, a, and b. First, we eliminate the unknowns d_ix, d_iy for i ∈ {1, 2} from the equations. From equations (1), (2) we solve for d_ix in terms of d_iy for i ∈ {1, 2}, and substituting into (3), (4), (5) eliminates these two unknowns. Similarly, from equations (3), (4) we express d_iy in terms of a and b and substitute into (5). Equation (5) then becomes a polynomial equation in a and b consisting of the six monomials {a^2, ab, b^2, a, b, 1}. Computing the relative rotation is thus reduced to solving the two polynomial equations (5), (7) in a, b. We solve for a single variable first, say b, and rewrite each polynomial as
\[ c\,a^2 + [1]\,a + [2] = 0, \qquad (8) \]
where we use the notation [n] to denote a polynomial of degree n in b. In matrix form, the system of equations is
\[
\begin{bmatrix} c & [1] & [2] \\ c' & 0 & [2]' \end{bmatrix}
\begin{bmatrix} a^2 \\ a \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \qquad (9)
\]
Let us denote the 2 × 3 matrix above by A. Matrix A has rank 2 and thus has a one-dimensional null space w = [w1 w2 w3]^T. The i-th element of the null vector can be obtained as
\[ w_i = (-1)^{i+1} \det(A_i), \qquad (10) \]
where A_i is the matrix formed by removing the i-th column of A. From the form of matrix A, we see that w1 is a cubic polynomial, w2 is quadratic, and w3 is linear in b. The three elements of the null vector w are not independent; they satisfy the constraint
\[ a^2 \cdot 1 = a \cdot a \;\Longrightarrow\; w_1 w_3 = w_2^2. \qquad (11) \]
Equation (11) is a quartic (4th-degree) polynomial in b. Fortunately, a quartic is the highest-degree polynomial that can be solved in closed form, via Ferrari's method [1]. Up to four candidate solutions for b are obtained from the real roots of (11). We can then compute a by substituting into a = w_1/w_2; d1 and d2 are then obtained by back substitution. Note that the orthogonality constraints on R, d1, and d2 are automatically enforced. To disambiguate between the different solutions, similar to the 7-point algorithm, we select the solution that gives the largest consensus set, as explained later.
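For concreteness, the following sketch shows one way to implement the solver in Python with numpy; it is our reconstruction of Section 2, not the authors' code. It uses the observation that (1) and (3) make d_i orthogonal to both m_i = K^T l_i and n_i = R^T K'^T l'_i (so d_i ∝ m_i × n_i), and it replaces the null-space construction (9)–(11) with a tangent half-angle substitution, which yields an equivalent quartic:

```python
# A sketch of the two-line single-axis rotation solver (our reading of
# Sec. 2, not the authors' code). Constraint (5) becomes a quadratic
# q(a, b) = 0 over the six monomials {a^2, ab, b^2, a, b, 1}, solved
# jointly with a^2 + b^2 = 1, Eq. (7).
import numpy as np

def rot_y(a, b):
    # Rotation about the y-axis in the form of Eq. (6): a = cos, b = sin.
    return np.array([[a, 0.0, b], [0.0, 1.0, 0.0], [-b, 0.0, a]])

def single_axis_rotation(l1, l2, l1p, l2p, K, Kp):
    """Up to four (R, d1, d2) candidates from two orthogonal-line matches."""
    m = [K.T @ l1, K.T @ l2]
    u = [Kp.T @ l1p, Kp.T @ l2p]
    A, B, C = [], [], []
    for mi, ui in zip(m, u):
        # n_i = R(a,b)^T u_i = a*alpha + b*beta + gamma is linear in (a, b),
        # hence so is the direction candidate d_i ~ m_i x n_i = a*A + b*B + C.
        alpha = np.array([ui[0], 0.0, ui[2]])
        beta = np.array([-ui[2], 0.0, ui[0]])
        gamma = np.array([0.0, ui[1], 0.0])
        A.append(np.cross(mi, alpha))
        B.append(np.cross(mi, beta))
        C.append(np.cross(mi, gamma))
    # Coefficients of q(a, b) = (m1 x n1).(m2 x n2), i.e. Eq. (5)
    caa, cbb, c0 = A[0] @ A[1], B[0] @ B[1], C[0] @ C[1]
    cab = A[0] @ B[1] + B[0] @ A[1]
    ca = A[0] @ C[1] + C[0] @ A[1]
    cb = B[0] @ C[1] + C[0] @ B[1]
    # a = (1-s^2)/(1+s^2), b = 2s/(1+s^2) turns q = 0 plus Eq. (7) into one
    # quartic in s (this parametrization misses the 180-degree case a = -1).
    quartic = [caa - ca + c0,
               2.0 * (cb - cab),
               2.0 * (c0 - caa) + 4.0 * cbb,
               2.0 * (cab + cb),
               caa + ca + c0]
    solutions = []
    for s in np.roots(quartic):
        if abs(s.imag) > 1e-8:
            continue  # keep real roots only, as with the quartic in b
        s = s.real
        a, b = (1 - s * s) / (1 + s * s), 2 * s / (1 + s * s)
        d1 = a * A[0] + b * B[0] + C[0]  # back substitution for directions
        d2 = a * A[1] + b * B[1] + C[1]
        solutions.append((rot_y(a, b), d1, d2))
    return solutions
```

Running it on the synthetic configuration from the earlier snippet should recover the true rotation among the (up to four) candidates.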
Once the relative rotation is computed, we can compute the relative translation t from two intersection points. Two lines in 3D do not necessarily intersect; however, any given scene naturally contains many coplanar lines. Let p1, p2 be any two intersection points of two pairs of lines and p′1, p′2 their corresponding points in the second image. The epipolar constraint can then be written as
\[ p'^{T}_i [t]_\times R\, p_i = 0, \qquad (12) \]
where [t]× is the skew-symmetric matrix that performs the cross product with another vector via matrix multiplication. Since R is known, we have two linear constraints on the elements of t, one for each intersection point, and the relative translation can be computed linearly up to scale.
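A minimal sketch of this linear solve, assuming p_i are homogeneous intersection points in calibrated coordinates (the function name and that convention are our assumptions):

```python
# A minimal sketch of recovering t (up to scale) from Eq. (12) given R.
import numpy as np

def translation_from_two_points(R, p1, p1p, p2, p2p):
    # p'^T [t]x (R p) = 0  <=>  t . ((R p) x p') = 0, so t is orthogonal
    # to both constraint vectors; with two, t is their cross product.
    v1 = np.cross(R @ p1, p1p)
    v2 = np.cross(R @ p2, p2p)
    t = np.cross(v1, v2)          # one-dimensional null space of [v1; v2]
    return t / np.linalg.norm(t)  # defined only up to scale (and sign)
```

With more than two inlier intersections, one would instead stack the constraint vectors v_i into a matrix and take the right singular vector associated with its smallest singular value.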
3 Robust estimation
Figure 1. Inlier count for each angle computed from
a pair of lines.
Since our algorithm requires pairs of lines, we first generate a set of candidate pairs. Since the number of lines is typically on the order of 100, we accomplish this in a brute-force manner by generating all possible pairs of lines. Our experiments show that this simple strategy is in fact effective.
3.1 Relative rotation
The output of the relative rotation algorithm is a rotation matrix R, which can be represented compactly as a single rotation angle. Starting with many pairs of lines, each pair voting for several rotations, we select the rotation angle that has the most votes. Each pair of lines generates up to 4 candidate rotation angles, computed in constant time. We first collect the solutions for all pairs in a single sorted array. Next, for each candidate angle, we count how many other angles are within τ degrees, where τ is a user-defined threshold. This can be performed very efficiently on the sorted array. The output is the angle that has the largest number of inlier pairs within the threshold τ, along with the set of inliers. To demonstrate this, Figure 1 shows the number of inliers plotted for each angle for the first frame of the PTZ Camera dataset. The ground truth rotation is −7.55°. The peak corresponds to the angle −7.5982°, an error of less than 0.05°.
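A possible implementation of this consensus search (our sketch; the paper gives no code, and wrap-around at ±180° is ignored here):

```python
# A sketch of the voting scheme: each line pair contributes up to four
# candidate angles; the angle with the most neighbors within tau wins.
import numpy as np

def consensus_angle(candidates, tau):
    """candidates: 1-D array of candidate angles (degrees) from all pairs."""
    angles = np.sort(np.asarray(candidates, dtype=float))
    # For each angle, count candidates in [angle - tau, angle + tau]:
    # two binary searches per angle on the sorted array.
    lo = np.searchsorted(angles, angles - tau, side="left")
    hi = np.searchsorted(angles, angles + tau, side="right")
    best = int(np.argmax(hi - lo))
    return angles[best], angles[lo[best]:hi[best]]  # winner and its inliers
```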
3.2 RANSAC for relative translation
Once the relative rotation is computed robustly, we can compute the relative translation from intersection points of lines. We proceed by computing all intersection points between all possible pairs of the N lines in the image; there are N(N − 1)/2 such pairs. Of these, intersection points satisfying the epipolar constraint (12) are considered inliers. RANSAC is then used to detect inliers using the symmetric epipolar error [12] d(p, p′) = d⊥(p, F^T p′) + d⊥(p′, F p), where F is computed as K′^{-T} [t]_× R K^{-1} and t is computed from two intersection-point samples. An obvious concern is the number of iterations needed by RANSAC to obtain an outlier-free sample. Using a pessimistic estimate that 90% of the intersection points are outliers, to achieve an outlier-free sample with probability 0.99, one can compute that 458 iterations are needed [3].
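The figure of 458 iterations follows from the standard RANSAC formula N = log(1 − p) / log(1 − (1 − ε)^s) with sample size s = 2, outlier ratio ε = 0.9, and success probability p = 0.99; a one-line check:

```python
# The standard RANSAC iteration-count formula [3] reproduces the figure
# quoted above (sample size 2, 90% outliers, 0.99 confidence).
import math

def ransac_iterations(p_success=0.99, outlier_ratio=0.9, sample_size=2):
    inlier_sample = (1.0 - outlier_ratio) ** sample_size
    return math.log(1.0 - p_success) / math.log(1.0 - inlier_sample)

print(ransac_iterations())  # ~458.2
```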
4 Results
Synthetic data For the synthetic experiments we create two lines, one in the x direction and the other in the y direction. The setup is depicted in Figure 2. Two cameras are randomly generated with y-axis-only rotation, positioned on the unit sphere centered at (0.5, 0.5, 0) with the principal axis pointing toward that point. For the image lines, we estimate each line from a set of points that lie on the true image line; 20 points were used to estimate each line. We add zero-mean Gaussian noise of varying standard deviation to the image coordinates of these points. The noise standard deviation σ is in pixels, and we perform experiments for σ ∈ (0, 2). The performance of the algorithm is shown in Figure 2. Each data point is based on 10000 trials. As can be seen, the rotation error is less than 0.4° (0.1° for the lower quartile).
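The line-estimation step of this protocol can be reproduced with a total least-squares fit; a sketch (the specific line and noise seed are made-up examples):

```python
# A sketch of the line-estimation step in the synthetic protocol: a total
# least-squares fit of a homogeneous line to noisy points on the true line.
import numpy as np

def fit_line(points):
    """points: (N, 2) noisy image points; returns l with l . (x, y, 1) ~ 0."""
    c = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - c)  # smallest-variance direction
    nx, ny = Vt[-1]                       # = the line normal
    return np.array([nx, ny, -(nx * c[0] + ny * c[1])])

# Example: 20 points on y = 0.5 x + 1 with sigma = 1 pixel of noise
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 200.0, 20)
pts = np.column_stack([xs, 0.5 * xs + 1.0]) + rng.normal(0.0, 1.0, (20, 2))
print(fit_line(pts))
```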
Book Turntable In this sequence, a book is placed on a turntable and the camera is mounted on a tripod looking horizontally towards the book. Figure 3 shows a sample frame from the sequence and the reconstructed 3D structure and camera motion. The accumulated error in the rotation angle of the last frame is < 3.5°. All frames have between 7 and 9 line segments. This shows that we can compute the relative motion even when there are fewer than the 13 lines required for estimating the trifocal tensor.
PTZ Camera Although several methods exist that can compute the rotation for PTZ cameras, we evaluate the accuracy and robustness of our method on one since the sequence of images should exactly close the loop when performing a 360° pan. Figure 4 depicts some frames from the sequence. The sequence consists of 46 frames with an angle of 7.55° between each pair of consecutive frames, except the last frame, which is 20° away. We took the ground truth to be the rotation commands given to the PTZ camera; however, we had no information about how accurately the PTZ camera actually rotated. Figure 5 shows the computed total rotation as a function of the frame number and the error per frame for our algorithm and the 5-point algorithm [7]. For our method, the mean absolute error per frame is 0.36°.
Figure 2. (Left) Setup for the synthetic experiment. Random rotations around the y-axis were used. (Right) Error plots for the relative rotation.

Figure 5. (Left) Cumulative rotation and ground truth as a function of the image frame. (Right) Rotation error per frame.
Figure 3. Book sequence. Sample frame (Left) and
reconstruction (Right).
5 Conclusion
We have presented an algorithm that can compute the rotation around any given axis from the images of two orthogonal lines. We also showed how this algorithm can be used in a framework to automatically detect orthogonal lines. The performance of the algorithm was evaluated using synthetic and real datasets.
Figure 4. Sample frames from the PTZ Camera sequence.

References
[1] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York, ninth Dover edition, 1964.
[2] A. Criminisi, I. Reid, and A. Zisserman. Single View
Metrology. IJCV, 40(2):123–148, 2000.
[3] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications
to image analysis and automated cartography. Commun.
ACM, 24(6):381–395, June 1981.
[4] A. Fitzgibbon, G. Cross, and A. Zisserman. Automatic
3D model construction for turn-table sequences. In European Workshop on 3D Structure from Multiple Images
of Large-Scale Environments, SMILE, pages 155–170,
1998.
[5] R. I. Hartley. Lines and Points in Three Views and the
Trifocal Tensor. IJCV, 22(2):125–140, 1997.
[6] J. Kosecka. Extraction, matching and pose recovery based on dominant rectangular structures. CVIU,
100(3):274–293, 2005.
[7] D. Nistér. An Efficient Solution to the Five-Point Relative Pose Problem. PAMI, 26(6):756–777, 2004.
[8] D. Nistér, O. Naroditsky, and J. Bergen. Visual Odometry. In CVPR, volume 1, pages 652–659. IEEE, 2004.
[9] D. Scaramuzza, F. Fraundorfer, and R. Siegwart. Real-time monocular visual odometry for on-road vehicles with 1-point RANSAC. In IEEE International Conference on Robotics and Automation (ICRA), pages 4293–4299, May 2009.
[10] H. Stewenius, C. Engels, and D. Nister. Recent developments on direct relative orientation. ISPRS Journal of
Photogrammetry and Remote Sensing, 60(4):284–294,
June 2006.
[11] T. Werner and A. Zisserman. New Techniques for Automated Architectural Reconstruction from Photographs. In ECCV, pages 541–555, London, UK, 2002. Springer-Verlag.
[12] Z. Zhang. Determining the Epipolar Geometry and its
Uncertainty: A Review. IJCV, 27(2):161–195, 1998.