
Bundle Adjustment in the Large
Sameer Agarwal¹⋆, Noah Snavely², Steven M. Seitz³, and Richard Szeliski⁴
¹ Google Inc.   ² Cornell University   ³ Google Inc. & University of Washington   ⁴ Microsoft Research
Abstract. We present the design and implementation of a new inexact Newton type algorithm for solving large-scale bundle adjustment
problems with tens of thousands of images. We explore the use of Conjugate Gradients for calculating the Newton step and its performance
as a function of some simple and computationally efficient preconditioners. We show that the common Schur complement trick is not limited
to factorization-based methods and that it can be interpreted as a form
of preconditioning. Using photos from a street-side dataset and several
community photo collections, we generate a variety of bundle adjustment problems and use them to evaluate the performance of six different
bundle adjustment algorithms. Our experiments show that truncated
Newton methods, when paired with relatively simple preconditioners,
offer state of the art performance for large-scale bundle adjustment.
The code, test problems and detailed performance data are available
at http://grail.cs.washington.edu/projects/bal.
Key words: Structure from Motion, Bundle Adjustment, Preconditioned Conjugate Gradients
1 Introduction
Recent work in Structure from Motion (SfM) has demonstrated the possibility
of reconstructing geometry from large-scale community photo collections [1–3].
Bundle adjustment, the joint non-linear refinement of camera and point parameters, is a key component of most SfM systems, and one which can consume a
significant amount of time for large problems. As the number of photos in such
collections continues to grow into the hundreds of thousands or even millions,
the scalability of bundle adjustment algorithms has become a critical issue.
The basic mathematics of the bundle adjustment problem are well understood [4], and there is also a freely available high-quality implementation –
SBA [5]. SBA is based on a dense Cholesky factorization of the reduced camera matrix. It has space complexity that is quadratic and time complexity that
is cubic in the number of photos. While this works well for problems with a
few hundred photos, for problems involving tens of thousands of photos, it is
prohibitively expensive.
⋆ Part of this work was done while the author was at University of Washington.
(a) Structured - 6375 photos (b) Unstructured - 4585 photos
Fig. 1. Connectivity graphs for a structured dataset (captured from a moving truck)
and a community photo collection (consisting of photos matching the search term
“Dubrovnik” downloaded from Flickr). For each dataset, we show an adjacency matrix
representation of the connectivity graph, where black indicates a connection between
two photos.
With the exception of a few efforts [6–8, 1, 9], the development of large-scale bundle adjustment algorithms has not received significant attention in the
computer vision community. We believe this is because until now, the most
common sources of large SfM problems have been video and structured survey
datasets such as street-level and aerial imagery. For these datasets, the connectivity graph—i.e., the graph in which each photo is a node, and two photos are
connected if they are looking at the same part of the scene—is extremely sparse,
and has a mostly band-diagonal structure with a large diameter. For instance,
in the case of data acquired using a camera mounted on a vehicle driving down
a street, there is little to no overlap between photos taken even a few seconds
apart. Figure 1(a) shows one such graph. Thus, techniques that reduce the size
of the bundle adjustment problem by focusing on the most recently modified
part of the reconstruction are quite effective [7, 6].
Connectivity graphs of community photo collections are much less structured
and have a significantly smaller diameter, as they tend to represent popular
landmarks rather than a long, extended sequence of views. Figure 1(b) shows
the graph for a set of photos of the city of Dubrovnik downloaded from Flickr.
Compared to the structured dataset in Figure 1(a), which is 98% sparse with a mostly band-diagonal structure, the graph for Dubrovnik is only 84% sparse, with
a significantly more complex structure. This means that even though the dataset
in Figure 1(a) has almost 1800 more photos than the dataset in Figure 1(b), the
former requires 40x less time to find a sparse factorization of its reduced camera
matrix than the latter.
In this paper, we present the design and implementation of a new inexact
Newton type bundle adjustment algorithm, which uses substantially less time
and memory than standard Schur complement based methods, without compromising on the quality of the solution. We explore the use of the Conjugate
Gradients algorithm for calculating the Newton step and its performance as a
function of some simple and computationally efficient preconditioners. We also
show that the use of the Schur complement is not limited to factorization-based
methods, how it can be used as part of the Conjugate Gradients (CG) method
without incurring the computational cost of actually calculating and storing it
in memory, and how this use is equivalent to the choice of a particular preconditioner.
We present extensive experimental results on structured and unstructured
datasets with a wide variety of problem complexity, and present recommendations based on these experiments. The code, test problems and detailed performance results from this paper are available at http://grail.cs.washington.edu/projects/bal.
The rest of the paper is organized as follows. We begin in Section 2 with
a brief overview of the general nonlinear least squares problem, the Levenberg
Marquardt (LM) algorithm, and the Schur complement trick. In Section 3, we
introduce the inexact step LM algorithm, with a discussion of various methods
for preconditioning the Conjugate Gradients (CG) algorithm in Section 4. Section 5 reports the results of our experiments and we conclude in Section 6 with
a discussion.
2 Bundle Adjustment
Given a set of measured image feature locations and correspondences, the goal
of bundle adjustment is to find 3D point positions and camera parameters that
minimize the reprojection error. This optimization problem is usually formulated
as a non-linear least squares problem, where the error is the squared L2 norm
of the difference between the observed feature location and the projection of
the corresponding 3D point on the image plane of the camera. However, we are
not limited to using the L2 norm; even when robust loss functions like Huber’s
norm are used, the problem can be cast as a re-weighted non-linear least squares
problem [10]. Thus in what follows, we will use the term bundle adjustment to
mean a particular class of non-linear least squares problems.
2.1 Levenberg-Marquardt Algorithm
The Levenberg-Marquardt (LM) algorithm [11] is the most popular algorithm
for solving non-linear least squares problems, and the algorithm of choice for
bundle adjustment. In this section, we begin with a quick review of LM, and
then describe the Schur complement trick that substantially reduces the computational complexity of LM applied to bundle adjustment. Several excellent
references exist for the reader interested in more details of LM [11–14].
Let x ∈ R^n be an n-dimensional vector of variables, and let F(x) = [f_1(x), . . . , f_m(x)]^⊤ be an m-dimensional function of x. We are interested in solving the following optimization problem,

    \min_x \frac{1}{2} \|F(x)\|^2 .                                        (1)

The Jacobian J(x) of F(x) is an m × n matrix, where J_{ij}(x) = ∂_j f_i(x), and the gradient vector is g(x) = ∇ ½‖F(x)‖² = J(x)^⊤ F(x).
The general strategy when solving non-linear optimization problems is to
solve a sequence of approximations to the original problem [11]. At each iteration, the approximation is solved to determine a correction ∆x to the vector x.
For non-linear least squares, an approximation can be constructed by using the
linearization F (x + ∆x) ≈ F (x) + J(x)∆x, which leads to the following linear
least squares problem:
    \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + F(x)\|^2                  (2)
Unfortunately, naïvely solving a sequence of these problems and updating x ← x + ∆x leads to an algorithm that may not converge. To get a convergent algorithm, we need to control the size of the step ∆x. One way to do this is to introduce a regularization term:

    \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + F(x)\|^2 + \mu \|D(x)\Delta x\|^2 .        (3)
Here, D(x) is a non-negative diagonal matrix, typically the square root of the diagonal of the matrix J(x)^⊤ J(x), and µ is a non-negative parameter that controls the strength of regularization. It is straightforward to show that the step size ‖∆x‖ is inversely related to µ. LM updates the value of µ at each step based on how well the Jacobian J(x) approximates F(x). The quality of this fit is measured by the ratio of the actual decrease in the objective function to the decrease in the value of the linearized model L(∆x) = ½‖J(x)∆x + F(x)‖². This kind of reasoning is the basis of trust-region methods, of which LM is an early example [11].
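As an illustration, the following sketch shows one common gain-ratio rule for updating µ of the kind described above. The function name and the specific constants (0.25, 0.75, the factors 2 and 3) are illustrative choices from the trust-region literature, not necessarily the ones used in our implementation.

// Sketch of a gain-ratio based update of the LM parameter mu.
// 'actual' is the decrease in (1/2)||F(x)||^2 produced by the step dx;
// 'predicted' is the decrease predicted by the linear model L(dx).
double UpdateMu(double mu, double actual, double predicted, bool* accept_step) {
  const double rho = actual / predicted;  // quality of the linearization
  if (rho > 0.75) {
    mu /= 3.0;                            // model is good: regularize less
  } else if (rho < 0.25) {
    mu *= 2.0;                            // model is poor: regularize more
  }
  *accept_step = (rho > 0.0);             // only accept steps that reduce the objective
  return mu;
}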
The dominant computational cost in each iteration of the LM algorithm is the solution of the linear least squares problem (3). For general, small to medium scale least squares problems, the recommended method for solving (3) is the QR factorization [13]. However, the bundle adjustment problem has a very special structure, and a more efficient scheme for solving (3) can be constructed.
2.2 The Schur Complement Trick
We begin by introducing the regularized Hessian matrix H_µ(x) = J(x)^⊤ J(x) + µD(x)^⊤ D(x). It is easy to show that for µD(x) > 0, H_µ is a symmetric positive definite matrix and the solution to (3) can be obtained by solving the normal equations:

    H_\mu(x)\,\Delta x = -g(x) .                                           (4)
Now, suppose that the SfM problem consists of p cameras and q points, and the variable vector x has the block structure x = [y_1, . . . , y_p, z_1, . . . , z_q], where y and z correspond to camera and point parameters, respectively. Further, let the camera blocks be of size c and the point blocks be of size s (for most problems c = 6–9 and s = 3).
In most cases, a key characteristic of the bundle adjustment problem is that there is no term f_i that includes two or more camera or point blocks. In other
words, each term f_i(x) in the objective function can be re-written as f_i(x) = f_i(y(i), z(i)), where y(i) and z(i) are the camera and point blocks that occur in the i-th term. This in turn implies that the matrix H_µ is of the form

    H_\mu = \begin{bmatrix} B & E \\ E^\top & C \end{bmatrix} ,             (5)

where B ∈ R^{pc×pc} is a block diagonal matrix with p blocks of size c × c and C ∈ R^{qs×qs} is a block diagonal matrix with q blocks of size s × s. E ∈ R^{pc×qs} is a general block sparse matrix, with a block of size c × s for each observation.
Let us now block partition ∆x = [∆y, ∆z] and −g = [v, w] to restate (4) as the block structured linear system

    \begin{bmatrix} B & E \\ E^\top & C \end{bmatrix}
    \begin{bmatrix} \Delta y \\ \Delta z \end{bmatrix} =
    \begin{bmatrix} v \\ w \end{bmatrix} ,                                  (6)
and apply Gaussian elimination to it. As we noted above, C is a block diagonal matrix, with small diagonal blocks of size s × s. Thus, calculating the inverse of C by inverting each of these blocks is an extremely cheap, O(q) algorithm. This allows us to eliminate ∆z by observing that ∆z = C^{-1}(w − E^⊤ ∆y), giving us

    \left[ B - E C^{-1} E^\top \right] \Delta y = v - E C^{-1} w .          (7)

The matrix

    S = B - E C^{-1} E^\top ,                                               (8)
is the Schur complement of C in H_µ. It is also known as the reduced camera matrix, because the only variables participating in (7) are the ones corresponding to the cameras. S ∈ R^{pc×pc} is a block structured symmetric positive definite matrix, with blocks of size c × c. The block S_{ij} corresponding to the pair of images i and j is non-zero if and only if the two images observe at least one common point.
Now, (6) can be solved by first forming S, solving for ∆y, and then back-substituting ∆y to obtain the value of ∆z. Thus, the solution of what was an n × n, n = pc + qs, linear system is reduced to the inversion of the block diagonal matrix C, a few matrix-matrix and matrix-vector multiplies, and the solution of the block sparse pc × pc linear system (7). For almost all problems, the number of cameras is much smaller than the number of points, p ≪ q, thus solving (7) is significantly cheaper than solving (6). This is the Schur complement trick [15].
This still leaves open the question of solving (7). The method of choice for solving symmetric positive definite systems exactly is via the Cholesky factorization [16], and depending upon the structure of the matrix, there are, in general, two options. The first is direct factorization, where we store and factor S as a dense matrix [16]. This method has O(p²) space complexity and O(p³) time complexity and is only practical for problems with up to a few hundred cameras. But S is typically a fairly sparse matrix, as most images only see a small fraction of the scene. This leads us to the second option: sparse direct methods. These methods store S as a sparse matrix, use row and column re-ordering
algorithms to maximize the sparsity of the Cholesky decomposition, and focus
their compute effort on the non-zero part of the factorization [17]. Sparse direct
methods, depending on the exact sparsity structure of the Schur complement,
allow bundle adjustment algorithms to significantly scale up over those based on
dense factorization.
This however is not enough for community photo collections, where the size
and sparsity structure of S (e.g. Figure 1) is such that even constructing it
is a significant expense, and factoring it leads to near dense Cholesky factors.
Hence we would like to find alternatives that do not depend on the construction,
storage, and factorization of S and yet give good performance on large problems.
3 A Truncated Newton Solver
The factorization methods described above are based on computing an exact
solution of (3). But it is not clear if an exact solution of (3) is necessary at each
step of the LM algorithm to solve (1). In fact, we have already seen evidence that
this may not be the case, as (3) is itself a regularized version of (2). Indeed, it is
possible to construct non-linear optimization algorithms in which the linearized
problem is solved approximately. These algorithms are known as inexact Newton
or truncated Newton methods [11].
An inexact Newton method requires two ingredients. First, a cheap method
for approximately solving systems of linear equations. Typically an iterative
linear solver like the Conjugate Gradients method is used for this purpose [11].
Second, a termination rule for the iterative solver. A typical termination rule is of the form

    \|H_\mu(x)\,\Delta x + g(x)\| \le \eta_k\, \|g(x)\| .                   (9)

Here, k indicates the LM iteration number and 0 < η_k < 1 is known as the forcing sequence. Wright & Holt [18] prove that a truncated LM algorithm that uses an inexact Newton step based on (9) converges for any sequence η_k ≤ η_0 < 1, and the rate of convergence depends on the choice of the forcing sequence η_k.
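The sketch below shows the shape of such an inexact step computation: a standard preconditioned CG loop applied to (4) and stopped by rule (9). The function name, the Eigen types, and the callbacks ApplyH and ApplyMinv (which apply H_µ and the inverse of the preconditioner to a vector) are illustrative assumptions, not our exact implementation.

#include <functional>
#include <Eigen/Dense>

// Schematic preconditioned CG for H_mu dx = -g, stopped with the inexact
// Newton rule (9): ||H_mu dx + g|| <= eta * ||g||.
Eigen::VectorXd InexactStep(
    const std::function<Eigen::VectorXd(const Eigen::VectorXd&)>& ApplyH,
    const std::function<Eigen::VectorXd(const Eigen::VectorXd&)>& ApplyMinv,
    const Eigen::VectorXd& g, double eta, int max_iterations) {
  const Eigen::VectorXd b = -g;                   // right hand side of (4)
  Eigen::VectorXd dx = Eigen::VectorXd::Zero(g.size());
  Eigen::VectorXd r = b;                          // residual b - H_mu dx
  Eigen::VectorXd z = ApplyMinv(r);
  Eigen::VectorXd p = z;
  double rz = r.dot(z);
  for (int i = 0; i < max_iterations; ++i) {
    if (r.norm() <= eta * g.norm()) break;        // termination rule (9)
    const Eigen::VectorXd Hp = ApplyH(p);
    const double alpha = rz / p.dot(Hp);
    dx += alpha * p;
    r -= alpha * Hp;
    z = ApplyMinv(r);
    const double rz_new = r.dot(z);
    p = z + (rz_new / rz) * p;
    rz = rz_new;
  }
  return dx;
}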
4 Preconditioned Conjugate Gradients
The convergence rate of CG for solving (4) depends on the distribution of eigenvalues of H_µ [19]. A useful upper bound is √κ(H_µ), where κ(H_µ) is the condition number of the matrix H_µ. For most bundle adjustment problems, κ(H_µ) is high, and a direct application of CG to (4) results in extremely poor performance.
The solution to this problem is to replace (4) with a preconditioned system. Given a linear system Ax = b and a preconditioner M, the preconditioned system is given by M^{-1}Ax = M^{-1}b. The resulting algorithm is known as the Preconditioned Conjugate Gradients (PCG) algorithm, and its worst case complexity now depends on the condition number of the preconditioned matrix κ(M^{-1}A).
The key computational cost in each iteration of PCG is the evaluation of
the matrix vector product β = Aα and solution of the linear system M φ = ψ
for arbitrary vectors α and ψ. Thus, for each iteration of PCG to be efficient,
M should be cheaply invertible and for the number of iterations of PCG to be
small, the condition number κ(M −1 A) should be as small as possible. The ideal
preconditioner would be one for which κ(M −1 A) = 1. M = A achieves this,
but it is not a practical choice, as applying this preconditioner would require
solving a linear system equivalent to the unpreconditioned problem. So how
does one choose an effective preconditioner that is cheap to invert and results in
a significant reduction of the condition number of the preconditioned matrix?
The simplest of all preconditioners is the diagonal or Jacobi preconditioner, i.e., M = diag(A), which for block structured matrices like H_µ can be generalized to the block Jacobi preconditioner. H_µ also has the special property that its diagonal blocks B and C are themselves block diagonal matrices. This property makes the block Jacobi preconditioner

    M_J = \begin{bmatrix} B & 0 \\ 0 & C \end{bmatrix}                      (10)

the optimal block diagonal preconditioner for H_µ [20].
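Since B and C are themselves block diagonal, applying M_J^{-1} only requires solving small independent c × c and s × s systems. A minimal sketch (the function name, the flat storage of the diagonal blocks, and the use of Eigen are illustrative assumptions):

#include <vector>
#include <Eigen/Dense>

// Apply the block Jacobi preconditioner M_J of (10): solve M_J z = r by
// solving against each diagonal block of B (c x c) and C (s x s)
// independently.  The blocks are assumed to be stored densely, in the order
// in which the corresponding parameter blocks appear in x = [y_1..y_p, z_1..z_q].
Eigen::VectorXd ApplyBlockJacobi(const std::vector<Eigen::MatrixXd>& diag_blocks,
                                 const Eigen::VectorXd& r) {
  Eigen::VectorXd z(r.size());
  int offset = 0;
  for (const Eigen::MatrixXd& block : diag_blocks) {
    const int n = block.rows();
    // A real solver would factor each block once per LM iteration and reuse
    // the factorization across PCG iterations instead of refactoring here.
    z.segment(offset, n) = block.ldlt().solve(r.segment(offset, n));
    offset += n;
  }
  return z;
}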
Another option is to apply PCG to the reduced camera matrix S instead of H_µ. One reason to do this is that S is a much smaller matrix than H_µ, but more importantly, it can be shown that κ(S) ≤ κ(H_µ). There are two obvious choices for block diagonal preconditioners for S: the matrix B [21] and the block diagonal D(S) of S, i.e., the block Jacobi preconditioner for S.
Consider now the generalized Symmetric Successive Over-relaxation (SSOR) preconditioner for H_µ,

    M_\omega(P) = \begin{bmatrix} P & \omega E \\ 0 & C \end{bmatrix}
                  \begin{bmatrix} P^{-1} & 0 \\ 0 & C^{-1} \end{bmatrix}
                  \begin{bmatrix} P & 0 \\ \omega E^\top & C \end{bmatrix} ,   (11)

where P is some easily invertible matrix and 0 ≤ ω < 2 is a scalar parameter.
Observe that for ω = 0, M_0(B) = M_J is the block Jacobi preconditioner. More interestingly, for ω = 1, it can be shown that using M_1(P) as a preconditioner for H_µ is exactly equivalent to using the matrix P as a preconditioner for the reduced camera matrix S [19]. This means that for P = I, using M_1(I) as a preconditioner for H_µ is the same as running pure CG on S, and we can run PCG on S with the preconditioners B and D(S) by using M_1(B) and M_1(D(S)) as preconditioners for H_µ. Thus, the Schur complement, which started out its life as a way of specifying the order in which the variables should be eliminated from H_µ when solving (4) exactly, returns to the scene as a generalized SSOR preconditioner when solving the same linear system iteratively.
As discussed earlier, the cost of forming and storing the Schur complement
S can be prohibitive for large problems. Indeed, for an inexact Newton solver
that uses PCG on S, almost all of its time is spent in constructing S; the time
spent inside the PCG algorithm is negligible in comparison.
Because PCG only needs access to S via its product with a vector, one way to evaluate Sx is to use (11) for ω = 1. However, we can do even better. Observe
that

    x_1 = E^\top x, \quad x_2 = C^{-1} x_1, \quad x_3 = E x_2, \quad x_4 = B x, \quad Sx = x_4 - x_3 .        (12)
Thus, we can run PCG on S with the same computational effort per iteration as PCG on H_µ, while reaping the benefits of a more powerful preconditioner. Even if we decide to use the block Jacobi preconditioner D(S), it can be constructed at a cost that is linear in the number of observations, O(m), and with a memory cost that is linear in the number of cameras, O(p). Both of these are substantially less than the cost of computing and storing the full matrix S.
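A sketch of the implicit product (12); the sparse storage of B and E, the precomputed block inverses of C, and the function name are illustrative assumptions.

#include <vector>
#include <Eigen/Dense>
#include <Eigen/Sparse>

// Implicit evaluation of S x = (B - E C^{-1} E^T) x as in (12), without ever
// forming S.  Cinv_blocks holds the precomputed s x s inverses of the
// diagonal blocks of C.
Eigen::VectorXd SchurTimes(const Eigen::SparseMatrix<double>& B,
                           const Eigen::SparseMatrix<double>& E,
                           const std::vector<Eigen::MatrixXd>& Cinv_blocks,
                           const Eigen::VectorXd& x) {
  const Eigen::VectorXd x1 = E.transpose() * x;   // x1 = E^T x
  Eigen::VectorXd x2(x1.size());                  // x2 = C^{-1} x1, block by block
  int offset = 0;
  for (const Eigen::MatrixXd& Cinv : Cinv_blocks) {
    const int s = Cinv.rows();
    x2.segment(offset, s) = Cinv * x1.segment(offset, s);
    offset += s;
  }
  const Eigen::VectorXd x3 = E * x2;              // x3 = E x2
  const Eigen::VectorXd x4 = B * x;               // x4 = B x
  return x4 - x3;                                 // S x = x4 - x3
}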
Equation (12) is closely related to Domain Decomposition methods for solving
large linear systems that arise in structural engineering and partial differential
equations. In the language of Domain Decomposition, each point in the SfM
problem is a domain, and the cameras form the interface between these domains.
The iterative solution of the Schur complement then falls within the sub-category
of techniques known as Iterative Sub-structuring [19, 22].
5 Experimental Evaluation
5.1 Algorithms
We compared the performance of six bundle adjustment algorithms: explicit-direct, explicit-sparse, normal-jacobi, explicit-jacobi, implicit-jacobi and implicit-ssor. The first two methods are exact step LM algorithms, and the remaining four are inexact step LM algorithms. explicit-direct, explicit-sparse and explicit-jacobi explicitly construct the Schur complement matrix S and solve (7) using dense factorization, sparse direct factorization, and PCG using the block Jacobi preconditioner D(S), respectively. normal-jacobi uses PCG on H_µ with the block Jacobi preconditioner M_J. implicit-jacobi and implicit-ssor run PCG on S using the block Jacobi preconditioner D(S) and B, respectively. Unlike explicit-jacobi, they use (12) to implicitly evaluate matrix-vector products with S.
Assuming that all the algorithms store H_µ in the same format, the difference between their memory usages depends on how they use the Schur complement S. implicit-jacobi, implicit-ssor and normal-jacobi do not compute or store S, and therefore require the least amount of memory. explicit-direct is the most expensive of the three solvers that construct S, as it uses O(p²) memory to store and factor S. explicit-sparse and explicit-jacobi are less expensive as they store S as a sparse matrix, and thus their storage requirements scale with the sparsity of S. explicit-sparse requires additional storage for the Cholesky factorization of S, and the amount of memory required is a function of the sparsity structure of S and not just the number of non-zero entries.
For each solver, LM was run for a maximum of 50 iterations, i.e., (3) was solved 50 times. After each LM iteration the step ∆x may or may not be accepted, depending on whether it leads to a better solution. Inside each iteration of LM, PCG was run for a minimum of 10 iterations, and terminated when either ‖H_µ(x)∆x + g(x)‖ ≤ η_k ‖g(x)‖ was satisfied or 1000 iterations were performed.
Fig. 2. Datasets. This scatter plot shows each of the datasets in our testbed, colored
according to type (Ladybug, Skeletal, Final). The x-axis is the number of images in
the problem and the y-axis is the sparsity of the Schur complement matrix S. The
background of the plot is shaded according to the characteristics of the problem: small
and dense (white), then in increasing gray-level, small and sparse, large and dense, and
large and sparse.
The forcing sequence η_k was set to a constant η_k = 0.1. At the beginning of LM, the square root of the diagonal of the matrix J(x_0)^⊤ J(x_0) is estimated and used as a scaling matrix for the variables. This is a standard method for normalizing all the variables in a problem [23] and is necessary as some parameters (e.g., radial distortion) are up to 20 orders of magnitude more sensitive than others (e.g., rotation). For the factorization methods, especially CHOLMOD, this improves numerical stability. For the iterative solvers, this is equivalent to applying the Jacobi preconditioner before any of the other preconditioners are used.
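One way to realize this scaling is to compute D_0 = sqrt(diag(J(x_0)^⊤ J(x_0))), i.e., the column norms of the Jacobian at the starting point, work with the column-scaled Jacobian J D_0^{-1}, and rescale the resulting step. A sketch (the function name and the small guard against all-zero columns are illustrative assumptions):

#include <cmath>
#include <Eigen/Dense>
#include <Eigen/Sparse>

// Compute the diagonal of D0 = sqrt(diag(J^T J)), i.e. the column norms of
// the Jacobian at the initial point x0, used to scale the variables.
Eigen::VectorXd ColumnScaling(const Eigen::SparseMatrix<double>& J) {
  Eigen::VectorXd d(J.cols());
  for (int j = 0; j < J.outerSize(); ++j) {   // column-major: one column per pass
    double squared_norm = 0.0;
    for (Eigen::SparseMatrix<double>::InnerIterator it(J, j); it; ++it) {
      squared_norm += it.value() * it.value();
    }
    d(j) = std::sqrt(squared_norm) + 1e-12;   // guard against all-zero columns
  }
  return d;  // entries of the diagonal scaling matrix D0
}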
All six algorithms were implemented as part of a single C++ code base. We
use GotoBLAS2 [24] for dense linear algebra and CHOLMOD [17] for sparse
Cholesky factorization. All experiments were performed on a workstation with dual quad-core CPUs clocked at 2.27 GHz and 48 GB of RAM, running a 64-bit Linux operating system.
5.2 Datasets
We experimented with two sources of data:
1. Images captured at a regular rate using a Ladybug camera mounted on a
moving vehicle. Image matching was done by exploiting the temporal order of
the images and the GPS information captured at the time of image capture.
2. Images downloaded from Flickr.com and matched by the authors of [3]. We
used images from Trafalgar Square and the cities of Dubrovnik, Venice, and
Rome.
For Flickr photographs, the matched images were decomposed into a skeletal
set (i.e., a sparse core of images) and a set of leaf images [1]. The skeletal set
was reconstructed first, then the leaf images were added to it via resectioning
(Figure panels: columns, left to right, explicit-direct, explicit-sparse, normal-jacobi, explicit-jacobi, implicit-jacobi, implicit-ssor; rows, top to bottom, τ = 0.1, 0.01, 0.001.)
Fig. 3. Performance analysis. Each column in this set of plots corresponds to one of six algorithms, and each row corresponds to one of three tolerances τ. For each solver (column), a point is colored red if the solver was declared a winner for the given tolerance, and gray otherwise. Winning solvers are the ones that achieve a relative decrease in the RMS error (r_k − r*)/(r_0 − r*) ≤ τ in the least amount of time (there can be more than one such solver). The axes of the individual plots are the same as in Figure 2.
followed by triangulation of the remaining 3D points. The skeletal sets and the Ladybug datasets were reconstructed incrementally using a modified version of Bundler [25], which was instrumented to dump intermediate unoptimized reconstructions to disk. This gave rise to the Skeletal and the Ladybug problems. We refer to the bundle adjustment problems obtained after adding the leaf images to the skeletal set and triangulating the remaining points as the Final problems. For each dataset we use a nine-parameter camera model (6 for pose, 1 for focal length, and 2 for radial distortion).
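For reference, one camera's parameter block then has the following shape; the field names, their ordering, and the angle-axis rotation shown here are illustrative assumptions (the exact conventions are documented with the datasets on the project website).

// Nine-parameter camera block: 6 for pose, 1 for focal length, 2 for radial
// distortion.  The ordering and the angle-axis parameterization are
// illustrative; see the project website for the datasets' exact conventions.
struct Camera {
  double rotation[3];     // pose: rotation (e.g., angle-axis)
  double translation[3];  // pose: translation
  double focal_length;    // single focal length
  double k1, k2;          // radial distortion coefficients
};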
Figure 2 plots the three types of problems. The x-axis is the number of
images on a log-scale and the y-axis is the sparsity of the S matrix. The Ladybug
(blue) set has small dense problems and large sparse problems with almost band
diagonal sparsity. The Skeletal (red) set has small dense, and medium to large
sparse problems with random sparsity. The Final (green) set has large problems
with low to high sparsity. Their size and sparsity can pose significant challenges
for state of the art algorithms. Complete details on the properties of each problem
used in the experiments can be found on the project website.
5.3 Analysis
Detailed statistics on the performance of all algorithms are available on the
project website. Here we summarize the broad trends in the data.
We compare solvers across problems by looking at how often they are the first one to improve the RMS error by a certain fraction. Concretely, for each solver and problem, let

    r_k = \sqrt{ \frac{1}{m} \sum_{i=1}^{m} f_i^2(x_k) }

denote the RMS error at the end of iteration
(Figure panels: (a) Rome Final, (b) Venice Final, (c) Dubrovnik Skeletal, (d) Trafalgar Skeletal, (e) Venice Skeletal, and Ladybug.)
Fig. 4. A sampling of run time plots. In each plot, the x-axis is time on a log scale, and the y-axis is the relative decrease in the RMS error (r_k − r*)/(r_0 − r*). The three black dashed horizontal lines in each plot correspond to the three tolerances, i.e., τ = 0.1, 0.01 and 0.001. Note that explicit-direct and explicit-sparse are missing from the Venice Final plot as they ran out of memory.
k, and let r* denote the minimum RMS error across all solvers for that problem. Then, for a given tolerance τ, we find the solvers for which (r_k − r*)/(r_0 − r*) ≤ τ is satisfied in the least amount of time. We do this for three exponentially tighter tolerances, τ = 0.1, 0.01, 0.001. Figure 3 plots the results. The three rows correspond to the values of τ and the six columns correspond to the different bundle adjustment algorithms. As in Figure 2, for each plot, the x-axis is the number of images on a log scale, and the y-axis is the sparsity of the Schur complement matrix S. In each plot, we show all the problems in light grey, and then in red, the problems for which that solver at that tolerance level was one of the winners⁵.
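The winner selection for one problem and tolerance can be summarized by the sketch below; it is schematic, and the function name and the assumption that per-iteration RMS errors and cumulative solve times are recorded are illustrative. A solver is declared a winner if this time is minimal among all solvers for the problem (ties are allowed, since times are measured in seconds).

#include <cstddef>
#include <vector>

// For one problem and one solver, return the earliest time at which the
// relative decrease in RMS error (r_k - r_star) / (r_0 - r_star) falls below
// the tolerance tau, or a negative value if it never does.  r[k] is the RMS
// error and t[k] the cumulative time after LM iteration k; r_star is the
// minimum RMS error reached by any solver on this problem.
double TimeToTolerance(const std::vector<double>& r,
                       const std::vector<double>& t,
                       double r_star, double tau) {
  for (std::size_t k = 0; k < r.size(); ++k) {
    if ((r[k] - r_star) / (r[0] - r_star) <= tau) {
      return t[k];
    }
  }
  return -1.0;  // tolerance never reached within the iteration budget
}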
From Figure 3, we observe that for problems with up to a few hundred images
and all three tolerances, explicit-direct offers consistently good performance. State
of the art BLAS and LAPACK libraries on multicore systems have excellent
performance, and for small to moderate sized matrices, an exact step LM with
a dense Cholesky solver is hard to beat. This explains the continuing popularity
and success of SBA [5].
For larger problems and the high tolerance value τ = 0.1, both normal-jacobi and implicit-ssor do well, with implicit-ssor working on a much wider variety of problems. As the value of τ decreases, the performance of normal-jacobi rapidly degrades, indicating that the quality of preconditioning is not good enough to produce high quality Newton steps in a short amount of time. On the other
⁵ Since time is measured in seconds, there may be more than one such solver.
hand, as τ is decreased, explicit-jacobi, which is the most expensive of the iterative solvers, becomes a viable candidate, with the block Jacobi preconditioning of S starting to show its benefits. implicit-ssor beats explicit-jacobi when S has low sparsity. This is not surprising, as the cost of computing a nearly dense reduced camera matrix becomes a significant factor, whereas implicit-ssor is able to avoid this extra computational burden.
A closer examination of the data reveals that despite an overall degradation
in performance, normal-jacobi continues to work well for the larger problems in
the Final set. We believe this is because of the structure of the Skeletal sets
algorithm. After the skeletal set has been reconstructed, the geometric core of
the reconstruction is quite rigid and stable. The error in the reconstruction after
the leaf images have been added is mostly local and no major global changes that
span the entire reconstruction are expected. Therefore, the simple block Jacobi preconditioner captures the structure of H_µ quite well, and at a substantially lower computational cost than any other preconditioner.
It is also worth observing that for some of the problems, as the value of τ is decreased, factorization-based solvers become more competitive. This is expected,
as lower values of τ demand that the LM algorithm take higher quality steps at
each iteration. In this regime, the higher cost of the exact step algorithms, at
least for the smaller problems, wins over the increased iteration complexity of
the inexact step algorithms. Better performance for inexact step algorithms will
require more sophisticated forcing sequences η_k and preconditioners.
There were two surprises. First, the discrepancy in the performance of explicit-jacobi and implicit-jacobi. In exact arithmetic, these two algorithms should return exactly the same answer, but that is not the case in practice. A closer look at the data revealed that for the same linear system, the two methods resulted in different numbers of iterations and answers, sometimes significantly so, indicating numerical instability in implicit-jacobi, which merits further investigation. Second,
explicit-sparse did not emerge as a clear winner in any of the problem categories.
Either the problems were too small for the additional setup cost and the more
complicated algorithm used in CHOLMOD to beat dense Cholesky factorization,
or the problems were large enough that the exact factorization algorithms, sparse
and dense, were beaten by the inexact step algorithms.
In summary, we observe that for large scale problems, the iterative methods
are a significant memory and time win over Cholesky factorization-based methods. Particularly for Final problems, this can be the difference between being
able to solve the problem or not, as evidenced by the large Venice example. But
even for medium-sized problems involving a few thousand images, the iterative solvers are up to an order of magnitude faster while consuming 3–5 times less
memory. For the sparse problems in the Ladybug and Skeletal datasets, the advantage is usually in terms of memory and simplicity of implementation rather
than time, as the cost of exact factorization is offset by its superior quality.
However, we must remember that these experiments were performed on state of
the art workstations with much more RAM than is commonly available today,
which makes the memory usage of the iterative methods even more attractive.
For small to medium problems, we recommend the use of a dense Cholesky-based LM algorithm. For larger problems, the situation is more complicated and there is no one clear answer. Both implicit-ssor and explicit-jacobi offer competitive performance, with implicit-ssor being preferred for problems with lower sparsity and explicit-jacobi for problems with high sparsity. We hope that once the cause of the numerical instability in implicit-jacobi is understood and rectified, it will offer a memory efficient solver that bridges the gap between these two solvers and works on large bundle adjustment problems, independent of their sparsity.
6 Discussion
The classical solution to bundle adjustment is based on exploiting the primary
sparsity structure of the problem to form a Schur complement and factoring
it [26, 4, 10]. With the exception of a few recent attempts [27, 28], it has remained
the dominant method for doing bundle adjustment. While suitable for problems
with a few hundred images, this method does not scale to larger problems with
thousands of images. In this paper, we have shown, with the help of an extensive test suite of large-scale bundle adjustment problems, that a truncated Newton style LM algorithm coupled with a simple preconditioner delivers state of the art performance at a fraction of the time and memory cost of methods based on factoring the Schur complement.
Going forward, the preconditioners considered in this paper are relatively simple, but we hope that the identification with domain decomposition methods opens up the possibility of using much more sophisticated preconditioners developed in the structural engineering literature [19, 22]. Numerical stability is another critical issue. As we noted earlier, even though explicit-jacobi and implicit-jacobi are algebraically equivalent algorithms, they show problem-dependent numerical behavior. A more thorough development that accounts for the numerical stability of evaluating the matrix-vector products using the explicit and the implicit schemes is needed.
Acknowledgements The authors are grateful to Drew Steedly and David Nister for useful discussions and Kristin Branson for reading multiple drafts of the
paper.
This work was supported in part by National Science Foundation grant IIS-0811878, SPAWAR, the Office of Naval Research, the University of Washington Animation Research Labs, Google, Intel, and Microsoft.
References
1. Snavely, N., Seitz, S.M., Szeliski, R.: Skeletal graphs for efficient structure from
motion. In: CVPR. (2008) 1–8
2. Li, X., Wu, C., Zach, C., Lazebnik, S., Frahm, J.: Modeling and recognition of
landmark image collections using iconic scene graphs. ECCV (1) (2008) 427–440
3. Agarwal, S., Snavely, N., Simon, I., Seitz, S.M., Szeliski, R.: Building Rome in a
day. In: ICCV. (2009)
4. Triggs, B., McLauchlan, P., Hartley, R.I., Fitzgibbon, A.: Bundle Adjustment - A modern
synthesis. In: Vision Algorithms’99. (1999) 298–372
5. Lourakis, M., Argyros, A.A.: SBA: A software package for generic sparse bundle
adjustment. TOMS 36 (2009) 2
6. Mouragnon, E., Lhuillier, M., Dhome, M., Dekeyser, F., Sayd, P.: Generic and
real-time structure from motion using local bundle adjustment. Image and Vision
Computing 27 (2009) 1178–1193
7. Steedly, D., Essa, I.: Propagation of innovative information in non-linear least-squares structure from motion. In: ICCV. (2001) 223–229
8. Ni, K., Steedly, D., Dellaert, F.: Out-of-core bundle adjustment for large-scale 3d
reconstruction. In: ICCV. (2007)
9. Steedly, D., Essa, I., Dellaert, F.: Spectral partitioning for structure from motion.
In: ICCV. (2003) 996–1003
10. Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press (2003)
11. Nocedal, J., Wright, S.: Numerical optimization. Springer (2000)
12. Moré, J.J.: The Levenberg-Marquardt algorithm: implementation and theory. Lecture Notes in Math. 630 (1977) 105–116
13. Björck, A.: Numerical methods for least squares problems. SIAM (1996)
14. Madsen, K., Nielsen, H., Tingleff, O.: Methods for non-linear least squares problems. (2004)
15. Brown, D.C.: A solution to the general problem of multiple station analytical
stereotriangulation. Technical Report 43, Patrick Air Force Base, Florida (1958)
16. Trefethen, L., Bau, D.: Numerical linear algebra. SIAM (1997)
17. Chen, Y., Davis, T., Hager, W., Rajamanickam, S.: Algorithm 887: CHOLMOD,
Supernodal Sparse Cholesky Factorization and Update/Downdate. TOMS 35
(2008)
18. Wright, S.J., Holt, J.N.: An inexact Levenberg-Marquardt method for large sparse
nonlinear least squares. J. Austral. Math. Soc. Ser. B 26 (1985) 387–403
19. Saad, Y.: Iterative methods for sparse linear systems. SIAM (2003)
20. Elsner, L.: A note on optimal block-scaling of matrices. Numer. Math. 44 (1984)
127–128
21. Mandel, J.: On block diagonal and Schur complement preconditioning. Numer.
Math. 58 (1990) 79–93
22. Mathew, T.: Domain decomposition methods for the numerical solution of partial
differential equations. Springer Verlag (2008)
23. Dennis Jr., J., Gay, D., Welsch, R.: Algorithm 573: NL2SOL, an adaptive nonlinear
least-squares algorithm [E4]. TOMS 7 (1981) 369–383
24. Goto, K., Van De Geijn, R.: High-performance implementation of the level-3 BLAS.
TOMS 35 (2008) 1–14
25. Snavely, N., Seitz, S.M., Szeliski, R.: Photo Tourism: Exploring photo collections
in 3D. TOG 25 (2006) 835–846
26. Engels, C., Stewenius, H., Nister, D.: Bundle adjustment rules. Photogrammetric
Computer Vision 2 (2006)
27. Lourakis, M., Argyros, A.: Is Levenberg-Marquardt the most efficient optimization
algorithm for implementing bundle adjustment? In: ICCV. (2005) 1526–1531
28. Byröd, M., Åström, K.: Bundle adjustment using conjugate gradients with multiscale preconditioning. In: BMVC. (2009)