Robust Estimation of Nonrigid Transformation for Point Set

Robust Estimation of Nonrigid Transformation for Point Set Registration
Jiayi Ma1,2 , Ji Zhao3 , Jinwen Tian1 , Zhuowen Tu4 , and Alan L. Yuille2
1
Huazhong University of Science and Technology, 2 Department of Statistics, UCLA
3
The Robotics Institute, CMU, 4 Lab of Neuro Imaging, UCLA
{jyma2010, zhaoji84, zhuowen.tu}@gmail.com, [email protected], [email protected]
Abstract
widely studied [5, 4, 7, 19, 14]. By contrast, nonrigid registration is more difficult because the underlying nonrigid
transformations are often unknown, complex, and hard to
model [6]. But nonrigid registration is very important because it is required for many real world tasks including
hand-written character recognition, shape recognition, deformable motion tracking and medical image registration.
In this paper, we focus on the nonrigid case and present
a robust algorithm for nonrigid point set registration. There
are two unknown variables we have to solve for in this problem: the correspondence and the transformation. Although
solving for either variable without information regarding
the other is difficult, an iterated estimation framework can
be used [4, 3, 6]. In this iterative process, the estimate of the
correspondence is used to refine the estimate of the transformation, and vice versa. But a problem arises if there
are errors in the correspondence which occurs in many applications particularly if the transformation is large and/or
there are outliers in the data (e.g., data points that are not
undergoing the non-rigid transformation). In this situation,
the estimate of the transformation will degrade badly unless it is performed robustly. The main contribution of our
approach is to robustly estimate the transformations from
the correspondences using a robust estimator named the L2 Minimizing Estimate (L2 E) [20, 2].
More precisely, our approach iteratively recovers the
point correspondences and estimates the transformation between two point sets. In the first step of the iteration, feature descriptors such as shape context are used to establish correspondence. In the second step, we estimate the
transformation using the robust estimator L2 E. This estimator enable us to deal with the noise and outliers in the
correspondences. The nonrigid transformation is modeled
in a functional space, called the reproducing kernel Hilbert
space (RKHS) [1], in which the transformation function has
an explicit kernel representation.
We present a new point matching algorithm for robust
nonrigid registration. The method iteratively recovers the
point correspondence and estimates the transformation between two point sets. In the first step of the iteration, feature descriptors such as shape context are used to establish
rough correspondence. In the second step, we estimate the
transformation using a robust estimator called L2 E. This is
the main novelty of our approach and it enables us to deal
with the noise and outliers which arise in the correspondence step. The transformation is specified in a functional
space, more specifically a reproducing kernel Hilbert space.
We apply our method to nonrigid sparse image feature correspondence on 2D images and 3D surfaces. Our results
quantitatively show that our approach outperforms state-ofthe-art methods, particularly when there are a large number of outliers. Moreover, our method of robustly estimating
transformations from correspondences is general and has
many other applications.
1. Introduction
Point set registration is a fundamental problem which
frequently arises in computer vision, medical image analysis, and pattern recognition [5, 4, 6]. Many tasks in these
fields – such as stereo matching, shape matching, image
registration and content-based image retrieval – can be formulated as a point matching problems because point representations are general and easy to extract [5]. The points
in these tasks are typically the locations of interest points
extracted from an image, or the edge points sampled from
a shape contour. The registration problem then reduces to
determining the correct correspondence and to find the underlying spatial transformation between two point sets extracted from the input data.
The registration problem can be categorized into rigid or
nonrigid registration depending on the application and the
form of the data. Rigid registration, which only involves a
small number of parameters, is relatively easy and has been
1.1. Related Work
The iterated closest point (ICP) algorithm [4] is one
of the best known point registration approaches. It uses
1
2.1. Problem Formulation Using L2 E : Robust Estimation
nearest-neighbor relationships to assign a binary correspondence, and then uses estimated correspondence to refine the
transformation. Belongie et al. [3] introduced a method for
registration based on the shape context descriptor, which
incorporates the neighborhood structure of the point set
and thus helps establish correspondence between the point
sets. But these methods ignore robustness when they recover the transformation from the correspondence. In related work, Chui and Rangarajan [6] established a general
framework for estimating correspondence and transformations for nonrigid point matching. They modeled the transformation as a thin-plate spline and did robust point matching by an algorithm (TRS-RPM) which involved deterministic annealing and soft-assignment. Alternatively, the coherence point drift (CPD) algorithm [17] uses Gaussian radial basis functions instead of thin-plate splines. Another
interesting point matching approach is the kernel correlation (KC) based method [22], which was later improved
in [9]. Zheng and Doermann [27] introduced the notion
of a neighborhood structure for the general point matching
problem, and proposed a matching method, the robust point
matching-preserving local neighborhood structures (RPMLNS) algorithm. Other related work includes the relaxation
labeling method, generalized in [11], and the graph matching approach for establishing feature correspondences [21].
The main contributions of our work include: (i) we propose a new robust algorithm to estimate a spatial transformation/mapping from correspondences with noise and outliers; (ii) we apply the robust algorithm to nonrigid point
set registration and also to sparse image feature correspondence.
Parametric estimation is typically done using maximum
likelihood estimation (M LE). It can be shown that M LE
is the optimal estimator if it is applied to the correct probability model for the data (or to a good approximation). But
M LE can be badly biased if the model is not a sufficiently
good approximation or, in particular, there are a significant
fraction of outliers. In many point matching problems, it
is desirable to have a robust estimator of the transformation
f because the point correspondence set S usually contains
outliers. There are two choices: (i) to build a more complex
model that includes the outliers – which is complex since it
involves modeling the outlier process using extra (hidden)
variables which enable us to identify and reject outliers,
or (ii) to use an estimator which is different from M LE
but less sensitive to outliers, as described in Huber’s robust
statistics [8]. In this paper, we use the second method and
adopt the L2 E estimator [20, 2], a robust estimator which
minimizes the L2 distance between densities, and is particularly appropriate for analyzing massive data sets where
data cleaning (to remove outliers) is impractical. More formally, L2 E estimator for model f (x|θ) recommends estimating the parameter θ by minimizing the criterion:
Z
n
2X
f (xi |θ).
(1)
L2 E(θ) = f (x|θ)2 dx −
n i=1
To get some intuition for why L2 E is robust, observe that its
penalty for a low probability point xi is −f (xi |θ) which is
much less than the penalty of − log f (xi |θ) given by M LE
(which becomes infinite as f (xi |θ) tends to 0). Hence
M LE is reluctant to assign low probability to any points,
including outliers, and hence tends to be biased by outliers. By contrast, L2 E can assign low probabilities to
many points, hopefully to the outliers, without paying too
high a penalty. To demonstrate the robustness of L2 E, we
present a line-fitting example which contrasts the behavior
of M LE and L2 E, see Fig. 1. The goal is to fit a linear regression model, y = αx + , with residual ∼ N (0, 1),
by estimating α using M LE and
Pn L2 E. This gives, respectively, α̂M LE = arg
max
α
i=1 log φ(yi − αxi |0, 1)
h
i
P
n
1
2
√
and α̂L2 E = arg minα 2 π − n i=1 φ(yi − αxi |0, 1) ,
2. Estimating Transformation from Correspondences by L2 E
Given a set of point correspondences S = {(xi , yi )}ni=1 ,
which are typically perturbed by noise and by outlier points
which undergo different transformations, the goal is to estimate a transformation f : yi = f (xi ) and fit the inliers.
In this paper we make the assumption that the noise on
the inliers is Gaussian on each component with zero mean
and uniform standard deviation σ (our approach can be directly applied to other noise models). More precisely, an
inlier point correspondence (xi , yi ) satisfies yi − f (xi ) ∼
N (0, σ 2 I), where I is an identity matrix of size d×d, with d
being the dimension of the point. The data {yi − f (xi )}ni=1
can then be thought of as a sample set from a multivariate
normal density N (0, σ 2 I) which is contaminated by outliers. The main idea of our approach is then, to find the
largest portion of the sample set (e.g., the underlying inlier
set) that “matches” the normal density model, and hence
estimate the transformation f for the inlier set. Next, we introduce a robust estimator named L2 -minimizing estimate
(L2 E) which we use to estimate the transformation f .
where φ(x|µ, Σ) denotes the normal density.
As shown in Fig. 1, L2 E is very resistant when we contaminate the data by outliers, but M LE does not show this
desirable property. L2 E always has a global minimum at
approximately 0.5 (the correct value for α) but M LE’s estimates become steadily worse as the amount of outliers increases. Observe, in the bottom right figure, that L2 E also
has a local minimum near α = 2, which becomes deeper as
the number n of outliers increases so that the two minima
become approximately equal when n = 200. This is ap2
our formulation.
We define an RKHS H by a positive definite matrixvalued kernel Γ : IRd × IRd → IRd×d . The optimal transformation f which minimizes
Pn the L2 E functional (2) then
takes the form f (x) =
i=1 Γ(x, xi )ci [16, 26], where
the coefficient ci is a d × 1 dimensional vector (to be determined). Hence, the minimization over the infinite dimensional Hilbert space reduces to finding a finite set of
n coefficients ci . But in point correspondence problem the
point set typically contains hundreds or thousands of points,
which causes significant complexity problems (in time and
space). Consequently, we adopt a sparse approximation,
and randomly pick only a subset of size m input points
{x̃i }m
i=1 to have nonzero coefficients in the expansion of the
solution. This follows [18] who found that this approximation works well and that simply selecting a random subset
of the input points in this manner, performs no worse than
more sophisticated and time-consuming methods. Therefore, we seek a solution of form
25
50
20
40
15
y
y
30
10
20
5
10
0
0
0
2
4
6
8
−5
0
10
2
4
x
1
0 100
10
160
180
200
−2
1.5
2
2.5
120
−1
log−likelihood
log−likelihood
8
x 10
100
120
140
−1
6
x
3
4
x 10
−3
−4
140
160
180
200
−3
−5
−7
−5
−6
0
1
2
3
4
−9
0
5
0.5
1
α
α
0.4
0.4
α = 0.5
0
−0.4
−0.8
L2E
L2E
0
200
180
160
140
−0.4
200
180
160
−0.8
140
120
120
100
−1.2
0
f (x) =
100
1
2
3
4
5
−1.2
0
0.5
1
1.5
2
2.5
α
α
m
X
(3)
Γ(x, x̃i )ci .
i=1
The chosen point set {x̃i }m
i=1 are somewhat analogous to
“control points” [5]. By including a regularization term for
imposing smooth constraint on the transformation, the L2 E
functional (2) becomes:
Figure 1. Comparison between L2 E and M LE for linear regression as the number of outliers varies. Top row: data samples, where the inliers are shown by cyan pluses, and the outliers by magenta circles. The goal is to estimate the slope α of
the line model y = αx. We vary the number of data samples
n = 100, 120, · · · , 200, by always adding 20 new outliers (retaining the previous samples). In the second column the outliers
are generated from another line model y = 2x. Middle and bottom rows: the curves of M LE and L2 E respectively. The M LE
estimates are correct for n = 100 but rapidly degrade as we add
outliers, see how the peak of the log-likelihood changes in the second row. By contrast, (see third row) L2 E estimates α correctly
even when half the data is outliers and also develops a local minimum to fit the outliers when appropriate (third row, right column).
Best viewed in color.
n
L2 E(f, σ 2 ) =
2X
1
1
−
d
d/2
n i=1 (2πσ 2 )d/2
2 (πσ)
e−
kyi −Pm
j=1 Γ(xi ,x̃j )cj k
2σ 2
2
+ λkf k2Γ ,
(4)
where λ > 0 controls the strength of regularization, and
the stabilizer kf k2Γ is defined by an inner product, e.g.,
kf k2Γ = hf, f iΓ . By choosing a diagonal decomposable
2
kernel [26]: Γ(xi , xj ) = e−βkxi −xj k I with β determining
the width of the range of interaction between samples (i.e.
neighborhood size), the L2 E functional (4) may be conveniently expressed in the following matrix form:
propriate because, in this case, the contaminated data also
comes from the same linear parametric model with slope
α = 2, e.g., y = 2x.
We now apply the L2 E formulation in (1) to the point
matching problem, assuming that the noise of the inliers
is given by a normal distribution, and obtain the following
functional criterion:
n
1
2X
1
L2 E(C, σ ) = d
−
n i=1 (2πσ 2 )d/2
2 (πσ)d/2
2
e−
kyiT −Ui,· C k
2σ 2
2
+ λ tr(C T ΓC),
(5)
where kernel matrix Γ ∈ IRm×m is called the Gram matrix
2
with Γij = Γ(x̃i , x̃j ) = e−βkx̃i −x̃j k , U ∈ IRn×m with
2
Uij = Γ(xi , x̃j ) = e−βkxi −x̃j k , Ui,· denotes the i-th row
of the matrix U , C = (c1 , · · · , cm )T is the coefficient matrix of size m × d, and tr(·) denotes the trace.
n
2X
1
−
φ yi − f (xi )|0, σ 2 I .
L2 E(f, σ 2 ) = d
d/2
n i=1
2 (πσ)
(2)
We model the nonrigid transformation f by requiring it to
lie within a specific functional space, namely a reproducing
kernel Hilbert space (RKHS) [1, 24, 16]. Note that other parameterized transformation models, for example, thin-plate
splines (TPS) [23, 15], can also be easily incorporated into
2.2. Estimation of the Transformation
Estimating the transformation requires taking the derivative of the L2 E cost function, see equation (5), with respect
3
enables our method to be applied to large scale data.
Algorithm 1: Estimation of Transformation from Correspondences
Input: Correspondence set S = {(xi , yi )}ni=1 ,
parameters γ, β, λ
Output: Optimal transformation f
1 Construct Gram matrix Γ and matrix U ;
2
2 Initialize parameter σ and C;
3 Deterministic annealing:
4
Using the gradient (6), optimize the objective
function (5) by a numerical technique (e.g., the
quasi-Newton algorithm with C as the old value);
5
Update the parameter C ← arg minC L2 E(C, σ 2 );
6
Anneal σ 2 = γσ 2 ;
7 The transformation f is determined by equation (3).
2.3. Implementation Details
The performance of point matching algorithms depends,
typically, on the coordinate system in which points are expressed. We use data normalization to control for this. More
specifically, we perform a linear re-scaling of the correspondences so that the points in the two sets both have zero mean
and unit variance.
We define the transformation f as the initial position plus
a displacement function v: f (x) = x + v(x) [17], and solve
for v instead of f . This can be achieved simply by setting
the output yi to be yi − xi .
Parameter settings. There are three main parameters in
this algorithm: γ, β and λ. The parameter γ controls the annealing rate. The parameters β and λ control the influence
of the smoothness constraint on the transformation f . In
general, we found our method was very robust to parameter
changes. We set γ = 0.5, β = 0.8 and λ = 0.1 throughout this paper. Finally, the parameter σ 2 and C in line 2 of
algorithm 1 were initialized to 0.05 and 0 respectively.
to the coefficient matrix C, which is given by:
∂L2 E
2U T [V ◦ (W ⊗ 11×d )]
=
+ 2λΓC,
∂C
nσ 2 (2πσ 2 )d/2
(6)
where V = U C − Y and Y = (y1 , · · · , yn )T are matrices
of size n × d, W = exp{diag(V V T )/2σ 2 } is an n × 1
dimensional vector, diag(·) is the diagonal of a matrix, 11×d
is an 1 × d dimensional row vector of all ones, ◦ denotes the
Hadamard product, and ⊗ denotes the Kronecker product.
By using the derivative in equation (6), we can employ
efficient gradient-based numerical optimization techniques
such as the quasi-Newton method and the nonlinear conjugate gradient method to solve the optimization problem.
But the cost function (5) is convex only in the neighborhood
of the optimal solution. Hence to improve convergence we
use a coarse-to-fine strategy by applying deterministic annealing on the inlier noise parameter σ 2 . This starts with
a large initial value for σ 2 which is gradually reduced by
σ 2 7→ γσ 2 , where γ is the annealing rate. Our algorithm is
outlined in algorithm 1.
Computational complexity. By examining equations (5)
and (6), we see that the costs of updating the objective function and gradient are both O(dm2 + dmn). For the numerical optimization method, we choose the Matlab Optimization toolbox, which implicitly uses the BFGS QuasiNewton method with a mixed quadratic and cubic line
search procedure. Thus the total complexity is approximately O(dm2 + dm3 + dmn). In our implementation,
the number m of the control points required to construct the
transformation f in equation (3) is in general not large, and
so use m = 15 for all the results in this paper (increasing
m only gave small changes to the results). The dimension d
of the data in feature point matching for vision applications
is typically 2 or 3. Therefore, the complexity of our method
can be simply expressed as O(n), which is about linear in
the number of correspondences. This is important since it
3. Nonrigid Point Set Registration
Point set registration aims to align two point sets {xi }ni=1
(the model point set) and {yj }lj=1 (the target point set).
Typically, in the nonrigid case, it requires estimating a nonrigid transformation f which warps the model point set to
the target point set. We have shown above that once we
have established the correspondence between the two point
sets even with noise and outliers, we are able to estimate the
underlying transformation between them. Next, we discuss
how to find correspondences between two point sets.
3.1. Establishment of Point Correspondence
Recall that our method described above does not jointly
solve the transformation and point correspondence. In order
to use algorithm 1 to solve the transformation between two
point sets, we need initial correspondences.
In general, if the two point sets have similar shapes, the
corresponding points have similar neighborhood structures
which could be incorporated into a feature descriptor. Thus
finding correspondences between two point sets is equivalent to finding for each point in one point set (e.g., the
model) the point on the other point set (e.g., the target) that
has the most similar feature descriptor. Fortunately, the initial correspondences need not be very accurate, since our
method is robust to noise and outliers. Inspired by these
facts, we use shape context [3] as the feature descriptor in
the 2D case, using the Hungarian method for matching with
the χ2 test statistic as the cost measure. In the 3D case, the
spin image [10] can be used as a feature descriptor, where
the local similarity is measured by an improved correlation
4
without iteration, since we focus on determining the right
correspondences which does not need precise recovery of
the underlying transformation, and our approach then plays
a role of rejecting outliers.
Algorithm 2: Nonrigid Point Set Registration
1
2
3
4
5
6
7
8
Input: Two point sets {xi }ni=1 , {yj }lj=1
Output: Aligned model point set {x̂i }ni=1
Compute feature descriptors for the target point set
{yj }lj=1 ;
repeat
Compute feature descriptors for the model point
set {xi }ni=1 ;
Estimate the initial correspondences based on the
feature descriptors of two point sets;
Solve the transformation f warping the model
point set to the target point set using algorithm 1;
Update model point set {xi }ni=1 ← {f (xi )}ni=1 ;
until reach the maximum iteration number;
The aligned model point set {x̂i }ni=1 is given by
{f (xi )}ni=1 in the last iteration.
4. Experimental Results
In order to evaluate the performance of our algorithm,
we conducted two types of experiments: i) nonrigid point
set registration for 2D shapes; ii) sparse image feature correspondence on 2D images and 3D surfaces.
4.1. Results on Nonrigid Point Set Registration
We tested our method on the same synthesized data as in
[6] and [27]. The data consists of two different shape models: a fish and a Chinese character. For each model, there
are five sets of data designed to measure the robustness of
registration algorithms under deformation, occlusion, rotation, noise and outliers. In each test, one of the above distortions is applied to a model set to create a target set, and 100
samples are generated for each degradation level. We use
the shape context as the feature descriptor to establish initial
correspondences. It is easy to make shape context translation and scale invariant, and in some applications, rotation
invariance is also required. We use the rotation invariant
shape context as in [27].
Fig. 2 shows the registration results of our method on
solving different degrees of deformations and occlusions.
As shown in the figure, we see that for both datasets with
moderate degradation, our method is able to produce an
almost perfect alignment. Moreover, the matching performance degrades gradually and gracefully as the degree of
degradation in the data increases. Consider the results on
the occlusion test in the fifth column, it is interesting that
even when the occlusion ratio is 50 percent our method can
still achieve a satisfactory registration result. Therefore our
method can be used to provide a good initial alignment for
more complicated problem-specific registration algorithms.
To provide a quantitative comparison, we report the results of four state-of-the-art algorithms such as shape context [3], TPS-RPM [6], RPM-LNS [27], and CPD [17]
which are implemented using publicly available codes. The
registration error on a pair of shapes is quantified as the
average Euclidean distance between a point in the warped
model and the corresponding point in the target. Then the
registration performance of each algorithm is compared by
the mean and standard deviation of the registration error of
all the 100 samples in each distortion level. The statistical
results, error means, and standard deviations for each setting are summarized in the last column of Fig. 2. In the
deformation test results (e.g., 1st and 3rd rows), five algorithms achieve similar registration performance in both fish
and Chinese character at low deformation levels, and our
method generally gives better performance as the degree of
coefficient. Then the matching is performed by a method
which encourages geometrically consistent groups.
The two steps of estimating correspondences and transformations are iterated to obtain a reliable result. In this
paper, we use a fixed number of iterations, typically 10 but
more when the noise is big or when there are a large percentage of outliers contained in the original point sets. We
summarize our point set registration method in algorithm 2.
3.2. Application to Image Feature Correspondence
The image feature correspondence task aims to find visual correspondences between two sets of sparse feature
points {xi }ni=1 and {yj }lj=1 with corresponding feature descriptors extracted from two input images. In this paper,
we assume that the underlying relationship between the input images is nonrigid. Our method for this task is to estimate correspondences by matching feature descriptors using a smooth spatial mapping f . More specifically, we first
estimate the initial correspondences based on the feature descriptors, and then use the correspondences to learn a spatial
mapping f fitting the inliers by algorithm 1.
Once we have obtained the spatial mapping f , we then
have to establish accurate correspondences. We predefine a threshold τ and judge a correspondence (xi , yj ) to
be an inlier provided it satisfies the following condition:
2
2
e−kyj −f (xi )k /2σ > τ . We set τ = 0.5 in this paper.
Note that the feature descriptors in the point set registration problem are calculated based on the point sets
themselves, and are recalculated in each iteration. However, the descriptors of the feature points here are fixed
and calculated from images in advance. Hence the iterative
technique for recovering correspondences, estimating spatial mapping, and re-estimating correspondences can not be
used here. In practice, we find that our method works well
5
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
756
0.08
757
758
759
760
761
762
SC
TPS−RPM
RPM−LNS
CPD
Ours
0.06
0.04
Error
865
866
867
868
869
870
763
764
765
766
767
0.02
0
−0.02
−0.04
768
769
770
771
772
773
0.02
0.08
774
775
776
777
778
779
0.035
0.05
0.065
Degree of Deformation
SC
TPS−RPM
RPM−LNS
CPD
Ours
0.12
Error
864
0.04
0
780
781
782
783
784
785
−0.04
0.1
0.2
0.3
0.4
Occlusion Ratio
0.08
0.06
786
787
788
789
790
0.04
Error
#***
810
811
CVPR
812
#***
CVPR
CVPR 2013 Submission #***. CONFIDENTIAL
REVIEW COPY. DO NOT DISTRIBUTE.
813
#***
814
CVPR 2013 Submission #***. CONFIDENTIAL REVIEW COPY.
DO NOT DISTRIBUTE.
791
792
793
794
795
796
SC
TPS−RPM
RPM−LNS
CPD
Ours
0.02
0
−0.02
−0.04
797
798
799
800
801
802
0.16
0.12
Error
CVPR
803
804
805
806
807
808
0.08
0.04
0.02
0.035
0.05
0.065
Degree of Deformation
SC
TPS−RPM
RPM−LNS
CPD
Ours
815
816
817
818
819
820
821
822
823
824
0.08
825
826
827
828
829
830
831
917
2nd and
4th rows),
we observe
thatand
ourDescription
method shows
Surface
Feature
Detection
withmuch
Apmore robustness
with the other
four algorithms.
plications compared
to Mesh Matching.
In CVPR,
pages 373–
8
380,
2009. 4 on rotation, noise and outliers
More
experiments
are also
performed on the two shape models, as shown in Fig. 3.
[19] J. Zhao, J. Ma, J. Tian, J. Ma, and D. Zhang. A RoFrom
the results, we again see that our method is able to
bust Method for Vector Field Learning with Applicagenerate good alignment when the degradation is moderate,
tion to Mismatch Removing. In CVPR, pages 2977–
and the registration performance degrades gradually and
is still acceptable as the amount of degradation increases.
6
929
930
931
932
933
934
935
936
937
938
939
838
839
840
841
842
843
0.08
940
941
942
943
944
945
844
845
846
847
848
946
947
948
949
950
951
849
850
851
852
0.5
853
854
not surprising because we use the rotation invariant shape
context as the feature descriptor. We also performed experiments on 3D data and got similar results.
In conclusion, our method is efficient for most non-rigid
point set registration problems with moderate, and in some
cases severe, distortions. It can also be used to provide
a good initial alignment for more complicated problemspecific registration algorithms.
9
923
924
925
926
927
928
832
833
0.5
834
835
836
837
899
0
900
809
0.1
0.2
0.3
0.4
901
Occlusion Ratio
902
8 5. .
Figure
855
Figure 2.results
Pointon
setdataset
registration
resultscharacter
of our method
on thetop
fishto(top)
and
Chinese
character
(bottom) shapes
27], with deformation
and
ure 4. 903
Point matching
of Chinese
[4]. From
bottom,
on deformation,
rotation,[6,noise,
Figure
6.experiments
.
856 For each
lusion 904
and outliers
presented
in every
For each
experiment,
the upper
figure
is thepluses)
model onto
and target
pointpoint
sets, set
and(red circles).
occlusion
presented
in two
everyrows.
two rows.
Thegroup
goal of
is to
align the model
point
set (blue
the target
lower 905
figure is group
the registration
result. From
left tofigure
right,isincreasing
of degradation.
of experiments,
the upper
the modeldegree
and target
point sets, and the lower figure is the registration result. From857
left to right,
[17]
L.
Yang,
P.
Meer,
and
D.
J.
Foran.
Unsupervised
Seg2011. performance
2
858 context
increasing
degree
of
degradation.
The
rightmost
figures
are
comparisons
of the2984,
registration
of our method with shape
906
mentation
on Robust[27]
Estimation
(SC) [3],
TPS-RPMBased
[6], RPM-LNS
and CPD and
[17]Color
on theAccorresponding
error
indicate the
registration
error
means
[20] datasets.
Y. ZhengThe
and
D. bars
Doermann.
Robust
Point859
Matching
907
tive
Contour
Models.
IEEE
Trans.
on
Information
and
standard
deviations
over
100
trials.
860
for
Nonrigid
Shapes
by
Preserving
Local
Neighbor908
Technology in Biomedicine, 9(3):475–486, 2005. 3
861 5, 6
hood Structures. TPAMI, 28(4):643–649, 2006.
909
862
910
deformation
increases.
In the K.
occlusion
(e.g.,
Note that our method is not affected by rotation which is
[18] A. Zaharescu,
E. Boyer,
Varanasi,test
andresults
R. Horaud.
863
911
912
913
914
915
916
918
919
920
921
922
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
879
880
881
933
934
935
936
937
938
939
882
883
884
885
886
887
940
941
942
943
944
945
888
889
890
891
CVPR
CVPR
892
#***
#***
CVPR 2013 Submission #***. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE.
CVPR 893
2013 Submission #***. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE.
CVPR
#***
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
946
947
894
948
810
6.756
Results
on 3Dfeature
surfacescorrespondence
of deformable810objects
(INRIA
sequence).
Top:
results
on frames
527; bottom:
frames
Figure 4.Figure
Results
of image
on 2D
imageDance-1
pairs of
deformable
objects.
From 525
left and
to right,
increasing
degree of
757
811
811
895
949
758
812 right pair denotes the removed suspect outliers.
812
530
and
550.
For
each
group,
the
left
pair
denotes
the
identified
suspect
inliers,
and
the
deformation. The
correspondences areCVPR
79.61%, 56.57%,
51.84% and 45.71% respectively, and the corre-950
759 inlier percentages in the initial 813
813
896
#***
760
814
814
CVPR 2013 Submission #***.pairs
CONFIDENTIAL
COPY. DO NOT
DISTRIBUTE.
sponding precision-recall
are REVIEW
(100.00%,
99.73%),
(99.06%, 99.53%), (99.09%,
761
815 99.35%) and (100.00%, 98.96%) respectively. The
815
897
951
762
816
816
lines
indicate
matching
results
(blue
=
true
positive,
green
=
false
negative,
red
=
false
763
817 positive). Best viewed in color.
817
898
952
702
764
818
818
899
900
901
902
903
904
905
906
907
908
909
910
765
766
767
826
827
828
829
830
831
713
714
715
716
717
838
839
840
841
842
843
786
787
788
789
790
821
822
823
824
825
Inlier
721
722
723
724
725
844
672
726
791
845
673
727
792
846
793
847
674
728
794
848
675
729
795
849
676
730
Figure
4.
Point
matching
results
on
dataset
of
Chinese
character
[4].
From
top
to
bottom,
experiments
on
deformation,
rotation, noise,
Figure 4. Point matching results
on
dataset
of
Chinese
character
[4].
From
top
to
bottom,
experiments
on
deformation,
rotation,
noise,
796
Figure 3. Point matching results. From top to bottom, experimental results on rotation, noise, occlusion850
and outliers presented in every two
677
731 point sets, and
occlusion
and outliers
every
twopoint
rows.sets,
For and
each group 851
of experiment, the upper figure is the model and target
occlusion and outliers presented in every two rows. For each group of
experiment,
the upper
figure ispresented
the modelinand
target
797
rows. For each group of experiment,
the upper figure
is the
model
and target point
sets, and
the lower figure
is the registration result. From
678registration result. From left to right, increasing
732
thedegradation.
lower figure is the registration result. From left to right, increasing
the lower figure is the
degree of
798
852degree of degradation.
left to right, increasing degree of degradation.
679
733
799
853
680
734
800
854
801
855
681
735
802
856
682
736
911
912
913
914
915
Figure 3. From top to bottom, results on rotation, noise and outliers
916
presented in every two rows. For each group of experiments, the
CVPR
#***
689
690
691
692
693
694
695
696
697
698
699
700
701
839
840
841
842
843
844
845
846
847
848
79.61
56.57
51.84
45.71
963
964
965
deforma-966
the initial967
case, our968
(100.00%, 99.73%). On the rightmost pair, the
tion is relatively large and the inlier percentage in
correspondences is only about 45.71%. In this
method still obtains a good precision-recall pair (100.00%,969
98.96%). Note that there are still a few false positives and970
917
971
upper figure is the data, and the lower figure is the registration
false negatives in the results since we could not precisely
result. From left to right, increasing degree of degradation.
estimate the true warp functions between the image pairs
9 in this framework. The average run time of our method on
these image pairs is about 0.5 seconds on an Intel Pentium
4.2. Results on
Image Feature Correspondence
8
8
2.0 GHz PC with Matlab code.
In this section, we perform experiments on real images,
In addition, we also compared our method to two stateand test the performance of our method for sparse image
of-the-art
methods, such as identifying point corresponFigure 4. Point matching results on dataset of Chinese character [4]. From top to bottom, experiments on deformation, rotation, noise,
occlusion
and outliers
presented in every two rows. For each
group of experiment,
the uppercontain
figure is the model
and target point sets, and
feature
correspondence.
These
images
deformable
dences
by
correspondence function (ICF) [12] and vector
the lower figure is the registration result. From left to right, increasing degree of degradation.
objects and consequently the underlying relationships befield consensus (VFC) [26]. The ICF uses support vector
tween the images are nonrigid.
regression to learn a correspondence function pair which
maps points in one image to their corresponding points in
Fig. 4 contains a newspaper
with different amounts of
7
another, and then reject outliers by the estimated corresponspatial warps. We aim to establish correspondences bedence functions. While the VFC converts the outlier retween sparse image features in each image pair. In our evaljection problem into a robust vector field learning problem,
uation, we first extract SIFT [13] feature points in each inand learns a smooth field to fit the potential inliers as well
put image, and estimate the initial correspondences based
as estimates a consensus inlier set. The results are shown in
on the corresponding SIFT descriptors. Our goal is then to
Table 1. We see that all the three algorithms work well when
reject the outliers contained in the initial correspondences
the deformation contained in the image pair is relatively
and, at the same time, to keep as many inliers as possible.
slight. As the amount of deformation increases, the perforPerformance is characterized by precision and recall.
mance of ICF degenerates rapidly. But VFC and our method
The results of our method are presented in Fig. 4. For the
seem to be relatively unaffected even when the number of
leftmost pair, the deformation of the newspaper is relatively
outliers exceeds the number of inliers. Still, our method
slight. There are 466 initial correspondences with 95 outgains slightly better results compared to VFC.
liers, and the inlier percentage is about 79.61%. After using
Our next experiment involves feature point matching on
our method to establish accurate correspondences, 370 out
3D surfaces. We adopt MeshDOG and MeshHOG [25] as
of the 371 inliers are preserved, and simultaneously all the
the feature point detector and descriptor to determine the
95 outliers are rejected. The precision-recall pair is about
CVPR
683
684
685
686
687
688
826
827
828
829
830
831
957
958
832
ICF [12] (96.05,
98.38) (83.95, 86.73) (80.43, 95.48) (75.42, 92.71) 959
833
834
VFC [26] (100.00,
97.04) (98.59, 99.53) (98.09, 99.35) (98.94, 97.91) 960
835
718 Ours
836
(100.00,
99.73) (99.06, 99.53) (98.09, 99.35) (100.00, 98.96)961
719
837
962
838
720
711
712
832
833
834
835
836
837
780
781
782
783
784
785
953
819
820
Table 1. Performance comparison on the image pairs in Fig. 4. The954
707
values in the first row are the inlier percentages (%), and the pairs955
708
709
956
are the precision-recall pairs (%).
710
821
822
823
824
825
774
775
776
777
778
779
#***
703
704
705
706
819
820
768
769
770
771
772
773
CVPR
#***
756
757
758
759
760
761
762
757
758
759
760
761
762
763
764
765
766
767
763
764
765
766
767
768
769
770
771
772
773
768
769
770
771
772
773
774
775
776
777
778
779
774
775
776
777
778
779
780
781
782
783
784
785
780
781
782
783
784
785
786
787
788
789
790
786
787
788
789
790
809
791
792
793
794
795
796
797
798
799
800
801
802
CVPR
#***
CVPR 2013 Submission #***. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE.
CVPR 2013 Submission #***. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE.
803
804
805
806
807
808
756
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
803
804
805
806
807
808
809
809
8
#***
810
811
812
813
814
737
738
739
740
815
816
817
818
819
820
821
822
823
824
825
861
862
863
741
742
743
744
745
746
826
827
828
829
830
831
832
833
834
835
836
837
832
833
834
835
836
837
838
839
840
841
842
843
838
839
840
841
842
843
844
791
845
792
846
793
847
794
848
795
849
4. Point experiments
matching results
on dataset ofrotation,
Chinesenoise,
character [4].850
From top to bottom, experiments on deformation, rotation, noise,
Figure 4. Point matching results on dataset of Chinese character [4].
top to bottom,
on deformation,
796 From Figure
occlusion
and outliers
presented
in every
two rows.
For each
experiment, the upper figure is the model and target point sets, and
occlusion and outliers presented in every two rows. For each group 797
of experiment,
the upper
figure is
the model
and target
point sets,
and group of851
lower figure is the registration result. From left to right, increasing852
degree of degradation.
the lower figure is the registration result. From left to right, increasing
of degradation.
798degreethe
799
853
800
854
801
855
802
856
803
804
805
806
807
808
CVPR
857
858
859
860
747
748
749
750
751
752
844
845
846
847
848
849
850
851
852
853
854
857
858
859
860
855
856
857
858
859
860
861
862
863
861
862
863
753
754
755
8
7
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
CVPR
CVPR
#***
#***
CVPR 2013 Submission #***. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE.
108
162
163
164
165
166
109
110
111
112
113
114
167
168
169
170
171
172
115
116
117
118
119
References
173
174
175
176
177
120
121
122
123
124
125
[1] N. Aronszajn. Theory of Reproducing Kernels. Trans. of the American Mathematical Society, 68(3):337–404, 1950.
[2] A. Basu, I. R. Harris, N. L. Hjort, and M. C. Jones. Robust and
Efficient Estimation by Minimising A Density Power Divergence.
Biometrika, 85(3):549–559, 1998.
[3] S. Belongie, J. Malik, and J. Puzicha. Shape Matching and Object
Recognition Using Shape Contexts. TPAMI, 24(24):509–522, 2002.
[4] P. J. Besl and N. D. McKay. A Method for Registration of 3-D
Shapes. TPAMI, 14(2):239–256, 1992.
[5] L. G. Brown. A Survey of Image Registration Techniques. ACM
Computing Surveys, 24(4):325–376, 1992.
[6] H. Chui and A. Rangarajan. A new point matching algorithm for
non-rigid registration. CVIU, 89:114–141, 2003.
[7] A. W. Fitzgibbon. Robust Registration of 2D and 3D Point Sets.
Image and Vision Computing, 21:1145–1153, 2003.
[8] P. J. Huber. Robust Statistics. John Wiley & Sons, New York, 1981.
[9] B. Jian and B. C. Vemuri. Robust Point Set Registration Using Gaussian Mixture Models. TPAMI, 33(8):1633–1645, 2011.
[10] A. E. Johnson and M. Hebert. Using Spin-Images for Efficient Object
Recognition in Cluttered 3-D Scenes. TPAMI, 21(5):433–449, 1999.
[11] J.-H. Lee and C.-H. Won. Topology Preserving Relaxation Labeling
for Nonrigid Point Matching. TPAMI, 33(2):427–432, 2011.
[12] X. Li and Z. Hu. Rejecting Mismatches by Correspondence Function. IJCV, 89(1):1–17, 2010.
[13] D. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV, 60(2):91–110, 2004.
[14] B. Luo and E. R. Hancock. Structural Graph Matching Using
the EM Algorithm and Singular Value Decomposition. TPAMI,
23(10):1120–1136, 2001.
[15] J. Ma, J. Zhao, Y. Zhou, and J. Tian. Mismatch Removal via Coherent
Spatial Mapping. In ICIP, 2012.
[16] C. A. Micchelli and M. Pontil. On Learning Vector-Valued Functions. Neural Computation, 17(1):177–204, 2005.
[17] A. Myronenko and X. Song. Point Set Registration: Coherent Point
Drift. TPAMI, 32(12):2262–2275, 2010.
[18] R. Rifkin, G. Yeo, and T. Poggio. Regularized Least-Squares Classification. In Advances in Learning Theory: Methods, Model and
Applications, 2003.
[19] S. Rusinkiewicz and M. Levoy. Efficient Variants of the ICP Algorithm. In 3DIM, pages 145–152, 2001.
[20] D. W. Scott. Parametric Statistical Modeling by Minimum Integrated
Square Error. Technometrics, 43(3):274–285, 2001.
[21] L. Torresani, V. Kolmogorov, and C. Rother. Feature Correspondence
via Graph Matching: Models and Global Optimization. In ECCV,
pages 596–609, 2008.
[22] Y. Tsin and T. Kanade. A Correlation-Based Approach to Robust
Point Set Registration. In ECCV, pages 558–569, 2004.
[23] G. Wahba. Spline Models for Observational Data. SIAM, Philadelphia, PA, 1990.
[24] A. L. Yuille and N. M. Grzywacz. A Mathematical Analysis of the
Motion Coherence Theory. IJCV, 3(2):155–175, 1989.
[25] A. Zaharescu, E. Boyer, K. Varanasi, and R. Horaud. Surface Feature
Detection and Description with Applications to Mesh Matching. In
CVPR, pages 373–380, 2009.
[26] J. Zhao, J. Ma, J. Tian, J. Ma, and D. Zhang. A Robust Method for
Vector Field Learning with Application to Mismatch Removing. In
CVPR, pages 2977–2984, 2011.
[27] Y. Zheng and D. Doermann. Robust Point Matching for Nonrigid Shapes by Preserving Local Neighborhood Structures. TPAMI,
28(4):643–649, 2006.
178
179
180
181
182
183
126
127
128
129
130
131
184
185
186
187
188
189
132
133
134
135
136
137
138
139
140
141
142
China Scholarship Council (No. 201206160008). Z. Tu
is supported by NSF CAREER award IIS-0844566, NSF
award IIS-1216528, and NIH R01 MH094343.
Figure 1. Results on 3D surfaces of deformable objects (INRIA Dance-1 sequence, frames 525 and 527). There are 191 initial correspondences, and 156 of which are preserved by our method. The left pair denotes the identified suspect inliers, and the right pair denotes the
190
191
192
193
194
195
196
197
198
199
200
outliers.
Figureremoved
5.suspect
Results
on 3D surfaces of deformable objects (INRIA
2.
Estimating
Spatial Mapping
Corre- on
the largest
portion of the
sampleand
set (which
corresponds
to
Dance-1
sequence).
Top:fromresults
frames
525
527;
bottom:
the underlying inlier set) that “matches” the normal denspondences by L E
sity model, and consequently estimate the spatial mapping
framesGiven
530
and
550.
For
each
group,
the
left
pair
denotes
the
f based on the estimated inlier set. Next, we introduce a roa set of putative point correspondences S =
bust estimator named L -minimizing estimate (L E), and
{(x , y )}
which may be perturbed by noise and outliers,
identified
inliers,
right
pair
denotes
use
it to estimate
the spatial
mapping f . the removed
the goal is suspect
to learn a spatial mapping
f : y =and
f (x ) to the
fit
the inliers.
2.1. Problem Formulation Using L E
suspectWithout
outliers.
loss of generality, we make the assumption that
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
2
i
i
n
i=1
2
i
2
i
2
the noise on the inliers is Gaussian on each component with
zero mean and uniform standard deviation σ, i.e., for an inlier point correspondence (xi , yi ) it satisfies yi − f (xi ) ∼
N (0, σ 2 I), where I is an identity matrix of size d×d, and d
is the dimension of the point. The data set {yi − f (xi )}ni=1
can then be seen as a sample set from a multivariate normal density N (0, σ 2 I) with some unknown contamination
outliers. The main idea of our technique is then, to find
Parametric estimation algorithms typically rely on maximum likelihood estimation (M LE). It is known that the
maximum likelihood is well suited for the estimation problem in which the model is a good descriptor for the data and
all the data are indeed coming from the model. However,
the M LE estimates can be badly biased if the model is not
good enough or there are a small fraction of outliers. To
estimate the spatial mapping f , a robust estimator is desir-
201
202
203
204
205
206
207
208
209
210
211
212
initial correspondences. For the dataset, we use the INRIA
Dance-1 sequence [25], in which each surface is from the
2
same moving person. Note that
it is hard to give a quantitative performance comparison since the correctness of a
correspondence is hard to decide. So we just schematically
show our results in Fig. 5. In the upper group, the two
frames are nearby, and the level of deformation is relatively
slight. There are 191 initial correspondences, of which 156
are preserved after using our method to establish accurate
correspondences. In the lower group, the two frames are
far apart, and the level of deformation is relatively large,
leading to less good initial correspondences. There are 23
initial correspondences, and 18 of which are preserved by
our method.
161
213
214
215
5. Conclusion
In this paper, we have presented a new approach for
nonrigid point set registration. A key characteristic of our
approach is the estimation of transformation from correspondences based on a robust estimator named L2 E. The
computational complexity of estimation of transformation
is linear in the scale of correspondences. We applied our
method to sparse image feature correspondence, where the
underlying relationship between images is nonrigid. Experiments on a public dataset for nonrigid point registration,
2D and 3D real images for sparse image feature correspondence demonstrate that our approach yields results superior
to those of state-of-the-art methods when there is significant
noise and/or outliers in the data.
Acknowledgment
The authors gratefully acknowledge the financial support
from the National Science Foundation (No. 0917141) and
8