Nonrobust and Robust Objective Functions
The objective function of the estimators in the input space is built from the sum of
squared Mahalanobis distances (residuals)
$$ d_i^2 = \frac{1}{\sigma^2}\,(\mathbf{y}_i - \mathbf{y}_{io})^\top C_{\mathbf{y}_i}^{+}\,(\mathbf{y}_i - \mathbf{y}_{io}), \qquad i = 1, \ldots, n. $$
In the minimization (or maximization) all the variables are taken into account. If the data also contain outliers, the scale $\sigma$ and the covariance $C_y$ refer only to the inlier structure. Generally the covariance does not depend on the individual points in the input space, $C_{y_i} = C_y$. For many objective functions $\sigma$ is not required to be given before the estimation process.
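As a small numerical illustration (added here, not part of the original notes), the squared Mahalanobis residual above can be evaluated with the Moore-Penrose pseudoinverse of the covariance; the function name and data below are hypothetical.

```python
import numpy as np

def squared_mahalanobis(y, y_o, C_y, sigma=1.0):
    """Squared Mahalanobis residual d^2 = (y - y_o)^T C_y^+ (y - y_o) / sigma^2."""
    r = np.asarray(y, dtype=float) - np.asarray(y_o, dtype=float)
    C_plus = np.linalg.pinv(C_y)        # pseudoinverse, valid for singular C_y too
    return float(r @ C_plus @ r) / sigma**2

# Hypothetical 2D measurement and its projection onto the estimated model
y_i  = np.array([1.2, 0.8])
y_io = np.array([1.0, 1.0])
C_y  = np.diag([0.04, 0.01])
print(squared_mahalanobis(y_i, y_io, C_y))
```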
Objective functions in which σ is not given.
• Total Least Squares (TLS). Nonrobust.
For $C_y = I_q$
$$ \frac{1}{n}\sum_{i=1}^{n} \|\mathbf{y}_i - \mathbf{y}_{io}\|^2 $$
is a linear errors-in-variables (EIV) regression model. More complex $C_y$ are handled in the same way, either with generalized eigenvalues or singular values. (A small sketch of a TLS fit via SVD is given after this item.)
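As an illustrative sketch (assuming $C_y = I_q$ and a hyperplane model; the function name and data are hypothetical, not from the original notes), the TLS solution can be read off the singular value decomposition of the centered data:

```python
import numpy as np

def tls_hyperplane(Y):
    """Fit theta^T y = alpha in the TLS sense for rows of Y (n x q), C_y = I_q.

    The right singular vector with the smallest singular value minimizes the
    sum of squared orthogonal distances to the hyperplane.
    """
    y_bar = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - y_bar, full_matrices=False)
    theta = Vt[-1]                 # direction of least variance (unit normal)
    alpha = theta @ y_bar
    return theta, alpha

# Example: noisy 2D points near the line y2 = 0.5*y1 + 1
rng = np.random.default_rng(0)
t = rng.uniform(0, 10, 100)
Y = np.column_stack([t, 0.5 * t + 1]) + 0.05 * rng.standard_normal((100, 2))
theta, alpha = tls_hyperplane(Y)
print(theta, alpha)
```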
• Least Absolute Value (LAD). Nonrobust except for q = 1.
For $C_y = I_q$
$$ \frac{1}{n}\sum_{i=1}^{n} |y_i - y_{io}| \,. $$
If y is a scalar, the function $\frac{1}{n}\sum_{i=1}^{n}|y_i - \alpha|$ gives the location estimator $\hat{\alpha} = \mathrm{med}_i\, y_i$, which is robust with up to half of the data being outliers (a small numerical check is given after this item). If y is a vector, the objective function is no longer robust.
In 2D, let $\bar{x}$ be the average of all $x_i$ in the data set $(x_i, y_i)$, $i = 1, \ldots, n$. Suppose that $x_1$ is an outlier so far away that all the remaining $x_i$ lie on the other side of $\bar{x}$. The regression line then goes through $(x_1, y_1)$.
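A tiny numerical check of the scalar case (added for illustration, with hypothetical data): the median, which minimizes the sum of absolute deviations, is unaffected by a few gross outliers, while the mean is pulled far away.

```python
import numpy as np

inliers = np.array([0.9, 1.0, 1.1, 1.05, 0.95])
data = np.concatenate([inliers, [50.0, 80.0]])   # two gross outliers

# Minimizer of (1/n) sum |y_i - alpha| is the median; of sum (y_i - alpha)^2 the mean
print(np.median(data))   # about 1.05, close to the inlier location
print(np.mean(data))     # about 19.3, pulled away by the outliers
```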
Some objective functions transform the input measurements y into the linear space of the carriers x, and the residuals are computed in this space. If the covariance matrix is used too, $C_i = J_{x_i|y_i}\, C_y\, J_{x_i|y_i}^\top$, it can depend on the measurement point.
We are interested only in estimation based on elemental subsets. A randomly chosen elemental subset contains the minimum number of input points required to estimate all the parameters which appear in the space of the carriers.
• Least k-th Order Statistics (LkOS). Robust up to a limit.
For inliers $C_y = I_q$. For an elemental subset the n residuals are sorted in ascending order $d_{1:n} \le d_{2:n} \le \cdots \le d_{n:n}$, and the estimate is $d_{k:n}$. If $k = n/2$, the least median of squares (LMedS) estimate is obtained as $\mathrm{med}_i\, d_i^2$.
The absolute values of the distances can be used instead. The estimation uses N different elemental subset trials, where N is a value from a few hundred to a few thousand. Analytical computation of N is not feasible because it does not take into account how much influence the inlier noise has. Computing with $\sigma_{\text{inlier}} \sim 0$ is not correct.
The best estimate of LMedS is the minimum value over the N trials
$$ \min_{N}\ \mathrm{med}_i\ d_i^2 \,. $$
LMedS tolerates up to half the data being outliers, but has serious limitations. LMedS becomes a nonrobust estimator if two lines with not very large inlier noise are present: the LMedS criterion measured across the two lines is always smaller than the one measured along the lower line alone. See the 2D example on page 3. Also, in certain situations a single outlier point can be pivotal in the estimation result. See the 2D example on page 4. (A small LMedS sketch with elemental subsets is given after the examples.)
The LMedS estimator was in fashion in computer vision for a few years at the beginning of the 1990s. Once researchers realized these limitations, and that LMedS usually cannot recover more than one inlier structure, it disappeared from computer vision.
[2D example, page 3: two lines, with 60% of the data in the lower part. LMedS always fails as the noise increases, if the inlier structure separation is less than six standard deviations [Stewart, 1997].]
[2D example, page 4: n = 12 points with a single outlier on the right. The LMedS result changes if a single inlier point is moved a little.]
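A minimal LMedS sketch for a 2D line (added for illustration; the function name, trial count, and data are hypothetical assumptions, not from the notes):

```python
import numpy as np

def lmeds_line(points, n_trials=500, rng=None):
    """LMedS 2D line fit: minimize the median of squared orthogonal residuals
    over elemental subsets (2 points define a line)."""
    rng = np.random.default_rng() if rng is None else rng
    pts = np.asarray(points, dtype=float)
    best = None
    for _ in range(n_trials):
        i, j = rng.choice(len(pts), size=2, replace=False)
        d = pts[j] - pts[i]
        if np.allclose(d, 0):
            continue
        normal = np.array([-d[1], d[0]]) / np.linalg.norm(d)   # unit normal
        residuals = (pts - pts[i]) @ normal                     # signed distances
        crit = np.median(residuals**2)                          # med_i d_i^2
        if best is None or crit < best[0]:
            best = (crit, normal, normal @ pts[i])
    return best   # (criterion, unit normal theta, intercept alpha)

# Example: inliers on y = 2x + 1 plus 30% uniform outliers
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 70)
inliers = np.column_stack([x, 2 * x + 1 + 0.1 * rng.standard_normal(70)])
outliers = rng.uniform(0, 20, (30, 2))
print(lmeds_line(np.vstack([inliers, outliers]), rng=rng))
```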
• Generalized Projection based M-estimator (gpbM). Robust.
Several inlier structures can be estimated, including unknown matrices such as lines in 3D [3 × 2] or projective motion factorization from F images [3F × (3F − 3)], one estimate per iteration. The estimation is done in the linear space of the carriers with N elemental subsets. For each input point the carrier with the largest Mahalanobis distance, $\tilde{x}_i$, is chosen. This is the worst-case scenario.
The multidimensional projections of $\tilde{x}_i$ through the matrix $\Theta$ are Mahalanobis distances, where $J_{x_i|y_i}$ is given and the diagonal covariance matrix $\hat{S}$ for an inlier structure has already been found. From all the trials, the $\Theta$ which projects into the closest and highest mode is chosen.
In each iteration three steps are performed: inlier scale estimation, mean shift based recovery of the structure, and inlier/outlier dichotomy. The inlier scale estimation starts with all the remaining data points and therefore depends on all the not yet detected structures. The inlier/outlier dichotomy, which also stops the algorithm, is decided based on a given scalar value. These can become potential problems if there are a lot of outliers.
The gpbM will be succinctly presented in the lecture on estimation of inlier
structures without constraints provided by the user.
• Multiple inlier structures. Robust.
This estimator does not have the above mentioned problems. The same three steps are performed, but their implementations are completely different. For the scale estimation just a single structure, independent of the rest of the existing data, is taken into account. There are no differences in how a structure of inliers or of outliers is recovered; all the input data are processed. The structures are classified based on average densities, and the strongest inlier structures come out at the top. The method does not have any threshold; the user retains just the first structures, which are the inliers.
The multiple inlier structures estimator is applied for vectors θ only, which is valid since only one σ exists. Again the worst-case scenario is applied. The scale estimation starts from the minimum sum of Mahalanobis distances over the first n points. The estimator will be presented with all its steps in the lecture cited above.
Objective functions in which σ has to be given.
• M-estimator. Not very robust.
In computer vision it was mainly used with input data whose inliers have $C_y = I_q$:
$$ \frac{1}{n}\sum_{i=1}^{n} \rho\!\left(\frac{d_i}{\sigma}\right) $$
where $\rho(u)$ is a nonnegative, even-symmetric loss function, nondecreasing with $|u|$. Under this form, if there are random outliers, the robustness is less than
$$ \frac{1}{\text{number of unknowns} + 1}\,. $$
Since σ has to be given, errors may also come from a wrong guess.
We are interested only in the family of redescending loss functions. In this case
$$ 0 \le \rho(u) \le 1 \quad \text{for } |u| \le 1, \qquad \rho(u) = 1 \quad \text{for } |u| > 1\,. $$
The user gives the inlier noise parameter σ, which results in
$$ u = \frac{\text{residual}}{\sigma}\,. $$
If $|u| > 1$, the residual corresponds to an outlier and in the given iteration the point does not contribute to the estimate. The M-estimator may not have a unique solution.
The loss function
$$ \rho(u) = \begin{cases} 1 - (1 - u^2)^d & |u| \le 1 \\ 1 & |u| > 1 \end{cases} $$
with $d = 0, 1, 2, 3$ was mostly used when the M-estimator was in fashion in computer vision.
$d = 0$ is called the zero-one estimator. All the input points are used in every iteration.
$d = 1$ is called the skipped mean since $\rho(u) = u^2$ for $|u| \le 1$.
$d = 2$ is related to the Epanechnikov kernel because the weights $w(u) = \frac{1}{u}\frac{d\rho(u)}{du}$ are proportional to $(1 - u^2)$. It will be discussed in the lecture about mean shift estimation.
$d = 3$ is Tukey's biweight function, used in the statistical literature.
The iterative computation of the M-estimator is done with iterative weighted Total Least Squares. We will use different slides. (A small sketch of the loss family and its weights is given below.)
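As an added illustration of the loss family above and of the weights that drive one reweighting step (the σ value and residuals below are hypothetical assumptions):

```python
import numpy as np

def rho(u, d):
    """Redescending loss rho(u) = 1 - (1 - u^2)^d for |u| <= 1, and 1 otherwise."""
    u = np.asarray(u, dtype=float)
    inside = np.abs(u) <= 1
    out = np.ones_like(u)
    out[inside] = 1.0 - (1.0 - u[inside]**2)**d
    return out

def weight(u, d):
    """w(u) = (1/u) d(rho)/du = 2*d*(1 - u^2)^(d-1) for |u| <= 1, zero outside."""
    u = np.asarray(u, dtype=float)
    w = np.zeros_like(u)
    inside = np.abs(u) <= 1
    w[inside] = 2.0 * d * (1.0 - u[inside]**2)**(d - 1)
    return w

# One reweighting step: points with |residual| > sigma get zero weight
sigma = 0.5                       # user-supplied inlier scale (assumed)
residuals = np.array([0.1, -0.3, 0.45, 2.0, -5.0])
u = residuals / sigma
print(weight(u, d=2))             # Epanechnikov-type weights, outliers -> 0
```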
• RANdom SAmple Consensus (RANSAC). Robust up to a limit.
The estimation is processed in the linear (carrier) space with the inlier noise $C_{x_i|y_i} = I_m$. The RANSAC estimator was proposed in 1980 and is based on elemental subsets. It is the most widely used robust estimator.
The user must give the scale σ before the estimation. The estimate, from N trials, is the one with the largest number k of residuals satisfying
$$ \max_k\ d_{k:n} \qquad \text{subject to} \qquad \|d_{k:n}\| < \sigma\,. $$
RANSAC has several problems. The given σ may not fit all the inlier structures. The inlier noise may vary in a sequence of images. All these difficulties can result in errors. Many improvements have been published, but none of them completely eliminates the problems with RANSAC. (A minimal sketch of the idea is given below.)
RANSAC will be detailed in different slides.
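A minimal sketch of the RANSAC idea for a 2D line (added for illustration, not the original algorithm verbatim; the function name, trial count, and data are hypothetical assumptions):

```python
import numpy as np

def ransac_line(points, sigma, n_trials=500, rng=None):
    """RANSAC 2D line fit: keep the elemental-subset hypothesis with the
    largest number of residuals below the user-given scale sigma."""
    rng = np.random.default_rng() if rng is None else rng
    pts = np.asarray(points, dtype=float)
    best_count, best_model = -1, None
    for _ in range(n_trials):
        i, j = rng.choice(len(pts), size=2, replace=False)
        d = pts[j] - pts[i]
        if np.allclose(d, 0):
            continue
        normal = np.array([-d[1], d[0]]) / np.linalg.norm(d)   # unit normal
        residuals = np.abs((pts - pts[i]) @ normal)            # orthogonal distances
        count = int(np.sum(residuals < sigma))                 # consensus set size
        if count > best_count:
            best_count, best_model = count, (normal, normal @ pts[i])
    return best_model, best_count

# Example: inliers on y = 2x + 1 plus uniform outliers; sigma given by the user
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 70)
inliers = np.column_stack([x, 2 * x + 1 + 0.1 * rng.standard_normal(70)])
outliers = rng.uniform(0, 20, (30, 2))
model, count = ransac_line(np.vstack([inliers, outliers]), sigma=0.3, rng=rng)
print(model, count)
```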