A Gauss-Newton iteration for solving TLS problems

Antonio Fazzi (Gran Sasso Science Institute)
Dario Fasino (University of Udine)

Como, 17/02/2017
The Total Least Squares (TLS) problem
Given A ∈ R^{m×n} with m > n and b ∈ R^m, the Total Least Squares (TLS) problem is defined as

    min_{E,f} ||(E | f)||_F^2   subject to   b + f ∈ Im(A + E),

where E ∈ R^{m×n} and f ∈ R^m. After we find such a matrix (Ē | f̄) whose Frobenius norm is minimum, each x ∈ R^n satisfying

    (A + Ē)x = b + f̄

is a solution of the TLS problem.
Solution, existence and uniqueness
Define the matrix C = (A | b) and consider the SVD C = UΣV^T.

In the following we assume that the problem has a unique solution; this happens if σ'_n > σ_{n+1}, where σ'_n and σ_{n+1} are the smallest singular values of A and C, respectively.

The solution of the TLS problem is the vector x_TLS such that

    v_{n+1} = −ζ (x_TLS^T, −1)^T,

where ζ is a normalization constant.
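A minimal numerical sketch of this classical SVD recipe, assuming NumPy (the helper name tls_svd is ours):

import numpy as np

def tls_svd(A, b):
    # Classical TLS solution: x_TLS from the last right singular vector of C = (A | b).
    C = np.column_stack([A, b])
    _, _, Vt = np.linalg.svd(C)
    v = Vt[-1]                    # right singular vector associated with sigma_{n+1}
    return -v[:-1] / v[-1]        # x_TLS = -v(1:n)/v(n+1); needs v(n+1) != 0 (uniqueness)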
The function η
It is known that x_TLS can be characterized as the global minimum point of the function

    η(x) = ||Ax − b||_2^2 / (1 + ||x||_2^2).

The function η(x) measures the backward error of the vector x as an approximate solution of the linear system Ax = b:

Lemma
For each vector x there exists a rank-one matrix (Ē | f̄) such that

    (A + Ē)x = b + f̄,    ||(Ē | f̄)||_F^2 = ||(Ē | f̄)||_2^2 = η(x).

Moreover, for each matrix (E | f) such that (A + E)x = b + f it holds that

    ||(E | f)||_F^2 ≥ ||(E | f)||_2^2 ≥ η(x).
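A small numerical check of the lemma, assuming NumPy. The explicit rank-one matrix below, (Ē | f̄) = −r (x^T, −1)/(1 + x^T x) with r = Ax − b, is a standard construction not spelled out on the slide:

import numpy as np

def eta(A, b, x):
    r = A @ x - b
    return (r @ r) / (1.0 + x @ x)

rng = np.random.default_rng(0)
A, b, x = rng.normal(size=(8, 3)), rng.normal(size=8), rng.normal(size=3)
r = A @ x - b
Ef = -np.outer(r, np.append(x, -1.0)) / (1.0 + x @ x)   # rank-one (E | f)
E, f = Ef[:, :-1], Ef[:, -1]
assert np.allclose((A + E) @ x, b + f)                  # (A + E)x = b + f
assert np.isclose(np.sum(Ef**2), eta(A, b, x))          # ||(E | f)||_F^2 = eta(x)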
The Gauss-Newton algorithm
The Gauss-Newton algorithm is a cheap optimization method that can solve nonlinear least squares problems

    min_{x ∈ R^n} ||f(x)||_2^2,    f : R^n → R^m,    m ≥ n.

The basic idea is to linearize f(x) in a neighborhood of x; the step x → x + h is computed by replacing ||f(x + h)||_2^2 with ||f(x) + J(x)h||_2^2 and solving the corresponding ordinary LS problem.
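In code, one Gauss-Newton step is a single ordinary least squares solve; a sketch assuming NumPy and user-supplied callables f and J:

import numpy as np

def gauss_newton_step(f, J, x):
    # Minimize ||f(x) + J(x) h||_2 over h, then take the step.
    h, *_ = np.linalg.lstsq(J(x), -f(x), rcond=None)
    return x + h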
The Gauss-Newton method applied to the function η
We set

    η(x) = ||f(x)||_2^2,   where   f(x) = (Ax − b) / √(1 + x^T x).

Hence, min_x ||f(x)||_2^2 is attained at x_TLS. The Jacobian of f is

    J(x) = (1 + x^T x)^{−1/2} A − (1 + x^T x)^{−3/2} (Ax − b) x^T.
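These two formulas transcribe directly; a sketch assuming NumPy (these helpers are reused by the later sketches):

import numpy as np

def f(A, b, x):
    # f(x) = (Ax - b) / sqrt(1 + x^T x)
    return (A @ x - b) / np.sqrt(1.0 + x @ x)

def J(A, b, x):
    # J(x) = A/(1+x^T x)^{1/2} - (Ax - b) x^T / (1+x^T x)^{3/2}
    n2 = 1.0 + x @ x
    return A / np.sqrt(n2) - np.outer(A @ x - b, x) / n2**1.5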
Outline of the algorithm
Basic GN-TLS method
Input: A, b (problem data); ε, maxit (stopping criteria)
Output: x̂ (approximation of x_TLS)

Set k := 0
Compute x_0 := arg min_x ||Ax − b||_2
Compute f_0 := f(x_0) and J_0 := J(x_0)
while ||J_k^T f_k||_2 ≥ ε and k < maxit
    Compute h_k := arg min_h ||J_k h + f_k||_2
    Set x_{k+1} := x_k + h_k
    Set k := k + 1, f_k := f(x_k), J_k := J(x_k)
end
x̂ := x_k
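A direct transcription of this outline, assuming NumPy and the f, J helpers defined above (the name gn_tls is ours):

import numpy as np

def gn_tls(A, b, eps=1e-12, maxit=100):
    x, *_ = np.linalg.lstsq(A, b, rcond=None)          # x0: ordinary LS solution
    for _ in range(maxit):
        fk, Jk = f(A, b, x), J(A, b, x)
        if np.linalg.norm(Jk.T @ fk) < eps:            # stop when ||Jk^T fk|| < eps
            break
        h, *_ = np.linalg.lstsq(Jk, -fk, rcond=None)   # hk = argmin ||Jk h + fk||
        x = x + h
    return x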
Computational Cost
At each step the Gauss-Newton method solves a least squares problem, which can be written in the form

    min_h ||J_k h + f_k||_2^2 = (1 + x_k^T x_k)^{−1} min_h ||(A − r_k x_k^T / (1 + x_k^T x_k)) h + r_k||_2^2,

where r_k = A x_k − b. We can compute the QR factorization of A only once, and then use a technique that updates the QR factorization under rank-one perturbations. Each update has only quadratic cost.
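One way to realize the update in code, assuming SciPy's qr_update (which recomputes the QR factorization of A + u v^T from that of A in O(mn) work); the data here are random placeholders:

import numpy as np
from scipy.linalg import qr, qr_update

rng = np.random.default_rng(0)
A, b = rng.normal(size=(8, 3)), rng.normal(size=8)
xk = rng.normal(size=3)

Q, R = qr(A)                               # full QR of A, computed once
rk = A @ xk - b
u = -rk / (1.0 + xk @ xk)
Q1, R1 = qr_update(Q, R, u, xk)            # QR of A - rk xk^T/(1 + xk^T xk)
assert np.allclose(Q1 @ R1, A + np.outer(u, xk))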
Geometry of the method
Proposition
Let

    f(x) = (Ax − b) / √(1 + x^T x).

Then its image Im(f) ⊂ R^m is an open subset of the ellipsoid v^T X v = 1, where X = (CC^T)^+.
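A numerical check of the proposition, assuming NumPy: for random x, the point f(x) satisfies the ellipsoid equation exactly.

import numpy as np

rng = np.random.default_rng(1)
A, b = rng.normal(size=(8, 3)), rng.normal(size=8)
C = np.column_stack([A, b])
X = np.linalg.pinv(C @ C.T)                # X = (C C^T)^+
for _ in range(5):
    x = rng.normal(size=3)
    v = (A @ x - b) / np.sqrt(1.0 + x @ x) # v = f(x)
    assert np.isclose(v @ X @ v, 1.0)      # v lies on the ellipsoid v^T X v = 1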
Image of f(x)
Figure: Surface plot of Im(f). Blue star: f(x_TLS); red star: f(x_LS).
Improved variant of the basic GN-TLS
Motivation:
ensure convergence
increase convergence speed

The value f(x + h) comes from a linear combination of f(x) and f(x) + J(x)h, so it is not the retraction of the Gauss-Newton step!

Idea: introduce a step-length parameter α such that

    f(x + αh) = τ̂ (f(x) + J(x)h),

namely α = 1/(1 − x^T h / (1 + x^T x)).
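A quick numerical check of this collinearity claim, reusing the f, J helpers from above on random data (the minus sign in α is what makes the two vectors parallel):

import numpy as np

rng = np.random.default_rng(2)
A, b = rng.normal(size=(8, 3)), rng.normal(size=8)
x, h = rng.normal(size=3), rng.normal(size=3)

alpha = 1.0 / (1.0 - (x @ h) / (1.0 + x @ x))
lhs = f(A, b, x + alpha * h)               # f(x + alpha h)
rhs = f(A, b, x) + J(A, b, x) @ h          # f(x) + J(x) h
# Parallel vectors have a symmetric outer product.
assert np.allclose(np.outer(lhs, rhs), np.outer(rhs, lhs))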
Example
Figure: Example in dimension 1. Notice the difference between the two methods.
GN-TLS method with optimal step length
Input: A, b (problem data); ε, maxit (stopping criteria)
Output: x̂ (approximation of x_TLS)

Set k := 0
Compute x_0 := arg min_x ||Ax − b||_2
Compute f_0 := f(x_0) and J_0 := J(x_0)
while ||J_k^T f_k||_2 ≥ ε and k < maxit
    Compute h_k := arg min_h ||J_k h + f_k||_2
    Set α_k := 1/(1 − x_k^T h_k / (1 + x_k^T x_k))
    Set x_{k+1} := x_k + α_k h_k
    Set k := k + 1, f_k := f(x_k), J_k := J(x_k)
end
x̂ := x_k
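The same loop with the damped step, as a sketch (again reusing the f, J helpers; the name gn_tls_step is ours):

import numpy as np

def gn_tls_step(A, b, eps=1e-12, maxit=100):
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    for _ in range(maxit):
        fk, Jk = f(A, b, x), J(A, b, x)
        if np.linalg.norm(Jk.T @ fk) < eps:
            break
        h, *_ = np.linalg.lstsq(Jk, -fk, rcond=None)
        alpha = 1.0 / (1.0 - (x @ h) / (1.0 + x @ x))  # optimal step length
        x = x + alpha * h
    return x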
Equivalence with an inverse power iteration
The GN-TLS method with optimal step length is equivalent to an inverse power method involving the matrix C^T C. Indeed, let s_k = (x_k^T, −1)^T / √(1 + x_k^T x_k). Then

    s_{k+1} = β_k (C^T C)^{−1} s_k,    β_k = 1 / ||(C^T C)^{−1} s_k||_2.

Meanwhile,

    f(x_{k+1}) = β_k (C C^T)^+ f(x_k).

Corollary
The GN-TLS method with optimal step length is convergent. Moreover,

    ||f(x_k) − f(x_TLS)|| = O((σ_{n+1}/σ_n)^{2k}),
    ||x_k − x_TLS|| = O((σ_{n+1}/σ_n)^{2k}),
    |η(x_k) − η(x_TLS)| = O((σ_{n+1}/σ_n)^{4k}).
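The equivalence can be tested numerically, assuming NumPy and the f, J helpers above: one damped GN step lands on the same point as one inverse power step applied to s_k.

import numpy as np

rng = np.random.default_rng(3)
A, b = rng.normal(size=(8, 3)), rng.normal(size=8)
C = np.column_stack([A, b])
x = rng.normal(size=3)

fk, Jk = f(A, b, x), J(A, b, x)
h, *_ = np.linalg.lstsq(Jk, -fk, rcond=None)
alpha = 1.0 / (1.0 - (x @ h) / (1.0 + x @ x))
x_gn = x + alpha * h                        # damped GN step

s = np.append(x, -1.0) / np.sqrt(1.0 + x @ x)
w = np.linalg.solve(C.T @ C, s)             # one inverse power step (unnormalized)
assert np.allclose(x_gn, -w[:-1] / w[-1])   # same point: (x_{k+1}, -1) parallel to w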
Numerical experiments
Test problem by Björck, Heggernes, Matstoms (2000)
Figure: Left: log ||J_k^T f_k||. Center: errors log ||x_k − x_TLS|| (solid lines) and log |η(x_k) − η(x_TLS)| (dashed lines). Right: plot of α_k.
Conclusions
The method produces a sequence of approximations which converges with no restriction. The value η(x_k), available during the iterations, estimates the backward error in Ax ≈ b.

The method avoids computing the SVD. At each step it only solves a least squares problem whose matrix is a rank-one perturbation of the data matrix. This can be useful in some circumstances:
if A is large and sparse, we can use (transpose-free) Krylov methods where the matrix is only involved in matrix-vector products (see the sketch below);
if the QR factorization of A is known in advance.
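A matrix-free sketch of the inner LS solve in the sparse case, assuming SciPy; LSQR is used here for concreteness (it also needs products with the transpose, which are equally cheap for a rank-one perturbation):

import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

def make_operator(A, rk, xk):
    # Applies M = A - rk xk^T / (1 + xk^T xk) and M^T via mat-vec products only.
    m, n = A.shape
    c = 1.0 + xk @ xk
    return LinearOperator(
        (m, n),
        matvec=lambda v: A @ v - rk * ((xk @ v) / c),
        rmatvec=lambda w: A.T @ w - xk * ((rk @ w) / c),
        dtype=float,
    )

# Inner step: hk = argmin_h ||M h + rk||_2, e.g.
#   hk = lsqr(make_operator(A, rk, xk), -rk)[0]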
Conclusions
D. Fasino, A. Fazzi.
A Gauss-Newton iteration for Total Least Squares problems.
arXiv:1608.01619 (2016). Submitted.
Thank you for your attention.