Computing Covariance Matrices for Constrained
Nonlinear Large Scale Parameter Estimation
Problems Using Krylov Subspace Methods
Ekaterina Kostina and Olga Kostyukova
Abstract. In this paper we show how to compute the covariance matrix of parameter estimates using preconditioned Krylov subspace methods, which is crucial for efficient methods of optimum experimental design.
Mathematics Subject Classification (2000). Primary 65K10; Secondary 15A09,
65F30.
Keywords. Constrained parameter estimation, covariance matrix of parameter estimates, optimal experimental design, nonlinear equality constraints,
iterative matrix methods, preconditioning.
1. Introduction
Parameter estimation (PE) and optimum experimental design (OED) are important steps in establishing models that reproduce a given process with quantitative accuracy. The aim of parameter estimation is to reliably and accurately identify model parameters from sets of noisy experimental data. The “accuracy” of the parameters, i.e., their statistical distribution depending on the data noise, can be
estimated up to the first order by means of a covariance matrix approximation
and corresponding confidence regions. In practical applications, however, one often finds that the experiments performed to obtain the required measurements are
expensive, but nevertheless do not guarantee satisfactory parameter accuracy or
even well-posedness of the parameter estimation problem. In order to maximize the accuracy of the parameter estimates, additional experiments can be designed with optimal experimental settings or controls (e.g. initial conditions, measurement devices, sampling times, temperature profiles, feed streams, etc.) subject to constraints. As an objective functional, a suitable function of the covariance matrix can be used. The possible constraints in this problem describe costs, feasibility of experiments, the domain of validity of the models, etc.
This work was completed with the support of the ESF Activity “Optimization with PDE Constraints” (Short visit grant 2990).
The methods for optimum experimental design that have been developed
over the last few years can handle processes governed by differential algebraic
equations (DAE), but methods for processes governed by partial differential equations (PDE) are still in their infancy because of the extreme complexity of models
and of the optimization problem. This paper addresses one aspect of numerical methods for OED, namely the efficient computation of covariance matrices and their derivatives. So far, numerical methods for parameter estimation and optimal
design of experiments in dynamic processes have been based on direct linear algebra methods. On the other hand, for very large scale constrained systems with
sparse matrices of special structure, e.g. originating from discretization of PDEs,
the direct linear algebra methods are not competitive with iterative linear algebra
methods even for forward models. Generally, the covariance matrix can be calculated via a generalized inverse of the Jacobians of the parameter estimation problem. However, the generalized inverse cannot be computed explicitly by iterative methods, and hence the statistical assessment of the parameter estimates cannot be provided by the standard procedures of such methods. Therefore, in the case of PE and OED in PDE models, generalizations of iterative linear algebra methods to the computation of the covariance matrix and its derivatives are crucial for practical applications.
The aim of this paper is to show how to compute covariance matrices using
Krylov subspace methods and thus to take a step towards efficient numerical
methods for optimum experimental design for processes described by systems of
non-stationary PDEs.
2. Covariance matrix and its numerical computation using Krylov
subspace methods
As in [3], we consider the constrained nonlinear parameter estimation problem
\[
\min_{z \in \mathbb{R}^n} \|F_1(z)\|_2^2 \quad \text{s.t.} \quad F_2(z) = 0, \tag{2.1}
\]
which results from a discretization of a parameter estimation problem in a process described, e.g., by a PDE. Here the vector $z$ includes the unknown parameters and the variables resulting from the discretization of the PDE, and $\|s\|_2^2 = s^T s$.
To solve problem (2.1) we apply a generalized Gauss-Newton method. At each
iteration of the Gauss-Newton algorithm we solve a linear least-squares problem,
which can be written in the form
\[
\min_{x} \|Ax - b\|_2^2 \quad \text{s.t.} \quad Bx = 0, \tag{2.2}
\]
where $b = -F_1(z)$, $A = A(z) = \frac{\partial F_1(z)}{\partial z} \in \mathbb{R}^{k \times n}$, $B = B(z) = \frac{\partial F_2(z)}{\partial z} \in \mathbb{R}^{m \times n}$, $\bar m = n - m > 0$, $z$ is given, and
\[
\operatorname{rank} B = m, \qquad \operatorname{rank}\begin{pmatrix} A \\ B \end{pmatrix} = n. \tag{2.3}
\]
It is shown in [3] how to compute covariance matrices when the underlying
finite dimensional constrained linearized parameter estimation problem (2.2) is
solved using a conjugate gradient technique. One of the intriguing results of [3] is that, when solving constrained linear least squares problems by conjugate gradient methods, we obtain the covariance matrix and confidence intervals, as well as their derivatives, as a by-product. These results have been generalized to LSQR methods, numerically tested in [10, 6], and are briefly summarized in the following.
Under the regularity assumptions (2.3), the optimal solution $x^*$ and the Lagrange vector $\lambda^*$ satisfy the KKT system
\[
K \begin{pmatrix} x^* \\ \lambda^* \end{pmatrix} = \begin{pmatrix} A^T b \\ 0 \end{pmatrix},
\qquad
K := \begin{pmatrix} A^T A & B^T \\ B & 0 \end{pmatrix},
\]
and can be explicitly written using the linear operator $A^+$ as follows:
\[
x^* = A^+ \begin{pmatrix} b \\ 0 \end{pmatrix},
\qquad
A^+ = \begin{pmatrix} I & 0 \end{pmatrix} K^{-1} \begin{pmatrix} A^T & 0 \\ 0 & I \end{pmatrix}.
\]
The approximation of the covariance matrix can be expressed as [3]
\[
C = A^+ \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} (A^+)^T. \tag{2.4}
\]
It is shown in [3] that C satisfies a linear system of equations.
Lemma 2.1. The covariance matrix $C$ (2.4) is equal to the sub-matrix $X$ of the matrix $K^{-1}$,
\[
K^{-1} = \begin{pmatrix} X & W \\ S & T \end{pmatrix},
\]
and satisfies the following linear equation system with respect to the variables $C \in \mathbb{R}^{n \times n}$ and $S \in \mathbb{R}^{m \times n}$:
\[
K \begin{pmatrix} C \\ S \end{pmatrix} = \begin{pmatrix} I \\ 0 \end{pmatrix}. \tag{2.5}
\]
Another representation of the covariance matrix is given in [10].
Lemma 2.2. The covariance matrix $C$ (2.4) is equal to
\[
C = Z (Z^T A^T A Z)^{-1} Z^T,
\]
where the columns of the orthogonal matrix $Z \in \mathbb{R}^{n \times \bar m}$ span the null space of $B$.
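The two representations can be checked against each other on a small example. The following Python/NumPy sketch (the dimensions and random data are illustrative assumptions, not taken from the paper) builds $C$ once from an orthonormal null-space basis $Z$ of $B$ as in Lemma 2.2, and once as the upper-left block of $K^{-1}$ as in Lemma 2.1.

    import numpy as np

    # Illustrative dimensions and random data (assumptions for this sketch only).
    rng = np.random.default_rng(0)
    k, n, m = 8, 5, 2
    A = rng.standard_normal((k, n))
    B = rng.standard_normal((m, n))

    # Orthonormal basis Z of Ker(B) via the SVD; rank B = m is assumed, cf. (2.3).
    Z = np.linalg.svd(B)[2][m:].T                 # n x (n - m), columns span Ker(B)

    # Lemma 2.2: C = Z (Z^T A^T A Z)^{-1} Z^T.
    C_null = Z @ np.linalg.inv(Z.T @ A.T @ A @ Z) @ Z.T

    # Lemma 2.1: C is the upper-left n x n block of K^{-1}.
    K = np.block([[A.T @ A, B.T], [B, np.zeros((m, m))]])
    C_kkt = np.linalg.inv(K)[:n, :n]

    print(np.allclose(C_null, C_kkt))             # True: both representations coincide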
Let us present formulas for the computation of the derivatives of the covariance matrix $C = C(A, B)$ and the matrix $S = S(A, B)$ as functions of the matrices $A$ and $B$. These derivatives are needed in numerical methods for the design of optimal nonlinear experiments when the trace of $C$ is chosen as a criterion.
Let $A(t) = A + t\,\Delta A$ and $B(\mu) = B + \mu\,\Delta B$. Then
\[
\frac{\partial \operatorname{tr} C(A(t), B)}{\partial t}
= -\sum_{i=1}^{n} C^{(i)T} \left( \Delta A^T A + A^T \Delta A \right) C^{(i)},
\qquad
\frac{\partial \operatorname{tr} C(A, B(\mu))}{\partial \mu}
= -2 \sum_{i=1}^{n} C^{(i)T} \Delta B^T S^{(i)},
\]
where $C^{(i)}$ and $S^{(i)}$ denote the $i$-th columns of the matrices $C$ and $S$, respectively. The columns $C^{(i)}$ and $S^{(i)}$ are related as follows: $B^T S^{(i)} = e_i - A^T A C^{(i)}$, $i = 1, \ldots, n$.
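As a sanity check of these formulas, the following sketch (again with random illustrative data and an arbitrarily chosen finite-difference step) evaluates both directional derivatives of $\operatorname{tr} C$ and compares them with forward differences of $\operatorname{tr} C$ computed from the linear system (2.5).

    import numpy as np

    rng = np.random.default_rng(1)
    k, n, m = 8, 5, 2
    A, B = rng.standard_normal((k, n)), rng.standard_normal((m, n))
    dA, dB = rng.standard_normal((k, n)), rng.standard_normal((m, n))

    def C_and_S(A, B):
        # Solve K [C; S] = [I; 0], cf. (2.5).
        K = np.block([[A.T @ A, B.T], [B, np.zeros((m, m))]])
        CS = np.linalg.solve(K, np.vstack([np.eye(n), np.zeros((m, n))]))
        return CS[:n], CS[n:]

    C, S = C_and_S(A, B)

    # Directional derivatives from the formulas above (C is symmetric, so the
    # sums over columns collapse into traces).
    dtr_dA = -np.trace(C @ (dA.T @ A + A.T @ dA) @ C)
    dtr_dB = -2.0 * np.trace(C @ dB.T @ S)

    # Forward-difference approximations for comparison.
    eps = 1e-6
    fd_A = (np.trace(C_and_S(A + eps * dA, B)[0]) - np.trace(C)) / eps
    fd_B = (np.trace(C_and_S(A, B + eps * dB)[0]) - np.trace(C)) / eps
    print(dtr_dA, fd_A)   # the pairs should agree to several digits
    print(dtr_dB, fd_B)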
2.1. Computing the covariance matrix using the conjugate gradient method
Let us show how to compute the covariance matrix using the CG method.
Algorithm CG(A, b) (for solving $\min_x \|Ax - b\|_2^2$)
Step 1: (Initialization)
  $x_1 = 0$, $r_1 = A^T b$, $p_1 = r_1$.
Step 2: For $k = 1, 2, 3, \ldots$ repeat steps 2.1–2.2.
  2.1: (Update)
    1. $\alpha_k = r_k^T r_k / (p_k^T A^T A p_k)$
    2. $x_{k+1} = x_k + \alpha_k p_k$
    3. $r_{k+1} = r_k - \alpha_k A^T A p_k$
    4. $\beta_{k+1} = r_{k+1}^T r_{k+1} / (r_k^T r_k)$
    5. $p_{k+1} = r_{k+1} + \beta_{k+1} p_k$
  2.2: (Test for convergence)
    If $\|r_{k+1}\| \le \mathrm{Tol}$, then STOP.
To solve problem (2.2) we apply the Algorithm CG($\mathcal{A}$, b) with $\mathcal{A} = A\mathcal{P}$, where $\mathcal{P}$ is the orthogonal projector onto the null space of $B$. In Step 2.1 we have to compute the vectors $A\mathcal{P} p_k$ and $\mathcal{P} A^T A \mathcal{P} p_k$. Since $\mathcal{P} p_k = p_k$ by construction, we have $A\mathcal{P} p_k = A p_k$, and we need to compute only one projection $\mathcal{P} A^T A p_k$ of the vector $A^T A p_k$ onto the null space of the matrix $B$. This projection is not calculated explicitly; instead, a corresponding unconstrained least squares problem (the lower level problem) is solved, i.e., to compute $\mathcal{P} w$ we solve $q^* = \arg\min_q \|B^T q - w\|_2^2$ and set $\mathcal{P} w = w - B^T q^*$.
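A minimal sketch of this lower level computation (with hypothetical random data) is given below; the projection of a vector $w$ onto $\operatorname{Ker} B$ is obtained from an ordinary least-squares solve with $B^T$.

    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 2, 5
    B = rng.standard_normal((m, n))
    w = rng.standard_normal(n)

    q_star = np.linalg.lstsq(B.T, w, rcond=None)[0]   # q* = argmin_q ||B^T q - w||_2^2
    Pw = w - B.T @ q_star                             # P w = w - B^T q*

    print(np.allclose(B @ Pw, 0.0))                   # True: P w lies in Ker(B)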
Theorem 2.3 ([3]). Suppose that the Algorithm CG($A\mathcal{P}$, b) results after $\bar m$ iterations in the vectors $x_{\bar m}$, $p_1, \ldots, p_{\bar m}$. Then the solution $x^*$ of the problem (2.2) and the corresponding covariance matrix $C$ (2.4) can be computed as
\[
x^* = x_{\bar m}, \qquad C = P \operatorname{diag}(\gamma_k,\ k = 1, \ldots, \bar m)\, P^T,
\]
with $P = (p_1, \ldots, p_{\bar m})$, $\gamma_k = 1/\|A p_k\|_2^2$, $k = 1, \ldots, \bar m$.
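The following sketch illustrates Theorem 2.3 on a small dense example (all data are illustrative assumptions): it runs the projected CG iteration for exactly $\bar m = n - m$ steps, realizes the projector through the lower level least-squares problem, and assembles $C = P \operatorname{diag}(\gamma_k) P^T$, which is then compared with the upper-left block of $K^{-1}$.

    import numpy as np

    rng = np.random.default_rng(3)
    k, n, m = 8, 5, 2
    A, B = rng.standard_normal((k, n)), rng.standard_normal((m, n))
    b = rng.standard_normal(k)

    def project(w):
        # P w = w - B^T q*,  q* = argmin_q ||B^T q - w||_2^2 (lower level problem)
        return w - B.T @ np.linalg.lstsq(B.T, w, rcond=None)[0]

    m_bar = n - m
    x = np.zeros(n)
    r = project(A.T @ b)                  # r_1 = P A^T b
    p = r.copy()
    dirs, gammas = [], []

    for _ in range(m_bar):                # exactly m_bar iterations, cf. Remark 2.5
        Ap = A @ p                        # A P p_k = A p_k since p_k already lies in Ker B
        alpha = (r @ r) / (Ap @ Ap)       # p^T A^T A p = ||A p||_2^2
        x = x + alpha * p
        r_new = r - alpha * project(A.T @ Ap)
        dirs.append(p.copy())
        gammas.append(1.0 / (Ap @ Ap))    # gamma_k = 1 / ||A p_k||_2^2
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new

    P_mat = np.column_stack(dirs)
    C_cg = P_mat @ np.diag(gammas) @ P_mat.T

    K = np.block([[A.T @ A, B.T], [B, np.zeros((m, m))]])
    print(np.allclose(C_cg, np.linalg.inv(K)[:n, :n]))   # True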
2.2. Computing the covariance matrix using LSQR
Let us show how to compute the covariance matrix using LSQR, an iterative method for solving large linear systems or least-squares problems which is known to be numerically more reliable than other conjugate-gradient methods [7, 8].
Algorithm LSQR(A, b) (for solving $\min_x \|Ax - b\|_2^2$)
Step 1: (Initialization)
  $\beta_1 u_1 = b$, $\alpha_1 v_1 = A^T u_1$, $p_1 = v_1$, $x_0 = 0$, $\bar\phi_1 = \beta_1$, $\bar\rho_1 = \alpha_1$.
Step 2: For $k = 1, 2, 3, \ldots$ repeat steps 2.1–2.4.
  2.1: (Continue the bidiagonalization)
    1. $\beta_{k+1} u_{k+1} = A v_k - \alpha_k u_k$
    2. $\alpha_{k+1} v_{k+1} = A^T u_{k+1} - \beta_{k+1} v_k$
  2.2: (Construct and apply the next orthogonal transformation)
    1. $\rho_k = (\bar\rho_k^2 + \beta_{k+1}^2)^{1/2}$
    2. $c_k = \bar\rho_k / \rho_k$
    3. $s_k = \beta_{k+1} / \rho_k$
    4. $\theta_{k+1} = s_k \alpha_{k+1} / \rho_k$
    5. $\bar\rho_{k+1} = -c_k \alpha_{k+1}$
    6. $\phi_k = c_k \bar\phi_k$
    7. $\bar\phi_{k+1} = s_k \bar\phi_k$
  2.3: (Update)
    1. $d_k = (1/\rho_k)\, p_k$
    2. $x_k = x_{k-1} + (\phi_k / \rho_k)\, p_k$
    3. $p_{k+1} = v_{k+1} - \theta_{k+1} p_k$
  2.4: (Test for convergence)
    If $\|r_k\| := \|A x_k - b\| \le \mathrm{Tol}$, then STOP.
In this algorithm, the parameters $\beta_i$ and $\alpha_i$ are computed such that $\|u_i\|_2 = \|v_i\|_2 = 1$.
To solve problem (2.2) we apply the Algorithm LSQR($\mathcal{A}$, b) with $\mathcal{A} = A\mathcal{P}$, where $\mathcal{P}$ is the orthogonal projector onto the null space of $B$. In Step 2.1 we have to compute the vectors $A\mathcal{P} v_k$ and $\mathcal{P} A^T u_{k+1}$. Since $\mathcal{P} v_k = v_k$ by construction (see [10]), we have $A\mathcal{P} v_k = A v_k$, and we need to compute only one projection $\mathcal{P} A^T u_{k+1}$ of the vector $A^T u_{k+1}$ onto the null space of the matrix $B$. This projection is not calculated explicitly; instead, as in the CG method, a corresponding lower level problem is solved, i.e., to compute $\mathcal{P} w$ we solve $q^* = \arg\min_q \|B^T q - w\|_2^2$ and set $\mathcal{P} w = w - B^T q^*$.
Theorem 2.4. Suppose that the Algorithm LSQR($A\mathcal{P}$, b) results after $\bar m$ iterations in the vectors $x_{\bar m}$, $d_1, \ldots, d_{\bar m}$. Then the solution $x^*$ of the problem (2.2) and the corresponding covariance matrix $C$ (2.4) can be computed as $x^* = x_{\bar m}$, $C = D D^T$ with $D = (d_1, \ldots, d_{\bar m})$.
Proof. The first assertion follows from the theory of Krylov-type methods. We prove that $C = D D^T$.
From the properties of LSQR [7], we get $D^T \mathcal{P} A^T A \mathcal{P} D = I$, where the projection matrix $\mathcal{P}$ can be expressed as $\mathcal{P} = Z Z^T$ with the orthogonal matrix $Z$ spanning the null space of $B$. Hence
\[
D^T Z Z^T A^T A Z Z^T D = (D^T Z)(Z^T A^T A Z)(Z^T D) = I. \tag{2.6}
\]
We show that the matrix $Z^T D$ is nonsingular. Indeed, by LSQR [7] we have $D = V R^{-1}$, where $R$ is an upper bidiagonal matrix, $V = (v_1, \ldots, v_{\bar m})$, $\operatorname{rank} V = \bar m$. Since $v_i \in \operatorname{Ker} B$ and $\operatorname{rank} V = \bar m$, it follows that $V = Z N$ with a nonsingular matrix $N$. Summing up, $Z^T D = Z^T Z N R^{-1} = N R^{-1}$ is nonsingular as a product of nonsingular matrices, and we get from (2.6) that $(Z^T A^T A Z)^{-1} = (Z^T D)(D^T Z)$. Hence
\[
C = Z (Z^T A^T A Z)^{-1} Z^T = Z (Z^T D)(D^T Z) Z^T = D D^T,
\]
since $Z Z^T D = Z Z^T Z N R^{-1} = Z N R^{-1} = V R^{-1} = D$.
Remark 2.5. Here and in what follows we assume that all algorithms need at least $\bar m$ iterations to converge. If an algorithm converges after fewer than $\bar m$ iterations, that is, for some $k < \bar m$ we have $\|r_k\| \le \mathrm{Tol}$, then $x_k$ solves the problem (2.2), but for computing the matrix $C$ we need to continue the process, since we need the “complete” set of vectors $p_1, \ldots, p_{\bar m}$ (CG methods) or $d_1, \ldots, d_{\bar m}$ (LSQR methods).
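The LSQR-based construction of Theorem 2.4 can be sketched in the same spirit (illustrative random data; the final bidiagonalization step, whose $\alpha$ would vanish in exact arithmetic, is skipped since it is not needed for $D$): the vectors $d_k = p_k / \rho_k$ are collected during $\bar m$ iterations applied to the projected operator, and $C = D D^T$ is compared with the KKT-based covariance matrix.

    import numpy as np

    rng = np.random.default_rng(4)
    k, n, m = 8, 5, 2
    A, B = rng.standard_normal((k, n)), rng.standard_normal((m, n))
    b = rng.standard_normal(k)

    def project(w):                                    # P w via the lower level problem
        return w - B.T @ np.linalg.lstsq(B.T, w, rcond=None)[0]

    m_bar = n - m
    beta = np.linalg.norm(b); u = b / beta             # beta_1 u_1 = b
    t = project(A.T @ u)
    alpha = np.linalg.norm(t); v = t / alpha           # alpha_1 v_1 = P A^T u_1
    p = v.copy(); x = np.zeros(n)
    phibar, rhobar = beta, alpha
    D = []

    for j in range(1, m_bar + 1):
        t = A @ v - alpha * u                          # A P v_j = A v_j
        beta = np.linalg.norm(t); u = t / beta         # beta_{j+1} u_{j+1}
        rho = np.hypot(rhobar, beta)                   # Givens rotation
        c, s = rhobar / rho, beta / rho
        phi = c * phibar
        phibar = s * phibar
        D.append(p / rho)                              # d_j = p_j / rho_j
        x = x + (phi / rho) * p
        if j < m_bar:                                  # the final alpha/v pair is not needed for D
            t = project(A.T @ u) - beta * v
            alpha = np.linalg.norm(t); v = t / alpha
            theta = s * alpha / rho
            rhobar = -c * alpha
            p = v - theta * p

    D = np.column_stack(D)
    K = np.block([[A.T @ A, B.T], [B, np.zeros((m, m))]])
    print(np.allclose(D @ D.T, np.linalg.inv(K)[:n, :n]))   # True (Theorem 2.4)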
Preliminary numerical results [6, 10] show, as expected, that the iterative process needs proper preconditioning in order to achieve reasonable efficiency and accuracy in the elements of the covariance matrix. Our aim is to accelerate the solution process by applying appropriate preconditioners [1, 4, 9]. The question to be answered in the remainder of this paper is how to compute the linear operator $A^+$ and the covariance matrix in case the KKT systems are solved iteratively with preconditioning.
3. Computing the covariance matrix using preconditioned Krylov
subspace methods
Solving problem (2.2) by a Krylov subspace method is equivalent to solving the following system of linear equations:
\[
\mathcal{P}^T A^T A \mathcal{P} x = \mathcal{P}^T A^T b. \tag{3.1}
\]
Here $\mathcal{P}$, as before, is the orthogonal projector onto the null space of $B$. Preconditioning consists in an equivalent reformulation of the original linear system (3.1) as
\[
\tilde A \bar x = \bar b, \tag{3.2}
\]
where the new matrix $\tilde A$ has “better” properties than the matrix $\mathcal{P}^T A^T A \mathcal{P}$ of the original system (3.1).
Suppose that we apply preconditioning, that is, we change the variables $x$ using a nonsingular matrix $W \in \mathbb{R}^{n \times n}$:
\[
\bar x = W^{-1} x.
\]
Then the problem (2.2) is equivalent to the following problem
\[
\min_{\bar x} \|\bar A \bar x - b\|_2^2 \quad \text{s.t.} \quad \bar B \bar x = 0, \tag{3.3}
\]
in the sense that if $\bar x^*$ solves the problem (3.3), then $x^* = W \bar x^*$ solves the problem (2.2). Here $\bar A = AW$, $\bar B = BW$. Furthermore, the solution of problem (3.3) is equivalent to solving the linear system
\[
\mathcal{M}^T \bar A^T \bar A \mathcal{M} y = \mathcal{M}^T \bar A^T b, \tag{3.4}
\]
where $\mathcal{M}$ is a projector onto $\operatorname{Ker} \bar B$: if $y^*$ is a solution of system (3.4), then $\bar x^* = \mathcal{M} y^*$ solves the problem (3.3) and $x^* = W \bar x^*$ is a solution to problem (2.2).
We consider two types of projectors $\mathcal{M}$ onto the null space of $\bar B$: $\mathcal{M} = \bar{\mathcal{P}}$ and $\mathcal{M} = W^{-1} \mathcal{P} W$, where $\mathcal{P}$ and $\bar{\mathcal{P}}$ are the orthogonal projection operators onto the subspaces $\operatorname{Ker} B$ and $\operatorname{Ker} \bar B$, respectively. These two projector operators generate two types of preconditioning for system (3.1) (both variants are illustrated in the sketch after this list):
• Preconditioning of type I corresponds to solving system (3.4) with $\mathcal{M} = \bar{\mathcal{P}}$, i.e., the system
\[
\bar{\mathcal{P}}^T W^T A^T A W \bar{\mathcal{P}} y = \bar{\mathcal{P}}^T W^T A^T b. \tag{3.5}
\]
• Preconditioning of type II corresponds to solving system (3.4) with $\mathcal{M} = W^{-1} \mathcal{P} W$, i.e., the system
\[
W^T \mathcal{P}^T A^T A \mathcal{P} W y = W^T \mathcal{P}^T A^T b. \tag{3.6}
\]
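The following sketch (random data and an arbitrarily chosen nonsingular $W$; everything in it is an illustrative assumption) forms both projected systems explicitly, solves them with a dense least-squares solver, and checks that the back-transformed solutions coincide with the solution of (2.2) obtained from the KKT system. In practice the systems are of course never formed explicitly; the sketch only illustrates the two reformulations.

    import numpy as np

    rng = np.random.default_rng(5)
    k, n, m = 8, 5, 2
    A, B = rng.standard_normal((k, n)), rng.standard_normal((m, n))
    b = rng.standard_normal(k)
    W = rng.standard_normal((n, n)) + 3.0 * np.eye(n)     # some nonsingular W (assumption)

    def proj_ker(M):
        # Orthogonal projector onto Ker(M), built from an SVD null-space basis.
        Z = np.linalg.svd(M)[2][M.shape[0]:].T
        return Z @ Z.T

    P  = proj_ker(B)            # projector onto Ker B
    Pb = proj_ker(B @ W)        # projector onto Ker B_bar, B_bar = B W

    # Reference solution of (2.2) from the KKT system.
    K = np.block([[A.T @ A, B.T], [B, np.zeros((m, m))]])
    x_ref = np.linalg.solve(K, np.concatenate([A.T @ b, np.zeros(m)]))[:n]

    # Type I: solve (3.5); then x* = W (Pbar y).
    M1 = Pb @ W.T @ A.T @ A @ W @ Pb
    y1 = np.linalg.lstsq(M1, Pb @ W.T @ A.T @ b, rcond=None)[0]
    x1 = W @ (Pb @ y1)

    # Type II: solve (3.6); then x* = P W y.
    M2 = W.T @ P @ A.T @ A @ P @ W
    y2 = np.linalg.lstsq(M2, W.T @ P @ A.T @ b, rcond=None)[0]
    x2 = P @ (W @ y2)

    print(np.allclose(x1, x_ref), np.allclose(x2, x_ref))   # True True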
3.1. Preconditioning of type I
3.1.1. Conjugate gradient method. Setting $\mathcal{A} = \bar A \bar{\mathcal{P}}$, where $\bar{\mathcal{P}}$ is the orthogonal projector onto the null space of the matrix $\bar B$, we may apply the Algorithm CG($\mathcal{A}$, b) directly to solve the problem (3.3). Note that now in step 2.1 we have $\mathcal{A} p_k = \bar A \bar{\mathcal{P}} p_k = \bar A p_k$ and $\mathcal{A}^T \mathcal{A} p_k = \bar{\mathcal{P}} \bar A^T \bar A p_k = \bar A^T \bar A p_k - \bar B^T q^*$, where $q^* = \arg\min_q \|\bar B^T q - \bar A^T \bar A p_k\|_2^2$.
Suppose that the Algorithm CG($\mathcal{A}$, b) with $\mathcal{A} = \bar A \bar{\mathcal{P}}$ results after $\bar m$ iterations in the vectors $x_{\bar m}$, $p_1, \ldots, p_{\bar m}$. Then the solution of the problem (3.3) and the corresponding covariance matrix are computed as $\bar x^* = x_{\bar m}$, $\bar C = \bar P \operatorname{diag}(\bar\gamma_k,\ k = 1, \ldots, \bar m)\, \bar P^T$, where $\bar P = (p_1, p_2, \ldots, p_{\bar m})$, $\bar\gamma_k = 1/\|\bar A p_k\|_2^2$, $k = 1, \ldots, \bar m$.
Lemma 3.1. The solution x∗ of the problem (2.2) and the corresponding covariance
matrix C are computed as x∗ = W x̄∗ , C = W C̄W T .
Proof. It follows from Lemma 2.1 that the matrix $\bar C$ solves the system
\[
\begin{pmatrix} \bar A^T \bar A & \bar B^T \\ \bar B & 0 \end{pmatrix}
\begin{pmatrix} \bar C \\ \bar S \end{pmatrix}
= \begin{pmatrix} I \\ 0 \end{pmatrix}
\;\Rightarrow\;
K \begin{pmatrix} W & 0 \\ 0 & I \end{pmatrix}
\begin{pmatrix} \bar C \\ \bar S \end{pmatrix}
= \begin{pmatrix} W^T & 0 \\ 0 & I \end{pmatrix}^{-1}
\begin{pmatrix} I \\ 0 \end{pmatrix}
\;\Rightarrow\;
K \begin{pmatrix} W \bar C \\ \bar S \end{pmatrix}
= \begin{pmatrix} W^T & 0 \\ 0 & I \end{pmatrix}^{-1}
\begin{pmatrix} I \\ 0 \end{pmatrix},
\]
while the covariance matrix $C$ solves the system (2.5). Hence
\[
\begin{pmatrix} W \bar C \\ \bar S \end{pmatrix}
= K^{-1} \begin{pmatrix} W^T & 0 \\ 0 & I \end{pmatrix}^{-1} \begin{pmatrix} I \\ 0 \end{pmatrix}
= K^{-1} \begin{pmatrix} I \\ 0 \end{pmatrix} W^{-T}
= \begin{pmatrix} C \\ S \end{pmatrix} W^{-T}.
\]
It follows from the last equation that $W \bar C = C W^{-T}$ and $\bar S = S W^{-T}$. Thus, $C = W \bar C W^T$ and $S = \bar S W^T$.
The Algorithm CG($\mathcal{A}$, b) for problem (3.3) makes use of the matrix products $AW$ and $BW$, which is not always reasonable, see [5]. Using the variable transformation
\[
p_k^{(\mathrm{old})} = W^{-1} p_k, \quad r_k^{(\mathrm{old})} = W^T r_k, \quad x_k^{(\mathrm{old})} = W^{-1} x_k, \quad k = 1, 2, \ldots,
\]
and the matrix $Q = W W^T$, it is not difficult to rewrite the Algorithm CG($\bar A \bar{\mathcal{P}}$, b) without carrying out the variable transformation explicitly.
Algorithm Preconditioned CG-I(A, B, b, Q) (for solving problem (2.2))
Step 1: (Initialization)
  $x_1 = 0$, $r_1 = A^T b - B^T q_1$, where $q_1$ solves $\min_q \|B^T q - A^T b\|_Q^2$; $p_1 = Q r_1$.
Step 2: For $k = 1, 2, 3, \ldots$ repeat steps 2.1–2.2.
  2.1: (Update)
    1. $\alpha_k = r_k^T Q r_k / (p_k^T A^T A p_k)$
    2. $x_{k+1} = x_k + \alpha_k p_k$
    3. $r_{k+1} = r_k - \alpha_k (A^T A p_k - B^T q_k)$, where $q_k$ solves $\min_q \|B^T q - A^T A p_k\|_Q^2$
    4. $\beta_{k+1} = r_{k+1}^T Q r_{k+1} / (r_k^T Q r_k)$
    5. $p_{k+1} = Q r_{k+1} + \beta_{k+1} p_k$
  2.2: (Test for convergence)
    If $\|r_k\| \le \mathrm{Tol}$, then STOP.
In this algorithm and in what follows, $\|s\|_Q^2 := s^T Q s$.
Lemma 3.2 (Computation of covariance matrix with preconditioned CG). Suppose that the Algorithm PCG-I(A, B, b, Q) converges after $\bar m$ iterations and we get the vectors $x_{\bar m}$ and $p_1, \ldots, p_{\bar m}$. Then the solution $x^*$ of the problem (2.2) and the corresponding covariance matrix are given by
\[
x^* = x_{\bar m}, \qquad C = P \operatorname{diag}(\gamma_k,\ k = 1, \ldots, \bar m)\, P^T,
\]
where $P = (p_1, \ldots, p_{\bar m})$, $\gamma_k = 1/\|A p_k\|_2^2$, $k = 1, \ldots, \bar m$.
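A compact sketch of this algorithm is given below (illustrative data; a simple diagonal $Q$ standing in for some $W W^T$ is assumed). The $Q$-weighted lower level problem is solved through its normal equations, and the covariance matrix of Lemma 3.2 is compared with the KKT-based one.

    import numpy as np

    rng = np.random.default_rng(6)
    k, n, m = 8, 5, 2
    A, B = rng.standard_normal((k, n)), rng.standard_normal((m, n))
    b = rng.standard_normal(k)
    Q = np.diag(rng.uniform(0.5, 2.0, n))      # Q = W W^T, here a simple diagonal choice

    def resid_Q(w):
        # w - B^T q with q = argmin_q ||B^T q - w||_Q^2 (Q-weighted lower level problem),
        # solved via its normal equations B Q B^T q = B Q w.
        q = np.linalg.solve(B @ Q @ B.T, B @ Q @ w)
        return w - B.T @ q

    m_bar = n - m
    x = np.zeros(n)
    r = resid_Q(A.T @ b)
    p = Q @ r
    dirs, gammas = [], []

    for _ in range(m_bar):
        Ap = A @ p
        alpha = (r @ (Q @ r)) / (Ap @ Ap)      # p^T A^T A p = ||A p||_2^2
        x = x + alpha * p
        r_new = r - alpha * resid_Q(A.T @ Ap)
        dirs.append(p.copy())
        gammas.append(1.0 / (Ap @ Ap))         # gamma_k = 1 / ||A p_k||_2^2
        beta = (r_new @ (Q @ r_new)) / (r @ (Q @ r))
        p = Q @ r_new + beta * p
        r = r_new

    P_mat = np.column_stack(dirs)
    C_pcg = P_mat @ np.diag(gammas) @ P_mat.T  # Lemma 3.2

    K = np.block([[A.T @ A, B.T], [B, np.zeros((m, m))]])
    print(np.allclose(C_pcg, np.linalg.inv(K)[:n, :n]))   # True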
3.1.2. LSQR method. Similarly to the CG method, setting $\mathcal{A} = \bar A \bar{\mathcal{P}}$, where $\bar{\mathcal{P}}$ is the orthogonal projector onto the null space of the matrix $\bar B$, we may apply the Algorithm LSQR($\mathcal{A}$, b) directly to solve the problem (3.3). Note that now in step 2.1 we have $\mathcal{A} v_k = \bar A \bar{\mathcal{P}} v_k = \bar A v_k$ and $\mathcal{A}^T u_{k+1} = \bar{\mathcal{P}} \bar A^T u_{k+1} = \bar A^T u_{k+1} - \bar B^T q^*$, where $q^* = \arg\min_q \|\bar B^T q - \bar A^T u_{k+1}\|_2^2$.
Suppose that the Algorithm LSQR($\mathcal{A}$, b) with $\mathcal{A} = \bar A \bar{\mathcal{P}}$ results after $\bar m$ iterations in the vectors $x_{\bar m}$, $d_1, \ldots, d_{\bar m}$. Then the solution of the problem (3.3) and the corresponding covariance matrix are given by $\bar x^* = x_{\bar m}$, $\bar C = \bar D \bar D^T$, where $\bar D = (d_1, d_2, \ldots, d_{\bar m})$.
The solution $x^*$ of the problem (2.2) and the corresponding covariance matrix $C$ are computed according to Lemma 3.1: $x^* = W \bar x^*$, $C = W \bar C W^T$.
As before, using the variable transformation
\[
p_k^{(\mathrm{old})} = W^{-1} p_k, \quad v_k^{(\mathrm{old})} = W^T v_k, \quad x_k^{(\mathrm{old})} = W^{-1} x_k, \quad d_k^{(\mathrm{old})} = W^{-1} d_k, \quad k = 1, 2, \ldots,
\]
and the matrix $Q = W W^T$, we rewrite the Algorithm LSQR($\bar A \bar{\mathcal{P}}$, b) without carrying out the variable transformation explicitly.
Algorithm Preconditioned LSQR-I(A, B, b, Q) (for solving problem (2.2))
Step 1: (Initialization)
  $\beta_1 u_1 = b$, $\alpha_1 v_1 = A^T u_1 - B^T q_0$, where $q_0$ solves the problem $\min_q \|B^T q - A^T u_1\|_Q^2$;
  $p_1 = Q v_1$, $x_0 = 0$, $\bar\phi_1 = \beta_1$, $\bar\rho_1 = \alpha_1$.
Step 2: For $k = 1, 2, 3, \ldots$ repeat steps 2.1–2.4.
  2.1: (Continue the bidiagonalization)
    1. $\beta_{k+1} u_{k+1} = A Q v_k - \alpha_k u_k$
    2. $\alpha_{k+1} v_{k+1} = A^T u_{k+1} - B^T q_k - \beta_{k+1} v_k$, where $q_k$ solves the problem $\min_q \|B^T q - A^T u_{k+1}\|_Q^2$
  2.2: as in Algorithm LSQR(A, b)
  2.3: (Update)
    1. $d_k = (1/\rho_k)\, p_k$
    2. $x_k = x_{k-1} + (\phi_k / \rho_k)\, p_k$
    3. $p_{k+1} = Q v_{k+1} - \theta_{k+1} p_k$
  2.4: (Test for convergence)
    If $\|r_k\| := \|A x_k - b\| \le \mathrm{Tol}$, then STOP.
In this algorithm, the parameters $\beta_i$ and $\alpha_i$ are computed in such a way that $\|u_i\|_2 = \|v_i\|_Q = 1$.
Lemma 3.3 (Computation of covariance matrix with preconditioned LSQR). Suppose that the Algorithm PLSQR-I(A, B, b, Q) converges after $\bar m$ iterations and we get the vectors $x_{\bar m}$ and $d_1, \ldots, d_{\bar m}$. Then the solution $x^*$ of the problem (2.2) and the corresponding covariance matrix are given by
\[
x^* = x_{\bar m}, \qquad C = D D^T, \qquad D = (d_1, \ldots, d_{\bar m}).
\]
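Analogously, a sketch of Algorithm PLSQR-I in the same illustrative setting (diagonal $Q$; the final bidiagonalization step is again skipped, since it is not needed for $D$): the $v_k$ are normalized in the $Q$-norm, and $C = D D^T$ from Lemma 3.3 is checked against the KKT-based covariance matrix.

    import numpy as np

    rng = np.random.default_rng(7)
    k, n, m = 8, 5, 2
    A, B = rng.standard_normal((k, n)), rng.standard_normal((m, n))
    b = rng.standard_normal(k)
    Q = np.diag(rng.uniform(0.5, 2.0, n))              # Q = W W^T (assumed diagonal here)

    def resid_Q(w):                                    # Q-weighted lower level problem
        return w - B.T @ np.linalg.solve(B @ Q @ B.T, B @ Q @ w)

    def qnorm(t):
        return np.sqrt(t @ Q @ t)

    m_bar = n - m
    beta = np.linalg.norm(b); u = b / beta             # ||u_i||_2 = 1
    t = resid_Q(A.T @ u)
    alpha = qnorm(t); v = t / alpha                    # ||v_i||_Q = 1
    p = Q @ v; x = np.zeros(n)
    phibar, rhobar = beta, alpha
    D = []

    for j in range(1, m_bar + 1):
        t = A @ (Q @ v) - alpha * u                    # step 2.1.1
        beta = np.linalg.norm(t); u = t / beta
        rho = np.hypot(rhobar, beta)
        c, s = rhobar / rho, beta / rho
        phi = c * phibar
        phibar = s * phibar
        D.append(p / rho)                              # d_j = p_j / rho_j
        x = x + (phi / rho) * p
        if j < m_bar:                                  # last alpha/v pair not needed for D
            t = resid_Q(A.T @ u) - beta * v            # step 2.1.2
            alpha = qnorm(t); v = t / alpha
            theta = s * alpha / rho
            rhobar = -c * alpha
            p = Q @ v - theta * p                      # step 2.3.3

    D = np.column_stack(D)
    K = np.block([[A.T @ A, B.T], [B, np.zeros((m, m))]])
    print(np.allclose(D @ D.T, np.linalg.inv(K)[:n, :n]))   # True (Lemma 3.3)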
3.2. Preconditioning of type II
3.2.1. CG method. Setting $\mathcal{A} = A \mathcal{P} W$, where $\mathcal{P}$ is the orthogonal projector onto the null space of the matrix $B$, we may apply the Algorithm CG($\mathcal{A}$, b) directly to solve the system (3.6). Note that now in step 2.1 we need to compute $\mathcal{A} p_k = A u_k$, where
\[
u_k := \mathcal{P} W p_k = W p_k - B^T q^*, \qquad q^* = \arg\min_q \|B^T q - W p_k\|_2^2, \tag{3.7}
\]
and $\mathcal{A}^T \mathcal{A} p_k = W^T \mathcal{P} A^T A u_k$, where $\mathcal{P} A^T A u_k = A^T A u_k - B^T q^*$ with $q^* = \arg\min_q \|B^T q - A^T A u_k\|_2^2$, which means that we have to solve the lower level problem twice.
Suppose that after applying the Algorithm CG($A\mathcal{P}W$, b) we get the vectors and numbers
\[
x_{\bar m},\ u_1, \ldots, u_{\bar m},\ \alpha_1, \ldots, \alpha_{\bar m} \tag{3.8}
\]
(see (3.7) for the definition of the vectors $u_k$). Let us show how we can use this information in order to compute the solution $x^*$ of the problem (2.2) and the corresponding matrix $C$.
Lemma 3.4. Suppose that the Algorithm CG($A\mathcal{P}W$, b) converges after $\bar m$ iterations and we get the vectors and numbers (3.8). Then the solution $x^*$ of the problem (2.2) and the corresponding covariance matrix are given by
\[
x^* = \sum_{k=1}^{\bar m} \alpha_k u_k, \qquad C = U \operatorname{diag}(\gamma_k,\ k = 1, \ldots, \bar m)\, U^T, \tag{3.9}
\]
where $U = (u_1, \ldots, u_{\bar m})$, $\gamma_k = 1/\|A u_k\|_2^2$, $k = 1, \ldots, \bar m$.
Proof. By construction, $y^* = x_{\bar m}$ solves the system (3.6). Then $\bar x^* = \mathcal{M} y^*$ with $\mathcal{M} = W^{-1} \mathcal{P} W$ solves the problem (3.3), and $x^* = W \bar x^*$ is a solution to problem (2.2). Hence,
\[
x^* = W \mathcal{M} y^* = \mathcal{P} W y^* = \sum_{k=1}^{\bar m} \alpha_k \mathcal{P} W p_k = \sum_{k=1}^{\bar m} \alpha_k u_k.
\]
By the properties of the CG method, the vectors
\[
p_k, \quad k = 1, \ldots, \bar m, \tag{3.10}
\]
possess the following properties:
a1) the vectors (3.10) are linearly independent;
a2) $p_k = W^T \tilde p_k$, where $\tilde p_k \in \operatorname{Ker} B$, $k = 1, \ldots, \bar m$;
a3) the vectors (3.10) are $W^T \mathcal{P} A^T A \mathcal{P} W$-conjugate:
\[
p_i^T\, W^T \mathcal{P} A^T A \mathcal{P} W\, p_j
\begin{cases} = 0 & \text{if } i \ne j, \\ \ne 0 & \text{if } i = j. \end{cases}
\]
Let us show that the vectors (see (3.7))
\[
u_k = \mathcal{P} W p_k, \quad k = 1, \ldots, \bar m, \tag{3.11}
\]
possess similar properties:
b1) the vectors (3.11) are linearly independent;
b2) the vectors (3.11) form a basis of the null space of $B$;
b3) the vectors (3.11) are $A^T A$-conjugate:
\[
u_i^T A^T A u_j
\begin{cases} = 0 & \text{if } i \ne j, \\ \ne 0 & \text{if } i = j. \end{cases}
\]
We show first that Property b1) holds true. Suppose the contrary: then there exists a vector $l \in \mathbb{R}^{\bar m}$ such that $(u_1, \ldots, u_{\bar m}) l = 0$, $l \ne 0$, or equivalently
\[
\mathcal{P} W (p_1, \ldots, p_{\bar m})\, l = 0, \quad l \ne 0.
\]
It follows from a2) that
\[
\mathcal{P} W W^T (\tilde p_1, \ldots, \tilde p_{\bar m})\, l = 0, \quad l \ne 0. \tag{3.12}
\]
Multiplying (3.12) with $l^T (\tilde p_1, \ldots, \tilde p_{\bar m})^T$ and taking into account that $\tilde p_k \in \operatorname{Ker} B$ (and hence $\mathcal{P} \tilde p_k = \tilde p_k$), we get
\[
l^T (\tilde p_1, \ldots, \tilde p_{\bar m})^T W W^T (\tilde p_1, \ldots, \tilde p_{\bar m})\, l = 0.
\]
Since $W$ is nonsingular, the last equality yields
\[
(\tilde p_1, \ldots, \tilde p_{\bar m})\, l = 0, \quad l \ne 0,
\]
which contradicts the fact that the vectors $\tilde p_1, \ldots, \tilde p_{\bar m}$ are linearly independent (see Properties a1) and a2)). Hence, Property b1) holds true.
Property b2) follows from b1) and the fact that $\bar m$ is the dimension of the null space of $B$. Property b3) follows immediately from a3).
The computation of the covariance matrix (3.9) follows from the properties b1)–b3) and [3].
It is easy to modify the Algorithm CG($A\mathcal{P}W$, b) in order to compute recursively the vectors (3.11) and the vector $x^*$. Further, we modify the algorithm such that it makes use of the matrix $Q = W W^T$:
Algorithm Preconditioned CG-II(A, B, b, Q) (for solving problem (2.2))
Step 1: (Initialization)
  $x_1 = 0$, $r_1 = A^T b - B^T q_1$, where $q_1$ solves $\min_q \|B^T q - A^T b\|_2^2$; $p_1 = Q r_1$.
Step 2: For $k = 1, 2, 3, \ldots$ repeat steps 2.1–2.2.
  2.1: (Update)
    1. $u_k = p_k - B^T \tilde q_k$, where $\tilde q_k$ solves $\min_q \|B^T q - p_k\|_2^2$
    2. $\alpha_k = r_k^T Q r_k / (u_k^T A^T A u_k)$
    3. $x_{k+1} = x_k + \alpha_k u_k$
    4. $r_{k+1} = r_k - \alpha_k (A^T A u_k - B^T q_k)$, where $q_k$ solves $\min_q \|B^T q - A^T A u_k\|_2^2$
    5. $\beta_{k+1} = r_{k+1}^T Q r_{k+1} / (r_k^T Q r_k)$
    6. $p_{k+1} = Q r_{k+1} + \beta_{k+1} p_k$
  2.2: as in Algorithm CG-I(A, B, b, Q)
Note that in Step 2.1 we have to solve the lower level problem twice.
Lemma 3.5 (Computation of covariance matrix with preconditioned CG). Suppose that the Algorithm PCG-II(A, B, b, Q) converges after $\bar m$ iterations and we get the vectors $x_{\bar m}$ and $u_1, \ldots, u_{\bar m}$. Then the solution $x^*$ of the problem (2.2) and the corresponding covariance matrix are given by
\[
x^* = x_{\bar m}, \qquad C = U \operatorname{diag}(\gamma_k,\ k = 1, \ldots, \bar m)\, U^T,
\]
where $U = (u_1, \ldots, u_{\bar m})$, $\gamma_k = 1/\|A u_k\|_2^2$, $k = 1, \ldots, \bar m$.
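A sketch of Algorithm PCG-II in the same illustrative setting is given below; note the two unweighted lower level solves per iteration (one for $u_k = \mathcal{P} p_k$ and one for the projection of $A^T A u_k$), and the covariance matrix assembled from the $u_k$ as in Lemma 3.5.

    import numpy as np

    rng = np.random.default_rng(8)
    k, n, m = 8, 5, 2
    A, B = rng.standard_normal((k, n)), rng.standard_normal((m, n))
    b = rng.standard_normal(k)
    Q = np.diag(rng.uniform(0.5, 2.0, n))          # Q = W W^T (assumed diagonal here)

    def project(w):                                # unweighted lower level problem: P w
        return w - B.T @ np.linalg.lstsq(B.T, w, rcond=None)[0]

    m_bar = n - m
    x = np.zeros(n)
    r = project(A.T @ b)
    p = Q @ r
    U_cols, gammas = [], []

    for _ in range(m_bar):
        u = project(p)                             # first lower level solve: u_k = P p_k
        Au = A @ u
        alpha = (r @ (Q @ r)) / (Au @ Au)
        x = x + alpha * u
        r_new = r - alpha * project(A.T @ Au)      # second lower level solve
        U_cols.append(u)
        gammas.append(1.0 / (Au @ Au))             # gamma_k = 1 / ||A u_k||_2^2
        beta = (r_new @ (Q @ r_new)) / (r @ (Q @ r))
        p = Q @ r_new + beta * p
        r = r_new

    U = np.column_stack(U_cols)
    C_pcg2 = U @ np.diag(gammas) @ U.T             # Lemma 3.5

    K = np.block([[A.T @ A, B.T], [B, np.zeros((m, m))]])
    print(np.allclose(C_pcg2, np.linalg.inv(K)[:n, :n]))   # True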
3.2.2. LSQR method. As in the previous subsubsection, setting $\mathcal{A} = A \mathcal{P} W$, where $\mathcal{P}$ is the orthogonal projector onto the null space of the matrix $B$, we may apply the Algorithm LSQR($\mathcal{A}$, b) directly to solve the system (3.6). Note that now in step 2.1 we need to compute $\mathcal{A} v_k = A \mathcal{P} W v_k$, where $\mathcal{P} W v_k = W v_k - B^T q^*$, $q^* = \arg\min_q \|B^T q - W v_k\|_2^2$, and $\mathcal{A}^T u_{k+1} = W^T \mathcal{P} A^T u_{k+1}$, where $\mathcal{P} A^T u_{k+1} = A^T u_{k+1} - B^T q^*$ with $q^* = \arg\min_q \|B^T q - A^T u_{k+1}\|_2^2$, which means that we have to solve the lower level problem twice.
Suppose that after applying the Algorithm LSQR($A\mathcal{P}W$, b) we get the vectors $x_{\bar m}$, $d_1, \ldots, d_{\bar m}$; $v_1, \ldots, v_{\bar m}$. Let us show how we can use this information in order to compute the solution $x^*$ of the problem (2.2) and the corresponding matrix $C$.
By construction, $y^* = x_{\bar m}$ solves the system (3.6). Then $\bar x^* = \mathcal{M} y^*$ with $\mathcal{M} = W^{-1} \mathcal{P} W$ solves the problem (3.3), and $x^* = W \bar x^*$ is a solution to problem (2.2). Hence,
\[
x^* = W \mathcal{M} y^* = \mathcal{P} W y^* = \sum_{k=1}^{\bar m} \frac{\phi_k}{\rho_k}\, \mathcal{P} W p_k.
\]
Moreover, using properties of the LSQR method [7, 8] (see also the proof of Theorem 2.4), one can show that
\[
C = \mathcal{P} W \tilde D \tilde D^T W^T \mathcal{P}, \qquad \text{where } \tilde D = (d_1, \ldots, d_{\bar m}).
\]
As $\mathcal{P} W d_k = \frac{1}{\rho_k}\, \mathcal{P} W p_k$, in order to compute $x^*$ and $C$ we need the vectors
\[
\mathcal{P} W p_k, \quad k = 1, \ldots, \bar m. \tag{3.13}
\]
Since $p_1 = v_1$, $p_k = v_k - \theta_k p_{k-1}$, $k = 2, 3, \ldots$, and the vectors $\mathcal{P} W v_k$ have been calculated in the algorithm (see step 2.1 and the remarks at the beginning of this subsection), it is easy to modify the Algorithm LSQR($A\mathcal{P}W$, b) in order to compute recursively the vectors (3.13) and the vector $x^*$. Further, we modify the algorithm such that it makes use of the matrix $Q = W W^T$:
Algorithm Preconditioned LSQR-II(A, B, b, Q) (for solving problem (2.2))
Step 1: (Initialization)
  $\beta_1 u_1 = b$, $\alpha_1 v_1 = A^T u_1 - B^T q_0$, where $q_0$ solves $\min_q \|B^T q - A^T u_1\|_2^2$;
  $p_1 = Q v_1$, $g_0 = 0$, $x_0 = 0$, $\bar\phi_1 = \beta_1$, $\bar\rho_1 = \alpha_1$, $\theta_1 = 0$.
Step 2: For $k = 1, 2, 3, \ldots$ repeat steps 2.1–2.4.
  2.1: (Continue the bidiagonalization)
    1. $\hat v_k = Q v_k - B^T \hat q_k$, where $\hat q_k$ solves $\min_q \|B^T q - Q v_k\|_2^2$
    2. $\beta_{k+1} u_{k+1} = A \hat v_k - \alpha_k u_k$
    3. $\alpha_{k+1} v_{k+1} = A^T u_{k+1} - B^T q_k - \beta_{k+1} v_k$, where $q_k$ solves $\min_q \|B^T q - A^T u_{k+1}\|_2^2$
    4. $g_k = \hat v_k - \theta_k g_{k-1}$
  2.2: as in Algorithm LSQR(A, b)
  2.3: (Update)
    1. $d_k = (1/\rho_k)\, g_k$
    2. $x_k = x_{k-1} + (\phi_k / \rho_k)\, g_k$
    3. $p_{k+1} = Q v_{k+1} - \theta_{k+1} p_k$
  2.4: as in Algorithm LSQR-I(A, B, b, Q)
In this algorithm, the parameters $\beta_i$ and $\alpha_i$ are computed in such a way that $\|u_i\|_2 = \|v_i\|_Q = 1$. Note that in Step 2.1 we have to solve the lower level problem twice: to compute $\mathcal{P} Q v_k$ and $\mathcal{P} A^T u_{k+1}$.
Lemma 3.6 (Computation of covariance matrix). Suppose that the Algorithm PLSQR-II(A, B, b, Q) converges after $\bar m$ iterations and we get the vectors $x_{\bar m}$ and $d_1, \ldots, d_{\bar m}$. Then the solution $x^*$ of the problem (2.2) and the corresponding covariance matrix are given by
\[
x^* = x_{\bar m}, \qquad C = D D^T, \qquad D = (d_1, \ldots, d_{\bar m}).
\]
3.2.3. Remarks.
Remark 3.7. In the Algorithms PCG-I(A, B, b, Q), PCG-II(A, B, b, Q), PLSQR-I(A, B, b, Q) and PLSQR-II(A, B, b, Q), matrix products like $AW$ and $BW$ are never explicitly formed. Only the action of applying the preconditioner operation $Q := W W^T$ to a given vector needs to be computed. This property is important for systems resulting from PDE discretization. In this sense these algorithms are preferable to the corresponding Algorithms CG($\bar A\bar{\mathcal{P}}$, b), CG($A\mathcal{P}W$, b), LSQR($\bar A\bar{\mathcal{P}}$, b) and LSQR($A\mathcal{P}W$, b).
Remark 3.8. Obviously, the vectors $x_k$, $k = 1, 2, \ldots$, generated by the Algorithm M($\bar A\bar{\mathcal{P}}$, b) (or by the Algorithm M($A\mathcal{P}W$, b)) in exact arithmetic are connected with the vectors $\bar x_k$, $k = 1, 2, \ldots$, generated by the Algorithm PM-I(A, B, b, Q) (or by the Algorithm PM-II(A, B, b, Q)) as follows: $\bar x_k = W x_k$, $k = 1, 2, \ldots$. Here M = CG or M = LSQR. Thus, these vector sequences may be considered the same up to multiplication by $W$. However, we can show that the Algorithms PM-I(A, B, b, Q) and PM-II(A, B, b, Q) generate in general completely different vector sequences.
Remark 3.9. Algorithm CG-I(A, B, b, Q) and Algorithm 3.1 (preconditioned CG
in expanded form) from [4] are the same.
4. Conclusions
For solving constrained parameter estimation and optimal design problems, we need knowledge of the covariance matrix of the parameter estimates and of its derivatives. Hence, the development of effective methods for the representation and computation of the covariance matrix and its derivatives, based on iterative methods, is crucial for practical applications. In this paper we have shown that, when solving linearized constrained least squares problems by Krylov subspace methods, we obtain these matrices as a by-product, practically for free. Forthcoming research will be devoted to numerical aspects, including the choice of effective preconditioners and the efficient implementation of the described methods for parameter estimation and the design of optimal experiments in processes defined by partial differential equations.
References
[1] A. Battermann and E. W. Sachs. Block preconditioners for KKT systems in PDE-governed optimal control problems. In K. H. Hoffman, R. H. W. Hoppe, and V. Schulz, editors, Fast Solution of Discretized Optimization Problems, pages 1–18. ISNM, Int. Ser. Numer. Math. 138, 2001.
[2] H. G. Bock. Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen, volume 183 of Bonner Mathematische Schriften. University of Bonn, 1987.
[3] H. G. Bock, E. A. Kostina, and O. I. Kostyukova. Conjugate gradient methods for computing covariance matrices for constrained parameter estimation problems. SIAM Journal on Matrix Analysis and Applications, 29:626–642, 2007.
[4] H. S. Dollar and A. J. Wathen. Approximate factorization constraint preconditioners for saddle-point matrices. SIAM J. Sci. Comput., 27(5):1555–1572, 2006.
[5] C. T. Kelley. Iterative Methods for Linear and Nonlinear Equations. SIAM, Philadelphia, 1995.
[6] E. Kostina, M. Saunders, and I. Schierle. Computation of covariance matrices for constrained parameter estimation problems using LSQR. Technical report, Department of Mathematics and Computer Science, University of Marburg, 2008.
[7] C. C. Paige and M. A. Saunders. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Softw., 8(1):43–71, 1982.
[8] C. C. Paige and M. A. Saunders. LSQR: Sparse linear equations and least-squares. ACM Trans. Math. Softw., 8(2):195–209, 1982.
[9] T. Rees, H. S. Dollar, and A. J. Wathen. Optimal solvers for PDE-constrained optimization. Technical Report RAL-TR-2008-018, Rutherford Appleton Laboratory, 2008.
[10] I. Schierle. Computation of Covariance Matrices for Constrained Nonlinear Parameter Estimation Problems in Dynamic Processes Using Iterative Linear Algebra Methods. Diploma thesis, Universität Heidelberg, 2008.
Ekaterina Kostina
University of Marburg
Hans-Meerwein-Strasse
35032 Marburg
Germany
e-mail: [email protected]
Olga Kostyukova
Institute of Mathematics
Belarus Academy of Sciences
Surganov Str. 11
220072 Minsk
Belarus
e-mail: [email protected]