
SIAM J. SCI. COMPUT.
Vol. 19, No. 2, pp. 566–583, March 1998
© 1998 Society for Industrial and Applied Mathematics
STATISTICAL CONDITION ESTIMATION FOR LINEAR SYSTEMS∗
C. S. KENNEY† , A. J. LAUB‡ , AND M. S. REESE†
Abstract. The standard approach to measuring the condition of a linear system compresses
all sensitivity information into one number. Thus a loss of information can occur in situations
in which the standard condition number with respect to inversion does not accurately reflect the
actual sensitivity of a solution or particular entries of a solution. It is shown that a new method for
estimating the sensitivity of linear systems addresses these difficulties. The new procedure measures
the effects on the solution of small random changes in the input data and, by properly scaling the
results, obtains reliable condition estimates for each entry of the computed solution. Moreover, this
approach, which is referred to as small-sample statistical condition estimation, is no more costly
than the standard 1-norm or power method 2-norm condition estimates, and it has the advantage of
considerable flexibility. For example, it easily accommodates restrictions on, or structure associated
with, allowable perturbations. The method also has a rigorous statistical theory available for the
probability of accuracy of the condition estimates. However, it gives no estimate of an approximate
null vector for nearly singular systems. The theory of this approach is discussed along with several
illustrative examples.
Key words. conditioning, linear systems
AMS subject classifications. 15A06, 15A12, 65F05, 65F30, 65F35
PII. S1064827595282519
1. Introduction.
1.1. Limitations of standard condition theory. For the problem of solving
linear systems Ax = b with A ∈ Rn×n and b ∈ Rn, the standard theory uses the
condition number κ(A) = ‖A‖ ‖A−1‖ for a given consistent matrix norm ‖·‖. Most
condition estimation procedures approximate the 2-norm condition number κ2 (A)
(as in the case of some power-method–based estimates [8]) or the 1-norm condition
number κ1 (A) (as in the Hager–Higham method [20], [22]).
The standard condition number is useful and arises naturally in bounding the
relative error ‖∆x‖/‖x‖, where
(A + ∆A)(x + ∆x) = b + ∆b
for perturbations ∆A and ∆b of the data A and b, respectively. If ‖∆A‖/‖A‖ < µ,
‖∆b‖/‖b‖ < µ, and µκ(A) < ‖I‖, then the relation

    ∆x = (A + ∆A)−1 (∆b − ∆Ax)

gives [16, p. 82]

(1)    ‖∆x‖/‖x‖ ≤ ‖A−1‖ (‖∆b‖ + ‖∆A‖ ‖x‖) / ((‖I‖ − ‖A−1‖ ‖∆A‖) ‖x‖) ≤ 2µκ(A) / (‖I‖ − µκ(A)).
∗ Received by the editors March 3, 1995; accepted for publication (in revised form) April 25,
1996. This research was supported in part by National Science Foundation grant ECS-9120643, Air
Force Office of Scientific Research grant F49620-94-1-0104DEF, and Office of Naval Research grant
N00014-92-J-1706.
http://www.siam.org/journals/sisc/19-2/28251.html
† Department of Electrical and Computer Engineering, University of California, Santa Barbara,
CA 93106-9560 ([email protected], [email protected]).
‡ College of Engineering, University of California, Davis, CA 95616-5294 ([email protected]).
If x + ∆x is the computed solution to Ax = b using Gaussian elimination with partial
pivoting, then we expect µ to be a small multiple of the relative machine precision
εmach [16], [41]. We illustrate some difficulties with the standard approach by looking
briefly at three examples. These examples and subsequent examples in the sequel
were all computed using MATLAB 4.2a on a Sun SPARCstation for which

    εmach = 2^−52 ≈ 2.22 × 10^−16.
Example 1. Let

    A = [ 1      1+e ]        x = [ 1 ]               b = [ 1+d+de ]
        [ 1−e    1   ] ,          [ d ] ,    and          [ 1+d−e  ] ,
where d and e are parameters. The relative error bound in (1) gives a convenient
measure of the sensitivity of the problem Ax = b but this convenience has a price.
Entrywise sensitivity information is lost. In particular, small entries of x can be much
more sensitive than indicated by the norm bound in (1). This example illustrates the
effect. If we take d = 10^−5 and e = 10^−5, then κ2(A) = 4 × 10^10, and by Gaussian
elimination via MATLAB we find

    ‖x − xcomputed‖2 / ‖x‖2 = 1.58 × 10^−6,        2εmach κ2(A) / (1 − εmach κ2(A)) = 1.78 × 10^−5.
It would be a mistake to think that this indicates about four or five digits of accuracy in
the computed result. In fact, the second entry in the computed result for this example
barely has one digit of accuracy in a relative sense: x2(computed) = 8.8818 × 10^−6,
which is off by about 11% since the true value of x2 is 10^−5. See [24] and [25] for a
comprehensive treatment of componentwise sensitivity analysis.
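The computation in Example 1 is easy to reproduce. The following MATLAB fragment (ours, not part of the original example) forms the system for d = e = 10^−5 and compares the normwise and componentwise errors of the computed solution; the exact digits will vary slightly with the machine and MATLAB version:

    % Example 1: small normwise error, large error in the second component.
    d = 1e-5;  e = 1e-5;
    A = [1, 1+e; 1-e, 1];
    x_true = [1; d];
    b = [1 + d + d*e; 1 + d - e];                 % consistent with x_true in exact arithmetic
    x = A\b;                                      % Gaussian elimination with partial pivoting
    err_norm = norm(x_true - x)/norm(x_true)      % roughly 1e-6
    err_comp = abs(x_true - x)./abs(x_true)       % second entry roughly 0.1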
LINPACK [13] and LAPACK [1] provide condition estimators that are inherently
norm-based since they depend on routines that estimate ‖A−1‖ and ‖ |A−1| |v| ‖, respectively, where v is a vector. Given the ability to reliably and efficiently approximate the vector |A−1| |v|, a reasonable algorithm for computing an upper bound on
componentwise forward error can be obtained. For example, suppose
Ax = b
and
(A + ∆A)(x + ∆x) = b.
The componentwise forward error is bounded above by

(2)    |∆x| ≤ |A−1| |∆A| |x + ∆x|.
Of course, this approach requires that we know a bound on |∆A|, and it does not
necessarily provide tight bounds if ∆A has special structure. Suppose that the linear
system

(3)    A = [ 10.1   −1 ]    and    b = [ 9.1 ]
           [ −10     1 ]               [ −9  ]

is perturbed by

    ∆A = [  ∆a11    ∆a12 ]
         [ −∆a11   −∆a12 ] ,    where    |∆A| ≤ εmach [ 1   1 ]
                                                      [ 1   1 ] .
The special form of the perturbation may arise from constraints within a physical
system being modeled, for example. The linear system (3) is not very sensitive to
perturbations of this type, so the standard 2-norm condition number κ2(A) ≈ 10^3 is
somewhat misleading; in fact, the absolute forward error of the system is bounded
above by [0, 2εmach/(1 − εmach)]^T. The approach of (2) can be somewhat pessimistic, and in this
case gives a forward error bound of approximately [10^−13, 10^−12]^T.
Example 2. Let

    A = [ 1   d ]        x = [ 0   ]               b = [ 1 ]
        [ 2   d ] ,          [ 1/d ] ,    and          [ 1 ] ,
where d is a parameter. If we take d = 10^10, then κ2(A) = 2 × 10^10. However, solving
Ax = b in MATLAB gives the exact solution. In view of the large condition number
of A we might well be puzzled by the high accuracy in the computed result.
Example 3. Suppose that we are interested in the sensitivity of the inverse of A
but that we only allow structured perturbations ∆A. For example, suppose that A
and ∆A are given by

    A = [ 1    d ]        and        ∆A = [ 0   ∆d ]
        [ 0   −1 ]                        [ 0    0 ] ,

where d and ∆d are parameters. The condition number of A with respect to inversion
for this example is given approximately by κ2(A) ≈ d^2 for d large. However, this
example has some special properties that make it very insensitive to perturbations
of the form ∆A above: A−1 = A and (A + ∆A)−1 = A−1 + ∆A. That is, A is
perfectly well conditioned with respect to perturbations of this type since the size of
the perturbation in the inverse is exactly the same as the size of the perturbation in
A.
These three examples thus illustrate some typical limitations of standard approaches to assessing the condition of solving a linear system Ax = b:
1. Some standard condition estimators do not give entrywise sensitivity information.
2. The computed solution x may be determined rather accurately even though
κ2 (A) is large; most methods of reporting the condition of solving Ax = b do
not always give an indication of when this is occurring (see [6, section 9]).
3. The effects of any structural restrictions on perturbations of the input data
may be ignored, and thus standard condition numbers may give overly conservative estimates of the sensitivity of structured problems.
All of these limitations are addressed by Kenney and Laub [27] in a new approach
to estimating the condition of any computed result that depends smoothly on its input values; for simplicity, computed results of this type are referred to as general
matrix functions. This new procedure evaluates the matrix function with a small
random perturbation of the input argument and assesses the condition based on the
effect of the perturbation in the computed result. The success of this rather natural
approach is based on properly scaling the results to account for the action of random
inner products in p-dimensional space, where p is the number of input arguments of
the matrix function. Surprisingly, just a few (usually one) of these random perturbations suffice to accurately gauge the condition of the computed result. Because
of this, the method is referred to as the small-sample statistical condition estimation
(SCE) method. For general matrix functions, the extra function evaluation needed for
condition estimation doubles the computational effort. For linear systems, however,
the factorizations performed during the original solution of Ax = b reduce the cost
of the statistical condition estimate to the same level as that of standard condition
estimation procedures.
Only real matrices are treated here; extensions to complex matrices are obvious.
The primary goal of this paper is to illustrate the application of the SCE method
to the solution of linear systems. We do not discuss any of the global methods of
Demmel [10], [11] and Trefethen [39], although the methods we present are pertinent
to issues discussed there. For related studies of condition estimation for linear systems
see Chandrasekaran and Ipsen [6] and Lee [32]. Our approach is statistical, and as
such it differs from Skeel’s fundamental deterministic work on the condition of linear
systems [35]. Other stochastic approaches also do exist. For example, Stewart’s work
on stochastic perturbations [36] assumes a restricted class of perturbations of the
matrix function, and it is an inherently normwise analysis. The class of perturbations
allowed in SCE is much more general, and either componentwise or normwise results
can be obtained.
The SCE method also differs significantly from the statistical condition estimation
method of Chatelin in [7]. There, several perturbations of the input data are obtained
and the matrix function of interest is used to produce images of the perturbed data.
The standard deviation of these images is then normalized by both the size of the
computed solution and the size of the perturbation, resulting in a statistical estimate
of the condition of the matrix function. By contrast, the SCE method is accompanied
by a rigorous statistical foundation that provides an exact normalization constant and
an explicit bound on the probability of accuracy of the condition estimate.
The theoretical analysis of Chandrasekaran and Ipsen in [6] is representative of
the state of the art in deterministic approaches to estimating componentwise condition
for the solution of linear systems. Much of their analysis is, in fact, directed more to
the general problem of entrywise condition estimates for linear least squares problems
minx ‖Ax − b‖2, where A is m × n with m ≥ n. Their approach, specialized to square
nonsingular linear systems, is relatively inefficient. The application of SCE to linear
least squares problems is discussed in [29].
In the next section we outline the theory of the small-sample statistical condition
method. Section 3 discusses the application of SCE to linear systems. Section 4 looks
at estimating the effects of relative perturbations, and section 5 discusses applying the
SCE method to problems with restrictions on or structure associated with allowable
perturbations. Section 6 addresses the componentwise sensitivity of individual entries
in a matrix inverse and, finally, section 7 provides some comparison with standard
condition estimation procedures.
1.2. Notation. We define these operations for a matrix A = [a1, a2, . . . , an] =
[aij] ∈ Rn×n, aj ∈ Rn:

    vec(A) = [a1^T, a2^T, . . . , an^T]^T ∈ R^(n^2).

If v = [vk] ∈ R^(n^2), then A = unvec(v) sets the entries of A to aij = vi+(j−1)n.
For q ∈ R, |A|^q = [|aij|^q] ∈ Rn×n. For q = 1, this reduces to |A| = [|aij|] ∈ Rn×n.
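In MATLAB, vec and unvec are just column-major reshapes; a two-line illustration (ours):

    vec   = @(A) A(:);                 % stacks the columns: vec(A) in R^(n^2)
    unvec = @(v, n) reshape(v, n, n);  % inverse map: a_ij = v(i+(j-1)*n)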
2. Review of SCE. A function is locally sensitive if small changes in its argument can cause large changes in the value of the function. This leads us to ask
whether local sensitivity can be detected reliably by making small random changes in
the argument and looking for large relative changes in the function. The work in [27]
gives an affirmative answer to this question and provides a firm theoretical basis for
assessing the probability of accuracy in the resulting condition estimate.
The ideas behind this procedure are well illustrated for functions f : Rp → R.
In the following we assume that f is at least twice continuously differentiable. Local
sensitivity can be measured by the norm of the gradient of f which, for convenience,
we denote by the row vector v^T:

    v^T = ∇f(x) = [ ∂f(x)/∂x1, . . . , ∂f(x)/∂xp ].
We may expand f in a Taylor series about a point x ∈ Rp:

(4)    f(x + hz) = f(x) + h v^T z + O(h^2),

where h is a small positive number and z ∈ Rp has unit norm: ‖z‖^2 = Σ zi^2 = 1. It is
clear from (4) that if the norm of the gradient of f is large, then a small perturbation in
x can yield a large change in f. Alternatively, we can see from (4) that the inequality
|f(x + hz) − f(x)| ≤ h‖v‖ is true up to first order in h. This inequality points to
the real utility of the local condition number ‖v‖: for small perturbations it is a first-order bound on the magnification factor between argument errors (of norm h) and
the resulting function value error (of norm less than or equal to h‖v‖).
Hence, assuming the gradient is not known explicitly (as is usually the case),
we are faced with the problem of how to estimate its norm as a measure of local
sensitivity. Equation (4) is the key to estimating the norm of v. From (4) we see that
by evaluating f at x + hz we can use the quotient (f(x + hz) − f(x))/h to approximate
the inner product v^T z between the gradient and the vector z. If z is selected uniformly
and randomly from the unit p-sphere Sp−1 (henceforth denoted z ∈ U(Sp−1)), then it
is known (see discussion in [27]) that the expected value of |v^T z| is equal to the norm
of v times a scaling factor ωp, called the Wallis factor, that depends only on p:

(5)    E(|v^T z|) = ωp ‖v‖,

where ω1 = 1, ω2 = 2/π, and, for p > 2,

(6)    ωp = (1·3·5 · · · (p−2)) / (2·4·6 · · · (p−1))             for p odd,
       ωp = (2/π) (2·4·6 · · · (p−2)) / (3·5·7 · · · (p−1))       for p even.

The Wallis factor can be accurately approximated [27] by

(7)    ωp ≈ sqrt( 2 / (π(p − 1/2)) ).
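A small MATLAB function (our transcription of (6) and (7)) that returns the Wallis factor; the exact product form overflows in floating point for large p, so the approximation (7) is used beyond a modest cutoff:

    function w = wallis(p)
    % Wallis factor omega_p of (6), with the approximation (7) for large p.
    if p == 1
        w = 1;
    elseif p == 2
        w = 2/pi;
    elseif p <= 100 && mod(p, 2) == 1      % p odd:  (1*3*...*(p-2))/(2*4*...*(p-1))
        w = prod(1:2:p-2)/prod(2:2:p-1);
    elseif p <= 100                        % p even: (2/pi)*(2*4*...*(p-2))/(3*5*...*(p-1))
        w = (2/pi)*prod(2:2:p-2)/prod(3:2:p-1);
    else
        w = sqrt(2/(pi*(p - 0.5)));        % approximation (7)
    end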
The Newton quotient

    dz ≡ (f(x + hz) − f(x)) / h

satisfies

(8)    dz = v^T z + O(h).

From (5) and (8) we see that the true local condition number ‖v‖ is equal to the
expected value of |dz|/ωp plus a term of order h. We can typically take h small enough
that E(|dz|)/ωp is a good approximation to ‖v‖. In the linear equation case considered
in this paper, the Newton quotient approximation can be avoided by evaluating the
Fréchet derivative directly via the LU factors of A. It is shown in section 2 of [27]
that the condition estimator

    ν ≡ |v^T z| / ωp

is first order in the sense that the probability of a relative error in the estimate is
inversely proportional to the size of the error. That is, for γ > 1 we have

    Pr(‖v‖/γ ≤ ν ≤ γ‖v‖) ≥ 1 − 2/(πγ) + O(1/γ^2).
Additional function evaluations can improve the estimation procedure. Suppose that we obtain estimates ν1, ν2, . . . , νk ∈ R corresponding to orthogonal vectors
z1, z2, . . . , zk ∈ Sp−1 whose span S is uniformly and randomly selected from the space
of all k-dimensional subspaces of Rp (details in [27, section 3]). An easy way to obtain the vectors zi is to select z̃1, z̃2, . . . , z̃k with z̃i ∈ U(Sp−1) and then use a QR
decomposition to produce an orthonormal basis {z1, z2, . . . , zk} for their span. The
expected value of the norm of the projection of v onto S is

(9)    E( sqrt(|v^T z1|^2 + · · · + |v^T zk|^2) ) = E( sqrt((ωp ν1)^2 + · · · + (ωp νk)^2) ) = (ωp/ωk) ‖v‖,

where ωp and ωk are defined as in (6). We define the subspace condition estimator

    ν(k) ≡ (ωk/ωp) sqrt( |v^T z1|^2 + · · · + |v^T zk|^2 ),
which we see from (9) has expected value ‖v‖. The analysis in [27] shows that this
estimator has kth-order accuracy. By this we mean that a relative error of size γ in
the condition estimate occurs with probability proportional to γ^−k. For example,

    Pr(‖v‖/γ ≤ ν(2) ≤ γ‖v‖) ≈ 1 − π/(4γ^2),
    Pr(‖v‖/γ ≤ ν(3) ≤ γ‖v‖) ≈ 1 − 32/(3π^2 γ^3),
    Pr(‖v‖/γ ≤ ν(4) ≤ γ‖v‖) ≈ 1 − 81π^2/(512γ^4).
As an illustration, for k = 3, the estimator ν(3) has probability 0.9989 of being
within a relative factor of 10 of the true condition number kvk. In general, a relative
accuracy within an order of magnitude is sufficient for estimating the local condition
of a function.
These higher order estimators require extra function evaluations, and this may
be costly. However, there are situations in which extra function evaluations are very
cheap compared to the initial function evaluation. For example, this is the case for
any function that can be evaluated via a Newton procedure in which an initial guess
of the function value is refined by an iterative procedure. Calculating f (x) may take
many Newton steps, but usually only one Newton step is needed to find f (x+hz) since
f (x) serves as a good initial guess. The solution of algebraic Riccati equations falls
into this class of problems. The solutions of linear systems also admit very efficient
statistical condition estimates. This is discussed below.
The statistical condition estimation procedure outlined above applies to functions
f : Rp → R, i.e., to scalar-valued functions. What about vector-valued and matrix-valued functions? Both vector-valued and matrix-valued functions can be formulated
as maps f : Rp → Rq . The simplest way to extend the SCE method is to view
each entry of f as a scalar-valued function. In this way one extra function evaluation
f (x + hz) provides a local condition estimate for each entry of the computed result.
Entrywise condition estimates are of interest in many scientific problems (such as
radioactive decay chains) in which entries of small value (the fast decay isotopes)
need to be determined accurately.
Denote the gradient of the ith entry of f by vi^T. The ith Newton quotient satisfies

(10)    (fi(x + hz) − fi(x)) / h = vi^T z + O(h).
Thus one extra function evaluation of the form f (x + hz) is sufficient to provide
condition estimates for all the entries of f .
This may be interpreted in another way. The extension of Taylor’s theorem to
vector-valued functions gives

(11)    (f(x + hz) − f(x)) / h = Dz + O(h),
where the matrix D is the Fréchet derivative of f at x. Comparing (10) with (11)
shows that the gradient vector vi^T of the ith entry of f is just the ith row of the
Fréchet derivative D:

    D = [v1, v2, . . . , vq]^T.
See also [6]. In any case, the statistical condition estimate for the ith entry of f takes
the form νi = |(Dz)i| / ωp, where (Dz)i is the ith entry of the vector Dz.
Note that the SCE method estimates the norm of the gradient of the matrix
function, but it does not provide an estimate of the gradient direction. If this direction
is needed [17], [37], [38], then another condition estimation method may be more
appropriate. Research is currently under way to extend the SCE method so that it
will provide an estimate of the gradient direction.
3. Linear systems.
3.1. SCE for general linear systems. Consider linear systems of the form
L(A, X) = B,
where L is linear in A and X, and we are interested in the condition of the problem
of solving for X given A and B. For example, we could be solving a linear system of
the form AX = B or a Lyapunov equation A^T X + XA = B.
From the above discussion, it is sufficient for condition estimation to be able to
evaluate the Fréchet derivative (of the map (A, B) ↦ X) at a point Z. (For simplicity
of exposition we have switched to the case of mappings between matrices rather than
mappings between vectors. The two are equivalent as can be seen by writing the
matrix maps in terms of their Kronecker vector counterparts; see [26].) The Fréchet
derivative of the linear system satisfies the same linear system but with a different
right-hand side, an effect that was noted and exploited in [30].
To see this, suppose that we perturb (A, B) to (A + δ Â, B + δ B̂) and let X + δ X̂
be defined implicitly by
(12)
L(A + δ Â, X + δ X̂) = B + δ B̂.
Here we assume that the original system is uniquely solvable for X and that the
perturbing matrix direction pair (Â, B̂) has Frobenius norm 1. For sufficiently small
δ > 0, unique solvability is retained so that X̂ is well defined.
Expand (12) by linearity, cancel like terms, divide by δ, and take the limit as
δ → 0 to get the Fréchet relation
L(A, X̂) = B̂ − L(Â, X).
We refer to this as the Fréchet relation since the solution X̂ of this relation is the
Fréchet derivative of the map (A, B) ↦ X evaluated in the matrix direction (Â, B̂).
More general affine problems of the form
L(A1 , A2 , . . . , Am , X) = B,
such as the generalized Sylvester equation A1 XA2 + A3 XA4 = B, can be treated in
exactly the same manner to give the Fréchet relation
L(A1 , A2 , . . . , Am , X̂) = B̂ − L(Â1 , A2 , . . . , Am , X)
−L(A1 , Â2 , . . . , Am , X) − · · · − L(A1 , A2 , . . . , Âm , X).
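For instance, for the Lyapunov equation A^T X + XA = B the relation can be checked numerically. The MATLAB sketch below is ours, not from the paper; it uses a dense Kronecker-product solve, so it is meant only for small n. It solves the original equation, solves the Fréchet relation, and compares the result with a finite difference of the solution map:

    n = 4;  delta = 1e-7;
    A = randn(n);  B = randn(n);
    Ah = randn(n);  Bh = randn(n);
    s = norm([Ah(:); Bh(:)]);  Ah = Ah/s;  Bh = Bh/s;   % unit-norm direction
    K = kron(eye(n), A') + kron(A', eye(n));            % vec(A'X + XA) = K*vec(X)
    X = reshape(K\B(:), n, n);
    R = Bh - (Ah'*X + X*Ah);                  % right-hand side of the Frechet relation
    Xhat = reshape(K\R(:), n, n);             % Frechet derivative in direction (Ah, Bh)
    Kd = kron(eye(n), (A + delta*Ah)') + kron((A + delta*Ah)', eye(n));
    Xd = reshape(Kd\(B(:) + delta*Bh(:)), n, n);
    norm((Xd - X)/delta - Xhat, 'fro')        % O(delta)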
3.2. SCE for Ax = b. If we take B to be a vector b in the analysis of section
3.1 and set
f (A, b) = x = A−1 b
with L(A, x) = Ax, then the Fréchet derivative of f at (A, b) evaluated in the direction
(Â, b̂) is given by the vector
(13)    Df(A, b; Â, b̂) = A−1 (b̂ − Âx).
In practice, we cannot obtain the exact Fréchet derivative since we do not have the
exact solution x. However, condition estimates are usually needed to within only
an order of magnitude, and this level of accuracy is provided comfortably by an
approximate solution.
The following algorithm is based on the Fréchet derivative in (13). It takes the
matrix A ∈ Rn×n and the vector b ∈ Rn as inputs, and it outputs the relative condition
vector κrel ∈ Rn, which is an estimate of the relative sensitivity of each entry of the
computed solution vector x.
ALGORITHM 1 (ONE-SAMPLE CONDITION ESTIMATION FOR x = A−1 b).
1. Let each entry of à and b̃ be selected randomly and independently from a
normal distribution with mean 0 and variance 1 (henceforth, we say that each
entry is in N(0, 1)). Set  = Ã/‖[Ã, b̃]‖F and b̂ = b̃/‖[Ã, b̃]‖F.
2. Let p be the number of entries in the matrix [Â, b̂]. Approximate ωp using (7).
3. Calculate the absolute condition vector

    κabs = (1/ωp) |A−1 (b̂ − Âx)|.

Let the relative condition vector κrel be the vector κabs divided componentwise
by x, leaving entries of κabs corresponding to zero entries of x unchanged.
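A minimal MATLAB transcription of Algorithm 1 (the function name and the inline Wallis approximation are ours; in practice the LU factors of A from the original solve would be reused in place of the backslash solves):

    function kappa_rel = sce1(A, b)
    % One-sample SCE for the solution x of Ax = b (Algorithm 1).
    x = A\b;
    n = size(A, 1);
    At = randn(n);  bt = randn(n, 1);             % entries in N(0,1)
    s = norm([At, bt], 'fro');
    Ah = At/s;  bh = bt/s;                        % [Ah, bh] has unit Frobenius norm
    p = n*(n + 1);                                % number of entries in [Ah, bh]
    wp = sqrt(2/(pi*(p - 0.5)));                  % Wallis factor via (7)
    kappa_abs = abs(A\(bh - Ah*x))/wp;            % step 3
    kappa_rel = kappa_abs;
    nz = x ~= 0;
    kappa_rel(nz) = kappa_abs(nz)./abs(x(nz));    % componentwise division by x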
If more accuracy is desired in the condition estimates, then we can use k > 1
function evaluations. Each function evaluation consists of solving Az = w for a
different randomly generated right-hand side w. As in the above algorithm this can
be done efficiently once the initial LU factorization of A has been computed. The
complete algorithm is as follows.
ALGORITHM 2 (MULTIPLE-SAMPLE CONDITION ESTIMATION FOR x = A−1 b).
1. Generate (Ã1, b̃1), (Ã2, b̃2), . . . , (Ãk, b̃k) with entries in N(0, 1). Orthonormalize the Kronecker vectors corresponding to these matrix–vector pairs. This
can be done by converting each augmented matrix [Ãi, b̃i] to a vector w̃i with
the vec operation and using QR factorization to get an orthonormal matrix
of column vectors [w1, w2, . . . , wk]. Each wi can then be converted into the
desired augmented matrix [Âi, b̂i] with the unvec operation.
2. Let p be the number of entries in [Â1, b̂1]. Calculate ωk and ωp using approximation (7).
3. Calculate ui = A−1 (b̂i − Âi x), where x = A−1 b. Calculate the absolute
condition vector

    κabs = (ωk/ωp) ( |u1|^2 + |u2|^2 + · · · + |uk|^2 )^(1/2),

where the squaring and the square root are applied componentwise. Let the
relative condition vector κrel be the vector κabs divided componentwise by x,
leaving entries of κabs corresponding to zero entries of x unchanged.
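The multiple-sample version differs from the sce1 sketch above only in the orthonormalization of the k random directions and in how the k solves are combined; a sketch of those steps (again ours):

    function kappa_rel = scek(A, b, k)
    % k-sample SCE for x = A\b (Algorithm 2).
    n = size(A, 1);  p = n*(n + 1);
    x = A\b;
    [Q, ~] = qr(randn(p, k), 0);                  % orthonormal Kronecker vectors
    wp = sqrt(2/(pi*(p - 0.5)));
    if k == 1, wk = 1; else, wk = sqrt(2/(pi*(k - 0.5))); end
    S = zeros(n, 1);
    for i = 1:k
        Ah = reshape(Q(1:n*n, i), n, n);          % unvec column i into [Ah_i, bh_i]
        bh = Q(n*n+1:end, i);
        u = A\(bh - Ah*x);                        % u_i = A^(-1)(bh_i - Ah_i*x)
        S = S + u.^2;                             % accumulate |u_i|^2 componentwise
    end
    kappa_abs = (wk/wp)*sqrt(S);
    kappa_rel = kappa_abs;
    nz = x ~= 0;
    kappa_rel(nz) = kappa_abs(nz)./abs(x(nz));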
Let us look at how the SCE approach handles the difficulties encountered in
Example 1. There, the second entry of the computed solution had barely one digit of
accuracy. Algorithms 1 and 2 easily detect this sensitivity in the second component.
Letting r^(1) be the relative condition vector defined in Algorithm 1, we find

    εmach r^(1) = [8.38 × 10^−6; 1.10]    and    [ |x1 − x1(computed)|/|x1| ; |x2 − x2(computed)|/|x2| ] = [1.12 × 10^−6; 0.112].
We can compare this with r^(2) (from Algorithm 2 with k = 2) and rexact (computed
directly from finite differences for the function f(A, b) = A−1 b):

    εmach r^(2) = [2.57 × 10^−6; 0.337]    and    εmach rexact = [3.14 × 10^−6; 0.314].
That the SCE method gives good results for this problem is not surprising since it
is designed to accurately estimate the condition of each component of the computed
result. What may be surprising is how efficient this approach is. The cost of the
one-sample estimate is half that of a one-cycle power method estimate of κ2 (A) in
which two linear solves (Az = b and A^T y = z) are needed.
Before looking at how the SCE method can handle relative perturbations (which
are directly tied to the difficulties encountered in Example 2), we pause to show
that the SCE method can be used to estimate the Frobenius-norm condition number
κF(A) = ‖A‖F ‖A−1‖F. This is especially useful in very large linear problems (as discussed below) where standard condition estimates can be computationally inefficient.
3.3. Frobenius-norm condition estimation. The work in [19] shows that for
a given matrix M of order n, the expected value of ‖Mb‖F (= ‖Mb‖2; the reason
for using the Frobenius-norm notation for a vector becomes apparent below), with
b ∈ U(Sn−1), is equal to ωn ‖M‖F. Letting A−1 play the role of M, we can estimate
‖A−1‖F by first generating a vector b̃ with entries in N(0, 1) and then normalizing
b̃ to get b. Next solve Az = b so that z = A−1 b = Mb. Then ‖z‖F/ωn ≈ ‖A−1‖F
(see [19] for details; see also Lee [32]). The one-sample (k = 1) SCE Frobenius-norm
estimate for κF(A) is then given by κsce^(1) = ‖A‖F ‖z‖F / ωn.
The two-sample (k = 2) statistical estimate of κF(A) can be found by generating
b̃1 and b̃2 with entries in N(0, 1), and then orthonormalizing them to get b1 and
b2. Solve Azi = bi for i = 1, 2. Then ω2 ‖z‖F/ωn ≈ ‖A−1‖F, where z = [z1, z2].
The two-sample SCE Frobenius-norm estimate for κF(A) is then given by κsce^(2) =
ω2 ‖A‖F ‖z‖F / ωn. For estimates with k > 2, see [19].
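In MATLAB the two estimates are a few lines each (our transcription, for a given square matrix A):

    n = size(A, 1);
    wn = sqrt(2/(pi*(n - 0.5)));                  % omega_n via (7)
    bt = randn(n, 1);
    z = A\(bt/norm(bt));                          % b in U(S^(n-1)), z = A^(-1)b
    kF1 = norm(A, 'fro')*norm(z)/wn;              % one-sample estimate of kappa_F
    [Q, ~] = qr(randn(n, 2), 0);                  % orthonormal b1, b2
    Z = A\Q;                                      % columns solve A*z_i = b_i
    kF2 = (2/pi)*norm(A, 'fro')*norm(Z, 'fro')/wn;  % two-sample; omega_2 = 2/pi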
Let us illustrate the type of results obtained from applying a single run of this
procedure to a group of examples taken from [9] for which the LINPACK condition
estimator gives poor condition estimates. In the following, m is a parameter. Let

    B = [ 1   −1    −2m      0     ]        C = [ 1    1 − 2m^−2    −2    ]
        [ 0    1      m     −m     ]            [ 0     −m^−1       m^−1  ]
        [ 0    1     m+1   −(m+1)  ] ,          [ 0       0          1    ] ,
        [ 0    0      0      m     ]

    D = [  1    0   · · ·   0 ]
        [ −1    1    ⋱      ⋮ ]
        [  ⋮    ⋱    ⋱      0 ]
        [ −1   −1   · · ·   1 ]   (m × m).

TABLE 1
Condition estimates for three counterexample matrices from Cline and Rew [9].

    Matrix    m      κ1      1/rcond   condest   κF      κsce^(1)   κsce^(2)
    B         128    1e05    7e02      7e04      1e05    2e05       1e05
    C         128    8e02    7e00      8e02      5e02    7e02       6e02
    D         32     7e10    3e01      7e10      3e10    5e10       5e10
For reference, columns 3–5 of Table 1 give the 1-norm condition number κ1 and
MATLAB’s 1-norm estimates 1/rcond and condest. Columns 6–8 of Table 1 give
the Frobenius-norm condition number κF and the SCE Frobenius-norm condition
estimates for k = 1 and k = 2. These estimates are close to the true Frobenius-norm
condition numbers for these examples.
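The matrix D, for example, takes one line to build, and one random solve gives the one-sample Frobenius-norm estimate (a sketch under the same conventions as above):

    m = 32;
    D = eye(m) - tril(ones(m), -1);               % ones on the diagonal, -1 below
    wm = sqrt(2/(pi*(m - 0.5)));
    bt = randn(m, 1);
    kF_sce = norm(D, 'fro')*norm(D\(bt/norm(bt)))/wm   % compare with kappa_F = 3e10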
Deterministic condition estimators suffer from the drawback that specific examples can be produced to cause them to fail, i.e., give arbitrarily poor estimates. By
contrast, statistically based methods cannot be so fooled. In fact, an analysis is
available for SCE that predicts the probability of encountering a poor estimate. This
probability can be made arbitrarily small by taking sufficiently many random samples.
Naturally, statistically based methods (including SCE) are only as nondeterministic
as their random number generators permit. For an alternative analysis of the 2-norm
power method condition estimates with a random starting vector, see [31]. See also
Higham [21] and Dixon [12].
3.4. Condition estimates for large linear systems. Because of computational costs, most large linear systems of the form Ax = b are solved iteratively by
a variety of methods, including Jacobi and Gauss–Seidel iteration, successive overrelaxation (SOR) [3], [40], [43], and multigrid methods [4], as well as by methods that
rely on special properties of A such as the conjugate gradient method for symmetric
positive definite matrices.
A common feature of these methods is that the initial convergence from a starting vector x0 is usually rapid and then slows in a neighborhood of the exact solution
[4]. This phenomenon is exactly what we need to make efficient statistical condition
estimates. Conversely, condition estimation methods that rely on a sequence of accurate solutions (such as the power method or related 1-norm methods) are at a big
disadvantage since the LU factors of A are not available and each accurate solution
requires as much effort as solving the original system.
To explain in more detail, suppose that we are working with a “splitting” iterative
method of the form
xi+1 = b̂ + M xi .
(In the simplest splitting methods, A = D + N, where D is easily inverted; e.g., D
is diagonal or upper triangular, and b̂ = D−1 b with M = −D−1 N.) In this case the
error vector ei ≡ x − xi satisfies

    ei = M ei−1 = · · · = M^i e0.
If M has a set of distinct eigenvalues λ1, . . . , λn ordered by decreasing magnitude
(|λj| ≥ |λj+1|), and if the initial error is written as e0 = α1 v1 + · · · + αn vn, where vj
is the eigenvector associated with λj, then

    ei = α1 λ1^i v1 + · · · + αn λn^i vn.
The case of nondistinct eigenvalues can be analyzed in a similar manner, and we
conclude that a necessary and sufficient condition for convergence (i.e., ‖ei‖ → 0 for
all possible x0) is that the spectral radius ρ(M) ≡ max |λj| < 1. In many cases we
have a large spread in the magnitudes of the eigenvalues of M; this accounts for the
initial rapid convergence mentioned above. Even in the absence of this, however, we
can still get very efficient condition estimates using the SCE method. As the iteration
continues the error tends to decrease linearly with ‖ei+1‖ ≈ ρ(M)‖ei‖. In the SCE
method we need only to solve Az = b (for a random b) accurately enough to have
the norm of the iterate zi approximately equal to the norm of the true solution z. If
we start with z0 = 0, then e0 = z and ‖ei‖ ≈ ρ^i ‖e0‖ = ρ^i ‖z‖. Thus, to approximate
‖z‖ to within, say, 90% (which is more than sufficient for the purposes of condition
estimation) we need to take i iterates, where ρ^i < 10^−1. This is only one eighth the
cost of solving the original system to a relative accuracy of, say, 10^−8.
To illustrate, consider a model problem derived from the one-dimensional Poisson
equation [4]. Let A ∈ Rn×n be tridiagonal with twos on the main diagonal and
negative ones on the first superdiagonals and subdiagonals. Then A is positive definite
with eigenvalues

    λm = 4 sin^2( mπ / (2(n + 1)) ),        m = 1, . . . , n.
Now suppose that we wish to use the SCE method to estimate κF(A) via the
SOR method. From the definition of A we have ‖A‖F = sqrt(6n − 2). To estimate
‖A−1‖F, take b ∈ U(Sn−1) and solve for z in Az = b by using the SOR iteration with
the acceleration parameter γ = 2/(1 + sqrt(1 − ξ^2)), where ξ = 1 − 2 sin^2(π/(2(n + 1)))
as described in [40]. For n = 1000 we find ρ = γ − 1 = 0.993742741. If ‖ei+1‖
were exactly equal to ρ(M)‖ei‖, then we would expect to need to take about 367 ≈
ln(10^−1)/ln(ρ) iterations to estimate ‖z‖ to within a relative factor of about 90%.
By actual computation we find ‖z367‖ = 1.35 × 10^3 and ‖zexact‖ = 1.64 × 10^3. The
corresponding estimate of κF(A) is ‖A‖F ‖z367‖/ω1000 = 4.13 × 10^6, which compares
favorably with the true value κF(A) = 8.18 × 10^6. Note that the computational cost of
the estimate is minor in comparison with the 2935 steps needed to generate a solution
with relative accuracy of about 10^−8.
In the above discussion we have implicitly assumed that we know the value of the
spectral radius ρ = ρ(M ). In general, this value is not available to us but is readily
estimated by noting that the differences di ≡ xi − xi−1 satisfy di = M di−1. Thus the
ratio ri ≡ ‖di‖/‖di−1‖, which is easily computed during the iteration process, tends
to ρ as i increases.
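The sketch below (ours) carries this out for a Jacobi splitting of the one-dimensional Poisson matrix of the previous example: the running product of the ratios ri estimates ‖ei‖/‖e0‖, and the iteration is stopped once that estimate falls below 10^−1:

    n = 100;
    A = spdiags([-ones(n,1), 2*ones(n,1), -ones(n,1)], -1:1, n, n);
    bt = randn(n, 1);  b = bt/norm(bt);
    z = zeros(n, 1);  d_old = Inf;  decay = 1;
    for i = 1:100000
        z_new = z + (b - A*z)/2;                  % Jacobi step: D = 2I
        d = norm(z_new - z);
        if isfinite(d_old)
            decay = decay*min(d/d_old, 1);        % running estimate of ||e_i||/||e_0||
        end
        z = z_new;  d_old = d;
        if i > 10 && decay < 0.1, break, end
    end
    wn = sqrt(2/(pi*(n - 0.5)));
    kF_sce = sqrt(6*n - 2)*norm(z)/wn             % ||A||_F = sqrt(6n - 2)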
For other methods of estimating the condition of large linear systems see Grimes
and Lewis [18], who show how the LINPACK estimator of the 2-norm condition number can be implemented, with minor scaling modifications, for sparse systems that
are solved via factorization. The tridiagonal matrix example above is also treated in
[18]. Duff, Erisman, and Reid [14] also consider condition estimates for direct methods applied to sparse matrices. Wright [42] gives a discussion of condition number
convergence for matrices arising in finite-dimensional approximations of integral operators. See Higham [21] for additional references related to conditioning of large linear
systems.
4. Relative perturbations. If we are interested in measuring the sensitivity of
a function, then we may want the perturbations in the input data to be relative rather
than absolute; see [35]. For example, if the perturbations in the vector b are due to
truncation effects of finite precision representation in a computer, then we can expect
the individual entries of b to satisfy fl(bi ) = bi (1 + δi ), where |δi | is less than or equal
to the relative machine precision and fl(·) denotes the floating-point representation of
a real number. The sensitivity effects of relative perturbations can be measured by
the SCE method as described in [27]. This often leads to a more realistic indication
of the accuracy in a computed solution than the standard 2-norm condition number
might otherwise indicate. For example, inverse iteration requires the solution of an
extremely ill-conditioned linear system of equations, yet it can be performed quite
accurately; see [5] and [33] for examples and extended discussions. The SCE method,
incorporating the relative perturbation approach described below, reliably accounts
for the results obtained with inverse iteration. That inverse iteration generally gives
excellent results is evidenced by small SCE condition estimates despite the large 2-norm condition number of the inverse iteration coefficient matrix. For the purposes of
exposition it is convenient to represent the input data [A, b] as a vector z = vec([A, b])
and to work with the function f (z) = A−1 b.
The basic idea in measuring the sensitivity of a general function f = f (z) to
relative perturbations is to introduce an intermediate function gz = f ◦ τz, where
τz(y) ≡ (y1 z1, . . . , yn zn). Then gz(1, . . . , 1) = f(z1, . . . , zn), and absolute perturbations in the entries of y at y = (1, . . . , 1) are converted to relative perturbations in
the entries of z: if ỹi = yi + δi = 1 + δi, then ỹi zi = zi(1 + δi). Because of this we
may obtain relative sensitivity estimates for f by applying the statistical estimation
method to gz . See [27] for details. See also [1], [2], [23], [35] for related discussions of
“mixed” and “componentwise” condition numbers.
In view of the fact that τz is a diagonal scaling map of the form τz (y) = Tz y, where
Tz = diag(z1 , . . . , zn ), the Fréchet derivative of gz is simply the composition of Tz with
the Fréchet derivative of f : Dgz = Df ◦ Tz . This means that Algorithms 1 and 2 can
be used directly (in estimating the sensitivity of f = A−1 b to relative perturbations)
with the modification that, after generating and normalizing (or orthonormalizing as
in Algorithm 2) the random elements of Step 1, these elements are then multiplied
by the corresponding entries of A or b. The remaining steps in these algorithms are
unchanged. If we apply the SCE algorithm for relative perturbations to Example
2, we obtain a relative condition vector whose entries are on the order of one, in
agreement with the high accuracy of the computed solution and despite the large
2-norm condition number of A. Skeel's condition number is as large as the standard
2-norm condition number for the linear system of this example:

    cond(A, x) = ‖ |A−1| |A| |x| ‖∞ / ‖x‖∞ ≈ 10^10 ≈ κ2(A).
However, a componentwise version of Skeel’s condition number presented in [34] gives
results similar to SCE.
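In the algorithmic sketches given earlier, the relative-perturbation variant described in this section amounts to one extra line; a one-sample version (ours):

    function kappa_rel = sce1_relative(A, b)
    % One-sample SCE of Ax = b under relative (componentwise) perturbations.
    x = A\b;
    n = size(A, 1);
    At = randn(n);  bt = randn(n, 1);
    s = norm([At, bt], 'fro');
    Ah = (At/s).*A;  bh = (bt/s).*b;              % scale entrywise by the data
    wp = sqrt(2/(pi*(n*(n + 1) - 0.5)));
    kappa_abs = abs(A\(bh - Ah*x))/wp;
    kappa_rel = kappa_abs;
    nz = x ~= 0;
    kappa_rel(nz) = kappa_abs(nz)./abs(x(nz));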
5. Structured perturbations. Restrictions on the types of perturbations that
may occur in a linear system often arise as a consequence of some physical property
of the system or may be due to the solution process itself. For example, in radioactive
decay problems, isotope decays are not reversible; this shows up as an upper triangular
constraint on the form of the transition matrix. Similarly, when backsolving an upper
triangular system we see no rounding effects associated with the lower triangular
entries since these entries are never referenced in the solution process. Likewise, in
solving a symmetric problem by symmetry-preserving methods we expect to see only
symmetric perturbation effects.
It is convenient to describe structure restrictions on matrices in terms of a map
τ from a space M to Rn×n . The space M is determined by the variables needed
to account for the desired structure. For example, symmetric Toeplitz matrices have
constant values along their diagonals and hence are determined by 2n − 1 numbers.
Toeplitz structure can be obtained by taking M = R2n−1 and letting τ be the map

    τ(v1, . . . , v2n−1) = [ vn     vn+1    · · ·   v2n−1 ]
                          [ ⋮       ⋱       ⋱      ⋮     ]
                          [ v2      v3     · · ·   vn+1  ]
                          [ v1      v2     · · ·   vn    ] .
As another example, upper triangular matrices of order n are determined by
n(n + 1)/2 variables, so we can set M = Rn(n+1)/2 with τ(v1, . . . , vn(n+1)/2) = A,
where aij = vj+(i−1)(2n−i)/2 for i ≤ j and aij = 0 otherwise. (Symmetric matrices
can be handled in the same way except that we take aij = vi+(j−1)(2n−j)/2 for i > j.)
We may also want to impose structural restrictions on the allowable perturbations
in b, so we assume that M and the structure map τ are augmented to include b, i.e.,
τ has form τ (v) = [A, b] for v ∈ M.
Structure restrictions of this type are easily accommodated by again using the fact
that the Fréchet derivative of the composition of two maps is equal to the composition
of the two Fréchet derivatives, i.e.,
D(f ◦ τ ) = Df ◦ Dτ.
If τ is linear, as is the case in the preceding examples, then Dτ = τ and
D(f ◦ τ ) = Df ◦ Dτ
= Df ◦ τ.
The SCE method maintains the desired structure by working with perturbations in
the space of inputs M. This produces only slight changes in the basic small-sample
condition estimation algorithm. Suppose τ is a linear map from the space of inputs
M to Rn×n × Rn . By simply generating ṽ ∈ M randomly instead of à and b̃ in
Algorithm 1, we obtain an algorithm to estimate the condition of the map f ◦ τ ,
where f (A, b) = A−1 b and τ (v) = [A, b]. We obtain a multiple-sample algorithm in a
similar fashion by modifying Algorithm 2.
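As a concrete instance, the following sketch (ours) estimates the condition of the system (3) of section 1.1 under the correlated perturbation class given there, taking M = R^2 with τ(v1, v2) = [v1, v2; −v1, −v2] and, for brevity, no perturbation in b:

    A = [10.1, -1; -10, 1];  b = [9.1; -9];
    x = A\b;
    p = 2;                                        % dimension of the input space M
    v = randn(p, 1);  v = v/norm(v);              % v in U(S^(p-1))
    Ah = [v(1), v(2); -v(1), -v(2)];              % structured direction tau(v)
    wp = sqrt(2/(pi*(p - 0.5)));
    kappa_abs = abs(A\(-Ah*x))/wp                 % bh = 0: no perturbation in b

The first entry comes out at roundoff level and the second near 1, in line with the estimate κsce(A, b) ≈ [10^−16, 1]^T reported below.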
Remark. The SCE condition estimation algorithm for structured perturbations
can be especially useful for analyzing the condition of a block linear system. For
example, some of the diagonal blocks may be very ill conditioned compared to others,
or certain blocks may exhibit special types of perturbations.
Standard condition estimation methods may be much more conservative than
the SCE method for structured perturbations. Specifically, the Skeel componentwise
condition mentioned in the previous section can greatly overestimate the true sensitivity of a linear system with structured perturbations, as can the LAPACK condition
estimator. For example, neither the Skeel formula nor the LAPACK estimator can
detect when perturbations are highly correlated in a linear system even though an
extremely precise solution can often be obtained in such a situation. If we apply the
structured perturbation version of SCE to the linear system with special perturbations introduced in (3), we obtain a tight upper bound for the componentwise forward
error

    |∆x| ≤ 2 εmach κsce(A, b) ≈ [2 × 10^−31; 6 × 10^−16].
Recall from section 1.1 that the componentwise forward error for (3) computed with
(2) was very conservative. LAPACK also gives a conservative forward error estimate in double precision, with FERR ≈ 10^−12. Furthermore, the Skeel componentwise
condition estimate κskeel(A, b) ≈ [10^2, 10^3]^T is large compared to the SCE estimate
κsce(A, b) ≈ [10^−16, 1]^T.
6. Componentwise condition of matrix inverses. Computing the explicit
inverse of a matrix is generally very inefficient in the context of solving a system of
linear equations. However, there do exist certain situations in which we must explicitly
compute the inverse of a matrix, such as in the matrix sign function algorithm used
to find a basis for the stable invariant subspace of a Hamiltonian matrix associated
with an algebraic Riccati equation [28].
By using the results in section 5, it is straightforward to use the SCE method to
construct sensitivity estimates of the elements of A−1 that respect the structure of A.
For example, if A is symmetric, a symmetric perturbation of its elements is reflected
in symmetry of the sensitivity estimates for the elements of the symmetric matrix
A−1 . Similarly, when A is upper triangular, as in Example 3, the upper triangular
structure in A−1 can be respected. Many other interesting matrix structures are
preserved under inversion (but not all; e.g., a tridiagonal matrix does not generally
have a tridiagonal inverse).
It can also be of interest to estimate the sensitivity of specific elements of a matrix
inverse to see which components of various right-hand-side vectors are most strongly
magnified in determining the solution of a linear system. Structural restrictions on
allowable perturbations can significantly reduce the sensitivity of the inverse.
As an example, consider a companion-form matrix problem. Let A have zero
entries except for ones on the first superdiagonal and negative binomial coefficients
anj = −n!/ ((j − 1)!(n − j + 1)!) along the bottom row. The motivation for choosing
this matrix is that A has a multiple eigenvalue of order n at λ = −1 and small
perturbations of A suffice to move one of these eigenvalues to zero, thus making A
singular. For n = 20, we see that κ(A) = 1.4 × 10^11 and so A is already quite ill
conditioned with respect to inversion. Hence, for general perturbations in A and b,
we can expect large variations in A−1 b. However, if we allow perturbations only in b
and the last row of A, then the effect on the inverse can be minor. This surprising
result can be shown analytically. If A is a companion-form matrix with anj = vj and
v1 = −1, then A−1 has zeros everywhere except for ones on the first subdiagonal and
the top row, where (A−1)1,j = vj+1 with (A−1)1,n = −1. For example, for n = 4,

    A = [  0    1    0    0  ]                 [ v2   v3   v4   −1 ]
        [  0    0    1    0  ]                 [  1    0    0    0 ]
        [  0    0    0    1  ]    and A−1 =    [  0    1    0    0 ]
        [ −1   v2   v3   v4  ]                 [  0    0    1    0 ] .
Thus if the variations in A are restricted to the (n, j) entries for 1 < j ≤ n, then
the changes in the inverse are exactly the same as the variations in A but shifted to
the top row. This is perfect conditioning since there is no growth in the size of the
perturbations as we move from input to output.
This type of structure-dependent conditioning is easily detected using SCE methods. Using the structured condition estimator described in section 5 (with perturbations restricted to the entries of b and the last row of A) we find that κrel = 6.6 × 10^1,
whereas if we use the general perturbation condition estimate of Algorithm 1, we find
(for a random right-hand side b) that κrel = 2.3 × 10^7.
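Both the matrix and the structured estimate are straightforward to set up; a sketch (ours):

    n = 20;
    A = [zeros(n-1, 1), eye(n-1); zeros(1, n)];
    for j = 1:n
        A(n, j) = -nchoosek(n, j-1);              % a_nj = -n!/((j-1)!(n-j+1)!)
    end
    b = randn(n, 1);  x = A\b;
    v = randn(2*n, 1);  v = v/norm(v);            % perturb b and the last row of A
    Ah = zeros(n);  Ah(n, :) = v(1:n)';
    bh = v(n+1:2*n);
    wp = sqrt(2/(pi*(2*n - 0.5)));
    kappa_abs = abs(A\(bh - Ah*x))/wp;
    kappa_rel = kappa_abs./abs(x);                % x has no zero entries generically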
7. Comparison with existing algorithms. The performance of SCE is comparable to that of condition estimation algorithms found in standard software libraries.
MATLAB code written by the authors employing SCE methods for linear systems requires on the order of 2kn^2 floating-point multiply-adds, where k is the number of
SCE samples and n is the order of the linear system being solved. For the majority of
problems, k = 1 is quite acceptable. MATLAB's rcond, based on the LINPACK routine DGECO, requires on the order of 2n^2 floating-point multiply-adds. LAPACK's
condition estimator (STRCON) and MATLAB's condest, based on Higham's modification of Hager's method [20], [22], also require on the order of 2n^2 floating-point
multiply-adds.
Table 2 compares condition estimation methods applied to random matrices of
size n × n. The exact 2- and 1-norm condition numbers, κ2 and κ1 , are listed in
columns 2 and 3. Columns 4 and 5 give the 1-norm condition estimates 1/rcond
and condest, respectively, from MATLAB. SCE actually computes an estimate of the
Frobenius-norm condition number, κF = ‖A‖F ‖A−1‖F, which is listed in column 6
of the table. It is easily shown that

    (1/n) κ1(A) ≤ κF(A) ≤ n κ1(A),
    (1/n) κ2(A) ≤ κF(A) ≤ n κ2(A).
Columns 7–9 of Table 2 present the results of running the authors’ implementation of
the SCE algorithm in MATLAB with k = 1, 2, and 4 function evaluations, respectively.
TABLE 2
Condition estimates from cond, 1/rcond, condest, and SCE on random n × n matrices.

    n       κ2      κ1      1/rcond   condest   κF      κsce^(1)   κsce^(2)   κsce^(4)
    4       6.4e1   1.2e2   8.9e1     1.2e2     7.4e1   2.6e1      9.7e1      7.4e1
    16      3.3e2   1.2e3   7.7e2     1.2e3     7.5e2   8.0e2      1.2e3      9.9e2
    64      1.6e2   1.4e3   4.7e2     1.4e3     7.5e2   6.8e2      7.1e2      1.2e3
    256     3.9e3   7.6e4   2.8e4     7.6e4     3.1e4   4.2e3      5.7e4      2.0e4
    1024    5.0e3   1.8e5   5.2e4     1.7e5     9.0e4   1.4e5      1.0e5      6.5e4
TABLE 3
Averages over 1000 random matrices of log condition numbers for 2-norm, Frobenius-norm,
and the SCE method with one and four function evaluations.

    n      avg. ln(κ2)   ln(n) + 1.537   avg. ln(κF)   avg. ln(κsce^(1))   avg. ln(κsce^(4))
    4      2.58          2.92            2.91          2.84                2.91
    8      3.45          3.62            4.04          3.95                4.02
    16     4.22          4.31            5.10          4.98                5.08
    32     4.93          5.00            6.13          6.00                6.11
    64     5.69          5.70            7.22          7.08                7.19
    128    6.37          6.39            8.25          8.14                8.23
    256    7.02          7.08            9.24          9.16                9.21
    512    7.78          7.78            10.33         10.25               10.31
All condition estimates in the table are within about an order of magnitude of the
condition numbers that they estimate, which is usually sufficient in practice.
Edelman showed in [15] that if A ∈ Rn×n is random, then the expected value of
ln(κ2(A)) is asymptotically ln(n) + 1.537. Edelman's experiment was repeated and the
SCE method applied. Table 3 lists the results of averaging log condition numbers of
1000 random matrices A with N(0, 1) coefficients for various values of n. Column 2 of
the table gives averages for ln(κ2 (A)), and column 3 gives the value of ln(n)+1.537 as a
reference. Columns 4–6 give, respectively, averages of the natural logarithms of κF (A)
and the SCE method applied to A with one and four function evaluations. The table
is consistent with Edelman’s results, and it shows that the SCE method estimates the
Frobenius-norm condition quite well with just a single function evaluation. Of course,
SCE with four function evaluations estimates the Frobenius-norm condition number
better, but only slightly.
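The experiment is compactly expressed in MATLAB (our sketch; one value of n shown):

    n = 64;  T = 1000;
    wn = sqrt(2/(pi*(n - 0.5)));
    s2 = 0;  sF = 0;
    for t = 1:T
        A = randn(n);
        s2 = s2 + log(cond(A));                   % 2-norm condition number
        bt = randn(n, 1);
        z = A\(bt/norm(bt));
        sF = sF + log(norm(A, 'fro')*norm(z)/wn); % one-sample SCE of kappa_F
    end
    [s2/T, log(n) + 1.537, sF/T]                  % cf. row n = 64 of Table 3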
8. Conclusion. Small-sample SCE is a new method of estimating the condition
of general matrix functions. For the problem of estimating the condition of linear
systems, SCE has cost comparable to standard condition estimation methods, and
it has several major advantages over these methods. Its most important advantage
is that it provides, at no extra cost, a matrix of condition numbers rather than a
single condition number. Thus it provides componentwise condition estimates for
linear system solutions and can thereby indicate when a computed solution may be
determined accurately even if the condition number with respect to inversion of the
coefficient matrix is large. The method’s flexibility also enables it to handle structured systems. A rigorous error probability analysis exists for the SCE method. A
possible disadvantage is that it does not provide an estimate of the null vector in the
nearly singular case. If an approximate null vector is needed, then another condition
estimation method may be more appropriate. MATLAB software written by the authors incorporating the SCE method has performed successfully on a wide variety of
linear systems.
Acknowledgments. The authors are grateful to Shiv Chandrasekaran and an
anonymous reviewer for many useful comments during the preparation of this paper.
REFERENCES
[1] E. ANDERSON, Z. BAI, C. BISCHOF, J. DEMMEL, J. DONGARRA, J. DU CROZ, A. GREENBAUM,
S. HAMMARLING, A. MCKENNEY, S. OSTROUCHOV, AND D. SORENSEN, LAPACK Users’
Guide, 2nd ed., SIAM, Philadelphia, 1994.
[2] M. ARIOLI, J. DEMMEL, AND I. DUFF, Solving sparse linear systems with sparse backward
error, SIAM J. Matrix Anal. Appl., 10 (1989), pp. 165–190.
[3] K.E. ATKINSON, An Introduction to Numerical Analysis, 2nd ed., Wiley, New York, 1989.
[4] W.L. BRIGGS, A Multigrid Tutorial, SIAM, Philadelphia, 1987.
[5] S. CHANDRASEKARAN, When is a Linear System Ill-Conditioned?, Ph.D. thesis, Yale University, New Haven, CT, December 1994.
[6] S. CHANDRASEKARAN AND I.C.F. IPSEN, On the sensitivity of solution components in linear
systems of equations, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 93–112.
[7] F. CHATELIN AND V. FRAYSSÉ, Qualitative Computing: Elements of a Theory for Finite
Precision Computation, CERFACS, Orsay, France, 1993.
[8] A.K. CLINE, C.B. MOLER, G.W. STEWART, AND J.H. WILKINSON, An estimate for the condition number of a matrix, SIAM J. Numer. Anal., 16 (1979), pp. 368–375.
[9] A.K. CLINE AND R.K. REW, A set of counter-examples to three condition number estimators,
SIAM J. Sci. Statist. Comput., 4 (1983), pp. 602–611.
[10] J. DEMMEL, On condition numbers and the distance to the nearest ill-posed problem, Numer.
Math., 51 (1987), pp. 251–289.
[11] J. DEMMEL, The probability that a numerical analysis problem is difficult, Math. Comp., 50
(1988), pp. 449–480.
[12] J.D. DIXON, Estimating extremal eigenvalues and condition numbers of matrices, SIAM J.
Numer. Anal., 20 (1983), pp. 812–814.
[13] J.J. DONGARRA, J.R. BUNCH, C.B. MOLER, AND G.W. STEWART, LINPACK Users’ Guide,
SIAM, Philadelphia, 1979.
[14] I.S. DUFF, A.M. ERISMAN, AND J.K. REID, Direct Methods for Sparse Matrices, Oxford University Press, Oxford, 1986.
[15] A. EDELMAN, Eigenvalues and condition numbers of random matrices, SIAM J. Matrix Anal.
Appl., 9 (1988), pp. 543–560.
[16] G.H. GOLUB AND C.F. VAN LOAN, Matrix Computations, 2nd ed., Johns Hopkins University
Press, Baltimore, 1989.
[17] W.B. GRAGG AND G.W. STEWART, A stable variant of the secant method for solving nonlinear
equations, SIAM J. Numer. Anal., 13 (1976), pp. 889–903.
[18] R.G. GRIMES AND J.G. LEWIS, Condition number estimation for sparse matrices, SIAM J.
Sci. Statist. Comput., 2 (1981), pp. 384–388.
[19] T. GUDMUNDSSON, C.S. KENNEY, AND A.J. LAUB, Small-sample statistical estimates for matrix norms, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 776–792.
[20] W.W. HAGER, Condition estimates, SIAM J. Sci. Statist. Comput., 5 (1984), pp. 311–316.
[21] N.J. HIGHAM, A survey of condition number estimation for triangular matrices, SIAM Rev.,
29 (1987), pp. 575–596.
[22] N.J. HIGHAM, Algorithm 674: FORTRAN codes for estimating the one-norm of a real or complex
matrix, with applications to condition estimation, ACM Trans. Math. Software, 14 (1988),
pp. 381–396.
[23] N.J. HIGHAM, Iterative refinement enhances the stability of QR factorization methods for solving linear equations, BIT, 31 (1991), pp. 447–468.
[24] N.J. HIGHAM, A survey of componentwise perturbation theory in numerical linear algebra, in
Mathematics of Computation 1943–1993: A Half Century of Computational Mathematics
48, W. Gautschi, ed., Proc. Sympos. Appl. Math. 48, American Mathematical Society,
Providence, RI, 1994, pp. 49–77.
[25] N.J. HIGHAM, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, 1996.
[26] C.S. KENNEY AND A.J. LAUB, Condition estimates for matrix functions, SIAM J. Matrix Anal.
Appl., 10 (1989), pp. 191–209.
[27] C.S. KENNEY AND A.J. LAUB, Small-sample statistical condition estimates for general matrix
functions, SIAM J. Sci. Comput., 15 (1994), pp. 36–61.
[28] C.S. KENNEY AND A.J. LAUB, The matrix sign function, IEEE Trans. Automat. Control, 40
(1995), pp. 1330–1348.
[29] C.S. KENNEY, A.J. LAUB, AND M.S. REESE, Statistical condition estimation for linear least
squares, SIAM J. Matrix Anal. Appl., to appear.
[30] C.S. KENNEY, A.J. LAUB, AND S.C. STUBBERUD, Frequency response computation via rational
interpolation, IEEE Trans. Automat. Control, 38 (1993), pp. 1203–1213.
[31] J. KUCZYŃSKI AND H. WOŹNIAKOWSKI, Estimating the largest eigenvalue by the power and
Lanczos algorithms with a random start, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 1094–
1122.
[32] T.J. LEE, Adaptive condition estimation for matrices with rank-one modifications, in Proc.
SPIE, Advanced Signal Processing: Algorithms, Architectures and Implementations V,
vol. 2296, San Diego, CA, July 1994, pp. 376–387.
[33] G. PETERS AND J.H. WILKINSON, Inverse iteration, ill-conditioned equations and Newton’s
method, SIAM Rev., 21 (1979), pp. 339–360.
[34] J. ROHN, New condition numbers for matrices and linear systems, Computing, 41 (1989),
pp. 167–169.
[35] R. SKEEL, Scaling for numerical stability in Gaussian elimination, J. Assoc. Comput. Mach.,
26 (1979), pp. 494–526.
[36] G.W. STEWART, Stochastic perturbation theory, SIAM Rev., 32 (1990), pp. 579–610.
[37] G.W. STEWART, An updating algorithm for subspace tracking, IEEE Trans. Signal Processing,
40 (1992), pp. 1535–1541.
[38] G.W. STEWART, Updating a rank-revealing ULV decomposition, SIAM J. Matrix Anal. Appl.,
14 (1993), pp. 494–499.
[39] L.N. TREFETHEN, Pseudospectra of matrices, in Numerical Analysis 1991, D. Griffiths and
G. Watson, eds., Longman, Harlow, UK, 1992, pp. 234–266.
[40] R.S. VARGA, Matrix Iterative Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1962.
[41] J.H. WILKINSON, The Algebraic Eigenvalue Problem, Oxford University Press, Oxford, 1965.
[42] K. WRIGHT, Asymptotic properties of matrices associated with the quadrature method for
integral equations, in Treatment of Integral Equations by Numerical Methods, C. Baker
and G. Miller, eds., Academic Press, London, 1982, pp. 325–336.
[43] D.M. YOUNG, Iterative Solution of Large Linear Systems, Academic Press, New York, 1971.