SIAM J. SCI. COMPUT., Vol. 19, No. 2, pp. 566-583, March 1998
© 1998 Society for Industrial and Applied Mathematics

STATISTICAL CONDITION ESTIMATION FOR LINEAR SYSTEMS∗

C. S. KENNEY†, A. J. LAUB‡, AND M. S. REESE†

Abstract. The standard approach to measuring the condition of a linear system compresses all sensitivity information into one number. Thus a loss of information can occur in situations in which the standard condition number with respect to inversion does not accurately reflect the actual sensitivity of a solution or particular entries of a solution. It is shown that a new method for estimating the sensitivity of linear systems addresses these difficulties. The new procedure measures the effects on the solution of small random changes in the input data and, by properly scaling the results, obtains reliable condition estimates for each entry of the computed solution. Moreover, this approach, which is referred to as small-sample statistical condition estimation, is no more costly than the standard 1-norm or power method 2-norm condition estimates, and it has the advantage of considerable flexibility. For example, it easily accommodates restrictions on, or structure associated with, allowable perturbations. The method also has a rigorous statistical theory available for the probability of accuracy of the condition estimates. However, it gives no estimate of an approximate null vector for nearly singular systems. The theory of this approach is discussed along with several illustrative examples.

Key words. conditioning, linear systems

AMS subject classifications. 15A06, 15A12, 65F05, 65F30, 65F35

PII. S1064827595282519

∗ Received by the editors March 3, 1995; accepted for publication (in revised form) April 25, 1996. This research was supported in part by National Science Foundation grant ECS-9120643, Air Force Office of Scientific Research grant F49620-94-1-0104DEF, and Office of Naval Research grant N00014-92-J-1706. http://www.siam.org/journals/sisc/19-2/28251.html
† Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106-9560 ([email protected], [email protected]).
‡ College of Engineering, University of California, Davis, CA 95616-5294 ([email protected]).

1. Introduction.

1.1. Limitations of standard condition theory. For the problem of solving linear systems Ax = b with A ∈ R^{n×n} and b ∈ R^n, the standard theory uses the condition number κ(A) = ‖A‖ ‖A^{−1}‖ for a given consistent matrix norm ‖·‖. Most condition estimation procedures approximate the 2-norm condition number κ_2(A) (as in the case of some power-method-based estimates [8]) or the 1-norm condition number κ_1(A) (as in the Hager–Higham method [20], [22]). The standard condition number is useful and arises naturally in bounding the relative error ‖∆x‖/‖x‖, where (A + ∆A)(x + ∆x) = b + ∆b for perturbations ∆A and ∆b of the data A and b, respectively. If ‖∆A‖/‖A‖ < µ, ‖∆b‖/‖b‖ < µ, and µκ(A) < ‖I‖, then the relation ∆x = (A + ∆A)^{−1}(∆b − ∆Ax) gives [16, p. 82]

(1)    ‖∆x‖/‖x‖ ≤ ‖A^{−1}‖ (‖∆b‖ + ‖∆A‖ ‖x‖) / ((‖I‖ − ‖A^{−1}‖ ‖∆A‖) ‖x‖) ≤ 2µκ(A) / (‖I‖ − µκ(A)).

If x + ∆x is the computed solution to Ax = b using Gaussian elimination with partial pivoting, then we expect µ to be a small multiple of the relative machine precision ε_mach [16], [41].
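In the 2-norm, where ‖I‖_2 = 1, the bound (1) is straightforward to evaluate. The following MATLAB fragment is an illustrative sketch of ours (the perturbation level µ and the use of the exact condition number via cond, rather than an estimator, are assumptions made for the example):

    % Evaluating the norm-based bound (1) in the 2-norm (||I||_2 = 1).
    n = 50;
    A = randn(n);  b = randn(n,1);
    mu = 10*eps;                         % assumed relative perturbation level
    kappa = cond(A);                     % kappa_2(A) = ||A||_2 ||A^{-1}||_2
    bound = 2*mu*kappa/(1 - mu*kappa);   % first-order bound on ||dx||/||x||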
We illustrate some difficulties with the standard approach by looking briefly at three examples. These examples and subsequent examples in the sequel were all computed using MATLAB 4.2a on a Sun SPARCstation, for which ε_mach = 2^{−52} ≈ 2.22 × 10^{−16}.

Example 1. Let

    A = [ 1       1 + e ]        x = [ 1 ]         b = [ 1 + d + de ]
        [ 1 − e     1   ],           [ d ],  and       [ 1 + d − e  ],

where d and e are parameters. The relative error bound in (1) gives a convenient measure of the sensitivity of the problem Ax = b, but this convenience has a price: entrywise sensitivity information is lost. In particular, small entries of x can be much more sensitive than indicated by the norm bound in (1). This example illustrates the effect. If we take d = 10^{−5} and e = 10^{−5}, then κ_2(A) = 4 × 10^{10}, and by Gaussian elimination via MATLAB we find

    ‖x − x_computed‖_2 / ‖x‖_2 = 1.58 × 10^{−6},    2ε_mach κ_2(A) / (1 − ε_mach κ_2(A)) = 1.78 × 10^{−5}.

It would be a mistake to think that this indicates about four or five digits of accuracy in the computed result. In fact, the second entry of the computed result for this example barely has one digit of accuracy in a relative sense: x_2(computed) = 8.8818 × 10^{−6}, which is off by about 11% since the true value of x_2 is 10^{−5}. See [24] and [25] for a comprehensive treatment of componentwise sensitivity analysis.

LINPACK [13] and LAPACK [1] provide condition estimators that are inherently norm-based since they depend on routines that estimate ‖A^{−1}‖ and ‖ |A^{−1}| |v| ‖, respectively, where v is a vector. Given the ability to reliably and efficiently approximate the vector |A^{−1}| |v|, a reasonable algorithm for computing an upper bound on the componentwise forward error can be obtained. For example, suppose Ax = b and (A + ∆A)(x + ∆x) = b. The componentwise forward error is bounded above by

(2)    |∆x| ≤ |A^{−1}| |∆A| |x + ∆x|.

Of course, this approach requires that we know a bound on |∆A|, and it does not necessarily provide tight bounds if ∆A has special structure. Suppose that the linear system

(3)    A = [ 10.1  −1 ]         b = [ 9.1 ]
           [ −10    1 ]  and        [ −9  ]

is perturbed by

    ∆A = [  ∆a_11    ∆a_12 ]                       [ 1  1 ]
         [ −∆a_11   −∆a_12 ],  where |∆A| ≤ ε_mach [ 1  1 ].

The special form of the perturbation may arise from constraints within a physical system being modeled, for example. The linear system (3) is not very sensitive to perturbations of this type, so the standard 2-norm condition number κ_2(A) ≈ 10^3 is somewhat misleading; in fact, the absolute forward error of the system is bounded above by [0, 2ε_mach/(1 − ε_mach)]^T. The approach of (2) can be somewhat pessimistic, and in this case gives a forward error bound of approximately [10^{−13}, 10^{−12}]^T.

Example 2. Let

    A = [ 1  d ]        x = [  0  ]         b = [ 1 ]
        [ 2  d ],           [ 1/d ],  and       [ 1 ],

where d is a parameter. If we take d = 10^{10}, then κ_2(A) = 2 × 10^{10}. However, solving Ax = b in MATLAB gives the exact solution. In view of the large condition number of A we might well be puzzled by the high accuracy of the computed result.

Example 3. Suppose that we are interested in the sensitivity of the inverse of A but that we only allow structured perturbations ∆A. For example, suppose that A and ∆A are given by

    A = [ 1   d ]             ∆A = [ 0  ∆d ]
        [ 0  −1 ]  and             [ 0   0 ],

where d and ∆d are parameters. The condition number of A with respect to inversion for this example is given approximately by κ_2(A) ≈ d² for d large. However, this example has some special properties that make it very insensitive to perturbations of the form ∆A above: A^{−1} = A and (A + ∆A)^{−1} = A^{−1} + ∆A. That is, A is perfectly well conditioned with respect to perturbations of this type since the size of the perturbation in the inverse is exactly the same as the size of the perturbation in A.
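The examples above are easy to reproduce. For instance, the following MATLAB sketch (ours; the exact digits vary with the MATLAB version and platform) contrasts the normwise and entrywise errors of Example 1:

    % Example 1 with d = e = 1e-5: the norm bound hides the entrywise loss.
    d = 1e-5;  e = 1e-5;
    A = [1, 1+e; 1-e, 1];
    x = [1; d];
    b = [1 + d + d*e; 1 + d - e];
    xc = A\b;                          % Gaussian elimination, partial pivoting
    normwise  = norm(x - xc)/norm(x);  % of order 1e-6
    entrywise = abs(x - xc)./abs(x);   % second entry of order 1e-1
    kappa2    = cond(A);               % about 4e10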
These three examples thus illustrate some typical limitations of standard approaches to assessing the condition of solving a linear system Ax = b:

1. Some standard condition estimators do not give entrywise sensitivity information.
2. The computed solution x may be determined rather accurately even though κ_2(A) is large; most methods of reporting the condition of solving Ax = b do not always give an indication of when this is occurring (see [6, section 9]).
3. The effects of any structural restrictions on perturbations of the input data may be ignored, and thus standard condition numbers may give overly conservative estimates of the sensitivity of structured problems.

All of these limitations are addressed by Kenney and Laub [27] in a new approach to estimating the condition of any computed result that depends smoothly on its input values; for simplicity, computed results of this type are referred to as general matrix functions. This new procedure evaluates the matrix function with a small random perturbation of the input argument and assesses the condition based on the effect of the perturbation on the computed result. The success of this rather natural approach rests on properly scaling the results to account for the action of random inner products in p-dimensional space, where p is the number of input arguments of the matrix function. Surprisingly, just a few (usually one) of these random perturbations suffice to accurately gauge the condition of the computed result. Because of this, the method is referred to as the small-sample statistical condition estimation (SCE) method. For general matrix functions, the extra function evaluation needed for condition estimation doubles the computational effort. For linear systems, however, the factorizations performed during the original solution of Ax = b reduce the cost of the statistical condition estimate to the same level as that of standard condition estimation procedures. Only real matrices are treated here; extensions to complex matrices are obvious.

The primary goal of this paper is to illustrate the application of the SCE method to the solution of linear systems. We do not discuss any of the global methods of Demmel [10], [11] and Trefethen [39], although the methods we present are pertinent to issues discussed there. For related studies of condition estimation for linear systems see Chandrasekaran and Ipsen [6] and Lee [32]. Our approach is statistical, and as such it differs from Skeel's fundamental deterministic work on the condition of linear systems [35]. Other stochastic approaches do exist. For example, Stewart's work on stochastic perturbations [36] assumes a restricted class of perturbations of the matrix function, and it is an inherently normwise analysis. The class of perturbations allowed in SCE is much more general, and either componentwise or normwise results can be obtained.

The SCE method also differs significantly from the statistical condition estimation method of Chatelin in [7]. There, several perturbations of the input data are obtained and the matrix function of interest is used to produce images of the perturbed data. The standard deviation of these images is then normalized by both the size of the computed solution and the size of the perturbation, resulting in a statistical estimate of the condition of the matrix function.
By contrast, the SCE method is accompanied by a rigorous statistical foundation that provides an exact normalization constant and an explicit bound on the probability of accuracy of the condition estimate.

The theoretical analysis of Chandrasekaran and Ipsen in [6] is representative of the state of the art in deterministic approaches to estimating componentwise condition for the solution of linear systems. Much of their analysis is, in fact, directed more to the general problem of entrywise condition estimates for linear least squares problems min_x ‖Ax − b‖_2, where A is m × n with m ≥ n. Their approach, specialized to square nonsingular linear systems, is relatively inefficient. The application of SCE to linear least squares problems is discussed in [29].

In the next section we outline the theory of the small-sample statistical condition method. Section 3 discusses the application of SCE to linear systems. Section 4 looks at estimating the effects of relative perturbations, and section 5 discusses applying the SCE method to problems with restrictions on, or structure associated with, allowable perturbations. Section 6 addresses the componentwise sensitivity of individual entries in a matrix inverse and, finally, section 7 provides some comparison with standard condition estimation procedures.

1.2. Notation. We define these operations for a matrix A = [a_1, a_2, ..., a_n] = [a_ij] ∈ R^{n×n} with columns a_j ∈ R^n:

    vec(A) = [a_1^T, a_2^T, ..., a_n^T]^T ∈ R^{n²}.

    If v = [v_k] ∈ R^{n²}, then A = unvec(v) sets the entries of A to a_ij = v_{i+(j−1)n}.

    For q ∈ R, |A|^q = [|a_ij|^q] ∈ R^{n×n}. For q = 1, this reduces to |A| = [|a_ij|] ∈ R^{n×n}.
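In MATLAB, the operations of section 1.2 reduce to column stacking and reshaping; a small sketch of ours:

    % vec, unvec, and |A|^q for A in R^{n x n} (sketch).
    n = 3;  q = 2;
    A  = magic(n);
    v  = A(:);               % vec(A): columns stacked, v_{i+(j-1)n} = a_ij
    B  = reshape(v, n, n);   % unvec(v): recovers A
    Aq = abs(A).^q;          % |A|^q = [|a_ij|^q]; q = 1 gives |A|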
2. Review of SCE. A function is locally sensitive if small changes in its argument can cause large changes in the value of the function. This leads us to ask whether local sensitivity can be detected reliably by making small random changes in the argument and looking for large relative changes in the function. The work in [27] gives an affirmative answer to this question and provides a firm theoretical basis for assessing the probability of accuracy in the resulting condition estimate.

The ideas behind this procedure are well illustrated for functions f : R^p → R. In the following we assume that f is at least twice continuously differentiable. Local sensitivity can be measured by the norm of the gradient of f which, for convenience, we denote by the row vector v^T:

    v^T = ∇f(x) = [∂f(x)/∂x_1, ..., ∂f(x)/∂x_p].

We may expand f in a Taylor series about a point x ∈ R^p:

(4)    f(x + hz) = f(x) + h v^T z + O(h²),

where h is a small positive number and z ∈ R^p has unit norm: ‖z‖_2² = Σ z_i² = 1. It is clear from (4) that if the norm of the gradient of f is large, then a small perturbation in x can yield a large change in f. Alternatively, we can see from (4) that the inequality |f(x + hz) − f(x)| ≤ h‖v‖ is true up to first order in h. This inequality points to the real utility of the local condition number ‖v‖: for small perturbations it is a first-order bound on the magnification factor between argument errors (of norm h) and the resulting function value error (of norm less than or equal to h‖v‖). Hence, assuming the gradient is not known explicitly (as is usually the case), we are faced with the problem of how to estimate its norm as a measure of local sensitivity. Equation (4) is the key to estimating the norm of v.

From (4) we see that by evaluating f at x + hz we can use the quotient (f(x + hz) − f(x))/h to approximate the inner product v^T z between the gradient and the vector z. If z is selected uniformly and randomly from the unit p-sphere S_{p−1} (henceforth denoted z ∈ U(S_{p−1})), then it is known (see the discussion in [27]) that the expected value of |v^T z| is equal to the norm of v times a scaling factor ω_p, called the Wallis factor, that depends only on p:

(5)    E(|v^T z|) = ω_p ‖v‖,

where ω_1 = 1, ω_2 = 2/π, and, for p > 2,

(6)    ω_p = (1·3·5 ··· (p−2)) / (2·4·6 ··· (p−1))           for p odd,
       ω_p = (2/π) · (2·4·6 ··· (p−2)) / (3·5·7 ··· (p−1))   for p even.

The Wallis factor can be accurately approximated [27] by

(7)    ω_p ≈ √( 2 / (π(p − 1/2)) ).

The Newton quotient d_z ≡ (f(x + hz) − f(x))/h satisfies

(8)    d_z = v^T z + O(h).

From (5) and (8) we see that the true local condition number ‖v‖ is equal to the expected value of |d_z|/ω_p plus a term of order h. We can typically take h small enough that E(|d_z|)/ω_p is a good approximation to ‖v‖. In the linear equation case considered in this paper, the Newton quotient approximation can be avoided by evaluating the Fréchet derivative directly via the LU factors of A.

It is shown in section 2 of [27] that the condition estimator ν ≡ |v^T z|/ω_p is first order in the sense that the probability of a relative error in the estimate is inversely proportional to the size of the error. That is, for γ > 1 we have

    Pr(‖v‖/γ ≤ ν ≤ γ‖v‖) ≥ 1 − 2/(πγ) + O(1/γ²).

Additional function evaluations can improve the estimation procedure. Suppose that we obtain estimates ν_1, ν_2, ..., ν_k ∈ R corresponding to orthogonal vectors z_1, z_2, ..., z_k ∈ S_{p−1} whose span S is uniformly and randomly selected from the space of all k-dimensional subspaces of R^p (details in [27, section 3]). An easy way to obtain the vectors z_i is to select z̃_1, z̃_2, ..., z̃_k with z̃_i ∈ U(S_{p−1}) and then use a QR decomposition to produce an orthonormal basis {z_1, z_2, ..., z_k} for their span. The expected value of the norm of the projection of v onto S is

(9)    E( √(|v^T z_1|² + ··· + |v^T z_k|²) ) = E( √((ω_p ν_1)² + ··· + (ω_p ν_k)²) ) = (ω_p/ω_k) ‖v‖,

where ω_p and ω_k are defined as in (6). We define the subspace condition estimator

    ν(k) ≡ (ω_k/ω_p) √(|v^T z_1|² + ··· + |v^T z_k|²),

which we see from (9) has expected value ‖v‖. The analysis in [27] shows that this estimator has kth-order accuracy. By this we mean that a relative error of size γ in the condition estimate occurs with probability proportional to γ^{−k}. For example,

    Pr(‖v‖/γ ≤ ν(2) ≤ γ‖v‖) ≈ 1 − π/(4γ²),
    Pr(‖v‖/γ ≤ ν(3) ≤ γ‖v‖) ≈ 1 − 32/(3π²γ³),
    Pr(‖v‖/γ ≤ ν(4) ≤ γ‖v‖) ≈ 1 − 81π²/(512γ⁴).

As an illustration, for k = 3, the estimator ν(3) has probability 0.9989 of being within a relative factor of 10 of the true condition number ‖v‖. In general, a relative accuracy within an order of magnitude is sufficient for estimating the local condition of a function.
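For a concrete realization of (5)-(9), the following MATLAB sketch (ours; the test function f, the point x, and the step h are placeholder assumptions) computes the one-sample estimator ν and the subspace estimator ν(k):

    % One-sample and k-sample SCE for a smooth scalar f : R^p -> R (sketch).
    wallis = @(p) sqrt(2/(pi*(p - 0.5)));       % approximation (7) to omega_p
    f = @(y) y.'*y;                             % test function; grad f(x) = 2x
    x = randn(5,1);  p = numel(x);  h = 1e-7;   % true ||grad f(x)|| = 2*norm(x)
    z  = randn(p,1);  z = z/norm(z);            % z in U(S_{p-1})
    nu = abs(f(x + h*z) - f(x))/(h*wallis(p));  % first-order estimate of ||v||
    k = 3;
    [Z, R] = qr(randn(p, k), 0);                % orthonormal z_1,...,z_k (R unused)
    dz = zeros(k, 1);
    for i = 1:k
        dz(i) = (f(x + h*Z(:,i)) - f(x))/h;     % Newton quotients (8)
    end
    nuk = wallis(k)*norm(dz)/wallis(p);         % nu_(k); expected value ||v||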
These higher order estimators require extra function evaluations, and this may be costly. However, there are situations in which extra function evaluations are very cheap compared to the initial function evaluation. For example, this is the case for any function that can be evaluated via a Newton procedure in which an initial guess of the function value is refined by an iterative procedure. Calculating f(x) may take many Newton steps, but usually only one Newton step is needed to find f(x + hz) since f(x) serves as a good initial guess. The solution of algebraic Riccati equations falls into this class of problems. The solutions of linear systems also admit very efficient statistical condition estimates; this is discussed below.

The statistical condition estimation procedure outlined above applies to functions f : R^p → R, i.e., to scalar-valued functions. What about vector-valued and matrix-valued functions? Both vector-valued and matrix-valued functions can be formulated as maps f : R^p → R^q. The simplest way to extend the SCE method is to view each entry of f as a scalar-valued function. In this way one extra function evaluation f(x + hz) provides a local condition estimate for each entry of the computed result. Entrywise condition estimates are of interest in many scientific problems (such as radioactive decay chains) in which entries of small value (the fast decay isotopes) need to be determined accurately.

Denote the gradient of the ith entry of f by v_i^T. The ith Newton quotient satisfies

(10)    (f_i(x + hz) − f_i(x))/h = v_i^T z + O(h).

Thus one extra function evaluation of the form f(x + hz) is sufficient to provide condition estimates for all the entries of f. This may be interpreted in another way. The extension of Taylor's theorem to vector-valued functions gives

(11)    (f(x + hz) − f(x))/h = Dz + O(h),

where the matrix D is the Fréchet derivative of f at x. Comparing (10) with (11) shows that the gradient vector v_i^T of the ith entry of f is just the ith row of the Fréchet derivative D:

    D = [v_1, v_2, ..., v_q]^T.

See also [6]. In any case, the statistical condition estimate for the ith entry of f takes the form ν_i = |(Dz)_i|/ω_p, where (Dz)_i is the ith entry of the vector Dz. Note that the SCE method estimates the norm of the gradient of the matrix function, but it does not provide an estimate of the gradient direction. If this direction is needed [17], [37], [38], then another condition estimation method may be more appropriate. Research is currently under way to extend the SCE method so that it will provide an estimate of the gradient direction.

3. Linear systems.

3.1. SCE for general linear systems. Consider linear systems of the form L(A, X) = B, where L is linear in A and X, and we are interested in the condition of the problem of solving for X given A and B. For example, we could be solving a linear system of the form AX = B or a Lyapunov equation A^T X + XA = B. From the above discussion, it is sufficient for condition estimation to be able to evaluate the Fréchet derivative (of the map (A, B) ↦ X) at a point Z. (For simplicity of exposition we have switched to the case of mappings between matrices rather than mappings between vectors. The two are equivalent, as can be seen by writing the matrix maps in terms of their Kronecker vector counterparts; see [26].) The Fréchet derivative of the linear system satisfies the same linear system but with a different right-hand side, an effect that was noted and exploited in [30].

To see this, suppose that we perturb (A, B) to (A + δÂ, B + δB̂) and let X + δX̂ be defined implicitly by

(12)    L(A + δÂ, X + δX̂) = B + δB̂.

Here we assume that the original system is uniquely solvable for X and that the perturbing matrix direction pair (Â, B̂) has Frobenius norm 1. For sufficiently small δ > 0, unique solvability is retained so that X̂ is well defined. Expand (12) by linearity, cancel like terms, divide by δ, and take the limit as δ → 0 to get the Fréchet relation

    L(A, X̂) = B̂ − L(Â, X).

We refer to this as the Fréchet relation since the solution X̂ of this relation is the Fréchet derivative of the map (A, B) ↦ X evaluated in the matrix direction (Â, B̂).
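The Fréchet relation is easy to verify numerically for L(A, x) = Ax. In the MATLAB sketch below (ours, with random data), the derivative obtained by solving A x̂ = b̂ − Âx agrees with a finite-difference quotient to O(δ):

    % Verify the Frechet relation for L(A,x) = Ax (sketch).
    n = 6;
    A = randn(n);  b = randn(n,1);  x = A\b;
    Ah = randn(n);  bh = randn(n,1);
    s = norm([Ah, bh], 'fro');  Ah = Ah/s;  bh = bh/s;  % unit Frobenius direction
    xhat = A\(bh - Ah*x);               % solve L(A, xhat) = bhat - L(Ahat, x)
    delta = 1e-7;
    fd  = ((A + delta*Ah)\(b + delta*bh) - x)/delta;    % finite difference
    err = norm(fd - xhat);              % O(delta), confirming the relation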
More general affine problems of the form L(A_1, A_2, ..., A_m, X) = B, such as the generalized Sylvester equation A_1 X A_2 + A_3 X A_4 = B, can be treated in exactly the same manner to give the Fréchet relation

    L(A_1, A_2, ..., A_m, X̂) = B̂ − L(Â_1, A_2, ..., A_m, X) − L(A_1, Â_2, ..., A_m, X) − ··· − L(A_1, A_2, ..., Â_m, X).

3.2. SCE for Ax = b. If we take B to be a vector b in the analysis of section 3.1 and set f(A, b) = x = A^{−1}b with L(A, x) = Ax, then the Fréchet derivative of f at (A, b) evaluated in the direction (Â, b̂) is given by the vector

(13)    Df(A, b; Â, b̂) = A^{−1}(b̂ − Âx).

In practice, we cannot obtain the exact Fréchet derivative since we do not have the exact solution x. However, condition estimates are usually needed to within only an order of magnitude, and this level of accuracy is provided comfortably by an approximate solution.

The following algorithm is based on the Fréchet derivative in (13). It takes the matrix A ∈ R^{n×n} and the vector b ∈ R^n as inputs, and it outputs the relative condition vector κ_rel ∈ R^n, which is an estimate of the relative sensitivity of each entry of the computed solution vector x.

ALGORITHM 1 (ONE-SAMPLE CONDITION ESTIMATION FOR x = A^{−1}b).
1. Let each entry of à and b̃ be selected randomly and independently from a normal distribution with mean 0 and variance 1 (henceforth, we say that each entry is in N(0,1)). Set  = Ã/‖[Ã, b̃]‖_F and b̂ = b̃/‖[Ã, b̃]‖_F.
2. Let p be the number of entries in the matrix [Â, b̂]. Approximate ω_p using (7).
3. Calculate the absolute condition vector

    κ_abs = (1/ω_p) |A^{−1}(b̂ − Âx)|.

Let the relative condition vector κ_rel be the vector κ_abs divided componentwise by x, leaving entries of κ_abs corresponding to zero entries of x unchanged.

If more accuracy is desired in the condition estimates, then we can use k > 1 function evaluations. Each function evaluation consists of solving Az = w for a different randomly generated right-hand side w. As in the above algorithm, this can be done efficiently once the initial LU factorization of A has been computed. The complete algorithm is as follows.

ALGORITHM 2 (MULTIPLE-SAMPLE CONDITION ESTIMATION FOR x = A^{−1}b).
1. Generate (Ã_1, b̃_1), (Ã_2, b̃_2), ..., (Ã_k, b̃_k) with entries in N(0,1). Orthonormalize the Kronecker vectors corresponding to these matrix-vector pairs. This can be done by converting each augmented matrix [Ã_i, b̃_i] to a vector w̃_i with the vec operation and using a QR factorization to get an orthonormal matrix of column vectors [w_1, w_2, ..., w_k]. Each w_i can then be converted into the desired augmented matrix [Â_i, b̂_i] with the unvec operation.
2. Let p be the number of entries in [Â_1, b̂_1]. Calculate ω_k and ω_p using approximation (7).
3. Calculate u_i = A^{−1}(b̂_i − Â_i x), where x = A^{−1}b. Calculate the absolute condition vector

    κ_abs = (ω_k/ω_p) (|u_1|² + |u_2|² + ··· + |u_k|²)^{1/2},

where the absolute values, squares, and square root are applied entrywise. Let the relative condition vector κ_rel be the vector κ_abs divided componentwise by x, leaving entries of κ_abs corresponding to zero entries of x unchanged.
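A direct MATLAB transcription of Algorithm 1 might read as follows (our sketch, not the authors' code; a production version would reuse the LU factors of A for the second solve rather than calling backslash twice):

    % Algorithm 1: one-sample SCE for x = A^{-1}b (sketch).
    function krel = sce1(A, b)
        n = length(b);
        x = A\b;                           % computed solution
        G = randn(n, n+1);                 % [Atilde, btilde], entries in N(0,1)
        G = G/norm(G, 'fro');              % step 1: normalize to get [Ahat, bhat]
        Ahat = G(:, 1:n);  bhat = G(:, n+1);
        p  = n*(n+1);                      % step 2: number of perturbed entries
        wp = sqrt(2/(pi*(p - 0.5)));       % Wallis factor via (7)
        kabs = abs(A\(bhat - Ahat*x))/wp;  % step 3: absolute condition vector
        krel = kabs;
        nz = (x ~= 0);                     % divide only by nonzero entries of x
        krel(nz) = kabs(nz)./abs(x(nz));
    end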
Let us look at how the SCE approach handles the difficulties encountered in Example 1. There, the second entry of the computed solution had barely one digit of accuracy. Algorithms 1 and 2 easily detect this sensitivity in the second component. Letting r^{(1)} be the relative condition vector defined in Algorithm 1, we find

    ε_mach r^{(1)} = (8.38 × 10^{−6}, 1.10)^T   and   ( |x_1 − x_1(computed)|/|x_1|, |x_2 − x_2(computed)|/|x_2| )^T = (1.12 × 10^{−6}, 0.112)^T.

We can compare this with r^{(2)} (from Algorithm 2 with k = 2) and r_exact (computed directly from finite differences for the function f(A, b) = A^{−1}b):

    ε_mach r^{(2)} = (2.57 × 10^{−6}, 0.337)^T   and   ε_mach r_exact = (3.14 × 10^{−6}, 0.314)^T.

That the SCE method gives good results for this problem is not surprising since it is designed to accurately estimate the condition of each component of the computed result. What may be surprising is how efficient this approach is. The cost of the one-sample estimate is half that of a one-cycle power method estimate of κ_2(A), in which two linear solves (Az = b and A^T y = z) are needed.

Before looking at how the SCE method can handle relative perturbations (which are directly tied to the difficulties encountered in Example 2), we pause to show that the SCE method can be used to estimate the Frobenius-norm condition number κ_F(A) = ‖A‖_F ‖A^{−1}‖_F. This is especially useful in very large linear problems (as discussed below) where standard condition estimates can be computationally inefficient.

3.3. Frobenius-norm condition estimation. The work in [19] shows that for a given matrix M of order n, the expected value of ‖Mb‖_F (= ‖Mb‖_2; the reason for using the Frobenius-norm notation for a vector becomes apparent below), with b ∈ U(S_{n−1}), is equal to ω_n ‖M‖_F. Letting A^{−1} play the role of M, we can estimate ‖A^{−1}‖_F by first generating a vector b̃ with entries in N(0,1) and then normalizing b̃ to get b. Next solve Az = b so that z = A^{−1}b = Mb. Then ‖z‖_F/ω_n ≈ ‖A^{−1}‖_F (see [19] for details; see also Lee [32]). The one-sample (k = 1) SCE Frobenius-norm estimate for κ_F(A) is then given by κ_sce^{(1)} = ‖A‖_F ‖z‖_F/ω_n.

The two-sample (k = 2) statistical estimate of κ_F(A) can be found by generating b̃_1 and b̃_2 with entries in N(0,1), and then orthonormalizing them to get b_1 and b_2. Solve Az_i = b_i for i = 1, 2. Then ω_2 ‖z‖_F/ω_n ≈ ‖A^{−1}‖_F, where z = [z_1, z_2]. The two-sample SCE Frobenius-norm estimate for κ_F(A) is then given by κ_sce^{(2)} = ω_2 ‖A‖_F ‖z‖_F/ω_n. For estimates with k > 2, see [19].
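In MATLAB the one- and two-sample Frobenius-norm estimates take only a few lines. The sketch below is ours (with a random placeholder test matrix; in practice an existing LU factorization of A would be reused for the solves):

    % One- and two-sample SCE estimates of kappa_F(A) (sketch).
    n = 100;  A = randn(n);                  % placeholder test matrix
    wn = sqrt(2/(pi*(n - 0.5)));             % omega_n via (7)
    b1 = randn(n,1);  b1 = b1/norm(b1);      % b in U(S_{n-1})
    kf1 = norm(A,'fro')*norm(A\b1)/wn;       % k = 1 estimate of kappa_F
    B2 = orth(randn(n,2));                   % orthonormalized b_1, b_2
    kf2 = (2/pi)*norm(A,'fro')*norm(A\B2,'fro')/wn;   % k = 2, omega_2 = 2/pi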
Let us illustrate the type of results obtained from applying a single run of this procedure to a group of examples taken from [9] for which the LINPACK condition estimator gives poor condition estimates. In the following, m is a parameter. Let

    B = [ 1  −1  −2m    0  ]        C = [ −2      1      1 − 2m^{−2} ]
        [ 0   1    m   −m  ]            [  0   −m^{−1}     m^{−1}    ]
        [ 0   0    1   m+1 ]            [  0      0           1      ],
        [ 0   0    0    m  ],

and let D be the m × m lower triangular matrix with ones on the diagonal and −1 in every entry below the diagonal:

    D = [  1    0   ···   0 ]
        [ −1    1    ⋱    ⋮ ]
        [  ⋮    ⋱    ⋱    0 ]
        [ −1   ···  −1    1 ].

For reference, columns 3-5 of Table 1 give the 1-norm condition number κ_1 and MATLAB's 1-norm estimates 1/rcond and condest. Columns 6-8 of Table 1 give the Frobenius-norm condition number κ_F and the SCE Frobenius-norm condition estimates for k = 1 and k = 2. These estimates are close to the true Frobenius-norm condition numbers for these examples.

TABLE 1
Condition estimates for three counterexample matrices from Cline and Rew [9].

    Matrix     m      κ_1    1/rcond   condest    κ_F    κ_sce^(1)   κ_sce^(2)
      B       128    1e05     7e02      7e04     1e05      2e05        1e05
      C       128    8e02     7e00      8e02     5e02      7e02        6e02
      D        32    7e10     3e01      7e10     3e10      5e10        5e10

Deterministic condition estimators suffer from the drawback that specific examples can be produced to cause them to fail, i.e., to give arbitrarily poor estimates. By contrast, statistically based methods cannot be so fooled. In fact, an analysis is available for SCE that predicts the probability of encountering a poor estimate. This probability can be made arbitrarily small by taking sufficiently many random samples. Naturally, statistically based methods (including SCE) are only as nondeterministic as their random number generators permit. For an alternative analysis of the 2-norm power method condition estimates with a random starting vector, see [31]. See also Higham [21] and Dixon [12].

3.4. Condition estimates for large linear systems. Because of computational costs, most large linear systems of the form Ax = b are solved iteratively by a variety of methods, including Jacobi and Gauss–Seidel iteration, successive overrelaxation (SOR) [3], [40], [43], and multigrid methods [4], as well as by methods that rely on special properties of A, such as the conjugate gradient method for symmetric positive definite matrices. A common feature of these methods is that the initial convergence from a starting vector x_0 is usually rapid and then slows in a neighborhood of the exact solution [4]. This phenomenon is exactly what we need to make efficient statistical condition estimates. Conversely, condition estimation methods that rely on a sequence of accurate solutions (such as the power method or related 1-norm methods) are at a big disadvantage since the LU factors of A are not available and each accurate solution requires as much effort as solving the original system.

To explain in more detail, suppose that we are working with a "splitting" iterative method of the form x_{i+1} = b̂ + Mx_i. (In the simplest splitting methods, A = D + N, where D is easily inverted, e.g., D is diagonal or upper triangular, and b̂ = D^{−1}b with M = −D^{−1}N.) In this case the error vector e_i ≡ x − x_i satisfies e_i = Me_{i−1} = ··· = M^i e_0. If M has a set of distinct eigenvalues λ_1, ..., λ_n ordered by decreasing magnitude (|λ_j| ≥ |λ_{j+1}|), and if the initial error is written as e_0 = α_1 v_1 + ··· + α_n v_n, where v_j is the eigenvector associated with λ_j, then e_i = α_1 λ_1^i v_1 + ··· + α_n λ_n^i v_n. The case of nondistinct eigenvalues can be analyzed in a similar manner, and we conclude that a necessary and sufficient condition for convergence (i.e., ‖e_i‖ → 0 for all possible x_0) is that the spectral radius ρ(M) ≡ max_j |λ_j| be less than 1.

In many cases we have a large spread in the magnitudes of the eigenvalues of M; this accounts for the initial rapid convergence mentioned above. Even in the absence of this, however, we can still get very efficient condition estimates using the SCE method. As the iteration continues, the error tends to decrease linearly with ‖e_{i+1}‖ ≈ ρ(M)‖e_i‖. In the SCE method we need only to solve Az = b (for a random b) accurately enough to have the norm of the iterate z_i approximately equal to the norm of the true solution z. If we start with z_0 = 0, then e_0 = z and ‖e_i‖ ≈ ρ^i ‖e_0‖ = ρ^i ‖z‖. Thus, to approximate ‖z‖ to within, say, 90% (which is more than sufficient for the purposes of condition estimation) we need to take i iterates, where ρ^i < 10^{−1}. This is only one eighth the cost of solving the original system to a relative accuracy of, say, 10^{−8}.

To illustrate, consider a model problem derived from the one-dimensional Poisson equation [4]. Let A ∈ R^{n×n} be tridiagonal with twos on the main diagonal and negative ones on the first superdiagonal and subdiagonal. Then A is positive definite with eigenvalues

    λ_m = 4 sin²( mπ / (2(n+1)) ),    m = 1, ..., n.
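Since A is symmetric positive definite, ‖A^{−1}‖_F follows directly from these eigenvalues, so the exact κ_F is available for comparison; a MATLAB sketch (ours):

    % The 1-D Poisson model problem: exact kappa_F from the eigenvalues.
    n  = 1000;  ev = ones(n,1);
    A  = spdiags([-ev, 2*ev, -ev], -1:1, n, n);  % tridiag(-1, 2, -1)
    lam = 4*sin((1:n).'*pi/(2*(n+1))).^2;        % eigenvalues lambda_m above
    kF  = sqrt(6*n - 2)*sqrt(sum(1./lam.^2));    % ||A||_F ||A^{-1}||_F, ~8.2e6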
Now suppose that we wish to use the SCE method to estimate κ_F(A) via the SOR method. From the definition of A we have ‖A‖_F = √(6n − 2). To estimate ‖A^{−1}‖_F, take b ∈ U(S_{n−1}) and solve for z in Az = b by using the SOR iteration with the acceleration parameter γ = 2/(1 + √(1 − ξ²)), where ξ = 1 − 2 sin²(π/(2(n+1))), as described in [40]. For n = 1000 we find ρ = γ − 1 = 0.993742741. If ‖e_{i+1}‖ were exactly equal to ρ(M)‖e_i‖, then we would expect to need to take about 367 ≈ ln 10^{−1}/ln ρ iterations to estimate ‖z‖ to within a relative factor of about 90%. By actual computation we find ‖z_367‖ = 1.35 × 10³ and ‖z_exact‖ = 1.64 × 10³. The corresponding estimate of κ_F(A) is ‖A‖_F ‖z_367‖/ω_1000 = 4.13 × 10⁶, which compares favorably with the true value κ_F(A) = 8.18 × 10⁶. Note that the computational cost of the estimate is minor in comparison with the 2935 steps needed to generate a solution with relative accuracy of about 10^{−8}.

In the above discussion we have implicitly assumed that we know the value of the spectral radius ρ = ρ(M). In general, this value is not available to us, but it is readily estimated by noting that the differences d_i ≡ x_i − x_{i−1} satisfy d_i = Md_{i−1}. Thus the ratio r_i ≡ ‖d_i‖/‖d_{i−1}‖, which is easily computed during the iteration process, tends to ρ as i increases.

For other methods of estimating the condition of large linear systems see Grimes and Lewis [18], who show how the LINPACK estimator of the 2-norm condition number can be implemented, with minor scaling modifications, for sparse systems that are solved via factorization. The tridiagonal matrix example above is also treated in [18]. Duff, Erisman, and Reid [14] also consider condition estimates for direct methods applied to sparse matrices. Wright [42] gives a discussion of condition number convergence for matrices arising in finite-dimensional approximations of integral operators. See Higham [21] for additional references related to conditioning of large linear systems.
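The strategy of this section can be condensed into a short MATLAB sketch (ours; the optimal SOR parameter, the 10% tolerance, and the difference-ratio estimate of ρ are the illustrative choices discussed above, and the loop presumes the iteration converges):

    % SCE for kappa_F(A) via SOR: iterate only until ||z_i|| ~ ||z|| (to ~10%).
    n = 1000;  ev = ones(n,1);
    A  = spdiags([-ev, 2*ev, -ev], -1:1, n, n);  % 1-D Poisson matrix
    b  = randn(n,1);  b = b/norm(b);             % random unit right-hand side
    xi = cos(pi/(n+1));                          % = 1 - 2 sin^2(pi/(2(n+1)))
    g  = 2/(1 + sqrt(1 - xi^2));                 % acceleration parameter gamma
    Dm = spdiags(2*ev, 0, n, n);
    Lm = tril(A,-1);  Um = triu(A,1);
    z = zeros(n,1);  dold = Inf;
    for i = 1:10000
        znew = (Dm + g*Lm)\(g*b + ((1-g)*Dm - g*Um)*z);  % SOR step
        d = norm(znew - z);  rho = d/dold;       % ratio r_i estimates rho(M)
        z = znew;  dold = d;
        if i > 2 && rho < 1 && rho*d/(1 - rho) < 0.1*norm(z)
            break;                               % geometric error tail below ~10%
        end
    end
    wn = sqrt(2/(pi*(n - 0.5)));
    kF_est = norm(A,'fro')*norm(z)/wn;           % statistical estimate of kappa_F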
4. Relative perturbations. If we are interested in measuring the sensitivity of a function, then we may want the perturbations in the input data to be relative rather than absolute; see [35]. For example, if the perturbations in the vector b are due to truncation effects of finite precision representation in a computer, then we can expect the individual entries of b to satisfy fl(b_i) = b_i(1 + δ_i), where |δ_i| is less than or equal to the relative machine precision and fl(·) denotes the floating-point representation of a real number.

The sensitivity effects of relative perturbations can be measured by the SCE method as described in [27]. This often leads to a more realistic indication of the accuracy in a computed solution than the standard 2-norm condition number might otherwise indicate. For example, inverse iteration requires the solution of an extremely ill-conditioned linear system of equations, yet it can be performed quite accurately; see [5] and [33] for examples and extended discussions. The SCE method, incorporating the relative perturbation approach described below, reliably accounts for the results obtained with inverse iteration. That inverse iteration generally gives excellent results is evidenced by small SCE condition estimates despite the large 2-norm condition number of the inverse iteration coefficient matrix.

For the purposes of exposition it is convenient to represent the input data [A, b] as a vector z = vec([A, b]) and to work with the function f(z) = A^{−1}b. The basic idea in measuring the sensitivity of a general function f = f(z) to relative perturbations is to introduce an intermediate function g_z = f ∘ τ_z, where τ_z(y) ≡ (y_1 z_1, ..., y_n z_n). Then g_z(1, ..., 1) = f(z_1, ..., z_n), and absolute perturbations in the entries of y at y = (1, ..., 1) are converted to relative perturbations in the entries of z: if ỹ_i = y_i + δ_i = 1 + δ_i, then ỹ_i z_i = z_i(1 + δ_i). Because of this we may obtain relative sensitivity estimates for f by applying the statistical estimation method to g_z. See [27] for details. See also [1], [2], [23], [35] for related discussions of "mixed" and "componentwise" condition numbers.

In view of the fact that τ_z is a diagonal scaling map of the form τ_z(y) = T_z y, where T_z = diag(z_1, ..., z_n), the Fréchet derivative of g_z is simply the composition of T_z with the Fréchet derivative of f: Dg_z = Df ∘ T_z. This means that Algorithms 1 and 2 can be used directly (in estimating the sensitivity of f = A^{−1}b to relative perturbations) with the modification that, after generating and normalizing (or orthonormalizing as in Algorithm 2) the random elements of Step 1, these elements are then multiplied by the corresponding entries of A or b. The remaining steps in these algorithms are unchanged.

If we apply the SCE algorithm for relative perturbations to Example 2, we obtain a relative condition vector whose entries are on the order of one, in agreement with the high accuracy of the computed solution and despite the large 2-norm condition number of A. Skeel's condition number is as large as the standard 2-norm condition number for the linear system of this example:

    cond(A, x) = ‖ |A^{−1}| |A| |x| ‖_∞ / ‖x‖_∞ ≈ 10^{10} ≈ κ_2(A).

However, a componentwise version of Skeel's condition number presented in [34] gives results similar to SCE.
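In code, the relative-perturbation modification of Algorithm 1 amounts to one extra entrywise scaling. The following MATLAB sketch (ours) applies it to Example 2:

    % Relative-perturbation SCE applied to Example 2 (sketch).
    d = 1e10;
    A = [1, d; 2, d];  b = [1; 1];               % Example 2: x = [0; 1/d]
    x = A\b;
    n = length(b);
    G = randn(n, n+1);  G = G/norm(G, 'fro');    % normalized random direction
    Ahat = G(:,1:n).*A;                          % scale entrywise by A ...
    bhat = G(:,n+1).*b;                          % ... and by b (Dg_z = Df o T_z)
    p  = n*(n+1);  wp = sqrt(2/(pi*(p - 0.5)));
    kabs = abs(A\(bhat - Ahat*x))/wp;
    krel = kabs;  nz = (x ~= 0);
    krel(nz) = kabs(nz)./abs(x(nz));             % entries of order one, as claimed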
5. Structured perturbations. Restrictions on the types of perturbations that may occur in a linear system often arise as a consequence of some physical property of the system or may be due to the solution process itself. For example, in radioactive decay problems, isotope decays are not reversible; this shows up as an upper triangular constraint on the form of the transition matrix. Similarly, when backsolving an upper triangular system we see no rounding effects associated with the lower triangular entries since these entries are never referenced in the solution process. Likewise, in solving a symmetric problem by symmetry-preserving methods we expect to see only symmetric perturbation effects.

It is convenient to describe structure restrictions on matrices in terms of a map τ from a space M to R^{n×n}. The space M is determined by the variables needed to account for the desired structure. For example, Toeplitz matrices have constant values along their diagonals and hence are determined by 2n − 1 numbers. Toeplitz structure can be obtained by taking M = R^{2n−1} and letting τ be the map

    τ(v_1, ..., v_{2n−1}) = [ v_n    v_{n+1}   ···   v_{2n−1} ]
                            [  ⋮       ⋱        ⋱       ⋮     ]
                            [ v_2     v_3      ···   v_{n+1}  ]
                            [ v_1     v_2      ···     v_n    ],

that is, [τ(v)]_{ij} = v_{n−i+j}. As another example, upper triangular matrices of order n are determined by n(n+1)/2 variables, so we can set M = R^{n(n+1)/2} with τ(v_1, ..., v_{n(n+1)/2}) = A, where a_ij = v_{j+(i−1)(2n−i)/2} for i ≤ j and a_ij = 0 otherwise. (Symmetric matrices can be handled in the same way except that we take a_ij = v_{i+(j−1)(2n−j)/2} for i > j.)

We may also want to impose structural restrictions on the allowable perturbations in b, so we assume that M and the structure map τ are augmented to include b, i.e., τ has the form τ(v) = [A, b] for v ∈ M. Structure restrictions of this type are easily accommodated by again using the fact that the Fréchet derivative of the composition of two maps is equal to the composition of the two Fréchet derivatives, i.e.,

    D(f ∘ τ) = Df ∘ Dτ.

If τ is linear, as is the case in the preceding examples, then Dτ = τ and D(f ∘ τ) = Df ∘ Dτ = Df ∘ τ. The SCE method maintains the desired structure by working with perturbations in the space of inputs M. This produces only slight changes in the basic small-sample condition estimation algorithm. Suppose τ is a linear map from the space of inputs M to R^{n×n} × R^n. By simply generating ṽ ∈ M randomly instead of à and b̃ in Algorithm 1, we obtain an algorithm to estimate the condition of the map f ∘ τ, where f(A, b) = A^{−1}b and τ(v) = [A, b]. We obtain a multiple-sample algorithm in a similar fashion by modifying Algorithm 2.

Remark. The SCE condition estimation algorithm for structured perturbations can be especially useful for analyzing the condition of a block linear system. For example, some of the diagonal blocks may be very ill conditioned compared to others, or certain blocks may exhibit special types of perturbations. Standard condition estimation methods may be much more conservative than the SCE method for structured perturbations. Specifically, the Skeel componentwise condition number mentioned in the previous section can greatly overestimate the true sensitivity of a linear system with structured perturbations, as can the LAPACK condition estimator. For example, neither the Skeel formula nor the LAPACK estimator can detect when perturbations are highly correlated in a linear system, even though an extremely precise solution can often be obtained in such a situation.

If we apply the structured perturbation version of SCE to the linear system with special perturbations introduced in (3), we obtain a tight upper bound for the componentwise forward error:

    |∆x| ≤ 2ε_mach κ_sce(A, b) ≈ (2 × 10^{−31}, 6 × 10^{−16})^T.

Recall from section 1.1 that the componentwise forward error for (3) computed with (2) was very conservative. LAPACK also gives a conservative forward error estimate in double precision, with FERR ≈ 10^{−12}. Furthermore, the Skeel componentwise condition estimate κ_skeel(A, b) ≈ (10², 10³)^T is large compared to the SCE estimate κ_sce(A, b) ≈ (10^{−16}, 1)^T.
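For the system (3), the structured map is τ(δ_1, δ_2) = [∆A, 0] with ∆A as displayed in section 1.1, so M = R². A MATLAB sketch of the structured estimate (ours):

    % Structured SCE for system (3): dA = [d1 d2; -d1 -d2], db = 0.
    A = [10.1, -1; -10, 1];  b = [9.1; -9];
    x = A\b;                                  % x = [1; 1]
    v = randn(2,1);  v = v/norm(v);           % random unit direction in M = R^2
    Ahat = [ v(1),  v(2);                     % tau(v): structured direction
            -v(1), -v(2)];
    w2 = 2/pi;                                % omega_2, from (6)
    kabs = abs(A\(-Ahat*x))/w2;               % bhat = 0 here
    krel = kabs./abs(x);                      % roughly (1e-16, 1), cf. kappa_sce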
6. Componentwise condition of matrix inverses. Computing the explicit inverse of a matrix is generally very inefficient in the context of solving a system of linear equations. However, there do exist certain situations in which we must explicitly compute the inverse of a matrix, such as in the matrix sign function algorithm used to find a basis for the stable invariant subspace of a Hamiltonian matrix associated with an algebraic Riccati equation [28]. By using the results in section 5, it is straightforward to use the SCE method to construct sensitivity estimates of the elements of A^{−1} that respect the structure of A. For example, if A is symmetric, a symmetric perturbation of its elements is reflected in symmetry of the sensitivity estimates for the elements of the symmetric matrix A^{−1}. Similarly, when A is upper triangular, as in Example 3, the upper triangular structure in A^{−1} can be respected. Many other interesting matrix structures are preserved under inversion (but not all; e.g., a tridiagonal matrix does not generally have a tridiagonal inverse).

It can also be of interest to estimate the sensitivity of specific elements of a matrix inverse to see which components of various right-hand-side vectors are most strongly magnified in determining the solution of a linear system. Structural restrictions on allowable perturbations can significantly reduce the sensitivity of the inverse. As an example, consider a companion-form matrix problem. Let A have zero entries except for ones on the first superdiagonal and negative binomial coefficients a_{nj} = −n!/((j−1)!(n−j+1)!) along the bottom row. The motivation for choosing this matrix is that A has a multiple eigenvalue of order n at λ = −1, and small perturbations of A suffice to move one of these eigenvalues to zero, thus making A singular. For n = 20, we see that κ(A) = 1.4 × 10^{11}, and so A is already quite ill conditioned with respect to inversion. Hence, for general perturbations in A and b, we can expect large variations in A^{−1}b. However, if we allow perturbations only in b and the last row of A, then the effect on the inverse can be minor.

This surprising result can be shown analytically. If A is a companion-form matrix with a_{nj} = v_j and v_1 = −1, then A^{−1} has zero entries everywhere except for ones on the first subdiagonal and in the top row, where (A^{−1})_{1,j} = v_{j+1} for j < n and (A^{−1})_{1,n} = −1. For example, for n = 4,

    A = [  0   1   0   0 ]             A^{−1} = [ v_2  v_3  v_4  −1 ]
        [  0   0   1   0 ]                      [  1    0    0   0 ]
        [  0   0   0   1 ]    and               [  0    1    0   0 ]
        [ −1  v_2 v_3 v_4 ]                     [  0    0    1   0 ].

Thus if the variations in A are restricted to the (n, j) entries for 1 < j ≤ n, then the changes in the inverse are exactly the same as the variations in A but shifted to the top row. This is perfect conditioning since there is no growth in the size of the perturbations as we move from input to output. This type of structure-dependent conditioning is easily detected using SCE methods. Using the structured condition estimator described in section 5 (with perturbations restricted to the entries of b and the last row of A) we find that κ_rel = 6.6 × 10^1, whereas if we use the general perturbation condition estimate of Algorithm 1, we find (for a random right-hand side b) that κ_rel = 2.3 × 10^7.
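The companion-form computation is easy to repeat. A MATLAB sketch (ours) of the structured estimate with perturbations confined to b and the last row of A:

    % Structured SCE for the companion matrix of section 6 (sketch).
    n = 20;
    A = diag(ones(n-1,1), 1);                 % ones on the first superdiagonal
    for j = 1:n
        A(n,j) = -nchoosek(n, j-1);           % a_nj = -n!/((j-1)!(n-j+1)!)
    end
    b = randn(n,1);  x = A\b;
    v = randn(2*n,1);  v = v/norm(v);         % unit direction in M = R^{2n}
    Ahat = zeros(n);  Ahat(n,:) = v(1:n).';   % perturb only the last row of A
    bhat = v(n+1:2*n);
    wp = sqrt(2/(pi*(2*n - 0.5)));            % p = 2n structured inputs
    kabs = abs(A\(bhat - Ahat*x))/wp;
    krel = kabs./abs(x);                      % small, versus ~1e7 for Algorithm 1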
7. Comparison with existing algorithms. The performance of SCE is comparable to that of condition estimation algorithms found in standard software libraries. MATLAB code written by the authors employing SCE methods for linear systems requires on the order of k²n² floating-point multiply-adds, where k is the number of SCE samples and n is the order of the linear system being solved. For the majority of problems, k = 1 is quite acceptable. MATLAB's rcond, based on the LINPACK routine DGECO, requires on the order of 2n² floating-point multiply-adds. LAPACK's condition estimator (STRCON) and MATLAB's condest, based on Higham's modification of Hager's method [20], [22], also require on the order of 2n² floating-point multiply-adds.

Table 2 compares condition estimation methods applied to random matrices of size n × n. The exact 2- and 1-norm condition numbers, κ_2 and κ_1, are listed in columns 2 and 3. Columns 4 and 5 give the 1-norm condition estimates 1/rcond and condest, respectively, from MATLAB. SCE actually computes an estimate of the Frobenius-norm condition number, κ_F = ‖A‖_F ‖A^{−1}‖_F, which is listed in column 6 of the table. It is easily shown that

    (1/n) κ_1(A) ≤ κ_F(A) ≤ n κ_1(A),
    κ_2(A) ≤ κ_F(A) ≤ n κ_2(A).

Columns 7-9 of Table 2 present the results of running the authors' implementation of the SCE algorithm in MATLAB with k = 1, 2, and 4 function evaluations, respectively.

TABLE 2
Condition estimates from cond, 1/rcond, condest, and SCE on random n × n matrices.

      n      κ_2     κ_1    1/rcond  condest    κ_F    κ_sce^(1)  κ_sce^(2)  κ_sce^(4)
      4     6.4e1   1.2e2    8.9e1    1.2e2    7.4e1     2.6e1      9.7e1      7.4e1
     16     3.3e2   1.2e3    7.7e2    1.2e3    7.5e2     8.0e2      1.2e3      9.9e2
     64     1.6e2   1.4e3    4.7e2    1.4e3    7.5e2     6.8e2      7.1e2      1.2e3
    256     3.9e3   7.6e4    2.8e4    7.6e4    3.1e4     4.2e3      5.7e4      2.0e4
   1024     5.0e3   1.8e5    5.2e4    1.7e5    9.0e4     1.4e5      1.0e5      6.5e4

All condition estimates in the table are within about an order of magnitude of the condition numbers that they estimate, which is usually sufficient in practice.

Edelman showed in [15] that if A ∈ R^{n×n} is random, then the expected value of ln(κ_2(A)) is asymptotically ln(n) + 1.537. Edelman's experiment was repeated and the SCE method applied. Table 3 lists the results of averaging log condition numbers of 1000 random matrices A with N(0,1) coefficients for various values of n. Column 2 of the table gives averages of ln(κ_2(A)), and column 3 gives the value of ln(n) + 1.537 as a reference. Columns 4-6 give, respectively, averages of the natural logarithms of κ_F(A) and of the SCE estimates with one and four function evaluations.

TABLE 3
Averages over 1000 random matrices of log condition numbers for the 2-norm, the Frobenius norm, and the SCE method with one and four function evaluations.

      n    avg. ln(κ_2)   ln(n)+1.537   avg. ln(κ_F)   avg. ln(κ_sce^(1))   avg. ln(κ_sce^(4))
      4        2.58           2.92          2.91              2.84                 2.91
      8        3.45           3.62          4.04              3.95                 4.02
     16        4.22           4.31          5.10              4.98                 5.08
     32        4.93           5.00          6.13              6.00                 6.11
     64        5.69           5.70          7.22              7.08                 7.19
    128        6.37           6.39          8.25              8.14                 8.23
    256        7.02           7.08          9.24              9.16                 9.21
    512        7.78           7.78         10.33             10.25                10.31

The table is consistent with Edelman's results, and it shows that the SCE method estimates the Frobenius-norm condition quite well with just a single function evaluation. Of course, SCE with four function evaluations estimates the Frobenius-norm condition number better, but only slightly.

8. Conclusion. Small-sample SCE is a new method of estimating the condition of general matrix functions. For the problem of estimating the condition of linear systems, SCE has cost comparable to standard condition estimation methods, and it has several major advantages over these methods. Its most important advantage is that it provides, at no extra cost, a matrix of condition numbers rather than a single condition number. Thus it provides componentwise condition estimates for linear system solutions and can thereby indicate when a computed solution may be determined accurately even if the condition number with respect to inversion of the coefficient matrix is large. The method's flexibility also enables it to handle structured systems. A rigorous error probability analysis exists for the SCE method. A possible disadvantage is that it does not provide an estimate of the null vector in the nearly singular case. If an approximate null vector is needed, then another condition estimation method may be more appropriate. MATLAB software written by the authors incorporating the SCE method has performed successfully on a wide variety of linear systems.

Acknowledgments. The authors are grateful to Shiv Chandrasekaran and an anonymous reviewer for many useful comments during the preparation of this paper.

REFERENCES
[1] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen, LAPACK Users' Guide, 2nd ed., SIAM, Philadelphia, 1994.
[2] M. Arioli, J. Demmel, and I. Duff, Solving sparse linear systems with sparse backward error, SIAM J. Matrix Anal. Appl., 10 (1989), pp. 165-190.
[3] K. E. Atkinson, An Introduction to Numerical Analysis, 2nd ed., Wiley, New York, 1989.
[4] W. L. Briggs, A Multigrid Tutorial, SIAM, Philadelphia, 1987.
[5] S. Chandrasekaran, When is a Linear System Ill-Conditioned?, Ph.D. thesis, Yale University, New Haven, CT, December 1994.
[6] S. Chandrasekaran and I. C. F. Ipsen, On the sensitivity of solution components in linear systems of equations, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 93-112.
[7] F. Chatelin and V. Frayssé, Qualitative Computing: Elements of a Theory for Finite Precision Computation, CERFACS, Orsay, France, 1993.
[8] A. K. Cline, C. B. Moler, G. W. Stewart, and J. H. Wilkinson, An estimate for the condition number of a matrix, SIAM J. Numer. Anal., 16 (1979), pp. 368-375.
[9] A. K. Cline and R. K. Rew, A set of counter-examples to three condition number estimators, SIAM J. Sci. Statist. Comput., 4 (1983), pp. 602-611.
[10] J. Demmel, On condition numbers and the distance to the nearest ill-posed problem, Numer. Math., 51 (1987), pp. 251-289.
[11] J. Demmel, The probability that a numerical analysis problem is difficult, Math. Comp., 50 (1988), pp. 449-480.
[12] J. D. Dixon, Estimating extremal eigenvalues and condition numbers of matrices, SIAM J. Numer. Anal., 20 (1983), pp. 812-814.
[13] J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart, LINPACK Users' Guide, SIAM, Philadelphia, 1979.
[14] I. S. Duff, A. M. Erisman, and J. K. Reid, Direct Methods for Sparse Matrices, Oxford University Press, Oxford, 1986.
[15] A. Edelman, Eigenvalues and condition numbers of random matrices, SIAM J. Matrix Anal. Appl., 9 (1988), pp. 543-560.
[16] G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd ed., Johns Hopkins University Press, Baltimore, 1989.
[17] W. B. Gragg and G. W. Stewart, A stable variant of the secant method for solving nonlinear equations, SIAM J. Numer. Anal., 13 (1976), pp. 889-903.
[18] R. G. Grimes and J. G. Lewis, Condition number estimation for sparse matrices, SIAM J. Sci. Statist. Comput., 2 (1981), pp. 384-388.
[19] T. Gudmundsson, C. S. Kenney, and A. J. Laub, Small-sample statistical estimates for matrix norms, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 776-792.
[20] W. W. Hager, Condition estimates, SIAM J. Sci. Statist. Comput., 5 (1984), pp. 311-316.
[21] N. J. Higham, A survey of condition number estimation for triangular matrices, SIAM Rev., 29 (1987), pp. 575-596.
[22] N. J. Higham, Algorithm 674: FORTRAN codes for estimating the one-norm of a real or complex matrix, with applications to condition estimation, ACM Trans. Math. Software, 14 (1988), pp. 381-396.
[23] N. J. Higham, Iterative refinement enhances the stability of QR factorization methods for solving linear equations, BIT, 31 (1991), pp. 447-468.
[24] N. J. Higham, A survey of componentwise perturbation theory in numerical linear algebra, in Mathematics of Computation 1943-1993: A Half Century of Computational Mathematics, W. Gautschi, ed., Proc. Sympos. Appl. Math. 48, American Mathematical Society, Providence, RI, 1994, pp. 49-77.
[25] N. J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, 1996.
[26] C. S. Kenney and A. J. Laub, Condition estimates for matrix functions, SIAM J. Matrix Anal. Appl., 10 (1989), pp. 191-209.
[27] C. S. Kenney and A. J. Laub, Small-sample statistical condition estimates for general matrix functions, SIAM J. Sci. Comput., 15 (1994), pp. 36-61.
[28] C. S. Kenney and A. J. Laub, The matrix sign function, IEEE Trans. Automat. Control, 40 (1995), pp. 1330-1348.
[29] C. S. Kenney, A. J. Laub, and M. S. Reese, Statistical condition estimation for linear least squares, SIAM J. Matrix Anal. Appl., to appear.
[30] C. S. Kenney, A. J. Laub, and S. C. Stubberud, Frequency response computation via rational interpolation, IEEE Trans. Automat. Control, 38 (1993), pp. 1203-1213.
[31] J. Kuczyński and H. Woźniakowski, Estimating the largest eigenvalue by the power and Lanczos algorithms with a random start, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 1094-1122.
[32] T. J. Lee, Adaptive condition estimation for matrices with rank-one modifications, in Proc. SPIE, Advanced Signal Processing: Algorithms, Architectures and Implementations V, vol. 2296, San Diego, CA, July 1994, pp. 376-387.
[33] G. Peters and J. H. Wilkinson, Inverse iteration, ill-conditioned equations and Newton's method, SIAM Rev., 21 (1979), pp. 339-360.
[34] J. Rohn, New condition numbers for matrices and linear systems, Computing, 41 (1989), pp. 167-169.
[35] R. Skeel, Scaling for numerical stability in Gaussian elimination, J. Assoc. Comput. Mach., 26 (1979), pp. 494-526.
[36] G. W. Stewart, Stochastic perturbation theory, SIAM Rev., 32 (1990), pp. 579-610.
[37] G. W. Stewart, An updating algorithm for subspace tracking, IEEE Trans. Signal Processing, 40 (1992), pp. 1535-1541.
[38] G. W. Stewart, Updating a rank-revealing ULV decomposition, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 494-499.
[39] L. N. Trefethen, Pseudospectra of matrices, in Numerical Analysis 1991, D. Griffiths and G. Watson, eds., Longman, Harlow, UK, 1992, pp. 234-266.
[40] R. S. Varga, Matrix Iterative Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1962.
[41] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, Oxford, 1965.
[42] K. Wright, Asymptotic properties of matrices associated with the quadrature method for integral equations, in Treatment of Integral Equations by Numerical Methods, C. Baker and G. Miller, eds., Academic Press, London, 1982, pp. 325-336.
[43] D. M. Young, Iterative Solution of Large Linear Systems, Academic Press, New York, 1971.