submitted for publication, May 18, 2011
FAST INTERVAL MATRIX MULTIPLICATION
SIEGFRIED M. RUMP∗
Abstract. Several methods for the multiplication of point and interval matrices with interval result are discussed. Some
are based on new a priori estimates of the error of matrix products in floating-point rounding to nearest. All methods are
much faster than the classical method because almost no switching of the rounding mode is necessary, and they use only highly
optimized BLAS3 routines. We discuss several possibilities to trade overestimation against computational effort. Numerical
examples focusing in particular on applications of interval matrix multiplications are presented.
Key words. interval arithmetic, rounding mode, matrix product, BLAS, unit in the first place (ufp), error analysis
AMS subject classifications. 15-04, 65G99, 65-04
1. Introduction and Notation. Let F denote the set of binary floating-point numbers according to the
IEEE 754 floating-point standard [5]. Throughout the paper we assume that no overflow occurs, but allow
underflow.
The relative rounding error unit, the distance from 1.0 to the next smaller¹ floating-point number, is denoted
by eps, and the underflow unit by eta, that is, the smallest positive (subnormal) floating-point number. For
IEEE 754 double precision we have eps = 2^-53 and eta = 2^-1074. Denote by realmin := (1/2) eps^-1 eta the
smallest positive normalized floating-point number.
We denote by fl□ : R → F a rounding to nearest, that is,
(1.1)    x ∈ R :  |fl□(x) − x| = min{|f − x| : f ∈ F} .
Any rounding of the tie can be used without jeopardizing the following estimates, only (1.1) must hold true.
This implies that for rounding downwards or upwards or towards zero, all bounds are true mutatis mutandis
using 2eps instead of eps. In particular this may be useful for Cell processors [11].
Moreover, we need the roundings downwards and upwards defined by
(1.2)    x ∈ R :  fl∇(x) := max{f ∈ F : f ≤ x}   and   fl△(x) := min{f ∈ F : f ≥ x} .
Interval quantities with floating-point bounds, i.e. the set of scalar intervals, interval vectors and matrices
are defined by
(1.3)    A ∈ IF^{m×n}  :⇔  A ∈ { [A1, A2] : A1, A2 ∈ F^{m×n} and A1 ≤ A2 } ,
where here and in the following interval quantities are written in bold letters, and comparison is to be
understood componentwise. Such an interval quantity represents the set of all real matrices within the
bounds, i.e.
[A1 , A2 ] = { A ∈ Rm×n : A1 ≤ A ≤ A2 } .
Any A ∈ Fm×n can be identified with its “point interval” A := [A, A].
∗ Institute for Reliable Computing, Hamburg University of Technology, Schwarzenbergstraße 95, Hamburg 21071, Germany,
and Visiting Professor at Waseda University, Faculty of Science and Engineering, 3–4–1 Okubo, Shinjuku-ku, Tokyo 169–8555,
Japan ([email protected]).
1 Note that sometimes the distance from 1.0 to the next larger floating-point number is used; for example, Matlab adopts
this rule.
We also use an alternative representation for A = [A1, A2] ∈ IF^{m×n}. Define
(1.4)    mA := (A1 + A2)/2 ∈ R^{m×n}   and   rA := (A2 − A1)/2 ∈ R^{m×n} ,
so that A1 = mA − rA and A2 = mA + rA. Note that, in general, mA, rA ∉ F^{m×n}. Then besides the
infimum-supremum representation (1.3) we also use the midpoint-radius representation
(1.5)
A = ⟨mA, rA⟩ := { A ∈ Rm×n : mA − rA ≤ A ≤ mA + rA} .
Interval operations between compatible interval quantities A, B are defined by
(1.6)    A ◦ B := ∩ { C : A ◦ B ∈ C for all A ∈ A, B ∈ B }
with C of appropriate dimension and ◦ ∈ {+, −, ·} (in this paper we do not need division). Note that C has
floating-point bounds. For scalar interval quantities a = [a1 , a2 ] and b = [b1 , b2 ], define c = [c1 , c2 ] by
c1 := min( fl∇(a1 ◦ b1), fl∇(a1 ◦ b2), fl∇(a2 ◦ b1), fl∇(a2 ◦ b2) ) ∈ F ,
c2 := max( fl△(a1 ◦ b1), fl△(a1 ◦ b2), fl△(a2 ◦ b1), fl△(a2 ◦ b2) ) ∈ F .
Then
c=a◦b ,
so that scalar interval operations can be computed precisely without overestimation using floating-point
arithmetic with directed rounding.
This is no longer the case for interval matrix multiplication. Recall that A · B denotes the narrowest interval
matrix with floating-point bounds comprising all A · B for A ∈ A, B ∈ B. The classical way (cf. [14]) to
compute an inclusion C of A · B, which is used in many libraries such as ACRITH [4], ARITHMOS [1],
Intlib [6] or C-XSC [7], is a 3-fold loop defining
(1.7)
Cij := Ai1 · B1j + . . . + Ain · Bnj ,
where all additions and multiplications are interval operations. We abbreviate it by classical(A · B) := C.
Obviously A · B ⊆ classical(A · B).
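For illustration, a minimal INTLAB sketch of the classical 3-fold loop (1.7) could look as follows; the function name classicalmul and the assumption that A and B are already intval quantities are ours, not part of the libraries cited above.

function C = classicalmul(A,B)
% Classical interval matrix product (1.7): every addition and multiplication
% is an interval operation. Very slow; shown only to make the comparison
% with the BLAS3-based algorithms below concrete.
[m,k] = size(A); n = size(B,2);
C = intval(zeros(m,n));
for i = 1:m
  for j = 1:n
    for l = 1:k
      C(i,j) = C(i,j) + A(i,l)*B(l,j);   % interval operations via INTLAB
    end
  end
end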
The aim of this paper is to derive fast algorithms for interval matrix multiplication, i.e. for enclosing A · B.
In particular, only BLAS3 routines [2] will be used. Those BLAS3 routines such as DGEMM for matrix
multiplication may use blocking, multi-tasking and other methods to improve performance. In the following
we assume that the rounding mode of the processor can be permanently changed, so that all subsequent
floating-point operations are executed in the specified rounding mode. For those methods using a priori
estimates of the error in floating-point rounding to nearest, we assume in addition that for matrices of the
same dimensions the order of execution of the operations does not change.
2. Error estimates. Denote by rnd(·) the result obtained when every operation in the expression
within the parentheses is executed in the specified rounding mode. Then, for example, □(A · B) ∈ F^{m×n}
denotes the result of DGEMM in rounding to nearest for A ∈ F^{m×k} and B ∈ F^{k×n}, and ∇(A · B) and
△(A · B) denote the result of DGEMM in rounding downwards and upwards, respectively. Then
(2.1)
∇(A · B) ≤ A · B ≤ △(A · B) ,
where the inequalities are to be understood componentwise.
In this paper we use the improved error estimates for floating-point operations introduced in [20]. They
are based on the “unit in the first place (ufp)” or leading binary bit of a real number, which is defined by
(2.2)    0 ≠ r ∈ R  ⇒  ufp(r) := 2^⌊log2 |r|⌋ ,
Fig. 2.1. Normalized floating-point number: unit in the first place and unit in the last place
where ufp(0) := 0. This concept is independent of a floating-point format or underflow range, and it applies
to real numbers. It gives a convenient way to characterize the bits of a normalized floating-point number
f : they range between the leading bit ufp(f ) and the unit in the last place 2eps · ufp(f ). In particular any
floating-point number is in etaZ, also in underflow. The situation is depicted in Figure 2.1.
Many interesting properties of ufp(·) are given in [20]; without them certain delicate estimates of errors in
floating-point computations would not have been possible. In the following we need only
(2.3)    0 ≠ x ∈ R :  ufp(x) ≤ |x| < 2 ufp(x) ,
which is clear from the definition. When rounding x ∈ R into f := fl(x) ∈ F, the error is sharply characterized
by [20]
(2.4)
f = x + δ + η with |δ| ≤ eps · ufp(x) ≤ eps · ufp(f ) ≤ eps|f |, |η| ≤ eta/2, δη = 0 .
This implies in particular for floating-point addition and multiplication
(2.5)    f = fl(a + b)  ⇒  f = a + b + δ with |δ| ≤ eps · ufp(a + b) ≤ eps · ufp(f) ,
(2.6)    f = fl(a · b)  ⇒  f = a · b + δ + η with |δ| ≤ eps · ufp(a · b) ≤ eps · ufp(f), |η| ≤ eta/2 ,
where δη = 0 in the latter case.
Using these concepts, in [18] the following computable error estimate for dot products was proved.
Theorem 2.1. Let A ∈ F^{m×k} and B ∈ F^{k×n} with (k + 2)eps ≤ 1 be given. Define C := □(A · B) and
Γ := □(|A| · |B|). Then
(2.7)
| C − A · B | ≤ (k + 2) · eps · ufp(Γ) + realmin .
The factor k + 2 cannot be replaced by k + 1.
Fortunately there is a simple algorithm to compute the unit in the first place [20], which we repeat for
convenience. It works correctly in the underflow range but causes overflow for input very close to the largest
representable floating-point number. The latter case is cured by scaling.
Algorithm 2.2. Unit in the first place of a floating-point number.
function S = ufp(p)
q = fl(φ · p)                    % φ := (2eps)^-1 + 1
S = fl(|q − (1 − eps)q|)
Note that this algorithm can be applied to a matrix as well. For an n × n matrix, the unit in the first place
is computed in 4n² floating-point operations.
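For convenience, a possible elementwise Matlab transcription is sketched below; the constant 2^-53 is eps in the sense of this paper (half of Matlab's eps), and the default rounding to nearest is assumed to be active.

function S = ufp(p)
% Elementwise unit in the first place, a direct transcription of Algorithm 2.2.
% Works for scalars and matrices; may overflow for |p| close to the largest
% representable number (cured by scaling, see the remark above).
phi = 2^52 + 1;                  % (2*eps)^(-1) + 1 for eps = 2^-53
q   = phi .* p;                  % rounded to nearest
S   = abs(q - (1 - 2^-53) .* q);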
In this paper we will compute the bound (2.7) in rounding upwards, so we don’t have to care about rounding
errors. We mention that it can be calculated in rounding to nearest as well. This implies the following
computable error estimate for the floating-point product of matrices. Note that all operations are in rounding
to nearest.
Corollary 2.3. Let A ∈ F^{m×k} and B ∈ F^{k×n} with (k + 2)eps ≤ 1 be given. Define C := □(A · B),
Γ := □(|A| · |B|), R := ufp(Γ) and ϱ := (3/2) realmin. Then
(2.8)    | C − A · B | ≤ □( (k + 2) · eps · R + ϱ ) ,
where the two multiplications in (k + 2) · eps · R can be carried out in both ways.
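A hedged Matlab sketch of how (2.8) may be evaluated is as follows; A and B are point matrices, ufp is the elementwise routine sketched after Algorithm 2.2, u := 2^-53 plays the role of eps, and all operations are left in the default rounding to nearest as allowed by Corollary 2.3. Matlab's realmin coincides with the realmin defined in the Introduction.

u = 2^-53; k = size(A,2);
C = A*B;                          % floating-point product, rounded to nearest
R = ufp(abs(A)*abs(B));           % R = ufp(Gamma)
E = (k+2)*u*R + 1.5*realmin;      % entrywise bound for |C - A*B| as in (2.8)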
The estimate (2.7) is superior to the classical bound [3]
(2.9)    |□(A · B) − A · B| ≤ γ_k · |A| · |B|   with   γ_k := k·eps / (1 − k·eps)
in several ways: due to the ufp-concept, (2.7) is up to a factor 2 better than (2.9), whereas the latter is not
only not computable but also only valid in the absence of underflow.
3. Matrix multiplication. We are aiming at fast implementations of interval matrix multiplication.
Let A, B ∈ F^{n×n} and A, B ∈ IF^{n×n} be given. All of the following is formulated for square matrices;
application and adaptation to rectangular matrices is straightforward. Three cases have to be distinguished:
(3.1)
1) A · B    two point matrices A, B with interval result
2) A · B    point matrix A times interval matrix B
3) A · B    two interval matrices A, B
3.1. Two point matrices to interval result. The first case is directly solved by
(3.2)    A · B ∈ [ fl∇(A · B) , fl△(A · B) ] .
The overestimation compared to A · B for A := [A, A] and B := [B, B] is proportional to the condition
number of the individual dot products. A more accurate inclusion or the precise result A · B can be
computed using accurate dot product techniques as described in [10, 20, 9]. An implementation of (3.2) in
executable INTLAB code is as follows.
Algorithm 3.1. Product of two floating-point matrices to interval result.
function C = FFmul(A,B)
setround(-1)              % switch rounding to downwards
Cinf = A*B;               % floating-point product rounded downwards
setround(1)               % switch rounding to upwards
Csup = A*B;               % floating-point product rounded upwards
C = infsup(Cinf,Csup);    % compose result of infimum and supremum
The algorithm is correct also in the presence of underflow, and there is no restriction on the dimension n.
Algorithm 3.1 requires 2 matrix multiplications, and it seems hard to imagine a faster possibility. There is
no overestimation compared to the classical approach.
3.2. Point matrix times interval matrix. Concerning the second case, the classical algorithm needs
2n³ switches of the rounding mode, which, in addition to the time needed for these switches, causes a drastic
slow down due to lack of optimization of the code. To improve this we developed Profil/Bias [8] at my
institute, which reduces the number of switches of the rounding mode to 2n². The inner loop is suited for
optimization, however, only BLAS1 routines can be used.
Therefore I shifted to the midpoint-radius representation [17] in INTLAB for a fast multiplication by using
BLAS3 routines:
(3.3)    A ∈ R^{n×n}, B ∈ IR^{n×n}  ⇒  [A, A] · B = ⟨ A · mid(B) , |A| · rad(B) ⟩ .
The midpoint and radius on the right hand side are, in general, not floating-point matrices. To obtain
a computable inclusion, we first need a conversion between infsup- and midrad-representation. Whereas
Table 3.1
Minimum, mean, median and maximum ratio of the radii of the results of FImul3(A, B) and classical(A · B).
e        minimum   mean     median   maximum
10^-11   1.0000    1.0000   1.0000   1.0000
10^-12   0.9997    1.0000   1.0000   1.0002
10^-13   0.9979    1.0001   1.0000   1.0021
10^-14   0.9811    1.0006   1.0001   1.0194
10^-15   0.9298    1.0047   1.0014   1.1000
10^-16   0.8937    1.0119   1.0062   1.1579
converting midpoint-radius to infimum-supremum is obvious, the reverse is tricky, but can be solved in the
following elegant way proposed by Oishi: B = [B1 , B2 ] ⊆ ⟨mB, rB⟩ with
(3.4)
mB := △(B1 + (B2 − B1 )/2) and rB := △(mB − B1 ) .
Note that mB, rB ∈ Fn×n . With this an inclusion [C1 , C2 ] of the product of a point matrix A times an
interval matrix B = [B1 , B2 ] can be computed as follows.
Algorithm 3.2. Point matrix times interval matrix, needs 3 point matrix multiplications.
function [C1, C2] = FImul3(A, B)
mB := △(B1 + (B2 − B1)/2)
rB := △(mB − B1)                 % [B1, B2] ⊆ ⟨mB, rB⟩
rC := △(|A| · rB)                % |A| · rB ≤ rC
C2 := △( A · mB + rC )           % A · mB + |A| · rB ≤ C2
C1 := ∇( A · mB − rC )           % C1 ≤ A · mB − |A| · rB
The algorithm is correct also in the presence of underflow, there is no restriction on the dimension n, and it
requires 3 matrix multiplications.
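A possible executable INTLAB sketch of Algorithm 3.2 is given below; the interval factor is passed as its bound pair B1, B2, and setround switches the rounding mode as in Algorithm 3.1. This is only an illustration of the three-multiplication structure, not INTLAB's built-in routine.

function [C1,C2] = FImul3(A,B1,B2)
setround(1)                           % rounding upwards
mB = B1 + 0.5*(B2-B1);                % [B1,B2] contained in <mB,rB>
rB = mB - B1;
rC = abs(A)*rB;                       % |A|*rB <= rC
C2 = A*mB + rC;                       % upper bound of A*[B1,B2]
setround(-1)                          % rounding downwards
C1 = A*mB - rC;                       % lower bound of A*[B1,B2]
setround(0)                           % restore rounding to nearest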
In theory, (3.3) implies that without the presence of rounding errors the result of Algorithm 3.2 is identical
with A · B. In practice, Algorithm 3.2 uses the computed ⟨mB, rB⟩ rather than [B1, B2], and due to the
potential overestimation one might conclude that Algorithm 3.2 cannot be superior to the classical interval
matrix multiplication.
However, as already pointed out in [17], this is not true. Generate random matrices A, B ∈ F^{n×n} with
pseudo-random entries drawn from a normal distribution with mean zero and standard deviation one, and
define B := [B − e, B + e]. For large values of e there is practically no difference between classical(A · B) and
FImul3(A, B), but this changes for smaller values of e.
As can be seen in Table 3.1, the classical interval matrix multiplication is, in general, a little superior to
Algorithm 3.2, but sometimes the radius of one or the other is better by up to 10%. However, due
to the lack of use of BLAS3 routines and the many switches of the rounding mode the classical or Profil/Bias
algorithms are much slower than Algorithm 3.2.
One matrix multiplication may be saved using Theorem 2.1 to estimate the error in the floating-point
computation of A · mB. The algorithm is as follows.
Algorithm 3.3. Point matrix times interval matrix, needs 2 point matrix multiplications.
function [C1, C2] = FImul2(A, B)
mB := △(B1 + (B2 − B1)/2)
rB := △(mB − B1)                                     % [B1, B2] ⊆ ⟨mB, rB⟩
mC := □(A · mB)                                      % floating-point approximation
rC := △( |A| · [ (n + 2)eps|mB| + rB ] + realmin )   % |mC − A · mB| + |A| · rB ≤ rC
C2 := △( mC + rC )                                   % A · mB + |A| · rB ≤ C2
C1 := ∇( mC − rC )                                   % C1 ≤ A · mB − |A| · rB
The algorithm is correct, also in the presence of underflow, provided (n + 2)eps ≤ 1. Algorithm 3.3 requires
2 matrix multiplications.
The proof of validity follows by Theorem 2.1, by
(3.5)    ufp( □(|A| · |mB|) ) ≤ □(|A| · |mB|) ≤ △(|A| · |mB|)
and proper use of rounding modes. Theorem 2.1 bounds the error in the floating-point computation of
mC := □(A · mB) using ufp( □(|A| · |mB|) ) rather than □(|A| · |mB|). In Algorithm 3.3 we cannot use the
fact that ufp(x) is up to a factor 2 smaller than |x|.
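An executable INTLAB sketch of Algorithm 3.3 might look as follows; again the interval factor is passed as its bound pair, u := 2^-53 is eps in the sense of this paper, and Matlab's realmin coincides with the realmin defined in the Introduction. Only the two products A*mB and abs(A)*(...) are full matrix multiplications.

function [C1,C2] = FImul2(A,B1,B2)
n = size(A,2); u = 2^-53;
setround(1)                              % rounding upwards
mB = B1 + 0.5*(B2-B1); rB = mB - B1;     % [B1,B2] contained in <mB,rB>
setround(0)                              % rounding to nearest
mC = A*mB;                               % floating-point approximation of the midpoint
setround(1)                              % rounding upwards
rC = abs(A)*((n+2)*u*abs(mB) + rB) + realmin;
C2 = mC + rC;
setround(-1)                             % rounding downwards
C1 = mC - rC;
setround(0)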
3.3. Two interval matrices. The classical approach (1.7) for an inclusion of A · B needs at least 2n³
switches of the rounding mode and jeopardizes compiler optimization. Also in Profil/Bias [8] there is no cure
for this. In [17] we discussed the midpoint-radius representation used in INTLAB for a fast multiplication
by using BLAS3 routines:
(3.6)
⟨mA, rA⟩ · ⟨mB, rB⟩ ⊆ ⟨ mA · mB , |mA| · rB + rA · (|mB| + rB) ⟩ .
Note that the inclusion is correct without the presence of rounding errors. On the computer, an inclusion
[C1 , C2 ] of the product of two interval matrices A = [A1 , A2 ], B = [B1 , B2 ] is computed as follows.
Algorithm 3.4. Interval matrix times interval matrix, needs 4 point matrix multiplications.
function [C1, C2] = IImul4(A, B)
mA := △(A1 + (A2 − A1)/2);  rA := △(mA − A1)     % [A1, A2] ⊆ ⟨mA, rA⟩
mB := △(B1 + (B2 − B1)/2);  rB := △(mB − B1)     % [B1, B2] ⊆ ⟨mB, rB⟩
rC := △( |mA| · rB + rA · (|mB| + rB) )          % Γ := |mA| · rB + rA · (|mB| + rB) ≤ rC
C2 := △( mA · mB + rC )                          % mA · mB + Γ ≤ C2
C1 := ∇( mA · mB − rC )                          % C1 ≤ mA · mB − Γ
The algorithm is correct also in the presence of underflow, there is no restriction on the dimension n, and it
requires 4 matrix multiplications.
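In executable INTLAB notation, Algorithm 3.4 may be sketched as follows; the interval factors are passed as bound pairs, and the midpoint product mA*mB is computed twice, once in each directed rounding mode, which accounts for the fourth multiplication.

function [C1,C2] = IImul4(A1,A2,B1,B2)
setround(1)                               % rounding upwards
mA = A1 + 0.5*(A2-A1); rA = mA - A1;      % [A1,A2] contained in <mA,rA>
mB = B1 + 0.5*(B2-B1); rB = mB - B1;      % [B1,B2] contained in <mB,rB>
rC = abs(mA)*rB + rA*(abs(mB)+rB);        % upper bound for the radius
C2 = mA*mB + rC;                          % upper bound
setround(-1)                              % rounding downwards
C1 = mA*mB - rC;                          % lower bound
setround(0)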
As is well known there is some overestimation because the midpoint of the product need not be equal to the
product of the midpoints. This overestimation has been estimated in [17] using the relative precision of the
input intervals.
Definition 3.5. An interval ⟨mA, rA⟩ not containing 0 is said to be of relative precision e, 0 ≤ e ∈ R, if
rA ≤ e · |mA| .
An interval containing 0 is said to be of relative precision 1.
Suppose all entries in A and B are of relative precision e and f, respectively. Then Proposition 2.7 in [17]
shows that the maximum overestimation of the result computed by (3.6) compared to A · B is at most a
factor 1 + ef/(e + f) in radius.
This value is globally bounded by 1.5. For example, if all entries in A and B are at least of relative precision
0.01 or 1%, then the overestimation in radius is bounded by a factor 1.005, i.e. the radii may be too wide
by at most 0.5%.
As before, one matrix multiplication may be saved using Theorem 2.1 to estimate the error in the floating-point computation of mA · mB.
Algorithm 3.6. Interval matrix times interval matrix, needs 3 point matrix multiplications.
function [C1, C2] = IImul3(A, B)
mA := △(A1 + (A2 − A1)/2);  rA := △(mA − A1)         % [A1, A2] ⊆ ⟨mA, rA⟩
mB := △(B1 + (B2 − B1)/2);  rB := △(mB − B1)         % [B1, B2] ⊆ ⟨mB, rB⟩
mC := □(mA · mB)                                     % floating-point approximation
rB′ := △( (n + 2)eps|mB| + rB )                      % includes error of □(mA · mB)
rC := △( |mA| · rB′ + realmin + rA · (|mB| + rB) )   % Γ := |mA| · rB + rA · (|mB| + rB) ≤ rC
C2 := △( mC + rC )                                   % mA · mB + Γ ≤ C2
C1 := ∇( mC − rC )                                   % C1 ≤ mA · mB − Γ
The algorithm is correct, also in the presence of underflow, provided (n + 2)eps ≤ 1, and it requires 3 matrix
multiplications. The proof of validity follows as in (3.5).
Recently Hong Diep Nguyen and Nathalie Revol [15] suggested an alternative diminishing the overestimation.
For an interval A = ⟨mA, rA⟩ define (using Matlab notation)
(3.7)    ρA := sign(mA) .* min(|mA|, rA) .
Then
(3.8)
A · B ⊆ ⟨ mA · mB + ρA · ρB , |mA| · rB + rA · (|mB| + rB) − |ρA| · |ρB| ⟩ .
This can be seen using the fact that for scalar intervals a, b the true product c = ⟨mc, rc⟩ satisfies ([14],
Proposition 1.6.5)
(3.9)    mc := ma · mb + sign(ma · mb) · min( ra·|mb|, |ma|·rb, ra·rb )
         rc := max( ra·(|mb| + rb), (|ma| + ra)·rb, ra·|mb| + |ma|·rb ) .
Note that this formula does not extend directly to interval matrices. Nguyen and Revol show that the
maximum overestimation of (3.8) is at most a factor 4 − 2√2 ≈ 1.18 in radius. They define the following
algorithm to compute an inclusion of A · B.
Algorithm 3.7. Interval matrix times interval matrix, needs 7 point matrix multiplications.
function [C1, C2] = IImul7(A, B)
mA := △(A1 + (A2 − A1)/2);  rA := △(mA − A1)              % [A1, A2] ⊆ ⟨mA, rA⟩
mB := △(B1 + (B2 − B1)/2);  rB := △(mB − B1)              % [B1, B2] ⊆ ⟨mB, rB⟩
ρA := sign(mA) .* min(|mA|, rA)
ρB := sign(mB) .* min(|mB|, rB)                           % quantities according to (3.7)
rC := △( |mA| · rB + rA · (|mB| + rB) + (−|ρA|) · |ρB| )  % upper bound for radius Γ in (3.8)
C2 := △( mA · mB + ρA · ρB + rC )                         % mA · mB + ρA · ρB + Γ ≤ C2
C1 := ∇( mA · mB + ρA · ρB − rC )                         % C1 ≤ mA · mB + ρA · ρB − Γ
The algorithm is correct also in the presence of underflow, there is no restriction on the dimension n, and it
requires 7 matrix multiplications.
In order to develop a faster algorithm, we first rewrite the radius in (3.8) into
(3.10)    Γ := |mA| · rB + rA · (|mB| + rB) − |ρA| · |ρB|
             = (|mA| + rA) · (|mB| + rB) − ( |mA| · |mB| + |ρA| · |ρB| ) .
For narrow interval matrices, two relatively large quantities are subtracted in (3.10) and may cause cancellation.
However, this applies to numerical computations in floating-point rounding to nearest; for interval
quantities with nonzero diameter there is no cancellation. In some way this extends to the computation of
the radius, see the computational results in Section 4.
Formulation (3.10) together with Theorem 2.1 gives the possibility to calculate the |mA| · |mB| + |ρA| · |ρB|
part of the radius also in floating-point. The introduced error is estimated using µ := eps( |mA| · |mB| +
|ρA| · |ρB| ), whereas the error of the floating-point computation of the midpoint mA · mB + ρA · ρB is based
on the same quantity µ. This saves 2 matrix multiplications. The corresponding algorithm is as follows.
Algorithm 3.8. Interval matrix times interval matrix, needs 5 point matrix multiplications.
function [C1, C2] = IImul5(A, B)
mA := △(A1 + (A2 − A1)/2);  rA := △(mA − A1)             % [A1, A2] ⊆ ⟨mA, rA⟩
mB := △(B1 + (B2 − B1)/2);  rB := △(mB − B1)             % [B1, B2] ⊆ ⟨mB, rB⟩
ρA := sign(mA) .* min(|mA|, rA)
ρB := sign(mB) .* min(|mB|, rB)                          % quantities according to (3.7)
mC := □( mA · mB + ρA · ρB )                             % midpoint in rounding to nearest
Γm := □( |mA| · |mB| );  Γρ := □( |ρA| · |ρB| )
γ := △( (2n + 6)eps(ufp(Γm) + ufp(Γρ)) + 4 realmin )     % error bound
rC := △( (|mA| + rA) · (|mB| + rB) − Γm − Γρ + γ )       % upper bound for (3.10)
C2 := △( mC + rC )                                       % mA · mB + ρA · ρB + Γ ≤ C2
C1 := ∇( mC − rC )                                       % C1 ≤ mA · mB + ρA · ρB − Γ
The algorithm is correct, also in the presence of underflow, provided (n + 2)eps ≤ 1, and it requires 5 matrix
multiplications.
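An executable INTLAB sketch of Algorithm 3.8 may look as follows; ufp is the elementwise routine sketched after Algorithm 2.2, u := 2^-53 is eps in the sense of this paper, and the interval factors are passed as bound pairs. The five multiplications are mA*mB, rhoA*rhoB, the two products defining Gm and Gr, and (abs(mA)+rA)*(abs(mB)+rB).

function [C1,C2] = IImul5(A1,A2,B1,B2)
n = size(A1,2); u = 2^-53;
setround(1)                                  % rounding upwards
mA = A1 + 0.5*(A2-A1); rA = mA - A1;         % [A1,A2] contained in <mA,rA>
mB = B1 + 0.5*(B2-B1); rB = mB - B1;         % [B1,B2] contained in <mB,rB>
rhoA = sign(mA).*min(abs(mA),rA);            % quantities according to (3.7)
rhoB = sign(mB).*min(abs(mB),rB);
setround(0)                                  % rounding to nearest
mC = mA*mB + rhoA*rhoB;                      % approximate midpoint
Gm = abs(mA)*abs(mB); Gr = abs(rhoA)*abs(rhoB);
setround(1)                                  % rounding upwards
gamma = (2*n+6)*u*(ufp(Gm)+ufp(Gr)) + 4*realmin;    % a priori error bound
rC = (abs(mA)+rA)*(abs(mB)+rB) - Gm - Gr + gamma;   % upper bound for (3.10)
C2 = mC + rC;
setround(-1)                                 % rounding downwards
C1 = mC - rC;
setround(0)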
For the proof of correctness of Algorithm 3.8 we need the following lemma.
Lemma 3.9. Let A, B, C, D ∈ F^{n×n} be given and define
P := □(A · B),  P̄ := □(|A| · |B|),  Q := □(C · D),  Q̄ := □(|C| · |D|),  and  R := □(P + Q) .
Table 4.1
Methods for point matrix times interval matrix.

method      reference                 # matrix multiplications
classical   classical method (1.7)    very slow
FImul3      Algorithm 3.2             3
FImul2      Algorithm 3.3             2
Table 4.2
Methods for interval matrix times interval matrix.

method      reference                 # matrix multiplications
classical   classical method (1.7)    very slow
IImul4      Algorithm 3.4             4
IImul7      Algorithm 3.7             7
IImul3      Algorithm 3.6             3
IImul5      Algorithm 3.8             5
Then
(3.11)    |R − A · B − C · D| ≤ (n + 4)eps ( ufp(P̄) + ufp(Q̄) ) + 2 realmin .
Proof. Theorem 2.1 implies
|P − A · B| ≤ (n + 2)eps · ufp(P̄) + realmin   and   |Q − C · D| ≤ (n + 2)eps · ufp(Q̄) + realmin ,
and (2.4) and (2.3) applied to the matrix addition yield
|R − (P + Q)| ≤ eps · ufp(P + Q) ≤ eps(|P| + |Q|) ≤ 2eps ( ufp(P̄) + ufp(Q̄) ) .
Using |P| ≤ P̄ and |Q| ≤ Q̄, which follow by the monotonicity of rounding to nearest, proves the lemma.
For the proof of validity of Algorithm 3.8, first Lemma 3.9 yields
|mC − mA · mB − ρA · ρB| ≤ (n + 4)eps ( ufp(Γm) + ufp(Γρ) ) + 2 realmin .
Furthermore, applying Theorem 2.1 to □(|mA| · |mB|) and □(|ρA| · |ρB|) gives
| Γm − |mA| · |mB| | + | Γρ − |ρA| · |ρB| | ≤ (n + 2)eps ( ufp(Γm) + ufp(Γρ) ) + 2 realmin ,
so that
|mA| · |mB| + |ρA| · |ρB| ≥ Γm + Γρ − (n + 2)eps ( ufp(Γm) + ufp(Γρ) ) − 2 realmin .
Putting things together and observing the rounding upwards in the computation of rC proves A·B ⊆ [C1 , C2 ]
for the result computed by Algorithm 3.8.
4. Computational results. In this section we show computational results on the overestimation of
the different methods. We compare three methods for point times interval matrix and five methods for
interval times interval matrix as given in Tables 4.1 and 4.2, respectively.
For point matrix times interval matrix we generate factors A, B as follows (in Matlab notation):
A = randn(n);               % n x n random matrix
B = midrad(randn(n),e);     % random interval matrix, each entry with fixed radius e
Table 4.3
Point matrix times interval matrix: median and maximum of the ratios of relative precision to the optimal inclusion.
                  classical            FImul3               FImul2
  n      e        median     max       median     max       median     max
 100     1        1.0000     1.0000    1.0000     1.0000    1.0000     1.0000
 100     0.01     1.0000     1.0000    1.0000     1.0000    1.0000     1.0000
 100     10^-5    1.0000     1.0000    1.0000     1.0000    1.0000     1.0000
 100     10^-10   1.0000     1.0000    1.0000     1.0000    1.0001     1.0001
 100     10^-14   1.0447     1.2881    1.0454     1.2712    1.8873     2.1983
 100     10^-15   1.4279     3.1000    1.4343     3.1000    9.2285    12.6346
The statement randn(n) generates an n × n-matrix with pseudo-random entries drawn from a normal
distribution with mean zero and standard deviation one. The second factor B has entries with random
midpoint and constant radius e. For e=1 about 70 % of all entries are intervals with zero in the interior, for
e=0.01 about 0.7 %.
Denote the relative precision according to Definition 3.5 of an interval matrix B ∈ IF^{n×n} by rel(B) ∈ R^{n×n}.
Let C ∈ IF^{n×n} be the result of the matrix product computed with one of the methods listed in Table 4.1.
Then Table 4.3 displays median and maximum of the entrywise ratio of the relative precision of C and the
narrowest (optimal) interval enclosure A · B as defined by (1.6).
For all kinds of data the classical method and FImul3 based on midpoint-radius arithmetic deliver similar
results. This seems natural because without the presence of rounding errors both definitions (1.7) and (3.3)
are identical and equal to A · B.
But also the faster version FImul2, with its a priori estimate of the error of the midpoint computation in
floating-point rounding to nearest, computes results of similar quality, at least for wide input intervals. For
small width of the second factor, however, it deteriorates.
The largest overestimation with the factor 12.6346 should not be overvalued because the radius of the second
factor is small. The actual components responsible are
[12.84259857639101, 12.84259857639336] and [12.84259857639208, 12.84259857639227]
with a relative precision of 9.1 · 10^-14 and 7.2 · 10^-15, respectively. So in either case the inclusion is tight.
If, however, the results are used in further computations, this overestimation may become important.
Next we show the same results for interval matrix times interval matrix. Now both matrices are generated,
as previously the second factor, by midrad(randn(n),e), using the same fixed radius e for all entries of both
factors. As can be seen in Table 4.4 there is not much difference between the classical and the approach IImul7
by Nguyen and Revol. Also the midpoint-radius approach IImul4 computes results of similar quality, only
sometimes up to 20 % wider. As expected, the algorithms IImul3 and IImul5 based on a priori floating-point
error estimations overestimate the true result, in particular for small width of the input matrices.
Despite the fact that the classical method is prohibitively slow, one should also remember that, due to
rounding errors, all methods overestimate the narrowest inclusion A · B.
One may draw the conclusion that, for example, Algorithm IImul3 with an overestimation of up to more than
a factor 400 in radius is not of much value. However, this is complaining on a high level. The worst case
with an overestimation of a factor 421.7 in radius consists of the intervals
[−1.13508640785746, −1.13508640768590] and [−1.13508640777189, −1.13508640777147] ,
Table 4.4
Interval matrix times interval matrix: median and maximum of the ratios of relative precision to the optimal inclusion
(fixed dimension n = 100 for both factors).

          classical          IImul4             IImul7             IImul3                 IImul5
  e       median    max      median    max      median    max      median      max        median    max
 100      1.0000    1.0000   1.0046    1.0061   1.0046    1.0061     1.6422      1.7622   1.0046    1.0061
 10       1.0000    1.0000   1.0420    1.0559   1.0363    1.0474     6.6361      7.6766   1.0363    1.0474
 1        1.0000    1.0000   1.2007    1.2574   1.0188    1.0325    30.0113     35.2913   1.0188    1.0325
 0.01     1.0000    1.0000   1.0062    1.0078   1.0000    1.0000    40.6290     48.7160   1.0000    1.0000
 10^-5    1.0000    1.0000   1.0000    1.0000   1.0000    1.0000    40.0357     48.3276   1.0000    1.0000
 10^-10   1.0000    1.0000   1.0000    1.0000   1.0000    1.0000    40.2990     48.1929   1.0001    1.0001
 10^-14   1.0226    1.1264   1.0231    1.1226   1.0234    1.1226    75.1070     84.1234   1.6547    2.1914
 10^-15   1.2097    2.0444   1.2129    2.0667   1.2158    2.0667   366.3551    421.7162   6.9774   11.9677
the inclusion by Algorithm IImul3 and the optimal inclusion, respectively. The relative precision is
7.56 · 10^-11 and 1.79 · 10^-13, respectively, so both inclusions are not wide. The decision whether Algorithm
IImul3 or another method can be used depends on the need for such high accuracy.
Therefore we consider an application of interval arithmetic in so-called “verification methods”, namely the
computation of an inclusion of the solution of a numerical problem. Such numerical problems cover systems
of linear and nonlinear equations, eigenvalue problems, ordinary and partial differential equations etc. For
those problems bounds for a solution are computed including a proof of solvability of the problem and
correctness of the bounds. For details, see [14, 19].
Many nonlinear problems eventually lead to the solution of a system of linear equations, for example nonlinear
systems ([19], Chapter 13) or ordinary differential equations (cf. [19], Chapter 15, or [12], [13], [21], [22],
[23]). These linear systems often consist of an interval matrix A and interval right hand side b. In this case
all solutions of linear systems Ax = b with A ∈ A and b ∈ b are included.
This is maybe the most classical application of interval matrix multiplication. The following often used
verification algorithm for linear systems is taken from [19], and it is presented in executable INTLAB code.
function XX = VerifyLinSys(A,b)
XX = NaN;                           % initialization
R = inv(mid(A));                    % approximate inverse
xs = R*mid(b);                      % approximate solution
xs = xs + R*(mid(b)-mid(A)*xs);     % residual iteration for backward stability
C = eye(dim(A))-R*intval(A);        % iteration matrix
Z = R*(b-A*intval(xs));
X = Z; iter = 0;
while iter<15                       % interval iteration with epsilon-inflation
  iter = iter+1;
  Y = X*infsup(0.9,1.1) + 1e-20*infsup(-1,1);
  X = Z + C*Y;                      % interval iteration
  if all(in0(X,Y)), XX = xs + X; return; end
end
Table 4.5
Ratio of relative precision of the result of VerifyLinSys and reference inclusion for fixed dimension n = 100, condition
number c = 10^10 and radius e = 10^-12.

            classical   IImul4   IImul7   IImul3   IImul5
classical   1.0000      1.0000   1.0000   1.0000   1.0000
FImul3      1.0000      1.0000   1.0000   1.0000   1.0000
FImul2      1.0057      1.0057   1.0057   1.0057   1.0057
The type casts in the computation of C and Z ensure that interval operations are used, also for point data A
and b.
For a given (interval) matrix A and (interval) right-hand side b the output is an inclusion of the solution of
the linear system Ax = b. For interval input, the solutions of all linear systems within the given interval
input data are enclosed. The result is either a valid inclusion together with the proof of solvability, i.e. that all
system matrices are nonsingular, or the result NaN, which indicates that no inclusion could be computed in the given
precision. In this case the problem is likely too ill-conditioned. In any case, no false result is possible. Note
that proving nonsingularity of all matrices within an interval matrix is an NP-hard problem [16].
Multiplications of a point matrix and interval matrix (vectors are considered as a matrix with one column)
are R*intval(A), R*res for res=b-A*intval(xs), and A*intval(xs) because in the latter xs is not an
interval quantity, and the only multiplication of two interval quantities is C*Y.
We now replace all those operations by one of the presented algorithms. The test data is generated as follows.
A = midrad( gallery(’randsvd’,n,c) , e );
b = A*ones(n,1);
This means the midpoint matrix is randomly generated with condition number c, the radius is the constant
e for all components, and the right-hand side is computed such that the true solution is approximately the
vector of ones. Note that the interval matrix most likely contains singular matrices if the product of the condition
number c and the radius e exceeds one. Therefore ce < 1 is a natural boundary for verification.
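A hedged usage sketch is as follows; it merely combines the data generation above with a call of VerifyLinSys and is not part of the test framework itself.

n = 100; c = 1e10; e = 1e-12;                  % dimension, condition number, radius
A = midrad( gallery('randsvd',n,c) , e );      % interval matrix with point midpoint
b = A*ones(n,1);                               % interval right-hand side
X = VerifyLinSys(A,b);                         % inclusion of all solutions, or NaN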
The first Table 4.5 shows all combinations of algorithms for fixed dimension n = 100, condition number
c = 10^10 and radius e = 10^-12. For reference we compute an inclusion of the true solution set by more
sophisticated means. More precisely, an outer and inner inclusion is computed (see [19], Section 10.6),
which proves that the following data is correct. We display the median of the entrywise ratio between the
relative precision of the inclusion computed by VerifyLinSys and the reference inclusion. For example, a
displayed value 1.0057 means that the inclusion by VerifyLinSys is by about 0.6 % worse than the optimal
inclusion.
As can be seen there is almost no difference whatever multiplication method is used. This is true as long
as the radii of the input data are about e = 10^-12 or larger. For smaller radii the a priori estimation of
the error of floating-point operations in FImul2 introduces increasing overestimation. For example,
for condition number c = 10^2 and radius e = 10^-14 the results are displayed in Table 4.6.
But again criticizing a “large” ratio 1.4962 is lamenting on a high level. The maximum ratio of relative
precisions of all possibilities using FImul2 compared to the reference solution was 1.4987. The corresponding
components of the inclusion by VerifyLinSys and of the reference inclusion are
[0.99999999997397, 1.00000000002604]   and   [0.99999999998268, 1.00000000001732]
with a relative precision of 2.60 · 10^-11 and 1.73 · 10^-11, respectively.
Table 4.6
Ratio of relative precision of the result of VerifyLinSys and reference inclusion for fixed dimension n = 100, condition
number c = 10^2 and radius e = 10^-14.

            classical   IImul4   IImul7   IImul3   IImul5
classical   1.0000      1.0000   1.0000   1.0000   1.0000
FImul3      1.0000      1.0000   1.0000   1.0000   1.0000
FImul2      1.4962      1.4962   1.4962   1.4962   1.4962
Table 4.7
Ratio of relative precision of the result of VerifyLinSys and reference inclusion for fixed dimension n = 100, point and
interval matrix multiplication always by FImul3.

  c       e        classical   IImul4   IImul7   IImul3   IImul5
 10^2     10^-4    1.0000      1.0000   1.0000   1.0000   1.0000
 10^2     10^-8    1.0000      1.0000   1.0000   1.0000   1.0000
 10^5     10^-8    1.0000      1.0000   1.0000   1.0000   1.0000
 10^2     10^-12   1.0000      1.0000   1.0000   1.0000   1.0000
 10^5     10^-12   1.0000      1.0000   1.0000   1.0000   1.0000
 10^10    10^-12   1.0000      1.0000   1.0000   1.0000   1.0000
 10^2     10^-14   1.0001      1.0001   1.0001   1.0001   1.0001
 10^5     10^-14   1.0000      1.0000   1.0000   1.0000   1.0000
 10^10    10^-14   1.0001      1.0001   1.0001   1.0001   1.0001
 10^13    10^-14   1.0020      1.0020   1.0020   1.0012   1.0020
Algorithm VerifyLinSys is applicable to a matrix right-hand side as well. We choose the identity matrix, so
that an inclusion of the inverse is obtained. In this case the computation of C*Y is a multiplication of two
interval matrices, and the effect of using the different versions IImulk can be studied. We first perform all
products of a point matrix times an interval matrix by the standard midpoint-radius approach FImul3. The
results are displayed in Table 4.7. Recall that ce < 1 must be satisfied.
Obviously there is not much difference, and there is no harm in choosing the fastest algorithm IImul3. Finally
we display the same table, now always using the a priori estimation of rounding errors by FImul2 rather than
FImul3. The results are displayed in Table 4.8.
Once again a large ratio means only that the computed inclusion is overestimated by this factor. For example,
for condition number c = 10^5 and radius e = 10^-15 the worst inclusion components are
10^3 · [−2.32788143660617, −2.32788139435358]   and   10^3 · [−2.32788141797750, −2.32788141297821]
with relative precision 9.08 · 10^-9 and 1.07 · 10^-9, respectively. Therefore, from a practical point of view,
there is not too much difference in choosing any of the proposed methods FImulk or IImulℓ.
From a mathematical point of view, the interval iteration in Algorithm VerifyLinSys can only be successful
if |I − RA| is convergent (rather than I − RA as in point iterations). In a certain way this is even necessary
and sufficient [19]. Moreover, the interval inclusion of I − RA can be expected to consist of intervals being
nearly symmetric to zero. Hence, there is no harm to replace C*Y by ⟨0, |C| · |Z|⟩, so that no product of two
interval quantities is necessary at all.
In this case, an implementation based on FImul3 applied to an inclusion of the inverse of a matrix requires
3 point by interval matrix multiplications. This means in total 9 matrix multiplications using FImul3 , but
only 6 matrix multiplications with FImul2 . Which one is preferable and/or suited can be decided depending
Table 4.8
Ratio of relative precision of the result of VerifyLinSys and reference inclusion for fixed dimension n = 100, point and
interval matrix multiplication always by FImul2.

  c       e        classical   IImul4   IImul7   IImul3   IImul5
 10^2     10^-4    1.0000      1.0000   1.0000   1.0000   1.0000
 10^2     10^-8    1.0000      1.0000   1.0000   1.0000   1.0000
 10^5     10^-8    1.0000      1.0000   1.0000   1.0000   1.0000
 10^2     10^-12   1.0109      1.0109   1.0109   1.0109   1.0109
 10^5     10^-12   1.0109      1.0109   1.0109   1.0109   1.0109
 10^10    10^-12   1.0112      1.0112   1.0112   1.0112   1.0112
 10^2     10^-14   2.0209      2.0209   2.0209   2.0209   2.0209
 10^5     10^-14   2.0195      2.0195   2.0195   2.0195   2.0195
 10^10    10^-14   2.0256      2.0256   2.0256   2.0256   2.0256
 10^13    10^-14   2.7290      2.7293   2.7293   2.7237   2.7293
 10^2     10^-15   7.7992      7.7992   7.7992   7.7992   7.7992
 10^5     10^-15   7.7884      7.7884   7.7884   7.7884   7.7884
 10^10    10^-15   7.8261      7.8261   7.8261   7.8261   7.8261
 10^13    10^-15   9.4548      9.4554   9.4554   9.4487   9.4554
on the application, in particular on the width of the input data.
5. Conclusion. Several algorithms for interval matrix multiplication have been presented. The multiplications
based on midpoint-radius representation seem to be of similar quality as the classical interval matrix
multiplication. However, the latter is very slow due to numerous switches of the rounding mode and lack of
compiler optimization. Moreover, the former allow the use of fast BLAS3 routines.
There are improvements in performance based on a priori estimates of the error of certain matrix products
in floating-point rounding to nearest. They reduce the cost for the product of a point matrix and an interval
matrix from 3 to 2, and for the product of two interval matrices from 4 to 3 matrix multiplications. Although
comparing the results directly seems to indicate that this approach is prone to overestimation, in particular
for narrow input intervals, in practical applications the difference is not too large and often negligible.
REFERENCES
[1] ARITHMOS, Benutzerhandbuch, Siemens AG, Bibl.-Nr. U 2900-I-Z87-1 edition, 1986.
[2] J.J. Dongarra, J.J. Du Croz, I.S. Duff, and S.J. Hammarling. A set of level 3 Basic Linear Algebra Subprograms. ACM
Trans. Math. Software, 16:1–17, 1990.
[3] N. J. Higham. Accuracy and stability of numerical algorithms. SIAM Publications, Philadelphia, 2nd edition, 2002.
[4] IBM Deutschland GmbH, Schönaicher Strasse 220, D-71032 Böblingen. IBM High-Accuracy Arithmetic Subroutine Library
(ACRITH), 1986. 3rd edition.
[5] ANSI/IEEE 754-1985: IEEE Standard for Binary Floating-Point Arithmetic. New York, 1985.
[6] R.B. Kearfott, M. Dawande, K. Du, and C. Hu. INTLIB: A portable Fortran-77 elementary function library. Interval
Comput., 3(5):96–105, 1992.
[7] R. Klatte, U. Kulisch, A. Wiethoff, C. Lawo, and M. Rauch. C-XSC A C++ Class Library for Extended Scientific
Computing. Springer, Berlin, 1993.
[8] O. Knüppel. PROFIL / BIAS — A Fast Interval Library. Computing, 53:277–287, 1994.
[9] U. Kulisch (ed.). Computer Arithmetic and Validity: Theory, Implementation, and Applications. de Gruyter, Berlin and
New York, 2008.
[10] M. Malcolm. On accurate floating-point summation. Comm. ACM, 14(11):731–736, 1971.
[11] J.M. Muller, N. Brisebarre, F. de Dinechin, C.P. Jeannerod, V. Lefèvre, G. Melquiond, N. Revol, D. Stehlé, and S. Torres.
Handbook of Floating-Point Arithmetic. Birkhäuser Boston, 2010.
[12] M. T. Nakao. Solving nonlinear elliptic problems with result verification using an H^-1 type residual iteration. Computing,
pages 161–173, 1993.
[13] M.T. Nakao, K. Hashimoto, and Y. Watanabe. A numerical method to verify the invertibility of linear elliptic operators
with applications to nonlinear problems. Computing, 75:1–14, 2005.
[14] A. Neumaier. Interval Methods for Systems of Equations. Encyclopedia of Mathematics and its Applications. Cambridge
University Press, 1990.
[15] H.D. Nguyen and N. Revol. Accuracy issues in linear algebra using interval arithmetic. SCAN conference Lyon, 2010.
[16] S. Poljak and J. Rohn. Checking Robust Nonsingularity Is NP-Hard. Math. of Control, Signals, and Systems 6, pages
1–9, 1993.
[17] S.M. Rump. Fast and parallel interval arithmetic. BIT Numerical Mathematics, 39(3):539–560, 1999.
[18] S.M. Rump. Error estimation of floating-point summation and dot product. submitted for publication, 2010.
[19] S.M. Rump. Verification methods: Rigorous results using floating-point arithmetic. Acta Numerica, 19:287–449, 2010.
[20] S.M. Rump, T. Ogita, and S. Oishi. Accurate floating-point summation part I: Faithful rounding. SIAM J. Sci. Comput.,
31(1):189–224, 2008.
[21] A. Takayasu, S. Oishi, and T. Kubo. Guaranteed error estimate for solutions to two-point boundary value problem.
In Proceedings of the International Symposium on Nonlinear Theory and its Applications (NOLTA2009), Sapporo,
Japan, pages 214–217, 2009.
[22] Y. Watanabe. A computer-assisted proof for the Kolmogorov flows of incompressible viscous fluid. Journal of Computational and Applied Mathematics, 223:953–966, 2009.
[23] Y. Watanabe, M. Plum, and M.T. Nakao. A computer-assisted instability proof for the Orr-Sommerfeld problem with
Poiseuille flow. Zeitschrift für Angewandte Mathematik und Mechanik (ZAMM), 89(1):5–18, 2009.