
DUALITY APPLIED TO THE COMPLEXITY
OF MATRIX MULTIPLICATIONS
AND OTHER BILINEAR FORMS†

J. Hopcroft
J. Musinski
Cornell University
Ithaca, New York 14850

†This research was supported by the Office of Naval Research Grant N00014-67-A-0077-0021.

Abstract
The paper considers the complexity of bilinear forms in a noncommutative ring. The dual of a computation is defined and applied to matrix multiplication and other bilinear forms. It is shown that the dual of an optimal computation gives an optimal computation for a dual problem. An nxm by mxp matrix product is shown to be the dual of an nxp by pxm or an mxn by nxp matrix product, implying that each of the matrix products requires the same number of multiplications to compute. Finally an algorithm for computing a single bilinear form over a noncommutative ring with a minimum number of multiplications is derived by considering a dual problem.
Introduction

This paper is concerned with determining the minimum number of multiplications necessary to compute certain bilinear forms over a noncommutative ring. We define the dual of a set of expressions and the dual of a computation in such a manner that the dual of the computation of a set of expressions is a computation for the dual of the expressions. Furthermore, a computation and its dual both use the same number of multiplications. This implies that the minimum number of multiplications necessary to compute a set of expressions is the same as that to compute its dual.

The concept of duality is applied to matrix multiplication. The dual of a set of expressions representing the multiplication of two matrices is a set of expressions representing another matrix multiplication problem where the dimensions of the matrices have been permuted. Thus we are able to show that the minimum number of multiplications necessary to compute an nxm by mxp matrix product is the same as that required to compute an nxp by pxm or an mxn by nxp product. Optimal programs follow from previous results. Dual statements of several interesting theorems are presented. Finally it is shown that Strassen's algorithm for 2x2 by 2x2 matrix multiplication is unique to within a linear transformation.
Definition of a Computation

Let Φ be a commutative ring with a unit element and let Δ be a finite set of indeterminants. Let Φ' be the noncommutative ring obtained by extending Φ by multinomial expressions of the elements of Δ. Throughout this section and the next, F will denote the set of bilinear forms

    { Σ_{j=1}^{m} Σ_{k=1}^{n} c_ijk a_j x_k | 1 ≤ i ≤ p, a_j, x_k ∈ Δ, c_ijk ∈ Φ }.

Similarly a and x will denote the column vectors (a1,a2,...,am)^T and (x1,x2,...,xn)^T.
We consider the notion of a computation (see Ostrowski [4]) as a sequence of instructions f_i = g_i ∘ h_i where ∘ stands for one of the binary operations of multiplication, addition or subtraction. Each f_i is a new variable and each g_i or h_i is either an element of Φ ∪ Δ or a previously computed f_j. A multiplication of two elements of Φ', neither of which is in Φ, is assumed to take one unit of time. All other operations require no time to perform. The motivation for counting only multiplications between elements in Φ' − Φ is that in applications the elements of Δ may be large matrices (Strassen [5]), and thus the scheme is not only mathematically tractable but also reflects the actual computation time within a constant factor. It is well known that without division, computations of bilinear forms can be reduced to computing linear combinations of products of pairs of linear forms. This motivates the following definition of a computation.
Express the set of expressions F as (a^T X)^T* where X is an mxp matrix with elements of the form Σ_{i=1}^{n} c_i x_i, c_i ∈ Φ. A computation of F is an expression of the form M(Pa·Rx) where M, P and R are matrices of dimensions pxq, qxm and qxn whose elements are from Φ, the symbol · indicates element by element multiplication, and M(Pa·Rx) = (a^T X)^T. Since the straightforward method of evaluating M(Pa·Rx) uses q multiplications between elements in Φ' − Φ, the computation is said to have q multiplications.

*By (a^T X)^T we mean the matrix whose ijth element is the jith element of a^T X. Since the elements are from a noncommutative ring rather than a field, (a^T X)^T ≠ X^T a in general.
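As a concrete illustration of this definition (our own sketch, not part of the paper), Strassen's algorithm for a 2x2 by 2x2 product can be written as a computation M(Pa·Rx) with q = 7. Below the ring elements are taken to be 2x2 integer matrices, so that they do not commute; the helper names lincomb and evaluate are ours.

```python
import numpy as np

# Strassen's 2x2 algorithm as a computation M(Pa.Rx) with q = 7 multiplications.
# a and x list the entries of the operands A and X in row order.
P = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]])        # q x m
R = np.array([[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]])          # q x n
M = np.array([[1, 0, 0, 1, -1, 0, 1], [0, 0, 1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0, 0, 0], [1, -1, 1, 0, 0, 1, 0]])     # p x q

def lincomb(coeffs, elems):
    """Linear combination of ring elements with coefficients from the ground ring."""
    return sum(c * e for c, e in zip(coeffs, elems))

def evaluate(M, P, R, a, x):
    """Evaluate M(Pa.Rx); only the q products (Pa)_l (Rx)_l count as multiplications."""
    Pa = [lincomb(row, a) for row in P]
    Rx = [lincomb(row, x) for row in R]
    products = [u @ v for u, v in zip(Pa, Rx)]        # q ring multiplications
    return [lincomb(row, products) for row in M]

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, (2, 2, 2, 2))   # A[i][j] is itself a 2x2 integer matrix
X = rng.integers(-3, 4, (2, 2, 2, 2))
a = [A[0, 0], A[0, 1], A[1, 0], A[1, 1]]
x = [X[0, 0], X[0, 1], X[1, 0], X[1, 1]]
f = evaluate(M, P, R, a, x)
# Compare with the straightforward 8-multiplication product, entry by entry.
expected = [a[0] @ x[0] + a[1] @ x[2], a[0] @ x[1] + a[1] @ x[3],
            a[2] @ x[0] + a[3] @ x[2], a[2] @ x[1] + a[3] @ x[3]]
assert all(np.array_equal(u, v) for u, v in zip(f, expected))
```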
Duality
This section defines the dual of a set of bilinear forms and the dual of a computation. It is then shown that the dual of any computation of F computes the dual of F.

Let b be the column vector (b1,b2,...,bp)^T, b_i ∈ Δ. The left dual of F is the system of equations given by (b^T X^T)^T. Let M(Pa·Rx) = (a^T X)^T be a computation of F. The P-dual of the computation is the computation P^T(M^Tb·Rx).

Lemma 1: The P-dual of any computation of a system of expressions F computes the left dual of F.

Proof: Let M(Pa·Rx) = (a^T X)^T be a computation of F. We must show that P^T(M^Tb·Rx) is a computation of (b^T X^T)^T. Let D be a diagonal matrix whose diagonal elements are the elements of the column vector Rx. Then (M(Pa·Rx))^T = (Pa)^T D M^T. Since the elements of P commute with the elements of a, (Pa)^T = a^T P^T. Now a^T P^T D M^T = a^T X for all a implies P^T D M^T = X, which in turn implies b^T M D P = b^T X^T for all b. Thus (M^Tb)^T D P = b^T X^T, implying P^T(M^Tb·Rx) = (b^T X^T)^T.
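As a quick sanity check of Lemma 1 (a toy instance of our own, not from the paper), take m = n = 2, p = 1 and the one-multiplication computation with M = [1], P = [1 1], R = [1 1], which computes F = (a1+a2)(x1+x2); here X is the 2x1 matrix both of whose entries are x1+x2. The sketch below verifies with noncommuting symbols that the P-dual P^T(M^Tb·Rx) computes the left dual (b^T X^T)^T.

```python
import sympy as sp

a1, a2, b1 = sp.symbols('a1 a2 b1', commutative=False)
x1, x2 = sp.symbols('x1 x2', commutative=False)

M = sp.Matrix([[1]])        # p x q
P = sp.Matrix([[1, 1]])     # q x m
R = sp.Matrix([[1, 1]])     # q x n
a, x, b = sp.Matrix([a1, a2]), sp.Matrix([x1, x2]), sp.Matrix([b1])
X = sp.Matrix([x1 + x2, x1 + x2])      # m x p matrix with (a^T X)^T = M(Pa.Rx)

Pa, Rx, Mb = P * a, R * x, M.T * b
F = M * sp.Matrix([Pa[i] * Rx[i] for i in range(len(Rx))])        # the computation
G = P.T * sp.Matrix([Mb[i] * Rx[i] for i in range(len(Rx))])      # its P-dual

assert (F - (a.T * X).T).expand() == sp.zeros(1, 1)       # computes F
assert (G - (b.T * X.T).T).expand() == sp.zeros(2, 1)     # P-dual computes the left dual
```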
In a similar manner the system of expressions F can be expressed as Ax where A is a pxn matrix with elements of the form Σ_{i=1}^{m} c_i a_i, c_i ∈ Φ. The right dual of F is the system of equations given by A^Tb. If M(Pa·Rx) is a computation of F, then the R-dual of the computation is the computation R^T(Pa·M^Tb).

Lemma 2: The R-dual of any computation of a system of expressions F computes the right dual of F.

Proof: The proof is analogous to that of Lemma 1.
Theorem 3: There is a computation for the system of expressions computed by M(Pa·Rx) with q multiplications if and only if there is a computation with q multiplications for each of the systems of expressions computed by P^T(M^Tb·Rx), R^T(Pa·M^Tb), P^T(Rx·M^Tb), R^T(M^Tb·Pa), and M(Rx·Pa).

Proof: The result follows from the fact that a computation, its R-dual and its P-dual each have the same number of multiplications.
Let M(Pa·Rx) be a computation of F and let c be a column vector such that M(Pa·Rx) = c. Let T, U, V be pxp, mxm, nxn matrices respectively with elements from Φ. A transformation of a vector c of bilinear forms is the result of replacing each element of a and x by the corresponding elements of Ua, Vx in Tc. A transformation of the computation M(Pa·Rx) is the computation TM(PUa·RVx).

Lemma 4: The transformation of a computation of c is a computation of the transformation of c.

Corollary 5: If c' is a transformation of c, then c' can be computed in q multiplications if c can be computed in q multiplications. If T, U and V are nonsingular and c' can be computed in q multiplications, then c can be computed in q multiplications.
Matrix Multiplication

Let A, B and C be mxn, nxp and mxp matrices whose elements are from Δ. We will show that there is a computation of AB with q multiplications if and only if there are computations for A^TC, B^TA^T, BC^T, C^TA, CB^T with q multiplications. In other words the number of multiplications needed to compute the product of an mxn matrix with an nxp matrix is the same as that required to compute an nxm by mxp, pxn by nxm, etc. If one uses the ordinary algorithms which require nmp multiplications then the result is not surprising. However, the result claims that no matter what method is used the minimum number of multiplications is the same.

Let a, b, c be column vectors whose elements are those of A, B and C respectively in row order (e.g. a = (a11, a12,..., a1n, a21,..., amn)^T). The ijth element of AB is Σ_{k=1}^{n} a_ik b_kj. Therefore, there exist matrices M, P and R of dimensions mp x q, q x mn and q x np whose elements are from Φ such that M(Pa·Rb) is a computation for AB.
Theorem 6: The following statements are equivalent.
a) M(Pa·Rb) computes AB in row order using q multiplications.
b) P^T(M^Tc·Rb) computes CB^T in row order using q multiplications.
c) R^T(M^Tc·Pa) computes (C^TA)^T in row order using q multiplications.
d) M(Rb·Pa) computes (B^TA^T)^T in row order using q multiplications.
e) P^T(Rb·M^Tc) computes (BC^T)^T in row order using q multiplications.
f) R^T(Pa·M^Tc) computes A^TC in row order using q multiplications.

Proof: We will prove only that (a) => (b). Let D_B be the mn by mp matrix

    [ B 0 ... 0 ]
    [ 0 B ... 0 ]
    [ .   .   . ]
    [ 0 0 ... B ]

M(Pa·Rb) computes AB in row order implies that M(Pa·Rb) = (a^T D_B)^T by definition of a computation. This in turn implies P^T(M^Tc·Rb) = (c^T D_B^T)^T by Lemma 1. Thus P^T(M^Tc·Rb) computes CB^T in row order.

Corollary 7: The minimum number of multiplications required to multiply mxn by nxp matrices without using commutativity is the same as to multiply nxm by mxp, nxp by pxm, pxm by mxn, pxn by nxm, or mxp by pxn.
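As a numerical illustration (our own sketch, not from the paper), the check below starts from Strassen's 2x2 by 2x2 computation M(Pa·Rb) and confirms statements (a), (b) and (c) of Theorem 6 on random integer matrices; the helper name compute is ours.

```python
import numpy as np

P = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]])
R = np.array([[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]])
M = np.array([[1, 0, 0, 1, -1, 0, 1], [0, 0, 1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0, 0, 0], [1, -1, 1, 0, 0, 1, 0]])

def compute(M, P, R, u, v):
    """Evaluate M(Pu.Rv); the entries of (Pu)*(Rv) are the q multiplications."""
    return M @ ((P @ u) * (R @ v))

rng = np.random.default_rng(1)
A, B, C = (rng.integers(-5, 6, (2, 2)) for _ in range(3))
a, b, c = A.reshape(4), B.reshape(4), C.reshape(4)      # row order

assert np.array_equal(compute(M, P, R, a, b), (A @ B).reshape(4))           # (a)  AB
assert np.array_equal(compute(P.T, M.T, R, c, b), (C @ B.T).reshape(4))     # (b)  CB^T
assert np.array_equal(compute(R.T, M.T, P, c, a), (C.T @ A).T.reshape(4))   # (c)  (C^T A)^T
```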
Theorem 6 leads to new algorithms for multiplying various size matrices together. Some of the new algorithms are optimal, others are at least improvements over the best currently known. For example, in [3] it is shown that ⌈(3pn+max(n,p))/2⌉ multiplications is sufficient for px2 by 2xn matrix multiplication. It follows that ⌈(3pn+max(n,p))/2⌉ multiplications are sufficient for 2xp by pxn matrix multiplication. Since ⌈7n/2⌉ multiplications are necessary and sufficient for 2x2 by 2xn matrix multiplication [3], it follows that ⌈7n/2⌉ multiplications are necessary and sufficient for 2xn by nx2 matrix multiplication. Similarly, since 15 multiplications are necessary and sufficient for 3x2 by 2x3 matrix multiplication, 15 multiplications are necessary and sufficient for 3x3 by 3x2 matrix multiplication.

The number of multiplications necessary to compute the product of two 3x3 matrices is an interesting open problem. If 21 or fewer multiplications are sufficient, then the asymptotic growth rate of Strassen's method [5] could be improved. An examination of 3x2 by 2x3 and 3x3 by 3x2 matrix multiplication may shed some insight on the development of an algorithm for 3x3 by 3x3 matrix multiplication. Let A, X, C and Y be 3x2, 2x3, 3x3 and 3x2 matrices whose elements are from Δ. Then

    AX = [ m1+m2              -m2-m3+m7-m8         -m1-m5-m13+m15 ]
         [ -m1-m4+m8-m9        m3+m4               -m3-m6+m11-m12 ]
         [ -m2-m6+m13-m14     -m4-m5+m10-m11        m5+m6         ]

where

    m1 = (a11-a12)x11              m6 = (-a31+a32)x23                     m11 = (a22-a31+a32)(x12+x13+x23)
    m2 = a12(x11+x21)              m7 = (a11+a21)(x11+x12+x21+x22)        m12 = (-a21+a22-a31+a32)(x12+x13)
    m3 = a21x12                    m8 = (a11-a12+a21)(x11+x21+x22)        m13 = (a12+a31)(x11-x23)
    m4 = a22x22                    m9 = (a11-a12+a21-a22)(x21+x22)        m14 = (-a12-a32)(x21+x23)
    m5 = a31(x13+x23)              m10 = (a22+a32)(x12+x13+x22+x23)       m15 = (a11+a31)(x11+x13)
If M, P and R are the 9x15, 15x6 and 15x6 matrices of 0's, 1's and -1's determined by m1,...,m15 above (the rows of P and R give the left and right factors of each m_i, and the rows of M express the entries of AX in terms of the m_i), then M(Pa·Rx) computes AX in row order with 15 multiplications. By Theorem 6, P^T(M^Tc·RVy) is an optimal algorithm for CY (V is the permutation matrix sending the entries of Y in row order to the entries of Y^T in row order). Thus

    CY = [ n1+n7+n8+n9+n15           -n1+n2-n8-n9+n13-n14 ]
         [ n3+n7+n8+n9-n12            n4-n9+n10+n11+n12   ]
         [ n5-n6-n11-n12+n13+n15      n6+n10+n11+n12-n14  ]

where

    n1 = (c11-c13-c21)y11          n6 = (-c23-c31+c33)y32          n11 = (c23-c32)(y21+y31+y32)
    n2 = (c11-c12-c31)(y11+y12)    n7 = c12(y11+y12+y21+y22)       n12 = -c23(y21+y31)
    n3 = (-c12+c22-c23)y21         n8 = (-c12+c21)(y11+y12+y22)    n13 = (-c13+c31)(y11-y32)
    n4 = (-c21+c22-c32)y22         n9 = -c21(y12+y22)              n14 = -c31(y12+y32)
    n5 = (-c13-c32+c33)(y31+y32)   n10 = c32(y21+y22+y31+y32)      n15 = c13(y11+y31)
The algorithm for AX is the union of three optimal algorithms that compute

    [a11 a12] [x11 x12]      [a11 a12] [x11 x13]           [a21 a22] [x12 x13]
    [a21 a22] [x21 x22]  ,   [a31 a32] [x21 x23]  ,  and   [a31 a32] [x22 x23]

respectively, such that each diagonal component of AX is computed with exactly two multiplications. Furthermore, both algorithms computing a given diagonal component compute it with the same two multiplications. Each of the three algorithms uses seven multiplications, but each pair of algorithms has two multiplications in common. Thus only 15 multiplications are used in computing AX.
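The identities above can be checked mechanically. The following sketch (ours, not part of the paper) verifies symbolically, with noncommuting symbols, that the fifteen products m1,...,m15 combine as claimed to give every entry of AX.

```python
import sympy as sp

a11, a12, a21, a22, a31, a32 = sp.symbols('a11 a12 a21 a22 a31 a32', commutative=False)
x11, x12, x13, x21, x22, x23 = sp.symbols('x11 x12 x13 x21 x22 x23', commutative=False)
A = sp.Matrix([[a11, a12], [a21, a22], [a31, a32]])
X = sp.Matrix([[x11, x12, x13], [x21, x22, x23]])

m1  = (a11 - a12) * x11
m2  = a12 * (x11 + x21)
m3  = a21 * x12
m4  = a22 * x22
m5  = a31 * (x13 + x23)
m6  = (-a31 + a32) * x23
m7  = (a11 + a21) * (x11 + x12 + x21 + x22)
m8  = (a11 - a12 + a21) * (x11 + x21 + x22)
m9  = (a11 - a12 + a21 - a22) * (x21 + x22)
m10 = (a22 + a32) * (x12 + x13 + x22 + x23)
m11 = (a22 - a31 + a32) * (x12 + x13 + x23)
m12 = (-a21 + a22 - a31 + a32) * (x12 + x13)
m13 = (a12 + a31) * (x11 - x23)
m14 = (-a12 - a32) * (x21 + x23)
m15 = (a11 + a31) * (x11 + x13)

AX = sp.Matrix([
    [m1 + m2,              -m2 - m3 + m7 - m8,     -m1 - m5 - m13 + m15],
    [-m1 - m4 + m8 - m9,    m3 + m4,               -m3 - m6 + m11 - m12],
    [-m2 - m6 + m13 - m14, -m4 - m5 + m10 - m11,    m5 + m6]])

# Each of the nine entries reduces to the corresponding entry of A*X.
assert (AX - A * X).expand() == sp.zeros(3, 3)
print("15 multiplications suffice for a 3x2 by 2x3 product")
```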
The algorithm for CY is the dual of the algorithm for AX. Thus, there is a dual construction for it. This construction is described briefly below and followed by an example.

Let W be the 3x2 matrix such that W = CY. Construct optimal algorithms that compute

    [c11 c12] [y11 y12]   [w11 - c13y31   w12 - c13y32]
    [c21 c22] [y21 y22] = [w21 - c23y31   w22 - c23y32]  ,

    [c11 c13] [y11 y12]   [w11 - c12y21   w12 - c12y22]
    [c31 c33] [y31 y32] = [w31 - c32y21   w32 - c32y22]  ,  and

    [c22 c23] [y21 y22]   [w21 - c21y11   w22 - c21y12]
    [c32 c33] [y31 y32] = [w31 - c31y11   w32 - c31y12]

such that each cii appears in exactly two linear combinations which are left hand sides of multiplications in each of the two algorithms involving cii. Furthermore, if α and β are the right hand sides of the two multiplications in one algorithm, then α and β are the right hand sides in the other. Each pair of multiplications with the same right hand side whose left hand sides contain cii are merged by the formula

    merge((cii + ℓ1)α, (cii + ℓ2)α) = (cii + ℓ1 + ℓ2)α

where ℓ1 and ℓ2 are linear combinations of the components of C. Each of the three original algorithms contains seven multiplications and between each pair of algorithms, two pairs of multiplications are merged. Thus the composite algorithm uses 15 multiplications in computing CY.
The following example should clarify the above description.
Example
    [c11 c12] [y11 y12]   [n1-n4+n6-n7       n1+n2        ]
    [c21 c22] [y21 y22] = [-n1+n4+n5+n7     -n1+n3+n5+n7  ]

    [c11 c13] [y11 y12]   [n8+n9              n8-n11+n13-n14 ]
    [c31 c33] [y31 y32] = [-n8+n10+n12+n14   -n8+n11+n12+n14 ]

    [c22 c23] [y21 y22]   [n15+n16            n16+n18+n20+n21]
    [c32 c33] [y31 y32] = [-n15+n17+n19+n21   n17-n18        ]

where

    n1 = c12(y12+y22)             n8 = c13(y11+y31)               n15 = c23(y21+y31)
    n2 = (c11-c12)y12             n9 = (c11-c13)y11               n16 = (c22-c23)y21
    n3 = (c21-c22)(y21-y22)       n10 = (c31-c33)(-y31+y32)       n17 = (-c32+c33)y32
    n4 = c21(y11-y12+y21-y22)     n11 = c31(-y11+y12-y31+y32)     n18 = c32(-y22-y32)
    n5 = (c12+c22)y21             n12 = (c13+c33)y32              n19 = (-c23-c33)(-y31+y32)
    n6 = (c11+c21)y11             n13 = (c11+c31)y12              n20 = (-c22-c32)(y21-y22)
    n7 = (c12+c21)(y12-y21+y22)   n14 = (c13+c31)(y11+y31-y32)    n21 = (c23+c32)(y21+y32)

Then

    CY = [ m1-m4+m6-m7+m8             m1+m2+m8-m10-m12     ]
         [ -m1+m4+m5+m7+m13           -m1+m3+m5+m7+m14+m15 ]
         [ -m8+m9+m11+m12-m13+m15     -m8+m10+m11+m12-m14  ]

where

    m1 = n1                                         m9 = merge(n10,n19) = (-c23+c31-c33)(-y31+y32)
    m2 = merge(n2,n13) = (c11-c12+c31)y12           m10 = n11
    m3 = merge(n3,n20) = (c21-c22-c32)(y21-y22)     m11 = merge(n12,n17) = (c13-c32+c33)y32
    m4 = n4                                         m12 = n14
    m5 = merge(n5,n16) = (c12+c22-c23)y21           m13 = n15
    m6 = merge(n6,n9) = (c11-c13+c21)y11            m14 = n18
    m7 = n7                                         m15 = n21
    m8 = n8
It is hoped that the techniques used above to construct algorithms for 3x2 by 2x3
and 3x3 by 3x2 matrix multiplication can be applied toward developing an optimal algorithm for 3x3 by 3x3 matrix multiplication. To date no algorithm for the latter using
less than 24 multiplications has been found. However, there is no indication that 24 multiplications is the minimum.
Let D be a 3x3 matrix with elements from Δ. Then CD can be computed with 24 multiplications by partitioning the problem into a 3x2 by 2x3 and a 3x1 by 1x3 matrix multiplication problem, or by partitioning the problem into a 3x3 by 3x2 and a 3x3 by 3x1 problem. These two partitions result in dual computations for 3x3 by 3x3 matrix multiplication. A third computation, also with 24 multiplications, can be obtained by using both of the above combining techniques. In this case find optimal algorithms that compute
    [c11 c12] [d11 d12]      [c11 c13] [d11 d13]           [c22 c23] [d22 d23]
    [c21 c22] [d21 d22]  ,   [c31 c33] [d31 d33]  ,  and   [c32 c33] [d32 d33]

so that there are three pairs of multiplications used in computing the diagonal elements of CD such that the two multiplications in each pair are either the same multiplication or can be merged into a single multiplication. Thus 18 multiplications are used. To these are added the six multiplications c13d32, c12(d23-d11), (c23+c13)d31, (c21+c31)d13, c31d12, and c32(d21-d33). The computation of CD minus the above six multiplications is illustrated below.

    m1 = (c11-c12)d11                   m8 = (c11-c13)d11              m15 = c32(d23+d33)
    m2 = c12(d21+d11)                   m9 = c13(d31+d11)              m16 = (-c32+c33)d33
    m3 = c21d12                         m10 = c31(d13+d33)             m17 = c23d32
    m4 = c22d22                         m11 = (-c31+c33)d33            m18 = c22d22
    m5 = (c11+c21)(d11+d12+d21+d22)     m12 = (c11+c31)(d11+d33)       m19 = (c23+c33)(-d22-d23-d32-d33)
    m6 = (c11-c12+c21-c22)(d21+d22)     m13 = (-c13-c33)(d31+d33)      m20 = (-c22+c23-c32+c33)(-d22-d23)
    m7 = (c11-c12+c21)(d11+d21+d22)     m14 = (c13+c31)(d11-d33)       m21 = (c23-c32+c33)(-d22-d23-d33)

    y11 = merge(m1,m8) + m2 + m9                 y23 = -m10 - m18 + m20 - m21 - merge(m11,m16)
    y12 = -m2 - m3 + m5 - m7                     y31 = -m9 - m13 + m14 - merge(m11,m16)
    y13 = -m10 + m12 - m14 - merge(m1,m8)        y32 = -m15 - m17 - m19 + m21
    y21 = -m4 - m6 + m7 - m9 - merge(m1,m8)      y33 = merge(m11,m16) + m10 + m15
    y22 = m3 + m4 + m17
In addition to helping find optimal (or better) algorithms for matrix multiplication, Theorem 6 or its more general form, Theorem 3, can be applied to previously published theorems to yield new results. For example, the following appear in [3]. For sake of simplicity the theorems are expressed for Φ being the integers. Some of the theorems are more general.

A set of vectors v1,v2,...,vp with elements from Φ' are nondependent if Σ_{i=1}^{p} c_i v_i being a vector with elements from Φ, each c_i an element of Φ, implies each c_i = 0. Since an expression can be considered to be a one dimensional vector, the notion of nondependence applies also to expressions.
Lemma 8: (Winograd) Let A be an mxn matrix whose elements are from Φ and let x = (x1,x2,...,xn)^T where x_i ∈ Δ. If A has p nondependent columns, then any computation of Ax requires at least p multiplications.
Lemma 9: Let Φ be a field and let F = {f1,...,fk,...,fp} be a set of expressions, where f1,...,fk are nondependent and each can be expressed as a single product. If F can be computed with q multiplications, then there exists an algorithm for F with q multiplications in which k of the multiplications are f1,...,fk.

Lemma 10: Let A and X be 2x2 and 2xn matrices respectively whose elements are from Δ. If an algorithm for computing AX has k multiplications of forms a11α, (a12+a21)β, and (a11+a12+a21)γ, then the algorithm requires at least 3n + k multiplications.

Corollary 11: Let T be the group of transformations generated by the set of transformations which:
(1) interchange the two rows of A, two columns of X, or the two columns of A and the two rows of X;
(2) either add (subtract) row i of A to row j of A, column i of X to column j of X, or add (subtract) column i of A to column j of A and simultaneously subtract (add) row j of X to row i of X.
By applying transformations in T we have similar theorems for
(a) (a11+a21)α, (a12+a21+a22)β, (a11+a12+a22)γ
(b) (a11+a12)α, (a12+a21+a22)β, (a11+a21+a22)γ
(c) (a11+a12+a21+a22)α, (a12+a21)β, (a11+a22)γ
(d) a21α, (a11+a22)β, (a11+a21+a22)γ
(e) (a21+a22)α, (a11+a12+a22)β, (a11+a12+a21)γ
(f) a12α, (a11+a22)β, (a11+a12+a22)γ
(g) (a12+a22)α, (a11+a21+a22)β, (a11+a12+a21)γ
(h) a22α, (a12+a21)β, (a12+a21+a22)γ

Lemma 12: (Fiduccia) Let A and X be 2x2 and 2xn matrices respectively whose elements are from Δ. Any algorithm for computing AX which has k multiplications of types a11α, a12β, and (a11+a12)γ has at least 3n+k/2 multiplications.

Corollary 13: By transformations from T we also have similar theorems for a21α, a22β, (a21+a22)γ and for (a11+a21)α, (a12+a22)β, (a11+a12+a21+a22)γ.

Applying Theorems 3 and 6 to each of the above yields several new theorems. However, only one new theorem for each will be presented. The others are similar.

Lemma 14: Let A be an nxm matrix whose elements are from Φ and let x be an arbitrary vector. If A has p nondependent rows, then any algorithm computing Ax requires at least p multiplications.
Lemma 15: Let Φ be a field and F be the set of expressions

    { Σ_{j=1}^{m} c_ij a_j B_ij | 1 ≤ i ≤ p; a_j ∈ Δ, c_ij ∈ Φ; B_ij = Σ_{k=1}^{n} d_ijk x_k, d_ijk ∈ Φ, x_k ∈ Δ }

where B_1j = ... = B_pj, 1 ≤ j ≤ t. Let B be the pxt matrix whose ijth element is c_ij B_ij. If F can be computed with q multiplications and B has t nondependent columns, then F can be computed with q multiplications in which a1,...,at appear in exactly one multiplication each and that multiplication has the form (a_j + ℓ_j)B_1j where

    ℓ_j = Σ_{i=t+1}^{m} ℓ_ij a_i,   ℓ_ij ∈ Φ,   1 ≤ j ≤ t.
Lemma 16: Let A and X be 2xn and nx2 matrices respectively whose elements are from Δ. If an algorithm for computing AX = Y has k multiplications that are used only in computing y11, or only in computing y12 and y21, or only in computing y11, y12, and y21, then the algorithm requires 3n+k multiplications.

Corollary 17: By applying transformations in T we have similar theorems for
(a) y11 and y21; y12, y21 and y22; y11, y12 and y22
(b) y11 and y12; y12, y21 and y22; y11, y21 and y22
(c) y11, y12, y21 and y22; y12 and y21; y11 and y22
(d) y21; y11 and y22; y11, y21 and y22
(e) y21 and y22; y11, y12 and y22; y11, y12 and y21
(f) y12; y11 and y22; y11, y12 and y22
(g) y12 and y22; y11, y21 and y22; y11, y12 and y21
(h) y22; y12 and y21; y12, y21 and y22

Lemma 18: Let A and X be 2xn and nx2 matrices whose elements are from Δ. Any algorithm for computing AX = Y which has k multiplications used only in computing y11, y12 or both has at least 3n+k/2 multiplications.

Corollary 19: By transformations we also have theorems for
(a) y21; y22; y21 and y22;
(b) y11 and y21; y12 and y22; y11, y12, y21 and y22.
Instead of using Theorems 3 and 6 to prove the above we can construct "dual" proofs.
As an example, we will present a proof for Lemma 18, the dual of Lemma 12.
Proof of Lemma 18: We first state some results without proofs.
(1) Let a1,...,ap be n-vectors whose elements are of the form Σ_{i=1}^{n} c_i x_i, c_i ∈ Φ and x_i ∈ Δ. Then a1,...,ap are nondependent if and only if they are linearly independent.
(2) Let C and D be mxn matrices whose elements are from Φ'. If C and D have k1 and k2 nondependent columns respectively then C+D has at most k1+k2 nondependent columns.
(3) Let C be a 2xn matrix whose elements are from Φ'. If C has k nondependent columns, then row 1 or row 2 of C has at least k/2 nondependent elements.

Let Q be an algorithm for computing AX = Y and let m1,...,mk be the k multiplications of Q which are assumed to be used only in computing y11, y12 or both. Then y11 = M1 + F1 and y12 = M2 + F2 where M1 and M2 are sums of the m1,...,mk and F1 = y11 - M1 and F2 = y12 - M2. Without loss of generality we can assume

    [M1]            [F1]
    [M2] = Gx  and  [F2] = Hx

where x = [x11,x12,...,xn1,xn2]^T and G and H are 2x2n matrices whose elements are of the form Σ_{i=1}^{n} Σ_{j=1}^{2} c_ij a_ij, c_ij ∈ Φ. Let G' and H' be the matrices resulting when we set a21 = ... = a2n = 0. G' has at most k nondependent columns by Lemma 8. Since

    [y11]
    [y12] = (G' + H')x,

G' + H' must have 2n nondependent columns. Hence by (2) H' has at least 2n-k nondependent columns and by (3) row i, i is 1 or 2, has at least n-k/2 nondependent elements. Therefore, by (1) H has n-k/2 elements in row i of the form

    Σ_{j=1}^{n} c_j a_1j + Σ_{k=1}^{n} d_k a_2k,   c_j, d_k ∈ Φ,

such that the Σ_{j=1}^{n} c_j a_1j parts of each element are linearly independent. Assume Φ is a field. We can remove n-k/2 multiplications from Q by
(1) removing m1,...,mk, and
(2) equating an appropriate choice of n-k/2 elements in row i of H to zero and solving for n-k/2 a_1j's.
The new algorithm computes y21 and y22, which requires 2n multiplications. Hence Q must have had at least 3n+k/2 multiplications. If Φ is not a field 3n+k/2 is still a lower bound.
Theorems 3 and 6 and the preceding lemmas lead to the following lemmas for 2x3 by 3xn matrix multiplications (and hence 2xn by nx3, 3x2 by 2xn, 3xn by nx2, nx2 by 2x3, and nx3 by 3x2 matrix multiplications).

Lemma 20: Let A and X be 2x3 and 3xn matrices respectively whose elements are from Δ. Any algorithm for computing AX which has k multiplications of forms a11α, a12β, (a11+a12)γ, a13δ, (a11+a13)ε, (a12+a13)ζ, and (a11+a12+a13)η has at least 4n+2k/3 multiplications.

Proof: Similar to Lemma 12.

Corollary 21: Extend the definition of T in Corollary 11 in the obvious way to 2x3 by 3xn matrix multiplication. Then by transformations in T we have similar results for
(a) a21α, a22β, (a21+a22)γ, a23δ, (a21+a23)ε, (a22+a23)ζ, (a21+a22+a23)η
(b) (a11+a21)α, (a12+a22)β, (a11+a12+a21+a22)γ, (a13+a23)δ, (a11+a13+a21+a23)ε, (a12+a13+a22+a23)ζ, (a11+a12+a13+a21+a22+a23)η.

Corollary 22: If n=3 and Q is an optimal algorithm for computing AX, then k ≤ 4.

Corollary 23: Let n=3 and let Q be an optimal algorithm for computing AX. Let S_A be the set of all multiplications in Q that begin with a11, a12, a11+a12, a13, a11+a13, a12+a13, a11+a12+a13, a21, a22, a21+a22, a23, a21+a23, a22+a23, a21+a22+a23, a11+a21, a12+a22, a11+a12+a21+a22, a13+a23, a11+a13+a21+a23, a12+a13+a22+a23, a11+a12+a13+a21+a22+a23. Then
(i) at most 12 multiplications of Q are in S_A.
(ii) no two multiplications of Q that are in S_A have the same left hand side.

Proof:
(i) Follows from Corollary 22.
(ii) Suppose some multiplication of Q is in S_A. By applying transformations from T we can assume without loss of generality that multiplication has the form a11α. Set a11 = 0. This removes at least one multiplication from Q. Q now computes

    [0   a12 a13] [x11 x12 x13]
    [a21 a22 a23] [x21 x22 x23]          (1)
                  [x31 x32 x33]

By Lemma 15 we can assume that setting a21 = 0 causes three multiplications to disappear. The resulting computation is a 2x2 by 2x3 matrix multiplication which requires 11 multiplications. Thus if setting a11 = 0 removed more than one multiplication, Q must originally have had 16 multiplications and hence was not optimal. Therefore Q had only one multiplication of form a11α.
For the remainder of this discussion let Φ = Z2, the integers modulo 2. We will conclude this section on matrix multiplication by showing that Strassen's algorithm for 2x2 by 2x2 matrix multiplication is unique to within a transformation of T (as defined in Corollary 11). That is, every optimal algorithm for 2x2 by 2x2 matrix multiplication can be obtained from any given optimal algorithm by applying a transformation of T to the latter.

Lemma 24: Let A and X be 2x2 matrices whose elements are from Δ. Let a = [a11,a12,a21,a22]^T and x = [x11,x12,x21,x22]^T. Let M, P, R be 4x7, 7x4, and 7x4 matrices respectively whose elements are from Φ such that M(Pa·Rx) computes AX in row order. For fixed P and R, M is unique.

Proof: Assume M(Pa·Rx) = M'(Pa·Rx) where M' is a 4x7 matrix whose elements are from Φ and M ≠ M'. Then there exists an equation m1 + ... + mk = 0, k ≥ 1, where m_i is an entry of the column vector Pa·Rx. Thus m1 can be replaced by -(m2 + m3 + ... + mk), implying that AX can be computed with 6 multiplications. In [3] it is shown that 7 multiplications are required. Therefore, M is unique.

Theorem 25: Any optimal algorithm for 2x2 by 2x2 matrix multiplication is unique to within a transformation of T.

Proof: Let Q be an optimal algorithm for 2x2 by 2x2 matrix multiplication, so that Q uses 7 multiplications. Divide the multiplications of Q into two disjoint sets S_A and S_B, where the multiplications in S_A have left hand sides which can be mapped onto a11 by a transformation in T and the multiplications in S_B have left hand sides which can be mapped onto a11 + a22 by a transformation in T. In [3] it is shown that an optimal algorithm must have six multiplications from S_A and one from S_B. Since any element of S_B can be mapped to any other element of S_B by a transformation in T, we can assume without loss of generality that Q has a multiplication of the form (a11+a22)α1. Lemmas 10 and 12 tell us that the remaining multiplications have forms (a12+a22)α2, (a11+a21)α3, a22α4, (a11+a12)α5, (a21+a22)α6, a11α7. Since the transformations of T preserve AX, any transformation that sends a11+a22 into itself will send the set of remaining left hand sides into itself. Thus we can assume that

    Pa = [(a11+a22), (a12+a22), (a11+a21), a22, (a11+a12), (a21+a22), a11]^T.

By the same reasoning and using duals of Lemmas 10 and 12, we can conclude that the right hand sides of the multiplications of Q must be a transformation of {x11+x22, x21+x22, x11+x12, x22, x11+x21, x12+x22, x11}. Since for any two sets of possible right hand sides there exists a transformation in T that sends one to the other without changing the set of left hand sides corresponding to the former, we can assume without loss of generality that

    WRx = [(x11+x22), (x21+x22), (x11+x12), x22, (x11+x21), (x12+x22), x11]^T

where W is a 7x7 permutation matrix. We need only show that R is unique. Somehow we must form the product a12x21. Hence one of the four multiplications (a12+a22)(x21+x22), (a12+a22)(x11+x21), (a11+a12)(x21+x22), and (a11+a12)(x11+x21) must be present. Assume (a12+a22)(x11+x21) is in Q. Then (a11+a12)(x11+x22) must also be in Q, since this is the only way to cancel the product a12x11 from (a12+a22)(x11+x21) and to introduce the term a12x22. However, we cannot obtain a12x21 and a12x22 in separate expressions. Thus (a12+a22)(x11+x21) is not in Q. Similar arguments eliminate (a11+a12)(x21+x22) and (a11+a12)(x11+x21), leaving (a12+a22)(x21+x22). Considering products involving a12, a21, x12, x21 we find that (a12+a22)(x21+x22), (a11+a21)(x11+x12), a22(x11+x21), (a11+a12)x22, (a21+a22)x11, a11(x12+x22) are in Q. This leaves (a11+a22) to match with (x11+x22). Since the left and right hand sides can match up only one way, R is unique. Thus by Lemma 24 M is unique and thus the algorithm is unique to within a transformation of T.
General Expressions

Let a1,...,am, x1,...,xn, d be in Δ and let c11,c12,...,cmn be in Φ. Let a = [a1,...,am]^T, x = [x1,...,xn]^T and d = [d]. In this section we develop an effective procedure which will yield an optimal algorithm for computing a single expression

    Σ_{i=1}^{m} Σ_{j=1}^{n} c_ij a_i x_j.

Vari [6] accomplishes the above provided that Σ_{i=1}^{m} Σ_{j=1}^{n} c_ij a_i x_j = 0 if and only if a_i = x_j = 0 for all i,j. Vari has subsequently removed this condition. Using Theorem 3 we give a second proof.

Theorem 26: There exists an effective procedure which yields an optimal algorithm for computing Σ_{i=1}^{m} Σ_{j=1}^{n} c_ij a_i x_j.

Proof: Theorem 3 tells us that an optimal algorithm for computing Σ_{i=1}^{m} Σ_{j=1}^{n} c_ij a_i x_j is the P-dual of an optimal algorithm for computing the set of expressions

    S = { Σ_{j=1}^{n} c_ij d x_j | i = 1,...,m }.

The minimum number of multiplications needed to compute this set of expressions equals the maximum number of nondependent expressions in the set { Σ_{j=1}^{n} c_ij x_j | i = 1,...,m }. Clearly, then, we can find matrices M, P, R of appropriate dimensions such that M(Pd·Rx) computes S with the minimum number of multiplications. Then P^T(M^Ta·Rx) computes the expression Σ_{i=1}^{m} Σ_{j=1}^{n} c_ij a_i x_j with the minimum number of multiplications.
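The construction can be illustrated concretely when Φ is a field. The sketch below (our own illustration, taking Φ to be the rationals, not the paper's procedure verbatim) uses a rank factorization C = UV, so that a^T C x = Σ_k (column k of U · a)(row k of V · x) is computed with rank(C) multiplications, which over a field equals the number of nondependent expressions Σ_j c_ij x_j; the helper name bilinear_algorithm is ours.

```python
import sympy as sp

def bilinear_algorithm(C):
    """Return rank(C) products (left linear form, right linear form) whose sum is a^T C x."""
    C = sp.Matrix(C)
    m, n = C.shape
    a = sp.Matrix(sp.symbols(f'a1:{m + 1}', commutative=False))
    x = sp.Matrix(sp.symbols(f'x1:{n + 1}', commutative=False))
    reduced, pivots = C.rref()
    U = C[:, list(pivots)]              # m x r, the pivot columns of C
    V = reduced[:len(pivots), :]        # r x n, nonzero rows of the reduced form
    assert U * V == C                   # rank factorization
    pairs = [((U[:, k].T * a)[0, 0], (V[k, :] * x)[0, 0]) for k in range(len(pivots))]
    total = sum(lf * rf for lf, rf in pairs)
    assert sp.expand(total - (a.T * C * x)[0, 0]) == 0
    return pairs

# Example: this 3x3 coefficient matrix has rank 2, so 2 multiplications suffice.
for left, right in bilinear_algorithm([[1, 2, 3], [2, 4, 6], [0, 1, 1]]):
    print(f'({left}) * ({right})')
```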
References

1. Fiduccia, C. M., "Fast Matrix Multiplication", Proceedings of the Third Annual ACM Symposium on Theory of Computing, pp. 45-49, Shaker Heights, Ohio, 1971.
2. Gastinel, N., "Sur le calcul des produits de matrices", Numer. Math., 17, pp. 222-229, 1971.
3. Hopcroft, J. E. and Kerr, L. R., "On minimizing the number of multiplications necessary for matrix multiplication", SIAM J. Appl. Math., Vol. 20, No. 1, pp. 30-35, 1971.
4. Ostrowski, A. M., "On two problems in abstract algebra connected with Horner's rule", in Studies in Mathematics and Mechanics, pp. 40-48, Academic Press, New York.
5. Strassen, V., "Gaussian elimination is not optimal", Numer. Math., 13, pp. 354-356, 1969.
6. Vari, T. M., "On the number of multiplications required to compute quadratic functions", TR 72-120, Computer Science Department, Cornell University, Ithaca, New York, 1972.
7. Winograd, S., "On the number of multiplications required to compute certain functions", Proc. Nat. Acad. Sci. USA, 58, pp. 1840-1842, 1967.