CSE 190, Great ideas in algorithms:
Matrix multiplication
1 Matrix multiplication
Given two n × n matrices A, B, compute their product C = AB using as few additions and
multiplications as possible. That is, we want to compute ci,j = Σk ai,k bk,j . For concreteness,
consider computations over the reals, although any other field would do.
Definition 1.1. The matrix multiplication exponent is the minimal ω such that n×n matrices
can be multiplied using O(nω ) operations.
Open Problem 1.2. What is the matrix multiplication exponent?
Trivially, 2 ≤ ω ≤ 3. The first nontrivial algorithm was by Strassen, who showed that
ω ≤ log2 7 ≈ 2.81. The starting point of Strassen’s algorithm is the following algorithm for
multiplying 2 × 2 matrices:
1. m1 = (a1,1 + a2,2 )(b1,1 + b2,2 )
2. m2 = (a2,1 + a2,2 )b1,1
3. m3 = a1,1 (b1,2 − b2,2 )
4. m4 = a2,2 (b2,1 − b1,1 )
5. m5 = (a1,1 + a1,2 )b2,2
6. m6 = (a2,1 − a1,1 )(b1,1 + b1,2 )
7. m7 = (a1,2 − a2,2 )(b2,1 + b2,2 )
8. c1,1 = m1 + m4 − m5 + m7
9. c1,2 = m3 + m5
10. c2,1 = m2 + m4
11. c2,2 = m1 − m2 + m3 + m6
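These identities can be checked directly; as a quick sanity check, here is a short Python sketch comparing Strassen's seven-multiplication scheme against the naive eight-multiplication product (matrices are nested lists; the function names are ours, for illustration only).

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using Strassen's 7 multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]

def naive_2x2(A, B):
    """The trivial product, using 8 multiplications."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

Note that Strassen's scheme never multiplies two entries of A (or two entries of B) together; this is what lets it recurse on block matrices later.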
The computation model is a straight-line program: each internal computation is a sum
or a product of two previously computed values. Expanded in this way, Strassen’s algorithm
contains 7 multiplications and 18 additions. Moreover, we can first compute the
linear forms, then multiply them, and then take a linear combination of the results. We first
show that any straight line program for matrix multiplication can be put in such a form,
which we call a normal form.
Lemma 1.3 (normal form). Any straight-line program for computing matrix multiplication,
which uses M multiplications (and any number of additions), can be converted to the following form:
(i) For 1 ≤ i ≤ 2M , compute linear combinations αi of the entries of A.
(ii) For 1 ≤ i ≤ 2M , compute linear combinations βi of the entries of B.
(iii) For 1 ≤ i ≤ 2M , compute pi = αi βi .
(iv) For 1 ≤ i, j ≤ n, compute ci,j as a linear combination of p1 , . . . , p2M .
Note that in the normal form, the program computes 2M multiplications and O(M n2 )
additions.
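To make the normal form concrete: Strassen's algorithm is already in this form, with 7 products. The sketch below (in Python, with names of our choosing) encodes steps (i), (ii), (iv) as three coefficient tables U, V, W read off from the algorithm above, over x = (a1,1 , a1,2 , a2,1 , a2,2 ) and y = (b1,1 , b1,2 , b2,1 , b2,2 ).

```python
# U[i] encodes the linear combination alpha_i of A's entries,
# V[i] encodes beta_i over B's entries, and W[k] expresses output
# entry k (row-major) as a linear combination of p_i = alpha_i * beta_i.
U = [[1,0,0,1],[0,0,1,1],[1,0,0,0],[0,0,0,1],[1,1,0,0],[-1,0,1,0],[0,1,0,-1]]
V = [[1,0,0,1],[1,0,0,0],[0,1,0,-1],[-1,0,1,0],[0,0,0,1],[1,1,0,0],[0,0,1,1]]
W = [[1,0,0,1,-1,0,1],[0,0,1,0,1,0,0],[0,1,0,1,0,0,0],[1,-1,1,0,0,1,0]]

def dot(coeffs, vals):
    return sum(c * v for c, v in zip(coeffs, vals))

def normal_form_multiply(x, y):
    """x, y: entries of A, B in row-major order (length 4).
    Returns the entries of C = AB in row-major order."""
    p = [dot(U[i], x) * dot(V[i], y) for i in range(7)]  # the only 7 products
    return [dot(W[k], p) for k in range(4)]
```

The structure linear combinations → pointwise products → linear combinations is exactly what the lemma below establishes for an arbitrary straight-line program.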
Proof. Let z1 , . . . , zN be the intermediate variables of the straight line program, where for
simplicity assume that the first 2n2 are the inputs, i.e., the entries of A, B. Each zt (other than
the first 2n2 ) is a sum or a product of two previous variables. Each zt is some polynomial in
the inputs. As the required result is quadratic, we will show that only linear and quadratic
computations are necessary. For simplicity, let x, y be vectors of length n2 each, containing
the elements of A and B, respectively. Then zt (x, y) is some polynomial. We can decompose
it as
zt (x, y) = ct + ℓ′t (x) + ℓ′′t (y) + qt (x, y) + rt (x, y),
where ct is a constant, ℓ′t , ℓ′′t are linear functions, qt (x, y) is a bilinear function, that is,
qt (x, y) = Σi,j γi,j xi yj for some coefficients γi,j , and rt (x, y) collects the remaining terms. Note
that inputs have only a linear part, and that outputs have only a bilinear part. The main
point is that we can compute all of the linear and bilinear parts directly via a straight-line
program, without computing rt at all.
• If zt = zt1 + zt2 with t1 , t2 < t, then ct = ct1 + ct2 , ℓ′t (x) = ℓ′t1 (x) + ℓ′t2 (x), ℓ′′t (y) =
ℓ′′t1 (y) + ℓ′′t2 (y) and qt (x, y) = qt1 (x, y) + qt2 (x, y). The same holds for general linear combinations.
• If zt = zt1 · zt2 with t1 , t2 < t, then ct = ct1 · ct2 , ℓ′t (x) = ct2 ℓ′t1 (x), ℓ′′t (y) = ct1 ℓ′′t2 (y) and
qt (x, y) = ct1 qt2 (x, y) + ct2 qt1 (x, y) + ℓ′t1 (x)ℓ′′t2 (y) + ℓ′t2 (x)ℓ′′t1 (y).
Note that the ct are constants independent of the inputs. So, the only actual multiplications
we perform are in computing ℓ′t1 (x)ℓ′′t2 (y) and ℓ′t2 (x)ℓ′′t1 (y). Instead, we can compute:
• ℓ′t (x), ℓ′′t (y) for all 1 ≤ t ≤ N .
• If zt is a multiplication gate, compute ℓ′t1 (x)ℓ′′t2 (y) and ℓ′t2 (x)ℓ′′t1 (y).
• The answers are linear combinations of the qt (x, y).
Note that we only need the linear combinations which enter the multiplication gates, which
gives the lemma.
Theorem 1.4. If two m × m matrices can be multiplied using M = mα multiplications (and
any number of additions) in a normal form, then for any n ≥ 1, any two n × n matrices can
be multiplied using only O((mn)α log(mn)) operations.
For example, Strassen’s algorithm is an algorithm in normal form which uses 7 multiplications to multiply two 2 × 2 matrices. Hence any two n × n matrices can be multiplied
using O(nlog2 7 ) ≈ O(n2.81 ) operations, so ω ≤ log2 7 ≈ 2.81. The best known algorithms
give ω ≤ 2.373.
Proof. Let T (n) denote the number of operations required to compute the product of two
n × n matrices. We assume that n is a power of m, by possibly increasing it to the smallest
power of m larger than it; this increases n to at most nm. Now, the main idea is to
compute it recursively. We partition an n × n matrix as an m × m matrix, whose entries
are (n/m) × (n/m) matrices. Let C = AB and let Ai,j , Bi,j , Ci,j be these sub-matrices of
A, B, C, respectively, where 1 ≤ i, j ≤ m. Then, observe that (as matrices) we have
Ci,j = Σ_{k=1}^{m} Ai,k Bk,j .
We can apply any algorithm for m × m matrix multiplication in normal form to compute
{Ci,j }, as the algorithm never assumes that the inputs commute. So, to compute {Ci,j }, we:
(i) For 1 ≤ i ≤ M , compute linear combinations αi of the Ai,j .
(ii) For 1 ≤ i ≤ M , compute linear combinations βi of the Bi,j .
(iii) For 1 ≤ i ≤ M , compute pi = αi βi .
(iv) For 1 ≤ i, j ≤ m, compute Ci,j as a linear combination of p1 , . . . , pM .
Note that αi , βi , pi are all (n/m) × (n/m) matrices. How many operations do we do? Steps
(i), (ii), (iv) each require M m2 additions of (n/m) × (n/m) matrices, so in total they require
O(M n2 ) additions. Step (iii) requires M multiplications of matrices of size (n/m) × (n/m).
So, we get the recursion formula
T (n) = mα T (n/m) + O(mα n2 ).
This solves to O((mn)α ) if α > 2 and to O((mn)2 log n) if α = 2. Let us see the first
case explicitly; the second is similar.
Let n = ms . The recursion unfolds into a tree of depth s, where each internal node has mα children.
The number of nodes at depth i is mαi , and the amount of computation that each performs is
O(mα (n/mi )2 ). Hence, the total amount of computation at depth i is O(mα · m(α−2)i n2 ). As
long as α > 2, this grows exponentially fast in the depth, and hence the total is dominated by the last
level (at depth s), which takes O(mα · m(α−2)s m2s ) = O((mn)α ).
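The recursion above, instantiated with m = 2 (i.e., with Strassen's 2 × 2 scheme as the base), can be sketched in Python as follows. As in the proof, we assume n is a power of two; the helper names are ours.

```python
def mat_add(A, B, sign=1):
    """Entrywise A + sign*B for equal-size list-of-lists matrices."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    """Recursive Strassen multiplication; len(A) must be a power of two."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    def quad(M, r, c):  # extract the h x h block starting at (r, c)
        return [row[c:c + h] for row in M[r:r + h]]
    A11, A12, A21, A22 = quad(A, 0, 0), quad(A, 0, h), quad(A, h, 0), quad(A, h, h)
    B11, B12, B21, B22 = quad(B, 0, 0), quad(B, 0, h), quad(B, h, 0), quad(B, h, h)
    # The 7 recursive products; the blocks need not commute, and Strassen's
    # identities never rely on commutativity.
    m1 = strassen(mat_add(A11, A22), mat_add(B11, B22))
    m2 = strassen(mat_add(A21, A22), B11)
    m3 = strassen(A11, mat_add(B12, B22, -1))
    m4 = strassen(A22, mat_add(B21, B11, -1))
    m5 = strassen(mat_add(A11, A12), B22)
    m6 = strassen(mat_add(A21, A11, -1), mat_add(B11, B12))
    m7 = strassen(mat_add(A12, A22, -1), mat_add(B21, B22))
    C11 = mat_add(mat_add(m1, m4), mat_add(m7, m5, -1))  # m1+m4-m5+m7
    C12 = mat_add(m3, m5)
    C21 = mat_add(m2, m4)
    C22 = mat_add(mat_add(m1, m3), mat_add(m6, m2, -1))  # m1-m2+m3+m6
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot
```

The recursion makes 7 calls on matrices of half the size, matching T (n) = 7 T (n/2) + O(n2 ).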
1.1 Verifying matrix multiplication
Assume that someone gives you a magical algorithm that is supposed to multiply two matrices quickly. How would you verify it? One way is to compute the matrix product yourself
and compare the results, but this takes time O(nω ). Can you do better? The answer is yes,
if we allow for randomization. In the following, our goal is to verify that AB = C, where
A, B, C are n × n matrices over an arbitrary field.
Function MatrixMultVerify
Input: n × n matrices A, B, C.
Output: Is it true that AB = C?
1. Choose x ∈ {0, 1}n randomly.
2. Return TRUE if A(Bx) = Cx, and FALSE otherwise.
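This verifier (known as Freivalds' algorithm) is simple to implement. Here is a minimal Python sketch over the integers, with a repetition count folded in as a parameter since a single random test only bounds the error by 1/2; the function names are ours.

```python
import random

def matvec(M, x):
    """Multiply an n x n matrix (list of lists) by a length-n vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def matrix_mult_verify(A, B, C, repetitions=20):
    """Randomized check that AB = C using only matrix-vector products.
    Runs in O(n^2) time per round; if AB != C, each round detects it
    with probability at least 1/2."""
    n = len(A)
    for _ in range(repetitions):
        x = [random.randint(0, 1) for _ in range(n)]
        if matvec(A, matvec(B, x)) != matvec(C, x):
            return False  # witness found: AB != C with certainty
    return True
```

Note that the algorithm computes A(Bx), never the product AB, which is why it avoids the O(nω ) cost.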
Clearly, if AB = C then the algorithm always returns TRUE. Moreover, as all the algorithm
does is iteratively multiply an n × n matrix with a vector, it runs in time O(n2 ). The main
question is: can we find matrices A, B, C where AB ≠ C, but where the algorithm returns
TRUE with high probability? The answer is no, as the following lemma shows when
applied to M = AB − C.
Lemma 1.5. Let M be a nonzero n × n matrix. Then Prx∈{0,1}n [M x = 0] ≤ 1/2.
In particular, if we repeat this t times, the error probability will reduce to 2−t .
Proof. The matrix M has some nonzero row, say a1 , . . . , an . Then,
Prx∈{0,1}n [M x = 0] ≤ Pr [Σ ai xi = 0] .
Let i be minimal such that ai ≠ 0. Then Σ ai xi = 0 iff xi = Σj>i (−aj /ai )xj . Hence, for
any fixing of {xj : j > i}, there is at most one value for xi which would make this hold. Thus
Pr [Σ ai xi = 0] = Exi+1 ,...,xn ∈{0,1} [ Prxi ∈{0,1} [ xi = Σj>i (−aj /ai )xj ] ] ≤ 1/2.
1.2 Finding triangles in graphs
Let G = (V, E) be a graph. Our goal is to decide whether G contains a triangle, and more
generally, to enumerate the triangles in G. Trivially, this takes O(n3 ) time. We will show how to
improve this using fast matrix multiplication. Let |V | = n and let A be the n × n adjacency matrix
of G, Ai,j = 1(i,j)∈E . Observe that
(A2 )i,j = Σk Ai,k Ak,j = number of paths of length two between i and j.
So, to check whether G contains a triangle, we can first compute A2 , and then use it to detect if there
is a triangle.
Function TriangleExists(A)
Input: An n × n adjacency matrix A
Output: Is there a triangle in the graph?
1. Compute A2 .
2. Check if there is 1 ≤ i, j ≤ n with Ai,j = 1 and (A2 )i,j ≥ 1 .
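In Python, this check can be sketched as follows. The squaring is done naively here, so the sketch runs in O(n3 ); substituting any fast multiplication routine for the inner product computation gives the O(nω ) bound.

```python
def triangle_exists(A):
    """A: n x n 0/1 adjacency matrix (list of lists, zero diagonal).
    Returns True iff the graph contains a triangle."""
    n = len(A)
    # (A^2)[i][j] = number of length-two paths from i to j.
    A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # An edge (i, j) closing a two-path i-k-j forms a triangle i, k, j.
    return any(A[i][j] == 1 and A2[i][j] >= 1
               for i in range(n) for j in range(n))
```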
The running time of step 1 is O(nω ), and of step 2 is O(n2 ). The enumeration algorithm
will be recursive. At each step, we partition each vertex set into two halves and recurse over the
8 possible configurations. To this end, we will need to check if a triangle i, j, k exists in G
with i ∈ I, j ∈ J, k ∈ K for some I, J, K ⊂ V . The same algorithm works.
Function TriangleExists(A; I,J,K)
Input: An n × n adjacency matrix A, and I, J, K ⊂ {1, . . . , n}
Output: Is there a triangle i ∈ I, j ∈ J, k ∈ K in the graph?
1. Let A1 , A2 , A3 be the I × J, J × K, I × K sub-matrices of A, respectively.
2. Compute A1 A2 .
3. Check if there is i ∈ I, k ∈ K with (A1 A2 )i,k ≥ 1 and (A3 )i,k = 1 .
We next describe the triangle listing algorithm. For simplicity, we assume n is a power
of two.
Function TrianglesList(A; I,J,K)
Input: An n × n adjacency matrix A, and I, J, K ⊂ {1, . . . , n}
Output: A listing of all triangles i ∈ I, j ∈ J, k ∈ K in the graph
1. If |I| = |J| = |K| = 1, check if the single possible triangle exists, and if so, output it.
2. If TriangleExists(A; I,J,K) == False, return.
3. Partition I = I1 ∪ I2 , J = J1 ∪ J2 , K = K1 ∪ K2 , each of size |I|/2 .
4. Run TrianglesList(A; Ia , Jb , Kc ) for all 1 ≤ a, b, c ≤ 2 .
We will run TrianglesList(A,V,V,V) to enumerate all triangles in the graph.
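A direct Python sketch of TrianglesList follows. For simplicity the existence check of step 2 is done here by brute force over the three sets (replacing it with the matrix-product test of TriangleExists gives the running time analyzed in Lemma 1.6), and the base case emits only triples with i < j < k, so that each triangle is reported exactly once when the three initial sets all equal V. These implementation choices are ours.

```python
def triangles_list(A, I, J, K, out):
    """Recursively list triangles (i, j, k) with i in I, j in J, k in K.
    I, J, K are equal-length lists of vertices; their length is a power
    of two. Found triangles are appended to `out`."""
    # Step 2 (prune): brute-force existence check, standing in for the
    # matrix-product test of TriangleExists.
    if not any(A[i][j] and A[j][k] and A[i][k]
               for i in I for j in J for k in K):
        return
    # Step 1 (base case): a triangle exists among the three singletons;
    # emit it only in sorted order so each triangle is listed once.
    if len(I) == 1:
        i, j, k = I[0], J[0], K[0]
        if i < j < k:
            out.append((i, j, k))
        return
    # Steps 3-4: split each set in half and recurse on all 8 combinations.
    h = len(I) // 2
    for Ia in (I[:h], I[h:]):
        for Jb in (J[:h], J[h:]):
            for Kc in (K[:h], K[h:]):
                triangles_list(A, Ia, Jb, Kc, out)
```

For example, calling triangles_list(A, V, V, V, out) on the complete graph K4 reports its four triangles.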
Lemma 1.6. If G has m triangles, then TrianglesList outputs all triangles, and runs in time
O(nω m1−ω/3 ).
In particular, if ω = 2, the algorithm runs in time O(n2 m1/3 ).
Proof. It is clear that the algorithm lists all triangles, and every triangle is listed once. To
analyze its running time, consider the tree defined by the execution of the algorithm. A
node at depth d corresponds to three matrices of size n/2d × n/2d . It either has no children
(if there is no triangle in the corresponding sets of vertices), or has 8 children. Let ℓi denote
the number of nodes at depth i; then we know that
ℓi ≤ min(8i , 8m).
The first bound is obvious, the second follows because for any node at depth i, its parent at
depth i − 1 must contain a triangle, and all the triangles at a given depth are disjoint. The
computation time at level i is given by
Ti = ℓi · O((n/2i )ω ).
Let i∗ be the level at which 8i∗ = 8m. If i ≤ i∗ then
Ti ≤ 8i · O((n/2i )ω ) = O(nω 2i(3−ω) ).
If i ≥ i∗ then
Ti ≤ 8m · O((n/2i )ω ) = O(m · nω 2−iω ).
So the total running time is dominated by that of level i∗ , and hence
Σi Ti = O(Ti∗ ) = O(nω m1−ω/3 ).
Open Problem 1.7. How fast can we find one triangle in a graph? How about listing m triangles?