How to Multiply - Cs.princeton.edu

How to Multiply
integers, matrices, and polynomials

Slides by Kevin Wayne.
Copyright © 2005 Pearson-Addison Wesley. All rights reserved.

Complex Multiplication

Complex multiplication. (a + bi) (c + di) = x + yi.

Grade-school. x = ac - bd, y = bc + ad.  [4 multiplications, 2 additions]

Q. Is it possible to do with fewer multiplications?
A. Yes. [Gauss] x = ac - bd, y = (a + b) (c + d) - ac - bd.  [3 multiplications, 5 additions]

Remark. Improvement if no hardware multiply.

5.5 Integer Multiplication
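Gauss's identity is easy to check in code; a minimal sketch (the function name is illustrative, not from the slides):

```python
def complex_mult_gauss(a, b, c, d):
    """(a + bi)(c + di) using 3 real multiplications instead of 4 [Gauss]."""
    ac = a * c
    bd = b * d
    x = ac - bd                       # real part
    y = (a + b) * (c + d) - ac - bd   # imaginary part, equals bc + ad
    return x, y
```

For example, (1 + 2i)(3 + 4i) = -5 + 10i.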
Integer Addition

Addition. Given two n-bit integers a and b, compute a + b.
Grade-school. Θ(n) bit operations.

Ex. 11010101 + 01111101 = 101010010.

Remark. Grade-school addition algorithm is optimal.

Integer Multiplication

Multiplication. Given two n-bit integers a and b, compute a × b.
Grade-school. Θ(n^2) bit operations: one shifted row of partial products per bit of the multiplier, then Θ(n) additions of n-bit rows.

Ex. 11010101 × 01111101 = 0110100000000001.

Q. Is grade-school multiplication algorithm optimal?
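The Θ(n^2) grade-school method is shift-and-add; a sketch assuming nonnegative integers (names are illustrative):

```python
def gradeschool_multiply(a: int, b: int) -> int:
    """Shift-and-add multiplication: one shifted copy of a per set bit of b."""
    result = 0
    shift = 0
    while b:
        if b & 1:                 # this bit of b contributes a shifted row
            result += a << shift
        b >>= 1
        shift += 1
    return result
```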
Divide-and-Conquer Multiplication: Warmup

To multiply two n-bit integers a and b:
- Multiply four ½n-bit integers, recursively.
- Add and shift to obtain result.

  a = 2^{n/2} ⋅ a1 + a0
  b = 2^{n/2} ⋅ b1 + b0
  ab = (2^{n/2} ⋅ a1 + a0) (2^{n/2} ⋅ b1 + b0)
     = 2^n ⋅ a1 b1 + 2^{n/2} ⋅ (a1 b0 + a0 b1) + a0 b0

Ex. a = 10001101 (a1 = 1000, a0 = 1101), b = 11100001 (b1 = 1110, b0 = 0001).

  T(n) = 4 T(n/2) + n for n > 1, with constant cost at the leaves
         (4 recursive calls; Θ(n) to add and shift)

  T(n) = 4 T(n/2) + Θ(n)  ⇒  T(n) = Θ(n^2)

Recursion Tree

The root costs n; level k has 4^k subproblems of size n / 2^k, costing 4^k (n / 2^k) = n ⋅ 2^k in total; the 4^{lg n} = n^2 leaves T(1) each cost 1. Summing over the lg n + 1 levels:

  T(n) = Σ_{k=0}^{lg n} n ⋅ 2^k = n ⋅ (2^{1 + lg n} - 1) / (2 - 1) = 2 n^2 - n
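As a sanity check on the closed form, the recurrence can be evaluated directly (a sketch that takes each leaf to cost 1, n a power of 2):

```python
def T(n):
    """Cost recurrence of the warmup algorithm: T(n) = 4 T(n/2) + n, T(1) = 1."""
    if n == 1:
        return 1
    return 4 * T(n // 2) + n
```

The values agree with the closed form 2n^2 - n.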
Karatsuba Multiplication

To multiply two n-bit integers a and b:
- Add two ½n-bit integers.
- Multiply three ½n-bit integers, recursively.
- Add, subtract, and shift to obtain result.

  a = 2^{n/2} ⋅ a1 + a0
  b = 2^{n/2} ⋅ b1 + b0
  ab = 2^n ⋅ a1 b1 + 2^{n/2} ⋅ (a1 b0 + a0 b1) + a0 b0
     = 2^n ⋅ a1 b1 + 2^{n/2} ⋅ ( (a1 + a0) (b1 + b0) - a1 b1 - a0 b0 ) + a0 b0

so only three distinct half-size products are needed: a1 b1, a0 b0, and (a1 + a0)(b1 + b0).

Theorem. [Karatsuba-Ofman 1962] Can multiply two n-bit integers in O(n^1.585) bit operations.

  T(n) ≤ T(⌊n/2⌋) + T(⌈n/2⌉) + T(1 + ⌈n/2⌉) + Θ(n)
         (recursive calls)                     (add, subtract, shift)

  ⇒  T(n) = O(n^{lg 3}) = O(n^1.585)
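A hedged sketch of Karatsuba on Python integers (the base-case threshold and names are illustrative choices, not from the slides):

```python
def karatsuba(a: int, b: int) -> int:
    """Multiply with 3 recursive half-size products instead of 4."""
    if a < 16 or b < 16:                  # small case: hardware multiply
        return a * b
    m = max(a.bit_length(), b.bit_length()) // 2
    a1, a0 = a >> m, a & ((1 << m) - 1)   # a = 2^m * a1 + a0
    b1, b0 = b >> m, b & ((1 << m) - 1)   # b = 2^m * b1 + b0
    p = karatsuba(a1, b1)
    q = karatsuba(a0, b0)
    r = karatsuba(a1 + a0, b1 + b0)       # r - p - q = a1*b0 + a0*b1
    return (p << (2 * m)) + ((r - p - q) << m) + q
```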
Karatsuba: Recursion Tree

  T(n) = 3 T(n/2) + n for n > 1, with constant cost at the leaves

The root costs n; level k has 3^k subproblems of size n / 2^k, costing 3^k (n / 2^k) = n (3/2)^k in total. Summing over the lg n + 1 levels:

  T(n) = Σ_{k=0}^{lg n} n (3/2)^k = n ⋅ ((3/2)^{1 + lg n} - 1) / ((3/2) - 1) = 3 n^{lg 3} - 2n

Fast Integer Division Too (!)

Integer division. Given two n-bit (or less) integers s and t, compute quotient q = ⌊s / t⌋ and remainder r = s mod t.

Fact. Complexity of integer division is same as integer multiplication.

To compute quotient q:
- Approximate x = 1 / t using Newton's method: x_{i+1} = 2 x_i - t x_i^2.  (uses fast multiplication)
- After log n iterations, either q = ⌊s ⋅ x⌋ or q = ⌈s ⋅ x⌉.
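A floating-point illustration of the iteration (the actual algorithm runs it in fixed-point integer arithmetic; the starting guess and names are assumptions of this sketch):

```python
def newton_reciprocal(t, x0=0.01, iterations=20):
    """Approximate 1/t via x_{i+1} = 2 x_i - t x_i^2.
    Uses only multiplication and subtraction; each step squares the
    error e = 1 - t*x, so convergence is quadratic when 0 < x0 < 2/t."""
    x = x0
    for _ in range(iterations):
        x = 2 * x - t * x * x
    return x
```

The quotient is then ⌊s ⋅ x⌋, possibly adjusted by one.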
Dot Product

Dot product. Given two length-n vectors a and b, compute c = a ⋅ b.
Grade-school. Θ(n) arithmetic operations.

  a ⋅ b = Σ_{i=1}^{n} a_i b_i

Ex. a = [ .70 .20 .10 ], b = [ .30 .40 .30 ]:

  a ⋅ b = (.70 × .30) + (.20 × .40) + (.10 × .30) = .32

Remark. Grade-school dot product algorithm is optimal.
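In code the dot product is a single pass (illustrative sketch):

```python
a = [0.70, 0.20, 0.10]
b = [0.30, 0.40, 0.30]
c = sum(x * y for x, y in zip(a, b))   # n multiplications, n - 1 additions
```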
Matrix Multiplication

Matrix multiplication. Given two n-by-n matrices A and B, compute C = AB.
Grade-school. Θ(n^3) arithmetic operations.

  c_ij = Σ_{k=1}^{n} a_ik b_kj

Ex.
  [ .59 .32 .41 ]   [ .70 .20 .10 ]   [ .80 .30 .50 ]
  [ .31 .36 .25 ] = [ .30 .60 .10 ] × [ .10 .40 .10 ]
  [ .45 .31 .42 ]   [ .50 .10 .40 ]   [ .10 .30 .40 ]

Q. Is grade-school matrix multiplication algorithm optimal?

Block Matrix Multiplication

Partition each matrix into ½n-by-½n blocks; the product can be computed block by block, e.g. C11 = A11 × B11 + A12 × B21.

Ex.
  [  152  158  164  170 ]   [  0  1  2  3 ]   [ 16 17 18 19 ]
  [  504  526  548  570 ] = [  4  5  6  7 ] × [ 20 21 22 23 ]
  [  856  894  932  970 ]   [  8  9 10 11 ]   [ 24 25 26 27 ]
  [ 1208 1262 1316 1370 ]   [ 12 13 14 15 ]   [ 28 29 30 31 ]

  C11 = A11 × B11 + A12 × B21
      = [ 0 1 ] × [ 16 17 ] + [ 2 3 ] × [ 24 25 ] = [ 152 158 ]
        [ 4 5 ]   [ 20 21 ]   [ 6 7 ]   [ 28 29 ]   [ 504 526 ]
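The definition translates directly into three nested loops (illustrative sketch):

```python
def matmul(A, B):
    """Grade-school Θ(n^3) matrix product: c_ij = Σ_k a_ik b_kj."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```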
Matrix Multiplication: Warmup

To multiply two n-by-n matrices A and B:
- Divide: partition A and B into ½n-by-½n blocks.
- Conquer: multiply 8 pairs of ½n-by-½n matrices, recursively.
- Combine: add appropriate products using 4 matrix additions.

  [ C11 C12 ]   [ A11 A12 ]   [ B11 B12 ]
  [ C21 C22 ] = [ A21 A22 ] × [ B21 B22 ]

  C11 = (A11 × B11) + (A12 × B21)
  C12 = (A11 × B12) + (A12 × B22)
  C21 = (A21 × B11) + (A22 × B21)
  C22 = (A21 × B12) + (A22 × B22)

  T(n) = 8 T(n/2) + Θ(n^2)  ⇒  T(n) = Θ(n^3)
         (8 recursive calls; Θ(n^2) to add and form submatrices)

Fast Matrix Multiplication

Key idea. Multiply 2-by-2 block matrices with only 7 multiplications.

  P1 = A11 × (B12 - B22)
  P2 = (A11 + A12) × B22
  P3 = (A21 + A22) × B11
  P4 = A22 × (B21 - B11)
  P5 = (A11 + A22) × (B11 + B22)
  P6 = (A12 - A22) × (B21 + B22)
  P7 = (A11 - A21) × (B11 + B12)

  C11 = P5 + P4 - P2 + P6
  C12 = P1 + P2
  C21 = P3 + P4
  C22 = P5 + P1 - P3 - P7

- 7 multiplications.
- 18 = 8 + 10 additions and subtractions.
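The seven products and four combinations can be checked on 2-by-2 matrices with scalar entries (a sketch; names are illustrative):

```python
def strassen_2x2(A, B):
    """Strassen's 7-multiplication scheme applied to 2x2 matrices."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    p1 = a11 * (b12 - b22)
    p2 = (a11 + a12) * b22
    p3 = (a21 + a22) * b11
    p4 = a22 * (b21 - b11)
    p5 = (a11 + a22) * (b11 + b22)
    p6 = (a12 - a22) * (b21 + b22)
    p7 = (a11 - a21) * (b11 + b12)
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p5 + p1 - p3 - p7]]
```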
Fast Matrix Multiplication

To multiply two n-by-n matrices A and B: [Strassen 1969]
- Divide: partition A and B into ½n-by-½n blocks.
- Compute: 14 ½n-by-½n matrices via 10 matrix additions.
- Conquer: multiply 7 pairs of ½n-by-½n matrices, recursively.
- Combine: 7 products into 4 terms using 8 matrix additions.

Analysis.
- Assume n is a power of 2.
- T(n) = # arithmetic operations.

  T(n) = 7 T(n/2) + Θ(n^2)  ⇒  T(n) = Θ(n^{log2 7}) = O(n^2.81)
         (7 recursive calls; Θ(n^2) to add and subtract)

Fast Matrix Multiplication: Practice

Implementation issues.
- Sparsity.
- Caching effects.
- Numerical stability.
- Odd matrix dimensions.
- Crossover to classical algorithm around n = 128.

Common misperception. "Strassen is only a theoretical curiosity."
- Apple reports 8x speedup on G4 Velocity Engine when n ≈ 2,500.
- Range of instances where it's useful is a subject of controversy.

Remark. Can "Strassenize" Ax = b, determinant, eigenvalues, SVD, ….

Fast Matrix Multiplication: Theory

Q. Multiply two 2-by-2 matrices with 7 scalar multiplications?
A. Yes! [Strassen 1969]  Θ(n^{log2 7}) = O(n^2.807).

Q. Multiply two 2-by-2 matrices with 6 scalar multiplications?
A. Impossible. [Hopcroft and Kerr 1971]  (It would have given Θ(n^{log2 6}) = O(n^2.59).)

Q. Two 3-by-3 matrices with 21 scalar multiplications?
A. Also impossible.  (It would have given Θ(n^{log3 21}) = O(n^2.77).)

Begun, the decimal wars have. [Pan, Bini et al., Schönhage, …]
- Two 20-by-20 matrices with 4,460 scalar multiplications: O(n^2.805).
- Two 48-by-48 matrices with 47,217 scalar multiplications: O(n^2.7801).
- A year later: O(n^2.7799).
- December, 1979: O(n^2.521813).
- January, 1980: O(n^2.521801).

Best known. O(n^2.376). [Coppersmith-Winograd, 1987]

Conjecture. O(n^{2+ε}) for any ε > 0.

Caveat. Theoretical improvements to Strassen are progressively less practical.
5.6 Convolution and FFT

Fourier Analysis

Fourier theorem. [Fourier, Dirichlet, Riemann] Any (sufficiently smooth) periodic function can be expressed as the sum of a series of sinusoids. Ex. the sawtooth y(t) = t is approximated by the partial sum

  y(t) = Σ_{k=1}^{N} (2 / k) sin(kt),   shown for N = 100.

Euler's Identity

Sinusoids. Sum of sines and cosines, or equivalently a sum of complex exponentials, via Euler's identity:

  e^{ix} = cos x + i sin x

Time Domain vs. Frequency Domain

Signal. [touch tone button 1]

  y(t) = 1/2 sin(2π ⋅ 697 t) + 1/2 sin(2π ⋅ 1209 t)

Time domain: sound pressure as a function of time (seconds).
Frequency domain: amplitude as a function of frequency (Hz); two spikes of amplitude 0.5, at 697 Hz and 1209 Hz.

Signal. [recording, 8192 samples per second] Magnitude of discrete Fourier transform.

Reference: Cleve Moler, Numerical Computing with MATLAB

Fast Fourier Transform

FFT. Fast way to convert between time-domain and frequency-domain.
Alternate viewpoint. Fast way to multiply and evaluate polynomials.  (we take this approach)

"If you speed up any nontrivial algorithm by a factor of a million or so the world will beat a path towards finding useful applications for it."  -Numerical Recipes

Fast Fourier Transform: Applications

Applications.
- Optics, acoustics, quantum physics, telecommunications, radar, control systems, signal processing, speech recognition, data compression, image processing, seismology, mass spectrometry, …
- Digital media. [DVD, JPEG, MP3, H.264]
- Medical diagnostics. [MRI, CT, PET scans, ultrasound]
- Numerical solutions to Poisson's equation.
- Shor's quantum factoring algorithm.

"The FFT is one of the truly great computational developments of [the 20th] century. It has changed the face of science and engineering so much that it is not an exaggeration to say that life as we know it would be very different without the FFT."  -Charles van Loan

Fast Fourier Transform: Brief History

- Gauss (1805, 1866). Analyzed periodic motion of asteroid Ceres.
- Runge-König (1924). Laid theoretical groundwork.
- Danielson-Lanczos (1942). Efficient algorithm, x-ray crystallography.
- Cooley-Tukey (1965). Monitoring nuclear tests in Soviet Union and tracking submarines. Rediscovered and popularized FFT.

Importance not fully realized until advent of digital computers.
Polynomials: Coefficient Representation
Polynomial. [coefficient representation]

  A(x) = a0 + a1 x + a2 x^2 + … + a_{n-1} x^{n-1}
  B(x) = b0 + b1 x + b2 x^2 + … + b_{n-1} x^{n-1}

Add. O(n) arithmetic operations.

  A(x) + B(x) = (a0 + b0) + (a1 + b1) x + … + (a_{n-1} + b_{n-1}) x^{n-1}

Evaluate. O(n) using Horner's method.

  A(x) = a0 + x (a1 + x (a2 + … + x (a_{n-2} + x a_{n-1}) … ))

Multiply (convolve). O(n^2) using brute force.

  A(x) × B(x) = Σ_{i=0}^{2n-2} c_i x^i,  where c_i = Σ_{j=0}^{i} a_j b_{i-j}

A Modest PhD Dissertation Title

"New Proof of the Theorem That Every Algebraic Rational Integral Function In One Variable can be Resolved into Real Factors of the First or the Second Degree."
  - PhD dissertation, 1799, the University of Helmstedt  [Gauss]
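Horner's method and the brute-force convolution can be sketched as (names illustrative):

```python
def horner_eval(coeffs, x):
    """Evaluate a0 + a1 x + ... + a_{n-1} x^{n-1} with n - 1 multiplications."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

def convolve(a, b):
    """Product coefficients c_i = Σ_j a_j b_{i-j}; O(n^2) brute force."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c
```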
Polynomials: Point-Value Representation

Fundamental theorem of algebra. [Gauss, PhD thesis] A degree n polynomial with complex coefficients has exactly n complex roots.

Corollary. A degree n-1 polynomial A(x) is uniquely specified by its evaluation at n distinct values of x, i.e. by the points (x_j, y_j) with y_j = A(x_j).

Polynomial. [point-value representation]

  A(x) : (x_0, y_0), …, (x_{n-1}, y_{n-1})
  B(x) : (x_0, z_0), …, (x_{n-1}, z_{n-1})

Add. O(n) arithmetic operations.

  A(x) + B(x) : (x_0, y_0 + z_0), …, (x_{n-1}, y_{n-1} + z_{n-1})

Multiply (convolve). O(n), but need 2n-1 points.

  A(x) × B(x) : (x_0, y_0 × z_0), …, (x_{2n-1}, y_{2n-1} × z_{2n-1})

Evaluate. O(n^2) using Lagrange's formula.

  A(x) = Σ_{k=0}^{n-1} y_k ⋅ ( Π_{j≠k} (x - x_j) ) / ( Π_{j≠k} (x_k - x_j) )
Converting Between Two Polynomial Representations

Tradeoff. Fast evaluation or fast multiplication. We want both!

  representation | multiply | evaluate
  coefficient    | O(n^2)   | O(n)
  point-value    | O(n)     | O(n^2)

Goal. Efficient conversion between two representations ⇒ all ops fast.

  a0, a1, …, a_{n-1}          ⇄   (x_0, y_0), …, (x_{n-1}, y_{n-1})
  coefficient representation       point-value representation

Converting Between Two Representations: Brute Force

Coefficient ⇒ point-value. Given a polynomial a0 + a1 x + … + a_{n-1} x^{n-1}, evaluate it at n distinct points x_0, …, x_{n-1}:

  [ y_0     ]   [ 1  x_0      x_0^2      …  x_0^{n-1}     ] [ a_0     ]
  [ y_1     ]   [ 1  x_1      x_1^2      …  x_1^{n-1}     ] [ a_1     ]
  [ y_2     ] = [ 1  x_2      x_2^2      …  x_2^{n-1}     ] [ a_2     ]
  [ ⋮       ]   [ ⋮                      ⋱  ⋮             ] [ ⋮       ]
  [ y_{n-1} ]   [ 1  x_{n-1}  x_{n-1}^2  …  x_{n-1}^{n-1} ] [ a_{n-1} ]

Running time. O(n^2) for matrix-vector multiply (or n Horner's).

Point-value ⇒ coefficient. Given n distinct points x_0, …, x_{n-1} and values y_0, …, y_{n-1}, find the unique polynomial a0 + a1 x + … + a_{n-1} x^{n-1} that has the given values at the given points. This is the same Vandermonde system, now solved for the a_i.  (Vandermonde matrix is invertible iff the x_i are distinct.)

Running time. O(n^3) for Gaussian elimination (or O(n^2.376) via fast matrix multiplication).

Divide-and-Conquer

Decimation in frequency. Break up polynomial into low and high powers.
- A(x) = a0 + a1 x + a2 x^2 + a3 x^3 + a4 x^4 + a5 x^5 + a6 x^6 + a7 x^7.
- Alow(x) = a0 + a1 x + a2 x^2 + a3 x^3.
- Ahigh(x) = a4 + a5 x + a6 x^2 + a7 x^3.
- A(x) = Alow(x) + x^4 Ahigh(x).

Decimation in time. Break polynomial up into even and odd powers.
- A(x) = a0 + a1 x + a2 x^2 + a3 x^3 + a4 x^4 + a5 x^5 + a6 x^6 + a7 x^7.
- Aeven(x) = a0 + a2 x + a4 x^2 + a6 x^3.
- Aodd(x) = a1 + a3 x + a5 x^2 + a7 x^3.
- A(x) = Aeven(x^2) + x Aodd(x^2).
Coefficient to Point-Value Representation: Intuition

Coefficient ⇒ point-value. Given a polynomial a0 + a1 x + … + a_{n-1} x^{n-1}, evaluate it at n distinct points x_0, …, x_{n-1}.  (We get to choose which ones!)

Divide. Break polynomial up into even and odd powers.
- A(x) = a0 + a1 x + a2 x^2 + a3 x^3 + a4 x^4 + a5 x^5 + a6 x^6 + a7 x^7.
- Aeven(x) = a0 + a2 x + a4 x^2 + a6 x^3.
- Aodd(x) = a1 + a3 x + a5 x^2 + a7 x^3.
- A(x) = Aeven(x^2) + x Aodd(x^2).
- A(-x) = Aeven(x^2) - x Aodd(x^2).

Intuition. Choose two points to be ±1. Can evaluate a polynomial of degree ≤ n at 2 points by evaluating two polynomials of degree ≤ ½n at 1 point:
- A( 1) = Aeven(1) + 1 ⋅ Aodd(1).
- A(-1) = Aeven(1) - 1 ⋅ Aodd(1).

Intuition. Choose four complex points to be ±1, ±i. Can evaluate a polynomial of degree ≤ n at 4 points by evaluating two polynomials of degree ≤ ½n at 2 points:
- A( 1) = Aeven(1) + 1 ⋅ Aodd(1).
- A(-1) = Aeven(1) - 1 ⋅ Aodd(1).
- A( i) = Aeven(-1) + i ⋅ Aodd(-1).
- A(-i) = Aeven(-1) - i ⋅ Aodd(-1).
Discrete Fourier Transform

Coefficient ⇒ point-value. Given a polynomial a0 + a1 x + … + a_{n-1} x^{n-1}, evaluate it at n distinct points x_0, …, x_{n-1}.

Key idea. Choose x_k = ω^k where ω is a principal nth root of unity:

  [ y_0     ]   [ 1  1        1           1           …  1               ] [ a_0     ]
  [ y_1     ]   [ 1  ω        ω^2         ω^3         …  ω^{n-1}         ] [ a_1     ]
  [ y_2     ]   [ 1  ω^2      ω^4         ω^6         …  ω^{2(n-1)}      ] [ a_2     ]
  [ y_3     ] = [ 1  ω^3      ω^6         ω^9         …  ω^{3(n-1)}      ] [ a_3     ]
  [ ⋮       ]   [ ⋮  ⋮        ⋮           ⋮           ⋱  ⋮               ] [ ⋮       ]
  [ y_{n-1} ]   [ 1  ω^{n-1}  ω^{2(n-1)}  ω^{3(n-1)}  …  ω^{(n-1)(n-1)}  ] [ a_{n-1} ]

  (the DFT; the matrix is the Fourier matrix F_n)

Roots of Unity

Def. An nth root of unity is a complex number x such that x^n = 1.

Fact. The nth roots of unity are: ω^0, ω^1, …, ω^{n-1} where ω = e^{2πi/n}.
Pf. (ω^k)^n = (e^{2πik/n})^n = (e^{πi})^{2k} = (-1)^{2k} = 1.

Fact. The ½nth roots of unity are: ν^0, ν^1, …, ν^{n/2-1} where ν = ω^2 = e^{4πi/n}.

Ex. n = 8. The 8th roots of unity ω^0, …, ω^7 lie on the unit circle, with ω^0 = ν^0 = 1, ω^2 = ν^1 = i, ω^4 = ν^2 = -1, ω^6 = ν^3 = -i.
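The facts above can be checked numerically (a sketch using Python's cmath):

```python
import cmath

n = 8
w = cmath.exp(2j * cmath.pi / n)    # ω = e^{2πi/n}, a principal 8th root of unity
roots = [w ** k for k in range(n)]  # ω^0, ω^1, ..., ω^7
nu = w ** 2                         # ν = ω^2, a principal 4th root of unity
```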
Fast Fourier Transform

Goal. Evaluate a degree n-1 polynomial A(x) = a0 + … + a_{n-1} x^{n-1} at its nth roots of unity: ω^0, ω^1, …, ω^{n-1}.

Divide. Break up polynomial into even and odd powers.
- Aeven(x) = a0 + a2 x + a4 x^2 + … + a_{n-2} x^{n/2 - 1}.
- Aodd(x) = a1 + a3 x + a5 x^2 + … + a_{n-1} x^{n/2 - 1}.
- A(x) = Aeven(x^2) + x Aodd(x^2).

Conquer. Evaluate Aeven(x) and Aodd(x) at the ½nth roots of unity: ν^0, ν^1, …, ν^{n/2-1}.

Combine. Using ν^k = (ω^k)^2 = (ω^{k+½n})^2 and ω^{k+½n} = -ω^k:
- A(ω^k)      = Aeven(ν^k) + ω^k Aodd(ν^k),  0 ≤ k < n/2.
- A(ω^{k+½n}) = Aeven(ν^k) - ω^k Aodd(ν^k),  0 ≤ k < n/2.

FFT Algorithm

  fft(n, a0, a1, …, a_{n-1}) {
     if (n == 1) return a0

     (e_0, e_1, …, e_{n/2-1}) ← FFT(n/2, a0, a2, a4, …, a_{n-2})
     (d_0, d_1, …, d_{n/2-1}) ← FFT(n/2, a1, a3, a5, …, a_{n-1})

     for k = 0 to n/2 - 1 {
        ω^k ← e^{2πik/n}
        y_k       ← e_k + ω^k d_k
        y_{k+n/2} ← e_k - ω^k d_k
     }

     return (y_0, y_1, …, y_{n-1})
  }

FFT Summary

Theorem. FFT algorithm evaluates a degree n-1 polynomial at each of the nth roots of unity in O(n log n) steps.  (assumes n is a power of 2)

Running time.

  T(n) = 2 T(n/2) + Θ(n)  ⇒  T(n) = Θ(n log n)

  a0, a1, …, a_{n-1}  →  (ω^0, y_0), …, (ω^{n-1}, y_{n-1})   in O(n log n)
  coefficient representation → point-value representation

Recursion Tree

Each level is a "perfect shuffle" into even and odd positions:

  a0, a1, a2, a3, a4, a5, a6, a7
  a0, a2, a4, a6        a1, a3, a5, a7
  a0, a4    a2, a6      a1, a5    a3, a7
  a0   a4   a2   a6     a1   a5   a3   a7
  000  100  010  110    001  101  011  111    ("bit-reversed" order)
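A direct transcription of the pseudocode into Python (a sketch; the sign convention ω = e^{2πi/n} follows the slides and is the conjugate of many library conventions):

```python
import cmath

def fft(a):
    """Evaluate the polynomial with coefficients a at the nth roots of unity.
    Radix-2 recursion; len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    e = fft(a[0::2])                           # Aeven at the (n/2)th roots
    d = fft(a[1::2])                           # Aodd at the (n/2)th roots
    y = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(2j * cmath.pi * k / n)   # ω^k
        y[k] = e[k] + w * d[k]
        y[k + n // 2] = e[k] - w * d[k]
    return y
```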
Inverse Discrete Fourier Transform

Point-value ⇒ coefficient. Given n distinct points x_0, …, x_{n-1} and values y_0, …, y_{n-1}, find the unique polynomial a0 + a1 x + … + a_{n-1} x^{n-1} that has the given values at the given points. For points at the nth roots of unity this is the inverse DFT:

  (a_0, …, a_{n-1})ᵀ = F_n^{-1} (y_0, …, y_{n-1})ᵀ

Inverse DFT

Claim. Inverse of Fourier matrix F_n is given by following formula (F_n / √n is unitary):

  G_n = (1/n) ×
  [ 1  1          1           1           …  1                ]
  [ 1  ω^{-1}     ω^{-2}      ω^{-3}      …  ω^{-(n-1)}       ]
  [ 1  ω^{-2}     ω^{-4}      ω^{-6}      …  ω^{-2(n-1)}      ]
  [ 1  ω^{-3}     ω^{-6}      ω^{-9}      …  ω^{-3(n-1)}      ]
  [ ⋮  ⋮          ⋮           ⋮           ⋱  ⋮                ]
  [ 1  ω^{-(n-1)} ω^{-2(n-1)} ω^{-3(n-1)} …  ω^{-(n-1)(n-1)}  ]

Consequence. To compute inverse FFT, apply same algorithm but use ω^{-1} = e^{-2πi/n} as principal nth root of unity (and divide by n).
Inverse FFT: Proof of Correctness

Claim. F_n and G_n are inverses.
Pf.

  (F_n G_n)_{k k'} = (1/n) Σ_{j=0}^{n-1} ω^{kj} ω^{-jk'} = (1/n) Σ_{j=0}^{n-1} ω^{(k-k')j} = { 1 if k = k'; 0 otherwise }

where the last step is the summation lemma.

Summation lemma. Let ω be a principal nth root of unity. Then

  Σ_{j=0}^{n-1} ω^{kj} = { n if k ≡ 0 mod n; 0 otherwise }

Pf.
- If k is a multiple of n then ω^k = 1 ⇒ series sums to n.
- Each nth root of unity ω^k is a root of x^n - 1 = (x - 1) (1 + x + x^2 + … + x^{n-1}).
- If ω^k ≠ 1 we have: 1 + ω^k + ω^{2k} + … + ω^{(n-1)k} = 0 ⇒ series sums to 0. ▪

Inverse FFT: Algorithm

  ifft(n, a0, a1, …, a_{n-1}) {
     if (n == 1) return a0

     (e_0, e_1, …, e_{n/2-1}) ← IFFT(n/2, a0, a2, a4, …, a_{n-2})
     (d_0, d_1, …, d_{n/2-1}) ← IFFT(n/2, a1, a3, a5, …, a_{n-1})

     for k = 0 to n/2 - 1 {
        ω^k ← e^{-2πik/n}
        y_k       ← (e_k + ω^k d_k) / 2
        y_{k+n/2} ← (e_k - ω^k d_k) / 2
     }

     return (y_0, y_1, …, y_{n-1})
  }

(Dividing by 2 at each of the lg n levels accounts for the overall factor of 1/n.)
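The inverse algorithm transcribes the same way (a sketch; dividing by 2 per level yields the overall 1/n):

```python
import cmath

def ifft(y):
    """Inverse FFT: same recursion as fft, but with ω^{-1} = e^{-2πi/n}
    and a division by 2 at each level."""
    n = len(y)
    if n == 1:
        return list(y)
    e = ifft(y[0::2])
    d = ifft(y[1::2])
    a = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # ω^{-k}
        a[k] = (e[k] + w * d[k]) / 2
        a[k + n // 2] = (e[k] - w * d[k]) / 2
    return a
```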
Inverse FFT Summary

Theorem. Inverse FFT algorithm interpolates a degree n-1 polynomial given values at each of the nth roots of unity in O(n log n) steps.  (assumes n is a power of 2)

  (ω^0, y_0), …, (ω^{n-1}, y_{n-1})  →  a0, a1, …, a_{n-1}   in O(n log n)
  point-value representation → coefficient representation

Polynomial Multiplication

Theorem. Can multiply two degree n-1 polynomials in O(n log n) steps.  (pad with 0s to make n a power of 2)

  a0, a1, …, a_{n-1} and b0, b1, …, b_{n-1}            (coefficient representation)
    | 2 FFTs, O(n log n)
    v
  A(ω^0), …, A(ω^{2n-1}) and B(ω^0), …, B(ω^{2n-1})    (point-value representation)
    | point-value multiplication, O(n)
    v
  C(ω^0), …, C(ω^{2n-1})
    | inverse FFT, O(n log n)
    v
  c0, c1, …, c_{2n-2}                                   (coefficient representation)
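The whole pipeline can be sketched end to end (a self-contained illustration; integer coefficients are assumed so results can be rounded, and the helper name is invented):

```python
import cmath

def _fft(a, invert=False):
    """Radix-2 FFT; len(a) a power of 2. invert=True uses ω^{-1}, no 1/n scaling."""
    n = len(a)
    if n == 1:
        return list(a)
    e = _fft(a[0::2], invert)
    d = _fft(a[1::2], invert)
    sign = -1 if invert else 1
    y = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        y[k] = e[k] + w * d[k]
        y[k + n // 2] = e[k] - w * d[k]
    return y

def poly_multiply(a, b):
    """Coefficients of A·B: pad, 2 forward FFTs, pointwise multiply, inverse FFT."""
    m = len(a) + len(b) - 1        # the product has m coefficients
    n = 1
    while n < m:                   # pad with 0s to a power of 2 ≥ m
        n *= 2
    fa = _fft(list(a) + [0] * (n - len(a)))
    fb = _fft(list(b) + [0] * (n - len(b)))
    c = _fft([x * y for x, y in zip(fa, fb)], invert=True)
    return [round((v / n).real) for v in c[:m]]   # divide by n once at the end
```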
FFT in Practice

Fastest Fourier transform in the West. [Frigo and Johnson]
- Optimized C library.
- Features: DFT, DCT, real, complex, any size, any dimension.
- Won 1999 Wilkinson Prize for Numerical Software.
- Portable, competitive with vendor-tuned code.

Implementation details.
- Instead of executing a predetermined algorithm, it evaluates your hardware and uses a special-purpose compiler to generate an optimized algorithm catered to the "shape" of the problem.
- Core algorithm is nonrecursive version of Cooley-Tukey.
- O(n log n), even for prime sizes.

Reference: http://www.fftw.org
Integer Multiplication, Redux

Integer multiplication. Given two n-bit integers a = a_{n-1} … a1 a0 and b = b_{n-1} … b1 b0, compute their product a ⋅ b.

Convolution algorithm.
- Form two polynomials A(x) = a0 + a1 x + a2 x^2 + … + a_{n-1} x^{n-1} and B(x) = b0 + b1 x + b2 x^2 + … + b_{n-1} x^{n-1}.  (Note: a = A(2), b = B(2).)
- Compute C(x) = A(x) ⋅ B(x).
- Evaluate C(2) = a ⋅ b.
- Running time: O(n log n) complex arithmetic operations.

Theory. [Schönhage-Strassen 1971] O(n log n log log n) bit operations.
Theory. [Fürer 2007] O(n log n 2^{O(log* n)}) bit operations.

Practice. [GNU Multiple Precision Arithmetic Library]  "the fastest bignum library on the planet"
- It uses brute force, Karatsuba, and FFT, depending on the size of n.
Integer Arithmetic

Fundamental open question. What is complexity of arithmetic?

  operation      | upper bound              | lower bound
  addition       | O(n)                     | Ω(n)
  multiplication | O(n log n 2^{O(log* n)}) | Ω(n)
  division       | O(n log n 2^{O(log* n)}) | Ω(n)

Factoring

Factoring. Given an n-bit integer, find its prime factorization.

  2773 = 47 × 59

  2^67 - 1 = 147573952589676412927 = 193707721 × 761838257287
  (a disproof of Mersenne's conjecture that 2^67 - 1 is prime)

  RSA-704 ($30,000 prize if you can factor):
  740375634795617128280467960974295731425931888892312890849
  362326389727650340282662768919964196251178439958943305021
  275853701189680982867331732731089309005525051168770632990
  72396380786710086096962537934650563796359
Factoring and RSA

Primality. Given an n-bit integer, is it prime?
Factoring. Given an n-bit integer, find its prime factorization.

Significance. Efficient primality testing ⇒ can implement RSA.
Significance. Efficient factoring ⇒ can break RSA.

Theorem. [AKS 2002] Poly-time algorithm for primality testing.

Shor's Algorithm

Shor's algorithm. Can factor an n-bit integer in O(n^3) time on a quantum computer.  (the algorithm uses the quantum QFT!)

Ramification. At least one of the following is wrong:
- RSA is secure.
- Textbook quantum mechanics.
- Extending Church-Turing thesis.
Shor's Factoring Algorithm

Period finding. Ex.

  i           :  0  1  2  3   4   5   6    7  …
  2^i         :  1  2  4  8  16  32  64  128  …
  2^i mod 15  :  1  2  4  8   1   2   4    8  …   period = 4
  2^i mod 21  :  1  2  4  8  16  11   1    2  …   period = 6

Theorem. [Euler] Let p and q be prime, and let N = p q. Then, the following sequence repeats with a period divisible by (p-1) (q-1):

  x mod N, x^2 mod N, x^3 mod N, x^4 mod N, …

Consequence. If we can learn something about the period of the sequence, we can learn something about the divisors of (p-1) (q-1).  (By using random values of x, we get the divisors of (p-1) (q-1), and from this, can get the divisors of N = p q.)
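A classical brute-force period finder illustrates the quantity Shor's algorithm extracts (illustrative only; the quantum algorithm finds the period without enumerating powers):

```python
def multiplicative_period(x, N):
    """Smallest r > 0 with x^r ≡ 1 (mod N); requires gcd(x, N) = 1."""
    value = x % N
    r = 1
    while value != 1:
        value = (value * x) % N
        r += 1
    return r
```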
Extra Slides

Fourier Matrix Decomposition

Recall the Fourier matrix

  F_n = [ ω^{jk} ]  for j, k = 0, …, n-1,  with ω = e^{2πi/n}.

Let I_{n/2} be the ½n-by-½n identity and D_{n/2} = diag(ω^0, ω^1, …, ω^{n/2-1}); for n = 8 these blocks are

  I_4 = [ 1 0 0 0 ]      D_4 = [ ω^0  0    0    0   ]
        [ 0 1 0 0 ]            [ 0    ω^1  0    0   ]
        [ 0 0 1 0 ]            [ 0    0    ω^2  0   ]
        [ 0 0 0 1 ]            [ 0    0    0    ω^3 ]

With a = (a_0, a_1, …, a_{n-1})ᵀ split into a_even = (a_0, a_2, …) and a_odd = (a_1, a_3, …):

  y = F_n a = [ I_{n/2}   D_{n/2} ] [ F_{n/2} a_even ]
              [ I_{n/2}  -D_{n/2} ] [ F_{n/2} a_odd  ]

This is exactly the FFT's combine step, written as matrix algebra.
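The decomposition can be verified numerically for a small n (a sketch; names are illustrative):

```python
import cmath

def fourier_matrix(n):
    """F_n[j][k] = ω^{jk} with ω = e^{2πi/n}."""
    w = cmath.exp(2j * cmath.pi / n)
    return [[w ** (j * k) for k in range(n)] for j in range(n)]

def apply(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

n = 4
a = [1.0, 2.0, 3.0, 4.0]
w = cmath.exp(2j * cmath.pi / n)
direct = apply(fourier_matrix(n), a)            # y = F_n a
top = apply(fourier_matrix(n // 2), a[0::2])    # F_{n/2} a_even
bot = apply(fourier_matrix(n // 2), a[1::2])    # F_{n/2} a_odd
D = [w ** k for k in range(n // 2)]             # diagonal of D_{n/2}
blocked = ([top[k] + D[k] * bot[k] for k in range(n // 2)] +
           [top[k] - D[k] * bot[k] for k in range(n // 2)])
```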