IEEE TRANSACTIONS ON COMPUTERS, VOL. X, NO. X, MONTH YYYY
A Karatsuba-based Algorithm for Polynomial
Multiplication in Chebyshev Form
Juliano B. Lima, Student Member, IEEE, Daniel Panario, Member, IEEE, and Qiang Wang
Abstract— In this paper, we present a new method for multiplying polynomials in Chebyshev form. Our approach has two
steps. First, the well-known Karatsuba’s algorithm is applied to
polynomials constructed by using Chebyshev coefficients. Then,
from the obtained result, extra arithmetic operations are used to
write the final result in Chebyshev form. The proposed algorithm
has a quadratic computational complexity. We also compare our
method to other approaches.
Index Terms— Theory of computation, analysis of algorithms
and problem complexity, computations on polynomials.
I. INTRODUCTION
CHEBYSHEV polynomials have been an essential mathematical object in several fields of knowledge. In Electronics, for
instance, such polynomials have an important role in the design
of analog and digital filters with characteristics close to the ideal
ones [1]. Recently, Chebyshev series, i.e., the approximation of
a function in terms of Chebyshev polynomials, were proposed for
analyzing circuit nonlinearities. This provides more accuracy
than other expansions, such as Taylor series [2].
Interpolation techniques via Chebyshev polynomials have been
part of numerical algorithms for calculating chromatic dispersion
coefficients of optical fibers. This allows us to plot the dispersion curves that describe the behavior of those fibers [3]. Such
techniques are also useful in direct digital frequency synthesis of
arbitrary waveform, resample procedures for discrete multitone
modems and many other scenarios [4], [5].
In general, the use of Chebyshev polynomials for approximating a function assures more stability than the monomial
representation or the use of other bases. In particular, if a truncation is necessary, the rapid decay of the Chebyshev expansion
coefficients entails relatively small rounding errors [6], [7]. This
is the basic reason that makes those polynomials highly attractive
in numerical analysis and, in particular, in approximation and
interpolation techniques.
This paper deals with the important operation of multiplication
of polynomials in Chebyshev form. That is, given two polynomials a(x) and b(x) in Chebyshev form, obtain the polynomial
c(x) = a(x) · b(x) also written in Chebyshev form. This problem
was previously addressed in [8], where two approaches were
given. The first one is a direct multiplication of polynomials in
Chebyshev form, while the second is based on the discrete cosine
transform (DCT). In this work, we propose a method based on
the well-known Karatsuba’s algorithm [9], [10].
Our approach consists of the application of Karatsuba's algorithm to the ordinary polynomials a′(x) and b′(x) obtained
from a(x) and b(x). The coefficients in the resulting product are
denoted by c′i. Then, we show that the Chebyshev coefficients
of c(x), denoted by ci, can be computed from the coefficients
c′i. This procedure, which needs extra arithmetic operations, is
derived and explained in detail. Although our method has
quadratic computational complexity, the number of required
multiplications is reduced by half when compared to the direct
multiplication [8]. In this respect, for the small degree polynomials
a(x) and b(x) that cover several practical applications of Chebyshev
expansions [2], [3], [4], our method is also more efficient than
the mentioned DCT approach. Moreover, our procedure seems to
provide implementation advantages because it does not introduce
rounding errors.
Manuscript received Month DD, YYYY; revised Month DD, YYYY.
J. B. Lima is with the Department of Electronics and Systems, Federal
University of Pernambuco, Recife, Brazil (e-mail: juliano [email protected]).
D. Panario and Q. Wang are with the School of Mathematics and Statistics, Carleton University, Ottawa, Canada (e-mail: [email protected];
[email protected]).
In Section II, we review Chebyshev polynomials and the
direct method for multiplying polynomials in Chebyshev form.
In Section III, after introducing the main ideas of this paper, the
standard Karatsuba’s algorithm is briefly shown. Then, we use this
algorithm to perform the Chebyshev basis polynomial multiplication and provide some examples. Furthermore, Theorem 2 gives
a precise estimate for the cost of the algorithm. A comparison
with other approaches and conclusions are given in Section IV.
II. MULTIPLICATION OF POLYNOMIALS IN CHEBYSHEV FORM
The classical definition of Chebyshev polynomials of the first
kind is
$$T_i(x) := \cos(i \cdot \arccos x), \tag{1}$$
where i ∈ N and x ∈ [−1, 1]. From Equation (1), we obtain
$T_0(x) = 1$, $T_1(x) = x$, and the recurrence relation
$$T_{i+1}(x) = 2x\,T_i(x) - T_{i-1}(x).$$
Hence, Chebyshev polynomials of degree i can be easily obtained.
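As an illustration of how the recurrence generates these polynomials, the following minimal Python sketch builds the monomial coefficients of T_0, ..., T_n. The function name `chebyshev_T` and the list-based representation (lowest degree first) are assumptions made for this example only and are not part of the paper.

```python
def chebyshev_T(n):
    """Return monomial coefficient lists (lowest degree first) of T_0..T_n.

    Illustrative helper: uses T_0 = 1, T_1 = x and
    T_{i+1}(x) = 2 x T_i(x) - T_{i-1}(x).
    """
    polys = [[1], [0, 1]]
    for i in range(1, n):
        prev, cur = polys[i - 1], polys[i]
        nxt = [0] + [2 * c for c in cur]   # coefficients of 2 x T_i(x)
        for k, c in enumerate(prev):       # subtract T_{i-1}(x)
            nxt[k] -= c
        polys.append(nxt)
    return polys[: n + 1]


# T_3(x) = 4 x^3 - 3 x, stored as [0, -3, 0, 4]
print(chebyshev_T(3)[3])
```

The monomial representation is used here only to display the polynomials; the algorithm discussed in this paper works directly on Chebyshev coefficients.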
It is also shown that every real polynomial a(x) of degree
≤ N − 1 can be written as a linear combination of Chebyshev
polynomials of the first kind [6]. Usually, this is called Chebyshev
expansion and it is given by
$$a(x) = \frac{a_0}{2} + \sum_{i=1}^{N-1} a_i T_i(x), \quad a_i \in \mathbb{R}. \tag{2}$$
From the relation
$$T_i T_j = \frac{T_{i+j} + T_{|i-j|}}{2}, \quad i, j \in \mathbb{N},$$
which can be verified by using simple trigonometric identities,
a multiplication rule for polynomials in Chebyshev form can be
derived. It is described in the following proposition [8].
Proposition 1: Let a(x) and b(x) be polynomials of degree
N − 1 given in the Chebyshev form
$$a(x) = \frac{a_0}{2} + \sum_{i=1}^{N-1} a_i T_i(x)$$
and
$$b(x) = \frac{b_0}{2} + \sum_{i=1}^{N-1} b_i T_i(x),$$
where $a_i, b_i \in \mathbb{R}$. Then the product c(x) = a(x) · b(x) has the
Chebyshev form
$$c(x) = \frac{c_0}{2} + \sum_{i=1}^{2N-2} c_i T_i(x)$$
with
$$2c_i = \begin{cases}
a_0 \cdot b_0 + 2 \sum_{l=1}^{N-1} a_l \cdot b_l, & i = 0; \\
\sum_{l=0}^{i} a_{i-l} \cdot b_l + \sum_{l=1}^{N-1-i} (a_l \cdot b_{l+i} + a_{l+i} \cdot b_l), & i = 1, \ldots, N-2; \\
\sum_{l=i-N+1}^{N-1} a_{i-l} \cdot b_l, & i = N-1, \ldots, 2N-2.
\end{cases} \tag{3}$$
The computation of all coefficients $c_i$, i = 0, . . . , 2N − 2,
directly from Equation (3) is referred to as the “direct method” and
involves $O(N^2)$ real multiplications [8]. Counting in this equation
all possible products $a_i \cdot b_j$, i, j = 0, . . . , N − 1, together with
the products by 1/2, gives $M_d(N)$,
the exact number of multiplications for computing all coefficients
$c_i$ using the direct method,
$$M_d(N) = N^2 + 2N - 1. \tag{4}$$
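The direct method can be sketched in a few lines of Python. This is only an illustrative transcription of Equation (3), not an optimized implementation: it recomputes products instead of sharing them, so it does not reproduce the exact operation count of Equation (4). The names `cheb_eval` and `cheb_mul_direct` are hypothetical, and coefficient lists follow the convention of Equation (2), in which the first coefficient is halved.

```python
import math


def cheb_eval(coef, x):
    """Evaluate coef[0]/2 + sum_{i>=1} coef[i] * T_i(x) for x in [-1, 1]."""
    theta = math.acos(x)
    return coef[0] / 2 + sum(
        c * math.cos(i * theta) for i, c in enumerate(coef) if i > 0
    )


def cheb_mul_direct(a, b):
    """Chebyshev coefficients c_0..c_{2N-2} of a(x) * b(x), via Equation (3)."""
    n = len(a)
    assert len(b) == n
    c = []
    for i in range(2 * n - 1):
        if i == 0:
            two_c = a[0] * b[0] + 2 * sum(a[l] * b[l] for l in range(1, n))
        elif i <= n - 2:
            two_c = sum(a[i - l] * b[l] for l in range(i + 1))
            two_c += sum(
                a[l] * b[l + i] + a[l + i] * b[l] for l in range(1, n - i)
            )
        else:
            two_c = sum(a[i - l] * b[l] for l in range(i - n + 1, n))
        c.append(two_c / 2)
    return c


# Sanity check: the product in Chebyshev form agrees with pointwise products.
a = [1.0, 2.0, -3.0, 0.5]
b = [0.5, -1.0, 4.0, 2.0]
c = cheb_mul_direct(a, b)
for x in (-0.9, -0.3, 0.2, 0.7):
    assert abs(cheb_eval(c, x) - cheb_eval(a, x) * cheb_eval(b, x)) < 1e-9
```

The pointwise check relies only on Equation (1), so it is independent of the multiplication rule being tested.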
According to Equation (3), given integers $i_1$ and $i_2$ such that
$1 \le i_1 \le N - 2$ and $i_1 < i_2 \le 2N - 2$, any term of the form
$(a_l \cdot b_{l+i_1} + a_{l+i_1} \cdot b_l)$, $l = 1, \ldots, N - 1 - i_1$, is previously computed
in the sum $\sum_{l=0}^{i_2} a_{i_2-l} \cdot b_l$ or $\sum_{l=i_2-N+1}^{N-1} a_{i_2-l} \cdot b_l$. Consequently,
in the second row of that equation, the additions $(a_l \cdot b_{l+i} + a_{l+i} \cdot b_l)$
do not need to be counted. Therefore, $A_d(N)$, the exact number of
additions for obtaining all coefficients $c_i$ using the direct method,
is
$$A_d(N) = N - 1 + \sum_{i=1}^{N-2} (N - 1) + \sum_{i=N-1}^{2N-2} (2N - 2 - i) = \frac{(N - 1)(3N - 2)}{2}. \tag{5}$$
III. KARATSUBA-BASED ALGORITHM FOR THE MULTIPLICATION OF POLYNOMIALS IN CHEBYSHEV FORM
In this section, we present our algorithm: we use Karatsuba's
algorithm to compute the product of two polynomials whose
Chebyshev coefficients are given. The intermediate results of Karatsuba's
algorithm are stored and then used to obtain the
Chebyshev coefficients $c_i$ of the product polynomial. The key
point of our approach is to apply Karatsuba's algorithm for
performing an ordinary polynomial multiplication and to recover
the Chebyshev coefficients through some equations.
More specifically, in order to use the algorithm for multiplying
a(x) and b(x), coefficients $a_i$ and $b_i$ are associated to the term of
degree i, i = 0, . . . , N − 1, in the monomial representation. This
procedure gives polynomials $a'(x) = \sum_{i=0}^{N-1} a_i x^i$ and $b'(x) = \sum_{i=0}^{N-1} b_i x^i$.
By running Karatsuba's algorithm, we obtain the
polynomial $c'(x) = a'(x) \cdot b'(x) = \sum_{i=0}^{2N-2} c'_i x^i$. On the other
hand, these coefficients $c'_i$ are given by
$$c'_i = \begin{cases}
a_0 \cdot b_0, & i = 0; \\
\sum_{l=0}^{i} a_{i-l} \cdot b_l, & i = 1, \ldots, N-2; \\
\sum_{l=i-N+1}^{N-1} a_{i-l} \cdot b_l, & i = N-1, \ldots, 2N-2.
\end{cases} \tag{6}$$
By substituting Equation (6) into Equation (3), we obtain
$$2c_i = \begin{cases}
c'_i + 2 \sum_{l=1}^{N-1} a_l \cdot b_l, & i = 0; \\
c'_i + \sum_{l=1}^{N-1-i} (a_l \cdot b_{l+i} + a_{l+i} \cdot b_l), & i = 1, \ldots, N-2; \\
c'_i, & i = N-1, \ldots, 2N-2.
\end{cases} \tag{7}$$
We remark that Equation (6) can be obtained by running a
classical divide and conquer method. It involves $N^2$ multiplications
and $N - 1 + N(N - 1) \sum_{k=1}^{(\log_2 N) - 1} 2^{-k}$ additions. To obtain
coefficients $c_i$ in Equation (7), we need extra operations (that is,
2N − 1 extra multiplications and N(N − 1)/2 extra additions).
The total numbers of multiplications and additions are equal to the
same numbers for the direct method; see Equations (4) and (5).
That is why in this paper we concentrate on using Karatsuba's
algorithm to obtain coefficients $c'_i$ in Equation (6).
Given coefficients $c'_i$ computed by Karatsuba's algorithm, coefficients $c_i$ could be obtained from Equation (7) with the following
number of extra multiplications: 2N − 1 due to the scale factor
1/2; N − 1 for computing terms $a_l \cdot b_l$, l = 1, . . . , N − 1;
(N − 2)(N − 1)/2 for computing terms $(a_l \cdot b_{l+i} + a_{l+i} \cdot b_l) = (a_l + a_{l+i}) \cdot (b_l + b_{l+i}) - a_l \cdot b_l - a_{l+i} \cdot b_{l+i}$, i = 1, . . . , N − 2,
l = 1, . . . , N − 1 − i. This implies a total number of extra
multiplications given by $(N^2 + 3N - 2)/2$.
The number of extra additions related to the first and second
rows of Equation (7) would be N − 1 and 5(N − 2)(N − 1)/2,
respectively. Then, the total number of extra additions would
be $(5N^2 - 13N + 8)/2$. We show how these numbers of extra
operations can be further reduced by using the intermediate results
of the previously applied Karatsuba's algorithm.
Our algorithm is given below.
Algorithm: Karatsuba-based algorithm for polynomial multiplication in Chebyshev form.
Input: polynomials $a(x) = \frac{a_0}{2} + \sum_{i=1}^{N-1} a_i T_i(x)$ and $b(x) = \frac{b_0}{2} + \sum_{i=1}^{N-1} b_i T_i(x)$ of degree N − 1 in Chebyshev form.
Output: polynomial $c(x) = a(x) \cdot b(x) = \frac{c_0}{2} + \sum_{i=1}^{2N-2} c_i T_i(x)$ of degree 2N − 2 in Chebyshev form.
Step 1: Apply Karatsuba's algorithm on polynomials $a'(x) = \sum_{i=0}^{N-1} a_i x^i$ and $b'(x) = \sum_{i=0}^{N-1} b_i x^i$, the product of which is
denoted by $c'(x) = a'(x) \cdot b'(x) = \sum_{i=0}^{2N-2} c'_i x^i$, and store all
intermediate computations.
Step 1.1: These $c'_i$ are obtained from Equation (6).
Step 1.2: Clearly, any intermediate computation related to the
term of degree d in the polynomial c′(x) can be written in the
form $\sum_{k=0}^{D} (a_{i_k} \cdot b_{j_k} + a_{j_k} \cdot b_{i_k})$, where D ≤ N − 1 and $i_k + j_k = d$.
Step 2: Obtain terms $a_l \cdot b_l$, l = 1, . . . , N − 1, and $(a_l \cdot b_{l+i} + a_{l+i} \cdot b_l)$,
i = 1, . . . , N − 2, l = 1, . . . , N − 1 − i, from intermediate
computations of the form presented in Step 1.2. This may require
a separation procedure.
Step 2.1 (Separation): Separate each term of the form $(a_l \cdot b_{l+i} + a_{l+i} \cdot b_l)$
from the intermediate term $\sum_{k=0}^{D} (a_{i_k} \cdot b_{j_k} + a_{j_k} \cdot b_{i_k})$,
D > 0, such that $i_k + j_k = 2l + i$, for i = 1, . . . , N − 2 and
l = 1, . . . , N − 1 − i.
Step 3: Add the terms obtained in Step 2 to coefficients $c'_i$, i =
0, . . . , N − 2, according to the first and second rows of Equation (7),
to obtain $c(x) = a(x) \cdot b(x) = \frac{c_0}{2} + \sum_{i=1}^{2N-2} c_i T_i(x)$.
We provide details concerning the execution of Step 2 of
the presented algorithm in Section III-B. The correctness of the
algorithm is immediate from Equations (3), (6) and (7).
A. Karatsuba's Algorithm
Assume that we want to multiply two polynomials, a′(x) and
b′(x), with degrees N − 1. These polynomials are given in the
monomial form and have coefficients $a_i$ and $b_i$ respectively. For
the purpose of this paper, we consider $N = 2^n$, n ∈ N. However,
there are also efficient ways for dealing with polynomials with
degrees different from $2^n - 1$ [10], [11]. We may write
$$a'(x) = A_1(x)\,x^{N/2} + A_0(x)$$
and
$$b'(x) = B_1(x)\,x^{N/2} + B_0(x),$$
where
$$A_1(x) = a_{N-1} x^{N/2-1} + \cdots + a_{N/2},$$
$$A_0(x) = a_{N/2-1} x^{N/2-1} + \cdots + a_0,$$
$$B_1(x) = b_{N-1} x^{N/2-1} + \cdots + b_{N/2},$$
$$B_0(x) = b_{N/2-1} x^{N/2-1} + \cdots + b_0.$$
We have $c'(x) = a'(x) \cdot b'(x)$ given by
$$c'(x) = [A_1(x) B_1(x)]\,x^N + [A_0(x) B_1(x) + A_1(x) B_0(x)]\,x^{N/2} + [A_0(x) B_0(x)]. \tag{8}$$
In the above equation, simplifying the notation and omitting
“(x)”, the term multiplying $x^{N/2}$ may be rewritten as
$$A_0 B_1 + A_1 B_0 = (A_0 + A_1)(B_0 + B_1) - A_0 B_0 - A_1 B_1.$$
This saves one multiplication, because we have previously computed $A_0 B_0$ and $A_1 B_1$. Therefore, the product of polynomials
with degree N − 1 may be computed using three products of
polynomials with degree (N/2) − 1. As this procedure is recursive,
it can be shown that Karatsuba's algorithm for multiplying polynomials of degree N − 1, $N = 2^n$, i.e., for obtaining coefficients $c'_i$, can be
done with $N^{\log_2 3}$ multiplications and at most $6 N^{\log_2 3} - 8N + 2$
additions [12].
It is important to notice that we are not applying Karatsuba's algorithm in a black-box manner. Instead, we store
all intermediate results to be used later. We also remark that such an algorithm has a “three term” structure
based on the recursive computation of $A_1 B_1$, $A_0 B_0$ and
$A_0 B_1 + A_1 B_0 = (A_0 + A_1)(B_0 + B_1) - A_0 B_0 - A_1 B_1$. Throughout this paper, intermediate terms involved in the computation of
$A_1 B_1$, $A_0 B_0$ and $A_0 B_1 + A_1 B_0$ are respectively associated to
symbols 11, 00 and 01.
B. Extra Operations for Karatsuba's algorithm
According to Equation (7), in order to obtain the Chebyshev
coefficients $c_i$ of polynomial c(x) from coefficients $c'_i$, we need
to consider the scaling factors 1/2 and to compute the terms $a_l \cdot b_l$,
l = 1, . . . , N − 1, and $(a_l \cdot b_{l+i} + a_{l+i} \cdot b_l)$, i = 1, . . . , N − 2,
l = 1, . . . , N − 1 − i. Due to the recursive nature of Karatsuba's
algorithm, some of these terms appear computed together with
other terms. Therefore, extra arithmetic operations will be needed
for computing them separately before conveniently adding them
to coefficients $c'_i$. In this paper, this procedure is referred to as
separation. Briefly, extra operations for obtaining coefficients $c_i$
from coefficients $c'_i$ are related to:
• operations for separating terms originally computed together
with other terms;
• additions of terms $a_l \cdot b_l$ and $(a_l \cdot b_{l+i} + a_{l+i} \cdot b_l)$ respectively
on the first and second rows of Equation (7);
• multiplications by the scale factor 1/2.
The total number of required extra operations is stated in the
following theorem.
Theorem 1: Let a(x) and b(x) be polynomials of degree N − 1
whose Chebyshev coefficients $a_i$ and $b_i$, i = 0, . . . , N − 1, are
given. Let $a'(x) = \sum_{i=0}^{N-1} a_i x^i$ and $b'(x) = \sum_{i=0}^{N-1} b_i x^i$ be
polynomials whose product is denoted by $c'(x) = \sum_{i=0}^{2N-2} c'_i x^i$.
If the polynomial c′(x) is computed using Karatsuba's algorithm,
then the Chebyshev coefficients $c_i$, i = 0, . . . , 2N − 2, of the
polynomial c(x) = a(x) · b(x) are obtained from the coefficients
$c'_i$, i = 0, . . . , 2N − 2, with
$$M_e(N) = \frac{N^2 - 2 N^{\log_2 3} + 5N - 2}{2} \tag{9}$$
extra multiplications and
$$A_e(N) \le \frac{5N^2 - 6 N^{\log_2 3} + N(1 - \log_2 N)}{2} \tag{10}$$
extra additions.
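The closed forms in Equations (9) and (10) are easy to evaluate numerically. The short Python sketch below does so for N a power of two, using the identity N^{log2 3} = 3^n for N = 2^n so that only integer arithmetic is needed. The function names are hypothetical. The printed values agree with the counts worked out by hand in Examples 1 to 3 of this section.

```python
def extra_multiplications(N):
    """M_e(N) from Equation (9); N is assumed to be a power of two."""
    n = N.bit_length() - 1               # n = log2(N)
    return (N * N - 2 * 3 ** n + 5 * N - 2) // 2


def extra_additions_bound(N):
    """Upper bound on A_e(N) from Equation (10); N a power of two."""
    n = N.bit_length() - 1
    return (5 * N * N - 6 * 3 ** n + N * (1 - n)) // 2


for N in (2, 4, 8):
    print(N, extra_multiplications(N), extra_additions_bound(N))
# 2 3 1
# 4 8 11
# 8 24 71
```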
Before presenting the proof of Theorem 1, we introduce some
notation and develop examples which make the derivation of
Equations (9) and (10) easier to understand. In particular, we
are interested in observing which intermediate terms related to
symbols 11, 00 and 01 are produced together. In what follows,
terms with this characteristic are written between ⟨·⟩; we omit
this notation for single terms $a_i \cdot b_i$.
Example 1: We want to multiply polynomials a(x) and b(x),
N = 2, whose Chebyshev coefficients ai and bi are given. Using
Karatsuba’s algorithm for computing coefficients c′i , we have
$$11: \quad c'_2 = A_1 B_1 = a_1 \cdot b_1; \tag{11}$$
$$00: \quad c'_0 = A_0 B_0 = a_0 \cdot b_0; \tag{12}$$
$$01: \quad c'_1 = (A_1 + A_0)(B_1 + B_0) - A_1 B_1 - A_0 B_0 = (a_1 + a_0)(b_1 + b_0) - a_1 b_1 - a_0 b_0 = \langle a_0 \cdot b_1 + a_1 \cdot b_0 \rangle. \tag{13}$$
From Equation (7), we directly obtain $c_2 = c'_2/2$, $c_1 = c'_1/2$ and
$c_0 = c'_0/2 + c'_2$, because there are no terms to be separated. In
this case, extra operations are exclusively due to the scale factor
1/2 and the addition $c'_0/2 + c'_2$, which results in $M_e(2) = 3$ and
$A_e(2) = 1$.
Example 2: A second example is to multiply a(x) and b(x)
where N = 4. As Karatsuba's algorithm is recursive, in this
case, the computation of $A_1 B_1$ and $A_0 B_0$ may be viewed as
repetitions of the first example. Therefore, the intermediate terms
related to symbols 11 and 00 are
$$11: \quad c'_6 = a_3 \cdot b_3, \quad c'_5 = \langle a_2 \cdot b_3 + a_3 \cdot b_2 \rangle, \quad a_2 \cdot b_2; \tag{14}$$
$$00: \quad a_1 \cdot b_1, \quad c'_1 = \langle a_0 \cdot b_1 + a_1 \cdot b_0 \rangle, \quad c'_0 = a_0 \cdot b_0. \tag{15}$$
The computation of $(A_1 + A_0)(B_1 + B_0) - A_1 B_1 - A_0 B_0$
is similar, but special care is needed with terms produced
together. More specifically, we have $(A_1 + A_0) = (a_3 + a_1)x + (a_2 + a_0)$
and $(B_1 + B_0) = (b_3 + b_1)x + (b_2 + b_0)$, the product
of which produces terms
$$\langle (a_3 + a_1) \cdot (b_3 + b_1) \rangle, \quad \langle (a_3 + a_1) \cdot (b_2 + b_0) + (a_2 + a_0) \cdot (b_3 + b_1) \rangle, \quad \langle (a_2 + a_0) \cdot (b_2 + b_0) \rangle.$$
The subtractions of $A_1 B_1$ and $A_0 B_0$ come from the intermediate
terms related to symbols 11 and 00, respectively, in Equations (14)
and (15). By subtracting $a_3 \cdot b_3$ and $a_1 \cdot b_1$ from $\langle (a_3 + a_1) \cdot (b_3 + b_1) \rangle$,
we obtain $\langle a_1 \cdot b_3 + a_3 \cdot b_1 \rangle$; by subtracting $a_2 \cdot b_2$ and $a_0 \cdot b_0$
from $\langle (a_2 + a_0) \cdot (b_2 + b_0) \rangle$, we get $\langle a_0 \cdot b_2 + a_2 \cdot b_0 \rangle$; by subtracting
$\langle a_2 \cdot b_3 + a_3 \cdot b_2 \rangle$ and $\langle a_0 \cdot b_1 + a_1 \cdot b_0 \rangle$ from $\langle (a_3 + a_1) \cdot (b_2 + b_0) + (a_2 + a_0) \cdot (b_3 + b_1) \rangle$,
we obtain $\langle a_1 \cdot b_2 + a_2 \cdot b_1 + a_0 \cdot b_3 + a_3 \cdot b_0 \rangle$.
Therefore, the final result for the intermediate terms related to
symbol 01 is
$$01: \quad \langle a_1 \cdot b_3 + a_3 \cdot b_1 \rangle, \quad c'_3 = \langle a_1 \cdot b_2 + a_2 \cdot b_1 + a_0 \cdot b_3 + a_3 \cdot b_0 \rangle, \quad \langle a_0 \cdot b_2 + a_2 \cdot b_0 \rangle. \tag{16}$$
We recall that coefficients $c'_i$, i = 0, . . . , 6, are obtained by
running Karatsuba's algorithm after all other intermediate terms
are computed. However, at this point, we just want to observe the
terms that are produced together, which is sufficient to perform the
first step of the algorithm. In this sense, from Equation (7), we
know in particular that
$$c_1 = \frac{c'_1 + (a_1 \cdot b_2 + a_2 \cdot b_1 + a_2 \cdot b_3 + a_3 \cdot b_2)}{2}. \tag{17}$$
Hence, in order to evaluate $c_1$, we need to compute $a_1 \cdot b_2 + a_2 \cdot b_1$,
because this term is originally produced together with $a_0 \cdot b_3 + a_3 \cdot b_0$,
as shown in Equation (16). Since we know $a_1 \cdot b_1$ and
$a_2 \cdot b_2$, this requires one multiplication and four additions because
$$a_1 \cdot b_2 + a_2 \cdot b_1 = (a_1 + a_2) \cdot (b_1 + b_2) - a_1 \cdot b_1 - a_2 \cdot b_2.$$
All other coefficients $c_i$ can be obtained in a similar way. Naturally,
we still need to count the other extra operations mentioned before
Theorem 1. The final result is $M_e(4) = 8$ and $A_e(4) = 11$.
Remark: In Example 2, we do not need to separate $a_0 \cdot b_3 + a_3 \cdot b_0$.
However, this term could be obtained by using one more
addition, namely
$$a_0 \cdot b_3 + a_3 \cdot b_0 = \langle a_1 \cdot b_2 + a_2 \cdot b_1 + a_0 \cdot b_3 + a_3 \cdot b_0 \rangle - \langle a_1 \cdot b_2 + a_2 \cdot b_1 \rangle.$$
Although the last step of the separation procedure is not required
in Example 2, we do need to use it in multiplications involving
larger degree polynomials.
Example 3: In this example, we want to multiply a(x) and
b(x) for N = 8. As in Example 2, the computation of $A_1 B_1$
and $A_0 B_0$ may be viewed as repetitions of the case N = 4. The
terms obtained are
$$11: \quad c'_{14} = a_7 \cdot b_7, \quad c'_{13} = \langle a_6 \cdot b_7 + a_7 \cdot b_6 \rangle, \quad a_6 \cdot b_6, \quad \langle a_5 \cdot b_7 + a_7 \cdot b_5 \rangle, \quad c'_{11} = \langle a_5 \cdot b_6 + a_6 \cdot b_5 + a_4 \cdot b_7 + a_7 \cdot b_4 \rangle, \quad \langle a_4 \cdot b_6 + a_6 \cdot b_4 \rangle, \quad a_5 \cdot b_5, \quad \langle a_4 \cdot b_5 + a_5 \cdot b_4 \rangle, \quad a_4 \cdot b_4; \tag{18}$$
$$00: \quad a_3 \cdot b_3, \quad \langle a_2 \cdot b_3 + a_3 \cdot b_2 \rangle, \quad a_2 \cdot b_2, \quad \langle a_1 \cdot b_3 + a_3 \cdot b_1 \rangle, \quad c'_3 = \langle a_1 \cdot b_2 + a_2 \cdot b_1 + a_0 \cdot b_3 + a_3 \cdot b_0 \rangle, \quad \langle a_0 \cdot b_2 + a_2 \cdot b_0 \rangle, \quad a_1 \cdot b_1, \quad c'_1 = \langle a_0 \cdot b_1 + a_1 \cdot b_0 \rangle, \quad c'_0 = a_0 \cdot b_0. \tag{19}$$
The computation of $(A_1 + A_0)(B_1 + B_0) - A_1 B_1 - A_0 B_0$ is
also analogous. We give only its final result, that is
$$01: \quad \langle a_3 \cdot b_7 + a_7 \cdot b_3 \rangle, \quad \langle a_2 \cdot b_7 + a_7 \cdot b_2 + a_3 \cdot b_6 + a_6 \cdot b_3 \rangle, \quad \langle a_2 \cdot b_6 + a_6 \cdot b_2 \rangle, \quad \langle a_1 \cdot b_7 + a_7 \cdot b_1 + a_3 \cdot b_5 + a_5 \cdot b_3 \rangle, \quad c'_7 = \langle a_1 \cdot b_6 + a_6 \cdot b_1 + a_2 \cdot b_5 + a_5 \cdot b_2 + a_0 \cdot b_7 + a_7 \cdot b_0 + a_3 \cdot b_4 + a_4 \cdot b_3 \rangle, \quad \langle a_0 \cdot b_6 + a_6 \cdot b_0 + a_2 \cdot b_4 + a_4 \cdot b_2 \rangle, \quad \langle a_1 \cdot b_5 + a_5 \cdot b_1 \rangle, \quad \langle a_0 \cdot b_5 + a_5 \cdot b_0 + a_1 \cdot b_4 + a_4 \cdot b_1 \rangle, \quad \langle a_0 \cdot b_4 + a_4 \cdot b_0 \rangle. \tag{20}$$
Let us consider the term $c'_7 = \langle a_1 \cdot b_6 + a_6 \cdot b_1 + a_2 \cdot b_5 + a_5 \cdot b_2 + a_0 \cdot b_7 + a_7 \cdot b_0 + a_3 \cdot b_4 + a_4 \cdot b_3 \rangle$.
Terms $a_1 \cdot b_6 + a_6 \cdot b_1$, $a_2 \cdot b_5 + a_5 \cdot b_2$
and $a_3 \cdot b_4 + a_4 \cdot b_3$ need to be separated from $c'_7$ because they must
also be added to $c'_5$, $c'_3$ and $c'_1$, in order to compute $c_5$, $c_3$ and $c_1$,
respectively. Similarly to the previous example, one multiplication
and four additions are necessary for calculating each one of these
terms. From the term $\langle a_2 \cdot b_7 + a_7 \cdot b_2 + a_3 \cdot b_6 + a_6 \cdot b_3 \rangle$, which
is associated to $c'_9$, we need to separate $a_2 \cdot b_7 + a_7 \cdot b_2$ and
$a_3 \cdot b_6 + a_6 \cdot b_3$, and respectively add them to $c'_5$ and $c'_3$, in order to
compute $c_5$ and $c_3$. The same procedure is applied for all terms
which are previously computed together. After this, other extra
operations have to be counted for adding the separated terms to
coefficients $c'_i$ and multiplying by the factor 1/2. This results in
$M_e(8) = 24$ and $A_e(8) = 71$.
With the previous examples in mind, we can derive a formula for the number of operations necessary to separate terms
produced together in Karatsuba's algorithm. We start by observing
the intermediate terms produced by the algorithm, i.e., before
obtaining the final result for the coefficients of c′(x). We associate
terms of the form $a_i \cdot b_i$ to 0, $\langle a_{i_1} \cdot b_{j_1} + a_{j_1} \cdot b_{i_1} \rangle$ to 1,
$\langle a_{i_1} \cdot b_{j_1} + a_{j_1} \cdot b_{i_1} + a_{i_2} \cdot b_{j_2} + a_{j_2} \cdot b_{i_2} \rangle$ to 2,
$\langle a_{i_1} \cdot b_{j_1} + a_{j_1} \cdot b_{i_1} + a_{i_2} \cdot b_{j_2} + a_{j_2} \cdot b_{i_2} + a_{i_3} \cdot b_{j_3} + a_{j_3} \cdot b_{i_3} + a_{i_4} \cdot b_{j_4} + a_{j_4} \cdot b_{i_4} \rangle$
to 4, etc. In general, a term of the form
$$\left\langle \sum_{k=1}^{2^t} (a_{i_k} \cdot b_{j_k} + a_{j_k} \cdot b_{i_k}) \right\rangle, \tag{21}$$
where t ∈ N and $i_k + j_k$ is a constant for $1 \le k \le 2^t$, is associated
to the number or status $s = 2^t$. If we consider that all terms in the
above expression need to be separated, s − 1 extra multiplications
are required. Consequently, at most 4(s − 1) + 1 extra additions are
necessary. The upper bound is justified by the possible presence
of terms of the form $\langle a_0 \cdot b_i + a_i \cdot b_0 \rangle$, i ≠ 0, produced together
with other terms. They do not need to be separated and, in these
cases, one addition is saved; see the remark after Example 2.
After applying the separation procedure just explained, every
term has status at most 1, i.e., has the form $a_i \cdot b_i$ or $\langle a_i \cdot b_j + a_j \cdot b_i \rangle$.
Such terms are then added to coefficients $c'_i$ according to
Equation (7) in order to obtain coefficients $c_i$.
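A single separation step can be illustrated with a few lines of Python. The sketch below assumes that the diagonal products a_k · b_k are already available from Karatsuba's run; the helper name `separate_pair` and the list `diag` are hypothetical. It uses one new multiplication and four additions or subtractions, matching the count used in Example 2.

```python
def separate_pair(a, b, i, j, diag):
    """Return a[i]*b[j] + a[j]*b[i] with one new multiplication.

    diag[k] must already hold a[k]*b[k] (assumed to be stored while
    running Karatsuba's algorithm).
    """
    return (a[i] + a[j]) * (b[i] + b[j]) - diag[i] - diag[j]


# Example 2 (N = 4): split the status-2 term <a1 b2 + a2 b1 + a0 b3 + a3 b0>.
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
diag = [x * y for x, y in zip(a, b)]
grouped = a[1] * b[2] + a[2] * b[1] + a[0] * b[3] + a[3] * b[0]

inner = separate_pair(a, b, 1, 2, diag)   # a1 b2 + a2 b1
outer = grouped - inner                   # a0 b3 + a3 b0, one more addition
print(inner, outer)                       # 32 28
```

The last subtraction corresponds to the optional extra addition described in the remark after Example 2.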
For N = 1, we have a0 · b0 only, which has status 0 and
does not represent any extra operation. Since this case is like
an “initial state”, we associate it to 01. For N = 2, we have a
repetition of the previous one on terms associated to 11 and 00;
see Equations (11) and (12). The symbol 01 is also a repetition of
the previous one, but with a status incremented from 0 to 1; see
Equation (13). Due to the recursive nature of the algorithm, an
analogous fact occurs for N = 4, 8, . . .. This may be verified
in Equations (14)–(16) and Equations (18)–(20). This allows us
to construct Table I, which shows the status of all terms in
Karatsuba’s algorithm up to N = 8. The last row emphasizes
that m(n), the number of multiplications necessary for separating
terms that Karatsuba’s algorithm computes together, is obtained
by summing contributions of terms associated to 11, 01 and 00.
These contributions are respectively denoted by m(n)11 , m(n)01
and m(n)00 .
If N = 4, for instance, we have $m(n) = m(n)_{01} = 1$ because
only the term with status 2 associated to 01 requires a separation
procedure (see Table I). Specifically, this term corresponds to
$\langle a_1 \cdot b_2 + a_2 \cdot b_1 + a_0 \cdot b_3 + a_3 \cdot b_0 \rangle$, presented in Example 2. If N = 8,
we have $m(n)_{00} = 1$ (one term with status 2), $m(n)_{01} = 7$ (four
terms with status 2 and one term with status 4) and $m(n)_{11} = 1$
(one term with status 2). These terms may be distinguished in
Equations (18)–(20). In this case, m(n) = 1 + 7 + 1 = 9.
Moreover, by comparing rows for N = 4 (n = 2) and N =
8 (n = 3) in Table I, we note that m(3)11 = m(3)00 = m(2);
m(3)01 is given by 2 m(2)01 plus the contribution of the terms
related to m(2)01 , but with incremented (doubled) statuses. Due
to the recursion of Karatsuba’s algorithm, this situation is general,
that is, m(n)11 = m(n)00 = m(n − 1) and m(n)01 is given by
2 m(n − 1)01 plus the contribution of the terms associated to
m(n − 1)01 with incremented statuses.
Proof of Theorem 1: By using previous notation and remarks,
the number of multiplications necessary for separating terms that
Karatsuba’s algorithm computes together, m(n), is given by
m(n) = m(n)11 + m(n)01 + m(n)00 .
(22)
We know that
m(n)11 = m(n)00 = m(n − 1).
(23)
From the above comments, $m(n)_{01}$ is given by $2\,m(n-1)_{01}$ plus the
contribution of the terms related to $m(n-1)_{01}$ with incremented
(doubled) statuses. A term with status $s_1 = 2^n$, n ≥ 0, contributes
with $m_{s_1} = 2^n - 1$ extra multiplications. Consequently, a term
with status $s_2 = 2 s_1 = 2^{n+1}$ contributes with $m_{s_2} = 2^{n+1} - 1 = 2(2^n - 1) + 1 = 2 m_{s_1} + 1$ extra multiplications. Then, if a set
with t terms contributes with $m_t$ extra multiplications, a new set,
obtained by doubling the status of each term in the previous set,
contributes with $2 m_t + t$ extra multiplications. We note that there
are $3^{n-2}$ terms associated to $m(n-1)_{01}$ (see Table I for the
cases n = 1, 2, 3). Therefore, by doubling the status of each one
of these terms, the new contribution is $2\,m(n-1)_{01} + 3^{n-2}$. This
allows us to write
$$m(n)_{01} = 2\,m(n-1)_{01} + 2\,m(n-1)_{01} + 3^{n-2} = 4\,m(n-1)_{01} + 3^{n-2}.$$
We also note that $m(n-1)_{01} = m(n-1) - 2\,m(n-2)$. Thus,
the above equation may be rewritten as
$$m(n)_{01} = 4\,(m(n-1) - 2\,m(n-2)) + 3^{n-2}. \tag{24}$$
By substituting Equations (23) and (24) in Equation (22), we have
$$m(n) = 2\,m(n-1) + 4\,(m(n-1) - 2\,m(n-2)) + 3^{n-2} = 6\,m(n-1) - 8\,m(n-2) + 3^{n-2}. \tag{25}$$
Equation (25) is a recurrence relation (see footnote 1) and it can be solved by
means of the z-transform. Denoting by M(z) the z-transform of
m(n), Equation (25) is written in the z-transform domain as
$$M(z) = 6\,M(z)\,z^{-1} - 8\,M(z)\,z^{-2} + \frac{z^{-2}}{1 - 3 z^{-1}}.$$
In the last equation, grouping the terms with M(z), we have
$$M(z) = \frac{z^{-2}}{(1 - 6 z^{-1} + 8 z^{-2})(1 - 3 z^{-1})} = \frac{1/2}{1 - 4 z^{-1}} + \frac{1/2}{1 - 2 z^{-1}} - \frac{1}{1 - 3 z^{-1}}. \tag{26}$$
Applying the inverse z-transform to Equation (26), one obtains
$$m(n) = \frac{4^n + 2^n - 2 \cdot 3^n}{2}.$$
The above equation can be written as a function of N as
$$m(N) = \frac{N^2 + N - 2 N^{\log_2 3}}{2}.$$
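As a quick numerical cross-check of this closed form, the following Python sketch iterates the recurrence of Equation (25) and compares it with $(4^n + 2^n - 2 \cdot 3^n)/2$. The initial values m(0) = m(1) = 0 are taken from the last column of Table I; the function names are hypothetical.

```python
def m_recurrence(n_max):
    """Iterate m(n) = 6 m(n-1) - 8 m(n-2) + 3^(n-2) with m(0) = m(1) = 0."""
    m = [0, 0]
    for n in range(2, n_max + 1):
        m.append(6 * m[n - 1] - 8 * m[n - 2] + 3 ** (n - 2))
    return m[: n_max + 1]


def m_closed(n):
    """Closed form obtained from the inverse z-transform."""
    return (4 ** n + 2 ** n - 2 * 3 ** n) // 2


values = m_recurrence(8)
assert values == [m_closed(n) for n in range(9)]
print(values[:5])   # [0, 0, 1, 9, 55]
```

The values m(2) = 1 and m(3) = 9 are those listed in Table I for N = 4 and N = 8.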
Adding to m(N) the multiplications due to the scale factor 1/2, we
compute $M_e(N)$, the total number of extra multiplications for
computing coefficients $c_i$ from coefficients $c'_i$, by
$$M_e(N) = \frac{N^2 + N - 2 N^{\log_2 3}}{2} + 2N - 1 = \frac{N^2 - 2 N^{\log_2 3} + 5N - 2}{2}.$$
The extra additions come from two sources. The first one is
related to the separation procedure. There are four additions per
product and at most one more addition for each term with status
≥ 2; see the comments immediately after Equation (21). Given n, the
total number of terms produced in the first step of Karatsuba's
algorithm is $3^n$. Denoting respectively by $S_0(n)$ and $S_1(n)$ the
number of terms with status 0 and 1 for such an n, we know that
$S_{\ge 2}(n)$, the number of terms with status ≥ 2, is given by
$$S_{\ge 2}(n) = 3^n - S_0(n) - S_1(n). \tag{27}$$
We note that $S_0(n) = 2^n$ and
$$S_1(n) = 2\,S_1(n-1) + S_0(n-1) = 2\,S_1(n-1) + 2^{n-1}.$$
Since the above equation is also a recursion, it may be solved
using the z-transform. The result is $S_1(n) = n\,2^{n-1}$. Hence,
Equation (27) may be written as $S_{\ge 2}(n) = 3^n - 2^n - n\,2^{n-1}$
and, consequently,
$$S_{\ge 2}(N) = N^{\log_2 3} - N - \frac{N}{2} \log_2 N = N^{\log_2 3} - N \left(1 + \frac{\log_2 N}{2}\right).$$
Thus, the number of extra additions related to the separation
procedure is at most
$$4 \cdot \frac{N^2 + N - 2 N^{\log_2 3}}{2} + N^{\log_2 3} - N \left(1 + \frac{\log_2 N}{2}\right) = 2N^2 - 3 N^{\log_2 3} + N \left(1 - \frac{\log_2 N}{2}\right). \tag{28}$$
The second source of extra additions is related to operations
needed to add terms $a_l \cdot b_l$, l = 1, . . . , N − 1, and $a_l \cdot b_{l+i} + a_{l+i} \cdot b_l$,
i = 1, . . . , N − 2, l = 1, . . . , N − 1 − i, in Equation (7), which
gives
$$N - 1 + \sum_{i=1}^{N-2} (N - 1 - i) = \frac{N(N - 1)}{2}. \tag{29}$$
Thus, by summing Equations (28) and (29), we compute $A_e(N)$,
the total number of extra additions for computing coefficients $c_i$
from coefficients $c'_i$. One obtains
$$A_e(N) \le 2N^2 - 3 N^{\log_2 3} + N \left(1 - \frac{\log_2 N}{2}\right) + \frac{N(N - 1)}{2} = \frac{5N^2 - 6 N^{\log_2 3} + N(1 - \log_2 N)}{2}.$$
1 Curiously, the recurrence relation in Equation (25) produces a sequence m(n), n =
0, 1, 2, . . ., which coincides with the number of monotone Boolean functions
of n variables with 2 mincuts. It also represents the number of Sperner
systems with 2 blocks and some other sequences archived by the “On-line
Encyclopedia of Integer Sequences” [13].
TABLE I
STATUS OF ALL TERMS IN KARATSUBA'S ALGORITHM UP TO N = 8. THE NUMBER OF MULTIPLICATIONS m(n) NECESSARY FOR SEPARATING TERMS
ORIGINALLY COMPUTED TOGETHER WITH OTHER TERMS IS ALSO PRESENTED.
N = 2^n    11                           01                           00                           m(n)
1          -                            0                            -                            0
2          0                            1                            0                            0
4          0, 1, 0                      1, 2, 1                      0, 1, 0                      1
8          0, 1, 0, 1, 2, 1, 0, 1, 0    1, 2, 1, 2, 4, 2, 1, 2, 1    0, 1, 0, 1, 2, 1, 0, 1, 0    9
           (contributes m(n)11)         (contributes m(n)01)         (contributes m(n)00)
C. Total Arithmetic Complexity
By using our Karatsuba-based algorithm, the total arithmetic
complexity for computing Chebyshev coefficients of the product
of two polynomials in Chebyshev form is given by the following
theorem.
Theorem 2: Let a(x) and b(x) be polynomials of degree N − 1
whose Chebyshev coefficients $a_i$ and $b_i$, i = 0, . . . , N − 1,
are given. By means of the proposed Karatsuba-based algorithm,
Chebyshev coefficients $c_i$, i = 0, . . . , 2N − 2, of the polynomial
c(x) = a(x) · b(x) are obtained with
$$M_k(N) = \frac{N^2 + 5N - 2}{2} \tag{30}$$
multiplications and
$$A_k(N) \le \frac{5N^2 + 6 N^{\log_2 3} - N(15 + \log_2 N) + 4}{2} \tag{31}$$
additions.
The proof is immediate. Equations (30) and (31) are obtained
by adding the number of operations necessary for computing
coefficients c′i , presented in Section III-A, to the number of extra
operations derived in the last subsection.
We remark that the standard application of Karatsuba's algorithm for multiplying polynomials involves $O(N^{\log_2 3})$ arithmetic
operations. Here, due to the extra operations, our method has a
higher cost of $O(N^2)$.
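The overall structure of the method in this section can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm analyzed in Theorems 1 and 2: it computes the coefficients c′_i with a textbook recursive Karatsuba routine and then applies Equation (7) by recomputing the extra terms directly, instead of reusing and separating the stored intermediate results. It therefore produces the same coefficients but does not attain the operation counts of Equations (30) and (31). All function names are hypothetical, and N is assumed to be a power of two.

```python
import math


def karatsuba(p, q):
    """Product of two coefficient lists of equal length N = 2^n (lowest degree first)."""
    n = len(p)
    if n == 1:
        return [p[0] * q[0]]
    h = n // 2
    p0, p1, q0, q1 = p[:h], p[h:], q[:h], q[h:]
    low = karatsuba(p0, q0)                                   # A0 B0
    high = karatsuba(p1, q1)                                  # A1 B1
    mid = karatsuba([x + y for x, y in zip(p0, p1)],
                    [x + y for x, y in zip(q0, q1)])
    mid = [m - l - u for m, l, u in zip(mid, low, high)]      # A0 B1 + A1 B0
    out = [0] * (2 * n - 1)
    for k in range(n - 1):
        out[k] += low[k]
        out[k + h] += mid[k]
        out[k + n] += high[k]
    return out


def cheb_mul_karatsuba(a, b):
    """Chebyshev coefficients of a(x) b(x): Karatsuba for c'_i, then Equation (7)."""
    n = len(a)
    cp = karatsuba(a, b)                                      # c'_i of Equation (6)
    c = []
    for i in range(2 * n - 1):
        if i == 0:
            extra = 2 * sum(a[l] * b[l] for l in range(1, n))
        elif i <= n - 2:
            extra = sum(a[l] * b[l + i] + a[l + i] * b[l] for l in range(1, n - i))
        else:
            extra = 0
        c.append((cp[i] + extra) / 2)
    return c


def cheb_eval(coef, x):
    """Evaluate coef[0]/2 + sum_{i>=1} coef[i] * T_i(x) for x in [-1, 1]."""
    theta = math.acos(x)
    return coef[0] / 2 + sum(v * math.cos(i * theta) for i, v in enumerate(coef) if i > 0)


a = [1.0, 2.0, -3.0, 0.5]
b = [0.5, -1.0, 4.0, 2.0]
c = cheb_mul_karatsuba(a, b)
for x in (-0.9, -0.3, 0.2, 0.7):
    assert abs(cheb_eval(c, x) - cheb_eval(a, x) * cheb_eval(b, x)) < 1e-9
```

Replacing the direct evaluation of the extra terms by the separation procedure of Section III-B, applied to the stored intermediate results, is what yields the reduced multiplication count of Theorem 2.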
IV. DISCUSSION AND CONCLUSIONS
From Equations (4), (5), (30) and (31), we construct Table II,
in which the total number of multiplications and additions for
multiplying polynomials in Chebyshev form by the direct (resp. $M_d$
and $A_d$) and our Karatsuba-based (resp. $M_k$ and $A_k$) methods
are shown. All the entries in Table II were checked by a
Matlab computer simulation. The program counted the number
of operations for both the direct and our Karatsuba-based methods.
Although both the direct and our Karatsuba-based methods involve
$O(N^2)$ multiplications, the division by 2 in Equation (30) makes
a considerable difference. By asymptotically evaluating the ratio
$M_k(N)/M_d(N)$, we conclude that half of the multiplications
required by the direct method are saved if we use the Karatsuba-based
algorithm. This tendency is observed in Table II.
As expected, since one of the principles of Karatsuba's algorithm is
to exchange multiplications for additions, $A_k(N)$ is larger than
$A_d(N)$. More precisely, the ratio $A_k(N)/A_d(N)$ approaches 5/3
as N increases. Thus, a coherent comparison between the direct
and the proposed methods strongly depends on the computational
cost of one multiplication in terms of additions. If we consider
that one multiplication costs r additions, the following analysis
can be done.
Let $N = 2^n$. The total computational cost $T_d(N)$ for multiplying
two polynomials of degree N − 1 in Chebyshev form by the direct
method is measured by
$$T_d(N) = r\,M_d(N) + A_d(N).$$
The total cost $T_k(N)$ using our Karatsuba-based method is
$$T_k(N) = r\,M_k(N) + A_k(N).$$
TABLE II
TOTAL NUMBER OF MULTIPLICATIONS AND ADDITIONS FOR MULTIPLYING
POLYNOMIALS IN CHEBYSHEV BASIS BY DIRECT (RESP. Md AND Ad)
AND KARATSUBA-BASED (RESP. Mk AND Ak) METHODS.
N = 2^n    Md       Mk      Ad       Ak
1          2        2       0        0
2          7        6       2        5
4          23       17      15       35
8          79       51      77       171
16         287      167     345      733
32         1087     591     1457     2971
64         4223     2207    5985     11757
128        16639    8511    24257    46115
TABLE III
TOTAL NUMBER OF MULTIPLICATIONS AND ADDITIONS FOR MULTIPLYING
POLYNOMIALS IN CHEBYSHEV BASIS BY DCT (RESP. MDCT AND
ADCT) AND KARATSUBA-BASED (RESP. Mk AND Ak) METHODS.

N = 2^n     MDCT       Mk     ADCT       Ak
      1        2        2       12        0
      2        7        6       30        5
      4       23       17       81       35
      8       67       51      216      171
     16      179      167      555      733
     32      451      591     1374     2971
     64     1091     2207     3297    11757
    128     2563     8511     7716    46115
A general picture of the ratio $T_d(N)/T_k(N)$ can be
obtained by computing
\[ \lim_{N \to \infty} \frac{T_d(N)}{T_k(N)} = \lim_{N \to \infty} \frac{r M_d(N) + A_d(N)}{r M_k(N) + A_k(N)}. \]
In order to find the range of r where the Karatsuba-based approach is
faster than the direct approach, we substitute the previously derived
formulas in the above equation and obtain
\[ \frac{2r + 3}{r + 5} > 1, \]
whose solution is
\[ r > 2. \]
Hence, asymptotically, the Karatsuba-based approach is cheaper than the direct
approach if one multiplication costs more than two additions.
In most applications, one multiplication is significantly more
expensive than two additions [10].
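The bound r > 2 is a limit statement. For a finite N, the same cost model gives a break-even value of r directly from the Table II entries: $r M_k + A_k < r M_d + A_d$ holds exactly when $r > (A_k - A_d)/(M_d - M_k)$, provided $M_d > M_k$. The sketch below evaluates this threshold for the tabulated sizes. The names `TABLE_II` and `break_even_r` are illustrative, and N = 1 is omitted because both methods use the same number of multiplications there.

```python
from fractions import Fraction

# N: (Md, Mk, Ad, Ak), copied from Table II. N = 1 is left out because
# Md = Mk there, so no break-even value of r exists.
TABLE_II = {
    2: (7, 6, 2, 5),
    4: (23, 17, 15, 35),
    8: (79, 51, 77, 171),
    16: (287, 167, 345, 733),
    32: (1087, 591, 1457, 2971),
    64: (4223, 2207, 5985, 11757),
    128: (16639, 8511, 24257, 46115),
}


def break_even_r(md, mk, ad, ak):
    """Value of r above which r*Mk + Ak < r*Md + Ad (requires Md > Mk)."""
    return Fraction(ak - ad, md - mk)


for n, counts in TABLE_II.items():
    r_star = break_even_r(*counts)
    print(f"N = {n:3d}   Karatsuba-based method cheaper for r > {float(r_star):.3f}")
```

For every tabulated size the threshold lies above the asymptotic value 2 (about 2.69 at N = 128), so for small and intermediate N a multiplication must cost somewhat more than two additions before the Karatsuba-based method pays off.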
Another alternative for performing the operation discussed in
this paper is to expand the polynomials in Chebyshev form and
rewrite them in monomial form. Then, the product is computed
by applying the standard Karatsuba’s algorithm. As a final step,
the obtained polynomial is written back in Chebyshev form. In
this case, the extra operations for converting polynomials from
Chebyshev form to monomial form and vice-versa not only increase
the arithmetic complexity involved but also induce
precision restrictions.
It is also pertinent to compare our approach with that proposed
in [8], where the polynomial multiplication in Chebyshev form is
computed in the discrete cosine transform (DCT) domain. In this
case, the product of two polynomials of degree $N - 1$ is carried
out by computing 2N-DCTs. Although the authors of [8] only
discuss asymptotic aspects of the arithmetic complexity involved
in this method, it is possible to use general formulas and obtain a
more precise number of multiplications and additions required by
the DCT method. They are respectively denoted by $M_{DCT}(N)$
and $A_{DCT}(N)$ and are given in [14] as
\[ M_{DCT}(N) = 3N \log_2(2N) - 4N + 3 \]
and
\[ A_{DCT}(N) = (9N + 3) \log_2(2N) - 12N + 12. \]
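As a quick consistency check, the two DCT operation counts can be evaluated for $N = 2^n$ and compared with the MDCT and ADCT entries of Table III. This is a minimal sketch with illustrative function names. The linear term of the addition count is taken here as $-12N$, the value that reproduces all eight ADCT entries of Table III; treat it as an assumption of this sketch.

```python
def log2_exact(m):
    """log2 of a power of two, computed with integer arithmetic."""
    if m <= 0 or m & (m - 1) != 0:
        raise ValueError("m must be a positive power of two")
    return m.bit_length() - 1


def m_dct(n):
    """Multiplications of the DCT-based method for N = n (a power of two)."""
    return 3 * n * log2_exact(2 * n) - 4 * n + 3


def a_dct(n):
    """Additions of the DCT-based method; the -12*n term is assumed
    because it matches the tabulated ADCT values."""
    return (9 * n + 3) * log2_exact(2 * n) - 12 * n + 12


for k in range(8):
    n = 2 ** k
    print(f"N = {n:3d}   M_DCT = {m_dct(n):5d}   A_DCT = {a_dct(n):5d}")
```

Both counts grow as $O(N \log N)$, which is why the DCT method eventually overtakes any method with a quadratic operation count.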
By observing Table III, which compares the DCT and our Karatsuba-based
methods, we note that the former uses fewer arithmetic
operations for N ≥ 32. For N = 16, a fair comparison
depends on the cost r of one multiplication in terms of additions.
Since a DCT implementation requires multiplications by cosine
values, precision restrictions must also be considered. On the
other hand, in the Karatsuba-based method, besides products among
coefficients $a_i$ and $b_i$, only products by 1/2 are necessary, which
makes this aspect less critical. Hence, for N < 16, which covers
several practical applications of Chebyshev expansions, the Karatsuba-based
method should be used. For instance, in [2], [3] and [4],
Chebyshev expansions with 4 ≤ N ≤ 6, 5 ≤ N ≤ 13 and
3 ≤ N ≤ 5 are used, respectively. For larger N, if precision
is not a problem, the DCT method should be used.
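The three regimes described above can be read off Table III mechanically. The sketch below uses only the tabulated counts and the cost model of one multiplication costing r additions; the names `TABLE_III` and `verdict` are illustrative. It reports which method needs no more operations of either kind, or, when the counts disagree, the value of r above which the Karatsuba-based method is cheaper.

```python
from fractions import Fraction

# N: (M_DCT, M_k, A_DCT, A_k), copied from Table III.
TABLE_III = {
    1: (2, 2, 12, 0),
    2: (7, 6, 30, 5),
    4: (23, 17, 81, 35),
    8: (67, 51, 216, 171),
    16: (179, 167, 555, 733),
    32: (451, 591, 1374, 2971),
    64: (1091, 2207, 3297, 11757),
    128: (2563, 8511, 7716, 46115),
}


def verdict(m_dct, m_k, a_dct, a_k):
    """Return 'karatsuba' or 'dct' when one method needs no more operations
    of either kind; otherwise return the break-even r as a Fraction."""
    if m_k <= m_dct and a_k <= a_dct:
        return "karatsuba"
    if m_dct <= m_k and a_dct <= a_k:
        return "dct"
    if m_k < m_dct:
        # Fewer multiplications but more additions: the Karatsuba-based
        # method is cheaper once r exceeds this threshold.
        return Fraction(a_k - a_dct, m_dct - m_k)
    raise ValueError("case not present in Table III")


for n, counts in TABLE_III.items():
    print(f"N = {n:3d}   {verdict(*counts)}")
```

With the tabulated values, the only mixed case is N = 16, where the Karatsuba-based method is cheaper only if one multiplication costs more than 89/6 (about 14.8) additions.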
We remark that the space required by our algorithm is slightly
larger than that required by the other algorithms. However, our method
should be employed for intermediate sizes, where this larger
memory requirement is not a problem.
Although this paper is not focused on hardware implementations of the proposed method, there is a relevant remark concerning this aspect. Except for some multiplications by 1/2, all extra
operations needed for computing coefficients $c_i$ from coefficients
$c'_i$ can be implemented in parallel with the standard Karatsuba’s algorithm. Exploiting this parallelism, our method can be considerably sped up.
ACKNOWLEDGMENT
Juliano B. Lima performed this work while at the School of
Mathematics and Statistics, Carleton University. He was supported by Coordenação de Aperfeiçoamento de Pessoal de Nível
Superior – CAPES – under Grant 0599-07-7. Both Daniel Panario
and Qiang Wang are supported in part by NSERC of Canada.
REFERENCES
[1] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal
Processing, Prentice-Hall, Englewood Cliffs, NJ, 2nd edition, 1999.
[2] I. Sarkas, D. Mavridis, M. Papamichail, and G. Papadopoulos, “Volterra
analysis using Chebyshev series,” in Proc. IEEE Int. Symposium on
Circuits and Systems (ISCAS’2007), May 2007, pp. 1931–1934.
[3] P. J. Chiang, C. P. Yu, and H. C. Chang, “Robust calculation of chromatic
dispersion coefficients of optical fibers from numerically determined
effective indices using Chebyshev-Lagrange interpolation polynomials,”
Journal of Lightwave Technology, vol. 24, no. 11, pp. 4411–4416, Nov.
2006.
[4] A. Ashrafi, R. Adhami, L. Joiner, and P. Kaveh, “Arbitrary waveform
DDFS utilizing Chebyshev polynomials interpolation,” IEEE Transactions on Circuits and Systems–I: Regular Papers, vol. 51, no. 8, pp.
1468–1475, Aug. 2004.
[5] G. Cuypers, G. Ysebaert, M. Moonen, and F. Pisoni, “Chebyshev
interpolation for DMT modems,” in Proc. IEEE Int. Conference on
Communications (ICC’2004), June 2004, pp. 2736–2740.
[6] J. C. Mason and D. C. Handscomb, Chebyshev Polynomials, Chapman
& Hall/CRC, Boca Raton, FL, 1st edition, 2003.
[7] G. H. Rawitscher and I. Koltracht, “An efficient numerical spectral
method for solving the Schrödinger equation,” Computing in Science &
Engineering, vol. 7, no. 6, pp. 58–66, Nov.-Dec. 2005.
[8] G. Baszenski and M. Tasche, “Fast polynomial multiplication and
convolutions related to the discrete cosine transform,” Linear Algebra
Appl., vol. 252, no. 1-3, pp. 1–25, Feb. 1997.
[9] A. Karatsuba and Y. Ofman, “Multiplication of many-digital numbers
by automatic computers,” Doklady Akad. Nauk SSSR, vol. 145, pp. 293–
294, 1962. Translation in Physics-Doklady, no. 7, pp. 595–596, 1963.
[10] J. von zur Gathen and J. Gerhard, Modern Computer Algebra, Cambridge University Press, Cambridge, United Kingdom, 2nd edition, 2003.
[11] P. L. Montgomery, “Five, six, and seven-term Karatsuba-like formulae,”
IEEE Transactions on Computers, vol. 54, no. 3, pp. 362–369, Mar.
2005.
[12] C. Paar, “A new architecture for a parallel finite field multiplier with
low complexity based on composite fields,” IEEE Transactions on
Computers, vol. 45, no. 7, pp. 856–861, July 1996.
[13] N. J. A. Sloane, “The on-line encyclopedia of integer sequences,”
http://www.research.att.com/~njas/sequences/A016269.
[14] S. C. Chan and K. L. Ho, “Direct method for computing sinusoidal
transforms,” IEEE Proceedings, vol. 137, no. 6, pp. 433–442, Dec.
1990.