Log-sum multiplier - IEEE Computer Society

Log-sum multiplier
by J. P. AGRAWAL and V. U. REDDY
Indian Institute of Technology
Madras, India
ABSTRACT
A scheme for implementing a fast parallel multiplier based on a log-sum
method is presented. The scheme for an n-bit by n-bit multiplication uses
Q 4-bit Carry-Look-Ahead (CLA) adders, where Q is the least integer
greater than or equal to \((n^2-n)/4\). The worst case time for multiplying
two sign + 12-bit numbers is estimated to be 180 ns when TTL 7400 series
I.C.s are used. The scheme can easily be modified to use M-bit (M>4)
CLA adders if and when available.
INTRODUCTION
Multiplication of an n-bit number A \((a_n a_{n-1} \ldots a_1)\) by another
n-bit number B \((b_n b_{n-1} \ldots b_1)\) involves reduction of \(n^2\)
summands \(a_i \cdot b_j\), i, j = 1, 2, ... n, arranged in (2n-1) columns
and n rows, into one 2n-bit row \((P_{2n} P_{2n-1} \ldots P_1)\), as shown
in Figure 1. The summands are generated in parallel using \(n^2\) AND-gates.
A number of schemes have been given in the past [1-4] for implementing
fast parallel multipliers, which differ basically in the procedure used
for the reduction of summands. Dadda's [1] and Wallace's [2] schemes are
equally fast and take approximately \(\log_{1.5} n\) 1-bit adding-stages
(successive additions) to reduce the summands to two G-bit numbers (G being
the least integer \(\geq 2n-2-\log_{1.5} n\)), which may be added using a
G-bit Carry-Look-Ahead (CLA) adder or a ripple adder. Wallace's scheme,
however, requires a larger number of adders. The carry-save scheme [3]
takes (n-1) 1-bit adding-stages to reduce the summands to two (n-1)-bit
numbers. Both Dadda's and the carry-save schemes require \((n^2-2n)\) 1-bit
full adders (FA) and n 1-bit half adders (HA).
Habibi and Wintz [4] have shown that the carry-save scheme is slower than
Dadda's but cheaper since it makes greater use of 4-bit FAs. However, in
both the schemes the 4-bit FAs are connected in such a way that the fast
carry generation property of the 4-bit FA I.C. chip is not utilized.
The purpose of this paper is to present a log-sum configuration for the
reduction of summands, in which the numbers of 1-bit FAs and 1-bit HAs
required are the same as in Dadda's and the carry-save schemes. But these
adders can be replaced fully by Q 4-bit FAs, taking full advantage of the
fast carry generation. Q is the smallest integer greater than or equal to
\((n^2-n)/4\). Hence, the log-sum scheme is more economical than any other
parallel multiplier and at the same time yields a high speed of operation.
Another feature of the scheme is that it can easily be modified to adopt
longer FAs if and when I.C. technology makes them available.
In the second section, we give a general scheme for reduction of summands
using 1-bit FAs and HAs and then modify it for 4-bit CLA adders. In the
third section, we present cost and speed analysis for the log-sum
multiplier and compare it in the fourth section with other schemes. In the
fifth section, we discuss how the scheme can be extended for M-bit FAs,
and to include 2's complement multiplications.
THE LOG-SUM MULTIPLIER
In the log-sum method [5] the addition of n rows is performed in
\(\log_2 n\) adding-stages where each adding-stage reduces the number of
rows by a factor of 2. Therefore, the adder-rows (rows of adders, each
adding two rows) required for the first, second, ... \((\log_2 n)\)th
adding-stages are \(n/2\), \(n/4\), ... 1, respectively. This method of
addition is used to yield economic and fast multiplication.
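The row-halving idea described above can be sketched behaviorally in Python. This is only an illustration under stated assumptions: rows are modeled as plain integers rather than as strings of 1-bit adders, and the function names (`partial_product_rows`, `log_sum_reduce`, `log_sum_multiply`) are hypothetical, not taken from the paper.

```python
def partial_product_rows(a, b, n):
    """Row i holds the summands a_j AND b_i, shifted left by i bit positions."""
    return [(a if (b >> i) & 1 else 0) << i for i in range(n)]


def log_sum_reduce(rows):
    """Add rows pairwise until one row remains.

    Returns (final_sum, number_of_adding_stages). When a stage has an odd
    number of rows, the unpaired row is passed on unchanged to the next stage.
    """
    stages = 0
    while len(rows) > 1:
        reduced = [rows[i] + rows[i + 1] for i in range(0, len(rows) - 1, 2)]
        if len(rows) % 2:
            reduced.append(rows[-1])
        rows = reduced
        stages += 1
    return rows[0], stages


def log_sum_multiply(a, b, n):
    """Multiply two n-bit unsigned numbers by log-sum reduction of the rows."""
    return log_sum_reduce(partial_product_rows(a, b, n))
```

For 7 rows the loop runs 7 -> 4 -> 2 -> 1, that is, three adding-stages; for 12 rows it takes four. The sketch says nothing about adder widths or carry paths, which are the subject of the sections that follow.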
Figure 1-Multiplication of two n-bit numbers A \((a_n \ldots a_1)\) and B \((b_n \ldots b_1)\): row i (i = 1, 2, ... n) holds the summands \(a_n b_i \ldots a_1 b_i\), and the columns are added to give the product bits \(P_{2n} \ldots P_1\)
From the collection of the Computer History Museum (www.computerhistory.org)
National Computer Conference, 1976
Preliminary scheme
The n rows, shown in Figure 1, are grouped into \((n+k)/4\) \((=L)\) sets
(k = 0, 1, 2 or 3 such that L is an integer), the last set consisting of
(4-k) rows and the others 4 rows each. The 4 rows in a set are reduced to
two rows by first stage addition and then reduced to yield one (n+4)-bit
row by second stage addition, as shown in Figure 2 for n=7. As illustrated
in the Figure, (n-1)-bit and (n+2)-bit adder-rows are required for first
stage and second stage additions, respectively. Also, it is seen that in
this configuration (i) the outputs of two successive sets always overlap
in n-bit locations and (ii) each mth set has a half adder in the
(n+4m-3)th column. Therefore, outputs of the mth and (m+1)th sets are
added using one n-bit adder-row, feeding the overflow bit from this to the
corresponding half adder in the (m+1)th set. This corresponds to the third
stage addition. In the same manner, only n-bit adder-rows are required for
each of the 4th, 5th, ... and \((\log_2 n)\)th adding-stages. The total
number of 1-bit full/half adders required in this scheme may, therefore,
be written as,
\[
N = 2(n-1)\frac{n}{4} + (n+2)\frac{n}{4} + n\left(\frac{n}{8}+\frac{n}{16}+\cdots+\frac{n}{2^{j+2}}\right)
  = \frac{3n^2}{4} + \frac{n^2}{4}\left(1-2^{-j}\right) = n^2-n, \quad \text{where } j=\log_2(n/4);
\]
which is the same as required by Dadda's scheme.
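The adder count can be checked stage by stage with a short Python sketch. It assumes n is a power of 2 with n >= 4, so that k = 0 and every adding-stage halves evenly; the function name `adder_count` is hypothetical.

```python
def adder_count(n):
    """Count 1-bit full/half adders in the preliminary scheme.

    Assumes n is a power of 2 and n >= 4, so the n rows split into n/4
    sets of 4 rows each.
    """
    sets = n // 4
    # First stage: two (n-1)-bit adder-rows per set (4 rows -> 2 rows).
    first = sets * 2 * (n - 1)
    # Second stage: one (n+2)-bit adder-row per set (2 rows -> 1 row).
    second = sets * (n + 2)
    # Third and later stages: n-bit adder-rows, n/8, n/16, ..., 1 of them.
    later = 0
    adder_rows = sets // 2
    while adder_rows >= 1:
        later += adder_rows * n
        adder_rows //= 2
    return first + second + later
```

For n = 8 this gives 28 + 20 + 8 = 56, which equals 8*8 - 8, in agreement with the closed form above.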
Figure 2-General scheme for 7-bit x 7-bit log-sum multiplier (x = \(a_i \cdot b_j\), i = 1, 2, ... 7, j = 1, 2, ... 7; • = final product bits, i.e., \(P_{14}, P_{13}, \ldots P_1\); H = half adder; first-stage, second-stage and third-stage adders are marked separately, and the rows are grouped into sets)
Final scheme
A fast and economic multiplier is realized by rearranging a few adders in
the above described scheme as follows.
(i) All the adders in the final propagation path, which includes the final
adding-stage, are aligned in one row.
(ii) After (i) has been completed, some adders are moved from one adder-row
to the other and rearranged such that the number of adders in each row
(except in one row if N/4 is not an integer) is a multiple of 4. An adder
that is moved from one adder-row is rearranged in another adder-row
corresponding, as far as possible, to the equivalent or higher
adding-stage. This is illustrated in Figure 2 for n=7, where five adders
marked 1 to 5 are rearranged to yield the final scheme.
(iii) The strings (or rows) of 1-bit full/half adders in (ii) are replaced
by the strings of 4-bit adders.
(iv) In general, the least significant bit (lsb) adder in each adder-row is
a half-adder. Such an HA in a particular adding-stage is interchanged with
a corresponding FA in a lower adding-stage if this reduces the
carry-dependence in that adder-row. This is illustrated in Figure 4, which
gives the final scheme for a 12-bit x 12-bit multiplier.
If the above four steps are carried out in the order listed above,
\(\log_2 n\) successive additions (i.e., one 4-bit addition in each
adding-stage) would yield approximately \(\frac{n}{2}+1\) lsb's
(\(a_1 \cdot b_1\) being the first lsb) of the final product, as seen from
Figures 3 and 4. This is so because the two inputs to the final
adding-stage are the sum of the first \(\frac{n}{2}\) rows and the sum of
the last \(\frac{n}{2}\) rows, which are displaced by \(\frac{n}{2}\) bits.
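The displacement argument in the last paragraph can be illustrated numerically. The sketch below assumes n is even and models the two inputs of the final adding-stage as integers; `split_sums` and `low_bits_are_final` are hypothetical names. It only demonstrates that the lowest n/2 product bits are fixed before the final addition; the next bit needs no carry-in, which is consistent with the "approximately n/2 + 1" figure in the text.

```python
def split_sums(a, b, n):
    """Return the two inputs of the final adding-stage and the displacement.

    low_sum is the sum of the first n/2 rows; high_sum is the sum of the
    last n/2 rows before it is displaced by n/2 bit positions.
    """
    half = n // 2
    mask = (1 << half) - 1
    low_sum = a * (b & mask)
    high_sum = a * (b >> half)
    return low_sum, high_sum, half


def low_bits_are_final(a, b, n):
    """Check that the lowest n/2 product bits come from low_sum alone."""
    low_sum, high_sum, half = split_sums(a, b, n)
    product = low_sum + (high_sum << half)
    mask = (1 << half) - 1
    return product == a * b and (product & mask) == (low_sum & mask)
```

Because the displaced sum contributes zeros in the lowest n/2 columns, no carry from the final adder-row can reach those positions.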
COST AND SPEED ANALYSIS
In this section we give expressions for the cost and
the multiplication time of the log-sum multiplier. Costs
and speeds of the 7-bit x 7-bit and 12-bit x 12-bit
multipliers are computed from these expressions assuming Texas Instruments (TI) TTL 7400 series I.C.
chips. Prices have been taken from the 1974 "Cramer
Electronics Catalogue" and delays from the 1974 Texas
Instruments "The TTL Data Book for Design Engineers".
Cost estimation
The log-sum multiplier requires \(n^2\) AND-gates for generation of
summands and Q 4-bit FAs for reduction of summands. Hence, the cost of the
log-sum multiplier may be written as
\[
C = c_1 \cdot R + c_2 \cdot Q
\]
where \(c_1\) and \(c_2\) are the prices of a quad 2-input AND-gate chip
and a 4-bit FA chip, respectively. R is the least integer greater than or
equal to \(n^2/4\) and Q is the least integer greater than or equal to
\((n^2-n)/4\).
The 7-bit x 7-bit log-sum multiplier requires 49 AND-gates and, hence, 13
quad 2-input AND-gate chips (SN 7408), and 11 4-bit FAs (SN 7483A). The
12-bit x 12-bit multiplier requires 36 quad 2-input AND-gate chips and 33
4-bit FAs. The corresponding costs are about $40 and $115, respectively.
Figure 3-Final scheme for 7-bit x 7-bit log-sum multiplier (symbols as given in Figure 2; each marked group of 1-bit adders is to be replaced by a 4-bit full adder)
Speed estimation
As indicated earlier, \(\frac{n}{2}+1\) lsb's of the final product are
obtained in \(\log_2 n\) 4-bit successive additions. The time required to
obtain the remaining product bits is the carry-propagation delay from the
\((\frac{n}{2}+1)\)th bit to the (2n-1)th bit. Therefore, the
multiplication time required by the log-sum multiplier is given as
\[
T_M = t_g + t_{ad} \cdot \log_2 n + t_c \cdot \left(2n-2-\frac{n}{2}\right)/4
    = t_g + t_{ad} \cdot \log_2 n + (3n-4) \cdot t_c/8
\]
where \(t_g\) is the time required for the generation of summands (i.e.,
propagation delay of one 2-input AND-gate), \(t_{ad}\) is the addition
time for a 4-bit FA and \(t_c\) is the carry-propagation delay of the
4-bit FA. Maximum values of \(t_g\), \(t_{ad}\) and \(t_c\), as given in
the TI 1974 catalogue, are 27 ns, 24 ns and 16 ns, respectively. Hence, the
worst case multiplication times for n=7 and n=12 are 130 ns and 180 ns,
respectively.
A 7-bit x 7-bit multiplier was built on a printed circuit board, using the
scheme given in Figure 3, for use in a special purpose signal processor [6].
The multiplication time required for each possible combination of inputs
was measured. The maximum time was found to be less than 110 ns, which is
well below the estimated worst case value of \(T_M\).
Figure 4-Final scheme for 12-bit x 12-bit log-sum multiplier (fourth-stage adders are marked; other symbols as given in Figure 2)
COMPARISON WITH OTHER SCHEMES
Cost comparison
The log-sum multiplication scheme is the most economical of all the array
multiplier schemes, since it employs the minimum number of 4-bit FAs. Each
of the other schemes requires 1-bit and/or 2-bit FAs in addition to 4-bit
FAs or can be implemented using a larger number of 4-bit FAs. The adders
required by the carry-save scheme, which is more economic than any other
past scheme, are compared with those required by the log-sum scheme for
various values of k (defined in an earlier section) in Table I.
TABLE I-Number of adders required (4-bit FAs shown; the 2-bit FA and 1-bit FA/HA entries are not legible in this copy)
k    Scheme        4-bit FAs
0    Log-sum       Q
     Carry-save    Q - 3n/4
1    Log-sum       Q - 1
     Carry-save    Q - (2n+2)/4
2    Log-sum       ...
     Carry-save    Q - (n+2)/4
3    Log-sum       Q
     Carry-save    Q
As seen from this table, the cost of the log-sum multiplier is at most
equal to that of the carry-save multiplier.
Speed comparison
Dadda's scheme requires roughly \(\log_{1.5} n = 1.71 \log_2 n\) adding
stages. The multiplication time required by Dadda's scheme, therefore, is
approximately given by
\[
T_{MD} = t_g + t_{ad} \cdot 1.71 \log_2 n + \frac{2n-2-1.71\log_2 n}{4} \cdot t_c
\]
assuming 4-bit FAs throughout (including the final addition). It is clear
from the expressions for \(T_M\) and \(T_{MD}\) that for all n,
\(T_{MD} \geq T_M\). The worst case values of \(T_M\) and \(T_{MD}\) for
n=7, 12 and 32 are given in Table II. If a G-bit CLA adder (G being the
smallest integer \(\geq 2n-2-\log_{1.5} n\)), built from Schottky TTL I.C.
chips, is employed to add the final two numbers in Dadda's scheme,
\(T_{MD}\) reduces significantly. However, (i) use of Schottky TTL I.C.s
increases the multiplier cost enormously, and (ii) \(T_{MD}\) reduces to
less than or equal to \(T_M\) only for large values of n (>32).
Pezaris [7] has given a 40-ns 17-bit x 17-bit array multiplier, using
basically the carry-save scheme. However, he has used ECL logic and,
hence, the cost and speed of his multiplier cannot be compared with those
of the log-sum multiplier using TTL logic, presented in this paper.
DISCUSSION
The configuration presented in this paper yields a fast multiplier using
the optimum number of 4-bit CLA adders. The scheme can be adapted to M-bit
CLA adders by making the number of adders in each row a multiple of M
instead of a multiple of 4 in step (ii) of the second section of this
paper.
The scheme has been presented for the multiplication of two sign + n-bit
numbers. The sign of the product is generated separately. Baugh and
Wooley [8] have given an algorithm for a 2's complement array multiplier,
in which the multiplication process is the same as given in Figure 1
except that the number of rows increases from n to n+2. Therefore, the 2's
complement multiplication can easily be implemented by the log-sum scheme.
A 4-bit x 4-bit multiplier from TI (requiring two chips, SN 74284 and
SN 74285) costs approximately $26, and the worst case and typical values
of multiplication time are 60 ns and 40 ns, respectively. The same
multiplier based on the log-sum scheme costs approximately $11, while the
worst case and typical values of multiplication time are approximately the
same as above.
Finally, the log-sum scheme fully exploits both the cost effectiveness and
speed effectiveness of the 4-bit FAs and, therefore, is well suited for
multipliers in special purpose hardware requiring high speeds of
operation.
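The cost and timing expressions from the cost and speed analysis can be evaluated with a small Python sketch. The delay constants are the maximum values quoted in the paper (27 ns, 24 ns, 16 ns); the chip prices are not itemized in the text, so `cost` takes them as caller-supplied parameters. All function names are hypothetical.

```python
import math

# Maximum delays quoted in the text, in nanoseconds.
T_G, T_AD, T_C = 27.0, 24.0, 16.0


def chip_counts(n):
    """Return (R, Q): quad AND-gate chips and 4-bit FA chips for n x n bits."""
    r = -(-(n * n) // 4)        # least integer >= n^2 / 4
    q = -(-(n * n - n) // 4)    # least integer >= (n^2 - n) / 4
    return r, q


def cost(n, c1, c2):
    """C = c1*R + c2*Q; c1 and c2 are prices supplied by the caller."""
    r, q = chip_counts(n)
    return c1 * r + c2 * q


def t_log_sum(n):
    """T_M = t_g + t_ad*log2(n) + (3n - 4)*t_c/8, in ns."""
    return T_G + T_AD * math.log2(n) + (3 * n - 4) * T_C / 8


def t_dadda(n):
    """T_MD = t_g + t_ad*1.71*log2(n) + (2n - 2 - 1.71*log2(n))*t_c/4, in ns."""
    stages = 1.71 * math.log2(n)
    return T_G + T_AD * stages + (2 * n - 2 - stages) * T_C / 4
```

With these constants `chip_counts` gives (13, 11) for n = 7, (36, 33) for n = 12 and (4, 3) for n = 4. The unrounded log-sum times come out near 128, 177 and 331 ns for n = 7, 12 and 32, close to the rounded figures given in Table II; the Dadda expression gives roughly 171, 238 and 446 ns. How the paper rounded its table entries is not stated, so small differences should be expected.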
REFERENCES
1. Dadda, L., "Some schemes for parallel multipliers," Alta
Frequenza, Vol. 34, pp. 349-356, March 1965.
2. Wallace, C. S., "A suggestion for a fast multiplier," IEEE
Trans. Electronic Computers, Vol. EC-13, pp. 14-17, Feb.
1964.
3. Braun, E. L., Digital Computer Design. New York, Academic
Press, 1963.
4. Habibi, A. and P. A. Wintz, "Fast multipliers," IEEE Trans.
Computers (Short notes), Vol. C-19, pp. 153-157, Feb. 1970.
5. Kuck, D. J., "Illiac IV software and application programming," IEEE Trans. Computers, Vol. 17, pp. 758-770, August
1968.
6. Agrawal, J. P., V. U. Reddy and H. Renganathan, Design
and Test Evaluation of Arithmetic Unit for FFT Processor,
Technical Report, CSD/TR/A12-2/19, Sept. 1974, IIT,
Madras.
7. Pezaris, S. D., "A 40-ns 17-bit by 17-bit array multiplier,"
IEEE Trans. Computers (Short Notes), C-20, pp. 442-447,
April 1971.
8. Baugh, C. R. and B. A. Wooley, "A two's complement parallel
array multiplication algorithm," IEEE Trans. Computers,
Vol. C-22, pp. 1045-1047, December 1973.
TABLE II
n     Log-sum scheme T_M (ns)    Dadda's scheme T_MD (ns)
7     130                        170
12    180                        230
32    330                        450