Log-sum multiplier

by J. P. AGRAWAL and V. U. REDDY
Indian Institute of Technology
Madras, India

ABSTRACT

A scheme for implementing a fast parallel multiplier based on a log-sum method is presented. The scheme for an n-bit by n-bit multiplication uses Q 4-bit Carry-Look-Ahead (CLA) adders, where Q is the least integer greater than or equal to \((n^2-n)/4\). The worst case time for multiplying two sign + 12-bit numbers is estimated to be 180 ns when TTL 7400 series I.C.s are used. The scheme can easily be modified to use M-bit (M>4) CLA adders if and when available.

INTRODUCTION

Multiplication of an n-bit number \(A(a_n a_{n-1} \ldots a_1)\) by another n-bit number \(B(b_n b_{n-1} \ldots b_1)\) involves reduction of \(n^2\) summands \(a_i \cdot b_j\), \(i, j = 1, 2, \ldots, n\), arranged in \((2n-1)\) columns and n rows, into one 2n-bit row \((P_{2n} P_{2n-1} \ldots P_1)\), as shown in Figure 1. The summands are generated in parallel using \(n^2\) AND-gates. A number of schemes have been given in the past¹⁻⁴ for implementing fast parallel multipliers; they differ basically in the procedure used for the reduction of summands.

Dadda's¹ and Wallace's² schemes are equally fast and take approximately \(\log_{1.5} n\) 1-bit adding-stages (successive additions) to reduce the summands to two G-bit numbers (G being the least integer \(\geq 2n-2-\log_{1.5} n\)), which may be added using a G-bit Carry-Look-Ahead (CLA) adder or a ripple adder. Wallace's scheme, however, requires a larger number of adders. The carry-save scheme¹ takes (n-1) 1-bit adding-stages to reduce the summands to two (n-1)-bit numbers. Both Dadda's and the carry-save schemes require \((n^2-2n)\) 1-bit full adders (FA) and n 1-bit half adders (HA). Habibi and Wintz⁴ have shown that the carry-save scheme is slower than Dadda's but cheaper since it makes greater use of 4-bit FAs. However, in both schemes the 4-bit FAs are connected in such a way that the fast carry generation property of the 4-bit FA I.C. chip is not utilized.

The purpose of this paper is to present a log-sum configuration for the reduction of summands, in which the numbers of 1-bit FAs and 1-bit HAs required are the same as in Dadda's and the carry-save schemes. These adders, however, can be replaced fully by Q 4-bit FAs, taking full advantage of the fast carry generation. Q is the smallest integer greater than or equal to \((n^2-n)/4\). Hence, the log-sum scheme is more economical than any other parallel multiplier and at the same time yields a high speed of operation. Another feature of the scheme is that it can easily be modified to adopt longer FAs if and when I.C. technology makes them available.

In the second section, we give a general scheme for reduction of summands using 1-bit FAs and HAs and then modify it for 4-bit CLA adders. In the third section, we present cost and speed analysis for the log-sum multiplier and compare it in the fourth section with other schemes. In the fifth section, we discuss how the scheme can be extended for M-bit FAs, and to include 2's complement multiplications.

THE LOG-SUM MULTIPLIER

In the log-sum method⁵ the addition of n rows is performed in \(\log_2 n\) adding-stages, where each adding-stage reduces the number of rows by a factor of 2. Therefore, the adder-rows (an adder-row being a row of adders that adds two rows) required for the first, second, ..., \((\log_2 n)\)th adding-stages are \(n/2, n/4, \ldots, 1\), respectively. This method of addition is used to yield economic and fast multiplication.

Preliminary scheme

The n rows, shown in Figure 1, are grouped into L = (n+k)/4 sets (k = 0, 1, 2 or 3 such that L is an integer), the last set consisting of (4-k) rows and the others 4 rows each. The 4 rows in a set are reduced to two rows by first stage addition and then reduced to yield one (n+4)-bit row by second stage addition, as shown in Figure 2 for n=7. As illustrated in the Figure, (n-1)-bit and (n+2)-bit adder-rows are required for first stage and second stage additions, respectively.
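As an illustration of the arithmetic only (not of the adder wiring or bit alignment), the row grouping and the pairwise log-sum reduction described above can be sketched in Python. The function names `partial_product_rows`, `log_sum_reduce` and `set_grouping` are hypothetical, plain integers stand in for the adder-rows, and an odd row left over in a stage is assumed to pass through unchanged, so the stage count is the ceiling of log2 n when n is not a power of 2.

```python
def partial_product_rows(a: int, b: int, n: int) -> list[int]:
    """Row j is A AND-ed with bit j of B, shifted left by j places."""
    mask = (1 << n) - 1
    a &= mask
    return [(a << j) if (b >> j) & 1 else 0 for j in range(n)]


def log_sum_reduce(rows: list[int]) -> tuple[int, int]:
    """Add rows pairwise until one remains; return (sum, adding-stages used)."""
    stages = 0
    while len(rows) > 1:
        # Each adding-stage halves the number of rows.
        paired = [rows[i] + rows[i + 1] for i in range(0, len(rows) - 1, 2)]
        if len(rows) % 2:
            paired.append(rows[-1])  # assumption: odd row passes through
        rows = paired
        stages += 1
    return rows[0], stages


def set_grouping(n: int) -> tuple[int, int]:
    """Return (k, L) with k in {0, 1, 2, 3} such that L = (n + k) / 4 is an integer."""
    k = (-n) % 4
    return k, (n + k) // 4
```

For n = 7 this gives k = 1 and L = 2, that is, one set of 4 rows and one set of 3 rows, and the seven summand rows collapse to the product in three adding-stages.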
Also, it is seen that in this configuration (i) the outputs of two successive sets always overlap in n-bit locations and (ii) each mth set has a half adder in the (n+4m-3)th column. Therefore, outputs of the mth and (m+1)th sets are added using one n-bit adder-row, feeding the overflow bit from this to the corresponding half adder in the (m+1)th set. This corresponds to the third stage addition. In the same manner, only n-bit adder-rows are required for each of the 4th, 5th, ... and \((\log_2 n)\)th adding-stages. The total number of 1-bit full/half adders required in this scheme may, therefore, be written as

\[
N = \frac{n}{2}(n-1) + \frac{n}{4}(n+2) + \left(\frac{n}{8} + \frac{n}{16} + \cdots + 1\right) n
  = \frac{3n^2}{4} + \frac{n^2}{4}\left(1 - 2^{-j}\right) = n^2 - n,
\]

where \(j = \log_2(n/4)\); this is the same as required by Dadda's scheme.

[Figure 1-Multiplication of two n-bit numbers A(a_n ... a_1) and B(b_n ... b_1): rows 1 to n of summands a_i·b_j and the product bits P_2n ... P_1]

[Figure 2-General scheme for 7-bit x 7-bit log-sum multiplier. Legend: x = a_i·b_j, i = 1, 2, ..., 7, j = 1, 2, ..., 7; boxes mark first-stage, second-stage and third-stage adders; • = final product bits (i.e., P_14, P_13, ..., P_1); H = half adder]

National Computer Conference, 1976. From the collection of the Computer History Museum (www.computerhistory.org)

Final scheme

A fast and economic multiplier is realized by rearranging a few adders in the above described scheme as follows.

(i) All the adders in the final propagation path, which includes the final adding-stage, are aligned in one row.
(ii) After (i) has been completed, some adders are moved from one adder-row to another and rearranged such that the number of adders in each row (except in one row if N/4 is not an integer) is a multiple of 4. An adder that is moved from one adder-row is placed, as far as possible, in another adder-row corresponding to the equivalent or a higher adding-stage. This is illustrated in Figure 2 for n=7, where five adders marked 1 to 5 are rearranged to yield the final scheme.
(iii) The strings (or rows) of 1-bit full/half adders in (ii) are replaced by strings of 4-bit adders.
(iv) In general, the least significant bit (lsb) adder in each adder-row is a half adder. Such an HA in a particular adding-stage is interchanged with a corresponding FA in a lower adding-stage if doing so reduces carry-dependence in that adder-row. This is illustrated in Figure 4, which gives the final scheme for a 12-bit x 12-bit multiplier.

If the above four steps are carried out in the order listed above, \(\log_2 n\) successive additions (i.e., one 4-bit addition in each adding-stage) would yield approximately \(\frac{n}{2}+1\) lsb's (\(a_1 \cdot b_1\) being the first lsb) of the final product, as seen from Figures 3 and 4. This is so because the two inputs to the final adding-stage are the sum of the first \(\frac{n}{2}\) rows and the sum of the last \(\frac{n}{2}\) rows, which are displaced by \(\frac{n}{2}\) bits.

COST AND SPEED ANALYSIS

In this section we give expressions for the cost and the multiplication time of the log-sum multiplier. Costs and speeds of the 7-bit x 7-bit and 12-bit x 12-bit multipliers are computed from these expressions assuming Texas Instruments (TI) TTL 7400 series I.C. chips. Prices have been taken from the 1974 "Cramer Electronics Catalogue" and delays from the 1974 Texas Instruments "The TTL Data Book for Design Engineers".
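A small numerical sketch of this section's expressions may help when checking the figures quoted in it. It assumes R = ⌈n²/4⌉ AND-gate chips, Q = ⌈(n²-n)/4⌉ 4-bit FA chips and T_M = t_g + t_ad·log2 n + (3n-4)·t_c/8, as stated under Cost estimation and Speed estimation, with the quoted maximum delays of 27 ns, 24 ns and 16 ns. The names `chip_counts` and `log_sum_time` are hypothetical, and log2 n is evaluated as a real number, which is an assumption about how the paper's rounded times were obtained. The chip prices c1 and c2 are not listed in the paper, so the cost C itself is not computed.

```python
import math

# Maximum delays in ns quoted in the paper from the TI 1974 data.
T_G, T_AD, T_C = 27.0, 24.0, 16.0


def chip_counts(n: int) -> tuple[int, int]:
    """Return (R, Q): quad AND-gate chips and 4-bit FA chips for an n x n multiplier."""
    r = (n * n + 3) // 4          # least integer >= n^2 / 4
    q = (n * n - n + 3) // 4      # least integer >= (n^2 - n) / 4
    return r, q


def log_sum_time(n: int, tg: float = T_G, tad: float = T_AD, tc: float = T_C) -> float:
    """Worst-case multiplication time T_M in ns from the paper's expression."""
    # Assumption: log2(n) is used as a real number, not rounded up.
    return tg + tad * math.log2(n) + (3 * n - 4) * tc / 8


if __name__ == "__main__":
    for n in (7, 12, 32):
        print(n, chip_counts(n), round(log_sum_time(n), 1))
```

With these inputs the sketch gives (13, 11) chips for n = 7 and (36, 33) for n = 12, and times of about 128 ns and 177 ns, consistent with the rounded 130 ns and 180 ns quoted in the paper.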
Cost estimation

The log-sum multiplier requires \(n^2\) AND-gates for generation of summands and Q 4-bit FAs for reduction of summands. Hence, the cost of the log-sum multiplier may be written as

\[
C = c_1 \cdot R + c_2 \cdot Q
\]

where \(c_1\) and \(c_2\) are the prices of a quad 2-input AND-gate chip and a 4-bit FA chip, respectively. R is the least integer greater than or equal to \(n^2/4\) and Q is the least integer greater than or equal to \((n^2-n)/4\). The 7-bit x 7-bit log-sum multiplier requires 49 AND-gates and, hence, 13 quad 2-input AND-gate chips (SN 7408), and 11 4-bit FAs (SN 7483A). The 12-bit x 12-bit multiplier requires 36 quad 2-input AND-gate chips and 33 4-bit FAs. The corresponding costs are about $40 and $115, respectively.

[Figure 3-Final scheme for 7-bit x 7-bit log-sum multiplier (symbols given in Fig. 2; the marked groups of adders are to be replaced by 4-bit full adders)]

Speed estimation

As indicated earlier, \(\frac{n}{2}+1\) lsb's of the final product are obtained in \(\log_2 n\) successive 4-bit additions. The time required to obtain the remaining product bits is the carry-propagation delay from the \((\frac{n}{2}+1)\)th bit to the \((2n-1)\)th bit. Therefore, the multiplication time required by the log-sum multiplier is given as

\[
T_M = t_g + t_{ad} \cdot \log_2 n + t_c \cdot \left(2n - 2 - \frac{n}{2}\right)/4
    = t_g + t_{ad} \cdot \log_2 n + (3n-4) \cdot t_c / 8
\]

where \(t_g\) is the time required for the generation of summands (i.e., propagation delay of one 2-input AND-gate), \(t_{ad}\) is the addition time for a 4-bit FA and \(t_c\) is the carry-propagation delay of the 4-bit FA. Maximum values of \(t_g\), \(t_{ad}\) and \(t_c\), as given in the TI 1974 catalogue, are 27 ns, 24 ns and 16 ns, respectively. Hence, the worst case multiplication times for the n=7 and n=12 cases are 130 ns and 180 ns, respectively.

A 7-bit x 7-bit multiplier was built on a printed circuit board, using the scheme given in Figure 3, for use in a special purpose signal processor.⁶ The multiplication time required for each possible combination of inputs was measured. The maximum time was found to be less than 110 ns, which is well below the estimated worst case value of \(T_M\).

[Figure 4-Final scheme for 12-bit x 12-bit log-sum multiplier (4th stage adder marked; other symbols given in Fig. 2)]

COMPARISON WITH OTHER SCHEMES

Cost comparison

The log-sum multiplication scheme is the most economical of all the array multiplier schemes, since it employs the minimum number of 4-bit FAs. Each of the other schemes requires 1-bit and/or 2-bit FAs in addition to 4-bit FAs or can be implemented using a larger number of 4-bit FAs. The adders required by the carry-save scheme, which is more economic than any other past scheme, are compared with those required by the log-sum scheme for various values of k (defined in an earlier section) in Table I. As seen from this table, the cost of the log-sum multiplier is at most equal to that of the carry-save multiplier.

TABLE I-Number of adders required (4-bit FAs, 2-bit FAs, 1-bit FAs/HAs) by the log-sum and carry-save schemes for k = 0, 1, 2, 3
[table entries garbled in the source]

Speed comparison

Dadda's scheme requires roughly \(\log_{1.5} n = 1.71 \log_2 n\) adding stages. The multiplication time required by Dadda's scheme, therefore, is approximately given by

\[
T_{MD} = t_g + t_{ad} \cdot 1.71 \log_2 n + \frac{2n - 2 - 1.71 \log_2 n}{4} \cdot t_c
\]

assuming 4-bit FAs throughout (including the final addition). It is clear from the expressions for \(T_M\) and \(T_{MD}\) that for all n, \(T_{MD} \geq T_M\). The worst case values of \(T_M\) and \(T_{MD}\) for n = 7, 12 and 32 are given in Table II.

TABLE II

  n     Log-sum scheme T_M (ns)     Dadda's scheme T_MD (ns)
  7     130                         170
  12    180                         230
  32    330                         450

If a G-bit CLA adder (G as defined in the Introduction), built from Schottky TTL I.C. chips, is employed to add the final two numbers in Dadda's scheme, \(T_{MD}\) reduces significantly. However, (i) use of Schottky TTL I.C.s increases the multiplier cost enormously, and (ii) \(T_{MD}\) reduces to less than or equal to \(T_M\) only for large values of n (>32).

Pezaris⁷ has given a 40-ns 17-bit x 17-bit array multiplier, using basically the carry-save scheme. However, he has used ECL logic and, hence, the cost and speed of his multiplier cannot be compared with those of the log-sum multiplier using TTL logic presented in this paper.

DISCUSSION

The configuration presented in this paper yields a fast multiplier using the optimum number of 4-bit CLA adders. The scheme can be adapted to M-bit CLA adders by making the number of adders in each row a multiple of M instead of a multiple of 4 in step (ii) of the second section of this paper.

The scheme has been presented for the multiplication of two sign + n-bit numbers. The sign of the product is generated separately. Baugh and Wooley⁸ have given an algorithm for a 2's complement array multiplier, in which the multiplication process is the same as given in Figure 1 except that the number of rows increases from n to n+2. Therefore, 2's complement multiplication can easily be implemented by the log-sum scheme.

A 4-bit x 4-bit multiplier from TI (requiring two chips, SN 74284 and SN 74285) costs approximately $26, and the worst case and typical values of multiplication time are 60 ns and 40 ns, respectively. The same multiplier based on the log-sum scheme costs approximately $11, while the worst case and typical values of multiplication time are approximately the same as above.

Finally, the log-sum scheme fully exploits both the cost effectiveness and speed effectiveness of the 4-bit FAs and, therefore, is well suited for multipliers in special purpose hardware requiring high speeds of operation.

REFERENCES

1. Dadda, L., "Some schemes for parallel multipliers," Alta Frequenza, Vol. 34, pp. 349-356, March 1965.
2. Wallace, C. S., "A suggestion for a fast multiplier," IEEE Trans. Electronic Computers, Vol. EC-13, pp. 14-17, Feb. 1964.
3. Braun, E. L., Digital Computer Design, New York, Academic Press, 1963.
4. Habibi, A. and P. A. Wintz, "Fast multipliers," IEEE Trans. Computers (Short Notes), Vol. C-19, pp. 153-157, Feb. 1970.
5. Kuck, D. J., "Illiac IV software and application programming," IEEE Trans. Computers, Vol. 17, pp. 758-770, August 1968.
6. Agrawal, J. P., V. U. Reddy and H. Renganathan, Design and Test Evaluation of Arithmetic Unit for FFT Processor, Technical Report, CSD/TR/A12-2/19, Sept. 1974, IIT, Madras.
7. Pezaris, S. D., "A 40-ns 17-bit by 17-bit array multiplier," IEEE Trans. Computers (Short Notes), Vol. C-20, pp. 442-447, April 1971.
8. Baugh, C. R. and B. A. Wooley, "A two's complement parallel array multiplication algorithm," IEEE Trans. Computers, Vol. C-22, pp. 1045-1047, December 1973.