A Simple High-Speed Multiplier Design

D.S. Dawoud, University of KwaZulu Natal, email: [email protected]
Peter D. Dawoud, University of KwaZulu Natal, email: [email protected]
Ndagije Charles, National University of Rwanda, email: [email protected]

ABSTRACT

The performance of multiplication is crucial for multimedia applications such as 3D graphics and signal processing systems, which depend on the execution of large numbers of multiplications. Many algorithms have been proposed to implement high-speed parallel multipliers. These algorithms mainly focus on rapidly reducing the partial product rows down to the final sums and carries used for the final accumulation, and they mostly rely on circuit optimization and minimization of the critical paths. This paper focuses on reducing the number of generated partial product rows before applying any of the partial product reduction techniques. Fewer partial product rows mean a lower overall operation time. The paper introduces two techniques for reducing the number of generated partial product rows. The first technique uses an algorithm and a circuit that find the 2’s complement in a fast way. The second technique uses the conventional way of getting the 2’s complement (complement a binary number and add 1 to the complemented number), but a simple circuit is proposed to implement the “add 1” operation without any carry propagation. In addition to the speed improvement, our algorithms result in a true diamond shape for the partial product tree, which is more efficient in terms of implementation. The simulation of our proposed techniques showed a large improvement in speed and in power consumption when compared to conventional multiplication algorithms.

Index Terms: Multiplier, Booth Algorithm, modified Booth encoding (MBE), partial products.

1. INTRODUCTION

Many applications, e.g. digital signal processing systems and 3D graphics, are highly multiplication-intensive.
The performance of such systems strongly depends on the performance of the multiplier and on the algorithm used to implement the multiplication operations. Therefore, there has been much work on advanced multiplication algorithms and architectures during the last decade [1]-[19].

Multiplication consists of three major steps. In the first step, the partial products are generated. In the second step, the partial products are reduced to one row of final sums and one row of carries. In the third step, the final sums and carries are added to generate the result.

For the first step, most authors employ the Modified Booth Encoding (MBE) approach [3], [4], [5], [11], [12], [19] because of its ability to cut the number of partial product rows in half. In the second step, authors normally use some variation of the known partial product reduction schemes, such as Wallace trees [4], [6], [21], [26] or compressor trees [6], [12], [13], [14], [15], to rapidly reduce the number of partial product rows to the final two (sums and carries). In the third step, they use some kind of advanced adder, such as a carry-lookahead or carry-select adder [6], [12], [18], to add the final two rows, resulting in the final product.

The main focus of recent multiplier papers [3], [9], [10], [14], [15], [16] has been on rapidly reducing the partial product rows by using some kind of circuit optimization and by identifying the critical paths and signal races. In other words, the goal has been to optimize the second step of the multiplication described above. It is easy to deduce that in all these techniques the time required to reduce the partial product rows depends on the number of rows to be reduced. For example, when using a Wallace tree or a compressor tree, the propagation time depends on the depth of the tree (the number of levels in the tree).
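The dependence of tree depth on the number of rows can be illustrated with a small Python sketch. It assumes a reduction tree built only from 3:2 carry-save adders, where each level turns every group of three rows into two and passes leftover rows through; the function name `csa_levels` is hypothetical and is not taken from any of the cited designs.

```python
def csa_levels(rows: int) -> int:
    """Count the 3:2 carry-save adder levels needed to reach two rows.

    Assumption: each level replaces every complete group of three rows
    by two rows (sum and carry); rows that do not fill a group of three
    are passed unchanged to the next level.
    """
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3
        levels += 1
    return levels


if __name__ == "__main__":
    # 8 x 8 multiplication with MBE: n/2 + 1 = 5 rows versus n/2 = 4 rows.
    print(csa_levels(5), csa_levels(4))
```

Under this assumption, five rows need one more adder level than four rows. The saving is not the same for every row count, so the sketch is only a way to evaluate a given case, not a general result.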
Since the reduction time grows with the number of rows, improving the performance of the multiplier has to start with algorithms and techniques that produce fewer partial product rows. This paper concentrates on the first step, i.e. forming the partial product array. Its target is a multiplication algorithm and circuits that produce fewer partial product rows. With fewer partial product rows, the reduction tree can be smaller and faster.

A well-defined approach was presented by Jung-Yup Kang and Jean-Luc Gaudiot in [11]. That paper describes a scheme to eliminate the sign extension problem produced in MBE Wallace multipliers. The authors of [11] showed that taking the 2’s complement of the multiplicand removes one row of partial products and a negative signal, which shrinks the carry save adder tree (CSAT). The smaller CSAT in turn reduces the delay and the power consumption. They also presented a fast technique to compute the 2’s complement of a binary number. The main problem with their proposal is that it is practical only for multiplying words of limited width (they recommended 8 x 16). Increasing the word length increases both the hardware complexity and the multiplication time. These limitations are completely avoided in our proposals, which are suitable for any word length.

The paper is organized as follows. In Section 2, the conventional multiplication method is described with an emphasis on its weaknesses. In Section 3, a step-by-step procedure to prevent the adverse effects of some conventional multiplication algorithms is presented. In Section 4, the effectiveness and the implementation evaluation and analysis of our method are described and, finally, a summary is presented.

2. THE CONVENTIONAL MULTIPLICATION ALGORITHMS AND THE OVERHEAD

Modified Booth Encoding (MBE) is one of the most efficient techniques that can be used to reduce the number of partial products. Applying MBE, the number of partial product rows to be accumulated is reduced from n to n/2, where n is the multiplier width. This explains why it is used in implementing many multipliers [5], [7], [8], [14], [15], [26]. However, it is important to note that there are two unavoidable consequences of using MBE: sign extension and negative encoding. The combination of these two consequences results in the formation of one additional partial product row, and this additional row requires not only more hardware but also, more importantly, more time (to add the extra row).

Overhead due to negative Partial Product terms:

Let us look at the benefit and the overhead of MBE by an example. For an 8-bit x 8-bit multiplication, a multiplier without MBE will generate eight partial product rows (one for each bit of the multiplier). With MBE, only n/2 (= 4) partial product rows are generated, as shown in the example of Figure 1. The partial products may take one of the values 2Y, Y, 0, -Y, and -2Y. Because negative encodings are possible, the neg signals (neg0, neg1, neg2, and neg3) must be added. This means that the actual number of partial product rows is (n/2) + 1 and not n/2 (neg3 in Figure 1). Having one more partial product row adds at least one more EXOR delay to the time needed to reduce the partial products. This additional delay is even more critical for multiplications of words with small word length (e.g. 8 x 16) than for longer operands, because it makes up a relatively larger share of the total delay. There is also an extra hardware cost, since one more carry-save adder stage is necessary.

Overhead due to Sign Extension:

The second overhead of using MBE is the sign extension (Figure 1). When adding the n/2 partial products Pi, the (i+1)th partial product is placed two bits to the left of the ith partial product. However, all the Pi should be sign extended to the (m + n)th binary position. The sign extension results in a non-regularly shaped partial product array, whereas a regularly shaped partial product array is a more efficient configuration for VLSI implementation.

Fig. 1 The array of partial products for signed multiplication with MBE

3. PREVENTING THE ADDITIONAL PARTIAL PRODUCT ROW

Sign Extension prevention

The sign extension can be avoided if we accumulate the partial products Pi one at a time. In parallel multipliers, e.g. those using a Wallace tree, all of the partial products are added in parallel, so the sign extension becomes very costly. We consider here one way to prevent sign extension. This technique assumes that all the partial products are negative. In this case, the sum of all sign extensions can be pre-calculated:

$$ \text{signs} = \sum_{i=0}^{M/2-1} \left( (-1) \cdot 2^{N} \right) \cdot 4^{i} = 2^{N} \, (-1) \left( \frac{2^{M}-1}{3} \right) $$

The above relation means that a fixed number, (-1)[(2^M - 1)/3], should be added to the (unextended) partial products, starting from the Nth binary position leftwards. Expressed in binary, this number is 1010101…01011, where there are M/2 - 1 zeros. If it turns out that a partial product Pi = di.Y (di = 2, 1, 0, -1, -2) is in fact positive, we simply replace its sign bit with a “1” to undo the effect of our earlier assumption that Pi is negative. The technique is shown in Figure 2, and it can easily be modified to take the form shown in Figure 3. The modified form means that the additional word used for sign extension prevention does not increase the number of rows that result when using MBE. The first three bits of the word are taken into consideration by the first partial product row, and the remaining bits are added to the other partial product rows. Concerning the execution time, the signal neg3 is the only overhead.

Fig. 2 Multiplication algorithm with negative encoding and sign extension prevention.

To prevent the extra partial product row and, thus, save the time of one additional carry-save adding stage and the hardware it requires, it is necessary to find ways to remove the last neg signal (neg3 in Figure 3). Removing the last neg signal also produces a more regularly shaped partial product array, which is a more efficient configuration for VLSI implementation. In this paper we propose two techniques to remove the last neg. The first technique stops the generation of the neg signal by introducing a fast method to find the 2’s complement. In the second, the neg signal is generated as usual, but a suitable circuit takes it into account without making it a separate partial product row.

3.1 Removing the Last neg Signal by Using a Fast Method to Find 2’s Complement

As mentioned before, if the encoding technique generated only the signals +2Y, +Y, and 0 (where Y is the multiplicand), the neg signals would not be necessary, i.e. there would be no need for the additional overhead partial product row (neg3 in our
examples). MBE encoding generates the signals +2Y, +Y, 0, -Y, and -2Y, which necessitates the neg signals to calculate the negative values (to get the 2’s complement). Since neg exists only to produce the 2’s complement, if we could somehow produce the two’s complement of the multiplicand while the other partial products are being produced, there would be no need for the last neg, because this neg signal would already have been applied when generating the two’s complement of the multiplicand. In such a case, we “only” need to find a faster method to calculate the two’s complement of a binary number. The first proposal is, thus, based on an efficient way of finding the two’s complement of a binary number.

Fig. 3 Modified form of Figure 2.

A Fast Method to Find Two’s Complements

The conventional method of getting the 2’s complement, by first complementing the binary number and then adding “1” to the complemented number (the way used in Figures 1 to 3), cannot be used in parallel multiplication. This is because the propagation delay of the carry increases linearly with the word size and would be much greater than the delay to generate the partial products.

In the second well-known algorithm for getting the 2’s complement, which we use in our proposal, all the bits to the left of the rightmost “1” in the word are complemented and all the other bits are unchanged. The two’s complement of the binary number 01010100 is 10101100 (Figure 4). For this number, the rightmost “1” is in bit position 2 (marked in Figure 4 as the first 1 from the LSB). Therefore, the values in bit positions 3 to 7 are simply complemented, while the values in bit positions 0, 1 and 2 are kept unchanged. Using this algorithm, two’s complementation comes down to finding the conversion signals that are used to selectively complement some of the input bits. If the conversion signal at a position is “0” (the crosses in Figure 4), the value is kept unchanged; if it is “1” (the checks in Figure 4), the value is complemented. The conversion signals to the left of the rightmost “1” are always “1”; they are “0” otherwise. In other words, once a lower-order bit has been found to be “1”, the conversion signals for all the higher-order bits to its left are “1”.

The main problem with this technique is finding a fast way to search for the rightmost “1”. The search could be as time consuming as rippling a carry through to the MSB, since the information about the lower-order bits must be transferred to the MSB. In this section we propose a method to expedite the detection of the rightmost “1”.

In a recent paper [11], the authors proposed a technique that finds the rightmost “1” in logarithmic time by using a binary-search-tree-like structure. This technique is practical for small word lengths. The proposed algorithm and its implementation are shown in Figure 5. The output of the ith stage, si, is given by equation (1):

$$ s_i = a_i \oplus (a_{i-1} + a_{i-2} + \dots + a_1 + a_0) = \begin{cases} a_i & \text{if } (a_{i-1} + a_{i-2} + \dots + a_0) = 0 \\ \overline{a_i} & \text{if } (a_{i-1} + a_{i-2} + \dots + a_0) = 1 \end{cases} \qquad (1) $$

In this expression “+” means logic “OR”, “⊕” means EXOR, and (ai-1 + ai-2 + … + a0) represents the conversion signal. This algorithm achieves all the requirements for getting the 2’s complement:
• (ai-1 + ai-2 + … + a0) = 0 means that the conversion signal is “0” and the value of ai is kept unchanged. This happens only when all the bits to the right of ai are zeros.
• (ai-1 + ai-2 + … + a0) = 1 means that the conversion signal is “1” and the value of ai is complemented ($s_i = \overline{a_i}$). This happens only when at least one of the bits to the right of ai is “1”.

Bit position   Input binary   Conversion signal   2’s Complement
7              0              1 (check)           1
6              1              1 (check)           0
5              0              1 (check)           1
4              1              1 (check)           0
3              0              1 (check)           1
2              1              0 (cross)           1   (first 1’s appearance from LSB)
1              0              0 (cross)           0
0              0              0 (cross)           0

Fig.4 2’s Complement conversion example

This algorithm is suitable for any word length without any significant increase in the time required to get the 2’s complement. To remove any delay that may arise from the propagation of the signals between the OR gates, the word can be split into groups of, for example, 4 bits, with an “OR” gate between each two groups to stop any possible propagation, as shown in Figure 5(b). The proposed partial products after removing the last neg (neg3 in Figure 3) are shown in Figure 6.

Fig. 5 Finding 2’s Complement using the proposed algorithm, parts (a) and (b)

Fig. 6 Proposed partial products after removing the last neg

Fig. 7 Partial products when m > (n+4)

By applying the first proposal for 2’s complementation, the last PPR (in Figure 3) is correctly replaced without the last neg (Figure 6). Now, the multiplication can have a smaller critical path.
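Equation (1) can be checked in software. The following is a minimal Python sketch under stated assumptions, not the circuit of Figure 5: the function name `twos_complement_by_conversion` and the `width` parameter are illustrative, and the running OR is accumulated in a loop, whereas the hardware forms the conversion signals with OR gates.

```python
def twos_complement_by_conversion(a: int, width: int) -> int:
    """Two's complement of a `width`-bit word using conversion signals.

    Bit i of the result is a_i XOR (a_{i-1} OR ... OR a_0), as in
    equation (1). No adder and no carry chain are involved.
    """
    result = 0
    conversion = 0  # OR of all bits to the right of the current position
    for i in range(width):
        bit = (a >> i) & 1
        result |= (bit ^ conversion) << i
        conversion |= bit  # becomes 1 once the rightmost "1" has been passed
    return result


if __name__ == "__main__":
    # The example of Figure 4: 01010100 -> 10101100
    print(format(twos_complement_by_conversion(0b01010100, 8), "08b"))
```

For every 8-bit input the sketch agrees with the conventional "complement and add 1" result taken modulo 2^8, which is what the test below checks.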
Removing the last neg avoids having to include one extra carry-save adding stage. It also reduces the time to find the product and saves the hardware corresponding to that stage.

The only difference between our multiplier architecture and the conventional multiplier architectures is that, for the last partial product row, our architecture has no partial product generation but partial product selection with a two’s complement unit. A 3-5 decoder is needed to select the correct value from the five possible inputs (2 x X, 1 x X, 0, -1 x X, -2 x X), which come either from the two’s complement logic or from the multiplicand itself and are fed into the row of 5-1 selectors. Unlike the other rows, which use a PPRG (Partial Product Row Generator), the two’s complement logic does not have to wait for MBE to finish: the two’s complementation is performed in parallel with MBE and the 3-5 decoder.

3.2 Removing the last neg signal by using a Simple Look-up Table

Figure 3 represents the shape of the partial product rows (PPR) when the multiplier length (m) equals the multiplicand length (n). Figure 7 represents the corresponding shape when the multiplier length (m) is greater than the multiplicand length by 4 bits or more. From these two figures, it is easy to note the following:
- In the case when n = m (Figure 3), adding the last neg signal to the first partial product row affects only the last five bits of that row.
- In the case of m > n+4 (Figure 7), the last neg signal can be included in the first row without affecting any of the original bits in that row. In other words, if a Wallace tree or any similar scheme is used for reducing the partial product rows, the last neg signal (neg_{n/2-1}) will not represent any additional row.

The above two observations form the basis of the second proposal. Figure 8 shows the proposed circuit in the case of m = n. The least significant n - 2 bits of the first row stay unchanged. The last five bits of the first row, together with the neg signal of the last row, are used as inputs (address) to a small lookup table of capacity 2^6 words x 6 bits. The output of the lookup table is exactly the input when neg = 0, or the input incremented by one when neg = 1.

Fig. 8 Proposed way of adding neg to first PPR (partial product row 0 and neg_{n/2-1} enter a 2^6 x 6-bit lookup table, which outputs the modified partial product row 0)

The value of the last neg signal neg_{n/2-1} depends on the most significant three bits of the multiplier (see the MBE truth table):

$$ neg_{m/2-1} = y_{m-1} \cdot \left( \overline{y_{m-2}} + \overline{y_{m-3}} \right) \qquad (2) $$

Equation (2) shows that it is possible to form the last neg signal while forming the first partial product row. This means that the circuit proposed in Figure 8 can be used to generate the modified six bits of the first partial product row without any extra delay. The proposed multiplier architecture in the case of m = n is shown in Figure 9.

4. CRITICAL PATH AND PERFORMANCE ANALYSIS

We introduced two techniques for removing the additional partial product row. Both are suitable for any word length. The results showed a speed improvement of 15%, and the improvement increases with the word length.

The performance of our multiplier architecture when using the first proposal clearly depends on the speed of generating the last PPR (in other words, on the speed of the two’s complementation step). If we can generate the last partial product row within the time in which the other partial product rows are generated, the performance will be improved as predicted, because the additional partial product row has been removed.
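For the second proposal (Section 3.2), the step whose delay matters is the small lookup of Figure 8. The Python sketch below shows how small that step is, under assumptions that are not stated in the paper: the last neg is taken to be 1 exactly when the top radix-4 Booth digit is negative (top bit set and the two bits below it not both 1, as in the usual Booth table), and the neg bit is placed as the most significant bit of the table address. The names `last_neg`, `LUT` and `modified_top_bits` are hypothetical.

```python
def last_neg(y: int, m: int) -> int:
    """neg of the last Booth digit, from the top three bits of an m-bit multiplier y."""
    y_top = (y >> (m - 1)) & 1
    y_mid = (y >> (m - 2)) & 1
    y_low = (y >> (m - 3)) & 1
    # Assumed rule: negative digit when the top bit is 1 and the two
    # lower bits are not both 1.
    return y_top & ((y_mid ^ 1) | (y_low ^ 1))


# 2^6 words x 6 bits. Assumed address layout: neg is the MSB of the address,
# followed by the five most significant bits of partial product row 0.
LUT = [(addr & 0b11111) + (addr >> 5) for addr in range(64)]


def modified_top_bits(top_five: int, neg: int) -> int:
    """Return the six modified bits: the input itself, or the input plus one."""
    return LUT[(neg << 5) | top_five]


if __name__ == "__main__":
    y = 0b10000000  # top Booth digit is -2, so neg = 1
    print(format(modified_top_bits(0b10101, last_neg(y, 8)), "06b"))
```

Because the table is indexed directly by the five row bits and neg, the "add 1" never ripples a carry through an adder; the sixth output bit holds the carry out when all five input bits are 1.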
In the case of the second proposal, the performance likewise depends on the speed of getting the modified PPR0. The time delays required to generate the last PPR (or the modified first PPR in the second case) must be calculated and compared to the delays of conventional partial product generation methods. We ran such a comparison using normalized gate delays. We also investigated the overall performance (in terms of speed, area, and power) of our multiplier architectures as compared to the conventional methods. The most important result of the analysis is that the delay required to generate the last PPR (or the first PPR) is almost constant and does not depend on the word length. This is a very good result compared with the technique proposed in [11]. The details of the analysis and the table of comparison will be published in the complete text of the paper on the web.

Fig.9 Proposed multiplier architecture (the multiplicand X, MBE0 and neg_{n/2-1} produce PP0modified; the multiplicand X and MBE1 produce PP1; and so on up to MBE n/2)

5. CONCLUSIONS

In this paper, we have presented simple, high-speed, and well-structured multiplication algorithms. Our multiplication algorithms focus on the first step of multiplication, the partial product generation step, and reduce the number of partial product rows generated from n/2 + 1 to n/2. By doing so, the structure of the partial product array becomes more regular and easier to implement. Even more importantly, the product is found faster because of the smaller number of partial product rows to add. As we have shown, we can achieve this using less hardware.

6. REFERENCES

[1] A.D. Booth, “A Signed Binary Multiplication Technique,” Quarterly J. Mechanical and Applied Math., vol. 4, pp. 236-240, 1951.
[2] L. Dadda, “Some Schemes for Parallel Multiplier,” Alta Frequenza, vol. 34, pp. 349-356, 1965.
[3] F. Elguibaly, “A Fast Parallel Multiplier-Accumulator Using the Modified Booth Algorithm,” IEEE Trans.
Circuits and Systems, vol. 47, no. 9, pp. 902-908, 2000.
[4] J. Fadavi-Ardekani, “M x N Booth Encoded Multiplier Generator Using Optimized Wallace Trees,” IEEE Trans. Very Large Scale Integration, vol. 1, no. 2, pp. 120-125, 1993.
[5] A. Farooqui and V. Oklobdzija, “General Data-Path Organization of a MAC Unit for VLSI Implementation of DSP Processors,” Proc. 1998 IEEE Int’l Symp. Circuits and Systems, vol. 2, pp. 260-263, 1998.
[6] D.S. Dawoud, “Design of Fast Parallel Multiplier Using an Algorithm Approach,” Proc. IEEE International Conference on Electronic, Circuits & Systems, Egypt, Dec. 15-18, 1997, pp. 919-924.
[7] D.S. Dawoud, “Carry-Save Multiplier with Recoded Operands,” Al-Azhar University Engineering Journal, pp. 159-168, September 2000 (International Refereed Journal, ISSN: 1110-6409).
[8] D.S. Dawoud, “Novel Serial-Parallel Multiplier,” Proc. Joint Conference of 5th World Multiconference on Systemics, Cybernetics and Information (SCI 2001) and the 7th International Conference on Information Systems Analysis and Synthesis (ISAS 2001), Orlando, USA, July 2001.
[9] D.S. Dawoud, “Very High Speed Pipelined Serial-Parallel Multiplier,” Proc. Joint Conference of 5th World Multiconference on Systemics, Cybernetics and Information, Orlando, USA, July 2001.
[10] N. Itoh, Y. Naemura, H. Makino, Y. Nakase, T. Yoshihara, and Y. Horiba, “A 600-MHz 54x54-bit Multiplier with Rectangular-Styled Wallace Tree,” IEEE J. Solid-State Circuits, vol. 36, no. 2, pp. 249-257, 2001.
[11] J.-Y. Kang and J.-L. Gaudiot, “A Fast and Well-Structured Multiplier,” EUROMICRO Symp. Digital System Design, pp. 508-515, Aug. 2004.
[12] J.-Y. Kang, W.-H. Lee, and T.-D. Han, “A Design of a Multiplier Module Generator Using 4-2 Compressor,” Proc. Korea Inst. of Telematics and Electronics (KITE) Fall Conf., vol. 16, pp. 388-392, 1993.
[13] M. Nagamatsu, S. Tanaka, J. Mori, T. Noguchi, and K. Hatanaka, “A 15ns 32x32-bit CMOS Multiplier with an Improved Parallel Structure,” Digest of Technical Papers, IEEE Custom Integrated Circuits Conf., 1989.
[14] V.G. Oklobdzija, D. Villeger, and S.S. Liu, “A Method for Speed Optimized Partial Product Reduction and Generation of Fast Parallel Multipliers Using an Algorithmic Approach,” IEEE Trans. Computers, vol. 45, no. 3, pp. 294-306, Mar. 1996.
[15] Santoro and M. Horowitz, “SPIM: A Pipelined 64x64-bit Iterative Multiplier,” IEEE Trans. Circuits and Systems, vol. 24, no. 2, pp. 487-493, 1989.
[16] P.F. Stelling, C.U. Martel, V.G. Oklobdzija, and R. Ravi, “Optimal Circuits for Parallel Multipliers,” IEEE Trans. Computers, vol. 47, no. 3, pp. 273-285, Mar. 1998.
[17] Moises E. Robinson and Earl Swartzlander, “A reduction scheme to optimize the Wallace multiplier,” IEEE International Conference on Computer Design (ICCD), 1998.
[18] A. Weinberger, “4:2 Carry-Save Adder Module,” IBM Technical Disclosure Bull., vol. 23, 1981.
[19] W.-C. Yeh and C.-W. Jen, “High-Speed Booth Encoded Parallel Multiplier Design,” IEEE Trans. Computers, vol. 49, no. 7, pp. 692-701, July 2000.
[20] M.D. Ercegovac and T. Lang, Digital Arithmetic. Los Altos, Calif.: Morgan Kaufmann, 2003.
[21] M.J. Liao, C.F. Su, Chang and Allen Wu, “A carry select adder optimization technique for high-performance Booth-encoded Wallace tree multipliers,” IEEE International Symposium on Circuits and Systems, 2002.
[22] D.A. Patterson and J.L. Hennessy, Computer Architecture: A Quantitative Approach. San Mateo, Calif.: Morgan Kaufmann, 1996.
[23] D. Gajski, Principles of Digital Design. Prentice Hall, 1997.
[24] R. Hashemian and C.P. Chen, “A New Parallel Technique for Design of Decrement/Increment and Two’s Complement Circuits,” Proc. 34th Midwest Symp. Circuits and Systems, vol. 2, pp. 887-890, 1991.
[25] Z. Huang and M. Ercegovac, “High-Performance Left-to-Right Array Multiplier Design,” Proc. 16th Symp. Computer Arithmetic, pp. 4-11, June 2003.
[26] D. Bakalis, E. Kalligeros, and D. Nikolos, “Low power BIST for Wallace tree based fast multipliers,” First International Symposium on Quality of Electronic Design, 2000.
[27] King Fai Pang, “Architectures for pipelined Wallace tree multiplier-accumulators,” IEEE International Conference on VLSI in Computers and Processors, 1990 (ICCD’90).

This paper is based on research work conducted by Peter Dawoud (University of KwaZulu Natal) under the supervision of Prof. D.S. Dawoud (Head of Computer Engineering Program, University of KwaZulu Natal, South Africa). Peter Dawoud’s research work aims at designing a high-performance arithmetic unit and its VLSI implementation.