ISSN 2319-8885 Vol.04,Issue.51, December-2015, Pages:11060-11067 www.ijsetr.com Efficient Method of BCD Multiplication to Reduce Power and Delay Values VUTLA APARNA1, P. BALA MURALI KRISHNA2 1 PG Scholar, Dept of ECE, Sri Mittapalli Institute of Technology for Women, Guntur, AP, India, E-mail: [email protected]. 2 Professor, Dept of ECE, Sri Mittapalli Institute of Technology for Women, Guntur, AP, India, E-mail: [email protected]. Abstract: In this paper, We propose an algorithm and architecture of a BCD parallel multiplier that exploits some properties of two different redundant BCD codes to speed up its computation. In this paper, we also develop new techniques to reduce the latency and area of previous representative high performance implementations. The Partial products are generated in parallel using a signed-digit radix-10 recoding of BCD multiplier with the digit set [-5, 5], and set of positive multiplicand multiples (0X, 1X, 2X, 3X, 4X, 5X) coded in excess-3 code(XS-3). This encoding has many advantages. First, it is self-complementing codes, so that a negative multiplicand multiple can be obtained by just inverting bits of the corresponding positive one. The available redundancy allows a fast and simple generation of multiplicand multiples in a carry free way. The partial products can be recoded to the overloaded BCD representation (ODDS) by just adding a constant factor into the partial product reduction tree. To show the advantages of our proposed architecture, we have synthesized a RTL model for 16 x 16-digit multiplications and performed a comparative survey of the existing representative designs. We show that the proposed multiplier has an area improvement roughly in the range 20-35 percent for similar target delays with respect to the fastest implementation. Keywords: Parallel Multiplication, Overloaded BCD Representation, Redundant Excess-3 Code. I. INTRODUCTION Decimal fixed-point and floating-point formats are important in financial, commercial, and user-oriented computing, where conversion and rounding errors that are inherent to floatingpoint binary representations cannot be tolerated [1]. The new IEEE 754-2008 Standard for Floating- Point Arithmetic [2], which contains a format and specification for decimal floating-point (DFP) arithmetic [1], [2], has encouraged a significant amount of research in decimal hardware [3], [5] . Furthermore, current IBM Power and z/System families of microprocessors [5], and the Fujitsu Sparc X microprocessor [6], oriented to servers and mainframes, already include fully IEEE 754-2008 compliant decimal floating-point units (DFPUs) for Decimal64 (16 precision digits) and Decimal128 (34 precision digits) formats. Since area and power dissipation are critical design factors in state-of-the-art DFPUs, multiplication and division are performed iteratively by means of digit-by-digit algorithms [4], [5], and therefore they present low performance. Moreover, the aggressive cycle time of these processors puts an additional constraint on the use of parallel techniques [4], [6], [10] for reducing the latency of DFP multiplication in high-performance DFPUs. Thus, efficient algorithms for accelerating DFP multiplication should result in regular VLSI layouts that allow an aggressive pipelining. Hardware implementations normally use BCD instead of binary to manipulate decimal fixed-point operands and integer significant of DFP numbers for easy conversion between machine and user representations [1], [15]. BCD encodes a number X in decimal (non-redundant radix-10) format, with each decimal digit Xi £ [0, 9] represented in a 4-bit binary number system. However, BCD is less efficient for encoding integers than binary, since codes 10 to 15 are unused. Moreover, the implementation of BCD arithmetic has more complications than binary, which lead to area and delay penalties in the resulting arithmetic units. A variety of redundant decimal formats and arithmetic’s have been proposed to improve the performance of BCD multiplication. The BCD carry-save format [9] represents a radix-10 operand using a BCD digit and a carry bit at each decimal position. It is intended for carry-free accumulation of BCD partial products using rows of BCD digit adders arranged in linear [6], [9] or tree-like configurations [15]. Decimal signed-digit (SD) representations [10], [11], [14] rely on a redundant digit set f {a, … 0, . . . a}, 5 ≤ a ≤9, to allow decimal carry-free addition. The overloaded BCD (or ODDS—overloaded decimal digit set) representation was proposed to improve decimal multioperand addition [13], and sequential [8] and parallel [12], [13] decimal multiplications. In this code, each 4-bit binary value represents a redundant radix-10 digit Xi £[0,15]. The ODDS presents interesting properties for a fast and efficient hardware implementation of decimal arithmetic:(1) it is a redundant decimal representation so that it allows carry-free generation of both simple and complex decimal multiples (2X, 3X, 4X, 5X, 6X,. . .) and addition, (2) since digits are represented in the binary number system, digit operations can be performed with binary Copyright @ 2015 IJSETR. All rights reserved. VUTLA APARNA, P. BALA MURALI KRISHNA arithmetic, and (3) unlike BCD, there is no need to implement SD radix- 10 digits. Each SD radix-10 digit controls a level additional hardware to correct invalid 4-bit combinations. A of 5:1 muxes , which selects a positive multiplicand [13] disadvantage with respect to signed-digit and selfmultiple (0, X, 2X, 3X, 4X, 5X) coded in (4221). To obtain complementing codes, is a slightly more complex each partial product a level of XOR gates inverts the output implementation of 9’s complement operation for negation of bits of the 5:1 muxes when the sign of the corresponding SD operands and subtraction. radix-10 digit is negative. Before being reduced the d+ 1 partial product, coded in (4221), are aligned according to their In this work, we focus on the improvement of parallel decimal weights. Each p-digit column of the partial product decimal multiplication by exploiting the redundancy of two the decimal digit p: 2 CSA trees. The number of digits to be decimal representations: the ODDS and the redundant BCD reduced [5] for each column varies from p=d+1 to p= 2. Thus, excess-3 (XS-3) representation, a self-complementing code the d+ 1 partial product are reduced to two 2d digit operands with the digit set [-3, 12]. We use a minimally redundant digit S and H coded in (4221). set for the recoding of the BCD multiplier digits, the signeddigit radix-10 recoding [30], that is, the recoded signed digits are in the set f{-5,-4,-3,-2-1,0,1,2,3,4, 5}. For this digit set, the main issue is to perform the * 3 multiple without long carry-propagation (note that * 2 and * 5 are easy multiples for decimal [3] and that * 4 is generated as two consecutive * 2 operations). By performing the reduction of partial products coded in ODDS via binary carry-save arithmetic. Partial product generation (PPG). By generating positive multiplicand multiples coded in XS-3 in a carry free form. An advantage of the XS-3 representation over non-redundant decimal codes (BCD and 4221/ 5211 [3]) is that all the interesting multiples for decimal partial product generation, including the 3X multiple, can be implemented in constant time with an equivalent delay of about three XOR gate levels. Moreover, since XS-3 is a self-complementing code, the 9’s complement of a positive multiple can be obtained by just inverting its bits as in binary. Partial products can be recoded from the XS-3 representation to the ODDS representation by just adding a constant factor into the partial product reduction tree. The resultant partial product reduction tree is implemented using regular structures of binary carry-save Fig.1. Binary P: 2 CSA Tree. adders or compressors. The 4-bit binary encoding of ODDS operands allows a more efficient mapping of decimal The nine’s complement of a positive decimal operand is algorithms into binary techniques. By contrast, signed-digit given by this implementation of leads to a complex impleradix-10 and BCD carry-save redundant representations mentation, since the Zi digits of the multiples generated may require specific radix-10 digit adders [1], [7], [16]. The rest of take values higher than 9. A simple implementation is this paper is organized as follows. In Section 2, preliminaries obtained by observing that the excess-3 of the nine’s are discussed. Section 3 explains the proposed method. complement of an operand is equal to the bit-complement of Section 4 shows the experimental results. Finally, Section 5 the operand coded in excess-3 [5]. The final product is a 2dconcludes this paper. digit BCD word given by P=2H +S. Before being added, S and H need to be processed. S is recoded from (4221) to BCD II. RADIX-10 PARALLEL DECIMAL MULTIPLIER excess-6. The H × 2 multiplication is performed in parallel The Radix-10 architecture for d-digit BCD decimal fixedwith the recoding of S. This ×2 blocks uses a (4221) to (5421) point parallel multiplication [3] is based on the techniques for digit recoder and a 1-bit wired left shift to obtain the operand partial product generation and reduction respectively. The 2H coded in BCD shows in Fig.3. For the final BCD carry code (4221) and (5211) is used instead of BCD to represent propagate addition [4] uses a quaternary tree (Q-T) adder the partial product is the main feature of this architecture. based on conditional speculative decimal addition. It has low This improves the reduction of decimal partial product [7] latency and requires less hardware than other alternatives. C. with respect to other proposals, in terms of latency and area is Partial Product Reduction The partial product arrays expected. The architecture of the d-digit SD radix -10 generated by the SD radix-10 encoding each column of p multiplier consists of the following stages, Generation of digits is reduced to two digits by means of a decimal digit p:2 decimal partial products coded in (4221), reduction of partial CSA tree shown in Fig.1. The decimal carries are passed beproducts [9] and a final BCD carry propagate addition. B. tween adjacent digit columns and decimal coding method Partial Product Generation The generation of the d+1 partial used for decimal carry-save addition [6]. product is performed by an encoding of the multiplier into d International Journal of Scientific Engineering and Technology Research Volume.04, IssueNo.51, December-2015, Pages: 11060-11067 Efficient Method of BCD Multiplication to Reduce Power and Delay Values carry bit at each decimal multiplication area and power dissipation are critical design factors in DFPU. Multiplication and division are performed iteratively by means of digit-bydigit algorithms for reducing the latency of DFP multiplication in high performance DFPUs. Fig.2. Scheme of x2 for BCD-4221. To perform the decimal coding instead of BCD for an efficient implementation of decimal carry-save addition [15] with binary CSAs or full adders use (4221) and (5211). The use of these codes avoids the need for decimal corrections and need to focus on the ×2 decimal multiplication shown in Fig.2. The Decimal p:2 CSA Trees for Digits Coded in (4221) Operands Long carry propagation because of that area and delay is more in this system. The proposed method is a new delay measurement technique for the small-delay defect detection technique. III. DECIMAL PARTIAL PRODUCT GENERATION A. SD Radix-10 Architecture The algorithm and architecture of a BCD parallel multiplier that exploits some properties of two different redundant BCD codes to speed up its computations are redundant BCD excess-3 code (XS-3) and the overloaded BCD representation (ODDS). Proposed techniques are developed to reduce significantly the latency and area of previous representative high performance implementations. A. Partial Product Generation Partial products are generated in parallel using a signed-digit radix-10 recoding of the BCD multiplier with the digit set [-5,5] and a set of positive multiplicand multiples (0X, 1X, 2X, 3X, 4X, 5X) coded in XS-3 encoding has several advantages. First, it is a self-complementing code the negative multiplicand multiple can be obtained by just inverting the bits of the corresponding positive one. The available redundancy allows a fast and simple generation of multiplicand multiples in a carry-free way. Finally, the partial products can be recoded to the ODDS representation by just adding a constant factor into the partial product reduction tree. The ODDS uses a similar 4-bit binary encoding as nonredundant BCD techniques explains binary carry-save adders and compressor trees, can be adapted efficiently to perform decimal operations. A variety of redundant decimal formats and arithmetic have been proposed to improve the performance of BCD multiplication. The BCD carry-save format represents a radix-10 operand using a BCD digit and a Fig.3. Combinational SD Radix-10 Architecture. B. Sign Digit Radix-10 Generation The partial product generation stage comprises the recoding of the multiplier to a SD radix-10 representation, the calculation of the multiplicand multiples in XS-3 code and the generation of the ODDS partial products. The SD radix-10 encoding produces d SD radix- 10 digits Ybk [-5, 5], with k = 0,. . . , d - 1, Yd-1 being the MSD (most significant digit) of the multiplier shown in Fig.4. Each digit Ybk is represented with a 5-bit hot-one code (Y1k, Y2k, Y3k, Y4k, Y5k) to select the appropriate multiple {-1X,0X,1X, . . , 5X} with a 5:1 mux and a sign bit Ysk that controls the negation of the selected multiple shown in fig.4. The negative multiples are obtained by ten’s complementing the positive ones. This is equivalent to taking the nine’s complement of the positive multiple and then adding 1. As we have shown in Section 2, the nine’s complement can be obtained simply by bit inversion. This needs the positive multiplicand multiples to be coded in XS3, with digits in [-3,12]. The d least significant partial products PP[d-1], . . , PP[0] are generated from digits Yb k by using a set of 5:1 muxes. The xor gates at the output of the mux invert the multiplicand multiple, to obtain its 9’s International Journal of Scientific Engineering and Technology Research Volume.04, IssueNo.51, December-2015, Pages: 11060-11067 VUTLA APARNA, P. BALA MURALI KRISHNA complement, if the SD radix-10 digit is negative (Ysk = 1). On the other hand, if the signals (Y1k, Y2k, Y3k, Y4k, Y5k) are all (1) zero then PP[k] = 0, but it has to be coded in XS-3 bit encoding 0011. partial product signs are encoded into their (2) MSDs. The generation of the most significant partial product PP[d] is described and only depends on Ysd-1. C. Partial Wmi has been split into two parts, the 3 most significant bits of Product Reduction PPR tree consists of three parts a regular Wg[0]i+1 and least-significant bit, Wmi,0. Digit Wti is obtained binary CSA tree to compute an estimation of the decimal by the concatenation of the most-significant bit of Wg[0]i+1* partial product sum in a binary carry-save form (S, C). 2 and LSB of Wg[0]i, A row of decimal 3:2 digit compressors is used to reduce the 3-op-erand partial product sum (G, Z, Wz) to two BCD operands (A, B), with A represented in excess-6. E. Decimal 128 Implementation The maximum height of the partial product array by the 34 x34-digit BCD multiplier is h = 35. The optimal value for parameter m is m = 31. There-fore, the addition of these carries has been split into two parts. First, a 31-bit counter evaluates Wmi, the 5-bit sum of the 31 fastest carries. Then, the two slowest carries, Ci+1[31] and Ci+1[32], are added to Wmi into a second 5-bit counter. digits Si, Ci, Wt[0]i, Wt[1]i, are reduced to two digits Gi;Zi [0, 15] using a 4-bit binary 4:2 CSA. Finally, the three digits Gi, Zi, Wzi are reduced to two excess-6 BCD digits Ai and Bi by using the decimal digit 3:2 compressor shown in fig.3. It reduces the overall critical path latency, area and improving speed of parallel decimal multiplication to avoid long carry-propagations which Reduces the number of partial products generation. Fig.4. SD radix-10 Generation of Partial Product Digit. A sum correction block to count the carries generated between the digit columns and a decimal digit 3:2 compressor which increments the carry-save sum according to the carries count to obtain the final double-word product (A,B), A being represented with excess-6 BCD digits and B being represented with BCD digits. The PPR tree can be viewed as adjacent columns of h ODDS digits each, h being the column height see fig.4 and h _ d + 1. Fi-nally addition of digits Gi, Zi, Wzi of the column, Gi + Zi + Wzi[0, 45]. We have designed a decimal 3:2 digit compressor that reduces digits Wzi, Gi and Zi to two digits Ai, Bi. The final BCD product by using a single BCD carry propagate addition P = A+B, which is the last step in the multiplication. It required that Ai + Bi [0, 18] to reduce the delay of the final BCD carry-propagate adder operand A is obtained in excess-6, so that we compute [Ai] = Ai + e in excess e = 6. The output digits sum [Ai] + Bi [6,24]. D. Decimal 64 Implementation The maximum number of carries transferred between adjacent columns of the binary 17:2 CSA tree is 15. These carries are labeled Ci+1[0]. . . ,Ci+1[14] (output carries) and Ci[0] , . . ,Ci[14] (input carries). The binary 17:2 CSA tree is built of a first level composed of a 9:2 compressor and a 8:2 compressors, and a second level composed of a 4:2 compressor. To balance the delay of the 17:2 CSA tree and the bit counter, m = 14 has been chosen. The 14-bit counter produces the 4-bit digit Wmi. The computation of Wmi * 6 deserves a more detailed description. The 4-bit digit Wmi = Wi,3 ,Wmi,2 ,Wmi,1, Wmi,0, with Wmi,j being the bits of the digit, is conveniently represented as, C. Generation of Multiplicand Multiples All the required decimal multiplicand multiples, except the 3X multiple, are obtained in a few levels of combinational logic using different digit decoders and performing different fixed m-bit left shifts (Lmshift) in the bit-vector representation of operands. The structure of these digit decoders is discussed Fig.5 shows the block diagram for the generation of the positive multiplicand multiples {X, 2X, 3X, 4X, 5X} for the SD radix-10 recoding. All these multiples are coded in (4221). The X BCD multiplicand is easily recoded to (4221) using the logical expressions Fig.5. Generation of multiplicand multiples for SD radix– 10. International Journal of Scientific Engineering and Technology Research Volume.04, IssueNo.51, December-2015, Pages: 11060-11067 Efficient Method of BCD Multiplication to Reduce Power and Delay Values Multiple 2X: Each BCD digit is first recoded to the (5421) each pipelined stage (Fig.4), carrying out σ additions of decimal coding shown in Table 1 (the mapping is unique). An (M+3+M/#blocks) digits. The proposed addition is based on L1shift is performed to the recoded shows figure5 10´s complement BCD numbers, and the carry-chains multiplicand, obtaining the 2X multiple in BCD. Then, the 2X techniques to carry out the full design. The carry-chain BCD multiple is recoded to (4221) using Expressions [2]. addition consists in Computing beforehand all propagating carry and carry generating conditions, in order to reduce the Multiple 3X: It is evaluated by a carry-propagate addition of overall execution time. The implemented adder utilizes an BCD multiples X and 2X in a d-digit BCD adder. The BCD area of 10x (M+3+M/#blocks) LUTs. It has been considered sum digits are recoded to (4221) as indicated by Expression the high performance adder proposed. Fig. 6 illustrates the (2). The latency of the partial product generation for the SD shape of the partial product array, particularizing for d =16. radix-10 scheme is constrained by the generation of 3X. Multiple 4X: It is obtained as 2X × 2, where the 2X multiple is coded in (4221). The second ×2 operation is implemented as a digit recoding from (4221) to code (5211), followed by an L1shift. The design of the (4221) to (5211) digit recorders is described in figure5. The ×2 operation, with input operands coded in (4221) or (5211), is also implemented in the decimal CSA trees used for partial product reduction. Multiple 5X: It is obtained by a simple L3shift of the (4221) recoded multiplicand, with resultant digits coded in (5211). Then, a digit recoding from (5211) to (4221) is performed Fig. 4 shows an example of this operation. 1. digit recoding of the BCD multiplicand digits Xi into a decimal carry 0 ≤ T i ≤ Tmax and a digit -3 ≤ Di ≤ 12-Tmax such as (3) being Tmax the maximum possible value for the decimal carry. 2. The decimal carries transferred between adjacent digits are assimilated obtaining the correct 4-bit representation of XS-3 digits Nxi. Fig.6. Decimal partial product array generated for d =16. C. Partial Product Reduction The partial product arrays generated by the SD radix-10 encoding. Each column of p digits is reduced to two digits by means of a decimal digit p:2 CSA tree. Also, decimal carries are passed between adjacent digit columns. we present the set of preferred decimal codings and the method for decimal carry-save addition. We propose the use of the (4221) and (5211) decimal codings instead of BCD for an efficient implementation of decimal carry-save addition with binary CSAs or full adders. The use of these codes avoids the need for decimal corrections, so we only need to focus on the ×2 decimal multiplications. The implementation of decimal 3:2 CSAs for the proposed codings is also described in fig.6. We present the Decimal p:2 CSA Trees for Digits Coded in (4221) Operands in fig.6 reduced with the area-optimized or delay-optimized decimal 17:2 CSA trees presented in fig.6. Decimal Addition Design: After finishing the previous stage and as soon as its outputs are computed, a (8421) decimal addition is carried out. First of all, an aligned process on the vectors D0 and D1 is developed, just before both of the mare recoded in to (8421). As it is pointed out at the beginning of this figure, various levels of pipeline=log2 (M) +1 are implemented. Several decimal Additions are processed at Fig.7. High-level architecture of the proposed decimal PPR tree (h inputs, 1-digit column). International Journal of Scientific Engineering and Technology Research Volume.04, IssueNo.51, December-2015, Pages: 11060-11067 VUTLA APARNA, P. BALA MURALI KRISHNA Decimal Partial Product Reduction: The PPR tree consists These conditional sums correspond to each one of the carry of three parts: [1] a regular binary CSA tree to compute an input values. If the conditional carry out from a digit is one, estimation of the decimal partial product sum in a binary the digit adder performs a -6 subtraction. The selection of the carry-save form (S, C), [2] a sum correction block to count appropriate conditional BCD digit sums is implemented with the carries generated between the digit columns, and [3] a a final level of 2 : 1 multiplexers. To design the carry prefix decimal digit 3:2 compressor which increments the carry-save tree we analyzed the signal arrival profile from the PPRT tree, sum according to the carries count to obtain the final doubleand considered the use of different prefix tree topologies to word product (A,B), A being represented with excess-6 BCD optimize the area for the minimum delay adder. digits and B being represented with BCD digits. The PPR tree can be viewed as adjacent columns of h ODDS digits each, h IV. RESULTS & DISCUSSION being the column height, and h <d+1. Fig. 7 shows the highIn this section, the proposed method synthesis and simulation level architecture of a column of the PPR tree (the ith column) results are reported. with h ODDS digits in [0, 15] (4 bits per digit). Each digit A. Synthesis Result column of the binary CSA tree (the gray colored box in Fig.7) Synthesis Report: Synthesis report shows the design reduces the h input digits and n cin input carry bits, transferred summary including number of LUT’s, I/O buffers, Slice from the previous column of the binary CSA tree, to two registers, flip flop. area and delay estimates and a comparison digits, Si, Ci, with weight 10i. Moreover, a group of ncout with other representative decimal implementations that show carry outputs. The digit columns of the binary CSA tree are the potential advantages of our proposal show table1 and implemented efficiently using 4-bit 3:2, 4:2 and higher order table2. compressors made of full adders. These compressors take TABLE I: Power & Delay for Proposed Method advantage of the delay difference of the inputs and of the sum and carry outputs of the full adders, allowing significant delay reductions. Thus, there is a difference between the value of the carry outs generated at the i-column and the value of the carries transferred to the (i+1)-column. This difference, T, is computed in the sum correction block of every digit column and added to the partial product sum (S, C) in the decimal CSA. (4) The contribution of the column i to the sum correction term T is given by (5) The comparison results in the above table shows that the proposed architecture is 89.7% cost effective and 66.66% delay efficient. Also these circuits are simulated in Xilinx for Automotive CoolRunner2 and comparison results are as shown below. TABLE II: Proposed Systems Compared With Existing Methods Therefore, the sum correction is given by (6) Consequently, the sum correction block evaluates Wix6. This module is composed of a m-bit binary counter and a x6 operator. A straightforward implementation would use m = ncout and a decomposition of the x6 operator into x5 and x1 (both without long carry propagations), and then a four to two decimal reduction to add the correction to the PPR tree result. D. Final Conversion to BCD The selected architecture is a 2d-digit hybrid parallel prefix/carry-select adder, the BCD Quaternary Tree adder. The delay of this adder is slightly higher to the delay of a binary adder of 8d bits with a similar topology. The decimal carries are computed using a carry prefix tree, while two conditional BCD digit sums are computed out of the critical path using 4-bit digit adders which implements (7) B. Simulation Result The proposed system is implemented by Xilinx software and the simulation waveforms of each module are shown below. Proposed Method For BCD Codes Results: In the fig.8 shows that in This multiplication of the BCD codes we are using the inputs and outputs are the a, b, clk1,c1k2 and function f. We are given the any binary values are given to the inputs such as a, b, and multiply the input values and gives the output in the function f. International Journal of Scientific Engineering and Technology Research Volume.04, IssueNo.51, December-2015, Pages: 11060-11067 Efficient Method of BCD Multiplication to Reduce Power and Delay Values hand, recoding of XS-3 partial products to the ODDS representation is straightforward. The ODDS representation uses the redundant digit-set [0, 15] and a 4-bit binary encoding (BCD encoding), which allows the use of a binary carry-save adder tree to perform partial product reduction in a very efficient way. We have presented architectures for IEEE754 formats, Decimal64 (16 precision digits) and Decimal128 (34 precision digits). The area and delay figures estimated from both a theoretical model and synthesis show that our BCD multiplier presents 20-35 percent less area than other designs for a given target delay. Fig.8. Simulation waveform of the BCD codes. Final Result for BCD Codes: In the fig.8 and fig.9 shows that in This multiplication of the BCD codes we are using the inputs and outputs are the a, b, clk1,c1k2 and function f. We are given the any binary values or also we are hexadecimal values are given to the inputs such as a, b, and multiply the input values and gives the output in the function f. Fig.9. Simulation waveform of the BCD codes final results. V. CONCLUSION In this paper we have presented the algorithm and architecture of a new BCD parallel multiplier. The improvements of the proposed architecture rely on the use of certain redundant BCD codes, the XS-3 and ODDS representations. Partial products can be generated very fast in the XS-3 representation using the SD radix-10 PPG scheme: positive multiplicand multiples (0X, 1X, 2X, 3X, 4X, 5X) are precompiled in a carry-free way, while negative multiples are obtained by bit inversion of the positive ones. On the other VI. REFERENCES [1] A. Aswal, M. G. Perumal, and G. N. S. Prasanna, ―On basic financial decimal operations on binary machines,‖ IEEE Trans. Comput., vol. 61, no. 8, pp. 1084– 1096, Aug. 2012. [2] M. F. Cowlishaw, E. M. Schwarz, R. M. Smith, and C. F. Webb, ―A decimal floating point specification,‖ in Proc. 15th IEEE Symp. Comput. Arithmetic, Jun. 2001, pp. 147–154. [3] M. F. Cowlishaw, ―Decimal floating point: Algorism for computers,‖ in Proc. 16th IEEE Symp. Comput. Arithmetic, Jul. 2003, pp. [4] S. Carlough and E. Schwarz, Power6 decimal divide,‖ in Proc. 18th IEEE Symp. Appl.-Specific Syst., Arch., Process., Jul. 2007, pp. 128–133. [5] S. Carlough, S. Mueller, A. Collura, and M. Kroener, The IBM zEnterprise-196 decimal floating point accelerator,‖ in Proc. 20th IEEE Symp. Comput. Arithmetic, Jul. 2011, pp. 139–146. [6] L. Dadda, Multioperand parallel decimal adder: A mixed binary and BCD approach,‖ IEEE Trans. Comput., vol. 56, no. 10, pp. 1320–1328, Oct. 2007. [7]. Alvaro Vazques, Elisardo Antelo and Javier Bruguera (2014), ‘Fast Radix-10 Multiplication Using Redundant BCD Codes’, IEEE Vol. 63, No.8, pp.325-338. [8]. Carlough S and Schwarz E (2007), ‘Power Six Deci-mal Divide’, Proc. 18th IEEE Symp. on Application-Spe-cific Systems, Architectures, and Processors, Vol. 89, No. 8, pp. 28–133. [9]. Dadda L (2007), ‘Multioperand and Parallel Decimal Adder: A Mixed Binary and BCD Approach’, IEEE Transactions on Computers, Vol. 56, No. 10, pp. 1320–1328. [10]. Dadda L and Nannarelli A (2008), ‘A Variant of a Radix- 10 Combinational Multiplier’, IEEE Int. Symposium in Circuits and Systems, ISCAS 2008, Vol. 37, No. 2, pp. 3370– 3373. [11]. Erle M.A, Schwarz E.M and M. J. Schulte (2005), ‘Decimal Multiplication With Efficient Partial Product Generation’, Proc. 17th IEEE Symposium on Computer Arithmetic, Vol. 73, No. 4, pp. 21–28. [12]. Erle M.A and M. J. Schulte (2003), ‘Decimal Multiplication Via Carry-Save Addition’, Proc. IEEE Int. Conf. on Application-Specific Systems, Architectures, and Pro-cessors, Vol. 51, No.7, pp. 348–358. [13]. Gorgin S and Jaberipur G (2013), ’High Speed Paral-lel Decimal Multiplication with Redundant Internal En-codings’, IEEE Transactions on Computers, Vol.45, No 160, pp. 232249. International Journal of Scientific Engineering and Technology Research Volume.04, IssueNo.51, December-2015, Pages: 11060-11067 VUTLA APARNA, P. BALA MURALI KRISHNA [14]. Han L and Ko S (2013), ‘High Speed Parallel Decimal Multiplication with Redundant Internal Encodings’, IEEE Transactions on Computers, Vol. 62, No. 5 pp. 956–968. [15]. G.Jaberipur, and A. Kaivani (2009), ‘Improving the Speed of Parallel Decimal Multiplication’, IEEE Transactions on Computers, Vol. 58, No. 11, pp.39–52. [16]. Vazquez A,Antelo E and Montuschi P (2010), ’Improved Design of High-Performance Parallel Decimal Multipliers’, IEEE Transactions on Computers, Vol. 59, No. 5, pp. 679–693. International Journal of Scientific Engineering and Technology Research Volume.04, IssueNo.51, December-2015, Pages: 11060-11067
© Copyright 2026 Paperzz