Efficient Method of BCD Multiplication to Reduce Power and

ISSN 2319-8885
Vol.04,Issue.51,
December-2015,
Pages:11060-11067
www.ijsetr.com
Efficient Method of BCD Multiplication to Reduce Power and Delay Values
VUTLA APARNA1, P. BALA MURALI KRISHNA2
1
PG Scholar, Dept of ECE, Sri Mittapalli Institute of Technology for Women, Guntur, AP, India,
E-mail: [email protected].
2
Professor, Dept of ECE, Sri Mittapalli Institute of Technology for Women, Guntur, AP, India,
E-mail: [email protected].
Abstract: In this paper, We propose an algorithm and architecture of a BCD parallel multiplier that exploits some properties of
two different redundant BCD codes to speed up its computation. In this paper, we also develop new techniques to reduce the
latency and area of previous representative high performance implementations. The Partial products are generated in parallel
using a signed-digit radix-10 recoding of BCD multiplier with the digit set [-5, 5], and set of positive multiplicand multiples (0X,
1X, 2X, 3X, 4X, 5X) coded in excess-3 code(XS-3). This encoding has many advantages. First, it is self-complementing codes, so
that a negative multiplicand multiple can be obtained by just inverting bits of the corresponding positive one. The available
redundancy allows a fast and simple generation of multiplicand multiples in a carry free way. The partial products can be recoded
to the overloaded BCD representation (ODDS) by just adding a constant factor into the partial product reduction tree. To show the
advantages of our proposed architecture, we have synthesized a RTL model for 16 x 16-digit multiplications and performed a
comparative survey of the existing representative designs. We show that the proposed multiplier has an area improvement roughly
in the range 20-35 percent for similar target delays with respect to the fastest implementation.
Keywords: Parallel Multiplication, Overloaded BCD Representation, Redundant Excess-3 Code.
I. INTRODUCTION
Decimal fixed-point and floating-point formats are important
in financial, commercial, and user-oriented computing, where
conversion and rounding errors that are inherent to floatingpoint binary representations cannot be tolerated [1]. The new
IEEE 754-2008 Standard for Floating- Point Arithmetic [2],
which contains a format and specification for decimal
floating-point (DFP) arithmetic [1], [2], has encouraged a
significant amount of research in decimal hardware [3], [5] .
Furthermore, current IBM Power and z/System families of
microprocessors [5], and the Fujitsu Sparc X microprocessor
[6], oriented to servers and mainframes, already include fully
IEEE 754-2008 compliant decimal floating-point units
(DFPUs) for Decimal64 (16 precision digits) and Decimal128
(34 precision digits) formats. Since area and power
dissipation are critical design factors in state-of-the-art
DFPUs, multiplication and division are performed iteratively
by means of digit-by-digit algorithms [4], [5], and therefore
they present low performance. Moreover, the aggressive cycle
time of these processors puts an additional constraint on the
use of parallel techniques [4], [6], [10] for reducing the
latency of DFP multiplication in high-performance DFPUs.
Thus, efficient algorithms for accelerating DFP multiplication
should result in regular VLSI layouts that allow an aggressive
pipelining. Hardware implementations normally use BCD
instead of binary to manipulate decimal fixed-point operands
and integer significant of DFP numbers for easy conversion
between machine and user representations [1], [15].
BCD encodes a number X in decimal (non-redundant
radix-10) format, with each decimal digit Xi £ [0, 9]
represented in a 4-bit binary number system. However, BCD
is less efficient for encoding integers than binary, since codes
10 to 15 are unused. Moreover, the implementation of BCD
arithmetic has more complications than binary, which lead to
area and delay penalties in the resulting arithmetic units. A
variety of redundant decimal formats and arithmetic’s have
been proposed to improve the performance of BCD
multiplication. The BCD carry-save format [9] represents a
radix-10 operand using a BCD digit and a carry bit at each
decimal position. It is intended for carry-free accumulation of
BCD partial products using rows of BCD digit adders
arranged in linear [6], [9] or tree-like configurations [15].
Decimal signed-digit (SD) representations [10], [11], [14] rely
on a redundant digit set f {a, … 0, . . . a}, 5 ≤ a ≤9, to allow
decimal carry-free addition. The overloaded BCD (or
ODDS—overloaded decimal digit set) representation was
proposed to improve decimal multioperand addition [13], and
sequential [8] and parallel [12], [13] decimal multiplications.
In this code, each 4-bit binary value represents a redundant
radix-10 digit Xi £[0,15]. The ODDS presents interesting
properties for a fast and efficient hardware implementation of
decimal arithmetic:(1) it is a redundant decimal representation
so that it allows carry-free generation of both simple and
complex decimal multiples (2X, 3X, 4X, 5X, 6X,. . .) and
addition, (2) since digits are represented in the binary number
system, digit operations can be performed with binary
Copyright @ 2015 IJSETR. All rights reserved.
VUTLA APARNA, P. BALA MURALI KRISHNA
arithmetic, and (3) unlike BCD, there is no need to implement
SD radix- 10 digits. Each SD radix-10 digit controls a level
additional hardware to correct invalid 4-bit combinations. A
of 5:1 muxes , which selects a positive multiplicand [13]
disadvantage with respect to signed-digit and selfmultiple (0, X, 2X, 3X, 4X, 5X) coded in (4221). To obtain
complementing codes, is a slightly more complex
each partial product a level of XOR gates inverts the output
implementation of 9’s complement operation for negation of
bits of the 5:1 muxes when the sign of the corresponding SD
operands and subtraction.
radix-10 digit is negative. Before being reduced the d+ 1
partial product, coded in (4221), are aligned according to their
In this work, we focus on the improvement of parallel
decimal weights. Each p-digit column of the partial product
decimal multiplication by exploiting the redundancy of two
the decimal digit p: 2 CSA trees. The number of digits to be
decimal representations: the ODDS and the redundant BCD
reduced [5] for each column varies from p=d+1 to p= 2. Thus,
excess-3 (XS-3) representation, a self-complementing code
the d+ 1 partial product are reduced to two 2d digit operands
with the digit set [-3, 12]. We use a minimally redundant digit
S and H coded in (4221).
set for the recoding of the BCD multiplier digits, the signeddigit radix-10 recoding [30], that is, the recoded signed digits
are in the set f{-5,-4,-3,-2-1,0,1,2,3,4, 5}. For this digit set,
the main issue is to perform the * 3 multiple without long
carry-propagation (note that * 2 and * 5 are easy multiples for
decimal [3] and that * 4 is generated as two consecutive * 2
operations). By performing the reduction of partial products
coded in ODDS via binary carry-save arithmetic. Partial
product generation (PPG). By generating positive
multiplicand multiples coded in XS-3 in a carry free form. An
advantage of the XS-3 representation over non-redundant
decimal codes (BCD and 4221/ 5211 [3]) is that all the
interesting multiples for decimal partial product generation,
including the 3X multiple, can be implemented in constant
time with an equivalent delay of about three XOR gate levels.
Moreover, since XS-3 is a self-complementing code, the 9’s
complement of a positive multiple can be obtained by just
inverting its bits as in binary. Partial products can be recoded
from the XS-3 representation to the ODDS representation by
just adding a constant factor into the partial product reduction
tree. The resultant partial product reduction tree is
implemented using regular structures of binary carry-save
Fig.1. Binary P: 2 CSA Tree.
adders or compressors. The 4-bit binary encoding of ODDS
operands allows a more efficient mapping of decimal
The nine’s complement of a positive decimal operand is
algorithms into binary techniques. By contrast, signed-digit
given
by this implementation of leads to a complex impleradix-10 and BCD carry-save redundant representations
mentation, since the Zi digits of the multiples generated may
require specific radix-10 digit adders [1], [7], [16]. The rest of
take values higher than 9. A simple implementation is
this paper is organized as follows. In Section 2, preliminaries
obtained by observing that the excess-3 of the nine’s
are discussed. Section 3 explains the proposed method.
complement of an operand is equal to the bit-complement of
Section 4 shows the experimental results. Finally, Section 5
the operand coded in excess-3 [5]. The final product is a 2dconcludes this paper.
digit BCD word given by P=2H +S. Before being added, S
and H need to be processed. S is recoded from (4221) to BCD
II. RADIX-10 PARALLEL DECIMAL MULTIPLIER
excess-6. The H × 2 multiplication is performed in parallel
The Radix-10 architecture for d-digit BCD decimal fixedwith the recoding of S. This ×2 blocks uses a (4221) to (5421)
point parallel multiplication [3] is based on the techniques for
digit recoder and a 1-bit wired left shift to obtain the operand
partial product generation and reduction respectively. The
2H coded in BCD shows in Fig.3. For the final BCD carry
code (4221) and (5211) is used instead of BCD to represent
propagate addition [4] uses a quaternary tree (Q-T) adder
the partial product is the main feature of this architecture.
based on conditional speculative decimal addition. It has low
This improves the reduction of decimal partial product [7]
latency and requires less hardware than other alternatives. C.
with respect to other proposals, in terms of latency and area is
Partial Product Reduction The partial product arrays
expected. The architecture of the d-digit SD radix -10
generated by the SD radix-10 encoding each column of p
multiplier consists of the following stages, Generation of
digits is reduced to two digits by means of a decimal digit p:2
decimal partial products coded in (4221), reduction of partial
CSA tree shown in Fig.1. The decimal carries are passed beproducts [9] and a final BCD carry propagate addition. B.
tween adjacent digit columns and decimal coding method
Partial Product Generation The generation of the d+1 partial
used for decimal carry-save addition [6].
product is performed by an encoding of the multiplier into d
International Journal of Scientific Engineering and Technology Research
Volume.04, IssueNo.51, December-2015, Pages: 11060-11067
Efficient Method of BCD Multiplication to Reduce Power and Delay Values
carry bit at each decimal multiplication area and power
dissipation are critical design factors in DFPU. Multiplication
and division are performed iteratively by means of digit-bydigit algorithms for reducing the latency of DFP
multiplication in high performance DFPUs.
Fig.2. Scheme of x2 for BCD-4221.
To perform the decimal coding instead of BCD for an
efficient implementation of decimal carry-save addition [15]
with binary CSAs or full adders use (4221) and (5211). The
use of these codes avoids the need for decimal corrections and
need to focus on the ×2 decimal multiplication shown in
Fig.2. The Decimal p:2 CSA Trees for Digits Coded in (4221)
Operands Long carry propagation because of that area and
delay is more in this system. The proposed method is a new
delay measurement technique for the small-delay defect
detection technique.
III. DECIMAL PARTIAL PRODUCT GENERATION
A. SD Radix-10 Architecture
The algorithm and architecture of a BCD parallel multiplier that exploits some properties of two different redundant
BCD codes to speed up its computations are redundant BCD
excess-3 code (XS-3) and the overloaded BCD representation
(ODDS). Proposed techniques are developed to reduce
significantly the latency and area of previous representative
high performance implementations. A. Partial Product
Generation Partial products are generated in parallel using a
signed-digit radix-10 recoding of the BCD multiplier with the
digit set [-5,5] and a set of positive multiplicand multiples
(0X, 1X, 2X, 3X, 4X, 5X) coded in XS-3 encoding has
several advantages. First, it is a self-complementing code the
negative multiplicand multiple can be obtained by just
inverting the bits of the corresponding positive one. The
available redundancy allows a fast and simple generation of
multiplicand multiples in a carry-free way. Finally, the partial
products can be recoded to the ODDS representation by just
adding a constant factor into the partial product reduction
tree. The ODDS uses a similar 4-bit binary encoding as nonredundant BCD techniques explains binary carry-save adders
and compressor trees, can be adapted efficiently to perform
decimal operations. A variety of redundant decimal formats
and arithmetic have been proposed to improve the
performance of BCD multiplication. The BCD carry-save
format represents a radix-10 operand using a BCD digit and a
Fig.3. Combinational SD Radix-10 Architecture.
B. Sign Digit Radix-10 Generation
The partial product generation stage comprises the recoding
of the multiplier to a SD radix-10 representation, the
calculation of the multiplicand multiples in XS-3 code and the
generation of the ODDS partial products. The SD radix-10
encoding produces d SD radix- 10 digits Ybk [-5, 5], with k =
0,. . . , d - 1, Yd-1 being the MSD (most significant digit) of
the multiplier shown in Fig.4. Each digit Ybk is represented
with a 5-bit hot-one code (Y1k, Y2k, Y3k, Y4k, Y5k) to select
the appropriate multiple {-1X,0X,1X, . . , 5X} with a 5:1 mux
and a sign bit Ysk that controls the negation of the selected
multiple shown in fig.4. The negative multiples are obtained
by ten’s complementing the positive ones. This is equivalent
to taking the nine’s complement of the positive multiple and
then adding 1. As we have shown in Section 2, the nine’s
complement can be obtained simply by bit inversion. This
needs the positive multiplicand multiples to be coded in XS3, with digits in [-3,12]. The d least significant partial
products PP[d-1], . . , PP[0] are generated from digits Yb k by
using a set of 5:1 muxes. The xor gates at the output of the
mux invert the multiplicand multiple, to obtain its 9’s
International Journal of Scientific Engineering and Technology Research
Volume.04, IssueNo.51, December-2015, Pages: 11060-11067
VUTLA APARNA, P. BALA MURALI KRISHNA
complement, if the SD radix-10 digit is negative (Ysk = 1). On
the other hand, if the signals (Y1k, Y2k, Y3k, Y4k, Y5k) are all
(1)
zero then PP[k] = 0, but it has to be coded in XS-3 bit
encoding 0011. partial product signs are encoded into their
(2)
MSDs. The generation of the most significant partial product
PP[d] is described and only depends on Ysd-1. C. Partial
Wmi has been split into two parts, the 3 most significant bits of
Product Reduction PPR tree consists of three parts a regular
Wg[0]i+1 and least-significant bit, Wmi,0. Digit Wti is obtained
binary CSA tree to compute an estimation of the decimal
by the concatenation of the most-significant bit of Wg[0]i+1*
partial product sum in a binary carry-save form (S, C).
2 and LSB of Wg[0]i, A row of decimal 3:2 digit compressors
is used to reduce the 3-op-erand partial product sum (G, Z,
Wz) to two BCD operands (A, B), with A represented in
excess-6. E. Decimal 128 Implementation The maximum
height of the partial product array by the 34 x34-digit BCD
multiplier is h = 35. The optimal value for parameter m is m =
31. There-fore, the addition of these carries has been split into
two parts. First, a 31-bit counter evaluates Wmi, the 5-bit sum
of the 31 fastest carries. Then, the two slowest carries,
Ci+1[31] and Ci+1[32], are added to Wmi into a second 5-bit
counter. digits Si, Ci, Wt[0]i, Wt[1]i, are reduced to two digits
Gi;Zi [0, 15] using a 4-bit binary 4:2 CSA. Finally, the three
digits Gi, Zi, Wzi are reduced to two excess-6 BCD digits Ai
and Bi by using the decimal digit 3:2 compressor shown in
fig.3. It reduces the overall critical path latency, area and
improving speed of parallel decimal multiplication to avoid
long carry-propagations which Reduces the number of partial
products generation.
Fig.4. SD radix-10 Generation of Partial Product Digit.
A sum correction block to count the carries generated
between the digit columns and a decimal digit 3:2 compressor which increments the carry-save sum according to the
carries count to obtain the final double-word product (A,B), A
being represented with excess-6 BCD digits and B being
represented with BCD digits. The PPR tree can be viewed as
adjacent columns of h ODDS digits each, h being the column
height see fig.4 and h _ d + 1. Fi-nally addition of digits Gi,
Zi, Wzi of the column, Gi + Zi + Wzi[0, 45]. We have designed
a decimal 3:2 digit compressor that reduces digits Wzi, Gi and
Zi to two digits Ai, Bi. The final BCD product by using a
single BCD carry propagate addition P = A+B, which is the
last step in the multiplication. It required that Ai + Bi [0, 18]
to reduce the delay of the final BCD carry-propagate adder
operand A is obtained in excess-6, so that we compute [Ai] =
Ai + e in excess e = 6. The output digits sum [Ai] + Bi [6,24]. D.
Decimal 64 Implementation The maximum number of carries
transferred between adjacent columns of the binary 17:2 CSA
tree is 15. These carries are labeled Ci+1[0]. . . ,Ci+1[14]
(output carries) and Ci[0] , . . ,Ci[14] (input carries). The
binary 17:2 CSA tree is built of a first level composed of a 9:2
compressor and a 8:2 compressors, and a second level
composed of a 4:2 compressor. To balance the delay of the
17:2 CSA tree and the bit counter, m = 14 has been chosen.
The 14-bit counter produces the 4-bit digit Wmi. The
computation of Wmi * 6 deserves a more detailed description.
The 4-bit digit Wmi = Wi,3 ,Wmi,2 ,Wmi,1, Wmi,0, with Wmi,j being
the bits of the digit, is conveniently represented as,
C. Generation of Multiplicand Multiples
All the required decimal multiplicand multiples, except the
3X multiple, are obtained in a few levels of combinational
logic using different digit decoders and performing different
fixed m-bit left shifts (Lmshift) in the bit-vector
representation of operands. The structure of these digit
decoders is discussed Fig.5 shows the block diagram for the
generation of the positive multiplicand multiples {X, 2X, 3X,
4X, 5X} for the SD radix-10 recoding. All these multiples are
coded in (4221). The X BCD multiplicand is easily recoded to
(4221) using the logical expressions
Fig.5. Generation of multiplicand multiples for SD radix–
10.
International Journal of Scientific Engineering and Technology Research
Volume.04, IssueNo.51, December-2015, Pages: 11060-11067
Efficient Method of BCD Multiplication to Reduce Power and Delay Values
Multiple 2X: Each BCD digit is first recoded to the (5421)
each pipelined stage (Fig.4), carrying out σ additions of
decimal coding shown in Table 1 (the mapping is unique). An
(M+3+M/#blocks) digits. The proposed addition is based on
L1shift is performed to the recoded shows figure5
10´s complement BCD numbers, and the carry-chains
multiplicand, obtaining the 2X multiple in BCD. Then, the 2X
techniques to carry out the full design. The carry-chain
BCD multiple is recoded to (4221) using Expressions [2].
addition consists in Computing beforehand all propagating
carry and carry generating conditions, in order to reduce the
Multiple 3X: It is evaluated by a carry-propagate addition of
overall execution time. The implemented adder utilizes an
BCD multiples X and 2X in a d-digit BCD adder. The BCD
area of 10x (M+3+M/#blocks) LUTs. It has been considered
sum digits are recoded to (4221) as indicated by Expression
the high performance adder proposed. Fig. 6 illustrates the
(2). The latency of the partial product generation for the SD
shape of the partial product array, particularizing for d =16.
radix-10 scheme is constrained by the generation of 3X.
Multiple 4X: It is obtained as 2X × 2, where the 2X multiple
is coded in (4221). The second ×2 operation is implemented
as a digit recoding from (4221) to code (5211), followed by
an L1shift. The design of the (4221) to (5211) digit recorders
is described in figure5. The ×2 operation, with input operands
coded in (4221) or (5211), is also implemented in the decimal
CSA trees used for partial product reduction.
Multiple 5X: It is obtained by a simple L3shift of the (4221)
recoded multiplicand, with resultant digits coded in (5211).
Then, a digit recoding from (5211) to (4221) is performed
Fig. 4 shows an example of this operation.
1. digit recoding of the BCD multiplicand digits Xi into a
decimal carry 0 ≤ T i ≤ Tmax and a digit -3 ≤ Di ≤ 12-Tmax
such as
(3)
being Tmax the maximum possible value for the decimal carry.
2. The decimal carries transferred between adjacent digits are
assimilated obtaining the correct 4-bit representation of XS-3
digits Nxi.
Fig.6. Decimal partial product array generated for d =16.
C. Partial Product Reduction
The partial product arrays generated by the SD radix-10
encoding. Each column of p digits is reduced to two digits by
means of a decimal digit p:2 CSA tree. Also, decimal carries
are passed between adjacent digit columns. we present the set
of preferred decimal codings and the method for decimal
carry-save addition. We propose the use of the (4221) and
(5211) decimal codings instead of BCD for an efficient
implementation of decimal carry-save addition with binary
CSAs or full adders. The use of these codes avoids the need
for decimal corrections, so we only need to focus on the ×2
decimal multiplications. The implementation of decimal 3:2
CSAs for the proposed codings is also described in fig.6. We
present the Decimal p:2 CSA Trees for Digits Coded in
(4221) Operands in fig.6 reduced with the area-optimized or
delay-optimized decimal 17:2 CSA trees presented in fig.6.
Decimal Addition Design: After finishing the previous stage
and as soon as its outputs are computed, a (8421) decimal
addition is carried out. First of all, an aligned process on the
vectors D0 and D1 is developed, just before both of the mare
recoded in to (8421). As it is pointed out at the beginning of
this figure, various levels of pipeline=log2 (M) +1 are
implemented. Several decimal Additions are processed at
Fig.7. High-level architecture of the proposed decimal
PPR tree (h inputs, 1-digit column).
International Journal of Scientific Engineering and Technology Research
Volume.04, IssueNo.51, December-2015, Pages: 11060-11067
VUTLA APARNA, P. BALA MURALI KRISHNA
Decimal Partial Product Reduction: The PPR tree consists
These conditional sums correspond to each one of the carry
of three parts: [1] a regular binary CSA tree to compute an
input values. If the conditional carry out from a digit is one,
estimation of the decimal partial product sum in a binary
the digit adder performs a -6 subtraction. The selection of the
carry-save form (S, C), [2] a sum correction block to count
appropriate conditional BCD digit sums is implemented with
the carries generated between the digit columns, and [3] a
a final level of 2 : 1 multiplexers. To design the carry prefix
decimal digit 3:2 compressor which increments the carry-save
tree we analyzed the signal arrival profile from the PPRT tree,
sum according to the carries count to obtain the final doubleand considered the use of different prefix tree topologies to
word product (A,B), A being represented with excess-6 BCD
optimize the area for the minimum delay adder.
digits and B being represented with BCD digits. The PPR tree
can be viewed as adjacent columns of h ODDS digits each, h
IV. RESULTS & DISCUSSION
being the column height, and h <d+1. Fig. 7 shows the highIn this section, the proposed method synthesis and simulation
level architecture of a column of the PPR tree (the ith column)
results are reported.
with h ODDS digits in [0, 15] (4 bits per digit). Each digit
A. Synthesis Result
column of the binary CSA tree (the gray colored box in Fig.7)
Synthesis Report: Synthesis report shows the design
reduces the h input digits and n cin input carry bits, transferred
summary including number of LUT’s, I/O buffers, Slice
from the previous column of the binary CSA tree, to two
registers, flip flop. area and delay estimates and a comparison
digits, Si, Ci, with weight 10i. Moreover, a group of ncout
with other representative decimal implementations that show
carry outputs. The digit columns of the binary CSA tree are
the potential advantages of our proposal show table1 and
implemented efficiently using 4-bit 3:2, 4:2 and higher order
table2.
compressors made of full adders. These compressors take
TABLE I: Power & Delay for Proposed Method
advantage of the delay difference of the inputs and of the sum
and carry outputs of the full adders, allowing significant delay
reductions. Thus, there is a difference between the value of
the carry outs generated at the i-column and the value of the
carries transferred to the (i+1)-column. This difference, T, is
computed in the sum correction block of every digit column
and added to the partial product sum (S, C) in the decimal
CSA.
(4)
The contribution of the column i to the sum correction
term T is given by
(5)
The comparison results in the above table shows that the
proposed architecture is 89.7% cost effective and 66.66%
delay efficient. Also these circuits are simulated in Xilinx for
Automotive CoolRunner2 and comparison results are as
shown below.
TABLE II: Proposed Systems Compared With Existing
Methods
Therefore, the sum correction is given by
(6)
Consequently, the sum correction block evaluates Wix6.
This module is composed of a m-bit binary counter and a x6
operator. A straightforward implementation would use m =
ncout and a decomposition of the x6 operator into x5 and x1
(both without long carry propagations), and then a four to two
decimal reduction to add the correction to the PPR tree result.
D. Final Conversion to BCD
The selected architecture is a 2d-digit hybrid parallel
prefix/carry-select adder, the BCD Quaternary Tree adder.
The delay of this adder is slightly higher to the delay of a
binary adder of 8d bits with a similar topology. The decimal
carries are computed using a carry prefix tree, while two
conditional BCD digit sums are computed out of the critical
path using 4-bit digit adders which implements
(7)
B. Simulation Result
The proposed system is implemented by Xilinx software
and the simulation waveforms of each module are shown
below.
Proposed Method For BCD Codes Results: In the fig.8
shows that in This multiplication of the BCD codes we are
using the inputs and outputs are the a, b, clk1,c1k2 and
function f. We are given the any binary values are given to the
inputs such as a, b, and multiply the input values and gives
the output in the function f.
International Journal of Scientific Engineering and Technology Research
Volume.04, IssueNo.51, December-2015, Pages: 11060-11067
Efficient Method of BCD Multiplication to Reduce Power and Delay Values
hand, recoding of XS-3 partial products to the ODDS
representation is straightforward. The ODDS representation
uses the redundant digit-set [0, 15] and a 4-bit binary
encoding (BCD encoding), which allows the use of a binary
carry-save adder tree to perform partial product reduction in a
very efficient way. We have presented architectures for IEEE754 formats, Decimal64 (16 precision digits) and Decimal128
(34 precision digits). The area and delay figures estimated
from both a theoretical model and synthesis show that our
BCD multiplier presents 20-35 percent less area than other
designs for a given target delay.
Fig.8. Simulation waveform of the BCD codes.
Final Result for BCD Codes: In the fig.8 and fig.9 shows
that in This multiplication of the BCD codes we are using the
inputs and outputs are the a, b, clk1,c1k2 and function f. We
are given the any binary values or also we are hexadecimal
values are given to the inputs such as a, b, and multiply the
input values and gives the output in the function f.
Fig.9. Simulation waveform of the BCD codes final results.
V. CONCLUSION
In this paper we have presented the algorithm and
architecture of a new BCD parallel multiplier. The
improvements of the proposed architecture rely on the use of
certain redundant BCD codes, the XS-3 and ODDS
representations. Partial products can be generated very fast in
the XS-3 representation using the SD radix-10 PPG scheme:
positive multiplicand multiples (0X, 1X, 2X, 3X, 4X, 5X) are
precompiled in a carry-free way, while negative multiples are
obtained by bit inversion of the positive ones. On the other
VI. REFERENCES
[1] A. Aswal, M. G. Perumal, and G. N. S. Prasanna, ―On
basic financial decimal operations on binary machines,‖ IEEE
Trans. Comput., vol. 61, no. 8, pp. 1084– 1096, Aug. 2012.
[2] M. F. Cowlishaw, E. M. Schwarz, R. M. Smith, and C. F.
Webb, ―A decimal floating point specification,‖ in Proc. 15th
IEEE Symp. Comput. Arithmetic, Jun. 2001, pp. 147–154.
[3] M. F. Cowlishaw, ―Decimal floating point: Algorism for
computers,‖ in Proc. 16th IEEE Symp. Comput. Arithmetic,
Jul. 2003, pp.
[4] S. Carlough and E. Schwarz, Power6 decimal divide,‖ in
Proc. 18th IEEE Symp. Appl.-Specific Syst., Arch., Process.,
Jul. 2007, pp. 128–133.
[5] S. Carlough, S. Mueller, A. Collura, and M. Kroener, The
IBM zEnterprise-196 decimal floating point accelerator,‖ in
Proc. 20th IEEE Symp. Comput. Arithmetic, Jul. 2011, pp.
139–146.
[6] L. Dadda, Multioperand parallel decimal adder: A mixed
binary and BCD approach,‖ IEEE Trans. Comput., vol. 56, no.
10, pp. 1320–1328, Oct. 2007.
[7]. Alvaro Vazques, Elisardo Antelo and Javier Bruguera
(2014), ‘Fast Radix-10 Multiplication Using Redundant BCD
Codes’, IEEE Vol. 63, No.8, pp.325-338.
[8]. Carlough S and Schwarz E (2007), ‘Power Six Deci-mal
Divide’, Proc. 18th IEEE Symp. on Application-Spe-cific
Systems, Architectures, and Processors, Vol. 89, No. 8, pp.
28–133.
[9]. Dadda L (2007), ‘Multioperand and Parallel Decimal
Adder: A Mixed Binary and BCD Approach’, IEEE Transactions on Computers, Vol. 56, No. 10, pp. 1320–1328.
[10]. Dadda L and Nannarelli A (2008), ‘A Variant of a Radix- 10 Combinational Multiplier’, IEEE Int. Symposium in
Circuits and Systems, ISCAS 2008, Vol. 37, No. 2, pp. 3370–
3373.
[11]. Erle M.A, Schwarz E.M and M. J. Schulte (2005),
‘Decimal Multiplication With Efficient Partial Product
Generation’, Proc. 17th IEEE Symposium on Computer
Arithmetic, Vol. 73, No. 4, pp. 21–28.
[12]. Erle M.A and M. J. Schulte (2003), ‘Decimal Multiplication Via Carry-Save Addition’, Proc. IEEE Int. Conf. on
Application-Specific Systems, Architectures, and Pro-cessors,
Vol. 51, No.7, pp. 348–358.
[13]. Gorgin S and Jaberipur G (2013), ’High Speed Paral-lel
Decimal Multiplication with Redundant Internal En-codings’,
IEEE Transactions on Computers, Vol.45, No 160, pp. 232249.
International Journal of Scientific Engineering and Technology Research
Volume.04, IssueNo.51, December-2015, Pages: 11060-11067
VUTLA APARNA, P. BALA MURALI KRISHNA
[14]. Han L and Ko S (2013), ‘High Speed Parallel Decimal
Multiplication with Redundant Internal Encodings’, IEEE
Transactions on Computers, Vol. 62, No. 5 pp. 956–968.
[15]. G.Jaberipur, and A. Kaivani (2009), ‘Improving the
Speed of Parallel Decimal Multiplication’, IEEE Transactions on Computers, Vol. 58, No. 11, pp.39–52.
[16]. Vazquez A,Antelo E and Montuschi P (2010), ’Improved Design of High-Performance Parallel Decimal
Multipliers’, IEEE Transactions on Computers, Vol. 59, No.
5, pp. 679–693.
International Journal of Scientific Engineering and Technology Research
Volume.04, IssueNo.51, December-2015, Pages: 11060-11067