A Power Saving Multiplication Algorithm

International Journal of Applied Engineering Research ISSN 0973-4562 Volume 11, Number 9(2016) pp 6200-6203
© Research India Publications. http://www.ripublication.com
S. Subha*
SITE, VIT, Vellore, Tamil Nadu, India.
R. Sakthivel
SENSE, VIT, Vellore, Tamil Nadu, India.
Abstract
Multiplication is widely used in many applications. A number
of multiplication algorithms, such as the Booth algorithm and
radix-n algorithms, have been proposed in the literature. This
paper proposes a multiplication algorithm for integers in sign
magnitude notation. The magnitudes are the operands for the
multiplication. The method divides the multiplier and
multiplicand into slices of two bits. The partial products for all
pairs of slices are calculated and added with the appropriate
alignment to give the result. The proposed algorithm is
synthesized using the Xilinx tool. For the ASIC configurations,
an improvement in area utilization and in power consumption
is observed, with degradation in execution time. For the FPGA
configuration, degradation in area utilization and in execution
time is observed, with no change in power consumption. The
proposed model can be extended to two's complement
multiplication and floating point multiplication.
Keywords: Area Improvement, Bit Slicing, Multiplication,
Power Improvement, Sign Magnitude Notation
Introduction
Multiplication is a widely used operation in computers. A
number of multiplication algorithms have been proposed in the
literature. Some of them are repeated addition, the radix-n
algorithm with Booth's algorithm as a special case, and array
multipliers. Some algorithms in parallel processing have also
been proposed for multiplication. In all these algorithms,
forming the partial products requires addition, shifting or
multiplication of bits.
Fast multiplication algorithms and array multipliers are
discussed in [2]. The authors in [5] suggest a rule to modify
the Booth algorithm for any radix (> 2). The modification is
suggested for multipliers of any size with the radix being any
power of two. The author in [6] proposes a method to correct
the Booth algorithm so that it can be implemented for any
operand length and any radix that is a power of two. The
authors in [7] propose a method to reduce power consumption
by choosing the operand with the smaller dynamic range for
Booth encoding. In [3], a low power two's complement
multiplier is developed by minimizing the switching activities
of partial products using the radix-4 Booth algorithm.
Encoding is used in that method. The authors in [4] propose a
multiplication algorithm that divides the operands into four
parts. Each multiplication is computed independently and the
results are added. The algorithm proposed in this paper
divides each operand into two parts. Techniques for low power
design are proposed in [1].
This paper proposes an algorithm based on the multiplication
of two-bit slices, where each product is at most four bits wide.
The inputs are in sign magnitude form. The magnitudes of the
numbers are considered for computation. The multiplicand and
the multiplier are divided into slices of two bits each. The
partial products for all combinations of input slices are
calculated. Based on the input size, the partial products are
added to give the result. The proposed algorithm is simulated
with the Xilinx tool. Three configurations are used for analysis.
The trad configuration is the in-built multiplication operation.
The p_mult configuration uses in-built multiplication to
generate the partial products. The p_mux configuration
generates the partial products by bit inspection.
The following is observed.
1. A 17.5% improvement in area utilization is observed for
ASIC 90nm for the proposed model with multiplication of
the slices (p_mult) compared with the existing
multiplication algorithm (trad), with no change for the
proposed model using multiplexers for multiplication
(p_mux). An improvement of 20% for p_mult and 1.2% for
p_mux is found for ASIC 45nm. The number of slices used
in FPGA is increased by 20% for p_mult and 100% for
p_mux.
2. A 31% improvement in power for p_mult over trad and a
9% improvement for p_mux over trad are observed for
ASIC 90nm. A power improvement of 31% for p_mult over
trad and 10% for p_mux over trad is observed for ASIC
45nm. The power consumed in the FPGA configuration is
unchanged for all configurations.
3. The p_mult configuration increased the time by 47% over
trad and p_mux by 60% over trad for ASIC 90nm. A
timing degradation of 48% for p_mult and 64% for
p_mux over the trad configuration is observed for ASIC
45nm. The FPGA configuration has a time degradation of
57% for both p_mult and p_mux over the trad configuration.
The strategies suggested in [1] can be examined for further
power improvement.
The rest of the paper is organized as follows. Section 2 gives
the mathematical background for the proposed model, section
3 the proposed algorithm, section 4 the simulations and section
5 the conclusion, followed by references.
Mathematical Background
Consider the product of two slices of two bits each. There are
four possible values for each slice. The products are given in
Table 1.
Table 1: Bit Multiplication Table
Product   00    01    10    11
00        00    00    00    00
01        00    01    10    11
10        00    10    100   110
11        00    11    110   1001
Consider the product of two 2n-bit numbers M and R. Divide
each operand into slices of two bits. The partial product of
each pair of slices can then be read from Table 1. The partial
products are added with the appropriate alignment to obtain
the result.
The number of slices in each of M and R is n. Each slice of M
is paired with each slice of R, so there are n^2 partial products
to compute. They can be obtained from Table 1. The partial
products are added to give the final result. On average there
are (n+1)(n-1) additions to be performed.
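The slicing and table lookup described above can be sketched in Python. This is only an illustrative sketch, not the authors' implementation: the names PRODUCT, slices and partial_products are assumptions introduced here, and the operands are treated as unsigned magnitudes. With n slices per operand, pairing every slice of M with every slice of R gives n*n lookups, and summing them needs one addition fewer, which equals (n+1)(n-1).

```python
# Table 1 as a lookup: PRODUCT[a][b] is the product of two 2-bit slices.
PRODUCT = [
    [0b00, 0b00, 0b00, 0b00],
    [0b00, 0b01, 0b10, 0b11],
    [0b00, 0b10, 0b100, 0b110],
    [0b00, 0b11, 0b110, 0b1001],
]


def slices(value, n):
    """Split a 2n-bit magnitude into n two-bit slices, least significant first."""
    return [(value >> (2 * i)) & 0b11 for i in range(n)]


def partial_products(m, r, n):
    """Map (i, j) to the table product of slice i of m and slice j of r."""
    ms, rs = slices(m, n), slices(r, n)
    return {(i, j): PRODUCT[ms[i]][rs[j]] for i in range(n) for j in range(n)}


# Four-bit operands have n = 2 slices each (hypothetical sample inputs).
pps = partial_products(0b1010, 0b1100, 2)
count = len(pps)        # n * n partial products
additions = count - 1   # one addition fewer than the number of terms
```

For n = 2 the sketch yields four partial products and three additions.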
Table 2: Area Utilisation Table
Configuration                                        Traditional     Proposed_multiply   Proposed_mux
ASIC TSMC 90nm                                       222             183                 222
ASIC TSMC 45nm                                       83              66                  82
FPGA (Virtex5 XC5CLX20T-2 FF 323), occupied slices   5 out of 3120   4 out of 3120       10 out of 3120
The improvement in area utilization is shown in Fig. 1.
Table 3: Power Report
Configuration                                        Traditional     Proposed_multiply   Proposed_mux
ASIC TSMC 90nm                                       11792.647 nW    8056.113 nW         10681.233 nW
ASIC TSMC 45nm                                       4464.628 nW     3051.049 nW         4015.667 nW
FPGA (Virtex5 XC5CLX20T-2 FF 323)                    321 mW          321 mW              321 mW
Table 4: Timing Analysis Table
Configuration                                        Traditional     Proposed_multiply   Proposed_mux
ASIC TSMC 90nm                                       704 ps          1039 ps             1129 ps
ASIC TSMC 45nm                                       1186 ps         1765 ps             1946 ps
FPGA (Virtex5 XC5CLX20T-2 FF 323)                    325.975 ns      513.395 ns          513.395 ns
Proposed Algorithm
Consider the multiplication of M and R. Let both be n bits
wide, n being even. The algorithm for the multiplication of two
four-bit numbers is given below.
1. Start
2. Divide M and R into slices of two bits each. As M and
R are four bits wide, M is divided into m1 and
m0 and R is sliced into r1 and r0.
3. Calculate the following
pp0 = r0*m0
pp1 = r0*m1
pp2 = r1*m0
pp3 = r1*m1
4. Let the result be eight bits, res[7:0]. Calculate the result
as given in steps 5 to 9.
5. res[1:0] = pp0[1:0];
6. temp1 = $signed(pp1) + $signed(pp2)+pp0[3:2];
7. res[3:2] = temp1[1:0];
8. temp2 = $signed(pp3) + temp1[4:2];
9. res[7:4] = temp2;
10. Stop
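Steps 2 to 9 of the four-bit algorithm can be mirrored in a short Python sketch. It is a hedged illustration and not the authors' Verilog: the function name multiply_4bit and the intermediate names res_1_0 and res_3_2 are assumptions made here, the operands are taken as unsigned magnitudes, and the bit selects such as pp0[3:2] are expressed with shifts and masks.

```python
def multiply_4bit(m, r):
    """Multiply two 4-bit magnitudes following steps 2 to 9 of the algorithm."""
    # Step 2: slice each operand into two 2-bit pieces.
    m0, m1 = m & 0b11, (m >> 2) & 0b11
    r0, r1 = r & 0b11, (r >> 2) & 0b11
    # Step 3: the four partial products (each at most four bits wide).
    pp0, pp1, pp2, pp3 = r0 * m0, r0 * m1, r1 * m0, r1 * m1
    res_1_0 = pp0 & 0b11                 # step 5: res[1:0] = pp0[1:0]
    temp1 = pp1 + pp2 + (pp0 >> 2)       # step 6: add pp0[3:2] as carry-in
    res_3_2 = temp1 & 0b11               # step 7: res[3:2] = temp1[1:0]
    temp2 = pp3 + (temp1 >> 2)           # step 8: add temp1[4:2]
    # Step 9: temp2 forms res[7:4]; assemble the eight-bit result.
    return (temp2 << 4) | (res_3_2 << 2) | res_1_0
```

Calling multiply_4bit(0b1010, 0b1100) follows the same path as the paper's worked example for M=1010 and R=1100. Only three additions appear in the sketch, two in step 6 and one in step 8.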
The algorithm can be extended to inputs of arbitrary length,
provided that the length is even. For multiplying two four-bit
numbers, there are four multiplications and three additions.
The multiplication results can be determined directly from the
input slices, so the actual multiplication steps are avoided.
Only the addition steps are involved in the computation.
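One possible way to extend the scheme to any even operand width is sketched below in Python. The paper gives only the four-bit listing, so this generalisation is an assumption: each table product is shifted by the combined position of its two slices and accumulated, rather than propagating carries step by step as in the four-bit algorithm. The names TABLE and multiply_sliced are illustrative.

```python
# Lookup table for 2-bit slice products (same values as Table 1),
# built once so that no multiplication happens during the main loop.
TABLE = [[a * b for b in range(4)] for a in range(4)]


def multiply_sliced(m, r, width):
    """Multiply two `width`-bit magnitudes (width even) using 2-bit slices.

    Returns the product and the number of table lookups performed.
    """
    if width % 2:
        raise ValueError("width must be even")
    n = width // 2                       # slices per operand
    result = 0
    lookups = 0
    for i in range(n):
        mi = (m >> (2 * i)) & 0b11       # slice i of the multiplicand
        for j in range(n):
            rj = (r >> (2 * j)) & 0b11   # slice j of the multiplier
            # Slice pair (i, j) carries weight 4**(i + j), i.e. a left
            # shift by 2 * (i + j) bits.
            result += TABLE[mi][rj] << (2 * (i + j))
            lookups += 1
    return result, lookups
```

For eight-bit inputs the sketch performs sixteen lookups, for example when called as multiply_sliced(200, 123, 8).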
Simulations
The proposed model was simulated on the Xilinx ISE tool. Code
in Verilog was written to simulate the product of two four-bit
numbers. The code was compiled and synthesized. The area,
power consumption and execution time were calculated. The
multiplication algorithm supported by the software is called
trad. The algorithm that computes the partial products using
the in-built multiplication is called p_mult.
Code in Verilog was also written to generate the partial
products by bit inspection. This configuration is called p_mux
in the discussion that follows. The results obtained are shown
in Tables 2 to 4 and Figs. 1 to 3.
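The paper does not list the Verilog used for p_mux, so the following Python sketch only illustrates what generating a partial product by bit inspection could look like. It is an assumption, not the authors' code: the function name pp_by_inspection is hypothetical, and the selection on the multiplier slice stands in for a multiplexer.

```python
def pp_by_inspection(m_slice, r_slice):
    """Select the product of two 2-bit slices without a multiply operation."""
    if r_slice == 0b00:
        return 0b0000                    # anything times zero
    if r_slice == 0b01:
        return m_slice                   # pass the multiplicand slice through
    if r_slice == 0b10:
        return m_slice << 1              # doubling is a one-bit left shift
    # r_slice == 0b11: pick from the four possible outputs (last row of Table 1).
    return (0b0000, 0b0011, 0b0110, 0b1001)[m_slice]
```

Every branch reproduces the corresponding entry of Table 1, so the function can replace the multiplications in step 3 of the algorithm.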
Figure 1: Area Utilization (bar chart of area by configuration type: trad, p_mux, p_mult)
As seen from Fig. 1, a 17.5% improvement in area utilization is
observed for ASIC 90nm for the proposed model with
multiplication of the slices (p_mult) compared with the
existing multiplication algorithm (trad), with no change for the
proposed model using multiplexers for multiplication (p_mux).
An improvement of 20% for p_mult and 1.2% for p_mux is
found for ASIC 45nm. The number of slices used in FPGA is
increased by 20% for p_mult and 100% for p_mux.
The power consumption is shown in Fig. 2. As seen from Fig.
2, a 31% improvement in power for p_mult over trad and a 9%
improvement for p_mux over trad are observed for ASIC 90nm.
A power improvement of 31% for p_mult over trad and 10% for
p_mux over trad is observed for ASIC 45nm. The power
consumed in the FPGA configuration is unchanged for all
configurations.
Figure 2: Power Consumption Comparison (chart titled "Power Consumption"; vertical axis labelled milliWatts; series: trad, p_mux, p_mult)
The timing analysis is shown in Fig. 3. As seen from Fig. 3,
the p_mult configuration increased the time by 47% over
trad and p_mux by 60% over trad for ASIC 90nm. A timing
degradation of 48% for p_mult and 64% for p_mux over the
trad configuration is observed for ASIC 45nm. The FPGA
configuration has a time degradation of 57% for both p_mult
and p_mux over the trad configuration.
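The percentages quoted above can be recomputed from the table entries with a few lines of Python. This is a small sketch for checking the arithmetic: the helper name percent_change is an assumption, the tuples hold the ASIC 90nm values from Tables 2 to 4 in the order trad, p_mult, p_mux, and a negative result denotes an improvement (a reduction) relative to trad.

```python
def percent_change(baseline, value):
    """Signed change of `value` relative to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


# (trad, p_mult, p_mux) for ASIC TSMC 90nm, copied from Tables 2 to 4.
area_90 = (222, 183, 222)
power_90 = (11792.647, 8056.113, 10681.233)   # nW
time_90 = (704, 1039, 1129)                   # ps

for name, (trad, p_mult, p_mux) in [
    ("area", area_90),
    ("power", power_90),
    ("time", time_90),
]:
    print(
        f"{name}: p_mult {percent_change(trad, p_mult):+.1f}%, "
        f"p_mux {percent_change(trad, p_mux):+.1f}%"
    )
```

The computed values agree with the reported ASIC 90nm figures to within rounding. The same helper can be applied to the ASIC 45nm and FPGA rows.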
Figure 3: Timing Comparison (chart titled "Timing Chart"; vertical axis labelled Nano Seconds, horizontal axis: type; series: trad, p_mux, p_mult)
The following example shows the algorithm simulation.
1. M=1010 R=1100
2. Let m0=10 m1=10 r0=00 r1=11
3. pp0 = m0r0 = 10*00 = 00
pp1 = m1r0 = 10*00 = 00
pp2 = m0r1 = 10*11 = 110
pp3 = m1r1 = 10*11 = 110
4. Then res[1:0] = 00
5. temp1 = 00 + 110 + 0 = 110
6. res[3:2] = 10
7. temp2 = 110 + 1 = 111
8. res[7:4] = 0111
Thus res = 01111000 = 1010 * 1100
#add operations = 3 = (2+1)(2-1)
Conclusion
A multiplication algorithm for integers is proposed in this
paper. The method consists of slicing the inputs into chunks of
two bits. The partial products are determined from the slices
without multiplication. The partial products are added with the
appropriate alignment to give the result. The proposed model
is simulated in the Xilinx tool. The proposed method is
simulated using the in-built multiplication algorithm for
calculating the partial products as configuration p_mult. A
second configuration called p_mux is developed to generate
the partial products by bit inspection. The configuration using
only the in-built multiplication is called trad.
From the simulations, the following is observed.
1. A 17.5% improvement in area utilization is observed for
ASIC 90nm for the proposed model with multiplication of
the slices (p_mult) compared with the existing
multiplication algorithm (trad), with no change for the
proposed model using multiplexers for multiplication
(p_mux). An improvement of 20% for p_mult and 1.2% for
p_mux is found for ASIC 45nm. The number of slices used
in FPGA is increased by 20% for p_mult and 100% for
p_mux.
2. A 31% improvement in power for p_mult over trad and a
9% improvement for p_mux over trad are observed for
ASIC 90nm. A power improvement of 31% for p_mult over
trad and 10% for p_mux over trad is observed for ASIC
45nm. The power consumed in the FPGA configuration is
unchanged for all configurations.
3. The p_mult configuration increased the time by 47% over
trad and p_mux by 60% over trad for ASIC 90nm. A
timing degradation of 48% for p_mult and 64% for
p_mux over the trad configuration is observed for ASIC
45nm. The FPGA configuration has a time degradation of
57% for both p_mult and p_mux over the trad configuration.
References
[1] Gaurav Verma, Manish Kumar, Vijay Khare, Low Power
Techniques for Digital System Design, Indian Journal of
Science and Technology, 8(17), pp. 1-6
[2] Israel Koren, Computer Arithmetic Algorithms, Prentice
Hall, NJ, 1993
[3] Jongsu Park, San Kim, Yong-Surk Lee, A Low-Power
Booth Multiplier Using Novel Data Partition Method,
Proceedings of AP-ASIC 2004, pp. 54-57
[4] Nan-Ying Shen, Oscal T.C. Chen, Low Power Multipliers
by Minimizing Switching Activities of Partial Products,
Proceedings of ISCAS, 2002, pp. IV-93 to IV-96
[5] Philip E. Madrid, Brian Millar, Earl E. Swartzlander Jr.,
Modified Booth Algorithm for High Radix Floating point
Multiplication, IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, Vol. 1, No. 2, June 1993,
pp. 164-167
[6] Rajendra Katti, A Modified Booth Algorithm for High
Radix Fixed Point Multiplication, IEEE Transactions on
Very Large Scale Integration (VLSI) Systems, Vol. 2, No. 4,
Dec. 1994, pp. 522-524
[7] Sandy Wang, Yi-Wen Wu, Oscal T.C. Chen, Ruey-Lian Ma,
Low Power Multipliers by Minimizing Inter-Data Switching
Activities, Proceedings of IEEE Midwest Symposium on
Circuits and Systems, 2000, pp. 89-92, Vol. I