BIT-PARALLEL ARITHMETIC Bit-Parallel Multiplication

1
BIT-PARALLEL ARITHMETIC
Notice that even though the adder is said to be bit-parallel, computation of
the output bits is done sequentially.
Addition and Subtraction
Two binary numbers,
In fact, at each time instant only one of the full-adders performs useful
work. The others have already completed their computations or may not
yet have received a correct carry input, hence they may perform erroneous
computations.
x = (x0∑x1x2… xWd–1)2 and y = (y0∑y1y2… yWd–1)2,
can be added using bit-parallel operation in O(log(Wd)) steps
Because of these wasted switch operations, power consumption is higher
than necessary.
■ Ripple-Carry Adder/Subtractor
x0
c0
FA
x1
y0
c1
FA
x0
xWd–1
y1
y0
yWd–1
c2
cWd–1
FA
=1
cWd
s1
Adder x + y
Lars Wanhammar
x3
y1
=1
y2
y3
=1
Add/Sub
=1
FA
FA
FA
c4
sWd–1
s0
DSP Integrated Circuits
x2
x1
c0
FA
s0
2
Addition time is determined by the carry path in the full-adders.
s1
s2
s3
Subtractor x – y
Department of Electrical Engineering
Linköping University
DSP Integrated Circuits
[email protected]
http://www.es.isy.liu.se/
Lars Wanhammar
Department of Electrical Engineering
Linköping University
[email protected]
http://www.es.isy.liu.se/
3
4
■ Carry–Look–Ahead Adder
Bit-Parallel Multiplication
■ Brent–Kung’s Adder
In order to multiply two two’s-complement numbers, a and x, we form the
partial bit-products ai · xk, as shown below.
■ Carry–Save Adder
–a0◊xW
d–1
■ Carry–Select Adder
a0◊xWd–2 – a1◊xWd–2
■ Carry–Skip Adder
–a0◊x1
■ Conditional–Sum Adder
a0◊x0
y–1
–a1◊x0
y0
◊x
c–2 Wd–1
aW
◊x
c–1 Wd–1
aW
aWc–1◊xWd–2
–a0◊x2
a1◊x1
–a2◊x0
y1
–aWc–1◊x0
yW
d+Wc–3
yW
d+Wc–2
High-speed, bit-parallel multipliers can be divided in three classes
■ Add–and–shift multipliers
The partial bit-products are generated sequentially and accumulate them
successively as they are generated. This type are therefore the slowest
multiplier, but the required chip area is low.
DSP Integrated Circuits
Lars Wanhammar
Department of Electrical Engineering
Linköping University
[email protected]
http://www.es.isy.liu.se/
DSP Integrated Circuits
Lars Wanhammar
Department of Electrical Engineering
Linköping University
[email protected]
http://www.es.isy.liu.se/
5
6
■ Parallel multipliers
Add–And–Shift Multiplication
All bit-products are generated in parallel and uses a multi-operand adder
(i.e., an adder tree) for their accumulation.
Consider the multiplication of two two’s-complement numbers, y = a · x.
A parallel multiplier structure can be partitioned into three parts: partial
product generation, carry-free addition, and carry-propagation addition.
Wd – 1
Wd – 1
Ê
ˆ
–
i
Á
˜
y = a – x0 + Â xi 2
= – ax 0 + Â ax i 2 – i
Á
˜
Ë
¯
i=1
i=1
These three parts can be implemented using different schemes.
0
(–a0
The carry-free addition part is often implemented by using a Wallace or
redundant binary addition tree.
(–a0
■ Array multipliers
An array of almost identical cells for generation of the bit-products and
accumulation.
(–a0
a1
(–a0
a1
a2
– (–a0
a1
a2
y–1
y0
y1
a1
a1
a2
a
aWc–1)◊xWd–1
a2
x
aW
)◊x
c–1 Wd–2
a2
LSB
MUX
aWc–1)◊x2
DSP Integrated Circuits
Department of Electrical Engineering
Linköping University
Reg
Add/Sub
aWc–1)◊x1
aW
)◊x
c–1 0
yW
d+Wc–3
yW
MSB
d+Wc–2
LSB
y
This type of multipliers requires the least amount of area, but it is also the
slower
Lars Wanhammar
MSB
The multiplication can be performed by generating the partial products,
for example row wise.
For example, the partial product generation can be implemented by AND
gates or by using the Booth’s algorithm.
Horners’ method
DSP Integrated Circuits
[email protected]
http://www.es.isy.liu.se/
Lars Wanhammar
Department of Electrical Engineering
Linköping University
[email protected]
http://www.es.isy.liu.se/
7
8
Modified Booth’s Algorithm
By considering three bits at a time from the input operand, x, is used to
recode the other input operand, y, into partial products (–2y, –y, 0, y, 2y)
that are added (subtracted) to form the result.
Tree–Based Multipliers
Short-hand version of the bit-products
x
y
Hence, the number of recoded partial products is reduced.
Booth’s algorithm reduces the number of additions/subtractions cycles to
Wd/2 = 8, which is only half that required in the ordinary shift–and–add
algorithm.
Notice that the reduction of the number of cycles directly corresponds to
a reduction in the power consumption.
Booth’s algorithm is also suitable for bit-serial or digit-serial implementation.
Bit-products
Product
In a tree-based multiplier the bit-products are often added using fulladders.
A full-adder can be considered as a counter, denoted (3, 2)-counter, that
adds three inputs and forms a two bit result.
The output equals the number of 1s in the inputs – counter.
A half-adder is denoted (2, 2)-counter.
DSP Integrated Circuits
Lars Wanhammar
Department of Electrical Engineering
Linköping University
[email protected]
http://www.es.isy.liu.se/
DSP Integrated Circuits
Lars Wanhammar
Department of Electrical Engineering
Linköping University
[email protected]
http://www.es.isy.liu.se/
9
In 1964 Wallace showed that a tree structure (Wallace tree) of such
counters is an efficient method, O(log(Wd)), to add the bit-products.
For example, Dadda has derived a scheme where all bit-products with the
same weight are collected and added using a Wallace tree with a minimum
number of counters and minimal critical paths.
Input
However, the multiplier is irregular and difficult to implement efficiently
since a large wiring area is required.
Inputs to the
second stage
x
y
(3, 2)-counter
Result of the
second stage
Wallace tree multipliers should therefore only be used for large word
length and where the performance is critical.
Outputs
(2, 2)-counter
Bit-products
10
Several alternative addition schemes are possible.
Another more regular structure is the Overturn-Stair Multiplier
Inputs to the
third stage
Result of the
third stage
Result of
the first stage
In the fourth stage, a RCA or a CLA can be used to obtain the product.
DSP Integrated Circuits
Department of Electrical Engineering
Linköping University
Lars Wanhammar
DSP Integrated Circuits
[email protected]
http://www.es.isy.liu.se/
Lars Wanhammar
Department of Electrical Engineering
Linköping University
[email protected]
http://www.es.isy.liu.se/
11
Array Multipliers
12
Notice that some slight variations in the way the partial products are
formed occur in the literature.
Many array multiplier schemes with varying degrees of pipelining have
been proposed.
All bit-products are generated in parallel and collected through an array of
full-adders and a final RCA.
A typical array multiplier—the Baugh–Wooley’s multiplier with a multiplication time proportional to 2Wd.
x0
1
p–1
x1
x2
x3
y0
y1
y2
y3
1
x0◊y3
x1◊y3
x2◊y3
x3◊y3
x3◊y2
x0◊y2
x1◊y2
x2◊y2
x0◊y1
x1◊y1
x2◊y1
x3◊y1
x0◊y0
x1◊y0
x2◊y0
x3◊y0
p0
p1
p2
p3
FA
x2◊y3
x2◊y2
FA
0
x3◊y3
x3◊y2
FA
x0◊y2
x1◊y1
FA
x2◊y1
FA
x3◊y1
x0◊y1
p4
p5
p6
don’t
care
Department of Electrical Engineering
Linköping University
FA
x1◊y0
FA
x2◊y0
FA
x3◊y0
x0◊y0
FA
DSP Integrated Circuits
0
x1◊y3
x1◊y2
FA
0 1
Lars Wanhammar
0
x0◊y3
FA
p–1
p0
FA
FA
p1
1
p2
[email protected]
http://www.es.isy.liu.se/
p3
p4
p5
p6
DSP Integrated Circuits
Lars Wanhammar
Department of Electrical Engineering
Linköping University
[email protected]
http://www.es.isy.liu.se/