Part 6a

UNIVERSITY OF MASSACHUSETTS
Dept. of Electrical & Computer Engineering
Digital Computer Arithmetic
ECE 666
Part 6a
High-Speed Multiplication - I
Israel Koren
Copyright 2010 Koren
ECE666/Koren Part.6a.1
Speeding Up Multiplication
♦Multiplication involves 2 basic operations - generation of
partial products + their accumulation
♦2 ways to speed up - reducing number of partial
products and/or accelerating accumulation
♦3 types of high-speed multipliers:
♦Sequential multiplier - generates partial products
sequentially and adds each newly generated product to
previously accumulated partial product
♦Parallel multiplier - generates partial products in
parallel, accumulates using a fast multi-operand adder
♦Array multiplier - array of identical cells generating
new partial products; accumulating them simultaneously
∗ No separate circuits for generation and accumulation
∗ Reduced execution time but increased hardware complexity
Reducing Number of Partial Products
♦Examining 2 or more bits of multiplier at a time
♦Requires generating A (multiplicand), 2A, 3A
♦Reduces number of partial products to n/2 - each step more complex
♦Several algorithms that do not increase complexity have been proposed - one is Booth's algorithm
♦Fewer partial products generated for groups of
consecutive 0’s and 1’s
Booth’s Algorithm
♦Group of consecutive 0’s in multiplier - no new partial
product - only shift partial product right one bit
position for every 0
♦Group of m consecutive 1's in multiplier - less than m
partial products generated
♦...01...110... = ...10...000... - ...00...010...
♦Using SD (signed-digit) notation = ...100...01̄0... (1̄ denotes the digit -1)
♦Example:
...011110... = ...100000... - ...000010... = ...10001̄0... (decimal notation, for the four 1's: 15=16-1)
♦Instead of generating all m partial products - only 2
partial products generated
♦First partial product added, second subtracted - number of single-bit shift-right operations still m
Booth’s Algorithm - Rules
♦Recoding multiplier x_{n-1} x_{n-2} ... x_1 x_0 in SD code
♦Recoded multiplier y_{n-1} y_{n-2} ... y_1 y_0
♦x_i, x_{i-1} of multiplier examined to generate y_i
♦Previous bit - x_{i-1} - only reference bit
♦i=0 - reference bit x_{-1}=0
♦Simple recoding - y_i = x_{i-1} - x_i
♦No special order - bits can be recoded in parallel
♦Example:
Multiplier 0011110011(0) recoded as
010001̄0101̄ - 4 instead of 6 add/subtracts
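The recoding rule above can be sketched in a few lines of Python. The names `booth_recode` and `sd_value` are illustrative only, and passing the multiplier as a bit string (most significant bit first) is an assumption made here for readability.

```python
def booth_recode(x_bits):
    """Radix-2 Booth recoding: y_i = x_{i-1} - x_i, with x_{-1} = 0.

    x_bits: multiplier as a string of '0'/'1', most significant bit first.
    Returns the SD digits (-1, 0 or 1), most significant first.
    """
    x = [int(b) for b in reversed(x_bits)]      # x[0] is the least significant bit
    y = []
    for i in range(len(x)):
        reference = x[i - 1] if i > 0 else 0    # reference bit x_{i-1}
        y.append(reference - x[i])
    return y[::-1]


def sd_value(digits):
    """Value of a signed-digit list given most significant digit first."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


digits = booth_recode("0011110011")
print(digits)                          # [0, 1, 0, 0, 0, -1, 0, 1, 0, -1]
print(sum(d != 0 for d in digits))     # 4 add/subtract operations
```

Each digit depends only on two neighboring multiplier bits, so the loop iterations are independent and could run in any order, which is the "recoded in parallel" property.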
Sign Bit
♦Two's complement - sign bit x_{n-1} must be used
♦Deciding on add or subtract operation - no shift required - only prepares for next step
♦Verify only for negative values of X (x_{n-1}=1)
♦2 cases
♦Case 1 (x_{n-2}=0) - A subtracted - necessary correction
♦Case 2 (x_{n-2}=1) - without sign bit - scan over a string of 1's and perform an addition for position n-1
∗ When x_{n-1}=1 considered - required addition not done
∗ Equivalent to subtracting A⋅2^{n-1} - correction term
Example
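A small numerical case, chosen here for illustration (the 4-bit operands are an assumption), can be traced with the following sketch of Booth's algorithm for a two's complement multiplier. The names `booth_multiply` and `to_signed` are hypothetical. The loop adds or subtracts shifted copies of A instead of shifting the partial product right, which yields the same product.

```python
def to_signed(x, n):
    """Interpret the n-bit pattern x (0 <= x < 2**n) as a two's complement value."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


def booth_multiply(a, x, n):
    """Multiply integer a by the n-bit two's complement pattern x."""
    product = 0
    reference = 0                       # reference bit x_{-1} = 0
    for i in range(n):                  # includes the sign bit x_{n-1}
        bit = (x >> i) & 1
        y = reference - bit             # recoded digit y_i in {-1, 0, 1}
        product += y * (a << i)         # add, subtract, or skip A * 2**i
        reference = bit
    return product


# A = 5, X = 1011 (two's complement value -5)
print(booth_multiply(5, 0b1011, 4))     # -25
```

With X = 1011 the digits are y_0 = -1, y_1 = 0, y_2 = 1, y_3 = -1, so the partial sums are -5, -5, +15, -25. The final subtraction at position n-1 is the sign-bit correction.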
Booth’s Algorithm - Properties
♦Multiplication starts from least significant bit
♦If started from most significant bit - longer
adder/subtractor to allow for carry propagation
♦No need to generate recoded SD multiplier
(requiring 2 bits per digit)
∗ Bits of original multiplier scanned - control signals for
adder/subtractor generated
♦Booth's algorithm can handle two's complement
multipliers
∗ If unsigned numbers multiplied - 0 added to left of multiplier (x_n=0) to ensure correctness
Drawbacks to Booth's Algorithm
♦Variable number of add/subtract operations and
of shift operations between two consecutive
add/subtract operations
∗ Inconvenient when designing a synchronous multiplier
♦Algorithm inefficient with isolated 1's
♦Example:
♦001010101(0) recoded as 011̄11̄11̄11̄, requiring 8 instead of 4 operations
♦Situation can be improved by examining 3 bits of
X at a time rather than 2
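The operation counts quoted above can be checked with a short sketch. The helper names `booth_ops` and `plain_ops` are illustrative; the count relies on the fact that a recoded digit is nonzero exactly when a multiplier bit differs from its right-hand neighbor.

```python
def plain_ops(x_bits):
    """Additions needed without recoding: one per 1 in the multiplier."""
    return x_bits.count("1")


def booth_ops(x_bits):
    """Add/subtract operations after radix-2 Booth recoding.

    x_bits is given most significant bit first; the reference bit
    x_{-1} = 0 is appended on the right before comparing neighbors.
    """
    padded = x_bits + "0"
    return sum(padded[k] != padded[k + 1] for k in range(len(x_bits)))


for x_bits in ("0011110011", "001010101"):
    print(x_bits, plain_ops(x_bits), booth_ops(x_bits))
# 0011110011 6 4
# 001010101 4 8
```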
Radix-4 Modified Booth Algorithm
♦Bits x_i and x_{i-1} recoded into y_i and y_{i-1} - x_{i-2} serves as reference bit
♦Separately - x_{i-2} and x_{i-3} recoded into y_{i-2} and y_{i-3} - x_{i-4} serves as reference bit
♦Groups of 3 bits each overlap - rightmost being x_1 x_0 (x_{-1}), next x_3 x_2 (x_1), and so on
Radix-4 Algorithm - Rules
♦i=1,3,5,…
♦Isolated 0/1 handled efficiently
♦If x_{i-1} is an isolated 1, y_{i-1}=1 - only a single operation needed
♦Similarly - x_{i-1} an isolated 0 in a string of 1's - ...10(1)… recoded as ...1̄1... or ...01̄… - single operation performed
♦Exercise: To find required operation - calculate x_{i-1}+x_{i-2}-2x_i for odd i's and represent result as a 2-bit binary number y_i y_{i-1} in SD
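The expression in the exercise can be evaluated directly. The sketch below returns one radix-4 digit per pair of multiplier bits instead of the two SD digits y_i y_{i-1}; that output format, the function name `radix4_booth_digits`, and the even-length bit string input are assumptions made for illustration.

```python
def radix4_booth_digits(x_bits):
    """Radix-4 modified Booth digits for a two's complement multiplier.

    x_bits: bit string, most significant bit first, even length.
    For i = 1, 3, 5, ... the digit is x_{i-1} + x_{i-2} - 2*x_i
    (with x_{-1} = 0), a value in {-2, -1, 0, 1, 2}.
    Returns the digits most significant first.
    """
    n = len(x_bits)
    if n % 2:
        raise ValueError("even number of bits expected (extend the sign bit)")
    x = [int(b) for b in reversed(x_bits)]       # x[0] is the least significant bit
    digits = []
    for i in range(1, n, 2):
        reference = x[i - 2] if i >= 2 else 0    # reference bit x_{i-2}
        digits.append(x[i - 1] + reference - 2 * x[i])
    return digits[::-1]


print(radix4_booth_digits("01010101"))   # [1, 1, 1, 1]
print(radix4_booth_digits("00101010"))   # [1, -1, -1, -2]
```

A digit of -1 corresponds to the SD pair 01̄ (or equivalently 1̄1), and -2 to 1̄0. In either case only one add/subtract is performed for the pair.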
Radix-4 vs. Radix-2 Algorithm
♦01|01|01|01|(0) yields 01|01|01|01| - number of
operations remains 4 - the minimum
♦00|10|10|10|(0) yields 01|01̄|01̄|1̄0|, requiring 4, instead of 3, operations
♦Compared to radix-2 Booth's algorithm - fewer patterns with more partial products; smaller increase in number of operations
♦Can design n-bit synchronous multiplier that
generates exactly n/2 partial products
♦Even n - two's complement multipliers handled
correctly; Odd n - extension of sign bit needed
♦Adding a 0 to left of multiplier needed if unsigned
numbers are multiplied and n odd - 2 0’s if n even
Example
♦n/2=3 steps; 2 multiplier bits in each step
♦All shift operations are 2 bit position shifts
♦Additional bit for storing correct sign required to
properly handle addition of 2A
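A minimal sketch of a radix-4 modified Booth multiplier with exactly n/2 uniform steps, assuming even n and a two's complement multiplier. The names are hypothetical. Shifted multiples are accumulated instead of shifting the partial product right by 2 positions, and the extra sign bit needed in hardware for 2A is not modeled because Python integers are unbounded.

```python
def to_signed(x, n):
    """Interpret the n-bit pattern x as a two's complement value."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


def radix4_booth_multiply(a, x, n):
    """Multiply integer a by the n-bit two's complement pattern x (n even)."""
    if n % 2:
        raise ValueError("n must be even (extend the sign bit for odd n)")
    product = 0
    reference = 0                              # reference bit x_{-1} = 0
    for i in range(0, n, 2):                   # n/2 steps, 2 multiplier bits each
        low = (x >> i) & 1
        high = (x >> (i + 1)) & 1
        digit = low + reference - 2 * high     # one of -2, -1, 0, 1, 2
        product += digit * (a << i)            # multiples 0, ±A, ±2A at weight 2**i
        reference = high
    return product


print(radix4_booth_multiply(17, 0b110111, 6))   # -153, since 110111 is -9
```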
Radix-8 Modified Booth's Algorithm
♦Recoding extended to 3 bits at a time - overlapping groups of 4 bits each
♦Only n/3 partial products generated - multiple
3A needed - more complex basic step
♦Example: recoding 010(1) yields y_i y_{i-1} y_{i-2}=011
♦Technique for simplifying generation and
accumulation of ±3A exists
♦To find minimal number of add/subtract ops
required for a given multiplier - find minimal SD
representation of multiplier
♦Minimal representation - the one with the smallest number of nonzero digits
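The radix-8 recoding at the top of this slide can be sketched by grouping three radix-2 Booth digits. The digit formula -4x_i + 2x_{i-1} + x_{i-2} + x_{i-3} used below is not stated on the slide; it is an assumption obtained by summing 4y_i + 2y_{i-1} + y_{i-2} with y_k = x_{k-1} - x_k, and it reproduces the 010(1) example. The function name is illustrative.

```python
def radix8_booth_digits(x_bits):
    """Radix-8 Booth digits (values -4..4), most significant first.

    x_bits: two's complement bit string, most significant bit first,
    with a length that is a multiple of 3.
    """
    n = len(x_bits)
    if n % 3:
        raise ValueError("length must be a multiple of 3 (extend the sign bit)")
    x = [int(b) for b in reversed(x_bits)]       # x[0] is the least significant bit
    digits = []
    for i in range(2, n, 3):
        reference = x[i - 3] if i >= 3 else 0    # reference bit x_{i-3}
        digits.append(-4 * x[i] + 2 * x[i - 1] + x[i - 2] + reference)
    return digits[::-1]


# Upper group is 010 with reference bit 1: digit 3, i.e. the multiple 3A.
print(radix8_booth_digits("010100"))   # [3, -4]
```

The digit 3 corresponds to 011 in SD form, matching the example, and shows why the multiple 3A has to be available.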
Obtaining Minimal Representation of X
♦y_{n-1} y_{n-2} ... y_0 is a minimal representation of an SD number if y_i⋅y_{i-1}=0 for 1 ≤ i ≤ n-1, given that most significant bits can satisfy y_{n-1}⋅y_{n-2} ≠ 1
♦Example: Representation of 7 with 3 bits - 111 minimal representation although y_i⋅y_{i-1} ≠ 0
♦For any X add a 0 to its left to satisfy above condition
Canonical Recoding
♦Multiplier bits examined one at a time from right; x_{i+1} - reference bit
♦To correctly handle a single 0/1 in string of 1's/0's - need information on string to right
♦"Carry" bit - 0 for 0's and 1 for 1's
♦As before, recoded multiplier can be used without correction if represented in two's complement
♦Extend sign bit x_{n-1} - x_{n-1} x_{n-1} x_{n-2} … x_0
♦Can be expanded to two or more bits at a time
♦Multiples needed for 2 bits - ±A and ±2A
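A sketch of one-bit-at-a-time canonical recoding with the "carry" bit and the reference bit x_{i+1}. The decision rule coded below (digit 0 when x_i plus carry is 0 or 2, otherwise +1 or -1 depending on x_{i+1}) is the usual formulation of canonical recoding and is stated here as an assumption, since the rule table itself does not appear in the text. The function name is illustrative.

```python
def canonical_recode(x_bits):
    """Canonical (minimal) SD recoding of a two's complement multiplier.

    x_bits: bit string, most significant bit first.
    Returns digits in {-1, 0, 1}, most significant first.
    """
    x = [int(b) for b in reversed(x_bits)]   # x[0] is the least significant bit
    x.append(x[-1])                          # sign extension: x_n = x_{n-1}
    carry = 0                                # 0 inside a string of 0's, 1 inside 1's
    digits = []
    for i in range(len(x_bits)):
        total = x[i] + carry
        if total == 1:
            if x[i + 1] == 0:
                digit, carry = 1, 0          # isolated 1, or end of a string of 1's
            else:
                digit, carry = -1, 1         # start of a string of 1's
        else:
            digit, carry = 0, total // 2     # total is 0 or 2: no operation
        digits.append(digit)
    return digits[::-1]


print(canonical_recode("01101110"))   # [1, 0, 0, -1, 0, 0, -1, 0]
```

Because each digit needs the carry from the position to its right, the digits are produced sequentially from the least significant end.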
Disadvantages of Canonical Recoding
♦Bits of multiplier generated sequentially
♦In Booth’s algorithm - no “carry” propagation -
partial products generated in parallel and a fast
multi-operand adder used
♦To take full advantage of minimum number of
operations - number of add/subtracts and length
of shifts must be variable - difficult to
implement
♦For uniform shifts - n/2 partial products - more than the minimum in canonical recoding
Alternate 2-bit-at-a-time Algorithm
♦Reducing number of partial products but still uniform shifts of 2 bits each
♦x_{i+1} - reference bit for x_i x_{i-1} - i odd
♦±2A, ±4A can be generated using shifts
♦4A generated when (x_{i+1}) x_i x_{i-1} = (0)11 - group of 1's - not for (x_{i+3}) x_{i+2} x_{i+1} - 0 in rightmost position
∗ Not recoding - cannot express 4 in 2 bits
∗ Number of partial products - always n/2
∗ Two's complement multipliers - extend sign bit
∗ Unsigned numbers - 1 or 2 0’s added to left of multiplier
Example
♦Multiplier 01101110 - partial products:
♦Translates to the SD number 0101̄1001̄0 - not minimal - includes 2 adjacent nonzero digits
♦Canonical recoding yields 01001̄001̄0 - minimal representation
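The multiples selected for this multiplier can be reproduced with a short sketch. The compact rule used below, multiple = 2x_{i-1} + 2x_i - 4x_{i+1} applied at weight 2^{i-1}, is an assumption: it restates the string-of-1's reasoning (an addition one position past the end of a string, a subtraction at its start) and is not quoted from the slides. The function name is illustrative.

```python
def alt_two_bit_multiples(x_bits):
    """Multiples of A chosen by the alternate 2-bit-at-a-time scheme.

    x_bits: two's complement bit string, most significant bit first, even length.
    Returns the multiples (0, ±2 or ±4) for the pairs x_i x_{i-1}, i = 1, 3, 5, ...,
    rightmost pair first; the k-th entry is applied at weight 4**k.
    """
    n = len(x_bits)
    x = [int(b) for b in reversed(x_bits)]   # x[0] is the least significant bit
    x.append(x[-1])                          # sign extension supplies x_n
    multiples = []
    for i in range(1, n, 2):
        multiples.append(2 * x[i - 1] + 2 * x[i] - 4 * x[i + 1])
    return multiples


m = alt_two_bit_multiples("01101110")
print(m)                                               # [-2, 4, -2, 2]
print(sum(mult * 4 ** k for k, mult in enumerate(m)))  # 110
```

The terms -2, +16, -32 and +128 are the nonzero digits of the SD number quoted above. The adjacent digits at weights 32 and 16 are what make it non-minimal, whereas canonical recoding uses 128 - 16 - 2.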
Dealing with Least Significant Bit
♦For the rightmost pair x_1 x_0, if x_0 = 1 - considered continuation of string of 1's that never really started - no subtraction took place
♦Example: multiplier 01110111 - partial products:
♦Correction: when x_0=1 - set initial partial product to -A instead of 0
♦4 possible cases:
Example
♦Previous example -
♦Multiplier's sign bit extended in order to decide
that no operation needed for first pair of
multiplier bits
♦As before - additional bit for holding correct
sign is needed, because of multiples like -2A
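A minimal sketch of the complete alternate 2-bit-at-a-time multiplication, including the sign-bit extension and the initial partial product of -A when x_0=1. The selection rule 2x_{i-1} + 2x_i - 4x_{i+1} is an assumed compact form of the string-of-1's reasoning, and the function names are hypothetical. The extra sign bit needed in hardware for multiples such as -2A is not modeled because Python integers are unbounded.

```python
def to_signed(x, n):
    """Interpret the n-bit pattern x as a two's complement value."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


def alt_two_bit_multiply(a, x, n):
    """Multiply integer a by the n-bit two's complement pattern x (n even)."""
    def bit(k):
        # Positions above n-1 return the sign bit (sign extension).
        return (x >> min(k, n - 1)) & 1

    product = -a if bit(0) else 0            # correction for x_0 = 1
    for i in range(1, n, 2):                 # always n/2 steps
        multiple = 2 * bit(i - 1) + 2 * bit(i) - 4 * bit(i + 1)   # 0, ±2 or ±4
        product += multiple * (a << (i - 1))
    return product


print(alt_two_bit_multiply(1, 0b01110111, 8))   # 119
```

For the multiplier 01110111 the pairs contribute -8 and +128, and the initial -A brings the total to 119, the value of the multiplier.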
Extending the Alternative Algorithm
♦The above method can be extended to three bits
or more at each step
♦However, here too, multiples of A like 3A or even 6A are needed - either
∗ Prepare in advance and store, or
∗ Perform two additions in a single step
♦For example, for (0)101 we need 8-2=6, and for
(1)001, -8+2=-6
Implementing Large Multipliers Using
Smaller Ones
♦n x n bit multiplier implemented as a single integrated circuit - several such circuits can be used to implement larger multipliers
♦2n x 2n bit multiplier can be constructed out of 4 n x n bit multipliers based on:
A⋅X = (A_H⋅2^n + A_L)⋅(X_H⋅2^n + X_L) = A_H⋅X_H⋅2^{2n} + (A_H⋅X_L + A_L⋅X_H)⋅2^n + A_L⋅X_L
♦A_H, A_L - most and least significant halves of A; X_H, X_L - same for X
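The decomposition can be sketched for unsigned operands as follows. `mul_nxn` is a hypothetical stand-in for the n x n bit multiplier circuit, and the final sum is written with ordinary integer addition, not the carry-save and carry-propagate adders a hardware design would use.

```python
def mul_nxn(a, x, n):
    """Stand-in for an n x n bit unsigned multiplier block (2n-bit result)."""
    assert 0 <= a < (1 << n) and 0 <= x < (1 << n)
    return a * x


def mul_2n_from_n(a, x, n):
    """2n x 2n bit unsigned product built from four n x n bit products."""
    mask = (1 << n) - 1
    a_h, a_l = a >> n, a & mask          # most / least significant halves of A
    x_h, x_l = x >> n, x & mask          # most / least significant halves of X
    hh = mul_nxn(a_h, x_h, n)            # weight 2**(2n)
    hl = mul_nxn(a_h, x_l, n)            # weight 2**n
    lh = mul_nxn(a_l, x_h, n)            # weight 2**n
    ll = mul_nxn(a_l, x_l, n)            # weight 1
    return (hh << (2 * n)) + ((hl + lh) << n) + ll


print(mul_2n_from_n(0xAB, 0xCD, 4) == 0xAB * 0xCD)   # True
```

Since the other three products are shifted left by at least n positions, the n least significant bits of the result come from the low-by-low product alone.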
Aligning Partial Products
♦4 partial products of 2n bits - correctly aligned before adding
♦Last arrangement - minimum height of matrix - 1 level of carry-save addition and a CPA
♦n least significant bits - already bits of final product - no further addition needed
♦2n center bits - added by 2n-bit CSA with outputs connected to a CPA
♦n most significant bits connected to same CPA, since center bits may generate carry into most significant bits - 3n-bit CPA needed
Decomposing a Large Multiplier into
Smaller Ones - Extension
♦Basic multiplier - n x m bits - n ≠ m
♦Multipliers larger than 2n x 2m can be implemented
♦Example: 4n x 4n bit multiplier - implemented using
n x n bit multipliers
∗ 4n x 4n bit multiplier requires 4 2n x 2n bit multipliers
∗ 2n x 2n bit multiplier requires 4 n x n bit multipliers
∗ Total of 16 n x n bit multipliers
∗ 16 partial products aligned before being added
♦Similarly - for any kn x kn bit multiplier with integer k
Adding Partial Products
♦After aligning 16 products - 7 bits in one column need to be added
♦Method 1: (7,3) counters generating 3 operands added by (3,2) counters - generating 2 operands added by a CPA
♦Method 2: Combining 2 sets of counters into a set of (7;2) compressors
♦Selecting more economical multi-operand adder - discussed next
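The column height of 7 can be checked by counting how many aligned partial products overlap each n-bit-wide block of columns. The sketch assumes a kn x kn bit multiplier built from k*k products A_i⋅X_j of 2n bits each, with A_i⋅X_j placed at bit offset (i+j)⋅n. The function name is illustrative.

```python
def column_heights(k):
    """Partial-product bits per column, for each n-bit-wide block of columns.

    The product of part i of A and part j of X is 2n bits wide and starts
    at bit (i + j) * n, so it occupies column blocks i + j and i + j + 1.
    Returns a list of 2k heights, least significant block first.
    """
    heights = [0] * (2 * k)
    for i in range(k):
        for j in range(k):
            heights[i + j] += 1
            heights[i + j + 1] += 1
    return heights


print(column_heights(2))        # [1, 3, 3, 1]
print(column_heights(4))        # [1, 3, 5, 7, 7, 5, 3, 1]
print(max(column_heights(4)))   # 7
```

For k=2 the center columns hold 3 bits, which one level of carry-save addition reduces to 2. For k=4 the tallest columns hold 7 bits, which is why (7,3) counters or (7;2) compressors are considered.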