UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Digital Computer Arithmetic ECE 666 Part 6a High-Speed Multiplication - I Israel Koren Copyright 2010 Koren ECE666/Koren Part.6a.1 Speeding Up Multiplication ♦Multiplication involves 2 basic operations - generation of partial products + their accumulation ♦2 ways to speed up - reducing number of partial products and/or accelerating accumulation ♦3 types of high-speed multipliers: ♦Sequential multiplier - generates partial products sequentially and adds each newly generated product to previously accumulated partial product ♦Parallel multiplier - generates partial products in parallel, accumulates using a fast multi-operand adder ♦Array multiplier - array of identical cells generating new partial products; accumulating them simultaneously ∗ No separate circuits for generation and accumulation ∗ Reduced execution time but increased hardware complexity Copyright 2010 Koren ECE666/Koren Part.6a.2 Page 1 Reducing Number of Partial Products ♦Examining 2 or more bits of multiplier at a time ♦Requires generating A (multiplicand), 2A, 3A ♦Reduces number of partial products to n/2 each step more complex ♦Several algorithm which do not increase complexity proposed - one is Booth's algorithm ♦Fewer partial products generated for groups of consecutive 0’s and 1’s Copyright 2010 Koren ECE666/Koren Part.6a.3 Booth’s Algorithm ♦Group of consecutive 0’s in multiplier - no new partial product - only shift partial product right one bit position for every 0 ♦Group of m consecutive 1's in multiplier - less than m partial products generated ♦...01…110... = ...10...000... - ...00...010... ♦Using SD (signed-digit) notation =...100...010... ♦Example: ♦ ...011110... - = ...100000... - ...000010... = ...100010... (decimal notation: 15=16-1) ♦Instead of generating all m partial products - only 2 partial products generated ♦First partial product added - second subtracted number of single-bit shift-right operations still m Copyright 2010 Koren ECE666/Koren Part.6a.4 Page 2 Booth’s Algorithm - Rules ♦Recoding multiplier xn-1 xn- 2...x1 x0 in SD code ♦Recoded multiplier yn-1 yn-2 ... y1 y0 ♦xi,xi-1 of multiplier examined to generate yi ♦Previous bit - xi-1 - only reference bit ♦i=0 - reference bit x-1=0 ♦Simple recoding - yi = xi-1 - xi ♦No special order - bits can be recoded in parallel ♦Example: 0011110011(0) recoded as - Multiplier 0100010101 - 4 instead of 6 add/subtracts Copyright 2010 Koren ECE666/Koren Part.6a.5 Sign Bit ♦Two's complement - sign bit xn-1 must be used ♦Deciding on add or subtract operation - no shift required - only prepares for next step ♦Verify only for negative values of X (xn-1=1) ♦2 cases ♦Case 1 - A subtracted - necessary correction ♦Case 2 - without sign bit - scan over a string of 1's and perform an addition for position n-1 ∗ When xn-1=1 considered - required addition not done ∗ Equivalent to subtracting A⋅⋅2n-1 - correction term Copyright 2010 Koren ECE666/Koren Part.6a.6 Page 3 Example Copyright 2010 Koren ECE666/Koren Part.6a.7 Booth’s Algorithm - Properties ♦Multiplication starts from least significant bit ♦If started from most significant bit - longer adder/subtractor to allow for carry propagation ♦No need to generate recoded SD multiplier (requiring 2 bits per digit) ∗ Bits of original multiplier scanned - control signals for adder/subtractor generated ♦Booth's algorithm can handle two's complement multipliers ∗ If unsigned numbers multiplied - 0 added to left of multiplier (xn=0) to ensure correctness Copyright 2010 Koren ECE666/Koren Part.6a.8 Page 4 Drawbacks to Booth's Algorithm ♦Variable number of add/subtract operations and of shift operations between two consecutive add/subtract operations ∗ Inconvenient when designing a synchronous multiplier ♦Algorithm inefficient with isolated 1's ♦Example: - - - ♦001010101(0) recoded as 011111111, requiring 8 instead of 4 operations ♦Situation can be improved by examining 3 bits of X at a time rather than 2 Copyright 2010 Koren ECE666/Koren Part.6a.9 Radix-4 Modified Booth Algorithm ♦Bits xi and xi-1 recoded into yi and yi-1 xi-2 serves as reference bit ♦Separately - xi-2 and xi-3 recoded into yi-2 and yi-3 - xi-4 serves as reference bit ♦Groups of 3 bits each overlap - rightmost being x1 x0 (x-1), next x3 x2 (x1), and so on Copyright 2010 Koren ECE666/Koren Part.6a.10 Page 5 Radix-4 Algorithm - Rules ♦i=1,3,5,… ♦Isolated 0/1 handled efficiently ♦If xi-1 is an isolated 1, yi-1=1 - only a single operation needed ♦Similarly - xi-1 an isolated 0 in a string of 1's - ...10(1)… recoded as ...11... or ...01… - single operation performed ♦Exercise: To find required operation - calculate xi-1+xi-2-2xi for odd i’s and represent result as a 2-bit binary number yiyi-1 in SD Copyright 2010 Koren ECE666/Koren Part.6a.11 Radix-4 vs. Radix-2 Algorithm ♦01|01|01|01|(0) yields 01|01|01|01| - number of operations remains 4 - the minimum - - ♦00|10|10|10|(0) yields 01|01|01|10|, requiring 4, instead of 3, operations ♦Compared to radix-2 Booth's algorithm - less patterns with more partial products; Smaller increase in number of operations ♦Can design n-bit synchronous multiplier that generates exactly n/2 partial products ♦Even n - two's complement multipliers handled correctly; Odd n - extension of sign bit needed ♦Adding a 0 to left of multiplier needed if unsigned numbers are multiplied and n odd - 2 0’s if n even Copyright 2010 Koren ECE666/Koren Part.6a.12 Page 6 Example ♦n/2=3 steps ; 2 multiplier bits in each step ♦All shift operations are 2 bit position shifts ♦Additional bit for storing correct sign required to properly handle addition of 2A Copyright 2010 Koren ECE666/Koren Part.6a.13 Radix-8 Modified Booth's Algorithm ♦Recoding extended to 3 bits at a time overlapping groups of 4 bits each ♦Only n/3 partial products generated - multiple 3A needed - more complex basic step ♦Example: recoding 010(1) yields yi yi-1 yi-2=011 ♦Technique for simplifying generation and accumulation of ±3A exists ♦To find minimal number of add/subtract ops required for a given multiplier - find minimal SD representation of multiplier ♦Representation with smallest number of nonzero digits - Copyright 2010 Koren ECE666/Koren Part.6a.14 Page 7 Obtaining Minimal Representation of X ♦yn-1yn-2... y0 is a minimal representation of an SD number if yi⋅yi-1=0 for 1≤ ≤ i≤ ≤ n-1, given that most significant bits can satisfy yn-1⋅yn-2 ≠ 1 ♦Example: Representation of 7 with 3 bits 111 minimal representation although yi⋅yi-1 ≠ 0 ♦For any X add a 0 to its left to satisfy above condition Copyright 2010 Koren ECE666/Koren Part.6a.15 Canonical Recoding ♦Multiplier bits examined one at a time from right; xi+1 - reference bit ♦To correctly handle a single 0/1 in string of 1's/0’s - need information on string to right ♦“Carry” bit - 0 for 0's and 1 for 1's ♦As before, recoded multiplier can be used without correction if represented in two's complement ♦Extend sign bit xn-1 - xn-1xn-1xn-2…x0 ♦Can be expanded to two or more bits at a time ♦Multiples needed for 2 bits - ±A and ±2A Copyright 2010 Koren ECE666/Koren Part.6a.16 Page 8 Disadvantages of Canonical Recoding ♦Bits of multiplier generated sequentially ♦In Booth’s algorithm - no “carry” propagation - partial products generated in parallel and a fast multi-operand adder used ♦To take full advantage of minimum number of operations - number of add/subtracts and length of shifts must be variable - difficult to implement ♦For uniforms shifts - n/2 partial products - more than the minimum in canonical recoding Copyright 2010 Koren ECE666/Koren Part.6a.17 Alternate 2-bitat-a-time Algorithm ♦Reducing number of partial products but still uniform shifts of 2 bits each ♦xi+1 reference bit for xi xi-1 - i odd ♦±2A,±±4A can be generated using shifts ♦4A generated when (xi+1)xi (xi-1)=(0)11 - group of 1's - not for (xi+3)(xi+2)xi+1 - 0 in rightmost position ∗ Not recoding - cannot express 4 in 2 bits ∗ Number of partial products - always n/2 ∗ Two's complement multipliers - extend sign bit ∗ Unsigned numbers - 1 or 2 0’s added to left of multiplier Copyright 2010 Koren ECE666/Koren Part.6a.18 Page 9 Example ♦Multiplier 01101110 - partial products: ♦Translates to the SD number 010110010 - not minimal - includes 2 adjacent nonzero digits ♦Canonical recoding yields 010010010 - minimal representation Copyright 2010 Koren ECE666/Koren Part.6a.19 Dealing with Least significant Bit ♦For the rightmost pair x1x0, if x0 = 1 - considered continuation of string of 1's that never really started - no subtraction took place ♦Example: multiplier 01110111 - partial products: ♦Correction: when x0=1 - set initial partial product to -A instead of 0 ♦4 possible cases: Copyright 2010 Koren ECE666/Koren Part.6a.20 Page 10 Example ♦Previous example - ♦Multiplier's sign bit extended in order to decide that no operation needed for first pair of multiplier bits ♦As before - additional bit for holding correct sign is needed, because of multiples like -2A Copyright 2010 Koren ECE666/Koren Part.6a.21 Extending the Alternative Algorithm ♦The above method can be extended to three bits or more at each step ♦However, here too, multiples of A like 3A or even 6A are needed and ∗ Prepare in advance and store ∗ Perform two additions in a single step ♦For example, for (0)101 we need 8-2=6, and for (1)001, -8+2=-6 Copyright 2010 Koren ECE666/Koren Part.6a.22 Page 11 Implementing Large Multipliers Using Smaller Ones ♦Implementing n x n bit multiplier as a single integrated circuit - several such circuits for implementing larger multipliers can be used ♦2n x 2n bit multiplier can be constructed out of 4 n x n bit multipliers based on : ♦AH , AL - most and least significant halves of A ; XH , XL - same for X Copyright 2010 Koren ECE666/Koren Part.6a.23 Aligning Partial Products ♦4 partial products of 2n bits - correctly aligned before adding ♦Last arrangement - minimum height of matrix - 1 level of carry-save addition and a CPA ♦n least significant bits - already bits of final product - no further addition needed ♦2n center bits - added by 2n-bit CSA with outputs connected to a CPA ♦n most significant bits connected to same CPA, since center bits may generate carry into most significant bits - 3n-bit CPA needed Copyright 2010 Koren ECE666/Koren Part.6a.24 Page 12 Decomposing a Large Multiplier into Smaller Ones - Extension ♦Basic multiplier - n x m bits - n ≠ m ♦Multipliers larger than 2n x 2m can be implemented ♦Example: 4n x 4n bit multiplier - implemented using n x n bit multipliers ∗ 4n x 4n bit multiplier ∗ 2n x 2n bit multiplier ∗ Total of 16 n x n bit ∗ 16 partial products before being added requires 4 2n x 2n bit multipliers requires 4 n x n bit multipliers multipliers aligned ♦Similarly - for any kn x kn bit multiplier with integer k Copyright 2010 Koren ECE666/Koren Part.6a.25 Adding Partial Products ♦After aligning 16 products - 7 bits in one column need to be added ♦Method 1: (7,3) counters generating 3 operands added by (3,2) counters - generating 2 operands added by a CPA ♦Method 2: Combining 2 sets of counters into a set of (7;2) compressors ♦Selecting more economical multi-operand adder - discussed next Copyright 2010 Koren ECE666/Koren Part.6a.26 Page 13
© Copyright 2026 Paperzz