Numerical Strength Reduction (Chapter 15 of Textbook)
For Advanced VLSI Design, 11-27-2002
Prof. An-Yeu Wu (吳安宇), Department of Electrical Engineering, National Taiwan University
ACCESS IC LAB, Graduate Institute of Electronics Engineering, NTU

Introduction
Numerical transformation techniques
  - Restructure the computation
  - Rely upon sub-expression elimination (sharing)
Advantages
  - Reduce the strength (signal level or data wordlength) of DSP computation
  - Improve the performance (power, speed, and area)

Sub-expression Elimination
  - A numerical transformation technique based on shared sub-expressions
  - Performed only on constant multiplications
  - Efficient implementation of "constant" multiplications by dedicated shift-and-add multipliers
Example [Sec. 15.2.1, p. 560]:
  a × x = 13 × x = 001101 × x
  b × x = 27 × x = 011011 × x
Note: 000001 × x is the input signal itself (no additional computation).
Total: 5 shifters, 5 adders (2S, 2A and 3S, 3A, respectively).

Common term
  a × x = 13 × x = 001101 × x = 001001 × x + 000100 × x
  b × x = 27 × x = 011011 × x = 001001 × x + 010010 × x
The common term is 001001 × x.
  a × x = 000100 × x + 001001 × x          Total: 2S, 2A
Further enhancement by sub-expression sharing:
  a × x = 13 × x = 000100 × x + 001001 × x
  b × x = 27 × x = 010010 × x + 001001 × x = ((001001 × x) << 1) + (001001 × x)
Total: 3S, 3A (2S, 2A and 1S, 1A, respectively).

Multiple Constant Multiplication (MCM)
The algorithm for MCM uses an iterative matching process that consists of the following steps:
  1. Express each constant in the set using a binary format (such as signed, unsigned, or 2's complement).
  2. Determine the number of bit-wise matches (nonzero bits) between all of the constants in the set.
  3. Choose the best match.
  4. Eliminate the redundancy from the best match. Return the remainders and the redundancy to the set of coefficients.
  5. Repeat Steps 2-4 until no improvement is achieved.

Example of iterative matching process [Sec. 15.3.1, p. 561]
  a × x = 237 × x
  b × x = 182 × x
  c × x = 93 × x

  Constant   Value   Unsigned
  a          237     11101101
  b          182     10110110
  c          93      01011101

Bit-wise matches: a and b: 3; a and c: 4; b and c: 2. The best match is a and c, with common term 01001101.

After the first iteration:

  Constant       Unsigned
  Rem. of a      10100000
  b              10110110
  Rem. of c      00010000
  Red. of a,c    01001101

Bit-wise matches in the second iteration: Rem. of a and b: 2; b and Rem. of c: 1.

After the second iteration:

  Constant                               Unsigned   Cost
  Rem. of a (2)                          00000000
  Rem. of b (1)                          00010110   3S, 2A
  Rem. of c (1)                          00010000   1S
  Red. of a,c (1st run)                  01001101   3S, 3A
  Red. of Rem. of a and b (2nd run)      10100000   2S, 1A

  a = [ 01001101 + 10100000 ]
  b = [ 00010110 + 10100000 ]
  c = [ 01001101 + 00010000 ]
(Slide annotations on the three sums: 5S,5A; 5S,3A; 1S,1A.)
Each of a, b, and c needs one more add to combine its two terms, so the adds total 3 + 1 + 2 + 0 + 3 = 9 and the shifts total 3 + 2 + 3 + 1 = 9.
  - This implementation requires 9 shifts and 9 adds.
  - The standard implementation requires 14 shifts and 13 adds.

Application in Linear Transformation
General form of a linear transformation:
  y = T x
  T: m-by-n matrix
  x: length-n vector
  y: length-m vector
Equivalent form:
\[ y_i = \sum_{j=1}^{n} t_{ij} x_j, \qquad i = 1, \ldots, m. \]
Subexpression elimination: minimize the number of shifts and additions required to compute the products t_{ij} x_j.

Linear Transformation (Matrix)
3 basic steps of strength reduction:
  1. Minimize the number of shifts and adds by using the iterative matching algorithm.
  2. Form unique products using the sub-expressions found in the 1st step.
  3. The final step involves the sharing of adds.

Example
\[
T x = \begin{bmatrix} 7 & 8 & 2 & 13 \\ 12 & 11 & 7 & 13 \\ 5 & 8 & 2 & 15 \\ 7 & 11 & 7 & 11 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
\]
Step 1 (iterative matching on each column):

  Column 1    Column 2    Column 3    Column 4
  0101 (5)    1000 (8)    0010 (2)    1001 (9)
  0010 (2)    1011 (11)   0111 (7)    0100 (4)
  1100 (12)                           0010 (2)

Step 2 (unique products):
  p1 = 0101 ∗ x1    p2 = 0010 ∗ x1    p3 = 1100 ∗ x1
  p4 = 1000 ∗ x2    p5 = 1011 ∗ x2
  p6 = 0010 ∗ x3    p7 = 0111 ∗ x3
  p8 = 1001 ∗ x4    p9 = 0100 ∗ x4    p10 = 0010 ∗ x4

  y1 = 7 x1 + 8 x2 + 2 x3 + 13 x4 = [p1 + p2] + p4 + p6 + [p8 + p9]
  y2 = 12 x1 + 11 x2 + 7 x3 + 13 x4 = p3 + p5 + p7 + [p8 + p9]
  y3 = 5 x1 + 8 x2 + 2 x3 + 15 x4 = p1 + p4 + p6 + [p8 + p9 + p10]
  y4 = 7 x1 + 11 x2 + 7 x3 + 11 x4 = [p1 + p2] + p5 + p7 + [p8 + p10]

Step 3 (sharing of adds):
  y1 = p2 + (p1 + p4 + p6 + p8 + p9)
  y2 = p3 + p9 + (p5 + p7 + p8)
  y3 = p10 + (p1 + p4 + p6 + p8 + p9)
  y4 = p1 + p2 + p10 + (p5 + p7 + p8)

Hardware complexity
a. Original circuit
  y1 = 7 x1 + 8 x2 + 2 x3 + 13 x4          6S, 4A, 3A
  y2 = 12 x1 + 11 x2 + 7 x3 + 13 x4        8S, 7A, 3A
  y3 = 5 x1 + 8 x2 + 2 x3 + 15 x4          6S, 4A, 3A
  y4 = 7 x1 + 11 x2 + 7 x3 + 11 x4         8S, 8A, 3A
  Total: 28S, 35A
b. Modified circuit
  y1 = p2 + (p1 + p4 + p6 + p8 + p9)       1S, (5S, 2A, 4A), 1A
  y2 = p3 + p9 + (p5 + p7 + p8)            3S, 1A, (5S, 5A, 2A), 2A
  y3 = p10 + (p1 + p4 + p6 + p8 + p9)      1S, 1A
  y4 = p1 + p2 + p10 + (p5 + p7 + p8)      3S, 1A, 3A
  Total: 18S, 22A

Application in Polynomial Evaluation
  y = x^13 + x^7 + x^4 + x^2 + x
Exponents in binary: 13 = 1101, 7 = 0111, 4 = 0100, 2 = 0010, 1 = 0001.
  y = x^8 ∗ (x^4 ∗ x) + x^2 ∗ (x^4 ∗ x) + x^4 + x^2 + x
  x^2 = x ∗ x
  x^4 = x^2 ∗ x^2
  x^8 = x^4 ∗ x^4
This requires 6 instead of 22 multiplications.

Subexpression sharing in Digital Filter
Representation
  - Canonic Signed Digit (CSD): no adjacent nonzero digits (1 or −1).
  - Signed Power-of-Two (SPT): no CSD constraint.
Notation: (x >> i) = 2^{-i} x, the shift-right-by-i-bits operation.
Example (\bar{1} denotes the digit −1):
  y = 0.10\bar{1}00010\bar{1} ∗ x
  y = (x >> 1) − (x >> 3) + (x >> 7) − (x >> 9)
  x2 = x − (x >> 2)
  y = (x2 >> 1) + (x2 >> 7)

Application in N-tap Filter
  y(n) = c_0 x(n) + c_1 x(n − 1) + ... + c_{N−1} x(n − N + 1)
One variable is multiplied by multiple constant coefficients, so subexpression elimination can be applied to this structure.

Visualization Approach
4-tap FIR filter with CSD representation (\bar{1} denotes the digit −1; the signs follow from the expansion in Step 4):
  y(n) = \bar{1}.01010000010 ∗ x(n)
       + 0.\bar{1}000\bar{1}0\bar{1}0\bar{1}0\bar{1} ∗ x(n − 1)
       + 0.\bar{1}0010000010 ∗ x(n − 2)
       + 1.0000010\bar{1}000 ∗ x(n − 3)
Here x1 = x(n), and x1[−k] denotes x1 delayed by k samples.
Step 1. Arrange the nonzero digits (1 and −1) in a grid of time delay (rows x(n), x(n − 1), x(n − 2), x(n − 3)) versus shift number (columns).
        Define x2 = x1 − (x1[−1] >> (−1)).
Step 2. Replace every occurrence of the x2 pattern in the grid (entries marked ±2), then define x3 = x2 + (x1 >> 2).
Step 3. Replace every occurrence of the x3 pattern in the grid (entries marked ±3).
Step 4. Write out the complete definition of the filter:
  x2 = x1 − (x1[−1] >> (−1))
  x3 = x2 + (x1 >> 2)
  y = −x1 + (x3 >> 2) + (x2 >> 10) − (x3[−1] >> 5) − (x2[−1] >> 11) − (x2[−2] >> 1) + (x1[−3] >> 6) − (x1[−3] >> 8)
  How should the negative shift be handled?
Step 5. Apply an extra shift of >> 1 to both sides of the definitions of x2 and x3, and reduce the shift of every x2 and x3 term in the sum y by 1 to compensate:
  x2 = (x1 >> 1) − x1[−1]
  x3 = x2 + (x1 >> 3)
  y = −x1 + (x3 >> 1) + (x2 >> 9) − (x3[−1] >> 4) − (x2[−1] >> 10) − x2[−2] + (x1[−3] >> 6) − (x1[−3] >> 8)

Implementation using Transpose Form
Example 15.4.4 (p. 570)
  y(n) = c_0 x(n) + c_1 x(n − 1) + c_2 x(n − 2)
       = 0.00101011 × x(n) + 0.10011010 × x(n − 1) + 0.11010010 × x(n − 2)
  y = (x1 >> 3) + (x1 >> 5) + (x1 >> 7) + (x1 >> 8)
    + (x2[−1] >> 1) + (x1[−1] >> 5)
    + (x2[−2] >> 1) + (x1[−2] >> 2)
Hints:
  (1) Set x2 = x1 + (x1 >> 3) + (x1 >> 6).
  (2) x2[−1] >> 1: delay x2, then shift = shift x2, then delay.
      x2[−2] >> 1: double-delay x2, then shift = shift x2, then double-delay.

Implementation: Sharing Method I

Implementation: Sharing Method II
Carry-save adders can be applied.

CSD representation
Canonic Signed Digit (CSD): less hardware is needed with a CSD representation.
The 2 most common subexpressions are:
  101 = x + (x >> 2)
  10\bar{1} = x − (x >> 2)
A W-bit CSD number can be broken down into:
  - W/18 + O(1) pairs of type 101
  - W/18 + O(1) pairs of type 10\bar{1}
  - W/9 + O(1) isolated 1 or \bar{1}

Advantage in CSD
  - Sharing can be achieved very easily by finding the occurrences of only 2 sub-expressions.
  - 33% saving compared with the total number of nonzero bits.
  - This architecture can be built without a major increase in routing cost.
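To make the CSD property concrete, the following is a minimal Python sketch of converting a non-negative integer coefficient to CSD digits. It uses the standard non-adjacent-form recoding rather than any procedure given in the slides, and the function names (`to_csd`, `csd_value`, `has_adjacent_nonzeros`) are hypothetical, chosen only for illustration.

```python
def to_csd(n):
    """Return CSD digits (each -1, 0, or +1) of a non-negative integer, LSB first.

    Assumption: standard non-adjacent-form recoding, not the textbook's own procedure.
    """
    digits = []
    while n != 0:
        if n & 1:
            # n % 4 == 1 -> digit +1; n % 4 == 3 -> digit -1 (forces the next digit to 0)
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_value(digits):
    """Rebuild the integer from LSB-first signed digits."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def has_adjacent_nonzeros(digits):
    """True if two neighbouring digits are both nonzero (which CSD forbids)."""
    return any(a != 0 and b != 0 for a, b in zip(digits, digits[1:]))


def nonzero_count(digits):
    """Number of nonzero digits, i.e. the number of shifted terms to be summed."""
    return sum(1 for d in digits if d != 0)
```

For the constant 237 used in the iterative matching example, the unsigned form 11101101 has six nonzero bits, while `to_csd(237)` has four nonzero digits, so fewer shifted terms have to be added.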
Example in CSD
Example 15.4.5 (p. 572): a 3-tap FIR filter
  c_0 = 0.10101010000
  c_1 = 0.10010100101
  c_2 = 0.10101010101
The two sub-expression types, 101 and 10\bar{1}, are marked on the coefficients.

Further hardware reduction by the sharing methodology
The same two sub-expression types, 101 and 10\bar{1}, are shared.

Number Splitting
  - A numerical transformation
  - Reduces the hardware cost or power consumption
  - Performed on the infinite-precision version of the constant coefficients
  - Strength reduction at a higher level
Types
  - Additive
  - Multiplicative

Representation of Number Splitting
  Y = T X
  Y: N-dimensional vector
  T: N-by-M matrix
  X: M-dimensional vector, M = J + K

Row-Based Additive Number Splitting
\[ T = W \ast \begin{bmatrix} T' \\ A_G \end{bmatrix}, \qquad \delta = t_{i,q} - t_{i,p} \]
  - In order to equate T' and T, 2 additional matrices must be generated:
    A_G = [ 0 ... 0  t_{i,p}  0  t_{i,p}  0 ... 0 ]
  - W is initially set to an identity matrix of dimension N-by-N.

Example
Step 1.
\[
\begin{bmatrix} y_0(n+1) \\ y_1(n+1) \\ y_2(n+1) \end{bmatrix}
=
\begin{bmatrix} .4 & .7 & .4 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & .2 & .7 \end{bmatrix}
\begin{bmatrix} y_0(n) \\ y_1(n) \\ y_2(n) \\ x_0(n) \end{bmatrix}
\]
\[
T' = \begin{bmatrix} 0 & .7 & 0 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & .2 & .7 \end{bmatrix}, \quad
W = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad
A_G = \begin{bmatrix} .4 & 0 & .4 & 0 \end{bmatrix}
\]
Step 2.
\[
\begin{bmatrix} .4 & .7 & .4 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & .2 & .7 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & .7 & 0 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & .2 & .7 \\ .4 & 0 & .4 & 0 \end{bmatrix}
\]
  Originally:  y0(n + 1) = .4 y0(n) + .7 y1(n) + .4 y2(n) + .8 x0(n)         4M, 3A
  Transformed: y0(n + 1) = .4 (y0(n) + y2(n)) + .7 y1(n) + .8 x0(n)          3M, 3A
Step 3. Splitting the third row as well adds a second row to A_G and a fifth column to W:
\[
T' = \begin{bmatrix} 0 & .7 & 0 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & 0 & .5 \end{bmatrix}, \quad
W = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 \end{bmatrix}, \quad
A_G = \begin{bmatrix} .4 & 0 & .4 & 0 \\ 0 & 0 & .2 & .2 \end{bmatrix}
\]
\[
\begin{bmatrix} .4 & .7 & .4 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & .2 & .7 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & .7 & 0 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & 0 & .5 \\ .4 & 0 & .4 & 0 \\ 0 & 0 & .2 & .2 \end{bmatrix}
\]
  Originally:  y2(n + 1) = .5 y0(n) + .3 y1(n) + .2 y2(n) + .7 x0(n)                   4M, 3A
  Transformed: y2(n + 1) = .5 y0(n) + .3 y1(n) + .2 (y2(n) + x0(n)) + .5 x0(n)         4M, 4A
  (This split causes the hardware cost to grow.)

Column-Based Additive Number Splitting
\[
\begin{bmatrix} y_0(n+1) \\ y_1(n+1) \\ y_2(n+1) \end{bmatrix}
=
\begin{bmatrix} .4 & .7 & .4 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & .2 & .7 \end{bmatrix}
\begin{bmatrix} y_0(n) \\ y_1(n) \\ y_2(n) \\ x_0(n) \end{bmatrix}
\]
\[
T' = \begin{bmatrix} .4 & .7 & .4 & .8 \\ .3 & .6 & 0 & .9 \\ .5 & .3 & 0 & .7 \end{bmatrix}, \quad
W = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}, \quad
A_G = \begin{bmatrix} 0 & 0 & .2 & 0 \end{bmatrix}
\]
\[
\begin{bmatrix} .4 & .7 & .4 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & .2 & .7 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} .4 & .7 & .4 & .8 \\ .3 & .6 & 0 & .9 \\ .5 & .3 & 0 & .7 \\ 0 & 0 & .2 & 0 \end{bmatrix}
\]
  y1(n + 1) = .3 y0(n) + .6 y1(n) + .2 y2(n) + .9 x0(n)
  y2(n + 1) = .5 y0(n) + .3 y1(n) + .2 y2(n) + .7 x0(n)
  Transformed: .2 y2(n) is computed once and shared by both outputs.

Comparison
Difference
  - Row-based number splitting: single output
  - Column-based number splitting: multiple outputs
Optimization
  - Number splitting is performed in conjunction with an optimization.
  - During each iteration, the number splitting that leads to the largest reduction in the cost function is chosen.

Multiplicative Number Splitting
  Y = T X
The product T_{i−1} T_i remains unchanged if, for some constant K,
  (1) the j-th row of T_i is transformed as Row(j) = Row(j) − K ∗ Row(k), and
  (2) the k-th column of T_{i−1} is transformed as Col(k) = Col(k) + K ∗ Col(j),
where j and k denote arbitrary valid row and column indices. The two updates carry opposite signs, as in the paired annotations ∗(3) and ∗(−3) of the example.
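The invariance of the product under a multiplicative splitting step can be checked numerically. The sketch below is an illustration only: the helper names `matmul` and `split_step` are hypothetical, indices are 0-based, and the column update is taken with the sign opposite to the row update (Col(k) gains +K ∗ Col(j)), matching the paired ∗(K) and ∗(−K) annotations of the worked examples. Exact fractions are used so the comparison does not depend on floating-point rounding.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def split_step(T_prev, T_cur, j, k, K):
    """One multiplicative splitting step (0-based indices, j != k).

    Row(j) of T_cur  becomes Row(j) - K * Row(k).
    Col(k) of T_prev becomes Col(k) + K * Col(j)  (opposite sign keeps the product fixed).
    Returns new matrices; the inputs are not modified.
    """
    T_prev = [row[:] for row in T_prev]
    T_cur = [row[:] for row in T_cur]
    T_cur[j] = [a - K * b for a, b in zip(T_cur[j], T_cur[k])]
    for row in T_prev:
        row[k] += K * row[j]
    return T_prev, T_cur


# The 3-by-3 matrix of the example on p. 582, written as exact tenths.
T = [[F(4, 10), F(7, 10), F(4, 10)],
     [F(6, 10), F(8, 10), F(2, 10)],
     [F(1, 10), F(3, 10), F(2, 10)]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Row 2 minus 1 * Row 3 (1-based), then Row 1 minus 2 * Row 3.
T1, T2 = split_step(I3, T, j=1, k=2, K=1)
T1, T2 = split_step(T1, T2, j=0, k=2, K=2)
```

After these two steps `matmul(T1, T2)` still equals `T`, while two entries of `T2` have become zero and two others have become .5, which is a shift rather than a general multiplication.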
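For a 4-by-3 matrix T with entries a_{i,j}, start from T = T_1 T_2 with T_1 the 4-by-4 identity:
\[
T = \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \\ a_{4,1} & a_{4,2} & a_{4,3} \end{bmatrix}
= T_1 T_2 =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \\ a_{4,1} & a_{4,2} & a_{4,3} \end{bmatrix}
\]
Annotations ∗(3) on T_1 and ∗(−3) on T_2: Row(1) of T_2 becomes Row(1) − 3 Row(2), and Col(2) of T_1 becomes Col(2) + 3 Col(1):
\[
T = T_1 T_2 =
\begin{bmatrix} 1 & 3 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{1,1} - 3a_{2,1} & a_{1,2} - 3a_{2,2} & a_{1,3} - 3a_{2,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \\ a_{4,1} & a_{4,2} & a_{4,3} \end{bmatrix}
\]
Annotations ∗(1) and ∗(−1): Row(2) of T_2 becomes Row(2) − Row(3), and Col(3) of T_1 becomes Col(3) + Col(2):
\[
T = T_1 T_2 =
\begin{bmatrix} 1 & 3 & 3 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{1,1} - 3a_{2,1} & a_{1,2} - 3a_{2,2} & a_{1,3} - 3a_{2,3} \\ a_{2,1} - a_{3,1} & a_{2,2} - a_{3,2} & a_{2,3} - a_{3,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \\ a_{4,1} & a_{4,2} & a_{4,3} \end{bmatrix}
\]
Introduce T_3 as the 3-by-3 identity, so that T = T_1 T_2 T_3. Annotations ∗(1) on T_2 and ∗(−1) on T_3: Row(3) of T_3 becomes Row(3) − Row(1), and Col(1) of T_2 becomes Col(1) + Col(3):
\[
T = T_1 T_2 T_3 =
\begin{bmatrix} 1 & 3 & 3 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{1,1} - 3a_{2,1} + a_{1,3} - 3a_{2,3} & a_{1,2} - 3a_{2,2} & a_{1,3} - 3a_{2,3} \\ a_{2,1} - a_{3,1} + a_{2,3} - a_{3,3} & a_{2,2} - a_{3,2} & a_{2,3} - a_{3,3} \\ a_{3,1} + a_{3,3} & a_{3,2} & a_{3,3} \\ a_{4,1} + a_{4,3} & a_{4,2} & a_{4,3} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}
\]

Example (p. 582)
\[
T = \begin{bmatrix} .4 & .7 & .4 \\ .6 & .8 & .2 \\ .1 & .3 & .2 \end{bmatrix}
= T_1 T_2 =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .4 & .7 & .4 \\ .6 & .8 & .2 \\ .1 & .3 & .2 \end{bmatrix}
\]
Annotations ∗(1) and ∗(−1): Row(2) of T_2 becomes Row(2) − Row(3), and Col(3) of T_1 becomes Col(3) + Col(2):
\[
T = T_1 T_2 =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .4 & .7 & .4 \\ .5 & .5 & 0 \\ .1 & .3 & .2 \end{bmatrix}
\]
Annotations ∗(2) and ∗(−2): Row(1) of T_2 becomes Row(1) − 2 Row(3), and Col(3) of T_1 becomes Col(3) + 2 Col(1):
\[
T = T_1 T_2 =
\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .2 & .1 & 0 \\ .5 & .5 & 0 \\ .1 & .3 & .2 \end{bmatrix}
\]
Introduce T_3 as the identity:
\[
T = T_1 T_2 T_3 =
\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .2 & .1 & 0 \\ .5 & .5 & 0 \\ .1 & .3 & .2 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Annotations ∗(1) and ∗(−1): Col(1) of T_2 becomes Col(1) − Col(2), and Row(2) of T_3 becomes Row(2) + Row(1):
\[
T =
\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .1 & .1 & 0 \\ 0 & .5 & 0 \\ -.2 & .3 & .2 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Annotations ∗(−2) and ∗(2): Col(1) of T_1 becomes Col(1) − 2 Col(3), and Row(3) of T_2 becomes Row(3) + 2 Row(1):
\[
T =
\begin{bmatrix} -3 & 0 & 2 \\ -2 & 1 & 1 \\ -2 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .1 & .1 & 0 \\ 0 & .5 & 0 \\ 0 & .5 & .2 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Annotations ∗(1) and ∗(−1): Col(2) of T_1 becomes Col(2) + Col(3), and Row(3) of T_2 becomes Row(3) − Row(2):
\[
T =
\begin{bmatrix} -3 & 2 & 2 \\ -2 & 2 & 1 \\ -2 & 1 & 1 \end{bmatrix}
\begin{bmatrix} .1 & .1 & 0 \\ 0 & .5 & 0 \\ 0 & 0 & .2 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Annotations ∗(−1) and ∗(1): Col(1) of T_2 becomes Col(1) − Col(2), and Row(2) of T_3 becomes Row(2) + Row(1):
\[
T = T_1 T_2 T_3 =
\begin{bmatrix} -3 & 2 & 2 \\ -2 & 2 & 1 \\ -2 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 0 & .1 & 0 \\ -.5 & .5 & 0 \\ 0 & 0 & .2 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
  Originally:  9 nontrivial multiplications
  Transformed: 2 nontrivial multiplications (by .1 and .2), at the expense of a few additions and shift operations

Conclusions
  - Strength reduction can help to reduce the number of operations as well as the number of bits.
  - The concept can be applied to digital filter designs and other DSP problems (e.g., adaptive filters):
      - Reducing the signal strength helps to reduce the wordlength assignment.
      - Lower switching activity lowers the power consumption.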
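As a closing illustration of the operation-count claim, the shared shift-and-add scheme of the opening example (13 × x and 27 × x, Sec. 15.2.1) can be sketched in Python for integer inputs. The function names are hypothetical, and Python integers stand in for fixed-point hardware words, so overflow and wordlength are not modelled.

```python
def mul13_27_direct(x):
    """13*x and 27*x computed independently: 5 shifts and 5 adds in total."""
    a = (x << 3) + (x << 2) + x                # 001101 * x: 2 shifts, 2 adds
    b = (x << 4) + (x << 3) + (x << 1) + x     # 011011 * x: 3 shifts, 3 adds
    return a, b


def mul13_27_shared(x):
    """Same products with the common term 001001 * x shared: 3 shifts and 3 adds."""
    t = (x << 3) + x       # 001001 * x = 9x: 1 shift, 1 add
    a = (x << 2) + t       # 13x = 4x + 9x:   1 shift, 1 add
    b = (t << 1) + t       # 27x = 18x + 9x:  1 shift, 1 add
    return a, b
```

Both functions return the same pair of products; the shared version simply reuses the intermediate value `t` instead of rebuilding its bits for each constant.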