Numerical Strength Reduction

Graduate Institute of Electronics Engineering, NTU
(Chapter 15 of Textbook)
For Advanced VLSI Design
11-27-2002
Prof. An-Yeu Wu (吳安宇), Department of Electrical Engineering, National Taiwan University
ACCESS IC LAB
Introduction
Numerical Transformation techniques
Re-structure the computation
Rely upon sub-expression elimination (sharing)
Advantages
Reduce the strength (signal level or data wordlength) of DSP computation
Improve the performance (power, speed, and area)
Sub-expression Elimination
Numerical transform technique
Sub-expression
Only performed on constant multiplications
Efficient implementation for “constant” multiplications
by dedicated shift-and-add multipliers
Example: [Sec. 15.2.1, p. 560]
a × x = 13 × x = 001101 × x
b × x = 27 × x = 011011 × x
Note: 000001 × x is the input signal itself (no additional computation)
Total: 5 shifts, 5 adds (2S, 2A for a and 3S, 3A for b)
Common term
a × x = 13 × x = 001101 × x = 001001 × x + 000100 × x
b × x = 27 × x = 011011 × x = 001001 × x + 010010 × x
The common term 001001 × x is computed once and shared.
Further improvement by sub-expression sharing
a × x = 13 × x = 000100 × x + 001001 × x   (2S, 2A)
b × x = 27 × x = 010010 × x + 001001 × x
      = ((001001 × x) << 1) + (001001 × x)   (1S, 1A)
Total 3S, 3A (2S, 2A and 1S, 1A, respectively)
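The sharing above can be sketched in a few lines. This is only an illustration: the function name `times_13_and_27` is hypothetical and not taken from the textbook. The term 001001 × x is formed once and reused, which gives the 3 shifts and 3 adds counted above.

```python
def times_13_and_27(x: int) -> tuple[int, int]:
    """Return (13*x, 27*x) using only shifts and adds."""
    nine_x = x + (x << 3)        # shared term 001001 * x: 1 shift, 1 add
    a = nine_x + (x << 2)        # 13*x = 9*x + 4*x:       1 shift, 1 add
    b = nine_x + (nine_x << 1)   # 27*x = 9*x + 18*x:      1 shift, 1 add
    return a, b
```

Without sharing, the two products would need 5 shifts and 5 adds in total.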
Multiple Constant Multiplication (MCM)
The algorithm for MCM uses an iterative matching
process that consists of the following steps:
1. Express each constant in the set using a binary format (such
as signed, unsigned, 2’s complement).
2. Determine the number of bit-wise matches (nonzero bits)
between all of the constants in the set.
3. Choose the best match.
4. Eliminate the redundancy from the best match. Return the
remainders and the redundancy to the set of coefficients.
5. Repeat Steps 2-4 until no further improvement is achieved.
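The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the textbook's implementation: constants are plain unsigned integers, a match is a bit position that is 1 in both constants, a match only counts as an improvement when at least 2 bits are shared, and ties go to the first pair found. The name `iterative_matching` and the `red(...)` labels are hypothetical.

```python
from itertools import combinations

def iterative_matching(constants):
    """constants: dict name -> unsigned int.

    Returns (terms, shared): the final set of terms (remainders plus
    extracted redundancies) and the list of (name1, name2, common) matches.
    """
    terms = dict(constants)
    shared = []
    while True:
        best = None
        # Step 2: count bit-wise matches between all pairs in the set.
        for (n1, v1), (n2, v2) in combinations(terms.items(), 2):
            common = v1 & v2
            score = bin(common).count("1")
            # Step 3: keep the best match (assumed: at least 2 shared bits).
            if score >= 2 and (best is None or score > best[0]):
                best = (score, n1, n2, common)
        if best is None:          # Step 5: no improvement left
            return terms, shared
        _, n1, n2, common = best
        # Step 4: remove the redundancy and return it to the set.
        terms[n1] ^= common
        terms[n2] ^= common
        terms[f"red({n1},{n2})"] = common
        shared.append((n1, n2, common))
```

For the constants 237, 182, and 93 this sketch first extracts 01001101 from a and c, then 10100000 from the remainder of a and b.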
Example of iterative matching process [Sec. 15.3.1, p. 561]
a × x = 237 × x
b × x = 182 × x
c × x = 93 × x

Constant   Value   Unsigned
a          237     11101101
b          182     10110110
c          93      01011101

Bit-wise matches: (a, b) = 3, (a, c) = 4, (b, c) = 2. The best match is (a, c).

After the 1st run:
Constant       Unsigned
Rem. of a      10100000
b              10110110
Rem. of c      00010000
Red. of a,c    01001101   (common term)

Bit-wise matches: (Rem. of a, b) = 2, (b, Rem. of c) = 1. The best match is (Rem. of a, b).

After the 2nd run:
Constant                             Unsigned
Rem. of a (2nd run)                  00000000
Rem. of b (1st run)                  00010110
Rem. of c (1st run)                  00010000
Red. of a,c (1st run)                01001101
Red. of Rem. of a and b (2nd run)    10100000

a = [01001101 + 10100000]
b = [00010110 + 10100000]
c = [01001101 + 00010000]
Cost of the terms: 01001101 needs 3S, 3A; 10100000 needs 2S, 1A; 00010110 needs 3S, 2A; 00010000 needs 1S.
This implementation requires 9 shifts and 9 adds.
The standard implementation requires 14 shifts and 13 adds.
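The shift and add totals can be recounted mechanically. The sketch below assumes the usual cost model for a shift-and-add multiplier: every nonzero bit except bit 0 costs one shift, and n nonzero bits cost n − 1 adds. The helper names are hypothetical.

```python
def shift_add_cost(pattern: int) -> tuple[int, int]:
    """(shifts, adds) needed to form pattern * x from x."""
    ones = bin(pattern).count("1")
    shifts = ones - (pattern & 1)   # bit 0 is x itself, no shift needed
    adds = max(ones - 1, 0)
    return shifts, adds

def total_cost(patterns, combining_adds=0):
    """Sum the cost of several terms plus the adds that combine them."""
    shifts = sum(shift_add_cost(p)[0] for p in patterns)
    adds = sum(shift_add_cost(p)[1] for p in patterns) + combining_adds
    return shifts, adds

# Standard implementation: each constant built on its own.
standard = total_cost([237, 182, 93])

# Shared implementation: four terms, plus one add each to form a, b, and c.
shared = total_cost(
    [0b01001101, 0b10100000, 0b00010110, 0b00010000], combining_adds=3
)
```

Here `standard` evaluates to (14, 13) and `shared` to (9, 9).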
Application in Linear Transformation
General form of linear transformation:
y = T ∗ x
T: m-by-n matrix
x: length-n vector
y: length-m vector
Equivalent form:
\[ y_i = \sum_{j=1}^{n} t_{ij} x_j, \quad i = 1, \ldots, m \]
Subexpression elimination:
Minimize the number of shifts and additions required to compute the products t_ij x_j
Linear Transformation (Matrix)
3 basic steps of strength reduction:
1. Minimize the number of shifts and adds by using the iterative matching algorithm
2. Form the unique products using the sub-expressions found in the 1st step
3. Share the adds in the final step
Example
\[
TX = \begin{bmatrix} 7 & 8 & 2 & 13 \\ 12 & 11 & 7 & 13 \\ 5 & 8 & 2 & 15 \\ 7 & 11 & 7 & 11 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
\]
Step 1 (unique terms of each column after iterative matching)
Column 1     Column 2     Column 3     Column 4
0101 (5)     1000 (8)     0010 (2)     1001 (9)
0010 (2)     1011 (11)    0111 (7)     0100 (4)
1100 (12)                              0010 (2)
Step 2 (unique products)
p1 = 0101 ∗ x1    p2 = 0010 ∗ x1    p3 = 1100 ∗ x1
p4 = 1000 ∗ x2    p5 = 1011 ∗ x2
p6 = 0010 ∗ x3    p7 = 0111 ∗ x3
p8 = 1001 ∗ x4    p9 = 0100 ∗ x4    p10 = 0010 ∗ x4
y1 = 7 x1 + 8 x2 + 2 x3 + 13 x4 = [p1 + p2] + p4 + p6 + [p8 + p9]
y2 = 12 x1 + 11 x2 + 7 x3 + 13 x4 = p3 + p5 + p7 + [p8 + p9]
y3 = 5 x1 + 8 x2 + 2 x3 + 15 x4 = p1 + p4 + p6 + [p8 + p9 + p10]
y4 = 7 x1 + 11 x2 + 7 x3 + 11 x4 = [p1 + p2] + p5 + p7 + [p8 + p10]
Step 3 (sharing of adds)
y1 = p2 + (p1 + p4 + p6 + p8 + p9)
y2 = p3 + p9 + (p5 + p7 + p8)
y3 = p10 + (p1 + p4 + p6 + p8 + p9)
y4 = p1 + p2 + p10 + (p5 + p7 + p8)
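The Step 3 expressions can be checked against the original matrix with a short sketch. The function name `transform_shared` is hypothetical, and the products p1 to p10 are written as plain multiplications for readability; in hardware each would be a shift-and-add network.

```python
def transform_shared(x1, x2, x3, x4):
    """Compute y1..y4 from the unique products and the two shared sums."""
    # Unique products from Step 2
    p1, p2, p3 = 5 * x1, 2 * x1, 12 * x1
    p4, p5 = 8 * x2, 11 * x2
    p6, p7 = 2 * x3, 7 * x3
    p8, p9, p10 = 9 * x4, 4 * x4, 2 * x4
    # Shared sums from Step 3
    s1 = p1 + p4 + p6 + p8 + p9   # used by y1 and y3
    s2 = p5 + p7 + p8             # used by y2 and y4
    return (p2 + s1, p3 + p9 + s2, p10 + s1, p1 + p2 + p10 + s2)

def transform_direct(x1, x2, x3, x4):
    """Reference: plain matrix-vector product with the original T."""
    T = [[7, 8, 2, 13], [12, 11, 7, 13], [5, 8, 2, 15], [7, 11, 7, 11]]
    x = (x1, x2, x3, x4)
    return tuple(sum(t * v for t, v in zip(row, x)) for row in T)
```

Both functions return the same four outputs for any integer input vector.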
Hardware complexity
a. Original circuit
y1 = 7 x1 + 8 x2 + 2 x3 + 13 x4       6S, 4A, 3A
y2 = 12 x1 + 11 x2 + 7 x3 + 13 x4     8S, 7A, 3A
y3 = 5 x1 + 8 x2 + 2 x3 + 15 x4       6S, 4A, 3A
y4 = 7 x1 + 11 x2 + 7 x3 + 11 x4      8S, 8A, 3A
Total: 28S, 35A
b. Modified circuit
y1 = p2 + (p1 + p4 + p6 + p8 + p9)      1S, (5S, 2A, 4A), 1A
y2 = p3 + p9 + (p5 + p7 + p8)           3S, 1A, (5S, 5A, 2A), 2A
y3 = p10 + (p1 + p4 + p6 + p8 + p9)     1S, 1A
y4 = p1 + p2 + p10 + (p5 + p7 + p8)     3S, 1A, 3A
Total: 18S, 22A
Application in Polynomial Evaluation
y = x^13 + x^7 + x^4 + x^2 + x
y = x^8 ∗ (x^4 ∗ x) + x^2 ∗ (x^4 ∗ x) + x^4 + x^2 + x
x^2 = x ∗ x
x^4 = x^2 ∗ x^2
x^8 = x^4 ∗ x^4
Requires 6 instead of 22 multiplications
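The multiplication count can be confirmed with a small sketch. The function name `eval_shared` and the counting helper `mul` are hypothetical. The shared sub-expression is x^4 ∗ x, which appears in both x^13 and x^7.

```python
def eval_shared(x):
    """Evaluate x^13 + x^7 + x^4 + x^2 + x and count the multiplications."""
    count = 0

    def mul(a, b):
        nonlocal count
        count += 1
        return a * b

    x2 = mul(x, x)
    x4 = mul(x2, x2)
    x8 = mul(x4, x4)
    x5 = mul(x4, x)                       # shared sub-expression x^4 * x
    y = mul(x8, x5) + mul(x2, x5) + x4 + x2 + x
    return y, count

# Direct evaluation needs (e - 1) multiplications for each power x^e.
direct_multiplications = sum(e - 1 for e in (13, 7, 4, 2, 1))
```

The shared form uses 6 multiplications, while `direct_multiplications` evaluates to 22.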
Exponents in binary: 13 = 1101, 7 = 0111, 4 = 0100, 2 = 0010, 1 = 0001
Subexpression sharing in Digital Filter
Representation
Canonic Signed Digit (CSD): no two adjacent nonzero digits (1 or −1)
Signed Power-of-Two (SPT): no CSD constraint
Notation:
(x >> i) = 2^(−i) ∗ x: shift-right-by-i-bits operation
Example: y = 0.101̄000101̄ ∗ x (1̄ denotes the digit −1)
y = (x >> 1) − (x >> 3) + (x >> 7) − (x >> 9)
x2 = x − (x >> 2)
y = (x2 >> 1) + (x2 >> 7)
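The equivalence of the direct and shared forms can be checked with exact arithmetic. In this sketch the shift is modeled as an exact division by a power of two (an idealization with no truncation), and the helper name `shr` is hypothetical.

```python
from fractions import Fraction

def shr(v: Fraction, i: int) -> Fraction:
    """Idealized right shift: v * 2**(-i), with no truncation."""
    return v / 2**i

x = Fraction(1)

# Direct form: 4 shifts and 3 add/subtract operations
direct = shr(x, 1) - shr(x, 3) + shr(x, 7) - shr(x, 9)

# Shared form: x2 = x - (x >> 2) is built once and used twice
x2 = x - shr(x, 2)
shared = shr(x2, 1) + shr(x2, 7)
```

Both expressions give the coefficient 195/512, and the shared form needs 3 shifts and 2 add/subtract operations.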
Application in N-tap Filter
y(n) = c0 x(n) + c1 x(n − 1) + ... + c(N−1) x(n − N + 1)
One variable is multiplied by multiple constant coefficients
Subexpression elimination can then be applied to this structure
Visualization Approach
4-tap FIR filter with CSD representation
y(n) = 1.01010000010 ∗ x(n) + 0.10001010101 ∗ x(n − 1)
+ 0.10010000010 ∗ x(n − 2) + 1.00000101000 ∗ x(n − 3)
Step 1.
[Figure: grid of the nonzero CSD digits (1 or −1) of the four coefficients; the rows are the time delays x(n), x(n − 1), x(n − 2), x(n − 3) and the columns are the shift numbers]
Define x2 = x1 − (x1[−1] >> (−1))
Step 2.
[Figure: the same grid with every occurrence of the x2 pattern replaced by an entry 2 or −2]
Define x3 = x2 + (x1 >> 2)
Step 3.
[Figure: the same grid with every occurrence of the x3 pattern replaced by an entry 3 or −3]
Step 4. Write out the complete definition of the filter (x1 = x(n); x1[−k] denotes x1 delayed by k samples)
x2 = x1 − (x1[−1] >> (−1))
x3 = x2 + (x1 >> 2)
y = −x1 + (x3 >> 2) + (x2 >> 10) − (x3[−1] >> 5) − (x2[−1] >> 11)
− (x2[−2] >> 1) + (x1[−3] >> 6) − (x1[−3] >> 8)
How to deal with the negative shift?
Step 5.
Modify as follows (apply >> 1 to both sides of the definitions of x2 and x3, and apply >> (−1) to every x2 and x3 term in the sum y)
x2 = (x1 >> 1) − x1[−1]
x3 = x2 + (x1 >> 3)
y = −x1 + (x3 >> 1) + (x2 >> 9) − (x3[−1] >> 4) − (x2[−1] >> 10)
− x2[−2] + (x1[−3] >> 6) − (x1[−3] >> 8)
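The Step 4 and Step 5 formulations can be compared by expanding both into tap weights. The sketch below is an illustration under stated assumptions: a signal is a dictionary mapping a delay k to the exact weight of x(n − k); the helper names `shr`, `delay`, and `combine` are hypothetical; x3 is taken as x2 + (x1 >> 2), as written in Step 4; and the x2[−2] term carries no shift in the Step 5 form, which is what removing one shift from every x2 term implies.

```python
from fractions import Fraction

def shr(s, i):
    """Shift right by i bits (a negative i is a left shift)."""
    return {d: c * Fraction(2) ** (-i) for d, c in s.items()}

def delay(s, k):
    """Delay the signal by k samples."""
    return {d + k: c for d, c in s.items()}

def combine(*signed_terms):
    """Sum of (sign, signal) pairs; zero weights are dropped."""
    out = {}
    for sign, s in signed_terms:
        for d, c in s.items():
            out[d] = out.get(d, 0) + sign * c
    return {d: c for d, c in out.items() if c != 0}

x1 = {0: Fraction(1)}

# Step 4 form (contains a negative shift)
x2 = combine((1, x1), (-1, shr(delay(x1, 1), -1)))
x3 = combine((1, x2), (1, shr(x1, 2)))
y_step4 = combine(
    (-1, x1), (1, shr(x3, 2)), (1, shr(x2, 10)),
    (-1, shr(delay(x3, 1), 5)), (-1, shr(delay(x2, 1), 11)),
    (-1, shr(delay(x2, 2), 1)),
    (1, shr(delay(x1, 3), 6)), (-1, shr(delay(x1, 3), 8)),
)

# Step 5 form (no negative shift)
x2 = combine((1, shr(x1, 1)), (-1, delay(x1, 1)))
x3 = combine((1, x2), (1, shr(x1, 3)))
y_step5 = combine(
    (-1, x1), (1, shr(x3, 1)), (1, shr(x2, 9)),
    (-1, shr(delay(x3, 1), 4)), (-1, shr(delay(x2, 1), 10)),
    (-1, delay(x2, 2)),
    (1, shr(delay(x1, 3), 6)), (-1, shr(delay(x1, 3), 8)),
)
```

Under these assumptions the two dictionaries are identical, so the rescaled definitions describe the same four tap weights.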
Implementation using Transpose Form
Example 15.4.4 (p. 570)
y(n) = c0 x(n) + c1 x(n − 1) + c2 x(n − 2)
     = 0.00101011 × x(n) + 0.10011010 × x(n − 1) + 0.11010010 × x(n − 2)
y = (x1 >> 3) + (x1 >> 5) + (x1 >> 7) + (x1 >> 8)
  + (x2[−1] >> 1) + (x1[−1] >> 5)
  + (x2[−2] >> 1) + (x1[−2] >> 2)
Hint: (1) set x2 = x1 + (x1 >> 3) + (x1 >> 6)
(2) x2[−1] >> 1 = delay x2, then shift = shift x2, then delay
    x2[−2] >> 1 = double-delay x2, then shift = shift x2, then double-delay
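The decomposition in this example can be checked coefficient by coefficient. In this sketch shifts are exact divisions by powers of two, and the helper names `frac` and `shr` are hypothetical. Because delay and shift commute, only the weights need to be compared.

```python
from fractions import Fraction

def frac(bits: str) -> Fraction:
    """Value of the binary fraction 0.bits."""
    return Fraction(int(bits, 2), 2 ** len(bits))

def shr(v: Fraction, i: int) -> Fraction:
    """Idealized right shift by i bits."""
    return v / 2**i

x1 = Fraction(1)
x2 = x1 + shr(x1, 3) + shr(x1, 6)          # shared sub-expression

c0 = shr(x1, 3) + shr(x1, 5) + shr(x1, 7) + shr(x1, 8)
c1 = shr(x2, 1) + shr(x1, 5)               # weight of x(n - 1)
c2 = shr(x2, 1) + shr(x1, 2)               # weight of x(n - 2)
```

The three weights equal 0.00101011, 0.10011010, and 0.11010010, and the term x2 >> 1 is shared by the last two taps.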
Implementation: Sharing Method I
Implementation: Sharing Method II
Carry-save adders can be applied.
CSD representation
Canonic Signed Digit (CSD)
Less hardware using a CSD representation
The 2 most common subexpressions are (1̄ denotes the digit −1):
101 = x + (x >> 2)
101̄ = x − (x >> 2)
A W-bit CSD number can be broken down into:
W/18 + O(1) pairs of type 101
W/18 + O(1) pairs of type 101̄
W/9 + O(1) isolated 1 or 1̄
Advantage in CSD
The circuit can be obtained easily by finding the occurrences of only 2 sub-expressions
33% saving compared with the total number of nonzero bits
This architecture can be realized without a major increase in routing cost.
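For experimenting with CSD coefficients, the digits can be generated with the standard non-adjacent-form recoding. This is a generic sketch for nonnegative integers, not a procedure taken from the slides, and the function name `to_csd` is hypothetical. Fractional coefficients can be handled by scaling them to integers first.

```python
def to_csd(n: int) -> list[int]:
    """CSD digits of a nonnegative integer, most significant digit first.

    Each digit is 1, 0, or -1, and no two adjacent digits are nonzero.
    """
    digits = []                      # collected least significant first
    while n != 0:
        if n & 1:
            d = 2 - (n % 4)          # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits[::-1]

def nonzero_digits(n: int) -> int:
    """Number of nonzero CSD digits, i.e. the number of shifted terms."""
    return sum(1 for d in to_csd(n) if d != 0)
```

For example, 27 = 011011 in binary has 4 nonzero bits, while its CSD form 1 0 0 −1 0 −1 (32 − 4 − 1) has 3.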
Example in CSD
Example 15.4.5 (p. 572): a 3-tap FIR filter
c0 = 0.10101010000
c1 = 0.10010100101
c2 = 0.10101010101
[Figure: filter implementation built from the two shared sub-expressions of type 101]
Further hardware reduction by the sharing methodology
[Figure: filter implementation in which the sub-expressions of type 101 are shared]
Number Splitting
Numerical transform
Reduces the hardware cost or power consumption
Performed on the infinite-precision version of the constant coefficients
Strength reduction at a higher level
Types: additive and multiplicative
Representation of Number Splitting
Y = TX
Y: N-dimensional vector
T: N-by-M matrix
X: M-dimensional vector, M = J + K
Row-Based Additive Number Splitting
\[ T = W \ast \begin{bmatrix} T' \\ AG \end{bmatrix} \]
δ = t_{i,q} − t_{i,p}
In order to equate T' and T, 2 additional matrices must be generated:
AG = [0 ... 0  t_{i,p}  0  t_{i,p}  0 ... 0]
W is initially set to an identity matrix of dimension N-by-N
Example
Step 1.
\[
\begin{bmatrix} y_0(n+1) \\ y_1(n+1) \\ y_2(n+1) \end{bmatrix}
=
\begin{bmatrix} .4 & .7 & .4 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & .2 & .7 \end{bmatrix}
\begin{bmatrix} y_0(n) \\ y_1(n) \\ y_2(n) \\ x_0(n) \end{bmatrix}
\]
\[
T' = \begin{bmatrix} 0 & .7 & 0 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & .2 & .7 \end{bmatrix}, \quad
W = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad
AG = \begin{bmatrix} .4 & 0 & .4 & 0 \end{bmatrix}
\]
Step 2.
\[
T = W \ast \begin{bmatrix} T' \\ AG \end{bmatrix}: \quad
\begin{bmatrix} .4 & .7 & .4 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & .2 & .7 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & .7 & 0 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & .2 & .7 \\ .4 & 0 & .4 & 0 \end{bmatrix}
\]
Originally
y0(n + 1) = .4 y0(n) + .7 y1(n) + .4 y2(n) + .8 x0(n)
4M, 3A
Transform
y0(n + 1) = .4 (y0(n) + y2(n)) + .7 y1(n) + .8 x0(n)
3M, 3A
Step 3.
\[
\begin{bmatrix} .4 & .7 & .4 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & .2 & .7 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & .7 & 0 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & 0 & .5 \\ .4 & 0 & .4 & 0 \\ 0 & 0 & .2 & .2 \end{bmatrix}
\]
\[
T' = \begin{bmatrix} 0 & .7 & 0 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & 0 & .5 \end{bmatrix}, \quad
W = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 \end{bmatrix}, \quad
AG = \begin{bmatrix} .4 & 0 & .4 & 0 \\ 0 & 0 & .2 & .2 \end{bmatrix}
\]
Originally
y2(n + 1) = .5 y0(n) + .3 y1(n) + .2 y2(n) + .7 x0(n)
4M, 3A
Transform
y2(n + 1) = .5 y0(n) + .3 y1(n) + .2 (y2(n) + x0(n)) + .5 x0(n)
4M, 4A (causes the hardware cost to grow)
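Both splits can be verified numerically. The sketch below uses exact fractions to avoid rounding; the helper names `mat` and `matmul` and the variable names are hypothetical. It checks that W times the stacked matrix [T'; AG] reproduces T after the first split (row 0) and after the second split (row 2).

```python
from fractions import Fraction as F

def mat(rows):
    """Build a matrix of exact fractions from whitespace-separated rows."""
    return [[F(v) for v in row.split()] for row in rows]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

T = mat(["0.4 0.7 0.4 0.8", "0.3 0.6 0.2 0.9", "0.5 0.3 0.2 0.7"])

# After splitting row 0: AG = [.4 0 .4 0]
W_1 = mat(["1 0 0 1", "0 1 0 0", "0 0 1 0"])
stack_1 = mat(["0 0.7 0 0.8", "0.3 0.6 0.2 0.9", "0.5 0.3 0.2 0.7",
               "0.4 0 0.4 0"])

# After also splitting row 2: a second AG row [0 0 .2 .2]
W_2 = mat(["1 0 0 1 0", "0 1 0 0 0", "0 0 1 0 1"])
stack_2 = mat(["0 0.7 0 0.8", "0.3 0.6 0.2 0.9", "0.5 0.3 0 0.5",
               "0.4 0 0.4 0", "0 0 0.2 0.2"])
```

Both products equal T, so each split changes only how the rows are computed, not the result.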
Column-Based Additive Number Splitting
\[
\begin{bmatrix} y_0(n+1) \\ y_1(n+1) \\ y_2(n+1) \end{bmatrix}
=
\begin{bmatrix} .4 & .7 & .4 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & .2 & .7 \end{bmatrix}
\begin{bmatrix} y_0(n) \\ y_1(n) \\ y_2(n) \\ x_0(n) \end{bmatrix}
\]
\[
T' = \begin{bmatrix} .4 & .7 & .4 & .8 \\ .3 & .6 & 0 & .9 \\ .5 & .3 & 0 & .7 \end{bmatrix}, \quad
W = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}, \quad
AG = \begin{bmatrix} 0 & 0 & .2 & 0 \end{bmatrix}
\]
\[
\begin{bmatrix} .4 & .7 & .4 & .8 \\ .3 & .6 & .2 & .9 \\ .5 & .3 & .2 & .7 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} .4 & .7 & .4 & .8 \\ .3 & .6 & 0 & .9 \\ .5 & .3 & 0 & .7 \\ 0 & 0 & .2 & 0 \end{bmatrix}
\]
y1(n + 1) = .3 y0(n) + .6 y1(n) + .2 y2(n) + .9 x0(n)
y2(n + 1) = .5 y0(n) + .3 y1(n) + .2 y2(n) + .7 x0(n)
Transform
.2 y2(n) is computed once and shared by y1(n + 1) and y2(n + 1)
Comparison
Difference
Row-based number splitting: single output
Column-based number splitting: multiple outputs
Optimization
Number splitting is performed in conjunction with an optimization procedure
During each iteration, the number splitting that leads to the largest reduction in the cost function is applied
Multiplicative Number Splitting
Y = TX
The product T_{i−1} T_i remains unchanged if, for some constant K,
(1) the j-th row of T_i is transformed as
Row(j) = Row(j) − K ∗ Row(k)
(2) the k-th column of T_{i−1} is transformed as
Col(k) = Col(k) + K ∗ Col(j)
where j and k denote arbitrary valid, distinct row and column indices.
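The invariance can be demonstrated with a small sketch. The row update on T_i is multiplication from the left by I − K e_j e_k^T, whose inverse is I + K e_j e_k^T, so the compensating update on T_{i−1} is written here as an addition to column k. The helper names `matmul` and `split_pair` are hypothetical, indices are 0-based, and the 3-by-3 matrix is only a test input.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def split_pair(A, B, j, k, K):
    """Return (A', B') with A' B' == A B.

    Row j of B loses K times row k of B;
    column k of A gains K times column j of A.
    """
    B_new = [row[:] for row in B]
    B_new[j] = [bj - K * bk for bj, bk in zip(B[j], B[k])]
    A_new = [row[:] for row in A]
    for r in range(len(A)):
        A_new[r][k] = A[r][k] + K * A[r][j]
    return A_new, B_new

T = [[F(v) for v in row.split()]
     for row in ("0.4 0.7 0.4", "0.6 0.8 0.2", "0.1 0.3 0.2")]
I3 = [[F(int(r == c)) for c in range(3)] for r in range(3)]

T1, T2 = split_pair(I3, T, j=1, k=2, K=1)   # row 1 -= row 2
T1, T2 = split_pair(T1, T2, j=0, k=2, K=2)  # row 0 -= 2 * row 2
```

After the two operations the product T1 T2 is still T, while two entries of T2 have become zero.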
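T = T_1 T_2, where T_1 is initially the identity:
\[
T = \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \\ a_{4,1} & a_{4,2} & a_{4,3} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \\ a_{4,1} & a_{4,2} & a_{4,3} \end{bmatrix}
\]
Operation ∗(3) on T_1 and ∗(−3) on T_2: column 2 of T_1 gains 3 × column 1, and row 1 of T_2 loses 3 × row 2.
\[
T = \begin{bmatrix} 1 & 3 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{1,1} - 3a_{2,1} & a_{1,2} - 3a_{2,2} & a_{1,3} - 3a_{2,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \\ a_{4,1} & a_{4,2} & a_{4,3} \end{bmatrix}
\]
Operation ∗(1) on T_1 and ∗(−1) on T_2: column 3 of T_1 gains column 2, and row 2 of T_2 loses row 3.
\[
T = \begin{bmatrix} 1 & 3 & 3 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{1,1} - 3a_{2,1} & a_{1,2} - 3a_{2,2} & a_{1,3} - 3a_{2,3} \\ a_{2,1} - a_{3,1} & a_{2,2} - a_{3,2} & a_{2,3} - a_{3,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \\ a_{4,1} & a_{4,2} & a_{4,3} \end{bmatrix}
\]
T = T_1 T_2 T_3, where T_3 is initially the identity:
\[
T = \begin{bmatrix} 1 & 3 & 3 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{1,1} - 3a_{2,1} & a_{1,2} - 3a_{2,2} & a_{1,3} - 3a_{2,3} \\ a_{2,1} - a_{3,1} & a_{2,2} - a_{3,2} & a_{2,3} - a_{3,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \\ a_{4,1} & a_{4,2} & a_{4,3} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Operation ∗(1) on T_2 and ∗(−1) on T_3: column 1 of T_2 gains column 3, and row 3 of T_3 loses row 1.
\[
T = \begin{bmatrix} 1 & 3 & 3 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{1,1} - 3a_{2,1} + a_{1,3} - 3a_{2,3} & a_{1,2} - 3a_{2,2} & a_{1,3} - 3a_{2,3} \\ a_{2,1} - a_{3,1} + a_{2,3} - a_{3,3} & a_{2,2} - a_{3,2} & a_{2,3} - a_{3,3} \\ a_{3,1} + a_{3,3} & a_{3,2} & a_{3,3} \\ a_{4,1} + a_{4,3} & a_{4,2} & a_{4,3} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}
\]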
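Example (p. 582)
\[ T = \begin{bmatrix} .4 & .7 & .4 \\ .6 & .8 & .2 \\ .1 & .3 & .2 \end{bmatrix} \]
T = T_1 T_2, where T_1 is initially the identity:
\[
T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .4 & .7 & .4 \\ .6 & .8 & .2 \\ .1 & .3 & .2 \end{bmatrix}
\]
∗(1) on T_1 (column 3 gains column 2), ∗(−1) on T_2 (row 2 loses row 3):
\[
T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .4 & .7 & .4 \\ .5 & .5 & 0 \\ .1 & .3 & .2 \end{bmatrix}
\]
∗(2) on T_1 (column 3 gains 2 × column 1), ∗(−2) on T_2 (row 1 loses 2 × row 3):
\[
T = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .2 & .1 & 0 \\ .5 & .5 & 0 \\ .1 & .3 & .2 \end{bmatrix}
\]
T = T_1 T_2 T_3, where T_3 is initially the identity:
\[
T = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .2 & .1 & 0 \\ .5 & .5 & 0 \\ .1 & .3 & .2 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
∗(−1) on T_2 (column 1 loses column 2), ∗(1) on T_3 (row 2 gains row 1):
\[
T = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .1 & .1 & 0 \\ 0 & .5 & 0 \\ -.2 & .3 & .2 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
∗(−2) on T_1 (column 1 loses 2 × column 3), ∗(2) on T_2 (row 3 gains 2 × row 1):
\[
T = \begin{bmatrix} -3 & 0 & 2 \\ -2 & 1 & 1 \\ -2 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .1 & .1 & 0 \\ 0 & .5 & 0 \\ 0 & .5 & .2 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
∗(1) on T_1 (column 2 gains column 3), ∗(−1) on T_2 (row 3 loses row 2):
\[
T = \begin{bmatrix} -3 & 2 & 2 \\ -2 & 2 & 1 \\ -2 & 1 & 1 \end{bmatrix}
\begin{bmatrix} .1 & .1 & 0 \\ 0 & .5 & 0 \\ 0 & 0 & .2 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
∗(−1) on T_2 (column 1 loses column 2), ∗(1) on T_3 (row 2 gains row 1):
\[
T = \begin{bmatrix} -3 & 2 & 2 \\ -2 & 2 & 1 \\ -2 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 0 & .1 & 0 \\ -.5 & .5 & 0 \\ 0 & 0 & .2 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Originally: 9 nontrivial multiplications
After the transform: 2 nontrivial multiplications, at the expense of a few additions and shift operations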
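The final factorization can be checked with exact arithmetic. In this sketch the helper names are hypothetical, and a multiplication is assumed to be trivial when the coefficient is zero or a signed power of two (a pure shift). The count is applied only to T and to the middle factor T_2, since the integer entries of T_1 and T_3 are handled by the additions and shifts mentioned above.

```python
from fractions import Fraction as F

def mat(rows):
    """Build a matrix of exact fractions from whitespace-separated rows."""
    return [[F(v) for v in row.split()] for row in rows]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0

def nontrivial_multiplications(m) -> int:
    """Count entries that are neither zero nor a signed power of two."""
    count = 0
    for row in m:
        for v in row:
            a = abs(v)
            shift_only = (a.denominator == 1 and is_power_of_two(a.numerator)) \
                or (a.numerator == 1 and is_power_of_two(a.denominator))
            if a != 0 and not shift_only:
                count += 1
    return count

T = mat(["0.4 0.7 0.4", "0.6 0.8 0.2", "0.1 0.3 0.2"])
T1 = mat(["-3 2 2", "-2 2 1", "-2 1 1"])
T2 = mat(["0 0.1 0", "-0.5 0.5 0", "0 0 0.2"])
T3 = mat(["1 0 0", "2 1 0", "0 0 1"])

product = matmul(matmul(T1, T2), T3)
```

The product equals T; the original matrix has 9 nontrivial entries, while T2 has only 2 (the coefficients .1 and .2).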
Conclusions
Strength reduction can help to reduce the number of operations as well as the number of bits.
The concept can be applied to digital filter designs and other DSP problems (e.g., adaptive filters):
Reducing the signal strength helps to reduce the assigned wordlength.
Lower switching activity lowers the power consumption.