Formal Bit Width Determination for Nested Loop Programs
David Cachera,
Tanguy Risset,
Djamel Zegaoui
Tanguy Risset
Outline
• Introduction/motivation
• Explaining the methodology
• Solving the Bit Width equation with
(max,+)
Context and Motivations
• Context:
– High level synthesis (hardware compilation
from functional specification)
– How to go (safely) from algorithmic
description to finite precision implementation
• Specific motivations:
– Parameterized loop nest programs
– MMAlpha methodology
Context and Motivations: MMAlpha
[Figure: MMAlpha design flow, with stages labeled Alpha, Uniformization, Scheduling/Mapping, RTL Derivation and VHDL, targeting FPGA or ASIC]
• Provide a formal methodology based on the
strong semantic properties of the Alpha language
• While remaining applicable to effective VHDL
generation
BW determination: state of the art
• Formal methods:
– Provide an abstract framework for solving the
problem (Gaut, Ptolemy, DeepC)
– Limited applicability
• Simulation-based methods:
– Based on probabilistic models for input data
(Ptolemy, Imec, etc.)
– Time-consuming processes
• Ideally: provide formal methods to speed up
the simulation.
Our methodology
• Start from loop nest specification (in Alpha)
• Schedule and Place (SIMD-like specification)
• Bit Width determination:
– problem modeling
– BW equation generation
– BW equation solving
• Hardware generation (VHDL)
Example: the FIR filter
$$\forall n \ge N-1,\quad y(n) = \sum_{i=0}^{N-1} x(n-i)\, w(i)$$
system fir : {N,M | 3<=N<=M-1}
       (x : {n | 1<=n<=M} of integer;
        w : {i | 0<=i<=N-1} of integer)
returns
       (res : {n | N<=n<=M} of integer);
var
  Y : {n,i | N<=n<=M; -1<=i<=N-1} of integer;
let
  Y[n,i] = case
      { | i=-1} : 0[];
      { | 0<=i} : Y[n,i-1] + w[i] * x[n-i];
    esac;
  res[n] = Y[n,N-1];
tel;
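As a cross-check of the recurrence, here is a minimal Python sketch of the same computation. The function name `fir` and the convention that `x[0]` is an unused placeholder (so that `x` is indexed from 1, as in the Alpha system) are assumptions made for this illustration; they are not part of Alpha or MMAlpha.

```python
def fir(x, w, M):
    """Evaluate res[n] = Y[n, N-1] for N <= n <= M.

    x is indexed 1..M (x[0] is an unused placeholder), w is indexed 0..N-1.
    """
    N = len(w)
    res = {}
    for n in range(N, M + 1):
        y = 0                      # Y[n, -1] = 0
        for i in range(N):         # Y[n, i] = Y[n, i-1] + w[i] * x[n-i]
            y += w[i] * x[n - i]
        res[n] = y                 # res[n] = Y[n, N-1]
    return res


x = [None, 1, 2, 3, 4, 5]   # x[1..5], so M = 5
w = [1, 0, -1]              # N = 3, which satisfies 3 <= N <= M-1
print(fir(x, w, M=5))       # {3: 2, 4: 2, 5: 2}
```

The inner loop plays the role of the index i in the Alpha variable Y, and only the last value of the accumulation is kept as the result.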
Problem modeling: error signal
• « Formal » signal s(n), implementation š(n)
• Noise signal: e(n)=s(n)- š(n)
• Noise standard deviation $\sigma_s$:
$$\sigma_s^2 = \frac{1}{M} \sum_{i=1}^{M} \left( e_s(i) - \frac{1}{M} \sum_{j=1}^{M} e_s(j) \right)^2$$
• Signal to Noise ratio (SNR):
$$R_s = 10 \log_{10} \left( \frac{1}{\sigma_s^2} \right)$$
• Good bit width if $R_s$ is greater than a given
value
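The two quantities on this slide can be measured on simulated data. The following Python sketch is one possible way to do so; the function names `noise_variance` and `snr_db` and the sample values are hypothetical and chosen only so that the arithmetic is exact.

```python
import math


def noise_variance(exact, implemented):
    """Variance of the error signal e(n) = s(n) - s_hat(n) over M samples."""
    errors = [s - s_hat for s, s_hat in zip(exact, implemented)]
    mean = sum(errors) / len(errors)
    return sum((e - mean) ** 2 for e in errors) / len(errors)


def snr_db(exact, implemented):
    """R_s = 10 log10(1 / sigma_s^2); undefined when the variance is zero."""
    return 10 * math.log10(1 / noise_variance(exact, implemented))


exact = [0.5, 0.25, 0.75, 1.0]
implemented = [0.5, 0.125, 0.75, 0.875]    # hypothetical finite precision values
print(noise_variance(exact, implemented))  # 0.00390625, i.e. 2**-8
print(round(snr_db(exact, implemented), 2))
```

A bit width would then be accepted when the value returned by `snr_db` exceeds the required threshold.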
Operators modeling [Tou99]
• Let X be a signal encoded on m+n+1 bits
$b_m \dots b_0 \,.\, b_{-1} \dots b_{-n+1}\, b_{-n}$
• Generated error: $\sigma_X^2 = \frac{q^2}{12}$ where $q = 2^{-n}$
• Error propagation:
– Addition: $\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$
– Multiplication: $\sigma_{X*Y}^2 = Y_{max}^2\, \sigma_X^2 + X_{max}^2\, \sigma_Y^2$
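The three rules of this operator model translate directly into small helper functions. This is a sketch only: the function names are hypothetical, and `x_max` and `y_max` stand for bounds on the magnitude of the two operands, which the caller is assumed to know.

```python
def quantization_variance(n):
    """Variance q^2/12 generated by coding a signal with n fractional bits."""
    q = 2.0 ** -n
    return q * q / 12


def add_variance(var_x, var_y):
    """Error variance of X + Y."""
    return var_x + var_y


def mul_variance(var_x, var_y, x_max, y_max):
    """Error variance of X * Y, given bounds on |X| and |Y|."""
    return y_max ** 2 * var_x + x_max ** 2 * var_y


v = quantization_variance(8)              # q = 2**-8
print(v)
print(add_variance(v, v))                 # twice the variance of one operand
print(mul_variance(v, v, x_max=1.0, y_max=1.0))
```

With both operands bounded by 1, a multiplication propagates exactly as much error variance as an addition under this model.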
Architectural description in Alpha
W[t,p] = case
{ | t=p+1} : w[t-1];
{ | p+2<=t} : W[t-1,p];
esac;
XP[t,p] = case
{ | p=0} : x[t+N-1];
{ | 1<=p} : XP[t-2,p-1];
esac;
Y[t,p] = case
{ | p=-1} : 0[];
{ | 0<=p} : Y[t-1,p-1] +
W[t-1,p] * XP[t-1,p];
esac;
Generation of BW equation
• Simple projection of the Alpha equations on
space (p index), writing $BW_A = \sigma_A^2$:
W[t,p] = case
{ | t=p+1} : w[t-1];
{ | p+2<=t} : W[t-1,p];
esac;
XP[t,p] = case
{ | p=0} : x[t+N-1];
{ | 1<=p} : XP[t-2,p-1];
esac;
Y[t,p] = case
{ | p=-1} : 0[];
{ | 0<=p} : Y[t-1,p-1] +
W[t-1,p] * XP[t-1,p];
esac;
BW_W[p] = Max(BW_w[],
              BW_W[p])
BW_XP[p] = case
    { | p=0} : BW_x[];
    { | 1<=p} : BW_XP[p-1];
  esac;
BW_Y[p] = case
    { | p=-1} : 0[];
    { | 0<=p} : q^2/12 + max(BW_Y[p-1] + q^2/12,
                             BW_W*XP[p] + q^2/12);
  esac;
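The equation for Y can be evaluated processor by processor. The sketch below does this with exact fractions, expressing every variance in units of q^2. It assumes that the error variance of the product W*XP is the same on every processor and defaults it to 0 (exact inputs); the names `bw_y_profile` and `bw_product` are hypothetical.

```python
from fractions import Fraction

Q2_OVER_12 = Fraction(1, 12)  # q^2/12, expressed in units of q^2


def bw_y_profile(num_processors, bw_product=Fraction(0)):
    """Return [BW_Y[0], ..., BW_Y[num_processors-1]] in units of q^2."""
    profile = []
    bw_prev = Fraction(0)  # BW_Y[-1] = 0
    for _ in range(num_processors):
        bw = Q2_OVER_12 + max(bw_prev + Q2_OVER_12, bw_product + Q2_OVER_12)
        profile.append(bw)
        bw_prev = bw
    return profile


print([str(v) for v in bw_y_profile(4)])  # ['1/6', '1/3', '1/2', '2/3']
```

Under these assumptions the variance grows by q^2/6 per processor, so BW_Y[p] = (p+1) q^2/6.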
Solving the BW equations (FIR)
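• Here the solution can easily be provided by
a symbolic solver ($q = 2^{-n}$):
$$\forall p,\quad \sigma_X^2(p) = \sigma_x^2 = 0$$
$$\forall p,\quad \sigma_W^2(p) = \sigma_w^2 = 0$$
$$\forall p,\quad \sigma_Y^2(p) = (p+1)\, \frac{q^2}{6}$$
$$SNR \ge R \iff 10 \log_{10} \left( \frac{1}{N \frac{q^2}{6}} \right) \ge R \iff n \ge \frac{1}{2} \log_2 \frac{N}{6} + \frac{R}{20} \log_2 10$$
• The required number of fractional bits n therefore grows as $\ln(N)$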
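The closed form for the FIR can be turned into a small calculator for the smallest number of fractional bits n that meets a target SNR. This is a sketch under the slide's own model (output noise variance N q^2/6 with q = 2^-n); the function names are hypothetical.

```python
import math


def fir_snr_db(N, n):
    """SNR predicted by the model: 10 log10(1 / (N q^2 / 6)), q = 2**-n."""
    q2 = 2.0 ** (-2 * n)
    return 10 * math.log10(6 / (N * q2))


def min_fractional_bits(N, R_db):
    """Smallest integer n with n >= 0.5*log2(N/6) + (R/20)*log2(10)."""
    bound = 0.5 * math.log2(N / 6) + (R_db / 20) * math.log2(10)
    return max(0, math.ceil(bound))


for N in (3, 10, 50):
    n = min_fractional_bits(N, 30)
    print(N, n, round(fir_snr_db(N, n), 1))
```

Because N appears only inside a logarithm, multiplying the filter length by 4 costs one additional bit.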
Solving the BW equations...
• In general, we solve successively the
strongly connected components (SCC) of the
reduced dependence graph
[Figure: two reduced dependence graphs. FIR: inputs feeding X and W, plus Y (3 SCC). Other example: V1, V2 and V3 (1 SCC)]
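The solving order can be obtained with any SCC algorithm. The sketch below uses Tarjan's algorithm, which is a choice made for this illustration and is not stated on the slides. The dictionary `deps` maps each variable to the variables it depends on; the input nodes are left out, and the two graphs are transcribed from the two examples of this talk.

```python
def strongly_connected_components(deps):
    """Tarjan's algorithm. SCCs are returned dependencies first,
    which is the order in which their BW equations can be solved."""
    index, low = {}, {}
    stack, on_stack = [], set()
    sccs = []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in deps.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            sccs.append(sorted(component))

    for v in deps:
        if v not in index:
            visit(v)
    return sccs


fir = {"Y": ["Y", "W", "X"], "W": ["W"], "X": ["X"]}
other = {"V1": ["V1", "V3"], "V2": ["V2", "V3"], "V3": ["V1", "V2"]}
print(strongly_connected_components(fir))    # [['W'], ['X'], ['Y']]
print(strongly_connected_components(other))  # [['V1', 'V2', 'V3']]
```

For the FIR, W and X are solved before Y; in the second example the three variables have to be solved together.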
Solving BW Eq for 1 SCC
[Figure: reduced dependence graph with nodes input, V1, V2 and V3]
V1[t,p] = case
{ | p=0} : Input[];
{ | p>=1} : V1[t-1,p-1] + V3[t-2,p-1];
esac;
V2[t,p] = case
{ | p=0} : Input[];
{ | 1<=p} : V2[t-2,p-1] +
V3[t-1,p-1];
esac;
V3[t,p] = case
{ | p=0} : Input[];
{ | 1<=p} : V1[t-1,p-1] +
V2[t-3,p-1];
esac;
BW_V1[p] = case
{ | p=0} : 0;
{ | p>=1} : max(BW_V1[p-1] + α,
BW_V3[p-1] + α);
esac;
BW_V2[p] = case
{ | p=0} : 0;
{ | 1<=p} : max(BW_V2[p-1] + α,
BW_V3[p-1] + α);
esac;
BW_V3[p] = case
{ | p=0} : 0;
{ | 1<=p} : max(BW_V1[p-1] + α,
BW_V2[p-1] + α);
esac;
Solving the BW equations...
• General form (under some assumptions) of
the BW equation for one SCC with k
variables (for i=1..k):
$$BW_i(p) = Max\left( BW_1(p-1) + \alpha_1, \dots, BW_k(p-1) + \alpha_k, 0 \right)$$
• Example:
$$BW_1(p) = Max\left( BW_1(p-1) + \alpha,\ BW_3(p-1) + \alpha,\ 0 \right)$$
$$BW_2(p) = Max\left( BW_2(p-1) + \alpha,\ BW_3(p-1) + \alpha,\ 0 \right)$$
$$BW_3(p) = Max\left( BW_1(p-1) + \alpha,\ BW_2(p-1) + \alpha,\ 0 \right)$$
Using (max,+) notations
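• $\oplus$ is the max and $\otimes$ is the addition
$$BW_1(p) = (BW_1(p-1) \otimes \alpha) \oplus (BW_3(p-1) \otimes \alpha) \oplus 0$$
$$BW_2(p) = (BW_2(p-1) \otimes \alpha) \oplus (BW_3(p-1) \otimes \alpha) \oplus 0$$
$$BW_3(p) = (BW_1(p-1) \otimes \alpha) \oplus (BW_2(p-1) \otimes \alpha) \oplus 0$$
• Or:
$$BW(p) = M \otimes BW(p-1) \oplus 0$$
$$\text{where } M = \begin{pmatrix} \alpha & \varepsilon & \alpha \\ \varepsilon & \alpha & \alpha \\ \alpha & \alpha & \varepsilon \end{pmatrix} \text{ and } BW = \begin{pmatrix} BW_1 \\ BW_2 \\ BW_3 \end{pmatrix}$$
($\varepsilon = -\infty$ is the neutral element of $\oplus$)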
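The matrix form can be iterated numerically. The following Python sketch represents the max-plus zero by `float("-inf")` and assumes that every constant of the example is the same value `alpha`; the numeric value 0.5 and all function names are hypothetical and serve only the illustration.

```python
EPS = float("-inf")  # neutral element of max, i.e. the max-plus "zero"


def maxplus_matvec(M, v):
    """(M (x) v)_i = max_j (M[i][j] + v[j])."""
    return [max(m + x for m, x in zip(row, v)) for row in M]


def bw_profile(M, bw0, p_max):
    """Iterate BW(p) = max(M (x) BW(p-1), 0) for p = 1..p_max."""
    bw = list(bw0)
    profile = [bw]
    for _ in range(p_max):
        bw = [max(x, 0) for x in maxplus_matvec(M, bw)]
        profile.append(bw)
    return profile


alpha = 0.5  # hypothetical value of the per-step error increment
M = [[alpha, EPS, alpha],
     [EPS, alpha, alpha],
     [alpha, alpha, EPS]]
for p, bw in enumerate(bw_profile(M, [0, 0, 0], 3)):
    print(p, bw)
```

With BW(0) = 0, every component equals p times `alpha` after p steps, which is the linear growth that the (max,+) analysis predicts.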
Perron-Frobenius for (max,+)
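• Let $M \in \mathbb{R}_{max}^{n \times n}$ be an irreducible matrix in
(max,+) with spectral radius $\rho(M)$ and cyclicity
c(M); there exists an integer N such that:
$$\forall k \ge N,\quad M^{\otimes (k + c(M))} = \rho(M)^{\otimes c(M)} \otimes M^{\otimes k}$$
• Here: c(M)=1, $\rho(M) = \alpha$ and N=1:
$$BW_i(p) = \rho(M)^{\otimes p} \otimes BW_i(0) = \left( \bigotimes_{i=1}^{p} \alpha \right) \otimes 0 = p\alpha + 0$$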
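For an irreducible (max,+) matrix, the spectral radius equals the maximum cycle mean of its graph, which can be computed as the largest value of trace(M^k)/k for k = 1..n, where the trace is taken in (max,+), i.e. as the maximum diagonal entry. The sketch below uses this brute-force characterization rather than a dedicated algorithm; the function names and the numeric value of `alpha` are hypothetical, and the example matrix assumes all constants of the 1 SCC example are equal.

```python
EPS = float("-inf")  # max-plus "zero"


def maxplus_matmul(A, B):
    """(A (x) B)[i][j] = max_k (A[i][k] + B[k][j])."""
    n = len(A)
    return [[max(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_radius(M):
    """Maximum cycle mean: max over k = 1..n of max_i (M^k)[i][i] / k."""
    n = len(M)
    power, best = M, EPS
    for k in range(1, n + 1):
        trace = max(power[i][i] for i in range(n))
        best = max(best, trace / k)
        power = maxplus_matmul(power, M)
    return best


alpha = 0.5
M = [[alpha, EPS, alpha],
     [EPS, alpha, alpha],
     [alpha, alpha, EPS]]
print(spectral_radius(M))                   # 0.5
print(spectral_radius([[EPS, 3], [1, EPS]]))  # 2.0
```

In the first matrix every cycle has mean `alpha`; in the second, the only cycle has length 2 and weight 3 + 1, hence mean 2.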
Result
• If we respect our restrictions, we are able to
solve, in a parametric way, the bit width
equations for a loop nest program.
• This is the only method that solves this
problem in a parametric way (MIT's DeepC
addresses a similar problem, but it does not
handle symbolic parameters)
Restrictions of our methodology
• Linear array architecture
• BW equation solvable (i.e. no auto-adaptive
mechanism or complicated convergence
property)
• No multiplication in strongly connected
component of the graph:
a[0]=x
Do i=1,N
a[i]=a[i-1]*a[i-1]
Enddo
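To see why a multiplication inside a cycle falls outside the (max,+) framework, the loop above can be run through the multiplication rule of the operator model with X = Y = a[i-1]. This is an illustration under loud assumptions: the rule is applied even though both operands are the same signal, rounding noise is ignored, and `a_max` is a hypothetical bound on the magnitude of a[0].

```python
def squaring_loop_variances(var_x, a_max, steps):
    """Error variance of a[i] = a[i-1] * a[i-1], for i = 0..steps."""
    variances = [var_x]
    for _ in range(steps):
        prev = variances[-1]
        # sigma^2(X*Y) = Ymax^2 sigma_X^2 + Xmax^2 sigma_Y^2 with X = Y
        variances.append(a_max ** 2 * prev + a_max ** 2 * prev)
        a_max = a_max ** 2  # bound on |a[i]| after squaring
    return variances


print(squaring_loop_variances(var_x=1.0, a_max=1.0, steps=3))  # [1.0, 2.0, 4.0, 8.0]
```

Each iteration scales the variance instead of adding a constant to it, so the recurrence is not of the form BW(p-1) + constant that the method requires.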
Conclusion
• First method for parameterized loop nest bit
width determination
• Allows reducing the time needed for
simulation (though probably not much more than
previous methods did)
• New typing mechanism introduced in
Alpha:
– Integer[S,8]
– Integer[S,3,6]
– C = Mul8x8-12(A,B)
– B = Trunc(C,11)
Processor variable dependent BW
Rmin   N     b     b(p)
30dB   3     5     b(0)=5, b(1-2)=4
30dB   10    6     b(0)=6, b(1-8)=5, b(9)=4
30dB   50    7     b(0-16)=7, b(17-49)=6
30dB   100   8     b(0)=8, b(1-76)=7, b(77-99)=6
30dB   200   8     b(0-68)=8, b(69-199)=7
50dB   3     8     b(0-1)=8, b(2)=7
50dB   10    9     b(0-2)=9, b(3-9)=8
50dB   50    10    b(0-40)=10, b(41-49)=9
50dB   100   11    b(0)=8, b(1-99)=10
50dB   200   12    b(0-162)=11, b(163-199)=10