BOCA
Bijzondere Onderwerpen Computer Architectuur (BOCA)
(Special Topics in Computer Architecture)
Block A: Introduction
1
The aims of the course
• Show the relation between the algorithm and the architecture.
• Derive the architecture from the algorithm.
• Explain and formalize the design process.
• Explain the distinction between structure and behavior.
• Explain some architectures.
2
The design process
A design description may express:
• Behavior: expresses the relation between the input and the output value-streams of the system.
• Structure: describes how the system is decomposed into subsystems and how these subsystems are connected.
• Geometry: describes where the different parts are located.
Pure behavioral, structural or geometrical descriptions do not exist in practice.
3
Abstraction levels
[Diagram: the three description domains at decreasing levels of abstraction.
Behavior: Application, Algorithm, Basic operator, Boolean logic, Physical level.
Structure: Processing element, Basic block, Cell, Transistor.
Geometry: Board level, Block level, Layout.]
4
The Design Process
The implementation i is the specification for the implementation i+1.
[Diagram: Idea → Spec 0 → Spec 1 → ... → Spec N.
Idea → Spec 0: verification by simulation only.
Later refinement steps: verification by simulation or formal verification.]
For practical reasons a specification must be executable.
5
Descriptions
• Predicate logic
• Algebra (language Z, SDL, VDM)
• Process algebras: CCS, CSP, LOTOS
• VHDL, Verilog
• Silage, ......
6
Specification overloading
Specification overloading means that the
specification gives a possibly unwanted
implementation suggestion,
i.e. the behavioral specification expresses
structure
In practice:
A behavioral specification always contains
structure.
7
Example:
same function, same behavior, different expressions →
different structure, different designs

z = a · (b + 2)
suggests: an adder (b + 2) followed by a multiplier (× a)

z = a·b + a·2
suggests: two multipliers (a·b and a·2) followed by an adder
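A minimal executable sketch of this slide's point (Python; the function names are hypothetical and the expressions are the reconstructed ones above): both operator structures compute the same z for every input, yet suggest different hardware.

```python
def z_shared(a, b):
    # one adder feeding one multiplier: z = a * (b + 2)
    s = b + 2          # adder
    return a * s       # multiplier

def z_distributed(a, b):
    # two multipliers feeding one adder: z = a*b + a*2
    p1 = a * b         # multiplier 1
    p2 = a * 2         # multiplier 2
    return p1 + p2     # adder

# same behavior, different structure
for a in range(-5, 6):
    for b in range(-5, 6):
        assert z_shared(a, b) == z_distributed(a, b)
```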
8
Architecture
Definition:
Architecture is the way in which hardware and
software is structured;
the structure is usually based on grandiose
design philosophies.
Architecture deals with fundamental elements
that affect the way a system operates and thus its
capabilities and its limitations.
The New American Computer Dictionary
9
Our focus
• Array processors.
• Systolic arrays.
• Wave-front array processors.
• Architectures for embedded algorithms such as digital signal processing algorithms.
10
Array processor
An array processor is a structure in which identical processing elements are arranged regularly.
[Figure: a 1-dimensional row of PEs and a 2-dimensional grid of PEs.]
11
Array processor 3 dimensions
[Figure: a 3-dimensional arrangement of PEs.]
12
Systolic array
In a systolic array processor all communication paths contain at least one unit delay (register).
[Figure: a 2-dimensional array of PEs with a register (delay) marked on every communication path.]
Delay constraints are local. Therefore the array can be extended without limit, without changing the cells.
13
Wave-front array
[Figure: a 2-dimensional wave-front array of PEs.]
14
Array Processors
Can be approached from:
• Application
• Algorithm
• Architecture
• Technology
We will focus on Algorithm and Architecture:
derive the architecture from the algorithm.
15
Array processors: Application areas
• Speech processing
• Image processing (video, medical, .....)
• Radar
• Weather
• Medical signal processing
• Geology
• ...........
These applications require many simple calculations on a lot of data in a short time.
General-purpose processors do not provide sufficient processing power.
16
Example video processing
1000 operations per pixel (which is not that much)
1024 × 1024 pixels per frame (high-density TV)
50 frames per second (100 Hz TV)
→ about 50 G operations per second, with < 1 Watt available.
A 2 GHz Pentium delivers about 2 G operations per second at > 30 Watt,
so about 25 Pentiums (750 Watt) would be required.
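The slide's arithmetic can be checked directly (a sketch; the slide's figures are round numbers):

```python
ops_per_pixel = 1000
pixels_per_frame = 1024 * 1024
frames_per_second = 50

ops_per_second = ops_per_pixel * pixels_per_frame * frames_per_second
# 52,428,800,000 operations per second, i.e. roughly the slide's 50 G ops/s
print(ops_per_second)

pentium_ops_per_second = 2e9   # 2 GHz Pentium, ~2 G operations per second
# roughly 26 processors, i.e. the slide's "about 25 Pentiums"
print(ops_per_second / pentium_ops_per_second)
```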
17
Description of the algorithms
In practice the algorithms are described (specified)
in:
• some programming language.
In our (toy) examples we use:
• programming languages
• algebraic descriptions
18
Examples of algorithms we will use:
Filter:
y(t) = Σ_{i=0}^{N-1} h(i)·x(t-i)
Matrix algebra:
y = C·x, i.e. y(i) = Σ_{j=0}^{N-1} c(i,j)·x(j)
Transformations like the Fourier transform and the Z transform
Sorting
....
19
Graphs
Graphs are applicable for describing
• behavior
• structure
Dependency graphs
consist of:
• nodes expressing operations or functions
• edges expressing data dependencies or
the flow of data
So, graphs are suitable to describe the design flow
from
Algorithm to architecture
20
Design flow example: Sorting
idea
program (imperative)
single assignment code (functional)
recurrent relations
dependency graph
21
Sorting: the idea
[Figure: inserting a new element into a sorted sequence; elements larger than it (found by >/≤ comparisons) are shifted one position to create the empty place needed.]
22
[Figure: inserting x into the sorted sequence ..., m[j-1], m[j], m[j+1], ... step by step. At each position: test x ≥ m[j]?; if so, y := m[j]; m[j] := x; x := y, i.e. x and m[j] are swapped and the insertion continues at m[j+1].]
23
Sorting: inserting one element
Identical descriptions of swapping:

if (x >= m[j]) swap(m[j],x);

if (x >= m[j])
{ y = m[j];
  m[j] = x;
  x = y;
}

m[j],x = MaxMin(m[j],x);

Inserting an element into a sorted array of i elements such that the order is preserved:

m[i] = -infinite;
for(j = 0; j < i+1; j++)
{ m[j],x = MaxMin(m[j],x); }
24
Sorting: The program
Sorting N elements in an array is composed of N times inserting an element into a sorted array such that the order is preserved. An empty array is ordered.
input
int in[0:N-1], x[0:N-1], m[0:N-1];
for(int i = 0; i < N; i++)
{ x[i] = in[i]; m[i] = - infinite; }
body
for(int i = 0; i < N; i++)
{ for(j = 0; j < i+1; j++)
{ m[j],x[i] = MaxMin(m[j],x[i]);}
}
output
for(int j = 0; j < N; j++)
{ out[j] = m[j];}
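The program above runs as-is in the following sketch (Python standing in for the slide's pseudocode; MaxMin returns its arguments in descending order, and -infinite models the empty ordered array):

```python
NEG_INF = float('-inf')

def max_min(p, q):
    # return the pair (max, min)
    return (p, q) if p >= q else (q, p)

def sort_descending(inp):
    N = len(inp)
    x = list(inp)            # input: x[i] = in[i]
    m = [NEG_INF] * N        # the empty (ordered) array
    for i in range(N):       # body: insert x[i] into m[0..i]
        for j in range(i + 1):
            m[j], x[i] = max_min(m[j], x[i])
    return m                 # output: out[j] = m[j]

print(sort_descending([3, 1, 5, 2]))   # [5, 3, 2, 1]
```

The result is the input sorted in descending order, since MaxMin pushes maxima toward m[0].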
25
Sorting: Towards ‘Single assignment’
Single assignment:
Each scalar variable is assigned only once
Why?
Goal is a data dependency graph
- nodes expressing operations or functions
- edges expressing data dependencies or
the flow of data
26
Sorting: Towards ‘Single assignment’
Single assignment:
Each scalar variable is assigned only once
Why?
Code:
x=a+b;
x=c*d;
[Graph: a '+' node with inputs a and b producing x, and a '*' node with inputs c and d also producing x. How do you connect these?]
27
Sorting: Towards ‘Single assignment’
Single assignment:
Each scalar variable is assigned only once
Why?
Code
x=a+b;
x=c*d;
The description is already optimized towards implementation: memory optimization.
But, fundamentally you produce two different values, e.g. x1 and x2
28
Sorting: Towards ‘Single assignment’
Single assignment:
Each scalar variable is assigned only once
Start with m[j]:
m[j] at loop index i depends on the value at loop index i-1, hence

for(int i = 0; i < N; i++)
{ for(j = 0; j < i+1; j++)
  { m[j],x[i] = MaxMin(m[j],x[i]); }
}

becomes

for(int i = 0; i < N; i++)
{ for(j = 0; j < i+1; j++)
  { m[i,j],x[i] = MaxMin(m[i-1,j],x[i]); }
}
29
Sorting: Towards ‘Single assignment’
x[i] at loop index j depends on the value at loop index j-1, hence

for(int i = 0; i < N; i++)
{ for(j = 0; j < i+1; j++)
  { m[i,j],x[i] = MaxMin(m[i-1,j],x[i]); }
}

becomes

for(int i = 0; i < N; i++)
{ for(j = 0; j < i+1; j++)
  { m[i,j],x[i,j] = MaxMin(m[i-1,j],x[i,j-1]); }
}
30
Sorting: The algorithm in ‘single assignment’
input
int in[0:N-1], x[0:N-1,-1:N-1], m[-1:N-1,0:N-1];
for(int i = 0; i < N; i++)
{ x[i,-1] = in[i]; m[i-1,i] = - infinite; }
body
for(int i = 0; i < N; i++)
{ for(j = 0; j < i+1; j++)
{ m[i,j],x[i,j] = MaxMin(m[i-1,j],x[i,j-1]);}
}
output
for(int j = 0; j < N; j++)
{ out[j] = m[N-1,j];}
All scalar variables are assigned only once.
The algorithm satisfies the single assignment property
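The single-assignment version runs directly when the two-dimensional index spaces are modeled with dictionaries (a Python sketch; every m[i,j] and x[i,j] is written exactly once):

```python
NEG_INF = float('-inf')

def max_min(p, q):
    # return the pair (max, min)
    return (p, q) if p >= q else (q, p)

def sort_single_assignment(inp):
    N = len(inp)
    x, m = {}, {}
    for i in range(N):           # input
        x[i, -1] = inp[i]
        m[i - 1, i] = NEG_INF
    for i in range(N):           # body
        for j in range(i + 1):
            m[i, j], x[i, j] = max_min(m[i - 1, j], x[i, j - 1])
    return [m[N - 1, j] for j in range(N)]   # output: out[j] = m[N-1,j]

print(sort_single_assignment([3, 1, 5, 2]))   # [5, 3, 2, 1]
```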
31
Sorting: The algorithm in ‘single assignment’
[Figure: the index spaces of the algorithm. in feeds x[i,-1] at the left edge; -∞ feeds m[i-1,i] on the diagonal; each point (i,j) is a MaxMin node consuming m[i-1,j] and x[i,j-1]; out is taken from m[n-1,j]. The figure traces the values x0..x3 through the grid.]

int in[0:N-1], x[0:N-1,-1:N-1], m[-1:N-1,0:N-1];
for(int i = 0; i < N; i++)
{ x[i,-1] = in[i]; m[i-1,i] = - infinite; }
for(int i = 0; i < N; i++)
{ for(j = 0; j < i+1; j++)
  { m[i,j],x[i,j] = MaxMin(m[i-1,j],x[i,j-1]); }
}
32
Sorting: The algorithm in ‘single assignment’
[Figures (slides 33-37): step-by-step animation of the grid for the input sequence 5, 7, 4, 6. At each step the MM (MaxMin) node at (i,j) consumes m[i-1,j] and x[i,j-1] and produces m[i,j] and x[i,j], with -∞ entering on the diagonal.]

for(int i = 0; i < N; i++)
{ for(j = 0; j < i+1; j++)
  { m[i,j],x[i,j] = MaxMin(m[i-1,j],x[i,j-1]); }
}
37
Sorting: Recurrent relation
A description in single assignment can be directly translated into a recurrent relation:

declaration  in[0:N-1], out[0:N-1], x[0:N-1,-1:N-1], m[-1:N-1,0:N-1];
input        x[i,-1] = in[i]
             m[i-1,i] = - infinite
body         m[i,j],x[i,j] = MaxMin(m[i-1,j],x[i,j-1])
output       out[j] = m[N-1,j]
area         0 <= i < N;  0 <= j < i+1

Notice that the order of these relations is arbitrary.
38
Sorting: Body in two dimensions
body: m[i,j],x[i,j] = MaxMin(m[i-1,j],x[i,j-1])
The body is executed for all i and j. Hence two dimensions.
[Figure: a MaxMin node in the (i,j) plane with input m[i-1,j] from above, input x[i,j-1] from the left, and outputs m[i,j] and x[i,j].]
39
Sorting: Body implementation
body: m[i,j],x[i,j] = MaxMin(m[i-1,j],x[i,j-1])

if( m[i-1,j] <= x[i,j-1])
{ m[i,j] = x[i,j-1]; x[i,j] = m[i-1,j]; }
else
{ m[i,j] = m[i-1,j]; x[i,j] = x[i,j-1]; }

[Figure: implementation with a comparator (≤) whose result steers two multiplexers (selects 0/1) routing m[i-1,j] and x[i,j-1] to m[i,j] and x[i,j].]
40
Sorting: Implementation N = 4
[Figure: triangular dependency graph of PEs (PE = MaxMin) for N = 4; rows i = 0..3, columns j = -1..3. Each row i receives x[i,-1] on the left and the constant m[i-1,i] = -∞ on the diagonal; the bottom row delivers the outputs m[3,0], m[3,1], m[3,2], m[3,3].]
41
Sorting: Example N = 4
[Figure: the N = 4 graph sorting the input sequence 3, 1, 5, 2 into the output 5, 3, 2, 1.]
42
Something on functions
Tuple: (a, b) with a ∈ A and b ∈ B
Cartesian product: the set of all tuples (a, b) is A × B, so (a, b) ∈ A × B
The number of tuples in the set: |A × B| = |A| · |B|
If Q is a set and P is a subset of Q, P ⊆ Q,
then the set of all subsets of Q is 2^Q.
The number of subsets of Q is |2^Q| = 2^|Q|.
Hence, the set of all subsets of A × B is 2^(A×B),
and the number of subsets of A × B is 2^|A×B| = 2^(|A|·|B|).
43
Something on functions
Function F ∈ [X → Y]
[X → Y] is the set of all functions with domain X and co-domain Y.
F is a function in [X → Y] if and only if each element of the domain X is mapped by F on a single element of the co-domain Y.
Hence
∀a : a ∈ X ⇒ F(a) ∈ Y
and
∀a,b,c : a ∈ X ∧ F(a) = b ∧ F(a) = c ⇒ b = c
F can be represented as a set of tuples (a, b) with a ∈ X and b ∈ Y.
Hence, [X → Y] ⊆ 2^(X×Y)
44
Functions, Arrays, Tuples, Sequences, ....
Arrays, tuples and sequences are all representations of the same set of functions
[D_{l,u} → V]
in which D_{l,u} is a closed subset of the set of integers Z,
D_{l,u} = { z | z ∈ Z ∧ l ≤ z ≤ u },
and V is some value co-domain.
So
y = (y_0, y_1, y_2, ....., y_{N-1})
corresponds to y ∈ [D_{0,N-1} → V].
Hence, y_i, y(i) and y[i] are syntactically different notations for the function value in i.
45
Functions on more than one variable
Currying
A function on two variables can be represented in three different ways:
F(a, b) = F*(a)(b) = F**(b)(a)
F(a,b): the tuple (a, b) enters F and yields v, so v = F(a, b)
F ∈ [A × B → V]
F = { ((a, b), v) | (a, b) ∈ A × B ∧ v ∈ V }
46
Functions on more than one variable
Currying
F(a, b) = F*(a)(b) = F**(b)(a)
F*(a)(b): a enters F*, which yields the function F*(a); b enters F*(a), which yields v = F*(a)(b)
F* ∈ [A → [B → V]]
F* = { (a, p) | a ∈ A ∧ p ∈ [B → V] }
F**(b)(a): b enters F**, which yields the function F**(b); a enters F**(b), which yields v = F**(b)(a)
F** ∈ [B → [A → V]]
F** = { (b, q) | b ∈ B ∧ q ∈ [A → V] }
47
Functions on more than one variable
Currying (Example)
v = a + 2b, with a ∈ {0, 1, 2} and b ∈ {0, 1, 2, 3}

v:         b = 0  1  2  3
  a = 0:       0  2  4  6
  a = 1:       1  3  5  7
  a = 2:       2  4  6  8

F*(1) = { (0,1), (1,3), (2,5), (3,7) }
F**(2) = { (0,4), (1,5), (2,6) }
F(1,2) = F*(1)(2) = F**(2)(1) = 5
48
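The three representations of the example above are executable (a Python sketch; closures play the role of the curried functions):

```python
def F(a, b):              # F ∈ [A × B → V]
    return a + 2 * b

def F_star(a):            # F* ∈ [A → [B → V]]
    return lambda b: a + 2 * b

def F_star_star(b):       # F** ∈ [B → [A → V]]
    return lambda a: a + 2 * b

# F(1,2) = F*(1)(2) = F**(2)(1) = 5
assert F(1, 2) == F_star(1)(2) == F_star_star(2)(1) == 5

# F*(1) as a set of tuples, as on the slide: {(0,1), (1,3), (2,5), (3,7)}
assert {(b, F_star(1)(b)) for b in range(4)} == {(0, 1), (1, 3), (2, 5), (3, 7)}
```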
Linear Time-Invariant Systems
x and y are streams.
[Figure: a system F with input stream x and output stream y, both functions of time.]
x ∈ [Z → V], y ∈ [Z → V]
Time is represented by the set of integers Z, so F maps functions on functions:
F ∈ [[Z → V] → [Z → V]]
Obviously, this class of functions also models systems that cannot exist in reality, for example non-causal systems.
49
Adding functions
x and y are streams modeled by functions on Z, x_i ∈ [Z → V].
Addition of streams is defined pointwise:
x3 = x1 + x2 ⇔ x3(z) = x1(z) + x2(z) for all z ∈ Z
[Figure: two streams added sample by sample.]
50
Linear functions, linear systems
Definition:
A system F is called linear if F(a·x1 + b·x2) = a·F(x1) + b·F(x2),
or:
y1 = a·F(x1) ∧ y2 = b·F(x2) ⇒ y1 + y2 = F(a·x1 + b·x2)
[Figure: if x1 → y1 and x2 → y2, then x1 + x2 → y1 + y2.]
51
Time invariant systems
Definition:
A system F is called time-invariant if
y1 = F(x1) ∧ x2(t) = x1(t - τ) ⇒ y2 = F(x2) with y2(t) = y1(t - τ)
[Figure: shifting the input stream over τ shifts the output stream over τ.]
52
Linear time-invariant systems
Why?
Linear: because they can easily be described.
Time-invariant: because electrical elements like transistors, resistors, capacitors and inductors satisfy this property.
53
The convolution algorithm
The behavior of a linear time-invariant system can be fully described by its impulse response h, i.e. the response on the output to a single unit pulse on the input.
The response y on the output to an input stream x then follows from:
y(z) = Σ_{i=-∞}^{∞} x(i)·h(z-i)
or
y = x ∗ h
We will derive this convolution operation for time-discrete signals.
54
The convolution algorithm
Let the unit sample sequence be defined by
δ_i(z) = 1 if i = z, and 0 otherwise,
in which z represents time and i represents the location of the unit pulse; i, z ∈ Z.
δ_i ∈ [Z → V]
δ ∈ [Z → [Z → V]]
[Figure: δ_i(z) is a single unit pulse at z = i.]
55
The convolution algorithm
• Step 1: express x using a delta function
56
The convolution algorithm
Then
x = Σ_{i=-∞}^{∞} x(i)·δ_i
[Figure: a stream x over z = 1..7 decomposed as x(2)·δ_2 + x(3)·δ_3 + x(4)·δ_4 + x(5)·δ_5.]
in which δ_i is a function on Z and x(i) is a scalar.
57
The convolution algorithm
• Step 1: express x using a delta function
• Step 2: rewrite time-shifted delta function
58
The convolution algorithm
Shifting over τ:
[Figure: a stream x(z) and the same stream shifted over τ.]
Hence y(z) = x(z - τ).
[Figure: δ_0(z) and δ_2(z), the same pulse shifted over 2.]
Hence
δ_i(z) = δ_0(z - i)
59
The convolution algorithm
• Step 1: express x using a delta function
• Step 2: rewrite time-shifted delta function
• Step 3: rewrite impulse response using time
invariance property
60
The convolution algorithm
Consider a linear time-invariant system F.
Let h*(i) be the response of this system to the unit sample sequence δ_i.
[Figure: δ_i → F → h*(i); the unit pulse δ_i(z) produces the response h*(i)(z).]
F is time-invariant, so
h*(i)(z) = h*(0)(z - i)
61
The convolution algorithm
• Step 1: express x using a delta function
• Step 2: rewrite time-shifted delta function
• Step 3: rewrite impulse response using time
invariance property
• Step 4: rewrite impulse response using linearity
property
62
The convolution algorithm
Example
[Figure: responses of F illustrating time invariance and linearity:
δ_0(z) → h*(0)(z); -δ_1(z) → -h*(1)(z); ½·δ_2(z) → ½·h*(2)(z).]
a·δ_i(z) = a·δ_0(z - i) ⇒ a·h*(i)(z) = a·h*(0)(z - i)
63
The convolution algorithm
• Step 1: express x using a delta function
• Step 2: rewrite time-shifted delta function
• Step 3: rewrite impulse response using time
invariance property
• Step 4: rewrite impulse response using linearity
property
• Step 5: rewrite general expression by means of
algebraic manipulation using result from step 4.
64
The convolution algorithm
[Figure: x → F → y]
y = F(x)
h*(i) = F(δ_i)
h*(0) = h
in which h is called the impulse response of the system F.
65
The convolution algorithm
From the preceding we derive:
x = Σ_{i=-∞}^{∞} x(i)·δ_i    (x(i) is a scalar, δ_i a function on Z)
so
y = F( Σ_{i=-∞}^{∞} x(i)·δ_i )
F is linear and x(i) is a scalar, hence
y = Σ_{i=-∞}^{∞} x(i)·F(δ_i)
and with h*(i) = F(δ_i):
y = Σ_{i=-∞}^{∞} x(i)·h*(i)
66
The convolution algorithm
continue
y = Σ_{i=-∞}^{∞} x(i)·h*(i)
recall h3 = h1 + h2 ⇔ h3(z) = h1(z) + h2(z) for all z ∈ Z, hence
y(z) = Σ_{i=-∞}^{∞} x(i)·h*(i)(z)
recall h*(i)(z) = h*(0)(z - i), hence
y(z) = Σ_{i=-∞}^{∞} x(i)·h*(0)(z - i)
67
The convolution algorithm
continue
recall h*(0) = h:
y(z) = Σ_{i=-∞}^{∞} x(i)·h*(0)(z - i)
y(z) = Σ_{i=-∞}^{∞} x(i)·h(z - i)
This is called the convolution operation, denoted by
y = x ∗ h
We will apply this formula several times.
68
The convolution algorithm
continue
y(z) = Σ_{i=-∞}^{∞} x(i)·h(z - i)
With j = z - i, we obtain:
y(z) = Σ_{j=-∞}^{∞} x(z - j)·h(j)
and if the impulse response h is finite (bounded), i.e.
h(z) = 0 if z < 0 or z ≥ N,
we get
y(z) = Σ_{j=0}^{N-1} x(z - j)·h(j)
69
Dependency Graphs and Signal Flow Graphs
The array processor described:
• the way in which the processors are arranged, and
• the way in which the data is communicated between the processing elements.
[Figure: a 2-dimensional array of PEs.]
Hence, the graph describes the dependencies of the data that is communicated, or said differently:
the graph describes the way in which the data values at the outputs of a processing element depend on the data at the outputs of the other processing elements.
So we may consider it as a Dependency Graph or a Signal Flow Graph.
70
Dependency graphs and Signal Flow Graphs
Dependency Graph:
All communicated values are scalars and the processing elements are functions on scalars. Each arrow carries only one value. Time does not play a role.
PE ∈ [V^N → V^N]
V is the value domain; number of inputs = number of outputs = N.
Signal Flow Graph:
The communicated values are streams, i.e. functions on time, and the processing elements are functions on streams.
PE ∈ [[Z → V]^N → [Z → V]^N]
Z represents time.
[Figure: a 2-dimensional array of PEs.]
71
Recurrent relations
For simple algorithms the transformation from single assignment code to a recurrent relation is simple.
Questions to answer:
• How do recurrent relations influence the dependency graph?
• How can recurrent relations be manipulated such that the behavior remains the same and the structure of the dependency graph is changed?
We will answer these questions by means of an example:
Matrix-Vector multiplication c = A·b
72
Matrix Vector multiplication
c = A·b, i.e. c_i = Σ_{j=0}^{N-1} a_{i,j}·b_j
Recurrent relations:
s_{i,j} = s_{i,j-1} + a_{i,j}·b_j
s_{i,-1} = 0
c_i = s_{i,N-1}
with i ∈ 0,...,K-1 and j ∈ 0,...,N-1
Alternative (because + is associative):
s_{i,j} = s_{i,j+1} + a_{i,j}·b_j
s_{i,N} = 0
c_i = s_{i,0}
with i ∈ 0,...,K-1 and j ∈ 0,...,N-1
73
Matrix Vector multiplication
The basic cell is described by:
s_{i,j} = s_{i,j-1} + a_{i,j}·b_j
We have two indices i and j, so the dependency graph can be described as a two-dimensional array.
[Figure: the basic cell PE at (i,j): b_j enters and passes through; a multiplier forms a_{i,j}·b_j and an adder adds it to the incoming s_{i,j-1}, producing s_{i,j}.]
74
DG-1 of the Matrix Vector multiplication
s_{i,j} = s_{i,j-1} + a_{i,j}·b_j
s_{i,-1} = 0
c_i = s_{i,2}
i = 0, 1, 2, 3 (K = 4); j = 0, 1, 2 (N = 3)
[Figure: a 4 × 3 grid of PEs. Each row i receives s_{i,-1} = 0 on the left and delivers s_{i,2} = c_i on the right; b_0, b_1 and b_2 are broadcast to all PEs of their column.]
b_0, b_1 and b_2 are global dependencies.
Therefore this graph is called a Globally recursive Graph
75
DG-2 of the Matrix Vector multiplication
s_{i,j} = s_{i,j+1} + a_{i,j}·b_j
s_{i,N} = 0
c_i = s_{i,0}
i = 0, 1, 2, 3 (K = 4); j = 0, 1, 2 (N = 3)
[Figure: the same 4 × 3 grid with the accumulation reversed: s_{i,3} = 0 enters on one side and c_i = s_{i,0} leaves on the other; b_0, b_1 and b_2 are again broadcast to their columns.]
76
Variable naming and index assignment
A variable associated to an arrow gets the indices of the processing element that delivers its value.
Local constants get the indices of the processing element that they are in.
[Figure: PE at (i,j) with incoming arrows b_{i-1,j-1}, c_{i-1,j} and a_{i,j-1} (named after the delivering PEs), outgoing arrows b_{i,j} and c_{i,j}, and local constants v_{i,j} and a_{i,j} carrying the PE's own indices.]
77
Recurrent relations: Conclusion
The associative operations + and · result in two different recurrent relations and thus in two different dependency graphs.
Equation y = Σ_{i=0}^{N-1} x_i results in
s_i = s_{i-1} + x_i, s_{-1} = 0, y = s_{N-1}
or
s_i = s_{i+1} + x_i, s_N = 0, y = s_0
with i = 0, 1, ..., N-1.
Equation y = Π_{i=0}^{N-1} x_i results in
s_i = s_{i-1}·x_i, s_{-1} = 1, y = s_{N-1}
or
s_i = s_{i+1}·x_i, s_N = 1, y = s_0
with i = 0, 1, ..., N-1.
Other associative operations are for example ‘AND’ and ‘OR’.
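The two recurrences for the product equation, executed (a sketch; the sum case is analogous with 0 as the boundary value):

```python
def product_up(xs):
    s = 1                    # s[-1] = 1
    for x in xs:
        s = s * x            # s[i] = s[i-1] * x[i]
    return s                 # y = s[N-1]

def product_down(xs):
    s = 1                    # s[N] = 1
    for x in reversed(xs):
        s = s * x            # s[i] = s[i+1] * x[i]
    return s                 # y = s[0]

xs = [2, 3, 4]
print(product_up(xs), product_down(xs))   # 24 24
```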
78
Changing global data dependencies into local data dependencies
Global data dependencies resist manipulating the dependency graph.
c_i = Σ_{j=0}^{N-1} a_{i,j}·b_j
Global data dependencies:
s_{i,j} = s_{i,j-1} + a_{i,j}·b_j
s_{i,-1} = 0
c_i = s_{i,N-1}
Local data dependencies:
s_{i,j} = s_{i,j-1} + a_{i,j}·d_{i-1,j}
s_{i,-1} = 0
d_{-1,j} = b_j
d_{i,j} = d_{i-1,j}
c_i = s_{i,N-1}
[Figure: left, b_j is broadcast globally to every PE of its column; right, b_j enters the boundary as d_{-1,j} and is passed locally from PE to PE.]
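The localized relations can be executed directly (a Python sketch; dictionaries model the d and s index spaces, so each b value reaches a PE only through its neighbour):

```python
def matvec_local(A, b):
    K, N = len(A), len(b)
    d = {(-1, j): b[j] for j in range(N)}    # boundary: d[-1,j] = b_j
    s = {(i, -1): 0 for i in range(K)}       # boundary: s[i,-1] = 0
    c = [None] * K
    for i in range(K):
        for j in range(N):
            d[i, j] = d[i - 1, j]            # pass b_j on to the next row
            s[i, j] = s[i, j - 1] + A[i][j] * d[i - 1, j]
        c[i] = s[i, N - 1]                   # c_i = s[i,N-1]
    return c

print(matvec_local([[1, 2, 3], [4, 5, 6]], [1, 0, -1]))   # [-2, -2]
```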
79
Changing global data dependencies into local data dependencies
So the matrix-vector multiplication becomes:
c_i = Σ_{j=0}^{N-1} a_{i,j}·b_j
Relations:
s_{i,j} = s_{i,j-1} + a_{i,j}·d_{i-1,j}
s_{i,-1} = 0
d_{-1,j} = b_j
d_{i,j} = d_{i-1,j}
c_i = s_{i,N-1}
i = 0, 1, 2, 3 (K = 4); j = 0, 1, 2 (N = 3)
Locally recursive graph
[Figure: b_0 = d_{-1,0}, b_1 = d_{-1,1}, b_2 = d_{-1,2} enter at the top of their columns and are passed down from PE to PE; s_{i,-1} = 0 enters on the left and s_{i,2} = c_i leaves on the right.]
80
Alternative transformation from global data dependencies to local data dependencies
c_i = Σ_{j=0}^{N-1} a_{i,j}·b_j
Global data dependencies:
s_{i,j} = s_{i,j-1} + a_{i,j}·b_j
s_{i,-1} = 0
c_i = s_{i,N-1}
Local data dependencies:
s_{i,j} = s_{i,j-1} + a_{i,j}·d_{i+1,j}
s_{i,-1} = 0
d_{K,j} = b_j
d_{i,j} = d_{i+1,j}
c_i = s_{i,N-1}
[Figure: b_j now enters the boundary as d_{K,j} at the opposite side and is passed from PE to PE in the reverse direction.]
81
Changing global data dependencies into local data dependencies
So the alternative locally recursive graph becomes:
c_i = Σ_{j=0}^{N-1} a_{i,j}·b_j
Relations:
s_{i,j} = s_{i,j-1} + a_{i,j}·d_{i+1,j}
s_{i,-1} = 0
d_{K,j} = b_j
d_{i,j} = d_{i+1,j}
c_i = s_{i,N-1}
i = 0, 1, 2, 3 (K = 4); j = 0, 1, 2 (N = 3)
[Figure: b_0 = d_{4,0}, b_1 = d_{4,1}, b_2 = d_{4,2} enter at the bottom of their columns and are passed upward from PE to PE; s_{i,-1} = 0 enters on the left and s_{i,2} = c_i leaves on the right.]
82
Shift-invariant graph
Consider an N-dimensional dependency graph with processing elements PE at locations (i,j,k,...).
Base: (1,0,0,...), (0,1,0,...), (0,0,1,...), ... .
If for any (i,j,k,...) and for any input x of the PE at (i,j,k,...) that is delivered by the output x of the PE at (p,q,r,...), it holds that the input x of the PE at (i,j+1,k,...) is delivered by the output x of the PE at (p,q+1,r,...), then the graph is called shift-invariant in the direction (0,1,0,...).
[Figure: two example graphs in the (i,j) plane: one shift-invariant in direction i, one shift-invariant in directions i and j.]
83
Shift-invariant graphs (Examples)
[Figure: four example graphs in the (i,j) plane: shift-invariant in directions i and j; shift-invariant in no direction; shift-invariant in no direction; shift-invariant in direction j.]
84
Shift-invariant graphs
Because the inputs and outputs often negatively
influence the shift-invariance property, the inputs
and outputs are treated separately.
Hence, we always distinguish between
• Input edges,
• Output edges and
• Intermediate edges
85
Dependency Graphs
Conclusions:
Associative operations give two alternative DG’s.
Transformation from global to local dependencies
gives two alternative DG’s.
Input, output and intermediate edges will be treated
separately.
86