CR18: Advanced Compilers
L03: Transformations
Tomofumi Yuki
1
Today’s Agenda
Transformations in the polyhedral world
Unimodular framework
Tiling
2
Transforming Polyhedral Objects
Recall: Set/Relation/Function
What are the math operators over these?
intersection/union
image/pre-image
join/compose
What do these operations mean for the code?
3
Simplifying Assumption
The order of operations is the lex. order
changing the shape of the domain = changing the order of execution
We will revisit this later
4
Loop Skewing
“Shift” the iteration space
for (i=1; i<=N; i++)
  for (j=1; j<=M; j++)
    S1;

for (i=1; i<=N; i++)
  for (j=1+i; j<=M+i; j++)
    S1;
5
Why do you want to skew?
What happens to the dependences?
for (i=1; i<=N; i++)
  for (j=1; j<=M; j++)
    A[i,j] = A[i-1,j+1];

for (i=1; i<=N; i++)
  for (j=1+i; j<=M+i; j++)
    ???;
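One way to fill in the ??? (my sketch, not from the slide): rewrite the accesses through the inverse of the skew, so the old j becomes j - i in the new coordinates. The dependence distance [1,-1] becomes [1,0].

  /* Sketch: statement body after the skew (i,j) -> (i, i+j).
   * The original body A[i,j] = A[i-1,j+1] is rewritten with old j = new j - i.
   * Declarations and boundary handling are omitted. */
  for (int i = 1; i <= N; i++)
    for (int j = 1 + i; j <= M + i; j++)
      A[i][j - i] = A[i - 1][j - i + 1];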
6
Change of Basis
Building block of transformations for poly. IR
how to keep the IR consistent
CoB: apply a transform to a statement domain
transform = affine function
what are the effects on the full PRDG/Alpha?
7
CoB: Skewing Example
Apply (i,j -> i,j+i) to the following
S0: [N] -> { [i,j] : 0<=i,j<=N }
S1 : [N] -> { [i,j] : 0<=i,j<=N }
//ignore boundary cases
S0[i,j] = {j>0} : S0[i,j-1] + S1[i,j];
S1[i,j] = {i>0} : S0[i-1,j];
Is this correct?
S0 : [N] -> { [i,j] : 0<=i<=N and i<=j<=N+i }
S1 : [N] -> { [i,j] : 0<=i,j<=N }
S0[i,j] = {j>0} : S0[i,j-1] + S1[i,j];
S1[i,j] = {i>0} : S0[i-1,j];
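The intended answer (I believe) is no: only the domain of S0 was changed above. My own derivation of the consistent equations (reads of S0 by other equations get composed with the transform, and reads inside S0's own equation are rewritten through its inverse):

  S0 : [N] -> { [i,j] : 0<=i<=N and i<=j<=N+i }
  S1 : [N] -> { [i,j] : 0<=i,j<=N }
  S0[i,j] = {j>i} : S0[i,j-1] + S1[i,j-i];
  S1[i,j] = {i>0} : S0[i-1,i+j-1];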
8
CoB: Main Idea
Apply function f to statement S
4 cases of dependences
[Diagram: dom(S), dom(T), and dom(S') = f(dom(S)), with dependences between them]
9
CoB: Main Idea
What happens to the dependences?
[Diagram: with dom(S') = f(dom(S)), the dependence I between dom(S) and dom(T) is replaced by f⁻¹@I on one side and I@f on the other; a self-dependence within S becomes f⁻¹@I@f]
10
How to skew in ISCC
Apply the transformation and generate code
for (i=1; i<=N; i++)
  for (j=1; j<=N; j++)
    A[i,j] = A[i-1,j+1];

D := [N]->{[i,j] : 1<=i,j<=N};
F := [N]->{[i,j] -> [i,i+j]};
E := range(F*D);
codegen E;

for (int c0 = 1; c0 <= N; c0 += 1)
  for (int c1 = c0 + 1; c1 <= N + c0; c1 += 1)
    (c0, c1);
11
How to skew in ISCC
You get different code if you use a relation

for (i=1; i<=N; i++)
  for (j=1; j<=N; j++)
    A[i,j] = A[i-1,j+1];

E := [N]->{[i,j] -> [i,i+j] : 0<=i,j<=N };
codegen (E);

for (int c0 = 1; c0 <= N; c0 += 1)
  for (int c1 = c0 + 1; c1 <= N + c0; c1 += 1)
    (c0, -c0 + c1);
12
Schedules
Mapping from iteration points to exec. order
different notions exist
In the next lecture, we will cover how to come up with them automatically
for now, we will figure them out on our own
13
Schedules
Not necessarily one iteration at a time
But codegen can become ambiguous

R := [N] -> { S1[i,j] -> [i] : 0<=i,j<=N;
              S2[i,j] -> [i] : 0<=i,j<=N };
codegen R;

for (int c0 = 0; c0 <= N; c0 += 1) {
  for (int c2 = 0; c2 <= N; c2 += 1)
    S2(c0, c2);
  for (int c2 = 0; c2 <= N; c2 += 1)
    S1(c0, c2);
}

[Figure: both statements scheduled by (i,j->i)]
14
Space-Time Mapping
More precise notion
time: time stamp for iterations
space: (virtual) processor assigned
Example:
  θ(i,j) = i
  π(i,j) = j

for i       (θ: time)
  forall j  (π: space)
    S
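As a concrete (hedged) reading of this mapping, a minimal sketch using OpenMP threads as the virtual processors; S and the bounds N, M are placeholders, and the j iterations of one time step are assumed independent:

  /* Space-time mapping: theta(i,j) = i is sequential time,
   * pi(i,j) = j is the (virtual) processor. */
  for (int i = 0; i <= N; i++) {       /* time steps, in order */
    #pragma omp parallel for           /* one processor per j  */
    for (int j = 0; j <= M; j++)
      S(i, j);
  }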
15
Schedules in Loop Context
Often used interchangeably
How to transform a loop
[N] -> { [i,j] -> [i,i+j] };
  + parallel := [true, false]
  ↓
forall i
  for j
    S;
16
Schedules in Loop Context
You also use constant dimensions
to modify loop structure
to specify statement orders
θS1(i) = [0,i], θS2(i) = [1,i]:
  for i
    S1;
  for i
    S2;

θS1(i) = [i,1], θS2(i) = [i,0]:
  for i
    S2;
    S1;
17
Schedule in ISL
When a relation is given to codegen, the RHS is taken as the schedule
should be in a common space
D := [N] -> { S1[i] -> [0,i] : 0<=i<=N;
              S2[i] -> [1,i] : 0<=i<=N };
codegen D;

{
  for (int c1 = 0; c1 <= N; c1 += 1)
    S1(c1);
  for (int c1 = 0; c1 <= N; c1 += 1)
    S2(c1);
}
18
Schedule in ISL
Otherwise it is treated as unordered
D := [N] -> { S1[i] -> A[0,i] : 0<=i<=N;
              S2[i] -> B[1,i] : 0<=i<=N };
codegen D;

{
  for (int c1 = 0; c1 <= N; c1 += 1)
    S2(c1);
  for (int c1 = 0; c1 <= N; c1 += 1)
    S1(c1);
}
19
Overloading of Schedules
Schedule may be
timestamp for iterations (θ)
space-time mapping (θ+π)
abstraction of loop structure (θ with 2d+1)
code gen strategy (θ+ aux. info)
20
Back to CoB
What are the properties required for f?
[Diagram repeated from slide 10: dom(S), dom(T), dom(S') = f(dom(S)), with dependences I, f⁻¹@I, I@f, f⁻¹@I@f]
21
Unimodular Framework
Earlier variant of polyhedral-ish framework
Model transformations as f = Ax
where A is unimodular
What does this restriction bring?
22
Unimodular Framework
Main Flow
select a transformation (will skip for today)
apply the transformation to the loop bounds
apply the transformation to array accesses
Limitations
much coarser grained than polyhedral
same transformation for the entire loop nest
23
Unimodular Framework
Given transformation T
loop bounds: L·x ≥ m  →  L·T⁻¹·x ≥ m
array accesses: A·x  →  A·T⁻¹·x
Recall CoB
changes to bounds = exactly the same
let the array access function be g
g(i) → g(f⁻¹(f(i))), i.e., the new access function is g∘f⁻¹
More or less subsumed by polyhedral model
24
Key Feature of Unimodular FW
Composition of transformations
Applying transformation T1 and then T2
easy to show that it is equivalent to applying a single transformation T2·T1
Enables exploration of arbitrary combinations of transformations
e.g., skew + interchange is just another matrix
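A small worked instance (my own, not from the slide), composing the earlier skew with a loop interchange:

\[
T_{\text{skew}} = \begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix},\qquad
T_{\text{interchange}} = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix},\qquad
T_{\text{interchange}}\,T_{\text{skew}} = \begin{pmatrix}1 & 1\\ 1 & 0\end{pmatrix}
\]

Each factor has determinant ±1, so the product (determinant −1) is again unimodular; it maps (i,j) to (i+j, i) in one step.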
25
What is missing?
What is the space of transformations that can be expressed in the unimodular FW?
26
Loop Skewing
“Enabler” transformation
for (i=1; i<=N; i++)
  for (j=1; j<=M; j++)
    A[i,j] = A[i-1,j+1];

We already saw this one
distance vector: [1,-1]
Find f([1,-1]ᵀ) s.t. ?
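One answer (my sketch; the slide leaves the condition open): choose f so that the transformed distance vector has only non-negative components, e.g. the skew used above:

\[
f = \begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix},\qquad
f\begin{pmatrix}1\\-1\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix}
\]

After the skew, the outer loop carries the dependence and the inner loop is dependence-free.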
28
Loop Fusion
Can you fuse these loops?
for (i=1; i<=N; i++)
  A[i] = foo();
for (i=1; i<N; i++)
  B[i] = bar(A[i]);

S0: [N]-> { [i] : 0<=i<=N };
S1: [N]-> { [i] : 0<=i<N };
Why would you want to fuse them?
What are the schedules?
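One possible answer (my sketch, not from the slide): schedules thetaS0(i) = [i,0] and thetaS1(i) = [i,1] interleave the statements, i.e., fuse the loops; fusion pays off here because A[i] is consumed right after it is produced:

  /* Fused loops under thetaS0(i)=[i,0], thetaS1(i)=[i,1].
   * Iteration i == N of the first loop has no S1 partner. */
  for (int i = 1; i <= N; i++) {
    A[i] = foo();
    if (i < N)
      B[i] = bar(A[i]);
  }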
29
Loop Fusion 2
Can you fuse them now?
for (i=1; i<=N; i++)
  A[i] = foo();
for (i=1; i<N; i++)
  B[i] = bar(A[i+1]);

S0: [N]-> { [i] : 0<=i<=N };
S1: [N]-> { [i] : 0<=i<N };
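Not directly: iteration i of the second loop now reads A[i+1], which the first loop only produces at iteration i+1. A hedged sketch of one fix, shifting the second statement by one iteration before fusing:

  /* Fuse after shifting S1 by one iteration (i -> i+1); at iteration i,
   * A[i] has just been produced, so reading it is safe. */
  for (int i = 1; i <= N; i++) {
    A[i] = foo();
    if (i >= 2)
      B[i - 1] = bar(A[i]);
  }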
30
Affine Loop Transformations
It is some composition of:
loop skewing     (unimodular)
loop permutation (unimodular)
loop reversal    (unimodular)
loop shifting
loop fusion
loop fission
loop unrolling
loop tiling
31
Tiling
Well-known transformation in HPC [Wolfe 87]
for t=0; t<T; t++
  for i=1; i<N-1; i++
    A[i] = f(B[i], B[i-1], B[i+1])
  //swap A and B

?

Improves data locality
Performance improvement
  my laptop: 5.95s → 4.76s (20%)
PLuTo: ~1.8x (with older processor)
32
So what is Tiling?
Loop transformation
Effect: change the order of execution
for t=0; t<T; t++
  for i=1; i<N-1; i++
    foo()

for tt=0; tt<T; tt+=x
  for ti=1; ti<N-1; ti+=y
    for t=tt; t<min(tt+x,T); t++
      for i=ti; i<min(ti+y,N-1); i++
        foo()
33
Visualization of Tiling
Improves locality by exploiting temporal reuse
Each tile becomes an atomic unit

[Figure: the (t,i) iteration space before and after tiling]
34
What is Tiling?
Loop transformation that doubles the depth
for (t=0; t<N; t++)
  for (i=0; i<M; i++)
    S

Tile loops (iterate over the tile origins):
for (x=0; x<N; x+=3)
  for (y=0; y<M; y+=3)
    S'
35
What is Tiling?
Loop transformation that doubles the depth
for (t=0; t<N; t++)
  for (i=0; i<M; i++)
    S

Point loops (within the tile at origin (x,y)):
for (t=x; t<x+3; t++)
  for (i=y; i<y+3; i++)
    S'
36
What is Tiling?
Loop transformation that doubles the depth
for (t=0; t<N; t++)
  for (i=0; i<M; i++)
    S

for (t=x; t<x+3; t++)
  for (i=y; i<min(y+3,M); i++)
    S
37
What is Tiling?
Loop transformation that doubles the depth
for (t=0; t<N; t++)
  for (i=0; i<M; i++)
    S

for (x=0; x<N; x+=3)
  for (y=0; y<M; y+=3)
    S'

for (t=x; t<min(x+3,N); t++)
  for (i=y; i<min(y+3,M); i++)
    S
38
What is Tiling?
Loop transformation that doubles the depth
for (t=0; t<N; t++)
  for (i=0; i<M; i++)
    S

for (x=0; x<N; x+=3)
  for (y=0; y<M; y+=3)
    for (t=x; t<min(x+3,N); t++)
      for (i=y; i<min(y+3,M); i++)
        S
39
Legality of Tiling
Is this tiling legal?
[Figure: a rectangular tiling of the (t,i) iteration space with its dependences]
40
Legality of Tiling
Is this tiling legal?
Fully Permutable ≈ Tilable
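As a reminder of the standard criterion (my wording, not the slide's): a band of loops is fully permutable when every dependence distance is component-wise non-negative within the band, and such a band can be tiled with rectangles. For the earlier 1-D stencil the distances are not all non-negative, but a skew fixes that:

\[
\{(1,-1),\,(1,0),\,(1,1)\}
\;\xrightarrow{\,(t,i)\mapsto(t,\,t+i)\,}\;
\{(1,0),\,(1,1),\,(1,2)\}
\]

All components are now non-negative, so rectangular tiling of the skewed space is legal.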
41
Variations of Tiling
Rectangular Tiles are not always legal
Tiles cannot be mutually dependent
42
Oblique Tiling
Parallelograms
avoid mutual dependence
43
Oblique Tiling
Parallelograms
avoid mutual dependence
wave-front parallelism
45
Overlapped Tiling
Pros:
Less frequent communication
46
Tile Sizes and Shapes
More decisions for the compiler to make
out of many variations, what to use?
what should be the size of each tile?
Tile sizes have huge impact on performance
Easily 2-5x difference
Much much more with parallel execution
Topic of HW2
47
Expressing Tiling as Schedules
What is the schedule for tile loops?
for (i=0; i<=N; i++)
  for (j=0; j<=N; j++)
    S

[N] -> { [i,j] -> [ti,tj] : 0<=i,j<=N and
         ti=3i and tj=3j and 0<=ti,tj<=N};
?

for (x=0; x<=N; x+=3)
  for (y=0; y<=N; y+=3)
    S'
48
Expressing Tiling as Schedules
What is the schedule for tile loops?
for (i=0; i<=N; i++)
  for (j=0; j<=N; j++)
    S

[N] -> { [i,j] -> [ti,tj,i',j',x,y] :
         0<=i,j<=N and
         ti=3x and tj=3y and 0<=ti,tj<=N and
         ti<=i'<ti+3 and tj<=j'<tj+3 and
         i=i' and j=j'};
?

for (ti=0; ti<=N; ti+=3)
  for (tj=0; tj<=N; tj+=3)
    for (i=ti; i<=min(ti+2,N); i++)
      for (j=tj; j<=min(tj+2,N); j++)
        S
49
Alternative Formulation
Different view of tile origins
for (ti=0; ti<=N; ti+=3)
  for (tj=0; tj<=N; tj+=3)
    for (i=ti; i<=min(ti+2,N); i++)
      for (j=tj; j<=min(tj+2,N); j++)
        S<i,j>

for (ti=0; ti<=N/3; ti++)
  for (tj=0; tj<=N/3; tj++)
    for (i=0; i<3; i++)
      for (j=0; j<3; j++)
        S<3ti+i,3tj+j>
50
Exercises
Using ISCC, review
array dataflow analysis
transformations
51
ADA with ISCC
Recall the good old example:
for i = 0 .. N
  for j = 0 .. M
    A[j] = foo(A[j], A[j+1])
Two problems:
A[j] <> A[j] pair
A[j] <> A[j+1] pair
52
ADA with ISCC
A[j] <> A[j] pair
for i = 0 .. N
  for j = 0 .. M
    A[j] = foo(A[j], A[j+1])

reader instance (i,j) as parameters; j=j' is the memory conflict;
the two disjuncts encode the lex. order (i'<i, or i'=i and j'<j)

PRef1 := [N,M,i,j] ->
  { [i',j'] : 0<=i,i'<N and 0<=j,j'<M and j=j' and i'<i;
    [i',j'] : 0<=i,i'<N and 0<=j,j'<M and j=j' and i'=i and j'<j};

lexmax(PRef1);
[N, M, i, j] -> { [-1 + i, j] : i <= -1 + N and
                  j >= 0 and j <= -1 + M and i >= 1 }
53
ADA with ISCC
A[j] <> A[j+1] pair
for i = 0 .. N
  for j = 0 .. M
    A[j] = foo(A[j], A[j+1])

PRef2 := [N,M,i,j] ->
  { [i',j'] : 0<=i,i'<N and 0<=j,j'<M and j'=j+1 and i'<i;
    [i',j'] : 0<=i,i'<N and 0<=j,j'<M and j'=j+1 and i'=i and j'<j};

lexmax(PRef2);
[N, M, i, j] -> { [-1 + i, 1 + j] : i <= -1 + N and
                  j >= 0 and j <= -2 + M and i >= 1 }
54
ADA with ISCC 2
Multi-statement and sloppy specification
for (i = 0 .. M)
  A[i] = 0;
for (i = 0 .. N)
  A[i] = 1;
for (i = 0 .. Z)
  B[i] = A[i];

PRef1 := [N,M,Z,i] -> {
  [0,i'] : i'=i and 0<=i<=Z and 0<=i'<=M;
  [1,i'] : i'=i and 0<=i<=Z and 0<=i'<=N};

PRef1 := [N,M,Z,c2,i] -> {
  [c,i'] : i'=i and 0<=i<=Z and 0<=i'<=M;
  [c,i'] : i'=i and 0<=i<=Z and 0<=i'<=N};
  where (c=0 or c=1) and c2=2, c < c2
55
Being more Sloppy
Using << operator in ISCC
m3 := m1 << m2
a map from the domain of m1 to the domain of m2, relating those
elements whose images live in the same space and such that the images
of the elements of m1 are lexicographically strictly smaller than
those of m2
Also use lexmax on relations: it gives the output parameterized by the LHS
56
The << Operator Explained
Given
two statement domains d1 and d2
two schedules for the statements f1 and f2
the schedules in this context represent the loop structure
Create maps m1 and m2 as
f1 * d1 and f2 * d2
Then use the << operator on the resulting
maps
Result: a map that restricts d1 to be lex. before d2 according to f1 and f2
57
The << Operator Example
ADA for the following program
find producer for S0
for (i=0 .. N) {
  for (j=0 .. P)
S0: A[j] = A[j];
  for (j=0 .. Q)
S1: A[j] = A[j];
}

DS0 := [N,P,Q] -> { S0[i,j] : 0<=i<=N and 0<=j<=P };
DS1 := [N,P,Q] -> { S1[i,j] : 0<=i<=N and 0<=j<=Q };
FS0 := [N,P,Q] -> { S0[i,j] -> [i,0,j] };
FS1 := [N,P,Q] -> { S1[i,j] -> [i,1,j] };

S1beforeS0 := (FS1*DS1) << (FS0*DS0);
S1beforeS0 := (FS0*DS0) >> (FS1*DS1);
S0beforeS0 := (FS0*DS0) >> (FS0*DS0);

conflictS0 := [N,P,Q] -> { S0[i,j] -> S0[i',j'] : j=j'};
conflictS1 := [N,P,Q] -> { S0[i,j] -> S1[i',j'] : j=j'};
58
Some ISCC Operators
You probably need:
m1 + m2: union of two maps
m1 * d1: intersect d1 with the domain of m1
m1 . m2: join
domain(m): domain of m
range(m): range of m
coalesce x: simplify x
59
Exercise
http://perso.ens-lyon.fr/tomofumi.yuki/courses/exercise150930.txt
60
Systolic Arrays
Pipeline of Processors
Each one is identical to the others
Nearest neighbor communication
Synchronized with “heart-beat”
Advantages:
Highly parallel and scalable
Disadvantages:
Specialized and difficult to design
61
Systolic Matrix Multiplication
Example of 2D Systolic Array
[PE diagram: A enters at Ain and leaves at Aout, B enters at Bin and leaves at Bout; inside, each PE multiplies (mult), adds (add), and stores into a register (reg)]
62
Systolic MM in Action

[Animation over a sequence of slides: the rows of A stream into the 4x4 PE grid from the left and the columns of B stream in from the top, staggered by one cycle per row/column; each PE multiplies the pair of values it receives, adds it into its running sum, and forwards A to the right and B downward.]
Automatic Synthesis
Slides from http://www.cs.colostate.edu/~cs560/Spring2013/
Write the computation in Alpha
Align inputs and outputs
Serialize reductions
Uniformize dependences
Schedule Alpha
Allocate computation to processors
Transform Alpha
Generate HDL
78
Automatic Synthesis
Slides from http://www.cs.colostate.edu/~cs560/Spring2013/
Write the computation in Alpha
Align inputs and outputs
Serialize reductions
Part of scheduling
Uniformize dependences
Schedule Alpha
Allocate computation to processors
Transform Alpha
Generate HDL
79
FIR Filter:  y_i = Σ_{j=0}^{N} a_j x_{i-j}
Align inputs

y_i = Σ_{j=0}^{N} A[0,j] X[i-j,0]
A[0,j] = a_j
X[i,0] = x_i
80
Serialization
Reductions can be computed in many different orders
Serialization = select an order
Example: reduce(+, [i], {|0<=i<5} : x[i])

[Figure: all five x[i] feed a single Σ node producing res]
81
Serialization
Reductions can be computed in many different orders
Serialization = select an order
Example: reduce(+, [i], {|0<=i<5} : x[i])

[Figure: the same reduction as a chain of four + nodes producing res]
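A minimal sketch of one such serialization (left-to-right accumulation; the array x is assumed to be declared elsewhere):

  /* One serialization of reduce(+, [i], {|0<=i<5} : x[i]). */
  int res = 0;
  for (int i = 0; i < 5; i++)
    res = res + x[i];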
82
FIR Filter:  y_i = Σ_{j=0}^{N} a_j x_{i-j}
Serialize Reduction

y_i = Y[i,N]
Y[i,j] = { j = 0 : A[0,j] X[i-j,0];
           j > 0 : Y[i,j-1] + A[0,j] X[i-j,0] }
A[0,j] = a_j
X[i,0] = x_i
83
Uniformization
Affine dependences can be replaced by
chains of uniform dependences
Also called localization
Localizes communication
Example: A[i] = foo(B[0], …)
84
Uniformization
Affine dependences can be replaced by
chains of uniform dependences
Also called localization
Localizes communication
Example: A[i] = foo(B[i], …)
B[i] = B[i-1];
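A hedged C sketch of this localization (the pipeline array Bp and the function foo are illustrative names; the slide reuses the name B):

  /* Uniformization of the broadcast read B[0]: propagate the value
   * through a pipeline so each iteration reads only its neighbour. */
  Bp[0] = B[0];
  for (int i = 0; i <= N; i++) {
    if (i > 0)
      Bp[i] = Bp[i - 1];   /* uniform dependence of distance 1 */
    A[i] = foo(Bp[i]);     /* no broadcast from B[0] any more  */
  }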
85
FIR Filter:  y_i = Σ_{j=0}^{N} a_j x_{i-j}
Uniformize Dependences

y_i = Y[i,N]
Y[i,j] = { j = 0 : A[0,j] X[i-j,0];
           j > 0 : Y[i,j-1] + A[0,j] X[i-j,0] }
A[0,j] = a_j
X[i,0] = x_i
86
FIR Filter:  y_i = Σ_{j=0}^{N} a_j x_{i-j}
Uniformize Dependences

y_i = Y[i,N]
Y[i,j] = { j = 0 : A[i,j] X[i-j,0];
           j > 0 : Y[i,j-1] + A[i,j] X[i-j,0] }
A[i,j] = { i = 0 : a_j;
           i > 0 : A[i-1,j] }
X[i,j] = { j = 0 : x_i;
           j > 0 : X[i,j-1] }
87
FIR Filter:  y_i = Σ_{j=0}^{N} a_j x_{i-j}
Uniformize Dependences

y_i = Y[i,N]
Y[i,j] = { j = 0 : A[i,j] X[i,j];
           j > 0 : Y[i,j-1] + A[i,j] X[i,j] }
A[i,j] = { i = 0 : a_j;
           i > 0 : A[i-1,j] }
X[i,j] = { j = 0 : x_i;
           j > 0 : X[i-1,j-1] }
88
FIR Filter:  y_i = Σ_{j=0}^{N} a_j x_{i-j}
Final Dependences

y_i = Y[i,N]
Y[i,j] = { j = 0 : A[i,j] X[i,j];
           j > 0 : Y[i,j-1] + A[i,j] X[i,j] }
A[i,j] = { i = 0 : a_j;
           i > 0 : A[i-1,j] }
X[i,j] = { j = 0 : x_i;
           j > 0 : X[i-1,j-1] }
89
FIR Filter:  y_i = Σ_{j=0}^{N} a_j x_{i-j}
Schedule

[Figure: the (i,j) dependence graph to be scheduled]
90
FIR Filter:  y_i = Σ_{j=0}^{N} a_j x_{i-j}
Schedule: θ(i,j) = i+j
91
FIR Filter:  y_i = Σ_{j=0}^{N} a_j x_{i-j}
Allocate: π(i,j) = j    (θ(i,j) = i+j)
92
FIR Filter:  y_i = Σ_{j=0}^{N} a_j x_{i-j}
Transform: make i=θ, j=π, where θ(i,j) = i+j and π(i,j) = j
93
FIR Filter:  y_i = Σ_{j=0}^{N} a_j x_{i-j}
Final Equations
  Difference in i = delay
  Difference in j = PE

y_i = Y[i,N]
Y[i,j] = { j = 0 : A[i,j] X[i,j];
           j > 0 : Y[i-1,j-1] + A[i,j] X[i,j] }
A[i,j] = { i = 0 : a_j;
           i > 0 : A[i-1,j] }
X[i,j] = { j = 0 : x_i;
           j > 0 : X[i-2,j-1] }
94
FIR Filter:  y_i = Σ_{j=0}^{N} a_j x_{i-j}
Final Equations
  Difference in i = delay
  Difference in j = PE

y_i = Y[i,N]
Y[i,j] = { j = 0 : A[i,j] X[i,j];
           j > 0 : Y[i-1,j-1] + A[i,j] X[i,j] }
A[i,j] = { i = 0 : a_j;
           i > 0 : A[i-1,j] }
X[i,j] = { j = 0 : x_i;
           j > 0 : X[i-2,j-1] }   ← need a register for delay of 2
95
FIR Filter:  y_i = Σ_{j=0}^{N} a_j x_{i-j}
Final Equations
  Difference in i = delay
  Difference in j = PE

y_i = Y[i,N]
Y[i,j] = { j = 0 : A[i,j] X[i,j];
           j > 0 : Y[i-1,j-1] + A[i,j] X[i,j] }
A[i,j] = { i = 0 : a_j;
           i > 0 : A[i-1,j] }   ← no communication (stays in the same PE)
X[i,j] = { j = 0 : x_i;
           j > 0 : X[i-2,j-1] }
96
FIR Filter:  y_i = Σ_{j=0}^{N} a_j x_{i-j}
Generate HDL

[PE datapath: inputs X and a feed a multiplier (*), whose result is added (+) into Y]

y_i = Y[i,N]
Y[i,j] = { j = 0 : A[i,j] X[i,j];
           j > 0 : Y[i-1,j-1] + A[i,j] X[i,j] }
A[i,j] = { i = 0 : a_j;
           i > 0 : A[i-1,j] }
X[i,j] = { j = 0 : x_i;
           j > 0 : X[i-2,j-1] }
97