(BOCA) Bijzondere Onderwerpen Computer Architectuur
Block A: Introduction

1  The aims of the course
• Show the relation between the algorithm and the architecture.
• Derive the architecture from the algorithm.
• Explain and formalize the design process.
• Explain the distinction between structure and behavior.
• Explain some architectures.

2  The design process
A design description may express:
• Behavior: the relation between the input and output value streams of the system.
• Structure: how the system is decomposed into subsystems and how these subsystems are connected.
• Geometry: where the different parts are located.
Pure behavioral, structural or geometrical descriptions do not exist in practice.

3  Abstraction levels
[Figure: abstraction levels arranged along the three axes Behavior, Structure and Geometry — behavior: application/algorithm, basic operator, Boolean logic, physical level; structure: processing element, basic block, cell, transistor; geometry: board level, block level, layout.]

4  The design process
Implementation i is the specification for implementation i+1:
    Idea → Spec 0 → Spec 1 → ... → Spec N
• The step Idea → Spec 0 is verified by simulation only.
• Each step Spec i → Spec i+1 is verified by simulation and formal verification.
For practical reasons a specification must be executable.

5  Descriptions
• Predicate logic
• Algebra (language Z, SDL, VDM)
• Process algebras: CCS, CSP, LOTOS
• VHDL, Verilog
• Silage, ...

6  Specification overloading
Specification overloading means that the specification gives a possibly unwanted implementation suggestion, i.e. the behavioral specification expresses structure.
In practice a behavioral specification always contains structure.

7  Example: same function
Same behavior, different expressions → different structure → different designs.
    z = (a + b)²          suggests an adder (x = a + b) followed by a squarer (z = x²),
    z = (a + b)·(a + b)   suggests an adder (x = a + b) followed by a multiplier (z = x·x).

8  Architecture
Definition: Architecture is the way in which hardware and software is structured; the structure is usually based on grandiose design philosophies.
Architecture deals with fundamental elements that affect the way a system operates, and thus its capabilities and its limitations. (The New American Computer Dictionary)

9  Our focus
• Array processors
• Systolic arrays
• Wave-front array processors
• Architectures for embedded algorithms such as digital signal processing algorithms

10  Array processor
An array processor is a structure in which identical processing elements are arranged regularly.
[Figure: a one-dimensional row of PEs and a two-dimensional grid of PEs.]

11  Array processor
[Figure: a three-dimensional arrangement of PEs.]

12  Systolic array
In a systolic array processor all communication paths contain at least one unit delay (register).
[Figure: a grid of PEs with a register (delay) on every communication path.]
Delay constraints are local. Therefore the array can be extended without limit and without changing the cells.

13  Wave-front array
[Figure: a grid of PEs communicating without the fixed unit delays of the systolic array.]

14  Array processors
Can be approached from:
• Application
• Algorithm
• Architecture
• Technology
We will focus on the path from algorithm to architecture: derive the architecture from the algorithm.

15  Array processors: application areas
• Speech processing
• Image processing (video, medical, ...)
• Radar
• Weather
• Medical signal processing
• Geology
• ...
These applications require many simple calculations on a lot of data in a short time. General-purpose processors do not provide sufficient processing power.

16  Example: video processing
• 1000 operations per pixel (which is not that much)
• 1024 × 1024 pixels per frame (high-density TV)
• 50 frames per second (100 Hz TV)
This amounts to about 50 G operations per second, with less than 1 W available. A 2 GHz Pentium delivers about 2 G operations per second and requires more than 30 W; 25 Pentiums would draw about 750 W.

17  Description of the algorithms
In practice the algorithms are described (specified) in:
• some programming language.
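The throughput estimate in the video-processing example can be checked with a quick calculation. This is a sketch of the arithmetic only; the 1000 operations/pixel workload is the slide's own assumption:

```python
# Throughput estimate from the video-processing example (slide 16).
ops_per_pixel = 1000            # assumed workload per pixel (from the slide)
pixels_per_frame = 1024 * 1024
frames_per_second = 50

ops_per_second = ops_per_pixel * pixels_per_frame * frames_per_second
print(ops_per_second / 1e9)     # ≈ 52.4 G operations per second

# A 2 GHz processor delivering ~2 G operations/s would need about 26 copies,
# roughly matching the slide's "25 Pentiums":
processors_needed = ops_per_second / 2e9
print(processors_needed)        # ≈ 26.2
```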
In our (toy) examples we use:
• programming languages
• algebraic descriptions

18  Examples of algorithms we will use
Filter:          y(t) = Σ_{i=0}^{N-1} h(i)·x(t − i)
Matrix algebra:  y = C·x,  i.e.  y(i) = Σ_{j=0}^{N-1} c(i,j)·x(j)
Transformations like the Fourier transform and the Z-transform, sorting, ...

19  Graphs
Graphs are applicable for describing
• behavior
• structure
Dependency graphs consist of:
• nodes expressing operations or functions
• edges expressing data dependencies or the flow of data
So graphs are suitable to describe the design flow from algorithm to architecture.

20  Design flow example: sorting
idea → program (imperative) → single assignment code (functional) → recurrent relations → dependency graph

21  Sorting: the idea
[Figure: a new element is compared against the sorted array; larger elements are shifted one position to create an empty place, and the element is inserted there.]

22–23  [Figure: step-by-step insertion of x into the sorted cells m[j-1], m[j], m[j+1]; whenever x ≥ m[j] the pair is swapped: y := m[j]; m[j] := x; x := y.]

Sorting: inserting one element
    if (x >= m[j])            if (x >= m[j]) {
        swap(m[j], x);            y = m[j];
                                  m[j] = x;
                                  x = y;
                              }
These are identical descriptions of swapping; we write them compactly as
    m[j], x = MaxMin(m[j], x);

24  Inserting an element into a sorted array of i elements such that the order is preserved:
    m[i] = -infinite;
    for (j = 0; j < i+1; j++) {
        m[j], x = MaxMin(m[j], x);
    }

25  Sorting: the program
Sorting N elements in an array is composed from N times inserting an element into a sorted array such that the order is preserved. An empty array is ordered.
input:
    int in[0:N-1], x[0:N-1], m[0:N-1];
    for (int i = 0; i < N; i++) { x[i] = in[i]; m[i] = -infinite; }
body:
    for (int i = 0; i < N; i++) {
        for (j = 0; j < i+1; j++) { m[j], x[i] = MaxMin(m[j], x[i]); }
    }
output:
    for (int j = 0; j < N; j++) { out[j] = m[j]; }

Sorting: towards ‘single assignment’
Single assignment: each scalar variable is assigned only once. Why?
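The sorting program above is pseudocode; it can be sketched as a runnable program. Python is chosen here because the tuple assignment m[j], x = MaxMin(m[j], x) is directly expressible; the names max_min and NEG_INF are ours, standing in for MaxMin and -infinite:

```python
# Runnable sketch of the sorting program (slides 24-25).
# max_min returns the pair (maximum, minimum) -- the MaxMin of the slides.
def max_min(a, b):
    return (a, b) if a >= b else (b, a)

def sort_by_insertion(inp):
    N = len(inp)
    NEG_INF = float('-inf')          # plays the role of -infinite
    x = list(inp)                    # x[i] = in[i]
    m = [NEG_INF] * N                # an "empty" ordered array
    # body: insert x[i] into the sorted prefix m[0..i]
    for i in range(N):
        for j in range(i + 1):
            m[j], x[i] = max_min(m[j], x[i])
    return m                         # out[j] = m[j], in descending order

print(sort_by_insertion([3, 1, 5, 2]))   # [5, 3, 2, 1]
```

The result matches the example of slide 41, where the input 3, 1, 5, 2 is sorted into 5, 3, 2, 1.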
Goal is a data dependency graph:
• nodes expressing operations or functions
• edges expressing data dependencies or the flow of data

26  Sorting: towards ‘single assignment’
Consider the code
    x = a + b;
    x = c * d;
The first statement yields an addition node with inputs a and b; the second a multiplication node with inputs c and d. How do you connect these to a single output x? You cannot: the graph is ambiguous.

27  This description is already optimized towards an implementation (memory optimization). Fundamentally you produce two different values, e.g. x1 and x2.

28  Sorting: towards ‘single assignment’
Start with m[j]: m[j] at loop index i depends on its value at loop index i-1. Hence
    for (int i = 0; i < N; i++) {
        for (j = 0; j < i+1; j++) { m[j], x[i] = MaxMin(m[j], x[i]); }
    }
becomes
    for (int i = 0; i < N; i++) {
        for (j = 0; j < i+1; j++) { m[i,j], x[i] = MaxMin(m[i-1,j], x[i]); }
    }

29  x[i] at loop index j depends on its value at loop index j-1. Hence the body becomes
    for (int i = 0; i < N; i++) {
        for (j = 0; j < i+1; j++) { m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1]); }
    }

30  Sorting: the algorithm in ‘single assignment’
input:
    int in[0:N-1], x[0:N-1,-1:N-1], m[-1:N-1,0:N-1];
    for (int i = 0; i < N; i++) { x[i,-1] = in[i]; m[i-1,i] = -infinite; }
body:
    for (int i = 0; i < N; i++) {
        for (j = 0; j < i+1; j++) { m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1]); }
    }
output:
    for (int j = 0; j < N; j++) { out[j] = m[N-1,j]; }
All scalar variables are assigned only once.
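The single-assignment version can likewise be executed directly. In this sketch the doubly indexed m[i,j] and x[i,j] are kept in dictionaries, so every (i,j) entry is written exactly once; max_min and the dictionary representation are our choices, not prescribed by the slides:

```python
# Runnable sketch of the single-assignment sorting algorithm (slide 30).
def max_min(a, b):
    return (a, b) if a >= b else (b, a)

def sort_single_assignment(inp):
    N = len(inp)
    NEG_INF = float('-inf')
    x, m = {}, {}                     # each (i, j) key is assigned exactly once
    for i in range(N):                # input
        x[i, -1] = inp[i]
        m[i - 1, i] = NEG_INF
    for i in range(N):                # body
        for j in range(i + 1):
            m[i, j], x[i, j] = max_min(m[i - 1, j], x[i, j - 1])
    return [m[N - 1, j] for j in range(N)]   # output

print(sort_single_assignment([3, 1, 5, 2]))  # [5, 3, 2, 1]
```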
The algorithm satisfies the single assignment property.

31–36  Sorting: the algorithm in ‘single assignment’
[Figures: the two-dimensional grid of MaxMin nodes realizing the body, with i running over the rows and j over the columns. The inputs in[i] enter as x[i,-1], the constants -∞ enter as m[i-1,i], and the sorted result leaves as m[N-1,j]. Successive slides trace the example input 5, 7, 4, 6 through the grid.]
    for (int i = 0; i < N; i++) {
        for (j = 0; j < i+1; j++) { m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1]); }
    }

37  Sorting: recurrent relation
A description in single assignment can be directly translated into a recurrent relation:
declaration:  in[0:N-1], out[0:N-1], x[0:N-1,-1:N-1], m[-1:N-1,0:N-1];
input:        x[i,-1] = in[i];  m[i-1,i] = -infinite
body:         m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1])
output:       out[j] = m[N-1,j]
area:         0 <= i < N;  0 <= j < i+1
Notice that the order of these relations is arbitrary.

38  Sorting: body in two dimensions
    m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1])
The
body is executed for all i and j; hence the dependency graph has two dimensions.
[Figure: a MaxMin node with input m[i-1,j] arriving along direction i and input x[i,j-1] arriving along direction j, producing outputs m[i,j] and x[i,j].]

39  Sorting: body implementation
body:  m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1])
    if (m[i-1,j] <= x[i,j-1]) {
        m[i,j] = x[i,j-1];
        x[i,j] = m[i-1,j];
    } else {
        m[i,j] = m[i-1,j];
        x[i,j] = x[i,j-1];
    }
[Figure: the PE realized with a comparator steering two multiplexers.]

40  Sorting: implementation, N = 4
[Figure: a triangular array of MaxMin PEs, one row per i = 0..3, with i+1 PEs in row i. The inputs x[i,-1] enter at the left, the constants m[-1,0] = m[0,1] = m[1,2] = m[2,3] = -∞ enter at the top of each column, and the sorted values m[3,0], m[3,1], m[3,2], m[3,3] leave at the bottom.]

41  Sorting: example, N = 4
[Figure: the input sequence 3, 1, 5, 2 is sorted by the array into 5, 3, 2, 1.]

42  Something on functions
Tuple: (a, b) with a ∈ A and b ∈ B.
Cartesian product: the set of all tuples (a, b) is A × B, so (a, b) ∈ A × B.
The number of tuples in the set A × B is |A × B| = |A| · |B|.
If Q is a set and P is a subset of Q (P ⊆ Q), then the set of all subsets of Q is 2^Q, and the number of subsets of Q is |2^Q| = 2^|Q|.
Hence the set of all subsets of A × B is 2^(A×B), and the number of subsets of A × B is |2^(A×B)| = 2^(|A|·|B|).

43  Something on functions
F: X → Y. The set of all functions with domain X and co-domain Y is written X → Y.
F is a function in X → Y if and only if each element of the domain X is mapped by F onto a single element of the co-domain Y. Hence
    ∀a: a ∈ X ⇒ F(a) ∈ Y
and
    ∀a, b, c: (a ∈ X ∧ F(a) = b ∧ F(a) = c) ⇒ b = c.
F can be represented as a set of tuples (a, b) with a ∈ X and b ∈ Y. Hence (X → Y) ⊆ 2^(X×Y).

44  Functions, arrays, tuples, sequences, ...
Arrays, tuples and sequences are all representations of the same set of functions D(l,u) → V, in which D(l,u) is a closed subset of the set of integers Z:
    D(l,u) = { z | z ∈ Z ∧ l ≤ z ≤ u }
and V is some value co-domain. So y = (y_0, y_1, y_2, ..., y_{N-1}) corresponds to y ∈ D(0,N-1) → V.
Hence y_i, y(i) and y[i] are syntactically different notations for the function value in i.
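The identification of an array with a function on an integer interval can be illustrated directly. This is a small sketch of our own; the dict plays the role of the set-of-tuples representation of a function:

```python
# An array as a function on D(l,u) = { z in Z | l <= z <= u }  (slides 42-44).
# The function is represented by its set of (argument, value) tuples -- a dict.
y_list = [10, 20, 30, 40]                      # y = (y0, y1, y2, y3)
y_fun = {i: v for i, v in enumerate(y_list)}   # y in D(0,3) -> V

# y[i] and y(i) denote the same function value:
for i in range(4):
    assert y_list[i] == y_fun[i]

# The domain is the closed integer interval D(0,3):
assert set(y_fun) == {0, 1, 2, 3}
print("array and function agree on D(0,3)")
```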
45  Functions on more than one variable: currying
A function on two variables can be represented in three different ways: F(a, b), F_a(b) and F_b(a).
Uncurried:  F ∈ A × B → V,  F = { ((a, b), v) | (a, b) ∈ A × B ∧ v = F(a, b) }.

46  Currying
Curried in a:  F* ∈ A → (B → V),  F* = { (a, p) | a ∈ A ∧ p ∈ B → V },  with F*(a) = F_a and F_a(b) = F(a, b).
Curried in b:  F** ∈ B → (A → V),  F** = { (b, q) | b ∈ B ∧ q ∈ A → V },  with F**(b) = F_b and F_b(a) = F(a, b).

47  Currying (example)
v = a + 2b, with a ∈ {0, 1, 2} and b ∈ {0, 1, 2, 3}:

         b:  0  1  2  3
    a = 0:   0  2  4  6
    a = 1:   1  3  5  7
    a = 2:   2  4  6  8

F(1,2) = F_1(2) = F_2(1) = 5.
F_1 is the row { (0,1), (1,3), (2,5), (3,7) };  F_2 is the column { (0,4), (1,5), (2,6) }.

48  Linear time-invariant systems
x and y are streams: x, y ∈ Z → V, where time is represented by the set of integers Z.
So a system F maps functions on functions: F ∈ (Z → V) → (Z → V).
Obviously this class of functions also models systems that cannot exist in reality, for example non-causal systems.

49  Adding functions
x1 and x2 are streams modeled by functions on Z. Their sum is defined pointwise:
    (x1 + x2)(z) = x1(z) + x2(z)   for all z ∈ Z.

50  Linear functions, linear systems
Definition: a system F is called linear if
    F(a·x1 + b·x2) = a·F(x1) + b·F(x2).

51  Time-invariant systems
Definition: a system F is called time invariant if, whenever y1 = F(x1) and x2(t) = x1(t − τ), then y2 = F(x2) satisfies y2(t) = y1(t − τ).

52  Linear time-invariant systems: why?
Linear, because they can easily be described. Time-invariant, because electrical components such as transistors, resistors, capacitances and inductances satisfy this property.

53  The convolution algorithm
The behavior of a linear time-invariant system can be fully described by its impulse response h, i.e. the response on the output to a single unit pulse on the input.
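Before the convolution is derived, the linearity and time-invariance definitions above can be checked numerically for a concrete system. The 3-tap moving average used here is our illustrative choice, not a system from the slides; streams are modeled, as on slide 48, as functions on Z:

```python
# Numerical check of linearity and time invariance (slides 50-51)
# for an example system: y(t) = (x(t) + x(t-1) + x(t-2)) / 3.
def F(x):
    # x is a stream modeled as a function Z -> V
    return lambda t: (x(t) + x(t - 1) + x(t - 2)) / 3

def shift(x, tau):
    return lambda t: x(t - tau)

x1 = lambda t: float(t * t)
x2 = lambda t: float(3 * t + 1)

for t in range(-5, 6):
    # linearity: F(a*x1 + b*x2) = a*F(x1) + b*F(x2), with a = 2, b = -1
    lhs = F(lambda u: 2 * x1(u) - x2(u))(t)
    rhs = 2 * F(x1)(t) - F(x2)(t)
    assert abs(lhs - rhs) < 1e-9
    # time invariance: F(shift(x1, 3))(t) = F(x1)(t - 3)
    assert abs(F(shift(x1, 3))(t) - F(x1)(t - 3)) < 1e-9
print("moving average is linear and time-invariant")
```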
The response y on the output to an input stream x then follows from
    y(z) = Σ_i x(i)·h(z − i),   or   y = x ⊛ h.
We will derive this convolution operation for time-discrete signals.

54  The convolution algorithm
Let the unit sample sequence δ(i) be defined by
    δ(i)(z) = 1 if i = z, and 0 otherwise,
in which z represents time (z ∈ Z) and i represents the location of the unit pulse (i ∈ Z). So δ ∈ Z → (Z → V).

55  Step 1: express x using the delta function.

56  Then
    x = Σ_i x(i)·δ(i),
in which δ(i) is a function on Z and x(i) is a scalar.
[Figure: a sample sequence decomposed as x(2)·δ(2) + x(3)·δ(3) + x(4)·δ(4) + x(5)·δ(5).]

57  Step 2: rewrite the time-shifted delta function.

58  Shifting x over τ gives y(z) = x(z − τ). Hence
    δ(i)(z) = δ(0)(z − i).

59  Step 3: rewrite the impulse response using the time-invariance property.

60  Consider a linear time-invariant system F, and let h*(i) be the response of this system to the unit sample sequence δ(i). F is time-invariant, so
    h*(i)(z) = h*(0)(z − i).

61  Step 4: rewrite the impulse response using the linearity property.

62  [Figure: example responses — δ(0)(z) → h*(0)(z);  −δ(1)(z) → −h*(1)(z);  ½·δ(2)(z) → ½·h*(2)(z).]
In general, by linearity, the response to a·δ(i) is a·h*(i), and
    a·h*(i)(z) = a·h*(0)(z − i).

63  Step 5: rewrite the general expression by algebraic manipulation using the result from step 4.

64  The convolution algorithm
    y = F(x),  h*(i) = F(δ(i)),  and  h = h*(0)
is called the impulse response of the system F.

65  From the preceding we derive:
    x = Σ_i x(i)·δ(i)        (δ(i) a function on Z, x(i) a scalar).
F is linear and x(i) is a scalar, hence
    y = F(x) = Σ_i x(i)·F(δ(i)) = Σ_i x(i)·h*(i).

66  Recall that addition of functions is pointwise: (h1 + h2)(z) = h1(z) + h2(z) for all z ∈ Z. Hence
    y(z) = Σ_i x(i)·h*(i)(z),
and with h*(i)(z) = h*(0)(z − i):
    y(z) = Σ_i x(i)·h*(0)(z − i).

67  Recall h = h*(0), so
    y(z) = Σ_i x(i)·h(z − i).
This is called the convolution operation, denoted by y = x ⊛ h. We will apply this formula several times.

68  With j = z − i we obtain
    y(z) = Σ_j x(z − j)·h(j),
and if the impulse response h is finite (bounded), i.e. h(z) = 0 if z < 0 or z ≥ N, we get
    y(z) = Σ_{j=0}^{N-1} x(z − j)·h(j).

69  Dependency graphs and signal flow graphs
The array processor graph describes:
• the way in which the processors are arranged, and
• the way in which the data is communicated between the processing elements.
Hence the graph describes the dependencies of the data that is communicated; said differently, it describes the way in which the data values at the outputs of a processing element depend on the data at the outputs of the other processing elements. So we may consider it as a dependency graph or a signal flow graph.

70  Dependency graphs and signal flow graphs
Dependency graph: all communicated values are scalars and the processing elements are functions on scalars.
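Returning to the convolution result of slide 68: the finite-impulse-response formula can be sketched and checked in a few lines. The function name and the convention x(t) = 0 outside the stored samples are our own:

```python
# Convolution with a finite impulse response (slide 68):
# y(z) = sum_{j=0}^{N-1} x(z - j) * h(j), taking x(t) = 0 outside its support.
def convolve(x, h):
    N = len(h)
    def at(z):
        return sum(x[z - j] * h[j] for j in range(N) if 0 <= z - j < len(x))
    return [at(z) for z in range(len(x) + N - 1)]

x = [1, 2, 3]
h = [1, 1, 1]               # h(z) = 0 for z < 0 or z >= N
print(convolve(x, h))       # [1, 3, 6, 5, 3]
```

Feeding a unit pulse through the system returns the impulse response itself, as the derivation requires.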
Each arrow carries only one value; time does not play a role. With value domain V and N inputs and outputs, a PE is a function in V^N → V^N.
Signal flow graph: the communicated values are streams, i.e. functions on time, and the processing elements are functions on streams: a PE is a function in (Z → V)^N → (Z → V)^N, where Z represents time.

71  Recurrent relations
For simple algorithms the transformation from single assignment code to a recurrent relation is simple. Questions to answer:
• How do recurrent relations influence the dependency graph?
• How can recurrent relations be manipulated such that the behavior remains the same while the structure of the dependency graph changes?
We will answer these questions by means of an example: matrix-vector multiplication c = A·b.

72  Matrix-vector multiplication c = A·b
    c(i) = Σ_{j=0}^{N-1} a(i,j)·b(j)
Recurrent relations:
    s(i,j) = s(i,j-1) + a(i,j)·b(j),  s(i,-1) = 0,  c(i) = s(i,N-1),
    with i = 0, ..., K-1 and j = 0, ..., N-1.
Alternative (because + is associative):
    s(i,j) = s(i,j+1) + a(i,j)·b(j),  s(i,N) = 0,  c(i) = s(i,0),
    with i = 0, ..., K-1 and j = 0, ..., N-1.

73  Matrix-vector multiplication
The basic cell is described by s(i,j) = s(i,j-1) + a(i,j)·b(j). We have two indices i and j, so the dependency graph can be described as a two-dimensional array.
[Figure: the PE contains a multiplier forming a(i,j)·b(j) and an adder producing s(i,j) from s(i,j-1).]

74  DG-1 of the matrix-vector multiplication
    s(i,j) = s(i,j-1) + a(i,j)·b(j),  s(i,-1) = 0,  c(i) = s(i,N-1),
    i = 0, 1, 2, 3 (K = 4);  j = 0, 1, 2 (N = 3).
[Figure: a 4 × 3 grid of PEs; s flows horizontally from s(i,-1) = 0 to c(i) = s(i,2), and each b(j) is broadcast to the whole column j.]
b0, b1 and b2 are global dependencies.
Therefore this graph is called a globally recursive graph.

75  DG-2 of the matrix-vector multiplication
    s(i,j) = s(i,j+1) + a(i,j)·b(j),  s(i,N) = 0,  c(i) = s(i,0),
    i = 0, 1, 2, 3 (K = 4);  j = 0, 1, 2 (N = 3).
[Figure: the same 4 × 3 grid, but s now flows from s(i,3) = 0 towards c(i) = s(i,0).]

76  Variable naming and index assignment
A variable associated with an arrow gets the indices of the processing element that delivers its value. Local constants get the indices of the processing element they are in.
[Figure: PE(i,j) receives b(i-1,j-1), c(i-1,j) and a(i,j-1), holds the local constant v(i,j), and delivers a(i,j), b(i,j) and c(i,j).]

77  Recurrent relations: conclusion
The associative operations + and · result in two different recurrent relations and thus in two different dependency graphs.
Equation y = Σ_{i=0}^{N-1} x(i) results in
    s(i) = s(i-1) + x(i),  s(-1) = 0,  y = s(N-1),   or
    s(i) = s(i+1) + x(i),  s(N) = 0,   y = s(0),
    with i = 0, 1, ..., N-1.
Equation y = Π_{i=0}^{N-1} x(i) results in
    s(i) = s(i-1)·x(i),  s(-1) = 1,  y = s(N-1),   or
    s(i) = s(i+1)·x(i),  s(N) = 1,   y = s(0),
    with i = 0, 1, ..., N-1.
Other associative operations are for example ‘AND’ and ‘OR’.

78  Changing global data dependencies into local data dependencies
Global data dependencies resist manipulation of the dependency graph.
    c(i) = Σ_{j=0}^{N-1} a(i,j)·b(j)
Global data dependencies:
    s(i,j) = s(i,j-1) + a(i,j)·b(j),  s(i,-1) = 0,  c(i) = s(i,N-1).
Local data dependencies: propagate b through the array in an extra variable d:
    s(i,j) = s(i,j-1) + a(i,j)·d(i-1,j),  s(i,-1) = 0,  c(i) = s(i,N-1),
    d(-1,j) = b(j),  d(i,j) = d(i-1,j).

79  Changing global data dependencies into local data dependencies
So the matrix-vector multiplication becomes (with b(j) entering as d(-1,j)):
Relations:
    s(i,j) = s(i,j-1) + a(i,j)·d(i-1,j),  s(i,-1) = 0,  c(i) = s(i,N-1),
    d(-1,j) = b(j),  d(i,j) = d(i-1,j),
    with i = 0, 1, 2, 3 (K = 4) and j = 0, 1, 2 (N = 3).
[Figure: the 4 × 3 grid with b(j) entering at the top as d(-1,j) and copied downwards from PE to PE — a locally recursive graph.]

80  Alternative transformation from global to local data dependencies
    c(i) = Σ_{j=0}^{N-1} a(i,j)·b(j)
Global data dependencies:
    s(i,j) = s(i,j-1) + a(i,j)·b(j),  s(i,-1) = 0,  c(i) = s(i,N-1).
Local data dependencies (b now propagated in the opposite direction, entering at row K):
    s(i,j) = s(i,j-1) + a(i,j)·d(i+1,j),  s(i,-1) = 0,  c(i) = s(i,N-1),
    d(K,j) = b(j),  d(i,j) = d(i+1,j).

81  Changing global data dependencies into local data dependencies
So the alternative locally recursive graph becomes:
[Figure: the 4 × 3 grid with b(j) entering at the bottom as d(4,j) = d(K,j) and copied upwards from PE to PE; i = 0, 1, 2, 3 (K = 4), j = 0, 1, 2 (N = 3).]

82  Shift-invariant graph
Consider an N-dimensional dependency graph with processing elements PE at locations (i, j, k, ...), with base (1,0,0,...), (0,1,0,...), (0,0,1,...), ... . If for any (i, j, k, ...) and for any input x of the PE at (i, j, k, ...) that is delivered by the output x of the PE at (p, q, r, ...), it holds that the input x of the PE at (i, j+1, k, ...) is delivered by the output x of the PE at (p, q+1, r, ...), then the graph is called shift-invariant in the direction (0, 1, 0, ...).
[Figure: example graphs that are shift-invariant in direction i, and in directions i and j.]

83  Shift-invariant graphs (examples)
[Figure: example graphs that are shift-invariant in directions i and j, in no direction, and in direction j only.]

84  Shift-invariant graphs
Because the inputs and outputs often negatively influence the shift-invariance property, they are treated separately. Hence we always distinguish between
• input edges,
• output edges, and
• intermediate edges.

85  Dependency graphs: conclusions
• Associative operations give two alternative DGs.
• Transformation from global to local dependencies gives two alternative DGs.
• Input, output and intermediate edges will be treated separately.
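The locally recursive matrix-vector scheme can be sketched to confirm that propagating b through the extra variable d leaves the result unchanged. The function name and the row-by-row evaluation order are ours; the relations are those of slides 78-79:

```python
# Matrix-vector multiplication c = A.b via the locally recursive relations:
# s(i,j) = s(i,j-1) + a(i,j)*d(i-1,j), s(i,-1) = 0, c(i) = s(i,N-1),
# with d(-1,j) = b(j) and d(i,j) = d(i-1,j)  (b propagated through the array).
def matvec_local(a, b):
    K, N = len(a), len(b)
    c = []
    d_prev = list(b)                      # row -1: d(-1,j) = b(j)
    for i in range(K):
        s = 0                             # s(i,-1) = 0
        d_cur = []
        for j in range(N):
            s = s + a[i][j] * d_prev[j]   # s(i,j) = s(i,j-1) + a(i,j)*d(i-1,j)
            d_cur.append(d_prev[j])       # d(i,j) = d(i-1,j)
        c.append(s)                       # c(i) = s(i,N-1)
        d_prev = d_cur
    return c

A = [[1, 2, 0], [0, 1, 1], [2, 0, 1], [1, 1, 1]]   # K = 4, N = 3
b = [1, 2, 3]
print(matvec_local(A, b))   # [5, 5, 5, 6]
```

The result agrees with the direct definition c(i) = Σ_j a(i,j)·b(j), since d merely copies b from row to row.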