Design & Co-design of Embedded Systems
Distributed System Co-synthesis (2)
Maziar Goudarzi
Fall 2005

Today’s Program
  • Introduction
  • Preliminaries
  • Hardware/Software Partitioning
  • Distributed System Co-Synthesis (part 2)
  References:
  • Wayne Wolf, “Hardware/Software Co-Synthesis Algorithms,” Chapter 2, Hardware/Software Co-Design: Principles and Practice, Eds: J. Staunstrup, W. Wolf, Kluwer Academic Publishers, 1997.
  • W. Wolf, “An architectural co-synthesis algorithm for distributed, embedded computing systems,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 5, no. 2, pp. 218-229, 1997.

Topics
  • Introduction
  • An Integer Linear Programming Model
  • A Heuristic Algorithm
    • On ordinary task graphs
    • On an Object-Oriented model

Co-Synthesis Algorithms: Distributed System Co-Synthesis
Wolf’s Heuristic Algorithm on Ordinary Task Graphs

Wolf’s Heuristic Algorithm
  As ever, topics of importance:
  • System Specification Language/Model
  • Target Architecture
  • Functionality (Allocation/Scheduling) Quantum
  • Allocation Strategy
  • Scheduling Strategy
  • Cost Estimation
  • Performance Estimation
  • Algorithm Details

Wolf’s Heuristic Algorithm (cont’d)
  • System Specification Language/Model
    • Algorithm input: single-rate task graph
  • Target Architecture
    • Heterogeneous multiprocessor architecture
  • Allocation
    • Primal approach: performance is the major objective
  • Scheduling
    • ?
  • Functionality Quantum
    • Processes in a single-rate task graph

Wolf’s Heuristic Algorithm (cont’d)
  • Performance Estimation
    • Component Technology Library
    • The run time of each process on each available PE is assumed to be known
  • Cost Estimation
    • Component Technology Library
    • Total Cost = Σ_i Cost(PE_i) + Σ_j Cost(Device_j) + Σ_k Cost(Comm. Channel_k)
  • Algorithm Details

Wolf’s Heuristic Algorithm Details
  Four major steps in co-design:
  • Partitioning: dividing the spec. into smaller parts (e.g. processes)
  • Allocation: assigning each process to a multiprocessor node (PE)
  • Scheduling: serializing the processes assigned to each PE
  • Mapping: selecting a particular component for each PE
  Problem: these steps (especially allocation, scheduling, and mapping) have a circular relationship
  Solution: break the loop

Wolf’s Heuristic Algorithm Details (cont’d)
  Wolf:
  1. Give an initial allocation
  2. Refine it to reduce cost
  Order of satisfying design criteria:
  1. Satisfy all deadlines
  2. Minimize PE cost
  3. Minimize comm. port cost
  4. Minimize device cost

Wolf’s Heuristic Algorithm Details (cont’d)
  First ignore communication costs; take them into account later.
  Steps:
  1. Create an initial feasible solution, and perform an initial scheduling on it.
     • Initial feasible solution: assign each process to a separate PE
  2. Reallocate processes to PEs to minimize total PE cost.
     • Possibly eliminate PEs from the initial feasible solution
  3. Reallocate processes again to minimize the amount of communication required between PEs.
  4. Allocate communication channels.
  5. Allocate I/O devices (internal or external to PEs).

Wolf’s Heuristic Algorithm Details (cont’d)
  The most important step: step 2, initial reallocation
  • Reason: PE cost is the dominant hardware cost
  Initial reallocation:
  1. PE cost reduction
     1.1 Scan the PEs, starting with the least-utilized PE.
     1.2 Try to reallocate that PE’s processes to other existing PEs.
     1.3 If no process is left on the PE, eliminate it; otherwise replace the PE with a suitable lower-cost one.
  2. Pair-wise merge
     • Merge a pair of PEs into a single, more powerful one
  3. Load balancing

Wolf’s Heuristic Algorithm Details (cont’d)
  Initial reallocation (cont’d)
  • The “PE cost reduction” phase tries to reallocate multiple processes at a time
  • The above 3 phases are repeated as far as possible

Wolf’s Heuristic Algorithm: Experimental Results

  Example  #processes  Period  Impl. Cost       CPU time (sec)
                               Wolf    P&P      Wolf    P&P
  pp1      4           2.5     14      14       0.05    11
                       3       14      13       0.05    24
                       4       7       7        0.05    28
                       7       5       5        0.05    37
  pp2      9           5       15      15       0.7     3732
                       6       12      12       1.1     26710
                       7       8       8        1.6     32320
                       8       8       7        1.0     4511
                       15      5       5        1.1     385012

Wolf’s Heuristic Algorithm: Experimental Results (cont’d)
  • Finds optimal solutions to most of the ILP-solved examples
  • Finds near-optimal solutions for the remaining examples
  • Showed good results on larger examples
  • Requires very little run time
    • Due to the multiple-move strategy during the PE cost minimization phase

Co-Synthesis Algorithms: Distributed System Co-Synthesis
Wolf’s Heuristic Algorithm for Object-Oriented Models

Introduction
  • Target
    • Co-synthesis of a distributed system out of an Object-Oriented specification
  • Significance
    • OO is a promising approach to designing embedded systems at the electronic system level (ESL)
  • Reference:
    • W. Wolf, “Object-Oriented Co-Synthesis of Distributed Embedded Systems,” ACM Transactions on Design Automation of Electronic Systems, pp. 301-314, 1996.

OO Co-Synthesis Algorithm
  Again, our eight topics:
  • System Specification Language/Model
  • Target Architecture
  • Functionality (Allocation/Scheduling) Quantum
  • Allocation Strategy
  • Scheduling Strategy
  • Cost Estimation
  • Performance Estimation
  • Algorithm Details

OO Co-Synthesis Algorithm (cont’d)
  • System Specification Model/Language
    • An Object-Oriented specification as input
    • Method dataflow graph as model
  [Figure: example specification. Object O1: method m1 (variables v1, v2), method m2 (variables v2, v3); Object O2: method m4 (variables v10, v20); Object O3: method m3 (variables v8, v9)]

OO Co-Synthesis Algorithm (cont’d)
  • Target Architecture
    • Distributed system
    • An arbitrary-topology network of PEs
  • Functionality Quantum
    • Methods of objects in an OO specification
    • As far as possible, keeps all methods of an object together
    • Partitioning is done during algorithm execution

OO Co-Synthesis Algorithm (cont’d)
  • Cost and Performance Estimation
    • Pre-specified
    • A technology description of the available components is input to the algorithm
  • Allocation, Scheduling, and Algorithm Details
    • Much like Wolf’s previous heuristic algorithm
    • Includes modifications in order to:
      • handle large sets of methods
      • consider the effects of splitting objects across PEs

OO Co-Synthesis Algorithm (cont’d)
  Allocation, Scheduling, and Algorithm Details
  1. Initial allocation and scheduling. Allocate processes to PEs such that all tasks are placed on PEs fast enough to ensure that all deadlines are met, keeping objects together as much as possible.
  2. Minimize PE cost. Reallocate processes to PEs to minimize PE cost, splitting objects when necessary.
  3. Minimize communication. Reallocate processes again to minimize inter-PE communication, taking into account the traffic generated by splitting objects across PEs.
  4. Allocate channels. Allocate communication channels.
  5. Allocate devices. Allocate devices either as on-chip devices or as external devices on communication channels.

OO Co-synthesis Details
  • Step 1 (initial allocation)
    • One PE per object
  • Step 2 (minimize PE cost)
    • oo_balance_load()
      • Tries to redistribute methods to better balance the system load
    • PE_replacement()
      • Uses a cheaper PE without disturbing the allocation
    • oo_pairwise_merge()
      • Tries to eliminate a PE by moving its methods to other PEs
    • Step 2 is done repeatedly
    • Methods are re-scheduled after each new allocation

OO Co-synthesis Details (cont’d)
  Note: This operation may cause “hidden communication”.

OO Co-Synthesis Algorithm (cont’d)
  Experimental Results
  • Algorithm implemented in C++
    • Using the NIH class library
    • 8600 lines of code
  • Executed on an SGI Indigo workstation
  • Algorithm applied to examples from software engineering books on OO design

  Example  #objects/methods  CPU Time
  cfuge    2/3               0.05
  dye      3/15              2.0
  juice    3/4               0.05
  train    5/6               0.05

  • Reason for the highest CPU time: having the most methods => scheduling is required in each inner loop of step 2
  • This implementation had a simple, inefficient scheduler.

OO Co-Synthesis Algorithm (cont’d)
  Main contribution
  • An OO specification is an important aid to automatic partitioning
  • The specification is naturally divided into two levels of granularity
    • The system is composed of objects
    • Objects are composed of data members and methods
  • The heuristic: preserve the specification’s partitioning as much as possible

What we learned today
  • Distributed System Co-Synthesis
    • A heuristic approach
      • Non-OO algorithm
      • Customization to OO specifications
    • Heuristic: first minimize the PE cost, since it is the dominant factor
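
Illustration: PE cost reduction
  The “PE cost reduction” phase of the initial reallocation (scan the PEs from the least-utilized one, move its processes to other existing PEs, then eliminate or downgrade the PE) can be sketched roughly as below. This is a simplified sketch, not Wolf’s implementation: the component library (PE_COST, RUN_TIME), the PERIOD value and all function names are made up for illustration, processes are moved one at a time rather than with the multiple-move strategy, and the feasibility test (total run time on a PE must not exceed the period) only stands in for the real scheduler and deadline check.

```python
from dataclasses import dataclass, field

# Hypothetical component technology library: PE type -> cost.
PE_COST = {"fast": 10, "slow": 4}

# Hypothetical run time of each process on each PE type.
RUN_TIME = {
    "p1": {"fast": 1.0, "slow": 3.0},
    "p2": {"fast": 1.0, "slow": 2.0},
    "p3": {"fast": 2.0, "slow": 5.0},
    "p4": {"fast": 1.0, "slow": 2.0},
}
PERIOD = 4.0  # single-rate task graph: every process runs once per period


@dataclass
class PE:
    kind: str
    procs: list = field(default_factory=list)

    def load(self, kind=None, extra=()):
        """Total run time per period of this PE's processes (plus extra)."""
        kind = kind or self.kind
        return sum(RUN_TIME[p][kind] for p in [*self.procs, *extra])


def feasible(load):
    # Assumption: a stand-in for scheduling; a PE may not be busy
    # for longer than one period.
    return load <= PERIOD


def total_cost(pes, device_cost=0, channel_cost=0):
    # Total cost = sum of PE costs + device costs + channel costs.
    return sum(PE_COST[pe.kind] for pe in pes) + device_cost + channel_cost


def initial_allocation():
    # Initial feasible solution: one PE per process, on its fastest PE type.
    return [PE(min(PE_COST, key=lambda k: RUN_TIME[p][k]), [p]) for p in RUN_TIME]


def pe_cost_reduction(pes):
    # 1.1 Scan the PEs, starting with the least-utilized one.
    for pe in sorted(pes, key=lambda q: q.load()):
        # 1.2 Try to move each process to another PE that is still in use.
        for p in list(pe.procs):
            for other in pes:
                if other is not pe and other.procs and feasible(other.load(extra=[p])):
                    pe.procs.remove(p)
                    other.procs.append(p)
                    break
        # 1.3 If processes remain, switch to the cheapest PE type that fits.
        if pe.procs:
            fits = [k for k in PE_COST if feasible(pe.load(kind=k))]
            pe.kind = min(fits, key=PE_COST.get)
    # 1.3 PEs left without processes are eliminated.
    return [pe for pe in pes if pe.procs]


pes = initial_allocation()
print("initial cost:", total_cost(pes))
pes = pe_cost_reduction(pes)
print("reduced:", [(pe.kind, pe.procs) for pe in pes], "cost:", total_cost(pes))
```

  With these made-up numbers the sketch goes from four “fast” PEs (cost 40) to one “fast” and one “slow” PE (cost 14). Device and channel costs are left at zero because the algorithm ignores communication until the later steps.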