Introduction to Co-Synthesis - Sharif University of Technology

Design & Co-design of
Embedded Systems
Distributed System
Co-synthesis (2)
Maziar Goudarzi
Today’s Program
Introduction
Preliminaries
Hardware/Software Partitioning
Distributed System Co-Synthesis (part 2)
References:
Wayne Wolf, “Hardware/Software Co-Synthesis Algorithms,” Chapter 2,
Hardware/Software Co-Design: Principles and Practice, Eds: J. Staunstrup, W.
Wolf, Kluwer Academic Publishers, 1997.
W. Wolf, “An architectural co-synthesis algorithm for distributed, embedded
computing systems,” IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol. 5, no. 2, pp. 218-229, 1997.
Fall 2005
Design & Co-design of
Embedded Systems
Topics
Introduction
An Integer Linear Programming Model
A Heuristic Algorithm
On ordinary task graphs
On an Object-Oriented model
Co-Synthesis Algorithms:
Distributed System Co-Synthesis
Wolf’s Heuristic Algorithm
on Ordinary Task Graphs
Wolf’s Heuristic Algorithm
As ever, topics of importance:
System Specification Language/Model
Target Architecture
Functionality (Allocation/Scheduling) Quantum
Allocation Strategy
Scheduling Strategy
Cost Estimation
Performance Estimation
Algorithm Details
Wolf’s Heuristic Algorithm
(cont’d)
System Specification Language/Model
Algorithm input: single-rate task graph
Target Architecture
Heterogeneous multiprocessor architecture
Allocation
Primal approach: Performance is the major objective
Scheduling
?
Functionality Quantum
Processes in a single-rate task graph
Wolf’s Heuristic Algorithm
(cont’d)
Performance Estimation
Component Technology Library
Run-time of each process on each available PE is
supposed to be known
Cost Estimation
Component Technology Library
$$\text{Total Cost} = \sum_i \text{Cost}(\text{PE}_i) + \sum_j \text{Cost}(\text{Device}_j) + \sum_k \text{Cost}(\text{Comm. Channel}_k)$$
Algorithm Details
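The cost estimate above is a plain sum over the allocated components. A minimal sketch, assuming a hypothetical `Component` record and made-up library costs (none of these names or numbers come from the slides):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical entry of the component technology library."""
    name: str
    cost: float


def total_cost(pes, devices, channels):
    """Total cost = sum of PE costs + device costs + channel costs."""
    return (sum(c.cost for c in pes)
            + sum(c.cost for c in devices)
            + sum(c.cost for c in channels))


# Illustrative values only.
pes = [Component("cpu_a", 5.0), Component("cpu_b", 2.0)]
devices = [Component("adc", 1.0)]
channels = [Component("bus", 0.5)]
print(total_cost(pes, devices, channels))
```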
Wolf’s Heuristic Algorithm
Details
• Four major steps in co-design
  • Partitioning: dividing the spec. into smaller parts (e.g. processes)
  • Allocation: assigning each process to a multiprocessor node (PE)
  • Scheduling: serializing processes assigned to each PE
  • Mapping: selecting a particular component for each PE
• Problem: These steps (especially allocation, scheduling, and mapping) have a circular relationship
• Solution: Break the loop
Wolf’s Heuristic Algorithm
Details (cont’d)
• Wolf:
1. Give an initial allocation
2. Refine it to reduce cost
• Order of satisfying design criteria:
1. Satisfy all deadlines
2. Minimize PE cost
3. Minimize comm. port cost
4. Minimize device cost
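One way to read the priority order of the design criteria is as a lexicographic comparison: a later criterion only breaks ties on the earlier ones. The sketch below assumes that reading. The `Design` record, its field names, and the candidate values are hypothetical.

```python
from typing import NamedTuple


class Design(NamedTuple):
    """Hypothetical summary of one candidate design."""
    name: str
    missed_deadlines: int
    pe_cost: float
    port_cost: float
    device_cost: float


def priority_key(d):
    # Tuples compare element by element, so earlier criteria dominate.
    return (d.missed_deadlines, d.pe_cost, d.port_cost, d.device_cost)


candidates = [
    Design("A", 0, 10, 3, 2),
    Design("B", 0, 8, 5, 4),   # meets deadlines, cheapest PEs
    Design("C", 1, 4, 1, 1),   # cheapest overall but misses a deadline
]
best = min(candidates, key=priority_key)
print(best.name)
```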
Wolf’s Heuristic Algorithm
Details (cont’d)
First ignore communication costs. Later, take them into
account
Steps:
1. Create an initial feasible solution, and perform an initial
scheduling on it.
• Initial feasible solution: assign each process to a separate PE
2. Reallocate processes to PEs to minimize total PE cost.
• Possibly eliminate PEs from initial feasible solution
3. Reallocate processes again to minimize the amount of
communication required between PEs
4. Allocate communication channels
5. Allocate IO devices. (Internal or external to PEs)
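Step 1 can be sketched as follows. Each process gets its own PE, and communication delay is ignored, as the slide says. Picking the fastest PE type for each process is an assumption made here for illustration, and the run-time table, task graph, and deadline are hypothetical.

```python
# Hypothetical technology library: run time of each process on each PE type.
RUN_TIME = {
    "p1": {"fast": 2, "slow": 5},
    "p2": {"fast": 3, "slow": 4},
    "p3": {"fast": 1, "slow": 6},
}
# Hypothetical task graph as predecessor lists (p1 -> p3, p2 -> p3).
PREDS = {"p1": [], "p2": [], "p3": ["p1", "p2"]}


def initial_allocation(run_time):
    """One PE per process; choose the fastest PE type (an assumption)."""
    return {p: min(types, key=types.get) for p, types in run_time.items()}


def finish_times(alloc, run_time, preds):
    """Schedule each process as soon as its predecessors have finished."""
    finish = {}

    def visit(p):
        if p not in finish:
            start = max((visit(q) for q in preds[p]), default=0)
            finish[p] = start + run_time[p][alloc[p]]
        return finish[p]

    for p in preds:
        visit(p)
    return finish


alloc = initial_allocation(RUN_TIME)
finish = finish_times(alloc, RUN_TIME, PREDS)
deadline = 5  # hypothetical deadline for the whole graph
print(alloc, finish, max(finish.values()) <= deadline)
```

Because no two processes share a PE at this stage, scheduling only has to respect the data dependencies. Serializing processes on a shared PE becomes relevant in the later reallocation steps.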
Wolf’s Heuristic Algorithm
Details (cont’d)
The most important step: 2. Initial reallocation
Reason: PE cost is the dominant hardware cost
Initial reallocation
1. PE cost reduction:
1.1 Scan the PEs, starting with the least-utilized PE.
1.2 Try to reallocate that PE’s processes to other existing
PEs
1.3 If no process is left on the PE, eliminate it;
otherwise replace the PE with a suitable lower-cost one
2. Pair-wise merge
Merge a pair of PEs into a single, more powerful one
3. Load balancing
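The PE cost reduction phase can be sketched roughly as below. This is a simplification, not Wolf's implementation: a utilization bound stands in for a real schedulability check, all PEs are assumed to have the same capacity, and the "replace with a lower-cost PE" branch is left out. The function names and the percentage loads are hypothetical.

```python
def utilization(procs, load):
    """Total load (percent of the period) of the processes on one PE."""
    return sum(load[p] for p in procs)


def reduce_pe_cost(alloc, load, capacity=100):
    """alloc: PE -> list of processes; load: process -> percent of period."""
    alloc = {pe: list(procs) for pe, procs in alloc.items()}
    # 1.1 Scan the PEs, starting with the least-utilized one.
    for pe in sorted(alloc, key=lambda q: utilization(alloc[q], load)):
        # 1.2 Try to move each of its processes to another PE in use.
        for proc in list(alloc[pe]):
            targets = [t for t in alloc
                       if t != pe and alloc[t]
                       and utilization(alloc[t], load) + load[proc] <= capacity]
            if targets:
                # Prefer the fullest target so that light PEs can empty out.
                best = max(targets, key=lambda t: utilization(alloc[t], load))
                alloc[pe].remove(proc)
                alloc[best].append(proc)
    # 1.3 PEs left without processes are eliminated.
    return {pe: procs for pe, procs in alloc.items() if procs}


start = {"pe1": ["a"], "pe2": ["b"], "pe3": ["c"]}
print(reduce_pe_cost(start, {"a": 20, "b": 30, "c": 40}))
```

With these loads all three processes fit on one PE, so two PEs are eliminated. With loads of 60, 70, and 80 percent no move is possible and the allocation is returned unchanged.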
Wolf’s Heuristic Algorithm
Details (cont’d)
Initial reallocation (cont’d)
“PE cost reduction” phase tries to reallocate multiple
processes at a time
The above 3 phases are repeated as far as possible
Wolf’s Heuristic Algorithm:
Experimental Results
Example  #processes  Period  Impl. Cost      CPU time (sec)
                             Wolf    P&P     Wolf    P&P
pp1      4           2.5     14      14      0.05    11
                     3       14      13      0.05    24
                     4       7       7       0.05    28
                     7       5       5       0.05    37
pp2      9           5       15      15      0.7     3732
                     6       12      12      1.1     26710
                     7       8       8       1.6     32320
                     8       8       7       1.0     4511
                     15      5       5       1.1     385012
Wolf’s Heuristic Algorithm
Experimental Results (cont’d)
Finds optimal solutions to most of the ILP-solved examples
Finds near-optimal solutions for the remaining examples
Shows good results on larger examples
Requires very little run time
Due to the multiple-move strategy during the PE cost minimization phase
Co-Synthesis Algorithms:
Distributed System Co-Synthesis
Wolf’s Heuristic Algorithm
for Object-Oriented Models
Introduction
Target
Co-synthesis of a Distributed-System
out of an Object-Oriented Specification
Significance
OO is a promising approach in designing
embedded systems at ESL
Reference:
W. Wolf, “Object-Oriented Co-Synthesis of Distributed Embedded Systems,”
ACM Transactions on Design Automation of Electronic Systems, pp. 301-314, 1996.
OO Co-Synthesis Algorithm
Again, our eight topics
System Specification Language/Model
Target Architecture
Functionality (Allocation/Scheduling) Quantum
Allocation Strategy
Scheduling Strategy
Cost Estimation
Performance Estimation
Algorithm Details
OO Co-Synthesis Algorithm
(cont’d)
System Specification Model/Language
An Object-Oriented Specification as input
Method dataflow graph as model
[Figure: method dataflow graph over three objects.
Object O1: method m1 (variables v1, v2), method m2 (variables v2, v3).
Object O2: method m4 (variables v10, v20).
Object O3: method m3 (variables v8, v9).]
OO Co-Synthesis Algorithm
(cont’d)
Target Architecture
Distributed System
An arbitrary-topology network of PEs
Functionality Quantum
Methods of Objects in an OO Specification
As far as possible, keeps together all methods of an
object
Partitioning is done during algorithm execution
OO Co-Synthesis Algorithm
(cont’d)
Cost and Performance Estimation
Pre-specified
A technology description of available components is input
to the algorithm
Allocation, Scheduling, and Algorithm Details
Much like Wolf’s previous heuristic algorithm
Includes modifications in order to:
handle large sets of methods
consider effects of splitting objects across PEs
OO Co-Synthesis Algorithm
(cont’d)

Allocation, Scheduling, and Algorithm Details
1. Initial allocation and scheduling.
Allocate processes to PEs such that all tasks are placed
on PEs fast enough to ensure that all deadlines are met,
keeping objects together as much as possible
2. Minimize PE cost.
Reallocate processes to PEs to minimize PE cost, splitting
objects when necessary.
3. Minimize communication.
Reallocate processes again to minimize inter-PE
communication, taking into account traffic generated by
splitting objects across PEs
OO Co-Synthesis Algorithm
(cont’d)
Allocation, … Details (cont’d)
4. Allocate channels.
Allocate communication channels
5. Allocate devices.
either as on-chip devices or external devices
on communication channels
OO Co-synthesis Details
Step 1 (initial allocation)
One PE per object
Step 2 (minimize PE cost)
oo_balance_load()
Tries to redistribute methods to better balance the
system load
PE_replacement()
Use a cheaper PE without disturbing the allocation
oo_pairwise_merge()
Tries to eliminate a PE by moving its methods to other PEs
Step 2 is done repeatedly
Methods are re-scheduled after each new allocation
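The control flow of steps 1 and 2 can be sketched as a loop that keeps applying moves until none of them lowers the cost, re-scheduling every candidate allocation. Everything below is a toy stand-in: `toy_merge`, `toy_cost`, and `toy_reschedule` are hypothetical placeholders, not the routines named on the slide.

```python
def initial_allocation(objects):
    """Step 1: one PE per object, so every object starts unsplit."""
    return {f"PE_{obj}": list(methods) for obj, methods in objects.items()}


def minimize_pe_cost(alloc, moves, cost, reschedule):
    """Step 2: repeat the moves until no move yields a cheaper feasible design."""
    improved = True
    while improved:
        improved = False
        for move in moves:
            candidate = move(alloc)
            # Every new allocation is re-scheduled before it is accepted.
            if reschedule(candidate) and cost(candidate) < cost(alloc):
                alloc, improved = candidate, True
    return alloc


def toy_merge(alloc):
    """Toy move: fold the PE with the fewest methods into the next smallest."""
    if len(alloc) < 2:
        return dict(alloc)
    src, dst = sorted(alloc, key=lambda pe: len(alloc[pe]))[:2]
    merged = {pe: list(ms) for pe, ms in alloc.items() if pe != src}
    merged[dst] = merged[dst] + alloc[src]
    return merged


def toy_cost(alloc):
    return len(alloc)  # toy cost: number of PEs


def toy_reschedule(alloc):
    # Toy feasibility test: a PE can host at most three methods.
    return all(len(ms) <= 3 for ms in alloc.values())


objects = {"O1": ["m1", "m2"], "O2": ["m4"], "O3": ["m3"]}
result = minimize_pe_cost(initial_allocation(objects), [toy_merge],
                          toy_cost, toy_reschedule)
print(result)
```

Re-scheduling inside the innermost loop is what makes this step expensive when there are many methods, since each candidate allocation pays for a full scheduling pass.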
OO Co-synthesis
Details (cont’d)
Note:
This operation may cause “hidden communication”.
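A small sketch of how hidden communication could be detected. It assumes the term refers to methods that share object variables but end up on different PEs, which is an interpretation rather than something the slide spells out. The method and variable names are illustrative.

```python
from itertools import combinations

# Hypothetical methods and the object variables each one accesses.
METHOD_VARS = {
    "m1": {"v1", "v2"},
    "m2": {"v2", "v3"},
    "m3": {"v8", "v9"},
}


def hidden_communication(alloc, method_vars):
    """alloc: method -> PE. Returns shared variables per method pair split across PEs."""
    result = {}
    for a, b in combinations(sorted(alloc), 2):
        shared = method_vars[a] & method_vars[b]
        if shared and alloc[a] != alloc[b]:
            result[(a, b)] = shared
    return result


together = {"m1": "PE1", "m2": "PE1", "m3": "PE2"}
split = {"m1": "PE1", "m2": "PE2", "m3": "PE2"}
print(hidden_communication(together, METHOD_VARS))
print(hidden_communication(split, METHOD_VARS))
```

Under this reading, keeping m1 and m2 on one PE generates no extra traffic, while splitting them forces accesses to v2 to cross a communication channel.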
OO Co-Synthesis Algorithm
(cont’d)
Experimental Results
Algorithm implemented in C++
Using NIH class library
8600 lines of code
Executed on an SGI Indigo workstation
Algorithm applied to examples from software engineering books on OO design
This implementation had a simple, inefficient scheduler.
Example  #objects/methods  CPU Time
cfuge    2/3               0.05
dye      3/15              2.0
juice    3/4               0.05
train    5/6               0.05
Reason for highest CPU time: having the most methods
=> scheduling is required in each inner loop of step 2
OO Co-Synthesis Algorithm
(cont’d)
Main contribution
OO specification is an important aid to automatic
partitioning
The specification is naturally divided into two levels of
granularity
• The system is composed of objects
• Objects are composed of data members and methods
The heuristic:
Preserve the specification’s partitioning as much as
possible
What we learned today
Distributed System Co-Synthesis
A heuristic approach
Non-OO algorithm
Customization to OO specifications
Heuristic: First minimize the PE cost since it is the
dominant factor