Swedish INTELECT Summer School on Multiprocessor Systems on Chip
Örebro, Aug. 25-27, 2003, 8.45 – 10.30 hrs
Reiner Hartenstein, Kaiserslautern University of Technology
Reconfigurable Computing and its Compilation Techniques

Reconfigurable Computing: a second programming domain
• Migration of programming to the structural domain
• The structural domain has become RAM-based
• The opportunity to introduce the structural domain to programmers ...
• ... to bridge the gap by clever abstraction mechanisms using a simple new machine paradigm
© 2003, [email protected] http://hartenstein.de http://www.uni-kl.de

>> outline <<
• why coarse grain reconfigurable ?
• terminology
• toward higher abstraction levels
• flowware languages
• why a new Machine Paradigm ?
• (co-) compilation techniques
• final remarks

granularity
• Datapath width 1 bit, CLB: fine grain
• Word level, CFB: coarse grain
• bundling of nibble or byte width CFBs: multiple granularity

One more argument for coarse grain, which we have already seen on the first day: MOPS / mW, T. Claasen et al.: ISSCC 1999 *) R.
Hartenstein: ISIS 1997
[Plot: MOPS / mW (ticks from 0.001 to 1000) versus feature size (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07 µ)]
Wiring by abutment: a 32 bit KressArray example
• if coarse grain cells are full custom and mesh-connected, and 2nd level interconnect resources are laid out over the cells, the array is almost as area-efficient as hardwired

mapping algorithms efficiently onto rDPA
• SNN filter on KressArray
• array size: 10 x 16 = 160 rDPUs
• Legend: rDPU not used; used for routing only (route-through only); operator and routing; backbus connect; port location marker
• by the way: example of scalability / relocatability by EDA support
• also FPGA scalability (avoid routing congestion) by EDA solution

Xplorer Plot: SNN Filter Example
http://kressarray.de
• 2 hor. NNports, 32 bit
• 3 vert. NNports, 32 bit
• route-thru-only rDPU
• Legend [13]: result; operand; operator; route thru; backbus connect

PACT XPP: Reference Module: XPU128
[Diagram labels: Co-Processor; XPP128 rDPA; ALU; Ctrl; CFG; rDPU; PAE core; buses not shown]
• Full 32 or 24 Bit Design, working silicon
• 2 Configuration Hierarchies
• Evaluation Board
• XDS Development Tool with Simulator
• all used by SIEMENS Corporation
• Other contractors preparing ....
: ask Ron Mabry (here in the audience)

Microprocessor architectures (8): highly parallel systems
[Diagram: configuration manager and I/O blocks around an array of PEs; each row of nine PEs has SRAM at both ends]
©Arndt Bode LRR-TUM, TU Dresden, 09.05.2003

XPP64A: Platform Development Board
• SDR Board In Debug Phase
• XPP64A Chips from STMicro Fab
• Assembly & Test / Available March 2003

PACT Corp
• Xtreme Processor Platform (XPP) family of IP cores: high-speed, data-stream-capable, scalable, reconfigurable clusters of arrays of 32-bit DPUs with embedded memories and high-speed I/O ports
• Application development support software featuring a flow-graph-style algorithm mapping language to minimize training requirements
• XPP's fabrics feature automatic DataFlow synchronization and a flagged Event Network to dynamically configure the execution flow
• Supports dynamic RTR: hierarchical configuration managers free the designer from chip-level details and ensure that configurations are independently loaded in exactly the intended order.
• Automatic event-based task swapping along with data streams: released resources are automatically reconfigured immediately

Development of microprocessor architectures (1)
• Until 1995: reduction; since 1995: increase in the variety of types and architectures
• Transistor count (Moore's law): trade-off between computing performance, power consumption, cost, and compatibility
• MPR Analysts' Choice Awards categories:
- PC Processors: Intel P4 (HyperThreading), AMD Athlon (x86-64, HyperTransport), Transmeta (Binary Compilation, VLIW), ...
- Server Processors: Intel Xeon MP and Itanium 2 (EPIC), AMD Opteron (x86-64), HP Alpha EV-7, Fujitsu Sparc 64 V (out-of-order superscalar)
- High-Performance Embedded Processors: Broadcom BCM 1250, IBM 440 GX, Intrinsity FastMIPS, Motorola MPC 7455, NEC VR7701, PMC Sierra RM9000x2
- Low-Power Embedded Processors: AMD Au1100, Intel PXA 250, NEC VR 4131, DragonBall MX1, NeoMagic MiMagic5 (1 mW per MHz)
- Extreme Processors: CMU PipeRench, Intrinsity FastMath, Micron Yukon, NEC DRP, PACT XPP, Sandbridge Sand Blaster (up to 512 ALUs)
- Embedded IP Processor Cores: ARCtangent-A5, ARM 1026 EJ-S/1136JF-S, Improv Crescendo, MIPS M4K, Tensilica Xtensa V
- Graphics Processors: 3Dlabs Wildcat VP900, ATI Radeon 9700, Nvidia GeForce FX
©Arndt Bode LRR-TUM

wide variety of speed-up factors
key issue: algorithmic cleverness
(platform: application; speed-up factor)
• PACT Xtreme 4-by-4 array [2003]: 16 tap FIR filter; x16 MOPS/mW; straightforward
• MoM anti machine with DPLA* [1983]: grid-based DRC** (1-metal 1-poly nMOS, 256 reference patterns); > x1000 (computation time)
*) MPC fabrication via E.I.S.
multi university project
**) Design Rule Check
method: multiple aspects

instruction stream-based Compilation Principles
[Diagram labels: source text; parser; library; link/load; instruction call placement; scheduler; 1-D memory space; execution order by location]

Datastream-based Compilation Principles
[Diagram labels: library; mapper; placement & routing; scheduler; data stream assembly]

Sequential Processor Model (© 2003, PACT AG)
• Conventional processors use the sequential model: each operation takes one clock cycle.
• Multiple operations are computed consecutively.
[Figure: Register; Operation 1 to Operation 5 along the Time axis]

A New Parallel Processor Paradigm (© 2003, PACT AG)
• Multiple computations are configured as code sections onto a two dimensional array.
[Figure: Data Buffer; axes x, y and Time]

Parallel Processor Model (© 2003, PACT AG)
• Multiple code sections are computed sequentially.
[Figure: Section 1, Section 2, Section 3 (with Operation 2) over axes x, y and Time]

Dataflow Performance (© 2003, PACT AG)
• Traditional Microprocessor: Instruction Memory and cache; ALU; Register; ADD, MULT, SHIFT; one word; one operation per cycle; basic machine operations performed on single words
• XPP Architecture: Configuration Memory and cache; Array of ALUs; Buffer; Filter, FFT, Viterbi; stream of words; many operations per cycle; complex functions performed on data streams

Dataflow Synchronisation: Transport Triggered
[Figure not recoverable from the extracted text]

XPP: Parallel Algorithm Example (PACT)
Matrix Multiplication (Matrix is Constant):
$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} ax+by \\ cx+dy \end{pmatrix}$
[Flow graph: four multiplications (by a, b, c, d) feeding two additions that produce x' and y']

XPP: Parallel Algorithm Example (PACT)
[Flow graph mapped onto the array; labels: I/O; MUL; SCM + CM]
• SCM configures Opcodes and Constant Registers via CM
• Note: MAC Opcode is not used in this example to improve clarity of the presentation

XPP: Parallel Algorithm Example (PACT)
[Flow graph mapped onto the array; labels: in_x, in_y; mul1 to mul4 (MUL); adder1, adder2 (ADD); out_x, out_y; SCM + CM]
• CM Configures Opcodes and Constant Registers

XPP: Parallel Algorithm Example (PACT)
[Flow graph mapped onto the array; labels: in_x, in_y; mul1 to mul4 (MUL); adder1, adder2 (ADD); out_x, out_y; SCM + CM]
• CM Configures Routing Resources

XPP: Parallel Algorithm Example (PACT)
[Flow graph mapped onto the array; labels: I/O; in_x, in_y; mul1 to mul4 (MUL); adder1, adder2 (ADD); out_x, out_y]
• Data Packets are routed through the Network

>> terminology <<
• why coarse grain reconfigurable ?
• terminology
• toward higher abstraction levels
• flowware languages + mapping
• why a new Machine Paradigm ?
• (co-) compilation techniques
• final remarks

Tredennick's Paradigm Shifts
[Timeline: 1957, 1967, 1977, 1987, 1997, 2007; labels: standard TTL; custom hardwired; LSI, MSI; µproc., memory; ASICs, accel's]
• algorithm: fixed, resources: fixed
• algorithm: variable, resources: fixed: procedural programming (vN machine paradigm)
• algorithm: variable, resources: variable: structural programming, 2 sources (new machine paradigm needed)

Paradigm Shifts: Nick Tredennick's view
why 2 program sources ?
• instruction-stream-based computing: algorithms variable, resources fixed
• reconfigurable computing: algorithms variable, resources variable
[Figure label: programmable]

programming media and platforms
Co-Compilation
• software: program Memory, instruction stream, µProcessor
• flowware: data address generators, asM (auto-sequencing data Memory), data streams
• configware: configuration Memory, bit stream
• hardware: µProcessor, interface, Reconfigurable Accelerators (morphware)

Placement & routing (configware) done: flowware defines ... which data item ....
... at which time, at which port
[Figure: input data streams and output data streams of the DPA, drawn as data items (x) over time and port #]

Terminology: Digital System Platforms clearly distinguished
(platform: source running on it; machine paradigm)
• hardware: (not running on it); machine paradigm: none
• morphware, fine grain, rGA (FPGA): configware
• morphware, coarse grain, rDPU, rDPA (reconfigurable data stream processor): flowware & configware; machine paradigm: anti machine
• data stream processor (hardwired): flowware; machine paradigm: anti machine
• instruction stream processor: software; machine paradigm: von Neumann machine

>> higher abstraction levels <<
• why coarse grain reconfigurable ?
• terminology
• toward higher abstraction levels
• flowware languages + mapping
• why a new Machine Paradigm ?
• (co-) compilation techniques
• final remarks

"EDA industry shifts into CS mentality" [Wojciech Maly]
• patches instead of engineering
• innovation stalled many years ago
• netlist-based: do not care about efficiency, ...
• ... do not care about transistor density
• 85% of users hate their tools

Development of Hypergrowth Markets
Harper Business 1995
[Figure labels: Mainstream; Tornado; Paradigm Shift]

McKinsey Curve: dynamics of R&D disciplines
Curve labels (vertical axis: maturity of a discipline): new discipline on top of it by ....; saturation: limitations met ...
by innovation; consolidation; fundamental issues (horizontal axis: year)

EDA Industry Revolutions
EDA industry paradigm switching every 7 years, courtesy [Keutzer / Newton]
coming closer to programmers' mind set
• 1978: Transistor entry: Applicon, Calma, CV ...
• 1985: Schematics entry: Daisy, Mentor, Valid ...
• 1992: Synthesis: Cadence, Synopsys ...
• 1999, 2006: HLLs, (Co-) Compilation; Data-Stream-based DPU arrays

SoC System level Design: Embedded SW (ESW) (ECW)
• ESW / ECW becomes the main vehicle of product differentiation
• ESE becomes the main focus in system design: CW-HW-(E)SW codesign onto highly programmable platforms (SoC)
• new design automation from high level descriptions, CW and SW synthesis included
• (SoC) CW-HW-(E)SW co-verification
• formal verification for (E)SW and CW

Complexity: System Level Design Challenge [ITRS 2001]
"abstraction levels must be raised above present-day RT-level from HW + (processor-dependent embedded) C code level; language infrastructures for complex models (SystemC etc.) must be leveraged by industry consensus on use-methodology and abstraction levels"

>> flowware languages <<
• why coarse grain reconfigurable ?
• terminology
• toward higher abstraction levels
• flowware languages + mapping
• why a new Machine Paradigm ?
• (co-) compilation techniques
• final remarks

mathematical methods for systolic array synthesis
• good reading: Nikolay Petkov: Systolic Parallel Processing; North-Holland; 1992
• flow: math formula; preprocessing; mapping by linear projection or algebraic method; DPA architecture
• only uniform DPA with linear pipes: only for applications with strictly regular data dependencies
[Figure: input data streams and output data streams of the DPA (array of DPUs), drawn as data items (x) over time and port #]

Compilation for (r)DPA of anti machine
[Diagram labels: high level source program (software notation); wrapper; expression tree; morphware parameters; DPU library; mapper (simulated annealing); scheduler; code generators; configware; flowware; streamware]

Super Pipe Networks
• systolic array: applications with regular data dependencies only; pipeline shape: linear only; resources: uniform only; mapping: linear projection or algebraic synthesis
• supersystolic rDPA*: no restrictions; mapping: simulated annealing or P&R algorithm
• scheduling (data stream formation): (e.g. force-directed) scheduling algorithm
*) KressArray [1995]

Programming Language Paradigms
(entries given as Computer Languages / Languages f. Anti Machine)
• both deterministic: procedural sequencing: traceable, checkpointable
• rows compared: operation sequence driven by; state register; address computation; instruction fetch; parallel memory bank access
• operation sequence driven by: read next instruction / read next data item; goto (instr. addr.) / goto (data addr.); jump (to instr. addr.) / jump (to data addr.); instr.
loop, loop nesting / data loop, loop nesting; no parallel loops, escapes / parallel loops, escapes; instruction stream branching / data stream branching (Computer Languages / Languages f. Anti Machine)
• state register: program counter / data counter(s)
• address computation: massive memory cycle overhead / overhead avoided
• instruction fetch: memory cycle overhead / overhead avoided
• parallel memory bank access: interleaving only / no restrictions

Basics of Binding Time
time of "Instruction Fetch":
• run time: microprocessor, parallel computer (v.N. machine)
• loading time: Reconfigurable Computing (anti machine)
• compile time

Similar Programming Language Paradigms
(entries given as Computer Languages / Xputer Languages)
• both deterministic: procedural sequencing: traceable, checkpointable
• sequencing driven by: read next instruction / read next data object; goto (instruction addr.) / goto (data addr.); jump (to instruction addr.) / jump (to data addr.); instruction loop / data loop; instruction loop nesting / data loop nesting; no parallel loops / parallel data loops; instruction loop escapes / data loop escapes; instruction stream branching / data stream branching

Flowware language example (MoPL): JPEG zigzag scan pattern

*> Declarations
EastScan is
  step by [1,0]
end EastScan;

SouthScan is
  step by [0,1]
end SouthScan;

SouthWestScan is
  loop 8 times until [1,*]
    step by [-1,1]
  endloop
end SouthWestScan;

NorthEastScan is
  loop 8 times until [*,1]
    step by [1,-1]
  endloop
end NorthEastScan;

HalfZigZag is
  EastScan
  loop 3 times
    SouthWestScan
    SouthScan
    NorthEastScan
    EastScan
  endloop
end HalfZigZag;

[Figure annotations on the scan pattern: goto PixMap[1,1]; HalfZigZag; SouthWestScan; reverse(uturn(HalfZigZag)); axes x, y; data counter]

>> new Machine Paradigm <<
• why coarse grain
reconfigurable ?
• terminology
• toward higher abstraction levels
• flowware languages + mapping
• why a new Machine Paradigm ?
• (co-) compilation techniques
• final remarks

CS: young ? dynamic?
• ... but the von Neumann Paradigm is still the dominant doctrine ...
• ... after >10 technology generations ...
• ... still pushing the basic models from the times of mainframe dinosaurs
• Microelectronics is ignored (except falling cost of computational effort)
• technology generations (1st to 11th ....): 4004, 8008, 8086, 80286, 80386, 80486, P5 (Pentium), P6 (Pentium Pro / Pentium II), Pentium III ....
• ... the vN Microprocessor is a Methuselah, the steam engine of the silicon age.

MPU designs more complex
• new kinds of concurrency are becoming important
• chip-level multiprocessing + simultaneous multithreading
• many bugs relate to concurrency issues
• this greatly complicates the verification process

[intel] "Pollack's Law" (simplified)
[Plot: curves "area efficiency" and "performance"; vertical axis: growth factor; horizontal axis: µm, down to 0.1]

KressArray principles
• take systolic array principles
• replace classical synthesis by simulated annealing
• yields the super systolic array
• a generalization of the systolic array
• no longer restricted to regular data dependencies
• now reconfigurability makes sense

control-procedural vs. data-procedural
• The structural domain is primarily data-stream-based: Flowware .....
... mostly not yet modelled that way: most flowware is hidden by its indirect instruction-stream-based implementation
• Flowware provides a (data-)procedural abstraction from the (data-stream-based) structural domain
• Flowware converts "procedural vs. structural" into "control-procedural vs. data-procedural" ...
• ... a Trojan horse to introduce the structural domain to the procedural mind set of programmers

Why a dichotomy of machine paradigms?
• vN: unbalanced, vN bottleneck; bad news: caches do not help
• data stream machine: good news: no vN bottleneck; caches not needed
• The anti machine has no von Neumann bottleneck
(figure stolen from Bob Colwell)

computing paradigms and methodologies
• 1946: machine paradigm (von Neumann)
• 1989: anti machine paradigm
• 1990: rDPU (Rabaey)
• 1994: anti machine high level programming language
• 1995: super systolic rDPA
• flowware*
• 1980: data streams (Kung, Leiserson)
• 1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...
• 1997+: discipline of distributed memory architecture
• 1997: configware / software partitioning compiler

Flowware heading toward mainstream
• Data-stream-based Computing is heading for mainstream
– 1997 SCCC (LANL): Streams-C Configurable Computing
– SCORE (UCB): Stream Computations Organized for Reconfigurable Execution
– ASPRC (UCB): Adapting Software Pipelining for Reconfigurable Computing
– 2000 Bee (UCB), ...
– Most stream-based multimedia systems, etc.
– Many other areas ....
• Flowware: managing data streams
• Software: managing instruction streams

Matter & Antimatter: Atom and Anti Atom
• The World of Matter: Machine paradigm: the Atom
• Anti Matter: Machine paradigm: Anti Atom

Matter & Antimatter of Informatics
• Machine paradigm: CPU
• Anti Machine paradigm: DPU; nothing central !

machine paradigm: some differences
• no. of streams ≥ 1
[Figure labels: CPU; DPU; DPA]

Parallelism by Concurrency
• independent instruction streams

Dead Supercomputer Society [Gordon Bell, keynote at ISCA 2000]
• ACRI • Alliant • American Supercomputer • Ametek • Applied Dynamics • Astronautics • BBN • CDC • Convex • Cray Computer • Cray Research • Culler-Harris • Culler Scientific • Cydrome • Dana/Ardent/Stellar/Stardent
• DAPP • Denelcor • Elexsi • ETA Systems • Evans and Sutherland Computer • Floating Point Systems • Galaxy YH-1 • Goodyear Aerospace MPP • Gould NPL • Guiltech • ICL • Intel Scientific Computers • International Parallel Machines • Kendall Square Research • Key Computer Laboratories
• MasPar • Meiko • Multiflow • Myrias • Numerix • Prisma • Tera • Thinking Machines • Saxpy • Scientific Computer Systems (SCS) • Soviet Supercomputers • Supertek • Supercomputer Systems • Suprenum • Vitesse Electronics

Lacking Sense of Direction ?
„we are o.k.
!“ (no new direction)
blinders: for ignoring the impact of RC

Some Supercomputing people now looking at us
Steroids for the aging microprocessor: Reconfigurable Computing

Machine paradigms
• von Neumann instruction stream machine: memory (M) and I/O; instruction stream; CPU = instruction sequencer + DPU; programmed by Software
• (reconf.) data-stream machine: memory banks (M) and I/O; data address generator (data sequencer); asM**; data stream; DPU or rDPU, or (r)DPA; programmed by Flowware (and Configware); distributed memory architecture*
*) the new discipline came just in time: see Herz et al.: Proc. IEEE ICECS 2002

heavy anti atoms: DPA = DPU array
[Figure: DPAs drawn as arrays of DPUs]

Distributed Memory
• SA: scrambling and descrambling the data ?
• Just in time: a new research area: application-specific distributed memory, e.g. book by F. Catthoor et al. ...
• Data address generators: 20 years of research

>> compilation techniques <<
• why coarse grain reconfigurable ?
• terminology
• toward higher abstraction levels
• flowware languages + mapping
• why a new Machine Paradigm ?
• (co-) compilation techniques
• final remarks

We introduce: Co-Compilation
• high level programming language source, fed to the partitioning compiler
• Software running on Computer (Machine Paradigm): µProcessor
• Configware running on Xputer ("Soft" Machine Paradigm): Reconfigurable Accelerators
• interface between µProcessor and Reconfigurable Accelerators
• Reconfigurable Architecture (RA) instead of hardwired

The Secret of Success: Co-Compilation
supporting platform-based design
[Diagram labels: High level PL source; Partitioner; Analyzer / Profiler; SW compiler, SW code ("vN" machine paradigm); CW compiler, CW Code (anti machine paradigm); Resource Parameters: supporting different platforms; could provide the platforms]

Loop Transformation Examples
resource parameter driven Co-Compilation
[Figure: loop transformation steps. Recoverable fragments: sequential processes: "loop 1-16 body endloop"; strip mining with fork / join into "loop 1-8 body endloop" and "loop 9-16 body endloop"; loop unrolling ("body body"); host: "loop 1-8 trigger endloop", "loop 1-4 trigger endloop", "loop 1-2 trigger endloop"; reconf. array: executes the loop bodies]

Machine Paradigms
(machine category: Computer (the Machine: "v. Neumann") / The Anti Machine)
• driven by: instruction streams / data streams (no "dataflow")
• engine principles: instruction sequencing / sequencing data streams
• state register: single program counter / (multiple) data counter(s)
• communication path set-up ("instruction fetch"): at run time / at load time
• data path resource: DPU (e.g. single ALU) / DPU or DPA (DPU array) etc.
• data path operation: sequential / parallel pipe network etc.
*) e.g. Bee project, Prof.
Brodersen
The Anti Machine: also hardwired implementations*

KressArray Family generic Fabrics: a few examples
• Select Function Repertory of the rDPU
• Select Nearest Neighbour (NN) Interconnect: select mode, number, and width of NNports (figure labels: 2, 4, 8, 16, 24, 32); an example
• route-through only; route-through and function
• more NNports: rich routing resources
• Examples of 2nd Level Interconnect: laid out over rDPU cell; no separate routing areas !
http://kressarray.de

KressArray DPSS
[Diagram labels: User; User Interface; ALE-X Code; ALE-X Compiler; interm. form; Architecture Estimator; Architecture Editor; Mapping Editor; Selection; Data Path Synthesis System (DPSS); Bus & I/O Mapper; Scheduler; Schedule; HDL Generator (VHDL, Verilog); Simulator; Datapath Generator; Design Rules; Kress rDPU Layout; Power Estimator; Power Data]

KressArray Xplorer
[Diagram labels: Application Set (Design Space); KressArray (Platform Space); User; Xplorer User Interface; ALE-X Code; ALE-X Compiler; interm. form; Architecture Estimator; Architecture Editor; Mapping Editor; Selection; Suggestion; DPSS: Bus & I/O Mapper, Scheduler, Schedule; Analyzer; statist. Data; Delay Estim.; Improvement Proposal Generator; Inference Engine (FOX)]

Ulrich Nageldinger's Ph. D. thesis
• http://hartenstein.de, click "recent talks"
• this page: also link to Ph. D. thesis download

>> final remarks <<
• why coarse grain reconfigurable ?
• terminology
• toward higher abstraction levels
• flowware languages + mapping
• why a new Machine Paradigm ?
• (co-) compilation techniques
• final remarks

Where are we heading ?
• by 2010, 10 times more programmers will write embedded applications than computer software*
[Chart labels: factor 2; 90% by 2010; horizontal axis in months: 10, 12, 18]
*) Department of Trade and Industry, London

PS: Personal Supercomputer replaces the PC
[Timeline: 1957, 1967, 1977, 1987, 1997, 2007; labels: mainframes; PC; PS: personal supercomputer; co-compiler; µProc; rDPA; data streams; morphware]

What's the problem ?
Crossing the Hardware / Software Chasm [Mike Butts] (µprocessor / accelerators)
• It's the gap between the procedural and the structural mind set
• Traditional CS: programming is (control-)procedural, instruction-stream-based; sources: software
• The typical programmer has trouble understanding function evaluation without machine mechanisms ....
• .... by signals rippling through a network of transistors.

What's the problem ?
Crossing the Hardware / Software Chasm [Mike Butts] (µprocessor / accelerators)
• Brain usage: procedural-only; structural hemisphere missing
• The brain hurts on paradigm shift ? no, it can't ...
• solution only with user-friendly SW / CW / FW co-compilers based on the anti machine paradigm, used as a Trojan Horse into CS

Annihilation?
- avoidable by tools ....
>>> thank you <<<
thank you for your patience

>>> END <<<

Conclusion: all knowledge needed is available
• machine paradigm
• languages
• hw / sw partitioning methodology
• compilation techniques
• anti machine architectural resources
• sequencing methodology: hw & sw
• parallel memory IP core and module generator vendors
• anything else needed

The Situation in Computing Sciences
• Computing Sciences are in a severe crisis
• New fundamentals and R&D directions are inevitable
• my mission: getting you involved
• All knowledge needed is readily available ...
• ... even from Computing Sciences
• Silicon application and EDA provide useful concepts
• Reconfigurable Computing has the remedy

Configware / Flowware Compilation
Diagram labels: high level source program; mapper; configware; data streams; memory banks (M); r.
Data Path Array (rDPA); wrapper; intermediate; scheduler; flowware; asM; address generator; data sequencer

"von Neumann" Computer: the wrong Machine Paradigm
Xputer Lab, University of Kaiserslautern (© 2001, [email protected])
• Computer: Compiler; instructions; RAM; Sequencer (state register: program counter); Datapath: hardwired
• tightly coupled by compact instruction code
• loosely coupled by decision data bits only
• "von Neumann" does not support soft data paths
Xputer: The Soft Machine Paradigm
• Xputer (anti machine): Compiler; Scheduler; "instructions"; RAM; (multiple) sequencer (state register: data counters); Datapath Array: reconfigurable, also for hardwired

Why Coarse Grain instead of FPGA ?
Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld
[Plot: Transistors / chip (1000 to 100 000 000 000) over the years 1980 to 2010; curves: physical; logical; FPGA physical; FPGA routed; FPGA logical; gap annotations: ~ 10, ~ 10 000]
• reconfigurability overhead reduced by up to ~ 1000
• drastically smaller configuration memory
• much faster loading
• a lot more benefits
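
Sketch: a data counter generating the JPEG zigzag scan
The MoPL example in the flowware languages part describes an address sequence that is produced by a data counter instead of a program counter. The following Python sketch illustrates that idea only; it is not MoPL and does not claim MoPL's exact semantics. All function names (`scan`, `half_zigzag`, `zigzag`) are hypothetical. Two readings are assumptions: `loop 8 times until [1,*]` is taken as "step at most 8 times and stop once x = 1" (likewise `[*,1]` for y = 1), and `reverse(uturn(HalfZigZag))` is taken as replaying the recorded HalfZigZag steps in reverse order, which fits because the zigzag pattern is point-symmetric.

```python
from typing import Callable, List, Optional, Tuple

Addr = Tuple[int, int]  # (x, y), 1-based as in PixMap[1,1]


def scan(pos: Addr, step: Addr, max_steps: int = 1,
         until: Optional[Callable[[Addr], bool]] = None) -> List[Addr]:
    """Step the data counter up to max_steps times; stop once until(pos) holds."""
    out: List[Addr] = []
    for _ in range(max_steps):
        pos = (pos[0] + step[0], pos[1] + step[1])
        out.append(pos)
        if until is not None and until(pos):
            break
    return out


def east(pos: Addr) -> List[Addr]:          # EastScan: step by [1,0]
    return scan(pos, (1, 0))


def south(pos: Addr) -> List[Addr]:         # SouthScan: step by [0,1]
    return scan(pos, (0, 1))


def south_west(pos: Addr) -> List[Addr]:    # loop 8 times until [1,*] step by [-1,1]
    return scan(pos, (-1, 1), 8, lambda p: p[0] == 1)


def north_east(pos: Addr) -> List[Addr]:    # loop 8 times until [*,1] step by [1,-1]
    return scan(pos, (1, -1), 8, lambda p: p[1] == 1)


def half_zigzag(pos: Addr) -> List[Addr]:
    """EastScan, then 3 times: SouthWestScan, SouthScan, NorthEastScan, EastScan."""
    out = east(pos)
    for _ in range(3):
        for sub in (south_west, south, north_east, east):
            out += sub(out[-1])
    return out


def zigzag() -> List[Addr]:
    """Full address stream for an 8 x 8 block, starting at PixMap[1,1]."""
    start: Addr = (1, 1)
    first = half_zigzag(start)
    middle = south_west(first[-1])          # the main anti-diagonal
    # Assumed reading of reverse(uturn(HalfZigZag)): replay the steps backwards.
    path = [start] + first
    steps = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:])]
    pos = middle[-1]
    second: List[Addr] = []
    for dx, dy in reversed(steps):
        pos = (pos[0] + dx, pos[1] + dy)
        second.append(pos)
    return [start] + first + middle + second


if __name__ == "__main__":
    for address in zigzag():
        print(address)
```

The data path never sees an instruction here: the address stream alone decides which data item arrives at which time, which is the role the slides assign to flowware.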