Synthesis of synchronous elastic architectures
Jordi Cortadella (Universitat Politècnica de Catalunya)
Mike Kishinevsky (Intel Corp.)
Bill Grundmann (Intel Corp.)

Network of computing units
[Figure: blocks B1, B2 and B3 connected between an input In and an output Out]

Latency-insensitive (elastic) system
[Figure: the same network of blocks B1, B2 and B3 between In and Out]
Every block makes a step only when all of its inputs are valid.

Why?
• Scalable
• Modular (plug & play)
• Tolerance to variable latency
  – Communication
  – Computation
• Not asynchronous
  – Use existing design paradigms
  – Use existing CAD tools

Outline
• The cost of elasticity
• SELF: an elastic protocol
  – Basic implementation (linear pipelines)
  – General netlists (forks and joins)
  – Formal models and verification
• Synthesis of elastic architectures
• Related work

What’s the cost of elasticity?
[Figure: an elastic block: the data core (Data in, Data out) runs on a gated clock; a control unit derives the gated clock from CLK and exchanges Valid and Stop with the neighboring blocks]

Communication channel
[Figure: sender connected to receiver through a Data channel]
Long wires: slow transmission.

Pipelined communication
[Figure: the Data channel between sender and receiver broken into pipeline stages]
How about if the sender does not always send valid data?

The Valid bit
[Figure: the pipelined channel with a Valid bit travelling alongside Data]
How about if the receiver is not always ready?
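The Valid bit can be illustrated with a small cycle-level model. This Python sketch assumes a fixed number of pipeline registers and a receiver that is always ready (there is no Stop signal yet); `run_valid_pipeline` is a hypothetical name. Cycles in which the sender has nothing to send inject a bubble (Valid = 0), which travels through the registers like data and is ignored by the receiver.

```python
from typing import List, Optional, Tuple

# A pipeline register holds (valid, data); valid=False marks a bubble.
Stage = Tuple[bool, Optional[str]]


def run_valid_pipeline(inputs: List[Optional[str]], depth: int) -> List[str]:
    """Shift items through `depth` pipeline registers (hypothetical helper).

    None in `inputs` means the sender has nothing to send in that cycle
    (Valid = 0). The receiver is assumed to be always ready, so every
    register loads its predecessor's value on each clock edge.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    stages: List[Stage] = [(False, None)] * depth
    received: List[str] = []
    # Feed the inputs, then `depth` empty cycles to drain the pipeline.
    for item in inputs + [None] * depth:
        out_valid, out_data = stages[-1]
        if out_valid:  # the receiver only samples Data when Valid = 1
            assert out_data is not None
            received.append(out_data)
        # Clock edge: the sender's (valid, data) pair enters the first register.
        stages = [(item is not None, item)] + stages[:-1]
    return received


print(run_valid_pipeline(["a", None, "b", None, None, "c"], depth=3))
# ['a', 'b', 'c']
```

With three registers, each item reaches the receiver three cycles after it is sent, in order, and the bubbles never appear in the received list.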
The Stop bit
[Figure: the pipelined channel extended with a Stop bit per stage that travels from the receiver back to the sender]
• Back-pressure: when the receiver asserts Stop, the Stop bits propagate back towards the sender.
• Stopping all stages in the same cycle would require a long combinational path on Stop.

Carloni’s relay stations (double storage)
[Figure: sender and receiver (pearls wrapped in shells) connected through relay stations, each with a main and an auxiliary (aux) register and V (valid) and S (stop) signals]
• Handshakes with short wires
• Double storage required

Proposal: an elastic protocol
• SELF (Synchronous ELastic Flow)
• Simple and provably correct
• Data-path with no overhead in:
  – Area
  – Latency
  – Energy
• Negligible control overhead
• Fine-grain elasticity

Flip-flops vs. latches
[Figure: sender and receiver connected through flip-flops (FF), one cycle per stage; each flip-flop is then redrawn as its two latches, H and L]
• Flip-flops already have a double storage capability, but …
• … holding two different data items in the two latches of the same flip-flop is not allowed in conventional FF-based design!
• Let’s make the master/slave latches independent: each latch becomes a ½-cycle stage.
• Only half of the latches (H or L) can move tokens.

Elastic buffer keeps data while Stop is in flight
[Figure: the states W1R1, W2R1, W1R2 and W2R2 of a two-entry buffer]
• This cannot be done with single-edge flip-flops without double pumping.
• Use the latches inside the master-slave flip-flop instead.
• Carloni’s relay station belongs to this class.

SELF (linear communication)
[Figure: sender and receiver connected through a chain of data latches; each latch has an enable (En) driven by a controller with its own V and S bits, and Valid/Stop run between neighboring controllers. The animation shows valid tokens moving from sender to receiver while Stop is asserted and released.]
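The sketch below illustrates why double storage helps with back-pressure. Each elastic buffer is modelled as a two-entry queue whose Stop output depends only on its own occupancy (it is raised when both entries are full), never on the Stop it receives, so back-pressure moves upstream one stage per cycle instead of through a long combinational path; the second entry absorbs the item that arrives while the buffer's own Stop is still low. It is a transaction-level sketch under those assumptions, not the latch-level SELF controller, and the names `ElasticBuffer` and `simulate` are hypothetical.

```python
from collections import deque
from typing import Deque, List


class ElasticBuffer:
    """Two-entry elastic buffer modelled at the transaction level (sketch).

    Assumption: Stop is raised only when both entries are full, so it depends
    on the buffer's own state and never on the downstream Stop.
    """

    CAPACITY = 2  # double storage

    def __init__(self) -> None:
        self.slots: Deque[str] = deque()

    def valid_out(self) -> bool:
        return len(self.slots) > 0

    def stop_out(self) -> bool:
        return len(self.slots) == self.CAPACITY


def simulate(tokens: List[str], n_buffers: int, receiver_stop: List[bool]) -> List[str]:
    """Send `tokens` through a chain of elastic buffers.

    receiver_stop[t] is the receiver's Stop in cycle t (False after the list
    ends). Returns the items in the order the receiver accepts them.
    """
    chain = [ElasticBuffer() for _ in range(n_buffers)]
    pending: Deque[str] = deque(tokens)
    received: List[str] = []
    cycle = 0
    while pending or any(b.valid_out() for b in chain):
        rx_stop = receiver_stop[cycle] if cycle < len(receiver_stop) else False
        # Link k goes from stage k to stage k+1 (stage 0 = sender, stage
        # n_buffers+1 = receiver); Valid/Stop are sampled at the start of the cycle.
        valids = [bool(pending)] + [b.valid_out() for b in chain]
        stops = [b.stop_out() for b in chain] + [rx_stop]
        moves = [valids[k] and not stops[k] for k in range(n_buffers + 1)]
        # Perform all transfers of the cycle, starting from the receiver side.
        for k in range(n_buffers, -1, -1):
            if not moves[k]:
                continue
            item = pending.popleft() if k == 0 else chain[k - 1].slots.popleft()
            if k == n_buffers:
                received.append(item)
            else:
                chain[k].slots.append(item)
        # The state-based Stop never lets a buffer overflow.
        assert all(len(b.slots) <= ElasticBuffer.CAPACITY for b in chain)
        cycle += 1
    return received


stall = [False, False, True, True, True]  # receiver stalls in cycles 2-4
print(simulate(["a", "b", "c", "d"], n_buffers=2, receiver_stop=stall))
# ['a', 'b', 'c', 'd']
```

In this run the buffers absorb the three-cycle stall, the buffer nearest the receiver raises Stop one cycle before its upstream neighbor does, and all four items still arrive in order.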
The protocol
[Figure: sender and receiver connected by a channel with Data and Valid (sender to receiver) and Stop (receiver to sender)]
• Idle cycle: Valid = 0
• Transfer cycle: Valid = 1, Stop = 0
• Retry cycle: Valid = 1, Stop = 1
• Persistency: $G\,[\,(V \wedge S \wedge Data = D) \Rightarrow \mathrm{Next}\,(V \wedge Data = D)\,]$, i.e., after a retry cycle the sender must keep Valid asserted and hold the same data in the next cycle.

Example trace (time runs left to right, so A is sent first; * is don't-care data):
Data:   A  *  B  C  C  C  *  D  D  *
Valid:  1  0  1  1  1  1  0  1  1  0
Stop:   0  0  0  1  1  0  0  1  0  0
Cycle:  T  I  T  R  R  T  I  R  T  I    (T = transfer, I = idle, R = retry)

Elastic half buffer (EHB)
[Figure: a data latch with enable En_i, controlled by an EHB controller with V_{i-1}, S_{i-1} on its input side and V_i, S_i on its output side]

Join
[Figure: two channels (V1, S1) and (V2, S2), each coming from an EHB, combined into a single channel (V, S) that feeds an EHB]

Lazy fork
[Figure: one channel (V, S) split into two channels (V1, S1) and (V2, S2)]

Eager fork
[Figure: one channel (V, S) split into two channels (V1, S1) and (V2, S2)]

Elastic combinational paths
[Figure: elastic buffers (EB) connected through combinational elastic paths built from wires, forks, joins and join/fork combinations]
• The controllers generate the enable signal of the data latches.
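The three kinds of channel cycle and the persistency property are easy to check mechanically on a recorded trace. The Python sketch below encodes the example trace from "The protocol" slide in time order (A is sent first, "*" is don't-care data); the helper names `classify`, `is_persistent` and `transferred` are hypothetical, and on a finite trace persistency can only be checked for cycles that have a successor.

```python
from typing import List, Tuple

# One cycle of a channel as seen on its wires: (Data, Valid, Stop).
# "*" marks don't-care data in cycles where Valid = 0.
Cycle = Tuple[str, int, int]


def classify(cycle: Cycle) -> str:
    """'I' for an idle cycle, 'T' for a transfer, 'R' for a retry."""
    _, valid, stop = cycle
    if not valid:
        return "I"
    return "R" if stop else "T"


def is_persistent(trace: List[Cycle]) -> bool:
    """Check (V and S and Data = D) => Next(V and Data = D) on a finite trace.

    The last cycle has no successor, so a retry there cannot be checked.
    """
    for (data, valid, stop), (next_data, next_valid, _) in zip(trace, trace[1:]):
        if valid and stop and not (next_valid and next_data == data):
            return False
    return True


def transferred(trace: List[Cycle]) -> List[str]:
    """Items accepted by the receiver: only transfer cycles move data."""
    return [data for data, valid, stop in trace if valid and not stop]


# Trace from "The protocol" slide, in time order (A is sent first).
trace: List[Cycle] = [
    ("A", 1, 0), ("*", 0, 0), ("B", 1, 0), ("C", 1, 1), ("C", 1, 1),
    ("C", 1, 0), ("*", 0, 0), ("D", 1, 1), ("D", 1, 0), ("*", 0, 0),
]
print("".join(classify(c) for c in trace))  # TITRRTIRTI
print(is_persistent(trace))                 # True
print(transferred(trace))                   # ['A', 'B', 'C', 'D']
```

Only transfer cycles move data, so the receiver sees A, B, C and D exactly once each, even though C is presented for three consecutive cycles.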
Elastic buffer: formal model
[Figure: an elastic buffer as an unbounded array Buffer[0..∞] holding items i, i+1, …, i+k, with input channel (Din, Vin, Sin), output channel (Dout, Vout, Sout), a read pointer rd and a write pointer wr]
• Initial state: $rd = wr = 0$
• Invariant: $rd \le wr$
• Liveness properties (finite but unbounded latencies):
  – Finite forward latency: $G\,(rd < wr \Rightarrow F\,V_{out})$
  – Finite backward latency: $G\,(\neg S_{out} \Rightarrow F\,\neg S_{in})$

Formal verification
[Figure: the abstract elastic-buffer specification (Buffer with rd and wr) next to an implementation with the same interface (Din, Vin, Sin, Dout, Vout, Sout)]
• The abstract FSM model is appropriate for compositional verification.
• Implementations are verified with model checking (1-bit abstractions of the datapath):
  – LTL specs + NuSMV
  – The buffer is a refinement of the spec
  – In-order data transmission
  – Correct synchronization of fork/join structures
  – Absence of deadlocks

Observational equivalence
Synchronous  D:  a b c d e f g h i j k …
Elastic      D:  a a b b b c d e e f g g h i i i j k …
             En: 1 0 1 0 0 1 1 1 0 1 1 0 1 1 0 0 1 1 …
Keeping only the elastic cycles with En = 1 yields a b c d e f g h i j k, the synchronous trace.

Elasticization
[Figure: a synchronous pipeline (PC, IF/ID, ID/EX, EX/MEM, MEM/WB, clocked by CLK) and its elastic version, in which an elastic control layer of forks and joins exchanging V/S signals generates the gated clocks of the pipeline registers]

Variable-latency units
[Figure: a unit taking 0 to k cycles, with go and done signals and V/S handshakes on its input and output channels]
• Telescopic units:
  – 1 cycle for fast operations
  – 2 cycles for slow operations
• Examples:
  – Short / long additions (carry propagation)
  – A × 0, A / 1
  – Dynamic changes in latency (fast if cold, slow if hot)

Microarchitectural exploration
• Bubble insertion + variable-latency units
  – May improve performance: more bubbles, but a shorter cycle time
  – Reduce power: units designed for the most frequent input data
• Exploration at fine granularity

Some related work
• Asynchronous design
  – Micropipelines (Sutherland)
  – Rings (Williams, Sparso)
  – CHP and slack-elasticity (Martin, Burns, Manohar et al.)
• Latency-insensitive design
  – Carloni and a few follow-ups (large overhead)
  – Wire pipelining: Svensson, Nookala, Casu, …
• Interlock pipelines (H. Jacobson et al.)
• De-synchronization
  – J. Cortadella et al.
  – V. Varshavsky
• Synchronous implementations of CSP
  – J. O’Leary et al.
  – A. Peeters et al.

Summary
• SELF: a specific protocol and implementation for elastic systems with very small buffering overhead
• Compositional theory proving correctness (Krstic et al., FMCAD’06)
• A library of controllers has been designed and their correctness verified
• Elasticization CAD in progress
• New micro-architectural opportunities based on bubbles and variable-latency units
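The observational-equivalence example from the formal-model part of the talk can be replayed directly: projecting the elastic trace onto the cycles where En = 1 should give back the synchronous trace. A minimal Python sketch; `observe` is a hypothetical helper, and the traces are copied from the slide without the trailing "…".

```python
from typing import List, Sequence


def observe(elastic_data: Sequence[str], enable: Sequence[int]) -> List[str]:
    """Keep only the cycles in which En = 1 (hypothetical helper).

    The elastic trace is observationally equivalent to the synchronous one
    if this projection reproduces the synchronous sequence.
    """
    if len(elastic_data) != len(enable):
        raise ValueError("traces must have the same length")
    return [d for d, en in zip(elastic_data, enable) if en]


synchronous = list("abcdefghijk")
elastic = list("aabbbcdeefgghiiijk")
en = [1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1]

print("".join(observe(elastic, en)))    # abcdefghijk
print(observe(elastic, en) == synchronous)  # True
```

Repeated values in the elastic trace (for example the three consecutive b's) correspond to cycles in which the data is held; only the enabled cycle counts as a step of the synchronous behavior.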