Router Designs for Elastic Buffer On-Chip Networks
George Michelogiannakis, William J. Dally
Stanford University
SC09: Routers for EB NoCs

Introduction
EB flow control was recently proposed.
• It uses the channels as distributed FIFOs.
EB routers are bufferless packet-switched routers.
• They have the benefits of circuit-switched routers without the overhead of setting up and tearing down circuits.
This work explores the EB router design space.
• By evaluating three representative designs.

The EB Flow-Control Idea
[Figure: pipelined channel; channel as FIFO; elastic buffer; master-slave FF]

How Elastic Buffer Channels Work
Ready/valid handshake between elastic buffers:
• Ready: at least one free storage slot.
• Valid: non-empty (driving valid data).
[Figure: cycle-by-cycle example of the handshake, cycles 1 to 6]

Using EB Flow Control Through the Router
[Figure: a VC input-buffered router converted into an EB router]
• Input buffer removed; an input EB is used instead.
• VC and switch allocators replaced by per-output arbiters.
• Three-slot output EB to cover for arbitration done one cycle in advance.
• Look-ahead (LA) routing is also applicable to EB networks.

Baseline Router – Issues
Issues constraining the clock cycle time:
• The three-slot EB FSM is too complicated, so the output EB is implemented as a FIFO.
• Routing is performed serially with switch arbitration.
[Figure: baseline router, marking the serial routing/arbitration path and the output FIFO]

Enhanced Two-Stage Router
Look-ahead routing shortens the critical path.
Two-slot EBs are used at the output and for pipelining.
• Flits are stored in the intermediate EB and wait for a grant.
• The decision to traverse the switch is made in the same cycle.

Enhanced Two-Stage Router – Sync Module
The synchronization module maintains alignment between flits and grants.
It contains an output port EB.
• It stores the chosen output port of the current packet and of any other packets in router stage 1 and in the intermediate EB.

Enhanced Two-Stage Router – Sync Module
When the current packet's tail flit is departing:
• The sync module propagates the next output port to the arbiters.
  • From the appropriate location.
The sync module propagates an update to all outputs.
• An output that receives an update from the input it is currently granting clocks its arbiter output registers at the next clock edge.

Single-Stage Router
Merges the two router stages to:
• Reduce router latency.
• Avoid pipelining overhead.

Evaluation Methodology
45 nm worst-case low-power commercial library.
Synopsys DC and Cadence Encounter.
• 64-bit router datapath. 70% initial area utilization ratio.
Cycle-accurate network simulator.
• Each router is assumed to run at its maximum post-P&R frequency, or all routers at the same frequency.
• 8x8 2D mesh. 2 mm-long wires with 1-cycle latency.
• Constant packet size of 512 bits.
• Results averaged over a set of six traffic patterns.
• Datapath width swept from 28 to 171 bits.

Placement and Routing – Cycle Time
The enhanced two-stage router has a 26% shorter cycle time than the single-stage router and a 42% shorter cycle time than the baseline two-stage router.

Placement and Routing – Energy per Bit
The baseline two-stage router requires 9% less energy per bit than the single-stage router and 35% less than the enhanced two-stage router.

Placement and Routing – Area
The single-stage router occupies 30% less area than the enhanced two-stage router and 44% less than the baseline two-stage router.

Latency-Throughput, Maximum Frequencies
Latency increase:
• Enhanced two-stage: +1%
• Baseline two-stage: +46%

Latency-Throughput, Equal Frequencies
Latency increase:
• Enhanced two-stage: +34%
• Baseline two-stage: +32%

Which Router Is the Optimal Choice?
Priority      Router choice
Operate at maximum frequencies:
  Area        Enhanced two-stage
  Energy      Baseline two-stage (closely followed by single-stage)
  Latency     Single-stage (depends on effect on channels)
Operate at the same frequency:
  Area        Single-stage
  Energy      Baseline two-stage (closely followed by single-stage)
  Latency     Single-stage

Conclusion
Improved EB router designs can widen the gap over VC networks.
• This makes EB flow control look even more attractive.
EB routers are simple designs, and simple designs have numerous advantages.
• Some area and power models ignore much of the complexity of VC networks.
Overall, compared to VC networks: a 43% reduction in power per unit throughput, a 67% reduction in cycle time, and 22% more throughput per unit area.
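As a rough illustration of the ready/valid handshake described under "How Elastic Buffer Channels Work" (ready: at least one free storage slot; valid: non-empty), the following is a minimal cycle-level sketch in Python of a channel built from chained elastic buffers. It is not the authors' simulator or RTL. The two slots per EB, the names ElasticBuffer and simulate, and the sink_ready backpressure hook are all hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ElasticBuffer:
    """Hypothetical behavioral model of one elastic buffer (EB) stage.

    Assumption: two storage slots per EB (as a master-slave flip-flop provides).
    """
    capacity: int = 2
    slots: deque = field(default_factory=deque)

    @property
    def ready(self) -> bool:
        # Ready: at least one free storage slot.
        return len(self.slots) < self.capacity

    @property
    def valid(self) -> bool:
        # Valid: non-empty, i.e. driving valid data on the output.
        return len(self.slots) > 0


def simulate(flits, num_stages, cycles, sink_ready=lambda cycle: True):
    """Hypothetical helper: push flits through num_stages EBs.

    Returns a list of (cycle, flit) pairs in delivery order.
    """
    stages = [ElasticBuffer() for _ in range(num_stages)]
    pending = deque(flits)  # flits waiting at the source
    delivered = []
    for cycle in range(cycles):
        # Sample all handshake signals before any state changes (synchronous logic).
        readies = [eb.ready for eb in stages]
        valids = [eb.valid for eb in stages]
        # A flit crosses a link only when the sender is valid and the receiver is ready.
        inject = bool(pending) and readies[0]
        moves = [valids[i] and readies[i + 1] for i in range(num_stages - 1)]
        eject = valids[-1] and sink_ready(cycle)
        # Apply all transfers at the clock edge, downstream first, so each stage
        # releases its oldest flit before accepting a new one.
        if eject:
            delivered.append((cycle, stages[-1].slots.popleft()))
        for i in reversed(range(num_stages - 1)):
            if moves[i]:
                stages[i + 1].slots.append(stages[i].slots.popleft())
        if inject:
            stages[0].slots.append(pending.popleft())
    return delivered
```

With an always-ready sink, simulate(list(range(5)), num_stages=3, cycles=20) delivers one flit per cycle from cycle 3 to cycle 7. If the sink stalls until cycle 10, the five flits wait inside the EBs instead of being dropped and then drain in order on cycles 10 to 14, which is the "channel as distributed FIFO" behavior the slides describe.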