Router Designs for Elastic Buffer On-Chip Networks
George Michelogiannakis
William J. Dally
Stanford University
Introduction
 EB flow-control was recently proposed.
• Uses the channels as distributed FIFOs.
 EB routers are bufferless packet-switched
routers.
• They have the benefits of circuit-switched routers,
without the overhead of setting up and tearing down
circuits.
 This work explores the EB router design space.
• By evaluating three representative designs.
SC09: Routers for EB NoCs
The EB Flow-control Idea
Pipelined channel
Channel as FIFO
Elastic buffer
Master-slave FF
How Elastic Buffer Channels Work
 Ready/valid handshake between elastic buffers
• Ready: At least one free storage slot
• Valid: Non-empty (driving valid data)
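The handshake above can be sketched as a small cycle-by-cycle model. This is an illustrative sketch, not the authors' implementation: the names `ElasticBuffer` and `step` are hypothetical, the two-slot capacity is an assumption, and all ready/valid signals are sampled at the start of the cycle before any flit moves.

```python
from collections import deque


class ElasticBuffer:
    """FIFO with a ready/valid interface (two-slot capacity is an assumption)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.slots = deque()

    @property
    def ready(self):
        # Ready: at least one free storage slot.
        return len(self.slots) < self.capacity

    @property
    def valid(self):
        # Valid: non-empty (driving valid data).
        return len(self.slots) > 0


def step(channel, source, sink_ready):
    """Advance the channel by one cycle; return the flit delivered to the sink, or None."""
    # Sample every handshake signal first, so that all transfers in this
    # cycle are decided from the state at the start of the cycle.
    valid = [eb.valid for eb in channel]
    ready = [eb.ready for eb in channel]

    delivered = None
    if valid[-1] and sink_ready:
        delivered = channel[-1].slots.popleft()

    # A flit moves from EB i to EB i+1 only when EB i is valid and EB i+1 is ready.
    for i in range(len(channel) - 2, -1, -1):
        if valid[i] and ready[i + 1]:
            channel[i + 1].slots.append(channel[i].slots.popleft())

    # The source injects a flit when the first EB is ready.
    if source and ready[0]:
        channel[0].slots.append(source.popleft())
    return delivered
```

When the sink stalls, flits pile up in the channel itself until every EB is full, which is how the channel acts as a distributed FIFO.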
[Animation: flits advancing through the elastic buffer channel, cycles 1 to 6]
Use EB Flow-Control Through the Router
 Converting a VC input-buffered router into an EB router:
• Input buffer removed; an input EB is used instead.
• VC & SW allocators replaced by per-output arbiters.
• Three-slot output EB to cover for arbitration done one cycle in advance.
• Look-ahead (LA) routing is also applicable to EB networks.
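As a rough illustration of a per-output arbiter, the sketch below grants one input at a time and keeps the output locked to that input until its tail flit leaves. The round-robin policy, the class name `OutputArbiter`, and the `heads` encoding are assumptions made for illustration; the slides do not specify the arbitration policy, and downstream ready signals are ignored here.

```python
class OutputArbiter:
    """Hypothetical round-robin arbiter for a single output port."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.owner = None            # input whose packet currently holds the output
        self.last = num_inputs - 1   # most recently granted input

    def arbitrate(self, heads):
        """Return the granted input index, or None.

        heads[i] is None when input i has no flit for this output,
        True when its head-of-line flit is a tail flit, False otherwise.
        """
        if self.owner is None:
            # Pick the next requesting input after the last one granted.
            for offset in range(1, self.num_inputs + 1):
                candidate = (self.last + offset) % self.num_inputs
                if heads[candidate] is not None:
                    self.owner = self.last = candidate
                    break

        granted = self.owner
        if granted is None or heads[granted] is None:
            return None  # nothing to send; the output stays locked if it is owned
        if heads[granted]:
            self.owner = None  # tail flit departs, so the output is released
        return granted
```

Holding the grant for a whole packet reflects the absence of virtual channels: without them, flits of different packets cannot be interleaved on one output.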
Baseline Router - Issues
 Issues constraining the clock cycle time:
• Three-slot EB FSM too complicated: output EB
implemented as FIFO.
• Routing is performed serially with switch arbitration.
Enhanced Two-Stage Router
 Look-ahead routing to shorten the critical path.
 Use two-slot EBs at output and for pipelining.
• Flits are stored in the intermediate EB and wait for a grant.
• Decision to traverse switch made in the same cycle.
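Look-ahead routing can be sketched as follows: each router already knows its own output port for a packet, and computes the port the next router will need, so that routing is off the critical path at the next hop. The sketch assumes dimension-order (XY) routing on a 2D mesh, which the slides do not state; the function names and port labels are hypothetical.

```python
EAST, WEST, NORTH, SOUTH, EJECT = "E", "W", "N", "S", "EJECT"

# Coordinate change when leaving a router through each port.
STEP = {EAST: (1, 0), WEST: (-1, 0), NORTH: (0, 1), SOUTH: (0, -1)}


def xy_route(cur, dst):
    """Dimension-order routing: correct X first, then Y (assumed algorithm)."""
    (x, y), (dx, dy) = cur, dst
    if dx > x:
        return EAST
    if dx < x:
        return WEST
    if dy > y:
        return NORTH
    if dy < y:
        return SOUTH
    return EJECT


def lookahead_route(cur, port_here, dst):
    """Given the port already chosen at this router, compute the next router's port."""
    if port_here == EJECT:
        return None  # the packet leaves the network here
    sx, sy = STEP[port_here]
    next_router = (cur[0] + sx, cur[1] + sy)
    return xy_route(next_router, dst)
```

For example, a packet at (0, 0) heading to (2, 1) leaves through the east port, and `lookahead_route((0, 0), EAST, (2, 1))` yields the port that router (1, 0) will use.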
Enhanced Two-Stage Router – Sync Module
 Synchronization module maintains alignment
between flits and grants.
 Contains an output port EB.
• Stores the chosen output ports of the current packet and of any
other packets held in router stage 1 and the intermediate EB.
Enhanced Two-Stage Router – Sync Module
 When the current packet’s tail flit is departing:
• Sync. module propagates the next output to the arbiters.
• From the appropriate location.
 Sync. module propagates an update to all outputs.
• An output receiving an update from the input it is
granting clocks the arbiter output regs at the next edge.
Single-Stage Router
 Merges the two router stages to:
• Reduce router latency.
• Avoid pipelining overhead.
Evaluation Methodology
 45nm worst-case low-power commercial library.
 Synopsys DC and Cadence Encounter.
• 64-bit router datapath. 70% initial area utilization ratio.
 Used a cycle-accurate network simulator.
 Each router is assumed to run either at its own maximum
post-P&R frequency, or all routers at the same frequency.
 8x8 2D mesh. 2mm-long wires. 1 cycle latency.
• Constant packet size of 512 bits.
 Averaged over a set of six traffic patterns.
 Swept datapath width from 28 to 171 bits.
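To relate the constant packet size to the datapath sweep, the number of flits per packet can be estimated as below. This sketch assumes the flit width equals the datapath width and that there is no per-flit header overhead; neither is stated on the slide, and `flits_per_packet` is a hypothetical helper.

```python
import math

PACKET_BITS = 512  # constant packet size from the methodology


def flits_per_packet(datapath_bits):
    """Flits needed for one packet, assuming flit width == datapath width."""
    return math.ceil(PACKET_BITS / datapath_bits)


# Default datapath and the two ends of the sweep.
for width in (64, 28, 171):
    print(width, flits_per_packet(width))
```

Under these assumptions a packet is 8 flits at the 64-bit default, 19 flits at 28 bits, and 3 flits at 171 bits.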
Placement and Routing Cycle Time
 Enhanced two-stage has a 26% reduced cycle time compared to
the single-stage, and 42% compared to the baseline two-stage.
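The two percentages also imply how the baseline compares with the single-stage router. The arithmetic below is a sketch that reads "X% reduced compared to R" as (1 - X) times R's cycle time, and normalizes the single-stage cycle time to 1.0; the variable names are illustrative.

```python
single_stage = 1.0                        # normalized cycle time
enhanced = single_stage * (1 - 0.26)      # 26% shorter than single-stage: 0.74
baseline = enhanced / (1 - 0.42)          # enhanced is 42% shorter than baseline

# Baseline cycle time relative to single-stage (about 1.28).
baseline_vs_single = baseline / single_stage

# Clock frequency of the enhanced router relative to single-stage (about 1.35).
enhanced_freq_vs_single = single_stage / enhanced
```

Under this reading, the baseline two-stage router has a cycle time roughly 28% longer than the single-stage router.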
Placement and Routing Energy per Bit
 Baseline two-stage requires 9% less energy per bit compared to
the single-stage, and 35% compared to the enhanced two-stage.
Placement and Routing Area
 Single-stage occupies 30% less area than the enhanced
two-stage and 44% less than the baseline two-stage.
Latency-Throughput, Max Frequencies.
Latency increase:
• Enhanced: +1%
• Baseline: +46%
Latency-Throughput, Equal Frequencies.
Latency increase:
• Enhanced: +34%
• Baseline: +32%
Which Router is the Optimal Choice?
Priority: Router choice
 Operate at maximum frequencies:
• Area: Enhanced two-stage
• Energy: Baseline two-stage (closely followed by single-stage)
• Latency: Single-stage (depends on effect on channels)
 Operate at the same frequency:
• Area: Single-stage
• Energy: Baseline two-stage (closely followed by single-stage)
• Latency: Single-stage
Conclusion
 Improved EB router designs can widen the gap
compared to VC networks.
• Makes EB look even more attractive.
 EB routers are simple designs. Simple designs
have numerous advantages.
• A lot of the complexity of VC networks is ignored by
some area and power models.
 Overall, compared to VC: a 43% reduction in power
per unit throughput, a 67% reduction in cycle time,
and a 22% gain in throughput per unit area.
Questions?