Exploring Concentration and Channel Slicing in On-chip Network Router
Prabhat Kumar¹, Yan Pan¹, John Kim², Gokhan Memik¹, Alok Choudhary¹
¹Northwestern University, ²KAIST, South Korea

Contributions of the Work
• Performance implication of concentration
  • Integrated vs. external concentration: 47% reduction in area, 36% reduction in energy, 10% performance degradation
• Channel Slicing
  • Virtual concentration for efficient resource utilization: 69% reduction in area, 32% reduction in energy

Outline
• Motivation
• Concentration
• Channel slicing
• Virtual concentration
• Results
• Conclusion

Motivation
• Limited budget: efficiency, performance, cost optimization (area, energy)
• Design options: Concentration, Channel Slicing
• Previous work
  • Concentration: CMESH, Flattened Butterfly, Firefly, Multidrop Express Channels (on-chip)
  • Channel Slicing: Dragonfly (off-chip)

Motivation
• Firefly [Pan ISCA’09]
• Simplified Router Microarchitecture

Typical Topology
• 2D Mesh
• # Processor Nodes = # Routers

Solution: Concentration
• Multiple cores share one router
• Benefits
  • Resource sharing
  • Network diameter decreases
  • Local communication cost decreases
• Drawbacks
  • Router complexity increases significantly
• Figure parameters: C = 4, Radix = 8, Width = 2x

Issue: Router Complexity
• Router components
  • Crossbar switch ~ (radix)²
  • Arbitration logic
• How can we reduce the complexity of the crossbar switch?
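To put a rough number on the "crossbar switch ~ (radix)²" relation, here is a minimal sketch. It assumes that a 2D mesh router has four network ports (north, south, east, west) plus one port per attached core, which matches the radix of 8 quoted for C = 4. The function names are hypothetical, and the quadratic model is only the proportionality stated on the slide, not a calibrated area model.

```python
def concentrated_radix(network_ports: int, concentration: int) -> int:
    """Router radix with integrated concentration.

    Assumption: every attached core gets its own router port, so the
    radix is the number of network ports plus the concentration factor C.
    """
    if network_ports < 0 or concentration < 1:
        raise ValueError("need network_ports >= 0 and concentration >= 1")
    return network_ports + concentration


def relative_crossbar_cost(radix: int, baseline_radix: int) -> float:
    """Crossbar cost relative to a baseline, assuming cost ~ radix**2."""
    if radix < 1 or baseline_radix < 1:
        raise ValueError("radix values must be positive")
    return (radix / baseline_radix) ** 2


# 2D mesh without concentration (C = 1) vs. integrated concentration (C = 4).
mesh_radix = concentrated_radix(network_ports=4, concentration=1)
cmesh_radix = concentrated_radix(network_ports=4, concentration=4)

ratio = relative_crossbar_cost(cmesh_radix, mesh_radix)
print(f"radix {mesh_radix} -> {cmesh_radix}, crossbar cost x{ratio:.2f}")
```

Under these assumptions the crossbar of each concentrated router is roughly 2.5 times as costly as that of a plain mesh router, before accounting for the wider channels or the smaller number of routers.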
Figures: 5x5 crossbar (2D MESH); 8x8 crossbar (2D MESH, C = 4, Integrated Concentration)

Design Option: External Concentration
• Multiplex injection ports
• De-multiplex ejection ports
• Benefits
  • Router radix decreases
  • Area decreases
• Cons
  • Reduced switching capacity

Issue: Arbitration
• External concentration: two levels of arbitration
• Parallel arbitration: use router switch information for concentration arbitration

Issue: Wide Channels
• Constant bandwidth density => wider channels
• Inefficient utilization
  • Cache lines ~ 512-1024 bits wide
  • Request, control, coherency packets much narrower
• Router area
  • Switch area ~ (channel width)²
• Figure parameters: C = 4, Radix = 8, Width = 2x

Design Option: Channel Slicing
• Slice wide channels
• Pros
  • Complexity reduces further
  • Better channel utilization
• Cons
  • Serialization latency increases (for long packets)
• Figure parameters: C = 4, Slicing Factor = 4

Combining Concentration and Slicing
• Slicing + Concentration
• Virtual Concentration
  • Nodes dedicated to a sliced layer
  • No sharing of input bandwidth

Evaluation Setup
• Simulation environment
  • Booksim simulator
• Constant on-chip resources
  • Equal bisection bandwidth for all configurations
  • Equal amount of buffer storage

Terminology
Code Name    Slicing    Architecture Name
M1D1         No         External Concentration
M4D4         No         Integrated Concentration
S1R4M4D4     Yes        Integrated Concentration
S4R1M1D1     Yes        Virtual Concentration
S4R4M4D4     Yes        Fully connected Slice

Configurations
Code Name     # Slices    Inj Node    Channel Width    Buffer Depth
MESH (C=1)    1           1           0.5b             0.6x
S1R4M4D4      1           4           1b               0.75x
S4R1M1D1      4           1           0.25b            1.2x
S4R4M4D4      4           4           0.25b            0.75x

Router Latency
M1D1    M1D4    M4D1    M4D4
2       3       3       3

External Concentration
• Zero-load latency: 21% reduction for UR, 25% reduction for Bitcomp
• Throughput: 10% reduction for UR, no change for Bitcomp
• Area: 47% reduction compared to Integrated
• Energy: 36% reduction compared to Integrated
[Figures: average packet latency (# cycles) vs. injection rate for Uniform Random and Bitcomp traffic; curves M1D1 and M4D4]

Virtual Concentration
• Zero-load latency: no change compared to MESH; 16% increase for UR and 12% increase for Bitcomp compared to Integrated
• Throughput: no significant difference for UR, 4.5% increase for Bitcomp
• Area: 69% reduction compared to MESH
• Energy: 32% reduction compared to MESH
[Figures: average packet latency (# cycles) vs. injection rate for Uniform Random and Bitcomp traffic; curves S1R4M4D4, S4R1M1D1, S4R4M4D4, and MESH]

Area and Energy Consumption
• Area
  • 69% reduction compared to MESH
  • 88% reduction compared to Integrated concentration
• Energy
  • 32% reduction compared to MESH
  • 35% reduction compared to Integrated concentration
[Figure: normalized area]

Conclusion
• Combination of concentration and channel slicing provides efficient NoC design.
• External concentration reduces complexity with some performance degradation.
• Virtual Concentration saves 69% area and 32% energy compared to 2D MESH.

Thank you for your patience!!
Questions? [email protected]
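Backup: channel slicing trade-off
The trade-off named on the channel slicing slide (less switch complexity, more serialization latency for long packets) can be sketched with simple arithmetic. This is a rough illustration under stated assumptions, not the evaluation model behind the results: it takes switch area ~ (channel width)² from the wide-channels slide, assumes each slice keeps the same radix, and assumes one channel-width flit is transferred per cycle. The 256-bit unsliced width and the 64-bit control packet are hypothetical; the 512-bit cache line and the slicing factor of 4 come from the slides.

```python
import math


def serialization_cycles(packet_bits: int, channel_width_bits: int) -> int:
    """Cycles needed to send one packet, assuming one flit of
    channel_width_bits crosses the channel per cycle."""
    if packet_bits < 1 or channel_width_bits < 1:
        raise ValueError("sizes must be positive")
    return math.ceil(packet_bits / channel_width_bits)


def relative_switch_area(slicing_factor: int) -> float:
    """Total switch area of all slices relative to one unsliced switch.

    Assumption: area ~ width**2 per slice with unchanged radix, so each
    of the s slices costs (1/s)**2 and the total is s * (1/s)**2 = 1/s.
    """
    if slicing_factor < 1:
        raise ValueError("slicing_factor must be >= 1")
    per_slice = (1 / slicing_factor) ** 2
    return slicing_factor * per_slice


FULL_WIDTH = 256      # hypothetical unsliced channel width in bits
SLICING_FACTOR = 4    # slicing factor shown on the slide
slice_width = FULL_WIDTH // SLICING_FACTOR

for name, bits in [("cache line", 512), ("control packet", 64)]:
    wide = serialization_cycles(bits, FULL_WIDTH)
    sliced = serialization_cycles(bits, slice_width)
    print(f"{name}: {wide} cycle(s) unsliced, {sliced} cycle(s) on one slice")

print(f"relative switch area: {relative_switch_area(SLICING_FACTOR):.2f}")
```

With these numbers the short packet costs one cycle either way, while the cache line takes four times as many cycles on a single slice, which is the serialization penalty for long packets noted in the talk.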