OWN: OPTICAL AND WIRELESS NETWORK-ON-CHIP FOR KILO-CORE ARCHITECTURES Ashif Iqbal SikderϮ, Avinash KodiϮ, Matthew KennedyϮ, Savas KayaϮ, and Ahmed Louri ‡ School of Electrical Engineering and Computer Science, Ohio UniversityϮ Department of Electrical and Computer Engineering, George Washington University‡ E-mail: [email protected], [email protected], [email protected], [email protected], [email protected] Website: http://oucsace.cs.ohiou.edu/~avinashk/ 23rd Annual Symposium on High-Performance Interconnects (HOTI) August 26 – 28, 2015 Oracle Santa Clara Campus, CA, USA 2 Talk Outline • Motivation & Background • OWN: Architecture & Communication • Performance Analysis • Conclusions & Future Work 3 Multi-cores & Network-on-Chips TILE-Mx1001 MPPA-256 Kalray2 GF100 512-Core (Nvidia)3 • With increasing multiple number of cores, communication-centric design paradigm (Network-on-Chips) is facing challenges due to: • Higher power dissipation: long metallic wires • Area overhead: more router components • Increased Latency: Complex multi-hop routing => Potential solutions: Emerging technologies such as optics, wireless 1http://www.tilera.com/products/?ezchip=585&spage=686 2http://www.kalrayinc.com/kalray/products/ 3http://www.nvidia.com/object/IO_86775.html 4 Optical Network-on-Chip Optical NoC offers several advantages: • Low power (~7.9 fJ/bit ) • Low latency (~500ps) • High Bandwidth (~40 Gbps) • CMOS compatibility 1. Lin Xu; Wenjia Zhang; Qi Li; Chan, J.; Lira, H.L.R.; Lipson, M.; Bergman, K., "40-Gb/s DPSK Data Transmission Through a Silicon Microring Switch," Photonics Technology Letters, IEEE , vol.24, no.6, pp.473,475, March15, 2012 2. Sasikanth Manipatruni, Kyle Preston, Long Chen, and Michal Lipson, "Ultra-low voltage, ultra-small mode volume silicon microring modulator," Opt. Express 18, 18235-18242 (2010) 3. J. Cunningham, R. Ho, X. Zheng, J. Lexau, H. Thacker, J. Yao, Y. Luo, G. Li, I. Shubin, F. Liu et al., “Overview of short-reach optical interconnects: from vcsels to silicon nanophotonics.” 4. Xia, Fengnian, Lidija Sekaric, and Yurii Vlasov. "Ultracompact optical buffers on a silicon chip." Nature photonics 1.1 (2007): 65-71. Disadvantages of optical NoC: • Optical-only crossbar is not scalable for large core networks • Multi-hop networks with smaller crossbar have increased latency for large core networks 5 Wireless Network-on-Chip • Wireless offers several advantages: • CMOS compatibility • Omnidirectional communication without wires using multicasting and broadcasting • Bandwidth extension using Frequency Division Multiplexing (FDM), Time Division Multiplexing (TDM), Space Division Multiplexing (SDM) Multicasting Broadcasting • Disadvantages of Wireless : • High transceiver area and energy/bit • Low wireless bandwidth at 60 GHz center frequency for CMOS technology • Latency due to resource sharing RF-CMOS transceiver trend for WiNoC1 1. D. DiTomaso, A. Kodi, D. Matolak, S. Kaya, S. Laha, and W. Rayess, “Energy-efficient adaptive wireless nocs architecture,” in Networks on Chip (NoCS), 2013 Seventh IEEE/ACM International Symposium on. IEEE, 2013, pp. 1–8. 6 Optical & Wireless NoC: OWN • OWN combines the benefits of photonics and wireless to overcome the disadvantages of each technology • Smaller optical crossbar to provide one hop communication and reduce area and power overhead • Connect the optical domains via wireless to facilitate one hop communication between the domains Optical Waveguide Optical Domain Wireless Antenna 7 OWN Architecture & Communication (1/4) 4 Cores 64 Cores 256 Cores A Tile consists of 4 Cores => 16 Tiles form a Cluster => 4 Clusters create a Group => 4 Groups are on the Chip 1024 Cores 8 OWN Architecture & Communication (2/4) Inactive Modulators Active Modulators Demodulators Waveguide Arbitration Waveguide Cluster & Intra-cluster optical communication 9 OWN Architecture & Communication (3/4) Wireless Router Wait for token Rx Tx Optical Router Group & Intra-group wireless communication 10 OWN Architecture & Communication (4/4) G2 G3 Wait for token Wait for token G0 G1 Wait for token Chip & Inter-group wireless communication 11 OWN Deadlocks & Solution Group 2 P Wireless Router J G C P C Group 3 G N J L Circular Dependency H D N D H L J C C J L D D L P G P G N H N H Optical Link Intra-group Wireless Link Inter-group Wireless Link Cluster • • • • VC0 = Intra-cluster & Intra-group VC1 = Inter-group horizontal VC2 = Inter-group vertical VC3 = Inter-group diagonal Input Core Output Core Solution => VC allocation based on packet types which requires 4 VCs per port 12 Performance Analysis – Architectures: OWN, Cmesh (wired only), Wcube (hybrid wireless) and ATAC (hybrid optical) – Number of cores: 1024 – Synthetic Benchmarks: Uniform (UN), Bit-Reversal (BR), Complement (COMP), Matrix Transpose (MT), Perfect Shuffle (PS), and Neighbor (NBR) – Network Simulation: Optisim* – Area and Power Analysis • Dsent# to calculate wire link and router area and power at bulk 45nm LVT • Optical link area and power (waveguide, micro-ring resonators, laser power) • Wireless transceiver area is 0.62 mm2 and energy 1pJ/bit$ * A. Kodi and A. Louri, “A system simulation methodology of optical interconnects for high-performance computing systems,” J. Opt. Netw, vol. 6, no. 12, pp. 1282–1300, 2007 # C. Sun, C.-H. Chen, G. Kurian, L. Wei, J. Miller, A. Agarwal, L.-S. Peh, and V. Stojanovic, “Dsent-a tool connecting emerging photonics with electronics for opto-electronic networks-on-chip modeling,” in Networks on Chip (NoCS), 2012 Sixth IEEE/ACM International Symposium on. IEEE, 2012, pp. 201–210 $ D. DiTomaso, A. Kodi, D. Matolak, S. Kaya, S. Laha, and W. Rayess, “Energy-efficient adaptive wireless nocs architecture,” in Networks on Chip (NoCS), 2013 Seventh IEEE/ACM International Symposium on. IEEE, 2013, pp. 1–8. 13 Related Work Wcube[MobiCom’09] ATAC[PACT’10] 14 Area µm2 Router Area Photonic Link Area 100 90 80 70 60 50 40 30 20 10 0 WireLess Link Area Wired Link Area 35.5% 34.14% CMESH WCUBE OWN OWN requires about a 35.5% less area than ATAC ATAC 15 Energy per bit : Uniform Wireless Link Energy Router Energy Wire Link Energy Photonic Energy 0.4 Energy (pJ/bit) 0.35 40.2% 0.3 0.25 23.2% 0.2 0.15 0.1 0.05 0 CMESH ATAC OWN OWN consumes about a 40.2% less energy than WCube WCUBE 16 Energy per bit : Perfect Shuffle Wireless Link Energy Router Energy Photonic Energy Wire Link Energy 0.3 Energy (pJ/bit) 0.25 21.2% 0.2 3% 0.15 0.1 0.05 0 CMESH ATAC OWN OWN consumes about a 21.2% less energy than WCube WCUBE 17 Latency: Uniform Traffic CMESH WCUBE ATAC OWN 0.031 0.041 Network Load 0.051 Number of Cycles 600 500 400 300 200 67% 100 0 0.001 0.011 0.021 0.061 OWN lowered the latency by about 67% and 11% from Wcube and ATAC respectively 18 Latency: Bit-Reversal Traffic CMESH WCUBE ATAC OWN Number of Cycles 600 500 400 300 200 69% 100 0 0.001 57% 0.026 0.051 Network Load 0.076 OWN lowered the latency by about 69%, 57% and 11% from CMesh, Wcube and ATAC respectively 19 Saturation Throughput Saturation Throughput (flits/ cycle/core) CMESH WCUBE ATAC OWN 0.25 0.2 0.15 0.1 0.05 0 UN BR COMP MT PS NBR GM OWN outperforms WCube and Cmesh on average by about 8% and 28% respectively 20 Conclusions & Future Work • OWN requires 35.5% less area than ATAC but 34.14% higher area than WCube • OWN requires 30.36% less energy/bit than WCube but 13.99% higher energy/bit than ATAC • OWN has higher saturation throughput & lower latency compared to wired, wireless and optical networks • CMOS technology advancement will benefit OWN in both area and energy/bit • Dynamic wireless channel allocation can be a future work Thank You Questions? 22 Decomposed Crossbar Router Output 1 VC 1 Output 2 VC 2 Input 1 Output 3 VC 3 VC 4 Input Buffers MUX Output Unit Output 4 Switch Allocator Route Computation VC Allocator Output 1 VC 1 VC 2 Input 1 VC 3 VC 4 Input Buffers MUX From Cores VC 1 VC 2 Input 4 VC 3 VC 4 Input Buffers MUX Crossbar Switch Output Unit Output 15 To Cores 23 Optical Power Calculation • Laser Power (one wavelength) = Longest Link x 1dB/cm + 1 dB (for modulation) + 1dB (for demodulation) + 0.0001 x ring modulator adjacent + 0.2dB (splitter) + 1dB (photodetector loss) • Laser Efficiency = 15% • Receiver Sensitivity = -17dBm • Pin = 10 ^ (loss in dB / 10) x Pout • Ptotal = #WL (Pin / Laser Eff.) + (Pin / Laser Eff.) x Arbitration Link • Ring Heating Power = 26uW/ring • Ring Modulating = 500uW/ring 24 Optical Area Calculation • Ring Resonator diameter 12um • WG = Width (4um) x Length
© Copyright 2026 Paperzz