OWN: OPTICAL AND WIRELESS
NETWORK-ON-CHIP FOR
KILO-CORE ARCHITECTURES
Ashif Iqbal Sikder†, Avinash Kodi†, Matthew Kennedy†, Savas Kaya†, and Ahmed Louri‡
School of Electrical Engineering and Computer Science, Ohio University†
Department of Electrical and Computer Engineering, George Washington University‡
Website: http://oucsace.cs.ohiou.edu/~avinashk/
23rd Annual Symposium on High-Performance Interconnects (HOTI)
August 26 – 28, 2015
Oracle Santa Clara Campus, CA, USA
2
Talk Outline
• Motivation & Background
•  OWN: Architecture & Communication
•  Performance Analysis
•  Conclusions & Future Work
3
Multi-cores & Network-on-Chips
TILE-Mx100 [1]
MPPA-256 (Kalray) [2]
GF100 512-Core (Nvidia) [3]
•  As core counts grow, the communication-centric design paradigm (Network-on-Chip) faces challenges:
•  Higher power dissipation: long metallic wires
•  Area overhead: more router components
•  Increased latency: complex multi-hop routing
=> Potential solutions: emerging technologies such as optics and wireless
1. http://www.tilera.com/products/?ezchip=585&spage=686
2. http://www.kalrayinc.com/kalray/products/
3. http://www.nvidia.com/object/IO_86775.html
4
Optical Network-on-Chip
Optical NoC offers several advantages:
•  Low power (~7.9 fJ/bit)
•  Low latency (~500 ps)
•  High bandwidth (~40 Gbps)
•  CMOS compatibility
1. L. Xu, W. Zhang, Q. Li, J. Chan, H. L. R. Lira, M. Lipson, and K. Bergman, "40-Gb/s DPSK data transmission through a silicon microring switch," IEEE Photonics Technology Letters, vol. 24, no. 6, pp. 473-475, Mar. 2012.
2. S. Manipatruni, K. Preston, L. Chen, and M. Lipson, "Ultra-low voltage, ultra-small mode volume silicon microring modulator," Opt. Express, vol. 18, pp. 18235-18242, 2010.
3. J. Cunningham, R. Ho, X. Zheng, J. Lexau, H. Thacker, J. Yao, Y. Luo, G. Li, I. Shubin, F. Liu et al., "Overview of short-reach optical interconnects: from VCSELs to silicon nanophotonics."
4. F. Xia, L. Sekaric, and Y. Vlasov, "Ultracompact optical buffers on a silicon chip," Nature Photonics, vol. 1, no. 1, pp. 65-71, 2007.
Disadvantages of optical NoC:
•  An optical-only crossbar does not scale to networks with a large number of cores
•  Multi-hop networks built from smaller crossbars have increased latency at large core counts
5
Wireless Network-on-Chip
•  Wireless offers several advantages:
•  CMOS compatibility
•  Omnidirectional communication without wires, enabling multicasting and broadcasting
•  Bandwidth extension using Frequency Division Multiplexing (FDM), Time Division Multiplexing (TDM), and Space Division Multiplexing (SDM)
[Figure: multicasting and broadcasting]
•  Disadvantages of wireless:
•  High transceiver area and energy/bit
•  Low wireless bandwidth at 60 GHz center frequency for CMOS technology
•  Latency due to resource sharing
[Figure: RF-CMOS transceiver trend for WiNoC (ref. 1)]
1. D. DiTomaso, A. Kodi, D. Matolak, S. Kaya, S. Laha, and W. Rayess, "Energy-efficient adaptive wireless NoCs architecture," in Seventh IEEE/ACM International Symposium on Networks on Chip (NoCS), 2013, pp. 1-8.
6
Optical & Wireless NoC: OWN
•  OWN combines the benefits of photonics and wireless to overcome the disadvantages of each technology
•  Smaller optical crossbars provide one-hop communication within an optical domain and reduce area and power overhead
•  Wireless links connect the optical domains to provide one-hop communication between domains
[Figure: OWN overview; labels: optical waveguide, optical domain, wireless antenna]
7
OWN Architecture & Communication (1/4)
Tile = 4 cores => Cluster = 16 tiles (64 cores) => Group = 4 clusters (256 cores) => Chip = 4 groups (1024 cores)
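The tile/cluster/group/chip hierarchy on this slide can be sketched in a few lines of Python. The per-level counts come from the slide; the linear core numbering (cores numbered consecutively within a tile, tiles within a cluster, and so on) and the name `locate` are assumptions made only for illustration.

```python
# Per-level counts taken from the slide.
CORES_PER_TILE = 4
TILES_PER_CLUSTER = 16
CLUSTERS_PER_GROUP = 4
GROUPS_PER_CHIP = 4

CORES_PER_CLUSTER = CORES_PER_TILE * TILES_PER_CLUSTER    # 64
CORES_PER_GROUP = CORES_PER_CLUSTER * CLUSTERS_PER_GROUP  # 256
CORES_PER_CHIP = CORES_PER_GROUP * GROUPS_PER_CHIP        # 1024


def locate(core_id):
    """Map a core ID to (group, cluster, tile, core-in-tile).

    Assumes a hypothetical linear numbering; the slides do not
    specify how cores are actually numbered.
    """
    if not 0 <= core_id < CORES_PER_CHIP:
        raise ValueError("core_id out of range")
    tile_global, core = divmod(core_id, CORES_PER_TILE)
    cluster_global, tile = divmod(tile_global, TILES_PER_CLUSTER)
    group, cluster = divmod(cluster_global, CLUSTERS_PER_GROUP)
    return group, cluster, tile, core
```

Under this numbering, two cores share an optical domain when their `(group, cluster)` pairs match, and need an inter-group wireless hop when their `group` values differ.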
8
OWN Architecture & Communication (2/4)
[Figure: cluster and intra-cluster optical communication; labels: inactive modulators, active modulators, demodulators, waveguide, arbitration waveguide]
9
OWN Architecture & Communication (3/4)
[Figure: group and intra-group wireless communication; labels: wireless router (Rx/Tx, wait for token), optical router]
10
OWN Architecture & Communication (4/4)
[Figure: chip and inter-group wireless communication among groups G0, G1, G2, and G3; transmitters wait for a token]
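The figures only indicate that a wireless transmitter waits for a token before sending. A minimal sketch of one common way to share a channel with a circulating token is given below; the round-robin order, the one-packet-per-token rule, and the `TokenChannel` class are assumptions for illustration, not details taken from the slides.

```python
from collections import deque


class TokenChannel:
    """Hypothetical shared wireless channel arbitrated by a circulating token."""

    def __init__(self, routers):
        self.order = list(routers)  # assumed fixed round-robin order
        self.holder = 0             # index of the router holding the token
        self.queues = {r: deque() for r in self.order}

    def enqueue(self, router, packet):
        """Queue a packet at a router; it waits there until the token arrives."""
        self.queues[router].append(packet)

    def step(self):
        """The token holder sends at most one packet, then passes the token on."""
        router = self.order[self.holder]
        sent = None
        if self.queues[router]:
            sent = (router, self.queues[router].popleft())
        self.holder = (self.holder + 1) % len(self.order)
        return sent
```

A router with nothing to send simply forwards the token, so in this sketch the waiting time grows with the number of routers sharing the channel.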
11
OWN Deadlocks & Solution
[Figure: circular dependency between the wireless routers of Group 2 and Group 3; legend: optical link, intra-group wireless link, inter-group wireless link, cluster, input core, output core]
•  VC0 = Intra-cluster & Intra-group
•  VC1 = Inter-group horizontal
•  VC2 = Inter-group vertical
•  VC3 = Inter-group diagonal
Solution => VC allocation based on packet type, which requires 4 VCs per port
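The VC assignment by packet type can be sketched as a small lookup. The VC numbering follows the slide; the 2x2 group placement (G0 and G1 in one row, G2 and G3 in the other) and the function name `select_vc` are assumptions for illustration.

```python
def select_vc(src_group, dst_group):
    """Pick a virtual channel from the packet type.

    Assumes groups 0..3 sit on a 2x2 grid where bit 0 of the group ID
    is the column and bit 1 is the row (hypothetical placement).
    """
    for g in (src_group, dst_group):
        if g not in (0, 1, 2, 3):
            raise ValueError("group must be 0..3")
    diff = src_group ^ dst_group
    if diff == 0:
        return 0  # VC0: intra-cluster and intra-group traffic
    if diff == 1:
        return 1  # VC1: inter-group horizontal (same row)
    if diff == 2:
        return 2  # VC2: inter-group vertical (same column)
    return 3      # VC3: inter-group diagonal
```

Keeping each packet type on its own VC means a horizontal packet never waits behind a vertical or diagonal one in the same buffer, which is how this kind of assignment breaks the circular dependency.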
12
Performance Analysis
–  Architectures: OWN, CMesh (wired only), WCube (hybrid wireless), and ATAC (hybrid optical)
–  Number of cores: 1024
–  Synthetic benchmarks: Uniform (UN), Bit-Reversal (BR), Complement (COMP), Matrix Transpose (MT), Perfect Shuffle (PS), and Neighbor (NBR)
–  Network simulation: Optisim*
–  Area and power analysis
•  DSENT# to calculate wire link and router area and power at bulk 45 nm LVT
•  Optical link area and power (waveguide, micro-ring resonators, laser power)
•  Wireless transceiver area is 0.62 mm² and energy is 1 pJ/bit$
* A. Kodi and A. Louri, "A system simulation methodology of optical interconnects for high-performance computing systems," J. Opt. Netw., vol. 6, no. 12, pp. 1282-1300, 2007.
# C. Sun, C.-H. Chen, G. Kurian, L. Wei, J. Miller, A. Agarwal, L.-S. Peh, and V. Stojanovic, "DSENT - a tool connecting emerging photonics with electronics for opto-electronic networks-on-chip modeling," in Sixth IEEE/ACM International Symposium on Networks on Chip (NoCS), 2012, pp. 201-210.
$ D. DiTomaso, A. Kodi, D. Matolak, S. Kaya, S. Laha, and W. Rayess, "Energy-efficient adaptive wireless NoCs architecture," in Seventh IEEE/ACM International Symposium on Networks on Chip (NoCS), 2013, pp. 1-8.
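The synthetic benchmarks listed on this slide are usually defined as permutations of the source address. The sketch below uses the conventional textbook definitions for a 1024-node network (10 address bits); the simulator used in the talk may define them differently, and the 32x32 layout assumed for the neighbor pattern is hypothetical.

```python
import random

BITS = 10        # log2(1024 cores)
N = 1 << BITS


def uniform(src, rng):
    """Uniform random destination other than the source."""
    dst = rng.randrange(N - 1)
    return dst if dst < src else dst + 1


def bit_reversal(src):
    """Destination is the source address with its bits reversed."""
    return int(format(src, f"0{BITS}b")[::-1], 2)


def complement(src):
    """Destination is the bitwise complement of the source address."""
    return ~src & (N - 1)


def transpose(src):
    """Swap the upper and lower halves of the address bits."""
    half = BITS // 2
    low_mask = (1 << half) - 1
    return ((src & low_mask) << half) | (src >> half)


def perfect_shuffle(src):
    """Rotate the address left by one bit."""
    return ((src << 1) & (N - 1)) | (src >> (BITS - 1))


def neighbor(src, side=32):
    """Next node in the same row of an assumed side x side layout."""
    x, y = src % side, src // side
    return y * side + (x + 1) % side
```

Every pattern except `uniform` is a fixed permutation, so each destination receives traffic from exactly one source.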
13
Related Work
WCube [MobiCom'09]
ATAC [PACT'10]
14
Area
[Figure: stacked bar chart of area (y-axis labeled µm2, 0 to 100) for CMESH, WCUBE, OWN, and ATAC; components: router area, photonic link area, wireless link area, wired link area; annotated differences: 35.5% and 34.14%]
OWN requires about 35.5% less area than ATAC
15
Energy per bit : Uniform
[Figure: stacked bar chart of energy (pJ/bit, 0 to 0.4) for CMESH, WCUBE, ATAC, and OWN; components: wireless link energy, router energy, wire link energy, photonic energy; annotated differences: 40.2% and 23.2%]
OWN consumes about 40.2% less energy than WCube
16
Energy per bit : Perfect Shuffle
[Figure: stacked bar chart of energy (pJ/bit, 0 to 0.3) for CMESH, WCUBE, ATAC, and OWN; components: wireless link energy, router energy, photonic energy, wire link energy; annotated differences: 21.2% and 3%]
OWN consumes about 21.2% less energy than WCube
17
Latency: Uniform Traffic
[Figure: latency (number of cycles, 0 to 600) versus network load (0.001 to 0.061) for CMESH, WCUBE, ATAC, and OWN; annotated difference: 67%]
OWN lowers latency by about 67% relative to WCube and about 11% relative to ATAC
18
Latency: Bit-Reversal Traffic
[Figure: latency (number of cycles, 0 to 600) versus network load (0.001 to 0.076) for CMESH, WCUBE, ATAC, and OWN; annotated differences: 69% and 57%]
OWN lowers latency by about 69%, 57%, and 11% relative to CMesh, WCube, and ATAC, respectively
19
Saturation Throughput
[Figure: saturation throughput (flits/cycle/core, 0 to 0.25) for CMESH, WCUBE, ATAC, and OWN across UN, BR, COMP, MT, PS, NBR, and GM]
OWN outperforms WCube and CMesh on average by about 8% and 28%, respectively
20
Conclusions & Future Work
•  OWN requires 35.5% less area than ATAC but 34.14% more area than WCube
•  OWN requires 30.36% less energy/bit than WCube but 13.99% more energy/bit than ATAC
•  OWN has higher saturation throughput and lower latency than the wired, wireless, and optical networks it was compared against
•  CMOS technology advancement will benefit OWN in both area and energy/bit
•  Future work: dynamic wireless channel allocation
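The area and energy percentages in the conclusions can be placed on a common normalized scale. The sketch below assumes the usual reading of "X% less than B" as A = B x (1 - X/100) and of "X% more than B" as A = B x (1 + X/100); the slides do not state how the percentages were computed, and the helper names are illustrative only.

```python
def from_percent_less(reference, percent_less):
    """Value that is `percent_less` percent below `reference`."""
    return reference * (1.0 - percent_less / 100.0)


def reference_from_percent_more(value, percent_more):
    """Reference that `value` exceeds by `percent_more` percent."""
    return value / (1.0 + percent_more / 100.0)


# Area, normalized so that ATAC = 1.0.
own_area = from_percent_less(1.0, 35.5)
wcube_area = reference_from_percent_more(own_area, 34.14)

# Energy per bit, normalized so that WCube = 1.0.
own_energy = from_percent_less(1.0, 30.36)
atac_energy = reference_from_percent_more(own_energy, 13.99)
```

Under these assumptions OWN sits between the two hybrids on both metrics: larger than WCube but smaller than ATAC in area, and the reverse in energy per bit.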
Thank You
Questions?
22
Decomposed Crossbar Router
[Figure: decomposed crossbar router; labels: route computation, VC allocator, switch allocator, input buffers (VC 1 to VC 4 per input) with MUX, crossbar switch, output units; Input 1 and Input 4 from cores; Output 1 to Output 4 and Output 15 to cores]
23
Optical Power Calculation
•  Laser power (one wavelength), loss budget in dB = longest link (cm) x 1 dB/cm + 1 dB (modulation) + 1 dB (demodulation) + 0.0001 x number of adjacent ring modulators + 0.2 dB (splitter) + 1 dB (photodetector loss)
•  Laser efficiency = 15%
•  Receiver sensitivity = -17 dBm
•  Pin = 10 ^ (loss in dB / 10) x Pout
•  Ptotal = #WL x (Pin / laser eff.) + (Pin / laser eff.) x arbitration link
•  Ring heating power = 26 µW/ring
•  Ring modulating power = 500 µW/ring
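The laser power calculation above can be sketched numerically. The constants come from the slide; treating Pout as the receiver sensitivity, treating the 0.0001 term as dB per adjacent ring, and counting arbitration links as an integer multiplier are assumptions, as are all function and parameter names.

```python
LASER_EFFICIENCY = 0.15           # from the slide
RECEIVER_SENSITIVITY_DBM = -17.0  # from the slide


def link_loss_db(longest_link_cm, adjacent_rings):
    """Loss budget in dB for one wavelength."""
    return (
        longest_link_cm * 1.0      # 1 dB/cm waveguide loss
        + 1.0                      # modulation
        + 1.0                      # demodulation
        + 0.0001 * adjacent_rings  # assumed: dB per adjacent ring modulator
        + 0.2                      # splitter
        + 1.0                      # photodetector loss
    )


def dbm_to_mw(dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (dbm / 10)


def input_power_mw(loss_db, p_out_mw):
    """Pin = 10^(loss/10) x Pout."""
    return 10 ** (loss_db / 10) * p_out_mw


def total_laser_power_mw(n_wavelengths, n_arbitration_links, p_in_mw):
    """Ptotal = #WL x (Pin / eff) + (Pin / eff) x arbitration links."""
    per_wavelength = p_in_mw / LASER_EFFICIENCY
    return n_wavelengths * per_wavelength + per_wavelength * n_arbitration_links
```

For example, a hypothetical 1 cm link with no adjacent rings has a 4.2 dB loss budget, so the laser must deliver about 2.6 times the receiver sensitivity at the waveguide input before laser efficiency is taken into account.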
24
Optical Area Calculation
•  Ring resonator diameter: 12 µm
•  Waveguide (WG) area = width (4 µm) x length