RobHendry

1
MODELING AND EVALUATION
OF CHIP-TO-CHIP SCALE
SILICON PHOTONIC
NETWORKS
Robert Hendry, Dessislava Nikolova,
Sébastien Rumley, Keren Bergman
Columbia University
HOTI 2014
2
Chip-to-chip optical networks
•  Projected chip I/O bandwidth: tens of Tb/s
•  Chip I/O bandwidth limited by pin count, data rate
•  Promising solution: silicon photonics
•  Dense bandwidth via WDM
•  High data rates
•  Energy-distance independence in fiber
3
Outline
•  Silicon photonic chip-to-chip networks
•  Characterizing loss and WDM capacity
•  Modeling power
•  Determining network performance
•  Conclusions
4
Microring-based silicon photonic links
•  Microrings
•  Modulation
•  Switching
•  Filtering
Microring modulator (Cornell)
Demultiplexing filter (Kotura/
Oracle)
•  Other optical devices:
•  Lasers, couplers, integrated photodetectors
Photodetectors 5
Chip-to-chip optical networks
•  “Chip-to-chip” low radix, high bandwidth
•  Chose two architectures to represent extremes of design space
Full mesh architecture
Switched architecture
6
Full mesh
•  One link at each source for each destination
PNI 0 Tx
To PNI 1
PNI 1 Rx
From PNI 0
To PNI 2
To PNI 3
From PNI 2
From PNI 3
7
Switched architecture
•  One input and output link per PNI
2x2 switch
“Through” state
“Drop” state
PNI 0 Tx
Optical switch fabric
PNI 1 Rx
8
Comparing topologies
•  Laser power is the largest contributor to overall power in the
network
•  4.5%, X. Zheng, et al. “Efficient WDM laser sources towards terabytes/s silicon
photonic interconnects.” Journal of Lightwave Technology, vol. 31, no. 15, 2013.
•  Assume lasers are always on
•  Laser stabilization time on the order of microseconds
•  Context: small packets, short inter-arrivals
•  Energy efficiency closely related to utilization of laser sources
•  Full mesh expectation:
•  No contention, lower queuing latency
•  More lasers, higher power, poor efficiency with load is low
•  Switched architecture expectation:
•  Contention, higher queuing latency
•  Resource sharing improves utilization and therefore efficiency
9
Shared input/output waveguides
•  Another way to share laser sources
PNI 0 Rx
PNI 0 Tx
PNI 1 Rx
PNI 1 Tx
…
Tx
Tx
Tx
Tx
Tx
Tx
Tx
Tx
Tx
Tx
Tx
Tx
Tx
Tx
Tx
Tx
Rx
Rx
Rx
Rx
Rx
Rx
Rx
Rx
Rx
Rx
Rx
Rx
Rx
Rx
Rx
Rx
Full mesh, 2-way sharing
PNI
PNI
Full mesh
Tx
Rx
Rx
Tx
Tx
Rx
Rx
Tx
Tx
Rx
Rx
Tx
Tx
Rx
Rx
Tx
•  Sacrificing performance for better utilization
10
Shared input/output waveguides
•  Another way to share laser sources
PNI 0 Rx
PNI 0 Tx
PNI 1 Rx
PNI 1 Tx
Benes
Rx
Rx
Rx
Benes, 2-way sharing
PNI
Rx
Tx
Tx
Tx
Tx
2x2
2x2
2x2
Tx
2x2
Tx
2x2
Tx
2x2
Tx
2x2
PNI
…
Rx
Rx
Rx
Rx
11
Design space
•  Topology
•  Benes
•  Full mesh
•  Sharing
•  No sharing, or two-way sharing
•  Network radix
•  4, 8, or 16
•  Goal: to find optimal topologies for given bisectional
bandwidth requirements
•  Ex: “Benes-4T-2S”, “FM-16T-1S”
12
Outline
•  Silicon photonic chip-to-chip networks
•  Characterizing loss and WDM capacity
•  Modeling power
•  Determining network performance
•  Conclusions
13
Determining worst-case loss
•  Combine losses of all devices along worst-case path of light
Optical switch fabric
PNI 0 Tx
Waveguide:
.92 dB/cm
PNI 1 Rx
Filter: .6 – 2.8 dB
Coupler: 1 dB
Switch through/drop:
0.07 – 3.3 dB
Modulators: 5.4 – 7.85 dB
14
Complexity vs. link capacity
100 mW (20 dBm)
Channel
power
Channel
power
Channel
power
Total loss
Total loss
Total loss
Required input power
Non-linear effects
Optical power budget
6.3 uW (-22 dBm)
Below receiver sensitivity
10 Gb/s per wavelength channel, OOK modulation
Intermodulation crosstalk limits WDM capacity to 125 wavelengths
• 
• 
• 
• 
Assuming 50nm spectrum
K. Padmaraju, et al. “Intermodulation Crosstalk Characteristics of WDM
Silicon Microring Modulators.” IEEE Photonics Letters, vol. 26, no. 14, 2014.
15
Peak Bisectional Bandwidth
•  Loss ! maximum wavelengths per link
•  Maximum wavelengths x 10 Gb/s x number of
links in bisection ! peak bisectional bandwidth
Benes-4T-1S
Benes-8T-1S
Benes-16T-1S
Benes-4T-2S
Benes-8T-2S
Benes-16T-2S
FM-4T-1S
FM-8T-1S
FM-16T-1S
FM-4T-2S
FM-8T-2S
FM-16T-2S
1 Tb/s
More devices (switches),
more loss, less bandwidth
Simpler links, less loss,
more bandwidth
10 Tb/s
100 Tb/s
T = number of PNIs, S = number of Tx/Rx per link
16
Peak Bisectional Bandwidth
•  Loss ! maximum wavelengths per link
•  Maximum wavelengths x 10 Gb/s x number of
links in bisection ! peak bisectional bandwidth
Benes-4T-1S
Benes-8T-1S
Benes-16T-1S
Benes-4T-2S
Benes-8T-2S
Benes-16T-2S
FM-4T-1S
FM-8T-1S
FM-16T-1S
FM-4T-2S
FM-8T-2S
FM-16T-2S
1 Tb/s
More links, but also higher
radix switch, so bandwidth
grows slowly, or not at all
More links, more bisectional
bandwidth
10 Tb/s
100 Tb/s
T = number of PNIs, S = number of Tx/Rx per link
17
Outline
•  Silicon photonic chip-to-chip networks
•  Characterizing loss and WDM capacity
•  Modeling power
•  Determining network performance
•  Conclusions
18
Power modeling
•  Microring tuning, trimming
•  Thermal fluctuations
•  Imperfect fabrication
Device Type/Origin Power/Device
(mW)
Thermal
0.875
Driver
circuitry
1.35
Dissipation
in ring
0.1
Switch
Thermal
3.5
Filter
Thermal
0.875
Detector
Static
3.95
Laser
Static
1250
Modulator
•  Laser power
•  A function of loss and number of
wavelengths used
•  Static dissipation in photodetectors
•  Dynamic modulation, switching
power
•  Not modeling network interfaces
19
Outline
•  Silicon photonic chip-to-chip networks
•  Characterizing loss and WDM capacity
•  Modeling power
•  Determining network performance
•  Conclusions
20
Impact of layout on network performance
•  Poisson arrivals, uniform random destination
•  Fixed message size (256B)
•  Assume we have an arbitration scheme that can reach 100%
utilization across the chip-scale network
Benes-4T-1S
2
10
Benes-8T-1S
Benes-16T-1S
Benes-4T-2S
Benes-8T-2S
Benes-16T-2S
1
10
2
10
3
4
10
10
Offered bandwidth (Gb/s)
5
10
Average network latency (ns)
Average network latency (ns)
•  Models indicate queuing and head-to-tail latency
FM-4T-1S
FM-8T-1S
2
10
FM-16T-1S
FM-4T-2S
FM-8T-2S
FM-16T-2S
1
10
2
10
3
4
10
10
Offered bandwidth (Gb/s)
5
10
21
Impact of layout on energy per bit
4
4
10
10
Benes-4T-1S
2
Benes-16T-2S
10
1
10
0
Energy per bit (pJ)
Energy per bit (pJ)
3
10
10
2
10
FM-4T-1S
Steady decrease
in energy per bit
Benes-8T-1S
3
10
because mostBenes-16T-1S
power is “static.”
Benes-4T-2S
More load means
more utilization.
Benes-8T-2S
FM-8T-1S
FM-16T-1S
FM-4T-2S
FM-8T-2S
2
FM-16T-2S
10
1
10
0
3
4
10
10
Offered bandwidth (Gb/s)
5
10
10
2
10
3
4
10
10
Offered bandwidth (Gb/s)
5
10
•  The best configuration in terms of energy per bit depends
on offered load
•  However, these figures hide latency
22
Impact of layout on energy per bit
4
4
10
10
Benes-4T-1S
Benes-8T-1S
Benes-16T-1S
Benes-4T-2S
Benes-8T-2S
2
Benes-16T-2S
10
1
10
0
FM-8T-1S
3
Energy per bit (pJ)
Energy per bit (pJ)
3
10
10
2
10
FM-4T-1S
10
FM-16T-1S
FM-4T-2S
FM-8T-2S
2
FM-16T-2S
10
1
10
0
3
4
10
10
Offered bandwidth (Gb/s)
5
10
10
2
10
3
4
10
10
Offered bandwidth (Gb/s)
5
10
•  The best configuration in terms of energy per bit depends
on offered load
•  However, these figures hide latency
23
Pareto optimality of topologies
100
Energy per bit (pJ)
0.4 Tb/s
1 Tb/s
4 Tb/s
Benes
40 Tb/s
Full-mesh
4T-1S
4T-2S
10
8T-1S
8T-2S
8T-4S
1
16T-1S
10
100
10
100
Average netw ork latency (ns)
10
100
10
100
Architectures with optimal trade-off at given load
•  Can move to a more power-consuming topology to
improve latency, or vice versa
16T-2S
16T-4S
24
Pareto optimality of topologies
100
Energy per bit (pJ)
0.4 Tb/s
1 Tb/s
4 Tb/s
Benes
40 Tb/s
Full-mesh
4T-1S
4T-2S
10
8T-1S
8T-2S
8T-4S
1
16T-1S
10
100
10
100
Average netw ork latency (ns)
10
100
10
100
•  Low loaded networks inevitably suffer from higher energy
per bit
16T-2S
16T-4S
25
Conclusions
•  Developed methodology for navigating design space
•  Using cross-layer analysis, we characterized an upper
bound on the energy efficiency of silicon photonic
networks at the chip-to-chip scale
•  Trend: For (relatively small scale) silicon photonic
networks, the mechanisms that accommodate for low
loads (i.e. resource sharing) degrade energy efficiency