Packet-Mode Emulation of Output-Queued Switches
David Hay, CS, Technion
Joint work with Hagit Attiya (CS, Technion), Isaac Keslassy (EE, Technion)

CIOQ Switches

Cell-Mode Scheduling

Trend towards Packet-Mode
- Cell-mode scheduling is getting too hard
  - Fragmentation and reassembly should work very fast, at the external rate
  - Extra header for each cell → loss of bandwidth
  - For optical switches such fragmentation and reassembly are prohibitive
- Cell-mode schedulers are packet-oblivious
  - Degradation of the overall performance

Packet-Mode Scheduling [Marsan et al., 2002][Ganjali et al., 2003][Turner, 2006]
- No need for fragmentation and reassembly
- Must ensure contiguous packet delivery over the fabric
  - While input i delivers a packet to output j, neither input i nor output j can handle other packets.
- Can packet-mode schedulers provide performance guarantees similar to those of cell-mode schedulers?

Output Queuing Emulation
- OQ switches are considered optimal with respect to queuing delay and throughput
  - But too hard to implement in practice…
- Emulation: same input traffic → same output traffic
- How hard is it for a cell-mode / packet-mode CIOQ switch to emulate an OQ switch?

Cell-Mode Emulation is Possible
- Easy with speedup S=N
  - N scheduling decisions every time-slot:
    - In the 1st decision forward the cell of input 1
    - In the 2nd decision forward the cell of input 2
    - ⋮
    - In the Nth decision forward the cell of input N
- Possible with speedup S=2: CCF algorithm
- Lower bound: S ≥ 2-1/N is required [Chuang et al., 1999]
- What is the speedup required for packet-mode emulation?
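The speedup S=N argument for cell-mode emulation can be sketched in a few lines. This is only an illustration under stated assumptions, not code from the talk: the function `emulate_slot_speedup_n`, the `arrivals` list and the per-output `deque` queues are hypothetical names, and the sketch assumes at most one cell arrives at each input per time-slot.

```python
from collections import deque


def emulate_slot_speedup_n(arrivals, output_queues):
    """Forward every cell that arrived in one time-slot using N decisions.

    arrivals[i] is the destination output of the cell that arrived at
    input i in this time-slot, or None if input i received no cell.
    (Hypothetical representation, chosen for this sketch only.)
    """
    decisions = 0
    for i, dest in enumerate(arrivals):
        # Decision number i+1 is dedicated to input i, so no two inputs
        # ever compete for the fabric within the same decision.
        decisions += 1
        if dest is not None:
            output_queues[dest].append((i, dest))
    return decisions


n = 4
queues = [deque() for _ in range(n)]
# Inputs 0 and 1 both send to output 2, input 2 is idle, input 3 sends to output 0.
used = emulate_slot_speedup_n([2, 2, None, 0], queues)
```

With N decisions per time-slot, every arriving cell reaches its output queue in the slot in which it arrived, so the output side can then serve cells in whatever order the OQ switch would.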
Packet-Mode Emulation is Impossible
- Regardless of speedup
- Even with speedup S=N

Emulation w/ Relative Queuing Delay
- The CIOQ switch is allowed a bounded lag behind the shadow OQ switch
- Exact same behavior as the optimal OQ switch, but with some extra delay
  - Called relative queuing delay (RQD)
- Can we provide packet-mode OQ emulation with bounded RQD and small speedup?

Our Results: Speedup-RQD tradeoff
[Plot: speedup (vertical axis, marks at 2, 4 and 2Lmax) versus RQD (horizontal axis); Lmax = maximum packet size]
- First algorithm: S ≈ 4 with RQD = O(N Lmax)
- Generalization of cell-mode scheduling with S=2: taking each packet of size ≤ Lmax as one huge cell
- Lower bound on RQD (even with infinite speedup)
- Lower bound on the speedup (from cell-mode scheduling)

Intuition for Emulation Algorithms
[Diagram: Packet Mode CIOQ, Cell Mode CIOQ w/ S=2, Packet Mode OQ]

Underlying CCF Algorithm
- Observation: a packet-mode OQ switch is a cell-mode OQ switch with a different queuing discipline (called PIFO)
  - PIFO cell-mode OQ = packet-mode OQ
- Cell-mode CIOQ w/ CCF (and speedup S=2) emulates any PIFO cell-mode OQ switch [Chuang et al., 1999]
- But CCF does not maintain contiguous packet forwarding over the fabric!

Intuition for Emulation Algorithms
[Diagram: Packet Mode CIOQ, Cell Mode CIOQ w/ S=2, Packet Mode OQ]
Two sub-steps:
1. Framing
2. Contiguous Decomposition

Frame-Based Schedulers
- Works in a pipelined, frame-based manner
- Within each frame:
  - Build a demand matrix for this frame
  - Schedule the demand matrix of the previous frame

Building the Demand Matrix
- At each frame of size T, CCF forwards at most 2T cells from each input and to each output.
- Example (entry (1,1) is the number of cells CCF sent from input 1 to output 1 in the last frame; every row sum and every column sum is ≤ 2T):

    3  1  2  0
    0  2  2  2
    1  2  2  1
    2  1  0  3

- Problem: A packet may span several frames.
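The per-frame bound on the demand matrix can be checked mechanically. The sketch below is an illustration only: `build_demand_matrix` and `within_frame_bound` are hypothetical helper names, the list of `(input, output)` transfers stands in for whatever record of CCF's decisions a real scheduler would keep, and the 4x4 values are the example entries shown with the demand matrix, read here with T=3.

```python
def build_demand_matrix(forwarded, n):
    """Count the cells forwarded per (input, output) pair within one frame."""
    demand = [[0] * n for _ in range(n)]
    for i, j in forwarded:
        demand[i][j] += 1
    return demand


def within_frame_bound(demand, t):
    """Return True if every row sum and column sum is at most 2T.

    With speedup 2 there are two scheduling decisions per time-slot, so a
    frame of T slots moves at most 2T cells from any input or to any output.
    """
    n = len(demand)
    row_sums = [sum(row) for row in demand]
    col_sums = [sum(demand[i][j] for i in range(n)) for j in range(n)]
    return max(row_sums + col_sums) <= 2 * t


# Example entries from the slide, expanded into individual cell transfers.
example = [[3, 1, 2, 0], [0, 2, 2, 2], [1, 2, 2, 1], [2, 1, 0, 3]]
forwarded = [(i, j) for i in range(4) for j in range(4)
             for _ in range(example[i][j])]
demand = build_demand_matrix(forwarded, 4)
fits = within_frame_bound(demand, t=3)
```

In this example every row and column sums to exactly 6, so the matrix is consistent with a frame of T=3 slots but not with a shorter one.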
Building the Demand Matrix
- Count only packets whose last cell is forwarded by the CCF in the frame
- Each row/column in the matrix is bounded by 2T+N(Lmax-1)
  - For each input-output pair only cells of one additional packet can be added.
- Translates into RQD of 2T+Lmax-2.

Decomposing the Demand Matrix
- Challenge: Decompose the matrix into permutations while maintaining contiguous packet delivery.
  - Each permutation dictates a scheduling decision.
  - Speedup = Number of permutations / Frame length
- First try: optimal Birkhoff-von Neumann decomposition results in 2T+N(Lmax-1) permutations.
[Figure: example demand matrix and its decomposition into permutation matrices]

Contiguous Greedy Decomposition
- To maintain contiguous packet delivery:
  - If (i,j) was matched in iteration t-1 and there are more (i,j) cells to schedule, keep (i,j) for iteration t.
  - Find a greedy matching for the rest of the matrix.
[Figure: matchings at iteration t-1 and iteration t; label: "Cells left from 1 to 1"]
- Speedup: S = 4 + (2N(Lmax-1)-1)/T
- RQD: 2T+Lmax-2

Our Results: Speedup-RQD tradeoff
[Plot: speedup (vertical axis, marks at 2, 4 and 2Lmax) versus RQD (horizontal axis)]
- S = 4 + (2N(Lmax-1)-1)/T, RQD = 2T+Lmax-2
- Next…

Packet-Mode Emulation w/ S≈2
- Separate demand matrix for every possible packet size
- Concatenate packets of the same size into mega-packets of size k=LCM(1,…,Lmax)
- Leftover matrix for each size m
- Optimally decompose (w/ Birkhoff-von Neumann) the mega-packets matrix, then the leftover matrices
- Speedup: S = 2 + N(Lmax […] k […] 1)/T
- RQD: 2T+Lmax-2

Wrap-up
- Packet-mode scheduling can be done with the same speedup as cell-mode scheduling
  - At the price of bounded RQD
- Future work: lower bounds?

Thank You!
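To make the trade-off behind the wrap-up concrete, the two expressions stated for the contiguous greedy decomposition, S = 4 + (2N(Lmax-1)-1)/T and RQD = 2T+Lmax-2, can be evaluated for a few frame lengths. This is a sketch for illustration only: the function name `first_algorithm_tradeoff` and the sample values N=16, Lmax=8 are assumptions made here, not figures from the talk.

```python
from fractions import Fraction


def first_algorithm_tradeoff(n, l_max, t):
    """Evaluate the speedup and RQD expressions for frame length t.

    n is the number of ports, l_max the maximum packet size in cells.
    Fractions keep the speedup exact instead of rounding it.
    """
    speedup = 4 + Fraction(2 * n * (l_max - 1) - 1, t)
    rqd = 2 * t + l_max - 2
    return speedup, rqd


# Hypothetical switch parameters, chosen only to show the trend.
for t in (100, 1000, 10000):
    s, rqd = first_algorithm_tradeoff(n=16, l_max=8, t=t)
    print(f"T={t}: speedup={float(s):.4f}, RQD={rqd}")
```

Longer frames push the speedup towards 4 while the relative queuing delay grows linearly in T, which is the trade-off the results plot describes.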