Packet-Mode Emulation of Output-Queued Switches
David Hay, CS, Technion
Joint work with Hagit Attiya (CS, Technion) and Isaac Keslassy (EE, Technion)
CIOQ (Combined Input-Output Queued) Switches
Cell-Mode Scheduling
Trend towards Packet-Mode
- Cell-mode scheduling is getting too hard
  - Fragmentation and reassembly must work very fast, at the external rate
  - Extra header for each cell → loss of bandwidth
  - For optical switches, such fragmentation and reassembly are prohibitive
- Cell-mode schedulers are packet-oblivious
  - Degradation of the overall performance
Packet-Mode Scheduling
[Marsan et al., 2002][Ganjali et al., 2003][Turner, 2006]
- No need for fragmentation and reassembly
- Must ensure contiguous packet delivery over the fabric
  - While input i delivers a packet to output j, neither input i nor output j can handle other packets.
Can packet-mode schedulers provide the same performance guarantees as cell-mode schedulers?
Output Queuing Emulation
- OQ switches are considered optimal with respect to queuing delay and throughput
  - But they are too hard to implement in practice…
- Emulation: same input traffic ⇒ same output traffic
How hard is it for a cell-mode or packet-mode CIOQ switch to emulate an OQ switch?
Cell-Mode Emulation is Possible
- Easy with speedup S=N
  - N scheduling decisions every time slot:
    - In the 1st decision, forward the cell of input 1
    - In the 2nd decision, forward the cell of input 2
    - ⋮
    - In the Nth decision, forward the cell of input N
- Possible with speedup S≥2: the CCF (Critical Cell First) algorithm
- Lower bound: S≥2-1/N is required
[Chuang et al., 1999]
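The speedup-N argument can be checked with a tiny simulation. This is a minimal sketch, not the authors' code: it assumes at most one cell arrives per input per time slot, FIFO output queues, and a shadow OQ switch that breaks ties among simultaneous arrivals by input index (all assumptions made here); the function names are hypothetical.

```python
from collections import deque


def oq_departures(arrivals, n):
    """Shadow OQ switch: every arriving cell goes straight to its output queue.

    arrivals[t][i] is the output of the cell arriving at input i in slot t,
    or None. Returns, per slot, the (input, arrival slot) label of the cell
    each output sends (None if idle). Ties are broken by input index
    (an assumption of this sketch).
    """
    queues = [deque() for _ in range(n)]
    out = []
    for t, slot in enumerate(arrivals):
        for i, j in enumerate(slot):
            if j is not None:
                queues[j].append((i, t))
        out.append([q.popleft() if q else None for q in queues])
    return out


def cioq_speedup_n_departures(arrivals, n):
    """CIOQ switch with speedup S=N: decision k forwards the cell of input k."""
    inputs = [deque() for _ in range(n)]
    queues = [deque() for _ in range(n)]
    out = []
    for t, slot in enumerate(arrivals):
        for i, j in enumerate(slot):
            if j is not None:
                inputs[i].append((j, (i, t)))
        for k in range(n):  # N scheduling decisions in this time slot
            if inputs[k]:
                j, label = inputs[k].popleft()
                queues[j].append(label)
        out.append([q.popleft() if q else None for q in queues])
    return out
```

Because all N decisions run before the outputs transmit, every cell reaches its output queue in the slot it arrives, in input-index order, so the two switches produce identical departure sequences.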
What is the speedup required for
packet-mode emulation?
Packet-Mode Emulation is Impossible
- Regardless of speedup
  - Even with speedup S=N
Emulation w/ Relative Queuing Delay
- The CIOQ switch is allowed a bounded lag behind the shadow OQ switch
  - Exactly the same behavior as the optimal OQ switch, but with some extra delay
- This lag is called the relative queuing delay (RQD)
Can we provide packet-mode OQ emulation with bounded RQD and small speedup?
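One natural way to measure the lag, sketched under an assumed representation: record, for every packet, the time slot in which it leaves each switch, and take the largest difference. The dict-based inputs and the function name are hypothetical, chosen only for illustration.

```python
def relative_queuing_delay(oq_departure, cioq_departure):
    """Largest lag, in time slots, of the CIOQ switch behind the shadow OQ switch.

    Both arguments are hypothetical dicts mapping a packet id to the time slot
    in which that packet leaves the respective switch.
    """
    if oq_departure.keys() != cioq_departure.keys():
        raise ValueError("both switches must carry the same packets")
    return max(cioq_departure[p] - oq_departure[p] for p in oq_departure)
```

For instance, if packet a leaves the OQ switch in slot 3 and the CIOQ switch in slot 4, while packet b leaves in slots 5 and 9, the RQD over these two packets is 4.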
Our Results:
Speedup-RQD tradeoff
[Figure: speedup vs. RQD, with speedup values 2, 4 and 2Lmax marked (Lmax = maximum packet size). Lower bound on the speedup: 2 (from cell-mode scheduling). Lower bound on RQD (even with infinite speedup). Taking each packet of size ≤ Lmax as one huge cell: speedup 2Lmax. First algorithm: S ≈ 4 with RQD=O(NLmax), a generalization of cell-mode scheduling.]
Intuition for Emulation Algorithms
[Diagram: Packet Mode OQ; Cell Mode CIOQ w/ S=2 (underlying CCF algorithm); Packet Mode CIOQ]
- Observation: a packet-mode OQ switch is a cell-mode OQ switch with a different queuing discipline, called PIFO (Push-In First-Out): PIFO cell-mode OQ = packet-mode OQ
- A cell-mode CIOQ switch with CCF (and speedup S=2) emulates any PIFO cell-mode OQ switch [Chuang et al., 1999]
- However, CCF does not maintain contiguous packet forwarding over the fabric!
Intuition for Emulation Algorithms
[Diagram: Packet Mode OQ; Cell Mode CIOQ w/ S=2; Packet Mode CIOQ]
Two sub-steps:
1. Framing
2. Contiguous Decomposition
Frame-Based Schedulers
- Work in a pipelined, frame-based manner over time
- Within each frame:
  - Build a demand matrix for this frame
  - Schedule the demand matrix of the previous frame
Building the Demand Matrix
- At each frame of size T, CCF forwards at most 2T cells from each input and to each output.
[Figure: example demand matrix for N=4. Entry (i,j) is the number of cells CCF sent from input i to output j in the last frame (e.g. 3 cells from input 1 to output 1); every row sum and every column sum is ≤ 2T.]
$\begin{pmatrix} 3 & 1 & 2 & 0 \\ 0 & 2 & 2 & 2 \\ 1 & 2 & 2 & 1 \\ 2 & 1 & 0 & 3 \end{pmatrix}$
Problem: a packet may span several frames.
Building the Demand Matrix
- Count only packets whose last cell is forwarded by CCF in the frame
  - Each row/column in the matrix is bounded by 2T+N(Lmax-1)
  - For each input-output pair, cells of at most one additional packet can be added (at most Lmax-1 extra cells per pair, and each row or column has N pairs)
⇒ Translates into an RQD of 2T+Lmax-2.
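The counting rule can be sketched as follows. The representation is an assumption made here, not the authors': CCF's transfers during one frame are given as (input, output, packet_id, cell_index) tuples, and a separate dict gives each packet's length in cells. A packet is charged in full to the frame in which its last cell crosses, so each packet lands in exactly one frame's matrix.

```python
def build_demand_matrix(frame_transfers, packet_size, n):
    """Demand matrix for one frame (hypothetical data layout).

    frame_transfers: (input, output, packet_id, cell_index) tuples for the
    cells CCF forwarded during this frame. packet_size maps packet_id to its
    length in cells. Entry (i, j) counts all cells of the packets from i to j
    whose LAST cell CCF forwarded in this frame.
    """
    d = [[0] * n for _ in range(n)]
    for i, j, pid, idx in frame_transfers:
        if idx == packet_size[pid] - 1:  # this is the packet's last cell
            d[i][j] += packet_size[pid]
    return d
```

For example, a 3-cell packet whose first two cells cross in one frame and whose last cell crosses in the next contributes 0 to the first frame's matrix and 3 to the second, although CCF moved only one of its cells in that second frame. This overshoot of at most Lmax-1 cells per pair is what the 2T+N(Lmax-1) bound accounts for.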
Decomposing the Demand Matrix
- Challenge: decompose the matrix into permutations while maintaining contiguous packet delivery.
  - Each permutation dictates a scheduling decision.
  - Speedup = number of permutations / frame length
- First try: an optimal Birkhoff-von Neumann decomposition results in 2T+N(Lmax-1) permutations.
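A minimal sketch of this first try, under assumptions made here: the demand matrix is a list of lists of non-negative integers; it is padded with dummy cells so that every row and column sums to the largest line sum B; then B perfect matchings are peeled off one at a time. Each matching exists by Hall's theorem, because the padded matrix is a B-regular bipartite multigraph. The padding rule and the matching routine (Kuhn's augmenting paths) are choices made for this illustration, not necessarily the authors'. Nothing here orders the permutations, so the cells of one packet need not land in consecutive permutations.

```python
def line_sums(d):
    n = len(d)
    rows = [sum(row) for row in d]
    cols = [sum(d[i][j] for i in range(n)) for j in range(n)]
    return rows, cols


def pad_to_regular(d):
    """Add dummy cells so every row and column sums to B, the largest line sum."""
    n = len(d)
    rows, cols = line_sums(d)
    b = max(rows + cols)
    p = [row[:] for row in d]
    row_def = [b - r for r in rows]
    col_def = [b - c for c in cols]
    for i in range(n):
        for j in range(n):
            extra = min(row_def[i], col_def[j])
            p[i][j] += extra
            row_def[i] -= extra
            col_def[j] -= extra
    return p, b


def perfect_matching(p):
    """Kuhn's algorithm on the support {(i, j): p[i][j] > 0}; perm[i] = output of input i."""
    n = len(p)
    owner = [-1] * n  # owner[j] = input currently matched to output j

    def augment(i, seen):
        for j in range(n):
            if p[i][j] > 0 and not seen[j]:
                seen[j] = True
                if owner[j] == -1 or augment(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    for i in range(n):
        if not augment(i, [False] * n):
            raise ValueError("support has no perfect matching")
    perm = [0] * n
    for j, i in enumerate(owner):
        perm[i] = j
    return perm


def bvn_decompose(d):
    """Split d into B permutations, B = largest row/column sum of d."""
    p, b = pad_to_regular(d)
    perms = []
    for _ in range(b):
        perm = perfect_matching(p)
        for i, j in enumerate(perm):
            p[i][j] -= 1  # subtracting a perfect matching keeps p regular
        perms.append(perm)
    return perms
```

A permutation entry that hits a dummy cell simply leaves that input-output pair idle in the corresponding scheduling decision.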
3

0
1

2

1 2 0 1
 
2 2 2  0

2 2 1 0
 
1 0 3   0
0 0 0 1
 
1 0 0 0

0 1 0 0
 
0 0 1   0
0 0 0 1
 
0 1 0 0

1 0 0 0
 
0 0 1   0
0 0 0

0 0 1

0 1 0

1 0 0 
0

0
1

0

0 1 0  0
 
1 0 0  0


0 0 0
0
 
0 0 1   1
1 0 0  0
 
0 1 0  0


0 0 1
0
 
0 0 0   1
0 1 0

0 0 1
1 0 0

0 0 0 
Contiguous Greedy Decomposition
- To maintain contiguous packet delivery:
  - If (i,j) was matched in iteration t-1 and there are more (i,j) cells to schedule ⇒ keep (i,j) for iteration t.
  - Find a greedy matching for the rest of the matrix.
1

0
0

0

0 0 0

0 1 0
1 0 0
 Cells left from
0 0 1 
1 to 1
1

0
0

0

0 0 0

0 1 0
1 0 0

0 0 1 
Iteration t-1
2 N ( Lmax  1)  1
 Speedup: 4 
T
1

0
0

0

0 0 0

0 0 1
0 1 0

1 0 0 
Iteration t
RQD: 2T+Lmax-2
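The greedy rule can be sketched directly. The data layout (a list of lists of cell counts, and one dict {input: output} per iteration) and the index-order greedy scan are assumptions made for this illustration. One way to see the speedup bound: if every row and column sums to at most B, then in every iteration before pair (i,j) starts, input i or output j is busy with another pair (the matching is maximal), and once it starts its d_ij cells go out back to back. Pair (i,j) is therefore done within row_i + col_j - d_ij ≤ 2B-1 iterations. With B = 2T+N(Lmax-1) that is 4T+2N(Lmax-1)-1 permutations per frame of T slots, which gives the speedup formula on this slide.

```python
def contiguous_greedy(d):
    """Decompose demand matrix d into matchings, one per iteration.

    A pair (i, j) matched in the previous iteration stays matched while it
    still has cells, so all (i, j) cells are sent contiguously. Each matching
    is returned as a dict {input: output}.
    """
    n = len(d)
    rem = [row[:] for row in d]
    prev = {}
    schedule = []
    while any(any(row) for row in rem):
        match = {}
        busy_out = set()
        # Keep every pair from the previous iteration that still has cells.
        for i, j in prev.items():
            if rem[i][j] > 0:
                match[i] = j
                busy_out.add(j)
        # Greedily extend to a maximal matching over the remaining entries.
        for i in range(n):
            if i in match:
                continue
            for j in range(n):
                if j not in busy_out and rem[i][j] > 0:
                    match[i] = j
                    busy_out.add(j)
                    break
        for i, j in match.items():
            rem[i][j] -= 1  # send one cell from input i to output j
        schedule.append(match)
        prev = match
    return schedule
```

On the 4×4 example matrix above (all line sums 6), the schedule needs between 6 and 2·6-1 = 11 iterations.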
Our Results:
Speedup-RQD tradeoff
[Figure: speedup vs. RQD (speedup values 2, 4 and 2Lmax marked), now with the first algorithm's tradeoff: S=4+(2N(Lmax-1)-1)/T with RQD = 2T+Lmax-2.]
Next…
Packet-Mode Emulation w/ S≈2
- Separate demand matrix for every possible packet size
- Concatenate packets of the same size into mega-packets of size k=LCM(1,…,Lmax)
- Leftover matrix for each size m
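A minimal sketch of the mega-packet step, assuming the per-size demand is given as a dict that maps each packet size m (in cells) to an N×N matrix of packet counts; this representation and the function name are hypothetical. One mega-packet of k cells holds k/m packets of size m, so each count splits into whole mega-packets plus a leftover of fewer than k/m packets. Calling `math.lcm` with several arguments needs Python 3.9 or later.

```python
from math import lcm


def split_into_mega_packets(counts, l_max):
    """counts[m][i][j]: number of size-m packets from input i to output j.

    Returns k = LCM(1, ..., l_max), the mega-packet matrix (in mega-packets of
    k cells) and, for each size m, the leftover matrix (in size-m packets).
    """
    k = lcm(*range(1, l_max + 1))
    n = len(next(iter(counts.values())))
    mega = [[0] * n for _ in range(n)]
    leftover = {}
    for m, mat in counts.items():
        per_mega = k // m  # size-m packets that fill one mega-packet
        leftover[m] = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                q, r = divmod(mat[i][j], per_mega)
                mega[i][j] += q
                leftover[m][i][j] = r
    return k, mega, leftover
```

With Lmax=3 (so k=6), seven 1-cell packets from input 1 to output 1 become one mega-packet plus one leftover 1-cell packet; the cell total is preserved across the mega-packet and leftover matrices.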
- Optimally decompose (w/ Birkhoff-von Neumann) first the mega-packets matrix, then the leftover matrices
$S \le 2 + \frac{N(L_{max}k - 1)}{T}$
$RQD \le 2T + L_{max} - 2$
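The two speedup formulas on these slides can be compared numerically. The sketch below only evaluates those expressions for given N, Lmax and frame length T; it does not simulate a switch, and the function names are hypothetical. It makes the cost of the second scheme visible: k = LCM(1,…,Lmax) grows very quickly with Lmax, so T, and with it the RQD, must be much larger before the speedup approaches 2.

```python
from math import lcm


def greedy_tradeoff(n, l_max, t):
    """(speedup, RQD) of the contiguous greedy decomposition with frame length t."""
    return 4 + (2 * n * (l_max - 1) - 1) / t, 2 * t + l_max - 2


def mega_packet_tradeoff(n, l_max, t):
    """(speedup bound, RQD bound) of the mega-packet scheme with frame length t."""
    k = lcm(*range(1, l_max + 1))  # mega-packet size LCM(1, ..., Lmax)
    return 2 + n * (l_max * k - 1) / t, 2 * t + l_max - 2


for t in (10**3, 10**5, 10**7):
    print(t, greedy_tradeoff(32, 4, t), mega_packet_tradeoff(32, 4, t))
```

For N=4, Lmax=3 and T=15, the greedy scheme gives S=5 with RQD=31, while the mega-packet bound is S ≤ 2+68/15 with the same RQD bound; only for much longer frames does the second scheme's speedup drop toward 2.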
Wrap-up
- Packet-mode scheduling can be done with the same speedup as cell-mode scheduling
  - At the price of a bounded RQD
- Future work: lower bounds
Thank You!