nsf-jan99-12 - University of California, Berkeley

The End-to-end Argument:
Multicast’s Friend or Foe?
Steven McCanne
Sylvia Ratnasamy
Electrical Engineering and Computer Science
University of California, Berkeley
NSF PI Meeting
January 23, 1999
Washington, DC
The Problem
• Scaling multipoint communication
– broadcast video
– group conferencing
– directory services
– remote file distribution
– hierarchical web cache updates
IP Multicast to the Rescue
• Extend the IP architecture for multipoint
• Best effort
• Defer reliability to higher layers
[Figure: IP multicast delivery from source S to receivers A, B, C, D]
The End-to-end Challenge
• Keep the network simple & robust
• Rely upon end-to-end adaptation
• Build apps on top of IP multicast that
– scale, and
– accommodate heterogeneity.
• We’ve been trying for a decade
– this is a HARD problem
Topological Awareness
• Divide and conquer
– Localize “problems”
• e.g., recovery and control traffic
– Impose hierarchy
• Hierarchy should be congruent with the underlying network topology
• This means we should know something about the topology (maybe indirectly)
Topological Optimizations
[Figure builds: a logical topology (source S, receivers A, B, C, D) mapped onto the underlying physical topology, then annotated in turn with recovery groups, designated receivers (DR), transcoders (T), and web caches ($)]
• Reliable multicast recovery groups
• DR placement for RMTP
• Optimal transcoder placement for streaming media
• Self-organizing web caches
The Problem
• IP deliberately hides topology
• No hooks to discover it
• Two alternatives
– extend the service model (e.g., put new forwarding services in the network)
– infer topological structure from end-to-end measurements
Our Approach
• Let’s look at both
– service extensions (NSF CAREER grant)
• How would we minimally extend the IP multicast service model to better support end-to-end transports, and how would we co-design those new transport protocols?
– inference
• If restricted to the existing service model, how well could we infer topology through end-to-end observations and use this knowledge in a transport protocol?
State of the Art
• Self-organizing multicast groups
– Cluster receivers with similar loss patterns
– Assumes shared loss implies spatial (topological) correlation
• Landmark works
– Liu et al. reliable multicast
– Kouvelas et al. transcoder placement
Self-organizing Groups
• Loss “fingerprints” (lossprints)
– Exchange lossprints over the session channel
– If shared loss > THRESH, form a new group
– Join group with similar lossprint
• Problem: it doesn’t work
Lossprint Pathology
[Figure builds: source S and receivers A, B, C, with one path labeled “med loss” and another “high loss”; lossprint thresholding groups A with C, whereas the ideal grouping pairs A with B]
– Shared loss does not imply spatial correlation
– Haven’t localized the problem
Back to the Drawing Board
• Can we somehow extract the true loss correlations?
• Red signal (AC)
– strong but wrong
• Yellow signal (AB)
– weak but right
• Short answer: yes
[Figure: source S and receivers A, B, C]
Tree Inference
• Sylvia’s Infocom ’99 paper
• Probabilistic model
• Assumptions
– Independence
– Perfect (end-to-end) knowledge everywhere
– Global exchange of lossprints
– Lossprints from the beginning of time
Bottom-Up Construction
[Figure builds: the tree over source S and receivers A, B, C is assembled bottom-up, pairing receivers into subtrees until a single tree rooted at S remains]
The Magic
[Figure: from end-to-end observations at receivers A, B, C alone, infer which tree rooted at S connects them (“?”)]
Probabilistic Model
• Choose max likelihood pairing
• Assume independent losses
• Tease apart
– real shared loss
• losses along a common path
– coincidental shared loss
• independent losses along separate paths
• Trick: we can observe only the union of the two
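Probabilistic Model
[Figure: source S; real shared loss ps on the common path; coincidental shared loss from independent losses pa and pb on the branches to A and B]
$$P_{a\bar{b}} = (1 - p_s)\,p_a\,(1 - p_b)$$
$$P_{\bar{a}b} = (1 - p_s)\,(1 - p_a)\,p_b$$
$$P_{ab} = p_s + (1 - p_s)\,p_a\,p_b$$
• The big P’s on the left-hand side are observed (e.g., $P_{a\bar{b}}$: A loses a probe that B receives)
• Solve for the little p’s
• Choose max likelihood pairing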
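Eliminating $p_a$ and $p_b$ gives a closed form. With loss rates $L_a = P_{ab} + P_{a\bar{b}}$ and $L_b = P_{ab} + P_{\bar{a}b}$, and $P_{\bar{a}\bar{b}} = 1 - P_{ab} - P_{a\bar{b}} - P_{\bar{a}b} = (1-p_s)(1-p_a)(1-p_b)$, we get $1 - L_a = (1-p_s)(1-p_a)$ and $1 - L_b = (1-p_s)(1-p_b)$, hence $1 - p_s = (1-L_a)(1-L_b)/P_{\bar{a}\bar{b}}$. The sketch below applies this to two lossprints, assumed to be sets of lost probe sequence numbers; the names are hypothetical. Since the model has three parameters and the four outcome frequencies have three degrees of freedom, this inversion should coincide with the maximum-likelihood estimate whenever it lands inside [0, 1]; clamping $p_s$ at zero for noisy data is our own choice.

```python
from dataclasses import dataclass


@dataclass
class PairEstimate:
    p_s: float  # loss on the shared path ("real" shared loss)
    p_a: float  # loss on A's own branch
    p_b: float  # loss on B's own branch


def estimate_pair(lp_a: set[int], lp_b: set[int], n_probes: int) -> PairEstimate:
    """Invert the two-branch loss model from two lossprints.

    Assumes losses on the shared path and on each branch are independent.
    """
    both = len(lp_a & lp_b) / n_probes      # P_ab
    a_only = len(lp_a - lp_b) / n_probes    # P_a~b
    b_only = len(lp_b - lp_a) / n_probes    # P_~ab
    neither = 1.0 - both - a_only - b_only  # (1-p_s)(1-p_a)(1-p_b)
    if neither <= 0.0:
        raise ValueError("need at least one probe received by both receivers")
    keep_a = 1.0 - (both + a_only)  # 1 - L_a = (1-p_s)(1-p_a)
    keep_b = 1.0 - (both + b_only)  # 1 - L_b = (1-p_s)(1-p_b)
    return PairEstimate(
        # 1 - p_s = (1-L_a)(1-L_b) / P_~a~b; clamp sampling noise below zero
        p_s=max(0.0, 1.0 - keep_a * keep_b / neither),
        p_a=1.0 - neither / keep_b,  # divides out (1-p_s)(1-p_b)
        p_b=1.0 - neither / keep_a,  # divides out (1-p_s)(1-p_a)
    )


# Counts generated from p_s=0.1, p_a=0.2, p_b=0.3 over 1000 probes:
# 154 lost by both, 126 by A only, 216 by B only, 504 by neither.
lp_a = set(range(0, 280))
lp_b = set(range(0, 154)) | set(range(280, 496))
est = estimate_pair(lp_a, lp_b, 1000)
print(round(est.p_s, 3), round(est.p_a, 3), round(est.p_b, 3))  # 0.1 0.2 0.3
```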
Lossprint Pathology Fixed
[Figure: source S and receivers A, B, C]
– Real shared loss does imply spatial correlation
– Successful localization
P(A,B real shared loss) > P(A,C real shared loss),
even though P(A,B shared loss) < P(A,C shared loss)
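A worked numerical example of the inequality above, using the model's exact probabilities instead of sampled lossprints. The topology and loss rates are made up for illustration: A and B sit behind a link with 5% loss (then 30% and 1% on their own branches), while C has 50% loss on a separate path from S. A's end-to-end loss is therefore 1 - 0.95 × 0.70 = 0.335, and A and C share no link.

```python
def pair_probs(p_s: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Exact (P_ab, P_a~b, P_~ab) under the two-branch model of the slide."""
    both = p_s + (1 - p_s) * p_a * p_b
    a_only = (1 - p_s) * p_a * (1 - p_b)
    b_only = (1 - p_s) * (1 - p_a) * p_b
    return both, a_only, b_only


def real_shared_loss(both: float, a_only: float, b_only: float) -> float:
    """Recover p_s: 1 - (1-L_a)(1-L_b) / P_~a~b, clamped at 0 for rounding noise."""
    neither = 1 - both - a_only - b_only
    return max(0.0, 1 - (1 - both - a_only) * (1 - both - b_only) / neither)


# Hypothetical topology: S -> R (5% loss) -> A (30%) and B (1%); S -> C (50%).
ab = pair_probs(0.05, 0.30, 0.01)
# A and C share no link, so p_s = 0 and each branch is a whole end-to-end path.
ac = pair_probs(0.0, 1 - 0.95 * 0.70, 0.50)

print(f"shared loss:      A,B {ab[0]:.3f}   A,C {ac[0]:.3f}")
print(f"real shared loss: A,B {real_shared_loss(*ab):.3f}   A,C {real_shared_loss(*ac):.3f}")
```

Raw shared loss favors the pair A,C (about 0.168 vs. about 0.053), while the recovered real shared loss is 0.05 for A,B and 0 for A,C, which is the ordering the slide states.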
Results
• It works well
– Simple network models
– Independent losses
– P(correct inference) → 1 as N → ∞
• Open: impact of correlated losses
• See Sylvia’s paper
A Group Formation Protocol
• Unfortunately, global exchange of information is expensive and unrealistic
• Can we apply the lessons learned
through this extreme case analysis to
a more practical protocol design?
• Short answer: yes
Delivery-based Model
• Unicast data to each homogeneous region within the heterogeneous tree
• Tailor the original delivery process to the varied receiver characteristics, which makes the subsequent problems of flow/congestion control and reliability much easier
[Figure: source with unicast connections to each region; multicast within each LAN]
Three core pieces required to enable a delivery-based model
• Group Formation Protocol
• Representative Election Protocol
• Scattercast
Group Formation Protocol (GFP)
Goal
• Organize receivers into a multi-level hierarchy of disjoint multicast delivery groups
Metric: Pt(i,j), the true shared loss probability
• Estimates the loss probability along the path shared by two nodes i and j
• [Tree Inference algorithm, Infocom ’99]
[Figure: source and receivers I and J]
• Source continually multicasts probe packets onto a separate control channel
• Receivers use observed loss patterns to infer group structure
Group Formation Protocol
Terminology
• Participant: a receiver participates in the GFP if its loss rate > LRmin
• Receiver LossPrint (LPr): list of pkts lost by a receiver
• Group LossPrint (LPg): list of pkts lost by the group as a whole
• Working LossPrint (LPr - LPg): list of pkts lost by a receiver but not by its current group as a whole
[Figure: receiver I in a group below the source; group lossprint {1,4,6,9}, I’s working lossprint {3,5}]
For receiver I:
LPi = { 1,3,4,5,6,9 }
LPg = { 1,4,6,9 }
LPi - LPg = { 3,5 }
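The terminology maps directly onto set operations. A minimal sketch, treating lossprints as Python sets of lost probe sequence numbers; the value of `LR_MIN` and the function names are hypothetical.

```python
LR_MIN = 0.01  # hypothetical participation threshold (loss rate)


def is_participant(lp_r: set[int], n_probes: int, lr_min: float = LR_MIN) -> bool:
    """A receiver takes part in the GFP only if its loss rate exceeds LRmin."""
    return len(lp_r) / n_probes > lr_min


def working_lossprint(lp_r: set[int], lp_g: set[int]) -> set[int]:
    """LPr - LPg: packets the receiver lost that its group as a whole did not."""
    return lp_r - lp_g


# Receiver I from the slide.
lp_i = {1, 3, 4, 5, 6, 9}
lp_g = {1, 4, 6, 9}
print(sorted(working_lossprint(lp_i, lp_g)))  # [3, 5]
print(is_participant(lp_i, n_probes=100))      # True (loss rate 0.06)
```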
Group Formation Protocol
• Every participant sets a timer as a linearly increasing function of its loss rate
• On timer expiry, a participant either initiates the formation of a new group (INIT(g) msg) or joins a previously proposed group (JOIN(g) msg)
• INIT and JOIN msgs include a LossPrint describing the group’s losses
• For every incoming INIT(g)/JOIN(g) msg, a participant (I) computes
R = Pt(i,g) / Pt(g,g)
R indicates the extent of the path shared by group “g” and participant “I”: the closer R is to 1, the more of g’s loss occurs on that shared path
• Until its timer expires, a participant tracks the group (Gmax) with which it shares the maximum value of R (Rmax)
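A sketch of how a participant might compute R and track Gmax, assuming Pt is estimated from lossprints with the two-branch model from the tree-inference slides and that Pt(g,g) is simply the group's loss rate (the same formula yields exactly that when both lossprints are LPg). The class, method names, and example lossprints are hypothetical.

```python
def true_shared_loss(lp_i: set[int], lp_g: set[int], n_probes: int) -> float:
    """Estimate Pt(i, g), the loss probability on the path shared by i and g.

    Uses the two-branch independence model: 1 - Pt = (1-L_i)(1-L_g) / P_neither.
    """
    loss_i = len(lp_i) / n_probes
    loss_g = len(lp_g) / n_probes
    neither = 1.0 - len(lp_i | lp_g) / n_probes
    if neither <= 0.0:
        raise ValueError("no probe was received by both")
    return max(0.0, 1.0 - (1.0 - loss_i) * (1.0 - loss_g) / neither)


class Participant:
    """Tracks the group (Gmax) with the largest R seen before its timer fires."""

    def __init__(self, lossprint: set[int], n_probes: int) -> None:
        self.lossprint = lossprint
        self.n_probes = n_probes
        self.g_max: str | None = None
        self.r_max = 0.0

    def on_group_msg(self, group: str, group_lossprint: set[int]) -> float:
        """Handle an INIT(g) or JOIN(g) message: R = Pt(i,g) / Pt(g,g)."""
        pt_gg = len(group_lossprint) / self.n_probes  # Pt(g,g) is g's loss rate
        if pt_gg == 0.0:
            return 0.0  # a loss-free group gives no signal
        r = true_shared_loss(self.lossprint, group_lossprint, self.n_probes) / pt_gg
        if r > self.r_max:
            self.g_max, self.r_max = group, r
        return r


# Hypothetical example over 100 probes: I loses everything group g loses,
# plus 10 packets of its own; group h's losses are unrelated to I's.
me = Participant(set(range(20)), n_probes=100)
me.on_group_msg("g", set(range(10)))
me.on_group_msg("h", set(range(50, 60)))
print(me.g_max, round(me.r_max, 3))  # g 1.0
```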
Group Formation Protocol
• On timer expiry:
  if (Rmax > THRESH) {
    participant I joins group Gmax (JOIN msg)
    <JOIN/Gmax/Gmax’s lossprint>
  } else {
    participant I initiates the formation of a new group (INIT msg)
    <INIT/Gnew/I’s working lossprint>
  }
• Value of THRESH determines the “shape” of the final hierarchy
• Value of LRmin determines the heterogeneity within a single delivery group
Representative Election Protocol
• A low-loss-rate receiver is an ideal group representative
• In GFP, the initiator of a new group has the lowest loss rate in that group
• The group initiator acts as group representative
• Use soft-state principles to achieve robustness in the face of representative crashes
– Representative periodically transmits ‘REP’ messages
– Every receiver sets a timer as a linearly increasing function of its loss rate
– A receiver whose timer expires before it sees a ‘REP’ message takes over as representative
• REP message includes group LossPrint LPg
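A toy sketch of the takeover rule, under simplifying assumptions of our own: REP messages go out every `REP_PERIOD` seconds and are never lost, every survivor restarts its timer on each REP it hears, and the first survivor to time out takes over and suppresses the others with its own REP. All constants and names are hypothetical.

```python
REP_PERIOD = 1.0    # hypothetical interval between REP messages, in seconds
TIMER_BASE = 2.0    # hypothetical; exceeds REP_PERIOD so a live representative is not pre-empted
TIMER_SLOPE = 20.0  # hypothetical seconds per unit of loss rate


def timeout(loss_rate: float) -> float:
    """Takeover timer, linearly increasing in loss rate."""
    return TIMER_BASE + TIMER_SLOPE * loss_rate


def elect_after_crash(loss_rates: dict[str, float], crashed: str,
                      crash_time: float) -> tuple[str, float]:
    """Return (new representative, takeover time) after `crashed` goes silent.

    Simplifications: REPs are sent at multiples of REP_PERIOD and never lost,
    every survivor restarts its timer on each REP it hears, and the first
    survivor to time out sends its own REP, which suppresses all the others.
    """
    last_rep = (crash_time // REP_PERIOD) * REP_PERIOD  # last REP at or before the crash
    survivors = {r: lr for r, lr in loss_rates.items() if r != crashed}
    new_rep = min(survivors, key=lambda r: timeout(survivors[r]))
    return new_rep, last_rep + timeout(survivors[new_rep])


rates = {"R1": 0.01, "R2": 0.03, "R3": 0.10}
print(elect_after_crash(rates, crashed="R1", crash_time=5.5))
```

With these numbers R2, the lowest-loss survivor, takes over about 2.6 s after the last REP it heard; a real deployment would also have to cope with lost REPs and near-simultaneous timeouts, which this sketch ignores.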
Scattercast
• Source unicasts data to the first-level group representatives
• Representatives unicast data to the group representatives at the next level in the hierarchy and multicast data within their own group
• Implement reliability and flow/congestion control at two levels
– between homogeneous regions, formed by the coarse-grained clustering of receivers (GFP)
– within each homogeneous region (e.g., using SRM global recovery, transmitting at the least-common-denominator rate, etc.)
[Figure: source connected over TCP to group representatives, with SRM used within each group]
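A sketch of the Scattercast forwarding pattern for one data packet, assuming the GFP hierarchy is available as a tree of groups; `Group`, `scattercast_plan`, and the example hierarchy are hypothetical. Transport choices (TCP between representatives, SRM within groups) follow the figure and appear only in comments.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    representative: str
    children: list["Group"] = field(default_factory=list)  # next level down


def scattercast_plan(source: str, first_level: list[Group]) -> list[tuple[str, str, str]]:
    """List the (mode, sender, target) transmissions for one data packet.

    "unicast":   sender -> a group representative (TCP in the figure)
    "multicast": representative -> its own group (SRM in the figure)
    """
    plan: list[tuple[str, str, str]] = []
    stack = [(source, g) for g in reversed(first_level)]  # depth-first, left to right
    while stack:
        sender, group = stack.pop()
        plan.append(("unicast", sender, group.representative))
        plan.append(("multicast", group.representative, group.name))
        for child in reversed(group.children):
            stack.append((group.representative, child))
    return plan


# Hypothetical two-level hierarchy: g2 hangs below g1.
g2 = Group("g2", "r2")
g1 = Group("g1", "r1", [g2])
g3 = Group("g3", "r3")
for step in scattercast_plan("S", [g1, g3]):
    print(step)
```

Each representative both re-multicasts within its own homogeneous group and relays by unicast to the representatives one level down, so the source only ever talks to the first level.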
Conclusions?
• Open question: To what extent should
we retain the end-to-end way of
thinking in the context of multicast?
• Two areas we’re looking at
– End-to-end and infer
– Extend service model
• Both are hard and the jury is still
out...