A Simple Model of P2P Streaming and Some Extensions
September 29, 2008
Dah Ming Chiu, Chinese University of Hong Kong

P2P content distribution
- Network multicast is preoccupied with network efficiency: find an efficient tree to reach all receivers.
- P2P content distribution instead tries to maximize throughput (or, equivalently, minimize makespan).

Makespan of broadcasting
- Various algorithms have been studied.
- Given n receivers and a message that takes time T to transmit, the makespan for the single message is O(log n) * T.
- But divide the "message" into M pieces and pipeline them: makespan ≈ (M + O(log n)) * (T/M) = T + O(log n) * T/M, which approaches T as M becomes large.

The secret of P2P content distribution
- Use multiple trees.
- Divide the content into multiple "chunks" or "stripes".
- Let different chunks (stripes) flow through different trees.
- Every node serves as an interior node in some tree(s), and hence contributes to the distribution.
- The multiple trees distribute content in parallel!
(Figure: a source and peers 1-8 arranged in multiple parallel distribution trees.)

The capacity of P2P networks
- The multiple distribution trees are logical trees in an overlay network.
- To study the capacity of a P2P network, we need a model of how these overlay trees map to physical network resources.

The uplink sharing model (Mundinger)
- Assume away the network complexity: only "uplinks" can be bottlenecks in peer communications.
- Practically all models make this assumption.
- Under this assumption, a simple "rule of thumb" can be derived, even when peers have different uplink capacities.
- J. Mundinger, R. Weber, and G. Weiss, "Optimal Scheduling of Peer-to-Peer File Dissemination", Journal of Scheduling, 2007.

When peers have unequal capacity
- What is the minimum makespan when peers have different uplink capacities?
- It is a MILP (mixed integer linear programming) problem; see Mundinger's thesis for details.
- Solutions can be derived for small M (pieces) and N (peers).
- But there is a simple asymptotic, closed-form solution when M is very large: the fluid approximation.

The simple formula
- Given the uplink sharing model (server uplink C0, peer uplinks C1, ..., Cn) and very large M, the maximum throughput is
    R = min{ C0, (C0 + sum_j Cj) / n }
- The two values are equal when C0 = sum_j Cj / (n-1).
- Most of the time the server's capacity is larger, so R equals the second term.

Proof step 1: R is an upper bound
- C0 is the uplink bandwidth of the server. The server must send the content out at least once, so C0 is a clear upper bound.
- C0 + sum_j Cj is the total uplink bandwidth of the server and all peers. Each peer must receive the content from somewhere, so the total demand n*R cannot exceed the total supply.

Proof step 2: Realizing R by construction
- One 1-hop tree (the server sends directly to all peers).
- n 2-hop trees (the server sends to peer i, which relays to the other n-1 peers).
- Assign rates optimally, satisfying the uplink constraints.
- Two cases, with C = sum_i Ci: C0 > C/(n-1) and C0 <= C/(n-1).
(Figure: star topologies with server uplink C0 and peer uplinks C1, ..., Cn for the two cases.)

Maximum rate achieved (a code sketch follows below)
- Case 1: C0 > C/(n-1).
  - Assign rate Ci/(n-1) to the ith 2-hop spanning tree; it delivers Ci/(n-1) to each other peer.
  - Assign rate C0 - C/(n-1) to the 1-hop spanning tree; through it the server delivers (C0 - C/(n-1))/n to each peer.
  - Each peer receives (C0 + C)/n in total.
- Case 2: C0 <= C/(n-1).
  - Assign rate C0*Ci/C to the ith 2-hop spanning tree, which delivers to all other peers at that rate.
  - Each peer receives sum_i C0*Ci/C = C0.
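To make the closed form and the two-case construction concrete, here is a minimal Python sketch; the function names are illustrative, not from the talk.

```python
def max_streaming_rate(c0, peers):
    """Closed-form maximum per-peer rate under the uplink sharing model.

    c0: server uplink capacity; peers: list of the n peer uplink capacities.
    R = min{ c0, (c0 + sum of peer uplinks) / n }.
    """
    n = len(peers)
    return min(c0, (c0 + sum(peers)) / n)

def tree_rates(c0, peers):
    """Rate assignment realizing R (proof step 2).

    Returns (rate of the 1-hop tree, list of rates of the n 2-hop trees).
    """
    n, c = len(peers), sum(peers)
    if c0 > c / (n - 1):                       # case 1: server has spare capacity
        two_hop = [ci / (n - 1) for ci in peers]
        one_hop = c0 - c / (n - 1)             # leftover server uplink
    else:                                      # case 2: server is the bottleneck
        two_hop = [c0 * ci / c for ci in peers]
        one_hop = 0.0
    return one_hop, two_hop

# Example: n = 4 peers, C = 16, C/(n-1) = 16/3 < C0 = 10, so case 1 applies
# and each peer receives (C0 + C)/n = 26/4 = 6.5.
one_hop, two_hop = tree_rates(10.0, [4.0, 4.0, 4.0, 4.0])
assert abs(sum(two_hop) + one_hop / 4 - 6.5) < 1e-9
```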
How to build multiple distribution trees? Two approaches:
- Structured P2P network: build multiple trees explicitly. "SplitStream: High-Bandwidth Multicast in Cooperative Environments", SOSP 2003 (500+ citations in Google Scholar).
- Unstructured P2P network: build multiple trees dynamically and implicitly, based on who has and who can provide content. "Incentives Build Robustness in BitTorrent", Bram Cohen, 2003 (900+ citations in Google Scholar).
- PPLive: a very popular P2P streaming platform. Why does it work, especially for streaming? A mystery.

Approximate model of unstructured P2P
- Each peer gets a list of neighbors during bootstrapping (e.g. from a tracker).
- In each time slot, a peer "talks to" a random neighbor and selects a piece to download, using a piece selection algorithm.
- This model is commonly used for studying "BT-like" or "gossip" protocols; the usual metrics are throughput, delay, ...
- Our work is also based on this model, but with streaming metrics: continuity, start-up latency, ...

Our recent work
1. "A simple model for analyzing p2p streaming protocols", ICNP 2007 (Y. P. Zhou, D. M. Chiu, J. Lui).

Peer streaming model
- The server "pushes" chunks to peers sequentially, a new chunk in each time slot.
- Each peer's buffer shifts ahead one position per time slot.
- In a P2P system, a peer may also download one chunk from another peer per time slot.
(Figure: the server filling a playback buffer at t = 1, 2, 3.)

Simple P2P model
- M homogeneous peers: same upload capacity, same playback buffer size, synchronized playback.
- In each time slot, the server "pushes" a chunk to one random peer.
- Continuity = p(n), where position n is the playback position.
- Without the P2P network: continuity = p(n) = 1/M.
- With the P2P network, compute p(i) recursively, where p(i) = Pr(buffer position i is filled).
(Figure: M peers, each with buffer positions 1, ..., n; the server fills position 1 of one random peer, so p(1) = 1/M.)

Sliding window
- Each peer's buffer is a sliding window; in each time slot, each peer downloads from a random neighbor.
- Let q(i) = Pr(Buf[i] gets filled during a slot). Then
    p(1) = 1/M
    p(i+1) = p(i) + q(i),  with  q(i) = w(i) * h(i) * s(i)
  where
    w(i) = Pr(the peer wants to fill Buf[i]),
    h(i) = Pr(the selected neighbor has the chunk for Buf[i]),
    s(i) = Pr(Buf[i] is selected), determined by the chunk selection strategy (the "P2P technology" effect).
(Figure: the buffer window at time t and t+1, with p(n) = ? at the playback end.)

Chunk selection strategies
- Greedy: try to fill the empty buffer position closest to playback.
- Rarest First (RF): try to fill the empty buffer position for the newest chunk; since p(i) is an increasing function of i, the newest chunk is also the rarest.
(Figure: a buffer map with RF selecting near position 1 and Greedy selecting near the playback position n.)

Chunk selection strategy (cont.)
- Greedy:
    s(i) = (1 - 1/M) * prod_{j>i} Pr[not(W(k,j)H(h,j))]
         = (1 - 1/M) * prod_{j>i} (p(j) + (1-p(j))^2)
         ≈ 1 - p(1) - p(n) + p(i+1)
- Rarest First:
    s(i) = (1 - 1/M) * prod_{j<i} (p(j) + (1-p(j))^2)
         ≈ 1 - p(i)
  (The approximation is sketched below.)
- Substituting w(i) = 1 - p(i) and h(i) = p(i) into p(i+1) = p(i) + w(i)h(i)s(i):
  - Lemma 1 (Greedy): p(i+1) = p(i) + (1-p(i)) p(i) (1 - p(1) - p(n) + p(i+1))
  - Lemma 2 (Rarest First): p(i+1) = p(i) + (1-p(i)) p(i) (1 - p(i))
- We also studied continuous (differential equation) forms of these difference equations.
(Figure: model vs. simulation.)
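Where do the closed forms come from? A sketch of the algebra behind the Greedy expression, as we read it from the slides: expand each factor using p + (1-p)^2 = 1 - p(1-p), keep first-order terms, and approximate w(j)h(j) = p(j)(1-p(j)) by q(j) = p(j+1) - p(j) so the sum telescopes.

```latex
\prod_{j>i}\bigl(p(j)+(1-p(j))^{2}\bigr)
  = \prod_{j>i}\bigl(1-p(j)(1-p(j))\bigr)
  \approx 1-\sum_{j=i+1}^{n-1} p(j)\bigl(1-p(j)\bigr)
  \approx 1-\sum_{j=i+1}^{n-1}\bigl(p(j+1)-p(j)\bigr)
  = 1-p(n)+p(i+1)
```

Multiplying by (1 - 1/M) = (1 - p(1)) and dropping second-order terms gives s(i) ≈ 1 - p(1) - p(n) + p(i+1); the Rarest First product over j < i telescopes the same way to s(i) ≈ 1 - p(i).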
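To evaluate the lemmas numerically, here is a minimal Python sketch (illustrative names, not the paper's code). The RF recursion is explicit; the Greedy recursion has p(i+1) on both sides, so each step is solved algebraically, and the unknown p(n) is found by bisection on the self-consistency condition.

```python
def continuity_rf(M, n):
    """Rarest First (Lemma 2): p(i+1) = p(i) + p(i)(1-p(i))^2."""
    p = 1.0 / M                       # p(1): the server seeds one random peer
    for _ in range(n - 1):
        p += p * (1 - p) ** 2
    return p                          # p(n) = playback continuity

def continuity_greedy(M, n, iters=60):
    """Greedy (Lemma 1): p(i+1) = p(i) + p(i)(1-p(i))(1 - p(1) - p(n) + p(i+1)).

    Rearranging each step gives p(i+1) = (p(i) + a*b) / (1 - a), with
    a = p(i)(1-p(i)) and b = 1 - p(1) - p(n).  Since p(n) is itself a
    parameter of the recursion, we find the self-consistent p(n) by
    bisection: the sweep result is decreasing in the p(n) guess.
    """
    p1 = 1.0 / M

    def sweep(pn):                    # run the recursion for a given p(n) guess
        p = p1
        for _ in range(n - 1):
            a = p * (1 - p)           # w(i) * h(i)
            p = min((p + a * (1 - p1 - pn)) / (1 - a), 1.0)  # clamp transients
        return p

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if sweep(mid) > mid else (lo, mid)
    return (lo + hi) / 2

# M = 1000, n = 40, as in the numerical results below.
print(continuity_rf(1000, 40), continuity_greedy(1000, 40))
```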
Which strategy is better?
- What does "better" mean?
  - Playback continuity: p(n) as large as possible.
  - Start-up latency: as small as possible.
- Given buffer size n and a large peer population M:
  1) Rarest First is more scalable, and is often better in continuity!
  2) Greedy is better in start-up latency.

Numerical result
- M = 1000, n = 40.
- In the simulation, each peer has 60 neighbors and uploads at most 2 chunks per time slot.

Mixed strategy
- Partition the buffer into [1, m] and [m+1, n].
- Use RF for [1, m] first; if no chunk is available for download by RF, use Greedy for [m+1, n].
- The difference equations become (see the sketch below):
    p(1) = 1/M
    p(i+1) = p(i) + p(i)(1-p(i))^2                             for i = 1, ..., m-1
    p(i+1) = p(i) + p(i)(1-p(i))(1 - p(m) - p(n) + p(i+1))     for i = m, ..., n-1
- Proposition: given peer population M, if the buffer length n is large enough, the Mixed strategy beats both Greedy and RF.
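The same machinery handles the Mixed recursion; a sketch under the same assumptions (illustrative name; p(m) is recorded when the sweep crosses the partition boundary, and the self-consistent p(n) is again found by bisection):

```python
def continuity_mixed(M, n, m, iters=60):
    """Mixed strategy: RF on positions [1, m], Greedy on [m+1, n]."""
    p1 = 1.0 / M

    def sweep(pn):
        p, pm = p1, None
        for i in range(1, n):                 # compute p(2), ..., p(n)
            a = p * (1 - p)                   # w(i) * h(i)
            if i < m:                         # RF part: explicit update
                p = p + a * (1 - p)
            else:                             # Greedy part: implicit update
                if pm is None:
                    pm = p                    # p(m), recorded at the boundary
                p = min((p + a * (1 - pm - pn)) / (1 - a), 1.0)
        return p

    lo, hi = 0.0, 1.0                         # bisect on sweep(pn) = pn
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if sweep(mid) > mid else (lo, mid)
    return (lo + hi) / 2
```

Sweeping m for given M and n, or adapting it online so that p(m) reaches a target such as 0.3 (as in the adaptation slide below), sets the RF/Greedy split point.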
Comparison
- Setting s(i) = 1 achieves the upper bound.
- For different buffer sizes, Mixed achieves better continuity than both RF and Greedy.
- Mixed has better start-up latency than RF.

Closer look with simulation
- Simulate 2000 time slots; continuity is averaged over all peers.
- Continuity for Mixed is the most consistent, as well as the highest.

Adapting m to an unknown population
- Adjust m so that p(m) achieves a target probability (e.g. 0.3).
- In the simulation study, 100 new peers arrive every 100 slots.
- m adapts to a larger value as the population increases.

Optimality
- Proposition: the Mixed strategy is asymptotically optimal.
- Idea: in the upper-bound case, s(i) = 1, so q(i) = p(i)(1-p(i)).
- Derive the continuity of the upper bound and of Mixed, in terms of M (population) and n (buffer), and show they are of the same order.

Extensions: unsynchronized playback
- This is still ongoing work. Three unsynchronized cases considered so far:
  1. Peers in one cluster have the same buffer length but different playback offsets.
  2. Peers in one cluster have the same playback offset but different buffer lengths.
  3. Peers are in different clusters, and there are lags among the clusters.

Case 1: same buffer length, different offsets
- Objective: each peer maximizes its own buffer overlap with its neighbors.
- All peers can achieve maximum overlap only when they have the same playback offset, and this is a Nash equilibrium.
(Figure: by moving its offset from g1 to g2, peer* increases its overlap from 20 to 22.)

Case 2: same playback offset, different buffer lengths
- If peers can change their buffer length at will, what is the best strategy?
- We prove (using the continuous model) that a cluster with a uniform buffer length has better average continuity.
- Peers that play back the same chunk at different time slots, according to the start-up algorithm, are equivalent to peers with different buffer lengths.
- Start-up algorithm: a peer does not start playback until it has downloaded N consecutive chunks.
- We ran a simulation to determine the peer populations of the different clusters (M = 1000, n = 40, N = 10); the resulting distribution resembles a normal distribution.
- Simulation results validate that a homogeneous buffer length gives better continuity; in these simulations, we apply the population distribution derived from the start-up algorithm to the heterogeneous case.

Case 3: different clusters with lags
- Can the lag among clusters improve performance?
- We consider the two-cluster case, C1 and C2, and compare the average continuity of the single-cluster case against the two-smaller-clusters case.
- Answer: yes. The two-cluster case performs better than the single cluster, and more, smaller clusters perform better still.

Unsynchronized playback: conclusions
- Given sufficient buffer size, it is better for peers in one cluster to play the video at the same time.
- It is better for peers in one cluster to choose the same buffer length.
- If there are too many peers, it is better to divide them into different clusters; the lag among clusters can improve performance.

Other directions
- ISP-friendly P2P algorithms (sensitive to ISP connectivity).
- P2P VoD: may or may not be based on streaming; the optimal replication problem (optimize what?).
- P2P traffic detection and traffic management.
- The economics of ISP and P2P conflicts.