A Measurement Study of a Peer-to-Peer Video-on-Demand System

Bin Cheng¹, Xuezheng Liu², Zheng Zhang² and Hai Jin¹
¹Huazhong University of Science and Technology
²Microsoft Research Asia

IPTPS 2007, Feb. 28, 2007

Motivation

VoD is every couch potato's dream
• Select anything, start at any time, jump to anywhere
Centralized VoD is costly
• Servers, bandwidth, content
P2P VoD is attractive, but challenging:
• Harder than streaming: no single stream; unpredictable, multiple "swarms"
• Harder than file downloading: a globally optimal (e.g. "rarest-first") policy is inapplicable
• VoD is a superset of file downloading and streaming

Main Contribution

Detailed measurement of a real, deployed P2P VoD system
• What do we measure? E.g., what does it mean for a system to deliver good UX?
• How far off are we from an ideal system?
• How do users behave?
• Etc.
Problems spotted
• There is a great tension between scalability and UX
• Network heterogeneity is an issue
Is P2P VoD a luxury that poor peers cannot afford?

Outline

Motivation
System background: GridCast
Measurement methodology
Evaluation
• Overall performance
• User behavior and user experience (UX)
Conclusions

GridCast Overview

Tracker server
• Indexes all joined peers
Source server
• Stores a copy of every video file
Web portal
• Provides the channel list
Peer
• Feeds data to the player
• Caches all fetched data of the current file
• Exchanges data with other peers
[Figure: system architecture; a joining peer gets the channel list from the web portal and an initial neighbor list from the tracker, then exchanges data with the source server and other peers]

One Overlay per Channel

Finding the partners (sketched below):
• Get an initial content-close peer set from the tracker when joining
• Periodically gossip with some near- and far-neighbors (every 30 s)
• Look up new near-neighbors via the current neighbors when seeking
• Refresh the tracker every 5 minutes

Scheduling (every 10 s)

• Feed data at the current position to the player
• Fetch the next 200 seconds from partners (if they have them)
• Fetch the next 10 seconds from the source server if no partners have them
• If the bandwidth budget allows, fetch the rarest anchor from the source server or partners
[Figure: timeline marking the current position, the next 10 seconds, and the next 200 seconds]
(A sketch of this round appears below.)

Anchor Prefetching

Anchors are used to improve seek latency (see the snapping sketch below):
• Each anchor is a segment of 10 seconds
• Anchors are 5 minutes apart
• The playhead is adjusted to the nearest anchor (if present)
[Figure: 10 s anchors spaced 5 minutes apart along the stream]
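The partner-finding rules above translate naturally into a small maintenance loop. Here is a minimal, illustrative Python sketch: the two timer values come from the slides, while the class, the injected `tracker_query`/`neighbor_query` callbacks, and the gossip fanout of 3 are our own placeholders, not GridCast's actual interfaces.

```python
import random
import time

GOSSIP_PERIOD = 30       # gossip every 30 s (from the slide)
TRACKER_REFRESH = 300    # refresh the tracker every 5 minutes (from the slide)

class PartnerFinder:
    """Neighbor maintenance for one peer in one channel overlay.
    tracker_query(position) and neighbor_query(peer, position) are
    injected callbacks returning sets of peer ids -- placeholders for
    the real tracker RPC and the peer-to-peer neighbor lookup."""

    def __init__(self, tracker_query, neighbor_query, position):
        self.tracker_query = tracker_query
        self.neighbor_query = neighbor_query
        # On join: initial content-close neighbor set from the tracker.
        self.neighbors = set(tracker_query(position))
        self.last_gossip = self.last_refresh = time.monotonic()

    def tick(self, position):
        now = time.monotonic()
        if now - self.last_gossip >= GOSSIP_PERIOD and self.neighbors:
            # Gossip: ask a few near-/far-neighbors for their neighbors
            # (fanout of 3 is an arbitrary choice here).
            for peer in random.sample(list(self.neighbors),
                                      min(3, len(self.neighbors))):
                self.neighbors |= self.neighbor_query(peer, position)
            self.last_gossip = now
        if now - self.last_refresh >= TRACKER_REFRESH:
            # Periodic check-in keeps the tracker's peer index current.
            self.neighbors |= set(self.tracker_query(position))
            self.last_refresh = now

    def on_seek(self, new_position):
        # After a seek, look up near-neighbors via the current neighbors.
        for peer in list(self.neighbors):
            self.neighbors |= self.neighbor_query(peer, new_position)
```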
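The 10-second scheduling round can likewise be sketched. Assumptions not in the talk: chunks are 1 second of video indexed by integer offset, and buffer maps are plain Python sets; the 200 s / 10 s windows and the partner-first, source-fallback, rarest-anchor ordering are from the slide.

```python
URGENT = 10    # seconds: fall back to the source server if partners lack them
WINDOW = 200   # seconds: prefetch horizon served from partners

def schedule_round(playhead, cache, partner_maps, anchors, budget):
    """Plan one round of fetches. cache and each value in partner_maps
    are sets of chunk indices (one chunk = 1 s of video). Returns
    (chunks to fetch from partners, chunks to fetch from the source)."""
    from_partners, from_source = [], []
    for t in range(playhead, playhead + WINDOW):
        if t in cache:
            continue
        holders = [p for p, have in partner_maps.items() if t in have]
        if holders:
            from_partners.append((t, holders[0]))  # partners first
        elif t < playhead + URGENT:
            from_source.append(t)  # urgent window: source server fallback
    if budget > 0:
        # Spare bandwidth: prefetch the rarest missing anchor, i.e. the
        # one held by the fewest partners (from the source if none has it).
        missing = [a for a in anchors if a not in cache]
        if missing:
            rarest = min(missing, key=lambda a: sum(
                a in have for have in partner_maps.values()))
            holders = [p for p, have in partner_maps.items() if rarest in have]
            if holders:
                from_partners.append((rarest, holders[0]))
            else:
                from_source.append(rarest)
    return from_partners, from_source
```

For example, with the playhead at 120 s, a cache covering seconds 0-119 and one partner holding 100-260, the round plans to fetch 120-259 from that partner, touches the source only for chunks in 120-129 that no partner has, and spends any leftover budget on the rarest missing anchor.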
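Finally, the anchor-snapping rule: anchors sit every 300 s and span 10 s each, and we read the slide's "(if present)" as "snap only when the anchor's segment is already cached", which is our assumption. A self-contained sketch:

```python
ANCHOR_SPACING = 300   # anchors are 5 minutes apart
ANCHOR_LEN = 10        # each anchor is a 10-second segment

def anchor_points(movie_len):
    """Starting offsets (in seconds) of all anchors in the movie."""
    return range(0, movie_len, ANCHOR_SPACING)

def snap_seek(target, cache, movie_len):
    """Adjust a seek target to the nearest anchor if that anchor's 10 s
    segment is cached; otherwise keep the user's exact target."""
    nearest = min(anchor_points(movie_len), key=lambda a: abs(a - target))
    if all(s in cache for s in range(nearest, nearest + ANCHOR_LEN)):
        return nearest
    return target

# A seek to 1460 s snaps to the cached anchor at 1500 s:
cache = set(range(1500, 1510))
assert snap_seek(1460, cache, movie_len=3600) == 1500
```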
DataSet Summary

Log duration: Sept. & Oct. 2006
Number of visited users: about 20,000
Percent of CERNET users: 98%
Percent of non-CERNET users: Netcom 1%, Unicom 0.6%, other 0.4%
Percent of NAT users: 22.8%
Maximum online users: more than 360
Number of sessions: about 250,000
Number of videos: about 1,200 channels
Average code rate: 500~600 kbps
Movie length: mostly about one hour
Total bytes from the source server: 11,420 GB
Total bytes played by peers: 15,083 GB

System Setup

GridCast has been deployed since May 2006
• The tracker server and the web server share one machine
• One source server with a 100 Mb/s link, 2 GB of memory and a 1 TB disk
Popularity keeps climbing; in Dec 2006:
• Users: 91K; sessions: 290K; total bytes from server: 22 TB
Peer logs are collected at the tracker (every 30 s)
• Latency, jitter, buffer map and anchor usage
• Sep-log and Oct-log are w/o and w/ anchor prefetching, respectively; switching is just a matter of choosing the code path as the peer joins
The source server keeps other statistics (e.g. total bytes served)

Strong Diurnal Pattern

Hot time vs. cold time
• Hot time (10:00~24:00)
• Cold time (0:00~10:00)
Two peaks
• After lunch time & before midnight
• Higher on weekends and holidays
[Figure: number of online peers over one week, Mon–Sun (Oct. 2006), peaking near 350]

Scalability

Ideal model: only the lead peer fetches from the source server
CS (client-server) model: all data comes from the source server
• GridCast significantly decreases the source server load (vs. the CS model), especially in hot time
• GridCast follows the ideal curve quite closely
• The number of active channels increases 3x from cold time to hot time: the long-tail effect!
[Figure: normalized load of the source server over a typical day (6:00 to 6:00) for the CS model, GridCast, and the ideal model]

Understand the Ceiling

Utilization = data from peers / total fetched data
• Calculated from the snapshots
For the ideal model, utilization = (n-1)/n
• n is the number of users in a session, i.e. the concurrency
GridCast achieves the ideal when n is large. Why does it fall short when n is small? (Next slide; the metric is sketched below.)
[Figure: utilization (%) vs. popularity for GridCast and the ideal model]
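To make the ceiling concrete: under the ideal model, exactly one of the n concurrent users (the lead peer) fetches each byte from the source and the rest is peer-served, so utilization is bounded by (n-1)/n. A tiny illustrative computation (function names are ours):

```python
def utilization(bytes_from_peers, bytes_total):
    """Fraction of all fetched data that came from peers."""
    return bytes_from_peers / bytes_total if bytes_total else 0.0

def ideal_utilization(n):
    """Ideal model: only the lead peer fetches from the source, so the
    other n - 1 of n concurrent users get everything from peers."""
    return (n - 1) / n

# The ceiling rises quickly with concurrency: 50% at n = 2, 90% at n = 10.
for n in (2, 5, 10, 20):
    print(f"n = {n:2d}: ideal utilization {ideal_utilization(n):.0%}")
```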
Why Do We Fall Short (When n Is Small)?

The peer cannot get the content if:
• It is only available from the server (missing content); caused by random seeks
• It exists only on disconnected peers; caused by NAT
• Its partners do not have enough bandwidth
Missing content dominates for the unpopular files
[Figure: GridCast utilization (%) vs. popularity, broken down into missing content, NAT, and limited bandwidth]

UX: Latency

Startup latency: 70% < 5 s, 90% < 10 s
Seek latency: 70% < 3.5 s, 90% < 8 s
Seek latency is smaller because:
• Startup pays a 2-second delay to create TCP connections with the initial partners
• Short seeks hit cached data
[Figure: CDF (%) of startup and seek latency in seconds]

UX: Jitter

For sessions of up to 5 minutes, 72.3% have no jitter at all
For sessions of 40 minutes or more, 40.6% have no jitter
Average delayed data: 3~4%
[Figure: per session-duration bucket (0–5 through >40 minutes), the percentage of jitter-free sessions, the average delayed-data percentage, and the average number of delayed chunks]

Reasons for Bad UX

Network capacity:
• CERNET to CERNET: >100 KB/s
• Non-CERNET to non-CERNET: 20~50 KB/s
• CERNET to non-CERNET: 4~5 KB/s
Bad UX in the non-CERNET region might have prevented swarms from forming
[Figure: average startup and seek latency (seconds) for non-CERNET (60.2 / 52.6), CERNET (5.2 / 3.6) and campus (4.6 / 3.4) users]

Reasons for Bad UX (cont.)

Server stress and UX are inversely correlated
• Hot time -> lots of active channels -> long tail -> high server stress -> bad UX
• Most pronounced for movies at the tail (next slide)
[Figure: normalized server stress (bandwidth), unacceptable jitter and unacceptable seeking over 24 hours]

UX Correlation with Concurrency

Higher concurrency:
• Reduces both startup and seek latencies
• Reduces the amount of jitter
Hot-time UX gets close to that of cold time
[Figure: average latency vs. number of initial partners; unacceptable jitter and unacceptable seeking vs. popularity, hot time vs. cold time]

User Seek Behavior

Seek behavior (without anchors):
• BACKWARD : FORWARD ≈ 3:7
• Short seeks dominate (80% within 500 seconds)
[Figure: CDF (%) of seek distance in minutes, forward and backward]

Seek Behavior vs. Popularity

Fewer seeks in more popular channels
More popular channels usually have longer sessions
So: stop making bad movies
[Figure: average number of seeks and session duration (seconds) vs. popularity]

Benefit of Anchor Prefetching

Significant reduction of seek latency
• FORWARD seeks get more benefit (seeks < 1 s jump from 33% to 63%)
"Next-anchor first" is statistically optimal from any one peer's point of view
• "Rarest first" is globally optimal in reducing the load of the source server (about 30% of prefetched anchors go unused)
[Figure: CDF (%) of (a) forward and (b) backward seek latency, with and without anchors]

Conclusions

A few things are not new:
• Diurnal pattern; the looooooooong tail of content
A few things are new:
• Seeking behaviors (e.g. the 7:3 split of forward/backward seeks; 80% of seeks are short; etc.)
• The correlation of UX with source server stress and concurrency
A few things are good to know:
• Even moderate concurrency improves system utilization and UX
• Simple prefetching helps to improve seeking performance
A few things remain problematic:
• The looooooong tail
• Network heterogeneity
A lot remains to be done (and is being done):
• Multi-file caching and proactive replication

http://grid.hust.edu.cn/gridcast
http://www.gridcast.cn

Thank you! Q&A