A Measurement Study of a Peer-to-Peer
Video-on-Demand System
Bin Cheng¹, Xuezheng Liu², Zheng Zhang² and Hai Jin¹
¹Huazhong University of Science and Technology
²Microsoft Research Asia
IPTPS 2007, Feb. 28, 2007
Motivation
 VoD is every couch potato's dream
• Select anything, start at any time, jump to anywhere
 Centralized VoD is costly
• Servers, bandwidth, content
 P2P VoD is attractive, but challenging:
• Harder than streaming: no single stream; unpredictable, multiple “swarms”
• Harder than file downloading: globally optimal (e.g. "rarest first") policy inapplicable
• VoD is a superset of file downloading and streaming
Main Contribution
 Detailed measurement of a real, deployed P2P VoD system
• What do we measure?
 E.g., what does it mean for a system to deliver good UX?
• How far off are we from an ideal system?
• How do users behave?
• Etc. Etc…
 Problems spotted
• There is a great tension between scalability and UX
• Network heterogeneity is an issue
 Is P2P VoD a luxury that poor peers cannot afford?
Outline
 Motivation
 System background: GridCast
 Measurement methodology
 Evaluation
• Overall performance
• User behavior and user experience (UX)
 Conclusions
GridCast Overview
 Tracker server
• Indexes all joined peers
 Source server
• Stores a copy of every video file
 Web portal
• Provides the channel list
 Peer
• Feeds data to the player
• Caches all fetched data of the current file
• Exchanges data with others
[Architecture diagram: the web portal serves the channel list, the tracker hands each joining peer an initial neighbor list, and the source server backs every channel]
One Overlay per Channel
 Finding the partners (see the sketch below)
• Get the initial content-close set from the tracker when joining
• Periodically gossip with some near- and far-neighbors (every 30s)
• Look up new near-neighbors from the current neighbors when seeking
• Refresh the tracker every 5 minutes
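
A minimal sketch of this membership protocol in Python. All names here (Tracker, Peer, the size-5 close set) are illustrative assumptions, not GridCast's actual code:

```python
import random

class Tracker:
    """Indexes joined peers, keeping one overlay per channel."""
    def __init__(self):
        self.channels = {}                        # channel -> list of peers

    def join(self, peer, channel):
        members = self.channels.setdefault(channel, [])
        # return a small "content-close" set: peers nearest in playhead
        close = sorted(members,
                       key=lambda p: abs(p.position - peer.position))[:5]
        members.append(peer)
        return close

class Peer:
    def __init__(self, position):
        self.position = position                  # playhead, in seconds
        self.neighbors = set()

    def join(self, tracker, channel):
        self.neighbors = set(tracker.join(self, channel))

    def gossip(self):
        """Every 30 s: merge neighbor lists with one random neighbor."""
        if not self.neighbors:
            return
        other = random.choice(list(self.neighbors))
        self.neighbors |= other.neighbors - {self}
        other.neighbors |= self.neighbors - {other}

# usage: three peers join the same channel's overlay, then one gossips
tracker = Tracker()
peers = [Peer(pos) for pos in (0, 30, 600)]
for p in peers:
    p.join(tracker, "movie-42")
peers[1].gossip()
```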
Scheduling (every 10s)
 At each tick, relative to the current playhead position (a sketch follows):
• Feed data to the player
• Fetch the next 200 seconds from partners (if they have them)
• Fetch the next 10 seconds from the source server if no partners have them
• If the bandwidth budget allows, fetch the rarest anchor from the source server or partners
[Timeline: current position, then the next 10 seconds, then the next 200 seconds]
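
A minimal sketch of one scheduling round, assuming 10-second chunks and the 200 s / 10 s windows from the slide; `schedule_tick` and its arguments are hypothetical names:

```python
CHUNK = 10    # seconds of video per chunk (assumed granularity)
WINDOW = 200  # partner-fetch window ahead of the playhead, per the slide
URGENT = 10   # source-server fallback window, per the slide

def schedule_tick(playhead, have, partners, budget_ok, anchor_rarity=None):
    """One 10 s scheduling round. Returns a fetch plan of
    (where-to-fetch-from, chunk_offset) pairs.
    have: set of cached chunk offsets; partners: peer -> set of offsets."""
    plan = []
    for off in range(playhead, playhead + WINDOW, CHUNK):
        if off in have:
            continue
        holders = [p for p, chunks in partners.items() if off in chunks]
        if holders:
            plan.append((holders[0], off))        # prefer partners
        elif off < playhead + URGENT:
            plan.append(("source", off))          # urgent fallback to server
    if budget_ok and anchor_rarity:
        # spare bandwidth: prefetch the rarest (least-replicated) anchor
        rarest = min(anchor_rarity, key=anchor_rarity.get)
        plan.append(("source-or-partner", rarest))
    return plan

# usage: playhead at 100 s, one partner holding the 110-150 s chunks
print(schedule_tick(100, have={100},
                    partners={"peerA": {110, 120, 130, 140, 150}},
                    budget_ok=True, anchor_rarity={300: 2, 600: 1}))
```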
Anchor Prefetching
 Anchors are used to improve seek latency (snap logic sketched below)
• Each anchor is a 10-second segment
• Anchors are 5 minutes apart
• The playhead is adjusted to the nearest anchor (if present)
[Diagram: 10 s anchors spaced 5 minutes apart along the file]
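
A minimal sketch of the playhead adjustment, assuming a seek simply snaps to whichever cached anchor is closest; the names are illustrative:

```python
ANCHOR_SPACING = 300  # anchors every 5 minutes, per the slide
ANCHOR_LEN = 10       # each anchor is a 10-second segment

def snap_to_anchor(seek_target, cached_anchors):
    """Adjust a seek target to the nearest cached anchor, if any is cached.
    cached_anchors holds anchor start offsets (multiples of ANCHOR_SPACING)."""
    if not cached_anchors:
        return seek_target        # nothing cached: play from the raw target
    return min(cached_anchors, key=lambda a: abs(a - seek_target))

# usage: a seek to 460 s snaps to the cached anchor at 600 s
print(snap_to_anchor(460, {300, 600}))   # -> 600
```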
DataSet Summary
Log duration: Sept. & Oct. 2006
Number of distinct users: about 20,000
Percent of CERNET users: 98%
Percent of non-CERNET users: Netcom 1%, Unicom 0.6%, others 0.4%
Percent of NAT users: 22.8%
Peak concurrent online users: more than 360
Number of sessions: about 250,000
Number of videos: about 1,200 channels
Average code rate: 500~600 kbps
Movie length: mostly about one hour
Total bytes from the source server: 11,420 GB
Total bytes played by peers: 15,083 GB
System Setup
 GridCast has been deployed since May 2006
• The tracker server and the Web server share one machine
• One source server with a 100 Mbps link, 2 GB memory and a 1 TB disk
 Popularity keeps climbing; by Dec 2006:
• Users: 91K; sessions: 290K; total bytes from server: 22 TB
 Peer logs collected at the tracker (every 30s)
• Latency, jitter, buffer map and anchor usage
• Sep-log and Oct-log: w/o and w/ anchor prefetching, respectively
 Just a matter of switching the codepath as the peer joins
 The source server keeps other statistics (e.g. total bytes served)
Strong Diurnal Pattern
 Hot time vs. cold time
• Hot time (10:00~24:00)
• Cold time (0:00~10:00)
 Two peaks
• After lunch time & before midnight
• Higher at weekends or holidays
[Figure: number of online peers by date, Mon–Sun (Oct. 2006)]
Scalability
 Ideal model: only the lead peer fetches from the source server
 cs model: all data comes from the source server
 GridCast significantly decreases the source server load (vs. cs), especially in hot time, and follows the ideal curve quite closely
 # of active channels increases 3x from cold to hot – the long tail effect!
[Figure: normalized load of the source server in a typical day, 6:00 to 6:00; cs vs. GridCast vs. ideal]
Understand the Ceiling
 Utilization = data from peers / total fetched data
• Calculated from the snapshots
 For the ideal model, utilization = (n-1)/n
• n is the number of users in a session (the concurrency); only the lead peer fetches from the server, so the other n-1 peers' data comes from peers
 GridCast achieves the ideal when n is large; why does it fall short when n is small? (a quick numeric check below; the shortfall is explained on the next slide)
[Figure: utilization (%) vs. popularity, 2–22; ideal vs. GridCast]
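
A quick numeric check of the ideal bound; `ideal_utilization` is just an illustrative helper:

```python
# Ideal model: of n concurrent peers in a session, only the lead peer
# fetches each chunk from the source server, so the peer-served fraction
# of all fetched data approaches (n - 1) / n.
def ideal_utilization(n):
    return (n - 1) / n

for n in (2, 5, 10, 20):
    print(n, ideal_utilization(n))   # 0.5, 0.8, 0.9, 0.95
```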
Why do we fall short (when n is small)
 The peer cannot get the content from other peers if:
• It is only available from the server (missing content); caused by random seeks
• It exists only on disconnected peers; caused by NAT
• Its partners do not have enough bandwidth
 Missing content dominates for unpopular files
[Figure: utilization (%) vs. popularity, 2–8; GridCast vs. the missing-content, NAT, and limited-bandwidth bounds]
UX: latency
 Startup latency (70% < 5s, 90% < 10s)
 Seek latency (70% < 3.5s, 90% < 8s)
 Seek latency is smaller because:
• There is a 2-second delay to create TCP connections with initial partners at startup
• Short seeks hit cached data
[Figure: CDF (%) of startup and seek latency, 0–28 seconds]
UX: jitter
 For sessions of 0~5 minutes, 72.3% have no jitter at all
 For sessions longer than 40 minutes, 40.6% still have no jitter
 Avg. delayed data: 3~4%
[Figure: no-jitter percentage, average delayed-data percentage, and average number of delayed chunks vs. session duration, 0–5 through >40 minutes]
Reasons for Bad UX
 Network capacity
• CERNET to CERNET: >100 KB/s
• Non-CERNET to Non-CERNET: 20~50 KB/s
• CERNET to Non-CERNET: 4~5 KB/s
• Bad UX in the Non-CERNET region might have prevented swarms from forming
[Figure: average startup/seek latency (sec.) by network type; Non-CERNET: 60.2/52.6, CERNET: 5.2/3.6, Campus: 4.6/3.4]
Reasons for Bad UX (cont.)
 Server stress and UX are inversely correlated
• Hot time -> lots of active channels -> long tail -> high server stress -> bad UX
• Most pronounced for movies at the tail (next slide)
[Figure: normalized server stress (bandwidth), unacceptable jitter, and unacceptable seeking over the day, 7:00 through 5:00]
UX Correlation with Concurrency
 Higher concurrency:
• Reduces both startup and seek latencies
• Reduces the amount of jitter
 Hot-time UX gets close to that of cold time
[Figure left: average startup/seek latency (sec.) vs. initial partner number, 2–10. Figure right: unacceptable jitter/seeking percentage (%) vs. popularity, 0–10, in hot vs. cold time]
User Seek Behavior
 Seek behavior (without anchors)
• BACKWARD : FORWARD ≈ 3 : 7
• Short seeks dominate (80% within 500 seconds)
[Figure: CDF (%) of seek distance, -60 to +60 minutes; backward seeks negative, forward seeks positive]
Seek Behavior vs. Popularity
 Fewer seeks in more popular channels
 More popular channels usually have longer sessions
 So: stop making bad movies 
[Figure: average number of seeks and session duration (seconds) vs. popularity, 0–12, for file sessions]
Benefit of Anchor Prefetching
 Significant reduction of seek latency
• FORWARD seeks benefit more (seeks < 1s jump from 33% to 63%)
 "Next-anchor first" is statistically optimal from any one peer's point of view (the two policies are contrasted in the sketch below)
• "Rarest first" is globally optimal in reducing the load of the source server, but sees 30% of prefetched anchors go unused
[Figure: CDF (%) of (a) forward and (b) backward seek latency, 0–30 seconds, with vs. without anchors]
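
A hedged sketch contrasting the two prefetch policies named above; the function names and data structures are hypothetical:

```python
def next_anchor_first(playhead, missing_anchors):
    """Per-peer greedy policy: prefetch the first anchor after the
    playhead, since short forward seeks dominate."""
    ahead = [a for a in missing_anchors if a > playhead]
    return min(ahead) if ahead else None

def rarest_first(missing_anchors, replica_count):
    """Global policy: prefetch the least-replicated anchor to shed
    load from the source server."""
    candidates = {a: replica_count.get(a, 0) for a in missing_anchors}
    return min(candidates, key=candidates.get) if candidates else None

# usage: playhead at 400 s; anchors at 300/600/900 s missing locally
print(next_anchor_first(400, {300, 600, 900}))                    # -> 600
print(rarest_first({300, 600, 900}, {300: 5, 600: 2, 900: 1}))    # -> 900
```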
Conclusions
 A few things are not new:
• Diurnal pattern; the looooooooong tail of content
 A few things are new:
• Seeking behaviors (e.g. the 7:3 split of forward/backward seeks; 80% of seeks are short)
• The correlation of UX to source server stress and concurrency
 A few things are good to know:
• Even moderate concurrency improves system utilization and UX
• Simple prefetching helps to improve seeking performance
 A few things remain problematic
• The looooooong tail
• Network heterogeneity
 A lot remains to be done (and is being done)
• Multi-file caching and proactive replication
 http://grid.hust.edu.cn/gridcast
 http://www.gridcast.cn
Thank you!
Q&A