16 Application Measurement

Presentation by Bobin John
1st paper: Measurement, Modeling & Analysis of a Peer-to-Peer File-Sharing Workload (KaZaa paper)
KaZaa paper
 P2P file sharing is the dominant Internet workload
 This paper deals with KaZaa
 A 200-day trace is taken
 A model is developed
 Locality-awareness can improve KaZaa performance
KaZaa paper

Trace Methodology

KaZaa trace summary statistics

KaZaa “usernames” used to identify users
KaZaaLite … IPs used
KaZaa traffic is easy to distinguish by its KaZaa-specific HTTP headers
Auto-update transactions filtered out
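The trace filtering steps above can be sketched roughly as follows. This is an illustration only, not the paper's tooling: the record layout, the `x-kazaa-` header prefix, the `x-kazaa-username` header name, and the `is_auto_update` flag are all assumptions made for the example. The slide notes that IPs are used for KaZaaLite; falling back to the IP whenever no username header is present is this sketch's simplification.

```python
from dataclasses import dataclass, field


# Hypothetical record for one logged HTTP transaction; the field names are
# assumptions for this sketch, not the paper's trace format.
@dataclass
class HttpTransaction:
    client_ip: str
    headers: dict = field(default_factory=dict)
    is_auto_update: bool = False  # assumed flag set by an earlier parsing stage


KAZAA_HEADER_PREFIX = "x-kazaa-"  # assumed marker for KaZaa-specific headers


def is_kazaa(txn: HttpTransaction) -> bool:
    """True if any header name carries the assumed KaZaa-specific prefix."""
    return any(name.lower().startswith(KAZAA_HEADER_PREFIX) for name in txn.headers)


def user_key(txn: HttpTransaction) -> str:
    """Identify the user by username header when present, else by client IP."""
    for name, value in txn.headers.items():
        if name.lower() == "x-kazaa-username" and value:
            return f"user:{value}"
    return f"ip:{txn.client_ip}"


def filter_trace(transactions):
    """Keep KaZaa transactions and drop auto-update traffic."""
    return [t for t in transactions if is_kazaa(t) and not t.is_auto_update]
```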



KaZaa paper
 User Characteristics
 KaZaa users are patient
KaZaa paper

User Characteristics

Users slow down as they age

2 reasons: attrition & slowing down over time
KaZaa paper
 Client Activity
KaZaa paper
 Object Characteristics
 Diverse workload
KaZaa paper
 Object Characteristics
 Object Dynamics
 Clients fetch objects at most once
 Popularity of objects is often short-lived
 Most popular objects tend to be recently born objects
 Most requests are for old objects

KaZaa paper
 Object Characteristics
 NOT Zipf-like
Web access patterns follow the Zipf property
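One way to see why "fetch at most once" pulls a workload away from Zipf is a small simulation. The sketch below is not the paper's model: the parameter values are arbitrary assumptions, and a repeated request is simply skipped rather than redrawn. Every client draws requests from the same Zipf distribution, and with `fetch_once=True` each client can fetch a given object only one time.

```python
import random
from collections import Counter


def zipf_weights(n_objects: int, alpha: float = 1.0):
    """Unnormalized Zipf weights: the object of rank r gets weight 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, n_objects + 1)]


def simulate(n_clients, n_objects, requests_per_client, fetch_once, seed=0):
    """Count completed fetches per object index (0 = most popular)."""
    rng = random.Random(seed)
    weights = zipf_weights(n_objects)
    ranks = range(n_objects)
    counts = Counter()
    for _ in range(n_clients):
        seen = set()
        for obj in rng.choices(ranks, weights=weights, k=requests_per_client):
            if fetch_once and obj in seen:
                continue  # the client already holds this object, so no new fetch
            seen.add(obj)
            counts[obj] += 1
    return counts
```

With `fetch_once=True` the count for any object can never exceed the number of clients, so the head of the popularity curve flattens compared with the Web-like run.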
KaZaa paper
 Model
KaZaa paper
 Model for P2P file-sharing workloads
 Model Description
KaZaa paper
 Model for P2P
 File-sharing effectiveness diminishes with client age
KaZaa paper
 Model for P2P
 New object arrivals improve performance
KaZaa paper
 Model for P2P
 New clients cannot stabilize performance
KaZaa paper
 Model for P2P
 Model validation
KaZaa paper
 New idea!
 How to reduce bandwidth cost?
 Use a proxy cache
 Legal & political problems
 Locality-aware request routing
 Centralized request redirection
 redirector
 Decentralized request redirection
 supernodes
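The centralized variant can be pictured with a toy redirector. This is a minimal sketch under stated assumptions, not the paper's design: the `Redirector` class, its method names, and the `"external"` sentinel are all hypothetical. A request is served from inside the organization when some available internal peer already holds the object, and goes to the outside network otherwise.

```python
from collections import defaultdict


class Redirector:
    """Toy centralized redirector that prefers peers inside the organization."""

    def __init__(self):
        # object id -> set of internal peers known to hold a copy
        self._internal_holders = defaultdict(set)

    def register(self, peer: str, obj: str) -> None:
        """Record that an internal peer now holds obj."""
        self._internal_holders[obj].add(peer)

    def route(self, requester: str, obj: str, available: set) -> str:
        """Return an internal peer to fetch from, or "external" on a miss."""
        candidates = (self._internal_holders[obj] & available) - {requester}
        if candidates:
            return sorted(candidates)[0]  # deterministic pick for the sketch
        return "external"
```

Passing the set of currently available peers into `route` makes the availability question explicit: an object held only by offline peers still counts as a miss.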
KaZaa paper
 Locality awareness
 Methodology
 Benefits
KaZaa paper
 Locality awareness
 Accounting for Hits & Misses
KaZaa paper
 Locality awareness
 Availability
KaZaa paper

Conclusion




KaZaa workload is different
Does not follow Zipf
Can be improved with locality awareness
Drawbacks



A trace from a university ought not to be generalized to all KaZaa/P2P applications
Further implementation details of locality awareness?
Scope of use for such a locality awareness tool?

I don’t think universities would like this
2nd paper: An analysis of Internet Chat systems
Chat paper
 Why is chat a worthwhile target for traffic characterization?
 Chat offers computer-mediated communication
 Used by a large number of people … potential of being habit-forming
Chat paper
 Different types of chat systems:
 Internet Relay Chat [IRC]
 Web-based chat systems
 ICQ & AIM
 Gale
Chat paper
 Problems in analyzing chat traffic
 Multitude & diversity of systems & protocols
 Chat protocol realized on top of HTTP protocol … difficult to separate chat traffic
 Resource limitations due to filtering demands
Chat paper
 IRC
 Set of connected servers
 Client connection requests on port 6667
 Unique nicknames
 Discussion channels
 Channel operators
 Medium to share data
 IRC operator
Chat paper
 Web-chat
 Not tty-based … Web browser interface
 A single server to connect to
 3 classes of chat systems:
 HTML-Web-Chat
 Applet-Web-Chat
 Applet-IRC-Chat
 Difference between IRC & Web-chat is only “social”
Chat paper

Identifying IRC chat traffic






Packet monitor that captures all TCP traffic involving port 6667
Can only capture text & control messages
Data/file transfers cannot be captured as they run on other TCP connections
IRC’s packet size distribution is mainly dominated by small packets
IRC session should last more than a few minutes
IRC sends keep-alive messages
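The IRC indicators above could be combined into a simple per-flow check. The sketch below is only an illustration: port 6667 comes from the slide, but the `Flow` record and every numeric threshold are assumptions chosen for the example, not values from the paper.

```python
from dataclasses import dataclass

IRC_PORT = 6667               # port named on the slide
MIN_DURATION_S = 180          # assumed reading of "more than a few minutes"
SMALL_PACKET_BYTES = 200      # assumed cutoff for a "small" packet
MIN_SMALL_PACKET_SHARE = 0.8  # assumed share that counts as "dominated"


@dataclass
class Flow:
    """Hypothetical summary of one TCP connection."""
    src_port: int
    dst_port: int
    duration_s: float
    packet_sizes: list


def looks_like_irc(flow: Flow) -> bool:
    """Apply the port, duration, and packet-size indicators in turn."""
    if IRC_PORT not in (flow.src_port, flow.dst_port):
        return False
    if flow.duration_s <= MIN_DURATION_S or not flow.packet_sizes:
        return False
    small = sum(1 for size in flow.packet_sizes if size <= SMALL_PACKET_BYTES)
    return small / len(flow.packet_sizes) >= MIN_SMALL_PACKET_SHARE
```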
Chat paper
 Identifying Web-chat traffic
 HTML-Web-chat:
 Appropriate cache-control headers
 Adding state information
 Cache-Control: Must-revalidate & Cache-Control: Private indicate non-chat traffic
 Use of scripting languages, e.g., JavaScript
 Use of applet windows, e.g., Java
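The Cache-Control rule above can be turned into a small header check. This is a sketch under assumptions: the function names are hypothetical, and the parser is deliberately naive (it splits on commas and ignores quoted directive values).

```python
# Directives that, per the slide, indicate non-chat traffic.
NON_CHAT_DIRECTIVES = {"must-revalidate", "private"}


def cache_control_directives(header_value: str) -> set:
    """Split a Cache-Control value into lowercase directive names."""
    names = set()
    for part in header_value.split(","):
        name = part.split("=", 1)[0].strip().lower()
        if name:
            names.add(name)
    return names


def ruled_out_by_cache_control(header_value: str) -> bool:
    """True if the response carries a directive that marks it as non-chat."""
    return bool(cache_control_directives(header_value) & NON_CHAT_DIRECTIVES)
```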

Chat paper
 Identifying Web-chat traffic
 Applet-Web-chat:
 User would have accessed a Java file or a script or even a page like “xxxchatyyy” … “chat” could occur even in the path
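A URL-based hint of this kind might look like the sketch below. The function names and the list of file suffixes are assumptions for illustration; the only idea taken from the slide is that "chat" may appear in the host name or in the path, and that an applet or script file was fetched.

```python
from urllib.parse import urlsplit

# Assumed file types for Java applets and scripts.
APPLET_SUFFIXES = (".class", ".jar", ".js")


def url_hints_at_chat(url: str) -> bool:
    """True if "chat" occurs in the host name or in the path of the URL."""
    parts = urlsplit(url.lower())
    return "chat" in parts.netloc or "chat" in parts.path


def is_applet_or_script(url: str) -> bool:
    """True if the requested path ends in one of the assumed suffixes."""
    return urlsplit(url.lower()).path.endswith(APPLET_SUFFIXES)
```

The query string is left out of the match on purpose in this sketch, since the slide only mentions the page name and the path.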
Chat paper
 Overall strategy for extracting chat traffic
Chat paper
 Overall strategy for extracting chat traffic
 Repeat this process:
 Identify traffic that cannot be chat traffic
 Remove it
 Steps that filter out more non-chat traffic have to be implemented earlier
 Other steps that need more processing or pre-processing should be implemented later
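The ordering rule above can be sketched as a staged pipeline. Ranking filters by removed fraction divided by cost is an assumption of this example, one possible way to combine the two criteria on the slide; the tuple layout and function names are hypothetical as well.

```python
def order_filters(filters):
    """Sort filters so that high-yield, cheap ones run first.

    Each filter is a (name, removed_fraction, cost_per_item, predicate) tuple;
    the two numbers are assumed to come from a pilot run on sample traffic.
    """
    return sorted(filters, key=lambda f: f[1] / f[2], reverse=True)


def extract_candidates(items, filters):
    """Repeatedly remove items that some filter shows cannot be chat."""
    remaining = list(items)
    for _name, _removed, _cost, cannot_be_chat in order_filters(filters):
        remaining = [item for item in remaining if not cannot_be_chat(item)]
    return remaining
```

Because each stage only sees what earlier stages left behind, the expensive predicates run on a much smaller input.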
Chat paper
 Overall strategy for extracting chat traffic
 Eliminate traffic on ports < 1024 except port 80
 Also eliminate traffic on well-known application ports (e.g., Gnutella - 6346)
 Group packets into flows
 Mark & filter them according to the previous table
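The port elimination and flow grouping steps could look roughly like this. It is a sketch with assumed data shapes: packets are plain tuples, a flow is keyed by its two endpoints regardless of direction, and the only excluded application port is the Gnutella example from the slide.

```python
from collections import defaultdict

EXCLUDED_APP_PORTS = {6346}  # Gnutella, the example given on the slide


def port_may_carry_chat(port: int) -> bool:
    """Port 80 is kept; other ports below 1024 and excluded ports are not."""
    if port == 80:
        return True
    return port >= 1024 and port not in EXCLUDED_APP_PORTS


def group_into_flows(packets):
    """Group (src_ip, src_port, dst_ip, dst_port, size) tuples into flows.

    A packet is dropped if either of its ports is ruled out. Both directions
    of a connection map to the same key, the sorted pair of endpoints.
    """
    flows = defaultdict(list)
    for src_ip, src_port, dst_ip, dst_port, size in packets:
        if not (port_may_carry_chat(src_port) and port_may_carry_chat(dst_port)):
            continue
        key = tuple(sorted([(src_ip, src_port), (dst_ip, dst_port)]))
        flows[key].append(size)
    return dict(flows)
```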
Chat paper
 Experiment
 At University of Saarland
 Resource partitioning
 Traces were generated after filtering
 950GB > 1.2GB > 238MB (WEBCHAT1)
 192MB (IRC1)
 350MB (WEBCHAT2)
Chat paper
 Validation
 2 aspects:
 Recall – ability of a system to present all relevant items
 Precision – ability of a system to present only relevant items
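In terms of counts, the two measures are usually computed as below. The function and argument names are this sketch's own, and the numbers in the usage lines are made-up illustrations, not results from the paper.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of all real chat connections that the filter found."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives: int, false_positives: int) -> float:
    """Share of the connections flagged as chat that really are chat."""
    return true_positives / (true_positives + false_positives)


# Hypothetical counts: 90 chat connections found, 10 missed, 30 false alarms.
example_recall = recall(90, 10)
example_precision = precision(90, 30)
```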

Chat paper
 Validation
 Lots of calculations
“we can expect to locate about 91.7% of all real chat connections and that we expect that at least 93.1% of all connections we identify are indeed chat connections.”
Chat paper
 Results
 Session durations
Chat paper
 Results
 Interarrival times of sessions
Chat paper
 Results
 Packet sizes
Chat paper
 Results
 Sent & Received bytes
Chat paper
 Conclusion
 Chat traffic was successfully filtered out
 Accuracy was above 90%
 Drawbacks
 Use of this work?