The Medusa Proxy - Defense - University of California San Diego

Whole Page Performance
Leeann Bent and Geoffrey M. Voelker
University of California, San Diego
Whole Page Performance?

Extensive previous work on how specific techniques affect individual object download.
  Caching, prefetching, CDNs, DNS caching.
However, users download pages of objects.
  Not clear how individual object performance maps onto whole page performance.
Goal: Study whole page performance.
  Extent to which different optimizations are used.
  Effect on downloading whole pages of objects.
August 14, 2002
WWCCD ‘02
Related Work

[Krishnamurthy and Wills99] look at:
  Parallel (HTTP/1.0), persistent, and pipelined connections.
    » In addition to caching, range requests, and content placed on different servers.
  Top-level pages of popular sites.
  Focus on pages where all optimizations used.
Our Study:
  Follow on, with a different perspective.
  Use real user workloads.
    » All pages, not just top-level pages on popular servers.
    » Not all pages use optimizations.
  Base page + embedded objects.
  Connection optimizations + CDNs + DNS.
Overview




Introduction
Methodology
Results
Conclusion
Methodology Overview

Use Medusa to:
  Record everyday browsing from six users over four days.
  Replay traces toggling performance options:
    » Parallel connections
    » Using CDNs
    » Complete DNS caching
    » Persistent connections
  Compute download costs for whole pages.
The Medusa Proxy
[Figure panels: User Driven Behavior; Trace Driven Behavior]
Page Download Time

Page download time
  Time required to download base page and all embedded objects.
  Reflects user-perceived web performance.
Calculated using object download time.
  Determine object download time from just after DNS lookup to connection close or full object return (persistent).
  Incorporate original recorded DNS times where appropriate.
Example
Individual Object Times:

                     Obj1  Obj2  Obj3  Obj4
Download Time (ms)    155   205   102   253
DNS (ms)               90    40     5     4

Page Download Times:

Serial              854 ms
Parallel (2 conns)  580 ms

[Figure: download timelines for the two cases; the timeline also carries a 259 ms label.]
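The serial figure on the Example slide can be checked by summing each object's download and DNS time. The sketch below does that, and also adds a hypothetical greedy two-connection model purely for illustration. The function names and the greedy assignment rule are assumptions, not Medusa's replay logic; the greedy model gives 502 ms, so it does not reproduce the slide's 580 ms parallel figure.

```python
# Object timings from the Example slide: (download ms, DNS ms) per object.
OBJECTS = {
    "Obj1": (155, 90),
    "Obj2": (205, 40),
    "Obj3": (102, 5),
    "Obj4": (253, 4),
}


def serial_page_time(objects, include_dns=True):
    """Sum the per-object costs, as if objects were fetched one after another."""
    total = 0
    for download_ms, dns_ms in objects.values():
        total += download_ms + (dns_ms if include_dns else 0)
    return total


def greedy_parallel_page_time(objects, connections=2, include_dns=True):
    """Hypothetical model: each object goes to the connection that frees up first.

    This is an illustrative assumption, not the scheduling Medusa replays.
    """
    finish = [0] * connections
    for download_ms, dns_ms in objects.values():
        cost = download_ms + (dns_ms if include_dns else 0)
        idx = finish.index(min(finish))  # earliest-free connection
        finish[idx] += cost
    return max(finish)


serial_ms = serial_page_time(OBJECTS)                        # normal DNS
ideal_dns_ms = serial_page_time(OBJECTS, include_dns=False)  # DNS time excluded
parallel_ms = greedy_parallel_page_time(OBJECTS)
print(serial_ms, ideal_dns_ms, parallel_ms)
```

Dropping the DNS column corresponds to the "ideal DNS caching" setting, where DNS time is excluded from the page cost.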
Traces


Six users: April 27 - 30 (Sat. - Tues.).
Originally 22,228 objects and 1,455 pages.
  Remove error pages.
Replay data gathered May 6-7 (Mon. - Tues.) & June 22-27 (Sat. - Thurs.).
  Minimize warming effects by taking median of 5 consecutive page downloads.

Users  Requests  Pages  Ave Requests per Page
6      13747     920    15.0
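Taking the median of repeated replays keeps one slow, cold-cache run from dominating a page's measured time. A minimal sketch follows; the function name and the sample timings are invented for illustration and are not trace data.

```python
from statistics import median


def representative_time(replay_times_ms):
    """Median of repeated replays of one page (the slides use 5 consecutive runs)."""
    if not replay_times_ms:
        raise ValueError("need at least one replay measurement")
    return median(replay_times_ms)


# Hypothetical measurements: the first run is slow because caches are cold.
runs_ms = [1900, 640, 655, 630, 648]
typical_ms = representative_time(runs_ms)
print(typical_ms)
```

A mean over the same five runs would be pulled up by the 1900 ms outlier, which is the warming effect the median is meant to suppress.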
Optimization Combinations

Parallel Connections (1)
  Medusa tracks number of concurrent connections used during trace.
  Used to replay parallel download.
CDN Usage (2)
  When no CDN usage, remove CDN references.
    » Replace with references to origin servers.
  When CDN usage enabled, traces left intact.
DNS Caching (3)
  Simulate ideal DNS caching by excluding DNS time.
  Normal DNS: add original DNS lookup times from trace.
Persistent Connections (4)
  Use whichever protocol (1.0/1.1) recorded in original trace.
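With four on/off options there are at most sixteen replay configurations. The sketch below enumerates them. The option names are made up for this example, and the slides do not say that every combination was actually replayed.

```python
from itertools import product

# Hypothetical flag names for the four options numbered on the slide.
OPTIONS = ("parallel", "cdn", "ideal_dns", "persistent")


def all_combinations(options=OPTIONS):
    """Yield one dict per on/off assignment of the given options."""
    for flags in product((False, True), repeat=len(options)):
        yield dict(zip(options, flags))


configs = list(all_combinations())
print(len(configs))
print(configs[0])
print(configs[-1])
```

The first configuration (everything off) is the serial baseline; the last has every option enabled.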
Overview




Introduction
Methodology
Results
Conclusion
Whole Page Optimizations
Parallel gives large improvement.
CDN improvement small: 2.5%.
DNS improvement consistent: 7.4%, 6.7%.
Persistent connections not as helpful as expected: 1.5%.
Overall Trace Conclusions

Parallelism has the greatest effect.
  Parallelism used aggressively on all pages.
All other options provide incremental benefits.
  Does not mean other optimizations don’t work.
  Some overheads may be relatively small.
  Average over all pages.
    » Not all pages implement all optimizations.
    » We don’t simulate more aggressive use of options than found in original trace.
A closer look…
Ideal DNS Caching

Average DNS costs:
  Per object: 7.1 ms
  Per page: 529 ms
DNS improvement moderate across the board.
  5 – 14% improvement across all pages.
Provides moderate benefit to all pages.
  Not all objects require full DNS lookups.
  Already effective DNS caching in traces.
Objects Per Page

We would expect some other optimizations to have a greater effect (e.g. persistent connections).
  Looking at all pages in trace doesn’t tell the whole story.
Less opportunity for connection optimizations on small pages.
  Page with one object counts as much as a page with 152 objects.
  Optimizations more effective on a page with 152 objects.
Separate out effects of optimizations in pages with different numbers of objects:
  Median number of objects per page is 5.
  Average number of objects per page is 15.
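Separating pages by object count can be sketched as a simple bucketing step, using the 1, 2-5, 6-15, and 16+ groups that appear in the page breakdown. The helper names and the sample page sizes are illustrative assumptions, not trace data.

```python
from statistics import mean, median

BUCKETS = ("1", "2-5", "6-15", "16+")


def bucket_for(object_count):
    """Map a page's object count to one of the breakdown groups."""
    if object_count < 1:
        raise ValueError("a page has at least one object")
    if object_count == 1:
        return "1"
    if object_count <= 5:
        return "2-5"
    if object_count <= 15:
        return "6-15"
    return "16+"


def group_pages(object_counts):
    """Group page sizes by bucket, preserving input order within each group."""
    groups = {name: [] for name in BUCKETS}
    for count in object_counts:
        groups[bucket_for(count)].append(count)
    return groups


# Hypothetical page sizes: a few large pages pull the mean far above the median.
sizes = [1, 1, 3, 5, 5, 8, 20, 152]
groups = group_pages(sizes)
print(groups)
print(median(sizes), mean(sizes))
```

The gap between median and mean in the sample mirrors the skew on the slide, where the median page has 5 objects and the average page has 15.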
Page Breakdown
• 1-5 objects
• 1: 21%
• 2-5: 63%
• 6+ objects
improvements.
• 6-15: 157%
• 16+: 183%
• Persistent
• 1.95%
• 18.5%
Page Breakdown Conclusions

Performance optimizations dependent on number of objects per page.
  Optimizations more effective when more objects per page.
  Especially connection optimizations.
Single object pages see moderate improvement.
  Can usually only benefit from DNS caching and CDNs.
  Persistent benefit only if on same server as previous page.
  And 26% of pages had one object.
Persistent Connections

Still don’t see a whole lot of improvement for persistent connections.
  Expected to see more benefit for 16+ objects.
Not all pages use persistent connections.
  20% of pages in our trace use them (229 pages).
    » 2211 objects or 16.1%.
    » 9.65 objects per page.
Look at only pages that contain persistent connections.
Persistent Connections

Persistent connections useful if:
  Many objects downloaded over persistent connections in the original trace.
  Objects downloaded from few servers.
For pages < 6 objects:
  2 out of 3 downloaded with persistent connections.
    » Average page size 3.
  On average, 1.32 persistent objects per server.
For pages >= 16 objects:
  Average 18 objects with persistent connections.
  On average, 3.92 persistent objects per server.
Mostly Persistent Pages
• Know what it takes to see persistent optimization improvement:
• Look at large pages where persistent connections used extensively (>50% of objects).

Objects per Page  Pages (% persistent pages)  Method      Mean (ms)  Improvement (%)
6-15              14 (56%)                    serial      4000
                                              persistent  2680       49.3%
16+               45 (42%)                    serial      6180
                                              persistent  4660       32.6%

Pages that can benefit, do:
  6+ objects improve 33-50%.
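The slides do not state how the improvement percentages are computed. The tabulated means are consistent with measuring the time saved against the optimized (persistent) time rather than against the serial baseline. The sketch below is that inference from the numbers, not a documented formula, and the function name is an assumption.

```python
def improvement_pct(baseline_ms, optimized_ms):
    """Percent improvement measured against the optimized time (inferred convention)."""
    if optimized_ms <= 0:
        raise ValueError("optimized time must be positive")
    return 100.0 * (baseline_ms - optimized_ms) / optimized_ms


# Mean page times from the table: serial vs. persistent.
mid_pages = round(improvement_pct(4000, 2680), 1)    # pages with 6-15 objects
large_pages = round(improvement_pct(6180, 4660), 1)  # pages with 16+ objects
print(mid_pages, large_pages)
```

Under this convention a result above 100% is possible: it means the baseline took more than twice as long as the optimized replay.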
CDN

Previous study showed CDNs highly effective for individual objects. [Koletsou01]
  What is effect on whole page performance?
Few pages with explicit Akamai-hosted objects.
  48 pages or 5.2% of pages.
  216 objects or 1.6% of total downloaded objects.
  Average of 4.5 CDN objects per page.
Looked at CDN only page improvements:
  CDNs improve CDN containing pages 6% - 30%.
Conclusions

Parallel connections have greatest impact.
  Universally applicable and easy to implement.
Other options give incremental performance across all pages.
  Some optimizations provide consistent, but moderate, improvement across all pages.
  Some optimizations are not implemented on all pages.
    » Provide benefit when used extensively.
Conclusions

Can we draw correlation between object and real-world whole page performance?
  Depends.
  Not all optimizations widely used.
  When optimizations are used to full advantage, they are effective.
Medusa Available
http://ramp.ucsd.edu/~lbent/Medusa/index.html
The End
Medusa Proxy Functionality

Trace and Replay
  Record requests and replay.
Transformation
  CDN/no CDN replay.
Performance Measurement
  Parallel connections.
  Persistent connections.
  Request latency.
  DNS overhead.
Optimization options
  Use parallel connections.
  Use persistent connections.
    » HTTP 1.0 and HTTP 1.1.
    » Always attempt, never attempt, mirror trace attempt.
Page Delimitation

Determining pages:
  Necessary for:
    » Calculating total page costs.
    » Limiting optimizations to within one page.
  Parallel Connections.
  Can analyze page and draw object dependencies.
    » High overhead
    » May impact user
  Use inter-object times in the original trace data.
    Use 2 second inter-object times.
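Splitting a trace into pages by inter-object time can be sketched as starting a new page whenever the gap between consecutive requests exceeds the 2 second threshold. The function name, the strict "greater than" comparison, and the sample timestamps are assumptions made for illustration.

```python
PAGE_GAP_SECONDS = 2.0  # threshold named on the slide


def split_into_pages(request_times, gap=PAGE_GAP_SECONDS):
    """Group request timestamps (seconds) into pages separated by long gaps."""
    pages = []
    current = []
    previous = None
    for t in sorted(request_times):
        # A gap longer than the threshold closes the current page.
        if previous is not None and t - previous > gap:
            pages.append(current)
            current = []
        current.append(t)
        previous = t
    if current:
        pages.append(current)
    return pages


# Hypothetical request timestamps for one user session.
stamps = [0.0, 0.3, 0.9, 1.4, 6.0, 6.2, 12.5]
pages = split_into_pages(stamps)
print(pages)
```

This avoids parsing pages for object dependencies, at the cost of occasionally merging two pages that a user requests in quick succession.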
Akamaized URLs



Akamai accounts for 85%-98% of CDN hosted objects [ref].
Will not account for sites completely hosted on Akamai hosts.
Filter:
  http://a1964.g.akamai.net/f/1964/2730/1h/app.whenu.com/image.gif
  becomes
  http://app.whenu.com/image.gif
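The filter on this slide can be approximated with a regular expression that drops the Akamai host and its leading path segments. The pattern below is inferred from the single example URL and assumes exactly four segments precede the origin host; the function name is hypothetical and this is not Medusa's actual filter.

```python
import re

# Pattern inferred from the one example on the slide (an assumption).
AKAMAI_URL = re.compile(
    r"^http://[^/]+\.akamai\.net"  # Akamai edge host
    r"/[^/]+/[^/]+/[^/]+/[^/]+"    # four leading path segments (assumed)
    r"/(?P<origin>.+)$"            # origin host followed by the object path
)


def strip_akamai(url):
    """Rewrite an Akamaized URL to its origin-server form; leave others alone."""
    match = AKAMAI_URL.match(url)
    if match is None:
        return url
    return "http://" + match.group("origin")


akamaized = "http://a1964.g.akamai.net/f/1964/2730/1h/app.whenu.com/image.gif"
print(strip_akamai(akamaized))
```

URLs that do not match the pattern pass through unchanged, which matches the caveat that sites hosted entirely on Akamai hosts are not detected.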
Interleaved Requests

Requests may get interleaved when recorded in parallel mode and replayed in serial mode.
  E.g.
    » Connection 0 requests: www.cnn.com, www.cnn.com/style.css.
    » Connection 1 requests: ar.atwola.com.
  Requests may be ordered in trace as:
    » www.cnn.com, ar.atwola.com, www.cnn.com/style.css.
Negates benefit of parallel connections.
Page Characterization: Objects per Page
Object Types

Identified object type by clues in URL:
  80% of URLs are images (.gif, .jpg).
  5.6% HTML files (.htm, .html).
  3.8% CGI, Perl, or Java (?, .pl, .class).
  3.3% JavaScript (.js).
  3.6% unidentified (no suffix, pdf, txt, etc.).
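Classifying objects by URL clues can be sketched as a suffix lookup plus a check for a query marker. The category names and the order of checks are assumptions for illustration; only the suffixes themselves come from the slide.

```python
from urllib.parse import urlsplit

# Suffix clues listed on the slide, mapped to illustrative category names.
SUFFIX_TYPES = {
    ".gif": "image",
    ".jpg": "image",
    ".htm": "html",
    ".html": "html",
    ".pl": "dynamic",
    ".class": "dynamic",
    ".js": "javascript",
}


def classify(url):
    """Guess an object's type from its URL alone."""
    if "?" in url:
        return "dynamic"  # a query marker suggests generated content
    path = urlsplit(url).path.lower()
    for suffix, kind in SUFFIX_TYPES.items():
        if path.endswith(suffix):
            return kind
    return "unidentified"


print(classify("http://app.whenu.com/image.gif"))
print(classify("http://www.cnn.com/style.css"))
```

Anything without a recognized suffix, including bare host names, falls into the unidentified group.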
Persistent Connection/Browser

Persistent connections appear correlated with browser:
  IE - 12% pgs, 15.8% objs.
  Netscape - 19.5% pgs, 10.0% objs.
  Omniweb - 66.0% pgs, 72.4% objs.
  Mozilla 5.0/Gecko - 95.8% pgs, 91.3% objs.
Persistent Connection Pages
Optimizations                       Average Improvement (%)  Median Improvement (%)
Serial                              7.28%                    -3.5%
Parallel Connections                24.03%                   7.5%
Parallel Connections with DNS, CDN  12.5%                    0.6%

Still not as improved as expected:
  Better than for only large pages:
    » Serial 7.28% vs. 1.98%
    » Parallel 24.03% vs. 18.5%
  Medians don’t show improvements in all cases.
Mostly Persistent Pages
Objects per Page  Pages (% persistent pages)  Method               Mean (ms)  Improvement (%)
6-15              14 (56%)                    serial               4000
                                              persistent           2680       49.3%
                                              parallel             1567
                                              persistent/parallel  1414       10.1%
16+               45 (42%)                    serial               6180
                                              persistent           4660       32.6%
                                              parallel             2524
                                              persistent/parallel  1669       51.2%
Persistent Connections per Page
Same as previous 16+
Ad-Servers

Identified as hosts whose names contain the phrases “ads” or “adserver”.
  YES: http://rmads.msn.com/images_47144_date_0429_50.jpg.
  NO: http://graphics4.nytimes.com/ads/scottrade_sov.gif.
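The host-name test can be sketched as a substring check on the URL's host only, so that "ads" appearing in the path does not count. The function name is hypothetical and plain substring matching is an assumption about how the phrases were applied.

```python
from urllib.parse import urlsplit

AD_MARKERS = ("ads", "adserver")  # phrases named on the slide


def is_ad_server(url):
    """True when the URL's host name contains one of the ad phrases."""
    host = urlsplit(url).hostname or ""
    return any(marker in host.lower() for marker in AD_MARKERS)


print(is_ad_server("http://rmads.msn.com/images_47144_date_0429_50.jpg"))
print(is_ad_server("http://graphics4.nytimes.com/ads/scottrade_sov.gif"))
```

Plain substring matching is coarse: a host such as downloads.example.com would also match because it contains "ads", so a real filter would likely need a stricter rule.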
Ad-Servers and DNS

Number of pages with ad-servers.
  » 9.5% of pages, 1.53% of total objects.
  » Average of 2.4 ads per page.
Objects not hosted on content server.
  DNS lookup may be large part of lookup cost.
DNS caching doesn’t give great improvement:
  DNS caching improves parallel case 10.9%.
    » Compared with 12.2% over all pages.
  DNS caching improves parallel, persistent case 8%.
    » Compared with 6.3% over all pages.
  DNS caching improves parallel, persistent w/ CDN 4.7%.
    » Compared to 6.3%.