Whole Page Performance
Leeann Bent and Geoffrey M. Voelker
University of California, San Diego
August 14, 2002 WWCCD '02

Whole Page Performance?
• Extensive previous work on how specific techniques affect individual object downloads.
  » Caching, prefetching, CDNs, DNS caching.
• However, users download pages of objects.
  » Not clear how individual object performance maps onto whole page performance.
• Goal: study whole page performance.
  » Extent to which different optimizations are used.
  » Effect on downloading whole pages of objects.

Related Work
• [Krishnamurthy and Wills99] look at:
  » Parallel (HTTP 1.0), persistent and pipelined connections.
    » In addition to caching, range requests, and content placed on different servers.
  » Top-level pages of popular sites.
  » Focus on pages where all optimizations used.
• Our study: a follow-on, with a different perspective.
  » Use real user workloads.
    » All pages, not just top-level pages on popular servers.
    » Not all pages use optimizations.
  » Base page + embedded objects.
  » Connection optimizations + CDNs + DNS.

Overview
• Introduction
• Methodology
• Results
• Conclusion

Methodology Overview
• Use Medusa to:
  » Record everyday browsing from six users over four days.
  » Replay traces toggling performance options:
    » Parallel connections
    » Using CDNs
    » Complete DNS caching
    » Persistent connections
  » Compute download costs for whole pages.

The Medusa Proxy
[Figure: the Medusa proxy in two modes, "User Driven Behavior" and "Trace Driven Behavior".]

Page Download Time
• Page download time:
  » Time required to download base page and all embedded objects.
  » Reflects user-perceived web performance.
• Calculated using object download time.
  » Determine object download time from just after DNS lookup to connection close or full object return (persistent).
  » Incorporate original recorded DNS times where appropriate.
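A minimal sketch of the serial case of this calculation, assuming a page's cost is simply the sum of each object's download time plus its recorded DNS time. The four timings are those of the example slide that follows; the names ObjectTiming and serial_page_time are illustrative and are not part of Medusa.

```python
from dataclasses import dataclass


@dataclass
class ObjectTiming:
    """Timing for one object; field names are illustrative."""
    download_ms: int  # from just after DNS lookup to full object return
    dns_ms: int       # DNS lookup time recorded in the original trace


def serial_page_time(objects, include_dns=True):
    """Sum object costs as if every object were fetched one after another.

    include_dns=False drops all DNS time, which is how this sketch
    models the "ideal DNS caching" setting.
    """
    return sum(o.download_ms + (o.dns_ms if include_dns else 0) for o in objects)


page = [
    ObjectTiming(155, 90),
    ObjectTiming(205, 40),
    ObjectTiming(102, 5),
    ObjectTiming(253, 4),
]

print(serial_page_time(page))                     # 854
print(serial_page_time(page, include_dns=False))  # 715
```

The first result matches the 854 ms serial figure on the example slide. The parallel figure depends on how objects were scheduled across connections in the trace, so it is not modeled here.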
Example
Individual object times:

Object   Download Time (ms)   DNS (ms)
Obj1     155                  90
Obj2     205                  40
Obj3     102                  5
Obj4     253                  4

Page download times:
• Serial: 854 ms
• Parallel (2 conns): 580 ms
[Figure: download timeline; remaining labels 259 ms and 580 ms.]

Traces
• Six users: April 27-30 (Sat.-Tues.).
  » Originally 22,228 objects and 1,455 pages.
  » Remove error pages.
• Replay data gathered May 6-7 (Mon.-Tues.) and June 22-27 (Sat.-Thurs.).
  » Minimize warming effects by taking median of 5 consecutive page downloads.

Users   Requests   Pages   Ave Requests per Page
6       13747      920     15.0

Optimization Combinations
• Parallel Connections (1)
  » Medusa tracks number of concurrent connections used during trace.
  » Used to replay parallel download.
• CDN Usage (2)
  » When CDN usage disabled, remove CDN references.
    » Replace with references to origin servers.
  » When CDN usage enabled, traces left intact.
• DNS Caching (3)
  » Simulate ideal DNS caching by excluding DNS time.
  » Normal DNS: add original DNS lookup times from trace.
• Persistent Connections (4)
  » Use whichever protocol (1.0/1.1) recorded in original trace.

Overview
• Introduction
• Methodology
• Results
• Conclusion

Whole Page Optimizations
• Parallel gives large improvement.
• CDN improvement small. (2.5%)
• DNS improvement consistent. (7.4%, 6.7%)
• Persistent connections not as helpful as expected. (1.5%)

Overall Trace Conclusions
• Parallelism has the greatest effect.
  » Parallelism used aggressively on all pages.
• All other options provide incremental benefits.
  » Does not mean other optimizations don't work.
  » Some overheads may be relatively small.
  » Average over all pages.
    » Not all pages implement all optimizations.
    » We don't simulate more aggressive use of options than found in original trace.
• A closer look...

Ideal DNS Caching
• Average DNS costs:
  » Per object: 7.1 ms
  » Per page: 529 ms
• DNS improvement moderate across the board.
  » 5-14% improvement across all pages.
  » Provides moderate benefit to all pages.
• Not all objects require full DNS lookups.
  » Already effective DNS caching in traces.

Objects Per Page
• We would expect some other optimizations to have a greater effect (e.g. persistent connections).
  » Less opportunity for connection optimizations on small pages.
• Looking at all pages in trace doesn't tell the whole story.
  » Page with one object counts as much as a page with 152 objects.
  » Optimizations more effective on a page with 152 objects.
• Separate out effects of optimizations in pages with different numbers of objects:
  » Median number of objects per page is 5.
  » Average number of objects per page is 15.

Page Breakdown
• 1-5 objects
  » 1: 21%
  » 2-5: 63%
• 6+ objects improvements
  » 6-15: 157%
  » 16+: 183%
• Persistent
  » 1.95%
  » 18.5%

Page Breakdown Conclusions
• Performance optimizations dependent on number of objects per page.
  » Optimizations more effective when more objects per page.
  » Especially connection optimizations.
• Single object pages see moderate improvement.
  » Can usually only benefit from DNS caching and CDNs.
  » Persistent benefit only if on same server as previous page.
  » And 26% of pages had one object.

Persistent Connections
• Still don't see a whole lot of improvement for persistent connections.
  » Expected to see more benefit for 16+ objects.
• Not all pages use persistent connections.
  » 20% of pages in our trace use them (229 pages).
    » 2211 objects or 16.1%.
    » 9.65 objects per page.
• Look at only pages that contain persistent connections.

Persistent Connections
• Persistent connections useful if:
  » Many objects downloaded over persistent connections in the original trace.
  » Objects downloaded from few servers.
• For pages < 6 objects:
  » 2 out of 3 downloaded with persistent connections.
    » Average page size 3.
  » On average, 1.32 persistent objects per server.
• For pages >= 16 objects:
  » Average 18 objects with persistent connections.
  » On average, 3.92 persistent objects per server.

Mostly Persistent Pages
• Know what it takes to see persistent optimization improvement:
• Look at large pages where persistent connections used extensively (>50% of objects).

Objects per Page   Pages (% persistent pages)   Method       Mean (ms)   Improvement (%)
6-15               14 (56%)                     serial       4000
                                                persistent   2680        49.3%
16+                45 (42%)                     serial       6180
                                                persistent   4660        32.6%

• Pages that can benefit, do: 6+ objects improve 33-50%.

CDN
• Previous study showed CDNs highly effective for individual objects. [Koletsou01]
  » What is effect on whole page performance?
• Few pages with explicit Akamai-hosted objects.
  » 48 pages or 5.2% of pages.
  » 216 objects or 1.6% of total downloaded objects.
  » Average of 4.5 CDN objects per page.
• Looked at CDN-only page improvements:
  » CDNs improve CDN-containing pages 6%-30%.

Conclusions
• Parallel connections have greatest impact.
  » Universally applicable and easy to implement.
• Other options give incremental performance across all pages.
  » Some optimizations provide consistent, but moderate, improvement across all pages.
  » Some optimizations are not implemented on all pages.
    » Provide benefit when used extensively.

Conclusions
• Can we draw correlation between object and real-world whole page performance?
  » Depends.
  » Not all optimizations widely used.
  » When optimizations are used to full advantage, they are effective.

Medusa Available
http://ramp.ucsd.edu/~lbent/Medusa/index.html

The End

Medusa Proxy Functionality
• Trace and Replay
  » Record requests and replay.
• Transformation
  » CDN/no CDN replay.
• Performance Measurement
  » Parallel connections.
  » Persistent connections.
  » Request latency.
  » DNS overhead.
• Optimization options
  » Use parallel connections.
  » Use persistent connections.
    » HTTP 1.0 and HTTP 1.1.
    » Always attempt, never attempt, mirror trace attempt.

Page Delimitation
• Determining pages is necessary for:
  » Calculating total page costs.
  » Limiting optimizations to within one page.
    » Parallel connections.
• Can analyze page and draw object dependencies.
  » High overhead.
  » May impact user.
• Use inter-object times in the original trace data.
  » Use 2 second inter-object times.

Akamaized URLs
• Akamai accounts for 85%-98% of CDN hosted objects [ref].
  » Will not account for sites completely hosted on Akamai hosts.
• Filter:
  » From: http://a1964.g.akamai.net/f/1964/2730/1h/app.whenu.com/image.gif
  » To: http://app.whenu.com/image.gif

Interleaved Requests
• Requests may get interleaved when recorded in parallel mode and replayed in serial mode.
• E.g.:
  » Connection 0 requests: www.cnn.com, www.cnn.com/style.css.
  » Connection 1 requests: ar.atwola.com.
• Requests may be ordered in trace as:
  » www.cnn.com, ar.atwola.com, www.cnn.com/style.css.
• Negates benefit of parallel connections.

Page Characterization: Objects per Page
[Figure: objects per page.]

Object Types
• Identified object type by clues in URL:
  » 80% of URLs images (.gif, .jpg).
  » 5.6% HTML files (.htm, .html).
  » 3.8% CGI, Perl or Java (?, .pl, .class).
  » 3.3% JavaScript (.js).
  » 3.6% unidentified (no suffix, pdf, txt, etc.).

Persistent Connection/Browser
• Persistent connections appear correlated with browser:
  » IE: 12% pgs, 15.8% objs.
  » Netscape: 19.5% pgs, 10.0% objs.
  » Omniweb: 66.0% pgs, 72.4% objs.
  » Mozilla 5.0/Gecko: 95.8% pgs, 91.3% objs.

Persistent Connection Pages

Optimizations                        Average Improvement (%)   Median Improvement (%)
Serial                               7.28%                     -3.5%
Parallel Connections                 24.03%                    7.5%
Parallel Connections with DNS, CDN   12.5%                     0.6%

• Still not as improved as expected.
• Better than for only large pages:
  » Serial 7.28% vs. 1.98%
  » Parallel 24.03% vs. 18.5%
• Medians don't show improvements in all cases.

Mostly Persistent Pages

Objects per Page   Pages (% persistent pages)   Method                Mean (ms)   Improvement (%)
6-15               14 (56%)                     serial                4000
                                                persistent            2680        49.3%
                                                parallel              1567
                                                persistent/parallel   1414        10.1%
16+                45 (42%)                     serial                6180
                                                persistent            4660        32.6%
                                                parallel              2524
                                                persistent/parallel   1669        51.2%

Persistent Connections per Page
[Figure: persistent connections per page.]

Same as previous 16+
[Figure: persistent connections per page, pages with 16+ objects.]

Ad-Servers
• Identified as hosts whose names contain the phrases "ads" or "adserver".
  » YES: http://rmads.msn.com/images_47144_date_0429_50.jpg
  » NO: http://graphics4.nytimes.com/ads/scottrade_sov.gif

Ad-Servers and DNS
• Number of pages with ad-servers.
  » 9.5% of pages, 1.53% of total objects.
  » Average of 2.4 ads per page.
• Objects not hosted on content server.
  » DNS lookup may be large part of lookup cost.
• DNS caching doesn't give great improvement:
  » DNS caching improves parallel case 10.9%.
    » Compared with 12.2% over all pages.
  » DNS caching improves parallel, persistent case 8%.
    » Compared with 6.3% over all pages.
  » DNS caching improves parallel, persistent w/ CDN 4.7%.
    » Compared to 6.3%.
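The host-name rule from the Ad-Servers slide can be sketched as below. This is an illustration only, not Medusa's code: it assumes a case-insensitive substring match on the host part of the URL, and the names AD_HOST_MARKERS and is_ad_server are hypothetical.

```python
from urllib.parse import urlsplit

# Phrases named on the slide; matching them as substrings is an assumption.
AD_HOST_MARKERS = ("ads", "adserver")


def is_ad_server(url):
    """Return True if the URL's host name contains an ad-server phrase.

    Only the host is inspected, so "/ads/" in the path does not count.
    """
    host = (urlsplit(url).hostname or "").lower()
    return any(marker in host for marker in AD_HOST_MARKERS)


# The two examples from the slide.
print(is_ad_server("http://rmads.msn.com/images_47144_date_0429_50.jpg"))   # True
print(is_ad_server("http://graphics4.nytimes.com/ads/scottrade_sov.gif"))   # False
```

A plain substring test is deliberately crude: it would also flag any unrelated host whose name happens to contain "ads", so a real filter would likely need a stricter pattern or a host list.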