Objective-Optimal Algorithms for Long-term Web Prefetching
Ajay Kshemkalyani (jointly with Bin Wu)
Univ. of Illinois at Chicago
[email protected]

Outline
• Prefetching: definition and background
• Survey of web prefetching algorithms
• Performance metrics
• Objective-Greedy algorithms (O(n) time)
  – Hit rate greedy (also hit rate optimal)
  – Bandwidth greedy (also bandwidth optimal)
  – H/B greedy
• H/B-Optimal algorithm (expected O(n) time)
• Simulation results
• Variants under different constraints

Introduction
• Web caching reduces user-perceived latency
  – Client-server mode; the bottleneck occurs at the server side
  – Means of improving performance: local cache, proxy server, server farm, ...
  – Cache management: LRU, Greedy-Dual-Size, ...
• On-demand caching vs. (long-term) prefetching
  – Prefetching is effective in dynamic environments
  – Clients subscribe to web objects
  – The server "pushes" fresh copies into web caches
  – Selection of prefetched objects is based on long-term statistical characteristics maintained by content distribution servers (CDS)

Introduction
• Web prefetching
  – Caches web objects in advance; cached copies are updated by the web server
  – Reduces retrieval latency and user access time
  – Requires more bandwidth and increases traffic
• Performance metrics
  – Hit rate
  – Bandwidth usage
  – Balance of the two

Object Selection Criteria
• Popularity (access frequency)
• Lifetime
• Good Fetch
• APL

Web Object Characteristics
• Access frequency
  – A Zipf-like request model is used in web traffic modeling.
  – Relationship between access frequency p_i and popularity rank i of a web object:
      p_i = k / i, where k = 1 / (Σ_i 1/i)
  – The generalized Zipf-like distribution of web requests is:
      p_i = k / i^α, where k = 1 / (Σ_i 1/i^α)
    k is a normalization constant, i is the object ID (popularity rank), and α is the Zipf parameter: 0.986 (Cunha et al.), 0.75 (Nishikawa et al.), 0.64 (Breslau et al.)
• Size of objects s_i (heavy-tailed: Pareto, lognormal)
  – Average object size: 10–15 KB
  – No strong correlation between object size s_i and access frequency p_i
• Access (read) pattern of objects (Poisson)
  – Average access rate a·p_i, where a is the aggregate request rate
• Lifetime of web objects (exponential)
  – Average time interval between updates l_i
  – Weak correlation between access frequency p_i and lifetime l_i

Caching/Prefetching Architecture
[Architecture diagram: content sources (Reuters, NYSE, BBC, BSE) feed a cache managed by the prefetching algorithm]

Caching Architecture
• Prefetching selection algorithms take as input these global statistics:
  – estimates of object reference frequencies
  – estimates of object lifetimes
• Content distribution servers cooperate to maintain these statistics
• When an object is updated at the origin server, the new version is sent to every cache that has subscribed to it

Solution space for web prefetching
• Two extreme cases:
  – Passive caches (no prefetching): least network bandwidth, lowest cache hit rate
  – Prefetching all objects: 100% cache hit rate, but a huge amount of unnecessary bandwidth
• Existing algorithms use different object-selection criteria and fetch the objects exceeding some threshold

Existing Prefetching Algorithms
• Popularity [Markatos et al.]
  – Keeps the most popular objects in the system and updates them immediately when they change
  – Criterion: the object's popularity
  – Expected to achieve a high hit rate
• Lifetime [Jiang et al.]
  – Keeps the objects with the longest lifetimes
  – Mostly considers the network resource demands
  – Threshold: the expected lifetime of the object
  – Expected to minimize bandwidth usage
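Before moving on to Good Fetch and APL, here is a minimal Python sketch (an illustration, not code from the paper) tying the Zipf-like request model from the Web Object Characteristics slides to the Popularity and Lifetime criteria just described; the values of n, α, the mean lifetime, and both thresholds are assumptions chosen for the example.

```python
import random

# Sketch: Zipf-like access probabilities p_i = k / i^alpha (normalized to sum to 1),
# plus the two simple selection criteria described above.
def zipf_probabilities(n, alpha=0.75):
    weights = [1.0 / (rank ** alpha) for rank in range(1, n + 1)]
    k = 1.0 / sum(weights)                       # normalization constant
    return [k * w for w in weights]

n = 1000
p = zipf_probabilities(n, alpha=0.75)            # alpha value as in Nishikawa et al.
# Exponential lifetimes with an illustrative mean of 1 hour (3600 s).
lifetimes = [random.expovariate(1.0 / 3600.0) for _ in range(n)]

# Popularity criterion: prefetch the most popular objects
# (here the object index equals the popularity rank, so this is simply the first 50).
by_popularity = sorted(range(n), key=lambda i: p[i], reverse=True)[:50]
# Lifetime criterion: prefetch objects whose expected lifetime exceeds a threshold.
by_lifetime = [i for i in range(n) if lifetimes[i] > 2 * 3600.0]

print(sum(p), by_popularity[:5], len(by_lifetime))
```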
Existing Prefetching Algorithms
• Good Fetch [Venkataramani et al.]
  – Computes the probability that an object is accessed before it changes
  – Prefetches objects with a high probability of being accessed during their average lifetime, i.e., prefetches object i if this probability exceeds a threshold
  – Objects with higher access frequencies and longer update intervals are more likely to be prefetched
  – Balances the benefit (hit rate increase) against the cost (bandwidth increase) of keeping an object
• APL [Jiang et al.]
  – Computes the apl values of web objects; the apl of an object is the expected number of accesses during its lifetime
  – Prefetches object i if its apl exceeds a threshold
  – Tends to improve hit rate; attempts to balance benefit (hit rate) against cost (bandwidth)
• Enhanced APL: ap^k l
  – k > 1 prefers objects with higher popularity (emphasizes hit rate)
  – k < 1 prefers objects with longer lifetimes (emphasizes network bandwidth)

Objective-Greedy Algorithms
• Existing algorithms choose their prefetching criteria based on intuition
  – They are not aimed at any specific performance metric
  – They consider only individual objects' characteristics, not the global impact
• None gives optimal performance under any metric; simple counter-examples can be shown
• Objective-Greedy algorithms instead select criteria that intentionally improve performance under a given metric. E.g., the Hit Rate-Greedy algorithm aims to improve the overall hit rate and thus reduce the latency of object requests.

Steady State Properties
• Steady-state hit rate for object i:
    h_i = a p_i l_i / (a p_i l_i + 1)   if object i is not prefetched
    h_i = 1                             if object i is prefetched
  where a p_i l_i / (a p_i l_i + 1) is defined as the freshness factor f(i)
• Overall hit rate: H = Σ_i p_i h_i
• On-demand hit rate: H_demand = Σ_i p_i f(i)
• Steady-state bandwidth for object i:
    b_i = a p_i (1 - f(i)) s_i   if object i is not prefetched
    b_i = s_i / l_i              if object i is prefetched
• Total bandwidth: BW = Σ_i b_i
• On-demand bandwidth: BW_demand = Σ_i a p_i (1 - f(i)) s_i
  (note that a p_i (1 - f(i)) s_i = (s_i / l_i) f(i), so BW_demand can also be written as Σ_i (s_i / l_i) f(i))

Objective Metrics
• Hit rate – benefit
• Bandwidth – cost
• H/B model – balance of benefit and cost
  – Basic H/B:    H/B   = (Hit_prefetching / Hit_demand) / (BW_prefetching / BW_demand)
  – Enhanced H/B: H^k/B = (Hit_prefetching / Hit_demand)^k / (BW_prefetching / BW_demand)

H/B-Greedy Prefetching
• Considers the hit-rate-to-bandwidth ratio of on-demand caching:
    Hit_demand / BW_demand = Σ_{i∈S} p_i f(i) / Σ_{i∈S} (s_i / l_i) f(i)
• If object j is prefetched, this ratio is updated to:
    [Σ_{i∈S} p_i f(i) + p_j (1 - f(j))] / [Σ_{i∈S} (s_i / l_i) f(i) + (s_j / l_j)(1 - f(j))]
    = (Hit_demand / BW_demand) × [1 + p_j (1 - f(j)) / Σ_{i∈S} p_i f(i)] / [1 + (s_j / l_j)(1 - f(j)) / Σ_{i∈S} (s_i / l_i) f(i)]
• We define
    incr(j) = [1 + p_j (1 - f(j)) / Σ_{i∈S} p_i f(i)] / [1 + (s_j / l_j)(1 - f(j)) / Σ_{i∈S} (s_i / l_i) f(i)]
  as the increase factor of object j: the factor by which H/B increases if object j is selected.
• H/B-Greedy prefetching prefetches the m objects with the greatest increase factors.
• The selection is based on the effect of prefetching each object individually on the H/B value, so H/B-Greedy is still not an optimal algorithm in terms of the H/B value.

Hit Rate-Greedy Prefetching
• To maximize the overall hit rate for a given number of prefetched objects m, select the m objects with the greatest hit-rate contribution:
    HR_Contr(i) = p_i (1 - f(i)) = p_i / (a p_i l_i + 1)
• This algorithm is optimal in terms of hit rate.
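A minimal Python sketch of the freshness factor and the Hit Rate-Greedy selection just described; the helper names and the small object table are illustrative assumptions, not data from the paper.

```python
# Sketch: Hit Rate-Greedy selection.
# f(i) = a*p_i*l_i / (a*p_i*l_i + 1) is the steady-state freshness factor;
# prefetching object i raises the overall hit rate by
# HR_Contr(i) = p_i * (1 - f(i)) = p_i / (a*p_i*l_i + 1).

def freshness(a, p, l):
    return a * p * l / (a * p * l + 1.0)

def hit_rate_greedy(objects, a, m):
    """objects: list of (obj_id, p_i, l_i). Returns the ids of the m objects
    with the largest hit-rate contribution."""
    contr = {oid: p * (1.0 - freshness(a, p, l)) for oid, p, l in objects}
    return sorted(contr, key=contr.get, reverse=True)[:m]

# illustrative data: (id, access probability p_i, average lifetime l_i in seconds)
objs = [(1, 0.40, 50.0), (2, 0.25, 400.0), (3, 0.20, 5.0), (4, 0.15, 1000.0)]
print(hit_rate_greedy(objs, a=0.1, m=2))   # -> [3, 1] for this data
```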
Bandwidth-Greedy Prefetching
• To minimize the total bandwidth for a given number of prefetched objects m, select the m objects with the least bandwidth contribution:
    BW_Contr(i) = (s_i / l_i)(1 - f(i)) = s_i / (a p_i l_i^2 + l_i)
• Bandwidth-Greedy prefetching is optimal in terms of bandwidth consumption.

H/B-Optimal Prefetching
• An optimal algorithm for the H/B metric is provided by a solution to the following selection problem:
    S' = argmax_{S'⊆S, |S'|=m} (H/B)_pref
       = argmax_{S'⊆S, |S'|=m} [Σ_{i∈S} p_i f(i) + Σ_{j∈S'} p_j (1 - f(j))] / [Σ_{i∈S} (s_i / l_i) f(i) + Σ_{j∈S'} (s_j / l_j)(1 - f(j))]
• This is equivalent to the maximum weighted average problem with pre-selected items.

Maximum Weighted Average
• The maximum weighted average problem:
  – There are n courses in total, with different credit hours and scores
  – Select m (m < n) courses so as to maximize the GPA of the m selected courses
• If m = 1, select the course with the highest score. What if m > 1?
  Misleading intuition: select the m courses with the highest scores.

A Course Selection Problem (example)

  Course        A    B    C    D    E    F    G    H
  Credit hours  5.0  3.0  6.0  1.0  2.0  4.0  3.0  6.0
  Score         70   90   95   85   75   60   65   80

• If m = 2: selecting the 2 courses with the highest scores, C and B, gives GPA 93.33, but selecting C and D gives GPA 93.57.
• Question: how to select m courses such that the GPA is maximized?
  Answer: Eppstein & Hirschberg solved this.

With Pre-selected Items

  Course        A    B    C    D    E    F    G    H
  Credit hours  5.0  3.0  1.0  6.0  2.0  4.0  3.0  6.0
  Score         70   90   95   85   75   60   65   80

• Maximum weighted average with pre-selected items:
  – There are n courses in total, with different credit hours and scores
  – Courses A and E must be selected; in addition, select m (m is given, m < n) courses such that the resulting GPA is maximized
  – Example (m = 1): with D, GPA = 77.7; with C, GPA = 74.3; with B, GPA = 77

Pre-selection clause (example)

  Course  A    B    C    D     E    F    G    H    I
  Credit  5.0  1.0  2.0  10.0  1.5  2.5  2.0  3.0  4.0
  Score   60   95   85   83    63   71   80   77   65

1) Selection domain B–I, no pre-selection, m = 2: optimal subset {B, C}, GPA 88.33
2) Selection domain B–I, A pre-selected, m = 2: the candidate subset {A, D, H}, GPA 75.61, is better than {A, B, C}, GPA 70.625
• Conclusion: {B, C} is not contained in the optimal subset of the pre-selection problem.

H/B-Optimal vs. Course Selection
• The course selection problem is formulated as:
    S' = argmax_{S'⊆S, |S'|=m} (v_0 + Σ_{j∈S'} v_j) / (w_0 + Σ_{j∈S'} w_j)
  where v_0 = 5.0·70 + 2.0·75 = 500 and w_0 = 5.0 + 2.0 = 7.0 in the previous example.
• This is equivalent to the H/B-Optimal selection problem:
    S' = argmax_{S'⊆S, |S'|=m} [Σ_{i∈S} p_i f(i) + Σ_{j∈S'} p_j (1 - f(j))] / [Σ_{i∈S} (s_i / l_i) f(i) + Σ_{j∈S'} (s_j / l_j)(1 - f(j))]

H/B-Optimal Algorithm Design
• The selection of m courses is not trivial.
• For course i, we define the auxiliary function
    r_i(x) = (v_i + v_0/m) - (w_i + w_0/m)·x
• For a given m, we define the utility function
    F(x) = max_{S'⊆S, |S'|=m} Σ_{i∈S'} r_i(x)
• Lemma 1: Suppose A* is the maximum GPA we are computing. Then for any subset S' ⊆ S with |S'| = m:
  1) Σ_{i∈S'} r_i(A*) ≤ 0;
  2) Σ_{i∈S'} r_i(A*) = 0 iff S' is the optimal subset.
  Thus, the optimal subset consists of the courses that have the m largest r_i(A*) values.
• Example: n = 6, m = 4; each line in the plot is some r_i(x). If we knew A*, the optimal subset would be the 4 courses with the largest r_i(A*) values.
• Dilemma: A* is unknown.
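To make Lemma 1 concrete, here is a small Python sketch (my own illustration, not from the slides) for the with-pre-selected-items example above (courses A–H, with A and E pre-selected): it brute-forces A* and then checks that the m courses with the largest r_i(A*) values are exactly the optimal subset. The brute force is only for verification; the following slides give the efficient algorithm.

```python
# Sketch: verify Lemma 1 on the pre-selection example (A and E pre-selected).
# For course i: v_i = credit_i * score_i, w_i = credit_i;
# maximize (v0 + sum v_j) / (w0 + sum w_j) over m additional courses.
from itertools import combinations

courses = {"B": (3.0, 90), "C": (1.0, 95), "D": (6.0, 85),
           "F": (4.0, 60), "G": (3.0, 65), "H": (6.0, 80)}
v0, w0 = 5.0 * 70 + 2.0 * 75, 5.0 + 2.0        # pre-selected A and E: v0 = 500, w0 = 7
m = 1

def gpa(subset):
    v = v0 + sum(courses[c][0] * courses[c][1] for c in subset)
    w = w0 + sum(courses[c][0] for c in subset)
    return v / w

# Brute-force A* (illustration only; the slides give an expected O(n) algorithm).
best = max(combinations(courses, m), key=gpa)
A_star = gpa(best)                              # -> ('D',) with GPA about 77.7

def r(c, x):                                    # auxiliary function r_i(x)
    credit, score = courses[c]
    return (credit * score + v0 / m) - (credit + w0 / m) * x

top_m = sorted(courses, key=lambda c: r(c, A_star), reverse=True)[:m]
assert set(top_m) == set(best)                  # Lemma 1: largest r_i(A*) = optimal subset
print(best, round(A_star, 2), top_m)
```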
H/B-Optimal Algorithm Design
• Lemma 2:
  1) F(x) > 0 iff x < A*
  2) F(x) = 0 iff x = A*
  3) F(x) < 0 iff x > A*
• Lemma 2 is used to narrow the range of A*; let (x_l, x_r) be the current A*-range.
• If F(x_l) > 0 and F(x_r) < 0, then A* is in (x_l, x_r). Compute F((x_l + x_r)/2):
  – if F((x_l + x_r)/2) > 0, then A* > (x_l + x_r)/2
  – if F((x_l + x_r)/2) < 0, then A* < (x_l + x_r)/2
  – if F((x_l + x_r)/2) = 0, then A* = (x_l + x_r)/2 (Lemma 2)
• This narrows the range of A* by half (binary search).

H/B-Optimal Algorithm Design (Idea)
• Why keep narrowing down the range of A*?
  – If the intersection of r_j(x) and r_k(x) falls outside the range, then the ordering of r_j(x) and r_k(x) is fixed within the range, and hence so is the ordering of r_j(A*) and r_k(A*), determined by comparing their slopes.
  – If the range is narrow enough that no intersections of the r(x) lines lie within it, then the total ordering of all r(A*) values is determined.
  – The selection problem is then solved: just select the m candidates with the highest r(A*) values.

H/B-Optimal Algorithm Design
• However, computing the total ordering this way requires O(n^2) time.
• A randomized approach is used instead. The randomized algorithm iteratively reduces the problem domain to a smaller one, maintaining 4 sets X, Y, E, Z (initially empty) of courses whose r value is larger than, smaller than, equal to, or not yet ordered with respect to a randomly chosen pivot.
• In each iteration, randomly select a course i and compare it with each other course k. One of 4 possibilities:
  1) if r_k(A*) > r_i(A*): insert k into set X
  2) if r_k(A*) < r_i(A*): insert k into set Y
  3) if w_k = w_i and v_k = v_i: insert k into set E
  4) if undetermined: insert k into set Z
• Then do the following loop:
    loop:
      narrow the range of A* by half
      compare r_i(A*) with r_k'(A*) for each k' in Z
      if the comparison is now determined, move k' to X or Y accordingly
    until |Z| is sufficiently small (i.e., |Z| < |S|/32)
• After the loop, either X or Y has enough members to ensure speedy convergence. Compare the sizes of X, Y and E:
  1) If |X| + |E| > m: there are at least m courses whose r(A*) values are greater than the r(A*) value of every course in Y, so all members of Y may be removed. Then |S| = |S| - |Y|.
  2) If |Y| + |E| > |S| - m: all members of X are among the top m courses, so all members of X must be in the optimal set. Collapse X into a single course (this course is included in the final optimal set). Then |S| = |S| - |X| + 1 and m = m - |X| + 1.
• In either case, the resulting domain has reduced size. By iteratively removing or collapsing courses, the problem domain is finally left with only one course, formed by collapsing all courses in the optimal set.
• Expected time complexity (let S_b be the domain before an iteration and S_a the domain after it):
  1) Each iteration takes expected time O(|S_b|)
  2) The expected size is |S_a| = (207/256)|S_b|
  The recurrence T(n) = O(n) + T((207/256)n) resolves to linear time.

H/B-Greedy vs. H/B-Optimal
• H/B-Greedy is an approximation to H/B-Optimal.
• H/B-Greedy achieves a higher H/B metric than any existing algorithm.
• H/B-Greedy is easier to implement than H/B-Optimal:
  – lower constant factor
  – adjusts easily to updates of object characteristics
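As an illustration of the overall approach, here is a simplified Python sketch that selects the m objects maximizing H/B by binary-searching A* with the sign of F(x) (Lemma 2) and then taking the m largest r_i(A*) values. This is not the expected-linear-time randomized algorithm described above (each F evaluation here costs O(n log n)), and the object data, iteration count, and function names are illustrative assumptions.

```python
# Simplified sketch of H/B-optimal selection (not the randomized O(n) version):
# binary-search A* using the sign of F(x), then pick the m largest r_i(A*).
# For object i: v_i = p_i*(1 - f(i)) (hit-rate gain), w_i = (s_i/l_i)*(1 - f(i))
# (bandwidth increase); v0 = sum p_i*f(i), w0 = sum (s_i/l_i)*f(i) (on-demand terms).

def hb_optimal(objects, a, m, iters=100):
    """objects: list of (p_i, s_i, l_i). Returns indices of the m selected objects."""
    f = [a * p * l / (a * p * l + 1.0) for p, s, l in objects]
    v = [p * (1.0 - fi) for (p, s, l), fi in zip(objects, f)]
    w = [(s / l) * (1.0 - fi) for (p, s, l), fi in zip(objects, f)]
    v0 = sum(p * fi for (p, s, l), fi in zip(objects, f))
    w0 = sum((s / l) * fi for (p, s, l), fi in zip(objects, f))

    def r(i, x):                      # auxiliary function r_i(x)
        return (v[i] + v0 / m) - (w[i] + w0 / m) * x

    def F(x):                         # sum of the m largest r_i(x)
        return sum(sorted((r(i, x) for i in range(len(objects))), reverse=True)[:m])

    lo, hi = 0.0, (v0 + sum(v)) / w0  # A* lies in [lo, hi]: F(lo) > 0 >= F(hi)
    for _ in range(iters):            # bisection per Lemma 2
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if F(mid) > 0 else (lo, mid)
    A_star = (lo + hi) / 2.0
    return sorted(range(len(objects)), key=lambda i: r(i, A_star), reverse=True)[:m]

# illustrative data: (p_i, s_i in KB, l_i in seconds)
objs = [(0.4, 12.0, 30.0), (0.3, 8.0, 600.0), (0.2, 20.0, 5.0), (0.1, 10.0, 3600.0)]
print(hb_optimal(objs, a=0.05, m=2))
```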
Simulation Results
• Evaluation of H/B-Greedy prefetching:
  – Figure 1: H/B, for total object number = 1,000
  – Figure 2: H/B, for total object number = 10,000
  – Figure 3: H/B, for total object number = 100,000
  – Figure 4: H/B, for total object number = 1,000,000
• Evaluation of the H-Greedy and B-Greedy algorithms:
  – Figure 5: H-Greedy algorithm
  – Figure 6: B-Greedy algorithm
  – Figure 7: B-Greedy algorithm, bandwidth magnified
  [Figures 1–7 are plots and are not reproduced in this text version.]

Performance Comparison
Table 1. Performance comparison of different algorithms in terms of various metrics (lower values represent better performance). [Table not reproduced in this text version.]

Prefetching under Different Constraints

H under a Cache Size Constraint
• Hit rate: a knapsack problem.
  – Objective: maximize Σ_{i∈S'} HR_Contr(i) subject to Σ_{i∈S'} s_i ≤ C
  – Mapping: H contribution → item value; size → item weight; cache size constraint → knapsack capacity
• Bandwidth: not a problem.

H/B-Greedy under a Cache Size Constraint
• H/B-Greedy: a knapsack problem.
  – Objective: maximize Σ_{i∈S'} H/B_Contr(i) subject to Σ_{i∈S'} s_i ≤ C
  – Mapping: H/B contribution → item value; size → item weight; cache size constraint → knapsack capacity

H/B-Optimal under a Cache Size Constraint
• H/B-Optimal:
  – Objective: maximize H/B subject to Σ_{i∈S'} s_i ≤ C
  – Mapping to knapsack? No: the objective is not a sum of single-object properties, so this is not a knapsack problem.
  – Does a solution exist? Yes. Polynomial-time algorithm? Open.

H under a Bandwidth Constraint
• Hit rate: a knapsack problem.
  – Objective: maximize Σ_{i∈S'} HR_Contr(i) subject to Σ_{i∈S'} BW_Contr(i) ≤ B_C
  – Mapping: H contribution → item value; B contribution → item weight; bandwidth constraint → knapsack capacity
• Bandwidth: not a problem.

H/B-Greedy under a Bandwidth Constraint
• H/B-Greedy: a knapsack problem.
  – Objective: maximize Σ_{i∈S'} H/B_Contr(i) subject to Σ_{i∈S'} BW_Contr(i) ≤ B_C
  – Mapping: H/B contribution → item value; B contribution → item weight; bandwidth constraint → knapsack capacity

H/B-Optimal under a Bandwidth Constraint
• H/B-Optimal:
  – Objective: maximize H/B subject to Σ_{i∈S'} BW_Contr(i) ≤ B_C
  – Mapping to knapsack? No: the objective is not a sum of single-object properties, so this is not a knapsack problem.
  – Does a solution exist? Yes. Polynomial-time algorithm? Open.
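The knapsack mapping above can be made concrete with a short sketch: a standard 0/1 knapsack dynamic program that maximizes total hit-rate contribution subject to a cache-size budget. The integer-KB discretization, the data, and the function name are my illustrative assumptions.

```python
# Sketch: hit rate under a cache-size constraint as a 0/1 knapsack.
# value_i  = HR_Contr(i) = p_i / (a*p_i*l_i + 1)   (hit-rate gain)
# weight_i = s_i (object size, discretized to whole KB here)
# capacity = cache size C in KB.

def max_hit_rate_under_cache(objects, a, capacity_kb):
    """objects: list of (p_i, s_i_kb, l_i). Returns (best hit-rate gain, chosen indices)."""
    values = [p / (a * p * l + 1.0) for p, s, l in objects]
    weights = [int(round(s)) for p, s, l in objects]
    # dp[c] = (best value achievable with budget c, chosen index set)
    dp = [(0.0, frozenset()) for _ in range(capacity_kb + 1)]
    for i, (val, wt) in enumerate(zip(values, weights)):
        for c in range(capacity_kb, wt - 1, -1):      # iterate backwards: 0/1 knapsack
            cand = dp[c - wt][0] + val
            if cand > dp[c][0]:
                dp[c] = (cand, dp[c - wt][1] | {i})
    return dp[capacity_kb]

# illustrative data: (p_i, size in KB, lifetime in seconds)
objs = [(0.4, 12, 30.0), (0.3, 8, 600.0), (0.2, 20, 5.0), (0.1, 10, 3600.0)]
gain, chosen = max_hit_rate_under_cache(objs, a=0.05, capacity_kb=25)
print(round(gain, 4), sorted(chosen))
```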
Conclusions
• Proposed Objective-Greedy prefetching algorithms that are superior to Popularity, Good Fetch, APL, and Lifetime:
  – Hit rate greedy (also optimal for hit rate)
  – Bandwidth greedy (also optimal for bandwidth)
  – H/B greedy
• The proposed algorithms need O(n) time.
• Proposed the H/B-Optimal algorithm, which has O(n) expected time.

Conclusions (contd.)
• Simulations show significant gains of the proposed algorithms over existing algorithms.
• H/B-Greedy is almost as good as H/B-Optimal; both take O(n) time.
• Future work:
  – Consider power control, bandwidth, latency
  – Address H/B-Optimal/C, H/B-Optimal/B
  – Determine when H/B attains its global maximum

Thank you!