Towards Latency: An Online Learning Mechanism for Caching Dynamic Query Content

Michael Schaarschmidt
Sidney Sussex College

A dissertation submitted to the University of Cambridge in partial fulfilment of the requirements for the degree of Master of Philosophy in Advanced Computer Science (Research Project - Option B)

University of Cambridge
Computer Laboratory
William Gates Building
15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom

Email: [email protected]

June 11, 2015

Declaration

I Michael Schaarschmidt of Sidney Sussex College, being a candidate for the M.Phil in Advanced Computer Science, hereby declare that this report and the work described in it are my own work, unaided except as may be specified below, and that the report does not contain material that has already been used to any substantial extent for a comparable purpose.

Total word count: 14,975 (excluding appendices A and B)

Signed:
Date:

This dissertation is copyright © 2015 Michael Schaarschmidt. All trademarks used in this dissertation are hereby acknowledged.

Acknowledgements

I would like to express gratitude to my supervisor Dr. Eiko Yoneki for her comments, advice and encouragement throughout this project. I would further like to especially thank Felix Gessert for his advice and our discussions on practical caching issues. Additionally, I would like to thank Valentin Dalibard for his insights into Bayesian optimisation. Finally, I want to thank Dr. Damien Fay for his comments on online learning.

Abstract

This study investigates caching models of dynamic query content in distributed web infrastructures. Web performance is largely governed by latency and the number of round-trips required to retrieve content. It has also been established that latency is directly linked to user behaviour and satisfaction [1]. Recently, access latency has gained importance together with service abstraction in the data management space. Instead of having to manage a dedicated cluster of database servers on premises, applications can use highly available and scalable database-as-a-service (DBaaS) platforms. These services typically provide a REST interface to a set of basic database operations [2]. A REST-ful approach enables the use of HTTP caching through browser caches, content delivery networks (CDNs), proxy caches and reverse proxy caches [3, 4, 5]. Such methods are used extensively to cache static content like JavaScript libraries or background images. However, caching result sets of database queries over an arbitrary number of dynamic objects in distributed infrastructures poses multiple challenges. First, any query-caching scheme needs to maintain consistency from the client's perspective, i.e. a cache should not return stale content. From the server's perspective, it is hard to predict an optimal expiration for a collection of objects that form a query result, since each individual object is read and updated with arbitrary frequency. DBaaS providers thus generally do not cache their interactive content, resulting in noticeable loading times when interacting with dynamic applications.

This project introduces a comprehensive scheme for caching dynamic query results. The first component of this model is based upon the idea that there are multiple ways to represent and cache query results. Further, the model relies on a stochastic method to estimate optimal expiration times for dynamically changing content. Finally, an online learning model enables real-time decisions on the different cache representations.
As a result, the model is able to provide imperceptible request latency and consistent reads for clients.

Contents

1 Introduction
2 Background and Related Work
  2.1 Web Caching
    2.1.1 Introduction to Web Caching
    2.1.2 Previous Work
  2.2 Bloom Filters
  2.3 Monte Carlo Methods
  2.4 Machine Learning
    2.4.1 Reinforcement Learning
    2.4.2 Machine Learning in Data Management
3 Caching Queries
  3.1 Introduction
    3.1.1 The Latency Problem
    3.1.2 The Staleness Problem
    3.1.3 Model Assumptions and Terminology
  3.2 Caching Models for Queries
    3.2.1 Caching Object-Lists
    3.2.2 Caching Id-Lists
    3.2.3 Matching Queries to Updates
    3.2.4 When Not to Cache
  3.3 Estimating Expirations
    3.3.1 Approximating Poisson Processes
    3.3.2 Write-Only Estimation
    3.3.3 Dynamic Quantile Estimation
4 Online Learning
  4.1 Introduction
  4.2 Representation as an MDP
    4.2.1 State and Action Spaces
    4.2.2 Decision Granularity
    4.2.3 Reward Signals
  4.3 Belief State Approximation
    4.3.1 Convergence and Exploration
    4.3.2 Sampling Techniques
    4.3.3 Hyperparameter Optimisation
5 Evaluation
  5.1 Aims
  5.2 Simulation Framework
    5.2.1 Design and Implementation
    5.2.2 Benchmark Configuration
  5.3 Comparing Execution Models
    5.3.1 Read-Dominant Workload
    5.3.2 Write-Dominant Workload
  5.4 Consistency and Invalidations
    5.4.1 Adjusting Quantiles
    5.4.2 Reducing Invalidation Load
  5.5 Online Learning
    5.5.1 Learning Decisions
    5.5.2 Evaluating Trade-offs
    5.5.3 Convergence and Stability
6 Outlook and Conclusion
  6.1 Summary and Conclusion
  6.2 Future Work
    6.2.1 Parsing Query Predicates
    6.2.2 Unified Learning Model
A Proofs
  A.1 Minimum of Exponential Random Variables
B Additional Analysis
  B.1 Impact of Invalidation Latency
  B.2 Monte Carlo Optimisation

List of Figures

1.1 Simplified caching architecture with clients in Europe bound by access latency to a backend server in the USA.
2.1 Empty Bloom filter of length m.
2.2 Insertion of new element e into Bloom filter.
2.3 Reinforcement learning: An agent takes actions and observes new states and rewards through its environment.
3.1 Query matching architecture overview. A load balancer distributes requests from caches. An invalidation engine determines which query results are stale. Bloom filters can then be used to determine whether they are still cached at some CDN edge.
3.2 Topology of an Apache Storm invalidation pipeline. After-images of update operations are published to Storm spouts (data stream endpoints). They determine which bolt (stream processing node) holds the cached queries related to that update. Bolts evaluate the queries on the after-image to find which result sets are invalid and notify the DBaaS, which sends invalidations to the cache.
4.1 Utility function example for response times.
5.1 Overview of the simulation architecture.
5.2 Cache hit rates as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes.
5.3 Average query response times as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes.
5.4 Cache hit rates as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes under a uniform access distribution.
5.5 Average query response times as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes under a uniform access distribution.
5.6 Average query response times as a function of average query selectivity on a mixture of 25% reads, 25% queries and 50% writes. Estimation quantiles have been adjusted to p = 0.4 to account for the write-dominant workload.
5.7 Cache hit rates as a function of average query selectivity on a mixture of 25% reads, 25% queries and 50% writes.
5.8 Absolute number of stale reads on the write-dominant workload as a function of the quantile of the next expected write.
5.9 Cache hit rates on the write-dominant workload as a function of the quantile of the next expected write.
5.10 Cache hit rates on the write-dominant workload as a function of the quantile of the next expected write and compared to a static caching method.
5.11 Invalidation loads for using the naive id-list approach versus dynamically marking frequently written objects as uncachable.
5.12 Global utility as a function of operations performed.
5.13 Behaviour of learning model versus random guessing under a change of workload mixture.
B.1 Stale reads as a function of mean invalidation latency on 100,000 operations. Higher invalidation latency gives rise to more stale reads, as there is a bigger time window to retrieve stale content from the cache. Marking frequently written objects as uncachable reduces this effect.

List of Tables

3.1 Employee table.
3.2 CDN after caching Q1 as an object-list.
3.3 CDN after caching Q1, Q2 as object-lists.
3.4 CDN after caching Q1 as an id-list, before client has requested individual resources.
3.5 CDN after client has requested all individual resources.
3.6 CDN after invalidation of id 1, id-list still matches query predicate.
3.7 CDN after invalidation of id 1, id-list does not match query predicate any more.
5.1 Average overall request response times (ms) for learning model compared to random guessing and static decisions on a read-dominant workload.
5.2 Cache hit rates for learning model compared to random guessing and static decisions on a read-dominant workload.
5.3 Average query response times (ms) for learning model compared to random guessing on execution model on read-dominant workload.
5.4 Average request response times (ms) for learning model compared to random guessing on execution model on write-dominant workload.
5.5 Invalidation loads for learning model compared to random guessing on execution model on write-dominant workload.
B.1 Bayesian optimisation of optimal quantile p and maximum allowed ttl.

Chapter 1

Introduction

In recent years, cloud computing has allowed users to delegate the task of storing and managing data for web-based services. In particular, companies or individual developers can now completely withdraw from the costly task of setting up and maintaining dedicated database servers. Instead, they can utilise database-as-a-service (DBaaS) platforms that offer service level agreements on availability and performance, flexible pricing models and elastic configurations that can quickly allocate virtualised resources [2].

This project assesses how providers of such services can cache content that is updated by users interacting with dynamic applications, e.g. mobile applications or web sites. For instance, a social network application constantly refreshes in order to show the latest content from a user's network. While interacting with such applications, users need to wait on the DBaaS server to deliver new data to their end devices. For an interactive application, imperceptible loading times (ideally below 100 milliseconds) are desirable so the user's experience is not interrupted. However, this often proves problematic if clients and DBaaS servers are located in different geographic regions.¹ Hence, service providers aim to deliver as much content as possible through local cache servers.

¹ A single round-trip between Europe and the United States takes around 170 milliseconds [6].
A good example of an effective local cache server is a content delivery network (CDN). CDNs are often used to deliver static background images and style sheets. Caching dynamic content can nevertheless prove difficult because some content may be updated very frequently, e.g. every few seconds. This is problematic because on every update to any part of the content, the DBaaS provider needs to determine which entries to delete from caches, which is a computationally expensive task for large databases. Further, in order to prevent clients from reading stale content, the DBaaS would have to permanently send out requests to delete old cached content. These issues of maintaining a consistent view of the data for the client while not blocking performance of the backend server generally prevent DBaaS providers from caching volatile content. Consequently, many applications suffer from long loading times.

This study proposes a comprehensive caching scheme for caching dynamic query content. Figure 1.1 provides an abstract view on the suggested caching infrastructure. There are clients in one geographic region and a DBaaS in another geographic region. In a drastically simplified network topology, clients access the geographically closest CDN server to access content, thus minimising latency. By interacting with their applications, clients also continuously update content, e.g. by posting a comment in a social feed. This project thus deals with mechanisms that allow the DBaaS to cache the content of queries that change on the scale of seconds while ensuring high consistency at the client. On a high level, this will be achieved by monitoring access metrics like the frequency of incoming updates, thus allowing for a stochastic view on optimal cache expirations. Further insights with regard to the semantics of caching lead to different caching models. Acting on these results, a machine learning module will provide an effective model for online decisions on incoming queries. Through the combination of semantic insights about caching, stochastic analysis and machine learning, I will therefore present the first comprehensive web caching scheme for highly volatile query content.

[Figure 1.1: Simplified caching architecture with clients in Europe bound by access latency to a backend server in the USA.]

In summary, this work makes the following contributions:

• A comprehensive scheme for caching dynamic query results, thus enabling interactive applications with drastically reduced response times.
• A stochastic method to estimate optimal expiration times for dynamically changing query results.
• An online learning mechanism that can adapt caching models to changing request loads.
• A dedicated Monte Carlo simulation framework which can be used to analyse various properties of query processing.

The structure of my dissertation is as follows: Chapter 2 provides a brief overview on REST-based web caching, Bloom filters, Monte Carlo methods and reinforcement learning, as well as on related work on these topics. Chapter 3 introduces the concept of query caching and the implications of different cache representations. Chapter 4 proposes a machine learning model that provides online decisions on these representations.
In chapter 5, I first explain the implementation of my simulation framework before evaluating different cache representations and the learning model. Finally, chapter 6 summarises my findings and concludes with an outlook on future work.

Chapter 2

Background and Related Work

2.1 Web Caching

2.1.1 Introduction to Web Caching

In this chapter, I provide an overview on some essential concepts concerning web caching, Bloom filters and machine learning. I also supply recent examples of work related to these concepts. In doing so, I assume the reader to be familiar with the basic ideas of web protocols, database management and probability theory.

This section begins with an introduction to web (HTTP) caching. The fundamental challenge of web caching is consistency, i.e. the requirement that content read from a cache be up-to-date. For consistency purposes, there are essentially two types of caches. Expiration-based caches like browser caches, forward proxy caches or ISP caches control consistency through freshness and validation. Freshness is the duration for which a cached copy is considered fresh and can be controlled through "max-age" or "expires" HTTP headers. For instance, if a cached object expires after one minute, then there is a clear one-minute upper limit on how long a client may see old content if the original content is modified. Expiration-based caches can also validate their content by using an "If-modified" header in a refresh request.

On the other hand, invalidation-based caches like content delivery networks or reverse proxy caches are server-controlled caches. That is, the origin server of the content can actively control consistency by deleting (invalidating) content from the cache through a specific HTTP request [7, 8]. A reverse proxy is usually located at the network of the origin server and can be used to hide the structure of the internal network, reduce and distribute load from incoming requests and cache content from the origin server. Note that reverse proxies, due to their location at the origin server, do not aid in mitigating latency caused by access from a geographically distant client. Thus, this project deals primarily with invalidation-based mechanics using the example of CDNs.

Content delivery networks distribute content through a globally distributed network of cache servers (or edge servers). Requests from clients are usually routed to the closest edge server to minimise access latency. There are various types of CDN architectures, network topologies and use cases. CDNs can be used to cache complete websites in the function of a proxy cache, to synchronise and deliver streaming content or to cache the embedded static parts (e.g. stylesheets) of dynamic websites. For this project, the most relevant feature of CDNs is their invalidation mechanism. The origin server generally sets an expiration for the cached content. However, the origin server can also ask the CDN to remove the content through an invalidation request, which means the origin server can actively mitigate reads of stale content from the cache. Clients can also add revalidation headers to their request if they do not want to risk reading stale cache content. This instructs the CDN to request the latest content version from the origin server. The key point here is that this does not enforce strong consistency because the cache does not know about updates at the DBaaS immediately. Instead, there is the notion of eventual consistency, i.e. consistency requirements are relaxed for higher performance (more cache hits, fewer requests sent to the origin server) [9].
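To make the two mechanisms concrete, the following minimal Python sketch shows an expiration-based read and a forced end-to-end revalidation; the endpoint URL is hypothetical and the header values are illustrative, not taken from this project's implementation.

    # Hedged sketch of expiration vs. revalidation, using the requests library.
    import requests

    # Expiration-based caching: the origin attaches "Cache-Control: max-age=60",
    # so caches on the path may serve this response for up to one minute.
    response = requests.get("https://dbaas.example.com/db/employee/1")
    print(response.headers.get("Cache-Control"))  # e.g. "max-age=60"

    # Revalidation: the client asks caches to check with the origin before
    # answering, bypassing possibly stale cached copies.
    fresh = requests.get(
        "https://dbaas.example.com/db/employee/1",
        headers={"Cache-Control": "max-age=0"},  # forces end-to-end revalidation
    )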
Even if the origin server sends an invalidation to the CDN directly after an update, there is a small time window until the invalidation is completed in which clients can read stale content. A read is only guaranteed to be consistent if it adds a revalidation header, thus excluding cached content and increasing load at the origin server. A large part of this work concerns the mechanisms of invalidation for dynamic query content and their implication for overall system performance.

2.1.2 Previous Work

In this section, I briefly survey previous and related work on caching. First, this project relies upon my own previous work on expiration-based caching [6] and on the architecture of scalable cloud-databases [10, 11]. Gessert and I have proposed a comprehensive scheme for leveraging global HTTP caching infrastructures. More specifically, we have introduced the Cache Sketch, a Bloom filter-based representation of database records that is used to enable tunable consistency and performance in web-caching. Throughout this dissertation, I will repeatedly point towards specific aspects of this work (and other related work) in order to clarify my analysis. The primary focus of our previous work was to introduce a proof of concept for dynamic caching of single database records. The contribution of this project is to advance these ideas into a model of caching full query results, as well as adding an online learning component for decision making on query execution. To this end, Monte Carlo methods are employed to analyse the performance of various configurations. Monte Carlo simulations have been used previously to help quantify eventual consistency measures [12, 13, 14].

Recently, Huang et al. have provided an in-depth analysis of a large-scale production caching infrastructure by looking at Facebook's photo cache [15, 16]. Even though this example contains some photo-specific problems (resizing), it still contains relevant insights. Pictures are essentially read-only content, and the challenge in an infrastructure at the scale of Facebook's photo cache lies in the huge data volume. Nevertheless, their work can provide an understanding of typical workloads and achievable cache hit rates. Apart from this recent work, there is an extensive body of research on the nature of internet content delivery systems [17, 18] and their workloads [19]. Recent research has also looked into content delivery networks (CDN) and their role in dealing with sudden popularity ("flash crowds") of social media content as well as with geographically distributed workloads [20, 21, 22].

This work aims to provide low latency through exploiting existing HTTP caching infrastructures. Another popular approach, which however requires additional infrastructure, is geo-replication. Instead of caching data on geographically distributed edges of a CDN infrastructure, the database system itself is globally distributed [23, 24]. A primary example of this is Google's Spanner [25]. Data is replicated across datacenters, and serialisation of distributed transactions is managed through globally meaningful commit timestamps. This enables globally-consistent reads across the database for a given timestamp. The main performance issue stems from the fact that synchronisation between data centers is costly, as it is bound by round-trip delay time between geographic regions.
Finally, there have been some previous efforts into scalable query caching. Garrod et al. have achieved high cache hit rates by using proxy servers with a distributed consistency management model based on a publish/subscribe invalidation architecture [26]. There have also been some efforts into adaptive time-to-live (ttl) estimation of web-search results [27]. This work separates itself from previous work in multiple aspects. First, it uses existing HTTP infrastructure and does not require additional dedicated servers for caching. Employing stochastic models, this work provides a record-level analysis of query results to provide much more fine-grained ttl estimates. Furthermore, the online learning model can achieve tunable trade-offs between average query response time, consistency and server load by changing execution models for queries at runtime.

2.2 Bloom Filters

Bloom filters are space-efficient probabilistic data structures that allow membership queries on sets with a certain false positive rate [28]. A Bloom filter represents a set S of n elements through a bit array of length m. It also requires k independent hash functions h_1, ..., h_k with range 1, ..., m that map each element uniformly to a random index of the bit array. To save an element s ∈ S to the Bloom filter, all k hash functions are computed independently and the appropriate indices in the bit array are set to 1 (and stay 1 if they were already set from another insert), as seen in figures 2.1 and 2.2.

[Figure 2.1: Empty Bloom filter of length m.]

[Figure 2.2: Insertion of new element e into Bloom filter.]

A membership query can then be performed by again computing the hash functions and looking up if all k result indices are set to 1. This means that a false positive occurs through hash collisions if inserts from other elements have already set the relevant bits. An extension of this concept is the counting Bloom filter, which has counters instead of single bits, thus also enabling the deletion of elements through decreasing the counter (which could cause a false negative with a single bit). It can then be shown that the false positive rate can be approximated as follows [29]:

    f = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^k \approx \left(1 - e^{-kn/m}\right)^k    (2.1)

The implication of being able to determine the false positive rate as a function of the expected number of objects n, the length m and the number of hash functions k is that Bloom filters are precisely tunable, i.e. the size can be controlled according to the false positive rate. Bloom filters have found particular use in networking applications, as they can be transferred quickly due to their compact representation [30, 31, 32].
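To make the insert and lookup mechanics concrete, the following is a minimal Bloom filter sketch in Python; deriving the k indices via double hashing over a single digest is one common construction and an assumption for this example, not the scheme used in the Cache Sketch.

    import hashlib

    class BloomFilter:
        """Minimal Bloom filter sketch: a bit array of length m with k
        indices per element, derived by double hashing over an MD5 digest."""

        def __init__(self, m: int, k: int):
            self.m, self.k = m, k
            self.bits = [0] * m

        def _indices(self, element: str):
            # k pseudo-independent indices h_i(e) = (a + i*b) mod m from
            # two halves of one digest (Kirsch-Mitzenmacher technique).
            digest = hashlib.md5(element.encode()).digest()
            a = int.from_bytes(digest[:8], "big")
            b = int.from_bytes(digest[8:], "big")
            return [(a + i * b) % self.m for i in range(self.k)]

        def add(self, element: str):
            for idx in self._indices(element):
                self.bits[idx] = 1  # bits stay 1 if already set

        def might_contain(self, element: str) -> bool:
            # True may be a false positive; False is always correct.
            return all(self.bits[idx] for idx in self._indices(element))

    bf = BloomFilter(m=1024, k=3)
    bf.add("employee/1")
    assert bf.might_contain("employee/1")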
2.3 Monte Carlo Methods

Monte Carlo methods are a set of computational techniques that are used to approximate distributions in experiments through repeated random sampling. Monte Carlo simulations are widely employed in the physical sciences to model and understand the behaviour of probabilistic systems. They essentially rely on the law of large numbers, i.e. the expectation that the sample mean over a sufficient number of inputs will approximate the actual mean of the target distribution [33]. There are three central components to Monte Carlo simulations [34]:

(1) A known input distribution for the system.
(2) Random sampling from the input distribution and simulation of the system and its conditions of interest under the sampled inputs.
(3) Numerical evaluation of the aggregated results.

A generic approach to Monte Carlo simulation is the construction of a Markov Chain that converges to a target density equal to the distribution of interest. This is particularly relevant to the simulation of complex multivariate distributions. Consequently, there is an extensive body of research on sampling methods, notably Gibbs sampling and the Metropolis-Hastings algorithm [35, 36].

Monte Carlo simulation is also useful in the analysis of distributed systems and caching infrastructures. In particular, Monte Carlo simulation of access and latency distributions enables detailed analysis of caching behaviour, as it can quantify the impact of small changes in latency and workload on performance. Fortunately, simulation of database workloads can be achieved by drawing a key for a database entry to access from a univariate discrete distribution. An easy way to do this is the inverse integral transform method, which will be introduced briefly [37]. Consider a discrete random variable X to sample from and its probability mass function

    f_X(X_j) = \Pr(X = X_j) = p_j, \quad j = 1, 2, \ldots, \qquad \sum_j p_j = 1,    (2.2)

as well as its cumulative mass function

    \Pr(X \le X_j) \equiv F(X_j) = p_1 + p_2 + \ldots + p_j.    (2.3)

The inverse then takes the form

    F^{-1}(u) = X_j \quad \text{if} \quad p_1 + p_2 + \ldots + p_{j-1} \le u \le p_1 + p_2 + \ldots + p_j.    (2.4)

Hence, the discrete distribution can be sampled by drawing a sample U from a distribution uniform on (0, 1) and then computing the inverse F^{-1}(U), as described by Chib [38]. It thus follows that one can sample X_j with its probability p_j because

    \Pr(F^{-1}(U) = X_j) = \Pr(p_1 + \ldots + p_{j-1} \le U \le p_1 + \ldots + p_j) = p_j.    (2.5)

In the Monte Carlo simulation framework, the inverse integral transform is used because it is computationally inexpensive and provides good accuracy.
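A minimal Python sketch of inverse transform sampling over a discrete key distribution follows; the Zipf-like popularity weights are an illustrative assumption, not a workload from the evaluation.

    import bisect
    import random

    def build_sampler(weights):
        """Precompute the cumulative mass function F for inverse transform
        sampling; weights[j] is proportional to Pr(X = j)."""
        total = sum(weights)
        cmf, acc = [], 0.0
        for w in weights:
            acc += w / total
            cmf.append(acc)

        def sample():
            # Draw U ~ Uniform(0, 1); return the smallest j with F(j) >= U.
            return min(bisect.bisect_left(cmf, random.random()), len(cmf) - 1)

        return sample

    # Example: Zipf-like popularity over 1,000 database keys.
    sample_key = build_sampler([1.0 / (rank + 1) for rank in range(1000)])
    keys = [sample_key() for _ in range(5)]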
2.4 Machine Learning

2.4.1 Reinforcement Learning

Reinforcement learning (RL) is a machine learning technique that is characterised by software agents that interact with an environment and learn optimal behaviour through rewards on the actions they take [39]. Initially, the agent does not know how its actions change its environment and thus has to explore the space of available actions (as schematically depicted in figure 2.3). Hence, RL does not require an explicit analytical model of the environment.

[Figure 2.3: Reinforcement learning: An agent takes actions and observes new states and rewards through its environment.]

More precisely, RL is a form of sequential decision making. The goal of the agent is to select actions that maximise the sum of all future rewards. A reward is a scalar feedback value the agent receives after taking an action. Rewards can be stochastic and delayed, thus making it harder for the agent to reason about the consequences of its actions. For instance, a single move in a board game during the beginning of a match might have consequences that only become apparent after one player wins. Variations of RL have been used in various applications, notably navigation in robotics [40, 41] and complex board games [42, 43, 44, 45].

Formally, RL problems can be understood as Markov decision processes (MDPs). A finite MDP has four elements [46, 39]:

(1) A set of states S.
(2) A set of actions A.
(3) For a given pair of state and action (s, a) at some point in time t, a transition probability of possible next states s':

    P^a_{ss'} = \Pr\{s_{t+1} = s' \mid s_t = s, a_t = a\}.

(4) The associated expected reward for a transition from s to s' through a:

    R^a_{ss'} = E\{r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s'\}.

A policy then maps states to actions that presumably maximise rewards. In general, RL techniques aim to find optimal policies for a given MDP by iteratively improving upon their current estimates for state and action pairs as they observe rewards. A popular RL method is Q-learning, which can learn optimal policies by comparing expected cumulative rewards (Q-values) in environments with stochastic transitions and rewards [47]. This is achieved by updating a function Q : S × A → R:

    Q_{t+1}(s_t, a_t) \leftarrow Q_t(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q_t(s_{t+1}, a) - Q_t(s_t, a_t) \right]    (2.6)

Intuitively, initial Q-values are adjusted by combining the observed reward r_{t+1} after taking a transition with the value of the action that is estimated to maximise future rewards. Updates are parametrised through a learning rate α and a discount factor γ that prohibits infinite rewards in state-action loops.

A central component of RL is the trade-off between exploitation and exploration. During learning, the agent needs to explore its environment by trying out actions that are non-optimal under its current policy to find whether these state-action sequences lead to higher overall rewards than following its current policy. Typically, this exploration rate is decreased over time so the agent eventually primarily exploits its found policy.

Reinforcement learning in small (finite) state and action spaces is a well-understood application of finite Markov decision processes [39, 48, 47, 49]. For large state and action spaces, policies cannot be expressed as simple lookup tables of actions and associated rewards on transitions. Hence, function approximators are frequently employed to estimate rewards [50, 42]. Specifically, the rise of deep neural networks has recently inspired novel research, as neural networks have been successfully applied to approximate delayed rewards in complex noisy environments [51, 52]. Other RL approaches from Bayesian learning employ Gaussian processes to estimate rewards [53, 54].
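As a sketch of equation 2.6, the following Python fragment implements one tabular Q-learning update with ε-greedy action selection; the state and action types are placeholders for illustration, not the caching MDP developed in chapter 4.

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration
    ACTIONS = ["object-list", "id-list"]   # placeholder action space

    Q = defaultdict(float)  # Q[(state, action)], initialised to 0

    def choose_action(state):
        # Epsilon-greedy: explore with probability EPSILON, else exploit.
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def update(state, action, reward, next_state):
        # Equation 2.6: move Q towards the observed reward plus the
        # discounted value of the best action in the successor state.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        td_error = reward + GAMMA * best_next - Q[(state, action)]
        Q[(state, action)] += ALPHA * td_error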
2.4.2 Machine Learning in Data Management

In recent years, many scientific disciplines have begun to investigate machine learning methods as a new tool for research in their domains. In database management and cloud computing, a particularly interesting problem is the question of how to adapt behaviour to changing workloads. Traditional rule-based or threshold approaches to resource allocation in compute clusters (e.g. provision a new server if load is over a given percentage of capacity) can be replaced by online learning strategies [55, 56, 57]. Such improvements can be practically achieved by implementing a middleware control layer that tracks data flow in real-time on top of web-based services. For instance, Angel et al. have demonstrated how to provide throughput guarantees for multi-tenancy clusters by network request header inspection [58]. Alici et al. have explored machine learning based expiration estimation strategies for search engine results, which however rely on offline training data to build a set of features and cannot adapt to highly dynamic workloads [27]. This is because their approach works on a much larger time-scale of months, whereas this work provides a learning model that recognises workload changes in a matter of minutes.

The advantage of the middleware approach is that it allows for a more generic and transferable learning process, as opposed to interfering with the underlying application to achieve more control over specific configurations. In this project, I also opt for a middleware approach and treat the database as a query execution service to the learning model. This way, the concept is not limited to specific query languages or database paradigms but relies only on properties of request distributions that are interpreted as stochastic processes.

Chapter 3

Caching Queries

3.1 Introduction

This chapter contains an in-depth description of the query caching scheme. First, the challenges of caching dynamic content are discussed in detail. Next, different representations for caching queries are suggested. Further, the problem of determining which queries are invalidated by an update is considered. Finally, a stochastic method to determine optimal expirations for query results is introduced.

3.1.1 The Latency Problem

The framework of assumptions is a database-as-a-service provider exposing its API through an HTTP/REST interface, e.g. Facebook's popular Parse platform [59]. Understanding the structure of modern web or mobile applications helps to see the importance of latency. In general, there are two aspects to the performance of a web-based application. First, there is an initial loading period when the browser has to request all external resources and build the Document Object Model of the application. The duration of this so-called critical rendering path depends on the number of critical external resources, their size and the time it takes to fetch them from a DBaaS, a CDN or a proxy cache. Loading times hence depend on the number of round-trips and on the round-trip latency. Static resources like JavaScript libraries or background images are thus cached on all levels of the web-caching hierarchy. Second, there is dynamic and interactive content that has to be requested by the end device while the user is interacting with the application. Single-page applications are a typical form of this interaction. In a single-page application, all navigation happens within a single website that is never left but dynamically changed depending on user actions [60]. Considering mobile applications running with DBaaS platforms, application logic is often executed on the client side (smart client), whereas the server is primarily a data management service. Consequently, user experience critically depends on low latencies of all interactions with the DBaaS, which includes minimising the number of geographically-bound latency round-trips as well as maximising cache hits in the caching hierarchy.

3.1.2 The Staleness Problem

The latency problem cannot be solved by simply pushing as much dynamic content as possible into various layers of the caching hierarchy. Without further measures, writes would continuously flush the caches. This creates problems for both clients and servers. First, determining which objects are potentially alive in which layer of the caching hierarchy and sending invalidation requests creates load on the DBaaS. Further, every invalidation creates a potential stale read for the client. A stale read occurs in the following situation:

• A client sends an update on some object and gets an acknowledgement at some point t_0 for version v_w.
• At some later point in time t_1, a client requests the same object.
• The cache returns the object with some version v_r.
• If the write-acknowledged version v_w is newer than the version v_r read from the cache, the read is stale.
How can the cache return an older version if the server already acknowledged the write of a newer version? This is because invalidation is generally an asynchronous operation. The DBaaS executes an update and waits for a write-acknowledgement from the database. It then sends an acknowledgement of the write back to the client and an invalidation to the appropriate caching layers. Blocking on invalidations is not feasible because an invalidation could get lost (e.g. network partition) and the DBaaS would lose availability. Further, a single cached object might have to be invalidated over multiple geographic locations, thus potentially incurring multiple expensive round-trips.

This work thus aims for a best-effort view on eventual consistency. First, if the cached content expires before a write, there can be no stale read. Second, even if a cached item has not expired, there cannot be a stale read if the invalidation is fast enough and there is enough time between an update and a subsequent read. There is an inherent trade-off between providing results both in a consistent and timely manner. This notion of eventual consistency has been thoroughly investigated by Bailis et al. in their seminal work on probabilistically bounded staleness [12, 61, 62, 13]. In particular, they were able to provide expected bounds on staleness for certain request distributions. In summary, the problem of staleness and invalidation load makes it prohibitively expensive to cache dynamic query content. To the best of my knowledge, DBaaS providers and other web-services thus refrain from caching their interactive content.

3.1.3 Model Assumptions and Terminology

Previous work has proposed a cache-coherent scheme to cache volatile objects [6], as summarised in chapter 2. This work dealt exclusively with simple create, read and update operations on individual objects. However, at least regarding the content of the requests, this was rather a proof-of-concept, as actual queries usually do not just request single database objects. This project thus turns to investigate query-specific problems. The first insight of the query-caching scheme is to acknowledge that there are multiple ways to execute queries and represent query results. Before introducing these different representations, it is worth discussing some model assumptions.

As explained in the introduction on web-caching, there is generally a whole hierarchy of expiration and invalidation-based caches. This work concentrates on the specific interaction of clients and servers with an invalidation-based cache, i.e. a CDN. When I use the term "the cache" in the remainder of this work, I am referring to an instance of an invalidation-based cache. Note that a CDN has multiple points of presence, ideally one in all major geographic regions. It is a valid reduction to investigate caching behaviour on a single edge-server. First, for a given client, requests will usually be routed to the same edge-server in a CDN infrastructure, i.e. the one that minimises (geographically bounded) round-trip latency. Second, consider the hypothetical case of a client whose HTTP requests are randomly routed to one of multiple edges. A write operation still only causes a single invalidation request to be sent out by the DBaaS. This is because the DBaaS does not have to send invalidation requests to every edge-server of a CDN.
Instead, an invalidation request is only sent to the closest CDN edge. The invalidation can then be distributed through a bimodal multicasting protocol [63, 64]. The important insight here is that the invalidation load for the DBaaS does not depend on the number of CDN edges (whereas it grows linearly for reverse proxies). Similarly, routing queries to different edge-servers will lead to a lower cache hit rate on the individual edges, which can easily be simulated on a single cache by adjusting query parameters. I thus consider the abstraction of using a single invalidation-based cache to be feasible for the analysis of query-caching behaviour.

Furthermore, the term "database object" needs to be clarified. An object refers to a single entry in the database, which can be a single row in a relational model, a JSON-style document in a document database or a serialised string in a key-value store. The usage of the term "object" is primarily motivated through the fact that the DBaaS server represents database entries as REST-ful resources after retrieving them from the database. This illustrates the point that the proposed caching scheme is independent from the database employed by the DBaaS. Naturally, the performance of the system will vary depending on whether the chosen database matches the requirements of the workload. MongoDB, a wide-spread document database that is based on JSON-style documents as the basic record [65], is used in the evaluation. MongoDB organises documents in collections (roughly equivalent to a relational table) and is popular for its scalability and flexibility to store schema-free data.

Finally, this work assumes a large cache, so that performance does not depend on cache size or eviction algorithms. It is clear that a smaller cache leads to systematically lower cache hit rates for certain request distributions. Incorporating this additional degree of freedom is hence not particularly interesting to this study. Nevertheless, the impact of a limited cache size will be factored into the discussion of uncachable objects.

3.2 Caching Models for Queries

3.2.1 Caching Object-Lists

The first query-caching model is the naive approach of caching query results as complete collections of result-objects, i.e. a single entry in the cache maps a query to its result. The processing flow of this model is fairly straightforward. A client issues a query that initially reaches the closest CDN edge. In the case of a cache miss, the CDN forwards the request to the DBaaS, which evaluates the query. It then estimates a time-to-live for the query result, the mechanics of which will be discussed later. The result is then returned to the CDN, added to the cache and finally returned to the client. For all subsequent requests with the same query, a single round-trip to the CDN is sufficient to retrieve the whole query result, as long as the result has not expired or has been invalidated. The CDN simply checks the hashes of incoming queries for a match in its table of cache entries. In terms of minimising response time, this is optimal from the client's perspective. Note that the CDN is agnostic towards the content of its entries. It cannot recognise that some query's result set is a subset of another cached query result. That would require both semantic insights into the nature of the queries and knowledge about the structure of the database's values. This would essentially require a geo-replicated database to locally validate the similarity of queries, which is a different caching paradigm.
However, this project specifically aims to exploit readily available HTTP caching infrastructure that does not require multiple dedicated DBaaS-server locations.

This model can be illustrated with a simple example. For ease of reading, I use a relational table and a SQL query, which I will assume the reader to be familiar with. As pointed out above, even though DBaaS-queries are typically abstracted to short method calls and often query NoSQL databases, the mechanics of my caching scheme do not rely on a specific database or query paradigm. Consider a drastically simplified employee table that only contains an id as its primary key and a salary, as seen in table 3.1.

Id   Salary
1    20,000
2    25,000
3    30,000
4    50,000

Table 3.1: Employee table.

A query Q1 now selects all employees with salaries under a certain limit:

SELECT * FROM employee WHERE salary < 30000

Consequently, the CDN will store a mapping from Q1 to the result set, as seen in table 3.2. The cache is conceptualised as a simple hash table, i.e. the key Q1 refers to its hash.

Key   Value
Q1    { {id: 1, salary: 20,000}, {id: 2, salary: 25,000} }

Table 3.2: CDN after caching Q1 as an object-list.

If another similar query Q2 is evaluated, the CDN blindly caches intersecting results separately:

SELECT * FROM employee WHERE salary > 22000

The query Q2 will now leave the cache in the state seen in table 3.3.

Key   Value
Q1    { {id: 1, salary: 20,000}, {id: 2, salary: 25,000} }
Q2    { {id: 2, salary: 25,000}, {id: 3, salary: 30,000}, {id: 4, salary: 50,000} }

Table 3.3: CDN after caching Q1, Q2 as object-lists.

Now consider a write on some object that is part of a cached query result. Since the query result was cached as one big object (i.e. a single list of database objects), the whole result is invalidated. In the example, both entries for Q1 and Q2 are removed from the cache if the object with id 2 is updated. Depending on the workload, this can lead to drastically reduced cache hit rates, as a single update could empty the whole cache. However, if result sets of queries have mostly empty intersections, writes invalidate fewer results and increase cache performance. Note that determining the result sets that need to be invalidated is a potentially expensive task on its own, which will be discussed later in this chapter.

3.2.2 Caching Id-Lists

An alternative option to caching query results is the id-list model. Assuming an empty cache, the first difference of this model from the object-list approach is the actual query execution. Instead of executing the query in full and retrieving complete database objects, the query is intentionally executed to only return the ids (or keys) of matching objects. This can improve query cost, as the query can potentially be executed as a so-called covered query. A query is covered if an index covers it: if all fields requested in the query are part of an index and all result fields are also part of that index, the query can be executed by querying the index alone. The index is typically located in the RAM of the database server and thus significantly faster than disk reads. This is an established technique for query optimisation and routinely offered by databases [66]; a short sketch follows after table 3.4. The DBaaS then returns this list of ids to the CDN, which creates an entry for it. Reusing the previous example with an initially empty cache, the cache is now in the state seen in table 3.4 after executing Q1.

Key   Value
Q1    { {id: 1}, {id: 2} }

Table 3.4: CDN after caching Q1 as an id-list, before client has requested individual resources.
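As an illustration of the covered-query idea, the following pymongo sketch executes Q1 as an id-only projection; the connection string, database and collection names, and the compound index are assumptions for the example, not details prescribed by the caching scheme.

    # Hedged sketch: id-list execution of Q1 against MongoDB via pymongo.
    from pymongo import ASCENDING, MongoClient

    client = MongoClient("mongodb://localhost:27017")  # assumed local instance
    employees = client.exampledb.employee

    # A compound index on (salary, _id) lets MongoDB answer the id-only
    # projection below from the index alone, i.e. as a covered query.
    employees.create_index([("salary", ASCENDING), ("_id", ASCENDING)])

    # Q1 as an id-list: match on salary, project only _id, fetch no documents.
    id_list = [doc["_id"] for doc in
               employees.find({"salary": {"$lt": 30000}}, {"_id": 1})]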
Finally, the id-list is passed back to the client. Note that this already incurred a full round-trip to the DBaaS without delivering any actual result-objects. The client then starts requesting the individual REST resources identified by their ids in the list of results, leaving the CDN in the state shown in table 3.5.

Key   Value
Q1    { {id: 1}, {id: 2} }
1     {id: 1, salary: 20,000}
2     {id: 2, salary: 25,000}

Table 3.5: CDN after client has requested all individual resources.

In the worst case, this incurs another full round-trip to the DBaaS for every individual resource. How is this model useful if it can involve so many expensive round-trips? There are two potential sources of cache hits. First, every time the client requests one of the resources from the id-list from the CDN, there is a potential cache hit on that resource. This is because the cache is potentially "prewarmed" by other queries with intersecting result sets. Second, if the client issues the same query again, the CDN can return the id-list (which is a separate cache entry), saving a round-trip to the DBaaS. Furthermore, the client does not necessarily request the individual resources sequentially, but will usually do so in parallel. I will later explore the impact of parallel connections as part of the cost of caching queries in the online learning model. In a best-case scenario, the client thus needs one round-trip to fetch the id-list from the CDN and one round-trip to fetch the (also cached) individual resources in parallel from the CDN. This seems an unintuitive choice, since the lower bound on latency is clearly higher than for the object-list model, which only needs one round-trip to the CDN to look up the query result in its best case.

The advantage of the id-list model becomes more apparent upon consideration of its invalidation mechanics. In the framework of the example, Q1 selected for employees with salaries below 30,000. Now consider an update that changes the salary of employee 1 from 20,000 to 21,000. The DBaaS now needs to invalidate resource 1 in the CDN, but it does not need to invalidate the id-list, as the same objects still match the query predicate. In the example, the invalidation of id 1 would leave the CDN in the state of table 3.6.

Key   Value
Q1    { {id: 1}, {id: 2} }
2     {id: 2, salary: 25,000}

Table 3.6: CDN after invalidation of id 1, id-list still matches query predicate.

The point of this model is that the id-list contains the information which objects match the query predicate, whereas the concrete objects are cached separately. The key advantage compared to the object-list model is thus that a single update only invalidates entries from the cache that have actually changed, as opposed to invalidating a whole list of objects. If object 1 had been updated to a salary over 30,000, this would have invalidated both the id-list and the resource, as seen in table 3.7.

Key   Value
2     {id: 2, salary: 25,000}

Table 3.7: CDN after invalidation of id 1, id-list does not match query predicate any more.
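This invalidation distinction can be summarised in a few lines of Python; the sketch below is a simplification that models the cache as a plain dictionary and passes the query predicate as a function, both assumptions for illustration only.

    def invalidate_on_update(cache, query_key, predicate, old_obj, new_obj):
        """Sketch of id-list invalidation: the updated resource is always
        invalidated, the id-list only if the predicate result changed."""
        cache.pop(old_obj["id"], None)  # the object itself is now stale

        # The id-list stays valid iff the update does not change which
        # objects match the query predicate (cf. tables 3.6 and 3.7).
        if predicate(old_obj) != predicate(new_obj):
            cache.pop(query_key, None)

    cache = {"Q1": [1, 2],
             1: {"id": 1, "salary": 20000},
             2: {"id": 2, "salary": 25000}}
    q1 = lambda obj: obj["salary"] < 30000

    # Salary 20,000 -> 21,000: resource 1 invalidated, id-list kept (table 3.6).
    invalidate_on_update(cache, "Q1", q1,
                         {"id": 1, "salary": 20000}, {"id": 1, "salary": 21000})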
Note that in the new HTTP/2 standard, multiplexing and server push can make the id-list an optimal choice for all workloads, since round-trips would be the same as for the object-list model [67]. So far, I have not explained how the DBaaS server detects if an update invalidates a query-predicate. In the following sections, I will outline how the task of query invalidation is a key factor in the performance of the caching scheme. 3.2.3 Matching Queries to Updates Matching queries to updates is necessary to determine which result sets are not valid any more. I begin by describing the invalidation mechanism for caching individual volatile objects, as described by Gessert et al. [6]. A key point to understanding the invalidation process is remembering the distributed nature of a DBaaS-infrastructure. In principle, there are both multiple cache edges as well as arbitrarily many instances of the DBaaS middleware server, interfaced for instance through an elastic load balancer. This has the following implication to invalidation: In a system with more than one DBaaS server, each individual server does not have sufficient information for invalidation. An invalidation is only necessary if the object is cached, i.e. if it was read from the DBaaS previously. The problem is that reads and writes may be be processed by difference server instances. That means a server receiving a write request cannot know on its own whether the object might be cached from a read to another server. Hence, there needs to be a central lookup-service that keeps track of cached objects and their expirations. Any centralised service is a potential performance bottleneck. Gessert et al. found an efficient solution by using Redis-backed Bloom filters [10, 68]. Every time an object is read and the DBaaS decides to cache with a certain 26 time-to-live estimation, it reports the key of the object and the ttl to the Bloom filter. The Bloom filter is implemented to always keep track of the longest expiration. If different DBaaS servers have different local estimates of an optimal expiration, the Bloom filter keeps the ttl of the longest absolute expiration in the future. Thus, whenever an update is processed, the DBaaS can query the Bloom filter. If it has an entry for the key, the object is potentially cached at some edge-server of the CDN. The DBaaS then deletes the object from the Bloom filter and requests an invalidation. If the object has already expired from the cache, it also has expired from the Bloom filter, since all estimated expirations are reported. This way, invalidations are only requested when they are actually necessary, with some small false positive rate through the Bloom filter. This approach cannot be used for matching updates to queries because the relation between updates and affected queries is one-to-many. The DBaaS has no way of knowing which entries to query from the Bloom filter on an update, so it has to try to match updated objects to result sets. In principle, the DBaaS can hold the id-lists of all cached query results in memory, which is suitable for Monte Carlo simulations. For practical purposes, a distributed stream processing engine like Apache Storm [69] might be appropriate for query matching. For every update, after images of the write operation can be streamed into Storm, which evaluates them against queries and result sets. 
If the after image of a write does not match result sets containing the changed object, the queries belonging to the respective result sets need to be invalidated from the cache, as illustrated in figure 3.2. Practically, a load balancer routes requests to various instances of the DBaaS server. Each instance communicates with the database cluster to execute queries and updates. On every update, an invalidation engine is consulted to determine which cached query results have become stale. Finally, a central Bloom filter service is consulted to look up if the stale result is still potentially cached before sending out an invalidation. An overview of this architecture can be found in figure 3.1. The implementation of the matching algorithm will depend on the database-paradigm. For instance, a 27 Clients (end devices) requests CDN edge 1 CDN edge ... CDN edge n Distributes requests in network Load balancer DBaaS instance 1 DBaaS instance ... Look up records DBaaS instance n Find stale results 0 1 0 1 0 Central Bloom filter Query matching engine DB cluster Figure 3.1: Query matching architecture overview. A load balancer distributes requests from caches. An invalidation engine determines which query results are stale. Bloom filters can then be used to determine whether they are still cached at some CDN edge. 28 document database like MongoDB represents objects as JSON-documents. There are specific libraries to evaluate MongoDB queries on JSON-documents [70], thus enabling the matching of after-images to queries. Instead of going into more detail on how to achieve query-matching for specific databases, some high-level comments on the role of invalidation in the caching-scheme are necessary. Generally, any matching system will only be able to handle a certain throughput. A possible perspective on this limit would to be to consider the matching throughput a resource that needs to be leveraged optimally for overall performance. This naturally leads to the question of when it is not feasible to cache an object or query, which I will briefly discuss in the following section. 3.2.4 When Not to Cache From the client’s perspective, reading a cached copy is naturally desirable. Nevertheless, there are situations when it is impractical for the DBaaS to cache objects. Entries that are (almost) exclusively written should not be cached. This would increase the risk of stale reads and importantly cause a high invalidation load. This notion of observing write and read frequencies is employed in the estimation of expiration for query results in the next section. However, there is another relevant aspect to the cost of caching. The invalidation of a single resource comes at predictable computational cost, i.e. a (constant time) Bloom filter lookup to determine whether the resource is cached. In contrast, the matching cost of determining which queries need invalidation is practically unbounded, as an object might be part of arbitrarily many cached query results. This creates another decision problem for the DBaaS. It does not only need to decide which caching model to use for each query, it also needs to make economical decisions not to cache some queries depending on the matching cost. 29 After-image of update Update spout Matching bolt Update spout Evaluate cached queries on after-image for changed result Determine bolt with relevant queries Matching bolt Output invalidated queries DBaaS Sends invalidations Figure 3.2: Topology of an Apache Storm invalidation pipeline. 
After-images of update operations are published to Storm spouts (data stream endpoints). They determine which bolt (stream processing node) holds the cached queries related to that update. Bolts evaluate the queries on the after-image to find which result sets are invalid and notify the DBaaS, which sends invalidations to the cache.

3.3 Estimating Expirations

3.3.1 Approximating Poisson Processes

I now turn to discussing the estimation model for cache expirations. For now, the problem of estimating an optimal ttl for a query result is treated separately from the question of whether to represent the query result as an object-list or an id-list. Remember, the goal of estimating expirations for queries is to find an optimal trade-off between invalidation load and cache hits while also minimising stale reads. Ideally, a cached item will expire right before an update at the DBaaS, so there is no matching cost. My approach to estimating durations for result sets of queries approximates query behaviour through Poisson processes. Poisson processes count the occurrences of events in time intervals and are characterised by an arrival rate λ and a time interval t. For a Poisson process, the inter-arrival times of events are exponentially distributed, i.e. each of the independent and identically distributed random variables X_i has the cumulative distribution function

F(x; \lambda) = 1 - e^{-\lambda x} \quad \text{for } x \ge 0 \qquad (3.1)

and mean 1/λ. The probability of n arrivals in some interval (0, t] is then given by the Poisson probability mass function (PMF) [71]:

p_{N(t)}(n) = \frac{(\lambda t)^n e^{-\lambda t}}{n!} \qquad (3.2)

The DBaaS can only approximate the λ of the write process. For each database entry, the DBaaS can track the rate of incoming writes λ_w in some time window t. The expected time of the next write is then 1/λ_w. However, the Poisson process of reads and queries is only partially observable, as the DBaaS only receives cache misses on queries and reads. In previous work, expirations for single records were estimated by comparing miss rates and write rates to compute quantiles on write probabilities [6]. How can expirations for complete result sets be estimated? The result set of a query Q of cardinality n can be conceptualised as a set of independent exponentially distributed random variables X_1, \ldots, X_n with different write rate parameters \lambda_{w_1}, \ldots, \lambda_{w_n}. Estimating the expected time-to-live before one of the objects is written requires a distribution that models the minimum time to the next write, i.e. min{X_1, \ldots, X_n}, which is again exponentially distributed (proof in appendix A):

\min\{X_1, \ldots, X_n\} \sim \text{Exponential}\left(\sum_{i=1}^{n} \lambda_i\right) \qquad (3.3)

Hence, the DBaaS can simply compute λ_min as the rate parameter for Q by summing up write rates on individual records:

\lambda_{min} = \lambda_1 + \ldots + \lambda_n \qquad (3.4)

It is questionable whether cache miss rates should be tracked and compared to write rates, as proposed in previous work. Ultimately, DBaaS providers are interested in the workload mixture of reads/queries and writes on a given table or collection. For instance, if the workload is dominated by write operations, ttls should be estimated rather conservatively to reduce invalidations and stale reads. However, if the read process cannot be directly observed, there are two options. First, the model can simply rely on writes. Second, the model can try to approximate the workload mixture of reads and writes through various measures. In the remainder of this section, I will outline both alternatives. Further, I will comment on some practical issues of real-time monitoring at the end of this chapter.
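To close this subsection, a minimal sketch of the rate aggregation in equation (3.4), assuming per-record write rates are estimated from moving windows of write timestamps (the class names are illustrative, not the thesis implementation):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;

    // Estimates a per-record write rate lambda_w (writes per second) from a
    // moving window of write timestamps.
    final class WriteRateEstimator {
        private final Deque<Long> writeTimes = new ArrayDeque<>();
        private final long windowMillis;

        WriteRateEstimator(long windowMillis) { this.windowMillis = windowMillis; }

        void recordWrite(long nowMillis) {
            writeTimes.addLast(nowMillis);
            evict(nowMillis);
        }

        // Drop timestamps that have fallen out of the moving window.
        private void evict(long nowMillis) {
            while (!writeTimes.isEmpty()
                    && writeTimes.peekFirst() < nowMillis - windowMillis) {
                writeTimes.removeFirst();
            }
        }

        double lambda(long nowMillis) {
            evict(nowMillis);
            return writeTimes.size() / (windowMillis / 1000.0); // writes per second
        }
    }

    final class QueryRateAggregator {
        // Equation (3.4): the rate parameter of a query result is the sum of
        // the write rates of its member records.
        static double lambdaMin(List<WriteRateEstimator> members, long nowMillis) {
            double sum = 0.0;
            for (WriteRateEstimator e : members) sum += e.lambda(nowMillis);
            return sum;
        }
    }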
3.3.2 Write-Only Estimation

From a scalability perspective, tracking miss rates on every database record can be too expensive. Instead, one could ignore the read proportion of the workload and base the ttl estimation simply on the probability of the next write. This requires the inverse CDF (or quantile function) of the exponential distribution parametrised by λ_min to estimate expirations. The quantile function then provides time-to-lives that have a probability p of seeing a write before expiration:

F^{-1}(p; \lambda_{min}) = \frac{-\ln(1 - p)}{\lambda_{min}} \qquad (3.5)

Using the median inter-arrival time of writes (p = 0.5) then gives a straightforward ttl estimate for the result set of a query:

F^{-1}(0.5; \lambda_{min}) = \frac{\ln(2)}{\lambda_{min}} \qquad (3.6)

The problem with this approach is that it does not provide a good intuition about the trade-off between cache hit rate and latency. It completely ignores whether a workload mixture consists primarily of reads or is dominated by writes. Fundamentally, service providers need to determine how many potential cache hits they are willing to trade for one invalidation that carries the risk of a stale read with it. The expected reduction in invalidations is (1 − p) · writes: for p = 1, every write is expected to cause an invalidation; for p = 0 (no caching), no object is invalidated. If the model completely ignores reads, it might not be flexible enough to deal with changing workloads. For instance, to instantly increase cache hits and thus reduce database load, p could be increased to e.g. 0.75. Developers could also specify a p for a given table or collection of documents by choosing from predefined options. This is somewhat unsatisfying, as developers cannot realistically be expected to be aware of the detailed tuning mechanisms in the caching infrastructure. Another possible issue with this model is that it performs differently depending on the chosen cache model. For an object-list, anticipating the next write on the result set is sensible, as any write invalidates the whole result. For id-lists, the next write is only relevant if the changed object no longer matches the query predicate. It is thus possible that optimal quantiles differ between the two representations.

3.3.3 Dynamic Quantile Estimation

The goal of dynamic quantile estimation is to determine a p in the inverse CDF that reflects the workload mixture as well as the tolerance for eventual consistency (higher consistency requirements lead to fewer cache hits). Instead of comparing cache misses on records and using the miss-to-write ratio as a proxy, one can directly estimate the workload mixture in a first step. This estimate can then be used to adjust quantiles of the next expected write. Again, one can argue that the miss rate at the database is not informative enough. Since the true workload is hidden behind caches, the DBaaS cannot use the miss rate to infer whether the workload even warrants caching. There are multiple possible models for estimating the workload mixture. First, the developer can specify the expected workload mixture. Note that this is different from the write-only model, where it was suggested the developer could directly choose a quantile. Providing a workload mixture is much more intuitive, as the developer can be expected to know whether a schema is primarily read or written.
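Before turning to further options for estimating the workload mixture, the write-only estimator of section 3.3.2 can be made concrete. The sketch below implements the quantile function (3.5) on top of the rate aggregation shown earlier; the guard values for degenerate inputs are assumptions, as the text does not specify them:

    final class TtlEstimator {
        // Equation (3.5): a ttl such that the result set sees a write before
        // expiration with cumulative probability p.
        static double estimateTtlSeconds(double lambdaMin, double p) {
            if (lambdaMin <= 0.0) {
                return Double.POSITIVE_INFINITY; // no writes observed (assumed policy)
            }
            if (p <= 0.0) {
                return 0.0;                      // p = 0 means no caching
            }
            double q = Math.min(p, 0.999);       // p = 1 would yield an infinite ttl
            return -Math.log(1.0 - q) / lambdaMin;
        }
    }

For instance, with λ_min = 0.1 writes per second and p = 0.5, the estimate is ln(2)/0.1 ≈ 6.9 seconds, matching equation (3.6).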
Another option is based on the insight that some objects will not be cached at all for various reasons. First, every cached object requires an entry in the server-side expiring Bloom filter, thus increasing the probability of a false positive lookup, which in turn causes unnecessary invalidations. Second, the limited cache size can force the DBaaS to mark some objects as uncachable. This issue is related not only to the workload mixture, but also to the request distribution. The workload mixture is the proportion of reads, writes and queries, whereas the request distribution describes how often individual keys are accessed by operations. In a typical Zipfian request distribution, some objects will be written extremely frequently, even though the workload mixture is dominated by reads. Furthermore, the expected bounds on stale reads depend on the latency distribution of the invalidation request: the longer it takes for an invalidation to complete, the higher the cumulative probability of a stale read. In summary, the overall workload mixture for a table can be estimated by marking some objects as uncachable for various reasons and then measuring their read/write mixture. Finally, one could track other query metrics through CDN log analysis, as proposed by Ozcan et al. [72, 73]. Query Shareness (QS) quantifies how many clients request a certain query, which is also relevant to the object-list versus id-list decision, as a query that is shared by multiple users can particularly benefit from a pre-warmed cache. Query frequency stability (QFS) models the change in query popularity over time.

After obtaining an estimate of the workload mixture, there are multiple options for mapping estimates to quantiles. Using offline optimisation, a provider can obtain optimal values for typical workload mixtures. Quantiles can then simply be looked up from a configuration file. It is however questionable whether such a model can reflect the nature of drastically changing workloads, e.g. applications suddenly going viral. Alternatively, an online learning model could use a budgeting approach. If there is a limited number of invalidations the system can perform, quantiles can be adjusted according to the number of invalidations already performed. For instance, if the invalidation load is too high, the probability of seeing a write within the time-to-live of a cached object needs to be lowered.

In summary, this chapter has introduced different query-caching execution models that are based on record-level access frequencies. I have also discussed strategies to invalidate result sets of queries, ttl estimation based on Poisson processes, as well as various practical limitations. The baseline for all these considerations is a very long static ttl, which causes a maximum of cache hits, invalidation cost and stale reads. In the next chapter, these insights are combined into an online decision model that can achieve fine-grained performance trade-offs.

Chapter 4
Online Learning

4.1 Introduction

In the previous chapter, a theory of different execution models and their constraints was introduced. Specifically, trade-offs between execution models and parameters of ttl estimation were discussed. However, these insights are only actionable if the DBaaS has a decision model that can adapt to changing request loads. In this chapter, I first describe the decision problem in a formal framework and then derive a solution. Further, I introduce a generic method to find optimal parametrisations through utility functions.
In order to construct a learning model, one first needs to consider what information is available at which point in the decision process. The learning process also needs to consider the granularity of decision making, both for the execution model and for ttl estimation. I begin by considering what is available to the DBaaS. The DBaaS can monitor reads and writes, and it issues invalidation requests after updates. However, the DBaaS knows neither the exact status of the various caching layers nor the latencies a client sees for specific requests. Next, the processing flow of a potential decision model needs to be considered. The base case is a system that has not processed any queries yet and all caches are empty. At some point, the server receives an initial query. The challenge from the server's perspective is that it does not yet know anything about the result of this query but still has to make a decision on how to execute it. As explained previously, the DBaaS can either order the database to execute a covered query on the index that only returns ids, or a full query that returns all entries matching the query predicate. The query result is more informative than the query itself. Since the result contains the specific database objects or at least their ids, any available metrics on these objects can be used to improve future decisions. In principle, the model aims to improve decision making by considering how the previous decision impacted system performance, i.e. average response times for clients and load at the backend. In the following sections, I express this problem in a formal framework, present my solution and reason about the issues related to the scalability and performance of real-time learning.

4.2 Representation as an MDP

4.2.1 State and Action Spaces

Finding a closed-form solution might be impractical due to the complex and stochastic nature of the problem. Many of the relevant variables, like write rates, workload mixture and response times, can only be approximated at the DBaaS. This lack of an analytical model suggests that reinforcement learning could be a sensible approach. It requires the task to be framed as a Markov decision process. The learning model is hence constructed by first considering each component of an MDP with regard to the problem. I then derive a model that I believe best captures the constraints of the problem. For now, the decision not to cache an object is ignored and deferred to the ttl estimation. This means the learning model only makes a decision between execution models, and the ttl estimation model can then estimate a ttl of 0 if it decides the object should not be cached. Clearly, the space of possible actions in this simplified scenario is A = {object-list, id-list}. One could then argue that queries should constitute the space of states, as the decision model must map queries to actions. Each action would then lead to a new query as the next state, i.e. Queries × A → Queries. There are multiple problems with this representation. It is questionable whether using queries as states even satisfies the Markov property, since the effects of a decision taken in a state do not only depend on that single query. An incoming query does not capture all information relevant to the DBaaS. As I argued in chapter 3, various real-time metrics need to be taken into account. Further, an RL agent assumes that its actions determine its next state.
Even if there is a probabilistic transition model that assumes a distribution of possible states for a decision, this is not a valid assumption here. Queries from different clients are not in any causal relationship. Assuming that a decision on one query leads to another query as a new state is thus not a useful intuition.

4.2.2 Decision Granularity

The observation that a decision model should depend on access patterns and workload metrics leads to multiple insights into the model structure. First, it suggests that states could be conceptualised as a set of load metrics instead of single queries. This implies a large state space that cannot be represented as a lookup table and must be approximated, either through a linear sum of weighted features or a non-linear approximator like a neural network. Second, using global system performance as a state has consequences for the granularity at which decision making is sensible. Measuring the impact of the decision on a single query on the system is infeasible. While the model aims to make ttl estimates at the level of individual query result sets, the execution model might be captured at the level of tables or document collections. As shown in the examples in chapter 3, the choice of execution model should in part depend on how much query predicates overlap. Consequently, a sensible model might assess access patterns at the level of collections and use a single execution model for all queries on that collection or for all parametrisations of prepared queries.

4.2.3 Reward Signals

Before discussing how to map states to actions in a formal manner, a reward signal needs to be specified. A fundamental problem in online learning is the definition of a good reward function. In data management, users and providers are often interested in learning how to achieve very specific trade-offs on various performance metrics, which are then expressed through service level agreements. Naturally, one can only achieve trade-offs on features that are modelled into the reward function. For many examples in reinforcement learning, the reward is a straightforward measure such as the score in a game or reaching a certain height in the mountain car problem [74]. The difficulty there is rather to learn an approximation of the cost of actions in continuous state spaces from noisy and delayed rewards. For the decision model, the structure of the reward signal itself is a challenge. I begin by recapitulating features that are relevant to the execution model. For a given query, the database returns a result comprised of objects or keys (ignoring the trivial case of an empty result). In general, the goal is to minimise invalidations on these keys, to maximise cache hits, and to minimise overall query latency. Of these, only invalidations are directly visible to the DBaaS, through the server-side expiring Bloom filter. Cache misses registered at the DBaaS might, however, be used as a proxy for cache hits. Earlier, I argued that cache misses cannot be used to infer the workload mixture. Nevertheless, a learner can still extract a reward from simply comparing the total number of cache misses in a given time period for different decisions. Further, while request latency for clients is unknown, a learning model could instead use the expected relative cost between execution models as a reward. Using the id-list model, the relative latency cost is a factor of result set cardinality and parallel connections.
Requesting all resources from a list of ids is more expensive by a factor of

\frac{card(\text{result set})}{\text{connections}} \qquad (4.1)

The key point is that the model lacks an absolute notion of the quality of an action. The exact number of invalidations or cache misses following a sequence of decisions is not meaningful in itself. While a certain absolute number of invalidations can be seen as an indicator of uncachable objects, cache misses are only meaningful when compared between execution models, under the assumption that the workload is constant during the period of observation. It is also notable that the same metrics that I suggested to represent a state are used in the reward signal. Specifically, the state comprises the overall load, whereas the reward consists of the specific metrics for keys that are part of a query result. The structure of the reward suggests that the model needs to continuously compare choices for the same queries to see which decision yields the higher relative reward. So far, this approach has not dealt with the question of how to map the continuous state space to the binary set of actions. While substantial research efforts have gone into the approximation of continuous state and action spaces [75, 51, 52], it is questionable whether this effort is necessary here. If reward features are a subset of state features and states need to be mapped to actions according to relative rewards, the model can simply represent its policy as a probability distribution over actions to sample from, as actions need to be constantly compared for relative rewards. In the following section, I hence combine the previous observations into a model that directly updates the belief state about the optimal distribution of actions.

4.3 Belief State Approximation

I propose the following model: each collection or table begins with a prior on execution models; e.g. without further assumptions one might use a uniform prior with p(object-list) = 0.5 and p(id-list) = 0.5. A learning period is defined by the number of samples n that the model collects before updating its belief state. Further, the model is parametrised through the interval length of the moving window over which writes, invalidations and cache misses are tracked. Every time the DBaaS receives a query, a decision on the execution model is made by drawing from the distribution, i.e. the model optimises the overall distribution for a collection of entries. One could also imagine the model learning a distribution for all parametrisations of a prepared query, e.g. a query that always requests the same content but allows for user-defined filters. After query execution, a reward r on a list of k result ids id_1, \ldots, id_k for a sample is computed as

r = \frac{1}{c} \cdot \sum_{j=1}^{k} \left( \frac{\omega_1}{\text{invalidations}(id_j)} + \frac{\omega_2}{\text{cache misses}(id_j)} \right), \qquad (4.2)

with c being the relative cost of execution,

c = \begin{cases} \frac{k}{\text{connections}} & \text{if id-list} \\ \omega_3 & \text{if object-list} \end{cases} \qquad (4.3)

and invalidations and cache misses representing their approximated frequencies. An inverse sum is used because the goal is to minimise these values. Scalable sampling methods to approximate these frequencies will be discussed at the end of this chapter. The reward also needs to include weights ω_1, ω_2, ω_3 to be able to express a preference towards lowering invalidations, cache misses or response times at the client. For instance, increasing ω_3 would increase the reward for using object-lists, thus generally lowering client latency.
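As an illustration of the reward computation (4.2) and (4.3), consider the following sketch. The per-key frequencies are assumed to come from moving-window estimators as in chapter 3, and the small constant guarding against division by zero is an implementation detail not specified above:

    final class RewardFunction {
        private final double w1, w2, w3;        // preference weights omega_1..3
        private static final double EPS = 1e-6; // guards against division by zero (assumption)

        RewardFunction(double w1, double w2, double w3) {
            this.w1 = w1; this.w2 = w2; this.w3 = w3;
        }

        // Equation (4.3): relative execution cost of the chosen representation.
        double cost(boolean idList, int resultCardinality, int connections) {
            return idList ? (double) resultCardinality / connections : w3;
        }

        // Equation (4.2): inverse sum over approximated per-key frequencies.
        double reward(boolean idList, double[] invalidationFreq,
                      double[] cacheMissFreq, int connections) {
            int k = invalidationFreq.length;
            double sum = 0.0;
            for (int j = 0; j < k; j++) {
                sum += w1 / (invalidationFreq[j] + EPS)
                     + w2 / (cacheMissFreq[j] + EPS);
            }
            return sum / cost(idList, k, connections);
        }
    }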
At the end of a learning period (n samples and rewards), the belief state is updated batch-wise. First, the model computes the normalised total reward for each execution model by averaging over the number of samples out of n for which the decision object-list (n_1) or id-list (n_2) was made:

r_{\text{object-list}} = \frac{\sum_{i=1}^{n_1} r_i(\text{object-list})}{n_1}, \quad r_{\text{id-list}} = \frac{\sum_{i=1}^{n_2} r_i(\text{id-list})}{n_2} \qquad (4.4)

Finally, the current belief state is batch-updated through

p(\text{object-list})_{t+1} = p(\text{object-list})_t + \alpha_t \cdot \frac{r_{\text{object-list}} - r_{\text{id-list}}}{r_{\text{object-list}} + r_{\text{id-list}}} \qquad (4.5)

and

p(\text{id-list})_{t+1} = 1 - p(\text{object-list})_{t+1}, \qquad (4.6)

where α_t ∈ [0, 1] is the learning rate at time point t. Again, the reason updates are performed through batch-wise comparison is that rewards on single queries are deemed too noisy. Intuitively, the model simply samples rewards for decisions, compares them and shifts its belief state according to the difference in rewards in the observation period, normalised by the total reward obtained. The learning rate can either be held constant or tuned proactively. An apparent disadvantage of the model is that, as it converges towards one execution model, fewer and fewer samples will be drawn for the model that is deemed less relevant. Hence, special consideration needs to be given to convergence strategies.

4.3.1 Convergence and Exploration

There are multiple convergence scenarios. First, the model could converge to a mixture that does not put a clear preference on one execution model, which could also be caused by an unfavourable parametrisation that leads to too much noise or too little data, e.g. an observation window for reward measurements that is too short. This could be defined as the case where 0.4 ≤ p(object-list) ≤ 0.6 and thus also 0.4 ≤ p(id-list) ≤ 0.6. This is an unfavourable outcome, as it implies that random decisions on a uniform prior are sufficient (hence no learning is necessary). If the model converges strongly towards one model, e.g. a probability of 90% or more for a single decision, it might not be able to adapt to changing workloads later. A typical solution to this problem is to introduce a small probability ε with which a non-dominant action is taken, a so-called ε-greedy approach [76]. The model greedily chooses the presumed best action with a probability of 1 − ε and otherwise a random action. This can be practically achieved by bounding the probability of one decision by 1 − ε. In the experimental evaluation, I will demonstrate how this enables the model to detect and adapt to changing workloads.

4.3.2 Sampling Techniques

Various methods exist to approximate streams of incoming data [77]. Good examples of such approximations are cache miss and invalidation frequencies estimated through a moving window of arrival times. However, more sophisticated methods exist: (biased) reservoir sampling can be used to summarise streams by keeping a fixed-size reservoir of representative values and updating the reservoir through a bias function [78, 79]. Initially, all incoming values are used to fill the reservoir. A bias function (often exponential) f(r, t) is then used to define the relative probability of the r-th point still belonging to the reservoir when a later t-th point arrives. Alternatively, one can simply replace elements in the reservoir uniformly at random at a certain rate. The advantage of the reservoir sampling approach is that it does not completely ignore values after a certain period (like a moving window does) [80].
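Combining the batch update (4.4) to (4.6) with the ε-bounded probabilities of section 4.3.1 yields the following minimal sketch of the learner; parameter values and the handling of empty batches are illustrative assumptions:

    import java.util.Random;

    // Belief-state learner over the two execution models. Probabilities are
    // bounded away from 0 and 1 by epsilon to retain exploration.
    final class BeliefStateLearner {
        private double pObjectList = 0.5; // uniform prior
        private final double alpha;       // learning rate
        private final double epsilon;     // exploration bound
        private final Random rng = new Random();

        BeliefStateLearner(double alpha, double epsilon) {
            this.alpha = alpha; this.epsilon = epsilon;
        }

        // Draw a decision from the current belief state.
        boolean decideObjectList() {
            return rng.nextDouble() < pObjectList;
        }

        // Batch update after a learning period: average rewards per decision,
        // then shift the belief by the normalised reward difference (4.5).
        void update(double[] objectListRewards, double[] idListRewards) {
            double rObj = mean(objectListRewards);
            double rId = mean(idListRewards);
            if (rObj + rId == 0.0) return; // no signal in this period (assumed policy)
            pObjectList += alpha * (rObj - rId) / (rObj + rId);
            // Bound probabilities by (1 - epsilon) to keep exploring (section 4.3.1).
            pObjectList = Math.max(epsilon, Math.min(1.0 - epsilon, pObjectList));
        }

        private static double mean(double[] xs) {
            if (xs.length == 0) return 0.0;
            double sum = 0.0;
            for (double x : xs) sum += x;
            return sum / xs.length;
        }
    }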
Another aspect of sampling and approximation is extrapolation. For a large database, it is infeasible to hold moving windows for all database records in memory. Instead, one should expect to extrapolate from a set of representative records. I expect the learning model to be computationally inexpensive, as it primarily consists of in-memory summations. From a practical perspective, this makes the model very favourable, as many sophisticated prediction techniques require costly matrix factorisations that are problematic for scalable real-time learning. For instance, inference on Gaussian processes runs with O(n^3) time and O(n^2) space complexity [81]. Sparse matrix approximation techniques exist, but are rather targeted at offline processing of large datasets and do not operate on a timescale of milliseconds [82, 83].

4.3.3 Hyperparameter Optimisation

At various points in this work, I have pointed towards trade-offs in consistency, latency, server load and cache efficiency. For instance, the reward function is parametrised through weights ω_1, ω_2, ω_3 that characterise a preference between cache misses and invalidations. It is however not straightforward to define parameters that express a specific performance level. For instance, a DBaaS provider might want to analyse the performance required of various components to achieve a certain average latency for a specific caching topology. This section briefly explains how to optimise the parameters of the learning model. First, one needs to define a global utility of an instance of the Monte Carlo simulation. An instance means running a certain workload with specific request and latency distributions and monitoring all performance measures of interest. The global utility u of an instance is a linear combination of n utility functions f_i that map concrete values to a normalised utility:

u = \sum_{i=1}^{n} \omega_i f_i \qquad (4.7)

For illustration, consider a possible utility function for average query latency at the client, as seen in figure 4.1. Here, an average latency below 50 milliseconds is desired. Latencies of about 100 milliseconds are already considered to be of much less utility, and latencies close to 200 milliseconds are of no value, e.g. due to a service level agreement.

[Figure 4.1: Utility function example for response times.]

In general, a configuration can then be found according to the following steps: (1) definition of a hyperparameter space, e.g. parameters ω_1, ω_2 ∈ (0, 1) define a two-dimensional grid; (2) definition of a linear combination of utility functions on the performance metrics of interest, thus mapping concrete desired values to a normalised score; (3) by repeatedly drawing samples from the hyperparameter space, a local optimum is determined. The key insight is that this method does not draw at random or use stochastic gradient descent, which would only take local improvements into account. Traditional approaches include random search, grid search and manual search of optimal parameters [84, 85]. However, these approaches can be inefficient if the reward function is expensive to evaluate. In contrast, Bayesian approaches using Gaussian processes construct a probabilistic model of the reward function and make educated estimates on where in the parameter space to evaluate the function next. This is done by utilising all available information from previous evaluations instead of just making a local estimate [86].
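As an illustration of such utility functions, the sketch below maps average response times to [0, 1] with a piecewise-linear ramp between the thresholds read off figure 4.1 (full utility below 50 ms, none at 200 ms) and combines utilities as in equation (4.7). The exact shape of the curve in figure 4.1 is not specified, so the linear interpolation is an assumption; a sigmoid would model a sharper drop-off more closely:

    final class Utilities {
        // Piecewise-linear utility for average response time (ms), modelled on
        // figure 4.1: 1.0 below 50 ms, 0.0 at or above 200 ms, linear in between.
        static double responseTimeUtility(double avgMillis) {
            if (avgMillis <= 50.0) return 1.0;
            if (avgMillis >= 200.0) return 0.0;
            return (200.0 - avgMillis) / 150.0;
        }

        // Equation (4.7): global utility as a weighted sum of normalised utilities.
        static double globalUtility(double[] weights, double[] utilities) {
            double u = 0.0;
            for (int i = 0; i < weights.length; i++) u += weights[i] * utilities[i];
            return u;
        }
    }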
I use the Spearmint framework described by Snoek et al. to perform Monte Carlo optimisation with Gaussian processes [87, 88].

Chapter 5
Evaluation

5.1 Aims

This chapter describes the implementation, the experimental set-up and the experimental evaluation. First, however, the evaluation goals need to be defined. The experiments should (1) confirm the trade-offs of the different execution models suggested by my theory, (2) investigate the relationship between the stochastic ttl estimation model and consistency, (3) demonstrate that the estimation method is superior to a static model with regard to invalidation load, and (4) validate the learning model as a method to achieve the desired trade-offs. It is also necessary to understand the baseline of the evaluation. Section 5.3 compares cache hit rates and response times between different execution models. DBaaS providers usually do not cache their dynamic content because of consistency and invalidation issues. An appropriate baseline is thus a DBaaS that does not cache its dynamic content. Section 5.4 investigates consistency and invalidation load for the proposed model. Specifically, it is illustrated how naive caching techniques for dynamic content are bottlenecked by invalidation cost (and hence not used in practice). Finally, section 5.5 analyses the performance of the online learning scheme.

5.2 Simulation Framework

5.2.1 Design and Implementation

All experiments were carried out in a Java 8 simulation framework. I chose Java for its concurrency utilities and for the availability of some required libraries. The implementation is based on previous work on dynamic caching as well as the Yahoo Cloud Serving Benchmark (YCSB) [89, 6]. YCSB is an established framework for benchmarking cloud databases that provides a set of typical workloads and a common interface for standard database operations. It thus enables a comparison of database performance. In order to compare two databases, users can provision a certain computing power (often Amazon EC2 instances [90]) and then deploy the benchmark by implementing the interface and specifying a workload. A workload is defined by a set of parameters: read and write rates (e.g. 50/50), a request distribution, a desired throughput, the number of objects in the database, the number of fields per entry and the length of these fields, as well as the number of concurrent clients. In previous work, YCSB was extended to analyse the caching behaviour of individual database entries [6]. Notably, no actual database was used in my previous work with Gessert et al.; instead, a database was simulated as a hash table. In this work, I abandoned the YCSB framework in favour of a dedicated query simulation framework. I reused and extended some classes, namely modules for treating individual resources. Specifically, I adapted and modified the simulated cache class, the staleness detection mechanism through time stamping, the moving window mechanism used to collect frequencies, as well as the expiring Bloom filter. In the code, I have commented each individual class to indicate whether it was reused, modified from existing code or written completely independently. The main difference of my framework compared to our previous work is the ability to generate, execute and evaluate queries. Queries were constructed by drawing predicates from specified ranges (e.g. field1 > 10) and then parsed and executed on a MongoDB server. The advantage of using MongoDB is that users do not have to specify a schema.
Thus, the benchmark can simply insert and overwrite documents with arbitrary specified contents instead of having to declare typed attributes. In further benchmarks, one could also set up a schema to test other database paradigms (e.g. relational), for instance by following the specifications of the established TPC-C benchmark [91]. Figure 5.1 provides an overview of my implementation. The main components are clients (each associated with a thread generating requests), a cache instance and the DBaaS endpoint that manages database access, ttl estimation, invalidations and learning. After the workload parameters have been specified, the simulation populates MongoDB and ensures indices. All MongoDB requests are executed with the write concern "acknowledged". Write concerns are guarantees on consistency after updates which directly affect performance. For instance, "acknowledged" as the default concern means that changes have been applied to the in-memory view of the data. Clients then continuously generate requests that are routed to the CDN edge server, which forwards them to the DBaaS. The DBaaS server parses requests and executes queries on MongoDB while consulting the learning module for decisions on execution models and the ttl estimator for expirations. On every update, the query matching engine is consulted to decide which query results have to be invalidated. The specific control flow of query caching and the decision model have been extensively covered in chapters 3 and 4. A detailed explanation of the individual modules can be found on the project website [92].

[Figure 5.1: Overview of the simulation architecture.]

5.2.2 Benchmark Configuration

All experiments were carried out on a machine with 16 GB RAM and a quad-core i5 CPU (2.8 GHz). Further, normal distributions were used for the latencies between client and cache, cache and DBaaS (using known Amazon EC2 region latencies) and for invalidation latency (using data from the Fastly CDN [64]). It should be noted that a distributed benchmark was not performed. A requirement of this project was a cost-neutral evaluation. While Amazon Web Services provides free micro tier instances to students, these are not very useful here: matching updates to query results is a computationally expensive task, and executing the benchmark on micro tier instances would skew the results.

5.3 Comparing Execution Models

5.3.1 Read-Dominant Workload

The evaluation begins by comparing the object-list and the id-list model for typical workloads. The model suggests that caching whole query results leads to much lower latency due to fewer round-trips. In turn, I expect higher cache hit rates when caching results as id-lists, because intersecting query predicates benefit from sharing cached entries. First, I examine a typical read-heavy workload that consists of 95% reads and queries and 5% updates on a Zipfian access distribution, e.g. photo tagging.
To clarify, a read is equivalent to a GET request on a single resource identified by its key, whereas a query consists of at least one predicate and requires evaluation by the database's query engine. The workload initially inserts 1000 documents, each with 10 fields of random data, and then performs 100,000 requests from 10 parallel threads (one connection each), beginning with a mixture of 40% reads, 55% queries and 5% updates to demonstrate the principal difference between execution models.

[Figure 5.2: Cache hit rates as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes.]

[Figure 5.3: Average query response times as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes.]

Figures 5.2 and 5.3 show how the performance of the execution models relates to average query selectivity. Further, response times for a DBaaS without dynamic query caching are shown. A query selectivity of 1 means that the query predicate matches all objects in the database, and a selectivity of 0.1 means that the predicate matches 10% of all keys; i.e. selectivity indicates how much the result sets of different queries intersect. First, the result matches the expectations: the object-list execution model achieves better response times than the id-list model but has a worse cache hit rate. For an uncached DBaaS, every request requires a full round-trip to the backend, resulting in noticeable response times for the client, particularly if the application requires more than one round-trip. There are two artifacts in figure 5.3 worth discussing. For a query selectivity of 1, i.e. the query predicate matching all documents in a collection, both models have slow response times. This is due to collapsed forwarding in the cache. If many clients request the same content from a cache edge server, the cache will collapse the requests into a single database query, thus blocking multiple clients. This decrease in request parallelism causes longer response times for clients. Additionally, write locks also block incoming reads. Further, response times for very selective queries (average selectivity of 0.0001) are very similar, because the predicate matches only one object in the simulation (or none). This means that most queries will be uncached and thus require a full round-trip to the DBaaS. Note that a round-trip between Europe and USA EC2 regions takes around 170 milliseconds [6], matching the result. Cache hit rates on resources are still drastically different, because in the id-list model normal reads still pre-warm the cache for query results. The reason it still takes a full round-trip is that the client first needs to fetch the id-list. The results shown in this section were averaged over 5 runs. Considering the probabilistic nature of the experiments, the simulation is very consistent. For the plot in figure 5.2, the average cache hit rate for the id-list model at an average query selectivity of 1 is 98.73% with a standard deviation (sd) of 0.00015 and a 95% confidence interval (CI) of (0.9871, 0.9875). The average cache hit rate of the object-list model is 74.34% (sd = 0.0035, CI = (0.7190, 0.7278)). Similarly, the average response time for the id-list model is 158.88 ms (sd = 0.7 ms, CI = (158.01, 159.74)).
For the object-list model, an average of 65.75 ms (sd = 0.55 ms, CI = (65.07, 66.44)) is observed. In summary, errors were negligible: the simulation converges to the desired target distributions after a few thousand requests. Since each workload executes 100,000 requests, small fluctuations (e.g. garbage collection) are averaged out. Errors are hence omitted in further experiments. The same experiment was repeated for a workload with a uniform access distribution on the keys, as seen in figures 5.4 and 5.5. Notably, there is no spike in latency at a selectivity of 1, as was observed in figure 5.3. Since individual reads are now uniformly distributed over the key space, there is less lock contention due to writes and thus more query parallelism at the database. However, one can observe the same general trend, and I will thus use a Zipfian access distribution in the remainder of the experiments, as it is the more typical case [93, 94].

[Figure 5.4: Cache hit rates as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes under a uniform access distribution.]

[Figure 5.5: Average query response times as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes under a uniform access distribution.]

In the experiments above, all expirations are estimated by only tracking incoming writes, as described in chapter 3. The cumulative probability of a write within the expiration is adjusted to p = 0.75 to enable high cache hit rates on a read-heavy workload. While the impact of write quantiles on invalidation load and eventual consistency will be analysed in the upcoming sections, the execution models will first be compared under another workload.

5.3.2 Write-Dominant Workload

Figures 5.6 and 5.7 show the same metrics for a write-heavy workload that consists of 50% writes and 25% queries and reads each. Figure 5.6 shows a similar overall trade-off between execution models to the read-heavy workload. However, there is a clear difference in cache hit rates (figure 5.7), as the object-list model provides very weak cache performance. This was not the case in the read-heavy workload, where cache hit rates began high but degraded with increasing selectivity. In turn, average response times are rather high for the id-list model. Notably, they are even worse than not caching at all. Since objects are written very frequently, clients frequently do not even get a cache hit on the id-list in the CDN.

[Figure 5.6: Average query response times as a function of average query selectivity on a mixture of 25% reads, 25% queries and 50% writes. Estimation quantiles have been adjusted to p = 0.4 to account for the write-dominant workload.]

[Figure 5.7: Cache hit rates as a function of average query selectivity on a mixture of 25% reads, 25% queries and 50% writes.]

After retrieving the id-list, clients have to iterate over the individual resources, which also might not be available in the cache. Hence, in this case it is more economical to use the object-list model.
This experiment has illustrated how demanding write-dominant workloads are for the DBaaS. In order to maintain reasonable latencies at clients, one has to accept low cache efficiency. In the following section, I investigate how ttl estimations affect invalidation load and client consistency.

5.4 Consistency and Invalidations

5.4.1 Adjusting Quantiles

This section deals with the effect of write quantiles on client consistency and cache hit rates. Specifically, the experiments should quantify how invalidation load and stale reads are connected to the cumulative write probability within a cache expiration duration. I again consider the write-dominant workload that causes expensive trade-offs on cache efficiency for acceptable client latency. The first experiment investigates staleness using the data on invalidation latencies provided by the Fastly CDN [64]. Figure 5.8 shows the absolute number of stale reads. As expected, stale reads increase with increasing quantiles, because every write to a still cached object introduces the possibility of a stale read, depending on how fast the invalidation is executed. The highest number of stale reads observed accounted for 1% of all reads and queries (500 out of 50,000), depending on the execution model (see appendix B for the impact of invalidation latency on stale reads). For workload-adjusted quantiles (i.e. lower quantiles on write-dominant workloads), the average number of stale reads is about 0.1%, which seems acceptable for most applications without strong transactional semantics. Figure 5.9 shows how cache hit rates depend on the quantile of the next expected write on the same workload. As noted above, the object-list model provides weak cache performance on a write-dominant workload.

[Figure 5.8: Absolute number of stale reads on the write-dominant workload as a function of the quantile of the next expected write.]

Specifically, most cache hits in the object-list model in this scenario stem from GET requests on individual resources, not from cached queries.

[Figure 5.9: Cache hit rates on the write-dominant workload as a function of the quantile of the next expected write.]

In these experiments, all objects were cachable, leading to an invalidation load of 45,000 to 50,000, i.e. almost every write leading to an invalidation under a Zipfian access distribution. This illustrates the necessity of marking certain objects uncachable, as they will otherwise bottleneck the query matching engine. The following section investigates which trade-offs can be achieved with regard to invalidation load.

5.4.2 Reducing Invalidation Load

As discussed in chapter 3, the DBaaS needs to be able to reduce invalidation load depending on the achievable throughput of matching updates to query results. I have thus implemented the proposed model of marking certain objects uncachable, based on the insight that some objects might be updated so frequently that they cannot reasonably be cached (e.g. they would be stale by the time they arrived at the CDN). Figure 5.10 compares the cache hit rates of this approach with the previous approach of caching all objects, depending on their write frequency and the chosen quantile. For comparison, I also show a static caching method that caches all objects with the same expiration.
First, one can note that for a write-dominant workload, the cache hit rate is capped at 86.6%. Second, marking certain objects as uncachable results in an average cache hit rate of 72.6% (excluding quantile 0, which means no caching). The interesting question is now how this relates to invalidation load. Figure 5.11 compares invalidation loads from the same experiments. Notably, there is a drastic decrease in invalidation load from dynamically marking objects as uncachable. The average invalidation load is reduced by about 50%, while only giving up 14% cache hit rate (response times did not differ significantly). The same effect can be observed for the object-list model.

[Figure 5.10: Cache hit rates on the write-dominant workload as a function of the quantile of the next expected write and compared to a static caching method.]

[Figure 5.11: Invalidation loads for using the naive id-list approach versus dynamically marking frequently written objects as uncachable.]

5.5 Online Learning

5.5.1 Learning Decisions

In the previous sections, I have demonstrated the trade-offs related to execution models and their parametrisations. I begin the evaluation of the learning model by applying the decision model to the read-dominant workload from above, while also using the optimisation of marking some objects uncachable. With only 5% writes, the primary focus is not on limiting invalidation load but rather on client latency.

Query selectivity | Learner | Random guess | Object-list | Id-list
1                 | 211.2   | 348.2        | 188.9       | 552.7
0.1               | 151.5   | 195.5        | 142.3       | 251.1
0.01              | 107     | 124.4        | 104.5       | 167
0.001             | 130.5   | 135.9        | 129.1       | 147.8
0.0001            | 146.8   | 147          | 148.7       | 148.6

Table 5.1: Average overall request response times (ms) for the learning model compared to random guessing and static decisions on a read-dominant workload.

Query selectivity | Learner | Random guess | Object-list | Id-list
1                 | 0.3     | 0.61422      | 0.22        | 0.87
0.1               | 0.42    | 0.7115       | 0.28        | 0.889
0.01              | 0.6     | 0.74         | 0.46        | 0.93
0.001             | 0.37    | 0.441        | 0.29        | 0.72
0.0001            | 0.19    | 0.21         | 0.18        | 0.46

Table 5.2: Cache hit rates for the learning model compared to random guessing and static decisions on a read-dominant workload.

Table 5.1 compares the average request response times of the learning model with a uniform random guess and static decisions. The differences in response times between learner and random guessing are small, because a random mixture already provides relatively low latencies, as id-cached results pre-warm the cache for individual reads. By having a bias towards low latencies, the learning model has traded away cache efficiency (as seen in table 5.2). One can also see that the learner converges towards the performance of the object-list model. For the evaluation of the learning model, the comparison to a static decision is not as useful, because it is already known that either object-list or id-list is optimal depending on the desired trade-offs. Having established that the model can converge to the performance of a static model, the question is thus rather whether its decisions are better than random decisions, which I will focus on in the following (hence omitting static decisions, as I have covered them extensively above). For a more detailed analysis, isolated query response times of learning and guessing can be considered, i.e.
ignoring response times for individual GET requests, as seen in table 5.3. Previously, I already suggested that the extreme ends of selectivity can largely be ignored, because of the lack of parallelism at a selectivity of 1 and the absence of a difference between decisions for highly selective queries. Instead, one might consider the more typical cases: e.g. for an average query selectivity of 1%, the average query response time could be reduced from 104.8 to 67.4 milliseconds (a 35.6% decrease).

Query selectivity | Belief state approximation | Random guess
1                 | 295.7                      | 528.6
0.1               | 176.8                      | 229.8
0.01              | 67.4                       | 104.8
0.001             | 120.4                      | 139.5
0.0001            | 171.8                      | 173.7

Table 5.3: Average query response times (ms) for the learning model compared to random guessing on the execution model on the read-dominant workload.

I have repeated the same experiment for the write-dominant workload. This time, learning was focussed on invalidation load and response times, as previous experiments have already shown that high cache performance is not possible without very high latencies. Again, latency can be drastically reduced while maintaining approximately the same invalidation levels, as seen in table 5.5. This is achieved by trading away cache performance. In this particular experiment, the average cache hit rate of the learner sinks to 6%, from 21% for random guessing. Both cache hit rates are very low, as hotspot objects are marked uncachable to reduce invalidation loads. In principle, these experiments have established that the learning model can indeed learn towards certain metrics. In the following sections, I will analyse the quality of these trade-offs and the convergence properties in more detail.

Query selectivity | Belief state approximation | Random guess
1                 | 230.3                      | 582
0.1               | 215.9                      | 422.8
0.01              | 189.6                      | 260.6
0.001             | 168.3                      | 187.4
0.0001            | 168.5                      | 171

Table 5.4: Average request response times (ms) for the learning model compared to random guessing on the execution model on the write-dominant workload.

Query selectivity | Belief state approximation | Random guess
1                 | 28142                      | 24725
0.1               | 26493                      | 26988
0.01              | 18919                      | 25028
0.001             | 18184                      | 16768
0.0001            | 12066                      | 11650

Table 5.5: Invalidation loads for the learning model compared to random guessing on the execution model on the write-dominant workload.

5.5.2 Evaluating Trade-offs

The tables above show that the learning model can achieve improvements on various metrics. However, it is hard to quantify the quality of a trade-off by simply comparing, for instance, cache hit rates and invalidations, as was done above. To this end, the approach of defining a linear combination of utilities from chapter 4 is used. By defining utility functions for response times, invalidations and cache hit rates, the global system utility of a workload instance is defined. Consequently, it can be assessed if and how the learning model increases utility over time. I use the utility function from section 4.3.3 for latency and a linear function for invalidation utility, i.e. u(invalidation) = 1 − invalidations/writes. Further, the cache hit rate itself can be used as a utility function, because it is already normalised. Figure 5.12 shows how system utility changes over the number of operations performed during a benchmark instance (read-dominant workload). After 2,000 operations, there is an initially high utility for random decisions and a rather low utility for learning. I attribute this to warm-up effects. As initial response times are longer, less utility comes from latency.
Within a few thousand operations, the learning model (learning rate α = 0.1) achieves much higher utility, while the utility of random guessing degrades.

[Figure 5.12: Global utility as a function of operations performed.]

The key point here is that there is a formal framework for mapping concrete values of metrics (e.g. a latency of 100 milliseconds) to a normalised score. The learning model is consequently able to learn trade-offs that optimise towards whatever preference is expressed by the service provider.

5.5.3 Convergence and Stability

Finally, the evaluation needs to consider situations in which the learning model has converged and is then confronted with a changing workload. To this end, I start with the write-dominant workload for 100,000 operations and then introduce a higher proportion of reads (a mixture of 40% reads, 20% writes and 40% queries). Figure 5.13 demonstrates the associated behaviour. Ostensibly, the model quickly achieves a higher utility, which then steadily increases. However, one can note that random guessing also seems to achieve higher utility over time. This can be explained through higher cache utilisation. As the cache fills up again after a change in request load, latency and cache hit rate improve and total utility increases for all models. In summary, the evaluation has validated the caching scheme. Using dynamic query caching and the learning scheme, average response times for clients can be lowered to an imperceptible range (below 100 ms). Further, I have investigated consistency and invalidation load as manageable practical constraints. Finally, the experiments have demonstrated how an online learning model can achieve these trade-offs dynamically through a method motivated by reinforcement learning.

[Figure 5.13: Behaviour of the learning model versus random guessing under a change of workload mixture.]

Chapter 6
Outlook and Conclusion

6.1 Summary and Conclusion

In this project, I have identified remote access latency as a key performance problem in interactive applications. What is more, I have pointed out the constraints of naive caching schemes with regard to consistency and invalidation load. Considering these limitations, I have introduced a caching scheme which I believe can achieve low latency for the client while maintaining tunable invalidation load and consistency. The first component of the caching mechanism is based on the idea that different representations and execution models can be used for varying workloads. The second contribution is an online learning model that uses various approximations to make decisions based upon these representations. Through Monte Carlo simulation of typical workloads, various trade-offs between client performance, cache efficiency and server load were shown. To the best of my knowledge, this study has introduced the first model for caching highly dynamic query content. In principle, any REST-ful web service can implement the proposed architecture, thereby achieving drastically improved response times for clients without incurring too great an invalidation load at the backend. On a more general note, this project has provided an example of how the intersection of distributed systems, databases and machine learning enables more flexible and adaptive infrastructures.
6.2 Future Work

6.2.1 Parsing Query Predicates

This work did not extensively cover query semantics during invalidation. I suggested comparing before- and after-images of documents affected by an update. In future work, query predicates could be parsed by a schema-aware middleware that could enable more efficient invalidation mechanisms. For instance, on a numeric predicate, deciding whether an invalidation is necessary is a simple range comparison between the update value and the predicate range. On a similar note, knowledge about the schema would also enable mixed decisions on cache representations. A typical use case for this is a schema containing a counter, which is an essential data type for today's application economy (counting impressions, click-streams). A query would usually select the counter to display its value. A sensible model might choose to cache counters as id-lists and other parts of a result as an object-list, since updates on the counter would otherwise always invalidate the whole object-list.

6.2.2 Unified Learning Model

In the learning model, estimating expirations and making decisions on the execution models were treated as two distinct tasks. This is because the model lacked a good function approximation for mapping the state space of load metrics to a pair of an expiration time and a decision on the execution model. In future work, a unified reinforcement learning model that supports more proactive decisions at the server could be explored. Instead of just reacting to individual queries, an advanced model could maintain lists of cachable and uncachable objects. It could then independently decide to push and remove objects to and from caches. This prefetching of data to edge servers is particularly relevant for initial load times. Further, there are various other decisions and tunable runtime parameters related to latency. In particular, my model examines eventual consistency in the context of stale reads from the cache. Equally, performance could be tuned by adjusting write concerns at the database cluster itself. That is to say, a tenant might have a default setting of blocking request responses until an update has been persisted to all replica sets. This could be relaxed during flash crowds.

Appendix A
Proofs

A.1 Minimum of Exponential Random Variables

The following theorem is straightforward, but I have not been able to locate a proof in print [95].

Theorem A.1.1. Let X_1, \ldots, X_n be mutually independent exponentially distributed random variables with rate parameters \lambda_1, \ldots, \lambda_n. Then the minimum is again exponentially distributed:

\min\{X_1, \ldots, X_n\} \sim \text{Exponential}\left(\sum_{i=1}^{n} \lambda_i\right)

Proof. Each X_i has the cumulative distribution function F(x; \lambda_i) = 1 - \exp(-\lambda_i x) for x \ge 0 and rate parameter \lambda_i. The random variable X_{min} = \min\{X_1, \ldots, X_n\} has the CDF

F(x; \lambda_{min}) = P(X_{min} \le x)
                   = 1 - P(\min\{X_1, \ldots, X_n\} > x)
                   = 1 - P(X_1 > x, \ldots, X_n > x)
                   = 1 - \prod_{i=1}^{n} P(X_i > x)
                   = 1 - \prod_{i=1}^{n} \exp(-\lambda_i x)
                   = 1 - \exp\left(-x \sum_{i=1}^{n} \lambda_i\right)
                   = 1 - \exp(-\lambda_{min} x).

Appendix B
Additional Analysis

B.1 Impact of Invalidation Latency

[Figure B.1: Stale reads as a function of mean invalidation latency on 100,000 operations.]

Higher invalidation latency gives rise to more stale reads, as there is a bigger time window in which stale content can be retrieved from the cache. Marking frequently written objects as uncachable reduces this effect.
B.2 Monte Carlo Optimisation

Table B.1 shows an example of hyperparameter optimisation through Bayesian inference using the Spearmint framework [88]. Consider the write-dominant workload from chapter 5. The target parameters are the write quantile and the maximum time-to-live the model may estimate (values between 0 and 60 seconds are allowed). In this simple example, the utility of the response time is weighted three times as strongly as the utility of the cache hit rate.

Experiment   Quantile p   Maximum ttl (s)   Utility
    1           0.50            30           0.187
    2           0.75            15           0.217
    3           1.00             2           0.326
    4           1.00             0           0.337
    5           1.00            60           0.364

Table B.1: Bayesian optimisation of the write quantile p and the maximum allowed ttl.

The Gaussian process quickly predicts that the highest utility is achieved by setting a high write quantile and a high maximum ttl. Further experiments did not show any improvement, as the inference model tried to improve the utility through tiny adjustments within the allowed range (e.g. a ttl of 59 seconds). Since every run of the simulation takes a few minutes, Bayesian optimisation is a convenient tool for quickly finding good parametrisations: one defines the utility over the performance metrics of interest and then repeatedly samples the Gaussian process for parameter suggestions. Every suggestion takes into account the utilities observed for previous suggestions, so the search moves quickly towards a maximum.
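This workflow can be reproduced with any Bayesian optimisation library. The sketch below uses scikit-optimize's gp_minimize instead of Spearmint; run_simulation is a hypothetical stand-in for a full simulation run, and its toy utility surface merely mimics the trend in Table B.1.

from skopt import gp_minimize

def run_simulation(quantile_p, max_ttl_s):
    # Stand-in for the real Monte Carlo simulation: returns the
    # normalised utilities of response time and cache hit rate.
    u_latency = 0.35 * quantile_p * (0.4 + 0.6 * max_ttl_s / 60.0)
    u_hits = 0.30 * quantile_p
    return u_latency, u_hits

def objective(params):
    quantile_p, max_ttl_s = params
    u_latency, u_hits = run_simulation(quantile_p, max_ttl_s)
    # Response time weighted three times as strongly as the hit rate;
    # negated because gp_minimize minimises its objective.
    return -(3.0 * u_latency + u_hits) / 4.0

result = gp_minimize(
    objective,
    dimensions=[(0.5, 1.0),    # write quantile p
                (0.0, 60.0)],  # maximum ttl in seconds
    n_calls=15,
    random_state=0,
)
print("best (p, ttl):", result.x, "utility:", -result.fun)

Each call to objective corresponds to one simulation run, and the Gaussian process proposes the next parametrisation based on all utilities observed so far.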
Bibliography

[1] Ioannis Arapakis, Xiao Bai, and B. Barla Cambazoglu. Impact of response latency on user behavior in web search. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR '14, pages 103–112, New York, NY, USA, 2014. ACM.

[2] Wolfgang Lehner and Kai-Uwe Sattler. Web-Scale Data Management for the Cloud. Springer, New York, 2013 edition, April 2013.

[3] Guoqiang Zhang, Yang Li, and Tao Lin. Caching in information centric networking: A survey. Comput. Netw., 57(16):3128–3141, November 2013.

[4] R. T. Hurley and B. Y. Li. A performance investigation of web caching architectures. In Proceedings of the 2008 C3S2E Conference, C3S2E '08, pages 205–213, New York, NY, USA, 2008. ACM.

[5] Taekook Kim and Eui-Jik Kim. Hybrid storage-based caching strategy for content delivery network services. Multimedia Tools Appl., 74(5):1697–1709, March 2015.

[6] Felix Gessert, Michael Schaarschmidt, Wolfram Wingerath, Steffen Friedrich, and Norbert Ritter. The cache sketch: Revisiting expiration-based caching in the age of cloud data management. In BTW '15, Hamburg, Germany, March 2015.

[7] Mukaddim Pathan and Rajkumar Buyya. A taxonomy of CDNs. In Rajkumar Buyya, Mukaddim Pathan, and Athena Vakali, editors, Content Delivery Networks, volume 9 of Lecture Notes in Electrical Engineering, pages 33–77. Springer Berlin Heidelberg, 2008.

[8] Jia Wang. A survey of web caching schemes for the internet. SIGCOMM Comput. Commun. Rev., 29(5):36–46, October 1999.

[9] Werner Vogels. Eventually consistent. Commun. ACM, 52(1):40–44, January 2009.

[10] Felix Gessert, Steffen Friedrich, Wolfram Wingerath, Michael Schaarschmidt, and Norbert Ritter. Towards a scalable and unified REST API for cloud data stores. In Informatik 2014, Big Data – Mastering Complexity, 44th Annual Conference of the Society for Informatics, 22–26 September 2014, Stuttgart, Germany, pages 723–734, 2014.

[11] F. Gessert, F. Bucklers, and N. Ritter. Orestes: A scalable database-as-a-service architecture for low latency. In Data Engineering Workshops (ICDEW), 2014 IEEE 30th International Conference on, pages 215–222, March 2014.

[12] Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, and Ion Stoica. Probabilistically bounded staleness for practical partial quorums. Proc. VLDB Endow., 5(8):776–787, 2012.

[13] Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, and Ion Stoica. Quantifying eventual consistency with PBS. Commun. ACM, 57(8):93–102, August 2014.

[14] Wojciech Golab, Xiaozhou Li, and Mehul A. Shah. Analyzing consistency properties for fun and profit. In Proceedings of the 30th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, PODC '11, pages 197–206, New York, NY, USA, 2011. ACM.

[15] Qi Huang, Ken Birman, Robbert van Renesse, Wyatt Lloyd, Sanjeev Kumar, and Harry C. Li. An analysis of Facebook photo caching. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pages 167–181, New York, NY, USA, 2013. ACM.

[16] Linpeng Tang, Qi Huang, Wyatt Lloyd, Sanjeev Kumar, and Kai Li. RIPQ: Advanced photo caching on flash for Facebook. In Proceedings of the 13th USENIX Conference on File and Storage Technologies, FAST '15, pages 373–386, Berkeley, CA, USA, 2015. USENIX Association.

[17] Stefan Saroiu, Krishna P. Gummadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy. An analysis of internet content delivery systems. SIGOPS Oper. Syst. Rev., 36(SI):315–327, December 2002.

[18] Michael J. Freedman. Experiences with CoralCDN: A five-year operational view. In Proc. NSDI, 2010.

[19] Lei Guo, Enhua Tan, Songqing Chen, Zhen Xiao, and Xiaodong Zhang. The stretched exponential distribution of internet media access patterns. In Proceedings of the Twenty-Seventh ACM Symposium on Principles of Distributed Computing, PODC '08, pages 283–294, New York, NY, USA, 2008. ACM.

[20] Patrick Wendell and Michael J. Freedman. Going viral: Flash crowds in an open CDN. In Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference, IMC '11, pages 549–558, New York, NY, USA, 2011. ACM.

[21] Salvatore Scellato, Cecilia Mascolo, Mirco Musolesi, and Jon Crowcroft. Track globally, deliver locally: Improving content delivery networks by tracking geographic social cascades. In Proceedings of the 20th International Conference on World Wide Web, WWW '11, pages 457–466, New York, NY, USA, 2011. ACM.

[22] Mike P. Wittie, Veljko Pejovic, Lara Deek, Kevin C. Almeroth, and Ben Y. Zhao. Exploiting locality of interest in online social networks. In Proceedings of the 6th International Conference, Co-NEXT '10, pages 25:1–25:12, New York, NY, USA, 2010. ACM.

[23] Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin, James Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, and Vadim Yushprakh. Megastore: Providing scalable, highly available storage for interactive services. In Proceedings of the Conference on Innovative Data Systems Research (CIDR), pages 223–234, 2011.

[24] Jeff Shute, Mircea Oancea, Stephan Ellner, Ben Handy, Eric Rollins, Bart Samwel, Radek Vingralek, Chad Whipkey, Xin Chen, Beat Jegerlehner, Kyle Littlefield, and Phoenix Tong. F1: The fault-tolerant distributed RDBMS supporting Google's ad business. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD '12, pages 777–778, New York, NY, USA, 2012. ACM.

[25] James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford. Spanner: Google's globally distributed database. ACM Trans. Comput. Syst., 31(3):8:1–8:22, August 2013.
[26] Charles Garrod, Amit Manjhi, Anastasia Ailamaki, Bruce Maggs, Todd Mowry, Christopher Olston, and Anthony Tomasic. Scalable query result caching for web applications. Proc. VLDB Endow., 1(1):550–561, August 2008.

[27] Sadiye Alici, Ismail Sengor Altingovde, Rifat Ozcan, Berkant Barla Cambazoglu, and Özgür Ulusoy. Timestamp-based result cache invalidation for web search engines. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '11, pages 973–982, New York, NY, USA, 2011. ACM.

[28] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422–426, July 1970.

[29] Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, and George Varghese. An improved construction for counting Bloom filters. In Algorithms – ESA 2006, pages 684–695. Springer, 2006.

[30] Andrei Broder and Michael Mitzenmacher. Network applications of Bloom filters: A survey. Internet Math., 1(4):485–509, 2003.

[31] Sarang Dharmapurikar, Praveen Krishnamurthy, Todd Sproull, and John Lockwood. Deep packet inspection using parallel Bloom filters. In High Performance Interconnects, 2003. Proceedings. 11th Symposium on, pages 44–51. IEEE, 2003.

[32] Li Fan, Pei Cao, Jussara Almeida, and Andrei Z. Broder. Summary cache: A scalable wide-area web cache sharing protocol. IEEE/ACM Trans. Netw., 8(3):281–293, June 2000.

[33] W. R. Gilks. Markov Chain Monte Carlo. John Wiley & Sons, Ltd, 2005.

[34] Reuven Y. Rubinstein and Dirk P. Kroese. Markov Chain Monte Carlo, pages 167–200. John Wiley & Sons, Inc., 2007.

[35] Siddhartha Chib and Edward Greenberg. Understanding the Metropolis-Hastings algorithm. The American Statistician, 1995.

[36] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell., 6(6):721–741, November 1984.

[37] Luc Devroye. Non-uniform random variate generation, 1986.

[38] Siddhartha Chib. Chapter 57 – Markov chain Monte Carlo methods: Computation and inference. In Handbook of Econometrics, volume 5, pages 3569–3649. Elsevier, 2001.

[39] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998.

[40] G. Yen and T. Hickey. Reinforcement learning algorithms for robotic navigation in dynamic environments. In Neural Networks, 2002. IJCNN '02. Proceedings of the 2002 International Joint Conference on, volume 2, pages 1444–1449, 2002.

[41] Andrey V. Gavrilov and Artem Lenskiy. Mobile robot navigation using reinforcement learning based on neural network with short term memory. In De-Shuang Huang, Yong Gan, Vitoantonio Bevilacqua, and Juan Carlos Figueroa, editors, Advanced Intelligent Computing, volume 6838 of Lecture Notes in Computer Science, pages 210–217. Springer Berlin Heidelberg, 2012.

[42] Gerald Tesauro. Temporal difference learning and TD-Gammon. Commun. ACM, 38(3):58–68, March 1995.

[43] Johannes Fürnkranz. Recent advances in machine learning and game playing. ÖGAI Journal, 26(2):19–28, 2007.

[44] M. Wiering and M. van Otterlo. Reinforcement Learning: State-of-the-Art. Adaptation, Learning, and Optimization. Springer Berlin Heidelberg, 2012.
[45] Glenn F. Matthews and Khaled Rasheed. Temporal difference learning for nondeterministic board games. In IC-AI, pages 800–806, 2008.

[46] Peter Dayan and Bernard W. Balleine. Reward, motivation, and reinforcement learning. Neuron, 36(2):285–298, 2002.

[47] Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3):279–292, 1992.

[48] Andrew G. Barto, R. S. Sutton, and C. J. C. H. Watkins. Learning and sequential decision making. In Learning and Computational Neuroscience, pages 539–602. MIT Press, 1989.

[49] Andrew W. Moore and Christopher G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13(1):103–130, 1993.

[50] Kenneth O. Stanley and Risto Miikkulainen. Efficient reinforcement learning through evolving neural network topologies. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO '02, pages 569–577, San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc.

[51] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

[52] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, February 2015.

[53] Carl Edward Rasmussen and Malte Kuss. Gaussian processes in reinforcement learning. In Advances in Neural Information Processing Systems 16, pages 751–759. MIT Press, 2004.

[54] Yaakov Engel, Shie Mannor, and Ron Meir. Reinforcement learning with Gaussian processes. In Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pages 201–208, New York, NY, USA, 2005. ACM.

[55] G. Tesauro, R. Das, W. E. Walsh, and J. O. Kephart. Utility-function-driven resource allocation in autonomic systems. In Autonomic Computing, 2005. ICAC 2005. Proceedings. Second International Conference on, pages 342–343, June 2005.

[56] G. Tesauro, N. K. Jong, R. Das, and M. N. Bennani. A hybrid reinforcement learning approach to autonomic resource allocation. In Proceedings of the 2006 IEEE International Conference on Autonomic Computing, ICAC '06, pages 65–73, Washington, DC, USA, 2006. IEEE Computer Society.

[57] Jianxin Yao, Chen-Khong Tham, and Kah-Yong Ng. Decentralized dynamic workflow scheduling for grid computing using reinforcement learning. In Networks, 2006. ICON '06. 14th IEEE International Conference on, volume 1, pages 1–6, September 2006.

[58] Sebastian Angel, Hitesh Ballani, Thomas Karagiannis, Greg O'Shea, and Eno Thereska. End-to-end performance isolation through virtual datacenters. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI '14, pages 233–248, Berkeley, CA, USA, 2014. USENIX Association.

[59] Facebook. The Parse backend platform. https://parse.com/.

[60] David Flanagan. JavaScript: The Definitive Guide. O'Reilly Media, Inc., 2006.

[61] Peter Bailis and Ali Ghodsi. Eventual consistency today: Limitations, extensions, and beyond. Communications of the ACM, 56(5):55–63, 2013.
[62] Peter Bailis, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. Bolt-on causal consistency. In SIGMOD 2013, pages 761–772. ACM, 2013.

[63] Kenneth P. Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu, and Yaron Minsky. Bimodal multicast. ACM Trans. Comput. Syst., 17(2):41–88, May 1999.

[64] Fastly. Blog post on multicast implementation in Fastly. http://www.fastly.com/blog/building-fast-and-reliable-purging-system/, February 2014.

[65] MongoDB, Inc. MongoDB. http://www.mongodb.org/.

[66] MongoDB, Inc. Tutorial on query optimization for MongoDB. http://docs.mongodb.org/manual/core/query-optimization/, 2015.

[67] Ilya Grigorik. Presentation on HTTP/2 mechanics. goo.gl/8yczyz.

[68] Saar Cohen and Yossi Matias. Spectral Bloom filters. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, SIGMOD '03, pages 241–252, New York, NY, USA, 2003. ACM.

[69] The Apache Software Foundation. Apache Storm. https://storm.apache.org/.

[70] Craig Jefferds. sift.js library for evaluating MongoDB queries. https://github.com/crcn/sift.js/tree/master.

[71] R. G. Gallager. Discrete Stochastic Processes. The Springer International Series in Engineering and Computer Science. Springer US, 1995.

[72] Amine Abou-Rjeili and George Karypis. Multilevel algorithms for partitioning power-law graphs. In Proceedings of the 20th International Conference on Parallel and Distributed Processing, IPDPS '06, pages 124–124, Washington, DC, USA, 2006. IEEE Computer Society.

[73] Yinglian Xie and D. O'Hallaron. Locality in search engine queries and its implications for caching. In INFOCOM 2002. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, volume 3, pages 1238–1247, 2002.

[74] Satinder Singh and Richard S. Sutton. Reinforcement learning with replacing eligibility traces. In Machine Learning, pages 123–158, 1996.

[75] Hesam Montazeri, Sajjad Moradi, and Reza Safabakhsh. Continuous state/action reinforcement learning: A growing self-organizing map approach. Neurocomputing, 74(7):1069–1082, 2011.

[76] Djallel Bouneffouf, Amel Bouzeghoub, and Alda Lopes Gançarski. A contextual-bandit algorithm for mobile context-aware recommender system. In Neural Information Processing, pages 324–331. Springer, 2012.

[77] Graham Cormode, Minos Garofalakis, Peter J. Haas, and Chris Jermaine. Synopses for massive data: Samples, histograms, wavelets, sketches. Found. Trends Databases, 4(1–3):1–294, January 2012.

[78] Charu C. Aggarwal. On biased reservoir sampling in the presence of stream evolution. In Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB '06, pages 607–618. VLDB Endowment, 2006.

[79] Jeffrey S. Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS), 11(1):37–57, 1985.

[80] Graham Cormode, Vladislav Shkapenyuk, Divesh Srivastava, and Bojian Xu. Forward decay: A practical time decay model for streaming systems. In Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on, pages 138–149. IEEE, 2009.

[81] James Hensman, Nicolo Fusi, and Neil D. Lawrence. Gaussian processes for big data. arXiv preprint arXiv:1309.6835, 2013.

[82] Joaquin Quiñonero-Candela and Carl Edward Rasmussen. A unifying view of sparse approximate Gaussian process regression. The Journal of Machine Learning Research, 6:1939–1959, 2005.

[83] Edward Snelson and Zoubin Ghahramani. Local and global sparse Gaussian process approximations. In International Conference on Artificial Intelligence and Statistics, pages 524–531, 2007.
[84] James S. Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper-parameter optimization. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 2546–2554. Curran Associates, Inc., 2011.

[85] James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. J. Mach. Learn. Res., 13(1):281–305, February 2012.

[86] Nimalan Mahendran, Ziyu Wang, Firas Hamze, and Nando de Freitas. Adaptive MCMC with Bayesian optimization. In Neil D. Lawrence and Mark Girolami, editors, AISTATS, volume 22 of JMLR Proceedings, pages 751–760. JMLR.org, 2012.

[87] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 2951–2959. Curran Associates, Inc., 2012.

[88] Jasper Snoek. Spearmint package for Bayesian optimisation. https://github.com/HIPS/Spearmint.

[89] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10, pages 143–154, New York, NY, USA, 2010. ACM.

[90] Amazon Web Services. Amazon Elastic Compute Cloud (Amazon EC2). http://aws.amazon.com/de/ec2/.

[91] Scott T. Leutenegger and Daniel Dias. A modeling study of the TPC-C benchmark. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, SIGMOD '93, pages 22–31, New York, NY, USA, 1993. ACM.

[92] Michael Schaarschmidt. GitHub project page of the Monte Carlo simulation framework. https://github.com/mschaars/Query-Simulation-Framework.

[93] Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, IMC '07, pages 29–42, New York, NY, USA, 2007. ACM.

[94] L. Breslau, Pei Cao, Li Fan, G. Phillips, and S. Shenker. Web caching and Zipf-like distributions: Evidence and implications. In INFOCOM '99. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, volume 1, pages 126–134, March 1999.

[95] Daniel S. Myers. Lecture notes on exponential distributions. http://pages.cs.wisc.edu/~dsmyers/cs547/lecture_9_memoryless_property.pdf.