
Towards Latency: An Online Learning Mechanism for Caching Dynamic Query Content
Michael Schaarschmidt
Sidney Sussex College
A dissertation submitted to the University of Cambridge
in partial fulfilment of the requirements for the degree of
Master of Philosophy in Advanced Computer Science
(Research Project - Option B)
University of Cambridge
Computer Laboratory
William Gates Building
15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom
Email: [email protected]
June 11, 2015
Declaration
I, Michael Schaarschmidt of Sidney Sussex College, being a candidate for the
M.Phil in Advanced Computer Science, hereby declare that this report and
the work described in it are my own work, unaided except as may be specified
below, and that the report does not contain material that has already been
used to any substantial extent for a comparable purpose.
Total word count: 14,975 (excluding appendices A and B)
Signed:
Date:
This dissertation is copyright © 2015 Michael Schaarschmidt.
All trademarks used in this dissertation are hereby acknowledged.
Acknowledgements
I would like to express gratitude to my supervisor Dr. Eiko Yoneki for her comments, advice and encouragement throughout this project. I would especially like to thank Felix Gessert for his advice and our discussions on practical caching issues. Additionally, I would like to thank Valentin Dalibard for his insights into Bayesian optimisation. Finally, I want to thank Dr. Damien Fay for his comments on online learning.
Abstract
This study investigates caching models of dynamic query content in distributed web infrastructures. Web performance is largely governed by latency
and the number of round-trips required to retrieve content. It has also been
established that latency is directly linked to user behaviour and satisfaction
[1]. Recently, access latency has gained importance together with service
abstraction in the data management space. Instead of having to manage a
dedicated cluster of database servers on premises, applications can use highly-available and scalable database-as-a-service (DBaaS) platforms. These services typically provide a REST interface to a set of basic database operations
[2]. A REST-ful approach enables the use of HTTP caching through browser
caches, content delivery networks (CDNs), proxy caches and reverse proxy
caches [3, 4, 5]. Such methods are used extensively to cache static content like JavaScript libraries or background images. However, caching result
sets of database queries over an arbitrary number of dynamic objects in distributed infrastructures poses multiple challenges. First, any query-caching
scheme needs to maintain consistency from the client’s perspective, i.e. a
cache should not return stale content. From the server’s perspective, it is
hard to predict an optimal expiration for a collection of objects that form a
query result since each individual object is read and updated with arbitrary
frequency. DBaaS providers thus generally do not cache their interactive
content, resulting in noticeable loading times when interacting with dynamic
applications.
This project introduces a comprehensive scheme for caching dynamic query
results. The first component of this model is based upon the idea that there
are multiple ways to represent and cache query results. Further, the model
relies on a stochastic method to estimate optimal expiration times for dynamically changing content. Finally, an online learning model enables real-time
decisions on the different cache representations. As a result, the model is able
to provide imperceptible request latency and consistent reads for clients.
Contents
1 Introduction

2 Background and Related Work
  2.1 Web Caching
    2.1.1 Introduction to Web Caching
    2.1.2 Previous Work
  2.2 Bloom Filters
  2.3 Monte Carlo Methods
  2.4 Machine Learning
    2.4.1 Reinforcement Learning
    2.4.2 Machine Learning in Data Management

3 Caching Queries
  3.1 Introduction
    3.1.1 The Latency Problem
    3.1.2 The Staleness Problem
    3.1.3 Model Assumptions and Terminology
  3.2 Caching Models for Queries
    3.2.1 Caching Object-Lists
    3.2.2 Caching Id-Lists
    3.2.3 Matching Queries to Updates
    3.2.4 When Not to Cache
  3.3 Estimating Expirations
    3.3.1 Approximating Poisson Processes
    3.3.2 Write-Only Estimation
    3.3.3 Dynamic Quantile Estimation

4 Online Learning
  4.1 Introduction
  4.2 Representation as an MDP
    4.2.1 State and Action Spaces
    4.2.2 Decision Granularity
    4.2.3 Reward Signals
  4.3 Belief State Approximation
    4.3.1 Convergence and Exploration
    4.3.2 Sampling Techniques
    4.3.3 Hyperparameter Optimisation

5 Evaluation
  5.1 Aims
  5.2 Simulation Framework
    5.2.1 Design and Implementation
    5.2.2 Benchmark Configuration
  5.3 Comparing Execution Models
    5.3.1 Read-Dominant Workload
    5.3.2 Write-Dominant Workload
  5.4 Consistency and Invalidations
    5.4.1 Adjusting Quantiles
    5.4.2 Reducing Invalidation Load
  5.5 Online Learning
    5.5.1 Learning Decisions
    5.5.2 Evaluating Trade-offs
    5.5.3 Convergence and Stability

6 Outlook and Conclusion
  6.1 Summary and Conclusion
  6.2 Future Work
    6.2.1 Parsing Query Predicates
    6.2.2 Unified Learning Model

A Proofs
  A.1 Minimum of Exponential Random Variables

B Additional Analysis
  B.1 Impact of Invalidation Latency
  B.2 Monte Carlo Optimisation
List of Figures
1.1 Simplified caching architecture with clients in Europe bound by access latency to a backend server in the USA.
2.1 Empty Bloom filter of length m.
2.2 Insertion of new element e into Bloom filter.
2.3 Reinforcement learning: An agent takes actions and observes new states and rewards through its environment.
3.1 Query matching architecture overview. A load balancer distributes requests from caches. An invalidation engine determines which query results are stale. Bloom filters can then be used to determine whether they are still cached at some CDN edge.
3.2 Topology of an Apache Storm invalidation pipeline. After-images of update operations are published to Storm spouts (data stream endpoints). They determine which bolt (stream processing node) holds the cached queries related to that update. Bolts evaluate the queries on the after-image to find which result sets are invalid and notify the DBaaS, which sends invalidations to the cache.
4.1 Utility function example for response times.
5.1 Overview of the simulation architecture.
5.2 Cache hit rates as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes.
5.3 Average query response times as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes.
5.4 Cache hit rates as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes under a uniform access distribution.
5.5 Average query response times as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes under a uniform access distribution.
5.6 Average query response times as a function of average query selectivity on a mixture of 25% reads, 25% queries and 50% writes. Estimation quantiles have been adjusted to p = 0.4 to account for the write-dominant workload.
5.7 Cache hit rates as a function of average query selectivity on a mixture of 25% reads, 25% queries and 50% writes.
5.8 Absolute number of stale reads on the write-dominant workload as a function of the quantile of the next expected write.
5.9 Cache hit rates on the write-dominant workload as a function of the quantile of the next expected write.
5.10 Cache hit rates on the write-dominant workload as a function of the quantile of the next expected write and compared to a static caching method.
5.11 Invalidation loads for using the naive id-list approach versus dynamically marking frequently written objects as uncachable.
5.12 Global utility as a function of operations performed.
5.13 Behaviour of learning model versus random guessing under a change of workload mixture.
B.1 Stale reads as a function of mean invalidation latency on 100,000 operations. Higher invalidation latency gives rise to more stale reads, as there is a bigger time window to retrieve stale content from the cache. Marking frequently written objects as uncachable reduces this effect.
List of Tables
3.1 Employee table.
3.2 CDN after caching Q1 as an object-list.
3.3 CDN after caching Q1, Q2 as object-lists.
3.4 CDN after caching Q1 as an id-list, before client has requested individual resources.
3.5 CDN after client has requested all individual resources.
3.6 CDN after invalidation of id 1, id-list still matches query predicate.
3.7 CDN after invalidation of id 1, id-list does not match query predicate any more.
5.1 Average overall request response times (ms) for learning model compared to random guessing and static decisions on a read-dominant workload.
5.2 Cache hit rates for learning model compared to random guessing and static decisions on a read-dominant workload.
5.3 Average query response times (ms) for learning model compared to random guessing on execution model on read-dominant workload.
5.4 Average request response times (ms) for learning model compared to random guessing on execution model on write-dominant workload.
5.5 Invalidation loads for learning model compared to random guessing on execution model on write-dominant workload.
B.1 Bayesian optimisation of optimal quantile p and maximum allowed ttl.
Chapter 1
Introduction
In recent years, cloud computing has allowed users to delegate the task of
storing and managing data for web-based services. In particular, companies
or individual developers can now completely withdraw from the costly task
of setting up and maintaining dedicated database servers. Instead, they can
utilise database-as-a-service (DBaaS) platforms that offer service level agreements on availability and performance, flexible pricing models and elastic
configurations that can quickly allocate virtualised resources [2].
This project assesses how providers of such services can cache content that
is updated by users interacting with dynamic applications, e.g. mobile applications or web sites. For instance, a social network application constantly
refreshes in order to show the latest content from a user’s network. While
interacting with such applications, users need to wait on the DBaaS server
to deliver new data to their end devices. For an interactive application, imperceptible loading times (ideally below 100 milliseconds) are desirable so
the user's experience is not interrupted. However, this often proves problematic if clients and DBaaS servers are located in different geographic regions: a single round-trip between Europe and the United States takes around 170 milliseconds [6]. Hence, service providers aim to deliver as much content as possible through local cache servers. A good example of effective local cache servers are content delivery networks (CDNs), which are often used to deliver static background images and style sheets.
Caching dynamic content can nevertheless prove difficult because some content may be updated very frequently, e.g. every few seconds. This is problematic because on every update to any part of the content, the DBaaS provider
needs to determine which entries to delete from caches, which is a computationally expensive task for large databases. Further, in order to prevent
clients from reading stale content, the DBaaS would have to continuously send out requests to delete old cached content. These issues of maintaining
a consistent view of the data for the client while not blocking performance of
the backend server generally prevent DBaaS providers from caching volatile
content. Consequently, many applications suffer from long loading times.
This study proposes a comprehensive caching scheme for caching dynamic
query content. Figure 1.1 provides an abstract view of the suggested caching infrastructure. There are clients in one geographic region and a DBaaS in another geographic region. In a drastically simplified network topology, clients contact the geographically closest CDN server to access content, thus minimising latency. By interacting with their applications, clients also continuously
update content, e.g. by posting a comment in a social feed. This project thus
deals with mechanisms that allow the DBaaS to cache the content of queries
that change on the scale of seconds while ensuring high consistency at the
client. On a high level, this will be achieved by monitoring access metrics
like the frequency of incoming updates, thus allowing for a stochastic view
on optimal cache expirations. Further insights with regard to the semantics of caching lead to different caching models. Acting on these results, a
machine learning module will provide an effective model for online decisions
on incoming queries. Through the combination of semantic insights about
caching, stochastic analysis and machine learning, I will therefore present the
first comprehensive web caching scheme for highly volatile query content.
Figure 1.1: Simplified caching architecture with clients in Europe bound by access latency to a backend server in the USA.
In summary, this work makes the following contributions:
• A comprehensive scheme for caching dynamic query results, thus enabling interactive applications with drastically reduced response times.
• A stochastic method to estimate optimal expiration times for dynamically changing query results.
• An online learning mechanism that can adapt caching models to changing request loads.
• A dedicated Monte Carlo simulation framework which can be used to
analyse various properties of query processing.
The structure of my dissertation is as follows: Chapter 2 provides a brief overview of REST-based web caching, Bloom filters, Monte Carlo methods and reinforcement learning, as well as of related work on these topics.
Chapter 3 introduces the concept of query caching and the implications of
different cache representations. Chapter 4 proposes a machine learning model
that provides online decisions on these representations. In chapter 5, I first
explain the implementation of my simulation framework before evaluating
different cache representations and the learning model. Finally, chapter 6
summarises my findings and concludes with an outlook on future work.
Chapter 2
Background and Related Work
2.1 Web Caching

2.1.1 Introduction to Web Caching
In this chapter, I provide an overview of some essential concepts concerning web caching, Bloom filters and machine learning. I also supply recent examples of work related to these concepts. In doing so, I assume the reader to
be familiar with the basic ideas of web protocols, database management and
probability theory. This section begins with an introduction to web (HTTP)
caching.
The fundamental challenge of web caching is consistency, i.e. the requirement that content read from a cache be up to date. For consistency purposes, there are essentially two types of caches. Expiration-based caches
like browser caches, forward proxy caches or ISP caches control consistency
through freshness and validation. Freshness is the duration for which a
cached copy is considered fresh and can be controlled through “max-age”
or “expires” HTTP headers. For instance, if a cached object expires after
one minute, then there is a clear one minute upper limit on how long a client
may see old content if the original content is modified. Expiration-based
caches can also validate their content by using an “If-Modified-Since” header in a
refresh request. On the other hand, invalidation-based caches like content
delivery networks or reverse proxy caches are server-controlled caches. That
is, the origin server of the content can actively control consistency by deleting
(invalidating) content from the cache through a specific HTTP request [7, 8].
A reverse proxy is usually located at the network of the origin server and can
be used to hide the structure of the internal network, reduce and distribute
load from incoming requests and cache content from the origin server. Note
that reverse proxies, due to their location at the origin server, do not aid
in mitigating latency caused by access from a geographically distant client.
Thus, this project deals primarily with invalidation-based mechanics using
the example of CDNs.
Content delivery networks distribute content through a globally distributed
network of cache servers (or edge servers). Requests from clients are usually
routed to the closest edge server to minimise access latency. There are various
types of CDN architectures, network topologies and use cases. CDNs can
be used to cache complete websites in the function of a proxy cache, to
synchronise and deliver streaming content or to cache the embedded static
parts (e.g. stylesheets) of dynamic websites. For this project, the most
relevant feature of CDNs is their invalidation mechanism. The origin server
generally sets an expiration for the cached content. However, the origin server
can also ask the CDN to remove the content through an invalidation request,
which means the origin server can actively mitigate reads of stale content
from the cache. Clients can also add revalidation headers to their request
if they do not want to risk reading stale cache content. This instructs the
CDN to request the latest content version from the origin server.
The key point here is that this does not enforce strong consistency because
the cache does not know about updates at the DBaaS immediately. Instead,
there is the notion of eventual consistency, i.e. consistency requirements are
relaxed for higher performance (more cache hits, fewer requests sent to the
origin server) [9]. Even if the origin server sends an invalidation to the CDN
directly after an update, there is a small time window until the invalidation is
completed in which clients can read stale content. A read is only guaranteed
to be consistent if it adds a revalidation header, thus excluding cached content
and increasing load at the origin server. A large part of this work concerns the
mechanisms of invalidation for dynamic query content and their implication
for overall system performance.
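As a brief illustration of the client-side option, the following minimal Python sketch forces revalidation with a standard HTTP request header; the endpoint URL is a hypothetical placeholder, not an API from this work:

import requests

# "Cache-Control: no-cache" in a request instructs caches along the path
# to revalidate the stored copy with the origin server before serving it,
# trading an extra round-trip for consistency.
response = requests.get(
    "https://dbaas.example.com/db/employee/1",  # hypothetical endpoint
    headers={"Cache-Control": "no-cache"},
)
print(response.status_code)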
2.1.2 Previous Work
In this section, I briefly survey previous and related work on caching. First,
this project relies upon my own previous work on expiration-based caching
[6] and on the architecture of scalable cloud-databases [10, 11]. Gessert
and I have proposed a comprehensive scheme for leveraging global HTTP
caching infrastructures. More specifically, we have introduced the Cache
Sketch, a Bloom filter-based representation of database records that is used
to enable tunable consistency and performance in web-caching. Throughout
this dissertation, I will repeatedly point towards specific aspects of this work
(and other related work) in order to clarify my analysis. The primary focus of
our previous work was to introduce a proof of concept for dynamic caching of
single database records. The contribution of this project is to advance these
ideas into a model of caching full query results as well as adding an online
learning component for decision making on query-execution. To this end,
Monte Carlo methods are employed to analyse the performance of various
configurations. Monte Carlo simulations have been used previously to help
quantify eventual consistency measures [12, 13, 14].
Recently, Huang et al. have provided an in-depth analysis of a large-scale production caching infrastructure by looking at Facebook's photo cache [15, 16]. Even though this example involves some photo-specific problems (resizing), it still offers relevant insights. Pictures are essentially read-only
content and the challenge in an infrastructure at the scale of Facebook’s photo
cache lies in the huge data volume. Nevertheless, their work can provide an
understanding of typical workloads and achievable cache hit rates. Apart
from this recent work, there is an extensive body of research on the nature
of internet content delivery systems [17, 18] and their workloads [19]. Recent
research has also looked into content delivery networks (CDN) and their role
in dealing with sudden popularity (“flash crowds”) of social media content
as well as with geographically distributed workloads [20, 21, 22].
This work aims to provide low latency by exploiting existing HTTP caching infrastructures. Another popular approach, which however requires additional infrastructure, is geo-replication. Instead of caching data on geographically distributed edges of a CDN infrastructure, the database system itself is globally distributed [23, 24]. A primary example of this is Google's Spanner [25], which replicates data across datacenters and manages serialisation of distributed transactions through globally meaningful commit timestamps.
This enables globally-consistent reads across the database for a given timestamp. The main performance issue stems from the fact that synchronisation
between data centers is costly, as it is bound by round-trip delay time between geographic regions. Finally, there have been some previous efforts into
scalable query caching. Garrod et al. have achieved high cache hit rates by
using proxy servers with a distributed consistency management model based
on a publish/subscribe invalidation architecture [26]. There have also been
some efforts into adaptive time-to-live (ttl) estimation of web-search results
[27]. This work distinguishes itself from previous work in several respects. First,
it uses existing HTTP infrastructure and does not require additional dedicated servers for caching. Employing stochastic models, this work provides
a record-level analysis of query results to provide much more fine-grained
ttl estimates. Furthermore, the online learning model can achieve tunable
trade-offs between average query response time, consistency and server load
by changing execution models for queries at runtime.
2.2 Bloom Filters
Bloom filters are space-efficient probabilistic data structures that allow membership queries on sets with a certain false positive rate [28]. A Bloom filter
represents a set S of n elements through a bit array of length m. It also
requires k independent hash functions h1 , . . . , hk with range 1, . . . , m that
map each element uniformly to a random index of the bit array. To save
an element s ∈ S to the Bloom filter, all k hash functions are computed
independently and the appropriate indices in the bit array are set to 1 (and
stay 1 if they were already set from another insert), as seen in figures 2.1 and
2.2.
Figure 2.1: Empty Bloom filter of length m.
Figure 2.2: Insertion of new element e into Bloom filter.
A membership query can then be performed by again computing the hash
functions and looking up if all k result indices are set to 1. This means that
a false positive occurs through hash collisions if inserts from other elements
have already set the relevant bits. An extension of this concept is the counting
Bloom filter, which has counters instead of single bits, thus also enabling the
deletion of elements through decreasing the counter (which could cause a
false negative with a single bit). It can then be shown that the false positive
rate can be approximated as follows [29]:
f = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^k \approx \left(1 - e^{-kn/m}\right)^k.    (2.1)
The implication of being able to determine the false positive rate as a function of the number of expected elements n, the length m and the number of hash functions k is that Bloom filters are precisely tunable, i.e. the size can be controlled according to a target false positive
rate. Bloom filters have found particular use in networking applications, as
they can be transferred quickly due to their compact representation [30, 31,
32].
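To make these mechanics concrete, the following minimal Python sketch implements insertion and membership queries; deriving the k hash functions by salting a single digest is an illustrative choice, not a production design:

import hashlib

class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = [0] * m  # bit array of length m, initially all zero

    def _indices(self, element: str):
        # Derive k "independent" hash functions by salting one digest with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{element}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, element: str) -> None:
        for idx in self._indices(element):
            self.bits[idx] = 1  # stays 1 if already set by another insert

    def might_contain(self, element: str) -> bool:
        # True can be a false positive (hash collisions); False is exact.
        return all(self.bits[idx] for idx in self._indices(element))

bf = BloomFilter(m=1024, k=4)
bf.add("employee/1")
assert bf.might_contain("employee/1")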
2.3 Monte Carlo Methods
Monte Carlo methods are a set of computational techniques that are used
to approximate distributions in experiments through repeated random sampling. Monte Carlo simulations are widely employed in the physical sciences
to model and understand the behaviour of probabilistic systems. They essentially rely on the law of large numbers, i.e. the expectation that the sample
mean over a sufficient number of inputs will approximate the actual mean
of the target distribution [33]. There are three central components to Monte
Carlo simulations [34]:
(1) A known input distribution for the system.
(2) Random sampling from the input distribution and simulation of the
system and its conditions of interest under the sampled inputs.
(3) Numerical evaluation of the aggregated results.
A generic approach to Monte Carlo simulation is the construction of a Markov
Chain that converges to a target density equal to the distribution of interest.
This is particularly relevant to the simulation of complex multivariate distributions. Consequently, there is an extensive body of research on sampling
methods, notably Gibbs sampling and the Metropolis-Hastings algorithm
[35, 36]. Monte Carlo simulation is also useful in the analysis of distributed
systems and caching infrastructures. In particular, Monte Carlo simulation
of access and latency distributions enables detailed analysis of caching behaviour, as it can quantify the impact of small changes in latency and workload on performance. Fortunately, simulation of database workloads can be
achieved by drawing a key for a database entry to access from a univariate
discrete distribution. An easy way to do this is the inverse integral transform
method, which will be introduced briefly [37]. Consider a discrete random
variable X to sample from and its probability mass function
f_X(x) = \Pr(X = x_j) = p_j, \quad j = 1, 2, \ldots, \qquad \sum_j p_j = 1,    (2.2)

as well as its cumulative mass function

\Pr(X \le x_j) \equiv F(x_j) = p_1 + p_2 + \cdots + p_j.    (2.3)

The inverse then takes the form

F^{-1}(u) = x_j \quad \text{if} \quad p_1 + p_2 + \cdots + p_{j-1} \le u \le p_1 + p_2 + \cdots + p_j.    (2.4)

Hence, the discrete distribution can be sampled by drawing a sample U from a distribution uniform on (0, 1) and then computing the inverse F^{-1}(U), as described by Chib [38]. It thus follows that one can sample x_j with its probability p_j because

\Pr(F^{-1}(U) = x_j) = \Pr(p_1 + \cdots + p_{j-1} \le U \le p_1 + \cdots + p_j) = p_j.    (2.5)
In the Monte Carlo simulation framework, inverse integral transform is used
because it is computationally inexpensive and provides good accuracy.
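A minimal sketch of this sampling routine; the keys and access probabilities are hypothetical examples:

import random

def sample_discrete(keys, probabilities):
    # Inverse integral transform: draw U ~ Uniform(0, 1) and return the
    # first key whose cumulative probability reaches U (equation 2.4).
    u = random.random()
    cumulative = 0.0
    for key, p in zip(keys, probabilities):
        cumulative += p
        if u <= cumulative:
            return key
    return keys[-1]  # guard against floating-point round-off

# Hypothetical skewed access distribution over four database keys.
keys, probs = [1, 2, 3, 4], [0.6, 0.2, 0.15, 0.05]
samples = [sample_discrete(keys, probs) for _ in range(100000)]
print(samples.count(1) / len(samples))  # approximately 0.6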
2.4 Machine Learning

2.4.1 Reinforcement Learning
Reinforcement learning (RL) is a machine learning technique that is characterised by software agents that interact with an environment and learn
optimal behaviour through rewards on the actions they take [39]. Initially,
the agent does not know how its actions change its environment and thus
has to explore the space of available actions (as schematically depicted in
figure 2.3). Hence, RL does not require an explicit analytical model of the
environment.
Figure 2.3: Reinforcement learning: An agent takes actions and observes new states and rewards through its environment.
More precisely, RL is a form of sequential decision making. The goal of the
agent is to select actions that maximise the sum of all future rewards. A
reward is a scalar feedback value the agent receives after taking an action.
Rewards can be stochastic and delayed, thus making it harder for the agent to
reason about the consequences of its actions. For instance, a single move in
a board game during the beginning of a match might have consequences that
only become apparent after one player wins. Variations of RL have been used
in various applications, notably navigation in robotics [40, 41] and complex
board games [42, 43, 44, 45]. Formally, RL problems can be understood as
Markov decision processes (MDPs). A finite MDP has four elements [46, 39]:
(1) A set of states S.
(2) A set of actions A.
(3) For a given pair of state and action (s, a) at some point in time t, a transition probability of possible next states s':

P^a_{ss'} = \Pr\{s_{t+1} = s' \mid s_t = s, a_t = a\}.

(4) The associated expected reward for a transition from s to s' through a:

R^a_{ss'} = E\{r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s'\}.
A policy then maps states to actions that presumably maximise rewards.
In general, RL techniques aim to find optimal policies for a given MDP by
iteratively improving upon their current estimates for state and action pairs
as they observe rewards. A popular RL method is Q-learning, which can
learn optimal policies by comparing expected cumulative rewards (Q-values)
in environments with stochastic transitions and rewards [47]. This is achieved
by updating a function Q : S × A → R:
Q_{t+1}(s_t, a_t) \leftarrow Q_t(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q_t(s_{t+1}, a) - Q_t(s_t, a_t) \right]    (2.6)
Intuitively, initial Q-value estimates are adjusted by combining the observed reward r_{t+1} after a transition with the value of the action that is estimated to maximise future rewards. Updates are parametrised through a learning
rate α and a discount factor γ that prohibits infinite rewards in state-action
loops. A central component of RL is the trade-off between exploitation and
exploration. During learning, the agent needs to explore its environment by trying out actions that are non-optimal under its current policy, to find whether these state-action sequences lead to higher overall rewards than following the current policy. Typically, this exploration rate is decreased over
time so the agent eventually primarily exploits its found policy.
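As an illustration of update (2.6) combined with ε-greedy exploration, consider the following minimal tabular sketch; the state and action names are placeholders rather than the model developed in chapter 4:

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # learning rate, discount, exploration
ACTIONS = ["object-list", "id-list"]    # hypothetical action space
Q = defaultdict(float)                  # Q[(state, action)], initialised to 0

def choose_action(state):
    # epsilon-greedy: explore with probability EPSILON, otherwise exploit.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    # Equation (2.6): move the current estimate towards the observed
    # reward plus the discounted best estimate for the successor state.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

q_update("read-heavy", choose_action("read-heavy"), reward=1.0,
         next_state="read-heavy")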
Reinforcement learning (RL) in small (finite) state and action spaces is a well-understood application of finite Markov decision processes [39, 48, 47, 49].
For large state and action spaces, policies cannot be expressed as simple
lookup tables of actions and associated rewards on transitions. Hence, function approximators are frequently employed to estimate rewards [50, 42].
Specifically, the rise of deep neural networks has recently inspired novel research, as neural networks have been successfully applied to approximate
delayed rewards in complex noisy environments [51, 52]. Other RL approaches from Bayesian learning employ Gaussian processes to estimate rewards [53, 54].
2.4.2 Machine Learning in Data Management
In recent years, many scientific disciplines have begun to investigate machine
learning methods as a new tool for research in their domains. In database
management and cloud computing, a particularly interesting problem is the
question of how to adapt behaviour to changing workloads. Traditional rule-based or threshold-based approaches to resource allocation in compute clusters (e.g.
provision a new server if load is over a given percentage of capacity) can be
replaced by online learning strategies [55, 56, 57]. Such improvements can be
practically achieved by implementing a middleware control layer that tracks
data flow in real-time on top of web-based services. For instance, Angel et
al. have demonstrated how to provide throughput guarantees for multi-tenancy clusters by network request header inspection [58]. Alici et al. have
explored machine-learning-based expiration estimation strategies for search engine results which, however, rely on offline training data to build a set of features and cannot adapt to highly dynamic workloads [27]. This is because
their approach works on a much larger time-scale of months, whereas this
work provides a learning model that recognises workload changes in a matter
of minutes.
The advantage of the middleware approach is that it allows for a more generic
and transferable learning process, as opposed to interfering with the underlying application to achieve more control over specific configurations. In
this project, I also opt for a middleware approach and treat the database
as a query execution service to the learning model. This way, the concept
is not limited to specific query languages or database paradigms but relies
only on properties of request distributions that are interpreted as stochastic
processes.
Chapter 3
Caching Queries
3.1 Introduction
This chapter contains an in-depth description of the query caching scheme.
First, the challenges of caching dynamic content are discussed in detail. Next,
different representations for caching queries are suggested. Further, the problem of determining which queries are invalidated by an update is considered.
Finally, a stochastic method to determine optimal expirations for query results is introduced.
3.1.1 The Latency Problem
The framework of assumptions is a database-as-a-service provider exposing
its API through an HTTP/REST interface, e.g. Facebook's popular Parse platform [59]. Understanding the structure of modern web or mobile applications helps to see the importance of latency. In general, there are two
aspects to the performance of a web-based application. First, there is an
initial loading period when the browser has to request all external resources
and build the Document Object Model of the application. The duration of
this so-called critical rendering path depends on the number of critical external resources, their size and the time it takes to fetch them from a DBaaS,
a CDN or a proxy cache. Loading times hence depend on the number of
round-trips and on the round-trip latency. Static resources like JavaScript libraries or background images are thus cached on all levels of the web-caching
hierarchy. Second, there is dynamic and interactive content that has to be
requested by the end device while the user is interacting with the application. Single-page applications are a typical form of this interaction. On a
single-page application, all navigation happens within a single website that is
never left but dynamically changed depending on user actions [60]. Considering mobile applications running with DBaaS platforms, application logic
is often executed on the client side (smart client), whereas the server is primarily a data management service. Consequently, user experience critically
depends on low latencies of all interactions with the DBaaS, which includes
minimising the number of geographically-bound latency round-trips as well
as maximising cache hits in the caching hierarchy.
3.1.2 The Staleness Problem
The latency problem cannot be solved by simply pushing as much dynamic
content as possible into various layers of the caching hierarchy. Without
further measures, writes would continuously flush the caches. This creates
problems for both clients and servers. First, determining which objects are
potentially alive in which layer of the caching hierarchy and sending invalidation requests creates load on the DBaaS. Further, every invalidation creates
a potential stale read for the client. A stale read occurs in the following
situation:
• A client sends an update on some object and gets an acknowledgement
at some point t0 for version vw .
• At some later point in time t1 , a client requests the same object.
• The cache returns the object with some version vr .
• If the write-acknowledged version vw is newer than the version vr read
from the cache, the read is stale.
How can the cache return an older version if the server already acknowledged the write of a newer version? This is because invalidation is generally
an asynchronous operation. The DBaaS executes an update and waits for
a write-acknowledgement from the database. It then sends an acknowledgement of the write back to the client and an invalidation to the appropriate
caching layers. Blocking on invalidations is not feasible because an invalidation could get lost (e.g. network partition) and the DBaaS would lose
availability. Further, a single cached object might have to be invalidated
over multiple geographic locations, thus potentially incurring multiple expensive round-trips. This work thus aims for a best-effort view on eventual
consistency. First, if the cached content expires before a write, there can be
no stale read. Second, even if a cached item has not expired, there cannot
be a stale read if the invalidation is fast enough and there is enough time
between an update and subsequent read. There is an inherent trade-off between providing results both in a consistent and timely manner. This notion
of eventual consistency has been thoroughly investigated by Bailis et al. in
their seminal work on probabilistically bounded staleness [12, 61, 62, 13]. In
particular, they were able to provide expected bounds on staleness for certain
request distributions.
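The timing argument can be made concrete with a toy check: a cached read is stale exactly when it occurs while the old copy is still unexpired and the asynchronous invalidation has not yet completed. All timestamps below are hypothetical milliseconds:

def is_stale_read(t_write_ack, t_read, invalidation_latency, t_expiry):
    # The old version can only be served while it is still in the cache ...
    still_cached = t_read < t_expiry
    # ... and only before the asynchronous invalidation has completed.
    invalidation_pending = t_read < t_write_ack + invalidation_latency
    return still_cached and invalidation_pending

# Write acknowledged at t=0, invalidation takes 150 ms, copy expires at t=500.
print(is_stale_read(0, 100, 150, 500))  # True: read falls inside the window
print(is_stale_read(0, 200, 150, 500))  # False: invalidation already done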
In summary, the problem of staleness and invalidation load makes it prohibitively expensive to cache dynamic query content. To the best of my
knowledge, DBaaS-providers and other web-services thus refrain from caching
their interactive content.
3.1.3 Model Assumptions and Terminology
Previous work has proposed a cache-coherent scheme to cache volatile objects [6], as summarised in chapter 2. This work dealt exclusively with simple
create, read and update operations on individual objects. However, at least
regarding the content of the requests, this was rather a proof-of-concept,
as actual queries usually do not just request single database objects. This
project thus turns to investigate query-specific problems. The first insight
of the query-caching scheme is to acknowledge that there are multiple ways
to execute queries and represent query results. Before introducing these different representations, it is worth discussing some model assumptions. As
explained in the introduction on web-caching, there is generally a whole hierarchy of expiration and invalidation-based caches. This work concentrates
on the specific interaction of clients and servers with an invalidation-based
cache, i.e. a CDN. When I use the term “the cache” in the remainder of this
work, I am referring to an instance of an invalidation-based cache. Note that
a CDN has multiple points of presence, ideally one in all major geographic
regions.
It is a valid reduction to investigate caching behaviour on a single edge-server.
First, for a given client, requests will usually be routed to the same edge-server in a CDN infrastructure, i.e. the one that minimises (geographically
bounded) round-trip latency. Second, consider the hypothetical case of a
client whose HTTP requests are randomly routed to one of multiple edges.
A write operation still only causes a single invalidation request to be sent out
by the DBaaS. This is because the DBaaS does not have to send invalidation
requests to every edge-server of a CDN. Instead, an invalidation request is
only sent to the closest CDN edge. The invalidation can then be distributed
through a bimodal multicasting protocol [63, 64]. The important insight here
is that the invalidation load for the DBaaS does not depend on the number of
CDN edges (and grows linearly for reverse proxies). Similarly, routing queries
to different edge-servers will lead to a lower cache hit rate on the individual
edges, which can easily be simulated on a single cache by adjusting query
parameters. I thus consider the abstraction of using a single invalidation-based cache to be feasible for the analysis of query-caching behaviour.
Furthermore, the term “database object” needs to be clarified. An object
refers to a single entry in the database, which can be a single row in a relational model, a JSON-style document in a document database or a serialised
string in a key-value store. The usage of the term “object” is primarily motivated by the fact that the DBaaS server represents database entries
as REST-ful resources after retrieving them from the database. This illustrates the point that the proposed caching scheme is independent from the
database employed by the DBaaS. Naturally, the performance of the system
will vary depending on whether the chosen database matches the requirements of the workload. MongoDB, a widespread document database based on JSON-style documents as its basic record [65], is used in the
evaluation. MongoDB organises documents in collections (roughly equivalent to a relational table) and is popular for its scalability and flexibility to
store schema-free data. Finally, this work assumes a large cache so the performance does not depend on cache size or eviction algorithms. It is clear
that a smaller cache leads to systematically lower cache hit rates for certain request distributions. Incorporating this additional degree of freedom is
hence not particularly interesting to this study. Nevertheless, the impact of a
limited cache size will be factored into the discussion of uncachable objects.
3.2 Caching Models for Queries

3.2.1 Caching Object-Lists
The first query-caching model is the naive approach of caching query results
as complete collections of result-objects, i.e. a single entry in the cache maps
a query to its result. The processing flow of this model is fairly straightforward. A client issues a query that initially reaches the closest CDN edge.
In the case of a cache miss, the CDN forwards the request to the DBaaS,
which evaluates the query. It then estimates a time-to-live for the query
result, the mechanics of which will be discussed later. The result is then
returned to the CDN, added to the cache and finally returned to the client.
For all subsequent requests with the same query, a single round-trip to the
CDN is sufficient to retrieve the whole query result, as long as the result has
not expired or has been invalidated. The CDN simply checks the hashes of
incoming queries for a match in its table of cache entries. In terms of minimising response time, this is optimal from the client’s perspective. Note that
the CDN is agnostic towards the content of its entries. It cannot recognise
that some query’s result set is a subset of another cached query result. That
would both require semantic insights into the nature of the queries as well as
knowledge about the structure of the database’s values. This would essentially require a geo-replicated database to locally validate the similarity of
queries, which is a different caching paradigm. However, this project specifically aims to exploit readily available HTTP caching infrastructure that does
not require multiple dedicated DBaaS-server locations.
This model can be illustrated with a simple example. For ease of reading,
I use a relational table and a SQL query, with which I assume the reader is familiar. As pointed out above, even though DBaaS-queries are
typically abstracted to short method calls and often query NoSQL databases,
the mechanics of my caching scheme do not rely on a specific database or
query paradigm. Consider a drastically simplified employee-table that only
contains an id as its primary key and a salary, as seen in table 3.1.
Id  Salary
1   20,000
2   25,000
3   30,000
4   50,000

Table 3.1: Employee table.
A query Q1 now selects all employees with salaries under a certain limit:
SELECT * FROM employee WHERE salary < 30000
Consequently, the CDN will store a mapping from Q1 to the result set, as
seen in table 3.2. The cache is conceptualised as a simple hash table, i.e. the key Q1 refers to its hash.
If another similar query Q2 is evaluated, the CDN blindly caches intersecting
results separately:
SELECT * FROM employee WHERE salary > 22000
Key  Value
Q1   {{id: 1, salary: 20,000}, {id: 2, salary: 25,000}}

Table 3.2: CDN after caching Q1 as an object-list.
The query Q2 will now leave the cache in the state seen in table 3.3.
Key  Value
Q1   {{id: 1, salary: 20,000}, {id: 2, salary: 25,000}}
Q2   {{id: 2, salary: 25,000}, {id: 3, salary: 30,000}, {id: 4, salary: 50,000}}

Table 3.3: CDN after caching Q1, Q2 as object-lists.
Now consider a write on some object that is part of a cached query result.
Since the query result was cached as one big object (i.e. a single list of
database objects), the whole result is invalidated. In the example, both
entries for Q1 and Q2 are removed from the cache if the object with id 2 is
updated. Depending on the workload, this can lead to drastically reduced
cache hit rates, as a single update could empty the whole cache. However, if
result sets of queries have mostly empty intersections, writes invalidate fewer
results and increase cache performance. Note that determining the result
sets that need to be invalidated is a potentially expensive task on its own,
which will be discussed later in this chapter.
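A minimal sketch of the object-list model, with the cache as a hash table from query hashes to complete result sets; the names and data are illustrative:

object_list_cache = {}  # query hash -> complete list of result objects

def cache_query(query: str, result_objects: list) -> None:
    object_list_cache[hash(query)] = result_objects

def invalidate_object(object_id: int) -> None:
    # A single update flushes every cached result set containing the
    # object, which is what hurts hit rates for overlapping queries.
    stale = [key for key, objects in object_list_cache.items()
             if any(obj["id"] == object_id for obj in objects)]
    for key in stale:
        del object_list_cache[key]

cache_query("salary < 30000", [{"id": 1, "salary": 20000},
                               {"id": 2, "salary": 25000}])
cache_query("salary > 22000", [{"id": 2, "salary": 25000},
                               {"id": 3, "salary": 30000},
                               {"id": 4, "salary": 50000}])
invalidate_object(2)
print(object_list_cache)  # {}: the update removed both cached results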
3.2.2 Caching Id-Lists
An alternative way of caching query results is the id-list model. Assuming
an empty cache, the first difference of this model from the object-list approach
is the actual query execution. Instead of executing the query in full and
retrieving complete database objects, the query is intentionally executed to
only return the ids (or keys) of matching objects. This can improve query
cost, as the query can potentially be executed as a so called covered query.
A query is covered if an index covers it: if all fields requested in the query
are part of an index and all result fields are also part of that index, the query
can be executed by querying the index. The index is typically located in the
RAM of the database server and thus significantly faster than disk reads.
This is an established technique for query optimisation and routinely offered
by databases [66]. The DBaaS then returns this list of ids to the CDN, which
creates an entry for it. Reusing the previous example with an initially empty
cache, the cache state is now in the state seen in table 3.4 after executing
Q1 .
Key  Value
Q1   {{id: 1}, {id: 2}}

Table 3.4: CDN after caching Q1 as an id-list, before client has requested individual resources.
Finally, the id-list is passed back to the client. Note that this already incurred
a full round-trip to the DBaaS without delivering any actual result-objects.
The client then starts requesting the individual REST resources identified by
their ids in the list of results, leaving the CDN in the state shown in table
3.5.
Key  Value
Q1   {{id: 1}, {id: 2}}
1    {id: 1, salary: 20,000}
2    {id: 2, salary: 25,000}

Table 3.5: CDN after client has requested all individual resources.
In the worst case, this incurs another full round-trip to the DBaaS for every individual resource. How is this model useful if it can involve so many
expensive round-trips? There are two potential sources of cache hits. First,
every time the client requests one of the resources from the id-list from the
CDN, there is a potential cache hit on that resource. This is because the
cache is potentially “prewarmed” by other queries with intersecting result
sets. Second, if the client issues the same query again, the CDN can return the id-list (which is a separate cache entry), saving a round-trip to the
DBaaS. Furthermore, the client does not necessarily request the individual
resources sequentially, but will usually do so in parallel. I will later explore
the impact of parallel connections as part of the cost of caching queries in
the online learning model. In a best case scenario, the client thus needs one
round-trip to fetch the id-list from the CDN and one round-trip to fetch the
(also cached) individual resources in parallel from the CDN. This seems an
unintuitive choice, since the lower bound on latency is clearly higher than the
object-list model, which only needs one round-trip to the CDN to look up
the query result in its best case.
The advantage of the id-list model becomes more apparent upon consideration of its invalidation mechanics. In the framework of the example, Q1
selected for employees with salaries below 30,000. Now consider an update
that changes the salary of employee 1 from 20,000 to 21,000. The DBaaS
now needs to invalidate resource 1 in the CDN but it does not need to invalidate the id-list, as the same objects still match the query-predicate. In the
example, the invalidation of id 1 would leave the CDN in the state of table
3.6.
Key  Value
Q1   {{id: 1}, {id: 2}}
2    {id: 2, salary: 25,000}

Table 3.6: CDN after invalidation of id 1, id-list still matches query predicate.
The point of this model is that the id-list contains the information which
objects match the query-predicate, whereas the concrete objects are cached
separately. The key advantage compared to the object-list model is thus that
a single update only invalidates entries from the cache that have actually
changed, as opposed to invalidating a whole list of objects. If object 1 had
been updated to a salary over 30,000, this would have invalidated both the
id-list and the resource, as seen in table 3.7.
Key  Value
2    {id: 2, salary: 25,000}

Table 3.7: CDN after invalidation of id 1, id-list does not match query predicate any more.
Even after invalidating both id-list and individual resource, the cached resource 2 can still cause cache hits for other overlapping queries. It is not
hard to see how highly intersecting result sets can increase cache hit rates for
the overall system in this model. Note that in the new HTTP/2 standard,
multiplexing and server push can make the id-list an optimal choice for all
workloads, since round-trips would be the same as for the object-list model
[67].
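A minimal sketch of the id-list invalidation logic just described; for simplicity, the query predicate is passed as a plain function rather than parsed from the query string:

id_lists = {}   # query hash -> list of matching object ids
resources = {}  # object id -> cached object representation

def apply_update(query: str, predicate, updated_obj: dict) -> None:
    # The changed resource itself is always invalidated.
    resources.pop(updated_obj["id"], None)
    ids = id_lists.get(hash(query))
    if ids is None:
        return
    # The id-list only becomes stale if result membership changes.
    if predicate(updated_obj) != (updated_obj["id"] in ids):
        del id_lists[hash(query)]

# Q1 (salary < 30000) cached as an id-list plus individual resources.
id_lists[hash("Q1")] = [1, 2]
resources[1] = {"id": 1, "salary": 20000}
resources[2] = {"id": 2, "salary": 25000}

# Salary of employee 1 rises to 21,000: still matches, id-list survives.
apply_update("Q1", lambda o: o["salary"] < 30000, {"id": 1, "salary": 21000})
print(hash("Q1") in id_lists, 1 in resources)  # True False (cf. table 3.6)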
So far, I have not explained how the DBaaS server detects if an update
invalidates a query-predicate. In the following sections, I will outline how the
task of query invalidation is a key factor in the performance of the caching
scheme.
3.2.3 Matching Queries to Updates
Matching queries to updates is necessary to determine which result sets are
not valid any more. I begin by describing the invalidation mechanism for
caching individual volatile objects, as described by Gessert et al. [6]. A
key point to understanding the invalidation process is remembering the distributed nature of a DBaaS-infrastructure. In principle, there are both multiple cache edges as well as arbitrarily many instances of the DBaaS middleware server, interfaced for instance through an elastic load balancer. This
has the following implication to invalidation: In a system with more than one
DBaaS server, each individual server does not have sufficient information for
invalidation. An invalidation is only necessary if the object is cached, i.e.
if it was read from the DBaaS previously. The problem is that reads and
writes may be processed by different server instances. That means a
server receiving a write request cannot know on its own whether the object
might be cached from a read to another server. Hence, there needs to be a
central lookup-service that keeps track of cached objects and their expirations. Any centralised service is a potential performance bottleneck. Gessert
et al. found an efficient solution by using Redis-backed Bloom filters [10, 68].
Every time an object is read and the DBaaS decides to cache with a certain
time-to-live estimation, it reports the key of the object and the ttl to the
Bloom filter. The Bloom filter is implemented to always keep track of the
longest expiration. If different DBaaS servers have different local estimates
of an optimal expiration, the Bloom filter keeps the ttl of the longest absolute expiration in the future. Thus, whenever an update is processed, the
DBaaS can query the Bloom filter. If it has an entry for the key, the object is
potentially cached at some edge-server of the CDN. The DBaaS then deletes
the object from the Bloom filter and requests an invalidation.
If the object has already expired from the cache, it also has expired from
the Bloom filter, since all estimated expirations are reported. This way,
invalidations are only requested when they are actually necessary, with some
small false positive rate through the Bloom filter. This approach cannot be
used for matching updates to queries because the relation between updates
and affected queries is one-to-many. The DBaaS has no way of knowing
which entries to query from the Bloom filter on an update, so it has to try
to match updated objects to result sets. In principle, the DBaaS can hold
the id-lists of all cached query results in memory, which is suitable for Monte
Carlo simulations.
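As a simplified, illustrative stand-in for the Redis-backed implementation described above: instead of single bits, each slot stores the latest absolute expiration reported for any key hashing to it, so expired entries disappear implicitly (explicit deletion after an invalidation is omitted for brevity):

import hashlib
import time

class ExpiringBloomFilter:
    def __init__(self, m: int = 4096, k: int = 4):
        self.m, self.k = m, k
        self.expiry = [0.0] * m  # per-slot maximum absolute expiration

    def _indices(self, key: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def report_read(self, key: str, ttl: float) -> None:
        # Keep the longest absolute expiration reported by any DBaaS server.
        expires_at = time.time() + ttl
        for idx in self._indices(key):
            self.expiry[idx] = max(self.expiry[idx], expires_at)

    def maybe_cached(self, key: str) -> bool:
        # True with a small false positive rate; once the reported ttl has
        # passed, the key has also expired from the cache and from here.
        now = time.time()
        return all(self.expiry[idx] > now for idx in self._indices(key))

bf = ExpiringBloomFilter()
bf.report_read("employee/1", ttl=60.0)
if bf.maybe_cached("employee/1"):
    pass  # on an update: send an invalidation request to the CDN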
For practical purposes, a distributed stream processing engine like Apache
Storm [69] might be appropriate for query matching. For every update, after-images of the write operation can be streamed into Storm, which evaluates
them against queries and result sets. If the after image of a write does not
match result sets containing the changed object, the queries belonging to the
respective result sets need to be invalidated from the cache, as illustrated in
figure 3.2. Practically, a load balancer routes requests to various instances
of the DBaaS server. Each instance communicates with the database cluster
to execute queries and updates. On every update, an invalidation engine is
consulted to determine which cached query results have become stale. Finally, a central Bloom filter service is consulted to look up if the stale result
is still potentially cached before sending out an invalidation. An overview
of this architecture can be found in figure 3.1. The implementation of the
matching algorithm will depend on the database paradigm.

Figure 3.1: Query matching architecture overview. A load balancer distributes requests from caches. An invalidation engine determines which query results are stale. Bloom filters can then be used to determine whether they are still cached at some CDN edge.

For instance, a
document database like MongoDB represents objects as JSON-documents.
There are specific libraries to evaluate MongoDB queries on JSON-documents
[70], thus enabling the matching of after-images to queries. Instead of going
into more detail on how to achieve query-matching for specific databases,
some high-level comments on the role of invalidation in the caching-scheme
are necessary. Generally, any matching system will only be able to handle
a certain throughput. A possible perspective on this limit would be to
consider the matching throughput a resource that needs to be leveraged optimally for overall performance. This naturally leads to the question of when
it is not feasible to cache an object or query, which I will briefly discuss in
the following section.
3.2.4 When Not to Cache
From the client’s perspective, reading a cached copy is naturally desirable.
Nevertheless, there are situations when it is impractical for the DBaaS to
cache objects. Entries that are (almost) exclusively written should not be
cached. This would increase the risk of stale reads and importantly cause a
high invalidation load. This notion of observing write and read frequencies
is employed in the estimation of expiration for query results in the next
section. However, there is another relevant aspect to the cost of caching.
The invalidation of a single resource comes at predictable computational
cost, i.e. a (constant time) Bloom filter lookup to determine whether the
resource is cached. In contrast, the matching cost of determining which
queries need invalidation is practically unbounded, as an object might be
part of arbitrarily many cached query results. This creates another decision
problem for the DBaaS. It does not only need to decide which caching model
to use for each query, it also needs to make economical decisions not to cache
some queries depending on the matching cost.
Figure 3.2: Topology of an Apache Storm invalidation pipeline. After-images of update operations are published to Storm spouts (data stream endpoints). They determine which bolt (stream processing node) holds the cached queries related to that update. Bolts evaluate the queries on the after-image to find which result sets are invalid and notify the DBaaS, which sends invalidations to the cache.
3.3 Estimating Expirations

3.3.1 Approximating Poisson Processes
I now turn to discussing the estimation model for cache expirations. For
now, the problem of estimating an optimal ttl for a query result is treated
separately from the question of whether to represent the query result as an
object-list or an id-list. Remember, the goal of estimating expirations for
queries is to find an optimal trade-off between invalidation load and cache
hits while also minimising stale reads. Ideally, a cached item will expire right
before an update at the DBaaS so there is no matching cost. My approach
to estimate durations for result sets of queries tries to approximate query
behaviour through Poisson processes.
Poisson processes count the occurrences of events in time intervals and are
characterised by an arrival rate λ and a time interval t. For a Poisson process,
the interarrival times of events have an exponential cumulative distribution
function (CDF), i.e. each of the independent and identically distributed
random variables Xi has the cumulative distribution function
F(x; λ) = 1 − e^(−λx)   for x ≥ 0        (3.1)
and mean 1/λ. The probability for a number of arrivals n in some interval
(0, t] is then given by the Poisson probability mass function (PMF) [71]:
pN(t)(n) = (λt)^n · e^(−λt) / n!        (3.2)
The DBaaS can only approximate the λ of the write-process. For each
database entry, the DBaaS can track the rate of incoming writes λw in some
time window t. The expected time of the next write is then 1/λw . However,
the Poisson process of reads and queries is only partially observable, as the
DBaaS only receives cache misses on queries and reads. In previous work,
expirations for single records were estimated by comparing miss rates and
write rates to compute quantiles on write probabilities [6]. How can expirations for complete result sets be estimated? The result set of a query Q
of cardinality n can be conceptualised as a set of independent exponentially
distributed random variables X1, . . . , Xn with different write rate parameters λw1, . . . , λwn. Estimating the expected time-to-live before one of the
objects is written requires a distribution that models the minimum to the
next write, i.e. min{X1, . . . , Xn}, which is again exponentially distributed
(proof in appendix A):

min{X1, . . . , Xn} ∼ Exponential( Σ_{i=1}^{n} λi )        (3.3)
Hence, the DBaaS can simply compute λmin as the rate-parameter for Q by
summing up write rates on individual records:
λmin = λ1 + . . . + λn        (3.4)
It is questionable whether cache miss rates should be tracked and compared to
write rates, as proposed in previous work. Ultimately, DBaaS providers
are interested in the workload mixture of reads/queries and writes on a given
table or collection. For instance, if the workload is dominated by write operations, ttls should be estimated rather conservatively to reduce invalidations
and stale reads. However, if the read process cannot be directly observed,
there are two options. First, the model can simply rely on writes. Second,
the model can try to approximate the workload mixture of reads and writes
through various measures. In the remainder of this section, I will outline both
alternatives. Further, I will comment on some practical issues of real-time
monitoring at the end of this chapter.
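A minimal sketch of such a per-record write-rate tracker follows; the window length and the use of milliseconds as the time unit are arbitrary choices of this sketch:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Moving-window estimate of a record's write rate (lambda_w):
    // arrivals per millisecond over a fixed window.
    class WriteRateEstimator {
        private final long windowMillis;
        private final Deque<Long> arrivals = new ArrayDeque<>();

        WriteRateEstimator(long windowMillis) {
            this.windowMillis = windowMillis;
        }

        void recordWrite(long nowMillis) {
            arrivals.addLast(nowMillis);
            evict(nowMillis);
        }

        // lambda_w in writes per millisecond; 0 if no write was observed.
        double rate(long nowMillis) {
            evict(nowMillis);
            return arrivals.size() / (double) windowMillis;
        }

        private void evict(long nowMillis) {
            while (!arrivals.isEmpty()
                    && arrivals.peekFirst() < nowMillis - windowMillis) {
                arrivals.removeFirst();
            }
        }
    }

Summing the rate() values over the records of a result set then yields λmin from equation 3.4.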
3.3.2 Write-Only Estimation
From a perspective of scalability, tracking miss rates on every database record
can be too expensive. However, one could also take a position of ignoring the
read proportion of the workload. Instead, one could base the ttl estimation
simply on the probability of the next write. This requires the inverse CDF
(or quantile function) of the exponential distribution parametrised by λmin
to estimate expirations. The quantile function then provides time-to-lives
that have a probability of p of seeing a write before expiration:
F^(−1)(p, λmin) = −ln(1 − p) / λmin        (3.5)
Using the median inter-arrival time of writes (p = 0.5) then gives a straightforward ttl estimate for the result set of a query:
F^(−1)(0.5, λmin) = ln(2) / λmin        (3.6)
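Combining equations 3.4 and 3.5 yields a very small estimation routine, sketched below. Write rates and the returned ttl are assumed to share the same time unit; the handling of a rate of zero is a convention of this sketch:

    class TtlEstimator {
        // ttl estimate for a query result: sum the per-record write rates to
        // obtain lambda_min (eq. 3.4) and apply the exponential quantile
        // function (eq. 3.5).
        static double estimateTtl(double[] writeRates, double p) {
            double lambdaMin = 0.0;
            for (double lambda : writeRates) {
                lambdaMin += lambda;
            }
            if (lambdaMin == 0.0) {
                // no writes observed: cache "indefinitely"
                return Double.POSITIVE_INFINITY;
            }
            return -Math.log(1.0 - p) / lambdaMin;
        }
    }

Calling estimateTtl(rates, 0.5) then yields the median estimate ln(2)/λmin of equation 3.6.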
The problem with this approach is that it does not provide a good intuition
about the trade-off between cache hit rate and latency. It completely ignores
whether a workload mixture consists primarily of reads or if it is dominated
by writes. Fundamentally, service providers need to determine how many
potential cache hits they are willing to trade for one invalidation that carries
the risk of a stale read with it. The expected reduction in invalidations is
(1 − p) · writes: for p = 1, every write is expected to cause an invalidation,
for p = 0 (no caching), no object is invalidated.
If the model completely ignores reads, it might not be flexible enough to
deal with changing workloads. For instance, to instantly increase cache hits
and thus reduce database load, p could be increased to e.g. 0.75. Developers
could also specify a p for a given table or collection of documents by choosing
from predefined options. This is somewhat unsatisfying, as developers cannot
be realistically expected to be aware of the detailed tuning mechanisms in
the caching infrastructure. Another possible issue with this model is that it
performs differently depending on the chosen cache model. For an object-list, anticipating the next write on the result set is sensible as it invalidates
the whole result. For id-lists, the next write is only relevant if the changed
object does not match the query predicate any more. It is thus possible that
optimal quantiles differ for the different representations.
3.3.3 Dynamic Quantile Estimation
The goal of dynamic quantile estimation is to determine a p in the inverse
CDF that reflects workload mixture as well as the tolerance on eventual
consistency (higher consistency requirements lead to less cache hits). Instead
of comparing cache misses on records and using the miss-to-write ratio as
a proxy, one can directly estimate the workload mixture in the first step.
Later, this estimate can be used to adjust quantiles of the next expected
write. Again, one can argue that using the miss rate at the database is
not informative enough. Since the true workload is hidden behind caches,
the DBaaS cannot use the miss rate to infer if the workload even warrants
caching. There are multiple possible models for estimating the workload
mixture. First, the developer can specify the expected workload mixture.
Note that this is different from the write-only model, where it was suggested
the developer could directly choose a quantile. Providing a workload mixture
is much more intuitive, as the developer can be expected to know whether a
schema is primarily read or written.
Another option is based on the insight that some objects will not be cached
at all for various reasons. First, every cached object requires an entry in
the server-side expiring Bloom filter, thus increasing probability of a false
positive lookup, which in turn causes unnecessary invalidations. Second, the
limited cache size can force the DBaaS to mark some objects as uncachable.
This issue is related not only to the workload mixture, but also to the request
distribution. The workload mixture is the proportion of reads, writes and
queries, whereas the request distribution describes how often individual keys
are accessed by operations. In a typical Zipfian request distribution, some
objects will be written extremely frequently, even though the workload mixture is dominated by reads. Furthermore, the expected bounds on stale reads
depend on the latency distribution of the invalidation request: the longer it
takes for an invalidation to complete, the higher the cumulative probability
of a stale read. In summary, the overall workload mixture for a table can
be estimated by marking some objects as uncachable for various reasons and
then measuring their read/write mixture.
Finally, one could track other query metrics through CDN log analysis, as
proposed by Ozcan et al. [72, 73]. Query Shareness (QS) quantifies how
many clients request a certain query, which is also interesting to the object-list versus id-list decision, as a query that is shared by multiple users can
particularly benefit from a pre-warmed cache. Query frequency stability
(QFS) models the change of a query’s popularity over time.
After obtaining an estimate of the workload mixture, there are multiple options to map estimates to quantiles. Using offline optimisation, a provider
can obtain optimal values for typical workload mixtures. Quantiles can then
simply be looked up from a configuration file. It is however questionable if
such a model can reflect the nature of drastically changing workloads, e.g.
applications suddenly growing viral. Alternately, an online learning model
could use a budgeting approach. If there is a limited number of invalidations
the system can perform, quantiles can be adjusted according to the number
of invalidations performed. For instance, if the invalidation load is too high,
the probability of seeing a write within the time-to-live of a cached object
needs to be lowered.
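A minimal sketch of such a budgeting controller, assuming quantiles are adjusted once per observation period; the step size and bounds are illustrative rather than tuned values:

    // Budget-based quantile adjustment: if the invalidations performed in
    // the last period exceed the budget, lower p (fewer writes expected
    // within the ttl); if there is headroom, raise p to gain cache hits.
    class QuantileController {
        private double p = 0.5;           // cumulative write probability within ttl
        private final double step = 0.05; // adjustment per period (assumed)

        double adjust(long invalidationsLastPeriod, long invalidationBudget) {
            if (invalidationsLastPeriod > invalidationBudget) {
                p = Math.max(0.0, p - step);
            } else {
                p = Math.min(0.95, p + step);
            }
            return p;
        }
    }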
In summary, this chapter has introduced different query-caching execution
models that are based on record-level access frequencies. I have also discussed
strategies to invalidate result sets of queries, ttl estimation based on Poisson
processes as well as various practical limitations. The baseline of all these
considerations is a very long static ttl, which causes a maximum of cache
hits, invalidation cost and stale reads. In the next chapter, these insights
are combined into an online decision model that can achieve fine-grained
performance trade-offs.
Chapter 4
Online Learning
4.1 Introduction
In the previous chapter, a theory of different execution models and their
constraints was introduced. Specifically, trade-offs between execution models
and parameters of ttl estimation were discussed. However, these insights are
only actionable if the DBaaS has a decision model that can adapt to changing
request loads. In this chapter, I first describe the decision problem in a formal
framework and then derive a solution. Further, I introduce a generic method
to find optimal parametrisations through utility functions.
In order to construct a learning model, one first needs to consider what
information is available at what point in the decision process. The learning
process will also need to consider the granularity of decision making both for
the execution model as well as for ttl estimation. I begin by considering what
is available to the DBaaS. The DBaaS can monitor reads and writes and issues
invalidation requests after updates. However, the DBaaS knows neither the
exact status of the various caching layers nor the latencies a client sees
for specific requests. Next, the processing flow of a potential decision model
needs to be considered. The base case is a system that has not processed
any queries yet and all caches are empty. At some point, the server receives
an initial query. The challenge from the server’s perspective is that it does
not know anything about the result of this query yet but still has to make a
decision on how to execute the query.
As explained previously, the DBaaS can either order the database to execute
a covered query on the index that only returns ids or a full query that returns
all entries matching the query predicate. The query result is more informative than the query itself. Since the result contains the specific database
objects or at least their ids, any available metrics on these objects can be
used to improve future decisions. In principle, the model aims to improve
decision making by considering how the previous decision impacted system
performance, i.e. average response times for clients and load at the backend. In the following sections, I express this problem in a formal framework,
present my solution and reason about the issues related to the scalability and
performance of real-time learning.
4.2 Representation as an MDP

4.2.1 State and Action Spaces
Finding a closed-form solution might be impractical due to the complex and
stochastic nature of the problem. Many of the relevant variables like write
rates, workload mixture and response times can only be approximated at
the DBaaS. This lack of an analytical model suggests that reinforcement
learning could be a sensible approach. This requires the task to be framed as
a Markov decision process. The learning model is hence constructed by first
considering each component of an MDP with regard to the problem. I then
derive a model that I believe best captures the constraints of the problem.
For now, the decision not to cache an object is ignored and deferred to the ttl
estimation. This means the learning model only makes a decision between
execution models and the ttl estimation model can then estimate a ttl of 0
if it decides the object should not be cached. Clearly, the space of possible
actions in this simplified scenario is A = {object-list, id-list}. One could then
argue that queries should constitute the space of states, as the decision model
must map queries to actions. Each action would then lead to a new query
as the next state, i.e. Queries × A → Queries. There are multiple problems
with this representation. It is questionable whether using queries as states
even satisfies the Markov property since the effects of a decision taken in
a state do not only depend on that single query. An incoming query does
not capture all information relevant to the DBaaS. As I argued in chapter
3, various real-time metrics need to be taken into account. Further, an RL
agent assumes that its actions determine its next state. Even if there is a
probabilistic transition model that assumes a distribution of possible states
for a decision, this is not a valid assumption. Queries from different clients
are not in any causal relationship. Assuming that a decision on one query
leads to another query as a new state is thus not a useful intuition.
4.2.2 Decision Granularity
The observation that a decision model should depend on access patterns and
workload metrics leads to multiple insights into the model structure. First,
this suggests that states could be conceptualised as a set of load metrics
instead of single queries. This implies a large state space that cannot be
represented as a lookup table and must be approximated either through a
linear sum of weighted features or a non-linear approximator like a neural
network.
Second, using the global system performance as a state has consequences
on the granularity at which decision making is sensible. Measuring the impact of the decision on a single query on the system is infeasible. While the
model aims to make ttl estimates on the level of individual query result sets,
the execution model might be captured on the level of tables or document
collections. As shown in the examples in chapter 3, the choice of execution model should in part depend on how much query predicates overlap.
Consequently, a sensible model might assess access patterns on the level of
collections and use a single execution model for all queries on that collection
or for all parametrisations of prepared queries.
4.2.3 Reward Signals
Before discussing how to map states to actions in a formal manner, a reward
signal needs to be specified. A fundamental problem in online learning is
the definition of a good reward function. In data management, users and
providers are often interested in learning how to achieve very specific trade-offs on various performance metrics, which are then expressed through service
level agreements. Naturally, one can only achieve trade-offs on features that
are modelled into the reward function. For many examples in reinforcement
learning, this is a straightforward measure such as the score in a game or
making it to a certain height in the mountain car problem [74]. The difficulty
is then rather to learn an approximation of the cost for actions in continuous
state spaces from noisy and delayed rewards. For the decision model, the
structure of the reward signal itself is a challenge.
I begin by recapitulating features that are relevant to the execution model.
For a given query, the database returns a result comprised of objects or
keys (ignoring the trivial case of an empty result). In general, the goal is to
minimise invalidations on these keys, to maximise cache hits, and to minimise
overall query latency. However, only invalidations are directly visible to the
DBaaS through the server-side expiring Bloom filter. Nevertheless, cache misses
registered at the DBaaS might be used as a proxy for cache hits. Earlier,
I argued that cache misses cannot be used to infer the workload mixture.
Nevertheless, a learner can still extract a reward from just comparing the
total amount of cache misses in a given time period for different decisions.
Further, while request latency for clients is unknown, a learning model could
instead use the expected relative cost between execution models as a reward.
Using the id-list model, the relative latency cost is a factor of result set
cardinality and parallel connections.
Requesting all resources from a list of ids is more expensive by a factor of
card(result set) / connections.        (4.1)
The key point is that the model lacks an absolute notion of the quality of
an action. The exact number of invalidations or cache misses following a
sequence of decisions is not meaningful. While a certain absolute number of
invalidations can be seen as an indicator for uncachable objects, cache misses
are only meaningful when compared between execution models under the
assumption that the workload is constant during the period of observation.
It is also notable that the same metrics that I suggested to represent a state
are used in the reward signal. Specifically, the state comprises the overall load, whereas the reward consists of the specific metrics for keys that are
part of a query result. The structure of the reward suggests that the model
needs to continuously compare choices for the same queries to see which decision yields the higher relative reward. So far, this approach has not dealt
with the question of how to map the continuous state space to a binary set
of actions. While substantial research efforts have gone into approximation of
continuous state and action spaces [75, 51, 52], it is questionable whether this
effort is necessary here. If reward features are a subset of state features and
states need to be mapped to actions according to relative rewards, the model
can simply represent its policy as a probability distribution over actions to
sample from, as actions need to be constantly compared for relative rewards.
In the following section, I hence combine the previous observations into a
model that directly updates the belief state about the optimal distribution
of actions.
4.3 Belief State Approximation
I propose the following model: each collection or table begins with a prior on
execution models, e.g. without further assumptions one might use a uniform
prior with p(object-list) = 0.5 and p(id-list) = 0.5. A learning period is
defined by the number of samples n that the model collects before updating
its belief state. Further, the model is parametrised through the interval
length of the moving window at which writes, invalidations and cache misses
can be tracked. Every time the DBaaS receives a query, a decision on the
execution model is made by drawing from the distribution, e.g. the model
optimises on the overall distribution for a collection of entries. One could
also imagine the model learning a distribution for all parametrisations of
a prepared query, e.g. a query that always requests the same content but
allows for user-defined filters. After query execution, a reward r on a list of
k result ids id1, . . . , idk for a sample is computed through

r = (1/c) · Σ_{j=1}^{k} ( ω1 / invalidations(idj) + ω2 / cache_misses(idj) ),        (4.2)
with c being the relative cost of execution
c = k / connections  if id-list;   c = ω3  if object-list,        (4.3)
and invalidations and cache misses representing their approximated frequencies. An inverse sum is used because the goal is to minimise these values.
Scalable sampling methods to approximate these frequencies will be discussed
at the end of this chapter. The reward also needs to include weights ω1 , ω2 , ω3
to be able to express a preference towards lowering invalidations, cache misses
or response times at the client. For instance, increasing ω3 would increase the
reward for using object-lists, thus generally lowering client latency. At the
end of a learning period (n samples and rewards), the belief state is updated
batch-wise. First, the model computes the normalised total reward for each
execution model by averaging over the number of samples out of n for which
the decision object-list (n1 ) or id-list (n2 ) was made:
r_object-list = (1/n1) · Σ_{i=1}^{n1} ri(object-list),    r_id-list = (1/n2) · Σ_{i=1}^{n2} ri(id-list)        (4.4)
Finally, the current belief state is batch-updated through
p(object-list)_{t+1} = p(object-list)_t + αt · ( r_object-list − r_id-list ) / ( r_object-list + r_id-list )        (4.5)
and
p(id-list)_{t+1} = 1 − p(object-list)_{t+1},        (4.6)
where αt ∈ [0, 1] is the learning rate at time point t. Again, the reason
updates are performed through batch-wise comparison is that rewards on
single queries are deemed to be too noisy. Intuitively, the model simply
samples rewards for decisions, compares rewards and shifts its belief state
according to the difference in rewards in the observation period normalised
by the total reward obtained. The learning rate can either be held constant
or tuned proactively. An apparent disadvantage of the model is that, as
it converges towards one execution model, fewer and fewer samples will be
drawn from the model that is deemed to be less relevant. Hence, special
consideration needs to be taken with regard to convergence strategies.
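To make this concrete, the following sketch implements the reward of equations 4.2 and 4.3 together with the batch update of equations 4.4 to 4.6. Clamping the probability into [0, 1], skipping periods without reward signal and requiring strictly positive (smoothed) frequencies are simplifying assumptions of this sketch:

    class BeliefState {
        private double pObjectList = 0.5; // uniform prior

        // invalidationFreq/cacheMissFreq: approximated frequencies per result
        // id, assumed strictly positive (smoothed to avoid division by zero).
        static double reward(double[] invalidationFreq, double[] cacheMissFreq,
                             boolean idList, int connections,
                             double w1, double w2, double w3) {
            int k = invalidationFreq.length;
            double c = idList ? (double) k / connections : w3; // eq. 4.3
            double sum = 0.0;
            for (int j = 0; j < k; j++) { // inverse sums (eq. 4.2):
                sum += w1 / invalidationFreq[j] + w2 / cacheMissFreq[j];
            }
            return sum / c;
        }

        // Batch update at the end of a learning period of n = n1 + n2 samples.
        void update(double[] rewardsObjectList, double[] rewardsIdList, double alpha) {
            double rObj = mean(rewardsObjectList); // eq. 4.4
            double rId = mean(rewardsIdList);
            if (rObj + rId == 0.0) {
                return; // no reward signal in this period
            }
            pObjectList += alpha * (rObj - rId) / (rObj + rId); // eq. 4.5
            pObjectList = Math.min(1.0, Math.max(0.0, pObjectList));
        }

        double pObjectList() { return pObjectList; } // eq. 4.6: p(id-list) = 1 - this

        private static double mean(double[] values) {
            if (values.length == 0) return 0.0;
            double sum = 0.0;
            for (double v : values) sum += v;
            return sum / values.length;
        }
    }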
4.3.1 Convergence and Exploration
There are multiple convergence scenarios. First, the model could converge
to a mixture that does not put a clear preference on one execution model,
which could also be caused by an unfavourable parametrisation that leads
to too much noise or too little data, e.g. an observation window for reward
measurements is too short. This could be defined as a case where
0.4 ≤ p(object-list) ≤ 0.6 and thus also 0.4 ≤ p(id-list) ≤ 0.6. This is an
unfavourable outcome, as it implies that random decisions on a uniform prior
are sufficient (hence no learning necessary). If the model converges strongly
towards one model, e.g. a probability of 90% or more for a single decision, it
might not be able to adapt to changing workloads later. A typical solution
to this problem is to introduce a small probability ε with which a non-dominant
action is taken, a so-called epsilon-greedy approach [76]. The model greedily
chooses the presumed best action with a probability of 1 − ε and otherwise
a random action. This can be practically achieved by bounding probabilities
for one decision by (1 − ε). In the experimental evaluation, I will demonstrate
how this enables the model to detect and adapt to changing workloads.
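A sketch of the corresponding decision step: the execution model is drawn from the belief state, with the probability bounded to [ε, 1 − ε] so the non-dominant action keeps being sampled:

    import java.util.Random;

    // Epsilon-bounded action sampling: neither execution model's probability
    // ever leaves [epsilon, 1 - epsilon], so changing workloads remain
    // detectable after convergence.
    class ActionSampler {
        private final Random random = new Random();
        private final double epsilon;

        ActionSampler(double epsilon) {
            this.epsilon = epsilon;
        }

        // true => object-list, false => id-list
        boolean sample(double pObjectList) {
            double bounded = Math.min(1.0 - epsilon, Math.max(epsilon, pObjectList));
            return random.nextDouble() < bounded;
        }
    }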
4.3.2 Sampling Techniques
Various methods exist to approximate streams of incoming data [77]. Good
examples for approximations are cache miss and invalidation frequencies
through a moving window of arrival times. However, more sophisticated
methods exist: (biased) reservoir sampling can be used to summarise streams
by keeping a fixed-size reservoir of representative values and updating the
reservoir through a bias function [78, 79]. Initially, all incoming values are
used to fill the reservoir. A bias function (often exponential) f (r, t) is then
used to define a relative probability of an r-th point still belonging to the
reservoir at the arrival of a later arriving t-th point. Alternatively, one can
simply replace elements in the reservoir with a certain rate uniformly at random. The advantage of the reservoir sampling approach is that it does not
completely ignore values after a certain period (like a moving window) [80].
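As a sketch, the uniform variant (often called Algorithm R) looks as follows; a biased variant would replace the uniform acceptance probability with the bias function f(r, t):

    import java.util.Random;

    // Uniform reservoir sampling: keeps a fixed-size, uniformly
    // representative sample of a stream, e.g. of interarrival times.
    class Reservoir {
        private final double[] reservoir;
        private long seen = 0;
        private final Random random = new Random();

        Reservoir(int size) {
            this.reservoir = new double[size];
        }

        void offer(double value) {
            seen++;
            if (seen <= reservoir.length) {
                reservoir[(int) (seen - 1)] = value; // fill phase
            } else {
                long j = (long) (random.nextDouble() * seen); // uniform in [0, seen)
                if (j < reservoir.length) {
                    reservoir[(int) j] = value; // replace with probability size/seen
                }
            }
        }
    }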
Another aspect of sampling and approximation is extrapolation. For a large
database, it is infeasible to hold moving windows for all database records in
memory. Instead, one should expect to extrapolate from a set of representative records. I expect the learning model to be computationally inexpensive,
as it primarily consists of in-memory summations. From a practical perspective, this makes the model very favourable, as many sophisticated prediction
techniques require costly matrix factorisations that are problematic for scalable realtime learning. For instance, inference on Gaussian processes runs
with O(n3 ) runtime and O(n2 ) space complexity [81]. Sparse matrix approximation techniques exist, but are rather targeted at offline processing of large
datasets and do not operate on a timescale of milliseconds [82, 83].
4.3.3 Hyperparameter Optimisation
At various points in this work, I have pointed towards trade-offs in consistency,
latency, server load and cache efficiency. For instance, the reward function
is parametrised through weights ω1, ω2, ω3 that characterise a preference
between cache misses, invalidations and response times. It is however not straightforward
to define parameters that express a specific performance level. For instance,
a DBaaS provider might desire to analyse the required performance at various components to achieve a certain average latency for a specific caching
topology. This section briefly explains how to optimise the parameters of the
learning model. First, one needs to define a global utility of an instance of
the Monte Carlo simulation. An instance means running a certain workload
with specific request and latency distributions and monitoring all performance measures of interest. The global utility u of an instance is a linear
combination of n utility functions f that map concrete values to a normalised
utility
u = Σ_{i=1}^{n} ωi · fi.        (4.7)
For illustration, consider a possible utility function for average query latency
at the client, as seen in figure 4.1. Here, an average latency below 50 milliseconds is desired. Latencies of about 100 milliseconds are already considered
to be of much less utility and latencies close to 200 milliseconds are of no
value, e.g. due to a service level agreement.
Figure 4.1: Utility function example for response times.
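As a sketch, the utility function of figure 4.1 can be approximated piecewise-linearly and combined into the global utility of equation 4.7. The knots at 50 ms and 200 ms follow the example above; the linear interpolation between them is an assumption of this sketch:

    class Utility {
        // Piecewise-linear stand-in for the curve in figure 4.1: full utility
        // up to 50 ms, no utility from 200 ms, linear in between.
        static double latencyUtility(double avgResponseMs) {
            if (avgResponseMs <= 50) return 1.0;
            if (avgResponseMs >= 200) return 0.0;
            return 1.0 - (avgResponseMs - 50) / 150.0;
        }

        // Global utility (eq. 4.7): weighted sum of normalised utilities.
        static double globalUtility(double[] weights, double[] utilities) {
            double u = 0.0;
            for (int i = 0; i < weights.length; i++) {
                u += weights[i] * utilities[i];
            }
            return u;
        }
    }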
In general, a configuration can then be found according to the following steps:
(1) Definition of a hyperparameter space, e.g. parameters ω1 , ω2 ∈ (0, 1)
define a two-dimensional grid.
(2) Definition of a linear combination of utility functions on the performance metrics of interest, thus mapping concrete desired values to a
normalised score.
(3) By repeatedly drawing samples from the hyperparameter space, a local
optimum is determined.
The key insight is that this method does not draw at random or by using
a stochastic gradient descent, which would only take into account local improvements. Traditional approaches include random search, grid search and
manual search of optimal parameters [84, 85]. However, these approaches
can be inefficient if the reward function is expensive to evaluate. In contrast, Bayesian approaches using Gaussian processes construct a probabilistic model of the reward function and make educated estimates on where in
the parameter space to next evaluate the function. This is done by utilising
all available information from previous evaluations instead of just making a
local estimate [86]. I use the Spearmint framework described by Snoek et al.
to perform Monte Carlo optimisation with Gaussian processes [87, 88].
Chapter 5
Evaluation
5.1 Aims
This chapter describes the implementation, the experimental set-up and the
experimental evaluation. First, however, the evaluation goals need to be
defined. The experiments should
(1) confirm the trade-offs of the different execution models suggested by
my theory,
(2) investigate the relationship between the stochastic ttl estimation model
and consistency,
(3) demonstrate that the estimation method is superior to a static model
with regard to invalidation load, and
(4) validate the learning model as a method to achieve the desired trade-offs.
It is also necessary to understand the baseline of the evaluation. Section
5.3 compares cache hit rates and response times between different execution
models. DBaaS providers usually do not cache their dynamic content because of consistency and invalidation issues. An appropriate baseline is thus
a DBaaS that does not cache its dynamic content. Section 5.4 investigates
consistency and invalidation load for the proposed model. Specifically, it is illustrated how naive caching techniques for dynamic content are bottlenecked
by invalidation cost (and hence not used in practice). Finally, section 5.5
analyses the performance of the online learning scheme.
5.2 Simulation Framework

5.2.1 Design and Implementation
All experiments were carried out in a Java 8 simulation framework. I chose
Java for its concurrency utilities and for the availability of some required
libraries. The implementation is based on previous work on dynamic caching
as well as the Yahoo Cloud Serving Benchmark (YCSB) [89, 6]. YCSB is an
established framework to benchmark cloud databases by providing a set of
typical workloads and a common interface for standard database operations.
It thus enables a comparison of database performance. In order to compare
two databases, users can provision a certain computing power (often Amazon EC2 instances [90]) and then deploy the benchmark by implementing
the interface and specifying a workload. A workload is defined as a set of
parameters like read rate and write rate (e.g. 50/50), a request-distribution,
a desired throughput, the number of objects in the database, the number
of fields per entry and the length of these fields as well as the number of
concurrent clients.
In previous work, YCSB was extended to analyse caching behaviour of individual database entries [6]. In particular, no actual database was used in
my previous work with Gessert et al. Instead, a database was simulated as
a hash table. In this work, I abandoned the YCSB framework in favour of a
dedicated query simulation framework. I reused and extended some classes,
namely modules for treating individual resources. Specifically, I adapted
and modified the simulated cache class, the staleness detection mechanism
through time stamping, the moving window mechanism used to collect frequencies, as well as the expiring Bloom filter. In the code, I have commented
each individual class to indicate whether it was reused, modified from existing code
or completely independent. The main difference of my framework compared
to our previous work is the ability to generate, execute and evaluate queries.
Queries were constructed by drawing projections from specified ranges (e.g.
field1 > 10) and then parsed and executed on a MongoDB server. The
advantage of using MongoDB is that users do not have to specify a schema.
Thus, the benchmark can simply insert and overwrite documents with arbitrary specified contents instead of having to declare typed attributes. In
further benchmarks, one could also set up a schema to test other database
paradigms (e.g. relational), for instance by following the specifications of the established TPC-C benchmark [91].
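For illustration, such a generated predicate could be constructed and executed with the MongoDB Java driver roughly as follows. The connection string, database and collection names are placeholders, and the sketch uses the current synchronous driver API rather than the driver version used in the simulation:

    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    public class QueryExample {
        public static void main(String[] args) {
            MongoCollection<Document> collection =
                    MongoClients.create("mongodb://localhost:27017")
                            .getDatabase("benchmark")     // placeholder names
                            .getCollection("usertable");

            // Generated range predicate, e.g. field1 > 10.
            Document predicate = new Document("field1", new Document("$gt", 10));

            // Object-list execution: return the full matching documents.
            for (Document doc : collection.find(predicate)) {
                System.out.println(doc.toJson());
            }

            // Id-list execution: return only the ids (a covered query if
            // the predicate is indexed).
            for (Document doc : collection.find(predicate)
                                          .projection(new Document("_id", 1))) {
                System.out.println(doc.get("_id"));
            }
        }
    }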
Figure 5.1 provides an overview of my implementation. The main components are clients (each associated with a thread generating requests), a cache
instance and the DBaaS endpoint that is managing database access, ttl estimation, invalidations, and learning. After specifying the workload parameters, the simulation populates MongoDB and ensures indices. All MongoDB requests are executed with the write concern “acknowledged”. Write
concerns are guarantees on consistency after updates which directly affect
performance. For instance, “acknowledged” as the default concern means
that changes have been applied to the in-memory view of the data. Clients
then continuously generate requests that are routed to the CDN edge server,
which forwards them to the DBaaS. The DBaaS server parses requests and
executes queries on MongoDB while consulting the learning module for decisions on execution models and the ttl estimator for expirations. On every
update, the query matching engine is consulted to decide which query results
have to be invalidated. The specific control flow of query caching and the
decision model have been extensively covered in chapters 3 and 4. A detailed
explanation of the individual modules can be found on the project website
[92].
Figure 5.1: Overview of the simulation architecture.
MongoDB
5.2.2 Benchmark Configuration
All experiments were carried out on a machine with 16 GB RAM and a
quad-core i5 CPU (2.8 GHz). Further, normal distributions were used for
the latencies between client and cache, cache and DBaaS (using known Amazon EC2 region latencies) and for invalidation latency (using data from the
Fastly CDN [64]). It should be noted that a distributed benchmark was
not performed. A requirement of this project was a cost-neutral evaluation.
While Amazon Web Services provides free micro tier instances to students,
these are not very useful here. Matching updates to query results is a computationally expensive task. Executing the benchmark on micro tier instances
would skew the results.
5.3 Comparing Execution Models

5.3.1 Read-Dominant Workload
The evaluation begins by comparing the object-list and the id-list model for
typical workloads. The model suggests that caching whole query results leads
to much lower latency due to fewer latency round-trips. In turn, I expect
higher cache hit rates when caching results as id-lists because intersecting
query predicates benefit from sharing cached entries. First, I examine a
typical read-heavy workload that consists of 95% reads and queries and 5%
updates on a Zipfian access distribution, e.g. photo tagging. To clarify, a
read is equivalent to a GET request on a single resource identified by its key,
whereas a query consists of at least one projection and requires evaluation by
the database’s query engine. The workload initially inserts 1000 documents
with 10 fields of random data each and then performs 100,000 requests issued by
10 parallel threads (one connection each), beginning with a mixture of 40%
reads, 55% queries and 5% updates to demonstrate the principal difference
in execution models.
Figure 5.2: Cache hit rates as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes.
Figure 5.3: Average query response times as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes.
Figures 5.2 and 5.3 show how the performance of execution models relates
to average query selectivity. Further, response times for a DBaaS without
dynamic query caching are shown. A query selectivity of 1 means that the
query predicate matches all objects in the database and a query selectivity of
0.1 means that the predicate matches 10% of all keys, i.e. selectivity indicates
how much result sets of different queries intersect. First, the result matches
the expectations as the object-list execution model achieves better response
times than the id-list model but has a worse cache hit rate. For an uncached
DBaaS, every request requires a full round-trip to the backend, resulting in
noticeable response times for the client particularly if the application requires
more than one round-trip.
There are two artifacts in figure 5.3 worth discussing. For a query selectivity
of 1, i.e. the query predicate matching all documents in a collection, both
models have slow response times. This is due to collapsed forwarding in the
cache. If many clients request the same content from a cache edge server, the
cache will collapse the requests to a single database query, thus blocking multiple clients. This decrease in request parallelism causes longer response times
for clients. Additionally, write locks also block incoming reads. Further, response times for very selective queries (average selectivity of 0.0001) are very
similar because the predicate matches only one object in the simulation (or
none). This means that most queries will be uncached and thus require a full
round-trip to the DBaaS. Note that a round-trip between Europe and USA
EC2 regions is around 170 milliseconds [6], matching the result. Cache hit
rates on resources are still drastically different, because in the id-list model,
normal reads still pre-warm the cache for query results. The reason it still
takes a full round-trip is that the client first needs to fetch the id-list.
The results shown in this section were averaged over 5 runs. Considering the
probabilistic nature of the experiments, the simulation is very consistent.
For the plot in figure 5.2, the average cache hit rate for the id-list model
for an average query selectivity of 1 is 98.73% with a standard deviation
(sd) of 0.00015 and 95% confidence intervals (CI) of (0.9871, 0.9875). The
average cache hit rate of the object-list model is 74.34% (sd = 0.0035, CI =
(0.7190, 0.7278)). Similarly, the average response time for the id-list model is
158.88 ms (sd = 0.7 ms, CI = (158.01, 159.74)). For the object-list model, an
average of 65.75 ms (sd = 0.55 ms, CI = (65.07, 66.44)) is observed. In summary, errors were negligible: the simulation converges to the desired target
distributions after a few thousand requests. Since each workload executes
100,000 requests, small fluctuations (e.g. garbage collection) are averaged
out. Errors are hence omitted in further experiments.
The same experiment was repeated for a workload with a uniform access
distribution on the keys, as seen in figures 5.4 and 5.5. Notably, there is no
spike in latency at a selectivity of 1, as was observed in figure 5.3. Since
individual reads are now uniformly distributed over the key space, there is
less lock contention due to writes and thus more query parallelism at the
database. However, one can observe the same general trend and I will thus
in the remainder of the experiments use a Zipfian access distribution, which
is a more typical case [93, 94].
Figure 5.4: Cache hit rates as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes under a uniform access distribution.
In the experiments above, all expirations are estimated by only tracking
incoming writes, as described in chapter 3. The cumulative probability of
a write within the expiration is adjusted to p = 0.75 to enable high cache
hit rates on a read-heavy workload.

Figure 5.5: Average query response times as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes under a uniform access distribution.

While the impact of write-quantiles on
invalidation load and eventual consistency will be analysed in the upcoming
sections, the execution models will first be compared under another workload.
5.3.2 Write-Dominant Workload
Figures 5.6 and 5.7 show the same metrics for a write-heavy workload that
consists of 50% writes and 25% queries and reads each.
Figure 5.6 shows an overall trade-off between execution models similar to the
read-heavy workload. However, there is a clear difference in cache hit rates,
as the object-list model provides very weak cache performance. This was
not the case in the read-heavy workload, where cache hit rates began high
but degraded with increasing selectivity. In turn, average response times are
rather high (above 400 milliseconds) for the id-list model. Notably, they are
even worse than not caching at all. Since objects are written very frequently,
clients usually do not get a cache hit on the id-list in the CDN in the first step.
Figure 5.6: Average query response times as a function of average query selectivity on a mixture of 25% reads, 25% queries and 50% writes. Estimation quantiles have been adjusted to p = 0.4 to account for the write-dominant workload.
Figure 5.7: Cache hit rates as a function of average query selectivity on a mixture of 25% reads, 25% queries and 50% writes.
After retrieving the id-list, clients have to iterate over the individual resources, which also might not be available in the cache. Hence, in this case
it is more economical to use the object-list model. This experiment has illustrated how demanding write-dominant workloads are for the DBaaS. In
order to maintain reasonable latencies at clients, one has to accept low cache
efficiency. In the following section, I investigate how ttl estimations affect
invalidation load and client consistency.
5.4 Consistency and Invalidations

5.4.1 Adjusting Quantiles
This section deals with the effect of write quantiles on client consistency and
cache hit rates. Specifically, the experiments should quantify how invalidation load and stale reads are connected to the cumulative write probability
within a cache expiration duration. I again consider the write-dominant
workload that causes expensive trade-offs on cache efficiency for acceptable
client latency. The first experiment investigates staleness using the data on
invalidation latencies provided by the Fastly CDN [64]. Figure 5.8 shows
the absolute number of stale reads. As expected, stale reads increase with
increasing quantiles because every write on a still cached object triggers the
possibility of a stale read, depending on how fast the invalidation is executed.
The highest number of stale reads observed accounted for 1% of all reads
and queries (500 out of 50,000), depending on the execution model (see
appendix B for the impact of invalidation latency on stale reads). For workload-adjusted quantiles (i.e. lower quantiles on write-dominant workloads) the
average number of stale reads is about 0.1%, which seems acceptable for
most applications without strong transactional semantics. Figure 5.9 shows
how cache hit rates depend on the quantile of the next expected write on
the same workload. As noted above, the object-list model provides weak
cache performance on a write-dominant workload.
Figure 5.8: Absolute number of stale reads on the write-dominant workload as a function of the quantile of the next expected write.
Specifically, most cache hits in the object-list model in this scenario stem
from GET requests on individual resources, not from cached queries.
Figure 5.9: Cache hit rates on the write-dominant workload as a function of the quantile of the next expected write.
In these experiments, all objects were cachable, leading to an invalidation
load of 45,000 to 50,000, i.e. almost every write leading to an invalidation on
a Zipfian access distribution. This illustrates the necessity of marking certain
objects uncachable, as they will otherwise bottleneck the query matching
engine. The following section investigates which trade-offs can be achieved
with regard to invalidation load.
5.4.2 Reducing Invalidation Load
As discussed in chapter 3, the DBaaS needs to be able to reduce invalidation
load depending on the achievable throughput of matching updates to query
results. I have thus implemented the proposed model of marking certain
objects uncachable based on the insight that some objects might be updated
so frequently that they cannot reasonably be cached (e.g. they would be stale
by the time they have arrived at the CDN). Figure 5.10 shows a comparison
of invalidation loads when using this approach to the previous approach
of caching all objects depending on their write frequency and the chosen
quantile. For comparison, I also show a static caching method that caches
all objects with the same expiration. First, one can note that for a write-dominant workload, the cache hit rate is capped at 86.6%. Second, marking
certain objects as uncachable results in an average cache hit rate of 72.6%
(excluding quantile 0, which means no caching). The interesting question is
now how this relates to invalidation load. Figure 5.11 compares invalidation
loads from the same experiments.
Notably, there is a drastic decrease in invalidation load by dynamically marking objects as uncachable. Average invalidation load is reduced by about
50%, while only giving up 14 percentage points of cache hit rate (response times did not differ
significantly). The same effect can be observed for the object-list model.
Figure 5.10: Cache hit rates on the write-dominant workload as a function of the quantile of the next expected write, compared to a static caching method.
Figure 5.11: Invalidation loads for using the naive id-list approach versus dynamically marking frequently written objects as uncachable.
5.5 Online Learning

5.5.1 Learning Decisions
In the previous sections, I have demonstrated the trade-offs related to execution models and their parametrisations. I begin the evaluation of the learning
model by applying the decision model to the read-dominant workload from
above while also using the optimisation of marking some objects uncachable.
With only 5% writes, the primary focus is not on limiting invalidation load but
rather on client latency.
Query selectivity    Learner    Random guess    Object-list    Id-list
1                    211.2      348.2           188.9          552.7
0.1                  151.5      195.5           142.3          251.1
0.01                 107        124.4           104.5          167
0.001                130.5      135.9           129.1          147.8
0.0001               146.8      147             148.7          148.6

Table 5.1: Average overall request response times (ms) for the learning model compared to random guessing and static decisions on a read-dominant workload.
Query selectivity    Learner    Random guess    Object-list    Id-list
1                    0.3        0.61422         0.22           0.87
0.1                  0.42       0.7115          0.28           0.889
0.01                 0.6        0.74            0.46           0.93
0.001                0.37       0.441           0.29           0.72
0.0001               0.19       0.21            0.18           0.46

Table 5.2: Cache hit rates for the learning model compared to random guessing and static decisions on a read-dominant workload.
Table 5.1 compares average request response times of the learning model
with a uniform random guess and static decisions. The differences
in response times between learner and random guessing are small because a
random mixture already provides relatively low latencies, as id-cached results
pre-warm the cache for individual reads. By having a bias towards low latencies, the learning model has traded in cache efficiency (as seen in table 5.2).
One can also see that the learner converges towards the performance of the
object-list model. For the evaluation of the learning model, the comparison
to a static decision is not as useful because it is already known that either
object-list or id-list is optimal depending on the desired trade-offs. Having
established that the model can converge to the performance of a static model,
the question is thus rather whether its decisions are better than random decisions, which I will focus on in the following (hence omitting static decisions,
as I have covered them extensively above).
For a more detailed analysis, isolated query response times of learning and
guessing can be considered, i.e. ignoring response times for individual GET
requests, as seen in table 5.3. Previously, I already suggested that the extreme ends of selectivity can largely be ignored because of the lack of parallelism
at a selectivity of 1 and the absence of a difference between decisions for highly selective
queries. Instead, one might consider rather typical cases, e.g. for an average
query selectivity of 1% the average query response time could be reduced
from 104.8 to 67.4 milliseconds (35.6% decrease).
Query selectivity    Belief state approximation    Random guess
1                    295.7                         528.6
0.1                  176.8                         229.8
0.01                 67.4                          104.8
0.001                120.4                         139.5
0.0001               171.8                         173.7

Table 5.3: Average query response times (ms) for the learning model compared to random guessing on the execution model on the read-dominant workload.
I have repeated the same experiment for the write-dominant workload. This
time, learning was focussed on invalidation load and response times, as previous experiments have already shown that high cache performance is not
possible without very high latencies. Again, latency can be drastically reduced while maintaining approximately the same invalidation levels, as seen
in tables 5.4 and 5.5. This is achieved by trading in cache performance. In this particular experiment, the average cache hit rate of the learner drops to 6% from
21% for random guessing. Both cache hit rates are very low, as hotspot objects are marked uncachable to reduce invalidation loads.

Query selectivity    Belief state approximation    Random guess
1                    230.3                         582
0.1                  215.9                         422.8
0.01                 189.6                         260.6
0.001                168.3                         187.4
0.0001               168.5                         171

Table 5.4: Average request response times (ms) for the learning model compared to random guessing on the execution model on the write-dominant workload.

In principle,
these experiments have established that the learning model can indeed learn
towards certain metrics. In the following sections, I will analyse the quality
of these trade-offs and convergence properties in more detail.
Query selectivity    Belief state approximation    Random guess
1                    28142                         24725
0.1                  26493                         26988
0.01                 18919                         25028
0.001                18184                         16768
0.0001               12066                         11650

Table 5.5: Invalidation loads for the learning model compared to random guessing on the execution model on the write-dominant workload.
5.5.2 Evaluating Trade-offs
The tables above show that the learning model can achieve improvements on
various metrics. However, it is hard to quantify the quality of a trade-off by
simply comparing for instance cache hit rates and invalidations, as was done
above. To this end, the approach of defining a linear combination of utilities
from chapter 4 is used. By defining utility functions for response times,
invalidations and cache hit rates, the global system utility of a workload
instance is defined. Consequently, it can be assessed if and how the learning
model increases utility over time.
I use the utility function from section 4.3.3 for latency and a linear function of
invalidation utility, i.e. u(invalidation) = 1 − invalidations/writes. Further,
the cache hit rate itself can be used as a utility function because it is already
normalised. Figure 5.12 shows how system utility changes over the number of operations performed during a benchmark instance (read-dominant
workload).
After 2,000 operations, there is an initially high utility for random decisions
and a rather low utility for learning. I attribute this to warmup effects. As
initial response times are longer, less utility comes from latency. Within a
few thousand operations, the learning model (learning rate α = 0.1) achieves
much higher utility and the utility of random guessing degrades.
Figure 5.12: Global utility as a function of operations performed.
The key point here is that there is a formal framework of mapping concrete
values of metrics (e.g. latency of 100 milliseconds) to a normalised score. The
learning model is consequently able to learn trade-offs that optimise towards
whatever preference is expressed by the service provider.
5.5.3 Convergence and Stability
Finally, the evaluation needs to consider situations when the learning model
has converged and is then confronted with a changing workload. To this
end, I start with the write-dominant workload for 100, 000 operations and
then introduce a higher proportion of reads (mixture of 40% reads, 20%
writes, 40%). Figure 5.13 demonstrates the associated behaviour. Ostensibly,
the model quickly achieves a higher utility which then steadily increases.
However, one can also note that random guessing seems to achieve higher
utility over time. This can be explained through higher cache utilisation. As
the cache fills up again after a change in request load, latency and cache hit
rate improve and total utility increases for all models.
In summary, the evaluation has validated the caching scheme. Using dynamic query caching and the learning scheme, average response times for
clients can be lowered to an imperceptible range (below 100 ms). Further,
I have investigated consistency and invalidation load as manageable practical constraints. Finally, the experiments have demonstrated how an online
learning model can achieve these trade-offs dynamically through a method
motivated by reinforcement learning.
Figure 5.13: Behaviour of learning model versus random guessing under a change of workload mixture.
Chapter 6
Outlook and Conclusion
6.1 Summary and Conclusion
In this project, I have identified remote access latency as a key performance
problem in interactive applications. What is more, I have pointed out the
constraints of naive caching schemes with regard to consistency and invalidation load. Considering these limitations, I have introduced a caching scheme
which I believe can achieve low latency for the client while maintaining tunable invalidation load and consistency. The first component of the caching
mechanism is based on the idea that different representations and execution
models can be used for varying workloads. The second contribution is an
online learning model that uses various approximations to make decisions
based upon these representations. Monte Carlo simulations of typical workloads demonstrated various trade-offs between client performance, cache efficiency and server load.
To the best of my knowledge, this study has introduced the first model for caching highly dynamic query content. In principle, any REST-ful web service can implement the proposed architecture, thereby achieving drastically improved response times for clients without incurring too great an invalidation load at the backend. On a more general note, this project has provided
an example of how the intersection of distributed systems, databases and
machine learning enables more flexible and adaptive infrastructures.
6.2 Future Work
6.2.1 Parsing Query Predicates
This work did not extensively cover query semantics during invalidation. I suggested comparing before and after images of documents affected by an update. In future work, query predicates could be parsed by a schema-aware middleware, enabling more efficient invalidation mechanisms. For instance, on a numeric predicate, deciding whether an invalidation is necessary is a simple range comparison between the update value and the predicate range (see the sketch below). On a similar note, knowledge about the schema would also enable mixed decisions on cache representations. A typical use case is a schema containing a counter, an essential data type for today's application economy (counting impressions, click-streams). A query would usually select the counter to display its value. A sensible model might cache counters as id-lists and other parts of a result as an object-list, since updates on the counter would always invalidate the whole object-list.
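The following Python sketch illustrates the range comparison under stated assumptions: the middleware knows which numeric field a cached query filters on, and it sees both the before and after image of the updated document. All names are hypothetical.

def needs_invalidation(old_value, new_value, low, high):
    # A cached query with predicate low <= field <= high must be
    # invalidated if the update moves the document into the range
    # (it may now match) or if the old value was inside the range
    # (the document may have dropped out of the result).
    was_in_range = old_value is not None and low <= old_value <= high
    now_in_range = low <= new_value <= high
    return was_in_range or now_in_range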
6.2.2 Unified Learning Model
In the learning model, estimating expirations and making decisions on the execution models were treated as two distinct tasks. This is because the model lacked a good function approximation mapping the state space of load metrics to a pair of expiration time and execution-model decision. In future work, a unified reinforcement learning model that supports more proactive decisions at the server could be explored. Instead of just reacting to individual queries, an advanced model could maintain lists of cachable and uncachable objects. It could then independently decide to push objects to and remove them from caches. This prefetching of data to edge servers is particularly relevant for initial load times. Further, there are various other decisions and tunable runtime parameters related to latency. In particular, my model examines eventual consistency in the context of stale reads from the cache. Equally, performance could be tuned by adjusting write concerns at the database cluster itself. That is to say, a tenant might have a default setting of blocking request responses until an update has been persisted to all replicas; this could be relaxed during flash crowds, as in the sketch below.
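As a hedged illustration of what such tuning could look like, the following sketch uses the PyMongo driver (an assumption for illustration; the thesis backend uses MongoDB [65], but this is not the project's own code) to switch between a strict and a relaxed write concern at runtime. Host and collection names are hypothetical.

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")  # hypothetical host
collection = client.appdb.objects                  # hypothetical names

# Strict default: block until a majority of the replica set has
# acknowledged the update.
strict = collection.with_options(write_concern=WriteConcern(w="majority"))

# Relaxed setting for flash crowds: acknowledge after the primary only.
relaxed = collection.with_options(write_concern=WriteConcern(w=1))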
Appendix A
Proofs
A.1 Minimum of Exponential Random Variables
The following theorem is straightforward, but I have not been able to locate a proof in print [95].
Theorem A.1.1. Let $X_1, \dots, X_n$ be mutually independent exponentially distributed random variables with rate parameters $\lambda_1, \dots, \lambda_n$. Then the minimum is again exponentially distributed:
$$\min\{X_1, \dots, X_n\} \sim \text{Exponential}\!\left(\sum_{i=1}^{n} \lambda_i\right).$$
Proof. Each $X_i$ has the cumulative distribution function
$$F(x; \lambda_i) = 1 - \exp(-\lambda_i x) \quad \text{for } x \ge 0.$$
The random variable $X_{\min} = \min\{X_1, \dots, X_n\}$ has the CDF
$$\begin{aligned}
F(x; \lambda_{\min}) &= P(X_{\min} \le x) \\
&= 1 - P(\min\{X_1, \dots, X_n\} > x) \\
&= 1 - P(X_1 > x, \dots, X_n > x) \\
&= 1 - \prod_{i=1}^{n} P(X_i > x) \\
&= 1 - \prod_{i=1}^{n} \exp(-\lambda_i x) \\
&= 1 - \exp\!\left(-x \sum_{i=1}^{n} \lambda_i\right) \\
&= 1 - \exp(-\lambda_{\min} x). \qquad \square
\end{aligned}$$
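As a quick numerical sanity check of the theorem (not part of the thesis), one can sample the minimum of independent exponentials and compare the empirical mean against $1/\sum_i \lambda_i$:

import random

rates = [0.5, 1.0, 2.0]          # example rate parameters
n = 100_000
samples = [min(random.expovariate(r) for r in rates) for _ in range(n)]

# If the theorem holds, the mean approaches 1 / sum(rates) = 1 / 3.5.
print(sum(samples) / n)          # expected to print roughly 0.2857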
Appendix B
Additional Analysis
B.1 Impact of Invalidation Latency
[Figure: stale reads (y-axis, 0–800) against mean invalidation latency (x-axis, 100–250); one curve for the naive id-list and one with uncachable objects marked.]
Figure B.1: Stale reads as a function of mean invalidation latency over 100,000 operations. Higher invalidation latency gives rise to more stale reads, as there is a larger time window in which stale content can be retrieved from the cache. Marking frequently written objects as uncachable reduces this effect.
B.2 Monte Carlo Optimisation
Table B.1 shows an example of hyperparameter optimisation through Bayesian inference using the Spearmint framework [88]. Consider the write-dominant workload from chapter 5. The target parameters are the write quantile and the maximum time-to-live the model can estimate (values between 0 and 60 seconds allowed). In this simple example, the utility of response time is weighted three times as heavily as the utility of the cache hit rate.
Experiment   Quantile p   Maximum ttl (s)   Utility
1            0.5          30                0.187
2            0.75         15                0.217
3            1            2                 0.326
4            1            0                 0.337
5            1            60                0.364

Table B.1: Bayesian optimisation of the optimal quantile p and maximum allowed ttl.
The Gaussian process quickly predicts that the highest utility is achieved by setting a high write quantile and a high maximum ttl. Further experiments did not show any improvement, as the inference model tried to improve utility by making tiny adjustments within the allowed range (e.g. a ttl of 59). Since every run of the simulation takes a few minutes, Bayesian optimisation is a convenient tool for quickly finding good parametrisations: one defines the utility of the performance metrics of interest and then repeatedly samples the Gaussian process for parameter suggestions, where every suggestion takes the utility of previous suggestions into account to quickly find a maximum. A sketch of such a loop follows below.
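For illustration only, the following sketch expresses the same loop with scikit-optimize's gp_minimize as a stand-in for Spearmint [88]; run_simulation is a hypothetical wrapper around one Monte Carlo benchmark run, given a toy body here purely so the example runs.

from skopt import gp_minimize

def run_simulation(write_quantile, max_ttl):
    # Hypothetical stand-in: the real function would execute one
    # benchmark run and return its measured global utility. This
    # toy surface merely makes the example runnable.
    return write_quantile * (max_ttl / 60.0)

def objective(params):
    quantile, max_ttl = params
    return -run_simulation(quantile, max_ttl)  # gp_minimize minimises

result = gp_minimize(
    objective,
    dimensions=[(0.5, 1.0),  # write quantile p
                (0, 60)],    # maximum ttl in seconds
    n_calls=15,              # number of (expensive) simulation runs
)
print(result.x, -result.fun)  # best (p, ttl) pair and its utility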
Bibliography
[1] Ioannis Arapakis, Xiao Bai, and B. Barla Cambazoglu. Impact of response latency on user behavior in web search. In Proceedings of the
37th International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR ’14, pages 103–112, New York,
NY, USA, 2014. ACM.
[2] Wolfgang Lehner and Kai-Uwe Sattler. Web-Scale Data Management
for the Cloud. Springer, New York, 2013 edition, April 2013.
[3] Guoqiang Zhang, Yang Li, and Tao Lin. Caching in information centric
networking: A survey. Comput. Netw., 57(16):3128–3141, November
2013.
[4] R. T. Hurley and B. Y. Li. A performance investigation of web caching
architectures. In Proceedings of the 2008 C3S2E Conference, C3S2E ’08,
pages 205–213, New York, NY, USA, 2008. ACM.
[5] Taekook Kim and Eui-Jik Kim. Hybrid storage-based caching strategy for content delivery network services. Multimedia Tools Appl.,
74(5):1697–1709, March 2015.
[6] Felix Gessert, Michael Schaarschmidt, Wolfram Wingerath, Steffen
Friedrich, and Norbert Ritter. The cache sketch: Revisiting expiration-based caching in the age of cloud data management. BTW ’15, Hamburg,
Germany, March 2015.
[7] Mukaddim Pathan and Rajkumar Buyya. A taxonomy of cdns. In Rajkumar Buyya, Mukaddim Pathan, and Athena Vakali, editors, Content
Delivery Networks, volume 9 of Lecture Notes in Electrical Engineering,
pages 33–77. Springer Berlin Heidelberg, 2008.
[8] Jia Wang. A survey of web caching schemes for the internet. SIGCOMM
Comput. Commun. Rev., 29(5):36–46, October 1999.
[9] Werner Vogels. Eventually consistent. Commun. ACM, 52(1):40–44,
January 2009.
[10] Felix Gessert, Steffen Friedrich, Wolfram Wingerath, Michael
Schaarschmidt, and Norbert Ritter. Towards a scalable and unified
REST API for cloud data stores. In 44th annual conference of the society for informatics, Informatik 2014, Big Data - Mastering Complexity,
22.-26. September 2014 in Stuttgart, Deutschland, pages 723–734, 2014.
[11] F. Gessert, F. Bücklers, and N. Ritter. Orestes: A scalable database-as-a-service architecture for low latency. In Data Engineering Workshops
(ICDEW), 2014 IEEE 30th International Conference on, pages 215–222,
March 2014.
[12] Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M.
Hellerstein, and Ion Stoica. Probabilistically bounded staleness for practical partial quorums. Proceedings of the VLDB Endowment (PVLDB
2012), 5(8):776–787, 2012.
[13] Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M.
Hellerstein, and Ion Stoica. Quantifying eventual consistency with pbs.
Commun. ACM, 57(8):93–102, August 2014.
[14] Wojciech Golab, Xiaozhou Li, and Mehul A. Shah. Analyzing consistency properties for fun and profit. In Proceedings of the 30th Annual
ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, PODC ’11, pages 197–206, New York, NY, USA, 2011. ACM.
[15] Qi Huang, Ken Birman, Robbert van Renesse, Wyatt Lloyd, Sanjeev
Kumar, and Harry C. Li. An analysis of facebook photo caching. In
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP ’13, pages 167–181, New York, NY, USA, 2013.
ACM.
[16] Linpeng Tang, Qi Huang, Wyatt Lloyd, Sanjeev Kumar, and Kai Li.
Ripq: Advanced photo caching on flash for facebook. In Proceedings of the 13th USENIX Conference on File and Storage Technologies,
FAST’15, pages 373–386, Berkeley, CA, USA, 2015. USENIX Association.
[17] Stefan Saroiu, Krishna P. Gummadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy. An analysis of internet content delivery systems.
SIGOPS Oper. Syst. Rev., 36(SI):315–327, December 2002.
[18] Michael J. Freedman. Experiences with coralcdn: A five-year operational view. In Proc. NSDI, 2010.
[19] Lei Guo, Enhua Tan, Songqing Chen, Zhen Xiao, and Xiaodong Zhang.
The stretched exponential distribution of internet media access patterns.
In Proceedings of the Twenty-seventh ACM Symposium on Principles of
Distributed Computing, PODC ’08, pages 283–294, New York, NY, USA,
2008. ACM.
[20] Patrick Wendell and Michael J. Freedman. Going viral: Flash crowds in
an open cdn. In Proceedings of the 2011 ACM SIGCOMM Conference on
Internet Measurement Conference, IMC ’11, pages 549–558, New York,
NY, USA, 2011. ACM.
[21] Salvatore Scellato, Cecilia Mascolo, Mirco Musolesi, and Jon Crowcroft.
Track globally, deliver locally: Improving content delivery networks by
tracking geographic social cascades. In Proceedings of the 20th International Conference on World Wide Web, WWW ’11, pages 457–466,
New York, NY, USA, 2011. ACM.
[22] Mike P. Wittie, Veljko Pejovic, Lara Deek, Kevin C. Almeroth, and
Ben Y. Zhao. Exploiting locality of interest in online social networks. In
Proceedings of the 6th International COnference, Co-NEXT ’10, pages
25:1–25:12, New York, NY, USA, 2010. ACM.
[23] Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin, James Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, and
Vadim Yushprakh. Megastore: Providing scalable, highly available storage for interactive services. In Proceedings of the Conference on Innovative Data system Research (CIDR), pages 223–234, 2011.
[24] Jeff Shute, Mircea Oancea, Stephan Ellner, Ben Handy, Eric Rollins,
Bart Samwel, Radek Vingralek, Chad Whipkey, Xin Chen, Beat
Jegerlehner, Kyle Littlefield, and Phoenix Tong. F1: The fault-tolerant
distributed rdbms supporting google’s ad business. In Proceedings of
the 2012 ACM SIGMOD International Conference on Management of
Data, SIGMOD ’12, pages 777–778, New York, NY, USA, 2012. ACM.
[25] James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes,
Christopher Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev,
Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David
Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and
Dale Woodford. Spanner: Google’s globally distributed database. ACM
Trans. Comput. Syst., 31(3):8:1–8:22, August 2013.
[26] Charles Garrod, Amit Manjhi, Anastasia Ailamaki, Bruce Maggs, Todd
Mowry, Christopher Olston, and Anthony Tomasic. Scalable query result caching for web applications. Proc. VLDB Endow., 1(1):550–561,
August 2008.
[27] Sadiye Alici, Ismail Sengor Altingovde, Rifat Ozcan, Berkant Barla
Cambazoglu, and Özgür Ulusoy. Timestamp-based result cache invalidation for web search engines. In Proceedings of the 34th International
ACM SIGIR Conference on Research and Development in Information
Retrieval, SIGIR ’11, pages 973–982, New York, NY, USA, 2011. ACM.
[28] Burton H. Bloom. Space/time trade-offs in hash coding with allowable
errors. Commun. ACM, 13(7):422–426, July 1970.
[29] Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh,
and George Varghese. An improved construction for counting bloom
filters. In Algorithms–ESA 2006, pages 684–695. Springer, 2006.
[30] Andrei Broder and Michael Mitzenmacher. Network applications of
bloom filters: A survey. Internet Math., 1(4):485–509, 2003.
[31] Sarang Dharmapurikar, Praveen Krishnamurthy, Todd Sproull, and
John Lockwood. Deep packet inspection using parallel bloom filters.
In High performance interconnects, 2003. proceedings. 11th symposium
on, pages 44–51. IEEE, 2003.
[32] Li Fan, Pei Cao, Jussara Almeida, and Andrei Z. Broder. Summary
cache: A scalable wide-area web cache sharing protocol. IEEE/ACM
Trans. Netw., 8(3):281–293, June 2000.
[33] W. R. Gilks. Markov Chain Monte Carlo. John Wiley & Sons, Ltd,
2005.
[34] Reuven Y. Rubinstein and Dirk P. Kroese. Markov Chain Monte Carlo,
pages 167–200. John Wiley & Sons, Inc., 2007.
[35] Siddhartha Chib and Edward Greenberg. Understanding the metropolis-hastings algorithm. The American Statistician, 1995.
[36] Stuart Geman and Donald Geman. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. IEEE Trans. Pattern
Anal. Mach. Intell., 6(6):721–741, November 1984.
[37] Luc Devroye. Non-uniform random variate generation, 1986.
[38] Siddhartha Chib. Chapter 57 - markov chain monte carlo methods:
Computation and inference. volume 5 of Handbook of Econometrics,
pages 3569 – 3649. Elsevier, 2001.
[39] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement
Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998.
[40] G. Yen and T. Hickey. Reinforcement learning algorithms for robotic
navigation in dynamic environments. In Neural Networks, 2002. IJCNN
’02. Proceedings of the 2002 International Joint Conference on, volume 2, pages 1444–1449, 2002.
[41] AndreyV. Gavrilov and Artem Lenskiy. Mobile robot navigation using reinforcement learning based on neural network with short term
memory. In De-Shuang Huang, Yong Gan, Vitoantonio Bevilacqua, and
JuanCarlos Figueroa, editors, Advanced Intelligent Computing, volume
6838 of Lecture Notes in Computer Science, pages 210–217. Springer
Berlin Heidelberg, 2012.
[42] Gerald Tesauro. Temporal difference learning and td-gammon. Commun. ACM, 38(3):58–68, March 1995.
[43] Johannes Fürnkranz. Recent advances in machine learning and game
playing. ÖGAI Journal, 26(2):19–28, 2007.
[44] M. Wiering and M. van Otterlo. Reinforcement Learning: State-of-the-Art. Adaptation, Learning, and Optimization. Springer Berlin Heidelberg, 2012.
[45] Glenn F Matthews and Khaled Rasheed. Temporal difference learning
for nondeterministic board games. In IC-AI, pages 800–806, 2008.
[46] Peter Dayan and Bernard W Balleine. Reward, motivation, and reinforcement learning. Neuron, 36(2):285–298, 2002.
[47] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3):279–292, 1992.
[48] Andrew G. Barto, R. S. Sutton, and C. J. C. H. Watkins. Learning and
sequential decision making. In LEARNING AND COMPUTATIONAL
NEUROSCIENCE, pages 539–602. MIT Press, 1989.
[49] AndrewW. Moore and ChristopherG. Atkeson. Prioritized sweeping:
Reinforcement learning with less data and less time. Machine Learning,
13(1):103–130, 1993.
[50] Kenneth O. Stanley and Risto Miikkulainen. Efficient reinforcement
learning through evolving neural network topologies. In Proceedings of
the Genetic and Evolutionary Computation Conference, GECCO ’02,
pages 569–577, San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc.
[51] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atari
with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[52] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu,
Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie,
Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran,
Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533,
02 2015.
[53] Carl Edward Rasmussen and Malte Kuss. Gaussian processes in reinforcement learning. In Advances in Neural Information Processing
Systems 16, pages 751–759. MIT Press, 2004.
[54] Yaakov Engel, Shie Mannor, and Ron Meir. Reinforcement learning with
gaussian processes. In Proceedings of the 22Nd International Conference
on Machine Learning, ICML ’05, pages 201–208, New York, NY, USA,
2005. ACM.
[55] G. Tesauro, R. Das, W.E. Walsh, and J.O. Kephart. Utility-function-driven resource allocation in autonomic systems. In Autonomic Computing, 2005. ICAC 2005. Proceedings. Second International Conference
on, pages 342–343, June 2005.
[56] G. Tesauro, N. K. Jong, R. Das, and M. N. Bennani. A hybrid reinforcement learning approach to autonomic resource allocation. In Proceedings
of the 2006 IEEE International Conference on Autonomic Computing,
ICAC ’06, pages 65–73, Washington, DC, USA, 2006. IEEE Computer
Society.
[57] Jianxin Yao, Chen-Khong Tham, and Kah-Yong Ng. Decentralized dynamic workflow scheduling for grid computing using reinforcement learning. In Networks, 2006. ICON ’06. 14th IEEE International Conference
on, volume 1, pages 1–6, Sept 2006.
[58] Sebastian Angel, Hitesh Ballani, Thomas Karagiannis, Greg O’Shea,
and Eno Thereska. End-to-end performance isolation through virtual
datacenters. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI’14, pages 233–248,
Berkeley, CA, USA, 2014. USENIX Association.
[59] Facebook. The parse backend platform. https://parse.com/.
[60] David Flanagan. JavaScript: The Definitive Guide. O’Reilly Media,
Inc., 2006.
[61] Peter Bailis and Ali Ghodsi. Eventual consistency today: limitations,
extensions, and beyond. Communications of the ACM, 56(5):55–63,
2013.
[62] Peter Bailis, Ali Ghodsi, Joseph M Hellerstein, and Ion Stoica. Bolt-on
causal consistency. In SIGMOD 2013, pages 761–772. ACM, 2013.
[63] Kenneth P. Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai
Budiu, and Yaron Minsky. Bimodal multicast. ACM Trans. Comput.
Syst., 17(2):41–88, May 1999.
[64] fastly. Blog post on multicast implementation in fastly. http://www.fastly.com/blog/building-fast-and-reliable-purging-system/, February 2014.
[65] MongoDB, Inc. MongoDB. http://www.mongodb.org/.
[66] MongoDB, Inc. Tutorial on query optimization for mongodb. http://docs.mongodb.org/manual/core/query-optimization/, 2015.
[67] Ilya Grigorik. Presentation on http/2 mechanics. goo.gl/8yczyz.
[68] Saar Cohen and Yossi Matias. Spectral bloom filters. In Proceedings of
the 2003 ACM SIGMOD International Conference on Management of
Data, SIGMOD ’03, pages 241–252, New York, NY, USA, 2003. ACM.
[69] The Apache Software Foundation. Apache Storm. https://storm.apache.org/.
[70] Craig Jefferds. Sift.js library for evaluating mongodb-queries. https://github.com/crcn/sift.js/tree/master.
[71] R.G. Gallager. Discrete Stochastic Processes. The Springer International
Series in Engineering and Computer Science. Springer US, 1995.
[72] Amine Abou-Rjeili and George Karypis. Multilevel algorithms for partitioning power-law graphs. In Proceedings of the 20th International Conference on Parallel and Distributed Processing, IPDPS’06, pages 124–
124, Washington, DC, USA, 2006. IEEE Computer Society.
[73] Yinglian Xie and D. O’Hallaron. Locality in search engine queries and
its implications for caching. In INFOCOM 2002. Twenty-First Annual
Joint Conference of the IEEE Computer and Communications Societies.
Proceedings. IEEE, volume 3, pages 1238–1247 vol.3, 2002.
[74] Satinder Singh and Richard S. Sutton. Reinforcement learning with
replacing eligibility traces. In MACHINE LEARNING, pages 123–158,
1996.
[75] Hesam Montazeri, Sajjad Moradi, and Reza Safabakhsh. Continuous
state/action reinforcement learning: A growing self-organizing map approach. Neurocomputing, 74(7):1069–1082, 2011.
[76] Djallel Bouneffouf, Amel Bouzeghoub, and Alda Lopes Gançarski. A
contextual-bandit algorithm for mobile context-aware recommender system. In Neural Information Processing, pages 324–331. Springer, 2012.
[77] Graham Cormode, Minos Garofalakis, Peter J. Haas, and Chris Jermaine. Synopses for massive data: Samples, histograms, wavelets,
sketches. Found. Trends databases, 4(1–3):1–294, January 2012.
[78] Charu C. Aggarwal. On biased reservoir sampling in the presence of
stream evolution. In Proceedings of the 32Nd International Conference
on Very Large Data Bases, VLDB ’06, pages 607–618. VLDB Endowment, 2006.
[79] Jeffrey S Vitter. Random sampling with a reservoir. ACM Transactions
on Mathematical Software (TOMS), 11(1):37–57, 1985.
[80] Graham Cormode, Vladislav Shkapenyuk, Divesh Srivastava, and Bojian Xu. Forward decay: A practical time decay model for streaming
systems. In Data Engineering, 2009. ICDE’09. IEEE 25th International
Conference on, pages 138–149. IEEE, 2009.
[81] James Hensman, Nicolo Fusi, and Neil D Lawrence. Gaussian processes
for big data. arXiv preprint arXiv:1309.6835, 2013.
[82] Joaquin Quiñonero-Candela and Carl Edward Rasmussen. A unifying
view of sparse approximate gaussian process regression. The Journal of
Machine Learning Research, 6:1939–1959, 2005.
[83] Edward Snelson and Zoubin Ghahramani. Local and global sparse gaussian process approximations. In International Conference on Artificial
Intelligence and Statistics, pages 524–531, 2007.
[84] James S. Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl.
Algorithms for hyper-parameter optimization. In J. Shawe-Taylor, R.S.
Zemel, P.L. Bartlett, F. Pereira, and K.Q. Weinberger, editors, Advances
in Neural Information Processing Systems 24, pages 2546–2554. Curran
Associates, Inc., 2011.
[85] James Bergstra and Yoshua Bengio. Random search for hyper-parameter
optimization. J. Mach. Learn. Res., 13(1):281–305, February 2012.
[86] Nimalan Mahendran, Ziyu Wang, Firas Hamze, and Nando de Freitas.
Adaptive mcmc with bayesian optimization. In Neil D. Lawrence and
Mark Girolami, editors, AISTATS, volume 22 of JMLR Proceedings,
pages 751–760. JMLR.org, 2012.
[87] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian
optimization of machine learning algorithms. In F. Pereira, C.J.C.
Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 2951–2959. Curran Associates,
Inc., 2012.
[88] Jasper Snoek. Spearmint package for bayesian optimisation. https://github.com/HIPS/Spearmint.
[89] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan,
and Russell Sears. Benchmarking cloud serving systems with ycsb. In
Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC ’10,
pages 143–154, New York, NY, USA, 2010. ACM.
[90] Amazon Web Services. Amazon Elastic Compute Cloud (amazon ec2).
http://aws.amazon.com/de/ec2/.
[91] Scott T. Leutenegger and Daniel Dias. A modeling study of the tpc-c
benchmark. In Proceedings of the 1993 ACM SIGMOD International
Conference on Management of Data, SIGMOD ’93, pages 22–31, New
York, NY, USA, 1993. ACM.
[92] Michael Schaarschmidt. Github project page of the Monte Carlo simulation framework. https://github.com/mschaars/Query-SimulationFramework.
[93] Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. Measurement and analysis of online
social networks. In Proceedings of the 7th ACM SIGCOMM Conference
on Internet Measurement, IMC ’07, pages 29–42, New York, NY, USA,
2007. ACM.
[94] L. Breslau, Pei Cao, Li Fan, G. Phillips, and S. Shenker. Web caching
and zipf-like distributions: evidence and implications. In INFOCOM
’99. Eighteenth Annual Joint Conference of the IEEE Computer and
Communications Societies. Proceedings. IEEE, volume 1, pages 126–134
vol.1, Mar 1999.
[95] Daniel S. Myers. Lecture notes on exponential distributions. http://pages.cs.wisc.edu/~dsmyers/cs547/lecture_9_memoryless_property.pdf.