Relational Cloud:
A Database-as-a-Service for the Cloud
Carlo Curino, Evan Jones, Raluca Ada Popa, Nirmesh
Malaviya, Eugene Wu, Sam Madden, Hari Balakrishnan,
Nickolai Zeldovich
Presented by
Arka Bhattacharya (for CS 294, Berkeley)
(some slides are taken from the CIDR ’11 talk)
THE STARTUP STORY
Motivation
 Why move to the cloud?
 Economies of scale (hardware & licensing costs)
 Pay per use & lower administrative costs
 Current players:
 Amazon RDS (MySQL on EC2)
 Microsoft SQL Azure
Problems!
 Problems arising:
 Efficient Multi-tenancy (Provider)
 Elastic scalability (Provider)
 Privacy (User)
 Note: Relational Cloud is mainly for OLTP
workloads & DAS architectures, with consistency
guarantees
1. Efficient Multi-tenancy –
Placement & Migrations
 Problem: Consolidate databases onto the smallest number
of servers, balancing load without affecting
performance
 Solution: Kairos, SIGMOD ’11
 Up to 17:1 consolidation
 Key insight: Single database server per machine + logical
databases (as opposed to a DB in a VM, or multiple DB
servers per machine)
 Reduces redundant work, enables group commits, lowers RAM
wastage, allows code sharing, and makes context switches cheaper
Kairos (contd.)
 Measure RAM, CPU & disk usage of each database, and
estimate the combined load
 RAM: Probe table to gauge working set size; additive
 Disk: Deduce a model by testing the DBMS with different write
rates & working set sizes and measuring the amount of IO
 CPU: additive
 Frame as an optimization problem (non-linear programming)
 Solving takes time
 After lots of heuristics, the optimization terminates
in 8 minutes for 20 servers & 100 workloads!
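To make the consolidation idea concrete, here is a minimal sketch that treats CPU and RAM as additive (as on the slide) and packs workloads onto servers. It uses a first-fit-decreasing heuristic as a simple stand-in, not the non-linear program that Kairos solves, and it ignores the disk model entirely. The names `Workload`, `Server`, and `consolidate`, and the 32 GB capacity, are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    cpu: float     # fraction of one server's CPU (assumed additive)
    ram_gb: float  # estimated working set size (assumed additive)


@dataclass
class Server:
    cpu_cap: float = 1.0
    ram_cap_gb: float = 32.0  # hypothetical capacity
    workloads: list = field(default_factory=list)

    def fits(self, w):
        # Combined load is estimated by summing per-workload usage.
        cpu = sum(x.cpu for x in self.workloads) + w.cpu
        ram = sum(x.ram_gb for x in self.workloads) + w.ram_gb
        return cpu <= self.cpu_cap and ram <= self.ram_cap_gb


def consolidate(workloads, ram_cap_gb=32.0):
    """First-fit decreasing by CPU: a heuristic, not an optimal solver."""
    servers = []
    for w in sorted(workloads, key=lambda w: w.cpu, reverse=True):
        target = next((s for s in servers if s.fits(w)), None)
        if target is None:
            # No existing server has room, so open a new one.
            target = Server(ram_cap_gb=ram_cap_gb)
            servers.append(target)
        target.workloads.append(w)
    return servers


demo = [
    Workload("a", 0.5, 10), Workload("b", 0.4, 10),
    Workload("c", 0.3, 20), Workload("d", 0.2, 5),
]
placement = consolidate(demo)
for i, s in enumerate(placement):
    print(i, [w.name for w in s.workloads])
```

A greedy packer like this runs instantly but can miss better placements, which is presumably why the slide frames the real problem as an optimization that needs heuristics to finish in minutes.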
2. Elastic Scalability
Database Partitioning
 Problem: Partition an OLTP database into N chunks
so as to maximize performance
 Solution: Schism, VLDB 2010
 Close to optimal
 Key insight: Minimize the number of distributed
transactions
 Advantage over hashing, round-robin
 Use a workload trace to find good partitions
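The key insight above can be illustrated with a toy trace: count how many transactions touch more than one partition under hash partitioning versus under an assignment that keeps co-accessed tuples together. The trace, the tuple ids, and the `workload_aware` mapping are made up for this sketch; the mapping is hand-picked here, whereas Schism derives it from the trace by graph partitioning.

```python
def count_distributed(transactions, partition_of):
    """Count transactions whose tuples span more than one partition."""
    return sum(
        1 for txn in transactions
        if len({partition_of(t) for t in txn}) > 1
    )


# Hypothetical workload trace: each transaction lists the tuple ids it touches.
trace = [(1, 2), (1, 2), (3, 4), (3, 4), (2, 1)]


def hash_partition(tuple_id):
    # Hash-style placement across 2 partitions, blind to the workload.
    return tuple_id % 2


# Hand-picked placement that keeps co-accessed tuples together.
workload_aware = {1: 0, 2: 0, 3: 1, 4: 1}

print("hash:", count_distributed(trace, hash_partition))
print("workload-aware:", count_distributed(trace, workload_aware.get))
```

In this trace every transaction is distributed under hashing and none is under the workload-aware placement, which is the gap the slide refers to.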
Schism (contd.)
 Use a classifier to capture the partitioning in compact
form, for efficient query routing
 Lots of heuristics to choose a good workload sample
 Sampling, blanket statement filtering, etc.
 Graph partitioning is fast (< 40 sec)
 Achieves almost linear scalability!
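As a rough illustration of capturing a partitioning "in compact form", the sketch below learns a single threshold on a key from per-tuple partition labels, so a router can apply one rule instead of keeping a lookup entry per tuple. A one-split decision stump is used here only as a stand-in for whatever classifier the real system trains; `learn_threshold`, `route`, and the sample data are hypothetical.

```python
def learn_threshold(labeled):
    """Find t minimizing errors for the rule: key < t -> partition 0, else 1.

    labeled: list of (key, partition) pairs with partition in {0, 1}.
    """
    keys = sorted(k for k, _ in labeled)
    candidates = keys + [keys[-1] + 1]

    def errors(t):
        return sum((0 if k < t else 1) != p for k, p in labeled)

    return min(candidates, key=errors)


def route(key, threshold):
    """Route a query on `key` using the learned rule, with no per-tuple table."""
    return 0 if key < threshold else 1


# Hypothetical per-tuple output of the partitioning step.
assignment = [(1, 0), (2, 0), (3, 0), (10, 1), (11, 1), (12, 1)]
t = learn_threshold(assignment)
print("threshold:", t, "key 5 ->", route(5, t), "key 20 ->", route(20, t))
```

The rule also generalizes to keys that were not in the sample, which a plain lookup table cannot do.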
3. Privacy
 Problem:
 Prevent the DBA from snooping on data
 Ensure data security during application and DBMS
server compromise
 Solution: CryptDB, SOSP 2011
 Low overhead: ~22.5%
 Key insight: Adjustable security
CryptDB: Onions
[Figure: onion encryption layers, listed outermost to innermost]
Onion 1: RND → DET (equality selection) → DET (equality join) → any value
Onion 2: RND → OPE (inequality select) → OPE (inequality join) → any value
Onion 3: HOM → int value
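The adjustable-security idea behind the onions can be sketched as bookkeeping: each column starts at its outermost layer, and a layer is peeled only when a query needs the operation that the inner layer supports. The sketch below tracks layer state only and performs no cryptography; the join layers are omitted for brevity, and the names `ONIONS`, `REQUIRES`, and `Column` are assumptions for illustration, not CryptDB's API.

```python
# Layers per onion, outermost first (join layers omitted in this toy model).
ONIONS = {
    "eq": ["RND", "DET"],   # DET allows equality checks
    "ord": ["RND", "OPE"],  # OPE allows order comparisons
    "add": ["HOM"],         # HOM allows addition on ints
}

# Toy mapping from SQL operation to the (onion, layer) it needs exposed.
REQUIRES = {"=": ("eq", "DET"), "<": ("ord", "OPE"), "SUM": ("add", "HOM")}


class Column:
    def __init__(self):
        # Every onion starts fully wrapped, i.e. at its outermost layer.
        self.exposed = {name: layers[0] for name, layers in ONIONS.items()}

    def prepare(self, op):
        """Peel just enough layers for `op`; return the layers removed."""
        onion, needed = REQUIRES[op]
        layers = ONIONS[onion]
        current = layers.index(self.exposed[onion])
        wanted = layers.index(needed)
        peeled = layers[current:wanted]  # empty if already exposed
        if wanted > current:
            self.exposed[onion] = needed
        return peeled


col = Column()
print(col.prepare("="))   # first equality query peels RND on the eq onion
print(col.prepare("="))   # later equality queries peel nothing
print(col.exposed)        # the ord onion is still at RND
```

A column that is never compared for order keeps its RND layer on that onion, so the server only learns what the workload actually requires.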
Overall architecture
[Figure: overall architecture; surviving labels: DB stats, Partitions & placements]
Relational Cloud
 Advantages:
 Unmodified DB backends
 Workload-aware consolidation
 Workload-aware sharding
 High availability via replication of front-end servers
 SQL over encrypted data