A Deep-Dive with Azure DocumentDB:
Partitioning, Data Modelling, and Geo Replication
Andrew Liu
[email protected]
Session objectives and takeaways
Yesterday's Session:
Objectives for Today:
A brief recap for those who missed yesterday…
Gartner’s 3Vs of Big Data
Volume
Velocity
Variety
How can my app deal with massive volume of data & throughput?
How do I write responsive apps?
How do I deal with schema changes?
How do I elastically scale my database?
How do I make data available where my users are?
How do I write highly available apps?
How do I iterate rapidly?
What data models work at scale?
Common scenarios + use cases
Retail
• Product Catalog
• Ordering and Payment Pipelines
• Personalization
• Customer 360 View
Gaming
• Multiplayer Games
• Social Gameplay
• Leaderboards
• Game Analytics
IoT / Sensor Data
• Telemetry + Event Store
• Telematics
• Device Registry
Ad Technology + Social Analytics
• User behavior telemetry
• Recommendations
DocumentDB Capabilities
Elastic and limitless global scale
• Independently scale throughput and storage - locally and globally
• Transparent partition management and routing
Guaranteed low latency
• <10ms reads/<15ms writes @ P99
• Requests are served from local region
• Write optimized, latch-free database engine designed for SSDs and low latency access
• Synchronous and automatic document indexing at sustained ingestion rates
SQL and JavaScript – schema free
• Automatic tree path based indexing
• No schemas or secondary indices required upfront
• SQL and JavaScript language integrated queries
• Hash, range, and spatial
• Multi-document, JavaScript language integrated transactions
Multiple consistency levels
• Multiple well defined consistency levels
• Intuitive programming model for relaxed consistency models
• Clear PACELC tradeoffs and 99.99% availability SLAs
DocumentDB 101 (ish)
Architecture (Behind the Scenes)
• DocumentDB service is manifested as an overlay network with ring topology (aka federation)
• Resources are partitioned; they span federations, datacenters and regions
• Partitions are made highly available by replica sets
• A replica in turn hosts the DocumentDB database engine and implements the replication protocol and local persistence
[Diagram: physical layout (region, datacenter, federation, FD) alongside logical layout (resource, partition set, partition, replica)]
Resource Model
Resources
• Identified by their logical and stable URI
• Represented as JSON documents
• Partitioned and span across machines, clusters and regions
Resource model
• Stateless interaction (HTTP and TCP)
• Hierarchical overlay atop partitioning model
Partitioning Model
• Grid Partitioning – horizontal based on hash/range and vertical across regions
• Each partition made highly available via a replica set
[Diagram: a DocumentDB collection = a partition set of replica-sets; local distribution across partitions, global distribution across US-East, US-West and N Europe]
Let’s talk about…
Everything you need to know to build
Blazing fast, planet-scale applications!
Collections != Tables
Collections do NOT enforce schema
Co-locate multiple types in a collection
Annotate documents with a "type" property
Co-locating types in the same collection
Ability to query across multiple entity types with a single network request.
For example, we have two types of documents: cat and person.
{
"id": "Andrew",
"type": "Person",
"familyId": "Liu",
"worksOn": "DocumentDB"
}
{
"id": "Ralph",
"type": "Cat",
"familyId": "Liu",
"fur": {
"length": "short",
"color": "brown"
}
}
We can query both types of documents without needing a JOIN simply by running a query without a filter on type:
SELECT * FROM c WHERE c.familyId = "Liu"
If we want only documents of type "Person", we can simply add a filter on type to our query:
SELECT * FROM c WHERE c.familyId = "Liu" AND c.type = "Person"
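The effect of the two filters can be mimicked in memory. This is only a sketch in JavaScript, assuming the collection is a plain array; `queryFamily` is a hypothetical helper for illustration, not a DocumentDB API.

```javascript
// In-memory stand-in for a collection that co-locates two document types.
const collection = [
  { id: "Andrew", type: "Person", familyId: "Liu", worksOn: "DocumentDB" },
  { id: "Ralph", type: "Cat", familyId: "Liu", fur: { length: "short", color: "brown" } },
];

// Hypothetical helper: filter on familyId, and optionally on type.
function queryFamily(docs, familyId, type) {
  return docs.filter(
    (d) => d.familyId === familyId && (type === undefined || d.type === type)
  );
}

// No filter on type: both the person and the cat come back in one pass.
const everyone = queryFamily(collection, "Liu");
// Extra filter on type: only the person comes back.
const peopleOnly = queryFamily(collection, "Liu", "Person");

console.log(everyone.map((d) => d.id), peopleOnly.map((d) => d.id));
```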
Co-locating types in the same collection
Ability to query across multiple entity types with a single network request.
Ability to perform transactions across multiple types
Cost: every collection has one or more physical partitions underneath
Let's talk about partitioning.
Two Dimensions: Throughput and Storage
Measuring Throughput (Request Units)
• Request Unit/sec (RU) is the normalized currency
• Operations consume request units (RUs)
• Replica gets a fixed budget of request units
• Requests get rate limited if they exceed the SLA
• Customers pay for reserved request units by the hour
[Diagram: incoming requests for documents on a replica, measured in % CPU, % memory and % IOPS; scale from Min RU/sec (quiescent, no throttling) to Max RU/sec (rate limit)]
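A fixed RU budget with rate limiting can be pictured as a per-second counter. The sketch below is a simplified model in JavaScript, not the service's actual admission control; the class name `RuBudget` and the one-second window are assumptions made for illustration.

```javascript
// Simplified model: a replica has a fixed RU budget per one-second window.
class RuBudget {
  constructor(maxRuPerSec) {
    this.maxRuPerSec = maxRuPerSec;
    this.windowStart = 0; // start of the current window, in ms
    this.consumed = 0;    // RUs consumed in the current window
  }

  // Returns true if the request is admitted, false if it is rate limited.
  tryConsume(costRu, nowMs) {
    if (nowMs - this.windowStart >= 1000) {
      // A new window begins: reset the counter.
      this.windowStart = nowMs - (nowMs % 1000);
      this.consumed = 0;
    }
    if (this.consumed + costRu > this.maxRuPerSec) return false;
    this.consumed += costRu;
    return true;
  }
}

const budget = new RuBudget(10);
const admitted = [
  budget.tryConsume(6, 0),    // admitted: 6 of 10 RUs used
  budget.tryConsume(6, 500),  // rate limited: 12 RUs would exceed the budget
  budget.tryConsume(6, 1000), // admitted: a new window has started
];
console.log(admitted);
```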
Partitioning Model
[Diagram: a collection split into Partition 1, Partition 2, …, Partition i, …, Partition n]
Partitioning Model
Partition Key = city
[Diagram: key values such as Houston, London, Chicago, New Delhi, Mumbai, Paris, New York, Boston and Berlin distributed across Partition 1 … Partition n]
Overall request volume should scale across Partition Keys
Individual queries should minimize cross-partition lookups
[Diagram: requests spread across Partition 1 … Partition n]
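Both goals can be illustrated with a toy hash partitioner. This is a minimal sketch in JavaScript: the FNV-1a hash and the helper names `partitionFor` and `partitionsToVisit` are assumptions for illustration and do not reflect how the service actually maps keys to partitions.

```javascript
// FNV-1a 32-bit hash; an illustrative choice, not the service's hash function.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Map a partition key value (for example a city) to a partition number.
function partitionFor(key, partitionCount) {
  return fnv1a(key) % partitionCount;
}

// Which partitions must a query visit?
function partitionsToVisit(filterKey, partitionCount) {
  // A filter on the partition key routes the query to a single partition.
  if (filterKey !== undefined) return [partitionFor(filterKey, partitionCount)];
  // Without it, the query fans out to every partition.
  return Array.from({ length: partitionCount }, (_, i) => i);
}

const single = partitionsToVisit("London", 4);
const fanOut = partitionsToVisit(undefined, 4);
console.log(single.length, fanOut.length);
```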
Partition Key Design Goals
Choosing a Partition Key
Let’s talk about object model
"With great power comes great responsibility"
- Uncle Ben
How do approaches differ?
Data normalization
Come as you are
Modeling Data: The Relational Way
[Diagram: tables Person (Id), PersonContactDetailLnk (PersonId, ContactDetailId), ContactDetail (Id), Address (Id), ContactDetailType (Id)]
Modeling Data: The Document Way
{
"id": "0ec1ab0c-de08-4e42-a429-...",
"addresses": [
{ "street": "1 Redmond Way",
"city": "Redmond", "state": "WA",
"zip": 98052}
],
"contactDetails": [
{"type": "home", "detail": "555-1212"},
{"type": "email", "detail": "[email protected]"}
],
...
}
[Diagram: Person containing Id, Addresses (Address, …) and ContactDetails (ContactDetail, …)]
To embed, or to reference, that is the question
Data modeling with denormalization
Try to model your entity as a self-contained document
{
"id": "1",
"firstName": "Thomas",
"lastName": "Andersen",
"addresses": [
{
"line1": "100 Some Street",
"line2": "Unit 1",
"city": "Seattle",
"state": "WA",
"zip": 98012 }
],
"contactDetails": [
{"email": "[email protected]"},
{"phone": "+1 555 555-5555", "extension": 5555}
]
}
Generally, use embedded data models when:
• There are "contains" relationships between entities
• There are one-to-few relationships between entities
• The embedded data changes infrequently
• The embedded data won’t grow without bounds
• The embedded data is integral to the document
Denormalizing typically provides better read performance
Data modeling with referencing
In general, use normalized data models when:
• Representing one-to-many relationships
• Representing many-to-many relationships
• Related data changes frequently
Normalizing typically provides better write performance
{
"id": "xyz",
"username": "user xyz"
}
{
"id": "address_xyz",
"userid": "xyz",
"address" : {
…
}
}
{
"id": "contact_xyz",
"userid": "xyz",
"email" : "[email protected]",
"phone" : "555 5555"
}
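The read-side cost of referencing can be seen by counting lookups against a toy store. This is a sketch only: `makeStore` is a hypothetical in-memory key-value store written for illustration, and the document shapes are simplified versions of the ones on the slide.

```javascript
// Toy key-value store that counts how many reads it serves.
function makeStore(docs) {
  const map = new Map(docs.map((d) => [d.id, d]));
  let reads = 0;
  return {
    get(id) { reads += 1; return map.get(id); },
    readCount() { return reads; },
  };
}

// Embedded model: the contact details live inside the user document.
const embedded = makeStore([
  { id: "xyz", username: "user xyz", contact: { phone: "555 5555" } },
]);
const embeddedUser = embedded.get("xyz"); // one read returns everything

// Referenced model: the contact details are a separate document.
const referenced = makeStore([
  { id: "xyz", username: "user xyz" },
  { id: "contact_xyz", userid: "xyz", phone: "555 5555" },
]);
const user = referenced.get("xyz");
const contact = referenced.get("contact_" + user.id); // second read

console.log(embedded.readCount(), referenced.readCount());
```

The flip side is on writes: in the referenced model a phone number change rewrites only the small contact document, while the embedded model rewrites the whole user document.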
Hybrid models
No magic bullet
• Model on a property-level (as opposed to record-level)
• Optimize your data model for your workload (as opposed to blindly following types)
Hybrid Approach:
• Segment data based on mutability
{
"id": "1",
"firstName": "Thomas",
"lastName": "Andersen",
"countOfBooks": 3,
"books": [1, 2, 3],
"images": [
{"thumbnail": "http://....png"},
{"profile": "http://....png"}
]
}
{
"id": 1,
"name": "DocumentDB 101",
"authors": [
{"id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png"},
{"id": 2, "name": "William Wakefield", "thumbnail": "http://....png"}
]
}
Query and Indexing
Documents as Trees
JSON serializable values (aka JSON Infoset)
JavaScript Object Literals
{
"locations":
[
{ "country": "Germany", "city": "Berlin" },
{ "country": "France", "city": "Paris" }
],
"headquarter": "Belgium",
"exports":[{ "city": "Moscow" },{ "city": "Athens"}]
}
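JSON document as tree
[Diagram: root with children locations, headquarter and exports; locations/0 (country: Germany, city: Berlin), locations/1 (country: France, city: Paris), headquarter: Belgium, exports/0 (city: Moscow), exports/1 (city: Athens)]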
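Viewing a document as a tree means every leaf value has a path from the root. The sketch below lists those paths for the example document; `leafPaths` and the `$/...` notation are assumptions chosen for illustration, not the engine's internal encoding.

```javascript
// Flatten a JSON value into root-to-leaf paths, one per leaf value.
function leafPaths(value, prefix = "$") {
  if (value !== null && typeof value === "object") {
    // Objects and arrays are interior nodes; array indexes act as labels.
    return Object.entries(value).flatMap(([label, child]) =>
      leafPaths(child, prefix + "/" + label)
    );
  }
  // Anything else is a leaf: append the instance value to the path.
  return [prefix + "/" + String(value)];
}

const company = {
  locations: [
    { country: "Germany", city: "Berlin" },
    { country: "France", city: "Paris" },
  ],
  headquarter: "Belgium",
  exports: [{ city: "Moscow" }, { city: "Athens" }],
};

const paths = leafPaths(company);
console.log(paths.join("\n"));
```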
Query
SQL
SELECT C.locations
FROM company C
WHERE C.headquarter = "Belgium"
JavaScript
function businessLogic() {
var country = "Belgium";
__.filter(function(x){return x.headquarter===country;});}
Input documents
{ "locations":
[ { "country": "Germany", "city": "Berlin" },
{ "country": "France", "city": "Paris" }
],
"headquarter": "Belgium",
"exports": [{ "city": "Moscow" }, { "city": "Athens" }]
}
{ "locations": [{ "country": "Germany", "city": "Bonn", "revenue": 200 } ],
"headquarter": "Italy",
"exports": [ { "city": "Berlin","dealers": [{"name": "Hans"}] }, { "city": "Athens" }
]
}
Query result
{
"results":
[
{
"locations":
[
{"country":"Germany","city":"Berlin"},
{"country":"France","city":"Paris"}
]
}
]
}
[Diagram: the two input documents and the query result drawn as trees]
Query
UDF
{"id":"GermanTax",
"body": "function GermanTax(income) {
if(income < 1000) return income * 0.1;
else if(income < 10000) return income * 0.2;
return income * 0.4;
}"
}
SELECT location.city, GermanTax(location.revenue) AS Tax
FROM location IN company.locations
WHERE location.revenue > 100
Input documents
{
"locations":
[ { "country": "Germany", "city": "Berlin" },
{ "country": "France", "city": "Paris" }
],
"headquarter": "Belgium",
"exports": [{ "city": "Moscow" }, { "city": "Athens" }]
}
{
"locations": [{ "country": "Germany", "city": "Bonn", "revenue": 200 }],
"headquarter": "Italy",
"exports":
[{"city": "Berlin","dealers": [{"name":"Hans"}]}, {"city":"Athens"}]
}
Query result
{
"results":
[
{"city":"Bonn","Tax":20}
]
}
[Diagram: the two input documents and the query result drawn as trees]
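The UDF query can be traced by hand in plain JavaScript. This sketch runs outside the database and only mirrors the query's logic: iterate over every location, keep those with revenue above 100, and project the city and the computed tax. The array `companies` is an in-memory stand-in for the collection.

```javascript
// The UDF body from the slide, as an ordinary function.
function GermanTax(income) {
  if (income < 1000) return income * 0.1;
  else if (income < 10000) return income * 0.2;
  return income * 0.4;
}

// In-memory stand-in for the two input documents.
const companies = [
  {
    locations: [
      { country: "Germany", city: "Berlin" },
      { country: "France", city: "Paris" },
    ],
    headquarter: "Belgium",
    exports: [{ city: "Moscow" }, { city: "Athens" }],
  },
  {
    locations: [{ country: "Germany", city: "Bonn", revenue: 200 }],
    headquarter: "Italy",
    exports: [{ city: "Berlin", dealers: [{ name: "Hans" }] }, { city: "Athens" }],
  },
];

const results = companies
  .flatMap((company) => company.locations)          // FROM location IN company.locations
  .filter((location) => location.revenue > 100)     // WHERE location.revenue > 100
  .map((location) => ({                             // SELECT city, GermanTax(revenue) AS Tax
    city: location.city,
    Tax: GermanTax(location.revenue),
  }));

console.log(JSON.stringify({ results }));
```

Locations without a revenue property fail the comparison and drop out, which is why only Bonn appears in the result.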
Schema Agnostic Indexing
• Logically the index is a union of all the document trees
• Structure contributed by the interior nodes, instance values are the leaves
• Columnar index for fast scans
• Support for rich hierarchical, relational and analytical queries
• Different path encodings depending on index type
• Support for multi-tenancy requires fixed upper bound on index size
• Structural information and instance values are normalized into a unifying concept of JSON-Path
[Diagram: document trees merged on their common structure (location, 0, country, coordinates, Germany)]
Queries that use the index
• Range (>, <, !=) & ORDERBY queries
• Wildcard queries
• Spatial queries
Terms | Postings List
$/location/0/ | 1, 2
location/0/country/ | 1, 2
location/0/city/ | 1, 2
0/country/Germany | 1, 2
1/country/France | 2
… | …
0/city/Moscow | 2
0/dealers/0 | 2
Dynamic Encoding of Postings List (E-WAH/differential)
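The idea of terms mapped to postings lists can be sketched with a small inverted index. This is an illustration under simplifying assumptions: it uses full root-to-leaf paths as terms rather than the path encodings used by the engine, stores postings uncompressed, and the names `termsOf`, `buildIndex` and the document ids are made up for the example.

```javascript
// Turn a document into index terms: one root-to-leaf path per leaf value.
function termsOf(value, prefix = "$") {
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([label, child]) =>
      termsOf(child, prefix + "/" + label)
    );
  }
  return [prefix + "/" + String(value)];
}

// Build a map from term to postings list (the ids of documents containing it).
function buildIndex(docsById) {
  const index = new Map();
  for (const [docId, doc] of docsById) {
    for (const term of new Set(termsOf(doc))) {
      if (!index.has(term)) index.set(term, []);
      index.get(term).push(docId);
    }
  }
  return index;
}

const index = buildIndex([
  [1, { locations: [{ country: "Germany", city: "Berlin" }], exports: [{ city: "Moscow" }] }],
  [2, { locations: [{ country: "Germany", city: "Bonn" }], exports: [{ city: "Berlin" }] }],
]);

// An equality filter becomes a single term lookup.
const germany = index.get("$/locations/0/country/Germany") || [];
const moscow = index.get("$/exports/0/city/Moscow") || [];
console.log(germany, moscow);
```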
Indexing Policies
Configuration | Level | Options
Automatic | Per collection | True (default) or False. Override with each document write
Indexing Mode | Per collection | Consistent or Lazy. Lazy for eventual updates/bulk ingestion
Included and excluded paths | Per path | Individual path or recursive includes (? and *)
Indexing Type | Per path | Support Hash (Default) and Range. Hash for equality, range for range queries
Indexing Precision | Per path | Supports 3 – 7 per path. Tradeoff storage, query RUs and write RUs
Indexing Paths
Path | Description/use case
/ | Default path for collection. Recursive and applies to whole document tree.
/"prop"/? | Serve queries like the following (with Hash or Range types respectively):
SELECT * FROM collection c WHERE c.prop = "value"
SELECT * FROM collection c WHERE c.prop > 5
/"prop"/* | All paths under the specified label.
/"prop"/"subprop"/ | Used during query execution to prune documents that do not have the specified path.
/"prop"/"subprop"/? | Serve queries (with Hash or Range types respectively):
SELECT * FROM collection c WHERE c.prop.subprop = "value"
SELECT * FROM collection c WHERE c.prop.subprop > 5
Global Distribution
Multi-region DocumentDB databases
Total RUs = Provisioned RUs x Number of regions
In this example:
2M RUs x 3 regions = 6M RUs
[Diagram: a DocumentDB collection (a partition set of replica-sets) with primary and secondary replica-sets at 2M RUs each in US-East, US-West and India; global distribution across regions, local distribution across partitions]
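The total throughput formula is simple enough to check in a couple of lines. A minimal sketch; `totalRus` is a hypothetical helper name.

```javascript
// Total RUs = provisioned RUs per region x number of regions.
function totalRus(provisionedRus, regionCount) {
  return provisionedRus * regionCount;
}

// The example from the slide: 2M RUs provisioned, 3 regions.
const total = totalRus(2000000, 3);
console.log(total);
```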
Programmable data consistency
"It's hard to write distributed apps."
Strong consistency, High latency
Eventual consistency, Low latency
Consistency Levels
• PACELC Theorem and the associated tradeoffs
Consistency Levels
• Strong, Bounded Staleness, Session, and Eventual
LEFT TO RIGHT: Weaker Consistency, Better Read scalability, Lower write latency
• Bounded Staleness: Consistent Prefix reads. Reads lag behind writes by K prefixes or T interval
• Session: Monotonic reads, writes and Read your writes guarantee
[Diagram: for each level, clients reading from a primary (P) and secondaries (S)]
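One way to reason about the four levels is to ask when a given replica is allowed to answer a read. The sketch below is a toy model, not the service's replication protocol: it assumes every write gets a sequence number, that a replica is described by the highest sequence number it has applied, and that a session remembers the sequence number of its own last write. All names (`canServeRead`, `maxLagK`, `sessionLsn`) are hypothetical.

```javascript
// Toy model: may a replica at replicaLsn serve a read under the given level?
function canServeRead(level, replicaLsn, primaryLsn, opts = {}) {
  switch (level) {
    case "Strong":
      // Must reflect every committed write.
      return replicaLsn === primaryLsn;
    case "BoundedStaleness":
      // May lag behind by at most K writes.
      return primaryLsn - replicaLsn <= opts.maxLagK;
    case "Session":
      // Must include this session's own last write (read your writes).
      return replicaLsn >= opts.sessionLsn;
    case "Eventual":
      // Any replica may answer.
      return true;
    default:
      throw new Error("unknown level: " + level);
  }
}

// A replica that has applied 7 of the primary's 10 writes.
const opts = { maxLagK: 5, sessionLsn: 8 };
const verdicts = {
  strong: canServeRead("Strong", 7, 10, opts),
  bounded: canServeRead("BoundedStaleness", 7, 10, opts),
  session: canServeRead("Session", 7, 10, opts),
  eventual: canServeRead("Eventual", 7, 10, opts),
};
console.log(verdicts);
```

In this model the weaker levels accept more replicas for the same read, which is the read scalability gain the slide refers to.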
General Tips
General Tips: Low latency
• Create a singleton instance of DocumentClient for an app server instance
• Use Direct Connectivity and TCP for .NET SDK
• Use Direct Connectivity and HTTPS for Java SDK
• Warm up DocumentClient cache by calling DocumentClient.OpenAsync() upon start of your app server
// Direct connectivity over TCP (.NET SDK)
ConnectionPolicy policy = new ConnectionPolicy
{
    ConnectionProtocol = Protocol.Tcp,
    ConnectionMode = ConnectionMode.Direct
};
return new DocumentClient(endpoint, key, policy);
// Warm up the shared client once, when the app server starts
async Task ServerStart()
{
    ...
    await _client.OpenAsync();
}
[Diagram: three server instances, each holding a single DocumentClient _client]
General Tips: Throughput
• Use relaxed consistency levels for efficient utilization of provisioned throughput
[Chart: Throughput, axis from 0 to 100]
• Subscribe for changes via change feed APIs instead of polling and reading the entire feed
GET https://.../docs
x-ms-max-item-count: 1
If-None-Match: "28535"
A-IM: Incremental feed
x-ms-documentdb-partitionkeyrangeid: 16
...
• If you intend to use DocumentDB as a KV store, you can tell the system to drop the secondary indexes. This will also save storage.
POST .../colls
{
...
indexingPolicy : {
IndexingMode : "None"
…
}
}
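Reading a change feed incrementally boils down to remembering a continuation token and asking only for what came after it. The sketch below is an in-memory analogy, not the REST API: `ChangeLog` and `readSince` are hypothetical names, and the numeric token only plays the role that the If-None-Match value plays in the request shown on the slide.

```javascript
// Toy append-only log with continuation tokens.
class ChangeLog {
  constructor() {
    this.entries = [];
  }

  append(doc) {
    this.entries.push(doc);
  }

  // Returns the changes after the given token, plus the token to use next time.
  readSince(token = 0) {
    return { changes: this.entries.slice(token), next: this.entries.length };
  }
}

const log = new ChangeLog();
log.append({ id: "a" });
log.append({ id: "b" });

const first = log.readSince();            // initial read: everything so far
log.append({ id: "c" });
const second = log.readSince(first.next); // incremental read: only what is new

console.log(first.changes.length, second.changes.map((d) => d.id));
```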
Roadmap 2017
Change Feed
Distributed replication log
Keep your cache or data warehouse up to date
Perform notifications on changes
Perform streaming aggregation
Lambda pattern with significantly lower TCO
Single scalable database solution for both ingestion and query
Aggregates at global scale
Low latency aggregates at any scale
Supported via Updatable, column store index at global scale
Deeply integrated with latch free, log structured database engine
Preview now available
Spark connector for DocumentDB
RDD and Dataset-based connectors available
Native integration with Spark SQL
Direct mapping to DocumentDB partitions
Natively leverage DocumentDB index
Predicate pushdown
Public release in H1 CY2017
Pricing and scaling improvements
Enable bursting up to 10x for spiky workloads
Reduced starting price for partitioned collections (4x)
Create up to 10 TB collections without support ticket
Deprecating S1 – S3 offers
Bursting available H1 2017
Graph APIs
SQL and Gremlin query
Independently scalable graph engine using TinkerPop
Optimized query engine for relationship traversals
Schema freedom for ad-hoc expansion of attributes on nodes & edges
Limitless scale to support massive graphs
Same NoSQL stack
Microsoft Ignite