Web-scale databases: how today's big web companies make sure

Consistency in a distributed world
Giovanni Chierico | May 2012 | Dubna
My goals
• Quick Review
• Distributed Transactions
• A Different Approach
• CAP Theorem
• Eventually Consistent
• Best Practices
Quick Review
The Good
• Concurrency
• Consistency
• Integrity
The Bad
• Locks
The really ugly
• Failures
Large-scale systems?
Distributed System
Still the same problems
• Concurrency
• Consistency
• Integrity
But more things that can fail
Two Generals’ Problem
[Diagram: generals G1 and G2, with the enemy E between them]
Rules
1. Single General attack ⇒ Defeat
2. Double attack ⇒ Victory
3. Unreliable communication
Message exchange
1. G1: Let’s attack at 18:45
2. G2: confirmation
3. G1: confirmation
4. …
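The exchange above never terminates because whoever sends the last message cannot tell whether it arrived. The following is only a minimal sketch of that argument in Python. The function names `run_exchange` and `sender_view_is_ambiguous` are hypothetical, and the model assumes messages strictly alternate between G1 and G2 and that the exchange stops as soon as one message is lost.

```python
def run_exchange(total_messages, lost_index=None):
    """Alternate messages G1 -> G2 -> G1 -> ...

    Message number `lost_index` (0-based) is lost and the exchange stops there.
    Returns the message numbers each general has received.
    """
    received = {"G1": [], "G2": []}
    for i in range(total_messages):
        receiver = "G2" if i % 2 == 0 else "G1"
        if i == lost_index:
            break  # messenger captured: nothing more is delivered
        received[receiver].append(i)
    return received


def sender_view_is_ambiguous(total_messages):
    """True if the sender of the last message sees the same thing
    whether that last message was delivered or lost."""
    last = total_messages - 1
    sender = "G1" if last % 2 == 0 else "G2"
    delivered = run_exchange(total_messages)
    lost = run_exchange(total_messages, lost_index=last)
    return delivered[sender] == lost[sender]


# However many confirmations are planned, the last sender stays uncertain.
print(all(sender_view_is_ambiguous(n) for n in range(1, 20)))
```

Adding one more confirmation only moves the uncertainty to the other general, which is why no fixed number of messages solves the problem in this model.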
Two-Phase Commit (2PC)
Goal: all commit or all rollback
1. Prepare Phase
  • Initiator asks other nodes to promise to commit or rollback, even if there’s a failure
  • If any node cannot prepare ⇒ rollback
2. Commit Phase
  • Initiator commits and asks others to do the same
Two-Phase Commit (2PC)
Prepare Phase: each node promises to commit or roll back
1. Record the operation in the “REDO” logs so that it can either commit or roll back regardless of failures
2. Place a “read lock” on the modified tables
3. Flag the transaction as “in-doubt”
Non-failure case
1. The coordinator asks every node to either commit or roll back
2. The locks are removed
Failure
• The transaction remains “in-doubt” and the locked resources are inaccessible
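The two phases can be sketched in a few lines of Python. This is an illustrative, in-memory model only: the `Participant` class, its `can_prepare` flag and the `two_phase_commit` function are hypothetical names, there is no real logging or locking, and coordinator failure (the case that leaves transactions “in-doubt”) is not modelled.

```python
class Participant:
    """A node taking part in the transaction (illustrative only)."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # assumption: stands in for any local failure
        self.state = "idle"

    def prepare(self):
        # A real node would write its redo log and take locks here.
        if self.can_prepare:
            self.state = "in-doubt"
            return True
        return False

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(participants):
    """Return True if everyone committed, False if everyone rolled back."""
    # Phase 1: ask every node to promise it can commit.
    for p in participants:
        if not p.prepare():
            # Any refusal aborts the whole transaction.
            for q in participants:
                q.rollback()
            return False
    # Phase 2: every node promised, so tell them all to commit.
    for p in participants:
        p.commit()
    return True


nodes = [Participant("orders"), Participant("billing", can_prepare=False)]
print(two_phase_commit(nodes), [n.state for n in nodes])
```

If the coordinator disappeared between the two loops, every prepared participant would stay in the "in-doubt" state, holding its locks until someone resolved the transaction.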
Two-Phase Commit (2PC)
• Pros
  • ACID
  • Transparent (abstraction doesn’t “leak”)
• Cons
  • When it works
    • Expensive
    • Read Locks
  • When it fails
    • Leaves Locks
“Better” Solutions Exist
• More Complex
• More expensive
• Consensus (Paxos)
• Google’s “Chubby”
A Different Approach
Your Coffee Shop Doesn’t Use Two-Phase Commit
Employees: Baristas (B) and Cashiers (C)
Process
1. You place your order with C
2. C writes your name on a cup and puts it in a queue
3. You pay C
4. B eventually prepares coffee and calls your name
5. You pick up your coffee
Asynchronous
• Pros: less locking ⇒ more efficient use of resources
• Cons: A whole set of different problems …
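The coffee shop process above can be sketched as a simple queue that decouples the cashier from the barista. This is only an illustration under assumptions: the `CoffeeShop` class and its method names are hypothetical, and everything runs in a single thread, so "eventually" just means "whenever `make_next` is called".

```python
from collections import deque


class CoffeeShop:
    """Cashier and barista communicate only through a queue of cups."""

    def __init__(self):
        self.queue = deque()  # cups waiting for the barista
        self.paid = {}        # customer name -> amount paid

    def take_order(self, name, drink):
        # Step 2: the cashier writes the name on the cup and queues it.
        self.queue.append({"name": name, "drink": drink})

    def take_payment(self, name, amount):
        # Step 3: payment happens independently of the preparation.
        self.paid[name] = amount

    def make_next(self):
        # Step 4: the barista picks up the next cup whenever they are free.
        cup = self.queue.popleft()
        return f"{cup['drink']} for {cup['name']}"


shop = CoffeeShop()
shop.take_order("Anna", "latte")
shop.take_order("Boris", "espresso")  # the cashier does not wait for the barista
shop.take_payment("Anna", 3)
print(shop.make_next())
```

The cashier never blocks on the barista, which is where the efficiency comes from; the price is that ordering, paying and preparing can now fail independently of each other.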
Asynchronous Problems
Correlation: orders may be fulfilled in a different order than they were queued ⇒ correlation identifier (e.g. the name on the cup)
Exception Handling: cannot be easily “abstracted”
• Write-off: coffee made but you can’t pay
• Retry: coffee was not good (idempotent receivers)
• Compensating action: coffee machine breaks
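Retries are only safe if handling the same message twice has the same effect as handling it once. A minimal sketch of such an idempotent receiver follows, assuming each order carries a unique identifier. The class name `IdempotentBarista` and the `message_id` parameter are hypothetical.

```python
class IdempotentBarista:
    """Remembers which order ids were already handled, so a resent
    message does not produce a second coffee."""

    def __init__(self):
        self.results = {}      # message_id -> result of the first handling
        self.coffees_made = 0

    def handle(self, message_id, drink):
        if message_id in self.results:
            # Duplicate delivery: return the earlier result, do no new work.
            return self.results[message_id]
        self.coffees_made += 1
        result = f"{drink} ready"
        self.results[message_id] = result
        return result


barista = IdempotentBarista()
barista.handle("order-1", "latte")
barista.handle("order-1", "latte")  # sender retried after a lost acknowledgement
print(barista.coffees_made)
```

A deliberate remake (the coffee was not good) would be sent under a new identifier in this model, so it is not mistaken for a duplicate.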
Optimistic “Happy Day” Approach
Pessimistic Approach: Escrow Company
• Prepare: debit money
• Rollback: credit money back
• Commit: do nothing
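The escrow-style pessimistic approach can be sketched as follows. This is an assumption-laden illustration: `EscrowAccount`, `tx_id` and the `held` bookkeeping are hypothetical names, and the only thing `commit` does beyond "nothing" is forget the hold so it can no longer be rolled back.

```python
class EscrowAccount:
    """Money is taken up front (prepare) and only returned on rollback."""

    def __init__(self, balance):
        self.balance = balance
        self.held = {}  # tx_id -> amount debited during prepare

    def prepare(self, tx_id, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount       # Prepare: debit money
        self.held[tx_id] = amount

    def rollback(self, tx_id):
        self.balance += self.held.pop(tx_id)  # Rollback: credit money back

    def commit(self, tx_id):
        self.held.pop(tx_id)  # Commit: balance is untouched, the debit stands


account = EscrowAccount(10)
account.prepare("t1", 4)
account.rollback("t1")
account.prepare("t2", 4)
account.commit("t2")
print(account.balance)
```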
Compensating Action
• Not as “simple” as in the ACID world
• Some things cannot be compensated for
• Shifts the burden from the infrastructure (declarative) to the client (ad-hoc solutions)
• Less “economy of scale”
• Monitoring, Control, ...
Conversation Pattern
[Diagram: half-sync, half-async conversation with steps labelled Async, Sync, Sync]
CAP Theorem
• Consistency (distributed transactions)
• Availability
• Partition-tolerance
• All 3 are desirable
• Can have any 2 but not all 3
ACID vs BASE
BASE: Basically Available, Soft state, Eventually consistent
Eventually Consistent
Partitions are a given in larger systems
• Relax availability
  • Might refuse to write
• Relax consistency
  • Might accept a write that is not reflected in subsequent reads
• Strong: after an update, any read returns the updated value
• Weak: reads may return the old value during an inconsistency window
• Eventual: if there are no further updates, all reads will “eventually” return the updated value. Inconsistency window = f(delays, load, #replicas, …)
http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
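The inconsistency window can be made concrete with a toy store that replicates lazily. This is a sketch under stated assumptions, not how any particular database works: the `ReplicatedStore` class is hypothetical, there is exactly one primary and one replica, and replication happens only when `sync` is called explicitly.

```python
class ReplicatedStore:
    """One primary and one replica; updates reach the replica only on sync()."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # accepted by the primary, not yet replicated

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, key):
        return self.replica.get(key)

    def sync(self):
        # Closing the inconsistency window: apply all pending updates in order.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


store = ReplicatedStore()
store.write("x", 1)
stale = store.read_from_replica("x")   # inside the window: update not visible yet
store.sync()
fresh = store.read_from_replica("x")   # no further updates: replica has caught up
print(stale, fresh)
```

In a real system the call to `sync` is replaced by background replication, so the length of the window depends on the delays, load and number of replicas.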
Eventually Consistent
• Causal consistency
  • A updates, tells B, B reads the updated value
• Read-your-writes consistency
  • A special case of causal consistency
• Session consistency
  • As long as the session exists, read-your-writes consistency is guaranteed
• Monotonic read consistency
  • Once you see a value you never see a previous version
• Monotonic write consistency
  • Writes by the same process are serialised
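One common way to obtain read-your-writes and monotonic reads within a session is for the client to remember the highest version it has seen and to reject replicas that are behind it. The sketch below illustrates that idea only; `Replica`, `SessionClient`, `StaleReplica` and the integer versions are all hypothetical, and it assumes a single writing client so that versions can simply be counted up.

```python
class StaleReplica(Exception):
    """Raised when a replica is older than what this session already saw."""


class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        if version > self.version:  # ignore out-of-date updates
            self.version, self.value = version, value


class SessionClient:
    def __init__(self, replicas):
        self.replicas = replicas
        self.min_version = 0  # highest version written or read in this session

    def write(self, index, value):
        self.min_version += 1
        self.replicas[index].apply(self.min_version, value)

    def read(self, index):
        replica = self.replicas[index]
        if replica.version < self.min_version:
            raise StaleReplica(f"replica {index} is behind this session")
        self.min_version = replica.version
        return replica.value


replicas = [Replica(), Replica()]
client = SessionClient(replicas)
client.write(0, "a")
print(client.read(0))  # the session sees its own write
```

Reading from `replicas[1]` before it has received the update raises `StaleReplica`, which is the client's cue to try another replica or wait.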
Amazon’s Dynamo “has brought all of these properties under explicit
control of the application architecture”, “allow the application service
owner […] to make the trade-offs between consistency, durability,
availability, and performance at a certain cost point”
Scalability Best Practices
eBay
• Avoid Distributed Transactions
  • “we allow absolutely no client-side or distributed transactions of any kind - no two-phase commit.”
• Decouple Functions Asynchronously
  • Messages and queues
• Move Processing To Asynchronous Flows
  • Execution latency vs user latency
  • Scale for peak vs scale for average
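The "scale for peak vs scale for average" point can be illustrated with a small backlog calculation: when work goes through a queue, workers sized near the average load absorb a burst by letting the queue grow for a while. The function `simulate_backlog` and the numbers below are made up for illustration and do not come from eBay.

```python
def simulate_backlog(arrivals, capacity_per_tick):
    """Return (largest backlog seen, backlog left at the end)."""
    backlog = 0
    max_backlog = 0
    for arriving in arrivals:
        backlog += arriving
        backlog -= min(backlog, capacity_per_tick)  # workers drain the queue
        max_backlog = max(max_backlog, backlog)
    return max_backlog, backlog


# One burst of 50 requests among ticks of 5 (illustrative numbers).
arrivals = [5, 50, 5, 5, 5, 5, 5, 5]

print(simulate_backlog(arrivals, 50))  # sized for the peak: never any backlog
print(simulate_backlog(arrivals, 15))  # sized near the average: backlog drains later
```

With the smaller capacity the user is still answered immediately (user latency), while the queued work completes a few ticks later (execution latency).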
Lessons Learned
• Large distributed systems often have different needs and requirements
• To maximise “business value” we might need to relax some constraints
• Problems are often “wicked” and the best solution depends on a lot of details and dependencies
Q&A
Thank you
Globe of Science and Innovation, CERN
Giovanni Chierico | May 2012 | Дубна
19
Giovanni Chierico | May 2012 | Дубна
20