Distributed Systems: Consistency Models & Vector Clocks

Goals of Large-scale Distributed Systems: Scalability, Availability and Fault Tolerance (Reliability or Robustness)
• Partitioning and replication are two key techniques for achieving these goals
  – they allow concurrency & parallelism, but often need to maintain shared (distributed) state!
• Consistency becomes an issue!
  – may need some form of synchronization (of state) or a notion of "global" time for ordering events

CSci8211: Consistency Models & Vector Clocks

Availability, Reliability, Consistency & Performance Trade-offs
• Eric Brewer's CAP Theorem: in a large-scale distributed system (where latency & networking issues become critical), we cannot have all three of the following: consistency, availability and tolerance of network partitions!
  – unlike classical single-machine or small-cluster systems such as classical relational database systems or networked file systems
• "Real" (operational) large-scale systems sacrifice at least one of these properties: often consistency
  – e.g., DNS, (nearly all) today's web services
• BASE: Basically Available, Soft State & Eventual Consistency
• What is really at stake: latency, failures & performance
  – large latency makes ensuring strong consistency expensive
  – availability vs. consistency: yield (throughput) & harvest ("goodput")

Classical Consistency Models

Consistency models not using synchronization operations:
• Strict: absolute time ordering of all shared accesses matters
• Linearizability: all processes must see all shared accesses in the same order; accesses are furthermore ordered according to a (non-unique) global timestamp
• Sequential: all processes see all shared accesses in the same order; accesses are not ordered in time
• Causal: all processes see causally-related shared accesses in the same order
• FIFO: all processes see writes from each other in the order they were used.
Writes from different processes may not always be seen in that order.

Consistency models with synchronization operations:
• Weak: shared data can be counted on to be consistent only after a synchronization is done
• Release: shared data are made consistent when a critical region is exited
• Entry: shared data pertaining to a critical region are made consistent when a critical region is entered

What is a Consistency Model?
• A consistency model is a contract between the software and the memory
  – it states that the memory will work correctly, but only if the software obeys certain rules
• The issue is how to state rules that are not too restrictive, but allow fast execution in most common cases
• These models represent a more general view of sharing data than what we have seen so far!
• Conventions we will use:
  – W(x)a means "a write to x with value a"
  – R(y)b means "a read from y that returned value b"
  – "processor" is used generically

Strict Consistency
• Strict consistency is the strictest model
  – a read returns the most recently written value (changes are instantaneous)
  – not well-defined unless the execution of commands is serialized centrally
  – otherwise the effects of a slow write may not have propagated to the site of the read
  – this is what uniprocessors support: a = 1; a = 2; print(a); always produces "2"
• To exercise our notation:
      P1: W(x)1
      P2:        R(x)0 R(x)1
  – is this strictly consistent?

Sequential Consistency
• Sequential consistency (serializability): the results are the same as if operations from different processors were interleaved, but operations of a single processor appear in the order specified by its program
• Example of a sequentially consistent execution:
      P1: W(x)1
      P2:        R(x)0 R(x)1
• Sequential consistency is inefficient: we want to weaken the model further

Causal Consistency
• Causal consistency: writes that are potentially causally related must be seen by all processors in the same order.
Concurrent writes may be seen in a different order on different machines.
  – causally related writes: one write comes after a read that returned the value of the other write
• Examples (which one is causally consistent, if any?)

      P1: W(x)1                    W(x)3
      P2:        R(x)1 W(x)2
      P3:        R(x)1                    R(x)3 R(x)2
      P4:        R(x)1             R(x)2        R(x)3

      P1: W(x)1
      P2:        R(x)1 W(x)2
      P3:              R(x)2 R(x)1
      P4:              R(x)1 R(x)2

• Implementation needs to keep track of dependencies

Pipelined RAM (PRAM) or FIFO Consistency
• PRAM consistency is even more relaxed than causal consistency: writes from the same processor are received in order, but writes from distinct processors may be received in different orders by different processors

      P1: W(x)1
      P2:        R(x)1 W(x)2
      P3:              R(x)2 R(x)1
      P4:              R(x)1 R(x)2

• Slight refinement:
  – Processor consistency: PRAM consistency plus writes to the same memory location are viewed everywhere in the same order

Weak Consistency
• Weak consistency uses synchronization variables to propagate writes to and from a machine at appropriate points:
  – accesses to synchronization variables are sequentially consistent
  – no access to a synchronization variable is allowed until all previous writes have completed on all processors
  – no data access is allowed until all previous accesses to synchronization variables (by the same processor) have been performed
• That is:
  – accessing a synchronization variable "flushes the pipeline"
  – at a synchronization point, all processors have consistent versions of data

Release Consistency
• Release consistency is like weak consistency, but there are two operations, "lock" and "unlock", for synchronization
  – ("acquire"/"release" are the conventional names)
  – doing a "lock" means that writes on other processors to protected variables will be known
  – doing an "unlock" means that writes to protected variables are exported
  – and will be seen by other machines when they do a "lock" (lazy release consistency) or immediately (eager release consistency)

Eventual Consistency
• A form of "weak consistency" – but no
explicit notion of synchronization variables
  – also known as "optimistic replication"
• All replicas eventually converge – or keep making progress toward convergence – a "liveness" guarantee
• How to ensure eventual consistency:
  – apply "anti-entropy" measures, e.g., a gossip protocol
  – apply conflict resolution or "reconciliation", e.g., last write wins
• Conflict resolution is often left to applications!
  – e.g., GFS – not application-transparent, but applications know best!
• Strong eventual consistency
  – adds a "safety" guarantee: i) any two nodes that have received the same (unordered) set of updates will be in the same state; ii) the system is monotonic: the application will never suffer rollbacks
  – achieved using so-called "conflict-free" replicated data types & a gossip protocol

Time and Clocks
• We need clocks to keep "time" so as to order events and to synchronize
• Physical clocks
  – e.g., UT1, TAI or UTC
  – physical clocks drift over time – synchronize via, e.g., NTP
  – can be kept closely synchronized, but never perfectly
• Logical clocks
  – encode the causality relationship
  – Lamport clocks provide only a one-way encoding
  – vector clocks provide exact causality information

Logical Time or "Happens Before"
• Capture just the "happens before" relationship between events
  – corresponds roughly to causality
• Local time at each process is well-defined
  – definition (→i): we say e →i e’ if e happens before e’ at process i
• Global time (→) – or rather a global partial ordering: we define e → e’ using the following rules:
  – local ordering: e → e’ if e →i e’ for any process i
  – messages: send(m) → receive(m) for any message m
  – transitivity: e → e’’ if e → e’ and e’ → e’’
• We say e "happens before" e’ if e → e’

Concurrency & Lamport Logical Clocks
• Definition of concurrency:
  – we say e is concurrent with e’ (written e || e’) if neither e → e’ nor e’ → e
• A Lamport clock L orders events consistently with the logical "happens before" ordering
  – if e → e’, then L(e) < L(e’)
• But not the converse
  – L(e) < L(e’) does
not imply e → e’
• Similar rules for concurrency:
  – L(e) = L(e’) implies e || e’ (for distinct e, e’)
  – e || e’ does not imply L(e) = L(e’)
  – i.e., Lamport clocks arbitrarily order some concurrent events

Lamport's Algorithm
• Each process i keeps a local clock, Li
• Three rules:
  1. at process i, increment Li before each event
  2. to send a message m at process i, apply rule 1 and then include the current local time in the message: i.e., send(m, Li)
  3. to receive a message (m, t) at process j, set Lj = max(Lj, t) and then apply rule 1 before time-stamping the receive event
• The global time L(e) of an event e is just its local time: for an event e at process i, L(e) = Li(e)
• Total order of Lamport clocks?
  – many systems require a total ordering of events, not a partial ordering
  – use Lamport's algorithm, but break ties using the process ID: L(e) = M * Li(e) + i, where M = maximum number of processes

Vector Clocks
• Goal: want an ordering that matches causality
  – V(e) < V(e’) if and only if e → e’
• Method: label each event with a vector V(e) = [c1, c2, …, cn]
  – ci = # of events in process i that causally precede e; n = # of processes
• Algorithm:
  – initialization: every process starts with V = [0, …, 0]
  – for an event on process i, increment its own ci
  – label each message sent with the local vector
  – when process j receives a message with vector [d1, d2, …, dn]:
    • set each local entry k to max(ck, dk)
    • increment the value of cj
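The Lamport and vector clock rules above translate almost directly into code. The following is a minimal Python sketch under those rules; the class and method names are illustrative assumptions, not from the lecture:

```python
# Sketch of Lamport clocks and vector clocks, following the rules above.
# Names (LamportClock, VectorClock, etc.) are illustrative, not from the source.

class LamportClock:
    def __init__(self):
        self.time = 0                      # local clock Li

    def event(self):                       # rule 1: increment Li before each event
        self.time += 1
        return self.time

    def send(self):                        # rule 2: tick, then attach the timestamp
        return self.event()

    def receive(self, t):                  # rule 3: Lj = max(Lj, t), then tick
        self.time = max(self.time, t)
        return self.event()


class VectorClock:
    def __init__(self, i, n):
        self.i = i                         # this process's index
        self.v = [0] * n                   # initialization: V = [0, ..., 0]

    def event(self):                       # increment own entry ci
        self.v[self.i] += 1
        return list(self.v)

    def send(self):                        # label the message with the local vector
        return self.event()

    def receive(self, d):                  # entry-wise max, then increment own cj
        self.v = [max(c, dk) for c, dk in zip(self.v, d)]
        return self.event()


def happened_before(a, b):
    # V(a) < V(b): every entry of a is <= the matching entry of b, and a != b
    return all(x <= y for x, y in zip(a, b)) and a != b
```

This makes the one-way encoding concrete: in a run where p0 sends a message and p1 performs an independent local event, `happened_before` on the vectors reports the two events as concurrent, while the scalar Lamport timestamps would still impose some arbitrary order on them.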