Global Predicate Detection and Event Ordering

Our Problem
• To compute predicates over the state of a distributed application

Model
• Message passing
• No failures
• Two possible timing assumptions: Synchronous System or Asynchronous System

Asynchronous System
• No upper bound on message delivery time
• No bound on relative process speeds
• No centralized clock

Why asynchronous systems?
• Weakest possible assumptions
• Weak assumptions ⇒ fewer vulnerabilities
• Asynchronous subsumes slow
• The "interesting" model with respect to failures

Client-Server
• Processes exchange messages using Remote Procedure Call (RPC)
• A client requests a service by sending the server a message. The client blocks while waiting for a response.
• The server computes the response (possibly asking other servers) and returns it to the client.
• If servers, in turn, block on requests to one another, the result can be: Deadlock!

Goal
• Design a protocol by which a processor can determine whether a global predicate (say, deadlock) holds

Wait-For Graphs (WFG)
• Draw an arrow from pi to pj if pj has received a request from pi but has not responded yet
• Cycle in WFG ⇒ deadlock
• Deadlock ⇒ ◊ (eventually) a cycle in WFG

The protocol
• p0 sends a message to p1 … p3
• On receipt of p0's message, pi replies with its state and wait-for info

Ghost Deadlock!
• Because p0's queries reach the processes at different times, the collected states may combine into a WFG cycle that never existed at any single instant — a "ghost" deadlock.

We have a problem...
• Asynchronous system: no centralized clock, etc. etc.
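As a concrete aside (not part of the lecture), the "cycle in WFG ⇒ deadlock" test is a standard depth-first search; the dictionary encoding of the graph below is a hypothetical choice:

```python
# Sketch: detect deadlock as a cycle in a wait-for graph,
# represented as {process: set of processes it waits on}.
def has_cycle(wfg):
    """Return True iff the wait-for graph contains a cycle (deadlock)."""
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited / on DFS stack / done
    color = {p: WHITE for p in wfg}

    def dfs(p):
        color[p] = GRAY
        for q in wfg.get(p, ()):
            if color.get(q, WHITE) == GRAY:            # back edge -> cycle
                return True
            if color.get(q, WHITE) == WHITE and dfs(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and dfs(p) for p in wfg)
```

Run on the states p0 collects, this is exactly the predicate the protocol wants to evaluate — the hard part, as the ghost-deadlock example shows, is collecting states that belong to one meaningful instant.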
Synchrony is useful to coordinate actions and to order events.

Events and Histories
• Processes execute sequences of events
• Events can be of 3 types: local, send, and receive
• e_p^i is the i-th event of process p
• The local history h_p of process p is the sequence of events executed by p
• h_p^k: the prefix that contains the first k events; h_p^0: the initial, empty sequence
• The history H is the set h_{p0} ∪ h_{p1} ∪ … ∪ h_{pn−1}
• NOTE: in H, local histories are interpreted as sets, rather than sequences, of events

Ordering events
• Observation 1: events in a local history are totally ordered
• Observation 2: for every message m, send(m) precedes receive(m)

Happened-before (Lamport [1978])
A binary relation → defined over events:
1. if e_i^k, e_i^l ∈ h_i and k < l, then e_i^k → e_i^l
2. if e_i = send(m) and e_j = receive(m), then e_i → e_j
3. if e → e′ and e′ → e″, then e → e″

Space-Time diagrams
• A graphic representation of a distributed execution
• Together, H and → impose a partial order on the events of the computation

Runs and Consistent Runs
• A run is a total ordering of the events in H that is consistent with the local histories of the processors. Ex: h1, h2, …, hn is a run.
• A run is consistent if the total order imposed by the run is an extension of the partial order induced by →
• A single distributed computation may correspond to several consistent runs!
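Consistency of a run can be checked mechanically: the run's total order must extend →. A small sketch, with hypothetical event names; the happened-before relation is given as a set of ordered pairs:

```python
# Sketch: a run (total order of events) is consistent iff it extends the
# happened-before partial order, i.e. every pair e -> e' appears in order.
def is_consistent_run(run, happens_before):
    """run: list of events; happens_before: set of (e, e2) pairs with e -> e2."""
    pos = {e: i for i, e in enumerate(run)}
    return all(pos[a] < pos[b] for (a, b) in happens_before)
```

For example, if process p executes a1 then a2, process q executes b1 then b2, and a1 is the send of a message received at b2, then interleaving the histories one way is consistent while executing all of q before all of p is not.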
Cuts
• A cut C is a subset of the global history H: C = h_1^{c1} ∪ h_2^{c2} ∪ … ∪ h_n^{cn}
• The frontier of C is the set of events {e_1^{c1}, e_2^{c2}, …, e_n^{cn}} — the last event of each process in C

Global states and cuts
• The global state of a distributed computation is an n-tuple of local states Σ = (σ_1, …, σ_n)
• To each cut (c1, …, cn) corresponds a global state (σ_1^{c1}, …, σ_n^{cn})

Consistent cuts and consistent global states
• A cut C is consistent if, for all events e and e′: (e ∈ C) ∧ (e′ → e) ⇒ e′ ∈ C
• A consistent global state is one corresponding to a consistent cut
• Example of an inconsistent global state: the cut contains the event corresponding to the receipt of the last message by p3, but not the corresponding send event

Our task
• Develop a protocol by which a processor can build a consistent global state
• Informally, we want to be able to take a snapshot of the computation
• Not obvious in an asynchronous system...

Our approach
• Develop a simple synchronous protocol
• Refine the protocol as we relax assumptions
• Record: processor states and channel states
• Assumptions: FIFO channels; each message m is timestamped with T(send(m))

Snapshot I
1. p0 selects tss
2. p0 sends "take a snapshot at tss" to all processes
3. when the clock of pi reads tss, then pi:
 a. records its local state σ_i
 b. sends an empty message along each of its outgoing channels
 c. starts recording messages received on each of its incoming channels
 d. stops recording a channel when it receives the first message with timestamp greater than or equal to tss
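Cut consistency reduces to a simple check: no message may be received inside the cut while its send lies outside. A sketch, using a hypothetical encoding — the cut as a prefix length per process, and each message as a pair of (process, 1-based event index) coordinates for its send and receive:

```python
# Sketch: checking cut consistency on the message events only (the
# local-order part of happened-before is satisfied by any prefix cut).
def is_consistent_cut(cut, messages):
    """cut: dict process -> number of events included (prefix length).
    messages: list of ((pi, k_send), (pj, k_recv)) coordinate pairs."""
    for (pi, ks), (pj, kr) in messages:
        if kr <= cut[pj] and ks > cut[pi]:   # receive inside, send outside
            return False
    return True
```

A cut that includes a receive but excludes the matching send — the p3 example above — fails the test.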
(The empty messages in step b guarantee that every channel eventually carries a message timestamped ≥ tss, so channel recording terminates.)

Correctness
Theorem: Snapshot I produces a consistent cut.
Proof sketch (the slide gives a numbered derivation whose steps are justified by the definitions, the assumptions, and a property of real time): suppose e′ → e and e is in the cut, i.e. e executes before tss on its process; we must show e′ is in the cut too. The only interesting case is e′ = send(m), e = receive(m). Since e is in the cut, m is received before the local clock reads tss, so TS(m) < tss; and since real-time clocks satisfy the clock condition, send(m) also occurs before tss and is therefore in the cut.

Clock Condition
• e → e′ ⇒ TC(e) < TC(e′)
• Real-time clocks satisfy it by a property of real time
• Can the Clock Condition be implemented some other way?

Lamport Clocks
• Each process maintains a local variable LC
• LC(e) = value of LC for event e
• Increment rules:
 – if e is an internal or send event: LC := LC + 1
 – if e = receive(m): LC := max(LC, TS(m)) + 1
• Timestamp m with TS(m) = LC(send(m))

Space-Time Diagrams and Logical Clocks
(The slide shows an execution whose events are labeled with Lamport clock values 1 through 9.)

A subtle problem
• "when LC = t do S" doesn't make sense for Lamport clocks!
• There is no guarantee that LC will ever be exactly t
• Fixes:
 – if e is internal/send and LC = t − 2: execute e and then S
 – if e = receive(m) ∧ (TS(m) ≥ t) ∧ (LC ≤ t − 1): put the message back in the channel; re-enable e; set LC := t − 1; execute S
• Either way, S is executed after every event timestamped below t.

An obvious problem
• There is no tss!
• Fix: choose Ω, a value so large that it cannot be reached by applying the update rules of logical clocks
• But doing so assumes an upper bound on message delivery time and an upper bound on relative process speeds — better to relax it

Snapshot II
• p0 selects Ω
• p0 sends "take a snapshot at Ω" to all processes; it waits for all of them to reply and then sets its logical clock to Ω
• when the clock of pi reads Ω, then pi:
 – records its local state σ_i
 – sends an empty message along its outgoing channels
 – starts recording messages received on each incoming channel
 – stops recording a channel when it receives the first message with timestamp greater than or equal to Ω

Relaxing synchrony
• On receiving an empty message timestamped Ω, a process's clock jumps past Ω and it takes its snapshot — between the initial request and that moment, the process does nothing for the protocol!
• So use the empty message itself to announce the snapshot.
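The Lamport-clock increment rules above can be sketched directly:

```python
# Sketch of the Lamport-clock update rules from the slides.
class LamportClock:
    def __init__(self):
        self.lc = 0

    def local_event(self):          # internal or send: LC := LC + 1
        self.lc += 1
        return self.lc

    def send(self):                 # timestamp m with TS(m) = LC(send(m))
        return self.local_event()

    def receive(self, ts):          # receive(m): LC := max(LC, TS(m)) + 1
        self.lc = max(self.lc, ts) + 1
        return self.lc
```

Along every message the receiver's clock ends up strictly above the sender's timestamp, so the clock condition e → e′ ⇒ LC(e) < LC(e′) holds by construction.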
Snapshot III
• p0 sends itself "take a snapshot"
• when pi receives "take a snapshot" for the first time, say from pj, it:
 – records its local state σ_i
 – sends "take a snapshot" along its outgoing channels
 – sets the state of the channel from pj to empty
 – starts recording messages received over each of its other incoming channels
• when pi receives "take a snapshot" beyond the first time, from pk: pi stops recording the channel from pk
• when pi has received "take a snapshot" on all channels, it sends the collected state to p0 and stops

Snapshots: a perspective
• The global state Σ^s saved by the snapshot protocol is a consistent global state
• But did it ever occur during the computation?
 – a distributed computation provides only a partial order of events
 – many total orders (runs) are compatible with that partial order
 – all we know is that Σ^s could have occurred
• We are evaluating predicates on states that may have never occurred!
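A minimal two-process simulation of the Snapshot III marker scheme above (a sketch: the process names, local states, and message payloads are made up). Process p initiates while its message m1 and q's reply m2 are in flight; the snapshot correctly records m2 as the state of the channel q → p:

```python
from collections import deque

MARKER = "take a snapshot"

def snapshot_demo():
    """Two processes p, q; FIFO channels both ways; p initiates the snapshot."""
    chan = {("p", "q"): deque(), ("q", "p"): deque()}  # keyed (sender, receiver)
    state = {"p": 10, "q": 20}            # hypothetical local states
    snap, chan_snap = {}, {}              # recorded states / channel contents
    recording = set()                     # channels currently being recorded

    def receive(proc, frm):
        msg = chan[(frm, proc)].popleft()
        if msg == MARKER:
            if proc not in snap:                      # first marker seen
                snap[proc] = state[proc]              # record local state
                chan_snap[(frm, proc)] = []           # channel from frm: empty
                for other in state:                   # record other incoming
                    if other not in (proc, frm):
                        recording.add((other, proc))
                for dst in state:                     # relay marker outward
                    if dst != proc:
                        chan[(proc, dst)].append(MARKER)
            else:                                     # later marker: stop
                chan_snap[(frm, proc)] = chan_snap.get((frm, proc), [])
                recording.discard((frm, proc))
        elif (frm, proc) in recording:                # in-flight app message
            chan_snap.setdefault((frm, proc), []).append(msg)

    chan[("p", "q")].append("m1")         # p sends an application message
    snap["p"] = state["p"]                # p initiates: records its state,
    recording.add(("q", "p"))             # starts recording its incoming channel,
    chan[("p", "q")].append(MARKER)       # and sends the marker

    chan[("q", "p")].append("m2")         # q's reply is already in flight
    receive("q", "p")                     # q gets m1 (ordinary pre-marker traffic)
    receive("q", "p")                     # q gets the marker: records, relays
    receive("p", "q")                     # p gets m2: goes into the channel record
    receive("p", "q")                     # p gets q's marker: stops recording
    return snap, chan_snap
```

The recorded global state (p = 10, q = 20, channel q → p = [m2]) is consistent even though no instant of the actual execution necessarily looked like it — exactly the point made above.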
An Execution and its Lattice
(The slides step through an execution and the lattice of its consistent global states; each consistent run corresponds to a path through the lattice.)

Reachability
• Σ^{kl} is reachable from Σ^{ij} if there is a path from Σ^{ij} to Σ^{kl} in the lattice

So, why do we care about Σ^s?
• Deadlock is a stable property: once the system is deadlocked, it stays deadlocked
• If a run R of the snapshot protocol starts in Σ^i and terminates in Σ^f, then Σ^s is reachable from Σ^i, and Σ^f is reachable from Σ^s
• Hence: deadlock in Σ^s implies deadlock in Σ^f; no deadlock in Σ^s implies no deadlock in Σ^i

Same problem, different approach
• The monitor process does not query explicitly
• Instead, it passively collects information and uses it to build an observation (reactive architectures, Harel and Pnueli [1985])
• An observation is an ordering of the events of the distributed computation based on the order in which the receiver is notified of the events
Observations: a few observations
• An observation puts no constraint on the order in which the monitor receives notifications — so, by itself, an observation need not correspond to a consistent run

Causal delivery
• Note the distinction between the receive event (the message arrives at the process) and the deliver event (the message is handed to the application)
• FIFO delivery guarantees: send_i(m) → send_i(m′) ⇒ deliver_j(m) → deliver_j(m′)
• Causal delivery generalizes FIFO to multiple senders: send_i(m) → send_j(m′) ⇒ deliver_k(m) → deliver_k(m′)

Causal Delivery in Synchronous Systems
• We use the upper bound δ on message delivery time
• DR1: at time t, p0 delivers all messages it received with timestamp up to t − δ, in increasing timestamp order

Causal Delivery with Lamport Clocks
• DR1.1: deliver all received messages in increasing (logical clock) timestamp order
• But suppose p0 has received a message with timestamp 4 — should p0 deliver it? A message with timestamp 1 may still arrive later.
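Why the "should p0 deliver?" question has no safe answer under DR1.1 — a small sketch (the process states and timestamps are made up, but follow the Lamport rules above):

```python
def lamport_send(lc):
    """Internal/send rule: LC := LC + 1; the message carries TS(m) = LC."""
    lc += 1
    return lc, lc

# p1 has already executed three local events, then notifies the monitor.
lc1, ts_a = lamport_send(3)   # TS = 4

# p2 has communicated with no one, and also notifies the monitor.
lc2, ts_b = lamport_send(0)   # TS = 1

# The two sends are concurrent, yet ts_b < ts_a.  If the monitor delivers
# the TS=4 message on arrival and the TS=1 message arrives later, delivery
# violates timestamp order -- and nothing in the value 4 tells the monitor
# whether a message timestamped in the gap 1..3 is still in transit.
```

This is exactly the missing gap-detection property discussed next.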
A problem: Lamport clocks don't provide gap detection
• Given two events e and e′ and their clock values LC(e) and LC(e′), where LC(e) < LC(e′), determine whether some event e″ exists such that LC(e) < LC(e″) < LC(e′)

Stability
• DR2: deliver all received stable messages in increasing (logical clock) timestamp order
• A message m received by p is stable at p if p will never receive a future message m′ such that TS(m′) < TS(m)

Implementing Stability
• Real-time clocks: wait for δ time units
• Lamport clocks: wait until a message m′ with TS(m′) > LC(e) has been received on each channel (this relies on FIFO channels)
• Or: design better clocks!

Clocks and STRONG clocks
• Lamport clocks implement the clock condition: e → e′ ⇒ LC(e) < LC(e′)
• We want new clocks that implement the strong clock condition: e → e′ ⇔ TC(e) < TC(e′)

Causal Histories
• The causal history of an event e in (H, →) is the set θ(e) = {e′ ∈ H | e′ → e} ∪ {e}

How to build θ(e)
• Each process pi initializes θ := ∅
• if e_i^k is an internal or send event, then θ(e_i^k) = θ(e_i^{k−1}) ∪ {e_i^k}
• if e_i^k is a receive event for message m, then θ(e_i^k) = θ(e_i^{k−1}) ∪ θ(send(m)) ∪ {e_i^k}

Pruning causal histories
• Prune segments of history that are known to all processes (Peterson, Buchholz and Schlichting)
• Or use a more clever way to encode θ(e):

Vector Clocks
• Consider θ_i(e), the projection of θ(e) on pi
• θ_i(e) is a prefix of h_i, so it can be encoded using a single integer k_i: θ_i(e) = h_i^{k_i}
• θ(e) = θ_1(e) ∪ θ_2(e) ∪ … ∪ θ_n(e) can therefore be encoded using n integers
• Represent θ(e) using an n-vector VC such that VC(e)[i] = k_i

Update rules
• if e is an internal or send event at pi: VC[i] := VC[i] + 1
• if e = receive(m) at pi: VC := max(VC, TS(m)) componentwise, then VC[i] := VC[i] + 1
• Message m is timestamped with TS(m) = VC(send(m))

Example (three processes; each event labeled with its vector clock)
p1: [1,0,0] [2,1,0] [3,1,2] [4,1,2] [5,1,2]
p2: [0,1,0] [1,2,3] [4,3,3]
p3: [1,0,1] [1,0,2] [1,0,3] [5,1,4]
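The update rules can be sketched directly; the snippet replays the first few timestamps of the example above (which messages flow where is inferred from the vectors, so treat the replayed exchanges as an assumption):

```python
# Sketch of the vector-clock update rules from the slides.
class VectorClock:
    """Vector clock for process index i out of n processes."""
    def __init__(self, i, n):
        self.i, self.v = i, [0] * n

    def local_event(self):            # internal or send: VC[i] := VC[i] + 1
        self.v[self.i] += 1
        return list(self.v)           # a send returns TS(m) = VC(send(m))

    def receive(self, ts):            # VC := max(VC, TS(m)); VC[i] := VC[i] + 1
        self.v = [max(a, b) for a, b in zip(self.v, ts)]
        self.v[self.i] += 1
        return list(self.v)

p1, p2, p3 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
ts1 = p1.local_event()    # p1 sends:                 [1,0,0]
ts2 = p2.local_event()    # p2 sends:                 [0,1,0]
e21 = p1.receive(ts2)     # p1 receives p2's message: [2,1,0]
e31 = p3.receive(ts1)     # p3 receives p1's message: [1,0,1]
```

Each component VC[j] tracks exactly the prefix length k_j of the causal-history projection, which is what makes the comparisons in the next section work.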
Operational interpretation
• VC(ei)[i] ≡ the number of events executed by pi up to and including ei
• VC(ei)[j], j ≠ i ≡ the number of events executed by pj that happen before event ei of pi

VC properties: event ordering
1. Given two vectors V and V′, less than is defined as:
 V < V′ ≡ (V ≠ V′) ∧ (∀k : 1 ≤ k ≤ n : V[k] ≤ V′[k])
2. Strong Clock Condition: e → e′ ⇔ VC(e) < VC(e′)
3. Simple Strong Clock Condition: given ei of pi and ej of pj, where i ≠ j: ei → ej ⇔ VC(ei)[i] ≤ VC(ej)[i]
4. Concurrency: given ei of pi and ej of pj, where i ≠ j: ei ‖ ej ⇔ (VC(ei)[i] > VC(ej)[i]) ∧ (VC(ej)[j] > VC(ei)[j])

VC properties: consistency
• Pairwise inconsistency: events ei of pi and ej of pj (i ≠ j) are pairwise inconsistent (i.e. can't be on the frontier of the same consistent cut) if and only if (VC(ei)[i] < VC(ej)[i]) ∨ (VC(ej)[j] < VC(ei)[j])
• Consistent cut: a cut defined by (c1, …, cn) is consistent if and only if ∀i, j : VC(e_i^{c_i})[i] ≥ VC(e_j^{c_j})[i]

VC properties: weak gap detection
• Weak gap detection: given ei of pi and ej of pj, if VC(ei)[k] < VC(ej)[k] for some k ≠ j, then there exists an event ek such that ¬(ek → ei) ∧ (ek → ej)
(The slide illustrates this on an execution with events timestamped [1,0,1], [2,0,1], [2,1,1], [2,2,2], [0,0,1], [0,0,2].)

VC properties: strong gap detection
• Strong gap detection: given ei of pi and ej of pj, if VC(ei)[i] < VC(ej)[i], then there exists an event ei′ of pi such that (ei → ei′) ∧ (ei′ → ej)

VCs for Causal Delivery
• Each process increments the local component of its VC only for events that are notified to the monitor
• Each message notifying event e is timestamped with VC(e)
• The monitor p0 keeps all notification messages in a set M
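The ordering, concurrency, and pairwise-inconsistency tests above translate directly into executable predicates (the sample timestamps in the test come from the earlier three-process example):

```python
# Sketch of the vector-clock comparison predicates from the slides.
def vc_less(v, w):
    """V < V'  ==  (V != V') and (forall k: V[k] <= V'[k])."""
    return v != w and all(a <= b for a, b in zip(v, w))

def concurrent(v, w):
    """e || e' iff neither vector clock is less than the other."""
    return not vc_less(v, w) and not vc_less(w, v)

def pairwise_inconsistent(vi, i, vj, j):
    """ei of pi and ej of pj cannot be on the frontier of one consistent cut."""
    return vi[i] < vj[i] or vj[j] < vi[j]
```

By the strong clock condition, `vc_less` decides happened-before exactly, which is what Lamport clocks could not do.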
Stability
• Suppose p0 has received mj from pj. When is it safe for p0 to deliver mj? It is safe when:
 – there is no earlier message in M, and
 – there is no earlier message from pj, i.e. TS(mj)[j] = D[j] + 1, where D[j] = no. of pj messages delivered by p0, and
 – there is no earlier message mk″ from pk (k ≠ j) still to come … but how do we check this?

Checking for mk″
• Let mk′ be the last message p0 delivered from pk
• By strong gap detection, such an mk″ exists only if TS(mk′)[k] < TS(mj)[k], i.e. D[k] < TS(mj)[k]
• Hence, deliver mj as soon as D[k] ≥ TS(mj)[k] for all k ≠ j

The protocol
• p0 maintains an array D[1, …, n] of counters, where D[i] = TS(mi)[i] and mi is the last message delivered from pi
• DR3: deliver m from pj as soon as both of the following conditions are satisfied:
 1. D[j] = TS(m)[j] − 1
 2. D[k] ≥ TS(m)[k] for all k ≠ j
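DR3 can be sketched as a monitor loop over buffered notifications (the message encoding — a pair of sender index and vector timestamp — is a hypothetical choice):

```python
# Sketch of DR3: the monitor buffers notifications and delivers m from p_j
# once D[j] == TS(m)[j] - 1 and D[k] >= TS(m)[k] for all k != j.
def causal_deliver(n, notifications):
    """notifications: (sender_index, vector_timestamp) pairs in arrival order.
    Returns them in a causal delivery order."""
    D = [0] * n                 # D[i] = TS(m_i)[i] of the last delivery from p_i
    pending = list(notifications)
    delivered = []
    progress = True
    while pending and progress:
        progress = False
        for j, ts in list(pending):
            # DR3: no gap from p_j, and nothing earlier from any other p_k
            if D[j] == ts[j] - 1 and all(D[k] >= ts[k] for k in range(n) if k != j):
                D[j] = ts[j]
                delivered.append((j, ts))
                pending.remove((j, ts))
                progress = True
    return delivered
```

A notification that arrives "too early" (some of its causes not yet delivered) simply waits in the buffer until both conditions hold, which is exactly the stability test made constructive.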