No Time for Asynchrony
Marcos K. Aguilera (Microsoft Research Silicon Valley)
Michael Walfish (UCL, Stanford, UT Austin)

Problem: Nodes in Distributed Systems Fail
[Figure: primary and backup nodes, each a stack of apps, OS, VM, protocols, drivers; Paxos shown between primary and backup]
• Pragmatic response: end-to-end timeouts
    Getting them right: hard. Getting them wrong: bad.
• Current view/lore/wisdom: design for asynchrony
    Very general guarantee of safety

Different Points of View
1. Keep it simple: “rely on time and timeouts”
2. Keep it safe: “design for asynchrony”
3. Our view: there is good in both
    We want simplicity, safety, and high availability
    Our mantra: “no end-to-end timeouts”

A Proposal That Meets Our Goals
[Figure: spy-based failure detector spanning app, OS, VM/hypervisor, network driver, network card on primary and backup]
• Spies indicate crashed-or-not authoritatively
• Why do we want device drivers killing OSes?

This Talk Will Argue:
1. Asynchrony is problematic (and often disregarded in practice)
2. Spy-based failure detection meets our goals
Scope
• Enterprises, data centers
• Not Byzantine failures

Asynchrony Detracts From Safety
1. “Safety under asynchrony” downplays liveness
    But highest layers in a system have deadlines
    Lower layer loses liveness → at deadline, higher layer may be bereft → lose “whole system” safety

Asynchrony Detracts from Safety (Cont’d.)
2. Under asynchrony, components hide useful info.
    Unresponsiveness → higher layers guess
    Wrong guesses → loss of safety
    [Figure: asynchronous component]
3. Asynchrony → complex designs (example: Paxos)
    Complexity → mistakes → safety violations

Empirical Observations Against Asynchrony
• Paxos-using systems rely on synchrony for safety
    Chubby [Burrows OSDI06], Petal [Lee ASPLOS96], WheelFS [Stribling et al. NSDI09], …
    Paxos Leases, …
    “Safety under asynchrony” hard to meet
    Generality of asynchrony maybe not needed in reality
• World fundamentally synchronous
    Electrons, CPUs, human beings, organizations

Recap Argument Against Asynchrony
Appeal of asynchrony: generality → safety
Argument against asynchrony:
• Async. components can lead to unsafe systems
• Hard to meet “safety under asynchrony”
• Asynchrony doesn’t represent reality
• People forced to depart from asynchrony anyway

Our Argument, Continued
1. Asynchrony is problematic (and often disregarded in practice)
2. Spy-based failure detection meets our goals

A Powerful Abstraction: Perfect Failure Detectors [Chandra & Toueg, JACM 96]
[Figure: processes ask a perfect failure detector (PFD) “CRASHED?” about a process and receive an answer such as “up”]
• A perfect failure detector is an oracle
Asynchronous model: ?
Want a model where: PFDs

PFDs → Safe, Simple Distributed Algorithms
[Figure: PFD between primary and backup]
• Replication by primary-backup instead of Paxos
• Other examples in the paper (not our contribution)

How to Build a Perfect Failure Detector?
[Figure: failure detector (FD) monitoring a remote node]
• Failure detection (not PFD) uses status messages
• Hard to make this FD a PFD
    Variable timing, system a black box

Realizing Perfect Failure Detectors
[Figure: PFD attached to app, OS, VM/hypervisor, network driver, network card]
• Recall our third goal: high availability.
• Current proposals coarse [Fetzer IEEE Trans. 2003, Ricciardi & Birman PODC91]
• Approach is “surgical”:

    Surgical (our approach)      Coarse (current proposals)
    Operate inside layers        Focus on E2E behavior
    Use only local timing        Use E2E timeouts
    Kill as a last resort        Kill/exclude any suspect

Spies Orchestrated to Form Surgical PFD
[Figure: PFD fed by spies at app, OS, VM/hypervisor, network driver, network card, network switch]
• Example: spy in VM tracks OS state
• Lower-level spies also monitor higher-level ones
    Allows localization of smallest failed component

Limitations and Discussion
1. Under network partition, PFD module blocks
2. To realize spies, must modify system infrastructure
• We think this is okay in data centers
    Partitions often cause block anyway
    One administrative domain
• Harder to address in wide area
    Requires spies in Internet switches and routers
    Network to host feedback not totally implausible

Summary and Conclusion
• End-to-end timing assumptions problematic. So:
    Avoid timing with inside info., assassination
    Avoid end-to-end by infiltrating many layers
• The gain: simple, safe, and live distributed systems
• But: PFDs, spies not a good fit for all environments
• Next step: get it implemented and deployed
• This is a call to arms
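
Sketch: spies combined into a surgical PFD
The slides describe the mechanism only in outline, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes a node is a stack of layers listed from the app down to the network card, that each layer is watched by a spy hosted in the layer below, and that a spy can answer "up", "crashed", or "unsure" from local information alone. The names `Layer`, `spy_probe`, `kill`, `query_pfd`, and `make_stack`, and the `responsive` flag standing in for a local health check, are all hypothetical. When a spy is unsure, the layer is killed before "crashed" is reported, which is what keeps the answer authoritative. Blocking under network partition is not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Status(Enum):
    UP = "up"
    CRASHED = "crashed"
    UNSURE = "unsure"  # local information is not enough to decide


@dataclass
class Layer:
    """One layer of a node (hypothetical model)."""
    name: str
    alive: bool = True
    responsive: bool = True  # assumption: stand-in for a local health check


def spy_probe(layer: Layer) -> Status:
    """Verdict of the hypothetical spy that watches `layer` from below."""
    if not layer.alive:
        return Status.CRASHED
    return Status.UP if layer.responsive else Status.UNSURE


def kill(stack: List[Layer], index: int) -> None:
    """Kill stack[index] and every layer that runs on top of it."""
    for layer in stack[: index + 1]:
        layer.alive = False


def query_pfd(stack: List[Layer]) -> Tuple[bool, Optional[str]]:
    """Return (crashed, failed_layer) for a stack ordered top to bottom.

    The walk goes from the bottom layer upward and stops at the first
    layer found crashed. A layer whose spy is unsure is killed first,
    so "crashed" is true by the time it is returned.
    """
    for index in range(len(stack) - 1, -1, -1):
        layer = stack[index]
        status = spy_probe(layer)
        if status is Status.UNSURE:
            kill(stack, index)  # last resort: make the suspicion true
            status = Status.CRASHED
        if status is Status.CRASHED:
            return True, layer.name
    return False, None


def make_stack() -> List[Layer]:
    """Build the layer stack named on the slides, top to bottom."""
    names = ("app", "OS", "VM/hypervisor", "network driver", "network card")
    return [Layer(name) for name in names]


if __name__ == "__main__":
    healthy = make_stack()
    print(query_pfd(healthy))  # (False, None)

    hung = make_stack()
    hung[1].responsive = False  # the OS stops answering local probes
    print(query_pfd(hung))  # (True, 'OS')
    print([layer.alive for layer in hung])  # [False, False, True, True, True]
```

In the second run, only the OS and the app above it are killed; the hypervisor, driver, and card stay up. That is the contrast with the coarse approach, where an end-to-end timeout would lead to killing or excluding the whole node.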