
No Time for Asynchrony
Marcos K. Aguilera (Microsoft Research Silicon Valley)
Michael Walfish (UCL, Stanford, UT Austin)
Problem: Nodes in Distributed Systems Fail
[Diagram: primary and backup nodes, each a stack of apps, OS, VM, protocols, and drivers; a “?” marks the link between them, with Paxos coordinating the replicas]
• Pragmatic response: end-to-end timeouts
 Getting them right: hard. Getting them wrong: bad.
• Current view/lore/wisdom: design for asynchrony
 Very general → guarantee of safety
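The “pragmatic response” above, an end-to-end timeout on a remote request, can be sketched as follows. This is a minimal illustration, not the talk’s proposal; the host, port, and request bytes are hypothetical:

```python
import socket

# Minimal sketch of an end-to-end timeout: declare the remote node
# failed if a request gets no reply within `timeout` seconds. Note the
# hazard the slide warns about: a slow network or paused node looks
# exactly like a crash, so "None" may be a wrong guess.
def request_with_timeout(host, port, request, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.settimeout(timeout)
            s.sendall(request)
            return s.recv(4096)      # got a reply in time
    except (socket.timeout, OSError):
        return None                  # declare failure (possibly wrongly!)
```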
Different Points of View
1. Keep it simple: “rely on time and timeouts”
2. Keep it safe: “design for asynchrony”
3. Our view: there is good in both
 We want simplicity, safety, and high availability
 Our mantra: “no end-to-end timeouts”
A Proposal That Meets Our Goals
[Diagram: primary and backup stacks (app, OS, VM/hypervisor, network driver, network card), each layer monitored by a spy reporting to a failure detector]
• Spies indicate crashed-or-not authoritatively
• Why do we want device drivers killing OSes?
This Talk Will Argue:
1. Asynchrony is problematic
 (And often disregarded in practice)
2. Spy-based failure detection meets our goals
Scope
• Enterprises, data centers
• Not Byzantine failures
Asynchrony Detracts From Safety
1. “Safety under asynchrony” downplays liveness
 But highest layers in a system have deadlines
 Lower layer loses liveness
 → at deadline, higher layer may be bereft
 → loss of “whole system” safety
Asynchrony Detracts from Safety (Cont’d.)
2. Under asynchrony, components hide useful info.
 Unresponsiveness → higher layers guess
 Wrong guesses → loss of safety
3. Asynchronous component → uncertainty (“?”) → complex designs (example: Paxos)
 Complexity → mistakes → safety violations
Empirical Observations Against Asynchrony
• Paxos-using systems rely on synchrony for safety
Chubby [Burrows OSDI06], Petal [Lee ASPLOS96],
WheelFS [Stribling et al. NSDI09], …
[Diagram: Paxos deployed alongside leases]
 → “Safety under asynchrony” is hard to meet
 → Generality of asynchrony maybe not needed in reality
• World fundamentally synchronous
 Electrons, CPUs, human beings, organizations
Recap Argument Against Asynchrony
Appeal of asynchrony: generality → safety
Argument against asynchrony:
• Async. components can lead to unsafe systems
• Hard to meet “safety under asynchrony”
• Asynchrony doesn’t represent reality
• People forced to depart from asynchrony anyway
Our Argument, Continued
1. Asynchrony is problematic
 (And often disregarded in practice)
2. Spy-based failure detection meets our goals
A Powerful Abstraction: Perfect Failure Detectors
[Chandra & Toueg, JACM 96]
[Diagram: processes query a perfect failure detector (PFD): CRASHED?(p) returns “up” or “crashed”]
• A perfect failure detector is an oracle
• Asynchronous model: a PFD cannot be built (slow vs. crashed indistinguishable)
• Want a model where: a PFD can be built
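The oracle’s interface is tiny. A sketch of its contract, under Chandra & Toueg’s two properties (never wrongly suspect a live process; eventually report every crash); class and method names here are illustrative:

```python
# Sketch of a perfect-failure-detector interface. A PFD answers
# CRASHED? authoritatively: "crashed" is reported only for processes
# known (not guessed) to be down, and is never retracted.
class PerfectFailureDetector:
    def __init__(self):
        self._crashed = set()

    def report_crash(self, process):
        # Called only when `process` is authoritatively known to be down
        # (e.g., by a spy that verified or forced its death).
        self._crashed.add(process)

    def crashed(self, process):
        # Oracle query: True means "crashed", permanently.
        return process in self._crashed
```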
PFDs → Safe, Simple Distributed Algorithms
[Diagram: primary and backup replicas coordinated through a PFD]
• Replication by primary-backup instead of Paxos
• Other examples in the paper (not our contribution)
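Primary-backup failover on top of a PFD can be sketched in a few lines. This is a toy illustration of why the algorithm becomes simple, not the paper’s protocol; all names are illustrative:

```python
# Toy sketch: with a PFD, the backup takes over only when the primary
# is authoritatively crashed -- no end-to-end timeout, and no window
# in which two replicas both believe they are primary.

class PFDStub:
    """Stand-in for a perfect failure detector."""
    def __init__(self):
        self._crashed = set()
    def report_crash(self, p):
        self._crashed.add(p)
    def crashed(self, p):
        return p in self._crashed

def current_primary(pfd, replicas):
    # `replicas` is ordered by preference; serve from the first
    # replica the PFD has not declared crashed.
    for r in replicas:
        if not pfd.crashed(r):
            return r
    return None  # all replicas down
```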
How to Build a Perfect Failure Detector?
[Diagram: a conventional failure detector (FD) exchanging status messages with a node whose internals are a black box]
• Failure detection (not PFD) uses status messages
• Hard to make this FD a PFD
 Variable timing, system a black box
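A conventional heartbeat-based FD, and why it falls short of perfection, can be sketched as follows (names illustrative):

```python
import time

# Sketch of a heartbeat failure detector: suspect a node if no status
# message arrives within `timeout` seconds. This is NOT a PFD: under
# variable timing (slow link, paused VM), it wrongly suspects a live
# node, and from outside the black box it cannot tell the difference.
class HeartbeatFD:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, node, now=None):
        self.last_seen[node] = time.monotonic() if now is None else now

    def suspects(self, node, now=None):
        now = time.monotonic() if now is None else now
        last = self.last_seen.get(node)
        return last is None or now - last > self.timeout
```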
Realizing Perfect Failure Detectors
[Diagram: stack of app, OS, VM/hypervisor, network driver, network card, with question marks over where the PFD hooks in]
• Recall our third goal: high availability. [Fetzer IEEE Trans. 2003; Ricciardi & Birman PODC 91]
• Our approach is “surgical”:
 Operate inside layers
 Use only local timing
 Kill as a last resort
• Current proposals are coarse:
 Focus on E2E behavior
 Use E2E timeouts
 Kill/exclude any suspect
Spies Orchestrated to Form Surgical PFD
[Diagram: spies at each layer — app, OS, VM/hypervisor, network driver, network card, network switch — orchestrated into a PFD]
• Example: spy in VM tracks OS state
• Lower-level spies also monitor higher-level ones
 Allows localization of smallest failed component
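The orchestration above can be sketched as follows. This is a toy model of the idea, not the system: each layer’s spy checks the layer above using inside information (here, a supplied probe function), and a spy that finds its target dead kills it so the crash report is authoritative. All names are illustrative:

```python
# Toy sketch of spy orchestration for a surgical PFD.
class Spy:
    def __init__(self, layer, probe, kill):
        self.layer = layer   # e.g. "OS", monitored by the VM's spy below it
        self.probe = probe   # local, inside-information liveness check
        self.kill = kill     # make the verdict final (kill as a last resort)

    def check(self):
        if self.probe():
            return "up"
        self.kill()          # ensure the suspected layer is really dead
        return "crashed"     # now safe to report authoritatively

def localize_failure(spies):
    # `spies` ordered bottom-up; report the lowest failed layer,
    # i.e. the smallest component that must be declared crashed.
    for spy in spies:
        if spy.check() == "crashed":
            return spy.layer
    return None
```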
Limitations and Discussion
1. Under network partition, PFD module blocks
2. To realize spies, must modify system infrastructure
• We think this is okay in data centers
 Partitions often cause blocking anyway
 One administrative domain
• Harder to address in wide area
 Requires spies in Internet switches and routers
 Network-to-host feedback is not totally implausible
Summary and Conclusion
• End-to-end timing assumptions problematic. So:
 Avoid timing assumptions via inside info. and assassination
 Avoid end-to-end by infiltrating many layers
• The gain: simple, safe, and live distributed systems
• But: PFDs, spies not a good fit for all environments
• Next step: get it implemented and deployed
• This is a call to arms