
Impossibility of Consensus in Distributed Systems…
and other tales about distributed computing theory

Nancy Lynch, MIT
Adriaan van Wijngaarden lecture
CWI 60th anniversary, February 9, 2006
1. Prologue

- Thank you!
- Adriaan van Wijngaarden: Numerical analysis, programming languages, CWI leadership.
- My contributions: Distributed computing theory.
- This talk:
  - A general description of (what I think are) my main contributions, with history and perspective.
  - A highlight of one particular result: the impossibility of reaching consensus in a distributed system in the presence of failures [Fischer, Lynch, Paterson 85].
2. My introduction to distributed computing theory

- 1972-78: Complexity theory.
- 1978, Georgia Tech: Distributed computing theory.
- Dijkstra’s mutual exclusion algorithm [Dijkstra 65]:
  - Several processes run, with arbitrary interleaving of steps, as if concurrently.
  - They share read/write memory.
  - They arbitrate the usage of a single higher-level resource:
    - Mutual exclusion: Only one process can “own” the resource at a time.
    - Progress: Someone should always get the resource when it’s available and someone wants it.
Dijkstra’s Mutual Exclusion algorithm

- Initially: all flags = 0; turn is arbitrary.
- To get the resource, process i does the following:
  - Phase 1:
    - Set flag(i) := 1.
    - Repeatedly: if turn = j for some j ≠ i and flag(j) = 0, set turn := i. When turn = i, move on to Phase 2.
  - Phase 2:
    - Set flag(i) := 2.
    - Check everyone else’s flag to see if any = 2.
    - If so, go back to Phase 1.
    - If not, move on and get the resource.
- To return the resource:
  - Set flag(i) := 0.
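For concreteness, here is a minimal Python sketch of the protocol above. It assumes reads and writes of the shared flag and turn variables are atomic; the names N, acquire, and release are mine, for illustration only.

```python
N = 3                  # number of processes (an assumption of this sketch)
flag = [0] * N         # flag[i] in {0, 1, 2}; initially all 0
turn = 0               # initial value is arbitrary

def acquire(i):
    """Entry protocol for process i, following the steps above."""
    global turn
    while True:
        # Phase 1: raise our flag, then compete for the turn.
        flag[i] = 1
        while turn != i:
            if flag[turn] == 0:    # turn-holder is not competing,
                turn = i           # so claim the turn
        # Phase 2: tentatively claim the resource.
        flag[i] = 2
        if all(flag[j] != 2 for j in range(N) if j != i):
            return                 # no rival reached Phase 2: resource is ours
        # Another process also reached Phase 2: retry from Phase 1.

def release(i):
    """Exit protocol: give the resource back."""
    flag[i] = 0
```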
Dijkstra’s Mutual Exclusion algorithm

- It is not obvious that this algorithm is correct:
  - The properties must hold regardless of the order of read and write steps.
  - Interleaving complications don’t arise in sequential algorithms.
- In general, how should we go about arguing correctness of such algorithms?
- This got me interested in learning how to prove properties of:
  - Mutual exclusion and progress.
  - Algorithms for systems of parallel processes that share memory.
  - Algorithms in which processes communicate by channels (with possible delay).
- And it led to work on general techniques for:
  - Modeling distributed algorithms precisely, using interacting state-machine models.
  - Proving their correctness.
Impossibility results

- Distributed algorithms have inherent limitations, because they must work in badly-behaved settings:
  - Arbitrary interleaving of process steps.
  - Action based only on local knowledge.
- With precise models, we could hope to prove impossibility results, saying that certain problems cannot be solved in certain settings.
- First example: [Cremers, Hibbard 76]:
  - Mutual exclusion with fairness: every process that wants the resource eventually gets it.
  - Not solvable for two processes with one shared variable taking two values.
  - This holds even if processes can use operations more powerful than reads/writes.
- Burns, Fischer, and I started trying to identify other cases where problems provably could not be solved in distributed settings.
- That is, to understand the nature of computability in distributed settings.
3. The next 20 years

- Lots of work on algorithms: mutual exclusion, resource allocation, clock synchronization, distributed consensus, leader election, reliable communication, …
- And even more work on impossibility results.
- And on modeling and verification methods.
Example impossibility result [Burns, Lynch 93]

- Mutual exclusion for n processes, using read/write shared memory, requires at least n shared variables.
- This holds even if:
  - No fairness is required, just progress.
  - Everyone can read and write all the variables.
  - The variables can be of unbounded size.

[Figure: two processes p1 and p2 sharing a single read/write variable x.]

Example: n = 2.
- Suppose two processes solve mutual exclusion, with progress, using only one read/write shared variable x.
- Suppose process 1 arrives alone and wants the resource. By the progress requirement, it must be able to get it.
- Along the way, process 1 must write to the shared variable x:
  - If not, process 2 wouldn’t know that process 1 was there.
  - Then process 2 could get the resource too, contradicting mutual exclusion.
Impossibility for mutual exclusion

[Execution diagram: In one execution, p1 arrives, writes x, and gets the resource. In another, p1 arrives but pauses just before writing x; p2 then writes x and gets the resource; p1 resumes, writes x (overwriting p2’s write), and gets the resource too. Contradicts mutual exclusion.]
Impossibility for mutual exclusion

- Mutual exclusion with n processes, using read/write shared memory, requires n shared variables:
  - The argument for n > 2 is more intricate.
  - The proofs are done in terms of math models.
- The example shows the key ideas:
  - A write operation to a shared variable overwrites everything previously in the shared variable.
  - A process sees only its own state and the values of the variables it reads: its action depends on “local knowledge”.

[Figure: n processes p1, p2, …, pn sharing variables x1, x2, ….]
Modeling and proof techniques

- More and more clever, complex algorithms appeared:
  - [Gallager, Humblet, Spira 83]: Minimum Spanning Tree algorithm.
  - Communication algorithms in networks with changing connectivity [Awerbuch].
  - Concurrency control algorithms for distributed databases.
  - Atomic memory algorithms: [Burns, Peterson 87], [Vitanyi, Awerbuch 87], [Kirousis, Kranakis, Vitanyi 88], …
- We needed:
  - A simple, general math foundation for modeling algorithms precisely, and
  - Usable, general techniques for proving their correctness.
- We worked on these…
Modeling techniques

- I/O Automata framework [Lynch, Tuttle, CWI Quarterly 89].
- I/O automaton: a state machine that can interact, using input and output actions, with other automata or with an external environment.
- Composition:
  - Compose I/O automata to yield other I/O automata.
  - Model a distributed system as a composition of process and channel automata.
- Levels of abstraction:
  - Model a system at different levels of abstraction.
  - Start from a high-level behavior specification.
  - Refine, in stages, to a detailed algorithm description.
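To give a feel for the style, here is an informal Python sketch (the flavor of the framework, not its formal definitions) of a reliable FIFO channel described automaton-style: explicit state, plus input and output actions with preconditions and effects.

```python
from collections import deque

class ChannelAutomaton:
    """A reliable FIFO channel, in I/O-automaton style."""

    def __init__(self):
        self.queue = deque()            # state: messages in transit

    def send(self, m):
        """Input action send(m): always enabled; effect: enqueue m."""
        self.queue.append(m)

    def receive_enabled(self):
        """Precondition of the output action receive."""
        return len(self.queue) > 0

    def receive(self):
        """Output action receive(m): deliver the oldest message."""
        assert self.receive_enabled()   # precondition must hold
        return self.queue.popleft()     # effect: dequeue and deliver
```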
Proof techniques

- Invariant assertions: statements about the system state.
  - Proved by induction on the number of steps in an execution.
- Entropy functions, to argue progress.
- Simulation relations (a toy example is sketched below):
  - Construct an abstract version of the algorithm.
    - It need not be a distributed algorithm.
  - The proof then breaks into two pieces:
    - Prove correctness of the abstract algorithm.
      - Interesting: involves the deep logical ideas behind the algorithm.
      - Tractable, because the abstract version is simple.
    - Prove that the real algorithm emulates the abstract version.
      - A simulation relation.
      - Tractable: generally a simple step-by-step correspondence.
      - Does not involve the logical ideas behind the algorithm.
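Here is a toy Python illustration of a simulation relation (my own example, chosen for brevity): an abstract specification that may grant a resource to any waiting process, a concrete algorithm that grants in FIFO order, and a relation R between their states that each matched pair of steps preserves.

```python
from collections import deque

class AbstractSpec:
    """Abstract version: may grant the resource to ANY waiting process."""
    def __init__(self):
        self.waiting = set()
    def request(self, p):
        self.waiting.add(p)
    def grant(self, p):
        assert p in self.waiting       # precondition
        self.waiting.remove(p)

class FifoAlgorithm:
    """'Real' version: grants strictly in FIFO order."""
    def __init__(self):
        self.queue = deque()
    def request(self, p):
        self.queue.append(p)
    def grant(self):
        return self.queue.popleft()

def related(impl, spec):
    """Simulation relation R: same set of waiting processes."""
    return set(impl.queue) == spec.waiting

# Step-by-step correspondence: every concrete step is matched by an
# abstract step, and R is preserved at every point.
impl, spec = FifoAlgorithm(), AbstractSpec()
impl.request("p1"); spec.request("p1")
assert related(impl, spec)
spec.grant(impl.grant())
assert related(impl, spec)
```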
Example: Mutual exclusion in a tree network

- From [Lynch, Tuttle, CWI Quarterly 89].
- Allocate a resource (fairly) among processes at the nodes of a tree.
- Algorithm:
  - Use a token to represent the single resource.
  - The token traverses the subtree of active requests systematically.
- Describe an abstract version: a graph with a moving token.
- Prove that the abstract version yields the needed properties.
- Prove a simulation relation between the real algorithm and the abstract version.
4. FLP

- [Fischer, Lynch, Paterson 83]: Impossibility of consensus in fault-prone distributed systems.
- My best-known result…
- Dijkstra Prize, 2001.
Distributed Consensus

- A set of processes in a distributed network, operating at arbitrary speeds, want to reach agreement. E.g., about:
  - The value of a sensor reading.
  - Whether to accept/reject the results of a database transaction.
  - Abstractly, about a value in some set V.
- Each process starts with an initial value in V, and they want to decide on a value in V:
  - Agreement: Decide on the same value.
  - Validity: The decision should be some process’ initial value.
- The twist: a (presumably small) number of processes might be faulty, and might not participate correctly in the algorithm.
- The problem appeared as:
  - The database commit problem [Gray 78].
  - The Byzantine agreement problem [Pease, Shostak, Lamport 80].
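The two safety conditions are easy to state as a check on a single finished execution. Here is a minimal Python sketch; the function name and data layout are mine, purely for illustration.

```python
def check_consensus(initial_values, decisions):
    """Check Agreement and Validity for one finished execution.

    initial_values: dict mapping each process to its input value in V.
    decisions: dict mapping each process to its decision, or None if it
               never decided (e.g., because it failed).
    """
    decided = [d for d in decisions.values() if d is not None]
    agreement = len(set(decided)) <= 1                            # one decided value
    validity = all(d in initial_values.values() for d in decided)
    return agreement and validity

# Example: three processes; p3 stops before deciding.
assert check_consensus({"p1": 0, "p2": 1, "p3": 1},
                       {"p1": 1, "p2": 1, "p3": None})
```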
FLP Impossibility Result

- [Fischer, Lynch, Paterson 83] proved an impossibility result for distributed consensus.
- The proof works even for very limited failures:
  - At most one process ever fails, and everyone knows this.
  - The process may simply stop, without warning.
- Original result: processes communicate using channels (with possible delays).
- Same result (essentially the same proof) for read/write shared memory.
- The result seemed counter-intuitive:
  - If there are many processes, and at most one can fail, then it seems like the rest could agree, and tell the faulty process the decision later…
  - But nonfaulty processes don’t know that the other process has failed.
  - But still, it seems like all but one of the processes could agree, then later tell the other process the decision (whether or not it has failed).
  - But no, this doesn’t work!
FLP Impossibility proof

- Proceed by contradiction: assume an algorithm exists to solve consensus, and argue from the problem requirements that it can’t work.
- Assume V = {0,1}.
- Notice that:
  - In an “extreme” execution, in which everyone starts with 0, the only allowed decision is 0.
  - Likewise, if everyone starts with 1, the only allowed decision is 1.
  - For “mixed” inputs, the requirements don’t say.
FLP Impossibility proof

- First prove that the algorithm must contain the following pattern of executions, a “Hook”: a finite execution α, and processes i and j, such that:
  - If i takes the next step after α, then the only possible decision thereafter is 0.
  - If j takes the next step after α, followed by i, then the only possible decision is 1.
- Thus, we can “localize” the decision point to a particular pattern of executions.
- For, if no Hook existed, we could maneuver the algorithm into executing forever, everyone continuing to take steps, and no one ever deciding.
- That contradicts the requirement that all the nonfaulty processes eventually decide.

[Figure: the Hook. From the end of α, a step by i leads to “0 only”; a step by j followed by a step by i leads to “1 only”.]
FLP Impossibility proof

- Now get a contradiction based on what processes j and i do in their respective steps.
- Each step reads or writes a shared variable.
- They must access the same variable x:
  - If not, their steps are independent, so the order can’t matter (see the sketch below).
  - So different orders can’t result in different decisions: contradiction.
- They can’t both read x:
  - The order of reads can’t matter, since reads don’t change x.
- That leaves three cases:
  - i reads x and j writes x.
  - i writes x and j reads x.
  - Both i and j write x.
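The independence claim is just commutativity of steps that touch disjoint variables. A tiny Python check (a toy illustration, not part of the original proof) makes it concrete:

```python
# Steps of i and j that access different variables commute:
# applying them in either order yields the same global state.
def step_i(state):
    return {**state, "x": 1}   # i writes variable x

def step_j(state):
    return {**state, "y": 2}   # j writes variable y

s0 = {"x": 0, "y": 0}
assert step_j(step_i(s0)) == step_i(step_j(s0))   # order doesn't matter
```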
FLP Impossibility proof

- Case: both i and j write x.
  - What is different after α i vs. α j i?
  - In the second case, j writes to the variable x before i does.
  - But then i immediately overwrites what j wrote.
  - So the only difference between the two outcomes is internal to j.
  - If we fail j, we can run the rest of the processes after α i and after α j i, and they will do exactly the same thing.
  - But this contradicts the fact that they must decide differently in the two cases!
- Case: i reads x and j writes x. Similar argument.
- Case: i writes x and j reads x. Similar argument.
Significance of FLP

- Significance for distributed computing practice:
  - Reaching agreement is sometimes important in practice, e.g.:
    - Agreeing on aircraft altimeter readings.
    - Database transaction commit.
  - FLP shows limitations on the kind of algorithm one can look for: one cannot hope for a timing-independent algorithm that tolerates even one process stopping failure.
- Main impact: distributed computing theory.

1. Variations on the result:
  - FLP was proved for distributed networks, with reliable broadcast communication.
  - [Loui, Abu-Amara 87] extended FLP to read/write shared memory.
  - [Herlihy 91] considered consensus with stronger fault-tolerance requirements:
    - Any number of failures.
    - Simpler proof.
  - New proofs of FLP are still being produced.
Significance of FLP

2. Ways to circumvent the impossibility result:
  - Using limited timing information [Dolev, Dwork, Stockmeyer 87].
  - Using randomness [Ben-Or 83], [Rabin 83].
    - Weaker guarantees: either
      - a small probability of a wrong decision, or
      - termination only with probability 1: the probability of not yet having terminated approaches 0 as time approaches infinity.
Significance of FLP

3. A new, “stabilizing” version of the requirements:
  - Agreement and validity must hold always.
  - Termination is required only if system behavior “stabilizes” for a while:
    - No new failures.
    - Timing (of process steps, messages) within “normal” bounds.
  - This version has good solutions, both theoretically and in practice.
- [Dwork, Lynch, Stockmeyer 88] algorithm:
  - Keeps trying to choose a leader, who tries to coordinate agreement.
  - Many attempts can fail.
  - Once the system stabilizes, a unique leader is chosen, who coordinates agreement.
  - The tricky part: ensuring that failed attempts don’t lead to inconsistent decisions.
- [Lamport 89] Paxos algorithm:
  - Improves on [DLS] by allowing more concurrency, and by having a funny story.
  - Refined and engineered for practical use.
- [Chandra, Hadzilacos, Toueg 96] failure detectors (a toy interface is sketched below):
  - Services that encapsulate the use of time in stabilizing algorithms.
  - Developed algorithms like [DLS] and [Lamport 89] using failure detectors.
  - Studied properties of failure detectors; identified the weakest failure detector that can solve consensus.
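For flavor, here is a minimal Python sketch of a local, timeout-based failure-detector module. The class name and the heartbeat/timeout rule are my illustration, not the constructions of [Chandra, Hadzilacos, Toueg 96].

```python
import time

class TimeoutFailureDetector:
    """Local module that a process queries for currently suspected peers."""

    def __init__(self, timeout):
        self.timeout = timeout     # suspicion threshold, in seconds
        self.last_heard = {}       # peer -> time of last heartbeat

    def heartbeat(self, p):
        """Record that we just heard from peer p."""
        self.last_heard[p] = time.monotonic()

    def suspected(self):
        """Peers not heard from within the timeout are suspected."""
        now = time.monotonic()
        return {p for p, t in self.last_heard.items()
                if now - t > self.timeout}
```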
Significance of FLP

4. Characterizing computability in distributed systems, in the presence of failures:
  - E.g., k-consensus: at most k different decisions occur overall.
    - Problem defined by [Chaudhuri 93].
    - Solvable with k-1 process failures, but not with k failures.
  - Algorithm tolerating k-1 failures: [Chaudhuri 93].
  - Matching impossibility result:
    - [Chaudhuri 93]: partial progress, using arguments like FLP.
    - [Herlihy, Shavit 93], [Borowsky, Gafni 93], [Saks, Zaharoglou 93]:
      - Techniques from algebraic topology: Sperner’s Lemma.
      - Used to obtain a k-dimensional analogue of the Hook.
      - Gödel Prize, 2004.
Open questions related to FLP

- Characterize exactly what problems can be solved in distributed systems:
  - Based on problem type, number of processes, and number of failures.
  - Which problems can be used to solve which others?
- Exactly what information about timing and/or failures must be provided to processes in order to make various unsolvable problems solvable?
  - For example, what is the weakest failure detector that allows solution of k-consensus with k failures?
5. Modeling Frameworks

- Recall I/O automata [Lynch, Tuttle 87]:
  - State machines that interact using input and output actions.
  - Good for describing asynchronous distributed systems: no timing assumptions.
    - Components take steps at arbitrary speeds.
    - Steps can interleave arbitrarily.
  - Support system description and analysis using composition and levels of abstraction.
- I/O automata are adequate for much of distributed computing theory.
- But not for everything…
Timed I/O Automata

- We also need to model and analyze the timing aspects of systems.
- Timed I/O Automata: an extension of I/O Automata [Lynch, Vaandrager 92, 94, 96], [Kaynar, Segala, Lynch, Vaandrager 05].
- Trajectories describe the evolution of the state over a time interval.
- Can be used to describe:
  - Time bounds, e.g., on message delays and process speeds.
  - Local clocks, used by processes to schedule steps.
- Used for time performance analysis.
- Used to model hybrid systems:
  - Real-world objects (vehicles, airplanes, robots, …) + computer programs.
  - Hybrid I/O Automata [Lynch, Segala, Vaandrager 03] also allow continuous interactions between components.
- Applications: timing-based distributed algorithms, hybrid systems.
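As a toy illustration of the trajectory/deadline style (my own sketch, not the formal TIOA definitions), here is a sender whose clock evolves continuously and whose send action must occur before the clock exceeds a bound:

```python
PERIOD = 1.0   # assumed upper bound on the time between sends

class PeriodicSender:
    """TIOA-flavored sketch: one continuous state variable (a clock)
    and one discrete output action (send)."""

    def __init__(self):
        self.clock = 0.0

    def evolve(self, dt):
        """Trajectory: let dt time units pass.
        The stopping condition forbids time passing beyond the deadline."""
        assert self.clock + dt <= PERIOD
        self.clock += dt

    def send(self):
        """Discrete output action: emit a message and reset the clock."""
        self.clock = 0.0
        return "ping"
```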
Probabilistic I/O Automata, …

- [Segala 94]: Probabilistic I/O Automata, Probabilistic Timed I/O Automata.
- Express random choices and random system behavior.
- Current work: improving PIOA:
  - Composition, simulation relations.
- Current work: integrating PIOA with TIOA and HIOA.
  - The combination should allow modeling and analysis of any kind of distributed system we can think of.
6. New Challenges

- [Distributed Algorithms 96]:
  - Summarizes the basic results of distributed computing theory, ca. 1996.
  - Asynchronous algorithms, plus a few timing-dependent algorithms.
  - Fixed, wired networks.
  - Still some open questions, e.g., general characterizations of computability.
- New frontiers in distributed computing theory:
  - E.g., algorithms for mobile wireless networks.
  - Much worse behaved than traditional wired networks:
    - No one knows who the participating processes are.
    - The set of participants may change.
    - Mobility.
    - Much harder to program.
- So, this area needs a theory!
  - New algorithms.
  - New modeling and analysis methods.
  - New impossibility results, giving the limits of what is possible in such networks.
- The entire area is wide open for new theoretical work.
Distributed algorithms for mobile wireless networks

- My group (and others) are now working in this area, developing algorithms and proving impossibility results:
  - Clock synchronization, consensus, reliable communication, …
- One approach to algorithm design: Virtual Node Layers (VNLs):
  - Use the existing network to implement (emulate) a better-behaved network, as a higher level of abstraction.
  - Use the Virtual Node Layer to implement applications.
- We are exploring VNLs, both theoretically and experimentally*.

*Note: Using CWI’s Python language…
7. Epilogue

- An overview of our work in distributed computing theory, especially:
  - Impossibility results.
  - Models and proof methods.
- Emphasis on the FLP impossibility result, for consensus in fault-prone distributed systems.

Thanks to my collaborators:
Yehuda Afek, Myla Archer, Eshrat Arjomandi, James Aspnes, Paul Attie, Hagit
Attiya, Ziv Bar-Joseph, Bard Bloom, Alan Borodin, Elizabeth Borowsky, James
Burns, Ran Canetti, Soma Chaudhuri, Gregory Chockler, Brian Coan, Ling
Cheung, Richard DeMillo, Murat Demirbas, Roberto DePrisco, Harish
Devarajan, Danny Dolev, Shlomi Dolev, Ekaterina Dolginova, Cynthia Dwork,
Rui Fan, Alan Fekete, Michael Fischer, Rob Fowler, Greg Frederickson, Eli Gafni,
Stephen Garland, Rainer Gawlick, Chryssis Georgiou, Seth Gilbert, Kenneth
Goldman, Nancy Griffeth, Constance Heitmeyer, Maurice Herlihy, Paul Jackson,
Henrik Jensen, Frans Kaashoek, Dilsun Kaynar, Idit Keidar, Roger Khazan, Jon
Kleinberg, Richard Ladner, Butler Lampson, Leslie Lamport, Hongping Lim,
Moses Liskov, Carolos Livadas, Victor Luchangco, John Lygeros, Dahlia Malkhi,
Yishay Mansour, Panayiotis Mavrommatis, Michael Merritt, Albert Meyer, Sayan
Mitra, Calvin Newport, Tina Nolte, Michael Paterson, Boaz Patt-Shamir, Olivier
Pereira, Gary Peterson, Shlomit Pinter, Anna Pogosyants, Stephen Ponzio,
Sergio Rajsbaum, David Ratajczak, Isaac Saias, Russel Schaffer, Roberto
Segala, Nir Shavit, Liuba Shrira, Alex Shvartsman, Mark Smith, Jorgen
Sogaard-Andersen, Ekrem Soylemez, John Spinelli, Eugene Stark, Larry
Stockmeyer, Joshua Tauber, Mark Tuttle, Shinya Umeno, Frits Vaandrager,
George Varghese, Da-Wei Wang, William Weihl, H. P. Weinberg, Jennifer Welch,
Lenore Zuck, … and others I have forgotten to list.
Thank you!