Speculative Linearizability - Infoscience

Speculative Linearizability
Rachid Guerraoui
Viktor Kuncak
Giuliano Losa
School of Computer and Communication Sciences
Swiss Federal Institute of Technology Lausanne (EPFL)
[email protected]
[email protected]
[email protected]
Abstract
Linearizability is a key design methodology for reasoning about
implementations of concurrent abstract data types in both shared
memory and message passing systems. It provides the illusion that
operations execute sequentially and fault-free, despite the asynchrony and faults inherent to a concurrent system, especially a distributed one. A key property of linearizability is inter-object composability: a system composed of linearizable objects is itself linearizable. However, devising linearizable objects is very difficult,
requiring complex algorithms to work correctly under general circumstances, and often resulting in bad average-case behavior. Concurrent algorithm designers therefore resort to speculation: optimizing algorithms to handle common scenarios more efficiently.
The outcome are even more complex protocols, for which it is no
longer tractable to prove their correctness.
To simplify the design of efficient yet robust linearizable protocols, we propose a new notion: speculative linearizability. This
property is as general as linearizability, yet it allows intra-object
composability: the correctness of independent protocol phases implies the correctness of their composition. In particular, it allows
the designer to focus solely on the proof of an optimization and derive the correctness of the overall protocol from the correctness of
the existing, non-optimized one.
Our notion of protocol phases allows processes to independently
switch from one phase to another, without requiring them to reach
agreement to determine the change of a phase. To illustrate the applicability of our methodology, we show how examples of speculative algorithms for shared memory and asynchronous message
passing naturally fit into our framework.
We rigorously define speculative linearizability and prove our
intra-object composition theorem in a trace-based as well as an
automaton-based model. To obtain a further degree of confidence,
we also formalize and mechanically check the theorem in the
automaton-based model, using the I/O automata framework within
the Isabelle interactive proof assistant. We expect our framework
to enable, for the first time, scalable specifications and mechanical
proofs of speculative implementations of linearizable objects.
1.
Introduction
The correctness of a system of processes communicating through
linearizable objects [12, 17, 18] reduces to the correctness of the
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
PLDI’12, June 11–16, 2012, Beijing, China.
c 2012 ACM 978-1-4503-1205-9/12/06. . . $10.00
Copyright sequential executions of that system. In other words, linearizability reduces the difficult problem of reasoning about concurrent data
types to that of reasoning about sequential ones. In this sense, the
use of linearizable objects greatly simplifies the construction of
concurrent systems. At first glance, the design and implementation
of linearizable objects themselves looks also simple. One can focus
on each object independently, design the underlying linearizable algorithm, implement and test it, and then compose it with algorithms
ensuring the linearizability of the other objects of the system. In
short, linearizability is preserved by inter-object composition: a set
of objects is linearizable if and only if each object is linearizable
when considered independently of the others.
However, it is extremely difficult to devise efficient and robust
implementations of a linearizable object, even when considered independently from the others. The difficulty stems from the following dilemma. On the one hand, achieving robustness in a concurrent
system requires assuming that the scheduling of processes, communication, and faults is completely nondeterministic. The object does
not know which execution is going to unfold but needs to deliver a
response to each of its method invocations no matter the execution.
The object has to prepare for all scenarios, using up its resources for
that task. This conservative approach typically yields wait-free [11]
but inefficient strategies. On the other hand, achieving efficiency requires speculating that only a small subset of executions is likely
to occur [23]. This is practically appealing because a real system
typically spends most of its time in a common case and seldom
experiences periods of alternative executions. Should the object focus on a very small subset of executions, it would save resources
by preparing only for that restricted subset. Typically, the common
cases that are usually considered in practice are those where the
system is synchronous, there is no failure, contention is not very
high, a specific conditional branch is to be considered, etc. At a
high level, a speculative system repeats the following steps:
1. Speculate about the likelihood of certain executions and choose
a corresponding algorithm.
2. Initialize the chosen algorithm.
3. Monitor the algorithm to estimate if the speculation was accurate. If not, abort the algorithm and recover a consistent state.
Examples of speculation include the Ethernet protocol, where
processes speculatively occupy a single-user communication
medium before backing off if collision is detected, or branch prediction in microprocessors, where the processor speculates that a
particular branch in the code will be taken before discarding its
computation if this is not the case. More recent instances of speculation include optimistic replication [15] or adaptive mutual exclusion [14]. In fact, most practical concurrent systems are speculative. In general, speculative systems may choose between many
different options, or speculation phases, in order to closely match
a changing common case.
To illustrate the challenges addressed in this work, consider
a speculative algorithm that may switch between any two of n
speculation phases in a safe manner. That is, recovering a consistent
state of the first phase and then initializing the second. Doing so
in an ad-hoc manner poses two major scalability problems to the
design process. First, there are O(n2 ) different switching cases that
need to be carefully handled in order to preserve linearizability.
Second, adding a new phase to an object composed of n phases
built ad-hoc may require deep changes to all of the n existing
phases.
The case of State Machine Replication [16] (abbreviated SMR),
used to build robust linearizable object implementations, illustrates
those problems. Non-speculative SMR algorithms like Paxos [7] or
PBFT [6] are notoriously hard to understand. The formal correctness proof of Disk Paxos [13] took about 7000 lines. Only an informal proof, 35 pages long, of a simplified version of PBFT is known
to the authors [5]. Speculative SMR protocols are even harder. For
instance, the Zyzzyva [15] protocol combines PBFT with a fast
path implemented by a simple agreement protocol. The fast path is
more efficient than PBFT when there are no failures. In the advent
of failures, the fast path cannot make progress and Zyzzyva falls
back to executing PBFT. The ad-hoc composition of the fast path
with PBFT required deep changes to both algorithms and resulted
in an entanglement that is hardly understandable. Although PBFT
had previously been widely tested [5], Zyzzyva suffered from fragile implementations and no correctness proof has ever been proposed. In fact, Zyzzyva, being restricted to two phases, is very fragile [24]. If the common case is not what is expected by the fast path
one falls-back to PBFT, making the optimization useless. An adversary can easily weaken the system by always making it abort
the fast path and go through the slow one. Introducing a new dimension of speculation might make the protocol more robust but
would require a new ad-hoc composition, including an alternative
fast-path, at a cost comparable to the effort needed to build Zyzzyva
from scratch, namely a Dantean effort. Given the diversity of situations encountered in practice, we are convinced that this ad-hoc
approach is simply intractable.
Contributions. We propose in this paper a framework for devising, reasoning about, and mechanically verifying effective and
robust linearizable implementations of concurrent data types in a
scalable way. In short, we show how speculative implementations
of linearizable objects can be devised and analyzed in an incremental manner. One can focus on each dimension of speculation independently of the others. For example, a programmer can first devise
an algorithm with no speculation, implement, test, and prove it correct, then augment it later with an optimization that speculates that
a certain subset of executions is more likely to occur (say there is
no failure). This optimization sub-algorithm is composed with the
original one as is. Later, another programmer can consider other dimensions of speculations (say there is no contention or asynchrony)
and add new phases, still without modifying the original implementation and proofs.
At the heart of our framework lies our new notion, speculative
linearizability. Speculative linearizability encompasses the notion
of switch actions, which makes it significantly richer than linearizability, yet it reduces to linearizability if these actions are ignored.
Speculative linearizability augments classical linearizability with a
new aspect of composition. Not only a system of concurrent objects can be considered correct if each of them is correct (interobject composition), but a set of algorithms implementing different
speculation phases of the same object is correct if each of them
is correct (intra-object composition). We express this new aspect
through a new composition theorem which we state and prove. Intuitively, speculative linearizability captures the idea of safe and
live abortability. An implementation can abort if the assumptions
1:
2:
3:
4:
5:
6:
7:
8:
//Shared Register V , Initially ⊥
Function propose(val):
if V = ⊥ then
V ← val
return val
else
return V
end if
Figure 1. Consensus Specification
behind speculation reveals wrong. When it does abort, it does so in
a safe manner, by preserving the consistency (linearizability) of the
object state. Moreover, the abort is also performed in a live manner,
because a new protocol phase can resume and make progress. Processes can switch independently from one phase to another, without
the need to reach agreement. Our notion of speculative linearizability is itself based on a new definition of linearizability. This definition is interesting in its own right because it enables a more local
form of reasoning than the original definition and it allows for repeated events, which are the norm in practice. We have proved that
our new definition of linearizability is equivalent to the classical
definition.
The generality of our framework and its Isabelle/HOL formalization [10] sets our work apart from state of the art in the area
[1, 2, 8]: we enable for the first time scalable mechanically-checked
development of linearizable objects of arbitrary abstract data types.
Additional proofs and formalizations are available in [9, 10], as
well as at http://lara.epfl.ch/w/slin.
2.
Putting Speculative Linearizability to Work
In this section we provide intuition for speculative linearizability
and its key property: intra-object composability. Our goal is to motivate correct-by-construction design of objects based on speculation phases. As an illustration, we present two speculative implementations of a consensus object. Each is composed of two speculation phases. The first applies to the message-passing computation
model and the other to the shared-memory model. We analyze each
phase of those speculative implementations in isolation and conclude, using the intra-object composition theorem, that their compositions are correct implementations of consensus.
2.1
Message-Passing Consensus
Our first example is a consensus implementation in a system composed of client and server processes which communicate by asynchronous message passing and which may crash at any point. Our
goal is to implement consensus among the clients. Notable use
cases of consensus in message-passing systems include Google’s
Chubby distributed lock service [4] and the Gaios reliable data store
[3].
Consensus allows a set of clients to agree on a common value.
Clients each propose a value and should receive a common decision value taken among their proposals. We consider implementations that offer an invocation function propose(value) to each
of its clients. The invocation propose(val) returns another value
indicating the common decision. When a process returns from
propose(value) with a value d, we say the it decides value d. An
algorithm is linearizable with respect to the consensus data type
if all calls to propose(val) appear as if they executed atomically
the algorithm in Figure 1 at some point after their invocation (we
assume that proposal are different from ⊥).
In an asynchronous message-passing system where processes
may crash, there is no component which can reliably hold state.
Hence, some crucial data may become unavailable due to the failure
of a process. For example, a process may decide some value and
then crash before informing any other process of its decision. In
this case other processes have no way to know which value the
crashed process decided and they cannot decide without risking to
violate consensus.
Paxos [7] is an algorithm that implements consensus if less than
half of the servers may fail. Paxos has a minimum latency of 3
message delays. However, suppose that executions are fault free
and contention free (i.e. the time intervals delimited by corresponding invocations and responses do not overlap). Under this assumption, very simple algorithms can solve consensus much faster than
Paxos, which still has a minimum latency of 3 message delays.
We would like to optimize Paxos for the fault-free and
contention-free case. This could be done ad-hoc by adding a fast
path inside Paxos. However, Paxos is a very intricate algorithm:
the proof of Disk Paxos [7], a variant of Paxos, is 7000 lines long
[13]. Adding a fast path to Paxos would require re-examining the
correctness of the entire algorithm, because the fast path is interleaved with executions of the others parts. The entire proof would,
in principle, need to be rewritten.
Our framework allows us to optimize Paxos without directly
modifying the original algorithm, and enables reasoning about the
correctness of the optimization independently of the basic Paxos itself. To use our framework, we wrap Paxos in a simple interface
that allows composition with other speculation phases. This requires adding a trivial level of indirection and makes Paxos a speculation phase in its own right. Speculative linearizability ensures that
any speculatively linearizable speculation phase can be composed
with the Paxos speculation phase to form a correct implementation of consensus. Moreover, speculative linearizability, instantiated for consensus, is a property independent of Paxos. Therefore,
from the point of view of algorithm designers, speculative linearizability hides the internal complexity of Paxos.
In this example we compose Paxos with the Quorum speculation phase. Quorum manages to decide on the value in only 2
message delays, whenever there is neither contention nor faults.
If faults or contention happen, Quorum cannot decide and instead
passes control to Paxos. In this case, we say that Quorum switches
to Paxos. From then on, Paxos takes over the execution and can
decide as long as less than half of the processes are faulty. To highlight that Paxos is used as a backup when optimization by Quorum
is not possible, we call the Paxos speculation phase Backup. Using speculative linearizability, we will prove that the composition
of Quorum and Backup is linearizable by reasoning about Quorum
in isolation and by reusing an existing proof of Paxos.
By combining Quorum and Backup we obtain a system that
is optimized for contention-free and fault-free loads while still remaining correct in all other conditions under which the Backup is
correct. Such an approach has proved empirically successful in recent work [8]; the present paper provides the underlying theoretical
foundations for such approaches.
We now describe more precisely the Quorum and Backup speculation phases. Their interface is that of a consensus object, as
defined above, augmented with a switch-to-backup(sv) call. The
switch-to-backup(sv) call is used by a client to switch from the
Quorum phase to Backup, i.e. switch-to-backup(sv) is a procedure of Backup which may be called by a client executing Quorum. The sv parameter is called a switch value. A client calling
switch-to-backup(sv) transfer its pending invocation to Backup
and provides sv as an indication to Backup on how to take over
Quorum’s execution.
Informally, the Quorum algorithm works as follows:
• Upon propose(v), a client c broadcasts its proposal to all server
processes and waits for accept messages. It also stores v in local
variable proposalc of its memory and starts a local timer tc .
• When a server process receives a proposal v from a client c:
If it has not sent any accept message, it sends an accept
message accept(v) to c.
If it has already sent an accept message accept(v 0 ), it sends
accept(v 0 ) to c.
• If a client receives two different accept messages, it switches to
Backup by calling and returning switch-to-backup(proposalc ).
• If a client receives the same accept messages accept(v) from
all the servers, then it decides v.
• If timer tc expires, then c waits for at least one message
accept(v 0 ) from a server, or selects one v 0 if it has some
messages of the form accept(v 0 ) already. It then switches to
Backup by calling switch-to-backup(v 0 ).
The Backup phase is Lamport’s Paxos algorithm where clients
have the role of proposers and learners, while servers have the role
of acceptors. Backup treats the switch calls from Quorum as regular
proposals. In other words, upon a call switch-to-backup(v), a client
process proposes the value v to the Paxos algorithm.
By composing Quorum and Backup we obtain a linearizable
implementation of consensus that is optimized for crash-free and
contention-free workloads. This new optimized implementation of
consensus was obtained by composing a black-box implementation
of Paxos with a much simpler protocol, Quorum, which makes
progress only under particular conditions.
We next uncover the principles behind this approach and we
will show the correctness of our optimized protocol by reasoning
about each speculation phase in isolation, independently of each
other. Thanks to our modular approach, we will be able to rely the
existing proof of Paxos [13] to show the correctness of Backup with
minor effort.
2.2
Linearizability
Before explaining speculative linearizability, we briefly review linearizability itself: we provide both its classical definition and our
new formalization.
We consider a set of concurrent clients accessing an object
through invocation procedures whose execution ends by returning
a response, as in the consensus examples above. We assume that
each client is sequential. In other words, a client does not invoke
the object before having returned from its preceding invocation.
An object is accessed sequentially if every response to an invocation immediately follows the invocation. There must be no other
response or invocation in between them. For example, the specification of a consensus object states that all processes return the same
decision value and that this value must have been proposed by some
process before it can be returned. Hence, in a sequential execution,
the first proposed value must be returned to every process. Note that
this corresponds to executing sequentially the algorithm in Figure
1: The first process executing will impose its value to all others.
We consider only deterministic objects. Therefore, in the sequential case, the response to a given invocation is determined by
the sequence of past invocations that the object received. Therefore,
we represent a sequential execution by its subsequence containing
only and all invocations in the execution. We call such sequences
of invocations histories.
When accessed concurrently, invocations and responses of different processes may be arbitrarily interleaved. The sequences of
invocations and responses that may be observed at the interface of
an object that is accessed concurrently are called traces. The occurrence of an invocation or a response in a trace is called an event, or
action.
Linearizability is defined with respect to a sequential specification, which is a description of the allowed behaviors of the object
when it is accessed sequentially. Intuitively, a trace is linearizable
if each invocation appears to take effect at a single point in time after the invocation call and before the corresponding response. This
intuitive definition can be restated as follows: A trace t is linearizable if we can associate, to each response r returned by a client c,
a history h (called commit history) such that:
1. The response r is the response obtained in the sequential execution represented by history h.
2. All invocations in the history h are invoked in the trace t before
the response r is returned.
3. The history h ends with the last invocation of client c.
4. For all commit histories h1 and h2 , either h1 is a prefix of h2
or h2 is a prefix of h1 .
This definition of linearizability is stated precisely in Section 4. In
the terminology of Section 4, condition 1 above says that history h
explains response r. Condition 1 is specific to a particular abstract
data type. Conditions 2 and 3 correspond to Validity, and 4 to Commit Order. They do not depend on the particular ADT considered.
We have proved [9] that this definition is equivalent to the original
definition of linearizability [12].
As an example, consider a trace of a consensus object such that
client c1 proposes value v1 , then client c2 proposes value v2 , then
client c2 returns v2 , and finally client c1 returns v2 . This execution
is linearizable because the returned values are as if the invocation
of client c2 was executed atomically before the invocation of client
c1 . To show this, associate the history [propose(v2 )] to the response
of c2 and the history [propose(v2 ), propose(v1 )] to the response of
c1 . It is easy to check that all four conditions above are satisfied.
Notably, when applied sequentially to a consensus object, both
histories lead to response v2 . By the definition above, the trace is
linearizable. Examples of traces that are not linearizable include:
[c1 proposes v1 , c2 proposes v2 , c1 decides v1 , c2 decides v2 ],
as well as:
[c1 proposes v1 , c1 decides v2 , c2 proposes v2 , c2 decides v2 ].
2.3
Speculative Linearizability
We now consider modular implementations composed of two speculation phases (the definition generalizes to any number of phases).
As in the composition of the Quorum and Backup algorithms, each
client process starts by executing the first phase and may switch to
the second phase using a procedure call, providing an argument that
we call a switch value along with its pending invocation. In the examples of this section we omit the pending invocation from switch
calls for the sake of conciseness.
We would like to reason about each speculation phase in isolation. For this purpose, we require that the switch values provided
by the clients when changing phase be the only information passed
from one speculation phase to the other. No side effects are permitted across speculation phases. Speculation phases are thus separated by a clear interface: a call to the phase’s switch procedure
with a switch value and a pending invocation as arguments.
Speculative linearizability is a property of speculation phases
that relates the switch events and the invocations received by the
speculation phase to the responses and switch events produced by
the same phase. Moreover, no assumption other that trivial wellformedness conditions is made on the received switch events and
invocations. This allows us to check that a phase is speculatively
linearizable by reasoning about the phase independently of other
phases it might be composed with.
Speculative linearizability is an extension of linearizability to
traces that contain switch calls. In the definition of linearizability
stated above, we associated to each response returned by a client
a history of inputs with certain properties. In the same spirit, in
addition to associating histories to responses returned, we now
associate to each switch call in the trace a history of inputs. With
respect to the first phase, we call these histories abort histories.
With respect to the second phase, we call them init histories.
Such histories associated to switch calls represent possible linearizations of the execution of the first speculation phase. The idea
is that, with the knowledge of how the execution of the first phase
was linearized, the second phase can continue execution without violating linearizability. In order for a speculation phase to deduce a
possible current linearization from a switch value, we suppose that
all speculation phases agree on a common mapping from switch
values to histories. We denote this mapping by rinit .
Depending on the particular data type being implemented, there
are some sets of histories whose members all bring the system in
the same logical state. In other words, the response to a new invocation is independent of which history of such a set was executed
before. We call histories in such sets equivalent with respect to the
data type. Consider our consensus example. Any set of histories
whose members start with the same proposal is a set of equivalent
histories: because the first proposal is the decision value, any new
invocation will return the first proposed value. Processes invoking
the object at a point after such a history has been executed cannot
tell which history in the set actually happened. Because of this observation, we require that speculation phases agree on a mapping
from switch values to sets of equivalent histories, as opposed to
a single history. This allows us to use a compact representation of
histories while ensuring that enough information is available for the
second phase to take over the execution.
Two other principles, fault tolerance and avoidance of unnecessary synchronization, influenced our definition. For fault tolerance,
we do not rely on just one client passing a single switch value. For
efficiency reasons, we avoid the requirement that all clients switch
with the same value because it would imply solving a potentially
expensive agreement problem. Hence, the switch values of different clients may be different and no single switch call determines
the execution of the second phase.
Applied to the first phase, speculative linearizability requires
that the first phase be linearizable. Moreover, for all traces of the
first phase, it must be possible to associate histories to the switch
events so that each such history is a possible linearization of the
trace up to the considered event and each association from switch
event to history respects the common mapping that phases agree
on. Finally, the longest common prefix of all abort histories must
be a full linearization of the trace of the first phase. The last two
requirements correspond to the Abort Order property of Section 5.
Applied to the second phase, speculative linearizability says that
for all traces t of the second phase, for all possible associations
of histories to switch values that respect the common mapping
agreed on, by concatenating t to the trace represented by the longest
common prefix of all init histories and by replacing switch calls
with the pending invocation they contain, one should obtain a
linearizable trace. This corresponds to the Init Order property of
Section 5.
Having defined the new concept of speculative linearizability,
we will show that the composition of any number of speculatively
linearizable phases is a linearizable algorithm. This result is a
corollary of our intra-object composition theorem, which states that
the composition of two speculation phases is itself a speculation
phase.
2.4
Applying Speculative Linearizability to Quorum+Backup
We next show that our message-passing speculation phases, Quorum and Backup, satisfy speculative linearizability. As explained
above, we view the values passed when switching from Quorum
to Backup as representing sets of histories equivalent to some lin-
earization of Quorum’s execution. To this end we assumed that
speculation phases agreed on a common mapping from switch
events to sets of histories. In this example, the mapping of a switch
event of client c with switch value v is the set of all histories starting with invocation propose(v) from a client c0 6= c and containing
only invocations from clients other than c. Hence, a client switching
to Backup with switch value v indicates to Backup that any history
in the set mapped to v is a possible linearization of Quorum’s execution.
Quorum is speculatively linearizable. We prove that Quorum is
speculatively linearizable in two steps. First, we show that Quorum
satisfies the following three invariants. Then we show that those
invariants imply speculative linearizability.
I1: If some client c decides value v then all clients that switch,
either before or after c decides, do so with value v.
I2: If some client c decides value v then all clients that decide do
so with value v.
I3: All clients that switch or decide do so with a value that was
proposed before they switch or decide.
In Quorum, if there are no faults nor contention, a client broadcasting a proposal will receive identical accept messages from all
server processes and will return the value contained in the accept
messages. However, if contention causes proposals from different
clients to be received in different orders at different servers, or if
a server fails or messages are lost, clients will switch to Backup
because different accept messages will be received or not enough
messages will be received before the timers expire.
Observe that if a client returns a value v in Quorum then all
servers will reply with accept(v) to all the other clients. This is
because a client only returns value v if all servers send it accept
messages with value v, and a server always responds with the same
accept message. Hence, if some client decided, all the other clients
will either return the same value v or will switch to Backup with the
initialization value v in case message loss or server crashes cause
their timers to expire. From these observation we can conclude that
the invariants I1 and I2 are satisfied by any trace of Quorum.
Moreover, as servers respond with an accept message containing the first proposal received, client processes return or switch with
a value that was proposed before. Hence, the invariant I3 holds.
We now prove that any trace satisfying I1, I2, and I3 satisfies
speculative linearizability. Consider a trace t of Quorum that satisfies I1, I2, and I3.
According to the definition of speculative linearizability for
the first phase, we begin by showing that t is linearizable. If no
client decides then the trace t is trivially linearizable. Therefore
assume that some clients decide. The invariant I2 implies that
all decisions in the trace are the same. Let v be the common
decision. The invariant I3 implies that some client, noted winner,
proposed v before any client decided v. Let the history h be such
that h starts with winner’s proposal and the sub-sequence of h
starting at position 2 is equal to the subsequence of t containing
all the proposals of the clients that decide and that are not winner.
The history h represents a linearization of trace t. We satisfy our
definition of linearizability by associating to each decision from a
client c the history h truncated just after the proposal of c. The
linearizability of Quorum follows.
To prove speculative linearizability, it remains to show that we
can associate appropriate histories to each switch event in t. We
consider two cases.
First, assume that some clients decided. Then, by invariant I1
we know that clients decide or switch with the same value v. Let
history h be as defined above. To each switch event in t we associate h. By definition of h, h starts with the common decision v
and all clients that switch do so with value v. Moreover the clients
that switch do not decide; their invocations thus do not appear in h.
Hence, the association from switch events to histories respects the
common mapping. Moreover, h is a linearization of trace t. Therefore, each history associated to a switch event is a linearization of
t. Finally, the longest common prefix of all histories associated to
switch events is obviously h itself, and h is a linearization of trace
t. We conclude that t satisfies speculative linearizability.
Now assume that no client decides. In this case the trace is
trivially linearizable. Moreover, it is easy to associate histories
to switch events, so that the longest common prefix of all such
histories is empty. We thus trivially have that t satisfies speculative
linearizability.
Note that a client that does not fail returns or switches at the
latest when its timer expires. Quorum is thus wait-free.
Backup is speculatively linearizable.
As for Quorum, we
proceed by first abstracting Backup using simple invariants and
then proving that those invariants imply speculative linearizability.
Since Paxos has been proved correct in the past [13], we can
trivially abstract Backup using the following two invariants:
I4 All clients decide the same value.
I5 All clients decide a switch value previously submitted by some
client.
Consider a trace t satisfying invariants I4 and I5. We consider
two cases. First assume that all switch values are the same and
equal to v. Then, by definition of the common mapping from switch
events to histories, for all associations of histories to switch events
respecting the common mapping, the longest common prefix h of
the histories associated to switch values is such that h starts with
an invocation propose(v) from a client c that doesn’t execute in t
and contains only invocations by clients that don’t execute in t. Let
th be the trace represented by history h. By invariants I4 and I5 we
know that all clients decide value v in t. Let t0 be the trace t where
switch calls are replaced by the pending invocation they contain.
Consider the concatenation th @t0 of th and t0 . Let the history h0
be the concatenation of h and some ordering of the proposals in
t0 . Because h represents th and because all clients decide v in
t, we have that h0 is a linearization of th @t0 . Hence, t satisfies
speculative linearizability.
Now suppose that at least two switch values are different. Then
for any association of histories to switch events respecting the common mapping, the longest common prefix of the histories associated to switch events is the empty history. From invariants I4 and
I5 we have that all processes decide on one of the switch values
submitted before any decision. Hence, t is linearizable, and so is
the concatenation of the empty history and t. Therefore, t satisfies
speculative linearizability.
We have proved that Quorum and Backup satisfy speculative
linearizability. Therefore, using the intra-object composition theorem that we will prove below, we conclude that the composition of
Quorum and Backup is a linearizable implementation of consensus.
Using speculative linearizability, we have optimized the Paxos
algorithm to obtain an algorithm that has a latency of two message
delays in the absence of faults and contention. Implementing this
optimization by modifying the Paxos algorithm would have meant
modifying a notoriously intricate distributed algorithm. Instead, by
using our framework, we were able to optimize Paxos without
modifying it. Moreover, thanks to the intra-object composition
theorem, we easily proved the optimized protocol correct just by
proving the optimization speculatively linearizable and by relying
on the correctness of Paxos.
2.5
Shared-Memory Consensus
To illustrate the broad applicability of our framework we now
apply it to an algorithm in the shared-memory model. We consider
a consensus implementation in an asynchronous shared-memory
system.
Consensus can be implemented on modern microprocessors
using the wait-free compare-and-swap (CAS) instruction, but this
instruction may be slower than an atomic register access. It has
been proved [11] that wait-free consensus cannot be implemented
with registers.
However, assuming that all executions are contention free (i.e.
sequential), Figure 1 implements consensus using only registers.
Hence the following question: is it possible to devise an object that
uses only registers in contention-free executions but that always executes correctly? We obtain such an object by composing a registerbased speculation phase called RCons, shown in Figure 2, and a
CAS-based speculation phase called CASCons, shown in Figure 3.
The CAS-based speculation phase is a straightforward implementation of consensus using the CAS operation. Both speculation phase
are inspired by [25]. It turns out that these speculation phases, like
many other speculation-based objects, are instances of our framework.
The register-based consensus uses a wait-free splitter algorithm
[19]. The splitter guarantees that at most one process returns with
true and all others return with false. Moreover, it guarantees that, in
the absence of contention, exactly one process returns with true. A
splitter can be implemented using only registers, as in our example
implementation in Figure 2.
A client invokes the register-based speculation phase using the
function propose(val), where val is the value that it proposes. To
simplify the notation, we assume that, at the time of the call, the
caller identifier is stored in the special variable c. The registerbased speculation phase, RCons, can return the content of register
V , or it can switch to the CAS-based speculation phase, CASCons,
by invoking the function switch-to-CASCons(V ) and returning its
result. In the latter case, we say that a client switches with value
v, where v is the content of register V when it was last read. Such
switching is the basic mechanism of composing speculation phases
in our framework.
As for Quorum and Backup, we will show that the composition of the phases RCons and CASCons form a linearizable implementation of consensus by a modular reasoning. Thanks to the
intra-object composition theorem, we will get the correctness of
the composition of RCons and CASCons by analyzing each in isolation.
Let us now show that the RCons and CASCons algorithms are
speculatively linearizable.
As for Quorum and Backup, we will first consider RCons in
isolation and, given an arbitrary execution of RCons, we will show
that we can associate appropriate histories to the responses returned
and to the switch calls. We will then consider CASCons in isolation
and show that, given an arbitrary execution of CASCons and an
arbitrary association of histories to the switch calls of CASCons
(respecting the common mapping), we can associate appropriate
histories to the responses appearing in the trace.
We first show that the invariants I1, I2, and I3 defined for Quorum also hold of the traces of RCons. To establish these invariants,
we first observe the following: By property of the splitter, at most
one client executes lines 15 to 18. Moreover, if one client c does so
and returns, it makes sure that other clients will either return with
its value at line 12 or switch with its value at line 27. Indeed, if the
variable Contention is false at line 16, it means that no process got
past line 23. Therefore, all processes that switch will find c’s value
in variable V at line 25. We conclude that the invariants I1 and I2
hold on all executions of RCons:
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
35:
36:
Object RCons
//Shared Register V , Initially ⊥; and D, Initially ⊥
//Shared Register Contention ∈ {true, false}, Initially false
//Shared Registers: Y ∈ {true, false}, Initially false; and X
//Local Variable v; local constant c denoting the caller’s identifier
Function propose(val):
v ← val
if D 6= ⊥ then
return D
end if
if splitter() = true then
V ←v
if ¬Contention then
D←v
return v
else
return switch-to-CASCons(v)
end if
else
Contention = true
if V 6= ⊥ then
v←V
end if
return switch-to-CASCons(v)
end if
Function splitter():
X ← c //Remember that c is the caller’s identifier
if Y = true then
return false
end if
Y ← true
if X = c then
return true
else
return false
end if
Figure 2. Register-Based Speculative Consensus
Object CASCons
//Shared Register D Initially ⊥
Function switch-to-CASCons(val):
return CAS(D, ⊥, val)
Function propose(val):
//Since processes have to call switch-to-CASCons first, we know that
the consensus has already been won, hence just return D.
7: return D
1:
2:
3:
4:
5:
6:
Figure 3. CAS-Based Speculative Consensus
I1: If some client c returns value v then all clients that switch, either
before or after c returns, do so with value v.
I2: If some client c returns value v then all clients that return do so
with value v.
Next, observe that all clients return or switch with their own value
or with the content of register V , which can only contain proposed
values. Consequently, the invariant I3 is also an invariant on all
executions of RCons:
I3: All clients that switch do so with a value that was proposed
before they switch.
Similarly, the CASCons phase can be abstracted by the same
invariants that we used to abstract the Backup speculation phase:
I4 All clients return the same value.
I5 All clients return a switch value previously submitted by some
client.
When we proved that Quorum and Backup are speculatively
linearizable, we showed that the invariants I1, I2, and I3 imply
speculative linearizability of the first phase and that the invariants
I4 and I5 imply the speculative linearizability of the second phase.
Since we established that the traces of RCons satisfy I1, I2, and I3
and that the traces of CASCons satisfy I4 and I5 we can conclude
that RCons and CASCons satisfy speculative linearizability.
We will now formally define speculative linearizability.
3.
Trace Properties
In this section we define trace properties and the operations they
support. Trace properties are our model of distributed systems.
A trace property is a set of finite sequences of events. We use
trace properties to describe the set of all finite sequences that a
system can generate, as well as the desired properties of systems,
such as linearizability with respect to a given abstract data type. We
restrict ourselves to finite sequences because our work deals with
safety properties only.
Sequences.
We write [1..n] for the set {1, 2, . . . , n}. A sequence s of length n with elements from a set E is a function
s : [1..n] → E. We denote n by |s|. We write E ∗ for the
set of all sequences of elements in E. We write [e1 , e2 , . . . , en ]
for the sequence s such that s(1) = e1 , s(2) = e2 , . . . , and
s(n) = en . If s = [e1 , e2 , . . . , en ] then s::en+1 is the sequence [e1 , e2 , . . . , en , en+1 ]. If s = [e1 , e2 , . . . , en ] and s0 =
[e01 , e02 , . . . , e0n ] then s:::s0 = [e1 , . . . , en , e01 , . . . , e0n ] and, if m <
|s|, we write s|m for the the sequence [e1 , e2 , . . . , em ]. We say that
a sequence s is a prefix of a sequence s0 iff there exists a sequence
s00 such that s0 = s:::s00 . We say s is a strict prefix of s0 iff the
s00 is non-empty. Given a set of sequences P , the longest common
prefix of set P is the longest among the sequences s0 such that for
all sequence s ∈ P , s0 is a prefix of s.
Multisets.
We represent multisets of elements of set E by a
multiplicity function E → N. Given two multisets m1 and m2
and e ∈ E, we define (m1 ∪ m2 )(e) = max(m1 (e), m2 (e)) and
(m1 ] m2 )(e) = m1 (e) + m2 (e). Moreover m1 ⊆ m2 iff for
all e ∈ E, m1 (e) ≤ m2 (e). Let elems : E ∗ → (E → N) be
a function that given a sequence returns a multiset representing all
the elements found in the sequence and their number of occurences.
Given a sequence s we write e ∈ s iff elems(s)(e) > 0.
Trace Properties.
A trace is a sequence of actions which
represents events happening at the interface between a system and
its environment. An event occurs at some point in time and has no
duration. We can approximate invoking a method and starting to
execute it as one action. The execution of a method from start to
finish, however, is typically not an action because it has a positive
duration in time. We would represent it with an invocation action
and a return action.
We classify actions into input and output actions. This classification is especially important to define composition of systems,
because we need to know which actions of a component affect another component. The classification is described by a signature. A
signature sig is a pair (in, out) where in is a set of input actions
and out is a disjoint set of output actions. We require in and out to
be disjoint. We denote by acts(sig) the set of all actions in signature
sig. When all the actions of a trace t belong to acts(sig) we say that
t is a trace in sig.
Definition 1. A trace property is a pair P = (sig, Traces) where
sig is a signature and Traces is a set of traces in acts(sig).
We denote sig by sig(P ) and Traces by Traces(P ). We allow to
write in(P ) instead of in(sig(P )) and similarly for the other sets
of actions defined above.
We say that a trace property Q satisfies a trace property P ,
denoted Q |= P , if Q’s signature is equal to P ’s signature and
the set of trace of Q is a subset of the set of traces of P :
Q |= P ⇔
(sig(Q) = sig(P ) ∧ Traces(Q) ⊆ Traces(P )) .
Composition of Trace Properties.
Trace properties may be
composed only if their signatures are compatible. Signatures sig1
and sig2 are compatible if and only if they have no output actions in
common, that is out(sig1 ) is disjoint from out(sig2 ). As a result,
out(sig1 ) ∩ acts(sig2 ) ⊆ in(sig2 ) and vice versa. We define the
projection proj(t, A) of a trace t onto a set A of actions as the
sequence obtained by removing all actions of t that are not in A.
For example, proj([x, y, x0 , z, y 0 , z, y, z, y], {x0 , y 0 }) = [x0 , y 0 ].
Definition 2. If sig1 is compatible with sig2 then the composition
P = P1 ||P2 is such that:
• The signature sig(P ) is such that in(P ) = (in(P1 )∪in(P2 ))\
(out(P1 ) ∪ out(P2 )) and out(P ) = out(P1 ) ∪ out(P2 ).
• The set of traces Traces(P ) is the largest set of traces in acts(P )
such that if t ∈ Traces(P ) then proj(t, acts(P1 )) ∈ Traces(P1 )
and proj(t, acts(P2 )) ∈ Traces(P2 )
Intuitively, composition requires components to execute at the
same time the actions that appear as input of one component and
output of the other. Actions that appear in a unique component are
executed asynchronously from the other components. Composition
has the following property:
Property 1. If Q1 |= P1 and Q2 |= P2 then Q1 ||Q2 |= P1 ||P2 .
Projection of Trace Properties.
Definition 3. Given a trace property P we define its projection
proj(P, A) onto a set of actions A such that sig(proj(P, A)) =
(in(P ) ∩ A, out(P ) ∩ A) and Traces(proj(P, A)) = {t | ∃t0 ∈
Traces(P ).t = proj(t0 , A)}
Applying the proj operator to a trace property P amounts to
removing all the actions not in A from P ’s interface.
4.
Linearizability
Our notion of speculative linearizability is itself based on a new
definition of linearizability. This definition is interesting in its own
right because it enables a more local form of reasoning than the
original definition. Moreover, it allows for repeated events, which
are the norm in practice. The technical report [9] contains a proof
that our new definition of linearizability is equivalent to the classical definition.
4.1
Abstract Data Types
Abstract Data Types (ADTs) formalize our intuitive understanding
of how sequential objects may behave.
In order to simplify our discussion in the rest of the paper, we
deviate from the usual, state-machine based, definition of abstract
data types. However we do so without loss of generality. Traditionally, the allowed behaviors are described using an ad-hoc state
machine that given a state and an input transitions to a new state
and produces an output. Instead, we determine outputs from the input history seen so far using an output function. Computation of
the output function amounts to replaying the execution of the statemachine description.
Definition 4. An ADT T is a tuple T = (IT , OT , fT ) where IT
is a non-empty set, whose members we call inputs, OT is a disjoint
non-empty set whose members we call outputs, and fT : IT∗ → OT
is an output function.
In the rest of the paper, we consider objects of some fixed ADT
T = (IT , OT , fT ).
Example 1. We consider a consensus object with operations of
the form (p(v), d(v 0 )) with v and v 0 belonging to some set VCons .
p(v) is a shorthand for “propose(v)” and d(v) for “decide(v)”.
The consensus object must guarantee that a unique value v may
be decided and that at any point a decided value has been proposed before. Hence, because we are considering sequential executions, the first proposed value must be decided in all subsequent operations. Formally, we define the ADT Cons such that
ICons = {p(v) | v ∈ VCons }, OCons = {d(v) | v ∈ VCons },
fCons ([p(v1 ), p(v2 ), . . . , p(vn )]) = d(v1 )
4.2
A Trace Model of Concurrent Objects
We consider a set C of asynchronous sequential processes called
clients. Clients use a concurrent object of ADT T by calling the
object’s invocation function with one of the ADT’s inputs, and the
invocation function returns one of the ADT’s outputs. We denote a
sequence of interaction between clients and object as follows. Suppose that the following sequence of interactions between clients
and object is observed: Client c1 invokes input in1 ; Client c2 invokes input in2 ; Client c2 returns with output out2 ; Finally client c1
returns with output out1 . This sequence of interactions is denoted
by the sequence of actions
[inv(c1 , in1 ), inv(c2 , in2 ), res(c2 , in2 , out2 ), res(c1 , in1 , out1 )]
Signature of a Concurrent Object
Formally, a sequence of
interactions between a concurrent object and its clients is a trace
of actions in the signature sigT , where the inputs in(sigT ), called
invocation actions, are of the form inv(c, n, in) and the outputs
out(sigT ), called response actions, are of the form res(c, n, in, out)
for some client c, natural number n, in ∈ IT , and out ∈ OT .
The reader will notice that the second parameter of the actions is
not present in our preceding example. It will be used in Subsection
5 to define the signature of a speculation phase.
In Section 4 we define the trace property LinT such that a
concurrent object described by a trace property O is linearizable
w.r.t. T if and only if O ⊆ LinT .
Underscore Notation. Throughout the paper we will use the underscore symbol “ ” in mathematical expressions. Each occurrence of
this symbol is to be replaced by a fixed fresh variable. For example,
we might say an action a is an invocation action if a = inv( , , ),
or equivalently, that if a = inv(c, n, in) for some c, n, and in, then
a is an invocation action.
4.3
A New Definition of Classical Linearizability
In this section we define a trace property called linearizability, and
noted LinT . Informally, a trace in sigT is said to be linearizable iff
its actions may be reordered to form a sequential trace that:
1. Respects the semantics of the ADT T .
2. Preserves the order of non-overlapping operations.
Property 1 implies that clients accessing a linearizable object cannot tell whether the object produces sequential or concurrent traces.
This is a crucial property because it hides concurrency from the
view of application developers. Property 2 makes linearizability a
local property. In other words, a system composed of linearizable
objects is itself linearizable. We refer the reader to [12] for an indepth discussion about linearizability.
Instead of using the definition above we propose a new definition of linearizability that will simplify our task when we define
speculation phases. It directly establishes a relationship between
the linearizable traces and the ADT, without the intermediary of
sequential traces.
Note that some definitions of linearizability [12] assume more
or less explicitly that all inputs submitted are unique. This seems
unwarranted as many algorithms qualified as implementing a linearizable object are deemed correct without requiring this assump-
tion. Our definition does not rely on this assumption and it coincides with the other definitions on traces satisfying the assumption.
Consider a trace t in sigT .
Definition 5. Trace t is linearizable iff it is well-formed and it
admits a linearization function g : [1..|t|] → IT∗ .
We define linearization functions and well-formed traces below.
To ensure that our definition corresponds to the usual one, in
the technical report [9] we also formalize the usual definition of
linearizability [12], by defining when a trace is linearizable∗ , even
in case of repeated events. We then prove the equivalence between
the two definitions.
Theorem 1. A trace in sigT is linearizable if and only if it is
linearizable∗ .
4.4
Linearization Functions
In the rest of the paper we call sequences of inputs histories. A
linearization function maps the indices of t to histories which must
satisfy two properties:
First, applying the ADT’s output function to the history associated with a response index must yield the output contained in the
response. Hence a linearization function may be seen as giving an
explanation of the outputs observed, in term of the ADT’s semantics.
Second, the history associated to a response index must only
contain inputs invoked before index i and for all pairs of histories
corresponding to response indices, one is a strict prefix of the
other. This second condition gives us a total order on the response
indices of t (the prefix order on histories) which corresponds to the
reordering needed in the original definition.
A history corresponding to some response index i in a trace t
according to a linearization function is said to be a linearization of
the (concurrent) trace t|i .
We now state this definition formally:
Definition 6. Linearization Function. A function g is a linearization function for t iff
• Function g explains trace t.
• Function g and trace t satisfy the validity(t,g) and commitorder(t,g) predicates.
Definition 7. Explains. We say that function g explains trace t
iff for all index i ∈ [1..|t|], if t(i) = res( , , , out) then out =
fT (g(i)).
Consider a function g : [1..n] → IT∗ that explains trace t.
Definition 8. Commit Index and Commit Histories. If i ∈ [1..n]
is such that t(i) = res( , , , ) then we say that i is a commit index
and that g(i) is a (t, g)-commit history.
Definition 9. Sequence of Previous Inputs. We define the sequence of previous inputs at i in t, inputs(t, i), as the sequence of all inputs submitted before index i in trace t. Formally, for all i ∈ [1..|t|], |inputs(t, i)| = |proj(t|i , in(sigT )|,
and for all j ∈ [1..|inputs(t, i)|], proj(t|i , in(sigT ))(j) =
inv( , , inputs(t, i)(j)).
Definition 10. Valid Commit Index For all index i ∈ [1..|t|]
such that t(i) = res( , , in, ), we say that i is (t, g)-valid iff
elems(g(i)) ⊆ elems(inputs(t, i)) and the input in is the last
element of the history g(i).
In other words, if t(i) = res( , , in, ) then g(i) is a permutation
of the sequence of previous inputs at i and ends with the input in.
We now define validity and commit-order:
Definition 11. Validity(t,g). All commit indices in t are (t, g)valid.
Definition 12. Commit Order(t,g): For all pairs of distinct commit
indices (i, j), either g(i) is a strict prefix of g(j) or g(j) is a strict
prefix of g(i).
Example 2. Consider the following trace t:
0
0
[inv(c, in1 ), inv(c , in2 ), res(c , in2 , fT ([in2 ])), res(c, in1 , fT ([in2 , in1 ]))].
Consider a function g such that g(3) = [in2 ] and g(4) = [in2 , in1 ].
The function g explains t. Moreover t and g satisfy validity and
commit-order. Hence g is a linearization function for t.
4.5
Well-Formed Traces
The well-formedness of traces is a basic requirement. For example,
it states that a response should be given to a client only if it
previously issued an invocation.
Given a client c, let ActT (c) be the union of the sets
{inv(c, , in)|in ∈ IT } and {res(c, , in, out)|in ∈ IT ∧ out ∈ OT }.
Definition 13. Client sub-trace. The client sub-trace sub(t, c) of
trace t in sigT is the trace sub(t, c) = proj(t, ActT (c)).
Definition 14. Well-formed client sub-trace. A client sub-trace t
is well-formed iff t(1) is such that t(1) = inv( , , ) and for all
i ∈ [1..|t| − 1], if t(i) = inv( , , in), then t(i + 1) = res( , , in, ).
Definition 15. Well-formed traces. We say that trace t in sigT is
well-formed if and only if all its client sub-traces are well-formed.
Note that invocations with no corresponding response may appear in a well-formed trace. We say that those invocations are pending.
4.6
Speculative Linearizability
We next present our main results: a precise definition of speculative
linearizability, and the proof that a composition of implementations
that are speculatively linearizable implementations is itself speculatively linearizable. This section presents a trace-based formalization. We summarize an alternative, automata-based, formalization
in Section 6.
5.1
[inv(c1 , 1, in1 ), inv(c2 , 1, in2 ), swi(c2 , 2, in2 , v),
res(c1 , 1, in1 , out1 ), res(c2 , 2, in2 , out2 )]
Instead of formalizing the signature of a single speculation
phase identified by a natural number we generalize to the signature
of a composition of speculation phases. We now assume that a
speculation phase may be composed of several other speculation
phases.
Let N2< = {(m, n) | (m, n) ∈ N2 ∧ m < n}. We denote
a speculation phase by an element of N2< . A speculation phase
(m, n) ∈ N2< describes the possible behaviors of the composition
of speculation phases from m to n.
We now explain why we make this generalization.
We would like to define a property P (n) of speculation phase
n, for n ∈ N, such that for all m ∈ N the composition
P (1)||P (2)|| . . . ||P (m) implements LinT . To prove that
P (1)||P (2)|| . . . ||P (m) |= LinT
we generalize P (n) to P (m, n) in order to use induction.
We would like P (m, n) to be such that:
For all natural number n > 1, P (1, n) |= LinT
For all (m, n) ∈
Speculation Phases
We now consider concurrent objects implemented by several speculation phases. Clients use a speculation phase by calling the object’s
invocation functions and the invocation functions return outputs to
the clients. However, a speculation phase also accepts initialization
calls from clients and it may switch to another speculation phase.
Moreover, each speculation phases is identified by a natural
number and a speculation phase identified by n may only switch
to a speculation phase identified by n + 1. Clients start by accessing speculation phase 1 and continue to do so as long as they return
in the phase 1. However, speculation phase 1 may switch to phase
2. In this case it passes an initialization value and its pending input
to phase 2. A client which returns in phase 2 stops accessing speculation phase 1 and instead uses phase 2 for its next invocations.
Let S1 and S2 be two consecutive speculation phases. Suppose
that client c1 invokes input in1 on S1 ; client c2 invokes input in2 on
S1 ; client c2 switches from S1 to S2 providing initialization value
v; client c1 returns response out1 from S1 ; finally client c2 returns
response out2 from S2 . This sequence of interactions is denoted by
N2<
and (n, o) ∈
(1)
N2< ,
P (m, n)||P (n, o) |= P (m, o)
(2)
Using induction and equation 2 we have that for all n > 1,
P (1, 2)||P (2, 3)|| . . . ||P (n − 1, n) |= P (1, n).
The LinT Trace Property
We define the trace property LinT such that the signature sig(LinT )
is sigT and the set Traces(LinT ) is the set of all traces satisfying
linearizability.
Finally, consider a system S whose behavior is denoted by a
trace property P . We say that the system S implements the ADT T
if and only if proj(S, sigT ) |= LinT .
5.
this sequence of events, from now on called actions:
Thus, with equation 1, we have that for all n > 1,
P (1, 2)||P (2, 3)|| . . . ||P (n − 1, n) |= LinT
(3)
Now suppose that we have a family of trace properties
Q1 , Q2 , . . . , Qn , representing implementations of speculation
phases, such that Qi |= P (i, i + 1) for all i ∈ [1..n]. Let
Comp = Q1 ||Q2 || . . . ||Qn . By property 1 and equation 3, we
have that Comp |= LinT .
Signature of a Speculation Phase We now define a signature
sigT (m, n, Init) which models the interface of a speculation phase
(m, n) that uses initialization values from the set Init. For all
m ∈ N and any arbitrary non-empty set Init of initialization values,
we define the set of switch actions as formed of all actions of the
form swi(c, m, in, v) where c is a client, m ∈ N, in ∈ IT , and
v ∈ Init.
Definition 16. Signature of a speculation phase. For all (m, n) ∈
N2< and any non-empty set Init of initialization values we define the
signature sigT (m, n, Init) as the union of the sets of all invocation
actions inv( , o, ), of all response actions res( , o, , ), and of all
switch actions swi( , o, , ), where o ∈ [m..n].
5.2
Specification of a Speculation Phase
In this section we propose a specification for speculation phases.
We consider a set of initialization values Init, and a relation rinit ∈
−1
Init × IT∗ such that rinit
is a total onto function. The relation rinit
associates a set of input sequences to any value in Init. Remember
that the init values convey information about the execution of one
phase to the next phase. We think of this information in terms of input histories which represent possible linearizations of the concurrent trace of a phase. We say the set of input sequences associated
by rinit with a value is the value’s possible interpretations.
We will now define a trace property SLinT (m, n), such that
• For all natural number m, proj(SLinT (1, m), acts(sigT )) |=
LinT .
all (m, n)
∈
N2< and (n, o)
∈
N2< ,
SLinT (m, n)||SLinT (n, o) |= SLinT (m, o).
We define the property SLinT (m, n) in the same vein as LinT .
Consider a trace t in sigT (m, n, Init).
Definition 17. Interpretation of Init Actions. We say that a function f : [1..|t|] → I ∗ is an interpretation of the init actions
of t iff for all index i ∈ [1..|t|], if t(i) = swi( , m, , v) then
(v, f (i)) ∈ rinit .
Definition 18. Interpretation of Abort Actions. We say that a
function f : [1..|t|] → I ∗ is an interpretation of the abort actions
of t iff for all index i ∈ [1..|t|], if t(i) = swi( , n, , v) then
(v, f (i)) ∈ rinit .
Definition 19. Speculatively Linearizable. We say that trace t is
(m, n)-speculatively linearizable iff trace t is (m, n)-well-formed
and for all interpretation finit of the init actions of t, there exists
two functions g : [1..|t|] → IT∗ and fabort : [1..|t|] → IT∗ such that:
• fabort is an interpretation of the abort actions of t.
• g is an (finit , fabort , m, n)-speculative linearization function for
t.
• For
5.3
Speculative Linearization Functions
Intuition Behind the Definitions. Consider a speculation phase S1
whose clients switch to speculation phase S2 . We require clients to
switch with a value whose set of possible interpretations includes
a possible linearization of the concurrent trace at the point where
the client switches. Note that every history (1) of which all commit
histories are prefix and (2) that contains only values that have been
invoked is a possible linearization. This is the intuition behind the
definition of Abort-Order property (Definition 32).
To ensure that phase S2 continues executing consistently with
what phase S1 did, we require all its commit histories to have as
a prefix one of the possible linearizations of S1 ’s trace. This is the
intuition behind the Init Order property (Definition 31). However
S2 receives switch values that represent sets of histories, and cannot determine which history in this set is a possible linearization.
Hence the quantifier alternation in Definition 19: for all possible
interpretations of the init values, phase S2 needs to ensure the existence of a proper linearization function.
Definition 20. Speculative linearization function. A function g :
[1..|t|] → IT∗ is a (finit , fabort , m, n)-speculative linearization function for t iff:
• The function g explains trace t.
• The predicates Validity(t, g, finit , fabort ), Commit-Order(t, g),
Init-Order(t, g, finit , fabort ), and Abort-Order(t, g, fabort ) hold.
Definition 21. Explains. We say that the function g explains trace
t iff for all index i ∈ [1..|t|], if t(i) = res( , , , out) then out =
fT (g(i))
Consider a function g : [1..|t|] → IT∗ that explains trace t and
consider two functions finit ⊆ rinit and fabort ⊆ rinit . In order to
define validity, commit-order, init-order, and abort-order we need
the definitions 22 to 28 below.
Definition 22. Commit Indices and Commit Histories. If i ∈
[1..n] is such that t(i) = res( , , , ) then we say that i is a
(t, m, n)-commit index and that g(i) is a (t, g)-commit history.
Definition 23. Init Indices and Init Histories. If i ∈ [1..n] is such
that t(i) = swi( , m, , v) we say that i is a (t, m, n)-init index
and that finit (i) is a (t, m, n, finit )-init history.
Definition 24. Abort Indices and Abort Histories. If i ∈ [1..n] is
such that t(i) = swi( , n, , v) we say that i is a (t, m, n)-abort
index and that fabort (i) is a (t, m, n, fabort )-abort history.
The multiset ivi(m, t, finit , i) contains all the inputs known to
to have been invoked in phases (k, l), where l ≤ m:
Definition 25. Initially Valid Inputs.
ivi(m, t, finit , i) =
[
{elems(finit (v)) ∪ {in} | ∃j < i.t(i) = swi( , m, in, v)}.
S
Note that, in the previous definition,
is the multiset union
(defined in Section 3) and corresponds to pointwise maximum of
representation functions.
Now recall that inputs(t, i) is the sequence of inputs that were
invoked before i in t.
Definition 26. Valid Inputs. We define
vi(m, t, finit , i) = ivi(m, t, finit , i) ] elems(inputs(t, i))
Definition 27. Valid Commit Index. For all index i such that
t(i) = res( , , in, ), we say that i is (t, g, finit )-valid if and only if
elems(g(i)) ⊆ vi(m, t, finit , i) and the input in is the last element
of history g(i).
Definition 28. Valid Abort Index. For all index i such that t(i) =
swi( , n, in, v), we say that i is (t, finit , fabort )-valid if and only if
elems(fabort (v)) ∪ {in} ⊆ vi(m, t, finit , i).
We now define validity, commit-order, init-order, and abortorder:
Definition 29. Validity(t, g, finit , fabort ): All (t, m, n)-commit indices are (t, g, finit )-valid and all (t, m, n)-abort indices are
(t, finit , fabort )-valid.
Definition 30. Commit Order(t,g): For all pair of distinct commit
indices i, j, either g(j) is a strict prefix of g(i) or g(i) is a strict
prefix of g(j).
Definition 31. Init Order(t, g, finit , fabort ): The longest common
prefix of all (t, m, n, finit )-init histories is a strict prefix of all (t, g)commit histories and of all (t, m, n, fabort )-abort histories.
We assume that the longest common prefix of an empty set of
histories is the empty history.
Definition 32. Abort Order(t, g, fabort ): All (t, g)-commit histories
are prefix of all (t, m, n, fabort )-abort histories.
Note that if m = 1, then t has no init histories. Hence Init Order
does not constrain commit and abort histories.
5.4
Well-Formed Traces
The well-formedness condition defines basic requirements on
traces of a concurrent object.
Given a client c let ActT (c, m, n) be the set of actions such that
ActT (c, m, n) = {inv(c, o, in)|o ∈ [m..n] ∧ in ∈ IT }
∪{res(c, o, in, r)|o ∈ [m..n] ∧ in ∈ IT ∧ r ∈ OT }
∪{swi(c, o, in, v)|o ∈ {m, n} ∧ in ∈ IT ∧ r ∈ OT ∧ v ∈ Init}
Note that the switch actions whose second parameter is not m or n
are projected away.
Definition 33. Client sub-trace. The (m, n)-client-sub-trace of the
trace t is the trace sub(t, m, n, c) = proj(t, ActT (c, m, n)).
Definition 34. Well-formed client sub-trace. A (m, n)-client subtrace tc is well-formed iff it is empty or:
• For all i ∈ [1..|tc | − 1], if t(i) = inv( , , in) or t(i) =
swi( , m, in, ), then t(i + 1) = res( , , in, ) or t(i + 1) =
swi( , n, in, ).
• If tc contains an abort action then it is the last element of tc .
• If m 6= 1 then tc (1) is an init action and there are no other init
actions in tc .
• If m = 1 then tc (1) is an invocation and there are no init
actions in tc .
Definition 35. Well-formed traces. We say that the trace t is
(m, n)-well-formed if and only if all its (m, n)-client sub-traces
are well-formed.
5.5
The SLinT Trace Property
We define the trace property SLinT (m, n) as follows:
Definition 36. The SLinT (m, n) is such that the signature
sig(SLinT (m, n))
is
sigT (m, n, Init)
and
the
set
Traces(SLinT (m, n)) is the set of all traces satisfying (m, n)speculative linearizability.
Finally, consider a system S whose behavior is denoted by a
trace property P . We say that the system S is a (m, n)-speculation
phase if and only if proj(P, sig(m, n)) |= ALin(m, n).
Theorem 2. For all natural number m, any initialization set Init,
and any initialization relation rinit ,
proj(SLinT (1, m), acts(sigT )) = LinT
Proof. Compared to linearizability, speculative linearizability only
adds constraints on the actions that are not in the signature sigT .
5.6
Intra-Object Composition Theorem
Our main theorem states that the composition of two speculation
phases is also a speculation phase.
Theorem 3. Suppose that S1 |= SLinT (m, n) and S2 |=
SLinT (n, o), then proj(S1 ||S2 , sigT (m, o, Init)) |= SLinT (m, o).
The proof of this result in the technical report [9]. It consists of
two pages in the current format. Among the key steps is the construction of a speculative linearization function g that explains the
traces of the composition of two phases. The function is constructed
by merging two speculative linearization functions. To show that
it is indeed a speculative linearization function we use a form of
transitive reasoning to relate commit histories between different
phases through abort histories. Among the properties of speculative linearizability, Validity is the most difficult to show, in part because it is sensitive to the fact that we allow duplicate events in our
traces (something that even most existing developments of classic
linearizability ignore).
6.
Automata Specification of Speculative
Linearizability and Isabelle/HOL Proof
We have also formalized in Isabelle/HOL an executable version of
the specification of speculative linearizability. Our formalization is
based on the theory of I/O-automata [21], which is formalized in
Isabelle’s IOA framework [22]. Our specification automaton corresponds to speculative linearizability instantiated for State Machine Replication [16] protocols (abbreviated SMR protocols). We
wrote a mechanically-checked proof of the intra-object composition theorem for the automata specification. Thanks to this proof,
it is now possible to obtain mechanically-checked proofs of speculative SMR protocols from the mechanically-checked proofs of its
speculation phases taken in isolation. Moreover, our proof of the
intra-object composition theorem shows that refinement proofs in
our framework are practical.
The speculative approach to SMR protocols has been shown
to yield some of the most efficient SMR protocols in practice [8].
Therefore, our formalization and the proof of the intra-object composition theorem enable for the first time scalable mechanicallychecked proofs of practical SMR protocols. A proof document and
the corresponding Isabelle theories are available at the Archive of
Formal Proofs [10].
Informal Description of the Automata Specification. The automata specification is a formalization of speculative linearizability
for an ADT that we call the universal ADT. The output function of
the universal ADT is the identity function. In other words, this ADT
responds to an invocation with its full trace, in the form of a history. The universal ADT can be used as an abstraction for generic
SMR protocols [8] because, given a linearizable implementation, it
suffices to apply the output function of another ADT A to the responses in order obtain an implementation of A. We formalize the
case where histories of inputs are used as initialization values and
where the relation rinit maps a history h to the singleton set {h}.
The specification automaton can be seen as an implementation of
speculative linearizability in which all clients reside on a unique,
centralized, process.
The automaton receives invocations and switch calls as inputs
and produces responses and switch call outputs.
The specification automaton maintains a state with the following components:
• A history hist, representing the longest linearization made visible to a client (i.e. the longest commit history);
• for each client c, a phase phase(c) ∈ { Sleep, Pending, Ready,
or Aborted };
• for each client c, pending(c), the last input submitted by c;
• a set InitHists containing the init histories received;
• two booleans: aborted and initialized.
We say that an input is pending if it is the last submitted input of a
client c whose phase is “Pending” and if it is not present in hist.
The automaton starts in a state where hist is the empty history,
InitHists is the empty set, aborted and initialized are set to false,
and for all clients c, phase(c) = “Sleep”.
We now describe how the automaton reacts to the inputs it
receives, namely invocations and switch calls from the previous
phase. When receiving a switch call from the previous phase by
client c, if phase(c) equals “Sleep” then the automaton adds the
provided init history to the set InitHists and sets pending(c) to
the provided input. When receiving a new invocation by client c,
if phase(c) equals “Ready”, the automaton sets pending(c) to the
input received. In both cases phase(c) is then set to “Pending”.
This denotes the fact that client c is waiting for a response to input
pending(c).
Moreover, the automaton nondeterministically performs one of
the following actions:
A1 If initialized = false and if there is at least one client c such
that phase(c) is not “Sleep”, then the automaton computes the
longest common prefix of the histories in InitHists, assigns it to
hist, and sets initialized to true.
A2 It selects a pending input, say from client c, appends it to hist,
emits an output for client c containing the new value of hist as
response, and sets phase(c) to “Ready”.
A3 It sets aborted to true.
A4 If aborted = true then a client c such that phase(c) is not
“Aborted” is selected is phase(c) is set to “Aborted”. Moreover,
the automaton emits a switch action containing an abort value
h0 such that hist is a prefix of h0 and the inputs of h0 which are
not in hist are pending.
The precise definition of our specification automaton is available in the ALM entry [10] of the Archive of Formal Proofs.
Relation to the Trace Based Specification.
The trace-based
specification requires associating histories to responses (commit
histories) and switch actions (init and abort histories). Because our
ADT’s output function is the identity and because rinit (h) = {h},
we have no choice but to associate to a response or switch action
the very history it contains (remember that outputs and switch
values are histories in this section). Hence commit histories are
obtained by truncating hist at a pending request. Since hist grows by
appending to it, we have that Commit Order is satisfied. Moreover,
abort histories are obtained by appending some pending requests to
hist, and at this point hist does not grow anymore. Hence Abort
Order holds. Since hist is initialized with the longest common
prefix of the init histories seen, Init Order also holds. Finally, hist
grows by appending pending requests to it, Validity is also satisfied.
To gain more insight into speculative linearizability, one may
also observe that any extension of history hist with some pending
requests is a linearization of the current trace. Thanks to this observation, step A2 may be interpreted as selecting a possible linearization and producing an output that realizes it. Step A4 can be seen
as passing one of the possible linearizations to the next speculation
phase. Finally, in step A1, the automaton computes the weakest
(given the init histories it received) under-approximation of the set
of possible linearizations of the previous phase and selects one of
them to initialize its execution.
Proof of the Intra-Object Composition Theorem. To prove
the intra-object composition theorem in this formulation, we construct a refinement mapping [20] between the composition of two
speculation phases and a single speculation phase. We prove the
refinement mapping correct with the help of 15 state invariants
about the composed automaton. Using the meta-theorem about refinement mappings (included in the IOA theory packaged with Isabelle/HOL), we conclude that the set of traces of the composition
of two speculation phases is included in the set of traces of a single
speculation phase. The proof is written in the structured proof language Isar [26] and consists of roughly of 500 proof steps. With the
specification, it forms a total of 1600 lines of Isabelle/HOL code.
Our automata specification can be used as the basis for
mechanically-checked refinement proofs of distributed protocols.
Our proof of the composition is a good example of such a refinement proof and shows the practicality of the approach.
7.
Concluding Remarks
We have presented speculative linearizability, an extension of the
theory of linearizability. Our extension allows us to implement
linearizable objects by composing independently devised speculation phases that are optimized for different situations. This form
of intra-object composition complements the classical concept of
inter-object composition, inherent to linearizability in its traditional
form. We propose a formalized framework that enables, for the first
time, scalable mechanically-checked development of linearizable
objects of arbitrary abstract data types.
Our work can be viewed as generalization of [1] and [2]. The
former work focused on contention-free executions and did not
address composition: a protocol that aborts simply stops executing
with no possibility of switching. In the latter work, a classification
of the trace properties that are switchable is proposed. A global
synchronization is required for switching. In contrast, our notion of
speculative linearizability does not require the switching protocols
to solve agreement and is, in that sense, much more general. Our
work can also be viewed as a formalization of a generalized form
of Abortable State Machine Replication (Abstract), an abstraction
presented in an intuitive form in [8] to allow the construction of
speculative Byzantine Fault Tolerant SMR protocols by composing
independently designed speculation phases. Compared to [8] , our
work concerns arbitrary abstract data types, including one-shot
ones, and generalizes the idea to linearizability. Being strictly more
general, our framework can be used to reason about the faulttolerant algorithms developed following those approaches [1, 2, 8].
References
[1] M. K. Aguilera, S. Frolund, V. Hadzilacos, S. L. Horn, and S. Toueg.
Abortable and query-abortable objects and their efficient implementation. In PODC, 2007.
[2] M. Bickford, C. Kreitz, R. v. Renesse, and X. Liu. Proving hybrid
protocols correct. In TPHOLs ’01, 2001.
[3] W. J. Bolosky, D. Bradshaw, R. B. Haagens, N. P. Kusters, and P. Li.
Paxos replicated state machines as the basis of a high-performance
data store. In Proc. NSDI. USENIX Assoc., 2011.
[4] M. Burrows. The Chubby lock service for loosely-coupled distributed
systems. In Proc. OSDI. USENIX Assoc., 2006.
[5] M. Castro and B. Liskov. A correctness proof for a practical byzantinefault-tolerant replication algorithm. Technical report, MIT, 1999.
[6] M. Castro and B. Liskov. Practical Byzantine fault tolerance. In OSDI,
1999.
[7] E. Gafni and L. Lamport.
16(1):1–20, 2003.
Disk paxos.
Distributed Computing,
[8] R. Guerraoui, N. Knezevic, V. Quema, and M. Vukolic. The Next 700
BFT Protocols. In EUROSYS, 2010.
[9] R. Guerraoui, V. Kuncak, and G. Losa. Speculative Linearizability.
Technical Report 170038, EPFL, 2011.
[10] R. Guerraoui, V. Kuncak, and G. Losa. Abortable linearizable
modules.
In G. Klein, T. Nipkow, and L. Paulson, editors,
The Archive of Formal Proofs. http://afp.sf.net/entries/
Abortable_Linearizable_Modules.shtml, March 2012. Formal
proof development.
[11] M. Herlihy. Wait-free synchronization. ACM Trans. Program. Lang.
Syst., 13:124–149, January 1991.
[12] M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition
for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3):463–
492, 1990.
[13] M. Jaskelioff and S. Merz. Proving the correctness of Disk Paxos. In
G. Klein, T. Nipkow, and L. Paulson, editors, The Archive of Formal
Proofs. http://afp.sf.net/entries/DiskPaxos.shtml, June
2005. Formal proof development.
[14] P. Jayanti. Adaptive and efficient abortable mutual exclusion. In
PODC, 2003.
[15] R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong. Zyzzyva:
speculative Byzantine fault tolerance. In SOSP, 2007.
[16] L. Lamport. The implementation of reliable distributed multiprocess
systems. Computer Networks, 2:95–114, 1978.
[17] L. Lamport. On interprocess communication. part I: Basic formalism.
Distributed Computing, 1(2):77–85, 1986.
[18] L. Lamport. On interprocess communication. part II: Algorithms.
Distributed Computing, 1(2):86–101, 1986.
[19] L. Lamport. A fast mutual exclusion algorithm. ACM Trans. Comput.
Syst., 5(1):1–11, 1987.
[20] N. Lynch and F. Vaandrager. Forward and backward simulations I:
untimed systems. Inf. Comput., 121:214–233, September 1995.
[21] N. A. Lynch and M. R. Tuttle. An introduction to input/output automata. CWI Quarterly, 2:219–246, 1989.
[22] O. Müller. I/O automata and beyond: Temporal logic and abstraction
in Isabelle. In TPHOLs, pages 331–348, 1998.
[23] F. Pedone. Boosting system performance with optimistic distributed
protocols. Computer, 34(12):80–86, 2001.
[24] A. Singh, T. Das, P. Maniatis, P. Druschel, and T. Roscoe. BFT
protocols under fire. In NSDI, 2008.
[25] M. M. V. Luchangco and N. Shavit. On the uncontended complexity
of consensus. In ICDCS, pages 45–59, 2003.
[26] M. Wenzel. Isar - a generic interpretative approach to readable formal
proof documents. In TPHOLs, pages 167–184, 1999.
A.
In this section we propose a formalization of the definition of
linearizability found in [12].
A.1
In this case we say that t0 is a linearization of t.
Linearizability with Repeated Events
Sequential Traces
A sequential trace in sigT is a trace in sigT composed of alternating
invocation and responses, starting with an invocation, and such that
a response at index i responds to the invocation at index i − 1.
Definition 37. Sequential Trace. A trace t in sigT is sequential iff
t(1) = inv( , , ) and for all i ∈ [1..|t| − 1], if t(i) = inv(c, , in)
then t(i + 1) = res(c, , in, ).
Example 3. A sequential trace is a trace of the form
[inv(c1 , , in1 ), res(c1 , , in1 , out1 ), inv(c2 , , in2 ), res(c2 , , in2 , out2 ),
. . . , inv(cn , , inn ), res(cn , , inn , outn )]
Definition 38. Trace Agreeing with an ADT. A sequential trace
agrees with the ADT T iff
A.4
Linearizability∗
Definition 46. Linearizability∗ A trace t in sigT is linearizable∗ iff:
• Trace t is well-formed
• There exists a completion of t that is linearizable∗ .
B.
Equivalence Proof
In this section we informally prove that the two definitions of
linearizability given above are equivalent. If a trace t is linearizable
then we can use any linearization of t to obtain a linearization
function for t. Conversely, if we know a linearization function for
t, we can obtain a linearization tseq of t.
Consider a trace t in sigT that is well-formed.
Lemma 1. If a completion of trace t is linearizable, then t is
linearizable.
for all i ∈ [1..n].
Proof. Let tcomp be a linearizable completion of trace t. Let gcomp
be a linearization function for tcomp . The restriction of gcomp to
the domain of t is a linearization function for t.
A.2
Lemma 2. If t is linearizable∗ then t is linearizable.
outi = fT ([in1 , . . . , ini ])
Completion of a Trace
Definition 39. Complete traces. A trace t in sigT is complete if
and only if it is well-formed and it has no pending invocations.
In other words, t is complete iff
• t is well-formed
• for all index i ∈ [1..|t|], if t(i) = inv( , , in) then there exists
an index j ∈ [i..|t|] such that t(j) = res( , , inv, )
Note that a complete trace has a number of elements that is even.
Definition 40. Completion of a trace. Trace t0 is a completion of
trace t if and only if
• t is a prefix of t0
• t0 is a complete trace.
A.3
Proof. Suppose that there exists a completion tcomp of t that is
linearizable∗ . By definition of linearizability∗ we know that there
exists a sequential trace tseq such that:
• tseq agrees with the semantics of T .
• tseq is a reordering of tcomp .
• tseq preserves the order of non-overlapping operations in tcomp .
The linearization function g. Since tseq agrees with the semantics of T we get
tseq =[inv(c1 , , in1 ), res(c1 , , in1 , out1 ), inv(c2 , , in2 ), res(c2 , , in2 , out2 ),
. . . , inv(cn , , inn ), res(cn , , inn , outn )]
with outi = fT ([in1 , in2 , . . . , ini ]) = fT (inputs(tseq , i)).
(4)
Linearizability of Complete Traces
We consider a complete trace t in sigT and a sequential trace t0 in
sigT .
Definition 41. Reordering of a trace. Trace t0 is a reordering of
trace t iff |t| = |t0 | and there exists a permutation σ of [1..|t|] such
that t0 (σ(i)) = t(i) for all i ∈ [1..|t|].
Definition 42. Invocation index. We call invocation index any
index i ∈ [1..t] such that t(i) = inv( , , ).
Definition 43. Response function. We define the response function
respt : [1..|t|] → [1..|t|]
such that given an invocation index i ∈ [1..|t|], if t(i) =
inv(c, , in) then respt (i) is the smallest index j ≥ i such that
t(j) = res(c, , in, ).
Definition 44. Preserving the order of non-overlapping operations. If t0 is a reordering of t according to σ, then trace t0 preserves the ordering of non-overlapping operations in t iff for all
pairs i, j of invocation indices,
if respt (i) < j then σ(i) < σ(j)
σ(respt (i)) = σ(i) + 1
Definition 45. Linearizable∗ complete trace. A complete trace t is
linearizable∗ iff there exists a sequential trace t0 in sigT such that:
• t0 agrees with the ADT T .
• t0 is a reordering of t.
• t0 preserves the order of non-overlapping operations in t.
Moreover, tseq being a reordering of tcomp , we get that there
exists a permutation σ of [1..|t|] such that
tseq (σ(i)) = tcomp (i) for all i ∈ [1..|t|].
(5)
Let function g be such that
for all commit index i of tcomp , g(i) = inputs(tseq , σ(i))
(6)
In other words, g(i) is the current input history at index σ(i) in
tseq .
We now show that g is a linearization function for tcomp .
Function g explains trance tcomp .
tcomp , then
If i is a commit index in
tcomp (i) = tseq (σ(i))
= res( , , , fT (inputs(tseq , σ(i)))
= res( , , , fT (g(i)))
using 5
using 4
using 6.
Hence
g explains trace tcomp .
Commit Order. For all distinct pairs i, j of commit indices of
tcomp , we get that
• g(i) = inputs(tseq , σ(i)) and g(j) = inputs(tseq , σ(j)), by 6.
• σ(i) =
6 σ(j) and both are commit indices of tseq , by 5.
• inputs(tseq , i) is a strict prefix of inputs(tseq , j) or vice-versa,
by 4.
Hence commit-order(tcomp , g) holds.
It remains to show that validity(tcomp , g) holds.
Observe that tcomp is a complete trace. Moreover let
Validity. Consider an index i ∈ [1..|tcomp ] such that tcomp (i) =
res( , , in, ). We have that
for all i ∈ [1..|t| + |t0 |]. Observe that gcomp is a linearization
function for tcomp .
We conclude that tcomp is a linearizable complete trace.
gcomp (i) = if i ≤ |t| then g(i) else hi−|t|
• tseq (σ(i)) = tcomp (i), by 5.
• tseq (σ(i)) = res( , , in0 , [. . . , in0 ]), by 4.
Lemma 4. Suppose that t is a linearizable complete trace. Then t
is linearizable∗ .
Hence
tcomp (i) ends with in.
It remains to show that for all commit indices i,
(7)
elems(g(i)) ⊆ elems(inputs(tcomp , i)).
Proof. Suppose that t is a linearizable complete trace. Then t is
well-formed and admits a linearization function g. Function g is
such that
• Function g explains t.
• The validity(t, g) predicate holds.
• The commit-order(t, g) predicate holds
For this, suppose the there exists a commit index i such that
elems(g(i)) 6⊆ elems(inputs(tcomp , i)).
Then we know that there exits an invocation index j such that
Consider a permutation σ of [1..|t|] such that:
j>i
(8)
σ(j) < σ(i) − 1
(9)
(10)
From 10 and 8 we have that resptcomp (k) < j, hence since tseq
preserves the order of non-overlapping operations in tcomp we have
that
σ(k) = σ(i) − 1
σ(k) < σ(j)
(11)
However from 11 and 9 we have that
From σ(i) < σ(j) and σ(j) < σ(i) we get a contradiction. Hence
for all commit index i,
(12)
From 7 and 12 we get that
Validity(tcomp , g) holds.
Conclusion. From g explains trace tcomp , commitorder(tcomp , g), and Validity(tcomp , g) we get that g is a
linearization function for tcomp . Since tcomp is a completion of t
we get by lemma 1 that t is linearizable.
Lemma 3. Suppose that t is linearizable. Then there exists a
completion of t that is linearizable.
Proof. Suppose that t admits a linearization function g.
Consider the last commit index ilast of t. Let the history hlast
be such that hlast = g(ilast ). Now consider the set of all invocation
indices of t whose invocation is pending. Arbitrarily order this
set to form the sequence is = [i1 , i2 , . . . , in ]. Let c1 . . . cn and
in1 . . . inn be such that for all j ∈ [1..n], t(ij ) = inv(cj , , inj ).
Let hs = [h1 , h2 , . . . , hn ] where
hj = hj−1 ::inj for all j ∈ [2..n]
Let the trace tseq be such that |tseq | = |t| and for all i ∈ [1..|t|],
t(i) = ts eqσ(i). Observe that tseq is a sequential trace. Hence
to prove that t is linearizable∗ it remains to show that tseq agrees
with the ADT T . Because g is a linearization function for t we
know that if t(i) = res( , , , out) then out = fT (g(i)). Hence
if tseq (i) = res( , , , out) then out = fT (g(σ −1 (i))). Moreover,
from validity(t, g) we get that g(i) = inputs(tseq , σ(i)). Therefore
tseq agrees with the ADT T .
Lemma 5. Suppose that t is linearizable. Then there exists a
completion of t that is linearizable∗ .
σ(j) < σ(k).
elems(g(i)) ⊆ elems(inputs(tcomp , i)).
g(j) then σ(i) < σ(j).
• For all invocation index i of t, σ(i) = σ(respt (i)) − 1.
Let the invocation index k be such that
resptcomp (k) = i.
• For all distinct commit index i and j, if g(i) is a strict prefix of
Proof. By lemma 3 and 4.
Theorem 4. A trace t in sigT is linearizable iff it is linearizable∗ .
Proof. By lemma 2 and 5
C.
Composition Theorem
The composition theorem states that the composition of two speculation phases is a speculation phase.
Theorem 5. Suppose that
S1 |= SLinT (m, n)
and
S2 |= SLinT (n, o)
Then
proj(S1 ||S2 , sigT (m, o, Init)) |= SLinT (m, o).
In the rest of this section we prove theorem 5.
We consider an ADT T and two speculation phases
(m, n, Init, rinit ) and (n, o) and such that m < n < o, Init 6= ∅,
and rinit 6= ∅.
Consider a trace
t ∈ acts(sigT (m, o, Init))∗
and
h1 = hmax ::in1
and let
Let
tmn = proj(t, acts(sigT (m, n, Init)))
t0 = [res(c1 , , in1 , fT (h1 )), res(c1 , , in2 , fT (h2 )),
. . . , res(cn , , inn , fT (hn ))]
and
tcomp = t:::t0
and
tno = proj(t, acts(sigT (n, o, Init))).
Assume that tmn is (m, n)-abortable linearizable and that tno
is (n, o)-abortable linearizable.
We would like to show that
• The trace t is (m, o)-well formed.
• For all interpretation finit of the init actions of t, there exists two
functions g : [1..|t|] → IT∗ and fabort : [1..|t|] → IT∗ such that:
fabort is an interpretation of the abort actions of t.
g is an (finit , fabort , m, n)-abortable linearization function
for t.
lemma 6 we know that k is a (tcno , n, o)-init index. Because tcno is
(n, o)-well-formed we know that j = 1.
To sum up we have obtained an index j ∈ [1..|tc |] such that
poscmn (j) is the last index of tcmn and poscno (j) is the first index
of tcno . Thus tc is the concatenation of tcmn and tcno . It is then easy
to see that tc is (m, o)-well formed because tcmn is (m, n)-well
formed and tcno is (n, o)-well formed.
First observe that
Now consider a function finit that is an interpretation of the init
acts(sigT (m, n, Init))∪acts(sigT (n, o, Init)) = acts(sigT (m, o, Init)).
actions in t.
Hence all actions of t appear at least in one of tmn or tno .
C.2 Existence of fabort and g such that g is an
We define two injective functions pos0mn : [1..|tmn |] → [1..|t|]
(finit , fabort , m, o)-abortable linearization function.
and pos0no : [1..|tno |] → [1..|t|] that create a correspondence
between the indices of t and its two sub-traces tmn and tno .
Because
tmn is (m, n)-abortable linearizable we know that there
mn
The function pos0mn is such that for all i ∈ [1..|tmn |]:
exists gmn and fabort
such that
• tmn (i) = t(pos0mn (i))
mn
• f
is an interpretation of the abort actions of t .
• pos0mn is strictly monotonically increasing.
Similariy the function pos0no is such that for all i ∈ [1..|tno |]:
• tno (i) = t(pos0no (i)).
• pos0no is strictly monotonically increasing.
We now define the two functions posmn = (pos0mn )−1 and
posno = (pos0no )−1 . Note that posmn and posno are injective
partial functions. Moreover, for any i ∈ [1..|t], i is in the domain
of at least one of posmn or posno .
We will be able to relate properties of tmn and tno thanks
to lemma 6. Informally, lemma 6 says that the abort histories of
speculation phase (m, n) are the init histories of speculation phase
(n, o).
Lemma 6. For all index i ∈ [1..|t|], the index posmn (i) is a
(tmn , m, n)-abort index if and only if the index posno (i) is a
(tno , n, o)-init index.
Proof. By the definition of (tmn , m, n)-abort index and
(tno , n, o)-init index.
abort
mn
mn
• gmn is an (finit , fabort
, m, n)-abortable linearization function for
tmn .
no
mn
Let finit
= fabort
◦ posmn ◦ (posno )−1 . By lemma 6 we have
no
that finit is an interpretation of the init actions of tno .
Hence, because tno is (n, o)-abortable linearizable we know
no
that there exists gno and fabort
such that
no
• fabort is an interpretation of the abort actions of tno .
mn
no
• gno is an (fabort
, fabort
, n, o)-abortable linearization function for
tno .
Let the function g be such that, for all commit index i of t,
• if i is a (t, m, n)-commit index then g(i) = gmn (posmn (i)).
• if i is a (t, n, o)-commit index then g(i) = gno (posno (i)).
no
Let fabort = fabort
.
We will show that:
• g explains trace t.
• The following predicates are satisfied:
Validity(t, g, finit , fabort ).
C.1
Trace t is Well-Formed.
We now prove that if tmn and tno are both well-formed then t is
well-formed.
Lemma 7. For all client c, if sub(tmn , c, Init) is (m, n)well-formed and sub(tno , c, Init) is (n, o)-well-formed then
sub(t, c, Init) is (m, o)-well-formed.
Proof. Consider a client c. Let tcmn = sub(tmn , c, Init), tcno =
sub(tno , c, Init), and tc = sub(t, c, Init) be the client projections
of tmn , tno , and t. Let us consider the two following cases.
Suppose that there exists no (tcmn , m, n)-abort index. Then by
lemma 6 there exists no (tcno , n, o)-init index. Then because tcno is
(n, o)-well-formed we have that tcno = []. Thus tc = tcmn . Because
tcmn is (m, n)-well-formed and n > 1 we get that tc is (m, o)-wellformed.
Now we assume that there exists a (tcmn , m, n)-abort index
iabort . Let us define the strictly monotically increasing functions
poscmn and poscno such that, for all i ∈ [1..|tc |],
• If tc (i) ∈ sigT (c, m, n, Init) then tc (i) = tcmn (poscmn (i)).
• If tc (i) ∈ sigT (c, n, o, Init) then tc (i) = tcno (poscno (i)).
By analogy with lemma 6 we get that for all index i ∈ [1..|tc |],
the index poscmn (i) is a (tcmn , m, n)-abort index if and only if the
index poscno (i) is a (tcno , n, o)-init index.
Let j = (poscmn )−1 (iabort ). Because tmn is (m, n)-valid with
have that iabort = |tcmn |. Let k = poscno (j). By the analogous to
Commit-Order(t, g).
Init-Order(t, g, finit , fabort ).
Abort-Order(t, g, fabort ).
Lemma 8. The function g explains trace t.
Proof. By the definition of g and the fact that gmn explains trace
tmn and gno explains trace tno .
mn
Since gmn is an an (finit , fabort
, m, n)-abortable linearization
no
no
function for tmn and gno is an (finit
, fabort
, n, o)-abortable linearization function for tno we know that the following predicates
are satisfied:
mn
• Validity(tmn , gmn , finit , fabort
)
• Commit-Order(tmn , gmn )
mn
• Init-Order(tmn , gmn , finit , fabort
)
mn
• Abort-Order(tmn , gmn , fabort
)
no
no
• Validity(tno , gno , finit
, fabort
)
• Commit-Order(tno , gno )
no
no
• Init-Order(tno , gno , finit
, fabort
)
no
• Abort-Order(tno , gno , fabort
)
Lemma 9. Validity(t, g, finit , fabort ) holds.
Proof. We first show that any commit index of t is (t, g, finit )-valid.
Consider an index i ∈ [1..|t|] such that t(i) = res( , , in, ). We
would like to show that
• elems(g(i)) ⊆ vi(m, t, finit , i)
• The input in is the last element of g(i).
Observe that either i is a (t, m, n)-commit index or i is a
(t, n, o)-commit index.
Suppose that i is a (t, m, n)-commit index. It follows that:
1. t(i) = tmn (posmn (i)), by definition of posmn
2. g(i) = gmn (posmn (i)), by definition of g
3. The index posmn (i) is (tmn , gmn , finit )-valid,
mn
by Validity(tmn , gmn , finit , fabort
)
From 1 and 3 we get that the input in is the last element of
gmn (i). With 2 we conclude that the input in is the last element of
g(i). It remains to show that elems(g(i)) ⊆ vi(m, t, finit , i)
From
3
we
have
that
elems(gmn (i))
⊆
vi(m, tmn , finit , posmn (i)). With 2 we get that elems(g(i)) ⊆
vi(m, tmn , finit , posmn (i)). From the fact that tmn is a subtrace of
t we get that
• elems(inputs(tmn , posmn (i))) ⊆ elems(inputs(t, i)).
• ivi(m, tmn , finit , posmn (i)) ⊆ ivi(m, t, finit , i).
Hence vi(m, tmn , finit , posmn (i)) ⊆ vi(m, t, finit , i). Finally we
get that elems(g(i)) ⊆ vi(m, t, finit , i).
Now suppose that i is a (t, n, o)-commit index. It follows that:
1. t(i) = tno (posno (i)), by definition of posno
2. g(i) = gno (posno (i)), by definition of g
mn
3. The index posno (i) is (tno , gno , fabort
)-valid,
no
no
by Validity(tno , gno , finit
, fabort
)
As before we get that input in is the last element of g(i). It remains
to show that elems(g(i)) ⊆ vi(m, t, finit , i).
From
3
we
have
that
elems(gno (i))
⊆
no
vi(n, tno , finit
, posno (i)). With 2 we get that elems(g(i)) ⊆
mn
vi(n, tno , fabort
, posno (i)).
By definition we have that
mn
vi(n, tno , fabort
, posno (i)) =
mn
ivi(n, tno , fabort
, posno (i)) ∪ elems(inputs(tno , posno (i))).
By lemma 6 we have that
mn
ivi(n, tno , fabort
, posno (i)) =
[
mn
{elems(fabort
(v)) ∪ {in} |
∃j ∈ [1..|tmn |].tmn (j) = swi( , n, in, v)}.
mn
Moreover from Validity(tmn , gmn , finit , fabort
) we have that, for
all indices j ∈ [1..|tmn |]] such that there exists in and v where
tmn (j) = swi( , n, in, v),
mn
elems(fabort
(v)) ∪ {in} ⊆ vi(m, tmn , finit , j).
Hence, by properties of the multiset union, we have that
mn
ivi(n, tno , fabort
, posno (i)) ⊆ vi(m, tmn , finit , posmn (i)).
Hence we have that
mn
vi(n, tno , fabort
, posno (i)) ⊆
vi(m, tmn , finit , posmn (i)) ∪ elems(inputs(tno , posno (i))).
Moreover,
vi(m, tmn , finit , posmn (i)) ∪ elems(inputs(tno , posno (i))) =
vi(m, t, finit , i) ∪ elems(inputs(t, i))
mn
Thus vi(n, tno , fabort
, posno (i))
⊆
vi(m, t, finit , i). With
mn
elems(g(i)) ⊆ vi(n, tno , fabort
, posno (i)) we conclude that
elems(g(i)) ⊆ vi(n, t, finit , i).
By a similar reasoning we can show that any abort index of t is
(t, finit , fabort ))-valid.
Lemma 10. Commit-Order(t, g) holds.
Proof. Consider any two (t, g)-commit histories h1 and h2 .
Suppose that both h1 and h2 appear in tmn . Then by CommitOrder(tmn , gmn ) we have that one is the prefix of the other.
Suppose that both h1 and h2 appear in tno . Then by CommitOrder(tno , gno ) we have that one is the prefix of the other.
Suppose that h1 appears in tmn and that h2 appears in tno .
no
no
By Init-Order(tno , gno , finit
, fabort
) and we know that there exists an index j such that posno (j) is a (tno , n, o)-init index and
no
h0 = finit
(posno (j)) is a prefix of h2 . By lemma 6 we have that
mn
posmn (j) is a (tmn , m, n)-abort index and h0 = fabort
(posmn (j)).
Hence by Abort-Order(tmn , gmn , finit ) we get that h1 is a prefix of
h0 . By transitivity we get that h1 is a prefix of h2 .
The case where h2 appears in tmn and that h1 appears in tno is
very similar.
no
Lemma 11. Init-Order(t, g, finit , fabort
) holds.
Proof. Let hmo
init be the longest common prefix of all (t, m, o, finit )init histories. Observe that hinit is also the longest common prefix
of all (tmn , m, n, finit )-init histories. We first show that hmo
init is
a prefix of any (t, g)-commit history. Consider a (t, g)-commit
history h. Observe that h is either a (tmn , gmn )-commit history
or a (tno , gno )-commit history.
Suppose that h is a (tmn , gmn )-commit history. Since hmo
init is
the longest common prefix of all (tmn , m, n)-init histories we have
mn
that hmo
init is a prefix of h by Init-Order(tmn , gmn , finit , fabort ).
Suppose that h is a (tno , gno )-commit history. Let hno
init be
the longest common prefix of all (tno , n, o)-init histories. By Initno
no
Order(tno , gno , finit
, fabort
) we have that hno
init is a prefix of h.
Moreover by lemma 6 we have that hno
init is the longest common
prefix of all (tmn , m, n)-abort histories. But by
mn
Init-Order(tmn , gmn , finit , fabort
) we have that hmo
init , which is equal
mn
to hinit , is a prefix of any (tmn , m, n)-abort history. Since hno
init is
the longest common prefix of all (tmn , m, n)-abort histories we get
no
that hmo
init is a prefix of hinit .
being
a
prefix
of h and history hmo
History hno
init being a prefix
init
no
mo
of hinit , we get that hinit is a prefix of h.
We can prove by a similar reasoning that hmo
init is a prefix of any
no
(t, m, o, fabort
)-abort history.
no
Lemma 12. Abort-Order(t, g, fabort
) holds.
no
Proof. Consider a (t, g)-commit history h and a (t, m, o, fabort
)no
abort history h0 . Observe that h0 is also a (tno , n, o, fabort
)-abort
history. Observe that h is either a (tmn , gmn )-commit history or a
(tno , gno )-commit history.
Suppose that h is a (tno , gno )-commit history. Since
no
h0 is a (tno , n, o, fabort
)-abort history we get by Abortno
Order(tno , gno , fabort
) that h is a prefix of h0 .
Suppose that h is a (tmn , gmn )-commit history. By Abortmn
Order(tmn , gmn , fabort
) we have that h is a prefix of any
mn
(tmn , m, n, fabort
)-abort history. Hence h is a prefix of the longest
common prefix of all (tmn , m, n, fabort mn)-abort histories. Hence
by lemma 6 we have that h is a prefix of the longest common prefix of all (tno , n, o, fabort mn)-init histories, noted hno
init . Moreover
no
no
by Init-Order(tno , gno , finit
, fabort
) we have that hno
init is a prefix of
no
0
any (t, m, o, fabort
)-abort history. Hence hno
init is a prefix of h .
no
no
In conclusion, h being a prefix of hinit and hinit being a prefix
of h0 , we get that h is a prefix of h0 .
Lemma 13. Suppose that trace tmn is (m, n)-abortably linearizable and that trace tno is (n, o)-abortably linearizable. Then trace
t is (m, o)-abortably linearizable.
Proof. From lemma 7 we get that t is (m, o)-well-formed.
For all interpretation finit of the init actions of t, we get from
lemmas 8, 9, 10, and 12 that there exit a function g and an interpretation fabort of the abort actions of t such that g explains trace t and
such that the following predicates are satisfied:
•
•
•
•
Validity(t, g, finit , fabort )
Commit-Order(t, g)
Init-Order(t, g, finit , fabort )
Abort-Order(t, g, fabort )
We conclude that t is (m, o)-abortably linearizable.