A Formal Analysis of the Deferred Update Technique

A Formal Analysis of the Deferred Update Technique
Fernando Pedone†
Rodrigo Schmidt?,†
?
École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
Phone: +41 21 693 6668
Fax: +41 21 693 6490
E-mail: [email protected]
†
University of Lugano (USI), CH-6904 Lugano, Switzerland
Phone: +41 91 912 4695
Fax: +41 91 913 8536
E-mail: [email protected]
Abstract
The deferred update technique is a widely used approach for building replicated database systems. Its
fame stems from the fact that read-only transactions can execute locally to any single database replica,
providing good performance for workloads where transactions are mostly of this type. In this paper, we
analyze the deferred update technique and show a number of characteristics and limitations common to any
replication protocol based on it. Previous works on this replication method usually start from a protocol
and then argue separately that it is based on the deferred update technique and satisfies serializability.
Differently, ours starts from the abstract definition of a serializable database and gradually changes it into
an abstract deferred update protocol. In doing that, we can formally characterize the deferred update
technique and rigorously prove its properties. Moreover, our specification can be extended to create new
protocols or used to prove existing ones correct.
Keywords: Database replication, deferred update technique, distributed transactions, group communication.
1
Introduction
In the deferred update technique, a number of database replicas are used to implement a single serializable database interface. Its main idea consists in executing all operations of a transaction initially on a
single database. Transactions that do not change the database state can commit locally to the replica they
executed, but other transactions must be globally certified and, if committed, have their update operations (those that change the database state) submitted to all database replicas. This technique is adopted
by a number of database replication protocols in different contexts (e.g., [5, 12, 13, 14, 16]) for its good
performance in general scenarios. The class of deferred update protocols is very heterogeneous, including
algorithms that can optimistically apply updates of uncertified transactions [12], certify transactions locally
to the database that executed them [5], execute all concurrent update transactions at the same database [13],
reorder transactions during certification [14], and even cope with partial database replication [16]. However,
all of them share the same basic structure, giving them some common characteristics and constraints.
Despite its wide use, we are not aware of any work that explored the inherent limitations and characteristics of deferred update database replication. Ours seems to be the first attempt in this direction. We
specify a general abstract deferred update algorithm that embraces all the protocols we know of. This general specification allows us to isolate the properties of the termination protocol necessary to certify update
transactions and propagate them to all database replicas. We show, for example, that such a termination
protocol must totally order globally committed transactions, a rather counter-intuitive result given that
serializability itself allows transactions that operate on different parts of the database state to execute in
any order. For example, according to serializability, if two transactions t1 and t2 , respectively, update data
items x and y inside the database and have no other operations, it is correct to execute either t1 before t2
or t2 before t1 . Therefore, one could expect that some databases would be allowed to execute t1 followed
by t2 while others would execute t2 followed by t1 . In deferred update protocols, however, all databases are
obliged to execute t1 and t2 in exactly the same order, limiting concurrency.
Moreover, previous works considered that databases should satisfy a property called order-preserving
serializability, which says that the commit order corresponds to a correct serialization of the committed
transactions. This bears the question: Is order-preserving serializability necessary for deferred update replication? We show that databases can satisfy a weaker property, namely active order-preserving serializability,
which we introduce. According to this property, found in some multiversion databases, the internal database
serialization must satisfy the commit order only for transactions that change the database state, without
further constraining read-only transactions.
In our approach, we start with a general serializable database and refine it to our abstract deferred
update algorithm. Similarly, one can use our specification to ease designing and proving specific protocols.
One can simply prove a protocol correct by showing that it implements ours by, for example, a refinement
mapping [1]. Our specifications use atomic actions to define safety properties [6, 9]. Due to the space
limitations, we present only high-level specifications in the main text. Complete TLA+ [7] specifications,
which have been model checked for a finite subset of the possible execution scenarios, are given in the
appendix. Moreover, in order to help the reader cope with our notation, a glossary/index also appears at
the appendix.
2
A General Serializable Database
The consistency criterion for transactional systems in general is Serializability, which is defined in terms
of the equivalence between the system’s actual execution and a serial execution of the submitted transactions [3]. Traditional definitions of equivalence between two executions of transactions referred to the
internal scheduling performed by the algorithms and their ordering of conflicting operations. This approach
has led to different notions of equivalence and, therefore, different subclasses of Serializability [11]. In a
1
distributed scenario, however, defining equivalence in terms of the internal execution of the scheduler is not
straightforward since there is usually no central scheduler responsible for ordering transaction operations.
To compare a serial centralized schedule with a general distributed one (e.g., in a replicated database), one
has to create mappings between the operations performed in both schedules and extend the notion of conflicting operations to deal with sets of operations, since a single operation in the serial centralized schedule
may be mapped to a set of operations executed on different sites in the distributed one [3]. This approach
is highly dependent on the implemented protocol and, as explained in [10], does not generalize well.
Differently, we specify a general serializable database system, which responds to requests according to
some internal serial execution of the submitted transactions. A database protocol satisfies serializability
if it implements the general serializable database specification, that is, if its interface changes could be
generated by the general serializable database. This sort of analysis is very common in distributed systems
for its compromise between abstraction and rigorousness [7, 9, 10].
In our specification of serializability, we first define all valid state transitions for normal interactions
between the clients and the database, without caring about the values returned as responses to issued
operations, but rather storing them internally as part of the transaction state. The database is free to abort
a transaction at any time during the execution of its operations. However, a transaction t can only be
committed if its commit request was issued and there exists a sequential execution order for all committed
transactions and t that corresponds to the results these transactions provided. We say the transaction is
decided if the database has aborted or committed it. Operations issued for decided transactions get the
final decision as its result.
We assume each transaction has a unique identifier and let Tid be the set of all identifiers. We call Op the
set of all possible transaction operations, which execute over a database state in set DBState and generate
a result in set Result and a new database state. We abstract the correct execution of an operation by
the predicate CorrectOp(op, res, dbst, newdbst), which is true iff operation op, when executed over database
state dbst, may generate res as the operation result and newdbst as the new database state. In this way, our
specification is completely independent of the allowed operations, coping with operations based on predicates
and even nondeterministic operations. As a simple example, one could define a database with two integer
variables x and y with read and write operations for each variable. In this case, DBstate corresponds to all
possible combinations of values for x and y, Op is the combination of an identifier for x or y with a read
tag or an integer (in case of a write), and Result is the set of integers. CorrectOp(op, res, dbst, newdbst) is
satisfied iff newdbst and res correspond to the results for the read or write operation op applied to dbst.
Two special requests, Commit and Abort, both not present in Op, are used to terminate a transaction,
that is, to force a decision to be taken. Two special responses, Committed and Aborted , not present in
Result, are used to tell the database user if the transaction has been committed or aborted. We also define
Decided to equal the set {Committed , Aborted }, Request to equal Op ∪ {Commit, Abort}, and Reply to equal
Result ∪ Decided .
During a transaction execution, operations are issued and responses are given until the client issues
a Commit or Abort request or the transaction is aborted by the database for some internal reason. We
represent the history of a transaction execution by a sequence of elements in Op × Result, corresponding to
the sequence of operations executed on the transaction’s behalf and their respective results. We say that
a transaction history h is atomically correct with respect to initial database state initst and final database
state finalst iff it satisfies the recursive predicate defined below, where THist is the set of all possible transaction histories and Head and Tail are the usual operators for sequences. Moreover, for notation simplicity,
we identify the first and second elements of a tuple t in Op × Result by t.op and t.res respectively.
∆
CorrectAtomicHist(h ∈ THist, initst ∈ DBState, finalst ∈ DBState) =
if h = hi then initst = finalst
else ∃ist ∈ DBState : CorrectOp(Head (h).op, Head (h).res, initst, ist) ∧
CorrectAtomicHist(Tail (h), ist, finalst)
2
Intuitively, a transaction history is atomically correct with respect to initst and finalst iff there are intermediate database states so that all operations in the history can be executed in their correct order and
generate their correct results.
During the system’s execution, many transactions are started and terminated (possibly concurrently).
We represent the current history of all transactions by a data structure called history vector (set THistVector )
that maps each transaction to its current history. We say that a sequence seq of transactions and a history
vector thist correspond to a correct serialization with respect to initial state initst and final state finalst iff
the recursive predicate below is satisfied, where Seq(S ) represents the set of all finite sequences of elements
in set S .
∆
CorrectSerialization(seq ∈ Seq(Tid ), thist ∈ THistVector , initst ∈ DBState, finalst ∈ DBState) =
if seq = hi then initst = finalst
else ∃ist ∈ DBState : CorrectAtomicHist(thist(Head (seq)), initst, ist) ∧
CorrectSerialization(Tail (seq), thist, ist, finalst)
Intuitively, this predicate is satisfied iff there are intermediate database states so that all transactions in the
sequence can be atomically executed in their correct order generating the correct results for their operations.
We can now easily define a predicate IsSerializable(S , thist, initst) for a finite set of transaction id’s S , history
vector thist, and database state initst, satisfied iff there is a sequence seq containing exactly one copy of
each element in S and a final database state finalst such that CorrectSerialization(seq, thist, initst, finalst)
is satisfied. Predicate IsSerializable indicates when a set of transactions can be serialized in some order,
according to their execution history, so that every operation returns its correct result when the execution is
started in a given database state.
We abstract the interface of our specification by the primitives DBRequest(t, req), which represents the
reception of a request req on behalf of transaction t, and DBResponse(t, rep), which represents the database
response to the last request on behalf of t with reply rep. The only restriction we make with respect to the
database interface is that an operation cannot be submitted on behalf of transaction t if the last operation
submitted for t has not been replied yet, which releases us from the burden of using unique identifiers for
operations in order to match them with their results. Notice that the system still allows a high degree of
concurrency since operations from different transactions can be submitted concurrently.
Our specification is based on the following internal variables:
thist: A history vector, initially mapping each transaction to an empty history.
tdec: A mapping from each transaction to its current decision status: Unknown, Committed , or Aborted .
Initially, it maps each transaction to Unknown.
q: A mapping from each transaction to its current request or NoReq if no request is being executed on
behalf of that transaction. Initially, it maps each transaction to NoReq.
Figure 1 presents the atomic actions of our specification. Action ReceiveReq(t, req) is responsible for
receiving a request on behalf of transaction t. Action ReplyRep(t, rep) replies to a received request. It is
enabled only if the transaction has been decided and the reply is the final decision or the transaction has
not been decided but the current request is an operation (neither Commit nor Abort) and the reply is in
Result. This means that responses given after the transaction has been decided carry the final decision and
requests to commit or abort a transaction are only replied after the transaction has been decided. Action
ReplyReq is responsible for updating the transaction history if the transaction has not been decided. It
does that by appending the pair hq[t], repi to thist[t] (we use ◦ to represent the standard append operation
for sequences). Action DoAbort(t) simply aborts a transaction if it has not been decided yet. Action
DoCommit(t) commits t only if a t’s commit request was issued and the set of all committed transactions
(represented by committedSet) together with t is serializable with respect to the initial database state,
denoted by the constant InitialDBState.
3
ReceiveReq(t ∈ Tid, req ∈ Request)
Enabled iff:
• DBRequest(t, req)
• q[t] = NoReq
Effect:
• q[t] ← req
ReplyReq(t ∈ Tid, rep ∈ Reply)
Enabled iff:
• q[t] ∈ Request
• if tdec[t] ∈ Decided
then rep = tdec[t]
else q[t] ∈ Op ∧ rep ∈ Result
Effect:
• DBResponse(t, rep)
• q[t] ← NoReq
• if tdec[t] ∈
/ Decided then
thist[t] ← thist[t] ◦ hq[t], repi
DoAbort(t ∈ Tid)
Enabled iff:
• tdec[t] ∈
/ Decided
Effect:
• tdec[t] ← Aborted
DoCommit(t ∈ Tid)
Enabled iff:
• tdec[t] ∈
/ Decided
• q[t] = Commit
• IsSerializable(committedSet ∪ {t}, thist, InitialDBState)
Effect:
• tdec[t] ← Committed
Figure 1: The atomic actions allowed in our specification of a serializable database.
3
3.1
The Deferred Update Technique
Preliminaries
As mentioned before, deferred update algorithms initially execute transactions on a single replica. Transactions that do not change the database state (hereinafter called passive) may commit locally only, but active
transactions (as opposed to passive ones) must be globally certified and, if committed, have their updates
propagated to all replicas (i.e., operations that make them active). In order to correctly characterize the
technique, we need to formalize the concepts of active and passive operations and transactions. An operation
op is passive iff its execution never changes the database state, that is, iff the following condition is satisfied.
∀st1, st2 ∈ DBState, rep ∈ Result : CorrectOp(op, rep, st1, st2) ⇒ st1 = st2
(1)
An operation that is not passive is called active. Similarly, we define a transaction history h to be passive
iff the condition below is satisfied.
∀st1, st2 ∈ DBState : CorrectAtomicHist(h, st1, st2) ⇒ st1 = st2
(2)
Notice that a transaction history composed of passive operations is obviously passive, but the converse is
not true. A transaction that adds and subtracts 1 to a variable is passive even though its operations are
active.
The deferred update technique requires some extra assumptions about the system. Operations, for
example, cannot generate new database states nondeterministically for this could lead different replicas to
inconsistent states. The following assumption makes sure that operations do not change the database state
nondeterministically but still allows nondeterministic results to be provided to the database user.
Assumption 1 (State-deterministic operations) For every operation op, and database states st and
st1, if there is a result res1 such that CorrectOp(op, res1, st, st1), then there is no result res2 and database
state st2 such that st1 6= st2 ∧ CorrectOp(op, res2, st, st2).
As for the database replicas, one may wrongly think that simply assuming that they are serializable is
enough to ensure global serializability. However, two replicas might serialize their transactions (local and
global) differently, making the distributed execution non-serializable. Previous works on deferred update
protocols assumed the notion of order-preserving serializability, originally introduced by Beeri et al. in the
4
context of nested transactions [2]. In our model, order-preserving serializability ensures that the transactions’
commit order represents a correct execution sequence, a condition satisfied by two-phase locking, for example.
We show that this assumption can be relaxed since deferred update protocols can work with the weaker
notion of active order-preserving serializability we introduce. Active order-preserving serializability ensures
that there is an execution sequence of the committed transactions that generates their correct outputs and
respects the commit order of all active transactions only. This notion is weaker than strict order-preserving
serializability in that passive transactions do not have to provide results based on the latest committed
state. Some multiversion concurrency control mechanisms [3] are active order-preserving but not strict
order-preserving. Specifications of order-preserving and active order-preserving serializability can be derived
from our specification in Figure 1 by just adding a variable serialSeq, initially equal to the empty sequence,
and changing the DoCommit action. We show the required changes in Figure 2 below. The strict case (a) is
simple and only requires that serialSeq ◦ t be a correct sequential execution of all committed transactions.
The action automatically extends serialSeq with t. The active case (b) is a little more complicated to
explain and requires some extra notation. Let Perm(S ) be the set of all permutations of elements in finite
set S (all the possible orderings of elements in S ), and let ActiveExtension(seq, t) be seq if thist[t] is a
passive history or seq ◦ t otherwise. The action is enabled only if there exists a sequence containing all
committed transactions such that it represents a correct sequential execution and ActiveExtension(seq, t) is
a subsequence of it.1 In this action, serialSeq is extended with t only if t is an active transaction.
DoCommit(t ∈ Tid)
Enabled iff:
• tdec[t] ∈
/ Decided
• q[t] = Commit
• ∃st ∈ DBState :
CorrectSerialization(serialSeq ◦ t,
thist, InitialDBState, st)
Effect:
• tdec[t] ← Committed
• serialSeq ← serialSeq ◦ t
DoCommit(t ∈ Tid)
Enabled iff:
• tdec[t] ∈
/ Decided
• q[t] = Commit
• ∃seq ∈ Perm(committedSet ∪ {t}), st ∈ DBState :
CorrectSerialization(seq, thist, InitialDBState, st) ∧
ActiveExtension(serialSeq, t) is a subsequence of seq
Effect:
• tdec[t] ← Committed
• serialSeq ← ActiveExtension(serialSeq, t)
(a)
(b)
Figure 2: DoCommit action for (a) strict and (b) active order-preserving serializability.
3.2
The abstract algorithm
We now present the specification of our abstract deferred update algorithm. It generalizes the ideas of a
handful of deferred update protocols and makes it easy to think about sufficient and necessary requirements
for them to work correctly. Our specification assumes a set Database of active order-preserving serializable
databases, and we use the notation DB (d )!Primitive( ) to represent the execution of interface primitive
Primitive (either DBRequest or DBResponse) of database d . Since transactions must initially execute on a
single replica only, we let DBof (t) represent the database responsible for the initial execution of transaction
t. One important remark is that these internal databases receive transactions whose id set is Tid × N, where
N is the set of natural numbers. This is done so because a single transaction in the system might have to
be submitted multiple times to a database replica in order to ensure that it commits locally. Recall that
our definition of active order-preserving serializability does not force transactions to commit. Therefore,
transactions that have been committed by the algorithm and submitted to the database replicas are not
guaranteed to commit unless further assumptions are made. The only way around this is to submit these
transactions multiple times (with different versions) until they commit. Besides the set of databases, we
1
sequence subseq is a subsequence of sequence seq iff it can be obtained by removing zero or more elements of seq.
5
assume a concurrent termination protocol, fully explained in the next section, responsible for committing
active transactions and propagating their active operations to all databases.
The algorithm we present in the following orchestrates the interactions between the global database
interface and the individual internal databases. It is mainly based on the following internal variables:
thist, q: Essentially the same variables as in the specification of a serializable database.
dreq: A mapping from each transaction t to the operation that is currently being submitted for execution
on DBof (t), or NoReq if no operation is being submitted. This variable is used to implement the
asynchronous communication that tells DBof (t) to execute an operation of t. Initially all transactions
are mapped to NoReq.
dreply: Similar to dreq, but mapping each transaction t to the last response given by DBof (t).
dcnt: A mapping from each database d and transaction t to an integer representing the number of operations
that executed on d for t. It counts the number of operations DBof (t) has executed for t during t’s initial
execution and, if t is active, the number of active operations the other databases (or DBof (t) if it does
not manage to commit t directly after it is globally committed) have executed for t after it is globally
committed. It is initially 0 for all databases and transactions.
pdec: A mapping like tdec in the specification of a serializable database, used to tell whether the transaction
was decided without being proposed for global termination either because it was prematurely aborted
during its initial execution or because it was a passive transaction that committed on its execution
database.
vers: A mapping from each database d and transaction t to an integer representing the current version of
t being submitted to d . It is initially 0 for all databases and transactions.
dcom: A mapping from each database d and transaction t to a boolean telling whether t has been committed
on d . It is initially false for all databases and transactions.
When a Commit request is issued for a transaction whose history has been active, a decision must be
taken on whether committing or aborting this transaction with respect to active transactions executed on
other databases. In our specification, this is done separately by a termination protocol. The reason why we
isolated this part of the specification is twofold. First, the nature of the rest of the algorithm is essentially
local to the database that is executing a given transaction and it seems interesting to separate it from the
part of the specification responsible for synchronizing active transactions executed on different databases.
Second, the properties of the termination protocol, when isolated, can be related to properties of other
agreement problems in distributed computing, which helps understand and solve it. The interface variables
of the termination protocol used in our general specification are the following:
proposed This is an input variable that keeps the set of all proposed transactions. It is initially empty.
gdec An output variable that keeps a mapping like pdec above, but managed by the termination protocol
only. It tells whether a proposed transaction has already been decided or not.
learnedSeq Another output variable mapping each database d to a sequence of globally committed active
transactions. This sequence tells database d the order in which these active transactions must be committed to make the whole execution serializable. Initially, it maps each database to the empty sequence.
Our specification implements a serializable database, which can be proved by a refinement mapping from
its internal variables to those of a general serializable database. Actually, the only internal variable of our
6
specification of a serializable database not directly implemented in our abstract algorithm is tdec, given by
joining the values of pdec and gdec in the following way:
∆
tdec[t] = if t ∈ proposed then gdec[t] else pdec[t]
(3)
For simplicity, we use this definition of tdec in some parts of our specification. Another extra definition
used in our algorithm is the ActHist(t) operator that returns the subsequence of thist[t] containing all its
active operations. The atomic actions of our abstract algorithm, disconsidering the internal actions of the
individual databases and the termination protocol, are shown in Figure 3.
Action ReceiveReq treats the receipt of a transaction request. If the transaction responsible for the
operation has been decided (either for pdec or gdec according to the definition of tdec given above), then
it only changes q[t]. Otherwise, it either proposes t for the termination protocol or sends the request to
DBof (t) through variable dreq[t]. Our complete specification allows passive transactions to be submitted for
the termination protocol too and this is why we wrote “is active” between quotation marks. We allow that
because sometimes it might not be possible to identify all passive transactions. Therefore, our specification
also embraces algorithms that identify only a subset of the passive transactions as passive and conservatively
propose the others for global termination.
Action ReplyReq replies a transaction request. It is very similar to the original ReplyReq action of our
serializable database specification. The small differences only make sure that the value replied for a normal
operation comes from DBof (t) and, in this case, dreq[t] is set back to NoReq to wait for the next operation.
Actions PrematureAbort and PassiveCommit abort or commit a transaction that has not been proposed for
global termination. It can only be committed if a commit request was correctly replied by DBof (t), which
can only happen if t has a passive history.
Action DBReq submits a request to a database. There are three conditions that enable this action. The
first one represents a normal request during the transaction’s initial execution or a commit request for a
passive transaction. The second one represents an operation request for an active transaction that has been
proposed to the termination protocol. Notice that operations of proposed transactions can be optimistically
submitted to the database before they commit or appear in some learnedSeq. Some algorithms do that to
save processing time after the transaction is committed, reducing the latency for propagating transactions
to the replicas. The third condition that enables this action represents a commit request for a transaction
that has been committed by the termination protocol. For that to happen, the transaction must be present
in learnedSeq[d ] and all transactions previous to it in the sequence must have been committed on that
database. Moreover, all active operations of that transaction must have been applied to the database
already, which is true if the database is the one originally responsible for the transaction and it has not
changed the transaction version or the operations counter dcnt[d ][t] equals the number of active operations
in the transaction history. Recall that, by the definition of a serializable database, a request can only be
submitted if there is no pending request for the same transaction. This is actually an implicit pre-condition
for DBReq given by the specification of a serializable database.
Action DBRep treats the receipt of a response coming from a database. If the database is the one
responsible for initially executing the transaction, it sets drepy[t] to the value returned. If the transaction is
aborted but it has been proposed for global termination, it changes the version of that transaction on that
database and sets the operation counter to zero so that the transaction’s operations can be resubmitted for
its new version; otherwise, it just increments the operation counter and sets dcom accordingly.
3.3
The termination protocol and its implications
The termination protocol gives a final decision to proposed transactions and, if they are committed, forwards
them to the database replicas. It “reads” from variables proposed and thist (it relies on the transaction
history to decide on whether commit or abort it), and changes variables gdec and learnedSeq. As explained
7
ReceiveReq(t ∈ Tid, req ∈ Request)
Enabled iff:
• DBRequest(t, req)
• q[t] = NoReq
Effect:
• q[t] ← req
• if tdec[t] ∈
/ Decided then
if req = Commit ∧ thist[t] “is active”
then proposed ← proposed ∪ {t}
else dreq[t] ← req
DBReq(d ∈ Database, t ∈ Tid, req ∈ Request)
Enabled iff any of the conditions below hold.
Condition 1: (external operation request)
• d = DBof (t)
• dreq[t] = req
• dcnt[d][t] = Len(thist[t])
Condition 2: (operation after termination)
• t ∈ proposed
• dcnt[d][t] < Len(ActHist(t))
• req = ActHist(t)[dcnt[d][t] + 1].op
ReplyReq(t ∈ Tid, rep ∈ Reply)
Enabled iff:
• q[t] ∈ Request
• if tdec[t] ∈ Decided
then rep = tdec[t]
else q[t] ∈ Op ∧ rep ∈ Result ∧
dcnt[DBof (t)][t] > Len(thist[t]) ∧
rep = dreply[t]
Effect:
• DBResponse(t, rep)
• q[t] ← NoReq
• if tdec[t] ∈
/ Decided then
– thist[t] ← thist[t] ◦ hq[t], repi
– dreq[t] ← NoReq
PrematureAbort(t ∈ Tid)
Enabled iff:
• t ∈
/ proposed
• pdec[t] ∈
/ Decided
Effect:
• pdec[t] ← Aborted
PassiveCommit(t ∈ Tid)
Enabled iff:
• t ∈
/ proposed
• pdec[t] ∈
/ Decided
• dreply[t] = Committed
Effect:
• pdec[t] ← Committed
Condition 3: (commit after termination)
• req = Commit
• ∃i ∈ 1..Len(learnedSeq[d]) :
learnedSeq[d][i] = t ∧
∀j ∈ 1..i : dcom[d][learnedSeq[d][j ]]
• either d = DBof (t) ∧ vers[d][t] = 0
or dcnt[d][t] = Len(ActHist(t))
Effect:
• DB (d)!DBRequest(ht, vers[d][t]i, req)
DBRep(d ∈ Database, t ∈ Tid, rep ∈ Reply)
Enabled iff:
• DB (d)!DBResponse(ht, vers[d][t]i, rep)
Effect:
• if d = DBof (t) then dreply[t] ← rep
• if rep = Aborted ∧ t ∈ proposed then
– vers[d][t] ← vers[d][t] + 1
– dcnt[d][t] ← 0
else
– dcnt[d][t] ← dcnt[d][t] + 1
– dcom[d][t] ← rep = Committed
Figure 3: The atomic actions allowed in our specification of a serializable database.
before, variable gdec simply assigns the final decision to a transaction; learnedSeq, however, represents the
order in which each database should submit the active transactions committed by the termination protocol.
These are the three safety properties the termination protocol must satisfy in order to ensure serializability:
Nontriviality For any transaction t, t is decided (gdec[t] ∈ Decided ) only if it was proposed.
Stability For any transaction t, if t is decided at any time, then its decision does not change at any later
time; and, for any database d , the value of learnedSeq[d ] at any time is a prefix of its value at all later
times.
Consistency There exists a sequence seq containing exactly one copy of every committed transaction
(according to gdec) and a database state st such that CorrectSerialization(seq, thist, InitialDBState, st)
is true and, for every database d , learnedSeq[d ] is a prefix of seq.
The following theorem asserts that our complete abstract specification of a a deferred update protocol is
serializable. This result shows that every protocol that implements our specification automatically satisfies
serializability. The proofs of our theorems are given in the appendix.
Theorem 1 Our abstract deferred update algorithm implements the specification of a serializable database
given in Section 2.
8
This theorem results in an interesting corollary, stated below. It shows that indeed databases are not
required to be strict order-preserving serializable, an assumption that can be relaxed to our weaker definition
of active order-preserving serializability.
Corollary 1 Serializability is guaranteed by our specification if databases are active order-preserving serializable instead of strict order-preserving serializable.
The three aforementioned safety properties are not strictly necessary to ensure serializability. Nontriviality can be relaxed so that non-proposed transactions may be aborted before they are proposed and
Serializability is still guaranteed. However, we see no practical use of this since our algorithm already allows
a transaction to be aborted at any point of the execution before it is proposed. Committing a transaction
before proposing depends on making sure that the history of the transaction will not change and, in case it is
active, on whether there are alternative sequences that ensure the Consistency properties if the transaction
is committed or not, a rather complicated condition to be used in practice. Stability can be relaxed by
allowing changes on sufixes of learnedSeq[d ] that have not been submitted to the database yet. However,
keeping knowledge of what part of the sequence has already been submitted to the database and possibly
changing the rest of it is equivalent to implementing our abstract algorithm with learnedSeq[d ] being the
exact sequence locally submitted to the database. As a result, we see no practical advantage in relaxing
Stability.
Consistency can be relaxed in a more complicated way. In fact, the different sequences learnedSeq[d ] can
differ, as long as the set of intermediate states they generate (states in between transactions) are a subset
of the intermmediate states generated by a sequence seq corresponding to a permutation of all globally
committed transactions that satisfies CorrrectSerialization(seq, thist, InitialDBState, st) for some state st.
Ensuring this property without forcing the learnedSeq sequences to prefix a common sequence is hard and
may lead to situations in which committed transactions cannot be added to a sequence learnedSeq[d ] for
they would generate states that are not present in any sequence that could satisfy our consistency criterion.
One might think, for example, that the consistency property can be relaxed to allow commuting transactions that are not related (i.e., operate on disjunct parts of the database state) in the sequences learnedSeq[d ].
For that, however, we have to make some assumptions about the database state in order to define what we
mean by disjunct parts of the database state. For simplicity, let us assume our database state is a mapping
from objects in a set Object to values in a set Value and operations can read or write a single object value.
We define the objects of a transaction history h, represented by Obj (h), to be the set of objects the operations in h read or write. A consistency property based on the commutativity of transactions that have no
intersecting object sets can be intuitively defined as follows:
Alternative Consistency There exists a sequence seq containing exactly one copy of every committed transaction (according to gdec) and a database state st such that
CorrectSerialization(seq, thist, InitialDBState, st) is true and, for every database d , learnedSeq[d ]
contains exactly one copy of some committed transactions (according to gdec) and, for every transaction
t in learnedSeq[d ], the following conditions are satisfied:
• Every transaction t 0 that precedes t in seq and shares some objects with t also precedes t in
learnedSeq[d ], and
• Every transaction t 0 that precedes t in learnedSeq[d ] either precedes t in seq or shares no objects
with t.
Although this new consistency condition seems a little complicated, it is weaker than our original property
for it allows the sequences learnedSeq[d ] differ in their order for transactions that operate on different
objects. The following theorem shows that this property is not enough to ensure Serializability in our
abstract algorithm.
9
Theorem 2 Our abstract deferred update algorithm with the Consistency property for termination changed
for the Alternative Consistency property defined above does not implement the specification of a serializable
database given in Section 2.
This result basically means that one cannot profit much from using Generic Broadcast [15] algorithms to
propagate committed transactions. Our properties as originally defined seem to be the weakest practical
conditions for ensuring Serializability in deferred update protocols. In fact, we are not aware of any deferred
update replication algorithm whose termination protocol does not satisfy the three properties above.
So far, we have not defined any liveness property for the termination protocol. Although we do not
want to force protocols to commit transactions in any situation (since this might rule out some deferred
update algorithms that conservatively abort transactions), we think that a termination protocol that does
not update the sequences learnedSeq[d ] eventually, after having committed a transaction, is completely
useless. Therefore, we add the following liveness property to our specification of the termination protocol:
Liveness If t is committed at a given time, then learnedSeq[d ] eventually contains t.
As it happens with agreement problems like Consensus, this property must be revisited in failure-prone
scenarios, since it cannot be guaranteed for databases that have crashed. Independently of that, one can
easily spot some similarities between the properties we have defined and those of Sequence Agreement as
explained in [8]. Briefly, in the sequence agreement problem, a set of processes agree on an ever-growing
sequence of commands, built out of proposed ones. The problem is specified in terms of proposer processes
that propose commands to be learned by learner processes, where learned [l ] represents the sequence of
commands learned by learner l . Sequence Agreement is defined by the following properties:
Nontriviality For any learner l , the value of learned [l ] is always a sequence of proposed commands.
Stability For any learner l , the value of learned [l ] at any time is a prefix of its value at any later time.
Consistency For any learners l1 and l2 , it is always the case that one of the sequences learned [l1 ] and
learned [l2 ] is a prefix of the other.
Liveness If command V has been proposed, then eventually the sequence learned [l ] will contain V as an
element.
This problem is a sequence-based specification of the celebrated atomic broadcast problem [4]. The exact
relation between the termination protocol and Sequence Agreement is given by the following theorem.
Theorem 3 The four properties Nontriviality, Stability, Consistency, and Liveness above satisfy the safety
and liveness properties of Sequence Agreement for transactions that commit.
One possible way of reading this theorem is that any implementation of the termination protocol is free
to abort transactions, but it must implement Sequence Agreement for the transactions it commits. As a
consequence, any lower bound or impossibility result for atomic broadcast and consensus applies to the
termination protocol.
4
Conclusion
In this paper, we have formalized the deferred update technique for database replication and stated some
intrinsic characteristics and limitations of it. Previous works have only considered new algorithms, with
independent specifications, analysis, and correctness proofs. To the best of our knowledge, our work is
first effort to formally characterize this family of algorithms and establish its requirements. Our general
10
abstraction can be used to derive other general limitation results as well as to create new algorithms and
prove existing ones correct. Some algorithms can be easily proved correct by a refinement mapping to ours.
Others may require an additional effort due to the extra assumptions they make, but the task seems still
easier than with previous formalisms. In our personal experience, we have successfully used our abstraction
to obtain interesting protocols and correctness proofs, which will appear elsewhere.
Finally, to increase the confidence in our results, we have model checked our specifications using the
TLA+ model checker (TLC). Our specifications have been extensively checked for consistency problems
besides type safety and deadlocks. For that we used a database containing a small vector of integers with
operations that could read and write the vector’s elements. Our model considered a limited number of
transactions (up to 10), each one containing a few operations. The automatic checking confirmed our
results and allowed us to find a number of small mistakes in the TLA+ translation of our ideas. We strongly
believe these specifications can be extended or directly used in future works in this area.
References
[1] M. Abadi and L. Lamport. The existence of refinement mappings. Theoretical Computer Science,
82(2):253–284, May 1991.
[2] C. Beeri, P. A. Bernstein, and N. Goodman. A model for concurrency in nested transaction systems.
Journal of the ACM, 36(2):230–269, Apr. 1989.
[3] P. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems.
Addison-Wesley, 1987.
[4] V. Hadzilacos and S. Toueg. Fault-tolerant broadcasts and related problems, pages 97–145. ACM
Press/Addison-Wesley Publishing Co., New York, NY, USA, 2 edition, 1993.
[5] B. Kemme and G. Alonso. A new approach to developing and implementing eager database replication
protocols. ACM Transactions on Database Systems, 25(3):333–379, 2000.
[6] L. Lamport. A simple approach to specifying concurrent systems. Communications of the ACM,
32(1):32–45, Jan. 1989.
[7] L. Lamport. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers.
Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002.
[8] L. Lamport. Generalized consensus and paxos. Technical Report MSR-TR-2005-33, Microsoft Research,
2004.
[9] N. Lynch. Distributed Algorithms. Morgan Kaufmann Publishers, Inc., San Mateo, CA, USA, 1996.
[10] N. Lynch, M. Merrit, W. Weihl, and A. Fekete. Atomic Transactions. Morgan Kaufmann Publishers,
Inc., San Mateo, CA, USA, 1994.
[11] C. H. Papadimitriou. The serializability of concurrent database updates. Journal of the ACM,
26(4):631–653, Oct. 1979.
[12] M. Patino-Martı́nez, R. Jiménez-Peris, B. Kemme, and G. Alonso. Scalable replication in database
clusters. In Proc. of the 14th International Symposium on Distributed Computing (DISC’00), 2000.
[13] F. Pedone and S. Frølund. Pronto: A fast failover protocol for off-the-shelf commercial databases. In
Proceedings of the 19th IEEE Symposium on Reliable Distributed Systems (SRDS’2000), pages 176–185,
2000.
[14] F. Pedone, R. Guerraoui, and A. Schiper. The database state machine approach. Journal of Distributed
and Parallel Databases and Technology, 14(1):71–98, 2003.
[15] F. Pedone and A. Schiper. Handling message semantics with generic broadcast protocols. Distributed
Computing, 15(2):97–107, April 2002.
[16] N. Schiper, R. Schmidt, and F. Pedone. Optimistic algorithms for partial database replication. In
Proceedings of the 10th International Conference on Principles of Distributed Systems (OPODIS’2006),
pages 81–93, 2006.
11
Appendix – Glossary
Abort: An abort request, 2
Aborted : Response given when the transaction has been aborted, 2
ActHist(t): subsequence of thist[t] containing all its active operations, 7
ActiveExtension: Operator that extends a sequence with a transaction iff it is active, 5
Commit: A commit request, 2
Committed : Response given when the transaction has been committed, 2
CorrectAtomicHist: Predicate that tells if a transaction history is atomically correct, 2
CorrectOp: predicate representing the correct execution of an operation, 2
CorrectSerialization: Predicate that tells if a sequence of transactions is serializable, 3
DB (d )!Primitive( ): Interface primitive Primitive of database d , 5
DBRequest/DBResponse: Database interface primitives, 3
DBState: Set of all possible database states, 2
DBof (t):database responsible for executing transaction t, 5
Decided : Set equal to {Committed , Aborted }, 2
InitialDBState: The initial database state, 3
IsSerializable: Predicate that tells if a set of transactions can be correctly serialized, 3
NoReq: A non-valid request, 3
Op: Set of all possible transaction operations, 2
Perm(S ): All permutations of elements in S , 5
Reply: Set equal to Result ∪ Decided , 2
Request: Set equal to Op ∪ {Commit, Abort}, 2
Result: Set of all possible operation results, 2
THist: Set of all possible transaction histories, 2
THistVector : Set of all possible history vectors, 3
Tid : Set of all transaction identifiers, 2
◦: append operator for sequences, 3
active operation or transaction: One that might change the database state, 4
decided transaction: A transaction that has been internally committed or aborted, 2
history vector: Data structure that maps each transaction to its current history, 3
passive operation or transaction: One that does not change the database state, 4
12
A
Proof of Theorem 1
A.1
The main invariants
In order to prove Theorem 1, we state three invariants satisfied by our abstract algorithm. The invariants
are very intuitive, given the algorithm’s expected behavior. However, a rigorous proof that the algorithm
actually satisfies them is given in Section A.3. Before presenting these invariants, though, we introduce some
auxiliary notation. We let Commit > tedAt(d ) be the set of all transactions that have been committed at
database d under any version number. That is,
∆
CommittedAt(d ) = {t ∈ Tid : ∃v ∈ N : DB (d )!tdec[ht, v i] = Committed }.
Moreover, we let NVserialSeq(d ) be the standard projection of sequence DB (d )!serialSeq without the transactions’ version numbers.
Our first invariant relates the databases’ internal states to the global variables thist and learnedSeq. It is
mainly based on the fact that databases are active order-preserving serializable and transactions proposed to
the termination protocol (which includes all active ones) have their Commit requests submitted to database
d according to the order specified by learnedSeq[d ].
Database Invariant For every database d , there exists a sequence seq ∈ Perm(CommittedAt(d )), and a
database state st such that all conditions below hold:
1. CorrectSerialization(seq, thist, InitialDBState, st),
2. NVserialSeq(d ) is the subsequence of seq containing all its active transactions, and
3. NVserialSeq(d ) is the subsequence of a prefix of learnedSeq[d ] that contains all its active transactions.
Besides the invariant above, our proof uses the following two auxiliary invariants.
tdec Invariant For every transaction t, if t ∈
/ proposed and pdec[t] = Committed , then thist[t] is passive
and t ∈ CommittedAt(DBof (t)).
dreply Invariant For every transaction t, if dreply[t] = Committed and t ∈
/ proposed , then thist[t] is passive
and t belongs to CommittedAt(DBof (t)).
A.2
The theorem proof
Theorem 1 Our abstract deferred update algorithm implements the specification of a serializable database
given in Section 2.
Proof: The proof is by a refinement mapping where thist and q are implemented by the variables with the
same name and tdec is implemented according to the definition in terms of proposed , pdec, and gdec given
in the explanation of our abstract deferred update algorithm. Below, we show that each action executed by
the abstract algorithms implements an action of our serializable database specification.
1. Action ReceiveReq implements the action with the same name.
Proof sketch: Pre- and post-conditions on variables q and thist are exactly the same. The action
may change variable proposed , influencing tdec. However, t is proposed only if tdec[t] ∈
/ Decided and the
Nontriviality property of the termination protocol ensures that tdec remains the same.
2. Action ReplyReq implements the action with the same name.
Proof sketch: The actions’ pre- and post-conditions are obviously stricter than those of the original
action.
13
3. Action PrematureAbort implements action DoAbort
Proof sketch: The action changes pdec iff its changes reflect the changes on tdec performed by action
DoAbort.
We now skip action PassiveCommit for it deserves a slightly more complicated analysis. It is explained
right after the simple actions below.
4. Actions DBReq, DBRep, and internal actions performed by any database represent stuttering steps in
our specification of a serializable database
Proof sketch: Such actions do not change variables q, thist, proposed , pdec and gdec, not influencing
the mapping.
5. Changes on learnedSeq performed by the termination protocol also implement stuttering steps
Proof sketch: Such changes have no influence on variables q, thist, proposed , pdec and gdec.
The two cases below deserve a special analysis and a higher degree of rigorousness.
6. Action PassiveCommit implements action DoCommit
Proof: Let committedSet be the set of all committed transactions (according to the definition of tdec in
terms of pdec and gdec). We must show that IsSerializable(committedSet ∪ {t}, thist, InitialDBState) is
true before the execution of PassiveCommit. We prove that in the following proof steps.
6.1. Choose a sequence gseq that contains exactly one copy of every transaction mapped to Committed
in gdec and satisfies the two conditions below
1. ∃st ∈ DBState : CorrectSerialization(gseq, thist, InitialDBState, st)
2. ∀d ∈ Database : learnedSeq[d ] is a prefix of gseq
Proof: This sequence exists for the Consistency property of the termination protocol.
Let: subgseq be the subsequence of gseq containing all its active transactions.
6.2. For every database d , NVSerialSeq(d ) is a prefix of subgseq.
Proof: By step 6.1 and the third item of the Database Invariant.
6.3. For any transaction t contained by subgseq, let stgen(t) be the unique database state st such that
CorrectSerialization(preft, thist, InitialDBState, st) is satisfied for the prefix preft of subgseq limited
by (and containing) t.
Proof: Such state exists by the step PC 1 and the definition of subgseq, and it is unique by Assumption 1 (State-deterministic Operations).
6.4. For every passive transaction t that belongs to CommittedAt(DBof (t)), that is, every transaction
committed with some version at its delegate database, either one of the two conditions below is
satisfied:
• CorrectAtomicHist(thist[t], InitialDBState, InitialDBSTAte), or
• ∃tw ∈ Tid : tw appears in subgseq and CorrectAtomicHist(thist[t], stget(tw ), stget(tw )).
Proof: By the Database Invariant and the fact that t belongs to CommittedAt(DBof (t)), there
exists a sequence seq such that:
1. seq contains t,
2. there exists a database state st such that CorrectSerialization(seq, thist, InitialDBState, st),
3. NVserialSeq(DBof (t)) is the subsequence of seq containing all its active transactions,
4. NVserialSeq(DBof (t)) is the subsequence of a prefix of learnedSeq[d ] containing all its active
transactions.
Let strippedseq be the subsequence of seq containing all its active transactions and t, only. Since
only passive transactions are taken out, by the definition of a correct serialization, strippedseq also
represents a correct serialization with respect to thist, InitialDBState, and st. Now take the longest
prefix of strippedseq that does not contain t and let us call it strippedpref . If strippedpref is empty,
then the definition of a correct serialization and the fact that t is passive imply that thist[t] is
14
atomically correct with respect to InitialDBState (first condition of step 6.4). Otherwise, strippedpref
is a prefix of subgseq, and Assumption 1 (State-deterministic Operations) implies that thist[t] is
atomically correct with respect to stgen(tw ), where tw is the transaction immediately before t in
strippedseq, which satisfies the second condition of step 6.4.
6.5. Q.E.D.
Proof: By the Consistency and Nontriviality properties of the termination protocol, gseq contains
every transaction t such that t is proposed and gdeq[t] equals Committed . We first extend gseq
with all other committed transactions. By our mapping of tdec, these are the transactions not
in proposed but mapped to Committed by pdec. However, the tdec Invariant tells us that these
transactions are passive and internally committed at their delegate databases. Step 6.4 tells us
that they can be inserted at some position of gseq and still generate a correct serialization, by
the definition of CorrectSerialization. Last, the dreply Invariant and the pre-condition of action
PassiveCommit also imply that t is passive and internally committed at DBof (t). Therefore, by
step 6.4, t can also be inserted at some position of gseq and generate a correct serialization with
initial state InitialDBState.
7. Changes on gdec[t] performed by the termination protocol implement either DoCommit(t) or DoAbort(t)
Proof sketch: Here we assume the termination protocol changes only one entry of gdec at a time. An
implementation that does not do that can be easily proved equivalent to this behavior by the creation
of “dummy” states that change one entry at a time with the introduction of prophecy variables [1]. If
gdec[t] is changed to Aborted , the Nontriviality and Stability properties automatically imply the pre- and
post-conditions of DoAbort(t). Otherwise, we must follow basically the same steps as in step 6. The only
two (small) differences are the following:
• Step 1 should be based on the consistency property guaranteed after gdec is changed, producing a
sequence gseq that already contains t.
• The Q.E.D. step does not have to add t to the built sequence since it is originally in the gseq
sequence initially created.
A.3
Proving the basic invariants
In order to prove the basic invariants, we have to define a number of auxiliary invariants. We divided the
auxiliary invariants into two types: transaction invariants and database-transaction invariants. The first
group refers to invariants that are based on transactions only. The second group refers to invariants that
relate transactions and databases.
The only extra notation we introduce in these auxiliary invarints is the definition of an operator
Substr (seq, begin, end ) for a sequence seq and naturals begin and end , used in invariant DTI5. This operator returns the substring of seq from index begin until index end . If end < begin, it is assumed to return
an empty sequence.
Transaction Invariants (TI)
For every transaction t:
1. (tdec[t] ∈
/ Decided ∧ q[t] ∈ Op) ⇒
(a) ∀d ∈ Database : t ∈
/ CommittedAt(d ) and
(b) t ∈
/ proposed
2. (tdec[t] ∈
/ Decided ∧ q[t] = NoReq) ⇒
(a) ∀d ∈ Database : t ∈
/ CommittedAt(d ),
(b) t ∈
/ proposed ,
15
(c) vers[DBof (t)][t] = 0,
(d) thist[t] = DB (DBof (t))!thist[ht, 0i],
(e) dcnt[DBof (t)][t] = Len(thist[t]),
(f) ∀d ∈ Database : DB (d )!q[ht, vers[d ][t]i] = NoReq, and
(g) dreq[t] = NoReq
3. dcnt[DBof (t)][t] > Len(thist[t]) ∧ t ∈
/ proposed ⇒
(a) ∀d ∈ Database : DB (d )!q[ht, vers[d ][t]i] = NoReq,
(b) vers[DBof (t)][t] = 0,
(c) thist[t] ◦ hdreq[t], dreply[t]i = DB (DBof (t))!thist[ht, 0i], and
(d) dcnt[DBof (t)][t] = Len(thist[t]) + 1
4. t ∈ proposed ⇒
(a) thist[t] = DB (DBof (t))!thist[ht, 0i],
(b) tdec[t] ∈ Decided ∨ q[t] = Commit,
(c) dcnt[DBof (t)][t] ≥ Len(thist[t]) ∨ vers[DBof (t)][t] > 0, and
(d) dreq[t] = NoReq
5. dreq[t] ∈ Request ∧ dcnt[d ][t] = Len(thist[t]) ⇒
(a) t ∈
/ proposed ,
(b) vers[DBof (t)][t] = 0, and
(c) thist[t] = DB (DBof (t))!thist[ht, 0i]
6. (tdec[t] ∈
/ Decided ∧ t ∈
/ proposed ) ⇒ dreq[t] = q[t]
7. dreq[t] = Commit ⇒
(a) thist[t] is passive and
(b) t ∈
/ proposed
Database-Transaction Invariants (DTI)
For every database d and transaction t:
1. DB (d )!q[ht, v i] 6= NoReq ⇒ v = vers[d ][t]
2. ∀v 6= vers[d ][t] : DB (d )!tdec[ht, v i] 6= Committed
3. If DB (d )!q[ht, vers[d ][t]i] = Commit, then either
(a) thist[t] = DB (d )!thist[ht, vers[d ][t]] or
(b) the projection of the operations in DB (d )!thist[ht, vers[d ][t]i] equals the projection of the operations in ActHist(t).
4. If DB (d )!tdec[ht, vers[d ][t]i] = Committed and thist[t] is active, then for all database states st1, st2,
and st3 :
∧ CorrectAtomicHist(DB (d )!thist[ht, vers[d ][t]], st1, st2)
∧ CorrectAtomicHist(thist[t], st1, st3)
⇒ st2 = st3
16
5. If ¬(d = DBof (t) ∧ vers[d ][t] = 0), then the projection of the operations in DB (d )!thist[ht, vers[d ][t]i]
equals the projection of the operations in Substr (ActHist(t), 1, dcnt[d ][t]).
6. If DB (d )!tdec[ht, vers[d ][t]i] = Committed and t is proposed, then t appears in learnedSeq[d ] and
every transaction t 0 that precedes t in learnedSeq[d ] satisfies dcom[d ][t 0 ].
7. dcom[d ][t] ⇒ t ∈ CommittedAt(d )
8. If DB (d )!q[ht, vers[d ][t]i] = Commit, then either tdec[t] ∈ Decided or q[t] = Commit.
9. If DB (d )!q[ht, vers[d ][t]i] = Commit and t is proposed, then t appears in learnedSeq[d ] and every
transaction t 0 that precedes t in learnedSeq[d ] satisfies dcom[d ][t 0 ].
10. DB (d )!q[ht, vers[d ][t]i] ∈ Request ∧ t ∈
/ proposed ⇒
(a) d = DBof (t),
(b) dcnt[d ][t] = Len(thist[t]),
(c) DB (d )!q[ht, vers[d ][t]i] = dreq[t],
(d) vers[DBof (t)][t] = 0, and
(e) thist[t] = DB (DBof (t))!thist[ht, 0i]
11. DB (d )!q[ht, vers[d ][t]i] ∈ Op ∧ t ∈ proposed ⇒
(a) d 6= DBof (t) ∨ vers[d ][t] 6= 0 and
(b) DB (d )!q[ht, vers[d ][t]i] = ActHist(t)[dcnt[d ][t] + 1].op
12. ∀v > vers[d ][t] : DB (d )!thist[ht, v i] = hi
The intuition of the proof is quite simple. It is relatively easy to check the invariants for the initial state
of the abstract algorithm. We then assume that they are true and show that they remain true after the
execution of each of the algorithm’s atomic actions no matter what was the state upon which the action
was executed (as long as the invariants were satisfied on it). In the following we analyze action by action
and sketch the proof for each of the invariants we have previously defined.
Action ReceiveReq
Database Invariant This action does not change any of the variables involved in the Database Invariant.
tdec Invariant With respect to the tdec Invariant, this action can only propose a transaction, which
does not invalidate the invariant.
dreply Invariant As in the previous case, this action can only propose a transaction, which does not
invalidate the invariant.
TI1 This action sets q[t] to a request and may add t to proposed . Invariant TI2(a,b) and the fact that
t is added to proposed only if q[t] is set to Commit, which is not in Op, imply that TI1 is preserved.
TI2 The action sets q[t] to a request (different from NoReq), which preserves the invariant, since it
invalidates the implication condition for transaction t.
TI3 Invariant TI2(e) and the action’s pre-condition invalidate the implication condition of this invariant
for transaction t.
17
TI4 If t is proposed, then TI2(d) ensures TI4(a). TI4(b) is ensured because t is proposed only if q[t] is
set to Commit, TI4(c) is ensured by TI2(e), and TI4(d) is ensured by TI2(g).
TI5 If dreq[t] is set to a value in Request, the invariant is ensured by TI2(b-d).
TI6 The action sets q[t] to req. It does nothing else if tdec ∈ Decided , but this condition invalidates
the invariant’s implication condition. Otherwise, the action either proposes t, which also invalidates
the invariant’s implication condition, or it makes dreq[t] equal to q[t]. In all the cases, the invariant
is preserved.
TI7 Condition (a) is easily verified. Condition (b) is ensured in case the action sets dreq[t] to Commit
by TI2(b).
DTI1-5,7 Automatically preserved.
DTI6 Transaction t is proposed only if tdec[t] ∈
/ Decided and the action’s pre-condition together with
invariant TI2(a) implies that t has not been committed at any database under any version, which
invalidates this invariant’s implication condition.
DTI8 For the sake of contradiction, assume there is a database d such that DB (d )!q[ht, vers[d ][t]i] =
Commit, tdec[t] ∈
/ Decided , and q[t] is set to Commit by this action. Then, invariant TI2(f) with
these assumptions and the action’s pre-condition imply that DB (d )!q[ht, vers[d ][t]i] = NoReq, a
contradiction with our first assumption.
DTI9 Transaction t is proposed only if tdec[t] ∈
/ Decided and the action’s pre-condition together with
invariant TI2(f) implies that DB (d )!q[htvers[d ][t]i] = NoReq, which invalidates this invariant’s implication condition.
DTI10 For the sake of contradiction, assume there is a database d such that
DB (d )!q[ht, vers[d ][t]i] ∈ Request, t ∈
/ proposed , and dreq[t] is set to a value different from
DB (d )!q[ht, vers[d ][t]i] (we concentrate on condition (c) since the verification of the other conditions
and the implication itself are simple). Then, invariant TI2(f) with these assumptions and the fact
that dreq[t] is only changed if tdec[t] ∈
/ Decided imply that DB (d )!q[ht, vers[d ][t]i] = NoReq, a
contradiction with our first assumption.
DTI11 If t is proposed by this action, then tdec[t] ∈
/ Decided and invariant TI2(f) invalidates this
invariant’s implication condition.
DTI12 Automatically preserved.
Action ReplyReq
Database Invariant The action changes thist, which could affect the Database Invariant. However,
TI1(a) implies that t has not been committed at any database, preserving the Database Invariant.
tdec Invariant The action only changes thist[t] if tdec[t] ∈
/ Decided , which is not true if t ∈
/ proposed
and pdec[t] = Committed , by the definition of tdec. Therefore, the invariant is preserved.
dreply Invariant According to the action’s pre-condition, thist[t] is only changed if rep ∈ Result and
rep = dreply[t], which implies that dreply[t] 6= Committed and automatically preserves the invariant.
TI1 The action changes q[t] to NoReq, which automatically preserves this invariant.
TI2 The action’s pre-condition implies that it is executed for a transaction t such that tdec[t] ∈
/ Decided
only if q[t] ∈ Op. Invariant TI 1 implies that no database has committed t in this case and t has not
been proposed. Conditions (c-f) are ensured by TI3(a-d) and condition (g) is ensured by the action
definition.
18
TI3 By invariant TI3 and the action’s definition, if thist[t] changes, its length will equal dcnt[DBof (t)][t],
which just invalidates TI3’s implication condition.
TI4 As for conditions (a), (c), and (d), the action only changes thist[t] and dreq[t] if tdec[t] ∈
/ Decided
and q[t] ∈ Op. Invariant TI1(b) implies that t ∈
/ proposed , which contradicts the invariant’s implication condition. As for condition (b), assume t ∈ proposed , tdec[t] ∈
/ Decided and q[t] is changed from
Commit to NoReq by this action. Such assumptions conflict with the action definition since q[t] must
be in Op for it to be enabled when tdec[t] ∈
/ Decided .
TI5 The action only changes thist[t] if it sets dreq[t] to NoReq, which automatically preserves the
invariant.
TI6 Easily verified.
TI7 If the action changes thist[t], it also sets dreq[t] to NoReq, preserving the invariant.
DTI1-2 Automatically preserved.
DTI3 The only variable related to the invariant that is changed by the action is thist. However, thist[t] is
only changed if tdec[t] ∈
/ Decided , q[t] ∈ Op, and dcnt[DBof (t)][t] > Len(thist[t]). Invariant TI1(b)
validates the implication condition of TI3 for t and TI3(a) automatically invalidates the implication
condition of DTI3, preserving the invariant.
DTI4 Again, the only variable of interest is thist, and it is changed only if tdec[t] ∈
/ Decided and
q[t] ∈ Op. In this case, invariant TI1(a) invalidates the implication condition of DTI4, preserving the
invariant.
DTI5 The action may only extend thist[t], which automatically preserves this invariant.
DTI6-7 Automatically preserved.
DTI8 Assume, for the sake of contradiction, that DB (d )!q[ht, vers[d ][t]i] = Commit, tdec[t] ∈
/ Decided
and q[t] is changed from Commit to NoReq by this action. However, the action is only enabled when
tdec[t] ∈
/ Decided if q[t] ∈ Op, which contradicts the fact that q[t] equals Commit before the action
is executed.
DTI9 Automatically preserved.
DTI10 The action only changes thist[t] and dreq[t] if tdec[t] ∈
/ Decided and, in this case, the action’s
pre-condition implies that dcnt[d ][t] > Len(thist[t]), which conflicts with the invariant’s condition
(b) and contradicts its validity before the action execution, unless the implication condition is not
satisfied. Since the action does not change the variables involved in the implication condition, the
invariant is preserved.
DTI11 This action only changes thist[t] if tdec[t] ∈
/ Decided and invariant TI2(f) invalidates DTI11’s
implication condition, preserving the invariant.
DTI12 Automatically preserved.
Action PrematureAbort
Database Invariant Automatically preserved since this invariant does not involve pdec.
tdec Invariant The invariant preserved since it involves only transactions t such that t ∈
/ proposed
and pdec[t] = Committed . PrematureAbort executes for a transaction t only if t ∈
/ proposed and
pdec[t] ∈
/ Decided and it changes pdec[t] to Aborted , not interfering with the invariant condition.
dreply Invariant Automatically preserved.
19
TI1-2,6 This action can only change tdec[t] from Unknown to Aborted , which preserves these invariants
since Aborted ∈ Decided .
TI3 Automatically preserved.
TI4 Conditions (a), (c) , and (d) are automatically preserved. As for condition (b), this action can only
change tdec[t] to a value in Decided (Aborted ), preserving the invariant as well.
TI5,7 Automatically preserved
DTI1-7,9-12 Automatically preserved.
DTI8 Easily verified.
Action PassiveCommit
Database Invariant Automatically preserved.
tdec Invariant If tdec[t] is changed to Committed , the action’s pre-condition implies that dreply[t]
equals Committed and the dreply Invariant ensures that thist[t] is passive and t belongs to
CommittedAt(DBof (t)).
dreply Invariant Automatically preserved.
TI1-2,6 This action can only change tdec[t] from Unknown to Committed , which preserves these invariants since Committed ∈ Decided .
TI3 Automatically preserved.
TI4 Condition (a), (c), and (d) are automatically preserved. As for condition (b), this action can only
change tdec[t] to a value in Decided (Committed ), preserving the invariant as well.
TI5,7 Automatically preserved
DTI1-7,9-12 Automatically preserved.
DTI8 Easily verified.
Action DBReq joint with DB (d )!ReceiveReq
Database Invariant Automatically preserved.
tdec Invariant Automatically preserved.
dreply Invariant Automatically preserved.
TI1 Automatically preserved.
TI2 This action could break condition (f) of invariant TI2 for some transaction t. If the action is
enabled by its first condition, invariants TI5(a) and TI6 imply that q[t] ∈ Request, contradicting TI2’s
implication condition. If the action is enabled by its second or third condition, then the termination
properties ensure that t ∈ proposed and invariant TI4(b) contradict TI2’s implication condition.
TI3 This action sets DB (d )!q[ht, vers[d ][t]i] to a value different from NoReq and could break TI3(a) for
t. However, condition 1 requires that dcnt[DBof (t)][t] = Len(thist[t]), contradicting TI3’s implication
condition. Conditions 2 and 3 (with the Nontriviality property of the termination protocol) imply
that t has been proposed, also contradicting TI3’s implication condition.
TI4-7 Automatically preserved.
DTI1 Obviously preserved.
20
DTI2 Automatically preserved.
DTI3 This action can only set Db(d )!q[ht, vers[d ][t]i] to Commit by conditions 1 or 3. In the first case,
the invariant is guaranteed by invariant TI5(b-c). If condition 3 enables this action, there are two
cases to consider.
d = DBof (t) ∧ vers[d ][t] = 0: The Nontriviality and Consistency properties of the termination protocol imply that t ∈ proposed and invariant TI4(a) ensures that DTI3 is preserved.
dcnt[d ][t] = Len(ActHist(t)): In this case, DTI3 is ensured by DTI5.
DTI4-7 Automatically preserved.
DTI8 If the action is triggered by condition 1 and sets DB (d )!q[ht, vers[d ][t]i] to Commit, then dreq[t] =
Commit. Invariant TI7 implies that t ∈
/ proposed and invariant TI6 ensures DTI8. If the action is
triggered by condition 2, it cannot set DB (d )!q[ht, vers[d ][t]i] to Commit. Finally, if the action is
triggered by condition 3, then the Consistency and Nontriviality properties of the termination protocol
ensure that tdec[t] = Committed .
DTI9 If the action is triggered by condition 1, then t ∈
/ proposed by TI7(b). If the action is triggered
by condition 3, this condition itself ensures DTI9.
DTI10 The only enabling condition that could interfere with this invariant for this action is condition
1. However, it can be easily verified that it ensures DTI10(c).
DTI11 The only enabling condition that could interfere with this invariant for this action is condition
2. TI4(c) ensures DTI11(a) and DTI11(b) is easily verified.
DTI12 Automatically preserved.
Action DB (d )!DoAbort
Database, tdec, and dreply Invariants, and TI1-2 The action can only change DB (d )!tdec by internally abort a transaction, which does not change CommittedAt(d ).
TI3-7 Automatically preserved.
DTI1 Automatically preserved.
DTI2 Easily verified since it changes DB (d )!tdec[ht, v i] from Unknown to Aborted .
DTI3 Automatically preserved.
DTI4 Easily verified since it changes DB (d )!tdec[ht, v i] from Unknown to Aborted .
DTI5 Automatically preserved.
DTI6 Easily verified since it changes DB (d )!tdec[ht, v i] from Unknown to Aborted .
DTI7-12 Automatically preserved.
Action DB (d )!DoCommit
Database Invariant There are two cases to consider.
t ∈ proposed Take the sequence seq of the Database Invariant before the action is executed. Invariants
DTI9 and DTI7 imply that all transactions previous to t in learnedSeq[d ] are already present in
seq. Invariants DTI6 and DTI7 imply that all proposed transactions committed at d appear before
t in learnedSeq[d ], otherwise t would have already been committed at d and the action would not
be enabled. This fact and conditions 2 and 3 of the Database Invariant imply that the sequence
of states generated by seq is the same as the one generated by the longest prefix not including
21
t of the sequence defined in the Consistency property for termination. Let st be the last state
generated by this sequence. The Consistency property ensures that there is a state st2 such that
CorrectAtomicHist(thist[t], st, st2) is true. As a result, we can add t to the end of seq and satisfy
condition 1 of the Database Invariant after the action is executed. By DTI3, Assumption 1, the
definition of ActHist, if thist[t] is active, so is DB (d )!thist[ht, vers[d ][t]i] and t should be added to
DB (d )!serialSeq, satisfying condition 2 of the Database Invariant. Condition 3 is satisfied because,
as we pointed out in the very beginning of this case’s analysis, a proposed transaction is committed
at d iff it appears before t in learnedSeq[d ].
t ∈
/ proposed As before, take sequence seq of the Database Invariant before the action is executed.
A simple induction on the size of DB (d )!serialSeq and NVerialSeq(d ) taking into consideration
the Database Invariant as well as DTI4 shows that the sequence of different states generated
by these two sequences with respect to DB (d )!thist and thist, respectively, is exactly the the
same. DTI10(c), TI7 and the definition of action DoCommit imply that t is passive and it can
be atomically executed after some of the states mentioned in the previous step. We can place t
exactly after that state is generated in seq, satisfying condition 1 of the Database Invariant after
the action is executed. Conditions 2 and 3 are automatically satisfied since t is passive.
tdec Invariant Easily preserved, since CommittedAt(d ) can only be increased.
dreply Invariant Easily preserved, since CommittedAt(d ) can only be increased.
TI1-2 Easily preserved, given DTI7.
TI3-7 Automatically preserved.
DTI1 Automatically preserved.
DTI2 Easily verified given DTI1.
DTI3,5 Automatically preserved.
DTI4 By DTI3.
DTI6 By DTI9.
DTI7 Obviously preserved, since CommittedAt(d ) can only be increased.
DTI8-12 Automatically preserved.
Action DBRep joint with DB (d )!ReplyReq
Database and tdec Invariants Automatically preserved.
dreply Invariant dreply[t] is set to Committed only if DB (d )!q[ht, vers[d ][t]i] equals Commit. Invariants
DTI10(c) and TI 7(a) imply that thist[t] is passive, and invariant DTI10(a) with the definition of action
DB (d )!ReplyReq ensures that t will belong to CommittedAt(DBof (t)).
TI1 Automatically preserved.
TI2 The fact that t ∈ proposed contradicts TI2(b) and imply that the implication condition of TI2 is
false. Therefore, we have to consider only the case in which t ∈
/ proposed . In this case, DTI10(c)
implies that dreq[t] is different from NoReq and TI6 implies that so is q[t], a contradiction with the
implication condition of TI2.
TI3 Easily verified by DTI10 and the action definition.
TI4 DTI11(a) implies that TI4(a) is preserved. TI4(b) is automatically preserved, and TI4(c) is easily
verified by TI4(c) itself and the action definition. TI4(d) is also automatically preserved.
22
TI5 If t ∈ proposed , TI4(d) implies that dreq[t] = NoReq, contradicting the implication condition and
preserving the invariant. If t ∈
/ proposed , DTI10(b) and the action definition imply that dcnt[d ][t] is
set to Len(thist[t]) + 1, also contradicting the implication condition and preserving the invariant.
TI6-7 Automatically preserved.
DTI1 The action sets DB (d )!q[ht, vers[d ][t]i] to NoReq, preserving the invariant.
DTI2 vers[d ][t] is increased only if DB (d )!tdec[ht, v i] equals Aborted .
DTI3 Easily verified by DTI1 and the action definition since DB (d )!q[ht, vers[d ][t]i] is set to NoReq.
DTI4 If vers[d ][t] is changed, DTI2 preserves DTI4. Otherwise, if DB (d )!tdec[ht, vers[d ][t]i] = Committed ,
DB (d )!thist is not changed and the invariant is preserved.
DTI5 If vers[d ][t] is changed, then it is increased and dcnt[d ][t] is set to 0. In this case, invariant
DTI12 preserves DTI5. If vers[d ][t] is not changed, there are to cases to analyze. If t ∈
/ proposed the
invariant’s implication condition is invalidated by DTI10(a,d); If t ∈ proposed , then DTI11(b) and
the action definition preserve DTI5.
DTI6 Easily verified by DTI2 in case vers[d ][t] changes.
DTI7 Easily verified by the action definition.
DTI8-9 Easily verified in case vers[d ][t] changes by DTI1.
DTI10-11 The action sets DB (d )!q[ht, vers[d ][t]i] to NoReq, invalidating these invariants’ implication
condition.
DTI12 By DTI12 and the fact that vers[d ][t] can only be increased.
Termination action changing gdec[t] from Unknown to a value in Decided
Database, tdec, and dreply Invariants Automatically preserved.
TI1-2 Easily verified since this action can only change tdec[t] to a value in Decided , invalidating these
invariants’ implication condition.
TI3 Automatically preserved.
TI4 Conditions (a) and (c-d) are automatically preserved. Condition (b) is easily verified since this
action changes tdec[t] to a value in Decided .
TI5,7 Automatically preserved.
TI6 Easily verified since this action can only change tdec[t] to a value in Decided , invalidating the
invariant’s implication condition.
DTI1-7,9-12 Automatically preserved.
DTI8 Easily verified since this action can only change tdec[t] to a value in Decided , invalidating the
invariant’s implication condition.
Termination action changing learnedSeq[d ] — Recall that this action can only extend learnedSeq[d ] by
the Stability property of the termination protocol.
Database Invariant Easily verified since learnedSeq[d ] is only extended.
tdec and dreply Invariants Automatically preserved.
TI1-7 Automatically preserved.
DTI1-5,7-8,10-12 Automatically preserved.
DTI6,9 Easily verified since learnedSeq[d ] is only extended.
23
B
Proof of Theorem 2
Theorem 2 Our abstract deferred update algorithm with the Consistency property for termination changed
for the Alternative Consistency property defined above does not implement the specification of a serializable
database given in Section 2.
Proof sketch: To understand why, consider the case with two active transactions t1 and t2 that write
distinct database objects, x and y, respectively, and do not read anything. Transaction t1 can execute on
database d1 and transaction t2 can execute on database d2 . Both transactions are free to commit and can
be proposed to the termination protocol. Executing either t1 before t2 or t2 before t1 leads to the same final
state and they both can be committed in gdec. Assume, then, that database d1 follows the ordering ht1 , t2 i
and executes and commits t1 first. Database d2 follows the ordering ht2 , t1 i, executing and committing t2
first. At this point, if a passive transaction reads the whole state of database d1 , it will see the execution
of t1 but not the execution of t2 , which implies that t1 must be serialized before t2 . If a passive transaction
reads d2 , it will imply that t2 must be serialized before t1 . Since passive transactions are free to execute
completely at the databases responsible for them, all these transactions are free to commit locally and this
scenario would break the global serializability.
C
Proof of Theorem 3
Theorem 3 The four properties Nontriviality, Stability, Consistency, and Liveness of our Termination Protocol specification satisfy the Nontriviality, Stability, Consistency, and Liveness properties of Sequence Agreement for transactions that commit where commands are transactions and learnedSeq implements learned .
Proof sketch: For any execution of the Termination Protocol, consider only the set of proposed transactions that eventually commit (gdec[t] is set to Committed ) as the set of proposed transactions in an
execution of Sequence Agreement. We show that all properties of Sequence Agreement are guaranteed in
the following:
Nontriviality Guaranteed by Consistency and Nontriviality of Termination.
Stability Trivially guaranteed by Stability of Termination.
Consistency By the Consistency property of Termination, all learnedSeq sequences are prefixes of a common sequence seq of committed (proposed, for Sequence Agreement) transactions, which guarantees that,
for every two of them, one is a prefix of the other.
Liveness By the Liveness property of Termination.
D
D.1
TLA Specifications
Module DatabaseConstants
This module contains general database definitions.
module DatabaseConstants
extends Sequences, FiniteSets, Naturals
The specification is based on the following constants:
- Tid : Set of transaction ids, where each id identifies a single transaction.
- Op: Set of possible transaction operations different from Commit or Abort.
- Commit, Abort: Special operations for committing/aborting a transaction.
- Result: Set of operation results.
24
- Committed , Aborted : Special results returned when a transaction is committed/aborted.
- CorrectOp(op, res, dbstate, newdbstate): Predicate that tells if operation op, when executed upon database state dbstate,
may give res as a result and generate new database state newdbstate.
- DBState: Set of database states.
- InitialDBState: Initial database state.
- FSeq: A substitute for Seq – just a trap for the model checker.
- Universe: A set to bound unbounded choose statements – another trap for the model checker.
constants Tid , Op, Commit, Abort, Result, Committed , Aborted , CorrectOp( , , , ),
DBState, InitialDBState, FSeq( ), Universe
We define Unknown as a transaction status in which the transaction has been neither committed nor aborted.
Decided
Unknown
∆
= {Committed , Aborted }
∆
= choose v ∈ Universe : v ∈
/ Decided
Request is the set of all possible requests and NoReq is defined to be something that is not a (valid) request.
∆
Request = Op ∪ {Commit, Abort}
∆
NoReq = choose noreq ∈ Universe : noreq ∈
/ Request
Reply is the set of all possible replies for a request.
∆
= Result ∪ Decided
Reply
Assumptions
The values used as transaction decisions (Committed and Aborted ) must be different from operation results because we assume
the decision is given as the response for operations issued after the transaction has been committed or aborted, so that the client
is told that the operation was not performed because the transaction has been decided. If Committed or Aborted corresponds
to a correct operation result, the client cannot tell if the operation executed or the transaction terminated.
assume Committed ∈
/ Result
assume Aborted
∈
/ Result ∪ {Committed }
We must also assume that Commit and Abort requests are different and not present in Op.
assume Commit ∈
/ Op
assume Abort
∈
/ Op ∪ {Commit}
InitialDBState must belong to DBState
assume InitialDBState ∈ DBState
CorrectOp must be a correct predicate on op, res, dbstate, and newdbstate
assume ∀ op ∈ Op, res ∈ Result,
dbstate ∈ DBState, newdbstate ∈ DBState :
CorrectOp(op, res, dbstate, newdbstate) ∈ boolean
Auxiliar Expressions
OpRec represents a tuple in Op × Result as a record with two fields: op and res.
∆
OpRec = [op : Op, res : Result]
THist is the set of all possible transaction histories.
∆
THist = FSeq(OpRec)
25
THistVector is the set of all possible history vectors.
∆
THistVector = [Tid → THist]
CorrectAtomicHist verifies if the operations in transaction history h, when sequentially applied to the database state initst, can
provide the same results they provided in h and generate the final database state finalst. It is defined as a recursive function that
tests operation by operation, in order, with a simple tail recursion. CorrectAtomicHist is defined so that even nondeterministic
operations are allowed. A single operation can provide nondeterministic results or change the database nondeterministically.
∆
CorrectAtomicHist[h ∈ THist, initst ∈ DBState, finalst ∈ DBState] =
if h = hi
then initst = finalst
else ∃ ist ∈ DBState : ∧ CorrectOp(Head (h).op, Head (h).res, initst, ist)
∧ CorrectAtomicHist[Tail (h), ist, finalst]
Perm(S ) represents all sequences containing exactly one copy of each element in set S . It represents all the possible orderings
of elements in S . The name Perm comes from permutations although a permutation is a function from S to S , and not a
sequence derived from S . For want of a better name, we kept Perm.
∆
∆
Perm(S ) = let N = Cardinality(S )
in
{s ∈ [1 . . N → S ] : {s[i ] : i ∈ 1 . . N } = S }
CorrectSerialization verifies if sequence seq of transaction ids represents a correct serial execution of its transactions with
respect to their histories in history vector thist, initial database state initst, and final database state finalst. It is defined as as
a recursive function, like CorrrectAtomicHist, that verifies transaction by transaction with a simple tail recursion.
∆
CorrectSerialization[seq ∈ FSeq(Tid ), thist ∈ THistVector , initst ∈ DBState, finalst ∈ DBState] =
if seq = hi
then initst = finalst
else ∃ ist ∈ DBState : ∧ CorrectAtomicHist[thist[Head (seq)], initst, ist]
∧ CorrectSerialization[Tail (seq), thist, ist, finalst]
IsSerializable(S , thist, initst) verifies if set S can have a sequence containing each of its elements exactly once such that its
execution is serializable with respect to history vector thist and initial database state initst.
∆
IsSerializable(S , thist, initst) =
∃ seq ∈ Perm(S ), db ∈ DBState : CorrectSerialization[seq, thist, initst, db]
PassiveOp(op) is satisfied iff operation op is passive.
∆
PassiveOp(op) =
∀ st1, st2 ∈ DBState, res ∈ Result :
CorrectOp(op, res, st1, st2) ⇒ st1 = st2
PassiveHist(h) is satisfied iff history h is passive.
∆
PassiveHist(h) =
∀ st1, st2 ∈ DBState :
CorrectAtomicHist[h, st1, st2] ⇒ st1 = st2
D.2
Module SerializableDB
This module presents a TLA+ version of our serializable database specification. It extends module DatabaseInterface
that defines interface operators DBRequest and DBResponse in terms of an interface variable DBinter . Our
specifications are practically oblivious to how these operators are defined as long as their definitions are
disjoint. For the sake of simplicity, we do not present our specification of module DatabaseInterface.
26
module SerializableDB
extends DatabaseConstants, DBInterface
variables thist, tdec, q and DBinter from DBInterface
∆
thistType = THistVector
∆
tdecType = [Tid → Decided ∪ {Unknown}]
∆
qType
= [Tid → Request ∪ {NoReq}]
Set of all committed transactions
∆
committedSet = {t ∈ Tid : tdec[t] = Committed }
ReceiveReq(t, req) deals with the receipt of a transaction request. It stores the received request in q[t] for it to be processed
later.
∆
ReceiveReq(t, req) = ∧ DBRequest(t, req)
∧ q[t] = NoReq
∧ q 0 = [q except ![t] = req]
∧ unchanged hthist, tdeci
ReplyReq(t, rep) deals with the response of an executing request. It checks whether transaction t has already been decided; if
so, the response to t’s executing request is its final decision. If t has not been decided yet, then the action is enabled only if op
is in Op and rep is in Result. In such a case, the operation and its result are enqueued in t’s history.
∆
ReplyReq(t, rep) = ∧ q[t] ∈ Request
∧ DBResponse(t, rep)
∧ q 0 = [q except ![t] = NoReq]
∧ if tdec[t] ∈ Decided
then ∧ rep = tdec[t]
∧ unchanged hthist, tdeci
else ∧ q[t] ∈ Op
∧ rep ∈ Result
∧ thist 0 = [thist except ![t] = Append (@, [op →
7 q[t],
res →
7 rep])]
∧ unchanged htdeci
Action DoAbort(t) aborts t by setting tdec[t] to Aborted . This can be done at any time as long as t has not been decided yet.
∆
DoAbort(t) = ∧ tdec[t] ∈
/ Decided
0
∧ tdec = [tdec except ![t] = Aborted ]
∧ unchanged hthist, q, DBinter i
Action DoCommit(t) commits t by setting tdec[t] to Committed , which is done only if t has not been decided and t’s commit
request has been issued.
∆
DoCommit(t) = ∧ tdec[t] ∈
/ Decided
∧ q[t] = Commit
∧ tdec 0 = [tdec except ![t] = Committed ]
∧ IsSerializable(committedSet 0 , thist, InitialDBState)
∧ unchanged hthist, q, DBinter i
Specification
Initialization.
27
∆
Init = ∧ InitInterface
∧ thist = [i ∈ Tid 7→ hi]
∧ tdec = [i ∈ Tid 7→ Unknown]
∧ q = [t ∈ Tid 7→ NoReq]
Next defines the possible “next” steps in a correct execution.
∆
Next = ∃ t ∈ Tid :
∨ ∨ ∃ req ∈ Request : ReceiveReq(t, req)
∨ ∃ rep ∈ Reply : ReplyReq(t, rep)
∨ DoCommit(t)
∨ DoAbort(t)
Final specification.
Spec = Init ∧ 2[Next]hthist, tdec, q, DBinter i
∆
D.3
Module OPSerializableDB
This module presents our specification of an order-preserving serializable database.
module OPSerializableDB
extends SerializableDB
serialSeq keeps the commit order
variables serialSeq
∆
serialSeqType = {s ∈ FSeq(Tid ) : ∀ i , j ∈ domain s : i 6= j ⇒ s[i ] 6= s[j ]}
∆
OPDoCommit(t) = ∧ tdec[t] = Unknown
∧ q[t] = Commit
∧ tdec 0 = [tdec except ![t] = Committed ]
∧ serialSeq 0 = Append (serialSeq, t)
∧ ∃ st ∈ DBState : CorrectSerialization[serialSeq 0 , thist, InitialDBState, st]
∧ unchanged hthist, q, DBinter i
Specification
Initialization.
∆
OPInit = ∧ Init
∧ serialSeq = hi
Next defines the possible “next” steps in a correct execution.
∆
OPNext = ∃ t ∈ Tid :
∨ ∧ ∨ ∃ req ∈ Request : ReceiveReq(t, req)
∨ ∃ rep ∈ Reply : ReplyReq(t, rep)
∨ DoAbort(t)
∧ unchanged serialSeq
∨ OPDoCommit(t)
28
Final specification.
OPSpec = OPInit ∧ 2[OPNext]hthist, tdec, q, DBinter , serialSeqi
∆
D.4
Module AOPSerializableDB
This module presents our specification of an active order-preserving serializable database.
module AOPSerializableDB
extends SerializableDB
serialSeq keeps the commit order
variables serialSeq
∆
serialSeqType = {s ∈ FSeq(Tid ) : ∀ i , j ∈ domain s : i 6= j ⇒ s[i ] 6= s[j ]}
Function IsSubSeq below verifies if smallseq is a subsequence of bigseq.
∆
IsSubSeq[smallseq ∈ FSeq(Tid ), bigseq ∈ FSeq(Tid )] =
(smallseq 6= hi) ⇒
∃ i ∈ 1 . . Len(bigseq) :
∧ bigseq[i ] = Head (smallseq)
∧ ∀ j ∈ 1 . . Len(bigseq) :
bigseq[j ] = Head (smallseq) ⇒ j ≥ i
∧ IsSubSeq[Tail (smallseq), SubSeq(bigseq, i + 1, Len(bigseq))]
∆
AOPDoCommit(t) = ∧ tdec[t] = Unknown
∧ q[t] = Commit
∧ tdec 0 = [tdec except ![t] = Committed ]
∧ if PassiveHist(thist[t])
then unchanged serialSeq
else serialSeq 0 = Append (serialSeq, t)
∧ ∃ seq ∈ Perm(committedSet 0 ), st ∈ DBState :
∧ CorrectSerialization[seq, thist, InitialDBState, st]
∧ IsSubSeq[serialSeq 0 , seq]
∧ unchanged hthist, q, DBinter i
Specification
Initialization.
∆
AOPInit = ∧ Init
∧ serialSeq = hi
Next defines the possible “next” steps in a correct execution.
∆
AOPNext = ∃ t ∈ Tid :
∨ ∧ ∨ ∃ req ∈ Request : ReceiveReq(t, req)
∨ ∃ rep ∈ Reply : ReplyReq(t, rep)
∨ DoAbort(t)
∧ unchanged serialSeq
∨ AOPDoCommit(t)
29
Final specification.
AOPSpec = AOPInit ∧ 2[AOPNext]hthist, tdec, q, DBinter , serialSeqi
∆
D.5
Module GeneralDeferredUpdate
This is the TLA+ specification of our abstract deferred update algorithm.
module GeneralDeferredUpdate
extends DatabaseConstants, DBInterface
constants Database, DBof ( ), StripPassive( )
variables thist, q, dreq, pdec, Client variables
dreply, dcnt, vers, dcom, Database variables
ldinter , dthist, dtdec, dq, dserialSeq, Internal database variables
proposed , learnedSeq, gdec Termination variables
assume ∀ t ∈ Tid : DBof (t) ∈ Database
assume ∀ hist ∈ THist, st1, st2 ∈ DBState :
∧ StripPassive(hist) ∈ THist
∧ CorrectAtomicHist[hist, st1, st2] ⇒ CorrectAtomicHist[StripPassive(hist), st1, st2]
Definition of tdec based on pdec e gdec
∆
tdec = [t ∈ Tid 7→ if t ∈ proposed
then gdec[t]
else pdec[t]]
Each database accepts transactions with ids in the form htid , versioni where tid is an element of Tid and version is a Natural.
This allow “a single” transaction to be submitted to a database multiple times.
∆
LocalTid = Tid × Nat
The definition below instantiates each local database used by the general algorithm.
∆
DBS (d ) = instance AOPSerializableDB with Tid
DBinter
thist
tdec
q
serialSeq
← LocalTid ,
← ldinter [d ],
← dthist[d ],
← dtdec[d ],
← dq[d ],
← dserialSeq[d ]
NoRep is defined to be some value that is not a valid reply.
∆
NoRep = choose v : v ∈
/ Reply
The definition below creates an instance of the termination protocol specification.
∆
GT = instance GeneralTermination
Auxiliary definitions to help dealing with the declared variables.
30
cvars
ldvars
gdvars
dvars
tvars
∆
=
∆
=
∆
=
∆
=
∆
=
hthist, q, dreq, pdeci
hldinter , dthist, dtdec, dq, dserialSeqi
hdreply, dcnt, vers, dcomi
hgdvars, ldvarsi
hproposed , learnedSeq, gdeci
ActHist(t) returns the current history of transaction t with some of its passive operations taken out of the sequence (according
to operator StripPassive).
∆
ActHist(t) = StripPassive(thist[t])
DBvars(d ) returns the internal variables of database d .
∆
DBvars(d ) = hldinter [d ], dthist[d ], dtdec[d ], dq[d ], dserialSeq[d ]i
OtherDBsStutter (d ) is an action that forces all databases but d to execute a stuttering step, that is, a step in which their
internal variables do not change values. For simplicity, our specification does not allow interleaving of database actions. In fact,
as we explain in the following, it does not allow interleaving at all.
∆
OtherDBsStutter (d ) =
∆
let dbfn = [nd ∈ (Database \ {d }) 7→ DBvars(nd )]
in
dbfn 0 = dbfn
Here are the atomic actions of the general deferred update technique, not including the internal database actions and the
internal actions of the termination protocol. In order to model check this specification, we had to make it noninterleaving,
that is, we had to specify it in terms of actions that cannot occur concurrently (even considering that they are executed by
different specification components). This prevented us from using the DBRequest and DBResponse primitives to interact with
the internal databases. Instead, we used the ReceiveReq and ReplyReq actions directly to submit an operation and get a response
from a database.
The ReceiveReq action as explained in the paper.
∆
ReceiveReq(t, req) =
∧ DBRequest(t, req)
∧ q[t] ∈
/ Request
0
∧ q = [q except ![t] = req]
∧ if tdec[t] ∈
/ Decided
then ∨ ∧ req = Commit
∧ GT !Propose(t)
∧ unchanged hthist, dreq, pdec, dvarsi
∨ ∧ req = Commit ⇒ PassiveHist(thist[t])
∧ dreq 0 = [dreq except ![t] = req]
∧ unchanged hthist, pdec, dvars, tvarsi
else unchanged hthist, dreq, pdec, dvars, tvarsi
The ReplyReq action.
∆
ReplyReq(t, rep) =
∧ q[t] ∈ Request
∧ DBResponse(t, rep)
∧ q 0 = [q except ![t] = NoReq]
∧ if tdec[t] ∈ Decided
then ∧ rep = tdec[t]
31
∧ unchanged hthist, dreq, pdec, dvars, tvarsi
else ∧ q[t] ∈ Op
∧ rep ∈ Result
∧ dcnt[DBof (t)][t] > Len(thist[t])
∧ rep = dreply[t]
∧ thist 0 = [thist except ![t] = Append (@, [op 7→ q[t],
res 7→ rep])]
∧ dreq 0 = [dreq except ![t] = NoReq]
∧ unchanged hpdec, dvars, tvarsi
The PrematureAbort action.
∆
PrematureAbort(t) = ∧ t ∈
/ proposed
∧ pdec[t] ∈
/ Decided
∧ pdec 0 = [pdec except ![t] = Aborted ]
∧ unchanged hthist, q, dreq, dvars, tvars, DBinter i
The PassiveCommit action.
∆
PassiveCommit(t) = ∧ t ∈
/ proposed
∧ pdec[t] ∈
/ Decided
∧ dreply[t] = Committed
∧ pdec 0 = [pdec except ![t] = Committed ]
∧ unchanged hthist, q, dreq, dvars, tvars, DBinter i
The DBReq action with its three enabling conditions.
∆
DBReq(d , t, req) =
∧
∨ ∧ d = DBof (t)
Condition 1
∧ dreq[t] = req
∧ dcnt[d ][t] = Len(thist[t])
∨ ∧ t ∈ proposed
Condition 2
∧ dcnt[d ][t] < Len(ActHist(t))
∧ req = ActHist(t)[dcnt[d ][t] + 1].op
∨ ∧ req = Commit
Condition 3
∧ ∃ i ∈ 1 . . Len(learnedSeq[d ]) :
∧ learnedSeq[d ][i ] = t
∧ ∀ j ∈ 1 . . i : dcom[d ][learnedSeq[d ][j ]]
∧ ∨ d = DBof (t) ∧ vers[d ][t] = 0
∨ dcnt[d ][t] = Len(ActHist(t))
∧ DBS (d )!ReceiveReq(ht, vers[d ][t]i, req)
∧ OtherDBsStutter (d )
∧ unchanged hcvars, gdvars, tvars, DBinter i
The DBRep action.
∆
DBRep(d , t, rep) =
∧ DBS (d )!ReplyReq(ht, vers[d ][t]i, rep)
∧ OtherDBsStutter (d )
∧ if d = DBof (t) then dreply 0 = [dreply except ![t] = rep]
else unchanged dreply
∧ if rep = Aborted ∧ t ∈ proposed
then ∧ vers 0 = [vers except ![d ][t] = @ + 1]
32
∧ dcnt 0 = [dcnt except ![d ][t] = 0]
∧ unchanged dcom
else ∧ dcnt 0 = [dcnt except ![d ][t] = @ + 1]
∧ dcom 0 = [dcom except ![d ][t] = (rep = Committed )]
∧ unchanged vers
unchanged hcvars, tvars, DBinter i
∧
Initialization.
∆
Init = ∧ InitInterface
∧q
= [t ∈ Tid 7→ NoReq]
∧ dreq = [t ∈ Tid 7→ NoReq]
∧ dreply = [t ∈ Tid 7→ NoRep]
∧ pdec = [t ∈ Tid 7→ Unknown]
∧ vers = [d ∈ Database 7→ [t ∈ Tid 7→ 0]]
∧ dcom = [d ∈ Database 7→ [t ∈ Tid 7→ false]]
∧ dcnt = [d ∈ Database 7→ [t ∈ Tid 7→ 0]]
∧ ∀ d ∈ Database : DBS (d )!AOPInit
∧ GT !Init includes thist
The next-state action in terms of noninterleaving actions.
∆
Next = ∨ ∃ t ∈ Tid : ∨ ∃ req ∈ Request : ReceiveReq(t, req)
∨ ∃ rep ∈ Reply : ReplyReq(t, rep)
∨ PrematureAbort(t)
∨ PassiveCommit(t)
∨ ∃ d ∈ Database : ∨ ∃ t ∈ Tid : ∨ ∃ req ∈ Request : DBReq(d , t, req)
∨ ∃ rep ∈ Reply : DBRep(d , t, rep)
∨ ∧ unchanged ldinter [d ] ∧ DBS (d )!AOPNext
∧ OtherDBsStutter (d )
∧ unchanged hcvars, gdvars, tvars, DBinter i
∨ ∧ unchanged hcvars, dvars, DBinter i
∧ GT !TNext
The final specification, including the liveness condition of the termination protocol.
Spec = Init ∧ 2[Next]hcvars, dvars, tvars, DBinter i ∧ GT !Liveness
∆
D.6
Module GeneralTermination
This module presents our specification of the Termination Protocol.
module GeneralTermination
extends DatabaseConstants
constants Database
variables proposed , learnedSeq, gdec, thist
pdec stands for premature/passive decision (client decision)
gdec stands for global decision (for global transactions)
33
∆
vars = hproposed , learnedSeq, gdeci
∆
committedSet = {t ∈ Tid : gdec[t] = Committed }
IsPrefix (smallseq, bigseq) verifies if smallseq is a prefix of bigseq.
∆
IsPrefix (smallseq, bigseq) = ∃ n ∈ 0 . . Len(bigseq) : smallseq = SubSeq(bigseq, 1, n)
The consistency property in TLA+ The other properties are automatically guaranteed by the atomic actions below.
∆
Consistency =
∃ seq ∈ Perm(committedSet), st ∈ DBState :
∧ CorrectSerialization[seq, thist, InitialDBState, st]
∧ ∀ d ∈ Database : IsPrefix (learnedSeq[d ], seq)
Propose(t) proposes a transaction t for termination.
∆
Propose(t) =
∧t ∈
/ proposed
∧ proposed 0 = proposed ∪ {t}
∧ unchanged hlearnedSeq, gdeci
Decide(t) makes a final decision (Committed or Aborted ) about proposed transaction t.
∆
Decide(t) =
∧ t ∈ proposed
∧ gdec[t] = Unknown
∧ ∃ v ∈ Decided : gdec 0 = [gdec except ![t] = v ]
∧ unchanged hproposed , learnedSeqi
∧ Consistency 0
Learn(d , seq) extends learnedSeq[d ], but only if the new value ensures consistency.
∆
Learn(d , seq) =
∧ IsPrefix (learnedSeq[d ], seq) ∧ Len(seq) > Len(learnedSeq[d ])
∧ learnedSeq 0 = [learnedSeq except ![d ] = seq]
∧ unchanged hproposed , gdeci
∧ Consistency 0
The following two definitions have to do with our weak liveness requirement for termination.
∆
LivenessDatabase(t, d ) =
gdec[t] = Aborted ⇒
3(t ∈ {learnedSeq[d ][i ] : i ∈ domain learnedSeq[d ]})
∆
Liveness =
2(∀ t ∈ Tid , d ∈ Database : LivenessDatabase(t, d ))
The following action simply helps the model checking. It changes the transactions’ history vector.
∆
ChangeHist(t) =
∧t ∈
/ proposed
∧ ∃ o ∈ Op, r ∈ Result :
thist 0 = [thist except ![t] = Append (@, [op →
7 o,
res →
7 r ])]
∧ unchanged vars
34
TNext allows any action but Propose(v ). It is used in the specification of our general deferred update protocol.
∆
TNext =
∨ ∃ t ∈ Tid : Decide(t)
∨ ∃ d ∈ Database, seq ∈ FSeq(Tid ) : Learn(d , seq)
Next allows all the actions and is used by to model check termination in an isolated way.
∆
Next =
∨ TNext ∧ unchanged thist
∨ ∃ t ∈ Tid : ∨ Propose(t) ∧ unchanged thist
∨ ChangeHist(t)
Initialization.
∆
Init =
∧ proposed
∧ learnedSeq
∧ gdec
∧ thist
= {}
= [d ∈ Database 7→ hi]
= [t ∈ Tid 7→ Unknown]
= [t ∈ Tid 7→ hi]
Final specification.
Spec = Init ∧ 2[Next]hvars, thisti ∧ Liveness
∆
35

Download Report

A Formal Analysis of the Deferred Update Technique

Paperzz.com

Your Paperzz