Consistency and Replication

Dr Markus Hagenbuchner
[email protected]
CSCI319
Distributed Systems
Chapter 7 – Consistency & Replication
CSCI319
Chapter 7
Page: 1
Consistency And Replication
Lecture notes based on the textbook by Tanenbaum
Study objectives:
1. Understand the role of replication in distributed systems.
2. Explain replication strategies.
3. Understand and explain the important consistency models,
and algorithms that realize a given consistency model.
4. Explain the difference between data centric consistency and
client centric consistency.
5. Understand the role of conits, and explain how conits are
computed.
Content
• Consistency models
1. Data centric
2. Client centric
• Consistency measure
• Replication strategies
– Protocols
– Placement
Reasons for Replication
• Replication to increase the reliability of a system.
• Replication for performance
– Scaling in numbers
– Scaling in geographical area
Contrariety:
– Consistency enforcement compromises performance
– Cost of increased bandwidth for maintaining replication
Consistency
With replication comes the question about consistency:
• How are changes on one replica “experienced” by
other replicas?
• How are changes “seen” by clients that access different
replicas?
Consistency models define the behavior of a system of
replicas.
If a system is in an inconsistent state then how can we
quantify (measure) the severity of the inconsistency?
• Consistency Units (conits)
Measuring Inconsistency (1)
A (distributed) system is consistent when it
adheres to a given consistency model.
Consistency model: Processes agree to
obey a given rule, the store promises to
work correctly under such rule.
–This effectively places restrictions on how read
or write operations can be executed.
Measuring Inconsistency (2)
With replication in a distributed system it
may not be possible to be consistent at
all times.
Measuring inconsistency in a distributed system:
– Why useful?
– How can this be done?
One solution: conits
Measuring Inconsistency (3)
An example:
Let's assume that we have two storage containers
x and y, and that these are replicated on two disks.
If one machine changes the value of x (or y) on
one of the replicas, then this may not be
immediately replicated on the other replica. Hence,
the replicas can be in an inconsistent state. A conit
can measure the degree of an inconsistency with
respect to a set of storage containers. This is
illustrated by the following example:
Measuring Inconsistency (4)
An example of keeping track of consistency deviations:
Replica A (conit: x=6; y=12)
Operation            Result
<5,B>   x:=x+2       [x=2]
<7,B>   y:=x+1       [y=3]
<8,A>   x:=x*2       [x=4]
<10,B>  y:=y*x       [y=12]
<14,A>  x:=x+2       [x=6]
Vector clock A = (15,10)
Order deviation = 3
Numerical deviation = (0,3)

Replica B (conit: x=2; y=6)
Operation            Result
<5,B>   x:=x+2       [x=2]
<7,B>   y:=x+1       [y=3]
<10,B>  y:=y*x       [y=6]
Vector clock B = (0,11)
Order deviation = 2
Numerical deviation = (2,12)
Measuring Inconsistency (5)
Shown on the previous slide is:
• Two replicas.
• A number of operations with respect to the conit are
scheduled for execution. For example, the operation
<14,A> indicates that at logical time 14 an operation
was issued at replica A, and that the associated
operation is x := x+2
• The operation in the gray box indicates a committed
operation. Committed operations have been executed
locally and cannot be reversed.
Thus, the two replicas are in an obviously inconsistent state.
The question we wish to answer is: How can we
quantify the degree of inconsistency in this example?
Interactive slide
What is a conit?
• The Consistency Unit specifies a unit over which
consistency is to be measured. A conit watches over a
set of related data items.
How is the order deviation computed?
• The number of tentative (scheduled) operations at a
given replica which have not yet been committed.
How is the numerical deviation computed?
• A vector counting the number of operations at other
replicas not seen at a given replica, and the maximum
difference (in value) between committed operations at a
given replica and the result of operations at other
replicas.
Interactive slide
How is the order deviation computed in the conit example above?
• It is the number of tentative (not yet committed) operations: 3 at
replica A and 2 at replica B.
How is the numerical deviation computed in the example?
• For Replica A: The number of operations in the system that the
replica has not yet seen is 0 (first part of the answer). The result
of the committed operations is x=2; y=3, whereas the result of the
tentative operations at Replica B is x=2; y=6. The difference in the
value of x is 0, the difference in y is 3. The maximum of the two
differences is 3. Together, the two answers give the numerical
deviation for A = (0,3).
• For Replica B: The number of operations in the system that the
replica has not yet seen is 2 (first part of the answer). The result
of the committed operations is x=0; y=0 (no committed operations at B),
whereas the result of the tentative operations at Replica A is x=6; y=12.
The difference in the value of x is 6, the difference in y is 12. The
maximum of the two differences is 12. Together, the two answers give
the numerical deviation for B = (2,12).
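The calculation above can be reproduced with a minimal Python sketch. The function and variable names are illustrative assumptions, not part of the lecture material; the committed and tentative values are taken directly from the answers above. The order deviation is only computed for replica A, since that is the replica whose committed operations are stated explicitly.

```python
# Operation identifiers <logical time, origin replica> known at each replica.
ops_at_a = {(5, "B"), (7, "B"), (8, "A"), (10, "B"), (14, "A")}
ops_at_b = {(5, "B"), (7, "B"), (10, "B")}

# Committed operations at A, as implied by the committed result x=2; y=3.
committed_at_a = {(5, "B"), (7, "B")}


def order_deviation(known_ops, committed_ops):
    """Number of tentative (known but not yet committed) operations."""
    return len(known_ops - committed_ops)


def numerical_deviation(local_ops, remote_ops, committed_values, remote_values):
    """Return (unseen remote operations, largest difference in value)."""
    unseen = len(remote_ops - local_ops)
    weight = max(abs(remote_values[k] - committed_values[k]) for k in committed_values)
    return (unseen, weight)


order_a = order_deviation(ops_at_a, committed_at_a)
dev_a = numerical_deviation(ops_at_a, ops_at_b, {"x": 2, "y": 3}, {"x": 2, "y": 6})
dev_b = numerical_deviation(ops_at_b, ops_at_a, {"x": 0, "y": 0}, {"x": 6, "y": 12})
print(order_a, dev_a, dev_b)  # 3 (0, 3) (2, 12)
```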
Consistency Models
1. Data centric consistency models
Concern reads and writes on shared data (e.g.
shared memory, shared database, distributed file
system, etc.)
2. Client centric consistency models
Concern the consistency experienced by any one
client when accessing a distributed data store.
Data-centric Consistency Models
The general organization of a logical data store, physically
distributed and/or replicated across multiple processes.
Data centric consistency models
We will address two data centric consistency models:
1. Sequential consistency
2. Causal consistency
There is a third one called “Grouping operations”. This is
a technique which results in consistency between
elements in a group. Hence, this is considered a
consistency model as well.
Data centric consistency models
Notation, explained using an example:
Behavior of two processes operating on the same data item.
The horizontal axis is time.
• Pi refers to the i-th process
• W(x)a refers to a value ‘a’ written to a data item x
• R(x)a refers to value ‘a’ read from data item x.
Note: P1 and P2 may write to a different replica (as was shown on slide 6).
Sequential Consistency (1)
Definition: A data store is sequentially consistent
when:
The result of any execution is the same as if the
(read and write) operations by all processes on
the data store …
• were executed in some sequential order and …
• the operations of each individual process appear
in this sequence in the order specified by its program.
Note that the term “appear” means how a process “sees” or
“experiences” the result of a write operation. This refers to the
read operation of a process.
Sequential Consistency (2)
Three examples: (a) and (c) show sequentially consistent data
stores; (b) shows a data store that is not sequentially consistent.
Sequential Consistency (2)
An example which may adopt the sequential consistency
model is data replication among true replicas. A true
replica may contain any information as long as all replicas
contain exactly the same information. Any event that is
encountered at any replica must be replicated in exactly
the same order to all other replicas.
Sequential Consistency (3)
A more thorough view into the effects of sequential consistency. Example:
Three concurrently-executing processes operating in a distributed memory
space. The variables involved are assumed to have been initialized with 0.
Process P1
x=1;
print(y, z);
Process P2
y = 1;
print(x, z);
Process P3
z = 1;
print(x, y);
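Note: there are 90 valid execution sequences in this example
(interleavings that respect each process's program order). The printed
output is a 6-bit string, so at most 64 different outputs are possible,
and not all of them can occur under the sequential consistency model.
Let's have a look at four of the sequences: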
Sequential Consistency (4)
Four of the possible 90 execution sequences for the
processes of the previous slide. The vertical axis is time:
(a)              (b)              (c)              (d)
x=1              x=1              y=1              y=1
print(y, z)      y=1              z=1              x=1
y=1              print(x, z)      print(x, y)      z=1
print(x, z)      print(y, z)      print(x, z)      print(x, z)
z=1              z=1              x=1              print(y, z)
print(x, y)      print(x, y)      print(y, z)      print(x, y)
Output: 001011   Output: 101011   Output: 010111   Output: 111111
Q: Which of these four execution sequences do not violate the
sequential consistency model?
Interactive slide
Which of the previous four execution sequences do not violate
the sequential consistency model?
Answer: all four of them comply with the sequential consistency
model.
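The four sequences can be checked mechanically. The following Python sketch (an illustration with assumed names, not part of the lecture material) enumerates every interleaving of the three processes that respects program order, simulates each one on a single shared memory, and collects the printed outputs in time order.

```python
from itertools import permutations

# Each process runs an assignment followed by a print of the other two variables.
PROGRAMS = {
    "P1": [("set", "x"), ("print", ("y", "z"))],
    "P2": [("set", "y"), ("print", ("x", "z"))],
    "P3": [("set", "z"), ("print", ("x", "y"))],
}


def run(order):
    """Execute one interleaving; `order` lists which process takes each step."""
    memory = {"x": 0, "y": 0, "z": 0}
    next_step = {p: 0 for p in PROGRAMS}
    output = ""
    for p in order:
        kind, arg = PROGRAMS[p][next_step[p]]
        next_step[p] += 1
        if kind == "set":
            memory[arg] = 1
        else:
            output += "".join(str(memory[v]) for v in arg)
    return output


# Distinct arrangements of P1,P1,P2,P2,P3,P3 are exactly the interleavings
# that respect every process's program order.
interleavings = set(permutations(["P1", "P1", "P2", "P2", "P3", "P3"]))
outputs = {run(order) for order in interleavings}

print(len(interleavings))  # 90
print(all(o in outputs for o in ["001011", "101011", "010111", "111111"]))  # True
```

Any output string that is missing from `outputs` cannot be produced by a sequentially consistent store for this program.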
Example of an assessment question: Consider the following
situation:
Process P1
x=1;
print(y, z);
Process P2
y = 1;
print(x, z);
Process P3
z = 1;
print(x, y);
Task: Give an execution sequence and an output that would
violate the sequential consistency model.
Causal Consistency (1)
Definition: Causal consistency is another data centric
consistency model:
For a data store to be considered causally
consistent, it is necessary that the store obeys the
following condition:
Writes that are potentially causally related …
– must be seen by all processes in the same order.
Concurrent writes …
– may be seen in a different order on different machines.
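Whether two writes are potentially causally related or concurrent can be decided by comparing vector timestamps. The sketch below is one common way of doing this and is an assumption of this example, not a method prescribed by the definition; the timestamps `w1`, `w2` and `w3` are hypothetical.

```python
def happened_before(a, b):
    """True if vector timestamp a causally precedes vector timestamp b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if neither timestamp causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Hypothetical timestamps (P1 component, P2 component) for three writes.
w1 = (1, 0)  # P1 writes x
w2 = (1, 1)  # P2 writes x after having read P1's write
w3 = (2, 0)  # P1 writes again without having seen w2

print(happened_before(w1, w2))  # True: every process must see w1 before w2
print(concurrent(w2, w3))       # True: replicas may order w2 and w3 differently
```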
Causal Consistency (2)
Example: This sequence is allowed with a causally-consistent
store (but would violate the sequential consistency model).
Causal Consistency (3)
Example (a) A violation of a causally-consistent store.
Causal Consistency (4)
Example (b) A correct sequence of events in a causally-consistent store.
The causal consistency model is particularly useful for shared
distributed databases.
Grouping Operations (1)
Grouping operations:
• Are a more commonly applied synchronization
technique where the aim is to keep operations
between processes in a group synchronized.
• Support synchronization variables
– Which define a synchronization point
• Allow non-exclusive access to a resource
– But do not guarantee that the resource has been synchronized
• Allow implementation of an entry consistency
model:
Grouping Operations (2)
Necessary criteria for correct entry consistency synchronization
• An “acquire access” of a synchronization variable is not
allowed to be performed until all updates to the guarded
shared data have been performed with respect to the
process which acquired the access.
• Before an exclusive access to a synchronization variable is
allowed to be performed by a process, no other process
may hold the synchronization variable.
• After the exclusive mode access to a synchronization
variable has been performed, any other process’ next
nonexclusive mode access to that synchronization variable
may not be performed until it has performed
synchronization with respect to that variable’s owner.
Grouping Operations (3)
Example: A valid event sequence for entry consistency.
Acq(Lx) refers to the “acquire access” synchronization
operation on variable x.
2. Client-Centric Consistency Models
• Data centric consistency models provide a
system wide consistency model on a shared
data structure.
• In contrast, client centric consistency is
consistency from a single client's point of
view.
• Realizes eventual consistency.
• Common client centric models:
1. Monotonic reads
2. Monotonic writes
3. Read your writes
4. Writes follow reads
Eventual Consistency
Client centric consistency: An illustration of the principle of a
mobile user accessing different replicas of a distributed database.
Monotonic Reads (1)
Definition: A data store is said to provide
monotonic-read consistency if the following
condition holds:
If a process reads the value of a data item x then
any successive read operation on x by that process:
– will always return that same value
– or a more recent value.
Monotonic Reads (2)
The read operations performed by a single process P at two
different local copies of the same data store. Example: (a) A
monotonic-read consistent data store.
Here, the notation is: xi is the version of item x at location i,
WS(xi) is the result of a write to xi at a local replica, and
WS(xi,xj) is a subsequent write to x at location j based on the
result of xi (in this sequence).
WS can be interpreted as a “Write executed by the System”
Monotonic Reads (3)
The read operations performed by a single process P at two
different local copies of the same data store. Example (b): A
data store that does not provide monotonic reads.
This violates the monotonic-read consistency model since
WS(x2) does not guarantee that all changes due to WS(x1)
have been performed at L2.
Monotonic Reads (3)
Example: Email system
A client reading Emails by accessing a locally available
replica can expect to see the same Emails when accessing
another replica at a later time. New Email may
arrive (and hence be added to the user's Email
database) in between two reads. In this case, the client can
expect to see all the old Emails as well as the new Emails.
In other words, the Email client will never get to see an older
version of the Email database when accessing Email at
different replicas in the system. Such behavior is guaranteed
by the monotonic read consistency model.
Monotonic Writes (1)
Definition: In a monotonic-write consistent store, the
following condition holds:
A write operation by a process on a data item x …
– is completed before any successive write operation on x
by the same process.
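One way a replica could enforce this condition is to number each client's writes and hold back any write that arrives before its predecessors. The sketch below assumes such per-client sequence numbers; the class `Replica` and its fields are hypothetical names for illustration only.

```python
class Replica:
    """Applies each client's writes strictly in the order they were issued."""

    def __init__(self):
        self.value = None
        self.next_seq = {}  # client id -> sequence number expected next
        self.pending = {}   # (client id, seq) -> value that arrived too early

    def receive(self, client, seq, value):
        expected = self.next_seq.get(client, 0)
        if seq >= expected:  # ignore writes that were already applied
            self.pending[(client, seq)] = value
        # Apply every buffered write that is now in order.
        while (client, expected) in self.pending:
            self.value = self.pending.pop((client, expected))
            expected += 1
        self.next_seq[client] = expected


replica = Replica()
replica.receive("P", 1, "second")  # arrives first, so it is held back
held_back = replica.value
replica.receive("P", 0, "first")   # unblocks both writes, applied in order
print(held_back, replica.value)    # None second
```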
Monotonic Writes (2)
The write operations performed by a single process P at two
different local copies of the same data store. Example (a): A
monotonic-write consistent data store.
Monotonic Writes (3)
The write operations performed by a single process P at two
different local copies of the same data store. Example (b): A
data store that does not provide monotonic-write
consistency.
The monotonic writes consistency model is particularly
useful for distributed database systems.
Read Your Writes (1)
Definition: A data store is said to provide read-your-writes
consistency if the following condition holds:
The effect of a write operation by a process on data
item x …
– will always be seen by a successive read operation on x
by the same process.
This is also known as the UNIX semantics.
Read Your Writes (2)
Example (a): A data store that provides read-your-writes
consistency.
Read Your Writes (3)
Example (b): A data store that does not provide read-your-writes consistency.
Writes Follow Reads (1)
Definition: A data store is said to provide
writes-follow-reads consistency, if the
following holds:
A write operation by a process …
– on a data item x following a previous read
operation on x by the same process is guaranteed to
take place on the same or a more recent value of x
that was read.
Writes Follow Reads (2)
Example (a): A writes-follow-reads consistent data store.
Writes Follow Reads (3)
Example (b): A data store that does not provide writes-follow-reads consistency.
Writes-follow-reads consistency is useful, for example, in
distributed user forums or newsgroups.
Realizing Data-centric Consistency
Models
There are many ways by which each of the
consistency models can be realized.
One example: Most strategies for realizing
sequential consistency make use of
primary based protocols such as
– Remote-write protocols
– Local write protocols
These are also called primary backup protocols. The reason
for this nomenclature will become clear in the following:
Remote-Write Protocols
The principle of a primary-backup protocol.
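A minimal Python sketch of the remote-write idea follows. It assumes a blocking variant in which the primary acknowledges a write only after every backup has applied it; the class names `Backup` and `Primary` are hypothetical, and network communication is replaced by direct method calls.

```python
class Backup:
    """A replica that serves reads locally and accepts updates from the primary."""

    def __init__(self):
        self.store = {}

    def update(self, key, value):
        self.store[key] = value
        return True  # acknowledgement sent back to the primary

    def read(self, key):
        return self.store.get(key)


class Primary(Backup):
    """The fixed server to which all writes on an item are forwarded."""

    def __init__(self, backups):
        super().__init__()
        self.backups = backups

    def write(self, key, value):
        self.store[key] = value
        # Blocking variant: acknowledge only once every backup has the update.
        return all(backup.update(key, value) for backup in self.backups)


backups = [Backup(), Backup()]
primary = Primary(backups)
acknowledged = primary.write("x", 1)
print(acknowledged, [b.read("x") for b in backups])  # True [1, 1]
```

Because every write passes through one primary, all replicas apply writes in the same order, which is what makes this family of protocols a natural fit for sequential consistency.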
Local-Write Protocols
Primary-backup protocol in which the primary migrates to the
process wanting to perform an update.
Realizing Client-centric Consistency
Models
Again, there are many ways by which each of the
consistency models can be realized.
An example: Monotonic-read consistency
– Each write operation is assigned a unique identifier
wid.
– Each server has a globally unique identifier sid.
– Propagation includes the passing of sid and wid
– We can now determine whether a write has taken
place at a local copy before subsequent reads from
the data item by the same client are performed.
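The bookkeeping described above can be sketched in Python as follows. The client keeps a read set of the write identifiers (wid) reflected in its earlier reads, and a server may only answer if it has applied all of them. The classes `Server` and `Client` are hypothetical; a real system would fetch the missing writes instead of raising an error.

```python
class Server:
    def __init__(self, sid):
        self.sid = sid
        self.applied = set()  # wids of the writes applied at this copy
        self.value = None

    def apply(self, wid, value):
        self.applied.add(wid)
        self.value = value


class Client:
    def __init__(self):
        self.read_set = set()  # wids of all writes reflected in earlier reads

    def read(self, server):
        missing = self.read_set - server.applied
        if missing:
            # The server would first have to obtain these writes.
            raise RuntimeError(f"server {server.sid} is missing {sorted(missing)}")
        self.read_set |= server.applied
        return server.value


l1, l2 = Server("L1"), Server("L2")
l1.apply("w1", 10)
client = Client()
first = client.read(l1)  # the client has now seen w1
l2.apply("w1", 10)       # w1 is propagated to L2 ...
l2.apply("w2", 20)       # ... before a newer write is applied there
second = client.read(l2)
print(first, second)     # 10 20
```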
Design issues
• Where to place replicas (server, data)?
• What to propagate?
• Replication strategies
– Server or client initiated?
– Pull or Push protocol?
– Remote or local write protocol?
– How to realize a consistency model?
Replica-Server Placement (1)
Strategies of replica-server placement can be non-trivial if a large
number of replicas are to be managed. A solution would be to segment
replicas by using a regular grid as in the following example. But then
choosing a proper cell size for server placement can be an issue:
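The grid segmentation can be sketched in a few lines of Python. The example assumes that client positions are given as (x, y) coordinates and that servers are placed in the most densely populated cells; this selection rule and the function name `densest_cells` are assumptions made for illustration.

```python
from collections import Counter


def densest_cells(points, cell_size, k):
    """Bucket points into square grid cells and return the k fullest cells."""
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    return [cell for cell, _ in counts.most_common(k)]


clients = [(1, 1), (2, 3), (4, 4), (11, 12), (13, 14), (25, 3)]
fine = densest_cells(clients, cell_size=10, k=2)
coarse = densest_cells(clients, cell_size=30, k=2)
print(fine)    # [(0, 0), (1, 1)]
print(coarse)  # [(0, 0)]
```

With the coarse cell size all clients fall into a single cell, so the grid no longer distinguishes the two clusters. This is the cell-size issue mentioned above.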
Replica-Server Placement (2)
More advanced strategies of replica-server placement
include clustering strategies such as LVQ, k-means,
average distance analysis, etc.
These are famous and scalable machine learning methods.
However, we will not go into detail of the underlying
algorithms here. Data Mining and Machine Learning
Subjects (postgraduate subjects) cover these.
State versus Operations
Possibilities for what is to be propagated:
1. Propagate only a notification of an update.
2. Transfer data from one copy to another.
3. Propagate the update operation to other
copies.
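The three possibilities differ in what a copy receives and what it must do on receipt. The following Python sketch models them as three message kinds; the class `Copy` and the message names are hypothetical.

```python
class Copy:
    def __init__(self, value=0):
        self.value = value
        self.valid = True

    def receive(self, kind, payload=None):
        if kind == "invalidate":   # 1. notification only: mark the copy stale
            self.valid = False
        elif kind == "state":      # 2. data transfer: overwrite with new data
            self.value, self.valid = payload, True
        elif kind == "operation":  # 3. operation: re-execute the update locally
            self.value, self.valid = payload(self.value), True
        else:
            raise ValueError(kind)


a, b, c = Copy(5), Copy(5), Copy(5)
a.receive("invalidate")
b.receive("state", 7)
c.receive("operation", lambda v: v + 2)
print(a.valid, b.value, c.value)  # False 7 7
```

A notification is the smallest message but forces the copy to fetch the data later; shipping the operation also keeps messages small, at the cost of executing the update at every copy.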
Pull versus Push Protocols
A replica can be maintained either by the server or by the client.
Depending on this, we distinguish between push-based (if the replica is
maintained by a server) and pull-based (if maintained by the client) protocols
in the case of multiple-client, single-server systems.
Push and pull protocols can be combined to gain certain
advantages (e.g. clients acquire a lease. The server pushes
updates until lease expires).
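The lease idea can be sketched in Python as follows. Time is passed in explicitly as a number so that the example is deterministic; the classes `LeaseServer` and `CachingClient` and the fixed lease length are assumptions made for illustration.

```python
class CachingClient:
    def __init__(self):
        self.cache = None

    def read(self, server, now):
        # Pull (and renew the lease) only when no valid lease is held.
        if server.leases.get(self, 0) <= now:
            self.cache = server.acquire_lease(self, now)
        return self.cache


class LeaseServer:
    def __init__(self, lease_length):
        self.lease_length = lease_length
        self.value = None
        self.leases = {}  # client -> time at which its lease expires

    def acquire_lease(self, client, now):
        self.leases[client] = now + self.lease_length
        return self.value

    def write(self, value, now):
        self.value = value
        # Push the update to every client whose lease is still valid.
        for client, expiry in self.leases.items():
            if now < expiry:
                client.cache = value


server = LeaseServer(lease_length=10)
client = CachingClient()
client.read(server, now=0)          # pull; lease is valid until time 10
server.write("v1", now=5)           # pushed, lease still valid
pushed = client.cache
server.write("v2", now=12)          # not pushed, lease has expired
stale = client.cache
fresh = client.read(server, now=13) # client pulls and renews the lease
print(pushed, stale, fresh)         # v1 v1 v2
```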
Consistency Protocols
A consistency protocol:
• Describes an implementation of a specific consistency
model, i.e., which level of out-of-datedness is
acceptable in a given situation.
• The protocol creates a bound on
– Numerical deviation
– Staleness deviation
– Order deviation
Conits are a way to compute these deviations.
Continuous Consistency
Bounding numerical deviation:
– Allow a numerical upper bound for the deviation in the
number of transactions.
Bounding staleness deviation:
– Achieved through vector clocks and clock
synchronization.
Bounding order deviation:
– E.g., refuse writes until a sufficient number of
tentative writes are committed.
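Bounding the order deviation can be illustrated with a short Python sketch. The replica below refuses new writes once its queue of tentative writes reaches a configured bound; the class `BoundedReplica` and the parameter `max_tentative` are hypothetical names.

```python
class BoundedReplica:
    def __init__(self, max_tentative):
        self.max_tentative = max_tentative
        self.tentative = []  # accepted but not yet committed writes
        self.committed = []

    def write(self, op):
        if len(self.tentative) >= self.max_tentative:
            return False  # refused until some tentative writes are committed
        self.tentative.append(op)
        return True

    def commit(self, count):
        """Commit the `count` oldest tentative writes."""
        self.committed += self.tentative[:count]
        del self.tentative[:count]


replica = BoundedReplica(max_tentative=2)
results = [replica.write(op) for op in ("a", "b", "c")]
replica.commit(1)
retry = replica.write("c")
print(results, retry, len(replica.tentative))  # [True, True, False] True 2
```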
Summary
• Consistency models
– Client centric
– Data centric
• (In-)consistency measures
• Design issues
– Replication strategies
– Placement
– Protocols