CPSC 668: Distributed Algorithms and Systems
Fall 2009
Prof. Jennifer Welch

Set 16: Distributed Shared Memory
Distributed Shared Memory
• A model for inter-process communication
• Provides illusion of shared variables on top of
message passing
• Shared memory is often considered a more
convenient programming platform than
message passing
• Formally, give a simulation of the shared
memory model on top of the message
passing model
• We'll consider the special case of
– no failures
– only read/write variables to be simulated
The Simulation
[Figure: layered architecture. Users of read/write shared memory issue read/write invocations to, and receive return/ack responses from, the local algorithms alg0, …, algn-1; together these algorithms simulate Shared Memory by exchanging send/recv events over the underlying Message Passing System.]
Shared Memory Issues
• A process invokes a shared memory operation (read
or write) at some time
• The simulation algorithm running on the same node
executes some code, possibly involving exchanges of
messages
• Eventually the simulation algorithm informs the
process of the result of the shared memory operation.
• So shared memory operations are not instantaneous!
– Operations (invoked by different processes) can overlap
• What values should be returned by operations that
overlap other operations?
– defined by a memory consistency condition
Sequential Specifications
• Each shared object has a sequential
specification: specifies behavior of
object in the absence of concurrency.
• Object supports operations
– invocations
– matching responses
• The specification is the set of sequences of operations that are legal
Sequential Spec for R/W Registers
• Each operation has two parts, invocation and
response
• Read operation has invocation read_i(X) and response return_i(X,v)
• Write operation has invocation write_i(X,v) and response ack_i(X)
• A sequence of operations is legal iff each read returns the value of the latest preceding write.
• Ex: write_0(X,3) ack_0(X) read_1(X) return_1(X,3)
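To make the legality condition concrete, here is a minimal Python sketch (an illustration, not part of the slides): it checks a completed sequence of operations on a single register against the sequential spec above. The tuple encoding of operations and the initial value 0 are assumptions made for this example.

# Legality check for a completed sequence of operations on one read/write
# register.  Each operation is a tuple ("write", value) or ("read", value),
# listed in the order the operations occur; the register starts at 0.
def is_legal(ops, initial=0):
    latest = initial                       # value of the latest preceding write
    for kind, value in ops:
        if kind == "write":
            latest = value
        elif kind == "read":
            if value != latest:            # a read must return that value
                return False
        else:
            raise ValueError("unknown operation kind: %r" % kind)
    return True

# The slide's example: write_0(X,3) ack_0(X) read_1(X) return_1(X,3)
print(is_legal([("write", 3), ("read", 3)]))   # True
print(is_legal([("write", 3), ("read", 0)]))   # False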
Memory Consistency Conditions
• Consistency conditions tie together the
sequential specification with what
happens in the presence of concurrency.
• We will study two well-known conditions:
– linearizability
– sequential consistency
• We will only consider read/write
registers, in the absence of failures.
Definition of Linearizability
• Suppose σ is a sequence of invocations and responses.
– an invocation is not necessarily immediately followed by its matching response
• σ is linearizable if there exists a permutation π of all the operations in σ (now each invocation is immediately followed by its matching response) s.t.
– π|X is legal (satisfies the sequential spec) for all X, and
– if the response of operation O1 occurs in σ before the invocation of operation O2, then O1 occurs in π before O2 (π respects the real-time order of non-overlapping operations in σ).
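The definition can be checked mechanically on small histories. The Python sketch below (illustration only, with assumed field names) represents each completed operation by its process, variable, kind, value, invocation time, and response time, and searches for a permutation that is legal for every variable and respects the real-time order of non-overlapping operations.

from itertools import permutations

# Brute-force linearizability check for a set of completed read/write operations.
def legal(seq):
    # Sequential spec per variable: each read returns the latest preceding write
    # (initial values are 0).
    latest = {}
    for op in seq:
        if op["kind"] == "write":
            latest[op["var"]] = op["value"]
        else:  # read
            if op["value"] != latest.get(op["var"], 0):
                return False
    return True

def respects_real_time(seq, ops):
    pos = {id(op): i for i, op in enumerate(seq)}
    # If o1's response precedes o2's invocation, o1 must come first in seq.
    return all(pos[id(o1)] < pos[id(o2)]
               for o1 in ops for o2 in ops if o1["resp"] < o2["inv"])

def linearizable(ops):
    return any(legal(seq) and respects_real_time(seq, ops)
               for seq in permutations(ops))

# Overlapping operations in the spirit of the example that follows:
# p0 writes X=1 then reads Y=1, p1 writes Y=1 then reads X=1.
ops = [
    {"proc": 0, "var": "X", "kind": "write", "value": 1, "inv": 0, "resp": 2},
    {"proc": 0, "var": "Y", "kind": "read",  "value": 1, "inv": 3, "resp": 5},
    {"proc": 1, "var": "Y", "kind": "write", "value": 1, "inv": 1, "resp": 4},
    {"proc": 1, "var": "X", "kind": "read",  "value": 1, "inv": 4, "resp": 6},
]
print(linearizable(ops))   # True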
Linearizability Examples
Suppose there are two shared variables, X and Y, both initially 0.
[Figure: p0 performs write(X,1) … ack(X), then read(Y) … return(Y,1); p1 performs write(Y,1) … ack(Y), then read(X) … return(X,1); the operations overlap, and the numbers 1-4 mark a linearization order.]
Is this sequence linearizable? Yes: order the operations as numbered in the figure.
What if p1's read returns 0? Then the sequence is not linearizable (the arrow in the figure shows the real-time ordering that would be violated).
Definition of Sequential Consistency
• Suppose σ is a sequence of invocations and responses.
• σ is sequentially consistent if there exists a permutation π of all the operations in σ s.t.
– π|X is legal (satisfies the sequential spec) for all X, and
– if the response of operation O1 occurs in σ before the invocation of operation O2 at the same process, then O1 occurs in π before O2 (π respects the real-time order of operations by the same process in σ).
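For contrast, here is a similarly hedged Python sketch of a sequential consistency check: real time plays no role, so it searches the interleavings of the per-process operation sequences for one that is legal. The tuple encoding is again an assumption made for illustration.

# Sequential consistency check: find an interleaving of the per-process
# operation orders in which every read returns the latest preceding write
# to its variable (initial values are 0).  Real time is ignored entirely.
def legal(seq):
    latest = {}
    for proc, kind, var, value in seq:
        if kind == "write":
            latest[var] = value
        elif value != latest.get(var, 0):
            return False
    return True

def interleavings(seqs):
    # Generate every merge of the given sequences that preserves each one's order.
    if all(not s for s in seqs):
        yield []
        return
    for i, s in enumerate(seqs):
        if s:
            rest = seqs[:i] + [s[1:]] + seqs[i + 1:]
            for tail in interleavings(rest):
                yield [s[0]] + tail

def sequentially_consistent(per_process_ops):
    return any(legal(seq) for seq in interleavings(list(per_process_ops)))

# In the spirit of the example on the "Sequential Consistency Examples" slide:
# p0 writes X=1 then reads Y=1, p1 writes Y=1 then reads X=0.
p0_ops = [(0, "write", "X", 1), (0, "read", "Y", 1)]
p1_ops = [(1, "write", "Y", 1), (1, "read", "X", 0)]
print(sequentially_consistent([p0_ops, p1_ops]))   # True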
Sequential Consistency Examples
Suppose there are two shared variables, X and Y, both initially 0.
[Figure: p0 performs write(X,1) … ack(X), then read(Y) … return(Y,1); p1 performs write(Y,1) … ack(Y), then read(X) … return(X,0); the numbers 1-4 mark an ordering of the operations.]
Is this sequence sequentially consistent? Yes: order the operations as numbered (p1's write and read, then p0's write and read).
What if p0's read returns 0? Then the sequence is not sequentially consistent (see the arrows in the figure).
Specification of Linearizable
Shared Memory Comm. System
• Inputs are invocations on the shared objects
• Outputs are responses from the shared
objects
• A sequence σ is in the allowable set iff
– Correct Interaction: each proc. alternates invocations and matching responses
– Liveness: each invocation has a matching response
– Linearizability: σ is linearizable
Specification of Sequentially
Consistent Shared Memory
• Inputs are invocations on the shared objects
• Outputs are responses from the shared
objects
• A sequence σ is in the allowable set iff
– Correct Interaction: each proc. alternates invocations and matching responses
– Liveness: each invocation has a matching response
– Sequential Consistency: σ is sequentially consistent
Algorithm to Implement
Linearizable Shared Memory
• Uses totally ordered broadcast as the underlying
communication system.
• Each proc keeps a replica for each shared variable
• When read request arrives:
– send bcast msg containing request
– when own bcast msg arrives, return value in local replica
• When write request arrives:
– send bcast msg containing request
– upon receipt, each proc updates its replica's value
– when own bcast msg arrives, respond with ack
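A minimal Python sketch of this algorithm follows (an illustration, not the course's code). The TotalOrderBroadcast class here is a stand-in that delivers every message to all processes immediately, in one global order; with a real asynchronous broadcast service the same handlers would run on delivery.

# Sketch of the linearizability algorithm on top of totally ordered broadcast.
class TotalOrderBroadcast:
    def __init__(self):
        self.processes = []

    def register(self, proc):
        self.processes.append(proc)

    def to_bc_send(self, msg):
        for p in self.processes:          # delivered in the same order everywhere
            p.to_bc_recv(msg)

class Process:
    def __init__(self, pid, bcast, variables):
        self.pid = pid
        self.bcast = bcast
        self.replica = {x: 0 for x in variables}   # local replica of each variable
        self.result = None
        bcast.register(self)

    # --- shared memory operations invoked by the user ---
    def read(self, x):
        self.bcast.to_bc_send(("read", self.pid, x, None))
        return self.result                # set when own bcast msg is delivered

    def write(self, x, v):
        self.bcast.to_bc_send(("write", self.pid, x, v))
        return "ack"                      # own bcast msg has been delivered

    # --- handler for delivered broadcast messages ---
    def to_bc_recv(self, msg):
        kind, sender, x, v = msg
        if kind == "write":
            self.replica[x] = v           # every process updates its replica
        elif kind == "read" and sender == self.pid:
            self.result = self.replica[x] # respond with own replica's value

bc = TotalOrderBroadcast()
procs = [Process(i, bc, ["X", "Y"]) for i in range(3)]
procs[0].write("X", 7)
print(procs[1].read("X"))   # 7: every read reflects all earlier writes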
The Simulation
[Figure: the same layered picture, now with Totally Ordered Broadcast as the underlying communication system: users issue read/write invocations to, and receive return/ack responses from, alg0, …, algn-1, which simulate Shared Memory by exchanging to-bc-send/to-bc-recv events with the Totally Ordered Broadcast service.]
Correctness of Linearizability Algorithm
• Consider any admissible execution α of the algorithm
– the underlying totally ordered broadcast behaves properly
– users interact properly
• Show that σ, the restriction of α to the events of the top interface, satisfies Liveness and Linearizability.
Correctness of Linearizability Algorithm
• Liveness (every invocation has a response): by the Liveness property of the underlying totally ordered broadcast.
• Linearizability: define the permutation π of the operations to be the order in which the corresponding broadcasts are received.
– π is legal: replicas are updated in to-bcast order at every process, so each read returns the value of the latest write that precedes it in π.
– π respects real-time order of operations: if O1 finishes before O2 begins, O1's bcast is ordered before O2's bcast.
Why is Read Bcast Needed?
• The bcast done for a read causes no
changes to any replicas, just delays the
response to the read.
• Why is it needed?
• Let's see what happens if we remove it.
Why Read Bcast is Needed
[Figure: p1 performs write(1), whose to-bc-send has been delivered at p0 but not yet at p2. p0's read returns 1 (the new value); a read by p2 that starts after p0's read has finished returns 0 (the old value), so the execution is not linearizable.]
Algorithm for Sequential Consistency
• The linearizability algorithm, without doing a bcast for
reads:
• Uses totally ordered broadcast as the underlying
communication system.
• Each proc keeps a replica for each shared variable
• When read request arrives:
– immediately return the value stored in the local replica
• When write request arrives:
– send bcast msg containing request
– upon receipt, each proc updates its replica's value
– when own bcast msg arrives, respond with ack
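Below is a similarly hedged Python sketch of this variant. Broadcast delivery is driven by hand (the ManualTOBroadcast class is a stand-in for the real, asynchronous service) so that the scenario from the "Why Read Bcast is Needed" slide can be replayed: the resulting behavior is sequentially consistent but not linearizable.

# Sketch of the sequential consistency algorithm: writes go through totally
# ordered broadcast, reads are answered immediately from the local replica.
class ManualTOBroadcast:
    # All messages get one global order; each process pulls them at its own pace.
    def __init__(self, nprocs):
        self.log = []
        self.delivered = [0] * nprocs     # how far each process has read the log

    def to_bc_send(self, msg):
        self.log.append(msg)

    def deliver_pending(self, proc):
        while self.delivered[proc.pid] < len(self.log):
            proc.to_bc_recv(self.log[self.delivered[proc.pid]])
            self.delivered[proc.pid] += 1

class Process:
    def __init__(self, pid, bcast, variables):
        self.pid = pid
        self.bcast = bcast
        self.replica = {x: 0 for x in variables}

    def read(self, x):
        return self.replica[x]            # immediate, purely local

    def write(self, x, v):
        self.bcast.to_bc_send(("write", x, v))
        self.bcast.deliver_pending(self)  # wait for own bcast msg, then ack
        return "ack"

    def to_bc_recv(self, msg):
        _, x, v = msg
        self.replica[x] = v               # replicas updated in broadcast order

bc = ManualTOBroadcast(3)
p0, p1, p2 = (Process(i, bc, ["X"]) for i in range(3))
p1.write("X", 1)                          # p1's own replica now holds 1
bc.deliver_pending(p0)                    # the broadcast reaches p0 ...
print(p0.read("X"))                       # 1  (new value)
print(p2.read("X"))                       # 0  (... but not p2 yet): not linearizable
bc.deliver_pending(p2)
print(p2.read("X"))                       # 1  eventually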
Correctness of SC Algorithm
Lemma (9.3): The local copies at each proc. take on all the values appearing in write operations, in the same order; this order preserves the order of non-overlapping writes (and hence the per-process order of writes).
Lemma (9.4): If p_i writes Y and later reads X, then p_i's update of its local copy of Y (on behalf of that write) precedes its read of its local copy of X (on behalf of that read).
Correctness of the SC Algorithm
(Theorem 9.5) Why does SC hold?
• Given any admissible execution α, must come up with a permutation π of the shared memory operations that is
– legal, and
– respects the per-proc. ordering of operations
The Permutation π
• Insert all writes into π in their to-bcast order.
• Consider each read R in α in the order of invocation:
– suppose R is a read by p_i of X
– place R in π immediately after the later of
1. the operation by p_i that immediately precedes R in α, and
2. the write that R "read from" (the write that caused the latest update of p_i's local copy of X preceding the response for R)
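The construction can be phrased as a small program. In the hedged Python sketch below (illustration only), writes are supplied in to-bcast order and each read carries two assumed fields: the write it read from and the operation by the same process that immediately precedes it in α.

# Sketch of building the permutation pi used in the proof.
# writes: write operations in to-bcast order.
# reads:  read operations in order of invocation; each read knows
#         - reads_from: the write it read from (None = initial value), and
#         - prev_op:    the operation by the same process immediately
#                       preceding it in the execution (None if there is none).
def build_permutation(writes, reads):
    pi = list(writes)                          # all writes, in to-bcast order
    for r in reads:                            # reads in invocation order
        anchors = [op for op in (r["prev_op"], r["reads_from"]) if op is not None]
        # position just after the later of prev_op and reads_from (or at front);
        # prev_op is already in pi because reads are processed in invocation order
        pos = max((pi.index(op) + 1 for op in anchors), default=0)
        pi.insert(pos, r)
    return pi

# In the spirit of the "Permutation Example" slide:
w1 = {"op": "write", "proc": 2, "var": "X", "value": 1}
w2 = {"op": "write", "proc": 1, "var": "X", "value": 2}
r_p2 = {"op": "read", "proc": 2, "var": "X", "value": 1,
        "reads_from": w1, "prev_op": w1}
r_p0 = {"op": "read", "proc": 0, "var": "X", "value": 2,
        "reads_from": w2, "prev_op": None}
pi = build_permutation([w1, w2], [r_p2, r_p0])
print([(o["op"], o["proc"], o["value"]) for o in pi])
# [('write', 2, 1), ('read', 2, 1), ('write', 1, 2), ('read', 0, 2)]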
Permutation Example
[Figure: p2 performs write(1) (to-bc-send) … ack, then read … return(1); p1 performs write(2) (to-bc-send) … ack; p0 performs read … return(2). The permutation is given by the numbers in the figure: 1) p2's write(1), 2) p2's read returning 1, 3) p1's write(2), 4) p0's read returning 2.]
Permutation π Respects Per-Proc. Ordering
For a specific proc:
• Relative ordering of two writes is preserved by Lemma 9.3
• Relative ordering of two reads is preserved by the construction of π
• If write W precedes read R in exec. α, then W precedes R in π by construction
• Suppose read R precedes write W in α. Show the same is true in π.
Permutation π Respects Ordering
• Suppose in contradiction R and W are swapped in π:
– there is a read R' by p_i that equals or precedes R in α,
– there is a write W' that equals W or follows W in the to-bcast order, and
– R' "reads from" W'.

α|p_i: … R' … R … W …
π: … W … W' … R' … R …

• But:
– R' finishes before W starts in α, and
– updates are done to local replicas in to-bcast order (Lemma 9.3), so the update for W' does not precede the update for W,
– so R' cannot read from W'.
Permutation π is Legal
• Consider some read R of X by p_i and some write W s.t. R reads from W in α.
• Suppose in contradiction some other write W' to X falls between W and R in π:

π: … W … W' … R …

• Why does R follow W' in π?
Permutation π is Legal
Case 1: W' is also by p_i. Then R follows W' in π because R follows W' in α.
• The update for W at p_i precedes the update for W' at p_i in α (Lemma 9.3).
• The update for W' at p_i precedes p_i's local read for R in α (Lemma 9.4, since W' is by p_i and precedes R).
• Thus R does not read from W, contradiction.
Permutation π is Legal
Case 2: W' is not by p_i. Then R follows W' in π due to some operation O, also by p_i, s.t.
– O precedes R in α, and
– O is placed between W' and R in π

π: … W … W' … O … R …

Consider the earliest such O.
Case 2.1: O is a write (not necessarily to X).
• The update for W' at p_i precedes the update for O at p_i in α (Lemma 9.3).
• The update for O at p_i precedes p_i's local read for R in α (Lemma 9.4).
• So R does not read from W, contradiction.
Permutation π is Legal

π: … W … W' … O … R …

Case 2.2: O is a read.
• By construction of π, O must read X and in fact read from W' (otherwise O would not be placed after W').
• The update for W at p_i precedes the update for W' at p_i in α (Lemma 9.3).
• The update for W' at p_i precedes the local read for O at p_i in α (otherwise O would not read from W').
• Thus R cannot read from W, contradiction.
Performance of SC Algorithm
• Read operations are implemented "locally",
without requiring any inter-process
communication.
• Thus reads can be viewed as "fast": time
between invocation and response is only that
needed for some local computation.
• Time for writes is time for delivery of one
totally ordered broadcast (depends on how
to-bcast is implemented).
Alternative SC Algorithm
• It is possible to have an algorithm that
implements sequentially consistent shared
memory on top of totally ordered broadcast
that has reverse performance:
– writes are local/fast (even though bcasts are sent,
don't wait for them to be received)
– reads can require waiting for some bcasts to be
received
• Like the previous SC algorithm, this one does
not implement linearizable shared memory.
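The slides do not spell this algorithm out; the Python sketch below shows one standard way to obtain local writes (the class names and the pending-write counter are assumptions of this sketch, not the course's code): a write is acknowledged as soon as its broadcast is sent, and a read first waits until all of the reader's own outstanding write broadcasts have been delivered back to it.

# Hedged sketch of an SC algorithm with local (fast) writes.
class ManualTOBroadcast:
    def __init__(self, nprocs):
        self.log = []                      # one global total order of messages
        self.delivered = [0] * nprocs

    def to_bc_send(self, msg):
        self.log.append(msg)

    def deliver_pending(self, proc):
        while self.delivered[proc.pid] < len(self.log):
            proc.to_bc_recv(self.log[self.delivered[proc.pid]])
            self.delivered[proc.pid] += 1

class Process:
    def __init__(self, pid, bcast, variables):
        self.pid = pid
        self.bcast = bcast
        self.replica = {x: 0 for x in variables}
        self.pending = 0                   # own writes not yet delivered back

    def write(self, x, v):
        self.pending += 1
        self.bcast.to_bc_send(("write", self.pid, x, v))
        return "ack"                       # local/fast: no waiting

    def read(self, x):
        if self.pending > 0:               # wait for own writes to come back
            self.bcast.deliver_pending(self)
        return self.replica[x]

    def to_bc_recv(self, msg):
        _, sender, x, v = msg
        self.replica[x] = v
        if sender == self.pid:
            self.pending -= 1

bc = ManualTOBroadcast(2)
p0, p1 = Process(0, bc, ["X"]), Process(1, bc, ["X"])
p0.write("X", 9)        # returns immediately
print(p0.read("X"))     # 9: the read waited for p0's own broadcast
print(p1.read("X"))     # 0 here: p1's deliveries have not caught up yet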
Time Complexity for DSM
Algorithms
• One complexity measure of interest for DSM
algorithms is how long it takes for operations to
complete.
• The linearizability algorithm required D time for both
reads and writes, where D is the maximum time for a
totally-ordered broadcast message to be received.
• The sequential consistency algorithm required D time
for writes and C time for reads, where C is the time for
doing some local computation.
• Can we do better? To answer this question, we need
some kind of timing model.
Timing Model
• Assume the underlying communication
system is the point-to-point message
passing system (not totally ordered
broadcast).
• Assume that every message has delay
in the range [d-u,d].
• Claim: Totally ordered broadcast can
be implemented in this model so that D,
the maximum time for delivery, is O(d).
Time and Clocks in Layered Model
• Timed execution: associate an occurrence
time with each node input event.
• Times of other events are "inherited" from
time of triggering node input
– recall assumption that local processing time is
negligible.
• Model hardware clocks as before: run at
same rate as real time, but not synchronized
• Notions of view, timed view, and shifting are the same:
– Shifting Lemma still holds (relates h/w clocks and
msg delays between original and shifted execs)
Lower Bound for SC
Let T_read = worst-case time for a read to complete
Let T_write = worst-case time for a write to complete
Theorem (9.7): In any simulation of sequentially consistent shared memory on top of point-to-point message passing, T_read + T_write ≥ d.
SC Lower Bound Proof
• Consider any SC simulation with T_read + T_write < d.
• Let X and Y be two shared variables, both initially 0.
• Let α_0 be an admissible execution whose top layer behavior is write_0(X,1) ack_0(X) read_0(Y) return_0(Y,0)
– the write begins at time 0, the read ends before time d
– every msg has delay d
• Why does α_0 exist?
– The alg. must respond correctly to any sequence of invocations.
– Suppose the user at p0 wants to do a write, immediately followed by a read.
– By SC, the read must return 0.
– By assumption, the total elapsed time is less than d.
SC Lower Bound Proof
• Similarly, let α_1 be an admissible execution whose top layer behavior is write_1(Y,1) ack_1(Y) read_1(X) return_1(X,0)
– the write begins at time 0, the read ends before time d
– every msg has delay d
• α_1 exists for a similar reason.
• Now merge p0's timed view in α_0 with p1's timed view in α_1 to create an admissible execution α'.
– With all delays in α' equal to d, neither process receives any message before time d, so before time d p0 cannot distinguish α' from α_0 and p1 cannot distinguish α' from α_1; hence α' is admissible and exhibits both sequences of operations above.
• But α' is not SC, contradiction!
– In any permutation respecting per-process order, p0's read of Y (returning 0) must precede p1's write of Y, and p1's read of X (returning 0) must precede p0's write of X, while each process's write precedes its own read: a cycle, so no such permutation exists.
SC Lower Bound Proof
[Figure: timelines over the interval from time 0 to d. In α_0, p0 performs write(X,1) then read(Y,0), finishing before time d. In α_1, p1 performs write(Y,1) then read(X,0), finishing before time d. In the merged execution α', p0 behaves as in α_0 and p1 behaves as in α_1.]
Linearizability Write Lower Bound
Theorem (9.8): In any simulation of linearizable shared memory on top of point-to-point message passing, T_write ≥ u/2.
Proof: Consider any linearizable simulation with T_write < u/2.
• Let α be an admissible exec. whose top layer behavior is: p1 writes 1 to X, p2 writes 2 to X, p0 reads 2 from X.
• Shift α to create an admissible exec. in which p1's and p2's writes are swapped, causing p0's read to violate linearizability.
Linearizability Write Lower Bound
[Figure: execution α, with time running from 0 to u. p1 invokes write 1 at time 0, p2 invokes write 2 at time u/2, and p0 invokes its read at time u, returning 2. Delay pattern: messages to and from p0 have delay d - u/2, messages from p1 to p2 have delay d, and messages from p2 to p1 have delay d - u.]
Linearizability Write Lower Bound
[Figure: the shifted execution. p1 is shifted later by u/2 and p2 is shifted earlier by u/2, so p2's write 2 now entirely precedes p1's write 1 while p0 still reads 2 after both writes have completed, violating linearizability. After the shift every delay is either d or d - u, so the execution is still admissible.]
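As a worked check of the admissibility claim in these two figures (a sketch, assuming the delay pattern reconstructed above), the shifting arithmetic in LaTeX is: if each process $p_k$ is moved later by $s_k$, a message from $p_i$ to $p_j$ whose delay was $\delta$ gets delay
\[
  \delta' = (t_{\mathrm{recv}} + s_j) - (t_{\mathrm{send}} + s_i) = \delta + s_j - s_i .
\]
With $s_0 = 0$, $s_1 = +u/2$, $s_2 = -u/2$:
\[
  d - \tfrac{u}{2} \mapsto d \ \text{or}\ d - u \quad (\text{links to/from } p_0), \qquad
  d \mapsto d - u \quad (p_1 \to p_2), \qquad
  d - u \mapsto d \quad (p_2 \to p_1),
\]
so every delay remains in $[d-u,\,d]$ and the shifted execution is admissible.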
Linearizability Read Lower Bound
• Approach is similar to the write lower bound.
• Assume in contradiction there is an algorithm
with T_read < u/4.
• Identify a particular execution:
– fix a pattern of read and write invocations,
occurring at particular times
– fix the pattern of message delays
• Shift this execution to get one that is
– still admissible
– but not linearizable
Linearizability Read Lower Bound
Original execution:
• p1 reads X and gets 0 (old value).
• Then p0 starts writing 1 to X.
• When write is done, p0 reads X and gets 1
(new value).
• Also, during the write, p1 and p2 alternate reading X.
• At some point, the reads stop getting the old
value (0) and start getting the new value (1)
Linearizability Read Lower Bound
• Set all delays in this execution to be d - u/2.
• Now shift p2 earlier by u/2.
• Verify that result is still admissible (every
delay either stays the same or becomes d or
d - u).
• But in the shifted execution, the sequence of values read is 0, 0, …, 0, 1, 0, 1, 1, …, 1, which violates linearizability: a read returning the old value 0 follows a read that already returned the new value 1.
Linearizability Read Lower Bound
[Figure: timelines of the original execution, in which p0 performs write 1 while p1 and p2 alternate reads of X and the values read switch once from 0 to 1, and of the execution with p2 shifted earlier by u/2, in which a read returning 0 occurs after a read returning 1.]