Distributed Systems
UNIT-III
The Byzantine Generals Problem
Once upon a time... a group of Byzantine generals, communicating only by messenger, must agree upon a common battle plan. Some of them may be traitors who will try to confuse the others.
The pictures are taken from: R. Goscinny and A. Uderzo, Asterix and Latraviata.
Byzantine Generals Problem & Impossibility Results
• Find an algorithm
– To ensure that the loyal generals will reach agreement
– So that a small number of traitors cannot cause the loyal generals to adopt a bad plan
• Remodeled as a commanding general sending an order
to his lieutenants
– IC1: All loyal generals get the same result
– IC2: If the commander is loyal, all loyal generals follow his choice
• No solution will work unless more than two-thirds of the generals are loyal
Example: Poor Lieutenant 1’s Dilemma
• Case 1: the commander is loyal and orders “attack”, but Lieutenant 2 is a traitor and tells Lieutenant 1 “he said retreat”.
• Case 2: the commander is a traitor who sends “attack” to Lieutenant 1 and “retreat” to Lieutenant 2; the loyal Lieutenant 2 truthfully reports “he said retreat”.
• The two situations are identical to Lieutenant 1: either way he receives “attack” from the commander and “he said retreat” from Lieutenant 2. To satisfy IC2 he must obey a loyal commander and attack, so he attacks in Case 2 as well, while Lieutenant 2 retreats, violating IC1!
Solutions
• Solution 1: Using Oral Messages
• Solution 2: Using Signed Messages
Solution using Oral Messages
• Works for 3m+1 or more generals (i.e., more than 3m) with at most m traitors
• Oral messages:
– Every message that is sent is delivered correctly
– The receiver of a message knows who sent it
– The absence of a message can be detected
• Function 'majority':
– With the property that if a majority of the values vi equals v, then
majority(v1,...,vn-1) equals v.
• Order set Vi
– Each lieutenant uses it to store orders from others
• Algorithm OM(m) can deal with m traitors
– Defined recursively
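As a concrete illustration of the ‘majority’ function, a minimal Python sketch is shown below; the RETREAT default used when there is no strict majority is an assumption consistent with the oral-message model, not something the algorithm prescribes.

```python
from collections import Counter

RETREAT = "retreat"   # assumed default order when there is no strict majority

def majority(values):
    """Return the order held by a strict majority of the values, else RETREAT."""
    order, count = Counter(values).most_common(1)[0]
    return order if count > len(values) / 2 else RETREAT

print(majority(["attack", "attack", "retreat"]))   # attack
print(majority(["attack", "retreat"]))             # retreat (no strict majority)
```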
Base case: OM(0)
• The commander sends his order (e.g., “attack”) to the Lieutenants.
• Each Lieutenant receives and records it, e.g. Vi = {v0: attack} for Lieutenants i, j and k.
OM(m)
• The commander sends his order (e.g., “attack”) to the Lieutenants.
• Each Lieutenant then acts as the commander in OM(m-1) and sends the value he received to ‘his’ Lieutenants (all the others).
• This is done recursively down to OM(0).
Step 3: Majority Vote
• Each Lieutenant decides on majority(v1, v2, …, v_n-1) over the values he has collected; every loyal Lieutenant arrives at the same decision.
• For any m, Algorithm OM(m) satisfies conditions IC1 and IC2 if there are more than 3m generals and at most m traitors.
OM(1): Lieutenant 3 is a traitor
• The loyal commander sends “attack” to Lieutenants 1, 2 and 3.
• Lieutenants 1 and 2 faithfully relay “attack” to each other; the traitor Lieutenant 3 may relay anything (e.g. “retreat”).
• Each loyal Lieutenant computes majority(attack, attack, x) = attack, whatever x the traitor sent.
• IC1 achieved, IC2 achieved.
OM(1): Commander is a traitor
• The traitorous commander sends “attack” to Lieutenant 1 and “retreat” to Lieutenants 2 and 3.
• Each Lieutenant faithfully relays the value he received, so all three collect the same values and compute majority(attack, retreat, retreat) = retreat.
• IC1 achieved; IC2 need not be satisfied, since the commander is a traitor.
Solution with Signed Messages
• What is a signed message?
– A loyal general's signature cannot be forged, and any alteration
of the contents of his signed messages can be detected
– Anyone can verify the authenticity of a general's signature
• Function choice(V): decision making
– If the set V consists of a single element v, then choice(V) = v
– choice(∅) = RETREAT when no orders have been received
• Note: no other characteristics of choice(V) are needed
Step 1
• The commander signs his order and sends v:0 to each Lieutenant; in the example a traitorous commander sends “attack:0” to Lieutenants i and j and “retreat:0” to Lieutenant k.
• For any Lieutenant i: if he receives the v:0 message and has not received any order yet, he lets Vi = {v} and sends v:0:i (the order countersigned with his own signature) to the other lieutenants.
• The order sets start as Vi = {attack}, Vj = {attack}, Vk = {retreat} and grow as the relayed signed copies (e.g. attack:0:i) arrive, e.g. Vk = {retreat, attack}.
Step 2
• If Lieutenant i receives a message v:0:j1:…:jk and v is NOT in the set Vi, then he adds v to Vi and, if k < m, sends v:0:j1:…:jk:i to every lieutenant except j1,…,jk.
• When Lieutenant i will receive no more messages, he makes his decision using choice(Vi).
• In the example every lieutenant ends up with the same order set, Vi = Vj = Vk, so they all make the same decision.
Example
• The traitorous commander sends the signed order “Attack:0” to Lieutenant 1 and “Retreat:0” to Lieutenant 2.
• After relaying the countersigned copies (Attack:0:1 and Retreat:0:2), both lieutenants hold V1 = V2 = {Attack, Retreat}.
• They get the same information, thus the same decision: the traitor cannot cheat now!
• For any m, Algorithm SM(m) solves the Byzantine Generals Problem if there are at most m traitors.
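To make the signed-message mechanics concrete, here is a minimal Python simulation sketch of SM(1) for the example above. The ‘signature’ is simply the recorded chain of signer ids (a real system would use cryptographic signatures), and the class and helper names are illustrative, not taken from the paper.

```python
RETREAT = "retreat"

def choice(orders):
    """Decision rule: obey a unique order; otherwise fall back to RETREAT."""
    return next(iter(orders)) if len(orders) == 1 else RETREAT

class Lieutenant:
    def __init__(self, ident, peers, m):
        self.ident, self.peers, self.m = ident, peers, m
        self.V = set()                                 # orders seen so far

    def receive(self, order, signers):
        """Handle an order signed by the chain `signers` (commander id first)."""
        if order in self.V:
            return                                     # already recorded this order
        self.V.add(order)
        if len(signers) <= self.m:                     # fewer than m lieutenant signatures: relay
            for p in self.peers:
                if p.ident != self.ident and p.ident not in signers:
                    p.receive(order, signers + [self.ident])

    def decide(self):
        return choice(self.V)

# Usage: m = 1, a traitorous commander (id 0) sends conflicting signed orders.
lieutenants = []
for i in (1, 2):
    lieutenants.append(Lieutenant(i, lieutenants, m=1))
lieutenants[0].receive("attack", [0])
lieutenants[1].receive("retreat", [0])
print([lt.decide() for lt in lieutenants])             # ['retreat', 'retreat']
```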
Conclusion
• The requirements (Interactive Consistency Condition)
– IC1: All loyal generals get the same result
– IC2: If commander is loyal, all loyal generals follow his choice
• Theorems to remember:
– 1. For any m, Algorithm OM(m) satisfies conditions IC1 and IC2 if
there are more than 3m generals and at most m traitors
– 2. For any m, Algorithm SM(m) solves the Byzantine Generals
Problem if there are at most m traitors.
Discussions
• These solutions are not used in practice
– Why?
• What if the messages get lost a lot during
communication?
• Are there any other ways besides ‘majority’
and ‘same information’?
Naïve solution
• ith general sends v(i) to all other generals
• To deal with two requirements:
– All generals combine their information v(1), v(2), …, v(n) in the same way
– e.g. take Majority(v(1), v(2), …, v(n)), ignoring the minority of traitors
• Naïve solution does not work:
– Traitors may send different values to different generals.
– Loyal generals might get conflicting values from traitors
• Requirement: Any two loyal generals must use the same
value of v(i) to decide on same plan of action.
Reduction of General Problem
• Insight: We can restrict ourselves to the problem of one
general sending its order to others.
• Byzantine Generals Problem (BGP):
– A commanding general (commander) must send an order to his n-1 lieutenants.
• Interactive Consistency Conditions:
– IC1: All loyal lieutenants obey the same order.
– IC2: If the commanding general is loyal, then every loyal
lieutenant obeys the order he sends.
• Note: If General is loyal, IC2 => IC1.
• Original problem: each general sends his value v(i) by
using the above solution, with other generals acting as
lieutenants.
3-General Impossibility Example
• 3 generals, 1 traitor among them.
• Two messages: Attack or Retreat.
• Shaded (in the figures) – Traitor.
• L1 sees (A, R). Who is the traitor? C or L2?
• Fig 1: L1 has to attack to satisfy IC2.
• Fig 2: L1 attacks, L2 retreats. IC1 violated.
General Impossibility
• In general, no solutions with fewer than 3m+1
generals can cope with m traitors.
• Proof by contradiction.
– Assume there is a solution for 3m generals with m traitors.
– Reduce to the 3-General problem: each of the three generals simulates m of the 3m generals.
– A solution to the 3m-general problem would then give a solution to the 3-General problem, which we have just shown to be impossible.
Solution I – Oral Messages
• If there are 3m+1 generals, solution allows up to m
traitors.
• Oral messages – the content of every message is entirely under the control of the sender.
• Assumptions on oral messages:
– A1 – Each message that is sent is delivered correctly.
– A2 – The receiver of a message knows who sent it.
– A3 – The absence of a message can be detected.
• Assures:
– Traitors cannot interfere with communication as third party.
– Traitors cannot send fake messages
– Traitors cannot interfere by being silent.
• Default order to “retreat” for silent traitor.
Oral Messages (Cont)
• Algorithm OM(0)
– The commander sends his value to every lieutenant.
– Each lieutenant uses the value received from the commander, or RETREAT if no value is received.
• Algorithm OM(m), m>0
– The commander sends his value to every lieutenant; let vi be the value Lieutenant i receives (or RETREAT if none).
– Each Lieutenant i acts as commander for OM(m-1) and sends vi to the other n-2 lieutenants.
– For each i, and each j<>i, let vj be the value Lieutenant i receives from Lieutenant j in step (2) using OM(m-1). Lieutenant i uses the value majority(v1, …, vn-1).
– Why j<>i? “Trust myself more than what others said I said.”
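The recursion can be illustrated with a small Python simulation sketch; the behaviour assumed for traitors (sending alternating, conflicting orders) and all names are illustrative assumptions, not part of the algorithm itself.

```python
from collections import Counter

RETREAT, ATTACK = "retreat", "attack"

def majority(values):
    order, count = Counter(values).most_common(1)[0]
    return order if count > len(values) / 2 else RETREAT

def om(m, commander, lieutenants, value, is_traitor):
    """Simulate OM(m): return {lieutenant: value it ends up using}."""
    # Step 1: the commander sends a value to every lieutenant; a traitorous
    # commander sends conflicting values (here: simply alternating).
    sent = {lt: (ATTACK if idx % 2 == 0 else RETREAT) if is_traitor[commander] else value
            for idx, lt in enumerate(lieutenants)}
    if m == 0:
        return sent                               # OM(0): use what the commander sent

    # Step 2: each lieutenant j acts as commander in OM(m-1), relaying its value.
    sub = {j: om(m - 1, j, [x for x in lieutenants if x != j], sent[j], is_traitor)
           for j in lieutenants}

    # Step 3: lieutenant i takes the majority of its own value and the values it
    # received from every other lieutenant j in j's OM(m-1) round.
    return {i: majority([sent[i]] + [sub[j][i] for j in lieutenants if j != i])
            for i in lieutenants}

# Usage: n = 4, m = 1, general 3 is a traitor; loyal lieutenants 1 and 2 agree on "attack".
print(om(1, commander=0, lieutenants=[1, 2, 3], value=ATTACK,
         is_traitor={0: False, 1: False, 2: False, 3: True}))
```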
Restate Algorithm
• OM(m):
– Commander sends out the command.
– Each lieutenant acts as commander in OM(m-1) and sends out the command to the other lieutenants.
– Use majority to compute a value based on the commands received from the other lieutenants in OM(m-1).
• Revisit Interactive Consistency goals:
– IC1: All loyal lieutenants obey the same command.
– IC2: If the commanding general is loyal, then every loyal
lieutenant obeys the command he sends.
Example (n=4, m=1)
• Algorithm OM(1): L3 is a traitor.
• L1 and L2 both receive v, v, x and compute majority(v, v, x) = v. (IC1 is met.)
• IC2 is met because L1 and L2 obey C.
Example (n=4, m=1)
• Algorithm OM(1): Commander is a traitor.
• All lieutenants receive the same set of values x, y, z, so each computes the same majority(x, y, z). (IC1 is met.)
• IC2 is irrelevant since the commander is a traitor.
Expensive Communication
• OM(m) invokes n-1 instances of OM(m-1)
• OM(m-1) invokes n-2 instances of OM(m-2)
• OM(m-2) invokes n-3 instances of OM(m-3)
• …
• OM(m-k) will be called (n-1)(n-2)…(n-k) times
• O(n^m) messages – Expensive!
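A small counting sketch (not part of the original analysis) shows how quickly the number of invocations grows, roughly like n^m:

```python
def om_invocations(n, m):
    """Total number of OM(.) calls triggered by one top-level OM(m) with n generals."""
    if m == 0:
        return 1
    return 1 + (n - 1) * om_invocations(n - 1, m - 1)

for n, m in [(4, 1), (7, 2), (10, 3)]:
    print(n, m, om_invocations(n, m))   # 4, 37, 586: grows roughly like n^m
```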
Distributed File Systems
Introduction
File systems are responsible for the
organization, storage, retrieval, naming,
sharing and protection of files.
Files contain both data and attributes.
A typical attribute record structure is illustrated below.
Introduction
File attribute record structure:
File length
Creation timestamp
Read timestamp
Write timestamp
Attribute timestamp
Reference count
Owner
File type
Access control list
Introduction
Distributed file systems support the
sharing of information in the form of files
and hardware resources.
With the advent of distributed object
systems (CORBA, Java) and the web, the
picture has become more complex.
Definition of a DFS
• DFS: multiple users, multiple sites, and
(possibly) distributed storage of files.
• Benefits
– File sharing
– Uniform view of system from different clients
– Centralized administration
• Goals of a distributed file system
– Network Transparency (access transparency)
– Availability
Goals
• Network (Access)Transparency
– Users should be able to access files over a
network as easily as if the files were stored
locally.
– Users should not have to know the physical
location of a file to access it.
• Transparency can be addressed through
naming and file mounting mechanisms
Components of Access Transparency
• Location Transparency: file name doesn’t
specify physical location
• Location Independence: files can be moved to a new physical location with no need to change references to them (a name is independent of its address).
• Location independence → location
transparency, but the reverse is not
necessarily true.
Goals
• Availability: files should be easily and quickly
accessible.
• The number of users, system failures, or other
consequences of distribution shouldn’t
compromise the availability.
• Addressed mainly through replication.
Introduction
Distributed File system requirements
Related requirements in distributed file systems
are:
Transparency
Concurrency
Replication
Heterogeneity
Fault tolerance
Consistency
Security
Efficiency
Architectures
• Client-Server
– Traditional; e.g. Sun Microsystem Network File
System (NFS)
– Cluster-Based Client-Server; e.g., Google File
System (GFS)
• Symmetric
– Fully decentralized; based on peer-to-peer
technology
– e.g., Ivy (uses a Chord DHT approach)
Client-Server Architecture
• One or more machines (file servers) manage
the file system.
• Files are stored on disks at the servers
• Requests for file operations are made from
clients to the servers.
• Client-server systems centralize storage and
management; P2P systems decentralize it.
Figure: architecture of a distributed file system (client-server model). Clients with local caches communicate over a network with several file servers, each with its own cache and disks.
Sun’s Network File System
• Sun’s NFS for many years was the most widely
used distributed file system.
– NFSv3: version three, used for many years
– NFSv4: introduced in 2003
• Version 4 made significant changes
Overview
• NFS goals:
– Each file server presents a standard view of its local file
system
– transparent access to remote files
– compatibility with multiple operating systems and
platforms.
– easy crash recovery at server (at least v1-v3)
• Originally UNIX based; now available for most
operating systems.
• NFS communication protocols lets processes running
in different environments share a file system.
Access Models
• Clients access the server transparently through
an interface similar to the local file system
interface
• Client-side caching may be used to save time
and network traffic
• Server defines and performs all file operations
Distributed File Systems Services
• Services provided by the distributed file system:
(1) Name Server: provides mapping (name resolution) of the names supplied by clients into objects (files and directories)
• Takes place when process attempts to access file or directory the first
time.
(2) Cache manager: Improves performance through file caching
• Caching at the client - When client references file at server:
– Copy of data brought from server to client machine
– Subsequent accesses done locally at the client
• Caching at the server:
– File saved in memory to reduce subsequent access time
* Issue: different cached copies can become inconsistent. Cache
managers (at server and clients) have to provide coordination.
Typical Data Access in a Client/File Server Architecture
Mechanisms used in distributed file systems
(1) Mounting
• The mount mechanism binds together several filename spaces (collections of files and directories) into a single hierarchically structured name space (example: UNIX and its derivatives)
• A name space ‘A’ can be mounted (bound) at an internal node (mount point) of a name space ‘B’
• Implementation: the kernel maintains a mount table, mapping mount points to storage devices
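As an illustration of the mount-table idea, here is a toy in-memory sketch; the paths, server names and resolution logic are hypothetical, not a real kernel interface.

```python
# Toy mount table: maps mount points to (server, exported file system) pairs.
mount_table = {
    "/":         ("local",   "rootfs"),
    "/home":     ("serverA", "/export/home"),
    "/projects": ("serverB", "/export/projects"),
}

def resolve(path):
    """Resolve a path to (server, remote path) via the longest matching mount point."""
    best = max((mp for mp in mount_table
                if path == mp or path.startswith(mp.rstrip("/") + "/")), key=len)
    server, export = mount_table[best]
    return server, export + path[len(best):]

print(resolve("/home/alice/notes.txt"))   # ('serverA', '/export/home/alice/notes.txt')
```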
Mechanisms used in distributed file systems (cont.)
(1) Mounting (cont.)
• Location of mount information
a. Mount information maintained at clients
– Each client mounts every file system
– Different clients may not see the same filename space
– If files move to another server, every client needs to update its mount
table
– Example: SUN NFS
b. Mount information maintained at servers
– Every client sees the same filename space
– If files move to another server, mount info at server only needs to change
– Example: Sprite File System
Mechanisms used in distributed file systems (cont.)
(2) Caching
– Improves file system performance by exploiting the locality of
reference
– When client references a remote file, the file is cached in the main
memory of the server (server cache) and at the client (client cache)
– When multiple clients modify shared (cached) data, cache consistency
becomes a problem
– It is very difficult to implement a solution that guarantees consistency
(3) Hints
– Treat the cached data as hints, i.e. cached data may not be completely
accurate
– Can be used by applications that can discover that the cached data is
invalid and can recover
• Example:
– After the name of a file is mapped to an address, that address is stored as a
hint in the cache
– If the address later fails, it is purged from the cache
– The name server is consulted to provide the actual location of the file and the
cache is updated
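A minimal sketch of the hint mechanism: the cached address is trusted first, purged on failure, and refreshed from the name server. The ask_name_server and open_at helpers are hypothetical stand-ins for the name server and the file server.

```python
hint_cache = {}   # file name -> cached (possibly stale) server address

def locate(filename, ask_name_server, open_at):
    addr = hint_cache.get(filename)
    if addr is not None:
        try:
            return open_at(addr, filename)      # fast path: trust the hint
        except ConnectionError:
            del hint_cache[filename]            # hint was stale: purge it
    addr = ask_name_server(filename)            # authoritative lookup
    hint_cache[filename] = addr                 # store the new hint
    return open_at(addr, filename)
```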
Mechanism used in distributed file systems (cont.)
(4) Bulk data transfer
– Observations:
• Overhead introduced by protocols does not depend on the amount of data
transferred in one transaction
• Most files are accessed in their entirety
– Common practice: when client requests one block of data, multiple
consecutive blocks are transferred
(5) Encryption
– Encryption is needed to provide security in distributed systems
– Entities that need to communicate send request to authentication
server
– Authentication server provides key for conversation
Design Issues
1. Naming and name resolution
– Terminology
• Name: each object in a file system (file, directory) has a unique name
• Name resolution: mapping a name to an object or multiple objects (replication)
• Name space: collection of names with or without same resolution mechanism
– Approaches to naming files in a distributed system
(a) Concatenate name of host to names of files on that host
– Advantage: unique filenames, simple resolution
– Disadvantages:
» Conflicts with network transparency
» Moving file to another host requires changing its name and the applications using it
(b) Mount remote directories onto local directories
– Requires that host of remote directory is known
– After mounting, files are referenced in a location-transparent way (i.e., the file name does not reveal its location)
(c) Have a single global directory
– All files belong to a single name space
– Limitation: having unique system-wide filenames requires a single computing facility or cooperating facilities
Design Issues (cont.)
1. Naming and Name Resolution (cont.)
– Contexts
• Solve the problem of system-wide unique names, by partitioning a name
space into contexts (geographical, organizational, etc.)
• Name resolution is done within that context
• Interpretation may lead to another context
• File Name = Context + Name local to context
– Nameserver
• Process that maps file names to objects (files, directories)
• Implementation options
– Single name Server
» Simple implementation, reliability and performance issues
– Several Name Servers (on different hosts)
» Each server responsible for a domain
» Example:
Client requests access to file ‘A/B/C’
Local name server looks up a table (in kernel)
Local name server points to a remote server for ‘/B/C’ mapping
Design Issues (Cont.)
3. Writing policy
– Question: once a client writes into a file (and the local cache), when should
the modified cache be sent to the server?
– Options:
• Write-through: all writes at the clients, immediately transferred to the
servers
– Advantage: reliability
– Disadvantage: performance, it does not take advantage of the cache
• Delayed writing: delay transfer to servers
– Advantages:
» Many writes take place (including intermediate results) before a
transfer
» Some data may be deleted
– Disadvantage: reliability
• Delayed writing until file is closed at client
– For short open intervals, same as delayed writing
– For long intervals, reliability problems
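A toy sketch contrasting write-through and delayed writing on the client side; server.write is an assumed RPC and the delay threshold is arbitrary.

```python
import time

class ClientCache:
    """Toy client-side cache illustrating write-through vs. delayed writing."""
    def __init__(self, server, policy="write-through", delay=30):
        self.server, self.policy, self.delay = server, policy, delay
        self.dirty = {}                       # path -> (data, time of last write)

    def write(self, path, data):
        if self.policy == "write-through":
            self.server.write(path, data)     # reliable: the server is always up to date
        else:                                 # delayed writing: keep it dirty locally
            self.dirty[path] = (data, time.time())

    def flush(self, now=None):
        """Push delayed writes older than `delay` seconds (or call on file close)."""
        now = now if now is not None else time.time()
        for path, (data, t) in list(self.dirty.items()):
            if now - t >= self.delay:
                self.server.write(path, data)
                del self.dirty[path]
```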
Design Issues (Cont.)
4. Availability
– Issue: what is the level of availability of files in a distributed file system?
– Resolution: use replication to increase availability, i.e. many copies (replicas) of files are maintained at different sites/servers
– Replication issues:
• How to keep replicas consistent
• How to detect inconsistency among replicas
– Unit of replication:
• File
• Group of files
a) Volume: group of all files of a user or group, or all files in a server
» Advantage: ease of implementation
» Disadvantage: wasteful, a user may need only a subset replicated
b) Primary pack vs. pack
» Primary pack: all files of a user
» Pack: subset of the primary pack; each pack can receive a different degree of replication
Design Issues (Cont.)
5. Scalability
– Issue: can the design support a growing system?
– Example: with server-initiated cache invalidation, complexity and load grow with the size of the system. Possible solutions:
• Do not provide cache invalidation service for read-only files
• Provide design to allow users to share cached data
– Design file servers for scalability: threads, SMPs, clusters
6. Semantics
– Expected semantics: a read will return data stored by the latest write
– Possible options:
• All read and writes go through the server
– Disadvantage: communication overhead
• Use of lock mechanism
– Disadvantage: file not always available
Case Studies:
The Sun Network File System (NFS)
• Developed by Sun Microsystems to provide a distributed file
system independent of the hardware and operating system
• Architecture
– Virtual File System (VFS):
file system interface that allows NFS to support different file systems
– Requests for operations on remote files are routed by the VFS to NFS
– Requests are sent to the VFS on the remote server using
• the remote procedure call (RPC), and
• the external data representation (XDR)
– The VFS on the remote server initiates the file system operation locally
– Vnode (Virtual Node):
• There is a network-wide vnode for every object in the file system (file or directory), the equivalent of a UNIX inode
• vnode has a mount table, allowing any node to be a mount node
Cluster-based or Clustered File
System
• A distributed file system that consists of
several servers that share the responsibilities
of the system, as opposed to a single server
(possibly replicated).
• The design decisions for cluster-based systems are mostly related to how the data is
distributed across the cluster and how it is
managed.
Cluster-Based DFS
• Some cluster-based systems organize the clusters in an
application specific manner
• For file systems used primarily for parallel applications,
the data in a file might be striped across several servers
so it can be read in parallel.
• Or, it might make more sense to partition the file
system itself – some portion of the total number of files
are stored on each server.
• For systems that process huge numbers of requests;
e.g., large data centers, reliability and management
issues take precedence.
– e.g., Google File System
Google File System (GFS)
• GFS uses a cluster-based approach implemented on
ordinary commodity Linux boxes (not high-end
servers).
• Servers fail on a regular basis, just because there are
so many of them, so the system is designed to be
fault tolerant.
• There are a number of replicated clusters that map
to www.google.com
• DNS servers map requests to the clusters in a round-robin fashion, as a load-balancing mechanism; locality is also considered.
Scalability in GFS
• Clients only contact the master to get metadata, so it
isn’t a bottleneck.
• Updates are performed by having a client update the nearest server, which pushes the update to one of the backups, which in turn sends it on to the next, and so on.
– Updates aren’t committed until all replicas are complete.
• Information for mapping file names to contact
addresses is efficiently organized & stored (mostly) in
the master’s memory.
– Access time is optimized due to infrequent disk accesses.
Distributed Resource Management:
Distributed Shared Memory
Distributed shared memory (DSM)
• What
- The distributed shared memory (DSM) implements the shared
memory model in distributed systems, which have no physical
shared memory
- The shared memory model provides a virtual address space shared
between all nodes
- To overcome the high cost of communication in distributed systems, DSM systems move data to the location of access
• How:
- Data moves between main memory and secondary memory (within
a node) and between main memories of different nodes
- Each data object is owned by a node
- Initial owner is the node that created object
- Ownership can change as object moves from node to node
- When a process accesses data in the shared address space, the
mapping manager maps shared memory address to physical memory
(local or remote)
Distributed shared memory (Cont.)
NODE 1
NODE 2
NODE 3
Memory
Memory
Memory
Mapping
Manager
Mapping
Manager
Mapping
Manager
Shared Memory
62
Advantages of distributed shared memory (DSM)
• Data sharing is implicit, hiding data movement (as opposed to ‘Send’/‘Receive’
in message passing model)
• Passing data structures containing pointers is easier (in message passing model
data moves between different address spaces)
• Moving the entire block/object containing the data to the user takes advantage of locality of reference
• Less expensive to build than tightly coupled multiprocessor system: off-the-shelf
hardware, no expensive interface to shared physical memory
• Very large total physical memory for all nodes: Large programs can run more
efficiently
• No serial access to common bus for shared physical memory like in
multiprocessor systems
• Programs written for shared memory multiprocessors can be run on DSM
systems with minimum changes
Algorithms for implementing DSM
• Issues
- How to keep track of the location of remote data
- How to minimize communication overhead when accessing remote data
- How to access concurrently remote data at several nodes
1. The Central Server Algorithm
- Central server maintains all shared data
• Read request: returns data item
• Write request: updates data and returns acknowledgement message
- Implementation
• A timeout is used to resend a request if acknowledgment fails
• Associated sequence numbers can be used to detect duplicate write requests
• If an application’s request to access shared data fails repeatedly, a failure
condition is sent to the application
- Issues: performance and reliability
- Possible solutions
• Partition shared data between several servers
• Use a mapping function to distribute/locate data
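A minimal sketch of the central-server algorithm, showing the sequence numbers that filter duplicate writes and the client-side retransmission on timeout; all names are illustrative.

```python
import collections

class CentralServer:
    def __init__(self):
        self.store = {}                                  # all shared data lives here
        self.seen = collections.defaultdict(set)         # client -> seq numbers applied

    def read(self, addr):
        return self.store.get(addr)

    def write(self, client, seq, addr, value):
        if seq not in self.seen[client]:                 # sequence number filters duplicate
            self.store[addr] = value                     # retransmissions of the same write
            self.seen[client].add(seq)
        return "ack"

def client_write(server, client, seq, addr, value, retries=3):
    """Client side: resend on timeout; report a failure condition after `retries`."""
    for _ in range(retries):
        try:
            if server.write(client, seq, addr, value) == "ack":
                return True
        except TimeoutError:
            continue                                     # timeout: resend the request
    raise RuntimeError("failure condition reported to the application")
```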
Algorithms for implementing DSM (cont.)
2. The Migration Algorithm
- Operation
• Ship (migrate) entire data object (page, block) containing data item to
requesting location
• Allow only one node to access a shared data at a time
- Advantages
• Takes advantage of the locality of reference
• DSM can be integrated with VM at each node
- Make DSM page multiple of VM page size
- A locally held shared memory can be mapped into the VM page
address space
- If page not local, fault-handler migrates page and removes it from
address space at remote node
- To locate a remote data object:
• Use a location server
• Maintain hints at each node
• Broadcast query
- Issues
• Only one node can access a data object at a time
• Thrashing can occur: to minimize it, set a minimum time a data object must reside at a node
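A minimal sketch of the migration idea: one owner per data block, and the whole block moves to the accessing node. A plain dictionary stands in for the location server (hints or a broadcast query would be alternatives, as noted above).

```python
location = {"blockA": "node1"}          # location server: block -> current owner
memories = {"node1": {"blockA": b"data"}, "node2": {}}

def access(node, block):
    owner = location[block]
    if owner != node:                   # fault: the block is held remotely
        memories[node][block] = memories[owner].pop(block)   # migrate the whole block
        location[block] = node          # update the location server
    return memories[node][block]        # only the current owner can access it

print(access("node2", "blockA"), location["blockA"])   # b'data' node2
```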
Algorithms for implementing DSM (cont.)
3. The Read-Replication Algorithm
– Replicates data objects to multiple nodes
– DSM keeps track of location of data objects
– Multiple nodes can have read access or one node write access (multiple
readers-one writer protocol)
– After a write, all copies are invalidated or updated
– DSM has to keep track of locations of all copies of data objects. Examples
of implementations:
• IVY: owner node of data object knows all nodes that have copies
• PLUS: distributed linked-list tracks all nodes that have copies
– Advantage
• The read-replication can lead to substantial performance improvements if the
ratio of reads to writes is large
Algorithms for implementing DSM (cont.)
4. The Full–Replication Algorithm
- Extension of read-replication algorithm: multiple nodes can read and
multiple nodes can write (multiple-readers, multiple-writers protocol)
- Issue: consistency of data for multiple writers
- Solution: use of gap-free sequencer
• All writes sent to sequencer
• Sequencer assigns sequence number and sends write request to
all sites that have copies
• Each node performs writes according to sequence numbers
• A gap in sequence numbers indicates a missing write request:
node asks for retransmission of missing write requests
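A minimal sketch of the gap-free sequencer, with in-memory Site objects standing in for the nodes that hold copies; retransmission of missing writes is only indicated in a comment.

```python
class Sequencer:
    def __init__(self, sites):
        self.seq, self.sites = 0, sites

    def submit(self, addr, value):
        self.seq += 1                                # assign the next sequence number
        for site in self.sites:
            site.deliver(self.seq, addr, value)      # multicast the sequenced write

class Site:
    def __init__(self):
        self.next_expected, self.pending, self.memory = 1, {}, {}

    def deliver(self, seq, addr, value):
        self.pending[seq] = (addr, value)
        while self.next_expected in self.pending:    # apply writes strictly in order
            a, v = self.pending.pop(self.next_expected)
            self.memory[a] = v
            self.next_expected += 1
        # A gap in the sequence numbers leaves later writes pending until the
        # site asks for retransmission of the missing write (not shown).

sites = [Site(), Site()]
seq = Sequencer(sites)
seq.submit("x", 1)
seq.submit("x", 2)
print(sites[0].memory, sites[1].memory)   # both {'x': 2}: same order everywhere
```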
Memory coherence
• DSM are based on
- Replicated shared data objects
- Concurrent access of data objects at many nodes
• Coherent memory: when value returned by read operation is the
expected value (e.g., value of most recent write)
• Mechanism that control/synchronizes accesses is needed to
maintain memory coherence
• Sequential consistency: A system is sequentially consistent if
- The result of any execution of operations of all processors is the same as if
they were executed in sequential order, and
- The operations of each processor appear in this sequence in the order
specified by its program
• General consistency:
- All copies of a memory location (replicas) eventually contain the same data when all writes issued by every processor have completed
Memory coherence (Cont.)
• Processor consistency:
- Operations issued by a processor are performed in the order they are issued
- Operations issued by several processors may not be performed in the same
order (e.g. simultaneous reads of same location by different processors may
yields different results)
• Weak consistency:
- Memory is consistent only (immediately) after a synchronization operation
- A regular data access can be performed only after all previous
synchronization accesses have completed
• Release consistency:
- Further relaxation of weak consistency
- Synchronization operations must be consistent with each other only within a processor
- Synchronization operations: Acquire (i.e. lock), Release (i.e. unlock)
- Sequence: Acquire, then regular accesses, then Release
Coherence Protocols
• Issues
- How do we ensure that all replicas have the same information
- How do we ensure that nodes do not access stale data
1. Write-invalidate protocol
- A write to shared data invalidates all copies except one before write executes
- Invalidated copies are no longer accessible
- Advantage: good performance for
• Many updates between reads
• Per node locality of reference
- Disadvantage
• Invalidations sent to all nodes that have copies
• Inefficient if many nodes access same object
- Examples: most DSM systems: IVY, Clouds, Dash, Memnet, Mermaid, and Mirage
2. Write-update protocol
- A write to shared data causes all copies to be updated (the new value is sent, instead of an invalidation)
- More difficult to implement
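A minimal sketch of the write-invalidate protocol, assuming a directory that tracks which nodes hold copies of each item; the names are illustrative only.

```python
copies = {"x": {"node1", "node2", "node3"}}   # directory: item -> nodes holding a copy
caches = {n: {"x": 0} for n in ("node1", "node2", "node3")}

def write(node, item, value):
    for other in copies[item] - {node}:       # invalidate every other copy first
        caches[other].pop(item, None)         # invalidated copies are inaccessible
    copies[item] = {node}                     # only the writer keeps a valid copy
    caches[node][item] = value

def read(node, item):
    if item not in caches[node]:              # miss: fetch from a current holder
        owner = next(iter(copies[item]))
        caches[node][item] = caches[owner][item]
        copies[item].add(node)
    return caches[node][item]

write("node1", "x", 42)
print(read("node2", "x"))   # 42, re-fetched after node2's copy was invalidated
```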
Design issues
• Granularity: size of shared memory unit
- If DSM page size is a multiple of the local virtual memory (VM)
management page size (supported by hardware), then DSM can be
integrated with VM, i.e. use the VM page handling
- Advantages vs. disadvantages of using a large page size:
- (+) Exploit locality of reference
- (+) Less overhead in page transport
- (-) More contention for page by many processes
- Advantages vs. disadvantages of using a small page size
- (+) Less contention
- (+) Less false sharing (page contains two items, not shared but needed by two
processes)
- (-) More page traffic
- Examples
• PLUS: page size 4 Kbytes, unit of memory access is 32-bit word
• Clouds, Munin: object is unit of shared data structure
Design issues (cont.)
• Page replacement
- Replacement algorithm (e.g. LRU) must take into account page access
modes: shared, private, read-only, writable
- Example: LRU with access modes
• Private (local) pages to be replaced before shared ones
• Private pages swapped to disk
• Shared pages sent over network to owner
• Read-only pages may be discarded (owners have a copy)
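A minimal sketch of LRU replacement that takes the access modes above into account: private pages are preferred eviction victims, and the victim's mode decides what happens to it. The eviction actions are only indicated as messages.

```python
from collections import OrderedDict

class DsmPageCache:
    """Toy page cache: LRU order, but private pages are preferred eviction victims."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()            # page -> access mode, kept in LRU order

    def touch(self, page, mode):
        if page in self.pages:
            self.pages.move_to_end(page)      # mark as most recently used
            self.pages[page] = mode
            return
        if len(self.pages) >= self.capacity:
            self._evict()
        self.pages[page] = mode

    def _evict(self):
        # Prefer the least recently used private page; otherwise plain LRU.
        victim = next((p for p, m in self.pages.items() if m == "private"),
                      next(iter(self.pages)))
        mode = self.pages.pop(victim)
        if mode == "private":
            action = "swap to local disk"
        elif mode == "read-only":
            action = "discard (the owner keeps a copy)"
        else:
            action = "send the page over the network to its owner"
        print(f"evict {victim}: {action}")

cache = DsmPageCache(capacity=2)
cache.touch("p1", "shared")
cache.touch("p2", "private")
cache.touch("p3", "read-only")                # evicts p2: swap to local disk
```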