Distributed Shared Memory

Distributed System
UNIT-III
The Byzantine Generals Problem
Once upon a time... a group of generals, communicating only by messenger, must agree upon a common battle plan. Some of them may be traitors who will try to confuse the others.
Byzantine Generals Problem &
Impossibility Results
• Find an algorithm
– To ensure that the loyal generals will reach agreement
– A small number of traitors cannot cause the loyal generals to adopt a bad plan
• Remodeled as a commanding general sending an order to his lieutenants
– IC1: All loyal generals get the same result
– IC2: If the commander is loyal, all loyal generals follow his choice
• No solution works unless more than two-thirds of the generals are loyal
Example: Poor Lieutenant 1’s Dilemma
• Case 1: the commander is loyal and orders "attack", but the traitor Lieutenant 2 tells Lieutenant 1 "he said retreat".
• Case 2: the commander is the traitor, telling Lieutenant 1 "attack" and Lieutenant 2 "retreat"; Lieutenant 2 honestly reports "he said retreat".
• The two situations are identical to Lieutenant 1, so he must make the same decision in both; whatever he does, one case ends with the loyal generals acting differently, and IC1 is violated.
Solutions
• Solution 1: Using Oral Messages
• Solution 2: Using Signed Messages
Solution using Oral Messages
• Works with 3m+1 or more generals when at most m of them are traitors
• Oral messages:
– Every message that is sent is delivered correctly
– The receiver of a message knows who sent it
– The absence of a message can be detected
• Function 'majority':
– With the property that if a majority of the values vi equals v, then majority(v1,...,vn-1) equals v (a minimal sketch follows below)
• Order set Vi
– Each lieutenant uses it to store the orders received from the others
• Algorithm OM(m) can deal with m traitors
– Defined recursively
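A minimal sketch of one possible 'majority' function with the stated property; the "retreat" default for the no-majority case is an assumption, chosen so that all loyal lieutenants break ties the same way.

from collections import Counter

def majority(values, default="retreat"):
    # Returns v when a strict majority of the inputs equal v;
    # otherwise a fixed default order.
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else default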
Base case: OM(0)
• The commander sends his order (e.g., "attack") to every lieutenant.
• Each lieutenant receives the order and records it: Vi = {v0: attack}.
OM(m)
• The commander sends his order to every lieutenant.
• Each lieutenant then acts as the commander in OM(m-1) and sends the value he received to ‘his’ lieutenants (all the others).
• This is done recursively.
Step 3: Majority Vote
• Each lieutenant decides on majority(v1, v2, …, v_n-1) over the orders he has recorded, so all loyal lieutenants reach the same decision.
• For any m, Algorithm OM(m) satisfies conditions IC1 and IC2 if there are more than 3m generals and at most m traitors.
OM(1): Lieutenant 3 is a traitor
• The loyal commander orders "attack"; every lieutenant receives "attack" from him.
• Lieutenants 1 and 2 faithfully relay "attack"; the traitor Lieutenant 3 may relay "retreat" to one of them.
• Each loyal lieutenant still computes majority(attack, attack, retreat) = attack (or majority(attack, attack, attack) = attack), so both attack.
• IC1 achieved, IC2 achieved.
OM(1): Commander is a traitor
• The traitorous commander sends "attack" to one lieutenant and "retreat" to the other two.
• The loyal lieutenants honestly relay what they received.
• Each loyal lieutenant computes majority(attack, retreat, retreat) = retreat, so all of them retreat.
• IC1 achieved; IC2 need not be satisfied because the commander is not loyal.
Solution with Signed Messages
• What is a signed message?
– A loyal general's signature cannot be forged, and any alteration
of the contents of his signed messages can be detected
– Anyone can verify the authenticity of a general's signature
• Function choice(V): decision making
– If the set V consists of the single element v, then choice(V)=v
• Note: no other characteristics needed for choice(V)
Step 1
• The commander signs his order v and sends v:0 to every lieutenant.
• For any Lieutenant i, if he receives the v:0 message and has not received any order yet, he lets Vi = {v} and sends the countersigned message v:0:i to the other lieutenants.
• Example (traitorous commander): Lieutenants i and j receive attack:0, so Vi = Vj = {attack}, while Lieutenant k receives retreat:0, so Vk = {retreat}; Lieutenant i's relay attack:0:i then reaches j and k.
Step 2
• If Lieutenant i receives a message v:0:j1:…:jk and v is NOT in set Vi, then he adds v to Vi; if k < m, he sends v:0:j1:…:jk:i to every lieutenant except j1,…,jk.
• When Lieutenant i will receive no more messages, he makes his decision using choice(Vi).
• In the example, the loyal lieutenants all end up with the same order set (Vi = Vj = Vk), so choice gives them the same decision; a sketch of SM(m) follows below.
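The following Python sketch mirrors Steps 1 and 2 of SM(m). Signatures are only simulated by appending general IDs to a signature chain; a real system would rely on unforgeable cryptographic signatures, which is the key assumption behind SM. The traitor behaviour shown (a traitorous lieutenant simply stays silent) and all names are illustrative, not part of the algorithm itself.

def sm(m, n_lieutenants, commander_orders, traitors=frozenset()):
    # commander_orders[i]: the signed order the (possibly traitorous)
    # commander sends to lieutenant i. Returns each lieutenant's order set Vi.
    V = {i: set() for i in range(1, n_lieutenants + 1)}
    # A message is (order, signature chain); the commander is general 0.
    inbox = {i: [(commander_orders[i], [0])] for i in V}
    while any(inbox.values()):
        next_inbox = {i: [] for i in V}
        for i, messages in inbox.items():
            if i in traitors:
                continue              # one possible traitor behaviour: stay silent
            for order, chain in messages:
                if order not in V[i]:
                    V[i].add(order)
                    if len(chain) <= m:   # fewer than m lieutenant signatures: relay
                        for j in V:
                            if j != i and j not in chain:
                                next_inbox[j].append((order, chain + [i]))
        inbox = next_inbox
    return V

# Traitorous commander sending conflicting signed orders to two loyal lieutenants:
print(sm(1, 2, {1: "attack", 2: "retreat"}))
# Both lieutenants end with Vi = {'attack', 'retreat'}, so choice(Vi) gives them
# the same decision and the traitor gains nothing.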
Example
• The traitorous commander sends the signed order Attack:0 to Lieutenant 1 and Retreat:0 to Lieutenant 2.
• The lieutenants exchange the countersigned messages Attack:0:1 and Retreat:0:2, so V1 = V2 = {Attack, Retreat}.
• They get the same information, thus the same decision: the traitor cannot cheat now.
• For any m, Algorithm SM(m) solves the Byzantine Generals Problem if there are at most m traitors.
Conclusion
• The requirements (Interactive Consistency Conditions)
– IC1: All loyal generals get the same result
– IC2: If the commander is loyal, all loyal generals follow his choice
• Theorems to remember:
– 1. For any m, Algorithm OM(m) satisfies conditions IC1 and IC2 if
there are more than 3m generals and at most m traitors
– 2. For any m, Algorithm SM(m) solves the Byzantine Generals
Problem if there are at most m traitors.
Discussions
• These solutions are not used in practice
– Why?
• What if the messages get lost a lot during
communication?
• Are there any other ways besides ‘majority’
and ‘same information’?
Naïve solution
• ith general sends v(i) to all other generals
• To deal with two requirements:
– All generals combine their information v(1), v(2), .., v(n)
in the same way
– Majority (v(1), v(2), …, v(n)), ignore minority traitors
• Naïve solution does not work:
– Traitors may send different values to different generals.
– Loyal generals might get conflicting values from traitors
• Requirement: Any two loyal generals must use the same
value of v(i) to decide on same plan of action.
Reduction of General Problem
• Insight: We can restrict ourselves to the problem of one
general sending its order to others.
• Byzantine Generals Problem (BGP):
– A commanding general (commander) must send an order to his n-1 lieutenants.
• Interactive Consistency Conditions:
– IC1: All loyal lieutenants obey the same order.
– IC2: If the commanding general is loyal, then every loyal
lieutenant obeys the order he sends.
• Note: If General is loyal, IC2 => IC1.
• Original problem: each general sends his value v(i) by
using the above solution, with other generals acting as
lieutenants.
3-General Impossibility Example
• 3 generals, 1 traitor among them.
• Two messages: Attack or Retreat.
• Shaded general in the figure is the traitor.
• L1 sees (A,R). Who is the traitor? C or L2?
• Fig 1: L1 has to attack to satisfy IC2.
• Fig 2: L1 attacks, L2 retreats. IC1 violated.
General Impossibility
• In general, no solution with fewer than 3m+1
generals can cope with m traitors.
• Proof by contradiction.
– Assume there is a solution for 3m generals with m
traitors.
– Reduce to the 3-General problem: a solution to the 3m-general
problem would yield a solution to the 3-General problem,
which we have just seen is impossible.
Solution I – Oral Messages
• If there are 3m+1 generals, solution allows up to m
traitors.
• Oral messages – the content of every message sent is entirely under
the control of the sender.
• Assumptions on oral messages:
– A1 – Each message that is sent is delivered correctly.
– A2 – The receiver of a message knows who sent it.
– A3 – The absence of a message can be detected.
• Assures:
– Traitors cannot interfere with communication as third party.
– Traitors cannot send fake messages
– Traitors cannot interfere by being silent.
• Default order to “retreat” for silent traitor.
Oral Messages (Cont)
• Algorithm OM(0)
– The commander sends his value to every lieutenant.
– Each lieutenant uses the value received from the commander, or
RETREAT if no value is received.
• Algorithm OM(m), m>0
– The commander sends his value to every lieutenant (call the value
received by Lieutenant i vi)
– Each Lieutenant i acts as commander for OM(m-1) and sends vi (or
RETREAT) to the other n-2 lieutenants
– For each i, and each j<>i, let vj be the value Lieutenant i receives from
Lieutenant j in step (2) using OM(m-1). Lieutenant i uses the value
majority(v1, …, vn-1).
– Why j<>i? “Trust myself more than what others said I said.”
Restate Algorithm
• OM(m), restated (a runnable sketch follows below):
– The commander sends out his command.
– Each lieutenant acts as commander in OM(m-1) and sends
out the command to the other lieutenants.
– Each lieutenant uses majority to compute a value from the commands
received from the other lieutenants in OM(m-1).
• Revisit Interactive Consistency goals:
– IC1: All loyal lieutenants obey the same command.
– IC2: If the commanding general is loyal, then every loyal
lieutenant obeys the command he sends.
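Below is a minimal, runnable Python sketch of OM(m) under the oral-message assumptions A1-A3. The traitor model (a traitor flips the order it relays to even-numbered receivers) and the numbering of generals are illustrative assumptions, not part of the algorithm; loyal generals simply follow the protocol.

from collections import Counter

ATTACK, RETREAT = "attack", "retreat"

def majority(values):
    # The value held by a strict majority of the inputs, else a default order.
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else RETREAT

def send(sender, receiver, value, traitors):
    # A1-A3 hold, so every message arrives and names its sender.
    # A traitor may lie arbitrarily; here it flips the order for even receivers.
    if sender in traitors and receiver % 2 == 0:
        return RETREAT if value == ATTACK else ATTACK
    return value

def om(m, commander, lieutenants, value, traitors):
    # Returns the value each lieutenant attributes to this commander.
    # Step 1: the commander sends his value to every lieutenant.
    received = {L: send(commander, L, value, traitors) for L in lieutenants}
    if m == 0:
        return received                  # OM(0): use the value as received
    # Step 2: each lieutenant j acts as commander in OM(m-1) for the others.
    relayed = {j: om(m - 1, j, [L for L in lieutenants if L != j],
                     received[j], traitors)
               for j in lieutenants}
    # Step 3: each lieutenant takes the majority of his own value and the relays.
    return {L: majority([received[L]] +
                        [relayed[j][L] for j in lieutenants if j != L])
            for L in lieutenants}

# n = 4 (commander 0, lieutenants 1-3), m = 1 traitor (lieutenant 3):
print(om(1, 0, [1, 2, 3], ATTACK, traitors={3}))  # loyal lieutenants decide "attack"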
Example (n=4, m=1)
• Algorithm OM(1): L3 is a traitor.
• L1 and L2 both receive v,v,x. (IC1 is met.)
• IC2 is met because L1 and L2 obey C
Example (n=4, m=1)
• Algorithm OM(1): Commander is a traitor.
• All lieutenants receive x,y,z. (IC1 is met).
• IC2 is irrelevant since commander is a traitor.
Expensive Communication
• OM(m) invokes n-1 instances of OM(m-1)
• OM(m-1) invokes n-2 instances of OM(m-2)
• OM(m-2) invokes n-3 instances of OM(m-3)
• …
• OM(m-k) will be called (n-1)(n-2)…(n-k) times
• O(n^m) messages in total – expensive! (A small counting sketch follows below.)
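A tiny counting sketch of the recurrence above (illustrative only; the exact constant depends on what is counted as one invocation):

def om_calls(n, m):
    # One OM(m) call triggers n-1 calls of OM(m-1), and so on down to OM(0).
    if m == 0:
        return 1
    return 1 + (n - 1) * om_calls(n - 1, m - 1)

print(om_calls(7, 2))   # 37 invocations already for n=7, m=2; growth is O(n^m)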
Distributed File Systems
Introduction
 File systems are responsible for the
organization, storage, retrieval, naming,
sharing and protection of files.
 Files contain both data and attributes.
 A typical attribute record structure is
illustrated in the figure below.
Introduction
Figure: File attribute record structure
 File length
 Creation timestamp
 Read timestamp
 Write timestamp
 Attribute timestamp
 Reference count
 Owner
 File type
 Access control list
Introduction
 Distributed file systems support the
sharing of information in the form of files
and hardware resources.
 With the advent of distributed object
systems (CORBA, Java) and the web, the
picture has become more complex.
Definition of a DFS
• DFS: multiple users, multiple sites, and
(possibly) distributed storage of files.
• Benefits
– File sharing
– Uniform view of system from different clients
– Centralized administration
• Goals of a distributed file system
– Network Transparency (access transparency)
– Availability
Goals
• Network (Access)Transparency
– Users should be able to access files over a
network as easily as if the files were stored
locally.
– Users should not have to know the physical
location of a file to access it.
• Transparency can be addressed through
naming and file mounting mechanisms
Components of Access Transparency
• Location Transparency: file name doesn’t
specify physical location
• Location Independence: files can be moved to a
new physical location with no need to change
references to them. (A name is independent
of its address.)
• Location independence → location
transparency, but the reverse is not
necessarily true.
Goals
• Availability: files should be easily and quickly
accessible.
• The number of users, system failures, or other
consequences of distribution shouldn’t
compromise the availability.
• Addressed mainly through replication.
Introduction
 Distributed file system requirements
 Related requirements in distributed file systems are:
 Transparency
 Concurrency
 Replication
 Heterogeneity
 Fault tolerance
 Consistency
 Security
 Efficiency
Architectures
• Client-Server
– Traditional; e.g. Sun Microsystem Network File
System (NFS)
– Cluster-Based Client-Server; e.g., Google File
System (GFS)
• Symmetric
– Fully decentralized; based on peer-to-peer
technology
– e.g., Ivy (uses a Chord DHT approach)
Client-Server Architecture
• One or more machines (file servers) manage
the file system.
• Files are stored on disks at the servers
• Requests for file operations are made from
clients to the servers.
• Client-server systems centralize storage and
management; P2P systems decentralize it.
Figure: Architecture of a distributed file system (client-server model): clients with local caches access several servers (each with its own cache and disks) over a communication network.
Sun’s Network File System
• Sun’s NFS for many years was the most widely
used distributed file system.
– NFSv3: version three, used for many years
– NFSv4: introduced in 2003
• Version 4 made significant changes
Overview
• NFS goals:
– Each file server presents a standard view of its local file
system
– transparent access to remote files
– compatibility with multiple operating systems and
platforms.
– easy crash recovery at server (at least v1-v3)
• Originally UNIX based; now available for most
operating systems.
• NFS communication protocols let processes running
in different environments share a file system.
Access Models
• Clients access the server transparently through
an interface similar to the local file system
interface
• Client-side caching may be used to save time
and network traffic
• Server defines and performs all file operations
Distributed File Systems Services
• Services provided by the distributed file system:
(1) Name Server: Provides mapping (name resolution) of the
names supplied by clients into objects (files and directories)
• Takes place when process attempts to access file or directory the first
time.
(2) Cache manager: Improves performance through file caching
• Caching at the client - When client references file at server:
– Copy of data brought from server to client machine
– Subsequent accesses done locally at the client
• Caching at the server:
– File saved in memory to reduce subsequent access time
* Issue: different cached copies can become inconsistent. Cache
managers (at server and clients) have to provide coordination.
Typical Data Access in a Client/File Server Architecture
Mechanisms used in distributed file systems
(1) Mounting
• The mount mechanism binds together several filename spaces
(collections of files and directories) into a single hierarchically structured
name space (example: UNIX and its derivatives)
• A name space ‘A’ can be mounted (bound) at an internal node (mount
point) of a name space ‘B’
• Implementation: the kernel maintains the mount table, mapping mount
points to storage devices (see the sketch below)
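As a hypothetical illustration, the sketch below keeps a mount table that maps mount points to (server, remote path) pairs and resolves a path by longest-prefix match; the server names and paths are invented for the example, not taken from any real system.

MOUNT_TABLE = {
    "/": ("localdisk", "/"),
    "/home": ("nfs-server-1", "/export/home"),
    "/shared/projects": ("nfs-server-2", "/export/projects"),
}

def resolve(path):
    # Longest matching mount-point prefix wins.
    mount_point = max((mp for mp in MOUNT_TABLE
                       if path == mp or path.startswith(mp.rstrip("/") + "/")),
                      key=len)
    server, remote_root = MOUNT_TABLE[mount_point]
    suffix = path[len(mount_point):].lstrip("/")
    remote = (remote_root.rstrip("/") + "/" + suffix) if suffix else remote_root
    return server, remote

print(resolve("/home/alice/notes.txt"))  # ('nfs-server-1', '/export/home/alice/notes.txt')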
Mechanisms used in distributed file systems (cont.)
(1) Mounting (cont.)
• Location of mount information
a. Mount information maintained at clients
– Each client mounts every file system
– Different clients may not see the same filename space
– If files move to another server, every client needs to update its mount
table
– Example: SUN NFS
b. Mount information maintained at servers
– Every client sees the same filename space
– If files move to another server, mount info at server only needs to change
– Example: Sprite File System
Mechanisms used in distributed file systems (cont.)
(2) Caching
– Improves file system performance by exploiting the locality of
reference
– When client references a remote file, the file is cached in the main
memory of the server (server cache) and at the client (client cache)
– When multiple clients modify shared (cached) data, cache consistency
becomes a problem
– It is very difficult to implement a solution that guarantees consistency
(3) Hints
– Treat the cached data as hints, i.e. cached data may not be completely
accurate
– Can be used by applications that can discover that the cached data is
invalid and can recover
• Example:
– After the name of a file is mapped to an address, that address is stored as a
hint in the cache
– If the address later fails, it is purged from the cache
– The name server is consulted to provide the actual location of the file and the
cache is updated
Mechanisms used in distributed file systems (cont.)
(4) Bulk data transfer
– Observations:
• Overhead introduced by protocols does not depend on the amount of data
transferred in one transaction
• Most files are accessed in their entirety
– Common practice: when client requests one block of data, multiple
consecutive blocks are transferred
(5) Encryption
– Encryption is needed to provide security in distributed systems
– Entities that need to communicate send request to authentication
server
– Authentication server provides key for conversation
Design Issues
1. Naming and name resolution
– Terminology
• Name: each object in a file system (file, directory) has a unique name
• Name resolution: mapping a name to an object or multiple objects (replication)
• Name space: collection of names with or without same resolution mechanism
– Approaches to naming files in a distributed system
(a) Concatenate name of host to names of files on that host
– Advantage: unique filenames, simple resolution
– Disadvantages:
» Conflicts with network transparency
» Moving file to another host requires changing its name and the applications using it
(b) Mount remote directories onto local directories
– Requires that host of remote directory is known
– After mounting, files are referenced in a location-transparent way (i.e., the file name does not reveal its
location)
(c) Have a single global directory
– All files belong to a single name space
– Limitation: having unique system-wide filenames requires a single computing facility or
cooperating facilities
Design Issues (cont.)
1. Naming and Name Resolution (cont.)
– Contexts
• Solve the problem of system-wide unique names, by partitioning a name
space into contexts (geographical, organizational, etc.)
• Name resolution is done within that context
• Interpretation may lead to another context
• File Name = Context + Name local to context
– Nameserver
• Process that maps file names to objects (files, directories)
• Implementation options
– Single name Server
» Simple implementation, reliability and performance issues
– Several Name Servers (on different hosts)
» Each server responsible for a domain
» Example:
Client requests access to file ‘A/B/C’
Local name server looks up a table (in kernel)
Local name server points to a remote server for ‘/B/C’ mapping
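The sketch below illustrates this multi-server resolution idea: each name server resolves names in its own domain and refers the remaining path to the server responsible for it. The class and the server/path names are assumptions made up for this example.

class NameServer:
    def __init__(self, name):
        self.name = name
        self.files = {}        # local name -> object (e.g. file location)
        self.referrals = {}    # sub-namespace -> remote NameServer

    def resolve(self, path):
        head, _, rest = path.strip("/").partition("/")
        if rest and head in self.referrals:
            # Delegate the remaining path to the server owning that domain.
            return self.referrals[head].resolve(rest)
        return self.files.get(path.strip("/"))

# Hypothetical setup: the local server refers lookups under 'A' to a remote server.
remote = NameServer("remote")
remote.files["B/C"] = ("server2", "/export/B/C")
local = NameServer("local")
local.referrals["A"] = remote
print(local.resolve("A/B/C"))   # ('server2', '/export/B/C')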
Design Issues (Cont.)
3. Writing policy
– Question: once a client writes into a file (and the local cache), when should
the modified cache be sent to the server?
– Options:
• Write-through: all writes at the clients, immediately transferred to the
servers
– Advantage: reliability
– Disadvantage: performance, it does not take advantage of the cache
• Delayed writing: delay transfer to servers
– Advantages:
» Many writes take place (including intermediate results) before a
transfer
» Some data may be deleted
– Disadvantage: reliability
• Delayed writing until file is closed at client
– For short open intervals, same as delayed writing
– For long intervals, reliability problems
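The write-through and delayed-writing options can be contrasted with a small client-cache sketch; the server object and its write() method are placeholders, not any real file-system API.

class WriteThroughCache:
    # Every write is immediately forwarded to the server: reliable but slow.
    def __init__(self, server):
        self.server, self.cache = server, {}

    def write(self, name, data):
        self.cache[name] = data
        self.server.write(name, data)

class DelayedWriteCache:
    # Writes stay in the cache until flush(), e.g. periodically or on close:
    # faster, and overwritten or deleted data never reaches the server, but
    # unflushed updates are lost if the client crashes.
    def __init__(self, server):
        self.server, self.cache, self.dirty = server, {}, set()

    def write(self, name, data):
        self.cache[name] = data
        self.dirty.add(name)

    def flush(self):
        for name in self.dirty:
            self.server.write(name, self.cache[name])
        self.dirty.clear()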
Design Issues (Cont.)
4. Availability
– Issue: what is the level of availability of files in a distributed file system?
– Resolution: use replication to increase availability, i.e. many copies
(replicas) of files are maintained at different sites/servers
– Replication issues:
• How to keep replicas consistent
• How to detect inconsistency among replicas
– Unit of replication
• File
• Group of files
a) Volume: group of all files of a user or group, or all files in a server
» Advantage: ease of implementation
» Disadvantage: wasteful, a user may need only a subset replicated
b) Primary pack vs. pack
» Primary pack: all files of a user
» Pack: subset of the primary pack; each pack can receive a different degree of
replication
Design Issues (Cont.)
5. Scalability
– Issue: can the design support a growing system?
– Example: the complexity and load of server-initiated cache invalidation grow with
the size of the system. Possible solutions:
• Do not provide cache invalidation service for read-only files
• Provide design to allow users to share cached data
– Design file servers for scalability: threads, SMPs, clusters
6. Semantics
– Expected semantics: a read will return data stored by the latest write
– Possible options:
• All read and writes go through the server
– Disadvantage: communication overhead
• Use of lock mechanism
– Disadvantage: file not always available
Case Studies:
The Sun Network File System (NFS)
• Developed by Sun Microsystems to provide a distributed file
system independent of the hardware and operating system
• Architecture
– Virtual File System (VFS):
File system interface that allows NFS to support different file systems
– Requests for operations on remote files are routed by the VFS to NFS
– Requests are sent to the VFS on the remote server using
• the remote procedure call (RPC) protocol, and
• the external data representation (XDR) standard
– The VFS on the remote server initiates the file system operation locally
– Vnode (Virtual Node):
• There is a network-wide vnode for every object in the file system (file or
directory), the equivalent of a UNIX inode
• The vnode has a mount table, allowing any node to be a mount node
Cluster-based or Clustered File
System
• A distributed file system that consists of
several servers that share the responsibilities
of the system, as opposed to a single server
(possibly replicated).
• The design decisions for cluster-based
systems are mostly related to how the data is
distributed across the cluster and how it is
managed.
Cluster-Based DFS
• Some cluster-based systems organize the clusters in an
application specific manner
• For file systems used primarily for parallel applications,
the data in a file might be striped across several servers
so it can be read in parallel.
• Or, it might make more sense to partition the file
system itself – some portion of the total set of files
is stored on each server.
• For systems that process huge numbers of requests;
e.g., large data centers, reliability and management
issues take precedence.
– e.g., Google File System
Google File System (GFS)
• GFS uses a cluster-based approach implemented on
ordinary commodity Linux boxes (not high-end
servers).
• Servers fail on a regular basis, just because there are
so many of them, so the system is designed to be
fault tolerant.
• There are a number of replicated clusters that map
to www.google.com
• DNS servers map requests to the clusters in a round-robin fashion, as a load-balancing mechanism;
locality is also considered.
Scalability in GFS
• Clients only contact the master to get metadata, so it
isn’t a bottleneck.
• Updates are performed by having a client update the
nearest server which pushes the updates to one of
the backups, which in turn sends it on to the next
and so on.
– Updates aren’t committed until all replicas are complete.
• Information for mapping file names to contact
addresses is efficiently organized & stored (mostly) in
the master’s memory.
– Access time is optimized due to infrequent disk accesses.
Distributed Resource Management:
Distributed Shared Memory
Distributed shared memory (DSM)
• What
- The distributed shared memory (DSM) implements the shared
memory model in distributed systems, which have no physical
shared memory
- The shared memory model provides a virtual address space shared
between all nodes
- To overcome the high cost of communication in distributed
systems, DSM systems move data to the location of access
• How:
- Data moves between main memory and secondary memory (within
a node) and between main memories of different nodes
- Each data object is owned by a node
- Initial owner is the node that created object
- Ownership can change as object moves from node to node
- When a process accesses data in the shared address space, the
mapping manager maps shared memory address to physical memory
(local or remote)
Distributed shared memory (Cont.)
Figure: Each node (NODE 1, NODE 2, NODE 3) has its own local memory and a mapping manager; together the mapping managers provide a single shared memory address space spanning all nodes.
Advantages of distributed shared memory (DSM)
• Data sharing is implicit, hiding data movement (as opposed to ‘Send’/‘Receive’
in message passing model)
• Passing data structures containing pointers is easier (in message passing model
data moves between different address spaces)
• Moving an entire object to the user takes advantage of locality of reference
• Less expensive to build than tightly coupled multiprocessor system: off-the-shelf
hardware, no expensive interface to shared physical memory
• Very large total physical memory for all nodes: Large programs can run more
efficiently
• No serial access to common bus for shared physical memory like in
multiprocessor systems
• Programs written for shared memory multiprocessors can be run on DSM
systems with minimum changes
Algorithms for implementing DSM
• Issues
- How to keep track of the location of remote data
- How to minimize communication overhead when accessing remote data
- How to concurrently access remote data at several nodes
1. The Central Server Algorithm
- Central server maintains all shared data
• Read request: returns data item
• Write request: updates data and returns acknowledgement message
- Implementation
• A timeout is used to resend a request if acknowledgment fails
• Associated sequence numbers can be used to detect duplicate write requests
• If an application’s request to access shared data fails repeatedly, a failure
condition is sent to the application
- Issues: performance and reliability
- Possible solutions
• Partition shared data between several servers
• Use a mapping function to distribute/locate data
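A minimal sketch of the central-server idea, including the per-client sequence numbers used to detect duplicate write requests caused by timeout-driven retransmission; the class and field names are illustrative.

class CentralServer:
    def __init__(self):
        self.data = {}           # the shared address space: address -> value
        self.last_seq = {}       # client id -> last applied sequence number

    def read(self, address):
        return self.data.get(address)

    def write(self, client_id, seq, address, value):
        # Duplicate (retransmitted) requests are acknowledged but not re-applied.
        if self.last_seq.get(client_id, -1) >= seq:
            return "ack"
        self.data[address] = value
        self.last_seq[client_id] = seq
        return "ack"

server = CentralServer()
server.write("node1", 0, 0x10, 42)
server.write("node1", 0, 0x10, 42)   # retry of the same request: ignored
print(server.read(0x10))             # 42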
Algorithms for implementing DSM (cont.)
2. The Migration Algorithm
- Operation
• Ship (migrate) entire data object (page, block) containing data item to
requesting location
• Allow only one node to access a shared data at a time
- Advantages
• Takes advantage of the locality of reference
• DSM can be integrated with VM at each node
- Make DSM page multiple of VM page size
- A locally held shared memory can be mapped into the VM page
address space
- If page not local, fault-handler migrates page and removes it from
address space at remote node
- To locate a remote data object:
• Use a location server
• Maintain hints at each node
• Broadcast query
- Issues
• Only one node can access a data object at a time
• Thrashing can occur: to minimize it, set a minimum time a data object resides
at a node
Algorithms for implementing DSM (cont.)
3. The Read-Replication Algorithm
– Replicates data objects to multiple nodes
– DSM keeps track of location of data objects
– Multiple nodes can have read access or one node write access (multiple
readers-one writer protocol)
– After a write, all copies are invalidated or updated
– DSM has to keep track of locations of all copies of data objects. Examples
of implementations:
• IVY: owner node of data object knows all nodes that have copies
• PLUS: distributed linked-list tracks all nodes that have copies
– Advantage
• The read-replication can lead to substantial performance improvements if the
ratio of reads to writes is large
Algorithms for implementing DSM (cont.)
4. The Full–Replication Algorithm
- Extension of read-replication algorithm: multiple nodes can read and
multiple nodes can write (multiple-readers, multiple-writers protocol)
- Issue: consistency of data for multiple writers
- Solution: use of gap-free sequencer
• All writes sent to sequencer
• Sequencer assigns sequence number and sends write request to
all sites that have copies
• Each node performs writes according to sequence numbers
• A gap in sequence numbers indicates a missing write request:
node asks for retransmission of missing write requests
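A small sketch of the gap-free sequencer scheme: the sequencer stamps each write with a sequence number and broadcasts it, and every replica applies writes strictly in order, holding back out-of-order ones (a real replica would also ask for retransmission when it detects a gap). All names are illustrative.

import itertools

class Sequencer:
    def __init__(self, replicas):
        self.counter = itertools.count()
        self.replicas = replicas

    def submit(self, address, value):
        seq = next(self.counter)
        for r in self.replicas:
            r.deliver(seq, address, value)   # broadcast the stamped write

class Replica:
    def __init__(self):
        self.memory = {}
        self.next_seq = 0
        self.pending = {}                    # out-of-order writes held back

    def deliver(self, seq, address, value):
        self.pending[seq] = (address, value)
        # Apply in sequence order; a missing number is a gap, so we stop here.
        while self.next_seq in self.pending:
            addr, val = self.pending.pop(self.next_seq)
            self.memory[addr] = val
            self.next_seq += 1

r1, r2 = Replica(), Replica()
sequencer = Sequencer([r1, r2])
sequencer.submit("x", 1)
sequencer.submit("x", 2)
print(r1.memory, r2.memory)   # both replicas applied the writes in the same order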
Memory coherence
• DSM are based on
- Replicated shared data objects
- Concurrent access of data objects at many nodes
• Coherent memory: when value returned by read operation is the
expected value (e.g., value of most recent write)
• Mechanism that control/synchronizes accesses is needed to
maintain memory coherence
• Sequential consistency: A system is sequentially consistent if
- The result of any execution of operations of all processors is the same as if
they were executed in sequential order, and
- The operations of each processor appear in this sequence in the order
specified by its program
• General consistency:
- All copies of a memory location (replicas) eventually contain the same data
when all writes issued by every processor have completed
Memory coherence (Cont.)
•
Processor consistency:
- Operations issued by a processor are performed in the order they are issued
- Operations issued by several processors may not be performed in the same
order (e.g. simultaneous reads of the same location by different processors may
yield different results)
• Weak consistency:
- Memory is consistent only (immediately) after a synchronization operation
- A regular data access can be performed only after all previous
synchronization accesses have completed
• Release consistency:
- Further relaxation of weak consistency
- Synchronization operations must be consistent with each other only within
a processor
- Synchronization operations: Acquire (i.e. lock), Release (i.e. unlock)
- Typical sequence: Acquire, regular accesses, Release
Coherence Protocols
• Issues
- How do we ensure that all replicas have the same information
- How do we ensure that nodes do not access stale data
1. Write-invalidate protocol
- A write to shared data invalidates all copies except one before write executes
- Invalidated copies are no longer accessible
- Advantage: good performance for
• Many updates between reads
• Per node locality of reference
- Disadvantage
• Invalidations sent to all nodes that have copies
• Inefficient if many nodes access same object
- Examples: most DSM systems: IVY, Clouds, Dash, Memnet, Mermaid, and Mirage
2. Write-update protocol
- A write to shared data causes all copies to be updated (the new value is sent, instead of an
invalidation)
- More difficult to implement
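The write-invalidate protocol of item 1 can be sketched as follows; the directory-style bookkeeping (the owner node tracking copy holders, as IVY does) and all names are illustrative.

class Directory:
    def __init__(self):
        self.copies = {}    # page -> set of nodes holding a valid copy
        self.values = {}    # page -> current value held by the owner

    def read(self, node, page):
        self.copies.setdefault(page, set()).add(node)
        return self.values.get(page)

    def write(self, node, page, value):
        # Invalidate every other copy before the write takes effect.
        for other in self.copies.get(page, set()) - {node}:
            self.invalidate(other, page)
        self.copies[page] = {node}
        self.values[page] = value

    def invalidate(self, node, page):
        # In a real DSM system this is a message to the remote node.
        print(f"invalidate page {page} at node {node}")

d = Directory()
d.write("N1", "p0", 7)
d.read("N2", "p0")        # N2 now holds a copy
d.write("N1", "p0", 8)    # prints: invalidate page p0 at node N2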
Design issues
• Granularity: size of shared memory unit
- If DSM page size is a multiple of the local virtual memory (VM)
management page size (supported by hardware), then DSM can be
integrated with VM, i.e. use the VM page handling
- Advantages vs. disadvantages of using a large page size:
- (+) Exploit locality of reference
- (+) Less overhead in page transport
- (-) More contention for page by many processes
- Advantages vs. disadvantages of using a small page size
- (+) Less contention
- (+) Less false sharing (a page containing two items that are not shared but are needed by two different
processes)
- (-) More page traffic
- Examples
• PLUS: page size 4 Kbytes, unit of memory access is 32-bit word
• Clouds, Munin: object is unit of shared data structure
Design issues (cont.)
• Page replacement
- Replacement algorithm (e.g. LRU) must take into account page access
modes: shared, private, read-only, writable
- Example: LRU with access modes
• Private (local) pages to be replaced before shared ones
• Private pages swapped to disk
• Shared pages sent over network to owner
• Read-only pages may be discarded (owners have a copy)
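A sketch of an LRU replacement policy that also respects the access modes above: private pages are preferred victims over shared ones; the exact position of read-only pages in the ordering is an assumption, and all names are illustrative.

from collections import OrderedDict

EVICTION_ORDER = {"private": 0, "read_only": 1, "shared": 2}

class DsmPageCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()          # page -> mode, kept in LRU order

    def access(self, page, mode):
        if page in self.pages:
            self.pages.move_to_end(page)    # most recently used
        else:
            if len(self.pages) >= self.capacity:
                self.evict()
            self.pages[page] = mode

    def evict(self):
        # Least recently used page from the most evictable mode class.
        order = list(self.pages)            # index 0 = least recently used
        victim = min(order, key=lambda p: (EVICTION_ORDER[self.pages[p]],
                                           order.index(p)))
        mode = self.pages.pop(victim)
        # Private pages would be swapped to disk, shared pages sent to their
        # owner node, read-only pages simply discarded (the owner has a copy).
        print(f"evicting {victim} ({mode})")

cache = DsmPageCache(capacity=2)
cache.access("p1", "shared")
cache.access("p2", "private")
cache.access("p3", "read_only")   # evicts p2: private pages go first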