Naming in Distributed System

G.Ramesh Babu
Contents
• Naming Entities
– Names, Identifiers and Addresses
– Name Spaces
• Name Resolution
– Closure Mechanism
– Linking and Mounting
• Implementation of Name Space
• Implementation of Resolution
• Conclusion
Why is naming important?
• Names are used to
– Share resources
– Uniquely identify entities
– Refer to locations, and so on
• Name resolution allows a process to
access the named entity
Naming Entities
• Name → string of characters used to refer
to an entity
– An entity in a DS can be anything, e.g., hosts,
printers, disks, files, mailboxes, web pages,
etc.
• Access Point → special entity through which another entity is accessed
• Address → name of an access point
• The access points of an entity may change over time
Identifiers and True Identifiers
• We need
– a single name for an entity that is independent of the address of
that entity → location independent
• Identifier → name that uniquely identifies an
entity
• A true identifier has three properties
– It refers to at most one entity
– Each entity is referred to by at most one identifier
– It is never reused, so it always refers to the same entity
• These properties distinguish an identifier from an address: an address
may be reassigned to a different entity
Name Space
• Names in a DS are organized into name spaces
• A name space is represented as a labeled, directed
graph
• Leaf node → no outgoing edges
• Directory node → a number of labeled outgoing
edges
– Stores a directory table containing an entry for each
outgoing edge as a pair (edge label, node identifier)
• Root node → only outgoing edges, no incoming edges
• Path name → sequence of edge labels
– Absolute path → first node in the path name is the root
– Relative path → first node is any other node
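The directory-table idea above can be sketched in a few lines. This is a minimal illustration, not a real naming service: the class name NameSpace, the node identifiers n0..n3 and the labels are all made up for the example.

```python
class NameSpace:
    """Naming graph stored as directory tables: node id -> {edge label: node id}."""

    def __init__(self, root_id):
        self.root_id = root_id
        self.tables = {root_id: {}}

    def add_edge(self, parent_id, label, child_id):
        # A directory table entry is the pair (edge label, node identifier).
        self.tables.setdefault(parent_id, {})[label] = child_id
        self.tables.setdefault(child_id, {})  # a leaf keeps an empty table

    def resolve(self, path, start_id=None):
        """Follow a sequence of labels; start at the root unless told otherwise."""
        node_id = self.root_id if start_id is None else start_id
        for label in path:
            table = self.tables[node_id]
            if label not in table:
                raise KeyError(f"no edge labeled {label!r} at node {node_id}")
            node_id = table[label]
        return node_id


# Hypothetical graph: n0 (root) -home-> n1 -faraz-> n2 -notes.txt-> n3
ns = NameSpace("n0")
ns.add_edge("n0", "home", "n1")
ns.add_edge("n1", "faraz", "n2")
ns.add_edge("n2", "notes.txt", "n3")

print(ns.resolve(["home", "faraz", "notes.txt"]))   # absolute path, prints n3
print(ns.resolve(["notes.txt"], start_id="n2"))     # relative path, prints n3
```

An absolute path is resolved from the root; a relative path needs the caller to supply the starting node.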
General Naming Graph
Name Resolution
• Name resolution → the process of looking up a name
• Closure mechanism → knowing how and where to start
name resolution
• Mounting → transparent way to resolve names across
different name spaces
• Mounted file system → letting a directory node store
the identifier of a directory node from a different name
space (foreign name space)
– Mount point → directory node storing the node
identifier
– Mounting point → directory node in the foreign name
space
• Normally the mounting point is the root of the foreign name space
Mounted File System
• During resolution, the mounting point is looked up and
resolution proceeds by accessing its directory
table
• Mounting requires at least
– Name of an access protocol (for communication)
– Name of the server (resolved to an address)
– Name of the mounting point in the foreign name space
(resolved to a node identifier in the foreign NS)
• Each of these names needs to be resolved
• The three names can be represented as a URL, e.g.
nfs://oslab.khu.ac.kr/home/faraz
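As a small illustration, the three names can be pulled out of the URL with Python's standard urllib.parse. The helper name mount_names and the dictionary keys are assumptions made for this sketch; actually resolving each name (protocol, server address, foreign node) is out of scope here.

```python
from urllib.parse import urlsplit


def mount_names(url):
    """Split a mount URL into the three names that must each be resolved."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,        # name of the access protocol
        "server": parts.hostname,        # name of the server
        "mounting_point": parts.path,    # name in the foreign name space
    }


names = mount_names("nfs://oslab.khu.ac.kr/home/faraz")
print(names)
# {'protocol': 'nfs', 'server': 'oslab.khu.ac.kr', 'mounting_point': '/home/faraz'}
```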
Global Name Service (GNS)
• Another way to merge different name spaces
• Mechanism → add a new root node and make
the existing root nodes its children
• Problem
– Existing names need to be changed, e.g.,
home/faraz → people/home/faraz
• The expansion is generally hidden from the user
• Has a significant performance overhead when
merging hundreds or thousands of name spaces
Implementation of Name Space
• For large-scale DS, name spaces are
organized hierarchically
• Name spaces are partitioned into three
logical layers
– Global layer → formed by the highest-level
nodes
– Administrational layer → formed by directory
nodes managed within a single organization
– Managerial layer → formed by nodes that
typically change regularly
Implementation of Name Space
Item                             Global     Administrational  Managerial
Geographical scale of network    Worldwide  Organization      Department
Total number of nodes            Few        Many              Vast numbers
Responsiveness to lookups        Seconds    Milliseconds      Immediate
Update propagation               Lazy       Immediate         Immediate
Number of replicas               Many       None or few       None
Is client-side caching applied?  Yes        Yes               Sometimes
Implementation of Name Resolution
• Assumptions
– No replication of name servers
– No client-side caching
– Each client has access to a local name server
• Two possible implementations
– Iterative name resolution
• A server resolves the path name as far as it can and returns
the intermediate result to the client, which then contacts the next server itself
– Recursive name resolution
• A name server passes the intermediate result to the next name server
it finds, and the final answer travels back along the same chain
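The difference between the two schemes can be sketched with a toy chain of name servers. Everything here is hypothetical: the server names, the SERVERS table and the message counting are only meant to show who does the work, not how a real resolver is built.

```python
# Hypothetical chain of name servers for the path <nl, vu, cs, ftp>.
# Each server resolves one label and knows who handles the next one.
SERVERS = {
    "root": {"nl": "ns-nl"},
    "ns-nl": {"vu": "ns-vu"},
    "ns-vu": {"cs": "ns-cs"},
    "ns-cs": {"ftp": "addr:ftp-host"},  # last entry is an address, not a server
}
PATH = ["nl", "vu", "cs", "ftp"]


def iterative(path):
    """The client contacts every server itself.

    Returns (result, messages exchanged by the client).
    """
    server, client_msgs = "root", 0
    for label in path:
        client_msgs += 2  # one request and one reply per server
        server = SERVERS[server][label]
    return server, client_msgs


def recursive(path, server="root"):
    """Each server forwards the rest of the path to the next server.

    Returns (result, messages exchanged between servers); the client itself
    only sends one request to the root and receives one reply.
    """
    nxt = SERVERS[server][path[0]]
    if len(path) == 1:
        return nxt, 0
    result, server_msgs = recursive(path[1:], nxt)
    return result, server_msgs + 2  # request to the next server and its reply


print(iterative(PATH))   # ('addr:ftp-host', 8)
print(recursive(PATH))   # ('addr:ftp-host', 6)
```

In the iterative case all eight messages involve the client; in the recursive case the client exchanges only two, while the servers carry the remaining six and must keep state for each pending request.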
Iterative Name Resolution
• Advantage
– Less burden on each name server
• Disadvantage
– More communication cost
Recursive Name Resolution
• Advantages
– Caching of results is more effective
– Reduced communication cost
• Disadvantage
– Puts a higher performance demand on each name server
Domain Name System (DNS)
• An example implementation of name resolution
• Primarily used for looking up host addresses and
mail servers
• The DNS name space is hierarchically organized as
a rooted tree
• A label is a case-insensitive string with a maximum
length of 63 characters
• The maximum length of a complete path name is 255
characters
• The root is represented by a dot
– We generally omit this dot for readability
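The limits above can be checked with a short sketch. It is a simplification: it works on the textual form of a name only, and the helper names (split_labels, check_lengths, same_name) are invented for this example.

```python
MAX_LABEL = 63   # maximum length of one label
MAX_NAME = 255   # maximum length of a complete path name


def split_labels(name):
    """Split a DNS name into labels, ignoring the optional trailing root dot."""
    if name.endswith("."):
        name = name[:-1]
    return name.split(".") if name else []


def check_lengths(name):
    """True if every label and the whole name respect the length limits."""
    labels = split_labels(name)
    return len(name) <= MAX_NAME and all(1 <= len(l) <= MAX_LABEL for l in labels)


def same_name(a, b):
    """Compare two names label by label, ignoring case and the root dot."""
    return [l.lower() for l in split_labels(a)] == [l.lower() for l in split_labels(b)]


print(check_lengths("ftp.cs.vu.nl"))                # True
print(check_lengths("a" * 64 + ".nl"))              # False, label too long
print(same_name("FTP.cs.VU.nl", "ftp.cs.vu.nl."))   # True
```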
Locating Mobile Entities
Naming versus Locating Entities
• Entities are named for lookup and subsequent
access
– Human-friendly names
– Identifiers
– Addresses
• Virtually all naming systems maintain a mapping
from human-friendly names to addresses
• Partitioning of the name space
– Global level
– Administrational level
– Managerial level
Naming versus Locating Entities
[Figure: naming graph fragments with the labels cs.vu.nl, ftp.cs.vu.nl, ftp.abc.cs.vu.nl and ftp.khu.ac.kr]
Naming versus Locating Entities
• Possible solutions when an entity moves to a new machine
– Record the address of the new machine
• Lookup operations keep working
• Another database update is required each time the entity
moves again
– Record the name of the new machine
• Less efficient, because a lookup takes an additional step:
– Find the name of the new machine
– Look up the address associated with that name
• For highly mobile entities, this only gets worse
Naming versus Locating Entities
• Direct, single-level mapping between names and
addresses.
• Two-level mapping using identifiers.
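A two-level mapping can be sketched with two dictionaries: a naming service from names to identifiers and a location service from identifiers to addresses. The identifier "id-4711" and the addresses (taken from the IP documentation ranges) are placeholders, not real data.

```python
# Naming service: human-friendly name -> identifier (rarely changes).
name_to_id = {"ftp.cs.vu.nl": "id-4711"}

# Location service: identifier -> current address (updated on every move).
id_to_address = {"id-4711": "192.0.2.10"}


def lookup(name):
    """Resolve a name in two steps: name -> identifier -> address."""
    return id_to_address[name_to_id[name]]


def move(identifier, new_address):
    """Moving an entity only touches the location service."""
    id_to_address[identifier] = new_address


before = lookup("ftp.cs.vu.nl")
move("id-4711", "198.51.100.7")
after = lookup("ftp.cs.vu.nl")
print(before, after)   # 192.0.2.10 198.51.100.7
```

The name-to-identifier table is untouched by the move, which is the point of separating naming from locating.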
Simple Solutions: Broadcasting and
Multicasting
• A location service accepts an identifier as input and
returns the current address of the identified entity.
• Simple solutions exist that work in a local area network.
• The Address Resolution Protocol (ARP) uses broadcasting
to map the IP address of a machine to its data-link address.
• Multicasting can be used to locate entities in point-to-point networks (such as the Internet).
• Each multicast address can be associated with
multiple replicated entities.
Forwarding Pointers (1)
• The principle of forwarding pointers using (proxy, skeleton)
pairs.
Forwarding Pointers (2)
• Redirecting a forwarding pointer, by
storing a shortcut in a proxy.
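A rough sketch of the idea, leaving out the (proxy, skeleton) machinery: each location an object leaves keeps a pointer to the next one, and a client that follows the chain stores a shortcut to the final location. The classes Location and ClientReference are hypothetical names for this example.

```python
class Location:
    """A place where the object lives or once lived."""

    def __init__(self, name):
        self.name = name
        self.has_object = False
        self.forward = None  # forwarding pointer left behind after a move


def move(src, dst):
    """Move the object from src to dst, leaving a forwarding pointer at src."""
    src.has_object, dst.has_object = False, True
    src.forward = dst


class ClientReference:
    """Client-side reference that follows the chain and caches a shortcut."""

    def __init__(self, location):
        self.location = location

    def locate(self):
        loc, hops = self.location, 0
        while not loc.has_object:
            if loc.forward is None:
                raise LookupError("forwarding chain is broken")
            loc = loc.forward
            hops += 1
        self.location = loc  # shortcut: skip the chain next time
        return loc.name, hops


a, b, c = Location("A"), Location("B"), Location("C")
a.has_object = True
ref = ClientReference(a)
move(a, b)
move(b, c)
print(ref.locate())   # ('C', 2)  chain A -> B -> C followed
print(ref.locate())   # ('C', 0)  shortcut used
```

The LookupError branch hints at the main weakness of the scheme: a lost pointer anywhere in the chain makes the object unreachable.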
Home-Based Approaches
• Example: The principle of Mobile IP. (Perkins,
1997)
Hierarchical Approaches (1)
• Hierarchical organization of a location service
into domains, each having an associated
directory node.
Hierarchical Approaches (2)
• An example of storing information of an
entity having two addresses in different leaf
domains.
Hierarchical Approaches (3)
• Looking up a location in a hierarchically organized
location service.
Hierarchical Approaches (4)
a) An insert request is forwarded to the first node
that knows about entity E.
b) A chain of forwarding pointers to the leaf node is
created.
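The insert and lookup operations of a hierarchical location service can be sketched as follows. This is a simplified model with one address per entity and no moves or deletes; the class DirNode and the domain names are invented for the example.

```python
class DirNode:
    """Directory node of a domain; leaves store addresses, others store pointers."""

    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.records = {}  # entity -> child DirNode (pointer) or address string


def insert(leaf, entity, address):
    """Store the address at the leaf, then add pointers upward until a node
    that already knows the entity (or the root) is reached."""
    leaf.records[entity] = address
    child, node = leaf, leaf.parent
    while node is not None and entity not in node.records:
        node.records[entity] = child
        child, node = node, node.parent


def lookup(leaf, entity):
    """Go up until some node knows the entity, then follow pointers down."""
    node, visited = leaf, [leaf.name]
    while entity not in node.records:
        if node.parent is None:
            raise KeyError(entity)
        node = node.parent
        visited.append(node.name)
    while isinstance(node.records[entity], DirNode):
        node = node.records[entity]
        visited.append(node.name)
    return node.records[entity], visited


root = DirNode("root")
d1, d2 = DirNode("d1", root), DirNode("d2", root)
leaf1a, leaf1b, leaf2a = DirNode("leaf1a", d1), DirNode("leaf1b", d1), DirNode("leaf2a", d2)

insert(leaf1a, "E", "addr-1a")
print(lookup(leaf1b, "E"))   # ('addr-1a', ['leaf1b', 'd1', 'leaf1a'])
print(lookup(leaf2a, "E"))   # ('addr-1a', ['leaf2a', 'd2', 'root', 'd1', 'leaf1a'])
```

A lookup issued close to the entity stays inside the shared domain d1, while a lookup from a distant domain has to climb to the root first.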
Pointer Caches (1)
• Caching a reference to a directory node of the
lowest-level domain in which an entity will reside
most of the time.
Pointer Caches (2)
• A cache entry that needs to be invalidated
because it returns a nonlocal address, while
a local address is available.
Scalability Issues
• The scalability issues related to uniformly placing
subnodes of a partitioned root node across the network
covered by a location service.
The Problem of Unreferenced
Objects
• An example of a graph representing objects
containing references to each other.
Reference Counting (1)
• The problem of maintaining a proper
reference count in the presence of unreliable
communication.
Reference Counting (2)
a) Copying a reference to another
process and incrementing the
counter too late
b) A solution.
Advanced Reference Counting (1)
a) The initial assignment of weights in
weighted reference counting.
b) Weight assignment when creating a new
reference.
Advanced Reference Counting (2)
c) Weight assignment when copying a
reference.
Advanced Reference Counting (3)
• Creating an indirection when the partial
weight of a reference has reached 1.
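Weighted reference counting can be sketched as below. The invariant assumed here is that the total weight equals the weight kept by the skeleton plus the weights of all proxies, so the object is unreferenced when the two skeleton values are equal again. The class names and the initial weight of 128 are choices made for this example, and the indirection step for exhausted weights is only signalled, not implemented.

```python
class Skeleton:
    def __init__(self, total=128):
        self.total = total     # weight of skeleton plus all live proxies
        self.partial = total   # weight still held by the skeleton itself

    def create_reference(self):
        """Creating a reference hands half of the skeleton's partial weight to the proxy."""
        half = self.partial // 2
        if half == 0:
            raise RuntimeError("skeleton weight exhausted")
        self.partial -= half
        return Proxy(self, half)

    def decrement(self, weight):
        """Handle the decrement message sent when a proxy is deleted."""
        self.total -= weight

    def unreferenced(self):
        return self.total == self.partial


class Proxy:
    def __init__(self, skeleton, weight):
        self.skeleton, self.weight = skeleton, weight

    def copy(self):
        """Copying splits the proxy's weight; no message goes to the skeleton."""
        if self.weight < 2:
            raise RuntimeError("weight is 1: an indirection would be needed here")
        half = self.weight // 2
        self.weight -= half
        return Proxy(self.skeleton, half)

    def delete(self):
        self.skeleton.decrement(self.weight)
        self.weight = 0


s = Skeleton(128)
p1 = s.create_reference()       # skeleton keeps 64, p1 gets 64
p2 = p1.copy()                  # p1 keeps 32, p2 gets 32
p1.delete()
print(s.unreferenced())         # False, p2 still holds weight 32
p2.delete()
print(s.unreferenced())         # True
```

Only deletions send a message to the skeleton, which is why copying a reference cannot race with an increment as in plain reference counting.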
Advanced Reference Counting (4)
• Creating and copying a remote
reference in generation reference
counting.
Reference Listing (1)
• The skeleton keeps track of the proxies that refer to it
– Instead of counting them, it maintains an explicit list of
references
• Adding a proxy that is already in the list, or removing one
that is not in it, has no effect
• These are idempotent operations
– Repeatable without affecting the end result
• Increment/decrement operations are clearly not
idempotent
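The contrast between a counter and a list can be shown with a duplicated message. The two skeleton classes and the deliver helper are illustrative only; deliver simply applies the same operation more than once, as a retransmitted message would.

```python
class CountingSkeleton:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1


class ListingSkeleton:
    def __init__(self):
        self.proxies = set()

    def add(self, proxy_id):
        self.proxies.add(proxy_id)        # adding twice changes nothing

    def remove(self, proxy_id):
        self.proxies.discard(proxy_id)    # removing twice changes nothing


def deliver(operation, *args, duplicates=1):
    """Apply an operation once plus `duplicates` extra times."""
    for _ in range(1 + duplicates):
        operation(*args)


counting, listing = CountingSkeleton(), ListingSkeleton()
deliver(counting.increment)      # one new proxy, message delivered twice
deliver(listing.add, "p1")       # same situation with a reference list
print(counting.count)            # 2, although only one proxy exists
print(listing.proxies)           # {'p1'}
deliver(listing.remove, "p1")
print(listing.proxies)           # set()
```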
Reference Listing (2)
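• Advantages
– Does not require reliable communication
– Duplicate messages need not be detected
– Only insertions and deletions need to be acknowledged
– Easier to keep the system consistent in case of process
failures
• Drawback
– Scales badly when a skeleton has many proxies
• Solution
– Leasing: a reference is kept only for a limited time and is
dropped unless the proxy renews it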
Identifying Unreachable Entities
• Trace-based garbage collection
– Scalability problems
• Naïve tracing
– Mark-and-sweep collectors
• White, grey and black marks
• Drawbacks
– The reachability graph needs to remain the same during
both phases
– No process can run while the GC is running
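A single-process sketch of mark-and-sweep with the three colors: white for not yet visited, grey for reached but not fully scanned, black for fully scanned. The graph representation (a dictionary from object to referenced objects) and the object names are assumptions for the example, and the graph is assumed not to change while the collector runs.

```python
from collections import deque


def mark_and_sweep(graph, roots):
    """graph maps each object to the objects it references.

    Returns the set of objects that are unreachable from the roots.
    """
    color = {obj: "white" for obj in graph}
    grey = deque()
    for r in roots:
        color[r] = "grey"
        grey.append(r)

    # Mark phase: scan grey objects until none are left.
    while grey:
        obj = grey.popleft()
        for ref in graph[obj]:
            if color[ref] == "white":
                color[ref] = "grey"
                grey.append(ref)
        color[obj] = "black"

    # Sweep phase: everything still white is garbage.
    return {obj for obj, c in color.items() if c == "white"}


graph = {"A": ["B"], "B": ["C"], "C": [], "D": ["E"], "E": ["D"]}
print(sorted(mark_and_sweep(graph, roots=["A"])))   # ['D', 'E']
```

D and E reference each other, so reference counting would never reclaim them, while tracing finds them because neither is reachable from a root.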
Tracing in Groups (1)
• Initial marking of skeletons.
Tracing in Groups (2)
• After local propagation in each process.
Tracing in Groups (3)
• Final marking.
Conclusion
• Naming, the organization of names, and name
resolution are key issues in any distributed
system
• Locating entities is an open research issue.
There are a few methods, such as forwarding pointers,
hierarchical approaches, home-based
approaches and pointer caches, but each has its
own shortcomings
• Reference counting, advanced reference
counting and reference listing are a few methods
that can be used to detect unreferenced objects
All is well that ends well!
Thank you all
Questions / Comments?