POSIX-like OGSA/SOAP Services
Arun Jagatheesan
Architect & Team Lead, SDSC Matrix
San Diego Supercomputer Center
GFS, Global Grid Forum-9
October 7, 2003, Chicago
National Partnership for Advanced Computational Infrastructure
University of Florida
San Diego Supercomputer Center
Talk Outline
• Grid File System
• The small big picture
• Need for Schema
• Need for Operation definitions
• Data Transport
Grid File System
• Applications (Astronomy, Physics, Life Science, business apps, . . .)
• Grid File System Service (POSIX-like Interface); NFS/CIFS …
  • Hierarchical logical name space, ACL, metadata
• Virtual Directory Service (Management of virtualization)
• Data Services (coordinated with other groups)
• Data Sources
The small big picture
• Grid File System Service (POSIX-like Interface): OGSA/SOAP based interfaces for file operations
• NFS or other standard interface over the virtualized schema (NFS/CIFS …)
• Virtual Directory Service (Management of virtualization): XML Schema for Collections, Data Sets
• Data Services
• Data Sources
Grid Collection Schema
• XML Schema based description for
  • Collections or Virtual Directories
  • Data Sets
  • File System Meta-data (file size, date created, …)
  • Application Specific Meta-data
  • Access Permissions
  • …
• Logical Name space
  • Extensible
  • Scalable (more federations)
  • Dynamic Composition of the name space
  • Import and Export
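The talk does not define the schema itself. As a rough illustration only, the Python sketch below builds and walks a collection description with the standard `xml.etree.ElementTree` module; every element and attribute name (`collection`, `dataSet`, `fileSystemMetadata`, `accessPermissions`, `applicationMetadata`) is a hypothetical choice, not a proposed standard. Import and export are plain XML round trips, and dynamic composition of the name space is modeled by grafting an imported collection into an existing tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names: the talk calls for a standard
# XML Schema but does not define one.

def make_dataset(name, size, created, permissions, app_meta=None):
    """Describe one data set: file-system metadata, access permissions, application metadata."""
    ds = ET.Element("dataSet", name=name)
    fs = ET.SubElement(ds, "fileSystemMetadata")
    ET.SubElement(fs, "size").text = str(size)
    ET.SubElement(fs, "created").text = created
    acl = ET.SubElement(ds, "accessPermissions")
    for user, mode in permissions.items():
        ET.SubElement(acl, "grant", user=user, mode=mode)
    app = ET.SubElement(ds, "applicationMetadata")
    for key, value in (app_meta or {}).items():
        ET.SubElement(app, "attribute", name=key).text = value
    return ds

def make_collection(name, children=()):
    """A collection (virtual directory) holds data sets and sub-collections."""
    coll = ET.Element("collection", name=name)
    coll.extend(children)
    return coll

def export_collection(coll):
    """Export a collection description as XML text."""
    return ET.tostring(coll, encoding="unicode")

def import_collection(text):
    """Import an exported description, e.g. one produced by another federation."""
    return ET.fromstring(text)

def list_paths(coll, prefix=""):
    """Walk the logical name space and return the logical path of every data set."""
    here = f"{prefix}/{coll.get('name')}"
    paths = []
    for child in coll:
        if child.tag == "collection":
            paths.extend(list_paths(child, here))
        elif child.tag == "dataSet":
            paths.append(f"{here}/{child.get('name')}")
    return paths

# Usage: build a small name space, then graft in an imported collection.
neuro = make_collection("myActiveNeuroCollection", [
    make_dataset("image_0.jpg", 2048, "2003-10-07", {"alice": "read-write"}, {"modality": "MRI"}),
])
root = make_collection("home", [neuro])
records = make_collection("patientRecordsCollection",
                          [make_dataset("record1.xml", 512, "2003-10-01", {})])
root.append(import_collection(export_collection(records)))
print(list_paths(root))
```

Because the description is ordinary XML, any federation that agrees on the schema can exchange and merge name spaces this way.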
Operations on Logical Namespace
• OGSA/SOAP based interfaces
• Grid File System operations
  • Similar to traditional file system operations (POSIX)
  • Open (= get a Grid Service Reference, GSR?), Read, Seek’n’Read, Seek’n’Write, …
• Simple control (context) operations
  • Management of the logical namespace
  • SOAP based bindings
• Bulk (content) operations
  • Only SOAP bindings for data transport? No.
  • Alternative transport mechanisms are needed in the standard
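Here is a minimal in-memory sketch of how the POSIX-like operations might look to a client. The class and method names (`LogicalNamespaceService`, `open`, `seek_and_read`, `seek_and_write`, `mkdir`, `rename`, `listdir`) are hypothetical. The integer handle returned by `open` only stands in for the GSR question on the slide, and a plain Read is `seek_and_read` at offset 0. In a real service the control operations would be SOAP calls, while bulk content would move over a separate transport.

```python
import itertools

class LogicalNamespaceService:
    """In-memory sketch of POSIX-like operations on a logical namespace (hypothetical API)."""

    def __init__(self):
        self._dirs = {"/"}
        self._files = {}                      # logical path -> bytearray
        self._handles = {}                    # handle id -> logical path
        self._next_handle = itertools.count(1)

    @staticmethod
    def _parent(path):
        return path.rsplit("/", 1)[0] or "/"

    # Control (context) operations: management of the logical namespace.
    def mkdir(self, path):
        if self._parent(path) not in self._dirs:
            raise FileNotFoundError(self._parent(path))
        self._dirs.add(path)

    def rename(self, old, new):
        if old in self._handles.values():
            raise OSError(f"{old} is open")
        self._files[new] = self._files.pop(old)

    def listdir(self, path):
        prefix = path.rstrip("/") + "/"
        entries = (self._dirs | set(self._files)) - {prefix}
        return sorted({p[len(prefix):].split("/")[0] for p in entries if p.startswith(prefix)})

    # Content operations.
    def open(self, path):
        """Return an opaque handle; it merely stands in for a GSR."""
        if self._parent(path) not in self._dirs:
            raise FileNotFoundError(self._parent(path))
        self._files.setdefault(path, bytearray())   # create on first open
        handle = next(self._next_handle)
        self._handles[handle] = path
        return handle

    def seek_and_read(self, handle, offset, length):
        data = self._files[self._handles[handle]]
        return bytes(data[offset:offset + length])

    def seek_and_write(self, handle, offset, payload):
        data = self._files[self._handles[handle]]
        if offset > len(data):
            data.extend(b"\0" * (offset - len(data)))   # fill the gap with zero bytes
        data[offset:offset + len(payload)] = payload
        return len(payload)

    def close(self, handle):
        del self._handles[handle]

svc = LogicalNamespaceService()
svc.mkdir("/neuro")
h = svc.open("/neuro/scan.dat")
svc.seek_and_write(h, 0, b"hello grid")
print(svc.seek_and_read(h, 6, 4))   # b'grid'
svc.close(h)
svc.rename("/neuro/scan.dat", "/neuro/scan-2003.dat")
print(svc.listdir("/neuro"))        # ['scan-2003.dat']
```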
How do we form the logical namespace?
Logical Layers (bits, data, information, …)
• Collections or Virtual Directories: myActiveNeuroCollection, patientRecordsCollection
• Virtual Data Transparency: image.cgi, image.wsdl, image.sql
• Data Replica Transparency: image_0.jpg…image_100.jpg
• Data Identifier Transparency: E:\srbVault\image.jpg, /users/srbVault/image.jpg, Select … from srb.mdas.td where...
• Storage Location Transparency
• Storage Resource Transparency
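To make the layering concrete, here is a hedged Python sketch that resolves a name in a collection down through the layers. Virtual data is derived on demand; otherwise the first replica on a reachable resource is returned as a physical identifier. The resource names `win-vault` and `unix-vault`, the `derive` recipe, and the class names are assumptions for illustration. Only the two physical paths come from the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

@dataclass
class Replica:
    resource: str         # storage resource holding this copy (hypothetical names)
    physical_path: str    # physical identifier at that resource

@dataclass
class DataObject:
    replicas: List[Replica] = field(default_factory=list)
    # Virtual data: a recipe that derives the content on demand instead of storing it.
    derive: Optional[Callable[[], bytes]] = None

class LayeredNamespace:
    """Hypothetical resolver that walks the logical layers from the top down."""

    def __init__(self):
        self.collections: Dict[str, Dict[str, DataObject]] = {}

    def add(self, collection: str, name: str, obj: DataObject) -> None:
        self.collections.setdefault(collection, {})[name] = obj

    def resolve(self, collection: str, name: str, reachable: Set[str]) -> Tuple[str, object]:
        obj = self.collections[collection][name]           # collection layer
        if obj.derive is not None:                         # virtual data layer
            return ("derived", obj.derive())
        for replica in obj.replicas:                       # replica layer
            if replica.resource in reachable:              # location and resource layers
                return ("stored", f"{replica.resource}:{replica.physical_path}")
        raise LookupError(f"no reachable replica for {collection}/{name}")

ns = LayeredNamespace()
ns.add("myActiveNeuroCollection", "image.jpg", DataObject(replicas=[
    Replica("win-vault", r"E:\srbVault\image.jpg"),
    Replica("unix-vault", "/users/srbVault/image.jpg"),
]))
ns.add("myActiveNeuroCollection", "thumbnail", DataObject(derive=lambda: b"thumb"))
print(ns.resolve("myActiveNeuroCollection", "image.jpg", {"unix-vault"}))
# ('stored', 'unix-vault:/users/srbVault/image.jpg')
print(ns.resolve("myActiveNeuroCollection", "thumbnail", set()))
# ('derived', b'thumb')
```

The caller names only the collection and the logical name. Which layer finally answers is hidden from it.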
Storage Resource Transparency (1)
• Storage repository abstraction
  • Archival systems, file systems, databases, FTP sites, …
• Logical resources
  • Combine physical resources into a logical set of resources
  • Hide the type and protocol of the physical storage system
  • Load balancing based on access patterns
  • Unlike in a DBMS, the user is aware of the logical resources
  • Flexibility to accommodate changes in mass storage technology
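Below is a sketch of a logical resource in Python, with hypothetical resource names. The balancing rule picks the member with the fewest active requests. This is a simple stand-in for balancing based on access patterns, not a policy taken from the talk. Adding a member shows how new storage technology can join without changing client code.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PhysicalResource:
    name: str
    kind: str                 # "archive", "filesystem", "database", "ftp", ...
    active_requests: int = 0  # simple stand-in for an observed access pattern

@dataclass
class LogicalResource:
    """A named set of physical resources; users see and choose the logical name."""
    name: str
    members: List[PhysicalResource] = field(default_factory=list)

    def add(self, resource: PhysicalResource) -> None:
        """New storage technology joins the set without changing client code."""
        self.members.append(resource)

    def acquire(self) -> PhysicalResource:
        """Load balancing: pick the member with the fewest active requests."""
        if not self.members:
            raise RuntimeError(f"logical resource {self.name!r} is empty")
        target = min(self.members, key=lambda r: r.active_requests)
        target.active_requests += 1
        return target

    def release(self, resource: PhysicalResource) -> None:
        resource.active_requests -= 1

pool = LogicalResource("sdsc-logical", [
    PhysicalResource("hpss-archive", "archive", active_requests=3),
    PhysicalResource("unix-disk", "filesystem", active_requests=1),
])
first = pool.acquire()     # unix-disk: 1 active request versus 3
pool.add(PhysicalResource("new-disk", "filesystem"))
second = pool.acquire()    # new-disk: 0 active requests
print(first.name, second.name)
```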
Storage Resource Transparency (2)
• Standard operations at storage repositories
  • POSIX-like operations on all resources
• Storage-specific operations
  • Databases - bulk metadata access
  • Object ring buffers - object-based access
  • Hierarchical resource managers - status and staging requests
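One way to express "standard operations everywhere, storage-specific operations where available" is a base interface plus optional capability interfaces. The Python sketch below uses hypothetical classes (`StorageDriver`, `SupportsStaging`, `MemoryDisk`, `TapeArchive`) and shows only the hierarchical-resource-manager case (status and staging). Bulk metadata access for databases or object-based access for ring buffers would follow the same pattern.

```python
from abc import ABC, abstractmethod

class StorageDriver(ABC):
    """POSIX-like operations every storage repository supports (hypothetical interface)."""

    @abstractmethod
    def read(self, path: str, offset: int, length: int) -> bytes: ...

    @abstractmethod
    def write(self, path: str, offset: int, data: bytes) -> int: ...

class SupportsStaging(ABC):
    """Storage-specific extras for hierarchical resource managers."""

    @abstractmethod
    def status(self, path: str) -> str: ...

    @abstractmethod
    def stage(self, path: str) -> None: ...

class MemoryDisk(StorageDriver):
    def __init__(self):
        self.blobs = {}

    def read(self, path, offset, length):
        return self.blobs.get(path, b"")[offset:offset + length]

    def write(self, path, offset, data):
        blob = bytearray(self.blobs.get(path, b""))
        blob[offset:offset + len(data)] = data   # assumes offset <= current size
        self.blobs[path] = bytes(blob)
        return len(data)

class TapeArchive(MemoryDisk, SupportsStaging):
    """Tape-backed store: data must be staged to the disk cache before reading."""

    def __init__(self):
        super().__init__()
        self.disk_cache = set()

    def status(self, path):
        return "online" if path in self.disk_cache else "on tape"

    def stage(self, path):
        self.disk_cache.add(path)

    def read(self, path, offset, length):
        if path not in self.disk_cache:
            raise OSError(f"{path} must be staged first")
        return super().read(path, offset, length)

def fetch(driver: StorageDriver, path: str, length: int) -> bytes:
    """Client code relies on the standard operations and uses extras only when offered."""
    if isinstance(driver, SupportsStaging) and driver.status(path) != "online":
        driver.stage(path)
    return driver.read(path, 0, length)

disk, tape = MemoryDisk(), TapeArchive()
for driver in (disk, tape):
    driver.write("/vault/image.jpg", 0, b"pixels")
print(fetch(disk, "/vault/image.jpg", 6))   # b'pixels'
print(fetch(tape, "/vault/image.jpg", 3))   # b'pix'
```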
Storage Location Transparency
• Support replication of data for performance
• Transparent access to physical location and physical resource
• Virtualization of distributed data resources
• Data naming managed by the data grid
• Redundancy for preservation
• Resource redundancy – “m of n” resources in list
• Location redundancy – replicate at multiple locations
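Resource redundancy with "m of n" resources can be read as follows: write to all n resources in the list and treat the write as durable once at least m succeed. The sketch below assumes that reading. The site names and write callables are hypothetical, and the writes run sequentially only to keep the example short.

```python
from typing import Callable, Dict, List

def replicate_m_of_n(data: bytes, resources: Dict[str, Callable[[bytes], None]], m: int) -> List[str]:
    """Write `data` to each of the n resources; succeed only if at least m writes do.

    Each value in `resources` is a hypothetical write function that raises
    OSError on failure. Writes are issued one after another for clarity.
    """
    if not 1 <= m <= len(resources):
        raise ValueError("need 1 <= m <= n")
    succeeded = []
    for name, write in resources.items():
        try:
            write(data)
            succeeded.append(name)
        except OSError:
            pass  # one unreachable site must not stop the remaining writes
    if len(succeeded) < m:
        raise RuntimeError(f"only {len(succeeded)} of {len(resources)} writes succeeded, needed {m}")
    return succeeded

stored = {}

def site(name):
    def write(data):
        stored[name] = data
    return write

def offline(data):
    raise OSError("site unreachable")

sites = {"sdsc": site("sdsc"), "ufl": site("ufl"), "chicago": offline}
print(replicate_m_of_n(b"image", sites, m=2))   # ['sdsc', 'ufl']
```

With m = 3 the same call raises, because only two of the three sites accepted the data.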
Data Identifier Transparency
• Four types of data identifiers:
  1. Unique name
     • OID or handle
  2. Descriptive name
     • Descriptive attributes (metadata)
     • Semantic access to data
  3. Collective name
     • Logical name space of a collection of data sets
     • Location independent
  4. Physical name
     • Physical location of resource and physical path of data
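Here is a hypothetical catalog sketch in Python in which all four identifier types lead to the same catalog entry. The OID format, attribute names, and paths are invented for illustration. `move` shows why the collective name is location independent: relocating the data rewrites only its physical names.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CatalogEntry:
    oid: str                   # 1. unique name (OID or handle)
    metadata: Dict[str, str]   # 2. descriptive name: attributes for semantic access
    logical_path: str          # 3. collective name in the logical name space
    physical: List[str]        # 4. physical names: resource plus physical path

class DataCatalog:
    """Hypothetical catalog that resolves all four identifier types to one entry."""

    def __init__(self):
        self._by_oid: Dict[str, CatalogEntry] = {}
        self._by_path: Dict[str, CatalogEntry] = {}
        self._ids = itertools.count(1)

    def register(self, logical_path, metadata, physical):
        entry = CatalogEntry(f"oid-{next(self._ids)}", dict(metadata), logical_path, list(physical))
        self._by_oid[entry.oid] = entry
        self._by_path[logical_path] = entry
        return entry.oid

    def by_oid(self, oid):
        return self._by_oid[oid]

    def by_path(self, logical_path):
        return self._by_path[logical_path]

    def query(self, **attrs):
        """Semantic access: entries whose metadata match every given attribute."""
        return [e for e in self._by_oid.values()
                if all(e.metadata.get(k) == v for k, v in attrs.items())]

    def move(self, oid, new_physical):
        """Relocating the data changes only the physical names; the others stay valid."""
        self._by_oid[oid].physical = list(new_physical)

catalog = DataCatalog()
oid = catalog.register("/home/patientRecordsCollection/scan42",
                       {"modality": "MRI", "study": "neuro"},
                       ["unix-vault:/users/srbVault/scan42.img"])
catalog.move(oid, ["hpss-archive:/archive/scan42.img"])
print(catalog.by_path("/home/patientRecordsCollection/scan42").physical)
print([e.oid for e in catalog.query(modality="MRI")])
```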
Data Replica Transparency
• Replication
  • Improve access time
  • Improve reliability
  • Provide disaster backup and preservation
  • Physically or semantically equivalent replicas
• Replica consistency
  • Synchronization across replicas on writes
  • Updates might use “m of n” or any other policy
  • Distributed locking across multiple sites
• Versions of files
  • Time-annotated snapshots of data
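Below is a sketch of replica consistency and versioning under simplifying assumptions. A local `threading.Lock` stands in for distributed locking across sites. Every write is pushed to all reachable replicas, so unreachable ones become stale. Each write is kept as a time-annotated snapshot. Site names and timestamps are hypothetical.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class ReplicatedObject:
    """Hypothetical versioned data set with several replicas.

    The local lock only stands in for the distributed lock a data grid would need.
    """
    replicas: Dict[str, int]   # replica site -> latest version it holds
    history: List[Tuple[int, float, bytes]] = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def write(self, data: bytes, timestamp: float, reachable: Set[str]) -> int:
        """Record a time-annotated snapshot and synchronize every reachable replica."""
        with self.lock:
            version = len(self.history) + 1
            self.history.append((version, timestamp, data))
            for site in self.replicas:
                if site in reachable:
                    self.replicas[site] = version
            return version

    def stale(self) -> List[str]:
        """Replicas that missed the latest write and need resynchronization."""
        latest = len(self.history)
        return sorted(site for site, v in self.replicas.items() if v < latest)

    def snapshot_at(self, when: float) -> bytes:
        """Newest version written at or before `when` (assumes increasing timestamps)."""
        candidates = [data for _, t, data in self.history if t <= when]
        if not candidates:
            raise LookupError(f"no version at or before {when}")
        return candidates[-1]

obj = ReplicatedObject({"sdsc": 0, "ufl": 0, "chicago": 0})
obj.write(b"v1", timestamp=100.0, reachable={"sdsc", "ufl", "chicago"})
obj.write(b"v2", timestamp=200.0, reachable={"sdsc", "ufl"})
print(obj.stale())             # ['chicago']
print(obj.snapshot_at(150.0))  # b'v1'
```

An "m of n" update policy would add a check that at least m sites were reachable before accepting the write.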
Conclusion
• Lots of possibilities
• Need for a standard Grid File Schema and a global logical namespace for virtualization
• Need for a standard description of operations (a Grid File System Service)
• Call for:
  • Users, projects
  • Developers, vendors
• It’s a stone’s throw away – together, we will do it.