GGF Data Grid Interoperability Demonstration

iRODS Metadata
Grid File System
Reagan Moore
San Diego Supercomputer Center
www.ogf.org
OGF-23
integrated Rule-Oriented Data System
[Architecture diagram. Component labels: Client Interface; Admin Interface; Rule Invoker; Rule Modifier Module; Config Modifier Module; Metadata Modifier Module; Consistency Check Module (three instances); Rule Base; Current State; Service Manager; Confs; Resources; Resource-based Services; Metadata-based Services; Micro Service Modules (two instances); Metadata Persistent Repository]
iRODS Data Grid - System Metadata
[Diagram labels: DB; Metadata Catalog; Rule Base; iRODS Server with Rule Engine (shown twice)]
• User asks for data
• Data request goes to iRODS Server
• Server looks up information in catalog
• Catalog tells which iRODS server has data
• 1st server asks 2nd for data
• The 2nd iRODS server applies rules
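The request flow in the bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Catalog` and `Server` classes, the `apply_rules` stand-in for the rule engine, and the logical path are all hypothetical names, not iRODS APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    # Hypothetical layout: logical file name -> name of the server holding the data.
    locations: dict = field(default_factory=dict)

    def locate(self, logical_name):
        return self.locations[logical_name]


@dataclass
class Server:
    name: str
    catalog: Catalog
    peers: dict = field(default_factory=dict)    # server name -> Server
    storage: dict = field(default_factory=dict)  # logical name -> bytes
    log: list = field(default_factory=list)

    def apply_rules(self, logical_name):
        # Stand-in for the rule engine: only records that rules were applied.
        self.log.append(f"rules applied for {logical_name}")

    def serve_local(self, logical_name):
        # The server that holds the data applies its rules before returning it.
        self.apply_rules(logical_name)
        return self.storage[logical_name]

    def get(self, logical_name):
        # Look up the holding server in the catalog, then serve or forward.
        owner = self.catalog.locate(logical_name)
        if owner == self.name:
            return self.serve_local(logical_name)
        return self.peers[owner].serve_local(logical_name)


path = "/zone/home/alice/a.txt"  # hypothetical logical name
catalog = Catalog({path: "server2"})
server1 = Server("server1", catalog)
server2 = Server("server2", catalog, storage={path: b"hello"})
server1.peers["server2"] = server2

data = server1.get(path)  # the user contacts server1; server2 holds the file
```

In this sketch the rules run on the server that stores the data, matching the last bullet; the first server only consults the catalog and forwards the request.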
Logical Name Spaces
1. Logical file names
 POSIX attributes - owner, location, size, creation time, access controls
 Grid attributes - aggregation in container, checksum, validation time
 Collection attributes - description, provenance, authenticity
2. Logical user names
 Home data grid, project, password, group membership, address, e-mail
3. Logical resource names
 Physical resource address, group membership, access controls
4. Logical rule names
 Physical rule instance, version number, condition for execution
5. Logical micro-service names
 Physical micro-service instance, version number, access controls
6. Logical state information
 Physical attribute instance, version number
Logical File Names
Data attributes
DATA_ID                 Unique identifier for a registered file
DATA_COLL_ID            Unique identifier for the collection
DATA_NAME               Logical name of the file in the data grid
DATA_REPL_NUM           Replication number of the file
DATA_VERSION            Version of the file
DATA_TYPE_NAME          Type of the file (.doc, .pdf, …)
DATA_SIZE               Size of the file in bytes
DATA_RESC_GROUP_NAME    Group name of storage resources
DATA_RESC_NAME          Storage resource name for file storage
DATA_PATH               Physical path name of the file
DATA_OWNER_NAME         Owner of the file (USER_NAME)
DATA_OWNER_ZONE         Home zone of the owner (USER_ZONE)
DATA_REPL_STATUS        Status condition of a file (current, stale)
DATA_CHECKSUM           MD5 checksum of the file
DATA_EXPIRY             Retention date of the file
DATA_CREATE_TIME        Date of file registration
DATA_MODIFY_TIME        Last time the file was modified
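To make the role of these attributes concrete, here is a minimal sketch that resolves a logical file name to its current physical replicas. It uses an in-memory SQLite table as a stand-in for the metadata catalog. The table name `data_objects`, the choice of columns, and the sample rows are assumptions for illustration, not the actual catalog schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding a subset of the data attributes listed above.
conn.execute(
    """CREATE TABLE data_objects (
           DATA_ID INTEGER,
           DATA_NAME TEXT,
           DATA_REPL_NUM INTEGER,
           DATA_SIZE INTEGER,
           DATA_RESC_NAME TEXT,
           DATA_PATH TEXT,
           DATA_REPL_STATUS TEXT)"""
)
# Invented sample data: two replicas of one logical file, one of them stale.
rows = [
    (1001, "report.pdf", 0, 52340, "disk1", "/vault1/home/alice/report.pdf", "current"),
    (1001, "report.pdf", 1, 52340, "tape1", "/vault2/home/alice/report.pdf", "stale"),
]
conn.executemany("INSERT INTO data_objects VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def current_replicas(conn, data_name):
    """Return (resource name, physical path) for each current replica."""
    cursor = conn.execute(
        "SELECT DATA_RESC_NAME, DATA_PATH FROM data_objects "
        "WHERE DATA_NAME = ? AND DATA_REPL_STATUS = 'current' "
        "ORDER BY DATA_REPL_NUM",
        (data_name,),
    )
    return cursor.fetchall()


replicas = current_replicas(conn, "report.pdf")
```

The point of the sketch is the separation the slide describes: the client only knows the logical name, while the resource and physical path come from catalog attributes.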
Logical File Name - Collections
Collection attributes
COLL_ID             Unique collection identifier
COLL_NAME           Collection name
COLL_PARENT_NAME    Name of the parent collection
COLL_OWNER_NAME     Name of the collection owner (USER_NAME)
COLL_OWNER_ZONE     Name of the home zone of the collection owner (USER_ZONE)
COLL_COMMENTS       Owner defined comments
COLL_CREATE_TIME    Date collection was created
COLL_MODIFY_TIME    Time collection was last modified
File Access Controls
The triplet {DATA_ID, USER_ID, DATA_ACCESS_TYPE} is used to
define data access controls.
Access control attributes
DATA_ACCESS_TYPE        Unique identifier for the type of access permission
DATA_ACCESS_NAME        Name of the access permission (read, write, own, null)
DATA_TOKEN_NAMESPACE    Namespace used to specify internal parameters for managing data
DATA_ACCESS_USER_ID     Unique ID of the user
DATA_ACCESS_DATA_ID     Unique ID of the file
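A minimal sketch of how such triplets could drive an access check follows. The slides name the permissions (read, write, own, null) but do not give an ordering; treating them as increasing levels is an assumption made here, as are the numeric codes and the sample IDs.

```python
# Assumed ordering of the named permissions; the numeric codes are hypothetical.
ACCESS_LEVEL = {"null": 0, "read": 1, "write": 2, "own": 3}

# Each entry mirrors the triplet {DATA_ID, USER_ID, DATA_ACCESS_TYPE},
# with the access type written by name for readability.
triplets = [
    (101, 7, "own"),
    (101, 8, "read"),
]


def granted_level(triplets, data_id, user_id):
    """Highest level granted to this user on this file, or the null level."""
    levels = [
        ACCESS_LEVEL[access]
        for d, u, access in triplets
        if d == data_id and u == user_id
    ]
    return max(levels, default=ACCESS_LEVEL["null"])


def is_allowed(triplets, data_id, user_id, requested):
    """True if the granted level is at least the requested one."""
    return granted_level(triplets, data_id, user_id) >= ACCESS_LEVEL[requested]


owner_can_write = is_allowed(triplets, 101, 7, "write")
reader_can_write = is_allowed(triplets, 101, 8, "write")
stranger_can_read = is_allowed(triplets, 101, 9, "read")
```

A user with no triplet for a file falls back to the null level, so every request for read, write, or own is refused.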
Logical File Name
Descriptive metadata for files
META_DATA_ATTR_NAME     Name of descriptive metadata attribute
META_DATA_ATTR_VALUE    Value associated with the descriptive metadata attribute
META_DATA_ATTR_UNITS    Units associated with the value
Collection metadata
META_COLL_ATTR_NAME     Name of descriptive metadata attribute for a collection
META_COLL_ATTR_VALUE    Value associated with the descriptive metadata attribute for the collection
META_COLL_ATTR_UNITS    Units associated with the value
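Descriptive metadata is stored as name, value, and units triples attached to a file or collection. The sketch below shows one way to hold and search such triples in memory. The `AVU` type, the helper functions, and the sample attributes are hypothetical and only illustrate the shape of the data.

```python
from collections import defaultdict
from typing import NamedTuple


class AVU(NamedTuple):
    """One descriptive metadata entry: attribute name, value, and units."""
    name: str
    value: str
    units: str = ""


# Logical file (or collection) name -> list of attached AVU triples.
file_metadata = defaultdict(list)


def add_avu(store, target, name, value, units=""):
    store[target].append(AVU(name, value, units))


def find_by_attribute(store, name, value):
    """Return the sorted targets that carry the given attribute name and value."""
    return sorted(
        target
        for target, avus in store.items()
        if any(a.name == name and a.value == value for a in avus)
    )


# Invented sample entries.
add_avu(file_metadata, "/zone/home/alice/scan1.tif", "resolution", "600", "dpi")
add_avu(file_metadata, "/zone/home/alice/scan2.tif", "resolution", "300", "dpi")
add_avu(file_metadata, "/zone/home/alice/scan1.tif", "instrument", "scanner-A")

matches = find_by_attribute(file_metadata, "resolution", "600")
```

The same structure would serve for collection, user, and resource metadata, since each uses the same three fields.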
Logical User Name
User attributes
ZONE_ID             Unique identifier for the data grid, called a zone
ZONE_NAME           Name of the data grid
USER_ID             Unique identifier for a user within a zone
USER_NAME           Name of the user
USER_TYPE           Role of the user (rods_user, rods_administrator)
USER_ZONE           Home zone of the user
USER_DN             User distinguished name for security certificate
USER_INFO           Address of the user
USER_COMMENT        Data grid administrator comment on user
USER_CREATE_TIME    Date user identity was created
USER_MODIFY_TIME    Last time the user identity was modified
Logical User Names
User groups
USER_GROUP_ID           Unique identifier for a user group
USER_GROUP_NAME         List of users (USER_ID) in the user group
User attributes
META_USER_ATTR_NAME     Name of descriptive metadata attribute for a user
META_USER_ATTR_VALUE    Value associated with the descriptive metadata attribute for the user
META_USER_ATTR_UNITS    Units associated with the value
Logical Resource Attributes
Resource attributes
RESC_ID             Unique identifier for a storage resource within a zone
RESC_NAME           Name of the storage resource
RESC_ZONE_NAME      Name of the zone in which the resource is located
RESC_TYPE_NAME      Type of storage resource (unix-file-system)
RESC_CLASS_NAME     Class of storage resource (archival, permanent disk, cache, temporary disk)
RESC_LOC            Location of storage resource (IP internet address)
RESC_VAULT_PATH     Path name under which files are stored on resource
RESC_COMMENT        Data grid administrator comment on storage resource
RESC_CREATE_TIME    Date storage resource vault was created
RESC_MODIFY_TIME    Last time the storage resource vault was modified
Logical Resource Names
Resource attributes
META_RESC_ATTR_NAME     Name of descriptive metadata attribute for a resource
META_RESC_ATTR_VALUE    Value associated with the descriptive metadata attribute for the storage resource
META_RESC_ATTR_UNITS    Units associated with the value
Resource groups
RESC_GROUP_RESC_ID      Unique identifier for a storage resource group
RESC_GROUP_NAME         List of storage resources (RESC_ID) in the resource group
Logical Rule Names
Rule attributes for delayed execution
RULE_EXEC_ID               Unique identifier for a rule
RULE_EXEC_NAME             Name of the rule
RULE_EXEC_REI_FILE_PATH    Session identifier for a rule within the execution queue
RULE_EXEC_USER_NAME        Name of the user executing the rule
RULE_EXEC_ADDRESS          Location of the host where the rule will be executed
RULE_EXEC_TIME             Date when the rule will be executed (Unix seconds since Jan 1 1970)
RULE_EXEC_FREQUENCY        Period in seconds after which a periodic rule is executed again
RULE_EXEC_LAST_EXE_TIME    Time when the rule was last executed
RULE_EXEC_STATUS           Status of the rule execution (failed, retry, done)
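The timing attributes suggest how a delayed-execution queue might be processed. The sketch below is one possible reading, not the iRODS scheduler: the `QueuedRule` class, the `run_due` function, the convention that a frequency of 0 means "run once", and the rule names are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class QueuedRule:
    rule_exec_id: int
    rule_exec_name: str
    rule_exec_time: int            # Unix seconds when the rule should run
    rule_exec_frequency: int = 0   # seconds between runs; 0 = run once (assumption)
    rule_exec_last_exe_time: int = 0
    rule_exec_status: str = ""


def run_due(queue, now, execute):
    """Run every rule whose execution time has arrived and update its state."""
    for rule in queue:
        if rule.rule_exec_status in ("done", "failed"):
            continue
        if rule.rule_exec_time > now:
            continue
        succeeded = execute(rule)
        rule.rule_exec_last_exe_time = now
        if not succeeded:
            rule.rule_exec_status = "failed"
        elif rule.rule_exec_frequency > 0:
            # Periodic rule: schedule the next run one period from now.
            rule.rule_exec_time = now + rule.rule_exec_frequency
        else:
            rule.rule_exec_status = "done"


executed = []


def execute(rule):
    # Stand-in for rule execution: record the name and report success.
    executed.append(rule.rule_exec_name)
    return True


queue = [
    QueuedRule(1, "purge_cache", rule_exec_time=100),
    QueuedRule(2, "verify_checksums", rule_exec_time=100, rule_exec_frequency=60),
]
run_due(queue, now=120, execute=execute)  # both rules are due
run_due(queue, now=150, execute=execute)  # nothing is due yet
```

After the first pass the one-shot rule is marked done, while the periodic rule is rescheduled for time 180 and stays in the queue.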
Tokens - Internal State Variables
Token definition
TOKEN_NAMESPACE    Namespace used to identify token attributes
TOKEN_ID           Unique identifier of the token
TOKEN_NAME         System parameter name
TOKEN_VALUE        Value of the token
TOKEN_VALUE2       Second associated value for token
TOKEN_VALUE3       Third associated value for a token
TOKEN_COMMENT      Comment defining purpose of token
Structured Information
• Distributed information resources
  - Information required to interact with a remote resource resides within the remote resource
  - Mounted Collection interface accesses the information
• Applications now manipulate structured information
  - POSIX I/O to manipulate bit streams is no longer sufficient
  - Generate structured information through application of microservices at the remote storage location
  - Transmit structured information from the remote storage location to the client
  - Maintain information structures in memory to link multiple microservices into a server-side workflow
Mounted Collection Interface
• Mounted collection
  - Set of standard operations for acquiring information from a remote resource
  - Containers: tar files, HDF5, XFDU, XAM
  - Remotely mounted file system directories
• Structured information driver
  - Mapping of standard operations to the protocol used by the remote information resource
• Multiple standards for describing structured information
  - Data grids
    - Lstore, SRB, iRODS, …
  - Digital Library
    - METS, PREMIS metadata standards for descriptive metadata
    - Fedora, DSpace information resources
  - Preservation systems
    - OAIS representation information for a record
    - LOCKSS
  - Commercial
    - SNIA / XAM object-based storage interface
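The idea of a structured information driver, a mapping from a fixed set of operations onto whatever protocol the remote resource speaks, can be sketched with a tar container as the resource. The interface name `StructuredInfoDriver`, its two operations, and the helper `make_tar` are hypothetical; the slides do not list the actual standard operations.

```python
import io
import tarfile
from abc import ABC, abstractmethod


class StructuredInfoDriver(ABC):
    """Hypothetical set of standard operations on a mounted collection."""

    @abstractmethod
    def list_entries(self):
        ...

    @abstractmethod
    def read_entry(self, name):
        ...


class TarDriver(StructuredInfoDriver):
    """Maps the standard operations onto a tar container held as bytes."""

    def __init__(self, raw_bytes):
        self._raw = raw_bytes

    def _open(self):
        return tarfile.open(fileobj=io.BytesIO(self._raw), mode="r")

    def list_entries(self):
        with self._open() as tar:
            return [m.name for m in tar.getmembers() if m.isfile()]

    def read_entry(self, name):
        with self._open() as tar:
            return tar.extractfile(name).read()


def make_tar(files):
    """Build an in-memory tar archive from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


driver = TarDriver(make_tar({"a.txt": b"alpha", "b.txt": b"beta"}))
entries = driver.list_entries()
payload = driver.read_entry("b.txt")
```

A second driver for, say, a remotely mounted directory would implement the same two methods, so client code written against the interface would not need to change.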
Audit Trails
• Log of operations performed upon a file
  - name of rule
  - name of person who applied the operation
  - date
  - file that was manipulated
  - additional information depending on operation type
• Trying two approaches
  - Store information with the file as descriptive metadata
  - Store in separate log
• Log of operations performed at a storage system
  - Recent request for tracking storage level usage and performance
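The two approaches to storing a file's audit trail can be compared with a small sketch. The `AuditRecord` fields follow the list above; the JSON serialization, the attribute name "audit", the rule name, and the sample values are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AuditRecord:
    rule_name: str
    user_name: str
    date: str
    file_name: str
    details: dict = field(default_factory=dict)  # depends on operation type


def serialize(record):
    return json.dumps(asdict(record), sort_keys=True)


def record_as_metadata(metadata, record):
    """Approach 1: attach the record to the file as a descriptive metadata triple."""
    triple = ("audit", serialize(record), "")
    metadata.setdefault(record.file_name, []).append(triple)


def record_in_log(log, record):
    """Approach 2: append the record to a separate log."""
    log.append(serialize(record))


metadata = {}
log = []
record = AuditRecord(
    rule_name="replicate_file",  # hypothetical rule name
    user_name="alice",
    date="2008-06-02",
    file_name="/zone/home/alice/a.txt",
    details={"target_resource": "tape1"},
)
record_as_metadata(metadata, record)
record_in_log(log, record)
```

Storing the record as metadata keeps the trail with the file and searchable through the catalog, while a separate log keeps the catalog smaller; the sketch only shows that the same record fits either form.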
For More Information
Reagan W. Moore
San Diego Supercomputer Center
[email protected]
http://www.sdsc.edu/srb/
http://irods.sdsc.edu/