Presentation - Coalition for Networked Information

National Science Foundation Cooperative Agreement: OCI-0940841
DFC Vision
• Build collaboration environment
– Sharing of data, information, and knowledge
• Form national data cyberinfrastructure
– Federation of existing data management
systems
• Support reproducible data-driven research (NEW)
– Encapsulate knowledge within shared
workflows
• Enable student participation in
research
– Policy-controlled analysis of “live” data
[Figure labels: Compute Resources (HPC centers, institutional clusters); DFC Collaboration Environment (Data Grid); Community Resources (Repository, Catalog)]
Data Driven Science and Engineering
Collaboration Environments
– Oceanography – Ocean Observatories
Initiative
• Archiving climatic data records from
real-time sensor data streams
– Engineering – CIBER-U
• Engineering Digital Library: Curating
civil engineering data, materials data,
archaeology data, student training
materials
– Hydrology – EarthCube
• Automating hydrology research
workflows (data retrieval,
transformation, analysis)
– Plant biology – the iPlant
Collaborative
• Enable collaborative research across
existing data repositories
– Cognitive science – the Temporal
Dynamics of Learning Center
• Manage research data, apply IRB
policies
– Social Science – the Odum Institute
• Integrate policy-based data
management with the existing
Dataverse repository
Challenges
• Federated national data cyberinfrastructure
• Existing projects have web services, data
repositories, digital libraries, archives,
processing pipelines, science portals
• What are the interoperability mechanisms
needed to enable federation of existing
resources?
DFC Builds on the iRODS data grid
(integrated Rule Oriented Data System)
1. Astrophysics – Auger supernova search
2. Atmospheric science – NASA Langley Atmospheric Sciences Center
3. Biology – Phylogenetics at CC IN2P3
4. Climate – NOAA National Climatic Data Center
5. Cognitive Science – Temporal Dynamics of Learning Center
6. Computer Science – GENI experimental network
7. Cosmic Ray – AMS experiment on the International Space Station
8. Dark Matter Physics – Edelweiss II
9. Earth Science – NASA Center for Climate Simulations
10. Ecology – CEED Caveat Emptor Ecological Data
11. Engineering – CIBER-U
12. High Energy Physics – BaBar / Stanford Linear Accelerator
13. Hydrology – Institute for the Environment, UNC-CH; Hydroshare
14. Genomics – Broad Institute, Wellcome Trust Sanger Institute, NGS
15. Medicine – Sick Kids Hospital
16. Neuroscience – International Neuroinformatics Coordinating Facility
17. Neutrino Physics – T2K and dChooz neutrino experiments
18. Oceanography – Ocean Observatories Initiative
19. Optical Astronomy – National Optical Astronomy Observatory
20. Particle Physics – Indra multi-detector collaboration at IN2P3
21. Plant genetics – the iPlant Collaborative
22. Quantum Chromodynamics – IN2P3
23. Radio Astronomy – Cyber Square Kilometer Array, TREND, BAOradio
24. Seismology – Southern California Earthquake Center
25. Social Science – Odum, TerraPop
Policy Concept Graph
[Figure: concept graph of policy-based data management terms]
• Nodes: Purpose, Collection, Property, Policy, Procedure, Attribute, Digital Object, Persistent State Information, Policy Enforcement Point, Periodic Assessment Criteria, Policy Workflow, Function, Client Action, Operation
• Policy examples: Replication Policy, Data Type Policy, Quota Policy, Checksum Policy
• Other labels: Access control, Authenticity, Integrity, Completeness, Correctness, Consensus, Consistency
• State attribute examples: DATA_ID, DATA_REPL_NUM, DATA_CHECKSUM
• Function examples: GetUserACL, SetDataType, SetQuota, DataObjRepl, SysChksumDataObj
• Edge labels: Isa, Has, HasFeature, SubType, Defines, Controls, Updates, Invokes, Chains
Policy-based Data Management – Implementation in iRODS
[Figure: the same concept graph annotated with iRODS counts]
• Purpose (5 main types); types shown: Archive, Data grid, Digital Library, Processing Pipeline
• Collection
• Property (7 default); labels shown: Authenticity, Integrity, Access control, Completeness, Correctness, Consensus, Consistency
• Policy (11 default); examples: Replication Policy, Data Type Policy, Quota Policy, Checksum Policy
• Procedure (11 default)
• Persistent State Information (338); examples: DATA_ID, DATA_REPL_NUM, DATA_CHECKSUM
• Policy Enforcement Points (70)
• Micro-service (317); examples: msiGetUserACL, msiSetDataType, msiSetQuota, msiDataObjRepl, msiSysChksumDataObj
• Clients (50)
• Other nodes: Attribute, Digital Object, Periodic Assessment Criteria, Policy Workflow, Operation
• Edge labels: Isa, Has, HasFeature, SubType, Defines, Controls, Updates, Invokes, Chains
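The relationships in the graph (a policy enforcement point invokes a policy, the policy chains micro-services, and each micro-service updates persistent state information) can be illustrated with a small Python sketch. This is not iRODS code: the `PolicyEngine` class, the `post_put` enforcement point name, and the two functions are hypothetical stand-ins, and only the attribute names DATA_ID, DATA_REPL_NUM, and DATA_CHECKSUM come from the slide. SHA-256 and the treatment of DATA_REPL_NUM as a simple counter are assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Persistent state information for one digital object.
State = Dict[str, object]
# A micro-service reads the object's bytes and updates its state in place.
MicroService = Callable[[State, bytes], None]


def checksum_object(state: State, data: bytes) -> None:
    """Hypothetical stand-in for a checksum micro-service (SHA-256 is an assumption)."""
    state["DATA_CHECKSUM"] = hashlib.sha256(data).hexdigest()


def replicate_object(state: State, data: bytes) -> None:
    """Hypothetical stand-in for a replication micro-service.

    DATA_REPL_NUM is treated here as a simple counter, which is an assumption.
    """
    state["DATA_REPL_NUM"] = int(state.get("DATA_REPL_NUM", 0)) + 1


@dataclass
class PolicyEngine:
    # Maps an enforcement point name to the chain of micro-services it invokes.
    rules: Dict[str, List[MicroService]] = field(default_factory=dict)

    def add_policy(self, enforcement_point: str, chain: List[MicroService]) -> None:
        self.rules.setdefault(enforcement_point, []).extend(chain)

    def trigger(self, enforcement_point: str, state: State, data: bytes) -> State:
        # Run every micro-service registered for this enforcement point, in order.
        for micro_service in self.rules.get(enforcement_point, []):
            micro_service(state, data)
        return state


engine = PolicyEngine()
engine.add_policy("post_put", [checksum_object, replicate_object])
state = engine.trigger("post_put", {"DATA_ID": 1, "DATA_REPL_NUM": 0}, b"sensor record")
```

After the trigger, `state` carries a checksum and an incremented replica counter, so a later assessment step could verify the policy by inspecting state alone.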
Federation Approach
• Use middleware to implement unifying name spaces
for:
1. Users – Single sign-on
2. Collections – Directories, workflow, time series
3. Objects – Files, soft links, workflows
4. Storage systems – Cloud, tape, file systems, objects
5. Metadata – Provenance, description, state
6. Policies – Management, assessment
7. Micro-services – Procedures, interactions
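A unifying name space for objects and storage systems can be pictured as a catalog that maps one logical path to any number of physical replicas. The Python sketch below is an illustration under stated assumptions, not the iRODS catalog: `LogicalCatalog`, `Replica`, the logical path, and the physical paths are all hypothetical, while the zone name dfcmain and the resource names demoResc and res-bk15 are borrowed from the federation hub slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Replica:
    storage: str        # logical name of the storage system
    physical_path: str  # where the bytes live on that system (hypothetical)


class LogicalCatalog:
    """Maps logical object names to physical replicas."""

    def __init__(self) -> None:
        self._objects: Dict[str, List[Replica]] = {}

    def register(self, logical_path: str, replica: Replica) -> None:
        self._objects.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path: str) -> List[Replica]:
        # Clients see only the logical path; the catalog supplies the locations.
        return list(self._objects.get(logical_path, []))

    def list_collection(self, collection: str) -> List[str]:
        # A collection is modelled as a logical path prefix.
        prefix = collection.rstrip("/") + "/"
        return sorted(p for p in self._objects if p.startswith(prefix))


catalog = LogicalCatalog()
catalog.register("/dfcmain/home/demo/run1.nc", Replica("demoResc", "/vault/a/run1.nc"))
catalog.register("/dfcmain/home/demo/run1.nc", Replica("res-bk15", "/tape/0042/run1.nc"))
replicas = catalog.resolve("/dfcmain/home/demo/run1.nc")
```

Because clients address only the logical path, a replica can move between storage systems without any change on the client side.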
DFC - CNI
DFC Federation Hub
Zone: dfcmain, Port: 1237
– iCAT: iren2.renci.org
– hydroResc: hydro.renci.org
– res-bk15: srbbrick15.ucsd.edu
– res-dfcmain: iren2.renci.org
– demoResc: iren2.renci.org
Federated zones (zone – host:port)
– ooi – icat.oceanobservatories.org:1247
– hydrology – iren2.renci.org:2823
– renci – iren2.renci.org:1247
– odumMain – iodum1.irss.unc.edu:1247
– engineering – irods.ischool.drexel.edu:1247
– TDLC – tdlc-01.sdsc.edu:6688
– dfctest – dfctest.renci.org:1248
National Infrastructure
[Figure: existing infrastructure connected through the DFC Collaboration Environment (Data Grid)]
• Research Environment (Portals, Applications, Workflows)
• Community Resource: Repository, Catalog, Services
• Systems shown: XSEDE, Kepler, OOI, TDLC, iPlant, CUAHSI, NCDC, Dataverse, GeoBrain, DataONE, NCSA Polyglot
The Future: Reproducible Research
[Figure labels: Archives, Experiments, Literature, Sensors, Simulation]
The Challenge:
Support reproducible data-driven research
Deliver the capability to manage, mine, and publish
knowledge through collaboration environments.
National Infrastructure Approach
1. Build national data cyberinfrastructure prototype
– Support multiple science and engineering domains by loosely
coupling their existing infrastructure with a collaboration
environment
2. Develop generic interoperability framework
– Define the generic infrastructure needed for the national
infrastructure to manage knowledge as well as data and
information
3. Define interoperability mechanisms
– Support access across the disparate types of infrastructure in
common use
4. Define domain specific extensions
– Support three levels: technical interoperability, project level
policy, and end user usage requirements
Interoperability Mechanisms
Policies control execution of each interoperability mechanism
• Knowledge
– Analysis Workflows: Knowledge Creation
– Procedures (Micro-services): Knowledge Management
• Information
– Soft Links: Collection Registration
– Message Queue: Information Exchange
– Database Query: Information Manipulation
• Data
– Micro-services: Data Access
– Storage Driver: Data Manipulation
DataNet Interoperability
[Figure: Research Environment (Portals, Applications, Workflows) and the DFC Collaboration Environment]
• Labels shown: DFC Data Grid (twice), Web Service, Message Queue, SEAD Portal (VIVO), DataONE Coordinating Node, DataONE Member Node, SEAD Data, TerraPop Server, SEAD Engagement Center
DFC Interoperability Layers
Layer – Mechanism – Examples
• Authentication – PAM / GSSAPI – InCommon, GSI, Kerberos, Shibboleth, LDAP
• Data Access – Micro-Services – DataONE, Data Conservancy, CUAHSI, NCDC
• Data Manipulation – Format Drivers – NetCDF, HDF5, THREDDS, ERDDAP
• Workflows – Micro-Services – Kepler, NCSA Cyberintegrator, Taverna, NCSA Polyglot
• Networks – Network Drivers – HTTPS, TCP/IP, Parallel TCP/IP, RBUDP
• Clients – OpenSocial – Web browsers, Web Services, Workflows, FUSE, Synchronization, MediaWiki
• Storage Systems – Storage Drivers – File Systems, Tape Archives, Object Stores, Cloud Storage
• Messaging – Micro-Services – AMQP, iRODS Xmsg
• Vocabulary – Micro-Services – HIVE, (Cheshire)
• Management – Policies – (RDA Policies), (ISO 16363 Criteria)
Interoperability Mechanisms
• Drivers
– Encapsulate the knowledge needed to perform operations at the
remote repository: partial I/O, parsing of formats, manipulation
of data structures
– Examples: authentication, format, storage
• Micro-services
– Encapsulate the knowledge needed to interact with an external
system or with a data set using the remote protocol
– Examples: data access, external workflows, semantics, messaging
• Policies
– Encapsulate the knowledge needed for management functions
– Examples: federation control, administrative tasks, validation checks
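The division of labour among the three mechanisms can be sketched in a few lines of Python. Every name here (`InMemoryStorageDriver`, `fetch_header`, `allow_only`, `run_with_policy`, and the object names) is hypothetical and chosen only for illustration: the driver hides how bytes are stored and supports partial I/O, the micro-service is a reusable procedure built on the driver, and the policy decides whether the micro-service may run.

```python
from typing import Callable, Dict


class InMemoryStorageDriver:
    """Driver: hides the storage details and supports partial I/O."""

    def __init__(self, blobs: Dict[str, bytes]) -> None:
        self._blobs = blobs

    def read(self, name: str, offset: int = 0, length: int = -1) -> bytes:
        data = self._blobs[name]
        end = len(data) if length < 0 else offset + length
        return data[offset:end]


def fetch_header(driver: InMemoryStorageDriver, name: str, nbytes: int = 4) -> bytes:
    """Micro-service: a reusable procedure that reads only the first bytes."""
    return driver.read(name, 0, nbytes)


def allow_only(collection_prefix: str) -> Callable[[str], bool]:
    """Policy: permit operations only on objects under one collection."""
    def policy(name: str) -> bool:
        return name.startswith(collection_prefix)
    return policy


def run_with_policy(policy, micro_service, driver, name):
    # The policy is checked before the micro-service executes.
    if not policy(name):
        raise PermissionError(f"policy denies access to {name}")
    return micro_service(driver, name)


driver = InMemoryStorageDriver({
    "/public/a.dat": b"HEADpayload",
    "/private/b.dat": b"SECRET",
})
public_only = allow_only("/public/")
header = run_with_policy(public_only, fetch_header, driver, "/public/a.dat")
```

Swapping the driver for one backed by tape or cloud storage would leave the micro-service and the policy unchanged, which is the point of separating the three mechanisms.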
Assertion
• Three basic types of interoperability mechanisms
are sufficient for assembling national data
cyberinfrastructure
• Example: Linking software-defined networks to
data grids
– From an iRODS data grid, the selection among
three disjoint network paths was controlled to optimize
data transport, by adding appropriate policy
enforcement points and micro-services
• Expect functionality currently in data grid
middleware to migrate into network middleware
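A path-selection policy of the kind described in the example might look like the Python sketch below. The slide does not say which criterion was used, so the cost model (latency plus size divided by bandwidth), the `NetworkPath` fields, and the three sample paths are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class NetworkPath:
    name: str
    bandwidth_mbps: float  # assumed available bandwidth, megabits per second
    latency_ms: float      # assumed one-way latency, milliseconds


def estimated_transfer_seconds(path: NetworkPath, size_mb: float) -> float:
    """Simple cost model (an assumption): latency plus serialization time."""
    return path.latency_ms / 1000.0 + (size_mb * 8.0) / path.bandwidth_mbps


def select_path(paths: Iterable[NetworkPath], size_mb: float) -> NetworkPath:
    """Policy: pick the path with the lowest estimated transfer time."""
    return min(paths, key=lambda p: estimated_transfer_seconds(p, size_mb))


paths = [
    NetworkPath("path-a", bandwidth_mbps=100.0, latency_ms=5.0),
    NetworkPath("path-b", bandwidth_mbps=1000.0, latency_ms=50.0),
    NetworkPath("path-c", bandwidth_mbps=10.0, latency_ms=1.0),
]
small_choice = select_path(paths, size_mb=0.001)   # tiny object: latency dominates
large_choice = select_path(paths, size_mb=1000.0)  # bulk transfer: bandwidth dominates
```

In a data grid such a function would run at a policy enforcement point before the transfer starts, with the chosen path passed on to the network layer.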
Future Architecture
[Figure: two layered stacks]
• First stack: Clients, Data Grid Middleware, Resources
• Second stack: Clients, Data Grid Middleware (Virtual collection), Network Middleware (Virtual network), Resources
• Other labels: DFC Federation, GEMI - GENI
Contacts
http://datafed.org
http://irods.org
Reagan W. Moore