National Science Foundation Cooperative Agreement: OCI-0940841

DFC Vision
• Build collaboration environment
  – Sharing of data, information, and knowledge
• Form national data cyberinfrastructure
  – Federation of existing data management systems
• Support reproducible data-driven research (NEW)
  – Encapsulate knowledge within shared workflows
• Enable student participation in research
  – Policy-controlled analysis of "live" data

[Figure: Compute Resources (HPC centers, institutional clusters) and Community Resources (repository, catalog) connected through the DFC Collaboration Environment (data grid)]

Data Driven Science and Engineering Collaboration Environments
• Oceanography – Ocean Observatories Initiative
  – Archiving climatic data records from real-time sensor data streams
• Engineering – CIBER-U
  – Engineering Digital Library: curating civil engineering data, materials data, archaeology data, student training materials
• Hydrology – EarthCube
  – Automating hydrology research workflows (data retrieval, transformation, analysis)
• Plant biology – the iPlant Collaborative
  – Enable collaborative research across existing data repositories
• Cognitive science – the Temporal Dynamics of Learning Center
  – Manage research data, apply IRB policies
• Social science – the Odum Institute
  – Integrate policy-based data management with the existing Dataverse repository

Challenges
• Federated national data cyberinfrastructure
• Existing projects have web services, data repositories, digital libraries, archives, processing pipelines, science portals
• What interoperability mechanisms are needed to enable federation of existing resources?

DFC Builds on the iRODS Data Grid (integrated Rule-Oriented Data System)
1. Astrophysics – Auger supernova search
2. Atmospheric science – NASA Langley Atmospheric Sciences Center
3. Biology – Phylogenetics at CC IN2P3
4. Climate – NOAA National Climatic Data Center
5. Cognitive Science – Temporal Dynamics of Learning Center
6. Computer Science – GENI experimental network
7. Cosmic Ray – AMS experiment on the International Space Station
8. Dark Matter Physics – Edelweiss II
9. Earth Science – NASA Center for Climate Simulations
10. Ecology – CEED Caveat Emptor Ecological Data
11. Engineering – CIBER-U
12. High Energy Physics – BaBar / Stanford Linear Accelerator
13. Hydrology – Institute for the Environment, UNC-CH; Hydroshare
14. Genomics – Broad Institute, Wellcome Trust Sanger Institute, NGS
15. Medicine – Sick Kids Hospital
16. Neuroscience – International Neuroinformatics Coordinating Facility
17. Neutrino Physics – T2K and dChooz neutrino experiments
18. Oceanography – Ocean Observatories Initiative
19. Optical Astronomy – National Optical Astronomy Observatory
20. Particle Physics – Indra multi-detector collaboration at IN2P3
21. Plant genetics – the iPlant Collaborative
22. Quantum Chromodynamics – IN2P3
23. Radio Astronomy – Cyber Square Kilometer Array, TREND, BAOradio
24. Seismology – Southern California Earthquake Center
25. Social Science – Odum, TerraPop

Policy Concept Graph
[Figure: Policy Concept Graph linking Purpose, Property, Policy, Procedure, Persistent State, Policy Enforcement Point, and Periodic Assessment Criteria. Examples shown: replication, data type, access control, quota, and checksum policies; authenticity and integrity; procedures GetUserACL, SetDataType, SetQuota, DataObjRepl, and SysChksumDataObj chained into a policy workflow; persistent state attributes DATA_ID, DATA_CHECKSUM, DATA_REPL_NUM of collections and digital objects; assessment features completeness, correctness, consistency, and consensus; policy enforcement points triggered by client actions.]
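The concept graph can be made concrete with a toy model. This is a minimal Python sketch, not iRODS code: the `DataObject` class, the `post_put` enforcement point name (loosely analogous to the iRODS acPostProcForPut enforcement point), and the procedure functions are hypothetical stand-ins that only mimic the roles of the SetDataType, SysChksumDataObj, and DataObjRepl procedures. A client action fires an enforcement point, which runs a chain of procedures that update persistent state attributes (DATA_CHECKSUM, DATA_REPL_NUM); a periodic assessment then checks that state against the policy's criteria.

```python
"""Toy model of the policy concept graph (hypothetical names, not the iRODS API)."""
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataObject:
    data_id: int
    content: bytes
    # Persistent state attributes, named after the concept graph labels.
    state: Dict[str, object] = field(default_factory=dict)


Procedure = Callable[[DataObject], None]


def set_data_type(obj: DataObject) -> None:
    # Hypothetical stand-in for a SetDataType procedure.
    obj.state["DATA_TYPE"] = "generic"


def compute_checksum(obj: DataObject) -> None:
    # Hypothetical stand-in for a checksum procedure (cf. SysChksumDataObj).
    obj.state["DATA_CHECKSUM"] = hashlib.sha256(obj.content).hexdigest()


def replicate(obj: DataObject) -> None:
    # Hypothetical stand-in for replication (cf. DataObjRepl); it only counts replicas.
    obj.state["DATA_REPL_NUM"] = int(obj.state.get("DATA_REPL_NUM", 1)) + 1


# Policy enforcement point -> policy workflow (a chain of procedures).
POLICY_WORKFLOWS: Dict[str, List[Procedure]] = {
    "post_put": [set_data_type, compute_checksum, replicate],
}


def client_put(obj: DataObject) -> None:
    """Client action: ingest one copy, then fire the 'post_put' enforcement point."""
    obj.state["DATA_REPL_NUM"] = 1  # the original copy
    for procedure in POLICY_WORKFLOWS.get("post_put", []):
        procedure(obj)


def assess(obj: DataObject, min_replicas: int = 2) -> Dict[str, bool]:
    """Periodic assessment: compare persistent state with the policy criteria."""
    expected = hashlib.sha256(obj.content).hexdigest()
    return {
        "integrity": obj.state.get("DATA_CHECKSUM") == expected,
        "replication": int(obj.state.get("DATA_REPL_NUM", 0)) >= min_replicas,
    }


if __name__ == "__main__":
    record = DataObject(data_id=1, content=b"sensor record")
    client_put(record)
    print(assess(record))  # {'integrity': True, 'replication': True}
```

Keeping the enforcement-point-to-workflow mapping in a table mirrors the graph's separation between policies (what must hold) and procedures (how it is made to hold): changing a policy means editing the chain, not the client.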
Policy-based Data Management – Implementation in iRODS
[Figure: the policy concept graph as implemented in iRODS, with counts. Purpose (5 main types; subtypes include archive, data grid, digital library, processing pipeline); Property (7 default); Policy (11 default); Procedure (11 default); Persistent State Information (338); Policy Enforcement Points (70); Micro-services (317), e.g., msiGetUserACL, msiSetDataType, msiSetQuota, msiDataObjRepl, msiSysChksumDataObj; Clients (50).]

Federation Approach
• Use middleware to implement unifying name spaces for:
  1. Users – single sign-on
  2. Collections – directories, workflows, time series
  3. Objects – files, soft links, workflows
  4. Storage systems – cloud, tape, file systems, objects
  5. Metadata – provenance, description, state
  6. Policies – management, assessment
  7. Micro-services – procedures, interactions

DFC - CNI

DFC Federation Hub (Zone: dfcmain, Port: 1237, iCAT: iren2.renci.org)
• Federated zones (zone: host:port)
  – ooi: icat.oceanobservatories.org:1247
  – hydrology: iren2.renci.org:2823
  – renci: iren2.renci.org:1247
  – odumMain: iodum1.irss.unc.edu:1247
  – engineering: irods.ischool.drexel.edu:1247
  – TDLC: tdlc-01.sdsc.edu:6688
  – dfctest: dfctest.renci.org:1248
• Hub resources (resource: host)
  – hydroResc: hydro.renci.org
  – res-bk15: srbbrick15.ucsd.edu
  – res-dfcmain: iren2.renci.org
  – demoResc: iren2.renci.org

National Infrastructure
[Figure: national infrastructure layered over existing infrastructure: Research Environment (portals, applications, workflows), DFC Collaboration Environment (data grid), Community Resource Repository, Community Resource Catalog, and Community Resource Services. Systems shown include XSEDE, Kepler, OOI, TDLC, iPlant, CUAHSI, NCDC, Dataverse, GeoBrain, DataONE, and NCSA Polyglot.]
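The federation hub listing can be read as a lookup table from zone name to iCAT endpoint, which is what a unified name space for collections and objects relies on. The minimal sketch below assumes, as in iRODS, that the first component of a logical path names the zone (for example /ooi/home/...). The `resolve` helper and the `ZoneEndpoint` type are hypothetical; real iRODS servers perform this redirection internally rather than through client code like this.

```python
from typing import Dict, NamedTuple, Tuple


class ZoneEndpoint(NamedTuple):
    """iCAT host and port of one zone."""
    host: str
    port: int


# Hub zone plus federated zones, as listed on the DFC Federation Hub slide.
FEDERATED_ZONES: Dict[str, ZoneEndpoint] = {
    "dfcmain": ZoneEndpoint("iren2.renci.org", 1237),
    "ooi": ZoneEndpoint("icat.oceanobservatories.org", 1247),
    "hydrology": ZoneEndpoint("iren2.renci.org", 2823),
    "renci": ZoneEndpoint("iren2.renci.org", 1247),
    "odumMain": ZoneEndpoint("iodum1.irss.unc.edu", 1247),
    "engineering": ZoneEndpoint("irods.ischool.drexel.edu", 1247),
    "TDLC": ZoneEndpoint("tdlc-01.sdsc.edu", 6688),
    "dfctest": ZoneEndpoint("dfctest.renci.org", 1248),
}


def resolve(logical_path: str) -> Tuple[str, ZoneEndpoint]:
    """Return (zone, endpoint) for an absolute logical path like /zone/home/user/file.

    Hypothetical helper: it only illustrates the unified name space idea.
    """
    if not logical_path.startswith("/"):
        raise ValueError("logical paths must be absolute, e.g. /zone/home/user")
    zone = logical_path.strip("/").split("/")[0]
    if zone not in FEDERATED_ZONES:
        raise LookupError(f"zone {zone!r} is not federated with the hub")
    return zone, FEDERATED_ZONES[zone]


if __name__ == "__main__":
    print(resolve("/ooi/home/alice/ctd.nc"))
    # ('ooi', ZoneEndpoint(host='icat.oceanobservatories.org', port=1247))
```

Because several zones share one host (iren2.renci.org on ports 1237, 1247, and 2823), the zone name rather than the host name is the stable key in the name space.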
The Future: Reproducible Research
[Figure: archives, experiments, literature, sensors, and simulation feeding reproducible research]

The Challenge: Support reproducible data-driven research
• Deliver the capability to manage, mine, and publish knowledge through collaboration environments.

National Infrastructure Approach
1. Build a national data cyberinfrastructure prototype
   – Support multiple science and engineering domains by loosely coupling their existing infrastructure with a collaboration environment
2. Develop a generic interoperability framework
   – Define the generic infrastructure the national infrastructure needs to manage knowledge as well as data and information
3. Define interoperability mechanisms
   – Support access across the disparate types of infrastructure in common use
4. Define domain-specific extensions
   – Support three levels: technical interoperability, project-level policy, and end-user usage requirements

Interoperability Mechanisms
Policies control execution of each interoperability mechanism.
• Knowledge
  – Analysis workflows: knowledge creation
  – Procedures (micro-services): knowledge management
• Information
  – Soft links: collection registration
  – Message queue: information exchange
  – Database query: information manipulation
• Data
  – Micro-services: data access
  – Storage driver: data manipulation

DataNet Interoperability
[Figure: DataNet interoperability. Components shown: Research Environment (portals, applications, workflows), DFC Data Grid, DFC Collaboration Environment, web service, message queue, SEAD Portal (VIVO), SEAD Data, SEAD Engagement Center, DataONE Coordinating Node, DataONE Member Node, TerraPop Server.]
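Two of the information-level mechanisms, a message queue for information exchange and soft links for collection registration, combine naturally, with a policy deciding which messages are acted on. The minimal sketch below assumes an in-process `queue.Queue` standing in for a broker such as AMQP, a JSON event format, an allow-list policy, and a dictionary catalog; all of these names and formats are hypothetical, not the DFC, SEAD, or DataONE interfaces.

```python
import json
import queue
from typing import Dict

# In-process stand-in for a message broker (the layers list AMQP and iRODS Xmsg).
events: "queue.Queue[str]" = queue.Queue()

# Hypothetical policy: only these repositories may register objects.
ALLOWED_REPOSITORIES = {"SEAD Data", "DataONE Member Node"}

# Hypothetical collection catalog: logical path -> external URL (a soft link).
catalog: Dict[str, str] = {}


def publish_new_object(repository: str, url: str, logical_path: str) -> None:
    """An external repository announces a new object as a JSON message."""
    events.put(json.dumps({"repository": repository, "url": url, "path": logical_path}))


def register_soft_link(logical_path: str, url: str) -> None:
    """Register the object by reference; no bytes are copied into the grid."""
    catalog[logical_path] = url


def drain_events() -> int:
    """Consume pending messages, apply the policy, and return how many were registered."""
    registered = 0
    while True:
        try:
            event = json.loads(events.get_nowait())
        except queue.Empty:
            return registered
        if event["repository"] not in ALLOWED_REPOSITORIES:
            continue  # the policy rejects this source; the message is dropped
        register_soft_link(event["path"], event["url"])
        registered += 1
```

For example, publishing one event from "SEAD Data" and one from an unlisted source, then calling `drain_events()`, registers only the first soft link and returns 1. Registering by reference keeps the external repository authoritative for the bytes while the data grid supplies the shared name space.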
DFC Interoperability Layers
• Authentication – PAM / GSSAPI: InCommon, GSI, Kerberos, Shibboleth, LDAP
• Data Access – Micro-services: DataONE, Data Conservancy, CUAHSI, NCDC
• Data Manipulation – Format drivers: NetCDF, HDF5, THREDDS, ERDDAP
• Workflows – Micro-services: Kepler, NCSA Cyberintegrator, Taverna, NCSA Polyglot
• Networks – Network drivers: HTTPS, TCP/IP, Parallel TCP/IP, RBUDP
• Clients – Web browsers, Web Services, Workflows, FUSE, Synchronization, MediaWiki, OpenSocial
• Storage Systems – Storage drivers: File Systems, Tape Archives, Object Stores, Cloud Storage
• Messaging – Micro-services: AMQP, iRODS Xmsg
• Vocabulary – Micro-services: HIVE, (Cheshire)
• Management – Policies: (RDA Policies), (ISO 16363 Criteria)

Interoperability Mechanisms
• Drivers
  – Encapsulate the knowledge needed to perform operations at the remote repository: partial I/O, parsing of formats, manipulation of data structures
  – Authentication, format, storage
• Micro-services
  – Encapsulate the knowledge needed to interact with an external system, or with a data set, using the remote protocol
  – Data access, external workflows, semantics, messaging
• Policies
  – Encapsulate the knowledge needed for management functions
  – Federation control, administrative tasks, validation checks

Assertion
• Three basic types of interoperability mechanisms (drivers, micro-services, policies) are sufficient for assembling a national data cyberinfrastructure
• Example: linking software-defined networks to data grids
  – From an iRODS data grid, the selection of three disjoint network paths was controlled to optimize data transport, by adding appropriate policy enforcement points and micro-services
• Functionality currently in data grid middleware is expected to migrate into network middleware

Future Architecture
[Figure: today, clients reach resources through data grid middleware; in the future architecture, clients reach resources through data grid middleware (virtual collection) layered over network middleware (virtual network). Labels also shown: DFC Federation, GEMI - GENI.]

Contacts
http://datafed.org
http://irods.org
Reagan W. Moore