CCLRC Scientific Metadata (CSMD) Model
April 2004, NESC
Shoaib Sufi, CCLRC e-Science Centre

Model Motivation
• A common general format/standard for Scientific Studies and data holdings metadata does not exist
• By proposing a Model and Implementation, we aim to:
  – Form a specification for the types of metadata that should be captured by Scientific Studies
  – Ease citation, collaboration, exploitation and integration
  – Allow easy integration of distributed heterogeneous metadata systems into a homogeneous (albeit virtual) Platform

Structure of Metadata Model
• The CCLRC Scientific Metadata Model (CSMD) is a study-dataset orientated model:
  – Indexing
  – Provenance
  – Data Description
  – Data Location
  – Access Conditions
  – Related Material

What influenced CSMD
• CIP from Earth Observation
• DDI from Social Sciences
• DublinCore from the Library community
  – Publication-only metadata
• XSIL as used on LIGO
  – Low-level ‘Scientific Data Objects’ focus
• CERA from the MPIM
  – A bit specific to Earth Sciences but close
• … hence the need to develop our own General Model
  – CCLRC Scientific Metadata Model

Some Model aims
• Abstract class orientated description of the types of metadata that should be captured by Scientific Studies
• Create a common denominator for Scientific Study metadata which forms a specification
• Metadata workshop at NIEES 2002, during a discussion on metadata standards:
  – Are people capturing metadata at the moment?
  – The simple answer given was no!
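
The six components listed under "Structure of Metadata Model" lend themselves to the class orientated description that the aims call for. A minimal Python sketch might look like the following; the class name StudyRecord, the attribute names and the placeholder types are assumptions made for illustration and are not taken from the CSMD specification or its XML implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StudyRecord:
    """Hypothetical study-dataset record with one attribute per CSMD component.

    The plain list/dict types are placeholders, not CSMD types.
    """
    indexing: List[str] = field(default_factory=list)           # keywords and subjects
    provenance: Dict[str, str] = field(default_factory=dict)    # study name, investigator, ...
    data_description: List[str] = field(default_factory=list)   # logical data names
    data_location: Dict[str, List[str]] = field(default_factory=dict)  # data name -> locators
    access_conditions: List[str] = field(default_factory=list)  # permitted users or groups
    related_material: List[str] = field(default_factory=list)   # links or descriptions

    def populated_components(self) -> List[str]:
        """Return the names of the components that hold any metadata."""
        return [name for name, value in vars(self).items() if value]


# Made-up example values, for illustration only.
record = StudyRecord(
    indexing=["example keyword"],
    provenance={"study_name": "Example study"},
)
print(record.populated_components())
```

Keeping each component as a separate attribute makes it easy to see which parts of a record have been filled in, which is the kind of check the later conformance levels rely on.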

CSMD Used on DataPortal
• XML Implementation used as Data Interface for DataPortal
• Single view of heterogeneous systems/schemas
• Acts as a stress test of the model
  – Limitations feed into Model Requirements
  – New requirements are fed back into the implementation

Model Breakdown: Provenance
• The Study contains the following metadata:
  – The Study Name
  – The Study Institution
  – The Investigator
  – Extended Study Information
    • Abstract
    • Funding
    • Start and End times
  – Investigations

Investigations
• A Study can have more than one investigation; possible enumeration values are experiment, simulation, measurements, etc.
• Investigations contain:
  – Name
  – Investigation Type
  – Abstract
  – Resource
  – Link to DataHolding

Topic (for indexing)
• Keywords
  – Discipline (i.e. domain)
  – Keyword Source (e.g. domain dictionary)
  – Keyword
• Subjects
  – Discipline
  – Subject Source (e.g. domain taxonomy)
  – Subject

Access Condition & Related Material
• Access Conditions
  – Contains a list of users or groups who are allowed access to the metadata and data, or a pointer to an access control system which contains such data for this study
• Related Material
  – One or many links and/or textual descriptions of material related to this study, e.g. earlier studies or parallel studies

Data
• Data Description holds a logical description of the Study’s data:
  – Data Name
  – Type of Data
  – Status
  – Data Topic
  – Parameters
  – Related Data Ref
  – Relation type (e.g. derived)
• Data Location contains the link between the logical name and physical URIs:
  – Data Name
  – Locator(s)

More on Parameters
• Parameters contain a lot of information about the data objects (DO) and collections
• A collection/DO can have many parameter entries; each parameter entry contains:
  – Parameter derivation (e.g. measured/fixed)
  – The value
  – The units
  – Range
  – Error margin
• Parameter aggregation is also supported

Cardinality Issues
• The model recommends a certain cardinality of elements
• Certain metadata components are necessary for one to have an instance of the implemented model
  – Treating everything as optional is not acceptable
• It is thought that implementations may modify this to suit their needs
  – The model attempts to remain ideal (i.e. most common cardinality)

Enumeration Issues
• Enumerations (or controlled vocabularies), e.g. types of investigator or types of institution, are distinct from the model, as taxonomies are
• However, they are necessary for the model to work, so implementations (e.g. the CCLRC DataPortal XML implementation of the model) propose some enumerations for common things
• It is hoped that implementations will use recognised and relevant controlled vocabularies where they are available

Conformance Level
• For a complete metadata study-dataset record a large amount of metadata has to be stored/processed
• So it’s useful to have conformance levels
• The model uses 5 levels
• Each level specifies that more metadata (and indexing information) should be held

Level 1
• Type of information captured:
  – Study and Investigation metadata with indexing at the Study level
• Level 1 metadata is similar to library/publication style metadata (e.g. DublinCore)

Level 2
• Type of information captured:
  – Level 1 + DataHolding metadata (i.e. DataSets and DataObjects)

Level 3
• Type of information captured:
  – Level 2 + related material, access condition, indexing to data collection levels

Level 4
• Type of information captured:
  – Level 3 + indexing to data object level and data object parameter information

Level 5
• Type of information captured:
  – All metadata components are filled: Level 4 + funding, resources used, facilities used, etc.

Conformance Levels
• L1 is similar to library/publication style metadata (e.g. DublinCore)
• The current DataPortal uses somewhere between L2 and L3
  – Indexing at study level, moving towards collection level, but with parameter information
• It is envisaged that only new systems designed with CSMD will conform to L4+
• Benefit of conformance levels: the higher the level of conformance to the CSMD, the richer the clients that operate on the data can be
  – e.g. identifying datasets and objects which link directly to keywords/taxonomies and not just studies

Facilities using CSMD
• CCLRC Facilities (via CCLRC DataPortal):
  – ISIS - Neutron Spallation at Rutherford Appleton Laboratory (test)
  – SR - Synchrotron Radiation source at Daresbury Laboratory (test)
  – British Atmospheric Data Centre (BADC) at RAL (prototype)
• External Facilities (via CCLRC DataPortal):
  – Max-Planck-Institut für Meteorologie (MPIM) in Hamburg
• External Projects using CSMD:
  – NERC funded E-mineral ‘environment from the molecular level’
  – EPSRC funded E-materials project
  – Manchester MyGrid project uses an adapted version
  – ISIS (RAL) have taken data needs in-house and use a model based heavily on CSMD

The Future
• Increased use/recommendation for use of controlled vocabularies
• Increased support for formal identification systems
• Feeding in relevant ideas from other standards
• Update the XML and Relational implementations so they more closely track the model
• Look into internationalisation issues and see if these affect the model or the implementations

More information
• Latest Model description
  – http://wwwdienst.rl.ac.uk/library/2002/tr/dltr2002001.pdf
• For an XML implementation, a Relational implementation, or a newer draft of the model documentation, e-mail:
  – [email protected] with the subject containing [metadata model request]
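
The five conformance levels on the Level 1 to Level 5 slides are cumulative, so a record's level can be worked out by checking each level's additions in turn. The following is a minimal Python sketch of that idea, assuming a hypothetical RecordSummary of boolean flags; the flag names and the function conformance_level are invented here for illustration and are not part of CSMD or the DataPortal.

```python
from dataclasses import dataclass


@dataclass
class RecordSummary:
    """Hypothetical flags describing which kinds of metadata a record holds."""
    study_and_investigations: bool = False
    study_indexing: bool = False
    data_holding: bool = False
    related_material: bool = False
    access_conditions: bool = False
    collection_indexing: bool = False
    object_indexing: bool = False
    object_parameters: bool = False
    funding_resources_facilities: bool = False


# What each level adds on top of the previous one, following the slides.
LEVEL_REQUIREMENTS = [
    ("study_and_investigations", "study_indexing"),                     # Level 1
    ("data_holding",),                                                  # Level 2
    ("related_material", "access_conditions", "collection_indexing"),   # Level 3
    ("object_indexing", "object_parameters"),                           # Level 4
    ("funding_resources_facilities",),                                  # Level 5
]


def conformance_level(summary: RecordSummary) -> int:
    """Return the highest level whose additions, and all lower ones, are present."""
    level = 0
    for requirements in LEVEL_REQUIREMENTS:
        if not all(getattr(summary, name) for name in requirements):
            break
        level += 1
    return level


# Illustrative record: study-level indexing plus parameter information,
# but no collection-level indexing yet.
portal_like = RecordSummary(
    study_and_investigations=True,
    study_indexing=True,
    data_holding=True,
    object_parameters=True,
)
print(conformance_level(portal_like))
```

Because the check stops at the first unmet level, a record that holds some higher-level metadata (here, parameter information) without completing Level 3 is still reported as Level 2, which mirrors the "somewhere between L2 and L3" description of the current DataPortal.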