Exposing legacy file-based data (interop-for-files)
Andrew Woolf
CCLRC Rutherford Appleton Laboratory
AUKEGGS Canberra, 2006-11-29

Outline
• Introduction
• The feature model as integration key
• An interoperability approach for files
• xlink review and proposed profile for legacy data
• Examples
• Issues

Introduction
• Much ‘earth-science’ data exists as large legacy file-stores
  – e.g. ECMWF: 2 PB of file-based data
  – e.g. British Atmospheric Data Centre: 40 TB of file-based data
• Interoperability demands common approaches
• BUT, multitude of formats masks commonality
  – netCDF, HDF4, HDF5, GRIB, NASA Ames, PP, ...

Introduction
• File-centred data management focusses on the container rather than content
• File API is fundamental point of reference
  – binary format details not always exposed or guaranteed
  – public API may be only supported access mechanism
  – often implemented as performant optimised native library
• Conclusion: can’t/shouldn’t migrate

Introduction
• Want to expose information, not format...

Introduction
• Information structures may be composed across files

The feature model
• Common pattern with file-data:
  – need to integrate information structures across multiple files
  – (relational tables provide this implicitly)
• Semantics provide an integration key
  – e.g. an oceanographer and meteorologist can share a conversation about data despite format differences

The feature model
[Figure]

A model for file-based interoperability
• Retain file-based persistence format
• Supplement with feature-based conceptual model
• ‘Cast’ legacy data onto conceptual model
  – interoperableData = (featureModel) legacyData
• Legacy file data + GML-encoded conceptual ‘metadata’ = ‘interoperable view’
  – may be exposed through W*S

A model for file-based interoperability
• GML provides conceptual feature ‘skeleton’
• File provides ‘flesh’
• GML ‘by-reference’ pattern for property values
  – uses simple xlink
  – “The value of a GML property that carries an xlink:href attribute is the resource returned by traversing the link”

xlink review
[Figure: extended xlink]
  – extended xlink: [role] [title]
  – remote resource B: [href] [role] [title] [label]
  – local resource A: [role] [title] [label]
  – remote resource C: [href] [role] [title] [label]
  – local resource D: [role] [title] [label]
  – arc 1: [arcrole] [title] [show] [actuate]
  – arc 2
  – arc 3

xlink review
[Figure: simple xlink]
  – simple xlink: [role] [title]
  – remote resource: [href] [role] [title] [label]
  – arc: [arcrole] [title] [show] [actuate]
  – local resource: [role] [title] [label]

xlink review
• ‘role’ (URI):
  – indicates a property of the remote resource
  – must be a URI reference that “identifies some resource that describes the intended property”
• ‘arcrole’ (URI):
  – describes the “meaning of the arc’s ending resource relative to its starting resource”
  – corresponds to RDF notion of a property
    • starting-resource HAS arc-role ending-resource

xlink patterns for files
[Figure: extended xlink, GML feature instance]
Aggregation semantics determined by xlink arc traversal rules

xlink patterns for files
[Figure: simple xlink, GML feature instance]
Aggregation semantics determined by storage descriptor

xlink proposal
<someGMLElement
    xlink:arcrole="hasRemoteContentEmbeddedAt#localXpath"
    xlink:href="storageDescriptor#portion"
    xlink:role="storageSchemaIdentifier"
    xlink:show="embed"
    xlink:actuate="onRequest | onLoad"/>
• href examples:
  – netCDF#variable
  – RDBMS#SQLQuery
  – GRIBFile#recordNumber
  – CSMLStorageDescriptor#arrayID

Example
• GML CR 06-160
  – ISO 19123 CV_ReferenceableGrid

<gml:ReferenceableGrid gml:id="ID001" srsName="urn:ogc:def:crs:EPSG:6.6:4326" dimension="2">
  <gml:limits>
    <gml:GridEnvelope>
      <gml:low>0 0</gml:low>
      <gml:high>7 4</gml:high>
    </gml:GridEnvelope>
  </gml:limits>
  <gml:axisLabels>x y</gml:axisLabels>
  <gml:coordTransformTable>
    <gml:GridCoordinatesTable>
      <gml:gridOrdinate>
        <gml:GridOrdinateDescription>
          <gml:coordAxisLabel>Geodetic longitude</gml:coordAxisLabel>
          <gml:coordAxisValues>
            <gml:SpatialOrTemporalPositionList>
              <gml:coordinateList>13.5 24.9 32.4 37.7 41.5 46.8 54.4 65.7</gml:coordinateList>
            </gml:SpatialOrTemporalPositionList>
          </gml:coordAxisValues>
          <gml:gridAxesSpanned>x</gml:gridAxesSpanned>
          <gml:sequenceRule axisOrder="+1">Linear</gml:sequenceRule>
        </gml:GridOrdinateDescription>
      </gml:gridOrdinate>
      <gml:gridOrdinate>
        <gml:GridOrdinateDescription>
          <gml:coordAxisLabel>Geodetic latitude</gml:coordAxisLabel>
          <gml:coordAxisValues>
            <gml:SpatialOrTemporalPositionList>
              <gml:coordinateList>
                53.1 48.7 46.2 44.7 43.9 43.3 43.1 44.0
                46.2 43.2 41.5 40.6 40.2 40.0 40.3 41.7
                37.1 36.1 35.6 35.5 35.7 36.0 37.1 39.5
                30.4 30.2 30.4 30.7 31.1 32.0 33.8 37.2
                24.3 24.8 25.3 26.0 26.6 27.7 29.7 33.4
              </gml:coordinateList>
            </gml:SpatialOrTemporalPositionList>
          </gml:coordAxisValues>
          <gml:gridAxesSpanned>x y</gml:gridAxesSpanned>
          <gml:sequenceRule axisOrder="+1 -2">Linear</gml:sequenceRule>
        </gml:GridOrdinateDescription>
      </gml:gridOrdinate>
    </gml:GridCoordinatesTable>
  </gml:coordTransformTable>
</gml:ReferenceableGrid>

Example
• netCDF ASCII dump:

netcdf myfile {
  dimensions:
    x = 8 ;
    y = 5 ;
  variables:
    float lon(x) ;
      lon:long_name = "longitude" ;
      lon:units = "degrees_east" ;
    float lat(x,y) ;
      lat:long_name = "latitude" ;
      lat:units = "degrees_north" ;
    float temp(x,y) ;
      temp:coordinates = "lon lat" ;
      temp:long_name = "temperature" ;
      temp:units = "degC" ;
  data:
    lon = 13.5, 24.9, 32.4, 37.7, 41.5, 46.8, 54.4, 65.7 ;
    lat = 53.1, 48.7, 46.2, 44.7, 43.9, 43.3, 43.1, 44.0, 46.2, 43.2, 41.5, ...

Example
<gml:gridOrdinate>
  <gml:GridOrdinateDescription>
    <gml:coordAxisLabel>Geodetic longitude</gml:coordAxisLabel>
    <gml:coordAxisValues>
      <gml:SpatialOrTemporalPositionList>
        <gml:coordinateList srsName="WGS84">13.5 24.9 32.4 37.7 41.5 46.8 54.4 65.7</gml:coordinateList>
      </gml:SpatialOrTemporalPositionList>
    </gml:coordAxisValues>
    <gml:gridAxesSpanned>x</gml:gridAxesSpanned>
    <gml:sequenceRule axisOrder="+1">Linear</gml:sequenceRule>
  </gml:GridOrdinateDescription>
</gml:gridOrdinate>

<gml:coordAxisValues
    xlink:arcrole="http://ndg.nerc.ac.uk/xlinkUsage/insert#SpatialOrTemporalPositionList/coordinateList"
    xlink:href="myfile.nc#lon"
    xlink:role="http://ndg.nerc.ac.uk/fileFormat/netcdf"
    xlink:show="embed">
  <gml:SpatialOrTemporalPositionList>
    <gml:coordinateList srsName="WGS84"/>
  </gml:SpatialOrTemporalPositionList>
</gml:coordAxisValues>

Issues
• Need to ‘get as close as possible’ to target
  – ‘merge’ semantics consistent with GML? (Opportunity: no best practice for GML yet!)
    • “If both a link and content are present in an instance of a property element, then the object found by traversing the xlink:href link shall be the normative value of the property. The object included as content shall be used by the data recipient only if the remote instance cannot be resolved; this may be considered to be a "cached" version of the object.” [GML 7.2.3.4]

Issues
• xlink:href (URI) for remote resource fragment (format-specific)
  – e.g. RDBMS#SQLQuery, netCDF#variable, etc...
• xlink:role (URI) for resource format
  – e.g. reference PRONOM-type format repository?
    • implied conversion to GML target content type
• xlink:arcrole (URI) for ‘embed remote content’ semantics
  – ‘insert at relative XPath’ essential
• simple xlink can’t handle multiple resources
  – application-specific ‘storage descriptor’ schemas for file aggregation semantics

Conclusion
• Presented a profile for xlink with files in absence of current best practice
• Meets key practical requirements
  – retain file-based persistence formats
  – provide interoperability ‘wrapper’
  – focus on logical content, not container (feature model)
• Semantic governance at appropriate points
• Enables powerful, scalable mechanism for real data
  – e.g. large meteorological datasets
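
The proposed profile can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NDG implementation: the namespace declarations are added so the coordAxisValues fragment parses, FAKE_FILES and read_netcdf_variable are hypothetical stand-ins for a real netCDF reader, and READERS is an illustrative dispatch table keyed on xlink:role. The sketch splits xlink:href into storage descriptor and portion, splits xlink:arcrole into the usage URI and the local XPath, and writes the retrieved values into the element found at that relative path.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

GML = "http://www.opengis.net/gml"
XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical stand-in for the contents of myfile.nc (values from the slides).
FAKE_FILES = {
    "myfile.nc": {"lon": [13.5, 24.9, 32.4, 37.7, 41.5, 46.8, 54.4, 65.7]},
}


def read_netcdf_variable(storage, portion):
    """Hypothetical reader: return the values of variable `portion` in `storage`."""
    return FAKE_FILES[storage][portion]


# Illustrative dispatch: xlink:role identifies the format, hence the reader.
READERS = {"http://ndg.nerc.ac.uk/fileFormat/netcdf": read_netcdf_variable}


def split_reference(ref):
    """Split 'left#right' into (left, right)."""
    url, fragment = urldefrag(ref)
    return url, fragment


def embed_remote_content(prop):
    """Fill a GML property element from the resource its xlink points to."""
    if prop.get(f"{{{XLINK}}}show") != "embed":
        return prop  # this sketch only handles the 'embed' behaviour
    storage, portion = split_reference(prop.get(f"{{{XLINK}}}href"))
    _usage, local_path = split_reference(prop.get(f"{{{XLINK}}}arcrole"))
    values = READERS[prop.get(f"{{{XLINK}}}role")](storage, portion)
    # Assumption: every step of the local XPath is in the GML namespace.
    qualified = "/".join(f"{{{GML}}}{step}" for step in local_path.split("/"))
    target = prop.find(qualified)
    if target is None:
        raise ValueError(f"no element at relative path {local_path!r}")
    # Per GML 7.2.3.4 the linked resource is normative, so it replaces any content.
    target.text = " ".join(str(v) for v in values)
    return prop


DOC = """<gml:coordAxisValues
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:arcrole="http://ndg.nerc.ac.uk/xlinkUsage/insert#SpatialOrTemporalPositionList/coordinateList"
    xlink:href="myfile.nc#lon"
    xlink:role="http://ndg.nerc.ac.uk/fileFormat/netcdf"
    xlink:show="embed">
  <gml:SpatialOrTemporalPositionList>
    <gml:coordinateList srsName="WGS84"/>
  </gml:SpatialOrTemporalPositionList>
</gml:coordAxisValues>"""

resolved = embed_remote_content(ET.fromstring(DOC))
print(ET.tostring(resolved, encoding="unicode"))
```

Keying the reader on xlink:role keeps format knowledge out of the GML itself: supporting GRIB or an RDBMS would mean registering another reader, while the href#portion and arcrole#localXpath conventions stay unchanged.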