Geographic Feature Pipes

Marcell Roth
Institute for Geoinformatics, University of Muenster, Germany
Abstract. Aggregating and combining data from different Web
sources to create ad-hoc information is known as “piping”
data. Linked Data is an approach that facilitates browsing through
related information and provides technologies to easily pipe data included
in this Web of Data. The Open Geospatial Consortium (OGC) has established standards for the storage, retrieval, and processing of geospatial
data. These standards act as the foundation for Spatial Data Infrastructures. The integration of existing geospatial data into the Data Web is
still missing. The presented Geographic Feature Pipes (GFP) is an API
deployed as a free Web service that works towards closing this gap. It translates sensor data based on the OGC’s Observations and Measurements
specification, as well as geospatial data served by OGC Web Feature Services, into RDF representations. This enables complex queries and
browsing through related geospatial data sources, as well as means of
merging information about geographic features with related sensor data into
one document. The translated data is based on ontologies providing the vocabulary for the definition of the data entities. The presented approach
shows that in conjunction with semantic annotations, we are able to
bridge the gap between geospatial applications and Semantic Web technologies to move toward the development of the Geospatial Semantic
Web.
1 Introduction
The Web is based on URLs as unique identifiers for documents and other data.
These links allow users to browse the Web in order to retrieve information. Despite the advantages the Web offers, published data (information) is
primarily embedded in HTML Web pages. HTML is concerned with the layout of content and is not
able to type links connecting an entity of the Web document to related entities
[3]. Hyperlinks indicate that two documents are related, but leave it to the user
to infer the nature of the relationship. Linked Data is a solution to create shared
and structured information spaces [3], which include links between related information stating the nature of the connection. Its purpose is to create and connect
related data on the Web with typed links, as if it were one global database.
To realize such a Web of Data, published data has to follow the Linked Data
principles first outlined by Berners-Lee in 2006 [2]: the “raw” data is encoded
in the machine-readable RDF [17], the data is Web addressable via URIs, and
data is linked with other data via RDF links. RDF is a graph-based data model
representing information with subject-predicate-object expressions (also called
triples). An RDF link is one type of RDF triple and states that one data entity
has some kind of relation to another data entity [4]. Linked Data promotes the
reuse of information and reduces redundancy of existing information. It facilitates the discovery of relevant information within the variety of information
resources. Instead of following hyperlinks, users follow RDF links. The SPARQL
Protocol and RDF Query Language (SPARQL) [25] supports users in formulating more sophisticated queries. Information about relationships stated in various
RDF documents can be retrieved by querying across different sources. SPARQL
also provides capabilities to easily combine information from different sources by
merging two sets of triples into a single RDF graph [12]. Thus, new information
can be created from the resulting dataset.
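The triple model and the merging of two sets of triples can be illustrated with a minimal sketch. It uses plain Python tuples rather than an RDF library, and all URIs and property names in it are invented for illustration only.

```python
# A triple is a (subject, predicate, object) tuple; a graph is a set of triples.
# All URIs and property names are hypothetical examples, not real datasets.
RIVER = "http://example.org/feature/river/rhine"
GAZETTEER = "http://example.org/gazetteer/rhine"

graph_a = {
    (RIVER, "rdf:type", "ex:River"),
    (RIVER, "ex:name", "Rhine"),
    (RIVER, "owl:sameAs", GAZETTEER),  # an RDF link to another data entity
}
graph_b = {
    (GAZETTEER, "ex:countryCode", "DE"),
    (RIVER, "ex:name", "Rhine"),       # duplicate statement, collapses on merge
}


def merge(*graphs):
    """Merge several sets of triples into a single graph (set union)."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def follow_links(graph, subject, predicate):
    """Return all objects reachable from `subject` via `predicate`."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


merged = merge(graph_a, graph_b)
linked = follow_links(merged, RIVER, "owl:sameAs")
```

Because a graph is a set, identical statements from both sources appear only once after the merge, and information about the linked entity becomes reachable from the river.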
Location is ubiquitous [11] and is a factor in many of the problems decision makers must solve [16]. Such problems range from simple questions like
“Where are my friends now?” to complex ones, for example, determining which areas are prone
to floods in order to reduce potential damage. Such geographic problems illustrate the increasing interest in geographic information (GI) in recent years.
Geobrowsers like GoogleEarth1 or Microsoft’s Bing Maps2 are responses to user
needs for location-based information services. They are part of the Geospatial
Web [26], which makes GI shareable, searchable and ubiquitous for users and
decision makers [8] by using the infrastructure of the Web. In the Geospatial
Web, a distinction is made between geospatial data and services that facilitate
the use of GI in many domain applications [14]. The variety of datasets containing GI ranges from simple map images to complex vector or sensor data.
The Open Geospatial Consortium3 (OGC) developed the XML-based Geography Markup Language (GML) [24] as a data modeling and encoding standard for
GI, in particular when modeled as features, following the ISO/OGC reference
model [23]. Vector data is conceived of as a feature, which is an abstraction of
a real-world phenomenon. When associated with a geographic location relative to the
Earth, it is called a geographic feature. Examples include buildings, streets,
and rivers. Sensor data is stored and published using OGC’s Observations and
Measurements (O&M) [6] model. Much of this data has been made available through
Web services in recent decades. Web services are an important component in
the fabric of the Geospatial Web [14], since they enable the sharing of geospatial
data across organizational boundaries over the Web [29]. Furthermore, they act
on data and support discovery, retrieval and processing functionality. The OGC
specifies implementation standards for such geospatial Web services. They are
divided into various types: Web Feature Services (WFS) [28] serve vector-based
data. A Sensor Observation Service (SOS) [22] provides a Web service interface
to access observation results measured by sensors and sensor systems. Web Processing Services process or analyze geospatial data, e.g. the complex calculation
of roadway noise. Various other OGC Web Services (OWS) exist and are listed
on the OGC Web site4. These services can be combined into Spatial Data Infrastructures (SDI) to improve the interoperability between various data providers
and users by smoothly exchanging and integrating GI.
1 See http://earth.google.com/
2 See http://www.bing.com/maps/
3 See http://www.opengeospatial.org/
Despite the benefits the Geospatial Web provides, several open issues have to
be discussed. GI is not yet well integrated in the Geospatial Web. It is possible
to request GI from an OWS via a unique URL, e.g. a feature collection served
by a WFS, but features (data entities) included in this dataset cannot be dereferenced by people or clients. Consequently, links to information that is related
to such a feature do not exist either, although they would facilitate Geographic
Information Retrieval (GIR) [15]. Different OGC standards also raise compatibility issues across different applications. Merging different datasets, such as
O&M and GML, into one dataset is very difficult.
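To illustrate the first point, the following sketch builds a WFS GetFeature request URL. The parameter names follow the key-value-pair encoding of WFS 2.0; the endpoint and the feature type name are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def get_feature_url(endpoint, type_name, version="2.0.0"):
    """Build a KVP-encoded GetFeature request for one feature type."""
    params = {
        "service": "WFS",
        "version": version,
        "request": "GetFeature",
        "typeNames": type_name,  # WFS 2.0 spelling; WFS 1.x uses typeName
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and feature type, for illustration only.
url = get_feature_url("http://example.org/wfs", "hydro:River")
query = parse_qs(urlsplit(url).query)
```

The resulting URL addresses the feature collection as a whole. It offers no stable identifier for a single river in that collection, which is the gap described above.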
The transformation and publication of the OpenStreetMap [1] and Ordnance
Survey [10] data according to the Linked Data principles have added a new
dimension to the Web of Data. This work also adds spatial and temporal dimensions to the Web of Data and addresses the mentioned
issues of the Geospatial Web. In this paper we present our implementation of
the Geographic Feature Pipes (GFP), which translates O&M-based observations
and GML features into RDF. This provides options to discover spatiotemporal
data and possibly related information by following RDF links, or even to merge
information related to a geographic feature into one document. GFP is a proxy-based solution [13] and a first step to bridge the gap between geospatial data
included in the Geospatial Web and the Linked Data community. It increases
the accessibility to non-OGC data sources. Features might be linked to Geonames5 entries. DBpedia6 entries might be connected with real-time sensor data.
Providing features and observations as Linked Data makes them accessible to a
broad audience, which may not be aware of the geospatial Web services defined
by the OGC. Linking them to entries included in the LinkedGeoData dataset
[1] bridges the gap between the emerging Volunteered Geographic Information
(bottom-up) data formats [9] and top-down standards like GML or O&M as
well. Our approach adds extra knowledge to the datasets by using RDF-Schema
(RDF-S) [5] ontologies for the definition of the geospatial linked data entities.
Ontologies provide domain-specific terms for describing types of things in the
real world and relations among those. Even more powerful queries can be formulated as a result. Linking information of the geographic domain to the Web
of Data bridges the gap to the Semantic Web community as well.
The remainder of this paper is structured as follows. A brief application scenario is introduced in Section 2 which illustrates the benefit of creating geospatial
linked data entities. The implementation of GFP is described in Section 3, before
we summarize and outline future work in Section 4.
4 See http://www.opengeospatial.org/standards
5 See http://www.geonames.org/
6 See http://dbpedia.org/
2 Application Scenario
Creating GI following the Linked Data principles has several advantages. Geospatial linked data entities can easily be linked to related information and make it
possible to merge two datasets to infer more information. This supports users
in constructing sophisticated queries, and if the data is semantically annotated,
the queries are even more powerful. The semantic enrichment of the underlying
data models by linking them to formally specified vocabularies such as ontologies is called semantic annotation [21,18]. Semantic query processing performed
by reasoning engines like IRIS7 with semantic annotations returns more precise
discovery results. In the following, we present an application scenario which illustrates the benefit of O&M-based observations and GML features provided as
linked data. Here we assume that the underlying data models, which are defined
as RDF-S ontologies, are semantically annotated. The data model additionally
includes a domain reference linking to global domain concepts capturing the
data’s relation to reality. Features are linked to Geonames entries as well.
Bob works for the Federal Institute of Hydrology in Germany. Due to a prolonged
dry season in North Rhine-Westphalia, he has to determine the navigability of
rivers in this state for vessels with a minimum width of 10 meters. To accomplish
this task, he requires information about the average width of the rivers as
well as about their continuously changing qualities such as water levels and flow
rates. The latter data comes, for example, from sensors offering real-time
observations served by a SOS. Instead of finding and querying each OGC Web
service offering the needed information, he navigates to our website, which
provides a SPARQL endpoint where he can specify his query. This endpoint is connected to
a database including RDF-encoded observation data and river features served
by a WFS. First, he enters a query to find all rivers within his specified area
which are wide enough for his previously defined vessels. Bob retrieves URLs
pointing to RDF documents describing his selected features. Bob would like to
obtain the rivers which are deep enough to allow vessels with a loaded draft
depth of at least 4 meters to navigate. Hence, he searches for rivers with a
current water level below this threshold. The flow rate should be less than one
meter per second as well. Bob rephrases the query at our web site to filter the
rivers complying with his needs. After executing the query he gets URLs of RDF
datasets containing information only about rivers which are not navigable under
his specified conditions. These rivers will be closed for such vessels. These merged
datasets are composed of vector-based river data as well as river water level and
flow rate data served by a SOS, both previously translated into RDF.
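Bob's two queries can be sketched as filters over already merged records. The sketch uses plain Python instead of SPARQL; all rivers and measurement values are invented, and it assumes one literal reading of the scenario, namely that the water level and flow rate conditions are combined.

```python
# Merged records: river features (from a WFS) joined with their latest
# observations (from a SOS). All values are invented for illustration.
rivers = [
    {"uri": "http://example.org/river/a", "width_m": 35.0,
     "water_level_m": 3.2, "flow_m_s": 0.8},
    {"uri": "http://example.org/river/b", "width_m": 60.0,
     "water_level_m": 5.1, "flow_m_s": 0.6},
    {"uri": "http://example.org/river/c", "width_m": 8.0,
     "water_level_m": 2.0, "flow_m_s": 0.4},
]

MIN_WIDTH_M = 10.0   # vessel width from the scenario
DRAFT_M = 4.0        # loaded draft depth from the scenario
MAX_FLOW_M_S = 1.0   # flow rate threshold from the scenario


def wide_enough(records, min_width=MIN_WIDTH_M):
    """First query: rivers wide enough for the vessels."""
    return [r for r in records if r["width_m"] >= min_width]


def below_thresholds(records, draft=DRAFT_M, max_flow=MAX_FLOW_M_S):
    """Second query: water level below the draft and flow rate below the limit."""
    return [r for r in records
            if r["water_level_m"] < draft and r["flow_m_s"] < max_flow]


# Rivers that are wide enough but currently too shallow for the vessels.
closed = [r["uri"] for r in below_thresholds(wide_enough(rivers))]
```

With this sample data only the first river is returned: the second is deep enough, and the third was already excluded by the width query.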
7 See http://www.iris-reasoner.org/
3 Geographic Feature Pipes - Implementation
GFP is based on a Java API that translates O&M- and GML-based GI into RDF
descriptions. The two prominent packages of this library are depicted in Figure
1 as blue boxes. The figure also shows that the API is built on Sesame8 (pink and yellow
components), since each component depends on the components beneath
it. Sesame is an open source Java framework for storing and querying RDF
data. It includes RDF-S inferencers, query result formats, query languages
such as SPARQL, various RDF storage backends, and various RDF file formats.
Further information about each Sesame component is available in chapter 3 of its
user guide9. The main components of GFP are the packages
GFP.Translator and GFP.Pipes. The former is responsible for creating geospatial linked data. The latter provides means to retrieve
the translated and possibly merged RDF documents by executing SPARQL
queries.
Fig. 1. Overview of components the GFP is based on. (Component labels in the figure: GFP.Translator, GFP.Pipes, HTTP Server, Repository API, HTTP Repository, RDF I/O, HTTP Client, RDF Model.)
The translation procedure is illustrated in Figure 2. It requires a Procedure-Oriented Service Model (POSM)10 for a WFS or SOS, including a reference to
the corresponding data model on which the created data entities are based. These data
model ontologies represent either the O&M data model or the OGC Feature
Model and might be semantically annotated by linking to domain ontologies
capturing their meaning in the real world. The POSM describes a Web service
in RDF and is used in the European research project ENVISION11 to semantically annotate environmental models [20]. ENVISION provides a Service Model
Translator (SMT), implemented as a Java API, which creates such service models
for OGC-compliant Web services like WFS, SOS or WPS. The SMT creates
a respective POSM for each FeatureType or observedProperty. Its libraries can
be downloaded from the “ENVISION Portal” source code repository of ENVISION’s open source project12 and directly integrated into Java applications if
Maven13 is used for the project’s build process. Further descriptions of this API,
the POSM, the data model ontologies, and the annotation procedure are given
in Deliverable 4.214 of this research project.
8 See http://www.openrdf.org/
9 See http://www.openrdf.org/doc/sesame2/users/
10 See http://www.wsmo.org/ns/posm/0.1/
11 See http://www.envision-project.eu/
Fig. 2. The translation procedure.
In step (1), a user registers a POSM
for either a WFS or a SOS implementation. The Web service generates a context identifier (contextID), the IRI that identifies the translated RDF
dataset, which can thus be obtained with SPARQL queries using the FROM
clause after the translation. A context, which Sesame supports as well, is also used
to identify which dataset has to be updated in the repository and to determine
which statements have to be removed or replaced. The Web service name, coupled with the feature type, forms the contextID used to retrieve
an RDF-encoded feature collection served by the WFS. The first URL in Figure
3 represents an example contextID. A contextID required for obtaining linked
sensor data is represented by the second URL. It consists of the Web service
name, the observedProperty identifier, and the featureOfInterest. GFP reads the
URL of the service and the data type, which are stored in the POSM, and opens
a connection to the underlying Web service to retrieve the corresponding data
entities (2). If those are observations, the GetCapabilities document of the SOS
is parsed first to get all featureOfInterest identifiers that are related to the
observedProperty parameter. For each pair of both parameters, an observation
collection is then requested. In step (3), the obtained data is read,
translated into RDF and finally added to a Sesame repository (either stored
locally or on a Sesame server) associated with the contextID. Open source products are used for parsing the data. While features are read with a GML parser
provided by Geotools15, observations are accessed, queried and parsed with the
OX-Framework16. The contextIDs are sent back to the user who registered
the POSM and are also mapped to SPARQL queries used to retrieve the
translated datasets. Each query is then stored as one RDF statement in
the Sesame repository, associated with a query identifier (queryID). A queryID
consists of the contextID and an additional query parameter. It is required to
retrieve the SPARQL query and the request parameters, which are also stored as
RDF statements. The latter are needed for retrieving the original geospatial data
again if the translated data is out-of-date. Our approach assumes that geospatial data, particularly
dynamic sensor data, has to be updated regularly; the time of the last creation
is therefore added as an RDF statement to the translated dataset. Finally, a user can get
the translated data via the contextID. The service resolves the given ID to the
queryID, retrieves the associated SPARQL query stored in the repository, and
executes it to get the translated data. If the data is up-to-date, the stream is directly forwarded to the user. Otherwise, the original data is requested with
the previously queried request parameters, translated into RDF, stored in
the Sesame repository and finally sent back to the user.
12 See http://kenai.com/projects/envision/pages/Home
13 See http://maven.apache.org/
14 See http://www.envision-project.eu/wp-content/uploads/2011/03/D4.2-1.0.pdf
Fig. 3. ContextIDs used to query the RDF datasets.
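The registration and retrieval cycle described above can be sketched with an in-memory stand-in for the repository. This is not the GFP or Sesame API: the class, the contextID layout, and the maximum age are assumptions made for illustration, and the intermediate queryID lookup is omitted.

```python
import time


class PipeRegistry:
    """In-memory stand-in for the repository; not the Sesame API."""

    def __init__(self, translate, max_age_s=3600.0, clock=time.time):
        self._translate = translate  # callable(params) -> list of triples
        self._max_age_s = max_age_s  # assumed freshness limit
        self._clock = clock
        self._contexts = {}          # contextID -> stored dataset entry

    @staticmethod
    def context_id(base, service, *parts):
        # Hypothetical layout: <base>/<service>/<part>/...
        return "/".join([base.rstrip("/"), service, *parts])

    def register(self, base, service, *parts):
        """Steps (1) to (3): create the contextID and store the translation."""
        cid = self.context_id(base, service, *parts)
        params = {"service": service, "parts": parts}
        self._contexts[cid] = {"params": params, "triples": None, "created": None}
        self._refresh(cid)
        return cid

    def _refresh(self, cid):
        entry = self._contexts[cid]
        entry["triples"] = self._translate(entry["params"])
        entry["created"] = self._clock()  # last creation time

    def get(self, cid):
        """Return the dataset, translating again first if it is out-of-date."""
        entry = self._contexts[cid]
        if self._clock() - entry["created"] > self._max_age_s:
            self._refresh(cid)
        return entry["triples"]


# Usage with a fake translator and a controllable clock.
calls = []


def fake_translate(params):
    calls.append(params)
    return [("ex:s", "ex:p", str(len(calls)))]


now = [0.0]
registry = PipeRegistry(fake_translate, max_age_s=60.0, clock=lambda: now[0])
wfs_id = registry.register("http://example.org/gfp", "demo-wfs", "River")
sos_id = registry.register("http://example.org/gfp", "demo-sos",
                           "waterlevel", "station-1")
```

Storing the request parameters next to the translated triples is what allows the stale case to be handled without involving the user again.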
The GFP.Translator API makes use of existing vocabularies for defining
the geospatial linked data entities. While observations are based on an O&M
ontology17, which in turn is aligned to the Semantic Sensor Network (SSN) ontology18, GML features are defined by an OGC Feature Ontology19 based on
the GML simple features profile 2.0 [27]. Feature types are modeled as subclasses
of an AbstractFeature class, which can have a geometry and other properties.
Features are then serialized as instances of the feature type and inherit these
properties. Each geometry is an instance of an AbstractGeometry class. Geometries
represent the geographic location and shape of a feature, often defined by a set of
geographic coordinates. Since our approach focuses on merging feature properties rather than on finding a common solution for the representation of geometries,
we represent them as Well-known text (WKT) strings. WKT reduces the number
of triples in the dataset, yielding a faster RDF translation and more efficient RDF
document merging.
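The effect of WKT on the number of triples can be shown with a small sketch. The property names (ex:asWKT, ex:hasVertex) and the per-vertex alternative are hypothetical and serve only as a comparison; they are not taken from the ontologies used by GFP.

```python
def to_wkt_linestring(coords):
    """Serialize (x, y) pairs as a WKT LINESTRING."""
    return "LINESTRING (" + ", ".join(f"{x} {y}" for x, y in coords) + ")"


def geometry_as_wkt_triple(feature_uri, coords):
    # One triple, regardless of the number of coordinates.
    return [(feature_uri, "ex:asWKT", to_wkt_linestring(coords))]


def geometry_as_vertex_triples(feature_uri, coords):
    # Alternative for comparison: one node per vertex, three triples each.
    triples = []
    for i, (x, y) in enumerate(coords):
        node = f"{feature_uri}/vertex/{i}"
        triples += [
            (feature_uri, "ex:hasVertex", node),
            (node, "ex:x", x),
            (node, "ex:y", y),
        ]
    return triples


coords = [(7.6, 51.9), (7.7, 52.0), (7.8, 52.1)]
wkt = geometry_as_wkt_triple("http://example.org/river/a", coords)
verbose = geometry_as_vertex_triples("http://example.org/river/a", coords)
```

For a line with three vertices, the WKT variant yields one triple where the per-vertex variant yields nine, and the difference grows linearly with the number of coordinates.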
In accordance with the O&M schema, observations are also modeled as subclasses of AbstractFeature. This enables the merging of observations and features with SPARQL, because observations as well as features are RDF instances
of the same class. A sample query merging geospatial linked data describing river
features with related linked sensor data (observedProperty) is shown in Figure
4. Ensuring that observations belong to the translated GML feature is done by
a spatial match on the geometries of the feature and the feature of interest, and
through additional logic-based reasoning on the semantically annotated data
models. However, we expect that this matching has been done during a previous
design phase. Users are able to register such queries in the Sesame repository as
well. Once registered, the merged data can be retrieved using a contextID that
is defined by a combination of the merged dataset identifiers. The process to
retrieve the data is similar to the one introduced previously, which ensures that
the user gets up-to-date information.
15 See http://www.geotools.org/
16 See http://52north.org/communities/sensorweb/oxf/index.html
17 See http://purl.org/ifgi/om#
18 See http://purl.oclc.org/NET/ssnx/ssn
19 See http://purl.org/ifgi/gml/0.2#
Fig. 4. A SPARQL query to merge RDF-encoded GML features with related O&M
based observation data.
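The role of the shared AbstractFeature class in merging can be sketched without a SPARQL engine. The sketch below is not the query of Figure 4. It uses plain Python over tuples, all names with the ex: prefix are hypothetical, and it assumes that the match between observation and feature has already been established and is stated as an explicit ex:featureOfInterest link.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"
FOI = "ex:featureOfInterest"  # hypothetical link from observation to feature

triples = {
    ("ex:River", SUBCLASS, "ex:AbstractFeature"),
    ("ex:Observation", SUBCLASS, "ex:AbstractFeature"),
    ("ex:river1", TYPE, "ex:River"),
    ("ex:river1", "ex:name", "Ems"),
    ("ex:obs1", TYPE, "ex:Observation"),
    ("ex:obs1", FOI, "ex:river1"),
    ("ex:obs1", "ex:waterLevel", 3.2),
}


def superclasses(graph, cls):
    """All classes reachable from `cls` via rdfs:subClassOf, including `cls`."""
    seen, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def instances_of(graph, cls):
    """Instances of `cls`, taking subclass relations into account."""
    return sorted(s for s, p, o in graph
                  if p == TYPE and cls in superclasses(graph, o))


def merge_feature(graph, feature):
    """Feature properties plus those of observations that refer to the feature."""
    merged = {p: o for s, p, o in graph if s == feature and p != TYPE}
    for obs, p, o in graph:
        if p == FOI and o == feature:
            merged.update({p2: o2 for s2, p2, o2 in graph
                           if s2 == obs and p2 not in (TYPE, FOI)})
    return merged


all_features = instances_of(triples, "ex:AbstractFeature")
river_document = merge_feature(triples, "ex:river1")
```

Both the river and the observation are found as instances of the common superclass, and the merged document carries the feature's name together with the observed water level.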
4 Conclusion
The vision of the Data Web assumes that information is published by following
the Linked Data principles. Providing geospatial data as Linked Data acts as a
bridge between that Web and the existing Geospatial Web. In conjunction with
semantic annotations, it is a move towards the Geospatial Semantic Web
introduced by Egenhofer [7]. It is a solution to bring the benefits of semantics
coupled with RDF data to existing geospatial infrastructures.
In this paper, we discussed the problems of the Geospatial Web and introduced a first solution that addresses these issues with the benefits of the Data Web.
The presented approach targets O&M- and GML-based geospatial data served by a SOS
or a WFS. Describing satellite images, maps or even raster-based data such as
digital terrain models was not covered, but is also required to move the Geospatial Web towards the Geospatial Semantic Web. The introduced Web service
supports the creation of linked sensor data and linked GML-based features by
using ontologies and open source products as common base. We use existing
ontologies for the definition of the linked data entities as it ensures that the
data can be smoothly integrated by data consumers. Furthermore, they provide
the opportunity of merging sensor data with feature properties, since both features and observations are described by the same concept. The approach makes
it possible for users from other communities to gain access to geographic information as well. Since the geospatial data, particularly the sensor data, changes
continuously, the solution also addresses the challenge of keeping the data up-to-date. Geospatial linked data, semantically annotated by including references
to domain ontologies, helps to bridge the vocabulary gap between different domains [19] and supports a more efficient GIR. Querying the translated data with
SPARQL offers a sophisticated way to explore and aggregate information. These
benefits have been illustrated with a scenario. The approach also offers users
a SPARQL query registry for storing individual queries representing their needs.
Future work will target the development of a Web interface which offers the
registration of a POSM and SPARQL queries. The latter may be used to merge
features with related sensor data without the need for a geographic information system.
We will also work on an interface allowing users to retrieve the geospatial
linked data via the registered contextID.
References
1. S. Auer, J. Lehmann, and S. Hellmann. LinkedGeoData: Adding a spatial dimension to the Web of Data. The Semantic Web-ISWC 2009, pages 731–746, 2009.
2. T. Berners-Lee. Linked Data, 2009. Personal view available from
http://www.w3.org/DesignIssues/LinkedData.html.
3. C. Bizer, T. Heath, and T. Berners-Lee. Linked Data-The Story So Far. International Journal on Semantic Web and Information Systems (IJSWIS), 5(3):1–22,
2009.
4. C. Bizer, T. Heath, K. Idehen, and T. Berners-Lee. Linked Data on the Web
(LDOW2008). In Proceeding of the 17th international conference on World Wide
Web, pages 1265–1266, New York, USA, 2008. ACM.
5. D. Brickley and R. Guha. RDF Vocabulary Description Language 1.0: RDF
Schema. W3C Recommendation, World Wide Web Consortium (W3C), 2004. Retrieved from http://www.w3.org/TR/rdf-schema/.
6. S. Cox. OGC Implementation Specification 07-022r1: OpenGIS Observations and
Measurements - Part 1: Observation schema. Technical report, Open Geospatial
Consortium Inc., 2007.
7. M. J. Egenhofer. Toward the Semantic Geospatial Web. In Proceedings of the 10th
ACM international symposium on Advances in geographic information systems,
GIS ’02, pages 1–4. ACM, 2002.
8. M. Gerlek and M. Fleagle. Imaging on the Geospatial Web Using JPEG 2000. In
A. Scharl and K. Tochtermann, editors, The Geospatial Web, pages 27–38. Springer
Verlag, 2007.
9. M. Goodchild. Citizens as sensors: the world of volunteered geography. GeoJournal,
69(4):211–221, 2007.
10. J. Goodwin, C. Dolbear, and G. Hart. Geographical Linked Data: The Administrative Geography of Great Britain on the Semantic Web. Transactions in GIS,
12:19–30, 2008.
11. G. Hart and C. Dolbear. What’s So Special about Spatial? In A. Scharl and
K. Tochtermann, editors, The Geospatial Web, pages 39–44. Springer Verlag, 2007.
12. T. Heath and C. Bizer. Linked Data: Evolving the Web into a Global Data Space.
Synthesis Lectures on the Semantic Web: Theory and Technology, 1(1):1–136, 2011.
13. J. Kahan, M. Koivunen, E. Prud’Hommeaux, and R. Swick. Annotea: An open rdf
infrastructure for shared web annotations. In WWW ’01: Proceedings of the 10th
international conference on World Wide Web, pages 623–632. ACM Press, 2001.
14. R. Lake and J. Farley. Infrastructure for the Geospatial Web. In A. Scharl and
K. Tochtermann, editors, The Geospatial Web, pages 15–26. Springer Verlag, 2007.
15. R. Larson. Geographic Information Retrieval and Spatial Browsing. GIS and
Libraries: Patrons, Maps and Spatial Information, pages 81–124, 1996.
16. P. A. Longley, M. F. Goodchild, D. J. Maguire, and D. W. Rhind. Geographic
Information Systems and Science. John Wiley & Sons, 2005.
17. F. Manola and E. Miller. RDF Primer. W3C Recommendation, World Wide Web
Consortium (W3C), 2004. Retrieved from http://www.w3.org/TR/rdf-primer/.
18. P. Maué, H. Michels, and M. Roth. Injecting semantic annotations into (geospatial)
Web service descriptions. Semantic Web - Interoperability, Usability, Applicability,
1, 2010.
19. P. Maué and J. Ortmann. Getting across information communities. Earth Science
Informatics, 2:217–233, 2009.
20. P. Maué and D. Roman. The ENVISION Environmental Portal and Services Infrastructure. In Proceedings of International Symposium on Environmental Software
Systems (ISESS), 2011. Not yet published.
21. P. Maué, S. Schade, and P. Duchesne. Semantic Annotations in OGC Standards.
Open Geospatial Consortium (OGC), 2008.
22. A. Na and M. Priest. OGC Implementation Specification 06-009r6: OpenGIS Sensor Observation Service (SOS). Technical report, Open Geospatial Consortium
Inc., 2007.
23. OGC. OGC Reference Model (ORM) - Version 2.0, 2008.
24. C. Portele. OGC Implementation Specification 07-036: OpenGIS Geography
Markup Language (GML) Encoding Standard. Technical report, Open Geospatial Consortium Inc., 2007.
25. E. Prud’Hommeaux and A. Seaborne. SPARQL Query Language for RDF.
W3C Recommendation, 2004. Retrieved from http://www.w3.org/TR/rdf-sparql-query/.
26. A. Scharl. Towards the Geospatial Web : Media Platforms for Managing Geotagged
Knowledge Repositories. In A. Scharl and K. Tochtermann, editors, The Geospatial
Web, volume 2, pages 3–14. Springer Verlag, 2007.
27. L. van den Brink, C. Portele, and P. A. Vretanos. OpenGIS Implementation Standard Profile 10-100r2: Geography Markup Language (GML) simple features profile.
Technical report, Open Geospatial Consortium Inc., 2010.
28. P. P. A. Vretanos. OGC Implementation Specification 09-025r1 : OpenGIS Web
Feature Service (WFS). Technical report, Open Geospatial Consortium Inc., 2010.
29. P. Zhao, G. Yu, and L. Di. Geospatial Web Services. In B. N. Hilton, editor,
Emerging Spatial Information Systems and Applications, pages 1–35. Idea Group
Publishing, 2007.