Foundations of Semantic Web Databases
Gutierrez, Hurtado and Mendelzon
summary by: Nir Zepkowitz
1 Background
Currently, the web is a huge collection of interconnected data. However, the web lacks semantic
information, so managing and processing the data is hard. The idea of the Semantic Web is to
build an infrastructure of machine-readable semantics for the data on the web.
In the words of others: "If HTML and the Web made all the online documents look like one huge
book, RDF, schema, and inference languages will make all the data in the world look like one
huge database." Tim Berners-Lee, Weaving the Web, 1999.
In 1998 the W3C proposed the language that would be the basis for that infrastructure – the
Resource Description Framework (RDF).
Query languages for RDF were developed side by side with RDF. Nevertheless, little research
has been conducted on the foundations of RDF and its query languages. Such research is
necessary because querying RDF graphs raises new features that do not arise in standard
databases, and this is one of the motivations for the article.
2 Paper Goals
Study formal aspects of querying databases containing RDF data.
Introduce a new notion of normal form for RDF graphs.
Give a formal definition of a query language for RDF.
Investigate theoretical and complexity aspects related to query processing and redundancy.
3 The RDF Model
Notations:
U – RDF URI references.
B – blank nodes (similar to variables that we saw in TATA).
L – RDF literals.
RDF triple: an element (v1, v2, v3) of (U ∪ B) × U × (U ∪ B ∪ L).
v1 – subject, v2 – predicate, v3 – object (drawn as an arc labeled v2 from node v1 to node v3).
Definitions:
Graph is a set of triples.
Universe(G) – the set of UBL (U ∪ B ∪ L) elements that appear in a triple of G.
Vocabulary of G – universe(G) ∩ (U ∪ L).
A graph is ground if it has no blank nodes.
Map: a function μ: UBL -> UBL preserving URIs and literals (μ(u) = u for every u in U ∪ L).
μ(G) – the set of triples (μ(s), μ(p), μ(o)) s.t. (s,p,o) is in G.
μ is consistent with G if μ(G) is an RDF graph.
In this case we call μ(G) an instance of G.
An instance is proper if μ(G) has fewer blank nodes than G.
G1, G2 are isomorphic (G1 ≅ G2) if there are maps μ1, μ2 s.t. μ1(G1) = G2 and
μ2(G2) = G1.
Union of graphs (G1 ∪ G2) is the union of their triples.
Merge of graphs (G1 + G2) is G1 ∪ G2’, where G2’ is isomorphic to G2 and its blank
nodes are disjoint from those of G1 (the blank nodes of the two graphs are treated as unrelated).
G is lean if there is no map μ s.t. μ(G) is a proper sub-graph of G. The intuition is
that a lean graph cannot be “minimized”; the lean sub-graph is the essence of the
graph.
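The definitions above can be tried out with a minimal Python sketch. It assumes a representation that is not part of the paper: a graph is a set of 3-tuples of strings, and blank nodes are the strings that start with "_:". The helper names (apply_map, merge, ...) and the sample triples are hypothetical, and the sketch does not check that μ(G) is a well-formed RDF graph.

```python
def is_blank(term):
    # Assumed convention: blank nodes are strings starting with "_:".
    return term.startswith("_:")

def blank_nodes(graph):
    return {t for triple in graph for t in triple if is_blank(t)}

def apply_map(mu, graph):
    # mu(G): only blank nodes may be remapped, so URIs and literals are preserved.
    def f(t):
        return mu.get(t, t) if is_blank(t) else t
    return {(f(s), f(p), f(o)) for (s, p, o) in graph}

def is_proper_instance(mu, graph):
    # An instance is proper if it has fewer blank nodes than the original graph.
    return len(blank_nodes(apply_map(mu, graph))) < len(blank_nodes(graph))

def merge(g1, g2):
    # G1 + G2: rename the blank nodes of g2 that clash with g1, then take the union.
    taken = blank_nodes(g1) | blank_nodes(g2)
    renaming, counter = {}, 0
    for b in sorted(blank_nodes(g1) & blank_nodes(g2)):
        while f"_:m{counter}" in taken:
            counter += 1
        renaming[b] = f"_:m{counter}"
        taken.add(renaming[b])
    return g1 | apply_map(renaming, g2)

# Made-up sample data.
G = {("_:x", "p", "c"), ("b", "p", "c")}
mu = {"_:x": "b"}
instance = apply_map(mu, G)
merged = merge({("_:x", "p", "a")}, {("_:x", "q", "b")})
```

Here the instance collapses the blank node onto b, so it is a proper instance and a proper sub-graph of G (G is not lean). The merge keeps the two occurrences of "_:x" apart, whereas a plain union would identify them.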
3.1 Example of RDF graph
3.2 RDFS
RDFS (RDF Schema) extends RDF with a vocabulary of classes and properties that may be used for
describing groups of resources and relationships between resources. This model supports
reification (making statements about statements), typing and inheritance.
For example, RDFS defines the predicate sc (subclass), and this property comes with rules.
For instance, if A is sc of B and B is sc of C, then A is sc of C.
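As an illustration of the subclass transitivity rule, here is a small Python sketch that applies the rule until no new triple appears. It assumes a graph is a set of 3-tuples of strings and that the subclass predicate is written "sc"; the function name sc_closure is hypothetical, and the other RDFS rules are not covered.

```python
def sc_closure(graph):
    # Repeatedly apply (a,sc,b), (b,sc,c) -> (a,sc,c) until a fixpoint is reached.
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        sc_pairs = [(s, o) for (s, p, o) in closed if p == "sc"]
        for (a, b) in sc_pairs:
            for (b2, c) in sc_pairs:
                if b == b2 and (a, "sc", c) not in closed:
                    closed.add((a, "sc", c))
                    changed = True
    return closed

# Made-up sample data: a chain of three subclass statements.
chain = {("A", "sc", "B"), ("B", "sc", "C"), ("C", "sc", "D")}
closed_chain = sc_closure(chain)
```

For the chain above the rule adds (A,sc,C), (B,sc,D) and (A,sc,D), so the result has six triples.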
3.3 Core(G)
Theorem: each RDF graph G contains a unique lean sub-graph which is an instance of G.
We will denote this unique sub-graph: core(G).
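A brute-force way to see what core(G) looks like on tiny graphs is sketched below in Python. This is not the paper's method: it simply tries every assignment of blank nodes to elements of the graph and keeps shrinking while some map sends the graph to a proper sub-graph of itself. Graphs are assumed to be sets of 3-tuples of strings with blank nodes starting with "_:", and the function names are hypothetical. The search is exponential in the number of blank nodes.

```python
from itertools import product

def is_blank(term):
    # Assumed convention: blank nodes are strings starting with "_:".
    return term.startswith("_:")

def shrink_once(graph):
    # Look for a map mu such that mu(G) is a proper sub-graph of G.
    blanks = sorted({t for tr in graph for t in tr if is_blank(t)})
    universe = sorted({t for tr in graph for t in tr})
    for image in product(universe, repeat=len(blanks)):
        mu = dict(zip(blanks, image))
        mapped = {tuple(mu.get(t, t) for t in tr) for tr in graph}
        if mapped < graph:
            return mapped
    return None  # no such map: the graph is lean

def core(graph):
    # Shrink until the graph is lean; the result is a lean sub-graph
    # that is an instance of the input (a composition of maps).
    current = set(graph)
    while True:
        smaller = shrink_once(current)
        if smaller is None:
            return current
        current = smaller

# Made-up sample data: the blank-node triple is redundant.
G = {("_:x", "p", "b"), ("a", "p", "b")}
core_G = core(G)
```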
4 Semantics of RDF graphs
Theorem*: Let G1, G2 be simple graphs (graphs that do not use predefined semantics such as RDFS
classes and predefined properties). G1 entails G2 (G1╞ G2) iff there is a map from G2 to G1,
i.e. a map μ s.t. μ(G2) is a sub-graph of G1.
For example the graph that was presented before entails this graph:
G1 and G2 are equivalent (G1≡G2) if G1╞ G2 and G2╞ G1.
Theorem: if G is simple (RDF model), then core(G) is the unique (up to isomorphism) minimal
(w.r.t. the number of triples) graph equivalent to G.
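Theorem* turns simple entailment into a search for a map, which the following Python sketch does by exhaustive search. It assumes graphs are sets of 3-tuples of strings with blank nodes starting with "_:"; the names find_map, entails and equivalent are hypothetical, and the sample triples are made up. The search is exponential in the number of blank nodes of G2.

```python
from itertools import product

def is_blank(term):
    # Assumed convention: blank nodes are strings starting with "_:".
    return term.startswith("_:")

def find_map(g2, g1):
    # Return a map mu (as a dict on blank nodes) with mu(g2) a sub-graph of g1, or None.
    blanks = sorted({t for tr in g2 for t in tr if is_blank(t)})
    universe = sorted({t for tr in g1 for t in tr})
    for image in product(universe, repeat=len(blanks)):
        mu = dict(zip(blanks, image))
        mapped = {tuple(mu.get(t, t) for t in tr) for tr in g2}
        if mapped <= g1:
            return mu
    return None

def entails(g1, g2):
    # Simple entailment: G1 entails G2 iff there is a map from G2 to G1.
    return find_map(g2, g1) is not None

def equivalent(g1, g2):
    return entails(g1, g2) and entails(g2, g1)

# Made-up sample data.
G1 = {("a", "p", "b")}
G2 = {("_:x", "p", "b")}
```

G1 entails G2 (map the blank node to a), but not the other way round, so the two graphs are not equivalent.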
4.1 RDFS Model
There is a sound and complete set of rules for ╞ in graphs with RDFS-vocabulary.
For example: (a,sc,b), (b,sc,c) -> (a,sc,c).
In non-simple graphs we cannot use Theorem* directly, because rules such as transitivity entail
triples that are not explicitly present in the graph.
To avoid the problem we “close” the graph with all possible triples that are entailed by the
existing ones.
A closure of G is a maximal set of triples G’ over universe(G) plus the RDFS-vocabulary s.t. G’
contains G and is equivalent to G.
There is another kind of closure: the RDFS-closure, which is the closure of G under the set of
RDFS rules.
Using this definition we can prove that G1╞ G2 iff there is a map from G2 to the RDFS-closure
of G1.
Notice that from the data representation point of view, both “closure” and “RDFS-closure” may
contain redundancies, so they are not the best choice to work with.
5 Normal forms
The normal form of G, nf(G), is core(G’), where G’ is a closure of G.
In the figure below, the normal form of the right graph is the left one.
For RDF graphs G, G1, G2:
1. nf(G) is unique.
2. G1╞ G2 iff there is a map nf(G2) -> nf(G1).
3. G1≡G2 iff nf(G1) ≅ nf(G2).
Normal forms are not the most compact representation.
A reduction of a graph G is a minimal graph Gr equivalent to G and contained in G.
The authors of the article present an algorithm to compute the reduction of a graph.
The basic idea is to delete triples that can be deduced by RDFS rules,
e.g. (a,sc,c) when (a,sc,b) and (b,sc,c) are present.
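The idea of deleting deducible triples can be sketched in Python for the sc transitivity rule only. This is an illustrative simplification, not the algorithm from the article: it ignores blank nodes and every other RDFS rule, and the function names are hypothetical. Graphs are assumed to be sets of 3-tuples of strings with the subclass predicate written "sc".

```python
def derivable(triple, graph):
    # True if (a, sc, c) follows from `graph` by sc transitivity,
    # i.e. c is reachable from a along sc edges of `graph`.
    a, p, c = triple
    if p != "sc":
        return False
    edges = {}
    for (s, q, o) in graph:
        if q == "sc":
            edges.setdefault(s, set()).add(o)
    stack, seen = [a], set()
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt == c:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def reduce_sc(graph):
    # Greedily drop every sc triple that the remaining triples still entail.
    reduced = set(graph)
    for triple in sorted(graph):
        rest = reduced - {triple}
        if derivable(triple, rest):
            reduced = rest
    return reduced

# Made-up sample data: (A,sc,C) is redundant.
G = {("A", "sc", "B"), ("B", "sc", "C"), ("A", "sc", "C")}
Gr = reduce_sc(G)
```

Each deletion preserves equivalence under the sc rule because the deleted triple can be re-derived from what remains. When the sc edges contain cycles, the result of this greedy pass depends on the order in which triples are visited.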
6 Querying RDF Databases
An RDF graph can be viewed as a standard relational database: a single table in which each
tuple is a triple with the attributes subject, predicate and object.
Variables (disjoint from UBL) will be denoted ?X, ?Y, ?person.
The query language is similar to Datalog, for example:
(?A,creates,?Y) <- (?A,type,Flemish), (?A,paints,?Y), (?Y,exhibited,?Gordon)
We define a tableau as a pair (H,B), where H (the head) and B (the body) are RDF graphs that may
contain variables.
A query is then a tableau (H,B) plus a set of premises P and a set of constraints C, where P is
a graph over UBL and C is a subset of the variables occurring in H.
6.1 Constraints
Constraints allow discriminating between blank and ground nodes in an answer (similar to
IS NOT NULL in SQL). Adding the constraint {?A} means that the variable ?A must be bound to a
non-blank element in each answer to the query.
6.2 Premises
The premise represents information that the user supplies to the database in order to answer the
query. It allows hypothetical analysis.
6.3 Answering a query
6.3.1 Valuation and Matching
A valuation is a function v: V -> UBL, where V is the set of variables. For a set C of
variables, the valuation v satisfies the constraint C if v(x) is not blank for all x in C.
We denote by v(B) the graph obtained after replacing every occurrence of a variable x in B
with v(x).
A matching of a graph B in a DB D is a valuation v s.t. v(B) ⊆ nf(D).
6.3.2 Single answer
Let q=(H,B,P,C) be a query and D a DB. The pre-answer of q over D is the set:
preans(q,D) = {v(H) : v is a matching of B in D+P and v satisfies C}.
A graph v(H) in preans(q,D) is called a single answer of query q over D.
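The definitions of matching and pre-answer can be sketched in Python as a small backtracking evaluator. Several simplifications are assumptions of this sketch and not part of the paper: the data passed in is taken to be nf(D+P) already, the body contains no blank nodes, variables are strings starting with "?", and blank nodes are strings starting with "_:". The function names and the sample database are hypothetical; the query is the Datalog-style example from section 6.

```python
def is_var(term):
    return term.startswith("?")

def is_blank(term):
    return term.startswith("_:")

def matchings(body, data, valuation=None):
    # Yield every valuation v with v(body) contained in data.
    valuation = valuation or {}
    if not body:
        yield dict(valuation)
        return
    first, rest = body[0], body[1:]
    for triple in data:
        v = dict(valuation)
        ok = True
        for pattern, value in zip(first, triple):
            if is_var(pattern):
                # Bind the variable, or check it against its earlier binding.
                if v.setdefault(pattern, value) != value:
                    ok = False
                    break
            elif pattern != value:
                ok = False
                break
        if ok:
            yield from matchings(rest, data, v)

def pre_answers(head, body, data, constraints=()):
    # preans: the set of graphs v(H) for matchings v that satisfy the constraints.
    answers = set()
    for v in matchings(list(body), data):
        if any(is_blank(v[x]) for x in constraints):
            continue
        answers.add(frozenset(tuple(v.get(t, t) for t in tr) for tr in head))
    return answers

# Made-up database, assumed to be in normal form already.
data = {
    ("painter1", "type", "Flemish"), ("painter1", "paints", "work1"),
    ("work1", "exhibited", "museum1"),
    ("_:b", "type", "Flemish"), ("_:b", "paints", "work2"),
    ("work2", "exhibited", "museum1"),
}
head = [("?A", "creates", "?Y")]
body = [("?A", "type", "Flemish"), ("?A", "paints", "?Y"), ("?Y", "exhibited", "?Gordon")]

unconstrained = pre_answers(head, body, data)
constrained = pre_answers(head, body, data, constraints=("?A",))
```

Without constraints there are two single answers, one of them with the blank node as the painter. With the constraint {?A} only the answer for painter1 remains.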
6.3.3 Complex queries
We would like complex queries to be composed from simple ones, so the single answers have to be
combined into one graph. There are two options:
ans∪(q,D) – the union (as a set) of the triples of the single answers. This option is good when
we want blank nodes to play the role of bridges between two queries.
ans+(q,D) – the merge of the single answers: blank nodes are renamed to avoid name clashes
before taking the union of the triples. Good when querying several unrelated DBs.
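The difference between the two options shows up as soon as two single answers share a blank node. The Python sketch below assumes single answers are given as frozensets of 3-tuples of strings, with blank nodes starting with "_:". The function names and the renaming scheme (appending the index of the answer) are hypothetical choices for illustration.

```python
def is_blank(term):
    # Assumed convention: blank nodes are strings starting with "_:".
    return term.startswith("_:")

def ans_union(single_answers):
    # Union: shared blank nodes stay shared across single answers.
    return set().union(*single_answers)

def ans_merge(single_answers):
    # Merge: blank nodes of each single answer get a distinct suffix first.
    merged = set()
    for i, answer in enumerate(sorted(single_answers, key=sorted)):
        def rename(t, i=i):
            return f"{t}_{i}" if is_blank(t) else t
        merged |= {tuple(rename(t) for t in tr) for tr in answer}
    return merged

def blank_nodes(graph):
    return {t for tr in graph for t in tr if is_blank(t)}

# Made-up single answers that share the blank node _:x.
answers = {
    frozenset({("_:x", "knows", "alice")}),
    frozenset({("_:x", "knows", "bob")}),
}
union_graph = ans_union(answers)
merge_graph = ans_merge(answers)
```

In the union, one blank node knows both alice and bob (the blank node acts as a bridge). In the merge, the two statements are about unrelated blank nodes.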
7 Query complexity
We consider simpler versions of the evaluation problem when calculating the query complexity:
Query complexity version: for a fixed DB D, given a query q, is q(D) non-empty?
Data complexity version: for a fixed query q, given a DB D, is q(D) non-empty?
Theorem: the evaluation problem is NP-complete for the query complexity version and
polynomial for the data complexity version.
We can show that the size of the set of answers of a query q over a DB D is bounded by |D|^|q|,
where |D| is the size of the normal form of D and |q| is the number of symbols in the query.