Tracking RDF Graph Provenance using RDF Molecules
Li Ding, Tim Finin, Yun Peng and Anupam Joshi
University of Maryland, Baltimore County
Paulo Pinheiro da Silva and Deborah L. McGuinness
Stanford University
[Figure: The Semantic Web knowledge onion]
The role of RDF molecule in tracking provenance
Track provenance at the RDF document level?
Low recall: G2 and G3 are missed.
Track provenance at the RDF triple level?
Low precision: G4 is wrongly matched.
Hence an intermediate granularity, the RDF molecule, is needed.
Lossless RDF graph decompositions
An RDF graph decomposition has three elements
• W, the background ontology
• d(G,W), a function that decomposes an RDF graph G
into sub-graphs Ĝ = {G1, ...,Gn } using W
• m(Ĝ,W), a function that merges all Ĝ's elements into a
unified RDF graph G' using W.
It is lossless iff G G = m(d(G,W),W)
We identified three types of lossless RDF decompositions:
1. Naive decomposition
• triples that share a blank node stay in the same sub-graph
• G1 => (t1) and (t2,t3,t4,t5).
2. Functional decomposition
• use functional dependency semantics
• (W is empty) G1 => (t1) and (t2,t3,t4,t5).
• (W asserts foaf:mbox is an inverse functional property, IFP) G1 => (t1), (t2,t5), (t3,t5), (t4,t5).
3. Heuristic decomposition
• use extended functional dependency semantics
• (W asserts that foaf:firstName + foaf:surname together functionally
determine a person instance) G1 => (t1), (t2,t3,t4), (t3,t4,t5).
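The functional and heuristic cases can be sketched with one simplified routine that treats W as a list of "keys": an IFP is a one-predicate key, and the firstName + surname heuristic is a two-predicate key. This is an illustrative simplification, not the authors' algorithm: it handles only keys whose values are non-blank objects of the blank node, ignores chained grounding and ordinary functional properties, and reuses hypothetical triples t1..t5 and the "_:" blank-node convention, none of which come from the poster.

```python
from collections import defaultdict

def is_blank(term):
    """Assumed convention: blank nodes are strings that start with '_:'."""
    return isinstance(term, str) and term.startswith("_:")

def decompose_with_keys(graph, keys):
    """Split a graph into sub-graphs; blank nodes identified by a key no longer glue triples."""
    # Collect, per blank node, its triples with non-blank objects, by predicate.
    by_node = defaultdict(lambda: defaultdict(set))
    for s, p, o in graph:
        if is_blank(s) and not is_blank(o):
            by_node[s][p].add((s, p, o))

    # A blank node is grounded when it carries every predicate of some key.
    grounding = {}
    for node, preds in by_node.items():
        for key in keys:
            if key and all(p in preds for p in key):
                grounding.setdefault(node, set()).update(
                    t for p in key for t in preds[p])

    def open_blanks(t):
        return [x for x in (t[0], t[2]) if is_blank(x) and x not in grounding]

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Ungrounded blank nodes still connect triples, as in the naive case.
    for t in graph:
        bs = open_blanks(t)
        if len(bs) == 2:
            parent[find(bs[0])] = find(bs[1])

    groups, singles = defaultdict(set), []
    for t in graph:
        bs = open_blanks(t)
        if bs:
            groups[find(bs[0])].add(t)
        else:
            singles.append({t})

    # Each sub-graph carries the grounding triples of the blank nodes it mentions.
    result = set()
    for ts in singles + list(groups.values()):
        closed = set(ts)
        for s, _, o in ts:
            for x in (s, o):
                closed |= grounding.get(x, set())
        result.add(frozenset(closed))
    # Drop sub-graphs fully contained in another one.
    return {m for m in result if not any(m < other for other in result)}

# Hypothetical example graph (not the poster's actual G1).
t1 = ("ex:page", "rdf:type", "foaf:Document")
t2 = ("ex:page", "foaf:maker", "_:x")
t3 = ("_:x", "foaf:firstName", "Alice")
t4 = ("_:x", "foaf:surname", "Smith")
t5 = ("_:x", "foaf:mbox", "mailto:alice@example.org")
G1 = [t1, t2, t3, t4, t5]

naive = decompose_with_keys(G1, [])
functional = decompose_with_keys(G1, [("foaf:mbox",)])
heuristic = decompose_with_keys(G1, [("foaf:firstName", "foaf:surname")])
```

On this hypothetical graph the three calls reproduce the three splits listed above, and the union of each result is again G1. Repeating the grounding triples (t5, or t3 and t4) in several sub-graphs is what lets merge re-identify the blank node.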
RDF molecule and its applications
An RDF molecule of an RDF graph G is the finest, lossless sub-graph
of G given a lossless RDF graph decomposition (W, d, m).
• lossless - it can be used to restore graph G without adding new triples
• finest - it cannot be further decomposed into lossless sub-graphs
Tracking Web provenance with Swoogle
Given an RDF graph, we decompose it into RDF molecules
and track its provenance over 680K online RDF documents
indexed by Swoogle.
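A rough sketch of molecule-level lookup follows. It is not Swoogle's API: an in-memory dictionary stands in for the index, the document URLs and triples are hypothetical, molecules are assumed to be already decomposed, and the blank-node renaming is only valid under the stated assumption that a molecule contains at most one distinct blank node.

```python
from collections import defaultdict

def is_blank(term):
    """Assumed convention: blank nodes are strings that start with '_:'."""
    return isinstance(term, str) and term.startswith("_:")

def canonical(molecule):
    """Rename the blank node to a fixed placeholder so that labels chosen by
    different documents compare equal. Assumes at most one distinct blank node."""
    def norm(x):
        return "_:b" if is_blank(x) else x
    return frozenset((norm(s), p, norm(o)) for s, p, o in molecule)

def build_index(docs):
    """Map each canonical molecule to the set of document URLs that contain it."""
    index = defaultdict(set)
    for url, molecules in docs.items():
        for m in molecules:
            index[canonical(m)].add(url)
    return index

def track(molecules, index):
    """For each query molecule, list the documents that assert it."""
    return [sorted(index.get(canonical(m), ())) for m in molecules]

MBOX = "mailto:alice@example.org"

# Hypothetical documents, each already decomposed into molecules.
docs = {
    "http://example.org/a.rdf": [
        {("_:p1", "foaf:firstName", "Alice"), ("_:p1", "foaf:mbox", MBOX)},
    ],
    "http://example.org/b.rdf": [
        {("_:n7", "foaf:surname", "Smith"), ("_:n7", "foaf:mbox", MBOX)},
    ],
}
index = build_index(docs)

# Molecules of a hypothetical query graph.
query = [
    {("_:x", "foaf:firstName", "Alice"), ("_:x", "foaf:mbox", MBOX)},
    {("_:x", "foaf:surname", "Smith"), ("_:x", "foaf:mbox", MBOX)},
    {("_:x", "foaf:nick", "al"), ("_:x", "foaf:mbox", MBOX)},
]
sources = track(query, index)
```

Here neither document contains the whole query graph, yet two of its three molecules are traced to a source, which is the kind of partial match that document-level comparison would miss.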
Evaluating trustworthiness of aggregated FOAF profile
With the provenance tracking service provided by Swoogle,
we may further compute trust in each piece of a fused FOAF profile.
Finding additional (partial) explanations in Inference Web
Given an expression encoded as an RDF graph, we may search
for proofs whose conclusions imply corresponding molecules, and
then generate additional (partial) explanations of the expression.
Partial research support was provided by DARPA contract F30602-00-0591 and by NSF awards NSF-ITR-IIS-0326460 and NSF-ITR-IDM-0219649.