32N1639 DBSG Relational-to-RDF

JTC1 SC32N1639
Relational  RDF
A Mapping Investigation
The Need
• Relational databases hold vast quantities of
enterprise data
• That data is often needed in the context of
another data model
• SQL/XML maps relational data to XDM
(and vice versa)
• No extant standards effort to map relational
data to the RDF data model (or vice versa)
2006-08-22
Copyright © 2006 Oracle Corp.
But Why RDF?
• Provide business data in a form suitable for
taxonomical – and inferential – use
• Allow use of SPARQL[-like] facilities on
such data, allow use of existing inference
engines
• Greater participation in the Semantic Web
• Improve business value of enterprise data
What Does It Mean?
• Literal, physical transformation: probably
rare, lousy cost/benefit ratio
• “Lenses”: allow apps to “see” their
preferred data model by transforming only
the desired data, and only on demand (example:
SQL’s views & Oracle’s materialized views)
• Impedance mismatch: unavoidable, but
careful design can reduce the impact
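The "lens" idea can be sketched as a lazy generator: nothing is transformed until the application iterates, and only the requested columns are turned into triples. This is a minimal illustration, not a proposed design. The function name `lens`, the `Table:Rkey` / `Table:Column` naming, and the sample three-column table are all assumptions made for the sketch.

```python
import sqlite3
from typing import Iterator, Sequence, Tuple

Triple = Tuple[str, str, object]


def lens(conn: sqlite3.Connection, table: str, key: str,
         columns: Sequence[str]) -> Iterator[Triple]:
    """Lazily yield (subject, predicate, object) triples for selected columns.

    Hypothetical helper: identifiers are interpolated directly into the SQL
    text, so a real system would have to validate or quote them first.
    """
    col_list = ", ".join([key, *columns])
    cursor = conn.execute(f"SELECT {col_list} FROM {table} ORDER BY {key}")
    for row in cursor:
        subject = f"{table}:R{row[0]}"  # assumed row-naming scheme
        for name, value in zip(columns, row[1:]):
            yield (subject, f"{table}:{name}", value)


# Sample data (assumed): a three-column table keyed on C1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (C1 INTEGER PRIMARY KEY, C2 TEXT, C3 TEXT)")
conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)",
                 [(100, "x", "Joe"), (450, "y", "Sue")])

# Only column C3 is exposed through the lens; C2 is never transformed.
triples = list(lens(conn, "T1", "C1", ["C3"]))
```

Because the generator wraps an ordinary query, the same approach would work over an SQL view, which is the closest existing analogue named on the slide.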
So, What Do We Need?
• Standardized transformation of arbitrary relational
data into RDF
• Obvious approach: For each row in an SQL table
(or view!), create one RDF triple corresponding to
each column of the row
• Not so simple! What about tables without keys?
What about table metadata?
• Will apps need customized transformations, too?
For Example
Source table T1, with PK (C1):

C1     "C2…Cn-1"   Cn
100    …           Joe
…      …           …
450    …           Sue

Resulting triples (subject, predicate, object):

T1:R100   T1:C1     100
T1:R100   T1:C2     …
…         …         …
T1:R100   T1:Cn-1   …
T1:R100   T1:Cn     Joe
…         …         …
T1:R450   T1:C1     450
T1:R450   T1:C2     …
…         …         …
T1:R450   T1:Cn-1   …
T1:R450   T1:Cn     Sue

Assumes: T1 defined as namespace (e.g., SQL://DB1/SCH1/T1)
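The row-per-column mapping in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rows_to_triples` is hypothetical, rows are modeled as plain dictionaries, the table is cut down to three columns, and the C2 values are placeholders since the slide elides them.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, object]


def rows_to_triples(table: str, pk: str,
                    rows: Iterable[Dict[str, object]]) -> List[Triple]:
    """Emit one triple per column of each row.

    The subject is built from the table prefix and the primary-key value,
    the predicate from the table prefix and the column name.
    """
    triples: List[Triple] = []
    for row in rows:
        subject = f"{table}:R{row[pk]}"
        for column, value in row.items():
            triples.append((subject, f"{table}:{column}", value))
    return triples


# Placeholder data: C2 values are invented for the sketch.
rows = [
    {"C1": 100, "C2": "a", "C3": "Joe"},
    {"C1": 450, "C2": "b", "C3": "Sue"},
]
triples = rows_to_triples("T1", "C1", rows)
```

Note that the sketch depends on a single-column primary key to name each row, which is exactly the assumption that breaks down for tables without keys.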
But…
• Create a “namespace declaration” for T1, or use
blank nodes?
• Need to specify that T1 is a table, that it has n
columns, the names of those columns, the data
types of those columns, what the keys are, etc.
• Tables without keys – how to identify each row?
Most RDBMSs have row IDs, but they are not standardized
• How should foreign keys be handled?
• What about columns of complex types?
• Mindless proliferation of triples!
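One possible way to handle the keyed versus keyless cases is sketched below: build the row subject from the primary-key values when a key exists, and fall back to a blank node otherwise. The function `row_subject`, the underscore-joined composite key, and the use of a per-export ordinal are assumptions for illustration, not something the slides propose. Blank node labels are only meaningful within a single graph, so the fallback does not give rows a stable identity across exports.

```python
from typing import Mapping, Sequence


def row_subject(table: str, row: Mapping[str, object],
                pk_columns: Sequence[str], ordinal: int) -> str:
    """Return a subject identifier for one row.

    With a key: a prefixed name derived from the key values.
    Without a key: a blank node labeled by the row's position in this
    export (hypothetical fallback; not stable between exports).
    """
    if pk_columns:
        key = "_".join(str(row[c]) for c in pk_columns)
        return f"{table}:R{key}"
    return f"_:r{ordinal}"


sample = {"C1": 100, "Cn": "Joe"}
keyed = row_subject("T1", sample, ["C1"], 1)
keyless = row_subject("T1", sample, [], 1)
```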
An Expanded Example
_:t1    RDF:type        SQL:TBL
_:t1    SQL:TblName     T1
_:t1    SQL:NumCols     3
_:t1    SQL:Column      _:c1
_:c1    SQL:ColName     C1
_:c1    SQL:ColType     SQL:INT
_:c1    SQL:ColPos      1
…       …               …
_:t1    SQL:PKnumCols   1
_:t1    SQL:PKcolumn    _:kc1
_:kc1   SQL:PKcolPos    1
_:kc1   SQL:PKcolumn    _:c1

_:t1    SQL:Row         _:r1
_:r1    SQL:PKrow       _:pk1
_:pk1   SQL:PKpos       1
_:pk1   SQL:PKval       100
_:r1    _:c1            100
_:r1    _:c2            …
…       …               …
_:r1    _:cn-1          …
_:r1    _:cn            Joe
_:t1    SQL:rowID       3589122
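The table-level and key-level triples in this example could be generated mechanically from a catalog description. The sketch below reuses the predicate names from the slide (SQL:TblName, SQL:NumCols, SQL:Column, SQL:ColName, SQL:ColType, SQL:ColPos, SQL:PKnumCols, SQL:PKcolumn, SQL:PKcolPos); the function name, the blank-node numbering, and the types given for C2 and C3 are assumptions, since the slide shows only C1.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, object]


def table_metadata_triples(name: str,
                           columns: Sequence[Tuple[str, str]],
                           pk: Sequence[str]) -> List[Triple]:
    """Describe a table, its columns, and its primary key as triples."""
    t = "_:t1"  # blank node standing for the table
    triples: List[Triple] = [
        (t, "RDF:type", "SQL:TBL"),
        (t, "SQL:TblName", name),
        (t, "SQL:NumCols", len(columns)),
    ]
    col_nodes = {}
    for pos, (col_name, col_type) in enumerate(columns, start=1):
        c = f"_:c{pos}"  # blank node standing for the column
        col_nodes[col_name] = c
        triples += [
            (t, "SQL:Column", c),
            (c, "SQL:ColName", col_name),
            (c, "SQL:ColType", f"SQL:{col_type}"),
            (c, "SQL:ColPos", pos),
        ]
    triples.append((t, "SQL:PKnumCols", len(pk)))
    for pos, col_name in enumerate(pk, start=1):
        kc = f"_:kc{pos}"  # blank node standing for the key column entry
        triples += [
            (t, "SQL:PKcolumn", kc),
            (kc, "SQL:PKcolPos", pos),
            (kc, "SQL:PKcolumn", col_nodes[col_name]),
        ]
    return triples


# Column types for C2 and C3 are invented for the sketch.
meta = table_metadata_triples(
    "T1", [("C1", "INT"), ("C2", "VARCHAR"), ("C3", "VARCHAR")], ["C1"])
```

Even for a three-column table with a one-column key this yields 19 metadata triples before any row data is emitted, which gives a concrete sense of the triple proliferation concern.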
My Mission
• Ensure that SQL and XML/XQuery play well
together — SUCCESS (SQL/XML)
• Ensure that SQL and RDF/SPARQL play well
together: initial research in progress, with two aspects:
a) “publish” relational → RDF, and b) embed RDF in
SQL tables (transform SPARQL to the same execution
trees as SQL)
• Can (should!) XML/XQuery and RDF/SPARQL be
made to play well together? More difficult?
• Who should standardize
this? H2/WG3? W3C (e.g.,
DAWG)? OASIS? Others?
• Additional approaches?
• Research not completed!
Should I (and/or Oracle)
continue? [So far, my boss’
answer is “Yes”]
• Lingering unresolved
problems?