Time to Leave the Trees: From Syntactic
to Conceptual Querying of XML
Bertram Ludäscher
Ilkay Altintas
Amarnath Gupta
San Diego Supercomputer Center
U.C. San Diego
XMLDM'02, Prague
Overview
• Motivating Example:
– querying XML w/o and w/ conceptual-level information
– “syntactic” vs. “conceptual” querying of XML
• Distilling conceptual-level information:
– MXS (abstract Model for XML Schema)
• XPathT:
– Incorporating conceptual-level information in XPath
Motivating Example
• Example: “Books DB” (yes, more complex examples exist... ;)
– elements: <myDB> ... <book> .... <price> .... <author> ...
• Sample Queries:
– Q1: Which <book>s have a <price> below $80?
– Q2: What’s the count and average <price> of <book>s?
• (Nice) Try:
– Q1: myDB//book[price<80]
– Q2: N := count(myDB//book); S := sum(myDB//book/price);
Avg := S/N;
• But what about ...
– ... <book>s with multiple <price>s?
– ... <awe> (award-winning-exemplars) elements (= subtype of
book having subelement <award>): we forgot those!
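The pitfalls above can be reproduced in a few lines. This is a minimal sketch in Python using the standard-library xml.etree.ElementTree; the sample document, its titles and prices, and the helper name cheap are made up for illustration. Only the tag names <myDB>, <book>, <price>, <award>, and <awe> come from the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: one plain book, one book with two prices, and one
# <awe> element (an award-winning book that does not use the <book> tag).
DOC = """
<myDB>
  <book><title>A</title><price>50</price></book>
  <book><title>B</title><price>70</price><price>90</price></book>
  <awe><title>C</title><price>60</price><award>X</award></awe>
</myDB>
"""

root = ET.fromstring(DOC)


def cheap(elements, limit=80):
    """Elements with at least one <price> child below the limit."""
    return [e for e in elements
            if any(float(p.text) < limit for p in e.findall("price"))]


# Q1, syntactic attempt: only elements literally tagged <book> are considered.
q1_titles = [e.findtext("title") for e in cheap(root.iter("book"))]

# Q2, syntactic attempt: count the books, sum all their prices, divide.
n = len(list(root.iter("book")))
s = sum(float(p.text) for b in root.iter("book") for p in b.findall("price"))
avg = s / n

print(q1_titles)  # title C is absent: the <awe> element was never inspected
print(n, s, avg)  # book B contributes two prices to S but only one to N
```

Both results are syntactically valid answers, yet neither matches the intent of the question: Q1 silently drops a qualifying book, and Q2 divides a sum over three prices by a count of two books.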
Schema Information to the Rescue!
• XML & Semistructured Data Model:
– labeled ordered trees
– “instance contains its own schema information”
– XML instances and DTDs have very little “schema info”:
• tag names (aka element “types”) = attribute names
• element nesting = object (“slot”) structure
=> no data types, constraints, classes, class hierarchy, ...
• Schemas are Good for You!
– link to conceptual models/DB design, query formulation,
– validation, storage layout (optimization),
– query processing (optimization), ...
=> XML Schema
Motivating Example (Cont’d)
• Q1 after studying <myDB> and/or its XML Schema:
=> there is a type hierarchy below type bookT
=> tag names are bound to those types
=> but XPath doesn’t know this => use Syntactic Queries:
//*[book OR tbook OR cbook OR ... OR awe][price<80]
=> tedious and error-prone (do-it-yourself: Appendix A)
– e.g. you overlooked <publication xsi:type="bookT"> !
(usually schema info is not contained in the XML instance)
=> small changes in the schema (adding a new subtype)
require rewriting of your query...
From Syntactic to Conceptual XML Queries
1. Distill conceptual information from the XML Schema
=> Abstract Model of XML Schema (MXS)
2. Incorporate MXS information into the query language
=> XPathT (“XPath with types/classes”)
=> turn Syntactic XML Query
//*[book OR tbook OR cbook OR ... OR awe][price<80]
=> into a more adequate Conceptual XML Query:
//*[ts(bookT)][price<80]
/* works for any subtype of bookT */
=> more robust w.r.t. schema changes
=> new opportunities for semantic query optimization
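The step from a conceptual query back to a syntactic one can be pictured as a plain string rewriting. The sketch below is an illustration in Python, not the authors' implementation: the function name to_syntactic and the hand-written table TAGS_BELOW are assumptions, and the table stands in for information that would be derived from the schema.

```python
import re

# Hypothetical table: tag names bound to bookT or any of its subtypes.
# In the framework this would be derived from the schema, not typed by hand.
TAGS_BELOW = {"bookT": ["book", "tbook", "cbook", "awe"]}


def to_syntactic(query, tags_below):
    """Replace each [ts(T)] qualifier by a disjunction of tag names."""
    def expand(match):
        return "[" + " OR ".join(tags_below[match.group(1)]) + "]"
    return re.sub(r"\[ts\((\w+)\)\]", expand, query)


conceptual = "//*[ts(bookT)][price<80]"
print(to_syntactic(conceptual, TAGS_BELOW))

# A schema change (a new subtype bound to a new tag) only changes the table;
# the conceptual query itself stays untouched.
extended = {"bookT": TAGS_BELOW["bookT"] + ["ebook"]}
print(to_syntactic(conceptual, extended))
```

The same conceptual query yields a different syntactic query after the schema change, which is the robustness argument made above: the expansion is regenerated by the system rather than maintained by hand.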
Abstract Model of XML Schema (MXS)
• Basic Ideas:
– Formal abstract model (never mind the XML Schema syntax!),
inspired by Model Schema Language (MSL)
[Brown-Fuchs-Robie-Wadler-WWW10-2001]
– “Types as Classes”
• XML Schema Names:
– T: Type Names
– E: Element Names
– A: Attribute Names
• XML Instances...
– ... usually contain only element names (tags) E and attributes A
(exception: “xsi:type = ...”)
Abstract Model of XML Schema (MXS)
• MXS Names
– T: Types, E: Elements, A: Attributes
• Kinds of Types
– simple vs. complex: T_s, T_c
– abstract vs. concrete: T_a, T_na
• Type Hierarchy
– restrict ⊆ (T_s × T_s) ∪ (T_c × T_c)
• restricts possible instances, keeping structure
– extend ⊆ (T_s ∪ T_c) × T_c
• adds “slots” (elements and attributes)
– subtype = extend ∪ restrict
• extend and restrict are subtyping mechanisms
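The relations above can be written down directly as sets of pairs. The following Python sketch reads restrict as a subset of (T_s × T_s) ∪ (T_c × T_c) and extend as a subset of (T_s ∪ T_c) × T_c, and stores each fact as a (supertype, subtype) pair. The type name priceT, the concrete facts, and the helper names are assumptions made for illustration; publicationT, bookT, expTextBookT, and expPriceT are names used in the talk.

```python
# Kinds of types (priceT is a hypothetical name).
T_s = {"priceT", "expPriceT"}                     # simple types
T_c = {"publicationT", "bookT", "expTextBookT"}   # complex types

# Hierarchy facts as (supertype, subtype) pairs; assumed for illustration.
restrict = {("priceT", "expPriceT"), ("bookT", "expTextBookT")}
extend = {("publicationT", "bookT"), ("bookT", "expTextBookT")}


def well_formed(restrict, extend, simple, complex_):
    """Check the typing constraints on the restrict and extend relations."""
    ok_restrict = all(
        (a in simple and b in simple) or (a in complex_ and b in complex_)
        for a, b in restrict
    )
    ok_extend = all(
        (a in simple or a in complex_) and b in complex_ for a, b in extend
    )
    return ok_restrict and ok_extend


def transitive_closure(pairs):
    """Smallest transitive relation containing the given pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


subtype = extend | restrict               # subtype = extend ∪ restrict
subtype_plus = transitive_closure(subtype)

print(well_formed(restrict, extend, T_s, T_c))
print(sorted(subtype_plus))
```

Restricting a simple type to a complex type, for example the pair ("priceT", "bookT"), would be rejected by well_formed, since restriction keeps the structure of the type.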
Type (Class) Hierarchy in XML Schema
• Convention: user-defined type names end with “T”
– authorT, publicationT, bookT, ...
Inheritance in XML Schema (I)
[Figure: type hierarchy diagram; edge labels: EXTEND, SUBTYPE, RESTRICT]
expTextBookT ::= SUBTYPE (bookT) that RESTRICTs <price> to
expPriceT and EXTENDs with <recommended_for>
Inheritance in XML Schema (II)
[Figure: type hierarchy diagram; labels: multiple inheritance, single inheritance]
19thcenturyTextBookType ::= SUBTYPE {textBookT, c19bookT}
The XML Schema type system does not know that the two are equivalent!
Framework for Conceptual Queries in XML
• Binding Types to Elements
– bind ⊆ (E × (T_s ∪ T_c)) ∪ (A × T_s)
• binds element names to simple or complex types
• binds attribute names to simple types
• Syntactic XML Instance: D
– root(NodeId), child(NodeId,Integer,NodeId),
tag(NodeId,Tagname), data(NodeId,Data)
• Conceptual XML Instance: D+
– restrict(T, T), extend(T, T), subtype(T, T),
– bind(E  T, T)
– ...
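To make the syntactic instance D concrete, the sketch below flattens a small document into root, child, tag, and data facts using Python's standard-library xml.etree.ElementTree. Several details are assumptions: node ids are assigned in document order starting at 1, child facts are read as (parent, position, child), and text is attached directly to its element node rather than to a separate text node. The function name to_relations and the sample document are illustrative.

```python
import xml.etree.ElementTree as ET
from itertools import count


def to_relations(xml_text):
    """Flatten an XML document into root/child/tag/data facts."""
    ids = count(1)
    facts = {"root": [], "child": [], "tag": [], "data": []}

    def visit(elem):
        node = next(ids)                      # ids follow document order
        facts["tag"].append((node, elem.tag))
        text = (elem.text or "").strip()
        if text:                              # simplification: text on element
            facts["data"].append((node, text))
        for position, sub in enumerate(elem, start=1):
            facts["child"].append((node, position, visit(sub)))
        return node

    facts["root"].append((visit(ET.fromstring(xml_text)),))
    return facts


D = to_relations(
    "<myDB><book><price>50</price></book><awe><price>60</price></awe></myDB>"
)
for name, rows in D.items():
    print(name, sorted(rows))
```

The conceptual instance D+ would add the schema-level facts (restrict, extend, subtype, bind) to these four relations, so that a query can refer to both tree structure and types.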
XPathT: Incorporating Type (Class)
Information in XPath
• XPath patterns p and qualifiers q:
p[q] returns matches of p which qualify according to q
• New XPathT patterns:
• r(t), e(t), s(t): restrict, extend, subtype type t
• tr(t), te(t), ts(t): transitive versions
Semantics of XPathT
• Example:
“transitive subtype”:
SEM( ts(t) ) := { t’ | subtype*(t,t’) }
from types to element names:
SEM( [T] ) := { e | bind(t,e), t ∈ T }
SEM( [ts(bookT)] ) := {book, ebook, tbook, ...}
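The two definitions translate almost literally into set computations. In this Python sketch, subtype* is taken to be the reflexive-transitive closure (so that bookT itself is included, as the book tag in the example suggests), subtype facts are (supertype, subtype) pairs, and bind facts are (type, element name) pairs as in bind(t,e). The type name c19TextBookT, the tag ctbook, and the pairing of types with tags are assumptions for illustration.

```python
def ts(t, subtype):
    """SEM(ts(t)): all t' reachable from t via subtype, including t itself."""
    seen, todo = {t}, [t]
    while todo:
        current = todo.pop()
        for sup, sub in subtype:
            if sup == current and sub not in seen:
                seen.add(sub)
                todo.append(sub)
    return seen


def elements_of(types, bind):
    """SEM([T]): element names e with bind(t, e) for some t in T."""
    return {e for (t, e) in bind if t in types}


# Hypothetical schema facts; c19TextBookT inherits from two supertypes.
subtype = {
    ("bookT", "textBookT"),
    ("bookT", "c19bookT"),
    ("textBookT", "c19TextBookT"),
    ("c19bookT", "c19TextBookT"),
}
bind = {
    ("bookT", "book"),
    ("textBookT", "tbook"),
    ("c19bookT", "cbook"),
    ("c19TextBookT", "ctbook"),
    ("authorT", "author"),
}

print(sorted(ts("bookT", subtype)))
print(sorted(elements_of(ts("bookT", subtype), bind)))
```

The author tag is not returned for ts(bookT), because authorT is not reachable from bookT in the subtype relation.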
Conceptual(-level) XML Queries in XPathT
• Which books have price below $80?
//*[ts(bookT)][price<80]
• Semantic-aware equivalent rewriting:
//*[ts(bookT)][NOT ts(expTextBookT)][price<80]
• Logic XPathT Query Plan:
[Figure: query plan; labels: conceptual information, tree structure information]
Summary
• Complex domains require conceptual level modeling and querying
capabilities beyond just tree structure
• Status quo: XML Schema is a simple “conceptual model” with many
ad-hoc “design decisions”/restrictions
=> Abstract Model of XML Schema (MXS)
=> XPathT: first step towards “conceptual” or “semantic” XML query
language extensions
=> more concise, intuitive, flexible, and robust queries
=> the system maps conceptual to syntactic queries, not the
programmer/query designer!
Next Steps & Outlook
• extend MXS to include more conceptual information
• develop formal semantics
– XPathT, extensions: XPathC, XQueryC
• research problems:
– mapping: XPathC queries => equivalent XPath queries
– formalize equivalence: is such a mapping always possible? If so,
conventional XML query processors can be used!
– “proxy XML Schema doc”: instead of rewriting into XPath over
the original instance, can one materialize some conceptual info
as a “proxy XML doc” such that conceptual queries become
conventional queries against the proxy?
– semantic query optimization: equivalent rewritings given the
conceptual level constraints
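One naive reading of the “proxy XML doc” idea can be sketched as follows; it is an illustration of the question, not an answer to it. Each element is annotated with a marker attribute for its bound type and every supertype, so that a conventional attribute test replaces ts(bookT). The attribute naming scheme is_<type>, the table SUPERTYPES, the type name aweT, the <cd> element, and the function name make_proxy are all assumptions. The price comparison is done in Python because xml.etree.ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: for each tag, its bound type and all supertypes.
SUPERTYPES = {"book": ["bookT"], "awe": ["aweT", "bookT"]}


def make_proxy(root, supertypes):
    """Annotate every element with one marker attribute per (super)type."""
    for elem in root.iter():
        for t in supertypes.get(elem.tag, []):
            elem.set("is_" + t, "1")
    return root


root = make_proxy(
    ET.fromstring(
        "<myDB>"
        "<book><price>50</price></book>"
        "<awe><price>60</price></awe>"
        "<cd><price>10</price></cd>"
        "</myDB>"
    ),
    SUPERTYPES,
)

# Conventional query against the proxy: select by marker attribute.
books = root.findall(".//*[@is_bookT='1']")
cheap = [b.tag for b in books
         if any(float(p.text) < 80 for p in b.findall("price"))]
print(cheap)
```

The <cd> element is cheap but carries no bookT marker, so it is excluded, while <awe> is found without the query naming its tag.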