Time to Leave the Trees: From Syntactic to Conceptual Querying of XML
Bertram Ludäscher
Ilkay Altintas
Amarnath Gupta
San Diego Supercomputer Center
U.C. San Diego
XMLDM'02, Prague
Overview
• Motivating Example:
– querying XML w/o and w/ conceptual-level information
– “syntactic” vs. “conceptual” querying of XML
• Distilling conceptual-level information:
– MXS (abstract Model for XML Schema)
• XPathT:
– Incorporating conceptual-level information in XPath
Motivating Example
• Example: “Books DB” (yes, more complex examples exist... ;)
– elements: <myDB> ... <book> .... <price> .... <author> ...
• Sample Queries:
– Q1: Which <book>s have a <price> below $80?
– Q2: What’s the count and average <price> of <book>s?
• (Nice) Try:
– Q1: myDB//book[price<80]
– Q2: N := count(myDB//book); S := sum(myDB//book/price);
Avg := S/N;
• But what about ...
– ... <book>s with multiple <price>s?
– ... <awe> (award-winning-exemplars) elements (= subtype of
book having subelement <award>): we forgot those!
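To make the two pitfalls concrete, here is a minimal Python sketch that evaluates Q1 and Q2 by hand with the standard-library `xml.etree.ElementTree` module. Its XPath subset cannot express `price<80`, so the comparison is done in Python. The sample document and its values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical "Books DB": book A has two <price>s, and C is an <awe>
# (award-winning exemplar), i.e. a book stored under a different tag.
DOC = """<myDB>
  <book><title>A</title><price>50</price><price>70</price></book>
  <book><title>B</title><price>90</price></book>
  <awe><title>C</title><price>40</price><award>X</award></awe>
</myDB>"""

def q1_naive(root):
    """myDB//book[price<80]: true if SOME <price> child is below 80 (XPath semantics)."""
    return [b.findtext("title") for b in root.iter("book")
            if any(float(p.text) < 80 for p in b.findall("price"))]

def q2_naive(root):
    """N := count(//book); S := sum(//book/price); Avg := S/N."""
    books = list(root.iter("book"))
    n = len(books)
    s = sum(float(p.text) for b in books for p in b.findall("price"))
    return n, s / n

root = ET.fromstring(DOC)
print(q1_naive(root))  # ['A']        (the <awe> element C is never considered)
print(q2_naive(root))  # (2, 105.0)   (three prices summed, divided by two books)
```

Q1 silently drops the `<awe>` element, and Q2 divides a sum over three prices by a count of two books, so its result is not a per-book average.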
Schema Information to the Rescue!
• XML & Semistructured Data Model:
– labeled ordered trees
– “instance contains its own schema information”
– XML instances and DTDs have very little “schema info”:
• tag names (aka element “types”) = attribute names
• element nesting = object (“slot”) structure
no data types, constraints, classes, class hierarchy, ...
• Schemas are Good for You!
– link to conceptual models/DB design, query formulation,
– validation, storage layout (optimization),
– query processing (optimization), ...
XML Schema
Motivating Example (Cont’d)
• Q1 after studying <myDB> and/or its XML Schema:
there is a type hierarchy below type bookT
tag names are bound to those types
but XPath doesn’t know this => use Syntactic Queries:
//*[self::book or self::tbook or self::cbook or ... or self::awe][price<80]
tedious and error-prone (do-it-yourself: Appendix A)
– e.g., you overlooked <publication xsi:type=“bookT”>!
(usually schema info not contained in the XML instance)
small changes in the schema (adding a new subtype)
require rewriting of your query...
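The hand-written syntactic query can be sketched in the same style. This is an illustration under assumptions: the tag list, the sample instance, and the helper names are hypothetical, and only the literal value `bookT` is checked for `xsi:type`.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance: three tagged books plus one element typed only via xsi:type.
DOC = f"""<myDB xmlns:xsi="{XSI}">
  <book><title>A</title><price>50</price></book>
  <tbook><title>B</title><price>60</price></tbook>
  <awe><title>C</title><price>40</price><award>X</award></awe>
  <publication xsi:type="bookT"><title>D</title><price>30</price></publication>
</myDB>"""

# Hand-maintained disjunction of tag names; it must be edited by hand
# whenever the schema gains a new subtype of bookT.
BOOK_TAGS = {"book", "tbook", "cbook", "awe"}

def has_cheap_price(elem, limit=80):
    """[price<limit] with XPath's existential semantics over <price> children."""
    return any(float(p.text) < limit for p in elem.findall("price"))

def q1_syntactic(root, check_xsi_type=False):
    hits = []
    for e in root.iter():
        is_book = e.tag in BOOK_TAGS
        if check_xsi_type:
            # ElementTree exposes xsi:type under its expanded name {namespace}type.
            is_book = is_book or e.get(f"{{{XSI}}}type") == "bookT"
        if is_book and has_cheap_price(e):
            hits.append(e.findtext("title"))
    return hits

root = ET.fromstring(DOC)
print(q1_syntactic(root))                       # ['A', 'B', 'C']
print(q1_syntactic(root, check_xsi_type=True))  # ['A', 'B', 'C', 'D']
```

Even with the `xsi:type` check, an element carrying `xsi:type="textBookT"` would still be missed. Real `xsi:type` values are also QNames that must be resolved against the in-scope namespaces; the sketch compares plain strings.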
From Syntactic to Conceptual XML Queries
1. Distill conceptual information from the XML Schema
Abstract Model of XML Schema (MXS)
2. Incorporate MXS information into the query language
XPathT (“XPath with types/classes”)
turn Syntactic XML Query
//*[self::book or self::tbook or self::cbook or ... or self::awe][price<80]
into a more adequate Conceptual XML Query:
//*[ts(bookT)][price<80]
/* works for any subtype of bookT */
more robust w.r.t. schema changes
new opportunities for semantic query optimization
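The two steps can be sketched as a small rewriter that computes the transitive subtype closure of `bookT` and emits the equivalent syntactic XPath 1.0 query. A minimal sketch, assuming a hypothetical hierarchy and `bind` table (the type `c19bookT` appears in the deck; `aweT`, `etbook`, and the other entries are illustrative), and assuming each element name is bound to exactly one type.

```python
from collections import deque

# Hypothetical schema facts: direct subtypes (supertype -> subtypes) and
# bind(element, type); each element name is bound to exactly one type here.
SUBTYPE = {
    "bookT": ["textBookT", "c19bookT", "aweT"],
    "textBookT": ["expTextBookT"],
}
BIND = [("book", "bookT"), ("tbook", "textBookT"), ("cbook", "c19bookT"),
        ("awe", "aweT"), ("etbook", "expTextBookT"), ("author", "authorT")]

def ts(t):
    """t and every type reachable from it via subtype* (breadth-first)."""
    seen, todo = {t}, deque([t])
    while todo:
        for sub in SUBTYPE.get(todo.popleft(), []):
            if sub not in seen:
                seen.add(sub)
                todo.append(sub)
    return seen

def names(types):
    """Element names bound to some type in `types`, sorted for stable output."""
    return sorted(e for e, t in BIND if t in types)

def to_xpath(t, qualifier):
    """Map the conceptual query //*[ts(t)][qualifier] to plain XPath 1.0."""
    tests = " or ".join(f"self::{e}" for e in names(ts(t)))
    return f"//*[{tests}][{qualifier}]"

print(to_xpath("bookT", "price<80"))
# //*[self::awe or self::book or self::cbook or self::etbook or self::tbook][price<80]
```

Adding a new subtype only changes `SUBTYPE` and `BIND`; the conceptual query `ts(bookT)` stays the same, and the generated XPath picks up the new element name automatically.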
Abstract Model of XML Schema (MXS)
• Basic Ideas:
– Formal abstract model (never mind the XML Schema syntax!),
inspired by the Model Schema Language (MSL)
[Brown-Fuchs-Robie-Wadler-WWW10-2001]
– “Types as Classes”
• XML Schema Names:
– T: Type Names
– E: Element Names
– A: Attribute Names
• XML Instances...
– ... usually contain only element names (tags) E and attributes A
(exception: xsi:type = ...)
Abstract Model of XML Schema (MXS)
• MXS Names
– T: Types, E: Elements, A: Attributes
• Kinds of Types
– simple vs. complex: T_s, T_c
– abstract vs. concrete: T_a, T_na
• Type Hierarchy
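– $\mathit{restrict} \subseteq (T_s \times T_s) \cup (T_c \times T_c)$
• restricts possible instances, keeping structure
– $\mathit{extend} \subseteq (T_s \cup T_c) \times T_c$
• adds “slots” (elements and attributes)
– $\mathit{subtype} = \mathit{extend} \cup \mathit{restrict}$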
• extend and restrict are subtyping mechanisms
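The kinds of types and the two derivation relations can be captured in a small data structure that checks the kind constraints above and derives `subtype` as their union. This is a hypothetical Python encoding, not the authors' formalization; all type names in the example are illustrative, and pairs are read as (base type, derived type).

```python
from dataclasses import dataclass, field

@dataclass
class MXS:
    """Hypothetical encoding of the MXS type layer (field names are illustrative)."""
    simple_types: set = field(default_factory=set)    # T_s
    complex_types: set = field(default_factory=set)   # T_c
    abstract_types: set = field(default_factory=set)  # T_a; all others are T_na
    restrict: set = field(default_factory=set)        # (base, derived) pairs
    extend: set = field(default_factory=set)          # (base, derived) pairs

    def check(self):
        """Enforce restrict ⊆ (T_s×T_s) ∪ (T_c×T_c) and extend ⊆ (T_s ∪ T_c)×T_c."""
        for b, d in self.restrict:
            if not ({b, d} <= self.simple_types or {b, d} <= self.complex_types):
                raise ValueError(f"restrict({b}, {d}) mixes simple and complex types")
        for b, d in self.extend:
            if b not in self.simple_types | self.complex_types or d not in self.complex_types:
                raise ValueError(f"extend({b}, {d}) must derive a complex type")

    def subtype(self):
        """subtype = extend ∪ restrict"""
        return self.extend | self.restrict

m = MXS(simple_types={"decimal", "priceT", "expPriceT"},
        complex_types={"publicationT", "bookT", "textBookT"},
        abstract_types={"publicationT"},
        restrict={("decimal", "priceT"), ("priceT", "expPriceT")},
        extend={("publicationT", "bookT"), ("bookT", "textBookT")})
m.check()
print(sorted(m.subtype()))
```

Keeping `restrict` and `extend` as separate relations preserves which mechanism produced each edge, while `subtype()` gives the single hierarchy that queries need.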
Type (Class) Hierarchy in XML Schema
• Convention: user-defined type names end with “T”
– authorT, publicationT, bookT, ...
Inheritance in XML Schema (I)
[Figure: inheritance diagram with EXTEND, SUBTYPE, and RESTRICT edges]
expTextBookT ::= SUBTYPE (bookT) that RESTRICTs <price> to
expPriceT and EXTENDs with <recommended_for>
Inheritance in XML Schema (II)
[Figure: single vs. multiple inheritance]
19thcenturyTextBookType ::= SUBTYPE {textBookT, c19bookT}
The XML Schema type system does not know that the two are equivalent!
Framework for Conceptual Queries in XML
• Binding Types to Elements
– $\mathit{bind} \subseteq (E \times (T_s \cup T_c)) \cup (A \times T_s)$
• binds element names to simple or complex types
• binds attribute names to simple types
• Syntactic XML Instance: D
– root(NodeId), child(NodeId,Integer,NodeId),
tag(NodeId,Tagname), data(NodeId,Data)
• Conceptual XML Instance: D+
– restrict(T, T), extend(T, T), subtype(T, T),
– bind(E T, T)
– ...
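The syntactic instance D can be obtained by shredding a document into the four relations. A minimal sketch, assuming node ids assigned in document (pre-)order and 1-based child positions, and ignoring attributes and mixed-content tail text (details the slide does not fix):

```python
import xml.etree.ElementTree as ET
from itertools import count

def shred(xml_text):
    """Encode an XML tree as the relations root/1, child/3, tag/2, data/2."""
    facts = {"root": [], "child": [], "tag": [], "data": []}
    next_id = count(1)  # node ids in document (pre-)order

    def visit(elem):
        nid = next(next_id)
        facts["tag"].append((nid, elem.tag))
        text = (elem.text or "").strip()
        if text:  # keep only non-whitespace text as data(NodeId, Data)
            facts["data"].append((nid, text))
        for pos, kid in enumerate(elem, start=1):
            facts["child"].append((nid, pos, visit(kid)))
        return nid

    facts["root"].append(visit(ET.fromstring(xml_text)))
    return facts

D = shred("<myDB><book><price>50</price></book><awe><award>X</award></awe></myDB>")
for rel in ("root", "child", "tag", "data"):
    print(rel, D[rel])
```

The conceptual instance D+ would add the schema-level facts (`restrict`, `extend`, `subtype`, `bind`) alongside these tuples, so that queries can join instance tags with type information.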
XPathT: Incorporating Type (Class)
Information in XPath
• XPath patterns p and qualifiers q:
p[q] returns matches of p which qualify according to q
• New XPathT patterns:
• r(t), e(t), s(t): restrict, extend, subtype type t
• tr(t), te(t), ts(t): transitive versions
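The one-step patterns and their transitive versions differ only in whether a derivation relation is followed once or to a fixpoint. A sketch over hypothetical (base, derived) type pairs; `cheapBookT` in particular is an invented restriction used only to show that `te` and `ts` can disagree. The closures are taken reflexively, so `ts(t)` contains `t` itself; this is an assumption.

```python
# Hypothetical type facts as (base, derived) pairs.
RESTRICT = {("bookT", "cheapBookT")}
EXTEND = {("publicationT", "bookT"), ("bookT", "textBookT"), ("bookT", "aweT")}
SUBTYPE = RESTRICT | EXTEND  # subtype = extend ∪ restrict

def step(rel, t):
    """r(t), e(t), s(t): types derived from t by one step of rel."""
    return {d for b, d in rel if b == t}

def closure(rel, t):
    """tr(t), te(t), ts(t): t plus every type reachable from t via rel."""
    result, frontier = {t}, {t}
    while frontier:
        frontier = {d for b, d in rel if b in frontier} - result
        result |= frontier
    return result

print(sorted(step(EXTEND, "bookT")))             # e(bookT)
print(sorted(closure(EXTEND, "publicationT")))   # te(publicationT)
print(sorted(closure(SUBTYPE, "publicationT")))  # ts(publicationT)
```

Here `te(publicationT)` omits `cheapBookT` because it is reached only through a restriction, while `ts(publicationT)` includes it.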
Semantics of XPathT
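• Example:
“transitive subtype”:
$\mathrm{SEM}(ts(t)) := \{\, t' \mid \mathit{subtype}^*(t, t') \,\}$
from types to element names:
$\mathrm{SEM}([T]) := \{\, e \mid \mathit{bind}(e, t),\ t \in T \,\}$
$\mathrm{SEM}([ts(\mathit{bookT})]) := \{\mathit{book}, \mathit{ebook}, \mathit{tbook}, \ldots\}$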
Conceptual(-level) XML Queries in XPathT
• Which books have price below $80?
//*[ts(bookT)][price<80]
• Semantic-aware equivalent rewriting:
//*[ts(bookT)][NOT ts(expTextBookT)][price<80]
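(uses conceptual information: expTextBookT restricts <price> to expPriceT, so the rewriting is equivalent provided expPriceT admits no price below $80)
• Logical XPathT query plan:
[Figure: query plan combining conceptual information with tree-structure information]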
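The rewriting can be sketched as a pass over the subtype tree that excludes every subtype whose `<price>` type guarantees a value at or above the bound. A sketch under explicit assumptions: the hierarchy is hypothetical, and `expPriceT` is assumed to impose a lower bound of 100 (for instance via an `xs:minInclusive` facet); the deck does not state its actual range. The bound is treated as inherited down the hierarchy, so excluding a type also excludes its subtypes.

```python
# Hypothetical facts: direct subtypes, and a lower bound on <price> that a
# type's restricted price type guarantees (expPriceT assumed to be >= 100).
SUBTYPE = {"bookT": ["textBookT", "aweT"], "textBookT": ["expTextBookT"]}
PRICE_MIN = {"expTextBookT": 100}

def rewrite(root_type, bound):
    """Rewrite //*[ts(root_type)][price<bound], excluding subtrees of the type
    hierarchy in which no <price> can be below bound."""
    excluded = []

    def walk(t, inherited_min):
        lo = max(inherited_min, PRICE_MIN.get(t, float("-inf")))
        if lo >= bound:          # every price of t (and of its subtypes) is >= bound
            excluded.append(t)   # NOT ts(t) also covers all subtypes of t
            return
        for sub in SUBTYPE.get(t, []):
            walk(sub, lo)

    walk(root_type, float("-inf"))
    nots = "".join(f"[NOT ts({t})]" for t in excluded)
    return f"//*[ts({root_type})]{nots}[price<{bound}]"

print(rewrite("bookT", 80))   # //*[ts(bookT)][NOT ts(expTextBookT)][price<80]
print(rewrite("bookT", 150))  # //*[ts(bookT)][price<150]
```

With a bound of 150 the assumed minimum of 100 no longer rules anything out, so the query is returned without an exclusion.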
Summary
• Complex domains require conceptual-level modeling and querying
capabilities beyond just tree structure
• Status quo: XML Schema offers a simple “conceptual model” with many
ad-hoc “design decisions”/restrictions
Abstract Model of XML Schema (MXS)
XPathT: first step towards “conceptual” or “semantic” XML query
language extensions
more concise, intuitive, flexible, and robust queries
the system maps conceptual to syntactic queries, not the
programmer/query designer!
Next Steps & Outlook
• extend MXS to include more conceptual information
• develop formal semantics
– XPathT, extensions: XPathC, XQueryC
• research problems:
– mapping: XPathC queries => equivalent XPath queries
– formalize equivalence: is such a mapping always possible? If so,
conventional XML query processors can be used!
– “proxy XML Schema doc”: instead of rewriting into XPath over
the original instance, can one materialize some conceptual info
as a “proxy XML doc” such that conceptual queries become
conventional queries against the proxy?
– semantic query optimization: equivalent rewritings given the
conceptual level constraints
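The proxy-document idea can be sketched with the standard library: annotate each element with one marker attribute per type it conforms to, after which `//*[ts(bookT)]` becomes an ordinary attribute test that ElementTree's XPath subset can evaluate. The `isa-T` attribute naming, the schema facts, and the single-inheritance `SUPER` map are hypothetical; multiple inheritance would need a set of supertypes per type.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical schema facts: bind(element, type) and single-inheritance supertypes.
BIND = {"book": "bookT", "tbook": "textBookT", "awe": "aweT", "author": "authorT"}
SUPER = {"textBookT": "bookT", "aweT": "bookT", "bookT": "publicationT"}

def supertypes(t):
    """t and all its transitive supertypes; yields nothing when t is None."""
    while t is not None:
        yield t
        t = SUPER.get(t)

def make_proxy(root):
    """Return a copy of the tree in which each element carries isa-T="1"
    for every type T it conforms to; the original tree is left unchanged."""
    proxy = copy.deepcopy(root)
    for e in proxy.iter():
        for t in supertypes(BIND.get(e.tag)):
            e.set(f"isa-{t}", "1")
    return proxy

doc = ET.fromstring("<myDB><book/><tbook/><awe/><author/></myDB>")
proxy = make_proxy(doc)
# The conceptual query //*[ts(bookT)] becomes a plain attribute test:
print([e.tag for e in proxy.findall(".//*[@isa-bookT]")])  # ['book', 'tbook', 'awe']
```

The trade-off is that the proxy must be regenerated (or incrementally updated) whenever the schema's type hierarchy changes, whereas query rewriting leaves the instance untouched.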