Covering Indexes for XML Queries by Prakash Ramanan

presented by
Dilek Demirel
Contents
XML query languages
Some definitions and concepts
Bisimulation and simulation relations
Results
The paper is about
Minimizing the search tree by building
similar but smaller graphs that are
equivalent to the original XML document
graph.
An XML document can be represented
as a graph D=(N, E, Eref), where N is
the set of nodes, E is the set of edges
and Eref is a set of idref edges.
Tree edges (E) denote element-subelement
relationships; idref edges (Eref) denote
references from one element to another.
The subgraph T=(N, E) is a tree.
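The graph model above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name XMLGraph, its field names, and the sample document are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class XMLGraph:
    """D = (N, E, Eref): typed nodes, tree edges, and idref edges."""
    types: dict                      # N: node id -> element type (tag name)
    tree_edges: set                  # E: (parent, child) pairs
    idref_edges: set = field(default_factory=set)  # Eref: (source, target) pairs

    def is_tree(self, root) -> bool:
        """Check that T = (N, E) is a tree rooted at `root`."""
        parents = {}
        for p, c in self.tree_edges:
            if c in parents:         # a node with two parents breaks the tree
                return False
            parents[c] = p
        if root in parents or len(parents) != len(self.types) - 1:
            return False
        # Every node must reach the root by following parent links.
        for n in self.types:
            seen = set()
            while n != root:
                if n in seen or n not in parents:
                    return False
                seen.add(n)
                n = parents[n]
        return True


# Hypothetical document: the second book refers to the author element.
doc = XMLGraph(
    types={0: "root", 1: "book", 2: "author", 3: "book"},
    tree_edges={(0, 1), (1, 2), (0, 3)},
    idref_edges={(3, 2)},
)
print(doc.is_tree(0))
```

The idref edge (3, 2) is kept apart from the tree edges, so ignoring idref edges amounts to passing an empty set.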
Example
XML query languages
Some XML query languages
XPath
 XQuery

They allow navigation in an XML
document along different axes, to locate
the desired element.
Axes
XPath provides 13 different axes:
Self
Child
Descendant / Descendant-or-self
Parent
Ancestor / Ancestor-or-self
Preceding / Preceding-sibling
Following / Following-sibling
Attribute
Namespace
Subset languages of
XPath
Core XPath (CXPath)
Branching Path Queries (BPQ)
Tree Pattern Queries (TPQ)
TPQ = TPQ+ ⊂ BPQ+ ⊂ CXPath+ ⊂ XPath,
where C+ denotes the query language C
without the operator not
Core XPath
Does not contain arithmetic and string
operations
Has the full navigational power of XPath
Consists of all queries involving the
thirteen axes and the three Boolean
operators and, or, and not
Branching Path Queries
A subset of CXPath
CXPath queries that ignore the order of
sibling elements
Allows nine axes, excluding the four
order-respecting axes (preceding,
preceding-sibling, following,
following-sibling)
Tree Pattern Queries
Involve four axes:
 Self
 Child
 Descendant
 Descendant-or-self
The only Boolean operator is and
Do not involve idref edges
Definitions and
concepts
An index for an XML document
Obtained by merging “equivalent” nodes
into a single node.
 “Equivalent” according to what? Coming
soon…

Index of an XML doc.
Definitions cont’d
A query Q distinguishes between two
nodes in an XML document D, if exactly
one of the two nodes is in the result of
evaluating query Q on D.
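The definition can be sketched directly if a query result is modeled as a set of node ids. That modeling choice and the function name distinguishes are assumptions made for illustration.

```python
def distinguishes(result, n1, n2):
    """True if exactly one of n1, n2 is in the result of a query."""
    return (n1 in result) != (n2 in result)


# Hypothetical result of evaluating some query Q on D.
result_of_q = {1, 3}

print(distinguishes(result_of_q, 1, 2))  # node 1 is selected, node 2 is not
print(distinguishes(result_of_q, 1, 3))  # both selected
print(distinguishes(result_of_q, 2, 4))  # neither selected
```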
Definitions cont’d
An index DI is a covering index for a
class C of queries, if the following holds:

No query in C can distinguish between two
nodes of D that are in the same extent in
DI (an extent is the set of nodes of D
merged into one node of DI).
The important point about the covering
index is:

A covering index DI can be used to
evaluate the queries in C, without using D.
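The covering condition can be illustrated with a small check. It is a sketch under assumptions: extents are modeled as frozensets of document node ids, query results as sets of node ids, and the names split_extents and expand are hypothetical. Testing a finite sample of queries can only refute the covering property, not prove it for a whole class C.

```python
def split_extents(extents, query_results):
    """List (query, extent) pairs where a query distinguishes two nodes of one extent."""
    bad = []
    for name, result in query_results.items():
        for extent in extents:
            hit = extent & result
            if hit and hit != extent:   # some, but not all, nodes of the extent are selected
                bad.append((name, extent))
    return bad


def expand(index_result, extents):
    """Turn a result over index nodes into a result over document nodes."""
    return set().union(*(extents[i] for i in index_result))


# Hypothetical index with three extents and two sample query results on D.
extents = [frozenset({0}), frozenset({1, 4}), frozenset({2, 5})]
query_results = {"q1": {1, 4}, "q2": {2}}

print(split_extents(extents, query_results))  # q2 separates node 2 from node 5
print(expand({1}, extents))                   # answer for index node 1 over D
```

When no query splits an extent, evaluating on the index and expanding the selected extents gives the same answer as evaluating on D.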
Focus of the paper
The paper studies the evaluation
of CXPath queries and covering indexes
for the above-mentioned subclasses of
CXPath.
Definitions cont’d
CXPath+ is complete, in the sense that:

For any node n in an XML document D,
one can always construct a query Q in
CXPath+, starting from the root, that
distinguishes n from all the other nodes.
The paper presents a method to build
this query.
So far, we have
Described some classes of XML queries
 Given some definitions and concepts

Next, we will describe the equivalence
relations mentioned at the beginning:
Define the simulation relation on the vertices
of an ordinary graph
 Define simulation and bisimulation
relations on an XML document

Simulation and
bisimulation
Question
Why do people deal with these
simulation quotients?
Because, for an XML document, if its
simulation quotient is small, then a set
of queries can be evaluated faster by
using this index instead of the bigger
XML document graph.
Simulation for Ordinary
Graphs
Consider directed graphs G1=(V1, A1) and
G2=(V2, A2), where each vertex v has a
type t(v).
Simulation is a binary relation between
the vertex sets V1 and V2 of two
graphs. It provides a possible notion of
dominance/equivalence between the
vertices of the two graphs.
Forward simulation
Fsimulation of G1 by G2 is the largest
binary relation R ⊆ V1 × V2 such that,
for every pair (v1, v2) in R, it
Preserves vertex types: t(v1)=t(v2)
 Preserves outgoing arcs: for each v1’
∈ post(v1), there exists v2’
∈ post(v2) such that v1’ is
Fsimulated by v2’, where post(v) is the
set of vertices reached by an outgoing
arc of v

Fsimilarity (each vertex is Fsimulated by
the other) is an equivalence relation
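The largest such relation can be computed by starting from all type-compatible pairs and deleting pairs that violate the outgoing-arc condition until nothing changes. The sketch below is a naive fixpoint computation, not the algorithm used in the paper; the function name and the two sample graphs are hypothetical.

```python
from itertools import product


def forward_simulation(types1, arcs1, types2, arcs2):
    """Return the largest R where (v1, v2) in R means v1 is Fsimulated by v2."""
    post1 = {v: set() for v in types1}
    post2 = {v: set() for v in types2}
    for u, v in arcs1:
        post1[u].add(v)
    for u, v in arcs2:
        post2[u].add(v)

    # Start from every pair with equal vertex types.
    rel = {(a, b) for a, b in product(types1, types2) if types1[a] == types2[b]}
    changed = True
    while changed:
        changed = False
        for a, b in list(rel):
            # Each successor of a needs a successor of b that simulates it.
            ok = all(any((a2, b2) in rel for b2 in post2[b]) for a2 in post1[a])
            if not ok:
                rel.discard((a, b))
                changed = True
    return rel


# G1: x(A) -> y(B).  G2: p(A) -> q(B), p(A) -> r(C), and an isolated s(A).
types1, arcs1 = {"x": "A", "y": "B"}, {("x", "y")}
types2 = {"p": "A", "q": "B", "r": "C", "s": "A"}
arcs2 = {("p", "q"), ("p", "r")}

sim = forward_simulation(types1, arcs1, types2, arcs2)
print(sorted(sim))
```

In this example x is Fsimulated by p but not by s, because s has no outgoing arc to match the arc from x to y.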
Backward Simulation
Analogous to Fsimulation
Deals with the incoming arcs at a vertex,
as opposed to forward simulation, which
deals with outgoing arcs.
Forward and Backward
Simulation
FBsimulation
Preserves vertex types
 Preserves outgoing arcs
 Preserves incoming arcs

Simulation for an XML
Document
Fsimulation of D is the largest binary relation
on N (node set of D), such that whenever n1 is
Fsimulated by n2, it
Preserves node types
 If n1=root(D) then n2=root(D)
 Else t(n2)=t(n1)
Preserves outgoing tree edges
 For each tree edge (n1,n1’), there exists a tree edge (n2,
n2’) such that n1’ is Fsimulated by n2’.
Preserves outgoing idref edges
 For each idref edge (n1,n1’), there exists an idref edge
(n2, n2’) such that n1’ is Fsimulated by n2’.
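The three conditions can be checked with the same delete-until-stable idea, treating tree edges and idref edges separately. This is a minimal sketch, not the paper's algorithm; the function name xml_fsimulation and the sample document are hypothetical.

```python
def xml_fsimulation(types, root, tree_edges, idref_edges):
    """Return the largest R on N where (n1, n2) in R means n1 is Fsimulated by n2."""
    def successors(edges):
        out = {n: set() for n in types}
        for a, b in edges:
            out[a].add(b)
        return out

    tree, idref = successors(tree_edges), successors(idref_edges)

    # Node types: the root is paired only with the root, other nodes need equal types.
    rel = set()
    for n1 in types:
        for n2 in types:
            if n1 == root:
                if n2 == root:
                    rel.add((n1, n2))
            elif types[n1] == types[n2]:
                rel.add((n1, n2))

    def matched(n1, n2, out):
        # Each edge (n1, c1) needs an edge (n2, c2) of the same kind with (c1, c2) in rel.
        return all(any((c1, c2) in rel for c2 in out[n2]) for c1 in out[n1])

    changed = True
    while changed:
        changed = False
        for n1, n2 in list(rel):
            if not (matched(n1, n2, tree) and matched(n1, n2, idref)):
                rel.discard((n1, n2))
                changed = True
    return rel


# Hypothetical document: two "a" elements; the second has an extra childless "b".
types = {0: "root", 1: "a", 2: "b", 3: "c", 4: "a", 5: "b", 6: "c", 7: "b"}
tree_edges = {(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6), (4, 7)}

rel = xml_fsimulation(types, 0, tree_edges, set())
print((1, 4) in rel, (4, 1) in rel)  # the two "a" nodes simulate each other
print((7, 2) in rel, (2, 7) in rel)  # childless "b" is simulated by node 2, not vice versa
```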
FBsimulation of D
Deals with both incoming and outgoing
edges
Preserves node types
 Preserves outgoing tree edges
 Preserves outgoing idref edges
 Preserves incoming tree edges
 Preserves incoming idref edges

Bisimulation Relation
Forward bisimulation of D is the largest binary
relation on N (node set of D), such that whenever
n1 and n2 are related, it
Preserves node types
 If n1=root(D) then n2=root(D), and vice versa
 Else t(n2)=t(n1)
Preserves outgoing tree edges
 For each tree edge (n1,n1’), there exists a tree edge (n2,
n2’) such that n1’ and n2’ are forward bisimilar, and vice versa.
Preserves outgoing idref edges
 For each idref edge (n1,n1’), there exists an idref edge (n2,
n2’) such that n1’ and n2’ are forward bisimilar, and vice versa.
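Forward bisimilarity classes can be sketched with signature-based partition refinement: nodes start grouped by type, with the root alone, and a group is split whenever its members see different sets of groups along tree or idref edges. This is a standard technique swapped in for illustration, not necessarily the paper's method; the function name and the sample document are hypothetical.

```python
def forward_bisimulation_classes(types, root, tree_edges, idref_edges):
    """Partition N into forward-bisimilarity classes, returned as a list of sets."""
    tree = {n: set() for n in types}
    idref = {n: set() for n in types}
    for a, b in tree_edges:
        tree[a].add(b)
    for a, b in idref_edges:
        idref[a].add(b)

    # Initial partition: the root alone, every other node grouped by type.
    block = {n: ("root",) if n == root else ("type", types[n]) for n in types}
    count = len(set(block.values()))
    while True:
        ids, refined = {}, {}
        for n in types:
            # Signature: own block plus the blocks seen along each kind of edge.
            sig = (block[n],
                   frozenset(block[c] for c in tree[n]),
                   frozenset(block[c] for c in idref[n]))
            refined[n] = ids.setdefault(sig, len(ids))
        block = refined
        if len(ids) == count:   # no block was split, so the partition is stable
            break
        count = len(ids)

    classes = {}
    for n, b in block.items():
        classes.setdefault(b, set()).add(n)
    return sorted(classes.values(), key=min)


# Hypothetical document: two "a" elements; the second has an extra childless "b".
types = {0: "root", 1: "a", 2: "b", 3: "c", 4: "a", 5: "b", 6: "c", 7: "b"}
tree_edges = {(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6), (4, 7)}

bisim_classes = forward_bisimulation_classes(types, 0, tree_edges, set())
print(bisim_classes)
```

In this example the two "a" nodes land in different classes, because node 4 has a childless "b" child that no child of node 1 is bisimilar to.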
The Quotients
An equivalence relation on N partitions
N into equivalence classes. Any two
nodes in the same class are related,
any two nodes in different classes are
not.
The quotient graph D~ is obtained from
D by merging the nodes of each
equivalence class into a single node.
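Given the equivalence classes, building the quotient is mechanical: each class becomes one index node, and an edge between two classes exists whenever some members are connected in D. The sketch below assumes the classes are already known; the function name quotient and the sample classes are hypothetical.

```python
def quotient(classes, tree_edges, idref_edges):
    """Merge each equivalence class into one node and project the edges."""
    extent_of = {n: i for i, cls in enumerate(classes) for n in cls}
    q_tree = {(extent_of[a], extent_of[b]) for a, b in tree_edges}
    q_idref = {(extent_of[a], extent_of[b]) for a, b in idref_edges}
    return extent_of, q_tree, q_idref


# Hypothetical document with 8 nodes and 7 tree edges, and assumed classes.
tree_edges = {(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6), (4, 7)}
classes = [{0}, {1, 4}, {2, 5}, {3, 6}, {7}]

extent_of, q_tree, q_idref = quotient(classes, tree_edges, set())
print(len(classes), sorted(q_tree))
```

Here the quotient has 5 nodes and 4 edges instead of 8 nodes and 7 edges, which is the size reduction that makes the index attractive.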
Example
Results
A CXPath+ query Q can be evaluated on an
XML document D by computing the
simulation of Q by D.
For an XML document, its simulation quotient
is the smallest covering index for BPQ+.
For an XML document, its simulation quotient,
with idref edges ignored throughout, is the
smallest covering index for TPQ.
Questions?