Query Caching and View
Selection for XML Databases
Bhushan Mandhani
Dan Suciu
University of Washington
Seattle, USA
Caching in Query Processing
Buffer Cache: In-memory pool of disk pages.
Semantic Cache:
Stores query results instead.
Can be in-memory or on disk.
Proposed in [Franklin et al., VLDB 96] for use in clientserver systems.
Semantic XPath Cache
A cached query result is a materialized view.
View V answers query Q if there exists C s.t
CºV=Q
C, V & Q are all XPath here.
Cache contains some views {V1,…,Vn}.
Query Q is a hit if some Vi answers it.
Motivation for the Semantic Cache
C º V = Q : a cache hit results in a simpler
query, on a much smaller XML fragment.
Cache hits were processed two orders of
magnitude faster than misses.
Can also be maintained outside the database
Ex: application tier.
Application tier caching for database-driven
websites has become increasingly popular
[Mohan et al., SIGMOD 02].
Example Illustrating C º V = Q
V = /a[u[@v]/w]/b[x//y]
Q = /a[u[w]/@v]/b[x/y//z]/c
C = /b[x/y//z]/c
The Query/View Answerability Problem
Given view V and query Q, does V answer Q, and
if yes, then what should C be?
XPath Tree Patterns
XPath queries have natural
representation as tree patterns.
E.g. If Q =
a[v]/b[@w=''val1''][x[.//y]]//c[z>val2]
The Query Axis is the path from the
root node to the result node.
The Query Depth is the number of axis
nodes.
Prefix(Q, k) is the query obtained by
truncating Q at its k-th axis node.
E.g. Prefix(Q, 2) = a[v]/b
Preds(Q, k) is the set of predicates of
the k-th axis node of Q.
E.g. Preds(Q, 2) ={[@w=''val1''], [x[.//y]]}
XPath Containment
A С B if the result of A will always be a subset of
the result of B.
Checking XPath containment is coNP-complete
[Miklau & Suciu, PODS 02].
A containment mapping maps nodes in B’s
tree pattern to those in A’s, and establishes
A С B.
It is a sound but incomplete condition.
It can be determined in polynomial time.
Containment is different from answerability.
E.g. let A = /a[x]/b, B = /a/b
Criteria for Query/View Answerability
“Rewriting XPath Queries using Materialized Views”,
Xu and Ozsoyoglu, VLDB 2005
An Example of Answerability Checking
Let V = /a[u[@v]/w]/b[x//y]
Let Q = /a[u[w]/@v]/b[x/y//z]/c
Prefix(V, 2) = /a[u[@v]/w]/b and
Prefix(Q, 2) = /a[u[w]/@v]/b.
The first condition is satisfied.
Preds(V, 2) = {[x//y]} and
Preds(Q, 2) = {[x/y//z]}.
The second condition is also satisfied.
Two Theoretical Results
Theorem: If two tree patterns are minimal, and
containment mappings exist both ways, then they
are isomorphic.
Theorem: If some view V answers query Q, then C
can be set to the subtree of Q rooted at its k-th
axis node, where k is the query depth of V.
The Cost of Answerability Checking
Answerability checking involves tree operations:
Checking isomorphism between trees.
Looking for containment mappings between trees.
Cache lookup by checking each view could give
a high lookup overhead.
Our Solution: Check answerability by string
matching. This will be:
Cheaper
Amenable to indexing in a standard RDBMS.
Checking Tree Isomorphism
We represent each tree pattern with its
“normalized” query string.
To obtain the normalized query:
At each axis node, do a DFS into each predicate subtree.
Before returning from a node, append its children node
labels to its label, in lexicographic order.
Concatenate all axis node labels to get the normalized
query.
Checking Predicate Containment
For the tree of each predicate in Preds(Q, k),
generate all trees that “containment map” to it.
ConPreds(Q, k) is the set of normalized
predicates obtained from these generated trees.
String-Based Criteria for Answerability
The second condition is tweaked to correctly
support comparison predicates.
Cache Organization
• Let V = /a[u]/b[v][@x=50][y/z=“str”].
• Insert into XmlData (‘<result></result>’). Record viewId.
• Insert into Prefix (‘/a[u]/b’). Record prefixId.
• Insert into View (viewId, prefixId, ‘@x=50’, ’y/z=“str”|v’).
Cache Lookup
• Prefix(V, k) = P.prefix = Prefix(Q, k).
• V.pred Є Preds(V, k) and V.pred Є ConPreds(Q, k).
Cache Warm-up as View Selection
Given the warm-up workload W, we generate
another workload S.
S is likely to have overlap with the test workload.
Problem: Choose some m views from S which
together answer a maximal subset of S.
This is a variant of the set cover problem, which
is NP-complete.
We use a variant of the greedy approximation
algorithm for set cover.
Experimental Setup
The data instance was a 300 MB XML document
created using the XMark generator.
Expts run on a Pentium 4, 512 MB RAM machine.
SQL Server 2005 beta used for storing the XML
data as well as the cache.
We compare our cache with a naïve cache which
stores (query string, result XML) pairs.
Query Workloads Used
We implemented our own XPath query generator
for creating workloads for our expts.
Query structure is generated randomly.
Value for each predicate is obtained by sampling
from its set of values taken, using Zipf distribution.
Queries with depths 3, 4 and 5 were given 2, 2 and
3 predicates respectively.
Conclusions
We defined a notion of query/view answerability
for XPath.
We showed how an efficient semantic cache based
on these ideas can be employed.
We demonstrated the scalability of cache lookup,
and performance gains in query processing.
© Copyright 2025 Paperzz