A Framework for Optimizing
and Parallelizing XQuery
Xiaogang Li
Motivations
Developing data processing applications is
hard
- Many data formats exist
- Different architectures
- Need independence from data format and architecture
XML has gained great popularity!
- Now the standard language for the internet
- Already extensively used as part of Grid/Distributed
Computing
High-level declarative languages ease
application development
- Popularity of Matlab for scientific computations
The Whole Picture
XQuery
HDF5
NetCDF
TEXT
XML
RDBMS
XML
Contributions
Architectural independence
- Provide compilation support of XQuery for
- Stream processing (VLDB2005)
- Parallel processing on clusters (ICS2003, DBPL2003)
Data format independence
- Developed techniques to use XML as a logical interface
over physical datasets (LCPC 2003)
Performance
- Developed a series of optimization techniques for efficient
XQuery processing
- Developed static analysis techniques to guide compiler
optimizations and transformations (XIMP2004,IPDPS2003)
Roadmap
Background
- XML, XQuery
- Related work
Stream Processing
Virtual XML
Parallelization
Conclusion
eXtensible Markup Language
Specification of a syntax for “encoding” data, with strict syntax rules
about how to do so.
A text-based syntax -- written using printable characters (no
explicit binary data)
Extensible -- you can define your own tags (essentially data types),
within the constraints of the syntax rules
Universal -- the syntax rules ensure that all XML processing
software MUST identically handle a given piece of XML.
An ideal data exchange format
XML Example
Annotations: <order>, <item>, etc. are element tags; units is an attribute of the quantity element
<order xmlns="http://w3c.org/Spec/">
  <item>
    <code>"30100026266"</code>
    <desc>Viewsonic E90f Monitor, 0.21mm, DELL Outlet</desc>
    <price>229.99</price>
    <quantity units="gross">2</quantity>
    <deliveryDate date="20APr2004-12:00h" />
  </item>
  <item>
    <code>"2001234"</code>
    . . . . . .
  </item>
</order>
XQuery
A declarative language for querying XML
- Widely accepted language for querying XML
- Declarative: like SQL, easy to use
- Powerful: types, user-defined functions, binary
expressions
- FLWR (for, let, where, return) expressions
Supports XPath as a subset
- A query language that selects particular subsets of nodes
from an XML document
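A minimal sketch of XPath-style node selection, written in Python with the standard library's xml.etree.ElementTree (which supports only a limited XPath subset). The document content, the second item's values, and the helper name select_codes are illustrative assumptions loosely modeled on the order example.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document, loosely modeled on the XML example slide.
ORDER_XML = """
<order>
  <item>
    <code>30100026266</code>
    <price>229.99</price>
    <quantity units="gross">2</quantity>
  </item>
  <item>
    <code>2001234</code>
    <price>15.00</price>
    <quantity units="each">5</quantity>
  </item>
</order>
"""


def select_codes(xml_text, units):
    """Return the <code> text of every item whose quantity uses `units`."""
    root = ET.fromstring(xml_text)
    codes = []
    # Path expression "./item" selects the item children of the root.
    for item in root.findall("./item"):
        # Attribute predicate: keep items with a matching quantity child.
        if item.find(f"quantity[@units='{units}']") is not None:
            codes.append(item.findtext("code").strip())
    return codes
```

Calling select_codes(ORDER_XML, "gross") selects only the first item, which is the kind of subset selection a path expression with a predicate describes.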
VMScope - XQuery Code
unordered (
  for $i in ($x1 to $x2)
  for $j in ($y1 to $y2)
  let $p := document("vmscope.xml")
            /data/pixel[(x = $i) and (y = $j)
                        and (scale >= $z1)]
  return
    <pixel>
      <latitude> {$i} </latitude>
      <longitude> {$j} </longitude>
      <sum> {accumulate($p)} </sum>
    </pixel>
)
define function accumulate ($p)
as element
{
  if (empty($p))
  then $null
  else
    let $max := accumulate(subsequence($p, 2))
    let $q := item-at($p, 1)
    return
      if (not($max = $null) and
          ($q/scale < $max/scale))
      then $max
      else $q
}
XQuery Example: Apriori
Users can write very complex, flexible programs.
Recursive functions are the only way to express reductions
Roadmap
Background
- XML, XQuery
- Related work
Stream Processing
High-level Abstraction
Parallelization
Conclusion
Query Processing- Related Work
Much of the work focuses on XPath
- XPath expressions are regular expressions, which are easy to analyze
Limited work on optimizing XQuery
- Optimizing from a high level using algebra
- Translating the query into a tree of operators
- Query rewriting based on algebra
Algebra Approach: Limitations
Cannot handle low-level optimizations
- Loop invariants, common subexpressions …
Hard to catch all features using
algebra
- Recursive functions, types, aggregations
XQuery is complex; a simple algebra
just does not exist
Our Overall Approach
Using compiler technologies for Query
optimization
- Compiler techniques are well developed
- Data flow analysis, loop transformation,
parallelization
Advanced program analysis, loop transformation
and parallelization techniques can allow efficient
execution of XQuery
Roadmap
Background
- XML, XQuery
- Related work
Stream Processing
Virtual XML
Parallelization
Conclusion
Motivation
Why Streaming Data
Data needs to be analyzed in real time
- Stock market, Security, Climate, Network monitoring,
Telecommunication data management etc
Huge amount of data
- NASA EOS project – 50 GB per hour
Rapid improvements in networking
technologies
- 101.13 Gbps at SC2004 bandwidth challenge
Motivation
Why XML
- Standard data exchange format for the Internet
- Widely adopted in web-based, distributed, and grid computing
Why XQuery
- Widely accepted language for querying XML
- Easy to use
XQuery is the ideal language for querying XML streams
Can we compile it correctly and efficiently for streaming data?
Challenges
For an arbitrary query, can it be evaluated
correctly on unbounded streaming data?
- Single traversal of the data is required
- Decision should be made by the compiler, not the user
If not, can it be transformed accordingly?
How to generate efficient code for XQuery?
- The computations involved are nontrivial
- Recursive functions are frequently used
- Efficient memory usage is important
Our Solutions
For an arbitrary query, can it be evaluated
correctly on unbounded streaming data?
- Construct data-flow graph for a query
- Static analysis based on data-flow graph
If not, can it be transformed accordingly?
- Query transformation techniques based on static
analysis
How to generate efficient code for XQuery?
- Techniques based on static analysis to minimize
memory usage and optimize code
- Generating imperative code
- Recursive analysis and aggregation rewrite
Query Evaluation Model
[Diagram: linked operators Op1, Op2, Op3, Op4]
Single input stream
Internal computations
Limited memory
Linked operators: pipeline operators and blocking operators
Pipeline and Blocking Operators
Pipeline Operator:
- Each input element produces an output element independently
- Selection etc
Blocking Operator:
- Can only generate output after receiving all input elements
- Cannot be processed in a single pass
- Sort, Join etc
Progressive Blocking Operator:
(1) |output| << |input|: we can buffer the output
(2) Associative and commutative operation: we can discard the input
- count(), sum()
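The three operator classes can be sketched with Python iterators. This is only an illustration of the definitions above, not the operators of the actual system; the function names are hypothetical.

```python
def selection(stream, predicate):
    """Pipeline operator: each input element is handled independently,
    so output can be emitted as soon as the element arrives."""
    for element in stream:
        if predicate(element):
            yield element


def sort_by(stream, key):
    """Blocking operator: the whole input must be buffered before the
    first output element can be produced."""
    return iter(sorted(stream, key=key))


def running_sum(stream):
    """Progressive blocking operator: the result is known only at the
    end, but a single accumulator suffices and the input is discarded."""
    total = 0
    for element in stream:
        total += element
    return total
```

The memory footprint is the point of the distinction: selection and running_sum need constant space, while sort_by needs space proportional to the input.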
Single Pass?
Pixels with x and y
Q1:
let $i := …/pixel
sortby (x)
Q2:
let $i := …/pixel
[x < count(/pixel)]
(1) A blocking operator exists
(2) A progressive blocking operator is referenced by a pipeline operator
(or another progressive blocking operator)
Check condition 2 in a query
Single-Pass? Challenges
Must analyze data dependence
- Something like a data dependence graph may be
helpful
A query may be flexible and complex
- Need a simplified view of the query to make
decision
Overall Framework
Data Flow Graph Construction
Low level Transformation
High level Transformation
GNL Generation
Horizontal Fusion
Recursion Analysis
Vertical Fusion
Aggregation Rewrite
Single-Pass Analysis
Stream Code Generation
Stream Data Flow Graph (DFG)
Node: variable
- Sequence or atomic
Edge: dependence relation
- v1 -> v2 if v2 uses v1
- Aggregate dependence or flow dependence
A DFG is acyclic
Example (nodes S1, S2, v1, i, b):
S1: stream/pixel[x>0]
S2: stream/pixel
v1: count()
High-level Transformation
Goals
1: Enable single pass evaluation
2: Simplify the DFG for single-pass
analysis
Horizontal Fusion and Vertical
Fusion
- Based on DFG
Horizontal Fusion
Enable single-pass evaluation
- Merge sequence nodes with a common prefix
Before fusion (nodes S1, S2, v1, v2, b):
S1: stream/pixel[x>0]
S2: stream/pixel/y
v1: count()  v2: sum()
After fusion (nodes S0, S1, S2, v1, v2, b):
S0: /stream/pixel
S1: [x>0]  S2: /y
v1: count()  v2: sum()
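The effect of horizontal fusion on the count()/sum() example can be sketched in Python. The CountingSource class, the dictionary representation of a pixel, and the function names are assumptions made for illustration; the sketch only shows that merging the two path expressions under their common prefix turns two scans into one.

```python
class CountingSource:
    """Replayable data source that records how many times it is scanned."""

    def __init__(self, pixels):
        self.pixels = pixels
        self.scans = 0

    def scan(self):
        self.scans += 1
        yield from self.pixels


def unfused(source):
    # S1: stream/pixel[x>0] feeds count(); S2: stream/pixel/y feeds sum().
    v1 = sum(1 for p in source.scan() if p["x"] > 0)
    v2 = sum(p["y"] for p in source.scan())
    return v1, v2


def fused(source):
    # S0: /stream/pixel is traversed once; S1 and S2 become per-element steps.
    v1 = 0
    v2 = 0
    for p in source.scan():
        if p["x"] > 0:      # S1: [x>0]
            v1 += 1         # v1: count()
        v2 += p["y"]        # S2: /y, v2: sum()
    return v1, v2
```

Both versions return the same pair of aggregates; only the fused one is compatible with a stream that can be read once.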
Horizontal Fusion with nested loops
Perform loop unrolling first
Merge sequence nodes accordingly
Before Horizontal Fusion
[Diagram: Datasets to Output]
Requires three scans
After Horizontal Fusion
[Diagram: Datasets to Output]
Requires just one scan
Vertical Fusion
Simplify DFG and single-pass analysis
- Merge a cluster of nodes linked by flow dependence edges
[Diagram: DFG before and after vertical fusion; node labels S1, S2, S, i, j, b, v]
Single-pass Analysis
Can a query be evaluated on the fly?
THEOREM 1. If a DFG contains more than one
sequence node after vertical fusion, it cannot
be evaluated correctly in a single pass.
Reason: for a single input stream, each sequence
node requires one traversal
Single-pass Analysis - Continued
THEOREM 2. For any given two atomic nodes n1 and n2,
if (1) n1 and n2 are aggregate dependent on a
sequence node
(2) there is a path between them,
the query may not be evaluated in a single pass.
Reason: A progressive blocking operator is referenced
by another progressive blocking operator
Example : count (pixel)
where /x>0.01*sum(/pixel/x)
Single-pass Analysis - Continued
THEOREM 3. If there is a cycle in a DFG, the corresponding
query may not be evaluated correctly using a single
pass.
Reason: A progressive blocking operator is referenced
by a pipeline operator
[Diagram: DFG with a cycle; node labels S1, S2, i, j, b, v]
Single-pass Analysis
Check the conditions corresponding to Theorems 1,
2, and 3
- Stop further processing if any condition is true
Completeness of the analysis
- If a query without blocking operators passes the test, it can be
evaluated in a single pass
THEOREM 4. If the results of a progressive blocking operator are
referred to by a pipeline operator or a progressive blocking
operator, then for its DFG, at least one of the three
conditions holds true
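The three checks can be sketched over a toy graph structure in Python. The DFG class, its method names, and the string labels for node kinds and dependence types are assumptions for illustration, not the compiler's data structures. The Theorem 2 check also assumes it is enough to collect every atomic node that is aggregate dependent on some sequence node and look for a path between any two of them.

```python
from collections import defaultdict


class DFG:
    """Toy data-flow graph with typed nodes and typed dependence edges."""

    def __init__(self):
        self.kind = {}                  # node name -> "sequence" or "atomic"
        self.succ = defaultdict(list)   # node name -> [(user, dependence)]

    def add_node(self, name, kind):
        self.kind[name] = kind

    def add_edge(self, src, dst, dependence):
        # src -> dst means dst uses src; dependence is "flow" or "aggregate".
        self.succ[src].append((dst, dependence))

    def reachable(self, start):
        """Nodes reachable from start along one or more edges."""
        seen, stack = set(), [start]
        while stack:
            for nxt, _ in self.succ.get(stack.pop(), []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


def condition1(g):
    """Theorem 1: more than one sequence node remains."""
    return list(g.kind.values()).count("sequence") > 1


def condition2(g):
    """Theorem 2: two aggregate-dependent atomic nodes joined by a path."""
    aggregates = {
        dst
        for src, users in g.succ.items() if g.kind[src] == "sequence"
        for dst, dep in users
        if dep == "aggregate" and g.kind[dst] == "atomic"
    }
    return any(b in g.reachable(a)
               for a in aggregates for b in aggregates if a != b)


def condition3(g):
    """Theorem 3: the graph contains a cycle."""
    return any(n in g.reachable(n) for n in g.kind)


def may_be_single_pass(g):
    """True when none of the three conditions holds."""
    return not (condition1(g) or condition2(g) or condition3(g))
```

For the example count(pixel) where x > 0.01*sum(/pixel/x), both count and sum are aggregate dependent on the pixel sequence and the filter creates a path from sum to count, so condition2 holds and the query is rejected.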
A Review of the High-level
Transformation and Analysis
Cannot be evaluated in a single pass!!
[Diagram: DFG at each stage of transformation and analysis; node labels S1, S2, S, v, v1, i, b]
Code Generation
Using SAX XML stream parser
- XML document is parsed as stream of
events
- Event-Driven: Need to generate code
to handle each event
Using Java JDK
- Our compiler generates Java source
code
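The event-driven style of SAX-based code can be sketched as follows. The compiler described here emits Java; this sketch uses Python's standard xml.sax module instead, purely for illustration. The handler name, the query it evaluates (count of pixels with x > 0), and the assumption that every pixel has exactly one x child are all hypothetical.

```python
import xml.sax


class PixelCounter(xml.sax.ContentHandler):
    """Counts <x> values greater than zero in one pass over the events."""

    def __init__(self):
        super().__init__()
        self.count = 0
        self._in_x = False
        self._text = []

    def startElement(self, name, attrs):
        if name == "x":
            self._in_x = True
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until </x>.
        if self._in_x:
            self._text.append(content)

    def endElement(self, name):
        if name == "x":
            self._in_x = False
            if float("".join(self._text)) > 0:
                self.count += 1


def count_positive_x(xml_bytes):
    """Parse the document as a stream of events and return the count."""
    handler = PixelCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.count
```

Only the current text buffer and one counter are kept in memory, which is what makes this style suitable for unbounded streams.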
Experiment
Query Benchmark
- Selected Benchmarks from XMARK
- Satellite, Virtual Microscope, Frequent Item
Systems compared with
- Galax
- Saxon
- Qizx/Open
Performance: XMARK Benchmark
>25% faster on small dataset
Scales well on very large datasets
Performance: Real Applications
>One order of magnitude faster on small dataset
Works well for very large datasets
Summary
Provide a formal approach for query
evaluation on XML stream
- Query transformation to enable correct execution on
stream
- Formal methods for single-pass analysis
- Strategies for efficient low-level code generation
- Experimental results show an advantage over other well-known systems
Roadmap
Background
- XML, XQuery
- Related work
Stream Processing
Virtual XML
Parallelization
Conclusion
Support High-Level Abstraction
Understanding the physical details is hard, but
necessary for performance
Logical Schema: A logical view over the data for programmer
Physical Schema: Low level details of physical storage, provided
to compilers
System Architecture
External Schema
XML Mapping Service
logical XML schema
physical XML schema
Compiler
XQuery Sources
C++/C
High-level and low-level XQuery
High-level query:
- Query based on the logical schema
- Developed by programmers
Low-level query:
- Query based on the physical schema
- Retrieve data by calling library functions
The high-level query is transformed into a low-level
query by our compiler
- Users can still modify the low-level query if not satisfied
Mapping to low-level Query
A number of getData functions retrieve the data stream
- getData($x)
- getData($x,$y)
getData functions are written in XQuery
- Allows analysis and transformation
Find the optimal library function to call
unordered (
  for $i in ($x1 to $x2)
  for $j in ($y1 to $y2)
  let $p := getData($i, $j)
  return
    <pixel>
      <latitude> {$i} </latitude>
      <longitude> {$j} </longitude>
      <sum> {accumulate($p)} </sum>
    </pixel>
)
Compiler Techniques
Insert getData functions
- Compatible: output should be a superset of the original data
stream
- Performance: want the smallest superset
Query rewriting based on relational algebra
- Reduce to canonical forms
- Compare canonical forms
Comparison with Manual - VMScope
[Chart: XQuery vs. C; x-axis ticks 1, 2, 4, 8; y-axis ticks 0 to 4500]
Roadmap
Background
- XML, XQuery
- Related work
Stream Processing
Virtual XML
Parallelization
Conclusion
Generalized Nested Loop (GNL)
An intermediate representation that explicitly defines
- iterative structures for retrieving data
- aggregation operations to be performed on the qualified data
Example:
For $b in student/score @t = cis
  sum = sum + b
  count = count + 1
Parts of the loop: index variable ($b), path expression (student/score),
filter expression (@t = cis), loop body
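The GNL example maps directly onto an ordinary loop. The sketch below is in Python for illustration only; representing each score as a (t, value) pair and the function name gnl_loop are assumptions.

```python
def gnl_loop(scores, tag):
    """Evaluate the GNL example over (t, value) pairs."""
    total = 0
    count = 0
    for t, value in scores:     # path expression: student/score
        if t == tag:            # filter expression: @t = cis
            total += value      # loop body: sum = sum + b
            count += 1          # loop body: count = count + 1
    return total, count
```

Because the iteration and the aggregation updates are explicit, a form like this is straightforward to analyze for reductions.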
Parallelization of XQuery
GNL offers a convenient base for
parallelization
- Iterative structure
- Explicitly defined reduction
Use ADR and MPI for parallel code
generation
- ADR: a C++ class library and runtime system
for building parallel databases of multidimensional datasets
- MPI: a standard communication library (C++)
Parallel Code Generation
1. From XQuery to C++
- ADR(MPI) is a C++ library
- The type systems of XQuery and C++ are quite different
2. Generation of Processing functions
- Local reduction
- Global reduction
Global Reduction Function
Local reduction: process input, update the local copy
Global reduction: process local copies, update the global copy
From local to global, how?
1. Extract a program slice from the local reduction
2. Replace data dependences on input with those on the local
copy of output
3. Remove control dependences on input
Local reduction:
Initialize Output
For $b in //score
  If b in (ee, cis, math)
    output[b]++;
Global reduction:
Initialize global Output
For $b in local copy
  output[b]++;
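The local/global split can be sketched in Python. This is not ADR or MPI code; the function names are hypothetical, and the sketch assumes the global step adds each local count into the global copy, which is one reading of the "replace data dependence on input with local copy" step.

```python
from collections import Counter

DEPARTMENTS = ("ee", "cis", "math")


def local_reduction(partition):
    """Process one partition of the input and update a local copy."""
    output = Counter()
    for b in partition:
        if b in DEPARTMENTS:        # control dependence on the input
            output[b] += 1
    return output


def global_reduction(local_copies):
    """Combine local copies; the input filter is no longer needed."""
    output = Counter()
    for local in local_copies:
        for key, value in local.items():
            output[key] += value    # depends on the local copy, not input
    return output
```

Since counting is associative and commutative, combining the per-partition results gives the same answer as one sequential pass over all the data.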
Parallel Performance - Q5, Q20
[Chart: Q20 and Q5; x-axis ticks 1, 2, 4, 8; y-axis ticks 0 to 180]
Q20: 6 GB, Q5: 4 GB; good speedup
Parallel Performance - VMScope
[Chart: COMM vs. NO COMM; x-axis ticks 1, 2, 4, 8; y-axis ticks 0 to 4500]
Conclusion
Provided a new framework for processing
XQuery based on compiler techniques
Designed new optimization and analysis
techniques for XQuery
Support of high-level abstractions to hide
low-level details
Experimental results show effectiveness
Thank you !!!
GNL Example
[Diagram: a DFG with nodes S0, S1, S2, v1, v2, b and the corresponding GNL]
Facilitates code generation for any desired platform
Conservative analysis
Our analysis is conservative
- A valid query may be labeled as "cannot be evaluated
in a single pass"
Example:
Horizontal Fusion: Side-effect
May produce incorrect results due to interdependence
Original query:
let $b := count(stream/pixel)
for $i in stream/pixel
return $i/x idiv $b
After fusion:
for $i in stream/pixel
return $i/x idiv count()
A partial result of count is used to compute the output
This case is handled during single-pass analysis
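The side effect can be reproduced with a small Python sketch. The function names are hypothetical, the x values are plain non-negative integers, and idiv is modeled with Python's // operator, which matches integer division for non-negative operands.

```python
def two_pass(xs):
    """Original query: count first, then divide each x by the final count."""
    b = len(xs)                     # first pass: count(stream/pixel)
    return [x // b for x in xs]     # second pass: $i/x idiv $b


def fused_single_pass(xs):
    """Fused query: count and output are computed in the same traversal."""
    out = []
    count = 0
    for x in xs:
        count += 1                  # partial count seen so far
        out.append(x // count)      # wrong: uses the partial count
    return out
```

For the input [6, 6, 6], the two-pass version divides every value by 3, while the fused version divides by 1, 2, and 3 in turn, so only the last output agrees.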