Path Expressions

Web Data Management
Path Expressions
1
In this lecture
• Path expressions
• Regular path expressions
• Evaluation techniques
Resources:
Data on the Web Abiteboul, Buneman, Suciu : section 4.1
2
Path Expressions
Examples:
• Bib.paper
• Bib.book.publisher
• Bib.paper.author.lastname
Given an OEM instance, the answer of a path
expression p is a set of objects
3
Path Expressions
Bib
&o1
paper
paper
Examples:
book
references
&o12
&o24
references
author
title
DB =
&o43
year
&o44
&o29
references
author
http
author
title publisher
title
author
author
author
&o45 &o46
&o52
page
&25
&96
1997
firstname
lastname
&o70
“Serge”
&o47 &o48 &o49 &o50 &o51
firstname
&o71
“Abiteboul”
Bib.paper={&o12,&o29}
Bib.book.publisher={&o51}
Bib.paper.author.lastname={&o71,&206}
&243
“Victor”
last
lastname
first
&206
“Vianu”
122
4
133
Answer of a Path Expression
Simple evaluation algorithms for Answer(P,DB):
Answer(P, DB) = f(P, root(DB))
Where:
f(e, x) = {x}
f(L.P, x) = {f(P,y) | (x,L,y) edges(DB)}
Runs in PTIME in size(P), size(db):
– PTIME complexity
5
Regular Path Expressions
R ::= label | _ | R.R | (R|R) | R* | R+ | R?
Examples:
• Bib.(paper|book).author
• Bib.book.author.lastname?
• Bib.book.(references)*.author
• Bib.(_)*.zip
6
Applications of Regular Path Expressions
• Navigating uncertain structure:
– Bib.book.author.lastname?
• Syntactic substitution for inheritance:
– Bib.(paper|book).author
– Better: Bib.publication.author, but we don’t
have inheritance
7
Applications of Regular Path Expressions
• Computing transitive closure:
– Bib.(_)*.zip
= everything accessible
– Bib.book.(references)*.author
= everything accessible via references
• Some regular expressions of doubtful practical use:
– (references.references)*
= a path with an even number of references
– (_._)* = paths of even length
– (_._._.(_)?)* = paths of length (3m + 4n) for some m,n
• But make great examples for illustration 
8
Answer of a Regular Path Expression
Recall:
– Lang(R) = the set of words P generated by R
Answer of regular path expressions:
– Answer(R,DB) = {Answer(P,DB) | P  Lang(R)}
Need an evaluation algorithm that copes with cycles
9
Regular Path Expressions
Recall: each regular expression  NDFA
Example: R = (a.a)*.a.b
a
A=
s1
a
s2
a
s3
b
states(A) = {s1,s2,s3,s4}
initial(A) = s1
terminal(A) = {s4}
s4
10
Regular Path Expressions
Canonical Evaluation Algorithm
Answer(R,DB):
1. construct A from R
2. construct product automaton G = A x DB:
–
nodes(G) = states(A) x nodes(db)
–
edges(G) = {((s,x),L,(s’,x’) | (s,L,s’)  edges(A),
(x,L,x’)  edges(DB)}
–
root(G) = (initial(A), root(DB))
3. compute Gacc = set of nodes accessible from root(G)
4. return {x | s  terminal(A) s.t. (s,x)  Gacc}
11
Regular Path Expressions
Example: R = _.(_._)*.a
_
A=
s1
s2
s3
DB =
&o1
a
a
_
&o2
a
a
&o3
&o4
Answer of R on DB = { &o2, &o3}
b
12
Compute Product Automaton G
_
a
_
s1,&o1
s2,&o1
s3,&o1
a
a
a
s1,&o2
s2,&o2
s3,&o2
a
a
a
a
a
a
s1,&o3
s1,&o4
s2,&o3
s2,&o4
s3,&o3
s3,&o4
b
b
b
13
Compute Accessible Part
acc
G
_
a
_
s1,&o1
s2,&o1
s3,&o1
a
a
a
s1,&o2
s2,&o2
s3,&o2
a
a
a
a
a
a
s1,&o3
s1,&o4
s2,&o3
s2,&o4
s3,&o3
s3,&o4
b
Answer(R,DB) = {&o2, &o3}
b
b
14
Complexity of Regular Path
Expressions
• The evaluation algorithm runs in PTIME in
size(R), size(DB)
• Even when there are cycles in DB
15