Subtree Oracle Pushdown Automata for Trees in Prefix Notation
Martin Plicka
Jan Janoušek
Bořivoj Melichar
Department of Theoretical Computer Science
Czech Technical University in Prague, Faculty of Information Technology
AAMP VIII, Villa Lanna, 20.3.2011
Subtree Oracle Pushdown Automata for Trees in Prefix Notation
AAMP VIII
1 / 32
Introduction
Motivation
Motivation
Arbology applies to trees the same principles that stringology applies to strings, using pushdown automata instead of finite automata as the computational model.
Factor oracle automata in stringology reduce the number of states of a factor automaton to the minimum, at the cost of some disadvantages.
Can oracle automata also be created for trees in prefix notation?
Oracle Automata in Stringology
Factor Oracle Automata in Stringology
A suffix automaton accepts all suffixes of a given string. A factor automaton accepts all factors (substrings) of the subject string.
The number of distinct factors of a string is bounded by $O(n^2)$, where n is the length of the string. The number of states of the factor automaton is linear in n. Search time is linear in m, where m is the length of the string to be found.
Factor oracle automata have exactly n + 1 states.
Unfortunately, factor oracle automata also accept some subsequences (discontinuous substrings) of the subject string.
Despite this property, they can still be used for indexing strings.
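The n + 1 states and the extra accepted subsequences can be observed on a small example. The sketch below is not the merging-based construction used in these slides; it swaps in the well-known online construction with supply links, and the function names `build_factor_oracle` and `accepts` are illustrative assumptions.

```python
def build_factor_oracle(word):
    """Return the oracle transitions as a list of dicts, one dict per state.

    Assumption: this is the online construction with supply links,
    not the state-merging construction described in the slides.
    """
    transitions = [{} for _ in range(len(word) + 1)]
    supply = [-1] * (len(word) + 1)  # -1 marks an undefined supply link
    for i, symbol in enumerate(word):
        transitions[i][symbol] = i + 1  # backbone transition i -> i + 1
        k = supply[i]
        while k != -1 and symbol not in transitions[k]:
            transitions[k][symbol] = i + 1  # external transition
            k = supply[k]
        supply[i + 1] = 0 if k == -1 else transitions[k][symbol]
    return transitions


def accepts(transitions, text):
    """Every state is final, so acceptance means 'no transition is missing'."""
    state = 0
    for symbol in text:
        if symbol not in transitions[state]:
            return False
        state = transitions[state][symbol]
    return True


oracle = build_factor_oracle("abba")
print(len(oracle))             # 5 states, i.e. n + 1 for n = 4
print(accepts(oracle, "bba"))  # True: a factor of "abba"
print(accepts(oracle, "aba"))  # True: a subsequence, but not a factor
```

For "abba" the sketch yields five states and accepts "aba", which is a subsequence but not a factor of the subject string.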
Corresponding States
Definition
Let M be the factor automaton for string w and let $q_1$, $q_2$ be different states of M. Let there exist two sequences of transitions in M:
$(q_0, w_1) \vdash^* (q_1, \varepsilon)$, and
$(q_0, w_2) \vdash^* (q_2, \varepsilon)$.
If $w_1$ is a suffix of $w_2$ and $w_2$ is a prefix of w, then $q_1$ and $q_2$ are corresponding states.
The factor oracle automaton can be constructed by merging the
corresponding states.
Construction of Factor Oracle
Input: String w.
Output: Deterministic factor oracle automaton Oracle(w).
1. Construct the nondeterministic factor automaton for the given string w.
2. Create the equivalent deterministic automaton.
3. Merge all corresponding states.
4. If the resulting automaton is not deterministic, repeat steps 2 and 3.
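Steps 1 and 2 can be sketched with a plain subset construction. The sketch assumes the usual nondeterministic factor automaton: a chain of positions 0..n reading w, plus a direct transition from state 0 into every position. The function name `deterministic_factor_automaton` is hypothetical, and the merging in steps 3 and 4 is not covered.

```python
def deterministic_factor_automaton(word):
    """Subset construction for the factor automaton of `word`.

    Assumed NFA: chain 0 -> 1 -> ... -> n reading `word`, plus a transition
    from state 0 to state i + 1 on word[i] for every position i.
    Returns (set of deterministic states, transition dict).
    """
    n = len(word)

    def step(subset, symbol):
        targets = set()
        for q in subset:
            if q < n and word[q] == symbol:
                targets.add(q + 1)  # chain transition
            if q == 0:
                # transitions that start reading a factor at any position
                targets.update(i + 1 for i in range(n) if word[i] == symbol)
        return frozenset(targets)

    start = frozenset({0})
    states, todo, delta = {start}, [start], {}
    while todo:
        subset = todo.pop()
        for symbol in sorted(set(word)):
            target = step(subset, symbol)
            if not target:
                continue
            delta[(subset, symbol)] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    return states, delta


states, delta = deterministic_factor_automaton("abba")
print(sorted(sorted(s) for s in states))
# [[0], [1, 4], [2], [2, 3], [3], [4]]
```

For w = abba the six subsets are the deterministic states [0], [1, 4], [2, 3], [2], [3] and [4].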
Example
Factor automaton for string x = abba.
Fact(x) = {a, b, ab, bb, ba, abb, bba, abba}
[Figure: 1. nondeterministic version, with states 0, 1, 2, 3, 4; 2. deterministic version, with states [0], [1, 4], [2, 3], [2], [3], [4].]
Example - cont.
[Figure: 1. deterministic factor automaton for the string "abba", with states [0], [1, 4], [2, 3], [2], [3], [4]; 2. deterministic factor oracle, with states [0], [1, 4], [2], [3], [4].]
The deterministic factor oracle additionally accepts the string "aba", which is not a factor of "abba".
Pushdown Automata for Trees in Prefix Notation
Construction
Pushdown Automata (PDA) for Trees in Prefix Notation
Subtree automata behave similarly to suffix automata in stringology: each subtree is a prefix of some suffix of the tree's linear representation in prefix notation.
The grammar of tree prefix notation is LL(1), so pushdown automata can be used to accept trees or their subtrees in prefix notation.
Arity checksum = number of nodes still to be read to complete the tree (the number of symbols remaining on the pushdown store). ac(q) = 0 ⇒ accept.
Only one pushdown symbol is needed ⇒ the pushdown store can be replaced by a counter.
These automata serve as an index for fast searching in trees, in time linear in m, where m is the length of the string representing the tree in prefix notation.
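The counter view of the arity checksum can be sketched in a few lines. The token format (a label followed by a one-digit arity, such as b2) and the names `parse_prefix`, `arity_checksum_trace` and `is_complete_tree` are assumptions made for illustration only.

```python
def parse_prefix(text):
    """Split 'b2 b0 a2' into [('b', 2), ('b', 0), ('a', 2)].

    Assumption: every token is a label followed by a one-digit arity.
    """
    return [(token[:-1], int(token[-1])) for token in text.split()]


def arity_checksum_trace(prefix):
    """Return the counter value after each symbol.

    The counter starts at 1 (one pushdown symbol S). Reading a node of
    arity k pops one S and pushes k, so the counter changes by k - 1.
    Returns None if a symbol would be read with an empty pushdown store.
    """
    counter = 1
    trace = []
    for _label, arity in prefix:
        if counter == 0:
            return None
        counter += arity - 1
        trace.append(counter)
    return trace


def is_complete_tree(prefix):
    """A string is a complete tree iff the counter first reaches 0 at its end."""
    trace = arity_checksum_trace(prefix)
    return bool(trace) and trace[-1] == 0


t1 = parse_prefix("b2 b0 a2 a0 a2 a0 a0")
print(arity_checksum_trace(t1))  # [2, 1, 2, 1, 2, 1, 0]
print(is_complete_tree(t1))      # True
```

The counter reaches zero exactly when the last node of a complete tree has been read, which is the accepting condition ac(q) = 0.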
Example
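Tree t1, pref(t1) = b2 b0 a2 a0 a2 a0 a0
[Figure: tree t1. The root b2 has children b0 and a2; that a2 has children a0 and a2; the inner a2 has children a0 and a0.]
Subtrees are:
1. b2 b0 a2 a0 a2 a0 a0
2. a2 a0 a2 a0 a0
3. a2 a0 a0
4. a0
5. b0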
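The list of subtrees can be reproduced directly from the prefix notation: starting at any position, the subtree ends where the arity checksum first drops to zero. This is a minimal sketch, assuming a well-formed input and tokens with a one-digit arity suffix; the function name `distinct_subtrees` is hypothetical.

```python
def distinct_subtrees(prefix):
    """Return the distinct subtrees of a tree given as a list of tokens.

    Assumptions: tokens look like 'a2' (label + one-digit arity) and the
    input is the prefix notation of a single well-formed tree.
    """
    found = []
    for start in range(len(prefix)):
        counter = 1  # one node is still expected: the subtree root
        end = start
        while counter > 0:
            counter += int(prefix[end][-1]) - 1
            end += 1
        subtree = " ".join(prefix[start:end])
        if subtree not in found:
            found.append(subtree)
    return found


for subtree in distinct_subtrees("b2 b0 a2 a0 a2 a0 a0".split()):
    print(subtree)
```

For t1 this yields the five distinct subtrees listed above, in order of their first occurrence in pref(t1).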
Example - Accepting Whole Tree
[Figure: PDA accepting the whole tree t1, a chain of states 0 to 7:]
0 --b2|S ↦ SS--> 1 --b0|S ↦ ε--> 2 --a2|S ↦ SS--> 3 --a0|S ↦ ε--> 4 --a2|S ↦ SS--> 5 --a0|S ↦ ε--> 6 --a0|S ↦ ε--> 7
Example - Accepting Subtrees
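[Figure: nondeterministic subtree PDA for t1. It contains the chain of states 0 to 7,
0 --b2|S ↦ SS--> 1 --b0|S ↦ ε--> 2 --a2|S ↦ SS--> 3 --a0|S ↦ ε--> 4 --a2|S ↦ SS--> 5 --a0|S ↦ ε--> 6 --a0|S ↦ ε--> 7,
plus six additional transitions, one for each symbol of pref(t1) after the first (b0, a2, a0, a2, a0, a0), each carrying the same pushdown operation as on the chain.]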
Deterministic PDA
Making the PDA Deterministic
Subtree PDAs can always be transformed into deterministic ones because they are input-driven: the pushdown operations are determined by the input symbol.
The algorithm is the same as for finite automata.
States that are accessible only by an invalid pushdown operation (reading a pushdown symbol from the empty pushdown store) are omitted, since our PDA accepts its input by reaching the empty pushdown store.
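The determinization with pruning can be sketched as a subset construction over pairs (state set, counter), where a configuration with an empty pushdown store is never expanded. The sketch assumes the nondeterministic subtree PDA has a chain of positions plus a transition from state 0 into every position, and that tokens carry a one-digit arity suffix; the function name `determinize_subtree_pda` is hypothetical.

```python
def determinize_subtree_pda(prefix):
    """Subset construction for a subtree PDA, tracking the pushdown counter.

    Assumed NFA: chain 0 -> 1 -> ... -> n reading `prefix`, plus a
    transition from state 0 to state i + 1 on prefix[i] for every i.
    Returns (set of deterministic states, transition dict).
    """
    n = len(prefix)
    alphabet = sorted(set(prefix))

    def step(subset, symbol):
        targets = set()
        for q in subset:
            if q < n and prefix[q] == symbol:
                targets.add(q + 1)
            if q == 0:
                targets.update(i + 1 for i in range(n) if prefix[i] == symbol)
        return frozenset(targets)

    start = (frozenset({0}), 1)
    seen, todo, delta = {start}, [start], {}
    while todo:
        subset, counter = todo.pop()
        if counter == 0:
            continue  # empty pushdown store: input accepted, nothing to pop
        for symbol in alphabet:
            target = step(subset, symbol)
            if not target:
                continue
            delta[(subset, symbol)] = target
            config = (target, counter - 1 + int(symbol[-1]))
            if config not in seen:
                seen.add(config)
                todo.append(config)
    return {subset for subset, _ in seen}, delta


states, delta = determinize_subtree_pda("b2 b0 a2 a0 a2 a0 a0".split())
print(len(states))  # 11
```

For pref(t1) = b2 b0 a2 a0 a2 a0 a0 the sketch produces eleven states: [0] to [7], [3, 5], [4, 6] and [4, 6, 7]. The state [4, 6, 7] gets no outgoing transitions because it is only reached with an empty pushdown store.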
Example - Deterministic Subtree PDA
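[Figure: deterministic subtree PDA for t1. Its states are [0], [1], [2], [3], [4], [5], [6], [7], [3, 5], [4, 6] and [4, 6, 7]. The chain
[0] --b2|S ↦ SS--> [1] --b0|S ↦ ε--> [2] --a2|S ↦ SS--> [3] --a0|S ↦ ε--> [4] --a2|S ↦ SS--> [5] --a0|S ↦ ε--> [6] --a0|S ↦ ε--> [7]
is extended by six further transitions labelled b0|S ↦ ε (once), a2|S ↦ SS (twice) and a0|S ↦ ε (three times), among them the transitions entering and leaving [3, 5], [4, 6] and [4, 6, 7].]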
Oracle Pushdown Automata
Construction
Oracle PDA - Construction
The construction adapts all steps of the construction of string factor oracle automata.
Corresponding states are defined in a similar manner.
The oracle PDA is constructed by merging all pairs of corresponding states.
Example for Subtree
[Figure, top: deterministic subtree PDA for t1, with states [0], [1], [2], [3], [4], [5], [6], [7], [3, 5], [4, 6] and [4, 6, 7].]
Corresponding states are marked with distinct colours.
[Figure, bottom: subtree oracle PDA for t1 after merging the corresponding states. Only the states [0] to [7] remain; the chain
[0] --b2|S ↦ SS--> [1] --b0|S ↦ ε--> [2] --a2|S ↦ SS--> [3] --a0|S ↦ ε--> [4] --a2|S ↦ SS--> [5] --a0|S ↦ ε--> [6] --a0|S ↦ ε--> [7]
is extended by four transitions labelled b0|S ↦ ε (once), a2|S ↦ SS (once) and a0|S ↦ ε (twice).]
Example for Subtree - cont.
[Figure: subtree oracle PDA for t1, with states [0] to [7], shown together with the trees t1 and t2.]
Tree t2 is not a subtree of the subject tree t1.
Pref(t2) is a subsequence of Pref(t1), created by omitting one repetition of a2 a0 in Pref(t1).
This automaton accepts tree t2.
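The false positive can be replayed with a counter-based simulation. The transition table below is an editor's reconstruction and therefore an assumption: it takes the chain of states 0 to 7 and adds transitions 0 → 2 on b0, 0 → 3 on a2, 0 → 4 on a0 and 4 → 7 on a0, which is one way of merging the corresponding states of the deterministic PDA. Pref(t2) = b2 b0 a2 a0 a0 is likewise derived from the description above rather than quoted from the slides.

```python
# Assumed oracle PDA for t1 (editor's reconstruction, see the lead-in).
ORACLE = {
    (0, "b2"): 1, (1, "b0"): 2, (2, "a2"): 3, (3, "a0"): 4,
    (4, "a2"): 5, (5, "a0"): 6, (6, "a0"): 7,        # chain
    (0, "b0"): 2, (0, "a2"): 3, (0, "a0"): 4,        # subtree entry points
    (4, "a0"): 7,                                    # created by merging
}


def accepts(delta, prefix):
    """Run the PDA with the pushdown store replaced by a counter."""
    state, counter = 0, 1
    for symbol in prefix:
        if counter == 0 or (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
        counter += int(symbol[-1]) - 1  # pop one S, push `arity` symbols
    return counter == 0


def is_subtree(subject, candidate):
    """A complete tree is a subtree iff it occurs contiguously in the subject."""
    m = len(candidate)
    return any(subject[i:i + m] == candidate
               for i in range(len(subject) - m + 1))


t1 = "b2 b0 a2 a0 a2 a0 a0".split()
t2 = "b2 b0 a2 a0 a0".split()
print(accepts(ORACLE, t2), is_subtree(t1, t2))  # True False
```

Under these assumptions the oracle accepts t2 although t2 does not occur in t1, while genuine subtrees such as a2 a0 a0 are still accepted.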
PDA for Prefix Bar Notation
Unranked Trees – Bar Notation
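Subtrees can be marked with parentheses.
Tree t1 with its subtrees in brackets: [b[b][a[a][a[a][a]]]]
Bar notation (opening brackets omitted): bb]aa]aa]a]]]]
[Figure: tree t1. The root b has children b and a; that a has children a and a; the second of these again has children a and a.]
Subtrees are:
1. bb]aa]aa]a]]]]
2. aa]aa]a]]]
3. aa]a]]
4. a]
5. b]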
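The bar notation can be generated recursively: a node is written as its label, followed by its children, followed by a closing bar. This is a minimal sketch, assuming trees are encoded as (label, children) pairs; the encoding and the names `bar_notation` and `subtrees_in_bar_notation` are illustrative assumptions.

```python
def bar_notation(tree):
    """Write a node as its label, then its children, then the bar ']'."""
    label, children = tree
    return label + "".join(bar_notation(child) for child in children) + "]"


def subtrees_in_bar_notation(tree):
    """Collect the distinct subtrees in order of first (preorder) occurrence."""
    found = []

    def visit(node):
        text = bar_notation(node)
        if text not in found:
            found.append(text)
        for child in node[1]:
            visit(child)

    visit(tree)
    return found


leaf_a = ("a", [])
T1 = ("b", [("b", []), ("a", [leaf_a, ("a", [leaf_a, leaf_a])])])
print(bar_notation(T1))  # bb]aa]aa]a]]]]
print(subtrees_in_bar_notation(T1))
```

For t1 this reproduces the string bb]aa]aa]a]]]] and the five distinct subtrees listed above. No arities are needed, which is why the notation also suits unranked trees.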
Subtree PDA for Bar notation
[Figure: nondeterministic subtree PDA for t1 in bar notation, with states 0 to 14. The chain of states reads bb]aa]aa]a]]]], using b|S ↦ S for the root, b|S ↦ SS and a|S ↦ SS for the other nodes, and ]|S ↦ ε for each bar. Six additional transitions, one labelled b|S ↦ S and five labelled a|S ↦ S, start the reading of the proper subtrees.]
Deterministic Subtree PDA for Bar notation
[Figure: deterministic subtree PDA for t1 in bar notation. Besides the states 0 and 2 to 14, it contains the subset states [1, 2], [5, 8], [6, 9], [7, 10], [6, 9, 11] and [4, 5, 7, 8, 10]. Transitions are labelled b|S ↦ S, a|S ↦ S, b|S ↦ SS, a|S ↦ SS and ]|S ↦ ε.]
Subtree Oracle PDA for Bar notation
[Figure: subtree oracle PDA for t1 in bar notation, with states 0, [1, 2] and 2 to 14.]
False positives:
1. bb]a]a]]
2. bb]a]aa]a]]]
3. bb]aa]a]]]
Safe Oracle
Properties
Safe Oracle
Two significant properties suggest that, for trees, the number of false positives should be smaller.
1. During the transformation to a PDA that uses the empty pushdown store as its stop condition, inaccessible states are omitted. Only strings representing trees are accepted.
2. During a state merge, an outgoing transition of one state, combined with an incoming transition of the other state, may cause the automaton to accept a new tree.
The second property makes room for specifying the conditions of a "safe" state merge.
Input Arity Checksum
Definition
Minimal input arity checksum of state q ∈ Q:
$AC^-_{\min}(q) = \min\{i : (q_0, \alpha\beta, S) \vdash^* (q, \beta, S^i),\ \alpha \in A^*,\ \beta \in A^*,\ i \ge 0\}$
Definition
Minimal input arity checksum of state q ∈ Q for input symbol x ∈ A:
$ac^-_{\min}(q, x) = \min\{i : (q_0, \alpha x\beta, S) \vdash^* (q, \beta, S^i),\ \alpha \in A^*,\ \beta \in A^*,\ i \ge 0\}$
The triple represents a pushdown automaton configuration. A is a ranked alphabet. $q_0 \in Q$ is the initial state of the pushdown automaton.
In other words, the minimal input arity checksum specifies the minimal number of pushdown symbols that can appear in the pushdown store in state q after reading an arbitrary input α.
The arity checksum for a particular input symbol only specifies the last transition leading to state q.
Similarly, we can define the "max" versions ($AC^-_{\max}$, $ac^-_{\max}$) as well.
Output Arity Checksum
Definition
Maximal output arity checksum of state q ∈ Q:
$AC^+_{\max}(q) = \max\{i : (q, \alpha, S^i) \vdash^* (r, \varepsilon, \varepsilon),\ \alpha \in A^*,\ r \in Q,\ i \ge 0\}$
Definition
Maximal output arity checksum of state q ∈ Q for input symbol x ∈ A:
$ac^+_{\max}(q, x) = \max\{i : (q, x\alpha, S^i) \vdash^* (r, \varepsilon, \varepsilon),\ \alpha \in A^*,\ r \in Q,\ i \ge 0\}$
The triple represents a pushdown automaton configuration. A is a ranked alphabet.
The maximal output arity checksum specifies the maximal number of pushdown symbols that can be consumed starting from state q while reading an arbitrary input α.
The arity checksum for a particular input symbol only specifies the first transition leading from state q.
Similarly, we can define the "min" versions ($AC^+_{\min}$, $ac^+_{\min}$) as well.
Example
[Figure: deterministic subtree PDA for t1, with states [0], [1], [2], [3], [4], [5], [6], [7], [3, 5], [4, 6] and [4, 6, 7].]
$AC^-_{\min}([4, 6, 7]) = AC^+_{\max}([4, 6, 7]) = 0$
$AC^-_{\min}([4, 6]) = ac^+_{\max}([4, 6], a2) = ac^+_{\max}([4, 6], a0) = 1$
$AC^-_{\min}([5]) = 2$ (by reading either a2 a0 a2 or b2 b0 a2 a0 a2)
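The input arity checksums of this example can be recomputed by exploring all reachable configurations (state, counter) of the deterministic PDA. The sketch assumes the subtree PDA construction in which state 0 has a transition into every position of the chain, and tokens with a one-digit arity suffix; the function name `input_arity_checksums` is hypothetical.

```python
from collections import defaultdict


def input_arity_checksums(prefix):
    """Map every deterministic state to the counter values it is entered with.

    min() of such a set then plays the role of AC^-_min and max() the
    role of AC^-_max. Assumed NFA: chain of positions plus a transition
    from state 0 into every position.
    """
    n = len(prefix)

    def step(subset, symbol):
        targets = {q + 1 for q in subset if q < n and prefix[q] == symbol}
        if 0 in subset:
            targets |= {i + 1 for i in range(n) if prefix[i] == symbol}
        return frozenset(targets)

    counters = defaultdict(set)
    todo = [(frozenset({0}), 1)]
    while todo:
        subset, counter = todo.pop()
        if counter in counters[subset]:
            continue  # configuration already explored
        counters[subset].add(counter)
        if counter == 0:
            continue  # empty pushdown store: no further moves
        for symbol in set(prefix):
            target = step(subset, symbol)
            if target:
                todo.append((target, counter - 1 + int(symbol[-1])))
    return counters


counters = input_arity_checksums("b2 b0 a2 a0 a2 a0 a0".split())
for state in ({4, 6, 7}, {4, 6}, {5}):
    print(sorted(state), min(counters[frozenset(state)]))
```

For t1 the minima for [4, 6, 7], [4, 6] and [5] come out as 0, 1 and 2, in agreement with the values above.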
Safe merge
Safe merge
Lemma
Two distinct corresponding states q, r of a deterministic subtree pushdown automaton can be "safely" merged if at least one of the following conditions is met:
1. Either $AC^-_{\max}(q) = 0$ or $AC^-_{\max}(r) = 0$.
2. For every input symbol x ∈ A such that δ(q, x, S) ≠ δ(r, x, S), it holds that $ac^+_{\max}(r, x) < AC^-_{\min}(q)$ and $ac^+_{\max}(q, x) < AC^-_{\min}(r)$.
By sequentially performing safe merges of corresponding states, we may create a "safe" oracle PDA.
Example of Safe Merge
Pref (t3 ) = a3 a2 a0 a0 a2 a0 b0 a2 a0 a0
[Figure: deterministic subtree PDA for t3, with states [0] to [10] and the subset states [2, 5, 8], [3, 6, 9], [4, 10] and [3, 4, 6, 9, 10].]
$AC^-_{\max}([4, 10]) = 0$ → condition 1 is satisfied.
Example of Safe Merge - cont.
Pref (t3 ) = a3 a2 a0 a0 a2 a0 b0 a2 a0 a0
[Figure: the automaton after the merge of state [4, 10]; the remaining subset states are [2, 5, 8], [3, 6, 9] and [3, 4, 6, 9, 10].]
$ac^+_{\max}([3], b0) = 0 < AC^-_{\min}([3, 6, 9]) = 1$
$ac^+_{\max}([3, 6, 9], b0) = 2 < AC^-_{\min}([3]) = 3$
→ condition 2 is satisfied.
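The check performed in this step can be written as a small predicate over precomputed arity checksums. The function `safe_to_merge` and the dictionary layout are illustrative assumptions. The values of $AC^-_{\min}$ and $ac^+_{\max}$ are those given in this example, whereas the two $AC^-_{\max}$ values are not stated on the slides and were derived by the editor. The example itself only checks the symbol b0.

```python
def safe_to_merge(q, r, differing, ac_in_max, ac_in_min, ac_out_max):
    """Test the two sufficient conditions for a safe merge of q and r.

    differing  -- input symbols x with delta(q, x, S) != delta(r, x, S)
    ac_in_max  -- state -> maximal input arity checksum
    ac_in_min  -- state -> minimal input arity checksum
    ac_out_max -- (state, symbol) -> maximal output arity checksum
    """
    # Condition 1: one of the states is only entered with an empty store.
    if ac_in_max[q] == 0 or ac_in_max[r] == 0:
        return True
    # Condition 2: no differing transition can be combined with a path
    # entering the other state.
    return all(
        ac_out_max[(r, x)] < ac_in_min[q] and ac_out_max[(q, x)] < ac_in_min[r]
        for x in differing
    )


ac_in_min = {"[3]": 3, "[3,6,9]": 1}   # from the example
ac_in_max = {"[3]": 3, "[3,6,9]": 1}   # assumption: derived by the editor
ac_out_max = {("[3]", "b0"): 0, ("[3,6,9]", "b0"): 2}  # from the example

print(safe_to_merge("[3]", "[3,6,9]", ["b0"],
                    ac_in_max, ac_in_min, ac_out_max))  # True
```

Both conditions are sufficient only, so a False result means that the lemma does not apply, not that the merge necessarily introduces a false positive.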
Example of Safe Merge - cont.
Pref (t3 ) = a3 a2 a0 a0 a2 a0 b0 a2 a0 a0
[Figure: the automaton after the previous merges; the remaining subset states are [2, 5, 8] and [3, 4, 6, 9, 10].]
There is no input symbol x such that δ([2, 5, 8], x, S) ≠ δ([2], x, S)
⇒ condition 2 is satisfied.
Example of Safe Merge - cont.
Pref (t3 ) = a3 a2 a0 a0 a2 a0 b0 a2 a0 a0
[Figure: the automaton after the previous merges; the only remaining subset state is [3, 4, 6, 9, 10].]
$AC^-_{\max}([3, 4, 6, 9, 10]) = 0$
⇒ condition 1 is satisfied.
Example of Safe Merge - cont.
Pref (t3 ) = a3 a2 a0 a0 a2 a0 b0 a2 a0 a0
[Figure: the resulting safe oracle PDA for t3, with states [0] to [10].]
This automaton does not accept any additional input.
Summary
Summary
Oracle automata for trees in prefix notation and in prefix bar notation were introduced.
General properties of such automata were illustrated, including the definition of safe state merging.
Open problems:
Define the properties of falsely accepted trees.
Find common properties of particular kinds of input trees (e.g. repetitions).
Arbology
http://www.arbology.org/