
Property testing of Tree Regular Languages
Frédéric Magniez, LRI, CNRS
Michel de Rougemont, LRI, University Paris II
1. Tester for regular words with the Edit Distance with Moves
2. Tester for ranked regular trees with the Tree-Edit Distance with Moves
Testers on a class K
Let F be a property on a class K of structures U.
An ε-tester for F is a probabilistic algorithm A such that:
• If U ⊨ F, A accepts.
• If U is ε-far from F, A rejects with high probability.
• The running time of A is independent of n, the size of U.
(Goldreich, Goldwasser, Ron 1996; Rubinfeld, Sudan 1994)
A tester usually implies a linear-time corrector.
History of Testers
Self-testers and correctors for linear algebra, Blum & Kannan 1989
Robust characterizations of polynomials, R. Rubinfeld, M. Sudan, 1994
Testers for graph properties: k-colorability, Goldreich et al. 1996
First-order ∃∀ graph properties have testers, Alon et al. 1999
Regular languages have testers, Alon et al. 2000
Testers for regular tree languages, de Rougemont and Magniez, ICALP 2004
Edit distance on Words
1. Classical Edit Distance:
Insertions, Deletions, Modifications
2. Edit Distance with moves
0111000011110011001
0111011110000011001
3. Edit Distance with Moves generalizes to Trees
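The two words above differ by a single move: the block 1111 is cut out and reinserted four positions earlier. A minimal Python sketch that checks this, and computes the classical edit distance with the standard dynamic program for comparison. The helper names `levenshtein` and `apply_move`, and the move encoding (remove `word[i:j]`, reinsert it at position `k` of what remains), are illustrative choices of ours, not notation from the talk.

```python
def levenshtein(u: str, v: str) -> int:
    """Classical edit distance: insertions, deletions, substitutions (cost 1 each)."""
    prev = list(range(len(v) + 1))  # distances from "" to each prefix of v
    for i, cu in enumerate(u, start=1):
        cur = [i]  # distance from u[:i] to ""
        for j, cv in enumerate(v, start=1):
            cur.append(min(prev[j] + 1,                # delete cu
                           cur[j - 1] + 1,             # insert cv
                           prev[j - 1] + (cu != cv)))  # substitute (or match)
        prev = cur
    return prev[-1]


def apply_move(word: str, i: int, j: int, k: int) -> str:
    """One move: cut the block word[i:j] and reinsert it at position k of the rest."""
    block = word[i:j]
    rest = word[:i] + word[j:]
    return rest[:k] + block + rest[k:]


U = "0111000011110011001"
V = "0111011110000011001"
print(apply_move(U, 8, 12, 5) == V)  # True: a single move turns U into V
print(levenshtein(U, V))             # several classical edits are needed
```

Here the classical distance is at most the Hamming distance (6) but at least 2, while one move suffices.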
Testers on words
A simpler proof, which generalizes to regular trees.
L is a regular language and A an automaton for L.
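[Figure: automaton A with strongly connected components C0, C1, C2, C3, C4, an initial state (init) and an accepting state (accept)]
Admissible path: Z = C0.C2.C3.C4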
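The components and admissible paths can be computed mechanically. A minimal sketch, assuming a complete DFA given as a Python dict `delta[(state, letter)]`, and reading an admissible path as a sequence of components that starts at the component of the initial state, ends at a component containing an accepting state, and follows a transition of A at each step. The helper names and the example automaton (for a*b*) are hypothetical, chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Delta = Dict[Tuple[int, str], int]


def reachable(delta: Delta, alphabet: str, start: int) -> Set[int]:
    """States reachable from `start`, including `start` itself."""
    seen, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for letter in alphabet:
            r = delta[(q, letter)]
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def components(states: List[int], delta: Delta, alphabet: str) -> List[FrozenSet[int]]:
    """Strongly connected components via mutual reachability (fine for small m)."""
    reach = {q: reachable(delta, alphabet, q) for q in states}
    comps: List[FrozenSet[int]] = []
    for q in states:
        if any(q in c for c in comps):
            continue
        comps.append(frozenset(p for p in states if p in reach[q] and q in reach[p]))
    return comps


def admissible_paths(states, delta, alphabet, init, accepting):
    """Paths in the component DAG from init's component to an accepting component."""
    comps = components(states, delta, alphabet)
    index = {q: k for k, c in enumerate(comps) for q in c}
    succ: Dict[int, Set[int]] = {k: set() for k in range(len(comps))}
    for (q, _letter), r in delta.items():
        if index[q] != index[r]:
            succ[index[q]].add(index[r])
    paths: List[List[int]] = []

    def extend(path: List[int]) -> None:
        last = path[-1]
        if comps[last] & accepting:
            paths.append(path)
        for nxt in sorted(succ[last]):
            extend(path + [nxt])  # terminates: the component graph is acyclic

    extend([index[init]])
    return comps, paths


# Hypothetical DFA for a*b*; state 2 is a dead state.
DELTA: Delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2, (1, "b"): 1, (2, "a"): 2, (2, "b"): 2}
comps, paths = admissible_paths([0, 1, 2], DELTA, "ab", init=0, accepting={0, 1})
print(comps)  # [frozenset({0}), frozenset({1}), frozenset({2})]
print(paths)  # [[0], [0, 1]]
```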
A word W is Z-feasible if there are two states $q \in C_i$, $q' \in C_j$ such that $q \xrightarrow{W} q'$ and $Z = \dots C_i \dots C_j \dots$
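Checking Z-feasibility only needs the extended transition function from every state of the components in Z. A sketch for a deterministic automaton, assuming a hypothetical DFA for a*b* (state 2 is a dead state), components listed by hand, Z encoded as a list of component indices, and C_i = C_j allowed.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical complete DFA for L = a*b* over {a, b}; state 2 is a dead state.
DELTA: Dict[Tuple[int, str], int] = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 2, (2, "b"): 2,
}
# Its strongly connected components, listed by hand: C0 = {0}, C1 = {1}, C2 = {2}.
COMPONENTS: List[Set[int]] = [{0}, {1}, {2}]


def run(q: int, word: str) -> int:
    """Extended transition function: the state reached from q after reading word."""
    for letter in word:
        q = DELTA[(q, letter)]
    return q


def is_z_feasible(word: str, z: List[int]) -> bool:
    """Some q in C_{z[i]} reads `word` into some q' in C_{z[j]} with i <= j."""
    for i, ci in enumerate(z):
        for q in COMPONENTS[ci]:
            target = run(q, word)
            if any(target in COMPONENTS[cj] for cj in z[i:]):
                return True
    return False


print(is_z_feasible("aabb", [0, 1]))  # True: 0 reads aabb into 1, and C0 precedes C1
print(is_z_feasible("ba", [0, 1]))    # False: every run falls into the dead state
```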
The Tester
Tester. Input: W, A, ε
For $i = 1, \dots, \log(m/\varepsilon)$:
  Choose $N_i = 2^i m^3/\varepsilon$ random subwords $w_{ij}$ of size $2^{i+1}$.
For every admissible path Z:
  If all $w_{ij}$ of W are Z-feasible, ACCEPT.
Otherwise, REJECT.
Theorem: Tester(W, A, ε) is an ε-tester for L(A).
Proof schema of the Tester
Theorem: Regular words are testable.
Robustness lemma: If W is ε-far from L, then for every admissible path Z, there exists $i \le \log(5m^2/\varepsilon)$ such that the number of Z-infeasible subwords of size $2^{i+1}$ is at least $\frac{\varepsilon n}{2^{i+1} m^2}$.
Splitting lemma: if W is far from L there are many disjoint
infeasible subwords.
Amplifying lemma: If there are many infeasible words, there are
many short ones.
Merging
Merging lemma: Let Z be an admissible path, and let F be a Z-feasible cut of size h'. Then $\mathrm{Dist}(F, L) \le m^2 h'$.
Take each word $w_i \in F$ and split it along its connected components, removing single letters. Rearrange all the words of the same component in its Z-order.
Add gluing words $g_i$ to obtain $W' \in L$:
$W' = g_0 \cdot w_1 \cdot g_1 \cdot w_2 \cdot g_2 \cdot w_3 \cdots$
Splitting
Splitting lemma: If Z is an admissible path and W a word such that $\mathrm{Dist}(W, L) > h$, then W has more than $h/m^2$ disjoint Z-infeasible subwords ($h = \varepsilon n$).
Proof by contraposition:
Suppose W has only $h' \le h/m^2$ minimal, disjoint Z-infeasible subwords.
Removing their last letters provides a feasible cut F, with $\mathrm{Dist}(W, F) \le h'$.
By the merging lemma, $\mathrm{Dist}(F, L) \le m^2 h'$.
Hence $\mathrm{Dist}(W, L) \le h' + m^2 h'$,
and so $\mathrm{Dist}(W, L) \le h$.
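The cut in the proof can be built greedily: scan W from left to right, extend the current piece until it becomes Z-infeasible, delete that last letter, and start a new piece. Each deleted letter closes a minimal Z-infeasible subword, these subwords are disjoint, and every remaining piece is Z-feasible, so the number of deletions h' bounds Dist(W, F). A sketch assuming a hypothetical DFA for a*b* with components {0}, {1}, {2}; `feasible_cut` is an illustrative name.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical DFA for a*b* (state 2 is dead) and its components C0, C1, C2.
DELTA: Dict[Tuple[int, str], int] = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2,
                                     (1, "b"): 1, (2, "a"): 2, (2, "b"): 2}
COMPONENTS: List[Set[int]] = [{0}, {1}, {2}]


def run(q: int, word: str) -> int:
    for letter in word:
        q = DELTA[(q, letter)]
    return q


def is_z_feasible(word: str, z: List[int]) -> bool:
    return any(run(q, word) in COMPONENTS[cj]
               for i, ci in enumerate(z) for q in COMPONENTS[ci] for cj in z[i:])


def feasible_cut(word: str, z: List[int]) -> Tuple[List[str], List[int]]:
    """Return the Z-feasible pieces and the positions of the deleted letters (h' of them)."""
    pieces: List[str] = []
    deleted: List[int] = []
    start, end = 0, 1
    while end <= len(word):
        if is_z_feasible(word[start:end], z):
            end += 1                            # keep extending the current piece
        else:
            pieces.append(word[start:end - 1])  # longest feasible prefix
            deleted.append(end - 1)             # last letter of a minimal infeasible subword
            start, end = end, end + 1
    if start < len(word):
        pieces.append(word[start:])             # the tail never became infeasible
    return pieces, deleted


print(feasible_cut("baba", [0, 1]))  # (['b', 'b'], [1, 3])
```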
Tree-Edit-Distance
[Figure: tree edit operations on a labeled tree: deletion of an edge; insertion of a node and its label]
Tree Edit distance with moves:
[Figure: two labeled trees at distance 1 move]
Computing this distance is NP-complete and non-approximable.
Tree-Edit-Distance on binary trees
Binary trees: distance with moves allows permutations
Distance(T1,T2) =4
m-Distance (T1,T2) =2
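Moves let us permute the two children of a node. A small sketch that tests whether two labeled binary trees coincide up to such child permutations, by comparing canonical forms in which the children are sorted at every node. It assumes labels contain no parentheses, commas, or "-", and it does not compute the distances quoted above; `Tree`, `canonical`, and `equal_up_to_permutations` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tree:
    label: str
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def canonical(t: Optional[Tree]) -> str:
    """Canonical string of t, invariant under swapping the two children of any node."""
    if t is None:
        return "-"
    a, b = sorted([canonical(t.left), canonical(t.right)])
    return f"{t.label}({a},{b})"


def equal_up_to_permutations(t1: Tree, t2: Tree) -> bool:
    return canonical(t1) == canonical(t2)


T1 = Tree("a", Tree("b"), Tree("c", Tree("d"), None))
T2 = Tree("a", Tree("c", None, Tree("d")), Tree("b"))
print(equal_up_to_permutations(T1, T2))  # True: only the child orders differ
```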
Tree automata
• (q0, q0) → q1
• (q0, q1) → q1
• (q1, q1) → q2
• (q1, q0) → q2
• (q2, -) → q2
• (-, q2) → q2
A = (Q, q0, δ, q1)
[Figure: a run of A on a binary tree, with q0 at the leaves and states q1, q2 at internal nodes]
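The transition rules above can be run bottom-up. A sketch in Python, assuming full binary trees (every internal node has exactly two children), that leaves receive the initial state q0, and that q1 is the only accepting state; `Node`, `state`, and `accepts` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    """Node of a full binary tree: both children are None (leaf) or neither is."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# Transitions (left state, right state) -> state, copied from the rules above.
RULES: Dict[Tuple[str, str], str] = {
    ("q0", "q0"): "q1",
    ("q0", "q1"): "q1",
    ("q1", "q1"): "q2",
    ("q1", "q0"): "q2",
}


def state(node: Node) -> str:
    """Bottom-up run of A; leaves are assumed to receive q0."""
    if node.left is None and node.right is None:
        return "q0"
    left, right = state(node.left), state(node.right)
    if "q2" in (left, right):
        return "q2"  # (q2, -) -> q2 and (-, q2) -> q2: q2 is absorbing
    return RULES[(left, right)]


def accepts(tree: Node) -> bool:
    return state(tree) == "q1"  # assumption: q1 is the accepting state


comb = Node(Node(), Node(Node(), Node()))  # every left child is a leaf
bent = Node(Node(Node(), Node()), Node())  # internal left child
print(accepts(comb), accepts(bent))        # True False
```

Under these assumptions the rules accept exactly the non-leaf trees in which every left child is a leaf: an internal left child yields q1 on the left, which forces q2.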
Infeasible subtrees
Fact. If $\mathrm{Distance}(T, L) \ge \varepsilon n$, then the number of infeasible subtrees of constant size is $\Omega(n)$.
Tester for regular Trees
Tester. Input: T, A, ε
For $i \le (2m/\varepsilon)^{r+1}$:
  Choose $N_i = \frac{m \cdot r \cdot 4m^3}{\varepsilon^2}$ random nodes and subtrees $t_{ij}$ of size i.
If all $t_{ij}$ of T are Z-feasible, ACCEPT.
Theorem: Tester(T, A, ε) is an ε-tester for L(A).
Proof schema of the Tester
Theorem: Regular trees are testable.
Robustness lemma: If T is ε-far from L, then for every admissible path Z, there exists $i \le (2m/\varepsilon)^{r+1}$ such that the number of Z-infeasible i-subtrees is at least $\frac{1}{r} \cdot \frac{\varepsilon^2 n}{4m^3}$.
Splitting lemma: if T is far from L there are many disjoint infeasible
subtrees.
Amplifying lemma: If there are many infeasible subtrees, there are
many small ones.
Splitting and Merging
Splitting and Merging on words:
[Figure: a word split into pieces along connected components C, then merged]
Splitting and Merging on trees:
[Figure: a tree split along its connected components (C, D, E) and merged into a corrected tree]
Conclusion
• Verification is hard.
• Approximate verification can be feasible.
1. Testers and correctors for regular words
2. Tester for regular trees
3. Corrector for regular trees
4. Unranked trees: XML files
5. Applications: constant-time algorithm for the Edit Distance with moves (Fischer, Magniez, de Rougemont)