Multi-dimensional Trees and a Chomsky-Schützenberger

ISSN 1346-5597
NII Technical Report
Multi-dimensional Trees and a
Chomsky-Schützenberger-Weir Representation
Theorem for Simple Context-Free Tree Grammars
Makoto Kanazawa
NII-2013-003E
Nov. 2013
Multi-dimensional Trees and a
Chomsky-Schützenberger-Weir Representation
Theorem for Simple Context-Free Tree
Grammars
Makoto Kanazawa?
National Institute of Informatics, Tokyo, Japan
[email protected]
Abstract. Weir [34] proved a Chomsky-Schützenberger-like representation theorem for the string languages of tree-adjoining grammars, where
the Dyck language Dn in the Chomsky-Schützenberger characterization
is replaced by the intersection D2n ∩ g −1 (D2n ), where g is a certain bijection on the alphabet consisting of 2n pairs of brackets. This paper
presents a generalization of this theorem to the string languages that
are the yield images of the tree languages generated by simple (i.e., linear and non-deleting) context-free tree grammars. This result is obtained
through a natural generalization of the original Chomsky-Schützenberger
theorem to the tree languages of simple context-free tree grammars. We
use Rogers’s [24,23] notion of multi-dimensional trees to state this latter
theorem in a very general, abstract form.
Keywords: Context-free tree grammar; Multi-dimensional tree; Dyck
language; Chomsky-Schützenberger theorem
1
Introduction
Weir [34] showed that every string language L generated by a tree-adjoining
grammar [11] can be written as
L = h(R ∩ D2n ∩ g −1 (D2n )),
where h is a homomorphism, R is a regular set, n is a positive integer, D2n
is the Dyck language over the alphabet Γ2n consisting of 2n pairs of brackets
[1 , ]1 , , . . . , [2n , ]2n , and g is the bijection on Γ2n defined by
g([2i+1 ) = [2i+1 ,
g(]2i+1 ) = ]2i+2 ,
g([2i+2 ) = ]2i+1 ,
g(]2i+2 ) = [2i+2 ,
for i = 0, . . . , n − 1. The effect of the intersection with g −1 (D2n ) on the Dyck
language D2n is to make the consecutive odd-numbered and even-numbered
?
This work was in part supported by the Japan Society for the Promotion of Science
Grant-in-Aid for Scientific Research (KAKENHI) (25330020).
2
Makoto Kanazawa
brackets [2i+1 , ]2i+1 , [2i+2 , ]2i+2 always appear as a group, in the configuration [2i+1 [2i+2 ]2i+2 ]2i+1 . When two such groups, say, [1 , ]1 , [2 , ]2 and
[3 , ]3 , [4 , ]4 , overlap, the only possible configurations are
[1 [3 [4 ]4 ]3 [2 ]2 ]1 ,
[1 [2 ]2 [3 [4 ]4 ]3 ]1 ,
[1 [3 [4 [2 ]2 ]4 ]3 ]1 ,
and those with the positions of the two groups interchanged. As Weir [34] showed,
D2n ∩ g −1 (D2n ) is a non-context-free tree-adjoining language for every n ≥ 1.
In this paper, we prove a generalization of Weir’s theorem for simple (i.e.,
linear and non-deleting) context-free tree grammars 1 [26,8,15]: if L is the string
language generated by a simple context-free tree grammar of rank q − 1, then L
can be written as
L = h(R ∩ Dqn ∩ g −1 (Dqn )),
where h, R, and n are as before, Dqn is the Dyck language over the alphabet
Γqn (containing qn pairs of brackets), and g is the bijection on Γqn defined by
g([qi+1 ) = [qi+1 ,
g(]qi+1 ) = ]qi+q ,
g([qi+j ) = ]qi+j−1 ,
g(]qi+j ) = [qi+j ,
for i = 0, . . . , n − 1 and j = 2, . . . , q. This result generalizes Weir’s [34] because
tree-adjoining grammars generate the same string languages as simple contextfree tree grammars that are monadic (i.e., of rank 1) [21,10,16]. As in the original
Chomsky-Schützenberger theorem [4,3], we can take R to be a local set and h
to be alphabetic in the sense that h maps each symbol either to a symbol or to
the empty string.
It is known [12,13] that the string languages of simple context-free tree grammars are exactly those generated by multiple context-free grammars [29] that are
well-nested in the sense of [13]. For multiple context-free grammars in general,
Yoshinaka et al. [36] have proved a Chomsky-Schützenberger-like representation
theorem, but the analogy to the Chomsky-Schützenberger theorem is somewhat
weak because their notion of a multiple Dyck language is given only by reference
to a certain multiple context-free grammar, and does not seem to have other independent characterizations, analogous to the characterization of ordinary Dyck
languages in terms of the cancellation law [i ]i
ε. Our result is obtained via a
natural generalization of the Chomsky-Schützenberger theorem to the tree languages of simple context-free tree grammars. This intermediate result is stated
in terms of Dyck tree languages, which are exactly analogous to the original Dyck
languages in that they have two equivalent definitions, one in terms of inductive
definitions and one in terms of rewriting with cancellation laws.
In order to emphasize the analogy between the string case and the tree
case, we use the notion of a multi-dimensional tree introduced by Rogers [24,23]
1
The term “simple context-free tree grammar” is taken from [7].
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
3
and state many lemmas as general facts about m-dimensional trees. We use 3dimensional trees to represent derivation trees of simple context-free tree grammars. We note that our notion of the yield of an m-dimensional tree will be
different from Rogers’s, because Rogers’s definition was specifically tailored for
the monadic case.2
2
Preliminaries
2.1
First-Child-Next-Sibling Encoding of Ordered Unranked Trees
In an ordered unranked tree, a node may have any number of children, and the
children of the same node are linearly ordered. We do not consider unordered
trees in this paper, so we call ordered unranked trees simply unranked trees. In
the usual term notation for unranked trees [32], the unranked trees over a set Σ
of labels are defined inductively as follows:
– If c ∈ Σ, then c is an unranked tree over Σ.
– If t1 , . . . , tn are unranked trees over Σ (n ≥ 1) and c ∈ Σ, then c(t1 . . . tn ) is
an unranked tree over Σ.
There is a well-known way of encoding unranked trees into binary trees [17],
often called the first-child-next-sibling encoding. We refer to a node in a binary
tree by a string over the set {1, 2}. Thus, the set of nodes of a binary tree forms
a prefix-closed subset T of {1, 2}∗ . (Note that we do not assume binary trees
to be full in the sense that each node has 0 or 2 children.) We write u · v for
the concatenation of two strings u, v ∈ {1, 2}∗ . In the first-child-next-sibling
encoding of unranked trees, the relation u · 2 = v represents the relation “v is the
first child of u”, and the relation u · 1 = v represents the relation “v is the next
sibling of u”. The child relation is then represented by the first-child relation
composed with the reflexive transitive closure of the next-sibling relation. In
this way, any non-empty finite prefix-closed subset T of {1, 2}∗ such that 1 6∈ T
encodes the set of nodes of some unranked tree. In general, an arbitrary nonempty finite prefix-closed subset of {1, 2}∗ encodes the nodes of a hedge, a finite,
non-empty sequence of unranked trees.3 In this encoding, ε (empty string) is the
2
3
When the present work was nearing completion, the author learned of a recent
paper by Sorokin [30], in which he states (without proof) a result similar to Theorem 42 below (Theorem 3 of [30]). (The statement of his theorem is actually closer to
Lemma 39 below.) As will be clear to the reader, the emphasis of the present paper
is very different from Sorokin’s. The merit of the present work lies not so much in
Theorem 42 itself as in the method of obtaining it though a natural generalization
of the constructions that can be used to prove the original Chomsky-Schützenberger
theorem. (Sorokin’s own emphasis is on the use of monoid automata to characterize
the string languages of simple context-free tree grammars.)
Sometimes the empty sequence of unranked trees is also allowed as a hedge, but we
exclude it here in order to be able to encode all hedges into binary trees. Note that
Knuth [17] and Takahashi [31] used forest instead of hedge, the term we adopt here
following [5].
4
Makoto Kanazawa
root of the first tree, 1 is the root of the second tree, 1 · 1 is the root of the third
tree, and so on.
Trees and hedges we consider in this paper are all labeled. Labeled unranked
trees and hedges over Σ are represented by pairs of the form T = (T, `), where
T is a non-empty finite prefix-closed subset of {1, 2}∗ and ` is a function from
T to Σ.
2.2
Dyck Languages
Sn
For n ≥ 1, let Γn = i=1 {[i , ]i }. For each i, the two symbols [i , ]i are regarded
as a matching pair of brackets. Define a binary relation
on Γn∗ by
= { (u [i ]i v, uv) | u, v ∈ Γn∗ , 1 ≤ i ≤ n }.
The Dyck language Dn is defined by
Dn = { v ∈ Γn∗ | v
∗
ε },
where ∗ denotes the reflexive transitive closure of the relation . An alternative way of defining Dn is by the following context-free grammar:
S → ε | AS,
A → [1 S ]1 | · · · | [n S ]n .
The set Dn0 of Dyck primes is defined by
Dn0 = (Dn − {ε}) − (Dn − {ε})2 .
Alternatively, the set Dn0 is defined by the nonterminal A in the above contextfree grammar.
Unranked trees and hedges can be represented by elements of Dyck languages.
If Σ is a set of symbols, let
[
ΓΣ =
{[c , ]c }.
c∈Σ
0
DΣ
We write DΣ and
for the Dyck language and the set of Dyck primes over this
alphabet. Using the standard term notation for labeled unranked trees, define
the string encoding function enc from labeled unranked trees and hedges over
Σ to strings over ΓΣ by
enc(c) = [c ]c ,
enc(c(t1 . . . tn )) = [c enc(t1 . . . tn ) ]c ,
enc(t1 . . . tn ) = enc(t1 ) enc(t2 . . . tn )
for n ≥ 2.
It is clear that the function enc maps any unranked tree over Σ to an element of
0
DΣ
, and any hedge over Σ to an element of DΣ − {ε}. Conversely, it is easy to
0
see that any element of DΣ
encodes a tree over Σ, and any element of DΣ − {ε}
encodes a hedge over Σ. These correspondences are bijections.
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
2.3
5
Context-Free Tree Grammars
We deviate from the standard practice and let a context-free tree grammar
generate a set of unranked trees. Thus, the terminal alphabet of a context-free
tree grammar will be unranked. In contrast, the set of nonterminals will be a
ranked alphabet, as in the standard definition.
S
A ranked alphabet is a union Υ = r∈N Υ (r) of disjoint sets of symbols. If
f ∈ Υ (r) , r is the rank of f . If Σ is an (unranked) alphabet and Υ a ranked
alphabet, let TΣ,Υ be the set of trees T ∈ TΣ∪Υ such that whenever a node of
T is labeled by some f ∈ Υ , then the number of its children is equal to the rank
of f .
For convenience, we use the term representation of trees. The set TΣ,Υ can
be defined inductively as follows:
1. If f ∈ Σ ∪ Υ (0) , then f ∈ TΣ,Υ ;
2. If f ∈ Σ ∪ Υ (n) and t1 , . . . , tn ∈ TΣ,Υ (n ≥ 1), then f (t1 . . . tn ) ∈ TΣ,Υ .
In order to define the notion of a context-free tree grammar, we need a
countably infinite supply of variables x1 , x2 , x3 , . . . . The set consisting of the
first n variables is denoted Xn (i.e., Xn = {x1 , . . . , xn }). The notation TΣ,Υ (Xn )
denotes the set TΣ,Υ ∪Xn , where members of Xn are all assumed to have rank
0. A tree in TΣ,Υ (Xn ) is often written t[x1 , . . . , xn ], displaying the variables. If
t[x1 , . . . , xn ] ∈ TΣ,Υ (Xn ) and t1 , . . . , tn ∈ TΣ,Υ , then t[t1 , . . . , tn ] denotes the
result of substituting t1 , . . . , tn for x1 , . . . , xn , respectively, in t[x1 , . . . , xn ]. An
element t[x1 , . . . , xn ] of TΣ,Υ (Xn ) is an n-context if for each i = 1, . . . , n, xi
occurs exactly once in t[x1 , . . . , xn ]. (In the literature, an n-context is sometimes
called a simple tree.)
A context-free tree grammar [26,8] is a quadruple G = (N, Σ, P, S), where
1.
2.
3.
4.
N is a finite ranked alphabet of nonterminals,
Σ is a finite unranked alphabet of terminals,
S is a nonterminal of rank 0, and
P is a finite set of productions of the form
B(x1 . . . xn ) → t[x1 , . . . , xn ],
where B ∈ N (n) and t[x1 , . . . , xn ] ∈ TΣ,N (Xn ).
The rank of G is max{ r | N (r) 6= ∅ }.
For every s, s0 ∈ TΣ,N , s ⇒G s0 is defined to hold if and only if there is a
1-context c[x1 ] ∈ TΣ,N (1), a production B(x1 . . . xn ) → t[x1 , . . . , xn ] in P , and
trees t1 , . . . , tn ∈ TΣ,N such that
s = c[B(t1 . . . tn )],
s0 = c[t[t1 , . . . , tn ]].
The relation ⇒∗G on TΣ,N is defined as the reflexive transitive closure of
⇒G . The tree langauge generated by a context-free tree grammar G, denoted by
L(G), is defined as follows:
L(G) = { t ∈ TΣ | S ⇒∗G t }.
6
Makoto Kanazawa
The string language generated by G is
y(L(G)) = { y(t) | t ∈ L(G) },
where y(t) is the yield of t in the usual sense.
A context-free tree grammar G = (N, Σ, P, S) is said to be simple if for every
production
B(x1 . . . xn ) → t[x1 , . . . , xn ]
in P , t[x1 , . . . , xn ] is an n-context. We let CFTsp (r) stand for the family of tree
languages L such that L = L(G) for some simple context-free tree grammar G
whose rank does not exceed r. We write yCFTsp (r) for the corresponding string
languages { y(L) | L ∈ CFTsp (r) }.
Let {y1 , . . . , yk } be a ranked alphabet, where for i = 1, . . . , k, ri is the rank
of yi . Let ti [x1 , . . . , xri ] be an ri -context. For a tree t ∈ TΣ,{y1 ,...,yk } (Xn ), we
define t[ti [x1 , . . . , xri ]/yi ] inductively as follows:
c(u1 . . . um )[ti [x1 , . . . , xri ]/yi ] = c(u1 [ti [x1 , . . . , xri ]/yi ] . . . um [ti [x1 , . . . , xri ]/yi ])
if c ∈ Σ,
xj [ti [x1 , . . . , xri ]/yi ] = xj ,
yi (u1 . . . uri )[ti [x1 , . . . , xri ]/yi ] = ti [u1 [ti [x1 , . . . , xri ]/yi ], . . . , uri [ti [x1 , . . . , xri ]/yi ]],
yj (u1 . . . urj )[ti [x1 , . . . , xri ]/yi ] = yj (u1 [ti [x1 , . . . , xri ]/yi ] . . . urj [ti [x1 , . . . , xri ]/yi ])
if j 6= i.
(Here, the notation c(u1 . . . um ) stands for c when m = 0, and likewise with
yj (u1 . . . urj ).)
Let G = (N, Σ, P, S) be a simple context-free tree grammar. The derivation
trees of G and their tree yield are defined inductively as follows:
– Let π = B(x1 . . . xn ) → t[x1 , . . . , xn ] be a production in P with no nonterminal occurring in t[x1 , . . . , xn ]. Then d = π is a derivation tree of sort B
and its tree yield is ty(d) = t[x1 , . . . , xn ].
– Let π = B(x1 . . . xn ) → t[x1 , . . . , xn ] be a production in P with at least one
nonterminal occurring in t[x1 , . . . , xn ]. Let v1 , . . . , vk be the pre-order listing
of the nodes of t[x1 , . . . , xn ] labeled by nonterminals. Let Bi be the label of
vi , for i = 1, . . . , k. If di is a derivation tree of sort Bi for i = 1, . . . , k, then
π(d1 . . . dk )
is a derivation tree of sort B. If t̄[x1 , . . . , xn ] is a tree just like t[x1 , . . . , xn ]
except that the label of vi is changed to yi , where y1 , . . . , yk are new symbols,
then the tree yield ty(d) of d = π(d . . . dk ) is defined by
ty(d) = t̄[x1 , . . . , xn ][ty(d1 )/y1 ] . . . [ty(dk )/yk ].
Note that if d is a derivation tree of sort B and n is the rank of B, then ty(d)
is an n-context, so the right-hand side of the above equation is well-defined if yi
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
7
is regarded as a symbol whose rank equals the rank of Bi . It is well known that
if G = (N, Σ, P, S) is a simple context-free grammar, then
L(G) = { ty(d) | d is a derivation tree of G of sort S }.
Example 1. Consider a simple context-free tree grammar G = (N, Σ, P, S),
where N = N (0) ∪ N (2) = {S} ∪ {B}, Σ = {a1 , a2 , a3 , a4 , a5 , a6 , e, g, h}, and
P consists of the following rules:
π1 : S → B(ee),
π2 : B(x1 x2 ) → h(a1 B(h(a2 x1 a3 )h(a4 x2 a5 ))a6 ),
π3 : B(x1 x2 ) → g(x1 x2 ).
The following trees are derivation trees of this grammar:
π1 (π2 (π3 ))
π1 (π2 (π2 (π3 )))
We have
ty(π3 ) = g(x1 x2 ),
ty(π2 (π3 )) = h(a1 g(h(a2 x1 a3 )h(a4 x2 a5 ))a6 ),
ty(π1 (π2 (π3 ))) = h(a1 g(h(a2 ea3 )h(a4 ea5 ))a6 ),
ty(π2 (π2 (π3 ))) = h(a1 h(a1 g(h(a2 h(a2 x1 a3 )a3 )h(a4 h(a4 x2 a5 )a5 ))a6 )a6 ),
ty(π1 (π2 (π2 (π3 )))) = h(a1 h(a1 g(h(a2 h(a2 ea3 )a3 )h(a4 h(a4 ea5 )a5 ))a6 )a6 ).
The string language generated by this grammar is
y(L(G)) = { an1 an2 ean3 an4 ean5 an6 | n ≥ 0 }.
The string languages of simple context-free grammars are the languages generated by non-duplicating macro grammars [9], studied by Seki and Kato [28].4
They also coincide with the languages generated by well-nested multiple contextfree grammars [13].
3
The Chomsky-Schützenberger Theorem
There are many different proofs of the Chomsky-Schützenberger theorem for
context-free languages offered in the literature. Here, we give a proof based on
the relation between context-free languages and local sets of unranked trees.5
4
5
At the level of string languages, simple context-free tree grammars correspond
to non-duplicating and argument-preserving (i.e., non-deleting) macro grammars,
which are equivalent to non-duplicating macro grammars (Lemma 7 of [28]). Seki
and Kato [28] called non-duplicating macro grammars variable-linear.
Among the proofs found in well-known textbooks, the one closest to our proof seems
to be the one given by Kozen [18].
8
Makoto Kanazawa
This will serve as a starting point for our generalization of the theorem to simple
context-free tree grammars.
Let T = (T, `) be a first-child-next-sibling encoding of an unranked tree. We
define three binary relations on T :6
≺T2 = { (u, v) ∈ T × T | u · 2 = v },
≺T1 = { (u, v) ∈ T × T | u · 1 = v },
CT = { (u, v) ∈ T × T | u ≺T2 ◦ (≺T1 )∗ v }.
The relation CT is the child relation on the nodes of T .
Let TΣ be the set of all unranked trees over Σ, encoded as binary trees over
Σ. If A, Z ⊆ Σ and I ⊆ Σ × Σ + are finite sets, define Loc(A, Z, I) to be the set
of all trees T = (T, `) in TΣ that satisfy the following conditions:
L1. `(ε) ∈ A.
L2. u ∈ T − dom(≺T2 ) implies `(u) ∈ Z.
L3. u ≺T2 v1 ≺T1 . . . ≺T1 vn 6∈ dom(≺T1 ) (n ≥ 1) implies (`(u), `(v1 ) . . . `(vn )) ∈ I.
A set L ⊆ TΣ is local [33,31] if there are finite sets A, Z ⊆ Σ and I ⊆ Σ × Σ +
such that L = Loc(A, Z, I).
We introduce a more restrictive notion of locality. If A, Z, Y ⊆ Σ and K, J ⊆
Σ × Σ, define SLoc(A, Z, K, Y, J) to be the set of all trees T = (T, `) in TΣ that
satisfy the following conditions:
SL1.
SL2.
SL3.
SL4.
SL5.
`(ε) ∈ A.
u ∈ T − dom(≺T2 ) implies `(u) ∈ Z.
u ≺T2 v implies (`(u), `(v)) ∈ K.
u 6= ε and u ∈ T − dom(≺T1 ) imply `(u) ∈ Y .
u ≺T1 v implies (`(u), `(v)) ∈ J.
We call L ⊆ TΣ super-local if there are finite sets A, Z, Y ⊆ Σ and K, J ⊆ Σ ×Σ
such that L = SLoc(A, Z, K, Y, J).7
A set of strings L ⊆ Σ + is local 8 if there are finite sets A, Z ⊆ Σ and I ∈ Σ 2
such that
L = AΣ ∗ ∩ Σ ∗ Z − (Σ ∗ (Σ 2 − I)Σ ∗ ).
In this paper, we allow the alphabet Σ to be infinite, but any local subset
of Σ + is included in Σ0+ for some finite subset Σ0 of Σ; likewise, any local or
super-local subset of TΣ is included in TΣ0 for some finite Σ0 ⊆ Σ.
We extend the string encoding function enc to a function from P(TΣ ) to
P(ΓΣ+ ) by enc(L) = { enc(T ) | T ∈ L }.
6
7
8
When R and S are binary relations on some set, we write R ◦ S for the composition
of R and S, and write R∗ for the reflexive transitive closure of R.
This notion of super-locality was called Fe-locality by Takahashi [31].
This notion of a local set is slightly different from McNaughton and Papert’s [20]
notion of a strictly 2-testable language. In the literature, a local set of strings is
sometimes called strictly 2-local (for example, [25]). Eilenberg [6], Takahashi [31],
and Perrin [22] use “local” in the present sense. Local sets of strings were called
“standard regular events” by Chomsky and Schützenberger [3].
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
9
Lemma 2. Let L ⊆ TΣ . If L is super-local, then there is a local set of strings
0
L0 ⊆ ΓΣ+ such that enc(L) = L0 ∩ DΣ
.
Proof. Suppose that A, Z, Y ⊆ Σ, K, J ⊆ Σ × Σ are finite sets such that L =
SLoc(A, Z, K, Y, J). Let
A0 = { [c | c ∈ A },
Z 0 = { ]c | c ∈ A },
I = { [c [d | (c, d) ∈ K } ∪ { [c ]c | c ∈ Z } ∪ { ]c [d | (c, d) ∈ J } ∪
{ ]c ]d | c ∈ Y, d ∈ Σ }.
Let L0 ⊆ ΓΣ+ be the local set of strings defined by
L0 = A0 ΓΣ∗ ∩ ΓΣ∗ Z 0 − (ΓΣ∗ (ΓΣ2 − I)∗ ΓΣ∗ ).
0
.
We show that enc(L) = L0 ∩ DΣ
0
0
To prove enc(L) ⊆ L ∩ DΣ , suppose T = (T, `) ∈ L. Since we know that
0
, it suffices to show enc(T ) ∈ L0 . If c is the label of the root of T ,
enc(T ) ∈ DΣ
then by the definition of L, c ∈ A, so the first and last symbols of enc(T ) must
be [c ∈ A0 and ]c ∈ Z 0 , respectively. So enc(T ) ∈ A0 ΓΣ∗ ∩ ΓΣ∗ Z 0 . Now suppose
ab ∈ ΓΣ2 is a substring of enc(T ). We need to show ab ∈ I. From the definition
of enc, it is clear that there are two nodes u, v of T labeled c and d, respectively,
such that one of the following conditions holds:
–
–
–
–
a = [c ,
a = [c ,
a = ]c ,
a = ]c ,
b = [d ,
b = ]d ,
b = [d ,
b = ]d ,
and u ≺T2 v. In this case, T ∈ L implies (c, d) ∈ K.
u = v 6∈ dom(≺T2 ). In this case, T ∈ L implies c = d ∈ Z.
and u ≺T1 v. In this case, T ∈ L implies (c, d) ∈ J.
u 6= ε, and u 6∈ dom(≺T1 ). In this case, T ∈ L implies c ∈ Y .
In each case, we have ab ∈ I. This shows that enc(T ) 6∈ ΓΣ∗ (ΓΣ2 − I)∗ ΓΣ∗ . We
have shown that enc(T ) ∈ L0 .
0
0
0
To prove L0 ∩ DΣ
⊆ enc(L), suppose s ∈ L0 ∩ DΣ
. Since s ∈ DΣ
, there is
some T ∈ TΣ such that enc(T ) = s. We show that T satisfies the conditions
SL1–SL5 for membership in L = SLoc(A, Z, K, Y, J).
SL1. Let c be the label of the root of T . Then s = enc(T ) starts with [c and
ends with ]c . Since s ∈ L0 , it must be that [c ∈ A0 and ]c ∈ Z 0 . It follows that
c ∈ A.
SL2. Let c be the label of any node u ∈ T − dom(≺T2 ). Then from the
definition of enc, the string [c ]c must be a substring of s. Since s ∈ L0 , it must
be that [c ]c ∈ I. It follows that c ∈ Z.
SL3. Let u, v be nodes of T labeled c and d, respectively, such that u ≺T2 v.
By the definition of enc, it is easy to see that [c [d is a substring of s and hence
in I. It follows that (c, d) ∈ K.
SL4. Let u be any non-root node of T labeled c such that u 6∈ dom(≺T1 ).
Let d be the label of the parent of u. By the definition of enc, ]c ]d must be a
substring of s and hence in I. It follows that c ∈ Y .
10
Makoto Kanazawa
SL5. Let u, v be nodes of T labeled c and d, respectively such that u ≺T1 v.
By the definition of enc, ]c [d must be a substring of s and hence in I. It follows
that (c, d) ∈ J.
We have shown that T ∈ L = SLoc(A, Z, K, Y, J).
t
u
Note that the converse of the above lemma does not necessarily hold,
because L0 can place a restriction on ]d that can follow ]c . For example,
L = {a(bc), a(bd), e(bc)} is not super-local, even though enc(L) is local.
A mapping π : Σ → Σ 0 is called a projection. A projection π is extended to a
function from TΣ to TΣ 0 and to a function from P(TΣ ) to P(TΣ 0 ) in obvious
ways.
Lemma 3. Let L ⊆ TΣ be a local set. There exist a finite alphabet Σ 0 , a superlocal set L0 ⊆ TΣ 0 , and a projection π : Σ 0 → Σ such that L = π(L0 ). Moreover,
π maps L0 bijectively to L.
Proof. Let T ∈ L. We change the label of each node v of T by a pair (c1 . . . cn , i),
where c1 . . . cn is the string of labels c1 , . . . , cn of the siblings of v, including v
itself, in the order from left to right, and i is the position of v among its siblings.
The relabeled trees obtained this way form a super-local set, and we can get
back the original trees by a projection.
Formally, let9
Σ 00 = { (w, i) | w ∈ Σ + , 1 ≤ i ≤ |w| },
and define a projection π : Σ 00 → Σ by
π((c1 . . . cn ), i) = ci .
Suppose that A, Z ⊆ Σ and I ∈ Σ×Σ + are finite sets such that L = Loc(A, Z, I).
Let
F = A ∪ { w ∈ Σ + | (c, w) ∈ I },
Σ 0 = { (w, i) ∈ Σ 00 | w ∈ F }.
Note that Σ 0 is a finite subset of Σ 00 . Let
A0 = { (c, 1) | c ∈ A },
Z 0 = { (c1 . . . cn , i) ∈ Σ 0 | ci ∈ Z },
K = { ((d1 . . . dl , i), (c1 . . . cn , 1)) | (d1 . . . dl , i) ∈ Σ 0 , (di , c1 . . . cn ) ∈ I }
Y = { (c1 . . . cn , i) ∈ Σ 0 | i = n },
J = { ((c1 . . . cn , i), (c1 . . . cn , i + 1)) | (c1 . . . cn , i) ∈ Σ 0 , i ≤ n − 1 }.
9
If w is a string, we write |w| for the length of w. We use | · | both for the length of
a string and for the cardinality of a set. The context should make it clear which is
intended.
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
11
These sets are all finite. Let L0 ⊆ TΣ 00 be the super-local set defined by L0 =
SLoc(A0 , Z 0 , K, Y, J). Clearly, L0 ⊆ TΣ 0 . We show that L0 and π (restricted to
Σ 0 ) satisfy the required properties.
For each T = (T, `T ) ∈ TΣ , define a tree T̂ = (T, `T̂ ) ∈ TΣ 00 by
`T̂ (ε) = (`T (ε), 1),
(1)
`T̂ (u · 2 · 1i−1 ) = (`T (u · 2)`T (u · 2 · 1) . . . `T (u · 2 · 1n−1 ), i)
if u · 2 · 1
n−1
∈T−
dom(≺T1 )
(2)
and 1 ≤ i ≤ n.
It is clear that π(T̂ ) = T for all T ∈ TΣ . Our goal is to show
L0 = { T̂ | T ∈ L }.
This clearly implies that π is a bijection from L0 to L.
We begin by showing that for all T ∈ TΣ ,
T ∈ L if and only if T̂ ∈ L0 .
(3)
This follows from five observations. Firstly, note the following:
– Suppose u ∈ T − dom(≺T1 ). Then `T̂ (u) is of the form (c1 . . . cn , n), which
means that `T̂ (u) ∈ Y if `T̂ (u) ∈ Σ 0 .
– Suppose u ≺T1 v. Then (`T̂ (u), `T̂ (v)) is of the form ((c1 . . . cn , i), (c1 . . . cn , i+
1)), which means that (`T̂ (u), `T̂ (v)) ∈ J if `T̂ (u) ∈ Σ 0 .
Thus, T̂ satisfies the last two conditions SL4 and SL5 for membership in
SLoc(A0 , Z 0 , K, Y, J) whenever T̂ ∈ TΣ 0 . Secondly, the following biconditional
always holds:
– `T (ε) ∈ A if and only if `T̂ (ε) ∈ A0 .
Thirdly, the following biconditional holds whenever `T (v) ∈ Σ 0 :
– `T (v) ∈ Z if and only if `T̂ (v) ∈ Z 0 .
Fourthly, if u · 2 · 1n−1 ∈ T − dom(≺T1 ) and `T̂ (u) ∈ Σ 0 , then the following
biconditional holds:
– (`T (u), `T (u · 2)`T (u · 2 · 1) . . . `T (u · 2 · 1n−1 )) ∈ I if and only if (`T̂ (u), `T̂ (u ·
2)) ∈ K.
Lastly, it is easy to see that T ∈ L implies T̂ ∈ TΣ 0 . Combining these five
observations, we get (3).
It follows from the “only if” direction of (3) that { T̂ | T ∈ L } ⊆ L0 . To
establish the converse inclusion, we show that
if T 0 ∈ L0 and T = π(T 0 ), then T 0 = T̂ .
12
Makoto Kanazawa
This together with the “if” direction of (3) clearly implies L0 ⊆ { T̂ | T ∈ L }.
0
So suppose T 0 = (T, `T ) ∈ L0 , and let T = (T, `T ) = π(T 0 ). All we need
to show is that the equations (1) and (2) hold with T 0 in place of T̂ . As for
0
(1), it follows from the fact that `T (ε) ∈ A0 . As for (2), suppose u · 2 · 1n−1 ∈
0
0
0
T − dom(≺T2 ). Since (`T (u), `T (u · 2)) ∈ K, we have `T (u · 2) = (c1 . . . cm , 1) for
0
0
some c1 . . . cm ∈ F . Since for all i ≤ n − 1 we must have (`T (u · 2 · 1i−1 ), `T (u ·
0
2 · 1i )) ∈ J, we get `T (u · 2 · 1i−1 ) = (c1 . . . cm , i) ∈ Σ 0 for i = 1, . . . , n. This
0
implies n ≤ m. But `T (u · 2 · 1n−1 ) = (c1 . . . cm , n) must be in Y , so m = n.
Since π((c1 . . . cn , i)) = ci = `T (u · 2 · 1i−1 ), it follows that (2) holds with T 0 in
place of T̂ .
t
u
We assume the standard definition of the yield function y : TΣ → Σ + . Using
the tem notation for unranked trees, we can define it as follows:
y(c) = c,
y(c(t1 . . . tn )) = y(t1 ) . . . y(tn ).
As is well known, a set of strings is a context-free language if and only if it is
the yield image of a local set of trees.
Let us call a tree T = (T, `) ∈ TΣ disjointly labeled with Σ0 , Σ1 if (i) Σ0
and Σ1 are disjoint subsets of Σ, (ii) u ∈ dom(≺T2 ) implies `(u) ∈ Σ1 , and (iii)
u ∈ T − dom(≺T2 ) implies `(u) ∈ Σ0 . Let Σ0 , Σ1 be disjoint sets and let
1
TΣ
Σ0 = { T ∈ TΣ0 ∪Σ1 | T is disjointly labeled with Σ0 , Σ1 }.
Σ1
+
1
On TΣ
Σ0 , the yield function y : TΣ0 → Σ0 can be expressed as the composition
y = hΣ0 ,Σ1 ◦ enc
of the string encoding function enc and an alphabetic homomorphism
hΣ0 ,Σ1 : (ΓΣ0 ∪Σ1 )∗ → Σ0∗ defined by
(
hΣ0 ,Σ1 ([c ) =
c if c ∈ Σ0 ,
ε if c ∈ Σ1 ,
hΣ0 ,Σ1 (]c ) = ε.
Lemma 4. Let L ⊆ Σ ∗ be a context-free language. There exist a set Υ disjoint
from Σ and a local set K ⊆ TΥΣ such that L = y(K) = hΣ,Υ (enc(K)).
Proof. Let G = (N, Σ, P, S) be a context-free grammar for L. Clearly, the parse
10
trees of G form a local subset K of TN
t
u
Σ , and L = y(K) = hΣ,N (enc(K)).
10
This also follows from the fact that a local set of trees is always obtained from a
local set of disjointly labeled trees by a projection that does not change the labels
of leaves.
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
13
Conversely, h(enc(K)) is always a context-free language whenever K is a local
set of trees and h is a homomorphism.
A projection π : Σ 0 → Σ induces a projection π
b : ΓΣ 0 → ΓΣ in an obvious
way:
π
b([c ) = [π(c) ,
π
b(]c ) = ]π(c) .
Lemma 5. Let π : Σ 0 → Σ be a projection and L ⊆ TΣ 0 . Then enc(π(L)) =
π
b(enc(L)).
We can use Lemmas 2 through 5 to show that every context-free language
L can be represented as L = h(R ∩ Dn0 ) with some alphabetic homomorphism
h and local set R. The Chomsky-Schützenberger theorem, however, is stated in
terms of the Dyck language Dn rather than the set Dn0 of Dyck primes. The
following lemma bridges the representation in terms of Dn0 and that in terms of
Dn .
Lemma 6. Let L ⊆ ΓΣ+ be a local set of strings. Then there exist a finite
alphabet Σ 0 , a projection π : Σ 0 → Σ, and a local set L0 ⊆ ΓΣ+0 such that
0
0
.
=π
b(L0 ∩ DΣ 0 ). Moreover, π̂ maps L0 ∩ DΣ 0 bijectively to L ∩ DΣ
L ∩ DΣ
Proof. Let A, Z ⊆ ΓΣ , I ⊆ ΓΣ2 be finite sets such that L = AΓΣ∗ ∩ ΓΣ∗ Z −
(ΓΣ∗ (ΓΣ2 − I)ΓΣ∗ ). We may assume without loss of generality that Σ is finite. Let
Σ 0 = Σ ∪ { c̄ | c ∈ Σ }. Let
A0 = { [c̄ | [c ∈ A },
Z 0 = { ]c̄ | ]c ∈ Z },
I 0 = I ∪ { [c̄ d | [c d ∈ I } ∪ { d ]c̄ | d ]c ∈ I } ∪ { [c̄ ]c̄ | [c ]c ∈ I },
and put
L0 = A0 ΓΣ∗ 0 ∩ ΓΣ∗ 0 Z 0 − (ΓΣ∗ 0 (ΓΣ2 0 − I 0 )ΓΣ∗ 0 ).
Define π : Σ 0 → Σ by
π(c) = c,
π(c̄) = c
0
0
0
0
0
for each c ∈ Σ. Then it is clear that π
b(DΣ
0 ) = DΣ and L ∩DΣ 0 = L ∩DΣ 0 . Also,
0
0 0
since π
b maps A , Z , I to A, Z, I, respectively, it is easy to see that π
b(L0 ) ⊆ L.
0
0
0
This establishes π
b(L ∩ DΣ 0 ) ⊆ L ∩ DΣ . Now suppose w ∈ L ∩ DΣ
. Then
0
0
w = [c v ]c for some c ∈ Σ and v ∈ DΣ . Let w = [c̄ v ]c̄ . Then w0 ∈ DΣ
0
0
and π
b(w ) = w. Since w ∈ L, [c ∈ A and ]c ∈ Z, so it follows that [c̄ ∈ A0
and ]c̄ ∈ Z 0 . It is also easy to see that if a1 a2 is a substring of w0 , a1 a2 ∈ I 0 .
0
0
Therefore, w0 ∈ L0 , and this shows that L ∩ DΣ
⊆π
b(L0 ∩ DΣ
b(L0 ∩ DΣ 0 ).
0) = π
0
0
It is also easy to see that w is the unique element of L that π
b maps to w. t
u
Lemma 7. If L ⊆ TΣ is a local set, then there exist a finite alphabet Σ 0 , a
projection π : Σ 0 → Σ, and a local set L0 ⊆ ΓΣ+0 such that enc(L) = π
b(L0 ∩DΣ 0 ).
−1
0
Moreover, enc ◦ π
b maps L ∩ DΣ 0 bijectively to L.
14
Makoto Kanazawa
Proof. By Lemma 3, there exist a projection π1 : Σ1 → Σ and a super-local
set L1 ⊆ TΣ1 such that L = π1 (L1 ) and π1 is a bijection from L1 to L. By
0
Lemma 2, there exists a local set L2 ⊆ ΓΣ∗ 1 such that enc(L1 ) = L2 ∩ DΣ
. By
1
0
0
∗
Lemma 6, there exist a projection π3 : Σ → Σ1 and a local set L ⊆ ΓΣ 0 such
0
0
that L2 ∩ DΣ
=π
c3 (L0 ∩ DΣ 0 ) and π
c3 is a bijection from L0 ∩ DΣ 0 to L2 ∩ DΣ
.
1
1
By Lemma 5, enc(L) = enc(π1 (L1 )) = π
c1 (enc(L1 )). Since enc is injective, π
c1
is a bijection from enc(L1 ) to enc(L). Taking these all together, we get
enc(L) = enc(π1 (L1 ))
=π
c1 (enc(L1 ))
0
=π
c1 (L2 ∩ DΣ
)
1
=π
c1 (c
π3 (L0 ∩ DΣ 0 ))
= (c
π1 ◦ π
c3 )(L0 ∩ DΣ 0 )
=π
b(L0 ∩ DΣ 0 ),
0
where π = π1 ◦ π3 . Since π
c3 is a bijection from L0 ∩ DΣ 0 to L2 ∩ DΣ
= enc(L1 )
1
and π
c1 is a bijection from enc(L1 ) to enc(L), π
b is a bijection from L0 ∩ DΣ 0 to
enc(L), and the second statement of the lemma follows.
t
u
Theorem 8 (Chomsky and Schützenberger). A language L ⊆ Σ ∗ is
context-free if and only if there exist a positive integer n, a local set R ⊆ Γn+ ,
and an alphabetic homomorphism h : Γn∗ → Σ ∗ such that L = h(R ∩ Dn ).
Proof. The “if” direction is by standard closure properties of the context-free
languages. For the “only if” direction, let L ⊆ Σ ∗ be a context-free language.
Then Lemma 4 gives an alphabet Υ disjoint from Σ and a local set K ⊆ TΣ,Υ
such that L = hΣ,Υ (enc(K)). By Lemma 7, there are a projection π : Υ 0 → Σ ∪Υ
and a local set R ⊆ ΓΥ+0 such that enc(K) = π
b(R ∩ DΥ 0 ). We have
L = hΣ,Υ (enc(K))
= hΣ,Υ (b
π (R ∩ DΥ 0 )),
so the required condition holds with n = |Υ 0 | and h = hΣ,Υ ◦ π̂.11
t
u
In the proof of Theorem 8, enc−1 ◦b
π is a bijection from R∩DΥ 0 to K. (See the
second statement in Lemma 7.) If K is the set of derivation trees of a contextfree grammar for L, then an element s of R ∩ DΥ 0 represents both the element
t = enc−1 (b
π (s)) of K and the element hΣ,Υ (b
π (s)) = hΣ,Υ (enc(t)) = y(t) of L.
Moreover, every pair (t, y(t)) with t ∈ K is so represented. This is an important
consequence of the theorem explicitly noted by Chomsky [4, page 377], though
rarely emphasized since.12
11
12
Here, |Υ 0 | denotes the cardinality of the set Υ 0 . See footnote 9.
Instead of a super-local set of trees, Chomsky [4] used the notion of a modified normal
grammar, a restricted kind of grammar in Chomsky normal form.
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
15
We took a rather long route to the Chomsky-Schützenberger theorem. Our
generalization of the theorem to multi-dimensional tree languages follows a similar path, except that an analogue of Lemma 4 is not needed, since the multidimensional counterpart of the function enc is not exactly a generalization of
the usual notion.
4
Multi-dimensional Trees
Multi-dimensional trees were introduced by Rogers [24,23]. In an ordinary (labeled, ordered unranked) tree, the set of children of a node forms a linearly ordered sequence of labeled nodes, i.e., a string. In an m-dimensional tree (m ≥ 1),
the set of children of a node (if non-empty) forms an (m − 1)-dimensional tree.
A 0-dimensional tree just consists of a single labeled node.
Unlike Rogers [24,23], who introduces the higher-dimensional tree as a new
kind of object, we prefer to define an m-dimensional tree as a special kind of
ordinary m-ary tree.13 The first-child-next-sibling encoding of unranked trees
will be a special case of this definition for m = 2.
We use finite strings of elements of [1, m] = {1, . . . , m} to represent nodes of
m-ary trees. We write u · v for the concatenation of finite strings u, v over [1, m],
and write ε for the empty string.
An m-ary tree domain is any non-empty, finite, prefix-closed subset of [1, m]∗ .
(Since ∅∗ = {ε}, the only 0-ary tree domain is {ε}.) If T is an m-ary tree domain,
we write u ≺Ti v to mean u, v ∈ T and u · i = v. If Σ is a (possibly infinite) set of
symbols, an m-ary tree over Σ is a pair (T, `), where T is an m-ary tree domain
and ` is a function from T to Σ.
If T = (T, `T ) is an m-ary tree and U ⊆ T is an m-ary tree domain, then
the restriction of T to U is the m-ary tree
T U = (U, `T U ).
When u ∈ T , let
T /u = { v | uv ∈ T }.
Then T /u is an m-ary tree domain and the subtree of T rooted at u is defined
by
T /u = (T /u, `),
where `(v) = `T (uv).
Recall that a first-child-next-sibling encoding of an unranked tree is a binary
tree (T, `) such that 1 6∈ T . Analogously, an m-dimensional tree is an m-ary tree
(T, `) such that T is an m-ary tree domain included in a certain special subset of
13
To be precise, our m-dimensional trees form a special class of m-ary cardinal trees
in the sense of Benoit et al. [2]. In m-ary cardinal trees, each node has m slots for
children, each of which may or may not be occupied, independently of the other
slots. Cardinal trees are also known as tries.
16
Makoto Kanazawa
[1, m]∗ . For each natural number m, define a subset Pm of [1, m]∗ by recursion,
as follows:
P0 = {ε},
∗
Pm = (m · Pm−1 )
for m > 1.
In other words, w ∈ Pm if and only if w = i · v implies i = m and w = u · i · j · v
implies j ≥ i − 1. For m ≥ 0, an m-dimensional tree (over Σ) is an m-ary tree
(over Σ) T = (T, `T ) such that T ⊆ Pm . For m ≥ 1, an m-dimensional hedge
(over Σ) is an m-ary tree (over Σ) T = (T, `T ) such that T ⊆ Pm−1 · Pm . We
m
write Tm
Σ and HΣ to denote the set of all m-dimensional trees over Σ and the set
of all m-dimensional hedges over Σ, respectively. For m ≥ 1, all m-dimensional
trees are m-dimensional hedges. Note that a 1-dimensional hedge is just a 1dimensional tree.
Note that a 0-dimensional tree is a structure Tc = ({ε}, {(ε, c)}) consisting
of a single node labeled by some c ∈ Σ. We may identify Tc with c; under this
n
0
convention, T0Σ = Σ. Note that if m 6= n, then Tm
Σ ∩ TΣ = TΣ .
A 1-dimensional tree is a non-empty, linearly ordered sequence of labeled
nodes. We may use a string c1 . . . cn ∈ Σ + to denote the 1-dimensional tree
Tc1 ...cn = ({ε, 1, . . . , 1n−1 }, `), where `(1i−1 ) = ci for i = 1, . . . , n. Under this
convention, T1Σ = Σ + .
The first-child-next-sibling encodings of unranked trees and unranked hedges
coincide with the 2-dimensional trees and the 2-dimensional hedges, respectively;
we have T2Σ = TΣ .
Henceforth, we use T , T 0 , U , etc., as variables ranging over m-dimensional
trees and m-dimensional hedges. Unless we indicate otherwise, we assume T =
0
(T, `T ), T 0 = (T 0 , `T ), U = (U, `U ), etc.
Let T be an m-dimensional hedge. We can see that if u ∈ T , then
ST i (T , u) = (T /u) { v ∈ Pi | uv ∈ T }
is always an i-dimensional tree, and
SH i (T , u) = (T /u) { v ∈ Pi−1 · Pi | uv ∈ T }
is always an i-dimensional hedge. In particular, when u · m ∈ T , the subtree
T /(u · m) = SH m (T , u · m) is always an m-dimensional hedge.
For i ≥ 1, we write u CTi v to mean
v ∈ T ∩ u · i · Pi−1 .
When u CTi v, we say that v is a child of u in the i-th dimension (in T ). If
u ≺Ti v, then v is the first child of u in the i-th dimension. Define
CiT (u) = { v ∈ Pi−1 | u · i · v ∈ T } = { v | u CTi u · i · v }.
If u · i 6∈ T , that is, if u 6∈ dom(≺Ti ), then CiT (u) = ∅. If u · i ∈ T , define
CiT (u) = ST i−1 (T , u · i) = T /(u · i) CiT (u).
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
17
Then CiT (u) is always an (i − 1)-dimensional tree.
We assume that elements of [1, m]∗ are alphabetically ordered, with k + 1
alphabetically preceding k. We write u CTi,j v to mean v is the j-th node, under
this ordering, among the children of u in the i-th dimension. The degree of a
node v ∈ T is the number of children of v in the m-th (i.e., highest) dimension.
A subset of Tm
Σ is an m-dimensional tree language. We allow Σ to be an
infinite set, but are usually interested in m-dimensional tree languages over some
finite subset of Σ.
We call a set L ⊆ Tm
Σ degree-bounded if there exists a k such that for all
T ∈ L and for all v ∈ T , the degree of v does not exceed k.
It is sometimes helpful to use term-like notations for m-dimensional hedges
and trees. Let P be an (m − 1)-ary tree domain included in Pm−1 (i.e., a finite,
non-empty, prefix-closed subset of Pm−1 ), and let u1 , . . . , uk be the elements of
P , in alphabetical order (which implies u1 = ε). If T1 , . . . , Tk ∈ Tm
Σ , then we
write
P (T1 , . . . , Tk )
to denote an m-dimensional hedge U = (U, `U ) ∈ Hm
Σ such that
U=
k
[
ui · Ti ,
ST m (U , ui ) = Ti .
i=1
m
(As a degenerate case, we have {ε}(T ) = T for any T ∈ Tm
Σ .) If T ∈ HΣ and
c ∈ Σ, then we write
c −m T
to denote an m-dimensional tree V = (V, `V ) ∈ Tm
Σ such that
V = {ε} ∪ m · T,
`V (ε) = c,
SH m (V , m) = T .
Combining the two notations,
c −m P (T1 , . . . , Tk )
T
denotes the m-dimensional tree T = (T, `T ) such that `T (ε) = c, Cm
(ε) = P ,
and ST m (T , ui ) = Ti , where u1 , . . . , uk is the alphabetical listing of the elements
of P .14
Example 9. Derivation trees of a simple context-free tree grammar G =
(N, Σ, P, S) can be represented as 3-dimensional trees over the alphabet N ∪ Σ.
In these 3-dimensional trees, a node has children in the third dimension if and
only if it is labeled by a nonterminal. For instance, the derivation tree π1 (π2 (π3 ))
of the grammar from Example 1 may be represented by the 3-dimensional tree
T in Fig. 1. In this tree, the node labeled by S is the root; the edges in the
third dimension are colored red, those in the second dimension blue, and those
in the first dimension green. For instance, the node 3 (i.e., the child of the root
14
An equivalent notation for m-dimensional trees has been used by Kasprzik [14].
18
Makoto Kanazawa
S
B
e
e
h
B
a1
a5
h
a6
x
a4
a3
h
x
a2
g
x
x
Fig. 1. A derivation tree of a simple context-free tree grammar represented as a 3dimensional tree.
in the third dimension) is labeled by the nonterminal B, and its children in the
third dimension form a 2-dimensional tree corresponding to the right-hand side
of the rule π2 = B(x1 x2 ) → h(a1 B(h(a2 x1 a3 )h(a4 x2 a5 ))a6 ). The numbering of
variables in the rules are eschewed in favor of a single variable x; the alphabetic
ordering of the nodes labeled by x among the children in the third dimension of
a nonterminal-labeled node is assumed to correspond to the numbering.15 This
tree can be represented in the term notation as follows (omitting the dots in the
strings over {1, 2} representing nodes):
S −3 {ε, 2, 21}(
B −3 {ε, 2, 21, 212, 2122, 21221, 212211, 2121, 21212, 212121, 2121211, 211}(
h, a1 , B −3 {ε, 2, 21}(g, x, x), h, a2 , x, a3 , h, a4 , x, a5 , a6
),
e,
e
).
We have
C3T (3 · 3 · 2 · 1) = g −2 {ε, 1}(x, x)
= g(xx),
C2T (3
· 3 · 2 · 1) = h −1 h
= hh,
15
It is known that simple context-free tree grammars satisfying this condition constitute a normal form. This use of a single variable instead of numbered variables is
not crucial for our purposes.
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
19
using both the notation introduced just above and the standard term and string
representations for 2-dimensional and 1-dimensional trees.
5
Local and Super-local Sets of Multi-dimensional Trees
If A, Z ⊆ Σ and I ⊆ Σ × Tm−1
are finite sets, we let Locm (A, Z, I) denote
Σ
the set of all m-dimensional trees T = (T, `T ) in Tm
Σ that satisfy the following
conditions:
L1. `T (ε) ∈ A,
L2. v ∈ T − dom(≺Tm ) implies `T (v) ∈ Z, and
T
L3. v ∈ dom(≺Tm ) implies (`T (v), Cm
(v)) ∈ I.
m−1
A set L ⊆ Tm
Σ is local [24,23] if there exist finite sets A, Z ⊆ Σ and I ⊆ Σ ×TΣ
m
m
such that L = Loc (A, Z, I). Note that if L ⊆ TΣ is local, then L must be
degree-bounded; for, if L = Locm (A, Z, I), the degree of any node v of T ∈ L is
bounded by the maximal size of U such that (c, U ) ∈ I for some c. Clearly, the
notion of locality coincides with the usual notion [22,33,31] when m ∈ {1, 2}.
Let m ≥ 2. Write N+ for N−{0} (the set of positive integers). If A, Z, Y ⊆ Σ,
K ⊆ Σ × Σ, and J ⊆ Σ × { P ⊆ Pm−2 | P is an (m − 2)-ary tree domain } ×
N+ × Σ are finite sets, then we let SLocm (A, Z, K, Y, J) denote the set of all
trees T = (T, `T ) in Tm
Σ that satisfy the following conditions:
SL1.
SL2.
SL3.
SL4.
SL5.
`T (ε) ∈ A,
v ∈ T − dom(≺Tm ) implies `T (v) ∈ Z,
u ≺Tm v implies (`T (u), `T (v)) ∈ K,
v 6= ε and v ∈ T − dom(≺Tm−1 ) imply `T (v) ∈ Y , and
T
u ∈ dom(≺Tm−1 ) and u CTm−1,i v imply (`T (u), Cm−1
(u), i, `T (v)) ∈ J.
We call a set L ⊆ Tm
Σ super-local if there exist finite sets A, Z, Y ⊆ Σ, K ⊆ Σ×Σ,
and J ⊆ Σ × { P ⊆ Pm−2 | P is an (m − 2)-ary tree domain } × N+ × Σ such
that L = SLocm (A, Z, K, Y, J). For m = 2, Pm−2 = P0 = {ε}, and u CT1,i v only
if i = 1, so this definition generalizes our earlier definition of super-locality for
subsets of TΣ = T2Σ . It is easy to see that a degree-bounded super-local language
must be local. Although we allow Σ to be infinite, any local or super-local set
L ⊆ Tm
Σ must be an m-dimensional tree language over some finite subset of Σ.
Projections from Σ to Σ 0 are naturally extended to m-dimensional trees and
hedges over Σ and to m-dimensional tree languages over Σ. The next lemma
generalizes Lemma 3 to the higher-dimensional case. The proofs of two lemmas
to follow (Lemmas 11 and 35) will be adaptations of the proof of this lemma.
Lemma 10. Let m ≥ 2. For any local m-dimensional tree language L ⊆ Tm
Σ,
there exist a finite alphabet Σ 0 , a degree-bounded, super-local m-dimensional tree
0
0
language L0 ⊆ Tm
Σ 0 , and a projection π : Σ → Σ such that L = π(L ). Moreover,
0
π maps L bijectively to L.
20
Makoto Kanazawa
Proof. The proof parallels that of Lemma 3. The idea is to change the label of
each non-root node v of T ∈ L to
T
(Cm
(u), v 0 ),
where u · m · v 0 = v and v 0 ∈ Pm−1 . For uniformity, we change the label of the
root from c ∈ Σ to (Tc , ε), where Tc = ({ε}, {(ε, c)}) is the single-node tree that
we identified with c. The relabeled m-dimensional trees obtained this way form
a super-local set, and we can get back the original m-dimensional trees by a
projection.
Let
Σ 00 = { (T , v) | T = (T, `T ) ∈ Tm−1
, v ∈ T },
Σ
and define a projection π : Σ 00 → Σ by
π((T , v)) = `T (v).
Suppose that A, Z ⊆ Σ and I ⊆ Σ × Tm−1
are finite sets such that L =
Σ
Locm (A, Z, I). Let
F = { Tc | c ∈ A } ∪ { T | (c, T ) ∈ I },
Σ 0 = { (T , v) ∈ Σ 00 | T ∈ F }.
Note that Σ 0 is a finite subset of Σ 00 . Now define
A0 = { (Tc , ε) | c ∈ A },
Z 0 = { (T , v) ∈ Σ 0 | `T (v) ∈ Z },
K = { ((T , v), (T 0 , ε)) | (T , v) ∈ Σ 0 , (`T (v), T 0 ) ∈ I },
Y = { (T , v) ∈ Σ 0 | v 6∈ dom(≺Tm−1 ) },
T
J = { ((T , u), Cm−1
(u), i, (T , v)) | (T , u) ∈ Σ 0 , u CTm−1,i v }.
0
These are all finite sets. Let L0 ⊆ Tm
Σ 00 be the super-local set defined by L =
m
0
0
0
0
m
SLoc (A , Z , K, Y, J). It is quite clear that L ⊆ TΣ 0 . We show that L and π
(restricted to Σ 0 ) satisfy the required properties.
T̂
m
For each T ∈ Tm
Σ , define an m-dimensional tree T̂ = (T, ` ) ∈ TΣ 00 by
`T̂ (ε) = (T`T (ε) , ε),
T
`T̂ (u · m · v) = (Cm
(u), v),
(4)
T
if u ∈ dom(≺Tm ) and v ∈ Cm−1
(u).
(5)
It is clear that π(T̂ ) = T for all T ∈ Tm
Σ . Our goal is to show
L0 = { T̂ | T ∈ L }.
This clearly implies that π is a bijection from L0 to L.
We show that for all T ∈ Tm
Σ,
T ∈ L if and only if T̂ ∈ L0 .
This follows from five observations. Firstly, note the following:
(6)
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
21
– Suppose u ∈ T − dom(≺Tm−1 ). If `T̂ (u) = (U , v), then v 6∈ dom(≺U
m−1 ). This
means that `T̂ (u) ∈ Y if `T̂ (u) ∈ Σ 0 .
– Suppose s CTm u = s · m · u0 CTm−1,i v = u · (m − 1) · v 0 . Then we have
C T (s)
m
u0 Cm−1,i
u0 · (m − 1) · v 0 and
T
`T̂ (u) = (Cm
(s), u0 ),
T
`T̂ (v) = (Cm
(s), u0 · (m − 1) · v 0 ),
C T (s)
T
m
Cm−1
(u) = Cm−1
(u0 ).
T
This means that (`T̂ (u), Cm−1
(u), i, `T̂ (v)) ∈ J if `T̂ (u) ∈ Σ 0 .
Thus, T̂ satisfies the last two conditions SL4 and SL5 for membership in
SLocm (A0 , Z 0 , K, Y, J) whenever T̂ ∈ Tm
Σ 0 . Secondly, the following biconditional
always holds:
– `T (ε) ∈ A if and only if `T̂ (ε) ∈ A0 .
Thirdly, the following biconditional holds whenever `T (v) ∈ Σ 0 :
– `T (v) ∈ Z if and only if `T̂ (v) ∈ Z 0 .
Fourthly, if u ≺Tm v and `T̂ (u) ∈ Σ 0 , then the following biconditional holds:
T
(u)) ∈ I if and only if (`T̂ (u), `T̂ (v)) ∈ K.
– (`T (u), Cm
Lastly, it is easy to see that T ∈ L implies T̂ ∈ Tm
Σ 0 . Combining these five
observations, we get (6).
It follows from the “only if” direction of (6) that { T̂ | T ∈ L } ⊆ L0 . To
establish the converse inclusion, we show that
if T 0 ∈ L0 and T = π(T 0 ), then T 0 = T̂ .
This together with the “if” direction of (6) clearly implies L0 ⊆ { T̂ | T ∈ L }.
0
So suppose T 0 = (T, `T ) ∈ L0 , and let T = (T, `T ) = π(T 0 ). All we need to
show is that the equations (4) and (5) hold with T 0 in place of T̂ .
0
As for (4), it follows from the fact that `T (ε) ∈ A0 .
0
0
T
T0
As for (5), suppose u ∈ dom(≺m ). Since (` (u), `T (u·m)) ∈ K, `T (u·m) =
(V , ε) for some V ∈ F . We show two things:
0
T
v ∈ Cm
(u) implies v ∈ V and `T (u · m · v) = (V , v).
V ⊆
T
Cm
(u).
(7)
(8)
T
T
It then easily follows that V = Cm
(u) and (5) holds whenever v ∈ Cm
(u).
T
We show (7) by induction on v ∈ Cm (u). For v = ε, we already know that
0
ε ∈ V and `T (u · m) = (V , ε). If v 6= ε, we can write v = v 0 · (m − 1) · v 00
T
with v 00 ∈ Pm−2 . Suppose u · m · v 0 CTm−1,i u · m · v. Since v 0 ∈ Cm
(u), by
22
Makoto Kanazawa
0
induction hypothesis, v 0 ∈ V and `T (u · m · v 0 ) = (V , v 0 ). Since T 0 ∈ L0 ,
0
0
T
(`T (u · m · v 0 ), Cm−1
(u · m · v 0 ), i, `T (u · m · v)) ∈ J. By the definition of J, we
T
V
must have Cm−1
(u · m · v 0 ) = Cm−1
(v 0 ), which implies v 0 CVm−1,i v ∈ V . The
T0
definition of J then implies ` (u · m · v) = (V , v).
Having established (7), we proceed to show (8) by induction on v ∈ V . For
T
v = ε, we have ε ∈ Cm
(u) since u ∈ dom(≺Tm ). If v = v 0 · (m − 1) · v 00 with
00
0
T
v ∈ Pm−2 , then v ∈ Cm
(u) by induction hypothesis. Since v 0 ∈ dom(≺Vm−1 ),
0
`T (u·m·v 0 ) = (V , v 0 ) 6∈ Y . Since T 0 ∈ L0 , this means that u·m·v 0 ∈ dom(≺Tm−1 )
0
0
T
and so (`T (u · m · v 0 ), Cm−1
(u · m · v 0 ), 1, `T (u · m · v 0 · (m − 1))) ∈ J. Since
0
T
V
`T (u · m · v 0 ) = (V , v 0 ), the definition of J implies Cm−1
(u · m · v 0 ) = Cm−1
(v 0 ).
T
Since v = v 0 · (m − 1) · v 00 ∈ V , it follows that v 00 ∈ Cm−1
(u · m · v 0 ) and hence
T
v = v 0 · (m − 1) · v 00 ∈ Cm
(u).
This concludes the proof of the lemma.
t
u
6
Encoding and Yield at Higher Dimensions
In order to prove an analogue of the Chomsky-Schützenberger theorem for the
m-dimensional yields of local (m + 1)-dimensional tree languages, we need to
define the higher-dimensional counterparts of the mappings enc, y, and of the
Dyck languages. Since we use 3-dimensional trees to represent derivation trees
of simple context-free tree grammars, the yield function mapping 3-dimensional
trees to 2-dimensional trees must be consistent with the relation between derivation trees and their tree yield of simple context-free tree grammars.
We set aside a special symbol x and use it to extend a given set Σ of symbols. The intended role of x in m-dimensional trees over Σ ∪ {x} is that of a
placeholder; the encoding function erases all occurrences of x. We write Tm
Σ (n) to
denote the set of m-dimensional trees in Tm
in
which
x
labels
exactly
n
nodes
Σ∪{x}
and none of these nodes have a child in the m-th dimension. Let T ∈ Tm
Σ (n),
T1 , . . . , Tn ∈ Tm
,
and
let
u
,
.
.
.
,
u
be
the
nodes
of
T
labeled
by
x,
in
the
1
n
Σ∪{x}
alphabetical order. Then we write
T [T1 , . . . , Tn ]
for the tree T 0 ∈ Tm
Σ∪{x} such that
T 0 = T ∪ u1 · T1 ∪ · · · ∪ un · Tn ,
(
0
`T (v) if v ∈ T − {u1 , . . . , un },
`T (v) = Ti 0
` (v ) if v = ui · v 0 .
T
Given an m-dimensional hedge T ∈ Hm
Σ∪{x} , define a binary relation Jm,i on
T for each positive integer i, as follows: u JT
m,i v if and only if v is alphabetically
T
the i-th node in { w | u Cm w, ` (w) = x }.
Let m ≥ 2. An m-dimensional hedge T ∈ Hm
Σ∪{x} is well-labeled if
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
23
– for all v ∈ T , `T (v) = x implies v 6∈ dom(≺Tm ) ∪ dom(≺Tm−1 ), and
m−1
T
– for all v ∈ dom(≺Tm ), Cm
(v) ∈ TΣ
(n) implies |Cm−1 (v)| = n.
We write Hm
Σ,x to denote the class of well-labeled m-dimensional hedges over
T
Σ ∪ {x}. If T ∈ Hm
Σ,x , then for each node v ∈ dom(≺m ), there is a bijection
T
T
between { u ∈ T | v Cm u and ` (u) = x } and { u ∈ T | v CTm−1 u }, namely,
S
T
−1
◦ CTm−1,i ). We write Hm
Σ,x (n) for
i≥1 ((Jm,i )
T
{ T ∈ Hm
Σ,x | there are exactly n nodes v ∈ T ∩ Pm−1 such that ` (v) = x }.
Note that if T ∈ Hm
Σ,x (n), T may have more than n nodes labeled by x.
m
m
We write Tm
Σ,x for HΣ,x (0) ∩ TΣ∪x . We will give suitable definitions of encoding and yield for elements of Tm
Σ,x shortly. Before that, here is a variant
of Lemma 10 for languages consisting of well-labeled m-dimensional trees. If
π : Σ 0 → Σ is a projection, we extend it to a projection π : Σ 0 ∪ {x} → Σ ∪ {x}
by letting π(x) = x.
Lemma 11. Let m ≥ 2. For any local m-dimensional tree language L ⊆ Tm
Σ,x ,
there exist a finite alphabet Σ 0 , a degree-bounded, super-local m-dimensional
0
0
tree language L0 ⊆ Tm
Σ 0 ,x , and a projection π : Σ → Σ such that L = π(L ).
0
0
Moreover, π maps L bijectively L to L.
Proof. Since the case where L ⊆ Tm
Σ is covered by Lemma 10, we assume L 6⊆
.
Without
loss
of
generality,
we
may
assume that L = Locm (A, Z, I), for some
Tm
Σ
m−1
A ⊆ Σ, Z ⊆ Σ ∪ {x}, I ⊆ Σ × TΣ∪{x} such that
– x ∈ Z,
– (c, T ) ∈ I implies c ∈ Σ and `T (v) ∈ Σ for all v ∈ dom(≺Tm−1 ), and
– there exist (c, T ) ∈ I and v ∈ T such that `T (v) = x.
We modify the construction in the proof of Lemma 10 slightly. The difference
is that where (T , v) would appear in the earlier construction, x appears instead
just in case `T (v) = x. Otherwise, the proof is essentially the same.
The definition of Σ 00 is changed as follows:
T
Σ 00 = { (T , v) | T ∈ Tm−1
Σ∪x , v ∈ T, ` (v) ∈ Σ }.
The definition of π : Σ 00 → Σ remains the same:
π((T , v)) = `T (v).
As before, let
F = { Tc | c ∈ A } ∪ { T | (c, T ) ∈ I },
Σ 0 = { (T , v) ∈ Σ 00 | T ∈ F }.
The definitions of Z 0 , K, Y, J are modified as follows:
A0 = { (Tc , ε) | c ∈ A },
Z 0 = {x} ∪ { (T , v) ∈ Σ 0 | `T (v) ∈ Z − {x} },
24
Makoto Kanazawa
0
K = { ((T , v), (T 0 , ε)) | (T , v) ∈ Σ 0 , (`T (v), T 0 ) ∈ I, `T (ε) ∈ Σ } ∪
{ ((T , v), x) | (T , v) ∈ Σ 0 , (`T (v), Tx ) ∈ I },
Y = {x} ∪ { (T , v) ∈ Σ 0 | v 6∈ dom(≺Tm−1 ) },
T
(u), i, (T , v)) | (T , u) ∈ Σ 0 , u CTm−1,i v, `T (v) ∈ Σ } ∪
J = { ((T , u), Cm−1
T
{ ((T , u), Cm−1
(u), i, x) | (T , u) ∈ Σ 0 , u CTm−1,i v, `T (v) = x }.
These are finite sets. As before, let L0 ⊆ Tm
Σ 00 ∪{x} be the super-local set defined
0
0
0
by L = SLoc(A , Z , K, Y, J). It is easy to see that L0 ⊆ Tm
Σ 0 ∪{x} .
T̂
m
For each T ∈ Tm
Σ∪{x} , define an m-dimensional tree T̂ = (T, ` ) ∈ TΣ 00 ∪{x}
by
`T̂ (ε) = (T`T (ε) , ε),
(
T
(Cm
(u), v) if `T (u · m · v) ∈ Σ,
`T̂ (u · m · v) =
x
if `T (u · m · v) = x,
(9)
T
for v ∈ Cm−1
(u). (10)
It is clear that π(T̂ ) = T for all T ∈ Tm
Σ∪{x} . Our goal is to show
L0 = { T̂ | T ∈ L }.
This clearly implies that π is a bijection from L0 to L.
We show that for all T ∈ Tm
Σ∪{x} ,
T ∈ L if and only if T̂ ∈ L0 .
(11)
This follows from five observations. Firstly, note the following:
– Suppose u ∈ T − dom(≺Tm−1 ). If `T̂ (u) = (U , v), then v 6∈ dom(≺U
m−1 ). This
T̂
T̂
0
means that ` (u) ∈ Y if ` (u) ∈ Σ ∪ {x}.
– Suppose s CTm u = s · m · u0 CTm−1,i v = u · (m − 1) · v 0 . Then we have
C T (s)
m
u0 Cm−1,i
u0 · (m − 1) · v 0 and
(
T
(Cm
(s), u0 ) if `T (u) ∈ Σ,
` (u) =
x
if `T (u) = x,
(
T
(Cm
(s), u0 · (m − 1) · v 0 ) if `T (v) ∈ Σ,
`T̂ (v) =
x
if `T (v) = x,
T̂
C T (s)
T
m
Cm−1
(u) = Cm−1
(u0 ).
T
This means that (`T̂ (u), Cm−1
(u), i, `T̂ (v)) ∈ J if `T̂ (u) ∈ Σ 0 .
Thus, T̂ satisfies the last two conditions SL4 and SL5 for membership in
SLocm (A0 , Z 0 , K, Y, J) whenever T̂ ∈ Tm
Σ 0 ,x . Secondly, the following biconditional always holds:
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
25
– `T (ε) ∈ A if and only if `T̂ (ε) ∈ A0 .
Thirdly, the following biconditional holds whenever `T (v) ∈ Σ 0 ∪ {x}:
– `T (v) ∈ Z if and only if `T̂ (v) ∈ Z 0 .
Fourthly, if u ≺Tm v, `T̂ (u) ∈ Σ 0 , and either v 6∈ dom(≺Tm−1 ) or `T̂ (v) ∈ Σ 00 ,
then the following biconditional holds:
T
– (`T (u), Cm
(u)) ∈ I if and only if (`T̂ (u), `T̂ (v)) ∈ K.
Lastly, it is easy to see that T ∈ L implies T̂ ∈ Tm
Σ 0 ,x . Combining these five
observations, we get (11).
It follows from the “only if” direction of (11) that { T̂ | T ∈ L } ⊆ L0 . To
establish the converse inclusion, we show that
if T 0 ∈ L0 and T = π(T 0 ), then T 0 = T̂ .
This together with the “if” direction of (11) clearly implies L0 ⊆ { T̂ | T ∈ L }.
0
So suppose T 0 = (T, `T ) ∈ L0 , and let T = (T, `T ) = π(T 0 ). All we need to
show is that the equations (9) and (10) hold with T 0 in place of T̂ .
0
As for (9), it follows from the fact that `T (ε) ∈ A0 .
0
0
0
T
As for (10), suppose u ∈ dom(≺m ). Since (`T (u), `T (u·m)) ∈ K, `T (u·m) =
(V , ε) for some V ∈ F . We show two things:
(
(V , v) if `V (v) ∈ Σ,
T
T0
v ∈ Cm (u) implies v ∈ V and ` (u · m · v) =
(12)
x
if `V (v) = x.
T
V ⊆ Cm
(u).
(13)
T
T
(u).
(u) and (10) holds whenever v ∈ Cm
It then easily follows that V = Cm
T
We show (12) by induction on v ∈ Cm (u). For v = ε, we already know that
0
ε ∈ V and `T (u · m) = (V , ε). If v 6= ε, we can write v = v 0 · (m − 1) · v 00 with
T
(u), by induction
v 00 ∈ Pm−2 . Suppose u · m · v 0 CTm−1,i u · m · v. Since v 0 ∈ Cm
0
T0
0
0
hypothesis, v ∈ V and ` (u · m · v ) is either (V , v ) or x depending on whether
0
0
T
`V (v 0 ) ∈ Σ or not. Since T 0 ∈ L0 , (`T (u·m·v 0 ), Cm−1
(u·m·v 0 ), i, `T (u·m·v)) ∈ J.
0
T
By the definition of J, we must have `T (u·m·v 0 ) = (V , v 0 ) and Cm−1
(u·m·v 0 ) =
V
0
0
V
Cm−1 (v ), which implies v Cm−1,i v ∈ V . The definition of J then implies
0
`T (u · m · v) is either (V , v) or x depending on whether `V (v) ∈ Σ or not.
Having established (12), we proceed to show (13) by induction on v ∈ V .
T
For v = ε, we have ε ∈ Cm
(u) since u ∈ dom(≺Tm ). If v = v 0 · (m − 1) · v 00
00
0
T
with v ∈ Pm−2 , then v ∈ Cm
(u) by induction hypothesis. Since V ∈ F and
0
0
V
v ∈ dom(≺m−1 ), by the assumption about I, `V (v 0 ) ∈ Σ and so `T (u · m ·
v 0 ) = (V , v 0 ) 6∈ Y . Since T 0 ∈ L0 , this means that u · m · v 0 ∈ dom(≺Tm−1 )
0
0
T
(u · m · v 0 ), 1, `T (u · m · v 0 · (m − 1))) ∈ J. Since
and so (`T (u · m · v 0 ), Cm−1
0
T
V
(u · m · v 0 ) = Cm−1
(v 0 ).
`T (u · m · v 0 ) = (V , v 0 ), the definition of J implies Cm−1
0
00
00
T
0
Since v = v · (m − 1) · v ∈ V , it follows that v ∈ Cm−1 (u · m · v ) and hence
T
v = v 0 · (m − 1) · v 00 ∈ Cm
(u).
This concludes the proof of the lemma.
t
u
26
Makoto Kanazawa
Fix m ≥ 2 and Σ. For each c ∈ Σ and each (possibly empty) finite prefixclosed subset P of Pm−1 , define
Γc,P = { (c, P, i) | 0 ≤ i ≤ |P | }.
We consider Γc,P to be a group of symbols that match with each other; this
notion of a matching group of symbols generalizes the notion of a matching pair
of brackets. Let
[
e = Σ ∪ { Γc,P | c ∈ Σ and P ⊆ Pm−1 is finite and prefix-closed }.
Σ
e is an infinite set.16
Note that Σ
T
Let T ∈ Hm+1
Σ,x . For u ∈ Cm (ε), let Tu = { v ∈ Pm · Pm+1 | m · u · v ∈ T }, i.e.,
the domain of SH m+1 (T , m · u). Then we have
(
S
{ε} ∪ u∈Cm
if m + 1 6∈ T ,
T (ε) m · u · Tu
T =
S
{ε} ∪ (m + 1) · (T /(m + 1)) ∪ u∈Cm
if m + 1 ∈ T .
T (ε) m · u · Tu
Thus, T is completely determined by the following pieces of information:
–
–
–
–
–
`T (ε),
T
(ε),
Cm
T
(ε),
SH m+1 (T , m · u) for each u ∈ Cm
whether or not m + 1 ∈ T , and
in case m + 1 ∈ T , the (m + 1)-dimensional hedge SH m+1 (T , m + 1) =
T /(m + 1).
T
Let P = Cm
(ε), k = |P |, and for i = 1, . . . , k, ε CTm,i m · ui and Ti =
SH m+1 (T , m · ui ). In case m + 1 ∈ T or k ≥ 1 (i.e., m ∈ T ), let c = `T (ε) ∈ Σ.
The m-dimensional encoding of T , encm (T ) in symbols, is defined as follows:

T`T (ε)
if m + 1 6∈ T and k = 0,





c
−
P
(enc
(T
),
.
.
.
,
enc
(T
))
if
m + 1 6∈ T and k ≥ 1,
m
m
1
m
k





if m + 1 ∈ T and
(c, P, 0) −m (encm (T0 ))[
encm (T ) =
(c, P, 1) −m encm (T1 ), T0 = SH m+1 (T , m + 1).



...,





(c, P, k) −m encm (Tk )



]
(The substitution notation in the last clause presupposes encm (T0 ) ∈ Tm (k),
e
Σ
and this is indeed the case as shown by the following lemma.)
16
e from Σ in this way, we assume that Σ ∩Γc,P = ∅ for all c ∈ Σ and
When we define Σ
all finite and prefix-closed P ⊆ Pm−1 . Technically, this assumption may not always
be satisfied; nevertheless, we always regard the symbols in Γc,P as “new” symbols.
If more rigor is desired, it can be achieved by complicating the definition of Γc,P .
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
27
S
B
e
e
h
B
a1
a5
h
a6
x
a4
a3
h
x
a2
g
x
x
(S, ∅, 0)
(B, P, 0)
e
a5
e
h
h
a6
(B, P, 2)
a4
(B, P, 0)
a3
a1
h
(B, P, 1)
a2
g
(B, P, 2)
(B, P, 1)
Fig. 2. A well-labeled 3-dimensional tree and its 2-dimensional encoding (P = {ε, 1}).
Example 12. Fig. 2 shows the 3-dimensional tree T from Example 9, which is
in T3N ∪Σ,x , where N = {S, B} and Σ = {h, g, a1 , a2 , a3 , a4 , a5 , a6 }, along with
enc2 (T ). Here, P = {ε, 1}. As before, the edges in the third dimension are
colored red, those in the second dimension blue, and those in the first dimension
green.
m
Lemma 13. If T ∈ Hm+1
(n).
Σ,x (n), then encm (T ) ∈ TΣ
e
Proof. Induction on the size of T . Let c, P , k, and Ti be as above. Suppose T ∈
m+1
T
m
Hm+1
(1).
Σ,x (n). If ` (ε) = x, then T = Tx ∈ HΣ,x (1), and encm (T ) = Tx ∈ TΣ
e
m+1
T
If ` (ε) = c ∈ Σ, then Ti ∈ HΣ,x (ni ) for i = 1, . . . , k, where n = n1 + · · · + nk .
By induction hypothesis, encm (Ti ) ∈ Tm (ni ). Suppose m + 1 6∈ T . Then
e
Σ
it is easy to see encm (T ) ∈ Tm (n). Now suppose m + 1 ∈ T . Since T is
e
Σ
m
(k).
well-labeled, T0 ∈ Hm+1
Σ,x (k). By induction hypothesis, encm (T0 ) ∈ TΣ
e
28
Makoto Kanazawa
Also, (c, P, i) −m encm (Ti ) is in Tm (ni ) for i = 1, . . . , k. It easily follows
e
Σ
that encm (T ) = (c, P, 0) −m (encm (T0 ))[(c, P, 1) −m encm (T1 ), . . . , (c, P, k) −m
encm (Tk )] ∈ Tm (n).
t
u
e
Σ
Note that in encm (T ), every node with a label of the form (c, P, i) (i ≥ 0)
has exactly one child in the m-th dimension. There is a simple way of deleting
any collection of such nodes from an m-dimensional tree to produce another
m-dimensional tree.
Let T be any m-dimensional tree, and assume that U ⊆ T only consists of
T
nodes v such that |Cm
(v)| = 1. Define a function fU : T → [1, m]∗ by
fU (ε) = ε,
fU (v · i) = fU (v) · i
for i < m,
(
fU (v) · m if v 6∈ U ,
fU (v · m) =
fU (v)
if v ∈ U .
Let T 0 = ran(fU ) = { fU (v) | v ∈ T } and fU0 = fU (T − U ). Then it is easy to
see that T 0 = ran(fU0 ), T 0 is a non-empty prefix-closed subset of Pm , and fU0 is
a bijection from T − U to T 0 . Define
delm (T , U ) = (T 0 , `0 ),
where
`0 (v) = `T ((fU0 )−1 (v)).
Since T 0 is a non-empty prefix-closed subset of Pm , it follows that delm (T , U )
is an m-dimensional tree.
Let Υ ⊆ Σ and T ∈ Tm
Σ . We define
delm,Υ (T ) = delm (T , U ),
where
T
U = { v ∈ T | `T (v) ∈ Υ and |Cm
(v)| = 1 }.
Now let T ∈ Tm+1
Σ,x (m ≥ 2). The m-dimensional yield of T is defined as
follows:
ym (T ) = delm,Σ−Σ
e (encm (T )).
It is easy to see that ym (T ) ∈ Tm
Σ.
m
It is of course straightforward to define ym : Hm+1
Σ,x (n) → TΣ (n) directly:


Tc

c − P (y (T ), . . . , y (T ))
m
m
1
m
k
ym (T ) =
(ym (T0 ))[ym (T1 ), . . . , ym (Tk )]



if m + 1 6∈ T and k = 0,
if m + 1 6∈ T and k ≥ 1,
if m + 1 ∈ T and
T0 = SH m+1 (T , m + 1),
T
where, as before, c = `T (ε), P = Cm
(ε), k = |P |, and for i = 1, . . . , k, ε CTm,i m ·
ui and Ti = SH m+1 (T , m·ui ). The indirect definition through encm , however, is
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
29
(S, ∅, 0)
(B, P, 0)
h
a1
(B, P, 0)
a6
g
(B, P, 1)
(B, P, 2)
h
h
h
a1
a2
(B, P, 1)
a3
a4
(B, P, 2)
e
a5
g
h
a2
e
a6
h
e
a3
a4
a5
e
Fig. 3. The 2-dimensional encoding and 2-dimensional yield of a well-labeled 3dimensional tree (P = {ε, 1}).
useful for our generalization of the Chomsky-Schützenberger theorem for multidimensional tree languages.
The case m = 2 of the above definition of ym is meant to capture the notion
of the (tree) yield of a derivation tree of a simple context-free tree grammar,
which we represent as a (well-labeled) 3-dimensional tree. The definitions of
encm , delm,Υ , ym are all applicable to the case m = 1 as well, but the resulting
definitions of enc1 and of y1 will not be equivalent to the standard ones, so we
will continue to treat m = 1 as a special case.
Example 14. Fig. 3 shows enc2 (T ) (the same tree as the lower tree in Fig. 2
with the nodes rearranged) and y2 (T ), where T is the 3-dimensional tree from
Example 9 (the upper tree in Fig. 2).
7
Multi-dimensional Dyck Languages
We continue to work with the alphabet
[
e = Σ ∪ { Γc,P | c ∈ Σ and P is a finite prefix-closed subset of Pm−1 },
Σ
as defined in the previous section. The range of the function encm : Hm+1
Σ,x (0) →
Tm forms a special subset of Tm similar to Dyck languages.
e
e
Σ
Σ
Let us define a rewriting relation
on Tm :
e
Σ
T
T0
holds if there exist some v0 , v1 , . . . , vn ∈ T (n ≥ 0), c ∈ Σ, and finite prefix-closed
subset P of Pm such that17
17
∗
Note that for u, v ∈ T , u (CTm ) v is equivalent to v ∈ u · Pm , and u (CTm )
equivalent to v ∈ u · m · Pm−1 · Pm .
+
v is
30
Makoto Kanazawa
T 0 = delm (T , {v0 , v1 , . . . , vn }),
T
|Cm
(vi )| = 1 for i = 0, 1, . . . , n,
`T (vi ) = (c, P, i) for i = 0, 1, . . . , n,
n = |P |,
v1 , . . . , vn is the alphabetical listing of {v1 , . . . , vn },
+
v0 (CTm ) vi for i = 1, . . . , n,
∗
for every i, j ∈ {1, . . . , n}, if vi (CTm ) vj , then i = j, and
+
for every u ∈ T , if v0 (CTm ) u and there is no i ∈ {1, . . . , n} such that
T ∗
T
vi (Cm ) u, then ` (u) ∈ Σ.
–
–
–
–
–
–
–
–
Using the term notation, we can write
T
T 0 if and only if
T = U [(c, P, 0) −m T0 [(c, P, 1) −m T1 , . . . , (c, P, n) −m Tn ]],
T 0 = U [T0 [T1 , . . . , Tn ]])
m
m
for some U ∈ Tm
e (1), T0 ∈ TΣ (n), Ti ∈ T e (i = 1, . . . , n),
Σ
Σ
c ∈ Σ, finite and prefix-closed P ⊆ Pm−1 with |P | = n.
Define the m-dimensional Dyck tree language over Σ by18
m
DT m
Σ = { T ∈ Te | T
Σ
∗
T 0 ∈ Tm
Σ }.
Note that the alphabet of DT m
Σ (i.e., the set of labels that appear in elements
of DT m
Σ ) is infinite.
Just like the ordinary Dyck language Dn of strings over Γn has an alternative
inductive definition in terms of a context-free grammar, so too the m-dimensional
tree language DT m
Σ admits an inductive definition. First, let us extend the definition of
to a rewriting relation on Tm (n) by taking the exact same definition
e
Σ
as before, requiring `T (u) ∈ Σ rather than `T (u) ∈ Σ ∪ {x} in the consequent
of the last condition. Note that this relation is confluent:
Lemma 15. Let T , T1 , T2 ∈ Tm (n). If T
T1 and T
e
Σ
some T 0 ∈ Tm (n) such that T1
T 0 and T2
T 0.
e
Σ
For each n ∈ N, we define DT m
Σ (n) by
m
DT m
Σ (n) = { T ∈ T e (n) | T
Σ
∗
T2 , then there exists
T 0 ∈ Tm
Σ (n) }.
m
Clearly, DT m
Σ (0) = DT Σ .
Then we can prove that (Xn )n∈N = (DT m
Σ (n))n∈N is the least (in terms of the
partial order defined by componentwise inclusion) sequence of sets that satisfies
the following closure conditions:
18
For dimension m = 2, analogous notions of Dyck tree language have been proposed
by Matsubara and Kasai [19] and by Arnold and Dauchet [1] to capture the tree
languages generated by tree-adjoining grammars and by (general) context-free tree
grammars, respectively.
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
31
I1. Tc ∈ X0 for all c ∈ Σ.
I2. Tx ∈ X1 .
I3. If c ∈ Σ, P is a finite, non-empty, prefix-closed subset of Pm−1 , k = |P |,
Pk
T1 ∈ Xn1 , . . . , Tk ∈ Xnk , and n = i=1 ni , then
c −m P (T1 , . . . , Tk ) ∈ Xn .
I4. If c ∈ Σ, P is a (possibly empty) finite prefix-closed subset of Pm−1 , k = |P |,
Pk
T1 ∈ Xn1 , . . . , Tk ∈ Xnk , T0 ∈ Xk , and n = i=1 ni , then
(c, P, 0) −m T0 [(c, P, 1) −m T1 , . . . , (c, P, k) −m Tk ] ∈ Xn .
Lemma 16. (Xn )n∈N = (DT m
Σ (n))n∈N satisfies I1–I4.
Lemma 17. If T ∈ DT m
Σ (n), then one of the following conditions holds:
C1. n = 0 and T = Tc for some c ∈ Σ.
C2. n = P
1 and T = Tx .
k
C3. n = i=1 ni for some k ≥ 1, n1 , . . . , nk ≥ 0 and
T = c −m P (T1 , . . . , Tk )
for some c ∈ Σ, some finite prefix-closed subset P of Pm−1 such that |P | =
m
k, and some T1 ∈ DT m
Σ (n1 ), . . . , Tk ∈ DT Σ (nk ).
Pk
C4. n = i=1 ni for some k ≥ 0, n1 , . . . , nk ≥ 0 and
T = (c, P, 0) −m T0 [(c, P, 1) −m T1 , . . . , (c, P, k) −m Tk ]
for some c ∈ Σ, some finite prefix-closed subset P of Pm−1 such that |P | =
m
m
k, and some T1 ∈ DT m
Σ (n1 ), . . . , Tk ∈ DT Σ (nk ), T0 ∈ DT Σ (k).
Proof. Suppose T ∈ DT m
Σ (n). If |T | = 1, then clearly, either C1 or C2 holds.
T
If `T (ε) ∈ Σ and Cm
(ε) = P 6= ∅, let k = |P |. Then T = c −m P (T1 , . . . , Tk )
Pk
for some T1 ∈ Tm (n1 ), . . . , Tk ∈ Tm (nk ) such that i=1 ni = n. Since T ∗ T 0
e
e
Σ
Σ
∗
Ti0 for some
for some T 0 ∈ Tm
Σ (n), it is clear that for i = 1, . . . , k, Ti
0
m
Ti ∈ TΣ (ni ). Therefore, C3 holds.
Now suppose `T (ε) = (c, P, i). Since T ∗ T 0 ∈ Tm
Σ (n), it is easy to see that
T
i = 0 and Cm
(ε) = {ε}. Let k = |P | and let T00 ∈ Tm
(k),
T10 , . . . , Tk0 be such that
Σ
T
∗
(c, P, 0) −m T00 [(c, P, 1) −m T10 , . . . , (c, P, k) −m Tk0 ]
T00 [T10 , . . . , Tk0 ]
∗
T 0.
Then it is easy to see that for i = 1, . . . , k, Ti0
Pk
that n = i=1 ni . Also, we must have
∗
Ti00 ∈ Tm
Σ (ni ) for some ni such
T = (c, P, 0) −m T0 [(c, P, 1) −m T1 , . . . , (c, P, k) −m Tk ]
for some T0 ∈ Tm (k), T1 ∈ Tm (n1 ), . . . , Tk ∈ Tm (nk ) such that T0 ∗ T00 and
e
e
e
Σ
Σ
Σ
for i = 1, . . . , k, Ti ∗ Ti0 . It follows that T0 ∈ DT m
Σ (k) and for i = 1, . . . , k,
Ti ∈ DT m
t
u
Σ (ni ), i.e., C4 holds.
32
Makoto Kanazawa
Theorem 18. (DT m
Σ (n))n∈N is the least sequence of sets that satisfies I1–I4.
Proof. By Lemma 16, we know that (DT m
Σ (n))n∈N satisfies I1–I4. Let (Xn )n∈N
be any sequence of sets satisfying I1–I4. To establish that DT m
Xn holds
Σ (n) ⊆S
for all n ∈ N, we prove by induction on the number of nodes of T ∈ n Tm (n)
e
Σ
m
that T ∈ DT m
Σ (n) implies T ∈ Xn . If T ∈ DT Σ (n), by Lemma 17, one of
C1–C4 holds. In case C1 or C2 holds, T ∈ Xn by I1 or I2. If C3 holds, then
by induction hypothesis, T = c −m P (T1 , . . . , Tk ), where Ti ∈ Xni for i =
1, . . . , k. Then by I3, T ∈ Xn . If C4 holds, then by inductionPhypothesis, T =
k
(c, P, 0) −m T0 [(c, P, 1) −m T1 , . . . , (c, P, k) −m Tk ], where n = i=1 ni , T0 ∈ Xk ,
and Ti ∈ Xni for i = 1, . . . , k. Then by I4, T ∈ Xn .
t
u
Just as in the case of ordinary Dyck languages, the inductive definition of
DT m
Σ (n) in terms of the closure conditions I1–I4 is unabmiguous in the sense
that every T ∈ DT m
Σ (n) can be written in the form of one of the equations in
C1–C4, in exactly one way. This follows from the next lemma:
0
Lemma 19. Let U ∈ DT m
∈ DT m
U [(c, P, i1 ) −m
Σ (k), U
Σ (l). If
0
T1 , . . . , (c, P, ik ) −m Tk ] = U [(c, P, j1 ) −m T10 , . . . , (c, P, jl ) −m Tl0 ] with
i1 , . . . , ik , j1 , . . . , jl ≥ 1, then U = U 0 .
Proof. This can be proved by straightforward induction on the size of U , using
Lemma 17.
t
u
m
Lemma 20. { encm (T ) | T ∈ Hm+1
Σ,x (n) } = DT Σ (n).
m+1
Proof. (⊆). We show by induction on the size of T ∈ Hm+1
Σ,x that T ∈ HΣ,x (n)
T
implies encm (T ) ∈ DT m
Σ (n). Let P = Cm (ε), and k = |P |. Assume that for i =
T
1, . . . , k, ε Cm,i m · ui and Ti = SH m+1 (T , m · ui ). Clearly, for each i = 1, . . . , k,
Pk
Ti ∈ Hm+1
Σ,x (ni ) for some ni such that n =
i=1 ni . By induction hypothesis,
encm (Ti ) ∈ DT m
(n
).
i
Σ
Case 1. m + 1 ∈ T . Then `T (ε) = c for some c ∈ Σ. Let T0 =
SH m+1 (T , m + 1). Then T0 ∈ Hm+1
Σ,x (k). By induction hypothesis, we also have
encm (T0 ) ∈ DT m
(k).
Then
enc
m (T ) = (c, P, 0) −m (encm (T0 ))[(c, P, 1) −m
Σ
encm (T1 ), . . . , (c, P, k) −m encm (Tk )] ∈ DT m
Σ (n) by the closure condition I4.
Case 2. m + 1 6∈ T . If P 6= ∅, then `T (ε) = c for some c ∈ Σ, and encm (T ) =
c −m P (encm (T1 ), . . . , encm (Tk )) ∈ DT m
Σ (n) by the closure condition I3. If P =
m+1
∅, then T ∈ Hm+1
(0)
or
T
∈
H
(1)
depending
on whether `T (ε) = c ∈ Σ or
Σ,x
Σ,x
`T (ε) = x. In the former case, encm (T ) = Tc ∈ DT m
Σ (0) by the closure condition
I1. In the latter case, encm (T ) = Tx ∈ DT m
Σ (1) by the closure condition I2.
(⊇). By Theorem 18, it suffices to prove that (Xn )n∈N = ({ encm (T ) | T ∈
Hm+1
Σ,x (n) })n∈N satisfies the conditions I1–I4.
I1 is satisfied since Tc = encm (Tc ) and Tc ∈ Hm+1
Σ,x (0).
I2 is satisfied since Tx = encm (Tx ) and Tx ∈ Hm+1
Σ,x (1).
To check that I3 is satisfied, let c ∈ Σ, P be a finite, non-empty, prefixclosed subset of Pm−1 , k = |P |, and for each i = 1, . . . , k, Ti ∈ Hm+1
Σ,x (ni ), where
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
33
Pk
n = i=1 ni . Let u1 , . . . , uk be the elements of P , in alphabetical order. Define
T by
k
[
T = {ε} ∪
m · ui · Ti ,
i=1
`T (ε) = c,
`T (m · ui · v) = `Ti (v)
for v ∈ Ti .
Then T ∈ Hm+1
Σ,x (n) and Ti = SH m+1 (T , m·ui ) for i = 1, . . . , k. By the definition
of encm ,
encm (T ) = c −m P (encm (T1 ), . . . , encm (Tk )).
This shows that I3 is satisfied.
To check that I4 is satisfied, let c ∈ Σ, P be a finite, possibly empty, prefixclosed subset of Pm−1 , k = |P |, T0 ∈ Hm+1
Σ,x (k), and for each i = 1, . . . , k,
Pk
m+1
Ti ∈ HΣ,x (ni ), where n = i=1 ni . Let u1 , . . . , uk list the elements of P , in
alphabetical order. Define T by
T = {ε} ∪ (m + 1) · T0 ∪
k
[
m · ui · Ti ,
i=1
`T (ε) = c,
`T ((m + 1) · v) = `T0 (v)
T
Ti
` (m · ui · v) = ` (v)
for v ∈ T0 ,
for v ∈ Ti .
Then it is easy to see T ∈ Hm+1
Σ,x (n), T0 = SH m+1 (T , m+1), and for i = 1, . . . , k,
Ti = SH m+1 (T , m · ui ). By the definition of encm ,
encm (T ) =
(c, P, 0) −m (encm (T0 ))[(c, P, 1) −m (encm (T1 )), . . . , (c, P, k) −m (encm (Tk ))].
t
u
This shows that I4 is satisfied.
Lemma 21. For each m ≥ 2, encm is an injection.
Proof. This follows from the unambiguity of the inductive definition of DT m
Σ (n).
t
u
T
from the nodes of T ∈ Hm+1
It is useful to define a function fenc
Σ,x to
m
T
the nodes of T 0 = encm (T ). Let P = Cm
(ε) and k = |P |. Let u1 , . . . , uk list
the elements of P in alphabetical order, and let Ti = SH m+1 (T , m · ui ) for
T
i = 1, . . . , k. Define fenc
: T → T 0 by
m
T
(i) fenc
(ε) = ε.
m
(ii) If m + 1 ∈ T and T0 = SH m+1 (T , m + 1), then
T
T0
fenc
((m + 1) · w) = m · fenc
(w)
m
m
T
fenc
(m
m
· ui · w) = m ·
T0
fenc
(vi )
m
where w ∈ T0 ,
·m·
Ti
fenc
(w)
m
where w ∈ Ti and ε JT
m+1,i vi .
34
Makoto Kanazawa
(iii) If m + 1 6∈ T , then
T
Ti
fenc
(m · ui · w) = m · ui · fenc
(w)
m
m
where w ∈ Ti .
T
It is easy to check that fenc
(v) ∈ T 0 indeed holds for all v ∈ T .
m
Example 22. Consider the 3-dimensional tree T and its 2-dimensional encoding
enc2 (T ), depicted in Fig. 2. In these diagrams, the nodes that are related by
T
fenc
are placed in roughly the same geometrical positions.
m
0
Lemma 23. Let T ∈ Hm+1
Σ,x and T = encm (T ). For each v ∈ T , we have

c




0
(c, U, 0)
T
`T (fenc
(v)) =
m

(c, U, i)



x
if
if
if
if
v 6∈ dom(≺Tm+1 ) and `T (v) = c ∈ Σ,
T
v ∈ dom(≺Tm+1 ), `T (v) = c, and Cm
(v) = U ,
T
T
`T (v) = x, u JT
v,
`
(u)
=
c,
and
Cm
(u) = U ,
m+1,i
T
` (v) = x and v ∈ T ∩ Pm .
Proof. This is easy to see from the definition of encm .
t
u
T
allows an alternative definition by recursion with respect
The function fenc
m
to the alphabetical order on the nodes of T .
T
Lemma 24. The function fenc
satisfies the following equations:
m
T
fenc
(ε) = ε,
m
T
T
fenc
(u · (m + 1)) = fenc
(u) · m,
m
m
and for v ∈ Pm−1 ,

T

fencm (u) · m · v
T
T
fencm (u · m · v) = fenc
(u · (m + 1) · w) · m
m


if u 6∈ dom(≺Tm+1 ),
if u ∈ dom(≺Tm+1 ),
u CTm,i v, and u JT
m+1,i w.
Proof. The first equation is true by definition. The remaining two equations can
be proved by induction on the length of u.
t
u
T
Lemma 25. For all T ∈ Hm+1
Σ,x , fencm is a bijection from the nodes of T to the
nodes of encm (T ).
T
Proof. That fenc
is injective can be shown by induction with respect to the
m
T
T
T
alphabetical order on T . Suppose fenc
(t1 ) = fenc
(t2 ). If fenc
(t1 ) = ε, then
m
m
m
T
clearly t1 = t2 = ε. If fencm (t1 ) ∈ Pm · m · v for v ∈ Pm−1 − {ε}, then we must
T
T
have t1 = u1 · m · v and t2 = u2 · m · v with fenc
(u1 ) = fenc
(u2 ). By the
m
m
T
induction hypothesis, u1 = u2 and hence t1 = t2 . If fencm (t1 ) ∈ Pm · m, then for
each j ∈ {1, 2}, one of the following holds:
T
T
(i) tj = uj · (m + 1) and fenc
(tj ) = fenc
(uj ) · m.
m
m
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
35
T
T
(ii) tj = uj · m and fenc
(tj ) = fenc
(uj ) · m, where uj 6∈ dom(≺Tm+1 ).
m
m
T
T
(iii) tj = uj · m · vj and fenc
(t
)
=
fenc
(uj · (m + 1) · wj ) · m, where uj ∈
j
m
m
dom(≺Tm+1 ), uj CTm,i vj , and uj JT
m+1,i wj .
If the same case applies to both t1 and t2 , then the induction hypothesis implies
T
T
t1 = t2 . If (i) applies to t1 and (ii) applies to t2 , then fenc
(u1 ) = fenc
(u2 ),
m
m
T
T
but since u1 ∈ dom(≺m+1 ) and u2 6∈ dom(≺m+1 ), the induction hypothesis says
T
(u1 ) =
this is impossible. If (i) applies to t1 and (iii) applies to t2 , then fenc
m
T
T
fencm (u2 ·(m+1)·w2 ). Since u1 ∈ dom(≺m+1 ) and u2 ·(m+1)·w2 6∈ dom(≺Tm+1 )
(by the fact that T ∈ Hm+1
Σ,x ), the induction hypothesis says this is impossible. If
T
T
(ii) applies to t1 and (iii) applies to t2 , then fenc
(u1 ) = fenc
(u2 · (m + 1) · w2 ).
m
m
T
T
Since u1 ∈ dom(≺m ) and u2 · (m + 1) · w2 6∈ dom(≺m ) (again by the fact that
T ∈ Hm+1
Σ,x ), the induction hypothesis says this is impossible. The remaining
cases are symmetric.
T
We have shown that fenc
is injective. Since T and T 0 have the same number
m
T
of nodes, fencm is indeed a bijection.
t
u
m
The m-dimensional counterpart DT 0 Σ of the set of Dyck primes (m ≥ 2) is
defined by
m
DT 0 Σ = { (c, ∅, 0) −m T | c ∈ Σ, T ∈ DT m
Σ } ∪ { Tc | c ∈ Σ }.
m
0
m+1
Lemma 26. For all T ∈ Hm+1
Σ,x (0), T ∈ TΣ,x if and only if encm (T ) ∈ DT Σ .
e → Σ ∪ {x} by
Define a function ρ : Σ
ρ(c) = c
for c ∈ Σ,
ρ((c, U, 0)) = c,
ρ((c, U, i)) = x
for 1 ≤ i ≤ |U |.
0
Then for every T ∈ Hm+1
Σ,x and v ∈ T , if T = encm (T ), we have
0
T
ρ(`T (fenc
(v))) = `T (v).
m
The following is a generalization of Lemma 2 to higher dimensions:
Lemma 27. Let L ⊆ Tm+1
Σ,x . If L is super-local, then there exists a local set
m
0
m
L ⊆ T such that encm (L) = L0 ∩ DT 0 Σ .
e
Σ
Proof. Let L be a super-local subset of Tm+1
Σ,x . Without loss of generality, we
may suppose that L = SLocm+1 (A, Z, K, Y, J) for some finite sets
A ⊆ Σ,
Z ⊆ Σ ∪ {x},
K ⊆ Σ × (Σ ∪ {x}),
Y ⊆ Σ ∪ {x},
J ⊆ Σ × { P ⊆ Pm−1 | P is an (m − 1)-ary tree domain } × N+ × (Σ ∪ {x}).
36
Makoto Kanazawa
Let
S
Σ 0 = (Z ∩ Σ) ∪ { Γc,U | x ∈ Y ∩ Z, (c, U, i, a) ∈ J, (c, b) ∈ K } ∪
{ (c, ∅, 0) | c ∈ A ∪ Y, (c, a) ∈ K }.
e Define finite sets A0 , Z 0 ⊆ Σ 0 and I ⊆
Note that Σ 0 is a finite subset of Σ.
m−1
0
Σ × TΣ 0 by
A0 = (A ∩ Z) ∪ { (c, ∅, 0) | c ∈ A, (c, a) ∈ K },
Z 0 = (A ∪ Y ) ∩ Z ∩ Σ,
I = { ((c, U, 0), Td ) | (c, U, 0) ∈ Σ 0 , d ∈ Z ∩ Σ, (c, d) ∈ K } ∪
{ ((c, U, 0), T(d,V,0) ) | (c, U, 0), (d, V, 0) ∈ Σ 0 , (c, d) ∈ K, and
either V 6= ∅ or d ∈ Y } ∪
{ ((c, U, 0), T(c,U,1) ) | (c, U, 1) ∈ Σ 0 , |U | = 1, (c, x) ∈ K } ∪
{ ((c, U, i), Td ) | (c, U, i) ∈ Σ 0 , (c, U, i, d) ∈ J, d ∈ Z ∩ Σ } ∪
{ ((c, U, i), T(d,V,0) ) | (c, U, i) ∈ Σ 0 , (c, U, i, d) ∈ J, (d, V, 0) ∈ Σ 0 , and
either V 6= ∅ or d ∈ Y } ∪
{ ((c, U, i), T(d,V,j) ) | (c, U, i) ∈ Σ 0 , (c, U, i, x) ∈ J, j ≥ 1, (d, V, j) ∈ Σ 0 } ∪
{ (c, U ) | c ∈ Z ∩ Σ, U ∈ Tm−1
Σ0 ,
for i = 1, . . . , |U |, if ui is the i-th node of U , then
(c, U, i, ρ(`U (ui ))) ∈ J, and `U (ui ) = (d, ∅, 0) implies d ∈ Y }.
We show that L0 = Locm (A0 , Z 0 , I) satisfies the desired property.
m
To prove encm (L) ⊆ L0 ∩ DT 0 Σ , suppose T ∈ L and let T 0 = encm (T ). By
m
Lemma 26, T 0 ∈ DT 0 Σ , so it suffices to show T 0 ∈ L0 .
0
L1. First we show `T (ε) ∈ A0 . Let c = `T (ε). Since T ∈ L, c ∈ A. Suppose
first ε 6∈ dom(≺Tm+1 ). Then since T ∈ L, c ∈ Z. By the definition of encm ,
0
`T (ε) = c ∈ A0 . Now suppose ε ∈ dom(≺Tm+1 ). Then since T ∈ L, (c, `T (m +
T
1)) ∈ K. Since T is an (m + 1)-dimensional tree, Cm
(ε) = ∅. By the definition
T0
0
of encm , ` (ε) = (c, ∅, 0) ∈ A .
0
T
)−1 (v). By the definition of
L2. Suppose v ∈ T 0 − dom(≺Tm ). Let u = (fenc
m
T
T
T
fencm , it is clear that u 6∈ dom(≺m+1 )∪dom(≺m ) and `T (u) ∈ Σ. By Lemma 23,
0
`T (v) = `T (u). Since T ∈ L, `T (u) ∈ (A ∪ Y ) ∩ Z, and this implies that
T0
` (v) ∈ Z 0 .
0
0
T
T
L3. Suppose v ∈ dom(≺Tm ) and U 0 = Cm
(v). Let u = (fenc
)−1 (v).
m
0
Case 1. u ∈ dom(≺Tm+1 ). In this case, `T (v) = (c, U, 0), where c = `T (u) and
T
U = Cm
(u). Since T ∈ L, we have (c, `T (u · (m + 1))) ∈ K. If u 6∈ dom(≺Tm ),
then U = ∅, and either u = ε and c ∈ A or c ∈ Y . If u ∈ dom(≺Tm ), then
(c, U, 1, `T (u · m)) ∈ J. In either case, we have (c, U, 0) ∈ Σ 0 . Note that U 0 = {ε}
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
37
T
and v · m = fenc
(u · (m + 1)). We have
m


d
if `T (u · (m + 1)) = d ∈ Σ and





u · (m + 1) 6∈ dom(≺Tm+1 ),

0
`T (v · m) = (d, V, 0) if `T (u · (m + 1)) = d ∈ Σ, u · (m + 1) ∈ dom(≺Tm+1 ),


T

and Cm
(u · (m + 1)) = V ,



(c, U, 1) if `T (u · (m + 1)) = x.
Since T ∈ L, we have d ∈ Z ∩ Σ ⊆ Σ 0 in the first case. In the second case,
(d, `T (u · (m + 1) · (m + 1))) ∈ K and either V = ∅ and d ∈ Y or (d, V, 1, `T (u ·
(m + 1) · m)) ∈ J, which implies (d, V, 0) ∈ Σ 0 . In the third case, since T is
well-labeled, we have |U | = 1 and u · (m + 1) 6∈ dom(≺Tm ), and T ∈ L implies
x ∈ Y ∩ Z and (c, U, 1, `T (u · m)) ∈ J. It follows that (c, U, 1) ∈ Σ 0 . In each case,
we have ((c, U, 0), U 0 ) = ((c, U, 0), T`T 0 (v·m) ) ∈ I.
Case 2. u 6∈ dom(≺Tm+1 ).
0
Case 2a. `T (u) = c ∈ Σ. In this case, `T (v) = c ∈ Σ ∩ Z ⊆ Σ 0 and it is easy
T
T
to see from Lemma 24 that U 0 = Cm
(u) and fenc
(u · m · w) = v · m · w for all
m
T
T
T
(u), i, `T (u · m · ui )) =
w ∈ Cm (u). Since T ∈ L, if u Cm,i u · m · ui , then (c, Cm
0
0
0
T0
0
U0
(c, U , i, ρ(` (v · m · ui ))) = (c, U , i, ρ(` (ui ))) ∈ J. Also, if `U (ui ) = `T (v ·
T
m · ui ) = (d, ∅, 0) for some d ∈ Σ, then u · m · ui 6∈ dom(≺m ) and hence d ∈ Y .
It follows that (c, U 0 ) ∈ I.
T
Case 2b. `T (u) = x. In this case, for some w, z ∈ T −{ε}, w JT
m+1,i u, w Cm,i
T
z. By Lemma 24, we have U 0 = {ε} and v · m = fenc
(z). Let c = `T (w) ∈ Σ
m
T
T
and U = Cm (w). Since T ∈ L, (c, ` (w · m)) ∈ K and (c, U, i, `T (z)) ∈ J. Also,
0
`T (u) = x ∈ Y ∩ Z, since T is well-labeled. Hence `T (v) = (c, U, i) ∈ Σ 0 and


if `T (z) = d ∈ Σ and z 6∈ dom(≺Tm+1 ),
d
0
T
`T (v · m) = (d, V, 0) if `T (z) = d ∈ Σ, z ∈ dom(≺Tm+1 ), and Cm
(z) = V ,


T
T
T
T
(d, V, j) if ` (z) = x, t Jm+1,j z, ` (t) = d, and Cm (t) = V .
In the first case, since T ∈ L, d ∈ Z. In the second case, (d, `T (z · (m + 1))) ∈ K,
and if V = ∅, then z 6∈ dom(≺Tm ) and hence d ∈ Y . In the third case, (d, `T (t ·
(m + 1))) ∈ K. It follows that in each case, ((c, U, i), U 0 ) = ((c, U, i), T`T 0 (v·m) ) ∈
I.
We have proved that T 0 satisfies the requirements L1–L3 for membership in
L0 = Locm (A0 , Z 0 , I).
m
m
To show L0 ∩ DT 0 Σ ⊆ encm (L), let T 0 ∈ L0 ∩ DT 0 Σ . By Lemma 26, there is
m+1
a T ∈ TΣ,x such that T 0 = encm (T ). It suffices to prove that T ∈ L.
We show that T satisfies all requirements for membership in L =
SLocm+1 (A, Z, K, Y, J).
0
0
SL1. The fact that `T (ε) ∈ A0 easily implies `T (ε) = ρ(`T (ε)) ∈ A.
SL2. Let v ∈ T − dom(≺Tm+1 ).
0
T
Case 1. `T (v) = c ∈ Σ. Then `T (fenc
(v)) = c ∈ Σ 0 ⊆ Z, and it follows
m
that c ∈ Z.
38
Makoto Kanazawa
0
T
Case 2. `T (v) = x. Then `T (fenc
(v)) = (d, V, i) ∈ Σ 0 for some V 6= ∅ and
m
i ≥ 1, and it follows that x ∈ Z.
SL3. Let v ∈ dom(≺Tm+1 ). We show (`T (v), `T (v · (m + 1))) ∈ K. By
T
T
Lemma 24, we have fenc
(v · (m + 1)) = fenc
(v) · m. Let c = `T (v) and U =
m
m
T
T0
T
T0
T
Cm (v). Then ` (fencm (v)) = (c, U, 0) and Cm (fenc
(v)) = T`T 0 (fenc
T
(v)·m) .
m
m
0
0
0
Since T ∈ L , it holds that ((c, U, 0), T`T (fenc
T
(v)·m) ) ∈ I.
m
0
T
(v) · m) is either d or (d, V, 0)
Case 1. `T (v · (m + 1)) = d ∈ Σ. Then `T (fenc
m
for some V depending on whether v · (m + 1) ∈ dom(≺Tm+1 ) or not. Either way,
T
T
((c, U, 0), T`T 0 (fenc
T
(v)·m) ) ∈ I implies (` (v), ` (v · (m + 1))) = (c, d) ∈ K.
m
0
T
Case 2. `T (v · (m + 1)) = x. Then |U | = 1 and `T (fenc
(v) · m) = (c, U, 1).
m
T
T
0
Since ((c, U, 0), T`T (fenc
) ∈ I, we have (` (v), ` (v · (m + 1))) = (c, x) ∈
T
(v)·m)
m
K.
SL4. Suppose v 6= ε and v ∈ T − dom(≺Tm ). We show `T (v) ∈ Y .
Case 1. `T (v) = c ∈ Σ.
0
T
Case 1a. v ∈ dom(≺Tm+1 ). Then `T (fenc
(v)) = (c, ∅, 0). Since v 6= ε,
m
T
0
fencm (v) 6= ε and there are some u ∈ T and w ∈ Pm−1 such that u · m · w =
0
0
T
T0
T
(v)) =
fenc
(v). Since T 0 ∈ L0 , (`T (u), Cm
(u)) ∈ I. Since (c, ∅, 0) = `T (fenc
m
m
T0
`Cm (u) (w), the definition of I implies c ∈ Y .
0
T
Case 1b. v 6∈ dom(≺Tm+1 ). Then `T (fenc
(v)) = c. Since v 6∈ dom(≺Tm ) and
m
0
T
`T (v) 6= x, Lemma 24 implies fenc
(v) 6∈ dom(≺Tm ). Since T 0 ∈ L0 , c ∈ Z 0 ,
m
which implies c ∈ Y .
0
T
Case 2. `T (v) = x. Then `T (fenc
(v)) = (d, V, j) ∈ Σ 0 for some V 6= ∅ and
m
j ≥ 1, which implies x ∈ Y .
T
(u) = U , c = `T (u). We show (c, U, i, `T (v)) ∈ J.
SL5. Let u CTm,i v, Cm
0
T
T
Case 1. u ∈ dom(≺m+1 ). Then `T (fenc
(u)) = (c, U, 0) and for some
m
0
T
T
T
T
T
w, u Jm+1,i w, ` (fencm (w)) = (c, U, i), fencm (v) = fenc
(w) · m, and
m
0
T
T
0
0
0
0
Cm
(fenc
(w))
=
T
.
Since
T
∈
L
,
((c,
U,
i),
T
T
T
`T (fenc
(v))
`T (fenc
(v)) ) ∈ I.
m
m
m
We have

d
if `T (v) = d ∈ Σ and v 6∈ dom(≺Tm+1 ),




T
0
(v) = V ,
(d, V, 0) if `T (v) = d ∈ Σ, v ∈ dom(≺Tm+1 ), and Cm
T
`T (fenc
(v)) =
m
T
T
T

(d,
V,
j)
if
`
(v)
=
x,
z
J
v,
`
(z)
=
d
∈
Σ,
and

m+1,j


T
Cm
(z) = V .
In each case, ((c, U, i), T`T 0 (fenc
T
m
(v)) )
∈ I implies (c, U, i, `T (v)) ∈ J.
0
T
T
(u)) = c, fenc
(u · m · t) =
Case 2. u 6∈ dom(≺Tm+1 ). In this case, `T (fenc
m
m
T
T
T0
T
T
fencm (u)·m·t for all t ∈ Cm (u), and Cm (fencm (u)) = Cm (u) = U . In particular,
0
T
T
T0
T
fenc
(u) CTm,i fenc
(v). Since T 0 ∈ L0 , (c, Cm
(fenc
(u))) ∈ I. Let ui be the
m
m
m
T
T
i-th node of U , so that fencm (v) = fencm (u) · m · ui . Then (c, U, i, `T (v)) =
0
0
T0
T
T
T
T
(c, U, i, ρ(`T (fenc
(v)))) = (c, Cm
(fenc
(u)), i, ρ(`Cm (fencm (u)) (ui ))) ∈ J, by
m
m
the definition of I.
This establishes T ∈ L = SLocm+1 (A, Z, K, Y, J).
t
u
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
39
The converse of the above lemma does not hold for a reason similar to the
one for the case of the standard enc function for dimension 1.19
f0 → Σ
e in an
A projection π : Σ 0 → Σ naturally induces a projection π
e: Σ
obvious way:
π
e(c) = π(c),
π
e((c, P, i)) = (π(c), P, i).
0
Clearly, if T 0 ∈ DT m
e(T 0 ) ∈ DT m
∈ Tm+1
Σ 0 , then π
Σ . Also, if T
Σ 0 ,x , then
0
0
encm (π(T )) = π
e(encm (T )).
Here is a generalization of Lemma 6 to multi-dimensional Dyck languages:
Lemma 28. Let L ⊆ Tm be a local set. Then there exist a finite alphabet Σ 0 ,
e
Σ
m
a projection π : Σ 0 → Σ, and a local set L0 ⊆ Tm0 such that L ∩ DT 0 Σ =
e
Σ
0m
π
e(L0 ∩ DT m
e maps L0 ∩ DT m
Σ 0 ). Moreover, π
Σ 0 bijectively to L ∩ DT Σ .
e and I ⊆ Σ
e × Tm−1 be finite sets such that L =
Proof. Let A, Z ⊆ Σ
e
Σ
m
Locm (A, Z, I). Since we are interested in the intersection of L and DT 0 Σ , we
may assume without loss of generality Z ⊆ Σ. Define
Σ0 = Z ∪ { c ∈ Σ | (c, T ) ∈ I },
Σ 0 = Σ0 ∪ { c̄ | c ∈ A ∩ Z },
A0 = { c̄ | c ∈ A ∩ Z } ∪ { (c, ∅, 0) | c ∈ Σ0 , (c, ∅, 0) ∈ A },
Z 0 = Z ∪ { c̄ | c ∈ A ∩ Z }.
f0 . Let π : Σ 0 → Σ be the projection defined
Then A0 and Z 0 are finite subsets of Σ
by
¯ =d
π(c) = c, π(d)
for each c ∈ Σ0 and d ∈ A ∩ Z. Let
L0 = Locm (A0 , Z 0 , I).
m
m
0
It is easy to see that L ∩ DT 0 Σ = π
e(L0 ∩ DT m
Σ 0 ) and for each T ∈ L ∩ DT Σ ,
0
0
0
there is a unique T ∈ L such that π(T ) = T .
t
u
0
Lemma 29. If L ⊆ Tm+1
Σ,x is a local set, then there exist a finite alphabet Σ , a
0
0
m
projection π : Σ → Σ, and a local set L ⊆ T 0 such that
e
Σ
encm (L) = π
e(L0 ∩ DT m
Σ 0 ).
Moreover, enc−1
e maps L0 ∩ DT m
m ◦π
Σ 0 bijectively to L.
19
There is also an additional reason. L = {a −3 a} is not super-local even though
enc2 (L) = {(a, ∅, 0) −2 a} is local.
40
Makoto Kanazawa
Proof. By Lemma 11, there exist a projection π1 : Σ1 → Σ and a super-local
m
L1 ⊆ Tm+1
Σ1 ,x such that L = π1 (L1 ). By Lemma 27, there exist a local set L2 ⊆ TΣ
f1
m
such that encm (L1 ) = L2 ∩ DT 0 Σ1 . By Lemma 28, there exist a projection
m
π2 : Σ 0 → Σ1 and a local set L0 ⊆ Tm0 such that L2 ∩ DT 0 Σ1 = π
f2 (L0 ∩ DT m
Σ 0 ).
e
Σ
So
encm (L) = encm (π1 (L1 ))
=π
f1 (encm (L1 ))
m
=π
f1 (L2 ∩ DT 0 Σ1 )
=π
f1 (f
π2 (L0 ∩ DT m
Σ 0 ))
=π
e(L0 ∩ DT m
Σ 0 ),
where π = π1 ◦ π2 . Since π1 is a bijection from L1 to L and encm is injective, π
f1
maps encm (L1 ) bijectively to encm (L). Since π
f2 maps L0 ∩ DT m
bijectively
to
0
Σ
m
L2 ∩ DT 0 Σ1 = encm (L1 ), π
e=π
f1 ◦ π
f2 maps L0 ∩ DT m
bijectively
to
enc(L).
t
u
0
Σ
8
A Multi-dimensional Generalization of the
Chomsky-Schützenberger Theorem
Let m ≥ 2. We call L ⊆ Tm
Σ simple context-free if there exist a finite alphabet Υ
and a local set K ⊆ Tm+1
Υ,x such that L = ym (K).
For a finite alphabet Σ and r ≥ 0, we define the finite alphabet
er = Σ ∪
Σ
[
{ Γc,P | c ∈ Σ, P is finite and prefix-closed, |P | ≤ r }.
For any alphabet Υ and p ≥ 1, let
m
T
Tm
Υ,p = { T ∈ TΥ | |Cm (v)| ≤ p for all v ∈ T }.
m
Clearly, if Υ is finite, Tm
Υ,p is a local subset of TΥ . Also, any local subset L
m
m
of TΥ is included in TΥ,p for some p, which is just another way of saying L is
degree-bounded.
m
Lemma 30. Let Σ be a finite set. For m ≥ 2, DT m
is simple contextΣ ∩ TΣ
er ,p
free.
Proof. We adapt the inductive definition of DT m
Σ (n) to obtain the required local
e
set. Let Υ = Σr ∪ {X0 , . . . , Xr }. We write Uk for the set {ε, (m − 1), . . . , (m −
1)k−1 } ⊆ Pm−1 . Let
An = {Xn } for n = 0, . . . , r,
er ∪ {x},
Z=Σ
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
41
I = {(X0 , Tc ), (X1 , Tx )} ∪



c −m P (



P ⊆ Pm−1 ,






X
−
U
(x,
.
.
.
,
x),

n
m
n
1
1
 P is finite and prefix-closed, 


Xn ,
...,
∪
 1 ≤ |P | = k ≤ p,







X
−
U
(x,
.
.
.
,
x)


n
m
n
k
k
0 ≤ n = n1 + · · · + nk ≤ r 



)



(c, P, 0) −m Xk −m Uk (




P ⊆ Pm−1 ,





(c,
P,
1)
−
X
−
U
(x,
.
.
.
,
x),

m
n
m
n
1
1
 P is finite and prefix-closed, 


Xn , . . . ,
.
 0 ≤ |P | = k ≤ r,






(c,
P,
k)
−
X
−
U
(x,
.
.
.
,
x)


m
n
m
n
k
k
0 ≤ n = n1 + · · · + nk ≤ r 



)
Here, the number of occurrences of x in Uni (x, . . . , x) is |Uni | = ni . When
j = 0, we understand the notation Xj −m Uj (x, . . . , x) to mean X0 , i.e., the
tree consisting of a single node labeled by X0 . Note that An and Z are (finite)
subsets of Υ and I is a finite subset of Υ × Tm
Υ ∪x . It is straightforward to prove
that
m
DT m
(n) = ym (Locm+1 (An , Z, I))
Σ (n) ∩ T e
Σr ,p
holds for n = 0, . . . , r. The case of n = 0 gives the statement of the lemma. We
omit the details.
t
u
The following lemma is straightforward.
Lemma 31. If L is simple context-free, there are finite sets A, Z, I such that
L = ym (Locm+1 (A, Z, I)) and Z ∩ { c | (c, T ) ∈ I } = ∅.
T
Recall the definition of delm (T , U ) for U ⊆ { v ∈ T | |Cm
(v)| = 1 }, which
∗
was given in terms of the function fU : T → [1, m] :
fU (ε) = ε,
fU (v · i) = fU (v) · i
for i < m,
(
fU (v) · m if v 6∈ U ,
fU (v · m) =
fU (v)
if v ∈ U .
For TΣm and Υ ⊆ Σ, delm,Υ (T ) was defined to be delm (T , U ), with U = { v ∈
T
T | `T (v) ∈ Υ, |Cm
(v)| = 1 }.
m+1
For T ∈ TΣ,x , let fyTm be the mapping from the nodes of T to the nodes of
ym (T ) defined by
T
fyTm (u) = fU (fenc
(u)),
m
where
T
U = fenc
({ u ∈ T | u ∈ dom(≺Tm+1 ) or `T (u) = x }).
m
It is clear that if u ∈ T − dom(≺m+1 ) and `T (u) 6= x, then `T (u) =
`ym (T ) (fyTm (u)).20
20
Here and in the proof of the next lemma, we use the notations `ym (T ) and `encm (T )
with their obvious meaning.
42
Makoto Kanazawa
Lemma 32. Let L ⊆ Tm
Σ be a simple context-free set.
0
(i) If L0 ⊆ Tm
Σ is local, then L ∩ L is simple context-free.
(ii) For every projection π : Σ → Σ 0 , π(L) is simple context-free.
(iii) If Σ 0 ⊆ Σ, then delm,Σ 0 (L) is simple context-free.
Proof. Let L = ym (Locm+1 (A, Z, I)), where Σ ⊆ Υ and A ⊆ Υ, Z ⊆ Υ ∪{x}, I ⊆
Υ × Tm
Υ ∪{x} are finite sets. By Lemma 31, we may assume Z ⊆ Σ ∪ {x} and
{ c | (c, T ) ∈ I } ⊆ Υ − Σ. Let r = max{ n | (c, T ) ∈ I, T ∈ Tm
Υ (n) }. We only
prove (i), since (ii) and (iii) are straightforward.21
Let L0 = Locm (A0 , Z 0 , I 0 ), where A0 , Z 0 ⊆ Σ, I 0 ⊆ Σ × Tm−1
are finite sets.
Σ
Let
Υ1 = { (c, d, e1 , . . . , en ) | c ∈ Υ − Σ, d, e1 , . . . , en ∈ Σ, n ≤ r }.
Define projections π1 : Σ ∪ Υ1 → Υ and π2 : Σ ∪ Υ1 → Σ by
π1 (c) = c
π1 ((c, d, e1 , . . . , en )) = c
π2 (c) = c
for c ∈ Σ,
for (c, d, e1 , . . . , en ) ∈ Υ1 ,
for c ∈ Σ,
π2 ((c, d, e1 , . . . , en )) = d for (c, d, e1 , . . . , en ) ∈ Υ1 .
Let
A00d = π2−1 ({d})
for d ∈ Σ,
Z 00 = Z 0 ∪ { (c, d, e1 , . . . , en ) ∈ Υ1 | n = 0 },
0
I 00 = { (c, T ) ∈ Σ × Tm−1
Σ∪Υ1 | (c, π2 (T )) ∈ I } ∪
{ ((c, d, e1 , . . . , en ), T ) ∈ Υ1 × Tm−1
Σ∪Υ1 |
|T | = n and if ui is the i-th node of T , then π2 (`T (ui )) = ei }.
Define
A1 = (A ∩ A0 ∩ Z 0 ) ∪ { (c, d) ∈ Υ1 | c ∈ A, d ∈ A0 },
Z1 = Z,
I1 = { ((c, d, e1 , . . . , en ), T ) | (c, d, e1 , . . . , en ) ∈ Υ1 , T ∈ Tm
Σ∪Υ1 (n), (c, π1 (T )) ∈ I,
m
T [(c, e1 ), . . . , (c, en )] ∈ Loc (A00d , Z 00 , I 00 ) }.
Note that A1 , Z1 , I1 are finite sets. We show ym (Locm+1 (A1 , Z1 , I1 )) = L ∩ L0 .
We first note some useful facts about members of Locm+1 (A1 , Z1 , I1 ). To
state this, we need another projection and a certain relabeling defined on some
f
subset of Tm+1
Σ∪Υ1 ,x . Define a projection π3 : Σ ∪ (Υ1 − Υ1 ) → Σ by
π3 (c) = c
if c ∈ Σ,
π3 (((c, d, e1 , . . . , en ), P, 0)) = d,
π3 (((c, d, e1 , . . . , en ), P, i)) = ei
21
if 1 ≤ i ≤ n.
This proof is very long and laborious. An alternative approach would be to use the
notion of a recognizable (equivalently, MSO-definable) set of m-dimensional trees
[24,23] and rely on the fact that the yield mapping is an MSO-definable transduction.
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
43
Let
M = { T ∈ Tm+1
Σ∪Υ1 ,x |
for all v ∈ T , (i) v ∈ dom(≺Tm+1 ) if and only if `T1 (v) ∈ Υ1
T1
(v) ∈ Tm
and (ii) `T1 (v) = (c, d, e1 , . . . , en ) implies Cm+1
Σ∪Υ1 (n) }.
It is clear that Locm+1 (A1 , Z1 , I1 ) ⊆ M . For T1 = (T, `T1 ) ∈ M , define T1† =
†
(T, `T1 ) by
(
`T1 (v) if `T1 (v) ∈ Σ ∪ Υ1 ,
T1†
` (v) =
(c, ei ) if `T1 (v) = x, u Jm+1,i v, and `T1 (u) = (c, d, e1 , . . . , en ).
(14)
It is easy to see that for all u ∈ T , we have
†
T1
(u))).
π2 (`T1 (u)) = π3 (`encm (T1 ) (fenc
m
(15)
Now let T1 = (T, `T1 ) ∈ Locm+1 (A1 , Z1 , I1 ). Since T1 ∈ Locm+1 (A1 , Z1 , I1 ),
it is also easy to see the following:
†
†
u ∈ dom(≺Tm+1 ) implies π2 (`T1 (u)) = π2 (`T1 (u · (m + 1))),
u
1
JT
m+1,i
v and u
CTm,i
T1†
w imply π2 (`
T1†
(v)) = π2 (`
(w)).
(16)
(17)
It follows from (15), (16), and (17) that if u ∈ dom(≺Tm+1 ) or `T1 (u) = x, then
T1
T1
π3 (`T1 (fenc
(u)) = π3 (`T1 (fenc
(u) · m)).
m
m
T1
fenc
(dom(≺Tm+1 )
m
T1
(18)
T1
Let U =
∪ { v ∈ T | ` (v) = x }). Note that if
T1
(u) 6∈ U ), then `T1 (u) =
u ∈ T and ` (u) ∈ Σ (or, equivalently, if fenc
m
T1
`encm (T1 ) (fenc
(u)) = `ym (T1 ) (fyTm1 (u)). By (18), we can see that for all nodes
m
w, t of encm (T1 ), fU (w) = fU (t) implies π3 (`encm (T1 ) (w)) = π3 (`encm (T1 ) (t)).
So if w0 is the unique node such that w0 6∈ U and fU (w) = fU (w0 ), then
π3 (`encm (T1 ) (w)) = π3 (`encm (T1 ) (w0 ))
= π3 (`ym (T1 ) (fU (w0 )))
= `ym (T1 ) (fU (w0 ))
= `ym (T1 ) (fU (w)).
It follows from this and (15) that for all u ∈ T , we have
†
π2 (`T1 (u)) = `ym (T1 ) (fyTm1 (u)).
(19)
Now for T ∈ Locm+1 (A, Z, I), define T̂ = (T, `T̂ ) ∈ Tm+1
Σ∪Υ1 ,x by
 T
` (u)
if u 6∈ dom(≺Tm+1 ),



(c, d, e , . . . , e ) if u ∈ dom(≺T ), c = `T (u),
1
n
m+1
`T̂ (u) =
ym (T ) T
T
T

d
=
`
(f

ym (u)), |Cm (u)| = n, u Cm,i wi ,


ei = `ym (T ) (fyTm (wi )) for i = 1, . . . , n.
(20)
44
Makoto Kanazawa
Then for all T ∈ Locm+1 (A, Z, I), we have T̂ ∈ M , π1 (T̂ ) = T , and ym (T ) =
ym (T̂ ).
We claim that
if T1 ∈ Locm+1 (A1 , Z1 , I1 ) and T = π1 (T1 ), then
T ∈ Locm+1 (A, Z, I) and T1 = T̂ .
(21)
The definition of A1 , Z1 , I1 easily implies π1 (Locm+1 (A1 , Z1 , I1 )) ⊆
Locm+1 (A, Z, I). Suppose T1 ∈ Locm+1 (A1 , Z1 , I1 ) and let T = π1 (T1 ). Clearly,
we have ym (T ) = ym (T1 ) and fyTm1 = fyTm . To prove T1 = T̂ , all we need
to show is that (20) holds with T1 in place of T̂ . If u 6∈ dom(≺Tm+1 ), then
`T1 (u) ∈ Z1 = Z ⊆ Σ ∪ {x}, so `T (u) = π1 (`T1 (u)) = `T1 (u). Suppose
T
T
1
u ∈ dom(≺Tm+1 ), |Cm
(u)| = n, and for i = 1, . . . , n, u JT
m+1,i vi , u Cm,i wi .
T1
T1
T1
Since T1 is well-labeled, Cm+1
(u) ∈ Tm
Σ∪Υ1 (n). Since (` (u), Cm+1 (u)) ∈ I1 ,
T1
` (u) = (c, d, e1 , . . . , en ) for some c ∈ Υ − Σ and d, e1 , . . . , en ∈ Σ. By (19), d =
†
π2 (`T1 (u)) = π2 (`T1 (u)) = `ym (T1 ) (fyTm1 (u)) = `ym (T ) (fyTm (u)). By (17), we have
†
†
†
ei = π2 (`T1 (vi )) = π2 (`T1 (wi )). By (19) again, π2 (`T1 (wi )) = `ym (T1 ) (fyTm1 (wi )),
so ei = `ym (T1 ) (fyTm1 (wi )) = `ym (T ) (fyTm (wi )). We have shown that (20) holds
with T1 in place of T̂ . This proves (21).
We next prove that the following equivalence holds for all T ∈ L =
Locm+1 (A, Z, I):
ym (T ) ∈ L0 if and only if T̂ ∈ Locm+1 (A1 , Z1 , I1 ).
(22)
It easily follows from (21) and (22) that L ∩ L0 = ym (Locm+1 (A1 , Z1 , I1 )) and
hence L ∩ L0 is simple context-free.
We prove (22). Let T ∈ Locm+1 (A, Z, I) and V = ym (T ) = ym (T̂ ). Since
†
T̂ ∈ M , the relabeling T̂ † = (T, `T̂ ) of T̂ given by (14) is defined. From the
way T̂ is defined, it is clear that for all u ∈ T , we have
†
π2 (`T̂ (u)) = `ym (T ) (fyTm (u)).
(23)
For the “only if” direction of (22), suppose V ∈ L0 . We show that T̂ satisfies
the requirements L1–L3 for membership in Locm+1 (A1 , Z1 , I1 ).
L1. We have `T̂ (ε) ∈ A1 if and only if either ε 6∈ dom(≺Tm+1 ) and `T (ε) =
V
` (ε) ∈ A ∩ A0 ∩ Z 0 or ε ∈ dom(≺Tm+1 ), `T (ε) ∈ A, and `ym (T ) (fyTm (ε)) =
`V (ε) ∈ A0 . Since T ∈ Locm+1 (A, Z, I) and V ∈ L0 = Locm (A0 , Z 0 , I 0 ), clearly
one of these conditions holds.
L2. Suppose u ∈ T − dom(≺Tm+1 ). Then `T̂ (u) = `T (u) ∈ Z = Z1 .
L3. Suppose u ∈ dom(≺Tm+1 ). Let (c, d, e1 , . . . , en ) = `T̂ (u) and
T̂
T̂ †
W = Cm+1
(u)[(c, e1 ), . . . , (c, en )] = Cm+1
(u). We need to show W ∈
Locm (A00d , Z 00 , I 00 ).
†
– We show `W (ε) ∈ A00d . By (23), π2 (`W (ε)) = π2 (`T̂ (u · (m + 1)) =
`ym (T ) (fyTm (u · (m + 1))) = `ym (T ) (fyTm (u)) = d. So `W (ε) ∈ A00d .
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
45
W
00
– Let w ∈ W − dom(≺W
m ). We show ` (w) ∈ Z . Note that since w 6∈
W
T
dom(≺m ), we have u · (m + 1) · w 6∈ dom(≺m ).
Case 1. u · (m + 1) · w ∈ dom(≺Tm+1 ). Then `T̂ (u · (m + 1) · w) 6= x and
T
`W (w) = `T̂ (u · (m + 1) · w). Since Cm
(u · (m + 1) · w) = ∅, `W (w) =
T̂
0 0
` (u · (m + 1) · w) is of the form (c , d ) and hence is in Z 00 .
Case 2. u · (m + 1) · w 6∈ dom(≺Tm+1 ). In this case, either `W (w) = `T̂ (u · (m +
1) · w) = `T (u · (m + 1) · w) = c0 for some c0 ∈ Σ or `T̂ (u · (m + 1) · w) = x and
`W (w) = (c, ei ) for some i. In the latter case, `W (w) ∈ Z 00 by the definition
T
V
of Z 00 . In the former case, Cm
(u · (m + 1) · w) = Cm
(fyVm (f · (m + 1) · w)),
T
T
and since u · (m + 1) · w 6∈ dom(≺m ), fym (u · (m + 1) · w) 6∈ dom(≺Vm ). Since
V ∈ L0 , `W (w) = `T (u · (m + 1) · w) = `V (fyTm (u · (m + 1) · w)) ∈ Z 0 ⊆ Z 00 .
T
T̂
– Let w ∈ dom(≺W
m ). Then u · (m + 1) · w ∈ dom(≺m ) and ` (u · (m + 1) · w) 6=
W
x, so `W (w) = `T̂ (u · (m + 1) · w). We show (`W (w), Cm
(w)) ∈ I 00 . Let
W
W
p = |Cm
(w)| and for i = 1, . . . , p, let wi be the i-th node of Cm
(w).
T
W
Case 1. u · (m + 1) · w 6∈ dom(≺m+1 ). Then ` (w) ∈ Σ and `W (w) =
W
(w) =
`ym (T̂ ) (fyT̂m (u · (m + 1) · w)) = `V (fyTm (u · (m + 1) · w)). Also, Cm
T
V
T
Cm (u · (m + 1) · w) = Cm (fym (u · (m + 1) · w)). For i = 1, . . . , p, π2 (`W (w ·
†
m · wi )) = π2 (`T̂ (u · (m + 1) · w · m · wi )) = `V (fyTm (u · (m + 1) · w · m · wi ))
by (23). Since u · (m + 1) · w 6∈ dom(≺Tm+1 ), fyTm (u · (m + 1) · w · m · wi ) =
V
W
(fyTm (u · (m + 1) · w)).
(w)) = Cm
fyTm (u · (m + 1) · w) · m · wi . Hence π2 (Cm
m
0
0
0 0
W
W
Since V ∈ L = Loc (A , Z , I ), (` (w), π2 (Cm (w))) = (`V (fyTm (u · (m +
V
W
1) · w)), Cm
(fyTm (u · (m + 1) · w))) ∈ I 0 . Therefore, (`W (w), Cm
(w)) ∈ I 00 .
Case 2. u · (m + 1) · w ∈ dom(≺Tm+1 ). Then by the definition of T̂ , `W (w) =
`T̂ (u · (m + 1) · w) = (c0 , d0 , e01 , . . . , e0p ), where for i = 1, . . . , p, e0i = `V (fyTm (u ·
W
†
(m + 1) · w · m · wi )). We have `Cm (w) (wi ) = `W (w · m · wi ) = `T̂ (u · (m + 1) ·
W
w ·m·wi ), and by (23), π2 (`Cm (w) (wi )) = `V (fyTm (u·(m+1)·w ·m·wi )) = e0i ,
W
(w)) ∈ I 00 .
so (`W (w), Cm
For the “if” direction of (22), suppose T̂ ∈ Locm+1 (A1 , Z1 , I1 ). We show that
ym (T ) = V satisfies the requirements L1–L3 for membership in Locm (A0 , Z 0 , I 0 ).
For each v ∈ V , let h(v) be the unique node in T − dom(≺Tm+1 ) such that
`T (h(v)) 6= x and fyTm (h(v)) = v. We have `T (h(v)) = `T̂ (h(v)) = `V (v) ∈ Σ
V
T
(v).
(h(v)) = Cm
and Cm
T̂
L1. Since ` (ε) ∈ A1 , either ε 6∈ dom(≺Tm+1 ) and `T (ε) = `V (ε) ∈ A∩A0 ∩Z 0 ,
or ε ∈ dom(≺Tm+1 ), `T (ε) ∈ A, and `V (fyTm (ε)) = `V (ε) ∈ A0 . So in either case,
we have `V (ε) ∈ A0 .
L2. Let v ∈ V − dom(≺Vm ). If h(v) = ε, then ε 6∈ dom(≺Tm+1 ) and
`T̂ (ε) = `V (v) ∈ A ∩ A0 ∩ Z 0 , so `V (v) ∈ Z 0 . If h(v) 6= ε, let u ∈ T be
such that u CTm+1 h(v) = u · (m + 1) · w. Let `T̂ (u) = (c, d, e1 , . . . , en ) and
T̂
V
T
W
W = Cm+1
(u)[(c, e1 ), . . . , (c, en )]. Since ∅ = Cm
(v) = Cm
(h(v)) = Cm
(w), w 6∈
m
W
dom(≺m ). Since ((c, d, e1 , . . . , en ), W ) ∈ I1 , we have W ∈ Loc (A00d , Z 00 , I 00 ),
46
Makoto Kanazawa
and hence `V (v) = `T̂ (h(v)) = `W (w) ∈ Z 00 . Since `V (v) ∈ Σ, it follows that
`V (v) ∈ Z 0 .
V
T
L3. Let v ∈ dom(≺Vm ). Since Cm
(v) = Cm
(h(v)) 6= ∅, h(v) 6= ε. Let u ∈ T
T
be such that u Cm+1 = h(v) = u · (m + 1) · w. Let `T̂ (u) = (c, d, e1 , . . . , en ).
T̂
V
Then W = Cm+1
(u)[(c, e1 ), . . . , (c, en )] ∈ Locm (A00d , Z 00 , I 00 ). We have Cm
(v) =
T
W
V
Cm (h(v)) = Cm (w). Let p = |Cm (v)| and for i = 1, . . . , p, let vi be such that
W
v CVm,i v · m · vi . We must have (`W (w), Cm
(w)) ∈ I 00 , and since `W (w) =
V
W
W
0
` (v) ∈ Σ, we get (` (w), π2 (Cm (w))) ∈ I . By (23),
†
π2 (`W (w · m · vi )) = π2 (`T̂ (u · (m + 1) · w · m · vi ))
= `V (fyTm (u · (m + 1) · w · m · vi ))
= `V (fyTm (h(v) · m · vi ))
= `V (fyTm (h(v)) · m · vi )
= `V (v · m · vi ).
V
W
So (`V (v), Cm
(v)) = (`W (w), π2 (Cm
(w))) ∈ I 0 .
We have established (22). This concludes the proof.
t
u
m
Clearly, T ∈ DT m
Σ implies delm,Σ−Σ
e (T ) ∈ TΣ . We obtain the following
generalization of the Chomsky-Schützenberger theorem:
Theorem 33. Let L ⊆ Tm
Σ . The following are equivalent:
(i) L is simple context-free.
(ii) There exist finite alphabets Υ, Υ 0 , a projection π : Υ 0 → Υ , and a local set
R ⊆ Tm0 such that L = delm,Υe−Υ (e
π (R ∩ DT m
Υ 0 )).
Υe
Proof. (ii) ⇒ (i). Suppose R ⊆ Tm0 is a local set. Clearly, R ⊆ Tm0
for some
Υe
Υe q ,p
m
m
m
p, q. So R ∩ DT Υ 0 = R ∩ DT Υ 0 ∩ Tm0 . By Lemma 30, DT Υ 0 ∩ Tm0 is simple
Υe q ,p
Υe q ,p
context-free. It then follows by Lemma 32 that L = delm,Υe−Υ (e
π (R ∩ DT m
Υ 0 )) =
m
m
delm,Υe−Υ (e
π (R ∩ DT Υ 0 ∩ T 0 )) is simple context-free.
Υe q ,p
(i) ⇒ (ii). Let K ⊆ Tm+1
Υ,x be a local set such that L = ym (K). By Lemma 29,
there exist a projection π : Υ 0 → Υ , and a local set R ⊆ Tm0 such that encm (K) =
Υe
π
e(R ∩ DT m
Υ 0 ). So
L = ym (K)
= delm,Υe−Υ (encm (K))
= delm,Υe−Υ (e
π (R ∩ DT m
Υ 0 )).
t
u
As was the case with the original Chomsky-Schützenberger Theorem, in the
proof of Theorem 33, enc−1
e is a bijection from R ∩ DT m
m ◦π
Υ 0 to K. (See the
second statement in Lemma 29.)
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
9
47
A Chomsky-Schützenberger-Weir Representation
Theorem for Simple Context-Free Tree Grammars
We are now going to use Theorem 33 to obtain a generalization of Weir’s representation theorem about the string languages of tree-adjoining grammars to the
string languages of simple context-free tree grammars. First, we prove a lemma
that generally holds of m-dimensional Dyck tree languages.
The following lemma is straightforward.
Lemma 34. Let Σ 0 be a finite alphabet and π : Σ 0 → Σ be a projection. If L is
−1
a super-local subset of Tm
(L) is a super-local subset of Tm
Σ , then π
Σ0 .
Lemma 35. Let m ≥ 2. For any local set L ⊆ Tm , there exist a finite alphabet
e
Σ
Σ 0 , a degree-bounded, super-local L0 ⊆ Tm0 , and a projection π : Σ 0 → Σ that
e
Σ
satisfy the following conditions:
(i) π
e(L0 ) ⊆ L.
(ii) L ∩ DT m
e(L0 ∩ DT m
e maps L0 ∩ DT m
Σ = π
Σ 0 ). Moreover, π
Σ 0 bijectively to
m
L ∩ DT Σ .
Proof. By Lemma 10, there are a finite alphabet Σ1 , a super-local subset L1
e
of Tm
Σ1 , and a projection π1 : Σ1 → Σ such that π1 maps L1 bijectively to L.
−1
m
Since π
f1 (L ∩ DT Σ ) is not a subset of an m-dimensional Dyck tree language,
−1
we have to relabel some nodes of π
f1 (T ) for T ∈ L ∩ DT m
Σ to get a set of the
m
0
form L ∩ DT Σ 0 .
For d ∈ Σ, P a finite prefix-closed subset of Pm−1 , and i ∈ [0, |P |], let
∆d,P,i = { δ ∈ Σ1 | π1 (δ) = (d, P, i) }.
Define
Σ2 = Σ1 −
[
∆d,P,i ,
d,P,i
∆d,P = { (δ0 , δ1 , . . . , δ|P | ) | δi ∈ ∆d,P,i for i = 1, . . . , |P | },
[
∆=
∆d,P ,
d,P
0
Σ = Σ2 ∪ ∆.
Note that Σ 0 is a finite alphabet. Define a projection π : Σ 0 → Σ by
π(c) = π1 (c)
if c ∈ Σ2 ,
π(δ) = d
if δ ∈ ∆d,P .
f0 to m-dimensional trees over Σ.
e Let
Then π
e maps m-dimensional trees over Σ
f0 | δ ∈ ∆d,P , 0 ≤ i ≤ |P | },
∆0 = { (δ, P, i) ∈ Σ
48
Makoto Kanazawa
and define a projection π2 : Σ2 ∪ ∆0 → Σ1 by
π2 (c) = c
π2 (((δ0 , δ1 , . . . , δ|P | ), P, i)) = δi
if c ∈ Σ2 ,
if (δ0 , δ1 , . . . , δk ) ∈ ∆d,P and 0 ≤ i ≤ |P |.
Then for T ∈ Tm
Σ2 ∪∆0 ,
π
e(T ) = π1 (π2 (T )).
Let
L0 = π
e−1 (L) ∩ Tm
Σ2 ∪∆0 .
Then
L0 = π2−1 (π1−1 (L))
= π2−1 (L1 ).
m
By Lemma 34, L0 is a super-local subset of Tm
.
Σ2 ∪∆0 and hence of TΣ
e0
m
m
0
Clearly, π
e(L ) ⊆ L, so (i) holds. Since π
e(DT Σ 0 ) ⊆ DT Σ always holds for any
m
projection π : Σ 0 → Σ, we also have π
e(L0 ∩ DT m
Σ 0 ) ⊆ L ∩ DT Σ .
m
It remains to show that for each T ∈ L ∩ DT Σ , there is a unique T 0 ∈
0
L ∩ DT m
e(T 0 ) = T .
Σ 0 such that π
m
Let T ∈ L ∩ DT Σ . We relabel the nodes of T to turn it into a T̂ ∈ DT m
Σ0 .
Recall that π1 maps L1 bijectively to L, so we have T1 = π1−1 (T ) ∈ L1 . Let
V̂
V = (V, `V ) = enc−1
m (T ). Define V̂ = (V, ` ) by

V

`T1 (fenc
(v)) if v ∈ V − dom(≺Vm+1 ) and `V (v) 6= x,

m




if `V (v) = x,
x
V̂
V
` (v) = (δ0 , δ1 , . . . , δk ) if v ∈ dom(≺Vm+1 ), |Cm
(v)| = k,


V
T
1

(v)),
and
(f
δ
=
`

0
encm



V
T1
v JV
m+1,i vi , δi = ` (fencm (vi )) for i = 1, . . . , k.
Let
T̂ = encm (V̂ ).
If v ∈ V − dom(≺Vm+1 ) and `V̂ (v) 6= x, it is easy to see that `V̂ (v) =
V
V
(v), k = |P |, and
`T̂ (fenc
(v)) ∈ Σ2 ⊆ Σ 0 . Let v ∈ dom(≺Vm+1 ), P = Cm
m
V
V̂
v Jm+1,i vi for i = 1, . . . , k. Let (δ0 , δ1 , . . . , δk ) = ` (v) and d = `V (v). Then
V
π1 (δ0 ) = `T (fenc
(v)) = (d, P, 0)
m
and for i = 1, . . . , k,
V
π1 (δi ) = `T (fenc
(vi )) = (d, P, i).
m
This implies that (δ0 , δ1 , . . . , δk ) = `V̂ (v) ∈ ∆d,P ⊆ ∆ ⊆ Σ 0 . We have
V
V
`T̂ (fenc
(v)) = ((δ0 , δ1 , . . . , δk ), P, 0) ∈ ∆0 and for i = 1, . . . , k, `T̂ (fenc
(vi )) =
m
m
m
m+1
0
m
((δ0 , δ1 , . . . , δk ), P, i) ∈ ∆ . Therefore, V̂ ∈ TΣ 0 ,x and T̂ ∈ DT Σ 0 ∩ TΣ2 ∪∆0 . It
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
49
is also easy to see that π2 (T̂ ) = T1 , so T̂ ∈ π2−1 (L1 ) = L0 . We have shown
T̂ ∈ L0 ∩ DT m
e(T̂ ) = π1 (π2 (T̂ )) = π1 (T1 ) = T .
Σ 0 . Since π2 (T̂ ) = T1 , π
Now to show uniqueness, suppose T 0 ∈ L0 ∩ DT m
e(T 0 ). Let
Σ 0 and T = π
0
−1
0
0
0
T1 = π1 (T ). Then we have T1 = π2 (T ). We prove T = T̂ . Let V = (V, `V ) =
0
V
0
enc−1
m (T ) and V = (V, ` ) = π(V ). Then it is clear that encm (V )0 = T . So it
0
suffices to prove V = V̂ . Let v ∈ V . If `V (v) = x, then clearly, `V (v) = x =
0
0
V0
`V̂ (v). If v ∈ V − dom(≺Vm+1 ) and `V (v) 6= x, then `V (v) = `T (fenc
(v)) ∈
m
0
0
m+1
0
0
0
m
V
T
V0
Σ2 , since V ∈ TΣ 0 ,x and T ∈ L ⊆ TΣ2 ∪∆0 . So ` (v) = ` (fencm (v)) =
0
0
V
V
V
π2 (`T (fenc
(v))) = `T1 (fenc
(v)) = `V̂ (v). If v ∈ dom(≺Vm+1 ) and Cm
(v) = P ,
m
m
T0
V0
V0
0
V0
then ` (fencm (v)) = (` (v), P, 0) ∈ ∆ , so ` (v) = (δ0 , δ1 , . . . , δk ), where
k = |P | and (δ0 , δ1 , . . . , δk ) ∈ ∆d,P for some d. For i = 1, . . . , k, let vi be such
0
V
that v JV
m+1,i vi , or, equivalently, v Jm+1,i vi . Then for i = 1, . . . , k, δi =
0
0
V
V
π2 ((δ0 , δ1 , . . . , δk ), P, i) = π2 (`T (fenc
(vi ))) = `T1 (fenc
(vi )). We also have
m
m
0
0
T
V
T1
V
δ0 = π2 ((δ0 , δ1 , . . . , δk ), P, 0) = π2 (` (fencm (v))) = ` (fencm (v)). So `V̂ (v) =
0
(δ0 , δ1 , . . . , δk ) = `V (v).
t
u
Next we prove a lemma about DT 2Υ . Recall that there was an implicit dependence on the dimension m in the definition of Υe, which is the alphabet
e
of the language DT m
Υ ; when a symbol of the form (c, P, i) is in Υ , it is assumed that P is a finite, possibly empty, prefix-closed subset of Pm−1 . In what
follows, we assume that the alphabet Υe is defined from Υ with respect to dimension m = 2, so that (c, P, i) ∈ Υe implies P = {ε, 1, . . . , 1k−1 } for some
k ≥ 0. We abbreviate (c, P, i) by (c, k, i), where |P | = k. Under this convention,
Υeq = Υ ∪ { (c, k, i) | c ∈ Υ, 0 ≤ k ≤ q, 0 ≤ i ≤ k }.
Recall that for any alphabet Σ, the alphabet ΓΣ consists of symbols of the
form [c or ]c with c ∈ Σ.
We let TΥ,Υe−Υ stand for the set of trees T ∈ TΥe such that for every v ∈ T ,
T
` (v) ∈ Υe − Υ implies |C2T (v)| = 1. (In other words, Υe − Υ is regarded as a
ranked alphabet all of whose symbols have rank 1.)
Lemma 36. Let η : Γ ∗ → Γ ∗ be the alphabetic homomorphism defined as fole
e
Υ
Υ
lows:
η([c ) = ε,
η(]c ) = ε
for c ∈ Υ ,
η([(c,k,0) ) = [(c,k,0) ,
η(](c,k,0) ) = ](c,k,k) ,
η([(c,k,i) ) = ](c,k,i−1) ,
η(](c,k,i) ) = [(c,k,i)
for 1 ≤ i ≤ k.
Then
DT 2Υ = TΥ,Υe−Υ ∩ enc−1 (η −1 (DΥe)).
(Here, enc is the standard encoding function defined on ordinary 2-dimensional
trees.)
50
Makoto Kanazawa
Proof. (⊆). Suppose T ∈ DT 2Υ . Clearly, T ∈ TΥ,Υe−Υ , so it suffices to prove
η(enc(T )) ∈ DΥe. This is proved by induction on the length of the reduc∗
tion T
T 0 ∈ TΥ . If T ∈ TΥ , then clearly, η(enc(T )) = ε ∈ DΥe.
∗
Suppose T
T 00
T 0 ∈ TΥ . Then T = U [(c, k, 0) −2 T0 [(c, k, 1) −2
T1 , . . . , (c, k, k) −2 Tk ]] and T 0 = U [T0 [T1 , . . . , Tk ]] for some c ∈ Υ , k ≥ 0,
U ∈ TΥe(1), T0 ∈ TΥ (k), and Ti ∈ TΥe (i = 1, . . . , k). Then η(enc(T )) =
z1 [(c,k,0) ](c,k,0) y1 [(c,k,1) ](c,k,1) y2 . . . yk [(c,k,k) ](c,k,k) z2 and η(enc(T 00 )) =
z1 y1 y2 . . . yk z2 for some z1 , z2 , y1 , y2 , . . . , yk ∈ (ΓΥe−Υ )∗ . By induction hypothesis, η(enc(T 00 )) ∈ DΥe, and this easily implies η(enc(T )) ∈ DΥe.
(⊇). Suppose T ∈ TΥ,Υe−Υ and η(enc(T )) ∈ DΥe. If T ∈ TΥ , then T ∈ DT 2Υ .
Suppose that T 6∈ TΥ . First, we claim that T must have a node labeled by
(c, k, 0) for some c ∈ Υ and k ≥ 0. For, if T has a node v such that `T (v) =
(c, k, i) for some c ∈ Υ , k ≥ 1, and i ≥ 1, then enc(T ) = x1 [(c,k,i) x2 ](c,k,i) x3
and η(enc(T )) = η(x1 ) ](c,k,i−1) η(x2 ) [(c,k,i) η(x3 ) for some x1 , x2 , x3 ∈ Γ ∗ .
e
Υ
Since η(enc(T )) ∈ DΥe, [(c,k,i−1) must occur in η(x1 ). If i − 1 = 0, this implies
that [(c,k,0) occurs in x1 and it follows that T has a node labeled by (c, k, 0).
Otherwise, ](c,k,i−1) occurs in x1 , and it follows that T has a node labeled by
(c, k, i − 1). Repeating this reasoning, we see that T must have a node labeled
by (c, k, 0).
We show that T ∈ DT 2Υ by induction on the number of nodes of T that
are labeled by a symbol of the form (c, k, 0). Let v be a node of T labeled by
+
(c, k, 0) such that no node v 0 with v (CT2 ) v 0 is labeled by a symbol of the
form (d, l, 0). Then enc(T ) = x1 [(c,k,0) y ](c,k,0) x2 for some x1 , x2 ∈ Γ ∗ and
e
Υ
y ∈ enc(T2
). (Note that |C2T (v)| = 1.)
e−Υ
Υ,Υ
Case 1. k = 0. We show y ∈ DΥ0 , i.e., the subtree T0 of T rooted at v ·
2 is in TΥ . Suppose otherwise, and take the alphabetically first node of T0
labeled by some (d, l, j) ∈ Υe − Υ . Then y = y 0 [(d,l,j) y 00 for some y 0 ∈ ΓΥ∗ and
y 00 ∈ Γ ∗ . By our assumption about v, j ≥ 1 and l ≥ 1. We have η(enc(T )) =
e
Υ
η(x1 ) [(c,0,0) η(y) ](c,0,0) η(x2 ) = η(x1 ) [(c,0,0) ](d,l,j−1) η(y 00 ) ](c,0,0) η(x2 ), which
contradicts the assumption that η(enc(T )) ∈ DΥe. So T0 is in TΥ , and we can
write T = U [(c, 0, 0) −2 T0 ]. So T
U [T0 ] ∈ TΥ,Υe−Υ . We have η(enc(T )) =
η(x1 ) [(c,0,0) ](c,0,0) η(x2 ) and η(enc(U [T0 ])) = η(x1 )η(x2 ). Since η(enc(T )) ∈
DΥe, η(x1 )η(x2 ) ∈ DΥe as well, and U [T0 ] ∈ DT 2Υ by induction hypothesis. Since
T
U [T0 ], we conclude T ∈ DT 2Υ .
Case 2. k ≥ 1. We show
y = z0 [(c,k,1) y1 ](c,k,1) z1 [(c,k,2) y2 ](c,k,2) . . . zk−1 [(c,k,k) yk ](c,k,k) zk
for some z0 , z1 , . . . , zk ∈ ΓΥ∗ and y1 , y2 , . . . , yk ∈ enc(TΥ,Υe−Υ ). First, we show by
induction that the following condition holds for i = 0, . . . , k:
y has a prefix of the form z0 [(c,k,1) y1 ](c,k,1) . . . zi−1 [(c,k,i) yi ](c,k,i) .
(24)
The case of i = 0 is trivial. Suppose we have shown (24) for i < k, i.e.,
y = z0 [(c,k,1) y1 ](c,k,1) . . . zi−1 [(c,k,i) yi ](c,k,i) y 0 with z0 , . . . , zi−1 ∈ ΓΥ∗ and
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
51
y1 , . . . , yi ∈ enc(TΥ,Υe−Υ ). Since
η(enc(T )) = η(x1 ) [(c,k,0) η(y) ](c,k,k) η(x2 )
= η(x1 ) [(c,k,0) ](c,k,0) η(y1 ) [(c,k,1) . . . ](c,k,i−1) η(yi ) [(c,k,i) η(y 0 ) ](c,k,k) ,
we must have η(y 0 ) 6= ε. Let zi be the longest prefix of y 0 in ΓΥ∗ . Since
z0 , . . . , zi−1 ∈ ΓΥ∗ and y1 , . . . , yi and y are all in enc(TΥ,Υe−Υ ), it is easy to
see that y 0 = zi [(d,l,j) yi+1 ](d,l,j) y 00 for some yi+1 ∈ enc(TΥ,Υe−Υ ) and l, j ≥ 1.
We have η(y 0 ) = ](d,l,j−1) η(yi+1 ) [(d,l,j) η(y 00 ), and so
η(enc(T )) =
η(x1 ) [(c,k,0) ](c,k,0) η(y1 ) [(c,k,1) . . . ](c,k,i−1) η(yi ) [(c,k,i) ](d,l,j−1) η(yi+1 ) [(d,l,j) η(y 00 ) ](c,k,k) ,
which implies d = c, l = k, and j = i + 1. This shows that (24) holds with i + 1
in place of i. By induction, (24) holds with i = k.
We have
y = z0 [(c,k,1) y1 ](c,k,1) . . . zk−1 [(c,k,k) yk ](c,k,k) y 0
with z0 , . . . , zk−1 ∈ ΓΥ∗ and y1 , . . . , yk ∈ enc(TΥ,Υe−Υ ). We show that y 0 ∈ ΓΥ∗ .
Suppose otherwise. Then we must have y 0 = zk [(d,l,j) yk+1 ](d,l,j) y 00 for some
d ∈ Υ , j, l ≥ 1, zk ∈ ΓΥ∗ , and yk+1 ∈ enc(TΥ,Υe−Υ ). Then η(enc(T )) contains as
a substring [(c,k,k) ](d,l,j−1) , which is a contradiction since η(enc(T )) ∈ DΥe and
j − 1 < l. Therefore,
y = z0 [(c,k,1) y1 ](c,k,1) . . . zk−1 [(c,k,k) yk ](c,k,k) zk
with z0 , . . . , zk ∈ ΓΥ∗ and y1 , . . . , yk ∈ enc(TΥ,Υe−Υ ). This means that T is of the
form
T = U [(c, k, 0) −2 T0 [(c, k, 1) −2 T1 , . . . , (c, k, k) −2 Tk ]],
where T0 ∈ TΥ (k). So
T
T 0 = U [T0 [T1 , . . . , Tk ]].
Clearly, T 0 ∈ TΥ,Υe−Υ , and
η(enc(T 0 )) = η(x1 )η(y1 ) . . . η(yk )η(x2 ).
Since
η(enc(T )) =
η(x1 ) [(c,k,0) ](c,k,0) η(y1 ) [(c,k,1) ](c,k,1) η(y2 ) . . . [(c,k,k−1) ](c,k,k−1) η(yk ) [(c,k,k) ](c,k,k) η(x2 )
is in DΥe, it follows that η(enc(T 0 )) is in DΥe as well, and the induction hypothesis
gives T 0 ∈ DT 2Υ . Since T
T 0 , we conclude T ∈ DT 2Υ .
t
u
52
Makoto Kanazawa
Lemma 37. If L ⊆ T3Σ,x is a local set, then there exist a finite alphabet Υ , a
projection π : Υ → Σ, and a local set R ⊆ Γ + such that
e
Υ
b
enc(enc2 (L)) = π
e(R ∩ DΥe ∩ η −1 (DΥe)),
where η is the alphabetic homomorphism defined in Lemma 36. Moreover,
−1 b
enc−1
◦π
e maps R ∩ DΥe ∩ η −1 (DΥe) bijectively to L.
2 ◦ enc
Proof. By Lemma 29, there exist a finite alphabet Υ1 , a projection π1 : Υ1 → Σ,
f1 is a bijection
f1 (L1 ∩DT 2Υ1 ) and π
and a local set L1 ⊆ T2 such that enc2 (L) = π
Υe1
2
from L1 ∩ DT Υ1 to enc2 (L). We may assume L1 ⊆ TΥ ,Υe −Υ . By Lemma 35,
1
1
1
there exist a finite alphabet Υ2 , a projection π2 : Υ2 → Υ1 , and a super-local
set L2 ⊆ T2 such that π
f2 (L2 ) ⊆ L1 and π
f2 (L2 ∩ DT 2Υ2 ) = L1 ∩ DT 2Υ1 . Since
Υe2
L1 ⊆ TΥ ,Υe −Υ , it follows that L2 ⊆ TΥ ,Υe −Υ . We have
1
1
1
2
2
2
enc(enc2 (L)) = enc(f
π1 (L1 ∩ DT 2Υ1 ))
= enc(f
π1 (f
π2 (L2 ∩ DT 2Υ2 )))
c
c
=π
f1 (π
f2 (enc(L2 ∩ DT 2Υ2 )))
c
c
=π
f1 (π
f2 (enc(L2 ) ∩ enc(DT 2Υ2 ))),
(25)
since enc is injective. By Lemma 36,
enc(DT 2Υ2 ) = enc(TΥ ,Υe −Υ ) ∩ η2−1 (DΥe ),
2
2
2
2
where η2 : Γ ∗ → Γ ∗ is an alphabetic homomorphism defined like η. Since L2 ⊆
Υe2
Υe2
TΥ ,Υe −Υ ,
2
2
2
enc(L2 ) ∩ enc(DT 2Υ2 ) = enc(L2 ) ∩ η2−1 (DΥe ).
2
By Lemma 2, enc(L2 ) = R2 ∩
D0
Υe2
for some local set R2 ⊆ Γ + . So we have
Υe2
enc(L2 ) ∩ enc(DT 2Υ2 ) = R2 ∩ D0e ∩ η2−1 (DΥe ).
Υ2
2
(26)
Given (25) and (26), all we need is to turn R2 ∩ D0 ∩ η2−1 (DΥe ) into the
2
Υe2
c
form π
f3 (R ∩ DΥe ∩ η −1 (DΥe)). For this, we can use a method similar to the one
we used in the proof of Lemma 6. Let
Υ = Υ2 ∪ { c̄ | c ∈ Υ2 }
and define π3 : Υ → Υ2 by
π3 (c) = c,
π3 (c̄) = c,
for each c ∈ Υ2 . Let
f2 },
∆1 = { c̄ | c ∈ Υ2 } ∪ { (c̄, P, 0) | (c, P, 0) ∈ Υ
f2 ∪ { (c̄, P, i) | (c, P, i) ∈ Υ
f2 , i ≥ 1 }.
∆2 = Υ
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
53
Then ∆1 , ∆2 is a partition of Υe. Let
∗
c
R = ({ [d | d ∈ ∆1 } Γ∆
{ ]d | d ∈ ∆1 }) ∩ π
f3
2
−1
(R2 ).
Then R is a local subset of Γ + ,22 and it is easy to see
e
Υ
c
π
f3 (R ∩ DΥe ∩ η −1 (DΥe)) = R2 ∩ D0e ∩ η2−1 (DΥe ),
Υ2
2
(27)
c
where η −1 is as defined in Lemma 36. It is also easy to see that π
f3 maps R ∩
−1
−1
0
DΥe ∩ η (DΥe) bijectively to R2 ∩ D ∩ η2 (DΥe ).
2
Υe2
We obtain the statement of the lemma from (25), (26), and (27) by letting
π = π3 ◦ π2 ◦ π1 .
t
u
1
Recall that when Σ0 and Σ1 are disjoint alphabets, TΣ
Σ0 consists of all (ordinary 2-dimensional) trees in TΣ0 ∪Σ1 that are disjointly labeled with Σ0 , Σ1 .
Lemma 38. If L ⊆ T3Σ,x is a local set, then there exist an alphabet Σ 0 disjoint
from Σ, a projection π : Σ ∪ Σ 0 → Σ, and a local set L0 ⊆ T3Σ∪Σ 0 ,x that satisfy
the following conditions:
(i)
(ii)
(iii)
(iv)
(v)
π(c) = c for all c ∈ Σ.
L = π(L0 ). Moreover, π maps L0 bijectively to L.
0
y2 (L0 ) ⊆ TΣ
Σ .
y2 (L) = π(y2 (L0 )). Moreover, π maps y2 (L0 ) bijectively to y2 (L).
y(y2 (L)) = y(y2 (L0 )).
Proof. Let L = Loc3 (A, Z, I) be a local subset of T3Σ,x . Let Σ 0 = { c̄ | c ∈ Σ }.
Define a projection π : Σ ∪ Σ 0 → Σ by
π(c) = c,
π(c̄) = c,
for each c ∈ Σ. Let
Z 0 = Z ∪ { c̄ | c ∈ Z }
I 0 = { (c, T 0 ) | (c, T ) ∈ I, T ∈ T2Σ (0) } ∪ { (c̄, T 0 ) | (c, T ) ∈ I, T ∈ T2Σ (n), n ≥ 1 },
0
where for T = (T, `T ) ∈ T2Σ∪{x} , T 0 = (T, `T ) is defined by
(
c̄
if `T (v) = c ∈ Σ and v ∈ dom(≺T2 ),
` (v) = T
` (v) otherwise.
T0
Note that T 0 ∈ T2Σ∪Σ 0 ∪{x} . Define a local subset L0 of T3Σ∪Σ 0 ∪{x} by L0 =
Loc3 (A, Z 0 , I 0 ). Then it is easy to see that π and L0 satisfy the required properties.
t
u
22
Although ΓΥe is an infinite alphabet, only finitely many symbols in it appear in
−1
πb
e3 (R2 ) since R2 is local.
54
Makoto Kanazawa
Lemma 39. If L ⊆ T3Σ,x is a local set, then there exist a finite alphabet Υ , a
local set R ⊆ Γ + , and an alphabetic homomorphism h : Γ ∗ → Σ ∗ such that
e
e
Υ
Υ
y(y2 (L)) = h(R ∩ DΥe ∩ η −1 (DΥe)),
where Υe is defined with respect to dimension 2 and η is the alphabetic homomorphism defined in Lemma 36.
Proof. Let L ⊆ T3Σ,x be a local set. Applying Lemma 38 to L, we obtain a
projection π 0 : Σ ∪ Σ 0 → Σ and a local set L0 ⊆ T3Σ∪Σ 0 ,x such that π 0 maps L0
0
0
bijectively to L, y2 (L0 ) ⊆ TΣ
Σ , and y(y2 (L)) = y(y2 (L )). By Lemma 37,
b
enc(enc2 (L0 )) = π
e(R ∩ DΥe ∩ η −1 (DΥe))
for some finite set Υ , projection π : Υ → Σ ∪ Σ 0 , and local set R ⊆ Γ + . We have
e
Υ
y(y2 (L)) = y(y2 (L0 ))
= hΣ0 ,Σ1 (enc(del2,Υe−Υ (enc2 (L0 )))))
= hΣ0 ,Σ1 (hΓΥ ,Γ
Υ −Υ
e
= hΣ0 ,Σ1 (hΓΥ ,Γ
Υ −Υ
e
(enc(enc2 (L0 ))))
b
(π
e(R ∩ DΥe ∩ η −1 (DΥe)))),
so the statement of the lemma holds with h = hΣ0 ,Σ1 ◦ hΓΥ ,Γ
Υ −Υ
e
b
◦π
e.
t
u
Note that in the above proof, the set R ∩ DΥe ∩ η −1 (DΥe) is mapped bijectively
−1 b
to L by π 0 ◦ enc−1
◦π
e.
2 ◦ enc
The following lemma is analogous to the corresponding characterization of
context-free (string) languages.
Lemma 40. Let L ⊆ TΣ . Then L ∈ CFTsp (r) if and only if there exist a finite
alphabet Υ and a local set K ⊆ T3Υ,x such that
(i) L = y2 (K), and
(ii) for all T ∈ K and all v ∈ T , if v ∈ dom(≺T3 ), then |C2T (v)| ≤ r.
Lemma 41. For every L ∈ CFTsp (r), there is an L0 ∈ CFTsp (r) such that
enc(L) = y(L0 ).
Proof. Let K ⊆ T3Υ,x be a local set satisfying condition (ii) of Lemma 40. By
Lemma 38, we may assume K = Loc3 (A, Z, I) with Z ∩ { c | (c, T ) ∈ I } = ∅.
Let
A0 = A ∪ { c̄ | c ∈ A ∩ Z },
[
Z 0 = Z ∪ { {[c , ]c } | c ∈ Z },
I 0 = { (c, ϕ(T )) | (c, T ) ∈ I } ∪ { (c̄, c([c ]c )) | c ∈ A ∩ Z }.
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
55
where for each c ∈ A ∩ Z, c̄ is a new symbol, and
ϕ(x) = x,
(
c([c ]c ) if c ∈ Z,
ϕ(c) =
c
otherwise,
(
c([c ϕ(T1 ) . . . ϕ(Tn ) ]c ) if c ∈ Z,
ϕ(c(T1 . . . Tn )) =
c(ϕ(T1 ) . . . ϕ(Tn ))
otherwise.
Then it is easy to see K 0 = Loc3 (A, Z 0 , I 0 ) also satisfies condition (ii) of
Lemma 40, and we have enc(y2 (K)) = y(y2 (K 0 )). We omit the details.
t
u
Recall that Γn = {[1 , ]1 , . . . , [n , ]n } and Dn is the Dyck language over Γn ,
where [i and ]i form a matching pair of brackets for i = 1, . . . , n.
Theorem 42. Let q ≥ 1 and M ⊆ Σ ∗ . The following are equivalent:
(i) M ∈ yCFTsp (q − 1).
∗
, and an alphabetic
(ii) There exist a positive integer n, a local set R ⊆ Γqn
∗
∗
homomorphism h : Γqn → Σ such that M = h(R ∩ Dqn ∩ g −1 (Dqn )),
where g is the bijection on Γqn defined by
g([qi+1 ) = [qi+1 ,
g(]qi+1 ) = ]qi+q ,
g([qi+j ) = ]qi+j−1 ,
g(]qi+j ) = [qi+j ,
for i = 0, . . . , n − 1 and j = 2, . . . , q.
Proof. (ii) ⇒ (i). By Lemma 41 and the fact that yCFTsp (q−1) is a substitutionclosed full abstract family of languages [28], it suffices to show that there are some
L ∈ CFTsp (q − 1) and homomorphism h such that h(enc(L)) = Dqn ∩g −1 (Dqn ).
eq−1 ∪ {X0 , . . . , Xq−1 }.
Let m = 2, Σ = {c1 , . . . , cn }, r = q − 1, p = 2, and Υ = Σ
eq−1 , I ⊆ {X0 , . . . , Xq−1 } × T2 such that
Lemma 30 gives finite sets A ⊆ Υ, Z = Σ
Υ
DT 2Σ ∩ T2e
Σq−1 ,2
= y2 (Loc3 (A, Z, I)).
Let L = DT 2Σ ∩ T2
. Inspection of the proof of Lemma 30 also shows that
eq−1 ,2
Σ
3
for all T ∈ Loc (A, Z, I) and all v ∈ T , v ∈ dom(≺T3 ) implies |C2T (v)| ≤ q − 1.
So L ∈ CFTsp (q
S − 1). Let Φ = { (ci , q − 1, j) | 1 ≤ i ≤ n, 0 ≤∗ j ≤ q − 1 }.∗ We
identify ΓΦ = { {[d , ]d } | d ∈ Φ } with Γqn . Let h : (ΓΣ
eq−1 ) → (ΓΣ
eq−1 ) be
the homomorphism that erases all symbols that are not in Γqn . Our goal is to
show
h(enc(L)) = Dqn ∩ g −1 (Dqn ).
To show h(enc(L)) ⊆ Dqn ∩ g −1 (Dqn ), suppose T ∈ L. Since L ∈ T2
, it
eq−1 ,2
Σ
is clear that
h(enc(T )) ∈ Dqn .
(28)
56
Makoto Kanazawa
Since L ⊆ DT 2Σ , by Lemma 36,
η(enc(T )) ∈ DΣ
e
q−1
,
where η : Γ ∗ → Γ ∗ is as defined in Lemma 36, with Σ in place of Υ . So
e
e
Σ
Σ
h(η(enc(T )) ∈ h(DΣ
e
q−1
) = Dqn .
Note that η restricted to Γqn coincides with g. So we have
h(η(enc(T )) = η(h(enc(T )))
= g(h(enc(T ))).
This shows that
h(enc(T )) ∈ g −1 (Dqn ).
(29)
By (28) and (29), h(enc(L)) ⊆ Dqn ∩ g −1 (Dqn ).
Now we show the converse inclusion. Let s ∈ Dqn ∩ g −1 (Dqn ). Then there
is a hedge T ∈ H2Φ such that s = enc(T ). We turn T into a tree T 0 = ϕ(T ) ∈
T{c1 },Φ ∩ T2
, where symbols in Φ are assumed to have rank 1:
eq−1 ,2
Σ
ϕ((ci , q − 1, j)) = (ci , q − 1, j)(c1 ),
ϕ((ci , q − 1, j)(T1 . . . Tn )) = (ci , q − 1, j)(ϕ(T1 . . . Tn )),
ϕ(T1 . . . Tn ) = c1 (ϕ(T1 ) ϕ(T2 . . . Tn ))
where n ≥ 2.
Then s = h(enc(T 0 )). We have
η(enc(T 0 )) = g(s) ∈ Dqn ⊆ DΣ
e.
2
0
0
Since T 0 ∈ T{c1 },Φ ⊆ TΣ,Σ−Σ
e , Lemma 36 implies that T ∈ DT Σ . So T ∈ L
0
−1
and s = h(enc(T )) ∈ h(enc(L)). We conclude Dqn ∩ g (Dqn ) ⊆ h(enc(L)).
We have shown h(enc(L)) = Dqn ∩ g −1 (Dqn ).
(i) ⇒ (ii). Let L ∈ CFTsp (q − 1) be such that M = y(L). By Lemma 40,
L = y2 (K) and enc2 (K) ⊆ T2
for some finite set Ψ and some local set
eq−1
Ψ
b
K ⊆ T3Ψ,x . By Lemma 37, enc(enc2 (K)) = π
e(R0 ∩ DΥe ∩ η −1 (DΥe)) for some
0
local set R ⊆ ΓΥe and projection π : Υ → Ψ . Since enc2 (K) ⊆ T2 , it easily
eq−1
Ψ
b
follows that enc(enc2 (K)) = π
e(R00 ∩DΥe ∩η −1 (DΥe )), where R00 = R0 ∩(ΓΥe )+ ,
q
q
q−1
which is a local subset of (ΓΥe )+ . Using this in the proof of Lemma 39, we
q−1
easily obtain
M = h0 (R00 ∩ DΥe ∩ η −1 (DΥe )),
(30)
q−1
q−1
where h0 : (ΓΥe )∗ → Σ ∗ is an alphabetic homomorphism. In order to obtain
q−1
the statement of the lemma, there are three things we need to fix:
– for c ∈ Υ , η erases [c and ]c ,
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
57
– when q ≥ 2, the number of pairs of brackets in the group [c , ]c is 1 < q, and
– when k < q − 1, the number of pairs of brackets in the group
[(c,k,0) , ](c,k,0) , [(c,k,1) , ](c,k,1) , . . . , [(c,k,k) , ](c,k,k) is k + 1 < q.
We introduce the following new brackets:
[c,1 , ]c,1 , . . . , [c,q−1 , ]c,q−1 ,
[(c,k,k+1) , ](c,k,k+1) , . . . , [(c,k,q−1) , ](c,k,q−1) ,
for each c ∈ Υ and k < q − 1. We now have an alphabet Γqn consisting of n
groups of q pairs of brackets:
[c , ]c , [c,1 , ]c,1 , . . . , [c,q−1 , ]c,q−1 ,
[(c,k,0) , ](c,k,0) , . . . , [(c,k,q−1) , ](c,k,q−1) ,
∗
by
where n = |Υ | × (q + 1). Define a homomorphism ψ : (ΓΥe )∗ → Γqn
q−1
ψ([c ) = [c [c,1 ,
ψ(]c ) = ]c,1 [c,2 ]c,2 . . . [c,q−1 ]q−1 ]c ,
ψ([(c,k,0) ) = [(c,k,0) ,
ψ(](c,k,0) ) = [(c,k,k+1) ](c,k,k+1) . . . [(c,k,q−1) ](c,k,q−1) ](c,k,0) ,
ψ([(c,k,i) ) = [(c,k,i) ,
for 1 ≤ i ≤ k.
ψ(](c,k,i) ) = ](c,k,i)
∗
Now it is easy to see that ψ is injective and ψ(R00 ) is a local subset of Γqn
. Also,
∗
we can show that all x ∈ (ΓΥe ) satisfy the following properties:
q−1
x ∈ DΥe
q−1
if and only if ψ(x) ∈ Dqn ,
g(ψ(x))
∗
(31)
ψ(η(x)).
These properties combined ensure that
η(x) ∈ DΥe
q−1
if and only if g(ψ(x)) ∈ Dqn .
(32)
∗
Let χ : Γqn
→ (ΓΥe )∗ be the alphabetic homomorphism that erases all new
q−1
brackets. Clearly, χ restricted to the range of ψ is the inverse of ψ. Now observe
χ(ψ(R00 ) ∩ Dqn ∩ g −1 (Dqn )) = R00 ∩ DΥe
q−1
∩ η −1 (DΥe
q−1
).
(33)
Indeed, if x ∈ R00 , ψ(x) ∈ Dqn , and g(ψ(x)) ∈ Dqn , then χ(ψ(x)) = x ∈
R00 ∩ DΥe
by (31), and η(χ(ψ(x)) = η(x) ∈ DΥe
by (32). Conversely, if
q−1
q−1
00
x ∈ R ∩ DΥe
and η(x) ∈ DΥe , then ψ(x) ∈ ψ(R00 ) ∩ Dqn by (31) and
q−1
q−1
g(ψ(x)) ∈ Dqn by (32), and so x = χ(ψ(x)) ∈ χ(ψ(R00 ) ∩ Dqn ∩ g −1 (Dqn )).
We obtain the statement of the lemma from (30) and (33) by taking R =
ψ(R00 ) and h = h0 ◦ χ.
t
u
58
Makoto Kanazawa
As before, in the direction (i) ⇒ (ii) of the above proof, R∩Dqn ∩g −1 (Dqn ) =
ψ(R00 ) ∩ Dqn ∩ g −1 (Dqn ) stands in one-one correspondence with the local set
K ⊆ T3Ψ,x . Thus, each derivation tree T of a simple context-free tree grammar
for M is uniquely represented by an element s of R ∩ Dqn ∩ g −1 (Dqn ) such that
enc(enc2 (T )) is the image of s under a certain alphabetic homomorphism, and
vice versa. This is exactly analogous to the situation with the original ChomskySchützenberger representation theorem.
As in the case of context-free languages, we can take a fixed Dyck language
D2q , instead of Dqn with varying n, and use a rational transduction to represent
any string language that is the yield image of some L ∈ CFTsp (q − 1):23
Corollary 43. For any M ∈ yCFTsp (q − 1), there is a rational transduction τ
such that M = τ (D2q ∩ g −1 (D2q )), where g is as defined in Theorem 42.
10
Conclusion
We have generalized Weir’s [34] characterization of the string languages of treeadjoining grammars to the string languages of simple context-free tree grammars
of arbitrary fixed rank. We obtained this result via a natural generalization of
the original Chomsky-Schützenberger theorem to simple context-free tree grammars. We represented derivation trees of simple context-free tree grammars as
3-dimensional trees, and proved this latter result as a general fact about simple
context-free sets of m-dimensional trees, for arbitrary m ≥ 2. This generality is
of course an overkill for the purpose of obtaining our generalization of Weir’s
theorem, but it may be of independent interest. Moreover, all the complexity of
the general case is essentially already present in the 3-dimensional case; proving
only the special case of our lemmas that are needed for the generalization of
Weir’s theorem will not be substantially simpler.
In order to define the m-dimensional yield of an (m + 1)-dimensional tree,
we placed a restriction on the occurrences of a special label x that serve as
targets for substitution. If we wish to iterate the process of taking the yield,
i.e., if we are interested in the yield of the yield of an (m + 2)-dimensional tree,
etc., we will need more than one variable as placeholders, with each variable
providing targets for substitution at a different step of the iterative process of
taking yields. Although we did not attempt to do so in this paper, it may be
interesting to study the resulting hierarchy of classes of tree languages (and their
yield images), the first three levels of the hierarchy being the local tree languages,
23
Also, when string languages over a fixed alphabet Σ with |Σ| = k are considered,
we can use Dq(k+2) and a fixed alphabetic homomorphism h so that every M ∈
yCFTsp (q − 1) can be written as M = h(R ∩ Dq(k+2) ∩ g −1 (Dq(k+2) )) for some
regular set R. (This will require modification of the constructions used in several
lemmas.) See [36] for an analogous characterization of string languages of q-MCFGs
of rank r, and, e.g., [27] for the case of context-free languages.
Multi-dimensional Trees and a Representation Theorem for Simple CFTGs
59
the simple context-free tree languages, and the yields of the simple context-free
sets of (well-labeled) 3-dimensional trees.24
References
1. Arnold, A., Dauchet, M.: Un theoreme de Chomsky-Schützenberger pour les forets
algebriques. CALCOLO 14(2), 161–184 (1977)
2. Benoit, D., Demaine, E.D., Munro, I., Raman, R., Raman, V., Rao, S.S.: Representing trees of higher degree. Algorithmica 43, 275–292 (2005)
3. Chomsky, N., Schützeberger, M.P.: The algebraic theory of context-free languages.
In: Braffort, P., Hirschberg, D. (eds.) Computer Programming and Formal Systems,
pp. 118–161. North-Holland, Amsterdam (1963)
4. Chomsky, N.: Formal properties of grammars. In: Luce, R.D., Bush, R.R., Galanter,
E. (eds.) Handbook of Mathematical Psychology, Volume II, pp. 323–418. John
Wiley and Sons, New York (1963)
5. Comon, H., Dauchet, M., Gilleron, R., Jacquemard, F., Lugiez, D., Löding, C.,
Tison, S., Tommasi, M.: Tree Automata Techniques and Applications. Available on
http://tata.gforge.inria.fr/ (2008), release November 18, 2008
6. Eilenberg, S.: Automata, Languages, and Machines, vol. A. Academic Press, New
York (1974)
7. Engelfreit, J., Maneth, S.: Tree languages generated by context-free graph grammars. In: Ehrig, H., Engels, G., Kreowski, H.J., Rozenberg, G. (eds.) Theory and
Application of Graph Transformations: 6th International Workshop, TAGT’98. pp.
15–59. Springer, Berlin (2000)
8. Engelfriet, J., Schmidt, E.M.: IO and OI, part I. The Journal of Computer and
System Sciences 15, 328–353 (1977)
9. Fischer, M.J.: Grammars with Macro-Like Productions. Ph.D. thesis, Harvard University (1968)
10. Fujiyoshi, A., Kasai, T.: Spinal-formed context-free tree grammars. Theory of Computing Systems 33, 59–83 (2000)
11. Joshi, A.K., Schabes, Y.: Tree-adjoining grammars. In: Rozenberg, G., Salomaa, A.
(eds.) Handbook of Formal Languages, vol. 3, pp. 69–123. Springer, Berlin (1997)
12. Kanazawa, M.: The convergence of well-nested mildly context-sensitive
grammar formalisms (July 2009), an invited talk given at the 14th
Conference on Formal Grammar, Bordeaux, France. Slides available at
http://research.nii.ac.jp/˜kanazawa/.
13. Kanazawa, M.: The pumping lemma for well-nested multiple context-free languages. In: Diekert, V., Nowotka, D. (eds.) Developments in Language Theory:
13th International Conference, DLT 2009. pp. 312–325. Springer, Berlin (2009)
14. Kasprzik, A.: Making finite-state methods applicable to languages beyond contextfreeness via multi-dimensional trees. In: Piskorski, J., Vatson, B.W., Yli-Jyrä, A.
(eds.) Finite-State Methods and Natural Language Processing, pp. 98–109. IOS
Press, Amsterdam (2009)
15. Kepser, S., Mönnich, U.: Closure properties of linear context-free tree languages
with an application to optimality theory. Theoretical Computer Science 354(1),
82–97 (2006)
24
Rogers [23] looked at a connection between multi-dimensional trees (with his notion
of yield) and Weir’s [35] control language hierarchy.
60
Makoto Kanazawa
16. Kepser, S., Rogers, J.: The equivalence of tree adjoining grammars and monadic
linear context-free tree grammars. Journal of Logic, Language and Information 20,
361–384 (2011)
17. Knuth, D.E.: The Art of Computer Programming, Vol. I: Fundamental Algorithms.
Addison-Wesley, Reading, Mass. (1997)
18. Kozen, D.C.: Automata and Computability. Springer, New York (1997)
19. Matsubara, S., Kasai, T.: A characterization of TALs by the generalized Dyck
language. IEICE Transactions on Information and Systems (Japanese Edition)
J90-D, 1417–1427 (2007)
20. McNaughton, R., Papert, S.A.: Counter-Free Automata. MIT Press, Cambridge,
Mass. (1971)
21. Mönnich, U.: Adjunction as substitution: An algebraic formulation of regular,
context-free and tree adjoining languages (1997), arXiv:cmp-lg/9707012v1
22. Perrin, D.: Finite automata. In: van Leeuwen, J. (ed.) Handbook of Theoretical
Computer Science, vol. B, pp. 1–57. Elsevier, Amsterdam (1990)
23. Rogers, J.: Syntactic structures as multi-dimensional trees. Research on Language
and Computation 1, 265–305 (2003)
24. Rogers, J.: wMSO theories as grammar formalisms. Theoretical Computer Science
293, 291–320 (2003)
25. Rogers, J., Pullum, G.K.: Aural pattern recognition experiments and the subregular hierarchy. Journal of Logic, Language and Information 20, 329–342 (2011)
26. Rounds, W.: Mappings and grammars on trees. Mathematical Systems Theory 4,
257–287 (1970)
27. Salomaa, A.: Formal Languages. Academic Press, Orlando, Florida (1973)
28. Seki, H., Kato, Y.: On the generative power of multiple context-free grammars
and macro grammars. IEICE Transactions on Information and Systems E91–D,
209–221 (2008)
29. Seki, H., Matsumura, T., Fujii, M., Kasami, T.: On multiple context-free grammars.
Theoretical Computer Science 88(2), 191–229 (1991)
30. Sorokin, A.: Monoid automata for displacement context-free languages. In: ESSLLI
Student Session 2013 Preproceedings. pp. 158–167 (2013)
31. Takahashi, M.: Generalizations of regular sets and their application to a study of
context-free languages. Information and Control 27, 1–36 (1975)
32. Thatcher, J.W.: Characterizing derivation trees of context-free grammars through a
generalization of finite automata theory. Journal of Computer and System Sciences
1, 317–322 (1967)
33. Thatcher, J.W.: Generalized2 sequential machine maps. Journal of Computer and
System Sciences 24, 339–367 (1970)
34. Weir, D.J.: Characterizing Mildly Context-Sensitive Grammar Formalisms. Ph.D.
thesis, University of Pennsylvania (1988)
35. Weir, D.J.: A geometric hierarchy beyond context-free languages. Theoretical Computer Science 104(2), 235–261 (1992)
36. Yoshinaka, R., Kaji, Y., Seki, H.: Chomsky-Schützenberger-type characterization
of multiple context-free languages. In: Dediu, A.H., Fernau, H., Martı́n-Vide, C.
(eds.) Language and Automata Theory and Applications, Fourth International
Conference, LATA 2010. pp. 596–607. Springer, Berlin (2010)