ISSN 1346-5597
NII Technical Report NII-2013-003E, Nov. 2013

Multi-dimensional Trees and a Chomsky-Schützenberger-Weir Representation Theorem for Simple Context-Free Tree Grammars

Makoto Kanazawa*
National Institute of Informatics, Tokyo, Japan
[email protected]

* This work was in part supported by the Japan Society for the Promotion of Science Grant-in-Aid for Scientific Research (KAKENHI) (25330020).

Abstract. Weir [34] proved a Chomsky-Schützenberger-like representation theorem for the string languages of tree-adjoining grammars, where the Dyck language $D_n$ in the Chomsky-Schützenberger characterization is replaced by the intersection $D_{2n} \cap g^{-1}(D_{2n})$, where $g$ is a certain bijection on the alphabet consisting of $2n$ pairs of brackets. This paper presents a generalization of this theorem to the string languages that are the yield images of the tree languages generated by simple (i.e., linear and non-deleting) context-free tree grammars. This result is obtained through a natural generalization of the original Chomsky-Schützenberger theorem to the tree languages of simple context-free tree grammars. We use Rogers's [24,23] notion of multi-dimensional trees to state this latter theorem in a very general, abstract form.

Keywords: Context-free tree grammar; Multi-dimensional tree; Dyck language; Chomsky-Schützenberger theorem

1 Introduction

Weir [34] showed that every string language $L$ generated by a tree-adjoining grammar [11] can be written as
\[ L = h(R \cap D_{2n} \cap g^{-1}(D_{2n})), \]
where $h$ is a homomorphism, $R$ is a regular set, $n$ is a positive integer, $D_{2n}$ is the Dyck language over the alphabet $\Gamma_{2n}$ consisting of $2n$ pairs of brackets $[_1, ]_1, \dots, [_{2n}, ]_{2n}$, and $g$ is the bijection on $\Gamma_{2n}$ defined by
\[ g([_{2i+1}) = [_{2i+1}, \quad g(]_{2i+1}) = ]_{2i+2}, \quad g([_{2i+2}) = ]_{2i+1}, \quad g(]_{2i+2}) = [_{2i+2}, \]
for $i = 0, \dots, n-1$. The effect of the intersection with $g^{-1}(D_{2n})$ on the Dyck language $D_{2n}$ is to make the consecutive odd-numbered and even-numbered brackets $[_{2i+1}, ]_{2i+1}, [_{2i+2}, ]_{2i+2}$ always appear as a group, in the configuration
\[ [_{2i+1} \; [_{2i+2} \; ]_{2i+2} \; ]_{2i+1}. \]
When two such groups, say, $[_1, ]_1, [_2, ]_2$ and $[_3, ]_3, [_4, ]_4$, overlap, the only possible configurations are
\[ [_1 [_3 [_4 ]_4 ]_3 [_2 ]_2 ]_1, \qquad [_1 [_2 ]_2 [_3 [_4 ]_4 ]_3 ]_1, \qquad [_1 [_3 [_4 [_2 ]_2 ]_4 ]_3 ]_1, \]
and those with the positions of the two groups interchanged. As Weir [34] showed, $D_{2n} \cap g^{-1}(D_{2n})$ is a non-context-free tree-adjoining language for every $n \ge 1$.

In this paper, we prove a generalization of Weir's theorem for simple (i.e., linear and non-deleting) context-free tree grammars¹ [26,8,15]: if $L$ is the string language generated by a simple context-free tree grammar of rank $q - 1$, then $L$ can be written as
\[ L = h(R \cap D_{qn} \cap g^{-1}(D_{qn})), \]
where $h$, $R$, and $n$ are as before, $D_{qn}$ is the Dyck language over the alphabet $\Gamma_{qn}$ (containing $qn$ pairs of brackets), and $g$ is the bijection on $\Gamma_{qn}$ defined by
\[ g([_{qi+1}) = [_{qi+1}, \quad g(]_{qi+1}) = ]_{qi+q}, \quad g([_{qi+j}) = ]_{qi+j-1}, \quad g(]_{qi+j}) = [_{qi+j}, \]
for $i = 0, \dots, n-1$ and $j = 2, \dots, q$. This result generalizes Weir's [34] because tree-adjoining grammars generate the same string languages as simple context-free tree grammars that are monadic (i.e., of rank 1) [21,10,16]. As in the original Chomsky-Schützenberger theorem [4,3], we can take $R$ to be a local set and $h$ to be alphabetic in the sense that $h$ maps each symbol either to a symbol or to the empty string.

It is known [12,13] that the string languages of simple context-free tree grammars are exactly those generated by multiple context-free grammars [29] that are well-nested in the sense of [13]. For multiple context-free grammars in general, Yoshinaka et al.
[36] have proved a Chomsky-Schützenberger-like representation theorem, but the analogy to the Chomsky-Schützenberger theorem is somewhat weak because their notion of a multiple Dyck language is given only by reference to a certain multiple context-free grammar, and does not seem to have other independent characterizations, analogous to the characterization of ordinary Dyck languages in terms of the cancellation law $[_i \, ]_i \rightsquigarrow \varepsilon$.

Our result is obtained via a natural generalization of the Chomsky-Schützenberger theorem to the tree languages of simple context-free tree grammars. This intermediate result is stated in terms of Dyck tree languages, which are exactly analogous to the original Dyck languages in that they have two equivalent definitions, one in terms of inductive definitions and one in terms of rewriting with cancellation laws. In order to emphasize the analogy between the string case and the tree case, we use the notion of a multi-dimensional tree introduced by Rogers [24,23] and state many lemmas as general facts about $m$-dimensional trees. We use 3-dimensional trees to represent derivation trees of simple context-free tree grammars. We note that our notion of the yield of an $m$-dimensional tree will be different from Rogers's, because Rogers's definition was specifically tailored for the monadic case.²

¹ The term "simple context-free tree grammar" is taken from [7].

² When the present work was nearing completion, the author learned of a recent paper by Sorokin [30], in which he states (without proof) a result similar to Theorem 42 below (Theorem 3 of [30]). (The statement of his theorem is actually closer to Lemma 39 below.) As will be clear to the reader, the emphasis of the present paper is very different from Sorokin's. The merit of the present work lies not so much in Theorem 42 itself as in the method of obtaining it through a natural generalization of the constructions that can be used to prove the original Chomsky-Schützenberger theorem. (Sorokin's own emphasis is on the use of monoid automata to characterize the string languages of simple context-free tree grammars.)

2 Preliminaries

2.1 First-Child-Next-Sibling Encoding of Ordered Unranked Trees

In an ordered unranked tree, a node may have any number of children, and the children of the same node are linearly ordered. We do not consider unordered trees in this paper, so we call ordered unranked trees simply unranked trees. In the usual term notation for unranked trees [32], the unranked trees over a set $\Sigma$ of labels are defined inductively as follows:

– If $c \in \Sigma$, then $c$ is an unranked tree over $\Sigma$.
– If $t_1, \dots, t_n$ are unranked trees over $\Sigma$ ($n \ge 1$) and $c \in \Sigma$, then $c(t_1 \dots t_n)$ is an unranked tree over $\Sigma$.

There is a well-known way of encoding unranked trees into binary trees [17], often called the first-child-next-sibling encoding. We refer to a node in a binary tree by a string over the set $\{1, 2\}$. Thus, the set of nodes of a binary tree forms a prefix-closed subset $T$ of $\{1, 2\}^*$. (Note that we do not assume binary trees to be full in the sense that each node has 0 or 2 children.) We write $u \cdot v$ for the concatenation of two strings $u, v \in \{1, 2\}^*$. In the first-child-next-sibling encoding of unranked trees, the relation $u \cdot 2 = v$ represents the relation "$v$ is the first child of $u$", and the relation $u \cdot 1 = v$ represents the relation "$v$ is the next sibling of $u$". The child relation is then represented by the first-child relation composed with the reflexive transitive closure of the next-sibling relation. In this way, any non-empty finite prefix-closed subset $T$ of $\{1, 2\}^*$ such that $1 \notin T$ encodes the set of nodes of some unranked tree. In general, an arbitrary non-empty finite prefix-closed subset of $\{1, 2\}^*$ encodes the nodes of a hedge, a finite, non-empty sequence of unranked trees.³ In this encoding, $\varepsilon$ (the empty string) is the root of the first tree, $1$ is the root of the second tree, $1 \cdot 1$ is the root of the third tree, and so on.

³ Sometimes the empty sequence of unranked trees is also allowed as a hedge, but we exclude it here in order to be able to encode all hedges into binary trees. Note that Knuth [17] and Takahashi [31] used forest instead of hedge, the term we adopt here following [5].

Trees and hedges we consider in this paper are all labeled. Labeled unranked trees and hedges over $\Sigma$ are represented by pairs of the form $\boldsymbol{T} = (T, \ell)$, where $T$ is a non-empty finite prefix-closed subset of $\{1, 2\}^*$ and $\ell$ is a function from $T$ to $\Sigma$.

2.2 Dyck Languages

For $n \ge 1$, let $\Gamma_n = \bigcup_{i=1}^{n} \{[_i, ]_i\}$. For each $i$, the two symbols $[_i, ]_i$ are regarded as a matching pair of brackets. Define a binary relation $\rightsquigarrow$ on $\Gamma_n^*$ by
\[ \rightsquigarrow \; = \{\, (u \, [_i \, ]_i \, v, \; uv) \mid u, v \in \Gamma_n^*, \ 1 \le i \le n \,\}. \]
The Dyck language $D_n$ is defined by
\[ D_n = \{\, v \in \Gamma_n^* \mid v \rightsquigarrow^* \varepsilon \,\}, \]
where $\rightsquigarrow^*$ denotes the reflexive transitive closure of the relation $\rightsquigarrow$. An alternative way of defining $D_n$ is by the following context-free grammar:
\[ S \to \varepsilon \mid A S, \qquad A \to [_1 \, S \, ]_1 \mid \cdots \mid [_n \, S \, ]_n. \]
The set $D'_n$ of Dyck primes is defined by
\[ D'_n = (D_n - \{\varepsilon\}) - (D_n - \{\varepsilon\})^2. \]
Alternatively, the set $D'_n$ is defined by the nonterminal $A$ in the above context-free grammar.

Unranked trees and hedges can be represented by elements of Dyck languages. If $\Sigma$ is a set of symbols, let
\[ \Gamma_\Sigma = \bigcup_{c \in \Sigma} \{[_c, ]_c\}. \]
We write $D_\Sigma$ and $D'_\Sigma$ for the Dyck language and the set of Dyck primes over this alphabet. Using the standard term notation for labeled unranked trees, define the string encoding function $\mathrm{enc}$ from labeled unranked trees and hedges over $\Sigma$ to strings over $\Gamma_\Sigma$ by
\[ \mathrm{enc}(c) = [_c \, ]_c, \]
\[ \mathrm{enc}(c(t_1 \dots t_n)) = [_c \, \mathrm{enc}(t_1 \dots t_n) \, ]_c, \]
\[ \mathrm{enc}(t_1 \dots t_n) = \mathrm{enc}(t_1) \, \mathrm{enc}(t_2 \dots t_n) \quad \text{for } n \ge 2. \]
It is clear that the function $\mathrm{enc}$ maps any unranked tree over $\Sigma$ to an element of $D'_\Sigma$, and any hedge over $\Sigma$ to an element of $D_\Sigma - \{\varepsilon\}$.
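The encoding $\mathrm{enc}$ and the two Dyck sets can be illustrated with a minimal Python sketch. The representation is an assumption made only for this illustration: a tree is a pair `(label, children)`, a hedge is a non-empty list of trees, and a bracket is a pair `("[", c)` or `("]", c)`. The stack-based test is the usual equivalent of repeatedly cancelling adjacent matching pairs.

```python
# Illustrative sketch (not from the paper): trees are (label, [children]),
# hedges are non-empty lists of trees, brackets are ("[", c) / ("]", c).

def enc_tree(t):
    """enc(c(t1 ... tn)) = [c enc(t1 ... tn) ]c; a leaf gives [c ]c."""
    label, children = t
    return [("[", label)] + enc_hedge(children) + [("]", label)]

def enc_hedge(ts):
    """enc(t1 ... tn) = enc(t1) enc(t2 ... tn)."""
    out = []
    for t in ts:
        out.extend(enc_tree(t))
    return out

def is_dyck(s):
    """Membership in the Dyck language: every bracket is cancelled."""
    stack = []
    for kind, label in s:
        if kind == "[":
            stack.append(label)
        elif not stack or stack.pop() != label:
            return False
    return not stack

def is_dyck_prime(s):
    """Non-empty Dyck word that is not a product of two non-empty Dyck words."""
    if not s or not is_dyck(s):
        return False
    depth = 0
    for k, (kind, _) in enumerate(s):
        depth += 1 if kind == "[" else -1
        if depth == 0 and k < len(s) - 1:
            return False  # a proper prefix already closes: not a prime
    return True
```

Under these assumptions, encoding the tree $a(b\,c)$ gives a Dyck prime, while encoding the hedge $b\,c$ gives a non-empty Dyck word that is not a prime.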
Conversely, it is easy to see that any element of $D'_\Sigma$ encodes a tree over $\Sigma$, and any element of $D_\Sigma - \{\varepsilon\}$ encodes a hedge over $\Sigma$. These correspondences are bijections.

2.3 Context-Free Tree Grammars

We deviate from the standard practice and let a context-free tree grammar generate a set of unranked trees. Thus, the terminal alphabet of a context-free tree grammar will be unranked. In contrast, the set of nonterminals will be a ranked alphabet, as in the standard definition.

A ranked alphabet is a union $\Upsilon = \bigcup_{r \in \mathbb{N}} \Upsilon^{(r)}$ of disjoint sets of symbols. If $f \in \Upsilon^{(r)}$, $r$ is the rank of $f$.

If $\Sigma$ is an (unranked) alphabet and $\Upsilon$ a ranked alphabet, let $\mathbb{T}_{\Sigma,\Upsilon}$ be the set of trees $\boldsymbol{T} \in \mathbb{T}_{\Sigma \cup \Upsilon}$ such that whenever a node of $\boldsymbol{T}$ is labeled by some $f \in \Upsilon$, then the number of its children is equal to the rank of $f$. For convenience, we use the term representation of trees. The set $\mathbb{T}_{\Sigma,\Upsilon}$ can be defined inductively as follows:

1. If $f \in \Sigma \cup \Upsilon^{(0)}$, then $f \in \mathbb{T}_{\Sigma,\Upsilon}$;
2. If $f \in \Sigma \cup \Upsilon^{(n)}$ and $t_1, \dots, t_n \in \mathbb{T}_{\Sigma,\Upsilon}$ ($n \ge 1$), then $f(t_1 \dots t_n) \in \mathbb{T}_{\Sigma,\Upsilon}$.

In order to define the notion of a context-free tree grammar, we need a countably infinite supply of variables $x_1, x_2, x_3, \dots$. The set consisting of the first $n$ variables is denoted $X_n$ (i.e., $X_n = \{x_1, \dots, x_n\}$). The notation $\mathbb{T}_{\Sigma,\Upsilon}(X_n)$ denotes the set $\mathbb{T}_{\Sigma,\Upsilon \cup X_n}$, where members of $X_n$ are all assumed to have rank 0. A tree in $\mathbb{T}_{\Sigma,\Upsilon}(X_n)$ is often written $t[x_1, \dots, x_n]$, displaying the variables. If $t[x_1, \dots, x_n] \in \mathbb{T}_{\Sigma,\Upsilon}(X_n)$ and $t_1, \dots, t_n \in \mathbb{T}_{\Sigma,\Upsilon}$, then $t[t_1, \dots, t_n]$ denotes the result of substituting $t_1, \dots, t_n$ for $x_1, \dots, x_n$, respectively, in $t[x_1, \dots, x_n]$. An element $t[x_1, \dots, x_n]$ of $\mathbb{T}_{\Sigma,\Upsilon}(X_n)$ is an $n$-context if for each $i = 1, \dots, n$, $x_i$ occurs exactly once in $t[x_1, \dots, x_n]$. (In the literature, an $n$-context is sometimes called a simple tree.)
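The notions of $n$-context and of the substitution $t[t_1, \dots, t_n]$ can be made concrete with a small Python sketch. The representation is hypothetical and chosen only for illustration: the variable $x_i$ is the integer `i`, and every other tree is a pair `(label, children)`.

```python
# Illustrative sketch (representation is an assumption, not the paper's):
# the variable x_i is the int i; any other tree is (label, [children]).

def count_var(t, i):
    """Number of occurrences of the variable x_i in t."""
    if isinstance(t, int):
        return int(t == i)
    return sum(count_var(child, i) for child in t[1])

def is_n_context(t, n):
    """True if each of x_1, ..., x_n occurs exactly once in t."""
    return all(count_var(t, i) == 1 for i in range(1, n + 1))

def substitute(t, args):
    """t[t_1, ..., t_n]: replace x_i by args[i - 1] throughout t."""
    if isinstance(t, int):
        return args[t - 1]
    return (t[0], [substitute(child, args) for child in t[1]])
```

For instance, `("g", [1, 2])` stands for $g(x_1 x_2)$ and is a 2-context, whereas `("g", [1, 1])` is not, because $x_1$ occurs twice and $x_2$ not at all.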
A context-free tree grammar [26,8] is a quadruple $G = (N, \Sigma, P, S)$, where

1. $N$ is a finite ranked alphabet of nonterminals,
2. $\Sigma$ is a finite unranked alphabet of terminals,
3. $S$ is a nonterminal of rank 0, and
4. $P$ is a finite set of productions of the form
\[ B(x_1 \dots x_n) \to t[x_1, \dots, x_n], \]
where $B \in N^{(n)}$ and $t[x_1, \dots, x_n] \in \mathbb{T}_{\Sigma,N}(X_n)$.

The rank of $G$ is $\max\{\, r \mid N^{(r)} \ne \emptyset \,\}$.

For every $s, s' \in \mathbb{T}_{\Sigma,N}$, $s \Rightarrow_G s'$ is defined to hold if and only if there is a 1-context $c[x_1] \in \mathbb{T}_{\Sigma,N}(X_1)$, a production $B(x_1 \dots x_n) \to t[x_1, \dots, x_n]$ in $P$, and trees $t_1, \dots, t_n \in \mathbb{T}_{\Sigma,N}$ such that
\[ s = c[B(t_1 \dots t_n)], \qquad s' = c[t[t_1, \dots, t_n]]. \]
The relation $\Rightarrow_G^*$ on $\mathbb{T}_{\Sigma,N}$ is defined as the reflexive transitive closure of $\Rightarrow_G$. The tree language generated by a context-free tree grammar $G$, denoted by $L(G)$, is defined as follows:
\[ L(G) = \{\, t \in \mathbb{T}_\Sigma \mid S \Rightarrow_G^* t \,\}. \]
The string language generated by $G$ is
\[ y(L(G)) = \{\, y(t) \mid t \in L(G) \,\}, \]
where $y(t)$ is the yield of $t$ in the usual sense.

A context-free tree grammar $G = (N, \Sigma, P, S)$ is said to be simple if for every production $B(x_1 \dots x_n) \to t[x_1, \dots, x_n]$ in $P$, $t[x_1, \dots, x_n]$ is an $n$-context. We let $\mathrm{CFT}_{\mathrm{sp}}(r)$ stand for the family of tree languages $L$ such that $L = L(G)$ for some simple context-free tree grammar $G$ whose rank does not exceed $r$. We write $\mathrm{yCFT}_{\mathrm{sp}}(r)$ for the corresponding family of string languages $\{\, y(L) \mid L \in \mathrm{CFT}_{\mathrm{sp}}(r) \,\}$.

Let $\{y_1, \dots, y_k\}$ be a ranked alphabet, where for $i = 1, \dots, k$, $r_i$ is the rank of $y_i$. Let $t_i[x_1, \dots, x_{r_i}]$ be an $r_i$-context. For a tree $t \in \mathbb{T}_{\Sigma,\{y_1,\dots,y_k\}}(X_n)$, we define $t[t_i[x_1, \dots, x_{r_i}]/y_i]$ inductively as follows:
\[ c(u_1 \dots u_m)[t_i[x_1, \dots, x_{r_i}]/y_i] = c(u_1[t_i[x_1, \dots, x_{r_i}]/y_i] \dots u_m[t_i[x_1, \dots, x_{r_i}]/y_i]) \quad \text{if } c \in \Sigma, \]
\[ x_j[t_i[x_1, \dots, x_{r_i}]/y_i] = x_j, \]
\[ y_i(u_1 \dots u_{r_i})[t_i[x_1, \dots, x_{r_i}]/y_i] = t_i[u_1[t_i[x_1, \dots, x_{r_i}]/y_i], \dots, u_{r_i}[t_i[x_1, \dots, x_{r_i}]/y_i]], \]
\[ y_j(u_1 \dots u_{r_j})[t_i[x_1, \dots, x_{r_i}]/y_i] = y_j(u_1[t_i[x_1, \dots, x_{r_i}]/y_i] \dots u_{r_j}[t_i[x_1, \dots, x_{r_i}]/y_i]) \quad \text{if } j \ne i. \]
(Here, the notation $c(u_1 \dots u_m)$ stands for $c$ when $m = 0$, and likewise with $y_j(u_1 \dots u_{r_j})$.)

Let $G = (N, \Sigma, P, S)$ be a simple context-free tree grammar. The derivation trees of $G$ and their tree yield are defined inductively as follows:

– Let $\pi = B(x_1 \dots x_n) \to t[x_1, \dots, x_n]$ be a production in $P$ with no nonterminal occurring in $t[x_1, \dots, x_n]$. Then $d = \pi$ is a derivation tree of sort $B$ and its tree yield is $\mathrm{ty}(d) = t[x_1, \dots, x_n]$.
– Let $\pi = B(x_1 \dots x_n) \to t[x_1, \dots, x_n]$ be a production in $P$ with at least one nonterminal occurring in $t[x_1, \dots, x_n]$. Let $v_1, \dots, v_k$ be the pre-order listing of the nodes of $t[x_1, \dots, x_n]$ labeled by nonterminals. Let $B_i$ be the label of $v_i$, for $i = 1, \dots, k$. If $d_i$ is a derivation tree of sort $B_i$ for $i = 1, \dots, k$, then $\pi(d_1 \dots d_k)$ is a derivation tree of sort $B$. If $\bar{t}[x_1, \dots, x_n]$ is a tree just like $t[x_1, \dots, x_n]$ except that the label of $v_i$ is changed to $y_i$, where $y_1, \dots, y_k$ are new symbols, then the tree yield $\mathrm{ty}(d)$ of $d = \pi(d_1 \dots d_k)$ is defined by
\[ \mathrm{ty}(d) = \bar{t}[x_1, \dots, x_n][\mathrm{ty}(d_1)/y_1] \dots [\mathrm{ty}(d_k)/y_k]. \]

Note that if $d$ is a derivation tree of sort $B$ and $n$ is the rank of $B$, then $\mathrm{ty}(d)$ is an $n$-context, so the right-hand side of the above equation is well-defined if $y_i$ is regarded as a symbol whose rank equals the rank of $B_i$.

It is well known that if $G = (N, \Sigma, P, S)$ is a simple context-free tree grammar, then
\[ L(G) = \{\, \mathrm{ty}(d) \mid d \text{ is a derivation tree of } G \text{ of sort } S \,\}. \]

Example 1. Consider a simple context-free tree grammar $G = (N, \Sigma, P, S)$, where $N = N^{(0)} \cup N^{(2)} = \{S\} \cup \{B\}$, $\Sigma = \{a_1, a_2, a_3, a_4, a_5, a_6, e, g, h\}$, and $P$ consists of the following rules:
\[ \pi_1 \colon S \to B(e\,e), \]
\[ \pi_2 \colon B(x_1 x_2) \to h(a_1 \, B(h(a_2\, x_1\, a_3)\, h(a_4\, x_2\, a_5)) \, a_6), \]
\[ \pi_3 \colon B(x_1 x_2) \to g(x_1 x_2). \]
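To make the three rules of Example 1 concrete, the following Python sketch applies $\pi_1$, then $\pi_2$ a chosen number of times, then $\pi_3$, and reads off the yield of the resulting tree. The tuple representation `(label, children)` and the function names `derive` and `tree_yield` are assumptions made for this illustration only.

```python
# Illustrative sketch of Example 1 (names and representation are assumptions).

def leaf(c):
    return (c, [])

def derive(n):
    """Tree derived from S by pi1, then pi2 applied n times, then pi3."""
    def expand_B(t1, t2, k):
        if k == 0:
            # pi3: B(x1 x2) -> g(x1 x2)
            return ("g", [t1, t2])
        # pi2: B(x1 x2) -> h(a1 B(h(a2 x1 a3) h(a4 x2 a5)) a6)
        inner = expand_B(("h", [leaf("a2"), t1, leaf("a3")]),
                         ("h", [leaf("a4"), t2, leaf("a5")]),
                         k - 1)
        return ("h", [leaf("a1"), inner, leaf("a6")])
    # pi1: S -> B(e e)
    return expand_B(leaf("e"), leaf("e"), n)

def tree_yield(t):
    """Left-to-right sequence of leaf labels."""
    label, children = t
    if not children:
        return [label]
    return [s for child in children for s in tree_yield(child)]
```

With these definitions, `tree_yield(derive(1))` is the sequence `a1 a2 e a3 a4 e a5 a6`, and each further application of $\pi_2$ adds one more copy of every $a_i$ in the corresponding block.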
The following trees are derivation trees of this grammar:
\[ \pi_1(\pi_2(\pi_3)), \qquad \pi_1(\pi_2(\pi_2(\pi_3))). \]
We have
\[ \mathrm{ty}(\pi_3) = g(x_1 x_2), \]
\[ \mathrm{ty}(\pi_2(\pi_3)) = h(a_1\, g(h(a_2\, x_1\, a_3)\, h(a_4\, x_2\, a_5))\, a_6), \]
\[ \mathrm{ty}(\pi_1(\pi_2(\pi_3))) = h(a_1\, g(h(a_2\, e\, a_3)\, h(a_4\, e\, a_5))\, a_6), \]
\[ \mathrm{ty}(\pi_2(\pi_2(\pi_3))) = h(a_1\, h(a_1\, g(h(a_2\, h(a_2\, x_1\, a_3)\, a_3)\, h(a_4\, h(a_4\, x_2\, a_5)\, a_5))\, a_6)\, a_6), \]
\[ \mathrm{ty}(\pi_1(\pi_2(\pi_2(\pi_3)))) = h(a_1\, h(a_1\, g(h(a_2\, h(a_2\, e\, a_3)\, a_3)\, h(a_4\, h(a_4\, e\, a_5)\, a_5))\, a_6)\, a_6). \]
The string language generated by this grammar is
\[ y(L(G)) = \{\, a_1^n a_2^n \, e \, a_3^n a_4^n \, e \, a_5^n a_6^n \mid n \ge 0 \,\}. \]

The string languages of simple context-free tree grammars are the languages generated by non-duplicating macro grammars [9], studied by Seki and Kato [28].⁴ They also coincide with the languages generated by well-nested multiple context-free grammars [13].

⁴ At the level of string languages, simple context-free tree grammars correspond to non-duplicating and argument-preserving (i.e., non-deleting) macro grammars, which are equivalent to non-duplicating macro grammars (Lemma 7 of [28]). Seki and Kato [28] called non-duplicating macro grammars variable-linear.

3 The Chomsky-Schützenberger Theorem

There are many different proofs of the Chomsky-Schützenberger theorem for context-free languages offered in the literature. Here, we give a proof based on the relation between context-free languages and local sets of unranked trees.⁵ This will serve as a starting point for our generalization of the theorem to simple context-free tree grammars.

⁵ Among the proofs found in well-known textbooks, the one closest to our proof seems to be the one given by Kozen [18].

Let $\boldsymbol{T} = (T, \ell)$ be a first-child-next-sibling encoding of an unranked tree. We define three binary relations on $T$:⁶
\[ \prec_2^{\boldsymbol{T}} = \{\, (u, v) \in T \times T \mid u \cdot 2 = v \,\}, \]
\[ \prec_1^{\boldsymbol{T}} = \{\, (u, v) \in T \times T \mid u \cdot 1 = v \,\}, \]
\[ \triangleleft^{\boldsymbol{T}} = \{\, (u, v) \in T \times T \mid u \mathrel{\bigl(\prec_2^{\boldsymbol{T}} \circ (\prec_1^{\boldsymbol{T}})^*\bigr)} v \,\}. \]
The relation $\triangleleft^{\boldsymbol{T}}$ is the child relation on the nodes of $\boldsymbol{T}$. Let $\mathbb{T}_\Sigma$ be the set of all unranked trees over $\Sigma$, encoded as binary trees over $\Sigma$.
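The first-child, next-sibling, and child relations on a first-child-next-sibling tree domain can be computed directly from their definitions. The sketch below is only an illustration under an assumed representation: a node is a Python string over the characters `"1"` and `"2"`, the root is the empty string, and a tree domain is a set of such strings.

```python
# Illustrative sketch: a tree domain is a set of strings over {"1", "2"},
# with "" as the root; appending "2" means first child, "1" means next sibling.

def first_child(T):
    """The first-child relation: pairs (u, u.2) with both nodes in T."""
    return {(u, u + "2") for u in T if u + "2" in T}

def next_sibling(T):
    """The next-sibling relation: pairs (u, u.1) with both nodes in T."""
    return {(u, u + "1") for u in T if u + "1" in T}

def children(T, u):
    """Children of u in left-to-right order: u.2, u.2.1, u.2.1.1, ..."""
    out, v = [], u + "2"
    while v in T:
        out.append(v)
        v += "1"
    return out

def child_relation(T):
    """First-child composed with the reflexive transitive closure of next-sibling."""
    return {(u, v) for u in T for v in children(T, u)}
```

For the unranked tree $a(b\,c(d))$, the domain is `{"", "2", "21", "212"}`: the root has children `"2"` and `"21"`, and `"21"` has the single child `"212"`.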
If $A, Z \subseteq \Sigma$ and $I \subseteq \Sigma \times \Sigma^+$ are finite sets, define $\mathrm{Loc}(A, Z, I)$ to be the set of all trees $\boldsymbol{T} = (T, \ell)$ in $\mathbb{T}_\Sigma$ that satisfy the following conditions:

L1. $\ell(\varepsilon) \in A$.
L2. $u \in T - \mathrm{dom}(\prec_2^{\boldsymbol{T}})$ implies $\ell(u) \in Z$.
L3. $u \prec_2^{\boldsymbol{T}} v_1 \prec_1^{\boldsymbol{T}} \dots \prec_1^{\boldsymbol{T}} v_n \notin \mathrm{dom}(\prec_1^{\boldsymbol{T}})$ ($n \ge 1$) implies $(\ell(u), \ell(v_1) \dots \ell(v_n)) \in I$.

A set $L \subseteq \mathbb{T}_\Sigma$ is local [33,31] if there are finite sets $A, Z \subseteq \Sigma$ and $I \subseteq \Sigma \times \Sigma^+$ such that $L = \mathrm{Loc}(A, Z, I)$.

We introduce a more restrictive notion of locality. If $A, Z, Y \subseteq \Sigma$ and $K, J \subseteq \Sigma \times \Sigma$, define $\mathrm{SLoc}(A, Z, K, Y, J)$ to be the set of all trees $\boldsymbol{T} = (T, \ell)$ in $\mathbb{T}_\Sigma$ that satisfy the following conditions:

SL1. $\ell(\varepsilon) \in A$.
SL2. $u \in T - \mathrm{dom}(\prec_2^{\boldsymbol{T}})$ implies $\ell(u) \in Z$.
SL3. $u \prec_2^{\boldsymbol{T}} v$ implies $(\ell(u), \ell(v)) \in K$.
SL4. $u \ne \varepsilon$ and $u \in T - \mathrm{dom}(\prec_1^{\boldsymbol{T}})$ imply $\ell(u) \in Y$.
SL5. $u \prec_1^{\boldsymbol{T}} v$ implies $(\ell(u), \ell(v)) \in J$.

We call $L \subseteq \mathbb{T}_\Sigma$ super-local if there are finite sets $A, Z, Y \subseteq \Sigma$ and $K, J \subseteq \Sigma \times \Sigma$ such that $L = \mathrm{SLoc}(A, Z, K, Y, J)$.⁷

A set of strings $L \subseteq \Sigma^+$ is local⁸ if there are finite sets $A, Z \subseteq \Sigma$ and $I \subseteq \Sigma^2$ such that
\[ L = (A\Sigma^* \cap \Sigma^* Z) - \Sigma^* (\Sigma^2 - I) \Sigma^*. \]
In this paper, we allow the alphabet $\Sigma$ to be infinite, but any local subset of $\Sigma^+$ is included in $\Sigma_0^+$ for some finite subset $\Sigma_0$ of $\Sigma$; likewise, any local or super-local subset of $\mathbb{T}_\Sigma$ is included in $\mathbb{T}_{\Sigma_0}$ for some finite $\Sigma_0 \subseteq \Sigma$.

We extend the string encoding function $\mathrm{enc}$ to a function from $\mathscr{P}(\mathbb{T}_\Sigma)$ to $\mathscr{P}(\Gamma_\Sigma^+)$ by
\[ \mathrm{enc}(L) = \{\, \mathrm{enc}(\boldsymbol{T}) \mid \boldsymbol{T} \in L \,\}. \]

⁶ When $R$ and $S$ are binary relations on some set, we write $R \circ S$ for the composition of $R$ and $S$, and write $R^*$ for the reflexive transitive closure of $R$.

⁷ This notion of super-locality was called Fe-locality by Takahashi [31].

⁸ This notion of a local set is slightly different from McNaughton and Papert's [20] notion of a strictly 2-testable language. In the literature, a local set of strings is sometimes called strictly 2-local (for example, [25]). Eilenberg [6], Takahashi [31], and Perrin [22] use "local" in the present sense. Local sets of strings were called "standard regular events" by Chomsky and Schützenberger [3].
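The definition of a local set of strings given above amounts to three checks: the first symbol lies in $A$, the last symbol lies in $Z$, and every pair of adjacent symbols lies in $I$. A minimal Python sketch, with the function name `in_local` and the example sets chosen purely for illustration:

```python
# Illustrative sketch: membership of a non-empty string w in the local set
# determined by (A, Z, I); I is a set of pairs of adjacent symbols.

def in_local(w, A, Z, I):
    if not w:
        return False                      # local sets contain no empty string
    if w[0] not in A or w[-1] not in Z:
        return False                      # first symbol in A, last symbol in Z
    # no factor of length 2 may fall outside I
    return all((a, b) in I for a, b in zip(w, w[1:]))
```

For example, with `A = {"a"}`, `Z = {"b"}`, and `I = {("a", "a"), ("a", "b"), ("b", "b")}`, the function accepts exactly the strings consisting of one or more `a` followed by one or more `b`.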
Lemma 2. Let $L \subseteq \mathbb{T}_\Sigma$. If $L$ is super-local, then there is a local set of strings $L' \subseteq \Gamma_\Sigma^+$ such that $\mathrm{enc}(L) = L' \cap D'_\Sigma$.

Proof. Suppose that $A, Z, Y \subseteq \Sigma$, $K, J \subseteq \Sigma \times \Sigma$ are finite sets such that $L = \mathrm{SLoc}(A, Z, K, Y, J)$. Let
\[ A' = \{\, [_c \mid c \in A \,\}, \qquad Z' = \{\, ]_c \mid c \in A \,\}, \]
\[ I = \{\, [_c [_d \mid (c, d) \in K \,\} \cup \{\, [_c ]_c \mid c \in Z \,\} \cup \{\, ]_c [_d \mid (c, d) \in J \,\} \cup \{\, ]_c ]_d \mid c \in Y, \ d \in \Sigma \,\}. \]
Let $L' \subseteq \Gamma_\Sigma^+$ be the local set of strings defined by
\[ L' = (A' \Gamma_\Sigma^* \cap \Gamma_\Sigma^* Z') - \Gamma_\Sigma^* (\Gamma_\Sigma^2 - I) \Gamma_\Sigma^*. \]
We show that $\mathrm{enc}(L) = L' \cap D'_\Sigma$.

To prove $\mathrm{enc}(L) \subseteq L' \cap D'_\Sigma$, suppose $\boldsymbol{T} = (T, \ell) \in L$. Since we know that $\mathrm{enc}(\boldsymbol{T}) \in D'_\Sigma$, it suffices to show $\mathrm{enc}(\boldsymbol{T}) \in L'$. If $c$ is the label of the root of $\boldsymbol{T}$, then by the definition of $L$, $c \in A$, so the first and last symbols of $\mathrm{enc}(\boldsymbol{T})$ must be $[_c \in A'$ and $]_c \in Z'$, respectively. So $\mathrm{enc}(\boldsymbol{T}) \in A' \Gamma_\Sigma^* \cap \Gamma_\Sigma^* Z'$. Now suppose $ab \in \Gamma_\Sigma^2$ is a substring of $\mathrm{enc}(\boldsymbol{T})$. We need to show $ab \in I$. From the definition of $\mathrm{enc}$, it is clear that there are two nodes $u, v$ of $\boldsymbol{T}$ labeled $c$ and $d$, respectively, such that one of the following conditions holds:

– $a = [_c$, $b = [_d$, and $u \prec_2^{\boldsymbol{T}} v$. In this case, $\boldsymbol{T} \in L$ implies $(c, d) \in K$.
– $a = [_c$, $b = ]_d$, and $u = v \notin \mathrm{dom}(\prec_2^{\boldsymbol{T}})$. In this case, $\boldsymbol{T} \in L$ implies $c = d \in Z$.
– $a = ]_c$, $b = [_d$, and $u \prec_1^{\boldsymbol{T}} v$. In this case, $\boldsymbol{T} \in L$ implies $(c, d) \in J$.
– $a = ]_c$, $b = ]_d$, $u \ne \varepsilon$, and $u \notin \mathrm{dom}(\prec_1^{\boldsymbol{T}})$. In this case, $\boldsymbol{T} \in L$ implies $c \in Y$.

In each case, we have $ab \in I$. This shows that $\mathrm{enc}(\boldsymbol{T}) \notin \Gamma_\Sigma^* (\Gamma_\Sigma^2 - I) \Gamma_\Sigma^*$. We have shown that $\mathrm{enc}(\boldsymbol{T}) \in L'$.

To prove $L' \cap D'_\Sigma \subseteq \mathrm{enc}(L)$, suppose $s \in L' \cap D'_\Sigma$. Since $s \in D'_\Sigma$, there is some $\boldsymbol{T} \in \mathbb{T}_\Sigma$ such that $\mathrm{enc}(\boldsymbol{T}) = s$. We show that $\boldsymbol{T}$ satisfies the conditions SL1–SL5 for membership in $L = \mathrm{SLoc}(A, Z, K, Y, J)$.

SL1. Let $c$ be the label of the root of $\boldsymbol{T}$. Then $s = \mathrm{enc}(\boldsymbol{T})$ starts with $[_c$ and ends with $]_c$. Since $s \in L'$, it must be that $[_c \in A'$ and $]_c \in Z'$. It follows that $c \in A$.
SL2. Let $c$ be the label of any node $u \in T - \mathrm{dom}(\prec_2^{\boldsymbol{T}})$. Then from the definition of $\mathrm{enc}$, the string $[_c ]_c$ must be a substring of $s$. Since $s \in L'$, it must be that $[_c ]_c \in I$. It follows that $c \in Z$.
SL3. Let $u, v$ be nodes of $\boldsymbol{T}$ labeled $c$ and $d$, respectively, such that $u \prec_2^{\boldsymbol{T}} v$. By the definition of $\mathrm{enc}$, it is easy to see that $[_c [_d$ is a substring of $s$ and hence in $I$. It follows that $(c, d) \in K$.
SL4. Let $u$ be any non-root node of $\boldsymbol{T}$ labeled $c$ such that $u \notin \mathrm{dom}(\prec_1^{\boldsymbol{T}})$. Let $d$ be the label of the parent of $u$. By the definition of $\mathrm{enc}$, $]_c ]_d$ must be a substring of $s$ and hence in $I$. It follows that $c \in Y$.
SL5. Let $u, v$ be nodes of $\boldsymbol{T}$ labeled $c$ and $d$, respectively, such that $u \prec_1^{\boldsymbol{T}} v$. By the definition of $\mathrm{enc}$, $]_c [_d$ must be a substring of $s$ and hence in $I$. It follows that $(c, d) \in J$.

We have shown that $\boldsymbol{T} \in L = \mathrm{SLoc}(A, Z, K, Y, J)$. $\square$

Note that the converse of the above lemma does not necessarily hold, because $L'$ can place a restriction on the $]_d$ that can follow $]_c$. For example, $L = \{a(bc), a(bd), e(bc)\}$ is not super-local, even though $\mathrm{enc}(L)$ is local.

A mapping $\pi \colon \Sigma \to \Sigma'$ is called a projection. A projection $\pi$ is extended to a function from $\mathbb{T}_\Sigma$ to $\mathbb{T}_{\Sigma'}$ and to a function from $\mathscr{P}(\mathbb{T}_\Sigma)$ to $\mathscr{P}(\mathbb{T}_{\Sigma'})$ in obvious ways.

Lemma 3. Let $L \subseteq \mathbb{T}_\Sigma$ be a local set. There exist a finite alphabet $\Sigma'$, a super-local set $L' \subseteq \mathbb{T}_{\Sigma'}$, and a projection $\pi \colon \Sigma' \to \Sigma$ such that $L = \pi(L')$. Moreover, $\pi$ maps $L'$ bijectively to $L$.

Proof. Let $\boldsymbol{T} \in L$. We replace the label of each node $v$ of $\boldsymbol{T}$ with a pair $(c_1 \dots c_n, i)$, where $c_1 \dots c_n$ is the string of labels $c_1, \dots, c_n$ of the siblings of $v$, including $v$ itself, in the order from left to right, and $i$ is the position of $v$ among its siblings. The relabeled trees obtained this way form a super-local set, and we can get back the original trees by a projection.

Formally, let⁹
\[ \Sigma'' = \{\, (w, i) \mid w \in \Sigma^+, \ 1 \le i \le |w| \,\}, \]
and define a projection $\pi \colon \Sigma'' \to \Sigma$ by
\[ \pi((c_1 \dots c_n, i)) = c_i. \]
Suppose that $A, Z \subseteq \Sigma$ and $I \subseteq \Sigma \times \Sigma^+$ are finite sets such that $L = \mathrm{Loc}(A, Z, I)$.
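Let
\[ F = A \cup \{\, w \in \Sigma^+ \mid (c, w) \in I \text{ for some } c \in \Sigma \,\}, \qquad \Sigma' = \{\, (w, i) \in \Sigma'' \mid w \in F \,\}. \]
Note that $\Sigma'$ is a finite subset of $\Sigma''$. Let
\[ A' = \{\, (c, 1) \mid c \in A \,\}, \]
\[ Z' = \{\, (c_1 \dots c_n, i) \in \Sigma' \mid c_i \in Z \,\}, \]
\[ K = \{\, ((d_1 \dots d_l, i), (c_1 \dots c_n, 1)) \mid (d_1 \dots d_l, i) \in \Sigma', \ (d_i, c_1 \dots c_n) \in I \,\}, \]
\[ Y = \{\, (c_1 \dots c_n, i) \in \Sigma' \mid i = n \,\}, \]
\[ J = \{\, ((c_1 \dots c_n, i), (c_1 \dots c_n, i+1)) \mid (c_1 \dots c_n, i) \in \Sigma', \ i \le n - 1 \,\}. \]

⁹ If $w$ is a string, we write $|w|$ for the length of $w$. We use $|\cdot|$ both for the length of a string and for the cardinality of a set. The context should make it clear which is intended.

These sets are all finite. Let $L' \subseteq \mathbb{T}_{\Sigma''}$ be the super-local set defined by
\[ L' = \mathrm{SLoc}(A', Z', K, Y, J). \]
Clearly, $L' \subseteq \mathbb{T}_{\Sigma'}$. We show that $L'$ and $\pi$ (restricted to $\Sigma'$) satisfy the required properties.

For each $\boldsymbol{T} = (T, \ell^{\boldsymbol{T}}) \in \mathbb{T}_\Sigma$, define a tree $\hat{\boldsymbol{T}} = (T, \ell^{\hat{\boldsymbol{T}}}) \in \mathbb{T}_{\Sigma''}$ by
\[ \ell^{\hat{\boldsymbol{T}}}(\varepsilon) = (\ell^{\boldsymbol{T}}(\varepsilon), 1), \tag{1} \]
\[ \ell^{\hat{\boldsymbol{T}}}(u \cdot 2 \cdot 1^{i-1}) = (\ell^{\boldsymbol{T}}(u \cdot 2)\, \ell^{\boldsymbol{T}}(u \cdot 2 \cdot 1) \dots \ell^{\boldsymbol{T}}(u \cdot 2 \cdot 1^{n-1}), \ i) \quad \text{if } u \cdot 2 \cdot 1^{n-1} \in T - \mathrm{dom}(\prec_1^{\boldsymbol{T}}) \text{ and } 1 \le i \le n. \tag{2} \]
It is clear that $\pi(\hat{\boldsymbol{T}}) = \boldsymbol{T}$ for all $\boldsymbol{T} \in \mathbb{T}_\Sigma$. Our goal is to show
\[ L' = \{\, \hat{\boldsymbol{T}} \mid \boldsymbol{T} \in L \,\}. \]
This clearly implies that $\pi$ is a bijection from $L'$ to $L$.

We begin by showing that for all $\boldsymbol{T} \in \mathbb{T}_\Sigma$,
\[ \boldsymbol{T} \in L \text{ if and only if } \hat{\boldsymbol{T}} \in L'. \tag{3} \]
This follows from five observations. Firstly, note the following:

– Suppose $u \in T - \mathrm{dom}(\prec_1^{\boldsymbol{T}})$. Then $\ell^{\hat{\boldsymbol{T}}}(u)$ is of the form $(c_1 \dots c_n, n)$, which means that $\ell^{\hat{\boldsymbol{T}}}(u) \in Y$ if $\ell^{\hat{\boldsymbol{T}}}(u) \in \Sigma'$.
– Suppose $u \prec_1^{\boldsymbol{T}} v$. Then $(\ell^{\hat{\boldsymbol{T}}}(u), \ell^{\hat{\boldsymbol{T}}}(v))$ is of the form $((c_1 \dots c_n, i), (c_1 \dots c_n, i+1))$, which means that $(\ell^{\hat{\boldsymbol{T}}}(u), \ell^{\hat{\boldsymbol{T}}}(v)) \in J$ if $\ell^{\hat{\boldsymbol{T}}}(u) \in \Sigma'$.

Thus, $\hat{\boldsymbol{T}}$ satisfies the last two conditions SL4 and SL5 for membership in $\mathrm{SLoc}(A', Z', K, Y, J)$ whenever $\hat{\boldsymbol{T}} \in \mathbb{T}_{\Sigma'}$. Secondly, the following biconditional always holds:

– $\ell^{\boldsymbol{T}}(\varepsilon) \in A$ if and only if $\ell^{\hat{\boldsymbol{T}}}(\varepsilon) \in A'$.

Thirdly, the following biconditional holds whenever $\ell^{\hat{\boldsymbol{T}}}(v) \in \Sigma'$:

– $\ell^{\boldsymbol{T}}(v) \in Z$ if and only if $\ell^{\hat{\boldsymbol{T}}}(v) \in Z'$.

Fourthly, if $u \cdot 2 \cdot 1^{n-1} \in T - \mathrm{dom}(\prec_1^{\boldsymbol{T}})$ and $\ell^{\hat{\boldsymbol{T}}}(u) \in \Sigma'$, then the following biconditional holds:

– $(\ell^{\boldsymbol{T}}(u), \ell^{\boldsymbol{T}}(u \cdot 2)\, \ell^{\boldsymbol{T}}(u \cdot 2 \cdot 1) \dots \ell^{\boldsymbol{T}}(u \cdot 2 \cdot 1^{n-1})) \in I$ if and only if $(\ell^{\hat{\boldsymbol{T}}}(u), \ell^{\hat{\boldsymbol{T}}}(u \cdot 2)) \in K$.

Lastly, it is easy to see that $\boldsymbol{T} \in L$ implies $\hat{\boldsymbol{T}} \in \mathbb{T}_{\Sigma'}$. Combining these five observations, we get (3).

It follows from the "only if" direction of (3) that $\{\, \hat{\boldsymbol{T}} \mid \boldsymbol{T} \in L \,\} \subseteq L'$. To establish the converse inclusion, we show that
\[ \text{if } \boldsymbol{T}' \in L' \text{ and } \boldsymbol{T} = \pi(\boldsymbol{T}'), \text{ then } \boldsymbol{T}' = \hat{\boldsymbol{T}}. \]
This together with the "if" direction of (3) clearly implies $L' \subseteq \{\, \hat{\boldsymbol{T}} \mid \boldsymbol{T} \in L \,\}$.

So suppose $\boldsymbol{T}' = (T, \ell^{\boldsymbol{T}'}) \in L'$, and let $\boldsymbol{T} = (T, \ell^{\boldsymbol{T}}) = \pi(\boldsymbol{T}')$. All we need to show is that the equations (1) and (2) hold with $\boldsymbol{T}'$ in place of $\hat{\boldsymbol{T}}$. As for (1), it follows from the fact that $\ell^{\boldsymbol{T}'}(\varepsilon) \in A'$. As for (2), suppose $u \cdot 2 \cdot 1^{n-1} \in T - \mathrm{dom}(\prec_1^{\boldsymbol{T}'})$. Since $(\ell^{\boldsymbol{T}'}(u), \ell^{\boldsymbol{T}'}(u \cdot 2)) \in K$, we have $\ell^{\boldsymbol{T}'}(u \cdot 2) = (c_1 \dots c_m, 1)$ for some $c_1 \dots c_m \in F$. Since for all $i \le n - 1$ we must have $(\ell^{\boldsymbol{T}'}(u \cdot 2 \cdot 1^{i-1}), \ell^{\boldsymbol{T}'}(u \cdot 2 \cdot 1^{i})) \in J$, we get $\ell^{\boldsymbol{T}'}(u \cdot 2 \cdot 1^{i-1}) = (c_1 \dots c_m, i) \in \Sigma'$ for $i = 1, \dots, n$. This implies $n \le m$. But $\ell^{\boldsymbol{T}'}(u \cdot 2 \cdot 1^{n-1}) = (c_1 \dots c_m, n)$ must be in $Y$, so $m = n$. Since $\pi((c_1 \dots c_n, i)) = c_i = \ell^{\boldsymbol{T}}(u \cdot 2 \cdot 1^{i-1})$, it follows that (2) holds with $\boldsymbol{T}'$ in place of $\hat{\boldsymbol{T}}$. $\square$

We assume the standard definition of the yield function $y \colon \mathbb{T}_\Sigma \to \Sigma^+$. Using the term notation for unranked trees, we can define it as follows:
\[ y(c) = c, \qquad y(c(t_1 \dots t_n)) = y(t_1) \dots y(t_n). \]
As is well known, a set of strings is a context-free language if and only if it is the yield image of a local set of trees.

Let us call a tree $\boldsymbol{T} = (T, \ell) \in \mathbb{T}_\Sigma$ disjointly labeled with $\Sigma_0, \Sigma_1$ if (i) $\Sigma_0$ and $\Sigma_1$ are disjoint subsets of $\Sigma$, (ii) $u \in \mathrm{dom}(\prec_2^{\boldsymbol{T}})$ implies $\ell(u) \in \Sigma_1$, and (iii) $u \in T - \mathrm{dom}(\prec_2^{\boldsymbol{T}})$ implies $\ell(u) \in \Sigma_0$. Let $\Sigma_0, \Sigma_1$ be disjoint sets and let
\[ \mathbb{T}_{\Sigma_0}^{\Sigma_1} = \{\, \boldsymbol{T} \in \mathbb{T}_{\Sigma_0 \cup \Sigma_1} \mid \boldsymbol{T} \text{ is disjointly labeled with } \Sigma_0, \Sigma_1 \,\}. \]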
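On $\mathbb{T}_{\Sigma_0}^{\Sigma_1}$, the yield function $y \colon \mathbb{T}_{\Sigma_0}^{\Sigma_1} \to \Sigma_0^+$ can be expressed as the composition $y = h_{\Sigma_0,\Sigma_1} \circ \mathrm{enc}$ of the string encoding function $\mathrm{enc}$ and an alphabetic homomorphism $h_{\Sigma_0,\Sigma_1} \colon (\Gamma_{\Sigma_0 \cup \Sigma_1})^* \to \Sigma_0^*$ defined by
\[ h_{\Sigma_0,\Sigma_1}([_c) = \begin{cases} c & \text{if } c \in \Sigma_0, \\ \varepsilon & \text{if } c \in \Sigma_1, \end{cases} \qquad h_{\Sigma_0,\Sigma_1}(]_c) = \varepsilon. \]

Lemma 4. Let $L \subseteq \Sigma^*$ be a context-free language. There exist a set $\Upsilon$ disjoint from $\Sigma$ and a local set $K \subseteq \mathbb{T}_{\Sigma}^{\Upsilon}$ such that $L = y(K) = h_{\Sigma,\Upsilon}(\mathrm{enc}(K))$.

Proof. Let $G = (N, \Sigma, P, S)$ be a context-free grammar for $L$. Clearly, the parse trees¹⁰ of $G$ form a local subset $K$ of $\mathbb{T}_{\Sigma}^{N}$, and $L = y(K) = h_{\Sigma,N}(\mathrm{enc}(K))$. $\square$

¹⁰ This also follows from the fact that a local set of trees is always obtained from a local set of disjointly labeled trees by a projection that does not change the labels of leaves.

Conversely, $h(\mathrm{enc}(K))$ is always a context-free language whenever $K$ is a local set of trees and $h$ is a homomorphism.

A projection $\pi \colon \Sigma' \to \Sigma$ induces a projection $\hat{\pi} \colon \Gamma_{\Sigma'} \to \Gamma_\Sigma$ in an obvious way:
\[ \hat{\pi}([_c) = [_{\pi(c)}, \qquad \hat{\pi}(]_c) = ]_{\pi(c)}. \]

Lemma 5. Let $\pi \colon \Sigma' \to \Sigma$ be a projection and $L \subseteq \mathbb{T}_{\Sigma'}$. Then $\mathrm{enc}(\pi(L)) = \hat{\pi}(\mathrm{enc}(L))$.

We can use Lemmas 2 through 5 to show that every context-free language $L$ can be represented as $L = h(R \cap D'_n)$ with some alphabetic homomorphism $h$ and local set $R$. The Chomsky-Schützenberger theorem, however, is stated in terms of the Dyck language $D_n$ rather than the set $D'_n$ of Dyck primes. The following lemma bridges the representation in terms of $D'_n$ and that in terms of $D_n$.

Lemma 6. Let $L \subseteq \Gamma_\Sigma^+$ be a local set of strings. Then there exist a finite alphabet $\Sigma'$, a projection $\pi \colon \Sigma' \to \Sigma$, and a local set $L' \subseteq \Gamma_{\Sigma'}^+$ such that $L \cap D'_\Sigma = \hat{\pi}(L' \cap D_{\Sigma'})$. Moreover, $\hat{\pi}$ maps $L' \cap D_{\Sigma'}$ bijectively to $L \cap D'_\Sigma$.

Proof. Let $A, Z \subseteq \Gamma_\Sigma$, $I \subseteq \Gamma_\Sigma^2$ be finite sets such that
\[ L = (A \Gamma_\Sigma^* \cap \Gamma_\Sigma^* Z) - \Gamma_\Sigma^* (\Gamma_\Sigma^2 - I) \Gamma_\Sigma^*. \]
We may assume without loss of generality that $\Sigma$ is finite. Let
\[ \Sigma' = \Sigma \cup \{\, \bar{c} \mid c \in \Sigma \,\}. \]
Let
\[ A' = \{\, [_{\bar{c}} \mid [_c \in A \,\}, \qquad Z' = \{\, ]_{\bar{c}} \mid ]_c \in Z \,\}, \]
\[ I' = I \cup \{\, [_{\bar{c}}\, d \mid [_c\, d \in I \,\} \cup \{\, d\, ]_{\bar{c}} \mid d\, ]_c \in I \,\} \cup \{\, [_{\bar{c}} ]_{\bar{c}} \mid [_c ]_c \in I \,\}, \]
and put
\[ L' = (A' \Gamma_{\Sigma'}^* \cap \Gamma_{\Sigma'}^* Z') - \Gamma_{\Sigma'}^* (\Gamma_{\Sigma'}^2 - I') \Gamma_{\Sigma'}^*. \]
Define $\pi \colon \Sigma' \to \Sigma$ by
\[ \pi(c) = c, \qquad \pi(\bar{c}) = c \]
for each $c \in \Sigma$. Then it is clear that $\hat{\pi}(D'_{\Sigma'}) = D'_\Sigma$ and $L' \cap D'_{\Sigma'} = L' \cap D_{\Sigma'}$. Also, since $\hat{\pi}$ maps $A', Z', I'$ to $A, Z, I$, respectively, it is easy to see that $\hat{\pi}(L') \subseteq L$. This establishes $\hat{\pi}(L' \cap D_{\Sigma'}) \subseteq L \cap D'_\Sigma$. Now suppose $w \in L \cap D'_\Sigma$. Then $w = [_c\, v\, ]_c$ for some $c \in \Sigma$ and $v \in D_\Sigma$. Let $w' = [_{\bar{c}}\, v\, ]_{\bar{c}}$. Then $w' \in D'_{\Sigma'}$ and $\hat{\pi}(w') = w$. Since $w \in L$, $[_c \in A$ and $]_c \in Z$, so it follows that $[_{\bar{c}} \in A'$ and $]_{\bar{c}} \in Z'$. It is also easy to see that if $a_1 a_2$ is a substring of $w'$, then $a_1 a_2 \in I'$. Therefore, $w' \in L'$, and this shows that $L \cap D'_\Sigma \subseteq \hat{\pi}(L' \cap D'_{\Sigma'}) = \hat{\pi}(L' \cap D_{\Sigma'})$. It is also easy to see that $w'$ is the unique element of $L'$ that $\hat{\pi}$ maps to $w$. $\square$

Lemma 7. If $L \subseteq \mathbb{T}_\Sigma$ is a local set, then there exist a finite alphabet $\Sigma'$, a projection $\pi \colon \Sigma' \to \Sigma$, and a local set $L' \subseteq \Gamma_{\Sigma'}^+$ such that $\mathrm{enc}(L) = \hat{\pi}(L' \cap D_{\Sigma'})$. Moreover, $\mathrm{enc}^{-1} \circ \hat{\pi}$ maps $L' \cap D_{\Sigma'}$ bijectively to $L$.

Proof. By Lemma 3, there exist a projection $\pi_1 \colon \Sigma_1 \to \Sigma$ and a super-local set $L_1 \subseteq \mathbb{T}_{\Sigma_1}$ such that $L = \pi_1(L_1)$ and $\pi_1$ is a bijection from $L_1$ to $L$. By Lemma 2, there exists a local set $L_2 \subseteq \Gamma_{\Sigma_1}^*$ such that $\mathrm{enc}(L_1) = L_2 \cap D'_{\Sigma_1}$. By Lemma 6, there exist a projection $\pi_3 \colon \Sigma' \to \Sigma_1$ and a local set $L' \subseteq \Gamma_{\Sigma'}^*$ such that $L_2 \cap D'_{\Sigma_1} = \hat{\pi}_3(L' \cap D_{\Sigma'})$ and $\hat{\pi}_3$ is a bijection from $L' \cap D_{\Sigma'}$ to $L_2 \cap D'_{\Sigma_1}$. By Lemma 5, $\mathrm{enc}(L) = \mathrm{enc}(\pi_1(L_1)) = \hat{\pi}_1(\mathrm{enc}(L_1))$. Since $\mathrm{enc}$ is injective, $\hat{\pi}_1$ is a bijection from $\mathrm{enc}(L_1)$ to $\mathrm{enc}(L)$. Taking these all together, we get
\begin{align*}
\mathrm{enc}(L) &= \mathrm{enc}(\pi_1(L_1)) \\
&= \hat{\pi}_1(\mathrm{enc}(L_1)) \\
&= \hat{\pi}_1(L_2 \cap D'_{\Sigma_1}) \\
&= \hat{\pi}_1(\hat{\pi}_3(L' \cap D_{\Sigma'})) \\
&= (\hat{\pi}_1 \circ \hat{\pi}_3)(L' \cap D_{\Sigma'}) \\
&= \hat{\pi}(L' \cap D_{\Sigma'}),
\end{align*}
where $\pi = \pi_1 \circ \pi_3$.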
Since $\hat{\pi}_3$ is a bijection from $L' \cap D_{\Sigma'}$ to $L_2 \cap D'_{\Sigma_1} = \mathrm{enc}(L_1)$ and $\hat{\pi}_1$ is a bijection from $\mathrm{enc}(L_1)$ to $\mathrm{enc}(L)$, $\hat{\pi}$ is a bijection from $L' \cap D_{\Sigma'}$ to $\mathrm{enc}(L)$, and the second statement of the lemma follows. $\square$

Theorem 8 (Chomsky and Schützenberger). A language $L \subseteq \Sigma^*$ is context-free if and only if there exist a positive integer $n$, a local set $R \subseteq \Gamma_n^+$, and an alphabetic homomorphism $h \colon \Gamma_n^* \to \Sigma^*$ such that $L = h(R \cap D_n)$.

Proof. The "if" direction is by standard closure properties of the context-free languages. For the "only if" direction, let $L \subseteq \Sigma^*$ be a context-free language. Then Lemma 4 gives an alphabet $\Upsilon$ disjoint from $\Sigma$ and a local set $K \subseteq \mathbb{T}_{\Sigma}^{\Upsilon}$ such that $L = h_{\Sigma,\Upsilon}(\mathrm{enc}(K))$. By Lemma 7, there are a projection $\pi \colon \Upsilon' \to \Sigma \cup \Upsilon$ and a local set $R \subseteq \Gamma_{\Upsilon'}^+$ such that $\mathrm{enc}(K) = \hat{\pi}(R \cap D_{\Upsilon'})$. We have
\[ L = h_{\Sigma,\Upsilon}(\mathrm{enc}(K)) = h_{\Sigma,\Upsilon}(\hat{\pi}(R \cap D_{\Upsilon'})), \]
so the required condition holds with $n = |\Upsilon'|$ and $h = h_{\Sigma,\Upsilon} \circ \hat{\pi}$.¹¹ $\square$

In the proof of Theorem 8, $\mathrm{enc}^{-1} \circ \hat{\pi}$ is a bijection from $R \cap D_{\Upsilon'}$ to $K$. (See the second statement in Lemma 7.) If $K$ is the set of derivation trees of a context-free grammar for $L$, then an element $s$ of $R \cap D_{\Upsilon'}$ represents both the element $t = \mathrm{enc}^{-1}(\hat{\pi}(s))$ of $K$ and the element $h_{\Sigma,\Upsilon}(\hat{\pi}(s)) = h_{\Sigma,\Upsilon}(\mathrm{enc}(t)) = y(t)$ of $L$. Moreover, every pair $(t, y(t))$ with $t \in K$ is so represented. This is an important consequence of the theorem explicitly noted by Chomsky [4, page 377], though rarely emphasized since.¹²

¹¹ Here, $|\Upsilon'|$ denotes the cardinality of the set $\Upsilon'$. See footnote 9.

¹² Instead of a super-local set of trees, Chomsky [4] used the notion of a modified normal grammar, a restricted kind of grammar in Chomsky normal form.

We took a rather long route to the Chomsky-Schützenberger theorem.
Our generalization of the theorem to multi-dimensional tree languages follows a similar path, except that an analogue of Lemma 4 is not needed, since the multi-dimensional counterpart of the function enc is not exactly a generalization of the usual notion.

4 Multi-dimensional Trees

Multi-dimensional trees were introduced by Rogers [24,23]. In an ordinary (labeled, ordered, unranked) tree, the set of children of a node forms a linearly ordered sequence of labeled nodes, i.e., a string. In an $m$-dimensional tree ($m \ge 1$), the set of children of a node (if non-empty) forms an $(m-1)$-dimensional tree. A 0-dimensional tree just consists of a single labeled node.

Unlike Rogers [24,23], who introduces the higher-dimensional tree as a new kind of object, we prefer to define an $m$-dimensional tree as a special kind of ordinary $m$-ary tree.¹³ The first-child-next-sibling encoding of unranked trees will be a special case of this definition for $m = 2$.

We use finite strings of elements of $[1, m] = \{1, \dots, m\}$ to represent nodes of $m$-ary trees. We write $u \cdot v$ for the concatenation of finite strings $u, v$ over $[1, m]$, and write $\varepsilon$ for the empty string. An $m$-ary tree domain is any non-empty, finite, prefix-closed subset of $[1, m]^*$. (Since $\emptyset^* = \{\varepsilon\}$, the only 0-ary tree domain is $\{\varepsilon\}$.) If $T$ is an $m$-ary tree domain, we write $u \prec^T_i v$ to mean $u, v \in T$ and $u \cdot i = v$.

If $\Sigma$ is a (possibly infinite) set of symbols, an $m$-ary tree over $\Sigma$ is a pair $(T, \ell)$, where $T$ is an $m$-ary tree domain and $\ell$ is a function from $T$ to $\Sigma$. If $\boldsymbol{T} = (T, \ell^{\boldsymbol T})$ is an $m$-ary tree and $U \subseteq T$ is an $m$-ary tree domain, then the restriction of $\boldsymbol T$ to $U$ is the $m$-ary tree $\boldsymbol T \upharpoonright U = (U, \ell^{\boldsymbol T} \upharpoonright U)$. When $u \in T$, let $T/u = \{\, v \mid uv \in T \,\}$. Then $T/u$ is an $m$-ary tree domain and the subtree of $\boldsymbol T$ rooted at $u$ is defined by $\boldsymbol T/u = (T/u, \ell)$, where $\ell(v) = \ell^{\boldsymbol T}(uv)$.

Recall that a first-child-next-sibling encoding of an unranked tree is a binary tree $(T, \ell)$ such that $1 \notin T$.
Analogously, an $m$-dimensional tree is an $m$-ary tree $(T, \ell)$ such that $T$ is an $m$-ary tree domain included in a certain special subset of $[1, m]^*$. For each natural number $m$, define a subset $\mathbb{P}_m$ of $[1, m]^*$ by recursion, as follows:
\[ \mathbb{P}_0 = \{\varepsilon\}, \qquad \mathbb{P}_m = (m \cdot \mathbb{P}_{m-1})^* \quad \text{for } m \ge 1. \]
In other words, $w \in \mathbb{P}_m$ if and only if $w = i \cdot v$ implies $i = m$ and $w = u \cdot i \cdot j \cdot v$ implies $j \ge i - 1$.

¹³ To be precise, our $m$-dimensional trees form a special class of $m$-ary cardinal trees in the sense of Benoit et al. [2]. In $m$-ary cardinal trees, each node has $m$ slots for children, each of which may or may not be occupied, independently of the other slots. Cardinal trees are also known as tries.

For $m \ge 0$, an $m$-dimensional tree (over $\Sigma$) is an $m$-ary tree (over $\Sigma$) $\boldsymbol T = (T, \ell^{\boldsymbol T})$ such that $T \subseteq \mathbb{P}_m$. For $m \ge 1$, an $m$-dimensional hedge (over $\Sigma$) is an $m$-ary tree (over $\Sigma$) $\boldsymbol T = (T, \ell^{\boldsymbol T})$ such that $T \subseteq \mathbb{P}_{m-1} \cdot \mathbb{P}_m$. We write $\mathbb{T}^m_\Sigma$ and $\mathbb{H}^m_\Sigma$ to denote the set of all $m$-dimensional trees over $\Sigma$ and the set of all $m$-dimensional hedges over $\Sigma$, respectively. For $m \ge 1$, all $m$-dimensional trees are $m$-dimensional hedges. Note that a 1-dimensional hedge is just a 1-dimensional tree.

Note that a 0-dimensional tree is a structure $\boldsymbol T_c = (\{\varepsilon\}, \{(\varepsilon, c)\})$ consisting of a single node labeled by some $c \in \Sigma$. We may identify $\boldsymbol T_c$ with $c$; under this convention, $\mathbb{T}^0_\Sigma = \Sigma$. Note that if $m \ne n$, then $\mathbb{T}^m_\Sigma \cap \mathbb{T}^n_\Sigma = \mathbb{T}^0_\Sigma$.

A 1-dimensional tree is a non-empty, linearly ordered sequence of labeled nodes. We may use a string $c_1 \dots c_n \in \Sigma^+$ to denote the 1-dimensional tree $\boldsymbol T_{c_1 \dots c_n} = (\{\varepsilon, 1, \dots, 1^{n-1}\}, \ell)$, where $\ell(1^{i-1}) = c_i$ for $i = 1, \dots, n$. Under this convention, $\mathbb{T}^1_\Sigma = \Sigma^+$.

The first-child-next-sibling encodings of unranked trees and unranked hedges coincide with the 2-dimensional trees and the 2-dimensional hedges, respectively; we have $\mathbb{T}^2_\Sigma = \mathbb{T}_\Sigma$.

Henceforth, we use $\boldsymbol T$, $\boldsymbol T'$, $\boldsymbol U$, etc., as variables ranging over $m$-dimensional trees and $m$-dimensional hedges.
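The two descriptions of $\mathbb{P}_m$ given above (the recursion and the adjacent-letter condition) can be compared mechanically. The sketch below is only an illustration: it assumes nodes are represented as Python tuples of integers and trees as dicts from nodes to labels, and every function name is hypothetical.

```python
from itertools import product

def in_P(m, w):
    """Direct test: letters lie in [1, m], a non-empty w starts with m,
    and every letter j immediately following a letter i satisfies j >= i - 1."""
    if any(not 1 <= d <= m for d in w):
        return False
    if w and w[0] != m:
        return False
    return all(j >= i - 1 for i, j in zip(w, w[1:]))

def in_P_rec(m, w):
    """Recursive test following P_0 = {eps}, P_m = (m . P_{m-1})*."""
    if m == 0:
        return len(w) == 0
    if not w:
        return True
    if w[0] != m:
        return False
    # Strings in P_{m-1} never contain the letter m, so cutting w at each
    # occurrence of m is the only possible factorization into blocks.
    blocks, current = [], None
    for d in w:
        if d == m:
            if current is not None:
                blocks.append(current)
            current = []
        else:
            current.append(d)
    blocks.append(current)
    return all(in_P_rec(m - 1, tuple(b)) for b in blocks)

def is_tree_domain(nodes):
    """Non-empty and prefix-closed (finiteness is implied by using a Python set)."""
    return len(nodes) > 0 and all(v[:-1] in nodes for v in nodes if v)

def is_m_dimensional_domain(m, nodes):
    """Domain of an m-dimensional tree: a tree domain included in P_m."""
    return is_tree_domain(nodes) and all(in_P(m, v) for v in nodes)

def string_tree(s):
    """The 1-dimensional tree denoted by a non-empty string c_1 ... c_n."""
    return {(1,) * i: c for i, c in enumerate(s)}
```

For example, `(3, 2, 1, 3)` passes both tests for $m = 3$, whereas `(3, 1)` fails because the letter following 3 must be at least 2.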
Unless we indicate otherwise, we assume $\boldsymbol T = (T, \ell^{\boldsymbol T})$, $\boldsymbol T' = (T', \ell^{\boldsymbol T'})$, $\boldsymbol U = (U, \ell^{\boldsymbol U})$, etc.

Let $\boldsymbol T$ be an $m$-dimensional hedge. We can see that if $u \in T$, then
\[ \mathrm{ST}_i(\boldsymbol T, u) = (\boldsymbol T/u) \upharpoonright \{\, v \in \mathbb{P}_i \mid uv \in T \,\} \]
is always an $i$-dimensional tree, and
\[ \mathrm{SH}_i(\boldsymbol T, u) = (\boldsymbol T/u) \upharpoonright \{\, v \in \mathbb{P}_{i-1} \cdot \mathbb{P}_i \mid uv \in T \,\} \]
is always an $i$-dimensional hedge. In particular, when $u \cdot m \in T$, the subtree $\boldsymbol T/(u \cdot m) = \mathrm{SH}_m(\boldsymbol T, u \cdot m)$ is always an $m$-dimensional hedge.

For $i \ge 1$, we write $u \triangleleft^T_i v$ to mean $v \in T \cap u \cdot i \cdot \mathbb{P}_{i-1}$. When $u \triangleleft^T_i v$, we say that $v$ is a child of $u$ in the $i$-th dimension (in $\boldsymbol T$). If $u \prec^T_i v$, then $v$ is the first child of $u$ in the $i$-th dimension. Define
\[ C^T_i(u) = \{\, v \in \mathbb{P}_{i-1} \mid u \cdot i \cdot v \in T \,\} = \{\, v \mid u \triangleleft^T_i u \cdot i \cdot v \,\}. \]
If $u \cdot i \notin T$, that is, if $u \notin \mathrm{dom}(\prec^T_i)$, then $C^T_i(u) = \emptyset$. If $u \cdot i \in T$, define
\[ \boldsymbol C^{\boldsymbol T}_i(u) = \mathrm{ST}_{i-1}(\boldsymbol T, u \cdot i) = \boldsymbol T/(u \cdot i) \upharpoonright C^T_i(u). \]
Then $\boldsymbol C^{\boldsymbol T}_i(u)$ is always an $(i-1)$-dimensional tree.

We assume that elements of $[1, m]^*$ are alphabetically ordered, with $k + 1$ alphabetically preceding $k$. We write $u \triangleleft^T_{i,j} v$ to mean $v$ is the $j$-th node, under this ordering, among the children of $u$ in the $i$-th dimension. The degree of a node $v \in T$ is the number of children of $v$ in the $m$-th (i.e., highest) dimension.

A subset of $\mathbb{T}^m_\Sigma$ is an $m$-dimensional tree language. We allow $\Sigma$ to be an infinite set, but are usually interested in $m$-dimensional tree languages over some finite subset of $\Sigma$. We call a set $L \subseteq \mathbb{T}^m_\Sigma$ degree-bounded if there exists a $k$ such that for all $\boldsymbol T \in L$ and for all $v \in T$, the degree of $v$ does not exceed $k$.

It is sometimes helpful to use term-like notations for $m$-dimensional hedges and trees. Let $P$ be an $(m-1)$-ary tree domain included in $\mathbb{P}_{m-1}$ (i.e., a finite, non-empty, prefix-closed subset of $\mathbb{P}_{m-1}$), and let $u_1, \dots, u_k$ be the elements of $P$, in alphabetical order (which implies $u_1 = \varepsilon$). If $\boldsymbol T_1, \dots, \boldsymbol T_k \in \mathbb{T}^m_\Sigma$, then we write $P(\boldsymbol T_1, \dots, \boldsymbol T_k)$ to denote an $m$-dimensional hedge $\boldsymbol U = (U, \ell^{\boldsymbol U}) \in \mathbb{H}^m_\Sigma$ such that
\[ U = \bigcup_{i=1}^{k} u_i \cdot T_i, \qquad \mathrm{ST}_m(\boldsymbol U, u_i) = \boldsymbol T_i. \]
(As a degenerate case, we have $\{\varepsilon\}(\boldsymbol T) = \boldsymbol T$ for any $\boldsymbol T \in \mathbb{T}^m_\Sigma$.) If $\boldsymbol T \in \mathbb{H}^m_\Sigma$ and $c \in \Sigma$, then we write $c -_m \boldsymbol T$ to denote an $m$-dimensional tree $\boldsymbol V = (V, \ell^{\boldsymbol V}) \in \mathbb{T}^m_\Sigma$ such that
\[ V = \{\varepsilon\} \cup m \cdot T, \qquad \ell^{\boldsymbol V}(\varepsilon) = c, \qquad \mathrm{SH}_m(\boldsymbol V, m) = \boldsymbol T. \]
Combining the two notations, $c -_m P(\boldsymbol T_1, \dots, \boldsymbol T_k)$ denotes the $m$-dimensional tree $\boldsymbol T = (T, \ell^{\boldsymbol T})$ such that $\ell^{\boldsymbol T}(\varepsilon) = c$, $C^T_m(\varepsilon) = P$, and $\mathrm{ST}_m(\boldsymbol T, m \cdot u_i) = \boldsymbol T_i$, where $u_1, \dots, u_k$ is the alphabetical listing of the elements of $P$.¹⁴

¹⁴ An equivalent notation for $m$-dimensional trees has been used by Kasprzik [14].

[Fig. 1. A derivation tree of a simple context-free tree grammar represented as a 3-dimensional tree.]

Example 9. Derivation trees of a simple context-free tree grammar $G = (N, \Sigma, P, S)$ can be represented as 3-dimensional trees over the alphabet $N \cup \Sigma$. In these 3-dimensional trees, a node has children in the third dimension if and only if it is labeled by a nonterminal. For instance, the derivation tree $\pi_1(\pi_2(\pi_3))$ of the grammar from Example 1 may be represented by the 3-dimensional tree $\boldsymbol T$ in Fig. 1. In this tree, the node labeled by $S$ is the root; the edges in the third dimension are colored red, those in the second dimension blue, and those in the first dimension green. For instance, the node 3 (i.e., the child of the root in the third dimension) is labeled by the nonterminal $B$, and its children in the third dimension form a 2-dimensional tree corresponding to the right-hand side of the rule $\pi_2 = B(x_1 x_2) \to h(a_1\, B(h(a_2 x_1 a_3)\, h(a_4 x_2 a_5))\, a_6)$.
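The term notation and the child operators can be exercised on the tree of Example 9. The sketch below is an illustration under stated assumptions: trees are dicts from tuple-encoded nodes to labels, the helper names (`hedge`, `cons`, `child_tree`, and so on) are hypothetical, and the tree is built from the right-hand sides $S \to B(e\,e)$, the rule $\pi_2$ quoted above, and $B(x\,x) \to g(x\,x)$ as read off from Fig. 1.

```python
def in_P(m, w):
    """Membership in P_m: letters in [1, m], first letter m, and each letter
    j directly after a letter i satisfies j >= i - 1."""
    return (all(1 <= d <= m for d in w) and (not w or w[0] == m)
            and all(j >= i - 1 for i, j in zip(w, w[1:])))

def alpha_key(w):
    """Sort key for the alphabetical order in which k + 1 precedes k
    (a proper prefix precedes its extensions)."""
    return tuple(-d for d in w)

def leaf(c):
    """The single-node tree T_c."""
    return {(): c}

def hedge(P, trees):
    """P(T_1, ..., T_k): attach T_i at the i-th element of P in alphabetical order."""
    nodes = sorted(P, key=alpha_key)
    assert len(nodes) == len(trees)
    return {u + v: label for u, t in zip(nodes, trees) for v, label in t.items()}

def cons(c, m, h):
    """c -_m H: a new root labeled c whose children in dimension m form the hedge H."""
    tree = {(): c}
    tree.update({(m,) + v: label for v, label in h.items()})
    return tree

def child_tree(tree, u, i):
    """The (i-1)-dimensional tree formed by the children of u in dimension i."""
    prefix = u + (i,)
    k = len(prefix)
    return {w[k:]: label for w, label in tree.items()
            if w[:k] == prefix and in_P(i - 1, w[k:])}

def word(s):
    return tuple(int(ch) for ch in s)

P3 = [word(s) for s in ["", "2", "21"]]
P12 = [word(s) for s in ["", "2", "21", "212", "2122", "21221", "212211",
                         "2121", "21212", "212121", "2121211", "211"]]

inner = cons("B", 3, hedge(P3, [leaf("g"), leaf("x"), leaf("x")]))
args = ([leaf("h"), leaf("a1"), inner]
        + [leaf(c) for c in ["h", "a2", "x", "a3", "h", "a4", "x", "a5", "a6"]])
middle = cons("B", 3, hedge(P12, args))
T = cons("S", 3, hedge(P3, [middle, leaf("e"), leaf("e")]))
```

With this representation, `child_tree(T, (3, 3, 2, 1), 3)` is the 2-dimensional tree with root `g` and two `x` children, and `child_tree(T, (3, 3, 2, 1), 2)` is the 1-dimensional tree `hh`.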
The numbering of variables in the rules is eschewed in favor of a single variable $x$; the alphabetic ordering of the nodes labeled by $x$ among the children in the third dimension of a nonterminal-labeled node is assumed to correspond to the numbering.¹⁵ This tree can be represented in the term notation as follows (omitting the dots in the strings over $\{1, 2\}$ representing nodes):
\[
\begin{aligned}
S -_3 \{\varepsilon, 2, 21\}(\; & B -_3 \{\varepsilon, 2, 21, 212, 2122, 21221, 212211, 2121, 21212, 212121, 2121211, 211\}( \\
& \quad h,\ a_1,\ B -_3 \{\varepsilon, 2, 21\}(g, x, x),\ h,\ a_2,\ x,\ a_3,\ h,\ a_4,\ x,\ a_5,\ a_6\,), \\
& e,\ e\;).
\end{aligned}
\]
We have
\[ \boldsymbol C^{\boldsymbol T}_3(3 \cdot 3 \cdot 2 \cdot 1) = g -_2 \{\varepsilon, 1\}(x, x) = g(xx), \qquad \boldsymbol C^{\boldsymbol T}_2(3 \cdot 3 \cdot 2 \cdot 1) = h -_1 h = hh, \]
using both the notation introduced just above and the standard term and string representations for 2-dimensional and 1-dimensional trees.

¹⁵ It is known that simple context-free tree grammars satisfying this condition constitute a normal form. This use of a single variable instead of numbered variables is not crucial for our purposes.

5 Local and Super-local Sets of Multi-dimensional Trees

If $A, Z \subseteq \Sigma$ and $I \subseteq \Sigma \times \mathbb{T}^{m-1}_\Sigma$ are finite sets, we let $\mathrm{Loc}^m(A, Z, I)$ denote the set of all $m$-dimensional trees $\boldsymbol T = (T, \ell^{\boldsymbol T})$ in $\mathbb{T}^m_\Sigma$ that satisfy the following conditions:

L1. $\ell^{\boldsymbol T}(\varepsilon) \in A$,
L2. $v \in T - \mathrm{dom}(\prec^T_m)$ implies $\ell^{\boldsymbol T}(v) \in Z$, and
L3. $v \in \mathrm{dom}(\prec^T_m)$ implies $(\ell^{\boldsymbol T}(v), \boldsymbol C^{\boldsymbol T}_m(v)) \in I$.

A set $L \subseteq \mathbb{T}^m_\Sigma$ is local [24,23] if there exist finite sets $A, Z \subseteq \Sigma$ and $I \subseteq \Sigma \times \mathbb{T}^{m-1}_\Sigma$ such that $L = \mathrm{Loc}^m(A, Z, I)$. Note that if $L \subseteq \mathbb{T}^m_\Sigma$ is local, then $L$ must be degree-bounded; for, if $L = \mathrm{Loc}^m(A, Z, I)$, the degree of any node $v$ of $\boldsymbol T \in L$ is bounded by the maximal size of $\boldsymbol U$ such that $(c, \boldsymbol U) \in I$ for some $c$. Clearly, the notion of locality coincides with the usual notion [22,33,31] when $m \in \{1, 2\}$.

Let $m \ge 2$. Write $\mathbb{N}_+$ for $\mathbb{N} - \{0\}$ (the set of positive integers).
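If $A, Z, Y \subseteq \Sigma$, $K \subseteq \Sigma \times \Sigma$, and $J \subseteq \Sigma \times \{\, P \subseteq \mathbb{P}_{m-2} \mid P \text{ is an } (m-2)\text{-ary tree domain} \,\} \times \mathbb{N}_+ \times \Sigma$ are finite sets, then we let $\mathrm{SLoc}^m(A, Z, K, Y, J)$ denote the set of all trees $\boldsymbol T = (T, \ell^{\boldsymbol T})$ in $\mathbb{T}^m_\Sigma$ that satisfy the following conditions:

SL1. $\ell^{\boldsymbol T}(\varepsilon) \in A$,
SL2. $v \in T - \mathrm{dom}(\prec^T_m)$ implies $\ell^{\boldsymbol T}(v) \in Z$,
SL3. $u \prec^T_m v$ implies $(\ell^{\boldsymbol T}(u), \ell^{\boldsymbol T}(v)) \in K$,
SL4. $v \ne \varepsilon$ and $v \in T - \mathrm{dom}(\prec^T_{m-1})$ imply $\ell^{\boldsymbol T}(v) \in Y$, and
SL5. $u \in \mathrm{dom}(\prec^T_{m-1})$ and $u \triangleleft^T_{m-1,i} v$ imply $(\ell^{\boldsymbol T}(u), C^T_{m-1}(u), i, \ell^{\boldsymbol T}(v)) \in J$.

We call a set $L \subseteq \mathbb{T}^m_\Sigma$ super-local if there exist finite sets $A, Z, Y \subseteq \Sigma$, $K \subseteq \Sigma \times \Sigma$, and $J \subseteq \Sigma \times \{\, P \subseteq \mathbb{P}_{m-2} \mid P \text{ is an } (m-2)\text{-ary tree domain} \,\} \times \mathbb{N}_+ \times \Sigma$ such that $L = \mathrm{SLoc}^m(A, Z, K, Y, J)$. For $m = 2$, $\mathbb{P}_{m-2} = \mathbb{P}_0 = \{\varepsilon\}$, and $u \triangleleft^T_{1,i} v$ only if $i = 1$, so this definition generalizes our earlier definition of super-locality for subsets of $\mathbb{T}_\Sigma = \mathbb{T}^2_\Sigma$. It is easy to see that a degree-bounded super-local language must be local.

Although we allow $\Sigma$ to be infinite, any local or super-local set $L \subseteq \mathbb{T}^m_\Sigma$ must be an $m$-dimensional tree language over some finite subset of $\Sigma$.

Projections from $\Sigma$ to $\Sigma'$ are naturally extended to $m$-dimensional trees and hedges over $\Sigma$ and to $m$-dimensional tree languages over $\Sigma$.

The next lemma generalizes Lemma 3 to the higher-dimensional case. The proofs of two lemmas to follow (Lemmas 11 and 35) will be adaptations of the proof of this lemma.

Lemma 10. Let $m \ge 2$. For any local $m$-dimensional tree language $L \subseteq \mathbb{T}^m_\Sigma$, there exist a finite alphabet $\Sigma'$, a degree-bounded, super-local $m$-dimensional tree language $L' \subseteq \mathbb{T}^m_{\Sigma'}$, and a projection $\pi \colon \Sigma' \to \Sigma$ such that $L = \pi(L')$. Moreover, $\pi$ maps $L'$ bijectively to $L$.

Proof. The proof parallels that of Lemma 3. The idea is to change the label of each non-root node $v$ of $\boldsymbol T \in L$ to $(\boldsymbol C^{\boldsymbol T}_m(u), v')$, where $u \cdot m \cdot v' = v$ and $v' \in \mathbb{P}_{m-1}$. For uniformity, we change the label of the root from $c \in \Sigma$ to $(\boldsymbol T_c, \varepsilon)$, where $\boldsymbol T_c = (\{\varepsilon\}, \{(\varepsilon, c)\})$ is the single-node tree that we identified with $c$.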
The relabeled m-dimensional trees obtained this way form a super-local set, and we can get back the original m-dimensional trees by a projection. Let Σ 00 = { (T , v) | T = (T, `T ) ∈ Tm−1 , v ∈ T }, Σ and define a projection π : Σ 00 → Σ by π((T , v)) = `T (v). Suppose that A, Z ⊆ Σ and I ⊆ Σ × Tm−1 are finite sets such that L = Σ Locm (A, Z, I). Let F = { Tc | c ∈ A } ∪ { T | (c, T ) ∈ I }, Σ 0 = { (T , v) ∈ Σ 00 | T ∈ F }. Note that Σ 0 is a finite subset of Σ 00 . Now define A0 = { (Tc , ε) | c ∈ A }, Z 0 = { (T , v) ∈ Σ 0 | `T (v) ∈ Z }, K = { ((T , v), (T 0 , ε)) | (T , v) ∈ Σ 0 , (`T (v), T 0 ) ∈ I }, Y = { (T , v) ∈ Σ 0 | v 6∈ dom(≺Tm−1 ) }, T J = { ((T , u), Cm−1 (u), i, (T , v)) | (T , u) ∈ Σ 0 , u CTm−1,i v }. 0 These are all finite sets. Let L0 ⊆ Tm Σ 00 be the super-local set defined by L = m 0 0 0 0 m SLoc (A , Z , K, Y, J). It is quite clear that L ⊆ TΣ 0 . We show that L and π (restricted to Σ 0 ) satisfy the required properties. T̂ m For each T ∈ Tm Σ , define an m-dimensional tree T̂ = (T, ` ) ∈ TΣ 00 by `T̂ (ε) = (T`T (ε) , ε), T `T̂ (u · m · v) = (Cm (u), v), (4) T if u ∈ dom(≺Tm ) and v ∈ Cm−1 (u). (5) It is clear that π(T̂ ) = T for all T ∈ Tm Σ . Our goal is to show L0 = { T̂ | T ∈ L }. This clearly implies that π is a bijection from L0 to L. We show that for all T ∈ Tm Σ, T ∈ L if and only if T̂ ∈ L0 . This follows from five observations. Firstly, note the following: (6) Multi-dimensional Trees and a Representation Theorem for Simple CFTGs 21 – Suppose u ∈ T − dom(≺Tm−1 ). If `T̂ (u) = (U , v), then v 6∈ dom(≺U m−1 ). This means that `T̂ (u) ∈ Y if `T̂ (u) ∈ Σ 0 . – Suppose s CTm u = s · m · u0 CTm−1,i v = u · (m − 1) · v 0 . Then we have C T (s) m u0 Cm−1,i u0 · (m − 1) · v 0 and T `T̂ (u) = (Cm (s), u0 ), T `T̂ (v) = (Cm (s), u0 · (m − 1) · v 0 ), C T (s) T m Cm−1 (u) = Cm−1 (u0 ). T This means that (`T̂ (u), Cm−1 (u), i, `T̂ (v)) ∈ J if `T̂ (u) ∈ Σ 0 . 
Thus, T̂ satisfies the last two conditions SL4 and SL5 for membership in SLocm (A0 , Z 0 , K, Y, J) whenever T̂ ∈ Tm Σ 0 . Secondly, the following biconditional always holds: – `T (ε) ∈ A if and only if `T̂ (ε) ∈ A0 . Thirdly, the following biconditional holds whenever `T (v) ∈ Σ 0 : – `T (v) ∈ Z if and only if `T̂ (v) ∈ Z 0 . Fourthly, if u ≺Tm v and `T̂ (u) ∈ Σ 0 , then the following biconditional holds: T (u)) ∈ I if and only if (`T̂ (u), `T̂ (v)) ∈ K. – (`T (u), Cm Lastly, it is easy to see that T ∈ L implies T̂ ∈ Tm Σ 0 . Combining these five observations, we get (6). It follows from the “only if” direction of (6) that { T̂ | T ∈ L } ⊆ L0 . To establish the converse inclusion, we show that if T 0 ∈ L0 and T = π(T 0 ), then T 0 = T̂ . This together with the “if” direction of (6) clearly implies L0 ⊆ { T̂ | T ∈ L }. 0 So suppose T 0 = (T, `T ) ∈ L0 , and let T = (T, `T ) = π(T 0 ). All we need to show is that the equations (4) and (5) hold with T 0 in place of T̂ . 0 As for (4), it follows from the fact that `T (ε) ∈ A0 . 0 0 T T0 As for (5), suppose u ∈ dom(≺m ). Since (` (u), `T (u·m)) ∈ K, `T (u·m) = (V , ε) for some V ∈ F . We show two things: 0 T v ∈ Cm (u) implies v ∈ V and `T (u · m · v) = (V , v). V ⊆ T Cm (u). (7) (8) T T It then easily follows that V = Cm (u) and (5) holds whenever v ∈ Cm (u). T We show (7) by induction on v ∈ Cm (u). For v = ε, we already know that 0 ε ∈ V and `T (u · m) = (V , ε). If v 6= ε, we can write v = v 0 · (m − 1) · v 00 T with v 00 ∈ Pm−2 . Suppose u · m · v 0 CTm−1,i u · m · v. Since v 0 ∈ Cm (u), by 22 Makoto Kanazawa 0 induction hypothesis, v 0 ∈ V and `T (u · m · v 0 ) = (V , v 0 ). Since T 0 ∈ L0 , 0 0 T (`T (u · m · v 0 ), Cm−1 (u · m · v 0 ), i, `T (u · m · v)) ∈ J. By the definition of J, we T V must have Cm−1 (u · m · v 0 ) = Cm−1 (v 0 ), which implies v 0 CVm−1,i v ∈ V . The T0 definition of J then implies ` (u · m · v) = (V , v). Having established (7), we proceed to show (8) by induction on v ∈ V . 
For T v = ε, we have ε ∈ Cm (u) since u ∈ dom(≺Tm ). If v = v 0 · (m − 1) · v 00 with 00 0 T v ∈ Pm−2 , then v ∈ Cm (u) by induction hypothesis. Since v 0 ∈ dom(≺Vm−1 ), 0 `T (u·m·v 0 ) = (V , v 0 ) 6∈ Y . Since T 0 ∈ L0 , this means that u·m·v 0 ∈ dom(≺Tm−1 ) 0 0 T and so (`T (u · m · v 0 ), Cm−1 (u · m · v 0 ), 1, `T (u · m · v 0 · (m − 1))) ∈ J. Since 0 T V `T (u · m · v 0 ) = (V , v 0 ), the definition of J implies Cm−1 (u · m · v 0 ) = Cm−1 (v 0 ). T Since v = v 0 · (m − 1) · v 00 ∈ V , it follows that v 00 ∈ Cm−1 (u · m · v 0 ) and hence T v = v 0 · (m − 1) · v 00 ∈ Cm (u). This concludes the proof of the lemma. t u 6 Encoding and Yield at Higher Dimensions In order to prove an analogue of the Chomsky-Schützenberger theorem for the m-dimensional yields of local (m + 1)-dimensional tree languages, we need to define the higher-dimensional counterparts of the mappings enc, y, and of the Dyck languages. Since we use 3-dimensional trees to represent derivation trees of simple context-free tree grammars, the yield function mapping 3-dimensional trees to 2-dimensional trees must be consistent with the relation between derivation trees and their tree yield of simple context-free tree grammars. We set aside a special symbol x and use it to extend a given set Σ of symbols. The intended role of x in m-dimensional trees over Σ ∪ {x} is that of a placeholder; the encoding function erases all occurrences of x. We write Tm Σ (n) to denote the set of m-dimensional trees in Tm in which x labels exactly n nodes Σ∪{x} and none of these nodes have a child in the m-th dimension. Let T ∈ Tm Σ (n), T1 , . . . , Tn ∈ Tm , and let u , . . . , u be the nodes of T labeled by x, in the 1 n Σ∪{x} alphabetical order. Then we write T [T1 , . . . , Tn ] for the tree T 0 ∈ Tm Σ∪{x} such that T 0 = T ∪ u1 · T1 ∪ · · · ∪ un · Tn , ( 0 `T (v) if v ∈ T − {u1 , . . . , un }, `T (v) = Ti 0 ` (v ) if v = ui · v 0 . 
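Given an $m$-dimensional hedge $\boldsymbol T \in \mathbb{H}^m_{\Sigma \cup \{x\}}$, define a binary relation $\blacktriangleleft^{\boldsymbol T}_{m,i}$ on $T$ for each positive integer $i$, as follows: $u \blacktriangleleft^{\boldsymbol T}_{m,i} v$ if and only if $v$ is alphabetically the $i$-th node in $\{\, w \mid u \triangleleft^T_m w,\ \ell^{\boldsymbol T}(w) = x \,\}$.

Let $m \ge 2$. An $m$-dimensional hedge $\boldsymbol T \in \mathbb{H}^m_{\Sigma \cup \{x\}}$ is well-labeled if

– for all $v \in T$, $\ell^{\boldsymbol T}(v) = x$ implies $v \notin \mathrm{dom}(\prec^T_m) \cup \mathrm{dom}(\prec^T_{m-1})$, and
– for all $v \in \mathrm{dom}(\prec^T_m)$, $\boldsymbol C^{\boldsymbol T}_m(v) \in \mathbb{T}^{m-1}_\Sigma(n)$ implies $|C^T_{m-1}(v)| = n$.

We write $\mathbb{H}^m_{\Sigma,x}$ to denote the class of well-labeled $m$-dimensional hedges over $\Sigma \cup \{x\}$. If $\boldsymbol T \in \mathbb{H}^m_{\Sigma,x}$, then for each node $v \in \mathrm{dom}(\prec^T_m)$, there is a bijection between $\{\, u \in T \mid v \triangleleft^T_m u \text{ and } \ell^{\boldsymbol T}(u) = x \,\}$ and $\{\, u \in T \mid v \triangleleft^T_{m-1} u \,\}$, namely,
\[ \bigcup_{i \ge 1} \bigl( (\blacktriangleleft^{\boldsymbol T}_{m,i})^{-1} \circ \triangleleft^T_{m-1,i} \bigr). \]
We write $\mathbb{H}^m_{\Sigma,x}(n)$ for
\[ \{\, \boldsymbol T \in \mathbb{H}^m_{\Sigma,x} \mid \text{there are exactly } n \text{ nodes } v \in T \cap \mathbb{P}_{m-1} \text{ such that } \ell^{\boldsymbol T}(v) = x \,\}. \]
Note that if $\boldsymbol T \in \mathbb{H}^m_{\Sigma,x}(n)$, $\boldsymbol T$ may have more than $n$ nodes labeled by $x$. We write $\mathbb{T}^m_{\Sigma,x}$ for $\mathbb{H}^m_{\Sigma,x}(0) \cap \mathbb{T}^m_{\Sigma \cup \{x\}}$.

We will give suitable definitions of encoding and yield for elements of $\mathbb{T}^m_{\Sigma,x}$ shortly. Before that, here is a variant of Lemma 10 for languages consisting of well-labeled $m$-dimensional trees. If $\pi \colon \Sigma' \to \Sigma$ is a projection, we extend it to a projection $\pi \colon \Sigma' \cup \{x\} \to \Sigma \cup \{x\}$ by letting $\pi(x) = x$.

Lemma 11. Let $m \ge 2$. For any local $m$-dimensional tree language $L \subseteq \mathbb{T}^m_{\Sigma,x}$, there exist a finite alphabet $\Sigma'$, a degree-bounded, super-local $m$-dimensional tree language $L' \subseteq \mathbb{T}^m_{\Sigma',x}$, and a projection $\pi \colon \Sigma' \to \Sigma$ such that $L = \pi(L')$. Moreover, $\pi$ maps $L'$ bijectively to $L$.

Proof. Since the case where $L \subseteq \mathbb{T}^m_\Sigma$ is covered by Lemma 10, we assume $L \not\subseteq \mathbb{T}^m_\Sigma$. Without loss of generality, we may assume that $L = \mathrm{Loc}^m(A, Z, I)$, for some $A \subseteq \Sigma$, $Z \subseteq \Sigma \cup \{x\}$, $I \subseteq \Sigma \times \mathbb{T}^{m-1}_{\Sigma \cup \{x\}}$ such that

– $x \in Z$,
– $(c, \boldsymbol T) \in I$ implies $c \in \Sigma$ and $\ell^{\boldsymbol T}(v) \in \Sigma$ for all $v \in \mathrm{dom}(\prec^T_{m-1})$, and
– there exist $(c, \boldsymbol T) \in I$ and $v \in T$ such that $\ell^{\boldsymbol T}(v) = x$.

We modify the construction in the proof of Lemma 10 slightly.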
The difference is that where (T , v) would appear in the earlier construction, x appears instead just in case `T (v) = x. Otherwise, the proof is essentially the same. The definition of Σ 00 is changed as follows: T Σ 00 = { (T , v) | T ∈ Tm−1 Σ∪x , v ∈ T, ` (v) ∈ Σ }. The definition of π : Σ 00 → Σ remains the same: π((T , v)) = `T (v). As before, let F = { Tc | c ∈ A } ∪ { T | (c, T ) ∈ I }, Σ 0 = { (T , v) ∈ Σ 00 | T ∈ F }. The definitions of Z 0 , K, Y, J are modified as follows: A0 = { (Tc , ε) | c ∈ A }, Z 0 = {x} ∪ { (T , v) ∈ Σ 0 | `T (v) ∈ Z − {x} }, 24 Makoto Kanazawa 0 K = { ((T , v), (T 0 , ε)) | (T , v) ∈ Σ 0 , (`T (v), T 0 ) ∈ I, `T (ε) ∈ Σ } ∪ { ((T , v), x) | (T , v) ∈ Σ 0 , (`T (v), Tx ) ∈ I }, Y = {x} ∪ { (T , v) ∈ Σ 0 | v 6∈ dom(≺Tm−1 ) }, T (u), i, (T , v)) | (T , u) ∈ Σ 0 , u CTm−1,i v, `T (v) ∈ Σ } ∪ J = { ((T , u), Cm−1 T { ((T , u), Cm−1 (u), i, x) | (T , u) ∈ Σ 0 , u CTm−1,i v, `T (v) = x }. These are finite sets. As before, let L0 ⊆ Tm Σ 00 ∪{x} be the super-local set defined 0 0 0 by L = SLoc(A , Z , K, Y, J). It is easy to see that L0 ⊆ Tm Σ 0 ∪{x} . T̂ m For each T ∈ Tm Σ∪{x} , define an m-dimensional tree T̂ = (T, ` ) ∈ TΣ 00 ∪{x} by `T̂ (ε) = (T`T (ε) , ε), ( T (Cm (u), v) if `T (u · m · v) ∈ Σ, `T̂ (u · m · v) = x if `T (u · m · v) = x, (9) T for v ∈ Cm−1 (u). (10) It is clear that π(T̂ ) = T for all T ∈ Tm Σ∪{x} . Our goal is to show L0 = { T̂ | T ∈ L }. This clearly implies that π is a bijection from L0 to L. We show that for all T ∈ Tm Σ∪{x} , T ∈ L if and only if T̂ ∈ L0 . (11) This follows from five observations. Firstly, note the following: – Suppose u ∈ T − dom(≺Tm−1 ). If `T̂ (u) = (U , v), then v 6∈ dom(≺U m−1 ). This T̂ T̂ 0 means that ` (u) ∈ Y if ` (u) ∈ Σ ∪ {x}. – Suppose s CTm u = s · m · u0 CTm−1,i v = u · (m − 1) · v 0 . 
Then we have C T (s) m u0 Cm−1,i u0 · (m − 1) · v 0 and ( T (Cm (s), u0 ) if `T (u) ∈ Σ, ` (u) = x if `T (u) = x, ( T (Cm (s), u0 · (m − 1) · v 0 ) if `T (v) ∈ Σ, `T̂ (v) = x if `T (v) = x, T̂ C T (s) T m Cm−1 (u) = Cm−1 (u0 ). T This means that (`T̂ (u), Cm−1 (u), i, `T̂ (v)) ∈ J if `T̂ (u) ∈ Σ 0 . Thus, T̂ satisfies the last two conditions SL4 and SL5 for membership in SLocm (A0 , Z 0 , K, Y, J) whenever T̂ ∈ Tm Σ 0 ,x . Secondly, the following biconditional always holds: Multi-dimensional Trees and a Representation Theorem for Simple CFTGs 25 – `T (ε) ∈ A if and only if `T̂ (ε) ∈ A0 . Thirdly, the following biconditional holds whenever `T (v) ∈ Σ 0 ∪ {x}: – `T (v) ∈ Z if and only if `T̂ (v) ∈ Z 0 . Fourthly, if u ≺Tm v, `T̂ (u) ∈ Σ 0 , and either v 6∈ dom(≺Tm−1 ) or `T̂ (v) ∈ Σ 00 , then the following biconditional holds: T – (`T (u), Cm (u)) ∈ I if and only if (`T̂ (u), `T̂ (v)) ∈ K. Lastly, it is easy to see that T ∈ L implies T̂ ∈ Tm Σ 0 ,x . Combining these five observations, we get (11). It follows from the “only if” direction of (11) that { T̂ | T ∈ L } ⊆ L0 . To establish the converse inclusion, we show that if T 0 ∈ L0 and T = π(T 0 ), then T 0 = T̂ . This together with the “if” direction of (11) clearly implies L0 ⊆ { T̂ | T ∈ L }. 0 So suppose T 0 = (T, `T ) ∈ L0 , and let T = (T, `T ) = π(T 0 ). All we need to show is that the equations (9) and (10) hold with T 0 in place of T̂ . 0 As for (9), it follows from the fact that `T (ε) ∈ A0 . 0 0 0 T As for (10), suppose u ∈ dom(≺m ). Since (`T (u), `T (u·m)) ∈ K, `T (u·m) = (V , ε) for some V ∈ F . We show two things: ( (V , v) if `V (v) ∈ Σ, T T0 v ∈ Cm (u) implies v ∈ V and ` (u · m · v) = (12) x if `V (v) = x. T V ⊆ Cm (u). (13) T T (u). (u) and (10) holds whenever v ∈ Cm It then easily follows that V = Cm T We show (12) by induction on v ∈ Cm (u). For v = ε, we already know that 0 ε ∈ V and `T (u · m) = (V , ε). If v 6= ε, we can write v = v 0 · (m − 1) · v 00 with T (u), by induction v 00 ∈ Pm−2 . 
Suppose u · m · v 0 CTm−1,i u · m · v. Since v 0 ∈ Cm 0 T0 0 0 hypothesis, v ∈ V and ` (u · m · v ) is either (V , v ) or x depending on whether 0 0 T `V (v 0 ) ∈ Σ or not. Since T 0 ∈ L0 , (`T (u·m·v 0 ), Cm−1 (u·m·v 0 ), i, `T (u·m·v)) ∈ J. 0 T By the definition of J, we must have `T (u·m·v 0 ) = (V , v 0 ) and Cm−1 (u·m·v 0 ) = V 0 0 V Cm−1 (v ), which implies v Cm−1,i v ∈ V . The definition of J then implies 0 `T (u · m · v) is either (V , v) or x depending on whether `V (v) ∈ Σ or not. Having established (12), we proceed to show (13) by induction on v ∈ V . T For v = ε, we have ε ∈ Cm (u) since u ∈ dom(≺Tm ). If v = v 0 · (m − 1) · v 00 00 0 T with v ∈ Pm−2 , then v ∈ Cm (u) by induction hypothesis. Since V ∈ F and 0 0 V v ∈ dom(≺m−1 ), by the assumption about I, `V (v 0 ) ∈ Σ and so `T (u · m · v 0 ) = (V , v 0 ) 6∈ Y . Since T 0 ∈ L0 , this means that u · m · v 0 ∈ dom(≺Tm−1 ) 0 0 T (u · m · v 0 ), 1, `T (u · m · v 0 · (m − 1))) ∈ J. Since and so (`T (u · m · v 0 ), Cm−1 0 T V (u · m · v 0 ) = Cm−1 (v 0 ). `T (u · m · v 0 ) = (V , v 0 ), the definition of J implies Cm−1 0 00 00 T 0 Since v = v · (m − 1) · v ∈ V , it follows that v ∈ Cm−1 (u · m · v ) and hence T v = v 0 · (m − 1) · v 00 ∈ Cm (u). This concludes the proof of the lemma. t u 26 Makoto Kanazawa Fix m ≥ 2 and Σ. For each c ∈ Σ and each (possibly empty) finite prefixclosed subset P of Pm−1 , define Γc,P = { (c, P, i) | 0 ≤ i ≤ |P | }. We consider Γc,P to be a group of symbols that match with each other; this notion of a matching group of symbols generalizes the notion of a matching pair of brackets. Let [ e = Σ ∪ { Γc,P | c ∈ Σ and P ⊆ Pm−1 is finite and prefix-closed }. Σ e is an infinite set.16 Note that Σ T Let T ∈ Hm+1 Σ,x . For u ∈ Cm (ε), let Tu = { v ∈ Pm · Pm+1 | m · u · v ∈ T }, i.e., the domain of SH m+1 (T , m · u). Then we have ( S {ε} ∪ u∈Cm if m + 1 6∈ T , T (ε) m · u · Tu T = S {ε} ∪ (m + 1) · (T /(m + 1)) ∪ u∈Cm if m + 1 ∈ T . 
T (ε) m · u · Tu Thus, T is completely determined by the following pieces of information: – – – – – `T (ε), T (ε), Cm T (ε), SH m+1 (T , m · u) for each u ∈ Cm whether or not m + 1 ∈ T , and in case m + 1 ∈ T , the (m + 1)-dimensional hedge SH m+1 (T , m + 1) = T /(m + 1). T Let P = Cm (ε), k = |P |, and for i = 1, . . . , k, ε CTm,i m · ui and Ti = SH m+1 (T , m · ui ). In case m + 1 ∈ T or k ≥ 1 (i.e., m ∈ T ), let c = `T (ε) ∈ Σ. The m-dimensional encoding of T , encm (T ) in symbols, is defined as follows: T`T (ε) if m + 1 6∈ T and k = 0, c − P (enc (T ), . . . , enc (T )) if m + 1 6∈ T and k ≥ 1, m m 1 m k if m + 1 ∈ T and (c, P, 0) −m (encm (T0 ))[ encm (T ) = (c, P, 1) −m encm (T1 ), T0 = SH m+1 (T , m + 1). ..., (c, P, k) −m encm (Tk ) ] (The substitution notation in the last clause presupposes encm (T0 ) ∈ Tm (k), e Σ and this is indeed the case as shown by the following lemma.) 16 e from Σ in this way, we assume that Σ ∩Γc,P = ∅ for all c ∈ Σ and When we define Σ all finite and prefix-closed P ⊆ Pm−1 . Technically, this assumption may not always be satisfied; nevertheless, we always regard the symbols in Γc,P as “new” symbols. If more rigor is desired, it can be achieved by complicating the definition of Γc,P . Multi-dimensional Trees and a Representation Theorem for Simple CFTGs 27 S B e e h B a1 a5 h a6 x a4 a3 h x a2 g x x (S, ∅, 0) (B, P, 0) e a5 e h h a6 (B, P, 2) a4 (B, P, 0) a3 a1 h (B, P, 1) a2 g (B, P, 2) (B, P, 1) Fig. 2. A well-labeled 3-dimensional tree and its 2-dimensional encoding (P = {ε, 1}). Example 12. Fig. 2 shows the 3-dimensional tree T from Example 9, which is in T3N ∪Σ,x , where N = {S, B} and Σ = {h, g, a1 , a2 , a3 , a4 , a5 , a6 }, along with enc2 (T ). Here, P = {ε, 1}. As before, the edges in the third dimension are colored red, those in the second dimension blue, and those in the first dimension green. m Lemma 13. If T ∈ Hm+1 (n). Σ,x (n), then encm (T ) ∈ TΣ e Proof. Induction on the size of T . 
Let c, P , k, and Ti be as above. Suppose T ∈ m+1 T m Hm+1 (1). Σ,x (n). If ` (ε) = x, then T = Tx ∈ HΣ,x (1), and encm (T ) = Tx ∈ TΣ e m+1 T If ` (ε) = c ∈ Σ, then Ti ∈ HΣ,x (ni ) for i = 1, . . . , k, where n = n1 + · · · + nk . By induction hypothesis, encm (Ti ) ∈ Tm (ni ). Suppose m + 1 6∈ T . Then e Σ it is easy to see encm (T ) ∈ Tm (n). Now suppose m + 1 ∈ T . Since T is e Σ m (k). well-labeled, T0 ∈ Hm+1 Σ,x (k). By induction hypothesis, encm (T0 ) ∈ TΣ e 28 Makoto Kanazawa Also, (c, P, i) −m encm (Ti ) is in Tm (ni ) for i = 1, . . . , k. It easily follows e Σ that encm (T ) = (c, P, 0) −m (encm (T0 ))[(c, P, 1) −m encm (T1 ), . . . , (c, P, k) −m encm (Tk )] ∈ Tm (n). t u e Σ Note that in encm (T ), every node with a label of the form (c, P, i) (i ≥ 0) has exactly one child in the m-th dimension. There is a simple way of deleting any collection of such nodes from an m-dimensional tree to produce another m-dimensional tree. Let T be any m-dimensional tree, and assume that U ⊆ T only consists of T nodes v such that |Cm (v)| = 1. Define a function fU : T → [1, m]∗ by fU (ε) = ε, fU (v · i) = fU (v) · i for i < m, ( fU (v) · m if v 6∈ U , fU (v · m) = fU (v) if v ∈ U . Let T 0 = ran(fU ) = { fU (v) | v ∈ T } and fU0 = fU (T − U ). Then it is easy to see that T 0 = ran(fU0 ), T 0 is a non-empty prefix-closed subset of Pm , and fU0 is a bijection from T − U to T 0 . Define delm (T , U ) = (T 0 , `0 ), where `0 (v) = `T ((fU0 )−1 (v)). Since T 0 is a non-empty prefix-closed subset of Pm , it follows that delm (T , U ) is an m-dimensional tree. Let Υ ⊆ Σ and T ∈ Tm Σ . We define delm,Υ (T ) = delm (T , U ), where T U = { v ∈ T | `T (v) ∈ Υ and |Cm (v)| = 1 }. Now let T ∈ Tm+1 Σ,x (m ≥ 2). The m-dimensional yield of T is defined as follows: ym (T ) = delm,Σ−Σ e (encm (T )). It is easy to see that ym (T ) ∈ Tm Σ. m It is of course straightforward to define ym : Hm+1 Σ,x (n) → TΣ (n) directly: Tc c − P (y (T ), . . . 
, y (T )) m m 1 m k ym (T ) = (ym (T0 ))[ym (T1 ), . . . , ym (Tk )] if m + 1 6∈ T and k = 0, if m + 1 6∈ T and k ≥ 1, if m + 1 ∈ T and T0 = SH m+1 (T , m + 1), T where, as before, c = `T (ε), P = Cm (ε), k = |P |, and for i = 1, . . . , k, ε CTm,i m · ui and Ti = SH m+1 (T , m·ui ). The indirect definition through encm , however, is Multi-dimensional Trees and a Representation Theorem for Simple CFTGs 29 (S, ∅, 0) (B, P, 0) h a1 (B, P, 0) a6 g (B, P, 1) (B, P, 2) h h h a1 a2 (B, P, 1) a3 a4 (B, P, 2) e a5 g h a2 e a6 h e a3 a4 a5 e Fig. 3. The 2-dimensional encoding and 2-dimensional yield of a well-labeled 3dimensional tree (P = {ε, 1}). useful for our generalization of the Chomsky-Schützenberger theorem for multidimensional tree languages. The case m = 2 of the above definition of ym is meant to capture the notion of the (tree) yield of a derivation tree of a simple context-free tree grammar, which we represent as a (well-labeled) 3-dimensional tree. The definitions of encm , delm,Υ , ym are all applicable to the case m = 1 as well, but the resulting definitions of enc1 and of y1 will not be equivalent to the standard ones, so we will continue to treat m = 1 as a special case. Example 14. Fig. 3 shows enc2 (T ) (the same tree as the lower tree in Fig. 2 with the nodes rearranged) and y2 (T ), where T is the 3-dimensional tree from Example 9 (the upper tree in Fig. 2). 7 Multi-dimensional Dyck Languages We continue to work with the alphabet [ e = Σ ∪ { Γc,P | c ∈ Σ and P is a finite prefix-closed subset of Pm−1 }, Σ as defined in the previous section. The range of the function encm : Hm+1 Σ,x (0) → Tm forms a special subset of Tm similar to Dyck languages. e e Σ Σ Let us define a rewriting relation on Tm : e Σ T T0 holds if there exist some v0 , v1 , . . . , vn ∈ T (n ≥ 0), c ∈ Σ, and finite prefix-closed subset P of Pm such that17 17 ∗ Note that for u, v ∈ T , u (CTm ) v is equivalent to v ∈ u · Pm , and u (CTm ) equivalent to v ∈ u · m · Pm−1 · Pm . 
+ v is 30 Makoto Kanazawa T 0 = delm (T , {v0 , v1 , . . . , vn }), T |Cm (vi )| = 1 for i = 0, 1, . . . , n, `T (vi ) = (c, P, i) for i = 0, 1, . . . , n, n = |P |, v1 , . . . , vn is the alphabetical listing of {v1 , . . . , vn }, + v0 (CTm ) vi for i = 1, . . . , n, ∗ for every i, j ∈ {1, . . . , n}, if vi (CTm ) vj , then i = j, and + for every u ∈ T , if v0 (CTm ) u and there is no i ∈ {1, . . . , n} such that T ∗ T vi (Cm ) u, then ` (u) ∈ Σ. – – – – – – – – Using the term notation, we can write T T 0 if and only if T = U [(c, P, 0) −m T0 [(c, P, 1) −m T1 , . . . , (c, P, n) −m Tn ]], T 0 = U [T0 [T1 , . . . , Tn ]]) m m for some U ∈ Tm e (1), T0 ∈ TΣ (n), Ti ∈ T e (i = 1, . . . , n), Σ Σ c ∈ Σ, finite and prefix-closed P ⊆ Pm−1 with |P | = n. Define the m-dimensional Dyck tree language over Σ by18 m DT m Σ = { T ∈ Te | T Σ ∗ T 0 ∈ Tm Σ }. Note that the alphabet of DT m Σ (i.e., the set of labels that appear in elements of DT m Σ ) is infinite. Just like the ordinary Dyck language Dn of strings over Γn has an alternative inductive definition in terms of a context-free grammar, so too the m-dimensional tree language DT m Σ admits an inductive definition. First, let us extend the definition of to a rewriting relation on Tm (n) by taking the exact same definition e Σ as before, requiring `T (u) ∈ Σ rather than `T (u) ∈ Σ ∪ {x} in the consequent of the last condition. Note that this relation is confluent: Lemma 15. Let T , T1 , T2 ∈ Tm (n). If T T1 and T e Σ some T 0 ∈ Tm (n) such that T1 T 0 and T2 T 0. e Σ For each n ∈ N, we define DT m Σ (n) by m DT m Σ (n) = { T ∈ T e (n) | T Σ ∗ T2 , then there exists T 0 ∈ Tm Σ (n) }. m Clearly, DT m Σ (0) = DT Σ . 
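The deletion operation $\mathrm{del}_m$, on which the rewriting relation above relies, can be sketched as follows. This is an illustration only: trees are assumed to be dicts from tuple-encoded nodes to labels, the function names are hypothetical, and the single-child test uses the fact that in a prefix-closed domain included in $\mathbb{P}_m$ a node $v$ has a second child in dimension $m$ exactly when $v \cdot m \cdot (m-1)$ is a node.

```python
def has_single_child(tree, v, m):
    """True iff v has exactly one child in dimension m, i.e. |C_m(v)| = 1."""
    if v + (m,) not in tree:
        return False
    # For m = 1 the child set is a subset of P_0 = {eps}, so it has at most one element.
    return m == 1 or v + (m, m - 1) not in tree

def delete(tree, U, m):
    """del_m(T, U): remove the nodes in U, each of which must have exactly one
    child in dimension m; that child takes the place of the deleted node."""
    assert all(has_single_child(tree, v, m) for v in U)
    f = {}
    for v in sorted(tree, key=len):          # parents are processed before children
        if not v:
            f[v] = ()
        else:
            parent, i = v[:-1], v[-1]
            if i < m or parent not in U:
                f[v] = f[parent] + (i,)      # keep the edge
            else:
                f[v] = f[parent]             # contract the edge below a deleted node
    return {f[v]: label for v, label in tree.items() if v not in U}

def delete_labels(tree, labels, m):
    """del_{m,Y}(T): delete every node whose label is in Y and that has
    exactly one child in dimension m."""
    U = {v for v, label in tree.items()
         if label in labels and has_single_child(tree, v, m)}
    return delete(tree, U, m)
```

For instance, deleting the node labeled `g` from the 2-dimensional tree for the term `f(g(a) b)` yields the tree for `f(a b)`; a node labeled `b` is never deleted here because it has no child in the second dimension.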
Then we can prove that $(X_n)_{n \in \mathbb{N}} = (\mathit{DT}^m_\Sigma(n))_{n \in \mathbb{N}}$ is the least (in terms of the partial order defined by componentwise inclusion) sequence of sets that satisfies the following closure conditions:

I1. $T_c \in X_0$ for all $c \in \Sigma$.
I2. $T_x \in X_1$.
I3. If $c \in \Sigma$, $P$ is a finite, non-empty, prefix-closed subset of $\mathbb{P}_{m-1}$, $k = |P|$, $T_1 \in X_{n_1}, \dots, T_k \in X_{n_k}$, and $n = \sum_{i=1}^k n_i$, then $c -_m P(T_1, \dots, T_k) \in X_n$.
I4. If $c \in \Sigma$, $P$ is a (possibly empty) finite prefix-closed subset of $\mathbb{P}_{m-1}$, $k = |P|$, $T_1 \in X_{n_1}, \dots, T_k \in X_{n_k}$, $T_0 \in X_k$, and $n = \sum_{i=1}^k n_i$, then $(c, P, 0) -_m T_0[(c, P, 1) -_m T_1, \dots, (c, P, k) -_m T_k] \in X_n$.

18 For dimension $m = 2$, analogous notions of Dyck tree language have been proposed by Matsubara and Kasai [19] and by Arnold and Dauchet [1] to capture the tree languages generated by tree-adjoining grammars and by (general) context-free tree grammars, respectively.

Lemma 16. $(X_n)_{n \in \mathbb{N}} = (\mathit{DT}^m_\Sigma(n))_{n \in \mathbb{N}}$ satisfies I1–I4.

Lemma 17. If $T \in \mathit{DT}^m_\Sigma(n)$, then one of the following conditions holds:

C1. $n = 0$ and $T = T_c$ for some $c \in \Sigma$.
C2. $n = 1$ and $T = T_x$.
C3. $n = \sum_{i=1}^k n_i$ for some $k \ge 1$, $n_1, \dots, n_k \ge 0$ and $T = c -_m P(T_1, \dots, T_k)$ for some $c \in \Sigma$, some finite prefix-closed subset $P$ of $\mathbb{P}_{m-1}$ such that $|P| = k$, and some $T_1 \in \mathit{DT}^m_\Sigma(n_1), \dots, T_k \in \mathit{DT}^m_\Sigma(n_k)$.
C4. $n = \sum_{i=1}^k n_i$ for some $k \ge 0$, $n_1, \dots, n_k \ge 0$ and $T = (c, P, 0) -_m T_0[(c, P, 1) -_m T_1, \dots, (c, P, k) -_m T_k]$ for some $c \in \Sigma$, some finite prefix-closed subset $P$ of $\mathbb{P}_{m-1}$ such that $|P| = k$, and some $T_1 \in \mathit{DT}^m_\Sigma(n_1), \dots, T_k \in \mathit{DT}^m_\Sigma(n_k)$, $T_0 \in \mathit{DT}^m_\Sigma(k)$.

Proof. Suppose $T \in \mathit{DT}^m_\Sigma(n)$. If $|T| = 1$, then clearly, either C1 or C2 holds.

If $\ell^T(\varepsilon) \in \Sigma$ and $C^T_m(\varepsilon) = P \ne \emptyset$, let $k = |P|$. Then $T = c -_m P(T_1, \dots, T_k)$ for some $T_1 \in \mathbb{T}^m_{\tilde\Sigma}(n_1), \dots, T_k \in \mathbb{T}^m_{\tilde\Sigma}(n_k)$ such that $\sum_{i=1}^k n_i = n$. Since $T \rightsquigarrow^* T'$ for some $T' \in \mathbb{T}^m_\Sigma(n)$, it is clear that for $i = 1, \dots, k$, $T_i \rightsquigarrow^* T_i'$ for some $T_i' \in \mathbb{T}^m_\Sigma(n_i)$. Therefore, C3 holds.

Now suppose $\ell^T(\varepsilon) = (c, P, i)$. Since $T \rightsquigarrow^* T' \in \mathbb{T}^m_\Sigma(n)$, it is easy to see that $i = 0$ and $C^T_m(\varepsilon) = \{\varepsilon\}$. Let $k = |P|$ and let $T_0' \in \mathbb{T}^m_\Sigma(k)$, $T_1', \dots, T_k'$ be such that
\[ T \rightsquigarrow^* (c, P, 0) -_m T_0'[(c, P, 1) -_m T_1', \dots, (c, P, k) -_m T_k'] \rightsquigarrow T_0'[T_1', \dots, T_k'] \rightsquigarrow^* T'. \]
Then it is easy to see that for $i = 1, \dots, k$, $T_i' \rightsquigarrow^* T_i'' \in \mathbb{T}^m_\Sigma(n_i)$ for some $n_i$ such that $n = \sum_{i=1}^k n_i$. Also, we must have
\[ T = (c, P, 0) -_m T_0[(c, P, 1) -_m T_1, \dots, (c, P, k) -_m T_k] \]
for some $T_0 \in \mathbb{T}^m_{\tilde\Sigma}(k)$, $T_1 \in \mathbb{T}^m_{\tilde\Sigma}(n_1), \dots, T_k \in \mathbb{T}^m_{\tilde\Sigma}(n_k)$ such that $T_0 \rightsquigarrow^* T_0'$ and for $i = 1, \dots, k$, $T_i \rightsquigarrow^* T_i'$. It follows that $T_0 \in \mathit{DT}^m_\Sigma(k)$ and for $i = 1, \dots, k$, $T_i \in \mathit{DT}^m_\Sigma(n_i)$, i.e., C4 holds. □

Theorem 18. $(\mathit{DT}^m_\Sigma(n))_{n \in \mathbb{N}}$ is the least sequence of sets that satisfies I1–I4.

Proof. By Lemma 16, we know that $(\mathit{DT}^m_\Sigma(n))_{n \in \mathbb{N}}$ satisfies I1–I4. Let $(X_n)_{n \in \mathbb{N}}$ be any sequence of sets satisfying I1–I4. To establish that $\mathit{DT}^m_\Sigma(n) \subseteq X_n$ holds for all $n \in \mathbb{N}$, we prove by induction on the number of nodes of $T \in \bigcup_n \mathbb{T}^m_{\tilde\Sigma}(n)$ that $T \in \mathit{DT}^m_\Sigma(n)$ implies $T \in X_n$. If $T \in \mathit{DT}^m_\Sigma(n)$, by Lemma 17, one of C1–C4 holds. In case C1 or C2 holds, $T \in X_n$ by I1 or I2. If C3 holds, then by induction hypothesis, $T = c -_m P(T_1, \dots, T_k)$, where $T_i \in X_{n_i}$ for $i = 1, \dots, k$. Then by I3, $T \in X_n$. If C4 holds, then by induction hypothesis, $T = (c, P, 0) -_m T_0[(c, P, 1) -_m T_1, \dots, (c, P, k) -_m T_k]$, where $n = \sum_{i=1}^k n_i$, $T_0 \in X_k$, and $T_i \in X_{n_i}$ for $i = 1, \dots, k$. Then by I4, $T \in X_n$. □

Just as in the case of ordinary Dyck languages, the inductive definition of $\mathit{DT}^m_\Sigma(n)$ in terms of the closure conditions I1–I4 is unambiguous in the sense that every $T \in \mathit{DT}^m_\Sigma(n)$ can be written in the form of one of the equations in C1–C4, in exactly one way. This follows from the next lemma:

Lemma 19. Let $U \in \mathit{DT}^m_\Sigma(k)$, $U' \in \mathit{DT}^m_\Sigma(l)$. If
\[ U[(c, P, i_1) -_m T_1, \dots, (c, P, i_k) -_m T_k] = U'[(c, P, j_1) -_m T_1', \dots, (c, P, j_l) -_m T_l'] \]
with $i_1, \dots, i_k, j_1, \dots, j_l \ge 1$, then $U = U'$.

Proof. This can be proved by straightforward induction on the size of $U$, using Lemma 17. □

Lemma 20. $\{\, \mathrm{enc}_m(T) \mid T \in \mathbb{H}^{m+1}_{\Sigma,x}(n) \,\} = \mathit{DT}^m_\Sigma(n)$.

Proof. ($\subseteq$). We show by induction on the size of $T \in \mathbb{H}^{m+1}_{\Sigma,x}$ that $T \in \mathbb{H}^{m+1}_{\Sigma,x}(n)$ implies $\mathrm{enc}_m(T) \in \mathit{DT}^m_\Sigma(n)$. Let $P = C^T_m(\varepsilon)$, and $k = |P|$. Assume that for $i = 1, \dots, k$, $\varepsilon \lhd^T_{m,i} m \cdot u_i$ and $T_i = \mathit{SH}^{m+1}(T, m \cdot u_i)$. Clearly, for each $i = 1, \dots, k$, $T_i \in \mathbb{H}^{m+1}_{\Sigma,x}(n_i)$ for some $n_i$ such that $n = \sum_{i=1}^k n_i$. By induction hypothesis, $\mathrm{enc}_m(T_i) \in \mathit{DT}^m_\Sigma(n_i)$.

Case 1. $m + 1 \in T$. Then $\ell^T(\varepsilon) = c$ for some $c \in \Sigma$. Let $T_0 = \mathit{SH}^{m+1}(T, m + 1)$. Then $T_0 \in \mathbb{H}^{m+1}_{\Sigma,x}(k)$. By induction hypothesis, we also have $\mathrm{enc}_m(T_0) \in \mathit{DT}^m_\Sigma(k)$. Then $\mathrm{enc}_m(T) = (c, P, 0) -_m (\mathrm{enc}_m(T_0))[(c, P, 1) -_m \mathrm{enc}_m(T_1), \dots, (c, P, k) -_m \mathrm{enc}_m(T_k)] \in \mathit{DT}^m_\Sigma(n)$ by the closure condition I4.

Case 2. $m + 1 \notin T$. If $P \ne \emptyset$, then $\ell^T(\varepsilon) = c$ for some $c \in \Sigma$, and $\mathrm{enc}_m(T) = c -_m P(\mathrm{enc}_m(T_1), \dots, \mathrm{enc}_m(T_k)) \in \mathit{DT}^m_\Sigma(n)$ by the closure condition I3. If $P = \emptyset$, then $T \in \mathbb{H}^{m+1}_{\Sigma,x}(0)$ or $T \in \mathbb{H}^{m+1}_{\Sigma,x}(1)$ depending on whether $\ell^T(\varepsilon) = c \in \Sigma$ or $\ell^T(\varepsilon) = x$. In the former case, $\mathrm{enc}_m(T) = T_c \in \mathit{DT}^m_\Sigma(0)$ by the closure condition I1. In the latter case, $\mathrm{enc}_m(T) = T_x \in \mathit{DT}^m_\Sigma(1)$ by the closure condition I2.

($\supseteq$). By Theorem 18, it suffices to prove that $(X_n)_{n \in \mathbb{N}} = (\{\, \mathrm{enc}_m(T) \mid T \in \mathbb{H}^{m+1}_{\Sigma,x}(n) \,\})_{n \in \mathbb{N}}$ satisfies the conditions I1–I4. I1 is satisfied since $T_c = \mathrm{enc}_m(T_c)$ and $T_c \in \mathbb{H}^{m+1}_{\Sigma,x}(0)$. I2 is satisfied since $T_x = \mathrm{enc}_m(T_x)$ and $T_x \in \mathbb{H}^{m+1}_{\Sigma,x}(1)$.

To check that I3 is satisfied, let $c \in \Sigma$, $P$ be a finite, non-empty, prefix-closed subset of $\mathbb{P}_{m-1}$, $k = |P|$, and for each $i = 1, \dots, k$, $T_i \in \mathbb{H}^{m+1}_{\Sigma,x}(n_i)$, where $n = \sum_{i=1}^k n_i$. Let $u_1, \dots, u_k$ be the elements of $P$, in alphabetical order.
Define $T$ by
\[ T = \{\varepsilon\} \cup \bigcup_{i=1}^{k} m \cdot u_i \cdot T_i, \qquad \ell^T(\varepsilon) = c, \qquad \ell^T(m \cdot u_i \cdot v) = \ell^{T_i}(v) \ \text{ for } v \in T_i. \]
Then $T \in \mathbb{H}^{m+1}_{\Sigma,x}(n)$ and $T_i = \mathit{SH}^{m+1}(T, m \cdot u_i)$ for $i = 1, \dots, k$. By the definition of $\mathrm{enc}_m$, $\mathrm{enc}_m(T) = c -_m P(\mathrm{enc}_m(T_1), \dots, \mathrm{enc}_m(T_k))$. This shows that I3 is satisfied.

To check that I4 is satisfied, let $c \in \Sigma$, $P$ be a finite, possibly empty, prefix-closed subset of $\mathbb{P}_{m-1}$, $k = |P|$, $T_0 \in \mathbb{H}^{m+1}_{\Sigma,x}(k)$, and for each $i = 1, \dots, k$, $T_i \in \mathbb{H}^{m+1}_{\Sigma,x}(n_i)$, where $n = \sum_{i=1}^k n_i$. Let $u_1, \dots, u_k$ list the elements of $P$, in alphabetical order. Define $T$ by
\[ T = \{\varepsilon\} \cup (m + 1) \cdot T_0 \cup \bigcup_{i=1}^{k} m \cdot u_i \cdot T_i, \]
\[ \ell^T(\varepsilon) = c, \qquad \ell^T((m + 1) \cdot v) = \ell^{T_0}(v) \ \text{ for } v \in T_0, \qquad \ell^T(m \cdot u_i \cdot v) = \ell^{T_i}(v) \ \text{ for } v \in T_i. \]
Then it is easy to see $T \in \mathbb{H}^{m+1}_{\Sigma,x}(n)$, $T_0 = \mathit{SH}^{m+1}(T, m + 1)$, and for $i = 1, \dots, k$, $T_i = \mathit{SH}^{m+1}(T, m \cdot u_i)$. By the definition of $\mathrm{enc}_m$,
\[ \mathrm{enc}_m(T) = (c, P, 0) -_m (\mathrm{enc}_m(T_0))[(c, P, 1) -_m (\mathrm{enc}_m(T_1)), \dots, (c, P, k) -_m (\mathrm{enc}_m(T_k))]. \]
This shows that I4 is satisfied. □

Lemma 21. For each $m \ge 2$, $\mathrm{enc}_m$ is an injection.

Proof. This follows from the unambiguity of the inductive definition of $\mathit{DT}^m_\Sigma(n)$. □

It is useful to define a function $f^T_{\mathrm{enc}_m}$ from the nodes of $T \in \mathbb{H}^{m+1}_{\Sigma,x}$ to the nodes of $T' = \mathrm{enc}_m(T)$. Let $P = C^T_m(\varepsilon)$ and $k = |P|$. Let $u_1, \dots, u_k$ list the elements of $P$ in alphabetical order, and let $T_i = \mathit{SH}^{m+1}(T, m \cdot u_i)$ for $i = 1, \dots, k$. Define $f^T_{\mathrm{enc}_m} \colon T \to T'$ by

(i) $f^T_{\mathrm{enc}_m}(\varepsilon) = \varepsilon$.
(ii) If $m + 1 \in T$ and $T_0 = \mathit{SH}^{m+1}(T, m + 1)$, then
\[ f^T_{\mathrm{enc}_m}((m + 1) \cdot w) = m \cdot f^{T_0}_{\mathrm{enc}_m}(w) \quad \text{where } w \in T_0, \]
\[ f^T_{\mathrm{enc}_m}(m \cdot u_i \cdot w) = m \cdot f^{T_0}_{\mathrm{enc}_m}(v_i) \cdot m \cdot f^{T_i}_{\mathrm{enc}_m}(w) \quad \text{where } w \in T_i \text{ and } \varepsilon \blacktriangleleft^T_{m+1,i} v_i. \]
(iii) If $m + 1 \notin T$, then
\[ f^T_{\mathrm{enc}_m}(m \cdot u_i \cdot w) = m \cdot u_i \cdot f^{T_i}_{\mathrm{enc}_m}(w) \quad \text{where } w \in T_i. \]

It is easy to check that $f^T_{\mathrm{enc}_m}(v) \in T'$ indeed holds for all $v \in T$.

Example 22. Consider the 3-dimensional tree $T$ and its 2-dimensional encoding $\mathrm{enc}_2(T)$, depicted in Fig. 2.
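In these diagrams, the nodes that are related by $f^T_{\mathrm{enc}_m}$ are placed in roughly the same geometrical positions.

Lemma 23. Let $T \in \mathbb{H}^{m+1}_{\Sigma,x}$ and $T' = \mathrm{enc}_m(T)$. For each $v \in T$, we have
\[ \ell^{T'}(f^T_{\mathrm{enc}_m}(v)) = \begin{cases} c & \text{if } v \notin \mathrm{dom}(\prec^T_{m+1}) \text{ and } \ell^T(v) = c \in \Sigma, \\ (c, U, 0) & \text{if } v \in \mathrm{dom}(\prec^T_{m+1}),\ \ell^T(v) = c, \text{ and } C^T_m(v) = U, \\ (c, U, i) & \text{if } \ell^T(v) = x,\ u \blacktriangleleft^T_{m+1,i} v,\ \ell^T(u) = c, \text{ and } C^T_m(u) = U, \\ x & \text{if } \ell^T(v) = x \text{ and } v \in T \cap \mathbb{P}_m. \end{cases} \]

Proof. This is easy to see from the definition of $\mathrm{enc}_m$. □

The function $f^T_{\mathrm{enc}_m}$ allows an alternative definition by recursion with respect to the alphabetical order on the nodes of $T$.

Lemma 24. The function $f^T_{\mathrm{enc}_m}$ satisfies the following equations:
\[ f^T_{\mathrm{enc}_m}(\varepsilon) = \varepsilon, \qquad f^T_{\mathrm{enc}_m}(u \cdot (m + 1)) = f^T_{\mathrm{enc}_m}(u) \cdot m, \]
and for $v \in \mathbb{P}_{m-1}$,
\[ f^T_{\mathrm{enc}_m}(u \cdot m \cdot v) = \begin{cases} f^T_{\mathrm{enc}_m}(u) \cdot m \cdot v & \text{if } u \notin \mathrm{dom}(\prec^T_{m+1}), \\ f^T_{\mathrm{enc}_m}(u \cdot (m + 1) \cdot w) \cdot m & \text{if } u \in \mathrm{dom}(\prec^T_{m+1}),\ u \lhd^T_{m,i} v, \text{ and } u \blacktriangleleft^T_{m+1,i} w. \end{cases} \]

Proof. The first equation is true by definition. The remaining two equations can be proved by induction on the length of $u$. □

Lemma 25. For all $T \in \mathbb{H}^{m+1}_{\Sigma,x}$, $f^T_{\mathrm{enc}_m}$ is a bijection from the nodes of $T$ to the nodes of $\mathrm{enc}_m(T)$.

Proof. That $f^T_{\mathrm{enc}_m}$ is injective can be shown by induction with respect to the alphabetical order on $T$. Suppose $f^T_{\mathrm{enc}_m}(t_1) = f^T_{\mathrm{enc}_m}(t_2)$. If $f^T_{\mathrm{enc}_m}(t_1) = \varepsilon$, then clearly $t_1 = t_2 = \varepsilon$. If $f^T_{\mathrm{enc}_m}(t_1) \in \mathbb{P}_m \cdot m \cdot v$ for $v \in \mathbb{P}_{m-1} - \{\varepsilon\}$, then we must have $t_1 = u_1 \cdot m \cdot v$ and $t_2 = u_2 \cdot m \cdot v$ with $f^T_{\mathrm{enc}_m}(u_1) = f^T_{\mathrm{enc}_m}(u_2)$. By the induction hypothesis, $u_1 = u_2$ and hence $t_1 = t_2$. If $f^T_{\mathrm{enc}_m}(t_1) \in \mathbb{P}_m \cdot m$, then for each $j \in \{1, 2\}$, one of the following holds:

(i) $t_j = u_j \cdot (m + 1)$ and $f^T_{\mathrm{enc}_m}(t_j) = f^T_{\mathrm{enc}_m}(u_j) \cdot m$.
(ii) $t_j = u_j \cdot m$ and $f^T_{\mathrm{enc}_m}(t_j) = f^T_{\mathrm{enc}_m}(u_j) \cdot m$, where $u_j \notin \mathrm{dom}(\prec^T_{m+1})$.
(iii) $t_j = u_j \cdot m \cdot v_j$ and $f^T_{\mathrm{enc}_m}(t_j) = f^T_{\mathrm{enc}_m}(u_j \cdot (m + 1) \cdot w_j) \cdot m$, where $u_j \in \mathrm{dom}(\prec^T_{m+1})$, $u_j \lhd^T_{m,i} v_j$, and $u_j \blacktriangleleft^T_{m+1,i} w_j$.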
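If the same case applies to both $t_1$ and $t_2$, then the induction hypothesis implies $t_1 = t_2$. If (i) applies to $t_1$ and (ii) applies to $t_2$, then $f^T_{\mathrm{enc}_m}(u_1) = f^T_{\mathrm{enc}_m}(u_2)$, but since $u_1 \in \mathrm{dom}(\prec^T_{m+1})$ and $u_2 \notin \mathrm{dom}(\prec^T_{m+1})$, the induction hypothesis says this is impossible. If (i) applies to $t_1$ and (iii) applies to $t_2$, then $f^T_{\mathrm{enc}_m}(u_1) = f^T_{\mathrm{enc}_m}(u_2 \cdot (m + 1) \cdot w_2)$. Since $u_1 \in \mathrm{dom}(\prec^T_{m+1})$ and $u_2 \cdot (m + 1) \cdot w_2 \notin \mathrm{dom}(\prec^T_{m+1})$ (by the fact that $T \in \mathbb{H}^{m+1}_{\Sigma,x}$), the induction hypothesis says this is impossible. If (ii) applies to $t_1$ and (iii) applies to $t_2$, then $f^T_{\mathrm{enc}_m}(u_1) = f^T_{\mathrm{enc}_m}(u_2 \cdot (m + 1) \cdot w_2)$. Since $u_1 \in \mathrm{dom}(\prec^T_m)$ and $u_2 \cdot (m + 1) \cdot w_2 \notin \mathrm{dom}(\prec^T_m)$ (again by the fact that $T \in \mathbb{H}^{m+1}_{\Sigma,x}$), the induction hypothesis says this is impossible. The remaining cases are symmetric.

We have shown that $f^T_{\mathrm{enc}_m}$ is injective. Since $T$ and $T'$ have the same number of nodes, $f^T_{\mathrm{enc}_m}$ is indeed a bijection. □

The $m$-dimensional counterpart $\mathit{DT}'^{\,m}_\Sigma$ of the set of Dyck primes ($m \ge 2$) is defined by
\[ \mathit{DT}'^{\,m}_\Sigma = \{\, (c, \emptyset, 0) -_m T \mid c \in \Sigma,\ T \in \mathit{DT}^m_\Sigma \,\} \cup \{\, T_c \mid c \in \Sigma \,\}. \]

Lemma 26. For all $T \in \mathbb{H}^{m+1}_{\Sigma,x}(0)$, $T \in \mathbb{T}^{m+1}_{\Sigma,x}$ if and only if $\mathrm{enc}_m(T) \in \mathit{DT}'^{\,m}_\Sigma$.

Define a function $\rho \colon \tilde\Sigma \to \Sigma \cup \{x\}$ by
\[ \rho(c) = c \ \text{ for } c \in \Sigma, \qquad \rho((c, U, 0)) = c, \qquad \rho((c, U, i)) = x \ \text{ for } 1 \le i \le |U|. \]
Then for every $T \in \mathbb{H}^{m+1}_{\Sigma,x}$ and $v \in T$, if $T' = \mathrm{enc}_m(T)$, we have
\[ \rho(\ell^{T'}(f^T_{\mathrm{enc}_m}(v))) = \ell^T(v). \]

The following is a generalization of Lemma 2 to higher dimensions:

Lemma 27. Let $L \subseteq \mathbb{T}^{m+1}_{\Sigma,x}$. If $L$ is super-local, then there exists a local set $L' \subseteq \mathbb{T}^m_{\tilde\Sigma}$ such that $\mathrm{enc}_m(L) = L' \cap \mathit{DT}'^{\,m}_\Sigma$.

Proof. Let $L$ be a super-local subset of $\mathbb{T}^{m+1}_{\Sigma,x}$. Without loss of generality, we may suppose that $L = \mathrm{SLoc}^{m+1}(A, Z, K, Y, J)$ for some finite sets
\[ A \subseteq \Sigma, \quad Z \subseteq \Sigma \cup \{x\}, \quad K \subseteq \Sigma \times (\Sigma \cup \{x\}), \quad Y \subseteq \Sigma \cup \{x\}, \]
\[ J \subseteq \Sigma \times \{\, P \subseteq \mathbb{P}_{m-1} \mid P \text{ is an } (m - 1)\text{-ary tree domain} \,\} \times \mathbb{N}_+ \times (\Sigma \cup \{x\}). \]
Let
\[ \Sigma' = (Z \cap \Sigma) \cup \bigcup \{\, \Gamma_{c,U} \mid x \in Y \cap Z,\ (c, U, i, a) \in J,\ (c, b) \in K \,\} \cup \{\, (c, \emptyset, 0) \mid c \in A \cup Y,\ (c, a) \in K \,\}. \]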
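Note that $\Sigma'$ is a finite subset of $\tilde\Sigma$. Define finite sets $A', Z' \subseteq \Sigma'$ and $I \subseteq \Sigma' \times \mathbb{T}^{m-1}_{\Sigma'}$ by
\[ A' = (A \cap Z) \cup \{\, (c, \emptyset, 0) \mid c \in A,\ (c, a) \in K \,\}, \qquad Z' = (A \cup Y) \cap Z \cap \Sigma, \]
\begin{align*}
I ={} & \{\, ((c, U, 0), T_d) \mid (c, U, 0) \in \Sigma',\ d \in Z \cap \Sigma,\ (c, d) \in K \,\} \\
& \cup \{\, ((c, U, 0), T_{(d,V,0)}) \mid (c, U, 0), (d, V, 0) \in \Sigma',\ (c, d) \in K, \text{ and either } V \ne \emptyset \text{ or } d \in Y \,\} \\
& \cup \{\, ((c, U, 0), T_{(c,U,1)}) \mid (c, U, 1) \in \Sigma',\ |U| = 1,\ (c, x) \in K \,\} \\
& \cup \{\, ((c, U, i), T_d) \mid (c, U, i) \in \Sigma',\ (c, U, i, d) \in J,\ d \in Z \cap \Sigma \,\} \\
& \cup \{\, ((c, U, i), T_{(d,V,0)}) \mid (c, U, i) \in \Sigma',\ (c, U, i, d) \in J,\ (d, V, 0) \in \Sigma', \text{ and either } V \ne \emptyset \text{ or } d \in Y \,\} \\
& \cup \{\, ((c, U, i), T_{(d,V,j)}) \mid (c, U, i) \in \Sigma',\ (c, U, i, x) \in J,\ j \ge 1,\ (d, V, j) \in \Sigma' \,\} \\
& \cup \{\, (c, U) \mid c \in Z \cap \Sigma,\ U \in \mathbb{T}^{m-1}_{\Sigma'}, \text{ for } i = 1, \dots, |U|, \text{ if } u_i \text{ is the } i\text{-th node of } U, \\
& \qquad \text{then } (c, U, i, \rho(\ell^U(u_i))) \in J, \text{ and } \ell^U(u_i) = (d, \emptyset, 0) \text{ implies } d \in Y \,\}.
\end{align*}
We show that $L' = \mathrm{Loc}^m(A', Z', I)$ satisfies the desired property.

To prove $\mathrm{enc}_m(L) \subseteq L' \cap \mathit{DT}'^{\,m}_\Sigma$, suppose $T \in L$ and let $T' = \mathrm{enc}_m(T)$. By Lemma 26, $T' \in \mathit{DT}'^{\,m}_\Sigma$, so it suffices to show $T' \in L'$.

L1. First we show $\ell^{T'}(\varepsilon) \in A'$. Let $c = \ell^T(\varepsilon)$. Since $T \in L$, $c \in A$. Suppose first $\varepsilon \notin \mathrm{dom}(\prec^T_{m+1})$. Then since $T \in L$, $c \in Z$. By the definition of $\mathrm{enc}_m$, $\ell^{T'}(\varepsilon) = c \in A'$. Now suppose $\varepsilon \in \mathrm{dom}(\prec^T_{m+1})$. Then since $T \in L$, $(c, \ell^T(m + 1)) \in K$. Since $T$ is an $(m + 1)$-dimensional tree, $C^T_m(\varepsilon) = \emptyset$. By the definition of $\mathrm{enc}_m$, $\ell^{T'}(\varepsilon) = (c, \emptyset, 0) \in A'$.

L2. Suppose $v \in T' - \mathrm{dom}(\prec^{T'}_m)$. Let $u = (f^T_{\mathrm{enc}_m})^{-1}(v)$. By the definition of $f^T_{\mathrm{enc}_m}$, it is clear that $u \notin \mathrm{dom}(\prec^T_{m+1}) \cup \mathrm{dom}(\prec^T_m)$ and $\ell^T(u) \in \Sigma$. By Lemma 23, $\ell^{T'}(v) = \ell^T(u)$. Since $T \in L$, $\ell^T(u) \in (A \cup Y) \cap Z$, and this implies that $\ell^{T'}(v) \in Z'$.

L3. Suppose $v \in \mathrm{dom}(\prec^{T'}_m)$ and $U' = C^{T'}_m(v)$. Let $u = (f^T_{\mathrm{enc}_m})^{-1}(v)$.

Case 1. $u \in \mathrm{dom}(\prec^T_{m+1})$. In this case, $\ell^{T'}(v) = (c, U, 0)$, where $c = \ell^T(u)$ and $U = C^T_m(u)$. Since $T \in L$, we have $(c, \ell^T(u \cdot (m + 1))) \in K$. If $u \notin \mathrm{dom}(\prec^T_m)$, then $U = \emptyset$, and either $u = \varepsilon$ and $c \in A$ or $c \in Y$.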
If $u \in \mathrm{dom}(\prec^T_m)$, then $(c, U, 1, \ell^T(u \cdot m)) \in J$. In either case, we have $(c, U, 0) \in \Sigma'$. Note that $U' = \{\varepsilon\}$ and $v \cdot m = f^T_{\mathrm{enc}_m}(u \cdot (m + 1))$. We have
\[ \ell^{T'}(v \cdot m) = \begin{cases} d & \text{if } \ell^T(u \cdot (m + 1)) = d \in \Sigma \text{ and } u \cdot (m + 1) \notin \mathrm{dom}(\prec^T_{m+1}), \\ (d, V, 0) & \text{if } \ell^T(u \cdot (m + 1)) = d \in \Sigma,\ u \cdot (m + 1) \in \mathrm{dom}(\prec^T_{m+1}), \text{ and } C^T_m(u \cdot (m + 1)) = V, \\ (c, U, 1) & \text{if } \ell^T(u \cdot (m + 1)) = x. \end{cases} \]
Since $T \in L$, we have $d \in Z \cap \Sigma \subseteq \Sigma'$ in the first case. In the second case, $(d, \ell^T(u \cdot (m + 1) \cdot (m + 1))) \in K$ and either $V = \emptyset$ and $d \in Y$ or $(d, V, 1, \ell^T(u \cdot (m + 1) \cdot m)) \in J$, which implies $(d, V, 0) \in \Sigma'$. In the third case, since $T$ is well-labeled, we have $|U| = 1$ and $u \cdot (m + 1) \notin \mathrm{dom}(\prec^T_m)$, and $T \in L$ implies $x \in Y \cap Z$ and $(c, U, 1, \ell^T(u \cdot m)) \in J$. It follows that $(c, U, 1) \in \Sigma'$. In each case, we have $((c, U, 0), U') = ((c, U, 0), T_{\ell^{T'}(v \cdot m)}) \in I$.

Case 2. $u \notin \mathrm{dom}(\prec^T_{m+1})$.

Case 2a. $\ell^T(u) = c \in \Sigma$. In this case, $\ell^{T'}(v) = c \in \Sigma \cap Z \subseteq \Sigma'$ and it is easy to see from Lemma 24 that $U' = C^T_m(u)$ and $f^T_{\mathrm{enc}_m}(u \cdot m \cdot w) = v \cdot m \cdot w$ for all $w \in C^T_m(u)$. Since $T \in L$, if $u \lhd^T_{m,i} u \cdot m \cdot u_i$, then $(c, C^T_m(u), i, \ell^T(u \cdot m \cdot u_i)) = (c, U', i, \rho(\ell^{T'}(v \cdot m \cdot u_i))) = (c, U', i, \rho(\ell^{U'}(u_i))) \in J$. Also, if $\ell^{U'}(u_i) = \ell^{T'}(v \cdot m \cdot u_i) = (d, \emptyset, 0)$ for some $d \in \Sigma$, then $u \cdot m \cdot u_i \notin \mathrm{dom}(\prec^T_m)$ and hence $d \in Y$. It follows that $(c, U') \in I$.

Case 2b. $\ell^T(u) = x$. In this case, for some $w, z \in T - \{\varepsilon\}$, $w \blacktriangleleft^T_{m+1,i} u$, $w \lhd^T_{m,i} z$. By Lemma 24, we have $U' = \{\varepsilon\}$ and $v \cdot m = f^T_{\mathrm{enc}_m}(z)$. Let $c = \ell^T(w) \in \Sigma$ and $U = C^T_m(w)$. Since $T \in L$, $(c, \ell^T(w \cdot m)) \in K$ and $(c, U, i, \ell^T(z)) \in J$. Also, $\ell^T(u) = x \in Y \cap Z$, since $T$ is well-labeled. Hence $\ell^{T'}(v) = (c, U, i) \in \Sigma'$ and
\[ \ell^{T'}(v \cdot m) = \begin{cases} d & \text{if } \ell^T(z) = d \in \Sigma \text{ and } z \notin \mathrm{dom}(\prec^T_{m+1}), \\ (d, V, 0) & \text{if } \ell^T(z) = d \in \Sigma,\ z \in \mathrm{dom}(\prec^T_{m+1}), \text{ and } C^T_m(z) = V, \\ (d, V, j) & \text{if } \ell^T(z) = x,\ t \blacktriangleleft^T_{m+1,j} z,\ \ell^T(t) = d, \text{ and } C^T_m(t) = V. \end{cases} \]
In the first case, since $T \in L$, $d \in Z$.
In the second case, $(d, \ell^T(z \cdot (m + 1))) \in K$, and if $V = \emptyset$, then $z \notin \mathrm{dom}(\prec^T_m)$ and hence $d \in Y$. In the third case, $(d, \ell^T(t \cdot (m + 1))) \in K$. It follows that in each case, $((c, U, i), U') = ((c, U, i), T_{\ell^{T'}(v \cdot m)}) \in I$.

We have proved that $T'$ satisfies the requirements L1–L3 for membership in $L' = \mathrm{Loc}^m(A', Z', I)$.

To show $L' \cap \mathit{DT}'^{\,m}_\Sigma \subseteq \mathrm{enc}_m(L)$, let $T' \in L' \cap \mathit{DT}'^{\,m}_\Sigma$. By Lemma 26, there is a $T \in \mathbb{T}^{m+1}_{\Sigma,x}$ such that $T' = \mathrm{enc}_m(T)$. It suffices to prove that $T \in L$. We show that $T$ satisfies all requirements for membership in $L = \mathrm{SLoc}^{m+1}(A, Z, K, Y, J)$.

SL1. The fact that $\ell^{T'}(\varepsilon) \in A'$ easily implies $\ell^T(\varepsilon) = \rho(\ell^{T'}(\varepsilon)) \in A$.

SL2. Let $v \in T - \mathrm{dom}(\prec^T_{m+1})$.

Case 1. $\ell^T(v) = c \in \Sigma$. Then $\ell^{T'}(f^T_{\mathrm{enc}_m}(v)) = c \in \Sigma' \cap \Sigma \subseteq Z$, and it follows that $c \in Z$.

Case 2. $\ell^T(v) = x$. Then $\ell^{T'}(f^T_{\mathrm{enc}_m}(v)) = (d, V, i) \in \Sigma'$ for some $V \ne \emptyset$ and $i \ge 1$, and it follows that $x \in Z$.

SL3. Let $v \in \mathrm{dom}(\prec^T_{m+1})$. We show $(\ell^T(v), \ell^T(v \cdot (m + 1))) \in K$. By Lemma 24, we have $f^T_{\mathrm{enc}_m}(v \cdot (m + 1)) = f^T_{\mathrm{enc}_m}(v) \cdot m$. Let $c = \ell^T(v)$ and $U = C^T_m(v)$. Then $\ell^{T'}(f^T_{\mathrm{enc}_m}(v)) = (c, U, 0)$ and $C^{T'}_m(f^T_{\mathrm{enc}_m}(v)) = T_{\ell^{T'}(f^T_{\mathrm{enc}_m}(v) \cdot m)}$. Since $T' \in L'$, it holds that $((c, U, 0), T_{\ell^{T'}(f^T_{\mathrm{enc}_m}(v) \cdot m)}) \in I$.

Case 1. $\ell^T(v \cdot (m + 1)) = d \in \Sigma$. Then $\ell^{T'}(f^T_{\mathrm{enc}_m}(v) \cdot m)$ is either $(d, V, 0)$ for some $V$ or $d$, depending on whether $v \cdot (m + 1) \in \mathrm{dom}(\prec^T_{m+1})$ or not. Either way, $((c, U, 0), T_{\ell^{T'}(f^T_{\mathrm{enc}_m}(v) \cdot m)}) \in I$ implies $(\ell^T(v), \ell^T(v \cdot (m + 1))) = (c, d) \in K$.

Case 2. $\ell^T(v \cdot (m + 1)) = x$. Then $|U| = 1$ and $\ell^{T'}(f^T_{\mathrm{enc}_m}(v) \cdot m) = (c, U, 1)$. Since $((c, U, 0), T_{\ell^{T'}(f^T_{\mathrm{enc}_m}(v) \cdot m)}) \in I$, we have $(\ell^T(v), \ell^T(v \cdot (m + 1))) = (c, x) \in K$.

SL4. Suppose $v \ne \varepsilon$ and $v \in T - \mathrm{dom}(\prec^T_m)$. We show $\ell^T(v) \in Y$.

Case 1. $\ell^T(v) = c \in \Sigma$.

Case 1a. $v \in \mathrm{dom}(\prec^T_{m+1})$. Then $\ell^{T'}(f^T_{\mathrm{enc}_m}(v)) = (c, \emptyset, 0)$. Since $v \ne \varepsilon$, $f^T_{\mathrm{enc}_m}(v) \ne \varepsilon$ and there are some $u \in T'$ and $w \in \mathbb{P}_{m-1}$ such that $u \cdot m \cdot w = f^T_{\mathrm{enc}_m}(v)$. Since $T' \in L'$, $(\ell^{T'}(u), C^{T'}_m(u)) \in I$.
Since $(c, \emptyset, 0) = \ell^{T'}(f^T_{\mathrm{enc}_m}(v)) = \ell^{C^{T'}_m(u)}(w)$, the definition of $I$ implies $c \in Y$.

Case 1b. $v \notin \mathrm{dom}(\prec^T_{m+1})$. Then $\ell^{T'}(f^T_{\mathrm{enc}_m}(v)) = c$. Since $v \notin \mathrm{dom}(\prec^T_m)$ and $\ell^T(v) \ne x$, Lemma 24 implies $f^T_{\mathrm{enc}_m}(v) \notin \mathrm{dom}(\prec^{T'}_m)$. Since $T' \in L'$, $c \in Z'$, which implies $c \in Y$.

Case 2. $\ell^T(v) = x$. Then $\ell^{T'}(f^T_{\mathrm{enc}_m}(v)) = (d, V, j) \in \Sigma'$ for some $V \ne \emptyset$ and $j \ge 1$, which implies $x \in Y$.

SL5. Let $u \lhd^T_{m,i} v$, $C^T_m(u) = U$, $c = \ell^T(u)$. We show $(c, U, i, \ell^T(v)) \in J$.

Case 1. $u \in \mathrm{dom}(\prec^T_{m+1})$. Then $\ell^{T'}(f^T_{\mathrm{enc}_m}(u)) = (c, U, 0)$ and for some $w$, $u \blacktriangleleft^T_{m+1,i} w$, $\ell^{T'}(f^T_{\mathrm{enc}_m}(w)) = (c, U, i)$, $f^T_{\mathrm{enc}_m}(v) = f^T_{\mathrm{enc}_m}(w) \cdot m$, and $C^{T'}_m(f^T_{\mathrm{enc}_m}(w)) = T_{\ell^{T'}(f^T_{\mathrm{enc}_m}(v))}$. Since $T' \in L'$, $((c, U, i), T_{\ell^{T'}(f^T_{\mathrm{enc}_m}(v))}) \in I$. We have
\[ \ell^{T'}(f^T_{\mathrm{enc}_m}(v)) = \begin{cases} d & \text{if } \ell^T(v) = d \in \Sigma \text{ and } v \notin \mathrm{dom}(\prec^T_{m+1}), \\ (d, V, 0) & \text{if } \ell^T(v) = d \in \Sigma,\ v \in \mathrm{dom}(\prec^T_{m+1}), \text{ and } C^T_m(v) = V, \\ (d, V, j) & \text{if } \ell^T(v) = x,\ z \blacktriangleleft^T_{m+1,j} v,\ \ell^T(z) = d \in \Sigma, \text{ and } C^T_m(z) = V. \end{cases} \]
In each case, $((c, U, i), T_{\ell^{T'}(f^T_{\mathrm{enc}_m}(v))}) \in I$ implies $(c, U, i, \ell^T(v)) \in J$.

Case 2. $u \notin \mathrm{dom}(\prec^T_{m+1})$. In this case, $\ell^{T'}(f^T_{\mathrm{enc}_m}(u)) = c$, $f^T_{\mathrm{enc}_m}(u \cdot m \cdot t) = f^T_{\mathrm{enc}_m}(u) \cdot m \cdot t$ for all $t \in C^T_m(u)$, and $C^{T'}_m(f^T_{\mathrm{enc}_m}(u)) = C^T_m(u) = U$. In particular, $f^T_{\mathrm{enc}_m}(u) \lhd^{T'}_{m,i} f^T_{\mathrm{enc}_m}(v)$. Since $T' \in L'$, $(c, C^{T'}_m(f^T_{\mathrm{enc}_m}(u))) \in I$. Let $u_i$ be the $i$-th node of $U$, so that $f^T_{\mathrm{enc}_m}(v) = f^T_{\mathrm{enc}_m}(u) \cdot m \cdot u_i$. Then
\[ (c, U, i, \ell^T(v)) = (c, U, i, \rho(\ell^{T'}(f^T_{\mathrm{enc}_m}(v)))) = (c, C^{T'}_m(f^T_{\mathrm{enc}_m}(u)), i, \rho(\ell^{C^{T'}_m(f^T_{\mathrm{enc}_m}(u))}(u_i))) \in J, \]
by the definition of $I$.

This establishes $T \in L = \mathrm{SLoc}^{m+1}(A, Z, K, Y, J)$. □

The converse of the above lemma does not hold for a reason similar to the one for the case of the standard enc function for dimension 1.$^{19}$

A projection $\pi \colon \Sigma' \to \Sigma$ naturally induces a projection $\tilde\pi \colon \tilde\Sigma' \to \tilde\Sigma$ in an obvious way:
\[ \tilde\pi(c) = \pi(c), \qquad \tilde\pi((c, P, i)) = (\pi(c), P, i). \]
Clearly, if $T' \in \mathit{DT}^m_{\Sigma'}$, then $\tilde\pi(T') \in \mathit{DT}^m_\Sigma$.
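Also, if $T \in \mathbb{T}^{m+1}_{\Sigma',x}$, then $\mathrm{enc}_m(\pi(T)) = \tilde\pi(\mathrm{enc}_m(T))$.

Here is a generalization of Lemma 6 to multi-dimensional Dyck languages:

Lemma 28. Let $L \subseteq \mathbb{T}^m_{\tilde\Sigma}$ be a local set. Then there exist a finite alphabet $\Sigma'$, a projection $\pi \colon \Sigma' \to \Sigma$, and a local set $L' \subseteq \mathbb{T}^m_{\tilde\Sigma'}$ such that $L \cap \mathit{DT}'^{\,m}_\Sigma = \tilde\pi(L' \cap \mathit{DT}^m_{\Sigma'})$. Moreover, $\tilde\pi$ maps $L' \cap \mathit{DT}^m_{\Sigma'}$ bijectively to $L \cap \mathit{DT}'^{\,m}_\Sigma$.

Proof. Let $A, Z \subseteq \tilde\Sigma$ and $I \subseteq \tilde\Sigma \times \mathbb{T}^{m-1}_{\tilde\Sigma}$ be finite sets such that $L = \mathrm{Loc}^m(A, Z, I)$. Since we are interested in the intersection of $L$ and $\mathit{DT}'^{\,m}_\Sigma$, we may assume without loss of generality $Z \subseteq \Sigma$. Define
\begin{align*}
\Sigma_0 &= Z \cup \{\, c \in \Sigma \mid (c, T) \in I \,\}, \\
\Sigma' &= \Sigma_0 \cup \{\, \bar c \mid c \in A \cap Z \,\}, \\
A' &= \{\, \bar c \mid c \in A \cap Z \,\} \cup \{\, (c, \emptyset, 0) \mid c \in \Sigma_0,\ (c, \emptyset, 0) \in A \,\}, \\
Z' &= Z \cup \{\, \bar c \mid c \in A \cap Z \,\}.
\end{align*}
Then $A'$ and $Z'$ are finite subsets of $\tilde\Sigma'$. Let $\pi \colon \Sigma' \to \Sigma$ be the projection defined by
\[ \pi(c) = c, \qquad \pi(\bar d) = d \]
for each $c \in \Sigma_0$ and $d \in A \cap Z$. Let $L' = \mathrm{Loc}^m(A', Z', I)$. It is easy to see that $L \cap \mathit{DT}'^{\,m}_\Sigma = \tilde\pi(L' \cap \mathit{DT}^m_{\Sigma'})$ and for each $T \in L \cap \mathit{DT}'^{\,m}_\Sigma$, there is a unique $T' \in L'$ such that $\tilde\pi(T') = T$. □

Lemma 29. If $L \subseteq \mathbb{T}^{m+1}_{\Sigma,x}$ is a local set, then there exist a finite alphabet $\Sigma'$, a projection $\pi \colon \Sigma' \to \Sigma$, and a local set $L' \subseteq \mathbb{T}^m_{\tilde\Sigma'}$ such that
\[ \mathrm{enc}_m(L) = \tilde\pi(L' \cap \mathit{DT}^m_{\Sigma'}). \]
Moreover, $\mathrm{enc}_m^{-1} \circ \tilde\pi$ maps $L' \cap \mathit{DT}^m_{\Sigma'}$ bijectively to $L$.

19 There is also an additional reason. $L = \{a -_3 a\}$ is not super-local even though $\mathrm{enc}_2(L) = \{(a, \emptyset, 0) -_2 a\}$ is local.

Proof. By Lemma 11, there exist a projection $\pi_1 \colon \Sigma_1 \to \Sigma$ and a super-local $L_1 \subseteq \mathbb{T}^{m+1}_{\Sigma_1,x}$ such that $L = \pi_1(L_1)$. By Lemma 27, there exists a local set $L_2 \subseteq \mathbb{T}^m_{\tilde\Sigma_1}$ such that $\mathrm{enc}_m(L_1) = L_2 \cap \mathit{DT}'^{\,m}_{\Sigma_1}$. By Lemma 28, there exist a projection $\pi_2 \colon \Sigma' \to \Sigma_1$ and a local set $L' \subseteq \mathbb{T}^m_{\tilde\Sigma'}$ such that $L_2 \cap \mathit{DT}'^{\,m}_{\Sigma_1} = \tilde\pi_2(L' \cap \mathit{DT}^m_{\Sigma'})$. So
\begin{align*}
\mathrm{enc}_m(L) &= \mathrm{enc}_m(\pi_1(L_1)) \\
&= \tilde\pi_1(\mathrm{enc}_m(L_1)) \\
&= \tilde\pi_1(L_2 \cap \mathit{DT}'^{\,m}_{\Sigma_1}) \\
&= \tilde\pi_1(\tilde\pi_2(L' \cap \mathit{DT}^m_{\Sigma'})) \\
&= \tilde\pi(L' \cap \mathit{DT}^m_{\Sigma'}),
\end{align*}
where $\pi = \pi_1 \circ \pi_2$.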
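The last equality in this chain relies on the fact that lifting a projection to the bracket alphabet commutes with composition, i.e. the lift of π1 ∘ π2 equals the composition of the lifts. The following is a minimal sketch of that fact on symbols only. It assumes plain symbols are encoded as strings and bracket symbols (c, P, i) as 3-tuples; the helper names `lift` and `compose` and the sample alphabets are illustrative and do not come from the paper.

```python
def lift(pi):
    """Lift a projection on plain symbols to bracket symbols:
    c -> pi(c) and (c, P, i) -> (pi(c), P, i)."""
    def pi_tilde(symbol):
        if isinstance(symbol, tuple):  # bracket symbol (c, P, i)
            c, P, i = symbol
            return (pi(c), P, i)
        return pi(symbol)              # plain symbol c
    return pi_tilde


def compose(f, g):
    """Return the map s -> f(g(s))."""
    return lambda s: f(g(s))


# Hypothetical projections: pi2 from a refined alphabet, pi1 from an intermediate one.
pi2 = {"a1": "a", "a2": "a", "b1": "b"}.get
pi1 = {"a": "A", "b": "A"}.get

P = frozenset({(), (1,)})  # a sample prefix-closed set of paths
samples = ["a1", "b1", ("a2", P, 0), ("b1", P, 2)]

lifted_composite = lift(compose(pi1, pi2))
composite_of_lifts = compose(lift(pi1), lift(pi2))
print(all(lifted_composite(s) == composite_of_lifts(s) for s in samples))
```

Extending the lifted map from symbols to trees is a node-by-node relabeling, so the same identity carries over to tree languages.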
Since $\pi_1$ is a bijection from $L_1$ to $L$ and $\mathrm{enc}_m$ is injective, $\tilde\pi_1$ maps $\mathrm{enc}_m(L_1)$ bijectively to $\mathrm{enc}_m(L)$. Since $\tilde\pi_2$ maps $L' \cap \mathit{DT}^m_{\Sigma'}$ bijectively to $L_2 \cap \mathit{DT}'^{\,m}_{\Sigma_1} = \mathrm{enc}_m(L_1)$, $\tilde\pi = \tilde\pi_1 \circ \tilde\pi_2$ maps $L' \cap \mathit{DT}^m_{\Sigma'}$ bijectively to $\mathrm{enc}_m(L)$. □

8 A Multi-dimensional Generalization of the Chomsky-Schützenberger Theorem

Let $m \ge 2$. We call $L \subseteq \mathbb{T}^m_\Sigma$ simple context-free if there exist a finite alphabet $\Upsilon$ and a local set $K \subseteq \mathbb{T}^{m+1}_{\Upsilon,x}$ such that $L = y_m(K)$.

For a finite alphabet $\Sigma$ and $r \ge 0$, we define the finite alphabet
\[ \tilde\Sigma_r = \Sigma \cup \bigcup \{\, \Gamma_{c,P} \mid c \in \Sigma,\ P \subseteq \mathbb{P}_{m-1} \text{ is finite and prefix-closed},\ |P| \le r \,\}. \]
For any alphabet $\Upsilon$ and $p \ge 1$, let
\[ \mathbb{T}^m_{\Upsilon,p} = \{\, T \in \mathbb{T}^m_\Upsilon \mid |C^T_m(v)| \le p \text{ for all } v \in T \,\}. \]
Clearly, if $\Upsilon$ is finite, $\mathbb{T}^m_{\Upsilon,p}$ is a local subset of $\mathbb{T}^m_\Upsilon$. Also, any local subset $L$ of $\mathbb{T}^m_\Upsilon$ is included in $\mathbb{T}^m_{\Upsilon,p}$ for some $p$, which is just another way of saying $L$ is degree-bounded.

Lemma 30. Let $\Sigma$ be a finite set. For $m \ge 2$, $\mathit{DT}^m_\Sigma \cap \mathbb{T}^m_{\tilde\Sigma_r,p}$ is simple context-free.

Proof. We adapt the inductive definition of $\mathit{DT}^m_\Sigma(n)$ to obtain the required local set. Let $\Upsilon = \tilde\Sigma_r \cup \{X_0, \dots, X_r\}$. We write $U_k$ for the set $\{\varepsilon, (m - 1), \dots, (m - 1)^{k-1}\} \subseteq \mathbb{P}_{m-1}$. Let
\[ A_n = \{X_n\} \ \text{ for } n = 0, \dots, r, \qquad Z = \tilde\Sigma_r \cup \{x\}, \]
\begin{align*}
I ={} & \{(X_0, T_c), (X_1, T_x)\} \\
& \cup \bigl\{\, \bigl(X_n,\ c -_m P(X_{n_1} -_m U_{n_1}(x, \dots, x), \dots, X_{n_k} -_m U_{n_k}(x, \dots, x))\bigr) \bigm| P \subseteq \mathbb{P}_{m-1}, \\
& \qquad P \text{ is finite and prefix-closed},\ 1 \le |P| = k \le p,\ 0 \le n = n_1 + \dots + n_k \le r \,\bigr\} \\
& \cup \bigl\{\, \bigl(X_n,\ (c, P, 0) -_m X_k -_m U_k((c, P, 1) -_m X_{n_1} -_m U_{n_1}(x, \dots, x), \dots, (c, P, k) -_m X_{n_k} -_m U_{n_k}(x, \dots, x))\bigr) \bigm| P \subseteq \mathbb{P}_{m-1}, \\
& \qquad P \text{ is finite and prefix-closed},\ 0 \le |P| = k \le r,\ 0 \le n = n_1 + \dots + n_k \le r \,\bigr\}.
\end{align*}
Here, the number of occurrences of $x$ in $U_{n_i}(x, \dots, x)$ is $|U_{n_i}| = n_i$. When $j = 0$, we understand the notation $X_j -_m U_j(x, \dots, x)$ to mean $X_0$, i.e., the tree consisting of a single node labeled by $X_0$. Note that $A_n$ and $Z$ are finite subsets of $\Upsilon \cup \{x\}$ and $I$ is a finite subset of $\Upsilon \times \mathbb{T}^m_{\Upsilon \cup \{x\}}$.
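The sets $U_k$ used in this construction are the simplest prefix-closed sets with $k$ elements: a single chain. The following is a minimal sketch, assuming paths are encoded as tuples of integers and that "prefix-closed" is read in the plain string sense; any further conditions the paper places on $(m-1)$-dimensional tree domains are not checked here, and the function names are illustrative.

```python
def chain(k, m):
    """U_k = {eps, (m-1), ..., (m-1)^(k-1)}, with each path encoded as a tuple."""
    return {(m - 1,) * j for j in range(k)}


def is_prefix_closed(paths):
    """Every non-empty path must have its immediate prefix in the set;
    by induction this gives closure under all prefixes."""
    return all(p[:-1] in paths for p in paths if p)


if __name__ == "__main__":
    U3 = chain(3, 3)
    print(sorted(U3, key=len))       # the three paths of U_3 for m = 3
    print(is_prefix_closed(U3))
```

With this encoding, `chain(0, m)` is the empty set, which matches the convention that $U_0$ contributes no occurrences of $x$.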
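It is straightforward to prove that
\[ \mathit{DT}^m_\Sigma(n) \cap \mathbb{T}^m_{\tilde\Sigma_r,p}(n) = y_m(\mathrm{Loc}^{m+1}(A_n, Z, I)) \]
holds for $n = 0, \dots, r$. The case of $n = 0$ gives the statement of the lemma. We omit the details. □

The following lemma is straightforward.

Lemma 31. If $L$ is simple context-free, there are finite sets $A, Z, I$ such that $L = y_m(\mathrm{Loc}^{m+1}(A, Z, I))$ and $Z \cap \{\, c \mid (c, T) \in I \,\} = \emptyset$.

Recall the definition of $\mathrm{del}_m(T, U)$ for $U \subseteq \{\, v \in T \mid |C^T_m(v)| = 1 \,\}$, which was given in terms of the function $f_U \colon T \to [1, m]^*$:
\[ f_U(\varepsilon) = \varepsilon, \qquad f_U(v \cdot i) = f_U(v) \cdot i \ \text{ for } i < m, \qquad f_U(v \cdot m) = \begin{cases} f_U(v) \cdot m & \text{if } v \notin U, \\ f_U(v) & \text{if } v \in U. \end{cases} \]
For $T \in \mathbb{T}^m_\Sigma$ and $\Upsilon \subseteq \Sigma$, $\mathrm{del}_{m,\Upsilon}(T)$ was defined to be $\mathrm{del}_m(T, U)$, with $U = \{\, v \in T \mid \ell^T(v) \in \Upsilon,\ |C^T_m(v)| = 1 \,\}$.

For $T \in \mathbb{T}^{m+1}_{\Sigma,x}$, let $f^T_{y_m}$ be the mapping from the nodes of $T$ to the nodes of $y_m(T)$ defined by
\[ f^T_{y_m}(u) = f_U(f^T_{\mathrm{enc}_m}(u)), \quad \text{where } U = f^T_{\mathrm{enc}_m}(\{\, u \in T \mid u \in \mathrm{dom}(\prec^T_{m+1}) \text{ or } \ell^T(u) = x \,\}). \]
It is clear that if $u \in T - \mathrm{dom}(\prec^T_{m+1})$ and $\ell^T(u) \ne x$, then $\ell^T(u) = \ell^{y_m(T)}(f^T_{y_m}(u))$.$^{20}$

20 Here and in the proof of the next lemma, we use the notations $\ell^{y_m(T)}$ and $\ell^{\mathrm{enc}_m(T)}$ with their obvious meaning.

Lemma 32. Let $L \subseteq \mathbb{T}^m_\Sigma$ be a simple context-free set.

(i) If $L' \subseteq \mathbb{T}^m_\Sigma$ is local, then $L \cap L'$ is simple context-free.
(ii) For every projection $\pi \colon \Sigma \to \Sigma'$, $\pi(L)$ is simple context-free.
(iii) If $\Sigma' \subseteq \Sigma$, then $\mathrm{del}_{m,\Sigma'}(L)$ is simple context-free.

Proof. Let $L = y_m(\mathrm{Loc}^{m+1}(A, Z, I))$, where $\Sigma \subseteq \Upsilon$ and $A \subseteq \Upsilon$, $Z \subseteq \Upsilon \cup \{x\}$, $I \subseteq \Upsilon \times \mathbb{T}^m_{\Upsilon \cup \{x\}}$ are finite sets. By Lemma 31, we may assume $Z \subseteq \Sigma \cup \{x\}$ and $\{\, c \mid (c, T) \in I \,\} \subseteq \Upsilon - \Sigma$. Let $r = \max\{\, n \mid (c, T) \in I,\ T \in \mathbb{T}^m_\Upsilon(n) \,\}$.

We only prove (i), since (ii) and (iii) are straightforward.$^{21}$ Let $L' = \mathrm{Loc}^m(A', Z', I')$, where $A', Z' \subseteq \Sigma$, $I' \subseteq \Sigma \times \mathbb{T}^{m-1}_\Sigma$ are finite sets. Let
\[ \Upsilon_1 = \{\, (c, d, e_1, \dots, e_n) \mid c \in \Upsilon - \Sigma,\ d, e_1, \dots, e_n \in \Sigma,\ n \le r \,\}. \]
Define projections $\pi_1 \colon \Sigma \cup \Upsilon_1 \to \Upsilon$ and $\pi_2 \colon \Sigma \cup \Upsilon_1 \to \Sigma$ by
\begin{align*}
\pi_1(c) &= c && \text{for } c \in \Sigma, \\
\pi_1((c, d, e_1, \dots, e_n)) &= c && \text{for } (c, d, e_1, \dots, e_n) \in \Upsilon_1, \\
\pi_2(c) &= c && \text{for } c \in \Sigma, \\
\pi_2((c, d, e_1, \dots, e_n)) &= d && \text{for } (c, d, e_1, \dots, e_n) \in \Upsilon_1.
\end{align*}
Let
\begin{align*}
A''_d &= \pi_2^{-1}(\{d\}) \quad \text{for } d \in \Sigma, \\
Z'' &= Z' \cup \{\, (c, d, e_1, \dots, e_n) \in \Upsilon_1 \mid n = 0 \,\}, \\
I'' &= \{\, (c, T) \in \Sigma \times \mathbb{T}^{m-1}_{\Sigma \cup \Upsilon_1} \mid (c, \pi_2(T)) \in I' \,\} \\
&\quad \cup \{\, ((c, d, e_1, \dots, e_n), T) \in \Upsilon_1 \times \mathbb{T}^{m-1}_{\Sigma \cup \Upsilon_1} \mid |T| = n \text{ and if } u_i \text{ is the } i\text{-th node of } T, \text{ then } \pi_2(\ell^T(u_i)) = e_i \,\}.
\end{align*}
Define
\begin{align*}
A_1 &= (A \cap A' \cap Z') \cup \{\, (c, d) \in \Upsilon_1 \mid c \in A,\ d \in A' \,\}, \\
Z_1 &= Z, \\
I_1 &= \{\, ((c, d, e_1, \dots, e_n), T) \mid (c, d, e_1, \dots, e_n) \in \Upsilon_1,\ T \in \mathbb{T}^m_{\Sigma \cup \Upsilon_1}(n),\ (c, \pi_1(T)) \in I, \\
&\qquad T[(c, e_1), \dots, (c, e_n)] \in \mathrm{Loc}^m(A''_d, Z'', I'') \,\}.
\end{align*}
Note that $A_1, Z_1, I_1$ are finite sets. We show $y_m(\mathrm{Loc}^{m+1}(A_1, Z_1, I_1)) = L \cap L'$.

21 This proof is very long and laborious. An alternative approach would be to use the notion of a recognizable (equivalently, MSO-definable) set of m-dimensional trees [24,23] and rely on the fact that the yield mapping is an MSO-definable transduction.

We first note some useful facts about members of $\mathrm{Loc}^{m+1}(A_1, Z_1, I_1)$. To state this, we need another projection and a certain relabeling defined on some subset of $\mathbb{T}^{m+1}_{\Sigma \cup \Upsilon_1,x}$. Define a projection $\pi_3 \colon \Sigma \cup (\tilde\Upsilon_1 - \Upsilon_1) \to \Sigma$ by
\[ \pi_3(c) = c \ \text{ if } c \in \Sigma, \qquad \pi_3(((c, d, e_1, \dots, e_n), P, 0)) = d, \qquad \pi_3(((c, d, e_1, \dots, e_n), P, i)) = e_i \ \text{ if } 1 \le i \le n. \]
Let
\begin{align*}
M = \{\, T_1 = (T, \ell^{T_1}) \in \mathbb{T}^{m+1}_{\Sigma \cup \Upsilon_1,x} \mid{} & \text{for all } v \in T, \text{ (i) } v \in \mathrm{dom}(\prec^T_{m+1}) \text{ if and only if } \ell^{T_1}(v) \in \Upsilon_1 \\
& \text{and (ii) } \ell^{T_1}(v) = (c, d, e_1, \dots, e_n) \text{ implies } C^{T_1}_{m+1}(v) \in \mathbb{T}^m_{\Sigma \cup \Upsilon_1}(n) \,\}.
\end{align*}
It is clear that $\mathrm{Loc}^{m+1}(A_1, Z_1, I_1) \subseteq M$. For $T_1 = (T, \ell^{T_1}) \in M$, define $T_1^\dagger = (T, \ell^{T_1^\dagger})$ by
\[ \ell^{T_1^\dagger}(v) = \begin{cases} \ell^{T_1}(v) & \text{if } \ell^{T_1}(v) \in \Sigma \cup \Upsilon_1, \\ (c, e_i) & \text{if } \ell^{T_1}(v) = x,\ u \blacktriangleleft_{m+1,i} v, \text{ and } \ell^{T_1}(u) = (c, d, e_1, \dots, e_n). \end{cases} \tag{14} \]
It is easy to see that for all $u \in T$, we have
\[ \pi_2(\ell^{T_1^\dagger}(u)) = \pi_3(\ell^{\mathrm{enc}_m(T_1)}(f^{T_1}_{\mathrm{enc}_m}(u))). \tag{15} \]
Now let $T_1 = (T, \ell^{T_1}) \in \mathrm{Loc}^{m+1}(A_1, Z_1, I_1)$.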
Since T1 ∈ Locm+1 (A1 , Z1 , I1 ), it is also easy to see the following: † † u ∈ dom(≺Tm+1 ) implies π2 (`T1 (u)) = π2 (`T1 (u · (m + 1))), u 1 JT m+1,i v and u CTm,i T1† w imply π2 (` T1† (v)) = π2 (` (w)). (16) (17) It follows from (15), (16), and (17) that if u ∈ dom(≺Tm+1 ) or `T1 (u) = x, then T1 T1 π3 (`T1 (fenc (u)) = π3 (`T1 (fenc (u) · m)). m m T1 fenc (dom(≺Tm+1 ) m T1 (18) T1 Let U = ∪ { v ∈ T | ` (v) = x }). Note that if T1 (u) 6∈ U ), then `T1 (u) = u ∈ T and ` (u) ∈ Σ (or, equivalently, if fenc m T1 `encm (T1 ) (fenc (u)) = `ym (T1 ) (fyTm1 (u)). By (18), we can see that for all nodes m w, t of encm (T1 ), fU (w) = fU (t) implies π3 (`encm (T1 ) (w)) = π3 (`encm (T1 ) (t)). So if w0 is the unique node such that w0 6∈ U and fU (w) = fU (w0 ), then π3 (`encm (T1 ) (w)) = π3 (`encm (T1 ) (w0 )) = π3 (`ym (T1 ) (fU (w0 ))) = `ym (T1 ) (fU (w0 )) = `ym (T1 ) (fU (w)). It follows from this and (15) that for all u ∈ T , we have † π2 (`T1 (u)) = `ym (T1 ) (fyTm1 (u)). (19) Now for T ∈ Locm+1 (A, Z, I), define T̂ = (T, `T̂ ) ∈ Tm+1 Σ∪Υ1 ,x by T ` (u) if u 6∈ dom(≺Tm+1 ), (c, d, e , . . . , e ) if u ∈ dom(≺T ), c = `T (u), 1 n m+1 `T̂ (u) = ym (T ) T T T d = ` (f ym (u)), |Cm (u)| = n, u Cm,i wi , ei = `ym (T ) (fyTm (wi )) for i = 1, . . . , n. (20) 44 Makoto Kanazawa Then for all T ∈ Locm+1 (A, Z, I), we have T̂ ∈ M , π1 (T̂ ) = T , and ym (T ) = ym (T̂ ). We claim that if T1 ∈ Locm+1 (A1 , Z1 , I1 ) and T = π1 (T1 ), then T ∈ Locm+1 (A, Z, I) and T1 = T̂ . (21) The definition of A1 , Z1 , I1 easily implies π1 (Locm+1 (A1 , Z1 , I1 )) ⊆ Locm+1 (A, Z, I). Suppose T1 ∈ Locm+1 (A1 , Z1 , I1 ) and let T = π1 (T1 ). Clearly, we have ym (T ) = ym (T1 ) and fyTm1 = fyTm . To prove T1 = T̂ , all we need to show is that (20) holds with T1 in place of T̂ . If u 6∈ dom(≺Tm+1 ), then `T1 (u) ∈ Z1 = Z ⊆ Σ ∪ {x}, so `T (u) = π1 (`T1 (u)) = `T1 (u). Suppose T T 1 u ∈ dom(≺Tm+1 ), |Cm (u)| = n, and for i = 1, . . . , n, u JT m+1,i vi , u Cm,i wi . 
T1 T1 T1 Since T1 is well-labeled, Cm+1 (u) ∈ Tm Σ∪Υ1 (n). Since (` (u), Cm+1 (u)) ∈ I1 , T1 ` (u) = (c, d, e1 , . . . , en ) for some c ∈ Υ − Σ and d, e1 , . . . , en ∈ Σ. By (19), d = † π2 (`T1 (u)) = π2 (`T1 (u)) = `ym (T1 ) (fyTm1 (u)) = `ym (T ) (fyTm (u)). By (17), we have † † † ei = π2 (`T1 (vi )) = π2 (`T1 (wi )). By (19) again, π2 (`T1 (wi )) = `ym (T1 ) (fyTm1 (wi )), so ei = `ym (T1 ) (fyTm1 (wi )) = `ym (T ) (fyTm (wi )). We have shown that (20) holds with T1 in place of T̂ . This proves (21). We next prove that the following equivalence holds for all T ∈ L = Locm+1 (A, Z, I): ym (T ) ∈ L0 if and only if T̂ ∈ Locm+1 (A1 , Z1 , I1 ). (22) It easily follows from (21) and (22) that L ∩ L0 = ym (Locm+1 (A1 , Z1 , I1 )) and hence L ∩ L0 is simple context-free. We prove (22). Let T ∈ Locm+1 (A, Z, I) and V = ym (T ) = ym (T̂ ). Since † T̂ ∈ M , the relabeling T̂ † = (T, `T̂ ) of T̂ given by (14) is defined. From the way T̂ is defined, it is clear that for all u ∈ T , we have † π2 (`T̂ (u)) = `ym (T ) (fyTm (u)). (23) For the “only if” direction of (22), suppose V ∈ L0 . We show that T̂ satisfies the requirements L1–L3 for membership in Locm+1 (A1 , Z1 , I1 ). L1. We have `T̂ (ε) ∈ A1 if and only if either ε 6∈ dom(≺Tm+1 ) and `T (ε) = V ` (ε) ∈ A ∩ A0 ∩ Z 0 or ε ∈ dom(≺Tm+1 ), `T (ε) ∈ A, and `ym (T ) (fyTm (ε)) = `V (ε) ∈ A0 . Since T ∈ Locm+1 (A, Z, I) and V ∈ L0 = Locm (A0 , Z 0 , I 0 ), clearly one of these conditions holds. L2. Suppose u ∈ T − dom(≺Tm+1 ). Then `T̂ (u) = `T (u) ∈ Z = Z1 . L3. Suppose u ∈ dom(≺Tm+1 ). Let (c, d, e1 , . . . , en ) = `T̂ (u) and T̂ T̂ † W = Cm+1 (u)[(c, e1 ), . . . , (c, en )] = Cm+1 (u). We need to show W ∈ Locm (A00d , Z 00 , I 00 ). † – We show `W (ε) ∈ A00d . By (23), π2 (`W (ε)) = π2 (`T̂ (u · (m + 1)) = `ym (T ) (fyTm (u · (m + 1))) = `ym (T ) (fyTm (u)) = d. So `W (ε) ∈ A00d . Multi-dimensional Trees and a Representation Theorem for Simple CFTGs 45 W 00 – Let w ∈ W − dom(≺W m ). We show ` (w) ∈ Z . 
Note that since w 6∈ W T dom(≺m ), we have u · (m + 1) · w 6∈ dom(≺m ). Case 1. u · (m + 1) · w ∈ dom(≺Tm+1 ). Then `T̂ (u · (m + 1) · w) 6= x and T `W (w) = `T̂ (u · (m + 1) · w). Since Cm (u · (m + 1) · w) = ∅, `W (w) = T̂ 0 0 ` (u · (m + 1) · w) is of the form (c , d ) and hence is in Z 00 . Case 2. u · (m + 1) · w 6∈ dom(≺Tm+1 ). In this case, either `W (w) = `T̂ (u · (m + 1) · w) = `T (u · (m + 1) · w) = c0 for some c0 ∈ Σ or `T̂ (u · (m + 1) · w) = x and `W (w) = (c, ei ) for some i. In the latter case, `W (w) ∈ Z 00 by the definition T V of Z 00 . In the former case, Cm (u · (m + 1) · w) = Cm (fyVm (f · (m + 1) · w)), T T and since u · (m + 1) · w 6∈ dom(≺m ), fym (u · (m + 1) · w) 6∈ dom(≺Vm ). Since V ∈ L0 , `W (w) = `T (u · (m + 1) · w) = `V (fyTm (u · (m + 1) · w)) ∈ Z 0 ⊆ Z 00 . T T̂ – Let w ∈ dom(≺W m ). Then u · (m + 1) · w ∈ dom(≺m ) and ` (u · (m + 1) · w) 6= W x, so `W (w) = `T̂ (u · (m + 1) · w). We show (`W (w), Cm (w)) ∈ I 00 . Let W W p = |Cm (w)| and for i = 1, . . . , p, let wi be the i-th node of Cm (w). T W Case 1. u · (m + 1) · w 6∈ dom(≺m+1 ). Then ` (w) ∈ Σ and `W (w) = W (w) = `ym (T̂ ) (fyT̂m (u · (m + 1) · w)) = `V (fyTm (u · (m + 1) · w)). Also, Cm T V T Cm (u · (m + 1) · w) = Cm (fym (u · (m + 1) · w)). For i = 1, . . . , p, π2 (`W (w · † m · wi )) = π2 (`T̂ (u · (m + 1) · w · m · wi )) = `V (fyTm (u · (m + 1) · w · m · wi )) by (23). Since u · (m + 1) · w 6∈ dom(≺Tm+1 ), fyTm (u · (m + 1) · w · m · wi ) = V W (fyTm (u · (m + 1) · w)). (w)) = Cm fyTm (u · (m + 1) · w) · m · wi . Hence π2 (Cm m 0 0 0 0 W W Since V ∈ L = Loc (A , Z , I ), (` (w), π2 (Cm (w))) = (`V (fyTm (u · (m + V W 1) · w)), Cm (fyTm (u · (m + 1) · w))) ∈ I 0 . Therefore, (`W (w), Cm (w)) ∈ I 00 . Case 2. u · (m + 1) · w ∈ dom(≺Tm+1 ). Then by the definition of T̂ , `W (w) = `T̂ (u · (m + 1) · w) = (c0 , d0 , e01 , . . . , e0p ), where for i = 1, . . . , p, e0i = `V (fyTm (u · W † (m + 1) · w · m · wi )). 
We have `Cm (w) (wi ) = `W (w · m · wi ) = `T̂ (u · (m + 1) · W w ·m·wi ), and by (23), π2 (`Cm (w) (wi )) = `V (fyTm (u·(m+1)·w ·m·wi )) = e0i , W (w)) ∈ I 00 . so (`W (w), Cm For the “if” direction of (22), suppose T̂ ∈ Locm+1 (A1 , Z1 , I1 ). We show that ym (T ) = V satisfies the requirements L1–L3 for membership in Locm (A0 , Z 0 , I 0 ). For each v ∈ V , let h(v) be the unique node in T − dom(≺Tm+1 ) such that `T (h(v)) 6= x and fyTm (h(v)) = v. We have `T (h(v)) = `T̂ (h(v)) = `V (v) ∈ Σ V T (v). (h(v)) = Cm and Cm T̂ L1. Since ` (ε) ∈ A1 , either ε 6∈ dom(≺Tm+1 ) and `T (ε) = `V (ε) ∈ A∩A0 ∩Z 0 , or ε ∈ dom(≺Tm+1 ), `T (ε) ∈ A, and `V (fyTm (ε)) = `V (ε) ∈ A0 . So in either case, we have `V (ε) ∈ A0 . L2. Let v ∈ V − dom(≺Vm ). If h(v) = ε, then ε 6∈ dom(≺Tm+1 ) and `T̂ (ε) = `V (v) ∈ A ∩ A0 ∩ Z 0 , so `V (v) ∈ Z 0 . If h(v) 6= ε, let u ∈ T be such that u CTm+1 h(v) = u · (m + 1) · w. Let `T̂ (u) = (c, d, e1 , . . . , en ) and T̂ V T W W = Cm+1 (u)[(c, e1 ), . . . , (c, en )]. Since ∅ = Cm (v) = Cm (h(v)) = Cm (w), w 6∈ m W dom(≺m ). Since ((c, d, e1 , . . . , en ), W ) ∈ I1 , we have W ∈ Loc (A00d , Z 00 , I 00 ), 46 Makoto Kanazawa and hence `V (v) = `T̂ (h(v)) = `W (w) ∈ Z 00 . Since `V (v) ∈ Σ, it follows that `V (v) ∈ Z 0 . V T L3. Let v ∈ dom(≺Vm ). Since Cm (v) = Cm (h(v)) 6= ∅, h(v) 6= ε. Let u ∈ T T be such that u Cm+1 = h(v) = u · (m + 1) · w. Let `T̂ (u) = (c, d, e1 , . . . , en ). T̂ V Then W = Cm+1 (u)[(c, e1 ), . . . , (c, en )] ∈ Locm (A00d , Z 00 , I 00 ). We have Cm (v) = T W V Cm (h(v)) = Cm (w). Let p = |Cm (v)| and for i = 1, . . . , p, let vi be such that W v CVm,i v · m · vi . We must have (`W (w), Cm (w)) ∈ I 00 , and since `W (w) = V W W 0 ` (v) ∈ Σ, we get (` (w), π2 (Cm (w))) ∈ I . By (23), † π2 (`W (w · m · vi )) = π2 (`T̂ (u · (m + 1) · w · m · vi )) = `V (fyTm (u · (m + 1) · w · m · vi )) = `V (fyTm (h(v) · m · vi )) = `V (fyTm (h(v)) · m · vi ) = `V (v · m · vi ). V W So (`V (v), Cm (v)) = (`W (w), π2 (Cm (w))) ∈ I 0 . 
We have established (22). This concludes the proof. □

Clearly, $T \in \mathit{DT}^m_{\Sigma}$ implies $\mathrm{del}_{m,\widetilde{\Sigma}-\Sigma}(T) \in \mathbb{T}^m_{\Sigma}$. We obtain the following generalization of the Chomsky-Schützenberger theorem:

Theorem 33. Let $L \subseteq \mathbb{T}^m_{\Sigma}$. The following are equivalent:
(i) $L$ is simple context-free.
(ii) There exist finite alphabets $\Upsilon, \Upsilon'$, a projection $\pi\colon \Upsilon' \to \Upsilon$, and a local set $R \subseteq \mathbb{T}^m_{\widetilde{\Upsilon'}}$ such that $L = \mathrm{del}_{m,\widetilde{\Upsilon}-\Upsilon}(\widetilde{\pi}(R \cap \mathit{DT}^m_{\Upsilon'}))$.

Proof. (ii) ⇒ (i). Suppose $R \subseteq \mathbb{T}^m_{\widetilde{\Upsilon'}}$ is a local set. Clearly, $R \subseteq \mathbb{T}^m_{\widetilde{\Upsilon'}_q,p}$ for some $p, q$. So $R \cap \mathit{DT}^m_{\Upsilon'} = R \cap \mathit{DT}^m_{\Upsilon'} \cap \mathbb{T}^m_{\widetilde{\Upsilon'}_q,p}$. By Lemma 30, $\mathit{DT}^m_{\Upsilon'} \cap \mathbb{T}^m_{\widetilde{\Upsilon'}_q,p}$ is simple context-free. It then follows by Lemma 32 that $L = \mathrm{del}_{m,\widetilde{\Upsilon}-\Upsilon}(\widetilde{\pi}(R \cap \mathit{DT}^m_{\Upsilon'})) = \mathrm{del}_{m,\widetilde{\Upsilon}-\Upsilon}(\widetilde{\pi}(R \cap \mathit{DT}^m_{\Upsilon'} \cap \mathbb{T}^m_{\widetilde{\Upsilon'}_q,p}))$ is simple context-free.

(i) ⇒ (ii). Let $K \subseteq \mathbb{T}^{m+1}_{\Upsilon,x}$ be a local set such that $L = y_m(K)$. By Lemma 29, there exist a projection $\pi\colon \Upsilon' \to \Upsilon$ and a local set $R \subseteq \mathbb{T}^m_{\widetilde{\Upsilon'}}$ such that $\mathrm{enc}_m(K) = \widetilde{\pi}(R \cap \mathit{DT}^m_{\Upsilon'})$. So $L = y_m(K) = \mathrm{del}_{m,\widetilde{\Upsilon}-\Upsilon}(\mathrm{enc}_m(K)) = \mathrm{del}_{m,\widetilde{\Upsilon}-\Upsilon}(\widetilde{\pi}(R \cap \mathit{DT}^m_{\Upsilon'}))$. □

As was the case with the original Chomsky-Schützenberger Theorem, in the proof of Theorem 33, $\mathrm{enc}_m^{-1} \circ \widetilde{\pi}$ is a bijection from $R \cap \mathit{DT}^m_{\Upsilon'}$ to $K$. (See the second statement in Lemma 29.)

9 A Chomsky-Schützenberger-Weir Representation Theorem for Simple Context-Free Tree Grammars

We are now going to use Theorem 33 to obtain a generalization of Weir's representation theorem about the string languages of tree-adjoining grammars to the string languages of simple context-free tree grammars. First, we prove a lemma that generally holds of $m$-dimensional Dyck tree languages. The following lemma is straightforward.

Lemma 34. Let $\Sigma'$ be a finite alphabet and $\pi\colon \Sigma' \to \Sigma$ be a projection. If $L$ is a super-local subset of $\mathbb{T}^m_{\Sigma}$, then $\pi^{-1}(L)$ is a super-local subset of $\mathbb{T}^m_{\Sigma'}$.

Lemma 35. Let $m \ge 2$.
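For any local set $L \subseteq \mathbb{T}^m_{\widetilde{\Sigma}}$, there exist a finite alphabet $\Sigma'$, a degree-bounded, super-local $L' \subseteq \mathbb{T}^m_{\widetilde{\Sigma'}}$, and a projection $\pi\colon \Sigma' \to \Sigma$ that satisfy the following conditions:
(i) $\widetilde{\pi}(L') \subseteq L$.
(ii) $L \cap \mathit{DT}^m_{\Sigma} = \widetilde{\pi}(L' \cap \mathit{DT}^m_{\Sigma'})$. Moreover, $\widetilde{\pi}$ maps $L' \cap \mathit{DT}^m_{\Sigma'}$ bijectively to $L \cap \mathit{DT}^m_{\Sigma}$.

Proof. By Lemma 10, there are a finite alphabet $\Sigma_1$, a super-local subset $L_1$ of $\mathbb{T}^m_{\Sigma_1}$, and a projection $\pi_1\colon \Sigma_1 \to \widetilde{\Sigma}$ such that $\pi_1$ maps $L_1$ bijectively to $L$. Since $\widetilde{\pi_1}^{-1}(L \cap \mathit{DT}^m_{\Sigma})$ is not a subset of an $m$-dimensional Dyck tree language, we have to relabel some nodes of $\widetilde{\pi_1}^{-1}(T)$ for $T \in L \cap \mathit{DT}^m_{\Sigma}$ to get a set of the form $L' \cap \mathit{DT}^m_{\Sigma'}$.

For $d \in \Sigma$, $P$ a finite prefix-closed subset of $\mathbb{P}_{m-1}$, and $i \in [0, |P|]$, let
\[ \Delta_{d,P,i} = \{\, \delta \in \Sigma_1 \mid \pi_1(\delta) = (d, P, i) \,\}. \]
Define
\[ \Sigma_2 = \Sigma_1 - \bigcup_{d,P,i} \Delta_{d,P,i}, \]
\[ \Delta_{d,P} = \{\, (\delta_0, \delta_1, \dots, \delta_{|P|}) \mid \delta_i \in \Delta_{d,P,i} \text{ for } i = 0, \dots, |P| \,\}, \]
\[ \Delta = \bigcup_{d,P} \Delta_{d,P}, \qquad \Sigma' = \Sigma_2 \cup \Delta. \]
Note that $\Sigma'$ is a finite alphabet. Define a projection $\pi\colon \Sigma' \to \Sigma$ by
\[ \pi(c) = \pi_1(c) \text{ if } c \in \Sigma_2, \qquad \pi(\delta) = d \text{ if } \delta \in \Delta_{d,P}. \]
Then $\widetilde{\pi}$ maps $m$-dimensional trees over $\widetilde{\Sigma'}$ to $m$-dimensional trees over $\widetilde{\Sigma}$. Let
\[ \Delta' = \{\, (\delta, P, i) \in \widetilde{\Sigma'} \mid \delta \in \Delta_{d,P},\ 0 \le i \le |P| \,\}, \]
and define a projection $\pi_2\colon \Sigma_2 \cup \Delta' \to \Sigma_1$ by
\[ \pi_2(c) = c \text{ if } c \in \Sigma_2, \qquad \pi_2(((\delta_0, \delta_1, \dots, \delta_{|P|}), P, i)) = \delta_i \text{ if } (\delta_0, \delta_1, \dots, \delta_{|P|}) \in \Delta_{d,P} \text{ and } 0 \le i \le |P|. \]
Then for $T \in \mathbb{T}^m_{\Sigma_2 \cup \Delta'}$, $\widetilde{\pi}(T) = \pi_1(\pi_2(T))$. Let $L' = \widetilde{\pi}^{-1}(L) \cap \mathbb{T}^m_{\Sigma_2 \cup \Delta'}$. Then
\[ L' = \pi_2^{-1}(\pi_1^{-1}(L)) = \pi_2^{-1}(L_1). \]
By Lemma 34, $L'$ is a super-local subset of $\mathbb{T}^m_{\Sigma_2 \cup \Delta'}$ and hence of $\mathbb{T}^m_{\widetilde{\Sigma'}}$.

Clearly, $\widetilde{\pi}(L') \subseteq L$, so (i) holds. Since $\widetilde{\pi}(\mathit{DT}^m_{\Sigma'}) \subseteq \mathit{DT}^m_{\Sigma}$ always holds for any projection $\pi\colon \Sigma' \to \Sigma$, we also have $\widetilde{\pi}(L' \cap \mathit{DT}^m_{\Sigma'}) \subseteq L \cap \mathit{DT}^m_{\Sigma}$.

It remains to show that for each $T \in L \cap \mathit{DT}^m_{\Sigma}$, there is a unique $T' \in L' \cap \mathit{DT}^m_{\Sigma'}$ such that $\widetilde{\pi}(T') = T$.

Let $T \in L \cap \mathit{DT}^m_{\Sigma}$. We relabel the nodes of $T$ to turn it into a $\hat{T} \in \mathit{DT}^m_{\Sigma'}$. Recall that $\pi_1$ maps $L_1$ bijectively to $L$, so we have $T_1 = \pi_1^{-1}(T) \in L_1$.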
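Let $V = (V, \ell^{V}) = \mathrm{enc}_m^{-1}(T)$. Define $\hat{V} = (V, \ell^{\hat{V}})$ by
\[
\ell^{\hat{V}}(v) =
\begin{cases}
\ell^{T_1}(f^{V}_{\mathrm{enc}_m}(v)) & \text{if } v \in V - \mathrm{dom}(\prec^{V}_{m+1}) \text{ and } \ell^{V}(v) \ne x,\\
x & \text{if } \ell^{V}(v) = x,\\
(\delta_0, \delta_1, \dots, \delta_k) & \text{if } v \in \mathrm{dom}(\prec^{V}_{m+1}),\ |C^{V}_m(v)| = k,\ \delta_0 = \ell^{T_1}(f^{V}_{\mathrm{enc}_m}(v)), \text{ and}\\
 & v \blacktriangleleft^{V}_{m+1,i} v_i,\ \delta_i = \ell^{T_1}(f^{V}_{\mathrm{enc}_m}(v_i)) \text{ for } i = 1, \dots, k.
\end{cases}
\]
Let $\hat{T} = \mathrm{enc}_m(\hat{V})$.

If $v \in V - \mathrm{dom}(\prec^{V}_{m+1})$ and $\ell^{\hat{V}}(v) \ne x$, it is easy to see that $\ell^{\hat{V}}(v) = \ell^{\hat{T}}(f^{V}_{\mathrm{enc}_m}(v)) \in \Sigma_2 \subseteq \Sigma'$. Let $v \in \mathrm{dom}(\prec^{V}_{m+1})$, $P = C^{V}_m(v)$, $k = |P|$, and $v \blacktriangleleft^{V}_{m+1,i} v_i$ for $i = 1, \dots, k$. Let $(\delta_0, \delta_1, \dots, \delta_k) = \ell^{\hat{V}}(v)$ and $d = \ell^{V}(v)$. Then
\[ \pi_1(\delta_0) = \ell^{T}(f^{V}_{\mathrm{enc}_m}(v)) = (d, P, 0) \]
and for $i = 1, \dots, k$,
\[ \pi_1(\delta_i) = \ell^{T}(f^{V}_{\mathrm{enc}_m}(v_i)) = (d, P, i). \]
This implies that $(\delta_0, \delta_1, \dots, \delta_k) = \ell^{\hat{V}}(v) \in \Delta_{d,P} \subseteq \Delta \subseteq \Sigma'$. We have $\ell^{\hat{T}}(f^{V}_{\mathrm{enc}_m}(v)) = ((\delta_0, \delta_1, \dots, \delta_k), P, 0) \in \Delta'$ and for $i = 1, \dots, k$, $\ell^{\hat{T}}(f^{V}_{\mathrm{enc}_m}(v_i)) = ((\delta_0, \delta_1, \dots, \delta_k), P, i) \in \Delta'$. Therefore, $\hat{V} \in \mathbb{T}^{m+1}_{\Sigma',x}$ and $\hat{T} \in \mathit{DT}^m_{\Sigma'} \cap \mathbb{T}^m_{\Sigma_2 \cup \Delta'}$. It is also easy to see that $\pi_2(\hat{T}) = T_1$, so $\hat{T} \in \pi_2^{-1}(L_1) = L'$. We have shown $\hat{T} \in L' \cap \mathit{DT}^m_{\Sigma'}$. Since $\pi_2(\hat{T}) = T_1$, $\widetilde{\pi}(\hat{T}) = \pi_1(\pi_2(\hat{T})) = \pi_1(T_1) = T$.

Now to show uniqueness, suppose $T' \in L' \cap \mathit{DT}^m_{\Sigma'}$ and $T = \widetilde{\pi}(T')$. Let $T_1 = \pi_1^{-1}(T)$. Then we have $T_1 = \pi_2(T')$. We prove $T' = \hat{T}$. Let $V' = (V, \ell^{V'}) = \mathrm{enc}_m^{-1}(T')$ and $V = (V, \ell^{V}) = \pi(V')$. Then it is clear that $\mathrm{enc}_m(V) = T$. So it suffices to prove $V' = \hat{V}$. Let $v \in V$. If $\ell^{V}(v) = x$, then clearly, $\ell^{V'}(v) = x = \ell^{\hat{V}}(v)$. If $v \in V - \mathrm{dom}(\prec^{V}_{m+1})$ and $\ell^{V}(v) \ne x$, then $\ell^{V'}(v) = \ell^{T'}(f^{V'}_{\mathrm{enc}_m}(v)) \in \Sigma_2$, since $V' \in \mathbb{T}^{m+1}_{\Sigma',x}$ and $T' \in L' \subseteq \mathbb{T}^m_{\Sigma_2 \cup \Delta'}$. So $\ell^{V'}(v) = \ell^{T'}(f^{V'}_{\mathrm{enc}_m}(v)) = \pi_2(\ell^{T'}(f^{V}_{\mathrm{enc}_m}(v))) = \ell^{T_1}(f^{V}_{\mathrm{enc}_m}(v)) = \ell^{\hat{V}}(v)$. If $v \in \mathrm{dom}(\prec^{V}_{m+1})$ and $C^{V}_m(v) = P$, then $\ell^{T'}(f^{V'}_{\mathrm{enc}_m}(v)) = (\ell^{V'}(v), P, 0) \in \Delta'$, so $\ell^{V'}(v) = (\delta_0, \delta_1, \dots, \delta_k)$, where $k = |P|$ and $(\delta_0, \delta_1, \dots, \delta_k) \in \Delta_{d,P}$ for some $d$. For $i = 1, \dots, k$, let $v_i$ be such that $v \blacktriangleleft^{V}_{m+1,i} v_i$, or, equivalently, $v \blacktriangleleft^{V'}_{m+1,i} v_i$. Then for $i = 1, \dots, k$, $\delta_i = \pi_2(((\delta_0, \delta_1, \dots, \delta_k), P, i)) = \pi_2(\ell^{T'}(f^{V'}_{\mathrm{enc}_m}(v_i))) = \ell^{T_1}(f^{V}_{\mathrm{enc}_m}(v_i))$. We also have $\delta_0 = \pi_2(((\delta_0, \delta_1, \dots, \delta_k), P, 0)) = \pi_2(\ell^{T'}(f^{V'}_{\mathrm{enc}_m}(v))) = \ell^{T_1}(f^{V}_{\mathrm{enc}_m}(v))$. So $\ell^{\hat{V}}(v) = (\delta_0, \delta_1, \dots, \delta_k) = \ell^{V'}(v)$. □

Next we prove a lemma about $\mathit{DT}^2_{\Upsilon}$. Recall that there was an implicit dependence on the dimension $m$ in the definition of $\widetilde{\Upsilon}$, which is the alphabet of the language $\mathit{DT}^m_{\Upsilon}$; when a symbol of the form $(c, P, i)$ is in $\widetilde{\Upsilon}$, it is assumed that $P$ is a finite, possibly empty, prefix-closed subset of $\mathbb{P}_{m-1}$. In what follows, we assume that the alphabet $\widetilde{\Upsilon}$ is defined from $\Upsilon$ with respect to dimension $m = 2$, so that $(c, P, i) \in \widetilde{\Upsilon}$ implies $P = \{\varepsilon, 1, \dots, 1^{k-1}\}$ for some $k \ge 0$. We abbreviate $(c, P, i)$ by $(c, k, i)$, where $|P| = k$. Under this convention,
\[ \widetilde{\Upsilon}_q = \Upsilon \cup \{\, (c, k, i) \mid c \in \Upsilon,\ 0 \le k \le q,\ 0 \le i \le k \,\}. \]
Recall that for any alphabet $\Sigma$, the alphabet $\Gamma_{\Sigma}$ consists of symbols of the form $[_c$ or $]_c$ with $c \in \Sigma$. We let $\mathbb{T}_{\Upsilon,\widetilde{\Upsilon}-\Upsilon}$ stand for the set of trees $T \in \mathbb{T}_{\widetilde{\Upsilon}}$ such that for every $v \in T$, $\ell^{T}(v) \in \widetilde{\Upsilon} - \Upsilon$ implies $|C^{T}_2(v)| = 1$. (In other words, $\widetilde{\Upsilon} - \Upsilon$ is regarded as a ranked alphabet all of whose symbols have rank 1.)

Lemma 36. Let $\eta\colon \Gamma^{*}_{\widetilde{\Upsilon}} \to \Gamma^{*}_{\widetilde{\Upsilon}}$ be the alphabetic homomorphism defined as follows:
\[ \eta([_c) = \varepsilon, \quad \eta(]_c) = \varepsilon \quad \text{for } c \in \Upsilon, \]
\[ \eta([_{(c,k,0)}) = [_{(c,k,0)}, \quad \eta(]_{(c,k,0)}) = ]_{(c,k,k)}, \]
\[ \eta([_{(c,k,i)}) = ]_{(c,k,i-1)}, \quad \eta(]_{(c,k,i)}) = [_{(c,k,i)} \quad \text{for } 1 \le i \le k. \]
Then $\mathit{DT}^2_{\Upsilon} = \mathbb{T}_{\Upsilon,\widetilde{\Upsilon}-\Upsilon} \cap \mathrm{enc}^{-1}(\eta^{-1}(D_{\widetilde{\Upsilon}}))$. (Here, $\mathrm{enc}$ is the standard encoding function defined on ordinary 2-dimensional trees.)

Proof. (⊆). Suppose $T \in \mathit{DT}^2_{\Upsilon}$. Clearly, $T \in \mathbb{T}_{\Upsilon,\widetilde{\Upsilon}-\Upsilon}$, so it suffices to prove $\eta(\mathrm{enc}(T)) \in D_{\widetilde{\Upsilon}}$. This is proved by induction on the length of the reduction of $T$ to some $T' \in \mathbb{T}_{\Upsilon}$. If $T \in \mathbb{T}_{\Upsilon}$, then clearly, $\eta(\mathrm{enc}(T)) = \varepsilon \in D_{\widetilde{\Upsilon}}$. Suppose $T$ reduces in one step to $T''$, which in turn reduces to $T' \in \mathbb{T}_{\Upsilon}$. Then
\[ T = U[(c, k, 0) -_2 T_0[(c, k, 1) -_2 T_1, \dots, (c, k, k) -_2 T_k]] \]
and $T'' = U[T_0[T_1, \dots, T_k]]$ for some $c \in \Upsilon$, $k \ge 0$, $U \in \mathbb{T}_{\widetilde{\Upsilon}}(1)$, $T_0 \in \mathbb{T}_{\Upsilon}(k)$, and $T_i \in \mathbb{T}_{\widetilde{\Upsilon}}$ ($i = 1, \dots, k$). Then
\[ \eta(\mathrm{enc}(T)) = z_1\, [_{(c,k,0)}\, ]_{(c,k,0)}\, y_1\, [_{(c,k,1)}\, ]_{(c,k,1)}\, y_2 \dots y_k\, [_{(c,k,k)}\, ]_{(c,k,k)}\, z_2 \]
and $\eta(\mathrm{enc}(T'')) = z_1 y_1 y_2 \dots y_k z_2$ for some $z_1, z_2, y_1, y_2, \dots, y_k \in (\Gamma_{\widetilde{\Upsilon}-\Upsilon})^{*}$. By induction hypothesis, $\eta(\mathrm{enc}(T'')) \in D_{\widetilde{\Upsilon}}$, and this easily implies $\eta(\mathrm{enc}(T)) \in D_{\widetilde{\Upsilon}}$.

(⊇). Suppose $T \in \mathbb{T}_{\Upsilon,\widetilde{\Upsilon}-\Upsilon}$ and $\eta(\mathrm{enc}(T)) \in D_{\widetilde{\Upsilon}}$. If $T \in \mathbb{T}_{\Upsilon}$, then $T \in \mathit{DT}^2_{\Upsilon}$. Suppose that $T \notin \mathbb{T}_{\Upsilon}$. First, we claim that $T$ must have a node labeled by $(c, k, 0)$ for some $c \in \Upsilon$ and $k \ge 0$. For, if $T$ has a node $v$ such that $\ell^{T}(v) = (c, k, i)$ for some $c \in \Upsilon$, $k \ge 1$, and $i \ge 1$, then $\mathrm{enc}(T) = x_1\, [_{(c,k,i)}\, x_2\, ]_{(c,k,i)}\, x_3$ and $\eta(\mathrm{enc}(T)) = \eta(x_1)\, ]_{(c,k,i-1)}\, \eta(x_2)\, [_{(c,k,i)}\, \eta(x_3)$ for some $x_1, x_2, x_3 \in \Gamma^{*}_{\widetilde{\Upsilon}}$. Since $\eta(\mathrm{enc}(T)) \in D_{\widetilde{\Upsilon}}$, $[_{(c,k,i-1)}$ must occur in $\eta(x_1)$. If $i - 1 = 0$, this implies that $[_{(c,k,0)}$ occurs in $x_1$ and it follows that $T$ has a node labeled by $(c, k, 0)$. Otherwise, $]_{(c,k,i-1)}$ occurs in $x_1$, and it follows that $T$ has a node labeled by $(c, k, i-1)$. Repeating this reasoning, we see that $T$ must have a node labeled by $(c, k, 0)$.

We show that $T \in \mathit{DT}^2_{\Upsilon}$ by induction on the number of nodes of $T$ that are labeled by a symbol of the form $(c, k, 0)$. Let $v$ be a node of $T$ labeled by $(c, k, 0)$ such that no node $v'$ with $v\, (\vartriangleleft^{T}_2)^{+}\, v'$ is labeled by a symbol of the form $(d, l, 0)$. Then $\mathrm{enc}(T) = x_1\, [_{(c,k,0)}\, y\, ]_{(c,k,0)}\, x_2$ for some $x_1, x_2 \in \Gamma^{*}_{\widetilde{\Upsilon}}$ and $y \in \mathrm{enc}(\mathbb{T}^2_{\Upsilon,\widetilde{\Upsilon}-\Upsilon})$. (Note that $|C^{T}_2(v)| = 1$.)

Case 1. $k = 0$. We show $y \in D'_{\Upsilon}$, i.e., the subtree $T_0$ of $T$ rooted at $v \cdot 2$ is in $\mathbb{T}_{\Upsilon}$. Suppose otherwise, and take the alphabetically first node of $T_0$ labeled by some $(d, l, j) \in \widetilde{\Upsilon} - \Upsilon$. Then $y = y'\, [_{(d,l,j)}\, y''$ for some $y' \in \Gamma^{*}_{\Upsilon}$ and $y'' \in \Gamma^{*}_{\widetilde{\Upsilon}}$. By our assumption about $v$, $j \ge 1$ and $l \ge 1$. We have $\eta(\mathrm{enc}(T)) = \eta(x_1)\, [_{(c,0,0)}\, \eta(y)\, ]_{(c,0,0)}\, \eta(x_2) = \eta(x_1)\, [_{(c,0,0)}\, ]_{(d,l,j-1)}\, \eta(y'')\, ]_{(c,0,0)}\, \eta(x_2)$, which contradicts the assumption that $\eta(\mathrm{enc}(T)) \in D_{\widetilde{\Upsilon}}$.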
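So $T_0$ is in $\mathbb{T}_{\Upsilon}$, and we can write $T = U[(c, 0, 0) -_2 T_0]$. So $T$ reduces in one step to $U[T_0] \in \mathbb{T}_{\Upsilon,\widetilde{\Upsilon}-\Upsilon}$. We have $\eta(\mathrm{enc}(T)) = \eta(x_1)\, [_{(c,0,0)}\, ]_{(c,0,0)}\, \eta(x_2)$ and $\eta(\mathrm{enc}(U[T_0])) = \eta(x_1)\eta(x_2)$. Since $\eta(\mathrm{enc}(T)) \in D_{\widetilde{\Upsilon}}$, $\eta(x_1)\eta(x_2) \in D_{\widetilde{\Upsilon}}$ as well, and $U[T_0] \in \mathit{DT}^2_{\Upsilon}$ by induction hypothesis. Since $T$ reduces to $U[T_0]$, we conclude $T \in \mathit{DT}^2_{\Upsilon}$.

Case 2. $k \ge 1$. We show
\[ y = z_0\, [_{(c,k,1)}\, y_1\, ]_{(c,k,1)}\, z_1\, [_{(c,k,2)}\, y_2\, ]_{(c,k,2)} \dots z_{k-1}\, [_{(c,k,k)}\, y_k\, ]_{(c,k,k)}\, z_k \]
for some $z_0, z_1, \dots, z_k \in \Gamma^{*}_{\Upsilon}$ and $y_1, y_2, \dots, y_k \in \mathrm{enc}(\mathbb{T}_{\Upsilon,\widetilde{\Upsilon}-\Upsilon})$. First, we show by induction that the following condition holds for $i = 0, \dots, k$:
\[ y \text{ has a prefix of the form } z_0\, [_{(c,k,1)}\, y_1\, ]_{(c,k,1)} \dots z_{i-1}\, [_{(c,k,i)}\, y_i\, ]_{(c,k,i)}. \tag{24} \]
The case of $i = 0$ is trivial. Suppose we have shown (24) for $i < k$, i.e., $y = z_0\, [_{(c,k,1)}\, y_1\, ]_{(c,k,1)} \dots z_{i-1}\, [_{(c,k,i)}\, y_i\, ]_{(c,k,i)}\, y'$ with $z_0, \dots, z_{i-1} \in \Gamma^{*}_{\Upsilon}$ and $y_1, \dots, y_i \in \mathrm{enc}(\mathbb{T}_{\Upsilon,\widetilde{\Upsilon}-\Upsilon})$. Since
\[ \eta(\mathrm{enc}(T)) = \eta(x_1)\, [_{(c,k,0)}\, \eta(y)\, ]_{(c,k,k)}\, \eta(x_2) = \eta(x_1)\, [_{(c,k,0)}\, ]_{(c,k,0)}\, \eta(y_1)\, [_{(c,k,1)} \dots ]_{(c,k,i-1)}\, \eta(y_i)\, [_{(c,k,i)}\, \eta(y')\, ]_{(c,k,k)}\, \eta(x_2), \]
we must have $\eta(y') \ne \varepsilon$. Let $z_i$ be the longest prefix of $y'$ in $\Gamma^{*}_{\Upsilon}$. Since $z_0, \dots, z_{i-1} \in \Gamma^{*}_{\Upsilon}$ and $y_1, \dots, y_i$ and $y$ are all in $\mathrm{enc}(\mathbb{T}_{\Upsilon,\widetilde{\Upsilon}-\Upsilon})$, it is easy to see that $y' = z_i\, [_{(d,l,j)}\, y_{i+1}\, ]_{(d,l,j)}\, y''$ for some $y_{i+1} \in \mathrm{enc}(\mathbb{T}_{\Upsilon,\widetilde{\Upsilon}-\Upsilon})$ and $l, j \ge 1$. We have $\eta(y') = ]_{(d,l,j-1)}\, \eta(y_{i+1})\, [_{(d,l,j)}\, \eta(y'')$, and so
\[ \eta(\mathrm{enc}(T)) = \eta(x_1)\, [_{(c,k,0)}\, ]_{(c,k,0)}\, \eta(y_1)\, [_{(c,k,1)} \dots ]_{(c,k,i-1)}\, \eta(y_i)\, [_{(c,k,i)}\, ]_{(d,l,j-1)}\, \eta(y_{i+1})\, [_{(d,l,j)}\, \eta(y'')\, ]_{(c,k,k)}\, \eta(x_2), \]
which implies $d = c$, $l = k$, and $j = i + 1$. This shows that (24) holds with $i + 1$ in place of $i$.

By induction, (24) holds with $i = k$. We have $y = z_0\, [_{(c,k,1)}\, y_1\, ]_{(c,k,1)} \dots z_{k-1}\, [_{(c,k,k)}\, y_k\, ]_{(c,k,k)}\, y'$ with $z_0, \dots, z_{k-1} \in \Gamma^{*}_{\Upsilon}$ and $y_1, \dots, y_k \in \mathrm{enc}(\mathbb{T}_{\Upsilon,\widetilde{\Upsilon}-\Upsilon})$. We show that $y' \in \Gamma^{*}_{\Upsilon}$. Suppose otherwise. Then we must have $y' = z_k\, [_{(d,l,j)}\, y_{k+1}\, ]_{(d,l,j)}\, y''$ for some $d \in \Upsilon$, $j, l \ge 1$, $z_k \in \Gamma^{*}_{\Upsilon}$, and $y_{k+1} \in \mathrm{enc}(\mathbb{T}_{\Upsilon,\widetilde{\Upsilon}-\Upsilon})$.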
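Then $\eta(\mathrm{enc}(T))$ contains as a substring $[_{(c,k,k)}\, ]_{(d,l,j-1)}$, which is a contradiction since $\eta(\mathrm{enc}(T)) \in D_{\widetilde{\Upsilon}}$ and $j - 1 < l$. Therefore,
\[ y = z_0\, [_{(c,k,1)}\, y_1\, ]_{(c,k,1)} \dots z_{k-1}\, [_{(c,k,k)}\, y_k\, ]_{(c,k,k)}\, z_k \]
with $z_0, \dots, z_k \in \Gamma^{*}_{\Upsilon}$ and $y_1, \dots, y_k \in \mathrm{enc}(\mathbb{T}_{\Upsilon,\widetilde{\Upsilon}-\Upsilon})$. This means that $T$ is of the form
\[ T = U[(c, k, 0) -_2 T_0[(c, k, 1) -_2 T_1, \dots, (c, k, k) -_2 T_k]], \]
where $T_0 \in \mathbb{T}_{\Upsilon}(k)$. So $T$ reduces in one step to $T' = U[T_0[T_1, \dots, T_k]]$. Clearly, $T' \in \mathbb{T}_{\Upsilon,\widetilde{\Upsilon}-\Upsilon}$, and $\eta(\mathrm{enc}(T')) = \eta(x_1)\eta(y_1) \dots \eta(y_k)\eta(x_2)$. Since
\[ \eta(\mathrm{enc}(T)) = \eta(x_1)\, [_{(c,k,0)}\, ]_{(c,k,0)}\, \eta(y_1)\, [_{(c,k,1)}\, ]_{(c,k,1)}\, \eta(y_2) \dots [_{(c,k,k-1)}\, ]_{(c,k,k-1)}\, \eta(y_k)\, [_{(c,k,k)}\, ]_{(c,k,k)}\, \eta(x_2) \]
is in $D_{\widetilde{\Upsilon}}$, it follows that $\eta(\mathrm{enc}(T'))$ is in $D_{\widetilde{\Upsilon}}$ as well, and the induction hypothesis gives $T' \in \mathit{DT}^2_{\Upsilon}$. Since $T$ reduces to $T'$, we conclude $T \in \mathit{DT}^2_{\Upsilon}$. □

Lemma 37. If $L \subseteq \mathbb{T}^3_{\Sigma,x}$ is a local set, then there exist a finite alphabet $\Upsilon$, a projection $\pi\colon \Upsilon \to \Sigma$, and a local set $R \subseteq \Gamma^{+}_{\widetilde{\Upsilon}}$ such that
\[ \mathrm{enc}(\mathrm{enc}_2(L)) = \widehat{\widetilde{\pi}}(R \cap D_{\widetilde{\Upsilon}} \cap \eta^{-1}(D_{\widetilde{\Upsilon}})), \]
where $\eta$ is the alphabetic homomorphism defined in Lemma 36. Moreover, $\mathrm{enc}_2^{-1} \circ \mathrm{enc}^{-1} \circ \widehat{\widetilde{\pi}}$ maps $R \cap D_{\widetilde{\Upsilon}} \cap \eta^{-1}(D_{\widetilde{\Upsilon}})$ bijectively to $L$.

Proof. By Lemma 29, there exist a finite alphabet $\Upsilon_1$, a projection $\pi_1\colon \Upsilon_1 \to \Sigma$, and a local set $L_1 \subseteq \mathbb{T}^2_{\widetilde{\Upsilon}_1}$ such that $\mathrm{enc}_2(L) = \widetilde{\pi_1}(L_1 \cap \mathit{DT}^2_{\Upsilon_1})$ and $\widetilde{\pi_1}$ is a bijection from $L_1 \cap \mathit{DT}^2_{\Upsilon_1}$ to $\mathrm{enc}_2(L)$. We may assume $L_1 \subseteq \mathbb{T}_{\Upsilon_1,\widetilde{\Upsilon}_1-\Upsilon_1}$. By Lemma 35, there exist a finite alphabet $\Upsilon_2$, a projection $\pi_2\colon \Upsilon_2 \to \Upsilon_1$, and a super-local set $L_2 \subseteq \mathbb{T}^2_{\widetilde{\Upsilon}_2}$ such that $\widetilde{\pi_2}(L_2) \subseteq L_1$ and $\widetilde{\pi_2}(L_2 \cap \mathit{DT}^2_{\Upsilon_2}) = L_1 \cap \mathit{DT}^2_{\Upsilon_1}$. Since $L_1 \subseteq \mathbb{T}_{\Upsilon_1,\widetilde{\Upsilon}_1-\Upsilon_1}$, it follows that $L_2 \subseteq \mathbb{T}_{\Upsilon_2,\widetilde{\Upsilon}_2-\Upsilon_2}$. We have
\[
\begin{aligned}
\mathrm{enc}(\mathrm{enc}_2(L)) &= \mathrm{enc}(\widetilde{\pi_1}(L_1 \cap \mathit{DT}^2_{\Upsilon_1})) \\
&= \mathrm{enc}(\widetilde{\pi_1}(\widetilde{\pi_2}(L_2 \cap \mathit{DT}^2_{\Upsilon_2}))) \\
&= \widehat{\widetilde{\pi_1}}(\widehat{\widetilde{\pi_2}}(\mathrm{enc}(L_2 \cap \mathit{DT}^2_{\Upsilon_2}))) \\
&= \widehat{\widetilde{\pi_1}}(\widehat{\widetilde{\pi_2}}(\mathrm{enc}(L_2) \cap \mathrm{enc}(\mathit{DT}^2_{\Upsilon_2}))),
\end{aligned}
\tag{25}
\]
since $\mathrm{enc}$ is injective. By Lemma 36,
\[ \mathrm{enc}(\mathit{DT}^2_{\Upsilon_2}) = \mathrm{enc}(\mathbb{T}_{\Upsilon_2,\widetilde{\Upsilon}_2-\Upsilon_2}) \cap \eta_2^{-1}(D_{\widetilde{\Upsilon}_2}), \]
where $\eta_2\colon \Gamma^{*}_{\widetilde{\Upsilon}_2} \to \Gamma^{*}_{\widetilde{\Upsilon}_2}$ is an alphabetic homomorphism defined like $\eta$. Since $L_2 \subseteq \mathbb{T}_{\Upsilon_2,\widetilde{\Upsilon}_2-\Upsilon_2}$,
\[ \mathrm{enc}(L_2) \cap \mathrm{enc}(\mathit{DT}^2_{\Upsilon_2}) = \mathrm{enc}(L_2) \cap \eta_2^{-1}(D_{\widetilde{\Upsilon}_2}). \]
By Lemma 2, $\mathrm{enc}(L_2) = R_2 \cap D'_{\widetilde{\Upsilon}_2}$ for some local set $R_2 \subseteq \Gamma^{+}_{\widetilde{\Upsilon}_2}$. So we have
\[ \mathrm{enc}(L_2) \cap \mathrm{enc}(\mathit{DT}^2_{\Upsilon_2}) = R_2 \cap D'_{\widetilde{\Upsilon}_2} \cap \eta_2^{-1}(D_{\widetilde{\Upsilon}_2}). \tag{26} \]
Given (25) and (26), all we need is to turn $R_2 \cap D'_{\widetilde{\Upsilon}_2} \cap \eta_2^{-1}(D_{\widetilde{\Upsilon}_2})$ into the form $\widehat{\widetilde{\pi_3}}(R \cap D_{\widetilde{\Upsilon}} \cap \eta^{-1}(D_{\widetilde{\Upsilon}}))$. For this, we can use a method similar to the one we used in the proof of Lemma 6. Let $\Upsilon = \Upsilon_2 \cup \{\, \bar{c} \mid c \in \Upsilon_2 \,\}$ and define $\pi_3\colon \Upsilon \to \Upsilon_2$ by $\pi_3(c) = c$, $\pi_3(\bar{c}) = c$, for each $c \in \Upsilon_2$. Let
\[ \Delta_1 = \{\, \bar{c} \mid c \in \Upsilon_2 \,\} \cup \{\, (\bar{c}, P, 0) \mid (c, P, 0) \in \widetilde{\Upsilon}_2 \,\}, \]
\[ \Delta_2 = \widetilde{\Upsilon}_2 \cup \{\, (\bar{c}, P, i) \mid (c, P, i) \in \widetilde{\Upsilon}_2,\ i \ge 1 \,\}. \]
Then $\Delta_1, \Delta_2$ is a partition of $\widetilde{\Upsilon}$. Let
\[ R = \bigl(\{\, [_d \mid d \in \Delta_1 \,\}\ \Gamma^{*}_{\Delta_2}\ \{\, ]_d \mid d \in \Delta_1 \,\}\bigr) \cap \widehat{\widetilde{\pi_3}}^{-1}(R_2). \]
Then $R$ is a local subset of $\Gamma^{+}_{\widetilde{\Upsilon}}$,²² and it is easy to see
\[ \widehat{\widetilde{\pi_3}}(R \cap D_{\widetilde{\Upsilon}} \cap \eta^{-1}(D_{\widetilde{\Upsilon}})) = R_2 \cap D'_{\widetilde{\Upsilon}_2} \cap \eta_2^{-1}(D_{\widetilde{\Upsilon}_2}), \tag{27} \]
where $\eta$ is as defined in Lemma 36. It is also easy to see that $\widehat{\widetilde{\pi_3}}$ maps $R \cap D_{\widetilde{\Upsilon}} \cap \eta^{-1}(D_{\widetilde{\Upsilon}})$ bijectively to $R_2 \cap D'_{\widetilde{\Upsilon}_2} \cap \eta_2^{-1}(D_{\widetilde{\Upsilon}_2})$.

We obtain the statement of the lemma from (25), (26), and (27) by letting $\pi = \pi_1 \circ \pi_2 \circ \pi_3$. □

Recall that when $\Sigma_0$ and $\Sigma_1$ are disjoint alphabets, $\mathbb{T}^{\Sigma_1}_{\Sigma_0}$ consists of all (ordinary 2-dimensional) trees in $\mathbb{T}_{\Sigma_0 \cup \Sigma_1}$ that are disjointly labeled with $\Sigma_0, \Sigma_1$.

Lemma 38. If $L \subseteq \mathbb{T}^3_{\Sigma,x}$ is a local set, then there exist an alphabet $\Sigma'$ disjoint from $\Sigma$, a projection $\pi\colon \Sigma \cup \Sigma' \to \Sigma$, and a local set $L' \subseteq \mathbb{T}^3_{\Sigma \cup \Sigma',x}$ that satisfy the following conditions:
(i) $\pi(c) = c$ for all $c \in \Sigma$.
(ii) $L = \pi(L')$. Moreover, $\pi$ maps $L'$ bijectively to $L$.
(iii) $y_2(L') \subseteq \mathbb{T}^{\Sigma'}_{\Sigma}$.
(iv) $y_2(L) = \pi(y_2(L'))$. Moreover, $\pi$ maps $y_2(L')$ bijectively to $y_2(L)$.
(v) $y(y_2(L)) = y(y_2(L'))$.

Proof. Let $L = \mathrm{Loc}^3(A, Z, I)$ be a local subset of $\mathbb{T}^3_{\Sigma,x}$. Let $\Sigma' = \{\, \bar{c} \mid c \in \Sigma \,\}$. Define a projection $\pi\colon \Sigma \cup \Sigma' \to \Sigma$ by $\pi(c) = c$, $\pi(\bar{c}) = c$, for each $c \in \Sigma$.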
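Let
\[ Z' = Z \cup \{\, \bar{c} \mid c \in Z \,\}, \]
\[ I' = \{\, (c, T') \mid (c, T) \in I,\ T \in \mathbb{T}^2_{\Sigma}(0) \,\} \cup \{\, (\bar{c}, T') \mid (c, T) \in I,\ T \in \mathbb{T}^2_{\Sigma}(n),\ n \ge 1 \,\}, \]
where for $T = (T, \ell^{T}) \in \mathbb{T}^2_{\Sigma \cup \{x\}}$, $T' = (T, \ell^{T'})$ is defined by
\[
\ell^{T'}(v) =
\begin{cases}
\bar{c} & \text{if } \ell^{T}(v) = c \in \Sigma \text{ and } v \in \mathrm{dom}(\prec^{T}_2),\\
\ell^{T}(v) & \text{otherwise.}
\end{cases}
\]
Note that $T' \in \mathbb{T}^2_{\Sigma \cup \Sigma' \cup \{x\}}$. Define a local subset $L'$ of $\mathbb{T}^3_{\Sigma \cup \Sigma' \cup \{x\}}$ by $L' = \mathrm{Loc}^3(A, Z', I')$. Then it is easy to see that $\pi$ and $L'$ satisfy the required properties. □

²² Although $\Gamma_{\widetilde{\Upsilon}}$ is an infinite alphabet, only finitely many symbols in it appear in $\widehat{\widetilde{\pi_3}}^{-1}(R_2)$ since $R_2$ is local.

Lemma 39. If $L \subseteq \mathbb{T}^3_{\Sigma,x}$ is a local set, then there exist a finite alphabet $\Upsilon$, a local set $R \subseteq \Gamma^{+}_{\widetilde{\Upsilon}}$, and an alphabetic homomorphism $h\colon \Gamma^{*}_{\widetilde{\Upsilon}} \to \Sigma^{*}$ such that
\[ y(y_2(L)) = h(R \cap D_{\widetilde{\Upsilon}} \cap \eta^{-1}(D_{\widetilde{\Upsilon}})), \]
where $\widetilde{\Upsilon}$ is defined with respect to dimension 2 and $\eta$ is the alphabetic homomorphism defined in Lemma 36.

Proof. Let $L \subseteq \mathbb{T}^3_{\Sigma,x}$ be a local set. Applying Lemma 38 to $L$, we obtain a projection $\pi'\colon \Sigma \cup \Sigma' \to \Sigma$ and a local set $L' \subseteq \mathbb{T}^3_{\Sigma \cup \Sigma',x}$ such that $\pi'$ maps $L'$ bijectively to $L$, $y_2(L') \subseteq \mathbb{T}^{\Sigma'}_{\Sigma}$, and $y(y_2(L)) = y(y_2(L'))$. By Lemma 37,
\[ \mathrm{enc}(\mathrm{enc}_2(L')) = \widehat{\widetilde{\pi}}(R \cap D_{\widetilde{\Upsilon}} \cap \eta^{-1}(D_{\widetilde{\Upsilon}})) \]
for some finite set $\Upsilon$, projection $\pi\colon \Upsilon \to \Sigma \cup \Sigma'$, and local set $R \subseteq \Gamma^{+}_{\widetilde{\Upsilon}}$. We have
\[
\begin{aligned}
y(y_2(L)) &= y(y_2(L')) \\
&= h_{\Sigma_0,\Sigma_1}(\mathrm{enc}(\mathrm{del}_{2,\widetilde{\Upsilon}-\Upsilon}(\mathrm{enc}_2(L')))) \\
&= h_{\Sigma_0,\Sigma_1}(h_{\Gamma_{\Upsilon},\Gamma_{\widetilde{\Upsilon}-\Upsilon}}(\mathrm{enc}(\mathrm{enc}_2(L')))) \\
&= h_{\Sigma_0,\Sigma_1}(h_{\Gamma_{\Upsilon},\Gamma_{\widetilde{\Upsilon}-\Upsilon}}(\widehat{\widetilde{\pi}}(R \cap D_{\widetilde{\Upsilon}} \cap \eta^{-1}(D_{\widetilde{\Upsilon}})))),
\end{aligned}
\]
so the statement of the lemma holds with $h = h_{\Sigma_0,\Sigma_1} \circ h_{\Gamma_{\Upsilon},\Gamma_{\widetilde{\Upsilon}-\Upsilon}} \circ \widehat{\widetilde{\pi}}$. □

Note that in the above proof, the set $R \cap D_{\widetilde{\Upsilon}} \cap \eta^{-1}(D_{\widetilde{\Upsilon}})$ is mapped bijectively to $L$ by $\pi' \circ \mathrm{enc}_2^{-1} \circ \mathrm{enc}^{-1} \circ \widehat{\widetilde{\pi}}$.

The following lemma is analogous to the corresponding characterization of context-free (string) languages.

Lemma 40. Let $L \subseteq \mathbb{T}_{\Sigma}$. Then $L \in \mathrm{CFT}_{\mathrm{sp}}(r)$ if and only if there exist a finite alphabet $\Upsilon$ and a local set $K \subseteq \mathbb{T}^3_{\Upsilon,x}$ such that
(i) $L = y_2(K)$, and
(ii) for all $T \in K$ and all $v \in T$, if $v \in \mathrm{dom}(\prec^{T}_3)$, then $|C^{T}_2(v)| \le r$.

Lemma 41.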
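For every $L \in \mathrm{CFT}_{\mathrm{sp}}(r)$, there is an $L' \in \mathrm{CFT}_{\mathrm{sp}}(r)$ such that $\mathrm{enc}(L) = y(L')$.

Proof. Let $K \subseteq \mathbb{T}^3_{\Upsilon,x}$ be a local set satisfying condition (ii) of Lemma 40. By Lemma 38, we may assume $K = \mathrm{Loc}^3(A, Z, I)$ with $Z \cap \{\, c \mid (c, T) \in I \,\} = \emptyset$. Let
\[ A' = A \cup \{\, \bar{c} \mid c \in A \cap Z \,\}, \]
\[ Z' = Z \cup \bigcup \{\, \{[_c, ]_c\} \mid c \in Z \,\}, \]
\[ I' = \{\, (c, \varphi(T)) \mid (c, T) \in I \,\} \cup \{\, (\bar{c}, c([_c\, ]_c)) \mid c \in A \cap Z \,\}, \]
where for each $c \in A \cap Z$, $\bar{c}$ is a new symbol, and
\[ \varphi(x) = x, \]
\[ \varphi(c) = \begin{cases} c([_c\, ]_c) & \text{if } c \in Z,\\ c & \text{otherwise,} \end{cases} \]
\[ \varphi(c(T_1 \dots T_n)) = \begin{cases} c([_c\, \varphi(T_1) \dots \varphi(T_n)\, ]_c) & \text{if } c \in Z,\\ c(\varphi(T_1) \dots \varphi(T_n)) & \text{otherwise.} \end{cases} \]
Then it is easy to see $K' = \mathrm{Loc}^3(A', Z', I')$ also satisfies condition (ii) of Lemma 40, and we have $\mathrm{enc}(y_2(K)) = y(y_2(K'))$. We omit the details. □

Recall that $\Gamma_n = \{[_1, ]_1, \dots, [_n, ]_n\}$ and $D_n$ is the Dyck language over $\Gamma_n$, where $[_i$ and $]_i$ form a matching pair of brackets for $i = 1, \dots, n$.

Theorem 42. Let $q \ge 1$ and $M \subseteq \Sigma^{*}$. The following are equivalent:
(i) $M \in y\mathrm{CFT}_{\mathrm{sp}}(q - 1)$.
(ii) There exist a positive integer $n$, a local set $R \subseteq \Gamma^{*}_{qn}$, and an alphabetic homomorphism $h\colon \Gamma^{*}_{qn} \to \Sigma^{*}$ such that
\[ M = h(R \cap D_{qn} \cap g^{-1}(D_{qn})), \]
where $g$ is the bijection on $\Gamma_{qn}$ defined by
\[ g([_{qi+1}) = [_{qi+1}, \quad g(]_{qi+1}) = ]_{qi+q}, \quad g([_{qi+j}) = ]_{qi+j-1}, \quad g(]_{qi+j}) = [_{qi+j}, \]
for $i = 0, \dots, n - 1$ and $j = 2, \dots, q$.

Proof. (ii) ⇒ (i). By Lemma 41 and the fact that $y\mathrm{CFT}_{\mathrm{sp}}(q - 1)$ is a substitution-closed full abstract family of languages [28], it suffices to show that there are some $L \in \mathrm{CFT}_{\mathrm{sp}}(q - 1)$ and homomorphism $h$ such that $h(\mathrm{enc}(L)) = D_{qn} \cap g^{-1}(D_{qn})$.

Let $m = 2$, $\Sigma = \{c_1, \dots, c_n\}$, $r = q - 1$, $p = 2$, and $\Upsilon = \widetilde{\Sigma}_{q-1} \cup \{X_0, \dots, X_{q-1}\}$. Lemma 30 gives finite sets $A \subseteq \Upsilon$, $Z = \widetilde{\Sigma}_{q-1}$, $I \subseteq \{X_0, \dots, X_{q-1}\} \times \mathbb{T}^2_{\Upsilon}$ such that
\[ \mathit{DT}^2_{\Sigma} \cap \mathbb{T}^2_{\widetilde{\Sigma}_{q-1},2} = y_2(\mathrm{Loc}^3(A, Z, I)). \]
Let $L = \mathit{DT}^2_{\Sigma} \cap \mathbb{T}^2_{\widetilde{\Sigma}_{q-1},2}$. Inspection of the proof of Lemma 30 also shows that for all $T \in \mathrm{Loc}^3(A, Z, I)$ and all $v \in T$, $v \in \mathrm{dom}(\prec^{T}_3)$ implies $|C^{T}_2(v)| \le q - 1$. So $L \in \mathrm{CFT}_{\mathrm{sp}}(q - 1)$.

Let $\Phi = \{\, (c_i, q - 1, j) \mid 1 \le i \le n,\ 0 \le j \le q - 1 \,\}$. We identify $\Gamma_{\Phi} = \bigcup \{\, \{[_d, ]_d\} \mid d \in \Phi \,\}$ with $\Gamma_{qn}$. Let $h\colon (\Gamma_{\widetilde{\Sigma}_{q-1}})^{*} \to (\Gamma_{\widetilde{\Sigma}_{q-1}})^{*}$ be the homomorphism that erases all symbols that are not in $\Gamma_{qn}$. Our goal is to show $h(\mathrm{enc}(L)) = D_{qn} \cap g^{-1}(D_{qn})$.

To show $h(\mathrm{enc}(L)) \subseteq D_{qn} \cap g^{-1}(D_{qn})$, suppose $T \in L$. Since $L \subseteq \mathbb{T}^2_{\widetilde{\Sigma}_{q-1},2}$, it is clear that
\[ h(\mathrm{enc}(T)) \in D_{qn}. \tag{28} \]
Since $L \subseteq \mathit{DT}^2_{\Sigma}$, by Lemma 36, $\eta(\mathrm{enc}(T)) \in D_{\widetilde{\Sigma}_{q-1}}$, where $\eta\colon \Gamma^{*}_{\widetilde{\Sigma}} \to \Gamma^{*}_{\widetilde{\Sigma}}$ is as defined in Lemma 36, with $\Sigma$ in place of $\Upsilon$. So
\[ h(\eta(\mathrm{enc}(T))) \in h(D_{\widetilde{\Sigma}_{q-1}}) = D_{qn}. \]
Note that $\eta$ restricted to $\Gamma_{qn}$ coincides with $g$. So we have
\[ h(\eta(\mathrm{enc}(T))) = \eta(h(\mathrm{enc}(T))) = g(h(\mathrm{enc}(T))). \]
This shows that
\[ h(\mathrm{enc}(T)) \in g^{-1}(D_{qn}). \tag{29} \]
By (28) and (29), $h(\mathrm{enc}(L)) \subseteq D_{qn} \cap g^{-1}(D_{qn})$.

Now we show the converse inclusion. Let $s \in D_{qn} \cap g^{-1}(D_{qn})$. Then there is a hedge $T \in \mathbb{H}^2_{\Phi}$ such that $s = \mathrm{enc}(T)$. We turn $T$ into a tree $T' = \varphi(T) \in \mathbb{T}_{\{c_1\},\Phi} \cap \mathbb{T}^2_{\widetilde{\Sigma}_{q-1},2}$, where symbols in $\Phi$ are assumed to have rank 1:
\[ \varphi((c_i, q - 1, j)) = (c_i, q - 1, j)(c_1), \]
\[ \varphi((c_i, q - 1, j)(T_1 \dots T_n)) = (c_i, q - 1, j)(\varphi(T_1 \dots T_n)), \]
\[ \varphi(T_1 \dots T_n) = c_1(\varphi(T_1)\ \varphi(T_2 \dots T_n)) \quad \text{where } n \ge 2. \]
Then $s = h(\mathrm{enc}(T'))$. We have $\eta(\mathrm{enc}(T')) = g(s) \in D_{qn} \subseteq D_{\widetilde{\Sigma}}$. Since $T' \in \mathbb{T}_{\{c_1\},\Phi} \subseteq \mathbb{T}_{\Sigma,\widetilde{\Sigma}-\Sigma}$, Lemma 36 implies that $T' \in \mathit{DT}^2_{\Sigma}$. So $T' \in L$ and $s = h(\mathrm{enc}(T')) \in h(\mathrm{enc}(L))$. We conclude $D_{qn} \cap g^{-1}(D_{qn}) \subseteq h(\mathrm{enc}(L))$. We have shown $h(\mathrm{enc}(L)) = D_{qn} \cap g^{-1}(D_{qn})$.

(i) ⇒ (ii). Let $L \in \mathrm{CFT}_{\mathrm{sp}}(q - 1)$ be such that $M = y(L)$. By Lemma 40, $L = y_2(K)$ and $\mathrm{enc}_2(K) \subseteq \mathbb{T}^2_{\widetilde{\Psi}_{q-1}}$ for some finite set $\Psi$ and some local set $K \subseteq \mathbb{T}^3_{\Psi,x}$. By Lemma 37, $\mathrm{enc}(\mathrm{enc}_2(K)) = \widehat{\widetilde{\pi}}(R' \cap D_{\widetilde{\Upsilon}} \cap \eta^{-1}(D_{\widetilde{\Upsilon}}))$ for some local set $R' \subseteq \Gamma^{+}_{\widetilde{\Upsilon}}$ and projection $\pi\colon \Upsilon \to \Psi$.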
Since $\mathrm{enc}_2(K) \subseteq \mathbb{T}^2_{\widetilde{\Psi}_{q-1}}$, it easily follows that $\mathrm{enc}(\mathrm{enc}_2(K)) = \widehat{\widetilde{\pi}}(R'' \cap D_{\widetilde{\Upsilon}_{q-1}} \cap \eta^{-1}(D_{\widetilde{\Upsilon}_{q-1}}))$, where $R'' = R' \cap (\Gamma_{\widetilde{\Upsilon}_{q-1}})^{+}$, which is a local subset of $(\Gamma_{\widetilde{\Upsilon}_{q-1}})^{+}$. Using this in the proof of Lemma 39, we easily obtain
\[ M = h'(R'' \cap D_{\widetilde{\Upsilon}_{q-1}} \cap \eta^{-1}(D_{\widetilde{\Upsilon}_{q-1}})), \tag{30} \]
where $h'\colon (\Gamma_{\widetilde{\Upsilon}_{q-1}})^{*} \to \Sigma^{*}$ is an alphabetic homomorphism. In order to obtain the statement of the theorem, there are three things we need to fix:
– for $c \in \Upsilon$, $\eta$ erases $[_c$ and $]_c$,
– when $q \ge 2$, the number of pairs of brackets in the group $[_c, ]_c$ is $1 < q$, and
– when $k < q - 1$, the number of pairs of brackets in the group $[_{(c,k,0)}, ]_{(c,k,0)}, [_{(c,k,1)}, ]_{(c,k,1)}, \dots, [_{(c,k,k)}, ]_{(c,k,k)}$ is $k + 1 < q$.

We introduce the following new brackets:
\[ [_{c,1}, ]_{c,1}, \dots, [_{c,q-1}, ]_{c,q-1}, \]
\[ [_{(c,k,k+1)}, ]_{(c,k,k+1)}, \dots, [_{(c,k,q-1)}, ]_{(c,k,q-1)}, \]
for each $c \in \Upsilon$ and $k < q - 1$. We now have an alphabet $\Gamma_{qn}$ consisting of $n$ groups of $q$ pairs of brackets:
\[ [_c, ]_c, [_{c,1}, ]_{c,1}, \dots, [_{c,q-1}, ]_{c,q-1}, \]
\[ [_{(c,k,0)}, ]_{(c,k,0)}, \dots, [_{(c,k,q-1)}, ]_{(c,k,q-1)}, \]
where $n = |\Upsilon| \times (q + 1)$. Define a homomorphism $\psi\colon (\Gamma_{\widetilde{\Upsilon}_{q-1}})^{*} \to \Gamma^{*}_{qn}$ by
\[ \psi([_c) = [_c\, [_{c,1}, \]
\[ \psi(]_c) = ]_{c,1}\, [_{c,2}\, ]_{c,2} \dots [_{c,q-1}\, ]_{c,q-1}\, ]_c, \]
\[ \psi([_{(c,k,0)}) = [_{(c,k,0)}, \]
\[ \psi(]_{(c,k,0)}) = [_{(c,k,k+1)}\, ]_{(c,k,k+1)} \dots [_{(c,k,q-1)}\, ]_{(c,k,q-1)}\, ]_{(c,k,0)}, \]
\[ \psi([_{(c,k,i)}) = [_{(c,k,i)}, \quad \psi(]_{(c,k,i)}) = ]_{(c,k,i)} \quad \text{for } 1 \le i \le k. \]
Now it is easy to see that $\psi$ is injective and $\psi(R'')$ is a local subset of $\Gamma^{*}_{qn}$. Also, we can show that all $x \in (\Gamma_{\widetilde{\Upsilon}_{q-1}})^{*}$ satisfy the following properties:
\[ x \in D_{\widetilde{\Upsilon}_{q-1}} \text{ if and only if } \psi(x) \in D_{qn}, \qquad g(\psi(x)) \text{ reduces to } \psi(\eta(x)). \tag{31} \]
These properties combined ensure that
\[ \eta(x) \in D_{\widetilde{\Upsilon}_{q-1}} \text{ if and only if } g(\psi(x)) \in D_{qn}. \tag{32} \]
Let $\chi\colon \Gamma^{*}_{qn} \to (\Gamma_{\widetilde{\Upsilon}_{q-1}})^{*}$ be the alphabetic homomorphism that erases all new brackets. Clearly, $\chi$ restricted to the range of $\psi$ is the inverse of $\psi$. Now observe
\[ \chi(\psi(R'') \cap D_{qn} \cap g^{-1}(D_{qn})) = R'' \cap D_{\widetilde{\Upsilon}_{q-1}} \cap \eta^{-1}(D_{\widetilde{\Upsilon}_{q-1}}). \tag{33} \]
Indeed, if $x \in R''$, $\psi(x) \in D_{qn}$, and $g(\psi(x)) \in D_{qn}$, then $\chi(\psi(x)) = x \in R'' \cap D_{\widetilde{\Upsilon}_{q-1}}$ by (31), and $\eta(\chi(\psi(x))) = \eta(x) \in D_{\widetilde{\Upsilon}_{q-1}}$ by (32). Conversely, if $x \in R'' \cap D_{\widetilde{\Upsilon}_{q-1}}$ and $\eta(x) \in D_{\widetilde{\Upsilon}_{q-1}}$, then $\psi(x) \in \psi(R'') \cap D_{qn}$ by (31) and $g(\psi(x)) \in D_{qn}$ by (32), and so $x = \chi(\psi(x)) \in \chi(\psi(R'') \cap D_{qn} \cap g^{-1}(D_{qn}))$.

We obtain the statement of the theorem from (30) and (33) by taking $R = \psi(R'')$ and $h = h' \circ \chi$. □

As before, in the direction (i) ⇒ (ii) of the above proof, $R \cap D_{qn} \cap g^{-1}(D_{qn}) = \psi(R'') \cap D_{qn} \cap g^{-1}(D_{qn})$ stands in one-one correspondence with the local set $K \subseteq \mathbb{T}^3_{\Psi,x}$. Thus, each derivation tree $T$ of a simple context-free tree grammar for $M$ is uniquely represented by an element $s$ of $R \cap D_{qn} \cap g^{-1}(D_{qn})$ such that $\mathrm{enc}(\mathrm{enc}_2(T))$ is the image of $s$ under a certain alphabetic homomorphism, and vice versa. This is exactly analogous to the situation with the original Chomsky-Schützenberger representation theorem.

As in the case of context-free languages, we can take a fixed Dyck language $D_{2q}$, instead of $D_{qn}$ with varying $n$, and use a rational transduction to represent any string language that is the yield image of some $L \in \mathrm{CFT}_{\mathrm{sp}}(q - 1)$:²³

Corollary 43. For any $M \in y\mathrm{CFT}_{\mathrm{sp}}(q - 1)$, there is a rational transduction $\tau$ such that
\[ M = \tau(D_{2q} \cap g^{-1}(D_{2q})), \]
where $g$ is as defined in Theorem 42.

10 Conclusion

We have generalized Weir's [34] characterization of the string languages of tree-adjoining grammars to the string languages of simple context-free tree grammars of arbitrary fixed rank. We obtained this result via a natural generalization of the original Chomsky-Schützenberger theorem to simple context-free tree grammars. We represented derivation trees of simple context-free tree grammars as 3-dimensional trees, and proved this latter result as a general fact about simple context-free sets of $m$-dimensional trees, for arbitrary $m \ge 2$.
This generality is of course overkill for the purpose of obtaining our generalization of Weir's theorem, but it may be of independent interest. Moreover, all the complexity of the general case is essentially already present in the 3-dimensional case; proving only the special cases of our lemmas that are needed for the generalization of Weir's theorem would not be substantially simpler.

In order to define the $m$-dimensional yield of an $(m + 1)$-dimensional tree, we placed a restriction on the occurrences of a special label $x$ that serve as targets for substitution. If we wish to iterate the process of taking the yield, i.e., if we are interested in the yield of the yield of an $(m + 2)$-dimensional tree, etc., we will need more than one variable as placeholders, with each variable providing targets for substitution at a different step of the iterative process of taking yields. Although we did not attempt to do so in this paper, it may be interesting to study the resulting hierarchy of classes of tree languages (and their yield images), the first three levels of the hierarchy being the local tree languages, the simple context-free tree languages, and the yields of the simple context-free sets of (well-labeled) 3-dimensional trees.²⁴

²³ Also, when string languages over a fixed alphabet $\Sigma$ with $|\Sigma| = k$ are considered, we can use $D_{q(k+2)}$ and a fixed alphabetic homomorphism $h$ so that every $M \in y\mathrm{CFT}_{\mathrm{sp}}(q - 1)$ can be written as $M = h(R \cap D_{q(k+2)} \cap g^{-1}(D_{q(k+2)}))$ for some regular set $R$. (This will require modification of the constructions used in several lemmas.) See [36] for an analogous characterization of string languages of $q$-MCFGs of rank $r$, and, e.g., [27] for the case of context-free languages.

²⁴ Rogers [23] looked at a connection between multi-dimensional trees (with his notion of yield) and Weir's [35] control language hierarchy.

References

1. Arnold, A., Dauchet, M.: Un théorème de Chomsky-Schützenberger pour les forêts algébriques. CALCOLO 14(2), 161–184 (1977)
2. Benoit, D., Demaine, E.D., Munro, I., Raman, R., Raman, V., Rao, S.S.: Representing trees of higher degree. Algorithmica 43, 275–292 (2005)
3. Chomsky, N., Schützenberger, M.P.: The algebraic theory of context-free languages. In: Braffort, P., Hirschberg, D. (eds.) Computer Programming and Formal Systems, pp. 118–161. North-Holland, Amsterdam (1963)
4. Chomsky, N.: Formal properties of grammars. In: Luce, R.D., Bush, R.R., Galanter, E. (eds.) Handbook of Mathematical Psychology, Volume II, pp. 323–418. John Wiley and Sons, New York (1963)
5. Comon, H., Dauchet, M., Gilleron, R., Jacquemard, F., Lugiez, D., Löding, C., Tison, S., Tommasi, M.: Tree Automata Techniques and Applications. Available on http://tata.gforge.inria.fr/ (2008), release November 18, 2008
6. Eilenberg, S.: Automata, Languages, and Machines, vol. A. Academic Press, New York (1974)
7. Engelfriet, J., Maneth, S.: Tree languages generated by context-free graph grammars. In: Ehrig, H., Engels, G., Kreowski, H.J., Rozenberg, G. (eds.) Theory and Application of Graph Transformations: 6th International Workshop, TAGT'98. pp. 15–59. Springer, Berlin (2000)
8. Engelfriet, J., Schmidt, E.M.: IO and OI, part I. The Journal of Computer and System Sciences 15, 328–353 (1977)
9. Fischer, M.J.: Grammars with Macro-Like Productions. Ph.D. thesis, Harvard University (1968)
10. Fujiyoshi, A., Kasai, T.: Spinal-formed context-free tree grammars. Theory of Computing Systems 33, 59–83 (2000)
11. Joshi, A.K., Schabes, Y.: Tree-adjoining grammars. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, vol. 3, pp. 69–123. Springer, Berlin (1997)
12. Kanazawa, M.: The convergence of well-nested mildly context-sensitive grammar formalisms (July 2009), an invited talk given at the 14th Conference on Formal Grammar, Bordeaux, France. Slides available at http://research.nii.ac.jp/~kanazawa/.
13. Kanazawa, M.: The pumping lemma for well-nested multiple context-free languages. In: Diekert, V., Nowotka, D. (eds.) Developments in Language Theory: 13th International Conference, DLT 2009. pp. 312–325. Springer, Berlin (2009)
14. Kasprzik, A.: Making finite-state methods applicable to languages beyond context-freeness via multi-dimensional trees. In: Piskorski, J., Watson, B.W., Yli-Jyrä, A. (eds.) Finite-State Methods and Natural Language Processing, pp. 98–109. IOS Press, Amsterdam (2009)
15. Kepser, S., Mönnich, U.: Closure properties of linear context-free tree languages with an application to optimality theory. Theoretical Computer Science 354(1), 82–97 (2006)
16. Kepser, S., Rogers, J.: The equivalence of tree adjoining grammars and monadic linear context-free tree grammars. Journal of Logic, Language and Information 20, 361–384 (2011)
17. Knuth, D.E.: The Art of Computer Programming, Vol. I: Fundamental Algorithms. Addison-Wesley, Reading, Mass. (1997)
18. Kozen, D.C.: Automata and Computability. Springer, New York (1997)
19. Matsubara, S., Kasai, T.: A characterization of TALs by the generalized Dyck language. IEICE Transactions on Information and Systems (Japanese Edition) J90-D, 1417–1427 (2007)
20. McNaughton, R., Papert, S.A.: Counter-Free Automata. MIT Press, Cambridge, Mass. (1971)
21. Mönnich, U.: Adjunction as substitution: An algebraic formulation of regular, context-free and tree adjoining languages (1997), arXiv:cmp-lg/9707012v1
22. Perrin, D.: Finite automata. In: van Leeuwen, J. (ed.) Handbook of Theoretical Computer Science, vol. B, pp. 1–57. Elsevier, Amsterdam (1990)
23. Rogers, J.: Syntactic structures as multi-dimensional trees. Research on Language and Computation 1, 265–305 (2003)
24. Rogers, J.: wMSO theories as grammar formalisms. Theoretical Computer Science 293, 291–320 (2003)
25. Rogers, J., Pullum, G.K.: Aural pattern recognition experiments and the subregular hierarchy. Journal of Logic, Language and Information 20, 329–342 (2011)
26. Rounds, W.: Mappings and grammars on trees. Mathematical Systems Theory 4, 257–287 (1970)
27. Salomaa, A.: Formal Languages. Academic Press, Orlando, Florida (1973)
28. Seki, H., Kato, Y.: On the generative power of multiple context-free grammars and macro grammars. IEICE Transactions on Information and Systems E91–D, 209–221 (2008)
29. Seki, H., Matsumura, T., Fujii, M., Kasami, T.: On multiple context-free grammars. Theoretical Computer Science 88(2), 191–229 (1991)
30. Sorokin, A.: Monoid automata for displacement context-free languages. In: ESSLLI Student Session 2013 Preproceedings. pp. 158–167 (2013)
31. Takahashi, M.: Generalizations of regular sets and their application to a study of context-free languages. Information and Control 27, 1–36 (1975)
32. Thatcher, J.W.: Characterizing derivation trees of context-free grammars through a generalization of finite automata theory. Journal of Computer and System Sciences 1, 317–322 (1967)
33. Thatcher, J.W.: Generalized² sequential machine maps. Journal of Computer and System Sciences 4, 339–367 (1970)
34. Weir, D.J.: Characterizing Mildly Context-Sensitive Grammar Formalisms. Ph.D. thesis, University of Pennsylvania (1988)
35. Weir, D.J.: A geometric hierarchy beyond context-free languages. Theoretical Computer Science 104(2), 235–261 (1992)
36. Yoshinaka, R., Kaji, Y., Seki, H.: Chomsky-Schützenberger-type characterization of multiple context-free languages. In: Dediu, A.H., Fernau, H., Martín-Vide, C. (eds.) Language and Automata Theory and Applications, Fourth International Conference, LATA 2010. pp. 596–607. Springer, Berlin (2010)
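
As a concrete illustration of the set $D_{qn} \cap g^{-1}(D_{qn})$ appearing in Theorem 42 and Corollary 43, the following Python sketch builds the bijection $g$ on $\Gamma_{qn}$ and tests membership by running an ordinary stack-based Dyck check on a string and on its image under $g$. The token encoding (pairs such as `("[", 3)` for $[_3$) and the function names `make_g`, `is_dyck`, `in_intersection`, and `parse` are assumptions introduced only for this sketch; they are not part of the constructions in the paper.

```python
from itertools import product


def make_g(q, n):
    """Build the bijection g on the bracket alphabet as a dict.

    A token is a pair (kind, index) with kind "[" or "]" and index in 1..q*n.
    This token encoding is an assumption of the sketch, not notation from the text.
    """
    g = {}
    for i in range(n):
        base = q * i
        g[("[", base + 1)] = ("[", base + 1)      # g([_{qi+1}) = [_{qi+1}
        g[("]", base + 1)] = ("]", base + q)      # g(]_{qi+1}) = ]_{qi+q}
        for j in range(2, q + 1):
            g[("[", base + j)] = ("]", base + j - 1)  # g([_{qi+j}) = ]_{qi+j-1}
            g[("]", base + j)] = ("[", base + j)      # g(]_{qi+j}) = [_{qi+j}
    return g


def is_dyck(tokens):
    """Stack test for membership in the Dyck language over indexed brackets."""
    stack = []
    for kind, index in tokens:
        if kind == "[":
            stack.append(index)
        elif not stack or stack.pop() != index:
            return False
    return not stack


def in_intersection(tokens, q, n):
    """Return True when tokens is in D_qn and its image under g is in D_qn."""
    g = make_g(q, n)
    return is_dyck(tokens) and is_dyck([g[t] for t in tokens])


def parse(text):
    """Turn a string such as '[1 [2 ]2 ]1' into a list of tokens."""
    return [(tok[0], int(tok[1:])) for tok in text.split()]


if __name__ == "__main__":
    # One group of q = 2 bracket pairs in the nested configuration.
    print(in_intersection(parse("[1 [2 ]2 ]1"), q=2, n=1))
    # Same brackets side by side: a Dyck word whose image under g is not.
    print(in_intersection(parse("[1 ]1 [2 ]2"), q=2, n=1))
    # Brute-force search over all words of length 4 for q = 2, n = 1.
    alphabet = list(make_g(2, 1))
    members = [w for w in product(alphabet, repeat=4)
               if in_intersection(list(w), q=2, n=1)]
    print(len(members))
```

With $q = 2$ and $n = 1$, the brute-force search finds the nested word $[_1 [_2 ]_2 ]_1$ as the only member of length 4, in line with the description of the grouped configuration in the introduction. For $q = 1$ the map built by `make_g` is the identity, so the test reduces to plain Dyck membership.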