Playing with RST: Two Algorithms for the Automated Manipulation of Discourse Trees

Floriana Grasso
Department of Computer Science - University of Liverpool, UK

Abstract. This paper presents two algorithms for modifying RST-based discourse trees in order to solve two given problems. By exploiting only syntactic properties of the trees, the information originally presented is reorganized to produce new, coherent text.

1 Introduction

This paper describes two algorithms for manipulating discourse trees produced by a text planner based on the principles of Rhetorical Structure Theory (RST) [2]. Although many techniques exist for this purpose, the algorithms presented here are original insofar as they concentrate on purely syntactic manipulations, on the grounds that coherent text can be produced from existing text by exploiting only the property of nuclearity stated by RST. The hypotheses behind them are general enough to ensure that they can be applied to the output of any RST-based text planner. We assume that a surface generation step follows the application of the algorithms, and we ignore the practical problems of smooth phrasing of the text.

2 Definitions and Assumptions

We refer to the definitions and assumptions in [2,3], some of them slightly modified to add generality, and include a few more. These summarize the minimal set of characteristics that would reasonably be expected of any discourse tree adhering to the RST criteria.

Definition 1. A Rhetorical Structure tree (RS-tree) is a tree whose nodes are defined by the triple <Name, Type, Content>, where Name is an identifier, Type is either Root or the role (Nucleus or Satellite) that the node plays in the rhetorical relation (RR) associated with its parent, and Content is either the RR holding among the node's children (if the node is intermediate) or the informative unit (IU) associated with the node (if it is a leaf).
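The triple of Definition 1 maps naturally onto a small record type. The following Python sketch is only an illustration: the class name RSNode, the field names, and the role strings are assumptions made here, not part of the original formulation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RSNode:
    """One node of an RS-tree, following the triple <Name, Type, Content>."""
    name: str      # identifier
    role: str      # "Root", "Nucleus" or "Satellite" (the Type of the triple)
    content: str   # RR label for an intermediate node, IU for a leaf
    children: List["RSNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def leaves(self) -> List["RSNode"]:
        """Leaves spanned by this node, in depth-first, left-to-right order."""
        if self.is_leaf():
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# A hypothetical three-leaf tree: A relates satellite C to nucleus B,
# and B relates nucleus D to satellite E.
tree = RSNode("A", "Root", "RR1", [
    RSNode("C", "Satellite", "IU1"),
    RSNode("B", "Nucleus", "RR2", [
        RSNode("D", "Nucleus", "IU2"),
        RSNode("E", "Satellite", "IU3"),
    ]),
])
leaf_names = [leaf.name for leaf in tree.leaves()]
```

Here an intermediate node stores the label of the rhetorical relation in content, while a leaf stores its informative unit, so the two cases of Definition 1 are distinguished simply by whether the node has children.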
We do not fix any limit on the number of nucleus and satellite children of a node, provided that each non-leaf node has at least two children, with at least one nucleus.

Definition 2. Given two sets of nodes $N = \{n_1, \ldots, n_j\}$ and $M = \{m_1, \ldots, m_k\}$ of T, N precedes M in T ($N <_T M$) if each node in N is considered before every node in M when exploring T in a depth-first, left-to-right fashion.

V. Matoušek et al. (Eds.): TSD '99, LNAI 1692, pp. 357–360, 1999. © Springer-Verlag Berlin Heidelberg 1999

Definition 3. Given an RS-tree T, a set $L = \{l_1, \ldots, l_n\}$ of (not necessarily adjacent) leaves of T, and a non-leaf node n of T:
• n generates L if L is contained in the set of leaves that n spans.
• The lowest generator of L ($\gamma_L$) is the unique node of T such that: (i) $\gamma_L$ generates L; (ii) for every node $n_i$ of T generating L, $\gamma_L$ is a descendant of $n_i$ (or is $n_i$ itself).
• The context of L ($\chi_L$) is the set of all leaves generated by $\gamma_L$.
• L is a span if $\chi_L = L$.

Definition 4. Two sets of leaves $L_1$ and $L_2$ of an RS-tree are independent if their contexts do not overlap ($\chi_{L_1} \cap \chi_{L_2} = \emptyset$).

Definition 5. Given an RS-tree T, the most nuclear part of T ($Nuc_T$) is the set of T's leaves recursively defined as follows: (i) if T consists of a single node, then $Nuc_T$ is that node itself; (ii) otherwise, if $R_T$ is the root of T, $Nuc_T$ is the union of the most nuclear parts of all of $R_T$'s children having a Nucleus role. We define the most nuclear part of a node n as $Nuc_{T_n}$, where $T_n$ is the sub-tree whose root is n, and the most nuclear part of a span S as $Nuc_{\gamma_S}$.

Definition 6. Given an RS-tree T, the nuclear structure of T ($N_T$) is the set of the most nuclear parts of all of T's nodes ($N_T = \{N \mid N = Nuc_n, n \in T\}$).

Assumption 1. A rhetorical relation Rel holding between two spans $S_1$ and $S_2$ also holds between $Nuc_{S_1}$ and $Nuc_{S_2}$. We say that Rel projects a deep-RR ($\Delta_{Rel}$) between the two most nuclear parts.

Assumption 2.
Two RS-trees having the same set of leaves (IUs), the same nuclear structure, and the same set of deep-RRs holding among the elements of their nuclear structures have the same meaning (are equivalent).

Definition 7. An RS-tree manipulation operation is meaning preserving if the resulting RS-tree is equivalent to the original one.

3 Algorithm 1: Extracting Sub-Trees

Problem 1. Given an RS-tree T and an arbitrary set $L = \{l_1, \ldots, l_n\}$ of leaves of T, extract $T_1$, the smallest sub-tree of T whose set of leaves contains L, such that $N_{T_1} \subseteq N_T$, and in which the deep-RRs defined by T in L are preserved.

Algorithm 1. Mark the nodes of $T_L$, the tree originating from $\gamma_L$, as follows:
1. let $N_k = \{n_1, \ldots, n_m\}$ be the set of nodes to mark at step k (where $N_0 = L$);
2. repeat until $N_k$ is empty:
   a) mark in $T_L$ each element belonging to $N_k$;
   b) for each element $n_i$ of $N_k$, consider $n_i$'s parent, $p_{n_i}$:
      i. if $n_i$ has a satellite role, mark in $T_L$ the most nuclear part of $p_{n_i}$;
      ii. if $p_{n_i} \neq \gamma_L$, then add $p_{n_i}$ to $N_{k+1}$;
3. mark $\gamma_L$ and prune from $T_L$ all unmarked nodes;
4. for each marked node n having only one child $n_c$, prune n and connect $n_c$ to n's parent, also assigning n's role to $n_c$ (see Fig. 1 for an example).

[Fig. 1. Extraction of the smallest subtree generating L = {Q, N, O}.]

4 Algorithm 2: Exchanging Text Spans

Problem 2.
Given an RS-tree T and two independent sets $L_1 = \{l_j, \ldots, l_n\}$ and $L_2 = \{l_k, \ldots, l_m\}$ of leaves of T such that $L_1 <_T L_2$, generate an RS-tree $T_1$, equivalent to T, such that $L_2 <_{T_1} L_1$.

We first describe two basic operations on the RS-tree, then the main algorithm.

Inversion of siblings: Let n be a node of an RS-tree T, and let $N_i = \{n_{i1}, \ldots, n_{ik}\}$ and $N_j = \{n_{j1}, \ldots, n_{jh}\}$ be two non-overlapping subsets of n's children such that $N_i <_T N_j$. Then $Inv(n, N_i, N_j)$ rearranges n's children so that $N_j <_T N_i$.

Exchange of satellite children: Let $\langle n_1, Role_1, RR_1 \rangle$ and $\langle n_2, Role_2, RR_2 \rangle$ be two nodes of an RS-tree T. Let $Sa_1$ and $Sa_2$ be the respective sets of T's subtrees originating from the children of $n_1$ and $n_2$ having a satellite role. An exchange of satellites between $n_1$ and $n_2$, $ExcSat(n_1, n_2)$, consists of:
1. replacing $\langle n_1, Role_1, RR_1 \rangle$ with $\langle n_1, Role_1, RR_2 \rangle$;
2. replacing $\langle n_2, Role_2, RR_2 \rangle$ with $\langle n_2, Role_2, RR_1 \rangle$;
3. replacing the set $Sa_1$ with the set $Sa_2$ in $n_1$;
4. replacing the set $Sa_2$ with the set $Sa_1$ in $n_2$.

$Inv(n, N_i, N_j)$ is always meaning preserving (we assume with [2] that the application of a rhetorical relation does not constrain the order of nuclei and satellites), whereas $ExcSat(n_1, n_2)$ is meaning preserving only if $Nuc_{n_1} = Nuc_{n_2}$.

Algorithm 2. Let $\chi_1$ and $\chi_2$ be the contexts of $L_1$ and $L_2$ respectively, $L_{12} = \chi_1 \cup \chi_2$, and $\gamma_{L_{12}}$ the lowest generator of $L_{12}$. Two cases may occur:
1. $\gamma_{L_{12}}$ has at least one satellite child: let $\gamma_{Nuc_{12}}$ be the lowest generator of $Nuc_{\gamma_{L_{12}}}$, the most nuclear part of $\gamma_{L_{12}}$, and $L_{Nuc_{12}}$ the set of leaves generated by the nucleus children of $\gamma_{Nuc_{12}}$. Two cases may occur:
   a) $L_{Nuc_{12}}$ has an empty intersection with $L_{12}$. Let $\gamma_1$ be the lowest generator of $\chi_1 \cup L_{Nuc_{12}}$ and $\gamma_2$ the lowest generator of $\chi_2 \cup L_{Nuc_{12}}$. Trivially, $Nuc_{\gamma_1} = Nuc_{\gamma_2} = Nuc_{\gamma_{L_{12}}}$. Then execute $ExcSat(\gamma_1, \gamma_2)$ (Fig. 2, left).
   b) $L_{Nuc_{12}}$ has a non-empty intersection with $L_{12}$. Because of the definition of context and the hypothesis of independence, it will have a non-empty intersection with either $\chi_1$ or $\chi_2$, but not both. Then, if $N_1$ and $N_2$ are the sets of children of $\gamma_{L_{12}}$ generating $\chi_1$ and $\chi_2$ respectively, execute $Inv(\gamma_{L_{12}}, N_1, N_2)$ (Fig. 2, right).
2. $\gamma_{L_{12}}$ has no satellite children: treat as case 1(b).

[Fig. 2. Exchanging {C} and {E} (left) and {C} and {D} (right).]

Note that the algorithm can be applied only to two independent sets of leaves. If the independence hypothesis is relaxed, a purely syntactic exchange cannot be performed, and semantics has to be taken into account.

5 Conclusions

Modifying an existing text generally requires semantic knowledge, and can therefore be a difficult task. However, for a restricted class of modification problems, good results can be achieved efficiently by exploiting the syntactic properties of the RS-tree representing the text. Such problems concern the mere reorganization of existing text, with the only constraint that the deep rhetorical relations established among the informative units of the original text are preserved. For these problems, algorithms can be found that take an RS-tree as input and produce new, equivalent RS-trees satisfying a given requirement. This paper described two of these text modification problems and introduced two algorithms to solve them. The algorithms' results have been evaluated, and they were applied to two typical cases: the restructuring of one-shot documents [1] and the generation of dynamic hypermedia. The two applications showed that these algorithms can be used successfully in practical contexts.

References

1. F. de Rosis, F. Grasso and D.C. Berry.
Refining Instructional Text Generation After Evaluation. Artificial Intelligence in Medicine, to appear.
2. W. Mann and S. Thompson. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text, 8(3):243–281, 1988.
3. D. Marcu. Building up Rhetorical Structure Trees. In Proceedings of the 13th National Conference on Artificial Intelligence (AAAI96), vol. 2, pp. 1069–1074.
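
The marking and pruning steps of Algorithm 1 (Section 3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the Node class, the function names, and the example tree are hypothetical. In particular, step 2(b)i is interpreted here by adding the leaves of the most nuclear part to the next set of nodes to mark, so that the nodes on the path leading to them are also kept; this reading is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)  # identity-based equality: two nodes are equal only if identical
class Node:
    name: str
    role: str      # "Root", "Nuc" or "Sat"
    content: str   # RR label (intermediate node) or IU (leaf)
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def leaves(n: Node) -> List[Node]:
    return [n] if not n.children else [l for c in n.children for l in leaves(c)]


def nuc(n: Node) -> List[Node]:
    """Most nuclear part of n (Definition 5)."""
    if not n.children:
        return [n]
    return [l for c in n.children if c.role == "Nuc" for l in nuc(c)]


def lowest_generator(root: Node, names: Set[str]) -> Node:
    """Lowest non-leaf node whose span contains every leaf named in `names`."""
    node = root
    while True:
        lower = [c for c in node.children
                 if c.children and names <= {l.name for l in leaves(c)}]
        if not lower:
            return node
        node = lower[0]


def extract(root: Node, names: Set[str]) -> Node:
    gamma = lowest_generator(root, names)
    marked = set()                                      # ids of marked nodes
    frontier = [l for l in leaves(gamma) if l.name in names]   # N_0 = L
    while frontier:                                     # step 2
        nxt = []
        for n in frontier:
            if id(n) in marked:
                continue
            marked.add(id(n))                           # step 2(a)
            if n.role == "Sat":                         # step 2(b)i
                nxt.extend(nuc(n.parent))  # assumption: marked at the next step
            if n.parent is not gamma:                   # step 2(b)ii
                nxt.append(n.parent)
        frontier = nxt
    marked.add(id(gamma))                               # step 3
    return _rebuild(gamma, marked, gamma.role)


def _rebuild(n: Node, marked: set, role: str) -> Node:
    """Copy the marked part of n's sub-tree, collapsing single-child nodes."""
    kept = [_rebuild(c, marked, c.role) for c in n.children if id(c) in marked]
    if len(kept) == 1:           # step 4: the only child inherits n's role
        kept[0].role = role
        return kept[0]
    return Node(n.name, role, n.content, kept)


# Hypothetical tree: A[B[D, E], C[F, G]] with E, C and G as satellites.
tree = Node("A", "Root", "RR1", [
    Node("B", "Nuc", "RR2", [Node("D", "Nuc", "IU1"), Node("E", "Sat", "IU2")]),
    Node("C", "Sat", "RR3", [Node("F", "Nuc", "IU3"), Node("G", "Sat", "IU4")]),
])
sub = extract(tree, {"E", "F"})
summary = [(l.name, l.role) for l in leaves(sub)]
```

In this example, extracting {E, F} also keeps D, because E is a satellite and D is the most nuclear part of its parent. G is pruned, so C is left with a single child and is replaced by F, which takes over C's satellite role. The sketch builds a new tree and leaves the original one untouched.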
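
The two basic operations of Section 4, Inv and ExcSat, can be sketched in the same style. Again, this is an illustration under assumptions: the names Node, inv, exc_sat and build are hypothetical, the example tree is only modeled on a reading of Fig. 2, and the choice of placing the incoming satellites at the position of the first outgoing satellite is a design decision made here, since the text does not fix it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # identity-based equality between nodes
class Node:
    name: str
    role: str      # "Root", "Nuc" or "Sat"
    content: str   # RR label (intermediate node) or IU (leaf)
    children: List["Node"] = field(default_factory=list)


def leaf_names(n: Node) -> List[str]:
    if not n.children:
        return [n.name]
    return [x for c in n.children for x in leaf_names(c)]


def nuc(n: Node) -> List[Node]:
    """Most nuclear part of n (Definition 5)."""
    if not n.children:
        return [n]
    return [l for c in n.children if c.role == "Nuc" for l in nuc(c)]


def inv(n: Node, ni: List[Node], nj: List[Node]) -> None:
    """Rearrange n's children so that the group nj precedes the group ni."""
    slots = [i for i, c in enumerate(n.children) if c in ni or c in nj]
    for i, c in zip(slots, nj + ni):
        n.children[i] = c


def _replace_satellites(n: Node, new_sats: List[Node]) -> None:
    out, placed = [], False
    for c in n.children:
        if c.role != "Sat":
            out.append(c)            # nuclei keep their relative order
        elif not placed:
            out.extend(new_sats)     # new satellites go where the first old one was
            placed = True
    if not placed:
        out.extend(new_sats)
    n.children = out


def exc_sat(n1: Node, n2: Node) -> None:
    """Exchange relations and satellite children between n1 and n2."""
    if nuc(n1) != nuc(n2):
        raise ValueError("ExcSat is meaning preserving only if Nuc(n1) = Nuc(n2)")
    sat1 = [c for c in n1.children if c.role == "Sat"]
    sat2 = [c for c in n2.children if c.role == "Sat"]
    n1.content, n2.content = n2.content, n1.content   # steps 1 and 2
    _replace_satellites(n1, sat2)                     # step 3
    _replace_satellites(n2, sat1)                     # step 4


def build():
    """Hypothetical tree A[C, B[D, E]], with C and E as satellites."""
    d, e = Node("D", "Nuc", "IU2"), Node("E", "Sat", "IU3")
    c = Node("C", "Sat", "IU1")
    b = Node("B", "Nuc", "RR2", [d, e])
    a = Node("A", "Root", "RR1", [c, b])
    return a, b, c


a, b, c = build()
exc_sat(a, b)                    # exchange {C} and {E}
after_exchange = leaf_names(a)

a2, b2, c2 = build()
inv(a2, [c2], [b2])              # exchange {C} and {D}
after_inversion = leaf_names(a2)
```

In the first call, A and B share the same most nuclear part (the leaf D), so the exchange is allowed: the two nodes swap their relations and satellites, and E now precedes C. In the second call, a plain inversion of A's children is enough to move D before C.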