Two Algorithms for the Automated Manipulation of Discourse Trees

Playing with RST: Two Algorithms for the
Automated Manipulation of Discourse Trees
Floriana Grasso
Department of Computer Science - University of Liverpool, UK
[email protected]
Abstract. This paper presents two algorithms that modify RST-based
discourse trees in order to solve two given problems. By exploiting only
syntactic properties of the trees, the information originally presented is reorganized to produce new, coherent text.
1 Introduction
This paper describes two algorithms for manipulating discourse trees produced
by a text planner based on the principles of Rhetorical Structure Theory (RST) [2].
Although many techniques exist for this purpose, the algorithms presented here
are original insofar as they concentrate on purely syntactic manipulations, on the
grounds that coherent text can be produced from existing text by exploiting only
the property of nuclearity stated by RST. The hypotheses behind them are
general enough to ensure they can be applied to the output of any RST-based
text planner. We assume that a surface generation step follows the application
of the algorithms, and we ignore the practical problems of smooth phrasing of the text.
2 Definitions and Assumptions
We refer to the definitions and assumptions in [2,3], some of them slightly modified to add generality, and include a few more. These summarize the minimal
set of characteristics that would be reasonably expected in any discourse tree
adhering to the RST criteria.
Definition 1. A Rhetorical Structure tree (RS-tree) is a tree whose nodes
are defined by the triple <Name, Type, Content>, where Name is an identifier,
Type is either Root or the role (Nucleus or Satellite) that the node plays in
the rhetorical relation (RR) associated with its parent, and Content is either
the RR holding among the node’s children (if the node is intermediate) or the
informative unit (IU) associated with the node (if it is a leaf).
We do not fix any limit on the number of nucleus and satellite children of a node,
provided that each node has at least two children, with at least one nucleus.
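As an illustration, the node triple and the constraint on children can be sketched in Python. The class name `Node`, the role strings "Root", "Nuc" and "Sat", and the helper `is_well_formed` are assumptions of this sketch, not notation from the paper. The example tree reuses the five node triples that appear in Fig. 2, with an assumed sibling order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str       # identifier (Name)
    role: str       # "Root", "Nuc" or "Sat" (Type)
    content: str    # relation label for internal nodes, informative unit for leaves
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def is_well_formed(node: Node, is_root: bool = True) -> bool:
    """Check the minimal constraints stated above on a whole tree."""
    if node.role not in ("Root", "Nuc", "Sat"):
        return False
    if (node.role == "Root") != is_root:      # only the top node may be Root
        return False
    if node.is_leaf():
        return True
    if len(node.children) < 2:                # at least two children
        return False
    if not any(c.role == "Nuc" for c in node.children):   # at least one nucleus
        return False
    return all(is_well_formed(c, is_root=False) for c in node.children)


# Example tree (sibling order is an assumption of this sketch).
tree = Node("A", "Root", "RR1", [
    Node("C", "Sat", "IU1"),
    Node("B", "Nuc", "RR2", [Node("D", "Nuc", "IU2"), Node("E", "Sat", "IU3")]),
])

print(is_well_formed(tree))  # True
```

No upper bound on the number of children is enforced, in line with the text above.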
Definition 2. Given two sets of nodes N = {n_1, ..., n_j} and M = {m_1, ..., m_k}
of a RS-tree T, N precedes M in T (N <_T M) if each node in N is considered
before every node in M when exploring T in a depth-first, left-to-right fashion.
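The precedence relation can be checked by numbering the nodes during a traversal. The sketch below assumes a pre-order visit (the definition only says depth-first, left to right), unique node names, and non-empty sets. `Node`, `preorder` and `precedes` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Node:
    name: str
    role: str       # "Root", "Nuc" or "Sat"
    content: str
    children: List["Node"] = field(default_factory=list)


def preorder(node: Node) -> Iterator[Node]:
    """Depth-first, left-to-right visit (pre-order is an assumption)."""
    yield node
    for child in node.children:
        yield from preorder(child)


def precedes(tree: Node, n_names: Iterable[str], m_names: Iterable[str]) -> bool:
    """True if every node named in n_names is visited before every node in m_names."""
    position = {node.name: i for i, node in enumerate(preorder(tree))}
    return max(position[x] for x in n_names) < min(position[y] for y in m_names)


tree = Node("A", "Root", "RR1", [
    Node("C", "Sat", "IU1"),
    Node("B", "Nuc", "RR2", [Node("D", "Nuc", "IU2"), Node("E", "Sat", "IU3")]),
])

print(precedes(tree, {"C"}, {"E"}))  # True
print(precedes(tree, {"E"}, {"C"}))  # False
```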
V. Matoušek et al. (Eds.): TSD ’99, LNAI 1692, pp. 357–360, 1999.
© Springer-Verlag Berlin Heidelberg 1999
Definition 3. Given a RS-tree T, a set L = {l_1, ..., l_n} of (not necessarily
adjacent) leaves of T, and a non-leaf node n of T:
• n generates L if L is contained in the set of leaves that n spans.
• The lowest generator of L (γ_L) is the unique node of T such that:
  (i) γ_L generates L;
  (ii) for every node n_i of T generating L, γ_L is either n_i itself or a descendant of n_i.
• The context of L (χ_L) is the set of all leaves generated by γ_L.
• L is a span if χ_L = L.
Definition 4. Two sets of leaves L_1 and L_2 of a RS-tree are independent if
their contexts do not overlap (χ_L1 ∩ χ_L2 = ∅).
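Definitions 3 and 4 translate into a few short functions. The sketch below is illustrative only. It follows the definition literally, so a generator is always a non-leaf node and the context of a single leaf is the whole span of its parent. The example tree and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Node:
    name: str
    role: str       # "Root", "Nuc" or "Sat"
    content: str
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Names of the leaves spanned by node, left to right."""
    if not node.children:
        return [node.name]
    return [x for c in node.children for x in leaves(c)]


def lowest_generator(node: Node, names: Iterable[str]) -> Node:
    """Deepest non-leaf node below `node` whose span contains all names.

    Assumes `node` itself generates the set (true for the root of the tree).
    """
    wanted = set(names)
    for c in node.children:
        if c.children and wanted <= set(leaves(c)):
            return lowest_generator(c, wanted)
    return node


def context(tree: Node, names: Iterable[str]) -> Set[str]:
    return set(leaves(lowest_generator(tree, names)))


def is_span(tree: Node, names: Iterable[str]) -> bool:
    return context(tree, names) == set(names)


def independent(tree: Node, l1: Iterable[str], l2: Iterable[str]) -> bool:
    return not (context(tree, l1) & context(tree, l2))


# Hypothetical example tree, not taken from the paper's figures.
tree = Node("A", "Root", "RR1", [
    Node("B", "Nuc", "RR2", [Node("D", "Nuc", "IU1"), Node("E", "Sat", "IU2")]),
    Node("C", "Sat", "RR3", [Node("F", "Nuc", "IU3"), Node("G", "Sat", "IU4")]),
])

print(lowest_generator(tree, {"D", "E"}).name)    # B
print(is_span(tree, {"E", "F"}))                  # False
print(independent(tree, {"D"}, {"G"}))            # True
```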
Definition 5. Given a RS-tree T, the most nuclear part of T (Nuc_T) is the
set of T’s leaves recursively defined as:
(i) if T consists of a single node, then Nuc_T is T itself;
(ii) otherwise, if R_T is the root of T, Nuc_T is the union of the most nuclear
parts of all R_T’s children having a Nucleus role.
We define the most nuclear part of a node n as Nuc_Tn, where T_n is the sub-tree
whose root is n, and the most nuclear part of a span S as Nuc_γS.
Definition 6. Given a RS-tree T, the nuclear structure of T (N_T) is the set
of the most nuclear parts of all T’s nodes (N_T = {N | N = Nuc_n, n ∈ T}).
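The recursion of Definition 5 and the set construction of Definition 6 can be written directly. This is a sketch in which leaves are identified by their names. `nuc` and `nuclear_structure` are hypothetical function names, and the multi-nuclear node `k` is an invented example.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterator, List, Set


@dataclass
class Node:
    name: str
    role: str       # "Root", "Nuc" or "Sat"
    content: str
    children: List["Node"] = field(default_factory=list)


def preorder(node: Node) -> Iterator[Node]:
    yield node
    for child in node.children:
        yield from preorder(child)


def nuc(node: Node) -> FrozenSet[str]:
    """Most nuclear part of the sub-tree rooted at node (Definition 5)."""
    if not node.children:                      # case (i): a single node
        return frozenset({node.name})
    # case (ii): union over the children that play a Nucleus role
    return frozenset().union(*(nuc(c) for c in node.children if c.role == "Nuc"))


def nuclear_structure(tree: Node) -> Set[FrozenSet[str]]:
    """Set of the most nuclear parts of all nodes (Definition 6)."""
    return {nuc(n) for n in preorder(tree)}


tree = Node("A", "Root", "RR1", [
    Node("C", "Sat", "IU1"),
    Node("B", "Nuc", "RR2", [Node("D", "Nuc", "IU2"), Node("E", "Sat", "IU3")]),
])
# A multi-nuclear relation: two nuclei and one satellite.
k = Node("K", "Sat", "RR7", [
    Node("N", "Nuc", "IU7"), Node("O", "Nuc", "IU8"), Node("Q", "Sat", "IU3"),
])

print(sorted(nuc(tree)))   # ['D']
print(sorted(nuc(k)))      # ['N', 'O']
```

Since every leaf contributes its own singleton, the nuclear structure of the first tree contains exactly {C}, {D} and {E}. The root and node B both map to {D}.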
Assumption 1. A rhetorical relation Rel holding between two spans S_1 and
S_2 also holds between Nuc_S1 and Nuc_S2. We say that Rel projects a deep-RR
(∆_Rel) between the two most nuclear parts.
Assumption 2. Two RS-trees having the same set of leaves (IUs), the same
nuclear structure, and the same set of deep-RRs holding among the elements of
their nuclear structures, have the same meaning (are equivalent).
Definition 7. A RS-tree manipulation operation is meaning preserving if
the resulting RS-tree is equivalent to the original one.
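Assumption 2 suggests a mechanical equivalence test. The sketch below is one possible encoding, not the paper's: it assumes a deep-RR can be recorded as a triple of the relation label, the most nuclear parts of the nucleus children, and the most nuclear parts of the satellite children. All names (`deep_rrs`, `equivalent`, `leaf_units`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterator, List


@dataclass
class Node:
    name: str
    role: str       # "Root", "Nuc" or "Sat"
    content: str
    children: List["Node"] = field(default_factory=list)


def preorder(node: Node) -> Iterator[Node]:
    yield node
    for child in node.children:
        yield from preorder(child)


def nuc(node: Node) -> FrozenSet[str]:
    if not node.children:
        return frozenset({node.name})
    return frozenset().union(*(nuc(c) for c in node.children if c.role == "Nuc"))


def leaf_units(tree: Node) -> set:
    return {(n.name, n.content) for n in preorder(tree) if not n.children}


def nuclear_structure(tree: Node) -> set:
    return {nuc(n) for n in preorder(tree)}


def deep_rrs(tree: Node) -> set:
    """Assumed encoding: (relation, nuclei parts, satellite parts) per internal node."""
    out = set()
    for n in preorder(tree):
        if n.children:
            nuclei = frozenset(nuc(c) for c in n.children if c.role == "Nuc")
            sats = frozenset(nuc(c) for c in n.children if c.role == "Sat")
            out.add((n.content, nuclei, sats))
    return out


def equivalent(t1: Node, t2: Node) -> bool:
    return (leaf_units(t1) == leaf_units(t2)
            and nuclear_structure(t1) == nuclear_structure(t2)
            and deep_rrs(t1) == deep_rrs(t2))


original = Node("A", "Root", "RR1", [
    Node("C", "Sat", "IU1"),
    Node("B", "Nuc", "RR2", [Node("D", "Nuc", "IU2"), Node("E", "Sat", "IU3")]),
])
# Same units; the relations and satellites of A and B are swapped.
rearranged = Node("A", "Root", "RR2", [
    Node("E", "Sat", "IU3"),
    Node("B", "Nuc", "RR1", [Node("D", "Nuc", "IU2"), Node("C", "Sat", "IU1")]),
])
# Same shape as the original, but D and E swap roles.
flipped = Node("A", "Root", "RR1", [
    Node("C", "Sat", "IU1"),
    Node("B", "Nuc", "RR2", [Node("D", "Sat", "IU2"), Node("E", "Nuc", "IU3")]),
])

print(equivalent(original, rearranged))  # True
print(equivalent(original, flipped))     # False
```

Under this encoding, a manipulation is meaning preserving when `equivalent` holds between its input and its output.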
3 Algorithm 1: Extracting Sub-Trees
Problem 1. Given a RS-tree T and an arbitrary set L = {l_1, ..., l_n} of leaves of
T, extract T_1, the smallest sub-tree of T whose set of leaves contains L, such
that N_T1 ⊆ N_T, and in which the deep-RRs defined by T in L are preserved.
Algorithm 1. Mark the nodes of T_L, the tree originating from γ_L, as follows:
1. let N_k = {n_1, ..., n_m} be the set of nodes to mark at step k (where N_0 = L);
2. repeat until N_k is empty:
   a) mark in T_L each element belonging to N_k;
   b) for each element n_i of N_k, consider n_i’s parent, p_ni:
      i. if n_i has a satellite role, mark in T_L the most nuclear part of p_ni;
      ii. if p_ni ≠ γ_L then add p_ni to N_{k+1};
3. mark γ_L and prune out, from T_L, all unmarked nodes;
4. for each marked node n having only one child n_c, prune out n and connect n_c
   with n’s parent, also attributing n’s role to n_c (see Fig. 1 for an example).
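The marking and pruning steps can be sketched as follows. Two points are assumptions of this sketch rather than statements of the paper. First, the leaves of a most nuclear part are queued for marking, not merely marked, so that the nodes on the path down to them also survive pruning. Second, the example tree is hypothetical and is not the tree of Fig. 1. All function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Node:
    name: str
    role: str       # "Root", "Nuc" or "Sat"
    content: str
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    if not node.children:
        return [node.name]
    return [x for c in node.children for x in leaves(c)]


def lowest_generator(node: Node, names: Set[str]) -> Node:
    """Deepest non-leaf node whose span contains every leaf in names."""
    for c in node.children:
        if c.children and names <= set(leaves(c)):
            return lowest_generator(c, names)
    return node


def nuc(node: Node) -> Set[str]:
    if not node.children:
        return {node.name}
    return {x for c in node.children if c.role == "Nuc" for x in nuc(c)}


def prune(node: Node, marked: Set[str], role: str) -> Node:
    """Steps 3 and 4: drop unmarked nodes, splice out single-child nodes."""
    kids = [prune(c, marked, c.role) for c in node.children if c.name in marked]
    if len(kids) == 1:                        # the only child inherits the role
        only = kids[0]
        return Node(only.name, role, only.content, only.children)
    return Node(node.name, role, node.content, kids)


def extract(tree: Node, wanted: Iterable[str]) -> Node:
    targets = set(wanted)
    top = lowest_generator(tree, targets)
    by_name: Dict[str, Node] = {}
    parent: Dict[str, Node] = {}
    stack = [top]
    while stack:                              # index the sub-tree under top
        n = stack.pop()
        by_name[n.name] = n
        for c in n.children:
            parent[c.name] = n
            stack.append(c)
    marked: Set[str] = set()
    frontier = [by_name[x] for x in targets]  # N_0 = L
    while frontier:                           # step 2
        nxt: List[Node] = []
        for n in frontier:
            if n.name in marked:
                continue
            marked.add(n.name)                # step 2a
            p = parent[n.name]
            if n.role == "Sat":               # step 2b-i (queued: an assumption)
                nxt.extend(by_name[x] for x in nuc(p))
            if p is not top:                  # step 2b-ii
                nxt.append(p)
        frontier = nxt
    marked.add(top.name)                      # step 3
    return prune(top, marked, top.role)


# Hypothetical example tree.
tree = Node("A", "Root", "RR1", [
    Node("B", "Nuc", "RR2", [Node("D", "Nuc", "IU1"), Node("E", "Sat", "IU2")]),
    Node("C", "Sat", "RR3", [Node("F", "Nuc", "IU3"), Node("G", "Sat", "IU4")]),
])

sub = extract(tree, {"E", "F"})
print(leaves(sub))   # ['D', 'E', 'F']
```

In this run the satellite E pulls in the nucleus D of its parent, G is pruned, and C collapses into its only remaining child F, which inherits C's satellite role.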
Fig. 1. Extraction of the smallest subtree generating L = {Q, N, O}.
4 Algorithm 2: Exchanging Text Spans
Problem 2. Given a RS-tree T and two independent sets L_1 = {l_j, ..., l_n} and
L_2 = {l_k, ..., l_m} of leaves of T such that L_1 <_T L_2, generate a RS-tree T_1,
equivalent to T, such that L_2 <_T1 L_1.
We first describe two basic operations on the RS-tree, then the main algorithm:
Inversion of siblings: Let n be a node of a RS-tree T, and N_i = {n_i1, ..., n_ik}
and N_j = {n_j1, ..., n_jh} two non-overlapping subsets of n’s children such that
N_i <_T N_j. Then Inv(n, N_i, N_j) rearranges n’s children so that N_j <_T N_i.
Exchange of satellite children: Let <n_1, Role_1, RR_1> and <n_2, Role_2, RR_2>
be two nodes of a RS-tree T. Let Sa_1 and Sa_2 be the respective sets of T’s
subtrees originating from the children of n_1 and n_2 having a satellite role.
An exchange of satellites between n_1 and n_2, ExcSat(n_1, n_2), consists of:
1. replacing <n_1, Role_1, RR_1> with <n_1, Role_1, RR_2>;
2. replacing <n_2, Role_2, RR_2> with <n_2, Role_2, RR_1>;
3. substituting the set Sa_1 with the set Sa_2 in n_1;
4. substituting the set Sa_2 with the set Sa_1 in n_2.
Inv(n, N_i, N_j) is always meaning preserving (we assume with [2] that the
application of a rhetorical relation does not constrain the order of nuclei and
satellites), whereas ExcSat(n_1, n_2) is meaning preserving only if Nuc_n1 = Nuc_n2.
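Both operations are small in-place edits of the children lists. The sketch below makes two assumptions that the text leaves open: `inv` fills the positions occupied by the two groups with the second group first, leaving other siblings where they are, and `exc_sat` puts the incoming satellites at the position of the first outgoing one (or at the end if there was none). The tree uses the node triples of Fig. 2 with an assumed sibling order, and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    name: str
    role: str       # "Root", "Nuc" or "Sat"
    content: str
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    if not node.children:
        return [node.name]
    return [x for c in node.children for x in leaves(c)]


def nuc(node: Node) -> Set[str]:
    if not node.children:
        return {node.name}
    return {x for c in node.children if c.role == "Nuc" for x in nuc(c)}


def inv(n: Node, ni: List[Node], nj: List[Node]) -> None:
    """Rearrange n's children so that the group nj comes before the group ni."""
    moving = {id(c) for c in list(ni) + list(nj)}
    order = iter(list(nj) + list(ni))
    n.children = [next(order) if id(c) in moving else c for c in n.children]


def replace_satellites(children: List[Node], new_sats: List[Node]) -> List[Node]:
    out: List[Node] = []
    placed = False
    for c in children:
        if c.role == "Sat":
            if not placed:                 # slot of the first old satellite
                out.extend(new_sats)
                placed = True
        else:
            out.append(c)
    if not placed:
        out.extend(new_sats)
    return out


def exc_sat(n1: Node, n2: Node) -> None:
    """Swap the relations (steps 1-2) and the satellite sub-trees (steps 3-4)."""
    sats1 = [c for c in n1.children if c.role == "Sat"]
    sats2 = [c for c in n2.children if c.role == "Sat"]
    n1.content, n2.content = n2.content, n1.content
    n1.children = replace_satellites(n1.children, sats2)
    n2.children = replace_satellites(n2.children, sats1)


def fig2_tree() -> Node:
    return Node("A", "Root", "RR1", [
        Node("C", "Sat", "IU1"),
        Node("B", "Nuc", "RR2", [Node("D", "Nuc", "IU2"), Node("E", "Sat", "IU3")]),
    ])


t = fig2_tree()
a, b = t, t.children[1]
assert nuc(a) == nuc(b)        # precondition for a meaning-preserving ExcSat
exc_sat(a, b)
print(a.content, b.content, leaves(t))   # RR2 RR1 ['E', 'D', 'C']

t = fig2_tree()
inv(t, [t.children[0]], [t.children[1]])
print(leaves(t))                         # ['D', 'E', 'C']
```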
Algorithm 2. Let χ_1 and χ_2 be the contexts of L_1 and L_2 respectively, L_12 =
χ_1 ∪ χ_2, and γ_L12 the lowest generator of L_12. Two cases may occur:
1. γ_L12 has at least one satellite child: let γ_Nuc12 be the lowest generator of
   Nuc_γL12, the most nuclear part of γ_L12, and L_Nuc12 the set of leaves
   generated by the nucleus children of γ_Nuc12. Two sub-cases may occur:
   a) L_Nuc12 has empty intersection with L_12. Let γ_1 be the lowest generator
      of χ_1 ∪ L_Nuc12 and γ_2 the lowest generator of χ_2 ∪ L_Nuc12. Trivially,
      Nuc_γ1 = Nuc_γ2 = Nuc_γL12. Then execute ExcSat(γ_1, γ_2) (Fig. 2, left).
Fig. 2. Exchanging {C} and {E} (left) and {C} and {D} (right).
   b) L_Nuc12 has non-empty intersection with L_12. Because of the definition
      of context and the hypothesis of independence, it will have a non-empty
      intersection with either χ_1 or χ_2 but not both. Then, if N_1 and N_2 are
      the sets of children of γ_L12 generating χ_1 and χ_2 respectively, execute
      Inv(γ_L12, N_1, N_2) (Fig. 2, right).
2. γ_L12 has no satellite children: treat as case 1(b).
Note that the algorithm can be applied only to two independent sets of leaves.
If the independence hypothesis is relaxed, a purely syntactic exchange cannot
be performed, and semantics has to be taken into account.
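The case analysis of Algorithm 2 can be put together in one function. This is a sketch under stated assumptions. Lowest generators are always non-leaf nodes, as in Definition 3, so the example uses multi-leaf sets. "Children generating a context" is read as the children whose span intersects that context. `exc_sat` places incoming satellites where the first outgoing one stood. The example tree and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Node:
    name: str
    role: str       # "Root", "Nuc" or "Sat"
    content: str
    children: List["Node"] = field(default_factory=list)


def leaves(n: Node) -> List[str]:
    return [n.name] if not n.children else [x for c in n.children for x in leaves(c)]


def lowest_generator(n: Node, names: Iterable[str]) -> Node:
    wanted = set(names)
    for c in n.children:
        if c.children and wanted <= set(leaves(c)):
            return lowest_generator(c, wanted)
    return n


def nuc(n: Node) -> Set[str]:
    if not n.children:
        return {n.name}
    return {x for c in n.children if c.role == "Nuc" for x in nuc(c)}


def inv(n: Node, ni: List[Node], nj: List[Node]) -> None:
    moving = {id(c) for c in ni + nj}
    order = iter(nj + ni)
    n.children = [next(order) if id(c) in moving else c for c in n.children]


def exc_sat(n1: Node, n2: Node) -> None:
    def swap_in(children: List[Node], new_sats: List[Node]) -> List[Node]:
        nuclei_before = []
        rest = []
        seen_sat = False
        for c in children:
            if c.role == "Sat":
                seen_sat = True
            elif seen_sat:
                rest.append(c)
            else:
                nuclei_before.append(c)
        if not seen_sat:                      # no old satellite: append at the end
            return nuclei_before + new_sats
        return nuclei_before + new_sats + rest
    s1 = [c for c in n1.children if c.role == "Sat"]
    s2 = [c for c in n2.children if c.role == "Sat"]
    n1.content, n2.content = n2.content, n1.content
    n1.children, n2.children = swap_in(n1.children, s2), swap_in(n2.children, s1)


def exchange(tree: Node, l1: Iterable[str], l2: Iterable[str]) -> Node:
    chi1 = set(leaves(lowest_generator(tree, l1)))
    chi2 = set(leaves(lowest_generator(tree, l2)))
    if chi1 & chi2:
        raise ValueError("the two sets of leaves are not independent")
    l12 = chi1 | chi2
    g12 = lowest_generator(tree, l12)
    if any(c.role == "Sat" for c in g12.children):            # case 1
        g_nuc = lowest_generator(tree, nuc(g12))
        l_nuc = {x for c in g_nuc.children if c.role == "Nuc" for x in leaves(c)}
        if not (l_nuc & l12):                                 # case 1(a)
            exc_sat(lowest_generator(tree, chi1 | l_nuc),
                    lowest_generator(tree, chi2 | l_nuc))
            return tree
    # case 1(b) and case 2: invert the children that hold the two contexts
    n1 = [c for c in g12.children if chi1 & set(leaves(c))]
    n2 = [c for c in g12.children if chi2 & set(leaves(c))]
    inv(g12, n1, n2)
    return tree


def example_tree() -> Node:
    return Node("A", "Root", "RR1", [
        Node("C", "Sat", "RR3", [Node("C1", "Nuc", "IU1"), Node("C2", "Sat", "IU2")]),
        Node("B", "Nuc", "RR2", [
            Node("D", "Nuc", "IU3"),
            Node("E", "Sat", "RR4", [Node("E1", "Nuc", "IU4"), Node("E2", "Sat", "IU5")]),
        ]),
    ])


print(leaves(exchange(example_tree(), {"C1", "C2"}, {"E1", "E2"})))
# ['E1', 'E2', 'D', 'C1', 'C2']   (case 1(a), via ExcSat)
print(leaves(exchange(example_tree(), {"C1", "C2"}, {"D", "E1"})))
# ['D', 'E1', 'E2', 'C1', 'C2']   (case 1(b), via Inv)
```

In the first call the relations of A and B are swapped together with their satellites. In the second, only the order of A's children changes.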
5 Conclusions
Modifying an existing text generally requires semantic knowledge, and can therefore be a difficult task. However, for a restricted class of modification problems, good results can be achieved efficiently by exploiting the syntactic properties of the RS-tree representing the text. Such problems concern the mere
re-organization of existing text, with the only constraint that the deep rhetorical relations established among the informative units of the original text are
preserved. For these problems, algorithms can be found that take a RS-tree as
input and produce new, equivalent RS-trees satisfying a given requirement.
This paper described two of these text modification problems, and introduced
two algorithms to solve them. The algorithms’ results have been evaluated, and
they were applied to two typical cases: re-structuring of one-shot documents [1]
and generation of dynamic hypermedia. The two applications showed that these
algorithms can be successfully used in practical contexts.
References
1. F. de Rosis, F. Grasso and D.C. Berry. Refining Instructional Text Generation
After Evaluation. Artificial Intelligence in Medicine, to appear.
2. W. Mann and S. Thompson. Rhetorical Structure Theory: Toward a Functional
Theory of Text Organization. Text, 8(3):243–281, 1988.
3. D. Marcu. Building up Rhetorical Structure Trees. In Proceedings of the 13th
National Conference on Artificial Intelligence (AAAI-96), vol. 2, pp. 1069–1074, 1996.